How to go to your page
This eBook contains five volumes. Each volume has its own page numbering scheme, consisting of a volume number and a page number, separated by a colon. For example, to go to page ii of Volume I, type I:ii in the "page #" box at the top of the screen and click "Go." To go to page 186 of Volume II, type II:186, and so forth.
SAGE DIRECTIONS IN EDUCATIONAL PSYCHOLOGY
Salkind_Prelims I.indd i
9/16/2010 12:41:58 PM
The SAGE Library of Educational Thought and Practice major works series encapsulates and disseminates the seminal works in the field of educational science and collects together those articles and essays which have been most influential in shaping and driving the discipline. Each multivolume set presents readers with a collection of both classical and contemporary published works sourced from the foremost publications in the field by an internationally renowned editor or editorial team. Each set includes a full introduction, presenting a rationale for the selection and contextualizing the major work within the discipline, giving students, researchers and academics insight into the past, present and likely future of that area of research. The series covers both key approaches to studying education theory and the primary sub-fields which form the focus of educational practitioners’ work. The SAGE Library of Educational Thought and Practice is an essential addition for all libraries throughout the world with an interest in education.
Neil J. Salkind has been teaching at the University of Kansas for 30 years, in the Department of Psychology and Research in Education. He has published more than 80 professional papers and is the author of several college-level textbooks, including Statistics for People Who (Think They) Hate Statistics (now in the third edition), Child Development, Exploring Research, and Introduction to Theories of Human Development (SAGE 2004). He was editor of Child Development Abstracts and Bibliography from 1989 through 2002 and is active in the Society for Research in Child Development.
SAGE LIBRARY OF EDUCATIONAL THOUGHT AND PRACTICE
SAGE DIRECTIONS IN EDUCATIONAL PSYCHOLOGY VOLUME I
Edited by
Neil J. Salkind
Introduction and editorial arrangement © Neil J. Salkind 2011
First published 2011
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act, 1988, this publication may be reproduced, stored or transmitted in any form, or by any means, only with the prior permission in writing of the publishers, or in the case of reprographic reproduction, in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.
Every effort has been made to trace and acknowledge all the copyright owners of the material reprinted herein. However, if any copyright owners have not been located and contacted at the time of publication, the publishers will be pleased to make the necessary arrangements at the first opportunity.
SAGE Publications Ltd, 1 Oliver’s Yard, 55 City Road, London EC1Y 1SP
SAGE Publications Inc., 2455 Teller Road, Thousand Oaks, California 91320
SAGE Publications India Pvt Ltd, B 1/I 1, Mohan Cooperative Industrial Area, Mathura Road, New Delhi 110 044
SAGE Publications Asia-Pacific Pte Ltd, 33 Pekin Street #02-01, Far East Square, Singapore 048763
British Library Cataloguing in Publication data
A catalogue record for this book is available from the British Library
ISBN: 978-0-85702-178-6 (set of five volumes)
Library of Congress Control Number: 2010923776
Typeset by Mukesh Technologies Pvt. Ltd., Pondicherry, India.
Printed on paper from sustainable resources
Printed by MPG Books Group, Bodmin, Cornwall
Contents
Appendix of Sources  xiii
Editor’s Introduction  xxiii
   Neil J. Salkind
Volume I
Section I: Human Development
1. Aging and Human Performance  3
   Neil Charness
2. Violence and Human Development  15
   Elton B. McNeil
3. The Life-course and Human Development: An Ecological Perspective  25
   Glen H. Elder, Jr and Richard C. Rockwell
4. The Family Conference: The Social Control of Human Development  43
   David R. Buckholdt
5. From Childhood to the Later Years: Pathways of Human Development  57
   Robert Crosnoe and Glen H. Elder Jr
6. The Developmental Niche: A Conceptualization at the Interface of Child and Culture  81
   Charles M. Super and Sara Harkness
7. Conceptualizing Adult Development  103
   Calvin F. Settlage, John Curtis, Marjorie Lozoff, Milton Lozoff, George Silberschatz and Earl J. Simburg
8. Early Child Care and Children’s Development Prior to School Entry: Results from the NICHD Study of Early Child Care  119
   NICHD Early Child Care Research Network
9. A Developmental Approach to Language Acquisition: Two Case Studies  151
   M. Bamberg, N. Budwig and B. Kaplan
10. Promoting Positive Youth Development: New Directions in Developmental Theory, Methods, and Research  171
   William M. Kurtines, Laura Ferrer-Wreder, Steven L. Berman, Carolyn Cass Lorente, Wendy K. Silverman and Marilyn J. Montgomery
11. Children Have More Need of Models than Critics: Early Language Experience and Brain Development  181
   Travis Thompson
12. Development: Transfer of Technology, Transfer of Culture  191
   Jacques Binet (Translated by Jeanne Ferguson)
13. The Clinical Study and Treatment of Normal and Abnormal Development: A Psychological Clinic  207
   Lightner Witmer
14. Self-Motivation for Academic Attainment: The Role of Self-Efficacy Beliefs and Personal Goal Setting  231
   Barry J. Zimmerman, Albert Bandura and Manuel Martinez-Pons
15. The Dangerous and the Good? Developmentalism, Progress, and Public Schooling  245
   Bernadette Baker
16. The Scientific Humanism of G. Stanley Hall  287
   Donald H. Meyer
17. Growing Old – Or Older and Growing  299
   Carl R. Rogers
18. Maturational Timing and the Development of Problem Behavior: Longitudinal Studies in Adolescence  311
   Rainer K. Silbereisen, Anne C. Petersen, Helfried T. Albrecht and Bärbel Kracke
Volume II
Section I: Human Development (Continued)
19. Motor Development as Foundation and Future of Developmental Psychology  3
   Esther Thelen
20. Physical Growth  31
   Kai Jensen
21. Mental Development during the Preadolescent and Adolescent Periods  79
   Gordon Hendrickson
Section II: Curriculum, Instruction and Learning
22. Making Sense of Curriculum Evaluation: Continuities and Discontinuities in an Educational Idea  93
   David Hamilton
23. Psychology of Learning Environments: Behavioral, Structural, or Perceptual?  123
   Herbert J. Walberg
24. Thought and Two Languages: The Impact of Bilingualism on Cognitive Development  159
   Rafael M. Diaz
25. Components of a Psychology of Instruction: Toward a Science of Design  189
   Robert Glaser
26. The Emergence of Cognitive Psychology  211
   Robert R. Holt
27. The Advancement of Learning  227
   Ann L. Brown
28. Paradigms of Knowledge and Instruction  249
   S. Farnham-Diggory
29. Health Promotion by Social Cognitive Means  267
   Albert Bandura
30. Models of the Learner  291
   Jerome Bruner
31. Child’s Talk: Learning to Use Language  299
   Jerome Bruner
32. The Reflexivity of Cognitive Science: The Scientist as Model of Human Nature  303
   Jamie Cohen-Cole
33. History, Culture, Learning, and Development  333
   Patricia M. Greenfield, Ashley E. Maynard and Carla P. Childs
34. Biology and Cognition  351
   Jean Piaget (Translated by Martin Faigel)
35. Neural Bases of Intelligence and Training  369
   Mark R. Rosenzweig
Volume III
Section II: Curriculum, Instruction and Learning (Continued)
36. Human Intelligence: An Introduction to Advances in Theory and Research  3
   David F. Lohman
37. Cognitive Demands of New Technologies and the Implications for Learning Theory  51
   Richard J. Torraco
38. Cognitive Conceptions of Learning  79
   Thomas J. Shuell
39. Meaning in Complex Learning  109
   Ronald E. Johnson
40. Phases of Meaningful Learning  141
   Thomas J. Shuell
41. Growth, Development, Learning, and Maturation as Factors in Curriculum and Teaching  161
   William C. Trow
Section III: Motivation
42. Maslow, Monkeys and Motivation Theory  175
   Dallas Cullen
43. Maslow’s Theory of Motivation: A Critique  195
   Andrew Neher
44. Caught on Fire: Motivation and Giftedness  215
   Ann Robinson
45. An Empirical Test of Maslow’s Theory of Motivation  219
   Eugene W. Mathes and Linda L. Edwards
46. Meaningfulness, Commitment, and Engagement: The Intersection of a Deeper Level of Intrinsic Motivation  223
   Neal Chalofsky and Vijay Krishna
47. Motivation and Human Growth: A Developmental Perspective  237
   M.S. Srinivasin
48. Evolutionary Perspectives on Human Motivation  247
   Jutta Heckhausen
49. The Debate about Rewards and Intrinsic Motivation: Protests and Accusations Do Not Alter the Results  263
   Judy Cameron and W. David Pierce
50. A Comprehensive Expectancy Motivation Model: Implications for Adult Education and Training  279
   Kenneth W. Howard
51. The Academic Motivation Scale: A Measure of Intrinsic, Extrinsic, and Amotivation in Education  291
   Robert J. Vallerand, Luc G. Pelletier, Marc R. Blais, Nathalie M. Brière, Caroline Senécal and Evelyne F. Vallières
52. Extrinsic Rewards and Intrinsic Motivation in Education: Reconsidered Once Again  305
   Edward L. Deci, Richard Koestner and Richard M. Ryan
53. Beyond the Rhetoric: Understanding Achievement and Motivation in Catholic School Students  333
   Janine Bempechat, Beth A. Boulay, Stephanie C. Piergross and Kenzie A. Wenk
54. Dimensions of School Motivation: A Cross-cultural Validation Study  345
   Dennis M. McInerney and Kenneth E. Sinclair
55. Achievement Motivation in Children of Three Ethnic Groups in the United States  361
   Manuel Ramirez III and Douglass R. Price-Williams
56. Motivation and Learning Environment Differences between Resilient and Nonresilient Latino Middle School Students  369
   Hersholt C. Waxman, Shwu-yong L. Huang and Yolanda N. Padrón
57. Attracting and Retaining Teachers: A Question of Motivation  387
   Karin Müller, Roberta Alliata and Fabienne Benninghoff
Volume IV
Section III: Motivation (Continued)
58. Interpersonal Relationships, Motivation, Engagement, and Achievement: Yields for Theory, Current Issues, and Educational Practice  3
   Andrew J. Martin and Martin Dowson
59. Classroom and Individual Differences in Early Adolescents’ Motivation and Self-Regulated Learning  45
   Paul R. Pintrich, Robert W. Roeser and Elisabeth A.M. De Groot
60. Atkinson’s Theory of Achievement Motivation: First Step toward a Theory of Academic Motivation?  67
   Martin L. Maehr and Douglas D. Sjogren
61. Motivation and Engagement across the Academic Life Span: A Developmental Construct Validity Study of Elementary School, High School, and University/College Students  87
   Andrew J. Martin
62. Motivation and Achievement: A Quantitative Synthesis  121
   Margaret E. Uguroglu and Herbert J. Walberg
63. Academic Motivation and Achievement among Urban Adolescents  135
   Joyce F. Long, Shinichi Monoi, Brian Harper, Dee Knoblauch and P. Karen Murphy
64. Intrinsic Motivation and School Misbehavior: Some Intervention Implications  157
   Howard S. Adelman and Linda Taylor
65. Reinforcement, Reward, and Intrinsic Motivation: A Meta-Analysis  179
   Judy Cameron and W. David Pierce
66. Motivation in Transition  241
   Barbara Stauber
Section IV: Research Design, Measurement and Statistics and Evaluation
67. Why P Values Are Not a Useful Measure of Evidence in Statistical Significance Testing  263
   Raymond Hubbard and R. Murray Lindsay
68. Alphabet Soup: Blurring the Distinctions between p’s and α’s in Psychological Research  283
   Raymond Hubbard
69. Research Methods: Experimental Design  313
   Julian C. Stanley
70. What Can We Learn from International Assessments?  325
   Robert J. Mislevy
71. Power, Control, and Validity in Research  353
   Randall M. Parker
72. Testing Reasoning and Reasoning about Testing  371
   Walt Haney
Volume V
Section IV: Research Design, Measurement and Statistics and Evaluation (Continued)
73. Magnitudes of Experimental Effects in Social Science Research  3
   Lee Sechrest and William H. Yeaton
74. Hypothesis Testing in Relation to Statistical Methodology  23
   Cherry Ann Clark
75. On Examinee Choice in Educational Testing  43
   Howard Wainer and David Thissen
76. Historical Views of Invariance: Evidence from the Measurement Theories of Thorndike, Thurstone, and Rasch  83
   George Engelhard, Jr
77. If Statistical Significance Tests Are Broken/Misused, What Practices Should Supplement or Replace Them?  99
   Bruce Thompson
78. Musical Aptitude Testing: From James McKeen Cattell to Carl Emil Seashore  115
   Jere T. Humphreys
79. The Life and Labors of Francis Galton: A Review of Four Recent Books about the Father of Behavioral Statistics  131
   Brian E. Clauser
80. Regression towards the Mean, Historically Considered  137
   Stephen M. Stigler
81. Karl Pearson and Statistics: The Social Origins of Scientific Innovation  151
   Bernard J. Norton
82. A History of Effect Size Indices  179
   Carl J. Huberty
83. The Role of Assessment in a Learning Culture  193
   Lorrie A. Shepard
84. The Place of Theory in Educational Research  215
   Patrick Suppes
85. Curriculum-based Measures: Development and Perspectives  231
   Stanley L. Deno
86. Tests as Research Instruments  243
   Robert L. Thorndike
87. My Current Thoughts on Coefficient Alpha and Successor Procedures  257
   Lee J. Cronbach and Richard J. Shavelson
88. Handbook of Evaluation Research  285
   Lee Ross and Lee J. Cronbach
89. A Model for Studying the Validity of Multiple-Choice Items  305
   Lee J. Cronbach and Jack C. Merwin
90. Assisted Assessment: A Taxonomy of Approaches and an Outline of Strengths and Weaknesses  319
   Joseph C. Campione
91. Standardized Testing  351
   Roger T. Lennon
92. The Place of Statistics in Psychology  359
   Jum Nunnally
93. Education in Statistics and Research Design in School Psychology  367
   Steven G. Little, Howard B. Lee and Angeleque Akin-Little
94. The Role of Measurement Error in Familiar Statistics  377
   Malcolm James Ree and Thomas R. Carretta
95. Qualitative Methods and the Development of Clinical Assessment Tools  393
   Jane F. Gilgun
Appendix of Sources
All articles and chapters have been reproduced exactly as they were first published, including textual cross-references to material in the original source.
Grateful acknowledgement is made to the following sources for permission to reproduce material in this book.
1. ‘Aging and Human Performance’, Neil Charness
   Human Factors: The Journal of the Human Factors and Ergonomics Society, 50 (2008): 548–555. Published by SAGE Publications, Inc. Reprinted with permission.
2. ‘Violence and Human Development’, Elton B. McNeil
   The ANNALS of the American Academy of Political and Social Science, 364 (1966): 149–157. Published by SAGE Publications, Inc. Reprinted with permission.
3. ‘The Life-Course and Human Development: An Ecological Perspective’, Glen H. Elder, Jr and Richard C. Rockwell
   International Journal of Behavioral Development, 2 (1979): 1–21. Published by SAGE Publications Ltd. Reprinted with permission.
4. ‘The Family Conference: The Social Control of Human Development’, David R. Buckholdt
   Journal of Family Issues, 4(4) (1983): 613–631. Published by SAGE Publications, Inc. Reprinted with permission.
5. ‘From Childhood to the Later Years: Pathways of Human Development’, Robert Crosnoe and Glen H. Elder Jr
   Research on Aging, 26(6) (2004): 623–654. Published by SAGE Publications, Inc. Reprinted with permission.
6. ‘The Developmental Niche: A Conceptualization at the Interface of Child and Culture’, Charles M. Super and Sara Harkness
   International Journal of Behavioral Development, 9 (1986): 545–569. Published by SAGE Publications Ltd. Reprinted with permission.
7. ‘Conceptualizing Adult Development’, Calvin F. Settlage, John Curtis, Marjorie Lozoff, Milton Lozoff, George Silberschatz and Earl J. Simburg
   Journal of the American Psychoanalytic Association, 36 (1988): 347–369. Published by SAGE Publications, Inc. Reprinted with permission.
8. ‘Early Child Care and Children’s Development Prior to School Entry: Results from the NICHD Study of Early Child Care’, NICHD Early Child Care Research Network
   American Educational Research Journal, 39(1) (2002): 133–164. Published by SAGE Publications, Inc. Reprinted with permission.
9. ‘A Developmental Approach to Language Acquisition: Two Case Studies’, M. Bamberg, N. Budwig and B. Kaplan
   First Language, 11 (1991): 121–141. Published by SAGE Publications Ltd. Reprinted with permission.
10. ‘Promoting Positive Youth Development: New Directions in Developmental Theory, Methods, and Research’, William M. Kurtines, Laura Ferrer-Wreder, Steven L. Berman, Carolyn Cass Lorente, Wendy K. Silverman and Marilyn J. Montgomery
   Journal of Adolescent Research, 23(3) (2008): 233–243. Published by SAGE Publications, Inc. Reprinted with permission.
11. ‘Children Have More Need of Models than Critics: Early Language Experience and Brain Development’, Travis Thompson
   Journal of Early Intervention, 19(3) (1995): 264–272. Published by SAGE Publications, Inc. Reprinted with permission.
12. ‘Development: Transfer of Technology, Transfer of Culture’, Jacques Binet (Translated by Jeanne Ferguson)
   Diogenes, 32 (1984): 19–38. Published by SAGE Publications Ltd. Reprinted with permission.
13. ‘The Clinical Study and Treatment of Normal and Abnormal Development: A Psychological Clinic’, Lightner Witmer
   The ANNALS of the American Academy of Political and Social Science, 34 (1909): 141–162. Published by SAGE Publications, Inc. Reprinted with permission.
14. ‘Self-Motivation for Academic Attainment: The Role of Self-Efficacy Beliefs and Personal Goal Setting’, Barry J. Zimmerman, Albert Bandura and Manuel Martinez-Pons
   American Educational Research Journal, 29(3) (1992): 663–676. Published by SAGE Publications, Inc. Reprinted with permission.
15. ‘The Dangerous and the Good? Developmentalism, Progress, and Public Schooling’, Bernadette Baker
   American Educational Research Journal, 36(4) (1999): 797–834. Published by SAGE Publications, Inc. Reprinted with permission.
16. ‘The Scientific Humanism of G. Stanley Hall’, Donald H. Meyer
   Journal of Humanistic Psychology, 11 (1971): 201–213. Published by SAGE Publications, Inc. Reprinted with permission.
17. ‘Growing Old – Or Older and Growing’, Carl R. Rogers
   Journal of Humanistic Psychology, 20(4) (1980): 5–16. Published by SAGE Publications, Inc. Reprinted with permission.
18. ‘Maturational Timing and the Development of Problem Behavior: Longitudinal Studies in Adolescence’, Rainer K. Silbereisen, Anne C. Petersen, Helfried T. Albrecht and Bärbel Kracke
   The Journal of Early Adolescence, 9(3) (1989): 247–268. Published by SAGE Publications, Inc. Reprinted with permission.
19. ‘Motor Development as Foundation and Future of Developmental Psychology’, Esther Thelen
   International Journal of Behavioral Development, 24(4) (2000): 385–397. Published by SAGE Publications Ltd. Reprinted with permission.
20. ‘Physical Growth’, Kai Jensen
   Review of Educational Research, XXV(5) (1955): 369–414. Published by SAGE Publications, Inc. Reprinted with permission.
21. ‘Mental Development during the Preadolescent and Adolescent Periods’, Gordon Hendrickson
   Review of Educational Research, XX(5) (1950): 351–360. Published by SAGE Publications, Inc. Reprinted with permission.
22. ‘Making Sense of Curriculum Evaluation: Continuities and Discontinuities in an Educational Idea’, David Hamilton
   Review of Research in Education, 5 (1977): 318–347. Published by SAGE Publications, Inc. Reprinted with permission.
23. ‘Psychology of Learning Environments: Behavioral, Structural, or Perceptual?’, Herbert J. Walberg
   Review of Research in Education, 4 (1976): 142–178. Published by SAGE Publications, Inc. Reprinted with permission.
24. ‘Thought and Two Languages: The Impact of Bilingualism on Cognitive Development’, Rafael M. Diaz
   Review of Research in Education, 10 (1983): 23–54. Published by SAGE Publications, Inc. Reprinted with permission.
25. ‘Components of a Psychology of Instruction: Toward a Science of Design’, Robert Glaser
   Review of Educational Research, 46(1) (1976): 1–24. Published by SAGE Publications, Inc. Reprinted with permission.
26. ‘The Emergence of Cognitive Psychology’, Robert R. Holt
   Journal of the American Psychoanalytic Association, 12 (1964): 650–665. Published by SAGE Publications, Inc. Reprinted with permission.
27. ‘The Advancement of Learning’, Ann L. Brown
   Educational Researcher, 23 (1994): 4–12. Published by SAGE Publications, Inc. Reprinted with permission.
28. ‘Paradigms of Knowledge and Instruction’, S. Farnham-Diggory
   Review of Educational Research, 64(3) (1994): 463–477. Published by SAGE Publications, Inc. Reprinted with permission.
29. ‘Health Promotion by Social Cognitive Means’, Albert Bandura
   Health Education & Behavior, 31(2) (2004): 143–164. Published by SAGE Publications, Inc. Reprinted with permission.
30. ‘Models of the Learner’, Jerome Bruner
   Educational Researcher, 14 (1985): 5–8. Published by SAGE Publications, Inc. Reprinted with permission.
31. ‘Child’s Talk: Learning to Use Language’, Jerome Bruner
   Child Language Teaching and Therapy, 1 (1985): 111–114. Published by SAGE Publications Ltd. Reprinted with permission.
32. ‘The Reflexivity of Cognitive Science: The Scientist as Model of Human Nature’, Jamie Cohen-Cole
   History of the Human Sciences, 18(4) (2005): 107–139. Published by SAGE Publications Ltd. Reprinted with permission.
33. ‘History, Culture, Learning, and Development’, Patricia M. Greenfield, Ashley E. Maynard and Carla P. Childs
   Cross-Cultural Research, 34(4) (2000): 351–374. Published by SAGE Publications, Inc. Reprinted with permission.
34. ‘Biology and Cognition’, Jean Piaget (Translated by Martin Faigel)
   Diogenes, 14 (1966): 1–22. Published by SAGE Publications Ltd. Reprinted with permission.
35. ‘Neural Bases of Intelligence and Training’, Mark R. Rosenzweig
   The Journal of Special Education, 15(2) (1981): 105–123. Published by SAGE Publications, Inc. Reprinted with permission.
36. ‘Human Intelligence: An Introduction to Advances in Theory and Research’, David F. Lohman
   Review of Educational Research, 59(4) (1989): 333–373. Published by SAGE Publications, Inc. Reprinted with permission.
37. ‘Cognitive Demands of New Technologies and the Implications for Learning Theory’, Richard J. Torraco
   Human Resource Development Review, 1(4) (2002): 439–466. Published by SAGE Publications, Inc. Reprinted with permission.
38. ‘Cognitive Conceptions of Learning’, Thomas J. Shuell
   Review of Educational Research, 56(4) (1986): 411–436. Published by SAGE Publications, Inc. Reprinted with permission.
39. ‘Meaning in Complex Learning’, Ronald E. Johnson
   Review of Educational Research, 45(3) (1975): 425–459. Published by SAGE Publications, Inc. Reprinted with permission.
40. ‘Phases of Meaningful Learning’, Thomas J. Shuell
   Review of Educational Research, 60(4) (1990): 531–547. Published by SAGE Publications, Inc. Reprinted with permission.
41. ‘Growth, Development, Learning, and Maturation as Factors in Curriculum and Teaching’, William C. Trow
   Review of Educational Research, XXI(3) (1951): 186–195. Published by SAGE Publications, Inc. Reprinted with permission.
42. ‘Maslow, Monkeys and Motivation Theory’, Dallas Cullen
   Organization, 4(3) (1997): 355–373. Published by SAGE Publications Ltd. Reprinted with permission.
43. ‘Maslow’s Theory of Motivation: A Critique’, Andrew Neher
   Journal of Humanistic Psychology, 31(3) (1991): 89–112. Published by SAGE Publications, Inc. Reprinted with permission.
44. ‘Caught on Fire: Motivation and Giftedness’, Ann Robinson
   Gifted Child Quarterly, 40(4) (1996): 177–178. Published by SAGE Publications, Inc. Reprinted with permission.
45. ‘An Empirical Test of Maslow’s Theory of Motivation’, Eugene W. Mathes and Linda L. Edwards
   Journal of Humanistic Psychology, 18(1) (1978): 75–77. Published by SAGE Publications, Inc. Reprinted with permission.
46. ‘Meaningfulness, Commitment, and Engagement: The Intersection of a Deeper Level of Intrinsic Motivation’, Neal Chalofsky and Vijay Krishna
   Advances in Developing Human Resources, 11(2) (2009): 189–203. Published by SAGE Publications, Inc. Reprinted with permission.
47. ‘Motivation and Human Growth: A Developmental Perspective’, M.S. Srinivasin
   Journal of Human Values, 14(1) (2008): 63–71. Published by SAGE Publications India. Reprinted with permission.
48. ‘Evolutionary Perspectives on Human Motivation’, Jutta Heckhausen
   American Behavioral Scientist, 43(6) (2000): 1015–1029. Published by SAGE Publications, Inc. Reprinted with permission.
49. ‘The Debate about Rewards and Intrinsic Motivation: Protests and Accusations Do Not Alter the Results’, Judy Cameron and W. David Pierce
   Review of Educational Research, 66(1) (1996): 39–51. Published by SAGE Publications, Inc. Reprinted with permission.
50. ‘A Comprehensive Expectancy Motivation Model: Implications for Adult Education and Training’, Kenneth W. Howard
   Adult Education Quarterly, 39(4) (1989): 199–210. Published by SAGE Publications, Inc. Reprinted with permission.
51. ‘The Academic Motivation Scale: A Measure of Intrinsic, Extrinsic, and Amotivation in Education’, Robert J. Vallerand, Luc G. Pelletier, Marc R. Blais, Nathalie M. Brière, Caroline Senécal and Evelyne F. Vallières
   Educational and Psychological Measurement, 52 (1992): 1003–1017. Published by SAGE Publications, Inc. Reprinted with permission.
52. ‘Extrinsic Rewards and Intrinsic Motivation in Education: Reconsidered Once Again’, Edward L. Deci, Richard Koestner and Richard M. Ryan
   Review of Educational Research, 71(1) (2001): 1–27. Published by SAGE Publications, Inc. Reprinted with permission.
53. ‘Beyond the Rhetoric: Understanding Achievement and Motivation in Catholic School Students’, Janine Bempechat, Beth A. Boulay, Stephanie C. Piergross and Kenzie A. Wenk
   Education and Urban Society, 40(2) (2008): 167–178. Published by SAGE Publications, Cor. Reprinted with permission.
54. ‘Dimensions of School Motivation: A Cross-Cultural Validation Study’, Dennis M. McInerney and Kenneth E. Sinclair
   Journal of Cross-Cultural Psychology, 23(3) (1992): 389–406. Published by SAGE Publications, Inc. Reprinted with permission.
55. ‘Achievement Motivation in Children of Three Ethnic Groups in the United States’, Manuel Ramirez III and Douglass R. Price-Williams
   Journal of Cross-Cultural Psychology, 7(1) (1976): 49–60. Published by SAGE Publications, Inc. Reprinted with permission.
56. ‘Motivation and Learning Environment Differences between Resilient and Nonresilient Latino Middle School Students’, Hersholt C. Waxman, Shwu-yong L. Huang and Yolanda N. Padrón
   Hispanic Journal of Behavioral Sciences, 19(2) (1997): 137–155. Published by SAGE Publications, Inc. Reprinted with permission.
57. ‘Attracting and Retaining Teachers: A Question of Motivation’, Karin Müller, Roberta Alliata and Fabienne Benninghoff
   Educational Management Administration & Leadership, 37(5) (2009): 574–598. Published by SAGE Publications Ltd. Reprinted with permission.
58. ‘Interpersonal Relationships, Motivation, Engagement, and Achievement: Yields for Theory, Current Issues, and Educational Practice’, Andrew J. Martin and Martin Dowson
   Review of Educational Research, 79(1) (2009): 327–365. Published by SAGE Publications, Inc. Reprinted with permission.
59. ‘Classroom and Individual Differences in Early Adolescents’ Motivation and Self-Regulated Learning’, Paul R. Pintrich, Robert W. Roeser and Elisabeth A.M. De Groot
   The Journal of Early Adolescence, 14(2) (1994): 139–161. Published by SAGE Publications, Inc. Reprinted with permission.
60. ‘Atkinson’s Theory of Achievement Motivation: First Step Toward a Theory of Academic Motivation?’, Martin L. Maehr and Douglas D. Sjogren
   Review of Educational Research, 41(2) (1971): 143–161. Published by SAGE Publications, Inc. Reprinted with permission.
61. ‘Motivation and Engagement across the Academic Life Span: A Developmental Construct Validity Study of Elementary School, High School, and University/College Students’, Andrew J. Martin
   Educational and Psychological Measurement, 69(5) (2009): 794–824. Published by SAGE Publications, Inc. Reprinted with permission.
62. ‘Motivation and Achievement: A Quantitative Synthesis’, Margaret E. Uguroglu and Herbert J. Walberg
   American Educational Research Journal, 16(4) (1979): 375–389. Published by SAGE Publications, Inc. Reprinted with permission.
63. ‘Academic Motivation and Achievement among Urban Adolescents’, Joyce F. Long, Shinichi Monoi, Brian Harper, Dee Knoblauch and P. Karen Murphy
   Urban Education, 42(3) (2007): 196–221. Published by SAGE Publications, Cor. Reprinted with permission.
64. ‘Intrinsic Motivation and School Misbehavior: Some Intervention Implications’, Howard S. Adelman and Linda Taylor
   Journal of Learning Disabilities, 23(9) (1990): 541–550. Published by SAGE Publications, Inc. Reprinted with permission.
65. ‘Reinforcement, Reward, and Intrinsic Motivation: A Meta-Analysis’, Judy Cameron and W. David Pierce
   Review of Educational Research, 64(3) (1994): 363–423. Published by SAGE Publications, Inc. Reprinted with permission.
66. ‘Motivation in Transition’, Barbara Stauber
   Young: Nordic Journal of Youth Research, 15(1) (2007): 31–47. Published by SAGE Publications India. Reprinted with permission.
67. ‘Why P Values Are Not a Useful Measure of Evidence in Statistical Significance Testing’, Raymond Hubbard and R. Murray Lindsay
   Theory & Psychology, 18(1) (2008): 69–88. Published by SAGE Publications Ltd. Reprinted with permission.
68. ‘Alphabet Soup: Blurring the Distinctions between p’s and α’s in Psychological Research’, Raymond Hubbard
   Theory & Psychology, 14(3) (2004): 295–326. Published by SAGE Publications Ltd. Reprinted with permission.
69. ‘Research Methods: Experimental Design’, Julian C. Stanley
   Review of Educational Research, XXVII(5) (1957): 449–459. Published by SAGE Publications, Inc. Reprinted with permission.
70. ‘What Can We Learn from International Assessments?’, Robert J. Mislevy
   Educational Evaluation and Policy Analysis, 17(4) (1995): 419–437. Published by SAGE Publications, Inc. Reprinted with permission.
71. ‘Power, Control, and Validity in Research’, Randall M. Parker
   Journal of Learning Disabilities, 23(10) (1990): 613–620. Published by SAGE Publications, Inc. Reprinted with permission.
72. ‘Testing Reasoning and Reasoning about Testing’, Walt Haney
   Review of Educational Research, 54(4) (1984): 597–654. Published by SAGE Publications, Inc. Reprinted with permission.
73. ‘Magnitudes of Experimental Effects in Social Science Research’, Lee Sechrest and William H. Yeaton
Evaluation Review, 6(5) (1982): 579–600.
Published by SAGE Publications, Inc. Reprinted with permission.
74. ‘Hypothesis Testing in Relation to Statistical Methodology’, Cherry Ann Clark
Review of Educational Research, XXXIII(5) (1963): 455–473.
Published by SAGE Publications, Inc. Reprinted with permission.
75. ‘On Examinee Choice in Educational Testing’, Howard Wainer and David Thissen
Review of Educational Research, 64(1) (1994): 159–195.
Published by SAGE Publications, Inc. Reprinted with permission.
76. ‘Historical Views of Invariance: Evidence from the Measurement Theories of Thorndike, Thurstone, and Rasch’, George Engelhard, Jr
Educational and Psychological Measurement, 52 (1992): 275–291.
Published by SAGE Publications, Inc. Reprinted with permission.
77. ‘If Statistical Significance Tests Are Broken/Misused, What Practices Should Supplement or Replace Them?’, Bruce Thompson
Theory & Psychology, 9(2) (1999): 165–181.
Published by SAGE Publications Ltd. Reprinted with permission.
78. ‘Musical Aptitude Testing: From James McKeen Cattell to Carl Emil Seashore’, Jere T. Humphreys
Research Studies in Music Education, 10 (1998): 42–53.
Published by SAGE Publications Ltd. Reprinted with permission.
79. ‘The Life and Labors of Francis Galton: A Review of Four Recent Books about the Father of Behavioral Statistics’, Brian E. Clauser
Journal of Educational and Behavioral Statistics, 32(4) (2007): 440–444.
Published by SAGE Publications, Inc. Reprinted with permission.
80. ‘Regression towards the Mean, Historically Considered’, Stephen M. Stigler
Statistical Methods in Medical Research, 6 (1997): 103–114.
Published by SAGE Publications Ltd. Reprinted with permission.
81. ‘Karl Pearson and Statistics: The Social Origins of Scientific Innovation’, Bernard J. Norton
Social Studies of Science, 8 (1978): 3–34.
Published by SAGE Publications Ltd. Reprinted with permission.
82. ‘A History of Effect Size Indices’, Carl J. Huberty
Educational and Psychological Measurement, 62(2) (2002): 227–240.
Published by SAGE Publications, Inc. Reprinted with permission.
83. ‘The Role of Assessment in a Learning Culture’, Lorrie A. Shepard
Educational Researcher, 29(7) (2000): 4–14.
Published by SAGE Publications, Inc. Reprinted with permission.
84. ‘The Place of Theory in Educational Research’, Patrick Suppes
Educational Researcher, 3 (1974): 3–10.
Published by SAGE Publications, Inc. Reprinted with permission.
85. ‘Curriculum-based Measures: Development and Perspectives’, Stanley L. Deno
Assessment for Effective Intervention, 28(3–4) (2003): 3–12.
Published by SAGE Publications, Inc. Reprinted with permission.
86. ‘Tests as Research Instruments’, Robert L. Thorndike
Review of Educational Research, XXI(5) (1951): 450–462.
Published by SAGE Publications, Inc. Reprinted with permission.
87. ‘My Current Thoughts on Coefficient Alpha and Successor Procedures’, Lee J. Cronbach and Richard J. Shavelson
Educational and Psychological Measurement, 64(3) (2004): 391–418.
Published by SAGE Publications, Inc. Reprinted with permission.
88. ‘Handbook of Evaluation Research’, Lee Ross and Lee J. Cronbach
Educational Researcher, 5 (1976): 9–19.
Published by SAGE Publications, Inc. Reprinted with permission.
89. ‘A Model for Studying the Validity of Multiple-Choice Items’, Lee J. Cronbach and Jack C. Merwin
Educational and Psychological Measurement, 15 (1955): 337–352.
Published by SAGE Publications, Inc. Reprinted with permission.
90. ‘Assisted Assessment: A Taxonomy of Approaches and an Outline of Strengths and Weaknesses’, Joseph C. Campione
Journal of Learning Disabilities, 22(3) (1989): 151–165.
Published by SAGE Publications, Inc. Reprinted with permission.
91. ‘Standardized Testing’, Roger T. Lennon
NASSP Bulletin: National Association of Secondary-School Principals, 39 (1955): 34–40.
Published by SAGE Publications, Inc. Reprinted with permission.
92. ‘The Place of Statistics in Psychology’, Jum Nunnally
Educational and Psychological Measurement, XX(4) (1960): 641–650.
Published by SAGE Publications, Inc. Reprinted with permission.
93. ‘Education in Statistics and Research Design in School Psychology’, Steven G. Little, Howard B. Lee and Angeleque Akin-Little
School Psychology International, 24(4) (2003): 437–448.
Published by SAGE Publications Ltd. Reprinted with permission.
94. ‘The Role of Measurement Error in Familiar Statistics’, Malcolm James Ree and Thomas R. Carretta
Organizational Research Methods, 9(1) (2006): 99–112.
Published by SAGE Publications, Inc. Reprinted with permission.
95. ‘Qualitative Methods and the Development of Clinical Assessment Tools’, Jane F. Gilgun
Qualitative Health Research, 14(7) (2004): 1008–1019.
Published by SAGE Publications, Inc. Reprinted with permission.
Editor’s Introduction Neil J. Salkind
If you walk into almost any educational setting or institution, you will see a variety of activities taking place, ranging perhaps from an early intervention program for very young children to traditional classroom teaching, to reviewing students’ work via a distance learning activity, and more. In the most general sense, the focus of educational psychology is the scientific basis of what occurs in these, and many other, settings. Educational psychology is a broad combination of the study of many disciplines, which together have the goal of better understanding the processes through which change takes place in such settings and how scientists, teachers, researchers and practitioners (and these categories are surely mutually inclusive of one another) can help facilitate that change.

As a discipline, educational psychology might have its origins in John Dewey’s presidential address to the American Psychological Association in 1899, where he expressed concern about the need for developing a science that links theory in areas such as learning, cognitive processes and human development with the practical application of such work. He emphasized linking theory and practice – the essence of the educational psychologist’s universe.

With that in mind, this five-volume set of Sage Directions in Educational Psychology undertakes to familiarize the reader with important references from four areas of study:

• human development,
• curriculum, instruction and learning,
• motivation, and
• research design, measurement and statistics and evaluation.
Each of these topics contributes to a better understanding of what goes on when children and adults are participating in educational activities and in educational settings be it school, home or even work.
Within Sage Directions in Educational Psychology, each topic will be introduced and accompanied by a set of resources that are easily accessible through online library databases. These collections of articles are an overview of the important topics within each of the four areas and serve as an introduction to the critical issues that the field is facing. These citations alone cannot, of course, cover the entire field of educational psychology, but they provide an accurate overview of what the most important topics are and, in many cases, who the people are who are involved in the efforts to better understand how the educational process works.
The Study of Human Development: Understanding Change Over Time

A Definition of Development and Some Influential Factors

An understanding of the process of human development is essential to the educational psychologist. Development can be defined as a progressive series of changes that occur in a predictable pattern and as the result of interactions between biological and environmental factors. Anyone involved in the educational process well recognizes that the two sources of these factors (biology or nature and the environment or nurture) have to be taken into account, both in theory and in practice. Not only do the individual’s innate abilities need to be considered, but so, of course, do the environments in which these qualities, attributes and characteristics flourish (or suffer), are modified and reinvented. And while there have been endless discussions as to whether it is ‘nature’ or ‘nurture’ that controls the processes that result in developmental outcomes, it is generally accepted by most educational psychologists that both types of factors are present all the time. Their contribution is not additive, such as 60 percent heredity and 40 percent environmental factors (or any combination thereof), but rather interactive, where the results are multiplicative and both forces are always operating 100 percent of the time.
The Course of Human Development

As Buckholdt (1983) points out, among other viewpoints, there are three somewhat distinct ways of viewing the process of human development. The first focuses on individual traits such as memory, aggression, social behavior, and perception. These abilities, traits or characteristics are studied over time and often age groups are compared with one another. Such studies are usually conducted using research designs which fall in the general
categories of cross-sectional or longitudinal, although there are many alternatives recently developed that may better answer the questions at hand.

Another viewpoint is that offered by life-span psychologists such as K. Warner Schaie, in which the process of human development is viewed as a series of stages that have the following characteristics.

1. Stages occur in a defined and invariant order.
2. Each stage is based on the characteristics of the previous stage but is qualitatively distinct.
3. Stages cannot be skipped.
4. Stages result in structural changes that are characterized by a push and pull between equilibrium and disequilibrium.

This stage approach was made popular by the early and very influential biologist-turned-epistemologist-turned-developmentalist Jean Piaget and his thorough study of cognitive development, but was also suggested by the pediatrician and Yale Child Study Center founder, Arnold Gesell, who thought of age as a way to organize descriptions of behavior. Gesell’s work developed in parallel fashion with that of G. Stanley Hall (Meyer, 1971), a pioneer American psychologist who proposed a ‘genetic’ view of development, the word not meant as it is used today to represent the study of the transmission of traits, but more as a developmental theme signifying change over time.

A final viewpoint is where age is thought of as a correlate of development, rather than playing a causal role. This is where experience accounts for more of the variability than age in understanding differences in outcomes, and many educational psychologists view this as an age-irrelevant view of development, a phrase coined by Don Baer. The importance here for the educational psychologist is that much of any curriculum can be taught at any age, as long as the structural elements (that Gesell often mentioned) are present.
So in fact, adhering to this perspective, one would believe that higher level thinking skills can be taught to very young children if an environment is created that supports such learning.
Trends and Issues

Generic to our discussion here is also a set of six common trends and seven issues (discussion points) that characterize the process of human development as applied in an educational setting. The trends, and what they encompass, are as follows:

• Early experiences and events matter. For example, the introduction of harmful substances during a pregnancy or the involvement in early
stimulation programs have powerful impacts on later events. Crosnoe and Elder (2004) discuss pathways of human development and the impact that early experiences can have on later outcomes.
• Movement from global to discrete response systems. For example, a child’s early language is characterized by generalities (‘dog’ for any four-legged animal) whereas later in development, specific words are used for highly specific experiences (‘puppies’ for small and cute dogs).
• Increase in complexity. For example, in the most basic form, a simple collection of cells leads to a complex set of systems, or early crying representing hunger, pain or discomfort leads to more complex emotional responses to unpleasant events (such as specific types of cries). With change, students of all ages can better consolidate and integrate experiences and knowledge.
• Increasing integration and differentiation. For example, cognitive schemes or blueprints of the individual’s world (such as safety, peers, relationships) become more finely discrete and well-defined while also becoming integrated into a useful whole.
• Decrease in egocentrism. For example, as growth and development proceed, the individual becomes less focused on himself or herself and more focused on integrating personal characteristics and experiences with the surrounding culture.
• Development of social autonomy. As the individual grows, he or she becomes more socially autonomous and better able to function as an individual in increasingly complex environments.
The issues characterizing development, and the questions they pose, are as follows. Many of these are addressed in Elder and Rockwell (1978) and their discussion of a ‘life-course’ view of the developmental process mentioned earlier.

• The nature of development. Is development a function of environmental or biological factors or some combination of both?
• Important developmental processes. What are the relative roles of learning (behavior that changes as a function of experience) versus maturation (behavior that originates in biological changes)?
• What role does age play as a marker of development? Are developmental changes a function of age or simply an accompanying event or correlate?
• The rate of development. What role do sensitive or critical periods, such as adolescence, play in the developmental process?
• The shape of development. Is the process of development characterized by a gradual shift or by more abrupt and qualitative changes?
• The origin of individual differences. How do individual differences arise under the same, and different, circumstances?
• The study of development. What are the methods used to study development and how does the question being asked influence the method selected?
Theories of Development

These trends and issues are often the source of differences between different theoretical perspectives that have developed over the past 200 years (and often have their roots in philosophical themes). There are four separate families of traditional developmental theories which can guide educational psychologists in their understanding of how learning and change take place in educational settings.

1. Maturation and biological theories emphasize the sequences and content of development as determined by biological factors and the evolutionary history of the species. Arnold Gesell’s work and that of current-day evolutionary psychologists exemplify this approach. In recent years, the study of motor development as the foundation for developmental psychology (Thelen, 2000) has become more popular as well.
2. Psychodynamic theories contend that individuals are conflicted beings and that differences between them are the result of how these conflicts are resolved. The work of Sigmund Freud and Erik Erikson exemplifies this approach.
3. Behavioral theories focus on development being a function of the laws of learning and how the environment can have a pronounced impact on developmental outcomes. The learning theory of B. F. Skinner and social learning theory of Albert Bandura characterize this approach.
4. Cognitive-developmental theories see development as the result of structural change based on the individual’s active participation in the developmental process. The theoretical works of Jean Piaget, Lev Vygotsky, and Jerome Bruner characterize this approach. Also, the early work of John Dewey (Trow, 1951) echoes this reliance on ‘inner motivation’.

However, beyond these theories, educational psychologists and others have made significant strides in articulating specific positions regarding the developmental process.
For example, Super and Harkness (1986) discuss ‘the developmental niche’, focusing on the ‘interface’ between the child and culture. They contend that the study of development, and the theories that underlie the process, have focused too much on outcomes rather than on the process itself and the three components of this niche: the physical and social settings in which the child lives, the customs of child-rearing that surround the child, and the psychological approach of the caretakers.
Their approach is only one of many where different theoretical perspectives result in an understanding of the influence of the child’s surroundings on his or her development. This approach also characterizes a more applied side of educational psychology.
New Perspectives on Aging

Another significant trend over the past 100 years has been an increased interest in aging and development through the later years of adulthood. What was once known as old age (post 65 years) is now considered ‘the new 40’ or as the young-old or at least towards the end of what was once termed middle age. This is in part due to extended lifespans where people are living longer, but it is also due to changing demographics where the spike in births in the years following World War II resulted in a generation commonly known as baby boomers. Regardless of one’s theoretical perspective about the importance of aging, older is a relative term. This point is well made by the humanistic psychologist Carl Rogers (1980) in ‘Getting Old – or Older and Growing’ where he emphasizes that the personal changes he is experiencing in all aspects of his life are noticeable, but not as domineering as he expected (or as the culture at large expects when one ‘ages’). As cited by Rogers, perhaps Justice Oliver Wendell Holmes best captures the spirit of this, upon leaving a burlesque house at age 80: ‘Oh to be 70 again!’

Earlier, the importance of age was discussed, and how, as a variable, it correlates with many measures of performance but has relatively low explanatory power. Where it does correlate, Charness (2008) points out that it is a moderate predictor of performance on most laboratory tasks and on important life management skills such as driving, the use of computers, and success using training materials. He goes on to conclude that aging reduces adaptive capacity, so that achieving a successful environment-person fit requires extra effort and attention to the design of environments by planners.
Ever since the systematic study of adulthood (and aging) became popular beginning in the 1960s, several different approaches have been taken to explain the developmental process that extends beyond the traditional period of adolescence, once seen as the ‘last’ stage of development. Calvin Settlage and his colleagues (1988) take a very interesting approach in moving away from the traditional stage model and moving towards a model that emphasizes what they call developmental process. They define this as ‘the function and structure forming process that parallels and derives from developmental interaction’. That is, through the interaction of the individual with the environment, different structures serve certain functions. This view can explore such concepts as challenge, tension and conflict as ways in which these structures are formed.
Interestingly, the last 50 years of understanding development within a broader context of educational psychology has provided the tools needed to address issues facing adults within an educational context including everything from managing professional and family life, to differences in learning strategies, to distance learning and the use of technology.
The Importance of Early Experience

At least one major theme that has resulted from over 100 years of studying continuity and discontinuity in development, and later its application to educational endeavors, is the importance of early experience. While theorists have almost always explored the implications of this, it was not until the advent of early childhood interventions and educational programs (such as Head Start) that there was a corpus of data that could be used to evaluate the effectiveness of such efforts. The NICHD Early Child Care Research Network (2002) examined the effect of early childcare on children’s functioning at 4½ years. This very ambitious longitudinal study of over 1,000 children revealed that children’s later states were predictable from their participation in early childhood activities. Specifically, such factors as high quality child care predicted better outcomes on important markers such as academic performance and language skills. While educators traditionally would not have the resources or the infrastructure to begin the educational process so early, changing social practices and demands require the inclusion of such populations in the overall educational goals of any community. More children than ever are enrolled in early care programs, and it is early childhood educators who often apply the lessons learned from the research of developmental psychologists. Travis Thompson (1995), in his discussion of early language experience and brain development (in his review of Hart and Risley’s book Meaningful Differences in the Everyday Experiences of Young American Children), makes the same point regarding language development, a primary interest of those who study child development and those who are interested in facilitating educational growth.
Even over 100 years ago, Witmer (1909) reported how psychologists bemoaned the loss of early experience in the treatment of a truant child who ‘lost three years of the invaluable six or eight years of school life’. For educators, early experience, and the impact that its presence or absence has on the growing individual, will always be paramount.
An Increased Interest in Adolescence and Young Adulthood

Culture and society both play an important role in what becomes an important topic of focus for educational psychologists. Just as there has been
increased interest in early childhood by developmental and then educational psychologists, so the same pattern has evolved for the period known as adolescence and young adulthood, including the newest concept of emerging adulthood coined by Jeffrey Arnett from Clark University. The importance of understanding the developmental and educational needs of adolescents cannot be overestimated given that they are the heirs apparent to the culture in which they live, as well as the costs of inaction in terms of displaced young children at a time in their lives when structure and guidance are most important. William Kurtines and his colleagues (2008) address many of these concerns in their work on promoting positive youth development in a Miami, Florida-based project. They draw on the importance of understanding ‘developmental intervention science’ (DIS) within an outreach program for the development of strategies to promote change and healthy outcomes. The Miami Youth Development Project (YDP), started in 1988, emphasizes community-based interventions through the use of practical community-based research. While much of their work discusses the DIS model and its application, the results are aimed at working with the children in the community to avert the negative impacts of a rapidly growing multicultural community while preserving all the benefits – both challenges to school systems suddenly facing an influx of children from different cultures.

Additionally, there has been a great deal of attention paid to the role of maturational timing on psychological functioning, especially in adolescence, given the extent of biological change that takes place during this time of growth (Silbereisen et al., 1989). Interests such as these have, in turn, led to an increased interest in neuroscience made possible through the development of sophisticated imaging and other noninvasive experimental techniques.
Self-efficacy and Academic Success

In the past, educational psychologists have realized the importance of belief systems and how they can influence outcomes. In particular, there have been a sizeable number of studies of, and considerable interest in, how such beliefs held by students, parents and teachers can affect academic performance. This is clearly demonstrated by the results of studies by Barry Zimmerman and his colleagues (1992), where the causal role of students’ self-efficacy beliefs and academic goals was examined. They found that the quality and level of goals set by parents and the personal goals of students at the beginning of the semester predicted students’ final grades in social studies. Such research has important implications for education. It not only reflects the belief that individuals can, in part, mitigate negative circumstances by changing their belief systems (how well they can do in school, for example) but also lends credibility to social cognitive theories that
emphasize growth and learning to be a result of internal, as well as external, factors and influences. In sum, the study of human growth and development informs educational psychologists regarding the influence of certain factors (such as maturational processes and early experiences) and how these factors and their complex interactions can facilitate an understanding of educational participation, achievement and accomplishment.
Curriculum, Instruction and Learning: Sharing Knowledge and Skills

Curriculum, instruction and learning focus on the transmission of information and abilities from one part of a culture to another, be it from generation to generation, from teacher to student, or from peer to peer. This area may indeed be one of the most applied for educational psychologists in that the majority of this work focuses on the everyday concerns of how to teach better and more efficiently. As we use the terms here, curriculum is the content of a teaching episode, instruction is the activity that takes place on the part of the teacher or the transmitter of information and skills, and learning is the process through which changes in behavior occur. Note that the words ‘teacher’ and ‘learner’ are both used in the broadest of terms. Traditionally, we think of teachers as college educated and credentialed employees of an educational institution and of learners as their students. Rather, the broadest of educational contexts allows us to think of a teacher as anyone who transmits or shares any aspect of a culture and a student as anyone who receives such ‘knowledge’, ranging from rote memory to a family tradition.
The Nature of Curriculum

An important point in any discussion of curriculum is that it no longer follows past perceptions of public school experience of reading, writing and arithmetic. Rather, the study and design of curriculum as it matches the learner’s abilities and the program’s goals, has become a discipline in and of itself in its application to almost any area that needs to be taught or shared. But at its most basic (and perhaps most important) level, curriculum is about learning and how the content can be designed to facilitate such. One excellent example of how this transmission of information might take place is the study of cultural apprenticeship by Patricia Greenfield and her colleagues (2000). Over a 24-year period, they studied an ‘ecocultural’ transition from a society based on agricultural activities to one based on commerce and in doing so, looked at the impact of this change on two generations of children.
They found that within the space of a single generation, the primary skill taught, weaving, made the transition from an interdependent to an independent style of learning and that the work itself became more abstract. The results of this study show how transitions take place and how the curriculum of learning (in the broadest sense) often reflects the cultural influences that surround the learner. They also very effectively show that, as with many other traits and characteristics of humans and the groups they live in, change and adaptation are not a one-way street. Rather, curriculum design is characterized by reciprocal effects, where the conditions in a society (say agricultural or commercial) exert a profound effect on what and how newer generations will learn, while in turn, later generations are learning new things and developing new skill profiles that fit their new environments. Greenfield’s conclusion that ‘processes of cultural learning and cultural transmission change as culture changes over time’ is incredibly important as literate societies face the impact of accommodating (through newly designed curriculum) new cultures due to reform in immigration and other important social and political policies.
The Nature of Instruction and Teaching

Few of the topics in the study of educational psychology stand alone. For example, understanding the child or adult’s pathways of development, including their history and current circumstances, can certainly have an impact on our understanding of their motives to learn. This is certainly the case when it comes to teaching, be it the recurring themes in Homer’s Odyssey or the rationale behind the scientific method. It is therefore difficult (and some would say, impossible) to discuss the act of instructing or teaching without paying heed to the intricacies of the learner and what impact such instruction might have on the individual. Indeed, Shuell (1990) posited that the learner proceeds through a series of phases where the process used to learn, and the variables that might affect the final outcome, vary greatly. Shuell used the word ‘phase’ rather than stage since he felt that stage was too tied to a fixed time period. He generally identified an initial phase, an intermediate phase and a terminal phase in his model, where a great deal of emphasis is placed on understanding the transitions between them. Once that understanding is achieved, the teacher’s teaching activities can focus on the various factors affecting learning at different points during the teaching process. The challenge to the teacher is two-fold. First, of course, is to identify what phase the individual learner might be in and tailor the curriculum to fit the developmental and cognitive needs of the learner. Second is to
address the complex and difficult topic of understanding the transitions between phases (incidentally, the same issue that developmental psychologists have always faced in dealing with stages). Over the last 50 years, the effort has been towards development of a psychology of instruction (Glaser, 1976) – where there is an ongoing effort to link the science of learning and the educational application of that knowledge. Glaser’s classic paper, ‘Components of a Psychology of Instruction: Toward a Science of Design’, is must reading for any student of educational psychology since it illuminates how basic and applied sciences can work together to produce effective teaching (and learning) outcomes. He also discusses four elements of any teaching environment that are essential to success. The first is the development of competence of the students, which reflects their cognitive and intellectual skills. The second is a description of the initial state from where instruction begins. The third is the identification of the conditions that can facilitate change from a relatively naïve learner to a competent one, and the last is the assessment of the learning to evaluate whether the strategies applied have short- and long-term consequences of any substance.

If indeed there is a reciprocity between the methods used to teach and the content being taught, advances in technology open up new worlds of possibilities. Distance learning, for example, could not take place without the advent and intense use of the Internet and the tools it has made available to teachers and students, including everything from the simplest of web browsers to the use of social networking. Yet, in spite of the advances in technology, the inclusion of it as a part of curriculum does not necessarily ensure successful teaching or learning. Rather, some of the most basic elements seem to continue to be important, almost beyond that of the tools used to convey knowledge.
For example, Upvall, Decker and Eilerson (2000) examined the quality of distance learning in a cohort of nurses using two-way instructional television. They found that the three most important categories of elements, broadly defined, which emerged from the model they used were learning (including recognition of others, increasing independence and increased creativity), teaching (including feedback and the encouragement of interactions) and support for both of these activities (including emotional and financial support and scheduling). It was revealed through both questionnaires and focus groups that the importance of these elements and their presence led to a positive experience for the students enrolled.
The Nature of Learning
Learning is most basically defined as a measurable change in behavior and, through the history of understanding how humans learn, there has been
considerable emphasis on the nature of the underlying process that accounts for this change. Early theories such as those offered by Edward Thorndike focused on the creation of associations between events and how they might be strengthened over time with repetition. This idea eventually grew to represent what we know today as classical and operant conditioning, two strong foundations of the behavioral views of learning. The principles underlying these types of learning were species, age, and domain specific and independent of any context (Brown, 1994). The most important implication of this for the educational psychologist is that the human (or animal, as was often the case) is not the active member in the learning setting, but rather is acted on. The human is also not responsible for not learning – rather, it is the design of the environment that accounts for success or failure.
While classical conditioning is basically characterized by uncontrollable autonomic nervous system responses (such as knee-jerk reflexes), the operant conditioning paradigm offered by B. F. Skinner and other behaviorists during the 1950s expanded the approach towards understanding the best way to learn. Operant conditioning is a very specific type of learning that posits that the likelihood of events occurring is a function of what follows them. Most simply, behavior is a function of its consequences. If the frequency of a desired behavior increases, it must be the result of what follows that behavior. For example, if a teacher wants children to arrive at school on time, the children should be rewarded for such behavior. Likewise, if unnecessary hand raising has become a problem in a teaching setting, being selective as to who is called on, why and how often may help to decrease the frequency of that behavior.
What followed this period of engagement with such mechanistic theories as classical and operant conditioning was the introduction of cognitive components (hence the revolution that characterized social learning theory) and also the cognitive evolution in understanding learning processes. What the cognitive revolution introduced during the 1960s and 1970s was the notion of the individual as an active learner – or, as Brown wrote, an active constructor – who is not simply a receiver of information but one who constructs his or her own learning experiences to match (not just meet) the demands of the environment. Learning was now thought of as an active rather than a passive endeavor, and classroom practices, curriculum materials and teaching methods reflected that.
Efforts at better understanding the process of learning in a general cultural context have also reflected some of the theoretical positions we have talked about in other parts of this introduction. For example, Albert Bandura (2004) discusses the use of a social cognitive model to promote positive health outcomes. His particular model (in this specific application to health promotion)
focuses on the individual’s self-efficacy to control ‘how psychosocial influences affect health functioning’. Holt writes, in his 1964 review (a date right in the middle of this revolution) of several books that have strong cognitive themes, how the cognitive approach emphasizes concepts that one may ordinarily think are outside those of ‘simple’ school learning such as perceiving, judging, forming concepts, imagining, and solving problems. As is clear, the path from simple associations (characterizing a behavioral view) to creative behavior (characterizing a cognitive view) can be seen as a long and indirect, but very rich, one.
And while the process of learning certainly deserves adequate coverage, a less well defined but equally important piece in the puzzle of what makes learning work is what a learner should be and how a culture of learning can be cultivated (Bruner, 1985). But, as with most disciplines involving the social and behavioral sciences, notions of how to teach which curriculum run in cycles, with trends coming and going as the Zeitgeist changes. One perspective that has not changed over time and has been resilient to calls for the newest fad is that of the knowledge and skills a teacher must have to be successful. More than 60 years ago, William Trow (1951) echoed John Dewey’s beliefs that teachers cannot know ‘what to teach, how to teach or when to teach’ unless they know who they are teaching, and from what homes, groups and cultures their students come. Interestingly, if this knowledge becomes part of curriculum and development philosophies, Trow believes that learners will not be viewed as being free and capricious, but rather as part of a ‘dynamic’ and lawful system that effectively interacts with the environment. And if this were not enough of a challenge, Dewey also felt that the teaching experience is not complete until the teacher also can ‘identify the student’s current needs against the future needs of a dynamic society’.
Quite an assignment.
Motivation: Why We Do What We Do
No educational endeavor, be it teaching a child to swim or providing a foundation in molecular biology for a future physician, can happen without the learner being sufficiently motivated to master the task at hand. Indeed, motivation describes the process through which goal-oriented behavior becomes activated. It should come as no surprise to the student of educational psychology that this concept is critical to understanding why people do what they do and how educators can be informed by science to motivate them (from ‘movere’, the Latin for ‘to move’ and the root of the word ‘motivation’) towards useful and attainable goals.
Theories of Motivation: Driving Forces and Fulfilling Needs
As with any approach to better understanding human behavior and, in our case, the entirety of the educational process, theories abound that drive empirical research that in turn drives educational policy. Motivation is no exception.
From Internal to External
From the earliest days of understanding what motivates humans to act, there has been general agreement that motivation takes one of two forms (Srinivasin, 2008) with a simple distinction between the two. Intrinsic motivation explains behavior that is done for its own sake and not for other obvious rewards. Extrinsic motivation explains behavior that is done for a more obvious reward, which usually has its origins external to the individual. For example, learning for its own sake would be motivated intrinsically and while there may be a reward (satisfaction and mastery), those rewards are often not at all obvious. A more sophisticated account of intrinsic motivation is self-determination theory, which posits that individual behavior is self-motivated and self-determined. The humanists bring to the table the notion that humans are preprogrammed with an inherent need to do work that is meaningful. On the other hand, learning a new set of skills to get an upgrade at work and an increase in salary would be motivated extrinsically.
The intrinsic/extrinsic distinction is a relatively simple one that is somewhat superficial on a first reading. However, researchers have discussed other levels of intrinsic motivation such as meaningfulness (Chalofsky and Krishna, 2009). Although most of the literature they review has to do with human resources and employment, their distinction between the meaning of work and meaning at work is useful for those who study educational processes including achievement and especially vocational counseling (a sub-area of counseling psychology which in turn is often a sub-area of educational psychology). One works towards accomplishing a certain goal, but within this model, it is the work itself that provides the individual satisfaction and it is the process and action involved in the work that is motivating.
Jean Piaget was famously quoted as saying, ‘A child’s work is play’ and so it may very well be intrinsically motivating for the student (and once again across any subject and of any age) to find that learning is a form of play as well and, in and of itself, motivating. The importance of extrinsic motivation being recognized, there is a growing body of literature and commentary that challenges the idea that rewards
(as extrinsic motivators) decrease intrinsic motivation (Cameron and Pierce, 1996). This is a huge issue for educators since much of the scientific literature has contended (and some question whether these contentions are based on fact or bias) that as motivation becomes extrinsic, the task at hand is less enjoyed. In other words, rewards and reinforcers do not work to further motivation, but instead may dampen it. Cameron and Pierce (1994) conducted a meta-analysis (where the results of many studies are combined) and concluded that in all but one of 100 empirical studies, this position is not supported and that external motivation can have a bearing on the enjoyment and pleasure associated with undertaking a particular task. In educational settings where external motivators are often easily provided, knowing that such activity on the part of the teacher helps the student to be more highly motivated can be a great step forward in reaching both the students’ and the teacher’s goals.
Maslow’s Hierarchy of Needs
Some of the earliest thinking regarding motivation was done by Abraham Maslow in the early 1940s (Cullen, 1997) as he posited what he called a ‘needs hierarchy’. Even though discussions about this idea are well into their seventh decade, this may be the most widely recognized of all theories of motivation given its ubiquitous place among textbooks and as the focus of scientific articles. Perhaps most interesting about the development of Maslow’s theory is that he was the first doctoral student of Harry Harlow, the Wisconsin-based psychologist whose early work on attachment theory, tested on mother-child monkey dyads, framed the continuing discussion about the role and importance of early experiences in development.
The needs hierarchy posits a set of needs that all organisms have, hence the democratic nature of the theory, which also reflected Maslow’s humanistic approach toward understanding why people do what they do. Specifically, this set of needs progresses as a hierarchy with the most basic needs at the base of a triangle, rising to the most advanced at the top. These sets of needs (all of which are qualitatively distinct yet grounded in previous levels) are as follows:
Level                                 Characteristics and Behaviors
Self-actualization (more advanced)    Individual potentialities reached and realized
Esteem                                Self-esteem and respect of community and family members
Love and Belonging                    Social support and family and group identity
Safety                                Safety of self and family and general well being
Physiological (more basic)            Provision of basic needs such as food and shelter
Each of these needs is a prerequisite to the next higher-level need. This means that early needs must be met before individuals can proceed through higher and more advanced needs. Quite interestingly, Maslow, for the humanist that he was, started his early career by studying animals in a quest to develop insights into ‘general humanness’. He believed that such study would provide insights into the behavior of humans’ earliest ancestors and how and what behaviors have evolved to help ensure human success and survival as a species. He also believed that a better understanding of animal behavior (especially primates) provides us with an insight into behaviors that present-day humans share, independent of the culture in which they are shared – almost a quest for universals. Indeed, his hierarchy of needs shown earlier in this introduction is characterized by its universal nature in that all humans have this set of needs, mostly expressed in the same way.
But there has also been a significant amount of criticism directed at Maslow’s theory of needs satisfaction or self-actualization. Neher (1991) writes that contrary to Maslow’s assumptions, some basic needs might be present at birth and innate in nature, but for the most part, the higher needs, such as those for self-esteem, are driven by cultural experiences. Said another way, the influence of the environment is given more importance than Maslow’s ideas provide. And the hierarchy, where one level invariably follows another, really is not as structured as one might believe. In fact, Neher points to certain societies characterized by members who may go hungry and be in physical peril but nonetheless have strong family ties and high self-esteem. There have also been extended empirical tests of his theory, such as the one by Mathes and Edwards (1978), who found that five levels might be too ambitious, suggesting that three would be a better number.
The implications of Maslow’s theory, even given some reservations, for the educational psychologist should be clear. No matter the effectiveness of the teaching methods or the relevance of the curriculum, changes in behavior or learning are unlikely to take place unless basic needs are met. For the policy maker, children coming to school hungry or adults outside of meaningful personal relationships make unsuccessful students.
The Evolution of Motivation
Over the past two decades there has been an increasing interest in the role that evolution plays in the development of different types of human behaviors. What was once the province of ethologists and sociobiologists is now the center of work by evolutionary psychologists. And, to no one’s surprise, several in this new wave of psychologists have focused on how motivation has evolved (Heckhausen, 2000).
After all, any organism must be sufficiently motivated to satisfy essential needs and, as with other critical behaviors, these strategies have changed through the process of natural selection as the human species has evolved. If we can better understand the mechanism through which motivation is activated (when and what goals humans seek) and deactivated (when and what goals humans do not seek), a much clearer and more ambitious picture of what motivates learners, for example, could result.
There are three underlying (and somewhat overlapping) innovations in all of psychology which led to this interest in an evolutionary basis. The first was that basic instincts guide behavior. When people are hungry, they seek out food. When they are aware of what needs to be done to accomplish a goal, they devise a strategy to achieve it. Second, much in the tradition of psychodynamic theory, behavior (both physical and cognitive) seeks to reduce conflict through the satisfaction of both physical and psychological instincts. Finally, and most relevant to our discussion, when environmental conditions change (as they do over time) humans can adjust instinctual behavior or, at least, how instincts are satisfied. Where humans once fought for food, the orderly and now ‘civilized’ way of satisfying the instincts that accompany hunger is to go shopping at the market or at least figuratively ‘battle’ for the best job and procure the best resources. Plundering and ravaging are out; neighborhoods and play dates are in.
What is so critical about understanding motivation from an evolutionary psychology theoretical perspective for the educational psychologist? Clearly evolutionary psychology does not invent behaviors, but rather brings to the table those already created through the process of evolution and allows us to better understand how those behaviors originated.
Should educational psychologists be better able to understand the origins of motivational forces, they should also be better able to harness those forces and more effectively use them to reach for more satisfying and complete educational outcomes.
Motivation Moves Forward
By any measure, the topic of motivation and its study have permeated far beyond the walls of the laboratory and even beyond that of the classroom’s typical charges. Today, principles of motivation developed by educational and other psychologists have found their way into the business world (first suggested by Maslow very early on) as well as into classrooms for exceptional students. For example, Ann Robinson (1996) points out how the leading scholars in the world of the gifted found ‘falling in love with an idea’ to be a primary motive for achievement. Here, motivation becomes an emotional component of the individual’s behavior rather than simply a goal-driven behavior.
Interestingly, while many other topics in the area of educational psychology have their origins in concerns about educating children, the study of motivation was never far away from both younger and older humans. With an increasing emphasis on distance learning and millions of adults continuing, or returning to, school, there is even more reason to look at specific models of motivation that have implications for adults. One such model, proposed by Kenneth Howard (1989), is referred to as an expectancy motivation model, which has its basis in the social learning theory that we mentioned earlier. The model describes expectancy motivation as part of a dynamic process which includes a host of related variables such as past experiences, level of motivation, effort, performance, rewards and level of need satisfaction.
Expectancy theory sees humans as purposeful beings who interact in a proactive way with the environment based on the perceived likelihood that their efforts will result in a positive outcome or an outcome that they value. If they are successful, one could say they were highly motivated. This theory has its origins in the classic theories of social psychologist Kurt Lewin and experimental psychologist Edward Tolman during the 1930s. Later work by Julian Rotter led to increased refinement of the expectancy theory and it was not until the 1990s that research turned towards understanding motivation from this perspective in terms of what is known as the valence-instrumentality-expectancy (or VIE) theory.
VIE theory makes three assumptions about behavior and notes how related these can be to adult achievement. First, anticipation of rewards increases an individual’s motivation. An example is knowing that good performance in class results in a higher grade. Second, perceived value of various outcomes provides direction for behaviors. For example, attaching values to outcomes helps the individual distinguish those potentially motivating behaviors from those that are not.
And, third, connections between a behavior and the expectancy that behavior will work (or not) become stronger over time and as they are exercised. For the older individual (that is, the adult in an educational context) these three elements can effectively help predict specific outcomes but, more importantly, provide a framework as to how these adults might be motivated to reach their goals.
One of the latest incarnations of VIE is known as the expectancy motivation model. The variables that are most important and the sequence in which they occur are:
Effort → Performance → Reward → Need Satisfaction
It is not very difficult to speculate how such a model might work. The link between effort and performance is called E→P expectancy and is defined as an individual’s perception that efforts will lead to successful performance, P→R expectancy is the individual’s perception that performance will be rewarded, and
R→N expectancy is the likelihood that the rewards will satisfy a need. And, of course, this model is cyclical in that need satisfaction leads to new, more advanced needs and additional efforts to fulfill them. Howard believes that the most important implication for adult education and training is that adults need highly structured environments and specific learning objectives, allowing them to make informed choices regarding the likelihood of success. Simply put by Howard, ‘Learners that believe the learning goals are achievable and will result in personal rewards that met their individual needs will be more motivated than those who do not.’
Research Design, Measurement and Statistics, and Evaluation: Keeping Track – Tools and Methods
The educational psychologist strives to capture a body of knowledge that can inform educators as to how best to maximize the effectiveness of the educational experience. Given that these professionals have knowledge of human development, curriculum and its design, and theories of motivation and their application, the next step is accountability. The disciplines of research design (how to ask and answer questions), measurement and statistics (how to assess outcomes and make sense of data) and evaluation (the science of determining the value or worth of outcomes) provide the tools to help understand the knowledge accrued through the study of the first three areas we introduced earlier.
Research Design: Answering Questions through Experiments and Inquiries
Research design focuses on the way in which educational psychologists propose to ask and answer the questions they deem important. These designs fall into two very general categories: qualitative and quantitative methods.
Qualitative Research Designs
Qualitative research designs focus on investigations that examine the how and why of different phenomena. They are distinctly different from other methods in that they often focus on individual entities, be they individual people or individual institutions. For example, one such qualitative method would be the case study, where investigators examine the characteristics of a school, collecting data that are consistent with the questions that are being asked. An in-depth study of the nature and changing mission of a charter
school (‘How did this school receive its charter and how has the school changed since?’) might be one such example. In turn, a common strategy in qualitative research is to use the answers to initial questions as the basis for later and more refined questions, until the researchers believe they have the initial inquiry answered. Sources of data for qualitative inquiries can include such diverse origins as archival records, physical artifacts, direct observation, participant observation and focus groups.
The qualitative approach and its methods as applied to the social sciences are relatively new, but cogent arguments have been made as to how they might work in tandem with more traditional quantitative methods. Gilgun (2004) effectively argues that given the pressure that practitioners face to demonstrate the effectiveness of a particular treatment and to do so in a timely manner, the wants and preferences of patients are critical to incorporate when it comes to evaluating an outcome. While traditional standardized tools are very valuable, this review shows how qualitative tools add additional dimensions (and information) such that the different methods can inform one another, and that the evaluation of clinical outcomes using standardized methods is incomplete without a qualitative component.
Quantitative Research Methods
Quantitative research designs can be broken down into three general categories: pre-experimental, quasi-experimental and experimental.
Pre-experimental designs are those that do not include a control group, an element that is critical for understanding the effects of a treatment. Pre-experimental designs also do not have any random assignment of treatments to groups, thereby hindering the generalizability of any findings from a sample to a population (a hallmark of inferential statistics). For example, a pre-experimental design would be one where a group of older adults is taught to improve their balance. Because there is no control group (other seniors who do not receive the treatment), positive or negative outcomes could be due to many other factors. In addition, since these seniors were not selected at random from the larger population of seniors, there is little credibility as to how effectively the results can be generalized to the larger population of seniors.
Quasi-experimental designs can involve a control group but also involve participants who are members of preassigned groups. For example, if one were to examine the effectiveness of the balance training on groups of both men and women, this would be quasi-experimental in nature because these participants come to the experiment already assigned to a group. The same would be the case if political party, age, grade, neighborhood of residence or preexisting medical conditions were factors being studied. While one may have
confidence in differences between groups within such factors (such as gender), the level of confidence is somewhat limited because it is unclear what these participants bring to the experiment from before it began and what effect those prior experiences might have on ultimate outcomes.
The purest form of research design is experimental, where participants are randomly selected from a population and assigned to groups at random, and a control group is present. For example, if one were to compare seniors who receive balance training with those who do not and the individuals in both groups were selected from a large population of seniors who are relatively similar to one another (in age, income, and previous experience, for example), that would be an experimental design.
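The two random steps that define an experimental design, random selection from a population and random assignment to groups, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_and_assign`, the population of 200 numbered seniors, the sample size of 40 and the seed are all hypothetical and do not come from any study discussed here.

```python
import random


def select_and_assign(population, sample_size, seed=None):
    """Randomly select participants, then randomly assign them to two groups."""
    rng = random.Random(seed)
    # Random selection: each member of the population has an equal chance
    # of being drawn into the sample.
    sample = rng.sample(population, sample_size)
    # Random assignment: shuffle the sample and split it in half, so that
    # chance alone decides who receives the treatment.
    rng.shuffle(sample)
    half = sample_size // 2
    return {"treatment": sample[:half], "control": sample[half:]}


# Hypothetical population of 200 seniors, identified only by a label.
seniors = [f"senior_{i:03d}" for i in range(200)]
groups = select_and_assign(seniors, sample_size=40, seed=1)
print(len(groups["treatment"]), len(groups["control"]))
```

The treatment group would receive the balance training and the control group would not; because membership in each group is left to chance, prior experiences should be spread roughly evenly across the two.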
Evaluating Experimental Methods
The classic work of Campbell and Stanley defined two sets of criteria for the evaluation of experimental designs: internal and external validity. Internal validity is the quality of an experiment such that what was manipulated can be clearly shown to have an impact on the outcome of interest. For example, if one can clearly show that no other factors are responsible for changes in balance in a group of seniors, one could say that the experiment has internal validity. An example of a threat to internal validity could be exposure to exercises outside the experiment that would provide additional balance exercise for some participants, but not all. External validity is the quality of an experiment where the results are generalizable to other participants or settings similar to the original. For example, if one can generalize the results of the balance experiment from a sample of seniors to a larger population, one could say that the experiment had external validity. However, a threat to the external validity of this experiment would arise if the seniors selected for the experiment are not representative of those to whom the researchers would like to generalize. In almost all cases, threats to internal and external validity can be addressed through the use of a control group included in the original design of the experiment.
Measurement and Statistics: Beyond Numbers
Types of Statistics
Statistics are tools used to make sense of small and large sets of data and generally fall into two categories.
Descriptive statistics describe outcomes such as when the average level of income is reported for families within a certain time frame or the range of scores on a history test. Examples of measures of central tendency are the mean, median, and mode, and examples of measures of dispersion or spread are the standard deviation and the variance. Taken together, these descriptive measures of central tendency and dispersion can describe any collection or distribution of data.
Inferential statistics are quite a bit different. They are the tools that educational psychologists (and other scientists) use to infer from a small sample of observations to a much larger population of observations. For example, if it is observed that an intervention program helps a small group of adolescents stop smoking, one might be interested in seeing whether this finding is generalizable to a larger population of the same type of participants. There are hundreds of inferential statistical tests with names such as the t-test between independent means, chi-square and multivariate analysis of variance. What they all do is test a hypothesis and allow one to conclude whether one can accurately infer from those findings to a population in general and whether those findings are significant.
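As a small illustration of the descriptive measures named above, the following sketch uses Python's standard `statistics` module. The eight history test scores are invented for the example, and the variance and standard deviation shown are the sample versions (dividing by n − 1), which is one common convention rather than the only one.

```python
import statistics

# Hypothetical scores on a history test (invented for illustration).
scores = [72, 85, 85, 90, 64, 78, 85, 91]

summary = {
    # Measures of central tendency
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    # Measures of dispersion or spread
    "range": max(scores) - min(scores),
    "variance": statistics.variance(scores),  # sample variance (n - 1)
    "std_dev": statistics.stdev(scores),      # square root of the variance
}

for name, value in summary.items():
    print(f"{name}: {value}")
```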
The Importance of Significance
It is difficult to scan any journal in the field of educational psychology and not find a discussion about significance levels and their importance. Statistical significance is a central theme to all scientific research and that certainly is the case in educational psychology as well. There is also an important distinction between significance and meaningfulness, but that distinction will be left for a later part of this introduction. Statistical significance concerns the probability assigned to an outcome, indicating that it is a ‘true’ outcome beyond a certain level of doubt. For example, a researcher is examining the difference between two groups of adolescents in their distance learning success; one group participates in the program and the other does not. As a measure of success, the lead researcher tests all the adolescents on what they have learned. Assuming that the two groups are equal at the start of the experiment (she can hypothesize some other outcome but has to assume equality at the beginning since she is being as objective as possible), if there is a difference between the groups at the end (given that all other relevant factors are controlled) she will attribute the difference to the presence of the learning program. All well and good, but there is always the chance that she is wrong in her conclusions. The degree of risk that she is willing to take that she is wrong (and conventionally that risk is set at .01 or 1% or .05 or 5%) is what is known as statistical significance. Technically, it is the likelihood of rejecting the hypothesis that there is no difference between groups (in this case) when in fact there is none.
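One way to see how an observed difference is weighed against a chosen level of risk is a permutation test, sketched below. This is not the procedure used in any study cited here; it is simply a technique that needs nothing beyond the Python standard library. The test scores, the function name `permutation_p_value`, the number of permutations and the seed are all assumptions made for illustration.

```python
import random


def mean_difference(values, n_first):
    """Absolute difference between the mean of the first n_first values and the rest."""
    first, rest = values[:n_first], values[n_first:]
    return abs(sum(first) / len(first) - sum(rest) / len(rest))


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate how often chance alone yields a difference as large as the observed one."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = mean_difference(pooled, n_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the assumption of no difference, group labels are arbitrary,
        # so reshuffling them shows what chance differences look like.
        rng.shuffle(pooled)
        if mean_difference(pooled, n_a) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


# Hypothetical test scores for the two groups of adolescents.
program = [78, 85, 90, 88, 92, 81, 87, 95]
no_program = [70, 75, 80, 72, 78, 74, 69, 77]

alpha = 0.05  # the risk of being wrong that the researcher is willing to accept
p = permutation_p_value(program, no_program)
print("significant" if p < alpha else "not significant")
```

The value `alpha` plays the role of the conventional .05 level: if the estimated probability falls below it, the researcher rejects the assumption of no difference while accepting that this decision will occasionally be wrong.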
While ‘p’ for probability values is very useful, its objectivity is often questioned (Hubbard, 2004; Hubbard and Lindsay, 2008). With increasing frequency when examining the results of statistical analysis, these statements of probability are weighed against other evidence including subjective impressions and as you shall read next, meaningfulness.
Significance versus Meaningfulness Especially for beginning students of statistics, no issue may need more attention than whether findings are meaningful in addition to being statistically significant, and there are two strategies one may take in addressing this topic. The first is of a substantive nature. For example, let us say that a wide-scale experiment has been conducted in which it has been shown that test scores in reading can be increased 7 points over a year’s time using a new instructional program. This increase of 7 points is statistically significant at the .05 level. Once again, this means that the likelihood of an error (concluding that the scores of children who participated in the reading program differ from the scores of those who did not, when in fact they do not) is quite low (less than 5%). Now, the question that the administrator has to address is whether that 7-point difference is worth it. How much did the program cost? What does a 7-point advantage mean? Do children in other settings where the same program was used seem to show long-lasting effects of this boost in reading score? What about unintended consequences? Do the children who participate do better in other subjects? Are parents and caretakers more involved? It should be clear by now that using statistical significance, in and of itself, as a measure of success is inadequate beyond the concerns of basic science. The implications and meaning of that significance have to be explored as well. The second strategy is to examine effect size. Effect size (or ES) is a relatively new idea, formally presented by Jacob Cohen in the late 1960s (Cohen, 1969). Basically, effect size is a measure of how different two groups are from one another. 
It is not just a measure of differences between scores, however, but a measure of the magnitude of the difference – very handy when you have to determine if a treatment indeed had an impact, or was meaningful, beyond simple statistical significance. Effect size is easily calculated and in the case of group comparisons, compares the difference between groups as a function of the amount of variability within them. The larger the effect size, the larger the ‘true’ group difference relative to the amount of variability or the larger the ‘true’ effect of the treatment. And, the larger the effect size, the better.
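For a two-group comparison, one widely used effect size is Cohen's d: the difference between the group means divided by the pooled within-group standard deviation. The sketch below assumes that particular index (the text does not name one), and the function name and scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Hypothetical reading scores (invented for illustration only).
reading_program = [2, 4, 6]
comparison = [1, 3, 5]
d = cohens_d(reading_program, comparison)
print(f"Cohen's d = {d:.2f}")
```

Cohen offered rough benchmarks of 0.2, 0.5 and 0.8 for small, medium and large effects, but these are conventions rather than rules, and the substantive questions raised above still apply.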
Measuring Outcomes: ‘If You Can, Measure It’ It is not exactly clear who first said, ‘If you can, measure it’, but it may very well have been S. S. Stevens, the famous experimental psychologist who came up with a model of levels of measurement (nominal, ordinal, interval and ratio) that progresses from the least precise (the nominal level, where things are named) to the most precise (the ratio level, where there is a true zero present). Because there are increasing demands for assessing outcomes (be they job performance or college entrance exams), the world of measurement has expanded widely over the past 50 years.
The Basics There are many different topics in measurement, but the most important to know about as one reads a collection of papers such as those contained here is the distinction between reliability and validity and their application. Reliability is the quality of an assessment tool, be it a classroom test or a shopping mall questionnaire, that ensures the test does what it does consistently. Reliability can be assessed over time, between different forms of the same test, and even through estimates of the internal qualities of a test (do all the questions reliably ask about the same topic or sample the same personality disorder, for example). Each of these different types of reliability is sought out depending upon the purpose of the test. Validity is the quality of an assessment tool that indicates a test does what it is supposed to do. For example, if one is designing a test of world history, then one would assume it is valid if the items on the test reflect the content covered in the general universe of what is known about world history. As with reliability, there are several different types of validity, ranging from face validity (where, on the surface, the test does what it should, such as the world history test) to much more complex forms such as construct validity, where one focuses on how well an instrument directly reflects the theory on which it is based. The relationship between reliability and validity is as follows. One can have a test that is reliable but not valid, but one cannot have a test that is valid without being reliable. Quite simply, a test must be able to do what it does repeatedly, time after time, before it can be valid. ‘What is the currency of Great Britain?’ is certainly a reliable item (since it can be replicated time after time), but if one calls it a one-time test of basic math, it is wide of the mark (and not at all valid). 
Within the set of papers contained in these volumes, almost every empirical study will use some measure to assess outcomes. One should look carefully at the reliability and validity data reported in such studies for
evidence of care and authenticity in collecting the data and the conclusions that are drawn.
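When reading the reliability data reported in a study, it can help to see how one such figure is produced. The sketch below computes Cronbach's alpha, a commonly reported estimate of the internal consistency mentioned above (whether all the questions hang together). The choice of this statistic, the function name and the sample responses are assumptions made for illustration; the text does not single out any one index.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Internal-consistency estimate for a set of test items.

    `responses` holds one row per respondent and one column per item.
    """
    k = len(responses[0])              # number of items
    items = list(zip(*responses))      # regroup the scores by item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical responses from three people to a three-item test in which
# every item ranks the respondents identically.
consistent = [
    [1, 1, 1],
    [2, 2, 2],
    [3, 3, 3],
]
alpha_value = cronbach_alpha(consistent)
print(f"alpha = {alpha_value:.2f}")
```

A high value says only that the items behave alike; as the currency example shows, it says nothing about whether the test measures what it claims to measure.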
Keeping Score: The New Sciences of Assessment and Evaluation Assessment and evaluation are the sciences of determining the worth of a particular outcome according to a set of criteria. For example, if one were to evaluate the efficiency of an early intervention program or a semester-at-sea educational experience, one would define the criteria beforehand (acquisition of language skills or development of international awareness) to be evaluated during and after the educational experience. In the past 50 years, assessment and evaluation have assumed a huge importance where individuals and institutions are having to be more accountable for their actions and the outcomes of their research, experiments and demonstration projects. In fact, the first substantive handbook regarding anything to do with evaluation was not published until 1975 – reflecting disciplines still in their infancy. Assessment and evaluation are also the sciences that reflect the increasing globalization of the educational community which also encouraged the development of strategies for improving approaches to difficult educational issues across countries and cultures. This is well demonstrated by Mislevy (1995) in his paper on inference in international assessment. He asks the important question regarding what can be learned from international assessments and how might that knowledge bear on what educators believe is important to learn? He concludes that spending resources on international assessments provides invaluable information from different perspectives that can form the basis for educational policy. Throughout this discussion, one should keep in mind that there are many different roles that assessment and evaluation play in an educational context, but perhaps the most important is how well these tools facilitate learning. There are many different views of how this might happen and Lorrie Shepard (2000) presents one of the most interesting. 
She does not contend that assessment should focus alone on the assignment of grades or even the evaluation of outcomes to satisfy external requirements or demands. Rather, she sees assessment as part of an instructional strategy that enhances learning. Where instruction and assessment were once seen as being ‘curiously separate’, she contends that the traditional separation of the two has hindered the integration of these two processes which when working together, reveals new possibilities for instructional design and evaluation. Others have integrated the various aspects of educational psychology covered in this collection of papers. For example, Stanley Deno (2003) writes about how assessment measures should be curriculum based, further integrating the
disciplines of curriculum and assessment and evaluation. He believes that the Curriculum Based Measures (CBM) movement is an excellent tool for evaluating the effectiveness of instruction. Most interestingly, CBM was first used with students with learning disabilities and as with many ‘new’ approaches to education, the positive outcomes have been further applied to the mainstream. Earlier, we mentioned the utility of effect size in understanding the magnitude of experimental differences, which might be observed. Interestingly, in the evaluation literature there is a similar concern effectively summarized by Sechrest and Yeaton (1982). They acknowledge how effect size is very important to any evaluation effort and review the many questions that still surround the use of this tool.
Reference Cohen, J. (1969). Statistical Power Analysis for the Behavioral Sciences, 1st Edition, Lawrence Erlbaum Associates.
Section I: Human Development
Salkind_Chapter 01.indd 1
9/16/2010 12:41:40 PM
1 Aging and Human Performance Neil Charness
Source: Human Factors: The Journal of the Human Factors and Ergonomics Society, 50 (2008): 548–555.
Human factors practitioners are advised to understand the characteristics of user populations. Most populations are aging. Developed countries owe this change to increased fertility rates from 1946 to 1964 (“baby boom” cohorts), followed by rapid declines in population fertility coupled with increasing longevity (He, Sengupta, Velkoff, & DeBarros, 2005). Increased longevity is also occurring in developing countries. At the turn of the 21st century, Japan had the oldest population: a median age of 41 years compared with the world median of 26 years (World Population Ageing, 2001). Furthermore, older populations are also becoming more diverse because of increases in migration and differential birth rates within subpopulations, presenting a challenge to those pursuing inclusive design principles (Vanderheiden, 1997). Meta-analytic studies have shown that age is a moderate (r > .3; Cohen & Cohen, 1975) predictor of performance on most laboratory tasks (Verhaeghen & Salthouse, 1997). Age is an important predictor of performance with life management tasks such as driving, use of products (e.g., technology), and success with training materials. Aging reduces adaptive capacity, so achieving a successful person-environment fit (Fozard, 1981a) is more challenging, requiring older adult involvement in general product design (Nayak, 1995) because they can sometimes identify design flaws more successfully than younger adults (e.g., Stephens, Carswell, & Schumacher, 2006). Another promising approach is the application of technology to both prevent and alleviate age-related declines in abilities (gerontechnology: Bouma, Fozard,
Bouwhuis, & Taipale, 2007). Physical ergonomics (Kelly & Kroemer, 1990) can play an important role in ensuring that today’s young adults reach old age in the best possible health. Although there is little consensus on how to define older adult, a distinction that has proven useful in the general aging literature is to consider chronological age bands such as the young-old (65–74), middle-old (75–84), and old-old (85+). These bands are often associated with normative events such as full pension access (in the United States, it used to be age 65) and tend to differentiate (e.g., from the earlier age band of 65+) and move upward depending on current longevity in the population. Older is a relative term, and some prefer to use the term aging adult. Those in the older bands often have very different needs than their younger counterparts. Improved design necessitates coupling age bands with a range of environments (e.g., housing, work environments, public places) and assessing associated physical and mental demands. Although Human Factors has not served as a primary outlet for research on aging, it has played an influential role. A citation analysis was conducted using the ISI Web of Science (http://scientific.thomson.com/products/wos/) for articles published in Human Factors using the search terms age OR aging for topic, evaluating which articles had the most impact via citations from ISI-covered sources ( journals, books). Citations are only one tool for judging impact and are probably not ideal for assessing impact on product design. As Figure 1 shows, interest in this topic was low until special issues on aging in 1981 and 1990, and particularly the 1991 issue on age and driving, from which point there has been an increasing stream of papers. 
Figure 1: Number of published articles in Human Factors for the topic of age or aging by year through 2007. [Figure not reproduced; horizontal axis: Year, 1950–2010; vertical axis: Number, 0–16.]
Table 1: Top 10 cited papers in Human Factors dealing with the topic of age or aging
Publication | Citations through 2007 | Citations/Year
Ball, K., & Owsley, C. (1991). Identifying correlates of accident involvement for the older driver. Human Factors, 33, 583–595. | 93 | 5.17
Summala, H., & Mikkola, T. (1994). Fatal accidents among car and truck drivers: Effects of fatigue, age, and alcohol consumption. Human Factors, 36, 315–326. | 70 | 4.67
Welford, A. T. (1981). Signal, noise, performance, and age. Human Factors, 23, 97–109. | 65 | 2.32
Shinar, D., & Schieber, F. (1991). Visual requirements for safety and mobility of older drivers. Human Factors, 33, 507–519. | 55 | 3.06
Parasuraman, R., & Nestor, P. G. (1991). Attention and driving skills in aging and Alzheimer’s disease. Human Factors, 33, 539–557. | 50 | 2.78
Jette, A. M., & Branch, L. G. (1992). A 10-year follow-up of driving patterns among the community-dwelling elderly. Human Factors, 34, 25–31. | 44 | 2.59
Brouwer, W. H., Waterink, W., Vanwolffelaar, P. C., & Rothengatter, T. (1991). Divided attention in experienced young and older drivers: Lane tracking and visual analysis in a dynamic driving simulator. Human Factors, 33, 573–582. | 42 | 2.33
Waller, P. F. (1991). The older driver. Human Factors, 33, 499–505. | 41 | 2.28
Stelmach, G. E., & Nahom, A. (1992). Cognitive-motor abilities of the elderly driver. Human Factors, 34, 53–65. | 38 | 2.24
Salthouse, T. A. (1990). Influence of experience on age differences in cognitive functioning. Human Factors, 32, 551–569. | 38 | 2.11
As Table 1 makes clear, the articles having the most impact (as judged by citations) make up a fairly narrow set: 8 of the top 10 deal with driving, and the other 2 (by Welford, Salthouse) discuss age-related changes in perceptual and cognitive performance. Given the severe consequences of errors in driving and the greater risk to older adults from crashes, it is not that surprising that publications on age and driving have received so much attention. Similarly, models of age-related changes in performance, the neural noise model of Welford, and the Salthouse article on the mitigating role of experience have stimulated theoretical (e.g., Mireles & Charness, 2002) and practical (e.g., Web design) advances. The themes represented by these top 10 articles are representative of the development of the field. Much of the progress and many of the contributions over the past 50 years have taken the form of increasingly precise measurements of the impact of aging on human performance (Welford, 1958; see also Charness & Bosman, 1990; Fisk & Rogers, 1997; Fozard, 1981b; Nichols, Rogers, & Fisk, 2006; Pew & Van Hemel, 2004). Welford (1958) stressed the importance of understanding aging for skilled performance from the perspective of the training and retraining needs of older adults in rapidly changing job settings in the United Kingdom (Rabbitt, 1997). Significant contributions to national productivity can be had by improving efficiency in an aging workforce, and recent research has uncovered better ways to
measure productivity and to train older workers (Callahan, Kiker, & Cross, 2003; Charness & Czaja, 2006; Kubeck, Delp, Haslett, & McDaniel, 1996; Marbach, 1968; Schulz & Adams, 2007; Straka, 1990; Wegman & McGee, 2004). British (e.g., Belbin, 1965; Murrell, Powesland, & Forsaith, 1962) and European investigators (e.g., Marquié, Cau-Bareille, & Volkoff, 1998) have led research into age and work, probably because aging occurred earliest for European populations.
Designing for Present and Future Older Cohorts: Methodological Challenges Aging research is handicapped because chronological age cannot be manipulated, so causal reasoning about aging is difficult, necessitating chains of assumptions and sophisticated statistical modeling techniques (e.g., Hofer & Sliwinski, 2006). Most studies are cross-sectional, quasi-experimental designs that measure age differences in performance as a proxy for aging: within individual changes. Cross-sectional studies typically contrast the performance of younger adults (age ~25) with older adults (age ~65). For human factors practitioners facing a short time horizon, a cross-sectional study, particularly one that uses representative sampling from the target population, is sufficient for understanding how best to design for today’s older adults. Determining sign characteristics that enable older and younger drivers to respond to road conditions in a timely fashion is a good example (icons vs. words: Kline, Ghali, & Kline, 1990). Longitudinal and sequential designs are useful for understanding changing trends in development that may forecast the capabilities of future cohorts of older adults. A critical contribution of early research was the finding that older adults are more heterogeneous in level of functioning (interindividual variability) than younger adults. As has been shown more recently with sophisticated analyses of longitudinal data, older adults also exhibit greater intraindividual or occasion-to-occasion variability (Hultsch, MacDonald, & Dixon, 2002). So, designing for the 95th percentile within a targeted older population may make for very broad design boundary conditions. A recent accomplishment for the field has been the development of reliable, representative data on changes in cognition, perception, and psychomotor skill from cross-sectional studies (e.g., Kroemer, 2005; Steenbekkers & van Beijsterveldt, 1998). 
Standards bodies are capitalizing on such research to develop guidelines to support older users – for example, the International Organization for Standardization (ISO) developing a document on ergonomic needs of older persons and persons with disabilities (http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=40933) and a developing standard for telehealth (http://portal.etsi.org/stfs/STF_HomePages/STF299/STF299.asp).
Theoretical Frameworks Given the difficulty of establishing causation, much of the progress in aging research has resulted from conceptual development. The life span developmental perspective (Baltes, 1979) stresses that biology and culture jointly determine the developmental trajectory of human capabilities, with performance at any point in the life span representing a balance between losses and gains. How to mitigate the losses and promote and capitalize on the gains has been a central focus of human factors practitioners. Another helpful framework is that of differential trajectories for fluid and crystallized abilities. Abilities that are culture laden, known as “crystallized abilities” (Horn, 1982) or “pragmatic abilities” (Baltes, 1993), often measured by tests of word knowledge or information, show modest increases with age until the 50s or 60s. The “fluid abilities” or “mechanics of intelligence,” abstract problem-solving abilities that are more biologically constrained, show consistent cross-sectional declines from the 20s, as does working memory (Dobbs & Rule, 1989). These findings about age-related changes in cognition have influenced design by providing guidelines for how to minimize age differences in performance by drawing on preserved abilities (e.g., Fisk, Rogers, Charness, Czaja, & Sharit, 2004). The general slowing with age framework (Salthouse, 1996) argues that slowing accounts for much of the age-related variance in performance on complex cognitive tasks. Typically, older adults are slower than young ones by between 50% and 100%. Using meta-analytic techniques to identify information-processing parameters, Jastrzembski and Charness (2007) found different slowing factors for cognitive (1.7), perceptual (1.8), and motor (2.1) processing. Welford (1981) postulated that aging resulted in increased neural noise with concomitant diminished signal-to-noise ratios for environmental inputs. 
This neural noise framework leads to straightforward design recommendations to increase signal strength by increasing the size and contrast of text and other perceptual information, as well as by decreasing noise in the display (irrelevant flanking information for vision and external noise for auditory signals). Such theoretical frameworks, coupled with basic research on perception and cognition, have led to checklists for Web design for older adults (www.nlm.nih.gov/pubs/checklist.pdf) and model sites for health information (http://nihseniorhealth.gov/). See Morrell (2002). Recent advances in neuroscience have identified important brain changes and are suggestive of ways to intervene to change the course of aging. Reuter-Lorenz, Stanczak, and Miller (1999) and Cabeza (2002) have shown that for cognitive tasks that activate one lateralized brain region in younger adults, older adults also recruit the homologous region in the other hemisphere. This finding suggests that “brain workload” (metabolic demand) may be greater for older adults, even when they are performing at levels similar to the performance of younger ones, and implies that they are at greater risk of
excess workload for complex procedures. Hence, minimizing complexity, as in the number of steps in a procedure, may be more important for older than younger adults (Fisk et al., 2004). A related framework is cognitive reserve (e.g., Stern et al., 1995). Those engaging in cognitively complex work and leisure activities may build up brain reserves that enable them to weather both normal aging and dementia much longer, although they may show faster decline when the reserve is exhausted. An exciting discovery is that training interventions can modestly remediate normative age-related cognitive decline in some abilities (e.g., Ball et al., 2002; speed training: Ball, Edwards, & Ross, 2007; and exercise: Colcombe et al., 2003). Craik (1986) noted that older adult memory performance depended on the extent to which there was “environmental support” for retrieval operations, with recall generating worse performance than did cued recall and recognition. The principles of providing environmental support for users (environmental cuing) and relying on old habits that demand few processing resources (Craik & Anderson, 1998) are useful guides for designing tools and environments for older users. Such principles are readily seen in the human-computer interaction field (e.g., Sharit, Czaja, Nair, & Lee, 2003) through provision of online help systems and the shift from command line to graphic user interfaces wherein menu items and icons may cue the user to specific functions that are available. However, such shifts do not always provide differential benefit to older adults (Charness, Kelley, Bosman, & Mottram, 2001). Differential benefit has been seen in other domains, such as air traffic control read back (Morrow et al., 2003). Other techniques for minimizing age effects have relied on identifying processes that are least impaired by aging (direct vs. indirect pointing devices for computer systems: Charness, Holley, Feddon, & Jastrzembski, 2004). 
Simulation and modeling can sometimes substitute for usability testing to allow designers to make choices about different designs. As reliable estimates for basic information-processing parameters become available for older adult populations, cognitive architectures can be modified to predict older adult performance. Influential models such as ACT (Anderson, 1996), EPIC (Meyer & Kieras, 1997), and CHREST (Smith, Gobet, & Lane, 2007) have been modified to predict older adult performance by changing base parameters to older adult variants (Byrne, 1998; Salvucci, Chavez, & Lee, 2004). GOMS modeling (Card, Moran, & Newell, 1983) using older adult parameters has been successful at predicting multiple tasks for different models of mobile phones (Jastrzembski & Charness, 2007).
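The idea of changing base parameters to older adult variants can be illustrated with a minimal sketch. The slowing factors are those reported earlier in this article from Jastrzembski and Charness (2007). The younger adult cycle times are the nominal Model Human Processor values from Card, Moran, and Newell (1983), supplied here as an assumption because the article does not list them. The function name and the one-cycle-per-stage task are hypothetical, and simply multiplying nominal values by slowing factors is a simplification, not the parameter estimation procedure those authors used.

```python
# Slowing factors reported in the article (Jastrzembski & Charness, 2007).
SLOWING = {"perceptual": 1.8, "cognitive": 1.7, "motor": 2.1}

# Assumed nominal younger-adult processor cycle times in milliseconds
# (Model Human Processor; Card, Moran, & Newell, 1983).
YOUNG_CYCLE_MS = {"perceptual": 100, "cognitive": 70, "motor": 70}


def predict_time_ms(cycle_counts, slowing=None):
    """Sum the processor cycles for a task, optionally slowed for older adults.

    `cycle_counts` maps each processor to the number of cycles the task needs.
    """
    total = 0.0
    for processor, count in cycle_counts.items():
        factor = slowing[processor] if slowing else 1.0
        total += count * YOUNG_CYCLE_MS[processor] * factor
    return total


# Hypothetical task: perceive a signal, decide, and press a key.
simple_reaction = {"perceptual": 1, "cognitive": 1, "motor": 1}
young_ms = predict_time_ms(simple_reaction)
older_ms = predict_time_ms(simple_reaction, SLOWING)
print(f"younger: {young_ms:.0f} ms, older: {older_ms:.0f} ms")
```

Because the motor factor is the largest, a design that reduces the number of motor steps would, under these assumptions, shrink the predicted age difference more than one that removes the same number of cognitive steps.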
Application of Human Factors to Aging: Driving Ensuring safe driving for older adults is an important applied issue (see Table 1), given that older occupants of crash vehicles are about three times
more likely to die from an impact compared with younger ones (Evans, 2004). Driving safety typically follows a U-shaped function, with relatively high crash rates at young and very old ages on a crashes/km driven basis and safest driving performance in the 50s and early 60s (Evans, 2004). Classic human factors approaches – redesigning the tool and environment, as well as training the user – have been adopted in driving research. Ball and colleagues (e.g., Owsley, Ball, Sloane, Roenker, & Bruni, 1991) have developed a product, the useful field-of-view (UFOV) test, assessing the ability to divide attention between central and peripheral target detection. UFOV predicts automobile crashes (retrospectively and prospectively) and could become a screening device for determining fitness to drive. More important, training on speeded perception is possible and durable, and it improves driving simulator performance (Ball et al., 2007). Another approach to improving safety in older drivers is to substitute automated procedures for age-degraded monitoring functions. Despite added workload, a GPS was superior to physical maps for older drivers (Dingus et al., 1997). Kramer, Cassavaugh, Horrey, Becic, and Mayhugh (2007) showed that older drivers benefited to the same or sometimes to a greater extent from a combined auditory and visual collision avoidance warning system in driving simulator tasks. Aging is a relatively young field, and given the roughly 20- to 30-year lag from research to application (Adams, 1972), we would not expect to see many products in the marketplace that take advantage of human factors efforts in aging. 
However, human factors researchers have contributed valuable data and guidelines to many domains: aircraft piloting (Taylor, O’Hara, Mumenthaler, Rosen, & Yesavage, 2005), workplace performance (Sharit et al., 2004), training principles (Jamieson & Rogers, 2000), medication adherence (Morrell, Park, & Poon, 1990; Morrow, Leirer, Carver, & Tanke, 1998), and technology interaction (Czaja et al., 2006). Another example of commercial product development is Intelehealth’s adoption of a rotary knob as an input device for a product designed to enable informal family caregivers to interact with and support older adults living alone, based on research by Rogers, Fisk, McLaughlin, and Pak (2005; Rogers, personal communication, April 16, 2008).
Needs Going Forward Future generations of older adults are likely to be advantaged compared with current cohorts. Disability has been decreasing (Manton, Stallard, & Corder, 1995); general cognitive capabilities have increased by a standard deviation in a generation (Flynn, 1987); education levels have been increasing, as has societal and personal wealth (Charness, in press, 2008), which permits greater access to assistive technologies. Nonetheless, demands made by complex products, particularly by miniaturized technological artifacts, threaten
to undermine advances in older adult well-being. Furthermore, rehabilitation and assistive technology research (e.g., Mihailidis & Fernie, 2002) will probably be in increasing demand given lengthened lives and work lives. Sometimes being able to predict speed of performance is adequate to predict errors, as in the case of a crash in a simulator. However, for those older adults no longer working or driving, performing accurately may be more important than performing quickly, for instance, when taking medications or using health care devices. Although a number of frameworks have been constructed for understanding error (Norman, 1981; Reason, 1991; Sharit, 2006), additional development is needed for error prediction in older adults. Design for error-free performance can be an exceptionally valuable goal. The design community tries to optimize productivity, safety, and comfort/ satisfaction. As the citation analysis suggests, there is limited attention to determining how to make products, such as assistive technology, safe and attractive to older users. With the advent of artificially intelligent robotic assistants (nursebots: Matthews, 2002) and pets for older adults (robotic seal: Wada & Shibata, 2007), design for comfort (that includes privacy concerns) and safety will be increasingly important. Much more can be done to improve quality of life for older adults, particularly in the exciting area of interventions to mitigate normative negative age changes, if the design and research community more frequently enlists human factors specialists who have practical implementation experience.
References Adams, J. A. (1972). Research and the future of engineering psychology. American Psychologist, 27, 615–622. Anderson, J. R. (1996). ACT: A simple theory of complex cognition. American Psychologist, 51, 355–365. Ball, K., Berch, D. B., Helmers, K. F., Jobe, J. B., Leveck, M. D., Marsiske, M., et al. (2002). Effects of cognitive training interventions with older adults: A randomized control trial. Journal of the American Medical Association, 288, 2271–2281. Ball, K., Edwards, J. D., & Ross, L. A. (2007). The impact of speed of processing training on cognitive and everyday functions. Journal of Gerontology: Psychological Sciences, 62B, 19–31. Baltes, P. B. (1979). Life-span development psychology: Some converging observations on history and theory. In P. B. Baltes & O. G. Brim, Jr. (Eds.), Life-span development and behavior (Vol. 2, pp. 255–279). New York: Academic Press. Baltes, P. B. (1993). The aging mind: Potential and limits. Gerontologist, 33, 580–594. Belbin R. M. (1965). Training methods for older workers. Paris: Organization of Economic Cooperation and Development (OECD). Bouma, H., Fozard, J. L., Bouwhuis, D. G., & Taipale, V. (2007). Gerontechnology in perspective. Gerontechnology, 6, 190–216. Byrne, M. (1998). Taking a computational approach to aging: The SPAN theory of working memory. Psychology and Aging, 13, 309–322. Cabeza, R. (2002). Hemispheric asymmetry reduction in older adults: The HAROLD model. Psychology and Aging, 17, 85–100.
Callahan, J. S., Kiker, D. S., & Cross, T. (2003). Does method matter? A meta-analysis of the effects of training method on older learner training performance. Journal of Management, 29, 663–680.
Card, S. K., Moran, T. P., & Newell, A. (1983). The psychology of human-computer interaction. Hillsdale, NJ: Erlbaum.
Charness, N. (in press, 2008). Technology as multiplier effect for an aging work force. In K. W. Schaie & R. Abeles (Eds.), Social structures and aging individuals: Continuing challenges. New York: Springer.
Charness, N., & Bosman, E. A. (1990). Human factors and design for older adults. In J. E. Birren & K. W. Schaie (Eds.), Handbook of the psychology of aging (3rd ed., pp. 446–463). San Diego: Academic Press.
Charness, N., & Czaja, S. J. (2006). Older worker training: What we know and don’t know (AARP Public Policy Institute, #2006-22). Washington, DC: AARP. Retrieved April 28, 2008, from http://www.aarp.org/research/work/issues/2006_22_worker.html
Charness, N., Holley, P., Feddon, J., & Jastrzembski, T. (2004). Light pen use and practice minimize age and hand performance differences in pointing tasks. Human Factors, 46, 373–384.
Charness, N., Kelley, C. L., Bosman, E. A., & Mottram, M. (2001). Word processing training and retraining: Effects of adult age, experience, and interface. Psychology and Aging, 16, 110–127.
Cohen, J., & Cohen, P. (1975). Applied multiple regression/correlation analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.
Colcombe, S. J., Erickson, K. I., Raz, N., Webb, A. G., Cohen, N. J., McAuley, E., et al. (2003). Aerobic fitness reduces brain tissue loss in aging humans. Journal of Gerontology: Medical Sciences, 58A, 176–180.
Craik, F. I. M. (1986). A functional account of age differences in memory. In F. Klix & H. Hagendorf (Eds.), Human memory and cognitive capabilities: Mechanisms and performances. Symposium in memoriam Hermann Ebbinghaus 1885, Berlin Humboldt University 1985 (pp. 409–422). Amsterdam: North-Holland.
Craik, F. I. M., & Anderson, N. D. (1998). Applying cognitive research to problems of aging. In D. Gopher & A. Koriat (Eds.), Attention and performance XVII (pp. 583–616). Cambridge, MA: MIT Press.
Czaja, S. J., Charness, N., Fisk, A. D., Hertzog, C., Nair, S. N., Rogers, W. A., et al. (2006). Factors predicting the use of technology: Findings from the Center for Research and Education on Aging and Technology Enhancement (CREATE). Psychology and Aging, 21, 333–352.
Dingus, T. A., Hulse, M. C., Mollenhauer, M. A., Fleischman, R. N., McGehee, D. V., & Manakkal, N. (1997). Effects of age, system experience, and navigation technique on driving with an Advanced Traveler Information System. Human Factors, 39, 177–199.
Dobbs, A. R., & Rule, B. G. (1989). Adult age differences in working memory. Psychology and Aging, 4, 500–503.
Evans, L. (2004). Traffic safety. Bloomfield, MI: Science Serving Society.
Fisk, A. D., & Rogers, W. (Eds.). (1997). Handbook of human factors and the older adult. San Diego: Academic Press.
Fisk, A. D., Rogers, W. A., Charness, N., Czaja, S. J., & Sharit, J. (2004). Designing for older adults: Principles and creative human factors approaches. Boca Raton, FL: CRC Press.
Flynn, J. R. (1987). Massive gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101, 171–191.
Fozard, J. L. (1981a). Person-environment relationships in adulthood: Implications for human factors engineering. Human Factors, 23, 7–27.
Fozard, J. L. (1981b). Special issue preface. Human Factors, 23, 3–6.
He, W., Sengupta, M., Velkoff, V. A., & DeBarros, K. A. (2005). 65+ in the United States: 2005 (Current Population Rep. P23–209). Washington, DC: Government Printing Office. Retrieved April 28, 2008, from http://www.census.gov/prod/2006pubs/p23–209.pdf
Hofer, S. M., & Sliwinski, M. J. (2006). Design and analysis of longitudinal studies on aging. In J. E. Birren & K. W. Schaie (Eds.), Handbook of the psychology of aging (6th ed., pp. 15–37). Amsterdam: Elsevier Academic Press.
Horn, J. L. (1982). The theory of fluid and crystallized intelligence in relation to concepts of cognitive psychology and aging in adulthood. In F. I. M. Craik & S. Trehub (Eds.), Aging and cognitive processes (pp. 237–278). New York: Plenum.
Hultsch, D. F., MacDonald, S. W. S., & Dixon, R. A. (2002). Variability in reaction time performance of younger and older adults. Journal of Gerontology: Psychological Sciences, 57B, P101–P115.
Jamieson, B. A., & Rogers, W. A. (2000). Age-related effects of blocked and random practice schedules on learning a new technology. Journal of Gerontology: Psychological Sciences, 55B, P343–P353.
Jastrzembski, T. S., & Charness, N. (2007). The Model Human Processor and the older adult: Parameter estimation and validation within a mobile phone task. Journal of Experimental Psychology: Applied, 13, 224–248.
Kelly, P. L., & Kroemer, K. H. E. (1990). Anthropometry of the elderly: Status and recommendations. Human Factors, 32, 571–595.
Kline, T. J., Ghali, L. M., & Kline, D. W. (1990). Visibility distance of highway signs among young, middle-aged, and older observers: Icons are better than text. Human Factors, 32, 609–619.
Kramer, A. F., Cassavaugh, N., Horrey, W. J., Becic, E., & Mayhugh, J. L. (2007). Influence of age and proximity warning devices on collision avoidance in simulated driving. Human Factors, 49, 935–949.
Kroemer, K. H. E. (2005). “Extra-ordinary” ergonomics: How to accommodate small and big persons, the disabled and elderly, expectant mothers, and children. Boca Raton, FL: CRC Press.
Kubeck, J. E., Delp, N. D., Haslett, T. K., & McDaniel, M. A. (1996). Does job-related training performance decline with age? Psychology and Aging, 11, 92–107.
Manton, K. G., Stallard, E., & Corder, L. (1995). Changes in morbidity and chronic disability in the U.S. elderly population: Evidence from the 1982, 1984, and 1989 National Long Term Care Surveys. Journal of Gerontology: Social Sciences, 50, S194–S204.
Marbach, G. (1968). Job redesign for older workers. Paris: OECD Employment of Older Workers.
Marquié, J. C., Cau-Bareille, D. P., & Volkoff, S. (Eds.). (1998). Working with age. London: Taylor & Francis.
Matthews, J. T. (2002). The Nursebot Project: Developing a personal robotic assistant for frail older adults in the community. Home Health Care Management Practice, 14, 403–405.
Meyer, D. E., & Kieras, D. E. (1997). A computational theory of executive cognitive processes and multiple-task performance: Part 1. Basic mechanisms. Psychological Review, 104, 3–65.
Mihailidis, A., & Fernie, G. R. (2002). The importance of using “context-aware” design principles when developing cognitive assistive devices for older adults. Gerontechnology, 2, 173–188.
Mireles, D. E., & Charness, N. (2002). Computational explorations of the influence of structured knowledge on age-related cognitive decline. Psychology and Aging, 17, 245–259.
Morrell, R. W. (Ed.). (2002). Older adults, health information, and the World Wide Web. Mahwah, NJ: Erlbaum.
Morrell, R. W., Park, D. C., & Poon, L. W. (1990). Effects of labeling techniques on memory and comprehension of prescription information in young and old adults. Journal of Gerontology, 45, P166–P172.
Morrow, D., Leirer, V., Carver, L. M., & Tanke, E. D. (1998). Older and younger adult memory for health appointment information: Implications for automated telephone messaging design. Journal of Experimental Psychology: Applied, 4, 352–374.
Morrow, D. G., Ridolfo, H. E., Menard, W. E., Sanborn, A., Stine-Morrow, E. A. L., Magnor, C., et al. (2003). Environmental support promotes expertise-based mitigation of age differences on pilot communication tasks. Psychology and Aging, 18, 268–284.
Murrell, K. F. H., Powesland, P. F., & Forsaith, B. (1962). A study of pillar-drilling in relation to age. Occupational Psychology, 36, 45–52.
Nayak, U. S. L. (1995). Elders-led design. Ergonomics in Design, 3, 8–13.
Nichols, T. A., Rogers, W. A., & Fisk, A. D. (2006). Design for aging. In G. Salvendy (Ed.), Handbook of human factors and ergonomics (3rd ed., pp. 1418–1445). Hoboken, NJ: Wiley.
Norman, D. A. (1981). Categorization of action slips. Psychological Review, 88, 1–15.
Owsley, C., Ball, K., Sloane, M. E., Roenker, D. L., & Bruni, J. R. (1991). Visual/cognitive correlates of vehicle accidents in older drivers. Psychology and Aging, 6, 403–415.
Pew, R. W., & Van Hemel, S. B. (Eds.). (2004). Technology for adaptive aging. Washington, DC: National Academies Press.
Rabbitt, P. (1997). The Alan Welford memorial lecture. Ageing and human skill: A 40th anniversary. Ergonomics, 40, 962–981.
Reason, J. T. (1991). Human error. Cambridge, UK: Cambridge University Press.
Reuter-Lorenz, P. A., Stanczak, L., & Miller, A. C. (1999). Neural recruitment and cognitive aging: Two hemispheres are better than one, especially as you age. Psychological Science, 10, 494–500.
Rogers, W. A., Fisk, A. D., McLaughlin, A. C., & Pak, R. (2005). Touch a screen or turn a knob: Choosing the best device for the job. Human Factors, 47, 271–288.
Salthouse, T. A. (1996). The processing-speed theory of adult age differences in cognition. Psychological Review, 103, 403–428.
Salvucci, D. D., Chavez, A. K., & Lee, F. J. (2004). Modeling effects of age in complex tasks: A case study in driving. In Proceedings of the 26th Annual Conference of the Cognitive Science Society (pp. 1197–1202). Mahwah, NJ: Erlbaum.
Schulz, K. S., & Adams, G. A. (Eds.). (2007). Aging and work in the 21st century. Mahwah, NJ: Erlbaum.
Sharit, J. (2006). Human error. In G. Salvendy (Ed.), Handbook of human factors and ergonomics (3rd ed., pp. 708–760). Hoboken, NJ: Wiley.
Sharit, J., Czaja, S. J., Hernandez, M., Yang, Y., Perdomo, D., Lewis, J. E., et al. (2004). An evaluation of performance by older persons on a simulated telecommuting task. Journal of Gerontology: Psychological Sciences, 59B, 305–316.
Sharit, J., Czaja, S. J., Nair, S., & Lee, C. C. (2003). Effects of age, speech rate, and environmental support in using telephone voice menu systems. Human Factors, 45, 234–251.
Smith, L. I., Gobet, F., & Lane, P. C. R. (2007). An investigation into the effect of ageing on expert memory with CHREST. In Proceedings of the United Kingdom Workshop on Computational Intelligence – UKCI07. Available from http://hdl.handle.net/2438/1064
Steenbekkers, L. P. A., & van Beijsterveldt, C. E. M. (Eds.). (1998). Design-relevant characteristics of ageing users. Delft, the Netherlands: Delft University Press.
Stephens, E. C., Carswell, C. M., & Schumacher, M. M. (2006). Evidence for an elder’s advantage in the naive product usability judgments of older and younger adults. Human Factors, 48, 422–433.
Stern, Y., Alexander, G. E., Prohovnik, I., Stricks, L., Link, B., Lennon, M. C., et al. (1995). Relationship between lifetime occupation and parietal flow: Implications for a reserve against Alzheimer’s disease pathology. Neurology, 45, 55–60.
Straka, G. A. (1990). Training older workers for and in the years after 2000. Journal of Educational Gerontology, 5, 68–78.
Taylor, J. L., O’Hara, R., Mumenthaler, M. S., Rosen, A. C., & Yesavage, J. A. (2005). Cognitive ability, expertise, and age differences in following air-traffic control instructions. Psychology and Aging, 20, 117–133.
Vanderheiden, G. C. (1997). Design for people with functional limitations resulting from disability, aging, or circumstance. In G. Salvendy (Ed.), Handbook of human factors and ergonomics (2nd ed., pp. 2010–2052). New York: Wiley.
Verhaeghen, P., & Salthouse, T. A. (1997). Meta-analyses of age-cognition relations in adulthood: Estimates of linear and non-linear age effects and structural models. Psychological Bulletin, 122, 231–249.
Wada, K., & Shibata, T. (2007). Social effects of robot therapy in a care house: Change of social network of the residents for two months. In Proceedings of IEEE International Conference on Robotics and Automation (WeD8.4, pp. 1250–1255). Piscataway, NJ: Institute of Electrical and Electronics Engineers, Inc.
Wegman, D. H., & McGee, J. P. (2004). Health and safety needs of older workers/Committee on the Health and Safety Needs of Older Workers. Washington, DC: National Academies Press.
Welford, A. T. (1958). Ageing and human skill. Oxford, UK: Oxford University Press.
Welford, A. T. (1981). Signal, noise, performance and age. Human Factors, 23, 97–109.
World population ageing: 1950–2050 (ST/ESA/SER.A/207). (2001). New York: Department of Economic and Social Affairs Population Division, United Nations. Retrieved April 28, 2008, from http://www.un.org/esa/population/publications/worldageing19502050/
2 Violence and Human Development
Elton B. McNeil
Source: The ANNALS of the American Academy of Political and Social Science, 364 (1966): 149–157.
Birth control may ultimately be the only trustworthy way to limit the amount of violence on this planet. Throughout history we have tried to reduce the human potential for violence by killing as many of our fellow men as we could, but we are falling behind in the task. Somewhere, between these extremes of was and never was, mankind still seeks a middle ground on which to stand without fear and trembling. The human condition is this: we can control violence in some of the people all of the time; we can control violence in all of the people some of the time; and we have failed throughout history to control violence in all of the people all of the time. Perhaps violence involves so much primitive joy and raw gratification that the quest for its absolute control is nothing but a fool’s errand. It may well be that only the long-term evolutionary alteration of humankind will produce a level of wisdom and restraint sufficient to banish assault as a means of communicating feelings to one’s fellow man. It is equally possible that man has strayed from basic truth, has been corrupted by civilized living, and can only rediscover peaceful coexistence by examining the ways of our more primitive brethren – the animals.
Violence – Fang and Claw Style
An ancient adage states that man is distinguishable from animals primarily by his capacity for making trouble for himself. While this may be a somewhat cynical view of the condition of Homo sapiens, it remains true that in moments
of despair about human violence we wistfully search the animal kingdom for moral and ethical guidance. Ever since Rousseau, we have suspected that the human condition is one depraved by the baleful influence of high-rise crowding, megalopolis, breakneck speed, and the unremitting clamor of industrial society. In the earthy simplicity of species less complex than our own, we have sought a sign that there is hope for mankind. In simplicity there may indeed be truth, but it is wildly improbable that the anthill, beehive, or monkey colony has much to teach modern, interplanetary, atomic man straining to burst the bonds of time, place, and person. Yet, there are some basic observations of animal life reported by Scott¹ that are worth underscoring. Scott insists, for example, that our classic stereotypes of animal behavior simply do not square with the facts. Wolves, for example, are regularly maligned; yet, the “traditional slinking, slavering, and treacherous animal of fiction corresponds only to the behavior of a wolf that has been recently trapped and is extremely frightened.” Wolf packs, in a natural setting, live peacefully and co-operatively once a means of social control of aggression is established. Among dogs and wolves, the principles of dominance and submission and of territoriality serve to limit the occasions on which violence occurs. Fighting is an instrument used to establish relative position in the social hierarchy, but it disappears once the rank-ordering is accomplished. Combat reappears, primarily, when the maturing young demand, from an aging social order, a new alignment of power and privilege. The establishment of territories for foraging or nesting is another means of preventing conflict with rival groups. At the edges of these territories there are, regularly, the “border incidents” typical of human nations. Conflict, it seems, occurs most often at the point of intersection of one group competing with another.
Human bands have long sought to control and regulate violence by establishing territoriality (the nation-state) and by dominance and submission (power politics). Yet, the yield has been no more than continuous assaultive conflict and a history of civilization that is writ excessively large with dramatic accounts of senseless “heroism.” Scott suggests that, at least in puppies, methods of rearing the young can foster gentility in the mature animal. He has demonstrated the truth of this assumption by raising five hundred puppies while only once being bitten. His method was simple: the puppies were never punished; they were hand-carried from place to place from birth; and, whenever they appeared aggressive, he rendered them helpless by hoisting them off the floor with a firm grip under the belly. This method required consistent but gentle restraint of aggressiveness early in life; he reports that it worked, and he suggests that, with some modification, these methods would produce comparable gentility in human beings. While techniques of child-rearing are important contributors to the final shape of adult behavior, they cannot be considered in isolation from some
measure of the organized-disorganized status of the society to which the individual must adjust. As Scott indicates, in an organized, highly structured, stable society the “social animal” is peaceful and cooperative; in a society that is disorganized and in transition, he is capable of the worst of destructive and violent behavior. Man’s animal nature is a feeble excuse for violence; a more reasonable explanation is that the seemingly senseless violence of humans may be one of the costs of urban living. In the neglected center of our crowded cities the young, unmarried, unemployed male product of a broken home tends to be a prime source of the purposeless assault of one human on another. Yet, when individuals are driven to seek out kindred restive souls and to construct of them groups with a common hostile purpose, we have all the necessary ingredients of violence and defiance of social control. Studies of the animal kingdom have a limited usefulness in expanding our grasp of the human condition. At best, the lives of social animals are only an approximate fit to that period in human life when the young child is without the tool of language and must rely solely on primitive methods for expressing aggression. As the child’s capacity for verbal, abstract, and symbolic responses increases, the comparison of animal and man no longer contains either truth or relevance.
The Creation of Violent Individuals
The monster created by the legendary Dr. Frankenstein had to be destroyed because, having been spared the psychological trials and tribulations of childhood, it failed to learn alternative, nonviolent ways of reacting to frustration. In every known culture, history has been mute witness to the unremitting production of generation after generation of Frankensteinlike citizens. The methods vary, but the prime ingredients of this bitter stew of physical destruction of one man by another are recognizable even without a written recipe. How do parents in any culture deliberately fashion a social Frankenstein? It is not a task easily accomplished since there is resiliency to youth that resists even the most horrendous of child-rearing circumstances. Yet, the steps parents must take are fairly direct, if not simple.
Step 1: Have no love for the child
Love is a mercurial element that can vitiate the best of malicious intentions. Love topples what hate constructs. Love undermines rejection, softens the sting of anger, and dulls the edge of rage. Love fashions a protective cocoon that shelters the individual from the full force of the blows of fate. Love is insidious, and its workings are invisible to the eye. When love is absent, the child becomes an object like any other – an object to be used or misused as the needs of the parents dictate.
Step 2: Shape the child’s view of the world and people
Reward and punishment are the most useful tools for this purpose. The selective reward of some natural responses, coupled with the punishment of others, can underscore particular dimensions of personality at the same moment that it selects others for exclusion. If selective reward and punishment are begun early enough, continued for a sufficient length of time, and meshed subtly with an unmistakable parental example, the child will grow to maturity with a fixed and immutable perception of what constitutes truth and reality. The world view of the inexperienced and only partly comprehending child inevitably contains the image of the dominant parent whose philosophy of life and reaction to mankind get reflected in the basic fabric of the child’s developing psychological life. Parents need only act as an interpretive filter of the real world; the selective experiences to which the child is exposed and the selective interpretation of these by the parents work together to fashion a child with a unique and highly personalized view of world affairs.
Step 3: Reinforce preferred behavior while rationalizing it
The child must be totally convinced that his reactions to people and the fashion in which he treats them are natural, reasonable, correct, and not monstrous. Essentially, an extremist philosophy must pulse through the veins of the individual if violence is to lose its menacing aspect and become a necessary means to an absolutely essential end. The child must learn that in a jungle only savages survive. Addiction to violence on a personal, small-group, national, or international level must either be rewarded more often than it is punished or the punishment must seem undeserved and produce even greater dedication to a lethal life style.
Violence needs an end to justify its means, and if the child can come to believe that his aggressive actions have a rational base he will, in times of high anxiety, become predictably violent as a means of solving problems. Western society has always given lip-service to the belief that the end does not justify the means. It is an interesting ethical notion, but in real life mortals rarely live by such an unworkable dictum. Perhaps the production of a Frankensteinlike monster requires simply a conscious reversal of the “means-ends” ethic such that the emerging leader does as people do rather than as people say ought to be done. As far as the power-hungry individual is concerned, any ethic is defensible if it produces success. How, then, do education and methods of child-rearing produce docile, non-aggressive, nonviolent adults? It is done, most often, by taking advantage of the dependent, helpless nature of the growing child. A set of expectations is established for the child, and a model is outlined of the kind of person he must strive to become. Then, the average child is exposed to pain, fear, deprivation, and isolation from others if he behaves in a nonacceptable manner.
At the same time, he is praised and rewarded for approved behavior. These externally applied punishments produce, in the human animal, psychological reactions of guilt, fear, anxiety, a sense of loss and alienation from others, and feelings of rejection. It is from these internal emotional experiences that the child’s self-image and self-esteem are formed. This simplified schematic view of human development has one very serious limitation. The psychological reaction of the child to the internal and external events and pressures in his life is not always a direct or straight-line arrangement in which stimulus X predictably produces reaction Y. The psyche of the child is not a mechanical system in which a known amount of push is automatically counterbalanced by a fixed amount of pull. Human beings are capable of distorting reality into shapes and forms that have a nightmare quality about them; human beings “process” their reactions through a complex psychological apparatus that allows seemingly incompatible and opposite reactions to issue from what appear to be identical, or at least similar, life circumstances. Hollywood was so enthralled by the discovery of this psychological fact that it produced an appalling series of Grade-B movies in which the central theme was always the same – of a pair of siblings, one became a priest and the other, a gangster chieftain. The dramatic conflict between the two and the puzzling suggestion that both had issued from the same seed and the same squalid environment provided the dramatic denouement of the film. The psychological truth, of course, is that each individual is unique and that the “environment” is an inert substance until it is mixed with the volatile chemicals of a particular and peculiar psychological structure and stirred briskly by fate. Thus, the five-and-ten-cent variety of psychological generalization about the “mentality” of world leaders is, typically, grossly and frighteningly in error.
Producing two psychological peas in a pod is beyond the capability of any known science.
Growing Up Violent
Violence on a planetary scale ought to be the most frightening possibility any of us could imagine; yet, our immediate anxieties are most often triggered by reports of teen-agers and young persons rioting across the face of the land. The dynamics of group violence among teen-agers can be instructive of violence in other groups, as the kind and quality of organization of basic impulses, rather than the fact of teen-agedness, is the key to group violence. According to the psychotherapist Rhoda Lorand, groups of young people riot as one means of dealing with a collection of personal and social pressures for which no other workable outlet is provided by the society – pressures such as a lack of confidence in their own masculinity, a need to discharge sexual excitement, or a deep-seated hostility toward parents and the adult model dictated by society. This analysis of the impulses expressed in “group acting-out”
of basic problems and urges may or may not be accurate, and in no instance could we blithely assume that such an interplay of dynamic forces is typical of all members of a mob. Our concern is less with the personal dynamics of the individual members of a mob than with the chemistry of how these individual patterns of behavior get translated into violent group action. The loosely federated mass of young people at a jazz festival or resort area – each of whom is there because he anticipates that “that’s where the action is” – needs only the addition of alcohol to begin the transformation from mass to mob. It was once said that an individual’s conscience is best defined as that part of the personality soluble in alcohol. As alcohol dissolves inhibitions, those persons in the crowd with the least self-confidence, the least self-control, and the greatest need to “be someone” become visible as they impulsively act out their problems in a primitive, childish, and aggressive fashion. These first daring, violent, or defiant outbursts surge through the milling crowd and strike a responsive chord in a second wave of young persons who, stimulated by seeing in action the inclinations, urges, and impulses they themselves have barely been able to contain, soon join the melee. And they join it with a vigor that outdoes the initiators. The members of this second wave of violence are unaware that the search for an excuse (someone else started it) for the open expression of violence is what has led them to be an “innocent” bystander at the scene of the action. Statistically, these innocent bystanders are most numerous, and they form the bond between a series of isolated incidents and the final mob ugliness. Shortly, the fingers of riot reach out to the remaining onlookers – young people who now and then slip out of character and do foolish things when swept along by the tide of excitement that washes over their usually well-controlled impulse systems.
It is at this point in time that the thinness of the veneer of civilization becomes most apparent. What we learn from history is that there never existed a time free of cruelty and violence and that any age is capable of becoming the worst that mankind has ever known, once the veneer of self-control is removed.
A Time of Juveniles
Eric Hoffer observed that “history is made by men who have the restlessness, impressionability, credulity, capacity for make-believe, ruthlessness, and self-righteousness of children.”² He suggests that it is a reasonable assumption – given the average life expectancy of past eras – that the invention of the wheel and the calendar, the chivalry and romanticism of knights in armor, and the savagery of every recorded historical epoch may well be the work of the “juvenile mentality.” Even the ranks of elders may be populated by persons who grew older but never grew up. Perhaps whole societies can come to act and think like juveniles if they are directed by leaders who personally epitomize
this mentality and capitalize on the promise of unfettered impulse expression for all. The drums, the bugles, the uniforms, and the posturings of humanity – the deadly serious playing-of-soldiers – appear in every age. Hoffer suggests that the juvenile turn of mind can be produced in an otherwise mature adult whenever that adult – be he immigrant, deprived citizen, civilian becoming soldier, or serf becoming free man – finds himself enmeshed in a mode of existence or state of in-betweenness of the adolescent. Perhaps the state of in-betweenness is the devil, and, perhaps, we are witnessing Hoffer’s time of juveniles reborn. A society that must call out the National Guard in order to control its youth is an unappetizing society, indeed. There is a quotation from the psychologist, Shakespeare, that contains the nub of difficulty in our attempt to erase violence.
FIRST SERVANT: Why then we shall have a stirring world again. This peace is nothing, but to rust iron, increase tailors, and breed ballad makers.
SECOND SERVANT: Let me have war, say I; it exceeds peace as far as day does night; it’s spritely, awaking, audible, and full of vent. Peace is a very apoplexy, lethargy; mulled, deaf, sleepy, insensible. . . .
SHAKESPEARE, Coriolanus
Managing the Medusa
If violence is a function of complicated individual psychic processes, what can be done to manage the various forms in which it will rear its ugly head? It would be unrealistic to hope that child-rearing procedures will ever be systematized and regulated by a controlled educational process designed to eliminate violence in cultures the world over. Societies of every sort will continue to supply the world with potentially violent citizens. The challenge is to manage the expression of aggression in adults who can no longer be controlled by the simple devices of childhood. Violence tends to be a pastime of the young, and no society has succeeded in the search for an adequate substitute for it. William James’ suggestion of a Moral Equivalent of War exactly describes the dimensions of the problem we face:
For the young, life needs to be defined in terms of the strenuous, the vivid, the intense. Life is to be conceived in such heroic terms that, in comparison with it, the heroism of war will offer no charms. It is doubtful whether a peaceful way of living will be achieved for modern man in terms of the traditional hymn writers’ conception of peace as a region of lilies in the green pastures beside a murmuring brook. The old, the sick, the tired can be charmed by such visions; the young, the tough, and the resolute cannot. They will have their danger; they will have their struggle against obstacles.³
The preferred means of managing violence – prevention – may also be only a pipe dream. Fritz Redl once said that prevention, in its simplest form,
means do not poison the soup. Thus, the prevention of violence may require correction of the conditions that produce the frustration that finds its outlet in assault and physical injury. Prevention, at another level, means detecting those among us most subject to uncontrolled violent expression and altering their personal adjustment or life circumstances – keeping the socially sick from becoming even sicker. To date, the best we seem able to accomplish is a kind of fire-brigade psychology in which we get to the conflagration shortly after the barn has burned down. We cannot prevent what we cannot comprehend, and the older generation has yet to understand that times are different – that their musty memories of their own youth are a confused guide to the future. We have become the unwilling victims of the speed with which cultural change is taking place and we have become an Uncomfortable Generation. What has been lost to us is the comfort of slow-motion change that once gave us enough time to adjust, adapt, and come to terms. We have lost sympathy with the needs, anxieties, and frustration of the modern young and have forgotten that, throughout history, violence has been an anguished outcry of the hopeless, the frightened, and the insecure. To prevent is not the same as to stifle or ignore. We must find a means to render less alien this new generation, the placard-bearing, social-protesting, civilly disobedient segment of our social fabric.
The Excuse for Violence Violence has always cloaked itself in the garments of some means of making it legitimate. In defense of violence, man has insisted that he was provoked beyond all human endurance; he has stated that he was not responsible by reason of insanity; he has pointed out that he acted only in self-defense; he has claimed that honor and manhood required violent response; he has maintained that he never intended to produce the outcome that occurred; he has said that what he did was for the ultimate good of society; and he has felt, if not said, that his actions were inescapably necessary given the situation in which he found himself. Theoretically, these reasons are an inadequate apology for human violence; in real life, these explanations are a valuable catalogue of excuses for destruction of one’s fellow man. Every society manages to teach a certain proportion of its members that these reasons for violence are acceptable and sensible explanations for recourse to injury of one’s fellow man. If we teach some of our young that nonviolence is a luxury to be afforded only when conflict is not intense, then violence will never be dropped from the repertoire of human responses because, in certain circumstances, crime does pay, and may even be pleasurable. Physical assault too often produces exactly the outcome for which it was designed. Children bully one another and get away with it; adults threaten one another and achieve their goals; parents encourage violence in
their children in conscious and unconscious ways; society rewards violence if it is conducted in good taste and is a means to a socially agreed-upon end; and subtle forms of social blackmail have long been an important aspect of man’s interpersonal relations both on an individual and international level.
Violence and Leadership Mob violence, while distressing, and often fatal when it reaches its frenetic, fever pitch, remains a fairly isolated and infrequent event. Our anxiety is misplaced if it dwells for long on mob destruction, because the primary issue to be resolved is that of the violent leader who stimulates to action the impulses of those who would be less violent if not provoked. If the urge to power among the mature is substituted for the beery motivation of the young, we can assemble a fatal equation. Political violence is far more dangerous than the panty-raids of the young. Our larger and more complex cultures demand cultured and sophisticated forms of violence in the service of power; the less “developed” the national unit, the more convenient and comfortable it is to wear the shroud of raw and apparent violence. It is in the setting of an emerging country that the leader makes his most visible contribution to the aggressive course of human affairs, but his influence is no less real in sophisticated cultures. The leaders of people do not issue from the common mold of men; they tend, rather, to be drawn from among those deviates from the average whose personal charisma matches closely the needs and spirit of the times. Leaders with the unique ability to draw the human race willingly down the path to its eventual destruction must – in this view of humanity – have assembled a collection of personal characteristics and ways of behaving that fit the temper of the times and match the age in which they live. The nature of their developing years is a critical factor in understanding their response to the state of the world. Despite the insistence of some theorists that the complex and highly interdependent organization of society acts to emasculate the forceful leader and render him powerless, it must be noted that even the advice and counsel of political associates must finally be shaped into a decision by one man. 
In truth, leaders rarely surround themselves with followers who are openly critical of their personality, life style, and decision-making techniques; leaders tend, rather, to establish a decision-making environment with a great deal of built-in consensus. The violent leader assures himself at least of sympathy and support and, often, of carte blanche for his actions. In so doing, the leader is less the victim of bad advice from others than he is the manufacturer of final consensus. Thus, group violence – at either the mob or the national level – has a series of preconditions which weld the needs of the group to the personality and psychological structure of the leader. These psychological forces become an
inseparable part of the current political and national conditions that define the direction that events will take. The conclusion stressed here is that the comprehension of individual or group violence will continue to be a mystery if the form of development of human personality and the form to which human psychic structure can be modeled are treated as nothing more than an annoying gadfly pestering the concept of large-scale violence. The complexity of the human psyche has made it so forbidding an area of exploration that modern theorists have discounted human personality as an important influence in the affairs of mankind. It is, indeed, an alien concoction and one not easily digested by the politico-economic-sociological theorists of this generation. Yet, denying that the psychological nature of man has relevance in understanding human violence has produced only a bankrupt and barren vision of the future of humanity. Man’s psychic nature cannot remain an unknown in the equation of violence or we will find ourselves presiding over the dissolution of the human race. In the course of development of the hostile human being destined for leadership, we see an organism fashioned to perceive a world composed primarily of threatening elements – threatening to him as a person and threatening to his conception of the way things ought to be in the world. The threat so visible to such a person is reacted to rapidly, intensely, and violently. Thus, his violent response happens easily, it happens often, and it needs little provocation. Faced with threat, the aggressive leader has few alternative forms of response at his command and, being incapable of tolerating stress, he falls back rapidly on the only response that has served him faithfully in the past. Cornered, he is incapable of a rational judgment free of the urge to aggrandizement or the impulse to strike out and destroy those he perceives as plaguing him with anxiety. 
We are rapidly approaching that point in time when the fate of humanity will be cradled in the sweating palm of just such a person. At this fatal juncture in the history of man we may pay sorely if we fail to recognize that violence and human development are twin facets of the same basic process. The dehumanized study of violence is very much like pretending that “things” and “abstract conceptions” of political-economic-social events have an existence all their own, and should be called “living” systems. I think that the historians of 1984 will conclude that “the proper study of man is mankind.” I am convinced that an understanding of the pattern of human development is the key that will one day suggest a workable plan for controlling violence in Homo sapiens.
Notes
1. J. P. Scott, “The Anatomy of Violence,” The Nation (1965), pp. 200, 662–666.
2. Eric Hoffer, “A Time of Juveniles,” Harpers, Vol. 230 (June 1965), p. 238.
3. William James, Memories and Studies (1911).
3 The Life-course and Human Development: An Ecological Perspective
Glen H. Elder, Jr and Richard C. Rockwell
Source: International Journal of Behavioral Development, 2 (1979): 1–21.
The ecology of human development relates patterns of development to the enduring and changing environments in which people live (Bronfenbrenner 1977). This enterprise has much in common with prominent analytic concerns of the flourishing early stage of the social sciences in the United States, the 1920s and 1930s. W.I. Thomas, among others, made a compelling case for an historical and comparative study of life patterns in their sociocultural environments (Volkhart 1951). Since then a number of developments in theory and method have separated the study of lives from social context, as implied by the critical title of a recent essay, ‘Bringing society back in’ (Barton 1968). In studies uninformed by the life-course and its historical context, the study of development has generated knowledge bearing an uncertain relationship to the actual lives of individuals (Baltes et al. 1977). By ‘bringing context back in’, the ecology of human development has given new vitality to three analytic themes long dormant in research. First, it reasserts the significance of place by attending to the family, neighborhood, and larger community as settings of development. Second, it charts the course of families and lives by focusing attention on age differentiation in the timing and coordination of events. Third, it acknowledges the importance of historical time by a concern with events, crises, and social change. The sociological analysis of age, life-span developmental psychology, life history methodology, and social demography have converged in the past decade in a life-course perspective on human development from birth
to old age. This perspective offers a fruitful way to address each of the above themes. We briefly outline this perspective, and provide an example of its application in a study of unwed teenage mothers, and contrast it to other perspectives on life change, careers, and social position, in which research practice falls short of potential. Throughout the essay our concern is with the fundamental role of problem formulation in research, theory development, and method.
The Life-course Perspective: Essential Elements and an Exemplar The life-course perspective locates individuals in age cohorts and thus in historical context, depicts their age-differentiated life patterns in relation to this context, and illumines the continual interplay between the social course of lives and development. The relation of age and time lies at the core of this perspective and is expressed in three temporal meanings: (1) chronological age marks developmental time as a simple index of stage in the inevitable process of growing older; (2) social age identifies age patterns in social roles and timetables; and (3) historical time enters through a concern with birth year as it relates membership in a specific cohort to the experience of history and social change. Each meaning of age informs our study of pathways through the age-differentiated life course and their developmental implications. The life-course refers to these pathways, to social patterns in the timing, duration, spacing and order of events and roles.
Social Age Differentiation in the life-course arises from social meanings of age, as well as from biological facts of birth, sexual maturity, and death. Throughout history and across cultures these social meanings have varied, as evidenced by the shifting meaning of ‘childhood’. Norms, expectations, privileges, and constraints express societal distinctions regarding age. Age strata are socially recognized divisions of the life span which constitute a basis for identity and specify appropriate behavior. In complex societies age structures and timetables are plural; the individual life-course is comprised of interlocking careers, such as those of work, marriage, and parenthood (Elder 1975). The scheduling of events and obligations thus becomes a problem of how resources and pressures are managed. The economic squeeze of early childbearing illustrates the adaptive problems that stem from asynchrony between resources and demands. This perspective assumes that the consequences of events in the life-course vary according to their context and timing. There are cultural definitions of
appropriate times for schooling, leaving home, marriage, and childbearing. As a rule, individuals are aware of how the timing of their lives fits with cultural timetables and of the consequences associated with off-timed events (Elder 1975). Extreme departure from cultural timetables often entails decisions among undesirable options, and formal and informal sanctions. The plight of the unwed teenage mother illustrates this bind, for she has few desirable options open to her. Adult career progress is judged in terms of life phase. For example, one type of status inconsistency (occupational status well below education) is normatively inconsistent and a source of distress only in middle age, a time of peak earnings and status (House and Harkins 1975). Moreover, a complete understanding of status inconsistency during the middle years requires knowledge of the process by which it occurred. Both midlife demotion and prolonged worklife instability may produce the same inconsistent pattern, but their implications for health and well-being are bound to differ.
Historical Time Birth year locates people in history just as social age locates roles in the social structure. Individuals are exposed to a slice of historical experience in the process of moving through age-graded roles, and they share much of this life experience with other members of their cohort. Cohort membership acquires substantive meaning when we relate cohort experiences and characteristics (such as composition and size – themselves products of historical experience) to historical events and trends. Size differences between the birth cohorts of 1930–34 and 1946–50 reflect the historical experience of the Depression and postwar years. During times of rapid change successive cohorts are likely to differ in life patterns. They encounter the same historical event at different points in their life-course and thus differ in their experience of it. A recent comparative study of two cohorts of men (birthdates 1920–21 and 1928–29) found substantial cohort differences in the effect of Depression hardship on psychological development from childhood to middle age (Elder and Rockwell 1978). Deprivation (relative income loss between 1929 and 1933) imposed a greater burden on members of the younger cohort, for family hardship occurred at an earlier age and spanned a longer period of their lives. On the transition to adult status, Reuben Hill (1970: 322) observes that each cohort in periods of rapid change “encounters at marriage a unique set of historical constraints and incentives which influence the timing of its crucial life decisions, making for marked generational dissimilarities in the life cycle career patterns”. In summary, a life-course perspective directs inquiry toward understanding the process by which lives are lived. As we trace the impact of larger contexts
and distant events to the world of the child and his family, we find that knowledge of the social course of families and individual lives is fundamental. Through an understanding of the life-course and its consequences for development, we are able to explain the process by which early life events are related to later events. Age relates history and social structure in the human biography, and it is through age differentiation that we find the implications of time and place for development. With these general points in mind, we turn to a specific example of a life course study.
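Before that example, the cohort comparison described above can be reduced to simple arithmetic on birth year and event year. The sketch below is only an illustration: the birth years and the 1929–1933 window come from the text, while the function name `age_span` and the choice to represent each cohort by its first birth year are assumptions made here for brevity.

```python
# Birth years and the 1929-1933 window are taken from the text; using the
# first birth year of each cohort is a simplifying assumption.
COHORTS = {"older (1920-21)": 1920, "younger (1928-29)": 1928}
DEPRESSION_START, DEPRESSION_END = 1929, 1933


def age_span(birth_year, start, end):
    """Approximate ages at the start and end of a historical period."""
    return (start - birth_year, end - birth_year)


ages = {
    label: age_span(year, DEPRESSION_START, DEPRESSION_END)
    for label, year in COHORTS.items()
}

for label, (at_start, at_end) in ages.items():
    print(f"{label}: about age {at_start} to {at_end} during the income loss")
```

On these assumptions the older cohort met the period of income loss in middle childhood and the younger cohort in infancy and early childhood, which is the sense in which the same historical event arrives at different points in the life-course.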
Unwed Teenage Motherhood as a Moral Career Furstenberg (1976) has advanced the study of unwed teenage motherhood by applying the concept of career to a topic formerly viewed in terms of simplistic, atemporal concepts. Prior research generally identified specific kinds of people most likely to have illegitimate births; it viewed unwed motherhood as an event isolated from the life-course. In contrast, Furstenberg showed that a birth out-of-wedlock represents one point in a moral career (see also Rains 1971) and that specific sequences of events lead to an illegitimate birth. At each stage young women have an option: premarital sex or not, contraception or not, abortion or not, marriage or not. Only a few of the possible paths lead to an illegitimate birth, and Furstenberg explored why some girls followed these paths and others did not. After the birth girls encounter further decisions: abandonment of the child, putting the child up for adoption, marriage or single parenthood, more illegitimate births, educational and vocational options, entry into the welfare system or economic independence. The impact of illegitimate births depends on how the career of unwed motherhood meshes with the other careers of marriage, occupation, and education. Each point of decision lies at a different stage in a career, and each requires a different explanation. A full understanding of the sources and results of unwed teenage motherhood involves linking these separate explanations into a broader perspective. Furstenberg’s study of the association between out-of-wedlock births and economic dependency shows the extent of his departure from previous analyses. As background to his study of a longitudinal sample of mostly black adolescents living in low-income areas of Baltimore, Furstenberg reported two studies that obtained conflicting results on the economic effects of premarital births. Cutright (1973) studied women who had borne children, dividing the premaritally pregnant from others. 
His comparison of these groups indicated that no ill effects of premarital pregnancy occurred if the mother married. In contrast, Coombs and others (1970) observed a longterm economic disadvantage of premarital pregnancy among two samples of married women. Though differing samples restrict comparability, these contradictory findings may reflect an incomplete research question: do
women who are economically dependent tend to have a history of premarital pregnancy? Answers to this question do not reveal why the effect is observed. Furstenberg poses a different question: “How many women with similar childbearing careers manage to remain economically independent [and] how many with entirely different histories ultimately end up on welfare?” (emphasis author’s 1976: 148). By investigating how “the process of recovery is achieved” and “the critical conditions that determine whether or not the economic consequences of premarital pregnancy will be temporary or persistent” (1976: 149), Furstenberg is able to explain the effects of unwed teenage motherhood. Recovery from economic loss turned upon marriage, education, household composition, and additional children – not upon personal values, for few young mothers desired public welfare. Marriage was a critical decision in economic recovery; women who did not marry had little chance of recovery. Though employment increased prospects for economic recovery, young mothers entered the labor market with a handicap. They were deficient in education and experience, younger than their competitors and often dropouts from school. They suffered labor market discrimination against women, and because most were black, they also faced racial discrimination. The jobs they obtained often did not even cover the expenses of child care and maintenance of a family. If other adults were present in the household, the young mother was often able to use this economic and child-care help to finish school and bring home net income. Finally, if the young mother had additional births, child-care problems often rendered it impossible for her to get a job that provided net income. Persistent economic dependency thus turned not upon the event of unwed pregnancy itself but instead upon which of several pathways through the life-course were followed by the young mother. 
Furstenberg’s analysis also helps us to understand the effects of illegitimate birth upon children. The young mother’s occupational and marital status after childbearing made the greatest difference in her child’s cognitive and social development. Her status at the time of the birth made relatively little difference. Indeed, children in families with a father present displayed cognitive skills almost equal to those of children not born out-of-wedlock. Thus Furstenberg’s analysis illumines the process by which some unwed teenage mothers were able to repair the damage of an illegitimate birth in their own lives and in the early lives of their children.
Practice and Potential in the Study of Lives In this section we examine selected developmental studies in which the research problem neglects temporal distinctions emphasized in the life-course perspective. Problem formulation – underlying both theory and method – fails to meet the demands of developmental research for an understanding
of process. Research on the family commonly gives no attention to temporal variations in family life that are related to timing of events, and studies of careers fail to examine pathways that connect events widely separated in time. Research on the psychological effects of life change all too frequently ignores when the change occurs, its nature and relation to other life events. These deficiencies stem from research questions that disregard two principles of life-course analysis: first, the effects of an event depend on its timing and relation to other events; and second, the social and developmental meaning of an event is derived from its context and from life history. On matters of timing Furstenberg focused on the effects of a disturbance in the normative schedule during a woman’s adolescence. Such effects would not be seen if the illegitimate birth had occurred some ten or fifteen years later, after the completion of school and a period of work and accumulation of assets. Likewise, late marriage differs from early and on-time marriage in divergent patterns of disadvantage and advantage: late marriers often have well-established worklives and sometimes advanced education, but they also have a smaller number of potential mates (Elder and Rockwell 1976). Economic gain or loss bears different meaning when household size is expanding than when it is contracting, and when economic demands of children are high or low. The analytic significance of these temporal matters is underscored by unsatisfactory explanations when they are slighted. The second principle distinguishes between the cross-sectional and the longitudinal study of lives. Consider studies of the relation of socialization to family status. Cross-sectional analysis is not sensitive to the socioeconomic history of the family, nor can it attend to consequences of status change for childbearing. 
A sample of working-class families may include the downwardly mobile, the upwardly mobile from the laboring class, and the stable working class. Although each type of family is ‘working class’ in cross-section, they have substantially different aspirations and provide different resources for children (Elder and Rockwell 1978). A life-course perspective on family status moves beyond correlations or regression coefficients between statuses at different points in time to examine the process or paths that link events at different times. Thus some working-class men who advance into the middle class by mid-life do so through an orderly pattern of worklife progression, while others switch lines of work. The status change may occur early for some and late for others. One would not expect a single explanation of mobility to suffice for each of these patterns. In what follows, we explore the cost of ignoring these two principles for knowledge about lives and briefly suggest modes of inquiry that are informed by a life-course perspective. We begin with a study of status differences in psychological status and indicate the inadequacy of research that fails to view status within specific phases of the life-course. This is followed by a study of ‘careers’ that is not guided by a concept of the life-course and how it is socially patterned. Finally, we identify both of these weaknesses (treating
status apart from time and apart from the socially patterned life course) as major flaws in analyses of children’s socioeconomic environment and of life change in relation to psychological functioning.
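The second principle, that a cross-sectional reading of status conceals the history behind it, can be made concrete with a small sketch. The three families, the three-point status histories, the rank ordering in `RANK`, and the function names are all hypothetical and are introduced here only to illustrate the working-class example given above.

```python
# Hypothetical status histories observed at three successive points in time.
# The last entry is what a cross-sectional survey would record.
histories = {
    "family_a": ["middle", "middle", "working"],      # downwardly mobile
    "family_b": ["laboring", "laboring", "working"],  # upwardly mobile
    "family_c": ["working", "working", "working"],    # stable working class
}

# Assumed ordering of the strata, lowest to highest.
RANK = {"laboring": 0, "working": 1, "middle": 2}


def cross_section(history):
    """Status as seen at the most recent observation only."""
    return history[-1]


def trajectory(history):
    """Direction of movement between the first and last observation."""
    start, end = RANK[history[0]], RANK[history[-1]]
    if end > start:
        return "upward"
    if end < start:
        return "downward"
    return "stable"


snapshot = {name: cross_section(h) for name, h in histories.items()}
paths = {name: trajectory(h) for name, h in histories.items()}

print(snapshot)  # every family is classified as 'working'
print(paths)     # three different routes to that same classification
```

A fuller life-course treatment would also record when each change occurred and in what order, which a first-to-last comparison like this one does not capture.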
Status Variations in Psychological States Values, attitudes, and psychological functioning reflect the constraints and opportunities of life situations; education, occupation, and income structure this context (see Kohn 1977). But we still know very little about the mechanisms that link social position to life outcomes. Why are differences in occupation relevant for health and child care? A life-course perspective focuses inquiry on these mechanisms, and orients research to potential historical and lifetime variations in the psychological effects of status. The impact of status varies across the life span as status and status change assume different meanings within the normative context of age strata. Age-graded standards give specific meanings to status. Promotion to senior partner in a law firm has different meaning for lawyers at the midpoint and at the end of their worklives. Prospects for advancement diminish in later life. The effects of loss of status are different among older and younger workers; opportunities, obligations, resources all differ by age. Moreover, occupation is not expected to reflect education, or income to reflect occupation, in the early years of worklife. During the middle years job advancement and earnings more nearly approach their lifetime peak. These observations favor analysis which views the psychological correlates of status by life stage or phase. The appropriate analysis is one in which status patterns are linked to psychological states within age strata – such as young adult, the early and later phases of middle age, and old age. But we see no evidence of this recommended method in one of the more ambitious studies conducted on status differences as they affect people’s lives. 
Curtis and Jackson (1977) sampled men in six American communities (male heads of households, 21 years of age or older) for a study of the sources of educational, occupational, and income inequality, and their psychological consequences (perceptions of the class structure, conservatism, anomia, and punitiveness). We shall only deal with those portions of the study that bear upon age-related lifetime variations in psychological states. Age clearly has relevance for this problem, but the authors use age as a statistical control, not as an index of context [1]. In their regression analyses Curtis and Jackson assume that the apprentice lawyer has the same attitudes whether younger or older; that an increase in earnings of $1000 has the same impact for men starting out and concluding their careers. The simple adjustment for the ‘effects’ of age assumes additivity and linearity where theory underscores the need to examine interaction. They are thus prevented from observing that higher-status jobs reach an economic peak
later in life than lower-status jobs; that imbalances between supply and demand contribute to the stresses of childbearing in family life; and that income acquires psychological meaning in relation to demand, which varies over the family life-course. Although they do acknowledge potential differences in the relation between status and attitudes by life stage (1977: 156–157), this expectation is not based on an understanding of the life-course. In our judgment the study’s basic flaw stems from a research problem that is uninformed by the sociology of age and the life-course. The life-course perspective calls for a study of the relation between psychological states and status within life stages. Such a study would select a sample stratified by career phase: men just entering the labor force, men at the peak of their careers, and men nearing retirement. Analysis could then examine the multiple sources of status differences (education, occupation, and income) in relation to their various effects within each stratum. Age patterns in norms and career progress support the expectation of systematic variation in status effects within age strata.
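The difference between adjusting for age and analyzing status within age strata can be illustrated numerically. The data below are invented for this purpose and are not drawn from Curtis and Jackson or any other study cited here; the stratum labels, the variables, and the helper `ols_slope` are likewise assumptions. The sketch estimates one income slope with age stratum entered as an additive control (obtained by centering both variables within strata) and then a separate slope for each stratum.

```python
from statistics import mean


def ols_slope(xs, ys):
    """Least-squares slope of y on x in a simple bivariate regression."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical records: (age stratum, income in $1000s, distress score).
records = [
    ("young", 8, 5.0), ("young", 10, 5.0), ("young", 12, 5.0),
    ("middle", 8, 8.0), ("middle", 10, 6.0), ("middle", 12, 4.0),
    ("late", 8, 6.0), ("late", 10, 5.5), ("late", 12, 5.0),
]
strata = ("young", "middle", "late")

# Separate slope within each age stratum (age treated as context).
by_stratum = {}
for s in strata:
    rows = [r for r in records if r[0] == s]
    by_stratum[s] = ols_slope([r[1] for r in rows], [r[2] for r in rows])

# Additive control for stratum: center income and distress within each
# stratum, then fit a single common slope to the centered values.
cx, cy = [], []
for s in strata:
    rows = [r for r in records if r[0] == s]
    mx, my = mean(r[1] for r in rows), mean(r[2] for r in rows)
    cx += [r[1] - mx for r in rows]
    cy += [r[2] - my for r in rows]
adjusted = ols_slope(cx, cy)

print("common slope, age as additive control:", round(adjusted, 3))
print("slopes within strata:", by_stratum)
```

In this invented example the single adjusted coefficient is a modest negative number, while the within-stratum slopes range from zero among the young to a strong negative relation in middle age. The additive adjustment reports one effect where the stratified analysis shows the interaction that the life-course argument anticipates.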
Social Status vs. Career The study of careers involves questions that cannot be answered by information on a person’s status at a point in time, or by the relation between statuses at different points in the life-course. Though a person’s first job may predict his last job with fair success, the association does not tell us about his occupational career between these points – stability of a line of work, status change, idleness, and shifts in employer. The concept of career refers to a sequence of activities that are functionally related across settings. In this sense a career is roughly the same as a person’s life history in work, marriage, parenthood, or consumption. Career analysis is oriented toward the process of situational change and its implications. This task is illustrated by a study which found relatively high levels of worklife achievement in men with incongruently low levels of formal education (Elder and Rockwell 1978). These men, born in Berkeley, California, just before the Great Depression, had grown up in deprived families. Whether middle or working class, they obtained substantially less education than did nondeprived men. However, at midlife, there was no difference in average occupational status between deprived and nondeprived men. Contrasts in worklives resolved this incongruity: deprived men generally began their worklives and established a stable line of work at earlier ages. This pattern of accelerated career formation countered the handicap of Depression hardship and limited education. In the Berkeley analysis the research problem stemmed from the convergence of life patterns among men who entered the labor force with significantly different historical and pre-adult experiences. Another type of career question
starts with people in a common situation and seeks to explain why some are more successful than others. This is the question which Coleman and associates (1972) explored in a sample of white and black men born in the 1930s. Working with men in the same occupational stratum at first job, they sought the “mechanisms which lead to differential levels of success” some ten years later. Additional education emerged from regression analyses as the most significant source of success, especially among white men. Occupational events were second in importance for whites; marital and family events, for blacks. When combined, these factors accounted for a substantial portion of the variation in men’s status after ten years, but they do not explain the process by which specific worklife or family events made a difference in level of success. Moreover, the study does not place such events in the context of temporal phases of careers. For example, number of jobs and employers assume different meanings when part of orderly or disorderly careers (Wilensky 1961). Though event timing and sequencing do not enter Coleman’s analysis, a life-course perspective would orient the study to such concerns. When did marriage occur relative to work entry, exit from education, and military service? Timing of the first birth bears directly upon worklife pressures of family needs, but this was not part of the research. The study also does not distinguish between career costs of unemployment at the beginning and end of the ten-year period. Overall, the Coleman study exemplifies research that lacks theory on careers and the mechanisms by which men attain differential success. We find a similar deficiency in Robert Sears’ (1977) longitudinal study of sources of occupational satisfaction among men near the end of their careers (average age 61). The men were members of Terman’s original sample of gifted children in California. Unlike Coleman, Sears did not focus his study on life-course questions.
His analytic task entailed prediction of satisfaction, not an exploration of life patterns which have differing implications for the later years. However, a man’s satisfaction with what he has done in his occupational life is a function of who he is, where he started from, how he arrived at his final position, and what he did along the way. It is a product of the life-course. The same degree of satisfaction may have different meaning for men who followed different paths. Sears used a step-wise regression procedure for the selection of predictive factors. Not surprisingly, prior attitudes emerged as the most substantial predictors of occupational satisfaction at age 62. Work satisfaction in 1960, “extent of having lived up to intellectual capacity”, and vitality in 1972 assumed statistical precedence over worklife and the life-course. This suggests that attitudes regarding work are more powerful determinants of work satisfaction than the career itself. Indeed, Sears concluded from his analysis that work does not matter in work satisfaction (an invalid interpretation even on statistical grounds, Duncan 1970): “it looks as if there were some continuing affective quality” rather than “the objective facts of life” that determines work satisfaction.
It is most unlikely that analysis undertaken from a life-course perspective would support this conclusion (Elder 1974; Kohn 1976). Though Sears’ work may be a perfectly valid description of attitudinal correlates of work satisfaction in later life, it does not help us understand the psychological consequences of the various routes men followed to old age. The study fails to do justice to the diverse realities in men’s lives. A life-course study of sources of work satisfaction would trace early differences, such as class origin, aspirations, and interests, to occupational choices, education, and career formation. Orderly work-lives would be differentiated from disorderly; early career establishment, from later; and upward mobility, from downward mobility and stability. Work satisfaction would be linked to these differences. Certain career lines offer satisfaction through steady progression; others are gratifying because they represent unplanned achievement; and still others yield satisfaction for men with low aspirations. Although some of these variables enter Sears’ analysis, they are not ordered in a life-course account of men’s work satisfaction at age 62.
The Family Economy in Children’s Lives
We criticized the Curtis and Jackson study for its failure to examine the psychological meaning of status within the life stages of men. The Coleman study ostensibly focused on the mechanisms of differential achievement, but it did not in fact examine this process from a career perspective. Both of these limitations also appear in traditional concepts of the socioeconomic environment of children. Child development is a temporal process, yet research in this area has generally relied upon atemporal measures of the child’s socioeconomic environment such as parental occupational status and income. In her review of the literature on social class and development, Cynthia Deutsch (1973) emphasized the diversity within general class strata, but she did not acknowledge the limitations of atemporal measures of family position for developmental research on children. The problem of diversity within classes is at least matched by that of variation in socioeconomic career of families. As a panel study documents (Lane and Morgan 1975), poverty is not a stable condition for a substantial number of lower-income American families. Over a six-year period families moved above and below the poverty line. Static measures of economic wellbeing represent a mixture of temporal patterns that obscures their social and psychological significance. Development, social reality, and the life-course perspective all make a persuasive case for temporal concepts of children’s socioeconomic environments, but this is only a first step. It is also necessary to recognize that both family income and composition change over time and that their effects on children cannot be fully understood apart from their relationship. Few studies have actually explored this relationship
or its consequences for family interaction and child development. The familiar concept of the family life cycle captures change in family composition resulting from the addition and departure of children (Elder 1978). Change may also occur when parents die, divorce, or remarry, or relatives move in or out of the household. Studies have examined marital relations and parenting in relation to family stage, and a critical review of the literature (Clausen and Clausen 1973: 186) cautions that “the meaning and consequences of having a given number of children in the family will vary with each phase of the family cycle”. The meaning of family stage and size also varies by the timing of events. Variations of ten years or more in mother’s age at first birth produce large differences between the age and career position of parents in the childbearing stage. Even within the same occupational stratum, late marriage and childbearing offer a number of socioeconomic advantages when compared to the early timing of these events. The later the events occur, the more both husband and wife are able to accumulate material resources and augment their income. As Freedman and Coombs (1966: 648) point out, couples “who have their children very quickly after marriage find themselves under great economic stress, particularly if they married at an early age …”. These effects are not adequately specified in terms of social status or family stage alone; problems of child care, parental stress, and family management arise from the temporal relation between socioeconomic career and family composition. By focusing on this relationship we obtain a simple model of the family economy that takes into account demands and contributions of all members of the household. Economic consequences of change in household composition stem from the relation between supply and demand, from earning levels and number of earners, and from the number of young and old dependents. 
Change in the family economy occurs as the household head ages and changes roles; as children arrive, age, and depart; and as productive family members are lost through disability, death, divorce, and new family formation. One of the most significant results of studies based on this concept is that a family’s economic welfare has more to do with household composition than with economic loss or gain among family earners (Lane and Morgan 1975: 50). How do families adapt when resources fall below, match, or exceed demands? Gove and associates (1973) identify three responses to decline: (1) efforts to control and reduce consumption – a reduction in living standards, such as a move to lower-cost housing; (2) reallocation of time and energy resources – more labor intensive operations, employment of the wife, double shift work; and (3) attempts to balance income and outgo through credit, use of savings, and loans from kin. Young and Willmott (1973: ch. 7) identified three adaptations to economic squeeze in large families: highly paid overtime work, shift work, and moonlighting. Reentry of mothers into the labor force after childbearing may reflect accumulated pressures of debt, aspirations, and pending educational costs. A child’s family environment
thus changes as families adapt to new relations between household composition and resources. These changes create pressure points in the family’s experience that increase the likelihood of family problems. Sensitivity to the relation of household composition and resources leads us to study temporal variations in a family’s support network and its ties to community agencies and institutions. For many years sociologists have explored the complex meaning of a family’s residential environment in the socio-psychological effects of neighborhoods on children. Families with majority- and minority-status in a neighborhood – such as middle-class families in a working-class neighborhood (Rosenberg 1975) – have been compared on environmental influences. But appropriate attention has not been given to family attributes such as career stage and its relation to income. Middle class families in working class areas may include aspiring, younger couples with grade school children and families that have suffered financial misfortune. Such differences in family history and stage are relevant to assessments of neighborhood composition and effects. Residential choice is one adaptation to the relation between household composition and economics and acquires meaning from its context within the life course.
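The relation between household resources and demands discussed in this section can be illustrated with a small sketch. It is not drawn from any of the studies cited: the income-to-needs ratio, the per-person need weights, and the household figures are all hypothetical values chosen only to show how a change in composition can alter a family's position while earnings stay constant.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical annual need per household member (illustrative units only).
NEED_PER_ADULT = 4000
NEED_PER_DEPENDENT = 2500


@dataclass
class HouseholdStage:
    label: str
    earnings: List[int]  # one entry per earner (supply side)
    adults: int
    dependents: int      # young and old dependents (demand side)


def income_to_needs(stage: HouseholdStage) -> float:
    """Ratio of total earnings to estimated household needs."""
    needs = stage.adults * NEED_PER_ADULT + stage.dependents * NEED_PER_DEPENDENT
    return sum(stage.earnings) / needs


# One hypothetical family observed at three points in its career.
stages = [
    HouseholdStage("couple, no children", [12000], adults=2, dependents=0),
    HouseholdStage("three young children", [12000], adults=2, dependents=3),
    HouseholdStage("second earner enters labor force", [12000, 5000], adults=2, dependents=3),
]

for stage in stages:
    print(f"{stage.label}: ratio = {income_to_needs(stage):.2f}")
```

In this sketch the ratio falls below 1 in the second stage through the addition of dependents alone, and a second earner restores it. A single static measure of income would not distinguish the first two stages.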
Life Change and Its Psychological Effects
Family responses to change are among the life changes which constitute important foci in developmental research. Family routines, roles, and relationships change as mother enters the labor force, as the father takes on overtime work (and has few hours to spend with children), and as grandparents begin to care for younger children. Change in the family economy entails change in the lives of family members. These changes are structured in part by norms regarding the life course, such as appropriate times for children to leave home and for mothers to enter the labor force. But changes may also be off-timed and conflict with other events and obligations, producing disadvantage and stress. According to a life-course perspective, the stressfulness of a life change depends on three primary considerations: (1) the nature of the change (drastic in alteration of customary habits or not, loss or gain, expected or not); (2) the life history of experience, expectations, and adaptive skills that one brings to the change (Elder 1974: ch. 2); and (3) the temporal context of the change – its position within the life course and relation to other events. On all of these counts, a parent’s death, mother’s employment, and father’s loss of job have different implications for child and family at each family stage. Geographic mobility entails minimal disruption for children in grade school; far greater disruption, if they are locked into school curricula, testing schedules, and peer networks of the high school years. A shift in line of work may be a gain in the early years of the worklife and a loss in later years.
The nature and stressful impact of a life change cannot be understood apart from knowledge of its temporal context, and the resources and beliefs people bring to it. However, this view of life change and its implications for health bears no relation to that proposed by Holmes and Rahe (1967). Their physiological view argues that life change itself entails risk of illness. This risk is not affected by the temporal context of the change and its relation to other changes, by any characteristics of the change other than a global perceived stressfulness, or by differences in the life histories of individuals and families. Using the method of magnitude estimation of psychophysics, they built a Social Readjustment Rating Scale (SRRS) for psychiatric and research use. The scale includes judgments of 43 changes that range from family relations and economics to social activities. A high level of inter-rater agreement has been achieved across samples of old and young, males and females, and different cultures (cf. Askenasy et al. 1977). Death of spouse is consistently ranked highest in magnitude of life change, followed closely by divorce and separation. At the low end of the scale raters place ‘change in eating habits’, ‘Christmas’, and ‘minor violations of the law’. However, this ranking is not independent of major social change: Janney et al. (1977) report that economic changes took precedence over other personal and family events in an earthquake-stricken Peruvian city. The SRRS has appeared in the work of medical, psychological, and sociological researchers. It (or a variant) is one of the most widely-used instruments for the study of life change in stress (Dohrenwend and Dohrenwend 1974; Gunderson and Rahe 1974; Wildman and Johnson 1977). The scale has earned a fair record in predicting health change, although mainly in questionable retrospective studies. But studies using the SRRS are necessarily divorced from theoretical and empirical knowledge about the life course. 
As a result we are unlikely to learn precisely what psychosocial processes link life change and health. The SRRS favors what might be termed a ‘trait approach’ to situations, rather than an approach which examines behavior as a function of transactions between person and situation (Eckehammer 1974). Each event receives a single ‘life change’ score. A change in living conditions entails the same degree of social readjustment for a single man and a father of adolescents, for young adults and the elderly, even though research and observation suggest readjustment demands greatly differ. Consider also the meaning of death of spouse, an event which typically occurs late in life according to demographic timetables. Death of a young husband leaves a family with few resources and heavy obligations, and can markedly alter the development experience of children. Although a spouse’s death may qualify as a life change of maximum proportions in all life stages, its effects depend on timing within the life-course. The SRRS makes little effort to specify the direction of life change, though others have assessed the different adaptive outcomes of loss (widowhood, empty nest) and gain (marriage, parenthood) (Lowenthal and Chiriboga
1973). But the implications of direction depend on timing, and Barbara Dohrenwend’s (1973) failure to consider timing might well account for her conclusion that gains and losses differ little in their effects on anxiety. The SRRS obscures the causal structure of life changes. The same global score indexes any number of causal sequences, each with differing health implications. Some changes, such as taking a mortgage or loan over $10,000, may be adaptations to pressure that, in fact, alleviate stress. Other events are clearly evidence of health change, such as change in sleeping habits. Hetherington and associates (1976) observed effects of a single life change, divorce, that range from changes in psychological functioning and self-perception to economic stress, effects which varied by characteristics of the marriage before divorce and by sex of the child. Family stability may be enhanced by ‘change in work hours or conditions’, ‘change in social activities’, and ‘marital reconciliation’ – all of which are presently scored as stressful (and perhaps destabilizing) events. When all such changes are lumped together into a single score, the researcher cannot specify the precise social and behavioral meaning of the score. The SRRS may predict stress and health decline, but we do not know precisely what it means or what process links life change to stress. Sociologists have taken preliminary steps toward introducing theory into a research form of the SRRS (Mechanic 1975). Hough and associates (1976) developed a revised version which incorporates some distinctions of direction and timing. This form distinguishes certain gains from losses: ‘health of family member become better’ and ‘health … becomes worse’ received ratings some 42 ranks apart. They also introduced limited ordering distinctions: birth of first child ranked 18; birth of second or later child, 34; and ‘gain of new family member other than child’, 22. 
Wife’s entry into and departure from the labor force produced no reliable difference, but additional precision would be needed to bring out the meaning of these events. The wife’s entry into the labor force after a long period of homemaking is probably more stressful than reentry after the early phase of child rearing. These clarifying steps are essential in making the scale more interpretable, but such steps ultimately document the theoretical limitations of a global measure of life change. Despite glaring deficiencies as a representation of life change, the SRRS’ utility as a predictive device generally affirms that life changes do follow a predictable order. Consensus in judging the magnitude of life change undoubtedly reflects this social order. The evidence of the SRRS suggests that people do have their own catalog of life events which are ranked by required readjustments and that this ranking is based on both normative and factual (biological, demographic) criteria. However, the SRRS’ research use has reversed steps in the process of inquiry: it has placed technique before problem formulation, explanatory theory, and an understanding of the lifecourse. Informed questions on life change and health do not lead in the direction of a global life change measure which ignores the timing and context of events.
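The argument that a single additive score conceals direction and timing can be made concrete with a short sketch. The event weights below are placeholders chosen for illustration, not the published scale values, and the `direction` coding and the two life records are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

# Placeholder weights for illustration only; not the published scale values.
WEIGHTS = {
    "divorce": 70,
    "marital_reconciliation": 45,
    "change_in_work_hours": 25,
}


@dataclass(frozen=True)
class LifeEvent:
    name: str
    age: int        # age at which the event occurred
    direction: str  # "loss", "gain", or "neutral" (hypothetical coding)


def global_score(events: List[LifeEvent]) -> int:
    """Additive life-change score: timing and direction are ignored."""
    return sum(WEIGHTS[e.name] for e in events)


def direction_profile(events: List[LifeEvent]) -> Dict[str, int]:
    """Weighted total per direction, one distinction the global sum discards."""
    profile: Counter = Counter()
    for e in events:
        profile[e.direction] += WEIGHTS[e.name]
    return dict(profile)


person_a = [LifeEvent("divorce", 34, "loss")]
person_b = [
    LifeEvent("marital_reconciliation", 34, "gain"),
    LifeEvent("change_in_work_hours", 35, "neutral"),
]

print(global_score(person_a), global_score(person_b))
print(direction_profile(person_a), direction_profile(person_b))
```

Under these placeholder weights both records receive the same global score of 70, although one consists of a loss and the other of a gain plus a neutral change. Adding age at the event or its order relative to other events to the profile would follow the same pattern.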
Overview: A Matter of Question and Perspective
We have sought to illumine some differences in problem formulation and perspective by application of a life-course perspective to selected problem areas. Our criticism has focused on the kinds of questions posed and on the method which was brought to bear. An ecologist of human development might have addressed any of these questions, and we suggest that a life-course perspective would be fruitful. Questions on context and process come readily from this perspective, a point well illustrated by Furstenberg’s research objective: to elucidate the effects of precipitate parenthood in adolescence by “exploring when, how, and why childbearing before the age of 18 jeopardizes the life prospects of the young mother and her child …” (1976: 1). Thomas (Volkart 1951: 114) once called question formulation the “hunting activity” of the creative social scientist. It is a core task of inquiry, and one where developmental research often falters. The goal is to trace out the linkages which explain processes, not simply to assess the validity of a hypothesis or theory. Depending on the problem, research may utilize case histories, clinical judgments, surveys, field experiments, and varied statistical techniques. Explanatory linkages, once teased out, must be tested for consistency, unambiguity, and invariance, using the full spectrum of statistical tools (see McCall 1977: 336). Theory and method are interwoven with substance and the flow of ideas through the course of explanatory research. In all of this, there is above all “the use of some imagination or mind from point to point”, as Thomas once put it (Volkart 1951: 84); the analyst “raises the question, at appropriate points, ‘what if’, and prepares a setup to test this query”. These succinct observations define an appropriate strategy for life-course analysis. 
The life course perspective offers a conceptual means of introducing temporal considerations and explanatory analysis to the study of lives and human development. Through its articulation of age and time, this perspective views persons in age-differentiated careers and phases over the life span. Career stages and their relation specify the meaning and consequences of life events. By locating people in historical context and in the social order, the sociology of age orients research to the process by which historical change is expressed in life experience. In this essay, we have explored some implications of the neglect of such temporal distinctions in a review of research on the psychic effect of status variation, on careers and occupational satisfaction, on children’s socioeconomic environment and the health impact of life change. Though each example addresses topics that are relevant to developmental study, their problem statements do not incorporate temporal principles on life patterns. In each case a lifecourse perspective suggests alternative modes of research of the process of human development.
Note
1. Curtis and Jackson occasionally introduce the ‘family life cycle’ as a dummy variable in their regression analyses. However, each stage is defined by role change and configurations, not by roles in relation to age patterns. Thus families in a particular stage, such as childbearing, will vary widely in age range and status. Moreover the approach does not specify processes by which families move from point to point in the life-course. Family cycle analysis conveys the erroneous impression that all families move through the stages and all at the same rate (see Elder 1977).
References
Askenasy, Alexander R., Bruce P. Dohrenwend and Barbara Snell Dohrenwend, 1977. Some effects of social class and ethnic group membership on judgments of the magnitude of stressful life events: a research note. Journal of Health and Social Behavior 18, 432–439.
Baltes, Paul B., Steven W. Cornelius and John R. Nesselroade, 1977. ‘Cohort effects in developmental psychology: theoretical and methodological perspectives’. In: W.A. Collins (ed.), Minnesota symposium on child psychology, Vol. 11. Minneapolis: University of Minnesota Press.
Barton, Allen, 1968. Bringing society back in. American Behavioral Scientist 12, 1–9.
Bronfenbrenner, Urie, 1977. Toward an experimental ecology of human development. American Psychologist 32, 513–531.
Clausen, John A. and Suzanne R. Clausen, 1973. ‘The effects of family size on parents and children’. In: James T. Fawcett (ed.), Psychological perspectives on population. New York: Basic Books, pp. 185–208.
Coleman, James S., Zahava D. Blum, Aage B. Sorensen and Peter H. Rossi, 1972. White and black careers during the first decade of labor force experience. Part I: occupational status. Social Science Research 1, 243–270.
Coombs, Lolagene C., R. Freedman, J. Friedman and W. Pratt, 1970. Premarital pregnancy and status before and after marriage. American Journal of Sociology 75, 800–820.
Curtis, Richard F. and Elton F. Jackson, 1977. Inequality in American communities. New York: Academic Press.
Cutright, Phillips, 1973. Timing and first birth: does it matter? Journal of Marriage and the Family 35, 585–596.
Deutsch, Cynthia, 1973. ‘Social class and child development’. In: Bettye M. Caldwell and Henry Ricciuti (eds.), Review of child development research, Volume 3. Chicago: University of Chicago Press, ch. 4.
Dohrenwend, Barbara Snell, 1973. Life events as stressors: a methodological inquiry. Journal of Health and Social Behavior 14, 167–175.
Dohrenwend, Barbara Snell and Bruce P. Dohrenwend, 1974. Stressful life events: their nature and effects. New York: John Wiley and Sons.
Duncan, Otis Dudley, 1970. ‘Partials, partitions, and paths’. In: Edgar F. Borgatta and George W. Bohrnstedt (eds.), Sociological methodology 1970. San Francisco: Jossey-Bass, pp. 38–47.
Eckehammer, B., 1974. Interactionism in personality from a historical perspective. Psychological Bulletin 81, 1026–1048.
Elder, Glen H. Jr., 1974. Children of the Great Depression: social change in life experience. Chicago: University of Chicago Press.
Elder, Glen H. Jr., 1975. ‘Age differentiation and the life course’. In: Alex Inkeles (ed.), Annual review of sociology, Volume 1. Palo Alto: Annual Reviews Inc.
Elder, Glen H. Jr., 1977. Family history and the life course. Journal of Family History 2 (Winter), 279–304.
Elder, Glen H. Jr., 1978. Approaches to social change and the family. Special issue, Sarane Boocock and John Demos (eds.), American Journal of Sociology.
Elder, Glen H. Jr., and Richard C. Rockwell, 1976. Marital timing in women’s life patterns. Journal of Family History 1, 34–53.
Elder, Glen H. Jr., and Richard C. Rockwell, 1978. ‘Economic depression and postwar opportunity in men’s lives: a study of life patterns and health’. Forthcoming in: Roberta G. Simmons (ed.), Research in community and mental health: an annual compilation of research. Greenwich, Connecticut: JAI Press.
Freedman, Ronald and Lolagene Coombs, 1966. Childspacing and family economic position. American Sociological Review 31, 631–648.
Furstenberg, Frank F. Jr., 1976. Unplanned parenthood: the social consequences of teenage childbearing. New York: The Free Press.
Gove, Walter, James W. Grimm, Susan C. Motz and James D. Thompson, 1973. The family life cycle: internal dynamics and social consequences. Sociology and Social Research 57, 182–195.
Gunderson, E.K. Eric and Richard H. Rahe, 1974. Life stress and illness. Springfield, Illinois: Charles C. Thomas.
Hetherington, E. Mavis, Martha Cox and Roger Cox, 1976. ‘The aftermath of divorce’. Paper presented to the meetings of the American Psychological Association, Washington, D.C.
Hill, Reuben, 1970. Family development in three generations. Cambridge, Mass.: Schenkman.
Holmes, Thomas H. and Richard H. Rahe, 1967. The Social Readjustment Rating Scale. Journal of Psychosomatic Research 11, 213–218.
Hough, Richard L., Dianne Timbers Fairbank and Alma M. Garcia, 1976. Problems in the ratio measurement of life stress. Journal of Health and Social Behavior 17, 70–82.
House, James S. and Elizabeth Bates Harkins, 1975. Why and when is status inconsistency stressful? American Journal of Sociology 81, 395–412.
Janney, James G., Minoru Masuda and Thomas H. Holmes, 1977. Impact of a natural catastrophe on life events. Journal of Human Stress 3, 22–34.
Kohn, Melvin L., 1976. Occupational structure and alienation. American Journal of Sociology 82, 111–130.
Kohn, Melvin L., 1977. Class and conformity. Chicago: University of Chicago Press (orig. pub. 1969).
Lane, Jonathan P. and James N. Morgan, 1975. ‘Patterns of change in economic status and family structure’. In: Greg J. Duncan and James N. Morgan (eds.), Five thousand American families – patterns of economic progress. Volume III: Analyses of the first six years of the panel study of income dynamics. Ann Arbor: Institute for Social Research, ch. 1.
Lowenthal, M.F. and D. Chiriboga, 1973. ‘Social stress and adaptation: toward a life course perspective’. In: C. Eisdorfer and M.P. Lawton (eds.), The psychology of adult development and aging. New York: American Psychological Association, pp. 281–310.
McCall, Robert B., 1977. Challenges to a science of developmental psychology. Child Development 48, 333–344.
Mechanic, David, 1975. Some problems in the measurement of stress and social readjustment. Journal of Human Stress 1 (3), 43–48.
Rains, Prudence M., 1971. Becoming an unwed mother. Chicago: Aldine-Atherton.
Rosenberg, Morris, 1975. ‘The dissonant context and the adolescent self-concept’. In: Sigmund E. Dragastin and Glen H. Elder, Jr. (eds.), Adolescence in the life cycle. Washington: Hemisphere, ch. 6.
Sears, Robert R., 1977. Sources of life satisfactions of the Terman gifted men. American Psychologist 32, 119–128.
Volkart, Edmund H., 1951. Social behavior and personality: contributions of W.I. Thomas to theory and social research. New York: Social Science Research Council.
Wildman, Richard C. and David R. Johnson, 1977. Life change and Langner’s 22-item mental health index: a study and partial replication. Journal of Health and Social Behavior 18, 179–188.
Wilensky, Harold L., 1961. Orderly careers and social participation in the middle mass. American Sociological Review 26, 521–539.
Young, Michael and Peter Willmott, 1973. The symmetrical family. New York: Pantheon Books.
4 The Family Conference: The Social Control of Human Development
David R. Buckholdt
Source: Journal of Family Issues, 4(4) (1983): 613–631.
There are alternative ways of approaching the study of human development and the life course. Psychologists have tended to focus on traits of the person such as cognitive ability, memory, perception, and motivation, and have shown how these skills and characteristics change (decline) with age (Birren and Schaie, 1976). There is also a strong interest among psychologists in describing human development as a series of stages through which we all must pass if we are to grow and develop normally or optimally (Piaget, 1932; Erikson, 1950; Kohlberg, 1969; Levinson, 1978). Social historians (Aries, 1962; Shorter, 1975; and Kett, 1977) and anthropologists (Mead and Wolfenstein, 1955; Whiting, 1963) also have contributed through their studies of the significance of age at different historical periods and the wide variation in cultural definitions of age-appropriate behavior. Sociologists have had a part in the study of the life course through their research on matters such as the influence of class position on social opportunity (Blau and Duncan, 1967), the effects of occupation on childrearing practices (Kohn, 1969), the impact of social disasters at childhood or youth on the adult (Elder, 1974), and the consequences for persons who marry early or late or have a few or many children (Aldous, 1978).
While the several approaches make an important contribution to our understanding of human development and the life course, they tend to blind us to the concrete social processes through which development or change is experienced in social interaction. It is as if factors external to human consideration and control force us to develop in one way or another. Seemingly, forces behind
the life cycle are beyond human experience and social construction. We are pawns in the play of stages, culture, historical moment, or class position. Without denying the obvious importance of such conditions or factors, it may be useful to look beyond or through them to see how matters such as human development and the life course are addressed as practical matters of everyday life. In a recent book, Karp and Yoels (1982) suggest such an approach, using the theoretical resources of symbolic interaction. They conceptualize the life course as an emergent product of social interaction, a social creation of the reality-defining and reality-sustaining work of human actors. While insisting on the importance of historical, cultural, and social-locational factors in matters relating to the life course, Karp and Yoels make their most important theoretical contribution by emphasizing the role of social process. They insist that aging and related matters of the life cycle are not merely relevancies imposed on persons by external forces, but are experienced products of immediate social processes whereby human actors engage one another in the business of coming to terms with the meaning of human development and the life course. Although historical, cultural, psychological, and other factors may frame and constrain the issues and provide a stock of acceptable questions and answers, the meaning or significance assigned to matters relating to the present, past, and future are arrived at through a process of symbolic work in concert with others.
Controlling the Definition of Development
A social interactionist perspective on the life course attends to the negotiated meanings of age and related matters in everyday contexts, rather than in predefined categories or stages that are outside of human experience and construction (Gubrium and Buckholdt, 1977). Given the significance of social context, individuals may encounter quite different inventories of symbols and experience different aging selves. A person can feel, or be made to feel, hopelessly old in one context and young and vital in another. The experienced truth of the matter is an emergent product of social process, not simply a question to be answered by decontextualized measurement or expert opinion. Yet there is something too fluid about this. Particular images, interests, and audiences serve to shape the social processing of development (Gubrium and Buckholdt, 1982). While the significance of age and the life course may arise as an issue in numerous settings, there are contexts in which these matters are particularly focused and controlled in terms of the formal attention given to them and the consequences of decisionmaking. I am referring to contexts in which one or more participants claim expertise or professional insight. They are often found within organizations or institutions that have a responsibility for assessing life course trajectories and providing
Buckholdt
Family Conferences
remedial or therapeutic help for those off-course or otherwise in need of care. Organizations like schools, hospitals, prisons, and residential treatment centers are examples. The process of symbolic or definitional work undertaken in these contexts is similar to what takes place in more informal, everyday settings, but there are also some important differences. Professional or expert judgment tends to be more self-assured and conclusive and less susceptible to alternative, nonprofessional opinion. And the interests of professional disciplines and organizations become entangled with judgments about the person, making the business of deciding on development and the life course much more than simply objective assessment of an individual’s needs and problems (Gubrium and Buckholdt, 1977; Karp and Yoels, 1982). There are important interests and control components to the social processing of the life course. This article deals with the social control of the meaning or relevance of the life course and age, particularly old age, in two organizations: nursing homes and a physical rehabilitation hospital. The particular setting of interest is the family conference, a meeting between professional staff and family members at which there is extended discussion of matters related to the resident’s past, current status, and likely future. The focus of the proceedings is on the resident or family member. Participants discuss the life course of the resident, particularly as this relates to success or failure in treatment and care and prognosis for the future. Although discussion is centered on the resident or family member, it is not the sole concern. The significance of age and the life course also involves consideration of the professional and organizational interests of staff members and attention by family members to their own development.
The Settings
Observations of twenty family conferences in a physical rehabilitation hospital were undertaken as part of a larger study of caregiving and professional practice in physical medicine and rehabilitation (Gubrium and Buckholdt, 1982). The hospital, a 92-bed facility, cared for patients in need of specialized services for the physical and psychological problems resulting from strokes, amputations, head trauma, hip fractures, and spinal cord injury. Normal stay in the hospital was 6 to 8 weeks. A family conference was scheduled for most patients, usually several weeks before discharge. Staff members in attendance normally included physical, occupational, activity, and speech therapists; a primary care nurse; a social worker; and occasionally a physician. Representatives of the family varied considerably from only a spouse to several generations of family members, and occasionally a neighbor or friend. The official rationale for the conference, which usually lasted for about one hour, included: to report on progress in treatment, to answer
Human Development
questions, to enlist the family’s help with problems, and to prepare the family and the client for a future that would be different from the past. Observations of 25 family conferences also were made in two nursing homes, one a 360-bed facility and the other with 270 beds. While both facilities provided a variety of levels of care, they were primarily skilled care institutions. Some of the residents were considered temporary, being inpatients until they recovered from an illness or an accident, but the majority were not expected to return to independent living or to leave the confines of the nursing home for any extended period. A family conference was scheduled for each resident about one month after admission. Staff and family members met to discuss reasons for and feelings about placement; problems and progress in therapy; what the family could do to ease the transition; particular concerns family members might have with medications, roommates, or staff members; and related matters. As in the physical rehabilitation hospital, a wide range of family members and others attended. Those who came regularly from the staff included a social worker, a nurse, and a representative from the activity department. Dietitians and physical and occupational therapists attended irregularly, but they usually did supply written progress notes that the social worker read to family members.
Charting a New Stage of Life
Once family members are seated around a conference table, a social worker typically explains that he or she has called the meeting in order for staff to meet the family, answer questions about the resident’s condition, and report on progress in treatment and care. Various staff members then report individually on their work with the resident and evaluate responsiveness or progress. Staff ask the family to apprise itself of life and care in the institution so that they can assist in motivating the resident and in making life there more comfortable and meaningful, the typical message being that if the resident is cooperative in treatment and therapy, he will fare better than he would without professional help. Staff members focus their presentations on disabilities, treatments, and possible futures for the resident. Family members are not often content to limit discussion in this way, however. They speak of the patient’s past, presenting him or her in various roles, how he or she was a successful professional, a loyal worker, or a generous and inexhaustible mother and wife. The stories often end on a note of remorse that the last phase of life must end this way, with a serious disability or being confined to an institution. Staff members listen to the negative imagery but are not content to allow discussion to dwell there too long. They have a message of hope, that the future need not be so discontinuous with the past if certain things are done. First, the resident must cooperate with their treatment plans. If their
advice is followed, at least partial recovery from the disability may be possible or life in a nursing home can be fulfilling. Second, the family must assist the professional staff. They need to understand that life for their kin and for them has changed. The family must assume a new role as “linker” or “facilitator” between the professional staff and the resident. They are still responsible but the burden must now be shared. It will not do to think of this as a hospital where responsibility for the care of a family member is given up only temporarily. A positive future will depend in part on how open they are to professionals. However this message is delivered, discussion typically returns to the resident’s past – but there is a new focus. Instead of portraying a life course which has been sadly diverted or redirected, attention is given to personal history, to characteristics or dispositions that may help to explain the resident’s responsiveness to treatment and care in the institution. For example, an elderly woman is evaluated as having made remarkable progress in her treatment for an amputated leg. Her son and daughter explain that, because of her “independent spirit” and “fighting instincts,” she has overcome numerous adversities in her life. All predict a positive future for her. An aged man is said to be adapting well to the nursing home. A son explains that his father has always been an “adaptive” and “cooperative” person, a man who does whatever he has to do. They decide he will enjoy his last days in a nursing home or at least he will not be too unhappy. There are also negative cases to explain. Staff complain that a 60-year-old male resident refuses to follow an exercise program to strengthen his arms and legs, which have been weakened by a stroke. His wife notes that she is sure that her husband appreciates their efforts, but that he is an “ornery” and “independent” person who was taught by his parents to do things for himself and has always done so. 
Once he gets out of the hospital and is more on his own, she is certain that he will follow some of their suggestions. Several staff members predict poor results and thus less independence in the future unless he follows their directions immediately. An elderly woman in a nursing home is said to constantly complain about her care and food. Her sister explains that she was spoiled as a child and that it has become a habit for her to complain about everything. In cases where longstanding dispositional characteristics can explain problems with responsiveness to treatment or adaptation to an institutional environment, staff remain hopeful for positive change. They explain to the family that sometimes the shock of facing a major life transition will produce new attitudes and behavior. Also, they are trained to deal with behavioral and attitudinal problems. They will structure new expectations and new demands that will make it difficult to maintain former behavioral patterns and dispositions that limit adjustment and the effectiveness of treatment. Of course they will need the help of the family if, together, they are to confront the resident with a consistent approach.
When responsiveness to care and treatment can be understood with reference to the resident’s past, staff and family members are usually content with their hunches about why things are going well or poorly. In some cases, however, the resident’s attitudes or behaviors seem “out of character” to the family. A husband is shocked to learn that his wife is swearing and striking out at her nurses and their aides. A wife cannot believe that her husband is making sexual advances toward elderly women on his floor. A daughter is overwhelmed when she hears that her father, who has always been strong-willed and optimistic, refuses his medications and other treatments and claims that he wants to die. Staff members ordinarily explain responses that are out of character as consequences of the shock or trauma of disability or relocation to an institution. The person’s life has been altered dramatically and it will take time to adjust to new and different circumstances. The family can help with the process of adjustment. They can visit regularly and call when they cannot come in person. Occasional visits to the home or favorite restaurant may help. They can bring cherished items from home to the institution, such as a chair or photographs, and inform the staff about favorite foods and activities. Staff members do not accept the occasional suggestion that a person’s life is now so different and so unacceptable that he or she should simply be left alone by staff and allowed to spend the remaining years without externally imposed goals or demands. Their role is to serve and they will not be turned away. They were shocked when the son of a nursing home resident made the following plea:
Look, Dad has always been a kind and generous person but you’re pushing him too far. He’s always prided himself on his independence but now you people have to do almost everything for him. He hates his roommate and that night nurse who wakes him up and shoots him in the ass. He may come out of it and he may not. Just leave him a little space for himself and understand why he’s not glad to see you. If he wants to go his own way, that’s okay.
Some families willingly accept the staff’s explanation that “out of character” behavior is a normal response to a dramatic change in physical condition or living arrangements. They are confident that, given time and therapeutic intervention, the person will return to his or her old and real self. However, some families see a different source. They suggest that at least some of the problems may be traced to professional treatment or institutional routine and organization. In this view, institutions and their caregivers may be interfering with positive development and creating barriers for those seeking a return to health and happiness. The family conference is, in part, an occasion for reviewing the life course and charting the future for the aged and disabled. While family members may be allowed to engage in wide-ranging discussion and speculation for a time, the staff sooner or later focus attention on matters that are relevant to
treatment and care. Details of a person’s past are useful if they help to explain adjustment to the institution and responsiveness to therapy. The future will depend on how a resident and his or her family cooperate with professional caregivers.
Professional Care and the Future
Most family members arrive at a conference with vague expectations of what is to be accomplished there. At best, they plan to find out how their loved one is responding, learn what they can do to help, thank staff for their work, and possibly present one or more specific complaints. Staff members, however, usually prepare a specific “game-plan” for the conference. Sometimes the plan is simply to congratulate the family on the progress or cooperation of the resident and to thank them for their own cooperation. In other instances their purposes are more elaborate and complex. The family needs to be won over to the goals of the institution and its caregivers. Continued inaction or obstruction on the part of the resident and family will result in a poorer future than could have been achieved by proper responsiveness to professional care. Many families are seen by staff as being “unrealistic” about the future. They have not yet adjusted to the changes in family and personal lifestyles that are clearly demanded. Some families of disabled persons expect their relative to return home “good as new.” As a result, they have not made important physical alterations in the home or made arrangements for homecare or out-patient therapy. Staff members take the opportunity provided by the conference to describe the likely condition of the resident upon release and to insist on better planning. Some families readily accept the staff’s forecast of the future and agree to be more realistic about what they will be facing. Others object, either by questioning the staff’s predictions or by suggesting that it is the responsibility of the institution to do more. Such disagreements sometimes lead to heated debates that usually end with staff alluding to their training and expertise in such matters but admitting that sometimes they are wrong.
Another – and for staff a more serious – form of unreality comes in the form of families who expect their relative to return home some day when staff feel certain that permanent residence in an institution is highly probable. This is a particularly annoying situation when the staff believe that false expectations are interfering with the resident’s responsiveness to professional care and adaptation to institutional living. Residents who falsely think they will be going home soon often cause problems. They may refuse to participate fully in therapy and activities and do not conform to institutional routine. Staff members sometimes suggest that it would be best for all involved if the relative believes that he or she will reside in this or another institution permanently,
even if there is some small chance of returning some day. The staff’s work will certainly be easier and more successful. Families often counter with the argument that their relative might lose all hope and interest in life if he or she thinks that institutional residence is permanent. Some families admit to encouraging their relative to think about, and work toward, the day of return, even when they are uncertain that this is possible. Staff discourage this practice in the name of realism, honesty, and pragmatism. They ask family members to change their approach and to enlist the help of others, particularly family physicians, who are sometimes suspected of encouraging false expectations. Some families also are asked to take additional concrete steps in order to ensure the resident’s realistic view of the future which, they feel, will lead to better cooperation with institutional care and routine. Some common suggestions are to sell the resident’s house, rent an apartment to someone else, and divide up household items among family members. Of course, legal custody will be needed first. Most families agree to cooperate with the staff, particularly after they hear about the problems the staff are having with their relative and the improvements that will come only with proper responsiveness to professional care and treatment. In some instances, the new strategy is put into effect immediately by bringing the resident to the family conference and announcing that he or she should plan on residing in this or another institution for a long, long time. The family is comforted with the assurance that tears and accusations of betrayal are a small price to pay for a more realistic outlook. Sometimes the recommended strategy to increase cooperation and adjustment is exactly the opposite of the one described above. Family members are asked to lure their relative into a more positive stance toward care and treatment through a promise of eventual return to the outside world.
Staff members do not hesitate to suggest other ways that the family can assist the staff as well as their relative beyond issues related to length of stay. Some residents are said to be too demanding, expecting too much from staff. Their families are asked to explain that individual needs cannot always be taken care of on demand. The resident must understand that a nursing home or physical rehabilitation center is not a regular hospital, meaning that the level of care is not as intense or recovery not as immediate. Other problems mentioned frequently include disagreements with roommates, nasty remarks to “colored” aides and residents of different ethnic or religious backgrounds, and complaints about the food. Staff members ask the family to help with these problems for, in so doing, they can assist their relative to adjust to the environment and take advantage of the services offered. Most families agree to help, although some express concern that the problem caused by daily living in an institution might cancel out the advantages to be had from the professional care and treatment available there. One additional issue that is forced on family members, particularly in the physical rehabilitation hospital but also on occasion in the nursing homes,
is the matter of progress in therapy. Staff call on family members to visit therapy sessions and to encourage their relative to follow directions and practice what they are taught. In part these requests are based on a professional model of treatment that recognizes the positive contribution that the family can make to treatment. There is also a self-serving motive for both staff and families. Third-party insurers will not pay for physical, occupational, or speech therapy beyond a few weeks if progress cannot be demonstrated. Staff members are often concerned that the institution might not be reimbursed for services rendered. In that case, they will try to bill the resident or the family and further services will certainly have to be on a private basis. Staff do not hesitate to remind family members that regular attendance at therapy sessions and follow-up practice and exercise are in everyone’s interest. Many family members passively accept both the long-term prognosis of the professional staff concerning the future of their relative, and the latter’s views on particular issues and problems concerning life in the institution. Some questions may be asked about the probability of alternative futures and how much the will of God and personal determination might help matters, but there are ordinarily no direct challenges to the expertise of the caregivers. On occasion, however, one or more family members suggest that the institution and its staff are in fact hastening or contributing to the decline on which they are reporting. A wife suggests that her husband’s apparent confusion and lack of motivation are caused by being around so many others who are disoriented. She feels that the only way to save him from further decline in his mental functioning is to take him home. A son believes that his father’s growing apathy can be traced to his dependence on institutional personnel and routine. He has always been fiercely independent and now has no reason to live.
He has lost his pride. A former social worker argues that the role expectations of the institution encourage her mother to utilize less ability and independence than she actually possesses. “She thinks you are supposed to feel sick and complain a lot in a nursing home,” her daughter argues. Another common explanation is that medications are making the resident lethargic, confused, or belligerent. The possibility that an imbalance in medications or a reaction to them might be causing problems is usually given some credence by staff, at least to the extent that they promise to check with a physician. Any suggestion, however, that the nursing home or hospital is contributing to the very problem staff are reporting on is greeted with some combination of rejection, disbelief, or anger. Sometimes staff consult their charts to demonstrate that mental or physical problems were evident well before the resident came to the present institution, or they question the memory of family members concerning when the problems actually appeared. Another common strategy for avoiding blame is to remind the family of the excellent care and services that they are providing, and to suggest how much more severe the problem might be if these were not available.
In spite of their denial of any personal or institutional responsibility for problems, staff admit in other ways that institutions are not always therapeutic or even benign to health and development. Sometimes they trace problems of a patient to a hospital or other nursing home that they claim has a bad reputation. They often counsel family members on how to be proper advocates for their relative. This usually means how to approach a nurse or aide with a complaint so that they will not get angry and take it out on the patient or label him or her as a “squealer” or “complainer.” Finally, during light moments in a conference or after the family departs, staff members admit occasionally that an institution can have negative as well as positive effects. As one nurse put it, “If I have to go to a home someday, I sure hope it’s like this one. But even this place will make you nuts. When you’re around senility all day, you catch it.” Whether inside or outside of an institution, the future of a resident, as portrayed by staff, is contingent on a person’s cooperation with institutional routine and professional expectations. The family can and should assist with their member’s adaptation. Discussion focuses on the interests of residents and their families, which are assumed to be similar if not identical, and the work of the staff on behalf of the resident. Any suggestion that institutional living or treatment may be irrelevant to the person’s future, or even harmful to it, is officially denied.
Family Life Courses
Family members have a stake in whatever the future is to hold for their relative beyond his or her personal well-being or development. This becomes clear in numerous conferences as family members discuss, often with great emotion, how their relative’s problems have affected their own lives and how the future of the person being considered in the conference will have an impact on their own life course. Sometimes the primary concern is money. Family members worry that their personal savings or other assets will be depleted if private or social insurers will not cover all or a significant part of the stay in the institution or subsequent out-patient treatment. If family resources are to be used in this unanticipated way, future plans of the family may have to be altered. A daughter with young children will need a full-time job, teenage children will have to delay their plans for college, a middle-aged woman might have to give up her job in order to attend to an older sister since professional help is beyond their means, or plans for a new home may be shelved. The physical and emotional toll of caring for an aged or disabled spouse or parent is presented often in discussion. A common theme, particularly in the nursing homes, is a spouse or child who has cared for a loved one for several years. For many this has been a full-time job, around the clock.
The decision to seek nursing home placement has been prompted not by any dramatic change in the person needing care but by the exhaustion of the caregiver. An elderly woman tells how she has cared for her husband for ten years, usually getting up at night every two hours to tend him. Her own health has suffered as a result and she feels that if she is to have any satisfaction in her remaining years, she needs relief from this daily grind. A middle-aged woman, whose husband died several years before, reports that she has two full-time jobs. She puts in eight hours each day at a hospital as a nurse and then spends most of the remainder of the day caring for her aged father. She has little time or energy left for her two younger children or for herself. The decision to put her father in the nursing home has been difficult, but she feels that she has no choice if she and her children are to have any chance for a satisfactory family life. Similar concerns are expressed by family members in conferences at the rehabilitation hospital. They worry about the impact of their loved one’s disability on their own lives. For some, the effects are regrettable but will be managed since this is part of being a family. A daughter will give up her job in order to take care of her stricken father rather than send him to a nursing home. A wife will resign from her beloved service activities so that she can transport her husband daily to outpatient therapy. In other cases, the needs and interests of one or more members of the family take precedence in decision-making. A young woman whose husband was paralyzed in an automobile accident tells the staff that his future is in a nursing home. She refuses to learn how to take care of him at home. As soon as his condition stabilizes she plans to seek a divorce. The husband of a stroke victim likewise refuses to learn the techniques of therapy or to involve other members of his family.
If they can afford to pay for care, his wife can stay at home. Otherwise, she will go to a nursing home or elsewhere. Their future cannot be restricted by this unfortunate accident to one of them. The symbolic significance of decisions concerning the future of a family member is as important, or more important, to some families as the concrete impact on their own lives. Some families take pride in living up to what they see as cultural, ethnic, or religious beliefs concerning illness or old age. A loved one will return home even though the physical and emotional costs to the family will be enormous. These costs are justifiable given the personal debts to be repaid to a loving parent and the satisfaction stemming from beliefs upheld. The daughter of a woman who immigrated from Romania put it this way: “We do things differently here but in Romania you don’t go to a nursing home. My mother would never forgive me and I don’t think I could live with myself.” On the other hand, family members can suffer when they are unable to live up to their ideals of family responsibility. Some family members in conferences at both the physical rehabilitation hospital and the nursing home express a firm belief that one should never pay someone for services or care that the family can provide. A variety of circumstances, however, will prevent them from living
up to this principle. While the expenses for outside service will seriously damage the family budget, the real hurt comes from a loss of pride and dignity. There is also pain from their inability to repay personal debts and obligations. One young woman illustrates this well as she tearfully agrees with the staff’s suggestion that her mother remain permanently in the nursing home:
I really have no choice with my kids and my job. You’re probably right, mom will be happy here and that’s what counts. But it won’t be good for me. When I was young I was sick a lot and she took care of me and now it’s my turn to repay her. I can’t and it’s going to be hard to live with.
Sometimes a spouse or parent will add to the difficulty by reminding the family of debts owed and suggest betrayal. He or she will depict the nursing home or hospital as a prison or tell of his or her misery at being kept away from home. Some residents who attend a family conference will directly contradict the reports by staff members on how well they are being cared for. At other times, however, there is more of a sense of resignation or even satisfaction with one’s circumstances. The resident thanks the staff for their helpfulness and care and the family expresses their appreciation. Even so, there is often an underlying tone of discontent, a wish that things could be different. A son whose father is a permanent resident in a nursing home and who is pleased with the institution nevertheless has some lingering concerns that he expresses this way:
This is a real nice place and I’m glad for Dad. But what are we doing to ourselves? Everyone has his own things to do, a job or what else. If you can’t care for your parents and only visit them once in a while and have them over for Thanksgiving and Christmas, are you really still a family?
Issues relating to interlocking life courses complicate the decision making done in family conferences considerably beyond the concerns of staff for the well-being of the resident. Staff members are ordinarily sympathetic to these matters and willing to listen but their interests are limited. Their primary concern is with the resident’s responsiveness to institutional care and routine and they, sooner or later, return to this topic. They envision a future for their client that depends to a great extent on how he or she cooperates with their efforts. The family can help with this. The fact that the resident’s future may have a significant effect on the futures of other family members is of concern to them, but is not seen as something they can or should deal with in the context of a family conference.
Conclusion
This study of family conferences contributes to the growing social interactionist literature on the processing of human development by focusing on its social
control component. It illustrates the role of professional and organizational contexts in which matters of human development and the life course are increasingly being addressed in our society. Contexts similar to the family conference can be found in schools, hospitals, prisons, and numerous other settings. While the skills, deficiencies, and problems of individual clients are officially at issue in these contexts, judgments are made not merely in terms of ideal standards or life in a social vacuum but with reference to the purposes and goals of particular professional groups and the interests of particular organizations. Human development then becomes a matter of how well or poorly a client meets the expectations and needs of professional caregivers and their host institutions. An emphasis on the importance of social context and control makes it reasonable to assume that a person may experience a variety of developing and aging selves. Thus, a person may be cast as difficult or objectionable in one setting, and strong and inspirational in another, depending on contextually-variable interests in the person whose life course is being addressed. The study documents the intermingling of life courses, in this case in families. Interpretations of problems, needs, and desirable futures for loved ones include concern for one’s own development as well as the interests of other members of the family. This is not to suggest that self-serving interests dominate. Often they do not. However, how a family interprets the developmental needs and rights of a relative can have a significant impact on their own finances, opportunities, commitments, and feelings of self-worth. Decisions made about the life course of a parent or spouse will constrain or expand developmental opportunities and trajectories of family members. The family conference is one among many similar forums in which the life course is not only assessed and understood, but also shaped and directed.
While attention is on the resident, patient, or family member, and the official purpose is to diagnose and correct concrete problems, a variety of other concerns enter the decision-making process. Institutions and their professional caregivers have interests in particular life courses that verify their interpretations, facilitate their work, and testify to their effectiveness. The needs and interests of families also become intertwined with the futures of relatives and the work of caregivers. An appreciation of this makes it reasonable to claim that the life course is not only experienced but also controlled.
References

Aldous, J., 1978. Family Careers. New York: John Wiley.
Aries, P., 1962. Centuries of Childhood. New York: Vintage.
Birren, J. and K. Schaie, 1976. Handbook of the Psychology of Aging. New York: Van Nostrand, Reinhold.
Blau, P. and O. D. Duncan, 1967. The American Occupational Structure. New York: John Wiley.
Elder, G. H., 1974. Children of the Great Depression. Chicago: Univ. of Chicago Press.
Erikson, E., 1950. Childhood and Society. New York: Norton.
Gubrium, J. F. and D. R. Buckholdt, 1977. Toward Maturity: The Social Processing of Human Development. San Francisco: Jossey-Bass.
Gubrium, J. F. and D. R. Buckholdt, 1982. Describing Care: Image and Practice in Rehabilitation. Cambridge, MA: Oelgeschlager, Gunn, and Hain.
Karp, D. A. and W. C. Yoels, 1982. Experiencing the Life Cycle. Springfield, IL: Charles C. Thomas.
Kett, J., 1977. Rites of Passage. New York: Basic Books.
Kohlberg, L., 1969. “Stage and sequence: the cognitive-developmental approach to socialization,” in D. A. Goslin (ed.) Handbook of Socialization Theory and Research. Chicago: Rand McNally.
Kohn, M. L., 1969. Class and Conformity: A Study in Values. Homewood, IL: Dorsey Press.
Levinson, D., 1978. The Seasons of a Man’s Life. New York: Knopf.
Mead, M. and M. Wolfenstein (eds.), 1955. Childhood in Contemporary Cultures. Chicago: Univ. of Chicago Press.
Piaget, J., 1932. The Moral Judgment of the Child. New York: Harcourt Brace Jovanovich.
Shorter, E., 1975. The Making of the Modern Family. New York: Basic Books.
Whiting, B. (ed.), 1963. Six Cultures: Studies of Child Rearing. New York: John Wiley.
5
From Childhood to the Later Years: Pathways of Human Development
Robert Crosnoe and Glen H. Elder Jr

Source: Research on Aging, 26(6) (2004): 623–654.

According to life course theory, human development and aging are lifelong processes (Elder and Johnson 2002). Experiences in childhood have long-term consequences that filter into later stages of the life course, whereas patterns of adjustment and functioning in the later years arise from trajectories through preceding life stages (Block 1993; Settersten 1999). Thus, although the data requirements of studying long-term development are great, linking multiple life stages is an important goal in the study of the full breadth of aging. Linking developmental processes from childhood to the later years is one such example that, for very practical reasons, has not been fully examined.

This study attempts such a linkage by exploring whether, how, and why men who grew up in different types of family environments demonstrate different patterns of adjustment and functioning later in life. It draws on the longest-running study in the United States, the Terman study of talented California children that began in 1922. This data source, a mixture of survey and qualitative data spanning seven decades, is highly specialized, but it offers a unique opportunity to trace the lives of men from their earliest experiences to their final days and across eras of sweeping historical change. As such, it is a valuable resource for life course studies.

In pursuing this research, we draw on two key features of an earlier study of the Terman men that we conducted. First, in this earlier study, we created holistic profiles of aging based on men’s life satisfaction, vitality, family engagement, occupational success, and civic involvement. In the present study, we replicate
these profiles in order to capture, more broadly, the adjustment and functioning of men in their later years. Second, in the earlier study, we formulated a basic conceptual framework that posited two pathways from adult to later experiences: the mediational in which adult experiences predicted aging profiles by shaping life circumstances in the later years and the supplemental in which adult experiences and current circumstances were independently related to aging profiles. The evidence supported the latter, suggesting that knowledge of the journey added to knowledge about the destination. The present study applies this framework to the potential carryover of early family experiences. In doing so, we address a challenging question of developmental research: how much does childhood really matter?
Family Experiences in Childhood and Adolescence

Scholars and laypersons agree that the family has a profound impact on how children and adolescents “turn out” (Dornbusch 1989). After all, the child development literature, one of the richest in the social and behavioral sciences, has documented the numerous ways that families influence the growth and adjustment of young people (Maccoby 2000). The underlying assumption of this literature is that these effects are long-lasting. Certainly, family experiences early in life predict many adult outcomes, but the potential of early family experiences to shape the full life course has been largely untested. The Terman data, however, give us an opportunity to test this basic assumption for a special group of men.

Conceptualizing the family of origin as a context of lifelong development could conceivably entail a laundry list of early family factors as predictors of later outcomes. To avoid such an ad hoc endeavor, we chose to focus on three meaningful categories of family status that have been studied extensively in the past, have been linked to adulthood (if not the later years), and map onto distinct mechanisms. These characteristics of early family life are socioeconomic status, parental divorce, and the affective tone of parent-child relationships.

First, socioeconomic status has long been a focal point of social research. Among children, it encompasses numerous circumstances, including the income and occupational status of parents. In this study, we focus on another key dimension of socioeconomic status: parent education (Sewell and Hauser 1980). Being raised by well-educated parents influences child development (e.g., less antisocial behavior, better health, higher achievement in school) and, more important, this influence has been found to extend to adult experiences, in the form of educational and occupational attainment (Crockett and Petersen 1993; McLoyd 1998; Schneider and Coleman 1993).
The potential for the socioeconomic status of the family of origin to have implications for later life is based on its power to stabilize the multiple trajectories of development in and out of childhood. The educational attainment of
parents taps many things, including financial capital, access to opportunities, stable careers and living situations, and the well-documented tendency for education to cultivate psychological resources (e.g., personal control) that allow individuals to better manage their lives (Cameron and Heckman 1993; Mirowsky and Ross 2003). What ties these disparate things together? In effect, well-educated parents are better able to construct a stable home life, open up social and educational opportunities for children, and model effective strategies for life management and social interaction, all of which will smooth entry into young adulthood; this, in turn, provides a firmer foundation for subsequent life stages (Clausen 1991; Shanahan, Hofer, and Miech 2002). In other words, well-educated parents ensure a measure of stability and life opportunity that likely has far-reaching benefits. Second, divorce is a decision about a relationship between adults, but children’s lives are linked to this decision, affected by how it influences their parents’ well-being, their living circumstances, and their relationships with their parents and other family members. In other words, divorce transforms the family context, and the extent of this transformation may have lasting effects (Hetherington, Bridges, and Insabella 1998). For these reasons, parental divorce has long been of interest to developmental research. Indeed, children who experience a parental divorce have been found to have greater problems with school, relationships, emotional health, and conduct, even controlling for the conflict and economic changes that come with this event (Amato and Keith 1991; Cherlin 1992), and more recent, longitudinal evidence suggests that some of these consequences linger into adulthood (Amato, Loomis, and Booth 1995; Chase-Lansdale, Cherlin, and Kiernan 1995; Hetherington and Kelly 2001; Wallerstein, Lewis, and Blakeslee 2000). 
Divorce can alter many aspects of life that, independently, shape individual outcomes in the short term and long term. Together, these myriad consequences likely influence the later stages of the life course because of their potential to create an early disruption in development. Thus, in many ways, divorce acts as the counterpoint to socioeconomic advantages. Children who experience a parental divorce are likely to witness a certain degree of conflict in their homes, undergo at least a temporary period of upheaval (e.g., moves, school changes, economic adjustments), and experience a change in time spent with adults. Moreover, boys often have greater trouble with parental divorce because it generally signifies a separation from their same-sex parent (Hetherington et al. 1998). Many children recover nicely from parental divorce, but this experience raises the prospect that young people will have “shocks” to their schooling careers, relationship models, and social life that, in turn, interfere with stable transitions into adulthood. In other words, unlike socioeconomic advantages, parental divorce early in life could initiate a pattern of instability in the life course that filters into later life. Third, central to the study of child development are the affective bonds between parents and children. Warm, supportive relationships with parents are a support system at the most crucial stages of development, when children and
adolescents are learning how to engage the world (Grotevant 1998; Steinberg 2001). Scholars from different disciplines have conceptualized the role of these bonds in different ways, as seen in the social capital literature in sociology and attachment theory in developmental psychology, but the basic theme, and empirical evidence, is the same. Young people do better when they have these supportive bonds to fall back on as they grow up, even after they have left home (Bowlby 1988; Call and Mortimer 2001; Furstenberg et al. 1999). How parent-child attachment influences early development is more or less clear, but why would its apparent benefits persist even after the child has left home, even if the parent-child relationship changes dramatically with time or even after the relationship ends through volition or death? Several specific explanations suggest why this would occur. To encompass them, we focus on the ability of parent-child attachments to create a secure base for the navigation of the world (Furstenberg et al. 1999). Axiomatic in the developmental literature is the notion that children need a foundation of trust and commitment from parents before they can try new things, enjoy novel situations, test boundaries, and risk failure, all of which are necessary to create social networks and pursue new opportunities (Grotevant 1998). Once the young person has developed this model of social engagement, it tends to be applied through self-direction in later life stages, sustaining itself independent of changing family relations. These factors are important in their own right. They also come together to form a basic picture of family life, one that is significant for early development and its potential to serve as a foundation for lifelong development.
Holistic Profiles of Aging

This study considers the linkage between early family experiences and later patterns of adjustment and functioning. Having introduced these family factors, we turn now to later life. To gauge adjustment and functioning in the later years, we use the holistic profiles of aging that we created for an earlier study (Crosnoe and Elder 2002) to reflect emerging themes in developmental science that view individuals as multidimensional but indivisible wholes. This holistic (or person-centered) approach combines multiple attributes together in a single profile rather than focusing on any one aspect (Magnusson and Cairns 1996). For example, psychological adjustment by itself likely categorizes a different set of individuals as doing well or poorly than when psychological adjustment, interpersonal functioning, and social involvement are considered together. The key is to consider how different aspects of life come together in one person (Bergman 2001).

Using this holistic approach, we identified four basic aging profiles that encompassed multiple dimensions of adjustment and functioning, such as health, role enactment, social engagement, and life review that are prioritized by different fields of aging research (Crosnoe and Elder 2002). Specifically,
Figure 1: Four aging profiles. Source: Profiles originally derived by Crosnoe and Elder (2002). [Chart of scores (vertical axis from about −2 to 1) on life satisfaction, vitality, family engagement, occupational success, and civic involvement for each aging profile: Less Adjusted (N = 52), Career-Focused but Socially Disengaged (N = 121), Family-Focused (N = 126), and Well-Rounded (N = 113).]
we performed a cluster analysis on five factors measured in men’s later years: family engagement, perceived occupational success, civic involvement, life satisfaction, and vitality (Antonucci and Akiyama 1995; Menaghan 1989; Moen, Dempster-McClain, and Williams 1992; Neugarten 1969; Thoits 1992). A lengthier discussion of the creation of these profiles is included in the Methods section. The analysis produced four “styles of aging” (see Figure 1). The less adjusted men scored low on all factors, career-focused but socially disengaged men scored low on all factors but perceived occupational success, family-focused men scored low on all factors but family engagement, and the well-rounded men scored high on all factors. By integrating well-documented significant dimensions of aging into general profiles, this holistic approach addressed the within-person heterogeneity of people’s lives in the later years as well as the between-person diversity of the aging process (Baltes and Baltes 1990; Shanahan et al. 2002). These aging profiles serve as the reference by which we evaluate the potential long-term implications of early family experiences.
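The clustering step summarized above can be illustrated with a minimal sketch. It assumes Ward's minimum-variance criterion on squared Euclidean distances, which is the approach the Method section names, but it is not the Sleipner program used in the study. The data are synthetic, the group centers are invented for illustration, and the function names `ward_clusters` and `explained_variance` are hypothetical.

```python
import random

def ward_clusters(points, k):
    """Agglomerative clustering with Ward's criterion.

    Repeatedly merges the two clusters whose fusion adds the least
    within-cluster sum of squares, until k clusters remain.
    Returns a list of clusters, each a list of row indices.
    """
    clusters = [{"members": [i], "centroid": list(p)} for i, p in enumerate(points)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged.
        na, nb = len(a["members"]), len(b["members"])
        d2 = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
        return na * nb / (na + nb) * d2

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
        a, b = clusters[i], clusters[j]
        na, nb = len(a["members"]), len(b["members"])
        merged = {
            "members": a["members"] + b["members"],
            "centroid": [(na * x + nb * y) / (na + nb)
                         for x, y in zip(a["centroid"], b["centroid"])],
        }
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return [c["members"] for c in clusters]

def explained_variance(points, clusters):
    """Share of the total sum of squares accounted for by cluster membership."""
    def sum_of_squares(rows):
        dim = len(rows[0])
        mean = [sum(r[d] for r in rows) / len(rows) for d in range(dim)]
        return sum((r[d] - mean[d]) ** 2 for r in rows for d in range(dim))
    total = sum_of_squares(points)
    within = sum(sum_of_squares([points[i] for i in members]) for members in clusters)
    return 1 - within / total

# Synthetic standardized scores on five factors (invented centers, not study data).
random.seed(0)
centers = [
    [-1.0, -1.0, -1.0, -1.0, -1.0],
    [-0.5, -0.5, -0.5, 0.8, -0.5],
    [-0.5, -0.5, 0.8, -0.5, -0.5],
    [0.8, 0.8, 0.8, 0.8, 0.8],
]
points = [[c + random.gauss(0, 0.3) for c in center] for center in centers for _ in range(15)]
clusters = ward_clusters(points, 4)
share = explained_variance(points, clusters)
print(len(clusters), round(share, 2))
```

The explained-variance share computed here is one common way to summarize a cluster solution; whether it matches the statistic reported for the published profiles is an assumption.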
Linking Childhood to the Later Years

Up to this point, we have detailed the reasons why certain early family factors might contribute to development after young people have left the family of origin and the holistic perspective taken to gauge adjustment and functioning in the later years. To consider how these life stages may be linked, we again
Figure 2: Mediational and supplemental pathways between early and later life childhood and aging. [Two-panel diagram, one panel labeled Mediational Pathway and one labeled Supplemental Pathway, each relating Early Family Factors, Adult Experiences, Current Circumstances, and Aging Profiles.]
draw on the model of later life aging that we developed in an earlier study (Crosnoe and Elder 2002) to examine and explain how later life experiences are embedded in the life stages that preceded them. Two basic pathways of this framework – the mediational and supplemental – are useful for thinking about how the significance of early experiences filters across the life course. In the mediational pathway, early family experiences shape later adjustment and functioning indirectly through their influence on intervening life stages (Figure 2). For example, if the socioeconomic status of the family of origin is a stabilizing and goal-orienting force for the life course, early socioeconomic advantages predict more stable and rewarding trajectories through young adulthood and middle age that translate into socioeconomic, health, and interpersonal advantages later in life; these, in turn, become significant determinants of adjustment and functioning in the later years. In other words, holistic profiles of aging are most closely related to proximate circumstances (e.g., marriage or health in current period), but early family experiences retain importance for the later years by serving as a foundation for these proximate circumstances. Aspects of the current situation and adult trajectories, therefore, mediate the linkage between early family experiences and aging profiles (Baron and Kenny 1986). In the supplemental pathway, early family experiences have consequences for adjustment and functioning that are not completely channeled through current circumstances (Figure 2). Instead, they have an independent association with aging profiles because, along with adult experiences and current circumstances, they detail the journey that men take to the later years. For example, one child may experience a parental divorce, whereas another does not.
According to the mediational pathway, these boys will one day have different aging profiles because this early family experience charts their young adult and adult trajectories in different ways. Yet, consider the possibility that, despite this early difference, the basic circumstances of their life eventually converge by the time that they enter their later years. If so, we still might speculate that their aging profiles (which encompass social and psychological functioning) might be different, as they likely arrived at these current circumstances in different ways. If so, taking into account experiences at each stage of life taps a life history that best predicts aging profiles. In other words, the supplemental approach injects an additive value of life history into proximate models of aging (Shanahan et al. 2002). These two pathways provide different ways of thinking about life course trajectories, but they are not mutually exclusive. Indeed, early family experiences may be significant for adjustment and functioning in the later years because they lead to current circumstances and because, at the same time, they have lingering effects in their own right. To explore these pathways, we will examine whether aspects of the current situation (e.g., retirement, marital, and health status in the later years) and key adult trajectories (e.g., educational attainment, stable marital history, persistent alcoholism in adulthood) mediate the associations between early family experiences and aging profiles or whether early family experiences, adult trajectories, and current circumstances are each independently associated with aging profiles.
Method

Sample

This research was based on data from the Terman study, a longitudinal study of intellectually talented children that was started by Lewis Terman, a psychologist at Stanford University. In 1922, a sample of children who had scored in the talented range on the newly created Stanford-Binet IQ test was selected from large public schools across the state of California. This original sample included 857 boys and 671 girls who were born in the first two decades of the twentieth century. The primary motivation for this study was to identify the most able young people as a means to enable society to ensure the flow of talent into important leadership positions. With this motivation in mind, the study tracked the children as they grew up. The original sample was surveyed nearly every five years until 1960 and then again beginning in 1972. These survey data can be supplemented with qualitative information (e.g., letters, newspaper stories, written responses to open-ended questions) organized by Elder and colleagues. For those unfamiliar with this data set, a thorough discussion of its genesis is found in Terman and Oden (1959); a detailed description of its basic structure and recasting is available in Elder, Pavalko, and
Clipp (1993); and an excellent example of the longitudinal research possible with it is provided by Vaillant (1983). The Terman children were a highly select group. Not only did they score high on one of the first IQ exams; they were generally White, came from middle-class homes, and – not surprisingly – had high educational attainment. The generalizability of results from these data, therefore, is always a question. This problem must be acknowledged but is also mitigated by some important factors. For example, comparison of the Terman sample with the general population reveals few differences in marriage, divorce, and other family experiences (Pavalko and Elder 1990). At the same time, no representative, individual-based data are available that cover such a broad swath of time for long-term longitudinal analyses. For these reasons, we argue that this specialized sample is a valuable resource for life course research if it is presented with the appropriate, fully described warnings. For the purposes of this study, we selected a subgroup of the Terman sample. Because we use key measures and approaches from our earlier study (Crosnoe and Elder 2002), we were bound to follow the same sample selection process. This process had three steps, discussed in detail below, and resulted in a study sample of 424 respondents. First, the study sample included men only. Given the extreme gender differences in opportunity and experience in the early twentieth century, the creation of measures of adjustment and functioning, as well as the identification of key life-course markers, for men and women as a single group would be inappropriate. The study of men, therefore, is a first effort that will then be followed with equal attention to the women. Second, the selection process narrowed the study sample to the men born between 1905 and 1914. Past research has often divided the Terman sample into pre-1910 and post-1910 cohorts (Shanahan, Elder, and Miech 1997). 
To avoid conducting parallel analyses of two cohorts, other studies have elected to focus on the sample members born between 1905 and 1914, the years clustered around the cohort breakpoint that also contained the largest concentration of sample members, approximately 80% (Crosnoe and Elder 2002). This selection minimizes age variation; simplifies analyses; and, compared to studying only one cohort by itself, retains a larger portion of the original sample. Ancillary analyses revealed that the basic results of this study, and the earlier work on which it was based, replicated across the original cohort divisions. Third, the final filter excluded those who did not remain in the study until the 1970s, the end point of our analyses. Of the 688 men still eligible after the application of the first two filters, 264 did not make the study sample. Of these, 60% had died; the rest dropped out, were lost, or died without notifying the study. Considering the broad time span of the Terman study, this attrition is hardly surprising, but it could be problematic.
Comparison of the men who remained in the sample until the 1970s (the study sample) and those who remained until the 1940s but dropped out by the 1970s (who remained long enough to measure key life circumstances for comparison) revealed that the first group had higher educational attainment and marriage rates than the second but that the two groups did not differ on other factors (e.g., age, cohort, family socioeconomic status). Thus, attrition biased the study sample toward greater social adjustment, but this bias was less extreme than it could have been. These comparisons echo past studies that report a minimal attrition bias in the Terman sample (e.g., Shanahan et al. 1997).
Measures

We have already presented, in Figure 1, the four aging profiles, created for our earlier study, that serve as the foundation of the present study. We first describe our replication of these aging profiles and then turn to the measures of early family experiences, adult experiences, and current circumstances that serve as primary independent variables. The complete descriptions of, and descriptive statistics for, all study variables are included in Table 1.

Table 1: Descriptions and descriptive statistics for study variables

| Variable | M | SD | Description |
|---|---|---|---|
| Aging profile factors | | | |
| Life satisfaction in 1972 | 108.28 | 33.68 | The mean of four composites: satisfaction with family, friendships, cultural life, and social service. To create each, we took the respondents’ rating of the salience of that domain during adulthood (1 = low to 5 = high) and their satisfaction with goal achievement in each domain (1 = low to 5 = high) and then calculated (Salience × Achievement) – (Salience – Achievement). |
| Vitality in 1972 | 4.50 | 1.12 | The sum of respondents’ self-rated energy and happiness (1 = low to 3 = high). |
| Family engagement in 1977 | 1.50 | 1.06 | The sum of three measures: communication with offspring (1 point for speaking to at least one child once a week), communication with siblings (1 point for speaking to at least one sibling once a week), and time with relatives (1 point for spending frequent time with relatives). |
| Occupational success in 1972 | 3.95 | 0.95 | Respondents’ agreement with whether they had achieved their occupational goals in life (1 = low to 5 = high), essentially a review of the respondents’ careers as they wind down. |
| Civic involvement in 1977 | 1.05 | 1.14 | The sum of nine binary items from 1977: participation in service, community, professional, religious, political, social, educational, recreational, and miscellaneous organizations (1 = yes). |
| Early family factors | | | |
| Socioeconomic status | 2.73 | 1.49 | The highest level of education reached for two parents or for one parent in a single-parent family (1 = no high school graduation, 2 = high school graduate, 3 = some college, 4 = bachelor’s degree, 5 = postgraduate degree). Based on parents’ self-reports in 1922. |
| Parental divorce | 0.10 | 0.30 | Retrospective family history given by Terman participants in 1936 identified those who had experienced a parental divorce (1 = yes, 0 = no) before turning 18. |
| Parent-child attachment | 3.57 | 0.76 | The mean of level of attachment to each parent as a child (1 = none, 2 = very little, 3 = moderate, 4 = a good deal, 5 = very close), reported in 1936 survey. We recognized the qualitative difference of attachment to mother and father, and so we compared the two measures (as well as the combined measure) in all analyses and found no meaningful differences. |
| Adult experiences | | | |
| Education level | 4.37 | 1.26 | Education obtained by 1940: 1 (no high school graduation), 2 (high school graduate), 3 (some college), 4 (bachelor’s degree), 5 (master’s degree), 6 (Ph.D., MD, JD) |
| Long-term intact marriage | 0.65 | 0.48 | 1 (1972 marital record shows one marital partner, still living, in life), 0 (no or multiple partners, widower) |
| Persistent alcoholism | 0.11 | 0.30 | 1 (self-reported alcohol problem in 1940, 1950, 1960), 0 (no problem in any or only one time period) |
| Current circumstances | | | |
| Age in 1972 | 62.07 | 2.45 | Self-reported age in 1940 (+32) |
| Retirement status in 1972 | 0.26 | 0.44 | 1 (retired and not seeking employment), 0 (employed or seeking employment) |
| Income level in 1972 | 22.57 | 19.58 | Self-reported total earnings from work in 1972: 0 (none) – 91 (over $90,000) |
| Physical health in 1972 | 3.24 | 0.81 | Self-reported physical health, 1 (poor) to 4 (very good) |
| Emotional health in 1972 | 4.61 | 0.74 | Self-reported emotional health, 1 (poor) to 5 (excellent) |
| Marital status in 1972 | 0.91 | 0.28 | 1 = currently married, 0 = unmarried. |
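The life satisfaction formula given in Table 1 can be made concrete with a short worked sketch. It takes the description literally: each domain composite is (Salience × Achievement) − (Salience − Achievement), and the overall score is the mean of the four composites. The ratings below are invented, and the function names are hypothetical. Note that this literal reading yields domain scores between 1 and 25, whereas the reported mean is 108.28, so the published variable appears to be on a different scale; the sketch illustrates the formula only.

```python
def domain_score(salience, achievement):
    """One domain composite, with both ratings on a 1 (low) to 5 (high) scale."""
    return salience * achievement - (salience - achievement)

def life_satisfaction(ratings):
    """Mean of the domain composites; ratings maps domain -> (salience, achievement)."""
    scores = [domain_score(s, a) for s, a in ratings.values()]
    return sum(scores) / len(scores)

# Invented ratings for one hypothetical respondent.
example = {
    "family": (5, 4),          # 5*4 - (5-4) = 19
    "friendships": (3, 3),     # 3*3 - (3-3) = 9
    "cultural life": (2, 4),   # 2*4 - (2-4) = 10
    "social service": (1, 1),  # 1*1 - (1-1) = 1
}
score = life_satisfaction(example)
print(score)
```

The subtraction term lowers the score when a highly salient goal went unmet and raises it when achievement exceeded salience, which is consistent with the authors' point that the measure taps achievement of life goals.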
Two measurement issues deserve comment. First, all measures are based on self-reports. Because exclusive use of self-reported survey data has drawbacks (e.g., problems of recall, the tension between perception and reality, shared method variance), this study is best thought of as an examination of the life-course pathways as seen and perceived by those who lived them. Second, because our study sample has a broad age range, each year of data collection falls at slightly different stages of life for respondents of different ages. Table 2 contains an age-by-year breakdown to facilitate the age-related interpretation of measures and results.
Table 2: Age range of study sample for each stage of data collection

| Year | Age range |
|---|---|
| 1922 | 8–17 |
| 1928 | 14–23 |
| 1936 | 22–31 |
| 1940 | 26–35 |
| 1945 | 31–40 |
| 1950 | 36–45 |
| 1955 | 41–50 |
| 1960 | 46–55 |
| 1972 | 58–67 |
| 1977 | 63–72 |
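The age ranges in Table 2 follow directly from the birth years of the study sample (1905 to 1914). A minimal sketch of that arithmetic, assuming age is approximated as survey year minus birth year (birth months are ignored) and using hypothetical names:

```python
BIRTH_YEARS = (1905, 1914)  # earliest and latest birth year in the study sample
SURVEY_YEARS = [1922, 1928, 1936, 1940, 1945, 1950, 1955, 1960, 1972, 1977]

def age_range(year, birth_years=BIRTH_YEARS):
    """Return (youngest, oldest) approximate age of sample members in a survey year."""
    earliest_birth, latest_birth = birth_years
    return (year - latest_birth, year - earliest_birth)

age_ranges = {year: age_range(year) for year in SURVEY_YEARS}
for year, (youngest, oldest) in age_ranges.items():
    print(f"{year}: {youngest}-{oldest}")
```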
Aging profiles. The four aging profiles were created through cluster-analytic techniques, described below, of five factors measured in either 1972 or 1977: life satisfaction, vitality, family engagement, perceived occupational success, and civic involvement. The rationale for the selection of these factors is explained in greater detail in our earlier article (Crosnoe and Elder 2002), and we will direct the reader to that source for the information that we cannot provide here. Essentially, these factors together provide an overall assessment of adjustment and functioning in the later years in both social and psychological domains. As opposed to physiological markers, these factors, in total, tapped how men felt about themselves and their lives and how engaged they were in the social world. As seen in Table 1, the measure of life satisfaction differs from traditional treatments in that it essentially taps the achievement of life goals rather than an overall assessment of life circumstances. We argue that this difference is useful because it gives men a standard – earlier goals and dreams – by which to evaluate their lives.

These five factors were then entered into a cluster analysis with Sleipner 2.0 (Bergman and El-Khouri 1998). This analysis grouped together similar cases, as determined by the squared Euclidean distance between them, using Ward’s hierarchical method to minimize the variance within clusters (Aldenderfer and Blashfield 1984). As previously noted, we replicated the solution from our earlier study, resulting in four clusters that explained 41% of the variance in the five factors. A total of 412 men were assigned to a cluster. Seven cases were eliminated because of missing data and five because Sleipner identified them as outliers. The four clusters produced by this analysis correspond to the four aging profiles presented in Figure 1. These profiles were the dependent variables in multivariate analyses.

Early family factors.
The three measures of early family experiences were socioeconomic status of the family of origin (measured by the highest level of parent education), parental divorce, and parent-child attachment. We should
note that, although Terman data collection began when most respondents were children or adolescents, we did not always use the surveys from respondents’ child and adolescent periods to measure early family experiences. Socioeconomic status was based on the reports of parents in 1922, when the respondents in the study sample were minors, but the other two family measures were indexed by data from the 1936 survey, when most respondents in the study sample were young adults. Whether parents divorced during a respondent’s childhood or adolescence could only be assessed after the end of this long period. Furthermore, the Terman study was designed to focus on psychological and intellectual issues rather than contexts of development, but, over time, more contextual items began to appear on the survey. Consequently, items about relationships with parents, surprisingly, were not asked in abundance until the respondents had grown older.

Our focus on socioeconomic status of the family of origin may seem curious, considering that the Terman sample was largely middle-class. Socioeconomic status, however, encompasses more than money, and, indeed, our earlier discussion about the need to study socioeconomic status tapped more social psychological mechanisms than financial capital. Our focus on the educational side of this status is important because, in the early twentieth century, education was not as much of a natural byproduct of class (e.g., college was not necessarily the normative experience for high-status youths as it is today). Educational attainment of parents, therefore, had more variation than might be expected.

Adult experiences. The heart of our earlier study of aging profiles (Crosnoe and Elder 2002) was the linkage of these profiles to adult trajectories, which were constructed by combining and collapsing data across multiple time points from the 1930s to the 1960s. As noted earlier, we chose to focus on three that proved to be consequential for adjustment and functioning in the later years. The first two – long-term intact marriage and lifetime educational attainment – represent social pathways, the life-course continua made up of social role sequences. Because of the looser organization of life-course sequences in the early twentieth century and the interruption of major historical events (e.g., Great Depression, World War II), the educational careers of the men in this sample were not as limited to young adulthood as they might be today. The third – persistent alcoholism – represents another type of life-course continuum, the developmental trajectory, which refers to patterns of continuity and change in psychological, physiological, or health-related factors. Alcohol use is a health behavior that, when extreme, has serious repercussions for health and other domains.

Current circumstances. The aging profiles were created based on data from 1972 and 1977, and so current circumstances refer to some basic aspects of life during this time: age, retirement status, income, physical health, emotional health, and marital status.
Plan of Analyses

In all multivariate analyses, the four-category aging profile measure served as the dependent variable in multinomial logistic regression. In this type of regression, independent variables (measures of early family experiences, adult experiences, and current circumstances) predicted the odds of being in one category of the dependent variable versus the reference category. We estimated multinomial models with each category of the dependent variable as the reference. These analyses produced voluminous results that would be difficult to present in total. Consequently, we present the results for the models with the two “anchor” profiles – less adjusted and well-rounded – as the references but discuss other models when necessary.

The modeling strategy for this study was intended to capture the associations between early family experiences and the aging profiles and to examine whether these associations best fit the mediational or supplemental pathway. Support for the mediational pathway would be strongest if significant associations between early family experiences and the aging profiles were reduced substantially or eliminated by taking into account adult experiences and current circumstances. Support for the supplemental pathway would be strongest if these associations persisted even after taking into account these later experiences. To compare these pathways, we first estimated two separate models for the aging profiles – one that included only the early family measures as predictors, one that included only the adult and current measures as predictors – and then a comprehensive model that included all predictors together.

Examining the magnitude and significance level of the associations between the early family measures and the aging profiles before and after the inclusion of the measures of adult experiences and current circumstances demonstrated how much these associations were attenuated by the adult and current measures, which provided evidence of the extent to which these factors from three stages of the life course were part of one sequential pathway or were largely independent. Additional analyses, in which the adult/current measures were regressed on the early family measures, provided further evidence on the extent to which these three sets of factors were mediational or supplemental predictors of the aging profiles.
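The practice of re-estimating the multinomial model with each profile as the reference can be illustrated with a small sketch. In a multinomial logit, the odds ratio for category k versus a new reference equals the odds ratio of k versus the old reference divided by the odds ratio of the new reference versus the old one. The function name below is hypothetical, and the inputs are the family socioeconomic status odds ratios reported in Table 4 with the less adjusted profile as reference; this is an illustration of the identity, not the authors' estimation code.

```python
# Sketch: re-express multinomial-logit odds ratios against a new reference category.
# Function and variable names are hypothetical; the inputs are the socioeconomic
# status odds ratios reported in Table 4 (less adjusted profile as reference).

def rebase_odds_ratios(odds_ratios, old_reference, new_reference):
    """Return odds ratios relative to new_reference.

    odds_ratios maps each non-reference category to its odds ratio versus
    old_reference. Log-odds are differences of linear predictors, so
    OR(k vs new) = OR(k vs old) / OR(new vs old).
    """
    full = dict(odds_ratios)
    full[old_reference] = 1.0  # a category versus itself has an odds ratio of 1
    divisor = full[new_reference]
    return {
        category: value / divisor
        for category, value in full.items()
        if category != new_reference
    }


ses_vs_less_adjusted = {
    "career focused": 1.21,
    "family focused": 0.97,
    "well-rounded": 1.20,
}

ses_vs_well_rounded = rebase_odds_ratios(
    ses_vs_less_adjusted, "less adjusted", "well-rounded"
)
for category, value in ses_vs_well_rounded.items():
    print(f"{category}: {value:.2f}")
```

The rebased values come out near 1.01, 0.81, and 0.83, close to the 1.01, 0.81, and 0.84 reported with the well-rounded profile as reference; small gaps are expected because the published odds ratios are rounded. Significance tests do not transfer this way, which is one reason to refit the model for each reference category.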
Results

Early Family Experiences and Aging Profiles in the Later Years

The general purpose of this study was to examine the associations between family experiences in childhood and adolescence and patterns of adjustment and functioning many decades later.

Table 3: Mean differences in profile factors and early family experiences, by aging profile

Variable                        Less adjusted    Career focused but     Family focused    Well-rounded
                                                 socially disengaged
Profile factors
  Life satisfaction in 1972     89.70c (25.57)   99.80b (35.11)         97.81bc (24.23)   137.23a (25.96)
  Vitality in 1977              3.58c (1.00)     4.09b (1.11)           4.18b (0.97)      5.40a (0.63)
  Family engagement in 1977     0.96c (0.88)     0.37d (0.49)           2.35a (0.54)      1.87b (0.87)
  Occupational success in 1972  2.22d (0.65)     4.19b (0.58)           3.95c (0.71)      4.54a (0.56)
  Civic involvement in 1977     0.83b (0.97)     0.36c (0.52)           0.84b (0.98)      1.76a (1.25)
Early family experiences
  Family socioeconomic status   2.64 (1.52)      2.82 (1.55)            2.56 (1.44)       2.86 (1.48)
  Parental divorce              0.08 (0.27)      0.12 (0.33)            0.08 (0.27)       0.12 (0.32)
  Attachment to parents         3.55ab (0.77)    3.38b (0.79)           3.57ab (0.68)     3.75a (0.80)
n                               52               121                    126               113

Note: Cells show means with standard deviations in parentheses. Means with different subscripts differ significantly (p < .05) according to one-way analyses of variance; a indicates the highest mean, with b, c, and so on indicating means in descending order from the highest.

Before turning to multivariate analyses
of these associations, we present a basic descriptive picture of the men in our sample during each of these stages of life (see Table 3). Table 3 is broken down into the four aging profiles that were replicated in this study to capture, more holistically, adjustment and functioning in the later years. To clarify what these four profiles represent, we have presented the means, for each profile, of the five factors used to create them. Two of the profiles (well-rounded, less adjusted) were internally consistent – the men in these profiles tended to be consistently high or low on the five factors used to create the profiles. The other two (career focused, family engaged) were internally inconsistent – the men in these profiles scored low on all but one factor. Age was not included in Table 3, but we should note that the profiles did not differ by age (the mean age for all was approximately 62 years) even though they were measured from two time points (1972, 1977) and covered a broad age range (59–72). To explore the possibility that the men in these profiles had different family experiences early in life, we also present the within-profile means of the three family factors. The four profiles exhibit few differences in their early family experiences, except that the men in the well-rounded profile reported the strongest attachment to their parents as children and the men in the socially disengaged profile reported the weakest. These bivariate statistics did not take into account other life circumstances from any stage
of the life course and, consequently, may have masked real differences among the aging profiles in early family experiences.
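The one-way analyses of variance behind the subscripts in Table 3 can be approximated from the published summary statistics alone. The sketch below does this for attachment to parents. It reads each pair of numbers in Table 3 as a mean followed by a standard deviation and assumes every man counted in a profile contributed to the attachment measure; neither point is stated in the source, and the function name is hypothetical.

```python
# Sketch: one-way ANOVA F statistic computed from group summary statistics.
# Assumptions (not stated in the source): the paired numbers in Table 3 are
# means and standard deviations, and each profile's n applies to this measure.

def anova_f_from_summary(means, sds, ns):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    total_n = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / total_n
    # Between-group sum of squares: group means around the grand mean
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Within-group sum of squares recovered from the sample standard deviations
    ss_within = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))
    df_between = len(means) - 1
    df_within = total_n - len(means)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Order: less adjusted, career focused, family focused, well-rounded
attachment_means = [3.55, 3.38, 3.57, 3.75]
attachment_sds = [0.77, 0.79, 0.68, 0.80]
profile_sizes = [52, 121, 126, 113]

f_stat, df_b, df_w = anova_f_from_summary(
    attachment_means, attachment_sds, profile_sizes
)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

Under these assumptions the attachment means yield an F of roughly 4.6 on 3 and 408 degrees of freedom, above the conventional .05 critical value of about 2.6, which is in line with the subscripts marking the well-rounded and career-focused profiles as different.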
Early, Adult, and Current Experiences and Aging Profiles

Are early family experiences associated with adjustment and functioning in the later years? To answer this question, Table 4 presents results from our first set of multinomial logistic regressions. Model 1 in Table 4 contains the odds ratios for the model in which the three family factors were the only predictors of the aging profiles. According to Table 4, one family factor differentiated the less adjusted men from the men in the other profiles, and two factors differentiated the well-rounded men from men in the other profiles. The ancillary analyses – in which the career- and family-focused profiles served as reference categories – revealed additional associations between early family factors and the aging profiles, and so we will include those results, not shown in Table 4, in this discussion. All three family factors predicted aging profiles, although in different ways. The first, socioeconomic status of family of origin, differentiated men in the family-focused profile from those in the well-rounded profile.

Table 4: Results of multinomial logistic analyses predicting aging profiles by early family experiences, adult experiences, and current circumstances in separate models (N = 309)

                                 Less adjusted as reference        Well-rounded as reference
Variable                         Career     Family     Well-       Less       Career     Family
                                 focused    focused    rounded     adjusted   focused    focused
Model 1
Early family experiences
  Family socioeconomic status    1.21       0.97       1.20        0.84       1.01       0.81*
  Parental divorce               1.38       0.90       1.95        0.51       0.50       0.46†
  Attachment to parents          0.75       1.14       1.51†       0.66       0.71***    0.75
Model 2
Adult experiences
  Educational attainment         0.85       0.71†      1.55*       0.64*      0.55***    0.46***
  Long-term intact marriage      0.48       1.21       1.35        0.74       0.36*      0.90
  Persistent alcoholism          1.85       1.05       0.78        1.28       2.37       1.35
Current circumstances
  Age                            1.05       1.02       0.94        1.07       1.13       1.09
  Retirement status in 1972      1.45       0.87       2.12        0.47       0.69       0.41*
  Income level in 1972           1.05**     1.06*      1.06*       0.94**     0.99       0.99
  Physical health in 1972        1.20       1.26       4.07***     0.25***    0.30***    0.21***
  Emotional health in 1972       1.21       1.64†      1.62        0.62       0.75       1.01
  Marital status in 1972         0.98       0.77       0.83        1.20       1.17       0.93

Note: Coefficients are odds ratios. Model 1 also controlled for age. †p < .10. *p < .05. **p < .01. ***p < .001.

Specifically, a one-unit change in socioeconomic status in childhood (e.g., parent with high school degree vs. parent with college experience)
decreased the odds of being in the family-focused profile later in life, compared to being in the well-rounded profile, by 19% (1 – odds ratio = 1.00 – .81 = .19). The exact same pattern was seen for the family-focused profile when compared to the career-focused profile (not shown). The second family factor, parental divorce, differentiated the men in the family-focused profile from those in the well-rounded profile – experiencing a parental divorce as a child or adolescent decreased the odds of being in the family-focused profile decades later, compared to the well-rounded, by 54%. The third, parent-child attachment, differentiated the well-rounded men from the less adjusted and career-focused men (51% and 29% changes, respectively). It also differentiated the family-focused men from the career-focused men (not shown), with a one-unit increase in attachment associated with a 51% decrease in the odds of being career focused rather than family focused.

To offer a baseline comparison for the associations between more proximate factors and aging profiles, Model 2 in Table 4 presents the results from multinomial regressions that contained only the measures of adult experiences and current circumstances as predictors of the aging profiles. Again, we supplement the results presented in Table 4 with results from regressions in which the other two aging profiles served as the reference categories. The two major aspects of current circumstances were physical health and income. Compared with all other aging profiles, the men in the well-rounded profile reported better physical health. Compared with all other aging profiles, the men in the less adjusted profile reported lower annual earnings later in life. A third current factor, emotional health, was also related to the aging profile. Emotional health predicted membership in the family-focused aging profile compared with the less adjusted profile.
Turning to the three longitudinal measures representing adult experiences, educational attainment predicted membership in the well-rounded profile compared with all other profiles, long-term intact marriage predicted membership in the family-focused and well-rounded profiles compared with the career-focused profile, and persistent alcohol problems did not predict aging profiles at all. The nonsignificant main effect of alcohol problems differed from our earlier study (Crosnoe and Elder 2002), which reported that such problems differentiated the career-focused men from others. This difference could be due to slight differences in the variables included in the model.
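The percentage interpretations of odds ratios used in this section follow from simple arithmetic: an odds ratio of .81 implies a 19% decrease in the odds, and an odds ratio of 1.51 implies a 51% increase. A minimal sketch, with a hypothetical function name and the Model 1 odds ratios discussed above as inputs:

```python
# Sketch: convert an odds ratio into the percentage change in odds per unit.
# The function name is hypothetical; the odds ratios are those discussed for Model 1.

def percent_change_in_odds(odds_ratio):
    """Return the percent change in odds implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0


examples = {
    "socioeconomic status, family focused vs. well-rounded": 0.81,
    "parental divorce, family focused vs. well-rounded": 0.46,
    "attachment, well-rounded vs. less adjusted": 1.51,
    "attachment, career focused vs. well-rounded": 0.71,
}

for label, odds_ratio in examples.items():
    change = percent_change_in_odds(odds_ratio)
    direction = "increase" if change > 0 else "decrease"
    print(f"{label}: {abs(change):.0f}% {direction} in odds")
```

These are changes in odds, not in probabilities; the corresponding change in the probability of profile membership depends on the baseline probability.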
Mediational and Supplemental Pathways from Early to Later Life

The results just described catalog the associations between factors from various stages of the life course and aging profiles measured in the later years, but they do not really speak to how these associations are connected. In other
words, do the early family factors identified as important in the previous analyses matter because they influence the associations between adult/current experiences that were also identified as important, or were all of these factors important in their own right? Table 5 presents results from a comprehensive set of multinomial regressions aimed at answering this question.

In reference to Figure 2, the mediational pathway suggests that early family factors predict adult experiences and current circumstances, which in turn predict aging profiles. If the mediational pathway best captured the role of early family experiences, then we would expect that any significant associations between the early family factors and the profiles would be attenuated by the inclusion of measures of adult experiences and current circumstances. The supplemental pathway suggests that early family experiences predict aging profiles above and beyond experiences in the intervening stages of the life course. If the supplemental pathway best captured the role of early family experiences, then we would expect that any significant associations between the early family factors and the profiles would persist even after the inclusion of the measures of adult experiences and current circumstances.

To assess the two pathways, we focus, in turn, on each association between early family factors and the aging profiles. The significant association between family socioeconomic status and membership in the well-rounded (versus the family-focused) profile was almost completely eliminated by the measures of adult experiences and current circumstances (odds ratio = .81, p < .05 in Table 4; .98, ns in Table 5).
Table 5: Results of multinomial logistic analyses predicting aging profiles by early family experiences, adult experiences, and current circumstances in comprehensive model (N = 309)

                                 Less adjusted as reference        Well-rounded as reference
Variable                         Career     Family     Well-       Less       Career     Family
                                 focused    focused    rounded     adjusted   focused    focused
Early family experiences
  Family socioeconomic status    1.14       0.99       1.01        0.98       1.13       0.98
  Parental divorce               1.49       0.86       2.51        0.28       0.43       0.24*
  Attachment to parents          0.88       1.36       1.65        0.60       0.53**     0.82
Adult experiences
  Educational attainment         0.83       0.69†      1.50†       0.67†      0.56**     0.46***
  Long-term intact marriage      0.48       1.16       1.28        0.78       0.38*      0.91
  Persistent alcoholism          1.89       0.97       0.72        1.38       2.61       1.35
Current circumstances
  Age                            1.04       0.98       0.98        1.06       1.11       1.09
  Retirement status in 1972      0.84       1.05       1.40        0.60       0.70       0.39*
  Income level in 1972           1.05*      1.05**     1.06**      0.94**     0.99       0.99
  Physical health in 1972        1.18       1.27       3.48***     0.23***    0.28***    0.30***
  Emotional health in 1972       1.12       1.60†      1.47        0.68       0.81       1.09
  Marital status in 1972         0.57       0.93       0.95        0.96       1.02       0.87

Note: Coefficients are odds ratios. †p < .10. *p < .05. **p < .01. ***p < .001.
This attenuation suggests a mediating process, but what was the mediator? Table 6 contains the results of a series of ordinary least squares and logistic regressions in which the early family factors predicted the adult experiences and current circumstances. In tandem with the results from Table 4 on the associations between the adult/current factors and the aging profiles, these results allow us to chart potential pathways from family factors to adult/current factors to the aging profiles that might explain this mediation. Only educational attainment has a significant association with family socioeconomic status (as an outcome) and a well-rounded versus family-focused profile (as a predictor). Thus, the tendency for children from more advantaged profiles to have more well-rounded, as opposed to family-focused, profiles in their later years was largely explained by their greater success in the educational system.

The significant association between family socioeconomic status and a career-focused (versus family-focused) profile changed only slightly after the inclusion of the adult and current factors. Moreover, no adult or current factor was linked to both family socioeconomic status early in life and to greater odds of being career focused versus family focused later in life (seen by reviewing Tables 4 and 6). This role of early socioeconomic status appeared to better fit the supplemental pathway. It was largely independent of other life course experiences.

Turning to the second early family factor, the apparent effect of experiencing a parental divorce as a child or adolescent on the likelihood of a well-rounded versus family-focused profile actually strengthened significantly once adult experiences and current circumstances were taken into account (odds ratio = .46, p < .05 in Table 4; .24, p < .05 in Table 5).
Table 6: Results from ordinary least squares (OLS) and logistic regressions predicting current circumstances and adult experiences by early family experiences

Adult-experiences models
                                 OLS model    Logistic models
                                 Education    Marriage    Alcohol
  Family socioeconomic status    .22***       −.12        .15
  Parental divorce               −.12         .34         .67
  Attachment to parents          .17†         .19         .23

Current-circumstances models
                                 OLS models                                      Logistic models
                                 Income    Physical health    Emotional health   Retirement    Marriage
  Family socioeconomic status    .44       .02                .01                −.10          −.07
  Parental divorce               −.38      −.37*              .14                −.22          −.44
  Attachment to parents          −1.19     .03                .09†               .23           −.30

Note: All coefficients are unstandardized b coefficients – the logistic coefficients are not odds ratios. All models controlled for age. †p < .10. *p < .05. ***p < .001.

This change in
the odds ratio for this early family factor suggests the opposite of mediation: a suppression effect. Physical health offers a clue as to why this suppression might occur. Physical health in the later years was the only factor associated with both early parental divorce and greater odds of being well-rounded versus family focused. Thus, young people raised by married parents were more likely to have well-rounded lives in their later years than to be more focused on family life relative to other domains of life. They also, however, had better health, which was itself related to aging profiles. Taking into account this added advantage revealed that the apparent advantage of being raised in such a home was even greater than it first seemed.

As for the third early family factor, greater attachment to parents early in life predicted being well-rounded versus career focused or less adjusted. In both cases, the odds ratio for attachment increased slightly upon including the adult and current factors. These changes reveal an apparent suppression effect, mostly because of educational attainment. This effect, however, was weak in magnitude. The other role of early attachment (differentiating career focused and family focused) fits better the supplemental pathway – the coefficient did not differ substantially across models, and no current/adult factor was related to attachment and to the difference between the career- and family-focused profiles.
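The contrast between attenuation and suppression described in this section can be summarized as the percent change in the log odds ratio once the adult and current measures are added. This is a common descriptive heuristic, not necessarily the procedure the authors used; the function name is hypothetical, and the inputs are the family-focused versus well-rounded odds ratios quoted in the text from Tables 4 and 5.

```python
import math

# Sketch: percent change in a log-odds coefficient between two nested models.
# A descriptive heuristic only; the function name is hypothetical and the odds
# ratios are those quoted in the text for family focused vs. well-rounded.

def percent_attenuation(or_baseline, or_adjusted):
    """Percent reduction in the log odds ratio after adding covariates.

    Positive values indicate attenuation (consistent with mediation);
    negative values indicate the coefficient grew (consistent with suppression).
    """
    b_baseline = math.log(or_baseline)
    b_adjusted = math.log(or_adjusted)
    return (b_baseline - b_adjusted) / b_baseline * 100.0


# Table 4 (Model 1) odds ratio first, Table 5 odds ratio second
ses_change = percent_attenuation(0.81, 0.98)
divorce_change = percent_attenuation(0.46, 0.24)

print(f"Socioeconomic status: {ses_change:.0f}% attenuation")
print(f"Parental divorce: {divorce_change:.0f}% attenuation")
```

On this scale the socioeconomic status coefficient shrinks by about 90%, while the parental divorce coefficient grows by more than 80% (a negative attenuation). Coefficients from nested logistic models are not on strictly comparable scales, so such percentages are best read as rough guides.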
Discussion

Research on human development has been largely stage specific – focusing on developmental processes within rather than across different stages of the life course. This traditional focus has arisen from several factors, such as differing emphases across disciplines, the complexities of linking life stages both empirically and theoretically, and the lack of long-term longitudinal data. Unfortunately, this tradition has likely constrained our understanding of the full breadth of human development and aging. Recent trends in psychology, sociology, and related disciplines, however, have begun to break down the barriers among different stages of the life course. As longitudinal samples age from childhood into subsequent life stages and as studies gather retrospective data to inform aging models, these contemporary trends can be expanded significantly to capture developmental processes across the full life course.

This study represents a preliminary step in such life course research. It builds on past research with one such data set that had aged long enough to facilitate long-term longitudinal analysis. This past research was interested in how styles of aging, holistically defined profiles of social psychological adjustment and functioning, were embedded in earlier experiences. The present study reversed this objective, exploring the carryover of early experiences into later life. We used the same basic modeling strategy and data source as the earlier research but addressed different goals. Essentially, we were interested
in leveraging longitudinal data, life-course perspectives, and conceptual models from aging research to address a central issue that has concerned research on child development and family relations for decades: whether experiences earlier in life influence developmental trajectories into adulthood and the later years. Specifically, this study asked whether, and how, key aspects of the family environment in childhood and adolescence were related to patterns of adjustment and functioning in the later years.

We reported that the socioeconomic status of the family of origin predicted being career focused decades later. Men who came from higher status backgrounds may have had more advantages in life but also appeared to be at risk for more problematic functioning (e.g., concentrating on careers at the expense of other domains). We also reported that early family disruption had little long-term impact on individual adjustment in this sample, except that the men who were defined by the well-rounded profile were more likely to have experienced a parental divorce in the early life course than those with a family-focused profile. Thus, men who had early family problems appeared to be more likely to apply themselves equally across life domains rather than to specialize in family life. Finally, we reported that early attachment to parents generally differentiated the two profiles highest in family engagement in later life from the two profiles lowest in such engagement.

Our multinomial analyses, therefore, established links between early and later stages of life, but why did these links occur? We attempted to discover whether these associations were mediated by experiences in adulthood and the basic circumstances of later life (mediational pathway) or whether they existed independently of, or were additive to, these later experiences (supplemental pathway). We expected that these two pathways were not mutually exclusive, and our analyses bore out this possibility.
The evidence did not uniformly support either pathway. Our analyses revealed one clear example of the mediational pathway. An observed association between socioeconomic advantages early in life and a well-rounded aging profile later in life was almost completely a function of the greater educational attainment of young people who grew up in such families, in line with the basic predictions of the Wisconsin Model of Status Attainment (Sewell and Hauser 1980). We also found one clear example of a suppression effect, which might be considered in the same class of pathways as mediation. The observed association between experiencing a divorce in the early life course and having a well-rounded aging profile (compared to focusing on the family at the expense of other domains) actually masked the added benefits of good physical health for later patterns of adjustment and functioning. Taking this into account revealed an even stronger tendency for those with this early family experience to be more well-rounded later in life. The other associations reported above all most closely fit the supplemental pathway. We should acknowledge, however, the possibility that more mediational pathways did exist in these men’s lives but that we were unable, with what we have done here, to identify the mechanisms.
Comparison of mediational and supplemental pathways demonstrated the myriad ways that early experiences might matter in the long term. Fitting traditional perspectives on the linkages among life stages, these early experiences are important because of what they lead to over time, as in a cumulative process where later stages build on earlier ones. We also saw evidence of another form of linkages among life stages that has been reported from earlier research on the Terman sample and that has received greater attention in recent years. In this form, each stage of the life course has additive value in explaining adjustment and functioning. Circumstances at one stage of the life course do not merely subsume early experiences and statuses. Instead, these early experiences provide information about how two people in the same position arrived at that position in different ways and whether these differences have implications for overall adjustment and functioning. Put another way, our findings suggest that early experiences can have long-term consequences that are not necessarily cumulative in nature. In both the mediational and supplemental pathways, elements of earlier life stages, including family experiences, provide information about the routes that individuals take through life. They are chapters in a biography, both related to and independent of each other.

Drawing on life histories to link multiple stages of the life course, as we have done here, represents a key tool in life-course studies of human development and aging. The Terman study facilitates this endeavor with data encompassing most of the twentieth century and multiple domains of life. Use of the Terman data, however, also comes with trade-offs, the most important of which is their limited generalizability. The Terman men – highly intelligent, from generally comfortable backgrounds – are certainly not representative of American men as a whole, not even those from the same cohort.
Yet, in many other ways, these men are quite “average.” Past comparisons of the Terman sample with more representative data, for example, have consistently revealed more similarities than differences (Pavalko and Elder 1990; Shanahan et al. 1997). Our results are specialized, but we have two key reasons for arguing that they are not as specialized as the sample on which they were based. First, Neugarten (1969) contended that social psychological aspects of well-being are valuable for life-course research because of their power to transcend many population-level differences, and our research does have a focus on a social psychological phenomenon – how men feel about themselves and their worlds. Second, the men whose lives are captured in the results of the present study came of age in a social context that might have minimized their differences with other Americans. They lived through the Great Depression and two world wars, periodic economic booms and busts, the rise of the service sector, and the exponential growth of higher education. Such sweeping historical change likely minimized a good deal of the standardization of the life course and increased the opportunities for diverse developmental trajectories in the process (Shanahan 2000). Finally, this study of more advantaged men might be viewed as a companion piece to focused studies of men from other
race and class backgrounds, such as Blauner’s (1964) study of working-class men or Newman’s (2003) study of minority aging. Triangulating such studies, and looking for commonalities or differences among them, offers a strategy for sorting out the life course of Americans in the twentieth century.

The role of history, just mentioned, also brings up another important issue for long-term longitudinal research. In any study covering multiple stages of the life course, the sample members will have lived most of their lives in a time much different from the present, and their life courses will have spanned diverse historical epochs. The Terman sample is a prime example of this issue – the results of this study are embedded in a larger historical context that gives special meaning to them. The men were born in a time of relative peace and prosperity, things disrupted by World War I. They were typically out of high school and in college during the Great Depression and were young adults during World War II, in which many of them served both in the United States and abroad. They were adults during the sheltered postwar 1950s, middle-aged (at least) as the rapid social changes (e.g., civil rights, feminism) of the 1960s took place, and at or near retirement age in the economically stagnant 1970s. Consequently, the socioeconomic advantages many enjoyed early in life were likely partially blunted by massive historical events; the higher education that they pursued in great numbers was certainly not as normative as it might be for similar men today but was also part of a larger wave of men who entered college on the G.I. Bill; and the parental divorces that some experienced certainly carried a stronger social stigma than they would today.

Regardless of historical era, research on children and early family life often concludes that early experiences matter, but does the extent to which they matter “wear off” with time and with age?
At the same time, studies have identified many social and psychological factors that play key roles in the aging process, but how and from where do these factors arise? These two questions point to an increasingly important and recognized connection between different fields of research on widely separated phases of the life course. This study, and others based on the Terman data, provide at least a step toward the ultimate goal of life-course research: to capture how lives are lived within context from birth to death.
References

Aldenderfer, Mark S. and Roger K. Blashfield. 1984. Cluster Analysis. Beverly Hills, CA: Sage.
Amato, Paul R. and Bruce Keith. 1991. “Parental Divorce and the Well-Being of Children: A Meta-Analysis.” Psychological Bulletin 110:26–46.
Amato, Paul R., Laura S. Loomis, and Alan Booth. 1995. “Parental Divorce, Marital Conflict, and Offspring Well-Being During Early Adulthood.” Social Forces 73:895–915.
Antonucci, Toni and Hiroko Akiyama. 1995. “Convoys of Social Relations: Family and Friendships Within a Life Span Context.” Pp. 355–71 in Handbook of Aging and the Family, edited by Rosemary Blieszner and Victoria Hilkevitch-Bedford. Westport, CT: Greenwood.
Baltes, Paul B. and Margaret M. Baltes. 1990. “Psychological Perspectives on Successful Aging: The Model of Selective Optimization With Compensation.” Pp. 1–34 in Successful Aging: Perspectives From the Behavioral Sciences, edited by Paul B. Baltes and Margaret M. Baltes. New York: Cambridge University Press.
Baron, Reuben and David Kenny. 1986. “The Moderator-Mediator Variable Distinction in Social Psychological Research: Conceptual, Strategic, and Statistical Considerations.” Journal of Personality and Social Psychology 51:1173–82.
Bergman, Lars R. 2001. “A Person Approach in Research on Adolescence: Some Methodological Challenges.” Journal of Adolescent Research 16:28–53.
Bergman, Lars R. and Bassam M. El-Khouri. 1998. Sleipner: A Statistical Package for Pattern-Oriented Analyses. Stockholm, Sweden: Stockholm University.
Blauner, Bob. 1964. Alienation and Freedom: The Factory Worker and His Industry. Chicago: University of Chicago Press.
Block, Jack. 1993. “Studying Personality the Long Way.” Pp. 9–41 in Studying Lives Through Time: Personality and Development, edited by David C. Funder, Ross D. Parke, Carol Tomlinson-Keasey, and Keith Widaman. Washington, DC: American Psychological Association.
Bowlby, John. 1988. A Secure Base: Parent-Child Attachment and Healthy Human Development. New York: Basic Books.
Call, Kathleen T. and Jeylan T. Mortimer. 2001. Arenas of Comfort in Adolescence: A Study of Adjustment in Context. Mahwah, NJ: Lawrence Erlbaum.
Cameron, Stephen V. and James J. Heckman. 1993. “The Nonequivalence of High School Equivalences.” Journal of Labor Economics 11:1–47.
Chase-Lansdale, P. Lindsey, Andrew Cherlin, and Kathleen E. Kiernan. 1995. “The Long-Term Effects of Parental Divorce on the Mental Health of Young Adults: A Developmental Perspective.” Child Development 66:1614–34.
Cherlin, Andrew. 1992. Marriage, Divorce, and Remarriage: Social Trends in the U.S. Cambridge, MA: Harvard University Press.
Clausen, John A. 1991. “Adolescent Competence and the Shaping of the Life Course.” American Journal of Sociology 96:805–42.
Crockett, Lisa J. and Anne C. Petersen. 1993. “Adolescent Development: Health Risks and Opportunities for Health Promotion.” Pp. 13–37 in Promoting the Health of Adolescents: New Directions for the 21st Century, edited by Susan G. Millstein, Anne C. Petersen, and Elena O. Nightingale. New York: Oxford University Press.
Crosnoe, Robert and Glen H. Elder Jr. 2002. “Successful Adaptation in the Later Years: A Life Course Approach to Aging.” Social Psychology Quarterly 65:309–28.
Dornbusch, Sanford M. 1989. “The Sociology of Adolescence.” Annual Review of Sociology 15:233–59.
Elder, Glen H., Jr. and Monica Kirkpatrick Johnson. 2002. “The Life Course and Human Development: Challenges, Lessons, and New Directions.” Pp. 49–81 in Invitation to the Life Course: Toward New Understandings of Later Life, edited by Richard A. Settersten. Amityville, NY: Baywood.
Elder, Glen H., Jr., Eliza K. Pavalko, and Elizabeth C. Clipp. 1993. “Introduction.” Pp. 1–23 in Working With Archival Data: Studying Lives, by Glen H. Elder Jr., Eliza K. Pavalko, and Elizabeth C. Clipp. Newbury Park, CA: Sage.
Furstenberg, Frank, Thomas Cook, Jacquelynne Eccles, Glen Elder, and Arnold Sameroff. 1999. Managing to Make It: Urban Families and Adolescent Success. Chicago: University of Chicago Press.
Grotevant, Harold D. 1998. “Adolescent Development in Family Contexts.” Pp. 1097–1147 in Handbook of Child Psychology, edited by William Damon. New York: John Wiley.
Hetherington, E. Mavis and John Kelly. 2001. For Better or Worse: Divorce Reconsidered. New York: Norton.
Hetherington, E. Mavis, Margaret Bridges, and Glendessa M. Insabella. 1998. “What Matters? What Does Not? Five Perspectives on the Association Between Marital Transitions and Children’s Adjustment.” American Psychologist 53:167–84.
Maccoby, Eleanor. 2000. “Parenting and Its Effects on Children: On Reading and Misreading Behavior Genetics.” Annual Review of Psychology 51:1–27.
Magnusson, David and Robert B. Cairns. 1996. “Developmental Science: Toward a Unified Framework.” Pp. 7–30 in Developmental Science, edited by Robert B. Cairns, Glen H. Elder Jr., and E. J. Costello. New York: Cambridge University Press.
McLoyd, Vonnie. 1998. “Socioeconomic Disadvantage and Child Development.” American Psychologist 53:185–204.
Menaghan, Elizabeth G. 1989. “Role Changes and Psychological Well-Being: Variations in Effects by Gender and Role Repertoire.” Social Forces 67:693–714.
Mirowsky, John and Catherine E. Ross. 2003. Education, Social Status, and Health. New York: Aldine.
Moen, Phyllis, Donna Dempster-McClain, and Robin M. Williams Jr. 1992. “Successful Aging: A Life-Course Perspective on Women’s Multiple Roles and Health.” American Journal of Sociology 97:1612–38.
Neugarten, Bernice L. 1969. “Continuities and Discontinuities of Psychological Issues Into Adult Life.” Human Development 12:121–30.
Newman, Kathleen. 2003. A Different Shade of Gray. New York: New Policy Press.
Pavalko, Eliza K. and Glen H. Elder Jr. 1990. “World War II and Divorce: A Life Course Perspective.” American Journal of Sociology 95:1213–34.
Schneider, Barbara and James S. Coleman. 1993. Parents, Their Children, and Schools. Boulder, CO: Westview.
Settersten, Richard A. 1999. Lives in Time and Place: The Problems and Promises of Developmental Science. Amityville, NY: Baywood.
Sewell, William H. and Robert Hauser. 1980. “The Wisconsin Longitudinal Study of Social and Psychological Factors in Aspirations and Achievements.” Pp. 59–100 in Research in the Sociology of Education and Socialization, edited by Alan C. Kerckhoff. Greenwich, CT: JAI.
Shanahan, Michael J. 2000. “Pathways to Adulthood in Changing Societies: Variability and Mechanisms in Life Course Perspective.” Annual Review of Sociology 26:667–92.
Shanahan, Michael J., Glen H. Elder Jr., and Richard A. Miech. 1997. “History and Agency in Men’s Lives: Pathways to Achievement in Cohort Perspective.” Sociology of Education 70:54–67.
Shanahan, Michael J., Scott M. Hofer, and Richard A. Miech. 2002. “Planful Competence, the Life Course, and Aging: Retrospect and Prospect.” Pp. 189–211 in Personal Control in Social and Life Contexts, by Steven Zarit, Leonard Pearlin, and K. Warner Schaie. New York: Springer.
Steinberg, Laurence. 2001. “We Know Some Things: Parent-Adolescent Relationships in Retrospect and Prospect.” Journal of Research on Adolescence 11:1–20.
Terman, Lewis M. and Melita H. Oden. 1959. Genetic Studies of Genius, Volume 5: The Gifted Group at Mid-Life: Thirty-Five Years of Follow-Up of the Superior Child. Stanford, CA: Stanford University Press.
Thoits, Peggy A. 1992. “Identity Structures and Psychological Well-Being: Gender and Marital Status Comparisons.” Social Psychology Quarterly 55:236–56.
Vaillant, George E. 1983. The Natural History of Alcoholism: Causes, Patterns, and Paths to Recovery. Cambridge, MA: Harvard University Press.
Wallerstein, Judith S., Julia Lewis, and Susan Blakeslee. 2000. The Unexpected Legacy of Divorce: A Twenty-Five Year Landmark Study. New York: Hyperion.
6
The Developmental Niche: A Conceptualization at the Interface of Child and Culture
Charles M. Super and Sara Harkness
Source: International Journal of Behavioral Development, 9 (1986): 545–569.
Research on human development has been shaped by two contrasting images. The first is of a single individual in a carefully controlled setting, demonstrating behaviors characteristic of a certain level or kind of functioning. Questions asked in this setting include: how do people like this perceive, think, or react? What is the structure of their intellectual abilities or the style of their affective regulation? How does this change from one age to another? The metaphor of development here is growth, an unfolding or emergence of structures and functions, a sequence of transformations that belongs to our species and the laws of which can be discovered by detailed probing under laboratory conditions. The second image is of a person richly attired in ceremonial garments and surrounded by friends and kin, behaving in a way unique to that particular setting and to the larger culture which creates it. The questions here are: why is this person doing this thing, and how did he or she learn to do it? How does the behavior fit with other aspects of the culture? What does it mean to the persons involved, and how did it come to mean that? The metaphor of development evoked by these questions is the molding by culture of human potential to the particular patterns of behavior that are adaptive in that context.
Each of these images has been associated with a field of academic inquiry. The ‘universal’ individual observed under special conditions has been the object of psychological research, from Wundt’s brass-instrumented laboratory in Leipzig to Piaget’s méthode clinique and American experiments in cognitive development. The behavior of people in exotic cultures, on the other hand, has been the domain of anthropological study. Unlike the vertical theories of developmental psychology, anthropological theories have presented a horizontal panorama of human variation. To be sure, anthropological studies have drawn on psychological theory in attempts to formulate the links between culture and the individual, and psychological researchers have recently acknowledged that the laboratory is a cultural artifact embedded in socially regulated meanings (see Jahoda 1982; Harkness and Super in press). Nevertheless, the contrasting images continue to function in the creation and presentation of research on human development, and the integration of these metaphors is a continuing challenge.
In this essay, we briefly review some earlier formulations of the interface between culture and individual, and we then introduce the ‘developmental niche’, a set of concepts that is proving useful in research on culture and child development. The physical and social settings of everyday life, the customs of child care, and the psychology of caretakers are seen as three integrated subsystems of the niche, each with its own set of relations to the larger environment. Although not a formal theory in the classical sense, the developmental niche provides a framework for examining the effects of cultural features on child rearing in interaction with general developmental parameters. These ideas have been developed in the context of our work in East Africa, and we will draw on it to illustrate the major points.
Anthropological Perspectives on Human Development
Culture, according to one major perspective in anthropology, resides in the individual mind; a theory of culture must therefore include how it gets there and how it functions there. From the time of Margaret Mead and Ruth Benedict, anthropologists have attempted to draw the relationships between cultural environments and the behavior of individuals within them. The most fully elaborated of these attempts is John Whiting’s ‘model for psychocultural research’, which postulated that: ‘(1) Features in the history of any society and in the natural environment in which it is situated influence (2) the customary methods by which infants (and children) are cared for in that society, which have (3) enduring psychological and physiological effects on the members of that society, which are manifested in (4) the cultural projective-expressive systems of the society and the physiques of its members’ (J. Whiting 1981: 155).
As drawn in schematic form (e.g., J. Whiting 1977: 30), history and environment jointly influence the ‘maintenance systems’ of a society, which include the settlement patterns, economic base, division of labor, and household structure. From the maintenance system flow elements of the child’s ‘learning environment’, the whole of socialization that leads to variation in adult psychological functioning. Aspects of adult personality as culturally formed can be inferred, finally, from the ways that they are culturally expressed or ‘projected’ in rituals or belief systems. Although Whiting explicitly acknowledges that other arrows of causality might reasonably be drawn (for example, expressive systems might influence child rearing techniques), research using this model generally follows the hypothesized causal lines, with much of it centered on describing elements of the learning environments of children in different cultures (see Munroe, Munroe and B. Whiting 1981). The Whiting model was built on prevailing psychological theories of the 1940s and 1950s, as well as on the premise in functionalist anthropology that different domains of a culture are systematically and usefully related to each other. From psychology came the idea of personality as a set of enduring dispositions whose roots could be traced to early experience. Both Freudian theory and social learning theory were used in formulating the links from individual experience to adult behavior as represented by rituals and beliefs. In addition, the model assumed that both methods of child rearing and projective systems were patterns of behavior and thought shared by many if not all members of the culture. This was, in anthropology, the ‘culture and personality’ approach to understanding the ‘typical’ or ‘modal’ personality of members of a culture. More recently, some of the theoretical underpinnings of the Whiting model have been challenged. 
The usefulness of ‘personality’ as a construct, its roots in interpersonal experience, and the assumption of continuity over the life-span have all been strongly questioned by empirical research (Fiske 1974; Mischel 1968; Kagan and Klein 1973; Shweder 1979). At the same time, studies of individual people in different cultures have shown that, as Whiting himself has said, culture is orthogonal to personality, and the constructs that are useful for describing behavior at the group level do not seem to apply very well to the explanation of individual behavior. In addition, the linear assumptions of cause and effect, borrowed from social learning theory’s experimental model, have been recognized as inadequate. LeVine (1970: 596–597) comments: ‘Customs like child-rearing practices and the variety of cultural behavior patterns with which they have been hypothetically linked tend to be associated with many other customs, and these multiple associations lend themselves to a variety of interpretations, some of them sociological or ecological rather than psychological. In the welter of multiple connections ... it is all too easy to find support of simple causal hypotheses by limiting one’s investigation to a few variables rather than looking at the larger structure of relations in which they are embedded.’
The Whiting model stimulated, and over the years synthesized, a full generation of anthropological research on children and their caretakers. One of its fruits has been renewed thinking about the interface between individual development and its cultural context (see Harkness and Super in press). As anthropological understanding of this interface evolved from ‘child training’ (Whiting and Child 1953) to ‘learning environments’ (Whiting and Whiting 1975) to the ‘acquisition of culture’ (Schwartz 1981), developmental psychology was also revising its appreciation of the relationship, and we turn now to that history.
Psychology and the Environment
The notion that development is influenced by the environment is about as old as the idea of development itself; in a trivial sense, environment in the form of ‘stimuli’ or even ‘experience’ has been a cornerstone of psychology since its earliest philosophical beginnings. But as the formal discipline of psychology was created to apply ‘the scientific method’ to understanding the human mind, the environment as an object of study was excluded. The new science of the mind sought universal laws, free of context, in the isolation of the laboratory. The child study movement, as it grew from collateral roots in the early part of this century (see Siegel and White 1982), was nurtured by humanist, educational, interdisciplinary, and policy-oriented concerns as well as scientific ones. When it came to be firmly and broadly established in the academy in the two decades after World War II, however, it was transformed as ‘developmental psychology’, a logical-positivist, laboratory-based enterprise. It had fully incorporated psychology’s dedication to the individual as the object of study (Cairns 1983; McCandless 1970; Super 1982). Although the experimental paradigm has dominated the field of child development for several decades, a small but persistent tradition has always been concerned with the limitations of studying human behavior only in the laboratory. In its interdisciplinary origins it has connections to the work of Mead and other anthropologists, but it also includes observational work by psychologists: Dennis’s (1940) research in a Hopi village, for example, and studies of the psychological ecology of growing up in the Midwest by Barker and Wright (1949). The latter was particularly inspired by Lewin’s (1936) ‘field theory’ of behavior which incorporated both experimentation and non-laboratory locales.
Based to some degree on this tradition, there occurred in the mid-1970s a major shift within the field of developmental psychology concerning the role of the environment in development. Many exemplars of this shift could be cited, but we will review three major statements to indicate the breadth of theoretical reorientation. McCall (1977), in one of the more widely applauded critiques of what was then modal work in developmental psychology, focused on inherent
problems in the laboratory paradigm. ‘Few studies’, he wrote, ‘are concerned with development as it transpires in naturalistic environments’, and he attributed the triviality of much research to excessive devotion to an experimental model that came ‘to dictate rather than serve research questions’ (1977: 333). Because it is neither practical nor ethical to manipulate essential aspects of human development, McCall concluded, laboratory research can never answer questions essential to the discipline. A similar concern prompted Bronfenbrenner’s (1979) frequently quoted statement that ‘much of developmental psychology, as it now exists, is the science of the strange behavior of children in strange situations with strange adults for the briefest possible periods of time’ (1979: 19). The major thrust of his ecological approach to child development is to expand both the methods and the vision of psychology beyond the individual as the exclusive focus of analysis. A child’s environment can not be reduced to a single immediate setting containing the subject, Bronfenbrenner argued, for ‘environmental events and conditions outside any immediate setting containing the person can have a profound influence on behavior and development within that setting … for example (by) defining the meaning of the immediate situation to that person’ (1979: 18). Bronfenbrenner’s (1979) scheme for dividing the child’s environment into micro-, meso-, exo-, and macro-systems has proven widely influential as a framework for examining the ‘environment’ in a new way. One aspect of his approach especially relevant here is an emphasis on ‘the progressive accommodation throughout the life span, between the growing human organism and the changing environments in which it lives and grows’ (1977: 513). 
Kessen’s (1979) essay on ‘the American child and other cultural inventions’ and its subsequent elaborations (Kessel and Siegel 1983) present a philosophical and historical argument that complements McCall’s critique of methodology and Bronfenbrenner’s statement of theory. Our understandings of the nature of the child are too varying over time and too related to contemporary intellectual ambiance to permit any confident conclusions about ‘the child’. In Kessen’s words: ‘If we were truly to recognize that the study of children is not exclusively or even mainly a scientific enterprise in the narrow sense [he means “experimental”], but stretches out toward philosophy and history and demography, if we were to recognize such an expanded definition of child study, we might anticipate a new (science) whose object of study is not the true child or my piece of the true child but the changing diversity of children’ (Kessen 1983: 37–38). In short, Kessen’s claim, like McCall’s and Bronfenbrenner’s, is that child study of the previous decades did not use an adequate model of development and did not provide adequate tools for arriving at one. The appropriate object of study, he argues, is not the child but the child-in-context. As the theoretical ferment of the 1970s centered on the nature and role of the developmental environment, it is not surprising to find also at that time a reconsideration of the models for the environment. Bronfenbrenner
and Crouter (1983) have presented an analysis of the theories of the environment hidden in the major theories of development, and correspondingly the ‘latent paradigm shifts’ concerning the environment that accompanied more overt changes in developmental theory. Until recently ‘hidden’ theories of the environment were the only kind available in psychology because of the personological bias in the discipline (Harkness 1980) and, in fact, in Western culture generally (Shweder and Bourne 1982). Bronfenbrenner’s (1979) ecological model of the environment and its network of influences has already been briefly mentioned as one product of the new look at the context of development. A second, increasingly common approach is to see continuous, inductive aspects of the environment as an ‘epigenetic landscape’ (e.g., Fishbein 1976; McCall 1981, Scarr-Salapatek 1976), borrowing the metaphor from Waddington’s (1957) genetics and Spemann’s (1938) embryology. Life-span and life-course approaches (e.g., Elder and Rockwell, 1979; Baltes 1979) represent a third rethinking of the environment, while the application of general systems theory to human development combines some features of all these models (see Sander, Stechler, Burns and Lee 1979; Sameroff 1983; Sameroff and Chandler 1975). In sum, developmental psychology underwent a fundamental change in its appreciation of the context of development in the 1970s. The limitations of a purely analytic, laboratory discipline were argued by a number of prominent authors, the validity of a developmental model based exclusively on the individual child was questioned, and fresh theories blossomed quickly from a variety of historical roots in order to represent psychology’s new insights.
The Developmental Niche
The concept of the developmental niche lies at the juncture of the theoretical concerns in psychology and anthropology outlined above, and it attempts to capture important features from both disciplines. The recent models of the environment for development, however, do not generally acknowledge its cultural structuring, even though this may be the most important aspect of human ecology. On the other hand, anthropological approaches to culture and human development have been excessively oriented to the ‘final product’ in adulthood rather than focusing on developmental processes throughout the life-span. The developmental niche, in response, is a theoretical framework for studying cultural regulation of the micro-environment of the child, and it attempts to describe this environment from the point of view of the child in order to understand processes of development and acquisition of culture. The developmental niche has three major subsystems which operate together as a larger system and each of which operates conditionally with other features of the culture. The three components are: (1) the physical and social
settings in which the child lives; (2) culturally regulated customs of child care and child rearing; and (3) the psychology of the caretakers. These three subsystems share the common function of mediating the individual’s developmental experience within the larger culture. Regularities in the subsystems, as well as thematic continuities from one culturally defined developmental stage to the next, provide material from which the child abstracts the social, affective, and cognitive rules of the culture, much as the rules of grammar are abstracted from the regularities of the speech environment. The three components of the developmental niche form the cultural context of child development.
Physical and Social Settings
B. Whiting (1980) has pointed out that one of the most powerful ways culture influences child development is through providing the settings of daily life. The people who frequent the settings are seen as especially formative of social behaviors because they determine the kind of interactions children have the opportunity and the need to practice. Infants, for example, universally elicit nurturant acts from caretakers and others around them. Societal institutions such as formal schooling have a major effect on the age and sex of children’s daily companions, and thus on the types of social interactions experienced. B. Whiting, Edwards, and their collaborators (1986) have recently compiled observational data from a number of communities around the world to explore this function of culture. In our research in Kokwet, a rural Kipsigis community of Kenya, we have examined relationships between the settings of children’s everyday lives and various aspects of child development. Some of these studies reveal differences from Western norms in aspects of development that have been considered universal. For example, differences in sleep patterns between infants in Kokwet and in an urban American sample were related to differences in settings: whereas the Kokwet babies slept with their mothers and were never left alone during the day, the American babies generally slept in their own beds, often in a separate room, and they slept in separate, quiet places during the day as well. One result of these differences in physical settings and daily routines was that the Kokwet babies slept less, overall, than the American ones; they also continued to wake every few hours at night months after most American babies had begun to sleep for long periods (Super and Harkness 1982).
Similarly, the percent of time an infant spent sitting (e.g., in a caretaker’s lap) as opposed to lying down was found to be a factor in the speed at which the universal skill of sitting alone is acquired (Super 1976, 1981). The physical environment of mats, cribs, and/or chairs, combined with the social environment of caretakers and companions, structures the infant’s opportunities for developing emerging behavioral potentials.
Another example of the power of settings in determining the development of apparently universal behaviors is in the domain of gender segregation in children’s peer groups. Recent American research has established the tendency of boys and girls to associate preferentially with members of their own sex, and some effort has been oriented to documenting the exact onset of this behavior in the preschool years. In Kokwet, however, children from late infancy through middle childhood spent most of their time in mixed-age, mixed-sex groups of children from the same or neighboring households. The tendency for boys and girls to associate more with same-sex peers did not emerge until after the age of six, when they were considered old enough to leave their own homesteads to seek companions. Thus, it appears that the question of developmental trends in children’s choice of companions cannot be addressed independently of the settings of their daily lives (Harkness and Super 1985b). A salient aspect of child life in rural East African communities, as in many other cultures, is the extent to which children participate in the work of the household. Age trends in work activities, contrasted to play or rest, illustrate this aspect of the physical and social settings of the children in Kokwet. Using several hundred ‘family spot observations’ which noted the activities and locations of all members of a household at different times of day, we assembled a composite picture of the main activities of children from infancy to age nine years in Kokwet. The category of ‘work’ included a long list of chores such as processing food, cooking and tending the fire, collecting firewood and bringing water, taking care of animals (mainly cows or sheep and goats), and caring for babies. ‘Play’ included both individual and social play, while ‘rest’ included sleeping, lying down, and sitting quietly alone or with others. 
Together, these three categories accounted for approximately 80 percent of the children’s observations, with eating and school filling in most of the remainder (most children in Kokwet under the age of 10 years, however, did not attend school). Analysis of the activities of children of different ages shows that at age two years, play occupied almost half the children’s time while rest accounted for another 25 percent. However, the proportions of time spent in these different activities began to change rapidly toward participation in the household economy. By age four, children were observed almost equally often in play, rest, and work. By six or seven years, children were spending half their time in work activities while play and rest came to occupy minimal proportions of their days. The structuring of settings in terms of activities set the parameters for the kinds of social interactions which could take place within them, in much the same way as the cast of characters present also sets limits. In the case of Kokwet, playful interactions might occur within the context of carrying out household tasks such as watching the cows or caring for a younger sibling; but these play sequences were frequently punctuated by the demands of work (Harkness and Super 1986). In contrast to the middle class
Western emphasis on play as central to young children’s development, work was clearly the main task of childhood in Kokwet.
Customs of Child Care
Physical aspects of the setting can shape the growing child’s experience, at the most basic level, through infectious pathogens and parasites that slow, alter, or terminate the processes of biological growth. Similarly the physical availability of adequate nutrients is critical. Virtually all aspects of the physical setting, however, are mediated by cultural adaptations in child care practices. The presence, for example, of dangerous objects such as cooking fires, deep water, staircases, and large or poisonous animals will prompt accommodations in techniques of care, including closeness of supervision. Given the human and technological resources available, parents and other caretakers adapt the customs of child care to the ecological and cultural settings in which they live. Customs as discussed here are sequences of behavior so commonly used by members of the community, and so thoroughly integrated into the larger culture, that they do not need individual rationalization and are not necessarily given conscious thought. Although at the group level they can be seen as adaptations to the larger environment or ways of coping with developmental issues, they are more likely to be regarded by members of a culture as the ‘reasonable’ or ‘natural’ thing to do. As such, these features of child rearing are not so much the immediate product of individual choice or personal disposition as they are community-wide solutions to recurrent issues in child rearing. Customs in this sense include not only routine tools for everyday living, such as where to put the baby, but also infrequent, complex, and institutionalized mechanisms such as adolescent circumcision rituals and sending children to school. From the point of view of the researcher, customs of child care can be seen as behavioral strategies for dealing with children of particular ages, in the context of particular environmental constraints.
Carrying an infant on the back, tied with a shawl or piece of cloth, is a customary method of infant care in many societies. Our spot observations in Kokwet show that backcarrying was rare in the first month of life but thereafter during the first year was used for 17 percent of the infant’s daytime care. Initially much of the carrying was done by the mother, but by three months of age a sibling caretaker (typically a 7-year-old sister) had assumed more than 25 percent of the immediate handling of the baby. Reasons for carrying given by Kipsigis mothers and child caretakers when asked were to soothe the baby (through contact and rocking) and to keep him or her out of trouble. In addition, the infants were riding on the caretaker’s hip or being held vertically in her arms for an additional 12 percent of the day.
There are a number of possible consequences for the infant, including the pattern of visual experience, social interaction, and physical exercise through bodily adjustments to the caretaker’s movement (see Super 1981). In the latter case, experimental research has identified lasting effects. Porter (1972) and Clark, Kreutzberg and Chee (1977), for example, introduced passive limb exercise and vestibular stimulation to normal American infants and demonstrated significantly increased physical growth and reflexive and gross motor development. Their limited interventions appear to be less than the routine difference between rural Kipsigis and urban American customs of care. Further, though perhaps of less significance, the increased time being held results in less time available for practicing prone and supine behaviors. Infants in Kokwet were observed to be lying down about 10 percent of their waking time compared to 30 percent in an urban American sample. This difference in the patterning of physical exercise is thought to contribute to the later emergence of crawling in Kokwet, just as greater experience with sitting and walking behaviors contributes to the Kipsigis infants’ earlier accomplishment of these milestones (Super 1976, 1981). Corresponding to the physical care that results in differential exercise, parents in Kokwet customarily and deliberately ‘taught’ their infants to sit and walk (but not to crawl). There were specific behavioral routines, with specific words to refer to them, that parents and siblings all knew and practiced on a nearly daily basis months before the skills were fully acquired by the baby.
Psychology of the Caretakers
Although most child-rearing customs are accepted without critical examination, they are often accompanied by specific beliefs concerning their significance. Kipsigis parents believed that without specific teaching, infants’ sitting and walking would be delayed or impaired (Super 1976, 1981); the belief did not extend to crawling. There are many beliefs and values that are regulated by the culture and that in turn regulate development of the child; we separate them, as the psychology of the caretakers, to be the third systematic feature of the developmental niche. The psychology of the caretakers includes ethnotheories of child behavior and development as well as the commonly learned affective orientations which parents bring to their experience of parenting. Most important among the ethnotheories are beliefs concerning the nature and needs of children, parental and community goals for rearing, and caretaker beliefs about effective rearing techniques. Within constraints created by the physical environment, available technology, customs of child care, and the demands of parents’ own activities, the psychology of the caretakers organizes parental strategies of child rearing in both the immediate and the more long-term
sense. For example, parents’ assignments of their children to different settings express beliefs about the capabilities of children at different ages as well as parental goals for their children’s development. The responses of parents and other caretakers to children’s emotional displays also are directed by ideas, often implicit, about the development of the self in the context of the particular culture. Caretaker psychology provides immediate structure to children’s development through the meaning it invests in universal behaviors and processes. Adults apply culturally relevant schemas of interpretation even to the earliest behavior of newborns. We asked mothers in Kokwet and Boston to rate the similarity of various neonatal behaviors included on the Neonatal Behavioral Assessment Scale (Brazelton 1973), and the results indicate that while mothers in both cultures used similar dimensions in making their responses, their emphasis differed (Super 1986b). A jerky sweep of the hand in response to an examiner’s touch on the face, for example, was seen positively by Kipsigis mothers as reflecting responsive motor integrity (Dimension II), a sign of health and strength. A mother in Boston, in contrast, was more likely to weight her perception of the motion with concern over the disorganization implicit in the jerkiness, for controlled states of arousal (Dimension I) were the dominant organizing feature of American perceptions of the newborn. More generally, deVries and Super (1979) concluded on the basis of conducting neonatal examinations in the home that some cultures (Masai, Kikuyu, and Kipsigis in their study) assume infants to be ‘fragile creatures, easily threatened by rough handling or overstimulation … In contrast, the Digo appear to think of their babies as relatively hardy and not in need of special protection from physical distress’ (1979: 95). Mothers’ beliefs were also evident in their approaches to child language socialization.
In interviews about how children learn to talk, the Kokwet mothers generally expressed the view that children learned to talk more from each other than from the mothers themselves. Some of the mothers claimed they did nothing to encourage their children’s language development, and among those who did, commands (which generally do not require a verbal response) were the most frequently mentioned type of language input. Naturalistic observations confirm the mothers’ reports: by comparison with American studies, the frequency of the Kokwet mothers’ speech to their two- to three-year-old children was remarkably low. We have suggested that this approach to child language socialization in Kokwet reflects Kipsigis parental goals of training for obedience and responsibility rather than for verbally expressive individuality (Harkness and Super 1982). The centrality of obedience and responsibility in Kipsigis parental theories was also demonstrated in our explorations of mothers’ ideas of intelligence and personality in Kokwet (Super 1983). Discussions with a group of mothers in the community yielded a group of words and phrases that were commonly used in talking about children. Concepts referring to a child’s helpfulness
and obedience were the largest group among these. Another term, translated as ‘intelligence’ (ng’om), also carried a strong component of ‘responsibility’. One informant illustrated the meaning as follows: ‘For a girl who is ng’om, after eating she sweeps the house because she knows it should be done. Then she washes the dishes, looks for vegetables, and takes good care of the baby. When you come home, you feel pleased and say, “This child is ng’om.” Another girl may not even clean her own dishes, but just go out and play, leaving the baby to cry. For a boy, if he is ng’om, he will watch the cows, and take them to the river without being told. He knows to separate the calves from the cows and he will fix the thorn fence when it is broken. The other boy will let the cows into the maize field and will be found playing while they eat the maize.’
Further investigations showed that while ‘intelligence’ was recognized as a verbal, social quality in the abstract, its most salient expression was in the domain of carrying out one’s responsibilities at home (cf. Dasen, Barthelemy, Kan, Kouame, Daouda, Adjei and Assande 1985). In this context the ability to be helpful without being reminded by an adult emerged as an important marker of intelligence. Likewise, mothers in Kokwet stated that they felt they could make judgments of a child’s personality at about the same age (five or six years) that they expected to be able to assign the child to run an errand to a nearby homestead or store (Super and Harkness 1983). Culturally constructed theories such as these were important in parents’ definitions of their children’s developmental stage; and such definitions, in turn, were translated into parental assignments of their children to different physical and social settings. The concepts of obedience and responsibility were important not only for parents’ judgments of their children’s enduring qualities, but also in parental decisions about whether a child was ‘old enough’ to carry out culturally salient tasks, e.g., old enough to send on errands.
Three Corollaries
We have borrowed the term ‘niche’ from biological ecology, where it is used to refer to an organism’s place or function in a biosystem (the etymological origin is the same as ‘nest’). There are three corollaries to be borrowed as well. (1) The three components of the niche operate in a coordinated manner. (2) Each component interacts differentially with other features of the larger ecology. (3) The organism and the niche are mutually adapted. It is noteworthy that these ideas are also represented to varying degrees in culture theory; that cultural components act as a coordinated system, in particular, has been a central concept in anthropological theory almost from its beginnings.
The niche as a system. The three components of the developmental niche operate as a system with homeostatic mechanisms that promote consonance
among them. This is particularly evident in the examples of motor and language development described above. The settings, customs, and caretaker psychology each dispose toward the same acquisition and socialization. It is through such reinforcing patterns that culture has its most powerful immediate influence. Coordination in subsystems of the niche is also evident at times of successful transition in the child’s culturally defined developmental status, for example the shift from infancy to early childhood. Like many other sub-Saharan peoples, the Kipsigis believed that having a younger sibling was an important element in the socialization of children. Last-born children, because they were never replaced by a new baby as the center of the family’s indulgent attentions, tended to be ‘spoiled’ and difficult throughout life, it was thought, lacking in those qualities of obedience and responsibility which we have described above. For this reason, the arrival of a new baby was seen as the opportunity to implement a change of status for the second-to-youngest, which was expressed through changes in the settings and customs related to the child. While as an infant the child had slept at the mother’s breast, he or she would now be moved to sleep at her back or perhaps with the other young children in a separate bed. This child would also no longer be carried by the mother, and would be considered old enough to be the junior member of a household play or chore group rather than being assigned to a child caretaker. We have documented the changes in the amount and nature of adult attention which children received as a function of this culturally defined developmental change, as well as the changes in their daily activities (Harkness and Super 1983, 1986).
Subsystems of the niche and external systems.
Each of the three subsystems of the niche is also embedded, in different ways, in other aspects of the human ecology; the niche is an ‘open system’ in the formal sense (von Bertalanffy 1968). We have discussed some immediate effects of the physical setting above, but there are larger effects of the physical environment on various aspects of the niche. For example, the differences in infant carrying between Kokwet and Boston appear, in wider perspective, to be strongly influenced by climate. J. Whiting (1981: 175–176) concluded on the basis of a cross-cultural survey: ‘The manner in which infants are cared for is to a considerable extent constrained by the physical environment, the temperature of the coldest month of the year being the most important factor. In cold climates infants tend to be carried in a cradle, swaddled, and put in a cradle to sleep. In warm climates they are usually carried in a sling or shawl, often nap on their caretaker’s back, sleep next to their mothers at night, and are clothed lightly or not at all.’ Similarly, the subsistence base of a society (agricultural vs hunting and gathering) has been related to the goals and techniques of socialization for independence and obedience (Barry, Child and Bacon 1958), that is particularly, in our terms, to the psychology of the caretakers and the parent–child interactions that derive therefrom. The concept
of the developmental niche is designed, in part, to facilitate identification of the specific mechanisms that lie behind such large-scale, cross-cultural findings, and in so doing it reveals that the three components are differentially responsive to features of the larger culture and environment. The connections are most evident under conditions of change, for any component of the niche can be a route of innovation and disequilibrium. In Kokwet, the introduction of free, government-sponsored schooling has affected the settings of daily life for school-age children and the younger siblings who have been their charges. The custom of adolescent circumcision has been affected by the strictures of Christian missionary churches in the area, and more recently by a Presidential order that female circumcision was to be disallowed altogether. Parental beliefs about parent–child relations have been affected by teachings of the churches and other sources of ‘modern’ thinking, with wide-ranging effects that include language socialization (Harkness 1977) and family intimacy. In order to understand local adaptations to these changes introduced from the outside, it is useful to refer back to the first corollary of homeostatic mechanisms promoting cultural consistency. When change is introduced through one of the subsystems of the developmental niche, the initial cultural response is likely to be ‘conservative’ in that attempts are made to preserve as many elements as possible of the subsystem affected, and the other two subsystems may not change at all. Thus in the example of schooling, child caretakers continued to be used for infant care in Kokwet, even though in theory there should have been fewer of them available. Mothers have overcome this potential shortage through using younger than ideal siblings as caretakers, enrolling some children in school later, or hiring children from other families.
The parental theories of obedience and responsibility, central for traditional roles of children in Kipsigis society but probably less adaptive for success in school, continued to define children’s developmental status and social identities. Eventually, however, if consequences of changes grow and ripple through the system, the same forces of homeostasis that minimize the initial response will now bring the three subsystems of the niche into a new consonance. In the case of schooling in Kokwet, parents began to perceive the importance of education as a way to send children into the salaried economy and reduce pressure on the farm land. This fostered changes in the settings parents assigned their children to and their customary child care practices. Daily homework and year-end exam preparation have come to replace some chores and other traditional features of family life. The concept of ng’om has been elaborated to ng’om en ga (‘intelligent at home’) and ng’om en sukul (‘intelligent at school’), child characteristics which were generally agreed to be uncorrelated. The frequent appearance of ‘all or nothing’ forms of culture change for children may be the joint result of the homeostatic and the differential linkage features of the developmental niche.
Mutual adaptation. Popular conceptions of adaptation have the organism adapting to the environment. Evolutionary biologists have found the relationship more problematic. Lewontin (1978), for example, agrees that as antelope and other hooved species migrate to new grasslands, selection may indeed, over time, effect their adaptation to the niche. On the other hand, he points out, the animals also alter the grasses through the physical action of their feet, the biochemical action of their droppings, and of course their selective actions of consumption and seed dispersal. The niche adapts too, and the ‘final’ result, if there is one, is a mutual adaptation of organism and niche, a co-evolution of the individual–environment system. The same mutuality occurs in the developmental niche. Certainly children ‘adapt’ to their environment; that is the basis of a full literature concerning environmental effects on child development. But there is also a complementary environmental adaptation, or more accurately, a co-evolution. At the level of individuals, this has received wide attention in the study of temperament and ‘child effects on parental behavior’ (Bell 1968; Thomas and Chess 1977). It is also evident in attempts to conceptualize the individual and environment as a formal system (Sander et al. 1979; Sameroff 1983). More generally, however, species-wide characteristics of growth act to constrain the kinds of niches that work. Rogoff, Sellers, Pirrotta, Fox, and White (1975) have drawn inferences about universal stages in development from similarities across cultures in the ages at which certain tasks and responsibilities are assigned to children. In a more limited study, we have found an age-related structure to children’s social environments in Kokwet that is familiar to the Western eye. Despite some unique features, it seems to reflect environmental accommodation to the universal needs and abilities of children of different ages (Harkness and Super 1983).
There is a growing body of evidence on maturationally controlled shifts in children’s cognitive and emotional characteristics (Kagan 1976, 1984; Konner 1982; Super 1972, in press), and these changes appear to be a critical element in the expectations and demands placed on children by parents and the community. Because of the multiple interconnectedness of elements of the niche with each other and with the larger environment, however, there are constraints on the ability of niches to adapt. For example, the daily schedules of American parents and their values regarding independence and autonomy make particularly troublesome an infant who is irregular in sleeping habits. This aspect of individual temperament is one factor in the classic ‘difficult child syndrome’ of Thomas and Chess (1977). In Kokwet, however, sleeping arrangements and the absence of institutionalized work schedules virtually eliminated sleep as a source of difficulty in caring for infants. On the other hand, the Kipsigis niche was not easily able to deal with the baby who did not like being carried on the back, or who objected to being cared for by someone other than the mother. These two common features of infant care in Kokwet were too tightly connected to the mothers’ work and the larger organization
of family life to be very flexible in the absence of major reorganization (Super and Harkness 1981; Super 1986a).
Niches in Development
The developmental niche of a child does not remain constant for long. In large part this is environmental accommodation to the growing individual, but the quality and timing of shifts in the niche bear the imprint of culture. Most importantly, there is a synergy to the sequence of niches that creates the most powerful long-term effects of culture on development. Western theories of development, aside from the most extreme behaviorist position, locate discrete stages in psychological growth, a hierarchical, goal-oriented analogue to ‘punctuated equilibrium’ in evolutionary theory (Gould and Eldredge 1977, 1986). At the core of each stage is a common, age-related task, be it understanding object permanency (Piaget 1970), establishing basic trust (Erikson 1950), or resolving Oedipal issues (Freud 1956). There are important truths represented in such theories, but they overlook culturally specific themes that run across stages. One consequence of these larger themes is a subtle restatement of the task for any one stage in light of the transcendent issues. In Kokwet, the values of obedience and responsibility provided a central theme of continuity in successive developmental niches of infancy and childhood. The sharing of infant care, the close proximity of infants to others, and the consequent necessity for the infant to adapt to the exigencies of other people’s daily lives composed this lesson: you are part of a social group whose needs will shape your life from moment to moment, just as it will accommodate to your needs. The universal transition to early childhood took place with local goals and methods. The child was distanced from the mother’s breast, back, and bed, stronger ties were developed with peers and older sibling caretakers, and the child began in the third year of life involvement in the household economy. The child learned about respect for elders and responsibilities to the household.
By six or seven years the child spent the majority of waking hours in productive and largely prosocial activities, but forged a new and generally positive relationship with the parents as a reliable helper in the tasks of the household. The acquisition of social responsibility was the criterion for adequate development, and its growth defined the beginnings, ends, and internal structuring of Kipsigis developmental stages. This agenda for social behavioral development is intimately related to the themes of affective development. Unlike most middle-class American parents, parents in Kokwet did not customarily engage in negotiations with their infants or children over the regulation of emotion, sleep–wake patterns, or eating. Initially, infant care practices consisted of management by others of the infant’s state. Signals of hunger, tiredness, or fussiness were responded
to promptly for the restoration of equilibrium. Although this pattern of care is sometimes labeled indulgent, as might be appropriate for a European or American who used it, the local meaning was probably quite different: others, not the baby, are in charge of dealing with variations in the baby’s physical and emotional state. The decrease in outside regulation as a Kipsigis child progressed to the early childhood niche could be difficult, but one theme remained constant: emotional perturbations were met with calming and distraction, not communication and elaboration with others. By middle childhood the focus was on what needs to be done, not on what the child felt like doing. The management of state in the individual became an accessory to the management of the social group as a whole. Short of physical symptoms of distress, variations in emotional state were not a focus of major concern to either caretakers or the child. The ‘affective–cognitive structures’ (Izard 1978) developed by the child who moves through these niches necessarily reflect the meanings abstracted from them. As revealed by symbolic interpretations of line drawings (Harkness and Super 1985a), Kipsigis children have, by middle childhood, learned to experience a relatively calm state as positive. They are cautious with regard to a more ‘agitated, excited’ stimulus, universally labeled ‘happy’ by American adults and older children. As one Kipsigis explained, ‘Being happy is when nothing is bothering you.’ Even when responding to identical physical stimuli and using common words with broadly similar denotation, Kipsigis and Americans have constructed different systems of meaning, different affective–cognitive structures, from the scripts learned and relearned during childhood. Our discussion so far has dealt with similarities in the content of themes across niches, but it is important to realize that the sequence of niches also regulates transitions.
Thus what may appear to be a sudden break in societal demands may actually be a familiar and rehearsed transformation: metaphorically, an intra-dimensional rather than extra-dimensional shift (Kendler and Kendler 1967). The ‘indulgent’ niche of infancy and the strict prohibitions on crying at the time of adolescent circumcision are not inconsistent. Rather, the surgical ceremony marks quite dramatically the transition from childhood to adulthood, a transition that has been prepared by all the previous niches. The change is a sharp one from an outside view, but for the Kipsigis child it is an important culmination of experience, tying together the central symbols of childhood and transforming one to an adult, a Kipsigis adult.
Summary and Conclusion
The developmental niche is a conceptualization at the interface of child and culture. It can serve as a framework for relating findings in the separate disciplines of psychology and anthropology, and for examining the mechanisms involved in the cultural regulation of child development. The three components
of the developmental niche involved in this mediation are: (1) the physical and social settings in which the child lives; (2) the customs of child care and child rearing; and (3) the psychology of the caretakers. These three subsystems function with different relationships to other features of the larger culture and environment and thus they constitute somewhat independent routes of disequilibrium and innovation in the rearing of different cohorts of children. Nevertheless, homeostatic mechanisms tend to keep the three subsystems in harmony with each other and appropriate to the developmental level of the child. The settings, customs, and caretaker psychology share a common function in organizing the individual’s developmental experience. Regularities within and among the subsystems, and thematic continuities and progressions across the niches of childhood, provide material from which the child abstracts the social, affective, and cognitive rules of the culture. Research on human development has been shaped by two central but contrasting metaphors. In psychology, human development has been viewed as a process of growth, of stage-like unfolding of species-specific abilities. In anthropology, development has been viewed primarily as learning, even as a process of molding from rather general potentials the culturally particular patterns of behavior and thought. The concept of the developmental niche represents an attempt to synthesize these two opposing metaphors, and it has drawn from several disciplines recent theories of the relationships between individual growth and its environmental context. The developmental niche is thus also a metaphor, in which the child and the culture are seen as mutually interactive systems. The usefulness of this metaphor for research lies in its delineation of aspects of the child’s environment that have often gone unrecognized in psychology, while focusing on the processes of growth that are at the heart of developmental theory.
Note
The original research summarized here was supported in part by grants from the Carnegie Corporation of New York, W.T. Grant Foundation, the National Institute of Mental Health (grant no. 33281), and the Spencer Foundation. All statements made and opinions expressed are the sole responsibility of the authors.
References
Baltes, P. B., 1979. ‘Life-span developmental psychology: some converging observations on history and theory’. In: P. B. Baltes and O.G. Brim (eds.), Life-span development and behavior, Vol. 2. New York: Academic Press.
Barker, R.G. and H.F. Wright, 1949. Psychological ecology and the problem of psychosocial development. Child Development 20, 131–143.
Barry, H.H., III, I.L. Child and M.K. Bacon, 1958. Relation of child training to subsistence economy. American Anthropologist 61, 51–63.
Bell, R.Q., 1968. A reinterpretation of the direction of effects in studies of socialization. Psychological Review 75, 81–95.
Brazelton, T.B., 1973. Neonatal behavioral assessment scales. London: Spastics International Medical Publications.
Bronfenbrenner, U., 1977. Toward an experimental ecology of human development. American Psychologist 32, 513–531.
Bronfenbrenner, U., 1979. The ecology of human development. Cambridge, MA: Harvard University Press.
Bronfenbrenner, U. and A.C. Crouter, 1983. ‘The evolution of environmental models in developmental research’. In: W. Kessen (ed.), History, theories, and methods, Vol. 1, of P.H. Mussen (ed.), Handbook of child development. New York: Wiley. pp. 397–414.
Cairns, R.B., 1983. ‘The emergence of developmental psychology’. In: W. Kessen (ed.), History, theories, and methods, Vol. 1, of P.H. Mussen (ed.), Handbook of child development. New York: Wiley. pp. 41–102.
Clark, D.L., J.R. Kreutzberg and F.K.W. Chee, 1977. Vestibular stimulation influence on motor development in infants. Science 196, 1228–1229.
Dasen, P., D. Barthelemy, E. Kan, K. Kouame, K. Daouda, K.K. Adjei and N. Assande, 1985. N’glouele, l’intelligence chez les Baoule. Archives de Psychologie 53, 293–324.
Dennis, W., 1940. The Hopi child. Charlottesville, VA: University of Virginia Institute for Research in the Social Sciences.
deVries, M.W. and C.M. Super, 1979. ‘Contextual influences on the Brazelton Neonatal Behavioral Assessment Scale and implications for its cross-cultural use’. In: A. Sameroff (ed.), Organization and stability of newborn behavior: a commentary on the Brazelton Neonatal Behavioral Assessment Scale. Monographs of the Society for Research in Child Development 43(5–6), 92–101.
Elder, G.H., Jr. and R.C. Rockwell, 1979. The life course approach and human development: an ecological perspective. International Journal of Behavioral Development 2, 1–21.
Erikson, E.H., 1950. Childhood and society. New York: W.W. Norton.
Fishbein, H.D., 1976.
Evolution, development, and children’s learning. Pacific Palisades, CA: Goodyear.
Fiske, D.W., 1974. The limits for the conventional science of personality. Journal of Personality 42, 1–11.
Freud, S., 1956. A general introduction to psychoanalysis. New York: Permabooks.
Gould, S.J. and N. Eldredge, 1977. Punctuated equilibria: the tempo and mode of evolution reconsidered. Paleobiology 3, 115–151.
Gould, S.J. and N. Eldredge, 1986. Punctuated equilibrium at the third stage. Systematic Zoology 35, 143–148.
Harkness, S., 1977. ‘Aspects of social environment and first language acquisition in rural Africa’. In: C.E. Snow and C.A. Ferguson (eds.), Talking to children: language input and acquisition. Cambridge: Cambridge University Press. pp. 309–316.
Harkness, S., 1980. ‘The cultural context of child development’. In: C.M. Super and S. Harkness (eds.), Anthropological perspectives on child development. (New Directions in Child Development, 8) pp. 7–14.
Harkness, S. and C.M. Super, 1982. ‘Why African children are so hard to test’. In: L.L. Adler (ed.), Cross-cultural research at issue. New York: Academic Press. pp. 145–152.
Harkness, S. and C.M. Super, 1983. The cultural construction of child development: a framework for the socialization of affect. Ethos 11, 221–231.
Harkness, S. and C.M. Super, 1985a. ‘Child-environment interactions in the socializations of affect’. In: M. Lewis and C. Saarni (eds.), The socialization of emotions. New York: Plenum Press. pp. 21–36.
Harkness, S. and C.M. Super, 1985b. The cultural context of gender segregation in children’s peer groups. Child Development 56, 219–224.
Harkness, S. and C.M. Super, 1986. ‘The cultural structuring of children’s play in a rural African community’. In: K. Blanchard (ed.), The many faces of play. Champaign, IL: Human Kinetics. pp. 96–103.
Harkness, S. and C.M. Super, in press. ‘The uses of cross-cultural research in child development’. In: G. J. Whitehurst and R. Vasta (eds.), Annals of child development, Vol. 4. Greenwich, CT: JAI Press.
Izard, C.E., 1978. ‘On the development of emotions and emotion-cognition relationships in infancy’. In: M. Lewis and L.A. Rosenblum (eds.), The development of affect. New York: Plenum.
Jahoda, G., 1982. Psychology and anthropology: a psychological perspective. London: Academic Press.
Kagan, J., 1976. Emergent themes in human development. American Scientist 64, 186–196.
Kagan, J., 1984. The nature of the child. New York: Basic Books.
Kagan, J. and R.E. Klein, 1973. Cross-cultural perspectives on early development. American Psychologist 28, 947–961.
Kendler, T.S. and H.H. Kendler, 1967. ‘Experimental analysis of inferential behavior in children’. In: L.P. Lipsitt and C.C. Spiker (eds.), Advances in child development and behavior, Vol. 3. New York: Academic Press.
Kessel, F. S. and A. W. Siegel (eds.), 1983. The child and other cultural inventions. New York: Praeger.
Kessen, W., 1979. The American child and other cultural inventions. American Psychologist 34, 815–820.
Kessen, W., 1983. ‘The child and other cultural inventions’. In: F. S. Kessel and A.W. Siegel (eds.), The child and other cultural inventions. New York: Praeger. pp. 26–39.
Konner, M., 1982. ‘Biological aspects of the mother–infant bond’. In: R.N. Emde and R.J. Harmon (eds.), The development of attachment and affiliative systems. New York: Plenum. pp. 137–159.
LeVine, R., 1970. ‘Cross-cultural study in child development’. In: P.H. Mussen (ed.), Carmichael’s manual of child psychology, Vol. 2. New York: Wiley. pp. 559–612.
Lewin, K., 1936. Principles of topological psychology. New York: McGraw-Hill.
Lewontin, R.C., 1978. Adaptation. Scientific American 239, 212–235. McCall, R.B., 1977. Challenges to a science of developmental psychology. Child Development 48, 333–334. McCall, R.B., 1981. Nature-nurture and the two realms of development: a proposed integration with respect to mental development. Child Development 52, 1–12. McCandless, B.R., 1970. Editorial. Developmental Psychology 2, 1–4. Mischel, W., 1968. Personality and assessment. New York: Wiley. Munroe, R.H., R.L. Munroe and B.B. Whiting (eds.), 1981. Handbook of cross-cultural human development. New York: Garland Press. Piaget, J., 1970. ‘Piaget’s theory’. In: P. Mussen (ed.), Carmichael’s manual of child psychology, Vol. 1. New York: Wiley. pp. 703–732. Porter, L.S., 1972. The impact of physical-physiological activity on infants’ growth and development. Nursing Research 21, 210–219. Rogoff, B., M.J. Sellers, S. Pirrotta, N. Fox, and W. H. White, 1975. Age of assignment of roles and responsibilities to children: a cross-cultural survey. Human Development 18, 353–369. Sameroff, A. J., 1983. ‘Developmental systems: contexts and evolution’. In: W. Kessen (ed.), History, theories, and methods, Vol. 1, of P. H. Mussen (ed.), Handbook of child development. New York: Wiley. pp. 237–294. Sameroff. A.J. and M.J. Chandler, 1975. ‘Reproductive risk and the continuum of caretaking casualty’. In: F. D. Horowitz (ed.), Review of child development research, Vol 4. Chicago, IL: University of Chicago Press. pp. 187–294.
Salkind_Chapter 06.indd 100
9/16/2010 12:40:49 PM
Super and Harkness
The Developmental Niche
101
Sander, L.W., G. Stechler, P. Burns and A. Lee, 1979. ‘Change in infant and caregiver variables over the first two months of life: integration of action in early development’. In: E.B. Thoman (ed.), Origins of the infant’s social responsiveness. Hillsdale, NJ: Erlbaum. pp. 349–408. Scarr-Salapatek, S., 1976. ‘An evolutionary perspective on infant intelligence: species patterns and individual variations’. In: M. Lewis (ed.), Origins of intelligence. New York: Plenum. pp. 165–198. Schwartz, T., 1981. The acquisition of culture. Ethos 9, 4–17. Shweder, R.A., 1979. Rethinking culture and personality theory. Part I: A critical examination of two classical postulates. Ethos 7, 255–278. Shweder, R.A. and E.J. Bourne, 1982. ‘Does the concept of the person vary cross-culturally?’ In: A.J. Marsella and G.M. White (eds.), Cultural conceptions of mental health and therapy. Dordrecht: D. Reidel. pp. 97–137. Siegel, A.W. and S.H. White, 1982. “The child study movement: early growth and development of the symbolized child’. In: H.W. Reese (ed.), Advances in child development and behavior, Vol. 17. New York: Academic Press. pp. 234–286. Spemann, H., 1938. Embryonic development and induction. New Haven, CT: Yale University Press. Super, C.M., 1972. Cognitive changes in Zambian children during the late pre-school years. HDRU Reports, no. 22. Lusaka: University of Zambia. Super, C.M., 1976. Environmental effects on motor development: the case of ‘African infant precocity’. Developmental Medicine and Child Neurology 18, 561–567. Super, C.M., 1981. ‘Behavioral development in infancy’. In: R.H. Munroe R.L. Munroe, and B.B. Whiting (eds.), Handbook of cross-cultural human development. New York: Garland. pp. 181–270. Super, C.M., 1982. Secular trends in child development and the institutionalization of professional disciplines. Newsletter of the Society for Research in Child Development, Spring, 10–11. Super, C.M., 1983. 
‘Cultural variation in the meaning and use of children’s “intelligence” ’. In: J.B. Deregowski, S. Dziurawiec and R.C Annis (eds.), Expiscation in cross-cultural psychology. Lisse: Swets and Zeitlinger. pp. 199–212. Super, C.M., 1986a. Culture, temperament, and behavior problems in infancy. Manuscript submitted for publication. Super, C.M., 1986b. Adult perceptions of neonatal behavior. Unpublished manuscript. Super, C.M., in press. ‘Developmental transitions in cognitive functioning in rural Kenya and metropolitan America’. In: K. Gibson, M. Konner and J. Lancaster (eds.), Brain and development. Hawthorne, NY: Aldine. Super, C.M. and S. Harkness, 1981. ‘Figure, ground, and gestalt: the cultural context of the active individual’. In: R.M. Lerner and N.A. Busch-Rossnagel (eds.), Individuals as producers of their development: a life-span perspective. New York: Academic Press. pp. 69–86. Super, C.M. and S. Harkness, 1982. ‘The infant’s niche in rural Kenya and metropolitan America’. In: L.L. Adler (ed.), Cross-cultural research at issue. New York: Academic Press. pp. 47–55. Super, C. M. and S. Harkness, 1983. Parental theories of children’s intelligence and personality. Paper presented on Symposium ‘Folk theories of childhood: the impact of cultural notions on adult–child interaction’, at the meetings of the American Anthropological Association, Chicago, IL, November. Thomas, A. and S. Chess, 1977. Temperament and development. New York: Brunner/ Mazel. von Bertalanffy, L., 1968. General systems theory. (Rev. ed.) New York: George Braziller. Waddington, C.H., 1957. The strategy of the genes. London: Allen and Unwin.
Salkind_Chapter 06.indd 101
9/16/2010 12:40:49 PM
102
Human Development
Whiting, B. B., 1980. Culture and social behavior: a model for the development of social behavior. Ethos 8, 95–116. Whiting, B.B. and J.W.M. Whiting, 1975. Children of six cultures: a psycho-cultural analysis. Cambridge, MA: Harvard University Press. Whiting, B. B., C. P. Edwards et al., 1986. The company they keep: the effect of age, gender and culture on social behavior of children aged 2–10. Unpublished manuscript. Whiting, J.M.W., 1977. ‘A model for psychocultural research’. In: P. H. Leiderman, S.R. Tulkin and A. Rosenfeld (eds.), Culture and infancy: variations in the human experience. New York: Academic Press. pp. 29–48. Whiting, J. M. W., 1981. ‘Environmental constraints on infant care practices’. In: R.H. Munroe, R.L. Munroe and B.B. Whiting (eds.), Handbook of cross-cultural human development. New York: Garland. pp. 155–180. Whiting, J. W. M. and I.L. Child, 1953. Child training and personality. New Haven, CT: Yale University Press.
Salkind_Chapter 06.indd 102
9/16/2010 12:40:49 PM
7 Conceptualizing Adult Development Calvin F. Settlage, John Curtis, Marjorie Lozoff, Milton Lozoff, George Silberschatz and Earl J. Simburg
Our study of adult development rests on the premise that human psychological development is a lifelong process. Originally, development was regarded as a childhood phenomenon ending with the attainment of adult sexual capabilities during puberty and adolescence. Freud’s (1905) theory of psychosexual development was anchored in the biologically predetermined maturational progression through the oral, anal, oedipal, latency, and adolescent stages of psychosexual development. Psychoanalytic thinking later defined development as being initiated not only by biological, but also by psychological factors; development was extended to include adulthood. For example, Erikson (1950) elaborated Freud’s psychosexual theory to include three post-adolescent stages characterized by the successive attainment of a capacity for intimacy, for generativity, and for ego integrity. The concept of adult stages of development was also employed by Benedek (1959) in her discussion of parenthood as a developmental phase. Similarly, Bibring et al. (1961) viewed pregnancy as a part of a woman’s development and as initiating a developmental process leading to the special and relatively unambivalent attitude of the grandmother toward the grandchild. Freud’s theory of psychosexual development has been significantly extended and complemented by separation-individuation theory, a psychoanalytic developmental schema conceptualized by Mahler (Mahler et al., 1975). Although focused on the first three years of life, separation-individuation theory has life-course applicability (Panel, 1973).
Source: Journal of the American Psychoanalytic Association, 36 (1988): 347–369.
In recent years, new studies in “lifespan psychology” have generated data showing that development continues in an active way throughout life, and that structural change does not stop with adolescence (Emde, 1985, pp. 59–60). A number of writers have applied stage theory to adulthood. Citing evidence from his own and other studies, Gould (1972) has described adulthood as a time of active and systematic change embodying a series of distinct stages. Gould’s stages reflect changes in the sense of self as influenced particularly by the passage of time. Jacques (1981) demarcated adult developmental stages consisting of early adulthood; the midlife crisis, with transition into mature adulthood; and the late adult crisis, with transition into late adulthood. He has reserved judgement about the possibility of a still later stage at about 80 years of age. Dewald (1981) related adult stages to developmental tasks and crises: choice of occupation, marriage, parenthood, personal limitations and disappointments, illness and disability, retirement, aging and death. Moving away from stage theory, Colarusso and Nemiroff (1981) divided adulthood into early, middle, and late periods, stating as their objective the definition of the dynamic adult developmental tasks which occur roughly within these arbitrary chronological demarcations, rather than the definition of phases as such. They noted the absence of a comprehensive theory of adult development and offered a basis for such a theory. Similarly, Neugarten (1979) feels that psychological remodeling occurs throughout adult life, but that it is inaccurate to describe adulthood as a series of discrete and neatly bound stages. In his discussion of adult development based on the mourning-liberation process, Pollock (1981) speaks of life-course expectancies and of adult developmental sequences as fields of development encompassing many variables. He, too, does not delineate discrete developmental stages.
Our initial explorations led to the recognition that the stage model is not entirely satisfactory for adult development. Whereas it does provide a coherent and consistently applicable framework for the progression of individual child and adolescent development, the stage model has less coherence and a more limited application in individual adult development. In childhood development, each stage is initiated by a biologically predetermined maturational change. The childhood stages are therefore universal, occurring in all individuals. They follow an invariant sequence, and the ages for the onset and accomplishment of the stages are quite uniform, usually varying narrowly among individuals. In contrast, adult development is not initiated by biological maturational change, and the adult stages are not universal. For example, not everyone marries or becomes a parent, and not all women become pregnant. Consequently, the stages are not invariant. The age of onset and accomplishment of an adult stage, such as parenthood, can vary widely among individuals. While the adult stage model provides a useful outline of the overall adult developmental progression and offers valuable developmental insights, it does not provide sufficiently specific criteria for adult development. We therefore perceived the need to delineate a new model for development that would have applicability in adulthood and in childhood as well. In this initial conceptualization, we propose and will attempt to illustrate that such criteria can be derived from the concept of developmental process. Our pursuit of this objective will involve: (1) examination of the nature of development, (2) delineation of developmental process, (3) conceptualization of a process model of development, and (4) application of the proposed model to individual life-course development.
The Nature of Development
The following statements from various authorities reflect close agreement on definitions of development¹:
1. A gradual advance or growth and differentiation through progressive stages.
2. The whole process of growth and differentiation by which potentialities are realized.
3. A progressive development from lower and simpler to higher or more complex forms of organization.
4. A sequence of continuous change in a system extending over a considerable time leading to progressive change to a higher degree of differentiation and complexity (English and English, 1958, p. 148).
5. Development proceeds from a state of relative globality and lack of differentiation to a stage of increasing differentiation, articulation, and hierarchic integration (H. Werner, in Wolff, 1960, p. 29).
6. A progression of stages wherein the transition from one stage to the next is defined by phase-specific and qualitatively new interactions between the individual and his environment; the transition constitutes a total restructuring of already present schemata under a new total organization in which more global (earlier) forms of behavior as well as more differentiated forms are available to the more mature individual (J. Piaget, in Wolff, 1960, p. 34).
7. Development is usually considered to include growth, learning, and the changes of biological maturation (English and English, 1958, p. 34).
Psychoanalytic theory conceives human development to be determined by individual biological endowment and the influence of the parental, familial, sociocultural, and physical environment. The term maturation refers to: (1) the emergence of hereditary potential, such as talent and intelligence; (2) the biologically predetermined progression through the psychosexual and separation-individuation stages; and (3) the parallel unfolding of ego apparatus, as for locomotion, language, and procreation. Development, in both children and adults, refers to progressive growth resulting from the interaction of endowment factors and environmental factors. Drive manifestations and the ego apparatus, although biologically provided, are shaped in this interaction. Those ego functions and structures which are not biologically rooted are developed through human interactions and the processes of internalization and identification (see Hartmann, 1939, pp. 50, 103–105, and Hartmann and Kris, 1945, pp. 24–26, regarding the concepts maturation and development).
In basic agreement with the above-stated definitions, we define development as a process of growth, differentiation, and integration that progresses from lower and simpler to higher and more complex forms of organization and function. We propose further that the functions and structures resulting from development constitute additions to or advances in the self-regulatory and adaptive capacities. Although in general agreement with Pollock’s (1981) conceptualization of adult development, our definition of development differs from his. Pollock states (p. 552):
Aging is development throughout the life course. Development, obviously, is not the same as growth and can include progression, regression, new contributions, remodeling, and, in some ways, decline. Aging, beginning with conception and ending with death, is to be distinguished from aged, a period of late adult life (usually after 75) where changes that lead to ultimate failures become evident. Decline may or may not be regressive.
In Pollock’s view, aging as a life-course process involves development as well as decline. We do not believe that decline, loss of function, or regression can be regarded as development. Temporary regression is part of the ebb and flow of developmental process, but the eventual outcome of development is forward and new. In contrast to the biological changes of childhood and adolescence, which are intrinsic to the development of new and higher functions, the biological changes of adulthood commonly result in decline and loss of function. Such change is nevertheless instrumental in adult development. Particularly in late adulthood, it can be a major stimulus for new development to compensate for loss.
Developmental Process
Developmental process is the function- and structure-forming process that parallels and derives from developmental interaction. Our discussion of developmental process begins with childhood where its basis is most readily perceived. In its original form, it rests on the mother-child interaction. The mutual interactive regulation of the child’s emotional state begins at birth, and is the predominant regulatory mode during the first year of life.
During the first year and more intensely during the second year, developmental process accounts for the formation of the child’s self-regulatory capability. Within the mother-child interaction, the mother serves initially as an external auxiliary ego for the child. Through identification with the mother, her regulatory interventions and the attitudes governing them are internalized and become part of the child’s own regulatory functions. These functions gain increasing autonomy as they gradually become independent of their source in the parent. Concurrently, they become integrated, organized, and grouped within the developing ego and superego structures as they progress from preliminary to more definitive structuring. A developmental interaction also takes place with the father and with other closely involved persons, such as older siblings, grandparents, and parent surrogates. In early development, the influence of the broader sociocultural and physical environment is mediated through these primary dyadic relationships. The attainment of a new function by the child requires a corresponding relinquishment by the child of the mother’s no longer needed participation as an external auxiliary ego, and the relinquishment by the mother of the new function to the child. Such relinquishment is essential to the full internalization of functions in the progression toward integration and relative autonomy. Self-regulation means that a function, although still related to its source, is operative without immediate external support. Successful developmental process thus leads to a diminishing developmental need for the human object (see, e.g. Settlage 1980). The mother-child relationship embodies a developmental potential or gradient determined by the difference between the functional and structural level of the mental apparatus of the mother and that of the child. 
Because of this gradient, the developmental interaction “lifts” the forming structure of the child to successively higher levels of function and organization (Loewald, 1960, pp. 20–21; Loewald, 1978, p. 498; Settlage, 1980, pp. 152–153). The gradient concept can be applied to any two-person relationship wherein one individual can learn and develop in interaction with the other. Developmental process in the adult can involve an overt, close relationship, as between mentor and student, or a more subtle and distant relationship, as between lecturer and listener or even author and reader. The developmental positions of the involved pair and the gradient can shift back and forth as each develops in relation to the other. This phenomenon can be observed in study and discussion groups. It is conceivable that developmental process, in its most refined form, can take place solely intrapsychically in imagined interactions with internalized representations of others or oneself, as in creative process. In the progression from infancy through adulthood, the interactive process tends to become less interpersonally close, more refined, and more internalized. Developmental process usually becomes engaged and is carried out at a conscious or preconscious level of awareness. But it can take place, in part or in whole, at an unconscious level.
Loss and Development
Historically, the role of loss in psychic structure formation was set forth by Freud (1917) in his account of the mourning process (pp. 237–258). Through mourning and internalization, the tie to the lost object is replaced by psychic structure in the form of an identification. Freud (1923) subsequently applied the concept of mourning to development, stating that an attenuated process similar to that of mourning is inherent in development generally. In his words, “the character of the ego is the precipitate of abandoned object cathexes” (p. 29). Of interest in this regard is Pollock’s (1977) concept of the mourning-liberation process. Seeing mourning as a transformational process that provides for the adaptation to change (p. 14), Pollock emphasized gain, not loss (p. 11):
I have found the focus on the mourning-liberation process to be of great importance. The basic insight is that parts of the self that once were, or that one hoped might be, are no longer possible. With the working out of the mourning of the changed self, lost others, unfulfilled aspirations, as well as feelings about reality losses and changes, there is an increasing ability to face reality as it is and as it can be. “Liberation” from the past allows the unattainable to occur [Pollock, 1981, p. 576].
Mahler (1972), in her conceptualization of the separation-individuation process, observed that a minimal threat of object loss is inherent in every new step of independent functioning. Separation is obligatory in normal development, and the threat of object loss is an indispensable developmental catalyst (p. 333). Thus, loss in the course of development is associated with advances in ego and superego development resulting from internalization and identification. The full structuring of an identification-derived function entails the attenuation and eventual relinquishment of the function-associated involvement with and tie to the love object. Such a “letting go of” entails both pleasure in the child’s autonomous functioning and varying degrees of sense of loss. The sense of loss ranges from absent or minimal, to strong, for both the child and the mother. Facing the loss and letting go involve feelings of sadness and grief in a process similar to mourning (Mahler, 1961, p. 162). However, the forward-moving impact of the maturational thrust and the anticipatory excitement of the desire to develop can supersede the sense of loss. The threat of passively experienced loss can also be overridden defensively by an active relinquishment of object ties and earlier adaptive modes. In both childhood and adulthood, normative loss or threat of loss within the developmental progression and adventitious loss from life’s unexpected experiences can stimulate and mobilize developmental process. Adventitious loss can include the following: loss of a loved one; loss of a function; loss of a body part; loss of self-esteem; loss of a pet; and even loss of a valued inanimate object.
Loss of a relationship and the often involved tacit dependency commonly confront the individual with the need to develop new capacities. Despite the prevalence of loss experiences in old age, it is noteworthy that psychic integrity and the sense of identity can be maintained, often through further development.
The Process Model of Development
The stimulus for development is disturbance of the previously adequate self-regulatory and adaptive functioning. Such disturbance is caused by different kinds of stimuli: (1) biological maturation; (2) environmental expectation and demand; (3) a loss or other traumatic experience; and (4) a perceived possibility of achieving a better adaptation resulting in a self-initiated desire to develop. The disturbance of the previously satisfactory functioning creates an unsettled state or disequilibrium with varying degrees of mental and emotional stress. The state can be ego-syntonic, as in the case of the self-initiated desire to develop, or it can be ego-dystonic, as in the case of the thrust-upon, intrusive, traumatic experience. Its emotional concomitants can range from pleasant anticipatory feelings, even eagerness, to intensely unpleasant, anxious or depressed feelings. Regardless of whether the state is self-initiated or thrust upon the individual, there is a conscious or unconscious sense of dissatisfaction with one’s situation that calls for solution and change. In response to the dissatisfaction, the individual may develop, may attempt to maintain the status quo, may regress to earlier levels and modes of self-regulation and adaptation, or may employ defensive moves that can lead to the formation of psychopathology.
A Sequence of Developmental Process
A developmental response to the dissatisfaction of the unsettled state activates a sequence of developmental process. This sequence includes the following elements:
1. Developmental Challenge. Either consciously or unconsciously, the individual perceives and accepts a developmental challenge. Examples of developmental challenges are the need for new skills, new modes of regulating feelings and impulses, and new attitudes and values. Acceptance of a specific developmental challenge engages developmental process and transforms the unsettled state into an organized, goal-directed state.
2. Developmental Tension. Within this goal-directed state, the gap between where the individual is and where the individual now wants or needs to be creates a developmental tension. This positive tension, which replaces the negatively experienced disequilibrium of the unsettled state, serves as a motivating and development-sustaining force.
3. Developmental Conflict. The engagement of developmental process also generates developmental conflict. The acceptance of a developmental challenge transforms the internally generated or environmentally presented expectations and demands, and the resulting unsettled state, into an internal developmental conflict. The desire to change, as it includes the wish for approval and fear of disapproval, commonly evokes fear of loss of the security experienced in the status quo, fear of failure and discouragement at seeming lack of progress, and anxieties about imagined negative consequences of success. Optimally, developmental conflict causes only transient or no symptomatic behavior and is resolved through development (see Nagera, 1966, pp. 39–47).
4. Resolution of Developmental Conflict. Resolution of developmental conflict leads to self-regulatory or adaptive structure formation. It proceeds hand in hand with the mastery, internalization, and integration of the new function.
5. Change in the Self-representation. Finally, the development and integration of a new function or structure is marked by a change in the self-representation and in the individual’s overall sense of identity.
A sequence of developmental process results in one or more of the following accomplishments: (a) formation of a new function; (b) elaboration or refinement of an existing function; (c) further integration of an existing function toward greater autonomy and structural stability; (d) reorganization of psychic structure to a higher level of function.
To reiterate and elaborate, a developmental challenge can come from within the individual, through self-initiation or as a result of maturational processes, or from the environment, through the presentation of a new expectation, a new opportunity, or a new problem. Examples of developmental challenges are: the maturational unfolding of the language capacity in childhood; adult experiences such as marriage, parenthood, or the death of spouse; desirable opportunities such as learning a new skill or assuming new responsibilities; and threatening experiences such as illness of oneself or an important love object, loss of a job, loss of a friend, or loss of function due to aging. Engagement of developmental process can result from a conscious decision or from unconscious processes. The developmental tension resulting from the engagement of developmental process is positively experienced and serves to motivate and propel the individual toward the acquisition, mastery, and integration of the new function. Developmental process includes developmental conflict. The striving to learn and master a new function can be complicated by the fear of failure, by
frustration over lack of progress, and by fear of success. Movement toward a new and higher level of functioning also engenders conflict. It characteristically involves letting go of emotional ties and giving up familiar, well-practiced and therefore “safe” modes of functioning in which there also has been a significant emotional investment. Conflict also can be generated by conscious and unconscious fantasies and concerns about the effect of one’s developmental advance on others. Examples of such concerns are: guilt over winning in actual or imagined competitions; guilt over surviving and enjoying success at the expense of another; and guilt over “abandoning” the parent or the spouse in moving ahead of, and thus away, from them. Under normal conditions, resolution of developmental conflict is a usual and natural occurrence. It requires tolerance of frustration and anxiety while resolving the conflict and relinquishing failed or outmoded functions and dependencies. Resolution of conflict is paralleled by the acquisition, mastery, and integration of the new function; pleasure and satisfaction are associated with these achievements. Although resolution and mastery characteristically involve progressive and regressive alternations, development sometimes proceeds smoothly, quickly, and relatively free of conflict. Failure to resolve developmental conflict tends to occur under pathological conditions and results in temporary or indefinite arrest of development in given areas. Resumption of development sometimes takes place spontaneously at a later time, due to more favorable conditions (Goodman, 1977, pp. 56–60) or as a consequence of successful treatment.
Examples
The following examples are taken from the clinical psychoanalytic situation and from everyday life. In keeping with the focus of this paper, the illustrations are mainly of adult development. One example of child development and one of adolescent development are included to demonstrate the life-course applicability of the model. The immediate antecedents and the proposed steps in the accomplishment of developmental process are denoted and underscored in the child example and are discernible and noted, in some measure, in the other examples. By these examples, we are not seeking to explicate all the factors in therapeutic and developmental process that result in the initiation or accomplishment of a developmental advance. Rather, our purpose is to demonstrate the application of the major dimensions of our process model of development.
The child example involves a grandmother and her two grandsons, ages five and ten, whom she regularly took on outings. Customarily, the younger boy would go with his grandmother to the ladies’ room while the older boy went to the men’s room. On an outing at the time the younger boy had begun kindergarten, the three of them headed for the ladies’ and men’s rooms.
As they approached them, conflict was unmistakably present in the younger boy’s facial and bodily expressions: He had a very warm relationship with his grandmother. With her, was security and the familiar. With the older brother, who tended to be impatient and indifferent, lay change and uncertainty. Yet, in the service of growth, the choice of the five-year-old seemed inevitable. Pulled by his gender identity, he tagged along behind his brother. As he marched into the men’s room, his expression of hesitancy and conflict turned into determination and swagger. The five-year-old’s previous comfort in going to the ladies’ room with his grandmother was disrupted by his maturationally induced ability to fend for himself, by his growing identity as a male, and by the environmental expectation that he use the men’s room and not the ladies’ room. The resulting unsettled state, which he also had been experiencing in school and in other contexts with his parents, eventuated in his solution of the problem through development. He embraced the developmental challenge. The attendant developmental tension and developmental conflict were manifested in his briefly expressed facial and bodily gestures. Resolution of conflict was evinced in the action of using the men’s room. In this example, the emotional experience of loss and relinquishment was not manifest, but was suggested by the hesitancy and the defensive stance of determination. The pleasure and pride of accomplishment were evident in the swaggering gait. In his developmental advance, the boy employed existing ego functions to regulate his feelings and adaptively fend for himself. The adaptive value of behaving like a man was affirmed; the resulting change in the self-representation strengthened and elaborated his identity as a male. An example of adolescent development was observed during the fourth year of analysis of an adolescent girl. 
When she began treatment at age 16, she suffered from a chronic, intense separation anxiety which appeared to stem mainly from conflicts between her parents. They had separated for some months when the patient was three years old and were divorced when she was 10. She had lived in an excessively dependent relationship with her mother from 10 to 17, and then with her father while attending college. Her neurotic conflicts hampered her learning capacity and her ability to form relationships with male peers. At the time of the observation, the patient was successfully working toward transfer to a better college and was enjoying relationships with males. The therapeutic resolution of the separation anxiety and the associated freeing up of her independent strivings upset the patient’s heretofore comfortable parental dependency. Her disrupted adaptation led her to move out of her father’s home into the nearby home of the father’s fiancée and her 18-year-old son. She thus accepted the developmental challenge of becoming more independent while still living close to her father. For the first week after the move, she felt an exhilarating sense of freedom and self-sufficiency. She also thoroughly enjoyed the relationship with the
18-year-old. She cooked for him, did his laundry, and made his bed. During the same week, her father bought her a car. He had promised to provide it when she went to the new college, where she would be living in a dormitory. She felt he was generally nicer to her than when she was living with him. The following week the patient came to treatment feeling “down” for no apparent reason. As she talked, she became aware of mixed feelings about the car. She felt that the premature gift of the car was a reaction to her move toward independence and reflected her father’s desire to hold on to her and keep her dependent. It was confusing, though, because she also felt he was demonstrating that it was alright to leave and grow up. The patient felt sad and cried as she realized that she was indeed growing up. She recognized that her “down” mood was due to the sense of loss experienced in moving away from her father, physically and psychologically. As would be expected, previously analyzed oedipal conflicts were reawakened, reinterpreted, and further worked through in this new context. The resumption of the arrested separation-individuation process of early childhood continued for some months, intertwined with the age-appropriate individuation of adolescence (Blos, 1967). There were repeated regressive shifts in her overall forward progression. The patient gradually worked through the feelings of loss associated with the relinquishment of dependency ties to her parents. As her developmental conflicts became resolved, she took increasing pride in her new image as a self-sufficient, responsible person. An example of adult development, taken from the analysis of a woman in her late forties, illustrates a developmental conflict involving the parenting and post-parenting phases of the stage model. 
For better than a year during the latter part of her analysis, this woman had been moving toward full adult functioning and free exercise of her considerable, formerly inhibited abilities, talents, and creativity. Her conflict about successful performance had been analyzed mainly in terms of her guilt about outdoing her parents, particularly her mother, and outdistancing and therefore figuratively abandoning them. Her guilt about achieving an adult level of independence was accentuated by the initially unconscious implication of figurative parricide in no longer needing the parent (Loewald, 1979). This woman’s developing identity as an independent adult, facilitated by the resolution of her pathological conflicts, then came into conflict with her long-standing identity as the mother of three children. She had intermittently been aware of the conflict between her new interests and new functioning and her still-practiced role of a mother. The conflict became an unsettled state when her grown children, in part in reaction to the healthy changes in her, forcefully began to assert their autonomy and independence. This disrupted the relationship with each of them and led to temporary withdrawal on both sides. In reaction to the disruption of her maternal adaptation, the woman experienced intense feelings of bitterness, loss, and sadness alongside
ambivalent anticipation of freedom from the responsibilities of being a mother. In the process, she vacillated between feeling “lousy,” worthless, and depressed, and having suicidal thoughts. In her blackest mood, she felt it was too late for her, too late to get anywhere. She wondered whether the opposite of benign motherhood was mean, ugly misanthropy. After a period of struggle, she reembraced the challenge to develop to her full potential. Through the mourning process, she gradually worked through the feelings of loss associated with giving up her identity as an active parent. The initial incompatibility between her new self and her parenting self was resolved. The developing new functions serving her independence became more fully structured and integrated into her personality. A second example of adult development also derives from clinical observation. The life situation of a 54-year-old woman was disrupted by her husband’s heart attack and subsequent continuing disability. The initial reaction of shock was followed by a period of distress and anxiety and a sense of helplessness and vulnerability. A major determinant of this state was ignorance about her and her husband’s financial resources and their management. This had been her husband’s province. With a view to helping her husband in this area and, if necessary, managing it herself, she embraced the challenge of learning about their finances and the investment and business world. Excited by the challenge and motivated by the developmental tension, she also experienced doubt and conflict: “Am I capable of doing it?” “Do I really want to take care of myself?” “Will my husband approve?” Her conflict waxed and waned as she progressed in her learning and took pleasure in her growing competence. In the midst of these changes, she pulled a ligament in her arm while playing racquet ball. She both joked and cried during the analytic session in which she reported this incident. 
Mindful of previously acquired insight about her fear of her aggression and her mixed feelings about competition and success, she quipped that she had ruined her “killing arm.” Connecting the pulled ligament with her success in learning about the finances, she wondered why she was crying. “I should be triumphant! It is a very satisfying thing to feel I can handle our financial affairs.” She then burlesqued the tragedy of “Poor Me.” “I have to take care of myself. All by myself. I have to be – I can be? – both the mother to myself and the child.” In response to the observation that she was crying over and mourning the loss of being taken care of, she said, “That did flash through my mind. I’ve lost my innocence, my dependency, a comfortable way of being that I counted on.” This woman resolved her conflict and continued her learning. She developed not only investment skills, but also business skills, and eventually undertook her own business venture. The third example of adult development comes from the case of a 69-year-old man, a retired professional who was first analyzed in his early forties. Through that analysis, he had improved his professional functioning, gained insight into his troubled relationship with his wife, and experienced
a marked diminution of anxiety and depressive reactions. He sought treatment again four years after the dissolution of his marriage. He was then living with a woman who, though physically handicapped, was professionally active. She managed her colostomy and the consequent limitations with imagination, fortitude, and good spirit. A concern in this new relationship was the man’s long-standing attitude of not wanting to be dependent on anyone, even if he were sick and disabled. Therefore, he did not want to marry again and felt he would prefer suicide to being a burden. His ladyfriend could not understand his willingness to take care of her, but his unwillingness to let her take care of him, should the need arise. The treatment helped the man deal with a major obstacle to his acceptance of dependency. He became aware that his considerable hostility toward his adult adopted son was behind his own fear of dependency. The son had never been self-sufficient, and the patient had supported him for years, financially and emotionally. The patient feared that his own dependency would, in turn, evoke intense hostility toward himself. This insight permitted him to reconsider his attitude about dependency and to accept the challenge of developing a healthier perspective. Acceptance of this challenge mobilized anxieties and conflicts which the patient needed to resolve in developing an appropriate attitude about dependency. Was he a good enough parent and a good enough person to deserve care? Would he be exploited? Would he exploit? Could he trust himself not to exploit? Would he suffer shame in losing control and strength? Would he be respected for the strengths he could maintain, or would he be belittled, disregarded, and treated as less advanced and less wise because of physical disability? Could he maintain his personal integrity? 
With his ladyfriend as a prime example, he became aware that a degree of dependency not only can be an essential part of successful adaptation in the later years but, paradoxically, can serve to support the maintenance of overall personal integrity. An example of unconscious engagement of developmental process, observed outside of the clinical situation, is represented by the sudden realization of an elderly woman writer that she had inexplicably acquired a new critical faculty toward her own writing. This function appeared some months after she had grieved the death of her loved, but resented, older sister. The writer had experienced her sister as hypercritical of her generally, and of her writing in particular. It can be inferred that the combination of mourning the loss of the sister, the consequent identification with the sister and her criticism, and the no-longer present external criticism freed the writer to develop and use her own critical functions in relation to her writing. The final example of adult development, also from outside the clinical situation, is that of a recently widowed man in his early seventies. Throughout his long and happy marriage, his wife had done all of the grocery shopping and the cooking, and had done all of the preparations for entertaining friends at dinner. In the months following his wife’s death, friends frequently invited him to their homes for dinner. As his mourning process eased, he felt ready to
initiate social relationships and wanted to invite friends to dinner at his home. But he found himself completely lacking in culinary knowledge and cooking skills, and was not sure he could even set a proper table. He decided to acquire the necessary knowledge and skills, proceeded to do so, and shortly was having friends over for a dinner he had prepared. Significantly, these accomplishments represent development in two respects. He acquired new skills serving self-regulation and adaptation. At the same time, he altered his self-representation, not only in these regards but through identification with his deceased wife and her abilities. This identification enhanced his sense of closeness to his wife as he remembered her, and thus helped him deal with his loss.
Discussion

As observed in the psychoanalytic situation, development during adulthood involves adult-level development and the resumption of childhood development. Adult-level development includes (1) development in response to life events characteristic of adulthood, such as marriage, parenthood, retirement, biological decline, or the loss of loved ones through death; and (2) development in response to the perceived possibility of achieving a better level of function and adaptation. Although taking place during adulthood, the resumption of childhood development, such as the consolidation of the superego enabled by resolution of the oedipal conflict, is not adult-level development. We wish to reiterate that the loss of function due to biological decline and adaptation through regression to earlier, previously developed modes of functioning does not fit our definition of development. Development, as we define it, involves new functions and structures, and new and higher levels of organization. However, the acceptance of a necessary dependency can represent development insofar as it reflects a change in attitude and self-perception without loss of the sense of integrity. Our conceptualization of a process model of development involving a sequence beginning with a developmental challenge and ending with a change in the self-representation is not meant to suggest that development of the involved function is thereby closed off or completed once and for all. On the contrary, development is normally an open system, in contrast to the closed system mental apparatus resulting from pathological structure formation (Emde, 1980, pp. 218–220). Furthermore, psychic structure is not fixed or rigid (Loewald, 1978; Settlage, 1980, p. 160). Under conditions of emotional stress, psychic structure is subject to regression in degree of integration and level of organization and function. 
At the same time, it is open to revision and further structuring and organization through further developmental process. Despite integrating plateaus and epigenetic leaps and discontinuities, overall development is coherent and continuous (A. Freud, 1963).
Appreciation of the fact of development gives rise to several related questions:

1. What is the relation between therapeutic process and developmental process? From one perspective, they are different and separate, in that therapeutic process merely enables the resumption of developmental process through the undoing of the arrest-causing psychopathology. From another perspective, therapeutic process can be conceived to embody developmental process. This perspective rests on an analogy between processes in the parent-child relationship and the therapist-patient relationship (Loewald, 1960; Settlage, 1980, pp. 159–166).
2. In what ways does our understanding of adult development bear on clinical understanding and psychoanalytic technique?
3. What are the determinants of the strikingly different individual responses to trauma and loss? Why do some adults, including very elderly adults, respond with growth and development, and why do others succumb to lasting regression and increasing withdrawal from life?

In conclusion, we hope that the conceptualized process model of development will stimulate and aid the search for answers to these important questions.
Note

1. The first three definitions derive from Webster’s Third New International Dictionary.
References

Benedek, T. (1959). Parenthood as a developmental phase: a contribution to libido theory. J. Amer. Psychoanal. Assn., 7:389–417.
Bibring, G. L., Dwyer, T. F., Huntington, D. S. & Valenstein, A. F. (1961). A study of the psychological processes in the pregnancy and earliest mother-child relationship. Psychoanal. Study Child, 16:9–72.
Blos, P. (1967). The second individuation process of adolescence. Psychoanal. Study Child, 22:162–186.
Colarusso, C. A. & Nemiroff, R. A. (1981). Adult Development: A New Dimension in Psychodynamic Theory and Practice. New York: Plenum.
Dewald, P. A. (1981). Adult phases of the life cycle. In: The Course of Life: Psychoanalytic Contributions Toward Understanding Personality Development, Vol. 3. Washington, D.C.: U.S. Gov’t. Printing Office, pp. 35–53.
Emde, R. N. (1980). Ways of thinking about new knowledge and further research from a developmental orientation. Psychoanal. Contemp. Thought, 3:213–235.
———(1985). From adolescence to midlife: remodeling the structure of adult development. J. Amer. Psychoanal. Assn., 33 (Suppl.):59–112.
English, H. B. & English, A. C. (1958). A Comprehensive Dictionary of Psychological and Psychoanalytic Terms. New York: Longmans, Green.
Erikson, E. H. (1950). Childhood and Society. New York: Norton.
Freud, A. (1963). The concept of developmental lines. Psychoanal. Study Child, 8:245–265.
Freud, S. (1905). Three essays on the theory of sexuality. S. E., 7.
———(1917). Mourning and melancholia. S. E., 14.
———(1923). The ego and the id. S. E., 19.
Goodman, S., Ed. (1977). Psychoanalytic Education and Research: Current Situation and Future Possibilities. New York: Int. Univ. Press.
Gould, R. L. (1972). The phases of adult life: a study in developmental psychology. Amer. J. Psychiat., 129:521–531.
Hartmann, H. (1939). Ego Psychology and the Problem of Adaptation. New York: Int. Univ. Press, 1958.
———& Kris, E. (1945). The genetic approach in psychoanalysis. Psychoanal. Study Child, 1:11–30.
Jacques, E. (1981). The midlife crisis. In: The Course of Life: Psychoanalytic Contributions Toward Understanding Personality Development, Vol. 3. Washington, D.C.: U.S. Gov’t. Printing Office, pp. 1–23.
Loewald, H. (1960). On the therapeutic action of psychoanalysis. Int. J. Psychoanal., 41:16–33.
———(1978). Instinct theory, object relations, and psychic structure formation. J. Amer. Psychoanal. Assn., 26:493–506.
———(1979). The waning of the Oedipus complex. J. Amer. Psychoanal. Assn., 27:751–755.
Mahler, M. S. (1961). On sadness and grief in infancy and childhood: Loss and restoration of the symbiotic object. Psychoanal. Study Child, 16:332–351.
———(1972). The rapprochement subphase of the separation-individuation process. Psychoanal. Q., 41:487–506.
———Pine, F., & Bergman, A. (1975). The Psychological Birth of the Human Infant: Symbiosis and Individuation. New York: Basic Books.
Nagera, H. (1966). Early Childhood Disturbances, the Infantile Neurosis, and the Adulthood Disturbances: Problems of a Developmental Psychology. New York: Int. Univ. Press.
Neugarten, B. (1979). Time, age, and the life cycle. Amer. J. Psychiat., 136:887–894.
Panel (1973). The experience of separation-individuation in infancy and its reverberations through the course of life (3 parts). M. C. Winestine, I. M. Marcus & I. Sternschein, reporters (consecutively). J. Amer. Psychoanal. Assn., 21:135–167, 633–645.
Pollock, G. H. (1977). The mourning process and creative organizational change. J. Amer. Psychoanal. Assn., 25:3–34.
———(1981). Aging and aged: development or pathology. In: The Course of Life: Psychoanalytic Contributions Toward Understanding Personality Development, Vol. 3. Washington, D.C.: U.S. Gov’t. Printing Office, pp. 549–585.
Settlage, C. F. (1980). Psychoanalytic developmental thinking in current and historical perspective. Psychoanal. Contemp. Thought, 3:139–170.
Wolff, P. H. (1960). The Developmental Psychologies of Jean Piaget and Psychoanalysis. New York: Int. Univ. Press.
8
Early Child Care and Children’s Development Prior to School Entry: Results from the NICHD Study of Early Child Care
NICHD Early Child Care Research Network
Historical changes in the economy of the United States, as well as changes in women’s concepts of their roles in society and the family, have together led to substantive changes in the rearing of infants and young children (Scarr, 1998). Early child care, beginning a few months after birth, has become a normative experience for American children (Bachu, 1995). In 1997, for example, 79% of children under the age of 3 years regularly spent time in nonparental care, with 39% of these children in care for 35 or more hours per week (Capizzano & Adams, 2000). In the kindergarten class of 1998–1999, 81% of the children had child-care experience prior to school entry (West, Denton, & Germino-Hausken, 1999). This child-rearing landscape contrasts with that of some other countries where parents (most often, mothers) provide most of the care for their young children. The placement of infants and young children in child care challenges deeply held beliefs and scientific theories that stress the importance of maternal care (Bowlby, 1973; Brazelton, 1986). Research on the effects of early child care on children’s development has proved highly controversial (Fox & Fein, 1990), with researchers drawing vastly different conclusions about the direction of those effects. Some have contended that child care is a source of enrichment that promotes academic and social development (e.g., Clarke-Stewart,
Source: American Educational Research Journal, 39(1) (2002): 133–164.
Gruber, & Fitzgerald, 1994; Lamb, 1998), whereas others have expressed concerns about the developmental risks associated with early child care (e.g., Belsky, 1999, in press). A third group asserts that reports of both negative and positive consequences of child care are vastly exaggerated because discerned effects are negligible and do not endure over time (e.g., Blau, 1999; Scarr, 1998). Increasingly, as nations move to raise educational standards for children’s performance in school (National Education Goals Panel, 1997), experiences in child-care settings are looked to as sources of variability in children’s readiness for school (Pianta & Cox, 1999). Because the debate about the effects of child care on school readiness has implications for social and educational policy, clarification of the nature and extent of child care as a source of variability in children’s developmental status is a pressing scientific concern. One reason for the ongoing controversy about the developmental consequences of child care is that different child-care parameters – quantity, quality, and type of setting – typically have been examined in isolation or in only limited contexts (Vandell, Gallagher, & Dadisman, 2000). Consider, for example, child-care quantity. Studies have reported significant associations between substantial amounts of nonmaternal care during infancy and poorer parent-child relationships (Belsky, 1999; Clark, Hyde, Essex, & Klein, 1997), elevated rates of insecure infant-parent attachments (Belsky & Rovine, 1988; Braungart-Rieker, Courtney, & Garwood, 1999), heightened behavior problems (Baydar & Brooks-Gunn, 1991; Park & Honig, 1991), and problematic peer relationships (Bates et al., 1994; Hoffman & Youngblade, 1999; Vandell & Corasaniti, 1990). 
Unfortunately, research examining the effects of large amounts of care has rarely included assessments of child-care quality, making it impossible to determine if the seemingly adverse effects of substantial hours in child care are a function of poor-quality care or particular types of care. Other researchers have documented positive relations between child-care quality and children’s linguistic, cognitive, and social functioning (e.g., Burchinal et al., 2000; Goelman & Pence, 1987; Howes & Stewart, 1987; McCartney, 1984; Vernon-Feagans, Emmanuel, & Blood, 1997). In their studies, stimulating and emotionally supportive care was associated with enhanced development. Indeed, the link between quality of child care and children’s development is probably the most consistent finding to emerge during the past two decades (cf. Lamb, 1998). Most of this research, however, did not consider the quality of care in different types of settings (but see Clarke-Stewart, Gruber, & Fitzgerald, 1994; Melhuish, Lloyd, Martin, & Mooney, 1990, for notable exceptions) or the child’s history of care, including the amount of care since birth. Child-care research also has been limited in that it has tended to focus on centers, even though the care of infants and very young children often occurs in the child’s own home or the home of a relative or unrelated caregiver (Hofferth, Shauman, Henke, & West, 1998). Positive effects of center-based
programs on children’s cognitive and language skills have been demonstrated in experimental interventions for infants and children from low-income families (Burchinal et al., 2000; Ramey et al., in press). It has not been established whether there are similar beneficial effects of center-type experiences for children from affluent families, or beneficial effects when the quality of care is not exemplary or when the care is used for brief periods. An initial series of reports from the current project highlighted both positive and negative child-care effects up to 3 years of age. A lot of time in child care was associated with less harmonious mother-child interactions during the first 3 years (NICHD Early Child Care Research Network, 1999a) and heightened behavior problems according to caregivers at 2 years (NICHD Early Child Care Research Network, 1998). Higher-quality care and more experience in centers predicted better linguistic, cognitive, and pre-academic functioning (NICHD Early Child Care Research Network, 2000b) and fewer behavior problems (NICHD Early Child Care Research Network, 1998). These initial findings must be regarded as preliminary, however, because children’s developmental trajectories are quite fluid during the first 3 years of life. In the current study, all three child-care parameters – quantity, quality, and type of care – are considered in relation to children’s cognitive, language, and social functioning at 4½ years of age. That age is an important one to study because it is just prior to children’s entry into formal schooling. A recent meta-analytic review of 60 studies (Laparo & Pianta, in press) determined that cognitive and social skills measured in the late preschool years were predictive of performance in the same domains during the early school years. For cognitive or academic predictors of similar outcomes, effect size was moderate (.49). For social or behavioral predictors of similar outcomes, effect size was small (.27). 
Unfortunately, many children begin formal schooling with deficiencies in these areas. In one recent national survey (Rimm-Kaufman, Pianta, & Cox, in press), kindergarten teachers reported that 15% of their pupils had “serious problems” and that another 30% had “some problems” in adjusting to school. Even more telling was the finding that when asked about specific difficulties in readiness skills, half of the kindergarten teachers said that the majority of the children in their current class lacked competencies in pre-academic skills, following directions, and peer relations. Determining whether early child-care experiences contribute to individual differences in the kinds of skills valued and assessed by kindergarten teachers is a pressing scientific concern that has widespread implications for educational and social policies. Consequently, in this report we consider two basic questions: (1) Are early child-care experiences positively or negatively related to child functioning prior to school entry? And if so, (2) are statistical effects sufficiently large to be meaningful? In addressing these questions, we seek to move beyond a global characterization of early child care as good or bad for children and to examine specific aspects of care that may foster or undermine children’s
development, by focusing on the cumulative amount or quantity of care from birth onward, the quality of the care received throughout the early years, and the types of care experienced (e.g., center-based versus home-based care). Investigators have different views about how best to test for the effects of child care. Some (Blau, 1999) have argued that previous studies have inflated effects because selection factors were not adequately controlled. Others (Burchinal & Nelson, in press; Vandell & Wolfe, 2000) have contended that child-care effects are underestimated when too many family selection factors are included because some controls, such as parenting, that influence selection of child care also are influenced by children’s child-care experiences. To address the issue of appropriate levels of selection controls, we compare the results of analyses involving many covariates with results of those involving fewer covariates. We also consider if child-care effects are large enough to be of practical importance. Only recently have investigators begun to report effect sizes, and the effects have varied from large to small: d = 1.0 in a clinical trial in which high-quality child care was randomly assigned to low-income African-American children (Campbell, Pungello, Miller-Johnson, Burchinal, & Ramey, 2001); d = .75 for cognitive and language outcomes among predominantly low-income African-American children (Burchinal et al., 2000); d = .5 for 4-year vocabulary, and d = .3 for 4-year math in a large four-site study of center child care (Peisner-Feinberg et al., in press). Social outcomes were examined only in the latter study and were not significantly related to observed quality of care. Effect sizes in naturalistic studies are typically small because they are measured in the context of many other influences (Cohen, 1988). Comparisons with other effects judged to be meaningful can be used as a gauge of social significance (McCartney & Rosenthal, 2000). 
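The effect sizes quoted above are reported as d. As a rough illustration only, the sketch below assumes that d denotes the standardized mean difference (Cohen's d) computed with a pooled standard deviation; the cited studies may have derived their values differently. The function name and the two score lists are hypothetical and are not data from any study mentioned here.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference between two groups, using the pooled SD.

    Assumption: d = (mean_a - mean_b) / pooled_sd, with sample variances
    weighted by their degrees of freedom.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical scores, invented purely for illustration.
high_quality = [102, 98, 105, 110, 95]
comparison = [97, 93, 100, 105, 90]

d = cohens_d(high_quality, comparison)
print(f"d = {d:.2f}")  # about 0.85 for these made-up scores
```

Read this way, a d of .5 means the group means differ by half a pooled standard deviation, which is why the same raw difference yields a smaller d when outcomes vary widely.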
In this report we compare the effect sizes associated with quantity, quality, and type of care with effect sizes associated with two other well-recognized developmental contexts. One, quality of parenting, is defined at the level of family process. The second, family poverty, represents a socioeconomic context. Parenting is a major predictor of children’s cognitive and social development because of the centrality of the family in children’s early lives and because it includes possible genetic as well as environmental influences on the child (Collins, Maccoby, Steinberg, Hetherington, & Bornstein, 2000). The negative relations of poverty to children’s cognitive, social, and physical development also are well documented (McLoyd, 1998). Children from economically poor homes begin school at a disadvantage that has been judged to be large enough to warrant public expenditures for Head Start, Title I early education programs, and other services. Hence parenting quality and family poverty are socially significant contexts with which to evaluate child-care effects. Studying effects of child care in the United States is methodologically challenging because children typically experience multiple arrangements
(Hofferth et al., 1998; NICHD Early Child Care Research Network, 1997a). It cannot be assumed that reliance on a single assessment at one point during the first 5 years is an adequate representation of a child’s experience in child care. Consequently, in the NICHD Study we have collected information about amounts and types of care every 3 to 4 months. Observations of the quality of primary arrangements were obtained at five points, when children were 6, 15, 24, 36, and 54 months of age. Previous research (e.g., Chin-Quee & Scarr, 1994; Deater-Deckard, Pinkerton, & Scarr, 1996; Peisner-Feinberg & Burchinal, 1997) that examined long-term effects typically has relied on assessments of quality at a single age.
Method Participants Families were recruited through hospital visits shortly after the birth of a child in 1991 at ten locations in the United States (Little Rock, AR; Irvine, CA; Lawrence, KS; Boston; Philadelphia; Pittsburgh; Charlottesville, VA; Morganton, NC; Seattle, WA; Madison, WI). During selected 24-hour intervals, all women giving birth were screened for eligibility and willingness to be contacted again. Of the 8,986 mothers who gave birth during the sampling period, 5,416 (60%) met the eligibility requirements (mother was more than 18 years of age; mother spoke English; mother was healthy; baby was not from multiple birth or released for adoption; mother and child lived within an hour of research site; no move from the area was planned in the next 3 years; the neighborhood was not too dangerous to visit, as verified by police; and mother agreed to be telephoned in 2 weeks). Of that group, a conditionally random sample of 3,015 (56%) was selected for the telephone call. The conditioning assured adequate representation (at least 10%) of single mothers, mothers without a high school degree, and ethnic minority mothers. At the 2-week call, families were excluded if the baby had been hospitalized for more than 7 days, if the family expected to move in the next year, or if the family could not be reached after three attempts at contact. A total of 1,525 eligible families agreed to an interview. Of these, 1,364 completed a home interview when the infant was 1 month old and became study participants. The resulting sample was diverse, including 24% children of color, 11% mothers who had not completed high school, and 14% single mothers. Mothers had an average of 14.4 years of education. Average family income was 3.6 times the poverty threshold. Seventy-nine percent of the children were white and non-Hispanic.
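The recruitment funnel described above can be checked with simple arithmetic. The following Python sketch recomputes the reported percentages from the counts given in the text. The variable names and the `pct` helper are illustrative only, and the last percentage (families enrolled among those who agreed to an interview) is derived here from the reported counts rather than stated in the text.

```python
# Counts reported in the Participants paragraph.
births = 8986     # mothers who gave birth during the sampling period
eligible = 5416   # met the eligibility requirements
selected = 3015   # conditionally random sample chosen for the 2-week call
agreed = 1525     # eligible families who agreed to an interview
enrolled = 1364   # completed the 1-month home interview


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to a whole number."""
    return round(100 * part / whole)


eligible_pct = pct(eligible, births)    # share of births that were eligible
selected_pct = pct(selected, eligible)  # share of eligible mothers selected
enrolled_pct = pct(enrolled, agreed)    # derived here, not reported in the text

print(eligible_pct, selected_pct, enrolled_pct)
```

The first two values reproduce the 60% and 56% figures given in the paragraph.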
The participating families were very similar to the eligible hospital sample in terms of maternal education, percentage in different ethnic groups, and presence of a husband or partner. The participants differed from the 281 children who were recruited but were lost to follow-up.
Mothers of participants had significantly (p < .05) more education (M = 14.4 years and SD = 2.5, as opposed to M = 13.6 years and SD = 2.6), higher family incomes (income–poverty ratio: M = 3.6 and SD = 2.8, as opposed to M = 3.2 and SD = 3.1), and were more likely to have a husband or partner in the household (85% as opposed to 76%). The children were less likely to be African American (11% as opposed to 19%).
Overview of Data Collection Children were followed from birth to 4½ years of age. Mothers were interviewed in person when infants were 1 month old. Detailed measures of home and family environments were obtained by means of interviews and observations when children were 6, 15, 24, 36, and 54 months old. Primary child-care settings were observed at the same ages for all children who were in nonmaternal care on a regular basis for 10 or more hours per week. Mothers were telephoned regularly to update reports on child-care usage. Children’s cognitive skills and social behavior were assessed at 4½ years. Means and standard deviations for all measures included in the analyses are presented in Table 1.
Child-care Measures During telephone interviews conducted at 3-month intervals through 36 months and at 4-month intervals thereafter, mothers reported types and hours of nonmaternal care that were being used.
Type of Care For each 3-to-4-month interval (16 epochs, or intervals, in all), the child’s primary care arrangement was classified as center, child-care home (any home-based care outside the child’s own home except care by grandparents), in-home (any caregiver in the child’s own home except father or grandparent), grandparent, or father. Epochs in which children were in nonmaternal care for less than 10 hours per week were coded as exclusive maternal. The proportion of epochs in which the child received care in a center and the proportion of epochs in a child-care home were determined and included as type of care predictors in analyses.
Child-care Quantity Parents were asked about the hours of routine nonmaternal care during the telephone and personal interviews. The hours spent in all settings were summed for each of the 16 epochs.
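The epoch coding for type of care can be sketched as follows. This is a minimal Python illustration, not the study's own code: the category labels (`"center"`, `"child_care_home"`, `"maternal"`, and so on), the function names, and the example child are all hypothetical. It assumes each epoch is summarized as a primary arrangement plus total weekly hours of nonmaternal care.

```python
MIN_HOURS = 10  # epochs below this many weekly hours are coded as exclusive maternal


def code_epoch(primary_type, hours_per_week):
    """Return the type-of-care code for one 3-to-4-month epoch."""
    if hours_per_week < MIN_HOURS:
        return "maternal"
    return primary_type


def type_proportions(epochs):
    """Proportion of epochs coded as center care and as child-care home."""
    codes = [code_epoch(kind, hours) for kind, hours in epochs]
    n = len(codes)
    return {
        "pct_center": codes.count("center") / n,
        "pct_cc_home": codes.count("child_care_home") / n,
    }


# Hypothetical child: 16 epochs of (primary arrangement, total hours per week).
example = (
    [("grandparent", 5)] * 2         # under 10 hours: coded exclusive maternal
    + [("child_care_home", 30)] * 6
    + [("center", 40)] * 8
)
props = type_proportions(example)
print(props)
```

For this hypothetical child, half of the 16 epochs are coded as center care and six of 16 as child-care home; these two proportions correspond to the type-of-care predictors named in the text.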
Table 1: Descriptive statistics for all measures

Characteristic                            % or M  Overall      6 months     15 months    24 months    36 months    54 months
                                                  (n = 1,083)  (n = 1,073)  (n = 1,069)  (n = 1,066)  (n = 1,073)  (n = 1,075)
Child characteristics
  Gender (male)                           %       50.4
Family characteristics
  Ethnicity
    African American                      %       11.2
    Hispanic                              %       5.6
    White                                 %       78.9
    Other                                 %       4.3
  Single parent                           %       85.2         87.5         86.5         86.3         84.0         83.5
  Maternal education                      M (SD)  14.4 (2.5)
  Income-to-needs ratio                   M (SD)  3.7 (2.8)    3.8 (3.1)    3.7 (3.2)    3.8 (3.0)    3.7 (3.1)    3.6 (3.2)
  Maternal depression
    Mean level                            M (SD)  9.4 (5.5)    9.1 (8.4)    9.0 (8.1)    9.5 (8.7)    9.2 (8.3)    9.8 (8.6)
    Slope                                 M (SD)  0.2 (0.7)
  Parenting: HOME & maternal sensitivity composite
    Mean level                            M (SD)  .03 (.64)
    Rate change                           M (SD)  −.01 (.07)
    HOME total                            M (SD)               36.8 (4.5)   37.6 (4.4)                41.7 (7.3)   46.1 (5.3)
    Maternal sensitivity                  M (SD)               9.3 (1.8)    9.4 (1.6)    9.4 (1.7)    17.3 (2.7)   17.0 (2.9)
Child-care characteristics
  Quantity: hrs/week
    Mean level                            M (SD)  26.3 (1.9)   26.0 (17.0)  23.3 (21.4)  25.1 (21.6)  26.1 (21.3)  28.1 (21.1)
    Rate change (a)                       M (SD)  1.9 (6.4)
  Quality: Positive caregiving rating     n       985          500          561          597          641          850
    Mean level                            M (SD)  2.8 (0.2)    3.0 (0.6)    2.9 (0.6)    2.8 (0.6)    2.8 (0.5)    3.0 (0.6)
    Rate change (a)                       M (SD)  −.03 (.07)
Type of care (10+ hrs/wk)
  Exclusively mother                      %       25.2         34.8         30.3         27.1         20.7         10.8
  Father care                             %       16.4         13.5         17.3         14.9         14.5         17.7
  Grandparent                             %       10.0         11.6         9.6          9.2          9.6          9.7
  In-home                                 %       7.3          8.3          8.8          7.7          7.2          5.9
  Child-care home                         %       19.8         21.8         23.2         24.3         20.9         12.3
  Center                                  %       22.1         8.2          11.1         17.3         28.4         53.8
Number of months child with caregiver     M (SD)  13.4 (15.4), n = 787
Child outcomes
  Cognitive (n = 1,043)
    Pre-academic composite                M (SD)                                                                   98.9 (11.6)
    Letter-word identification            M (SD)                                                                   99.3 (13.6)
    Applied problems                      M (SD)                                                                   103.4 (15.8)
    Memory for sentences                  M (SD)                                                                   93.1 (18.6)
    Preschool language scale              M (SD)                                                                   99.8 (10.3)
  Mother report, social behavior (n = 1,044)
    Social competence                     M (SD)                                                                   98.3 (13.5)
    Behavior problems                     M (SD)                                                                   50.8 (89.5)
  Caregiver report, social behavior (n = 699)
    Social competence scale               M (SD)                                                                   104.9 (13.5)
    Behavior problems                     M (SD)                                                                   50.4 (10.3)

(a) Average rate of change per month across study period.
Child-care Quality Observational assessments of quality were obtained for primary nonmaternal arrangements that were used for 10 or more hours per week at 6, 15, 24, 36, and 54 months. Observations were conducted during two half-day visits scheduled within a 2-week interval at 6 –36 months and one half-day visit at 54 months. At each half-day visit, observers completed two 44-minute cycles of the Observational Record of the Caregiving Environment (ORCE). The ORCE format consists of 44-minute cycles, each broken into four 10-minute observation periods. In each 10-minute period, observers alternate between 30-s observe and record frames. During the observe intervals, observers focus on the study child’s behavior, activities, and interaction with the caregiver or with other people. During the record intervals, observers complete the frequency checklist. At the end of the 10-minute period the observer makes brief notes and tentative qualitative ratings of the caregiver’s behavior and the child’s behavior for 2 minutes. This process is repeated for three 10-minute periods. In the final 10-minute period the observer makes observations exclusively for the qualitative ratings. At the end of the 44 minutes the observer makes final qualitative ratings for up to three caregivers using 4-point scales that range from not at all characteristic to highly characteristic, based on all four 10-minute periods. On average, four ORCE cycles were completed for children from 6 to 36 months, and two ORCE cycles were completed at 54 months. ORCE quality ratings were obtained for at least one age period for 91% of the sample (985 of 1,083) and for at least two age periods for 779 children. Thirty-four children were never in nonmaternal care on a regular basis; thus it was not possible to observe them in a child-care arrangement. Specific items that constitute the ORCE behavioral scales and qualitative ratings are listed in the Appendix. 
The behavior scales provided a record of the occurrence or quantity of specific acts, whereas the qualitative scales took into account the quality (and nuances) of the caregiver’s behavior in relation to the child’s behavior. Positive caregiving composites were calculated for each age level. At 6, 15, and 24 months, positive caregiving composite scores were the mean of five 4-point qualitative ratings (sensitivity to child’s nondistress signals, stimulation of cognitive development, positive regard for child, emotional detachment [reflected], and flatness of affect [reflected]). Cronbach alphas for the composite were .89 at 6 months, .88 at 15 months, and .87 at 24 months. At 36 months, these five ratings and two additional ratings, fosters child’s exploration and intrusive (reflected), were included in the composite (Cronbach α = .83). At 54 months the positive caregiving composite was the mean of 4-point ratings of caregivers’ sensitivity or responsivity, stimulation of cognitive development, intrusiveness (reflected), and detachment (reflected) (Cronbach α = .72). The behavioral composite, which was highly correlated with the qualitative composite at each age, was not used in the current analyses.
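The composite scoring described above (averaging 4-point ratings after reflecting the negatively keyed items, then checking internal consistency) can be sketched in Python. This is an illustration under stated assumptions rather than the study's scoring code: reflection is assumed to map a rating r on the 1-to-4 scale to 5 − r, the item names are hypothetical shorthand for the five ratings used at 6, 15, and 24 months, and the alpha function uses sample variances.

```python
from statistics import mean, variance

SCALE_MIN, SCALE_MAX = 1, 4  # 4-point qualitative ratings


def reflect(rating):
    """Reverse-score a rating so that higher always means more positive caregiving."""
    return SCALE_MIN + SCALE_MAX - rating


def positive_caregiving(ratings, reflected_items):
    """Mean of the item ratings after reflecting the negatively keyed items."""
    scored = [
        reflect(value) if item in reflected_items else value
        for item, value in ratings.items()
    ]
    return mean(scored)


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of already-reflected item scores (one row per observation)."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings for one caregiver (item names are illustrative).
REFLECTED = {"detachment", "flat_affect"}
example = {
    "sensitivity": 3,
    "stimulation": 2,
    "positive_regard": 4,
    "detachment": 1,   # reflected to 4
    "flat_affect": 2,  # reflected to 3
}
score = positive_caregiving(example, REFLECTED)  # (3 + 2 + 4 + 4 + 3) / 5
print(score)
```

Reflecting before averaging keeps all items pointing in the same direction, which is also required for the alpha coefficient to be meaningful.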
Before conducting observations at each age, observers studied extensive manuals, which detailed age-appropriate expectations for caregivers at 6, 15, 24, 36, and 54 months. At each age, observers from the 10 sites also attended a centralized training at which they viewed master-coded videotapes of appropriately aged children and their caregivers, conducted live observations at centers and home-based child-care settings, completed written tests, and participated in question-and-answer sessions. Further training and practice were conducted at each site using videotaped and live examples and instruction. To ensure cross-site reliability before data collection was initiated at each age, observers coded six tapes, each containing one 44-minute ORCE cycle that focused on a specified child who was the same age as the study children for a given assessment. The tapes represented all types of care and captured a range of quality. To be certified as a data collector, observers had to achieve exact agreement with the master codes of the behavior scales at 70% or better and with the qualitative ratings at 60% or better. To prevent observer drift, reliability at each age was checked with two further tests, each consisting of six new master-coded 44-minute ORCE cycles. A criterion of 60% exact agreement of the qualitative ratings and 70% on the behavioral frequencies was required for continued data collection. In addition, observer agreement was assessed during live on-site observations. At each site, all possible pairs of observers visited both home-based care and centers. Interobserver agreement for the positive caregiving composite score was computed for the master-coded videotapes and the live observations, using Pearson correlations and the repeated measures ANOVA formulation described in Winer (1971, p. 287). Inter-observer agreement exceeded .90 at 6 months, .86 at 15 months, .81 at 24 months, .80 at 36 months, and .90 at 54 months. 
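The certification criteria above rest on exact agreement with the master codes. A minimal Python sketch of that check follows; the function names and example codes are hypothetical, and the text does not say how agreement was aggregated across the six tapes, so the sketch simply treats all coded items as one list.

```python
BEHAVIOR_CRITERION = 0.70     # exact agreement required on the behavior scales
QUALITATIVE_CRITERION = 0.60  # exact agreement required on the qualitative ratings


def exact_agreement(observer_codes, master_codes):
    """Proportion of items on which the observer matches the master code exactly."""
    if len(observer_codes) != len(master_codes):
        raise ValueError("observer and master codes must have the same length")
    matches = sum(o == m for o, m in zip(observer_codes, master_codes))
    return matches / len(master_codes)


def is_certified(behavior_agreement, qualitative_agreement):
    """Apply both criteria; an observer must meet each one to be certified."""
    return (
        behavior_agreement >= BEHAVIOR_CRITERION
        and qualitative_agreement >= QUALITATIVE_CRITERION
    )


# Hypothetical qualitative ratings on five items: four of five match the master codes.
agreement = exact_agreement([1, 2, 3, 4, 4], [1, 2, 3, 4, 1])
print(agreement, is_certified(0.75, agreement))
```

Exact agreement is a strict standard, since a rating that is off by one point counts as a miss; that may be why the criterion for the 4-point qualitative ratings is lower than for the behavioral frequencies.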
Detailed descriptions of the infant version of the ORCE assessments can be found in NICHD Early Child Care Research Network, 1996. Details about the toddler versions and preschool versions can be found in NICHD Early Child Care Research Network, 2000a. Complete observation manuals can be found at http://public.rti.org/secc/.
Maternal, Child, and Family Controls Measures of maternal, child, and family characteristics were collected and used as controls for selection effects.
Demographic Variables During home interviews at 1 month, mothers reported their own education (in years) and the study children’s race and ethnicity (non-Hispanic African American, non-Hispanic European American, Hispanic, or other)
and sex. The presence of a husband or partner in the home was reported in telephone interviews spaced every 3 to 4 months. Partner status was the proportion of 3-to-4-month intervals during which the mother reported that a husband or partner was present. Mothers reported family income at 6, 15, 24, 36, and 54 months. An income-to-needs ratio was calculated at each age from U.S. Census Bureau tables based on family income relative to household size and number of children under 18. In the current analyses, these ratios were averaged.
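The income-to-needs calculation can be illustrated with a short Python sketch. The incomes and the single poverty threshold below are placeholders, not Census values; in practice the threshold depends on year, household size, and number of children under 18. Skipping missing assessments when averaging is also an assumption, since the text does not say how missing income reports were handled.

```python
from statistics import mean


def income_to_needs(family_income, poverty_threshold):
    """Ratio of family income to the poverty threshold for that household."""
    return family_income / poverty_threshold


def average_ratio(ratios):
    """Average the ratios over the assessments that were actually observed."""
    observed = [r for r in ratios if r is not None]
    return mean(observed)


# Hypothetical family: income by assessment age in months (None = not reported).
incomes = {6: 30000, 15: 33000, 24: None, 36: 36000, 54: 45000}
THRESHOLD = 15000  # placeholder threshold, held constant for simplicity

ratios = [
    income_to_needs(income, THRESHOLD) if income is not None else None
    for income in incomes.values()
]
avg = average_ratio(ratios)
print(avg)
```

A ratio of 1.0 corresponds to income exactly at the poverty threshold, so the sample mean of 3.6 reported earlier describes families at 3.6 times that threshold.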
Maternal Depressive Symptoms Maternal depressive symptoms were assessed at 6, 15, 24, 36, and 54 months, using the Center for Epidemiological Studies Depression Scale (CES-D; Radloff, 1977), a self-report measure that assesses depressive symptomatology in the general population. Cronbach’s alpha coefficients ranged from .88 to .91 in the present sample. The intercept and linear slope were included as factors in the current analyses.
Mother-child Interactions Mother-child interactions were videotaped in semi-structured 15-minute observations at 6, 15, 24, 36, and 54 months. The tasks provided a context for assessing age-appropriate qualities of maternal behavior. The observation task at the 6-month visit had two components. In the first 7 minutes, mothers were asked to play with their infants and were told that they could use any toy or object available in the home or none at all. For the remaining 8 minutes, mothers were given a standard set of toys that they could use in play. At 15, 24, and 36 months, the observation procedures followed a three-boxes procedure in which mothers were asked to show their children age-appropriate toys in three containers in a set order (see Vandell, 1979). For example, at 15 and 24 months, a storybook was in the first container (different books were used at 15 and 24 months); a toy stove and related objects were in the second; and a toy house with various moving parts, a person, a dog, and a car were in the third. At 36 months, washable markers, stencils, and paper were in the first container; dress-up clothes and a cash register were in the second; and Duplo blocks with a picture of a model were in the third. The mother was asked to have her child play with the toys in each of the three containers and to do so in the order specified. Data were collected by research assistants who had attended centralized training sessions. Each data collector passed certification procedures based on a common certifier’s review of videotapes of the data collector administering the procedures. The certification procedures were designed to ensure that standard data collection procedures were used across the sites.
Videotapes of the mother-child interactions were shipped to a central location for coding by raters who were blind to other information about the families. Inter-coder reliability was determined by assigning two coders to 19–20% of the tapes randomly drawn at each assessment period. Coders did not know which tapes were assigned to double coding, and reliability assessments were made throughout the period of coding. Inter-coder reliability was calculated as the intra-class correlation coefficient. Reliability for the composite scores used in the current report exceeded .83 at every age. At 6, 15, and 24 months, composite maternal sensitivity scores were created from the sums of three 4-point ratings (maternal sensitivity to child nondistress, intrusiveness [reversed], and positive regard). At 36 and 54 months, the maternal sensitivity composite was the sum of the three 7-point ratings of supportive presence, hostility (reversed), and respect for autonomy. Cronbach alphas exceeded .70 at every age. The maternal sensitivity composite rating was a significant predictor of children’s attachment security at 15 months (NICHD Early Child Care Research Network, 1997b) and 36 months (NICHD Early Child Care Research Network, in press–a) and peer competencies at 36 months (NICHD Early Child Care Research Network, in press–b). It also was a significant mediator and moderator of relations between maternal depression and children’s expressive language and cooperation (NICHD Early Child Care Research Network, 1999b). The Home Observation for Measurement of the Environment (HOME; Caldwell & Bradley, 1984) was administered during home visits at 6, 15, 36, and 54 months. The focus is on the child in the environment, the child as a recipient of inputs from objects, events, and transactions occurring in connection with the family surroundings. The Infant/Toddler version of the Inventory (IT-HOME) is intended for use during infancy (birth to age 3).
It is composed of 45 items clustered into six subscales: (a) Parental Responsivity, (b) Acceptance of Child, (c) Organization of the Environment, (d) Learning Materials, (e) Parental Involvement, and (f) Variety in Experience. Each item is scored in binary fashion (yes/no). Information used to score the items is obtained during the course of the home visit by means of observation and the semi-structured interview. The Early Childhood version of the Inventory (EC-HOME) is intended for use during early childhood (age 3 to 6 years). It is composed of 55 items clustered into eight subscales: (a) Learning Materials, (b) Language Stimulation, (c) Physical Environment, (d) Responsivity, (e) Academic Stimulation, (f) Modeling, (g) Variety, and (h) Acceptance. Each item is scored in binary fashion (yes/no). Information at this age also is scored during the course of the home visit by means of observation and the semi-structured interview. Both forms of the HOME are correlated with intellectual or academic performance and adaptive social behavior in the expected direction. A centrally located system of training was used for data collectors at each age. Every 4 months, each observer coded videotaped visits, and the coding
was compared with gold standard codes. All observers were required to maintain a criterion of scoring like the master coder on 90% of the items. Cronbach alphas for the total score at each age exceeded .77. The HOME and maternal sensitivity ratings were standardized at each age and then averaged at each age to create a composite score. Together, these combined scores reflect parenting in two contexts: in the home and during semi-structured play. In previous research (NICHD Early Child Care Research Network, 1998, in press–b), we have found this composite parenting rating to be a strong and consistent predictor of children’s cognitive and social competencies at 24 and 36 months. Two indices of parenting quality (the intercept and slope) were created from the mean of the standardized scores at each age using HLM.
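The standardize-then-average step for the parenting composite can be sketched as follows. This Python illustration makes several assumptions that the text does not settle: z-scores are computed across children within a single age using the sample standard deviation, both measures are present for every child, and the scores shown are invented. The subsequent HLM step that yields the intercept and slope is not shown.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize scores across children at one age (mean 0, SD 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def parenting_composite(home_totals, sensitivity_scores):
    """Average of the standardized HOME and maternal sensitivity scores, per child."""
    z_home = zscores(home_totals)
    z_sens = zscores(sensitivity_scores)
    return [(h + s) / 2 for h, s in zip(z_home, z_sens)]


# Hypothetical scores for three children at one assessment age.
home = [30, 36, 42]
sens = [8, 9, 10]
composite = parenting_composite(home, sens)
print(composite)
```

Standardizing first puts the two measures on a common scale, so that the HOME total, which has a much larger raw range, does not dominate the average.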
Expanded List of Child and Family Covariates Additional child and family variables were included as covariates in some analyses. These variables were maternal rating of child temperament obtained at 6 months measured by the 55-item Revised Infant Temperament Questionnaire (Carey & McDevitt, 1978), maternal psychological adjustment measured by three subscales (Agreeableness, Neuroticism, Extraversion) of the NEO Personality Inventory (Costa & McCrae, 1985) collected at the 6-month home visit, maternal report of social support using 11 items that were rated with 6-point Likert scales collected at all visits (Marshall & Barnett, 1993), maternal report of separation anxiety using 21 items that were rated with 5-point Likert scales averaged from the 1-to-24-month visits (DeMeis, Hock, & McBride, 1986), and maternal beliefs about the benefits of maternal employment for children using 11 items collected at the 1-month visit (Greenberger, Goldberg, Crawford, & Granger, 1988). These additional covariates were based on established measures with excellent psychometric properties.
Child Functioning at 4½ Years Measures of child functioning were obtained during a laboratory visit, home visit, and child-care visit at 4½ years.
Pre-academic Skills The score for pre-academic skills is a composite score from two subtests of the Woodcock Johnson Achievement and Cognitive Batteries (1990). The Letter-Word Identification test measures skills at identifying letters and words. Standard scores range from 63 to 180, with values above 100 indicating
that the raw score was above the mean score of children on whom the test was standardized. The Applied Problems test measures skill in analyzing and solving practical problems in mathematics. Standard scores range from 41 to 157, with values above 100 indicating that the raw score was above the mean score of the standardization sample. Internal consistencies for 4-year-olds are .92 and .91 for the two scales, respectively. Their correlation with each other was .51 in the standardization sample and .57 in our sample. Cronbach alphas for the Letter-Word Identification and Applied Problems tests were .86 and .85 in the current study. The composite score was formed by averaging the standardized scores on the two subtests.
Short-term Memory Short-term memory was assessed using the Woodcock Johnson Cognitive Memory for Sentences subtest. Standardized scores ranged from 17 to 150 (M = 93, SD = 18.57), with values above 100 indicating that the raw score was above the mean score for the standardization sample. Cronbach alpha was .84 for this measure in the current sample.
Language Competence Language competence was assessed using the Preschool Language Scale (PLS-3; Zimmerman, Steiner, & Pond, 1979). It measures a range of language behaviors, including vocabulary, morphology, syntax, and integrative thinking, which are grouped into two subscales: Auditory Comprehension and Expressive Language (Cronbach α = .89 and .92, respectively, in the current study). These scales were highly correlated (r = .70, p < .001, in our sample). The test is standardized to have a mean of 100 and a standard deviation of 15. In our sample, scores ranged from 50 to 133 (M = 99.39, SD = 18.43). The PLS-3 correctly identified 4-year-olds with language disorders 80% of the time and was correlated with other language measures (rs = .66 to .82).
Social Competence Social competence was measured by having mothers complete the Social Skills Rating System (SSRS; Gresham & Elliott, 1990) for their children. This instrument is composed of 38 items describing child behavior. Mothers responded on a 3-point scale reflecting how often their child exhibited each behavior. Items are grouped into four areas: cooperation (e.g., keeps room neat and clean without being reminded), assertion (e.g., makes friends easily), responsibility (e.g., asks permission before using a family member’s property), and self-control (e.g., controls temper when arguing with other children). The total score is the sum of all 38 items, with higher scores reflecting higher levels of perceived social competence. The SSRS was normed on a
diverse, national sample of children in the 3-to-5-year age range and shows high levels of internal consistency (median = .90) and test-retest reliability (.75 to .88). Cronbach alpha in the current sample was .88. For children who were in child care at least 10 hours per week at age 54 months (n = 833), caregivers completed the California Preschool Social Competency Scale (Levine, Elzey, & Lewis, 1969), a 30-item instrument assessing a range of social competencies especially relevant in child-care settings (e.g., safe use of equipment, using names of others, greeting new child, initiating group activities). Four items were added to index specific features of peer play (cooperation, following rules in games, empathy, and aggression). Items were rated on 4-point scales. Items scored as not applicable were treated as missing. The Total Social Competency score was the sum of the 34 items, with higher scores denoting greater social competence. Scores ranged from 46 to 135 (M = 104.88, SD = 13.6, α = .88).
Behavior Problems Behavior problems were assessed by having mothers and caregivers complete the appropriate versions of the Child Behavior Checklist (Achenbach, 1991). The parent version lists 113 problem behaviors. The parent rates each as not true (0), somewhat true (1), or very true (2) of her child. Caregivers (n = 768) in children’s child-care settings completed the 100-item caregiver-teacher version developed for children ages 2–5 years. Both parent and teacher versions contain two subscales: Internalizing Problems (e.g., too fearful and anxious) and Externalizing Problems (e.g., argues a lot). Achenbach reports test-retest reliability of .89, inter-parent agreement of .70, and stability of scale of .71 over 2 years. Cronbach alphas for the mother version in the current sample were .81 for internalizing and .88 for externalizing. For the teacher version, Cronbach alphas were .90 for internalizing and .95 for externalizing in the current sample. For both subscales as well as for the Total Problem score, raw scores were converted into standard T scores, based on normative data for children of the same age. Details about all data collection procedures are documented in Manuals of Operation of the study, which can be found at http://public.rti.org/secc/.
Results Longitudinal Analyses of Child Care and Family Characteristics Preliminary analyses summarized our longitudinal assessment of the children’s child-care experiences and family context. Mothers were asked every 3 to 4 months about the number of hours spent in routine nonmaternal
care and the type of setting. Two quantity indices were created from these maternal reports of hours per week in all nonmaternal care arrangements using Hierarchical Linear Model (HLM) analyses (Bryk & Raudenbush, 1987). The HLM approach offered parsimonious, interpretable, and continuous summary scores describing quantity of child care over the first 4½ years of life by estimating individual measures reflecting overall amount of care and rate of change over time. Unconditional quartic individual growth curves were estimated, with age centered at 24 months. The HLM analyses revealed significant individual differences in the quantity intercept (z = 23.77, p < .001) and the quantity linear slope (z = 19.29, p < .001). Overall, at 24 months children experienced about 26 hours of care per week (M = 26.25 hours/week, SD = 17.4) and showed modest increases in child-care hours over time (M = 1.90 hours/week, SD = 4.5). Two individual growth-curve parameters were retained for subsequent analysis: the intercept (general tendency) of hours per week that nonmaternal care was used during the 16 intervals from 1 month through 4½ years and the linear slope of reported hours per week over time. In addition, two analysis variables representing the type of care were computed: the proportion of epochs that the child was in center care (% center care) and the proportion of epochs that the child was in a child-care home (% cc home). Observations of caregiver sensitivity also were summarized using HLM to describe longitudinal patterns of change. Unconditional linear growth curves were fit, and individual intercepts and slopes were estimated. Two cumulative quality indices were formed using scores for all time periods in which a particular child’s care settings were observed and rated. HLM analyses yielded significant individual differences in the positive caregiving quality intercept (z = 10.21, p < .001) and positive caregiving linear slope (z = 5.68, p < .001).
On average, children experienced moderately good quality care (M = 2.82, SD = .23), with child-care providers showing slightly more sensitivity to children when they were younger (M = −.029, SD = .013). In addition, HLM analyses were used to summarize longitudinal assessments of maternal depression and parenting. An unconditional linear growth curve was fit to the repeated assessments of maternal depression. There were systematic individual differences in the intercept (z = 20.9, p < .001) and linear change over time (z = 6.8, p < .001). On average, mothers reported few symptoms overall (intercept M = 9.35, se = .18, p < .0001) and very modest gains over time (M = .19 symptoms per year, se = .06, p < .003). An unconditional linear model also was fit to the repeated assessments of parenting. Significant individual differences emerged in both the overall level (z = 22.9, p < .001) and linear change over time (z = 8.2, p < .001). The parenting variable was created as the mean of standardized variables, so it is not surprising that the group growth curve was characterized by intercepts and slopes that did not significantly vary from zero. Nevertheless, the substantial individual differences in the intercepts and slopes made these summary measures interesting as covariates in subsequent analyses. Finally, income
was summarized as the mean income-to-needs ratio, and partner status was summarized as the proportion of time the mother reported a partner in the household across the 6- through 54-month assessments.
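The idea of summarizing each child's repeated measures as an intercept and a linear slope, with age centered at 24 months, can be illustrated with a deliberately simplified Python sketch. It fits an ordinary least-squares line separately for one child. This is not the HLM procedure used in the study: HLM estimates all children's curves jointly and shrinks individual estimates toward the group curve, and the quantity model in the text was quartic rather than linear. The function name and the example data are hypothetical.

```python
def fit_child_line(ages_months, values, center=24):
    """Least-squares intercept and slope for one child, with age centered at `center`.

    The intercept is the fitted value at the centering age; the slope is the
    fitted change per month.
    """
    xs = [age - center for age in ages_months]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(values) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical child whose weekly hours rise by half an hour per month.
ages = [6, 15, 24, 36, 54]
hours = [11.0, 15.5, 20.0, 26.0, 35.0]
intercept, slope = fit_child_line(ages, hours)
print(intercept, slope)
```

Centering at 24 months makes the intercept interpretable as the level of the measure at age 2 rather than at birth, which matches how the intercepts are described in the text.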
Is Child Functioning Associated with Child-care Quantity, Quality, and Type? The primary analyses involved multivariate linear regression models that tested whether child functioning at 4½ years varied as a function of child-care quantity, quality, and type. Two quantity indicators (individual intercept and slope of reported hours/week in care from 3 months to 4½ years), two quality indicators (individual intercept and slope of positive caregiving ratings), and two type indicators (proportion of 3-to-4-month epochs in which children attended centers and proportion of epochs in which children attended child-care homes) were tested along with the following control variables: child sex (1 = male), child ethnicity (coded African American, European American, Hispanic American, and other), proportion of epochs in which a husband or partner was in the household, maternal education, average ratio of income to needs, maternal depression intercept and slope, and parenting-quality intercept and slope. Interactions between the child-care parameters and each of the controls were tested to determine whether child-care effects were moderated by family characteristics. Interactions among the three child-care parameters were tested to determine whether those factors acted synergistically. The tests of interactions also served as tests of homogeneity of regression. Because none of the interactions was significant, they are not presented or discussed further. The results of the primary analyses are shown in Tables 2 and 3. The second, third, and fourth rows in Table 2 present the explained variance (R²) for the models as a whole, the block of child-care predictors, and the block of variables composed of the child and family controls. Also presented in Table 2 are the multivariate test statistics for the child-care and control blocks.
The next six rows list the test statistics for the multivariate test and the standardized regression coefficients for each child-care predictor; the final rows list the standardized regression coefficients for each one-degree-of-freedom covariate and the significance level for multiple-degree-of-freedom covariates. Table 3 shows a complementary measure of association, the structural coefficients (Courville & Thompson, 2001). This measure reflects the relative predictive power of each predictor included in the analysis model without adjusting for shared variance among the predictors. The structural coefficient is computed as the zero-order correlation between a predictor and an outcome measure divided by the multiple correlation. We interpreted these coefficients within the context of a given model (i.e., within each column in Table 3), treating the largest coefficients as the best unconditional predictors, provided that the overall model significantly predicted the outcome. Examination of both the structural and the standardized
Table 2: Prediction of child functioning at 4½ years from child-care quantity, quality, and type: Model tests and standardized coefficients (β)
Columns: Cognitive outcome (n = 737): MANOVA F, Acad, Lang, Mem. Social outcome, caregiver report (n = 533): MANOVA F, Skills, Prob. Social outcome, mother report (n = 748): MANOVA F, Skills, Prob. The model-fit rows report R² under each outcome.
Rows: Model fit: Overall model; Child-care block (cognitive MANOVA F = 1.84*); Covariate block (cognitive MANOVA F = 11.39***). Predictors, child care: Quantity intercept; Quantity slope; Quality intercept; Quality slope; % Child-care home; % Centers. Covariates: Site; Male; Ethnicity; Maternal education; Partnered; Income; Parenting intercept; Parenting slope; Depression intercept; Depression slope.
R .39*** .02*
R .44*** .01*
R F .22*** .01 2.95***
R R .13*** .14*** .02 .04***
.23***
.25***
.11*** 3.02***
.07*** .07*** 10.34***
F
β
β
1.48
.03
−.02
−.05
5.78**
0.53
.03
−.01
−.00
4.16**
.16***
.10*
R .18*** .00
R2 .17*** .00
.15***
.16***
β
F
−.07
.16**
.05
−.01
.01
2.40
−.08
.09*
.02
−.00
.01
.08
2.73
.11
−.00
.44
.01
.04
−.03 .05
1.64 0.28
.09 .04
−.01 −.02
.79 .09
.03 −.02
.04 .01
3.45*
.02
.11
.02
.00
−.01
.01
F
β
.20
Prob
2
β
β
3.04* 1.71
.10* −.01
3.98**
.05
.11**
.11*
−.06*
−.11***
.03
−.11**
.09*
.06
.04
−.10
−.05
.03 −.00 .16*
−.08 .07 −.16*
−.10* .02 .01 .04 .28*** −.09
.06
.08
−.14**
.16***
.05 .06
β
F
Skills
−.08* −.06 −.08* .01 .07 .06 .40*** .37*** .28*** .12***
.15***
.04
.00
−.06
−.06
−.03
−.07*
−.08*
−.07
.04
−.06
.17*** −.03
.06
−.02
−.15***
−.20***
.33***
−.02
.08*
Note: Acad = pre-academic skills, Lang = language competence, Mem = memory, Skills = social skills, Prob = behavior problems. *p < .05, **p < .01, ***p < .001.
coefficients provides information about the degree to which a predictor is associated with the outcome and offers unique prediction.
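The contrast between the two kinds of coefficients can be made concrete with a small sketch. It is an illustration only, not the authors' analysis code: it assumes a model with just two standardized predictors, and the correlations (0.3, 0.5, 0.6) and the function names are hypothetical. It uses the standard two-predictor formulas for standardized coefficients together with the definition of the structural coefficient given above (zero-order correlation divided by the multiple correlation R).

```python
import math


def standardized_betas(r_y1, r_y2, r_12):
    """Standardized regression coefficients for a two-predictor model,
    computed from the three pairwise correlations."""
    denom = 1.0 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2


def structural_coefficient(r_xy, r_squared):
    """Zero-order correlation divided by the multiple correlation R."""
    return r_xy / math.sqrt(r_squared)


# Hypothetical correlations, chosen only to keep the arithmetic easy to follow.
r_y1, r_y2, r_12 = 0.3, 0.5, 0.6

beta1, beta2 = standardized_betas(r_y1, r_y2, r_12)
# With standardized variables, R^2 is the sum of beta * zero-order correlation.
r_squared = beta1 * r_y1 + beta2 * r_y2

rs1 = structural_coefficient(r_y1, r_squared)
rs2 = structural_coefficient(r_y2, r_squared)

print(f"beta1={beta1:.2f}  beta2={beta2:.2f}  R^2={r_squared:.2f}")
print(f"rs1={rs1:.2f}  rs2={rs2:.2f}")
```

In this made-up case the first predictor has a standardized coefficient of zero, because everything it predicts is shared with the second predictor, yet its structural coefficient is about .6. That is the pattern the two measures are meant to separate: association with the outcome versus unique prediction.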
Cognitive Outcomes
Three cognitive outcomes (pre-academic skills, language, and short-term memory) were considered. As shown in the first four columns of Table 2, the multivariate analysis indicated that cognitive functioning was significantly
Table 3: Prediction of child functioning at 4½ years from child-care quantity, quality, and type: Structural coefficients (rs)

                          Cognitive outcome          Caregiver report  Mother report
                                                     on social outcome on social outcome
Predictor                 Acad     Lang     Mem      Skills   Prob     Skills   Prob
Child care
  Quantity intercept       .02      .01     −.05     −.19      .47     −.06     −.02
  Quantity slope          −.13     −.18     −.14     −.30      .27     −.05      .02
  Quality intercept        .43      .36      .45      .39     −.32      .18     −.08
  Quality slope           −.13     −.11     −.27     −.05      .09      .00      .02
  % Child-care home       −.03      .02     −.04      .03     −.04     −.06      .00
  % Centers                .18      .22      .23     −.09      .37      .08     −.03
Covariates (a)
  Male                    −.14     −.20      .04     −.31      .00      .40     −.06
  Maternal education       .69      .63      .60      .48     −.44      .35     −.35
  Partnered                .25      .32      .24      .35     −.40      .19     −.21
  Income                   .51      .55      .54      .38     −.19      .33     −.23
  Parenting intercept      .86      .84      .80      .66     −.56      .71     −.50
  Parenting slope          .29      .31      .23      .36     −.43      .25     −.42
  Depression intercept    −.27     −.30     −.38     −.39      .32     −.66      .86
  Depression slope        −.16     −.17     −.25     −.24      .12     −.05      .24

Note: $r_s = r_{X_1 Y}/R$, where $r_{X_1 Y}$ is the correlation coefficient for predictor $X_1$ and outcome $Y$; $R$ is the square root of $R^2$ of the model. Acad = pre-academic skills, Lang = language competence, Mem = memory, Skills = social skills, Prob = behavior problems.
(a) Site and ethnicity were also included in the model but are not listed because they are categorical predictors.
associated with child care, F(18, 2003) = 1.84, p = .02, and specifically with the quality intercept, F(3, 708) = 4.16, p = .006, the quality slope, F(3, 708) = 3.04, p = .03, and the proportion of center-care epochs, F(3, 708) = 3.98, p = .008. Children who attended higher-quality child care scored higher on tests of pre-academic skills and language than did children who attended lower-quality child care. Children whose child care increased in quality over time had better pre-academic skills, whereas pre-academic skills were lower when child care decreased in quality over time. Children who had more center experience displayed better language skills and better performance on the memory test than did children with less center-type experience. The structural coefficients in Table 3 show a similar pattern of results. These unconditional measures of association indicate that family characteristics such as parenting, maternal education, and income show the strongest association with the cognitive outcomes, but that overall quality of child care (quality intercept) was a moderately strong predictor of these outcomes. In contrast, amount of center care was a stronger predictor in the regression model than when considered alone.
Social Outcomes
We considered four aspects of social functioning (social skills and behavior problems reported by mothers, social skills and behavior problems reported by caregivers) in relation to child-care quantity, quality, and type by using multivariate hierarchical regression models that paralleled those used to predict cognitive functioning. Separate analyses were conducted for reports by mothers and by caregivers because those reports were only minimally related. The correlation between maternal and caregiver reports of behavior problems was r = .23, p < .001, and the correlation between maternal and caregiver reports of social skills was r = .21, p < .001. As shown in Table 2, caregiver reports of social behavior were significantly related to child care, F(12, 1010) = 2.95, p < .001, and specifically to overall quantity of care (individual intercept) from 3 months to 4½ years, F(2, 505) = 5.78, p = .003, and proportion of epochs of center care, F(2, 505) = 3.45, p = .03. The structural coefficients shown in Table 3 reveal a similar pattern of associations, although the various family measures, not surprisingly, show stronger associations in the structural coefficients than in the standardized coefficients because of their shared variance. Both sets of coefficients indicated that children with more child-care hours per week (quantity intercept) had more problem behaviors according to their caregivers than did children with fewer child-care hours. Although the multivariate test indicated that proportion of center-care epochs was significantly related to caregiver reports of social outcomes and although the structural coefficients identified proportion of center care as a moderately strong predictor of behavior problems, the individual betas associated with center care were not significant for either social skills or behavior problems.
Similarly, quality of care shows a moderate association in the unadjusted structural coefficients but not in the adjusted standardized coefficients. Thus, among the child-care variables, only quantity of care provides significant prediction when all covariates are considered, but both type and quality of care are associated with caregiver reports before adjusting for the extensive set of family characteristics. To address the concern that the difference in caregiver reports was an artifact of differential familiarity with the study child, the length of time that the caregiver provided care to the child was added in a follow-up analysis of the caregiver’s ratings of social behaviors. Caregiver ratings of problem behaviors continued to be significantly related to overall amount of time the child spent in nonmaternal care (B = .13, p < .05) and became significantly related to proportion of center care that the child experienced (B = .14, p < .05).
Analyses with Additional Covariates
To address the concern that selection factors were not adequately controlled for, the analyses were rerun with an expanded list of covariates, consisting of the maternal rating of child temperament, maternal psychological adjustment, maternal report of social support, maternal separation anxiety, and maternal beliefs about the benefits of employment. These covariates were added to the nine child and family predictors in the previous model. The same significant child-care findings were obtained with the expanded list of covariates, suggesting that the obtained findings were not an artifact of inadequate controls for family characteristics.
Analyses of the “Whole” Sample
Additional multivariate regressions were then conducted for all of the children in the sample, including children without any nonmaternal care. In these additional analyses, two quantity indicators (hours intercept and slope) and two type indicators (proportion of epochs of center care and proportion of epochs of child-care homes) were used as predictors. Child-care quality was not included as a predictor in these analyses because quality could not be assessed for children who were not in child care. These follow-up analyses yielded findings for quantity and type of child care that were very similar to those described above. Consequently, the quantity and type findings from the models without quality controls are not presented or discussed further. They are available from the authors upon request.
How Large Are the Effects of Child-care Quantity, Quality, and Type?
Follow-up analyses were then conducted to evaluate the magnitude of the statistically significant child-care effects reported above. Following the recommendation of McCartney and Rosenthal (2000), the obtained effects were evaluated in relation to two other well-established predictors of child outcomes – parenting quality and poverty. Effect sizes were computed as the difference between the adjusted means for high and low groups divided by the pooled standard deviation. For these analyses, continuous variables were transformed to categorical ones so that differences between the mean scores for high- and low-risk groups could be compared. Child-care quantity was categorized as <10 hours/week, 10–29 hours/week, or 30+ hours/week using the estimated individual intercepts from the HLM analysis of quantity up to 54 months. Child-care quality was categorized as the bottom, middle, and top third of the distribution of the estimated intercept from the HLM analysis of quality ratings from 6 to 54 months. Type-of-care variables (center and child-care home) were categorized as 0 epochs, 1–32% of epochs, and 33%+ of epochs. Quality of parenting was categorized as the bottom, middle, and top third of the distribution of the estimated intercept from the HLM analysis of parenting
quality from 6 to 54 months. Family income was categorized as poverty if the average income-to-needs ratio was 2.0 or lower (n = 301, 27.8% of sample). For analytic purposes, the same proportion of families was categorized as high income (n = 301, 27.8%). These families had income-to-needs ratios in excess of 4.43. Middle-income was categorized as income-to-needs ratios that were greater than 2.0 but less than 4.43. Two sets of child-care effect sizes were estimated, based on the number of family selection factors included in the analyses. The first analysis estimated effect sizes for parenting and child-care variables and included all other variables as covariates similar to the regression analyses reported above. The second analysis estimated effect sizes for poverty and child-care variables and included fewer covariates. This approach was adopted because poverty is a widely recognized risk factor for child development and therefore an intuitively appealing comparison. However, the full array of family selection factors could not be used as controls in this analysis because inclusion of these factors eliminated poverty as a risk factor. This probably occurred because poverty is linked to child development through its negative association with parental beliefs and practices (McLoyd, 1998). Thus, separate effect sizes were computed for one model with the full set of covariates as the comparison and for a second model with a reduced set of covariates. In the comparisons of child-care and parenting effects, all of the family and child characteristics included in the primary regression analyses reported above were used as covariates. In the comparisons of child-care and poverty effects, a more limited array of covariates (site, child gender, child ethnicity) was used.
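The effect-size definition and the grouping rules described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: the study divides differences in ANCOVA-adjusted means by the pooled standard deviation, whereas the sketch takes group means and standard deviations as given. The function names, the group labels, the handling of values that fall exactly on a cutoff, and the example numbers are all hypothetical.

```python
import math


def pooled_sd(sd_a, n_a, sd_b, n_b):
    """Pooled standard deviation of two groups."""
    return math.sqrt(
        ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    )


def effect_size_d(mean_high, mean_low, sd_pooled):
    """Difference between two group means in pooled-SD units."""
    return (mean_high - mean_low) / sd_pooled


def quantity_group(hours_per_week):
    """<10, 10-29, or 30+ hours/week (boundaries treated as 10 and 30)."""
    if hours_per_week < 10:
        return "low"
    if hours_per_week < 30:
        return "middle"
    return "high"


def income_group(income_to_needs):
    """Poverty if ratio <= 2.0, high income if ratio > 4.43, else middle."""
    if income_to_needs <= 2.0:
        return "poverty"
    if income_to_needs > 4.43:
        return "high"
    return "middle"


def type_group(proportion_of_epochs):
    """0 epochs, 1-32% of epochs, or 33%+ of epochs in a type of care."""
    if proportion_of_epochs == 0:
        return "none"
    if proportion_of_epochs < 0.33:
        return "some"
    return "extensive"


# Hypothetical group summaries (not values from the study).
sd = pooled_sd(sd_a=8.0, n_a=50, sd_b=8.0, n_b=50)
d = effect_size_d(mean_high=54.0, mean_low=50.0, sd_pooled=sd)
print(f"pooled SD = {sd:.1f}, d = {d:.2f}")
print(quantity_group(32.5), income_group(1.8), type_group(0.4))
```

Quality of care and quality of parenting were split into thirds of their distributions rather than at fixed cutoffs, so they are left out of the sketch.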
Quantity Effect
The ANCOVA that included the full set of child care, child, and family covariates indicated that children who averaged 30 or more hours of child care per week during the first 4½ years had more problem behaviors according to their caregivers than did children who averaged fewer than 10 hours of care per week (effect size, d = .38). The high-quantity group scored, on average, 3.7 points higher on the behavior problem scale than the low-quantity group. In comparison, children who experienced parenting quality in the bottom third scored 2.4 points higher than did children in the top third (d = .25). The quantity effect size was 152% (i.e., .38/.25) as large as the parenting effect size. The ANCOVA that included the more limited set of covariates indicated that caregivers reported more problem behaviors among children who had been in care for 30 or more hours per week than among children with little or no early child care. The high-quantity group scored, on average, 4.2 points higher on the behavior problem scale than the low-quantity group (d = .43).
[Figure 1: bar chart. Vertical axis: Behavior Problems (ticks at 45, 50, 55). Bars for low, middle, and high groups within four sets: Parenting Quality, Child-Care Quantity–1, Family Income, Child-Care Quantity–2.]
Figure 1: Children’s behavior problems associated with parenting quality, child-care quantity, and family income. Problem score means for parenting quality and child-care quantity–1 groups are adjusted for all covariates listed in Table 2. Problem score means for family income and child-care quantity–2 groups are adjusted for the more limited array of covariates.
In comparison, children in the poverty group scored 4.5 points higher than did children in the high-income group (d = .47). The quantity effect size was 91% as large as the poverty effect size. For illustrative purposes, the first two sets of bars in Figure 1 show the adjusted means of behavior problems for the three parenting quality groups and three child-care quantity groups. The second two sets of bars in Figure 1 show the adjusted means for the three income groups and the three child-care quantity groups.
Quality Effects
Children whose child care was in the highest third of quality obtained higher scores on tests of pre-academic skills and language than did children whose child care was in the bottom third. The ANCOVA that included the full set of family and child covariates indicated that the adjusted mean scores for children in higher-quality care were 2.3 points (d = .24) higher on pre-academic skills and 2.3 points (d = .15) higher on language skills than the scores for children in the low-quality group. In comparison, the adjusted means for children receiving high-quality parenting were 8.3 points (d = .88) higher on pre-academic skills and 13.7 points (d = .87) higher on language skills than for children in the low-quality parenting group. The effect sizes for child-care quality were 27% and 17%, respectively, as large as the effect sizes associated with parenting quality. The ANCOVA that included the more limited set of covariates yielded differences of 3.8 points (d = .39) for pre-academic skills and 4.8 points (d = .29) for language between children in high- and low-quality child care. In comparison, the adjusted means for children from the poverty group were 8.3 points (d = .83) lower on pre-academic skills and 15.6 points (d = .95) lower on language skills than for children from high-income families.
[Figure 2: bar chart. Vertical axis: Pre-Academic Skills (ticks at 90, 95, 100, 105). Bars for low, middle, and high groups within four sets: Parenting Quality, Child-Care Quality–1, Family Income, Child-Care Quality–2.]
Figure 2: Children’s pre-academic skills associated with parenting quality, child-care quality, and family income. Means for parenting quality and child-care quality–1 groups are adjusted for all covariates listed in Table 2. Means for family income and child-care quality–2 groups are adjusted for the more limited array of covariates.
The effect sizes for child-care quality were 47% and 31%, respectively, as large as the effect sizes for poverty. For illustrative purposes, the first two sets of bars in Figure 2 show children’s adjusted mean scores for pre-academic skills for the three parenting quality groups and the three child-care quality groups. The second two sets of bars in Figure 2 show pre-academic scores for the three family-income groups and the three child-care quality groups.
Type Effects
The analysis with the full set of family and child covariates yielded differences between children who were in center care for at least one-third of the epochs and children with no center experience: a difference of 4.6 points (d = .29) on language performance and 5.7 points (d = .34) on memory. In comparison, the difference in adjusted means between children who received high- and low-quality parenting was 13.7 points (d = .87) on language and 10.6 points (d = .64) on memory. Center effect sizes were 32% and 52% as large as the parenting-quality effect sizes in these analyses. Differences in adjusted means with the more limited set of covariates were 6.6 points for language performance (d = .41) and 7.1 points for memory performance (d = .41) for children with extensive center-type experience as contrasted with those who had no such experience. The comparable effects for poverty were 15.6 points (d = .95) for language and 9.6 points (d = .56) for memory. The center effect sizes were from 43% to 73% as large as poverty effect sizes in these analyses. Figure 3 illustrates these center, parenting quality, and poverty findings with regard to children’s language.
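The relative effect sizes quoted in the Quantity, Quality, and Type sections are simple ratios of the rounded d values, following the ".38/.25" pattern given for the quantity effect. The sketch below recomputes them as a check. The variable and function names are hypothetical, and the pairing of each quality percentage with parenting or poverty is inferred from which ratio reproduces it. Recomputed values agree with the reported ones to within about 1.5 percentage points; the two center-versus-parenting ratios come out near 33% and 53% rather than 32% and 52%, presumably because the published figures were computed from unrounded effect sizes.

```python
# (comparison, child-care d, benchmark d, percentage reported in the text)
comparisons = [
    ("quantity vs. parenting, behavior problems", 0.38, 0.25, 152),
    ("quantity vs. poverty, behavior problems", 0.43, 0.47, 91),
    ("quality vs. parenting, pre-academic skills", 0.24, 0.88, 27),
    ("quality vs. parenting, language", 0.15, 0.87, 17),
    ("quality vs. poverty, pre-academic skills", 0.39, 0.83, 47),
    ("quality vs. poverty, language", 0.29, 0.95, 31),
    ("center vs. parenting, language", 0.29, 0.87, 32),
    ("center vs. parenting, memory", 0.34, 0.64, 52),
    ("center vs. poverty, language", 0.41, 0.95, 43),
    ("center vs. poverty, memory", 0.41, 0.56, 73),
]


def relative_size(d_child_care, d_benchmark):
    """Child-care effect size as a percentage of the benchmark effect size."""
    return 100.0 * d_child_care / d_benchmark


for label, d_cc, d_bench, reported in comparisons:
    recomputed = relative_size(d_cc, d_bench)
    print(f"{label}: recomputed {recomputed:.1f}%, reported {reported}%")
```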
[Figure 3: bar chart. Vertical axis: Language Skills (ticks at 85, 90, 95, 100, 105). Bars for low, middle, and high groups within four sets: Parenting Quality, Child-Care Center–1, Family Income, Child-Care Center–2.]
Figure 3: Children’s language skills associated with proportion of center care. Means for parenting quality and center care–1 groups are adjusted for all covariates listed in Table 2. Means for family income and center care–2 groups are adjusted for the more limited array of covariates.
Conclusions
Results from the current analyses indicate that early child care is associated with both developmental risks and developmental benefits for children’s functioning prior to school entry, even after controlling for a host of factors including gender, ethnicity, family socioeconomic status, maternal psychological adjustment, and parenting quality. The risk is that more hours in child care across the first 4½ years of life is related to elevated levels of problem behavior at 4½ years. The benefit is that higher-quality child care, quality that improves over time, and more experience in centers predict better performance on measures of cognitive and linguistic functioning. It is important that each of these aspects of child care (quantity, quality, and type) was associated with child functioning when other aspects of child care were controlled. Higher-quality child care predicted better pre-academic skills and language, regardless of hours and type of care. Larger amounts of child care were associated with behavior problems, even after quality of care was controlled. Center experience was a unique predictor of both language and memory performance. Thus it appears that focusing on only one facet of child care provides an incomplete picture, just as focusing on a single aspect of the elephant led to misleading conclusions in the Indian folk tale. The focus on a single aspect of child care fails to fully represent child-care effects on young children (Vandell et al., 2000). These child-care findings at 4½ years are consistent with earlier findings involving the same sample, even though different measures of cognitive, language, and social functioning were used with younger children. In particular, large amounts of child care were related to heightened behavior problems at 2 years of age as reported by caregivers and less social competence as
reported by mothers (NICHD Early Child Care Research Network, 1998). Higher-quality care and center-type experience predicted better linguistic, cognitive, and pre-academic functioning on standardized measures at 15, 24, and 36 months (NICHD Early Child Care Research Network, 2000b). The consistency of the earlier findings and the 4½-year results suggests robust associations, at least across the preschool period. Also consistent with our prior findings when children were 2 and 3 years old, the effects associated with hours of care and with type of care were maintained when analyses were conducted for all of the children in the sample, including those with zero hours. In the current analyses, amount of child care was related to caregiver but not to mother report of behavior problems. This difference may well reflect the fact that mothers and caregivers observed the children in different contexts, the former at home where the number of other people is rather limited and usually the child has known those people for a long time, the latter where there can be many children and adults who may not be highly familiar. The modest correlations between maternal and caregiver report of the children’s social behaviors obtained in the current study and in other research (Ablow et al., 1999) are consistent with the argument that mothers and caregivers offer different perspectives on child functioning. To evaluate the practical importance of the 4½-year findings, we compared the effect sizes that were associated with child care with effects associated with two other well-recognized influences on young children’s development: parenting quality and poverty. The obtained quantity effects on caregivers’ reports of child behavior problems were larger than the effects on behavior problems that were associated with parenting (d = .38 versus .23) and almost as large as the effects associated with poverty (d = .43 versus .47).
Effects of child-care quality on children’s pre-academic skills and language (d = .39 and .29, respectively) and effects of center-type experience on language and memory (d = .28 and .33) were comparable in size to quantity effects on behavior problems but smaller than effects of parenting on these cognitive and linguistic outcomes. Using these factors as benchmarks, we conclude that the detected child-care effects were meaningful. The practical importance of these effects is further underscored by the fact that the majority of young children in the United States are in child care for a substantial number of hours (West et al., 2000) and that much of this care is not of high quality (NICHD Early Child Care Research Network, 2000a). Center care is more likely to be used by higher-income households that have the economic resources to purchase it. Children from low-income families are more likely to be cared for by relatives in informal settings because their families do not have the funds to purchase center-based care (Hofferth et al., 1998). Even modest effects may aggregate when large numbers of children are affected. For example, many of the most important risk behaviors from a public health perspective have a low or moderate risk, but
they are multiplied in importance because of their wide prevalence and links to problematic outcomes (Jeffrey, 1989). The results of the current study also underscored the importance of parenting and the home environment for young children. Children who received higher-quality parenting as indicated by more sensitive, stimulating, and supportive maternal behavior at home and in semi-structured play displayed higher pre-academic skills, better language skills, more social skills, and fewer behavior problems than children who received lower-quality parenting. These effects are consistent with a substantial theoretical and empirical literature (Collins et al., 2000), as well as with earlier findings from the NICHD studies (1999b; 2000b, in press-a) that have linked maternal sensitivity and stimulation to young children’s subsequent cognitive and social development. The effects associated with parenting and pre-academic skills (d = .88) and language (d = .87), which include both shared genes and environmental influences, were among the largest effects obtained in the current study. Although we obtained significant child-care and parenting effects in the current study, it is important to acknowledge that both types of effects may be underestimated. Our sampling plan excluded some high-risk families (adolescent mothers, non–English speakers, and those who lived in very dangerous neighborhoods) and children at biological risk, such as children of low birth weight. It seems likely that these exclusions truncated the range in scores, resulting in lower effect sizes for parenting and poverty. Because low-income families are more likely to use care that is lower in quality and less likely to use centers (Phillips, 1995), child-care effects also may be underestimated because of truncated scores. It seems less likely that we have overestimated the effects associated with child care because of a failure to control adequately for family selection factors. 
In follow-up analyses with the expanded set of covariates, a substantial array of family characteristics, including ethnicity, maternal education, income, parenting quality, maternal depression, psychological adjustment, social support, beliefs about the benefits of maternal employment, and separation anxiety, were used as controls. Results were essentially the same when the additional covariates were included: Child-care quality was related to pre-academic skills and language, center care was related to memory and language, and quantity was related to behavior problems reported by caregivers. Finally, it should be noted that the child-care main effects were not moderated by family factors. Consistent with the long and well-established intervention literature, which has argued that the most needy benefit the most from high-quality early education programs (Campbell et al., 2001; Ramey et al., in press), we had anticipated that high-quality child care and center-based care would be most advantageous for children growing up in less advantageous circumstances such as lower family income, maternal depression, and lower-quality parenting. We failed, however, to detect statistically significant child care and family interactions of that sort. We also had expected interactions between the child-care parameters. For example, we
had anticipated that effects of child-care quality would be magnified by child-care hours. We failed, however, to detect dosage effects or other interactions between child-care parameters. Others (McClelland & Judd, 1993) have noted that multiple regression approaches are relatively insensitive to conditional relations. In other papers being completed with this data set, we consider in more detail each of these specific child-care parameters, which were beyond the scope of the current article. For example, related to child-care quality, we consider specific caregiver behavior such as reading and talking to children as predictors of child developmental outcomes. In relation to child-care quantity, we will consider several factors that might mediate the association between child-care hours and child behavior problems. In future research, we also plan to extend consideration of effects of early child care and family experiences to child developmental outcomes during elementary school. As the nation moves to raise educational standards for children’s performance in school (National Education Goals Panel, 1997), experiences in child-care settings must be considered as sources of variability in children’s readiness for school (Pianta & Cox, 1999). It remains to be determined whether the apparent consequences of child care remain, dissipate, or grow in time, and whether early schooling maintains or deflects developmental trajectories set in motion during the infant, toddler and preschool years.
Appendix
ORCE Qualitative Rating Scales (4-point ratings collected after 44 min of observation):
Sensitivity to child’s nondistress
Cognitive stimulation
Positive regard toward child
Negative regard
Detachment
Flatness of affect
Fosters exploration
Intrusiveness
ORCE Behavior Frequencies (time sample observation; 30 sec observe/30 sec record):
Caregiver expresses positive affect
Positive physical contact
Speaks positively
Asks questions
Responds to the child’s vocalization
Reads
Initiates other talk to the child
Stimulates child’s cognitive development
Stimulates child’s social development
Facilitates child behavior
Restricts activity
Speaks negatively
Exhibits negative physical actions
Teaches academic skill
Has mutual exchanges with child
Note This study is directed by a steering committee and supported by NICHD through a cooperative agreement (U10) that calls for a scientific collaboration between the grantees and NICHD staff. The participating investigators, listed in alphabetical order, are Virginia Allhusen (University of California, Irvine), Jay Belsky (University of London), Cathryn Booth (University of Washington), Robert Bradley (University of Arkansas, Little Rock), Celia A. Brownell (University of Pittsburgh), Margaret Burchinal (University of North Carolina, Chapel Hill), Bettye Caldwell (University of Arkansas for Medical Sciences), Susan B. Campbell (University of Pittsburgh), K. Alison Clarke-Stewart (University of California, Irvine), Martha Cox (University of North Carolina, Chapel Hill), Sarah L. Friedman (NICHD, Bethesda, Maryland), Kathryn Hirsh-Pasek (Temple University), Aletha Huston (University of Texas, Austin), Elizabeth Jaeger (Temple University), Deborah J. Johnson (Michigan State University), Jean F. Kelly (University of Washington), Bonnie Knoke (Research Triangle Institute), Nancy Marshall (Wellesley College), Kathleen McCartney (Harvard University), Marion O’Brien (University of Kansas), Margaret Tresch Owen (University of Texas, Dallas), Chris Payne (University of North Carolina, Greensboro), Deborah Phillips (Georgetown University), Robert Pianta (University of Virginia), Suzanne M. Randolph (University of Maryland, College Park), Wendy Robeson (Wellesley College), Susan Spieker (University of Washington), Deborah Lowe Vandell (University of Wisconsin, Madison), Marsha Weinraub (Temple University).
9 A Developmental Approach to Language Acquisition: Two Case Studies M. Bamberg, N. Budwig and B. Kaplan
I. Introduction
The principal intended readers of this paper are those scholars who are engaged in empirical inquiry into some aspect of first language acquisition. Within this diverse group, we take as our primary addressees those who define themselves as developmental psychologists. We therefore formulate issues of language acquisition in accord with the discursive practices dominant in Developmental Psychology. Nevertheless, we believe that our treatment of issues of language acquisition from a developmental point of view has considerable pertinence to linguistic theorists, whose main interest in language acquisition appears to be the construction and refinement of linguistic theory; we thus hope to attract the attention of such theorists, even if they are finally led to conclude that our developmental concerns are marginal or irrelevant to their focal interests.
Source: First Language, 11 (1991): 121–141.
II. Debates on the Theme of Development in Child Language Research
A cursory survey of recent works in the field of child language acquisition suggests that concerns with acquisition theory, learning mechanisms, telos of development, and even development itself, figure more centrally in the
theorizing of linguists and cognitive psychologists than in the discourse of those who occupy the traditional domain of developmental psycholinguistics. For example, Atkinson (1987) remarks critically on the lack of concern among developmental psycholinguists in the seventies with developmental mechanisms; and Pinker (1988) extends this critique to the whole research programme of developmental psycholinguists, charging that it has not really been concerned with language acquisition at all:
There were studies of the speech of children at one or more stages, and of their comprehension abilities, but virtually no one paid attention to the learning processes per se … nor to the end-state of acquisition … (p. 100).
Not surprisingly, accompanying such criticisms of traditional developmental psycholinguistics, there emerged a new orientation towards problems of first language acquisition. This orientation was labelled by many of its proponents, Linguistic Theory. Given the criticisms launched against traditional developmental psycholinguistics, it is, again, not surprising that linguistic theorists sought clearly to formulate the end-state (presumably, the attainment of linguistic maturity) and also tried to articulate learning principles leading to the end-state. Despite many intramural disagreements concerning the characteristics of the end state and the learning principles presumed to culminate in that end state, linguistic theorists have demarcated themselves as a group by their formulation of a telos for linguistic development, and their insistence that all descriptions of changes in linguistic knowledge during ontogenesis must be formulated in relation to this telos. Those adopting the Linguistic Orientation seem to show a scepticism with regard to the relevance of descriptions of child language performance. One manifestation of this scepticism is Wexler & Culicover’s (1980) claim that descriptions of child language data cannot contribute to an explanation of language acquisition. Not all linguistic theorists are so categorical in their rejection of descriptions of child language performance. Hyams (1986, 1988), and also Roeper (1987), admit empirical data to the study of language acquisition inasmuch as they mediate between the ‘logical problem’ and the ‘developmental problem’ of language acquisition. 
In Hyams’s terms the logical problem of language acquisition consists of the hypothesis of language development as an instantaneous process:
Linguistic theory treats language development as an ‘instantaneous’ process, which is to say it idealizes to a situation in which the child has at his disposal all of the principles and parameters of UG and all linguistic data necessary to fix these parameters (1988:155).
However, in determining the developmental problem of language acquisition, the flip-side of the logical problem, Hyams calls for the empirical investigation into how (and ultimately why) language acquisition is not instantaneous.
In spite of the fact that one does not know which principles and parameters the child originally has at his/her disposal – this is up for grabs in light of the particular linguistic theory actually applied – the linguistic approach has the virtue of stating a clear telos of language acquisition, and – bracketing or denying the role of cognitive and environmental factors in the establishment of the end-state – can develop testable hypotheses, governed by the accepted telos, for particular domains of grammar. This linguistic theory orientation toward language acquisition has not gone unchallenged. Several recent investigators (e.g., Bates & MacWhinney 1988, Bloom & Harner 1989, Slobin 1988) have argued that allowing the end-state or telos to delimit studies of language acquisition blinds one to interesting things that happen along the way toward that state. It also marginalizes or excludes, a priori, investigations of phenomena which may be essential to a full understanding of language acquisition, even with respect to the telos posited by the linguistic theorists. In sum, this latter group seeks to keep the space open for a more full-blown developmental approach to child language, a space they take to have been peremptorily foreclosed by the linguistic theorists. In a recent discussion note Bloom & Harner (1989) have sought to contrast these two orientations by characterizing the linguistic theorists’ approach as ‘top-down’ and the alternative approach as ‘bottom-up’. Bloom & Harner designate the latter approach as the ‘developmental perspective’. Although Bloom & Harner in their schematization of the two approaches surely do not wish to promote the inference, the notion of a bottom-up procedure may be taken by some to suggest that a developmental perspective is intrinsically empiricistic and positivistic: merely descriptive and atheoretical, or arriving at theory through induction from data. 
In order to forestall such possible readings, we shall avoid Bloom & Harner’s contrast and simply talk about the two approaches as the LINGUISTIC APPROACH and the DEVELOPMENTAL APPROACH (see Bloom in press for a more detailed discussion of different perspectives on language acquisition research). Developmental approaches are not monolithic. There are different developmental approaches, and they are often at odds with one another. In the following section, therefore, we shall focus on our specification of the Developmental Approach, incorporating in the discussion a consideration of the differences, as we see them, between our view of development and that of the linguistic theorists.
III. A Developmental Approach to Child Language Acquisition
A central feature of our approach is that it presupposes an a priori definition of development, one that serves to distinguish developmental changes from other historical or temporal changes over the life-span. Development ‘is not
a mere object of discovery in the Book of Nature, but a way of organizing phenomena’ (Kaplan 1964:6), of theorizing phenomena occurring not only in ontogenesis, but in culture history, in psychopathology, etc. It is thus an imposition on data of a specific mode of describing and understanding phenomena occurring over time, and even of phenomena that are coeval. It is important to stress that our Developmental Approach – as is also the case for the developmental approaches of Freud, Piaget and others – is, in no way, less theoretical than the Linguistic Approach; it simply approaches linguistic phenomena from a different perspective than that adopted by theorists of the Linguistic Approach. Nor is it less empirical than putatively atheoretical approaches, which claim to eschew any theory (perspective), but merely describe the given phenomena in an objective, theory-free, way; rather, it recognizes the inevitability of theory in all empirical inquiry and enjoins a certain way of observing/describing the phenomena. In this way, it is akin to the Linguistic Approach. It differs from the Linguistic Approach, however, in its basic orientation. Starting with more general presuppositions about a human being’s functioning in his/her environment, it construes linguistic activity as only one aspect – albeit a central one – in such a human being’s functioning, and consequently sees language use and understanding in the light of such more general functioning. In contrast, the Linguistic Approach begins from a specific definition of LANGUAGE, and this definition or circumscription determines its further views of the human organism and its ways of operating in its milieu. It follows from this difference in orientation that the very issue of what constitutes empirical data and empirical evidence will differ considerably. 
For example, while the Linguistic Approach can institute a clear-cut division between competence and performance, the Developmental Approach avoids such a radical division, and allows for a conception of language intrinsically less segregated from other aspects of the human being’s activity in a socio-cultural order. Let us explicate in terms of a related developmental approach. In Piaget’s approach, ‘all cognitive acquisitions, language included, [are] the product of a progressive construction’; these constructions and reconstructions are viewed as having their roots ‘in action and in sensori-motor mechanisms that reach deeper than the linguistic reality’ (Inhelder 1980:133). It may appear as if Piaget’s view grounds language totally in other non-linguistic activities of the human organism, but this interpretation seems to us to be only partly warranted. Piaget’s general developmental orientation is concerned with the ways in which variable structures (forms, means) emerge in ontogenesis with regard to functions taken as invariant; it therefore allows in principle for a dialectic between linguistic and other activities of the human organism, rather than a strict dependence of linguistic activities on other non-linguistic ones. To illustrate our approach of developmental research with an example stemming from Piaget’s research: the first ‘finding’ of a hidden object should
not necessarily be regarded as evidence for the existence of the Object Concept in the child. Rather, the Object Concept is considered to be the result of an integrated set of actions which vary according to different conditions – and as such, they give insight into the growth of structure. By analogy, the first appearance of a particular linguistic form cannot be viewed as evidence for the structures the way they figure in the target language. Rather, the Developmental Approach requires one to analyse these forms in terms of what they do from the child’s perspective; and what these forms do can only be analysed sufficiently across different contexts as part of an action orientation on the part of the child, which contributes (and as such can be said: leads) to the structure that is assumed to operate in the target language. The principle that informs our Developmental Approach, and serves to guide our developmental analyses, is the Orthogenetic Principle, formulated by Werner & Kaplan. This principle defines development in terms of increasing differentiation and hierarchic integration (Werner 1957, Werner & Kaplan 1956, Werner & Kaplan 1963/1984). It does not predict that any historical or temporal changes will be developmental. Whether or not such changes over time are to be construed as developmental changes depends on whether they manifest an increasing differentiation of parts within a whole, of forms with respect to functions, of functions themselves; and, concurrently or sequentially, whether they manifest a higher level of integration (synthesis, unity, co-operation) relative to prior states of affairs. In this framework, developmental changes are characterized by the reciprocity between means (forms) and ends (functions), i.e., ‘that changes in means affect the character of the ends, and conversely, changes in ends influence the character of the means’ (Werner & Kaplan 1963/84:7). 
Taking into account that linguistic behaviour IS action, using means that physically represent (make manifest) mental representations of action(s), the means (or forms) are figuring in at least two dual functions: (i) they have an effect on the world, and (ii) they represent. These two directions of fit (cf. Bamberg 1980, Budwig 1986, 1989, Searle 1983) undergo marked changes themselves and are of particular relevance for the development of the means to realize these dual functions. Consequently, in this approach the development of the linguistic system is not viewed as independent of the development of linguistic behaviour. Rather, both are coupled developmentally by an early fusion of undifferentiated ‘action’ and ‘representation’ (similar to Piaget’s original approach) out of which particular (language specific) functions and forms are increasingly differentiated and hierarchically integrated (cf. Werner & Kaplan’s Orthogenetic Principle). To summarize, while the Developmental Approach focuses on an early fusion of knowledge and action, the Linguistic Approach has already leapt over this developmentally early stage, trying to reify the level of abstraction which from a Developmental perspective first needs to be explained.
The Developmental Approach that we are proposing shows a number of affinities with various functional approaches to language and particularly to language development (cf. Bates & MacWhinney 1982, 1987, 1988, Benveniste 1971, Dik 1978, 1980, 1983, Foley & Van Valin 1984, Halliday 1973, Jakobson 1960, Karmiloff-Smith 1979, 1984, 1987, Silverstein 1987a, 1987b, Van Valin this volume – to name just a few). We will touch on the similarities to and differences among such approaches at the end of this article. Next we will turn to a more detailed discussion of developmental changes in two sub-domains of language, the use of personal pronouns and tense contrasts. The purpose of these two case studies in language changes is to lead to an interaction of concerns regarding the telos of development with concerns of the child’s point of view. More specifically, we will put forth the view that the notion of the telos of development has to take account of the multifunctionality of language; and as the flip-side of this orientation, in the empirical analysis of children’s speech patterns, we presume children to formulate their own linkages between language forms and language functions (cf. also Ervin-Tripp 1989 for a similar position). Whether and how these linkages are in the service of the ultimate telos is an empirical question, though our Developmental position holds us to construe the changes in light of an increasing differentiation and a hierarchic integration.
IV. The Empirical Domain in the Developmental Approach: Two Case Studies
IV.1 Introduction
In this section we will present two examples of DEVELOPMENTAL CHANGES, one from earlier development, the other from so-called later development (cf. Karmiloff-Smith 1986, 1987). The example from early language development documents the changes in the organization and reorganization processes of form-function relationships in the first uses of personal pronouns referring to Self. The example from later language development focuses on the reorganization of functions that are tied to particular linguistic contrasts, the pronominal-nominal contrast on the one hand, and the contrast between two tense forms (present perfect and simple present) on the other. Our first question in both analyses is ‘What do these forms “mean” to the child?’. In other words, we assume that the child’s matching of forms with functions is based on (a) some analysis of what these forms do in speech addressed to the child (or his/her immediate surrounding), and (b) some pre-existing categorization of what language consists of and how it functions. Thus, our first question of what particular forms mean to the child involves
a consideration of the merger of pre-existing categorizations of language functions, and the analysis of what particular forms do in the input. The second question that we pursue in the following two analyses of developmental reorganizations is the question of how these ‘meanings’ are reorganized over time into relationships between forms and functions in adult language. In this part of our developmental analysis, we are concerned with the question of how early contextually based form-function relationships can develop into some more abstract representations of what forms mean at levels of semantic or syntactic (‘purely formal’) generalizations. Thus, the fact that particular forms can be more adequately described in the target language as parts of a (more or less) abstract system becomes part of our developmental investigations of what the child may ‘mean’ with a particular form. However, there is no basis to make it the a priori starting point for developmental analysis, unless one believes that this is all that needs to be explained. A third question that is followed up in our first example is: How can we account for the additions of forms, i.e., for processes of changes in the purely formal sector of language acquisition? Here again, we document that the dynamics between forms and functions push towards the integration of new forms into the system; and as such we supersede formulations of logical necessities for y developing before x as pure predictions on the basis of linguistic theorizing. We also document that similar developmental processes can be revealed in the analysis of form-function relationships in later phases of language development. Though the issue of entry into a system of formal devices has consumed the energy of language researchers, a thorough analysis of developmental changes up to the full integration of forms into multifunctional facets of performance is equally revealing of developmental forces.
In sum, mapping out two examples of developmental changes in detail serves the function of giving a glimpse of the factors that occasion these changes. Furthermore, documenting these changes also serves the purpose of explicating a method for doing empirical research that is not preoccupied (and often blinded) by a priori linguistic generalizations of what to expect due to the theory of the target system, but rather is open to minute and subtle changes that contribute to the child’s linguistic functioning.
IV.2 Developmental Changes in the Entry to the Pronoun System
In our first example of DEVELOPMENTAL CHANGES we will focus on the first appearances of personal pronouns in English-speaking children. The pronouns that are used before any other pronouns are those that refer to the Self, i.e., to the speaker (cf. Budwig 1986, 1989, Chiat 1986). Thus this
quasi-natural division or sub-division of the early pronoun system led us to follow up these early uses under the guidelines of two questions: (a) Do the early self-reference forms comprise a system in and of themselves, with an inner systematicity (giving to different self-reference forms different functions), and with an outer systematicity, delineating this early system from later reference forms to others than the self? (b) What happens when reference forms to others are integrated into the pronoun system (or more precisely, differentiated from the early restricted pronoun system of self-reference forms)? More specifically, our first example will represent the fine-grained changes in the organization of the I, Me, and My, and how this system changes in its inner consistency with the adding of a new form We, which includes and amalgamates reference to Self and other. The organizational changes can best be characterized in three different phases. In the first phase, several first person pronouns are used to refer to the Self, often in ways that deviate strikingly from adult grammar. Children say things such as ‘I like peas’, ‘Me jump’, ‘My build tower’. These three different forms are not used in free variation, but rather contrastively express various perspectives on the Self’s role in events: My and I can be contrasted in terms of semantic and pragmatic considerations. At the semantic level My is used in utterances referring to Self as prototypical agent, while I is used in references to Self as experiencer (‘My blew the candles out’ versus ‘I like peas’). In terms of pragmatic force My tends to appear in reference to Self in control acts (directives, requests, protests), while I occurs in assertives in which pragmatic control is not at issue (replies to information questions and descriptions). 
The use of Me is quite similar to the use of My with the exception that the children refer to events in which they attempt or actually bring about actions that impinge upon themselves rather than upon some external object. One can characterize children’s use of first person pronominal forms during Phase I as having an inner consistency. At a time before children regularly refer to others, their use of first person forms is based on an interplay between factors involving semantic agency and pragmatic control. Thus, as long as this system is clearly delineated, the organizational effort of the child is focused on the inner consistency of the system along a semantic/pragmatic axis of language functioning, namely that of semantic agentivity and pragmatic control. In the second phase, two major changes occur. First, children no longer segregate I, Me, My, for expression of action-specific categories; rather, I is exploited to express the functions previously covered by the separate forms. This decoupling of I from a specific function to cover multiple functions frees the forms My and Me to express their adult functions as possessive pronoun and object of the verb. Along with this change in inner systematicity, a new form, We, enters the system and contrasts with I. However, at first, the I/We contrast is not used
to mark the distinction between first person singular and plural. Rather, I now refers to Self in the context of talk about desire to manipulate objects, or actual object manipulation. Utterances with We are used when the action that is referred to is part of a broader action sequence leading to an independent or jointly conceived goal. We signals that the motive for the action is not anchored in the individual’s subjectivity. The action is conducted in accordance with more generalized plans according to which goals can be achieved. In the corpus collected, the We form is frequently used as the children build with blocks while the mothers quietly observe. In these activities the children refer to several component actions (e.g., ‘WE needa take this off and then WE hafta put this here …’) which are carried out in accordance with the broader goal of building a particular block structure. There is no nonverbal indication that the children are using We to refer to a collective. Further support for this interpretation stems from the mothers’ verbal responses in that the mothers verbally encode the children’s actions as independent (e.g., ‘Are YOU taking that off?’). In sum, there is little reason to believe that the child is using We to refer to a collective, but rather it seems to function as a kind of marker of impersonal agency (see Budwig 1990 for further details). Finally, in the third phase, We contrasts with I to establish the plural/singular distinction. Thus, I and We are referentially differentiated, and at the same time, We by now can be viewed as fully integrated into the newly established pronoun system. Let us summarize the DEVELOPMENTAL CHANGES that take place over the three phases described. The transition from Phase I to Phase II involves a decoupling of the form-function pairings of the three other Self reference forms, as well as the addition of a new form. 
In the second phase a consideration of the degree of semantic agency or the pragmatic force of the utterance no longer motivates the choice of pronominal form. However, while the use of I, My, and Me appears to be guided by nonfunctional (more abstract) motivations, our discussion of the I/We contrast reveals that functional considerations actually are still at play. At this point, I contrasts with We in terms of discursive considerations. It is only during Phase III that the I/We distinction is conceptualized at the referential level in terms of plurality. What these DEVELOPMENTAL CHANGES signify at a more general level is that children first link the use of particular linguistic forms with concrete action categories and only later move to a level of categorization that can be characterized as more abstract in that the motivations need not be based on functional considerations.
IV.3 Developmental Changes in Later Reorganizations of Pronouns and Tense Forms
Our second example of DEVELOPMENTAL CHANGES differs from the first along a number of dimensions: (1) In contrast to first appearances of forms,
this example focuses on LATER developmental changes, i.e., on changes in form-function relationships after the form has been used for quite a while – however, not yet in the way it is established in the adult system of use. (2) In this example we will compare developmental changes across linguistic domains. On the one hand, we will be concerned with changes in the use of a tense form – to be more precise, the present perfect as it contrasts with the simple present. On the other hand, we will focus on the changes in the use of pronouns – here specifically the use of the third person pronoun as it contrasts with the full nominal expression. (3) In contrast to the addition (integration and differentiation) of new forms as in the first example, we will concentrate in this example solely on the changes in functions. However, we feel that this reduction in complexity of the developmental issue will be counterbalanced by the comparison across the two domains mentioned above. Another difference should be mentioned, though not as relevant as the previous ones, consisting of the language under investigation in this second example: German. The data on which the following demonstration will be based stem from German-speaking, monolingual children from 3–10 years of age, as well as adults who were asked to tell a story based upon a 24-page picture book. In their employment of the first form-contrast, that of the simple present vs. the present perfect, children go through three distinct phases. In the first phase, the use of the perfect functions aspectually to signal the completion of actions or happenings. For instance, a 3½-year-old narrator lines up three happenings according to the format: ‘First X happens, then Y happens, and then Z has happened’; using the present perfect for the closure of a discourse string that seemingly pulls a number of events or happenings together into a discourse unit. 
In some way, the use of the present perfect – which in German is acquired before the simple past – resembles those findings that have been discussed under the heading ‘Aspect Before Tense’ (cf. Antinucci & Miller 1976, Bloom, Lifter & Hafitz 1980, Bronckart & Sinclair 1973), though the subjects in the present study in contrast to the subjects in those other studies (a) are older, (b) relate reference time and event time (and not speech time and event time), and (c) use the perfect for a particular discourse function. In the second phase, the present perfect is used to signal temporality. Whenever the narrator – here usually 5-year-old children – deviated from the canonical order of presenting the narrative events, the perfect was used to mark the event referred to as having taken place before an event referred to in a previous clause. This function of temporally reordering events typically followed the format: ‘Y happens, X has happened, Z happens’; where X in story or event time actually took place before Y – or at least is perspectivized from the narrator’s view in that particular sequence. In the third phase, which has been mastered by most of the 10-year-olds as well as by the adults, both functions are integrated into one singular pattern
of use, according to which aspectual AND temporal value join forces in the marking of discourse boundaries. As such, the present perfect foreshadows an orientation point, which is expressed by the use of the simple present – often in conjunction with a spatial and/or temporal deictic term. At the same time, the present perfect segments the discourse into episodic units which can be read as instructions to the listener for understanding the part-whole relationship of the narrative (cf. Bamberg 1990, for further discussions of the role of the present perfect in different speech genres). The typical format of the perfect use can be paraphrased as: ‘X has happened, and now Z happens here, because Y has happened’, followed and preceded by a new thematic event chain. While the present perfect/simple present contrast expresses different functions at the semantic level (aspectual → temporal → integration of both), at any given phase it functions analogously at the level of textual differentiation and integration, i.e., at the level where parts of the text are differentiated from each other, and at the same time integrated into an overarching conceptual whole. Similarly, the contrast signals to the story recipient how the speech activity of narrating is constituted. In the first phase, the formal contrast is used to signal some global discursive activities of the sort: This is how far I am in my speech activity, next I’ll turn to something else. In the second phase, the contrast is used locally to connect propositions, here to contrast them in their temporal order. Finally, in the third phase, both discursive functions are realized concurrently in signalling the episodic structure of the narrative. Comparing these developmental changes in the domain of tense contrasts with changes in the domain of noun/pronoun contrasts (in the same children within the same narrative activity), we find the same three phases along which children differentiate and integrate the contrast. 
In the first phase, the use of the third person pronoun (German: er) is reserved for the main character of the story, a little boy, irrespective of whether the narrator wants to maintain reference to this character, or whether he/she wants to reintroduce him into the story-line after some other object or character has been mentioned in the immediately preceding utterance. Other characters or objects are always introduced or re-introduced into the story-line by use of a full nominal expression. In the second phase, children reorganize their use of the pronoun/noun contrast insofar as they now use full nominal expressions for the main character for the purpose of re-introducing him into the story-line, but only when the second main character, a little dog, has been mentioned in the immediately preceding utterance. However, when this is not the case, i.e., when the child narrator has been talking about some other object or even about any of the antagonists in the story, the third person pronoun is used for such re-introductions. Of course, the third person pronoun is also used to maintain reference to any other object or character.
Finally, in the third phase, the third person pronoun is used for reference maintenance only, irrespective of topic or character, and full nominal expressions are used to introduce or re-introduce characters after they have been temporarily suspended. These three phases in the organization/reorganization process of the pronoun/noun contrast can be summarized in a fashion similar to the phases for the tense contrast between simple present and present perfect. Children start out using the pronoun as an indicator for something that holds the narrative as a speech activity together as a whole. More specifically, the indicator is the form-function relationship of er and main protagonist at the semantic or referential level. At the same time, this form-function relationship demarcates the narrative activity that starts out with er referring to the main character, and ends with er referring to the main character of the story. Before that speech activity, and thereafter, er is free to refer to someone else. Thus, it can be argued that the picking of a particular pronoun (er) and imposing it systematically on to one character is a means-end relationship for signalling some thematic/referential relevance, as well as conveying some textual and interactional relationships. In the second phase, the contrast is used locally to connect and demarcate propositions, here to contrast them with regard to their thematic relationship. Finally, in the third phase, both functions are realized concurrently and signalled in a switching back and forth among the different characters of the story, thus establishing a textual structure within which antagonists and main characters relate to each other in their thematic roles. The fact that both contrastive pairs are organized and reorganized in the outlined three phases concurrently points to some common underlying organizational procedures. 
As in the previous section, where we discussed children’s developmental changes in the pronoun system expressing self-reference, children in their organizational processes of tense contrast and pronominal/nominal contrast work on problem areas that are phase-specific. In the first phase, the child is concerned solely with constraints of the form-function relationship at the speech activity level of language functioning. As such, the role of the form contrast is to signal genre-specific discourse activities consisting of a beginning, a continuation and an ending. These parts are not filled by any conceptual discourse units such as episodes, but rather are established as ‘empty’, global discourse packages. In the second phase, the child seems to have forgotten that he/she ever had established these discursive packages and focuses on the role of the formal contrast in the establishment of the local level order of events. Accordingly, the referential/semantic connotation of particular forms is highlighted, while the discourse function does not seem to be taken into account. In the third and final phase, the two formerly differentiated functions are integrated into one conceptual whole. The form contrast now takes on a discursive and referential role in the establishment of a sequential order of events along the horizontal axis of
narrative structure and a hierarchical order of events according to the episodic structure of the narrative. Thus, the three phases developmentally fuse the two formerly differentiated functions of form contrasts into an integrated, multi-functional whole: starting out with a speech activity representation of the form-function contrast, followed by a referential/semantic representation, children overcome their limited focus on only one language function at a time by integrating both levels of representation into a single formal contrast representing multi-functional relationships.
IV.4 Commonalities in Developmental Changes – Towards the Construction of Linguistic Systems
The first, and possibly most interesting, commonality that surfaces across the examples of both earlier and later developmental reorganizations lies in children’s reliance at first on categorizing their linkages of forms with functions in terms of concrete actions. This reliance holds for early entries into linguistic systems as well as for so-called later phases of organizations. Concrete actions here refers to the children’s use of linguistic forms in terms of what is DONE and ACHIEVED with language at an interpersonal level of meaning construction. More specifically, the early uses of I, Me, or My in subject position are differentiated along the axis of how actions can be qualified with regard to the Self, i.e., from an active doer to an inactive experiencer. Similarly, the early occurrences of the present perfect and the pronoun of the third person singular are used to index the performed narrative activity. In other words, children at the age of about 3½ years seem to have already developed an idea of the genre-specificity of narratives, and are trying to match particular linguistic forms with this activity type. Taking from Bakhtin (1986) and Barthes (1977), genre specificity is signalled through the interpersonal and textual functions that language performs, lending support to our conclusion that the interpersonal (and textual) activity function of language is highly relevant for children’s early phases of entering formal linguistic systems. The question raised earlier in this section of how the child in these phases of language development matches his/her pre-existing notions about language function with input analysis can now be tackled using this newly established framework. 
We assume that in these phases the child has recourse to a level of language functioning at which forms can be matched with concrete actions, i.e., particular achievements of conversational goals. The fact that categories are formed at this level of matching early sounds and early grammatical morphemes with concrete actions is documented in an abundance of mainly psychologically-oriented research in language development (cf. Carter 1975, 1978, 1979, Deutsch & Budwig 1983, Ervin-Tripp 1989, Ervin-Tripp & Bocaz 1989, Gerhardt 1990). The regression to this way of
matching forms with concrete action categories at later phases of organizational processes has been documented only recently (Bamberg 1987, Karmiloff-Smith 1986). Regressing in so-called later development in order to link linguistic forms with concrete actions, therefore, is nothing but a developmental repetition of more general category formation processes in language development. At the same time, we feel we have good grounds to assume that the child’s attention in the process of input analysis is more strongly tuned to the pragmatics of caregivers’ speech. Though the multi-functionality of particular forms in adult speech organization is still in the process of being re-introduced into the main stream of linguistic research – especially under the headings of cross-linguistic typologies (see Slobin 1989 for a discussion of ‘typologies of use’) and discourse analysis – the analysis of what grammatical markers DO in indexing and demarcating action categories in adult speech seems to be equally important to the child in matching input with pre-existing categories. A second, equally important commonality revealed in the analysis of the sequence of the developmental changes in early and later phases of development is the gradual move from an embeddedness in interpersonal/textual levels of language functioning to more abstract levels of categorization. More specifically, over time, reliance on concrete actions as part of a primary category system can be penetrated by a foregrounding of ideational functions of language use, leading to a gradual freedom from the confinement to categories of concrete actions. As such, focusing on the ideational aspects of language functions in this second phase in the developmental changes helps the child to solve the problem of defining new entries into the linguistic systems (personal pronouns which refer to the self, tense meanings, anaphora). 
At the same time, focusing on the ideational aspects of language constrains the efforts of the pattern seeker anew, inasmuch as the problem is not viewed with regard to the overall system. Rather, the child’s focus is restricted to some property of the system: the present perfect is used solely as a referential pastness marker; the third-person pronoun is used as a local contrast to disambiguate; and with regard to the acquisition of the pronoun system for self-reference, the child regresses to an action category reflected in the I/We differentiation/contrast along the dimensions of semantic agency and pragmatic control. Note that this functional category was the centre for the earlier differentiation between the formal contrast of I, Me, and My, which in this developmental phase has been resolved and integrated into syntactic differentiations. However, at this phase the child’s efforts are centred on the ideational function of language. This limited focus constitutes a prerequisite for the re-integration of action categories with referential/ideational purposes of language use. As such, in a third phase of developmental changes, the integration of form-function relationships into an over-arching system – such as the pronoun system of English, or the German tense-system – indexes the abstract connections within the system of language specific grammars.
Regarding the question of whether these developmental changes are best characterized in terms of continuity or discontinuity, our answer is unequivocally: both. The developmental changes proceed along a route of discontinuity inasmuch as the grammatical system is not the starting point, but the end point. Thus, there is a qualitative change from reliance on early action categories to a later system of grammar. The question of how abstract grammatical (systematic) relationships can emerge is resolved by the fusion or amalgamation of different language functions (interpersonal/textual/ideational) into a multi-functional matching of form-function relationships in the target system (or telos). Still, other analyses can differentiate these form-function relationships of the target system at different levels of abstraction (cf. Bamberg 1990).
V. Conclusions
In the concluding section of this article we will try to summarize the contributions of a Developmental Approach to the study of language acquisition, and show how this approach redefines the actual empirical domain. We should recall from our introduction that a Developmental Approach is not just another way of approaching language acquisition data, but rather an integrative view of the telos of development, the organism, learning mechanisms and the course of development. Views about these four topics are synthesized under the heading of the Orthogenetic Principle of Development as an increasing differentiation and hierarchical integration of human functioning. In the definition of the empirical domain of what is being acquired, the Developmental Approach views Language as Action and Language as Representation of Representations as functionally related. Consequently, a theory of the telos of development requires a model (a theory of language) that integrates the two. This model of the telos does not require that formal regularities necessarily have to be traced back and grounded in pragmatics. Rather, there should be ample ways in this model to formulate rules that are, in some cases, totally abstract, i.e., allowing for non-functional motivations. Accordingly, the formulation of parts of the linguistic system as abstract, i.e., without a direct relationship to pragmatic functioning, can take place in an over-arching functionalist framework. There is an important consequence of stating this telos of development for the potential course of development. Though the child is generally viewed as starting from an action orientation working his/her way up into higher forms of representation, this orientation does not always lead the child directly into the domain of abstract rule formation. 
The domains of pronouns and tenses that we used in the above case studies to demonstrate what we consider the prototypical way of rule formation are indeed special domains. They are domains in which linguistic forms function pragmatically as ‘pointers’ or
‘shifters’ (cf. Benveniste 1971, Jakobson 1960, Silverstein 1985, 1987b). Therefore, it may be argued that the telos of some linguistic sub-systems is abstract, while the telos of other systems is functionally motivated. Regardless of whether one stipulates a functionally based or abstract telos, the problem remains one of explaining how the child arrives at such an end-point. We have shown in our discussion of the two case studies that children formulate their own form-function linkages. Since the form-function linkages developed by children differ from those of the telos, it becomes clear that the stating of a telos is a necessary but not sufficient aspect of a Developmental Approach. It is here that we have suggested that the Orthogenetic Principle plays a fundamental role in understanding the transition from the child’s original construction of means-ends relations to those found in the target language. In closing then, it should be pointed up that our notion of Developmental Approach is close to what is commonly called the ‘functional’ or ‘functionalist’ approach to language and to language acquisition. Inasmuch as the Developmental Approach views language acquisition in terms of means-ends relationships, we meet the basic principle of functionalist theories. However, the dialectic viewpoint of the mediation between the product of language acquisition (the telos) and the process of acquiring language (‘the child’s point of view’) which is basic to the Developmental Approach is not necessarily shared by all approaches that adopt the term functionalist. To the extent that we integrate the multifunctionality of language, in particular language as it functions to ‘represent’ and to ‘act’, the Developmental Approach compares with approaches that strive for general divisions of language functions (cf. Benveniste 1971, Bühler 1934/65, Halliday 1973, Jakobson 1960, Malinowski 1923). 
However, the Developmental Approach outlined here goes beyond a typology of different language functions and their developmental histories by examining the dynamic relationship between the different functions, insofar as they contribute to the construction of a so-called abstract linguistic system. In taking this focus, the Developmental Approach comes closer to what is commonly called the ‘functionalist sentence perspective’ (Firbas 1964, 1971) or other functionalist grammars (e.g., Dik 1983, Foley & Van Valin 1984, Givón 1984). These kinds of approaches and the Developmental Approach stipulate the need to study formal aspects of grammar in terms of semantics and pragmatics. An important additional ingredient of the Developmental Approach, however, is the assumption that a theory of a synchronic function-based grammar is not sufficient. Ontogenetically specific linkages of the semantic/pragmatic realm in the construction of child-like grammars, resulting in particular reorganization phases, may not be prognosticated from the establishment of a function-based telos alone. Among those functionalists who have been interested in language development, Givón’s (1984) notion of typological differentiations may be most fruitfully applied to children’s construction of
reorganization phases, namely that at times the child settles for a semantic solution or a pragmatic solution, or he/she finds an intermediate solution to the linking of the different language functions in his/her grammar. Finally, we want to point to a strong affinity between what we have called the Developmental Approach and two efforts in the study of child language acquisition which are named (though for different reasons) functional approaches. These are the approaches put forth by Karmiloff-Smith (1979, 1984, 1987) and the Competition Model developed by Bates & MacWhinney (1982, 1987). Karmiloff-Smith highlights the interdependence of language and cognitive development, emphasizing the process-basis of a functional approach. Her distinction between behavioural changes that are predominantly exogenously driven and representational changes which are endogenously driven resembles the distinction between the child’s early action categories and abstract formal representations within the Developmental Approach. Future work within the Developmental Approach will need to address this issue in more detail. The language-learning mechanisms hypothesized by Bates & MacWhinney (1987) similarly employ cognitive structures, though quite different from those assumed by Karmiloff-Smith. The Competition Model of language performance is particularly well adapted to make predictions about the rate at which a particular form is mastered, since it takes into account both (a) the nature of the form-function relationship in the telos and (b) the frequency with which a particular form is available to the child. The assumption of a powerful ‘Parallel Distributed Processing’ learning mechanism may be fruitfully integrated into the Developmental Approach.
References
Antinucci, F. & Miller, R. (1976). How children talk about what happened. Journal of Child Language, 3, 167–189.
Atkinson, M. (1987). Mechanisms for language acquisition: learning, parameter-setting and triggering. First Language, 7(1), 3–30.
Bakhtin, M.M. (1986). Speech Genres and Other Late Essays (Austin: University of Texas Press).
Bamberg, M. (1980). A fresh look at the relationship between pragmatic and semantic knowledge. Archives of Psychology, 133, 23–43.
Bamberg, M. (1987). The Acquisition of Narratives (Berlin: Mouton de Gruyter).
Bamberg, M. (1990). The German perfekt: form and function of tense alternations. Studies in Language, 14:2, 253–290.
Barthes, R. (1977). Image, Music, Text. Essays selected and translated by Stephen Heath (New York: Hill & Wang).
Bates, E. & MacWhinney, B. (1982). Functionalist approaches to grammar. In E. Wanner & L. Gleitman (eds), Language Acquisition: The State of the Art (New York: Cambridge University Press).
Bates, E. & MacWhinney, B. (1987). Competition, variation and language learning. In B. MacWhinney (ed.), Mechanisms of Language Acquisition (Hillsdale, N.J.: Erlbaum).
Bates, E. & MacWhinney, B. (1988). What is functionalism? Papers and Reports on Child Language Development (Stanford University, Department of Linguistics), 27, 137–152.
Benveniste, E. (1971). Problems in General Linguistics (Coral Gables, Florida: University of Miami Press).
Bloom, L. (in press). Language Development from Two to Three (Cambridge: Cambridge University Press).
Bloom, L. & Harner, L. (1989). On the developmental contour of child language: a reply to Smith & Weist. Journal of Child Language, 16, 207–216.
Bloom, L., Lifter, K. & Hafitz, J. (1980). Semantics of verbs and the development of verb inflection in child language. Language, 56:2, 386–412.
Bronckart, J. P. & Sinclair, H. (1973). Time, tense, and aspect. Cognition, 2, 107–130.
Budwig, N. (1986). Agentivity and control in early child language. Unpublished Ph.D. dissertation, University of California, Berkeley.
——— (1989). The linguistic marking of agentivity and control in child language. Journal of Child Language, 16, 263–284.
——— (1990). A functional approach to the acquisition of personal pronouns. In G. Conti-Ramsden & C. Snow (eds), Children’s Language, Vol. 7 (Hillsdale, N.J.: Erlbaum).
Bühler, K. (1934/65). Sprachtheorie: Die Darstellungsfunktion der Sprache, second edition (Stuttgart: Fischer).
Carter, A. L. (1975). The transformation of sensori-motor morphemes into words: a case study of the development of more and mine. Journal of Child Language, 2, 233–250.
——— (1978). The development of systematic vocalizations prior to words: a case study. In N. Waterson & C. E. Snow (eds), The Development of Communication (Chichester: Wiley).
——— (1979). The disappearance schema: case study of a second-year communicative behavior. In E. Ochs & B. B. Schieffelin (eds), Developmental Pragmatics (London: Academic Press).
Chiat, S. (1986). Personal pronouns. In P. Fletcher & M. Garman (eds), Language Acquisition, second edition (Cambridge: Cambridge University Press).
Deutsch, W. & Budwig, N. (1983). Form and function in the development of possessives. Papers and Reports on Child Language Development (Stanford University, Department of Linguistics), 22, 36–42.
Dik, S. C. (1978). Functional Grammar. North-Holland Linguistic Series 37 (Amsterdam: North-Holland).
——— (1980). Studies in Functional Grammar (London and New York: Academic Press).
——— (1983). Basic principles of functional grammar. In S. C. Dik (ed.), Advances in Functional Grammar (Dordrecht, Holland: Foris).
Ervin-Tripp, S. M. (1989). Speech acts and syntactic development: Linked or independent? Keynote address to the Boston Child Language Conference, October 1987. Berkeley Cognitive Science Report, 61, September.
Ervin-Tripp, S. M. & Bocaz, L. (1989). Quickly, before a witch gets me: children’s temporal conjunctions within speech acts. Berkeley Cognitive Science Report, 61, September.
Firbas, J. (1964). On defining the theme in functional sentence analysis. Travaux linguistiques de Prague, 1: L’École de Prague d’aujourd’hui (Prague: Éditions de l’Academie Tchécoslovaque des Sciences).
——— (1971). On the concept of communicative dynamism in the theory of functional sentence perspective. In Studia Minora Facultatis Philosophicae Universitatis Brunensis.
Foley, W. A. & Van Valin, R. D. (1984). Functional Syntax and Universal Grammar (Cambridge: Cambridge University Press).
Gerhardt, J. (1990). The relation of language to context in children’s speech: the role of hafta statements in structuring three-year-olds’ discourse. Manuscript submitted for publication.
Givón, T. (1984). Syntax: A Functional Approach, Vol. I (Amsterdam: John Benjamins).
Halliday, M. A. K. (1973). Explorations in the Functions of Language (London: Edward Arnold).
Hyams, N. (1986). Language Acquisition and the Theory of Parameters (Dordrecht, Holland: Reidel).
——— (1988). A principles-and-parameters approach to the study of child language. Papers and Reports on Child Language Development (Stanford University, Department of Linguistics), 27, 153–161.
Inhelder, B. (1980). Language and knowledge in a constructivist framework. In M. Piattelli-Palmarini (ed.), Language and Learning: The Debate Between Jean Piaget and Noam Chomsky (Cambridge, MA: Harvard University Press).
Jakobson, R. (1960). Linguistics and poetics. In T. A. Sebeok (ed.), Style and Language (Cambridge, MA: MIT Press).
Kaplan, B. (1964). Developmental aspects of the representation of time. Paper presented at the fourth annual meetings of the New England Psychological Association, Nov. 13, 1964, Chicopee, Massachusetts, as part of a symposium on ‘Current Research in Language and Cognition in the Child’.
Karmiloff-Smith, A. (1979). A Functional Approach to Child Language (Cambridge: Cambridge University Press).
——— (1984). Children’s problem solving. In M. E. Lamb, A. L. Brown & B. Rogoff (eds), Advances in Developmental Psychology, Vol. 3 (Hillsdale, N.J.: Erlbaum).
——— (1986). Some fundamental aspects of language development after age 5. In P. Fletcher & M. Garman (eds), Language Acquisition, second edition (Cambridge: Cambridge University Press).
——— (1987). Function and process in comparing language and cognition. In M. Hickmann (ed.), Social and Functional Approaches to Language and Thought (Orlando: Academic Press).
Malinowski, B. (1923). The problem of meaning in primitive languages. Supplement to C. K. Ogden & I. A. Richards, The Meaning of Meaning (New York: Harcourt, Brace & World).
Pinker, S. (1988). Learnability theory and the acquisition of a first language. In F. S. Kessel (ed.), The Development of Language and Language Researchers: Essays in Honor of Roger Brown (Hillsdale, N.J.: Erlbaum).
Roeper, T. (1987). The acquisition of implicit arguments and the distinction between theory, process and mechanism. In B. MacWhinney (ed.), Mechanisms of Language Acquisition (Hillsdale, N.J.: Erlbaum).
Searle, J. R. (1983). Intentionality: An Essay in the Philosophy of Mind (Cambridge: Cambridge University Press).
Silverstein, M. (1985). The functional stratification of language and ontogenesis. In J. V. Wertsch (ed.), Culture, Communication, and Cognition: Vygotskian Perspectives (London and New York: Cambridge University Press).
——— (1987A). The three faces of ‘function’: preliminaries to a psychology of language. In M. Hickmann (ed.), Social and Functional Approaches to Language and Thought (Orlando: Academic Press).
——— (1987B). Cognitive implications of a referential hierarchy. In M. Hickmann (ed.), Social and Functional Approaches to Language and Thought (Orlando: Academic Press).
Slobin, D. I. (1988). Confessions of a wayward Chomskian. Papers and Reports on Child Language Development (Stanford University, Department of Linguistics), 27, 131–136.
——— (1989). Factors of language typology in the crosslinguistic study of acquisition. Paper presented to Symposium: ‘The Crosslinguistic Study of Child Language’, Tenth Biennial Meetings of the International Society for the Study of Behavioural Development, Jyväskylä, Finland, July 9–13, 1989.
Werner, H. (1957). The concept of development from a comparative and organismic point of view. In D. B. Harris (ed.), The Concept of Development: An Issue in the Study of Human Behavior (Minneapolis: University of Minnesota Press).
Werner, H. & Kaplan, B. (1956). The developmental approach to cognition: its relevance to the psychological interpretation of anthropological and ethnolinguistic data. American Anthropologist, 58, 866–880.
Werner, H. & Kaplan, B. (1963/84). Symbol Formation (Hillsdale, N.J.: Erlbaum).
Wexler, K. & Culicover, P. (1980). Formal Principles of Language Acquisition (Cambridge, Mass.: MIT Press).
10 Promoting Positive Youth Development: New Directions in Developmental Theory, Methods, and Research
William M. Kurtines, Laura Ferrer-Wreder, Steven L. Berman, Carolyn Cass Lorente, Wendy K. Silverman and Marilyn J. Montgomery
Source: Journal of Adolescent Research, 23(3) (2008): 233–243.
The Miami Youth Development Project (YDP) had its beginnings nearly two decades ago as a grassroots response to the needs of troubled (multiproblem) young people (Arnett, Kurtines, & Montgomery, 2008, this issue). Miami, an international city at the intersection of North and South America, was undergoing (and still is undergoing) an extended period of substantial multicultural growth. The community and its youth were experiencing the negative (as well as positive) impact of this change. In this context, the evolution of YDP exemplifies the practical value of conducting research based on university-community collaboration and research-related principles consistent with the outreach research approach, that is, research designed to meet community needs by generating innovative knowledge of effective change-producing strategies (e.g., community-supported interventions) that are feasible, sustainable, and affordable in “real world” settings.
In developing the Miami YDP, we drew on the strengths of a developmental intervention science (DIS) approach, a fusion of the developmental and intervention science literatures. We have adopted a DIS approach for our program of outreach research because it is specifically committed to the use of descriptive and explanatory knowledge about changes within human systems that occur across the life span in the development of empirically based, multidisciplinary/life-span intervention strategies. In the process, we have been refining a multistage research design, the structure and format of which is intended to realize fully the potential for conducting comparative and longitudinal program evaluation research made possible by the logic of outreach research. A distinct advantage of a community-supported outreach research program committed to remain in the community long enough for the realization of community-valued developmental goals for its youth is that this long-term commitment also creates the potential for addressing long-term, research-related-knowledge development goals for the field in ways typically not available to short-term, externally funded studies.
As this special issue illustrates, drawing on the strengths of the fusion of these literatures has the potential to bring together (a) a more empowering model of knowledge development for research involvement in the community, one that includes meeting community needs as well as knowledge development needs; (b) a nuanced and contextualized notion of youth and their development; and (c) methodologies that richly reflect rather than reduce the experiences of the young people whose development we seek to promote. Specifically, this special issue illustrates the potential of DIS outreach research to do more than generate knowledge of effective intervention strategies for meeting community needs that are feasible, affordable, and sustainable in real-world settings. It explores the potential of DIS outreach research to generate knowledge that contributes to advancing the field of human development as well.
Developmental Intervention Science (DIS)
As the 20th century ended, developmental science emerged as a core perspective in the human sciences as a result of its integration and application of a life-span/interdisciplinary orientation to basic and applied issues (Lerner, Fisher, & Weinberg, 2000). From this perspective, the term applied developmental science (ADS) is used to refer to scientific investigation that focuses on the use of research and application to promote positive development across the life span (Damon, 2004; Lerner et al., 2000). Applied developmental scientists adopt the view that positive individual development and family functioning is an interactive product of biology and the physical and social environments that continuously evolve and change over time. This perspective stresses the importance of understanding normative and atypical processes as they emerge within different developmental periods and across diverse physical and cultural settings. The ADS orientation is committed to the use of descriptive and explanatory knowledge about changes within human systems that occur across the life span in the development of empirically based theory that not only addresses a full spectrum of applied concerns (ranging from specific intervention strategies to broadband social policy) but is also influenced by the outcome of these community activities (Lerner et al., 2000).
New Problems and New Populations: Promoting Positive Youth Development
As the 21st century begins to unfold, a large and growing literature on promoting positive youth development (PYD) has emerged in response to a complex set of interrelated contextual changes, with transformations in the conceptual foundations of both developmental and intervention science being particularly relevant. As Lerner (2005) noted, with respect to developmental science, the PYD movement was the result of the emergence of ADS accompanied by a shift away from the tendency to view adolescence as a period of “stress and storm” and youth as both dangerous and endangered or as “problems to be managed” (Arnett, 2000; Roth, Brooks-Gunn, Murray, & Foster, 1998). The predominant lens for conceptualizing the nature of adolescence was thus one that implicitly or explicitly used a deficit model of youth until recently when, increasingly, the study of adolescence became intermeshed with the emerging ideas associated with developmental systems theories (Lerner, 2005). These interests converged in the formulation of a set of ideas that enabled youth to be viewed as resources to be developed, and not as problems to be managed (Roth & Brooks-Gunn, 2003). During the same period, the conceptual foundations of intervention science were also undergoing transformation. With respect to intervention science, these changes included the emergence of prevention science as a logical extension of treatment science. More important with respect to the work reported here, with its emphasis on positive adjustment and optimal functioning, prevention science dovetailed with the emerging developmental science view of youth as resources to be developed rather than problems to be managed (e.g., Masten & Coatsworth, 1998).
In the prevention science literature, efforts to broaden the criteria by which prevention intervention outcomes are evaluated (beyond reducing risk factors) have resulted in the inclusion of more general indices of positive adjustment and optimal functioning to include views of psychological health and resilience (including a sense of one’s meaning and purpose) as components of well-being (e.g., Masten & Coatsworth, 1998). However, interventions developed under the prevention science model, like the treatment science model, necessarily maintain a core focus on “preventing” negative developmental outcomes rather than promoting positive ones (Catalano, Berglund, Ryan, Lonczak, & Hawkins, 1999). The beginning of a convergence of concepts and constructs broadly related to promoting positive development in both developmental and intervention science has resulted in a recognition that intervention science needs to do more than “treat” problem behaviors or “prevent” negative developmental outcomes (Damon, 2004; Lerner, 2005; Lerner et al., 2000; Seligman, Steen, Park, & Peterson, 2005). Indeed, as Lerner (2005) noted, hundreds of millions of federal tax dollars continue to be spent each year to reduce or prevent the problems. In contrast, far less research and fewer resources have been directed toward intervention efforts promoting positive development in general (Damon, 2004) and positive development interventions for at-risk and behavior problem youth in particular (Lerner, 2005). The emergence of the PYD movement has similarly resulted in recognition that developmental science needs to do more than generate complex “descriptive” models of developmental systems and of relations between individuals and their real-world ecological settings. The descriptive models need to be translated into programs that can be implemented in “usual care” practice in community settings. In this context, there has been a growing interest in bringing together evolving developmental science models, what Overton (1996) refers to as models of what changes and how it changes (Lerner, 2005), and evolving intervention science models of what to change and how to change it (Holmbeck, 2002; K. R. Weisz & Hawley, 2002). To date, however, there has been a paucity of examples of this type of research in the intervention literature in general and on PYD interventions in particular. Indeed, the literature on PYD and efforts to integrate developmental and intervention science is still in its infancy relative to the well-developed (and well-funded) treatment and prevention research literatures targeting problem and risky behavior (Jensen, Hoagwood, & Trickett, 1999).
A Developmental Intervention Science (DIS) Approach
Drawing on the conceptual base provided by ADS and informed by social policy research (Lerner et al., 2000), a DIS approach is one specifically committed to the use of both descriptive and explanatory knowledge about changes within human systems that occur across the life span in the development, implementation, and evaluation of evidence-based multidisciplinary life-span intervention strategies. In this frame, we conceptualize the work described here as directed toward creating PYD interventions that draw on the strengths of a DIS approach and extend the PYD perspective by drawing on outreach research principles in the development of community-supported positive development programs. The use of an outreach research approach in the development of programs reported here is thus broadly conceptualized within the PYD perspective (Lerner, 2005). The PYD perspective has arisen because of interest among developmental scientists in using developmental systems, or dynamic models of human behavior and development, for understanding the plasticity of human development and, as well, the importance of relations between individuals and their real-world ecological settings as the basis of variation in the course of human development (Lerner, 2005).
Outreach Research
Applied developmental scientists have increasingly recognized the importance of effective university-community collaborative models in achieving positive development goals (Damon, 2004). Such models involve a learning collaboration between scholars and community members and can be essential to knowledge-generation processes (Eccles, 1996; Keys, Bemak, & Lockhart, 1998). The recognition of this need, however, is recent, and the development of outreach research models has lagged behind other types of research models. In the area of intervention science, for example, Jensen et al. (1999) described two distinct models of research relevant to nationally funded treatment and prevention intervention research (e.g., by the National Institutes of Health). The first and most prominent model they described was efficacy research (Jensen et al., 1999), a model that focuses on generating knowledge of the efficacy of intervention strategies primarily developed by funded research evaluated in well-controlled studies conducted in university clinic and laboratory settings. The profusion of funded efficacy research has resulted in substantial support for the efficacy of a wide range of treatment and prevention interventions for youth under the problem/risky behavior reduction model, at least when conducted under well-controlled conditions (Ferrer-Wreder, Stattin, Lorente, Tubman, & Adamson, 2004; Kazdin & Weisz, 1998; Ollendick, King, & Chorpita, 2006). One consequence of the emphasis on the use of rigorous experimental procedures to control for unwanted sources of variation in research designs has been that when applied in real-world settings (i.e., without benefit of experimental control), the effectiveness of such interventions has been of concern. That is, they have proven difficult to transport into usual-care practice because the utility and validity of the resulting interventions, when applied in real-world settings, are unclear.
Consequently, a gap has emerged between evidence for the efficacy of the interventions when they were being developed in controlled clinic or laboratory settings and evidence for their effectiveness when they were applied in real-world settings without the advantages of control (J. R. Weisz, Donenberg, Han, & Kauneckis, 1995). Furthermore, there is currently little evidence that the gap has narrowed very much over the last decade (Spoth, Greenberg, Bierman, & Redmond, 2004). Moreover, as Spoth et al. (2004) have pointed out, too frequently efficacious interventions implemented in schools and communities through grant funding fail to survive the withdrawal of that funding (Adelman & Taylor, 2003). A chief reason for the limited sustainability of interventions begun by research projects may be that successful research implementation of the project does not build the local ownership and infrastructure capacity required for the institutionalization of interventions (see Lerner, 1995). A second and far less prominent model was referred to as outreach research, a model that has been rarely and poorly funded relative to the efficacy research model. In contrast to theory-driven clinic or lab-based efficacy research and effectiveness research as an extension of this process (Schoenwald & Hoagwood, 2001), outreach research adopts an alternative perspective and starting point. Outreach research uses a bottom-up rather than a top-down approach to developing intervention strategies (Kurtines & Silverman, 1999; Silverman & Kurtines, 1997). Outreach research emerges out of and remains in community or usual-care practice settings because it is rooted in local and particular needs in real-world community settings. Because such intervention strategies initially emerge to meet local and particular needs and are implemented and evaluated in real-world community settings with respect to their capacity to do so from the beginning, effectiveness is built into intervention strategies developed under this model. Therefore, there is no need to address issues of transportability, dissemination, and implementation, because such approaches have never been in a clinic or lab from which they have to be transported. Another distinct advantage of a community-supported outreach research program is that this long-term commitment also creates the potential for addressing (in ways that are typically more difficult and costly to address by short-term, externally funded projects) long-term, research-related-knowledge development goals for the field.
Because of its long-term community commitment, an important advantage of outreach research is its use of both short-term designs (i.e., randomized clinical trials or quasi-experimental) and long-term designs (i.e., multistage longitudinal and comparative) in evaluating long-term, community-supported outreach programs for both internal and external validity. Moreover, with respect to external validity, “long-term” designs are useful in evaluating not only the “efficacy” of a DIS program’s specific intervention strategies but also the “effectiveness” of its service delivery program (Arnett, Kurtines, & Montgomery, 2008, this issue). We do not suggest outreach effectiveness research as a replacement for efficacy research. On the contrary, it should be employed in addition to efficacy research. Outreach research is viewed as an approach to be employed in ways that are contingently and contextually complementary to efficacy research. A view of efficacy and outreach research as complementary is consistent with this tradition and suggests that a researcher’s choice of intervention development strategy (e.g., efficacy research or outreach research or mixed efficacy/outreach research) is (or should be) contingent upon factors relevant to the research issue in question, that is, type of problem (narrowband vs. broadband), type of intervention (treatment, prevention, positive development), type of outcome (short-term, long-term), the type of population (child, youth, adult, elderly), and/or the level of implementation (public sector vs. private sector).
In addressing a relatively narrowband and short-term problem (e.g., symptom/risk reduction), for example, a researcher might choose to initially develop and refine an approach under relatively controlled settings in a university clinic or laboratory setting and subsequently extend that approach to usual-care practice in the community (i.e., efficacy/effectiveness research as defined in the current literature). On the other hand, in addressing a relatively broadband and long-term problem in the public sector (changing negative life-course trajectories in troubled youth in multicultural urban communities undergoing social transition), a researcher might choose to first develop and refine an approach under relatively uncontrolled settings in real time in a real-world community setting, then establish its basic utility and validity under those conditions, and finally conduct a long-term outcome evaluation of the program itself as well as the program participants, that is, outreach research as defined in the emerging literature. As described here, we adopted this approach in our work. As noted previously, there has been a differential pattern of nationally funded research that targets relatively narrowband and short-term problems (e.g., symptom/risk reduction) over the past decades, with a strong tendency to focus on developing intervention approaches under relatively controlled settings and subsequently seeking to extend them to usual-care practice in the community (i.e., efficacy/effectiveness research). A concomitant pattern of disappointing outcome results and little evidence for the sustainability of interventions developed under this approach in usual-care practice in community settings prompted a call for an expansion of approaches. Jensen et al. (1999) made this point specifically concerning the need for outreach research.
They noted that, with respect to the question of how many youth intervention programs have successfully addressed issues of feasibility, sustainability, affordability, and so forth, the current answer is “very few (if any), indeed.” To change this to “many, if not most,” Jensen et al. acknowledged that new approaches are needed and that more effective partnerships need to be created between universities and communities. When it comes to research that is pertinent to the promotion of youth development, Jensen et al. also believe that there must be a qualitative change in the way universities interact with communities (cf. Eccles, 1996; McHale & Lerner, 1996). Jensen et al. (1999) acknowledged that university-community partnerships should be based on research-related principles that maximize internal validity. However, they propose that to be effective, such university-community collaborations should also be based on research-related principles that (1) enhance the focus on external validity and on the pertinence of research to the actual ecology of human development (Bronfenbrenner, 1979; Hultsch & Hickey, 1978); (2) incorporate the values and needs of community collaborators within research activities (Kellogg Commission on the Future of State and Land-Grant Colleges, 1999; Spanier, 1999); (3) utilize a full conceptualization and assessment of outcomes, that is, a commitment to understanding thoroughly both the direct and the indirect effects of an intervention on youth and their context and to measuring these outcomes; (4) display a willingness to make modifications to research methods in order to fit the circumstances of the local community (Weiss & Greene, 1992); and (5) embrace a long-term perspective, that is, the commitment of the university and its programs to remain in the community for a time period sufficient to see the realization of community-valued developmental goals for its youth. To this, Lerner et al. (2000) add that the principles of “best practice” articulated by Jensen et al. (1999) may be merged with or, perhaps better, built upon those discussed by Eccles (1996) and McHale and Lerner (1996). These principles include colearning (between two expert systems–the community and the university); humility on the part of the university and its faculty so that true colearning and collaboration among equals can occur; and cultural integration, so the university and community can recognize and appreciate each other’s perspective. It has been proposed (Lerner et al., 2000) that through the conduct of research consistent with the outreach frame described by Jensen et al. (1999), the blurring of the distinctions between science and practice in developmental science will be facilitated. Moreover, such scholarship will provide needed vitality for future progress in the field of human development and, according to Lerner et al., for the very viability of the academy (Eccles, 1996).
Authors’ Note
This journal’s special issue describes work conducted as part of the Miami Youth Development Project, Child and Family Psychosocial Research Center, Department of Psychology, Florida International University, Miami, FL 33199. Any comments or suggestions regarding the material should be directed to William M. Kurtines at the above address.
References
Adelman, H. S., & Taylor, L. (2003). On sustainability of project innovations as systemic change. Journal of Educational & Psychological Consultation, 14(1), 1–25.
Arnett, J. J. (2000). Emerging adulthood: A theory of development from the late teens through the twenties. American Psychologist, 55, 469–480.
Arnett, J. J., Kurtines, W. M., & Montgomery, M. J. (Eds.). (2008, this issue). Promoting positive youth development [Special issue]. Journal of Adolescent Research, 23.
Bronfenbrenner, U. (1979). Contexts of child rearing: Problems and prospects. American Psychologist, 34(10), 844–850.
Catalano, R. F., Berglund, M. L., Ryan, J. A. M., Lonczak, H., & Hawkins, J. D. (1999). Positive youth development in the United States: Research findings on evaluations of positive youth development programs. Washington, DC: U.S. Department of Health & Human Services.
Damon, W. (2004). What is positive youth development? Annals of the American Academy of Political & Social Science, 591, 13–24.
Eccles, J. S. (1996). The power and difficulty of university-community collaboration. Journal of Research on Adolescence, 6, 81–86.
Ferrer-Wreder, L., Stattin, H., Lorente, C. C., Tubman, J., & Adamson, L. (2004). Prevention and youth development programs: Across borders. New York: Kluwer Academic/Plenum.
Holmbeck, G. N. (2002). A developmental perspective on adolescent health and illness: An introduction to the special issues. Journal of Pediatric Psychology, 27, 409–416.
Hultsch, D. F., & Hickey, T. (1978). External validity in the study of human development: Theoretical and methodological issues. Human Development, 21, 76–91.
Jensen, P., Hoagwood, K., & Trickett, E. (1999). Ivory towers or earthen trenches? Community collaborations to foster “real world” research. Applied Developmental Science, 3(4), 206–212.
Kazdin, A. E., & Weisz, J. R. (1998). Identifying and developing empirically supported child and adolescent treatments. Journal of Consulting and Clinical Psychology, 66(1), 19–36.
Kellogg Commission on the Future of State and Land-Grant Colleges. (1999). Returning to our roots: The engaged institution. Washington, DC: National Association of State Universities and Land-Grant Colleges.
Keys, S. G., Bemak, F., & Lockhart, E. J. (1998). Transforming school counseling to serve the mental health needs of at-risk youth. Journal of Counseling and Development, 76, 381–388.
Kurtines, W. M., & Silverman, W. K. (1999). Emerging views of the role of theory. Journal of Clinical Child Psychology, 28, 558–562.
Lerner, R. M. (1995). The place of learning within the human development system: A developmental contextual perspective. Human Development, 38(6), 361–366.
Lerner, R. M. (2005, September). Promoting positive youth development: Theoretical and empirical bases. White paper: Workshop on the Science of Adolescent Health & Development, NRC/Institute of Medicine. Washington, DC: National Academies of Science.
Lerner, R. M., Fisher, C. B., & Weinberg, R. A. (2000). Toward a science for and of the people: Promoting civil society through the application of developmental science. Child Development, 71(1), 11–20.
Masten, A. S., & Coatsworth, D. J. (1998). The development of competence in favorable and unfavorable environments: Lessons from research on successful children. American Psychologist, 53(2), 205–220.
McHale, S. M., & Lerner, R. M. (1996). University-community collaborations on behalf of youth. Journal of Research on Adolescence, 6, 1–7.
Ollendick, T. H., King, N. J., & Chorpita, B. F. (2006). Empirically supported treatments for children and adolescents. In P. C. Kendall (Ed.), Child and adolescent therapy: Cognitive-behavioral procedures (3rd ed., pp. 492–520). New York: Guilford.
Roth, J., Brooks-Gunn, J., Murray, L., & Foster, W. (1998). Promoting healthy adolescents: Synthesis of youth development program evaluations. Journal of Research on Adolescence, 8, 423–459.
Roth, J. L., & Brooks-Gunn, J. (2003). What exactly is a youth development program? Answers from research and practice. Applied Developmental Science, 7, 94–111.
Schoenwald, S. K., & Hoagwood, K. (2001). Effectiveness, transportability, and dissemination of interventions: What matters when? Psychiatric Services, 52, 1190–1197.
Seligman, M. E. P., Steen, T. A., Park, N., & Peterson, C. (2005). Positive psychology progress: Empirical validation of interventions. American Psychologist, 60(5), 410–421.
Silverman, W. K., & Kurtines, W. M. (1997). Theory in child psychosocial treatment research: Have it or had it? A pragmatic alternative. Journal of Abnormal Child Psychology, 25, 82–94.
Spanier, G. B. (1999). Enhancing the quality of life: A model for the 21st century land-grant university. Applied Developmental Science, 3(4), 199–205.
Spoth, R., Greenberg, M., Bierman, K., & Redmond, C. (2004). PROSPER community-university partnership model for public education systems: Capacity-building for evidence-based, competence-building prevention. Prevention Science, 5(1), 31–39.
Weiss, H. B., & Greene, J. C. (1992). An empowerment partnership for family support and education programs and evaluations. Family Science Review, 5, 131–148.
Weisz, J. R., Donenberg, G. R., Han, S. S., & Kauneckis, D. (1995). Child and adolescent psychotherapy outcomes in experiments versus clinics: Why the disparity? Journal of Abnormal Child Psychology, 23(1), 83–106.
Weisz, J. R., & Hawley, K. M. (2002). Developmental factors in the treatment of adolescents. Journal of Consulting & Clinical Psychology, 70(1), 21–43.
11 Children Have More Need of Models than Critics: Early Language Experience and Brain Development
Travis Thompson
Psychologists make easy targets for parody. The satirical paper, “The etiology and treatment of childhood” (Ellenbogen, 1986, pp. 5–12) described “childhood” as clinical psychologists might view the condition psychiatrically, including a list of such diagnostic features as congenital onset, dwarfism, emotional lability and immaturity, knowledge deficits and legume anorexia. The paper suggested the prognosis for those suffering from childhood was generally poor. But the article cited a fictitious paper by Moe, Larrie and Kirly (1974; pp. 9–10) which purported to shed new light on long-term outcome for those afflicted with childhood. Moe et al. studied a group of 42 children longitudinally but provided no treatment for their alleged affliction. The follow-up findings “were startling … Moe et al. (1974) found all subjects improved uniformly on all measures … (with) a spontaneous remission rate of 95%, a finding which is certain to revolutionalize the clinical approach to childhood” (p. 12). The obvious implication was that most children grow up just fine without much help. There is no need to see pathology where none exists nor propose that ordinary children need any special help beyond what Nature provides. If children are simply left to explore and learn, while occasionally being sprinkled with parental love, they will develop into competent young adults; at least that’s what the satirist would have the reader conclude. Most children do seem to develop normally in a range of surroundings, but it turns out that the characteristics of those circumstances aren’t at all obvious.
Source: Journal of Early Intervention, 19(3) (1995): 264–272.
Child rearing is not a form of horticulture, and a parent’s role in shaping a child’s intellectual growth isn’t like sprinkling water on pansies. Many children growing up in poverty in America turn out to be anything but typical. They are predisposed to all manner of difficulties, including problems learning to read and write, even though their IQs may be within a normal range. Moe, Larrie, and Kirly notwithstanding, the circumstances of early childhood really do matter to children’s developmental outcomes, as Betty Hart and Todd Risley powerfully demonstrate in Meaningful Differences. Hart and Risley concur that most children living in “ordinary families” actually do develop surprisingly well, but they uncovered startling differences in the cognitive development of children growing up in poor and affluent families. Like the fictitious Moe et al., Hart and Risley also studied 42 children and their families longitudinally. But these were real Midwestern families, not figments of a creative imagination. Though they did not intentionally intervene, there is no way to spend hours in people’s homes, month in and month out over two and one-half years, tape recording and writing notes about children’s interactions with their parents, without having some effect. But it is the only way to obtain the kind of data Hart and Risley secured, which is an unprecedented gold mine of information about children’s early language development. The Hart and Risley book may very well change our thinking about how we arrange early experiences for our children, if not revolutionize our approach to childhood. It should be required reading for anyone seriously involved in early education and intervention, as well as for policy makers.
Hart and Risley are unapologetic, unreconstructed social reformers whose interest in differences in children’s language development stems from work during the 1960s under the auspices of the War on Poverty. 
Their work is consistent with the view expressed by Frantz Fanon that “Mastery of language affords remarkable power” (Fanon, 1967). Hart, Risley and colleagues taught poor preschool-age children early academic skills so that their skills would be similar to those of children from affluent, white, upper-middle-class families as they entered kindergarten. They reasoned that once the poor youngsters had caught up with their more affluent peers, they would continue on that trajectory in elementary school. But like other academics who had wandered into the fray of the War on Poverty, they found the “booster shot” approach to early education had evanescent effects. Though Hart and Risley’s interventions, as well as those of Susan Gray and Rupert Klaus, Carl Bereiter and Siegfried Engelmann, and others, all produced rapid improvements in language and cognitive performance, those academic gains were temporary (Brottman, 1968). By 7 years of age, children with and without the earlier language interventions were functioning similarly academically. The early intervention academic effects had largely washed out. These findings seriously undermined confidence in early intervention as a strategy for reversing problems sustaining intergenerational poverty. Worst of all, the long-term outcomes of these first early intervention studies appeared to lend scientific legitimacy to nativist arguments
for abandoning supplementary early education for poor children, especially racial and ethnic minorities (see Jensen, 1969; Herrnstein & Murray, 1994). Hart and Risley remained convinced they were missing something very basic about the effects of early experience.
In Meaningful Differences Hart and Risley tell the story of how they ferreted out the reasons for the limited success of these previous early intervention efforts, and provide guidance as to what needs to be done in the future to capitalize on what they have learned. It is a fascinating tale. They begin by posing a painful question for dedicated behavioral and social scientist activists such as themselves. “Was the widespread belief in the 1960’s that pressing social problems could be overcome by improving the everyday experiences of America’s children fundamentally misguided?” After studying the first three years of life of 42 Midwestern children, and following them into elementary school, they conclude that the answer is clearly “no,” it was not misguided. But it turns out that the nature of those experiences, and exactly when those experiences are provided, may make all the difference in the world between lasting and ephemeral effects of early educational experiences. They have structured their exposition like a whodunit, a series of research forays, blind alleys, and renewed efforts to discover whether a child’s everyday experiences really can have lasting effects on their cognitive development. It is an interestingly written book which keeps the reader on the edge of their chair.
Meaningful Differences reports an investigation of the development of 42 children’s language from 6 months to 3 years of age and how that language acquisition appears to be related to what the children heard and saw at home, especially how the children’s language was related to their parents’ language. 
Hart and Risley’s staff made monthly one-hour home observations and collected audiotape-recorded samples of child-adult interactions from 6 months to 3 years of age. Later, they coded the interactions among children and parents and analyzed the complex array of interactions using sophisticated computer programs. Thirteen of the families were professionals, 23 were working class, and 6 families were supported by public welfare. The six poorest families were relatively intact and functional and were not typical of the most extreme poverty in America.
There were some very encouraging findings. Hart and Risley found that poor, middle class, and more affluent parents alike were all deeply invested in their children’s development. Not only were all the parents dedicated, and not only did all of them make sacrifices for their children, but “our surprise was with how naturally skillful all the parents were and the regularity with which we saw optimum conditions for language learning” (p. 55) in the homes of all of these ordinary families. Despite major social and economic differences among the families studied, all of the children developed language and functioned reasonably competently by three years of age. Hart and Risley ended up “feeling above all a profound pride in American families and the promise of the future” (p. 42). Hart and Risley’s observations suggest that the impression we receive from
television and print tabloids may understate the commitment of the vast majority of poor and affluent American families, alike, to their children. Abuse and neglect are serious problems in America, to be sure, but they may not be as ubiquitous as some suggest. The poorest families in the Hart and Risley study seriously lacked resources, but they were caring, committed families that did what had to be done for their children despite profound disadvantages. In fact, despite their lack of material resources they were in many ways remarkable for their ordinariness.
But the essence and importance of this book is in the major differences in the poor and affluent children’s language growth and how Hart and Risley believe those differences were related to the way their parents talked to and with them. The differences appear to arise from more subtle sources and, in the long run, may pose more hypotheses for further investigation than close a final chapter on the investigation of early experience.
The differences observed between affluent and poor children’s language development are enormous. By age 3, the affluent children’s vocabulary is twice that of the poor children and the gap is widening. The associated differences in the way their parents interact with them are equally staggering. The long-term implications could be profoundly important for America’s children. Hart and Risley found that, from the outset, parents of affluent children talk more to their children during an average day than the parents of poor children … far more. Affluent parents use five times the number of nouns and six times the number of modifiers in their child-directed speech per hour than poor parents. These were not artifacts of including a few highly talkative parents in one group or a few very reticent parents in the other. The least talkative affluent parents directed more words per hour toward their children than the most talkative poor parents. 
Measures of the richness of language interactions were even more highly correlated with rates of vocabulary growth and with the children’s accomplishments at age 3 than the amount of these language features the children experienced. Several features of parental language were negatively associated with children’s rate of language development: the greater the net amount of parental initiations (“What are you doing?”), imperatives (“Put your shoes away”), and prohibitions (“Stop that!”), the less rapid the children’s vocabulary growth. Vocabulary growth and use were highly correlated with socioeconomic status (.63–.65) and, by age 3, the children’s Stanford-Binet IQ scores were also positively correlated with parental SES (.54).
It has been said, “Children have more need of models than of critics” (Joubert, 1816/1989), which is made abundantly evident in Hart and Risley’s findings. When 2-year-old Celia points to the image of a young child on the television screen and says, “Bee bee,” her affluent, college-educated mother replies “Yes, that’s a baby.” But only once in seven times that Jodell reaches for something she wants, like the container of orange juice on the kitchen table, and says, “Jis,” does her communicative attempt produce positive results.
More likely, her mother, who dropped out of high school and is subsisting on public welfare, will say, “Hush up! Stop grabbing!” The implications of the differences Hart and Risley found in how often affluent and poor parents encourage or make negative statements following their children’s attempts to communicate are astounding. Extending their findings from the first 3 years of life to the 4th year of life, when they had begun working with the poor children in their earlier study, Hart and Risley project that an average affluent child is exposed to seven times as many encouraging consequences (positive reinforcement) as their poor peers. Conversely, poor children receive between two and three times the number of prohibitions (punishers) following their language attempts. If children born into poor and affluent families were identical at birth, from what we know about the Matching Law (Herrnstein, 1970), these differences alone in affirmations and prohibitions of children’s utterances would be expected to produce enormous differences in these children’s language development. Such gross discrepancies in early positive and negative parent-child interactions may do far more than increase or decrease the rate of the children’s language development. Hart and Risley (1992) have referred to this as a “toxic function” (p. 1103) of such disproportionate administration of prohibitions, which could affect the way children come to think of themselves.
But what has all this to do with the fascinating question Hart and Risley set out to answer … “Why is it that the effects of 1960s early interventions were temporary?” What does this tell us about the mechanisms by which symbolic social experiences during the first 3 years of life are translated into lasting behavioral and cognitive changes? When a baby is born, its cerebral cortex is an associative tabula rasa. 
Most synapses between adjacent neurons in the brain’s cerebral cortex develop as a consequence of experience over the first several years of life (Huttenlocher & de Courten, 1987; Huttenlocher, de Courten, Garey, & VanderLoos, 1982). There is currently no algorithm that translates repetitions of “Noun-Verb-Object” into the development of synapses, but it may be only a matter of time until correlations in the language areas of cortex are as compelling as the correlations between increased visual experience and increases in synapse per neuron ratios in animal visual cortex (Greenough, 1986; Greenough, Black, & Wallace, 1987). Those 45 million words to which the affluent child is exposed by the time it is 4 years old must produce a greater impact on the microstructure of cortical circuitry than the 13 million words that the average child born into a welfare family experiences. If we assume, as a clear history of laboratory animal research literature indicates (Diamond, 1988), that the amount of early experience translates directly into amount of brain weight, glial cell to nerve cell ratio, amount of dendritic branching, and level of synaptogenesis, this would seem to be a logical conclusion. Once the hard wiring is in place, the affluent child’s brain circuitry is poised to frolic through the fascinating escapades of Curious George and romp
through the rhymes and alliterations in Green Eggs and Ham. The child growing up in a poor family may end up with 50–70% fewer synaptic connections to be programmed, which could make Brown Bear, Brown Bear hard going, while the average middle class 4-year-old easily understands and enjoys this old standard. It is important to remember that the rate of synaptogenesis drops off sharply after 4 or 5 years of age, which means that trying to play linguistic catch-up in elementary school wouldn’t work very well. Once the die has been cast, it becomes far more difficult to make lasting changes.
Since children’s vocabulary growth is highly related to the rate and quality of their parents’ language interactions with them, Hart and Risley would like the reader to conclude that we should be able to increase the language (and therefore cognitive) ability of poor children by improving their language exposure during the first three years of life. The title of Chapter 9, “Intervention to Equalize Early Experience,” says it all. If we equalize language experience, they imply, we can expect to equalize children’s vocabulary and presumably, more broadly, their cognitive ability. Although Hart and Risley seem to realize that one cannot logically draw such a conclusion from the correlational results of their study, they allow themselves and the reader to do so nonetheless. The methodological problem is very basic. The parents and children in their study share 50% of the same genes, which inextricably confounds parental vocabulary (and intellectual ability) with family socioeconomic status. Perhaps the lower SES children performed less well because the genes received from their parents predisposed them to find it more difficult to acquire language, or perhaps they acquired fewer language skills because they were exposed to a lower rate and richness of language from their parents, or very likely both. 
Scarr (1992) graphically expressed the problem: “Parents who read well and who like to read will be likely to subscribe to magazines and papers, buy and borrow books, take books from the local library and read to the child. Parents who have reading problems are less likely to expose themselves to this world of literacy, so that their children are more likely to be reared in a less literate environment. Those same children are more likely to have reading problems themselves…” (p. 9). But there is much more to gene-environment relationships than that. Scarr and McCartney (1983) proposed that a child’s genetic makeup shapes what they experience, even though the opportunities provided by different environments may be similar. “The toddler who has ‘caught on’ to the idea that things have names and who demands the names for everything is experiencing a fundamentally different verbal environment from what she experienced before, even though her parents talked to her extensively in infancy” (Scarr and McCartney, 1983; p. 425). In comparing the children in Hart and Risley’s welfare and affluent families, the toddlers in the affluent families had caught on to all kinds of language functions, whereas the kids growing up in poor families had far less understanding of how language operated. Scarr and McCartney would argue that much of the reason they caught on so early
is that they are genetically predisposed to acquire those skills. Once they began to use language, their social environment would abruptly change in response to that demonstrated skill. In a study of school-age children, Garmezy, Masten, and Tellegen (1984) found that youngsters from disadvantaged families who were more intelligent and “spunky” were more likely to be given positive attention and encouragement by teachers than less intelligent and less “spunky” children, which would be consistent with Hart and Risley’s findings concerning the frequency of positive consequences for poor children’s and affluent children’s verbal attempts. The more precocious children of affluent families received seven times more positive consequences from their parents than some of the less capable children in the poor families received from their parents. In short, the children are not passive receptacles of language experience. It is likely their responsiveness plays a role in shaping the experience made available to them by their parents, and part of that responsiveness may be contributed genetically.
Hart and Risley restate their goal plainly near the end of the book: “Our goal … was to discover what was happening in the children’s early experience that could account for the intractable differences in rates of vocabulary growth we saw among 4 year olds” (in their earlier intervention studies). The goal clearly assumes that what was happening in the children’s lives, and not what the children brought to those experiences, was responsible for the outcome. Their main thesis is that intergenerational socioeconomic status determines the frequency and the quality of verbal interactions between parents and their young children, which, in turn, determines children’s vocabulary growth. Scarr (1992) has argued that different parents and different parenting styles have very little effect on children’s intelligence and presumably language skills. 
However, her conclusion is based on the assumption that most family environments are well within a “typical range” – presumably typical in an abstract phylogenetic sense. But since the days of Francis Galton (1876) it has been recognized that these relationships hold only “when the differences of nurture do not exceed what is commonly to be found among persons of the same rank of society and in the same country” (Bouchard, in press, p. 576). In the Hart and Risley study, differences as great as 700% were found between affluent and poor children’s language environments, which would appear to be well beyond any definition of “typical range” of variation within American culture.
While we must be cautious in drawing causal conclusions from correlational data such as those provided in Meaningful Differences, as Thoreau (1854) aptly pointed out, “Some circumstantial evidence is very strong, as when one finds a trout in the milk.” Hart and Risley have come upon a veritable Moby Dick. As a matter of public policy it makes sense to focus our attention on that portion of the malleable variance in language and corresponding cognitive outcome that can be influenced through early experience, on the assumption that Galton was right, that we ought not attempt to extrapolate across vastly
different social classes and cultural gulfs. We would be wise to assume that Gottesman’s (1963) reaction range concept is operative and that it should guide our early education planning. Gottesman argued that there were vast individual differences in the interaction between an individual’s ability level and susceptibility to influence by the environment. For most children, the greatest gains in language functioning appear to be attainable early in life, with the magnitude of change decreasing chronologically. Indeed, Hart and Risley’s findings make a compelling case for the value of early intervention at the extremes of the socioeconomic and intellectual distribution, which is consistent with the conclusions of such behavior geneticists as Robert Plomin (1990) and Thomas Bouchard (1995), as well as others in the field of early intervention (e.g., Gallagher and Ramey, 1987).
Hart and Risley have written an exceptionally readable scientific treatment of a socially sensitive topic. In doing so, they have turned a fine phrase: “Like most wars, the War on Poverty was more successful in destroying the past than creating the future … ,” “The intervention programs of the War on Poverty … were modeled on the booster shot. It was assumed that a concentrated dose of mainstream culture would be enough to raise intellectual performance and lead to success in mainstream schools,” and, “What kept us going was the continual springtime humanity displays in its children.” This is strong, evocative writing for which Hart and Risley deserve high praise. The book also walks a challenging line. It is intended primarily for a broad audience of college-educated readers who are interested in children’s early development as ordinary citizens … as parents, teachers, social workers, nurses, doctors, and public policy makers. As a result, technical jargon is minimized. 
On the other hand, Hart and Risley strive to maintain credibility with their colleagues who are expert in child language and early intervention research. Had they not provided details concerning methods and procedures, their academic colleagues might have taken them to task. As a result, some sections of the book will be heavy going for readers outside of language research, special education, communication, linguistics, and psychology. The less technically trained reader may wish to skim Chapter 5 on “Quality Features of Language and Interaction,” which deals with the observational language codes, when and how they were sampled, reliability measures, derived language measures, and so on. It might have been more effective if Hart and Risley had directed this book entirely to an audience of nonspecialists and addressed their academic colleagues in subsequent volumes.
Why has it taken so long to come to grips with the fleeting nature of the academic outcomes of previous early intervention efforts? In truth, we simply didn’t want to acknowledge the possibility that early intervention might not work. The spate of articles and books based on genetic arguments not only rubbed salt into the early intervention wound, but also lent comfort to those who, for sociopolitical reasons, sought to legitimize racial and ethnic discrimination. This chain of events had a profoundly unsettling effect on the
scientific community and goaded Hart and Risley into action. Few scientists, including behavior geneticists, accepted the premise that early experience was irrelevant to cognitive development, but the research that was necessary to set the record straight remained to be done. In the absence of evidence, most academics sat on their hands waiting for the definitive research to be done … by someone else. The requisite studies were extremely time consuming, labor intensive, and not very sexy work. It meant slogging away, day in, day out, meticulously recording and analyzing the interactions between very young children and their parents. In an age when research funding decisions increasingly resemble a Miss America contest or a demolition derby, procuring financial support for Hart and Risley’s work was no easy task.
Finally, the combination of qualities of the researchers that were necessary to engage these issues effectively was rare. It required theoretical background in linguistics and developmental psychology as well as behavior analytic research strategies and theory – an extremely improbable combination. Betty Hart and Todd Risley had the theoretical background, intellectual tough-mindedness, methodological rigor, and the strength of their convictions necessary to get the job done. Beneath their warm fuzzy facades, they were committed to the fundamental correctness of their basic scientific premise that very early experience shapes the way a child develops an epistemology … comes to know about the world around them. And they were equally committed to using rigorous scientific methods in order to obtain the necessary evidence. 
The evidence presented in Meaningful Differences suggests that what a child knows now and in the future depends in substantial measure on how her or his parents talk with them during their first three years of life, and that, as James Baldwin remarked, “Children have never been very good at listening to their elders, but they have never failed to imitate them” (Baldwin, 1961).
References
Baldwin, J. (1961). Nobody knows my name. New York: Dial.
Bouchard, T. J. (In press). IQ similarity in twins reared apart: Finding and responses to critics. In R. Sternberg & C. Grigorenko (Eds.), Intelligence: Heredity and environment. New York: Cambridge University Press.
Brottman, M. A. (Ed.). (1968). Language remediation for the disadvantaged child. Monographs of the Society for Research in Child Development, 33 [8, Serial No. 124].
Diamond, M. C. (1988). Enriching heredity: The impact of the environment on the anatomy of the brain. New York: Free Press.
Fanon, F. (1967). Black skin, white masks. New York: Grove Press.
Gallagher, J. J., & Ramey, C. T. (Eds.). (1987). The malleability of children. Baltimore: P. H. Brookes Pub. Co.
Garmezy, N., Masten, A., & Tellegen, A. (1984). The study of stress and competence in children: A building block for developmental psychopathology. Child Development, 55, 97–111.
Gottesman, I. I. (1963). Genetic aspects of intelligent behavior. In N. R. Ellis (Ed.), Handbook of mental deficiency (pp. 253–296). New York: McGraw-Hill.
Greenough, W. T. (1986). What’s special about development? Thoughts on the bases of experience-sensitive synaptic plasticity. In W. T. Greenough & J. M. Juraska (Eds.), Developmental neuropsychobiology (pp. 387–405). New York: Academic Press.
Greenough, W. T., Black, J. E., & Wallace, C. S. (1987). Experience and brain development. Child Development, 58, 539–559.
Hart, B., & Risley, T. R. (1992). American parenting of language-learning children: Persisting differences in family-child interactions observed in natural home environments. Developmental Psychology, 28, 1096–1105.
Herrnstein, R. J., & Murray, C. (1994). The bell curve: Intelligence and class structure in American life. New York: Free Press.
Huttenlocher, P. R., & de Courten, C. (1987). The development of synapses in striate cortex of man. Human Neurobiology, 6, 1–9.
Huttenlocher, P. R., de Courten, C., Garey, L. J., & VanderLoos, H. (1982). Synaptogenesis in the human visual cortex: Evidence for synapse elimination during normal development. Neuroscience Letters, 33, 247–252.
Jensen, A. R. (1969). How much can we boost IQ and scholastic achievement? Harvard Educational Review, 39, 1–123.
Joubert, J. (1989). Pensées (No. 261), jugements et notations; Anthologie critique établie par Rémy Tessonneau. Paris: J. Corti. (Original work published 1816).
Martin, B., & Carle, E. (1983). Brown bear, brown bear, what do you see? New York: Holt, Rinehart and Winston Publishing Co.
Plomin, R. (1994). The nature of nurture: Family environment. In R. Plomin (Ed.), Genetics and experience: The interplay between nature and nurture (pp. 104–148). Beverly Hills: Sage.
Rey, H. A. (1969). Curious George. Boston: Houghton Mifflin Publishing Company.
Scarr, S. (1992). Developmental theories for the 1990’s: Development and individual differences. Child Development, 63, 1–19.
Scarr, S., & McCartney, K. (1983). How people make their own environments: A theory of genotype-environment effects. Child Development, 54, 424–435.
Seuss, Dr. (1960). Green eggs and ham. New York: Beginner Books.
Smoller, J. W. (1986). The etiology and treatment of childhood (pp. 5–12). In G. C. Ellenbogen (Ed.), Oral sadism and the vegetarian personality. New York: Brunner/Mazel Publishers.
Thoreau, H. D. (1991). Journal (November 11, 1854). Selections, 1st edition. New York: Paragon House.
12
Development: Transfer of Technology, Transfer of Culture
Jacques Binet (Translated by Jeanne Ferguson)
Lately, the issues of “transfer of technology” seem to have become fashionable. However, they cannot be considered at length until those of DEVELOPMENT are clarified: transfer of technology is a means, development is an end, and, if we are not careful, we risk – in all good faith – being carried away by the example of the development and techniques of the “Northern countries,” while the needs and possibilities of the “South” may be quite different. Efforts toward development have been essentially centered on the economy: development of production, development of consumption and development of wealth, all of which conforms to the tendencies of our age and its predominant ideas. Marxism gives priority to wealth and its distribution, while “capitalism” is concerned mainly with economics. Ingenuously or on purpose, development neglects all that is connected with psychology, moral codes, metaphysics or sociology. All efforts are directed toward wealth and the acquisition of objects, an extremely reductive and caricatural view of man and his aspirations. Giving complete attention to economy is justifiable when material conditions impose it: the hungry man must be fed before we can speak to him on any subject. For the most part, we have not arrived at that extreme. Food rations may be meager and not balanced, but people are not dying of hunger. We must reflect before taking urgent measures, because generosity in giving
Source: Diogenes, 32 (1984): 19–38.
may conceal adverse effects. For example, milk is sent during a famine: will mothers become accustomed to nourishing their children by other means than breast-feeding? Everywhere, serious efforts have been made toward schooling. Minds are broadened through learning. Schools have certainly developed an aptitude for rational thought, but their effectiveness on other human faculties is limited: it is above all the family that must educate, reveal affective values and develop social or moral life. Now, the development of the school leads to a weakening of the role of the family: there is less available time, less prestige. Finally, when the agents of development envisage man, they consider the individual rather than the groups of which he is a part. These groups are multiple, going from the family – which at times is very large – to the village, to age groups or initiating associations. In any human society there is a delicate balance to be maintained between the individual and the community. History shows that by concentrating their attention on different values other periods or milieus realized different kinds of development. In the Middle Ages religion was an essential value, and the monastery materialized this social ideal. Land was the basis of power; feudal rights and the bonds of vassalage were its manifestations. From the 16th to the 18th century power was in the hands of the nobility. In the 19th and early 20th centuries power was money. Today, according to some sociologists, the era of capital is over and that of technostructures and organizers is on the march. Defining the objectives of development is completely justifiable, but even when a primary role is given to economy, several ways are open. The West has advanced through liberal capitalism, but Russia or China have adopted a different kind of social and economic organization. 
Finally, according to the times, technical orientation changes: mechanical, physical and chemical procedures have successively made up the panoply of industry. Computers or robots may perhaps be the pivotal instruments of tomorrow, unless biology proves to be the key to the engineering of the coming century, as it was at the beginning of the Neolithic revolution that invented agriculture, animal husbandry, basket-making or pottery. The paths of the future are not entirely marked out: all kinds of solutions are conceivable. Certain needs are essential: the human organism needs a certain amount of protein but it may be furnished by foie gras as well as by smoked fish. Almost everywhere in underdeveloped countries two economic channels exist side by side. One, supplied by local agricultural and artisanal production, furnishes requirements at moderate prices (traditional housing, clothing from cotton that is locally spun and woven, earthenware pots); the other, supplied by imports or industry, satisfies at a higher price analogous needs that are enhanced by the prestige of being European. For several years attention has been focused on the transfer of technologies, and conferences elicit the ceding of patents, as if these patented and
later appropriated discoveries and techniques were the “secret” of an evolution. In fact, there is probably more than one way to evolve. Other procedures, other objectives, other ways of thinking could perhaps improve living conditions. The “secrets of manufacture” are as fascinating as a myth: the secret and the esoteric are at the heart of all magical thought; their affective echoes are deep within us. In reality, technologies that have become public domain or are not protected by a patent would permit enormous transformations in Black Africa. Examining various sectors of human life – economic, political or psychological – we see that evolution, following the direction indicated by 20th-century Europe, is not always desired by, or desirable for, Africans. In fact, technology is not usually a neutral element in the cultural mosaic: it is linked to a system of laws, to a conception of man. The introduction of tools, of scientific knowledge and a way of dividing wealth may make the civilization that receives them obsolete. The greatest prudence is thus imperative.
Agriculture
At present, improvements in agriculture do not require a complex technology, but innovations must be accepted by the people without uneasiness or regret. The African, who in most cases has a quasi-religious respect for his ancestors, is sometimes bewildered by the idea of doing things differently from them. For him, the nourishing Earth is often an almost divine power with which he has filial ties. Good harvests are granted by the earth when they are earned through sacrifices or prayer: they do not occur through technique, there is always uncertainty. To consider the earth and the elements as things to be commanded or manipulated at one’s will would certainly appear strange and blasphemous to any man brought up in an agricultural tradition. The Neolithic revolution brought to his unconscious mind sentiments of respect and love for the goddess of fertility and a fatalistic submission to her omnipotence. To go from this attitude of devotion to an attitude of conquest would be a difficult step to take. The methods of modern agriculture are solidly and clearly based on intelligent reasoning, whereas the rites of traditional agriculture came from the distant past. Mysterious and fragmentary, they did not form a coherent doctrine; there was a place left for poetic imagination. By repeating the actions of his ancestors, man felt close to the supernatural powers. Agriculture was a ritual; modernity makes it a secularized technique. Some African intellectuals revolt against the science that they claim will bring about a “cultural genocide” by substituting a rational way of thought and action, efficient but prosaic, for an action that makes man an interlocutor of the gods.
Many, in fact, do not want progress in agrarian techniques and ways of life. For urban dwellers, especially those who suffer the restrictions and overwork of modern life, the “bush” is a sort of paradise lost, a refuge. Our epoch is quick to accuse imperialism and exploitation. More than anyone else, intellectuals who do not have a realistic view of the country make of it a bucolic Utopia. They have a poor comprehension of the demographic pressure. In order to merely maintain the mediocre standard of living of 1958, all agricultural production would have to be doubled, since the population has doubled in a generation, but the people, especially young urban intellectuals, have hardly taken account of this fact. Since governments have statistics at their disposal, they are more aware, but they do not inform the people. The agencies for rural services and classes for informing and educating the people prefer to avoid the facts, taking an attitude of mistrust and criticism. Projects for dams do not arouse enthusiasm: there is a feeling of unexpressed uneasiness about them, as well as a fear that the country people may become employees and proletarians, that they may have to abandon the food crops that assure independence and that they may be exploited. Irrigation would permit cultivation in the dry season. An increase in production is threatened not by a lack of cultivable land – except in the case of overpopulation – or by an unsuitable use of the land, but by the limited duration of agricultural work. At present, with crops dependent on rainfall, the work period is no more than three months in a tropical climate and nine in an equatorial climate. With irrigation, it would be possible to cultivate more fields and have several harvests beyond the usual period. Mechanization of farming, with equipment similar to that of European farmers, would certainly have its advocates. The prestige of machines and the ways of the Whites is enormous. 
However, the switchover would be inconceivable without a powerful and restrictive organization: the example of the kolkhoz is well known. The African village is too small to be the base of such an organization; a state structure would have to be superimposed on the villages. We can imagine the dangers inherent in such a solution – a strict discipline, politicization and difficulties in management (the state would exhaust itself meeting the deficits), the technical problems of working vast expanses of land. Non-traditional farming eases restrictions in another way. Ploughs, seeders and cultivators permit a more rapid accomplishment of work that must be done at a precise moment to prevent bottlenecks. But do the people feel the need? Inquiries in the field have shown that young people reproach the “Whites” with “forcing the farmer to go into debt.” The training of oxen or horses, the purchase of material (modest) and fertilizers seem to be a constraint. Should not the people then be left to reflect and discuss the matter until their wishes are clearly known? An interesting Senegalese film deals with these questions. Its author believes that the farmers should stop growing peanuts – a crop for exportation – and
devote themselves only to millet. Is this return to a total autarchy possible and desirable for the farmer who would like to have a bicycle or a transistor radio, for the State that would like to sustain its finances through export taxes? A problem of economic organization arises: do the buyers for cooperatives perform their duties honestly, or do they cheat the farmer? The agricultural price level, the “deterioration in terms of exchange,” is also brought up, but the myth of an idyllic past, of a self-sufficient village, is quickly established. To arrive at a transformation – which should be a progress – with such distrust is dangerous. It would be better to renounce it, at least until the need for an increase in production or monetary profit were clearly expressed. In fact, increased production and monetary gain assume and bring with them the development of inequalities. African societies are quite diverse; inequality is not unknown, but in the past hierarchies rested on physiological facts such as age, seniority and in some cases birth. Societies bound to wealth are rare. Up until now, inequalities corresponded to a difference in social prestige and the more or less superior aptitude for power. We may assume that the farmers who became wealthy used their wealth for ends proposed by traditional society. However, the development of states brings with it a concentration and new nature of power. These are double: political power and technocratic power. State employees of all kinds have authority because of their technical knowledge, the prefect because of his knowledge of law or the nurse because of his medical knowledge. Politicians are supported by public opinion. The newly-rich are in competition with the traditional men of importance. On the other hand, it is fairly obvious that the development of wealth will be accompanied by avarice, which is rarely found in most cultures of the Black world. 
Almost everywhere the most highly esteemed quality is generosity, to the point where wastefulness is often recommended. In the Dyola society funerals are accompanied by a great slaughter of cattle and an excessive amount of rice, to such an extent that Senegalese law has had to regulate the practice. In Gabon the custom of bilaba was common: important men were rivals in generosity, overwhelming each other with more and more sumptuous gifts until one of them was outdone and unable to offer more than the other. Saving, control of expenses, avarice, are unthinkable in most cases. To create resources in kind or in money is certainly desirable. In addition, this surplus of production should be directed toward a reasonable and efficient application of development. In countries where Islam is strong, alcoholism is probably not a threat, but elsewhere? When we see things in these perspectives, we realize that restructuring is required. If the shops of Senegalese cooperatives are empty, of what use to the farmer is the money he has earned from his harvest? What good does it do to have a surplus if commerce and transport are inefficient or do not deliver produce to the interior market?
Industry and Development
Industrialization seems the key to all development. The myth of the colonial past is probably at the origin of this concept. We know that in the 18th century colonies had to furnish their home countries with raw materials and receive all their manufactured products from them. The revolt of the American colonies was born from this restriction when the “rebels” refused to permit their activities to be thus limited. Historians of colonization who have thought that this division of charges between home country and colony was still in use in the 19th and 20th centuries in the new colonial empires have neglected the importance of local industries in the latter. The oil producers of Bordeaux or the soap-makers of Marseilles would probably have liked to preserve their monopolies of the transformation of peanut or palm oil. However, this did not prevent the installation of an oil-mill – due to the merchant Jaubert – at Saint Louis in 1881; De Dietrich’s automobile assembly shop for Sudan Auto at Kayes in 1899; a textile mill at Bouaké in 1920. To explain the mediocrity of development by obstacles present in local industries is thus inexact. The desire to have a national industry, expressed and repeated many times over by intellectual writers and politicians, is so well interiorized that it becomes a sort of reflex: development will come with the arrival of industry, it is thought. In reality, the question is more complex. To set up industries is to inevitably open the door to the multinationals, who are so feared. In fact, nationals have neither the capital nor the necessary competence to organize enterprises on the technical and juridical level. It would not be impossible to find capital: this is the role of the banks, and states have created banks expressly for development. We could also imagine institutions for pooling savings and directing them toward productive investment. 
The bank of Abidjan is one attempt in this direction, the Crédits Mutuels are others. Nevertheless, acceptable projects are very rare. In the Ivory Coast, where all types of efforts have been made, we find barely a hundred “businesses” going beyond the artisanal level as understood in Europe, that is, employing a dozen workers in commercial bakeries, carpentry or masonry. We must accept the evidence: for the moment, there are no businessmen willing to set up industries. It would be possible and easy to find workers or managers, but no organizer has appeared. On the other hand, it is obvious that other activities may justifiably tempt ambitious men: a managerial career in a foreign enterprise would be less risky; a career as civil servant or politician opens broader and more prestigious perspectives. To create industries is to create employment, but it is also essential to know how workers would adapt to being employees: would they accept the restrictions of stability and discipline? Would they support without too much difficulty
the inevitable depersonalization connected with large organizations? Would they feel torn between having to make friendships in the factory and in so doing slacken in their family or tribal relationships? The passage into the world of techniques with its rigid and implacable logic, with the rejection of all affective warmth, is apparently a severe test, all the more so because the worker, once he enters the factory, finds a world in which he can compromise with supernatural powers and go beyond the laws of causality. European industrial organizations or multinationals are cold. There is no place in them for affectivity. The hierarchical structure is restrictive. The factory was born of the industrial revolution in a world in which money was the only recognized value. In this world of exploitation, class struggle brought distrust and depersonalization of relationships. Is all that inherent in industry? The Japanese example proves that it is not, since the employees of the large zaibatsu find a family atmosphere and an esprit de corps in their plants. An African industry could perhaps set up relationships of this kind; a multinational risking it would be immediately accused of paternalism.
Exchanges and Commerce
Even limited in area and without currency, exchanges transform living conditions. In fact, objects or food produced by man take on an autonomous importance in this perspective. They have a value in themselves, independently of their creator or his needs. In the framework of exchange it is possible and even useful to produce more than is used. For example, the tailor who made clothes to order will now prepare them in advance. This production of “ready-to-wear” marks an important step: the tailor must learn how to stock, buy materials in advance and predict his market. The client becomes an abstract personage. International commerce magnifies this abstraction as far as caricature. In the forest the villagers keep palm nuts after the oil has been extracted; they break them up and sell the kernels, often without knowing what they will be used for. We have seen the surprise of Cameroonian notables visiting a margarine factory and understanding the use of cabbage trees. To the frustration of producing things whose use is not understood is added that of receiving a price whose justification is not grasped. Even if the latter corresponds to the just remuneration for a particular work, the mystery of its determination arouses suspicion and uneasiness. In a monetarized economy a new difficulty is added to all the above: currency itself is a mysterious instrument for the majority of the “economic agents”. The people have a poor understanding of questions arising from this domain which are rarely explained to them in clear terms. Even the literate members of the middle class, such as primary school teachers or instructors, are uninformed on these problems and, it must be said, seem to be little concerned with them.
In view of the major upsets that the diffusion of exchanges – especially monetary ones – has brought to the entire population, the installation in some rare businesses of refined techniques such as that of the computer has little importance.
Social Groups Confronted with Transfers of Technology
The State is a form of adapted technology, not for the management of tools and products, but of social groups. This juridical technique has inevitably been transferred overseas without a critical examination. The colonizers transmitted powers and institutions that had developed in their own countries. The nation-state, born in Western Europe after centuries of dynastic states and kingdoms, became accepted in Africa. Naturally, regroupings are always possible, but that does not change the facts of the problem. States have precise views on development. They intend to ensure the stability of their power through the growth of economic activity. Indeed, a market economy based on exportation is an easy source of revenue: export taxes are not difficult to collect. The development of interior commerce could equally improve the activity and prosperity of the inhabitants. However, the stream of exchanges of this nature is divided into a multitude of rivulets: it is as difficult to make their inventory as it is to tax them. Outside commerce, on the contrary, is concentrated in the port; it is governed by a small number of easily-controllable organizations. The State, or more exactly, those who hold the levers of command, wants to ensure its power. An extended administration is the instrument of this power. Costly, it requires financing and thus development. Mainly, it has a tendency to neglect rural areas so as to occupy itself solely with urban centers that are nearby and exert constant pressure. Civil servants form a veritable social class. Chosen because of their education, they marry women belonging to the same cultural level as themselves; all schooled, their children are in the best possible position to confront competitions for recruitment. Such a caste can easily take power and make its authority felt on the rest of the country. State power may feel challenged by traditional societies, the old chiefdoms or tribes. 
These ancient solidarities have, however, lost their importance. They have an odor of the past which is disliked, and their influence seems too diminished for an Africa that wants to be on a level with the rest of the world. Some states have limited the powers of the great chiefs, others try to operate around an assembly of one or several ethnic groups. The question of national unity arises. Unification, in addition, runs the risk of basing itself on superficial foundations. The ancient cultures are bound to ethnic groups, and the search for common denominators risks the elimination of everything that has depth.
Knowledge, philosophies, rites are protected everywhere through secrecy, but the esoteric makes any civilization fragile. The number of those who possess knowledge is subject to all sorts of restrictive conditions. A few premature deaths are sufficient for entire sections of traditional culture to disappear forever. All development, whatever its form, risks having a harmful effect. During travel or work, people mix with each other. The exodus from rural areas takes the young people away from their tribal milieu and makes them sceptical: strangers they meet live differently without suffering prejudice because of it. Favored by modern means of diffusion, a world civilization tends to impose itself, everywhere conforming to itself. When it is a matter of smaller groups, problems are different. In the village or patriarchal family, development leads the individual to a personal awareness. With money, each becomes independent. Travel, made easy, allows the acquaintance with cities and the possibility to live there, thus escaping the authority of the elders and the rigid hierarchy of the village. Although development is not limited to the economic level, it is obvious that, taken in its entirety, it brings about an increase in the economy of exchange and, particularly, currency. All development, on the other hand, assumes a certain diversification of poles: an effort must be made to create new areas of activity so that exchanges are not all concentrated in one or even several cities. Nevertheless, urban centers are inevitably privileged; it is difficult to avoid an exodus from rural areas to the city. Villages are losing the most dynamic members of their population. Even if the emigrants return, they are changed: they have lived far away from the constraints of tradition, and their earnings guarantee them their independence; they feel more modern, more efficient and stronger than their elders. 
Even if they learned nothing constructive during their stay in the city, they have seen a different world from that of their ancestors. They will not accept with good grace finding themselves in a subordinate position. The esotericism with which the old people invest all traditional culture disheartens them. Why suffer, apply oneself and waste one’s time in initiations whose interest does not seem evident? On a certain technical level, reciprocal aid used to be indispensable. New equipment gives a farmer more strength and more time and spares his having to call on groups for reciprocal aid. The one who owns a tractor or, more simply, a pair of oxen may carry out his work without being obliged to anyone. He isolates himself, and the solidarity of the group diminishes. Monetary economy is progressing. Money has been present for a long time. As long as the sale of crops was the principal source of currency, wealth was tied to the social hierarchy. The heads of families, the elders of the clans, had resources at their disposal. With emigration, it is the young who become rich: the hierarchy is turned upside down. Besides, commerce spreads through the villages. Currency becomes a daily means for those who have it to succumb to attractive purchases. It is important to have money, and its
lack may be strongly felt. Formerly, inequalities were linked to social status, fixed by birth or age. They did not engender a difference in a standard of living. Everyone ate the same millet and similar sauces. Because it allows the purchase of consumer goods, money brings about a feeling of inequality. Changes arising from economic development are the same in every family: rural exodus, upsetting of the hierarchy because of money. Collective goods are in the hands of the elders and, since there is always the risk that these goods will serve for the personal use of the latter, the younger people feel injured. The family used to be a refuge against illness or death. Now, money can buy medical care. Family support was indispensable for negotiating marriages. The work of the elders supplied the benefits demanded by the parents-in-law. The amount of the dowry is now fixed in money, and the young man attempts to meet it by his own means. There again the solidarity formerly imposed by circumstances tends to disintegrate and may only be found again through a family spirit freely experienced by each member. Restrictive social institutions are inevitably transformed under the pressure of economic evolution and the awareness of individual liberty. They must be modified to take the new facts into account. Progress in the economic domain is not enough. It must be preceded or followed by transformations in the social order. The people must be informed of it and led to reflect and discuss all aspects of the problems.
Psychology and Metaphysics
The adoption of new technologies and development imposes or brings with it particular attitudes toward time, rational thought and abstraction. Those who rush toward progress must know that their conceptions, particularly in those areas, are almost certain to be modified. One might even say that any progress supposes and brings on a certain number of modifications in the ideas of the population. In most African cultures, the ideal time, the Age of Gold, is located in the past. This is perfectly logical in the prevailing gerontocratic or ancestrolatric perspective. The Ancestors, founders of the tribes, transmitted to man the civilization coming from the Creator. Their knowledge, their virtues are, by definition, superior to ours. The distant age in which they lived was the ideal: the entire task of the living is to try to preserve the heritage and maintain the tradition. The ancient time is the one sought for. Problems can be solved if things are returned to their former state. One can hope to find in myths and daydreams of the past what must be aimed for in the present. On the contrary, the task of one who looks toward the future is more difficult: he does not know where he is going. He must ceaselessly innovate, try, without illusions,
erase the rough copies, continually start over. To maintain or to find again is less exalting but easier than to create without respite and worry about success or defeat. Europe long ago abandoned its attachment to the past. By stressing Paradise, the reward of the Chosen, monotheism created an eschatological expectation. The development of knowledge proved that mankind was growing in wisdom and power. Optimism – at times ingenuous – always foresees better tomorrows. This orientation of the mind is indispensable for bringing about any change. There must be faith in the future. A willingness to reject the present and the past comes naturally from this faith in the future. A denial of the past shows distrust of the original culture. It may also show, in a less instinctive way, the desire to examine dispassionately all that is transmitted through heritage. Studies made on industrial workers in Douala have shown that this philosophic attitude severely traumatized some employees. Living far from the village, working in the city in modern situations, they feel guilty about having in some way denied their ancestors: some are racked by remorse because of it. We may ask if a “will to fail” is not adopted as a just self-punishment for this denial. Faith in future progress has another serious consequence. Those who accept it live in a constant instability, looking continually for the latest fashions, the most recent revelations, the most modern manners. Change, in itself, requires difficult adjustments and destroys all intellectual or moral certainty. Europe and America suffer from it. Young people who no longer know what values to devote themselves to are testimony to an increase in disoriented individuals. A receptiveness to change is at the same time useful and dangerous. Things are even more difficult for Africa. Europe since the Renaissance is accustomed to rejecting the old and tends toward the new. Africa still has an attachment to the past. 
In addition, to accept change is to open up to modern currents that come from the outside and to renounce cultural originality. The attitude toward time and the future thus poses serious problems. The development of nationalism is still more strongly felt (and rejected in some milieu). To participate in industrial life means to apply oneself to keeping conscious control over every action. The machine does not tolerate daydreaming. Perceptions must be strictly measured: the red light changes to green, a gauge shows mounting temperature. The worker must know how to govern his affectivity. Accidents often occur when a worker, preoccupied with his family, is no longer able to put it from his mind. Modern man is forced to adapt himself to living by putting up partitions between the sectors of his personality. This is perhaps indispensable for giving complete attention to his work, but the price of this efficiency is a fragmentation of consciousness, a rupture and an incoherence.
Human Development
Separating his personal cares from his work, the man from Dakar or Abidjan increases the rupture of his Self by adopting two contradictory philosophies: in the workplace he lives in a scientific, precise and rational world in which causes and effects are linked; at home with his family he again finds the traditional world in which the supernatural is present, in which mysterious beings can revenge unknown errors. Sleep is peopled with sorcerers sent by enemies. Capricious and unpredictable powers exercise their tyranny. One can make these forces harmless, use trickery with the spirits and make them change their intentions. The traditional world is bathed in mystery. The intelligent scientist knows that the magic of the unknown is immense and that he is far from having demystified the universe. However, he undertakes to dissipate the obscurity. On the contrary, the man who lives in the cult of tradition takes pleasure in the mystery: he appreciates it as a poet and the very word “traditionalist” evokes a contact with the sacred. Participation in the industrial and scientific world leads to a rejection of magic. Of course there are always margins of chance and the unexplainable, but habitual technical actions lead to the conviction that causality is rigorous. There is sometimes a reaction in Black students when they are faced with an excess of scientism and the icy coldness of Reason: they pretend to see the “science of the ancestors” in magic and reject as genocide any rationalizing position on this point. The man of development and technology is Homo faber in the strongest sense of the term: he intends to “dominate nature”, “harness rivers” and “extract minerals from the bowels of the earth”. Promethean will resounds in all these metaphors. Until they become parts of an exchange economy, objects and products are so bound to their proprietor or producer that they have no existence outside him.
When a man dies, custom provides that his possessions are burnt; in other cases, they are transmitted to his successor, not as inheritance but to complete the latter’s new role as a replacement for the defunct. The destruction of cattle or harvests as a celebration of mourning is often explained in this way. The proprietor being dead, his herd must follow him. Recent laws in Senegal have tried to stop this waste that is only the manifestation of a different philosophy. The herd of cattle is still marked by subjectivity: cows are not interchangeable objects; they are appreciated according to their esthetic qualities, according to the herd in which they originated, according to the personality of the one who gave them. They are saleable, because no more than anywhere else can the attraction of money be resisted, but they are not entirely objects. The earth is not an instrument of work or capital for production. The European had a visceral attachment for it; peasant dynasties handed down covetousness in order to round out their land. In Black Africa land has a sacred character: sacrifices had to be made before crops were sown, and the first fruits had to be deconsecrated before the crops were consumed.
Binet
An “earth priest” was there to celebrate these rites, to settle disputes and prevent crimes that would have sullied the fields and made them infertile. If the sexual act took place in the bush, if human blood fell on the ground, sacrifices of propitiation were needed. Conquerors usurp political power, but power over the soil remains with the heirs of the first inhabitants. In some regions of Senegal the entire village is Moslem, but one old man remains outside Islam in order to make sacrifices to the “tur”, on the altar at the foot of a tree. In Casamance the “rain king of Enampore” celebrated rites of his ancestors and assured the fertility of the seasons. He wanted to convert to Islam, but torn between his traditional duties and his religious convictions, he went mad. If it becomes no more than an instrument to produce crops, the land is deconsecrated. The gods are chased away from the world and materialism becomes dominant. All this is not in accord with African culture, which instead is respectful of land and water and submitted to the will of the gods and the ancestors. American or Soviet literature has exalted man’s seizure of the world and has glorified gigantic dams or monstrous machines. The African farmer asks forgiveness from the earth or water for taking their fruits from them. Here economic development clashes with metaphysics. Before any transformation of nature the people must feel assured that what they are participating in is right; the God of the Bible gave man dominion over creation. Can African divinities or ancestors encourage this dominion? Must the world be deconsecrated in order to develop it? Can harmony be achieved with the sacred world? It will be said that all these questions are very complicated and are not really relative to workers or peasants. Actually, they are perhaps not expressed consciously or explicitly, but at the level of the unconscious they run the risk of being even more disturbing.
An unconscious malaise is more serious than one that is expressed. If it can be expressed, it finds answers and outlets; if it remains buried, it engenders a vague sense of guilt. Now, in order to create a new society, which is not without risk and without difficulty, man needs all his enthusiasm.
Conclusion

Development is a complex undertaking: technical and economic modalities are not always mastered, and their consequences are quite different. A civilization is a more or less coherent collection of techniques, law, social organization, beliefs, values and knowledge. The mere introduction of a new and strange element into such a mosaic may well destroy it altogether. Africans feel to a greater or lesser degree their cultural fragility: all the discussions on negritude, authenticity and African-ness are evidence of this.
When a cultural void appears it is filled by world civilization. America, China, the USSR and Europe have dynamic ideologies in common. The primacy of economy and materialism, scientific rationalism, the confining of all affective values to the private sphere, state control, priority of the individual over intermediary communities are all part of the dominant ideas. We could wish for local cultures to be solid enough to oppose certain doctrines. At least, we should like nuances or variants to appear in the proposed design. The world tends toward uniformity; it is all the more desirable to preserve the possibilities of choice within the international monotone that threatens mankind. This is why it is not enough to call for negritude: there is an urgent need to define and forge an adapted civilization, integrating the values we hope to preserve and promote, discarding those that are undesirable and incompatible with the proposed aim. In the end, it is the people, informed by political leaders, writers and intellectuals, the “voices of the nation”, who must decide. A re-examination of the question has already occurred under the impact of crises and wars. Progress is no longer the rigid and ineluctable mechanism that was seen in operation at the beginning of the 20th century. Now man is perhaps in the position to make a balance-sheet of his acquisitions and reject their doubtful elements. However, if no conscious effort is made, the errors of the 19th century may reappear. Industrialization will bring about a rural exodus and proletarianization. Social classes will become differentiated and in opposition to each other. In the place of a moneyed middle class there will perhaps be a middle class having public powers, but the result would be the same. Cities will develop, rural areas will stagnate, withdrawing from the national and international community. Driven by a consumer society, materialism is already engendering insatiable covetousness and invented needs. 
Men become conscious of their liberty and their responsibilities, which is good in itself, but they let themselves be dominated by egoism; old, familial and ethnic social ties are broken and not replaced by new community attachments. Groupings on a human scale disappear before mass organizations, in the social and political domains as in the economic domain. This evolution is already in progress in the large cities. The modernization of rural areas, indispensable for maintaining a certain equilibrium, runs the risk of introducing it into the rural population. Demographic growth demands increase in production, but prudence is essential. Before approaching any element of social mechanization, a very sensitive subject, we must be certain that the people really want change. Technocrats are usually perfectionists in their respective fields; they are reluctant to accept the fact that man may be a hindrance to material development. They are in a hurry to launch operations and programs and feel offended if everything does not go according to a pre-established rhythm.
However, the people must have all the time they need to make their voices heard. It is not easy. The rural population is dispersed, far from roads, slow to react and timid. Nevertheless, the initiative must come from them; if not, success is uncertain. Two voices are raised: the spokesmen for the state are generally favorable to modernization. They see in it an immediate means to furnish capital and count on an economic development that will bring affluence to public finances. The large-scale projects are usually financed by foreign countries or international organizations. This kind of support confirms the state receiving such aid in its idea of importance and good management. However, the emigrants or intellectuals, who are often dissatisfied and in opposition, cause another voice to be heard, echoed by Third-World intelligentsia. The age is fertile in suspicions; every action is viewed as colonialism, capitalism or imperialism, for example. As long as there is a reticence toward them in public opinion, a commitment to developmental operations is counterindicated. Even when they are thoroughly studied on the technical level, they may have unexpected consequences. If the people want them, they will be able to adapt, innovate, draw a profit from what is presented to them; if it is imposed, on the contrary, the project will be ill-received and will crystallize all sorts of uneasiness and criticism. Our age has learned that the development of consumerism is not an end in itself, that production has limits, that the enjoyment of life has a value. Scientific progress has been reconsidered since Hiroshima. Many old certainties have been shaken. In reconsidering this subject, it would be profitable to encourage creativity wherever it may be found. The transfer of technologies is good; inventing new ones is better, and in any case, the true needs of man must be defined. He must not be permitted to merely drift along paths that are already marked out.
By following the technico-scientific track, he risks abandoning a close contact with nature in order to live in an abstract world. Let us not forget that a simple tool permitted the farmer to act directly on matter. Our peasant fathers measured with their muscles the compactness of the earth, like the Senufo with his hoe. The use of horses or oxen did not eliminate his walking in the furrows of his field. The worker is more and more distant from the material with which he deals. He must handle machines whose functioning he does not entirely understand: he must trust in processes over which he has no control. The economic world is abstract: products whose manufacture and origin are unknown, a network of producers, consumers and distributors that remain faceless initials or statistics. Like it or not, man is committed to a growing abstraction.
13
The Clinical Study and Treatment of Normal and Abnormal Development: A Psychological Clinic
Lightner Witmer
I have said to the president of the American Academy that I would demonstrate for the benefit of the members of the Academy, the nature of the work which is being conducted here under the caption of the Psychological Clinic. In the time at our disposal it will be impossible for me to give you more than a very superficial view. Some of you doubtless are interested in the scientific aspects of the problem. You would like to know what a psychologist is doing, what are the tests which he applies. This phase of the work I shall not be able to demonstrate. The tests which I shall make here this morning are very simple indeed, and are intended merely to put before you a few of the multifarious aspects of the problems with which we have to deal. They will have the purpose of making you acquainted with some of the physical and mental characteristics of the children in whom we are interested. I am going to proceed this morning just as I would in an ordinary clinic. This little girl, whom I know quite well, has consented to come here this morning and make one or two of these simple tests.

(Professor Witmer takes the form board, which is a shallow oblong tray of light oak, having depressions of various shapes in its surface, into which fit ten blocks of dark walnut shaped like the depressions, – a square, circle, triangle, star, cross, semi-circle, and so on. He removes the blocks from their places and throws them on the table.)

Source: The ANNALS of the American Academy of Political and Social Science, 34 (1909): 141–162.
Q. I am going to give you a new name this morning; you are going to be called Gertrude. What is your name going to be this morning?
A. Gertrude.
Q. Now if I make a mistake and call you by any other name, don’t you answer. Gertrude, will you put these blocks back again? Do it just as quickly as you can.

It is an extremely simple test, but a very valuable one for those on the border line between normality and abnormality. The fact that she uses her vision and hands co-ordinately and without hesitation is proof enough in my opinion that the child is of approximately normal intelligence. Now I am going to ask a few questions.

Q. What is that (showing Gertrude a doll)?
A. A doll.
Q. What is that (showing her a toy dog)?
A. That is a dog.
Q. Have you a dog yourself?
A. No.

(Miss Elliott and Fannie enter, and the former is warmly greeted by Gertrude.)

This demonstration is just as important a disclosure of character as any test we may give.

Fannie, you take those blocks out (spoken in a low tone).

This child is deaf. I was lowering my tone in order to bring out that fact. She seems to be hearing quite well this morning, Miss Elliott.

(Miss Elliott.) Some days she can hear very well, and sometimes not so well.

Sometimes it is normal. It seems very nearly normal to-day.

(Miss Elliott.) In this kind of weather you might say it is all right.

Fannie, take up the doll for me. (Repeated louder and louder.) Pick up the doll. (She does so.) Sit down in your chair. (She does so.)

Her hearing is very much better this morning than it usually appears to be. Fannie, would you be willing to read a little for us? I do not know whether you have this reader in your school.

(Fannie reads.) See – my doll’s – funny – carriage.

She has a lisping voice, that is a defect of articulation.

(Fannie reads.) I – have – brought – the – doll – with – me.

That will do Fannie, much obliged.
I want to say that the appearance of this child here before this large assembly, her ability to read before you, is really surprising to me. When I first saw this child about two years ago, she was one of the shyest children I have ever encountered, in fact part of her trouble was shyness. That shyness was bred of continued failure, without any doubt, and the reason this child is able to appear here this morning and read a few sentences, meagre as the
performance may appear to you for a child of her age, is due to the fact that she has had the encouragement of success; she has been shown that she is able to do something. Another cause of shyness was deafness. Originally her hearing was about one-fourth normal, perhaps worse than that. To-day it has considerably improved. Defective hearing produces shyness. Defective hearing also produces other characteristics which were marked in this child, – sullenness and stubbornness. It was at first impossible for us, even in the quiet of the recitation room, with only one or two children, to get anything out of her at all. These fits of sullenness and stubbornness were pathological, in the sense that they would come on apparently without sufficient cause, and would persist for half an hour or an hour. They were overcome simply through improvement in physical condition, and through subjection to the proper kind of educational treatment. I mention the fact because I want you to observe her actions here this morning. She is apparently a perfectly self-possessed child, not at all shy, not at all sullen. The first time I ever showed this child at a clinic of this kind, she positively refused to do anything. She is the kind of child who, in the public school, if sent to the principal simply sits down in a chair or stands absolutely sullen, refusing to answer any question. Now young man (turning to the boy R. S.), I am going to give you something very easy to do. I am going to ask you to read something for me. (The boy reads very low and hesitatingly. The children are then all sent out of the room.) I am going to speak to you about these three children, Gertrude, Fannie, and the boy R. S. The boy you saw last is a child who is in course of treatment here. This morning is the second time I have seen him. The first time he came here was April the tenth. 
He came with a statement from the principal of the school which he was attending, that he was about to be expelled from that school or sent to truant school because of persistent stubbornness. The statement was also made that he is extremely backward in his studies. He is an overgrown boy of twelve years of age. He is only in the third school year, so he has lost three years of the invaluable six or eight years of school life. He is not likely to get into the high school until he is eighteen, so he will undoubtedly be cut short in his educational work. This boy comes to the Psychological Clinic with the request that I find out what is the matter with him, and send some report to the principal and to his teacher. He is brought to me by his mother, who is perfectly willing to give a complete history. She has a family consisting of a number of girls. This is the first and only boy. Apparently she has always had trouble with him. She is one of those women who are always voluble about their troubles, and in his presence she tells how bad and obstinate he is, – practically giving up the task of discipline before her twelve-year-old boy. She cannot manage him any longer. This boy as I saw him for half an hour, does not appear to me to be a child
who could be suspected of mental enfeeblement, and does not look or behave to me like a boy who would be especially difficult to manage. When a boy comes into the school and manifests obstinacy there, we must remember that his behavior is in large part a product of his home treatment. The discipline of the child should begin the day he is born, and many children show lack of discipline in the schools when eight, fifteen, or perhaps twenty years old, because the initial lack of discipline was in the first, second, or third year of the child’s life. These problems are being turned over to the schools. The home is practically asking the school to remedy its defects. We must assist the home in the better training and disciplining of these children before and after they enter school. Part of our work must be to send a competent social worker or teacher into the home. This mother is perfectly willing to learn. Whether she is competent to learn I do not know. Perhaps she will be very resistive of an education, as many mothers are, but we must try to do it, and undoubtedly we shall find some who can be instructed and assisted. The usual fault is too much affection or too much and ill-advised discipline. Now we see in this boy certain marks or signs which suggest the advisability of suspending judgment for a while. He is an extremely shy boy, and I wished to say very little about him in his presence, nor did I desire to put him to any test. His heart was beating violently, without a doubt, while he was in the room, and I did not wish to increase the strain in any way, so I let him go quickly. This boy I suspected of having adenoids. I sent him over to the University Hospital, where a physician diagnosed the presence of adenoids, and on Monday morning he will be operated on for them.
In addition he was sent to the medical dispensary, and in this work I may say that we are assisted greatly by Miss Ogilvie, who has charge of the social service department of the University Hospital. When we tell a parent or a teacher to take a child to a medical dispensary for adenoids or medical treatment, we have not assured ourselves that the proper treatment will be accorded to the child. We must follow the child into the dispensary and see that the child really gets the necessary attention. It is a question of time on the physician’s part. He is overloaded with work in most dispensaries, and the very child for whom we think it is most important that he should give time and attention, is sometimes the child who may be brushed aside. If I suspect adenoids, and I get a negative report from one dispensary, I sometimes send him to another. Corroborative opinions are particularly necessary where one suspects defective action of the internal organs. It is easy to have adenoids diagnosed and cut out, but it is extremely difficult to find anyone who will make a careful investigation where there is some chronic digestive trouble, and who will give the prolonged and careful treatment which is required in these cases. This boy seems to be on the verge of going to destruction. He is obstinate, likely to be thrown out of school. He is overgrown, precocious physically. He is already beyond the control of his family. I would say that his condition is
just as critical as that of a patient who must be operated upon for appendicitis. Some do not think so. It is a chronic state; he is not going to suffer particularly to-day, to-morrow, or within five years possibly. Nevertheless it is critical, if we are interested in his taking the narrow path in preference to the broad road. We must see, therefore, that these children obtain the kind of medical treatment which we believe necessary for them. This child is reported from the University Hospital to have a mild myocarditis, and an arhythmia of the heart, a fibroid lesion of the heart perhaps not active at the present time. The redness of the hands was evidence to me of some circulatory disturbance. I am not a physician. I never diagnose, – not even a case of defective vision. My work is simply to find out what are the danger signs displayed in the child’s mental and physical make-up, and when I find these danger signs there, I send the child to medical experts for diagnosis and treatment. If it would not overload the dispensaries, I should send every child for a thorough medical examination of eyes, ears, nose and throat, nervous system and internal organs. This boy may be a moral degenerate for all I know at the present minute, and my work in a large number of cases means suspended judgment for a time. Trust nobody’s report of what the child has been like. One must rely chiefly on what can be found from direct observation and examination. This other child, Gertrude, is a very interesting case illustrating just this particular point. She was brought to the clinic one morning by Miss Campion, a representative of the Children’s Aid Society in this city. She had previously told me that the child came from a county poor-house in the state: that she had been brought by the authorities of that county to the city of Philadelphia with the statement that she was a menace to the other inmates of the institution. 
In the care of the Children’s Aid Society, the child had been placed in a hospital in this city, and the report from the hospital was that the child was a danger to the other children and they wanted to get rid of her as soon as they could. At the time I first saw her, the child was living in a boarding house in this city, being boarded out by the Children’s Aid Society, and the report was made that the woman in charge of the boarding house found it necessary to give the child valerian every day in order to keep her quiet. Gertrude was subject to outbursts of passion, in which she was dangerous to other children of her own age or older, and to adults. With little children the statement was made that she was usually kind, and Miss Campion herself made the same observation. There was a report from a physician who had examined the child, which warned the Children’s Aid against putting her with normal children, and the question was put to me whether I thought there was any likelihood that the care of this child could ever be confided to some family who might be willing to take her for adoption. On her history, no society would be justified in getting anyone to look after the child. When Gertrude first came into the clinic, I felt
that this was a case I could dispose of in a moment. I then had before me the physical picture of degeneracy, and at times, – I do not know whether you felt so this morning, – the child’s appearance is such that one could easily suspect her of mental and moral degeneracy. But when you receive a report like the reports spread about this child, you may be sure your interpretation of what you see in her face will tend to substantiate the reports. Fifteen minutes’ examination showed me that I had to deal with a child not mentally deficient, but rather above than below ordinary mentality. Subsequent observation has confirmed that judgment. I came to the conclusion that any retardation the child showed in her school work (and she was retarded, – she cannot really read at the present time), was simply due to the fact that she had not been educated. Why, I am not able to say, but it is lack of education, not lack of ability. As to the existence of moral symptoms, no examination of fifteen minutes can be conclusive. I simply said, “I will have to keep the child under observation.” I put her with a woman in whom I had confidence, in order to try her out. Miss Campion succeeded in raising the money for the child’s support. After she had been ten days in this house, living with the little girl Fannie, not being a serious menace but nevertheless rather troublesome, – she was entered in the first grade of a public school. She stayed in that grade two months, but did not get on particularly well. The principal reported that she was troublesome and required too much individual care from the teacher of the grade who had charge of her. I then took the child into the Hospital School, where she has been for five weeks. She is a source of great trouble to us. 
She is the most expensive child in the school, in the sense that she takes more of the time of the people who are taking care of those children, than do the others, and the reason, in my opinion, that she is so difficult to handle is because she is so normal. I am ready to be shown that I have made a mistake in this case, but I believe I have ninety-nine chances out of a hundred of being right. Of course, I am expressing a prognosis, and a prognosis in regard to a child’s mental and moral future is a risky thing to make, even for a normal child. But I say this child is normal mentally and normal morally, and I think she has the stuff in her to make it possible for her to develop into something worth while. For that very reason, she is difficult to handle in the institutions in which it has pleased society to place her. The child has fight in her. She has been fighting like a rat in a corner. Now your institution child, the one who does nicely, is the one who stays where he is put, – apathetic, a nice child. He is the cheapest child the institutions can possibly handle; he does not require any individual attention. This child will not stay where she is put. She is very troublesome, always up to something. The more you punish her with violence, the more obstinate and stubborn she becomes. This child has good concentration of attention. When she is interested in a bicycle or roller-skates, she has that on her mind and nothing else. That is
what we want in education. If used in the right way and developed in the right direction, you have something which you will never have in the child who is willing to take up one thing as well as another. Gertrude is also an extremely imaginative child. While taking her to school one day, she said to my assistant, “Everybody spoils me very much. I suppose that is the reason I am so much trouble.” Now if any child had not been spoiled, this one had not, except entirely in the wrong sense of the word. For all I know, she may think she is some little princess. She certainly manifests intense imagination. Thus she walked lame for two or three days at one time, imitating another child in the school, until she was put to bed, which cured her lameness. You saw how well she did here. She entered into the spirit of the occasion and did this work well. I can take a splendid photograph of this child, because she has perfect lack of self-consciousness. She would make a good actress. At the same time she is very emotional and responsive. You saw how she greeted Miss Elliott. She would have greeted Fannie in the same way except for the fact that she has been told she must leave Fannie completely alone. Now this child is suffering from what I suppose may be called physical degeneracy. She has a few very slight, but yet noticeable marks of the effects of an infectious disease, probably congenital, from which she has recovered, but the effects of which have not been entirely outgrown. This is a physical handicap of a slight sort which the child will probably carry more or less through life. She cannot help it. It is due to the sins and misfortunes of her father and mother, but for the rest it remains for society to repair that damage, and at the same time to see that this child has a chance in an environment that is suitable for her development. 
The other case, Fannie, is the one that I selected for presentation here because it brings up in specific form the social and economic issue. Here is a child, one of seven, of Russian Jewish parentage, living in two or three rooms, brought out here to the clinic two years ago by Miss Stanley, head school nurse in this city. “Is the child feeble-minded?” That was the question, practically, which was asked of me. She had been two years in the first grade and had made no progress, and there was no chance of her being advanced into the grade above. “Is she feeble-minded?” It appeared to me that whatever the answer to that question might be, the first thing in importance was that the child was deaf. She could not hear my questions unless I had her right close to me and yelled in her ear. The next thing in importance was adenoids. The next, that she was suffering from insufficient and improper food. Now what are you going to do with a case of this sort? For two years I have had her under observation. I take a case of this sort for the purpose of illustration. I do not expect ever to have another case like Fannie. It is too expensive, for one thing. But I do expect to finish up with this case and place it before the community as an illustration of what can be done in certain cases. Here is one of a large number of children. At eight years of age this child was already hit, knocked out by the social and economic environment into which she had
been born. Insufficient food and bad air gave her adenoids. The adenoids gave her middle ear disease, and middle ear disease made her deaf. The deafness has been largely corrected, but the child is still deaf. To-day it is surprising to me how well she hears, and it has encouraged me to think that her hearing may be restored to normal, but I have always been very doubtful about that. More than this, the child has been of the greatest interest and stimulus to me from the psychological standpoint. In making out the mental status of a child we have to deal, in the first place, with the senses and activities of the child. For one thing, Fannie lacked the sense of hearing, and she lacked articulation. We found the first Christmas she was with us, that this eight and a half years’ old child did not know the word “bird,” and was absurdly ignorant in many other respects, not because she was feeble-minded, but simply because she was deaf. Feeble articulation increased her deafness for words. The sensory and motor sides must be corrected simultaneously. Every child has a group of, – instincts is about all we can call them, – traits of character if you choose. These traits of character are a result of the development of the child’s nervous system. We cannot say whether they are inherited or not. They come into the child as a part of the general inheritance. Imitation is one such instinct; curiosity is another; affection is another. This child, when she came to us, had no affection; she was sullen and apathetic; she was stubborn, showed no signs of vanity, and no imitation, – a sort of cabbage which you might have around in your garden. It was not psychological treatment that was required by this child. What was needed was psychological insight in the person who was handling this child, but more than anything else in the world she needed good food. That is what helped to bring her up. 
She wanted something in her stomach which she could put into her nervous system so that it could grow. Where are you going to get it? I do not know. That is for an Academy like yours to decide. Do not bring social problems like this to the Psychological Laboratory. They do not belong here. The problem is an economic one and must be solved outside. I may be able to put this problem clearly before the community, in order to show that we must reconstruct the community before we can make many a child’s mind develop in the proper way. Fannie, when she came to us, knew nothing at all about affection. When she saw another child cry because she was homesick, Fannie only laughed in the most silly, idiotic way. This was an odd phenomenon. When she was petted, she laughed in the same silly way. At the present time Fannie is one of the most affectionate and demonstrative of children. She is still shy, though. Since she has had clothing and fairly good food she has vanity. Vanity is a most important instinct, both in the man and in the woman. Take the other child, Gertrude, for instance. Gertrude would go to school washed up, nice and clean, her gloves tied to her coat, and she would come home looking as if a cyclone had struck her. She would not take care of her
clothes. In a rich home they would be taken care of for her. It would not be serious in a rich home, but it is serious when you are trying your level best to have her supported at all. But this trait of character should not be used to misjudge the poor child. When we gave her a room, good clothes, and a bureau to put the clothes in, there was no child in the place who took better care of her clothes, – that is, her good clothes; she does not take much care of the others; she knows the difference. Gertrude has good taste. She can tell whether she likes a woman’s hat, and she can tell you why she likes it. At least that is what the teachers report. I have not had any conversation with her on the subject. I want now to say a word or two in regard to the general aspects of our work here. I began what I call the Psychological Clinic in 1896. I now use the term “psychological clinic” in three senses. The Psychological Clinic, or dispensary, is a place I have down stairs here. On certain days I am on hand to see children who are sent to us. We try to find out what is wrong, and we send the child to the proper agencies. What we need more than anything else is a number of efficient social workers who will go into the home and show how things should be done, and see that the child goes through the medical dispensaries. Out of this work has come the Hospital School. That is to say, in the case of certain children like Gertrude, there is no means of finding out what the child’s mental and moral status is unless you have had her under observation with the right kind of environment and with competent persons. If the Psychological Clinic is going to do a large measure of service, it must do it through its education of the entire community. It must, through the reporting of its work and the development of an educational department in connection with a university like this, be able to give instruction to those who will subsequently continue the work. 
For that reason I employed the term Psychological Clinic as the title of a journal which I started some two years ago, which is growing to be an extremely important factor in the development of this work. I must get reports of the work which we are doing here sent out into the world, and I must try to get people from outside to send reports in to me, so that there may be an interchange of experience and opinion. In this current number of the journal there are two extremely valuable and important articles, both by teachers of special classes. If we can once get the teacher of the special class to become articulate, – not only to do good work, but to talk about it, – if we can get such teachers to study their cases just as a physician studies and reports his cases, I think we shall have gone a long way towards solving the problem. The psychological laboratory which will solve the problem is either the school room or the social settlement. If we can put the right people in to do the work, and then see that we get the right kind of reports of what they are doing, I shall feel that this work has at least been put upon a basis where it is likely to achieve results of some importance. The Psychological Clinic in the third use of the term is a course of lectures and demonstrations similar to the one I have given you to-day. Once a week, on
Saturday mornings, I give a lecture at which I bring children here, present them to the class, and then talk about the situation, the kind of treatment indicated, the results of treatment in progress, etc. This is the educational feature of the work, as it may be carried on as a department of university instruction. I have said that one feature of this work is the special class in the public schools. I am going to show you a special class, a selection of children from a single school in the city of Philadelphia. Miss Maguire is the supervising principal of the Wharton Combined School. In that school was organized a special class. She has in her school 1800 children. I believe that every school with a population of a thousand has enough children to form a special class of fifteen to twenty-five. Miss Devereux is the teacher of this special class, and the record she has made in advancing some of these children I think is a very remarkable one, and I want Miss Maguire very quickly to run over a number of the children treated in that class.
Miss Maguire
The first case is Little Mary; sent to me three years ago from the first grade. In consequence of scarlet fever and diphtheria she could not at that time talk. We took her in, and mixing with sixty children in the first grade she learned to talk a little. At the end of two years we placed her in the second grade and she seemed to go back very rapidly, because everything was out of her reach then. Mary was placed in the special class formed at that time. Her mental and physical condition was at a very low stage. She is now entirely dismissed from the special class, is doing second-year work, and will go to the third class in June. In every way the child’s improvement is decided. H. S., three years in the first grade – practically accomplished nothing – placed in a special class. Was a year in that class and spent part of the time in regular class. Now dismissed from special class and doing good work in second year. His eyesight was in a bad condition and had to be attended to. This was the case of a boy whom a trained psychologist had graded as an imbecile. He was three or four years in the first grade. He has been examined and glasses prescribed. They helped him marvelously. The special work in the class, hand work, and study of his own particular condition have made the most remarkable results. A wonderful change has taken place in the child’s physical appearance and mental condition. I am sure that if this boy had not had special training, with study of the child himself, and hand work, there is no doubt that he would have developed into a backward child of a very low type. This little girl was sent from a school outside three months ago. Five years in the first grade – it was not a public school, so I may speak of it frankly in this way – five years in the first grade of a parochial school. I think in the
public schools something would have been done in five years. She had been allowed to remain there five years, and at the end of the time was sent home to her mother with the statement that she was developing incorrigibility. She did not look like a hopeful case when she came. Her personal appearance improved remarkably. I hesitated a good deal in putting her into the class, but I let her go into the class three months ago. She could not read a word, could count none at all, and we first had her do things around the room. We have been training her mind and hand, and her mother told me the other day that her improvement was marvelous. She now appears to be getting some of her words, and we are gradually teaching her to read, and we are depending very much on her hand work. I believe we shall be able to put the child into the second year at the end of this year. This little child is a boy in our first year. He has done up to this time very little in the first year, so he has been put in the special class, where the hand training appeals to him greatly. He can do very fine work with his hand. His hand work is what we depend upon. The doctor diagnosed him as cretinoid. Rachel is nearly twelve years of age and she is in our second year, but is not doing second year work. This child seems to be the most hopeless case in our school. I do not believe we can educate her enough to have her earn her own living. Without a great deal of care her conduct would be troublesome. In the special class we are able to interest her sufficiently to hold her attention. I do not think that we can ever discharge her entirely from our special class. Jacob is one of our very fine specimens. He was also marked by one of our examiners as very low grade, and I thought him right in that respect. This child had very poor eyesight and his physical condition was very low. He is now one of the best boys in the second grade. I think sometimes we make backward children. 
We should study the children and see what needs to be done. This is a good example of what can be done with a thoroughly backward case. His physical condition was such that he could not keep up with the class. After he had been trained to think and see he could keep up with the class. He is a good student and will go through school with very little difficulty. This child was sent from an outside school. He was sent to me after three years in the first grade. His physical condition seems to be normal. I have not found out any reason why the child should not be doing something in the first grade, but in our first class he can scarcely do anything. Even after three months of very special training his power is very limited. He learns a word with great effort, recalls it and forgets it alternately. We are uncertain as to the outcome. He does take to his hand work and we are able to train his mind quite considerably through his hand work, and within a year we may be able to show why he was as he is to-day. I am sure we can say that the work of the regular class in the school would develop a very backward boy. He does not show any symptoms physically. He plays and is happy. He was the pitcher on a baseball team, but you cannot teach him to add and subtract. I brought him out to have Dr. Witmer tell me what was the trouble.
Dr. Witmer
Miss Maguire has given us an excellent presentation of the work of the special class. It is my opinion that we need special classes in all our schools, and the success of this class I want to say is dependent not only on the teacher of the class, on the supervision of Miss Maguire, but it has also depended on the work of Miss Stanley, the head school nurse, who, even before the class was organized, took an interest in many of these children, and visited them in the schools. Miss Stanley brought the child Fannie here first. The success of the work with this class is therefore not only due to such work as we may be doing here, and as may be done in the public schools, but is also due to the associated work done by the medical inspectors and the trained nurses. We were to have had the pleasure of having Dr. Neff address us this morning. I had hoped Dr. Neff would speak on the subject of medical inspection, and especially on the institution case. If the public school endeavors to take care of the institution case I believe it will make a grave mistake. And yet there are many institution cases in our public schools to-day. Dr. Neff not being present, I shall ask District Superintendent Cornman to say a few words in regard to a school for backward children which he has organized, and also in regard to the feeble-minded children in the schools of Philadelphia.
Dr. Cornman
The Adams School, Darien Street, below Buttonwood, is an instructive object lesson of the need and value of special classes for backward pupils as part of the public school system. It is in a semi-slum district where a considerable proportion of the population is near or below the poverty line. Some of the children have dissolute parents, many are poorly nourished and an unusually large proportion are both physically and mentally subnormal. Individual examination of the 250 children of the school was made about three years ago. So many backward pupils were found that it was determined to utilize the building as a special school for backward children. About 160 children of fair or good mentality were transferred to nearby schools, while the remainder were retained for further diagnosis and educational treatment in small classes. Backward children from surrounding schools were transferred to the Adams, so that it now numbers, in the third year of its existence as a special school, about 190 pupils. These are under the care of two kindergartners, six grade teachers and a teacher of woodwork and other forms of manual training. The size of class has been reduced from fifty to about twenty-five per teacher. The classes are small enough, therefore, to permit the teacher to assist the pupil in accordance with his individual needs.
The children vary in capacity from the very slow or dull, who are held under observation to determine whether they shall be placed in a regular class or not, to the distinctly backward and even to the feeble-minded. Indeed it has been found necessary to assign to one teacher a group of twenty of the latter class, every one of whom is an institutional case. The feeble-minded present a most serious problem. They should undoubtedly be under permanent custody, but existing institutions are already much overcrowded. The true functions of the special school are seriously hampered by these cases, and it is a question whether they should not be refused admittance altogether. The little that can be done for them in special school may only aid them to take a place in the world where they almost inevitably drift into vicious and dissolute ways of living. They are, however, happier in the special schools than on the street or in regular classes, and their segregation in a special school is a standing object lesson of the necessity for their institutional care. If refused admission to special school the existence of these cases is liable to be concealed or ignored and the need of public provision for them fails to be appreciated. The results have fully justified the conversion of the Adams into a special school. About a dozen pupils each school year make such progress that they are transferred to regular schools. A few of these are fourth grade pupils (the highest grade of the school) who have earned promotion to a nearby grammar school. The majority of the pupils, however, receive the greatest benefit by remaining in the school until they reach the age when they leave to go to work. The enrollment at the Adams School represents about 4 per cent of the number of children of school age within walking distance of the school. 
This percentage, though higher than that which obtains for the city as a whole owing to the special local conditions, is an indication of the great demand for special classes for backward children. For the first time in the history of public education in this city, a careful census has been taken of the mentally subnormal children in the schools. This census has been made under the direction of the Bureau of Health, acting in conjunction with the Department of Superintendence of the Public Schools. Official report of the returns has not yet been made, but the preliminary count shows about 500 denominated as “feeble-minded” in all the schools of the city. Of these about fifty are enrolled in special schools, so that special provision is made for only about one-tenth of all the cases. About 1500 “truant or incorrigible,” one-third of whom are in special schools, are enumerated, and 3000 “backward,” one-tenth of whom are in special schools, are reported. The number of defectives thus listed aggregates about 5000, or approximately 3 per cent of the public elementary schools enrollment. The census is an under rather than an over estimate of the number of defective children in the city. If the same percentage obtains in parochial as in public schools, about 1500 more must be added, while many not attending school at all would also swell the total.
Such provision as has been made for the subnormal children is both crude and inadequate. The buildings are, as a rule, in poor condition and not well adapted for the work. While many of the teachers are doing admirable work, they have not, as a class, been specially trained nor selected for it. Separate institutions are needed for the permanent custody of the feeble-minded. A considerable proportion also of the truant and incorrigible class are of such a character, or have such home environment that they should be cared for in a parental school, and at least 100 additional special classes for the backward should be established. It is evident that the problem of the training of the defective child is a serious one. It is to be hoped that the report of the census by the Bureau of Health will arouse the public to an appreciation of its importance and result in adequate provision being made by the educational authorities.
At the conclusion of Dr. Cornman’s address, Dr. Witmer introduced Mr. Otto T. Mallery, who read the following paper on:
Playgrounds as a Municipal Investment in Health, Character and the Prevention of Crime¹
There may be some misguided persons, of course not among the membership of the Academy, who are under the impression that play is something trivial, something incidental, something unimportant done between hours of work. Such a person may be converted to the Gospel of Play by observing a small boy standing on his head. Every muscle is under orders. His attention is concentrated and his will issuing peremptory commands to all parts of the organism. The whole boy is very much alive, keen, alert. His head, both outside and inside, is undergoing quite as great a strain as though he were studying a book. A moment’s wool gathering at his books is possible without serious mental prostration, but a moment’s wool gathering with his feet above his head results in physical prostration of the most ignominious sort. 
Play is a great mind as well as muscle builder. Self-control under stress; loyalty, obedience and fair play in team games and a sense of subordination of the individual to the welfare of the team, are all not only ideals of the playground, but ideals of character as well. If our misguided person needs to be reinforced by observation of the other sex, he will find an unconscious missionary of the Gospel of Play in a girl of six, seated upon a pile of builders’ sand in the street. The little girl has found the sand plastic. She is molding the sand, impressing her character upon it. Most of the things of the street – its filth, its standards, its diseases – impress their character upon her, whether she wishes it or not. Over the sand she is the commanding purpose, the arbiter of its shape. She is exercising her creative, her formative instinct. The child is making something, perhaps the first thing she has ever consciously made, and making things is an important part of being alive. Wherever children are
gathered together, on the sands of the sea or the sands of the street, this universal creative instinct comes into action. Creation and recreation are closely allied. The first commandment in the Gospel of Play is: “Thou shalt play with all thy mind and with all thy strength, and with thy neighbor as well as by thyself.” This is implied in “Thou shalt love thy neighbor as thyself,” for psychologists and experience alike tell us that in group play our social affections are first developed. So in many other directions the influence of play upon the normal growth of the character and health of a child is traceable. Play is as necessary to a child as light and air to a growing plant, and yet modern industrial conditions have deprived the majority of city children of the exercise of this universal instinct in its proper form. “In the planning of our cities the children have been left out,” and as a result American municipalities have serious social problems to solve. One hundred and seventy-seven American cities have opened supervised playgrounds, and the playground movement has gained its impetus upon the sound argument that playgrounds are a good municipal investment in health, character and prevention of crime. Chicago has spent $11,000,000 upon a system of playgrounds which Theodore Roosevelt describes as “the greatest civic achievement of the age.” One-tenth of the area of the city of Boston is devoted to parks, playgrounds and bathing beaches. The administration has undertaken the development of the children with the same care upon the physical as upon the educational side. New York demolished a block of tenements at a cost of nearly $2,000,000 and established a playground upon the site. Where once several murders were committed each week, now a thousand children are playing each day. New standards have been set up and the influence of the playground is felt throughout the neighborhood. 
Other smaller cities have made great strides towards an adequate playground system, which shall offer healthful organized activity to every child. The influence of playgrounds upon civic health is obvious. The International Tuberculosis Conference has placed playgrounds as an important plank in its platform. Backward children are often found to be handicapped solely by lack of physical development. The increase of vitality gained upon the playground shows itself in increased efficiency in the school room. In Philadelphia it is estimated that 20 per cent of the school funds are spent upon children who are going over the same work for the second or third time. The cost of the repeater is great. The playground reduces the number and cost of the repeater. When England underwent an industrial transformation at the end of the eighteenth century the population flocked to the towns and were herded in unsanitary and deteriorating congestion. No municipal care was undertaken. According to the individualistic theory, the fittest would survive. The submerged tenth, however, had its origin. Breeding took place from lower
and lower physical and moral levels. As a result, when the debilitated city dwellers marched upon the plain of South Africa, they dragged out the Boer War and threatened the fall of the British Empire. The same city congestion is an American problem to-day. Playgrounds provide a means of raising the average vitality of the community. Hospitals will always be necessary, but a playground opened to-day saves the opening of a hospital to-morrow. On the score of economy of money and industrial efficiency playgrounds are a good municipal investment. The games of the street teach shrewdness and cunning. Every boy is for himself. There are no rules except to win at all costs. On the playground, under proper supervision, new standards are inculcated. In team games a boy learns to work for the welfare of the team, rather than for himself. It is a great step forward to fight as a member of the team for the honor of the neighborhood, rather than for oneself against every one else in the neighborhood. The ideals of the playground are fair play and self-government. The relation to the ideals of good citizenship is not difficult to see. When a certain playground was first opened the bats and balls began to disappear, leaving that many less for use. Searching parties were formed and one by one recalcitrant offenders were rounded up and the bats and balls ferreted out. Now the community sense has so far developed that the bats and balls are guarded as community property with a greater vigor and success than transportation and lighting franchises are retained for the community’s benefit by those who have lived longer in this world. So much of a human being’s character is formed in play that it is quite to be expected that much character is deformed, degraded and twisted and perverted where wholesome play is prevented. A boy is much like a boiler – full of restless energy which must find an outlet. 
The boy’s safety valve is play, and much of what we call juvenile crime is merely play energy gone wrong. Give the boy the game to play, give him exciting feats to perform on the flying rings and trapeze and the juvenile court will be deserted for the public playground. The boy in the street who throws most energy into knocking out a window or a policeman is the same boy who on the playground throws the most energy into knocking out a home run. The boy who most successfully steals a cabbage from the corner grocery is the same boy who most successfully steals a base in the ball game. The stolen cabbage is a test of wits and legs against the policeman, who in his capacity of catcher is apparently provided for that very purpose. The stolen base is a test of wits and legs, with no after effects on the runner or catcher in the juvenile court, reformatory or prison. The boy who leads the gang of hoodlums against the blue-coated symbol of the law is the same boy who, under other conditions, leads the playground to order and fair play. The personal force is the same. The difference lies in the direction of its application. In a certain district in Chicago the number of cases in the juvenile court decreased one-half after a playground had been established. Everywhere the testimony of judges, supervisors and social workers is to similar results.
The test of economy again holds good. A playground is cheaper than a jail. Play is more attractive than vice, and the prevention of crime by the provision of a preferable substitute is a demonstrably sane and practicable municipal investment. When public opinion intelligently and forcibly demands, the funds are always forthcoming. The cost of an adequate playground system is a large item in the budget, and agitation must now concentrate upon this phase in order that the foundations may be laid for a robust motherhood and a vigorous citizenship for the next generation of city dwellers.
Dr. Witmer introduced Miss Ogilvie, head of the Social Service Department of the University Hospital, who said:
This hospital service is very new, so new as not to be known by many of the other hospitals in this city. It was started three years ago in the outpatient department of the Massachusetts General Hospital, and has become almost indispensable and so popular as to be established in at least fifteen of the large hospitals in the East. I do not know of any of the western hospitals, except one in Chicago, which has it. We started the work in the University Hospital just eighteen months ago, as an experiment, and after twelve months we decided it was of sufficient account to be made a permanent department of the hospital. During the first twelve months we spent most of our energy in what was most important to us, the tuberculosis work. Nearly a third of our patients were cases of tuberculosis. We gave instruction in hygiene, arranged for home treatment where we could, and where it was possible and the cases were suitable we sent them to sanatoria or hospitals. Another department of that work was securing proper employment for people who have tuberculosis. Just this morning I had a letter from a certain sanitarium asking if I could not send them a probationary nurse who might have tuberculosis in an incipient stage. 
They wrote that the nurse we sent three months ago had done such good work that they wanted another. While the work along this line seemed at times rather hopeless, we have accomplished a good deal. We have a great many neurotic cases and a great many cases with the simple request that we cheer them up. Sometimes the doctor could find no reason for the symptoms they had. Only yesterday we had a case of hysteria at the office. We tried to give her some good cheer. We have not really established that part of the work known as social therapeutics, in the way that Dr. Worcester is doing it in Massachusetts in the Emmanuel Church Movement, and yet I may say that we do a great deal of good right along the line of suggestion. It is of course impossible to state just how much good we have done, sitting in the office and giving advice to the people, instilling some hope into them and helping them along in the journey of life. To me the most interesting part of the work is the “steering” or conducting patients through the dispensary, sent from other sources. Last year we had
only 366 cases altogether, but 131 of them were patients sent in by other agencies to be conducted through, with the request that we send a report back. A good many were children and came mostly from the University Settlement House, the Society for Organizing Charity and Dr. Witmer’s Psychological Clinic. There were also some cases from the S. P. C. C. Perhaps you do not know, most of you, what it means to take a child so sent in, make a special case of him, and see that he gets the very best medical attention. I always try to see that the chief of a medical dispensary examines the child and gives the treatment. It is a little hard to get hold of the chief. He is always busy, but if possible I have Dr. Fussell see the child. We get his very expert diagnosis, treatment and advice, and we then take the child to the next dispensary, if necessary. For a long time doctors dealt with these cases with a feeling of hopelessness, because there was no one interested in them. Now that there are several persons interested in these cases, the doctor is willing to do his best, with the assurance that he will have intelligent co-operation, whereas before this bureau was established he had no means of knowing whether his orders would be carried out or not. If the patients were able to pay $25.00 for the advice of a specialist they could not be better attended to than they are at the dispensary. Last year a boy was sent to us by Dr. Witmer. Like most of the cases he sends us, this boy was about twelve years old. We sent the boy through five dispensaries, four in one day. It took a good deal of work to see that he was examined first at one dispensary, and in the last he waited a little later and was seen. After he had been examined in five dispensaries, it was found in four of them that he had some positive defect or ailment, for which he received treatment. 
This boy had quite a remarkable propensity for lying and stealing, and it is hardly necessary to say that his morals have improved to a great extent. As for this little girl Fannie, I cannot tell you how many dispensaries she has been through, but I went with her to many. She has a sister (Rose) sixteen years old. From her attitude and the hopeless expression on her face you would think her a woman of 60 or 65, that she had a dozen diseases and had lost her last child. When she came into the dispensary people remarked about her, saying, “Who is that poor girl?” She had been through at least five dispensaries and is always talking about her ailments. I found her living in the rear of a squalid tenement house, with no open space excepting an alley about eighteen inches wide. Her family might have a little air, but they keep the windows almost hermetically sealed, and three, four or five people sleep in one room. They have three rooms, one above another. We succeeded in enlisting the interest of the Jewish Young Women’s Union, and one of their workers is now arranging to place this girl, if the consent of the parents can be obtained, in a country home for a term of years.
Unless we go into the homes, in most cases we do not accomplish much. When we are asked either by the patients or by the doctors to go into the home we go, sometimes co-operating with another agency. Only yesterday I secured groceries from another agency for a destitute family.
Dr. Witmer: There has been in the City of Philadelphia for some years a psychological clinic. It was not called that, but the Magistrate’s Office. We have with us Magistrate Gorman, who made his work, in connection with the Juvenile Court, the work of a clinical psychologist.
Magistrate Gorman
I must say this in answer to the very complimentary and eulogistic introduction of Professor Witmer, that it shows how necessary the branch of study in which he is the pioneer is to the community, when I tell you that notwithstanding the efforts that I have made in this direction, after I have done all I can, I am still compelled to send cases to Dr. Witmer. I believe that I was to talk upon the Juvenile Court. I doubt very much whether you could spare me the time even to speak briefly on that subject. You have heard much that pertains to the good of the children, in all its various branches, and the Juvenile Court, as it was demonstrated in the two years and nine months when I had the honor of presiding, shows the real reasons why these children should be the subject of our special attention. If you sat with me in the magistrate’s office at the House of Detention, and saw day after day the cases of unfortunate children, I doubt very much whether you, like myself, would not be willing to devote your life to them. You might find there four or five small children with a father taken away by death, the mother bound to her children by natural affection, and willing to make any sacrifice to keep that flock together – locking them in the daytime – sometimes not locking them in but permitting them to run the streets, and taking the chances of their going to school or not. If we do not take up the child in his youth and give him what it was intended every child should have, that care, physical, moral and religious, we are neglecting a duty; and I have maintained again and again that the hundreds of thousands of adult prisoners who travel around in that terrible circle before the magistrates to-day are nothing more nor less than the neglected children of past generations. Are we going to have this dreadful line continued indefinitely and interminably?
It is greatly to be hoped that we are approaching the time when we will not have recorded, as we had at the beginning of this year in the annual report of the superintendent of our police, that there were 50,000 arrests made in Philadelphia during the year 1908. I am prepared to say with authority, that there would not have been 10,000 persons arrested by the
police of Philadelphia were it not for the fact that they were the neglected and unfortunate children of past generations. If I were to discuss the Juvenile Court, I would have to speak of its history, of its purposes and of its achievement. Its history in Philadelphia is like its history all over the United States. It is indeed a compliment to us as American citizens that we have had among us during the past four years, representatives from almost every foreign country coming to study and investigate the Juvenile Court System of the United States. The Juvenile Court idea was practically first conceived in Philadelphia. The first thought was of a separate house, where these little children could be kept apart from adults. It was not conceived by any public official, but by the Rev. Mr. Camp, who went to the prisons of Philadelphia and saw there sights which could not fail to elicit his charity. He gathered together a number of people in Philadelphia, Mr. Barnes of old Christ Church and several other equally philanthropic men, and they had a bill passed establishing the House of Detention, providing $25,000.00 was subscribed. Up to 1903 there was not $25,000.00 to provide for a House of Detention. After a second bill passed, we commenced operations in 1906. From 1906 to the present time I have had the pleasure to stand as the attorney and friend of the boy, and that is the only pleasure there was about it. It was an honor also to represent a new system. In the two years and nine months I was there I heard every boy, – who was not discharged by the lieutenant or “a friend,” – every boy that was arrested and sent to the House of Detention. During those two years and nine months I had 14,000 boys and girls before me in the House of Detention, and out of that 14,000 I had about 100 bad boys and girls; the rest were the victims of causes over which the child had absolutely no control. 
Out of the 14,000 who were in the Magistrate’s Court, less than 4000 were returned to the Juvenile Court, and I am proud of it. If I were back there again there would not be so many. Less than 4000 – and here is something to which I wish to devote a thought, because it is important. While we were the first city in the world to attempt to make history in this magnificent movement, we are the last and least efficient in developing that movement. We have a system in the city of Philadelphia such as exists nowhere else in these United States. It is without logic, without system and without result. In this city, after the case is heard and sent into court, it is sent before the judge of the Juvenile Court. We have fifteen judges and one sits each month. When I tell you that each of these judges sits but four out of the 365 days to hear the cases of children sent from the Juvenile Court, what good can you expect to be done for the child? The judges do their duty wonderfully well. This complaint is against the citizen. It is necessary that the judge should go along with the child from his first appearance in the Juvenile Court until he finds a place in some worthy home, or institution, but to sit but four days in the year and think you are
accomplishing some good, does not appeal to me as being a very systematic, efficient or logical way of clearing up this problem. What is the result? A boy appears before me and is discharged. He appears a second time in a month. He might be discharged. A third time he returns, and now I am quite sure he means to be bad. He is sent into the Juvenile Court and is sent home on probation. Sometimes it is good for him and sometimes it is not. It is good when there is a probation office to follow up the child, but if the child is meeting the probation officer once a week and is enjoying pink tea, while the probation officer does not know he has run away from home, you could not consider that good probationary work. Then after that he is in for the fourth time. The court thinks him a very bad boy, and says, “We will send him to the Protectory,” or “We will send him to the House of Refuge,” or some other reformatory institution. He may stay three or six months. If he runs away it is nobody’s business to look after him. He comes back to the city, and after three or four months he gets in trouble again and goes before another judge, who sends him home once more on probation. I want to say one word about our school system, since three have spoken about it. They have spoken about the special school, and I think this will be of interest to everyone connected with this movement. I believe with those who know anything about these unfortunate children, that there is but one grand defect in our school system. I do not agree with Mr. Cornman that much good is done by our special schools. I think they are breeding spots for crime. While they were originally intended to be schools for backward children or truant children, now those who are mentally deficient and morally deficient are sent to these schools, so that the backward children are mixed up with a lot of bad boys, and it does not require much thought to see what way those truants and backward boys are going. 
My experience is from the number I have dealt with, that the morally delinquent models the character of the other boys, and where you have one moral delinquent you have five others made so because of contact with him. My statistics show that within one year I have had 200 boys from special schools before me. There are 1200 in the special schools. That is just one-sixth, or 16 2/3 per cent, whom I have had in the magistrate’s office, arrested for some delinquency, who were members of a special school. This proves the charge I make that special schools should be restricted, or else they should be done away with altogether, and other schools put in their places. Miss Maguire has solved it as far as it can be solved without the Board of Education, – that is, to have a special class where the backward boy or truant is put under special care such as Dr. Witmer has explained this morning, instead of making new morally delinquent boys out of the others in the same class. I hope that your good work will result in the redemption, rejuvenation and repair of all our poor unfortunate children.
Mr. Edwin D. Solenberger was then introduced and spoke as follows:
The Pennsylvania Children’s Aid Society in common with other child-caring agencies finds that the homes from which its children come are much below the standard of the average home in the community. It is the rule rather than the exception to find that the physical, mental and moral development of children from such homes has been neglected to a greater or less extent. If the father has died leaving the mother with the burden of the support of the children or if the mother has died leaving the father a widower under the necessity of employing a poor housekeeper or placing his children to board with irresponsible persons, the children are likely to be still further neglected. The same result is likely to follow if the domestic life is shattered by the separation of the parents or by the immorality or desertion of one or the other. If either parent is stricken with a disease resulting in chronic illness of greater or less duration, the chances for proper parental attention to the children are greatly lessened. An industrial depression resulting in the idleness of the bread winners of the family still further decreases the chances of the children for proper care. The very fact that children are brought to the attention of child-caring agencies of any kind is often evidence in itself that the parents are lacking in intelligence or efficiency in the proper care of their own children. Unfortunately we have usually to add to the lack of proper care on the part of the parents, bad housing conditions and unfavorable neighborhood surroundings. These untoward conditions for the proper development and training of children are unfortunately not of short duration. Children are not usually made dependent, destitute, delinquent or reduced to a state of neglect in a day.
It is generally a long and gradual descent downward until the family is finally so demoralized as to call for intervention on the part of some public or private child-saving agency. From such sources as these, boys and girls come through the juvenile courts, from the almshouses, from the societies to protect children from cruelty, and from charitable associations, to be placed out in family homes by children’s aid societies or cared for in institutions. Is not this statement of sources from which the children are received a sufficient and urgent reason for making use of every available facility to help to arrive at a complete knowledge of the physical, mental and moral development of the child as a basis for wise action in providing care and treatment? Some method of examination, observation and study of the child such as is made possible through the Psychological Clinic conducted by Dr. Witmer at the University of Pennsylvania is of great value in a large number of cases. It is needed to supplement and complete the physical examination of the child made by the doctor. It is only by some such method as this that we can secure the proper interpretation and understanding of many of the physical defects which the doctor notes in his examination. On the other hand, after an examination, study and observation of the child by a trained psychologist, a further examination and study of the child by a
doctor in the light of what the psychologist has discovered is frequently of great help to both in their treatment of the case. Surely it is important in order to deal properly with the child to have a diagnosis made with respect to its memory, judgment, reason and general mental development. This is particularly true in view of the fact that such a large number of children dealt with by child-caring agencies are abnormal or subnormal by predisposition on account of their bad inheritance and unfavorable environment. The study and observation of children by the psychological clinic methods enables the child-helping agency to adapt its care and training to the needs of the child. It helps us to distinguish between permanent and temporary abnormalities; between characteristics of deficiency and characteristics of backwardness; and, between deficit and surplus in the mental development of the child. Progressive children’s agencies have long since recognized the value of a careful investigation by which they mean chiefly a study of the social and industrial relations of the family whose children are to be the objects of their care. There has also been a recognition to some extent of the value of a doctor’s examination of such children in order to guard against contagious disease and to protect the institution or society from receiving into its care the physically unfit. Should we not recognize the necessity of dealing with the child as a whole and considering not merely the social and industrial aspects of the family from which he comes and the more obvious physical conditions of the child, but also the finer and subtler question of his mental and moral development? Universities have already established experiment stations for the study of domestic animals and vegetation of all kinds. Bulletins of information are sent out to stock-raisers and farmers. 
Biology, chemistry and geology and other sciences have made some contribution toward the improvement of live stock, fruit and grain. May we not reasonably demand and expect some help toward the improvement of our methods of care and treatment of children from the psychologist, as well as from the doctor and the social worker?
Note 1. With acknowledgments to Mr. Joseph Lee.
14 Self-Motivation for Academic Attainment: The Role of Self-Efficacy Beliefs and Personal Goal Setting
Barry J. Zimmerman, Albert Bandura and Manuel Martinez-Pons
In recent years, there has been a growing interest in students’ self-regulation of their academic learning and performance (e.g., Corno, 1989; Harris, 1990; McCombs & Marzano, 1990; Paris & Newman, 1990; Pressley & Ghatala, 1990; and Zimmerman & Schunk, 1989). Academic self-regulation is concerned with the degree to which students are metacognitively, motivationally, and behaviorally proactive regulators of their own learning process (Zimmerman, 1986, 1990a). Self-regulated learners are not only distinguished by their proactive orientation and performance but also by their self-motivative capabilities. From a social cognitive perspective (Bandura, 1986; 1989b; 1991a), self-regulated learners direct their learning processes and attainments by setting challenging goals for themselves (Bandura, 1989c; Schunk, 1990), by applying appropriate strategies to achieve their goals (Zimmerman, 1989), and by enlisting self-regulative influences that motivate and guide their efforts (Bandura & Cervone, 1983, 1986). Self-regulated learners exhibit a high sense of efficacy in their capabilities, which influences the knowledge and skill goals they set for themselves and their commitment to fulfill these challenges (Zimmerman, 1989, 1990b). This conception of self-directed learning not only encompasses the cognitive skills emphasized by metacognitive
Source: American Educational Research Journal, 29(3) (1992): 663–676.
theorists, but also extends beyond to include the self-regulation of motivation, the learning environment, and social supports for self-directedness. An increasing body of evidence provides support for these assumptions. Experimental studies have shown that teaching low-achieving students to set proximal goals for themselves enhances their sense of cognitive efficacy, their academic achievement, and their intrinsic interest in the subject matter (Bandura & Schunk, 1981; Schunk, 1983). Numerous studies have shown that students with a high sense of academic efficacy display greater persistence, effort, and intrinsic interest in their academic learning and performance (Schunk, 1984, 1989). Finally, a growing body of correlational research indicates that self-regulated learners make greater use of learning strategies and achieve better than do learners who make little use of self-directed learning strategies (Zimmerman & Martinez-Pons, 1986, 1988, 1990). To date, however, the causal impact of students’ self-efficacy beliefs, personal goal setting, and use of learning strategies on their academic grades under natural school conditions has not been systematically examined. According to social cognitive theory (Bandura, 1986, 1991b), goals increase people’s cognitive and affective reactions to performance outcomes because goals specify the requirements for personal success. Goals also prompt self-monitoring and self-judgments of performance attainments (Bandura & Cervone, 1983, 1986; Locke, Cartledge, & Knerr, 1970). However, self-regulation of motivation depends on self-efficacy beliefs as well as on personal goals. Perceived self-efficacy influences the level of goal challenge people set for themselves, the amount of effort they mobilize, and their persistence in the face of difficulties. Perceived self-efficacy is theorized to influence performance accomplishments both directly and indirectly through its influences on self-set goals.
This hypothesized relationship has been tested and verified in organizational research (Bandura & Wood, 1989; Wood & Bandura, 1989). In a recent comprehensive review, Locke and Latham (1990) provide substantial evidence that externally assigned goals in organizational settings can influence personally set goals. Parental goals might be expected to have a similar impact on children’s goals. Although parental aspirations for children have been found to affect children’s achievement (e.g., Henderson, 1981; Majoribanks, 1978), the influences of parental goal setting have received little attention to date. From a social cognitive perspective, students’ personal goal setting is influenced jointly by their self-beliefs of efficacy and the goals their parents set for them. In addition, strategies for regulating self-motivating processes as well as academic learning processes play an important role. Zimmerman and Martinez-Pons (1992) have reviewed evidence corroborating a close link between students’ use of self-regulated learning strategies and their perceptions of academic efficacy. However, the impact of students’ perceived self-efficacy for using self-regulated learning strategies has not been tested directly. Recently Bandura (1989a) developed multidimensional scales for measuring perceived self-regulatory efficacy for academic achievement, and
children’s perceived self-efficacy in other domains of functioning. The scales for perceived self-efficacy for self-regulated learning assess students’ perceived capability to use a variety of self-regulated learning strategies such as planning and organizing their academic activities, transforming instructional information using cognitive strategies to understand and remember material being taught, resisting distractions, motivating themselves to complete school work, structuring environments conducive to study, and participating in class. These items were developed to measure learning strategies reported by high school students during structured interviews (Zimmerman & Martinez-Pons, 1986, 1988). Perceived self-efficacy for academic achievement items assessed students’ beliefs in their capability to learn nine areas of course work, ranging from mathematics to foreign language proficiency. It was hypothesized that students’ perceived efficacy to use self-regulated learning strategies would enhance their perceived efficacy to achieve in their academic courses. The following conceptual model of self-regulated motivation and academic learning was tested: Students’ perceived self-regulatory efficacy would influence their perceived self-efficacy for academic achievement, and their efficacy should, in turn, influence their personal goals and grade achievement. These causal paths are depicted in Figure 1. Following Locke and Latham (1990), a second causal path was hypothesized linking parents’ academic goals to their children’s personal goals, which in turn are linked to their academic grades. Because self-efficacy beliefs and parental and student grade goals are expected to be influenced by prior academic achievement, the latter variable was entered as an antecedent influence in the causal model. The inclusion of prior achievement will indicate whether self-regulatory efficacy and goal factors make independent contributions to subsequent academic achievement. 
Path analysis was used to test the sociocognitive model of academic self-motivation and achievement.
Figure 1: A causal model of student self-motivation (variables shown: Self-Efficacy for Self-Regulated Learning, Self-Efficacy for Academic Achievement, Prior Grades, Parent Grade Goals, Student Grade Goals, Final Grades)
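The path-analytic logic described above can be sketched as a series of standardized regressions, one for each variable that the model treats as an outcome. The sketch below is illustrative only. The data are simulated, every generating weight is invented, the variable names are hypothetical, and the set of paths reflects one reading of the hypothesized model, not the authors' estimation procedure or results.

```python
import numpy as np


def standardize(x):
    """Convert a variable to z-scores (mean 0, standard deviation 1)."""
    return (x - x.mean()) / x.std()


def path_coefficients(outcome, predictors):
    """Standardized OLS weights of `outcome` on each named predictor."""
    X = np.column_stack([standardize(p) for p in predictors.values()])
    beta, *_ = np.linalg.lstsq(X, standardize(outcome), rcond=None)
    return dict(zip(predictors.keys(), beta))


# Synthetic data: all weights below are invented for illustration.
rng = np.random.default_rng(0)
n = 5000
prior = rng.normal(size=n)                                  # prior grades
srl_eff = 0.3 * prior + rng.normal(size=n)                  # self-regulatory efficacy
acad_eff = 0.5 * srl_eff + 0.2 * prior + rng.normal(size=n)
parent_goal = 0.3 * prior + rng.normal(size=n)
student_goal = (0.4 * acad_eff + 0.4 * parent_goal
                + 0.1 * prior + rng.normal(size=n))
final = (0.3 * acad_eff + 0.4 * student_goal
         + 0.2 * prior + rng.normal(size=n))

# One regression per outcome variable in the assumed model.
paths = {
    "srl_eff": path_coefficients(srl_eff, {"prior": prior}),
    "acad_eff": path_coefficients(
        acad_eff, {"srl_eff": srl_eff, "prior": prior}),
    "parent_goal": path_coefficients(parent_goal, {"prior": prior}),
    "student_goal": path_coefficients(
        student_goal,
        {"acad_eff": acad_eff, "parent_goal": parent_goal, "prior": prior}),
    "final": path_coefficients(
        final,
        {"acad_eff": acad_eff, "student_goal": student_goal, "prior": prior}),
}

# Indirect effect of academic efficacy on final grades through student goals:
# the product of the two path coefficients along that route.
indirect = paths["student_goal"]["acad_eff"] * paths["final"]["student_goal"]

for outcome, weights in paths.items():
    for predictor, beta in weights.items():
        print(f"{predictor} -> {outcome}: {beta:.2f}")
print(f"acad_eff -> student_goal -> final (indirect): {indirect:.2f}")
```

Entering prior grades as a predictor in every equation mirrors the role the text gives it as an antecedent influence: the remaining coefficients then describe contributions that are independent of earlier achievement.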
Method
Sample
From two high schools in a large Eastern city, 116 ninth and tenth graders were selected to participate in this study. However, 2 students were dropped due to parental refusal to participate, and 12 students were dropped for parental failure to return their questionnaires. Thus, 102 students participated: 50 boys and 52 girls. The schools served lower middle-class neighborhoods, and the students were 17% Asian, 34% Black, 23% Hispanic, and 24% White; 2% did not report their ethnicity. Each of five teachers who taught ninth- and tenth-grade classes in social studies agreed to include one of his or her randomly selected classes in the study. Social studies was selected because the course was required of all students and was not subject to academic tracking according to ability. It thus provided a representative sample of the students attending the high schools.
Student Perceived Self-Efficacy
Two subscales from the Children’s Multidimensional Self-Efficacy Scales (Bandura, 1989a) were selected for use in this study: self-efficacy for self-regulated learning and self-efficacy for academic achievement. The self-efficacy for self-regulated learning scale included 11 items that measured students’ perceived capability to use a variety of self-regulated learning strategies. Previous research on students’ use of these learning strategies revealed a common self-regulation factor (Zimmerman & Martinez-Pons, 1988), thus providing a basis for aggregating items in a single scale. The self-efficacy for academic achievement scale was composed of nine items that measured students’ perceived capability to achieve in nine domains: mathematics, algebra, science, biology, reading and writing language skills, computer use, foreign language proficiency, social studies, and English grammar. The items of both sets of self-efficacy scales are listed in Table 1. For each item, students rated their perceived self-efficacy according to a 7-point scale. The descriptions were “not well at all” for a rating of 1, “not too well” for 3, “pretty well” for 5, and “very well” for 7.
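As a rough illustration of aggregating items into a single scale, the sketch below averages the item means listed in Table 1. Averaging is an assumed aggregation rule (the text says only that items were aggregated), and the function name is hypothetical. The mean of the item means equals the scale mean only if every student answered every item.

```python
# Item means transcribed from Table 1
# (7-point scale: 1 = not well at all, 7 = very well).
srl_item_means = [4.84, 3.49, 4.30, 5.34, 4.69, 4.10,
                  4.85, 4.67, 4.16, 4.42, 4.88]
academic_item_means = [5.33, 4.81, 5.35, 4.83, 5.97,
                       5.63, 4.41, 5.01, 5.14]


def scale_score(item_ratings):
    """Average item ratings into one scale score (assumed aggregation rule)."""
    return sum(item_ratings) / len(item_ratings)


srl_mean = scale_score(srl_item_means)
academic_mean = scale_score(academic_item_means)

print(f"Self-regulated learning efficacy: {srl_mean:.2f}")
print(f"Academic achievement efficacy:    {academic_mean:.2f}")
```

Under this assumption the two averages come out near 4.52 and 5.16, close to the scale means of 4.53 and 5.16 given in the Results section. The small gap on the first scale may reflect rounding of the item means or missing responses.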
Table 1: Self-efficacy item means and standard deviations
How well can you:                                              M      SD
Self-efficacy for self-regulated learning
 1. finish homework assignments by deadlines?                  4.84   1.63
 2. study when there are other interesting things to do?       3.49   1.50
 3. concentrate on school subjects?                            4.30   1.39
 4. take class notes of class instruction?                     5.34   1.52
 5. use the library to get information for class assignments?  4.69   1.76
 6. plan your schoolwork?                                      4.10   1.01
 7. organize your schoolwork?                                  4.85   1.63
 8. remember information presented in class and textbooks?     4.67   1.50
 9. arrange a place to study without distractions?             4.16   1.84
10. motivate yourself to do schoolwork?                        4.42   1.79
11. participate in class discussions?                          4.88   1.71
Self-efficacy for academic achievement
 1. learn general mathematics?                                 5.33   1.66
 2. learn algebra?                                             4.81   1.91
 3. learn science?                                             5.35   1.43
 4. learn biology?                                             4.83   1.72
 5. learn reading and writing language skills?                 5.97   1.30
 6. learn to use computers?                                    5.63   1.50
 7. learn foreign languages?                                   4.41   1.85
 8. learn social studies?                                      5.01   1.49
 9. learn English grammar?                                     5.14   1.47
Grade Goals
Both the students’ and their parents’ grade goals were assessed using rating scales developed by Locke and Bryan (1968). They examined numerous variations in question formats and found two important measures for deriving valid ratings of college students’ academic grade goals: one’s expected grade and the grade one regarded as minimally satisfying. Locke and Bryan found these two measures to be highly correlated, r = .67, and recommended using them together to provide an index of goal setting. For purposes of the present study, students rated their goal expectation and the lowest academic grade they would find satisfying in terms of 5 grade levels. Five response options were 1 = F (0–59%), 2 = D (60–69%), 3 = C (70–79%), 4 = B (80–89%), and 5 = A (90–100%). Percentages were included along with letter grades because of the widespread use of this dual system of grading in these schools. To provide the most reliable measure of students’ grade goals, the two items were combined into a single measure of students’ grade goals. Two parallel goal items were developed for the parents regarding the goal levels they held for their children in the social studies class: (a) What academic grade do you expect your child to receive in the social studies course? and (b) What is the lowest academic grade you find satisfying for your child in this course? Parents recorded their goals using letter-grade response options. These two items were combined in a single measure of parental grade goals.
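The grade-goal coding described above can be sketched in a few lines. The helper names are hypothetical. Averaging the two items is an assumption, since the text says only that they were combined. The Spearman-Brown formula is a standard psychometric result that the source does not name, and it is added here only to illustrate why combining two correlated items can yield a more reliable index than either item alone.

```python
# Lower bound (percent) and level for each response option, highest first.
GRADE_BANDS = [(90, 5), (80, 4), (70, 3), (60, 2), (0, 1)]


def grade_level(percent):
    """Map a percentage grade onto the 5-level scale (1 = F ... 5 = A)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must lie between 0 and 100")
    for lower_bound, level in GRADE_BANDS:
        if percent >= lower_bound:
            return level


def grade_goal(expected_level, lowest_satisfying_level):
    """Combine the two goal items by averaging (assumed combination rule)."""
    return (expected_level + lowest_satisfying_level) / 2


def two_item_reliability(r):
    """Spearman-Brown reliability of a two-item composite with correlation r."""
    return 2 * r / (1 + r)


# Hypothetical student: expects a B (85%), would be satisfied with a C (72%).
goal = grade_goal(grade_level(85), grade_level(72))
print(f"Combined grade goal: {goal}")
print(f"Two-item reliability at r = .67: {two_item_reliability(0.67):.2f}")
```

With the inter-item correlation of .67 attributed to Locke and Bryan, the formula gives a composite reliability of roughly .80, noticeably higher than the correlation between the single items.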
Procedure
The self-efficacy and goal-setting scales were included in a questionnaire that was administered in the students’ social studies class. In addition, the students were asked to provide demographic information (sex, grade, age, ethnicity) and their identification number but not their names. They were assured of anonymity and that only the investigators would see their answers. The questionnaires were administered shortly after the semester began in their social studies class. The parents’ questionnaires were sent home with the students with instructions. The parents’ completed forms were returned to the school by the students in sealed envelopes. At the end of the semester, the teachers provided the final grades for their students in the social studies course. Each student’s grade in social studies for the prior year was obtained from school records. Prior academic achievement was selected because it was the most recent indicant of achievement and was identical in numerical form and academic content to each student’s final grade in social studies. From a social cognitive perspective, this measure provided the most relevant previous academic experience that could influence students’ perceptions of their efficacy and goal setting.
Results
Cronbach alpha reliability tests were performed for each of the scales used in the present study. The two self-efficacy scales proved to be reliable. A coefficient of .87 was found for the 11-item self-efficacy for self-regulated learning scale, and a coefficient of .70 was found for the 9-item self-efficacy for academic achievement scale. The two student grade goal items correlated .65, which was similar to the correlation of .67 reported by Locke and Bryan (1968) for college students. The two parent grade goal items correlated .45. The Cronbach reliability coefficients were .80 for the student goal items and .63 for the parent goal items.
The means and standard deviations for the two self-efficacy scores are presented in Table 1. With regard to self-efficacy for self-regulated learning, students rated their efficacy lowest for being able to get themselves to study when there were other more interesting things to do (M = 3.49) and highest for being able to take notes on class instruction (M = 5.34). With regard to self-efficacy for academic achievement, they rated their efficacy lowest for learning foreign languages (M = 4.41) and highest for learning reading and writing language skills (M = 5.97).
The means and standard deviations for each of the variables in the causal model are presented in Table 2. The mean for the final grade in the social studies course was 3.75, which falls between a B (4) and a C (3). The students’ mean grade in their prior social studies course was slightly lower at 3.43. The average perceived self-efficacy for self-regulated learning was 4.53, which was below the “pretty well” level. The average perceived self-efficacy for academic learning was 5.16, slightly above the rating of “pretty well.” Students’ grade goals were 3.21 for their expected grade and 3.16 for the lowest satisfying grade. Their parents’ expected grade was 4.22, and the lowest satisfying
Table 2: Means and standard deviations for self-efficacy, goals, and grades

Variables                                    M       SD
1. Prior grades                              3.43     .91
2. Efficacy for self-regulated learning*     4.53    1.07
3. Efficacy for academic achievement*        5.16     .86
4. Parent grade goals                        7.96    1.08
5. Student grade goals                       6.37    1.61
6. Final grades                              3.75     .81

Note: N = 102. *Average item rating.
grade was 3.74. The parents’ grade goals were significantly higher than their children’s on both items (t(101) = 8.16, p < .01). Correlation coefficients for the different variables are in Table 3. Students’ prior grade in social studies correlated significantly with their perceived academic self-efficacy, r = .22, their grade goal, r = .23, their parents’ grade goal, r = .26, and their final grade in the course, r = .23. Students’ perceived efficacy for academic achievement correlated significantly with their grade goals, r = .41, and with their final grades in social studies, r = .39. The grade goals of the parents correlated significantly with their child’s grade goals, r = .41. The students’ personal grade goals were related to their final grades in social studies, r = .52. Finally, students’ perceived efficacy for self-regulated learning correlated significantly with their self-efficacy for academic achievement, r = .51. Before testing the proposed model of self-motivation, two background factors were examined, namely, the students’ school and class membership. The scores of the students on each of the variables in the model were compared between the two schools and between each class and the remaining four classes in a series of regression analyses. None of these comparisons yielded any significant differences for students’ prior grades, for their grade goals or their parents’ goals, for students’ self-regulatory and academic self-efficacy, or for their final grades. The largest unstandardized beta weight, –.65, was
Table 3: Correlations among measures of self-efficacy, goals, and grades

Variables                                    1       2       3       4       5       6
1. Prior grades                              1.00
2. Efficacy for self-regulated learning       .14    1.00
3. Efficacy for academic achievement          .22*    .51*   1.00
4. Parent grade goals                         .26*    .15     .14    1.00
5. Student grade goals                        .23*    .30*    .41*    .41*   1.00
6. Final grades                               .23*    .16     .39*    .26*    .52*   1.00

Note: N = 102. *p < .05.
nonsignificant. As a result neither of these two background factors was included in the final path model. The causal structure of the path model is presented in Figure 2. The path analysis was conducted using SPSS (Statistical Package for the Social Sciences) procedures (Nie, Hull, Jenkins, Steinbrenner, & Bent, 1975). A multivariate test for the fit of the model indicated no significant divergence, chi-square (8) = 3.78, ns. Thus, none of the causal paths excluded from the model was statistically significant. The model of self-motivation and students’ prior grade achievement was predictive of their final grade in their social studies course, R = .56, p < .01, and accounted for 31% of the variance in their academic attainment. The path between students’ prior grade in social studies and their parents’ grade goal for them was significant, P = .26, but none of the other paths between the students’ prior grade and motivation factors attained significance. Nor was the direct path between prior grade and final grade in social studies significant when the impact of self-motivation factors was controlled statistically. As hypothesized, a significant causal path was found between students’ perceived efficacy for self-regulated learning and their efficacy for academic achievement, P = .51. Students’ perceived self-efficacy for academic achievement predicted both their final grade in the course, P = .21, and their personal goals, P = .36. Students’ grade goals were significantly predictive of their grades in social studies, P = .43. The combined direct and indirect (i.e., via student goals) causal effect of students’ perceived self-efficacy for academic achievement on their final grades was P = .37, p < .05. Parental grade goals were also causally related to student personal goal setting, P = .36. The direct paths from student self-efficacy for self-regulated learning to their grade outcomes and from their parents’ grade goals to students’ grade outcomes were not significant.
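The arithmetic behind the combined effect and the variance figure reported above can be retraced with a short sketch. It assumes the standard path-analysis rule that an indirect effect is the product of the coefficients along its path; the function and variable names are illustrative and do not come from the original SPSS analysis.

```python
# Standardized path coefficients as reported in the text.
EFFICACY_TO_FINAL = 0.21   # academic self-efficacy -> final grade (direct path)
EFFICACY_TO_GOALS = 0.36   # academic self-efficacy -> student grade goals
GOALS_TO_FINAL = 0.43      # student grade goals -> final grade
MULTIPLE_R = 0.56          # multiple correlation of the full model


def total_effect(direct, *indirect_paths):
    """Return the direct effect plus the product of coefficients on each indirect path."""
    total = direct
    for path in indirect_paths:
        product = 1.0
        for coefficient in path:
            product *= coefficient
        total += product
    return total


# Direct path plus the single indirect path that runs through student goals.
combined = total_effect(EFFICACY_TO_FINAL, (EFFICACY_TO_GOALS, GOALS_TO_FINAL))

# Proportion of variance in final grades accounted for by the model.
variance_explained = MULTIPLE_R ** 2

print(f"combined effect from rounded coefficients: {combined:.3f}")
print(f"variance explained: {variance_explained:.0%}")
```

With the rounded coefficients the combined effect comes to roughly .365, which is close to the reported .37; the small gap presumably reflects rounding in the published values. Squaring R = .56 reproduces the reported 31%.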
[Figure 2: path diagram, not reproduced. Variables shown: Prior Grades, Self-Efficacy for Self-Regulated Learning, Self-Efficacy for Academic Achievement, Parent Grade Goals, Student Grade Goals, Final Grades. Coefficients shown: .51*, .26*, .21*, .36*, .15, .36*, .43*.]
Figure 2: Path coefficients between variables in the sociocognitive model of students’ self-motivation and class grades (*p < .05)
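As a rough cross-check on the reliability coefficients reported for the two-item goal measures in the Results, the sketch below applies the standardized-alpha (Spearman-Brown) formula to the reported inter-item correlations. This is an assumption about how such values can be approximated, not a reconstruction of the authors' computation, and the function name is illustrative.

```python
def standardized_alpha(mean_inter_item_r, n_items):
    """Standardized Cronbach alpha from the mean inter-item correlation."""
    return (n_items * mean_inter_item_r) / (1 + (n_items - 1) * mean_inter_item_r)


# Inter-item correlations reported in the text for the two-item goal measures.
student_alpha = standardized_alpha(0.65, 2)
parent_alpha = standardized_alpha(0.45, 2)

print(f"student goal items: {student_alpha:.2f}")
print(f"parent goal items: {parent_alpha:.2f}")
```

The results, about .79 and .62, sit within .01 of the reported coefficients of .80 and .63. The remaining difference may reflect rounding of the correlations or computational details that the text does not give.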
Discussion
The present investigation tested the predictiveness of several self-motivational factors for students’ academic achievement in the naturalistic context of a high school social studies class. It was an initial effort to test a social cognitive model of academic self-motivation for subsequent academic achievement in a regular class. The proposed model provided a statistically adequate fit for the obtained data, with perceived self-efficacy for academic achievement and student goals accounting for 31% of the variance in the students’ academic course attainment. Although the selected self-motivational factors make a significant contribution to academic attainment, a major portion of the variance in student grades under these natural conditions remains unexplained. Social cognitive theory encompasses additional motivators and guides for performance (Bandura, 1986; 1991b). For example, outcome expectations in the form of anticipatory social and self-evaluative consequences operate as significant contributors to personal attainments. The fully expanded sociocognitive model would most likely account for even more of the variance in academic attainment. Inferences about causality are often difficult to make in field research because of many uncontrolled background sources of variance. However, two key extraneous variables in this study, students’ school and class membership, were eliminated through a series of regression analyses as alternative explanations for the results. Students’ prior achievement was entered first in the path model as a potential determinant of perceived self-efficacy, students’ and parental goals, and their final academic attainment.
The use of this measure of prior achievement provided a particularly stringent test of the independent contribution of the self-regulatory factors because this measure (prior course grade) was obtained recently, was directly relevant to the current course, and was identical metrically for the two academic learning periods. Although prior grade attainment correlated with student academic self-efficacy and goal setting, it operated only through parents’ academic goals for their children. Apparently, parents rely on their children’s prior grade accomplishments when they set goals for their children; however, their children rely on their self-efficacy beliefs as well as their parents’ aspirations for them when setting their goals. Interestingly, the direct path of influence between students’ prior grades and final grades was not significant. This suggests that self-regulatory factors not only mediated the influence of prior achievement, but also contributed independently to students’ academic attainment. Whereas prior grades were correlated, r = .23, with subsequent grades, perceived self-efficacy and academic goals combined to produce a multiple correlation of .56. This represents an increase of 26% in predicted variance in final academic attainment. It might be questioned whether the size of these self-motivation results was affected by the low correlation between students’ prior grades and their final
grades in the course. Would a standardized test measure of prior achievement correlate better and correspondingly reduce the relative predictiveness of self-regulation measures? Recently, a highly regarded standardized test was compared with another academic self-regulation measure, time-management ratings. Britton and Tesser (1991) found that college students’ Scholastic Aptitude Test (SAT) scores correlated r = .20 with their cumulative grade point average during the freshman and sophomore years. In contrast, time-management questionnaire items predicted substantially more variance (R² = .21) than SAT items did (R² = .05). These researchers concluded that the best predictor, a time attitude factor, “seems very much like self-efficacy. Subjects report feelings of being in charge of their own time” (Britton & Tesser, 1991, p. 409). Although this issue merits further study, there is no reason to assume that the results would have been affected substantially by using standardized test scores in place of prior grades in the same subject matter area. The path analyses provided support for a social cognitive view of academic self-regulation. As expected, personal goals played a key role in students’ attainment of grades in school. These self-set goals committed the students to specific grade achievements for positive self-evaluation. In accord with prior research, the higher the perceived self-efficacy, the higher the goals students set for themselves (Bandura, 1992; Locke & Latham, 1990). Self-efficacy influenced not only students’ setting of academic goals for themselves, but also their achievement of these goals. The direct and indirect influence of students’ perceived academic self-efficacy on academic attainment produced a combined effect of .37. As might be anticipated, parents’ academic goals for their children were significantly higher than those their children set for themselves.
However, the influence of parents’ grade goals on their children’s goals was tempered by the youngsters’ beliefs in their academic efficacy. This finding indicates that academic attainment is regulated, in large part, through self-motivating influences. Further research is needed to determine how parents socially influence their children’s goal setting. The findings of this study correspond with what many parents and teachers have learned from frustrating personal experiences: Students often do not adopt the high academic aspirations imposed upon them. Clearly, a determinant of student aspirations is their belief in their academic efficacy. Efforts to foster academic achievement need to do more than simply set demanding standards for students. They need to structure academic experiences in a way that enhances students’ sense of academic efficacy as well. The significant path between parents’ grade goals and their children’s provides some initial evidence of a causal linkage of goal setting in the academic domain. Previously, most of the assigned and participant goal setting research had been conducted on performance in the organizational domain. The evidence of parental social influence is important because parents can only monitor their children’s goal setting and performance in school indirectly. As might be expected, the strength of parental influence on their children’s
goal setting, r = .41, is lower than that reported in research in nonacademic settings, r = .58, by Locke and Latham (1990). In organizational applications, goals are used primarily to enhance use of preexisting skills; in the academic domain, goals are used to develop knowledge and cognitive skills. It is easier to use skills than to develop them originally. Additionally, the social dynamics of the two settings are different. Nevertheless, the present results show that social influences on motivational processes underlying academic self-regulated learning are similar to those underlying performance in nonacademic settings. A sizable body of research demonstrates that students’ use of learning strategies promotes academic achievement (Pressley, Borkowski, & Schneider, 1987; Weinstein, Goetz, & Alexander, 1988; Zimmerman & Martinez-Pons, 1986, 1988, 1990). However, there is also evidence (Borkowski & Cavanaugh, 1979; Kramer & Engle, 1981) that knowledge of learning strategies does not ensure their effective and consistent use. The present results indicate that student self-beliefs of efficacy to strategically regulate learning play an important role in academic self-motivation. A significant causal path was found linking efficacy for self-regulated learning, efficacy for academic achievement, and academic attainment. Students who perceive themselves as capable of regulating their own activities strategically are more confident about mastering academic subjects and attain higher academic performance. In conclusion, perceived efficacy to achieve motivates academic attainment both directly and indirectly by influencing personal goal setting. Self-efficacy and goals in combination contribute to subsequent academic attainments.
However, substantial variance in student achievement remains to be explained, and future research efforts need to focus on additional self-regulatory factors, such as self-monitoring, judgmental processes, and self-reactive influences (Zimmerman, 1989), as well as other influences, such as reading ability and home environment measures, that might affect children’s academic pursuits. Research also needs to be extended to include interviews and behavioral measures of academic studying as well as survey rating scales. Such an expanded model is likely to provide a more complete picture of the relative role of self-regulatory factors in student academic achievement.
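The Discussion's comparison between prior grades alone and the full self-motivation model can be retraced with a brief sketch. It assumes that "increase in predicted variance" means the difference between the two squared correlations; the names are illustrative and not taken from the original analysis.

```python
def variance_explained(correlation):
    """Proportion of variance accounted for by a (multiple) correlation."""
    return correlation ** 2


R_PRIOR_GRADES = 0.23   # zero-order correlation of prior grades with final grades
R_FULL_MODEL = 0.56     # multiple correlation reported for the path model

# Gain in explained variance when the self-motivation factors are included.
increment = variance_explained(R_FULL_MODEL) - variance_explained(R_PRIOR_GRADES)

print(f"prior grades alone: {variance_explained(R_PRIOR_GRADES):.0%}")
print(f"full model: {variance_explained(R_FULL_MODEL):.0%}")
print(f"increase: {increment:.0%}")
```

Under that reading, prior grades alone account for about 5% of the variance in final grades and the full model for about 31%, which gives the reported increase of 26%.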
References
Bandura, A. (1986). Social foundations of thought and action: A social cognitive theory. Englewood Cliffs, NJ: Prentice-Hall.
Bandura, A. (1989a). Multidimensional scales of perceived self-efficacy. Unpublished test, Stanford University, Stanford, CA.
Bandura, A. (1989b). Human agency in social cognitive theory. American Psychologist, 77, 122–147.
Bandura, A. (1989c). Self-regulation of motivation and action through internal standards and goal systems. In L.A. Pervin (Ed.), Goal concepts in personality and social psychology (pp. 19–38). Hillsdale, NJ: Erlbaum.
Bandura, A. (1991a). Social cognitive theory of self-regulation. Organizational Behavior and Human Decision Processes, 50, 248–287.
Bandura, A. (1991b). Self-regulation of motivation through anticipatory and self-reactive mechanisms. In R.A. Dienstbier (Ed.), Perspectives on motivation: Nebraska symposium on motivation (Vol. 38, pp. 69–164). Lincoln: University of Nebraska Press.
Bandura, A., & Cervone, D. (1983). Self-evaluative and self-efficacy mechanisms governing the motivational effects of goal systems. Journal of Personality and Social Psychology, 45, 1017–1028.
Bandura, A., & Cervone, D. (1986). Differential engagement of self-reactive influences in cognitive motivation. Organizational Behavior and Human Decision Processes, 38, 92–113.
Bandura, A., & Schunk, D.H. (1981). Cultivating competence, self-efficacy, and intrinsic interest through proximal self-motivation. Journal of Personality and Social Psychology, 41, 586–598.
Bandura, A., & Wood, R.E. (1989). The perceived controllability and performance standards on self-regulation of complex decision-making. Journal of Personality and Social Psychology, 56, 805–814.
Borkowski, J.G., & Cavanaugh, J.C. (1979). Maintenance and generalization of skills and strategies by the retarded. In N.R. Ellis (Ed.), Handbook of mental deficiency, psychological theory, and research (2nd ed., pp. 569–617). Hillsdale, NJ: Erlbaum.
Britton, B.K., & Tesser, A. (1991). Effects of time-management practices on college grades. Journal of Educational Psychology, 83, 405–410.
Corno, L. (1989). Self-regulated learning: A volitional analysis. In B.J. Zimmerman & D.H. Schunk (Eds.), Self-regulated learning and academic achievement: Theory, research, and practice (pp. 111–141). New York: Springer Verlag.
Harris, K. (1990). Developing self-regulated learners: The role of private speech and self-instructions. Educational Psychologist, 25, 35–49.
Henderson, R.W. (1981). Home environment and intellectual performance. In R.W. Henderson (Ed.), Parent-child interaction (pp. 3–32). New York: Academic Press.
Kramer, J.J., & Engle, R.W. (1981). Teaching awareness of strategic behavior in combination with strategy training: Effects on children’s memory performance. Journal of Experimental Child Psychology, 32, 513–530.
Locke, E.A., & Bryan, J. (1968). Grade goals as determinants of academic achievement. Journal of General Psychology, 79, 217–228.
Locke, E.A., Cartledge, N., & Knerr, C. (1970). Studies of the relationship between satisfaction, goal-setting, and performance. Organizational Behavior and Human Performance, 5, 135–158.
Locke, E.A., & Latham, G.P. (1990). A theory of goal-setting and task performance. Englewood Cliffs, NJ: Prentice-Hall.
Marjoribanks, K. (1978). Ethnic families and children’s achievement. Sydney: George Allen & Unwin.
McCombs, B.L., & Marzano, R.J. (1990). Putting the self in self-regulated learning: The self as agent in integrating will and skill. Educational Psychologist, 25, 51–70.
Nie, N.H., Hull, C.H., Jenkins, J.G., Steinbrenner, K., & Bent, D.H. (1975). Statistical package for the social sciences (2nd ed.). New York: McGraw-Hill.
Paris, S.G., & Newman, R.S. (1990). Developmental aspects of self-regulated learning. Educational Psychologist, 25, 87–105.
Pressley, M., Borkowski, J.G., & Schneider, W. (1987). Cognitive strategies: Good strategy users coordinate metacognition and knowledge. Annals of Child Development, 4, 89–129.
Pressley, M., & Ghatala, E.S. (1990). Self-regulated learning: Monitoring learning from text. Educational Psychologist, 25, 19–33.
Schunk, D.H. (1983). Goal difficulty and attainment information: Effects on children’s behaviors. Human Learning, 25, 107–117.
Schunk, D.H. (1984). The self-efficacy perspective on achievement behavior. Educational Psychologist, 19, 119–218.
Schunk, D.H. (1989). Social cognitive theory and self-regulated learning. In B.J. Zimmerman & D.H. Schunk (Eds.), Self-regulated learning and academic achievement: Theory, research, and practice (pp. 83–110). New York: Springer Verlag.
Schunk, D.H. (1990). Goal setting and self-efficacy during self-regulated learning. Educational Psychologist, 25, 71–86.
Weinstein, C.E., Goetz, E.T., & Alexander, P.A. (1988). Learning and study strategies: Issues in assessment instruction and evaluation. San Diego, CA: Academic Press.
Wood, R.E., & Bandura, A. (1989). Impact of conceptions of ability on self-regulatory mechanisms and complex decision-making. Journal of Personality and Social Psychology, 56, 407–415.
Zimmerman, B.J. (1986). Development of self-regulated learning: Which are the key subprocesses? Contemporary Educational Psychology, 16, 307–313.
Zimmerman, B.J. (1989). A social cognitive view of self-regulated learning. Journal of Educational Psychology, 81, 329–339.
Zimmerman, B.J. (1990a). Self-regulated learning and academic achievement: An overview. Educational Psychologist, 25, 3–17.
Zimmerman, B.J. (1990b). Self-regulating academic learning and achievement: The emergence of a social cognitive perspective. Educational Psychology Review, 2, 173–201.
Zimmerman, B.J., & Martinez-Pons, M. (1986). Development of a structured interview for assessing student use of self-regulated learning strategies. American Educational Research Journal, 23, 614–628.
Zimmerman, B.J., & Martinez-Pons, M. (1988). Construct validation of a strategy model of student self-regulated learning. Journal of Educational Psychology, 80, 284–290.
Zimmerman, B.J., & Martinez-Pons, M. (1990). Student differences in self-regulated learning: Relating grade, sex, and giftedness to self-efficacy and strategy use. Journal of Educational Psychology, 82, 51–59.
Zimmerman, B.J., & Martinez-Pons, M. (1992). Perceptions of efficacy and strategy use in the self-regulation of learning. In D.H. Schunk & J. Meese (Eds.), Student perceptions in the classroom: Causes and consequences. Hillsdale, NJ: Erlbaum.
Zimmerman, B.J., & Schunk, D.H. (Eds.). (1989). Self-regulated learning and academic achievement: Theory, research, and practice. New York: Springer Verlag.
15
The Dangerous and the Good? Developmentalism, Progress, and Public Schooling
Bernadette Baker
Source: American Educational Research Journal, 36(4) (1999): 797–834.
One of the most significant discourses to permeate the educational field over the 20th century has been that of developmentalism. Developmentalism is not a singular movement but a way of reasoning about humanity that was taken up noticeably in formal educational discourse in the late 19th century. It proffered a view of human life in which new abilities and proficiencies were thought to unfold in set steps or be acquired in a series of stages. Generally, it has, in its variety of theoretical forms, guided descriptions of children, selection of classroom content for different grade levels, preparation of teachers, and judgments of excellence in educational endeavors. There has been a trend in recent decades, however, to rethink developmentalism as a rationale for the organization of schooling and the assessment of children. Works by Burman (1994), Morrs (1996), Rose (1989, 1996), and Walkerdine (1984, 1993) are examples of texts from different academic fields that pose questions about the validity and effects of developmental logic. The texts are aimed at historicizing developmental theories, that is, at elucidating the historical conditions under which developmentalism arose and produced “the developing child.” Explicitly or implicitly, these texts have operating within the analyses an assumption that there is something wrong with leaving developmentalism’s dominance in place. Developmentalism is positioned as an object that needs
to be understood by historicizing it. This understanding is meant to highlight developmental theories as sociocultural inventions without the knowledge of which a (sometimes nebulous) danger remains intact. Thus, a key question arising from the convergence of the critiques is “Has developmentalism been a dangerous way to think about human life?” If Foucault’s (1972) claim that every discourse is dangerous but that not every discourse is good is taken seriously, it becomes perspectival and historical as to what constitutes the dangerous and the good at any given moment. How one deploys developmentalism and writes its history and reads its effects in the present will thus be a marker of the intellectual traditions of the age, a cultural tagging of one’s timespace. My aim is to offer one narrative on how it is that we have come to debate developmentalism in the present. Like the texts just referred to, I locate some of the historical and cultural trajectories that enabled developmentalism to take hold in public schooling in the United States. The focus particularly concerns developmentalism in regard to the child rather than to the reformulation of a teaching identity.1 I argue here that multiple effects at the point of developmentalism’s emergence have made possible the debates that seek to destabilize it today. I also argue, however, that the debates seeking to either “deconstruct” or “trouble” developmentalism are mired similarly in a narrative of progress to which developmentalism’s emergence was tied. That is, I trace developmentalism’s emergence as the first form of “progressive education,” a historical association that rarely receives notice and seems a surprising conjuncture given the liberal aura surrounding “progressive education” today and current critiques of developmentalism as a conservative, controlling, or dominating logic. 
The article concludes by considering whether the destabilization of developmentalism can occur outside of narratives of progress and rationality, whether a child-adult schism automatically incites developmental theories, and by thus examining the question of alternatives to developmentalist discourse.
Surveying Developmentalism
Developmental reasoning was imported into educational discourse during the 19th century, most noticeably through the formation of psychology as a social science. Although this was not the only science to advocate developmentalism – biology, anthropology, history, and philosophy are several examples of fields drawing upon developmental reasoning – it was primarily to psychology that education became wedded. Before a discipline labeled psychology was named as such, the idea that “faculties of mind” unfolded in children who developed their “capacities” (e.g., for reason or for morality) was commonly available. Both John Locke (1632–1704) and Jean Jacques Rousseau (1712–1778) depended on such
“faculty psychology” to give their child-rearing advice. In addition, prior to Locke and Rousseau, Johann Amos Comenius’s (1592–1670) Didactica Magna (The Great Didactic) had postulated that children passed through somewhat discrete stages of development that were articulated to differential capacities for doing things and judging things. By the 19th century, then, the idea that children “develop” and do so in stages was not new; nor was the fact that different meanings had been given to “development” in different cultures. The difference conferred by the nomenclature, developmentalism, as opposed to noting that children could develop faculties progressively, is significant. The significance can be ascertained through considering why education became so infatuated with psychology. If the explanation for the conjoining of educational and psychological discourse is based on the observations of Johann Friedrich Herbart (1776–1841), it is because ethics provided the aims of education (i.e., morality), while psychology provided “the way, the means, and the obstacles” (Herbart, 1835/1904, p. 2). Herbart was one of the earlier psychologists to depart from “faculty psychology” toward mathematical psychology. He argued, in a noticeable departure from his predecessors, that a child was not born with a will and that there were no separate “faculties” of mind (Herbart, 1804/1977a). Mind was a single entity, and will, feelings, and desires were made via “presentations” to the child’s consciousness, preferably those organized by the tutor. By the early 1800s, therefore, it was possible for Herbart to disarticulate philosophy from psychology, ends from means, and thereby to assert that pedagogy was a distinct science (Herbart, 1806/1977b). Pedagogy was, in Herbart’s view, that science focused on the act of instruction, and it was intimately interrelated with other sciences, specifically that branch of aesthetics he called ethics and the study of consciousness called psychology. 
These conceptual splits and different intellectual boundaries between disciplines announced the possibility of developmentalism. The earlier form of Herbartian developmentalism assumed the ability to implant, build, and contour a child’s will, distinguishing it from previous accounts of the developing child and opening pedagogy to a relationship with a different kind of antifaculty psychology: the new psychology. Herbart’s version of the new psychology saw him calculate mathematically how the intersection of presentations or stimuli, acting in similar ways to relational Newtonian forces, would result in the development of mind. Developmentalism in this early form emerged as a way of reasoning predicated on the belief that it was possible to build up a child’s “insides” mechanically, in a scientific manner, and in a particular order. This did not suggest that the early pedagogy lay outside views of inheritance. Herbart saw pedagogy as working between inheritance, the raw physiological material, and the goal of forming a moral Christian. An order of stages marked by age inhered in his pedagogical advice. It was an order in
which the child’s physical growth was assumed to loosely parallel the development of “the race” over time from savage to civilized, from primitive to sophisticated. The pedagogue was to observe what was already in the child and to teach on the basis of what was thought possible from there. The stages of development were given scientific and “meta”-physical justification, then, through appeal to the child’s physiology and absence of will, and, in turn, both will and physiology were used to ground psychological, historical, and pedagogical reasonings.
Latter 19th-Century Developmentalism
By the turn of the 20th century, psychology and education had formalized in the U.S. academy as professional fields and did so almost inseparably. Psychology was one of the first social science disciplines to split from philosophy, with the first chairs in psychology being appointed in universities around 1900 (e.g., Joseph Jastrow at the University of Wisconsin-Madison). At the point of its departure, psychology in the United States was raising different questions regarding the “development” of humans than those posed earlier by Herbart in Germany and was beginning to “experiment” with experimenting (Herman, 1995; Taylor, 1994). Psychology had begun to make use of scientific methods of observation and aggregation of data to investigate problems such as will in children, criminality in adults, delinquency in juveniles, and degeneracy in “races.” Education’s uptake of psychological methods for posing and answering questions was also facilitated by an appeal to science as a means for truth production. In the late 19th century, the new scientific pedagogy and the new psychology were often synonymous terms in educational discourse and were deployed rhetorically to assert a truth claim. The interdependency of education and psychology was enabled by education’s provision of the subjects (e.g., children) necessary for data gathering and by psychology’s production of new strategies for monitoring and changing those subjects (e.g., teaching techniques). It was partly because of an institutional and intellectual interdependence that developmentalism could take hold, creating a new kind of “developing child” through techniques of study that emerged in/as a variety of public school reform efforts. As noted earlier, the idea that there were phases to human life, even phases marked by age, was not a new one in the late 19th century (see Thomas, 1976).
While the idea that children developed and did so in discrete but connected stages that could be labeled may have harked back at least to the time of Comenius, what was noticeably different about the developmentalism of the late 19th century was a much more widespread belief in the scientific validity of “stages of development” and in the social science mutation of evolutionary and Darwinian ideas that supported them. The difference
between early Herbartian developmentalism and its descendants did not take one form, however. Some developmental theorists emphasized the “ontogeny recapitulates phylogeny” argument, suggesting that children develop in stages marked by the evolution of “the race” and that this was primarily a genetic unfolding. Others, like the American Herbartianists, reinscribed the child’s will as inborn and not built, thereby positing development more singularly as a widening of the child’s “circle of thought” rather than as a form of implanting will. Still further, Froebelians (proponents of Friedrich Froebel [1782–1852] and the kindergarten movement) focused on very young children, particularly within the context of the family and preschooling, and did not assume that “scientific observation” of a child was necessary to helping a child “develop.” In sum, development did not mean one thing, but developmentalism can be understood as a variety of reforms that converged around a belief that the child did in fact develop through set stages that were scientifically verifiable. In continental Europe, the United Kingdom, and the United States, developmental ideas in evolutionary and scientific forms emerged simultaneously. The preceding suggests how, in the context of the U.S. public school movement, developmentalism was a controversial description of the young being tussled with intellectually at the turn of the 20th century. In conversations surrounding public schooling (which is the focus here), and more indirectly in teacher training, the idea that schools and lessons should be built around the child’s developmental stages as opposed to the organization of classical content was a radical one. In 1901, for instance, a belief in the uniqueness of developmental reasoning and the challenges it was thought to pose was communicated to one of the largest education conferences ever assembled to that point.
I shall try in this paper to break away from all current practices, traditions, methods, and philosophies for a brief moment, and ask what education would be if based solely upon a fresh and comprehensive view of the nature and needs of childhood. Hitherto the data for such a construction of the ideal school have been insufficient, and soon they will be too manifold for any one mind to make the attempt. (Hall, 1901, p. 24)
Generally speaking, the move toward centering developmental conceptions of the child and curriculum suggested a new relationship between child, teacher, and knowledge. The child, instead of being perceived as a subject that would fit around the order of knowledge in the school, was newly positioned as the central subject around whom knowledge should be ordered. The teacher, rather than looking for true knowledge in the classics, was now to look into the child, via science, for true knowledge of development (Baker, 1998a).
Developmentalism in its early and late-19th-century forms opened up a very different view of the child not simply in relation to knowledge but in relation to the kind of treatment thought appropriate for the young. Before the emergence of a developmental logic, in the middle ages, for instance, children did not enter educational texts as people who passed first and foremost through “stages of development.” Children were variously described as guilty and, later, as innocent, and recommendations for their rearing were made on this basis. In Western Europe, for example, children were depicted in predominantly Catholic territories as being born with Original Sin. Human frailty was thus posited as the key concept that should guide their treatment. In the later works of romantic poets such as Wordsworth and social commentators such as Rousseau, however, children were born innocent and with a naturalness that did not lie in relation to struggling against a sinful inner essence or a subjugated flesh. It was the environment surrounding the child that might subsequently misdirect innocence, but in its pristine and initial state, it was inherently innocence that the child possessed. Developmentalism, as a formalized system for studying and depicting the child, offered a new view of what lay “inside” the child and a new basis for the treatment of children. For Herbart, what lay inside the child was in the negative; there was no inborn will: hence the need for development. For latter Herbartianists, what lay inside the child was a “nature” very different from the Rousseauean child’s nature and from Herbart’s child. Children had a will and had a nature that was internal to them, but that nature had the potential for both good and sin: hence the need for formal lessons. By the late 1800s, the inherited characteristics of the child were given greater significance in the equation of learning. 
Genes were thought to constitute the child’s nature and the child’s potential for good and evil. The judgment of potential, of nature, would subsequently determine how the child should be treated and how the environment surrounding the child should be organized. By the late 19th century also, however, nervousness emanated around the making of such judgments. Educators contesting developmentalism’s rise, in a variety of forms, often protested on the grounds of its lock stepping the child, its predetermined view of what labor a child would perform in adulthood, and its neglect of the classics.
Thoughtful students … of the psychology of adolescence will refuse to believe that the American public intends to have its children sorted before their teens into clerks, watchmakers, lithographers, telegraph operators, masons, teamsters, farm laborers, and so forth, and treated differently in their schools according to the prophecies of their appropriate life careers. Who are to make these prophecies? (Eliot, 1905, pp. 330–331)
Eliot’s concern was for a reinstantiation of a humanist curriculum that was articulated to university study and that centered Greek and Latin. In Eliot’s view, all children should have the opportunity to study the curriculum
offered in elite private schools. As I argue subsequently, the late-20th-century critiques of developmentalism were to arrive at similar sorts of nervousness over the categorization of children but on very different discursive grounds.
Setting Discursive Limits: Developmentalism, Data, and Debate
Developmentalism first emerged in public education (not including kindergarten) through two key movements in the late 19th century. One, as already indicated, was Herbartianism, and the other was its offshoot, Child-study. These movements were the first educational reforms in the United States to specifically target public schools and widely use the words childhood, the child, children, and centering in or on the child while discussing pedagogy. As the dominant forms of developmentalism, they extended the sensation-based psychology introduced to definitions of the child under romanticism. What was formalized under late-19th-century developmentalism, then, was a view of the child as developing in stages of sensory awareness and physiological growth as articulated to morality and judged via characteristics thought to be inherited. In other words, the stages of development were given moral and intellectual implication, articulating questions of the soul to questions of psychophysiology (Rose, 1989). In Child-study, for example, children under 4 years of age were depicted via the physiology of the brain. A sense of the child’s vulnerability to influence by others was rationalized through appeals to the brain’s physiology.
The child is not critical, either of self or others. He is willing to try his hand at everything; he accepts without much question whatever is done for him or told him and has no hard and fast notions of law either of nature or society to trammel his thinking or acting. The mind is suggestible and imitative more than at any other period of life. Both the moral and the aesthetic life are crude, like the savage’s. The whole life of the child is unformed and in the rough, but rich, full, and active. This is analogous to what we find of the physical side; a brain relatively large, but lacking coordination and delicacy of structure. (Partridge, 1912, pp. 75–76)
The moral and intellectual implication given to physiology suggested teaching strategies and activities that should be engaged in to protect the child.
By virtue of the mechanism he has inherited from the past, the child is a self-active being. The stored-up energy of nerve-centers is constantly seeking an outlet. Previous to the age of seven years the undeveloped body and mind of the child plainly forbid activities which require skill, and which make a demand for the co-ordination of fine muscles and nerve-centers not yet developed. The growth which is taking place in the large muscles, however, and the energy stored in the nerve-centers which control their movements, make an urgent demand for such large activities
as creeping, walking, running, jumping, sliding, swinging, whirling, and rolling . . . . The child is in bondage to his senses. His alertness to sights and sounds is undoubtedly due to racial habits which were ingrained in nerve and muscle during the dangerous life of the remote past. (Dopp, 1904, p. 438)
Human life was thereby depicted as the linear experience of the passage of time and the senses as the means through which time’s passage could be traced.2 The idea of psychophysiological, formalized stages provided new limits for viewing the young. They were explicated in Child-study, for instance, as the kindergarten, transition, juvenile, and adolescent stages. By the turn of the 20th century, the “stages of development” concept that was deployed through Herbartianism and child study had made its way into teacher training curricula. Children, development, and stages were listed as topics in preservice coursework for the first time around 1900, 70 years after the advent of the first public schools in Massachusetts (Baker, 1998b). Interestingly, schooling and teacher education, for the bulk of the 19th century, were not only possible but were enacted without a view of the child as a developing subject in scientific terms and without a view of the teacher as one who looked “inside” the child for evidence of “development.” In addition to the preceding movements, there were many new intellectual positions available in the late 19th century where educators could locate themselves in relation to a concept of the developing child. Froebelians, and other theorists such as Edward Thorndike and John Dewey, for instance, while often in conflict and debate with Child-study enthusiasts and Herbartianists, still held to developmental conceptions of the child and human life. Froebelianism suggested that children of kindergarten age especially should be introduced to the gifts and occupations of “civilization,” giving play a serious status in the education of the young. In the United States, Froebelians were central to the uptake of a kindergarten movement and initially distanced their projects from those positioned as “scientific” in the late 19th century. 
For example, the move into scientific pedagogy and its minute prescription of content knowledge for each stage was interpreted as dangerous for a child’s morality by Froebelians. Children required ideal models of morality in their environment, and play was positioned as the means to symbolic thought necessary in learning how to relate morally to others (Weber, 1984). It was the lack of science that Child-study enthusiasts perceived as the danger in Froebelianism and Herbartianism, however. They were well-intended movements but missed the point of doing science because they were dependent upon introspection and intuition as opposed to observation and recording of data on children’s activities (see Hall, 1895). The different methodologies and teaching strategies debated back and forth across varying educational movements ruptured and reconstituted the
child-adult division. The developing child was produced not simply through an amalgam of techniques and practices, as suggested by Rose (1996) and Walkerdine (1984), but through a simultaneous acceptance that there was an a priori distinction between children and adults to which the techniques could be applied. The application and invention of techniques for studying “children” therefore formalized a preexisting child-adult division while giving it a new meaning in scientific, “evolutionary,” and moral terms. The uptake of developmentalism also set new limits on how educational thought about the child could be evidenced. Despite the differences in style and content of pedagogy across movements, the unifying belief was that a child did indeed “develop” and that this development had in some way to be facilitated by adult action, whether that action was perceived as a lack of interference in the unfolding of a child’s nature or as very organized and highly specific “interference” at crucial times in a child’s “growth.” Even classicists like Eliot did not contest that children “developed” but, rather, debated its significance to the selection of school curricula. Developmentalism provided a new language for discussing education and had set limits on how and what one could proffer as an oppositional political project. Furthermore, while there was an oft-noted opposition between educators who perceived teaching as an art and those who perceived teaching as a science at the turn of the 20th century, again, the common point of reasoning in the debates was that children’s insides unfolded and developed. The task of the teacher was to ensure that this occurred in the “right” direction (see Bloch, 1987). Teaching as art or as science was appended to a view of “the child’s nature” as an object that required guidance, even when that guidance was construed as knowing what not to do. 
For example, an artistic view of teaching saw the child as a flower unfolding in the garden of life and as a delicate object that teaching-as-science would maim in its lesser moral standard (a Froebelian view). The teacher-as-artist interpreted nature but did not interfere with it. Teachers-as-artists provided for nature’s innocent expression and were a conduit of its force. The scientific view of teaching also saw the child as a flower naturally unfolding in the garden of life but posited the teacher as both gardener and sculptor. The teacher was one who had to tend the garden, providing trellises for the pathway of nature, and to sculpt the clay of the child.
Kindergarten discipline embodies two main elements, which I shall name play and prescription. In free untrammeled action the child prescribes for himself. In the prescriptive activity he acts as another wills that he should, or as another’s will prescribes for him. In play the child-self directs his natural, physical being. Without guidances and limitation, the tender, immature child could soon destroy himself. We therefore prescribe certain food, clothing, shelter, and human protection . . . . The ideal of kindergarten discipline incloses the substance of prescription in the form of play. In other words, a difficult activity ceases to be drudgery when it is combined
with that labor-saving element of “make believe” which play makes possible to the child. By the encouragement of the kindergartener, together with the progressive use of the kindergarten gifts, occupations, plays, games, songs, and stories, the processes involved in this otherwise laborious and uninteresting task are reapplied by the child in his spontaneous play, in which he forms correct habits. (Miller, 1904, p. 427)
The teacher-as-scientist made decisions and took action in consonance with what were posited as “broader” scientific laws, not with unsubstantiated “feelings” about the developing child. The teacher-as-scientist was thought logically derived and methodologically validated through scientific techniques of observation and recording of children’s actions. By using categories in which to place children’s appearances, actions, and verbal expressions, the teacher became the doer of science and the gatherer of data. In 1896 alone, for instance, educators had been invited to participate in Child-study projects by observing Peculiar and Exceptional Children; Moral Defects and Perversions; the Beginnings of Reading and Writing; Thoughts and Feelings about Old Age, Disease and Death; Moral Education; Studies of School Reading Matter; Comparative Study of Courses of Study in Various Grades; Early Musical Manifestations; Fancy, Imagination and Reverie; Tickling, Fun, Wit, Humor and Laughing; Suggestion and Imitation; Religious Experience; Kindergartens; Habits and Instincts in Animals; Number and Mathematics; and the Only Child in the Family (Wiltse, 1896, p. 111). In all responses, age, sex, nationality, complexion, temperament, and whether the trait was seen to be hereditary or not were to be recorded. The traits composing peculiar and exceptional children give a general idea of the structures used to gather and record “data.” Peculiar and exceptional children could be “beautiful and ugly, deft and clumsy, weak and strong, [could have] keenness and defect of sense, exceptional precocity and slowness of development, [could be] clean and dirty, selfish and generous, loquacious and taciturn, brave and cowardly, sweet and ugly tempered, orderly and disorderly, etc.” (Hall, 1896, pp. 1–2). The recordings made by teachers, teacher educators, or teachers to be were verbal accounts of their observations of any of the preceding. 
The “characteristics” that composed each “type” of child were assumed to be universally understood. For instance, in observing “ugliness and beauty” in students, no definition was required for what constituted either concept. Teachers responded as though the standards were already agreed upon.
F[female]. 9 yrs. old. Light [complexion]. Nervous [temperament]. Ugliest person ever seen. Looks like a monkey. Peaked face, turned up nose, bulging eyes, and thick lips made to project by her teeth. Petted at home and teased at school. Backward. Mother the same when her age. M. 7 yrs. old. Dark. Obstinate and stubborn. Broad, flat face, low forehead, large straight mouth and thin lips, with dark, watery eyes. Parents unkind to and dislike him for his ugliness. Either teased or avoided
at school. Teacher punishes him for stubbornness and he cries because his feelings are hurt. Inherited. (Bohannon, 1896, pp. 14–15)
Thus, while a variety of intellectual positions had opened around discourses of development, it was the inscription of the teacher-as-scientist and as judger of categories for children that became the key educational identity for teachers under developmentalism. This was so not because such an identity was “right” about children but because it was produced on the grounds for claiming truth that science had already begun to colonize. Although movements such as Child-study were to lose their status as “science” in the following decades through charges of mysticism and fad, the attachment of “science” to developmental reasoning that permeated movements beyond Child-study was to remain. What changed was not a belief in developmentalism, but in how one should study it and hence what constituted “real” scientific practice. Educational debate over the child’s nature and what to do with it emerged, then, as part of the expression of a developmentalist discourse, inscribing the child’s nature as a political site while paradoxically objectifying it as “data.” Whether pedagogues were promoting their identity as artists or scientists, however, it was an idea of power-as-right to oversee “a child’s nature” that united what might otherwise appear to be antithetical conceptions of teaching. The child’s nature, posited as malleable in some cases or to some extent, was conceived as a kind of internal engine. It had a power that drove the body to “grow” and the mind to “formalize,” a power that had to be harnessed and/or redirected for “the good” of the child, “the nation,” and “the race.” The debates that opened with developmentalism did not question the teacher’s or school’s right to do the harnessing but contested how that harnessing should be enacted. Developmentalism was thus a new discourse born also of the modernist enrapture with power-as-right. It provided new languages, techniques, justifications, and limits for the management of children.
Enabling Developmentalism
As noted earlier, a widespread view of the child as a developing, becoming, unfolding, not-quite-complete, growing organism in scientific terms was a noticeable shift in educational reasoning. What were the discursive conditions that enabled developmentalism to take hold in so many different educational reforms at the turn of the 20th century? There is considerable debate in the present concerning how children were perceived and treated prior to developmentalism, especially in European contexts.3 What is missed in the debate over whether children were treated this way or that in different locales, however, is how the justification of developmentalism
seems to have required a negative reference point to announce its difference and to claim its contribution to educational discourse. In the present, it is often assumed that the negative reference points have been the treatment of children as laborers and their lack of distinction from adults in feudal-serf societies. These have been rhetorically mobilized as the problems that gave space to developmentalism as a more sensitive understanding of the young and that have ensconced its ally, public schooling, seemingly as an institution of rescue (Baker, 1998a). An image of children being treated as miniature adults with no consideration given in their pedagogy or rearing to the uniqueness and specificities of their “nature” constituted one ground on which developmentalism has staked its difference, especially as scientific. There may be comfort in such a narrative, but that is all. While developmentalism extended the individualization of the child as a subject, positing children’s nature as distinct from adults’ and as an object requiring differentiated treatment, it was not the “discovery” of a child-adult distinction per se that underwrote developmental theories. If the history of “Western” thought is considered, from Plato through to Herbart in the early 1800s, there is ample evidence that children had entered texts as distinguishable subjects, had sometimes engaged in private schooling, had undertaken different tasks than adults, and hence had been considered identifiable as children (Baker, in press). Romanticism, and developmentalism after it, did not discover “the child” but were products of reworking child-adult distinctions. If, therefore, children and adults had been previously distinguishable as subjects, then why did developmentalism not emerge earlier? It is beyond the scope of this article to detail the massive 17th-century upheavals in European social life, the history of the family, and ruptures in political power that accompanied them. 
The specific question here is what enabled the idea of developmentalism to become a central one written into and between a preexisting child-adult division. There are several possible parts to the answer: the racialization (and its related sexualization) of humanity based on colonization, the advent of public schooling, and the emergence of science as distinct from theology. The racialization and the sexualization of the child-adult binary that opened with European colonization helped create “the developing child.” Foucault (1990) suggested that the obsession with sexual intercourse as the marker between the child and the adult from the 18th century onward was propped up by racializing discourses. A “governmentality” on the part of the emerging welfare state interwove a concern for a population’s strength and “purity” with the monitoring of population groups via births, deaths, and marriage (Foucault, 1991). The point at which the population reproduces itself, sexual intercourse, thereby became a site for surveillance, emerging in Victorian concerns over the sexuality of children, the purity of blood and race, and the increase in national populations.
Foucault’s analysis arrives implicitly at the undergirding fears, desires, and tensions in late-19th-century developmentalism but in the wrong order. For Foucault (1990), the primary category was sex, and it is sex and sexuality that rewrite the child-adult binary with racializing discourse acting as props or mysteriously falling from the sky to intersect with sexual dimorphism. The preexistence of “man” and “woman” as categories, as well as the preexistence of child and adult as categories, did not generate the idea of “the developing child” of developmentalism. These categories had been available to ancient Greeks, and developmentalism did not result. What were required to incite developmentalism were the anthropological and travel reports to European audiences of peoples so diverse and different to European cosmologies of personhood that Europeans would not inscribe them recognizably as people. The existing discursive systems could not explain cohabitation on the earth. Through initially Biblical appeals, because that was the cultural material available, such coexistence was explained; “Adamic” and “pre-Adamic” races were constructed and articulated to the narrative in Genesis (Livingstone, 1992). Through the 17th-century mechanical philosophies such as those of Thomas Hobbes, John Locke, and, later, Rousseau emerge, however, less Biblical tales of a twofold distinction, the evolution of “Man” from the “state of nature” to “civil society.” Foundational to the “developing child,” which was implicitly inscribed as European, was the developing race. Rousseau’s first discourse (1751), A Discourse on the Moral Effects of the Arts and Sciences, and his second (1755), Discourse on the Origin and Foundations of Inequality Among Men, are suggestive of how a new developmental logic was asserted through the racialization of humans that imperialism made possible. For Rousseau, humans were primarily of two types, natural-savage and civil.
Correspondingly, in the distant past savage “Man” lived in a state of nature, while civil “Man” now lived in civil society and its pseudonyms: society, the state, the city-state, the nation-state, or political life. For Rousseau, civil society and civil Man were the problem – artificial, corrupt, degenerating – while natural-savage Man in a state of nature was valorized and reified as the ideal. Rousseau’s theory of “Man” enabled what might be thought of today as the European child to implicitly appear as a historicized subject, but not linearly so. Development does not take the form of a simple parallelism in which evolution from a state of nature to civil society mirrors the development of the young to adulthood. For Rousseau, it is because of the evolution into civil society that the child of such a society could be viewed as “developing” at all, and it is because of the natural glow he writes into “savage Man” that the child must be kept in circular relation with “its” natural past. Rousseau appears ambiguous about whether Man has an innate, biological potential to develop. In the state of nature, children leave the breast of their mother, they grow, and they become independent, at which point they leave the mother to begin their own wanderings. There is no society and no
community. Sexual difference does not matter, except at the point of childbearing. It is only in a society that children can become metaphysical infants to whom an “order of learning” from sensation to reason applies and where sexual dimorphism is and must be secured. Thus, growth in the state of nature is natural. Children, like plants, get bigger (Rousseau, 1762/1991, p. 38). In civil society, though, children grow and “develop,” and what they develop are faculties. The child of civil society can only be considered developing then because Man evolved “out” of a state of nature. It is the quality of will, choosing to live in huts, establishing a family, and forming a society, that was the springboard to the emergence of faculties such as morality and reason in Rousseau’s view. Choosing to live in huts was also the precondition to learning self-perception, comparison with others, and, hence, virtue and vice. Rousseau’s order of reasoning, especially in his Second Discourse, was that without will there is no such thing as the formation of a society, without society or the stabilization of living space there is no such thing as self-perception, and without self-perception there is no such thing as “development” of all of “the successive faculties” (Rousseau, 1755/1992). Rousseau’s theory of development is, therefore, a theory of history as applied to humans culminating in an explanation for his present which makes use of the texts spawned by colonization. Development of the individual applies only to civil Man, since it is only civil Man that can be individuated. Such individuation can only come into view, though, because of the prior presence of savage Man. The independence of “the savage” which Rousseau constructs and admires is not the equivalent of being an individual in Rousseauean terms. The amorphousness of the strong independent savage stands in contrast to the dependence of differentiated and weak individuals in civil society.
In his “normalization” of Man from the savage state to civil society, this interdependence between individuals in a society is the space in which a negative concept of development is carved out. There is no child development in a natural or savage state because there is no notion of individual versus individual (self-perception) or individual versus group. One does not stay still long enough to see and compare natural inequalities and for Rousseau this is good. Hence, development of the child comes into play only in civil society, only where the concept of “faculties” for reason and morality arises. Development of the individual in civil society is thus a sign of degeneration of the human species over time. Even this cursory insight into what is actually a very complicated view of development in Rousseau’s writings is suggestive of how gender and child-adult divisions were explained in relation to the racialization of humans.4 Different sexes and different-sized beings are noticeable for Rousseau in the state of nature, but the story he tells himself and others is that they come to matter in terms of “development” only when there
is something there to develop (i.e., faculties of mind). And there is only something there to develop in civil societies because civil Man unfortunately evolved out of being a savage one and therefore evolved faculties that were not present in the state of nature. Hence, the child is only possible to conceive of as developing, as increasing its faculties, because of this “prehistory” of human types. Without the evolution out of savagery in Rousseau’s terms, it was theoretically impossible to think in terms of development and its different “stages.” This grounding of developmental logic in racializing discourse enabled the subsequent narrower focus on gender as something of significance in civil societies and brought into relief “madness” and “lunacy.” Rousseau’s restriction of rationality to the European boy-child is given meaning against the backdrop of evolving races. European girls, like Rousseau’s (1755/1992) character, Sophie, do not develop fully the faculty for reason, though they develop something. That something is a body with the potential to bear children, and hence the nervousness over sexual intercourse as a marker between childhood and adulthood takes the form of monitoring changes in girls’ bodies in relation to the “underdevelopment” of faculties of mind. None of this matters for “savages,” though, in Rousseau’s texts, and of savage Man in the mythical state of nature he is apparently envious. Where one has to deal with development, then, is in civil society and in civil society it is rationality, and hence a boy for Rousseau, that matters most. Similarly, madness is not a possibility in the state of nature for Rousseau. It is only because one must expect a boy in civil society to develop faculties of reason and morality that the “mad” adult, who is now posited as childlike, comes into view. 
Thus, enabling “development” and carried with it as a term in Rousseau’s texts were the racialized discursive grounds on which concern for gender, sexual intercourse, and lunacy could pervade “the child’s” comportment and rearing. In the early and latter Herbartian reworking of development as a concept – that is, under developmentalism – this dependence on racializing discourse would be both assumed and inverted. It would be intrinsic to the very idea of “the developing child,” although the child would be treated differently under developmentalism than under Rousseau. The child would not be a subject circled back to its “savage” roots but one seeking a departure from everything that savagery was thought to signify. Civil society would not be cast as the problem for it was savages that were to be blamed for the problem of “degeneration.” Thus what enabled developmentalism was not the normativity found in Rousseau’s reasoning but the structure of the analytical schemata, the binary between savage and civil, that scaffolded the narrative of human history. A second “discursive fact” that enabled developmentalism was the advent of public schooling itself. One response to a perception of life as a more linear, less Rousseauean journey was the organization of state-funded
schooling. Schools, as mass institutions for segregating the young, were not only made possible by the distinctions given to children and adults but gave to developmentalism the material practices and space required for the observation of children. As Rose (1989) notes, the physical space that schools provided for assembling the young and the condensation of children into confined spaces provided a visual grid for making comparisons between children, especially when grouped according to age. Teachers could newly view children in relation to each other, as could children themselves. The invention of mass public schooling in Germany in the late 1700s and its uptake in Massachusetts in the early 1830s thus constitute an important discursive precondition to a developmentalist logic and its ability to take hold in and suggest an organization for schooling thereafter. It is not surprising that developmental theories have suited the organization of schooling so well and have acted back upon the organization of schooling, given that the formulation of the theories was in part a response to the confinement of children that schooling made possible. Third, the invention of schooling could not alone have enabled a move toward developmentalism if there had been no impulse to figure out human life. Asking questions about human life had to have been an acceptable pursuit in order for developmentalism to emerge. It was especially the opening of sciences and their protracted departure from “theology” that made possible questions about human life, its description, its categorization, its observation, its measurement, its nurturing, and its limits. As a way of moving out of Judeo-Christian theories of humanity, science was asserted as a series of practices, techniques, and concepts directed toward understanding the causes effecting Man (Wynter, 1995). Over the 19th century several images of Man directed the search for knowledge of the child. 
The elusiveness of Man was first pinned down by projecting a picture of Man as machine and by the conceptualization of Man as a raw material with an inherent capacity to generate either a profit or loss for humanity at large. The images of Man as machine provided the perceptual stability for analyzing the child in motion. A second image of Man was that of an organism responding to an environment. In this Darwinian and dialectical image of Man as acted upon and acting back on the surrounds, the child was theorized as a conglomeration of cells and systems amenable to cause-effect reasoning. Without a view of Man as a project to be sorted out and through, the developing child could not have existed as a research site. Herbart would not have been able to work out his mathematical psychology based on Newtonian mechanics and think that applying it to the child’s mind and pedagogy could make sense and nor could G. Stanley Hall (1901) describe the child really as “human larvae.” It was in part the New Man, the Scientific Man, then, as opposed solely to Woman-as-mother, that gave birth to the developmental child.5
Reworking Developmentalism at the Turn of the 20th Century
The preceding has suggested that what enabled developmentalism was a history of reasonings/practices that were constructed around racializing discourse on humanity (which had sexualizing and ableing effects), the confinement of schooling, and the opening of biophysical science. In the Herbartianism and Child-study of the late 19th century, such conjunctures were reworked, generating a series of effects for thinking about humanity and human differences. For example, the way in which children were defined and delineated through Child-study was via a savage-civilized binary but in very different ways to Rousseau. This binary was one of the most common and pervasive ways of reasoning about humanity available in the United States and Europe in the late 19th century. From medicine to history, physiology, philosophy, anthropology, and the new psychology, a savage-civilized schism operated as an organizer of perceptions. In educational discourse, the nobility of Rousseau’s savages to which the child should be returned would find little place. Savagery was not to be considered an idyllic point of return, and civil society was not the problem as it was for Rousseau. Savagery was a point from which a departure was sought. Children were thereby being reconstituted around a new kind of savage-civilized binary and “races” around a child-adult binary. The articulation of preexisting categories of identity called “sex” entered the new equations mostly in regard to “civilized” children’s pedagogies. Girls and boys of particular descent should be educated separately during adolescence, while prepubescent children should be educated into and monitored for expressions of sexuality by their teachers and parents (Hall, 1911). For Child-study enthusiasts such as G. Stanley Hall, who had studied Herbartianism in Germany, Africans and African Americans were signified by a lack of sexual dimorphism and were negatively characterized (Hall, 1905). Unlike Rousseau, who in principle did not restrict savagery by “color” but by time and geographical space (e.g., the tropics, the desert, and the forest), Child-study inscribed savagery via race-as-color and mental/physical “defects.” The sexualizing discourse on the young had intersected with the binary racialization of whiteness and blackness to form the sane “developing child.” The relationship was not an additive one of gender/sexuality, race/ethnicity, and ability, as it is often written in the present, but one of race/sexuality that incorporated all of the constructions of gender and ability that were rewriting “the” child of civilization. Specifically, in German and American Herbartianism and in Child-study, a child’s development was described as moving from savagery to civilization through “culture epochs” of “racial evolution” that corresponded
to “stages of development.” It was through Herbartianism that a view of ontogeny recapitulating phylogeny made its way from biological discourse to education in the United States. However, only if children were perceived to have inherited the “genes” that signaled their “capacity” to fully civilized “development,” which was to be sexually specific and dimorphic, could they be considered to have reached “adulthood” (Hall, 1905). It was no longer a question of innocence versus Original Sin that informed debates over teaching children. Rather, it was debate over the “capacity” to “develop” that newly marked education’s discursive field. Development was not a neutral term. A concept of human life as “capacity” produced multiple effects within developmental reasoning. For example, adulthood as a signifier of material ownership and independence to pursue further possessions operated off the enslavement of African Americans in particular, even after the Civil War. The positioning of African American children and adults as possessions, as property that signified the adult status of others, evidenced a different reconstitution of adulthood in North America from that in Europe. Adulthood came to signify a movement away from all of the negatives that were projected onto blackness-as-property, and it was this reading of adulthood that was taken up in movements such as Child-study. Furthermore, and related, the inscription of the adult as a site of distinction along intellectual, moral, physical, and economic lines rewrote the child as both sacred and pathological (see also Baker, 1998b). In order for a developmental conception of the child to emerge, for instance, it had to be possible to perceive a move into adulthood as constituting a journey – a journey that now had to be explicitly organized by the schools because adulthood was being constituted as a new kind of privilege.
As a site for access to employment, sexual knowledge, ownership of material products, university entrance, and so forth, adulthood had taken on a different kind of discursive significance by the late 19th century.6 The positioning of adulthood as a time when one had arrived at a consistent moral or ethical endpoint became interwoven with a belief in adulthood as an expression of acquired rationality, intellectual power, physical strength, economic worth, and the achievement of citizen status. The meaning given to the child-adult binary under late-19th-century developmentalism was not just dependent on seeing children as separate subjects with separate natures to adults; it was dependent on methods for interpreting the distinction as consequential to the moral, physical, intellectual, and economic “outcomes” that the adult-as-nation represented – represented, that is, with all of the attendant racialization and sexualization that the categories “adult” and “citizen” signified. In sum, in the public school reforms of the late 19th century, race/sexuality and the limits and possibilities this suggested were being reworked around a developmental conception of the child. Under developmentalism as it appeared in literature of Child-study, “Man-lessness” was inscribed onto the
body-mind of all children of color, all girls, and all children perceived as having physical or mental impairments (see Baker, 1998b). To become an adult, a Man of pure, untainted whiteness, a Man of sound mind and body, was to have been none of the above in childhood. The conjuncture between developmentalism in its late-19th-century forms and the reading of phenotypical markers is not a new narrative, however (see Gould, 1981). Developmentalism provided a way of uniting a Cartesian split between mind and body by inscribing the “external” (the body) into the lenses for viewing the “internal” (the soul and mind) (Rose, 1989). In other words, it objectified the mind and soul, drawing them outward, ensurfacing them, and making them legible to the scientist. The elusiveness of “the child’s nature” could be pinned down, enabling an array of management and governing strategies to be suggested on the basis of the new scripts of body-mind. In accepting as truth that phenotypes existed, that “race” could operate as a function of phenotype, that organs could be classified as “sexual” or not, and that meaning and value could be ascribed to differentiation, developmentalism posited the sacred-ness of childhood as a unitary, unfolding experience at the same time that it required the operation of multiple “differences” to bring meaning to its unity.
Progressivism
Given the preceding narrative, it is possible to question the point of studying and theorizing children in the late 19th century. If the rise of developmentalism is reduced to being about “race-ing” and “sex-ing” exclusions in the realm of education and in theories of humanity, then it was not necessary. This had already been “achieved” through colonialism, slavery, and romanticism. Why, then, were children so energetically sought after as educational subjects? Why was there such a rush to monitor children’s “development”? Why were coexisting movements that Kliebard (1986) identifies as social meliorism, social efficiency, or humanism not taken up as the preferred discourse on the young? In short, why developmentalism? Part of the answer to these questions resides in what emerged alongside developmentalism. One catalyst to developmentalism’s rise in the late 19th century was its conjuncture with a broader social movement of progressivism. Progress in education, and its extension into “teaching with a truly progressive spirit” around 1900, was tied to a social reform agenda that went beyond and yet held implications for the inscription of children under developmentalism (see Rice, 1893). Interestingly, there is no agreement in the present as to whether a progressive education movement ever existed. What have been identified instead are movements in which educators defined their work as “progressive.” For example, Cremin (1961) argued that, historically, labeling an educational movement progressive meant that it had four major themes of
concern: (a) broadening the function of schools to include concerns about health, vocation, and the quality of family life; (b) classroom principles based on new scientific research in psychology and the social sciences; (c) a tailoring of instruction to different kinds and classes of children; and (d) faith that culture could be democratized without being vulgarized. Cremin saw a progressive education movement, arguing that its genesis lay in the decades immediately following the Civil War, blossoming in the first few decades of the 20th century and collapsing shortly after World War II. Kliebard (1986), in contesting the existence of a “progressive education movement,” has pointed to the inappropriateness of labeling an array of often contradictory efforts as though they were a discrete entity. Kliebard (1986) argued that humanism, developmentalism, social efficiency, and social meliorism, which were the educational reform movements emerging in the late 1800s and early 1900s, represented the efforts of those in often oppositional interest groups. This led Kliebard to suggest that “it was not just the word ‘progressive’ that I thought was inappropriate but the implication that something deserving a single name existed and that something could be identified and defined if we only tried” (1986, p. xi). In the present, however, “progressive education” has been used much less cautiously than Kliebard suggests and generally remains a category for historical research in the field.7 Reforms in the present are often considered identifiable as progressive if they project a more liberal rather than conservative vision of public schooling and its function. Progressive education is currently associated with building a more democratic democracy, with concerns for social justice, with methods based on cooperation and group decision making, with organic and culturally relevant teaching, and with a centering of the child in pedagogical strategies.
What has remained dormant in some present celebratory readings of progressive education as a “liberal” event and in past analyses like Cremin’s is the historical specificity of “progress” at the time of its importation into educational discourse. “Progress” and “progressive” were frequent iterations in education at the turn of the 20th century, especially in developmental movements. The meaning of progress was not drawn only from how progress functioned in educational texts, however, and obviously not from how it functions in educational texts of today. Progressivism emerged in a variety of fields in the late 19th century that lent meaning to its early usage in education. At the turn of the 20th century, progress meant, in its most rudimentary form, an improvement. The term progress was popular in England in the 1790s, and it comes from the Latin progressus, to advance forward or reach a higher stage. It was appropriated and formed anew in the United States in the 1800s, however. Progressivism was seen in the latter half of the 1800s, especially by the British, as a distinctively American phenomenon.
Progressivism, outside of education, operated primarily in antithesis to “primitivity.” Primitivity held technological, intellectual, moral, religious, and health dimensions that were assumed “inferior” to “Western” cultural forms of “civilization.” Progressivism represented a movement beyond these alignments, positioning a departure from the “tropical,” the “black,” or the “savage” as a form of cultural superiority that evidenced a “survival of the fittest.” In addition, progressivism was underwritten by its association with the liberal welfare state, capitalist expansionism, and ideas of democracy. “Democracy” and “civilization” were inseparable conceptual partners that produced the Citizen, the New Man, who was meant to inhabit late-19th-century America. Democracy meant, at its loosest, voting for political representatives. Civilization meant, at its loosest, that only some people could vote or be “representative,” predicated upon the inheritance of “rationality” and the acquisition of literacy. Progressivism was, then, a curious mix, what Wagner (1994) calls a tension between “liberty and discipline.” Liberalism and democracy were articulated in the latter 1800s as the formation of a certain kind of civilization – one that had developed beyond the primitivity of the “savage” and hence had earned or inherited the right to manage Progress in a democracy. Progress with a capital “P” had a specific direction for social reform that wove through discursive fields beyond education. What was being progressed toward was a very particular notion of a “higher stage” as civilization and democracy and was inseparable from the ideas of “savagery” that were being moved away from. Progress was an improvement that was gauged by movement into a different social form – a movement that would leave behind or ameliorate whatever the signifiers of backwardness were taken to be. 
The “race-ing” and “sex-ing” of humanity that had been “achieved” previously and elsewhere, then, did not generate developmental theories as a form of progressivism but converged in them, enabling a reconstitution of privilege through reworking what the distinctions were to mean. There were multiple effects of such a conjuncture between progressivism as a move toward a “higher” social form and developmentalism as a move toward a “higher” ontological form. Three such effects are outlined subsequently. They concern how the interpenetration of developmentalism and progressivism took the form of the production and management of “difference” in regard to the child.
Producing and Managing Difference: The Intersection of Developmentalism and Progressivism
The management of difference that developmentalism and progressivism achieved can be understood via the production of discourse on deficit children, on humanitarian adult helpers, and on emergent welfare state experts who
mediated the two. First, in Herbartianism and Child-study, for instance, educators frequently spoke of the progress of the nation, mankind, the race, or the human race when arguing for a developmental orientation to children in schools. The notion of progress as inextricably bound to civilization, to science, and to the existence of formal educational institutions operated as a given. It was the idea of development as a form of progress that could then suggest who or what the detractors from progress might be.
If a child is well born, if he springs from sound, sane stock, if he possesses high endowment potential in the germ, then the problem of his unfoldment is well-nigh solved long before it is presented. Such a child is easily protected from adverse influences; and he is delicately and abundantly responsive to the positive influences of education. But if, on the other hand, the child is marred in the original making, if he springs from a worm-eaten stock, if the foundation plan of his being is distorted and confused in heredity before his unfoldment begins, then the problem of healthy normal development is rendered insoluble before it is presented. Such a child is difficult to protect against adverse influences, and he remains to the end stupidly unresponsive to the delicate growth factors of education. (Bobbitt, 1909, p. 385)
A discourse of progress not only set in motion subject positions of the “detractors” but also enabled a view of public schooling as an institution that had to better manage the difference.
At the present time our medicine, hygiene, and public sanitation keep alive multitudes of weaklings that formerly were weeded out by hard conditions. Thus to-day we save weak lungs, weak muscles, weak eyes and ears, weak minds and weak wills, weakness in general, and weakness in every particular, and permit it to reproduce itself in heredity, further corrupting the next generation. Our schools and our charities supply crutches to the weak in mind and in morals, nursing them and cherishing them in every possible way, helping them to economic independence, to family life, and thus further to corrupt the streams of heredity which all admit are at present sufficiently turbid. Thus we see two sinister processes at work; the upper and better strata of society are continually dying away; and poorer ones are being added on at the bottom. There is a continual drying up of the highest, purest tributaries to the stream of heredity, and a rising flood in the muddy, undesirable streams. (Bobbitt, 1909, pp. 387–388)
The relationship between progress and public schooling was such that public schooling was positioned as a necessity, but only for those children whose “natures” qualified them to attend. Progress was being constituted as the smelting of difference through schooling in some cases and the exclusion of difference from schooling in others. An image of power-as-right, including what Ladson-Billings and Tate (1995) call “the absolute right to exclude,” became inscribed in the naming and framing of categories of children.
Ironically, difference was a concept that psychology required for making observations. Educators and public schools chased Progress by offering new formulas for the management of categorical difference. In the midst of the chase lay inescapably the creation of further differences, especially along race and ability lines. Interestingly, the smelting of difference, such as the attempted Americanization of new immigrants through public schooling, did not operate in relation to categories of sex. It was the continual production of sex difference among “the civilized” that constituted progress in the 19th-century developmental movements. This emphasis on dimorphism gave adolescence a special significance. The maintenance of gendered comportment was meant to feed into a rational reproduction of a nuclear family.
Puberty and adolescence are without a doubt the most important period in the life history of a human being. The whole of childhood is directly related to this period. When the functions of sex are early matured, childhood is proportionately shortened. The lengthening of childhood and the increase of intellectual and moral development, the expansion of uplifting and educational forces, the increase of wealth, the lengthening of old age, which can be demonstrated as accompaniments or consequents [sic] of this phenomenon, are thus primarily correlated with the problem of sex. In the history of evolution a period of nonage or childhood comes in as an intercalation between birth and sexual maturity. To increase this intercalation is the aim of progress. (Scott, 1897, p. 841)
In regard to civilized youths, childhood was a difference related to the horizon of sexual maturity, while progress was specifically inscribed as an extensive preparation for a movement into a new body and its possibilities. The evacuation of sexual knowledge from childhood may have been related to distress over pedophilia, but it was also related to a fear of adulthood’s loss of distinction and privilege. More broadly, discourse on race, gender, and sexual intercourse was given meaning in regard to the child through moralistic and political fears over population increases, nationalism, and purity of “stock.” Images of “the detractors” and “the facilitators” accompanied and inhered in discourse on Progress and the child. Second, the appeal of progressivism in its wider context was predicated upon ideas of “doing good” or “helping” that further inscribed developmentalism as a discursive site for the management of difference. The view of the helper as to what constituted help relied upon a specific form of “populational reasoning” that encoded an assumption about the helper’s right to manage those who were newly positioned as detracting from progress (Foucault, 1991). Education became a site of salvation, positioned as a necessary rescue of humanity from inhumanity, of the “civilized” from the “savage.” The humanitarian impulse, at the point of its coincidence with developmentalism, operated from a view that assumed some children as deficits. As noted earlier, children who were thought to be capable of progressing beyond
the savagery of their childhood were to be educated through public schools. Children who were thought incapable of further development or progress than “savagery” made available were to be excluded on the basis that this was for the greater good of humanity (Hall, 1901). Here, savagery included racialized notions of personhood in addition to the intersection of “physical and mental disabilities,” while “the helper” lay outside such signification. Developmentalism was an expression of progressivism in education that did not newly announce “exclusion” from humanity or educability but that newly organized and monitored it through ordering the ontology of the child. “The” child and “the” helper were very particular concepts – raced, sexed, and “abled” concepts that could not hold equally, within the one embrace, all children and all adults. They could not do so partly because the idea of Progress used points of “stagnation” from which its meaning of “moving on” was drawn. As in the present, humanitarian discourse, as allied to Progress, established in developmentalism a series of discursive spaces surrounding the politics of care, a space for the tenderness of the carer and the limits, the harshness, of the caring gaze. Third, the merging of developmentalism and progressivism did not simply operate at the level of ideas, as though ideas were separate from “the material,” but intersected with the emergence of the nation-state and welfare experts as managers of difference. By the mid- to late 1800s, the welfare state had garnisheed the management of difference via education as its responsibility and had come into a much more visible role in organizing the young. Previously, denominational colleges had dominated the organization of schooling and teacher training well into the 19th century and well beyond the advent of the first “public” schools in Massachusetts. 
The pedagogies and technologies deployed through different developmental movements articulated a new relationship between the individual and the state, however. That tension could be perceived between religious and secular control of education is indicated by the rise of childhood and psychology experts. By the turn of the 20th century, the welfare state’s role as manager or arbiter of difference had dissolved somewhat into the hands of scientific experts, experts whose work could be deployed across a variety of institutions with often oppositional philosophical views. Salvation of the child via developmentalism could be religious or secular in its framing, as long as it contributed to Progress. The rise of the “expert” in relation to the physical and social sciences facilitated the spread of progressivism as a social movement. The expert, who understood and recommended the management tasks required for Progress, was an important new site for articulating projects of the welfare state to the governance of individuals by either themselves or religious or secular others (Popkewitz, 1991). In terms of developmentalism, it was the growth of the child’s body-mind that became the specific object for articulating this relationship between state and individual Progress.
The emphasis on growth was a new emphasis on questioning the process rather than the product. Under developmentalism, means and ends were split. Danger was not posited in the ends of the process of growth, as adulthood was assumed as a positive, as a good outcome, and as evidence of Progress. Danger was thus posited into means, into the methods for ensuring that the ends were achieved. The wrong means, the wrong teaching or rearing techniques, constituted danger, while the good became their timely application in accordance with predetermined goals and assumptions of normalcy. The emphasis on growth as a sign of Progress was not an entirely new theme, however. The view of a growing body or organism as “progress” reflects a specifically Western tradition. As Popkewitz and Pittman (1986) note, the idea of “natural growth” as a metaphor for societal progress emerged in Greek and Hebraic thought, was modified through Christian theology, and was then secularized through the sciences in the 18th and 19th centuries. The school practices recommended for different stages of development in different movements reflect the new-found faith in physical and social sciences to sketch pathways for Progress and to provide for the “growth” of the young and “the race” through state institutions. In sum, the juncture between developmentalism and progressivism established a form of reasoning in educational discourse that is still pervasive. In order for Progress to be achieved, there had to be “growth,” and it had to be monitored. One had to accept that developmental theories were the best interpreters of growth and, crucially, that the goal of growth was neutral, or at least unquestionably good in regard to the status of the state. Growth toward a particular kind of ideal “adult” was central to the gauging of Progress and to the production of a national image. Without growth and development, the child, the nation, and the future would apparently degenerate.
It was the alchemy between developmentalism and progressivism, then, rather than simply new scientific techniques or isolated concepts, that has produced “the developing child” and has simultaneously colonized the grounds for judging the dangerous and the good in the name of Progress ever since.
Summarizing Effects of Developmentalism’s Emergence

The uptake of developmentalism in education produced some effects that have provided the intellectual scaffolding for its destabilization today. What might be called the productive effects of developmental theories are familiar and are often left untouched in present-day efforts to destabilize developmentalism: the concern for children’s health, the lowering of infant mortality, the relief from physical pain that came with new knowledge of “disability,” and so on.
Human Development
The celebration of developmentalism in these terms has been perceived as part of the problem, however, in that it has been taken to mask some of the more dangerous and less good implications of its emergence. That is, it is an interpretation of developmentalism’s repressive effects that presently constitutes the sometimes unspoken danger underlying its deconstruction and, hence, receives the most notice. First, as noted earlier, it was impossible to have a view of the child as “developing” in the late 19th century without having, at the same time, methods for defining children on the basis of “lack” in developmental terms (Rose, 1989; Walkerdine, 1984). Developmentalism not only shifted a view of the life cycle to a view of the “life line,” but it did so through positing inherent “lack” into the “natures” of certain children whose development would purportedly end at childhood. Public schooling as the management of lack in linear terms was thus expanded most noticeably under developmental reasonings. Second, the new visibility given the child’s nature segregated children on the basis of their “usefulness” to Progress. Children were grouped not just according to old categories of race and sex but via new categories of developmental “deficiency” or “precocity” (Burman, 1994; Morrs, 1996). It has been noted elsewhere that all children of color, some girls, and children described as “retards” were the subjects who became crucial and central to the comportment of the “normal” developing child (see Baker, 1998a; Franklin, 1994). What developmentalism formalized in its techniques of care was a vision of Progress as being able to leave others behind. Its effects were not simply directed toward its “Others” but toward those children who were left unproblematized through such discourse. The intersection between developmentalism and progressivism enabled positions of dominance in regard to public schooling’s preferred clientele. 
Third, and related, the juxtaposition of children as either able to be fully human or not, adult in the future or not, facilitated differential treatment, funding of public schools, and respect in relation to enacting education (see Anderson, 1988). This was not a new occurrence in public schooling, although developmentalism extended preexisting systems of inclusion/exclusion in new ways. Public schools emerged on the East Coast at a time when African American children, for instance, were enslaved and prohibited from attending. It is nothing new to note that the very building of a “public” school from government funds already had circulating within its architectural plan limits in who would be attending, in who “the public” were. Developmentalism was to extend the notion of exclusion from “the public” through a much more effective means. It was by inscribing limits into the capacity of the child, into the body-mind, rather than attributing them to God’s plan alone that inclusion/exclusion could be reconstituted in educational discourse. In some ways, developmentalism in its first form in the public school system can be seen as a more complex form of repression than that
Baker
indicated by direct physical violence. It provided a discursive structure that acted to discipline “savage” and “defective” children and to “de-intellectualize” the construction, “girls,” without having to touch them (Baker, 1998b). The emergence of developmentalism and the subsequent reading of its repressive and productive effects have been made possible by a wider movement to manage difference in the name of Progress. Developmental theories were historically mobilized through a perceived right to invent or attribute meaning to difference, to enforce difference, and then to claim that difference required management toward a norm, toward Progress. The mobilization was neither singularly conspiratorial nor without differential effect. Events such as the rise of science, of experts, of a humanitarian impulse, and of discourse on race, sex, and capacity that the first developmental theories were predicated upon were events that originated in spaces sometimes far removed from debate over public schooling and exclusion in the United States. It is paradoxical, though, that it is the education into “difference” that has subsequently enabled developmentalism to be perceived as singular. The critique of developmentalism on the grounds that it is exclusionary and “Othering” requires the presence of difference to operate. If difference can be defined outside of appeals to “rationality” or the “logocentrism” of “the norm,” as Walkerdine (1993) suggests, then it is possible to do so because of the way in which normativity has functioned to produce subjectivities of difference. The question as to whether one can ever be completely outside of rationality or logocentrism is a cogent point to consider.
Arresting Developmentalism?

If the present writes the past, then it might seem that it is not difficult to judge, perspectivally, what the good and the dangerous have been in terms of developmentalism. The preceding indicates that while the good, the productive, may have been the acute concern now paid to children’s health, the dangerous, the repressive, can be read from the present as a different kind of exclusion. Bringing all children into a developmental gaze provided less physically coercive forms of exclusion from definitions of humanity for some children and made available for others justifications for dominance. There is a problem with the preceding summary of developmentalism’s effects, though, that does not relate only to its simplicity. Can the good and the dangerous be as easily pinned down as the preceding implies? Is it, for instance, the very act of pinning, the very judgment of effect from presentist perspectives, that reinscribes the same old system of inclusion/exclusion into new critiques of developmentalism? The problem with the above reading of developmentalism’s effects is that its assessment of the dangerous and the good arises out of similar 19th-century notions of progress and power-as-right that situated the problem in the first
place. Judging the dangerous and the good still operates from a concern for “progress” in which progress is to be differently organized in relation to the child than current practices enable. This is a pervasive and often unspoken logic in addressing issues of change in educational discourse. One is meant to take comfort in certainty, to lend security to an audience, by depicting other views as “erroneous” in order to proffer a better solution, a better way to achieve progress. Yet, as noted above, it was this chasing of “progress” through developmentalism that seems to have led to present charges of developmentalism’s danger. Developmentalism was enabled by distancing itself from past theories of the child and by claiming to know better how to better treat the young. The inadequacy of “faculty psychology” was spun into the view of some children as inadequate for adulthood under developmentalism. Interestingly, present efforts to destabilize developmental theories still take as their starting point some perception of inadequacy. The analysis presented so far might be read, for instance, as an artifact documenting the problem with progress while simultaneously attempting it in a different form. “Lack,” “deficiency,” and “inadequacy” operate silently behind the observations. Even where effects are noted in the positive, as the productive moments of a discursive rupture, it is lack and deficit, problem and inadequacy in regard to developmentalism that direct the analysis. The inscription of race, sex, and ability into narratives of child development has been implicitly posited as problematic, as inadequate, as dangerous, as a perhaps unintended yet repressive effect. One way of understanding the circularity in which writing and judging developmentalism’s effects in the present is mired, then, is through temporarily locating the grounds upon which its emergence was staked or, at least, upon which the reading of its emergence is staked.
This article thus far has been an effort to locate grounds for developmentalism’s hold on schooling and its present controversial status. Obviously, grounds are written differently depending on the theoretical frame. One does not have to claim that these are the only ways of reading and writing developmentalism in order to have it stand alongside other multiplicitous possibilities for interpretation. In that sense, I offer some deliberately paradoxical thoughts on how developmentalism is being fractured in the present and on the debate surrounding “alternative” ways of reasoning about children and education.
Fracturing Developmentalism

Developmentalism’s hold on schooling in its late-19th-century form has not completely dominated educational reasoning, and neither has its initial obsession to validate only certain kinds of children as possible adults. If it had, this article could not have been written. Developmental theories have continuously
required reworking. It has been a never-quite-complete task to solve the puzzle of the child in the name of knowing Man better. The grounds upon which the puzzle can be solved have constantly shifted, though, and developmental theories from Herbartianism, to Freud, to Gesell, to Piaget, to the rebirth of Vygotsky have sought to grapple with the imposition of order onto/into the child. One effect of this grappling has been a distancing from the earlier inscriptions of child as savage. Even within developmental-friendly subdisciplines of education, earlier constitutions of the child as a “primitive” being are undermined by the move to reincarnate a complex Vygotskian child. The multiple readings of and uses to which Vygotsky is currently being put generally coincide around the inscription of the child as an active, social, culturally contexted constructor of knowledge. Vygotsky is sometimes positioned as a reminder of the importance of environmental influence and culture in depictions of development, as a marker of distinction standing in opposition to the genetic determinism that pervades work such as The Bell Curve: Intelligence and Class Structure in American Life (Hernstein & Murray, 1994). The fact that arguments like those in The Bell Curve are still available suggests, however, that there is a continuity in conceptions of “the mind,” “ability,” “race,” “sex,” and “class” that some developmental theories are unable to operate without. Furthermore, the hold that developmental psychology has over defining what constitutes the “special” in special education suggests that narratives of development are still articulated to an old tension between differentiation and normalization. Present debate over nature and nurture, between genetic determinism and environmental influence, does not signal a rupture with the past. 
The child, as an object meant to carry educational reform into the future through “development,” still bears the burden of “progress,” whether Vygotsky or The Bell Curve is depicted as the means to its fruition. Morrs (1996) notes similarly, for instance, that the move to reincarnate Vygotsky is not a move that gives up developmentalism. It simply changes the steps and the strategies. The circular reasoning that developmentalism drew around the child in terms of inscribing difference and then perceiving it as a problem to be managed has (not surprisingly), then, come undone from elsewhere. Developmental theories have been critiqued in the latter half of the 20th century through several strategies that have been co-emergent with the linguistic turn in the social sciences. Three of these “theoretical” shifts or strategies are outlined subsequently to convey a sense of the controversy and fracturing surrounding developmentalism in present literature. One strategy has been to challenge a conception of power-as-the-right-to-name and to prescribe courses of action for others. Morrs argues (1996), for instance:

It is not so difficult to displace development from the adulthood area. We need only say that development implies regularity, and that adult humans
are diverse and open-ended in their potential. But the younger the person we discuss, the harder it might seem to be to avoid developmental prescriptions. If it is difficult for childhood in general, surely it is impossible for infancy. Infancy, then, must surely be the most severely testing case for any general claims for anti-developmentalism. Can I really be saying that babies do not develop? To be consistent I do need to say just that. To say that adults do not develop but that babies do would be an unacceptable retreat, effectively a retreat to the classical biological position . . . . In studying infants, and in formulating general claims about the ways they change across time, scientists impose their own world-views on infants and their social contexts. (p. 155)
A second strategy has been to question the fixity of “the subject,” that is, the human or various categories of humans, in educational discourse. The way in which a child-adult division has been reconstituted from romanticism to developmentalism to new cultural interpretations has enabled a view of the relative and shifting meaning of childhood in educational discourse. Strategies for pegging change and continuity have been deployed because the highly contested emergence of modern childhood as a sociocultural space subsequently became submerged, allowing childhood to operate as a taken-for-granted fixture and an apolitical site in the educational field. The insertion of childhood, especially but not only as a matter of race, into educational reasoning under developmentalism has been excavated as an invention that has become a convention, one both enabled by and operating as effects of power (Baker, 1998a).

A third strategy concerns the rethinking of time and space in regard to how we understand “the subject.” Developmentalism has been challenged in regard to how it has marked time into the young and into educational reasoning. The move to decenter the subject in the social sciences has moved a view of the self as creating culture to a view of how culture creates the self, the subject, the individual, and the group (Cahoone, 1996). The decentering of the subject has opened a view of how the ordering of time and space has operated in theories of humanity, in the quest for seeing order in the child and hence having grounds for prediction and control. The instability and fluidity now surrounding theories of time and space have coincided with studies of “disability.” Disability is newly defined in relation to how discourse and environment establish what constitutes a disability (Carrier, 1986; Johnson, 1994; Skrtic, 1995; Spear-Swerling & Sternberg, 1996).
Disability can be identified only in relation to what children are asked to do or expected to perform within a particular time frame and within a particular set of rules for performing. Under different environments, under different questions, the disability identified in one environment ceases to exist in another (Bronfenbrenner, 1979; Skrtic, 1991; Sontag, 1996). Studies
of “disability” in children and the tremendous debate over how or whether disability, handicap, and impairment can be defined have, then, been turned against the notion of linear, time-based development. One result of the debates has been the fragmenting of suppositions of “order,” of “capacity,” and of “stage” through the very terms in which disability was first established. The impossibility of recognizing or identifying disability can subsequently be raised. If “ability” and “disability” collapse under the contradictions inherent to their relationship in a text (where text can be anything from a classroom to a test), then how can a pathway for “normal development” ever be prescribed? All of these strategies to destabilize developmentalism in relation to power, to the subject, and to “timespace” do so under the present assumption that developmentalism’s hold on views of human life has been and can still be dangerous, especially if left in singular charge of multiple and diverse humans. What is interesting and uneasy to consider, however, is how developmentalism in part enabled a view of humans as multiple and diverse. It did not simply trace difference; it produced difference. Now it is in the name of “difference” that developmentalism is critiqued. If the elimination of danger is what the destabilization of developmentalism is about, then it is also important to consider whether a move away from developmentalism is in fact a move away from the very narratives of progress that helped to produce the idea of danger or “detractors.” It is to alternatives to developmentalism and their relationship with narratives of progress and rationality that I now turn.
Alternatives to Developmentalism?

The question of alternatives to developmentalism assumes that the grounds for its challenge, like those outlined earlier, have shifted and that this shift has enabled a move of the educational field into a new space. New ways of reasoning have opened up new space and possibilities for action so that now developmentalism can be seen in a different light, as a product of human invention, and hence as amenable to further change. There are at least three difficulties to consider in regard to the line of reasoning that posits developmentalism now as amenable to change, fracture, and alternatives. They concern paradoxes in concepts of progress, concepts of freedom, and concepts of prescription that emerge in the act of articulating alternatives to developmentalism. Together, these “difficulties” that I map subsequently disenfranchise any neat conclusions in regard to educational action, for at their base are the very problems from which a departure is sought. Thus, their delineation serves as an opening out onto, rather than a closing down of, discussions about developmentalism in education.
Alternatives and Progress

First, it is the conception of opening or moving into new space, itself a very colonial metaphor, that proves problematic in undermining the dangers attributed to developmentalism. This is because a view of developmentalism as something that ought to be moved beyond reinvokes a discourse of progress to which developmentalism has been tied. The role of education as the manager of difference, the means to progress, and the organizer of the future is not necessarily displaced by producing new theories of the child-adult distinction. Chasing progress through new and better means is a problematic entrapment in the discourse on Man, problematic because it suggests that unless one can find a better way to manage difference, there will be no more progress. More tautologous still is that unless one finds a way to move beyond a discourse of progress, there will be no more progress! One of the key difficulties in destabilizing developmentalism and the reading of its dangers, then, has been the lack of consideration of narratives of progress, in whatever form they may take. The assumption that children can be better viewed through different lenses and that this will in turn generate a more positive future, however that is defined, constitutes a similar form of social engineering and chase of “civilization” and “democracy” as that out of which developmentalism arose. The identification of danger, the naming of views that detract from progress, albeit in a different form, does not shift the grounds upon which fears of danger have arisen. Such acts simply rewrite them in a different form because “progress” is still being chased through damage control, through nullifying “the detractors.” It is at this point, however, that the truly circular reach of discourses of Progress and the difficulty of positing alternatives to developmentalism come into view.
The paradox is such that the preceding redirection of the focus can be perceived as a recommendation, as a chasing of something “better,” as implicated in the very narrative of progress that is posited as problematic. In Foucault’s (1980) terms, such an effect is symptomatic of how new discourses establish the limits of their opposition. The effects of such circularities are not stultifying (and not necessarily “bad” even if they were), however, for they force a consideration and clarification of the specific grounds on which any alternatives to developmentalism might claim the status of being alternative.
Alternatives and Freedom

Second, this initial difficulty of critiquing developmentalism and leaving narratives of progress in place may seem tied to the acceptance of the divisibility of humanity. Is it possible to have a child-adult distinction without
thinking of the difference as one that is covered by developmental stages that make progress toward a higher state? Does the division automatically incite developmentalism, given that the division has existed historically without being thought of as scientifically verified “stages” or “capacities”? Does developmentalism necessarily require a “race-ing,” sexing, and “able-ing” of the child-adult distinction to become operable? On the question of alternatives to developmental psychology, Morrs (1996) argues that the point is not to posit an alternative, at least not in developmental terms. Because developmentalism is historically and culturally specific, “culture-bound,” and because those social specificities have involved various forms of inclusion/exclusion, the terms on which developmentalism has been constructed in Morrs’s view cannot be left in place. That is, he suggests that perhaps it is time to operate in a space that is ambiguous and less clearly defined than what we may be used to and that this is a constructive, not destructive, outcome of antidevelopmental arguments. Rewriting developmentalism means rethinking education (Morrs, 1996, p. 153). There is no such thing as “atheoretical” development. Development must always be described according to some theoretical perspective or perspectives. “Scientific” approaches to development are clearly flawed then, if they are meant to be objective accounts of natural change . . . descriptions of natural processes of “unfolding” are inevitably culture-bound. (Morrs, 1996, pp. 155–156)
Operating silently behind Morrs’s conclusion not to prescribe, however, is a preference – a preference that seems to implicate the denial of prescribed alternatives as the alternative itself. Ambiguity becomes a code, ironically, for knowing not to know in developmental terms. Less silently operating in antidevelopmental arguments over alternatives is the appeal to “human freedom.” It is freedom, or a lack of, that unlocks the code, that suggests in what ways developmentalism has been dangerous and for what reasons it ought to be “de-known.” We should be on our guard against the implications of the developmental attitude to people’s lives and hopes. It treats others as behind or below ourselves, but destined to follow the same path. The search for antidevelopmental alternatives must therefore be seen as an emancipatory project. This seems odd, because freedom has often been defined as freedom to develop. But when we unpack the notion of development we find that there are problems and that “freedom to develop” is a freedom with strings attached. We must take care to read the small print before we sign the contract. Developmentalism may be antithetical to human freedom. (Morrs, 1996, p. 1)
The dismantling of developmentalism, only to leave in place modernist narratives of human freedom, is an understandable but difficult position.
Morrs (1996) is skeptical of how psychological scientists have enforced worldviews on others and have tracked normalizing pathways to adulthood but is not as skeptical of the social scientists who have tracked out pathways to human freedom as though that itself was not an act of prescription or free from the danger of dominance by “the expert.”8 Alternatives to developmentalism that stake their difference on quests for human freedom miss how notions of freedom have historically been “psychologized” and written into a developmental child-adult division of personhood. Freedom … is enacted only at the price of relying upon experts of the soul … We have been freed from the arbitrary prescriptions of religious and political authorities, thus allowing a range of different answers to the question of how we should live. But we have been bound into relationship with new authorities which are more profoundly subjectifying because they appear to emanate from our individual desires to fulfil ourselves in our everyday lives, to craft our personalities, to discover who we really are. Through these transformations we have “invented ourselves” with all the ambiguous costs and benefits that this invention has entailed. (Rose, 1996, p. 17)
Rose (1989, 1996) suggests that the act of offering “alternatives” to developmentalism is extremely difficult given the encompassment that narratives of human freedom, progress, personhood, and individual development have recently woven around the child-adult distinction. He suggests that we are never very far from what he calls the psy effect: the influence of a range of psychological techniques and concepts. Even in the very act of chasing “freedom” under liberalism, the psy effect has shaped what has been imaginable as freedom. How have we come to define and act toward ourselves in terms of freedom? How has freedom provided the rationale for all manner of coercive interventions into the lives of those seen as unfree or threats to freedom: the poor, the homeless, the mad, the risky, or those at risk? What are the relations between rationalities and techniques of government that have sought to justify themselves in terms of freedom and these practices of the self regulated by norms of freedom? … One central feature of the emergence of this contemporary regime of the free individual, and the political rationalities of liberalism to which freedom is so dear, has been the invention of a range of psy technologies for governing individuals in terms of their freedom. (Rose, 1996, p. 16)
Rose suggests further that one is never “free” of the psy effect, that in moving away from psychology, one invariably brings it along. Because psy has so saturated our notions of humanity, it is difficult to think outside of it, to posit alternatives that do not reinvoke some indebtedness to psy. Faced with a similar conundrum in regard to alternatives and human freedom, Walkerdine (1993) has to write with the historical legacy of
developmentalism at the same time as against it. Walkerdine (1993) suggests that one cannot stay inside “logocentric” and “Western” conceptions of rationality and still “deconstruct” developmentalism. It was the dominance of a set of practices for thinking, what came to be called “abstraction,” that generated “rationality” as the pinnacle of developmental theories. One has to move away from a notion of rationality, then, as rationality has provided the grounds for marginalizing others (peoples of color, all females, children, and constructing persons with disabilities), particularly through developmental psychology (Walkerdine, 1993). Yet, in suggesting a move away from rationality, and not unproblematically using the historical tools of rational argument to do so, Walkerdine (1993) posits progress as a movement forward to a different space beyond developmentalism. I am not proposing a shoring up of modernity by tacking difference onto a developmental model. I think that ship is holed below the water-line. It is sinking and cannot be saved. No, the way forward [italics added] is far more challenging. It is to account for the production of subjectivity within historically and geographically specific practices, with no clear developmental sequence at all. (Walkerdine, 1993, p. 461)
The way forward (progressus, an improvement) in educational enactments is unproblematically construed as a move out of developmentalism and rationality, as though the concepts are disarticulated from the idea of progress through which developmentalism has subsequently been critiqued. The idea of progress has propelled the search for a “way forward” in the educational field ever since its convergence with developmentalism. The recent impulse to deconstruct developmentalism and to rescue the child from developmentalism’s clutches is another chapter in a long-standing narrative of progress, allied problematically to concepts of “human freedom” and played out through educational debate. Instead of the child being freed from the bondage of labor, the child is now freed from the bondage of development, from a normalized pathway of life, and saved by being read through other lenses. If one can stake the claim that there are none so trapped as those who think they are free or who think they know where freedom is not, then neither developmentalism nor the destabilization of developmentalism secures freedom. Foucault (1980) argued that one never leaves power behind. If one moves into a new space, then what makes it “new” will not be the absence of power but power’s shifting circulation. One is never in a state of being outside of power; hence, one is never “free” from power’s effects. It seems, then, that if the destabilization of developmentalism amid the clinging to narratives of progress can enable freedom, it will be because freedom has taken on a very different meaning than what we might write it as today, a meaning currently unimaginable.
Alternatives and Prescriptions

Third, what might the preceding seemingly pessimistic assertion mean in terms of acting within and around education? Can developmentalism be disarticulated from Progress? Can recommendations be given without occupying the space of the “expert” on freedom? Can the dangerous and the good be summed up, clarifying once and for all where one stands or, more problematically, where others ought to? And more fundamentally, is education even conceivable without differences that it harbors and manages, without discourses that posit stages, without adulthood as the unproblematized outcome of a “less-abled” past called childhood? Pushing the rethinking of developmentalism to its logical extreme means rethinking education as an institution founded on the staged organization of progress that has been written into the child-adult division with multiple and uneven effects. If one is not intent on dismantling the school and yet concerned with developmentalism’s dominance, it would seem that one has to mobilize the very thing that one is critiquing, to implement the very problem trying to be avoided, to accept that danger is inherent to dismantling development and progress and to leaving them alone, to accept that “rethinking” will be seen as yet another form of “progress,” and, finally, to accept that any “recommendation” given, even that of not recommending alternatives, will be contradictory in terms of critiquing progress. On the question of alternatives to developmentalism, then, the point may not be to consider and produce yet another model for how or why children grow but to consider how or why it has come to matter so much, how or why “difference” underwrites the notion of progress, the idea of development, and their critique. This article has thus been an exploratory effort in the direction of those considerations, an opening out onto the complexities that inhere in judging the dangerous and the good.
More Expert Conclusions?
I have argued that developmentalism, as a series of systems for educating children into and through “difference,” often only to reorient them toward singularity, has provided the tools for its dismantling in the present. Paradoxically, this dismantling has implicitly taken place in the name of progress, in the name of a movement forward, to a better or freer space or higher plane. Its destabilization has thus continued to deploy a form of reasoning that first enabled developmentalism to take hold. Noting this circularity is neither unique nor stultifying, however. If progress has been constituted in relation to the management of difference in educational work since the late 1800s, then it is precisely this education into difference that has enabled developmental theories to be perceived as just another partial way of
describing life. It is not surprising, then, that what made developmentalism claim its “progressive-ness” at the turn of the 20th century (i.e., its detailing of difference, its appeal to rational scientific argument, and its obsession with a higher stage) has both enabled and “entrapped” its critique at the turn of the 21st century.
Notes
1. I have considered the effect of developmentalism on the refiguration of a teaching identity elsewhere, hence the delimitation here to the child (see Baker, 1998a). At the outset, further delimitations need to be made. The focus throughout the article is not so much on developmentalism and its more obvious relationship to categories of physical and mental disability, which are well known. Although references are made to the production of disability, the analysis deliberately gives more weight to the racializing and then sexualizing effects of developmentalism in its earlier forms, and this is due to their relative submergence in the literature. In addition, class distinctions are treated by the focus on public schooling and the reality of slavery. The forms of developmentalism that targeted public schooling never had children from wealthy backgrounds or those who attended elite denominational schools as their targets. I have considered the intersection of such an array of “difference” elsewhere (see Baker, 1998b) and hence focus primarily here on developmentalism as a racializing and then sexualizing discourse network. From this emphasis, it is clear that the article spends most time in the 19th century to gain purchase on the effects of developmentalism at the point of its intersection with public schooling. This is a necessary historical excavation in terms of understanding that which underwrites present-day critiques. As such, the article does not seek to trace the varying forms of developmental discourse across the 20th century but, rather, focuses on its emergence and the limits of opposition that were established.
2. The assumed link between sensory stimuli and intellectual development generated a belief that people with visual and hearing “impairments” in particular could never acquire “intelligence” (see Franklin, 1994).
3. For an idea of how the young have historically been perceived in European, Canadian, and U.S. settings over the course of the Middle Ages and into the latter years of the “Enlightenment” and how they have been constituted through the writing of history itself, see the following: for France, Ariès (1962) and Heywood (1988); for the United States, de Mause (1975), Hawes and Hiner (1991), and Summerfield (1984); for Canada, Schnell (1977, 1979); and for England, Abbott (1993), Hanawalt (1993), Pollock (1983, 1987), Sommerville (1982), and Stone (1977).
4. The complexity of Rousseau’s theory of development and its racializing and sexualizing effects is given more detailed treatment in Baker (in press). One important inflection in Rousseau’s theories is that savagery is not necessarily “color specific.” It refers to past Europeans, in addition to the more frequent reference to those Rousseau identifies as “Caribs” and indigenous peoples outside of Europe. Thus, his notion of civil society is geographically limited to those parts of Europe, like his beloved Geneva, that have established city-states, and it incorporates his nostalgic preferences for ancient Spartan and Roman republics.
5. I have argued elsewhere, drawing on Kittler (1990), that the New Man that emerged with the first public schools in Germany produced the New Child in relation to the rise of the nation-state, an emphasis on literacy as a move away from “animality,” and the reconfiguration of Woman around motherhood and instruction in the “Mother tongue”
(Baker, in press). The role of women as mothers, and eventually as elementary teachers, was not foundational to the emergence of the developing child but was conjoined to it. The woman-mother-nature equation was produced in relation to the New Man and the state as a public sphere, but the first elementary teachers were not necessarily women either in Prussia or the United States (e.g., in rural Wisconsin male farmers more commonly taught children in the off-season until the 1850s). The complexity this implies, while significant, is not the focus of this article.
6. Adulthood was not available to all in the late-19th-century theories, and its discursive significance as a site newly embodying privileged access to rationality, ownership, and so forth traded off, as I argue later, the enslavement of African Americans even after slavery’s protracted and unevenly drawn out “abolition” after the Civil War. For a survey of ideas about the child-adult distinction prior to the emergence of psychological theories, see Borstelman (1983).
7. For histories of education that do not draw upon celebratory notions of progressive education, see Popkewitz (1997) and Hamilton (1989).
8. Morrs also does not recognize the uncertainty and diversity in views of development that developmental psychologists acknowledge and currently debate.
References
Abbott, M. (1993). Family ties: English families 1540–1920. London: Routledge.
Anderson, J. (1988). The education of Blacks in the South, 1860–1935. Chapel Hill: University of North Carolina Press.
Ariès, P. (1962). Centuries of childhood: A social history of family life. New York: Vintage Books.
Baker, B. (1998a). Child-centeredness, redemption, and educational identities: A history of the present. Educational Theory, 48, 155–174.
Baker, B. (1998b). “Childhood” in the emergence and spread of US public schools. In T. Popkewitz & M. Brennan (Eds.), Foucault’s challenge: Discourses, knowledge, and power in educational research (pp. 117–143). New York: Teachers College Press.
Baker, B. (in press). In perpetual motion: Theories of power, educational history, and the child. New York: Peter Lang.
Bloch, M. (1987). Becoming scientific and professional: An historical perspective on the aims and effects of early education. In T. Popkewitz (Ed.), The formation of school subjects: The struggle for creating an American institution (pp. 21–62). New York: Falmer Press.
Bobbitt, J. F. (1909). Practical eugenics. Pedagogical Seminary, 16, 385–394.
Bohannon, E. W. (1896). A study of peculiar and exceptional children. Pedagogical Seminary, 4, 3–60.
Borstelman, L. (1983). Children before psychology: Ideas about children from antiquity to the late 1800s. In P. H. Mussen (Ed.), Handbook of child psychology (4th ed., Vol. 1, pp. 1–40). New York: Wiley.
Bronfenbrenner, U. (1979). An ecology of human development: Experiment by nature and design. Cambridge, MA: Harvard University Press.
Burman, E. (1994). Deconstructing developmental psychology. New York: Routledge.
Cahoone, L. (Ed.). (1996). From modernism to postmodernism: An anthology. Cambridge, MA: Blackwell.
Carrier, J. (1986). Learning disability: Social class and the construction of inequality in American education. New York: Greenwood Press.
Comenius, J. A. (1910). The great didactic (M. W. Keating, Trans.). London: Black.
Cremin, L. (1961). The transformation of the school. New York: Knopf.
de Mause, L. (1975). The new psychohistory. New York: Psychohistory Press.
Dopp, K. (1904). The natural activities of children as determining the industries in early education. In Proceedings and addresses of the Forty-third Annual Meeting of the National Education Association (pp. 437–443). Winona, MN: National Education Association.
Eliot, C. W. (1905). The fundamental assumptions in the report of the Committee of Ten, 1893. Educational Review, 30, 325–343.
Foucault, M. (1972). The archaeology of knowledge; and, the discourse of language. New York: Pantheon Books.
Foucault, M. (1980). Power/knowledge: Selected interviews and other writings, 1972–1977. New York: Pantheon Books.
Foucault, M. (1990). The history of sexuality, Vol. 1: An introduction (R. Hurley, Trans.). New York: Vintage Books.
Foucault, M. (1991). Governmentality. In G. Burchell, C. Gordon, & P. Miller (Eds.), The Foucault effect (pp. 87–104). Chicago: University of Chicago Press.
Franklin, B. (1994). From “backwardness” to “at-risk”: Childhood learning difficulties and the contradictions of school reform. Albany: State University of New York Press.
Gould, S. (1981). The mismeasure of man. New York: Norton.
Hall, G. S. (1895). Child study. In Proceedings and addresses of the National Education Association, session of the year 1894, Asbury Park (pp. 173–179). St. Paul, MN: National Education Association.
Hall, G. S. (1896). Editorial. Pedagogical Seminary, 4, 1–2.
Hall, G. S. (1901). The ideal school as based on child study. The Forum, 32(1), 24–29.
Hall, G. S. (1905). The Negro in Africa and America. Pedagogical Seminary, 12, 350–368.
Hall, G. S. (1911). Adolescence: Its psychology and its relations to physiology, anthropology, sociology, sex, crime, religion and education (Vols. 1 and 2). New York: Appleton.
Hamilton, D. (1989). Towards a theory of schooling. London: Falmer Press.
Hanawalt, B. (1993). Growing up in medieval London: The experience of childhood in history. New York: Oxford University Press.
Hawes, J., & Hiner, R. (Eds.). (1991). Children in historical and comparative perspective: An international handbook and research guide. New York: Greenwood Press.
Herbart, J. (1904). Outlines of educational doctrine. New York: Macmillan. (Original work published 1835)
Herbart, J. (1977a). Aesthetic revelation of the world. In D. Robertson (Ed.), Significant contributions to the history of psychology, 1750–1920 (pp. 57–77). Washington, DC: University Publications of America. (Original work published 1804)
Herbart, J. (1977b). The science of education. In D. Robertson (Ed.), Significant contributions to the history of psychology, 1750–1920 (pp. 78–268). Washington, DC: University Publications of America. (Original work published 1806)
Herman, E. (1995). The romance of American psychology: Political culture in the age of experts. Berkeley: University of California Press.
Herrnstein, R., & Murray, C. (1994). The bell curve: Intelligence and class structure in American life. New York: Free Press.
Heywood, C. (1988). Childhood in nineteenth century France. Cambridge, England: Cambridge University Press.
Johnson, G. M. (1994). An ecological framework for conceptualizing educational risk. Urban Education, 29, 34–49.
Kittler, F. (1990). Discourse networks 1800/1900. Stanford, CA: Stanford University Press.
Kliebard, H. (1986). The struggle for the American curriculum: 1893–1958. Boston: Routledge & Kegan Paul.
Ladson-Billings, G., & Tate, W. F., IV. (1995). Toward a critical race theory of education. Teachers College Record, 97, 47–68.
Livingstone, D. (1992). The preadamite theory and the marriage of science and religion. Philadelphia: American Philosophical Society.
Miller, M. J. (1904). What is kindergarten discipline? In Proceedings and addresses of the Forty-third Annual Meeting of the National Education Association (pp. 427–431). Winona, MN: National Education Association.
Morrs, J. (1996). Growing critical: Alternatives to developmental psychology. New York: Routledge.
Partridge, G. (1912). Genetic philosophy of education: An epitome of the published educational writings of President G. Stanley Hall of Clark University. New York: Sturgis & Walton.
Pollock, L. (1983). Forgotten children: Parent-child relations from 1500–1900. London: Cambridge University Press.
Pollock, L. (1987). A lasting relationship: Parents and children over three centuries. Hanover, NH: University Press of New England.
Popkewitz, P., & Pittman, A. (1986). The idea of progress and the legitimation of state agendas: American proposals for school reform. Curriculum and Teaching, 1, 11–23.
Popkewitz, T. (1991). A political sociology of educational reform: Power/knowledge in teaching, teacher education and research. New York: Teachers College Press.
Popkewitz, T. (1997). The production of reason and power: Curriculum history in intellectual traditions. Journal of Curriculum Studies, 29, 131–164.
Rice, J. (1893). The public-school system of the United States. New York: Arno Press.
Rose, N. (1989). Governing the soul: The shaping of the private self. London: Routledge.
Rose, N. (1996). Inventing our selves: Psychology, power, and personhood. Cambridge, England: Cambridge University Press.
Rousseau, J. J. (1968). A discourse on the moral effects of the arts and sciences (G. D. H. Cole, Trans.). London: Dent. (Original work published 1751)
Rousseau, J. J. (1991). Emile, or on education (A. Bloom, Trans.). London: Penguin Books. (Original work published 1762)
Rousseau, J. J. (1992). Discourse on the origins of inequality (second discourse), polemics and political economy (J. Bush, R. Masters, C. Kelly, and T. Marshall, Trans.). Hanover, NH: University Press of New England. (Original work published 1755)
Schnell, R. (1977). “The most ordered of rescues”: A reinterpretation of childhood history and the common school. Paper presented at the annual meeting of the Midwest History of Education Society, Chicago.
Schnell, R. (1979). Childhood as ideology: A reinterpretation of the common school. British Journal of Educational Studies, 27, 7–28.
Scott, C. (1897). The psychology and puberty of adolescence. In Proceedings and addresses of the Thirtieth Annual Meeting of the National Education Association (pp. 843–851). Chicago: National Education Association.
Skrtic, T. (1991). Behind special education: A critical analysis of professional culture and school organization. Denver, CO: Love.
Skrtic, T. (Ed.). (1995). Disability and democracy: Reconstructing (special) education for postmodernity. New York: Teachers College Press.
Sommerville, J. (1982). The rise and fall of childhood. Beverly Hills, CA: Sage.
Sontag, J. C. (1996). Toward a comprehensive theoretical framework for disability research—Bronfenbrenner revisited. Journal of Special Education, 30, 319–384.
Spear-Swerling, L., & Sternberg, R. (1996). Off track: When poor readers become “learning disabled.” Boulder, CO: Westview Press.
Stone, L. (1977). The family, sex and marriage in England, 1500–1800. New York: Harper & Row.
Summerfield, G. (1984). Fantasy and reason: Children’s literature in the eighteenth century. Athens: University of Georgia Press.
Taylor, E. (1994). An epistemological critique of experimentalism in psychology: Or, why G. Stanley Hall waited until William James was out of town to found the American
Psychological Association. In H. Adler & R. Rieber (Eds.), Aspects of the history of psychology in America 1892/1992 (pp. 37–61). New York: New York Academy of Sciences.
Thomas, K. (1976). Age and authority in early modern England. Proceedings of the British Academy, 62, 205–248.
Wagner, P. (1994). A sociology of modernity: Liberty and discipline. London: Routledge.
Walkerdine, V. (1984). Developmental psychology and the child-centered pedagogy: The insertion of Piaget into early education. In J. Henriques, W. Hollway, C. Urwin, C. Venn, & V. Walkerdine (Eds.), Changing the subject (pp. 153–202). London: Methuen.
Walkerdine, V. (1993). Beyond developmentalism? Theory and Psychology, 3, 451–469.
Weber, E. (1984). Ideas influencing early childhood education: A theoretical analysis. New York: Teachers College Press.
Wiltse, S. (1896). A preliminary sketch of the history of child study, for the year ending September, 1896. Pedagogical Seminary, 4, 111–125.
Wynter, S. (1995). 1492: A new world view. In V. Lawrence & R. Nettleford (Eds.), Race, discourse and the origin of the Americas: A new world view (pp. 5–57). Washington, DC: Smithsonian Institution Press.
16 The Scientific Humanism of G. Stanley Hall1
Donald H. Meyer
Source: Journal of Humanistic Psychology, 11 (1971): 201–203.
Hall: A Transition Figure in American Psychology
Granville Stanley Hall (1844–1924),2 known as a champion of experimental psychology in America, was also an advocate of what is today called humanistic psychology. On the one hand, he wanted to make psychology “a true science” by introducing laboratory techniques and by freeing it “from certain metaphysical vestiges which still encumber it” (Hall, 1923, p. 431). But on the other hand, he wanted to preserve what he considered to be essential human values in a time of change and uncertainty, to help man find himself in a technological and scientific world, with the aid of this metaphysically emancipated, scientific study of the mind. His evolutional or, as he called it, “genetic” psychology would serve science by placing man squarely in nature; and it would serve man by acquainting him with the natural source of his higher, spiritual powers. In short, G. Stanley Hall wanted to find a place for human aspirations in the realm of scientific objectivity. Clearly, Hall was a transitional figure, overworked though that phrase is, and has to be considered in the light of two traditions. He participated in the new movement to release psychology from its bondage to metaphysics and theology, so that it could more closely approximate the experimental precision of the biological and physical sciences. But he also participated in the old tradition of “mental philosophy” or “mental science,” which played such a central part in nineteenth century American intellectual life. Mental philosophy was a combination of epistemology, amateur anthropology, and armchair
introspection; and it was usually associated with another discipline, moral philosophy, itself a combination of ethics, social science, and old-fashioned moralizing. Mental and moral philosophy were related in this respect: that they began by assuming that human nature, empirically studied, could be shown to be fundamentally moral and religious. The human mind, it was believed, possessed a number of moral attributes – such as the conscience, will, and certain benevolent affections – which pointed eternally beyond themselves, towards a higher, transcendent reality. Conscience, for example, the voice of God within, forever prodded one not only to consider one’s moral responsibilities to others, but also to realize or actualize one’s own spiritual potential (Wayland, 1835, pp. 42–63; Bowen, 1849, pp. 288–332; McCosh, 1850, pp. 8–9; Hopkins, 1869, pp. 125–131; Bascom, 1879, pp. 17–43). In the mid-nineteenth century a course in mental and moral philosophy was taught in virtually every American college, usually in the senior year, frequently by the college president himself; and it was intended as the capstone of a student’s college education, providing him with what one historian has called “a unified interpretation of life” (Schmidt, 1930, pp. 143–145; Persons, 1958, pp. 189–194). Scores of textbooks were written for this course, and they may be found today in nearly any college library, dusty and deteriorating, shelved between modern studies in ethics or psychology, looking piteously out of place, like a fading photograph of some Victorian ancestor that gets mixed up among the snapshots of last summer’s vacation trip. The course was taught by men like Francis Wayland (1796–1865), Mark Hopkins (1802–1887), and Noah Porter (1811–1892), who liked to refer to their discipline as a “science,” although people like Hall considered mental and moral philosophy anything but scientific and struggled mightily to disassociate psychology from its ancestor.
This was done, but at a high price. Shabby as our Victorian forebear may look in that old photo, there was nonetheless something undeniably humane about him, a quality that seems diminished in our own time. We are more sophisticated than he, but not more fully alive than he was, perhaps less so. And here is precisely where Hall is of interest to us. For, despite his commitment to science and his contempt for what he once called the “insidious orthodoxy” of the old regime, Hall wanted psychology to remain a humanistic study. Hall, who studied under Mark Hopkins and John Bascom (1827–1911) at Williams College, shared many of the assumptions of his nineteenth century predecessors. To be sure, he criticized the exponents of mental and moral philosophy for their reliance on introspection, their seeming obliviousness to new intellectual trends, their confusion of science and metaphysics, and, in general, for the old-fogey influence they had over psychological and philosophical studies in American colleges (Hall, 1879). But, like these pioneers in American psychology, Hall sought in the study of mind a justification for basic moral and religious sentiments and convictions; and, like his predecessors, he assumed that fundamental moral and spiritual principles are implicit
in human nature, and that a careful scrutiny of the human mind would prove that man is, by nature, a moral and religious being. These principles were to be discovered, however, by a rigorously scientific study, an approach supposedly free of prior theological commitments. Furthermore, whereas the earlier American philosophers emphasized man’s rational and conscious life, and concentrated on the individual moral agent, Hall put far greater stress on man’s emotional and instinctive life, and emphasized the communitarian, even tribal, aspects of morality (Hall and Saunders, 1900, pp. 534–591; Wilson, 1968, pp. 114 –143). Still, despite his criticisms of the early academic philosophers and his differences with them on specific matters, Hall carried on in a tradition of psychological moralism that had been prominent in Victorian America. And, committed though he was to scientific objectivity, he insisted that no less important was a regard for the warm, human element in all intellectual enterprises, including science. The psychologist, Hall believed, should minister to the psychic needs of his time – and this he tried to do. He knew he was living in an age in which longstanding beliefs and traditional values were being challenged by thoroughgoing social and intellectual changes. Like many others at the turn of the century, moreover, Hall believed that urban-industrial civilization was insulating man and robbing people of their vitality by denying them their basic instincts. Our emotional life is jeopardized, he complained, by a civilization that makes “the hot life of feeling” seem “remote and decadent” (Hall, 1904a, II, p. 59). Meanwhile, our ever-probing intellect has destroyed our religious faith – that is, our perception of the larger relations of life. The wholeness of life is lost, and consequently the scientist and the seer or poet can no longer be one man (Hall, 1912a, pp. 73–74; 1885, pp. 247–248). 
Knowing and feeling have been divorced and, in science, experiential knowledge is replaced by cold, impassive, spectator knowledge. Hall wanted to restore a sense of ontological wonder, a fusion of thought and feeling in what William Kingdon Clifford (1877) once called “cosmic emotion” – that sense of awe we feel in contemplating the universe, and of reverence in considering the microcosm of our own mind. In this he shared a concern of nineteenth century thinkers generally, whether they be atheists (like Clifford) or believers, positivists or romantics. Late in his life, Hall observed that man “has in a sense outgrown his world so that it is now too narrow for him. From now on development must be intensive rather than extensive, and inward as well as outward.” This means, he added, that we face the task “of getting deeper knowledge of human nature and of finding more effective ways of guiding it” (Hall, 1923, pp. 535–536). Stanley Hall wanted a scientific psychology that would restore and preserve the whole man. He attempted nothing less than to construct a new ontology of mind, rewording the teleological assumptions of an old, pre-evolutional world view, phrasing them in evolutional terms, and applying them to the modern situation. Freeing the scientific intellect from the old assumptions that had restricted it in the past, he would use science to
recover man’s essential humanity by probing into the rich, affective life that underlies intellect. This exploration of man’s deepest soul would help reestablish old-time moral and religious values – values that had lost their grip on the imagination. Hall’s unshakable faith in the unity of truth and in the oneness of nature and spirit sustained his conviction that he could introduce a new anthropology, using the science of the mind to define man’s nature and destiny.
Hall’s Genetic Psychology
Hall developed a coherent psychological theory in the 1880s and 1890s, although he never gave it systematic exposition. He insisted, first of all, that we return to Aristotle, and base psychology on biology (Hall, 1904a, I, pp. vii–viii). This would place psychology in the post-Darwinian world, and teach us “that the human soul in all its powers is just as much a product of evolution as the body”3 (Hall, 1909, p. 256). Genetic psychology, he maintained, would not be limited to the study of normal persons or to the information derived from literature or the laboratory. It would study life itself, and concern itself with abnormal persons, children, primitive peoples, animals and even plants, as well as every aspect of human existence, including religion, sex, myths, mores, folk lore, and music. Hall was distrustful of tidy “categories,” and believed that, as Darwinism had upset the logical order man had imposed on the biological world, so the genetic psychology would destroy the artificial classifications used in the study of mind (Hall, 1904a, II, pp. 49–51). In a tradition of scientific humanism that was to include Abraham Maslow (1966), Hall consistently opposed a too-narrow definition of science as something ahuman and mechanistic. He insisted that the science of psychology avoid reducing the life of the mind to mere units and laws of behavior in order more easily to embrace it in some explanatory structure. Thus he opposed what has in our day been called a “mechanomorphic” approach to psychology (Bugental, 1967, pp. 4–5), which, for the sake of methodological convenience, replaces the living person with a dehumanized abstraction. Hall’s objection, it should be noted, was scientific as well as ethical. Our knowledge of the mind is too rudimentary, he said, to justify the proliferation of categories and laws that some psychologists seem to favor.
It is “the mark of the amateur to insist on a greater degree of accuracy than the subject permits” (Hall, 1906, p. 300). In effect, Hall accused the behaviorists and many experimental psychologists of the same kind of unscientific abstractionism or what Morton White calls “formalism” (1957, pp. 11–31) of which the exponents of the old “faculty psychology” had been guilty when they simplistically divided mind into quasi-organs of reason, will, and emotion (Hall, 1912a, p. 412; 1923, p. 361). The genetic psychology, in Hall’s view, is in an admirable position to serve the cause both of humanism and of science.
Stanley Hall’s genetic psychology performed two particular functions. It provided him with a scientific methodology, and it served as a map of the human psyche. Proceeding on the assumption that mind is the product of the evolutional process, Hall argued that one can isolate various emotions, modes of thought, and patterns of behavior, and explain them historically. This raised the methodological problem. How is one to study ancient manifestations of these psychic elements? How is one to resurrect the mental past? Hall answered this in two ways. In the first place, since it seemed likely that ancient psychic activities may still be observed in animal life or in the customs of primitive people, Hall turned to comparative psychology to study the past as it is represented in the present. In the second place, adopting the recapitulation theory of Herbert Spencer and Ernst Haeckel, who held that the history of the individual repeats the history of the race, Hall was able to make child study the study of man’s psychic development. To understand the human mind, he said, we must undertake “the objective study of every phase and every growing stage of the psyche in animals, savages, and children” (Hall, 1898, p. 393). Savages could be studied in the field, and animals could be brought into the laboratory. Children could be studied through the technique of the questionnaire, developed by Francis Galton, and used by Hall and his students as early as 1883 (Hall, 1883). Genetic psychology was not just a methodological device but also a key to human nature: for it provided a chart of the human mind – indeed, of man’s soul. Hall had come to doubt the usefulness of speculative philosophy in this regard, and became convinced that, truly to understand the human mind, we must get behind the so-called “higher faculties” of consciousness and intellect, and examine emotions, instincts, and adaptive reflexes. 
We must look backward over the generations to a larger “soul-life.” Once we have done this, we will soon realize that the soul, in its lowest stages, is indistinguishable from life; for there can be no vitality without soul. As an organism, animal, or plant adjusts structurally or functionally to new influences in the environment, it manifests soul-life. There is even a sense, he suggested as he considered phenomena like heat, light, and radio-activity, in which the entire universe is alive, hence possessed of soul (Hall, 1908, pp. 149–212; also: 1898, p. 393; 1904a, II, p. 63). “If we abandon ourselves to the very madness of mysticism,” he once said, we may, in the “vibrations and impacts” found in matter, heat, light, and even in atoms and electrons, sense a rhythm which suggests “the material soul of the All” (Hall, 1911, I, p. 92). For Hall, the soul was part of the structure of being. In rhapsodic moments, which were not infrequent, he would speak of the psyche as a “quantum and direction of vital energy,” and of the ego as “a spark struck off from the central source of all being,” the essence of which is “its process of becoming” (Hall, 1904a, II, p. 69). We may examine the vaster soul-life that underlies human consciousness, Hall reasoned, because each person is actually a “vast aggregate of qualities and influences vinculated together, treated and acting as a unity” (Hall, 1898,
p. 391). By analyzing this aggregate into its component parts, and then tracing each of these components down the scale of life, we can determine the boundaries of the soul. But we must begin not with awareness, choice, or even sentience, but with “affectivity” – the ability to respond to a pleasurable or painful stimulus. This response – which at times may only be a reflex action – is the earliest manifestation of soul. Pleasure and pain, said Hall, “emerged perhaps at first as separate outcrops in a hitherto apsychic world” (Hall, 1914, pp. 157–159). Hall’s study of fear illustrates his technique of exploring our psychic life. Fear, or the anticipation of pain, is the earliest observable complex psychic phenomenon. “If pleasure-pain is the result of the first day’s work of creative psychic evolution,” he said, “fear is that of the second” (Hall, 1914, p. 151). In experimenting on paramoecia and on the excised muscles of a frog’s leg, Hall noted that stimuli which were normally below the threshold of sensitivity nonetheless elicited a response if they were repeated often enough. He suggested that the “physiological anticipation” evident in these experiments permits at least a “pragmatic, quasi, or als ob assumption of fear,” that is, an anticipation of pain. From this inference Hall deduced that, “as man is implicit in the ovum, so fear is implicit in the above conditions” (Hall, 1914, pp. 156 –157). All of man’s “higher” mental capabilities may be traced to such rudimentary phenomena. Mind, as we think of it, emerged as basic emotions, like fear, evolved from some primitive manifestation in the earliest life-forms into more complex patterns of feeling. The intellectual powers developed and more highly evolved organisms tried to cope with such emotions as fear by foreseeing and guarding against possible future pain. Knowledge, which Hall associated with activity and regarded more as a function of the will than of the intellect (Hall, 1904a, I, pp. 
131–132, 235), emerged as the animal used his muscles to accomplish specific tasks. Mechanical actions, instincts, habits, and emotions came first. Consciousness, curiosity, and intellect appeared later, and remain peripheral to the old, pre-conscious attributes of the soul, which are at the root of our psychic life. Although Hall’s system of psychology was never fully developed, it consisted in a remarkably unified set of assumptions. Indeed, behind much of his research lay his desire for unity. He delighted in harmony, and had a passion for reconciling opposites. Although he rejected Hegel’s philosophical system (Hall, 1878b), he remained very much a Hegelian in aspiration. He wanted to synthesize all the contradictions he found about him, and discovered in psychology the means of doing this. Concerned as it was with man’s deepest soul, psychology had the mission of identifying man in relation to a new universe. Hall announced this mission in 1885. The “new psychology,” he said, is to flood and transfuse the new and vaster conceptions of the universe and man’s place in it—now slowly taking form, and giving reason a new
cosmos, and involving momentous and far-reaching practical and social consequences—with the old Scriptural sense of unity, rationality, and love beneath and above all [Hall, 1885, pp. 247–248].
Psychology would reacquaint man with the cosmos, giving him a sense of belongingness and rootedness in it.

Hall was prone to make ambitious claims for his genetic psychology. In 1897 he announced that the “doctrine of mind” that it offered was “fit to be a national philosophy of democracy,” because it required no mystical or esoteric insights and, hence, was publicly verifiable (Hall, 1897, p. 248); and, he added later, because it supported the proposition that human nature is trustworthy (Hall, 1923, p. 363). This doctrine of mind, he said, holds that every individual is an expression of a “folk-soul,” which provides him with his racial and ethnic values. Beneath the folk-soul lies the soul of humanity itself, “Mansoul,” binding the human race together. Finally, there is a cosmic soul, the sustainer of reality, which is at once material and psychic (Hall, 1923, pp. 438–443). God, for Hall, was the “cosmic order personified.” God, he said, is the power which provides for “order, law, and possibility of science” (Hall, 1911, I, p. 139). The genetic psychology served a philosophical end as it revealed this order; and it served a religious end as it showed man his place in the universe.

It is not enough for a man just to feel at home in the universe. Knowing, for Hall, implied doing; and any descriptive statement contained something of an ethical imperative. Hall’s ethics centered on the ideals of love and service. The individual, animated by the natural instincts of gregariousness, pity, and love, must devote himself to the welfare of others (Hall, 1904a, II, p. 337; 1917, pp. 282–283, 726–733; 1920, pp. 369–371). The life of benevolent service, rooted as it is in moral impulses that are endowments of the human soul, is the natural life for man. Consequently, moral development is the result not of control but of normal growth; and values are not to be imposed from without but cultivated as they emerge from within.
The practical conclusion that Hall drew from his psychological theory of morals is that morality is the result of education rather than of coercion. He predicted that education, which provides the link between man’s new knowledge of himself and the moral implications of that knowledge, would lead to a “higher anthropology” (Hall, 1893, pp. 440–441; 1892, pp. 72–89).

In 1904 Hall revealed the wider dimensions of his “higher anthropology” when he published his greatest work, Adolescence, drawing together many of his own speculations and using material from over 2,000 sources. A central theme of the work is that the child is the link between the past and the future. In the child’s soul are the “echoes” and “murmurings” of an ancient, long-forgotten past – a past that the child must be allowed to recapitulate, step by step (Hall, 1904a, I, pp. x-xi). It is essential, he warned, that we keep out of nature’s way and trust evolution in this matter, working with it and not against it; for
otherwise we may induce “precocity,” forcing the child to skip a step in the evolutional ladder and thereby stunting his emotional growth. Adolescence is an especially crucial time in a person’s growth because, in this period, the higher sensibilities develop and the ideals of love and service take form (Hall, 1904a, I, pp. xiii-xv; II, p. 337). At last the child becomes fully human and is ready to make his contribution to the always-emerging human soul.
Conclusion: Hall’s Significance
Hall, as we have seen, was a transitional figure in the history of psychology, and perhaps in intellectual history generally. Like William James, whom he resembles in some respects, and like John Dewey, he was aware of the danger threatening modern man – the danger of being dehumanized by the very processes, scientific and technological, that were expected to enlarge his vision and improve his life. He wanted to see to it that certain spiritual and emotional qualities that had, for centuries, been part of western man’s definition of humanity, were not destroyed or allowed to atrophy in the modern world. The application of science to the question of human nature, Hall knew, was resulting in a change in the philosophy of man – the definition of human aspirations, instincts, potentials. And when the philosophy of man changes, as Abraham Maslow observed, “then everything changes” (1968, p. 189). Hall could accept this; but he was concerned about the direction of that change. He wanted to see an enlargement, not a diminution, of human possibility. For him, scientific psychology was the most promising means of ensuring this enlargement, although toward the end of his life he grew more doubtful. He became convinced “that neither the metaphysician, philosopher, mathematician, chemist, physicist, nor strange as this may seem to some of my fellow psychologists, even the mechanic or mere computer, is or ever can become a good psychologist. All of them lack the instinct, insight, or flair needful here somewhat in proportion to their excellence in these fields.” Their influence is “de-vitalizing, de-animistic, de-anthropomorphizing while psychology stands for the progressive refinement and final blossoming and fruitage of just these tendencies, which take their rise with biology” (Hall, 1923, p. 435). The biological language as it is used here expresses more than Hall’s desire to associate psychology with that branch of science.
It expresses the vitalism at the center of his psychological vision. He liked to speak of the “vital energy” of the human spirit that was behind all viable institutions, the “will-to-life, élan vital, horme, libido, nisus,” whatever one might choose to call it, “which made all the ascending orders of life and in Mansoul itself evolved mind, society, language, industry, gods, religion – in short all human institutions, and lastly science” (Hall, 1920a, p. 1). For Hall there was no dichotomy between science and values or between the scientific and the human. The only danger was that psychologists might so rigidly and narrowly limit the scope and permissible
limits of their discipline as to excise from it matters of ultimate concern. Living until 1924, Hall could see this happening. He complained the year before his death that psychology, as he understood it, was on the decline. He deplored what he considered the excessive reliance on physiological psychology and the failure to make sufficient advance in psychoanalytical techniques and the study of the unconscious life. Above all, he spoke of the need to reevaluate life, “its mores and its institutions, by inquiring how they square with the vastly older basal instinct-feelings of the race – hunger, love, the herd, property, appetencies, and many others.” What is needed is “a great synthetic movement which shall bring new harmony to our now ominously divided effort and set us again on the trail we have so unfortunately lost” (Hall, 1923, p. 16).

Undeniably, Hall’s vision (and “vision” is the appropriate term) was noble. His scientific humanism, however, did not survive him; and the man who was once introduced as the “Darwin of the mind” was, ironically, ultimately dismissed as being too metaphysical and speculative – too unscientific, in short. He is remembered, if at all, as an educator, a founder, and a promoter (Boring, 1929, p. 504). Perhaps Hall’s greatest shortcoming as a theorist was not his effort to combine the scientific and the humane, but the fact that, like many latter-day Victorians, he was by temperament what Isaiah Berlin (1953) calls a “hedgehog” – demanding a single, unitary vision capable of encompassing reality and of accounting for all things. In this he makes an interesting comparison with his contemporary, William James, who was as much a humanist as Hall and more respected as a scientific psychologist (although James was less passionately eager to describe himself as a scientist than was Hall).
But James, considering the primitive state of psychology in his day and the disordered state of philosophy and religion, was willing to accept a pluralistic universe. Furthermore, for James, psychology did not provide answers to basic metaphysical questions; it merely allowed him to rephrase these questions in somewhat more manageable form. In contrast, Hall was, by his own description, a “monist” (Hall, 1878a, p. 450), and he sought in psychology nothing less than the basis for a new theology that would chart the limits of being-itself. He was asking for a lot – for more, many thought, than he had any right to expect – and he was, consequently, too readily repudiated as a romantic and a dreamer. But it was more his metaphysical immodesty than his metaphysical and humanistic concern that discredited Hall. What does a figure like G. Stanley Hall have to say to us today? To be sure, we can find similarities between some of his ideas and our own: his underlying concerns are, after all, universal. But there is more. With all his shortcomings and his early over-confidence, Hall had the largeness of mind to realize that a truly scientific psychology need not and ought not be permitted to reduce man to a moral nullity. Some philosophers argue that one cannot logically derive normative judgments from descriptive statements, and perhaps they are right; although, as Hall frequently observed, life is frequently illogical, and it is with life that the psychologist is primarily concerned. In any case, what Hall
realized is that a truly scientific psychology will not make its subject (man) less complex than it is simply to fit it conveniently into an explanatory framework. Perhaps Hall erred in maintaining that science can somehow verify man’s most deeply cherished subjective values. But he was right in insisting that man is a moral creature, capable of cherishing things and of aspiring beyond the realm of cherished things, and that a scientific explanation which ignores or minimizes this ignores too much.

In retrospect, the irony of Hall’s position is at once apparent. His schemes to strip psychology of its metaphysical assumptions in order to use it to restore ultimate truths, and to recapture man’s soul-life by a more objective investigation of mind seem futile and even self-defeating. Nevertheless, Hall remained the custodian of an old belief and an ancient hope – the belief that human nature points beyond itself and the hope that, through self-knowledge, man may indeed transcend himself. Science, he once remarked, “should be taught at first in a large, all-comprehensive way, reopening the half-obscured but broad road by which man passes from nature to nature’s God” (Hall, 1904a, II, p. 151). His shortcomings as a scientist should not obscure for us the essential humanity of his ideal; for the irony of his failure is not exclusively his.
Notes
1. Requests for reprints should be addressed to Professor Donald H. Meyer, Department of History, University of Delaware, Newark, Delaware, 19711.
2. Hall was born in Ashfield, Massachusetts, and in 1844 entered Williams College with the intention of studying for the ministry. In 1867 he enrolled at Union Theological Seminary, but left in 1868 to go to Germany, where he studied philosophy, rather aimlessly, for three years. Returning to America in 1872, he completed his work at Union, but abandoned the idea of the ministry. He entered Harvard as a graduate student in psychology, and in 1878 received his Ph.D., having worked under William James and Henry Bowditch. That year he returned to Germany for another three years’ study under such commanding figures as Hermann von Helmholtz, Karl Ludwig, and Wilhelm Wundt. Soon after returning to America, Hall accepted a position at the new Johns Hopkins University, where he established one of the first complete psychological laboratories in 1883. In 1889 he founded the American Journal of Psychology, a year after having become the first president of the newly founded Clark University at Worcester, Massachusetts. He remained at Clark until 1920.
3. Although he sometimes used them interchangeably, the terms “mind” and “soul” usually had different meanings for Hall. He tended to relate the soul to instinct and emotion, and to regard it as primordial; while he considered the mind to be a more plastic entity, and associated it with later-appearing intellectual powers (Hall, 1904a, II, p. 63).
References
Bascom, J. Ethics or science of duty. New York: G. P. Putnam’s Sons, 1879.
Berlin, I. The hedgehog and the fox. New York: Simon and Schuster, 1953.
Boring, E. G. A history of experimental psychology. New York: Century, 1929.
Bowen, F. Lowell lectures, on the application of metaphysical and ethical science to the evidences of religion. Boston: Little and Brown, 1849.
Bugental, J. F. T. The challenge that is man. Journal of Humanistic Psychology, 1967, 7, 1–9.
Clifford, W. K. Cosmic emotion (1877). In L. Stephen and F. Pollock (Eds.). London: Macmillan, 1879. 2 vols. 2, 253–285.
Hall, G. S. The muscular perception of space. Mind, 1878, 3, 433–450. (a)
Hall, G. S. Notes on Hegel and his critics. Journal of Speculative Philosophy, 1878, 12, 93–108. (b)
Hall, G. S. The philosophy of the future. Nation, 1878, 27, 283–294. (c)
Hall, G. S. Philosophy in the United States. Mind, 1879, 4, 89–105.
Hall, G. S. Aspects of German culture. Boston: Osgood, 1881.
Hall, G. S. The content of children’s minds on entering school. Princeton Review, 1883, 11, 249–272.
Hall, G. S. The new psychology. Andover Review, 1885, 3, 120–135, 239–248.
Hall, G. S. Moral education and will-training. Pedagogical Seminary, 1892, 2, 72–89.
Hall, G. S. Child study the basis of exact education. Forum, 1893, 16, 429–441.
Hall, G. S. A study of fears. American Journal of Psychology, 1897, 8, 141–150.
Hall, G. S. Some aspects of the early sense of self. American Journal of Psychology, 1898, 9, 351–395.
Hall, G. S. A study of anger. American Journal of Psychology, 1899, 10, 516–591.
Hall, G. S. Pity. American Journal of Psychology, 1900, 11, 534–591.
Hall, G. S. Civilization and savagery. Massachusetts Historical Society, Proceedings, 1903, 17 (Ser. 2), 4–13.
Hall, G. S. Adolescence: Its psychology and its relation to physiology, anthropology, sociology, sex, crime, religion and education. New York: Appleton, 1904. 2 vols. (a)
Hall, G. S. Youth: Its education, regimen, and hygiene. New York: Appleton, 1904. (b)
Hall, G. S. The affiliation of psychology with philosophy and with the natural sciences. Science, 1906, 23, 297–301.
Hall, G. S. Aspects of child life and education. New York: Appleton, 1907.
Hall, G. S. A glance at the phyletic background of genetic psychology. American Journal of Psychology, 1908, 19, 149–212.
Hall, G. S. Evolution and psychology. Fifty years of Darwinism: Modern aspects of evolution. New York: Henry Holt, 1909.
Hall, G. S. Educational problems. New York: Appleton, 1911. 2 vols.
Hall, G. S. Founders of modern psychology. New York: Appleton, 1912. (a)
Hall, G. S. The genetic view of Berkeley’s religious motivation. Journal of Religious Psychology, 1912, 5, 137–162. (b)
Hall, G. S. Why Kant is passing. American Journal of Psychology, 1912, 23, 370–426. (c)
Hall, G. S. A synthetic genetic study of fear. American Journal of Psychology, 1914, 25, 149–200, 321–392.
Hall, G. S. Freudian methods applied to anger. American Journal of Psychology, 1915, 26, 439–443.
Hall, G. S. Jesus, the Christ, in the light of psychology. New York: Appleton, 1917.
Hall, G. S. Morale: The supreme standard of life and conduct. New York: Appleton, 1920. (a)
Hall, G. S. Recreations of a psychologist. New York: Appleton, 1920. (b)
Hall, G. S. Life and confessions of a psychologist. New York: Appleton, 1923.
Hopkins, M. The law of love and love as a law; Or, moral science, theoretical and practical. New York: Charles Scribner, 1869.
Maslow, A. H. The psychology of science: A reconnaissance. New York and London: Harper, 1966.
Maslow, A. H. Toward a psychology of being (2nd ed.). New York: Van Nostrand Reinhold, 1968.
McCosh, J. Method of divine government physical and moral. London: Macmillan, 1850.
Persons, S. American minds: A history of ideas. New York: Henry Holt, 1958.
Schmidt, G. P. The old time college president. New York: Columbia University Press, 1930.
Wayland, F. The elements of moral science (1835), J. L. Blau (Ed.). Cambridge, Massachusetts: Harvard University Press, 1963.
White, M. Social thought in America: The revolt against formalism. Boston: Beacon Press, 1957.
Wilson, R. J. In quest of community: Social philosophy in the United States, 1860–1920. New York and London: Oxford University Press, 1968.
17 Growing Old – Or Older and Growing1 Carl R. Rogers
What is it like to be 75 years old? It is not the same as being 55 or 35, and yet, for me, the differences are not so great as you might imagine. I’m not sure whether my story will have any use or significance to anyone else, because I have been so uniquely fortunate. It is mostly for myself that I am going to set down a few perceptions and reactions. I have chosen to limit myself to the decade from age 65 to 75, because 65 marks, for many people, the end of a productive life, and the beginning of “retirement” – whatever that means!
The Physical Side
I do feel physical deterioration. I notice it in many ways. Ten years ago I greatly enjoyed throwing a frisbee. Now my right shoulder is too painfully arthritic for that. In my garden I realize that a task which would have been easy five years ago but difficult last year, now seems like too much, and I had better leave it for our weekly gardener. This slow deterioration, with various minor disorders of vision, heart-beat and the like, informs me that the physical portion of what I call “me” is not going to last forever. Yet I still enjoy a four-mile walk on the beach. I can lift heavy objects, do all the shopping, cooking, and dishwashing when my wife is ill, carry my own luggage without puffing. The female form still seems to me one of the loveliest creations of the universe and I appreciate it greatly. I feel as sexy in my interests as I was at 35, though I can’t say the same about my ability.
Source: Journal of Humanistic Psychology, 20(4) (1980): 5–16.
But I am delighted that I still feel sexually alive, even though I can sympathize with the remark of Supreme Court Justice Oliver Wendell Holmes upon leaving a burlesque house at age 80: “Oh to be 70 again!” So, I am well aware that I am obviously old. Yet from the inside I’m still the same person in many ways, neither old nor young. It is that person of whom I will speak.
My Activities
New Enterprises
I realize that in the past decade I have embarked upon many new ventures involving psychological or even physical risk. It puzzles me that in most instances my engagement in these enterprises was triggered by a suggestion or a remark of someone else. This makes me realize that frequently there must be a readiness in me of which I am not aware, which springs into action only when someone presses the appropriate button. Let me illustrate.

It was primarily Bill Coulson and a few others who said, in 1968, “Our group should form a new and separate organization.” Out of that came the formation of the Center for Studies of the Person – the zaniest, most improbable, and most influential non-organization imaginable. Once it was suggested, I was very active in the group which brought it into being, and helped nurture it – and ourselves – during the first difficult years.

It was a friend of mine, Ruth Cornell, an elementary school teacher, who asked, “Why is there no book of yours on our reading lists in Education?” This sparked the initial thinking which led to my book Freedom to Learn.

I never would have considered trying to influence the status-conscious medical profession, had it not been for Orienne Strode’s dream of having a humanizing impact on physicians through intensive group experiences. Skeptical but hopeful, I devoted energy to helping start the program. We ran a great risk of failure. Instead, it has become widely influential. Nine hundred medical educators have participated in the encounter groups along with many spouses, and some physicians-in-training who bring in the “worm’s-eye view” of medical education. It has been an exciting and rewarding development, now completely independent of any but the most minor assistance from me.

This summer we are holding our fifth 16-day intensive workshop in the person-centered approach. These workshops have taught me perhaps more than any other one venture in the past decade.
I have learned and put into practice new ways of being myself. I have learned cognitively and intuitively about the group process and about group-initiated ways of forming a community. These have been tremendous experiences, involving a strong staff which has become a close professional “family.” We have done more and more risking in trying out new ways of being with a group. And how did
I become involved in this large and time-consuming enterprise? Four years ago my daughter Natalie said to me, “Why don’t we do a workshop together, perhaps around a client-centered approach?” Neither of us could have possibly guessed all that would grow out of that conversation.

My new book on personal power likewise found its initial spark in a conversation. A graduate student then, Alan Nelson, challenged me on my statement that there was no “politics” in client-centered therapy. This led me into a line of thought which I must have been very ready to pursue because portions of the book simply wrote themselves.
Foolhardy or Wise?
The most recent, and perhaps most risky venture is the trip I and four other CSP members took to Brazil. Here it was the organizing efforts, the vision, and the persuasiveness of Eduardo Bandeira which caused me to agree to go. Some people believed the trip would be too long and hard for me at my age, and I had qualms myself about 15-hour plane flights and the like. Then there was the arrogance of thinking that our efforts could in any way influence a vast country. But the opportunity to train Brazilian facilitators, most of whom had attended our workshops in this country, in order that they could put on their own intensive workshop, was very attractive.

Then there was another opportunity. We were to meet audiences of 600 to 800 people in three of Brazil’s largest cities. These were two-day institutes, in which we would be together for about 12 hours over two days. Before we left the United States we agreed that with meetings of this size, and of such short duration, we would necessarily have to rely on giving talks. Yet, as the time approached, we felt more and more strongly that to talk about a person-centered approach, without sharing the control and direction of the sessions, without giving the participants a chance to express themselves and experience their own power, was something we could not do.

So we took some extremely far-out gambles. In addition to very short talks, we tried leaderless small groups, special interest groups, a demonstration encounter group, dialogue between staff and audience. But the most daring thing was to form a large circle of 800 people (10 to 12 deep), and permit feelings and attitudes to be expressed. Microphones were handed about to those who wished to speak. Participants and staff took part as equals. There was no one person or group exercising leadership. It became a mammoth encounter group. There was much initial chaos, but then people began to listen to one another.
There were criticisms – sometimes violent – of the staff and of the process. There were persons who felt they had never learned so much in such a short time. There were the sharpest of differences. After one person blasted the staff for not answering questions, not taking control and giving guidance, the next person said, “But when, if ever,
have we all felt so free to criticize, to express ourselves, to say anything?” Finally, there was constructive discussion of what they would do with their new knowledge in their back-home situations.

I remember, after the first evening in São Paulo, when the session had been extremely chaotic, and I was keenly aware that we had only six hours more with the group, I refused to talk to anyone about that meeting. I was experiencing enormous confusion. Either I had helped launch an incredibly stupid experiment doomed to failure, or I had helped to innovate a whole new way of permitting 800 people to sense their own potentialities and to participate in forming their own learning experience. There was no way of telling which it would prove to be.

Perhaps the greater the risk, the greater the satisfaction. In São Paulo, the second evening, there was a real sense of community, and people were experiencing significant changes in themselves. Informal follow-up in the weeks and months since bears out the worthwhileness of the experience for hundreds of people in each of the three cities. Never have I felt an extended trip to be so valuable. I learned a great deal, and there is no doubt that we managed to create a facilitative climate in which all kinds of creative things – at personal, interpersonal, and group levels – happened. I believe we left a mark on Brazil, and certainly Brazil changed all of us. Certainly we have extended our vision of what can be done in very large groups.

So those are some of the activities – all extremely profitable to me – into which I have been drawn during this period.
Risk Taking
In these activities there has in each case been an element of risk. Indeed it seems to me that the experiences I value most in my recent life all entail considerable risk. So I should like to pause for a moment and speculate as to the reasons behind my taking of chances. Why does it appeal to me to try the unknown, to gamble on something new, when I could easily settle for ways of doing things which I know from past experience would work very satisfactorily? I am not sure I understand fully, but I can see several factors which have made a difference.

The first has to do with what I think of as my support group, the loose cluster of friends and close associates, most of whom have worked with me in one or another of these endeavors. In the interactions of this group, there is no doubt that we actually or implicitly encourage each other to do the new or daring thing. For example, I am certain that, acting singly, no member of our Brazil group would have gone so far in experimentation as did the five of us working together. We could gamble because if we failed, we had colleagues who believed in us, and who could help put the pieces back together. We gave each other courage.
A second element is my affinity for youth and for the emerging life style which is being brought about with the help of young people. I cannot say why I have this affinity. I know it exists. I have written about “the emerging person” of tomorrow, and I know that I myself am drawn toward this newer way of being and living. I have wondered if I were simply engaged in wishful thinking in describing that person. Now I feel confirmed when I discover that the Stanford Research Institute has completed a study in which it estimates that 45 million Americans are committed to “a way of living that reflects these inner convictions: first, it is better to have things on a human scale; second, that it is better to live frugally, to conserve, recycle, not waste; and third, that the inner life, rather than externals, is central.”2 I belong to that group, and trying to live in this new way is necessarily risky and uncertain. Another factor is that I am bored by safety and surety. I know that sometimes when I prepare a talk or deliver a paper it is very well received by an audience. This tells me that I could give the talk twenty times to twenty different audiences and I would be assured of a good reception. I simply cannot do this. If I give the same talk three or four times, I become bored with myself. I cannot bear to do it again. I could earn money, I could obtain a positive reaction, but I can’t do it. I’m bored by knowing how it will turn out. I’m bored to hear myself saying the same things. It is necessary to my life to try something new. But perhaps the major reason I am willing to take chances is that I have found that in doing so, whether I succeed or fail, I learn. Learning, especially learning from experience, has been a prime element in making my life worthwhile. Such learning helps me to expand. So I continue to risk.
Productivity

Writing

In thinking about this talk I asked myself, “What have I produced during this past decade?” I was utterly astonished at what I found. The list of my publications, which my secretary keeps up to date, tells me that since I turned 65 I have turned out four books, some 40 shorter pieces, and several films. This is more than I have published or produced during any previous decade. Furthermore, each of the books is on a distinctively different subject, though they are all tied together by a common philosophy. Freedom to Learn, in 1969, brought together in one book my unconventional approach to education. The following year saw my book on encounter groups expressing my accumulating learnings on this exciting development. In 1972 Becoming Partners pictured many of the new patterns in men-women relationships. And now Carl Rogers on Personal Power explores, in many fields, the emerging politics of a person-centered approach.
Of the two-score papers, four stand out for me, two of them looking forward, two backward. An article on empathy consolidates what I have learned about that extremely important way of being, and I think well of this paper. I also like the freshness of my statement on “Do We Need ‘A’ Reality?” published only in Brazil! Then two other papers reflect upon my career as a psychologist, and the development of my philosophy of interpersonal relationships. I stand by both.
Why?

I look on all this surge of writing with wonder. What is the explanation? Different people in their later years have had very individual reasons for their writing. At age 80, Arnold Toynbee (1969) asks himself the question, “What has made me work?” He responds:

Conscience. In my attitude toward work I am American-minded, not Australian-minded. To be always working and still at full stretch, has been laid upon me by my conscience as a duty. This enslavement to work for work’s sake is, I suppose, irrational, but thinking so would not liberate me. If I slacked, or even just slackened, I should be conscience-stricken and therefore uneasy and unhappy, so this spur seems likely to continue to drive me as long as I have any working power left in me.
Somehow to live such a driven life seems very sad to me. It certainly bears little resemblance to my motivation. I know that Abraham Maslow, in the years before his death, had a different urge. He experienced a great deal of internal pressure because he felt there was so much he had to say which was still unsaid. This urge to get it all down kept him writing to the end. My view is quite different. My psychoanalyst friend, Paul Bergman, wrote that no man has more than one seminal idea in his lifetime, and that all his writings are simply further explications of that one theme. I agree. I think this describes my products. Certainly one reason for writing is that I have a curious mind. I like to see and explore the implications of ideas – mine and others’. I like to be logical, I like to pursue the ramifications of a thought. I am deeply involved in the world of feeling, intuition, nonverbal as well as verbal communication, but I also enjoy thinking and writing about that world. It clarifies its meaning for me. Yet there is, I believe, a much more important reason for my writing. To me it seems likely that inside I am still the shy boy who found communication very difficult in interpersonal situations, who wrote love letters which were more eloquent than his direct expressions of love, who expressed himself freely in high school themes but felt himself too “odd” to say the same
things in class. That boy is still very much a part of me. Writing is my way of communicating with a world to which, in a very real sense, I feel I do not quite belong. I wish very much to be understood, but I don’t expect to be. Writing is the message I seal in the bottle, and cast into the sea. My astonishment is that people on an enormous number of beaches – psychological and geographical – find the bottles and discover that the messages speak to them. So I continue to write.
Learnings

Taking Care of Myself

I have always been better at caring for and looking after others than I have been at caring for myself. But in these later years I have made progress. I have always been a very responsible person. If someone else is not looking after the details of an enterprise, or the persons in a workshop, I must. But I have changed. In our 1976 workshop, when I was not feeling well, and at the Arcozelo workshop in Brazil, I simply shed all responsibility for the conduct of these complex undertakings and left it completely in the hands of others. I needed to take care of myself. So, with a few relapses, I simply let go of all responsibility except the responsibility (and the satisfaction) of being myself. For me it was a most unusual feeling to be comfortably irresponsible, without feelings of guilt. And to my surprise, I found I was more effective that way.

I have taken better care of myself physically, in a variety of ways. I have also learned to respect my psychological needs. Three years ago a workshop group helped me to realize how harried and driven I felt by outside demands – “nibbled to death by ducks” was the way one person put it – and it captured my feelings exactly. So I did what I have never done before; I spent 10 days absolutely alone in a beach cottage which had been offered me, and refreshed myself immensely. I found I thoroughly enjoyed being with me – I like me.

I have been more able to ask for help. I ask others to carry things for me, do things for me, instead of “proving” that I can do it myself. I can ask for personal help. When Helen, my wife, was very ill, and I was close to the breaking point from being on call as a 24-hour nurse, a housekeeper, a professional person in much demand, and a writer, I asked for help – and got it from a therapist-friend. I explored and tried to meet my own needs. I explored the strain this period was putting on our marriage.
I realized that it was necessary for my survival to live my life, and that this must come first, even though Helen was so ill. I am not quick to turn to others, but I am much more aware of the fact that I can’t handle everything within myself. In these varied ways, I do a better job of prizing and looking after the person that is me.
Opening Up to New Ideas

During these years, I have been, I think, more open to new ideas. The ones of most importance to me have to do with inner space, the realm of the psychological powers and the psychic capabilities of the human person. In my estimation this area constitutes the new frontier of knowledge, the cutting edge of discovery. Ten years ago I would not have made such a statement. But reading, experience, and conversation with individuals who are working in these fields have changed my view. The human being has potentially available a tremendous range of intuitive powers. There is much evidence that we are indeed wiser than our intellects. We are learning how sadly we have neglected the capacities of the nonrational, creative “metaphoric mind” – the right half of our brain. Biofeedback has shown us that if we let ourselves function in a less conscious, more relaxed way, we can learn at some level to control temperature, heart rate, and all kinds of organ functions. We find that terminal cancer patients, with an intensive program of meditation and fantasy training focused on overcoming the malignancy, effect a surprising number of remissions. So I am open to even more mysterious phenomena – precognition, thought transference, clairvoyance, human auras, Kirlian photography, even out-of-the-body experiences. They may not fit with known scientific laws, but perhaps we are on the verge of discovering new types of lawful order. I feel I am learning a great deal in a new area, and I find it enjoyable and exciting.
Intimacy

In the past few years I have found myself opening up to much greater intimacy in relationships. I see this as definitely the result of workshop experiences. I am more ready to touch and be touched, physically. I do more hugging and kissing, of both men and women. I am more aware of the sensuous side of my life. I realize more sharply how much I desire close psychological contact with others. I recognize how much I need to care deeply for another and to receive that kind of caring in return. I can say openly what I have always recognized dimly, that my deep involvement in psychotherapy was a cautious way of meeting this need for intimacy, without risking too much of my person. Now I am more willing to be close in other relationships, and risk giving more of myself. I feel as though a whole new depth of capacity for intimacy has been discovered in me.

What has this meant with regard to my behavior? It has meant deeper and more intimate relationships with men, sharing without holding back, trusting the security of the friendship. Only during my college days – never before or after – did I have a group of really trusted, intimate men friends. So this is a new, tentative, adventurous development which seems very rewarding. It has meant much more intimate communication with several
women. There are now a number of women with whom I have a platonic but psychologically intimate relationship. With these close friends, men and women, I can share any aspect of my self – the painful, joyful, frightening, crazy, insecure, egotistical, self-deprecating feelings. I can share fantasies and dreams. I receive a similar deep sharing from them. I find these experiences very enriching. In my marriage of so many years, and in these friendships, I think I am continuing to learn more in the realm of intimacy. I am becoming more sharply aware of the times when I experience pain, anger, frustration, and rejection, as well as the closeness of shared meanings or the satisfaction of being understood and accepted. I have learned how hard it is to confront with negative feelings a person about whom I care deeply. I have learned how expectations of a relationship turn very easily into demands made on the relationship. In my experience I have found that one of the hardest things for me is to care for a person for whatever he or she is, at that time, in the relationship. It is so much easier to care for others for what I think they are, or wish they would be, or feel they should be. To care for this person for what he or she is, dropping my own expectations of what I want him or her to be for me, dropping my desire to change this person to suit my needs, is a most difficult but enriching way to a satisfyingly intimate relationship. All of this has been a changing part of my life during this decade. I find myself more open to closeness and to love.
Personal Joys and Difficulties

This has been a period which has had some painful and many pleasant experiences. The greatest stress revolves around coping with Helen’s illness, which during the past five years has been very serious. She has met her pain and her restricted life with the utmost of courage. Her disabilities have posed new problems for each of us, physical and psychological, problems which we continue to work through. It has been a very difficult period of alternate despair and hope, with currently much more of the latter.

But on the whole, this period has contained a wealth of positive experiences. There was our Golden Wedding celebration three years ago – several days of fun in a resort setting with our two children, our daughter-in-law, and all six of our grandchildren. It is such a joy to us that our son and daughter are now not only our offspring, but two of our best and closest friends, with whom we share our inner lives. There have been numerous intimate visits with them individually, and similar visits with close friends from other parts of the country. There is the continuing and growing closeness with our younger circle of friends here as well. For me there have been the pleasures of gardening and of long walks. There have been honors and awards – more than I believe I deserve. The
most touching was the honorary degree from Leiden University on the occasion of its 400th anniversary, brought to me by a special emissary from this ancient Dutch seat of learning. There have been the dozens of highly personal letters from those whose lives have been touched or changed by my writings. This never ceases to amaze me. That I could have had an important part in altering the life of a man in South Africa or a woman in the “outback” of Australia still seems a bit incredible – like magic, somehow.
Thoughts Regarding Death

And then there is the ending of life. It may surprise you that at my age, I think very little about death. The current popular interest in it surprises me. Ten or 15 years ago I felt quite certain that death was the end of the person and would be the total end of me. I guess I still regard that as the most likely prospect. It does not seem to me a tragic or awful prospect. I have been able to live my life – not to the full, certainly, but with a satisfying degree of fullness – and it seems natural that that life should come to an end. I already have a degree of immortality in persons. I have sometimes said that psychologically I have strong sons and daughters all over the world. Also I believe that the ideas and the ways of being which I and others have helped to develop will continue on, for some time at least. So if I, as an individual, come to a complete and final end, aspects of me will continue to live on in a variety of growing ways, and that is a pleasant thought.

I think no one can know whether he or she fears death until it arrives. Certainly it is the ultimate leap in the dark, and I think that it is highly likely that the apprehension I feel when going under an anaesthetic will be duplicated or increased when I face death. Yet I don’t experience a really deep fear of this process. So far as I am aware the fears that I have around death relate to its circumstances. I have a dread of any long and painful illness leading to death. I dread the thought of senility, or of partial brain damage due to a stroke. My preference would be to die quickly, and before it is too late to die with dignity. I think of Winston Churchill. I didn’t mourn his death. I mourned the fact that death had not come sooner, when he could have died with the dignity he deserved.

This whole view of death as the end has been modified by some of the learning of the past decade.
I am impressed with the accounts by Moody of the experiences of persons who have been so near death as to be declared dead, but who have come back to life. I am impressed by some of the reports of reincarnation, though reincarnation seems a very dubious blessing indeed. I am interested in the work of Kübler-Ross, and the conclusions she has reached about life after death. I find definitely appealing the views of Arthur Koestler that individual consciousness is but a fragment of a cosmic consciousness, the fragment being reabsorbed into the whole upon the death of the individual.
I like his analogy of the individual river eventually flowing into the tidal waters of the ocean, dropping its muddy silt as it enters the boundless sea. So I consider death with an openness to the experience. It will be what it will be, and I trust I can accept it as either an end to, or a continuation of life.
Conclusion

I recognize that I have been unusually fortunate in my health, in my marriage, in my family, in my stimulating younger friends, in the unexpectedly adequate income from my books. So I am in no way typical. But for me this has been a fascinating 10 years – full of adventuresome undertakings. I have been able to open myself to new ideas, new feelings, new experiences, new risks. Increasingly I discover that being alive involves taking a chance, acting on less than certainty, engaging with life. All of this brings change, and for me the process of change is life. I realize that if I were stable and steady and static, I would be a living death. So I accept confusion and uncertainty and fear and emotional highs and lows because they are the price I willingly pay for a flowing, perplexing, exciting life.

As I consider all the decades of my existence there is only one other, the period at the Counseling Center at Chicago, which can be compared to this one. It too involved risk, learning, personal growth, and enrichment. But it was also a period of deep personal insecurity and strenuous professional struggle, much more difficult than these past years. So I believe I am being honest when I say that all in all, this has been the most satisfying decade in my life. I have been increasingly able to be myself and have enjoyed doing just that.

As a boy I was rather sickly, and my parents have told me that it was predicted I would die young. This prediction has been proven completely wrong in one sense, but has come profoundly true in another sense. I think it is correct that I will never live to be old. So now I agree with the prediction. I believe that I will die young.
Notes

1. Written in 1977, I withheld this paper from publication because my wife had some reservations about seeing something so personal in print. Since her death, I have given fresh consideration to permitting its publication.
2. Mitchell, A. Los Angeles Times, February 28, 1977.

Reference

Toynbee, A. Why and how I work. Saturday Review, April 5, 1969, p. 22.
18

Maturational Timing and the Development of Problem Behavior: Longitudinal Studies in Adolescence

Rainer K. Silbereisen, Anne C. Petersen, Helfried T. Albrecht and Bärbel Kracke
Source: The Journal of Early Adolescence, 9(3) (1989): 247–268.

Stimulated by research on the biological changes of puberty, the interplay between biological growth and social development has been a major focus of recent research on adolescence (see Petersen, 1988). The physical changes themselves have generally been less important than the responses to the physical changes of the self and others in the adolescents’ social environment. For example, Magnusson and his colleagues (Magnusson, Stattin, & Allen, 1986) found a higher prevalence of problem behaviors such as alcohol consumption among early-maturing girls. The authors inferred that the girls adopted these behaviors in the course of socializing with older male adolescents for whom alcohol consumption was part of their normal, age-appropriate conduct. Thus, problem behavior among faster-developing girls may not indicate deviant attitudes; rather, it may simply represent the attempt to match their behavior and their appearance regardless of chronological age (Silbereisen & Kastner, 1987). A temporary increase in problem behavior may be the cost of coping with the difficult and sometimes disturbing experience of this developmental mismatch.

Other researchers have found effects of maturational timing on psychological functioning as well (see Petersen & Taylor, 1980, for a review). Petersen and Crockett (1985) reported improved psychological adjustment for late maturing boys and girls. With body image, Tobin-Richards, Boxer,
and Petersen (1983) found the most negative effects for early maturing girls and positive effects for early maturing boys. Similarly, Simmons, Blyth, Van Cleave, and Bush (1979) found better body images and higher self-esteem for late maturing girls; among boys, however, more positive effects on functioning were typical for early maturers. Thus, early maturation seems to have less positive developmental outcomes for adolescent girls than for adolescent boys. According to Simmons, Blyth, and McKinney (1983), negative outcomes depend on whether maturational timing puts the adolescent in a deviant status relative to the peer group. As a consequence of more advanced appearance and experiences, early maturing girls may suffer from rejection by their peers. Because it undermines the development of intimacy (see Sullivan, 1953), peer rejection is one of the major risk factors for psychological disorders in adolescence.

The aim of the present study was to address the role of maturational timing within a model of the development of problem behavior in adolescence. Although previous research has begun to elucidate the mechanisms related to these effects, more information is required about the role of social factors such as family and peer contexts (Petersen, 1988). Throughout this study, problem behavior was identified by two related concepts. First, transgression proneness (Jessor & Jessor, 1977) was the degree to which an adolescent showed positive attitudes toward norm-breaking behaviors. Second, contact with deviant peers was conceived as the prevalence of norm-breaking behaviors in the adolescent’s peer group.

In the model, contexts of development, such as the family, were distinguished from individual conditions, such as maturational timing. It was expected that both would exert direct and indirect influences on problem behavior.
According to Kaplan’s (1980) theory of deviance, the self is the major link between individual and contextual antecedents on one hand, and developmental outcomes on the other hand. More specifically, in the model in this study, self-derogation, or negative self-esteem, was expected to increase transgression proneness that, in turn, was expected to increase contacts with deviant peers. The model specified three antecedent, or background, variables, two representing context and one representing the individual. The contexts represented are the family and the peer group. The family-child interaction was characterized by the degree of parental support. The peer context was described in terms of acceptance versus rejection of schoolmates. Finally, maturational timing was an antecedent variable at the individual level. In sum, the model included six variables: maturational timing, parental support, peer rejection, self-derogation, transgression proneness, and contacts with deviant peers. Because stability and change of the relationships were of interest, the model also included two assessments, obtained one year apart, of self-derogation,
contacts with deviant peers, and transgression proneness. To further address the role of maturational timing in the present analyses, two age groups were included: one termed the early adolescent cohort (aged 11 years) and another termed the middle adolescent cohort (aged 14 years).1 Because middle adolescence is a time of rather rapid changes in the development of intimate friendships (Youniss, 1980), maturational timing may have more effects in this age group than in early adolescence. The present analyses focused on girls because the maturational status and timing data for boys could be collected only occasionally in the younger cohort.
Hypotheses

The following hypotheses guided the analyses:

1. Maturational timing was expected to play a role in the development of problem behavior. Considering that most girls are either prepubertal or just beginning to mature by 11 years of age but pubertal or postpubertal by age 14 years (see Marshall & Tanner, 1969), all effects of maturational timing were expected to be especially strong for the older cohort. Earlier research (e.g., Simmons et al., 1979; Simmons et al., 1983) led to the hypothesis that there would be negative effects on the self with early maturation as compared to late maturation. There were no specific hypotheses for consequences of peer rejection. Although both early and late maturers may deviate relative to the peer group in terms of social interests (Simmons et al., 1983), the importance of timing deviations may change across adolescence (Faust, 1960). Late maturers may lag behind their schoolmates in social activities, with higher peer rejection as one result (Dunphy, 1963). Based on the findings of Magnusson et al. (1986), early maturing girls should have had more contacts with deviant peers.
2. Parental support was hypothesized to exert protective effects; that is, it should have decreased transgression proneness and deviant contacts (Snyder & Patterson, 1987).
3. It was expected that peer rejection would play a key role in the development of problem behavior. Following Kaplan (1980), likely reactions are negative effects on the self and attempts to find alternative reference groups. Thus, self-derogation and contacts with deviant peers were hypothesized to be higher among adolescents who experienced more rejection.
4. Again drawing on Kaplan (Kaplan, Martin, & Robbins, 1984), self-derogation was expected to increase future transgression proneness and contacts with deviant peers. Furthermore, prior transgression proneness should promote deviant peer contacts in the future.
Method

Subjects

The sample for the present study consisted of 62 girls in the early adolescent cohort (mean age 11.5 years) and 193 girls in the middle adolescent cohort (mean age 14.7 years). All subjects attended schools in West Berlin; the younger adolescents attended grades 5 and 6, and the older adolescents attended grades 8 and 9. This sample was drawn from all subjects who took part in the 1985 and 1986 data collections of the Berlin Youth Longitudinal Study (BYLS).2

The BYLS used a stratified random sampling procedure on more than 70 schools (one classroom per school) representative of schools in Berlin with respect to socioeconomic status and school program or track. The BYLS sampled subjects for the older cohort attending all three types of schools in West Germany: (a) 30% in the Hauptschule, or nonacademic school (including some from the Gesamtschule, offering both academic and nonacademic programs); (b) 21% in the Realschule, or more academically oriented school; and (c) 49% in the Gymnasium, or traditional preuniversity school program. (See Holmes, 1983 for details on the school system in West Germany.) These percentages undersample the Hauptschule/Gesamtschule (38% in the population) and oversample the Gymnasium (40% in the population), but are quite accurate for Realschule (22% in the population). The younger cohort can be classified according to schools ultimately attended. Again, representation is fairly close ([a] is 43% versus 38% in the population; [b] is 14% versus 22%; and [c] is 43% versus 40%); thus, the lowest track is oversampled in the younger cohort, with the middle track undersampled and the highest track quite similar. The rate of unemployment in the BYLS sample was 8%, comparable to a 9% rate in Berlin generally. Both educational and occupational data on mothers and fathers very closely represent those levels in West Germany generally.

The BYLS included 218 girls in the younger and 429 girls in the older cohort.
Except for a slight upward bias, the present study sample is similar to the overall sample in socioeconomic status (SES). Girls who were not living at home (17 and 15 in each cohort, respectively), those not living with both parents (60 and 111, respectively), and all girls of foreign nationality (47 and 46, respectively) were excluded from the analyses. These factors all served to produce the slight upward bias in SES. Of the remaining 108 younger girls, only 83 participated in the second measurement (attrition: 23%) and of the remaining 257 older girls, 221 took part at the second measurement (attrition: 14%). Of those participating in the second assessment, 62 and 193, respectively, had complete data. The higher attrition in the younger cohort was a consequence of educational tracking. Following grade 6, students attend different schools depending on achievement and academic aspirations, making it difficult to locate them.
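The attrition figures quoted above follow directly from the counts of eligible girls and of those who took part in the second measurement. A minimal sketch of that arithmetic is given here; the function name and the dictionary layout are illustrative assumptions, and only the counts come from the text.

```python
def attrition_rate(eligible: int, retained: int) -> float:
    """Proportion of eligible participants lost between the two measurements."""
    if eligible <= 0 or not 0 <= retained <= eligible:
        raise ValueError("retained must lie between 0 and eligible, with eligible > 0")
    return (eligible - retained) / eligible


# Counts taken from the text: eligible after exclusions, present at the second
# measurement, and with complete data. The variable names are illustrative.
cohorts = {
    "early adolescent": {"eligible": 108, "wave2": 83, "complete": 62},
    "middle adolescent": {"eligible": 257, "wave2": 221, "complete": 193},
}

for name, counts in cohorts.items():
    rate = attrition_rate(counts["eligible"], counts["wave2"])
    print(f"{name}: attrition {rate:.0%}, complete data n = {counts['complete']}")
```

Rounded to whole percentages, the two rates agree with the 23% and 14% reported for the younger and older cohorts.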
Because recent studies have shown effects of family structure on adolescents’ norm-breaking behavior (Dornbusch, Carlsmith, Bushwall, Ritter, Leidermann, & Hastorf, 1985), the chosen strategy of restricting family structure to two-parent families was preferable despite the disadvantage in terms of sample size. However, the sizes of the subsamples representing other family structures were too small to make it feasible to analyze for variation in this dimension.
Measures

The scales used in the analyses consisted of a small number of items each. The BYLS gathered information on many aspects of adolescent development within the constraints of time-limited questionnaire assessments. The costs in terms of psychometric quality are acknowledged.

Maturational Timing. As part of a larger instrument (see Ewert, 1985), an item assessing perceived maturational timing relative to one’s same-aged peers was administered: In comparison with same-aged peers, I am developing slower/equally fast/faster. Earlier studies (see Greif & Ulman, 1982) showed the relevance of such judgments in predicting psychosocial functioning. Moreover, as compared to more objective measures (Crockett & Petersen, 1987), perceived maturational timing relative to same-aged peers may more appropriately assess the processes linking biological growth to personality development. In order to distinguish effects of early and late maturation in the analyses, the three response categories were effect-coded (Cohen & Cohen, 1975) into two dummy variables: “quick” indicates adolescents who develop quicker than their peers, that is, early maturers, and “slow” indicates adolescents who develop slower than their peers, that is, late maturers.3

Validity of the maturational timing measure was examined two ways: (a) by comparison with interview responses to questions about physical change available on a subsample and (b) by comparison with height, weight, and another maturational status item. As part of a complementary qualitative study on puberty (Kracke, 1988), extensive interview data were available on 11 girls belonging to the present sample. Their maturational status and the time of onset of several physical changes were assessed using a German version of the Pubertal Development Scale (PDS; Petersen et al., 1988). In 9 of these 11 cases the girls’ questionnaire response and the interview assessment (accomplished two years later) led to the same rating.
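The effect coding of the three-category timing item described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' procedure: coding the middle category ("equally fast") as −1 on both variables follows the usual effect-coding convention and is consistent with the contrasts described in the note to Table 1, while the response labels, function names, and example responses are hypothetical.

```python
from statistics import mean

# Effect codes for the timing item. The reference category ("equally fast")
# is coded -1 on both variables; this is an assumption based on the standard
# effect-coding convention, not a detail stated in the text.
EFFECT_CODES = {
    "slower": {"quick": 0, "slow": 1},
    "equally fast": {"quick": -1, "slow": -1},
    "faster": {"quick": 1, "slow": 0},
}


def effect_code(response):
    """Return the quick/slow codes for one questionnaire response."""
    try:
        return dict(EFFECT_CODES[response])
    except KeyError:
        raise ValueError(f"unknown response: {response!r}") from None


def mean_codes(responses):
    """Average the quick and slow codes over a list of responses."""
    coded = [effect_code(r) for r in responses]
    return {name: mean(c[name] for c in coded) for name in ("quick", "slow")}


# Hypothetical responses, for illustration only (not study data).
example = ["slower", "equally fast", "equally fast", "faster"]
print(mean_codes(example))
```

Under this coding, the mean of quick equals the proportion answering "faster" minus the proportion answering "equally fast", and the mean of slow equals the proportion answering "slower" minus the proportion answering "equally fast".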
In the two remaining cases, girls rated themselves to be earlier maturers in the interview than in the questionnaire. Thus, responses to the maturational timing question were generally consistent with responses on actual physical changes. Self-reported height and weight as well as maturational status were used to further check the validity of adolescents’ self-attributions of maturational timing. In order to assess maturational status, subjects were asked to rate themselves
with respect to their physical appearance. The item was worded, How do you look at present? and the response categories were: I still look rather childlike (0), I am already changing (1), I already look more like a woman (2). Correlations between these variables as well as age and the three indices of timing (maturational timing, quick, slow) are given in Table 1; means and standard deviations are shown as well. Coefficients above the diagonal refer to the early adolescent cohort, below the diagonal to the middle adolescent cohort. For convenience, data are shown for the first wave of measurement only. As indicated by differences in the means between cohorts, the older girls were indeed significantly taller, heavier, and of more mature status. Within-cohort variations in timing, however, were not influenced to any extent by age. Except for the correlation between quick and age (r = .24, p < .05) in the younger cohort, neither maturational timing nor the derivative indices quick and slow were related to chronological age within cohorts. As should be the case, maturational timing was positively correlated in both cohorts with height, weight, and maturational status. The dummy coded timing variables, quick and slow, were less consistently related to these variables. There were no significant correlations for slow, the contrast
Table 1: Correlations, means, and standard deviations for maturational timing and status, height, weight, and age

                 Quick    Slow    Timing   Status   Height(a)  Weight(b)  Age(c)
Quick              –      .70*     .19      .20       .35        .35*      .24*
Slow              .72*     –      −.57*     .06      −.03       −.02       .13
Timing            .44*   −.30*      –       .22*      .45*       .36*     −.11
Status            .14*   −.06      .28*      –        .33*       .19       .31*
Height            .07    −.05      .17*     .26*       –         .76*      .50*
Weight            .20*    .01      .26*     .24*      .63*        –        .46*
Age               .02     .03     −.01      .04       .12*       .09        –

Early adolescents
  Mean           −.44    −.31      .87     1.46     150.2       38.5      11.5
  SD              .74     .88      .64      .50       8.4        7.5        .7
Middle adolescents
  Mean           −.52    −.55     1.03     1.84     165.0       53.0      14.7
  SD              .77     .73      .56      .53       6.5        8.7        .7

Note: Coefficients above the diagonal refer to the early adolescent cohort (n = 62), coefficients below the diagonal to the middle adolescent cohort (n = 193). Adolescents rated their timing of maturation (timing) as slower, equally fast, or faster than their same-aged peers; quick contrasts the last category to the middle one, slow the first. Thus, quick indicates early maturation, slow late maturation. Status refers to self-rated physical appearance (child-like, feels changes, quite like a woman).
*p ≤ .05
a. Measured in centimeters.
b. Measured in kilograms.
c. Within cohort.
between later and average development relative to one’s peers. Quick, indexing perceptions of faster development than average, was significantly, though moderately, correlated with most of the status variables. Thus, the timing measure appears to be valid when compared with measures indexing maturational status in both cohorts. Those who perceive themselves to be faster in development appear to be carrying most of the variation.

To identify which maturational status variables were most involved with perceptions of maturational timing, multiple regressions were run with height, weight, age, and maturational status as predictors, and maturational timing as the criterion. In middle adolescence, weight (β = .21, p < .05) and maturational status (β = .23, p < .01) were the only predictors; that is, changes in gender-specific body attributes seemed to provide the relevant cue for adolescents’ judgments (R = .34). In early adolescence, however, the only variable predictive of maturational timing was height (β = .42, p < .05; R = .47). For these young girls, the height spurt may have provided a salient experience on which their judgment was based.4 Because these analyses showed the expected patterns of change and the expected relationships with other variables, the measure of maturational timing was judged to be reasonably valid.

Parental Support. The BYLS data contain a large number of items that were drawn from German versions of various instruments addressing the quality of parent-child interaction (Helmke & Väth-Szusdziara, 1980; Moos, 1974). For the present study, three items were chosen tapping authoritative parenting (Baumrind, 1968): (a) When something goes wrong, my parents talk it over calmly with me, (b) Do your parents expect you to make up your own mind and stick to it even when they have a different opinion? and (c) My parents show respect to me and expect the same of me.
Adolescents judged their own experience on a scale ranging from 0 (never) to 4 (always) on the first and the last item; for the second item, their answers ranged from 0 (no emphasis) to 3 (a lot of emphasis). The internal consistencies are adequate given the small number of items: for the younger cohort, α = .61; for the older cohort, α = .68 (both at time 1).

Peer Rejection. In several publications, Kaplan and his coworkers reported a number of scales developed within the framework of their model of deviance (Kaplan, 1980). Shortened German adaptations were used in the BYLS assessments. In the present study, peer rejection, contact with deviant peers, and self-derogation were measured using items from this pool. Responses indicate agreement with scale items ranging from 0 (not at all) to 3 (yes, very much). The peer rejection scale (Kaplan et al., 1984) was made up of three items: (a) My schoolmates are not interested in my opinion, (b) I don’t feel comfortable at school, and (c) The majority of my schoolmates aren’t particularly keen on me. The internal consistencies were reasonably high: for the younger cohort, α = .69; for the older cohort, α = .73 (both at time 1).
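The internal consistencies (α) reported for these short scales are presumably Cronbach's alpha, although the text does not name the coefficient. The following minimal Python sketch shows how such a value is computed from item-level responses under that assumption. The function name and the toy responses are illustrative only and are not BYLS data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])                      # number of items in the scale
    items = list(zip(*responses))              # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical answers of five respondents to a three-item scale scored 0 to 3.
toy_responses = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 3, 2],
    [3, 2, 3],
    [1, 0, 1],
]
print(f"alpha = {cronbach_alpha(toy_responses):.2f}")
```

With only three items per scale, alpha is bounded by how strongly the items intercorrelate, which is why values in the .60s are described as adequate here.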
Contacts with Deviant Peers. This scale consisted of three items used originally by Kaplan (1980) to characterize adolescents’ social contexts: (a) Many of my friends lie to their parents if they want something, (b) I know a lot of youth who have already stolen something without being caught, and (c) My friends are often in trouble with adults. Internal consistency varied from the younger cohort, α = .58, to the older cohort, α = .72 (both in time 1). This apparent difference in alphas between cohorts was seen also at time 2 (.54 younger and .69 older); thus, early adolescents had lower values than middle adolescents. This difference was not a function of more 0 ratings in the younger cohort (about one third of the subjects in both cohorts had at least one 0), but rather seemed due to less consistent response patterns in the younger cohort. Self-Derogation. This variable was assessed using four items adapted from Kaplan (1978; see Silbereisen, Reitzle, & Zank, 1986). The items are (a) I would like to change a lot concerning myself, (b) Sometimes I wish I would be different, (c) I don’t think I’m worth much, and (d) I am satisfied with myself (reversed). The internal consistencies for both times used in the present analyses were .58 and .73 for the younger cohort, .68 and .76 for the older cohort, each for time 1 and time 2, respectively. The term “self-derogation” chosen by Kaplan was retained, although the more common term for this construct reversed is self-esteem. Transgression Proneness. Drawing on Jessor and Jessor’s (1977) theory of problem behavior proneness, five items were formulated (see Galambos & Silbereisen, 1987): (a) If you break the law sometimes, you get on better in life, (b) I can imagine myself stealing something sometime, (c) Sometimes I like to lie to people, (d) I often find adults’ rules and laws bad and don’t always want to follow them, and (e) Sometimes I really want to do something that is forbidden. 
The internal consistencies were adequate: .61 and .64 for the younger cohort, .70 and .75 for the older cohort, each for time 1 and time 2, respectively.
Statistical Analyses

In order to test the model and its specific hypotheses, structural equation models were estimated (Lisrel; Jöreskog & Sörbom, 1986). First, assumptions about measurement invariance were tested using confirmatory factor analysis. A measurement model consisting of 5 (Self-derogation, Peer rejection, Contact with deviant peers, Transgression proneness, Parental support) × 2 (waves of measurement) correlated factors was tested separately for each cohort. Good fit was indicated by a χ²/df ratio of 1.60 (855.16/535) for the younger and 1.45 (771.34/531) for the older cohort. The loadings of the items per factor were almost identical across time and cohort, demonstrating invariant meaning of the constructs.
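The fit ratios quoted for the measurement models are simply the model chi-square divided by its degrees of freedom. A short Python sketch of that arithmetic follows. The function name is an illustrative assumption, and the cutoff of 2 is the rule of thumb cited later in the Results section.

```python
def chi2_df_ratio(chi2, df):
    """Relative chi-square: model chi-square divided by its degrees of freedom."""
    return chi2 / df


RECOMMENDED_LIMIT = 2  # rule of thumb cited in the Results section

# Chi-square and df values as reported for the two measurement models.
measurement_models = [("younger", 855.16, 535), ("older", 771.34, 531)]
for cohort, chi2, df in measurement_models:
    ratio = chi2_df_ratio(chi2, df)
    print(f"{cohort} cohort: chi2/df = {ratio:.2f}, "
          f"below limit: {ratio < RECOMMENDED_LIMIT}")
```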
Relations among the variables as specified in the model and the hypotheses were formulated as two-wave structural equation models (Lisrel; Jöreskog & Sörbom, 1986). The sums of the respective item scores were used to represent the constructs. Self-derogation, contacts with deviant peers, and transgression proneness were the process variables. All lagged effects across time within and among these variables were allowed. Maturational timing, parental support, and peer rejection were modeled as background variables that could exert immediate or delayed effects. Whether they showed systematic change or mutual effects across time was tested separately. In order to test the hypotheses, an exploratory strategy was used. Starting with the saturated model, all nonsignificant effects (|z| < 1.96) were set to zero. The resulting model was evaluated in terms of its goodness of fit. The analyses were run separately for each cohort. The importance of the cross-lagged effects was evaluated by comparing this model with a more restricted model that allowed for stability paths only (see Bentler & Bonett, 1980).
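The trimming step of this exploratory strategy can be sketched in a few lines of Python: each path is kept only if the ratio of its estimate to its standard error reaches the 1.96 criterion. This is only an illustration of the decision rule, not the Lisrel procedure itself; in practice the trimmed model is re-estimated before its fit is evaluated. The path names, estimates, and standard errors below are made up for the example.

```python
Z_CRIT = 1.96  # two-sided 5% criterion named in the text


def trim_paths(paths, z_crit=Z_CRIT):
    """Set to zero every path whose |estimate / standard error| is below z_crit."""
    trimmed = {}
    for name, (estimate, std_error) in paths.items():
        z = estimate / std_error
        trimmed[name] = estimate if abs(z) >= z_crit else 0.0
    return trimmed


# Hypothetical (estimate, standard error) pairs, for illustration only.
example_paths = {
    "self_derogation_t1 -> deviant_peers_t2": (0.26, 0.11),
    "transgression_t1 -> deviant_peers_t2": (0.05, 0.12),
}
for name, value in trim_paths(example_paths).items():
    print(f"{name}: {value}")
```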
Results

The correlations, means, and standard deviations of the variables used in analyzing the data are given in Table 2. Coefficients above the diagonal refer to the early adolescent cohort, below the diagonal to the middle adolescent cohort. As revealed in the bottom part of Table 2, the older adolescents reported significantly less self-derogation independent of time of measurement. This is consistent with research on change in self-esteem across adolescence (O’Malley & Bachman, 1983). Older adolescents also felt significantly less rejected by their school peers. The significant differences in transgression proneness and contacts with deviant peers at second measurement were mainly due to the younger adolescents’ much lower scores than those of the previous year. As many of them changed schools, this decrease may actually indicate a change in the baseline of their ratings.

The structural equation model for the early adolescent cohort is depicted in Figure 1. Paths, structural residuals, and covariances are shown. Only significant effects are given. As indicated by χ² = 32.91 with df = 41 (ratio of .80, much smaller than the recommended limit of 2), and a goodness-of-fit index of .91 (larger than .90 is suggested), the fit of the model was adequate. As shown, only a fraction of the possible effects was significant. This model shows a significantly better fit than an alternative model assuming no cross-lagged effects. The difference amounts to χ² = 4.92 with df = 1 (p < .05). Thus, the path from self-derogation to contacts with deviant peers is relevant for interpretation.
Table 2: Correlations, means, and standard deviations for year 1 and year 2 measurements of variables in structural equation models

                        Process Variables                        Background Variables
             SD1     TP1     CDP1    SD2     TP2     CDP2      Q1      S1      PS1     PR1
SD1           –      .13     .42     .58    −.03     .39      .02     .23    −.13     .45
TP1          .28      –      .22    −.05     .49     .25      .06     .02    −.18     .12
CDP1         .18     .42      –      .39     .17     .48      .06     .15    −.30     .28
SD2          .55     .23     .18      –      .13     .39     −.04     .05    −.02     .19
TP2          .15     .64     .35     .24      –      .31     −.06    −.11    −.02     .07
CDP2         .11     .43     .68     .10     .59      –       .05    −.01    −.26     .17
Q1          −.06    −.02     .17     .00     .09     .05       –      .70    −.03     .09
S1           .01    −.12     .03     .00    −.01    −.02      .72      –     −.10     .26
PS1         −.14    −.24    −.25    −.21    −.20    −.19     −.07    −.01      –     −.18
PR1          .42     .18     .30     .27     .34     .25      .11     .04    −.26      –

Early adolescents
  Mean      5.15    4.65    3.27    5.19    3.95    2.76     −.44    −.31    7.92    2.32
  SD        2.62    2.75    2.20    2.68    2.43    1.96      .74     .88    2.18    2.08
Middle adolescents
  Mean      4.39    4.49    3.38    4.25    4.80    3.37     −.52    −.55    8.01    1.85
  SD        2.18    2.80    2.11    2.36    3.07    2.09      .77     .73    2.12    1.76

Difference t  2.27*    .39    −.35    2.64*  −1.99*   −.20      .72    2.14*   −.29    1.75

Note: SD = self-derogation, TP = transgression proneness, CDP = contacts with deviant peers, Q = quick, S = slow, PS = parental support, PR = peer rejection. Measurement at year 1 (1), at year 2 (2). Coefficients above the diagonal refer to the early adolescent cohort (n = 62, r > .21/.30 are significant at the .05/.01 level), coefficients below the diagonal to the middle adolescent cohort (n = 193, r > .12/.17 are significant at the .05/.01 level). In comparing means across cohorts, the critical t for p < .05 is 1.65. *indicates significant ts.
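The "Difference t" row of Table 2 compares cohort means. The source does not say which t statistic was used; the Python sketch below assumes an independent-samples t test with pooled variance, which is an assumption and not a documented detail of the study. Under that assumption, the tabled means, standard deviations, and cohort sizes for self-derogation give values that agree with the reported 2.27 and 2.64 to two decimals.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic with pooled variance (assumed test)."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error


N_EARLY, N_MIDDLE = 62, 193  # cohort sizes from the table note

# Self-derogation at year 1 and year 2: early cohort first, middle cohort second.
t_sd1 = pooled_t(5.15, 2.62, N_EARLY, 4.39, 2.18, N_MIDDLE)
t_sd2 = pooled_t(5.19, 2.68, N_EARLY, 4.25, 2.36, N_MIDDLE)
print(f"SD1: t = {t_sd1:.2f}")
print(f"SD2: t = {t_sd2:.2f}")
```

A positive t here means the early adolescent cohort scored higher than the middle adolescent cohort.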
The model for the middle adolescent cohort is given in Figure 2. Again, paths, structural residuals, and covariances are shown. Only significant effects are depicted. The goodness of fit is adequate: χ² = 21.69, df = 28, resulting in a ratio of .77; goodness-of-fit index = .98. Again, the fit of this model was better than the fit of a model with no cross-lagged effects at all (difference χ² = 40.81, df = 4, p < .001). The number and variety of significant effects in the middle adolescent cohort were clearly greater than in the younger cohort, especially for the background variables.

Multi-group comparisons were used in order to confirm the structural differences between the cohorts. While it was possible to fit the structure of the older cohort to both groups (χ² = 84.15, df = 83, p = .44), it was not possible to fit the structure of the younger cohort to both groups (χ² = 190.63, df = 93, p < .001). Thus, processes and background influences shown for early adolescence seem to undergo a qualitative and quantitative change in the course of further development.5 In the following, further results will be presented by hypotheses. Only significant coefficients (z > 1.96) are mentioned. See Figures 1 and 2 for reference.
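The probability levels attached to the chi-square comparisons reported so far can be checked from the chi-square distribution. The Python sketch below uses only the standard library and the closed-form series for integer degrees of freedom; the function name is an illustrative assumption, and the chi-square and df values are those reported in the text.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable, integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite series in odd powers of sqrt(x)
    chi = math.sqrt(x)
    normal_pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return math.erfc(chi / math.sqrt(2)) + 2 * normal_pdf * total


comparisons = [
    ("cross-lagged paths, early cohort", 4.92, 1),
    ("cross-lagged paths, middle cohort", 40.81, 4),
    ("older structure fitted to both groups", 84.15, 83),
    ("younger structure fitted to both groups", 190.63, 93),
]
for label, chi2, df in comparisons:
    print(f"{label}: chi2 = {chi2}, df = {df}, p = {chi2_sf(chi2, df):.4f}")
```

For the two difference tests a small p favors the model with cross-lagged paths, whereas for the multi-group fits a large p (as for the older cohort's structure) indicates that the constrained structure is not rejected.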
[Figure 1 path diagram not reproduced. Panels: Year 1, Year 2. Variables: Quick, Slow, Peer Rejection, Parental Support, Self Derogation, Transgression, Contact Deviant Peers. Chi-square = 32.91, df = 41. Early Adolescence, N = 62.]
Figure 1: Structural equation model for the early adolescent cohort. Data on these girls were gathered twice, with an interval of 12 months. At first measurement, the mean age was 11.5 years. Parental support, peer rejection, self-derogation, contact with deviant peers, and transgression proneness are represented by the sum of the items per construct. Quick and slow indicate adolescents whose self-reported maturational timing was faster (early maturers) or slower (late maturers) relative to same-aged peers. Only significant paths, covariances among the exogenous variables, structural residuals, and covariances among residuals are depicted (z > 1.96). Path coefficients are set in boldface. Chi-square statistics refer to the model as depicted.
Hypothesis 1

In both cohorts, covariances with peer rejection are relevant in assessing the role of maturational timing.6 Among the early adolescents, later maturation had a positive relation with peer rejection (β = .26). That is, late maturers reported more problems concerning acceptance by peers in school than did the other adolescents. The rejection may be based on their prepubertal status and its psychosocial correlates.

In considering effects of the background variables on the processes depicted in the model, early maturation was predicted to increase self-derogation and contacts with deviant peers. In the early adolescent cohort, there were no effects of maturational timing at all. In the middle adolescent cohort, however, some effects were found. In support of the hypothesis, early maturers reported more contacts with deviant peers (β = .15). As the negative effect of early maturation on self-derogation shows (β = −.21), they were also more satisfied with themselves. This is quite the opposite of what was expected. The slight positive effect of late maturation on self-derogation (β = .16, ns) was consistent with the surprising result. Although analysis of variance revealed no significant differences among
[Figure 2 path diagram not reproduced. Panels: Year 1, Year 2. Variables: Quick, Slow, Parental Support, Peer Rejection, Self Derogation, Transgression, Contact Deviant Peers. Chi-square = 21.69, df = 28. Middle Adolescence, N = 193.]
Figure 2: Structural equation model for the middle adolescent cohort (mean age 14.7 years). Significant paths are depicted (z > 1.96); for the path from slow to self-derogation, z = 1.7. See Figure 1 for further explanation.
timing groups in either cohort, the means for self-derogation declined from slower to faster in both cohorts (6.35, 4.83, 4.11 and 4.67, 4.44, and 4.00, for younger and older cohorts, respectively). In sum, early maturation increased the risk of contacts with deviant peers in middle adolescence, as hypothesized. However, early maturation also corresponded to a more positive self-evaluation, contrary to hypothesis.
Hypothesis 2

Low social competence as a consequence of low parental support appeared to be responsible for the negative relation of support and peer rejection in both cohorts (β = −.18 and −.26, respectively). Higher support resulted in fewer contacts with deviant peers in the younger (β = −.24) as well as in the older cohort (β = −.19). Thus, there was a protective effect as stated in the hypothesis. Higher support was protective against transgression proneness, however, for middle adolescence only (β = −.23). The older cohort showed another peculiarity. Whereas there was no significant effect of parental support on self-derogation at first measurement (β = −.04, ns), more support corresponded to less self-derogation in the following year (β = −.14). Thus, the impact of parental support on self-derogation became more pronounced from age 14 to age 15 years.

To test whether the model was similar for different levels of parental support, the scores for the older cohort were split at the median for parental
support and this variable was removed from the model; the resulting model was tested with high and low parental support subsamples. Although the low parental support subsample was significantly higher on transgression (time 1), deviant peers (time 1), peer rejection (time 1), and self-derogation (time 2), there were few differences in which standardized coefficients were significant in the model. Two cross-lagged paths were significant for only one group: The path from self-derogation to transgression was significant only among the low-support group (β = −.18, high: β = .02) and the path from transgression to deviant peers was significant only among the high-support group (β = .29, low: β = .11). Only one difference appeared among the exogenous effects: The path from late maturation to self-derogation became significant among the high-support group (.30). Other effects were similar in size to the original effects, although occasionally they did not attain significance because of small ns.
Hypothesis 3

The influences of peer rejection were almost invariant across cohorts. Higher peer rejection resulted in more contacts with deviant peers in the younger (β = .24) and older groups (β = .20). Similarly, higher peer rejection corresponded to higher self-derogation in both cohorts (β = .45 and .42, respectively). This relation between peer rejection and self-derogation was generally the strongest effect in the model. An effect restricted to the middle adolescent cohort was also found: Higher peer rejection in the first year was related to higher transgression proneness in the following year (β = .26). Again, the contemporaneous relationship between these two variables was less pronounced (β = .13, ns) than the time-lagged relationship.

The time-lagged effects of peer rejection and parental support could be spurious because second measurements on these variables were not part of the model (see Rogosa, 1979). Therefore, additional two-wave cross-lagged analyses were run for peer rejection and transgression proneness and for parental support and self-derogation. In both cases, the lagged effects shown in Figure 2 were not attenuated by the reverse lagged effects on peer rejection and parental support, respectively; the latter were almost nonexistent (β = .00 and β = .05). Thus, both background variables became more important over the one-year period.
Hypothesis 4

Not unexpectedly, the dominant effects among the process variables were their stabilities across time. Although all were significant, the size of the coefficients varied considerably between cohorts. Both transgression proneness
(β = .46 vs. .63) and contacts with deviant peers (β = .36 vs. .60) became more stable during the adolescent transition. The hypothesis posited an influence of self-derogation on later transgression proneness and contacts with deviant peers. The latter was confirmed with the early adolescent cohort (β = .26). In the older cohort, however, the effect on transgression proneness was reversed in sign. Adolescent girls who were not satisfied with themselves tended to show less transgression proneness, not more as expected (β = −.11). It is open to speculation whether they invest instead in positive alternatives such as academic performance. This result was contrary to what was hypothesized. With the middle adolescent group, higher scores in prior transgression proneness corresponded to more contacts with deviant peers (β = .18), as hypothesized. Taking both cross-lagged effects together, transgression proneness seemed to play a mediating role between self-derogation and deviant peer contexts.
Discussion

The antecedents of and links between self-derogation and contacts with deviant peers differed in the two cohorts. In early adolescence, the levels of both variables varied mainly as a function of peer rejection. Adolescent girls who felt rejected by schoolmates had a less favorable self-perception and also tended to affiliate with deviant peers. More important, self-derogation directly increased the risk of future contacts with deviant peers, while transgression proneness was not related to this process.

In middle adolescence, however, self-derogation and contacts with deviant peers were targets of multiple influences in addition to peer rejection. Among the background variables, maturational timing played a prominent role. Adolescent girls who developed faster than their agemates had more contacts with deviant peers but also showed less self-derogation. In contrast to its irrelevance in early adolescence, in middle adolescence transgression proneness seemed to mediate between self-derogation and contacts with deviant peers. Finally, a tendency was observed for parental support and peer rejection to increase their impact on self-evaluation and willingness to transgress by middle adolescence.

Consistent with the focus of this paper, the discussion will concentrate on maturational timing. Generally speaking, maturational timing played a role in the development of problem behavior, as expected. More important, however, some of the effects seemed to contradict results of earlier studies. This was especially true for relations with the self. Girls who matured earlier than their agemates in middle adolescence reported less self-derogation.
The results on the link between maturation and self were different from what was expected from the literature. Certainly, the discrepancy could simply derive from differences in the measurement of maturational timing. However, Petersen (unpublished data) used similar self-report assessments and found comparable relationships with height and weight.7 Although there was no independent, more objective measurement, the assessment of maturational timing seems valid.

The findings on contacts with deviant peers are consistent with earlier research. Early-maturing girls reported more such contacts; that is, they agreed more with statements that characterize their friends as having trouble with adult norms. Magnusson et al. (1986) interpreted the higher risk of problem behavior among early-maturing girls as a consequence of age-inappropriate friendships with older males. Unfortunately, no data on the ages of their boyfriends were available. However, early-maturing girls were more likely to have close friendships with males (r = .24; p < .001). Thus, the results may relate at least in part to the more grown-up social affiliates of the girls.

How can this result be brought together with the higher self-esteem of early maturers in this study? Simmons et al. (1983) reported lower self-esteem among early-maturing girls. In the present study the reverse was found: higher self-esteem among early maturers. The processes producing these different results require further examination. However, the apparent advantage to early maturers could be temporary.

There are a number of problems with the present study that demand further research. The results require cross-validation on independent samples. Additional waves of measurement are required in order to determine whether the reported differences between cohorts indeed represent qualitative change in the role of maturational timing. A more extended time-span would help to distinguish relatively stable effects from short-term variations.
We also wish to further illuminate the interplay between maturation and social development by adding other target behaviors to the model. More specifically, it would be highly interesting to compare effects on problem behaviors with those on more positive alternatives such as prosocial action.

A final word concerning the positive relation between early maturation and self-evaluation: Most other studies on this issue deal with samples from the United States. Although a definite resolution of whether the present finding may be generalized requires systematic comparisons of matched national groups,8 we risk a premature answer at present. It may well be the case that a true cultural difference exists. Concerning sex education, for instance, adolescents in West Germany receive much more information in regular school curricula than their American agemates. The remarkably low rate of adolescent pregnancy as compared to the United States (Statistisches Bundesamt, 1984) is but one of the presumable consequences.
Notes

1. It is noted that usually even 14-year-olds are called early adolescents. However, distinctive names were needed for the younger and the older cohort.

2. The principal aim of the BYLS (for short technical reference see Verdonik & Sherrod, 1984) is the analysis of the role of problem behavior in normal adolescent development. Risk and protective factors within the individual, and within family, work, and leisure contexts, are investigated in Berlin, West Germany. By 1989, one of the cohorts will have been followed up once every year from ages 11 to 18. A parallel study was started in Warsaw, Poland, in 1985.

3. Adolescents whose maturational timing is consistent with their peers receive a score of −1 on both dummy variables. Adolescents who develop faster get a 1 on quick, and a 0 on slow; conversely, those who develop slower get a score of 0 on quick, and a 1 on slow. These raw dummy variables represent a contrast between the group indicated by 1 and −1, respectively. However, when both are used simultaneously in the same regression analysis, due to effects of partialling-out, each of them represents a contrast with the remaining groups. Thus, a positive effect of quick on contacts with deviant peers would indicate more such contacts among early maturers than in the two other categories of maturational timing.

4. Other attributes such as the development of breast and pubic hair were not measured. Our interpretation is supported by data from the second measurement. One year later, the only predictor is maturational status (β = .29, p < .05; r = .35). Although of similar size, the coefficient for height fails to achieve significance due to large variance. Judged from a study on a Swiss sample in which the age at peak height velocity is 12.2 years (Gasser et al., 1984; Largo & Prader, 1983), the age at peak height velocity for girls is close to 12 years in Germany.

5. Note that all computations were done using the sum of the respective item scores in order to represent the constructs. Additional analyses were based on multiple indicator models. For the older cohort the results were confirmed (χ² = 609.35, df = 433, GFI = .84). Unfortunately, for the younger cohort the multiple indicator analyses revealed a rather bad fit (χ² = 668.82, df = 442, GFI = .66). This seems to result from the low stability of contacts with deviant peers (see Figure 1 for comparison). Thus, results on the younger cohort should be taken with some caution. Furthermore, random fluctuations in this small sample lead to unstable estimates (which is especially serious when running multiple indicator models).

6. The covariance between quick and slow deserves no attention as it is simply required by the method of dummy-coding chosen.

7. For example, among eighth graders (13 to 14 years) in a United States sample, “quick” had very low correlations with height and weight, “slow” was significantly negatively related to both variables (−.37 and −.35, respectively), and perceived timing was positively related (.32 and .51 with height and weight, respectively).

8. In collaborative research we have just begun to design cross-national secondary analyses on adolescents living in Berlin and Warsaw (Berlin Youth Longitudinal Study), Chicago (Developmental Study of Adolescent Mental Health), and Pennsylvania (Rural Adolescent Project).
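The coding scheme described in Note 3 can be written out as a small lookup. The Python sketch below is illustrative only: the category labels ("faster", "equal", "slower"), the function name, and the example answers are assumptions introduced here, while the numeric codes follow Note 3.

```python
# (quick, slow) codes for the three response categories, as described in Note 3.
TIMING_CODES = {
    "faster": (1, 0),    # early maturers
    "equal": (-1, -1),   # developing equally fast as same-aged peers
    "slower": (0, 1),    # late maturers
}


def code_timing(response):
    """Return the (quick, slow) pair for one maturational timing response."""
    return TIMING_CODES[response]


# Hypothetical answers of six respondents, for illustration only.
answers = ["equal", "faster", "equal", "slower", "equal", "faster"]
coded = [code_timing(answer) for answer in answers]
quick_mean = sum(quick for quick, _ in coded) / len(coded)
slow_mean = sum(slow for _, slow in coded) / len(coded)
print(coded)
print(f"mean quick = {quick_mean:.2f}, mean slow = {slow_mean:.2f}")
```

Because the reference category is scored −1 on both variables, the mean of each coded variable equals the proportion in its own category minus the proportion of "equal" answers, which is why the tabled means of quick and slow are negative when most adolescents report developing equally fast.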
References

Baumrind, D. (1968). Authoritarian versus authoritative parental control. Adolescence, 3, 255–272.
Bentler, P. M., & Bonett, D. G. (1980). Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin, 88, 588–606.
Cohen, J., & Cohen, P. (1975). Applied multiple regression/correlation analysis for the behavioral sciences (pp. 188–195). Hillsdale, NJ: Erlbaum.
Crockett, L. J., & Petersen, A. C. (1987). Pubertal status and psychosocial development: Findings from the Early Adolescence Study. In R. M. Lerner & T. T. Foch (Eds.), Biological-psychosocial interactions in early adolescence: A life-span perspective (pp. 173–188). Hillsdale, NJ: Erlbaum.
Dornbusch, S., Carlsmith, J., Bushwall, S., Ritter, P., Leiderman, H., & Hastorf, A. H. (1985). Single parents, extended households, and the control of adolescents. Child Development, 56, 326–341.
Dunphy, D. (1963). The social structure of urban adolescent peer groups. Sociometry, 26, 230–246.
Ewert, O. M. (1985, July). Differential rates of maturation and psychological functioning. Paper presented to the 8th Biennial Meetings of the International Society for the Study of Behavioral Development, Tours.
Faust, M. (1960). Developmental maturity as a determinant of prestige in adolescent girls. Child Development, 31, 173–184.
Galambos, N. L., & Silbereisen, R. K. (1987). Income change, parental life outlook, and adolescent expectation for job success. Journal of Marriage and the Family, 49, 141–149.
Gasser, T., Kohler, W., Muller, H. G., Kneip, A., Largo, R., Molinari, L., & Prader, A. (1984). Velocity and acceleration of height growth using kernel estimation. Annals of Human Biology, 11, 397–411.
Greif, E. B., & Ulman, K. J. (1982). The psychological impact of menarche on early adolescent females: A review of the literature. Child Development, 53, 1413–1430.
Helmke, A., & Väth-Szusdziara, R. (1980). Familienklima, Leistungsangst und Selbstakzeptierung bei Jugendlichen [Family climate, fear of failure, and self-acceptance in adolescents]. In H. Lukesch, M. Perrez, & K. A. Schneewind (Eds.), Familiäre Sozialisation und Intervention [Familial Socialization and Intervention]. Bern: Huber.
Holmes, B. (Ed.). (1983). International handbook of educational systems: Vol. 1. Europe and Canada. New York: Wiley.
Jessor, R., & Jessor, S. L. (1977). Problem behavior and psychosocial development: A longitudinal study of youth. New York: Academic Press.
Jöreskog, K. G., & Sörbom, D. (1986). Lisrel VI: Analysis of linear structural relationships by maximum likelihood, instrumental variables, and least square methods. Mooresville, IN: Scientific Software.
Kaplan, H. B. (1978). Social class, self-derogation and deviant response. Social Psychiatry, 13, 19–28.
Kaplan, H. B. (1980). Deviant behavior in defense of self. New York: Academic Press.
Kaplan, H. B., Martin, S. S., & Robbins, C. (1984). Pathways to adolescent drug use: Self-derogation, peer influence, weakening of social controls, and early substance use. Journal of Health and Social Behavior, 25, 270–289.
Kracke, B. (1988). Problemverhalten und Pubertät [Problem behavior and puberty]. Unpublished master’s thesis, Technical University of Berlin, West Berlin.
Largo, R. H., & Prader, A. (1983). Pubertal development in Swiss girls. Helvetica Paediatrica Acta, 38, 229–243.
Magnusson, D., Stattin, H., & Allen, V. L. (1986). Differential maturation among girls and its relation to social adjustment: A longitudinal perspective. In D. L. Featherman & R. M. Lerner (Eds.), Life-span development and behavior (Vol. 7, pp. 135–172). New York: Academic Press.
Marshall, W. A., & Tanner, J. M. (1969). Variations in the pattern of pubertal changes in girls. Archives of Disease in Childhood, 44, 291–303.
Moos, R. H. (1974). Family environment scale (FES) (Preliminary manual). Palo Alto, CA: Stanford University, Social Ecology Laboratory, Department of Psychiatry.
O’Malley, P. M., & Bachman, J. G. (1983). Self-esteem: Change and stability between ages 13 and 23. Developmental Psychology, 19, 257–268.
Petersen, A. C. (1988). Adolescent development. Annual Review of Psychology, 39, 583–607.
Petersen, A. C. (1989). [The correlation of height and weight with the timing of maturation]. Unpublished raw data.
Petersen, A. C., & Crockett, L. J. (1985). Pubertal timing and grade effects on adjustment. Journal of Youth and Adolescence, 14, 191–206.
Petersen, A. C., Crockett, L. J., Richards, M., & Boxer, A. M. (1988). A self-report measure of pubertal status: Reliability, validity, and initial norms. Journal of Youth and Adolescence, 17, 117–133.
Petersen, A. C., & Taylor, B. (1980). The biological approach to adolescence: Biological change and psychological adaptation. In J. Adelson (Ed.), Handbook of Adolescent Psychology (pp. 117–155). New York: Wiley.
Rogosa, D. (1979). Causal models in longitudinal research: Rationale, formulation, and interpretation. In J. R. Nesselroade & P. B. Baltes (Eds.), Longitudinal research in the study of behavior and development (pp. 263–302). New York: Academic Press.
Silbereisen, R. K., & Kastner, P. (1987). Jugend und Problemverhalten: Entwicklungspsychologische Perspektiven [Youth and problem behavior: Developmental perspectives]. In R. Oerter & L. Montada (Eds.), Entwicklungspsychologie [Developmental Psychology] (pp. 882–919). München, FRG: Psychologie Verlags Union.
Silbereisen, R. K., Reitzle, M., & Zank, S. (1986). Stability and change of self-concept in adolescence: Self-knowledge and self-strategies. In F. Klix & H. Hagendorf (Eds.), Human memory and cognitive capabilities: Mechanisms and performances. Symposium in memoriam Hermann Ebbinghaus (pp. 449–457). Amsterdam: Elsevier Science Publishers.
Simmons, R. G., Blyth, D. A., & McKinney, K. L. (1983). The social and psychological effects of puberty on white females. In J. Brooks-Gunn & A. C. Petersen (Eds.), Girls at puberty: Biological and psychosocial perspectives (pp. 229–272). New York: Plenum.
Simmons, R. G., Blyth, D. A., Van Cleave, E., & Bush, D. (1979). Entry into early adolescence: The impact of school structure, puberty, and early dating on self-esteem. American Sociological Review, 44, 948–967.
Snyder, J., & Patterson, G. R. (1987). Family interaction and delinquent behavior. In H. C. Quay (Ed.), Handbook of juvenile delinquency (pp. 216–243). Somerset, NJ: Wiley.
Statistisches Bundesamt (1984). Zur Situation der Jugend in der Bundesrepublik Deutschland [The situation of youth in the Federal Republic of Germany]. Mainz: Kohlhammer.
Sullivan, H. S. (1953). The interpersonal theory of psychiatry. New York: Norton.
Tobin-Richards, M. H., Boxer, A. M., & Petersen, A. C. (1983). Early adolescents’ perceptions of their physical development. In J. Brooks-Gunn & A. C. Petersen (Eds.), Girls at puberty: Biological and psychosocial perspectives (pp. 127–154). New York: Plenum.
Verdonik, F., & Sherrod, L. R. (1984). An inventory of longitudinal research on childhood and adolescence. New York: Social Science Research Council.
Youniss, J. (1980). Parents and peers in social development: A Sullivan-Piaget perspective. Chicago: University of Chicago Press.
SAGE LIBRARY OF EDUCATIONAL THOUGHT AND PRACTICE
SAGE DIRECTIONS IN EDUCATIONAL PSYCHOLOGY VOLUME II
Edited by
Neil J. Salkind
Introduction and editorial arrangement © Neil J. Salkind 2011 First published 2011 Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act, 1988, this publication may be reproduced, stored or transmitted in any form, or by any means, only with the prior permission in writing of the publishers, or in the case of reprographic reproduction, in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers. Every effort has been made to trace and acknowledge all the copyright owners of the material reprinted herein. However, if any copyright owners have not been located and contacted at the time of publication, the publishers will be pleased to make the necessary arrangements at the first opportunity. SAGE Publications Ltd 1 Oliver’s Yard 55 City Road London EC1Y 1SP SAGE Publications Inc. 2455 Teller Road Thousand Oaks, California 91320 SAGE Publications India Pvt Ltd B 1/I 1, Mohan Cooperative Industrial Area Mathura Road New Delhi 110 044 SAGE Publications Asia-Pacific Pte Ltd 33 Pekin Street #02-01 Far East Square Singapore 048763 British Library Cataloguing in Publication data A catalogue record for this book is available from the British Library ISBN: 978-0-85702-178-6 (set of five volumes) Library of Congress Control Number: 2010923776 Typeset by Mukesh Technologies Pvt. Ltd., Pondicherry, India. Printed on paper from sustainable resources Printed by MPG Books Group, Bodmin Cornwall
Contents

Volume II

Section I: Human Development (Continued)

19. Motor Development as Foundation and Future of Developmental Psychology
    Esther Thelen
20. Physical Growth
    Kai Jensen
21. Mental Development during the Preadolescent and Adolescent Periods
    Gordon Hendrickson

Section II: Curriculum, Instruction and Learning

22. Making Sense of Curriculum Evaluation: Continuities and Discontinuities in an Educational Idea
    David Hamilton
23. Psychology of Learning Environments: Behavioral, Structural, or Perceptual?
    Herbert J. Walberg
24. Thought and Two Languages: The Impact of Bilingualism on Cognitive Development
    Rafael M. Diaz
25. Components of a Psychology of Instruction: Toward a Science of Design
    Robert Glaser
26. The Emergence of Cognitive Psychology
    Robert R. Holt
27. The Advancement of Learning
    Ann L. Brown
28. Paradigms of Knowledge and Instruction
    S. Farnham-Diggory
29. Health Promotion by Social Cognitive Means
    Albert Bandura
30. Models of the Learner
    Jerome Bruner
31. Child’s Talk: Learning to Use Language
    Jerome Bruner
32. The Reflexivity of Cognitive Science: The Scientist as Model of Human Nature
    Jamie Cohen-Cole
33. History, Culture, Learning, and Development
    Patricia M. Greenfield, Ashley E. Maynard and Carla P. Childs
34. Biology and Cognition
    Jean Piaget (Translated by Martin Faigel)
35. Neural Bases of Intelligence and Training
    Mark R. Rosenzweig
Section I: Human Development (Continued)
19 Motor Development as Foundation and Future of Developmental Psychology Esther Thelen
Human infants are born with very little control over their bodies. Yet within a year or so, they are able to sit, stand, walk, reach, manipulate objects, feed themselves, gesture, and even speak a few words. A year later, toddlers are adept at running, climbing, scribbling, riding a tricycle, and talking in simple sentences. For parents, these new motor skills are the most dramatic and visible changes in the first few years of life, frequently noted and commented on. For those interested in studying developmental processes, this sequential unfolding of motor milestones is like an open window on change. Whereas children’s mental lives must be measured indirectly, their movements are continuously observable. Each new pattern is there to see and describe. There are no hidden processes between the control of the movement and its actual execution.
Source: International Journal of Behavioral Development, 24(4) (2000): 385–397.

Motor Development: The Golden Age

It is no surprise, therefore, that the emergence of motor skills has figured so prominently in the first scientific studies of human development, truly laying the foundation of the field. Rich description of infant movement dates to the last century with Darwin’s (1877) well-known “biography” of his own child and the pioneering work of the German physiologist Preyer (1888), and the tradition was continued into the 20th century with the narratives of Millicent Shinn (1900). The full flowering of the descriptive work in motor development began in the 1920s with the publication of the first of Arnold Gesell’s pathbreaking research and popular monographs (e.g., Gesell, 1928; Gesell & Thompson, 1934). The golden age continued through the 1930s with Mary Shirley’s (1931) exquisite longitudinal descriptions of 25 infants, Myrtle McGraw’s (1935) well-known and still contentious study of the twins Jimmy and Johnny, and the Nancy Bayley Berkeley Growth Study (1935). In 1946, Carmichael’s Manual of Child Psychology contained two seminal articles by Gesell (1946) and McGraw (1946), attesting to the theoretical status of motor studies in the field.

There were three important and related legacies from this golden age of motor development research. The first legacy from these pioneers is their theoretical contributions, and especially their strong grounding of human development in biology. The second is empirical through the introduction of detailed and rich description, novel methods for capturing human movement, and clever natural experiments. The third, and perhaps the most enduring, was the establishment of developmental norms. I discuss each of these legacies in turn.
The Legacy of Development Grounded in Biology

The two most important theorists of this era were Arnold Gesell and Myrtle McGraw. Both provided us with massive descriptions of early motor development. But for both scientists, the motor catalogues were not the ends in themselves. Rather, both Gesell and McGraw saw these descriptions as a way to understand the most general, and profound, developmental principles.

Gesell and the dynamic principles of growth. Gesell left us the most well-articulated comprehensive developmental theory, one that is still insightful and attractive (for many details, see Thelen & Adolph, 1992). From the start of his career, Gesell claimed deep roots in the science of biology. He viewed development as a unitary process, requiring description at many levels encompassing evolution, embryology, comparative psychology, neurophysiology, and anthropology. He was especially influenced by Charles Darwin, whom he considered the founder of the scientific study of the child, and by G.E. Coghill, an early behavioural embryologist.

Gesell credited Darwin with making humans legitimate subjects of scientific study. Before Darwin, Gesell maintained, infants and children were understood primarily in theological and philosophical terms. Because Darwin showed that the human mind was continuous with all other living things, he gave scientists the freedom to study human nature in the same scientific way they investigated the rest of the natural world. Gesell attributed to Darwin his core belief that mental life is continuous with, and impelled by, the same processes that drive all organic growth. Moreover, Gesell admired Darwin’s naturalistic methods, and he saw himself, too, as a naturalist, “tirelessly” seeking “ideological order” through relentless observation and comparison (Gesell, 1948, pp. 36–37).

Gesell’s other hero was Coghill, who was his contemporary and a leader in the new field of behavioural embryology. Coghill’s major work was on the behavioural embryology of the salamander Amblystoma. Coghill’s contribution was to show a correlation between the development of movement patterns in these animals and corresponding changes in their nervous systems. To do this, Coghill took motion pictures of locomoting salamanders at different stages of development and then painstakingly traced the patterns of their limbs and their bodies. He discovered that a particular form of locomotion, for example a “C” body shape or an “S” body shape, was coincident with the growth of specific neural connections (Coghill, 1969).

Gesell learned several lessons from Coghill. First, he saw that development was a morphological process, that is, a change in the form of behaviour. Gesell believed that just as movements and postures provided a read-out of the nervous system, so even mental development had this morphological character, that is, it could be understood through observable behaviour. Second, Gesell was convinced that principles of growth and development illustrated by Coghill’s studies of the salamander were the same for all species, including humans. “We believe that the growth processes which mold the body and behavior of the human infant are in essence comparable with those which are being successfully analyzed by experimental embryology”, he claimed (Gesell & Thompson, 1938, p. v). This led to Gesell’s third legacy from Coghill, the view that behavioural changes followed biologically driven neural maturation, and not the other way around.
Gesell’s maturationist views stemmed directly from Coghill’s discoveries that in the salamander, changes in movement patterns emerged from neural events that happened before sensory connections were made.1 Movements, therefore, were a product of autonomous neural changes, not of sensory input. “Patterns of behavior in all species”, Gesell wrote, “tend to follow an orderly genetic sequence in their emergence. This genetic sequence is itself an expression of elaborate pattern – a pattern whose basic outline is itself the product of evolution and is under the influence of maturational factors” (Gesell, 1933, p. 217).

Just as Coghill unlocked the mysteries of salamander development through filming postures and movements, so Gesell believed he could discover the secrets of human development by filming and describing the morphology of movement. For example, Gesell learned from embryology that new forms and tissues evolved from chemical gradients and polarities in the fertilised egg and early embryo. In his principle of developmental direction, he argued that just as the direction of early embryonic growth is determined by a longitudinal gradient in the mesoderm, so the same direction of change appears in the fetus and the newborn infant. That is, development proceeds in a head-to-tail fashion, in succeeding waves, throughout the early years. Superimposed on this gradient is the proximal-to-distal sequence of maturation, with control emerging first in the trunk and head, and later in the distal digits.

This theme of patterns arising from gradients and polarities has re-emerged in contemporary neuroembryology. Edelman (1988), for instance, shows how the complexity of form develops, not through precise specification from the genes, but through local, physical, and chemical interactions at the surfaces of the cells. He echoes Gesell’s views of development as a morphological or “topobiological” process, where collectives of cells or neural elements self-organise into waves of pattern change.

The practical importance of Gesell’s theory was to act as a counterweight to the popular notions of behaviourism that were fashionable at that time. To parents who were told that infants were totally shaped by their environment, Gesell offered a different view, one of an autonomous unfolding of potential. The role of the environment was to support this unfolding, but it did not engender it. Thus, there was no point in training or teaching children until they were developmentally ready.

McGraw and the biology of development. Myrtle McGraw was also a developmentalist of great sophistication and subtlety who used human motor skill development as her principal empirical data. Like Gesell, she questioned the relevance of behaviourism, the dominant paradigm in experimental psychology, as an explanation for developmental change. Rather, she, too, took inspiration from a generation of experimental biologists to consider the processes of growth, which she understood as a continuous, contingent, holistic but nonlinear organic process (McGraw, 1935).
She was also influenced by Coghill for his emphasis on development as patterns of total form (rather than the concatenation of individual reflexes, a more behaviourist view) and for his studies relating nervous structure to behavioural function.

Although McGraw believed strongly in the ultimate inseparability of structure and experience, she expressly designed her study of the twins Jimmy and Johnny (1935) to ask which of their early motor skills could be trained and which were more fixed by developmental design. One twin, Johnny, was exercised daily in a variety of skills, both universal (sitting, walking) and culturally specific (swimming, roller-skating). She discovered that although training had some effects on the quality and initial performance of Johnny’s movements, in the long run, intensive training did not make a big difference. These results were taken up by the popular press as refuting associationism and supporting the primacy of maturation. Bergenn, Dalton, and Lipsitt (1992) claim that McGraw’s legacy as a maturationist oversimplifies her more sophisticated view of development. They are likely correct, but the twin study, plus her subsequent work on the development of locomotion, put the role of maturation into the forefront.

McGraw’s last major work was a monograph on the development of locomotion, The Neuromuscular Maturation of the Human Infant (1945), a model of astute interpretation. Again, inspired by Coghill, she described the phases of prone and upright locomotion as a series of whole body forms. At first, she claimed, infants’ movements are involuntary and under control of subcortical centres. Successive phases of new forms emerge as behaviour becomes increasingly encephalised. In addressing the role of learning versus maturation in this study, McGraw reiterated that the two influences are impossible to parse apart. Nonetheless, she stated that for locomotion, “Improvement of a function through practice or exercise appears to coincide with cortical participation in the activity” (McGraw, 1945/1972, p. 127), in essence agreeing with Gesell that an amount of readiness is necessary, and that there may be critical periods for advancing function.
The Legacy of Methodology

From our technology-rich vantage point, we can only stand in awe of the detailed data so cleverly and painstakingly collected by the early movement pioneers. In addition to photographs and movies, analysed in great detail, they also recorded movement directly. For instance, Burnside (1927) and Shirley (1931) recorded infants’ and toddlers’ footprints by oiling the children’s feet and allowing them to walk on paper. Many critical parameters of gait can be measured by this simple technique, and indeed it is still being used today (Adolph, 1997).

Another methodological advance, important in the establishment of developmental norms, was repeated testing of the same children in a large number of standardised tasks. Gesell took this testing to high art, with highly standardised equipment, structured interviews, and detailed instructions to the testers (Gesell & Thompson, 1938). The point, of course, was to chart developmental changes in a systematic way. Piaget’s (1952) methods, in contrast, were very flexible, as he continually adapted the tasks to fit the child. Contemporary researchers have again raised the issue of whether a standardised task really reveals the child’s full abilities, suggesting task materials may need to be scaled to children’s growth (Newell, Scully, McDonald, & Baillargeon, 1989) or measured in a supportive social situation (Vygotsky, 1978).

Finally, we must credit both Gesell (Gesell & Thompson, 1929) and McGraw (1935) for the intensive study of twins, a method still used intensively today to parse apart the contributions of genetics and environment.
The Legacy of Developmental Norms

The legacies of theory and measurement are closely tied to the third, and perhaps most lasting influence, that of developmental norms. Although mental testing of children dates back to the early part of the 20th century, largely through the work of Alfred Binet, it was Gesell who brought the concept of developmental norms for infants and young children into the mainstream of developmental psychology, into the popular psyche, and into the homes of millions of parents. The test battery that Gesell and his staff perfected in the 1920s and 1930s still forms the basis of the most widely used infant tests today, the Bayley Scales of Infant Development and the Denver Developmental Screening Test.

Today such tests are so universally accepted that their origins are little discussed. But it was Gesell’s theoretical insights and methodological rigour that led to the vast catalogue of motor milestones on which these tests were based. Gesell, like Coghill, saw in the postural and movement forms a direct reflection of the internal, lawful processes of growth. The link to collecting normative data on infant movement was a direct one:

. . . the underlying concepts of the normative study may be summed up as follows: Behavior grows. Growth expresses itself in ordered patterns. Behavior growth, like physical growth, is a morphosis. It is a process which produces a progressive organization of behavior forms. This morphogenesis can be investigated by morphographic methods and especially by analytic cinematography. By these methods we can ascertain the lawful sequences and norms of psychological growth for the purpose of genetic research. These norms may also be used as standards of reference for the analytic appraisal of development status (Gesell & Thompson, 1938, p. 4).
Thus, our contemporary reliance on movement for early developmental diagnosis may have had its origins in Gesell’s theoretical interest in the more general issues of the nature of development. In particular, Gesell believed that movement was the most direct expression of the forms of organic growth. By studying postures and movements of infants and children, Gesell both illustrated the more general principles of development and provided an enormous set of normative data. In the late 1920s and early 1930s, Gesell and his colleagues observed over 500 infants in a carefully structured and highly detailed series of tasks to establish age norms for a long list of motor behaviours. For example, Gesell and Thompson (1934, 1938) reported age norms for 41 stages of sitting behaviour, 58 stages of pellet behaviour, and 50 stages of standing and walking behaviour. Many of these behaviours were filmed using precise cinematic methods (Gesell & Thompson, 1934).

Gesell claimed that his interest in such extensive norms was not to establish a single model of performance for everyone, but to have a standard by which the “abounding variety of individual differences” (Gesell & Thompson, 1938, p. 4) could be detected and understood. In reality, Gesell’s books for both professionals and parents were much more concerned with the behaviour of the “typical” child at particular ages than with this variability. Indeed, Gesell chose his sample to be highly homogeneous: children of European descent, economically stable, intact families, 99% of whom attended church.
The idea that by a particular age, an infant or toddler “should” have achieved a particular motor milestone has not only become a standard developmental diagnosis, but it has also become completely entrenched in our cultural beliefs about child-raising. For instance, I own a used copy of Gesell’s The First Five Years of Life (1940) which was originally a gift from “Mother Dingle” to the “Dunlaps” with the loving inscription, “Please read from cover to cover”. This book is one of many written by Gesell which sets normative standards of behaviour based on his observations, and we can imagine the Dunlaps scrutinising their child at each age for “typical behaviour” in each “typical day”. In sum, the golden age of motor development left us with a rich heritage: a deep understanding of growth and form, an appreciation for the interweaving and nonlinear course of development, a sense of our biological continuity, and exact scientific methods. What happened to that legacy?
Motor Development: The Dormant Times

After more than two decades of extraordinary theoretical and empirical contributions to our understanding of development, the study of motor systems had declined by 1950 and then lay dormant for nearly 30 years afterward. This dramatic reversal of fortunes can be attributed to both the state of the field itself and to changes occurring in psychology as a whole.

As I have suggested previously, it may be that the successes of the early pioneers also contributed to the decline of the field (Thelen, 1995). They produced vast and widely published catalogues of motor milestones and richly detailed descriptive studies. The norms were incorporated into developmental tests and became family lore, at least in middle class North American families. There seemed to be little left to do.

Moreover, both Gesell’s and McGraw’s theoretical positions appeared to lead to dead-ends in terms of further empirical studies, but for different reasons. Once Gesell showed, through the descriptive topologies of behaviour, that human development obeyed universal principles, the case was effectively closed. Change dictated by the principles of developmental direction, reciprocal interweaving or individuating maturation was a biological imperative. Although Gesell acknowledged (as did McGraw) the interaction of experience and maturation, he did not inquire further about those mechanisms, nor did he encourage others to do so.

For McGraw, the situation was more complex. She admitted in her 1962 Foreword to The Neuromuscular Maturation of the Human Infant that she did not achieve her goal of relating the development of function to the maturation of structure. Part of the problem was that methods for studying changes in the brain were inadequate. But she also recognised that the earlier theoretical formulations were too simplistic, especially the division of behaviour into cortical and subcortical influences, and the neat separation of “instincts” versus “acquired traits” or “maturation” opposed to “learning” (McGraw, 1945/1973). Throughout her career, McGraw continued to elaborate a biologically sophisticated theory of development (Bergenn et al., 1992), but she did not translate her insights into new empirical work. And she, too, failed to convince others to continue to use motor development as an entry for understanding processes of change.

The decline of motor development must also be understood in the light of the other forces in the field during the 1950s, 1960s, and 1970s. During this time, learning theorists and experimental psychologists dominated academic departments of psychology. Descriptive and normative data, no matter how detailed, could not be compared in apparent “scientific rigour” to the tightly controlled experimental methods practised by these disciplines. Moreover, neither learning theory nor mainstream experimental psychology has traditionally considered how people control their bodies a question of major interest. Movement is often treated as only a by-product of the more psychologically interesting processes, or an arbitrary response modality such as pressing a lever.

At the same time, the focus in developmental psychology shifted dramatically. The work of Jean Piaget became increasingly well known and highly influential, igniting fervent interest in the inner mental life of children (as opposed to the form of their overt behaviour), which continues to this day. (It is perhaps ironic that although Piaget’s theory of mental development was entirely grounded in perception-action, later interpreters were more interested in the contents of mind than its sensorimotor origins.) Piaget offered descriptive stages, but he also inspired brilliant experimentation, something that the motor development theorists did not do. The other dominant force on the field came from the ethological and psychodynamic theories of John Bowlby on the nature of attachment.
Here again, I believe that Bowlby’s richly descriptive work made such a long-lasting impact because it was implemented experimentally by Mary Ainsworth. The lesson may be that even the most elegant theory survives only as it generates new and copious empirical research.
Motor Development: Two Decades of Rebirth

Beginning in about 1980, the tide began to turn again, and interest in movement gradually gained momentum. Just as multiple factors contributed to its decline, the field’s revitalisation has come from a number of converging influences. These included several theoretical advances: new ideas in movement science and biomechanics, insights from ecological psychology, and the import of dynamic systems theory. But the field has also benefited greatly from a new understanding of the plasticity of the brain, and from technological advances in recording movement and brain activity.
The Importance of N. Bernstein

However subtle and interactionist their own positions, the legacy from the neuroembryologists, through Gesell and McGraw, was that movement was a direct read-out of the maturational status of the nervous system. The reverse was scarcely considered – that the developmental course of the nervous system may be moulded by the nature of the body and how it moves. The same “top-down” view was also dominant in the psychology of adult movement, where most effort was devoted to understanding the nature of the executive “motor programme”.

We can credit the Soviet movement physiologist, N. Bernstein (1967), for inspiring a real revolution in the conceptualisation of movement. Although he too worked during the 1930s, Bernstein’s ideas only became known in the West after his book was translated in 1967. Bernstein’s seminal insight was to pose the control issue differently. He started with the body, which, he noted, had hundreds of bones and joints and millions of muscle fibres. Yet every movement is a coherent, coordinated event. How can the brain accomplish this feat of coordination, given so many possible combinations, or degrees of freedom? The answer, according to Bernstein, was that movements were organised in synergies, that is, a functional linking together of muscles into ensembles that worked together. What the brain recruited, according to Bernstein, was not individual muscles, but an appropriate pattern to accomplish a functional task. This vastly simplified the control problem. Indeed, Bernstein showed that movement was function-specific and not muscle-specific: His classic example is that you can sign your name using a pen on paper or using a broomstick on a blackboard, but the signature remains the same.

Moreover, as the motor system is assembled for functional action, it actually exploits the mechanical properties of the limbs and body.
For example, limbs have springlike properties because of the elastic qualities of the muscles and the anatomical configuration of the joints. When an ordinary spring is stretched and let go, it oscillates in a regular trajectory. The movement pattern need not be explicitly configured because it arises from the natural properties of the spring. Similarly, many aspects of human movements need not be detailed in the nervous system because they arise by themselves from the natural properties of the body. One good example is walking: As people step on one leg and then shift their weight forward over it, the back leg is stretched and stores energy, like a spring. As that leg swings forward, it uses little active muscle contraction, but rather relies on the potential energy gained by stretching the muscles. In his writing on the development of movement, Bernstein turned the old theories on their heads. It was not, he claimed, so much the nervous system instructing the muscles, as the dynamics of the movement instructing the nervous system. Children must learn the biodynamics of their bodies, the changing forces that produce and accompany each movement in each situation.
To learn to walk, for instance, infants must deal with the complex interactions between the movements of the legs, the centre of gravity of the body, and the support surface. These cannot be instructed beforehand, but must be individually assembled through experience. Eventually, infants learn not just to control their movements, but to make them efficient by exploiting their biomechanics, that is, using what the system can provide “for free”. The impact of Bernstein on the study of motor development was to shift the focus from thinking exclusively of the central nervous system as the sole contributor to the emergence of new skills to considering the contributions of the biomechanics of the moving limbs. In my early work on the coordination and control of spontaneous leg movements in infants, for example, I came across two puzzling results. The first involved the kinematic (time-space) organisation of the common kicking movements seen throughout the first six months. These movements are well coordinated, displaying a nearly simultaneous flexion and extension of the hips, knees, and ankles. Moreover, the durations of the parts of the movements were not random, but showed structure as well (Thelen & Fisher, 1983). How did this structure arise? Surprisingly, when my colleagues and I looked at the patterns of muscle activity that produced kicking movements, we found that the underlying electromyographic activity was far less patterned than the resulting movement. Indeed, infants appeared to just contract all the flexors and extensors at the beginning of the flexion phase of the kick and use very little muscle activation thereafter. Extension was largely passive. This meant that the extension was not directly programmed by the central nervous system, but rather emerged as a consequence of the elastic properties of the legs and their stored energy when flexed. 
In short, the precise patterns of these movements came about from the interplay between the neural command and the peripheral properties of the body.

The second puzzle involved the well-known newborn “stepping” reflex. Newborn infants will perform alternating step-like movements when held upright with their feet on a surface. Within a few months, such movements can no longer be elicited. Since the time of Preyer (1888), the common explanation has been that the patterns are a primitive reflex, which becomes inhibited as the cortex, and with it voluntary movement, matures. However, while stepping when upright does disappear, kinematically identical kicking movements performed while infants are supine do not disappear, and indeed increase in frequency. To account for this strange disparity – movements that are a function of body posture – we looked again to the peripheral contribution. Here again, we discovered that there was a relation between infants’ abilities to step and the mass of their legs. As their legs became heavier through the deposition of subcutaneous fat in the first few months, their ability to lift them in the biomechanically demanding upright posture decreased. Simply laying infants down, or reducing the effective mass by holding them in a tank of water, restored the movement (Thelen & Fisher, 1982). Again, performance depended on all elements of the moving system.
Because body masses, lengths, centres of inertia and so on are continually changing as infants grow, and because new postures bring on new biomechanical challenges, skill acquisition is a continually interactive process. Infants must discover how to produce the appropriate coordinative pattern and modulate it to fit the task. Furthermore, the addition of each new skill opens different opportunities for these interactions to occur. Development progresses through each new achievement, setting the next set of challenges. Thus, Bernstein’s impact was to reintroduce the child as an active movement problem-solver, much as Piaget had replaced the child as passive stimulus-responder with one who actively seeks stimuli in the world.
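The spring analogy that runs through this section can be made concrete with a small numerical sketch. It is an illustration only and is not taken from any of the studies cited: the function names and every parameter value (stiffness, mass, damping, time step) are assumptions chosen for readability, and a real limb is far more complicated than a single damped spring. The point it shows is the one made in the text: once the spring is stretched and released, no further command is issued, yet a regular, decaying oscillation appears by itself.

```python
def release_spring(x0, k=20.0, m=1.0, c=0.5, dt=0.001, duration=5.0):
    """Simulate a damped mass-spring released from rest at displacement x0.

    All parameters are hypothetical illustration values:
    k = stiffness, m = mass, c = damping, dt = time step (s).
    No input is applied after release; the motion is purely passive.
    """
    x, v = x0, 0.0
    trajectory = [x]
    for _ in range(round(duration / dt)):
        a = (-k * x - c * v) / m   # restoring force plus damping
        v += a * dt                # semi-implicit Euler: update velocity first
        x += v * dt                # then position, using the new velocity
        trajectory.append(x)
    return trajectory


def zero_crossings(trajectory):
    """Count sign changes, i.e. how often the mass passes its rest position."""
    return sum(1 for p, q in zip(trajectory, trajectory[1:]) if p * q < 0)


traj = release_spring(x0=0.1)
print("passes through rest position:", zero_crossings(traj))
print("largest excursion in final second:", max(abs(x) for x in traj[-1000:]))
```

Changing the assumed mass or stiffness changes the rhythm of the output without any change to the "command", which is a toy version of the claim that performance depends on all elements of the moving system.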
The Importance of Ecological Psychology

Bernstein’s message of the child as an active explorer meshed beautifully with the second powerful influence on the renaissance of motor development research. This was the theoretical approach loosely known as ecological psychology, based largely on the work of psychologists Eleanor Gibson (E.J. Gibson, 1969, 1988) and James Gibson (J.J. Gibson, 1966). The basic assumption of Gibsonian psychology is that people, and other animals, are able to directly perceive structured information in the environment that enables them to functionally act within it. (This is in contrast, of course, to the view that the environment does not have meaning until it is reconstructed within the brain.) The goal of development, in the Gibsonian view, therefore, is for infants and children to progressively discover the affordances for action in the environment, a process of matching the abilities of the actors with the opportunities in the world around them.

Eleanor Gibson has been especially eloquent in championing the child as an active explorer in this process, where perception and action are mutually coupled. According to her, infants, from the beginning, are continually coordinating their movements with concurrent perceptual information to learn how to maintain balance, reach for appropriate objects, and locomote across various surfaces and terrains (Gibson, 1988). Research from this perspective is not primarily concerned with the form of the movement or its neurological control, but with how children come to recognise the match between their abilities and the qualities of the task environment. One of E. Gibson’s classic studies, for example, showed that crawling infants crossed both a rigid surface and a squishy waterbed without hesitation. Toddlers, however, hesitated and explored the surface of the waterbed, and then shifted to crawling rather than risk falling on the squishy surface (Gibson et al., 1987). 
Ecological psychology has shaped the field of contemporary motor development studies in several important ways. First and foremost is the notion that perception and action are inseparable in the formation of skills. Perception is essential for movement, but movement also informs perception. Movements of the head and eyes, for instance, enable the perceiver to
sample the visual array. Movements of the arms, hands, and fingers are necessary for haptic exploration of new objects and surfaces. Locomotion is essential for understanding the quality of surfaces and the layout of the spatial surroundings. Indeed, we can even cast movement as a form of perception, a way of knowing the world by moving in it. Second, research in the ecological tradition has made researchers aware that perception in the service of action is always multimodal, likely right from birth. For example, a study by Rochat and Morgan (1995) has shown that very young infants are aware of the correspondences between their visual perception of their moving limbs and their proprioceptive and haptic senses of them. In these studies, infants were shown two side-by-side televised displays of their own legs, clothed in distinctively striped stockings. On one display, the video image was concordant with the infants’ views of their legs as they sat in the infant seat. On the second, the legs were reversed. Infants showed by their preferential looking that they could distinguish the two displays. Clearly, they must have, through experience of looking and moving, mapped the correspondences between what they saw and what they felt. The third lasting contribution from ecological psychology to current motor development research is the emphasis on exploration as an important force for developmental change. E.J. Gibson (1988), for instance, describes three overlapping phases of exploration in the first year. At first, infants explore events with vision and hearing. Next, they explore objects with reaching, grasping, and mouthing, and later, they explore the large layout with self-produced locomotion. Changing motor skills contribute to infants’ exploratory behaviour, but the behaviour itself also leads to new motor skills.
The Introduction of Dynamic Systems

Both Bernstein and the Gibsons were concerned with a similar issue: How to avoid the “homunculus” problem – some entity in the head of the actor that represents the world and makes decisions to act in it. For Gibson, the solution was direct perception. For Bernstein, the solution was reducing the executive decisions by the synergistic organisation of the brain and body. In the 1980s, a group of young movement theorists centred around the Haskins Laboratories in New Haven, Connecticut merged these two theoretical traditions with recent advances in the physics and thermodynamics of complex systems to produce a radical change in the theory of motor control and development. In two landmark papers, Peter Kugler, Scott Kelso, and Michael Turvey (Kelso, Holt, Kugler, & Turvey, 1980; Kugler, Kelso, & Turvey, 1980) considered Bernstein’s synergies in a new light, as self-organising systems described by the same dynamic principles that govern complex, so-called dissipative structures in physics and chemistry. (Dissipative systems absorb energy to maintain themselves in organised states far from
thermodynamic equilibrium. All biological systems are dissipative, but so are some other natural systems such as cloud formations or fluid flows.) The key insight here was that when a person assembles a motor synergy to do some task, the participating components cohere and produce patterns that have temporal and spatial organisation that is not the result solely of the detailed instructions from the nervous system. These investigators, and subsequently many others, demonstrated that simple, cyclical movements of the limbs in humans could be described by the mathematics used for coupled oscillators in general. In particular, they discovered that when people move their body parts in a rhythmical fashion, they have preferred coordination modes: Spatial and temporal patterns that are comfortable and easily performed. Other possible patterns are unstable, such that the preferred mode seems to suck them in, or “attract” them. Under certain conditions, people may spontaneously shift coordination modes from one preferred regime to another. Again, using locomotion as an example, quadrupeds use different gaits depending on the speeds of their movements. The gaits performed are those that are the most energy efficient for the particular speed (see Kelso, 1995).

There were several important implications of dynamic systems ideas for conceptualising motor development, and for developmental theory in general. First is the restatement of Bernstein’s ideas that every movement is a system-wide ensemble of all participating components, assembled in the context of a particular task at hand. Kugler and Turvey (1987) used the term “soft-assembly” to describe this flexibility: Behaviour patterns are not prescribed, although some may be preferred. This meant that behaviour was not “hard-wired” into the brain, but emerged “online” in the light of the person’s available structure, energetic resources, and the nature of the task to be done. 
Second, dynamic systems theory emphasised not only patterns in space, but also that behaviour has a pattern over time as well. This is important because it changes the focus from taking a “snapshot” of behaviour at a particular age or skill level, to more serious considerations of how change occurs over time. Here, the timescale could be seconds, or minutes, or weeks, or months. Indeed, from a dynamic perspective, it makes little sense to consider changes at different time scales as different processes. What happens at an action timescale cascades into changes over the timescales of learning and development. Finally, there is the idea that the coordinative state of a stable movement pattern – for instance, walking or reaching – behaves like a dynamic attractor with varying levels of stability. For a pattern to change, something must disrupt the stability of the old pattern, so that the components can coalesce into new forms. In terms of development, this meant characterising behaviour in terms of its stability to identify the transitions into new forms and test the mechanisms that engender them (see Thelen & Smith, 1994; Thelen & Ulrich, 1991).
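The notions of preferred coordination modes, attractors, and loss of stability can be illustrated with one widely used formalisation from the coordination-dynamics literature, the Haken-Kelso-Bunz equation for the relative phase φ between two rhythmically moving limbs: dφ/dt = −a sin φ − 2b sin 2φ. The text above does not name this equation, so its use here is an assumption made for illustration, as are the function names, parameter values, and starting phase in the sketch below. In this equation, in-phase movement (φ = 0) is always stable, whereas anti-phase movement (φ = π) is stable only while b/a exceeds 1/4; the ratio b/a is usually taken to fall as movement frequency rises.

```python
import math


def relative_phase_rate(phi, a, b):
    """Haken-Kelso-Bunz rate of change of relative phase (illustrative)."""
    return -a * math.sin(phi) - 2.0 * b * math.sin(2.0 * phi)


def settle(phi0, a, b, dt=0.01, steps=20000):
    """Integrate the relative phase forward with simple Euler steps
    and return the value it settles to. dt and steps are assumed values."""
    phi = phi0
    for _ in range(steps):
        phi += dt * relative_phase_rate(phi, a, b)
    return phi


# Start both runs slightly away from anti-phase coordination (phi = pi).
start = math.pi - 0.3

# b/a = 1.0 (> 0.25): anti-phase is an attractor, so the small
# disturbance dies out and the pattern returns towards pi.
slow_movement = settle(start, a=1.0, b=1.0)

# b/a = 0.1 (< 0.25): anti-phase has lost stability, so the same
# disturbance grows and the system is drawn to in-phase (phi = 0).
fast_movement = settle(start, a=1.0, b=0.1)

print("settled phase, b/a = 1.0:", slow_movement)
print("settled phase, b/a = 0.1:", fast_movement)
```

The same disturbance therefore has opposite consequences depending on how stable the current pattern is, which is the sense in which a pattern must be destabilised before the components can reorganise into a new form.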
Systems ideas are not new in developmental psychology. They have been proposed by Werner, Lewin, Piaget, and many others. For example, Piaget (1952) believed that new levels of mental equilibration could only be achieved through disequilibration of current stages, similar to the phase shifts described by contemporary dynamicists, and he was profoundly concerned with processes of change. Current dynamic systems theories have gone beyond the old formulations, however, in two important ways. First, we have achieved some level of success in refining the precision of our ideas through formal mathematical models (see Newell & Molenaar, 1998; Thelen, Schöner, Scheier, & Smith, in press). But equally important, dynamic thinking has inspired a renewed interest in empirical work that is closer to Piaget’s original agenda of looking at processes of change. (In the post-Piagetian era, studies were more concerned with describing age-defined performance stages.) Process accounts often involve detailed longitudinal studies to identify times of transition combined with microgenetic methods that test possible mechanisms that move the child into new developmental phases. Finally, dynamic systems have also inspired studies that incorporate multiple levels of analysis from the cognitive to the biomechanical. For instance, I have used dynamic systems principles in my own work to uncover the multiple influences on change in locomotor-movement (Thelen & Ulrich, 1991) and reaching (Thelen et al., 1993, 1996).
Plasticity in the Central Nervous System

There have been two other influences on research in motor development that bear mention. First, just as our predecessors over 50 years ago were inspired by the current findings in neurophysiology, so too has contemporary thinking been shaped by remarkable discoveries in neuroscience. The first of these discoveries is a growing understanding of the systems-wide properties of the brain. Although much research is dedicated to pinpointing local areas that subserve different functions, it has also been discovered that no area works in isolation. Indeed, neural networks supporting perceptual, motor, and cognitive processes are widely and densely interconnected (see, for instance, Edelman, 1987). For example, neurones responding to both spatial localisation of visual targets and to intended movements are found in many areas of the cortex. At the same time, a single neurone may be activated by the visual, planning, memory, and movement aspects of a task (reviewed in Thelen et al., in press). These findings reinforce the idea that perception and action, and their cognitive counterparts, are part of the same continuous and coupled process. Second, neuroscientists have discovered remarkable, dynamic plasticity in even the adult brain. By careful brain mapping in monkeys, they have established that experiences in the world both establish and maintain the functional connectivity of both cortical and subcortical areas. Old ideas
about the fixity of the adult brain have been overthrown (see, for example, Merzenich, Allard, & Jenkins, 1990). The implication for development is profound: Experience moulds the brain. What may have previously been considered as autonomous maturational changes in brain function may indeed be driven by children’s everyday actions in the world. But the loop is still closed: Just as experience reorganises the brain, so also the resulting improvements in perceptual discrimination, memory, and motor control provide children with new opportunities for experience to further remap the brain.

A final major influence in the field has been the important theoretical work of Gerald Edelman, which is a synthesis of current neuroembryology, neurophysiology, and behavioural development consistent with Bernstein, Gibson, and dynamic systems. In his Theory of Neuronal Group Selection, Edelman (1987) proposes that adaptive behaviour emerges as the recurrent perceiving and acting in the world strengthens particular neural networks such that patterns are progressively selected from many wider possibilities. In his view, genetic and neuroembryonic processes provide the rough outline of the neural anatomy. The functional mapping of the brain is subsequently experience-dependent, especially through perceptual-motor exploration.
Technological Contributions to Motor Development

When Halverson (1931) published his very detailed studies of the development of prehension, he collected his kinematic data by filming infants reaching over a table with a grid drawn on it. From the film, he traced the path of the hand, frame-by-frame, and then extracted the quantitative data from the measurements on the grid. Seventy years later, the fundamental technique for measuring movement is the same: Sample the position of the body part in space many times per second and reconstruct the pathway of the moving segment. But we now have equipment to do this much more rapidly and accurately. Video has replaced expensive movie film, which required special lighting and development. Devices that automatically track markers on the limbs with great precision are commercially available. Computers that can handle the very large datasets generated by movement analysis are also inexpensive and accessible to all. In addition to measuring movement, we can now also track forces through the use of force platforms and patterns of muscle activation with electromyography. Although it requires a great deal of skill and patience to use these sophisticated techniques with infants and children, there are a number of laboratories around the world that have successfully conquered the challenges. There is no question that these technological advances have contributed greatly to advances in our understanding of motor development, perhaps more than in any other area of developmental psychology. It may be that our visions of what questions can be asked are limited by the means we have to answer them.
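The measurement principle described above, sampling the position of a body part many times per second and reconstructing its path, reduces to simple arithmetic on successive samples. The sketch below is a minimal illustration under stated assumptions: the function name, the two-dimensional coordinates, and the sampling rate are hypothetical, and real motion-analysis pipelines typically also filter the data before differentiating, which is omitted here.

```python
import math


def path_kinematics(samples, rate_hz):
    """Reconstruct path length and frame-to-frame speed from sampled positions.

    samples: list of (x, y) positions of one marker, in consistent units,
             recorded at a fixed rate (hypothetical data format).
    rate_hz: number of samples per second.
    Returns (total path length, list of speeds between successive samples).
    """
    path_length = 0.0
    speeds = []
    for (x0, y0), (x1, y1) in zip(samples, samples[1:]):
        step = math.hypot(x1 - x0, y1 - y0)  # distance moved between frames
        path_length += step
        speeds.append(step * rate_hz)        # distance divided by 1/rate_hz
    return path_length, speeds


# Illustrative hand positions (arbitrary units) sampled at an assumed 10 Hz.
hand = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
length, speeds = path_kinematics(hand, rate_hz=10.0)
print("path length:", length)
print("speeds:", speeds)
```

Halverson's frame-by-frame tracing on a grid and a modern marker-tracking system both feed the same kind of calculation; what has changed is the sampling density and the accuracy of each position.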
Major Themes in Contemporary Motor Development

In this section, I review the current major thrusts in the study of motor development in the light of the historical precedents of the field. Again, this review is far from exhaustive. In the final section, I point to some new directions for the future.
Descriptions of the Development of Motor Skill: Understanding Coordination and Control

We still carry on the heritage of McGraw and Gesell by using detailed, longitudinal studies as the foundation for understanding motor skill development. Like our predecessors, we also use these studies to infer developmental changes in the underlying mechanisms. Moreover, from the point of view of dynamic systems, such studies are essential for providing the “landscape” of behavioural patterns: When they are stable and when they change. This is the first step for identifying points of transition, where the system may be probed by experiments. Thus, longitudinal studies are further supplemented by experiments to test hypotheses about coordination and control.

Notable examples of contemporary descriptive work are the pioneering studies of Claes von Hofsten on infant reaching (1979, 1982, 1984, 1991). Von Hofsten was the first to reintroduce detailed kinematic measures of infant movement, after Halverson’s efforts 40 years earlier. Especially compelling were von Hofsten’s (1980) demonstrations of young infants catching moving objects, a seemingly remarkable precocious ability. Von Hofsten’s work has been extended by other longitudinal studies of reaching using even more dense sampling techniques and adding kinetic and electromyographic measures (Spencer & Thelen, 2000; Thelen et al., 1993; Thelen, Corbetta, & Spencer, 1996), and following infants until they were several years old (Konczak, Borutta, & Dichgans, 1997; Konczak, Borutta, Topka, & Dichgans, 1995). Lower limb movements have also received attention. For instance, my colleagues and I were the first to describe infants’ spontaneous leg movements using kinematic techniques. Previously, infants’ leg movements, because they were not apparently goal-directed, were thought to be disorganised or random. 
However, we demonstrated a high degree of coordination both within and between limbs and a pattern of developmental changes in that organisation. We followed this by a more in-depth look at the kinetics of infant leg movements, that is, how infants managed the forces that produce movements (Jensen, Ulrich, Thelen, Schneider, & Zernicke, 1994; Schneider, Zernicke, Ulrich, Jensen, & Thelen, 1990). As in motor development’s “golden-age”, the development of upright locomotion has been a primary focus in contemporary studies. Learning to walk is a dramatic developmental milestone, the transition from infancy to
childhood. Moreover, there has been increasing recognition of the complexity of the task, and thus increasing interest in the question of how infants solve the problems involved (Thelen, 1984). For example, Sutherland (1984) reported gait measures on children up to seven years of age, documenting kinematic changes in step parameters and joint excursions, as well as changes in muscle patterns. Bril and Breniere (1992; Breniere, Bril, & Fontaine, 1989) focused more directly on changes in newly walking infants, using a large force plate to provide detailed descriptions of weight shifts and propulsive forces. Clark and Phillips (1988) looked primarily at changes in interlimb coordination over a similar time period. Moreover, there have been two recent longitudinal studies of interlimb patterns in crawling, another topic well-researched by earlier investigators (Adolph, Vereijken, & Denny, 1998; Freedland & Bertenthal, 1994).

Finally, I mention several longitudinal descriptive studies undertaken from an explicit dynamic systems perspective. Thelen and Ulrich (1991) reported changes over the first year in infant treadmill stepping, emphasising the mechanisms promoting transitions to better performance. Angulo-Kinzler, Ulrich, Chapman, & Thelen (2000) followed this by tracing the continuity between treadmill stepping and later supported and unsupported upright locomotion, using multiple measures of kinematics, kinetics (forces), and electromyography. The longitudinal reaching study I mentioned earlier (Thelen et al., 1993, 1996) was also designed to follow dynamic principles, not only in the dense longitudinal design, but also in the multiple measures used to capture the multiple influences on the task (Spencer & Thelen, 2000; Spencer, Vereijken, Diedrich, & Thelen, 2000). Finally, in a most clever study, Goldfield, Kay, and Warren (1993) used a dynamic analysis to understand how infants learned a novel movement task, bouncing in a Jolly-Jumper. 
Here, I do not dwell on the detailed results of these investigations. Rather, I suggest that taken together, these contemporary studies of reaching and walking have gone beyond the classic studies in two important ways, reflecting the influence of Bernstein, Edelman, and dynamic systems. First, there is more explicit consideration of the biomechanical aspects of early movement. Theoretically, this means thinking about movement as a biomechanical problem to be solved by the nervous system: We cannot think about neural control distanced from what is controlled. For instance, learning to walk requires keeping the centre of mass of the body over a permissible base of support and controlling the “fall” as the infant steps forward. Crawling requires the correct limb combinations to maintain a dynamic base of support. Reaching involves stabilising one arm segment against the forces generated by the other moving segments. In this view, the patterns of muscle activation may well be the result of the biomechanical demands of the movement, not the cause of the movement. For example, Angulo-Kinzler et al. (2000) discovered that, as Bernstein predicted, the patterns of muscle activation underlying treadmill stepping, supported, and independent walking in infants were much more variable than the patterns of forces that moved the legs.
And second, contemporary researchers are considering new ideas about variation and individual differences. From the start, researchers in motor development have noticed, measured, and discussed individual differences. Shirley (1931), for instance, documented the differences in onsets of various motor skills in her 25 babies and related these differences to the infants’ physical growth, muscle tone, and “willingness to expend energy” (p. 125). Physical dimensions and movement “styles” are still seen as an important part of the story (Adolph, 1997; Thelen et al., 1993). Additionally, today’s researchers are less concerned with variability around age norms than in earlier times. Indeed, in many studies, children are compared on the basis of their skill levels rather than their ages. There are two ways in which variation and individual differences have taken on new theoretical status. First, there is increasing recognition that individual differences in body dimensions, muscle qualities, and inherent energy levels, provide children with different kinds of movement problems that they must solve in order to gain skills. Low energy children with large limbs may have to learn different adaptive strategies than small, wiry, highly energetic ones. This emphasises the problem-solving nature of motor skill development, that there cannot be a rigid, phylogenetic blueprint because individuals must fit their own bodies to their own tasks. Second, we have come to recognise variation itself as the source of developmental change, a heritage from both dynamic systems theory and Edelman’s selectionism. If children do not have multiple options, they will be stuck in only a few solutions. Thus, some investigators have reported not just their dependent variables, but also the structure of variability as indicators of when skills are stable and when they change (e.g., Thelen & Ulrich, 1991).

Experimental studies of early skills. One important way to uncover processes of change is to experimentally manipulate variables to which the system is sensitive. For example, I mentioned previously my hypothesis that infants’ limb masses were important in the disappearance of the stepping reflex. To test this, we changed the limb mass by submerging the infants’ legs in water or adding weights (Thelen, Fisher, & Ridley-Johnson, 1984). More recently, the effects of biomechanics and movement have been evaluated by changing infants’ postures (Jensen et al., 1994; Savelsbergh & van der Kamp, 1993) or by adding weights to their limbs and torsos (Adolph & Avolio, 2000; Thelen, Skala, & Kelso, 1987). To study the development of postural control, for example, researchers have commonly intervened experimentally by placing infants and children on a platform that perturbs the infants’ balance. By this means, researchers can test the limits of children’s postural stability at various ages, and investigate the underlying neuromotor mechanisms that produce the response (Woollacott & Sveistrup, 1992). I should note here that several studies have combined the longitudinal, descriptive method with experiments by repeating an experimental manipulation in the same infants over different ages. A most elegant example of
this is Adolph’s (1997) study of infants climbing up and down slopes, where she followed babies from the onset of crawling through stable independent walking. At each visit, Adolph assessed their abilities to judge climbable slopes with a psychophysical measurement on slopes of different grades. Similarly, Angulo-Kinzler et al. (2000) tested infants in three locomotor contexts varying in support from eight months until they were walking well.

Studies in perception-action coupling. Although all motor development studies are perception-action studies, experiments explicitly in the Gibsonian tradition continue to dominate the field. Here the issue is not so much motor control per se, but how action is modulated by perception, and in turn, how action informs perception. For example, early work by Lee and Aronson (1974) and Butterworth and Hicks (1977) established the dominance of vision in toddlers’ postural control by using the famous “moving room” paradigm. This situation was later extended and refined by Bertenthal and Bai (1989) with younger infants and also used by Stoffregen, Schmuckler, and Gibson (1987) to show that infants were sensitive to peripheral visual flow. Since then, there has been considerable debate over the relative importance of vision and proprioception in the control of posture. In their review, Bertenthal and Clifton (1998) concluded that the balance between multiple sources of information may be contextually determined, much like the soft-assembly of action I described earlier.

The role of perception in prehension has also been a topic of intense study. Early researchers, including Piaget (1952), Bruner (1973), and White, Castle, and Held (1964) stressed the necessity of infants gradually learning to match the sight of their hands with the sight of the objects to be reached. 
Questions about this gradual visual-visual matching arose with von Hofsten’s (1982) report of infants’ directed reaches during the newborn period, before a long period of learning commenced. Recently, Clifton, Muir, Ashmead, & Clarkson (1993) further challenged the visual-matching idea by showing that at first, infants reached as well in the dark to a lighted or sounding object as when they could see their hands. This work underscored the importance of learning the “feel” as well as the sight of the arm and hand. Over the first year, infants are not only better able to reach a target, they become increasingly skilled in anticipating the location and size of objects. Notable work in this area includes von Hofsten and Ronnqvist’s (1988) descriptions of the development of anticipatory hand shaping and Ashmead, McCarty, Lucas, & Belvedere’s (1993) demonstration of infants’ abilities to adjust movements “online”. McCarty, Clifton, and Collard (1999) investigated infants’ discovery of the correct way to grip a spoon. In a particularly clever experiment, they presented the spoon in varying orientations to see when and how infants could anticipate and perform the appropriate grip. The studies I have mentioned so far have looked at how infants use perception to adjust movements. But the converse is also true: Children use actions
to inform perception, primarily through the use of exploratory movements (Gibson, 1988). For instance, when confronted with an unfamiliar surface, infants, unsure of the suitability of the surface for locomotion, will use their hands to touch and pat (Adolph, 1997; Gibson et al., 1987). Bushnell and Boudreau (1993) provide an excellent illustration. They show how infants’ perceptual detection of the properties of objects such as weight, texture, or sounding abilities, develops only as their motor abilities are sufficient to manipulate the objects appropriately. In sum, we have learned a great deal about development in general through the experimental study of perception and action. In particular, this work has emphasised the continual active role of children in exploring their environments, and the context-dependent, problem-solving process that constitutes developmental change. Many of the old ideas about developmental timetables can now be recast in different terms. Instead of a phylogenetically determined sequence of stages, development is better conceptualised as a changing landscape of patterns, whose stability depends not only on the organic status of the child, but also on their experiential history, and how those interact with the particular task at hand. A final example makes the point well. In his classic work, Gesell described a series of grip configurations infants used to grasp a cube, ranging from a simple scoop to the fine pincer grasp. Newell and colleagues (1989) looked at the grasping task in a new way. They reasoned that for a young infant, a one-inch cube presented a different task, based on the infant’s hand size, than to an older infant or an adult. Indeed, they discovered that infants were much more adaptable, depending on the cube size, with young infants using more appropriate grasping patterns than previously thought possible. In short, the baby could “soft-assemble” a solution matching their own skill level to the demands of the task. 
Cognition and motor skill development. A final area gaining in prominence is the intersection between perception-action and cognition. Views on the relationship between these two domains have been complex. For many years, motor skill and cognition were believed to be unrelated because early studies showed only modest correlation, if any, between children’s motor and intellectual development (e.g., Shirley, 1931). (Of course, this has always been a contentious issue because, especially in infancy, “mental” test items have an enormous motor component; e.g., Bayley, 1936.) On the other hand, Gesell believed that both domains were governed by the same developmental principles. For Piaget, cognition was built from perception and action, and Piaget’s descriptions of how early motor skills, such as reaching and sucking, are used in the service of developing cognition are still among our most insightful. Today, there is little interest in using motor development to predict later mental status, but there is increasing agreement with Piaget about the tight linkage between movement and cognitive development. The work of Bertenthal and Campos and their colleagues has been especially influential in this regard
Thelen
Motor Development
(e.g., Bertenthal & Campos, 1990; Bertenthal, Campos, & Barrett, 1984). These scholars argue that one setting event – the onset of crawling – initiates a developmental cascade that has consequences for changes in spatial cognition and emotional development. The mechanisms by which being able to move about changes the ways that babies think are not fully understood, but may involve their increased attention to perceptual information as they move about. In other words, movement helps children sample the world more completely.
There is, perhaps, an even more basic way in which movement and cognition are tightly linked, a way close to what Piaget envisioned (Thelen, 2000). Infants, children, and adults are perceiving and moving all of their waking hours. Movement itself is a form of perception because the proprioceptive and haptic senses are continuously receiving information, information that is perfectly coupled with information from the external senses such as vision and hearing. Thus, movement is an integral part of the ensemble of all our experience, including the times when we are just looking at something, because looking involves movements of the eyes, head, and neck. If, in the Piagetian sense, higher cognition is built from sensorimotor experiences, then the movement that occurs with those experiences is remembered and recalled to the same degree as information from the other perceptual senses. Even as mental events become more abstracted from the immediacy of the senses with development, they never become fully disassociated from the sensorimotor events that produced them (Edelman, 1987; Thelen, 2000). Indeed, the hallmark of a skilled person is the ability to process efficiently both “online” and “offline” and to be able to switch between these modes as the situation demands.
I have argued (Thelen, 2000; Thelen et al., in press) that this flexibility demands that action and mental events be encoded in the same dynamic language so that they can be tightly intermeshed, and that this encoding is there from the start.
An Agenda for the Future: It is not Just Motor any Longer
Today, motor development is a robust field, with strong theoretical bases and empirical work of great sophistication. The continued strength of the field in the future, I believe, lies both in our abilities to pursue important issues within the field and, at the same time, to tie motor development to advances in other, related disciplines. In this concluding section, I offer some thoughts about these future directions.
Multimodal perception and action. We have made great strides in understanding the role of visual perception in the development of reaching, posture, and locomotion. But, as I discussed above, experience is continually multimodal, including the perception of movement. Much less is known about how infants and children use correlated information from multiple sources, especially in decisions to act (but see, for instance, Streri & Pêcheux, 1986).
Formal models and robotics. The processes involved in motor development are excellent candidates for a variety of types of computational and dynamic models, as well as implementation in robots. Modelling of any type offers the opportunity to think more precisely about the phenomenon in question, and to generate testable hypotheses about the processes involved. There is already considerable progress towards this end (see, for instance, Newell & Molenaar, 1998). Examples inspired explicitly by dynamic systems theory include Robertson’s (1993; Robertson, Cohen, & Mayer-Kress, 1993) dynamic analysis of the time structure of fetal and infant movements, Bertenthal, Boker, and Rose’s (1995) dynamic analysis of infant postural control, Goldfield et al.’s (1993) work on infants learning the Jolly-Jumper, Fitzpatrick, Schmidt, and Lockman’s (1996) analysis of children learning to clap, and Thelen et al.’s (in press) dynamic field model of infant perseverative reaching. Other notable efforts are Taga’s (1995) model of the development of locomotion, and Berthier’s (1996) simulations of infants learning to reach. The models of Sporns and Edelman (1993; Almassy, Edelman, & Sporns, 1998) are implemented both on the computer and in an autonomous mobile robot. Indeed, there is considerable interest in the “developmental” aspects of such robots, and especially how perception and action work together to produce emergent adaptive behaviour (Pfeifer & Scheier, 1999).
Embodied cognition. There is a great need for studying the role of movement in so-called “higher” cognition – memory, decision making, categorisation, and language. For example, Thelen, Smith, and colleagues have demonstrated the role of body memory in a classic Piagetian task, the A-not-B error (Smith, Thelen, Titzer, & McLin, 1999; Thelen et al., in press).
They showed that when infants reach several times to one of two targets, they build up a location memory of the target that also includes the feel of the arms and infants’ postural set, and that these memories influence further decisions to reach. This is a clear demonstration that movement is not separate from remembering and deciding, which are traditionally considered “cognitive” processes. These authors suggest that movement must be considered as part of every task: What aspects of movements that accompany everyday actions are remembered and encoded as part of the task ensemble? A very promising entry into this question lies in the area of speech and gesture. Infants produce interpretable gestures many months before they speak, and of course, gestures are universal in older children and adults. Until recently, gestures were considered as by-products of speech, or augmented communication. Now the motor aspects of both gesture and speech are being reconsidered, especially the deep coupling between hand gestures and cognition. There is also considerable evidence that control of hand and mouth are both phylogenetically and ontogenetically linked, and that indeed language acts are profoundly embodied (Iverson & Thelen, 1999).
Neural bases of motor skill development. The field of “developmental cognitive neuroscience” is just coming into its own (e.g., Nelson, in press). Current neuroimaging and direct recording techniques are not well adapted to studying large movements in normal young human subjects. However, scientists have used other approaches to better understand brain correlates of skill development. For example, several research programmes are following groups of infants, largely prematurely born, who have suffered well-characterised perinatal brain lesions. Many show considerable recovery of function, while others do not attain fully functional outcomes (see review in Elman et al., 1996). Such studies raise profound issues not only about early plasticity and the effects of experience, but also about the old ideas of localisation of function, because it is apparent that when one part of the brain is injured, other areas can assume needed functions. It is hoped that as collaborations between neuroscientists and developmentalists increase, and with inevitable technical advances, this exciting area of brain-behaviour interface will grow at a rapid pace.
Learning and plasticity. The explosion of interest in learning and plasticity in the neurosciences has already reverberated in our field. There is already a large and detailed literature devoted to motor learning and plasticity in human adults and nonhuman primates. Unfortunately, as yet, few of these findings have been extended to infants and children, although they are highly relevant. For example, Shadmehr and colleagues (e.g., Shadmehr & Mussa-Ivaldi, 1994) have conducted elegant studies examining how adults learn novel motor tasks, what is remembered from these newly learned tasks, and then how the newly learned task competes with previously learned ones. This is a good model for how infants and children learn new skills, yet much work needs to be done (but see Adolph, 1997, for a good example of competing skills).
Cultural and individual differences. Individual and cultural differences in learning motor skills are as important as the commonalities that children share. Such differences inform us about the plasticity of developmental pathways, and the influences and limits of daily experience in shaping them. All cultures and all intact individuals learn to sit, walk, reach and manipulate, and speak, but what are the diverse means of attaining similar ends? For example, work by Bril and colleagues (e.g., Bril & Sabatier, 1986) has documented child care practices and beliefs in Mali, which differ markedly from Western cultures, with the suggestion that they may contribute to differences in motor skill development. Campos and colleagues (2000) have undertaken a large project in China investigating the consequences of cultural restrictions on crawling, especially on spatial cognition. This remains an understudied area of motor development, likely due to the difficulties of collecting cross-cultural data, but one with great potential for helping us understand deep developmental issues.
Conclusion
The study of motor development is alive and vigorous at the turn of the century, perhaps reviving and reliving its old golden age. It has again returned to the mainstream of developmental psychology, but the field also has extensions, as I have shown, into new fields, including neuroscience, cognitive science, and motor science. Moreover, the theoretical and empirical work of the last two decades is reshaping clinical practice dealing with perceptual-motor disorders in infants and children. I believe that, as in other fields of psychology, real progress will be made in the next millennium only as we continue to combine our traditional naturalistic and experimental approaches with insights from other disciplines. This will work in two ways. First, we will learn more about the development of perception and action as we bring in information and techniques from neuroscience, cognitive science, clinical practice, and the like. But I also think we will take leadership in the 21st century as we continue to show the centrality of movement in other domains of psychological interest. Finally, we offer models of how development can truly be studied as a time-dependent process.
Note
1. Bergenn et al. (1992) argue that Gesell’s interpretation of Coghill is more “maturationist” than Coghill himself espoused.
References
Adolph, K.E. (1997). Learning in the development of infant locomotion. Monographs of the Society for Research in Child Development, 62 (3, Serial No. 251).
Adolph, K.E., & Avolio, A.M. (2000). Walking infants adapt locomotion to changing body dimensions. Journal of Experimental Psychology: Human Perception and Performance, 26, 1148–1166.
Adolph, K.E., Vereijken, B., & Denny, M.A. (1998). Roles of variability and experience in development of crawling. Child Development, 69, 1299–1312.
Almassy, N., Edelman, G.M., & Sporns, O. (1998). Behavioral constraints in the development of neuronal properties: A cortical model embedded in a real world device. Cerebral Cortex, 8, 346–361.
Angulo-Kinzler, R.M., Ulrich, B.D., Chapman, D., & Thelen, E. (2000). Context and control in the step patterns of newly walking infants. Manuscript submitted for publication.
Ashmead, D.H., McCarty, M.E., Lucas, L.S., & Belvedere, M.C. (1993). Visual guidance in infants’ reaching toward suddenly displaced targets. Child Development, 64, 1111–1127.
Bayley, N. (1936). The development of motor abilities during the first three years: A study of sixty-one infants tested repeatedly. Monographs of the Society for Research in Child Development, 1, 26–61.
Bergenn, V.W., Dalton, T.C., & Lipsitt, L.P. (1992). Myrtle B. McGraw: A growth scientist. Developmental Psychology, 28, 381–395.
Bernstein, N. (1967). The coordination and regulation of movements. Oxford: Pergamon.
Bertenthal, B.I., & Bai, D.L. (1989). Infants’ sensitivity to optical flow for controlling posture. Developmental Psychology, 25, 936–945.
Bertenthal, B.I., Boker, S.M., & Rose, J.L. (1995). Dynamical analyses of postural development. Journal of Sport and Exercise Psychology, 17, 8.
Bertenthal, B.I., & Clifton, R.K. (1998). Perception and action. In W. Damon (Ed.), Handbook of child psychology: Vol. 2. Cognition, perception and language (pp. 51–102). New York: Wiley.
Bertenthal, B.I., & Campos, J.J. (1990). A systems approach to the organizing effects of self-produced locomotion during infancy. In C. Rovee-Collier & L.P. Lipsitt (Eds.), Advances in infancy research (Vol. 6, pp. 1–60). Norwood, NJ: Ablex.
Bertenthal, B.I., Campos, J.J., & Barrett, K.C. (1984). Self-produced locomotion: An organizer of emotional, cognitive, and social development in infancy. In R. Emde & R. Harmon (Eds.), Continuities and discontinuities in development (pp. 175–210). New York: Plenum.
Berthier, N.E. (1996). Learning to reach: A mathematical model. Developmental Psychology, 32, 811–823.
Breniere, Y., Bril, B., & Fontaine, R. (1989). Analysis of the transitions from upright stance to steady state locomotion in children with under 200 days of autonomous walking. Journal of Motor Behavior, 21, 20–37.
Bril, B., & Breniere, Y. (1992). Postural requirements and progression velocity in young walkers. Journal of Motor Behavior, 24, 105–116.
Bril, B., & Sabatier, C. (1986). The cultural context of motor development: Postural manipulations in the daily life of Bambara babies (Mali). International Journal of Behavioral Development, 9, 439–453.
Bruner, J.S. (1973). Organization of early skilled action. Child Development, 44, 1–11.
Burnside, L.H. (1927). Coordination in the locomotion of infants. Genetic Psychology Monographs, 2, 279–372.
Bushnell, E.W., & Boudreau, J.P. (1993). Motor development and the mind: The potential role of motor abilities as a determinant of aspects of perceptual development. Child Development, 64, 1005–1021.
Butterworth, G., & Hicks, L. (1977). Visual proprioception and postural stability in infancy: A developmental study. Perception, 6, 255–262.
Campos, J.J., Anderson, D.I., Barbu-Roth, M.A., Hubbard, E.M., Hertenstein, M.J., & Witherington, D. (2000). Travel broadens the mind. Infancy, 1, 149–219.
Clark, J.E., & Phillips, S.J. (1993). A longitudinal study of interlimb coordination in the first year of independent walking: A dynamical systems analysis. Child Development, 64, 1143–1157.
Clifton, R.K., Muir, D., Ashmead, D.H., & Clarkson, M.G. (1993). Is visually guided reaching in early infancy a myth? Child Development, 64, 1099–1110.
Coghill, G.E. (1969). Anatomy and the problem of behavior. New York: Macmillan. (Original work published 1929)
Darwin, C. (1877). Biographical sketch of an infant. Mind, 2, 285–294.
Edelman, G.M. (1987). Neural Darwinism: The theory of neuronal group selection. New York: Basic Books.
Edelman, G.M. (1988). Topobiology: An introduction to molecular embryology. New York: Basic Books.
Elman, J.L., Bates, E.A., Johnson, M.H., Karmiloff-Smith, A., Parisi, D., & Plunkett, K. (1996). Rethinking innateness: A connectionist perspective on development. Cambridge, MA: MIT Press.
Fitzpatrick, P., Schmidt, R.C., & Lockman, J.J. (1996). Dynamical patterns in the development of clapping. Child Development, 67, 2691–2708.
Freedland, R.L., & Bertenthal, B.I. (1992). Kinematic analyses of the development of creeping in human infants. Infant Behavior and Development, 15, 300.
Gesell, A. (1928). Infancy and human growth. New York: Macmillan.
Gesell, A. (1933). Maturation and the patterning of behavior. In C. Murchison (Ed.), A handbook of child psychology (2nd rev. ed., pp. 209–235). Worcester, MA: Clark University Press.
Gesell, A. (1940). The first five years of life. New York: Harper.
Gesell, A. (1946). The ontogenesis of infant behavior. In L. Carmichael (Ed.), Manual of child psychology (pp. 295–331). New York: Wiley.
Gesell, A. (1948). Studies in child development. Westport, CT: Greenwood Press.
Gesell, A., & Thompson, H. (1929). Learning and growth in identical infant twins: An experimental study by the method of co-twin control. Genetic Psychology Monographs, 6, 1–124.
Gesell, A., & Thompson, H. (1934). Infant behavior: Its genesis and growth. New York: McGraw-Hill.
Gesell, A., & Thompson, H. (1938). The psychology of early growth including norms of behavior and a method of genetic analysis. New York: Macmillan.
Gibson, E.J. (1969). Principles of perceptual learning and development. Englewood Cliffs, NJ: Prentice-Hall.
Gibson, E.J. (1988). Exploratory behavior in the development of perceiving, acting and the acquiring of knowledge. Annual Review of Psychology, 39, 1–41.
Gibson, E.J., Riccio, G., Schmuckler, M.A., Stoffregen, T.A., Rosenberg, D., & Taormina, J. (1987). Detection of the traversability of surfaces by crawling and walking infants. Journal of Experimental Psychology: Human Perception and Performance, 13, 533–544.
Gibson, J.J. (1966). The senses considered as perceptual systems. Boston, MA: Houghton Mifflin.
Goldfield, E.C. (1995). Emergent forms: Origins and early development of human action and perception. New York: Oxford University Press.
Goldfield, E.C., Kay, B.A., & Warren, W.H. (1993). Infant bouncing: The assembly and tuning of action systems. Child Development, 64, 1128–1142.
Halverson, H.M. (1931). An experimental study of prehension in infants by means of systematic cinema records. Genetic Psychology Monographs, 10, 107–286.
Iverson, J.M., & Thelen, E. (1999). Hand, mouth, and brain: The dynamic emergence of speech and gesture. Journal of Consciousness Studies, 6, 19–40.
Jensen, J.L., Ulrich, B.D., Thelen, E., Schneider, K., & Zernicke, R.F. (1994). Adaptive dynamics of the leg movement patterns of human infants: I. The effects of posture on spontaneous kicking. Journal of Motor Behavior, 26, 303–312.
Kelso, J.A.S. (1995). Dynamic patterns: The self-organization of brain and behavior. Cambridge, MA: MIT Press.
Kelso, J.A.S., Holt, K.G., Kugler, P.N., & Turvey, M.T. (1980). On the concept of coordinative structures as dissipative structures: II. Empirical lines of convergence. In G. Stelmach & J. Requin (Eds.), Tutorials in motor behavior (pp. 49–70). Amsterdam: North-Holland.
Konczak, J., Borutta, M., Topka, H., & Dichgans, J. (1995). The development of goal-directed reaching in infants: Hand trajectory formation and joint torque control. Experimental Brain Research, 106, 156–168.
Konczak, J., Borutta, M., & Dichgans, J. (1997). The development of goal-directed reaching in infants: II. Learning to produce task-adequate patterns of joint torque. Experimental Brain Research, 113, 465–474.
Kugler, P.N., Kelso, J.A.S., & Turvey, M.T. (1980). On the concept of coordinative structures as dissipative structures: I. Theoretical lines of convergence. In G. Stelmach & J. Requin (Eds.), Tutorials in motor behavior (pp. 1–47). Amsterdam: North-Holland.
Kugler, P.N., & Turvey, M.T. (1987). Information, natural law, and the self-assembly of rhythmic movement. Hillsdale, NJ: Erlbaum.
Lee, D.N., & Aronson, E. (1974). Visual proprioceptive control of standing in human infants. Perception and Psychophysics, 15, 529–532.
McCarty, M.E., Clifton, R.K., & Collard, R.R. (1999). Problem solving in infancy: The emergence of an action plan. Developmental Psychology, 35, 1091–1101.
McGraw, M.B. (1935). Growth: A study of Johnny and Jimmy. New York: Appleton-Century-Crofts.
McGraw, M.B. (1945). The neuromuscular maturation of the human infant. New York: Hafner. (Reprinted, 1972.)
McGraw, M.B. (1946). Maturation of behavior. In L. Carmichael (Ed.), Manual of child psychology (pp. 332–369). New York: Wiley.
Merzenich, M.M., Allard, T.T., & Jenkins, W.M. (1990). Neural ontogeny of higher brain functions: Implications of some recent neurological findings. In O. Franzen & P. Westman (Eds.), Information processing in the somatosensory system (pp. 293–311). London: Macmillan.
Nelson, C. (Ed.) (in press). Handbook of developmental cognitive neuroscience. Cambridge, MA: MIT Press.
Newell, K.M., & Molenaar, P.C.M. (Eds.) (1998). Applications of nonlinear dynamics to developmental process modeling. Mahwah, NJ: Erlbaum.
Newell, K.M., Scully, D.M., McDonald, P.V., & Baillargeon, R. (1989). Task constraints and infant grip configurations. Developmental Psychobiology, 22, 817–832.
Pfeifer, R., & Scheier, C. (1999). Understanding intelligence. Cambridge, MA: MIT Press.
Piaget, J. (1952). The origins of intelligence in children. New York: International Universities Press.
Preyer, W. (1888). The mind of the child: Part I. The senses and the will (H.W. Brown, Trans.). New York: Appleton.
Robertson, S.S. (1993). Oscillation and complexity in early infant behavior. Child Development, 64, 1022–1035.
Robertson, S.S., Cohen, A.H., & Mayer-Kress, G. (1993). Behavioral chaos: Beyond the metaphor. In L. Smith & E. Thelen (Eds.), A dynamic systems approach to development (pp. 119–150). Cambridge, MA: MIT Press.
Rochat, P., & Morgan, R. (1995). Spatial determinants in the perception of self-produced leg movements by 3- to 5-month-old infants. Developmental Psychology, 31, 626–636.
Savelsbergh, G.J.P., & Kamp, J.v.d. (1993). The coordination of infants’ reaching, grasping, catching and posture: A natural physical approach. In G.J.P. Savelsbergh (Ed.), The development of coordination in infancy (pp. 289–317). Amsterdam: Elsevier.
Schneider, K., Zernicke, R., Ulrich, B., Jensen, J., & Thelen, E. (1990). Understanding movement control in infants through the analysis of limb intersegmental dynamics. Journal of Motor Behavior, 22, 493–520.
Shadmehr, R., & Mussa-Ivaldi, F.A. (1994). Adaptive representation of dynamics during learning of a motor task. Journal of Neuroscience, 14, 3208–3224.
Shinn, M.W. (1900). The biography of a baby. New York: Houghton Mifflin.
Shirley, M.M. (1931). The first two years, a study of twenty-five babies: I. Postural and locomotor development. Minneapolis, MN: University of Minnesota Press.
Smith, L.B., Thelen, E., Titzer, R., & McLin, D. (1999). Knowing in the context of acting: The task dynamics of the A-not-B error. Psychological Review, 106, 235–260.
Spencer, J.P., & Thelen, E. (2000). Spatially specific changes in infants’ muscle co-activity as they learn to reach. Infancy, 1, 275–302.
Spencer, J.P., Vereijken, B., Diedrich, F.J., & Thelen, E. (2000). Posture and the emergence of manual skills. Developmental Science, 3, 216–233.
Sporns, O., & Edelman, G.M. (1993). Solving Bernstein’s problem: A proposal for the development of coordinated movement by selection. Child Development, 64, 960–981.
Stoffregen, T.A., Schmuckler, M.A., & Gibson, E.J. (1987). Use of central and peripheral optical flow in stance and locomotion in young walkers. Perception, 16, 113–119.
Streri, A., & Pêcheux, M. (1986). Vision-to-touch and touch-to-vision transfer of form in 5-month-old infants. British Journal of Developmental Psychology, 4, 161–167.
Sutherland, D.H. (1984). Gait disorders in childhood and adolescence. Baltimore, MD: Williams & Wilkins.
Taga, G. (1995). A model of the neuro-muscular-skeletal system for human locomotion: I. Emergence of basic gait. Biological Cybernetics, 73, 97–111.
Thelen, E. (1984). Learning to walk: Ecological demands and phylogenetic constraints. In L.P. Lipsitt (Ed.), Advances in infancy research (Vol. II, pp. 213–250). Norwood, NJ: Ablex.
Thelen, E. (1995). Motor development: A new synthesis. American Psychologist, 50, 79–95.
Thelen, E. (2000). Grounded in the world: Developmental origins of the embodied mind. Infancy, 1, 3–28.
Thelen, E., & Adolph, K.E. (1992). Arnold L. Gesell: The paradox of nature and nurture. Developmental Psychology, 28, 368–380.
Thelen, E., Corbetta, D., Kamm, K., Spencer, J.P., Schneider, K., & Zernicke, R.F. (1993). The transition to reaching: Mapping intention and intrinsic dynamics. Child Development, 64, 1058–1098.
Thelen, E., Corbetta, D., & Spencer, J.P. (1996). The development of reaching during the first year: The role of movement speed. Journal of Experimental Psychology: Human Perception and Performance, 22, 1059–1076.
Thelen, E., & Fisher, D.M. (1982). Newborn stepping: An explanation for a “disappearing reflex”. Developmental Psychology, 18, 760–775.
Thelen, E., & Fisher, D.M. (1983). The organization of spontaneous leg movements in newborn infants. Journal of Motor Behavior, 15, 353–377.
Thelen, E., Fisher, D.M., & Ridley-Johnson, R. (1984). The relationship between physical growth and a newborn reflex. Infant Behavior and Development, 7, 479–493.
Thelen, E., Schöner, G., Scheier, C., & Smith, L.B. (in press). The dynamics of embodiment: A field theory of infant perseverative reaching. Behavioral and Brain Sciences.
Thelen, E., Skala, K., & Kelso, J.A.S. (1987). The dynamic nature of early coordination: Evidence from bilateral leg movements in young infants. Developmental Psychology, 23, 179–186.
Thelen, E., & Smith, L.B. (1994). A dynamic systems approach to the development of cognition and action. Cambridge, MA: MIT Press.
Thelen, E., & Ulrich, B. (1991). Hidden skills: A dynamic systems analysis of treadmill stepping during the first year. Monographs of the Society for Research in Child Development, 56 (1, Serial No. 223).
von Hofsten, C. (1979). Development of visually guided reaching: The approach phase. Journal of Human Movement Studies, 5, 160–178.
von Hofsten, C. (1980). Predictive reaching for moving objects by human infants. Journal of Experimental Child Psychology, 30, 369–382.
von Hofsten, C. (1982). Eye-hand coordination in the newborn. Developmental Psychology, 18, 450–461.
von Hofsten, C. (1984). Developmental changes in the organization of prereaching movements. Developmental Psychology, 20, 378–388.
von Hofsten, C. (1991). Structuring of early reaching movements: A longitudinal study. Journal of Motor Behavior, 23, 280–292.
von Hofsten, C., & Ronnqvist, L. (1988). Preparation for grasping an object: A developmental study. Journal of Experimental Psychology: Human Perception and Performance, 14, 610–621.
Vygotsky, L.S. (1978). Mind in society: The development of higher psychological processes. Cambridge, MA: Harvard University Press.
White, B., Castle, P., & Held, R. (1964). Observations on the development of visually-directed reaching. Child Development, 35, 349–364.
Woollacott, M., & Sveistrup, H. (1992). Changes in the sequencing and timing of muscle response coordination associated with developmental transitions in balance abilities. Human Movement Science, 11, 23–36.
20 Physical Growth Kai Jensen
Physical growth is a biological process which involves rates, directions, and patterns of change and development affected by a variety of diverse and complex external and internal factors and causes. It encompasses a diversity of detectable and measurable changes in size, shape, or function occurring in living organisms with the passage of time. A multiplicity of scientific disciplines study physical growth from a variety of angles at many different levels with increasingly refined and ingenious methods and technics. Genetic origins and backgrounds; reproduction; cell multiplication; protein synthesis; the role of chemical excitors and inhibitors; cell migration; prenatal development; birth phenomena; developmental history of special tissues, organs, and intact organisms; increases in body measurements and changes in shape; comparative growth of groups; interindividual and intra-individual growth; and environmental conditioners and impacts have all proved intriguing and often rewarding fields for study. Because of the many problems involved, the great variety of disciplines interested, the high stakes, and the many levels and avenues of approach, the literature is vast – far too great to be adequately covered in either the time available or the space at hand. What follows is at best only representative of the work being done in this area.
Source: Review of Educational Research, XXV(5) (1955): 369–414.
General Publications and Reviews of the Literature
A recent symposium on the dynamics of growth processes (35) covered such topics as chemical growth in animals, the relation between skeletal
“status” based on carpal X rays and bodily maturity, growth curves, and hereditary mechanisms in animal growth. Several general books in the field of human development which contain special sections on physical growth (39, 49, 61, 67, 114, 157, 158, 160, 167, 269, 311, 343) recently were written or revised. The most recent edition of Holt and McIntosh’s Pediatrics (150) organized several chapters around body organs and in an appendix included growth tables, norms for body measurements of boys and girls, and nomograms for estimating surface area of infants and children. A recent volume by Dunham (87) dealt entirely with premature infants. Gestation was the subject of a conference (99) held at Princeton, New Jersey, in March 1954. Baker (18), Cruickshank (65), Heck (139), and Jenks (166) published volumes on the exceptional child which include many data on the physical growth of various kinds of atypical children. Data on the functioning of the endocrine-controlled systems which maintain homeostasis during the course of various diseases will be found in a volume on functional endocrinology from birth through adolescence (306). This work also contains tables and nomographs for estimating surface area from body weight and height, and other data of value to those interested in growth and development. The U.S. Department of Health, Education, and Welfare continued to issue bulletins on research in human development currently under way (320, 321, 322, 323). Research listed under the headings “Pregnancy and the Newborn,” “Growth and Development (Physical and Motor Development),” and “Physical Health and Disease” will be of particular value for those interested in physical growth. Sontag and Garn (289), in a review of growth phenomena restricted to the field of human growth, discussed the various factors which may modify the genetically determined potentiality for normal growth.
Among these factors were defective implantation, deficiencies in maternal diet, infections, intoxications, endocrine disturbances, trauma, penetrating radiations, and fetal anoxia. Soffer and Gabrilove’s review (288) of the literature on endocrinology for the year 1954 included 433 references. Two hundred and eighty titles dealing with primary organizer phenomena, gastrular metabolism, protein and amino acids, the serological approach, enzymes and adaptive enzymes, nuclear embryology, experiments with isotopes, affinities and disaggregations, developmental block, and dissection of morphogenesis by metabolic processes were reviewed under the title, “Developmental Physiology,” by Needham (236). Much pertinent material was reviewed in the annual reviews of physiology (127, 128, 129), medicine (68, 69, 70), and biochemistry (206, 207, 208), and the Year Book of Endocrinology (117). Harris (136) discussed the need for an interdisciplinary society for research in child development.
Jensen
Physical Growth
Antecedents and Origins of Growth
The biogenetic origin of growth and the complicated environmental circumstances modifying development have challenged and intrigued many observers. Growth may be patterned, ordered, and integrated, or it may be unintegrated, invasive, or destructive. Changes occur with the passage of time, but these time relationships may be altered. Speeding of development may occur to the point where the generation time span is markedly shortened. Slowing may occur to the point where immortality, in a particular tissue at least, is achieved. Reversal occurs where or when rejuvenation is demonstrated. Just when has the biogenetic die, which determines the future of a given individual, or individuals in the case of monovular origin, been cast? What factors influence this favorably or unfavorably? To what extent may germ cells be evaluated and selected in advance of conception? How is the progressively changing organization of the fertilized germ cell or reproductive unit controlled? By what biochemical processes are differentiation and orderly growth assured? Spratt (291) reviewed the literature dealing with such fundamental problems as the formation and organization of the ovarian egg; activation of the developmental system; progressive changes in organizations; changes in metabolism during development, regulation, and regeneration; and control of development. Marshall (222) devoted an entire volume to the physiology of reproduction, and Nelson (239) reviewed 223 titles dealing with the subject. Mammalian Germ Cells (338) dealt with a variety of factors influencing fertilization, embryonic death, ovulation, recovery and transplantation of ova, the preservation of spermatozoa, warmth-induced aspermia, semen quality, factors influencing longevity of sperm in vitro, human spermatozoan production in health and disease, and proteolytic enzymes in human semen. Moricard (231) reported on fertilization studies of mammalian ova in vivo and in vitro.
Chang (54) concluded that the presence of large numbers of spermatozoa in the female tract was not as important as their physiological integrity, which is contrary to common belief. Doyle (83) used the culdoscope to observe and describe ovulation in women and also reported successful conception in 14 out of 17 infertile women, considered to have tubospasm, following denervation of the tubes and uterus. Asdell (13) studied the effect of several methods of controlling ovulation time upon the fertility of the mammalian egg. Casida’s approach (50) to the study of infertility consisted of attempting to induce fertility in animals representing different developmental or functional states, or different genetic or environmental backgrounds, and then analyzing the results in terms of both fertility failure and embryonic death. He tried to separate environmental causes from inherent defects in the gamete. Terner (309) developed a test for grading semen of good motility of otherwise normal healthy bulls. Factors influencing the longevity of bull sperm
cells in vitro were studied by Kok (180). MacLeod (215) reported that he and his co-workers were unable to find any relationship whatsoever between abnormal sperm morphology and fetal abnormalities or failure of conception in humans. Bunge and Sherman (46) reported that it has been shown repeatedly that preserved sperm can fertilize ova in the case of the bull and that recently the same was shown for man. Krohn and Zuckerman (182) reviewed the detailed observations that have been made on living gametes before, during, and after fertilization by means of phase-contrast microscopy. They also reported that there were differences in the viability of male and female zygotes, and that the differences probably did not remain constant thruout pregnancy. Wolff and others (337) reported the first successful steps in cultivating embryonic tissues in synthetic mediums (total nitrogen).
Fate of Transplants
Autogenous grafts have been made successfully for a long time whereas successful homogenous grafts have proved exceedingly difficult to secure. Some autogenous grafts are impossible to get (for example, corneas), and the real future of grafts lies in the development of technics and procedures which will insure the successful and widespread use of homogenous grafts. Man rarely meets his end as did the one-horse shay, and so the rejuvenation of tissues, and perhaps even of individuals, can never be accomplished by autogenous grafts. This important problem of preservation and transplantation of normal tissues was the subject of a recently published volume (339). Medawar (225) expressed the conviction that the clinical problem of homografts is definitely soluble. He pointed out that gaps in the immunological defenses of the body that were unthought of even six years ago have recently been discovered. Deanesly (74) demonstrated that prepuberal male and female gonadal tissue may be frozen and later successfully used as homografts with subsequent production of ova and spermatozoa. Smith and Parkes (286) on the basis of their experimentation in storing and homografting endocrine tissues reported that such tissues seemed potentially immortal under the conditions which they described. Precocious sexual development as a result of the use of gonadotropic treatment has been reported for many animals. The resultant ova have been successfully fertilized, but such superovulation and superfecundation have led to very small litters or inability to maintain the pregnancy if normal numbers of ova were involved. This raised the question of whether or not such failures were due to any inherent weakness in the ova. Adams (6) answered this question in the negative by successfully recovering and transplanting fertilized ova from immature animals to mature “recipients” and having normal births ensue.
By this technic he also made available a new and brilliant method for markedly shortening the generation interval.
Prenatal Development
The functional role of the placenta, problems of sugar transport, and the comparative anatomy and histology of the placental barrier were dealt with in the transactions of the first conference on gestation, held in 1954 (99). In the second edition of Patten’s Human Embryology (258) even the descriptions of the earliest stages of development were based on human material. Blood pressure effects under various conditions of stress were recorded, samples of blood taken, and injections made in the same monkey fetus in utero in various stages of pregnancy (272). Abdominal and vaginal methods of demonstrating the electroencephalogram of the human fetus in utero were developed (30, 37). High voltage slow-wave activity with superimposed small fast waves was found in fetuses less than three months of age. By use of the electroencephalograph, fetal electrocardiograms were also successfully obtained in 50 cases (29). A three-channel tocograph, using fluid pressure and a metal bellows, was developed by Embrey (92) for use in the study of prenatal development. Adams (7) used intra-uterine roentgenography as an aid in determining fetal age by checking on the appearance of ossification in the distal epiphysis of the femur, which is usually present by the thirty-fourth to thirty-seventh week of pregnancy. Armitage (12) administered varying dosages of sodium pentobarbital and sodium barbital to rats during pregnancy and then studied the subsequent learning performance of the offspring. The prenatal administration of the barbiturates produced significant impairment of performance in comparison with control groups.
Special Factors Influencing Prenatal Growth and Development
Muller and Rugh stressed the possible damage to posterity that may result from irradiation of the gonads. Rugh (276) urged the use of extreme caution in the use of diagnostic and therapeutic radiation or even of extensive fluoroscopy in the pelvic region on the ground that such X-irradiation-induced changes at the gene or chromosome level are irrevocable, and dosages are cumulative without regard for time. Rugh did not predict that monstrosities would necessarily appear in the first generation, but rather argued that for a dose of 80 roentgens of exposure of gonadal tissue there will probably be a doubling of the normal rate of mutations, a majority of which will be harmful, and that such mutations are hereditary. Muller (234) contended that there is no dose so small as to produce no mutations at all and that even small mutational changes may eventually play a role in the extinction of the line of descent. Patt (257) reviewed 307 articles dealing with radiation effects on mammals. Greulich, Crismon, and Turner (120), Plummer (265), and Yamazaki, Wright, and Wright (341) studied the effects of the atomic bombing of Hiroshima and
Nagasaki on women and children. Greulich and his co-workers used skeletal ages assessed by X-ray photography and matched control groups. They concluded that growth and development were adversely affected by the atomic bombing and that some of the effects were still evident five years later. Plummer reported that 205 pregnant women who were exposed to the irradiation at Hiroshima produced 28 abnormal infants and that the number of malformed offspring increased with closeness to the hypocenter. Yamazaki, Wright, and Wright reported that the mean height and head circumference of children born to mothers with major signs of radiation, such as epilation, oropharyngeal lesions, purpura, or petechiae, were significantly smaller than in those children born to mothers in the control group. Among 30 such mothers there were seven fetal deaths, six neonatal and infant deaths, and four instances of mental retardation among 16 surviving children. The over-all morbidity and mortality was approximately 60 percent in this group of mothers as compared with 6 percent in a control group. Kaplan (173) and Rubin (275) reported studies involving third-generation follow-up in women receiving pelvic irradiation. Rubin reported that no acquired lethal effects on the genes were observed in hundreds of babies whose mothers were subjected to X-ray irradiation of the ovaries for menstrual dysfunction and sterility. He also reported no harmful effects in five grandchildren. Kaplan in a similar study irradiated the ovaries of 660 infertile women of whom at least 270 became pregnant and produced 347 children, all of whom were normal. Similar evaluation was made of 14 third-generation offspring in this group. Collins (62) raised a theoretical issue of general biological significance. She pointed to the fact that during periods of rapid somatic growth there is little development in integrative activity and conversely that integrative growth increases as somatic growth decreases. 
From these facts she hypothesized a reciprocal relationship between somatic or cellular growth and integrative or functional growth. She further suggested the existence of a closed system of available energies for which the somatic and integrative growth must compete. Dickerson (77) pointed out that studies of selection and of genetic correlation have both shown that genes influencing rate of growth in body weight had proportionately larger effects on fattening than on skeletal growth and least effect on appendages. Such genes also tended to improve viability, increase efficiency in food utilization, cause earlier sexual maturation, cause larger numbers of ovulations, and cause superior intra-uterine nutrition of the young, coupled with slightly inferior lactation.
Congenital Malformations and Experimental Teratology
In the recently published proceedings of the Association for the Aid of Crippled Children (151) the section on congenital malformations considered the background of this subject, the principle underlying experimentally induced
anomalies, cortisone-induced cleft palate in rats, the effect of amino acid analogues on development of the explanted chick embryo, clinical classifications of malformations during prenatal life, traumatic abortion and prenatal death of the embryo, and statistical approaches to the study of congenital malformations. Harris and Steinberg (137) reported and classified the abnormalities found during the first six days of life in a sample of 8716 live-born infants. Their classification was according to systems such as gastrointestinal, respiratory, nervous, genito-urinary, skeleto-muscular, blood and cardiovascular, and skin. Stodard (294) studied the relationships between time of insult and the type of malformation. He classified the environmental factors which may produce anomalies as nutritional, chemical, endocrine, radiation, infectious, and mechanical. Marberger and Nelson (220) used skin biopsies in cases of ovarian agenesis to distinguish chromosomal sex and found the majority of such individuals to be genetic males. Collins (63) reviewed the literature dealing with the incidence of congenital abnormalities following maternal rubella and found that it varied with the stage of pregnancy at which the rubella was contracted. If contracted during the first four months of the pregnancy, the chances of malformations in the offspring were from 70 to 80 percent. After the fourth month, chances of malformation of the fetus were not great. Several people reported that congenital malformations may occur as a result of defective nutrition during pregnancy. Hogan (149), Nelson, Lyons, and Evans (238), and Warkany (340) published comprehensive reviews of the literature on the effects of dietary deficiency in the production of congenital anomalies. Sontag and Garn (289) pointed out that dietary deficiencies experimentally induced during pregnancy have produced a wide range of developmental abnormalities in the offspring of laboratory animals.
Giroud (115) reviewed the relationship between avitaminosis and malformation. Cohlan (60) showed that excess vitamin A, as well as a deficiency, may produce congenital malformations. Excess vitamin A fed during the seventh to tenth day of gestation in the rat produced encephaly, eye malformations, cleft palate, shortening of the mandible and maxilla, spina bifida with meningocele, or hydrocephalus in 52 percent of the 148 offspring in the experimental group as contrasted with no congenital anomalies in 1201 control young. He concluded that the developing embryo requires nutrients at critical developmental stages for the orderly progress of normal fetal differentiation and that teratologic effects may be produced by either a deficiency or an excess of specific nutrients at critical developmental stages. Lepkovsky and Borson (194) concluded that in humans nutritional supplements should be provided during the early prenatal period because organogenesis is practically complete 10 weeks after conception. Hartman (138) reported on experiments furnishing evidence against the notion that all prenatal abnormalities are due to defective maternal environment. His technic consisted in evaluating the growth of companion eggs in
the same maternal (uterine) environment. He then attributed embryonic death to inherent lack of growth potential or specific defects in the germ plasm if along with the death the other cell developed into a normal live individual. Nelson (237) reported that the folic acid antagonist, X-methyl pteroylglutamic acid, when administered to pregnant rats on a folic acid-deficient diet, produced brain injuries including encephaly and many other congenital abnormalities. Several experimenters (97, 103) administered cortisone to pregnant mice and rabbits. Large doses caused resorption of litters and smaller doses produced a variety of congenital defects with cleft palate the most common visible gross defect. Genetic make-up of the animals influenced the results markedly as did the gestational stage of the mother. The effect of diverse chemicals on mammalian embryos was studied by Hamburgh (130). The teratologic effects of ionizing radiation have been verified by several workers. Brent (40) found that irradiating embryos in shielded mothers produced the same types of abnormalities produced by irradiating the mother. He found some abnormalities would occur only if the embryo was irradiated at certain stages of embryogenesis. Most malformations were found to have a wide range of susceptibility, particularly central nervous system malformations. Hicks (146), Hicks, O’Brien, and Newcomb (147), Russell and Russell (277), and Wilson (335) experimenting with animals found the teratogenic potentiality of ionizing radiation to be high when exposure occurred during the time of major organogenesis. They also found that in some stages of embryonic development a certain amount of damage repair occurred following small total radiation dosages. Levinson (195) studied the effects of differing amounts of X irradiation at each of five stages of prenatal development upon maze-learning performance in rats.
He found such learning decreased with increasing X irradiation and that the learning behavior was most impaired if X irradiation was administered on the thirteenth day of gestation. Blattner and his co-workers (31) studied developmental defects in the chick embryo following infection with Newcastle disease virus. They concluded that three key factors were involved: the site of the inoculation, the developmental stage of the embryo, and the amount of virus inoculated. Buck (44) reported on the exposure to virus diseases in early pregnancy and the resultant congenital malformations. Sontag and Garn (289) pointed out that some, if not all, viruses are able to cross the placental barrier with resultant embryological defects if the mother becomes infected during the first trimester, the duration and severity of the infection determining the nature and extent of the ensuing abnormalities. They also pointed out that a great many substances, including sex hormones and antibiotics, may pass the barrier and that the current widespread use of hormones and antibiotics to enhance the chances of a given conception
coming to full term may well result in a concomitant increase in the number of defective offspring. It will be noted that to date, experimental work in this area has produced abnormalities largely, if not entirely, on the debit side. It is certainly conceivable that circumstances exist which would produce deviations from “normal” on the positive side, for surely the present environment is not usually the ultimately optimal.
Vital Statistics
The United Nations published two volumes of accumulated data and interpretations in the field of mortality of young children (318, 319). These volumes included a mass of information from over 60 countries on all continents on birth rates; still-born; neonatal and infant death rates; and correlations of mortality rates with income, illiteracy, parity, nutritional levels, and other factors. In some countries 20 percent of all live-born children failed to survive the first five years. In 35 countries, where infant mortality rates were comparable, 10 had rates of 100 to 200 per 1000 live births, and in only five countries was the rate less than 50. Strom (295) reported that Sweden leads the world in low infant mortality, having a total infant mortality rate of about 20 per 1000. Bundesen (45) reported the results of a 14-year study based on 10,000 neonatal deaths. He made the interesting point that there was practically no reduction in the mortality of the first day of life over this period of time. Abnormal pulmonary ventilation caused 54.3 percent of the deaths, and birth injuries accounted for 18.2 percent of the total. The correlation with prematurity was found to be high for all causes of death. Cenci (53) studied infant mortality in the city of Castello for the years 1938 to 1951. He found that infant mortality rose in the war years but has since fallen altho there was no decrease in prematures or infants with congenital or neonatal disorders. Wegman (330) reported on birth rate, death rate, marriage license rate, and infant mortality rate for 1952 and 1953 in the United States. Douglas (81) published data on trends in the risks of childbearing and in the mortalities of infants during the last 30 years in Scotland. Landucci (186) found that the birth rate in the province of Siena gradually dropped during the years 1946 to 1951 while the mortality among children under one year of age dropped from 68 to 49 per 1000 live births.
Deaths in infants under one year of age accounted for 92.8 percent of the deaths of all children under 12 years of age during the year 1951. Both birth and death rates in Italy as a whole were higher than in Siena. Kendall and Rose (175) suggested an organization and procedure for studying neonatal mortality. Silverman, Fertig, and Kraus (283) proposed a method of computing standardized death rates for premature nurseries, which makes
due allowance for distribution differences in the structure of premature nursery populations. Essentially they proposed that the statistical technics of standardization which are used in the study of mortality rates of other population groups be applied to the problem of hospital nursery rates. Age-specific death rates for infants of differing birth weight, gestational age, sex, color, place of birth, and similar data obtained from the pooled experiences of many hospital nurseries were suggested as a basis for a set of standard rates. Maternal and perinatal mortality in New York City decreased from 1944 to 1951 altho there was no change in the incidence or weight distribution of prematurity (326). Lewis (196) analyzed data from U. S. Life Tables of the Bureau of the Census on the mortality of white males and females in the different eras of 1890, 1900, 1910, 1920, 1930, 1940, and 1949. He reported continuous improvement in survival in younger and older age groups with a significant decrease in the older age groups in 1940 and in 1949, except for white males aged 50 to 65 years. White females manifested a significantly higher survival rate at all ages thruout the period from 1890 to 1949, the difference being a major one after the age of 50. Wegman (331), after studying vital statistics for the United States, concluded that at a given weight Negro babies are more mature than white, children born to parents of lower socioeconomic status more mature than children from higher levels, and plural births more mature than single births. He also concluded that Negro babies of a given weight have a distinctly greater chance of survival than white babies of the same weight and that, as infant mortality rates continue to decline, the relative importance of the newborn period and particularly the problem of prematurity and postmaturity become greater. 
Wilson (336) stressed the growing importance of neoplasms as a cause of childhood mortality, pointing out that in all age groups from 3 to 10 years, tumors (including leukemia) are one of the first four principal causes of death, tumors of the central nervous system being most common.
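The standardization idea attributed above to Silverman, Fertig, and Kraus (283) can be illustrated with a small sketch. It assumes the usual indirect form of standardization: a set of standard group-specific death rates is applied to a nursery's own admissions to obtain the deaths that would be expected, and the observed deaths are then compared with that figure. The birth-weight groups, rates, admissions, and deaths below are hypothetical and are not taken from the study, and the function names are illustrative only.

```python
def expected_deaths(admissions, standard_rates):
    """Deaths expected if every birth-weight group died at the standard rate."""
    return sum(admissions[group] * standard_rates[group] for group in admissions)


def mortality_ratio(observed_deaths, admissions, standard_rates):
    """Observed deaths divided by the deaths expected under the standard rates."""
    return observed_deaths / expected_deaths(admissions, standard_rates)


# Hypothetical standard death rates by birth-weight group (pooled nurseries).
STANDARD_RATES = {"under 1500 g": 0.50, "1500-1999 g": 0.20, "2000-2499 g": 0.05}

# Hypothetical admissions and observed deaths for two nurseries.
NURSERY_A = {"under 1500 g": 40, "1500-1999 g": 60, "2000-2499 g": 100}
NURSERY_B = {"under 1500 g": 10, "1500-1999 g": 40, "2000-2499 g": 150}
DEATHS_A, DEATHS_B = 37, 25

if __name__ == "__main__":
    for name, admissions, deaths in (("A", NURSERY_A, DEATHS_A), ("B", NURSERY_B, DEATHS_B)):
        crude = deaths / sum(admissions.values())
        ratio = mortality_ratio(deaths, admissions, STANDARD_RATES)
        print(f"Nursery {name}: crude rate {crude:.3f}, observed/expected {ratio:.2f}")
```

In this made-up case nursery A has the higher crude death rate (0.185 against 0.125), yet its deaths equal the number expected from its heavier load of very small infants, whereas nursery B records about 1.22 times its expected deaths. The crude rates alone would have ranked the two nurseries the other way.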
Multiple Births
Guttmacher (121) reviewed the literature on the incidence of multiple births in man and some of the multipara with special reference to the variation in the frequency of twin births in various ethnic and social groups. Relative frequency of twin births for 12 countries with available data placed Norway highest with 14.5 per 1000 total births and Japan lowest with 6.5 per 1000 total births. Nichols (241, 242) combed the literature for accounts of quintuplet and sextuplet births in the United States. He judged most of the reported cases as not well authenticated. Miettinen (229) reported that 633 sets of triplets and 20 sets of quadruplets were born in Finland from 1905 to 1952. Digby (78) described what he believed to be the first case of quintuplet pregnancy in the British Isles.
Lilienfeld and Pasamanick (199) studied the relationship between twin births and race and socioeconomic status. They found an increased frequency of both mono- and dizygotic twin births in the nonwhite population as compared with the white population after making necessary adjustments for economic status, birth order, and maternal age. McArthur (209) reported a tendency for monovular twinning to increase with maternal age and parity in young mothers in Italy from 1949 to 1950. Maternal age played a larger role than parity. Binovular twinning increased to a maximum at ages 35 to 39 years, the frequency increasing after birth ranks one and two. Binovular twinning was more closely related to modal maternal age than modal birth rank. Karn (174) found that mean birth weight of twins increased with the mother’s age up to the fourth birth and that there was an almost linear relationship between birth weight and gestation time. Low birth weight, first birth, and low age of mother all increased mortality. Friedman (105) pointed out that the possibility of solving the problem of the etiology of Mongolism would be greatly aided by cases of Mongolism in one or both twins provided data were available with respect to their zygosity.
Multiparity
Miller and Oxorn both reported studies of multiparity. Miller (230) studied 563 grand multiparas in New Mexico and found that the incidence of prematurity was 8.8 percent and that it increased with the age of the mother. Oxorn (253) found 1056 grand multiparas (seven or more children) in an examination of 63,140 confinements from 1926 to 1952 in Montreal. Eighty percent of the mothers were over 35 years of age and 25 percent over 40. He compared the more common and serious complications in the grand multipara with those in women who had given birth to fewer than eight children.
Prematurity and Postmaturity
Such aspects of prematurity as the general background of the problem, the respiratory exchange, immunology, hematology, development of enzyme systems, endocrinology, and the prevention of premature births were discussed at a conference attended by 30 scientists from all over the country who met to exchange information, to discuss research methods, and to clarify common objectives in the pursuit of basic knowledge germane to prematurity and related areas (151). Another general and extensive source of information on prematurity is the second edition of Dunham’s Premature Infants (87). Clifford (59) summarized present obstetric and pediatric knowledge by which a further reduction in the neonatal mortality rate may be brought about. He agreed with Bloch and his co-workers (32) that the occurrence of premature birth is primarily a public health problem involving the nutritional
condition and the socioeconomic status of the mother. Less frequent causative factors were obstetric conditions, ethnic origin, and maternal illness. Taff and Wilbar (305) reported that the neonatal death rate for immature infants of all races was about 30 times that of mature infants in their Chinese, Japanese, and Filipino populations. Racial immaturity rates ranged from 6 percent for Japanese to 12.1 percent for Filipinos, with a 7.3 percent over-all average. Verrotti (324) analyzed the causes and frequency of prematurity which occurred during the years 1936 thru 1950 in Siena, Italy. The incidence of prematurity varied from 3.5 percent in the prewar years to 7.3 percent in the early postwar period and then dropped to 5.2 percent in 1950. During the war the rate was 5 percent. There was no increase in death rate during the period studied. Thirty-six percent of the children were born before the eighth month of pregnancy and 55 percent were born of uniparae. Houghton and Ross (153), who studied birth weights and prematurity rates in Southern Rhodesia, suggested that the much higher rate which they found among Africans might be due to poor diet and social conditions. Llewellyn-Jones (201) reported that, if the international standard of prematurity were used in the tropics, abnormally favorable mortality rates would ensue and the percentage of premature babies would be abnormally high. The problem of the long-term prognosis for prematurely born children was studied by Alm (9) and by Douglas and Mogford (82). Alm found the mortality rate considerably higher among prematures, especially those who were plural born. After this initial inequality, he found more brain-injury disorders in those surviving three years, altho there was no greater incidence of other disabling diseases. At 20 years the prematures were shorter and weighed less than the controls. In the matter of social adjustment he found no deficiency in the prematures at 20 as compared with normals.
Douglas and Mogford (82) followed a national sample of premature children from birth to four years of age. Thruout this period the children were compared with a closely matched group of controls. There was a tendency for the smallest premature children to be most successful in eliminating their handicaps, and by the age of four years 36 percent of the premature children equaled or surpassed their controls in weight and 44 percent did so in height. Children who by four years of age had eliminated their initial height and weight handicaps were found to have mothers who were as tall and as heavy as the mothers of the controls. Children still lagging behind at the age of four years had mothers who were significantly smaller and lighter than the mothers of the controls. If the children whose mothers were smaller were really just smaller babies, rather than “prematures,” this would be further evidence of the inadequacy of the generally used criterion of prematurity which is birth weight of less than 2500 grams. Longo and Vianello (202) studied 18 reflexes commonly used in clinical tests in 50 immature and premature newborn infants on the first, fifth,
fifteenth, thirtieth, forty-fifth, and sixtieth days of life. Reflexes of defense and medullary automatism were found to be more accentuated than in normal infants, but otherwise differences were slight. Postmaturity, which has an incidence of approximately 5 percent, ranks second only to prematurity as a cause of fetal and neonatal mortality. Like prematurity it is also arbitrarily defined, an infant delivered after a gestational period of 300 days being generally considered post-mature. In a sample of 2178 women included in a statistical study (57) where the incidence of neonatal death was approximately 15 percent, prematurity accounted for 36 percent of the perinatal deaths and postmaturity for 30 percent. Postmaturity seems to be a hazard unique to primigravida; in one sample 73 percent of the primigravida whose gestation period lasted at least 300 days did not become pregnant again during the following 10 years (58). An analysis of 4401 consecutive primigravid labors at term, of which 482 had a prolonged first stage (30 hours plus), showed that stillbirths and neonatal death rates increased with prolongation of labor and that each passing hour beyond 30 increased the number of disasters (251). Higgins (148) recorded a well-authenticated case of pregnancy which lasted for one year and 24 days. It ended in delivery of an anencephalic monster. Ley (197) discussed criteria of postmaturity in addition to the 300-day gestation requirement. He reported a definite relation between prolonged gestation and dry peeling skin in the newborn, excessively long fingernails, small amounts of subcutaneous fat, and increased length in relation to body weight.
Walker (325) reported that recent physiologic studies showed that the supply of oxygen in a clinically normal pregnancy fell gradually up to the fortieth week and thereafter declined rapidly leaving practically no reserve by the forty-third week, A review of 11,051 deliveries from 1948 to 1952 made it apparent that obstetric death was three times as high at 43 weeks as at 40, and that increased stillbirths accounted for the difference. In a study (266) of babies; weighing over nine pounds who were born during the years from 1949 to 1954, it was found that such large babies were two to three times commoner in white than in Negro patients. Eighty-seven percent of the mothers were multiparas and 20 percent grand multiparas. Nine percent of all mothers delivering infants weighing over 10 pounds were diabetic.
Neonatal and Early Postnatal Development
In a 1953 publication of the Association for the Aid of Crippled Children (151) the sections devoted to birth injury dealt with the clinical impact and pathology of late pregnancy hemorrhage, pathologic changes in the infant in anoxia, placental transfer of oxygen, fetal tolerance of anoxia, and transverse narrowing of the pelvis as a cause of dystocia.
Hewitt and Stewart (145) standardized the first-year weight records of 298 boys and 282 girls in the Oxford Child Health Survey for sex, parity, and parental size. Such standardization reduced the estimated weight differences between infant groups according to maternal efficiency, home amenities, social class, and sickness experience altho bottle-fed infants were still found to outweigh breast-fed infants at the age of one year. Anderson (10) in another British study found some evidence that birth weight and incidence of prematurity were related to social class. He also found that birth weight increased with parity and age of mother and that subsequent development was affected by birth weight as indicated by height and weight measurements at school entry. This problem was also studied by Herdan (144) who found that birth weight accounted for not more than 25 percent of the variations in weight found at three years of age. He found that the relationships between birth weight and subsequent weight decreased systematically with the age of the child. Schlesinger and Allaway (278) studied the combined effect of birth weight and length of gestation on neonatal mortality in a large number of single live births with over 20 weeks of gestation. They found longer gestation increased the chance of survival within each birth weight group of 2500 grams or less and that high birth weight had a similar effect within each gestation group of 36 weeks or less. Cawley and his co-workers (51) studied the relationship between weight and length at birth and at 3, 6, 9, 12, and 24 months, and the duration of gestation, birth rank, and maternal age in 334 boys and 307 girls. Duration of gestation correlated positively with both weight and length at birth, but the correlation decreased after birth. Similarly, positive correlations at birth between birth rank and length and weight were found to disappear rapidly after birth.
Maternal age had virtually no influence on height and weight at birth, but the correlations became slightly positive at two years. They (52) also reported the relationship between parental stature and birth weight. The influence of prenatal environment on the correlation between birth weight and parental height was studied by McKeown and Record (213). Humphreys (156) analyzed the maternal and fetal weight factors in normal pregnancy in 1000 mothers who delivered 1002 babies.
Mongolism (Congenital Acromicria)
Mongolism is the most common of the congenital growth deficiencies, occurring three to four times in every 1000 births, yet it is poorly understood (28). Ingalls (161) examined 27 references in a critical review of the problem. Lande-Champain (185) studied detailed case histories of 150 Mongoloids and concluded that advanced maternal age did not play a dominating role, “functional” age being more important. Actually she found that Mongoloids who were the first-born children of young mothers outnumbered the “menopausal”
babies. Young mothers who had a late menarche or menstrual irregularities before marriage seemed especially to run the risk of Mongolism in a first child. In such instances it was concluded that the Mongoloid did not mark the end but the beginning or regaining of fertility (in some older women). The Mongoloid develops at the border line between sterility and fertility as the result of fertilization of a subnormal ovum. Penrose (262) reviewed 35 references pertaining to the relationship of this growth abnormality to the mother’s age. Oster (250) analyzed the case records of 1000 Mongols traced from 1925 to 1949. Friedman (105) studied Mongolism in 42 twins of unlike sex and 71 twins of like sex. He also reviewed both germ cell theories and environmental theories pertaining to the disease. Benda and Mann (28) used radioactive iodine in a biochemical and biophysical investigation of Mongolism. They also compared 54 Mongoloids with 54 non-Mongoloid but defective children living in the same institution, in addition to comparing the Mongoloid group with 1039 normal subjects matched as to age and sex. They found Mongolism was not characterized by unusual serum lipid patterns. Simon and his associates (284) found no significant differences in the serum protein-bound iodine levels of Mongoloid children as compared with controls of the same age. They did find that mentally retarded children, including Mongols as a group, had significantly higher serum cholesterol levels than did normal children. The level of large molecule lipoproteins of the Sf 12–20 class provided the most marked differences among Mongoloids, normals, and control children. The Mongoloids had the highest level, the cases of undifferentiated mental deficiency an intermediate level, and the normals the lowest level.
Erythroblastosis Fetalis
A major problem in erythroblastosis caused by Rh incompatibility is stillbirth since about 20 percent of the fetuses in these cases die in utero (8). Workers on this problem found that high maternal titer and an unfavorable previous history influenced the outcome adversely and concluded that in these cases the only known method of reducing the frequency of stillbirth late in pregnancy was delivery before term. Neither cortisone nor other drugs had any value in preventing stillbirths due to Rh incompatibility. Day and Haines (73) studied the effects upon intelligence quotients of replacement transfusion in recovered cases of erythroblastosis fetalis. A later depression in mean IQ was found even when children with detectable nervous system defect (palsy and the like) were excluded from the population. Depression in IQ was found to be significantly related to the degree of jaundice but not to the degree of anemia. The children who were rated severely ill by both the criteria of severe anemia and severe jaundice averaged 23.1 points less in IQ than did their siblings.
Neonatal Anoxia
Of 85 children who were treated at a children’s hospital for asphyxia neonatorum, 18 died during the newborn period. Of the remainder, 19 showed signs of a permanent cerebral injury such as convulsions, mental retardation, and spasticity. A conspicuously bad prognosis was found for children who had convulsions, changed tonus, bulging fontanel, or reduced sucking capacity during the newborn period (141). Apgar and others (11) recorded objective data relating to neonatal anoxia in an unselected series of 404 infants and then followed their subsequent intellectual development. Using the microgasometric method of Roughton and Scholander to determine oxygen content, and the Gesell Developmental Schedule and the Revised Stanford-Binet Scale, they found no significant correlation between the levels of blood oxygen content measured in the first three hours after birth and intelligence in early childhood in the 275 cases who returned for psychological tests. Becker and Donnell (24) asphyxiated guinea pigs in utero after one animal in each litter had been delivered as a control. Some of the asphyxiated animals were released in time for spontaneous recovery while others were asphyxiated to the point where administration of a gas mixture was required for revival. Eight to 10 weeks later all animals were tested in problem and learning situations of two levels of difficulty. Learning and retention performance were inferior for the experimental as compared with the control animals. Ingalls and his co-workers (163) found that maternal environment and genetic constitution interacted during prenatal development. Using mouse embryos of different strains, they found strain-specific differences in susceptibility to hypoxia-induced anomalies of growth. Leedham (191) reported on glutamic acid in the treatment of mental deficiency. His study is a good presentation of the current status of the problem.
Using Form L of the Stanford-Binet, Seguin’s Form Board, and Crichton’s Vocabulary Scale on 12 matched pairs of children aged 4½ to 17½ years, ranging from 2 years 11 months to 7 years 7 months in mental age, he evaluated the effect of 10 grams of glutamic acid given daily for six months to one individual of each pair. The control received a similar quantity of saccharine lactate. No significant difference between the two groups was found. Earlier studies were reviewed and criticized.
Retrolental Fibroplasia (Retinopathy of Prematurity)
Retrolental fibroplasia, which was first described by Terry in 1942, is basically an exuberant retinal vascular overgrowth which is invariably bilateral and results in characteristic retrolental membranes, retinal detachments, and other gross ocular changes leading to impaired vision or total blindness. It is now the
commonest late complication of premature birth (143). It ranks first among the causes of blindness in children in the United States and in the case of premature infants is the foremost problem after death itself (189). Ten to 20 percent of babies weighing less than 3½ pounds at birth become blind within the first year after birth as a result of retrolental fibroplasia (89). Retrolental fibroplasia is a true developmental defect tho extra-uterine in origin. It is more hazardous the greater the degree of immaturity. Sontag and Garn (289) pointed out that the fact that retrolental fibroplasia is brought about by an environmental insult intended to support life (oxygen therapy) should lead to caution and careful evaluation of the use of hormonal therapy, vitamin therapy, antibiotics, and bacteriostatic agents, particularly in connection with the least developed newborn. There is certainly a serious possibility that such agents may artificially induce disturbances of growth in incompletely developed organisms. Several authors reviewed the literature dealing with retrolental fibroplasia (118, 142, 162, 189, 259, 260). Quite recently a symposium (252) on retrolental fibroplasia was held which dealt with the clinical course of the disease, the pathology involved, pediatric considerations, the etiology of retrolental fibroplasia, experimental studies in the field, and management of the disease. It was concluded that the incidence of retrolental fibroplasia is positively associated with the use of oxygen, that limiting the amount of oxygen used to that required for clinical emergency in prematures is without effect on the survival rate, and that much of the harmful effect appears to be associated with exposure to oxygen during the first 10 days of life. Retrolental fibroplasia is most likely to develop in infants beginning life with a birth weight of less than 1500 grams and almost never develops in those with a birth weight exceeding 2000 grams (289). 
Several workers produced or evaluated evidence pertaining to the rate of oxygen administration in the etiology of retrolental fibroplasia (1, 14, 15, 16, 17, 80, 89, 93, 113, 118, 122, 123, 124, 125, 155, 162, 177, 179, 187, 189, 219, 252, 259, 260, 342). There seems to be little question but that oxygen plays a major role tho the exact mechanism is not clear. Vascularization of the retina occurs during the fourth to eighth month of fetal life, and it has been suggested that the condition of the capillaries at birth, the rate of growth, and dietary components like electrolytes or tocopherol content may play auxiliary roles (118). Hepner (143) held that any factor which overstimulated vascular growth or which caused intra-ocular bleeding might produce or aggravate the acute vascular lesions of retrolental fibroplasia. Gyllensten and Hellstrom (124) contended that oxygen could not be the only cause of the disease and that it may act by way of an intermediate mechanism which can be provoked by other agents in exceptional cases. Variability in genetic components may also alter susceptibility (163). Eastman (89) pointed out that the normal fetus in utero exists in an environment in which the oxygen pressure is so low that a healthy adult could survive for only a few moments. He concluded that high and vacillating pressures
produced by oxygen therapy might readily have injurious effects in an organism which normally would not encounter such tensions until much further developed. The anoxia which adversely affects immature neural tissue may be produced by inadequate oxygenation of the blood or by exposure to high oxygen concentration followed by rapid withdrawal, an alternation which may result in irreparable damage before physiological acclimatization can take place. Jim and Krause (169) used electroencephalography in the study of retrolental fibroplasia, and Fortier (102) evaluated the therapeutic possibilities of carbon dioxide in the prevention of the disease.
Anthropometric Measurements
General discussions of growth were published by Ojemann (246), Washburn (327), and Weech (329). Concepts of growth and what they mean for the classroom teacher were dealt with by Blommers and others (33), Jackson and Kelly (164), Olson (247), Olson and Lewellen (248), and Themen (310). Krogman (181) published an article on physical development and growth in relation to student success which may be of especial interest to educators. Tuddenham and Snyder (314) recently published a study of the physical growth of California boys and girls from birth to 18 years. For use in classroom planning and equipment design, Martin (223), and Martin and Thieme (224) published 45 measurements on 3318 school-age children in Michigan. The 45 tables were arranged by age and school grade. Clements (55) analyzed growth data to determine the age of children when growth in stature closes. Eppright and Sidwell (94) published means and standard deviations of five body measurements for 1200 Iowa school children aged 6 to 18 years. Meredith and Meredith (228) published data on 16 traits of size and form of present-day white boys and girls attending public elementary schools in West-Central Oregon. Age changes in body form, sex differences in body form, and delineation of the body form of individuals were discussed. Abramson and Ernest (2) found that school boys in Stockholm had increased 15 centimeters in height and about 12 kilograms in weight during the past 70 years altho the increase of height of the adult population was scarcely half as great. A significant correlation between height and social class was reported. Gesell (114) reported a continuity of functional and physical development during the prenatal and postnatal growth periods up to the tenth year of life. He stressed the evolutionary development of eyes and hand-eye coordinations, basing his conclusions on longitudinal records of premature infants.
Acheson and Hewitt (4) compared the physical development in the English and in the American preschool child using data from the Oxford and the Brush Foundation Surveys. Low (205) published measurements, at various intervals, of 66 boys and 60 girls who were born between 1923 and 1927. The fact that no data were published on the social and economic background of
this population throws into relief the difficulty of maintaining constancy of methods in longitudinal studies over considerable periods of time or of providing duplication over an adequate period if advances in technics require it. Acheson and others (5) studied height, weight, and skeletal maturity in the first five years of life. Hammond (133) measured weight gain during the first year of life of a sample of 451 babies. Weights were measured at four-week intervals, and gain was found to be largely independent of birth weight. Boys gained about one pound more than girls and full-term babies less than prematures. Taller than average babies and those of leptosome type also gained more. Weight gain was independent of social class, was reduced by illness including colds and rashes, and decreased with increasing birth rank. Paiva (254) found that the pattern of growth of breast-fed infants did not differ significantly from that of bottle-fed infants from birth to six to seven months of age. Meredith (227) reviewed the literature on the comparative size and growth of North American Negro and white infants with respect to eight anthropometric measurements during the first postnatal year. Staton (292) reviewed 95 publications dealing with the physical growth and health of the adolescent under such headings as growth assessment and anthropometry, morphological and puberal development, nutritional status, physique and motor performance, physique and personality, physiological efficiency, and health. Gallagher and Gallagher (106) published tables relating height and weight values and increments to chronological and skeletal age in adolescents. They also discussed the psychological potentialities of wide divergences in size and growth from the point of view of the adolescent. Johnston (170) and Jones (171) also dealt with physical development during adolescence. 
Nicolson and Hanley (243) analyzed data on 180 boys and girls who were measured from the first to the eighteenth year of life. They were concerned with the derivation and interrelationships of indexes which could be used to assess progress along the hypothetical maturational continuum. They used factorial analysis of the intercorrelations among the various measures of maturation in their attack upon the question of generality in adolescent physical growth. They found a high degree of relationship among very different indexes of maturation.
Somatotyping
Sheldon’s most recent volume in the “Human Constitution” series (280) is largely a tabular and pictorial presentation of somatotype variations. Part I deals with the nature of somatotypes and general theoretical considerations. Part II contains front, side, and rear photographic views of 1175 men photographed in identical postures, age-height-weight tables, and accompanying
text arranged according to the Sheldonian somatotypic scale. This section was based on studies of 46,000 American males between the ages of 18 and 65. Equipment and procedures in somatotyping are also described. Germain, Browne, and Bellows (112) described several physical profiling systems. Lindegård (200) published a new method of describing individual body build. He used four objectively determined variables, length, sturdiness, muscle, and fat factor, to express both outer body configuration and structure. He used both somatometric and X-ray cephalometric procedures in measuring his factors. This report analyzed 114 references. Hammond (132) compared physique and development of boys and girls in independent and council schools in England. Weights, heights, and other body measurements for boys and girls from 5 to 18 years of age in independent schools (Group A) were compared with pupils up to 14 or 15 years of age in council schools in the best (Group B) and worst areas (Group C) of several industrial towns. Group A was about two years’ growth ahead of Group C in height and most length measurements. Group A was also about two years ahead in “shape,” but for girth measurements the differences were much less. Differences in rate of development rather than specifically social differences were suggested to account for the different mature physiques of the social groups. Hammond (131) in another study used Burt’s multiple factor analysis technic to distinguish three body types in a population of 2967 British and American school children between the ages of 5 and 18. Remeasurements after three years indicated constancy of body type to be very high.
A study (95) of the relationship between levels of nutrient intake of Iowa school children and physical and biochemical measurements showed children on diets which conformed fully to the recommended allowances of the National Research Council to be slightly taller, heavier, and larger in leg girth than the children on diets at the other extreme which averaged below the allowances. No significant differences in hemoglobin or serum alkaline phosphatase concentrations were reported. The mean serum concentration of ascorbic acid and carotenoids reflected the intakes of these two substances by the two groups of children. The differences in nutritional status of children noted in this study were small, but the significance of these differences to the long-time health and well-being of children who are on diets which are suboptimal in nutrient content should be assessed thru longitudinal studies. Bayley (23) studied parent-child similarities in height and weight for children in the Berkeley Growth Study whose records were complete thru 21 years. She concluded that there must be a core of parent-child similarities to account for the increasing similarities which occur up thru the teens even tho such similarities may not have been present during the first year or two of life. Lorr and Fields (203) published a factorial analysis of the 15 “purest” body types found in a group of 90 psychotic males. They reported the existence of three distinguishable groups of morphological trait patterns that closely resembled the patterns descriptive of Sheldon’s components. They also
concluded that the 76 somatotypes identified by Sheldon could more simply and economically be defined in terms of measurements on only two type factors. Three new methods for calculating the surface area of the body were described by Schmitz (279). Trotter and Gleser (313) described a differential procedure for estimation of stature from the long bones of American whites and Negroes. Tanner (308) studied the reliability of anthropometric estimates of somatotype both in the same observer at different times and between different observers. Observers agreed within half a rating on a 7-point scale for 90 percent of the cases; mesomorphy proved the hardest component to rate, and ectomorphy the easiest. Changes in body build and form with age were pictorially and graphically presented by Boyd (38) both in terms of changes in proportion due to different rates of growth of the parts of the body and in comparison with the proportion and rates of growth of others. Dupertuis and Michael (88) made a comparative study of the physical growth of 26 ectomorphs and 28 mesomorphs who were somatotyped at the age of 21. The ectomorphs averaged 2 3⁄16 inches taller and 26 7⁄10 pounds lighter in weight than the mesomorphs. The mesomorphs outweighed the ectomorphs at each age from 2 to 17 years while the ectomorphs were taller from 4 to 17 years of age. The height of the puberal spurt was reached one year earlier by the mesomorphs while the ectomorphs grew in height over a longer period of time. The mesomorphs grew at a more rapid rate. It was concluded that somatotypes as indicated by measures of height and weight remain fairly constant thruout childhood at least for ectomorphs and mesomorphs. Parnell (256) reviewed some of the difficulties associated with somatotyping and then described a short physical anthropometric method of estimating Sheldonian somatotype in young men in which the taking of the needed measurements required but five minutes of time.
Pugh (267) discussed the charting of growth by means of the Wetzel Grid. Garn (108) studied individual and group deviations from “channel-wise” grid progression in girls and concluded that constancy of channel position was not a usual phenomenon. In his longitudinal series, the proportion still in the “starting” channel was only 50 percent after one year and 19 percent after two years. The short-term series revealed deviations of one or more channels in over 50 percent of the cases in a one-year period and deviations of two or more channels in 9 percent of the cases. In both series, Garn reported a down-channel trend during the earlier years and an up-channel trend in later years.
Environment and Physique
Kaplan (172) reviewed 25 articles dealing with the relationship between environment and human physique. She concluded that climate, diet, and altitude all had a significant effect upon the growth patterns of the populations
studied. The effect was most marked when one or two vital features of the environment were radically changed. Newman and Munro (240) measured 15,000 men at the time of their induction into the army and then analyzed the relationship of climate to body size. They concluded that there was a definite association between elements of body size and temperature of habitat. Body size was larger in colder climates, and this relationship was more highly correlated with January than with either annual or July temperatures. Heights and weights of 392 boys and 409 girls in a South African nursery school measured at quarterly intervals between the ages of 2¾ and 6½ years for periods ranging from 1936 to 1951 were found to be definitely related to socioeconomic status (264). At all ages children from the lower income group were significantly shorter and lighter, the differences being 3 to 5 pounds and 2 to 2½ inches for the boys, and 2 to 4½ pounds and 1¼ to 3½ inches for the girls. Roberts (274) found a highly significant inverse relationship between body weight in indigenous populations and mean annual temperature, both before and after correction for the influence of group affinity and the influences of stature. Roberts concluded that the relationship between mean weight and environmental temperature made the use of universal “norms” of weight based upon European standards inappropriate in nutrition and growth studies of other populations in other areas of the world. A close fit between lean body weight and metabolism was found by Behnke (25). When lean body weight was used as a reference, the usual sex differences in basal metabolic rate tended to disappear. Because lean body weight is thought not to change during adult life, Behnke suggested that it could serve as one property of the individual to which many other variables could be related. 
Garn, Clark, and Portray (109) studied the relationship between body composition and basal metabolic rate in 49 boys and 49 girls ranging from 6 to 18 years of age. They found that correlations between radiographic measurements of muscle size and basal metabolic rate equaled or exceeded those with measures of height, weight, or surface area. On the basis of their results they suggested the use of tissue measurements in establishing metabolic reference standards. Wedgwood and his colleagues (328), after pointing out that the commonly used relationship between basal metabolic rate and surface area is not a primary one, analyzed basal metabolic rate and estimates of the fluid compartments of the body in 17 healthy young men. They concluded that basal metabolic rate could be predicted from the volume of body fluid compartments as well as, or better than, from surface area of the body. Peckos (261) studied 28 endomorphs, 21 mesomorphs, and 37 ectomorphs to determine the relationship between caloric intake and physique in children. She found that the relation of the observed energy intake to body
build was opposite to that expected and concluded that weight reductions in an endomorph may require a dangerously low energy intake. Lasker (190) pointed out the importance of ascertaining the extent to which the diagnostic criteria for each new typology are subject to change under the influence of environmental conditions. He concluded that the description given by Howell’s Factor I is that of the nutritional state and that it accounts for the major fraction of the variance in extreme somatotypes. Škerlj, Brožek, and Hunt (285) studied subcutaneous fat and age changes in body build and body form in 84 women, 18 to 67 years of age. They concluded that the deposition of inner fat may be an important aspect of the complex phenomenon of aging and that the index of total body fat to subcutaneous fat may measure the aging process. They were unable to determine how much of the increase occurred because of accumulations of fat in depots and how much was the result of fatty infiltration of organs. Iliff and Lee (159) found that generally pulse rates, respiratory rates, and body temperatures decreased in children between two months and 18 years of age. Goldstein (116) reviewed the genetic and environmental evidence bearing on the differences in mortality and health status between whites and nonwhites and found notable progress in the instances of both American whites and nonwhites. He found, among other things, a marked reduction in the rate of mortality among nonwhites in the urban South in recent years. The relationship between physique and physical performance has been examined by several workers. Bookwalter (36) compared the physical fitness scores of 1977 Indiana elementary-school boys with their physique and developmental level as determined by the Wetzel Grid. Boys of thin or medium physique and those who were very large performed equally well physically.
Maximum size and shape did not produce maximum fitness, but a relationship between physique and developmental level did seem to exist. Loveless (204) studied the relationships between scores on the Navy Standard Physical Fitness Test and age, height, and weight in 5669 randomly selected cases among enlisted personnel and officers. Height seemed to have less effect than age or weight. Age and test scores were largely unrelated below the age of 30 with a slight relationship above 30 indicated by consistently lower scores. Scores in the more strenuous exercises were adversely affected by weight over 190 pounds. Pere, Kunnas, and Telkkä (263) studied the correlation between performance and physique in 172 top-ranking track and field athletes. They found that physique had little effect upon performance, high achievement in a given field of athletics being reached by athletes of very different physiques. Throwers were tallest and seemed to benefit from height. Positive correlations were found between relative upper limb length and performance in throwers and in long distance runners and between relative chest circumference and performance in throwers. Negative correlations were found between
relative shoulder breadth and performance in throwers and between relative chest circumference and performance in sprinters. Lamp (184) correlated physical size and maturity with volleyball skills of junior high-school students. Tanner (307) examined the effect of weight training on physique. He concluded that the arm muscles in man seem to have a considerably greater growth potential than do the leg muscles. This is an interesting problem, for we have practically no quantitative data on the degree to which human muscles may be increased in size by exercise. We do not know whether the growth of a child’s muscles or other tissues or organs is affected temporarily or permanently by muscular exercise. Rarick (270) published a review of the literature dealing with the problem of maturity indicators and the development of strength and skill.
Physical Disability
Barker and her colleagues (21) compared three sources of information as to the frequency of physical disability in children: laymen, teachers, and physicians. Teachers proved to be the best informants, reporting 76.5 percent of the total number of subsequently discovered disabled children. Laymen reported 48.2 percent and physicians 17.7 percent. No survey involving medical examinations was made. A recent volume (22) on the somatopsychological aspects of the adjustment to physical handicaps and illness contained extensive bibliographies and reviewed a large number of researches in this field. It is a major source book in this area and covers such topics as differences in physical size, strength, and attractiveness; crippling; the tubercular; impaired vision; social psychology of acute illness; and employment of the disabled. Wenar (333) studied the effects of a motor handicap on the integrative ability by administering Buhler’s World Test to a group of handicapped and nonhandicapped children. He found a significant decrease in integrative ability in children with a motor handicap and also a tendency for a decrease in integrative ability with increased severity of handicap. No evidence that motor handicap is associated with a particular kind of deviate thinking was found. Bruckner (41) published a book for parents and others interested in children with a handicap. The author is the mother of a child born a congenital bilateral arm amputee.
Skeletal Maturation
Several writers dealt with the times of appearance of ossification centers. Dedick and Caffey (75) published several charts, based on roentgen findings, which showed the incidence of ossification centers in the skull and chest in 1030 newborn infants. Ellis and Joseph (91) published an account of the
Jensen
Physical Growth
55
time of appearance of the centers of ossification of the tibular epiphyses. Harding (135) used roentgenograms of a group of children taken from birth to 14 years to establish the fact that the appearance and fusion of a second accessory center of ossification of the calcaneus appeared in most instances beginning at one and one-half years before menarche and was fused about one year after menarche. Noback (244) published a critical summary of the current status of data on the times of appearance of ossification centers and the fusion of bones. He urged that more accurately documented data be made available and that population samples be fully described. He particularly called attention to the fact that times of first observation of a center are usually recorded as times of first appearance whereas the actual first appearance time may have been months before. Acheson (3) and Harding (134) published methods of assessing skeletal maturity from radiographs. Harding studied approximately 50,000 X-ray films of 323 children from birth to 14 years of age. She published the percentages of boys and girls having a certain osseous center at various ages and the range of appearance of such centers. On the basis of her experience she also described a simple method of estimating osseous development which is of particular value in longitudinal research. Her subjects tended to maintain fairly constant rates of development over long periods of time. Mainland (217) and Mainland and Mainland (218) evaluated the skeletal age method of estimating children’s development, from the point of view of both systematic error and variable errors. To check on systematic error, expert assessments were compared with each other, and it was concluded that even expert assessment had not reached a desired degree of stability. The variable error in a single observer was studied by means of 1124 readings of 326 films for 233 subjects between the ages of 16 months and 17 years. 
No significant difference was associated with the Todd or Greulich-Pyle Atlas, the age of the child, sex, differences between skeletal age and chronological age, differences between children, differences between roentgenograms of the same child, or the speed with which the assessment was made. Greulich (119) reviewed the relationship between skeletal age (based on carpal X rays) and bodily maturity in normal growth, in precocious puberty, and in endocrine dysfunction. He stressed the complications produced by the presence of both early maturing and late maturing strains in the same population. This genetic diversity and nutritional differences tend to make standards which fit a particular group inapplicable to other groups. Cotellesa and DeToni (64) evaluated the weight, height, and skeletal development records of 500 normal and pathologically abnormal children and adolescents in terms of accepted norms and concluded that skeletal age should afford a useful index of general body maturity. In Japan, children who were skeletally above average were found to have a greater number of permanent teeth as compared with children who were
skeletally below average. At each age level, girls had a greater number of erupted permanent teeth than did the boys (303). Sutow (302) compared the skeletal maturation in 1200 healthy Hiroshima boys and 1150 healthy Hiroshima girls aged 6 to 19 years with the skeletal maturation in American children. He found that the skeletal development of these Hiroshima children, none of whom had been exposed to the atomic bomb, was consistently slower than that of American children of the same chronological ages. In another study in which the skeletal ages of West African Negro boys from 9 to 20 years of age were compared with those of American boys, an average retardation of 16 months was found (332). Findings of apparent retardation, such as were revealed by these studies, raise the interesting theoretical question as to whether or not we are dealing with a retarded group in the non-American sample or an accelerated American population. Whichever it may be, the further question of the desirability of such acceleration or retardation, in terms of optimum subsequent development, becomes of vital importance.
Growth of Body Segments and Tissues
A number of workers studied the growth of various body segments and tissues: bones, the head, endocrines, teeth, the eye, the human diencephalon, skin, the blood, subcutaneous fat, and hematopoietic tissues. Stewart (293) studied the relationship between metamorphosis of the joints of the sternum and age changes in other bones. Moss (232) discussed the differential growth analysis of bone morphology as a useful technic for the study of bone growth. Gardner (107) studied prenatal development and growth of bones in man. Maresh (221) followed the linear growth of the long bones of the extremities from infancy thru adolescence. Park (255) dealt with the effects of health and disease upon bone growth, stressing the need to think of bone in the early period as a very much alive and sensitive tissue. He also pointed out that in times of stress, bone tissue can cease its growth activity and so relieve the general body growth of that added burden. MacDonald (210) related the head measurements of 1272 infants to estimated periods of gestation and concluded that growth continues in each of the diameters after the thirty-fifth week and that sex does not affect the rate of growth at that stage. In another study (211) he reported that for corresponding lengths of gestation and corresponding birth weights the head of the male fetus is larger and harder. As a sequel to this work, Meredith (226) published a review of more than 50 investigations of the growth in head width during the first 12 years of life in normal North American infants and children. He reported that thruout the period girls have slightly lower means and smaller standard deviations than boys.
Lanman (138) summarized the developmental course, comparative anatomy, and possible physiologic functions of the fetal zone of the adrenal gland. Swingle and Kleinberg (304) reviewed studies of the effects of the growth hormone. Smith (287) studied the action of relaxin on growth of the mammary gland in the rat. Clements and Zuckerman (56) studied the order of eruption of the permanent teeth in 166 gorillas and 188 chimpanzees and compared the order of eruption with that of 2792 English children. Jeffreys (165) reported on the dental status of children in Delaware. Leicester (192) pointed out that teeth are extremely responsive to systemic and metabolic changes during their early developmental period before eruption and only very slightly responsive to such bodily changes thereafter. He also discussed caries etiology in some detail. Toverud and others (312) published a very extensive review of all phases of dental caries. McLean (214) and Ridley (273) published material on the growth and development of the lens of the eye. Fletcher (98) studied the pattern of development of the eye in a series of 320 small, premature infants over the period from April 1950 to January 1953 paying especial attention to the developing fundus oculi. She concluded that there were critical periods of development in the eye when it was particularly prone to retrolental fibroplasia. Kuhlenbeck (183) published a summary of the development, structure, function, and pathology of the human diencephalon. His monograph presented a detailed anatomical study of the epithalamus, dorsal thalamus, thalamus ventralis, and hypothalamus. This work contained a 421-item bibliography. Banfield (20) used the electron microscope to study the width and length of collagen fibrils during the development of human skin and in the skin of adult animals. He used human embryos, human fetuses, a three-year-old child, and adults 65, 68, and 85 years of age. 
Hale (126) published a quantitative and qualitative description of the morphogenesis of volar skin in 122 human fetuses ranging in size from 40-millimeter crown-rump length to 350-millimeter length. Duggins (86) reported age changes in head hair from birth to maturity using seven boys and nine girls in a longitudinal study. He found that refractive indexes of hair were of some value in indicating the approximate age and the sex of the individuals. Kiil (178), after studying frontal hair direction in American Chinese, Indian, and Negro populations, concluded that frontal hair direction in man may be a result of competition between two growth inducing centers of the skin. Two books dealing extensively with the distribution of blood groups in man were published recently (233, 268). Mourant’s volume (233) included a 97-page bibliography of 1716 items. It was intended to cover practically all existing works on distributions of blood groups. These volumes supplement each other in an excellent fashion.
The relationship between age changes and subcutaneous fat was studied by Eichorn and McKee (90), Reynolds (271), and Škerlj, Brožek, and Hunt (285). Osgood (249) studied the development and growth of the hematopoietic system using a method of analysis which took into consideration the relativity of biologic time.
General or Specific Variables Affecting Growth
DeWijn (76) published a general review of factors governing the development of children. He discussed the relationship of environmental influences, climate, secular changes, socioeconomic status, racial factors, illness, psychic influences, and educational neglect to child growth. The effect upon growth patterns of chronic nutritive failure was the subject of a series of researches in which indexes, such as height, weight, and skeletal maturation, in children with chronic nutritive failure were compared with those in children without nutritive deficiency. One group of workers (84) found that the degree to which chronic nutritive failure affected the growth of a child was determined by an interplay of genetic and environmental influences, for, altho striking differences between mean height and weight and speed of growth values were found between the groups with and the groups without nutritive failure, within each age group individual children showed overlapping. It was also found that attainment of maximum physical development depended upon a child’s ability to remove any accumulated deficit before the epiphyses and diaphyses of the major long bones fused. In another study (290) it was found that prolonged nutritive failure produced a retarding effect on skeletal development which was largely reversible provided the amount of the nutrient supplement was increased beyond that needed to maintain equality in skeletal maturation between the children in the treated group and the normal controls. Skeletal maturation which was accelerated thru the feeding of essential nutrients did not equally affect all bone centers in the hand and wrist. Consequently it was concluded that all centers of a skeletal area must be evaluated to appraise fully and accurately the growth-promoting effects of nutritional therapy in children with chronic nutritive failure (85). 
Campbell and McLaughlan (48) published a review of the relationship between vitamin B12 and the growth of children. Crump and Tully (66) administered partial vitamin supplements daily to a group of 50 children who had a clinical diagnosis of malnutrition and anorexia associated with chronic illness. Comparing clinical impressions of improvement with analysis by Wetzel Grid charts, they concluded that the inability of clinical impression to differentiate between actual and apparent growth failure emphasized the need for objective standards in appraising response to nutritional therapy.
In a similar study of growth failure in school children, Wetzel and his colleagues (334) found a significant growth response to nutritional therapy in children previously manifesting retarded growth. Improvement in growth response was not correlated with physical performance as measured by grip, leg, or back strength. Howe and Schiller (154) graphed data on height and weight for school children in Stuttgart, Germany, from 1915 to 1948 and related these data to changes in diet and environmental factors. The data covered changes in average weight and stature over a long period which included two world wars. Height and weight were diminished during both world wars and increased after each. No data on the adult height of these individuals were given. The effect of climate vectors in growth and development was also studied by Dodson (79). French and his co-workers (104) studied the effect of dietary fat and carbohydrate on growth and longevity in the rat. The life span of both male and female rats ingesting high fat diets was decreased markedly, tho more so in the case of the males. Their data also indicated that a high fat diet decreased the life span without noticeably altering the cause of death. Increased caloric intake per se was not associated with decrease in life span. Horn (152) found that rats on a protein-free diet were prevented from reaching sexual maturity. Ershoff (96) reported that proper nutrition is essential for functional integrity of the reproductive system. Bakwin (19) briefly reviewed the psychological aspects of various dietary deficiencies. In a survey of 285 cases drawn from a total population of 218,693 babies in Birmingham, MacMahon and McKeown (216) found harelip, with or without cleft palate, to be nearly four times as frequent in the offspring of old mothers (38 years of age and older) as in those of young mothers. 
Li (198) summarized the chemical and biological properties of the growth and adrenocorticotropic hormones of the anterior pituitary. Soffer and Gabrilove (288) reviewed 433 references on the important role of the endocrines in growth. In another comprehensive review of chemical growth in infancy and childhood, Forbes (101) pointed out the gaps in our present-day knowledge of chemical growth in children. Gaunt (110) recently discussed the endocrine factors which affect growth in a somewhat specific way. He dealt with the pituitary growth hormone, the adrenal cortex, the gonads, and the thyroid. In mammals the major growth stimulant is the pituitary growth hormone. The thyroid hormone is essential for the full action of the growth hormone and also for normal growth and development especially in the young. If it is absent, most food is deposited as fat rather than used for growth processes. The gonads and the adrenal cortex also produce definite but limited stimulants of general body growth. MacKay (212) reviewed 196 references dealing with the relationships between the endocrine and the nervous systems. Caldwell (47) found that intellectual functioning, speed and flexibility of reactions, and attitudes and
interests showed improvement in aged women following sex hormone administration. Thirty women whose mean age was 75 years were divided into two groups for the experiment, the control group receiving a placebo and the other, female sex hormones. Heller (140) reported that cortisone reduced the resistance of mammals to bacteria, fungi, viruses, and bacterial toxins. Jervis and his colleagues (168) reported the results of operations for revascularization of the brain in 25 mental defectives aged 3 to 20 years. Postoperative observations were made for two years or more, and no improvement in intelligence quotients, electroencephalographic findings, or clinical symptomatology was found. They also reviewed the data obtained by other investigators in a total of 331 cases and discussed theoretical aspects of the problem. Tyler and Armstrong (315) reviewed the evidence dealing with the metabolic aspects of some neurological and muscular disorders. They concluded that many studies of metabolic changes in schizophrenia suggest that a metabolic derangement underlies the disease process. They also pointed out that there is good evidence that patients with several types of endocrine abnormalities manifest an unusually high incidence of psychoses altho the position that all psychotic patients show an abnormality of endocrine function is much less well established.
Methodology and Technics
Asdell (13) reported that several methods for controlling ovulation time in mammals were available. These procedures, combined with technics for preserving tissues indefinitely and transplantation developments, should open up a whole new era in the study of growth. Indeed, in his foreword to a recent volume entitled Mammalian Germ Cells, Folley (100) stated that the technical advances foreshadowed by the researches reported in this volume might well be fraught with as much significance for man, and danger of misapplication, as the discoveries on how to release the energy of the atomic nucleus. The advantages of the longitudinal method of studying growth, and the problems associated with the approach, are well known. Bell (26, 27), in partial solution of some of these difficulties, suggested a method of combining the cross-sectional and longitudinal technics in such a manner that long-range developmental changes may be estimated in a relatively short period of time. He suggested that groups be selected so that final measurements on a younger group could be made at the same age as the initial measurements of the next-older group. By this procedure, for example, a longitudinal study which normally would take eight years might be accomplished over a two-year period if four slightly overlapping age groups were used. Both absolute measurements and directions of development could be ascertained and used
to help answer the question of whether or not age changes only were involved. He pointed out that such a short-cut method would be particularly helpful in studying transient populations or relatively uncooperative groups and wherever extensive study leads to undesirable contamination of the population universe. It could also be used to point up special problems and to obtain initial results to further experimental design in longitudinal studies. Suarez (296, 297), Suarez and Peva (298, 299, 300), and Suarez and Tirjeira (301) critically evaluated methods used to show growth and development beginning with Quetelet in 1871 and including the grids of DeToni, Gobesi and Tatafiori, and Wetzel. The authors also presented a new method of graphic representation of growth which permitted the visualization of developmental age, velocity, and direction of growth. LeLong and his colleagues (193) also published a new method of graphically recording growth which made it possible to compare the child with himself at successive ages and to attack some of the unsolved problems of auxology. Bryan and Greenberg (43) investigated methods suitable for cross-sectional determination of immaturity points and sexual maturation. They compared three methods: logits, probits, and Karber’s method, and critically evaluated each. In a later paper Bryan (42) published methods for analyzing and interpreting physical measurements of groups of children. Tyler (316) critically evaluated various concepts of organismic growth with regard to their statistical bases, their psychological meaning, the validity of interpretation of basic data, and their important educational implications. He stressed the need for further empirical studies in this area. In another publication, Tyler (317) suggested the use of P-Technic in the study of the interrelatedness of rates of growth of children. 
Organismic concepts, such as “unity of growth” and “interrelatedness of the growth process,” have been given great educational and psychological significance, yet, referring, as they do, to intra-individual growth, they cannot be verified by data about interindividual growth. Nicolson and Hanley (243) published a factorial analysis of the intercorrelations among a group of indexes of physiological maturity obtained in an urban sample of 180 boys and girls who were measured annually from their first to eighth years and thereafter semiannually until they were 18 years of age. In this study of the generality of growth a high degree of relationship among very different indexes of maturation was found. Kerlinger (176) recommended the use of analysis of variance with child development data, particularly those data covering ages from 6 to 10 years. He reported that analysis of variance could be used to determine the significance of the difference between growth ages and between organismic ages, and within certain limitations, could give the degree of homogeneity and heterogeneity. Electroencephalography was used to study the developing brain of the intact prenatal human fetus (30, 37), the developing heart in the intact prenatal human fetus (29, 72), the normal aged adult (245, 281), brain phenomena in
senile psychoses (235), and in retrolental fibroplasia (169). Blum (34) checked on the reliability of electroencephalographic judgments by having 10 sets of electroencephalographic records evaluated by five experienced neurologists. He found the reliability to be low and emphasized the need for further research in this area. Geoghegan (111) devised a system by which a wide range of body measurements, surface area, and total and partial body volumes could be obtained from photographs of subjects in certain postures. The body specific gravity value could also be calculated if body weight were known. In a paper which was concerned with regularities in growth curves including rhythms and allometry, Sholl (282) made the point that accurate knowledge of growth of individuals or the growth process itself cannot be accomplished by deriving a kind of average curve from measurements of different individuals at different ages. No amount of mathematical theory or statistical practice can ever supplant the endless experimentation which lies at the base of progress in biological science. Peculiarities of growth data may be due to a peculiarity of the individual, a characteristic of growth of human beings, or an error of observation. In an important methodological note, Davies (71) pointed out that mortality statistics based on either cross-sectional or longitudinal data are limited in answering the question as to how fast man is aging at different periods of his life. Indeed, he concluded that the average rate of inherent aging will probably never be known because ideal environmental conditions which would be optimal for all, and hence not shorten life, do not exist now, nor are they likely to appear in the foreseeable future. This is just another way of saying research must be continuous, that we shall not run out of problems, and that progress consists of a series of approximations or approaches to an ever receding and more enticing goal.
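The scheduling arithmetic behind Bell's accelerated longitudinal design (26, 27), described earlier in this section, can be illustrated with a short sketch. It assumes equally spaced groups, each followed for the same length of time, so that the last measurement of one group falls at the age of the first measurement of the next-older group. The function name `convergence_schedule` and the starting age of 6 years are hypothetical choices made for illustration only; they do not come from Bell's papers.

```python
from typing import List, Tuple


def convergence_schedule(
    start_age: float, total_span: float, n_groups: int
) -> List[Tuple[float, float]]:
    """Return (age at first measurement, age at last measurement) per group.

    Assumption: groups are equally spaced and each is followed for
    total_span / n_groups years, so adjacent groups meet at a shared age.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    follow_up = total_span / n_groups  # years each group is actually observed
    return [
        (start_age + i * follow_up, start_age + (i + 1) * follow_up)
        for i in range(n_groups)
    ]


# The example from the text: an eight-year span covered by four groups.
# The starting age of 6 years is an illustrative assumption.
schedule = convergence_schedule(start_age=6, total_span=8, n_groups=4)
for first, last in schedule:
    print(f"group observed from age {first:g} to age {last:g}")
```

Each group here is observed for two years, and the four groups together span ages 6 to 14, which matches the eight-year example given in the text. Whether measurements from adjacent groups may actually be joined at the shared age is an empirical question that this sketch does not address.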
Bibliography
1. Aalde, O., and Innerslund, O. “Retrolental Fibroplasia and Treatment of Oxygen.” Acta Paediatrica 43: 553–56; November 1954.
2. Abramson, Ernest, and Ernest, Eva. “Height and Weight of Schoolboys at a Stockholm Secondary School, 1950, and a Comparison with Some Earlier Investigations.” Acta Paediatrica 43: 235–46; May 1954.
3. Acheson, R. M. “A Method of Assessing Skeletal Maturity from Radiographs.” Journal of Anatomy 88: 498–508; October 1954.
4. Acheson, R. M., and Hewitt, D. “Physical Development in the English and the American Pre-School Child: A Comparison Between Findings in the Oxford and the Brush Foundation Surveys.” Human Biology 26: 343–55; December 1954.
5. Acheson, R. M., and others. “Height, Weight, and Skeletal Maturity in the First Five Years of Life.” Lancet 268: 691–92; April 2, 1955.
6. Adams, C. E. “Some Aspects of Ovulation, Recovery and Transplantation of Ova in the Immature Rabbit.” Mammalian Germ Cells. (Edited by G. E. W. Wolstenholme.) Boston: Little, Brown and Co., 1953. p. 198–216.
7. Adams, Theodore W. “Intrauterine Roentgenography as an Aid in Determining Fetal Age.” Obstetrics and Gynecology 5: 43–48; January 1955.
8. Allen, Fred H., Jr.; Diamond, Louis K.; and Jones, A. Richardson. “Erythroblastosis Fetalis: IX. The Problems of Stillbirth.” New England Medical Journal 251: 453–59; September 16, 1954.
9. Alm, Ingvar. “The Long-Term Prognosis for Prematurely Born Children.” Acta Paediatrica Supplementum 42: 1–116; May 1953.
10. Anderson, A. “Some Observations on Birth Weights.” Medical Officer 89: 15–17; January 10, 1953.
11. Apgar, Virginia, and others. “Neonatal Anoxia: I. A Study of the Relation of Oxygenation at Birth to Intellectual Development.” Pediatrics 15: 653–62; June 1955.
12. Armitage, S. G. “The Effects of Barbiturates on the Behavior of Rat Offspring as Measured in Learning and Reasoning Situations.” Journal of Comparative and Physiological Psychology 45: 146–52; April 1952.
13. Asdell, S. A. “The Effect of Controlled Ovulation upon the Fertility of the Mammalian Egg.” Mammalian Germ Cells. (Edited by G. E. W. Wolstenholme.) Boston: Little, Brown and Co., 1953. p. 170–79.
14. Ashton, N. “Pathological Basis of Retrolental Fibroplasia.” British Journal of Ophthalmology 38: 385–96; July 1954.
15. Ashton, N. “Retrolental Fibroplasia.” American Journal of Ophthalmology 39: 153–59; April 1955.
16. Ashton, N.; Ward, B. A.; and Serpell, G. “Effect of Oxygen on Developing Retinal Vessels with Particular Reference to the Problem of Retrolental Fibroplasia.” British Journal of Ophthalmology 38: 397–432; July 1954.
17. Ashton, N.; Ward, B. A.; and Serpell, G. “Role of Oxygen in the Genesis of Retrolental Fibroplasia: A Preliminary Report.” British Journal of Ophthalmology 37: 51–320; September 1953.
18. Baker, Harry J. Introduction to Exceptional Children. Revised edition. New York: Macmillan Co., 1953. 500 p.
19. Bakwin, H. “Psychologic Aspects of Dietary Deficiency States.” Journal of Pediatrics 45: 110–14; July 1954.
20. Banfield, William G. “Width and Length of Collagen Fibrils During the Development of Human Skin, in Granulation Tissue and in the Skin of Adult Animals.” Journal of Gerontology 10: 13–17; January 1955.
21. Barker, Louise S., and others. “The Frequency of Physical Disability in Children: A Comparison of Three Sources of Information.” Child Development 23: 215–26; September 1952.
22. Barker, Roger G., and others. Adjustment to Physical Handicap and Illness: A Survey of the Social Psychology of Physique and Disability. New York: Social Science Research Council, 1953. 440 p.
23. Bayley, Nancy. “Some Increasing Parent-Child Similarities During the Growth of Children.” Journal of Educational Psychology 45: 1–21; January 1954.
24. Becker, R. Frederick, and Donnell, William. “Learning Behavior in Guinea Pigs Subjected to Asphyxia at Birth.” Journal of Comparative and Physiological Psychology 45: 153–62; April 1952.
25. Behnke, Albert R. “The Relation of Lean Body Weight to Metabolism and Some Consequent Systematizations.” Annals of the New York Academy of Science 56: 1095–1142; November 17, 1953.
26. Bell, Richard Q. “Convergence: An Accelerated Longitudinal Approach.” Child Development 24: 145–52; June 1953.
27. Bell, Richard Q. “An Experimental Test of the Accelerated Longitudinal Approach.” Child Development 25: 281–86; December 1954.
28. Benda, Clemens E., and Mann, George V. “The Serum Cholesterol and Lipoprotein Levels in Mongolism.” Journal of Pediatrics 46: 49–53; January 1955.
29. Bernstine, Richard L., and Borkowski, Winslow J. “Prenatal Fetal Electrocardiography.” American Journal of Obstetrics and Gynecology 70: 631–38; September 1955.
30. Bernstine, Richard L.; Borkowski, Winslow J.; and Price, A. H. “Prenatal Fetal Electroencephalography.” American Journal of Obstetrics and Gynecology 70: 623–30; September 1955.
31. Blattner, Russell J., and others. “Developmental Defects in the Chick Embryo Following Injection with Newcastle Disease Virus.” American Journal of Diseases of Children 88: 654; November 1954.
32. Bloch, Harry, and others. “Reduction of Mortality in the Premature Nursery: II. Incidence and Cause of Prematurity; Ethnic, Sociometric and Obstetric Factors.” Journal of Pediatrics 41: 300–304; September 1952.
33. Blommers, Paul, and others. “Organismic Age Concept.” Journal of Educational Psychology 46: 142–50; March 1955.
34. Blum, Richard H. “A Note on the Reliability of Electroencephalographic Judgments.” Neurology 4: 143–46; February 1954.
35. Boell, Edgar J., editor. Dynamics of Growth Processes. Princeton, N. J.: Princeton University Press, 1954. 304 p.
36. Bookwalter, Karl W. “The Relationship of Body Size and Shape to Physical Performance.” Research Quarterly of the American Association for Health, Physical Education, and Recreation 23: 271–79; October 1952.
37. Borkowski, Winslow J., and Bernstine, Richard L. “Electroencephalography of the Fetus.” Neurology 5: 362–65; May 1955.
38. Boyd, Edith. “Pictorial and Graphic Analysis of the Body Build of One Boy.” American Journal of Diseases of Children 89: 332–40; March 1955.
39. Breckenridge, Marion E., and Vincent, E. Lee. Child Development. Third edition. Philadelphia: W. B. Saunders Co., 1955. 497 p.
40. Brent, Robert. “X-Ray-Induced Embryonic Malformations in the Rat: An Application to the Human Malformation Problem.” American Journal of Diseases of Children 88: 654–57; November 1954.
41. Bruckner, Leona S. Triumph of Love. New York: Simon and Schuster, 1954. 213 p.
42. Bryan, A. Hughes. “Methods for Analyzing and Interpreting Physical Measurements of Groups of Children.” American Journal of Public Health 44: 766–74; June 1954.
43. Bryan, A. Hughes, and Greenberg, B. G. “Methodology in the Study of Physical Measurements of School Children: II. Sexual Maturation—Determination of Immaturity Points.” Human Biology 24: 117–44; May 1952.
44. Buck, C. “Exposure to Virus Diseases in Early Pregnancy and Congenital Malformations.” Journal of the Canadian Medical Association 72: 744–46; May 15, 1955.
45. Bundesen, Herman N. “Natal Day Deaths: The Long Neglected Field of Infant Mortality.” Journal of the American Medical Association 153: 466–73; October 3, 1953.
46. Bunge, R. G., and Sherman, J. K. “Frozen Human Semen.” Fertility and Sterility 5: 193–94; March-April 1954.
47. Caldwell, Bettye M. “An Evaluation of Psychological Effects of Sex Hormone Administration in Aged Women: II. Results of Therapy after Eighteen Months.” Journal of Gerontology 9: 168–74; April 1954.
48. Campbell, J. A., and McLaughlan, J. M. “Vitamin B12 and the Growth of Children: A Review.” Journal of the Canadian Medical Association 72: 259–63; February 15, 1955.
49. Carmichael, Leonard, editor. Manual of Child Psychology. Second edition. New York: John Wiley and Sons, 1954. 1295 p.
50. Casida, L. E. “Some Factors Affecting Fertilization and Embryonic Death.” Mammalian Germ Cells. (Edited by G. E. W. Wolstenholme.) Boston: Little, Brown and Co., 1953. p. 262–74.
Salkind_Chapter 20.indd 64
9/4/2010 10:35:01 AM
Jensen
Physical Growth
65
51. Cawley, R. H.; McKeown, Thomas; and Record, R. G. “Influence of the Pre-Natal Environment on Post-Natal Growth.” British Journal of Preventive and Social Medicine 8: 66–69; April 1954.
52. Cawley, R. H.; McKeown, Thomas; and Record, R. G. “Parental Stature and Birth Weight.” American Journal of Human Genetics 6: 448–56; December 1954.
53. Cenci, E. “Infant Mortality in the Commune of Citta di Castello in the Period 1938–1951.” Lattante 24: 258–67; April 1953.
54. Chang, M. C. “Fertilizability of Rabbit Germ Cells.” Mammalian Germ Cells. (Edited by G. E. W. Wolstenholme.) Boston: Little, Brown and Co., 1953. p. 226–42.
55. Clements, E. M. B. “The Age of Children When Growth in Stature Ceases.” Archives of Disease in Childhood 29: 144–51; April 1954.
56. Clements, E. M. B., and Zuckerman, S. “The Order of Eruption of the Permanent Teeth in the Hominoidea.” American Journal of Physical Anthropology 11: 313–32; September 1953.
57. Clifford, Stewart H. “Postmaturity: Clinical Syndrome and Pathologic Findings.” American Journal of Diseases of Children 86: 319–21; September 1953.
58. Clifford, Stewart H. “Postmaturity with Placental Dysfunction: Clinical Syndrome and Pathologic Findings.” Journal of Pediatrics 44: 1–13; January 1954.
59. Clifford, Stewart H. “The Problem of Prematurity: Obstetric, Pediatric, and Socioeconomic Factors.” Journal of Pediatrics 47: 13–24; July 1955.
60. Cohlan, Sidney Q. “Congenital Anomalies in the Rat Produced by Excessive Intake of Vitamin A During Pregnancy.” Pediatrics 13: 556–67; June 1954.
61. Cole, Luella. Psychology of Adolescence. Fourth edition. New York: Rinehart and Co., 1954. 712 p.
62. Collins, E. H. “The Reciprocal Nature of Growth and Behavior in the Fetus and Infant.” Growth 17: 163–67; September 1953.
63. Collins, I. S. “The Incidence of Congenital Malformations Following Maternal Rubella at Various Stages of Pregnancy.” Medical Journal of Australia 2: 456–58; September 19, 1953.
64. Cotellesa, G., and DeToni, E., Jr. “The Evaluation of Skeletal Growth in Children and in Adolescents in Normal and Pathologic Conditions.” Pediatrica 61: 872–88; November-December 1953.
65. Cruickshank, William M., editor. Psychology of Exceptional Children and Youth. New York: Prentice-Hall, 1955. 594 p.
66. Crump, Jean, and Tully, Robert. “The Use of Partial Vitamin Supplements in the Treatment of Growth Failure in Children.” Journal of Pediatrics 46: 671–81; June 1955.
67. Cruze, Wendell W. Adolescent Psychology and Development. New York: Ronald Press Co., 1953. 557 p.
68. Cutting, Windsor C., editor. Annual Review of Medicine. Stanford, Calif.: Annual Reviews, 1952. 442 p.
69. Cutting, Windsor C., editor. Annual Review of Medicine. Stanford, Calif.: Annual Reviews, 1953. 452 p.
70. Cutting, Windsor C., editor. Annual Review of Medicine. Stanford, Calif.: Annual Reviews, 1954. 490 p.
71. Davies, Dean F. “Mortality and Morbidity Statistics: II. Limitations of Approaches to Rates of Aging.” Journal of Gerontology 9: 186–95; April 1954.
72. Davis, J., and Meares, S. Devenish. “Preliminary Report on an Investigation of Foetal Electrocardiography and Foetal Stethography.” Medical Journal of Australia 2: 501–504; September 25, 1954.
73. Day, Richard, and Haines, Miriam S. “Intelligence Quotients of Children Recovered from Erythroblastosis Fetalis Since the Introduction of Exchange Transfusion.” Pediatrics 13: 333–38; April 1954.
74. Deanesly, Ruth. “Histological Evolution of Rat Gonadal Tissue Transplanted after Freezing and Thawing.” Preservation and Transplantation of Normal Tissues. (Edited by G. E. W. Wolstenholme and Margaret P. Cameron.) Boston: Little, Brown and Co., 1954. p. 86–99.
75. Dedick, Andrew P., and Caffey, John. “Roentgen Findings in Skull and Chest in 1,030 Newborn Infants.” Radiology 61: 13–20; July 1953.
76. DeWijn, J. F. “Factors Influencing Growth and Development.” Maandschrift Voor Kindergeneeskunde 22: 418–29; December 1954.
77. Dickerson, Gordon E. “Hereditary Mechanisms in Animal Growth.” Dynamics of Growth Processes. (Edited by Edgar J. Boell.) Princeton, N. J.: Princeton University Press, 1954. p. 242–76.
78. Digby, I. F. “A Case of Quintuplet Pregnancy.” Journal of Obstetrics and Gynaecology of the British Empire 61: 94–95; February 1954.
79. Dodson, D. W. “Climate Vectors in Growth and Development.” Journal of Educational Sociology 27: 98–101; November 1953.
80. Donegan, J. M. “Retrolental Fibroplasia.” Wisconsin Medical Journal 54: 209–11; April 1955.
81. Douglas, Charlotte A. “Trends in the Risks of Childbearing and in the Mortalities of Infants During the Last 30 Years.” Journal of Obstetrics and Gynaecology of the British Empire 62: 216–31; April 1955.
82. Douglas, J. W. B., and Mogford, C. “The Results of a National Inquiry into the Growth of Premature Children from Birth to Four Years.” Archives of Disease in Childhood 28: 436–45; December 1953.
83. Doyle, J. B. “Ovulation and the Effects of Selective Uterotubal Denervation: Direct Observations by Culdotomy.” Fertility and Sterility 5: 105–30; March-April 1954.
84. Dreizen, Samuel, and others. “The Effect of Nutritive Failure on Growth Patterns of White Children in Alabama.” Child Development 24: 189–202; September-December 1953.
85. Dreizen, Samuel, and others. “Maturation of Bone Centers in Hand and Wrist of Children with Chronic Nutritive Failure.” American Journal of Diseases of Children 87: 429–39; April 1954.
86. Duggins, Oliver H. “Age Changes in Head Hair from Birth to Maturity: IV. Refractive Indices and Birefringence of the Cuticle of Hair of Children.” American Journal of Physical Anthropology 12: 89–114; March 1954.
87. Dunham, Ethel C. Premature Infants. Second edition. New York: Hoeber-Harper Co., 1955. 459 p.
88. Dupertuis, C. Wesley, and Michael, Nancy B. “Comparison of Growth in Height and Weight Between Ectomorphic and Mesomorphic Boys.” Child Development 24: 203–14; September-December 1953.
89. Eastman, N. J. “Mount Everest in Utero.” American Journal of Obstetrics and Gynecology 67: 701–11; April 1954.
90. Eichorn, Dorothy H., and McKee, John P. “Oral Temperature and Subcutaneous Fat During Adolescence.” Child Development 24: 235–47; September-December 1953.
91. Ellis, F. G., and Joseph, J. “Time of Appearance of the Centres of Ossification of the Fibular Epiphyses.” Journal of Anatomy 88: 533–36; October 1954.
92. Embrey, Mostyn P. “A New Multichannel External Tocograph.” Journal of Obstetrics and Gynaecology of the British Empire 62: 1–5; February 1955.
93. Engle, Mary A., and others. “Oxygen Administration and Retrolental Fibroplasia.” American Journal of Diseases of Children 89: 399–413; April 1955.
94. Eppright, Ercel S., and Sidwell, Virginia D. “Physical Measurements of Iowa School Children.” Journal of Nutrition 54: 543–56; December 1954.
95. Eppright, Ercel S., and others. “Relationship of Estimated Nutrient Intakes of Iowa School Children to Physical and Biochemical Measurements.” Journal of Nutrition 54: 557–70; December 1954.
96. Ershoff, B. H. “Nutrition and the Anterior Pituitary with Special Reference to the General Adaptation Syndrome.” Vitamins and Hormones 10: 79–140; 1952.
97. Fainstat, Theodore. “Cortisone-Induced Congenital Cleft Palate in Rabbits.” Endocrinology 55: 502–508; October 1954.
98. Fletcher, M. C. “The Developing Fundus Oculi of the Premature Infant and Its Relationship to Retrolental Fibroplasia.” Journal of Pediatrics 43: 499–523; November 1953.
99. Flexner, Louis B., editor. Gestation: Transactions of the First Conference, March 9, 10, 11, 1954, Princeton, New Jersey. New York: Corlies, Macy and Co., 1955. 238 p.
100. Folley, S. J. “Foreword.” Mammalian Germ Cells. (Edited by G. E. W. Wolstenholme.) Boston: Little, Brown and Co., 1953. p. vii–viii.
101. Forbes, Gilbert B. “Chemical Growth in Infancy and Childhood.” Journal of Pediatrics 41: 202–32; August 1952.
102. Fortier, E. G. “The Therapeutic Possibilities of Carbon Dioxide in the Prevention of Retrolental Fibroplasia.” American Journal of Ophthalmology 38: 342–48; September 1954.
103. Fraser, F. Clarke; Fainstat, T. D.; and Kalter, H. “The Experimental Production of Congenital Defects with Particular Reference to Cleft Palate.” Etudes Neo-natales 2: 43–58; June 1953.
104. French, C. E., and others. “The Influence of Dietary Fat and Carbohydrate on Growth and Longevity in Rats.” Journal of Nutrition 51: 329–39; November 10, 1953.
105. Friedman, Abraham. “Mongolism in Twins.” American Journal of Diseases of Children 90: 43–50; July 1955.
106. Gallagher, J. Roswell, and Gallagher, Constance D. “Some Comments on Growth and Development in Adolescents.” Yale Journal of Biology and Medicine 25: 334–48; April 1953.
107. Gardner, E. “Prenatal Development and Growth of Bones in Man.” Journal of Michigan Medical Society 54: 298–300; March 1955.
108. Garn, Stanley M. “Individual and Group Deviations from ‘Channelwise’ Grid Progression in Girls.” Child Development 23: 193–206; September 1952.
109. Garn, Stanley M.; Clark, Leland C., Jr.; and Portray, Renee. “Body Composition and Basal Metabolic Rate in Children.” Journal of Applied Physiology 6: 163–67; September 1953.
110. Gaunt, Robert. “Chemical Control of Growth in Animals.” Dynamics of Growth Processes. (Edited by Edgar J. Boell.) Princeton, N. J.: Princeton University Press, 1954. p. 183–211.
111. Geoghegan, Basil. “The Determination of Body Measurements, Surface Area and Body Volume by Photography.” American Journal of Physical Anthropology 11: 97–119; March 1953.
112. Germain, George L.; Browne, C. G.; and Bellows, Roger M. “Measuring Men and Jobs: Physical Profiling Systems.” Occupations 30: 579–83; May 1952.
113. Gerschman, Rebecca, and others. “Effect of High Oxygen Concentrations on Eyes of Newborn Mice.” American Journal of Physiology 179: 115–18; September 1954.
114. Gesell, Arnold. Infant Development: The Embryology of Early Human Behavior. New York: Harper and Brothers, 1952. 108 p.
115. Giroud, A. “Malformations Embryonnaires D’Origine Carentielle.” Cambridge Philosophical Society Biological Reviews 29: 220–50; May 1954.
116. Goldstein, Marcus S. “Longevity and Health Status of Whites and Nonwhites in the United States.” Journal of the National Medical Association 46: 83–104; March 1954.
117. Gordon, G. S., editor. Year Book of Endocrinology. Chicago: Year Book Publishers, 1954. 448 p.
118. Gordon, Harry H.; Lubchenco, Lula; and Hix, Ivan. “Observations on the Etiology of Retrolental Fibroplasia.” Bulletin of the Johns Hopkins Hospital 94: 34–44; January 1954.
119. Greulich, W. W. “The Relationship of Skeletal Status to the Physical Growth and Development of Children.” Dynamics of Growth Processes. (Edited by Edgar J. Boell.) Princeton, N. J.: Princeton University Press, 1954. p. 212–23.
120. Greulich, W. W.; Crismon, C. S.; and Turner, M. L. “The Physical Growth and Development of Children Who Survived the Atomic Bombing of Hiroshima or Nagasaki.” Journal of Pediatrics 43: 121–45; August 1953.
121. Guttmacher, A. F. “The Incidence of Multiple Births in Man and Some of the Unipara.” Obstetrics and Gynecology 2: 22–35; July 1953.
122. Gyllensten, Lars J., and Hellstrom, B. E. “Experimental Approach to the Pathogenesis of Retrolental Fibroplasia.” American Journal of Ophthalmology 39: 475–88; April 1955.
123. Gyllensten, Lars J., and Hellstrom, B. E. “Experimental Approach to the Pathogenesis of Retrolental Fibroplasia: I. Changes of the Eye Induced by Exposure of Newborn Mice to Concentrated Oxygen.” Acta Paediatrica 43: 131–48; October 1954.
124. Gyllensten, Lars J., and Hellstrom, B. E. “Experimental Approach to the Pathogenesis of Retrolental Fibroplasia: III. Changes in the Eye Induced by Exposure of Newborn Mice to General Hypoxia.” British Journal of Ophthalmology 39: 409–15; July 1955.
125. Gyllensten, Lars J., and Hellstrom, B. E. “Retrolental Fibroplasia; Animal Experiments: The Effect of Intermittingly Administered Oxygen on the Postnatal Development of the Eyes of Fullterm Mice.” Acta Paediatrica 41: 577–82; November 1952.
126. Hale, Alfred R. “Morphogenesis of Volar Skin in the Human Fetus.” American Journal of Anatomy 91: 147–81; July 1952.
127. Hall, Victor E., editor. Annual Review of Physiology. Stanford, Calif.: Annual Reviews, 1953. 558 p.
128. Hall, Victor E., editor. Annual Review of Physiology. Stanford, Calif.: Annual Reviews, 1954. 545 p.
129. Hall, Victor E., editor. Annual Review of Physiology. Stanford, Calif.: Annual Reviews, 1955. 551 p.
130. Hamburgh, M. “Malformations in Mouse Embryos Induced by Trypan Blue.” Nature 169: 27; January 5, 1952.
131. Hammond, W. H. “The Determination of Physical Type in Children.” Human Biology 25: 65–80; May 1953.
132. Hammond, W. H. “Physique and Development of Boys and Girls from Different Types of Schools.” British Journal of Preventive and Social Medicine 7: 231–37; October 1953.
133. Hammond, W. H. “Some Observations on the Conditions Affecting the First Year Growth of Babies (with Special Reference to a Sample of 451 Babies from Leeds).” Medical Officer 88: 225–28; November 15, 1952.
134. Harding, Vernette S. V. “A Method of Evaluating Osseous Development from Birth to 14 Years.” Child Development 23: 247–71; December 1952.
135. Harding, Vernette S. V. “Time Schedule for the Appearance and Fusion of a Second Accessory Center of Ossification of the Calcaneus.” Child Development 23: 181–84; September 1952.
136. Harris, Dale B. “Why an Interdisciplinary Society for Research in Child Development.” Child Development 24: 249–55; September 1953.
137. Harris, Lloyd E., and Steinberg, Arthur G. “Abnormalities Observed During the First Six Days of Life in 8,716 Live-Born Infants.” Pediatrics 14: 314–26; October 1954.
138. Hartman, Carl G. “Early Death of the Mammalian Ovum with Special Reference to the Aplacental Opossum.” Mammalian Germ Cells. (Edited by G. E. W. Wolstenholme.) Boston: Little, Brown and Co., 1953. p. 253–61.
139. Heck, Arch O. The Education of Exceptional Children: Its Challenge to Teachers, Parents, and Laymen. Second edition. New York: McGraw-Hill Book Co., 1953. 513 p.
140. Heller, John H. “Cortisone and Phagocytosis.” Endocrinology 56: 80–85; January 1955.
141. Hellstrom, B., and Jonsson, B. “Late Prognosis in Asphyxia Neonatorum.” Acta Paediatrica 42: 398–406; September 1953.
142. Henry, M. “Recent Advances in Retrolental Fibroplasia.” California Medicine 81: 272–75; October 1954.
143. Hepner, W. R. “Retrolental Fibroplasia.” American Journal of Diseases of Children 88: 356–61; September 1954.
144. Herdan, G. “The Relation Between Birth Weight and Subsequent Weight in Childhood.” Archives of Disease in Childhood 29: 220–23; June 1954.
145. Hewitt, David, and Stewart, Alice. “The Oxford Child Health Survey: A Study of the Influence of Social and Genetic Factors on Infant Weight.” Human Biology 24: 309–19; December 1952.
146. Hicks, S. P. “Mechanism of Radiation Anencephaly, Anophthalmia, and Pituitary Anomalies: Repair in the Mammalian Embryo.” Archives of Pathology 57: 363–78; May 1954.
147. Hicks, S. P.; O’Brien, R. C.; and Newcomb, E. C. “Developmental Malformations Produced by Radiation: A Timetable of Their Development.” American Journal of Roentgenology, Radium Therapy, and Nuclear Medicine 69: 272–93; February 1953.
148. Higgins, L. G. “Prolonged Pregnancy.” Lancet 2: 1154–56; December 4, 1954.
149. Hogan, A. G. “Nutrition.” Annual Review of Biochemistry. (Edited by J. Murray Luck.) Stanford, Calif.: Annual Reviews, 1953. p. 299–318.
150. Holt, L. Emmett, Jr., and McIntosh, Rustin. Pediatrics. Twelfth edition. New York: Appleton-Century-Crofts, 1953. 1485 p.
151. Holt, L. Emmett, Jr.; Ingalls, Theodore H.; and Hellman, Louis B., editors. Prematurity, Congenital Malformation and Birth Injury. New York: Association for the Aid of Crippled Children, 1953. 255 p.
152. Horn, Eugene H. “The Influence of Dietary Protein and Thyroid on Reproductive Organs of Immature Male Rats.” Anatomical Record 115: 324; February 1953. (Abstract of paper presented at the 66th meeting of the American Association of Anatomists at Ohio State University, Columbus, Ohio, March 1953.)
153. Houghton, J. W., and Ross, W. F. “Birth Weights and Prematurity Rates in Southern Rhodesia.” Transactions of the Royal Society of Tropical Medicine and Hygiene 47: 62–65; January 1953.
154. Howe, Paul E., and Schiller, Maria. “Growth Responses of the School Child to Changes in Diet and Environmental Factors.” Journal of Applied Physiology 5: 51–61; August 1952.
155. Huggert, A. “Appearance of the Fundus Oculi in Prematurely Born Infants Treated with and Without Oxygen.” Acta Paediatrica 43: 327–36; July 1954.
156. Humphreys, R. C. “An Analysis of the Maternal and Foetal Weight Factors in Normal Pregnancy.” Journal of Obstetrics and Gynaecology of the British Empire 61: 725–37; December 1954.
157. Hurlock, Elizabeth B. Adolescent Development. Second edition. New York: McGraw-Hill Book Co., 1955. 603 p.
158. Ilg, Frances L., and Ames, Louise B. Child Behavior. New York: Harper and Brothers, 1955. 364 p.
159. Iliff, Alberta, and Lee, Virginia A. “Pulse Rate, Respiratory Rate, and Body Temperatures of Children Between Two Months and Eighteen Years of Age.” Child Development 23: 237–45; December 1952.
160. Illingworth, Ronald S. The Normal Child. Boston: Little, Brown and Co., 1953. 342 p.
161. Ingalls, T. H. “The Problem of Mongolism.” Annals of the New York Academy of Sciences 57: 551–57; January 15, 1954.
162. Ingalls, T. H., and Purshottam, N. “Oxygenation and Retrolental Fibroplasia.” New England Journal of Medicine 250: 621–29; April 1954.
163. Ingalls, T. H., and others. “Genetic Determinants of Hypoxia-Induced Congenital Anomalies.” Journal of Heredity 44: 185–94; September-October 1953.
164. Jackson, R. L., and Kelly, H. G. “Evaluation of Growth of Children.” Journal of School Health 24: 174–76; June 1954.
165. Jeffreys, M. H. “Dental Status of Delaware Children.” Delaware Medical Journal 26: 237–38; September 1954.
166. Jenks, William F., editor. Special Education of the Exceptional Child. Washington, D. C.: Catholic University of America Press, 1953. 156 p.
167. Jersild, Arthur T. Child Psychology. Fourth edition. New York: Prentice-Hall, 1954. 676 p.
168. Jervis, George A., and others. “Revascularization of the Brain in Mental Defectives.” Neurology 3: 871–78; December 1953.
169. Jim, V. K. S., and Krause, A. C. “Electroencephalography in Retrolental Fibroplasia.” American Journal of Ophthalmology 38: 337–41; September 1954.
170. Johnston, J. A. “Growth in Adolescence.” Journal of School Health 24: 179–83; September 1954.
171. Jones, B. W. “Physical Development of Junior College Students.” Junior College Bulletin 23: 306–10; February 1953.
172. Kaplan, Bernice A. “Environment and Human Plasticity.” American Anthropologist 56: 780–800; October 1954.
173. Kaplan, Ira J. “Third Generation Follow-up of Women Treated by X-Ray Therapy for Menstrual Dysfunction and Sterility Twenty-Eight Years Ago, with Detailed Histories of the Grandchildren Born to These Women.” American Journal of Obstetrics and Gynecology 67: 484–90; March 1954.
174. Karn, M. N. “Twin Data: A Further Study of Birth Weight, Gestation Time, Maternal Age, Order of Birth and Survival.” Annals of Eugenics 17: 233–48; February 1953.
175. Kendall, Norman, and Rose, Elizabeth. “A Mechanism of Studying Neonatal Mortality.” Pediatrics 13: 496–99; May 1954.
176. Kerlinger, Fred N. “The Statistics of the Individual Child: The Use of Analysis of Variance with Child Development Data.” Child Development 25: 265–75; December 1954.
177. Kerr, J. D., and Scott, G. I. “The Retinopathy of Prematurity.” Archives of Disease in Childhood 29: 543–50; December 1954.
178. Kiil, Vilheim. “Frontal Hair Direction in American Chinese, Indian and Negro Populations.” Journal of Heredity 43: 247–48; September-October 1952.
179. Kinsey, E. V. “Retrolental Fibroplasia.” American Journal of Ophthalmology 39: 105–107; January 1955.
180. Kok, J. C. N. “Some Factors Influencing the Longevity of Bull Sperm Cells in Vitro.” Mammalian Germ Cells. (Edited by G. E. W. Wolstenholme.) Boston: Little, Brown and Co., 1953. p. 82–90.
181. Krogman, W. M. “Physical Growth and Development in Relation to Student Success.” Bulletin of the National Association of Secondary-School Principals 39: 449–56; April 1955.
182. Krohn, P. L., and Zuckerman, S. “Reproduction.” Annual Review of Physiology. (Edited by Victor E. Hall.) Stanford, Calif.: Annual Reviews, 1953. p. 429–56.
183. Kuhlenbeck, Hartwig. “The Human Diencephalon: A Summary of Development, Structure, Function, and Pathology.” Confinia Neurologica Supplementum 1954. 230 p.
184. Lamp, N. A. “Volleyball Skills of Junior High School Students as a Function of Physical Size and Maturity.” Research Quarterly of the American Association for Health, Physical Education, and Recreation 25: 189–200; May 1954.
185. Lande-Champain, Lotte. “The Etiology of Mongolism.” Journal of Child Psychiatry 3: 53–69; April 1954.
186. Landucci, L. “Infant Birth and Death Rates in the Province of Siena During the Years 1946–1951.” Lattante 24: 281–95; May 1953.
187. Lanman, J. T. “The Control of Oxygen Therapy for the Prevention of Retrolental Fibroplasia.” Journal of Pediatrics 46: 365–68; March 1955.
188. Lanman, J. T. “The Fetal Zone of the Adrenal Gland: Its Developmental Course, Comparative Anatomy, and Possible Physiologic Function.” Medicine 32: 389–430; December 1953.
189. Lanman, J. T.; Guy, L. P.; and Dancis, J. “Retrolental Fibroplasia and Oxygen Therapy.” Journal of the American Medical Association 155: 223–26; May 15, 1954.
190. Lasker, Gabriel W. “Note on the Nutritional Factor in Howell’s Study of Constitutional Type.” American Journal of Physical Anthropology 10: 375–79; September 1952.
191. Leedham, J. N. “Glutamic Acid in the Treatment of Mental Deficiency.” Medical Officer 93: 117–22, March 4; 133–37, March 11, 1955.
192. Leicester, Henry M. “Dentistry.” Annual Review of Medicine. (Edited by Windsor C. Cutting.) Stanford, Calif.: Annual Reviews, 1954. p. 405–14.
193. LeLong, M., and others. “A New Method of Graphic Recording of Growth.” Presse Medicale 33: 701–704; May 5, 1954.
194. Lepkovsky, Samuel, and Borson, Harry J. “Nutrition and Nutritional Disease.” Annual Review of Medicine. (Edited by David A. Rytand.) Stanford, Calif.: Annual Reviews, 1955. p. 93–124.
195. Levinson, Billey. “Effects of Fetal Irradiation on Learning.” Journal of Comparative and Physiological Psychology 45: 140–45; April 1952.
196. Lewis, William H. “Differences in the Rate and Trend of Mortality for Different Age and Sex Groups in Different Eras.” Journal of Gerontology 8: 318–23; July 1953.
197. Ley, G. D. “Some Aspects of Prolonged Gestation.” Medical Journal of Australia 2: 749–52; November 14, 1953.
198. Li, C. H. “Growth and Adrenocorticotropic Hormones of the Anterior Pituitary.” Harvey Lectures 46: 181–217; 1952.
199. Lilienfeld, Abraham M., and Pasamanick, Benjamin. “A Study of Variations in the Frequency of Twin Births by Race and Socio-Economic Status.” American Journal of Human Genetics 7: 204–17; June 1955.
200. Lindegård, Bengt. “Variations in Human Body-Build: A Somatometric and X-Ray Cephalometric Investigation on Scandinavian Adults.” Acta Psychiatrica et Neurologica Supplementum. Copenhagen: Levin and Munksgaard, 1953. 163 p.
201. Llewellyn-Jones, Derek. “Premature Babies in the Tropics.” Journal of Obstetrics and Gynaecology of the British Empire 62: 275–79; April 1955.
202. Longo, I., and Vianello, A. “The Study of the Reflexes of the Immature and the Premature Newborn Infants.” Lattante 25: 149–62; March 1954.
203. Lorr, Maurice, and Fields, Victor. “A Factorial Study of Body Types.” Journal of Clinical Psychology 10: 182–85; April 1954.
204. Loveless, James C. “Relationship of the War-Time Navy Physical Fitness Test to Age, Height, and Weight.” Research Quarterly of the American Association for Health, Physical Education, and Recreation 23: 347–55; October 1952.
205. Low, Alexander. Growth of Children: Sixty-Six Boys and Sixty Girls Each Measured at Three Days and at One, Two, Three, Four and Five Years of Age. Aberdeen, Scotland: University Press, 1952. 63 p.
206. Luck, J. Murray, editor. Annual Review of Biochemistry. Stanford, Calif.: Annual Reviews, 1952. 781 p.
207. Luck, J. Murray, editor. Annual Review of Biochemistry. Stanford, Calif.: Annual Reviews, 1953. 729 p.
208. Luck, J. Murray, editor. Annual Review of Biochemistry. Stanford, Calif.: Annual Reviews, 1954. 636 p.
209. McArthur, Norma. “The Frequency of Monovular and Binovular Twin Births in Italy, 1949–50.” Acta Geneticae Medicae et Gemellologiae 2: 11–17; January 1953.
210. MacDonald, Ian. “The Growth of the Foetal Head in the Last Weeks of Pregnancy.” Journal of Obstetrics and Gynaecology of the British Empire 60: 61–66; February 1953.
211. MacDonald, Ian. “Hardness of Growth of the Foetal Head.” Journal of Obstetrics and Gynaecology of the British Empire 61: 253–58; April 1954.
212. MacKay, Eaton M. “Endocrinology: Relationships Between the Endocrine and Nervous Systems.” Annual Review of Medicine. (Edited by David A. Rytand.) Stanford, Calif.: Annual Reviews, 1955. p. 359–76.
213. McKeown, Thomas, and Record, R. G. “Influence of Prenatal Environment on Correlation Between Birth Weight and Parental Height.” American Journal of Human Genetics 6: 457–63; December 1954.
214. McLean, J. M. “Lens and Vitreous.” Archives of Ophthalmology 51: 556–69; April 1954.
215. MacLeod, John. “Human Spermatozoan Production in Health and Disease.” Mammalian Germ Cells. (Edited by G. E. W. Wolstenholme.) Boston: Little, Brown and Co., 1953. p. 134–58.
216. MacMahon, B., and McKeown, T. “The Incidence of Harelip and Cleft Palate Related to Birth Rank and Maternal Age.” American Journal of Human Genetics 5: 176–83; June 1953.
217. Mainland, Donald. “Evaluation of the Skeletal Age Method of Estimating Children’s Development: I. Systematic Errors in the Assessment of Roentgenograms.” Pediatrics 12: 114–29; August 1953.
218. Mainland, Donald, and Mainland, Ruth B. “Evaluation of the Skeletal Age Method of Estimating Children’s Development: II. Variable Errors in the Assessment of Roentgenograms.” Pediatrics 13: 165–73; February 1954.
219. Manschot, W. A. “Etiology of Retrolental Fibroplasia.” Archives of Ophthalmology 52: 833–45; December 1954.
220. Marberger, Eve, and Nelson, Warren O. “Sexual Differences in Nuclei of Human Skin.” Journal of Clinical Endocrinology and Metabolism 14: 768; July 1954.
221. Maresh, Marion M. “Linear Growth of Long Bones of Extremities from Infancy Through Adolescence.” American Journal of Diseases of Children 89: 725–42; June 1955.
222. Marshall, F. H. A. Physiology of Reproduction. Third Edition. (Edited by A. S. Parkes.) Volume II. London: Longmans, Green and Co., 1952. 880 p.
223. Martin, W. Edgar. Basic Body Measurements of School Age Children. Washington, D. C.: U.S. Department of Health, Education, and Welfare, Office of Education, 1953. 74 p.
224. Martin, W. Edgar, and Thieme, Fred P. The Functional Body Measurements of School Age Children. Chicago: National School Service Institute, 1954. 90 p.
225. Medawar, P. B. “General Problems of Immunity.” Preservation and Transplantation of Normal Tissues. (Edited by G. E. W. Wolstenholme and Margaret P. Cameron.) Boston: Little, Brown and Co., 1954. p. 1–22.
226. Meredith, Howard V. “Growth in Head Width During the First Twelve Years of Life.” Pediatrics 12: 411–29; October 1953.
227. Meredith, Howard V. “North American Negro Infants: Size at Birth and Growth During the First Postnatal Year.” Human Biology 24: 290–308; December 1952.
228. Meredith, Howard V., and Meredith, E. Matilda. “The Body Size and Form of Present-Day White Elementary School Children Residing in West-Central Oregon.” Child Development 24: 83–102; June 1953.
229. Miettinen, Maija. “On Triplets and Quadruplets in Finland.” Acta Paediatrica 43: 493–96; September 1954.
230. Miller, E. “Grand Multiparas.” Obstetrics and Gynecology 4: 418–25; October 1954.
231. Moricard, R. “Meiosis and Fertilization Studies of Mammalian Ova in Vivo and in Vitro.” Gynaecologia 38: 310–36; August 1954.
232. Moss, Melvin L. “Differential Growth Analysis of Bone Morphology.” American Journal of Physical Anthropology 12: 71–75; March 1954.
233. Mourant, Arthur E. The Distribution of Human Blood Groups. Springfield, Ill.: Charles C. Thomas, 1954. 438 p.
234. Muller, H. J. “Damage to Posterity Caused by Irradiation of the Gonads.” American Journal of Obstetrics and Gynecology 67: 467–83; March 1954.
235. Mundy-Castle, A. C., and others. “The Electroencephalogram in the Senile Psychoses.” Electroencephalography and Clinical Neurophysiology 6: 245–52; May 1954.
236. Needham, Joseph. “Developmental Physiology.” Annual Review of Physiology. (Edited by Victor E. Hall.) Stanford, Calif.: Annual Reviews, 1955. p. 37–60.
237. Nelson, Marjorie M. Mammalian Fetal Development and Antimetabolites. Paper presented to 120th A. A. A. S. Meeting, Medical Science Section, Boston, Mass., December 26–27, 1953.
238. Nelson, Marjorie M.; Lyons, William R.; and Evans, Herbert M. “Comparison of Ovarian and Pituitary Hormones for Maintenance of Pregnancy in Pyridoxine-Deficient Rats.” Endocrinology 52: 585–89; May 1953.
239. Nelson, Warren O. “Reproduction.” Annual Review of Physiology. (Edited by Victor E. Hall.) Stanford, Calif.: Annual Reviews, 1955. p. 443–58.
240. Newman, Russell W., and Munro, Ella H. “The Relation of Climate and Body Size in U.S. Males.” American Journal of Physical Anthropology 13: 1–17; March 1955.
241. Nichols, John B. “Plural Births in the United States.” Western Journal of Surgery, Obstetrics and Gynecology 61: 229–36; May 1953.
242. Nichols, John B. “Quintuplet and Sextuplet Births in the United States.” Acta Geneticae Medicae et Gemellologiae 3: 143–52; May 1954.
243. Nicolson, A. B., and Hanley, C. “Indices of Physiological Maturity: Derivation and Interrelationships.” Child Development 24: 3–38; March 1953.
244. Noback, Charles R. “The Appearance of Ossification Centers and the Fusion of Bones.” American Journal of Physical Anthropology 12: 63–69; March 1954.
245. Obrist, Walter D. “The Electroencephalogram of Normal Aged Adults.” Electroencephalography and Clinical Neurophysiology 6: 235–44; May 1954.
246. Ojemann, Ralph H. “Child Growth and Development.” Children in Focus: Their Health and Activity. 1954 Yearbook. Washington, D. C.: American Association for Health, Physical Education, and Recreation, a department of the National Education Association, 1954. p. 47–55.
247. Olson, Willard C. “Recent Research Findings in Human Growth and Development as They Apply to Teacher Education.” Sixth Yearbook, 1953. Oneonta, N. Y.: American Association of Colleges for Teacher Education, a department of the National Education Association (Secy.-Treas.: Edward C. Pomeroy, 11 Elm Street), 1953. p. 46–63.
248. Olson, Willard C., and Lewellen, J. B. How Children Grow and Develop. Chicago: Science Research Associates, 1953. 48 p.
249. Osgood, Edwin E. “Development and Growth of Hematopoietic Tissues: With a Clinically Practical Method of Growth Analysis.” Pediatrics 15: 733–51; June 1955.
250. Oster, Jacob. Mongolism. Copenhagen: Danish Science Press, 1955. 206 p.
251. Ostry, E. I. “The Effect of Delay in the First Stage of Labour on the Forceps Rate and on the Stillbirth and Neonatal Mortality Rates: An Analysis of 4,401 Consecutive Primigravid Labours at Term with 482 Cases Having a Prolonged First Stage.” Journal of Obstetrics and Gynaecology of the British Empire 62: 115–16; February 1955.
252. Owens, William C., and others. “Symposium: Retrolental Fibroplasia (Retinopathy of Prematurity).” American Journal of Ophthalmology 40: 159–89; August 1955. 253. Oxorn, Harry. “Hazards of Grand Multiparity.” Obstetrics and Gynecology 5: 150–56; February 1955. 254. Paiva, Silvio L. “Pattern of Growth of Selected Groups of Breast-Fed Infants in Iowa City.” Pediatrics 11: 38–47; January 1953. 255. Park, Edwards A. “Bone Growth in Health and Disease.” Archives of Disease in Childhood 29: 269–81; August 1954. 256. Parnell, R. W. “Somatotyping by Physical Anthropometry.” American Journal of Physical Anthropology 12: 209–39; June 1954. 257. Patt, H. M. “Radiation Effects on Mammalian System.” Annual Review of Physiology. (Edited by Victor E. Hall.) Stanford, Calif.: Annual Reviews, 1954. p. 51–80. 258. Patten, Bradley M. Human Embryology. Second edition. London: J. and A. Churchill, 1953. 798 p. 259. Patz, Arnall. “Oxygen Studies in Retrolental Fibroplasia: IV. Clinical and Experimental Observations.” American Journal of Ophthalmology 38: 291–308; September 1954. 260. Patz, Arnall, and others. “Oxygen Studies in Retrolental Fibroplasia: II. The Production of the Microscopic Changes of Retrolental Fibroplasia in Experimental Animals.” American Journal of Ophthalmology 36: 1511–22; November 1953. 261. Peckos, Penelope S. “Caloric Intake in Relation to Physique in Children.” Science 117: 631–33; June 1953. 262. Penrose, L. S. “Mongolian Idiocy (Mongolism) and Maternal Age.” Annals of the New York Academy of Science 57: 494–502; January 1954. 263. Pere, Soini; Kunnas, Mikko; and Telkkä, Antti. “Correlation Between Performance and Physique in Finnish Athletes.” American Journal of Physical Anthropology 12: 201–208; June 1954. 264. Phillips, H. T. “Some Social and Ethnic Variations in the Physique of South African Nursery School Children.” Archives of Disease in Childhood 28: 226–31; June 1953. 265. Plummer, George.
“Anomalies Occurring in Children Exposed in Utero to the Atomic Bomb in Hiroshima.” Pediatrics 10: 687–93; December 1952. 266. Posner, A. C.; Friedman, S.; and Posner, L. B. “The Large Fetus.” Obstetrics and Gynecology 5: 268–78; March 1955. 267. Pugh, M. C. “Charting Growth with the Wetzel Grid.” Research Quarterly of the American Association for Health, Physical Education, and Recreation 25: 47–48; March 1954. 268. Race, R. R., and Sanger, Ruth. Blood Groups in Man. Springfield, Ill.: Charles C. Thomas, 1954. 400 p. 269. Rand, Winifred, and others. Growth and Development of the Young Child. Fifth edition. Philadelphia: W. B. Saunders Co., 1953. 523 p. 270. Rarick, G. L. “Maturity Indicators and the Development of Strength and Skill.” Education 75: 69–73; October 1954. 271. Reynolds, Earle L. The Distribution of Subcutaneous Fat in Childhood and Adolescence. Evanston, Ill.: Child Development Publications, 1952. 189 p. 272. Reynolds, S. R. M.; Paul, W. M.; and Huggett, A. “Physiological Study of Monkey Fetus in Utero: A Procedure for Blood Pressure Recording, Blood Sampling and Injection of the Fetus under Normal Conditions.” Bulletin of the Johns Hopkins Hospital 95: 256–68; November 1954. 273. Ridley, Harold. “Further Observations on Intraocular Acrylic Lens in Cataract Surgery.” Transactions of the American Academy of Ophthalmology and Otolaryngology 57: 98–106; January-February 1953. 274. Roberts, D. F. “Body Weight, Race and Climate.” American Journal of Physical Anthropology 11: 533–58; December 1953.
275. Rubin, I. C. “Third Generation Follow-up in Women Receiving Pelvic Irradiation.” Journal of the American Medical Association 150: 207–209; September 20, 1952. 276. Rugh, Roberts. “Genetic Hazards in Ovarian Radiation.” Journal of Obstetrics and Gynaecology of the British Empire 62: 461–63; June 1955. 277. Russell, L. B., and Russell, W. L. “An Analysis of the Changing Radiation Response of the Developing Mouse Embryo.” Journal of Cellular and Comparative Physiology 43: Supplement 1, 103–49; May 1954. 278. Schlesinger, Edward R., and Allaway, Norman C. “The Combined Effect of Birth Weight and Length of Gestation on Neonatal Mortality among Single Premature Births.” Pediatrics 15: 698–704; June 1955. 279. Schmitz, Karl L. “The Calculation of the Body Surface.” Zeitschrift für Biologie 106: 325–29; March 1954. 280. Sheldon, William A. Atlas of Men: A Guide for Somatotyping the Adult Male at All Ages. New York: Harper and Brothers, 1954. 357 p. 281. Sheridan, F. P., and others. “Electroencephalography as a Diagnostic and Prognostic Aid in Studying the Senescent Individual: A Preliminary Report.” Journal of Gerontology 10: 53–59; January 1955. 282. Sholl, D. A. “Regularities in Growth Curves, Including Rhythms and Allometry.” Dynamics of Growth Processes. (Edited by Edgar J. Boell.) Princeton, N. J.: Princeton University Press, 1954. p. 224–41. 283. Silverman, William A.; Fertig, John W.; and Kraus, Arthur. “A Proposed Method of Computing Standardized Death Rates for Premature Nurseries.” Pediatrics 15: 467–78; April 1955. 284. Simon, A., and others. “Metabolic Studies in Mongolism: Serum Protein-Bound Iodine, Cholesterol and Lipoprotein.” American Journal of Psychiatry 111: 139–45; August 1954. 285. Škerlj, Bozo; Brožek, Josef; and Hunt, Edward E., Jr. “Subcutaneous Fat and Age Changes in Body Build and Body Form in Women.” American Journal of Physical Anthropology 11: 577–600; December 1953. 286. Smith, Audrey U., and Parkes, A. S.
“Storage and Homografting of Endocrine Tissues.” Preservation and Transplantation of Normal Tissues. (Edited by G. E. W. Wolstenholme and Margaret P. Cameron.) Boston: Little, Brown and Co., 1954. p. 76–85. 287. Smith, Thomas C. “The Action of Relaxin on Mammary Gland Growth in the Rat.” Endocrinology 54: 59–70; January 1954. 288. Soffer, Louis J., and Gabrilove, J. Lester. “Endocrinology.” Annual Review of Medicine. (Edited by Windsor C. Cutting.) Stanford, Calif.: Annual Reviews, 1954. p. 115–66. 289. Sontag, Lester W., and Garn, Stanley M. “Growth.” Annual Review of Physiology. (Edited by Victor E. Hall.) Stanford, Calif.: Annual Reviews, 1954. p. 37–50. 290. Spies, Tom D., and others. “Skeletal Maturational Progress in Children with Chronic Nutritive Failure.” American Journal of Diseases of Children 85: 1–12; January 1953. 291. Spratt, Nelson T., Jr. “Developmental Physiology.” Annual Review of Physiology. (Edited by Victor E. Hall.) Stanford, Calif.: Annual Reviews, 1953. p. 21–38. 292. Staton, Wesley M. “The Adolescent: His Physical Growth and Health.” Review of Educational Research 24: 19–29; February 1954. 293. Stewart, T. D. “Metamorphosis of the Joints of the Sternum in Relation to Age Changes in Other Bones.” American Journal of Physical Anthropology 12: 519–35; December 1954. 294. Stodard, Roy. “Prenatal Environment and Congenital Malformations.” Bulletin of the Tulane Medical Faculty 12: 123–27; May 1953. 295. Strom, Justus. “The Decrease of Infant Mortality in Sweden and Its Causes.” Nordisk Medicin 50: 1285–94; September 17, 1953.
296. Suarez, Manuel. “Growth: Critical Study of Graphic Methods of Expression of Growth.” Revista Española de Pediatria 8: 595–605; September-October 1952. 297. Suarez, Manuel. “Growth: Critical Study of Somatometry.” Revista Española de Pediatria 8: 571–94; September-October 1952. 298. Suarez, Manuel, and Peva, J. “Growth: Correlation Between Creatininuria and Radiographic Muscular Area as Index of Muscle Development.” Revista Española de Pediatria 8: 663–73; September-October 1952. 299. Suarez, Manuel, and Peva, J. “Growth: Creatininuria as Index of Muscular Development.” Revista Española de Pediatria 8: 633–52; September-October 1952. 300. Suarez, Manuel, and Peva, J. “Growth: Radiographic Index of Muscle Development.” Revista Española de Pediatria 8: 653–62; September-October 1952. 301. Suarez, Manuel, and Tirjeira, J. “New Method of Graphic Presentation of Growth.” Revista Española de Pediatria 8: 439–50; July-August 1952. 302. Sutow, Wataru W. “Skeletal Maturation in Healthy Japanese Children, 6 to 19 Years of Age: Comparison with Skeletal Maturation in American Children.” Hiroshima Journal of Medical Science 2: 181–93; 1953. 303. Sutow, Wataru W.; Terasaki, Taro; and Ohwada, Kenji. “Comparison of Skeletal Maturation with Dental Status in Japanese Children.” Pediatrics 14: 327–33; October 1954. 304. Swingle, W. W., and Kleinberg, William. “The Pituitary and Adrenals.” Annual Review of Physiology. (Edited by Victor E. Hall.) Stanford, Calif.: Annual Reviews, 1955. p. 367–92. 305. Taff, M. A., Jr., and Wilbar, C. L., Jr. “Immaturity of Single Live Births According to Weight, with Particular Reference to Race.” American Journal of Diseases of Children 85: 279–84; March 1953. 306. Talbot, Nathan B., and others. Functional Endocrinology from Birth Through Adolescence. Cambridge, Mass.: Harvard University Press, 1952. 638 p. 307. Tanner, J. M. “The Effect of Weight-Training on Physique.” American Journal of Physical Anthropology 10: 427–60; December 1952. 308.
Tanner, J. M. “Reliability of Anthropometric Somatotyping.” American Journal of Physical Anthropology 12: 257–65; June 1954. 309. Terner, C. “Aerobic Metabolism and Semen Quality.” Mammalian Germ Cells. (Edited by G. E. W. Wolstenholme.) Boston: Little, Brown and Co., 1953. p. 46–58. 310. Theman, V. “Emerging Concepts of Child Growth and Development: What They Suggest for Classroom Practice.” The American Elementary School. Thirteenth Yearbook, John Dewey Society. New York: Harper and Brothers, 1953. p. 57–86. 311. Thompson, George G. Child Psychology. New York: Houghton Mifflin Co., 1952. 667 p. 312. Toverud, G., and others. A Survey of the Literature of Dental Caries. Washington, D. C.: National Research Council, 1952. 567 p. 313. Trotter, Mildred, and Gleser, Goldine C. “Estimation of Stature from Long Bones of American Whites and Negroes.” American Journal of Physical Anthropology 10: 463–514; December 1952. 314. Tuddenham, Read D., and Snyder, Margaret M. Physical Growth of California Boys and Girls from Birth to Eighteen Years. Publications in Child Development, Vol. 1, No. 2. Berkeley: University of California Press, 1954. p. 183–364. 315. Tyler, Frank H., and Armstrong, Marion D. “Diseases of the Nervous System: Metabolic Aspects of Some Neurological and Muscular Disorders.” Annual Review of Medicine. (Edited by Windsor C. Cutting.) Stanford, Calif.: Annual Reviews, 1954. p. 207–22. 316. Tyler, Fred T. “Concepts of Organismic Growth: A Critique.” Journal of Educational Psychology 44: 321–42; October 1953. 317. Tyler, Fred T. “Organismic Growth: P-Technique in the Analysis of Longitudinal Growth Data.” Child Development 25: 83–90; June 1955.
318. United Nations, Department of Social Affairs, Population Division. Foetal, Infant and Early Childhood Mortality. Volume I: The Statistics. New York: Columbia University Press, International Documents Service, 1955. 137 p. 319. United Nations, Department of Social Affairs, Population Division. Foetal, Infant and Early Childhood Mortality. Volume II: Biological, Social and Economic Factors. New York: Columbia University Press, International Documents Service, 1955. 44 p. 320. U.S. Department of Health, Education, and Welfare, Social Security Administration, Children’s Bureau. Research Relating to Children: January 1, 1952–October 31, 1952. Bulletin II. Washington, D.C.: Superintendent of Documents, Government Printing Office, 1953. 459 p. 321. U.S. Department of Health, Education, and Welfare, Social Security Administration, Children’s Bureau. Research Relating to Children: January 1, 1952–March 31, 1954. Bulletin II, Supplement No. 1. Washington, D.C.: Superintendent of Documents, Government Printing Office, 1954. 76 p. 322. U.S. Department of Health, Education, and Welfare, Social Security Administration, Children’s Bureau. Research Relating to Children: November 1, 1952–May 31, 1954. Bulletin II, Supplement No. 2. Washington, D.C.: Superintendent of Documents, Government Printing Office, 1954. 195 p. 323. U.S. Department of Health, Education, and Welfare, Social Security Administration, Children’s Bureau. Research Relating to Children: April 1, 1954–January 31, 1955. Bulletin II, Supplement No. 3. Washington, D.C.: Superintendent of Documents, Government Printing Office, 1955. 235 p. 324. Verrotti, M. “An Analysis of the Causes and Frequency of Prematurity, Especially During the War Years.” Lattante 24: 449–57; July 1953. 325. Walker, James. “Foetal Anoxia.” Journal of Obstetrics and Gynaecology of the British Empire 61: 162–80; April 1954. 326. Wallace, Helen M., and others.
“Trends in Maternal and Perinatal Mortality in New York City.” Journal of the American Medical Association 155: 716–19; June 1954. 327. Washburn, Alfred H. “Human Growth, Development, and Adaptation.” American Journal of Diseases of Children 90: 2–5; July 1955. 328. Wedgwood, Ralph J., and others. “Relationship of Body Composition to Basal Metabolic Rate in Normal Man.” Journal of Applied Physiology 6: 317–34; December 1953. 329. Weech, A. A. “Signposts on the Highway of Growth.” American Journal of Diseases of Children 88: 452–57; October 1954. 330. Wegman, Myron E. “Public Health, Nursing and Medical Social Work: Vital Statistics in the U.S.A.—1953.” Pediatrics 13: 588–89; June 1954. 331. Wegman, Myron E. “Weight at Birth and Survival of the Newborn.” Pediatrics 14: 396–400; October 1954. 332. Weiner, J. S., and Thambipillai, V. “Skeletal Maturation of West African Negroes.” American Journal of Physical Anthropology 10: 407–18; December 1952. 333. Wenar, Charles. “The Effect of a Motor Handicap on Personality: II. The Effects on Integrative Ability.” Child Development 25: 287–94; December 1954. 334. Wetzel, Norman C., and others. “Growth Failure in School Children: Further Studies of Vitamin B12 Dietary Supplements.” Journal of Clinical Nutrition 1: 17–31; September-October 1952. 335. Wilson, James G. “Differentiation and the Reaction of Rat Embryos to Radiation.” Journal of Cellular and Comparative Physiology 43: Supplement 1, 11–37; May 1954. 336. Wilson, James L. “Pediatrics.” Annual Review of Medicine. (Edited by Windsor C. Cutting.) Stanford, Calif.: Annual Reviews, 1954. p. 389–404. 337. Wolff, Etienne, and others. “The Culture of Embryonic Organs in Synthetic Media.” Journal of Embryology and Experimental Morphology 1: 55–84; March 1953.
338. Wolstenholme, G. E. W., editor. Mammalian Germ Cells. Boston: Little, Brown and Co., 1953. 302 p. 339. Wolstenholme, G. E. W., and Cameron, Margaret P., editors. Preservation and Transplantation of Normal Tissues. Boston: Little, Brown and Co., 1954. 236 p. 340. Warkany, Josef. Congenital Malformations Induced by Maternal Dietary Deficiency. Harvey Lectures Series. New York: J. B. Lippincott, 1952–53. 273 p. 341. Yamazaki, James N.; Wright, Stanley W.; and Wright, Phyllis M. “Outcome of Pregnancy in Women Exposed to the Atomic Bomb in Nagasaki.” American Journal of Diseases of Children 87: 448–63; April 1954. 342. Zacharias, L., and others. “The Incidence and Severity of Retrolental Fibroplasia in Relation to Possible Causative Factors.” American Journal of Ophthalmology 38: 317–36; September 1954. 343. Zubek, John P., and Solberg, Patricia A. Human Development. New York: McGraw-Hill Book Co., 1955. 476 p.
21
Mental Development during the Preadolescent and Adolescent Periods
Gordon Hendrickson
Summaries and General Treatments
Segel (57) prepared a well-organized monograph on the development of intellectual abilities in the adolescent period. He based his discussion on a wide variety of sources as well as on some new data. Other surveys of the research literature may be found in the revised Encyclopedia of Educational Research (42), and in articles by Jones and Bayley (33), and Thorndike (68). Textbooks on child or adolescent psychology by Averill (3), Breckenridge and Vincent (10), Cole (14), Cole and Morgan (15), Hurlock (30), and Olson (46) included sections on mental development. Olson drew many inferences for school practice from the research literature of the period. Anderson (2) proposed a classification for literature pertinent to child development. Gesell and Ilg (25) combined their earlier works on infant and child development, stressing characteristics of successive age levels up to age 10. Jenkins, Shacter, and Bauer (31) prepared a popular treatment of the characteristics of children at each age from five to 11, and Schnell (55) wrote a digest on psychological characteristics of youth at four age levels from 10 to 21. In a pamphlet addressed to adolescents, Bouthilet and Bryne (9) discussed the factors in general intelligence and provided a helpful orientation for prospective test-takers.
Source: Review of Educational Research, XX(5) (1950): 351–360.
The Organization of Intellectual Powers
Theoretical discussions as well as research on the organization of intellect have revolved around the problems of traits or factors. Burt (12) argued that the evidence points to a hierarchy of a general factor and a small number of broad group factors, subdivided into narrower group factors; i.e., to factors arranged by levels. To the primary group abilities recognized by Thurstone, Burt (11) would add a general factor. Anastasi (1) regarded traits as results of learning and, due to the greater cultural standardization of intellectual activities, considered traits as more consistent and easier to identify in the intellectual aspects of behavior than in the emotional aspects. New tests produced in this period were chiefly analytical in character and generally yielded several scores. Notable are the SRA Primary Mental Abilities tests (PMA) by the Thurstones (69, 70), appropriate for subjects from the junior high school thru the college level, and a set of seven differential aptitude tests by Bennett, Seashore, and Wesman (5).
Adolescent Intellectual Abilities
A number of studies attempted to isolate specific adolescent intellectual traits or dealt with their relationships. Johnson (32) studied problem-solving abilities in arithmetic at the eighth-grade level. Of the PMA tests, the vocabulary test gave the highest correlation with arithmetic problem tests. The flow of words in writing was studied by Taylor (66). Taylor analyzed fluency for high-school seniors into two factors: word fluency, i.e., facility in producing single, isolated words; and ideational fluency, facility in expressing ideas by means of words and their meanings. Murray (44) employed a multiple correlation procedure to analyze the geometric ability of high-school boys. He found spatial ability, as measured by the Minnesota Paper Form Board, and reasoning, as measured by the PMA tests, contributed less to success in geometry than numerical or verbal ability, as measured by the Modified Alpha Examination. Fattu and Fox (24) found the ability of ninth-grade pupils to interpret data to be closely associated with factors which make up typical group measures of intelligence and achievement. A unique approach to traits is found in two French studies by Michaud (38, 39). He was concerned with the interpretation which pupils give to geometric figures, and asked children aged nine to 14 to interpret the thickness of squares drawn on a blackboard or on paper. He found the percentage of realistic responses to diminish, and the percentage of rational responses to increase with age. In the second study, subjects aged 10 to 15 were asked what would happen if a triangle which they imagined drawn on the ground were superimposed upon another imagined triangle; realistic responses were again more characteristic of younger children.
Factor Analysis Studies
The most frequent procedure in attacking the problem of intellectual organization continued to be that of factor analysis. This procedure has been used to study changes with age, the relative importance of various factors for prediction, and other issues. In a group of studies, Swineford (62, 63, 65) reported the results of test batteries administered to pupils in Grades V to X and repeated at various intervals. Six tests were given to pupils in Grades VII or VIII and repeated when the pupils were in Grade IX. After one or two years factor analysis revealed no material change in the factor composition of the tests. The general factor apparently increased both in its absolute and in its relative contribution to the total test variance. For a group of pupils who took nine tests in the sixth grade and again in the ninth grade, three bi-factors persisted as entities but grew at different rates; the general factor most, the verbal factor to some degree, the spatial factor not at all. The means for the general factor increased repeatedly and steadily with school grade level from Grade V to X. The means for the verbal factor increased gradually and irregularly. Retarded pupils were markedly inferior in the general factor, less so in the verbal factor, and equivalent to the normal group in the spatial factor. According to Swineford the general factor is the only one which predicted school marks with any consistency. Another report by Swineford (64) dealt with a number factor revealed by data from 19 tests given to ninth-grade pupils. This factor may be related to the pupil’s mental set in approaching a task, a set determined by his liking or dislike for numbers. Swineford also inferred from the data that girls are more affectively sensitive to numbers than boys. Curtis (16) also presented data emphasizing the importance of a general factor.
His data on nine- and 12-year-olds failed to support Garrett’s hypothesis that “abstract or symbol intelligence changes in its organization as age increases from a fairly unified and general ability to a loosely organized group of abilities or factors.” On the other hand, Segel (57) accepted Garrett’s conclusion and presented new data to show that differentiation among traits is more pronounced for bright ninth-grade pupils than for dull ones. Diamond (21) used a factor analysis procedure which he believed showed that the Wechsler-Bellevue subtests may serve as indicators of linguistic, clerical, and spatial aptitudes.
Development in Specific Traits
A number of investigators compared subjects at various ages by measures which were designed to reveal growth in particular traits. Webber and Hunnicutt (74) studied improvement in the ability to perceive change of color in painting with subjects from Grades I thru IX. Birch (6) found the Goodenough
drawing test valuable in studying the processes of concept formation in a group of borderline or mentally defective children aged 10 to 16 years. Three investigators were interested in moral traits. Turner (71) developed a scale of altruism and found no improvement from age nine to age 16. Beller (4) studied the attitudes toward honesty of boys aged nine, 12, and 15 years. On the basis of verbal problems, Dowd (22) studied moral reasoning in Catholic girls from Grades VIII to XII. Hilden (27) reported a study of 100 children from birth, 30 of whom had taken several mental tests by the age of 16. The mean IQ of the subjects was 119 with a range of differences between repeated tests from seven to 64 points. On the average there was a slow and reliable rise in score not accounted for by practice effects. Hilden suggested that the highest IQ score prior to puberty might be more representative of mid-adolescent status than scores on the early test. Another retest study, by Kvaraceus and Lanigan (36), reported data on the Iowa Every-Pupil Tests of Basic Skills administered at half-year intervals in the junior high school and indicated that individual performance at any one testing period should be interpreted with discretion; in some cases scores drop for a test period. A European study presented results from tests of subjects of various age levels. Vernon (72) found general intelligence increasing more rapidly and to a later age among boys who continue in school to the age of 17 and beyond, and among men in “intellectual” occupations. In general, Vernon concluded that abilities depend largely on the extent to which they are used.
Gains in Intelligence during College Years
Retests of students on the American Council on Education Psychological Examination were reported from several colleges. In general, the investigators concluded that gains over and above practice effects do occur. Thorndike (67) found such gains occurring to the age of 20 and probably beyond. Projections of the growth curve for his data indicated either age 21 years, six months or 25 years, nine months as a point of zero gain, depending upon the mathematical treatment of the data. Shuey (59) found gains for college students.
Prediction of Academic Success
Studies on prediction of academic success in high school and college ranged from those which employed simple correlations of test scores and grades to multiple correlation and factor analysis studies. Shaw (58) used multiple correlation and Beta coefficients in treating data from the PMA tests and 13 measures of achievement for 591 high-school students. He found verbal-meaning to be highly related to every achievement measured, with reasoning in second
place but not closely so. Little power to predict achievement was found for number, word fluency, space, and memory scores. A study evaluating several tests for prediction of high-school achievement was reported by Bolton (7). At the college level Remmers, Elliott, and Gage (48) found certain tests developed at Purdue (Placement Test in English, Mathematics Training Test, Physical Science Test) more predictive of grade point averages of freshmen than scores on the American Council on Education Psychological Examination. Lanigan (37) found that the ACE differentiated better between high-achieving and low-achieving college students than the Otis test or the Minnesota Speed of Reading Tests. At the University of Wisconsin, Milligan, Lins, and Little (40) also found the ACE especially useful for identifying students at the upper and lower ends of the distribution of intelligence. They reported the ACE helpful in predicting achievement for non-high-school graduates admitted to the university. Borg (8) reported low positive correlations between the ACE and success in a college of arts and crafts. Investigators using the ACE repeatedly on the same group were warned by Muntyan (43) that the norms for a first testing cannot be justifiably used in interpreting the results of a retest.
Miscellaneous Relationships of Intellectual Abilities
Kendall (34) reported that there was no significant relationship of scores on a memory-for-design test with retardation in reading for a group of children aged six to 16. Hobson (29) gave PMA tests in Grades VIII and IX. Significant sex differences were found. Boys were superior in spatial orientation, and girls excelled in word fluency, inductive reasoning, and visual memory. Wheeler and Wheeler (75) inferred from correlations between ACE and reading test scores of university freshmen that ACE performance is highly influenced by reading skill. A Dutch version of the National Intelligence Test was used by de Groot (20) to study the effects of war upon the intelligence of youth. His 13- and 14-year-old subjects averaged four IQ points lower than similar subjects tested in prewar years.
Intellect in Relation to Social Factors
Davis and Havighurst (19) prepared a general report on cultural factors claimed to produce differential test results in various socio-economic groups. These writers continued to present evidence on this problem as well as general discussions of their theoretical position (18). Schulman and Havighurst (56) found a correlation of .46 between vocabulary size and socioeconomic status for children in Grades IX and X in a midwestern community.
Durea (23) presented some evidence indicating that the mental retardation of delinquent boys aged 11 to 18 may be a reflection of the sub-par socioeconomic conditions from which the delinquents come.
Intellectual Growth of Feeble-minded Children
By the publication of claims that feeble-minded children had been made normal thru education, Schmidt (52, 53) precipitated one of the most violent psychological controversies of recent years. Popular articles on Schmidt’s work by Stern (61) and Clark (13) challenged long-settled beliefs concerning the improvement of the mentally deficient. Schmidt reported an eight-year study of 322 school children, aged 12 to 14, ranging in IQ from 27 to 69, including experimental and control groups. For three years the experimental subjects were taught in a school environment planned to decrease nervous tensions, to remove emotional blocks, to further social interaction, and to develop self-confidence and a sense of personal worth. Regular school subjects plus hand work were taught at a slower rate than normal to a control group. A five-year follow-up permitted study of out-of-school or later school adjustment of these individuals. The results for the experimental group included (a) gains in social adjustment and maturity and in Bernreuter scores, (b) the completion of a four-year high-school course by 27 percent of the group, (c) a good employment record for children out of school, and (d) an increase in IQ from an initial mean of 52.1 to 71.6 after three years of training and to 89.3 after the five-year follow-up. Kirk (35) reviewed Schmidt’s study in the light of an investigation of pertinent data in board of education records in Chicago, where Schmidt was a teacher. Kirk raised questions concerning: (a) the correspondence of the initial IQ distributions for the subjects with the statistics for Chicago special classes as a whole; (b) the appropriateness of the Bernreuter test for pupils of the mental status of the subjects; (c) certain statistical anomalies in the presentation of the data; and (d) the professional status of Schmidt at the time of the study.
Schmidt (54) replied in general terms stressing scientific method, similar results reported by other investigators, and professional ethics. A survey by Nolan (45) revealed considerable doubt of the validity of the results on the part of several well-known psychologists. Other evidence on intellectual changes in mental defectives is conflicting. Rudolf (49) reported that on the Wechsler-Bellevue verbal scale and on the Vineland Social Maturity Scale 395 defectives showed more rises than declines on retests six months or more after initial tests. The inference was drawn that defectives should be given continued education after the age of 16. Guertin (26) reported on the mental growth curve of 25 institutionalized defectives whose IQ scores showed marked increase over a period of time in comparison with the IQ performance of 25 controls who failed to show improvement.
A report by Hill (28) on retests of 107 special-class children in Des Moines showed occasional significant changes in IQ, possibly due to the social environment, but there were no consistent gains such as those reported by Schmidt. Sloan and Harman (60) studied 1446 institutionalized mental defectives, for whom the median chronological age at initial testing was 14.4, at final testing, 17.6; the corresponding median IQ’s were 51.9 and 47.4. Cutts and Lane (17) reported that 57 defectives who had been hospitalized for seven years received lower scores on the Wechsler-Bellevue verbal scale than 57 defectives hospitalized for one year. Two studies dealt with educational programs for adolescent special-class pupils. Mones (41) discussed 10 years of experience in Newark, New Jersey, where a specially adapted program at the junior-high-school level proved profitable to special-class children.
Chemical Regulators of Intellectual Growth Glutamic acid has been claimed by several investigators to stimulate intellectual growth on the part of mental defectives. Waelsch (73) reviewed 23 references on this subject. Zimmerman, Burgemeister, and Putnam (77) reported on a series of clinical cases, ranging from infancy to adolescence, and concluded that glutamic acid accelerates mental functioning in human beings, chiefly in the first six months of treatment. A ceiling of improvement is apparently reached after one year of therapy. Zimmerman (76) suggested a definite dosage and claimed that the treatment had value for children in the 70 to 80 IQ range. Quinn and Durling (47) reported small gains (three to five IQ points) for institutionalized defectives treated with glutamic acid for six months. Rudolf (50, 51) investigated the value of thiamine treatment. Out of 90 defectives who had not improved for over a year, all of whom were treated with thiamine, 17 showed some increase in IQ, and 20 showed an increase in social age.
Unsettled Issues Few issues in this field can be regarded as closed, but a list of a few unsolved problems in which there is current interest may be helpful. Several of the following research areas were suggested by Segel’s review (57): (a) The existence or significance of a general factor in intellect; (b) Increase of differentiation among traits with age; (c) Relative variation of traits within the individual and within groups; (d) Stability of the mental growth of individuals; (e) Independence of time cycles for growth of various traits; (f) Existence of definite interest areas in early adolescence and their relationship to intelligence; and (g) Relationship of intellectual traits and level to socio-economic factors.
Bibliography
1. Anastasi, Anne. “The Nature of Psychological ‘Traits.’ ” Psychological Review 55: 127–38; May 1948.
2. Anderson, John E. Classification and Index for the Child Development, Human Development, and Psychology Areas. Minneapolis, Minnesota: Institute of Child Welfare, 1947. 98 p.
3. Averill, Lawrence A. The Psychology of the Elementary School Child. New York: Longmans, Green and Company. 459 p.
4. Beller, Emanuel K. “Two Attitude Components in Younger Boys.” Journal of Social Psychology 29: 137–51; May 1949.
5. Bennett, George K., Seashore, Harold G., and Wesman, Alexander G. Differential Aptitude Tests. New York: Psychological Corporation, 1947.
6. Birch, Jack W. “The Goodenough Drawing Test and Older Mentally Retarded Children.” American Journal of Mental Deficiency 54: 218–24; October 1949.
7. Bolton, Floyd B. “Value of Several Intelligence Tests for Predicting Scholastic Achievement.” Journal of Educational Research 41: 133–38; October 1947.
8. Borg, Walter R. “A Study of the Relationship Between General Intelligence and Success in an Art College.” Journal of Educational Psychology 40: 434–40; November 1949.
9. Bouthilet, Lorraine, and Bryne, Katherine M. You and Your Mental Abilities. Chicago: Science Research Associates, 1948. 48 p.
10. Breckenridge, Marian E., and Vincent, E. Lee. Child Development; Physical and Psychological Growth Through the School Years. Philadelphia: W. B. Saunders Company, 1949. 622 p.
11. Burt, Cyril. “Critical Notice of Thurstone’s ‘Multiple Factor Analysis.’ ” British Journal of Educational Psychology 17: 163–69; November 1947.
12. Burt, Cyril. “The Structure of the Mind; A Review of the Results of Factor Analysis.” British Journal of Educational Psychology 19: 176–99; June 1949.
13. Clark, Thomas B. “They Are Feeble-minded No Longer.” Reader’s Digest 51: 111–15; September 1947.
14. Cole, Luella. Psychology of Adolescence. New York: Rinehart and Company, 1948. 650 p.
15. Cole, Luella, and Morgan, John J. B. Psychology of Childhood and Adolescence. New York: Rinehart and Company, 1947. 416 p.
16. Curtis, Hazen A. “A Study of the Relative Effects of Age and of Test Difficulty upon Factor Patterns.” Genetic Psychology Monographs 40: 99–148; August 1949.
17. Cutts, Richard A., and Lane, Margery O’Kelley. “The Effect of Hospitalization on Wechsler-Bellevue Subtest Scores by Mental Defectives.” American Journal of Mental Deficiency 51: 391–93; January 1947.
18. Davis, W. Allison. Social-Class Influences upon Learning. Inglis Lecture, 1948. Cambridge, Massachusetts: Harvard University Press, 1948. 100 p.
19. Davis, W. Allison, and Havighurst, Robert J. “The Measurement of Mental Systems. (Can Intelligence Be Measured?)” Scientific Monthly 66: 301–16; April 1948.
20. de Groot, A. D. “The Effects of War upon the Intelligence of Youth.” Journal of Abnormal and Social Psychology 43: 311–17; July 1948.
21. Diamond, Solomon. “The Wechsler-Bellevue Intelligence Scales and Certain Vocational Aptitude Tests.” Journal of Psychology 24: 279–82; October 1947.
22. Dowd, M. Amedeus. “Changes in Moral Reasoning Through the High School Years.” Studies in Psychology and Psychiatry from the Catholic University of America 7, No. 2; 1948. 120 p.
23. Durea, Merrin A., and Taylor, G. J. “The Mentality of Delinquent Boys Appraised by the Wechsler-Bellevue Intelligence Tests.” American Journal of Mental Deficiency 52: 342–44; April 1948.
24. Fattu, Nicholas A., and Fox, William R. “Scores on the Interpretation of Data Test: Their Relation to Measures of Achievement, Personality, and Interest.” Indiana University School of Education Bulletin 25: 1–54; May 1949.
25. Gesell, Arnold, and Ilg, Frances L. Child Development: An Introduction to the Study of Human Growth. New York: Harper and Brothers, 1949. 475 p.
26. Guertin, Wilson H. “Mental Growth in Pseudo-Feeblemindedness.” Journal of Clinical Psychology 5: 414–18; October 1949.
27. Hilden, Arnold H. “A Longitudinal Study of Intellectual Development.” Journal of Psychology 28: 187–214; July 1949.
28. Hill, Arthur. “Does Special Education Result in Improved Intelligence for the Slow Learner?” Journal of Exceptional Children 14: 207–13; April 1948.
29. Hobson, James R. “Sex Differences in Primary Mental Abilities.” Journal of Educational Research 41: 126–32; October 1947.
30. Hurlock, Elizabeth B. Child Growth and Development. New York: McGraw-Hill Book Company, 1949. 374 p.
31. Jenkins, Gladys G., Shacter, Helen, and Bauer, William W. These Are Your Children: How They Develop and How To Guide Them. Chicago: Scott, Foresman and Company, 1949. 192 p.
32. Johnson, John T. “On the Nature of Problem-solving in Arithmetic.” Journal of Educational Research 43: 110–15; October 1949.
33. Jones, Harold E., and Bayley, Nancy. “Growth, Development, and Decline.” Annual Review of Psychology 1: 1–8; 1950.
34. Kendall, Barbara S. “A Note on the Relation of Retardation in Reading to a Performance on a Memory-for-Designs Test.” Journal of Educational Psychology 39: 370–73; October 1948.
35. Kirk, Samuel A. “An Evaluation of the Study by Bernardine G. Schmidt Entitled: ‘Changes in Personal, Social, and Intellectual Behavior of Children Originally Classified as Feebleminded.’ ” Psychological Bulletin 45: 321–33; July 1948.
36. Kvaraceus, William C., and Lanigan, Mary A. “Pupil Performance on the Iowa Every-Pupil Tests of Basic Skills Administered at Half-Year Intervals in the Junior High School.” Educational and Psychological Measurement 8: 93–100; Spring 1948.
37. Lanigan, Mary A. “The Effectiveness of the Otis, the A.C.E. and the Minnesota Speed of Reading Tests for Predicting Success in College.” Journal of Educational Research 41: 289–96; December 1947.
38. Michaud, S. “L’Enfant et les Figures Geometriques.” (The Child and Geometric Figures.) Journal de Psychologie Normale et Pathologique 40: 154–68; April–June 1947.
39. Michaud, E. “L’Interpretation de Figures Geometriques par l’Enfant.” (The Child’s Interpretation of Geometric Patterns.) Journal de Psychologie Normale et Pathologique 42: 295–308; July–September 1949.
40. Milligan, Edward E., Lins, L. Joseph, and Little, Kenneth. “The Success of Non-High School Graduates in Degree Programs at the University of Wisconsin.” School and Society 67: 27–29; January 10, 1948.
41. Mones, Leon. “The Binet Pupils Get a Chance.” School and Society 67: 281–83; April 10, 1948.
42. Monroe, Walter S., editor. Encyclopedia of Educational Research. New York: The Macmillan Company, 1950. 1520 p.
43. Muntyan, Milosh. “A Study of the Re-test Factor in the Illinois Statewide High School Testing Program.” Journal of Educational Research 41: 183–92; November 1947.
44. Murray, John E. “An Analysis of Geometric Ability.” Journal of Educational Psychology 40: 118–24; February 1949.
45. Nolan, William J. “A Critique of the Evaluations of the Study of Bernardine G. Schmidt Entitled: ‘Changes in Personal, Social, and Intellectual Behavior of Children Originally Classified as Feebleminded.’ ” Journal of Exceptional Children 15: 225–34; May 1949.
46. Olson, Willard C. Child Development. Boston: D. C. Heath and Company, 1949. 417 p.
47. Quinn, Karl V., and Durling, Dorothy. “Twelve Months’ Study of Glutamic Acid Therapy in Different Clinical Types in an Institution for the Mentally Deficient.” American Journal of Mental Deficiency 54: 321–32; January 1950.
48. Remmers, Herman H., Elliott, Donald N., and Gage, Nathaniel L. “Curricular Differences in Predicting Scholastic Achievement: Applications to Counseling.” Journal of Educational Psychology 40: 385–94; November 1949.
49. Rudolf, G. de M. “Retesting of the Intelligence Quotient and the Social Age.” Journal of Mental Science 95: 696–702; 1949.
50. Rudolf, G. de M. “The Treatment of Mental Defectives with Thiamine.” Journal of Mental Science 95: 910–19; 1949.
51. Rudolf, G. de M. “The Treatment of Mental Defectives with Aneurine for One Year.” Journal of Mental Science 96: 265–71; January 1950.
52. Schmidt, Bernardine G. “Changes in Behavior of Originally Feeble-Minded Children.” Journal of Exceptional Children 14: 67–72, 94; December 1947.
53. Schmidt, Bernardine G. “Changes in Personal, Social, and Intellectual Behavior of Children Originally Classified as Feeble-Minded.” Psychological Monographs 60, No. 5; 1946. 144 p.
54. Schmidt, Bernardine G. “A Reply.” Psychological Bulletin 45: 334–43; July 1948.
55. Schnell, Dorothy Maclary. Characteristics of Adolescence. Minneapolis, Minnesota: Burgess Publishing Company, 1947. 68 p.
56. Schulman, Mary Jean, and Havighurst, Robert J. “Relations Between Ability and Social Status in a Mid-Western Community. IV: Size of Vocabulary.” Journal of Educational Psychology 38: 437–42; November 1947.
57. Segel, David. Intellectual Abilities in the Adolescent Period. U. S. Office of Education Bulletin, 1948, No. 6. Washington, D. C.: Superintendent of Documents, U. S. Government Printing Office, 1948. 41 p.
58. Shaw, Duane C. “A Study of the Relationships Between Thurstone Primary Mental Abilities and High School Achievement.” Journal of Educational Psychology 40: 239–49; April 1949.
59. Shuey, Audrey M. “Improvement in Scores on the American Council Psychological Examination from Freshman to Senior Year.” Journal of Educational Psychology 39: 417–26; November 1948.
60. Sloan, William, and Harman, Harry H. “Constancy of I.Q. in Mental Defectives.” Pedagogical Seminary and Journal of Genetic Psychology 71: 177–85; December 1947.
61. Stern, Edith M. “Feeble-Minded Children Can Be Cured.” Woman’s Home Companion 74: 34–35; September 1947.
62. Swineford, Frances. “General, Verbal, and Spatial Bi-factors After Three Years.” Journal of Educational Psychology 40: 353–60; October 1949.
63. Swineford, Frances. “Growth in the General and Verbal Bi-factors from Grade VII to Grade IX.” Journal of Educational Psychology 38: 257–72; May 1947.
64. Swineford, Frances. “A Number Factor.” Journal of Educational Psychology 40: 157–67; March 1949.
65. Swineford, Frances. “A Study in Factor Analysis: The Nature of the General, Verbal, and Spatial Bi-Factors.” Supplementary Educational Monographs 67: 1–70; 1948.
66. Taylor, Calvin W. “A Factorial Study of Fluency in Writing.” Psychometrika 12: 239–62; December 1947.
67. Thorndike, Robert L. “Growth of Intelligence During Adolescence.” Pedagogical Seminary and Journal of Genetic Psychology 72: 11–15; September 1948.
68. Thorndike, Robert L. “Individual Differences.” Annual Review of Psychology 1: 87–104; 1950.
69. Thurstone, Louis L., and Thurstone, Thelma G. SRA Primary Mental Abilities. Chicago: Science Research Associates, 1947.
70. Thurstone, Thelma G., and Thurstone, Louis L. SRA Verbal Form. Chicago: Science Research Associates, 1947.
71. Turner, William D. “Altruism and Its Measurement in Children.” Journal of Abnormal and Social Psychology 43: 502–16; October 1948.
72. Vernon, Philip E. “Changes in Abilities from 14 to 20 Years.” Advancement of Science 5: 138; July 1948.
73. Waelsch, Heinrich. “A Biochemical Consideration of Mental Deficiency. The Role of Glutamic Acid.” American Journal of Mental Deficiency 52: 305–13; April 1948.
74. Webber, Vera J., and Hunnicutt, Clarence W. “Children’s Ability to Perceive Change of Color in Painting.” Elementary School Journal 48: 494–97; May 1948.
75. Wheeler, Lester R., and Wheeler, Viola D. “The Relationship Between Reading Ability and Intelligence Among University Freshmen.” Journal of Educational Psychology 40: 230–38; April 1949.
76. Zimmerman, Frederick T. “The Glutamic Acid Treatment of Mental Retardation.” Quarterly Review of Psychiatry and Neurology 4: 263–69; October 1949.
77. Zimmerman, Frederick T., Burgemeister, Bessie B., and Putnam, Tracy J. “The Ceiling Effect of Glutamic Acid upon Intelligence in Children and in Adolescents.” American Journal of Psychiatry 104: 593–99; April 1948.
Section II: Curriculum, Instruction and Learning
22 Making Sense of Curriculum Evaluation: Continuities and Discontinuities in an Educational Idea David Hamilton
At first encounter the reviewer’s task for this chapter appears impossible. Curriculum evaluation is a field which lacks a strong sense of boundary. The growing corpus of published and unpublished material tends to foster feelings of unease and incompetence rather than insight and optimism. The more secluded corners of the academic garden seem to offer a greater sense of security. Further reflection, however, indicates the shallowness of such withdrawal. Disengagement is neither a solution to the problems of the researcher nor an adequate representation of the process of intellectual inquiry. Curriculum evaluation – like any other educational activity – is guided by the accumulated experience (or inexperience) of its participants and focused by their individual or group aspirations. The purpose of this chapter is to examine the conventional, but often tacit, wisdom of curriculum evaluation. In short, it is an attempt to demystify the invisible college. Two broad strategies are open to a reviewer. The first is to adopt the style of a 19th-century anthropologist and set out, as it were, to unearth the totality of cultural artifacts embedded in a bygone age. Unfortunately, however, product-centered reviews of this type are often trapped by their own rhetoric. By claiming to provide an exhaustive account they also become impossible to complete. Posthumous or partial publication is their most conspicuous outcome.
Source: Review of Research in Education, 5 (1977): 318–347.
The second strategy is to focus upon the processes of research. Reviews of this type make no particular claim to catalog the myriad manifestations of an endeavor but rather seek to characterize the generative elements that help to create them. This chapter follows the second strategy. Its aim is to make sense of the present through an appraisal of the past. A historical perspective is believed to be a valid and useful heuristic for establishing the processes that activate curriculum evaluation. No claim is made that this account provides an all-embracing history of evaluation. Indeed, to do so would be to switch to the encyclopedic stance of the erstwhile anthropologist. There is also a more profound sense in which this account cannot be complete. As a recurrent feature of educational life, curriculum evaluation necessarily prefigures a past that has yet to come. Thus, insofar as the interpretations of this paper are sensitive to the future as well as to the past, they must remain, in Cronbach’s cautionary phrase, “more provocative rather than authoritative” (1963, p. 672). Guided by these initial assumptions, this chapter is divided into four sections. The first part, “Some Perspectives for the Study of Curriculum Evaluation,” offers a set of conceptual prisms for differentiating the relatively unchanging features of curriculum evaluation. Primarily, its purpose is to delimit the concerns of this review. The second part, “The Origins of Curriculum Evaluation,” outlines the beliefs and practices which came to dominate evaluation research after World War II, arguing that many of these concerns had remained unaltered since the 19th century. The third part, “Curriculum Evaluation and the Image of Consensus,” considers the incorporation of these earlier ideas into the education reform movements of the 1960s and early 1970s. It focuses on the relationship between evaluation as course improvement and evaluation as social auditing.
The fourth part, “Curriculum Evaluation and the Image of Pluralism,” documents and comments upon some of the contrary perspectives that have arisen along with the consensus assumptions of the 1960s. Accordingly, it suggests that recent developments reveal the existence of a major disjunction in both the theory and practice of curriculum evaluation. Overall, the aims of this chapter are to distill the ideas and events of the past 150 years and to provide a parsimonious yet comprehensive review of contemporary practice in curriculum evaluation.
Some Perspectives for the Study of Curriculum Evaluation The substance of this review is held together by a number of different unifying ideas. These relate to evaluation as a form of practical morality, evaluation and social change, evaluation and curriculum development, the internal and external dynamics of evaluation, the politics of evaluation, and evaluation and pluralism.
The foremost assumption of this chapter is that curriculum evaluation falls within the sphere of practical morality. As such, it responds not only to the ethical question “What should we do?” but also to the empirical question “What can we do?” Traditionally, however, these two questions have been held at arm’s length by the educational research community. Value statements have been regarded as something quite different from factual statements. As Scriven (1974, p. 4) has noted, many of the debates surrounding curriculum evaluation have been created through the interpenetration of these hitherto separately considered concerns. The second assumption is that societal concern for evaluation is heightened, if not created, by the facts of social change. Evaluation is meaningless without the possibility or requirement of alternative courses of action. Almost by definition, social change engenders such options. Evaluative actions are as old as social life. They occur whenever there is a social setting and someone in a position to change it. Curriculum evaluation takes place at all levels in the education system. A kindergarten child’s decision to do math rather than painting is, in principle, just as much an evaluation as a superintendent’s decision to spend more money on science and less on the arts. In each of these examples, a choice is made by weighing the options against a set of criteria. These illustrations, however, indicate only part of the story. Throughout the history of schooling continuous attempts have been made to translate these informal decision processes into explicit rules and formal procedures. Governments have made evaluation compulsory; federal agencies have formulated guidelines; universities have trained evaluators; textbooks have supplied methodologies; journals have established accepted practices. And so on. Thus social change creates not only new options but also new traditions and institutions. 
The third assumption is that curriculum evaluation can be seen as functionally related to curriculum development. For instance, if curriculum developers attend to the production of instructional packages, evaluators seem to respond in analogous fashion (e.g., the “Product Evaluation Profile” in Scriven, 1974). If, however, development becomes the preparation of delivery systems, then procedures developed for the evaluation of packages may become devalued and inoperative. The fourth assumption is that curriculum evaluation has both an internal and external dynamic. It can be discussed, for instance, within the restricted concerns of program operation as well as within the broader boundaries of social policy. These overlapping realms of thought and action do not always operate in concert. There is always the possibility of disagreement among the various parties to the evaluation, such as researchers, sponsors, and audiences. A fifth assumption is that evaluation is directly linked to the distribution of resources in the education system. As such it is essentially a political process. The history of curriculum evaluation can be seen as part of the struggle by different interest groups – educationalists, teachers, administrators,
industrialists – to gain control and exercise power over the forces that shape the practices of schooling. In these terms, a review of curriculum evaluation has also to be concerned with the distribution of power in the education system. The final assumption of this review relates to the idea of evaluation as a social process. As indicated above, evaluations conducted by individuals with respect to their own practice are usually based on a single set of criteria. If, however, more than one person is involved, the process takes on a completely different complexion. The participants may not agree upon the selection of criteria. Consensus can no longer be assumed. To this extent, value differences are crucial to the organization and enactment of educational change. They figure prominently in this review.
The Origins of Curriculum Evaluation An appropriate starting point is the work of John Stuart Mill (1806–1873). In the first half of the 19th century Mill laid down some of the most important ground rules of Western thought. Together with the ideas of colleagues and contemporaries like Bentham, Carlyle, Whewell, Herschel, Comte, and de Tocqueville, Mill’s notions have exerted a major, though often unrecognised, influence on 20th-century social philosophy. As a journal editor and member of parliament, and as a philosopher and economist, Mill was a pivotal figure in the linking of scientific practice to social administration. His writings – notably A System of Logic (1843), Principles of Political Economy (1848) and Utilitarianism (1861) – were both a defense and an elaboration of the liberal ideologies that took shape during the periods of revolutionary social change in North America and Western Europe. They were an “attempt . . . to embody and systematize” the “best ideas of the epoch” (A System of Logic, preface). Although more than a century has passed since Mill reached the height of his career, the issues he addressed are still a source of contention among theorists, administrators, and politicians. Mill’s impact on curriculum evaluation can be traced to three related concerns. First, he provided a coherent rationale for the conduct of the social sciences. Second, he developed a naturalistic (i.e., empirically rooted) theory of ethics. And third, he laid the philosophic foundations for what would now be termed the welfare state.
Science, Values, and State Intervention A System of Logic was published in eight editions during Mill’s lifetime. It was the first comprehensive formulation of the newly fashionable empirical method, and the “best attacked” book of the time (Nagel, 1950, p. xvii). In the final
section (“On the Logic of the Moral Sciences”) Mill sought to rescue the “proper study of mankind” from what he regarded as the inadequacies of philosophy and theology. He emphasized that the moral (i.e., social) sciences should follow the same methods and strive for the same goals as the natural sciences. By this reasoning, Mill articulated what would now be called the scientific approach (Kerlinger, 1964) to the study of social phenomena. Mill’s utilitarian theory of ethics was directly related to his views on scientific and logical method. In its most general form utilitarianism embodies two assumptions: first, that principles of conduct can be adduced from the canons of experimental inquiry, and second, that social behavior can be unequivocally judged against an overarching (and “self-evident”) moral principle. These assumptions enabled Mill to measure morality against a one-dimensional ordinal scale. The “Greatest Happiness” can be established unequivocally by reference to what “competent judges” consider to be “desirable” (Utilitarianism, part 2). By adopting this form of moral yardstick Mill was able to overcome what would now be called the criterion problem in evaluation:
There must be some standard by which to determine the goodness or badness, absolute or comparative, of ends or objects of desires. And whatever that standard is, there can be but one; for if there were several ultimate principles of conduct, the same conduct might be approved by one of these principles and condemned by another; and there will be needed some more general principle as umpire between them (A System of Logic, final chapter).
Mill’s views on the state also drew support from his moral precepts. He believed that in certain areas of social life (e.g., elementary education, the alleviation of poverty), the free-trade assumptions of laissez-faire government were contrary to the overall “Greatest Happiness” of society. As a result, Mill held that the welfare state should act as a counterbalancing force by supporting the charitable efforts of “private and voluntary agency” (Principles of Political Economy, Book 5, Chapter 11; there were seven editions during Mill’s lifetime). In 1859, the publication of Charles Darwin’s The Origin of Species gave Mill’s methodological and political ideas a fresh lease on life. The Darwinian precept that differences between members of the same species provide the mainspring of biological evolution gave a new impetus to the empirical study of human characteristics. Soon after Mill’s death in 1873, Francis Galton began a series of anthropometric and psychometric surveys which helped to establish not only a psychology of individual differences but also a new inferential calculus (correlational analysis) for the codification of empirical associations (see Hamilton, 1974). The United States first learned of Galton’s ideas – and those of his associate Karl Pearson – through the efforts of J. McKeen Cattell, who coined the term mental test in 1890, and
Edward L. Thorndike, who used their ideas in the construction of achievement tests (see Joncich, 1968, pp. 290–293). Although Mill and Galton shared a common belief in the scientific methods of the 19th century, their political theories were mutually at variance. Mill stressed the shaping influence of environmental forces; Galton emphasized the primary importance of heredity. Nevertheless, the methodological unity of the two schools of thought meant that their ideas could be tested using the same equipment and procedures. A rash of social investigations in late 19th-century Britain were the outcome of this common concern. The crucial question was whether social assistance increased or decreased the self-help capacities of the urban poor. One of the most prominent investigators of the time was Charles Booth, a wealthy shipping magnate. Booth conducted a series of inquiries which were reported in the 17 volumes of Life and Labour of the People of London (1889–1903). Although he began his research by siding with Galton, Booth later came out in favor of state intervention in the affairs of the “helpless and incompetent” (Webb, 1926, pp. 260–261). Booth and his assistants used questionnaires, official census data and “personal (i.e., participant) observation” to document and portray the extent of poverty in London. Furthermore, part of their work focused on the preparation of a “social diagnosis” (or evaluation) of various “experiments” in poor-law relief. The influence of Booth’s work, like that of Galton, also spread to the United States, giving strong support to the settlement movements in Chicago and New York (see Cremin, 1961, chap. 3).
Pragmatism and Social Change in the United States By the end of the 19th century the social and economic forces which had made Britain the world’s leading industrial power began to stir more vigorously in the United States. In the wake of the Civil War, Darwin’s ideas – transposed to the realm of social evolution by Herbert Spencer – were the focus of long and vigorous debate. The dominant viewpoint was that “survival of the fittest” (Spencer’s term) should be retained, through laissez-faire government, as the most efficient mechanism for social improvement. Other commentators – notably the pragmatists – took a position that was close to Mill’s (William James’s Pragmatism was dedicated to J. S. Mill). They felt that Spencer’s evolutionist views were a one-sided interpretation of Darwinism and, moreover, a thinly disguised biological apology for the excesses of laissez-faire government. Above all, they rejected the assumption that the social environment was outside the realm of human control. If Spencer offered a philosophy of inevitability, the pragmatists replied with a vision of possibility (see Hofstadter, 1955, p. 103). In an era of rapid social change, characterized by such movements as accelerating urbanization, massive immigration, economic boom and bust, and labor unrest, they put forward proposals
which could serve to coordinate the disparate elements of an ungainly social system. Education rather than competition was advocated as the most effective instrument of social improvement (Feinberg, 1973). The major architect of practical pragmatism was John Dewey. Like Mill, Dewey addressed a wide range of concerns in the realms of social science, ethics, and government policy. Briefly, he believed that logic could be redefined as the theory of inquiry, that moral knowledge was a species of empirical knowledge, and that social life could be enhanced through the use of a political technology (see White, 1972, p. 277 ff.). During his stay at the University of Chicago (1896–1904), Dewey not only founded the Laboratory School (1896) but also developed his education theories within a new philosophical and psychological framework. For instance, in his 1899 presidential address to the American Psychological Association (“Psychology and Social Practice,” 1900) Dewey argued for a “fuller” understanding of the relationship between the “new education” and the elements of “psycho-physical mechanism.” Through its new-found knowledge, psychology provided education with a “statement of mechanism” through which “ethical ends” could be “realized” (p. 121):
. . . the more thorough-going and complete the mechanical and causal statement, the more controlled, the more economical are the discovery and realization of human aims. It is not in spite of, nor in neglect of, but because of the mechanical statement that human activity has been freed, and made effective in thousands of new practical directions, upon a scale and with a certainty hitherto undreamed of. (p. 118)
Dewey’s rhetoric resonated not only with the aspirations of an emerging industrial society but also with the debates taking place inside education, such as the NEA’s “Committee of Ten” on colleges and secondary schools, initiated in 1892, and the “Committee of Fifteen” on elementary education, initiated in 1893. The established curriculum of the 19th-century secondary school was based on faculty psychology and the related concept of transfer of training. By the last decade of the century faculty psychology was forced onto the defensive (see Krug, 1969, chaps. 1 & 2). Criticism came from two sides. Psychologists such as Thorndike and Woodworth, at the conference presided over by John Dewey, claimed that there was limited experimental evidence for transfer of training from the old disciplines such as Latin to the new ones. The other attack came from those inside and outside education who wanted the curriculum to respond more adequately to the social efficiency movements gaining ground in the spheres of industrial and administrative life.
Evaluation and the Cult of Efficiency
In due course, the education system responded to these concerns with a welter of innovations. Examinations began to replace school accreditation as
a means of selecting students for college (the College Entrance Examination Board, for example, was founded in 1900). Individualization became a key concept of school theory and practice (Joncich, 1968, p. 311); age-grade statistics were collected to measure the productive quality of school systems (Tyack, 1974, p. 199 ff.); mental tests were used to categorize school children (Karier, 1973, p. 115); and industrial and vocational schooling came into greater prominence (Lazerson, 1971, chaps. 5–7). The net result of these developments was the centralization of education, locally and nationally, and the growth of an administrative and managerial elite – many of whom were recruited from expanding graduate institutions like Teachers College, Columbia, the University of Chicago, and Stanford University (Joncich, pp. 216–231; Tyack, pp. 182–198). The rallying cry and self-justification of these “administrative progressives” (Tyack’s term) was that streamlined efficiency would be achieved in all spheres of education through a more rigorous application of the scientific method. As the new century grew a little older, school superintendents began to see themselves more as business executives than as scholars or statesmen (Callahan, 1962, pp. 7–8). The practical consequences of this trend took various forms. For instance, the National Education Association appointed four committees between 1904 and 1911 to study the classification and progress of children; E. L. Thorndike published his first achievement scale (on handwriting) in 1908; and New York City established a Bureau of Research in 1912 to conduct a continuous built-in survey of the school system, using “the new measurement techniques” (Seguel, 1966, p. 75). In turn, the administration of education shifted from a rural model of lay community control toward one that stressed professional training and bureaucratic expertise (see Tyack, 1974, passim). The control of the school curriculum underwent a similar change. 
In the late 19th century curriculum and pedagogical decisions rested with two agencies: local school boards and college accreditation committees, initiated by the University of Michigan in 1871. By 1910, the impact of the reform movement in schooling meant that the school curriculum fell increasingly under the influence of the business ethic. It spoke for the captains of industry, and their lieutenants, the superintendents, and not for the belowdecks personnel such as parents, teachers, and students. A further wave of support for the scientific movement followed the publication of F. W. Taylor’s The Principles of Scientific Management (1911). Taylor’s ideas were brought to the attention of educational administrators through The Supervision of City Schools, The 12th Yearbook of the National Society for the Study of Education (1913). The yearbook’s editor was Franklin Bobbitt, who in later years began to focus more specifically on the organization of school subjects (e.g., The Curriculum, 1918; and How to Make a Curriculum, 1924). Bobbitt’s interests were shared by W. W. Charters (e.g., Curriculum Construction, 1923), who, unlike Bobbitt, came to curriculum design from an interest in teaching rather than administration.
Bobbitt and Charters developed a conveyor-belt system of curriculum making. They believed, following Taylor, that educational efficiency could be increased through a detailed analysis of the skills a child must acquire to become a socially mature adult. Further, they held that educational goals could be established by reference to “common aims” (Bobbitt, 1913, quoted in Seguel, 1966, p. 99) rather than to the concerns of any particular interest group such as principals or teachers. By this appeal, Bobbitt and Charters were able to unite teachers and administrators in a common technological task – the facilitation of effective schooling. In one respect, therefore, they were successful in taking politics out of the curriculum. In another respect, they also enhanced the value-neutral image of research. Evaluation of a scholastic activity could be regarded as a technical achievement equivalent to the evaluation of a mathematical expression. Within the rhetoric of educational efficiency, the main purpose of curriculum construction was to facilitate the production (or reproduction) of an ideal adult. For their curriculum blueprints, Bobbitt and Charters looked to the superintendents and teachers; for their quality control they looked to the growing measurement community.
Evaluation and Curriculum Design
Both Bobbitt and Charters had connections with the University of Chicago. Charters had been a graduate student of John Dewey, and Bobbitt was to serve the university as a professor of educational administration from 1912 to 1941 (Jackson, 1975). In 1919 Charters became director of a research bureau for retail training at the Carnegie Institute of Technology in Pittsburgh. The strong vocational concerns of the bureau (and the availability of federal grants for trade and industrial training under the Smith-Hughes Act, 1917) allowed Charters to extend his earlier pedagogic enquiries toward the analysis of adult occupations and the construction of suitably related curricula and teaching methods. After a second period at Chicago in the 1920s, Charters moved to the directorship of the Bureau of Educational Research at Ohio State University. While at Ohio State, Charters shared his job analysis interests with Ralph Tyler, a former doctoral student at the University of Chicago. Tyler’s crucial contribution to the work of the bureau derived from his graduate training. Unlike many earlier researchers, he combined expertise in both testing theory (the responsibility of psychologists) and curriculum construction (the responsibility of administrators and teachers). The title of Tyler’s 1927 doctoral thesis – “Statistical Methods for Evaluating Teacher-Training Curricula” – symbolizes this unification. Tyler’s complementary interests were successfully combined when he became research director of the Committee on Evaluation and Recording
of the Eight-Year Study (1932–40). This was a curriculum experiment commissioned at the height of the Great Depression by the Progressive Education Association (honorary president, John Dewey) and supported by private funds from the Carnegie Foundation and the Rockefeller-initiated General Education Board. Over 300 colleges agreed with the PEA to relax their formal entrance requirements, and 29 experimental schools reciprocated by redesigning their curricula along “progressive” lines. Tyler’s evaluation rationale was that an “appraisal of an educational institution is fundamentally only the process by which we find out how far the objectives of the institution are being realized” (Tyler, in Smith & Tyler, 1942, p. 5). The first half of the study followed this “objectives” model. The evaluation staff was “primarily concerned with developing means by which the achievement of students in the schools could be appraised” (p. 5). The second half used a different rationale. The relative merits of progressive and traditional courses were adduced by means of a comparative design. The college careers of the participating students were compared with those of a matched sample of 1,475 nonexperimental students. To practising educationalists the apparently favorable results of the Eight-Year Study indicated the efficacy of progressive methods. To the research community, they signaled the emergence of a sophisticated paradigm for the design and evaluation of school and college curricula (Tyler, 1949). Tyler’s specific contributions to the Eight-Year Study reflected both his training and his experience. As a colleague of Charters, he placed high priority on the analysis of curriculum goals and activities (see Charters, 1926). As a recently trained psychologist, he argued that objectives should be prespecified in behavioral terms (see Anderson, 1975, p. 143). 
And as a witness to the industrial collapse of the depression he held to a much broader conception of education – one that included affective as well as cognitive and vocational components (see Cremin, 1961, chap. 7; Smith & Tyler, 1942). After World War II the public success of the Eight-Year Study stimulated other cooperative investigations of a similar kind, such as the American Council on Education Project on Evaluation in General Education (McKim, 1957; Taylor & Cowley, 1972). Tyler’s concern for schoolwide behavioral objectives received a fresh boost in 1956 with the publication of Handbook 1 of the Taxonomy of Educational Objectives (Bloom, 1956). This short but seminal work was the result of a seven-year collaborative project set up in 1949 by two of Tyler’s co-workers: J. Thomas Hastings, of the University of Illinois, and Benjamin S. Bloom, of the University of Chicago. In a sense the encyclopedic efforts of Bloom and his colleagues marked both a beginning and an end. Their work had an air of completeness and finality about it. Yet, in combination with the postwar growth of factor analysis (Thurstone & Thurstone, 1941), psychometric theory (Lindquist, 1951), and experimental design (Stanley, 1966), it also offered an awesome prospect for the future.
This section has sketched some of the recurrent themes and precipitating events that accompanied educational change and evaluation in the period prior to the mid-1950s. At the risk of underestimating the influence of counter-currents, it has argued that the dominant ideas of the day were translated into educational terms by John Dewey, fostered by private and state investment in education, operationalized (not always with Dewey’s approval) by Thorndike, Bobbitt, and Charters, and reproduced by the generations of administrators and professors who passed through the portals of Teachers College, Chicago, Stanford, and elsewhere. The ideas of the founding fathers were clearly articulated and efficiently disseminated. Relatively little, however, is known about the translation of their prescriptions into the realm of classroom practice. Given the “lack” of “empirical studies on the conduct of evaluation during this period” (Lortie, 1970, p. 155), the stipulation remains clearer than the deed. It is also probably true, though perhaps a little unjust, that the founding fathers are better remembered by their technologies than by their aspirations. In these terms, the new wave of educational evaluators created in the late 1950s and 1960s had much to contend with. The groundwork had been done; a plateau had been reached (see Taylor & Cowley, 1972, p. 1). In some senses, then, the history of evaluation had drawn to a close. From another standpoint, however, it had hardly begun.
Curriculum Evaluation and the Image of Consensus
The ethos of the Eight-Year Study carried through into the postwar years. But, according to Hagen and Thorndike (1960), there was a shift from a “research oriented attempt to develop new and better evaluation procedures to an action research oriented attempt to involve school personnel in evaluating their own educational programs” (1960, p. 482, emphasis in original; see Smith & Tyler, 1942, p. 30). In time, however, this new “school of thought” (Cronbach, 1963, p. 674) came under sharp attack. It was claimed for instance that general academic standards were being eroded and, more specifically, that the average college entrant’s performance had begun to decline. From a vocational perspective, it was also claimed that colleges were failing to fill the demand for scientific personnel in industry (Cremin, 1961, chap. 9). The response to this “crisis in popular schooling” (Cremin’s phrase) was slow but sure. In 1951, for example, the University of Illinois Committee on School Mathematics received funds from the Carnegie Corporation to enable faculty members to give guidance to high school teachers. Similar stirrings occurred within other specialized fields (Goodlad, 1966). In the process, curriculum development shifted away from the techniques of course construction toward a concern for the substance of course content. The new curriculum mandarins were drawn from the ranks of subject specialists, not management technologists or educational psychologists.
This emphasis on discipline-centered curriculum reform eventually received official and financial recognition from Congress (Hurd, 1969, p. 14). In 1958 the Sputnik-prompted National Defense Education Act released federal funds to the National Science Foundation for the improvement of science, mathematics, and (in part) social science curricula.
Evaluation and Course Improvement
The circumscribed nature of curriculum development’s funding and the materials-based character of its commitments meant that it gradually became synonymous with program development. Task forces of specialists were convened to produce packages of ideas and procedures which, if required, could be transmitted intact to the farthest corners of the school system. At first the overall merit of the revised curricula was taken to be self-evident. Evaluation remained an informal iterative process directed toward course improvement and conducted by members of the subject team in association with teachers in trial schools. Command of the endeavor remained in the hands of subject specialists. The superior intellectual prestige of the pure sciences enabled the developers to be a self-policing, self-evaluating community. Curriculum projects tended to reject or ignore the conventional wisdom of the evaluation traditions that had flowered in the 1930s (Atkin, 1963). In the early 1960s evaluation issues began to be raised more sharply as the earlier curriculum projects penetrated through the school system. A typical complaint was that the new schemes had no visible impact (Goodlad, 1968; Provus, 1971, chap. 1). In this respect, doubt was cast upon the pedagogical rather than the intellectual viability of the programs. Merit could no longer be taken for granted; it had to be made manifest. Insignificant or equivocal results made curriculum developers more willing to enlist the support of ideas from within the realms of behavioral research (see Weir’s account of the Biological Science Curriculum Study project, 1976). At that time the educational research community was dominated by psychologists trained in the experimental or individual-differences traditions. Not surprisingly, therefore, these traditions began providing the basic blueprints used in curriculum evaluation.
Concern about the visibility of programs also influenced the organization of curriculum research: Evaluation gradually became a specialist activity.
Evaluation and Social Auditing
In 1965 the emergent evaluation community received a boost when continued financial support under Title 1 and Title 3 of the Elementary and Secondary Education Act was made contingent upon submission of evaluation reports
by local program operators. With this administrative device, consolidated and extended in subsequent legislation, curriculum evaluation took a new turn. Its major concern ceased to be course improvement and became instead educational auditing. The range of objectives to be scrutinized was much more limited than the “comprehensive” range of objectives envisaged by Tyler in Basic Principles of Curriculum and Instruction (1949, p. 5 ff.). At the same time, de facto control of the curriculum was taken out of the hands of subject specialists and located closer to the heart of the federal administration, itself undergoing a period of reappraisal. In the early 1960s, Secretary of Defense Robert McNamara, newly recruited from the Ford Motor Company, introduced a form of evaluation termed Planning, Programming, Budgeting System into the decision-making processes of his own department. The essential feature of this innovation was that it shifted the basis of decision making from input to output budgeting – that is, from indices such as class size to measurements such as pupil performance. In 1965 this type of cost-effective appraisal was extended to all federal departments and agencies (Williams & Evans, 1969). Until that time most federal and local educational agencies had evaluated the results of their endeavors using the same internally organized procedures as curriculum developers used. Both groups saw their primary task as the production of a visible program; every dollar spent on evaluation was a dollar lost to program development (McDill, McDill, & Sprehe, 1972, p. 148). As suggested above, the legislation of the mid-1960s foreshadowed a rapid growth of administrative involvement in evaluation. This interest was particularly evident with respect to poverty programs funded under the Economic Opportunity Act of 1964 and with respect to the growth of statewide accountability schemes and the nationwide Program for the Assessment of Educational Progress. 
From its inception, the Office of Economic Opportunity (OEO) contained a section for Research Plans, Programs and Evaluation (Glennan, 1972, p. 188). The evaluation efforts of this section during its early years were directed toward servicing the requests of program participants and consumers. No attempts were made to question the existence of any given program. In 1966 (presumably in response to the introduction of PPBS) the Research Plans, Programs, and Evaluation (RPP&E) office began a series of program evaluations. These studies were precipitated by an internal request for evidence which could assist with decisions over the alteration, curtailment, or discontinuation of programs. Whatever its espoused intention, this request had the effect of luring the evaluators’ allegiances from the concerns of the program teams to those of the program sponsors. In effect, the evaluation agency was charged with the task of eliciting visible results which could be displayed in the company prospectus and itemized in the annual balance sheet. In such a climate, it is not surprising that evaluators became more concerned with visible products than inferred processes.
These social auditing concerns were formalized in an OEO “instruction” of March 1968 which established a major component of evaluation as “determining the extent to which programs are successful in achieving basic objectives” (Glennan, 1972, p. 189). Specific responsibility for this evaluation strategy was invested in RPP&E, which was established as a separate division of OEO in 1967. The separation between outcome evaluation and program development was also formalized in a decision that RPP&E should automatically receive a small proportion (0.16%) of any program budget (Glennan, 1972, p. 190). This division of labor soon provoked its own contradiction. When asked to design an evaluation of Head Start, the RPP&E evaluators, who needed data for decision making, came into conflict with program staff, who believed the evaluators’ proposals were too narrow. After “much internal debate” the director of OEO “ordered” the study. A contract was made with the Westinghouse Learning Corporation and Ohio University in June 1968. Eight months later (i.e., prior to the completion of this study), former President Richard Nixon’s economic opportunity message to Congress revealed that the long-term effect of Head Start appeared to be “extremely weak” (Williams & Evans, 1969, p. 124). This statement provoked a storm of controversy. In its wake, the population at large became more aware that social scientists were divided among themselves as to the implications of the Westinghouse-Ohio results. Along with controversies about the relationship between schooling and educational achievement (Coleman, Campbell, Hobson, McPartland, Mood, Weinfeld, & York, 1966; Jensen, 1969), the Head Start investigation did little to validate the activities of professional evaluators. Nevertheless, as Nixon’s statement suggests, the research community’s power to legitimate political decision making remained as strong as ever. 
Thus, despite its evident technical shortcomings, the Westinghouse-Ohio study increased rather than decreased the attention focused on curriculum evaluation. As the Vietnam War drew to an end and no new revenue was made available for federal spending (see Glazer, 1973), evaluation began to serve more sharply in an auditing function. At a time of economic stagnation it became an agent of program contraction (or rationalization) rather than a patron of program promotion.
Monitoring the Curriculum
Statewide accountability schemes were a further instance of centralized monitoring of the curriculum. Educational accountability, as it developed in the 1970s (see Sciara & Jantz, 1972), rested upon the logic that educational processes – rather like productive mechanisms – can be broken down into their constituent parts and specified in terms of operational criteria. Uniform
and heightened efficiency was taken to be the end which justifies the means. As in the days of Bobbitt and Charters, educational technology of this kind had strong links with business management and centralized control. Just as the earlier generation appealed to the division of labor, so their descendants utilized the language of cybernetics. Assistant Commissioner of Education Leon M. Lessinger’s references in 1970 to logistics, systems analysis and human factors engineering are examples (see Popkewitz & Wehlage, 1973, p. 49). The National Assessment of Educational Progress (NAEP) took shape in the mid-1960s. In 1965 an exploratory committee (under the chairmanship of Ralph Tyler, and supported by the Carnegie Corporation and the Ford Foundation) set out to prepare educational objectives and assessment procedures which would embrace the entire school curriculum (Flanagan, 1969, p. 223). The goal of the NAEP was to develop indices of educational output, like economic indices such as the gross national product, which might serve as a basis for social planning. The first results of NAEP were announced in 1970. They had been generated by a national sample of about 100,000 children and adults who had responded to test items drawn from 460 exercises in science, citizenship, and writing. A significant feature of the NAEP is that its indices do not refer to particular students, classes, schools, or school systems, but rather to the overall (aggregate) attainment of a large number of people. To this extent, the information directly serves the generalized interests of administrative bureaucracies, not the specific concerns of students, teachers, or schools. As with the utilitarian theory of J. S. Mill, the needs of the system are held to be congruent with the needs of the individual (Britton, 1969, p. 53).
Educational Research and Curriculum Evaluation
Confronted with the problems and opportunities offered by these wider developments, the educational research community reacted with unsurprising speed. In 1964 (i.e., before evaluation was mandated by the Elementary and Secondary Education Act) L. J. Cronbach, then the president of the American Educational Research Association, appointed an ad hoc committee to study the contribution the association could make to the growing interest in evaluation. A year later, President Benjamin S. Bloom commissioned a committee to develop evaluation guidelines and model procedures. The following year’s committee (extended by President Julian S. Stanley) rejected Bloom’s concerns in favor of a more eclectic stance. One of the outcomes of these deliberations was the AERA Monograph Series on Curriculum Evaluation, seven volumes of which were published between 1967 and 1974 (see Stake, 1967b, pp. 8–12). The Monograph Series bears witness to the diversity of opinions expressed within the research community. In the early days there were two competing
schools of evaluation thought. On the one hand there were those (like Cronbach) who argued in favor of a modified Tylerian rationale (see Tyler, 1949), whereby a study is made of the “post-course performance of a well described group with respect to many important objectives and side effects” (Cronbach, 1963, p. 676). On the other hand, there were those (like Stanley) who advocated comparative studies which used control and treatment groups (see Campbell & Stanley, 1963). Although the experimental model had the most persuasive scientific appeal, few studies of the new curricula achieved the required levels of randomization and control. According to Welch and Walberg (1974), only 4 out of 46 government-sponsored course development projects had used “true” experiments in their evaluation strategies by 1969 (p. 113). Nevertheless, the comparative assumptions of the experimental paradigm still served to underpin the “two most frequently used models in large-scale program evaluation” (Light & Smith, 1970, p. 9). The first of these models – post hoc quasiexperimentation, as in the Head Start evaluation – establishes experimental and control groups after the treatment has been applied. The second model – post hoc sample surveying, as in the Coleman report (1966) – relies on a large data base which, because of its size and variability, can be subsequently analyzed to account for the various designated treatments.
Comparative versus Tylerian Rationales
Objections to the comparative model (Cronbach, 1963; Guba, 1969) rested on the argument that it was both technically and philosophically inappropriate to the nature of curriculum evaluation. Cronbach and Guba, for instance, maintained, like J. S. Mill, that group comparisons may give equivocal results if more than one variable is studied (i.e., the control group may appear superior on one variable, the experimental group on another). Other critics (e.g., Walker & Schaffarzick, 1974) argued against comparative evaluation designs on the grounds that the new curricula set their own goals and standards. In principle, Tylerian evaluation models avoid these problems: The innovative curriculum is measured against agreed internal standards, not against the results achieved by another (possibly nonequivalent) program. Scriven identified some of the epistemological weaknesses of the Tylerian and comparative rationales in the first AERA Monograph (Scriven, 1967). The Tylerian approach does not solve the comparison problem since, as in the Eight-Year Study, curriculum objectives are always established by reference to (or in reaction to) the objectives and achievements of other programs and sets of standards. In effect, Scriven demonstrated that the question “Does it meet the standards laid down by program staff?” is, in principle, no different from the question “Is it better than Brand X?” Scriven
also reiterated the argument that two-group comparative designs give ambiguous results in that there is no intrinsic mechanism for separating the impact of the actual treatment from that of the associated Hawthorne and John Henry effects (p. 68). By analyzing the complementary weaknesses of the preeminent rationales, Scriven was able to outline some possible solutions. First, he maintained that the opinion of subject specialists should count more heavily in the validation of Tylerian objectives and criteria. Second, he suggested that simple designs based on experimental versus control groups should be replaced by designs with more than one experimental group. Finally, Scriven offered a solution to J. S. Mill’s multiple-criterion problem by arguing that individual criteria could be differentially weighted and then combined to form a single criterion measure. Scriven’s theoretical appraisal of the criterion problem was both elegant and appropriate. In 1970, however, Glass claimed that the practical implementation of Scriven’s solution required “evaluation techniques still not discovered” (p. 23). Glass examined possible procedures (such as minimax techniques) but was unable to devise a further technique for choosing among them. He concluded that “human judgment” (p. 29) was the only valid arbiter. Although the science of human conduct had come a long way since the days of John Stuart Mill, it continued to run aground on the shifting sands of human values. Confronted by such a tangle of epistemological, empirical, and statistical problems (Lord, 1967; Campbell & Erlebacher, 1970), certain educationists began to look for evaluation models beyond the conventional boundaries of postwar educational research. Among the more successful forays have been those into the realms of management theory, literary criticism, jurisprudence, and consumer science.
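The weighting proposal attributed to Scriven above, and the difficulty Glass raised about it, can be made concrete with a small sketch. Everything in it is hypothetical: the two programs, the criterion names, the scores, and the weights are invented for illustration and are not drawn from any study cited in this chapter. It is not a reconstruction of Scriven's or Glass's own procedures, only a minimal instance of combining differentially weighted criteria into a single measure.

```python
# Illustrative only: programs, criteria, scores, and weights are hypothetical.

def composite_score(scores, weights):
    """Combine several criterion scores into one weighted average."""
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight

# Two hypothetical programs, each stronger on a different criterion
# (the equivocal multiple-variable comparison described in the text).
program_a = {"achievement": 0.80, "attitude": 0.50}
program_b = {"achievement": 0.60, "attitude": 0.90}

# Two weightings that encode different value positions.
favor_achievement = {"achievement": 3.0, "attitude": 1.0}
favor_attitude = {"achievement": 1.0, "attitude": 3.0}

def preferred(weights):
    """Return the label of the program with the higher composite score."""
    a = composite_score(program_a, weights)
    b = composite_score(program_b, weights)
    return "A" if a > b else "B"

print(preferred(favor_achievement))
print(preferred(favor_attitude))
```

With these invented numbers the preferred program changes when the weights change. The arithmetic settles the comparison only after someone has decided how much each criterion matters, which is one way of reading the conclusion that human judgment remained the arbiter.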
Evaluation and Management Theory
Management models for evaluation (Provus, 1969; Rippey, 1973; Stufflebeam, Foley, Gephart, Guba, Hammond, Merriman & Provus, 1971) are program, organization, or system centered. They take as their basic aim the improvement of rational decision making. Evaluations are designed to reduce “institutional conflict” (Rippey, p. 14), to “facilitate quality control and improvement” (Stufflebeam et al., p. 217), or to “determine whether to improve, maintain or terminate a program” (Provus, p. 245). Although their methodologies may vary, the data and performance criteria of management models relate to the “total system” (Stufflebeam et al., 1971, p. 238) rather than to individual pupils or teachers. As such, they reflect the aspirations of personnel with programwide responsibility, not the immediate concerns of classroom practitioners.
Management-oriented evaluation models hark back to Bobbitt’s writings on the administration of school systems. Bureaucratic (i.e., management) efficiency tends to be blended with educational efficiency. In Provus’ revealing formulations, the evaluator is like a “management engineer” (p. 245), and the evaluation functions as a “watchdog of program management” (p. 260).
Evaluation and Literary Criticism
The influence of literary criticism as a role model for evaluation (Eisner, 1972; Kelly, 1971) also grew out of dissatisfaction with existing paradigms. Evaluations of this kind – a “supplement to the use of scientific procedures” (Eisner, 1975) – draw upon an artistic tradition of “connoisseurship and criticism.” They incorporate ways of seeing rather than ways of measuring. The evaluator (or “critic”) aims to sensitize the individual practitioner (or reader) by “rendering” an account of the program, using the “vehicles” of “suggestion, simile, and metaphor” (Eisner, 1972, p. 586). Nevertheless, despite these important methodological differences, Eisner’s “new” approach was, in essence, just as much an abstruse technology (with specialist training, journals, books, studentships) as the procedures it sought to supplement (see Eisner, 1975). This is not altogether surprising, since Eisner’s concern for “judgment . . . grounded in reasons” harks back to Dewey, whom he quotes approvingly, just as does Cooley and Lohnes’ (1976) call, equally Dewey-inspired, for evaluation research that is “multi-variate, large-sample and longitudinal” (p. 5). Eisner also solves the criterion problem in a way that is similar to that of Cooley and Lohnes. Judgments are established externally – by reference to prior “human needs” (Cooley & Lohnes, p. 13) or values derived from “tradition and habit” and “the nature of artistic virtue” (Eisner, 1975).
Evaluation and Jurisprudence
Legal or adversary models for evaluation (Kourilsky, 1973a; Levine, 1973b; Wolf, 1974) use the notion that courts of law have well-established principles of procedure which can be used to regulate and administer the processes of decision making. They can be seen as an attempt by the evaluation community to institutionalize the kind of debates that typically occur following the publication of an evaluation report. The most significant theoretical feature of these models is that they legitimate the existence of discrepant accounts presented by advocates and adversaries. Different models, however, embody different concepts of decision making. Kourilsky (1973b) saw the “goal” of adversary evaluation as the generation of “properly informed” decisions, whereas Levine (1973a) regarded the adversary model simply as a means of conducting debates about educational programs. Kourilsky focused on the “technology” of decision making, such as “selecting appropriate information”
Hamilton
Making Sense of Curriculum Evaluation
(1973a, p. 4), or “empanelling jurors” (see Wolf, 1974, preface). Levine emphasized the “politics of decision making” (1973b, p. 8; Levine, 1974). As shown below, such a distinction is crucial to this review.
Evaluation and Consumer Science
Consumer science provides a model for evaluation in cases where the curriculum is studied in terms of its value to the user, rather than in terms of the intentions (or goals) of the producer. Consumption is the ultimate criterion, not production. Evaluations of this kind examine payoffs rather than precepts. By comparison with the Tylerian rationale, the “actual effects” of the program are given priority over its objectives or “alleged effects” (Scriven, 1972b, p. 2, emphasis in original). In these instances the judgmental criteria are not prespecified by the curriculum developer. They are applied post hoc by the evaluator who uses external “standards of merit” derived from “the needs of the nation” (Scriven, 1972b, p. 2). Scriven (1972b) coined the term Goal Free Evaluation to describe this type of study. However, its ancestry stretches back through the evaluation of broad-aim programs of social action (see Weiss & Rein, 1969) to the social diagnoses conducted by Booth and others in the 19th century (see Caro, 1971).
Evaluation and the Problem of Consensus
Despite superficial differences, the evaluation models discussed in this section share a number of attributes. Each one draws upon a consensual image of social life. They assume that, in principle, the goals of a curriculum and the criteria for its success can be agreed upon. Their credibility rests on the stability of this assumption. In practice, consensus is usually arrived at by allowing surrogate interest groups, such as the evaluation community, to speak for the “welfare of society as a whole” (Scriven, 1967, p. 81). Whether consent is in fact assumed or established will vary from case to case. In most instances, however, there seems to be a tendency for course developers and evaluators to play a particularly strong role. The Tylerian tradition, for example, relies heavily upon the “curriculum maker” for its objectives (Tyler, 1949, passim; Stake, 1970, p. 187). Likewise, Scriven’s comparative evaluation model uses criteria validated by “highly qualified experts” and “professionally competent evaluators” (1967, pp. 58, 53). The national assessment program follows a similar pattern. It utilizes objectives identified by committees of “subject matter specialists” sprinkled with “thoughtful lay-persons” (Merwin & Womer, 1969, p. 315). Management systems approaches also subscribe to a similar view of consensus. Responsibility for defining performance criteria is
delegated to “skilled operating personnel” (Rippey, 1973, p. 13) or to the “program manager” (Provus, 1969, p. 251). Even the literary criticism, legal, and consumer models take an equivalent stance. Eisner, for instance, looks to “connoisseurship” for his ultimate criteria; decision-oriented legal models are suffused with the consensus image of unanimous verdicts (see Wolf, 1974, p. 62 ff.); and goal-free evaluation allows the evaluator to infer the “goals of the consumer or the funding agency” (Scriven, 1972b, p. 2). Given the assumption of goal consensus, the implementation of an evaluation rationale hinges upon the comparison of various means to achieve such ends. From John Stuart Mill and John Dewey, to Ralph Tyler and Michael Scriven, the possibility of realizing a theory of evaluation rested upon this assumption. For them, the dualistic separation of fact and value is incorrect and unacceptable. From Mill’s “Greatest Happiness” principle to Scriven’s “system of principles aimed at maximizing long-run social utility” (1967, p. 81), the assumptions and logic have remained essentially the same. Throughout, this vision of consensus has been well formulated, overtly rational, and immensely powerful. All the evaluation models discussed in this section stress the importance of agreement about objectives and/or criteria. As a consequence, they tend to play down the possibility that criteria might be mutually exclusive. This does not mean, of course, that they ignore areas of antagonism – merely that they regard them as potentially or pragmatically resolvable. Although Scriven, for instance, has acknowledged that different individuals may have an “opposite preference” (1972b, p. 2), his main thrust has been that evaluators should focus preferentially upon areas of agreement (1972a, p. 84). Stufflebeam et al. (1971) made a similar point. 
They proposed that the decision maker should go into the “value web of the larger world only as far as necessary to find a common value among his constructs” (p. 116). Both these strategies presume that values which are shared are more significant than discrepant values. There is no logical reason why this should be the case. Such a presumption may offer an expedient solution to the criterion problem, but it has difficulty in resolving the prior value question, Who decides that consensus has been achieved? As this suggests, consensus models tend to be justified by appeals to representative democracy, yet, as in the days of J. S. Mill, it is still not clear in each case whether everyone has achieved the right to vote or to sit on the jury panel.
Curriculum Evaluation and the Image of Pluralism
Just as Herbert Spencer raised objections to Mill’s social theories, and various contemporaries of Bobbitt and Charters expressed concern with the curriculum-building model (e.g., Rugg, 1931), so certain commentators
articulated doubts about the consensus assumptions of recent evaluation theories. Ideas about pluralism and politics were brought to the forefront. A connection between consensus and politics was clearly identified by the early critics. For instance, in a review of “Research Styles in Science Education,” Atkin (1967–68) noted that one of the “major shortcomings” of the systems model of curriculum development was its reluctance to “recognize the competition among diverse value systems and power groups” (p. 341). Around the same time Stake’s paper “The Countenance of Educational Evaluation” (1967a) included the pluralist argument that “part of the responsibility of evaluation is to make known which standards are held by whom” (p. 535). In retrospect, developments of this kind can be seen as a turning point in the recent history of evaluation. Atkin and Stake (both at the University of Illinois) were accepted leaders in, respectively, the fields of curriculum development and curriculum evaluation. It was as if the winners of the tournament had suddenly begun to question the rules that had made them victorious. The rediscovery of values was also fueled by Scriven’s contemporaneous argument that “evaluation proper” must also include the “evaluation of goals.” The idea that goal evaluation should be an “equal partner with the measuring of performance against goals” implied a radical shift of concern (Scriven, 1967, p. 52). Within Scriven’s rationale, evaluation was not merely “the process of determining to what extent the educational objectives are being realized” (Tyler, 1949, p. 105); it also included the post hoc scrutiny of the prespecified objectives.
Value Analysis and Value Pluralism
The possibilities of value analysis and value pluralism opened up new perspectives and new problems for curriculum evaluation. The major premise that evaluation is the ascription of worth with reference to a given set of standards was joined by a new assumption: that a uniformity of standards may not be attainable in social situations. The introduction of this second assumption nullified the conventional wisdom of earlier theory. The notion of evaluator as “watchdog” became difficult to advocate, since its flavor was too reminiscent of autocratic, hired-hand research. Likewise, evaluators could no longer claim to provide categorical answers acceptable to all parties or to furnish prepackaged instruments suitable for every occasion. A long-established technology became unwieldy, if not unsafe. In the search for more “democratic” models (see MacDonald, 1976) certain evaluators sought to redefine the evaluation problem, renegotiate their role, and reformulate their strategies for information gathering and data analysis. The new perspectives tended to acknowledge that evaluation is as much a sociopolitical as a methodological process (e.g., “The Process
and Ideology of Valuing in Educational Settings,” Apple, 1974; School Evaluation: The Politics and Process, House, 1973; “Racism and Educational Evaluation,” Jenkins, Kemmis, MacDonald, & Verma, 1977; and “Politics, Ethics and Evaluation Research,” Sjoberg, 1975). At the present time, evaluation models with a pluralistic concern are still relatively limited in their impact. They occupy either an interstitial or a subordinate status in the education system. In the former case they tend to be employed (and funded) where Tylerian and experimental models are empirically, financially, or politically less attractive – as in studies of alternative schools (Black & Geiser, 1971), programs in aesthetic education (Stake, 1975), extracurricular activities (Stake & Gjerde, 1974) and multicultural projects such as the Teacher Corps (Fox, 1976). The second context for the utilization of pluralist models is as a complement to Tylerian or experimental designs, as in the evaluation of Home Start (Love, Nauta, Coelen, Hewett, & Rupp, 1976) and the evaluation of a school-based computer-aided instruction curriculum (Smith & Pohland, 1974). In these latter cases, however, the pluralist assumptions were overshadowed by the consensus concerns of the dominant models. Despite a declared eclecticism, the conflicting priorities of consensus and pluralist models are rarely (if at all) resolved in a manner that honors the aspirations of all parties (see, e.g., Wehlage, 1976). In practical terms, pluralist evaluation models (Parlett & Hamilton, 1972; Patton, 1975; Stake, 1967a) can be characterized in the following manner. Compared with the classical models, they tend to be more extensive (not necessarily centered on numerical data), more naturalistic (based on program activity rather than program intent), and more adaptable (not constrained by experimental or preordinate designs).
In turn they are likely to be sensitive to the different values of program participants, to endorse empirical methods which incorporate ethnographic fieldwork, to develop feedback materials which are couched in the natural language of the recipients, and to shift the locus of formal judgment from the evaluator to the participants.
Problems of Pluralism
At first, such models were rarely self-conscious or explicit about their pluralism. They were more likely to emerge in isolation as a response to the methodological weaknesses of the traditional models. Through time, however, they began to develop an epistemological and logical identity of their own: a theory about evaluation rather than a theory of evaluation. This, in turn, raised a series of specific problems for pluralist practitioners. One such difficulty stemmed from the separation of program development from program evaluation. What role, for instance, can a pluralist evaluation play with regard to a Tylerian rationale which has as one of its “basic
assumptions” that “an educational program is appraised by finding out how far the objectives of the program are actually being realized” (Tyler, in Smith & Tyler, 1942, p. 12)? The evaluation team would be predisposed to scrutinize the objectives of the program – something that the development team would consider to be illegitimate. How can such a tension be resolved? The possibility of practitioner disagreement over curriculum objectives also means that pluralist evaluations are likely to work from program practices to program goals. In effect, a figure-ground reversal takes place. As in the case of goal-free evaluation, the learning milieu is regarded as containing the substance of educational innovation, not, as is sometimes implied, its pale or distorted shadow. The program shapes the evaluation methodology, not vice versa. As noted above, this can create serious difficulties, since it is no longer possible to prescribe specific methodological procedures without a knowledge of the context in which they are to be used. The preparation of training programs for pluralist evaluators is vitiated by this problem. In what sense is it possible to talk about a methodology in the absence of a complementary theory of the situation? A third issue resulting from the adoption of pluralist evaluation models relates to the establishment of evaluative criteria. Given the unacceptability of standards unilaterally offered by “experts,” the problem facing the evaluator is not so much which criteria and how as it is whose criteria and why. A fourth issue hinges on the interpretation of pluralist. In the extreme, pluralist evaluation models could be taken to mean that all viewpoints are equally valid, and all interest groups are equally powerful. This position creates profound problems, since the very essence of evaluation is making statements about the relative merits of different perspectives. 
To espouse this cause is to be committed to a relativist, value-free evaluation (see Nowell-Smith, 1971).
A Pluralist Theory of Evaluation?
Doubts about relativism lie at the heart of debates about pluralist evaluation: In what sense is a pluralist evaluation possible? Will objective data be honored? In what sense can a pluralist evaluation be fair to all parties? Whose logic will be followed? These questions were posed very sharply in “Justice in Evaluation” (House, 1976). Although critical of the utilitarian basis of consensus models, House noted the absence of an explicit pluralist theory of evaluation. His paper offered such a theory, using notions derived from A Theory of Justice (Rawls, 1971). If J. S. Mill claimed that “utility” should be the “first principle” of ethics, Rawls countered with the concept of “justice as fairness.” The value of a curriculum is not measured against its effectiveness, as with aggregate changes in test scores, but against its fairness. Furthermore, individual persons
are taken to be the basic focus of analysis, rather than social institutions such as “schools and colleges,” as in the Eight-Year Study (see Smith & Tyler, 1942, p. 5) or “geographic regions,” as in the National Assessment of Educational Progress (see Education Commission of the States, 1974, p. 2). This last point allows for the fact that different individuals can pursue a “plurality of ends” (House, 1976, p. 97). Justice as fairness aspires to be the pluralist counterpart of utilitarianism. In these terms House confronted the problem of relativism by arguing that the viewpoint of the “least advantaged” (p. 84) should be given priority over the values of other groups. House had already tested these ideas in his evaluation of the Michigan Accountability System (House, Rivers, & Stufflebeam, 1974). This evaluation was commissioned by certain recipients of the scheme (the NEA), rather than by its originators (Michigan Department of Education). House acknowledged that his analysis “ignored” some of the “philosophic difficulties” (p. 98) surrounding the notion of justice as fairness (e.g., it presupposes the existence of a consensus in favor of pluralism). Nevertheless, his paper had the important distinction of forcing the relativism problem out into the open. Other models of pluralist evaluation have tried to resolve the relativist dilemma by stressing the information-gathering rather than the judgment-making aspects of evaluation. Still others have idealized the evaluator as a free-floating, independent intellectual. Both these positions (reviewed in House, 1976) are weak and unsatisfactory. At root they embody a kind of concealed consensus. The former comes very close to the hired-hand role of management-oriented evaluation, and the latter, by its appeal to expertise, is highly reminiscent of elitist variants of the consensus model.
One final and perhaps more coherent response to the problems of pluralist evaluation has been to hand over responsibility for the control of an evaluation to those who have to live with its consequences. Evaluation is conducted by the participants rather than for the participants. Models of this kind, such as Black & Geiser’s (1971) notion of “peer research,” and Scheyer & Stake’s (1976) image of a “self-evaluation portfolio,” undermine the de facto hierarchical and bureaucratic processes of consensus evaluation. They respond to an alternative conception of accountability – one that locates community rather than centralized control at its core. They imply that lifelong experience of social relationships in inner-city schools may generate a more sophisticated account of educational practice than proficiency at multivariate analysis. And they also imply that the tacit knowledge of practitioners may be more significant to program operation than the generalized statements of theoreticians. In a sense, the history of curriculum evaluation has come full circle. Just as the curriculum was taken out of lay control in the 19th century, so, at a new level, demands are heard for its restoral. Curriculum evaluation has played a part in both these movements.
Summary and Conclusions
In his introduction to a 1970 review symposium on educational evaluation, Denny (1970) noted the absence of investigations into the “historical growth of evaluation methodology.” This chapter, written by a participant rather than a bystander, has tried to respond to that shortcoming. Through an examination of the historical forms embraced by curriculum evaluation, it has related some of their more fundamental features. At the same time, it has suggested that recent events indicate a crucial differentiation of these institutional forms. On the one hand, there are evaluation models which rely for their coherence on notions of consensus; on the other, there are others (here termed pluralist) which treat consensus as a problematic assumption. As befits their origins in 19th-century liberal and pragmatic thought (Gouldner, 1971, passim; Karier, 1974, p. 280), consensus models are strong on technology, reformism, and social engineering. They regard evaluation as a technical accomplishment: the demonstration of empirical/logical connections between what is and what (we all agree) ought to be. If an evaluation is to be successful, there has to be “consensus on the key issues of the hierarchy of purposes of education and the rules of evidence” (Cooley & Lohnes, 1976, p. 5). The internal logic of these models guarantees the strength of their truth statements. Given agreement on educational ends, the unambiguous selection of appropriate means (i.e., the curriculum) is simply a technical problem. A crucial feature of pluralist models, however, is that they are skeptical of these “preconditions” (Cooley & Lohnes, p. 5). To this extent, the differences between consensus and pluralist models are epistemological rather than methodological. One explanation for the emergence of pluralist theories is that, as in the 1930s, they reflect a general crisis in the realm of social values.
They are an expression of doubt and reflection rather than certainty and action. As such, they tend to be strong on conflicting interpretations, value differences, and incomplete closure. Evaluation is offered as an unfinished blueprint rather than a perfected technology. It generates issues, not solutions. It is about information rather than confirmation. By their openness, however, pluralist models also leave a number of questions unanswered. Is pluralism consonant with an overarching theory of values? Or is it incompatible with such a universalistic notion? Is the rise of pluralism a temporary phase? Will it be replaced by a new consensus? Or has crisis become a necessary feature of any social system that has gone “beyond the stable state” (Schon, 1971)? To address questions such as these is to reach out far beyond the problems of research technique and goal identification. Yet, if acts are best comprehended by establishing their context, then tasks of this magnitude are essential to an adequate understanding of curriculum evaluation.
References
Anderson, S. B., Ball, S., & Murphy, R. T. (Eds.). Encyclopedia of educational evaluation. San Francisco: Jossey-Bass, 1975.
Apple, M. W. The process and ideology of valuing in educational settings. In M. W. Apple, M. J. Subkoviak, & H. S. Lufler (Eds.), Educational evaluation: Analysis and responsibility. Berkeley, Cal.: McCutchan, 1974.
Apple, M. W., Subkoviak, M. J., & Lufler, H. S. (Eds.). Educational evaluation: Analysis and responsibility. Berkeley, Cal.: McCutchan, 1974.
Atkin, J. M. Some evaluation problems in a course content improvement project. Journal of Research in Science Teaching, 1963, 1, 129–132.
Atkin, J. M. Research styles in science education. Journal of Research in Science Teaching, 1967–68, 5, 338–345.
Black, S., & Geiser, K. The Watertown Home Base School evaluation methodology report. Watertown, Mass.: Home Base School, 1971. (mimeo)
Bloom, B. S. (Ed.). Taxonomy of educational objectives: The classification of educational goals. Handbook 1: Cognitive domain. London: Longmans Green, 1956.
Britton, K. John Stuart Mill. New York: Dover, 1969.
Callahan, R. E. Education and the cult of efficiency: A study of the social forces that have shaped the administration of the public schools. Chicago: University of Chicago Press, 1962.
Campbell, D. T., & Erlebacher, A. How regression artifacts in quasi-experimental evaluations can mistakenly make compensatory education appear harmful. In J. Hellmuth (Ed.), Disadvantaged Child (Vol. 3). New York: Brunner/Mazel, 1970.
Campbell, D. T., & Stanley, J. C. Experimental and quasi-experimental designs for research on teaching. In N. L. Gage (Ed.), Handbook of research on teaching. Chicago: Rand McNally, 1963.
Caro, F. G. Evaluation research: An overview. In F. G. Caro (Ed.), Readings in evaluation research. New York: Russell Sage Foundation, 1971.
Charters, W. W. Review and critique of curriculum making for the vocations. In G. M. Whipple (Ed.), The foundations and technique of curriculum construction. 26th Yearbook of the NSSE. Bloomington, Ill.: Public School Publishing Company, 1926.
Coleman, J. S., Campbell, E. Q., Hobson, C. J., McPartland, J., Mood, A. M., Weinfeld, F. D., & York, R. L. Equality of educational opportunity. Washington, D.C.: U.S. Government Printing Office, 1966.
Cooley, W. W., & Lohnes, P. R. Evaluation research in education. New York: Irvington Publishers (John Wiley), 1976.
Cremin, L. A. The transformation of the school: Progressivism in American education, 1876–1957. New York: Alfred A. Knopf, 1961.
Cronbach, L. J. Course improvement through evaluation. Teachers College Record, 1963, 64, 672–683.
Denny, T. Foreword to a series of review articles on educational evaluation. Review of Educational Research, 1970, 40(2), Foreword.
Dewey, J. Psychology and social practice. Psychological Review, 1900, 7, 105–124.
Education Commission of the States. Questions and answers about the National Assessment of Educational Progress. Denver, Colo.: Education Commission of the States, 1974.
Eisner, E. Emerging models for educational evaluation. School Review, 1972, 80, 573–590.
Eisner, E. The perceptive eye: Toward the reformation of educational evaluation. Paper presented at the meeting of the American Educational Research Association. Washington, D.C., 1975.
Feinberg, W. Ethics and objectivity: The effects of the Darwinian revolution on educational reform. Educational Theory, 1973, 23, 294–302.
Flanagan, J. C. The uses of educational evaluation in the development of programs, courses, instructional materials and equipment, instructional and learning procedures, and administrative arrangements. In R. W. Tyler (Ed.), Educational evaluation: New roles, new means. 68th Yearbook of the NSSE, Pt. 2. Chicago: University of Chicago Press, 1969.
Fox, J. T. (Ed.). The 1975 CMTI impact study. Madison: University of Wisconsin, School of Education, 1976. (mimeo)
Glass, G. V. The growth of evaluation methodology. Boulder, Colorado: Laboratory of Educational Research, 1970. (mimeo)
Glazer, N. Social policy in America. New Society, April 5, 1973, pp. 9–11.
Glennan, T. K. Evaluating federal manpower programs: Notes and observations. In P. H. Rossi & W. Williams (Eds.), Evaluating social programs. New York: Seminar Press, 1972.
Goodlad, J. I. The changing school curriculum. New York: Fund for the Advancement of Education, 1966.
Goodlad, J. I. Thought, invention and research in the advancement of education. Educational Forum, 1968, 33, 7–18.
Gouldner, A. W. The coming crisis of Western sociology. London: Heinemann, 1971.
Guba, E. Significant differences. Educational Researcher, 1969, 20, 4–5.
Hagen, E. P., & Thorndike, R. L. Evaluation. In C. W. Harris (Ed.), Encyclopedia of educational research. New York: Macmillan, 1960.
Hamilton, D. Educational research and the shadows of Francis Galton and Ronald Fisher. Unpublished paper, 1974 (mimeo). To appear in W. B. Dockrell & D. Hamilton (Eds.), Rethinking educational research. London: Hodder & Stoughton, in press.
Hofstadter, R. Social Darwinism in American thought, 1860–1915. Philadelphia: University of Pennsylvania Press, 1955.
House, E. (Ed.). School evaluation: The politics and process. Berkeley: McCutchan, 1973.
House, E. Justice in evaluation. In G. V. Glass (Ed.), Evaluation studies review annual. Beverly Hills, Cal.: Sage, 1976.
House, E. R., Rivers, W., & Stufflebeam, D. L. An assessment of the Michigan accountability system. Phi Delta Kappan, 1974, 55, 663–669.
Hurd, P. D. New directions in teaching secondary school science. Chicago: Rand McNally, 1969.
Jackson, P. W. Shifting visions of the curriculum: Notes on the ageing of Franklin Bobbitt. Elementary School Journal, 1975, 75, 119–133.
Jenkins, D., Kemmis, S., MacDonald, B., & Verma, G. Racism and educational evaluation. In G. Verma & C. Bagley (Eds.), Race, education and identity. London: Heinemann, 1977.
Jensen, A. R. How much can we boost IQ and scholastic achievement? Harvard Educational Review, 1969, 39, 1–123.
Joncich, G. The sane positivist: A biography of Edward L. Thorndike. Middletown, Conn.: Wesleyan University Press, 1968.
Karier, C. J. Testing for order and control in the corporate liberal state. In C. Karier, P. Violas, & J. Spring, Roots of crisis: American education in the twentieth century. Chicago: Rand McNally, 1973.
Karier, C. Ideology and evaluation: In quest of meritocracy. In M. W. Apple, M. J. Subkoviak, & H. S. Lufler (Eds.), Educational evaluation: Analysis and responsibility. Berkeley, Cal.: McCutchan, 1974.
Kelly, E. Curriculum evaluation and literary criticism: The explication of an analogy. Unpublished doctoral dissertation, University of Illinois at Champaign–Urbana, 1971.
Kerlinger, F. N. Foundations of behavioral research. New York: Holt, Rinehart & Winston, 1964.
Krug, E. A. The shaping of the American high school, 1880–1920. Madison, Wis.: University of Wisconsin Press, 1969.
Kourilsky, M. An adversary model for educational evaluation. Evaluation Comment, 1973, 4 (2), 3–6. (a)
Kourilsky, M. The Levine adversary model: An adversary comment. Evaluation Comment, 1973, 4 (2), 6–7. (b)
Lazerson, M. Origins of the urban school: Public education in Massachusetts, 1870–1915. Cambridge, Mass.: Harvard University Press, 1971.
Levine, M. The Kourilsky adversary model: An adversary comment. Evaluation Comment, 1973, 4 (2), 8. (a)
Levine, M. Scientific method and the adversary model. Evaluation Comment, 1973, 4 (2), 1–3. (b)
Levine, M. Scientific method and the adversary model. American Psychologist, 1974, 29, 661–677.
Light, R. J., & Smith, P. V. Choosing a future: Strategies for designing and evaluating new programs. Harvard Educational Review, 1970, 40, 1–28.
Lindquist, E. F. (Ed.). Educational measurement. Washington, D.C.: American Council on Education, 1951.
Lord, F. M. A paradox in the interpretation of group comparisons. Psychological Bulletin, 1967, 68, 304–305.
Lortie, D. C. The cracked cake of educational custom and emerging issues in evaluation. In M. C. Wittrock & D. E. Wiley (Eds.), The evaluation of instruction: Issues and problems. New York: Holt, Rinehart & Winston, 1970.
Love, J., Nauta, M., Coelen, C., Hewett, K., & Rupp, R. National Home Start evaluation: Final report. Ypsilanti, Mich.: High/Scope Educational Research Foundation; Cambridge: Abt Associates Inc., 1976.
MacDonald, B. Evaluation and the control of education. In D. Tawney (Ed.), Curriculum evaluation today: Trends and implications. London: Macmillan, 1976.
McDill, E. L., McDill, M. S., & Sprehe, J. T. Evaluation in practice. In P. H. Rossi & W. Williams (Eds.), Evaluating social programs. New York: Seminar Press, 1972.
McKim, M. G. Curriculum research in historical perspective. In Research for curriculum improvement. 1957 Yearbook of the Association for Supervision and Curriculum Development. Washington, D.C., 1957.
Merwin, J. C., & Womer, F. B. Evaluation in assessing the progress of education to provide bases of public understanding and public policy. In R. W. Tyler (Ed.), Educational evaluation: New roles, new means. 68th Yearbook of the NSSE, Pt. 2. Chicago: University of Chicago Press, 1969.
Nagel, E. (Ed.). John Stuart Mill’s philosophy of scientific method. New York: Hafner, 1950.
Nowell-Smith, P. H. Cultural relativism. Philosophy of the Social Sciences, 1971, 1, 1–17.
Parlett, M., & Hamilton, D. Evaluation as illumination: A new approach to the study of innovatory programs. Occasional Paper No. 9, University of Edinburgh Centre for Research in the Educational Sciences, 1972. (Reprinted in G. V. Glass (Ed.), Evaluation studies review annual. Beverly Hills, Cal.: Sage, 1976.)
Patton, M. Q. Alternative evaluation research paradigm. Grand Forks, N.D.: North Dakota Study Group on Evaluation, 1975.
Popham, W. J. (Ed.). Evaluation in education. Berkeley, Cal.: McCutchan, 1974.
Popkewitz, T. S., & Wehlage, G. G. Accountability: Critique and alternative perspective. Interchange, 1973, 4 (4), 48–62.
Provus, M. Evaluation of ongoing programs in the public school system. In R. W. Tyler (Ed.), Educational evaluation: New roles, new means. 68th Yearbook of the NSSE, Pt. 2. Chicago: University of Chicago Press, 1969.
Provus, M. Discrepancy evaluation. Berkeley, Cal.: McCutchan, 1971. Rawls, J. A theory of justice. Cambridge, Mass.: Belknap, 1971. Rippey, R. M. (Ed.). Studies in transactional evaluation. Berkeley, Cal.: McCutchan, 1973. Rugg, H. O. Culture and education in America. New York: Harcourt Brace, 1931. Scheyer, P., & Stake, R. E. A program’s self-evaluation portfolio. Center for Instructional Research and Curriculum Evaluation, University of Illinois at Champaign–Urbana, 1976. (mimeo) Schon, D. A. Beyond the stable state: Public and private learning in a changing society. London: Temple Smith, 1971. Schutz, R. E. Methodological issues in curriculum research. Review of Educational Research, 1969, 39, 359–366. Sciara, F. J., & Jantz, R. K. (Eds.). Accountability in American education. Boston: Allyn & Bacon, 1972. Scriven, M. The methodology of evaluation. In AERA Monograph Series on Curriculum Evaluation (No. 1). Chicago: Rand McNally, 1967. Pp. 39–83. Scriven, M. An introduction to meta-evaluation. In P. A. Taylor & D. M. Cowley (Eds.), Readings in curriculum evaluation. Dubuque, Iowa: William C. Brown, 1972. (a) Scriven, M. Pros and cons about goal-free evaluation. Evaluation Comment, 1972, 3, 1–4. (b) Scriven, M. Evaluation perspectives and procedures. In W. J. Popham (Ed.), Evaluation in education. Berkeley, Cal.: McCutchan, 1974. Seguel, M. L. The curriculum field: Its formative years. New York: Teachers College Press, 1966. Sjoberg, G. Politics, ethics and evaluation research. In E. L. Struening & M. Guttentag (Eds.), Handbook of evaluation research (Vol. 2). Beverly Hills, Cal.: Sage, 1975. Smith, E. R., & Tyler, R. W. Appraising and recording student progress. New York: Harper & Bros., 1942. Smith, L. M., & Pohland, P. A. Education, technology and the rural highlands. In AERA Monograph Series on Curriculum Evaluation (No. 7). Chicago: Rand McNally, 1974. Stake, R. E. The countenance of educational evaluation. Teachers College Record, 1967, 68, 523–540. 
(a) Stake, R. E. Toward a technology for the evaluation of educational programs. In AERA Monograph Series on Curriculum Evaluation (No. 1). Chicago: Rand McNally, 1967. (b) Stake, R. E. Objectives, priorities, and other judgment data. Review of Educational Research, 1970, 40, 181–212. Stake, R. E. (Ed.). Evaluating the arts in education: A responsive approach. Columbus, Ohio: Charles E. Merrill, 1975. Stake, R., & Gjerde, C. An evaluation of TCITY, the Twin City Institute for Talented Youth. In AERA Monograph Series on Curriculum Evaluation (No. 7). Chicago: Rand McNally, 1974. Stanley, J. C. The influence of Fisher’s “The Design of Experiments” on educational research thirty years later. American Educational Research Journal, 1966, 3, 223–229. Stufflebeam, D. L., Foley, W. J., Gephart, W. J., Guba, E. G., Hammond, R. L., Merriman, H. O., & Provus, M. M. Educational evaluation and decision making. Itasca, Ill.: F. E. Peacock, 1971. Taylor, F. W. The principles of scientific management. New York: Harper & Bros., 1911. Taylor, P. A., & Cowley, D. M. New dimensions in evaluation. In P. A. Taylor & D. M. Cowley (Eds.), Readings in curriculum evaluation. Dubuque, Iowa: William C. Brown, 1972. Taylor, P. A., & Cowley, D. M. (Eds.). Readings in curriculum evaluation. Dubuque, Iowa: William C. Brown, 1972. Thurstone, L. L., & Thurstone, T. G. Factorial studies of intelligence. Chicago: University of Chicago Press, 1941. Tyack, D. B. The one best system: A history of American urban education. Cambridge, Mass.: Harvard University Press, 1974.
Tyler, R. W. Basic principles of curriculum and instruction. Chicago: University of Chicago Press, 1949. Tyler, R. W. (Ed.). Educational evaluation: New roles, new means. 68th yearbook of the NSSE, part 2. Chicago: University of Chicago Press, 1969. Walberg, H. J. (Ed.). Evaluating educational performance: A sourcebook of methods, instruments, and examples. Berkeley: McCutchan, 1974. Walker, D. F., & Schaffarzick, J. Comparing curricula. Review of Educational Research, 1974, 44, 83–111. Webb, B. My apprenticeship. Harmondsworth, England: Penguin Books, 1971. (Originally published, 1926.) Wehlage, G. The ethics and politics of evaluation: Patrons, clients and casualties. Paper presented at the meeting of the American Educational Research Association, San Francisco, April, 1976. Weir, E. An experimental approach to curriculum evaluation: The BSCS population genetics field trial. In R. E. Stake (Ed.), Case studies in the evaluation of educational programmes. Paris: Organization for Economic Cooperation and Development, 1976. Weiss, R. S., & Rein, M. The evaluation of broad-aim programs: A cautionary case and a moral. Annals, 1969, 385, 133–142. Welch, W. W., & Walberg, H. J. A course evaluation. In H. J. Walberg (Ed.), Evaluating educational performance. Berkeley, Cal.: McCutchan, 1974. White, M. G. Science and sentiment in America: Philosophical thought from Jonathan Edwards to John Dewey. New York: Oxford University Press, 1972. Williams, W., & Evans, J. W. The politics of evaluation: The case of Head Start. Annals, 1969, 385, 118–132. Wolf, R. L. The application of select legal concepts to educational evaluation. Unpublished doctoral dissertation, University of Illinois at Champaign–Urbana, 1974.
23 Psychology of Learning Environments: Behavioral, Structural, or Perceptual? Herbert J. Walberg
Source: Review of Research in Education, 4 (1976): 142–178.

Before overviewing the purpose and parts of this chapter, it may be helpful to mention the general features of the three models of learning process that are contrasted in the discussion (Figure 1). In the behavioral model the teacher presents stimuli to the student, observes or psychometrically assesses the responses, and selectively reinforces them by reward and punishment. In the structural model, the preprogrammed development of internal mechanisms mainly determines the course of learning; the teacher stimulates the maturation of these mechanisms, draws them out, or provides the environment in which they can be acted upon or be concretized. The perceptual model allows for behavioral and structural mechanisms but holds that the student’s conscious perception of internal and external stimuli and his choices are the proximate, mediating determinants of learning.

[Figure 1: Three models of the learning process. Panels: 1. Behavioral Model (External Stimuli, Instruction, Behavior); 2. Structural Model (Internal Structures, Instruction, Learning); 3. Perceptual Model (External and Internal Stimuli, Perception, Emergent Structures, Learning).]

Because behaviorism has increasingly dominated psychological thought about education since John B. Watson’s (1913) famous paper, it should be critically examined. The natural sciences now raise questions about its assumptions, and certain theoretical presuppositions of behaviorism in education that have been neglected should be made explicit. Accordingly, the first section of this review critically questions behaviorism from a structural perspective. The second section describes the intriguing but rudimentary character of structuralism itself, as either a scientific theory of psychology or a workable basis of educational practice. The third section proposes perception as a useful, transactional concept between structure and behavior in research on classroom learning environments, and the fourth proposes a framework for future empirical work. The fifth section discusses several analytic problems of perceptions of learning environments and examples of training, intervention, and evaluation applications. Thus the first three sections of this review concentrate on theoretical issues of psychology and their consequences for educational research, and the last two sections take up questions concerning research methods in classroom perceptions.

An attempt has been made to avoid topics and issues that have been recently reviewed elsewhere. See Moos (1973) and Insel and Moos (1974) for an analytic treatment of psychological research on a variety of human settings; W. J. Campbell (1970) and Marjoribanks (1974) for valuable but neglected collections of substantive work on learning environments carried out in Australia, Canada, England, and the United States; Khan and Weiss (1973), Randawa and Fu (1973), Shulman and Tamir (1973), and Walberg (1971, 1974a) for substantive and methodological reviews; and Walberg (1974b) for a source book of learning environment instruments and evaluations by several research groups.
Structural Criticism of Behaviorism

One way of sorting psychological theories of education is to ask whether learning is more the enactment of structural potentialities or the external shaping of behavior. This classification leaves out some theories, and the dichotomy oversimplifies for some purposes. Nevertheless, it enables us to see the historical continuity of two fairly distinct psychological traditions that trace back to the origins of Western thought and inquiry. They remain philosophically and scientifically unreconciled, and they conflict in their implications for educational practice.

Alfred North Whitehead observed that Western philosophy (which until this century included psychology as a special topic) may be considered footnotes to Plato. Indeed, historically oriented reviews of social and developmental psychology (Allport, 1968; Riegel, 1972; Walberg, 1973) and psychological theories of instruction (Walberg, 1975) show that Plato’s Republic posed many of the enduring questions of educational psychology. Plato’s dualism sharply distinguished mind and matter, and he held that ideas alone endure. Perhaps influenced by Oriental beliefs in reincarnation, he held that ideas are present in the child’s mind at birth. Since “knowledge is but remembrance,” the Socratic teacher acts as a “midwife of ideas.” The original meaning of educate – to draw forth rather than to stamp in – follows from Plato’s theory of ideas. Education, for him, is the soul’s re-cognition, more precisely, stagewise apprehension and integration of abstract ideas, of which the empirical flux is but a series of images. However odd these notions appear at first glance, they underlie a vital tradition of structuralism in Continental Europe that challenges Anglo-American behaviorism today. Structuralists, represented by Sigmund Freud, Carl Jung, and Jean Piaget, posit innate or a priori structures (ideas, qualities, or sophisticated capacities) and often stress their stagewise development or integration in the individual or in society. Educational cultivators of the structural tradition include Rousseau, Pestalozzi, Froebel, Montessori and open educators who call for child-centered instruction in contrast to that which is centered in the external authority of the subject matter, teacher, or society (Riegel, 1972; Walberg, 1975).

The tenets of behaviorism can be traced to Aristotle, who was skeptical of Plato’s concept of ideas. Perhaps because of his early interest in biology, he favored explanation in terms of empirically discriminable qualities, classification rather than integration of subject matter for inquiry, descriptive taxonomies, and an atomistic concept of mind as the aggregation and association of mental elements by induction from external reality. John Locke’s portrayal of the child’s mind as a “blank tablet” descends from Aristotle. Other English and American inheritors of the Aristotelian tradition include Bacon, Hume, Newton, and other English empiricists; Hobbes, Darwin, Spencer, Sumner, Pearson, and Galton, who described ideal types that survive in competitive environments; and Galton, Thorndike, Watson, Gesell, Hull, Terman, and Skinner, who sought to identify, norm, or amass items of knowledge or discrete behaviors.
To be sure, structuralism and behaviorism have stimulated one another, and the important theorists are not all geographically, philosophically, or linguistically separated by the English Channel. Starting with Aristotle, who tried to harmonize his ideas with Plato’s, many synthesizers had first-hand contact with the two traditions: William James, John Dewey, and Henry Murray were directly influenced by Continental thought; Ludwig Wittgenstein, Kurt Lewin, Egon Brunswik, Erik Erikson, Bruno Bettelheim, and Paul Lazarsfeld were Continental migrants to England and the United States. But there has been no grand, enduring synthesis. Structuralism is substantively and methodologically alien from the psychologies of the English-speaking countries, and we often seem to misunderstand or distort it.

Some examples illustrate the point. Although Freud is one of the great minds of this century, and perhaps of all time, he was rightly pessimistic about American psychologists’ understanding of psychoanalysis, and there are few psychology departments today where it may be studied in depth. The simple tests Alfred Binet devised or adapted to pick out those few Parisian children who might not benefit from regular schooling led to the American educational psychologists’ apparent preoccupation with testing and ranking of verbal achievement, among both hereditarians and environmentalists. Contrary to Piaget’s (1971) position, behaviorists attempt to norm and accelerate the development of his schemata. Proponents of open or informal education whose premises are structural, as evidenced in content analysis of their writings (Walberg, 1975), resist the behavioral confusion of their ideas with open space, criterion-referenced tests, and permissiveness.

Notwithstanding these misunderstandings, structuralists can constructively stimulate behaviorists. Each position, in my opinion, is more like a complex of aesthetic preferences than a scientific model that can be rigorously tested.
The structuralist insistence on theoretical coherence, scope, and invariance challenges and balances behaviorist inclinations toward classification and taxonomy, unrelieved empiricism, and the vision of prediction and control as ultimate scientific criteria. As Shulman (1974) points out, E. G. Boring, historical spokesman for American psychology’s hard-nosed experimental wing, cites Freud as the most important psychologist of the first half of the 20th century. McGuire (1973) admires Freud and Piaget for their direct confrontation of empirical reality, in a sense going behaviorists one better on their own ground, rather than studying indirect or secondary evidence of behavior such as the closing of relays, EKG recordings, and filled-in blanks. Moreover, psychoanalysis and the méthode clinique require ongoing experimental interventions. Since our sympathies may be more strongly linked to behaviorism, we ought to examine critically our position from the other point of view. Accordingly, a structural reconsideration of the behavioral concepts of atomism, flux, evolution, control, and reductionism is in order. The discussion is prefaced by noting that not all behaviorists would subscribe to the extreme behavioral views described below (perhaps none would), and neither do
structuralists constitute a pure type. The differences in views are sharply drawn to illustrate the theoretical issues.
Atomism

Both Aristotle and Locke drew on Democritus’ atomism in formulating models of learning. Locke, who began the atomistic tradition of English empiricism, followed Bacon’s and Newton’s utmost caution in theorizing. Newton, it may be recalled, took pride in refusing to speculate about a causal structure to explain gravitational attraction. As the biologist Pantin (1968) points out, even Darwin’s “theory” of evolution must be considered an inductive generalization based on an immense number of empirical instances; the explanation in terms of genetic structure came much later. Physics and chemistry began to be made exact and mature at the time of Locke and Newton by progressively finer analysis of discrete particles and briefer phenomena (Pantin, 1968).

Influenced by the unquestionable breakthroughs of the natural sciences, psychology and the other nascent human sciences sought revelations first in psychobehavioral (mind-matter) atomism. Later, Watson (1913) and others tried to purge the concept of mind (or psyche) from psychology because of its subjectivity; they aimed for purely behavioral atomism. E. L. Thorndike, who explained the transfer of learning from one situation to another by the overlap of identical elements, broke the subject matter into elements and represented them on exercise and test items (Walberg, 1975). Similarly, many contemporary educational psychologists advocate lists of behavioral objectives and criterion-referenced items to formulate curricula and study teaching by classifying the teacher’s behavior every few seconds. Clark Hull’s (1943) theory of learning is the paragon of behavioral atomism: repeated responses of the organism to external stimuli (cues), reinforcement by external stimuli (rewards), and chains of association that produce an internal copy of external cues.
Piaget (1971) holds that if the purpose of education is to elicit the repetition of transmitted cues, the machine can indeed replace the teacher, but he argues that if the purpose is personal comprehension, then “to know by heart is not to know” (p. 139). The molecular biologist Stent (1975) argues that behaviorism’s insistence on treating only raw elements of sensory data and direct inductive inferences has restricted the human sciences to taxonomic or descriptive disciplines with little explanatory power. Recent neurological studies show that information about the world reaches consciousness not as elements of raw data but as transformed patterns, and the transformations proceed according to a pattern that preexists in the brain. Koestler (1964) and Langer (1973) marshal natural science findings supporting the existence of a priori mental structures. There is little reason to doubt that adaptive capacities and complex patterns of behavior are passed on
genetically through DNA and RNA (B. Clark, 1975; Stent, 1975; Kolata, 1975; Piaget, 1970). For example, Konrad Lorenz’s Nobel Prize work in comparative ethology shows that innate mechanisms such as imprinting in ducks are preprogrammed for release at critical times. Many complex patterns of singing, courtship, navigation, and other activities in insects, birds, fish, and mammals are also innate; and the environment ordinarily plays a comparatively minor permissive or activating role rather than a determining one (Kolata, 1975). Since Immanuel Kant’s transcending concepts – time, space, and causality – are adaptive for a wide variety of human environments, there is reason to believe that they may have been selected for evolutionary fitness (Stent, 1975). Moreover, the psycholinguist E. H. Lenneberg (1969) believes that a structural capacity for language is a priori, in view of the fact that though human environments vary enormously, there is a nearly universal mastery of complex language rules among children. It does seem that humans innately possess certain universal conceptual capacities or frameworks for the interpretation of experience and for complex response patterns. These can be explained neither as randomly emitted, atomistic behaviors nor as products of discrete cues or reinforcements.
Flux

Atomism often goes with flux in psychological theories because both derive from extreme empiricism. The pre-Socratics stated the issue, which is still vital in philosophy of science: Heraclitus believed all is flux; you cannot step into the same river twice. But Parmenides held that transitory appearances conceal mutual relations of deeper, unchanging realities.

Acceptance of flux as the main stuff of psychological and educational science can be found in several recent reviews, as well as in many contemporary research papers. Scriven (1956), Glass (1972), and Cronbach (1975) believe that explanations of human behavior remain short-lived, in contrast with the enduring theories of physical scientists. Gergen (1973) holds that psychology does not originate but merely reflects the thoughts of any historical period of time. Cronbach (1975) again calls for a search for trait-by-treatment interactions and now holds that such interactions vary across decades of history. In accord with these notions, McGuire (1968) measures the maturity of a science by the complexity of its interactions.

On the other hand, S. S. Stevens, an American inheritor of Continental structural psychophysics, taught that “the scientist’s contest with nature has prospered to the degree that simplicities and uniformities (invariances) have been detected amid the complexities that afflict observation and experiment” (quoted in Lockhead, 1975). The French sociologist Boudon (1971) contrasts structural analysis of enduring essentials with phenomenal description of transitory appearances. Structural equation models, of which path analysis is
a special case, specify the invariant causal mechanisms that generate observable variables and their relations (Goldberger & Duncan, 1973). Albert Einstein dismissed hypotheses generated solely to fit a particular data set and stuck to his relativity theory, despite the early appearance of apparently disconfirming evidence, because he believed his equations were encompassing, parsimonious, and beautiful (Clark, 1971). Similarly, educational psychologists may do better by adhering to theoretical invariances more often than to disconfirming data, because measurement error, reciprocal causality, and inadequate control prevent the more definitive empirical tests of theory carried out in the natural sciences.
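The remark above that structural equation models specify invariant mechanisms which generate the observable relations can be illustrated with a small sketch. It assumes a three-variable recursive path model with standardized variables and uncorrelated disturbances, which the chapter itself does not spell out. The function name and the coefficient values are hypothetical, chosen only for illustration.

```python
# Assumed (hypothetical) recursive path model, all variables standardized:
#   x2 = p21 * x1 + e2
#   x3 = p31 * x1 + p32 * x2 + e3
# with disturbances e2, e3 uncorrelated with their predictors.

def implied_correlations(p21, p31, p32):
    """Return the correlations (r12, r13, r23) implied by the path coefficients."""
    r12 = p21                    # only one path links x1 and x2
    r13 = p31 + p32 * r12        # direct path plus indirect path through x2
    r23 = p32 + p31 * r12        # direct path plus shared dependence on x1
    return r12, r13, r23

# Illustrative coefficients (invented, not taken from any study).
r12, r13, r23 = implied_correlations(p21=0.5, p31=0.2, p32=0.4)
print(round(r12, 3), round(r13, 3), round(r23, 3))
```

In this sketch the three coefficients are the fixed "mechanism" and the correlations are what an observer would see; observed correlations that depart from the implied ones would count against the assumed structure.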
Evolution

Clark (1975) notes the irony that following the centuries in which the Bible was considered the final word on biology, interpretations and misinterpretations of Darwin’s Origin of Species became an evolutionary basis of psychology, religion, ethics, politics, and sociology. The “evolutionary” behaviorists can be divided into two groups. Although they disagree vehemently, they both make the same error, the narrowing of the quality of human nature or cultural progress to a single criterion.

The original hereditarian group – Herbert Spencer, Francis Galton, and Karl Pearson – lived in a socially stratified English society in which evolution was a ready explanation for the ideal social types as well as the poor or deviant. Perhaps influenced by Calvinistic ideas of predestination, they sought to describe, measure, and norm hereditary potential for acquiring and connecting elements of external reality. The environmentalist side was originally and audaciously stated by Watson; residing in Chicago, where some poor migrants became rich while others did not, he claimed he could train any healthy infant to become a doctor or a thief, regardless of his heredity. (The statement following his famous contention is seldom quoted but is still true today: “I am going beyond my facts and I admit it, but so have advocates of the contrary and they have been doing it for thousands of years” (Watson, 1925, p. 104).)

The testing movement gave behaviorists on both sides of the nature-nurture question an axis for the controversy. Because standardized ability and achievement tests are built on the item-total score correlation and are evaluated most often for internal consistency, they tend to measure a single characteristic, what English psychologists call verbal-educational ability. Lohnes and Marshall’s (1965) factor and canonical analyses show that grades and test scores in student records are mainly accounted for by this one source of variance.
Since some teachers measure their students’ progress by this single criterion, the tests reinforce a typically academic verbal parochialism, what Piaget (1970) calls “a proliferation of pseudo-ideas loosely hooked on to a string of words lacking all real meaning” (p. 157). Perhaps this judgment
is harsh; but it is true that, although verbal-educational ability and grades afford moderately accurate predictions of subsequent grades and educational attainment in years, for persons with a given education, neither factor seems to predict occupational or other kinds of adult success (Walberg, 1974b, 1974d; Taylor & Ellison, 1975). The exception is school-like tasks, and success in these is predicted very inaccurately, with validities in the .20s (Taylor & Ellison, 1975). Verbal-educational ability has been overemphasized at the expense of other traits such as problem solving, integrity, courage, leadership, and practical knowledge; and it does not follow that what is conveniently or conventionally measurable or observable is most important.

A new and persuasive caution is set forth by evolutionary geneticists (for example, B. Clark, 1975). Contrary to behavioral premises, evolution is not an orderly progression of successively superior types that can be ranked on a single dimension but a complex set of shifting equilibriums between the genes and the environment. By a process of polymorphism, evolution does not converge on pure types suited only for one environment but instead generates genetically diverse populations, able to adapt to changing, heterogeneous environments. Samples of individuals even from a single ethnic group will differ “at hundreds of chromosomal loci, and possibly at thousands” (B. Clark, 1975, p. 60). These differences result in an enormous diversity among individuals, which is to be celebrated rather than simplified or suppressed in a free, cosmopolitan society.

Thus, it would be a great mistake to allow the purpose of education to become the genetic, environmental, or even interactionist optimization of verbal-educational ability or those few psychometric traits and behavioral norms that can be conveniently assessed.
Rather, one of the general purposes of education should be to enhance the best qualities of the child’s inheritance and early experience. Plato and Piaget say this is best accomplished by drawing the child out rather than stamping knowledge in; and we can look to the educational tradition of Rousseau, Pestalozzi, Froebel, Montessori, and Piaget for more specific clues on how it might be done. But even if this view is unconvincing, it does appear that, analogous to Cannon’s (1932) Wisdom of the Body, the child’s mind consists in part of homeostatic components which preserve identity and individuality against the vicissitudes of the environment (see Bloom, 1964, on the stability of a wide variety of traits after school age). The surpassing adaptability of these components, a product of evolution and early environment, may account for our difficulties in replicating enduring main or interaction effects of behavioral variables in the classroom (Walberg, 1975).
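The psychometric point made earlier in this section, that tests built on item-total correlations and judged by internal consistency tend toward a single dimension, can be made concrete with a minimal sketch. It assumes the uncorrected item-total correlation (the item is included in the total) and Cronbach's alpha as the internal-consistency index; the chapter names neither formula. The response matrix is invented purely for illustration.

```python
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def item_total_correlations(scores):
    """Correlate each item with the total score (item included in the total)."""
    totals = [sum(row) for row in scores]
    n_items = len(scores[0])
    return [pearson([row[j] for row in scores], totals) for j in range(n_items)]

def cronbach_alpha(scores):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical right/wrong (1/0) responses: six examinees by four items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

print([round(r, 2) for r in item_total_correlations(scores)])
print(round(cronbach_alpha(scores), 3))
```

Selecting items because they correlate highly with the total raises the consistency index, which is one way to read the claim that such tests end up measuring a single characteristic.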
Control

Chomsky (1959) and Bowers (1973), among others, point out that there is nothing wrong with the Skinnerian definition of a reward or reinforcer as “a stimulus that increases the probability of a response,” but they maintain that to say “behavior that is acquired and maintained is reinforced” not only does not follow but completes a vicious circle.

Aside from tautology, the evidence for the generality of reinforcement is limited. Some behaviorists wish to make it the guiding principle for the perplexed individual or society. As evolution has shaped species throughout the ages, so it is claimed, contingencies of reinforcement shape the individual during his lifetime, increasing the frequency of adaptive behaviors and extinguishing maladaptive ones. One way to demonstrate reinforcement is by reductionism: modeling psychological research on experimental physics, seeking artificially pure species and stable environmental conditions, and selecting highly manipulable reinforcers. Connectionists can “deprive the white rat (Strain number 02×Y) of a third of its body weight” (to avoid the mentalistic concept of “hunger”) and show that food pellets control the speed at which it runs (not “memorizes”) the maze. Rats, however, cannot be controlled by water deprivation in this way because, in the rat’s environment, the location of food varies, although water is usually found in the same place (Bermant, 1973); the “independent variable” is unsuitable for the rat’s life-style as it has evolved. If each species requires a different reinforcer, then comparative behaviorism faces a difficult task of descriptive taxonomy, because science requires either theoretical generalization or enumerated instances. Enumeration would be still more difficult if allowances were to be made for differences in drive and maturation levels and other factors within species.
Moreover, since humans are more complex than rats and pigeons, and since social environments afford more cues, competing reinforcers, and opportunities for action that individuals perceive and weigh differently in reaching decisions, the task of behavioral application becomes indeed awesome. No wonder, then, that operantly shaped patient behaviors typically revert to base line after the end of therapy, that such behaviors as smoking, stuttering, and autism are remarkably resistant to control by reinforcement (Bowers, 1973), and that main and interaction effects of behavioral programs have been difficult to extend to natural settings. If, on the other hand, behaviorists posit some general theory of mechanism of reinforcement instead of the mere empirical regularities called for by Skinner (1950) in “Are Theories of Learning Necessary?” then perhaps there is less of a quarrel with structuralists. Freud, for example, offers “the pleasure principle” that is not only a part of a grand explanation of human nature but is accompanied by cautions that it applies mostly to infants and regressed adults in extreme situations. Freud, one imagines, would want to question Watson’s repudiation of all forms of subjective experience and ask, with Murray (1959), how the semantic somersaulting of “cognitive behavior,” “emotional behavior,” and even “dream behavior” clarifies psychological theory. On the educational wisdom of behaviorism, aside from its efficacy, George Orwell and Aldous Huxley have envisioned 1984 and Brave New World as
examples of reinforcement that are as abhorrent as genetic control. Nor does it seem wise to control young children in this way simply because they are docile. As Whitehead warned:
The result of training is that qualities essential at a later stage of a career are apt to be stamped out at an earlier stage. This is only an instance of a more general fact, that necessary technical excellence can only be acquired by training which is apt to damage those energies of mind which should direct the technical skill. This is a key fact of education and the reason for most of its difficulties. (1929, p. 96)
Reductionism
During the past half century, behavioral tenets gained much of the allegiance of psychologists in the United States and in other English-speaking countries, even to the extent of identifying psychology as that “behavioral” science closest to the natural sciences. (Ironically, “psychology” first meant the study of the soul or mind; strictly speaking, the science of behavior is kinesiology, a branch of physical education.) Stent (1975) exaggerates in saying: “Now, in retrospect, at a time when such tenets appear to be moribund, it seems surprising that these views ever did manage to gain such a hold on the human sciences” (p. 1054). On the contrary, it is understandable: psychologists had been tearing themselves away from philosophy, armchair speculation, and “subjective mentalism,” even from phenomenological experiences of everyday life. They emulated what they understood to be the objective, atomistic, esoteric spirit of the natural sciences. Whether or not the psyche should be purged from psychology is a question that goes beyond psychology. Two eminent scientists whose thought survives the test of time, Darwin and William James, warn against behavioral reductionism. Skinner (1971) desires to “follow the path taken by physics and biology by turning directly to the relation between behavior and the environment and neglecting supposedly mediating states of minds” (p. 15). Before him, Watson (1925) took James to task for departing from the “Darwinian” model of “thoroughly objective and behavioristic descriptions of emotional reactions” (p. 29). In fact, as his recently published notebooks (Gruber & Barrett, 1974) show, Darwin made much use of subjective experience, recorded and interpreted his own dreams, conducted one of the earliest questionnaire studies, and aided Galton in questionnaire investigations of introspective experiences.
Darwin criticized views of learning as blind trial and error or as specialized instinct and did not hesitate to attribute intelligence, even to worms: To sum up, as chance does not determine the manner in which objects are drawn into the burrows, and as the existence of specialized instincts for each particular case cannot be admitted, the first and most natural supposition is that worms try all methods until they at last succeed; but many
appearances are opposed to such a supposition. One alternative alone is left, namely, that worms, although standing low in the scale of organization, possess some degree of intelligence. (1881, pp. 92–93)
Moreover, as Gruber and Barrett (1974) point out, Darwin went into great detail in On the Origin of Species (1859) about cuckoos laying their eggs in other birds’ nests, ants making slaves of other ants, and the like to make the point that many complex adaptive patterns among infrahumans are intelligent. James insisted that mind is an active force. The knower, he held:
. . . is not simply a mirror floating with no foothold anywhere, and passively reflecting an order that he comes upon and finds simply existing. The knower is an actor, and coefficient of the truth on one side, whilst on the other he registers the truth which he helps to create. Mental interests, hypotheses, postulates, so far as they are bases for human action – action which to a great extent transforms the world – help to make the truth which they declare. In other words, there belongs to mind, from its birth upward, a spontaneity, a vote. It is in the game, and not a mere looker-on. (1899, pp. 148–149)
Since the basic form of classroom instruction – the short factual recitation of text or what the teacher says – appears to have been unchanged since the turn of the century (Hoetker & Ahlbrand, 1969), James might protest today that the student needs something more than choosing the “one best answer”; he or she needs to compose his or her own answer, or better, his or her own question. For personal meaning and choice are the essences of education. As James says in Talks to Teachers (1899): “The solid meaning of life is always the same eternal thing – the marriage, namely, of some unhabitual ideal, however special, with some fidelity, courage, and endurance; with some man’s or woman’s pains” (pp. 88–89; emphasis added).
Structuralism: Problems and Possibilities
Although structuralism may eventually provide the conceptual keys that will unlock the human mind, it is by no means evident that this will be accomplished in the next few generations. The arguments for structuralism, like those for behaviorism, are hypothetical, based on evidence from unusual circumstances or unverifiable analogies from research on infrahumans. Even though the biological sciences provide indications of the evolution and inheritance of innate mental structures in the human race, there is little consensus within the structural tradition as to exactly what they are: Kant’s a priori ideas, Freud’s principles or structures, Jung’s archetypes, Piaget’s schemata, or Lévi-Strauss’s universals. Nor can it be claimed that factor analysis, a kind of “objective” Anglo-American methodological search for psychological structure, has established the number and nature of mental components.
Mature science implies consensus, and it is not apparent. Moreover, psychological structuralists, apart from introspectionists, depend on behaviorism in the sense that the latencies they seek can only be inferred from what is observable or in the sense that they are obligated to specify the strings that connect the outside of the machine to the ghost inside. Structuralists and behaviorists seem generally to be going their separate ways. Educators, of course, cannot wait for a grand synthesis; and many find behaviorism preferable to structuralism as the basis for practice. Behaviorism indicates how to split up the curriculum; gives the teacher (or mechanism of instruction) a pragmatic, active, dominant role; and delivers visible (though not necessarily generalizable) results. Behaviorists have also made themselves, their ideas, commodities, and services, available to the schools, first in the form of tests and exercises and today in the form of behavioral objectives and systems of instruction. In contrast, leading structural theorists seem preoccupied with ascertaining the nature of the internal structures and are less directly interested in education. There are no widespread educational programs based on structural premises comparable to the federally supported, behaviorally oriented systems that are designed for adoption by whole schools or districts. A teacher is most likely to pick up structural ideas from a Socratic professor, an inspiring colleague, a summer trip to Bristol, England, or Grand Forks, North Dakota, to see an open classroom, or a chance reading of Summerhill or Piaget. But this informality can easily be underestimated. Comparing the contemporary structural theory represented in, say, Piaget (1970) or a content analysis of the scattered writings on open or informal education (Walberg & Thomas, 1972) with observations of nominally open classes in England and the United States reveals rough correspondences:
1. Direct encounters with science materials and the nonacademic environment, as well as vicarious and verbal experience.
2. Group-developed standards of mutual aid, discipline, and justice.
3. Teacher-student and student-student collaboration in determining the goals and means of learning, within teacher constraints.
4. Diagnostic notes and samples of the student’s work to evaluate and redirect learning.
5. Allowances for maturation and individuality on the basis of teacher questioning, observation, and judgments, rather than on norm- or criterion-referenced tests.
6. An enriched environment and stress on the integration of knowledge.
7. The use of critical discussion and reflection to encourage reason and self-determination rather than belief in authority to replace egocentrism.
These practices can be seen in classes where behavioral principles are evident, and the observer can only stand in admiration of the genius of school men
and women who have generally been able, as Ravitch (1974) suggests, to find a workable compromise or reconciliation of the conflicting pressures on education. Such is likely to be the case before structural and behavioral theorists settle their differences.
Perception and Learning
Between the external flux of behavior and the mysteries of deep structures is what Hebb (1974) declares psychology is all about: the mind. Acceptance of this and related phenomenological concepts may help us to rediscover that psychology is “truly William James’ ‘science of mental life,’ not merely John B. Watson’s ‘science of behavior’ ” (Shulman, 1974, p. 334), or structural metaphysics. This section selectively treats one aspect of mental life: perception. Perception is a broad, complex subject of psychological inquiry; Carterette and Friedman (1973) enlisted more than 200 authors to write 159 chapters treating perception in ten prospective volumes. Obviously this work cannot be summarized here. Rather, this section discusses how perception of environments can “account” for considerable variation in classroom learning and some of the research complexities that investigators of educational perceptions face.
Accounting for Learning Variation
How much of the variation in rate or amount of learning among students is attributable to their aptitudes, to educational treatments, and to the interaction of these two sets of variables? In an influential paper, “The Two Disciplines of Scientific Psychology,” Cronbach (1957) identified aptitudes and treatments as the two important areas of psychology that had been studied separately (by correlators of psychometric measures and by experimenters who manipulate stimuli, respectively) and which would have to be brought together for a comprehensive analysis of the organism’s change in relation to the environment. Not only should psychologists investigate the main (or separate) effects of aptitudes and treatments ($b_1$ and $b_2$) on, say, learning in the regression equation $l = b_1 a + b_2 t + b_3 (at) + \text{constant} + \text{error}$, they should also study the possibly differential effects of treatments on students with different levels of aptitudes (interactions $at$). Such an analysis makes sense for a number of reasons, even if a parsimonious two-term additive model is preferred to one containing the product. Aptitude often serves as a potent covariate, decreases the error of estimate, and increases the power of the analysis to detect treatment effects. The weight ($b_3$) for the product term provides an indication of the equality of the dependence of learning on aptitude in the different treatment groups, a possibility that should
be routinely checked in covariance. Most important, educators recognize individual differences in students and would like to have hints about which treatments are most suitable for students with different levels of aptitudes. Cronbach’s notion of aptitude–treatment interaction is perhaps the most influential research paradigm in educational psychology in several decades. It not only sensitized many to the possibility of interaction; it also led to hundreds of behavioral studies. In brief, the research showed that the effects of behavioral treatments (aside from content opportunities) and their interactions with aptitudes were difficult to replicate (Berliner & Cahen, 1973; Cronbach & Snow, 1976) if not small and inconsistent (Walberg, 1971). On the other hand, the estimates of $b_2$ in the equation $l = b_1 a + b_2 p + \text{constant} + \text{error}$ reveal large, consistent “effects” of student perceptions of the social environment of learning that “account” in 11 analyses for a median of 30% (range = 13 to 46; all significant) of the variance in cognitive, affective, and behavioral postcourse measures, beyond that accounted for by parallel precourse measures. By contrast, IQ accounts for only a median of 7% (range = 0 to 9) of the residualized variance (Anderson & Walberg, 1974). Efforts at generalizing these results suggest consistency across different school subjects (Anderson & Walberg, 1974) and different languages and cultures (United States, Canada, Australia, Brazil, and India). Although in classroom research the usual treatment (aside from content opportunity) and its interaction with aptitude add little to the explanation of learning variation beyond that accounted for by aptitude, student perceptions of the social environment of learning add considerably. It can be hoped that measures of aptitude, perception, and exposure to content measured on the criterion test will account for most of the reliable variance in standard measures of learning outcomes.
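The regression with a product term discussed in this section can be sketched numerically. The following is a minimal illustration, assuming made-up, noise-free scores (the arrays and the generating weights 2.0, 1.0, 0.5, and 3.0 are hypothetical, not data from any study cited here). It shows how the weight on the product term corresponds to the difference between the aptitude slopes in the two treatment groups.

```python
import numpy as np

# Hypothetical, noise-free toy data (not from the chapter):
# four aptitude levels observed under each of two treatments.
aptitude = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
treatment = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])

# Learning generated with known (arbitrary) weights so the fit can be checked.
learning = 2.0 * aptitude + 1.0 * treatment + 0.5 * aptitude * treatment + 3.0

# Design matrix: aptitude, treatment, their product, and a constant column.
X = np.column_stack(
    [aptitude, treatment, aptitude * treatment, np.ones_like(aptitude)]
)
weights, *_ = np.linalg.lstsq(X, learning, rcond=None)
b1, b2, b3, constant = weights

# Slope of learning on aptitude within each treatment group.
slope_untreated = b1       # treatment = 0
slope_treated = b1 + b3    # treatment = 1

print("b1, b2, b3, constant:", weights)
print("aptitude slopes by group:", slope_untreated, slope_treated)
```

When the fitted product weight is near zero, the two within-group slopes coincide, which is the equality that the text suggests checking routinely in covariance analysis.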
What Perception Encompasses
What does perception include that treatment omits? Brunswik (1956) recognized that, for Lewin, the “exterior field” (including treatments) is completely inside the person and argued that we selectively perceive our own and others’ behaviors, perceptions, intentions, emotions, ideas, traits, abilities, and memories. As James (1899) pointed out, restricting ourselves to atomistic, objective behavior omits the main subject of psychology, the conscious, perceiving mind. Even cosmopolitan Murray (1959) felt the need for “objective observer judgments,” in view of the possibility that “the internal is morbid delusion” or “unrelated to realistic or congruent estimation of the external situation” (p. 27). Such reasoning might better be reversed: what is objectively counted or measured should be weighed and justified by what is subjectively perceived, insofar as individual learning is concerned.
Specifically, what are some things subjective phenomenology encompasses that the “objective observer” who counts cues or behaviors might miss? They include the following:
1. The Chicago social psychological school’s (Cooley, Mead, Dewey) observations that we modify our potential reactions in view of the way others might react to them and that we consciously alternate between the self and this other point of view (D. T. Campbell, 1963).
2. The well-known phenomenon from human and infrahuman psychology that organisms can ignore stable or inconsequential stimuli in the environment (Pantin, 1968).
3. That a mean man is kind when he doesn’t hurt you (Tagiuri, 1969), that we damn with faint praise, and that some teachers give great rewards with a smile and others observably gush to no avail.
4. That discriminations can sometimes be made not only on the basis of single, consistent, atomistic attributes (Aristotelian man as featherless biped) but that in natural settings we most often distinguish, rightly or wrongly, on the basis of imperfectly consistent groups or patterns of cues such as the recognition of family resemblances in faces (Wittgenstein, 1953).
5. That different cues in the same environment may lead people to the same (or different) perceptions, conclusions, or actions.
6. That perception, though imperfectly related to veridical environments, and imperfect attribution of causality, intention, and consistency adaptively simplify and stabilize complex social environments for the individual (Tagiuri, 1969).
7. That perception itself is readily adaptable; for example, the respective advantages of weightlifters and watchmakers in judging different ranges of weights are soon lost with practice (Tresselt, 1948).
Behavioral psychology, with its origins in simple, controllable laboratory experimentation, seems to carry into education the simplistic notion of one-way causation, of teacher as first or only cause, the dispenser of cues and rewards (Figure 2, Model 1a) who sometimes delegates control over the student to the text or workbook. Much research on instruction relates counts of teacher behaviors every few seconds to residualized learning, on the assumption of equal one-way, univariate causal effects on students (Figure 2, Model 5). Such models, of course, do not consider programmed systems (see Talmage, 1975) which determine teacher role or the influence of student on teacher or of students upon one another, as common sense and the other models would suggest. Psychology, as McGuire (1973) argues, must begin to take account of such bidirectionality as well as reverberation and feedback among humans in natural settings. (See Fiedler’s, 1975, pioneering paper on classroom interaction.) But the task has to be as complicated and difficult as the foregoing examples suggest only if we insist on studying behavior alone. Murray (1959) is worth quoting on the point:
[Figure 2: Some possible paths of causal influence in the classroom. Panels: 1. Teacher centered; 2. Technology; 3. Student centered; 4. Transactional; 5. Large group instruction; 6. Small group and tutoring; 7. Transactional. Node labels: T (teacher), S (student), M (materials).]
In due course, I assumed that correctness of prediction is the best index of the worth of different methods. I did a few impromptu experiments and found empirically that the most dependable single operation I could perform in attempting to foretell what a behaviorist would do next or in the near future was to ask him. But the commonsensical avowal I wish to make here is this: that first as a doctor and second as a psychologist I have never ceased to elicit direct expressions and reports of interior experiences – somatic, emotional, and intellectual. . . . (p. 10)
Similarly, Donald T. Campbell, in an interview with Tavris (1975), when asked about the legitimacy of Unobtrusive Measures (Webb, Campbell, Schwartz, & Sechrest, 1966), said: When I started out, I was, like many social scientists, obsessed with gadgets. I believed I was being more scientific if the people I studied didn’t know what I was up to, and if they didn’t realize what they were telling me.
After 20 years research, I feel that this is an unworthy stance for a scientist. It exploits others and implies social distance between researchers and the people they study. In addition, all the evidence shows that when you ask people to cooperate with you and tell you their attitudes, you get greater validity than with all this mumbo-jumbo. (p. 47)
Thus one way to find out about, say, the suitability of the learning environment is to ask students, as well as teachers and trained observers. The student, however, stands at a superior vantage point; what he takes in makes the difference in learning. By the age of ten he has encountered a variety of educational environments; he is with his teacher for many hours during the year; he is a sophisticated judge with plenty of information to weigh. His perceptions, as partaker of classroom social transaction, are of great value, and it is easy enough (and incrementally valid) to ask him for them. As Fiedler’s (1975) study of classroom transactions shows, students’ perceptions of their own influences on the class, but not observer estimates of the same, predict academic gains. This is not to say that “objective” tests, counts of teacher behavior, and the like must be put aside, but only that by themselves they may be an inefficient approach to understanding. Behavioral efforts to link classroom treatment with learning are numerous, expensive, and difficult. Investigating the links of perception with treatment and learning may yield more revealing clues to the puzzle of optimizing classroom learning.
A Framework for Research on Classroom Perceptions
Another appeal for incorporating perceptual measures into classroom-effects research is scientific and practical parsimony. Even if psychometrists could agree that there are ten aptitude factors, and if instructional theorists established ten aspects of treatment, the number of main effects and first-order interactions of aptitude and treatment alone, 120, would be unparsimonious, to say nothing of other interactions such as aptitudes with one another, treatments with one another, and aptitudes and treatments with student moods and developmental levels and teacher characteristics. Moreover, the possibility of higher order interactions and sequences of treatments would raise the number of theoretical effects by many orders of magnitude.
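The figure of 120 can be checked by simple enumeration: 10 aptitude main effects, 10 treatment main effects, and 10 × 10 aptitude-by-treatment products. A minimal sketch follows, in which the factor names `a1`…`a10` and `t1`…`t10` are placeholders rather than actual constructs.

```python
from itertools import product

# Placeholder labels for ten aptitude factors and ten treatment aspects.
aptitudes = [f"a{i}" for i in range(1, 11)]
treatments = [f"t{j}" for j in range(1, 11)]

# Main effects: one term per aptitude and one per treatment.
main_effects = aptitudes + treatments

# First-order aptitude-by-treatment interactions: one product per pair.
interactions = list(product(aptitudes, treatments))

total_terms = len(main_effects) + len(interactions)
print(len(main_effects), len(interactions), total_terms)
```

Adding aptitude-by-aptitude or treatment-by-treatment products, as the text notes, would push the count higher still.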
Mediation
There is one instrument that can help us to deal with such complexity – the human mind. Students seem quite able to perceive and weigh stimuli and to render predictively valid judgments of the cohesiveness, democracy, goal direction, friction, and other psychological characteristics of the social environments of their classes. These molar judgments may mediate the multiplicity of
molecular events of instruction and other classroom activities and properties. Notwithstanding the extremely general appeal of atomic description, the physicist chooses larger units of analysis for problems of astronomy and mechanics; he and the engineer, of course, feel free to draw on fine-grain analysis when it seems helpful in accounting for macrophenomena. And moot physiological questions about the linkage of the five or ten processes that regulate blood clotting do not deter the physician from cleansing a wound or applying a tourniquet. Similarly, perception can usefully and simply index the complex match of internal and external elements, structures, and sequences that optimizes learning. How can the validity of this mediation be tested? Unlike a path diagram, which requires an indisputable list of all causal variables plus an unequivocal indication of their causal direction, the mediation diagram (Figure 3) simply asserts that, in ordinary classroom instruction, no variable explains any sizable variation in immediate learning outcomes beyond that accounted for by aptitude, content opportunity, and student perception of the learning environment, provided the variables are validly and reliably measured. Thus other variables (including products and quadratics to test for interaction and curvature) added to a regression containing these variables will not replicably increase $R^2$. Similarly, extended outcomes are mainly accounted for by immediate outcomes, perceptions, and past (and possibly present) student background. In samples of ordinary middle-school, high school, and college lecture-discussion classes that meet three to five times a week for a term or a year, it may be found that a relevant postcourse learning measure will regress significantly on at least one aptitude measure; and the residual from this equation will regress significantly on at least one opportunity measure (measured by the degree of exposure to criterion content).
The second residual will significantly regress on a perception measure. This hypothesized sequence of stagewise entry means that content opportunity will account for variance in the learning criterion, independent of that large amount, perhaps 40 to 60%, which overlaps aptitude and pretest. Similarly, perception will account for learning variance independent of that accounted for by the other two terms. The weighting of the terms in the regression will depend on the covariations of the variables with one another and with the outcome. However, covariations are functions of their variations; a variable must vary reliably, or it cannot covary. For example, if the learning outcome is unreliably measured or is an easy test on which all students get very high scores, then the outcome measure detects very little variation that can covary with aptitude, opportunity, or perception. Concluding that individual differences have been removed or that the latent variables do not covary is based on the fallacy that what is poorly measured does not exist. Whether or not more than one term from each set enters the regression will in part depend on the covariations of the variables within sets. For example, within the aptitude set, IQ alone will generally account for much of the variation
[Figure 3: A mediation diagram for student learning. (This figure is not a path diagram and thus does not identify all causal variables and paths.) Labels recoverable from the diagram: Student background; Community environment; Peer environment; Family environment; (Heredity); Teacher characteristics; Personality; Behavior; Aptitude; IQ; Pretest knowledge; Pretest understanding; Pretest attitude; Content opportunity; Overlap of lessons with outcome measures; Overlap of homework with outcome measures; Student perception of classroom environment; Immediate outcome; Posttest knowledge; Posttest understanding; Posttest attitude; Structural stage; Extended outcome; Generalization; Transfer; Follow-up tests.]
in the usual outcomes; only if the variation in pretest knowledge, understanding, or attitude covaries sufficiently with outcome variation independent of that accounted for by IQ will one of these variables be significant. Such psychometric measures are reliable, but they have little incremental validity beyond one another in prediction. For example, only recently has it been possible to conclude that small classes and discussions are educationally more beneficial, because much of the older research employed measures of superficial verbal-educational achievement rather than deeper understandings (McKeachie & Kulik, 1975). Weighting the variables within sets equally or by principal or canonical components has empirical appeal. However, theory-guided stagewise, stepwise regression would seem the method of choice in theoretically based research. An extended illustration is given below; here, a word about the reasoning is in order. Just as advocates of perception should allow perceptual variables the worst chance in the regression, that is, allow them to enter in the third stage, after aptitude and opportunity, so should variables within sets be allowed chances to enter in a guided stepwise sequence. To take the aptitude set as example: since IQ predicts learning so generally, it should be allowed to enter first, and the burden of proof should be placed on the incremental explanatory power of questionable variables such as pretest attitude and structural measures. However, these educationally valued but questionably measured variables can serve as tentative outcome indicators; greater effort should go into making them differentially valid.
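The hypothesized stagewise sequence (outcome on aptitude, then the residual on opportunity, then the second residual on perception) can be illustrated with a short simulation. This is only a sketch under stated assumptions: the scores are randomly generated, the generating weights are arbitrary, and the helper `residual_after` is a hypothetical name introduced here. Real data would differ, and a full analysis would also need significance tests at each stage.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical standardized scores; the generating weights are arbitrary.
aptitude = rng.normal(size=n)
opportunity = rng.normal(size=n)
perception = rng.normal(size=n)
outcome = (0.7 * aptitude + 0.4 * opportunity + 0.5 * perception
           + rng.normal(scale=0.5, size=n))


def residual_after(y, x):
    """Residual of y after a least-squares fit on x plus a constant."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef


# Stage 1: aptitude enters first; later sets only get what is left over.
stage1 = residual_after(outcome, aptitude)
# Stage 2: the first residual is regressed on content opportunity.
stage2 = residual_after(stage1, opportunity)
# Stage 3: the second residual is regressed on perception.
stage3 = residual_after(stage2, perception)

total = outcome.var()
shares = {
    "aptitude": (total - stage1.var()) / total,
    "opportunity": (stage1.var() - stage2.var()) / total,
    "perception": (stage2.var() - stage3.var()) / total,
}
unexplained = stage3.var() / total
print(shares, unexplained)
```

Because perception enters last, its share here is a conservative estimate: any variance it shares with aptitude or opportunity has already been credited to the earlier stages, which is the "worst chance" the text recommends giving it.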
Educational Cautions
In concluding this section two critical points are worth noting. First, even if the mediation hypothesis survives repeated empirical probes by independent investigators, we could not infer that changing perceptions would change rates of learning. Similarly, we cannot unequivocally conclude that cigarette smoking causes lung cancer, even from prospective surveys that link them, and control for rival explanations such as social class and area pollution levels. Statistically controlled correlation does not prove causation, and the mechanism of carcinogens at the cell level still has not been elucidated. Nonetheless, a prudent person who smokes cigarettes reconsiders the possible consequences of smoking on the basis of imperfect correlational evidence before the definitive experiment on humans in natural settings is carried out. Should we do as much in educational psychology? (Some practical experiments and evaluations are described in the final section of this chapter.) Second, the mediation hypothesis may be wrong in holding perception of the learning environment as a means to behavioral or structural ends. The limitations of measurement of such ends and the fallacy of equating what is most measurable or most often measured with what is most important have
already been mentioned. Beyond these points, it can be argued that certain perceptions (of cohesiveness and democratic practices, for example) are worthy ends in their own right, or that certain perceptions optimize learnings that presently cannot be measured. Either case suggests that educators may find evaluations of perception useful, and additional deliberation and research on these points are certainly in order.
Developments in Research on Classroom Perceptions
The works cited in the beginning of this review discuss a number of substantive and methodological issues of research on learning environments. There is little need to repeat points in these reviews. This final section seeks out what is neglected elsewhere, specifically points of emerging consensus or controversy and developments, especially in works that are somewhat inaccessible, that suggest interesting possibilities for future research.
Dimensions of Perception
What aspects of perception should be investigated? Shulman (1974) and Walberg (1974c) infer from a number of sources that we normally only hold a few things in conscious perception. Moreover, several possibly similar dimensions of classroom perceptions can be discerned in several research programs that differ greatly in theoretical starting points, operationalization of surface variables, and methods of synthesis. Bales, from unpublished factor analyses of observed behaviors of members of small self-analytic classes at Harvard (personal communication, 1968), posited three factors: Affection, Achievement, and Status. His quasi-spatial coordinates serve as metaphors for individual acts: a member expresses Achievement by forward moves; Status by upward moves; and Affection by moves to the right. These broad classes of behavior presumably summarize much of the individual behavior in groups that could be minutely analyzed. Upon learning of these dimensions in 1968, Ahlgren and Walberg, in unpublished work, tried an orthogonal rotation of factors of student perceptions of the social environments of their classes to an a priori Bales structure. Notwithstanding the fact that Bales factored individual acts and Ahlgren and Walberg factored the means of class perceptions, the rotation produced a close match, in the sense of explaining nearly as much variance as the original components. Independently, Moos (1973) and Insel and Moos (1974) found that the variables represented on instruments measuring perceptions of a wide variety of human environments could be reasonably classified in three categories – Interpersonal Relations, Personal Development, and System Maintenance and Change, which seem close to the factors of Bales.
Although canonical analysis is more concerned with the number and nature of orthogonal dimensions of association between two sets of variables (rather than within one set, as in factor analysis), it lent support to the first two Bales and Moos dimensions in one study. Walberg (see Anderson & Walberg, 1974) found that student perceptions of satisfaction and cohesiveness were associated with strong interest in physics at the end of the course; a second variate, independent of the first, linked perceived difficulty and pace of the class to cognitive achievement. Because of psychology’s ancient distinction between emotion and cognition, tracing back to Plato, the story might have happily ended with some consensus about the two factors, with not only historical but face, factorial, and predictive validity. Unfortunately, later explorations, reviewed by Anderson and Walberg (1974), questioned such a neat distinction. In new samples on other subjects, perceptions of high cohesiveness and other affective qualities, but not cognitive perceptions, predicted cognitive achievement. Other research (Anderson & Walberg, 1974) suggested that affective and cognitive perceptions are fused in one canonical variate; classes high on the variate contain students who have major accomplishments and interest in the subject. The authors of the perceptual instruments that have been factored tried to make the scales mutually uncorrelated; so it is somewhat contradictory to search for structure in them. Moreover, the scales are reliable enough to treat individually in regression, and because two variables load similarly on a factor does not mean they will correlate similarly with a criterion or predictor. Yet having factors that are comparable across subjects, samples, and instruments would enable replication and consolidation of findings. It is clear that the search for perceptual structure will continue. 
Future work should avoid the continual exploratory factoring that in ability and personality measurement has led to nearly as many sets of factors as the number of major investigators. The results of purely empirical factoring, such as the popular components and varimax rotation, are highly dependent on the accidents of variance and covariance described in a previous section. Moreover, removing a few outlying observations can greatly change the factor solution. For these reasons (and contrary to its purposes), exploratory factoring seems to have multiplied rather than coordinated or synthesized psychological theory. To avoid such flux in perceptual research, one should start with a priori schemes, either prior empirical solutions or theoretical frameworks. The analyst might either seek confirmation of one or pit one against another, using confirmatory factor or canonical analysis (Mulaik, 1975). The three-factor models of Bales and Moos are good starting points.
Analytic Models
The school class is obviously a social group composed of individuals, and it can be analyzed at the group or individual levels. The analysis of perceptions
Walberg
Psychology of Learning Environments
of the social environment of the class or school presents some interesting problems that should be considered in future research on social perceptions and other group properties. In educational research around 1965 to 1970 there were four positions on the units question:
1. Sociologists, nearly hypnotized by Robinson’s (1950) seminal paper, cautioned that analysis (correlation, regression, or ANOVA, for example) of means (and other group properties such as percentages in various social strata in census tracts) cannot be generalized to individuals; indeed, they correctly pointed out that regression weights for group means and individuals may differ in sign as well as magnitude.
2. In contrast, psychologists began to realize that the analysis of individuals in groups, which characterizes a large fraction of quantitative research in psychology and education, violates the assumption of discrete, independently responding units for statistical inference. Hence group means seemed to be the unit of choice.
3. Many seemed to be uninformed of the problem or ignored it and arbitrarily chose mean or individual analysis.
4. The Harvard Project Physics (HPP) group got conflicting advice from eminent scholars and so carried out analyses both ways and tried to compare them (see Anderson & Walberg, 1974, for a summary of early analyses).
In general, as others have found, means show stronger relations than do individual scores; that is, the ratios of regression weights to their standard errors are larger. Anderson (1970) carried out an integrated analysis. Assuming that perception of the social environment is a class property and IQ is an individual property, he punched the class-mean perception on each student’s card and regressed postcourse individual learning on it and the student’s IQ.
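Anderson's integrated analysis can be sketched in modern terms as follows. This is a minimal illustration and not his original procedure: the function name `integrated_regression`, its argument names, and the use of NumPy ordinary least squares are assumptions introduced here. The class-mean perception is attached to every student in the class (the equivalent of punching it on each student's card), and individual learning is then regressed on that class-level value and the student's own IQ.

```python
import numpy as np

def integrated_regression(class_ids, perception, iq, learning):
    """Regress individual learning on class-mean perception and individual IQ.

    Hypothetical helper for illustration; returns the class mean attached to
    each student and the OLS coefficients [intercept, b_class_mean, b_iq].
    """
    class_ids = np.asarray(class_ids)
    perception = np.asarray(perception, dtype=float)
    iq = np.asarray(iq, dtype=float)
    learning = np.asarray(learning, dtype=float)

    # Map each student to an integer class index, then compute class means
    _, inverse = np.unique(class_ids, return_inverse=True)
    counts = np.bincount(inverse)
    class_means = np.bincount(inverse, weights=perception) / counts

    # "Punch" the class mean onto every student's record
    class_mean_per_student = class_means[inverse]

    # Ordinary least squares: learning ~ 1 + class-mean perception + IQ
    X = np.column_stack([np.ones_like(iq), class_mean_per_student, iq])
    coef, *_ = np.linalg.lstsq(X, learning, rcond=None)
    return class_mean_per_student, coef
```

The sketch returns point estimates only. Because all students in a class share one class-mean value, conventional standard errors from such a regression remain exposed to the independence problem raised in the second position above.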
Anderson’s insight that different levels can be analyzed simultaneously led to a test of the possibility that perception of the class merely indexes the idiosyncratic match of each individual to the environment. In a set of Montreal data (Anderson & Walberg, 1974) the mean perceptions of a random group of students in the class were nearly as closely related to the mean post-course learning (residualized for IQ) of another random group in the class as they were to the same measure for the same group. Thus, class-mean perceptions are realistic or congruent in Murray’s (1959) sense. Walberg (unpublished tables, 1968) regressed individual learning scores across classes on perceptions, with their class-mean property removed by computing within-class Z scores. The data for these analyses were obtained on the first HPP instrument to measure perceptions of the social environment, and the reliabilities of the scales were low to moderate; hence the research went unpublished. In correspondence about the analysis, Cronbach
(personal communication, 1969) suggested the concomitant analysis of the class-mean perception and the raw difference of the individual’s perception from his or her class mean. Z scores measure the individual’s distance from the class mean in units standardized within class. Raw difference scores measure the distance in the original metric of the variable; they are easier to compute ($x - \bar{x}$) than Z scores, and have no disadvantage unless standardized ranking is theoretically critical. But what does decomposition of $x$ into $\bar{x}$ and $(x - \bar{x})$ tell us? Figure 4a shows the expected situation in which three parallel statements can be made:
1. The more cohesive the class is perceived, the greater the class learning (both measured by means).
2. The extent to which the individual differs from the means in his perception is reflected in his learning deviation, as measured by difference or Z scores. If he perceives the class as more cohesive than his classmates do, he will learn correspondingly more.
3. The higher the individual’s score in perception of cohesiveness across classes, the greater his learning.
Figure 4: Parallel and reversed mean and individual measures. (Class means on the variables are indicated with a dot.) [Two panels, a and b, each plotting Learning against Cohesiveness.]
Figure 4b illustrates a reversed situation:
1. The greater the class cohesiveness, the greater the class learning, as before.
2. To the extent that the individual exceeds the class mean in his perception of cohesiveness, however, the less is his learning in relation to the mean of his classmates.
3. The individual’s perception of cohesiveness across classes is unrelated to his learning.
Such an odd reversal might be attributable not to the general benefits of cohesiveness to the class as a whole but to the harm of channeling psychic energy into socializing rather than learning by members who see more cohesion in the class than others do. The two effects cancel each other out and produce a zero correlation or flat regression on individuals across classes. Learning might also regress oddly on IQ or anxiety, and it is worth testing the possibility with mean-difference analysis.
Another important notion that has implications for perceptual research and other analyses of individuals in groups is Lohnes’s (1972) regression of reading posttest means on pretest means and standard deviations (and other shape indicators). For example, a smaller variation on the pretest might be associated with greater mean gains, because instruction can be targeted on the narrow spread of beginning achievement. To investigate shape and difference effects, Walberg and Singh (1974) regressed individual posttest achievement in social studies and science on means, standard deviations, and differences of aptitude, perception, and control variables, plus their products and squares (to test for interaction and curvature), using large samples of classes in Rajasthan, India. Such regressions, without the information loss of leveling in analysis of variance, can test such complex possibilities as whether brighter classes, or students brighter than the rest of their class, achieve better or worse under conditions of varying IQ heterogeneity.
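The three statements attached to each panel of Figure 4 correspond to three different regression slopes, which a short sketch can make concrete. The function below is an illustration under assumed names (`mean_difference_slopes` is hypothetical, and simple one-predictor least squares is used for brevity). The nine data points are constructed for the example and do not come from any study; they are arranged so that the class-mean slope is positive, the within-class difference slope is negative, and the slope for individuals pooled across classes is flat, as in the reversed panel.

```python
import numpy as np

def mean_difference_slopes(class_ids, x, y):
    """Return (between, within, pooled) simple regression slopes of y on x.

    between: class means of y on class means of x (unweighted across classes)
    within:  raw differences from own class mean, y on x
    pooled:  individual scores across classes, ignoring class membership
    """
    class_ids = np.asarray(class_ids)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    _, inv = np.unique(class_ids, return_inverse=True)
    counts = np.bincount(inv)
    x_bar = np.bincount(inv, weights=x) / counts  # class means of x
    y_bar = np.bincount(inv, weights=y) / counts  # class means of y
    dx = x - x_bar[inv]                           # raw difference scores
    dy = y - y_bar[inv]

    def slope(u, v):
        # Least-squares slope of v on u with an intercept
        u = u - u.mean()
        return float(u @ (v - v.mean()) / (u @ u))

    return slope(x_bar, y_bar), slope(dx, dy), slope(x, y)

# Constructed data for the reversed case: three classes of three students
classes = [0, 0, 0, 1, 1, 1, 2, 2, 2]
cohesiveness = [-1, 0, 1, 0, 1, 2, 1, 2, 3]
learning = [1, 0, -1, 2, 1, 0, 3, 2, 1]

between, within, pooled = mean_difference_slopes(classes, cohesiveness, learning)
# Slopes for these data: between = 1, within = -1, pooled = 0
print(between, within, pooled)
```

In the parallel case the three slopes would share a sign; comparing them is one simple way to check which panel a given data set resembles before fitting a fuller model.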
Such complex analyses, however, generate many variables; unguided stepwise regression should be avoided, and select equations should be cross-validated on independent sets of data. Thus, the order of possible stagewise and stepwise entry should be determined by such rules as placing well-established variables, for example, IQ, before hypothesized variables such as perception; means before standard deviations and difference scores; main effects before interactions; and linear variables before quadratics. The cross-validated Rajasthan results showed that students who learn more are in bright classes, are brighter than their classmates, are with teachers perceived as effective by their classmates, see their teacher as more effective than do their classmates, and are seen as studious but not misbehaving by their classmates. Whereas prior work on learning environments obtained perceptions of the social environment, to which teacher and student contribute, the Rajasthan study showed it is possible to partition significant variation associated with student perception of the teacher and each student. Although separating the objects of perception may be useful, for example, to gain
formative insights on how the teacher or student is perceived, the separate effects are probably mediated by the student perception of the generalized social environment of learning. The Rajasthan analysis tested the possible entry of 51 different variables but showed that only 6 were significant in the theoretically guided ordering. Sex, socioeconomic status, class size, and interactions were mediated by the main effects of the variables in the select equation, and no curvature among the variables was significant. Moreover, although unrecognized by Walberg and Singh at the time, the results could have been further simplified. The raw regression weights for mean and difference terms were approximately equal (parallel as in Figure 4a rather than reversed as in Figure 4b), and the terms did not interact with one another or any other variable. Therefore, the individual score for IQ and for student perception of the teacher, that is, the simple individual score $x$ (ignoring class), would account for learning as well as both class mean $\bar{x}$ and individual difference $x - \bar{x}$.
The analytic points made above are based on two value judgments that coincide with educational practicality. We should strive for simplicity: we should prefer main effects over interactions, linearity over curvature, parallel rather than reversed effects at group and difference levels of analysis, and, in general, few over many variables. But such parsimony and practicality should be accepted only if analyses show that assumptions of complexity are unwarranted.
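The simplification noted for the Rajasthan results follows from one line of algebra. Write the fitted learning score as $\hat{y}$, with intercept $a$ and raw weights $b_m$ for the class mean and $b_d$ for the difference score. These symbols are introduced here for illustration and are not taken from Walberg and Singh, and other regressors are left out for brevity.

```latex
\[
\hat{y} \;=\; a + b_m\,\bar{x} + b_d\,(x - \bar{x})
        \;=\; a + b_d\,x + (b_m - b_d)\,\bar{x},
\qquad\text{so that } b_m = b_d = b \;\Longrightarrow\; \hat{y} = a + b\,x .
\]
```

Equal weights remove the class mean from the equation, which is the parallel case. A nonzero $b_m - b_d$ indicates that class membership carries information beyond the individual score, and weights of opposite sign give the reversed case.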
Analytic Outcomes
The discussion has dealt with the regression of educational outcomes on aptitude, perception, and other variables and complexities and possible simplifications among these regressors. The cases of complexity in the criteria or regressands also call for discussion. In addition to using moment-based shape indicators of the pretest, Lohnes (1972) used such indicators of the posttest distribution as regressands. He drew no educational implications about shape indicators – location, spread, and skew – as outcomes, but his method is suggestive for research on educational perceptions specifically and educational effects research in general. It is ironic that the sociological controversy of recent years on educational opportunity has left researchers unsensitized to the possibilities of shape indicators other than a school or class mean or individual score as outcome. As Lohnes points out, shape indicators go back at least as far as Ronald Fisher’s early work. Moreover, instructional psychologists who have recently aimed for equality of results should be considering spread as outcome. And another indicator that deserves consideration is skew; by definition, at least in relative terms, excellence implies one or a few individuals far above the rest (a cosmopolitan educator, however, entertains many aspects of excellence). In addition to such shape indicators, subgroup similarity and contrast effects may warrant analysis when learning or perception is the regressand.
Walberg, Sorenson, and Fischbach (1972) developed a multivariate, multiple regression procedure to investigate such effects. If we are interested in the relation of an input, such as school size or innovation, associated with differential perceptions of subgroups within a school, such as boys and girls or students of high and low socioeconomic status, we can calculate the means of the subgroup perceptions for each school. Then we can calculate the sums of the subgroup means, which indicate, as a dependent variable, the general effect of the independent variable, and the differences between subgroup means, which indicate the advantage or disadvantage to one group compared to the other at various levels of a regressor. Sums and differences could also be calculated on learning measures of subgroups within classes and regressed on perceptions and other variables. For example, we can ask how much cohesiveness benefits learning in general and also how much it benefits boys more than girls, or vice versa. Such sum-difference analyses, unlike mean-difference analyses, do not violate the independence assumption and do not have ambiguous numbers of degrees of freedom, as long as multivariate tests are carried out.
Three final analytic points should be mentioned. The first is the problem of unreliability corrections. If two variables differ in reliability, it can be misleading to compare their relations to a third variable without adjusting for the differences in reliability, because the measure of relation (r, b, t, F) is diminished by unreliability. Moderately complex analyses seem to call for extraordinarily complex adjustments (Cronbach, Gleser, Nanda, & Rajaratnam, 1972), and the extension of these adjustments to the cases above is far from obvious. How to correct for such varying degrees of unreliability of means, deviations, differences, and products in the presence of collinearity is a difficult and important problem that remains unresolved.
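The sum-difference procedure described at the start of this passage can be sketched as two small steps: form the sum and the difference of the two subgroup means within each school, then regress both on a school-level input. The code below is a minimal sketch under assumed names (`subgroup_sums_and_differences` and `regress_on` are hypothetical), it assumes every school contains members of both subgroups, and it estimates coefficients only. It does not carry out the multivariate tests that the procedure requires for valid inference.

```python
import numpy as np

def subgroup_sums_and_differences(school_ids, is_group_a, score):
    """Per school: mean score of two subgroups, then their sum and difference."""
    school_ids = np.asarray(school_ids)
    is_group_a = np.asarray(is_group_a, dtype=bool)
    score = np.asarray(score, dtype=float)

    schools = np.unique(school_ids)
    sums, diffs = [], []
    for s in schools:
        in_school = school_ids == s
        mean_a = score[in_school & is_group_a].mean()
        mean_b = score[in_school & ~is_group_a].mean()
        sums.append(mean_a + mean_b)   # general effect across both subgroups
        diffs.append(mean_a - mean_b)  # advantage of group A over group B
    return schools, np.array(sums), np.array(diffs)

def regress_on(regressor, outcomes):
    """OLS of each outcome column on one school-level regressor plus intercept.

    Returns an array of shape (2, n_outcomes): row 0 intercepts, row 1 slopes.
    """
    regressor = np.asarray(regressor, dtype=float)
    X = np.column_stack([np.ones_like(regressor), regressor])
    coef, *_ = np.linalg.lstsq(X, np.asarray(outcomes, dtype=float), rcond=None)
    return coef
```

A typical call would stack the two outcomes side by side, for example `regress_on(school_size, np.column_stack([sums, diffs]))`, so that the slope for the sums reflects the general relation and the slope for the differences reflects the relative advantage of one subgroup.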
Second, although the mean is the standard estimate of location, the Princeton group (Andrews, Bickel, Hampel, Huber, Rogers, & Tukey, 1972) nominates it as “clear candidate” for the “worst estimator” (p. 239) among 68 studied for robustness; even mild skew and slight outliers vitiate it (s, r, and b are even more susceptible). Because of this problem, differences would be better measured from the median, and Lohnes’s location and spread would be better measured by the median and the semi-interquartile range (see Tukey, 1972). These replacements are less correlated with one another than are means and standard deviations in the presence of skew or outliers, which makes them more estimable regressors.
Third, research on perceptions of learning environments usually deals with many correlated independent and dependent variables, and much of the published work (see reviews cited in the first section of this chapter) makes use of multivariate techniques not only for their summary power, which is often useful, but to guard against a few chance findings among many investigated, on either side of the equations. But some research regrettably is exploiting such chance relations, and even atheoretical, unguided, uncross-validated stepwise regressions could be cited.
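Returning to the second point, the robust replacements for the mean and standard deviation are simple to compute. The sketch below is illustrative only: the function name `robust_location_spread` is hypothetical, the scores are made up, and the quartiles use NumPy's default linear interpolation, which is one of several quartile conventions.

```python
import numpy as np

def robust_location_spread(values):
    """Return (median, semi-interquartile range), where SIQR = (Q3 - Q1) / 2."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    return float(median), float((q3 - q1) / 2.0)

# Made-up class scores, with and without a single outlying observation
clean = [10, 11, 12, 13, 14]
with_outlier = [10, 11, 12, 13, 114]

print(np.mean(clean), np.mean(with_outlier))   # the mean shifts from 12 to 32
print(robust_location_spread(clean))           # median 12, SIQR 1
print(robust_location_spread(with_outlier))    # median 12, SIQR 1 (unchanged)
```

Difference scores for a mean-difference analysis could then be taken from the class median in place of the class mean, in line with the suggestion above.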
Evaluation
Notwithstanding the great amount of theoretical and analytic research that is necessary, perception instruments have proven to be useful in applied evaluative studies. Eash and Talmage and their associates make student perception of learning environments a focus of their evaluative research. In a rare true educational experiment, Eash, Sparkis, and Rasher (1975) evaluated a public school district’s “alternative” junior high school featuring self-direction, learning contracts, and community involvement. Although achievement tests and self-concept measures showed borderline significance levels, the experimental students rated their classroom environments as being sharply more cohesive, democratic, and goal directed and as having less apathy, disorganization, and favoritism than did students who remained in control schools. The persistence of the differences over the several years of follow-up helped to discount Hawthorne effects.
The standard approach to evaluating National Science Foundation-sponsored in-service training programs is to administer before and after achievement tests to the teachers. Eash and Talmage (1975) engaged the teachers directly in the evaluation of an “investigative mathematics” program by having each participant administer perceptual measures to her or his class and a control class, discuss the comparison of the class with the control and those of the other participants, and study the changes in perception during the year. It was found that the investigative classes gained more in cohesiveness and less in friction and competitiveness. The teachers found the perceptual data valuable in comparing their initial status to that of others and in charting their comparative progress. Eash and Talmage also evaluated the changes in the perceptions of their classes by black and white students over a period of several years in a suburban school district which carried out an integration plan.
The general trend of the significant changes was increasing satisfaction and decreasing friction; and the perceptual data revealed several troubled classes and schools in which the alerted staff was able to remedy conditions. Well-designed correlational evaluations can be as valuable as experiments, but they must be judged on different criteria. Man – A Course of Study, the controversial social studies program, is undergoing an imaginative, methodologically sophisticated correlational evaluation by H.R. Cort (personal communication, 1975). Although the national sample of Man and control classes, the time series design, test sampling, and convergent-divergent measurement are strong points of the evaluation, the perceptual emphasis is of most interest here. Since the objectives of Man are too open ended and implicit to operationalize uniformly, both teachers and students are asked repeatedly during the year to describe in some detail what they think they are accomplishing, how they are doing it, and, in the case of teachers, why they are doing it. In effect, goals are
inferred, within limits, from what is being learned and then compared to what is avowed. Diversity of comment is solicited, and a full range of apparent outcomes is sought. These are being related to scales measuring the student perceptions of the social environment (Anderson & Walberg, 1974) and perceptions of the emphasis given to levels of the Bloom taxonomy of cognitive objectives (Steele, House, & Kerins, 1971). Cort is also experimenting with mean, difference, and variation analyses.
Measurement
Since a number of methodological questions have been considered above, this subsection is brief and treats only developments in measurement. Doctoral dissertations by Gardner (1974) at Monash University (Victoria, Australia), Holsinger (1972) at Stanford, and Zussman (1976) at McGill indicate some new directions for measuring perceptions of learning environments. All three works are large-scale empirical studies based on a priori theoretical frameworks that hold bright promise as beginning research programs. Gardner developed a perceptual instrument based on Murray’s (1959) needs-press theory and brought this personality-social-psychological perspective into curriculum research, with interesting results. Zussman’s dissertation, which is still in progress, shows that reducing the 15 scales of the Learning Environment Inventory (LEI; Anderson & Walberg, 1974) from seven to three items each, and hence the time from about 25 to 11 minutes, results in little reliability loss. Holsinger translated only 7 of the 105 LEI items into Portuguese and found the drastically shortened scale had greater predictive validity than a socioeconomic status index with student cognitive performance and modernity as criteria for a sample of Brazilian primary school children. Ahlgren (personal communication, 1974), at the University of Minnesota, increases rather than reduces the number of scales and items; adapting items from the LEI and the semantic differential, he uses optically scannable sheets that yield several hundred perceptual responses concerning the school plant, program, staff, and subjects in the curriculum. It is easy for younger children to fill in smiling, neutral, or frowning faces to indicate their evaluations, and the result is scannable for conversion to punched cards or tape. Ahlgren designs graphic transparencies to convey the information to school staff committees for evaluation and planning.
Barclay’s instrument (personal communication, 1975) is sociometrically based, and his and other such work merit review in a separate treatment. However, a feature of his methodology at the University of Kentucky that has implications for research on perception of educational environments is the use of numerical ratings to trigger computer readouts of diagnostic-prescriptive, natural language statements on classes and individual students, as well as computer-drawn profiles for use by the school staff.
Such technical developments allow not only quicker, more efficient, and more comprehensive mutual enrichment of theory and data but also various practical applications. Harnessing computer memory, analyses, and graphic displays to these developments and to time series and student sampling should, with due restraint for privacy, offer many possibilities for research, training, and evaluation.
Groups at two universities base their research design and measurement on social psychological theory. Johnson and Johnson of Minnesota and DeVries, Edwards, and Slavin of Johns Hopkins focus on classroom competition and cooperation and measure perception of the social environment of learning as well as academic performance and other variables as dependent variables. Johnson and Johnson’s (1974, 1975) masterful integrative review of psychological research (including their own and the early Hopkins work) and classroom application is a model of applied social science. Since the Johnsons’ review is readily accessible, only one point will be drawn from it as an introduction to the later work of the Hopkins group. The Johnsons conclude that when class members compete as individuals for grades, they often become hostile and hinder one another; the social environment becomes less wholesome and constructive. And, as Slavin (1975) shows, grades given to groups as wholes do not uniformly bring out the best in each member when individual accountability is unclear. The Hopkins group invented a mixed cooperative and competitive classroom reward structure called Teams-Games-Tournament (TGT). In TGT, students placed on five-person teams (a cooperative reward structure) compete with members of other teams at three-person “tournament tables” (a competitive reward structure). Points won by students at the tables by competing on course-content-relevant games contribute to a team score.
The student at the tournament table, like the baseball player at bat, strives toward a group goal and is individually and publicly accountable for his contribution. Unlike a “group grade” contingency, a student cannot easily afford to coast and let his teammates carry him along. DeVries and Edwards’s (1974) true experiments show that TGT can increase academic achievement, cross-sex and cross-race help and friendship, and favorable perceptions of satisfaction and mutual concern. The Hopkins group finds perceptual measures prove not only quite sensitive to treatments but also more convenient to use than behavioral counts, which are expensive to obtain, and custom-made or published achievement tests, which require special standardizing to compare achievement in units of lessons of different lengths, at various grade levels, and in several subjects in the curriculum. Finally, some brief comments on several on-going investigations may give some indication of the future of perceptual research on learning environments. Robert Ellison of the Institute for Behavioral Research in Salt Lake City, William Genova of TDR Associates in Newton, Massachusetts, and
Walter Hathaway and Stephen Murray of the Northwest Regional Laboratory in Portland are carrying out state-wide assessments of educational environments. Ellison developed a perceptual instrument based upon eclectic psychological theory and educational needs in Utah and is using a systematic sampling frame to assess classroom environments throughout the state for educational planning. Genova, working with the Massachusetts Department of Education and parent advisory groups, is reviewing existing instruments to formulate a school environment measure that will be used by school staff and parent groups to assess their schools and plan data-based improvement programs. Hathaway and Murray, under contract with the National Institute of Education, are constructing perceptual instruments to measure the intended and unintended effects of the state-mandated competency-based programs in the Oregon high schools. Murray, unlike many psychometrists who assume a purely methodological, atheoretical stance, is also carrying out several sophisticated evaluations of programs to increase the interpersonal sensitivity of teachers; his theoretical models, conceived in collaboration with program developers, relate prior characteristics of teachers, group-formulated goals, social perceptions in the training groups, group effectiveness ratings, changes in classroom behavior, and student perceptions of classroom climate. Empirical results identify breakdowns in the hypothesized causal framework that suggest improvements in materials and procedures. Among the several hundred investigators using the Learning Environment Inventory, several in the United States, Australia, India, and Canada require at least brief mention. Chad Ellet and David Payne of the University of Georgia set forth and refined more than 1,000 behavioral competencies of school principals which are being validated against student perceptions of the classroom environment and achievement. 
More than 80 percent of the conceptually-hypothesized directions of relations of competencies to perceptions were borne out with a sample of 60 schools. Continuing analyses of such relations, with statistical controls for school socioeconomic status and other factors, are laying the basis for systematic assessment and training programs for principals. Colin Powers of the University of Queensland and Richard Tisher of Monash University in Australia are collaborating on replication and extension of multivariate analyses of aptitudes, perceptions, and achievement originally done in the United States and Canada. Rampal Singh of the Jialal Institute of Education in Ajmer, Rajasthan, India is replicating earlier predictive validity studies with a Hindi version of the Inventory. Vidya Bhushan of the University of Montreal translated the Inventory into French and had an independent translation made back into English as a check; he is investigating the cross-cultural validity of prediction and of the Bales-Moos factors discussed in an earlier section. Such national and international hybrid vigor of theory, substance and method bodes well for the future of perceptual research on learning environments.
Conclusion
Hints on the possible utility and direction of perceptual research on classroom teaching and learning may be drawn by analogy from Counsilman’s (1968) analysis of competitive swimming, a sport that has shown steady progress during this century. Physics provides the inverse-square law of resistance and Newton’s law of action-reaction to suggest ways of improving stroke mechanics; and physiological principles set forth regimens for improving aerobic and anaerobic conditioning. But what seems critical among swimmers of international rank is the ability to regulate lap times to the fraction of a second by the “feel of the water.” To be sure, teaching is far more complex than swimming, but perception may be no less critical. What may be required to evaluate teaching techniques based on structural and behavioral theories are perceptual measures of the “feel of the class,” say, its cohesiveness, level of participation, and pace of achievement. Such proximal, immediate measures may yield sensitive indications of effects and conditions that will cumulate in distal medium- and long-term educational consequences. The mystique of the master teacher may be partially attributable to perceptiveness, the ongoing assessment of posture, tone of voice, and subtle patterns and changes in the class. The multivariate association of such behavioral complexes with educational perceptions may reveal the general factors that characterize the social environment, that optimize various learnings, and that can lead to a theory of teaching based on perceptual mediation.
References
Allport, G. W. The historical background of modern social psychology. In G. Lindzey & E. Aronson (Eds.), The handbook of social psychology (Vol. 1). Reading, Mass.: Addison-Wesley, 1968.
Anderson, G. J. Effects of classroom social climate on individual learning. American Educational Research Journal, 1970, 7, 135–152.
Anderson, G. J., & Walberg, H. J. Learning environments. In H. J. Walberg (Ed.), Evaluating educational performance. Berkeley: McCutchan, 1974.
Andrews, D. F., Bickel, P. J., Hampel, F. R., Huber, P. J., Rogers, W. H., & Tukey, J. W. Robust estimates of location. Princeton, N.J.: Princeton University Press, 1972.
Berliner, D. C., & Cahen, L. S. Trait-treatment interaction and learning. In F. N. Kerlinger (Ed.), Review of research in education, 1. Itasca, Ill.: F. E. Peacock, 1973.
Bermant, G. (Ed.). Perspectives on animal behavior. Glenview, Ill.: Scott, Foresman, 1973.
Bloom, B. S. Stability and change in human characteristics. New York: Wiley, 1964.
Boudon, R. The uses of structuralism. London: Heinemann, 1971.
Bowers, K. S. Situationism in psychology: An analysis and critique. Psychological Review, 1973, 80, 307–336.
Brunswik, E. Perception and the representative design of psychological experiments. Berkeley: University of California Press, 1956.
Campbell, D. T. Social attitudes and other acquired behavioral dispositions. In S. Koch (Ed.), Psychology: A study of a science (Vol. 6). New York: McGraw-Hill, 1963.
Campbell, W. J. (Ed.). Scholars in context: The effects of environments on learning. Sydney: Wiley, 1970.
Cannon, W. B. Wisdom of the body. New York: Norton, 1932.
Carterette, E. C., & Friedman, M. P. (Eds.). Handbook of perception. Vol. 3, Biology of perceptual systems. New York: Academic Press, 1973.
Chomsky, N. Review of “Verbal behavior” by B. F. Skinner. Language, 1959, 35, 26–58.
Clark, B. Causes of biological diversity. Scientific American, 1975, 233, 50–60.
Clark, R. W. Einstein: The life and times. New York: Avon, 1971.
Counsilman, J. E. The science of swimming. Englewood Cliffs, N.J.: Prentice-Hall, 1968.
Cronbach, L. J. Beyond the two disciplines of scientific psychology. American Psychologist, 1975, 30, 116–127.
Cronbach, L. J. The two disciplines of scientific psychology. American Psychologist, 1957, 12, 671–684.
Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. The dependability of behavioral measurement: Theory of generalizability for scores and profiles. New York: Wiley, 1972.
Cronbach, L. J., & Snow, R. E. Aptitudes and instructional methods. New York: Irvington, 1976.
Darwin, C. The Origin of Species. London: Murray, 1859.
Darwin, C. The formation of vegetable mould through the action of worms, with observations of their habits. London: Murray, 1881.
DeVries, D. L., & Edwards, K. J. Learning games and student teams: Their effects on classroom process. American Educational Research Journal, 1973, 10, 307–318.
DeVries, D. L., & Edwards, K. J. Student teams and learning games: Their effects on cross-race and cross-sex interaction. Journal of Educational Psychology, 1974, 66, 741–749.
Eash, M. J., Rasher, S. P., & Sparkis, V. An evaluation of a new curriculum design as a true experiment. Chicago: University of Illinois, 1975. (ERIC Document Reproduction Service No. ED 113 373).
Eash, M. J., & Talmage, H. Evaluation of learning environments (ERIC TM Rep. No. 43). Princeton, N.J.: ERIC Clearing House on Tests, Measurement, and Evaluation, Educational Testing Service, 1975.
Fiedler, M. L. Bidirectionality of influence in classroom interaction. Journal of Educational Psychology, 1975, 67(6), 735–744.
Gardner, P. L. Attitudes to physics. Unpublished doctoral dissertation, Monash University, Melbourne, Australia, 1974.
Gergen, K. J. Social psychology as history. Journal of Personality and Social Psychology, 1973, 26, 309–320.
Glass, G. V. The wisdom of scientific inquiry in education. Journal of Research in Science Teaching, 1972, 9, 3–18.
Goldberger, A. S., & Duncan, O. D. Structural equation models in the social sciences. New York: Academic Press, 1973.
Gruber, H. E., & Barrett, P. H. Darwin on man. New York: Dutton, 1974.
Hebb, D. O. What psychology is all about. American Psychologist, 1974, 29, 71–79.
Hoetker, J., & Ahlbrand, W. P. The persistence of recitation. American Educational Research Journal, 1969, 6, 145–168.
Holsinger, D. B. The elementary school as a socializer of modern values: A Brazilian study. Unpublished doctoral dissertation, Stanford University, 1972.
Hull, C. L. Principles of behavior. New York: Appleton-Century-Crofts, 1943.
Hull, C. L. A behavior system. New Haven, Conn.: Yale University Press, 1952.
Insel, P. M., & Moos, R. H. Psychological environments: Expanding the scope of human ecology. American Psychologist, 1974, 29, 179–188.
James, W. Talks to teachers on psychology: And to students on some of life’s ideals. New York: H. Holt & Co., 1899.
Johnson, D. W., & Johnson, R. T. Instructional goal structure: Cooperative, competitive, or individualistic. Review of Educational Research, 1974, 44, 213–240.
24 Thought and Two Languages: The Impact of Bilingualism on Cognitive Development Rafael M. Diaz
By the end of 1979, approximately 3.6 million children in the United States were judged to be in need of special linguistic assistance to cope with the regular school curriculum (Pifer, 1980); at the time, however, roughly 315,000 children were participating in some kind of bilingual education program. Despite the fact that federal spending on bilingual education is comparatively low, and that existing programs reach only a fraction of eligible children, bilingual education is presently under considerable attack. Indeed, “few other educational experiments in recent years have managed to arouse such passionate debate – so much so, in fact, that the future of this promising educational tool is uncertain” (Pifer, 1980, p. 4). The attack against bilingual education can be explained mostly in terms of political, cultural, and socioeconomic variables (see Fishman, 1977). A discussion of such variables is well beyond the scope of this paper. Nevertheless, for our purposes it should be noted that psychological and educational research on the effectiveness of bilingual education often has provided the attackers with sophisticated weapons. For example, an influential study of bilingual education projects sponsored by the Office of Education in 1976 (American Institute for Research, 1977) showed that many existing programs were not providing academic gains for students and, in some cases, were allowing
Source: Review of Research in Education, 10 (1983): 23–54.
students to fall behind. Although the study has been criticized severely for basic methodological flaws, it has contributed significantly to a negative mood against bilingual education efforts in the nation (Blanco, 1977). Tucker and D’Anglejan (1971) outlined four commonly held beliefs regarding the effects of bilingual education: (1) Children who are instructed bilingually from an early age will suffer cognitive or intellectual retardation in comparison with their monolingually instructed counterparts. (2) They will not achieve the same level of content mastery as their monolingually instructed counterparts. (3) They will not achieve acceptable native language or target language skills. (4) The majority will become anomic individuals without affiliation to either ethnolinguistic group. (as cited in Cummins & Gulutsan, 1974, p. 259). Some of these beliefs are just that – beliefs. Others are based on studies that were poorly designed and that failed to control for relevant confounding variables such as children’s actual knowledge of their two languages or bilingual-monolingual group differences in socioeconomic status. At present, almost everyone in the field agrees that research on the effects of bilingual education in this country is relatively scarce and, at best, inconclusive (Paulston, 1977). Nevertheless, studies of doubtful validity and ill-founded conclusions are much too often used by legislators and politicians to decide the future of social and educational programs for children. In the near future, the Bilingual Education Act will come up for reauthorization, and once again research findings will be used (and misused) to attack or support decisions regarding the future of bilingual education. There is an urgent need to carefully review the validity of present research findings, in light of their theoretical assumptions and research methodologies, to enlighten policy decisions with scientific facts. 
This paper attempts, in part, to respond to this need by reviewing the literature on the effects of bilingualism on children’s cognitive development. The review focuses on the psychological literature relating bilingualism and second-language learning to children’s cognitive performance rather than on formal educational evaluations of existing bilingual education programs. Special attention is given to research showing the cognitive advantages of becoming bilingual, bringing to surface the underlying theoretical models relating children’s bilingualism to positive cognitive gains. After all, the rationale for bilingual education rests heavily on the belief that true bilingualism, rather than “semilingualism” or the gradual loss of the first language, is advantageous to children’s learning and cognitive development. It is my hope that this review will not only stimulate further and more rigorous research in the area, but also serve as a guide and inspiration to educational policymakers.
Bilingualism and Intelligence: Early Studies
Systematic studies on the relationship between bilingualism and intelligence began in the early 1920s, parallel to the flourishing of psychometric tests of intelligence. Because the measurement of intellectual potential was, and still is, heavily dependent on verbal abilities, psychologists and educators were concerned about the validity of such tests for bilingual children. The main concern was that bilingual children would suffer from some kind of language handicap, and this, in turn, would be an obstacle to a fair assessment of their intellectual abilities and potential. The overwhelming majority of studies prior to 1962 found, indeed, strong evidence for the so-called “language handicap” in bilingual children (see reviews by Arsenian, 1937; Darcy, 1953, 1963; Macnamara, 1966). When compared to monolinguals, bilingual children appeared inferior on a wide range of linguistic abilities. Among other things, bilinguals were shown to have a poorer vocabulary (Barke & Perry-Williams, 1938; Grabo, 1931; Saer, 1923), deficient articulation (Carrow, 1957), lower standards in written composition, and more grammatical errors (Harris, 1948; Saer, 1923). Interestingly enough, evidence of a language handicap in bilingual children did not lead to a questioning of the validity of psychometric tests of intelligence for this population. Rather, the consistent findings about bilinguals’ deficient linguistic performance quickly led to statements about the negative effects of bilingualism on children’s intelligence. For a long time, children’s bilingualism was considered as some kind of social plague (Epstein, 1905), “a hardship devoid of apparent advantage” (Yoshioka, 1929, p. 476). The language handicap of bilinguals was interpreted as a linguistic confusion that deeply affected children’s intellectual development and academic performance up to the college years (Saer, 1923).
Beliefs about the negative effects of early bilingualism were further confirmed when several studies showed that bilinguals also performed significantly lower than monolinguals on tests of nonverbal abilities, such as tests of dextrality (Saer, 1931) and mathematical competence (Carrow, 1957; Manuel, 1935). Most early studies in this area, however, suffer from a wide range of methodological problems; so much so that at present most investigators in the field regard the findings of early studies as totally unreliable (see Cummins, 1976). Many early studies, for example, failed to control for group differences in socioeconomic status between bilingual and monolingual samples. As early as 1930, McCarthy pointed out that bilingualism in the United States was seriously confounded with low socioeconomic status. She found that more than half the occurrences of bilingualism in school children could be classified as belonging to families from the unskilled labor occupational group. Along the same lines, Fukuda (1925) alerted researchers to the fact that high-scoring, English-speaking subjects were mostly in the occupational and executive classes; he reported a correlation of .53 between the Whittier
(socioeconomic) Scale and the Binet IQ for this population. Nevertheless, prior to the early 1960s, most studies investigating the effects of bilingualism on children’s intelligence did not account for group differences in socioeconomic status. A second major methodological flaw of early studies is that investigators consistently ignored children’s actual degree of bilingualism. An extreme example is a study by Brunner (1929) in which degree of bilingualism was determined by the foreignness of parents. Brunner divided his bilingual sample into three categories: (1) both parents born in this country, (2) one parent born here and the other abroad, and (3) both parents born abroad. The classification was simply assumed to represent children’s varied degree of bilingualism. In other studies, the sample’s bilingualism was assessed through family names or even place of residence (see Darcy, 1953, for a review). As present investigators have stated repeatedly, it is impossible to ascertain whether the bilingual subjects of many studies were indeed bilingual or merely monolingual speakers of a minority language. A few studies, however, were conducted with controls for socioeconomic variables and attempted more refined measures of subjects’ bilingualism. Fritz and Romkin (1934), for example, tested 201 junior high school students in Kansas on the Otis Self-Administering Test of Mental Ability, the New Stanford Achievement Test, and the Sims Socio-Economic Score Card. The sample consisted of two different groups: an “only-English-speaking” group and a “usually-foreign-speaking” group. As expected, the results showed that the monolingual English-speaking group was at a definite advantage in all achievement and IQ variables, as well as in socioeconomic status. To make the two language groups more comparable, Fritz and Romkin matched 12 children from each group on relevant variables such as sex, age, mental ability, and socioeconomic status. 
Once again, the results showed that “foreign-speaking” children performed at a lower level than monolinguals on all sections of the achievement test. Although the matched samples were small, and the matching procedure never guarantees that groups are equivalent on all relevant variables, this study shows that the language handicap of bilinguals was evident even when socioeconomic variables were controlled somewhat. The methodological problem remained, however, with the fact that the selection of foreign-speaking subjects does not guarantee that the bilingual sample masters both languages at age-appropriate levels to be considered truly bilingual. Other studies attempted such strict controls that comparisons between bilingual and monolingual samples on cognitive variables became meaningless. Hill (1935) compared Italian children who heard and spoke only Italian at home with Italian children who heard and spoke only English at home. The sample’s degree of bilingualism was ascertained by questionnaires and tests of comprehension of spoken Italian and Italian word meaning. The two groups of children were equated on age, sex, educational environment,
mental age, and intelligence quotient. As could be reasonably expected, the results showed no reliable differences between the two groups of children in verbal and nonverbal scores. Arsenian (1937) argued that Hill’s (1935) results are basically meaningless, because matching the groups on an IQ measure that is based on both verbal and nonverbal performance guarantees a finding of no difference in verbal and nonverbal abilities. This study, however, is an excellent example of the dilemma faced by both early and present investigators in the field. To date, it is not clear how to control for group differences between bilingual and monolingual intellectual abilities and at the same time study meaningful group differences in both cognitive and linguistic abilities. One possible solution is to use subjects as their own controls and study cause-effect relationships between degree of bilingualism and cognitive variables using a longitudinal design. Unfortunately, there are very few longitudinal studies that shed light on these cause-effect relationships.
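Arsenian’s objection to matching on a composite IQ can be illustrated with a small simulation. The sketch below is not a reconstruction of Hill’s (1935) data; the group means, the spread of scores, the sample sizes, and the function names are all hypothetical, and the composite is assumed to be a simple sum of a verbal and a nonverbal score. Under these assumptions, pairing children with identical composites forces the verbal and nonverbal gaps in the matched sample to cancel, so the verbal gap shrinks and an opposite nonverbal gap appears.

```python
import random
from collections import defaultdict
from statistics import fmean


def make_group(rng, n, verbal_mean):
    """Simulate n children as (verbal, nonverbal) integer scores (hypothetical values)."""
    return [(round(rng.gauss(verbal_mean, 10)), round(rng.gauss(100, 10)))
            for _ in range(n)]


def match_on_composite(group_a, group_b):
    """Pair each child in group_a with an unused child in group_b
    whose composite score (verbal + nonverbal) is identical."""
    by_composite = defaultdict(list)
    for child in group_b:
        by_composite[sum(child)].append(child)
    pairs = []
    for child in group_a:
        candidates = by_composite[sum(child)]
        if candidates:
            pairs.append((child, candidates.pop()))
    return pairs


rng = random.Random(0)
# Assumption: group A has a 5-point verbal deficit and no nonverbal deficit.
group_a = make_group(rng, 2000, verbal_mean=95)
group_b = make_group(rng, 2000, verbal_mean=100)

# Verbal gap before any matching.
raw_verbal_gap = fmean([v for v, _ in group_a]) - fmean([v for v, _ in group_b])

# Verbal and nonverbal gaps within the composite-matched pairs.
pairs = match_on_composite(group_a, group_b)
matched_verbal_gap = fmean([a[0] - b[0] for a, b in pairs])
matched_nonverbal_gap = fmean([a[1] - b[1] for a, b in pairs])

print(f"raw verbal gap:        {raw_verbal_gap:.2f}")
print(f"matched verbal gap:    {matched_verbal_gap:.2f}")
print(f"matched nonverbal gap: {matched_nonverbal_gap:.2f}")
```

Because each pair shares the same composite, the two matched gaps sum to zero by construction. Whatever verbal difference survives the matching is offset by a nonverbal difference of the opposite sign, which is one way to see why comparisons made after such matching are hard to interpret.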
Lessons from Four Decades of Research
The severe methodological problems of early studies resulted in few clear facts about the effects of bilingualism on children’s intelligence and intellectual development. On the other hand, early studies yielded a great deal of wisdom about the complexity of the issues. The first few decades of serious systematic studies in the field have alerted researchers to the dangers of simplistic theories and methodologies regarding the phenomenon of bilingualism and have helped them recognize the variables that mediate its effects on children’s cognitive development. As early as 1937, Arsenian argued against a unidimensional construct of bilingualism and maintained that variations between different bilingual experiences could make a big difference in the types of effects observed in children’s cognitive performance. Specifically, Arsenian proposed that for scientific research purposes, bilingual samples should be defined along the following dimensions:
Degree of bilingualism. Bilinguals vary in degree of proficiency in their two languages. Some bilingual children are just beginners in learning the second language, while others have achieved age-appropriate levels of proficiency in both languages. Furthermore, the bilingualism of a given person may vary with time; for example, in some bilingual situations increased competence and mastery of a second language gradually replaces the use and abilities of the first language. The effects of such variations within bilinguals should be the object of scientific investigation rather than simply ignored.
Degree of difference between the two languages. Two languages from different language families vary along more dimensions than two languages within the same language family. Spanish, for example, is closer to other Indo-European languages such as Italian, French, and Rumanian than it is to
English or Japanese. It is clear that more cognitive effort is required from a Spanish child to learn the morphology, grammar, and phonetics of English than for the same child to learn Italian. Furthermore, the degree of difference between two languages might represent deeper cultural differences that the child must assimilate and accommodate to achieve proper mastery of the language. In Arsenian’s (1937) words:
The degree of difference between the two languages of a bilinguist is important from the point of view not only of the learning mechanism, but also of the thinking process; because the difference between two languages usually denotes a difference in the culture and civilization of the two peoples using them, and hence denotes also a difference in the connotation of words which will influence the direction and the content of thought in the two languages. (p. 20)
It should not be surprising, therefore, that the degree of difference between two languages might mediate the effects of a bilingual experience on children’s cognitive development. The effects of this variable must be considered carefully when attempting to generalize from one bilingual experience to another. Age when learning a second language. Although it is not clear what age is best (or worst) to learn a second language, most likely the experience of becoming bilingual will have different cognitive effects, depending on the learner’s age. For example, the experience of infants exposed to two languages simultaneously (Leopold, 1949a, 1949b) seems to be qualitatively different from the experience of a monolingual 6- or 7-year-old who is faced with the task of learning a second language to understand the school curriculum. The question regarding the best age to learn a second language is, indeed, an unresolved issue in current research. By the same token, it is not clear if the age of the second-language learner is an important variable mediating the possible positive or negative effects of bilingualism. Those who argue in favor of a critical period hypothesis in language acquisition, and the relative ease of acquiring a language during this period, tend to postulate different cognitive effects of second-language learning depending on whether the learner is within or beyond this critical period (see Lenneberg, 1967; Penfield & Roberts, 1959). Others argue that the introduction of a second language at an early age, when the child has not yet achieved a certain degree of competence in his first language, might be detrimental to the child’s cognitive development, while positive cognitive gains should be expected from bilingualism if the second language is introduced after the child has achieved a certain threshold level of competence in his first language (Cummins, 1976). 
It is important to note that certain dependent variables in studies of bilingualism and cognition might be particularly sensitive to age effects. For example, several studies have shown that a bilingual’s vocabulary in both the first and second language is smaller than the vocabulary of monolinguals (Grabo, 1931; Saer, 1923; Sanchez, 1934). However, on the basis of the data from several other studies, Arsenian (1937) showed that this
apparent deficit is closely related to a given age group of bilinguals, and therefore is a temporary effect of second-language learning at a young age. The same effects simply are not found in older bilinguals (Murdoch, Maddow, & Berg, 1928). Method of learning the second language. Arsenian (1937) insisted that researchers should be attentive to whether the bilingual child had learned the two languages simultaneously or whether the second language had followed the first. Relevant to this dimension is the distinction between acquiring and learning a second language. Briefly stated, second-language acquisition refers to the process of acquiring a second language in a natural environment, outside of formal instruction; second-language learning refers to the process of formal language education where one aspect of the grammar is introduced at a time, and systematic feedback with error correction is provided (McLaughlin, 1978). There are few empirical findings regarding the cognitive effects of acquiring versus learning a second language. Probably, in most situations, bilinguals both acquire and learn different aspects of the second language. However, there is some scattered evidence that certain features of language acquisition might ease the process of formal second-language learning. In one of the earliest studies in the area, Saer (1923) tested approximately 1,400 children from ages 7 to 12 in five rural and two urban districts in Wales. Saer obtained the following results on the Stanford-Binet scale:
          Monolingual   Bilingual
Urban          99          100
Rural          96           86
According to Saer’s data, differences in the performance of bilingual and monolingual children seem to exist only in the rural sections. Saer explained his findings in the following way: For the rural Welsh-speaking children, Welsh is the language of home, play, and Church and, therefore, a language with strong affective connotations. When these children are exposed to a second language at school, a conflict is raised between the child’s “self-regarding sentiment or positive self-feeling” and his “negative self-feeling or his instinct for submission” (p. 37). On the other hand, for the Welsh-speaking child in the urban areas this conflict is played down by the fact that they come in contact and play with English-speaking children at an early age, before a formal learning contact with the second language at school. Although there is no evidence to support Saer’s psychodynamic assertions, his data do indeed suggest that opportunities to acquire a second language might mediate the effects of second-language learning on cognitive development. More recent studies show that children who begin bilingual education programs with a fair amount of knowledge of the second language perform significantly better on several cognitive measures than children with little or no previous experience in the second language (Diaz & Hakuta, Note 1).
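The pattern in Saer’s table can be made explicit with a little arithmetic. The sketch below uses only the four mean scores reported above; the variable names are illustrative, and no significance test is implied, since the text gives no variances or group sizes.

```python
# Mean Stanford-Binet scores reported by Saer (1923), as given in the table above.
scores = {
    "urban": {"monolingual": 99, "bilingual": 100},
    "rural": {"monolingual": 96, "bilingual": 86},
}

# Monolingual-minus-bilingual gap within each type of district.
gaps = {district: row["monolingual"] - row["bilingual"]
        for district, row in scores.items()}

# Difference of differences: how much larger the gap is in rural districts.
interaction = gaps["rural"] - gaps["urban"]

for district, gap in gaps.items():
    print(f"{district}: monolingual - bilingual = {gap}")
print(f"rural gap minus urban gap = {interaction}")
```

The urban gap is a single point in favor of the bilingual children, while the rural gap is ten points in favor of the monolingual children. This is the sense in which the difference appears only in the rural sections.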
Attitudes toward the second language. Bilingual experiences vary significantly in terms of the social, political, and religious sentiments connected with the first and second languages. As Saer’s (1923) conclusions suggested, having to learn a second language might threaten a person’s self-esteem when the second language is identified in any way with a colonizing or assimilating force. In such situations, a negative attitude toward the second language might play a crucial role in determining children’s linguistic and academic performance. Arsenian believed, therefore, that when defining a given bilingual situation, researchers must include a detailed description of the national, religious, and political significance of the second language for the bilingual sample involved (see also Fishman, 1977). Although Arsenian (1937) at an early stage outlined the five dimensions mentioned above, the majority of studies in the field prior to 1962 lacked adequate assessments of the sample’s actual degree of bilingualism or proficiency in both languages. Also, as a rule, bilinguals were treated as a homogeneous group with no adequate consideration of the variability in second-language learning or acquisition histories. Furthermore, results from studies of specific bilingual situations were grossly generalized as effects of the universal aspects of bilingualism. Toward the end of the 1950s, research on the effects of bilingualism showed consistent findings. Monolinguals performed significantly higher than bilinguals on measures of verbal intelligence. Some studies showed that monolinguals were also at an advantage on measures of nonverbal ability, but group differences on this variable were not consistent across studies. 
On one hand, the findings suggested that at certain stages of second-language learning, bilinguals suffered from a “language handicap.” On the other hand, it was not clear if this linguistic disadvantage in bilinguals was a true intellectual deficit of a permanent nature, or just a temporary manifestation of the struggle to cope with two different language systems at a relatively young age. Further research to clarify these issues seemed extremely important on two counts. First, the question was obviously and directly relevant to educational policy in several countries. Second, the negative findings contradicted linguists’ case studies and theoretical statements regarding the effects of early bilingualism. The best-known linguistic study of a child’s simultaneous acquisition of two languages is Leopold’s monumental investigation of his daughter Hildegard (Leopold, 1939, 1947, 1949a, 1949b). Hildegard lived most of the time in an English-speaking environment, but her father spoke to her in German and her mother in English. As was the case in similar earlier studies (see e.g., Pavlovitch, 1920; Ronjat, 1913), Leopold’s study found little interference between Hildegard’s two languages, and no evidence at all of any serious linguistic retardation in either language. Hildegard shifted languages with relative ease and developed strategies to use new words appropriately in the context of their respective languages. Leopold (1949b) noted in his last
volume that by age 3 both his daughters had an awareness of dealing with two separate languages, and from then on both languages seemed to develop adequately as two independent systems. Furthermore, Leopold regarded his daughters’ bilingualism as a genuine asset to their mental development. He felt that bilingual children must learn very early to separate the sound of the word from its referent, and this, in turn, forced the child to focus on essentials, on “content instead of form” (p. 188). Leopold’s conclusion implies that bilingualism accelerates the development of abstract thinking by freeing the child’s thought from the concreteness and “tyranny” of words. Similar claims can be found in the work of Evans (1953) and Vygotsky (1962). Nevertheless, because the majority of studies before 1962 showed that bilinguals performed lower than monolinguals on linguistic, cognitive, and academic variables, the first four decades of psychological research on the effects of bilingualism were loaded with the notion that bilingualism was detrimental to children’s intelligence and cognitive development. In the early 1960s, however, new experimental procedures and more controlled sample selection procedures led to very different conclusions. Peal and Lambert’s study in 1962 marked the turning point.
Bilingualism and Pseudobilingualism: Peal and Lambert (1962)
Aware of the potential advantages of bilingualism for children’s cognitive development, Peal and Lambert (1962) attributed the negative findings of early studies to the failure of researchers to differentiate “pseudo-bilinguals” from truly bilingual children. “The pseudo-bilingual knows one language much better than the other, and does not use his second language in communication. The true bilingual masters both at an early age and has facility with both as means of communication” (p. 6). Guided by O’Doherty’s (1958) writings, Peal and Lambert believed that while pseudobilingualism might be a serious problem that could result in intellectual retardation, genuine bilingualism may be a real asset to children’s intellectual development. Because early studies had been lax in their definition of bilingualism and in the assessment of their sample’s degree of bilingualism, negative findings could be attributed to a situation of pseudobilingualism.
To test their hypotheses, Peal and Lambert (1962) administered several measures of degree of bilingualism to 364 10-year-old children in Canada. Three tests were used to determine whether children were “balanced” bilinguals, that is, equally skilled in French and English, or whether they were monolingual. Children’s self-ratings of their ability in the second language were taken into account also. The final sample was composed of 164 subjects: 75 monolinguals and 89 (genuine or balanced) bilinguals. Children in the sample were administered a modified version of the Lavoie-Laurendeau
(1960) Group Test of General Intelligence, the Raven’s Coloured Progressive Matrices, and a French version of selected subtests of the Thurstone and Thurstone (1954) Primary Mental Abilities Test. In addition, several measures of attitudes toward English Canadians, French Canadians, and the self were administered to the subjects. Contrary to the findings of earlier studies, the results of the Peal and Lambert study showed that bilinguals performed significantly better than monolinguals in most of the cognitive tests and subtests, even when group differences in sex, age, and socioeconomic status were appropriately controlled. Bilingual children performed significantly higher than monolinguals on tests of both verbal and nonverbal abilities; the bilinguals’ superiority in nonverbal tests was more clearly evident in those subtests that required mental manipulation and reorganization of visual stimuli, rather than mere perceptual abilities. A factor analysis of test scores indicated that bilinguals were superior to monolinguals in concept formation and in tasks that required a certain mental or symbolic flexibility (the notion of cognitive flexibility will be discussed in detail in a later section). Overall, bilinguals were found to have a more diversified pattern of abilities than their monolingual peers. Peal and Lambert’s (1962) findings must be considered, however, with a certain degree of caution. First, as Macnamara (1964, 1966) pointed out, the process of subject selection might have introduced a bias in favor of the bilingual sample. Peal and Lambert’s bilingual sample included only children who scored above a certain determined level in the English Peabody Picture Vocabulary Test, a test commonly used to measure intelligence in monolinguals. It is possible that in a situation like Canada, the intelligence of French-Canadian children might be reflected in a measure of English (the second language) vocabulary. 
Second, on the average, the bilingual sample belonged to a higher grade than the monolingual sample; perhaps the superiority observed in bilinguals was the result of their having longer exposure to formal education. And third, the frequency distribution of the Raven’s test scores was very different for both groups of children; it was negatively skewed for bilinguals, while the opposite was true for monolinguals. In short, the cognitive advantages observed in Peal and Lambert’s balanced bilingual sample could have been inflated by several artifacts in their subject selection procedures. As Peal and Lambert admitted,
A partial explanation of this [the results] may lie in our method of choosing the bilingual sample. Those suffering from a language handicap may unintentionally have been eliminated. We attempted to select bilinguals who were balanced, that is, equally fluent in both languages. However, when the balance measures did not give a clear indication of whether or not a given child was bilingual, more weight was attached to his score on the English vocabulary test. Thus some bilinguals who might be balanced, but whose vocabulary in English and French might be small, would be omitted from our sample. The less intelligent bilinguals, those who have not acquired as large an English vocabulary, would not be considered bilingual enough for our study. (p. 15)
Nevertheless, Peal and Lambert’s (1962) empirical distinction between bilinguals and pseudobilinguals made a significant (and much needed) methodological contribution to the field. Their distinction has forced recent investigators to select their bilingual samples with greater care and to measure the sample’s actual knowledge of the two languages. Peal and Lambert’s study also alerted researchers to the possible positive and negative effects of bilingualism depending on the bilingual situation involved. Recently, more attention has been given to descriptions of different types of bilingual experiences that might have different effects on children’s cognitive development (see Cummins, 1976). One such situation results in “semilingualism.” Semilinguals are children whose second language gradually replaces the native tongue. Therefore, at a given point, these children neither are fluent speakers of the first language nor have mastered the second language with age-appropriate ability. Along these lines, Macnamara (1966) noted that in certain Irish-English bilingual situations in Ireland, competence in the second language was attained at the expense of competence in the first language. Macnamara named this process the “balance effect,” which must be carefully distinguished from those situations where children move toward balanced bilingualism, that is, age-appropriate abilities in both languages. Recent studies in Scandinavia (e.g., Hansegard, 1968; Skutnabb-Kangas, Note 2) have shown that semilingualism has negative emotional, cognitive, linguistic, and scholastic consequences (see Paulston, 1975, for a review of Scandinavian research on semilingualism). When trying to understand the situation of minority bilingual children in the United States, one must look carefully for signs of semilingualism or the balance effect. The main reason is that semilingualism is usually associated with the bilingualism of the poor economic classes. 
Sociolinguists have often made a sharp distinction between the bilingualism of upper- and lower-class children in terms of “elitist” versus “folk” bilingualism (Fishman, 1967; Paulston, 1975). As a rule, elitist bilingualism is a matter of choice for the educated classes and has not presented any educational problems. On the other hand, folk bilingualism is “the result of ethnic groups in contact and competition within a single state” (Cummins, 1976, p. 19). Folk bilingualism also is associated with several sociocultural factors, such as negative attitudes and actual discrimination against the use of a minority language, which probably prevent the adequate development of genuine or balanced bilingualism.
Cognitive Advantages of Balanced Bilinguals
Although the Peal and Lambert (1962) study had some serious methodological difficulties, it must be pointed out that their findings regarding the positive effects of balanced bilingualism have been replicated in more recent studies that have carefully assessed the sample’s actual knowledge of the
two languages. Indeed, when compared to monolinguals, balanced bilingual children show a wide range of advantages in different cognitive tasks. These studies will be carefully reviewed here.
Cognitive Flexibility
Several studies have concluded that bilinguals are more cognitively “flexible” than monolinguals; the construct “cognitive flexibility,” however, has never been adequately defined. The notion of flexibility has been loosely used and abused to account for bilinguals’ superior performance on a wide range of cognitive tasks. For example, the term was used by Peal and Lambert (1962) to describe bilinguals’ performance on tests of general reasoning; by Ben-Zeev (1976, 1977a) to describe bilinguals’ improved attention to structure and detail; by Balkan (1970) to describe performance on perceptual and “set changing” tasks; and by Landry (1974) to describe divergent thinking skills measured by tests of creativity. (See Cummins, 1976, for a discussion of the conceptual confusion underlying the term cognitive flexibility.) Nevertheless, this poorly defined construct is now widely used, and many students and researchers in the field argue that bilinguals are, indeed, more cognitively flexible than monolinguals. It is important, therefore, to trace the history of the term’s usage, as well as to clarify the nature of the tasks where bilingual children seem to perform more “flexibly” than monolinguals.
In the literature on bilingualism and cognitive development, the term cognitive flexibility was used first by Peal and Lambert (1962) to describe bilinguals’ performance on measures of general intelligence. Specifically, the term was used to explain a puzzling finding, namely, that bilinguals performed significantly better than monolinguals on several nonverbal tests of intelligence. On the basis of earlier linguistic studies, the superior performance of balanced bilinguals on verbal tests could be explained rather easily by the linguistic advantages of knowing two different languages, such as the early separation between sound and meaning. However, a similar explanation was not available for the effects of bilingualism on nonverbal abilities. 
Bilinguals’ need to switch languages and a resulting mental flexibility proved to be a logical and attractive explanation. Because bilinguals outranked monolinguals on both verbal and nonverbal tests, an alternative explanation would have been to simply admit the (nonintuitive) conclusion that bilinguals in the study were more intelligent than the monolinguals. Such an explanation, however, would have cast further doubts on Peal and Lambert’s sample selection procedures. After submitting their data to a factor analysis, Peal and Lambert (1962) noted that the nonverbal advantages of balanced bilinguals appeared more clearly on tests requiring some manipulation and reorganization of symbols, rather than on tasks requiring perceptual or spatial abilities. Previous analyses
of nonverbal tests of ability (Ahmed, 1954; Anastasi, 1961) suggested that spatial visualization and mental manipulation of visual symbols are independent abilities. Moreover, Ahmed (1954) described this second ability “as if it consisted of mental flexibility which is involved in the process of mentally reorganizing the elements of a problem situation” (as cited in Peal & Lambert, 1962, p. 14; italics added by Peal & Lambert). Peal and Lambert went a step further and cleverly explained the newly discovered flexibility of bilinguals in terms of their habitual language switching:
The second hypothesis is that bilinguals may have developed more flexibility in thinking. Compound bilinguals typically acquire experience in switching from one language to another, possibly trying to solve a problem while thinking in one language, and then, when blocked, switching to the other. This habit, if it were developed, could help them in their performance on tests requiring symbolic reorganization since they demand a readiness to drop one hypothesis or concept and try another. (p. 14)
Implied in Peal and Lambert’s explanation is the assumption that bilingual children would perform verbally the mental manipulation of visual symbols required by nonverbal tests like the Raven’s Progressive Matrices. More specifically, their hypothesis involves three basic (and untested) assumptions: (1) that bilingual children are thinking verbally while performing these nonverbal tasks, (2) that bilinguals switch from one language to the other while performing these tasks, and (3) that bilinguals’ habit of switching languages while performing these tasks stimulates the ability to more readily discard doubtful hypotheses and formulate new ones to find a correct solution to the problem involved. In support of their explanatory hypothesis, Peal and Lambert cite the case of a Gaelic-speaking boy of 11 (originally cited in Morrison, 1958), who had just taken the Raven’s Progressive Matrices test. According to Morrison, when the boy was asked whether he had done his thinking in Gaelic or in English, the boy replied, “Please Sir, I tried it in the English first, then I tried in the Gaelic to see would it be easier; but it wasn’t so I went back to the English” (p. 280). Recent research on the Raven’s Progressive Matrices suggests that the matrices can be solved by performing either verbal or nonverbal operations on the elements involved (see Hunt, 1974). However, research on children’s performance on the Raven’s Matrices (Kirby & Das, 1978) suggests that, most likely, children rely on visual-spatial strategies when solving the matrices. Kirby and Das found that even the items that are more prone to verbal processing, such as terms requiring some kind of analogical reasoning, are highly correlated with tests of pure spatial abilities in fourth-grade monolinguals.
Although Peal and Lambert’s (1962) assumptions are fascinating and suggestive hypotheses in themselves, it is clear that they cannot be taken at face value. This writer is currently investigating bilinguals’ use of verbal and spatial strategies when solving problems like those encountered in the Raven test. It is possible that, because of their unique linguistic experience, bilingual children prefer to process information and to solve nonverbal tasks verbally; in fact, some preliminary data analyses suggest that this might be the case. Hopefully, this kind of research will shed some light on bilinguals’ superior performance in nonverbal tests. Nevertheless, it is too early to tell whether bilingual and monolingual children do indeed differ in their information-processing strategies. Peal and Lambert’s conclusions regarding bilinguals’ flexibility, therefore, must be taken with great caution.
One of the most frequently cited studies of bilinguals’ cognitive flexibility is a study conducted by Balkan in Switzerland. Balkan (1970) administered several tests of nonverbal abilities that purportedly measured cognitive flexibility. The bilingual group, as expected, performed significantly higher than the control monolingual group in two of these measures. One task, Figures Cachees, similar to the familiar Embedded Figures Test, involved the ability to reorganize a perceptual situation. The other task, Histoires, involved sensitivity to the different meanings of a word. Interestingly, the positive effects of bilingualism on these measures were much stronger for children who had become bilingual before the age of 4. The differences between monolinguals and children who had become bilingual at a later age were in favor of the latter but did not reach statistical significance. Balkan’s study implies, as earlier linguistic studies had suggested, that bilingualism might have the most beneficial cognitive effects for those children who learn their two languages simultaneously. 
However, to consider bilinguals’ superior performance on these very different cognitive tasks a sign of their cognitive flexibility might be stretching things too far. On one hand, because balanced bilinguals have two different words for most referents, it is not surprising that they show a greater sensitivity than monolinguals to the possible different meanings of a single word, as shown in the Histoires task. On the other hand, Balkan’s study offers no clue as to how or why bilingualism should contribute to a greater ability to reorganize and reconstruct perceptual arrays, as shown in the Figures Cachees task. As Peal and Lambert’s (1962) conclusions suggest, the clue might be in bilinguals’ tendency to use verbal mediation when performing these visual-spatial tasks. Ben-Zeev’s (1977b) study with Hebrew-English bilingual children provides further evidence of bilinguals’ so-called cognitive flexibility. When compared to monolinguals, the bilingual children in this study showed a marked superiority in symbol substitution and verbal transformation tasks. The symbol substitution task involved children’s ability to substitute words in a sentence according to the experimenter’s instructions. In a typical instance, children were asked to substitute the word “I” with the word “spaghetti.”
Children were given correct scores when they were able to say sentences like “Spaghetti am cold,” rather than “Spaghetti is cold,” or a similar sentence that, although grammatically correct, violated the rules of the game. The verbal transformation task involved the detection of changes in a spoken stimulus that is repeated continuously by a tape loop. Warren and Warren (1966) reported that when a spoken stimulus is presented in such a way, subjects older than 6 years report hearing frequent changes in what the taped voice says. The authors attributed this illusion to the development of a reorganization mechanism that aids the perception of ongoing speech. The bilingual children in Ben-Zeev’s study also outperformed the monolingual group on certain aspects of a matrix transposition task; bilinguals were better at isolating and specifying the underlying dimensions of the matrix. No group differences were found, however, on the rearrangement of figures in the matrix. The two comparison groups also performed similarly on the Raven’s Progressive Matrices. It should be noted that the bilinguals in Ben-Zeev’s study showed cognitive advantages only in measures that were directly related to linguistic ability and on the verbal aspects of the matrix transformation task. Ben-Zeev (1977b) noted that throughout the study bilingual children seemed to approach the cognitive tasks in a truly analytic way. They also seemed more attentive to both the structure and details of the tasks administered, as well as more sensitive to feedback from the tasks and the experimenter. Ben-Zeev explained these improved abilities in terms of bilinguals’ confrontation with their two languages. She argued that to avoid linguistic interference, bilinguals must develop a keen awareness of the structural similarities and differences between their two languages as well as a special sensitivity to linguistic feedback from the environment. 
Supposedly, this more developed analytic strategy toward linguistic structures is transferred to other structures and patterns associated with different cognitive tasks. Ben-Zeev summarized her results as follows:
Two strategies characterized the thinking patterns of the bilinguals in relation to verbal material: readiness to impute structure and readiness to reorganize. The patterns they seek are primarily linguistic, but this process also operates with visual patterns, as in their aptness at isolating the dimensions of a matrix. With visual material the spatial reorganizational skill did not appear, however. (p. 1017)
In conclusion, the nature or meaning of cognitive flexibility is far from being understood; the studies just reviewed, however, suggest that the flexibility noted in bilinguals could stem from language-related abilities such as a precocious use of verbal mediation in solving nonverbal tasks or an early awareness of the conventionality and structural properties of language. The next section will review in greater detail the linguistic and metalinguistic abilities that have been related empirically to the bilingual experience.
Linguistic and Metalinguistic Abilities
As mentioned earlier, linguists’ case studies (Leopold, 1961; Ronjat, 1913) concluded that early bilingualism was advantageous to children’s cognitive and linguistic development. In particular, Leopold suggested that bilingualism promoted an early separation of the word sound from the word meaning, “a noticeable looseness of the link between the phonetic word and its meaning” (1961, p. 358). Furthermore, Leopold postulated a fascinating connection between the semantic and cognitive development of bilingual children; namely, the separation of sound and meaning leads to an early awareness of the conventionality of words and the arbitrariness of language. This awareness could promote, in turn, more abstract levels of thinking. Vygotsky (1935/1975) saw the cognitive advantages of bilingualism along the same lines; in his own words, bilingualism frees the mind “from the prison of concrete language and phenomena” (as cited in Cummins, 1976, p. 34).
Leopold’s observations were tested empirically by Ianco-Worrall (1972) in a remarkably well-designed and controlled study of English-Afrikaans bilingual children in South Africa. The bilingual sample consisted of nursery school children who had been raised in a one-person, one-language environment, similar to the situation of Leopold’s daughter Hildegard. The sample’s degree of bilingualism was determined by several measures, including detailed interviews with parents and teachers as well as a direct test of the children’s vocabulary in both languages. Two comparable monolingual samples, one English and one Afrikaans, were included in the study. In a first experiment, children were administered a semantic-phonetic preferences test. The test consisted of eight sets of three words. A typical set was the words cap, can, and hat. Children were asked questions such as: Which word is more like cap, can or hat? 
Choosing the word can or hat was an indication of the child’s phonetic or semantic preference in analyzing word similarities. The capacity to compare words on the basis of a semantic dimension is regarded as more advanced developmentally than comparing words along a phonetic dimension. The results of Ianco-Worrall’s (1972) experiment showed not only that semantic preferences increased with age, but also that bilinguals outranked monolinguals in choosing words along a semantic rather than a phonetic dimension. As Ianco-Worrall reported, “of the young 4–6 year old bilinguals, 54% consistently chose to interpret similarity between words in terms of the semantic dimension. Of the unilingual groups of the same age, not one Afrikaans speaker and only one English speaker showed similar choice behavior” (p. 1398). Ianco-Worrall concluded that bilingual children who are raised in a one-person, one-language environment reach a stage of semantic development 2 to 3 years earlier than monolingual children. In a second experiment, using Vygotsky’s (1962) interviewing techniques, Ianco-Worrall (1972) asked her subjects to explain the names of different
things (e.g., why is a dog called dog?). She also asked children whether or not names of things could be arbitrarily interchanged. For the first question, children’s responses were assigned to different categories, such as perceptible attributes, functional attributes, social convention, and so forth. The results of this experiment, however, showed no reliable differences between bilingual and monolingual children in the types of explanations offered. For the second question, the differences favored the bilingual children; bilinguals replied that names of objects could in principle be changed, while the opposite was true for monolingual children. As part of the same experiment, Ianco-Worrall played a “game” with her young subjects where the names of objects were actually changed. She then asked questions about the qualities and properties of the newly named objects. For example, “Let us call a dog, cow. Does this cow have horns? Does this cow give milk?” (pp. 1394–1395). The results indicated that there was no difference between bilinguals and monolinguals in their capacity to separate in play the qualities of objects from their names. In the study just described, bilinguals exceeded monolinguals in their capacity to analyze the similarity of words along semantic rather than acoustic dimensions. Also, bilingual children seemed more aware than monolinguals of the conventional nature of words and language. This awareness or flexibility with respect to the use of language was also evident in bilinguals’ responses to Ben-Zeev’s (1977b) symbol substitution task, mentioned above. In another study (Feldman & Shen, 1971), bilingual 5-year-olds were better than their monolingual peers at relabeling objects and expressing relations between objects in simple sentences. Further evidence of the positive effects of bilingualism on verbal and linguistic abilities can be found in the work of Casserly and Edwards (Note 3) and in the reports of the St. 
Lambert experimental bilingual project in Canada (Lambert & Tucker, 1972; Lambert, Tucker, & D’Anglejan, 1973). Casserly and Edwards reported that first- through third-grade children in bilingual programs showed definite advantages on several psycholinguistic measures when compared to children attending regular school programs. By the same token, bilingual children in the St. Lambert project outperformed monolinguals when tested on verbal tests of intelligence. Several investigators have explored the effects of bilingualism on the development of metalinguistic awareness. Metalinguistic awareness refers to the ability to analyze objectively linguistic output, that is, “to look at language rather than through it to the intended meaning” (Cummins, 1978, p. 127). Indeed, as children develop, they become more capable of looking at language as an objective set of rules, an objective tool for communication. Because bilingualism induces an early separation of word and referent, it is possible that bilingual children also develop an early capacity to focus on and analyze the structural properties of language. Vygotsky (1935/1975, 1962) suggested that because bilinguals could express the same thought in
different languages, a bilingual child would tend to “see his language as one particular system among many, to view its phenomena under more general categories, and this leads to an awareness of his linguistic operations” (1962, p. 110). Similarly, Ben-Zeev (1977b) hypothesized that bilinguals develop an analytic strategy toward language to fight interference between their two languages. Lambert and Tucker (1972) noted that children in the St. Lambert bilingual experiment engaged in some sort of “contrastive linguistics” by comparing similarities and differences between their two languages. Cummins (1978) investigated the metalinguistic development of third- and sixth-grade Irish-English bilinguals. Children in the sample came from homes where both Irish and English were spoken; all children received formal school instruction in Irish. An appropriate monolingual comparison group was selected that was equivalent to the bilingual group on measures of IQ and socioeconomic status. A first task investigated children’s awareness of the arbitrariness of language. Similar to the measure used by Ianco-Worrall (1972), children were asked whether names of objects could be interchanged; children were then asked to explain or justify their responses. The results indicated that at both third- and sixth-grade levels bilinguals showed a greater awareness of the arbitrary nature of linguistic reference. In a second task, children were presented with several contradictory and tautological sentences about some poker chips that were either in view of the child or hidden. The sentences varied in two additional dimensions: true versus false and empirical versus nonempirical. Nonempirical statements refer to sentences that “are true or false by virtue of their linguistic form rather than deriving their truth value from any extra-linguistic state of affairs” (p. 129). 
The task was chosen as a measure of metalinguistic awareness because previous research had shown that to correctly evaluate contradictions and tautologies, it is necessary to examine language objectively. Although the results for this measure were not clear-cut in favor of the bilinguals, sixth-grade bilingual children showed a marked superiority in correctly evaluating hidden nonempirical sentences. The monolinguals “analyzed linguistic input less closely, being more content to give the obvious ‘can’t tell’ response to the hidden nonempirical items” (p. 133). In a second experiment with balanced Ukrainian-English bilinguals, Cummins (1978) investigated children’s metalinguistic awareness using a wide variety of measures including analysis of ambiguous sentences and a class inclusion task. Contrary to previous findings, the bilinguals in this study did not show advantages on the Semantic-Phonetic Preference Test or on the arbitrariness of language task. However, “the results of the Class Inclusion and Ambiguities tasks are consistent with previous findings in that they suggest that bilingualism promotes an analytic orientation to linguistic input” (p. 135). Diaz and Hakuta (Note 1) investigated two different types of metalinguistic awareness; namely, bilingual children’s awareness of grammatical errors
in their first language and their ability to perceive their two languages as two independent and different language systems. In this study, a group of Spanish-English balanced-bilingual children were compared to a group of Spanish-speaking children who were just beginning to learn English as a second language at school; therefore, the comparison group could be considered relatively monolingual children who were at the beginning stages of second-language learning. The two groups of children were equivalent in their Spanish ability, lived in the same neighborhoods, and attended the same kindergarten and first-grade bilingual classes. The metalinguistic awareness tasks consisted of eight ungrammatical Spanish sentences and eight Spanish sentences with one English word in each (e.g., La teacher está en la clase or El dog es grande); several correct Spanish sentences were intermixed within each set of wrong sentences. For the first set of sentences, children were asked to give a correct or grammatical version of the sentences presented. The results showed no differences between the two groups of children in their ability to detect grammatical errors in their native language. However, balanced bilinguals showed a greater ability to make grammatical corrections and to detect confusions between their two languages. Contrary to the popular belief that early bilingualism causes confusion and interference between the two languages, the balanced-bilingual children in this study showed an awareness of the independence and proper separate usage of their two languages.
Concept Formation
By far, the most detailed descriptions of concept formation in childhood are those by Jean Piaget. His theory of cognitive development emphasizes the importance of four different factors in the development of intelligence: maturation, experience, social interaction, and equilibration (Flavell, 1963). Although Piaget’s theory implies the existence of stages with a universal invariant sequence in development, his interactionist formulations allow for the role of experience and social interaction in the acceleration or retardation of different cognitive abilities. Using a Piagetian theoretical framework, and capitalizing on the fact that bilinguals are exposed to a unique and complex “two worlds of experience,” Liedtke and Nelson (1968) investigated differences between bilinguals and monolinguals on a concept formation task. Based on tasks similar to those used by Piaget, Inhelder, and Szeminska (1960), Liedtke and Nelson (1968) constructed a test on concepts of linear measurement. The test measured six different aspects of linear measurement: (a) reconstructing relations of distance, (b) conservation of length, (c) conservation of length with change of position, (d) conservation of length with distortion of shape, (e) measurement of length, and (f) subdividing a straight line. The test was administered to English-French bilingual and
English monolingual first-grade children in Canada. The bilingual sample consisted of children who were exposed to the two languages at home; that is, simultaneous learners of the languages. The monolingual subjects came from monolingual homes and had no functional knowledge of a second language. Subjects’ IQs, socioeconomic status, as well as a measure of their kindergarten attendance, were carefully controlled. Subtests a to d yielded a measure of children’s ability to conserve length, while subtests e and f yielded a measure of children’s ability to measure length. On both measures, bilinguals performed significantly better than their monolingual counterparts. After such strict experimental controls, the results were clearly in favor of the bilingual children; so much so, in fact, that the authors were carried away in their enthusiasm for bilingual education:
If bilingualism increases intellectual potential and is beneficial to concept formation [as the study shows], then a second language should be introduced during the early years when experience and environmental factors are most effective in contributing to the development of intelligence. (p. 231)
In a modest attempt to reconcile Piaget’s and Vygotsky’s conceptions of thought and language, Bain (1974) examined the effects of bilingualism on “discovery learning” tasks (see Gagne & Brown, 1961, for a detailed description of such tasks). The paradigm of Bain’s study was to discover the rules that lead to the solution of linear numerical problems such as:
A. 1, 3, 7, 15, 31, _____
B. 1/2, 1, 1 1/2, 2, 2 1/2, 3, _____
C. 1, 2, 4, 8, 16, 32, _____
Children were presented with two sets of items on 2 different days. On the second day of testing, children were told to “use the rules that you learnt last day to help you solve the problems” (p. 123). The task was chosen because it involved the ability to discover a rule and then use the rule to deduce a certain outcome. Also, a second round of testing with similar items demanded transferring the newly derived rules to novel situations. In Piagetian terms, the task involved concept formation abilities such as classification and generalization of rules. Bain’s sample consisted of French-English balanced bilinguals and a control group of monolingual English speakers. Besides controlling for group differences in variables such as IQ, socioeconomic status, and school grades, Bain controlled for his sample’s developmental level of operations. Over a 1-week period, he administered conservation tasks to both bilingual and monolingual children and selected only subjects whose explanations for conserving mass, weight, and volume placed them at the concrete-operational level of thought. Bain’s research question could then be reformulated as follows: Do differences in linguistic experience (bilingual vs. monolingual) affect the cognitive performance
of children who are at similar levels of cognitive development? According to Bain, if concrete operational bilingual children perform better than comparable monolinguals on tasks requiring formal operations, then one could conclude that linguistic experiences do indeed affect the development of cognitive structures, and therefore Vygotsky’s position would be supported. Before the test was administered, children were asked to proceed as fast as they could, but to complete one item before going to the next. Two measures of response latency were taken: discovery time, the time it took subjects to complete the first set of items; and transfer time, the time it took to complete the second set of items at a later date. Bilinguals completed the first set of items approximately 8 minutes earlier than their monolingual peers (discovery time = 31.25 minutes for bilinguals vs. 39.48 minutes for monolinguals). The difference, however, failed to reach statistical significance (p = .17). There were no substantial group differences on the transfer time measure. Unfortunately, the results of this experiment are difficult to interpret for two reasons. First, the sample was rather small, including only 20 children, 10 subjects in each comparison group. Second, Bain does not report whether children responded to the items correctly. Without this information, a faster discovery time could also mean that bilinguals were more impulsive, that is, faster than their monolingual peers at the expense of accurate performance. Nevertheless, assuming that Bain’s (1974) findings are valid, and taken together with Liedtke and Nelson’s (1968) results, it seems that balanced bilinguals do enjoy some advantages over monolinguals in concept-formation abilities. In summary, bilinguals demonstrate a greater grasp of linear measurement concepts and a greater facility to discover additive rules in a string of numbers than their monolingual counterparts.
More important, the findings from the experiments reviewed in this section give modest support to Vygotsky’s contention that language influences the development of new cognitive structures.
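To make the rule-discovery demand of Bain’s number series concrete, the following is a minimal sketch in Python. It is not Bain’s procedure or scoring method; the function name next_term and the three candidate rules it tries (constant difference, constant ratio, and “double and add a constant”) are assumptions introduced here purely for illustration, chosen because they happen to fit the three sample items quoted above.

```python
from fractions import Fraction


def next_term(series):
    """Guess the next term of a series by trying three simple rules in order.

    Hypothetical helper for illustration only: the rule set is an assumption,
    not a description of how Bain (1974) scored or analyzed responses.
    """
    terms = [Fraction(t) for t in series]
    pairs = list(zip(terms, terms[1:]))

    # Rule 1: constant difference (e.g., add 1/2 each step).
    diffs = {b - a for a, b in pairs}
    if len(diffs) == 1:
        return terms[-1] + diffs.pop()

    # Rule 2: constant ratio (e.g., double each step).
    if all(t != 0 for t in terms):
        ratios = {b / a for a, b in pairs}
        if len(ratios) == 1:
            return terms[-1] * ratios.pop()

    # Rule 3: double the previous term and add a constant.
    offsets = {b - 2 * a for a, b in pairs}
    if len(offsets) == 1:
        return 2 * terms[-1] + offsets.pop()

    return None  # none of the assumed rules fits


half = Fraction(1, 2)
item_a = next_term([1, 3, 7, 15, 31])
item_b = next_term([half, 1, 1 + half, 2, 2 + half, 3])
item_c = next_term([1, 2, 4, 8, 16, 32])
print(item_a, item_b, item_c)
```

Under these assumed rules, item A continues with 63, item B with 3 1/2, and item C with 64. The sketch also shows why discovery time alone is ambiguous: a child, like this function, can finish quickly either by finding the right rule or by committing early to a wrong one, which is why accuracy data would be needed to interpret Bain’s latencies.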
Divergent Thinking Skills and Creativity
With few exceptions, the majority of studies that have investigated the relationship between bilingualism and creative abilities have used the Torrance Tests of Creative Thinking (Torrance, 1966a, 1966b) as their dependent variable. Although different definitions of creativity are available (see, e.g., Rothenberg & Hausman, 1976), it is no surprise that researchers interested in the effects of bilingualism chose Torrance’s formulations as their conceptual framework. For Torrance, creativity is closely identified with divergent productions and transformations, with the ability to take different perspectives and different approaches to a given problem. Moreover, Torrance strongly believes that creativity can be trained and that it is, therefore, vulnerable to
the influence of cultural factors. In fact, so close were his ideas of creativity to the abilities affected by bilingualism, that Torrance himself conducted a large-scale study comparing the creative functioning of bilingual and monolingual children in Singapore (Torrance, Wu, Gowan, & Aliotti, 1970). Influenced by Guilford’s “Structure of the Intellect” model and his concern regarding the measurement of thinking abilities involved in creativity (Guilford, 1967), Torrance developed tests that measured fluency, flexibility, originality, and elaboration, involving both verbal and visual stimuli. Although a detailed description of these abilities is beyond our purposes here, a brief outline of Torrance’s tests is called for to better understand and interpret the results of the studies to be reviewed. Figural Form A of the test consists of three 10-minute tasks: Picture Construction, Picture Completion, and Repeated Figures (Parallel Lines). The “ideational” form of the test involves verbal stimuli and ideas rather than figures. Figural flexibility, for example, would be a measure of the different patterns that a child can create using the same set of lines. Fluency (figural or ideational) refers to the number of associations to a given stimulus expressed in a given amount of time. Usually, six measures can be derived from children’s performance on these tests: verbal fluency, flexibility, and originality, as well as figural fluency, flexibility, and originality. A measure of elaboration can also be derived from these tests. However, the criteria for scoring elaboration are not too clear, and investigators shy away from such a measure. Postulating both possible positive and negative effects of bilingualism on creative functioning, Torrance et al. (1970) tested 1,063 third- to fifth-grade bilingual and monolingual children in Singapore. The bilingual sample included Chinese-English and Malayan-English speaking children.
Torrance and his coworkers hypothesized, on one hand, that bilingualism could have negative effects on fluency and flexibility skills. They believed that bilingualism fostered a competition of associations: that is, older associations could compete with the assimilation of new associations, a kind of “negative transfer” between the two languages. In their words,
When a child reared during his early years in a particular culture learns to speak the language common with that culture, and then enters a school where instruction is in a different language and the practices and ways of thinking of a different culture predominate, one has a good example of this negative transfer. (p. 72)
On the other hand, Torrance et al. expected a positive correlation between bilingualism and originality. They argued that the competition between the two languages, between old and new associations, should facilitate originality, especially if originality was assessed independently of fluency. As expected, the results of the study showed that monolinguals surpassed bilinguals on both measures of fluency and flexibility. In addition, as the authors hypothesized, bilinguals scored higher than monolinguals on both originality and
elaboration. However, the group differences in originality, though obviously in favor of the bilinguals, failed to reach statistical significance. The results of the study just described must be evaluated with a great deal of caution. First, there were no measures of relevant variables such as IQ, socioeconomic status, or children’s actual knowledge of the two languages to ensure that the two groups differed only in the bilingual versus monolingual dimension. Second, the authors do not specify what criteria they used to include children in the bilingual sample. It should be noted that the bilingual children in this study attended Malaysian-, Chinese-, or English-speaking schools. The children were not attending bilingual education programs where both languages are maintained and equally developed. It is most likely that the sample consisted of semilingual rather than bilingual children; that is, children whose native language was being gradually replaced by exposure and formal instruction in a second language. In fact, the situation of linguistic interference and negative transfer that Torrance and his coworkers described is a more accurate description of semilingualism than of genuine bilingualism. And third, one must be a bit skeptical about the construct “creative functioning” when there is so little relationship between subtests that purportedly measure creativity, especially when trends in subtest performance are so distinctly reversed within the same group of children. In a somewhat better controlled study, Landry (1974) examined the creative abilities of children who were learning a foreign language in elementary school. Landry compared children who attended both Foreign Language in the Elementary School (FLES) and regular school programs.
To study the effectiveness of the FLES program in promoting creative abilities, Landry eliminated from the sample those children who had a bilingual home background; he tested both first and third graders, monolinguals and second-language learners. As expected, there were no differences between the FLES and non-FLES first graders; Landry explained this finding in terms of first graders’ limited exposure to the second language. By the third grade, however, children learning a second language showed significant advantages on all measures of the Torrance test. Stretching the notion of cognitive flexibility a bit too far, Landry concluded that the flexibility produced by learning a second language was conducive to both divergent thinking and originality.
Cognitive Style
Several investigators have been interested in the influence of bilingualism on children’s cognitive style (cf. Duncan & DeAvila, 1979; Ramirez, Castaneda, & Herold, 1974; Ramirez & Price-Williams, 1974). Cognitive style usually refers to “individual variations in modes of perceiving, remembering, and thinking, or as distinctive ways of apprehending, sorting, remembering, transforming and utilizing information” (Kogan, 1971, as cited in Duncan & DeAvila, 1979, p. 21). Involved in the conceptualization of cognitive style is the notion that
there is diversity in cognitive performance; diversity, however, is regarded as value-neutral, with no implications of better or worse, bright or dull. Witkin and Goodenough (1977), for example, stress that each pole of the field dependence/independence cognitive styles has adaptive characteristics. It is not surprising, therefore, that minority researchers have made efforts to understand the effects of bilingualism on cognitive style and have advocated value-neutral formulations of cognitive performance. Among the many possible dimensions of cognitive style, field dependence/independence has been the most widely studied. Although measures of field dependence/independence are usually simple and straightforward, such as subjects’ performance on the familiar Embedded Figures Test, there are almost as many definitions of this construct as there are investigators in the field. Field independence, for example, usually refers to a measure of a subject’s ability to overcome the effects of a visually distracting background. Nevertheless, field independence has also been conceptualized as a personality characteristic of assertiveness, as a cognitive restructuring competency, and as an intellectual and perceptual segregation of the “me” and “not me” (Witkin & Goodenough, 1977; see also Cazden & Leggett, 1981; Duncan & DeAvila, 1979, for reviews of the pertinent literature). With this warning in mind, let us review the major formulations and empirical findings on the effects of bilingualism on field-dependent and field-independent cognitive styles. Ramirez (1973) argued that achievement and success in U.S. mainstream education are associated with characteristics of the field-independent person. He further claimed that the academic failure of Mexican-American children can be attributed mainly to the predominantly field-dependent cognitive style of these children.
Some studies (Buriel, 1975; Sanders, Scholz, & Kagan, 1976) have shown, indeed, that Mexican-American children tend to be more field dependent than their Anglo-American counterparts according to their performance on the Portable Rod and Frame Test. To emphasize the positive cognitive and social aspects of this style, Ramirez and Castaneda (1974) replaced the term “field dependence” with “field sensitivity.” In the social sphere, for example, field dependence is associated with more sensitivity to social feedback and a more developed repertoire of interpersonal behaviors. Following the same line of thought, Ramirez and his coworkers suggested that cognitive style varies with the degree of assimilation to the mainstream culture. Furthermore, they suggested that speaking two languages and belonging to two cultures fosters some kind of “bicognitivity”; that is, “in the same way that the bilingual child switches language codes in response to the demand characteristics of the socio-linguistic situation, so the bicognitive child switches cognitive styles as demanded” (Duncan & DeAvila, 1979, p. 25). Although these are fascinating theoretical formulations relating bilingualism to cognitive styles, the empirical evidence is rather weak and not convincing. First, the findings are not consistent across studies; in contrast to studies using the Portable Rod and Frame Test, some studies using the Children’s
Embedded Figures Test (CEFT) did not find significant differences between bilinguals’ and monolinguals’ cognitive styles. In fact, when reviewing such studies, Kagan and Buriel (1977) argued that at this time it is meaningless to describe Mexican-American children as more field dependent than their Anglo-American peers. Second, most of these studies have not measured children’s language proficiency in either English or Spanish, so it is difficult to sort out the influence of linguistic variables from the effects of other cultural and socioeconomic variables on cognitive style differences found so far. To the best of my knowledge, only one study has looked at the relation between bilingualism and field dependence/independence, carefully controlling for the sample’s actual degree of bilingualism. Using the Language Assessment Scale, Duncan and DeAvila (1979) assessed the relative linguistic proficiency in English and Spanish in four groups of children of Hispanic background in grades one and three. The sample included urban and rural Mexican Americans, Puerto Ricans, and Cuban Americans. Through performance on the Language Assessment Scale, and according to their relative proficiency in English and Spanish, children were classified into five groups ranging from late language learners (poor in both languages) to proficient bilinguals. Of course, the sample included monolinguals of both languages. Field dependence/independence was assessed through two different measures: the CEFT and the Draw a Person Test (DAP). The results of the study showed that proficient bilingual children outperformed the monolingual children on both the CEFT and the DAP test. Proficient (i.e., balanced) bilingual children showed more advanced skills at perceptual disembedding and produced the most articulate or “field-independent” drawings. The investigators also found a positive linear relationship between degree of relative language proficiency in English and Spanish and field independence.
It should be noted that in this study children who had not yet achieved an adequate balance between their two languages, that is, the partial and limited bilinguals, performed similarly to the monolingual group; there was no evidence of negative cognitive effects as a result of exposure to a second language. The authors concluded that their results support Cummins’ (1976) threshold hypothesis, namely, that a certain level of proficiency in both languages must be obtained before bilingualism can show its positive effects on cognitive variables.
Summary and Conclusion
During the last two decades, many studies have presented evidence showing a positive influence of bilingualism on children’s cognitive and linguistic abilities. When compared to monolinguals, balanced bilingual children show definite advantages on measures of metalinguistic abilities, concept formation, field independence, and divergent thinking skills. Although the cognitive
advantages of bilingual children have been explained in several ways, the empirical literature gives most support to the “objectification” hypothesis (see Cummins, 1977) that bilingualism accelerates cognitive development by fostering an early awareness of the objective and structural properties of language. Indeed, as several studies have shown, bilingual children demonstrate a keen awareness of the arbitrariness of language, as well as an early capacity to focus on linguistic structure and detail. Nevertheless, presently there are not enough data or adequate cause-effect analyses to accept the objectification hypothesis without further doubt. Generally, present-day investigators have been able to master most of the methodological difficulties encountered by early studies in the field. However, to adequately conclude this review, two major gaps in current research should be brought to the surface. First, current research fails to look at the effects of bilingualism on nonbalanced bilinguals, that is, on children who have disparate abilities in the two languages. Many children who attend bilingual education programs in this country come to school with little or no knowledge of English. These children are nonbalanced bilinguals for a good number of years, and little is known about the immediate cognitive effects of gradually learning a second language during the early years of schooling. Moreover, it is not clear what percentage of these children will actually attain a reasonable degree of balance between their two languages to ensure a positive effect of their bilingual experience. It is important to note that when one looks at balanced bilinguals only, one necessarily excludes children who are limited in their second-language proficiency for historical or environmental reasons; that is, children who recently have been exposed to a second language and are therefore at initial stages of second-language learning.
At beginning stages of second-language learning, children must learn not only a new and different vocabulary, but also different syntactic rules and linguistic constraints. It is likely that during initial stages of second-language learning, children must exercise a great deal of cognitive effort to produce grammatically correct utterances. Once basic rules and linguistic constraints are mastered, second-language development proceeds through the less painful process of learning new vocabulary and idioms. Although there are no empirical data to support the above statements, one could logically hypothesize that initial stages of second-language learning will produce the most dramatic, and perhaps negative, cognitive effects on young second-language learners. These effects must be temporary for those children who will develop toward balanced bilingualism; the issue, however, is still an open empirical question. A second gap in current research is that, to the best of my knowledge, there are no information-processing studies of young bilingual children. Because studies in this area have relied mostly on data from psychometric tests, it is not clear what cognitive processes or processing strategies, if any,
truly differentiate bilingual from monolingual children. Most psychometric tests of ability really measure a whole array of different cognitive abilities and tap a wide variety of different processes. It is almost impossible, with our present knowledge, to develop a process model of how bilingualism affects children’s cognitive abilities or accelerates cognitive development. The development (and empirical support) of a detailed model relating bilingualism and cognitive development is still a few years ahead.
Reference Notes
1. Diaz, R. M., & Hakuta, K. Bilingualism and cognitive development: A comparison of balanced and non-balanced bilinguals. Paper presented at the meeting of the Society for Research in Child Development, Boston, April 1981.
2. Skutnabb-Kangas, T. Bilingualism, semilingualism and school achievement. Paper presented at the Fourth International Conference of Applied Linguistics, Stuttgart, August 1975.
3. Casserly, S. M., & Edwards, H. P. Detrimental effects of grade one bilingual programs: An exploratory study. Paper presented at the annual conference of the Canadian Psychological Association, Victoria, June 1973.
References
Ahmed, M. A. S. Mental manipulation. Egyptian Yearbook of Psychology, 1954, 1, 23–88.
American Institutes for Research (AIR). Evaluation of the impact of ESEA Title VII Spanish/English Bilingual Education Program. Palo Alto, Calif.: Author, 1977.
Anastasi, A. Psychological testing (2nd ed.). New York: Macmillan, 1961.
Arsenian, S. Bilingualism and mental development. New York: Columbia University Press, 1937.
Bain, B. Bilingualism and cognition: Toward a general theory. In S. T. Carey (Ed.), Bilingualism, biculturalism, and education: Proceedings from the Conference at College Universitaire Saint Jean. Edmonton: The University of Alberta, 1974.
Balkan, L. Les effets du bilinguisme francais-anglais sur les aptitudes intellectuelles. Bruxelles: Aimav, 1970.
Barke, E. M., & Perry-Williams, D. E. A further study of the comparative intelligence of children in certain bilingual and monoglot schools in South Wales. British Journal of Educational Psychology, 1938, 8, 63–77.
Ben-Zeev, S. Mechanisms by which childhood bilingualism affects understanding of language and cognitive structures. In P. A. Hornby (Ed.), Bilingualism: Psychological, social, and educational implications. New York: Academic Press, 1977. (a)
Ben-Zeev, S. The influence of bilingualism on cognitive strategy and cognitive development. Child Development, 1977, 48, 1009–1018. (b)
Ben-Zeev, S. The effects of bilingualism in children from Spanish-English low economic neighborhoods on cognitive development and cognitive strategy. Working Papers on Bilingualism, 1976, 9, 83–122.
Blanco, G. The education perspective. In Bilingual education: Current perspectives (Vol. 4). Arlington, Va.: Center for Applied Linguistics, 1977.
Brunner, E. D. Immigrant farmers and their children. New York: Doubleday, Doran, & Co., 1929.
Buriel, R. Cognitive styles among three generations of Mexican-American children. Journal of Cross-Cultural Psychology, 1975, 6, 417–429.
Carrow, S. M. A. Linguistic functioning of bilingual and monolingual children. Journal of Speech and Hearing Disorders, 1957, 22, 371–380.
Cazden, C. B., & Leggett, E. L. Culturally responsive education: A discussion of Lau Remedies II. In H. T. Trueba, G. P. Guthrie, & K. H. Au (Eds.), Culture and the bilingual classroom: Studies in classroom ethnography. Rowley, Mass.: Newbury House, 1981.
Cummins, J. The influence of bilingualism on cognitive growth: A synthesis of research findings and explanatory hypotheses. Working Papers on Bilingualism, 1976, 9, 1–43.
Cummins, J. Metalinguistic development of children in bilingual education programs: Data from Irish and Canadian Ukrainian-English programs. In M. Paradis (Ed.), The Fourth LACUS Forum 1977. Columbia, S.C.: Hornbeam Press, 1978.
Cummins, J. Cognitive factors associated with the attainment of intermediate levels of bilingual skill. Modern Language Journal, 1977, 61, 3–12.
Cummins, J., & Gulutsan, M. Bilingual education and cognition. Alberta Journal of Educational Research, 1974, 20, 259–269.
Darcy, N. T. Bilingualism and the measurement of intelligence: Review of a decade of research. Journal of Genetic Psychology, 1963, 103, 259–282.
Darcy, N. T. A review of the literature on the effects of bilingualism upon the measurement of intelligence. Journal of Genetic Psychology, 1953, 82, 21–57.
Duncan, S. E., & DeAvila, E. A. Bilingualism and cognition: Some recent findings. NABE Journal, 1979, 4, 15–50.
Epstein, I. La pensee et la poligloise. Lausanne: Libraire Payot, 1905.
Evans, S. J. Address of the Conference of Headmasters of Grammar Schools, Wales, 1906. In Central Advisory Council for Education (Wales), The Place of Welsh and English in the Schools of Wales. London: Her Majesty’s Stationery Office, 1953.
Feldman, C., & Shen, M. Some language-related cognitive advantages of bilingual five-year-olds. Journal of Genetic Psychology, 1971, 118, 235–244.
Fishman, J. A. Review of Bilingualism and primary education by J. Macnamara. Irish Journal of Education, 1967, 1, 79–83.
Fishman, J. A. The social science perspective. In Bilingual Education: Current Perspectives (Vol. 1). Arlington, Va.: Center for Applied Linguistics, 1977.
Flavell, J. H. The developmental psychology of Jean Piaget. Princeton, N.J.: Van Nostrand, 1963.
Fritz, R. A., & Romkin, N. R. The English handicap of junior high school pupils from foreign speaking homes, and remedial suggestions. Journal of Educational Research, 1934, 27, 412–421.
Fukuda, T. A survey of the intelligence and environment of school children. American Journal of Psychology, 1925, 36, 124–139.
Gagne, M., & Brown, L. Some factors in the programming of conceptual learning. Journal of Experimental Psychology, 1961, 62, 313–321.
Grabo, R. P. A study of comparative vocabularies of junior high school pupils from English and Italian speaking homes. Bulletin No. 13. Washington, D.C.: U.S. Office of Education, 1931.
Hansegard, N. E. Tvasprakighet eller Halvsprakighet (Bilingualism or semilingualism). Stockholm: Aldurs/Bonniers, 1968.
Harris, C. W. An exploration of language skill patterns. Journal of Educational Psychology, 1948, 32, 351–364.
Hill, H. S. The effects of bilingualism on the measured intelligence of elementary school children of Italian parentage. Unpublished doctoral dissertation, Rutgers University, 1935.
Hunt, E. Quote the raven? Never more. In L. W. Gregg (Ed.), Knowledge and cognition. Hillsdale, N.J.: Lawrence Erlbaum Associates, 1974.
Ianco-Worrall, A. D. Bilingualism and cognitive development. Child Development, 1972, 43, 1390–1400.
Kagan, S., & Buriel, R. Field dependence-independence and Mexican-American culture and education. In J. L. Martinez, Jr. (Ed.), Chicano psychology. New York: Academic Press, 1977.
Kirby, J. R., & Das, J. P. Skills underlying the Coloured Progressive Matrices. The Alberta Journal of Educational Research, 1978, 24, 94–99.
Lambert, W. E., Tucker, G. R., & D’Anglejan, A. Cognitive and attitudinal consequences of bilingual schooling: The St. Lambert project through grade five. Journal of Educational Psychology, 1973, 65, 141–159.
Lambert, W. E., & Tucker, G. R. Bilingual education of children: The St. Lambert experiment. Rowley, Mass.: Newbury House, 1972.
Landry, R. G. A comparison of second language learners and monolinguals on divergent thinking tasks at the elementary school level. Modern Language Journal, 1974, 58, 10–15.
Lavoie, G., & Laurendau, M. Tests collectivs d’intelligence generale. Montreal: Institut de Recherches Psychologiques, 1960.
Lenneberg, E. H. Biological foundations of language. New York: Wiley, 1967.
Leopold, W. F. Speech development of a bilingual child: A linguist’s record (4 vols.). Evanston, Ill.: Northwestern University Press, 1939, 1947, 1949a, 1949b.
Leopold, W. F. Patterning in children’s language learning. In S. Saporta (Ed.), Psycholinguistics. New York: Holt, Rinehart, & Winston, 1961.
Liedtke, W. W., & Nelson, L. D. Concept formation and bilingualism. Alberta Journal of Educational Research, 1968, 14, 225–232.
Macnamara, J. The concession on Irish: Psychological aspects. Studies, 1964, 164–173.
Macnamara, J. Bilingualism and primary education. Edinburgh: Edinburgh University Press, 1966.
Manuel, H. T. A comparison of Spanish-speaking and English-speaking children in reading and arithmetic. Journal of Applied Psychology, 1935, 19, 189–201.
McCarthy, D. A. The language development of the pre-school child. Minneapolis: University of Minnesota Press, 1930.
McLaughlin, B. Second language acquisition in childhood. Hillsdale, N.J.: Lawrence Erlbaum Associates, 1978.
Morrison, J. R. Bilingualism: Some psychological aspects. Advanced Science, 1958, 56, 287–290.
Murdoch, K. A., Maddow, D., & Berg, N. L. A study of the relation between intelligence and the acquisition of English. The 27th yearbook of the National Society for the Study of Education (Part 1). Bloomington, Ill.: Public School Publishing, 1928.
O’Doherty, E. F. Bilingualism: Educational aspects. Advanced Science, 1958, 56, 282–286.
Paulston, C. B. Ethnic relations and bilingual education: Accounting for contradictory data. Working Papers on Bilingualism, 1975, 6, 1–44.
Paulston, C. B. Viewpoint: Research. In Bilingual education: Current perspectives (Vol. 2). Arlington, Va.: Center for Applied Linguistics, 1977.
Pavlovitch, M. Le langage enfantin: Acquisition du serbe et du francais par un enfant serbe. Paris: Champion, 1920.
Peal, E., & Lambert, W. The relation of bilingualism to intelligence. Psychological Monographs, 1962, 76(546), 1–23.
Penfield, W., & Roberts, L. Speech and brain mechanisms. Princeton, N.J.: Princeton University Press, 1959.
Piaget, J., Inhelder, B., & Szeminska, A. The child’s conception of geometry. New York: Basic Books, 1960.
Pifer, A. Bilingual education and the Hispanic challenge. The President’s (1979) Annual Report. New York: Carnegie Corporation of New York, 1980.
Ramirez, M. Cognitive styles and cultural democracy in education. Social Science Quarterly, 1973, 53, 895–904.
Ramirez, M., Castaneda, A., & Herold, P. L. The relationship of acculturation to cognitive style among Mexican Americans. Journal of Cross-Cultural Psychology, 1974, 5, 425–433.
Ramirez, M., & Castaneda, A. Cultural democracy, bicognitive development, and education. New York: Academic Press, 1974.
Salkind_Chapter 24.indd 187
9/4/2010 10:33:59 AM
188
Curriculum, Instruction and Learning
Ramirez, M., & Price-Williams, D. Cognitive styles in children: Two Mexican communities. International Journal of Psychology, 1974, 8, 93–100. Ronjat, J. Le development du language observe chez un enfant bilingue. Paris: Champion, 1913. Rothenberg, A., & Hausman, C. R. (Eds.). The creativity question. Durham, N.C.: Duke University Press, 1976. Saer, D. J. The effects of bilingualism on intelligence. British Journal of Psychology, 1923, 14, 25–38. Saer, H. Experimental inquiry into the education of bilingual peoples. In Education in a changing commonwealth. London: New Educational Fellowship, 1931. Sanchez, G. I. The implications of a basal vocabulary to the measurement of the abilities in bilingual children. Journal of Social Psychology, 1934, 5, 395–402. Sanders, M., Scholz, J. P., & Kagan, S. Three social motives and field independencedependence in Anglo American and Mexican American children. Journal of CrossCultural Psychology, 1976, 7, 451–462. Thurstone, L. L., & Thurstone, T. G. Primary mental abilities: Ages 7 to 11. Chicago: Science Research Associates, 1954. Torrance, E. P . Torrance tests of creative thinking: Directions manual and scoring guide. Figural Test Booklet A. Princeton, N.J.: Personnel Press, 1966. (a) Torrance, E. P . Torrance tests of creative thinking: Norms-technical manual (Research ed.). Princeton, NJ.: Personnel Press, 1966. (b) Torrance, E. P., Wu, J. J., Gowan, J. C, & Aliotti, N. C. Creative functioning of monolingual and bilingual children in Singapore. Journal of Educational Psychology, 1970, 61, 72–75. Tucker, G. R., & D’Anglejan, A. Some thoughts concerning bilingual education programs. Modern Language Journal, 1971, 55, 491– 493. Vygotsky, L. S. [Multilingualism in children] (M. Gulutsan & I. Arki, trans.). Edmonton: The University of Alberta, 1975. (The original essay appears in L. U. Zankov, Zh. I. Skif, & D. B. 
EI’konin [Eds.], Umstvennoe razvitie detei v protsse obucheviia, spornik statei [Mental development of children in the process of education, a collection of essays]. Moscow: State Pedagogical Publishing House, 1935.). Vygotsky, L. S. Thought and language. Boston: MIT Press, 1962. Warren, R. M., & Warren, R. P. A comparison of speech perception in childhood, maturity, and old age by means of the verbal transformation effect. Journal of Verbal Learning and Verbal Behavior, 1966, 5, 142–146. Witkin, H. A., & Goodenough, D. R. Field dependence revisited (Research Bulletin 77–16). Princeton, N.J.: Educational Testing Service, 1977. Yoshioka, J. G. A study of bilingualism. Journal of Genetic Psychology, 1929, 36, 473–479.
25 Components of a Psychology of Instruction: Toward a Science of Design Robert Glaser
It is a well-known historical fact that two major areas of scientific psychology, psychometrics and general experimental psychology, came out of different traditions and have developed in different ways. Psychometrics has become a major technological application of psychology, with primary effort being devoted to practical techniques and less effort to theoretical concerns. In contrast, the experimental psychology of learning and cognition has been almost exclusively a theoretical endeavor, with little effort devoted to application and the design of practical techniques for assisting in the conduct of human affairs. Although practical work has been carried out in educational psychology, industrial psychology, and human engineering, no integrated body of special technique of application has emerged. In recent years, however, there has been increasing interest in and social pressure for the development of professional techniques for the application of what knowledge there is of learning, cognitive processes, and human development. It appears that some linking of theory and practice needs to take place. It is of interest to note in this regard that John Dewey, in his presidential address before the American Psychological Association in 1899, expressed concern about developing a linking science between psychological theory and practical work. Dewey said the following:
Source: Review of Educational Research, 46(1) (1976): 1–24.
“Do we not lay a special linking science everywhere else between the theory and practical work? We have engineering between physics and the practical workingmen in the mills; we have a scientific medicine between the natural science and the physician.”1 The sentences suggest . . . that the real essence of the problem is found in . . . [a] connection between the two extreme terms – between the theorist and the practical worker – through the medium of the linking science. The decisive matter is the extent to which the ideas of the theorist actually project themselves, through the kind offices of the middleman, into the consciousness of the practitioner. It is the participation by the practical man in the theory, through the agency of the linking science, that determines at once the effectiveness of the work done, and the moral freedom and personal development of the one engaged in it. (1900, pp. 110–111)

It is the [teacher’s] inability to regard, upon occasion, both himself and the child as just objects working upon each other in specific ways that compels him to resort to purely arbitrary measures, to fall back upon mere routine traditions of school teaching, or to fly to the latest fad of pedagogical theorists – the latest panacea peddled out in school journals or teachers’ institutes – just as the old physician relied upon his magic formula. (pp. 112–113)
In this paper, my concern is similar to Dewey’s, and I would like to speculate on the nature of a “linking science” – a psychology of instruction – between the scientific knowledge of learning (including human cognition and development) and educational applications. As a further historical note, I refer to Edward L. Thorndike’s book, published in 1922, entitled The Psychology of Arithmetic. In the preface, Thorndike wrote as follows:

Within recent years there have been three lines of advance in psychology which are of notable significance for teaching. The first is the new point of view concerning the general process of learning. We now understand that learning is essentially the formation of connections or bonds between situations and responses, that the satisfyingness of the result is the chief force that forms them, and that habit rules in the realm of thought as truly and as fully as in the realm of action. The second is the great increase in knowledge of the amount, rate, and conditions of improvement in those organized groups or hierarchies of habits which we call abilities, such as ability to add or ability to read. Practice and improvement are no longer vague generalities, but concern changes which are definable and measurable by standard tests and scales. The third is the better understanding of the so-called “higher processes” of analysis, abstraction, the formation of general notions, and reasoning. The older view of a mental chemistry whereby sensations were compounded into percepts, percepts were duplicated by images, percepts and images were amalgamated into abstractions and concepts, and these were manipulated by reasoning, has given way to the understanding of the laws of response to elements or aspects of situations . . . . This book presents the applications of this newer dynamic psychology to the teaching of arithmetic. (pp. v–vi)
In this book, Thorndike applied his theory and findings about learning directly to the teaching process. The theory of stimulus-response bonds that made up complex chains of behavior was applied to the analysis of arithmetic tasks; the task of adding integers, for example, was carefully analyzed in terms of S-R bonds that could be taught and observed by the teacher. Thorndike also applied the results of his experimental work on transfer of training and reward in suggesting practical teaching techniques. He rejected the old notion of training general faculties and accepted the fact that training needed to be carried out in more specific contexts. He injected his notions of reinforcement by indicating that students should work on problems where, as a result of carrying out a successful response, a student could see the utility of his behavior. There is an important difference between Dewey and Thorndike, in terms of the publications I have cited, with respect to what it takes to translate science into practice. Dewey pressed for some kind of intermediate linking science. He conceived of a special structure that intervened between scientific theory and practical application. Thorndike, on the other hand, was concerned with the more direct application of what he knew about learning and psychological method to teaching practice. In addition to his general theory of learning, he brought to educational topics a scientific approach which involved careful analysis of the nature of the task, the design of teaching techniques as a function of his experimental findings, and measurement of what the task analysis indicated were the components of the performance being learned. Thorndike’s approach set a very special pattern: the combination in one person of the theoretical scientist and the applied scientist interested in designing instructional procedures. 
And since that time, for major advances in the psychology of instruction, we have come to look for individuals interested in both fields, particularly someone trained in the science of psychology who is motivated to look at problems in education. Such a tactic, however, has its shortcomings. It is a highly individualistic, noncumulative kind of venture which does not necessarily lead to the development of a linking science in which knowledge can be accumulated into a body of techniques and procedures for practical application by a professional. In contrast, my concern in this paper is with the possibilities for the development of a linking structure which, because of its own cumulative strength as a body of theory and practice, would be less dependent upon the sporadic interests and insights of individuals. In the sense described above, B. F. Skinner continued in the pattern of E. L. Thorndike, and most of those who became interested in programmed learning and teaching machines continued to work in this mode. As the field became popular, however, it took on a superficial momentum that separated it from the implicit theory that generated it; no substantial structure was built up into which new data, parameters of application, and boundary conditions could be placed.
In the late 1950’s and early 1960’s, as part of a general Zeitgeist, the notion of a linking science was being nurtured. Bruner (1964) contrasted the nature of a theory of instruction with a theory of learning. He pointed out that a theory of learning is descriptive, whereas a theory of instruction is prescriptive in the sense that it sets forth rules specifying the most effective way to achieve knowledge or mastery of skills. A theory of learning describes, after the fact, the conditions under which some competence is acquired. A theory of instruction is a normative theory in that it sets up criteria of performance and then specifies the conditions required for meeting them. Skinner, too, had made this point in the course of his interest in the technology of teaching, since the nature of his approach to the study of behavior makes the development of procedures for prescribing conditions for learning almost indistinguishable from a theoretical description of learning. Most approaches to psychological knowledge emphasize both the theoretical and empirical description of learning; they have not been concerned with the problems of prescriptive science. There is, however, at the present time a growing feeling that a strong test of the adequacy of descriptive theory in the behavioral and social sciences can be made through attempts at application based upon the development of prescriptive theory for the design of social policy and social institutions, including education.
The Activity of Design

The general characteristics of a prescriptive science of design have been discussed recently by Herbert Simon in his book, The Sciences of the Artificial (1969). Simon’s ideas on this matter are worth noting here. He points out that it traditionally has been the task of the sciences and other disciplines in the university to describe how things are and how they work, and it has been the task of professional schools to teach how to design and make things. The intellectual activity of design is involved not only in producing material artifacts as in engineering, but also in prescribing remedies for a sick patient, devising a sales plan for a company, constructing a new social welfare policy for a state, and designing a program of instruction for a school system. Simon writes:

Design, so construed, is the core of all professional training; it is the principal mark that distinguishes the professions from the sciences. Schools of engineering, as well as schools of architecture, business, education, law, and medicine, are all centrally concerned with the process of design. (pp. 55–56)
In view of the key role of design in professional activity, it is ironic, Simon argues, that prescriptive design sciences are less prominent in professional school curricula than they might be.
Engineering schools have become schools of physics and mathematics; medical schools have become schools of biological sciences; business schools have become schools of finite mathematics. The use of adjectives like “applied” conceals, but does not change, the fact. (p. 56)
Curriculum topics are selected from disciplines that are thought to be most relevant to professional practice; but design, as distinguished from descriptive analysis, is not necessarily taught. To some extent, this phenomenon is a function of the professional schools being absorbed into the general culture of the university and hankering after respectability in terms of the prevailing norms of academic respectability. Descriptive theory and analysis is intellectually tough and prestigiously teachable. Design and application has generally appeared to be more intellectually soft, intuitive, and “cookbooky.” (I have before used the expression “by-the-numbers”; Simon’s word “cookbooky” is much better.) This certainly seems to be the existing state of affairs with respect to the application of psychology to the design of instruction. In an effort to explore the possibilities for design theory in psychology and education, a lead can be taken from certain intellectually rigorous practices that have been developed in other fields. The essence of design is to devise courses of action aimed at changing existing situations into preferred ones, and techniques called “optimization methods” have been developed in statistical decision theory, management science, and engineering design that are concerned with deciding upon optimal courses of action. In very general terms, the technique is this: Given a set of alternative goals or possibilities for action, certain fixed parameters and constraints of the situation, and a function that describes the relationship between these factors, find a set of values that provides the best means of attaining possible outcomes. 
A stock application of this paradigm, described by Simon, is to the so-called “diet problem.” Given the goal of losing a certain number of pounds; given parameters and constraints such as food prices and nutritional content; and given the relationship between the cost of a diet, calories per day, and minimum needs for nutritional requirements; find the kinds and quantities of food necessary to maximize utility – for example, no more than 2,000 calories per day with proper nutritional requirements. Once such a problem can be formalized in terms of a quantitative functional relationship, then standard mathematical techniques can be applied to maximize the outcome subject to the given constraints. On the basis of this solution, a course of action can be decided upon. In an exploratory way, the formal apparatus of optimization methods has been introduced into instructional design. Richard Atkinson and his students (Atkinson & Paulson, 1972; Groen & Atkinson, 1966) have described procedures for optimizing paired-associate list-learning of the kind found in initial reading tasks or in learning the vocabulary of a second language. This work makes it clear, however, that at the present time, the determination of optimal alternatives is a relatively easy matter only in “trivial” cases. Atkinson is careful to point out that formal uses of optimization routines developing out
of linear programming theory, dynamic programming, and control theory are of little help for the complex performances and instructional procedures of most interest in education. However, his work to date on simple cases might help clarify some of the steps involved in devising and testing optimal instructional strategies. A significant problem in using optimization methods is the requirement for a formal description of the functional relationships involved. If one can employ a formal model like statistical learning theory, then standard optimization methods can be applied. However, such formal descriptions are not readily forthcoming for the complex cognitive tasks and instructional procedures that are of central interest to educators. For progress now, on the basis of our current knowledge and ability to model and describe the learning process, new kinds of prescriptive methods are required. But still, descriptive theory of some kind is a necessary prerequisite for prescriptive theory if the design procedures we will use in the design of instruction are to be at all like the procedures used in other professions. Of significant interest is that instructional design – the development of instructional procedures and methods – can also become a strong way of testing descriptive theory.
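To make the optimization paradigm concrete, the "diet problem" mentioned earlier can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the foods, prices, and nutrient figures are invented, the protein minimum and the search bound are hypothetical, and the objective chosen here is minimum cost (one common way of formalizing the problem) rather than any particular utility function. Only the 2,000-calorie ceiling comes from the text. A small exhaustive search stands in for the linear programming routines the text refers to.

```python
from itertools import product

# Hypothetical foods: (name, cost in cents per serving, calories, protein in grams).
FOODS = [
    ("oats", 50, 300, 10),
    ("beans", 80, 250, 15),
    ("eggs", 120, 150, 12),
]
MAX_SERVINGS = 8  # assumed upper bound on servings of any one food


def cheapest_diet(max_calories=2000, min_protein=60, foods=FOODS):
    """Return (cost_in_cents, servings) for the cheapest admissible diet, or None.

    A diet is admissible when it stays within the calorie ceiling and
    meets the (assumed) protein minimum.
    """
    best = None
    # Enumerate every combination of whole servings: the "alternatives".
    for servings in product(range(MAX_SERVINGS + 1), repeat=len(foods)):
        calories = sum(n * food[2] for n, food in zip(servings, foods))
        protein = sum(n * food[3] for n, food in zip(servings, foods))
        if calories > max_calories or protein < min_protein:
            continue  # violates a constraint, so not admissible
        cost = sum(n * food[1] for n, food in zip(servings, foods))
        if best is None or cost < best[0]:
            best = (cost, servings)
    return best


if __name__ == "__main__":
    print(cheapest_diet())
```

The sketch is tractable only because the relationship between servings, cost, and nutrients is fully specified in advance; that formal specification is exactly what is missing for complex instructional tasks.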
Questions for Instructional Psychology

For the development of an instructional psychology, there are two questions that need to be asked, the first methodological and the second substantive. The first is: What can be learned about techniques to be used in the application of psychological knowledge to the design of instruction from the strategies of design used in other fields? One answer to this question recognizes the fact that an effective design strategy incorporates procedures for identifying admissible alternatives and then proceeds to make decisions about the most satisfactory of these alternatives. In this regard, a main lesson to be learned from the work to date is that design is not merely assembling a problem solution from what is known, but is rather a search for the most appropriate assembling of the components involved. The components of a design problem need to be assembled into a number of alternative procedures; exploration of these tentative paths then needs to be pursued so that the most promising ones can be followed up and the less promising ones given a lower priority. The design process essentially involves the generation of alternatives and the testing of these alternatives against practical requirements, constraints, and values. This is not done in a single generate-and-test cycle, but through an iterative series involving the generation of alternatives, testing them (through actual small-scale studies or through simulation), describing revised alternatives, testing them, and so on. This will take us away from the intuitive, one-shot innovation mode of educational reform to a mode of operation in which reforms are seen as actual or simulated experiments, with each experiment providing information for successive improvement and refinement of possible alternatives.
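The iterative generate-and-test cycle described above can be sketched as a short loop. The following Python is a minimal illustration, not a method taken from the literature: the function name `iterate_design`, its parameters, and the toy example (minutes of practice per session, with a made-up scoring function standing in for a small-scale study or simulation) are all assumptions introduced here. As a simplification, less promising alternatives are dropped from the pool rather than merely given a lower priority.

```python
def iterate_design(initial, generate, test, admissible, rounds=5, keep=2):
    """Iterative generate-and-test; returns surviving alternatives, best first.

    initial    -- starting alternatives
    generate   -- function producing revised alternatives from one alternative
    test       -- function scoring an alternative (higher is better); stands in
                  for a small-scale study or a simulation
    admissible -- function checking practical requirements and constraints
    """
    pool = [alt for alt in initial if admissible(alt)]
    for _ in range(rounds):
        # Generate revised alternatives from the currently promising ones.
        candidates = set(pool)
        for alt in pool:
            candidates.update(a for a in generate(alt) if admissible(a))
        # Test every candidate and follow up only the most promising few.
        pool = sorted(candidates, key=test, reverse=True)[:keep]
    return pool


if __name__ == "__main__":
    # Toy example: the design variable is minutes of practice per session.
    best = iterate_design(
        initial=[10],
        generate=lambda m: [m - 5, m + 5],
        test=lambda m: -(m - 35) ** 2,        # hypothetical outcome measure
        admissible=lambda m: 5 <= m <= 60,    # hypothetical scheduling limits
        rounds=8,
    )
    print(best[0])
```

Each pass through the loop plays the role of one experiment whose results inform the next round of revised alternatives.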
A second question to be considered is: Given methodologies for deciding among possible alternatives, what are the substantive components that are required as the data to which these methodologies can be applied? This question is a large one for psychological research, and discussion of it will comprise the remainder of this paper. Regardless of the descriptive theory with which one works, four components of a prescriptive theory for the design of instructional environments appear to be essential: (a) analysis of the competence, the state of knowledge and skill, to be achieved; (b) description of the initial state with which learning begins; (c) conditions that can be implemented to bring about change from the initial state of the learner to the state described as the competence; and (d) assessment procedures for determining the immediate and long-range outcomes of the conditions that are put into effect to implement change from the initial state of competence to further development. These components of a psychology of instruction comprise the information – the parameters, constraints, and functional relationships – that is required for employing procedures to optimize instruction or for deciding between instructional alternatives. I shall discuss each of these in turn, but before doing so, let me give you some feeling for the general nature of the kind of individual cognitive development with which I am concerned here and to which the above components refer.
The Development of Competence

The process of instruction, as distinguished from education in general, is, to a large extent, concerned with the development of the behaviors and cognitive structures that differentiate between the novice and the competent performer in a particular subject matter and intellectual skill. In attaining this knowledge and skill, the learner proceeds through a novitiate stage and then on to a stage of relative expertise; he or she learns to be a good reader, a competent mathematician, a deep thinker, a quick learner, a creative person, an inquiring individual, and so on. Competence in these activities is assessed according to criteria of expertise established by the school and the community; more specifically, it is assessed by subject-matter requirements, peer-group expectations, and the general social and professional criteria for what constitutes low, average, and high levels of competence. The educational and social community adjusts its expectations to the competence level of the learner so that initially awkward and partially correct performances are acceptable, whereas later, they are not. The changes that take place as an individual progresses from ignorance to increasing competence are of the following kinds: (a) Variable, awkward, and crude performance changes to performance that is consistent, relatively fast, and precise. Unitary acts change into larger response integrations and overall strategies. (b) The contexts of performance change from simple stimulus patterns with a great deal of clarity to complex patterns in which information must
be abstracted from a context of events that are not all relevant. (c) Performance becomes increasingly symbolic, covert, and automatic. The learner responds increasingly to internal representations of an event, to internalized standards, and to internalized strategies for thinking and problem solving. (d) The behavior of the competent individual becomes increasingly self-sustaining in terms of skillful employment of the rules when they are applicable and subtle bending of the rules in appropriate situations. Increasing reliance is placed on one’s own ability to generate the events by which one learns and the criteria by which one’s performance is judged and valued. It is the understanding and facilitation of this process of change from ignorance to competence, from novice to expert, that is a major focus of the emerging psychology of instruction. Consider now the components required to facilitate this process.
Components of a Psychology of Instruction

The Analysis of Competent Performance

Central to a concern with instructional processes is the problem of task analysis; analytic description is required of what it is that is to be learned. What has a competent performer in a subject-matter domain learned that distinguishes him from a novice? What distinguishes a skilled reader from an unskilled one? When a task analysis identifies the properties of a certain class of performance, then inferences can be formulated and tested concerning optimal instructional processes for acquiring these performance abilities. Analyzing the content of instruction means studying tasks considerably more complex than those typically studied in the laboratory. It also requires techniques for the detailed analysis of performance in terms of the demands placed on cognitive processes and on knowledge and skills assumed to be in the learner’s repertoire as acquired through instruction, development, or self-learning. The requirement for the analysis of competent performance is related to the specification of behavioral objectives so strongly advocated by many educational psychologists. This salutary advice given by behavioral psychologists is now being taken seriously by cognitive theorists concerned with the cognitive components of criterion performance. There seem to be two main aspects to such an analysis. One is the identification of the information structures that are required for performance, and the other is a description of the processes and cognitive strategies – heuristics and algorithms – that need to be applied to this information, and which themselves are part of the information data base. As an interesting case in point, consider the work that has been going on in the cognitive simulation of expert chess players. An article by Simon and Chase (1973) summarizes differences between novice and average players,
and masters and grandmasters in chess. They indicate that the most likely explanation for the extraordinary skill of the chess master is that he is acquainted with tens of thousands of familiar patterns of pieces, and he associates many of these patterns with plausible moves by taking advantage of the informational features represented by the patterns. The basic heuristics that guide the search for good moves are based upon the perceptual ability to recognize an informational pattern on the board. “For example,” Simon and Chase point out, “every chess player of even moderate skill is familiar with the advice: ‘If there’s an open file, put a Rook on it’” (p. 402). The pattern of an open file triggers this heuristic and initiates a move in a heuristic search for the best move. For a chess master, hundreds of immediately recognized patterns may be associated with an algorithmic solution – i.e., moves that lead to the guaranteed win of a piece or a checkmate – so that a series of moves may be played almost by rote. The key to understanding chess skill lies in understanding the large perceptual vocabulary of piece configurations, the associated algorithms, and the particular perceptual processes involved in this skill. From an instructional point of view, the target behavior of interest is that the chess master’s performance seems to involve a buildup in long-term memory of a vast repertoire of patterns and associated plausible moves.

Early in practice, these move sequences are arrived at by slow, conscious heuristic search – “If I take that piece, then he takes this piece . . . ” – but with practice, the initial condition is seen as a pattern, quickly and unconsciously, and the plausible move comes almost automatically. Such a learning process takes time – years – to build the thousands of familiar chunks needed for master-level chess. (Simon & Chase, 1973, p. 403)
It is to be noted further that grandmasters may possess exceptional talents along certain dimensions, but their talents are chess-specific. There is no evidence that masters demonstrate more than above-average competence on basic intellectual factors. Thus, the acquisition of chess skill depends, in large part, on building up specific recognition memory for many familiar chess patterns. In a psychology of instruction, this kind of contrastive analysis of the informational content and skills of competent performers and novices might be prototypic of the kind of research that is especially relevant to an understanding of the objectives of instruction. Consider another example of work on simple arithmetic problems and the nature of competent performance in addition and subtraction. Studies carried out by Suppes and Groen (1967); Woods, Resnick, and Groen (1975); and Resnick (in press) suggest an interesting relationship between what children are taught to do and how they eventually perform efficiently. Young children are generally taught to solve a single-digit addition problem such as 6 + 8 by an algorithm in which they count out six blocks, then count
eight blocks, and then count to combine the set. With practice, children perform this smoothly; when the blocks are taken away, they frequently shift to counting on their fingers, and then eventually shift to internal processing. When the nature of this internal processing is examined, it is found that most children carry out addition by using what has been called a “choice model.” They appear to set a mental counter to the magnitude of whichever number is larger and then increment by the smaller number. Some children retain the earlier model used in instruction – that is, they increment six times, then increment eight more times, and then read their mental counter. The most efficient children, however, appear to be able, without direct instruction, to convert a routine that has been taught into a different routine – a routine that shows they have discovered commutativity and have developed a performance that requires fewer steps. It is to be noted that the initial teaching procedure reflected the rational “union of sets” definition of addition, and thus is a mathematically correct procedure that represents the subject matter clearly and provides a routine that is easy to demonstrate and learn. For an efficient performer, however, the routine is awkward and slow. Thus, the routine derived by rational analysis of the subject-matter structure is transformed to a performance routine that reflects a more sophisticated definition of the subject matter. What are the implications of this analysis? On the face of it, it would seem that we ought to abandon the algorithm suggested by direct analysis of tasks in favor of analysis of skilled performance. We can argue that the rational analysis of tasks may not match skilled performance and that it therefore should not be used as a basis for instruction. It would seem best to carry out detailed empirical analyses of skilled performance on subject-matter tasks and teach the routines uncovered by such analyses. 
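The difference between the taught routine and the "choice model" can be made concrete by counting increments. The Python below is a minimal sketch under stated assumptions: the function names `sum_model` and `choice_model` are labels introduced here for illustration, and each returns the answer together with the number of counting steps, which is the quantity the efficiency argument turns on.

```python
def sum_model(a, b):
    """Taught routine carried out mentally: increment a times, then b more.

    Returns (total, number_of_increments).
    """
    counter = 0
    steps = 0
    for _ in range(a):
        counter += 1
        steps += 1
    for _ in range(b):
        counter += 1
        steps += 1
    return counter, steps


def choice_model(a, b):
    """Set the counter to the larger addend, then increment by the smaller.

    Relies on commutativity: the order of the addends does not matter.
    Returns (total, number_of_increments).
    """
    counter = max(a, b)
    steps = 0
    for _ in range(min(a, b)):
        counter += 1
        steps += 1
    return counter, steps


if __name__ == "__main__":
    print(sum_model(6, 8))     # same answer, more increments
    print(choice_model(6, 8))  # same answer, fewer increments
```

For 6 + 8, both routines reach 14, but the first takes fourteen increments and the second only six, whichever addend is presented first.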
However, in discussing her work, Resnick (in press) points out that such a conclusion could be in error, since it rests on the assumption that efficient instruction is necessarily direct instruction in skilled performance strategies rather than instruction in routines that put learners in a good position to invent or derive efficient strategies for themselves. So, it is implied that the teaching routines in elementary arithmetic were not poor ones that inhibited the acquisition of efficient performance, but may have been good ones that fostered the invention of more efficient algorithms. As suggested by the above examples, the work on the analysis of competent performance that is going on at the present time is of two kinds: the characterization of the information structures and cognitive processes of the skilled performer, and behaviorally oriented work on rational task analysis. Such analyses of human competence and subject-matter tasks may allow us to do two things regarding the optimization of instruction: (a) Specifying the structures and processes by which competent individuals might be performing a task may put us in a position to try to teach these processes to individual learners. (b) Knowing that a task is performed efficiently in one way
Glaser
Components of a Psychology of Instruction
rather than in another might enable us to design instruction so that the performance learned allows individuals to directly or indirectly transfer to the more efficient method. It would be a serious omission to leave the topic of task analysis without referring to the influential work of Robert Gagné on learning hierarchies (1962, 1970). This theory continues to be widely accepted as a framework for investigating instructional processes and for designing educational procedures and curricula in various subject matters. Gagné has presented us with a system for rational task analysis based upon a cumulative learning model that states that there are different types of learning, with the simpler types being prerequisite states for learning the more complex types. For example, problem solving, a complex higher-order type of learning, requires rule learning, a lower-order task, as a prerequisite; and rule learning, since rules consist of relationships between concepts, requires concept learning as a prerequisite; and so forth. In general, the lower-order task is defined as being prerequisite to a higher-order task when competence in the simpler task facilitates positive transfer in learning the more complex task. In addition to a clear-cut transfer relationship, there are, however, several possible relationships that might exist between prerequisite tasks and superordinate tasks. The lower-order task might be one of a number of components of the more complex task, each of which can be acquired independently of the others, but all of which must be combined to produce the higher-order performance. Alternatively, the lower-order tasks may themselves be hierarchically related to one another, constituting a sequenced progression leading to increasingly complex performance. Lower-order tasks may also be competencies which facilitate the learning of the more complex task, but which drop out in the more “skillful” performance. 
Furthermore, the lower-order tasks might function as heuristics for discovering or inventing procedures for carrying out the more complex task. Research along these lines, i.e., investigating the acquisition of complex performance on the basis of existing competencies, is especially relevant for instructional psychology.
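The simplest prerequisite relationship described above (problem solving requires rule learning, which requires concept learning) can be represented as a small dependency graph. This is a minimal sketch under stated assumptions: the dictionary, the function names, and the restriction to three learning types are illustrative choices, not part of Gagné's formulation, and the sketch captures only the strict prerequisite case, not the other relationships listed in the text.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each learning type maps to the set of types assumed to be its prerequisites.
PREREQUISITES = {
    "problem solving": {"rule learning"},
    "rule learning": {"concept learning"},
    "concept learning": set(),
}


def teaching_order(prereqs):
    """Return one ordering in which every prerequisite precedes what depends on it."""
    return list(TopologicalSorter(prereqs).static_order())


def ready_to_learn(prereqs, mastered):
    """Return the unmastered types whose prerequisites have all been mastered."""
    return sorted(
        skill
        for skill, needs in prereqs.items()
        if skill not in mastered and needs <= mastered
    )


if __name__ == "__main__":
    print(teaching_order(PREREQUISITES))
    print(ready_to_learn(PREREQUISITES, {"concept learning"}))
```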
Description of Initial State
Instruction begins with an initial state of the learner, and instruction proceeds on this base toward the development of competent performance. There are two approaches to this component of instructional design: “immediate” and “long-term.” The immediate approach is to take seriously the fact that effective instruction requires careful assessment of the strengths, weaknesses, styles, and background interests and talents of individual learners. What are the details of what a child knows and does not know at particular points in his or her learning? What are the details of the skills that he or she is developing? What needs to be improved? What strengths can be capitalized on? What
do various developmental levels and various cultural backgrounds mean for what should be taught and how it should be taught? Educational practices need to be designed so that answers to these kinds of questions are possible for all individuals attending school. Teachers and students need to be in a position to obtain and utilize this kind of information; with it, teachers can prescribe the instruction required, and students can assess their own abilities and select appropriate instruction. The use of procedures for providing this kind of information for teaching requires the adoption of an attitude that looks upon the information obtained as information for improving instruction, and not simply as a test for evaluating and classifying students. For this purpose, it has been useful to provide teachers with hierarchies of increasing competence in various school subjects (Resnick, Wang, & Kaplan, 1973). These take the form of “structured maps” into which a teacher can place a child and thereby direct attention to prerequisite skills that might need to be learned or advanced skills that the child might explore. The hierarchical map serves as a guide upon which both the teacher and the child can impose additional judgments. The provision of procedures for identifying the current competence and talents of the learner in a way that provides a basis for instruction is generally not done in current educational methods at a level of detail necessary for the effective guidance of individual learners. The implementation of such procedures is not only a matter of research, but also largely a matter of administrative change and the design of appropriate materials. The more long-term approach derives from the fact that aptitude and intelligence tests are the prevalent methods for assessing initial states that are, to some extent, predictive of eventual educational success, but these measures do not provide sufficient information about instructional processes (Glaser, 1972). 
Having been devised primarily for purposes of selection, these measures do not provide a basis for deciding how instruction might be designed to make the attainment of successful performance more probable. The significant requirement in this regard for a psychology of instruction is to describe the initial state of the learner in terms of processes involved in achieving competent performance. This would then allow us to influence learning in two ways: (a) to design instructional alternatives that adapt to these processes, and (b) to attempt to improve an individual’s competence in these processes so that he is more likely to profit from the instructional procedures available. There is, at the present time, a spurt of interesting research devoted to analyzing the underlying cognitive processes that contribute to intelligence and aptitude-like performance. Three illustrative examples will be presented. In a recent series of studies by Hunt, Frost, and Lunneborg (1973), students were classified into high- and low-verbal ability groups and into high- and low-quantitative ability groups on the basis of a battery of tests used for selection for college entrance at the University of Washington. The individuals
in each of these groups were then given a series of tasks employed in laboratory experiments on the experimental analysis of information processing models of memory. In this way, the characteristics of high-verbal ability and high-quantitative ability students, as defined by aptitude tests, were examined in terms of cognitive processes, as defined by tasks used to investigate particular theories of cognition. The conclusions from the studies tentatively indicate that there is a relationship between verbal ability and the rapidity and efficiency of data manipulation in short-term memory, and between quantitative ability and resistance to distraction while consolidating information in short-term memory. It is thus suggested that verbal and mathematical aptitude is related to the nature of information processing in memory, and the interesting question for an instructional psychology is whether we can proceed further and identify situations where the speed and other properties of such processing will be predictive of school achievement. Such an endeavor could have more significant implications than present correlationally derived relations between aptitude tests and school success because clues would perhaps be available about how verbal and mathematical ability processes might be modified or employed for learning. In a very recent paper, Estes (1974) discusses the digit-span test that appears on the Stanford-Binet. At year ten, the subject’s task is to repeat a sequence of random digits after they have been read aloud by the examiner. The test correlates satisfactorily with the usual validation criteria, but the interesting instructional question is: If an individual scores low on this test, what instructional procedure should we expect to be useful in improving this performance, performance that we know is correlated with academic accomplishment? 
Estes describes recent research and theory dealing with short-term memory for sequences of items that indicate that the digit-span task appears to involve a hierarchical structure of representations in memory. A quote gives the gist and flavor of this:
On presentation of the digit sequence of 691472, the individual is conceived to subgroup the sequence into two chunks, assigning a code to each which he maintains in memory, and within each chunk relating the items of the sequence to the ordinal numbers 1, 2, and 3. On a request to recall the string, the individual brings into memory his coded representations of the two chunks; each of these in turn activates recall of the individual digits and their associated serial positions. While this process goes on, the individual must hold the partially reconstructed sequence in an output response buffer by an inhibitory process until the decoding is complete and then emit the digits in the proper order. (p. 743)
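The hierarchical representation in the quoted passage can be pictured as a small data structure: coded chunks, each holding digits keyed by ordinal position, decoded into an output buffer before anything is emitted. The sketch below is an illustration only; the function names, the chunk labels, and the fixed chunk size of three are assumptions made for the example and are not part of Estes's account.

```python
def encode_span(digits, chunk_size=3):
    """Group a digit string into coded chunks; each digit is keyed by ordinal position."""
    chunks = []
    for start in range(0, len(digits), chunk_size):
        code = "chunk-%d" % (len(chunks) + 1)  # hypothetical code assigned to the chunk
        group = digits[start:start + chunk_size]
        positions = {pos: digit for pos, digit in enumerate(group, start=1)}
        chunks.append((code, positions))
    return chunks


def recall_span(chunks):
    """Decode every chunk into an output buffer, then emit the digits in order."""
    output_buffer = []  # holds the partially reconstructed sequence until decoding ends
    for _code, positions in chunks:
        for pos in sorted(positions):
            output_buffer.append(positions[pos])
    return "".join(output_buffer)


if __name__ == "__main__":
    coded = encode_span("691472")
    print(coded)
    print(recall_span(coded))
```

Each stage in such a sketch (grouping, coding, ordered emission from the buffer) is a separate place where performance could break down.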
Estes points out that such an analysis of performance on the digit-span task may have implications for assessing individual differences. Young or mentally retarded children might fail the test because of insufficient familiarity with the sequence of ordinal numbers or because of inexperience in ordering materials
with the number sequence. An individual may not perform well because he has not developed an appropriate strategy of grouping (although he might utilize grouping when prompted by the examiner), is unable to accomplish the coding process necessary to take advantage of chunking, or lacks the capacity for selective inhibition in buffer storage necessary to order his output properly. Estes writes: Clearly, it would be possible with the advantage of added theoretical insight to augment the standard digit span test in such a way as to localize the source of difficulty for an individual who fails under the standard procedure. This augmentation would quite likely do little to improve the predictive value of the test, but it might be of considerable help in indicating how deficient performance in this and related tasks might be remedied. (p. 744)
Holzman (1975) has studied letter series completion problems of the sort used by the Thurstones (1941) in their factor analytic studies of intelligence. Letter series consist of a sequence of alphabetic characters running in a consistent pattern. In any one test item, usually about a dozen of these patterned letters are presented to the examinee followed by four blank spaces. The individual must fill in the four blanks with letters that are consistent with the pattern exhibited by the previously presented letters of that series. For example, the individual might see the problem “defgefghfghi . . . ” and be asked to fill in the blanks. Work on analyzing this task has been carried out by Simon and Kotovsky (Simon & Kotovsky, 1963; Kotovsky & Simon, 1973), who have obtained protocols of adolescents and adults solving these sorts of problems; then, based on these observations, they wrote computer programs to simulate humans’ solution routines. Four basic component routines are necessary for the simulation of correct solution. The first routine is the detection of relations between letters: Are letters identical, sequential, or sequential in reverse order? The second routine or subskill is the discovery of periodicity in a series. This involves noticing that letter relations repeat themselves at regular, predictable intervals. A third routine, called pattern description, assembles knowledge of letter relations and knowledge of periodicity into a rule that generates the series. The final routine required is extrapolation. This involves remembering the pattern description and using this rule to generate the appropriate letters for the blanks. Using this information about the possible cognitive processes involved, Holzman taught elementary school children to be very proficient in the detection of relations and the discovery of periodicity. 
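The four component routines just described can be sketched in a few lines. This is a simplified illustration and not the Simon and Kotovsky simulation program: it assumes lowercase letters, treats a relation as an alphabet offset between letters one period apart, and accepts the shortest period that fits. All function names are hypothetical.

```python
import string

ALPHABET = string.ascii_lowercase


def relation(prev, cur):
    """Routine 1: relation between two letters as an alphabet offset (0 = identical, 1 = next)."""
    return (ALPHABET.index(cur) - ALPHABET.index(prev)) % 26


def find_pattern(series):
    """Routines 2 and 3: find the shortest period whose letter relations repeat,
    and describe the pattern as (period, offset for each position in the period)."""
    n = len(series)
    for period in range(1, n // 2 + 1):
        steps = {}
        consistent = True
        for i in range(period, n):
            step = relation(series[i - period], series[i])
            if steps.setdefault(i % period, step) != step:
                consistent = False
                break
        if consistent:
            return period, steps
    return None


def complete_series(series, blanks=4):
    """Routine 4: extrapolate the pattern description to fill the blanks."""
    pattern = find_pattern(series)
    if pattern is None:
        raise ValueError("no repeating pattern found")
    period, steps = pattern
    letters = list(series)
    for i in range(len(series), len(series) + blanks):
        prev = ALPHABET.index(letters[i - period])
        letters.append(ALPHABET[(prev + steps[i % period]) % 26])
    return "".join(letters[len(series):])


if __name__ == "__main__":
    print(complete_series("defgefghfghi"))
```

For the example series in the text, the sketch settles on a period of four with each letter one step beyond the letter four places earlier, and it extends the series on that basis.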
As a result of their training, children were able to show substantial pretest to posttest gains on a typical letter series completion test. Most strikingly, the children were significantly more able than control subjects to demonstrate perfect posttest solutions to the types of problems which they found difficult on the pretest. Both the control subjects, as a result of repeated testing, and experimental
subjects were able to make gains on easy problems, but the children trained on component subskills seem to have acquired an information management strategy that allowed them frequently to reach perfect solution even on difficult problems. The skills taught to the children in this study were quite specific; however, the question is raised about the possibilities for the analysis of abilities that are more general than these and that might provide a basis for truly generative intellectual abilities. Studies like those I have just described raise the possibility that measures of intelligence and aptitude, analyzed in terms of cognitive processes, will, as Hunt and his colleagues (1973) write, “move many psychometric predictions from static statements about the probability of success to dynamic statements about what can be done to increase the likelihood of success” (p. 118). And furthermore, “Hopefully [this] new viewpoint . . . will lead to measuring instruments which are diagnostic, in the sense that they tell us how the institution should adjust to the person, instead of simply telling us which people already are adjusted to the institution” (p. 120).
Conditions That Foster the Acquisition of Competence
This third component of instructional design – the conditions that can be implemented to foster the acquisition of competence – essentially involves the procedures that assist learning and the techniques and materials that are designed into the environment in which learning occurs. In this regard, we should recognize that the little we do know about learning is known in terms of descriptive science. Little investigation has taken place from the point of view of utilizing this information for designing the conditions of instruction. Exceptions to this are the work on behavior modification, the work of Gagné, and the limited work referred to earlier on optimization models for paired-associate forms of learning. However, for the most part, these enterprises have not considered complex cognitive performance in any intensive way. What is required is that research on instruction be cast into the mold of a design science that attempts to maximize the outcomes of learning for different individuals. A new form of experimentation would be called for where the tactic is not to develop models of learning and performance, but to test existing models by using them for maximizing learning under various conditions. For this purpose, we need a theory of the acquisition of competent performance. Such a theory would be concerned with how an individual acquires increasingly complex performances by assembling the present components of his repertoire, by manipulating the conditions and events around him, and by employing his knowledge of how he learns. With the development of such a theory in mind, some very brief preliminary observations can be made on knowledge structures in memory, on generalized abilities for learning to learn, and on the nature of reinforcement.
Knowledge structures. Some recent work on the semantic structure of information in memory (e.g., Greeno, in press) has been concerned with the semantic networks and information processing mechanisms that are available at different levels of subject-matter competence. If, at various levels of learning or stages of competence, the kinds of knowledge we wish to create in the minds of students can be specified in this way, then some interesting implications are suggested for the relationship among subject-matter structure, curriculum content, and instructional design. One such relationship can be seen by distinguishing between the structure of a subject-matter domain as it is organized by scholars studying that domain and the structure that is devised for teaching it (Glaser, 1973). The structure of a subject-matter discipline, as employed for the purpose of advanced scholarship, consists of theories, concepts, and definitions that serve to make the domain manipulable for the work of subject-matter experts. However, the structures employed for this purpose are not necessarily the most useful for facilitating the learning of an individual at a less advanced level of development or subject-matter sophistication. Good theory for the scholar may not be good pedagogical theory; what leads to knowledge for the expert may neither lead to knowledge for the novice nor help him to develop competence. It follows that a significant consideration for instructional design is the organization of curriculum sequences that provide knowledge structures optimally organized for moving the novice toward expertise. Appropriately designed structures for learning can reduce the amount of information that must be held in mind to comprehend the subject matter; for example, a verbal label, a conceptual formulation, or a rule or principle may help to organize and summarize a large number of observations. 
The rule can be thought of as a structure or representation by which an individual is directed or directs himself to look at the relevant features of what might otherwise be an unorganized task situation. As a consequence, a student can generalize across the superficial details of the limited set of experiences encountered in instruction (Gilbert, 1962). Some ways of organizing information may permit better memory retrieval than other ways and, as a result, facilitate the learner’s capacity to learn new things on the basis of what he has already learned and to access information readily for thinking and problem solving. The organization of subject-matter content can do for the learner what advanced theory does for the expert. Such organizations, however, are not readily available; they are sometimes devised by ingenious teachers and built by them into instructional procedures. I would further suggest that the nature and the design of these organizations or pedagogical structures are a unique province of study for a psychology of instruction. Teaching generalized learning-to-learn abilities. In the acquisition of competence, a significant instructional consideration is the way in which individuals use their current competence and components of their repertoire for learning new higher-order performance or for solving problems that lead to
learning this higher-order performance. Thus, an appropriate concern for instruction is the possibility for teaching general strategies that will help individuals learn on their own and be less dependent on the instructor’s elegance in presenting particular tasks. An interest in teaching such general “learning to learn” abilities has been widely expressed by educators and psychologists, but at the present time, there is little scientific basis for such instruction. One possible basis can come from the studies already described on the process analyses of aptitude-like skills. Still another potential basis for such instruction might be provided by the growing number of information processing analyses of problem-solving tasks. In a recent paper, Resnick and Glaser (in press) argue that the processes involved in certain kinds of problem solving are probably similar to the processes involved in learning in the absence of direct or complete instruction, and that instruction in these processes might constitute a means of increasing an individual’s generalized learning-to-learn abilities. A model of problem solving was developed in which three interacting phases were identified: (a) problem detection, in which the inapplicability of “usual routines” for solving a problem is noted and a problem or goal is formulated; (b) feature detection, in which the task environment (the external situation, which includes both physical and social features) is scanned for cues that might lead to appropriate actions; and (c) goal analysis, in which goals are successively reformulated, partly on the basis of external task cues, in order to yield soluble subgoals that contribute eventually to solution of the problem as presented. A study by Schadler and Pellegrino (Note 1) has shown that requiring subjects to verbalize their goals and strategies in each of these phases, before making overt moves toward solution, greatly enhances the likelihood of problem solution. 
Along these lines, it seems reasonable to anticipate that ways can be found to make individuals more conscious of the role of environmental cues in problem solving. Individuals might be taught strategies of feature scanning and analysis that will enhance the likelihood of their noticing cues that prompt effective actions while somehow “deactivating” those cues that prompt ineffective actions. Such self-regulation could be a major characteristic of successful self-learning and problem solving. The specific suggestions that can be offered at this time for instruction of such generalized learning abilities are limited, since relatively little has been done on developing task analyses that characterize these general processes in instructable terms; but work on problem solving is especially relevant to this important goal of instruction. Related to this is work on reinforcement effects to which I now turn. Reinforcement. Contingencies of reinforcement pervade the acquisition of competence. However, with the strong emergence of cognitive psychology, and with awareness of the fact that the bulk of our knowledge about reinforcement is derived from animal studies in simple task situations and from human experimental contexts in which conditions constrain subjects to employ limited behavioral processes, we are in some danger of ignoring the potential influence
of reinforcement on complex performance. There is, on the one hand, a strong suggestion of discontinuity in the operation of reinforcement when moving from simple to higher-order behaviors. On the other hand, the view that seems best supported at the moment is that the mechanisms of reinforcement are similar at all levels of development, but variations in response organization result in different phenotypic manifestations (Estes, 1971). As individuals mature, human behavior is organized into higher-order routines and strategies, and it is these large cognitive organizations whose probabilities of occurrence are modified by reinforcing contingencies. It is the nature of the unit of response that may distinguish the mature human learner, whereas the operation of the principles of reinforcement may be similar for different species and different levels of development and competence. From the point of view of a theory of instructional psychology, we should be further aware that in the natural settings of classrooms reinforcement occurs extensively within a social context. This highlights certain dimensions of the nature of reinforcement that need to be considered in instructional situations (e.g., Bandura, 1971). One aspect is that people continually observe the behaviors of others as this behavior is rewarded, ignored, or punished; and this observation influences the subsequent operation and effect of reinforcers on the observers. This is the phenomenon of modeling and vicarious reinforcement. A second aspect is that individuals regulate their own actions by mechanisms of self-reinforcement. Self-generated anticipatory consequences allow possible future contingencies to influence present behavior, and self-evaluations of the consequences of one’s own actions influence behavior as these consequences are made apparent by classroom reinforcement contingencies.
Assessment of the Effects of Instructional Implementation
The fourth component of instructional design is concerned with the effects of instructional implementation in the short and in the long run – effects that occur immediately in the context of instruction and effects that persist in terms of long-term transfer, generalized patterns of behavior, and ability for further learning. One requirement for this purpose is to break away from the tradition of norm-referenced measurement to measurement more concerned with identifying the nature of competent performance (Glaser, 1963; Glaser & Nitko, 1971). For effective instructional design, tests will have to be criterion referenced in addition to being norm referenced. They will have to assess performance attainments and capabilities that can be matched to available educational options in more detailed ways than can be carried out with currently used testing and assessment procedures. This will be an important part of the development of a psychology of instruction. It is mandatory that tests not stand out as evaluative devices that are an extrinsic and external adjunct of instruction. Tests need to be interpreted in terms of performance criteria so that the learner and the teacher are informed about an individual’s
progress relative to developing competence. In this way, information is provided for deciding upon appropriate courses of instruction. The performance measured by tests designed to facilitate instruction needs to be related to processes identified as components of competence. For this purpose, some interesting endeavors can be envisioned. One example is work going on in analyzing the processes involved in the comprehension of written language, stimulated by the work in psycholinguistics and cognitive psychology (e.g., Carroll & Freedle, 1972). This development should be juxtaposed with the fact that there has been a great deal of work on the development of tests of reading comprehension. As we begin to analyze comprehension tasks and relate them to theories of semantic memory, imagery, and so forth, we should be able to develop tests that provide us with diagnostic information about component processes that contribute to performance and that can be influenced through instruction. This kind of activity should change the nature of assessment procedures and provide us the kind of information required for maximizing instructional outcomes. Another area of investigation that is beginning to provide significant evaluative information about the conditions under which learning takes place in school contexts should be mentioned. This is the growing sophistication in the study of the nature of classroom processes. In the past, we essentially attempted to describe school learning by relating the nature of student input to the quality of student output; but the process intervening between the two, the independent variable, was only generally described. Detailed information was rarely obtained about differences between effective and less effective classroom processes. There are now a number of attempts to research these details. 
I am especially impressed by the model for such research being developed by my colleague, William Cooley, in conjunction with Paul Lohnes (Cooley & Lohnes, in press). Their model is derived from Carroll’s 1962 model of school learning and consists of six components: (a) initial ability, which reflects the basic incoming skills and general intellectual development of children in a classroom; (b) opportunity, which describes the relative proportion of classroom activities (the dominant classroom subject-matter themes) that are directly related to the assessed outcomes of instruction; (c) motivation, which reflects a student’s tendency to engage in learning activities when the opportunity exists, and operationally defined (in elementary school classrooms) as the fit between the learning situation and the child’s needs, and the relative incidence of teacher praise and encouragement and their antitheses for particular pupil behaviors; (d) structure and placement, which reflect the extent to which the curriculum is structured by specifying objectives, sequences of instruction, particular methods used in differentiating students or in individualizing instruction, and, in general, the organization of instruction and teaching materials; (e) instructional events, which reflect the relative incidence of teacher-pupil instructional interaction and observed, for example, through the extent of teacher acknowledgment of, and feedback with
respect to, a student’s task-related activity; (f) criterion ability, which reflects end-of-year student performance, for example, on standardized achievement and intellectual ability tests. After obtaining information on these components of instruction, a multivariate analysis procedure is used to determine the regression of criterion ability on the other five components of the instructional model. This permits an analysis of the total variance represented in the criterion variable that is explainable in terms of the other components – (a) variance due to incoming ability independent of classroom process variables, (b) variance uniquely due to the classroom process variables independent of initial ability factors, and (c) variance due to the interaction or overlap between initial ability and instructional processes. In this way, detailed information is obtained on the kind of classroom implementation of an instructional system that is effective or ineffective in producing school outcomes. What is of particular interest in research of this kind is that we can begin to relate the effectiveness of school implementation procedures to psychological dimensions of learning theory and to a theory of the acquisition of competence. Each endeavor can reinforce or challenge the findings of the other.
To conclude: A speculative outline of a psychology of instruction as a science of design has been presented. Directions in which it might develop and what some of its substantive components might be have been suggested. There is much to be done, but many promising leads are now offered for testing fundamental theories of human learning and cognition and for contributing strongly to educational practice.
Reference Note
1. Schadler, M., & Pellegrino, J. W. Maximizing performance in a problem solving task. Unpublished manuscript, University of Pittsburgh, Learning Research and Development Center, 1974.
Note
1. Dewey is quoting Hugo Munsterberg, Psychology and life, p. 138. (New York: Houghton, Mifflin & Co., 1899.)
References
Atkinson, R. C., & Paulson, J. A. An approach to the psychology of instruction. Psychological Bulletin, 1972, 78, 49–61.
Bandura, A. Vicarious- and self-reinforcement processes. In R. Glaser (Ed.), The nature of reinforcement. New York: Academic Press, 1971.
Bruner, J. S. Some theorems on instruction illustrated with reference to mathematics. Theories of learning and instruction. The Sixty-third Yearbook of the National Society for the Study of Education, Part I, 1964, 63, 306–335.
Carroll, J. B., & Freedle, R. O. (Eds.). Language comprehension and the acquisition of knowledge. Washington, D. C.: V. H. Winston & Sons, 1972.
Cooley, W. W., & Lohnes, P. R. Evaluative inquiry in education. New York: Irvington Publishers, in press.
Dewey, J. Psychology and social practice. The Psychological Review, 1900, 7, 105–124.
Estes, W. K. Reward in human learning: Theoretical issues and strategic choice points. In R. Glaser (Ed.), The nature of reinforcement. New York: Academic Press, 1971.
Estes, W. K. Learning theory and intelligence. American Psychologist, 1974, 29, 740–749.
Gagné, R. M. The acquisition of knowledge. Psychological Review, 1962, 69, 355–365.
Gagné, R. M. The conditions of learning (2nd ed.). New York: Holt, Rinehart & Winston, 1970.
Gilbert, T. F. Mathetics: The technology of instruction. Journal of Mathetics, 1962, 1, 7–73.
Glaser, R. Instructional technology and the measurement of learning outcomes: Some questions. American Psychologist, 1963, 18, 519–521.
Glaser, R. Individuals and learning: The new aptitudes. Educational Researcher, 1972, 1, 5–13.
Glaser, R. Educational psychology and education. American Psychologist, 1973, 28, 557–566.
Glaser, R., & Nitko, A. J. Measurement in learning and instruction. In R. L. Thorndike (Ed.), Educational measurement (2nd ed.). Washington, D. C.: American Council on Education, 1971.
Greeno, J. G. Cognitive objectives of instruction: Theory of knowledge for solving problems and answering questions. In D. Klahr (Ed.), Cognition and instruction. Hillsdale, N. J.: Lawrence Erlbaum Associates, in press.
Groen, G. J., & Atkinson, R. C. Models for optimizing the learning process. Psychological Bulletin, 1966, 66, 309–320.
Holzman, T. G. Process training as a test of computer simulation theory. Unpublished master’s thesis, University of Pittsburgh, 1975.
Hunt, E., Frost, N., & Lunneborg, C. Individual differences in cognition: A new approach to intelligence. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 7). New York: Academic Press, 1973.
Kotovsky, K., & Simon, H. A. Empirical tests of a theory of human acquisition of concepts for sequential patterns. Cognitive Psychology, 1973, 4, 399–424.
Resnick, L. B. Task analysis in instructional design: Some cases from mathematics. In D. Klahr (Ed.), Cognition and instruction. Hillsdale, N. J.: Lawrence Erlbaum Associates, in press.
Resnick, L. B., & Glaser, R. Problem solving and intelligence. In L. B. Resnick (Ed.), The nature of intelligence. Hillsdale, N. J.: Lawrence Erlbaum Associates, in press.
Resnick, L. B., Wang, M. C., & Kaplan, J. Task analysis in curriculum design: A hierarchically sequenced introductory mathematics curriculum. Journal of Applied Behavior Analysis, 1973, 6, 679–710.
Simon, H. A. The sciences of the artificial. Cambridge, Mass.: MIT Press, 1969.
Simon, H. A., & Chase, W. G. Skill in chess. American Scientist, 1973, 61, 394–403.
Simon, H. A., & Kotovsky, K. Human acquisition of concepts for sequential patterns. Psychological Review, 1963, 70, 534–546.
Suppes, P., & Groen, G. J. Some counting models for first-grade performance data on simple addition facts (Tech. Rep. 90). Also in J. M. Scandura (Ed.), Research in mathematics education. Washington, D. C.: National Council of Teachers of Mathematics, 1967.
Thorndike, E. L. The psychology of arithmetic. New York: Macmillan, 1922.
Thurstone, L. L., & Thurstone, T. G. Factorial studies of intelligence. Chicago: University of Chicago Press, 1941.
Woods, S. S., Resnick, L. B., & Groen, G. J. An experimental test of five process models for subtraction. Journal of Educational Psychology, 1975, 67, 17–21.
26 The Emergence of Cognitive Psychology Robert R. Holt
One of the great unmistakable trends in psychology during the past decade and a half has been the emergence of a new and vigorous interest in cognition. This last term itself has experienced a revival of currency and respectability; once a scholastic term for knowledge, encountered in the classical threefold division of human function into cognition, affection, and conation, it has now shaken off such dusty and facultative connotations and is used boldly by neo-behaviorists who still feel a trifle shy about admitting an interest in thinking as a subject matter for psychology. The scope of the term is, to be sure, a good deal broader than thought processes: it comprises perceiving, judging, forming concepts, learning (especially that of a meaningful, verbal kind), imagining, fantasying, imaging, creating, and solving problems. One might try to summarize all this by saying that cognition deals with all aspects of symbolic behavior, in the broad sense, if it were not for the fact that the study of language is traditionally separated off into linguistics. Such boundaries are, of course, artificial, but the attempt to draw them helps us to recognize that cognitive psychology is growing actively at its peripheries as well as in its core area of thinking. The appended bibliography gives some of the evidence for the declaration of a renaissance starting in 1951. It does not, of course, show the preceding decades of dearth, but they could be easily documented. Other evidences of the trend are the establishment, during these years since 1951, of a number of institutes such as the Center for Cognitive Studies at Harvard University and the Research Center for Mental Health at New York University, and the appearance of such new publications as the Journal of
Source: Journal of the American Psychoanalytic Association, 12 (1964): 650–665.
Verbal Learning and Psychological Issues. Indeed, a substantial part of the appended bibliography comes from this last-mentioned monograph series. Of the nine titles that had appeared by the end of 1962, seven are listed here, and several other major contributions to cognitive psychology are announced for early release, notably works by I. Kohler, G. S. Klein, C. Fisher and W. Dement. Elsewhere¹ I have set forth some hypotheses about historical reasons for the decline of interest in cognition, which was the central problem of psychology before 1910, and its recent recrudescence. Briefly, it seems plausible that the two greatest revolutionary influences in psychology of the twentieth century both helped to turn attention away from the subjective phenomena of consciousness: behaviorism and psychoanalysis. Conscious processes were downgraded in these simultaneous and seemingly antithetical movements for seemingly quite different reasons, but I believe that there was an underlying unity of outlook that is beginning to emerge. During the polemical decade 1915–1925, both the psychoanalysts and the behaviorists were impatient with studying the contents of consciousness as misleading and epiphenomenal, Freud because he was so much more impressed by the importance of unconscious motivations, Watson because he wanted to focus on objectively observable behavior. As each of these schools became entrenched and attained the status of an orthodoxy instead of a struggling radicalism, it slowly became apparent that there were legitimate and important problems left unsolved by the neglect of the subjective, cognitive realm. And both those who strove for depth and those who valued objectivity most highly began to realize that they did not have to betray their basic values to study these problems. 
Learning had always been a central concern of the behaviorists; gradually, they began to take an interest in the acquisition of meaningful cognitive structures – concepts as well as motor performances, words as well as nonsense syllables. Ego psychology began to come to the fore in psychoanalysis; thanks largely to Hartmann, Erikson, and Rapaport, interest was turned to the nature of reality and man’s adaptation to it. Another trend that impresses me as underlying the burgeoning of concern with cognition is the gradual emergence of model-building as a major theoretical objective in almost all schools of psychology, including psychoanalysis. During the past two decades, we have begun to appreciate the proper role of psychological theory as an abstract model or simulacrum of man capable of predicting his behavior (in the broadest sense). This point of view implies that we shall not truly understand how and why people do, feel, and think as they do until we can build an imitation that will accurately simulate these human activities, and that a good theory is an abstract blueprint for such a working model. The advent of the high-speed digital computer, which can acquire information, store it in a memory, process it in such a way as to solve problems and output the answers, has been a powerful stimulant to the
imagination of model-builders – of whom Freud was one of the first, long before the thinking machines. But whether the model is as concrete as a computer or as abstract as Hullian “behavior theory” or Freud’s “psychology for neurologists” (the Project), it has to provide for some internal processing of symbols – that is, coded information; and this is what cognitive psychology essentially deals with. In his lucid survey of trends in cognitive psychology, which ends the Colorado symposium (33) (see below), Fritz Heider comments on the emergence of this field and draws attention to some of the immediate influences within psychology that helped bring it about. Shortly after World War II there was a movement informally called the New Look, which brought together perception and thought processes into relation with emotions, needs, and other aspects of personality (3); it was in a sense an experimental exploration of instances in which perception and thinking lose their primary autonomy. One source of this revived interest in perception, Heider goes on, was the strong interest in the structure and dynamics of thinking engendered by the rise of projective techniques and (more generally) diagnostic psychological testing, particularly as conceptualized by Rapaport and his co-workers.² The other source pointed out by Heider is information theory, which may be only indirectly related to cognition, “since it does not concern itself with meanings,” but it “suddenly achieved high status through being clothed in a magnificent mathematical theory” (p. 204). Because all these movements led to a good deal of laboratory experimentation, it became possible for academic psychology to readmit thought processes as a legitimate focus of scientific interest. A single book essay of reasonable length can only sample the flood of textbooks and monographic literature that has appeared in the last decade. 
The five books rather haphazardly chosen for consideration here are in a number of ways not a very representative sampling – most glaringly omitted are anything by Piaget (5, 10, 11, 19, 30, 39, 44; see also 67), any representative of information theory and related work (17, 25, 34, 42, 45, 74), and the work of the Russians (78). Nevertheless, they do illustrate a number of important trends in this new literature, about which psychoanalysts should be informed. Moreover, one of the books, the Colorado symposium, is itself quite a good sample of major developments in cognitive psychology, containing essays by a half dozen of the most important workers in this field, each of whom is represented by a book in the bibliography. Let us begin with it, therefore. Contemporary Approaches to Cognition (33) is highly recommended to analysts who wish to learn something about this whole area. The contributions vary considerably in their level of difficulty and the amount of prior contact with academic cognitive psychology they presuppose, yet all are worthy of close study, and the chapter on “Cognitive Structures” by Rapaport – the one most immediately accessible to psychoanalysts, and the only
one that considers data from psychopathology – is a small classic of ego psychology, one of Rapaport’s important statements about the neglected concepts of psychic structure and states of consciousness. The book begins with a characteristically difficult but important essay by the late Egon Brunswik. His major work (28) is more widely known and respected by psychologists than it is read, for Brunswik not only had an original and highly abstract set of ideas but clothed them in a jargon of barbaric density. He alludes, for example, to “problems of textural ecology,” by which he means problems of the psychologically significant structure of man’s environment – all of which are swallowed, for psychoanalysis, by the blandly encompassing term, “reality.” Moreover, he seems at first to be dealing with problems of little concern for analysts: details of how we achieve perception of distance, for example, and just how correct it usually is under everyday conditions. Yet his ideas are of fundamental importance to any theory that wants even grossly to cover the problems of adaptation, for the considerations he advances are at the heart of man’s relation to reality. Without a mastery of his conceptual apparatus, we come up against insoluble problems in the study of cognitive development, for example. Yet I hesitate to try to expound Brunswik here; he is too condensed, and his relevance not readily elucidated. Let me only indicate how he locates the problems of cognition. For Brunswik, the real objects that consensual validation assures us surround the percipient organism are distal; they give rise to patterns of radiant energy which at the moment of their impact on the sense organs comprise the proximal object, an objective state of affairs that changes constantly while the distal object remains constant. Consider the mother moving about the nursery and cooing to her child; she is a relatively stable distal object. 
To the baby’s retina, however, she presents an ever-changing pattern of light waves, which undergoes radical transformation as she approaches a step nearer and as the infant moves his head or eyes. But this is only the beginning of ambiguity; there are two further cognitive stages: the peripheral events in the organism, the configuration of retinal excitations and auditory stimulations in the sense organ proper, which are transmitted and transformed by various way stages until they attain central representation in the projection areas of the cortex. In this system, there are problems of what Heider calls mediation (how information is transmitted at various points between distal object and central process), and of what Brunswik calls “achievement” or “functional validity”: “the over-all correspondence between a certain distal and a certain central variable, so that the former could be considered successfully mapped into the latter.” More simply, it is the basic epistemological problem: how does man attain veridical knowledge of the world? And Brunswik shows a number of ways in which it can be fruitfully studied. It is no news to psychoanalysts that, by and large, they have neglected the study of reality in their fascination with inner dynamics; but it is interesting to hear Brunswik charge psychology with a similar neglect: “Both historically
and systematically psychology has forgotten that it is a science of organism-environment relationships, and has become a science of the organism.” There is one notable exception, Fritz Heider (48), the pioneer who has taken more seriously than perhaps any other psychologist the problems of mediation and of the structure of the environment, and who first brought them to Brunswik’s attention. In one place, Brunswik mentions psychoanalysis (toward which he was generally sympathetic): “Psychoanalytic mechanisms [of defense, that is] are an expression of vicarious functioning,” he writes (p. 22). The significance of this statement depends first on a realization that he believed “vicarious functioning is one of the most fundamental principles, if not the most fundamental principle, of behavior.” This asserts that man is not a stimulus-response machine, but that many external events may have the same significance for a person (thus are vicarious substitutes one for another) and he may respond by a great variety of specific acts, all having the same meaning and readily interchangeable. In line with its tendency to put as much as possible within the organism, making hypothetical energic transformations the fundamental explanatory principle, psychoanalysis conceptualizes as displacements or transformation of instincts events that can be simply accounted for in terms of the looseness of coordination between stimulus cues, with their limited validities as indicators of what is the real state of affairs in the distal object, and centrally evoked meanings. Just because there is this ambiguity of relation between aspects of reality and proximal events (e.g., retinal), adaptation demands that man be able to use, in a flexibly vicarious way, a great variety of cues; and to use them not at random but in a hierarchical order following their actual ecological validities. 
It would miss the point entirely and locate in the realm of primary process something that is an essential part of secondary process and contact with reality, if we considered such substitutions to be displacements. At this point, it may be meaningful to digress a bit and take up a book by Sarbin, Taft, and Bailey (64), Clinical Inference and Cognitive Theory, for it professes to be much influenced by Brunswik. It is an unfortunate example of how much easier it is to copy mannerism than to absorb substance; the worst features of Brunswik’s writing are here together with much of his idiosyncratic vocabulary, but without his underlying clarity of thought. Taking off from the controversial attack on clinical thinking in Paul Meehl’s influential Clinical vs. Statistical Prediction (University of Minnesota Press, 1954), these authors restate the epistemological problem for the clinician, and set up a hypothetical analysis of the ways he proceeds from observations to inferred propositions about the distal object: the patient. Their answer, briefly, is that all cognition is the ordering of data into categories, which it pleases them to call “modules.” The process of statistical inference, too, is one of ordering data into categories, and the book’s principal conclusion is “that, in principle, clinical inference is only a special form of statistical
inference” (p. 267). Only a few lines later, as well as in many other places in the book, we read that major premises often used by clinicians “are developed through non-inductive processes, namely, analogy and construction” – that is, from theory. If so, they are not empirical generalizations or summaries of statistical experience, and it puzzles me how Sarbin and his collaborators can square this with their main conclusion. What is statistical about statistical inference is not the syllogistic form but the way the major premises are formed. Thus, it is misleading and incorrect to claim that clinical inference is a “special form” of the statistical. There is much in the book that I agree with, yet the tone, the slant, the interpretation, continually rub me the wrong way, and in fact seem inconsistent with much of the argument. It is enough of a contribution to show the clinician that much of what he does is in fact based on statistical reasoning, and to urge the use of more exact and formalized experience tables for decisions of this kind. Why not admit, however, that there is such a thing as a creative or a synthesizing cognitive act, and that clinical judgment involves a good deal that is qualitatively different from statistical inference? The need to deny these obvious facts gets the authors into a couple of fallacies. They show that nonstatistical forms of reasoning can lead to error, giving them only the unimportant status of flaws in the inferential system, which may help the authors to ignore their existence. This argument overlooks the phenomena implied by the concept of regression in the service of the ego: in creating new ideas, new forms, new theories, we can use primitive, illogical, even magical kinds of thought processes. A single case is obviously not a reliable base for a hypothesis, yet the hypothesis may prove true. 
The fact that the primary process often disregards logic and reality does not mean that it necessarily leads to error: it only means that hypotheses must be tested. But these authors seem unaware that anything like the primary process exists; if they had to recognize it, they would undoubtedly dismiss it as only a faulty and thus unimportant variant of logical (for us, secondary-process; for them, statistical) thinking. Sarbin et al. try to deal with the sticky problem of creativity essentially by denying that it exists except within a narrow range. Since their model for thought involves only the manipulation of empirically built-up premises, it is embarrassed by the problem of novelty; therefore, in trying to refute Meehl’s argument that clinical intuition involves creative cognitive acts, they minimize the extent to which novelty actually exists and stress the fact that “no creative act emerges from the void without knowable antecedents” – obviously a strawman argument. Of course there are antecedents; of course nothing is wholly new; but how do we account for the part that is new? Only through the “development and employment of previously undiscovered or unutilized species or classes,” we are told (p. 82). Although this is stated in a grudging way that seems to imply that it doesn’t amount to much, it is a considerable admission; but how about novel configurations of concepts? There is not a word on this
issue, because this aspect of clinical work is ignored in a procrustean restriction of clinical inference to processes that can be fitted to a syllogistic model. Finally, we read the curious, truistic statement that the creative acts of the clinician “are predictable with a probability p.” Does this, then, deny that they are creative? Is the value of p irrelevant, no matter how small? This is a variant of an argument that pervades the book: a false dichotomy is set up between statistical inference, on the one hand, and intuition defined as a mystical bolt from the blue, a revelation not amenable to empirical analysis, on the other. The latter is not hard to refute, and so the (specious) demonstration is apparently complete that nothing is left but statistical inference. Despite the apparent promise in its title that a psychoanalyst who is interested in thought processes will find Clinical Inference and Cognitive Theory especially rewarding, quite the opposite is the case. The book’s merits are far overshadowed by the fallacies mentioned, by the preposterous difficulties of its style, and by a pervasively static outlook to which psychodiagnosis is mere pigeonholing. All of this is a far cry from Brunswik, to whom this book was dedicated, as was that of Bruner, Goodnow, and Austin (27). Next in the Colorado symposium is a characteristically graceful and interesting paper by Jerome S. Bruner (director of the already-mentioned Center for Cognitive Studies), on “Going Beyond the Information Given.” In this chapter, as in his book, A Study of Thinking (27), the Harvard psychologist is so literate and persuasive that one may easily miss the fact that his contribution is more a synthesizing than a very deep or original one. These remarks are largely a response to what strikes me as overpraise of his book in some early reactions: sound, thorough, suggestive, insightful, and valuable it certainly is, but hardly a “magnificent book . . . 
a revolution in the psychology of thought,” as Jean Piaget was incautious enough to comment for the dust jacket. It covers far more intelligibly much of the same territory as Sarbin’s book – the organization of knowledge by means of conceptual categories – and is well worth reading, at least its first few, general chapters; most psychoanalysts will be less interested in the bulk of the book, which reports specific experiments. To a reader who is already familiar with the extensive treatments of conceptual thinking in Rapaport’s Diagnostic Psychological Testing and Organization and Pathology of Thought (6), much of the general discussion will seem like familiar thoughts put into slightly different words (though the book is quite without reference to either of these books and does not mention Rapaport’s name). In the Colorado essay as well as in his own book, Bruner has interesting things to say about the adaptive utility of concepts, which enable man to go beyond the information given him; he ingeniously brings together a wide variety of sources and shows a common thread of relevance to his point. But in the end it is difficult to say just what was specifically Bruner. Perhaps I am overvaluing originality, and being overcritical of this gifted, widely learned, highly productive, and eminently useful middleman of ideas.
One of Bruner’s contributions is his ability to point out relevance of an idea in many diverse fields, or to see the point that many seemingly ill-assorted sources have in common. He is in this respect a kind of virtuoso of the very process he and his collaborators studied: concept attainment. One of the fields he brings into relation to his thesis is linguistics; and here he has the doughty assistance of Roger Brown, who contributed a meaty, 65-page appended chapter on “Language and Categories.” Brown is one of those interesting new scientific hybrids, the psycholinguists; Charles Osgood, who contributed the next chapter in the Colorado symposium, is another. Osgood declares his allegiances with his chapter title: “A Behavioristic Analysis of Perception and Language as Cognitive Phenomena.” He is indeed a behaviorist, but one who has been induced to expand his purview by an attempt to take “verbal behavior” seriously. As psychologists have lifted their eyes from the simplified artificial situation of the association test, in which one word calls forth one other, and have caught sight of the intricate beauty and highly organized complexity of ordinary, connected, meaningful discourse, they have begun to see that these woods are something more than rows of trees. Moreover, they have discovered there a hardy band of woodsmen, the structural linguists, already possessed of a great deal of lore about this awesome realm. Linguists have developed a formidable body of rigorous method and theory for searching out the order within language and conceptualizing it quite without any concern for its psychological origins. One of the saving virtues the behaviorists have always had has been a genuine respect for rigor. 
Even though it has led many of them to reject psychoanalysis out of hand for its conspicuous lack of such properties, this value enabled behavioristically trained psychologists such as Osgood and Mowrer (61), of Illinois, and Russell and Jenkins of Minnesota (to name only two well-known university centers of this kind) to see the merits of structural linguistics and the possibility of a fruitful collaboration of disciplines. Out of such unions, in which nonbehaviorists also took part, the field of psycholinguistics was born. It is showing all the signs of hybrid vigor, and much contribution to cognitive psychology may be expected from it (18, 20, 34, 40). Osgood’s paper (which I shall not attempt to summarize, since it is a technical and detailed analysis, partly in neurophysiological terms) introduces his ingenious technique of studying connotative meaning, the semantic differential, which he describes in a great deal more detail in his recent book (38). The essence of the device is to get people to rate any word or concept on a variety of rating scales, which may be related to it only tangentially or physiognomically. People agree that the term HERO is “fast” rather than “slow,” “heavy” instead of “light,” and more “clean” than “dirty.” Statistical studies show that three and only three independent dimensions of connotative meaning, sampled by the terms just quoted, recur again and again. This flexibly useful tool has won a far broader acceptance than the theory, in which coding and mediation are key conceptions.
With the aid of these concepts, Osgood attains a level of sophistication rather characteristic of neo-behaviorists in recent decades. It is no longer possible for psychoanalysts simply to brush their work aside as “obviously fallacious” or “inherently too limited to be taken seriously.” David Rapaport’s discussion, which follows the paper, is quite respectful, yet points out several deficiencies from the vantage point of ego psychology: the theory’s lack of any concept of or explanation for autonomy, the instability of conditioning as a theory of learning on which the structural aspects of Osgood’s theory are based, and its neglect of the problem of motivation. One of Bruner’s principal concepts, which played something of an economic role in his theory, was cognitive strain and the need to minimize it. In his essay on “The Relation between Behavior and Cognition,” Leon Festinger introduces the apparently similar conception of cognitive dissonance, which has, however, had a much more massive impact. Like so many of the other contributors, he set his ideas forth at greater length in a contemporary book (35); he and his students have published extensively on his theory and experiments growing out of it since then. A new book by Brehm and Cohen (73) is the most recent addition to a fast-growing literature. Obviously, there are some productive ideas here. The underlying conception is very simple. Festinger postulates “that, if a circumstance should arise such that some cognitive elements do not fit or are not in line with a person’s actions, there will arise pressures directed toward changing these dissonant cognitive parts” (p. 128). In short, he has discovered intrasystemic conflict and the synthetic function of the ego, operating on cognitive materials in a way Breuer and Freud nicely described in 1893. But they did not discover what philosophers have called the strain for consistency, either; it is an ancient conception. 
Festinger’s merit lies in having observed sharply some of the kinds of behavior that are so motivated, and in having set up a theory about it from which a number of nonobvious predictions can be derived and tested. As Bruner points out in his discussion, some of them are obvious and Aesop-like enough: the theory predicts that the inaccessible grapes will be judged sour. And in the brief presentation that was possible here, in the course of which three experiments are described, Festinger could not go into some of its more interesting and less elementary aspects. His success has been such that it may be desirable to point out the theory’s limitations: cognitive dissonance does not try to account for all of cognition, but deals only with a range of problems in the realm of the synthetic function. At least, however, his work shows that this single aspect of autonomous ego functioning can have quite powerful motivating effects on behavior. Rapaport’s paper on “Cognitive Structures” is, except for Fritz Heider’s wise concluding remarks, the last chapter in the Colorado book. Rapaport never made a more richly concrete presentation of his ideas than in the present paper. Particularly fascinating are his detailed descriptions and structural
analyses of sequences of his own images, reveries, fantasies, and dreams – the yield of an extraordinary feat of self-discipline. He trained himself in a kind of near-automatic writing so that he was able to make a running record of the contents of experience as he passed from drowsiness into sleep, a series of states of consciousness accompanied by demonstrable changes in many detailed formal features of thought. Altogether, it constitutes an extremely valuable clinical exposition of primary-process ideation (cf. also his paper in [2]). The material of self-observation is well supplemented by clinical observations of an amnestic case and test data from schizophrenics, all interpreted to support the propositions “that varieties of consciousness are themselves organized means of cognition . . . [and] that we are dealing here with quasistable cognitive organizations that use different tools or mechanisms of cognition, and are themselves organized means of cognition” (p. 180). The concept of states of consciousness has had a curious history in psychoanalysis. Breuer, of course, expounded the idea that there were special hypnoid states, with their own unique structural characteristics and contents in poor communication with those of other states of mind – all this was central to his understanding of hysteria. The origins of the conception are plain to see in the contemporary French psychopathology, impressed as it was ever since Mesmer by the phenomena of that special state of consciousness, hypnosis. For a youthful Freud, fired with his first dynamic hypotheses and insights into the defensive nature of hysterical symptoms, what was valid in the older views was much less impressive than the fact that they could be used as a conservative resistance against his own bold propositions. 
His intellectual development required first the mastery of the French approach through his apprenticeship with Charcot and then the rejection of hypnosis and most of the ideas associated with it. It has apparently required the long years during which the dynamic point of view reigned supreme in psychoanalysis to make it possible for analysts to turn serious attention to such structural considerations as states of consciousness without feeling it an act of disloyalty or resistance. So we have seen, on the one hand, the general growth of ego psychology, and on the other, the rise of interest in hypnosis (cf. the important book by Rapaport’s colleagues Gill and Brenman [47]), what Fliess has aptly called The Revival of Interest in the Dream,3 and the remarkable revival of attention to the work of Pötzl, Allers and Teler (62) on subliminal phenomena, with which the names of Charles Fisher and George S. Klein are prominently associated. Important parallel and recent developments outside psychoanalysis have also turned scientific attention to states of consciousness and correlated cognitive phenomena: the physiologically oriented approach to sleep and dreams growing out of the work of Kleitman and Dement (and related work on the cognitive effects of sleep deprivation and dream deprivation), and the enormously influential work of Magoun, Lindsley, Jaspers, and others on the structure and function of the reticular activating systems of the brainstem.
From this last line of work has come the important physiological concept level of arousal, which clearly seems coordinated with the psychological phenomena Rapaport called states of consciousness. An exciting field of collaborative work is emerging in which such teams as the physiologist Dement and the psychoanalyst Fisher can bring together the insights and methods of these two independent traditions on phenomena of this type. Dreams, subliminal influences on thought, altered states of consciousness – all these are at the center of the stage for the non-Freudian analysts Tauber and Green, and are summed up as forms of Prelogical Experience (53). The range they try to cover is even broader, taking in all forms of symbols, and phenomena as diverse as countertransference and extrasensory perception – truly a fascinating set of off-beat phenomena. One approaches the book with hopes that a Sullivanian approach to this underworld may provide some fresh insights about the primary process. The authors have a commendable striving for breadth; they read widely, quote many authors, pick up and handle many concepts, and cite quite a few experimental studies. Somehow, however, nothing much seems to come of it all. What tries to be breadth finishes as footless eclecticism; when they quote an interesting set of observations, it ends up as an isolated datum for lack of any conceptual framework within which diverse contributions might have been integrated. One has only to recall Organization and Pathology of Thought to realize that it is perfectly possible to piece together ideas and facts from diverse sources in a useful way; but these authors lack not only Rapaport’s synthetic powers but his theory. The difficulty goes beyond the lack of a stout theory to give the book backbone. 
The authors always fail to come to grips with a problem fully: they open it up, talk around it, and occasionally throw out an original observation or insight – for they are intelligent men – but then it somehow slips through their fingers in a lather of frothy phrases about the richness of human experience, or the like. The key to the difficulty may be found, I believe, in the point of view expressed in such a passage as this: “The rational process is actually at its best when it is dealing with past established structural entities; but when applied to emerging material, to emerging facts, it may actually introduce an irrational note by virtue of the fact that it is not applicable as such to emerging material” (p. 73). The argument is, in effect, that the secondary process cannot really grasp the primary process: therefore, we should abandon the attempt to be clear, logical, and rational when discussing what is intrinsically unclear, irrational, and inchoate. This is the logic of “set a thief to catch a thief” – which is plausible until you start to argue that all good policemen must be practicing criminals. It is the attitude of the notorious group of psychopharmacologists who recently proclaimed that it was fallacious to try to study the effects of LSD unless the experimenter was under the influence of the drug himself at the same time as the subjects. In this extreme example, we can see the fallacy: empathy and intuition play an indispensable role when one studies the primary process, but they are not identical with losing oneself
in it. Fortunately for man’s survival, it is perfectly possible for the secondary process to deal with data obtained during moments of controlled regression; otherwise we should have no science of psychoanalysis at all. At the risk of getting beyond my sphere of competence, I cannot resist pointing out the dangerous implications of this point of view when applied to technique. This same kind of failure to see the limits and the proper role of empathy leads Tauber and Green to advocate acting out in the countertransference as a therapeutic strategy. They have a valid point, that the analyst may at times be unable to grasp the unconscious significance of the patient’s behavior except through seeing its effect on his emotions (what H. A. Murray called recipathy); but it is perfectly possible for the analyst to let only enough of this kind of affect develop to recognize and use it, without burdening the patient by countertransference confessions, all done in the self-justifying name of “spontaneity” and being honest with the patient. The links of this outlook to the anti-intellectualism of Zen and the old-fashioned verstehende psychology are apparent enough. It is sad to see interesting data on fringe experiences misused in an attempt to justify the abandonment of controls by invoking fidelity to the “prelogical” sources of what is creative and spontaneous in human behavior. It is even sadder to reflect on the consequences for the patient of the therapeutic approach advocated here. For the reader who turns from this slight book to Rapaport’s paper in the Colorado symposium, there could be no more convincing refutation of the Tauber-Green hypothesis: their attempt gets them nowhere, while the application of a sharp critical intelligence to the same kinds of twilight phenomena greatly enriches our understanding. 
The last book to be considered here also is devoted to the realm of imagery, creative and abnormal thought, dreams and the products of lowered states of vigilance. In his Imagination and Thinking, Peter McKellar (36) draws on a British tradition of gentlemanly curiosity about odd mental phenomena that goes back to Sir Francis Galton. With a very British kind of urbanity, he converses with us informally yet elegantly about neglected aspects of experience, into which he has delved over a period of years. He begins with a distinction between A-thinking and R-thinking, which turns out to be exactly Freud’s differentiation of primary and secondary process under another name. Oddly enough, though McKellar shows a pretty good grasp of a few of Freud’s basic works, he never refers to these concepts and fails to integrate his own observations with the psychoanalytic tradition. Nevertheless, the very fact that so many of the facts gathered together in this small and pleasantly written book do not come from psychoanalytic sources may make it worth the while of analysts to read McKellar’s contribution. It adds a little to the study of the primary process, and is quite easy to absorb into a psychoanalytic conceptualization. One of the principal merits of Imagination and Thinking is its meticulous attention to the form-varieties of mental imagery, a much neglected and
fascinating topic. Analysts who have been impressed by Isakower’s observations concerning hypnagogic imagery will be interested by this report of McKellar’s large-scale studies of these images occurring “between wakefulness and sleep” (one of his chapter headings), though perhaps a bit disappointed by his failure to do much with the ramifications of such phenomena into the realm of unconscious motivation. He is a natural historian with a touch of statistics, very much after Galton’s model, rather than a seeker for functional relatedness. He misses, therefore, the fact that “A-thinking” is not merely associative and uninhibited but has laws of its own – those of the primary process. Cognitive psychology has many faces, as even this cursory survey shows. To change the metaphor, it is growing rapidly at several separate points. Equally healthy and promising growth has been taking place in sectors of the field that are informed with a knowledge of psychoanalysis and others in which the term itself is anathema. Yet little of the old-fashioned ignorant and prejudiced rejection of psychoanalytic contributions is heard any more. Ours is a more sophisticated era, one in which Gestalt psychology (once a stronghold of cognitive psychology [cf. Wertheimer, 55]) is disappearing as such, and behaviorism is mellowing, having won its point about the need for objectivity. Now that workers of so many persuasions can deal with many of the same cognitive problems, there is renewed hope that research in this field may stimulate a truly embracing theory, a synthesis of academic and psychoanalytic psychologies that will be more powerful and more comprehensive than either alone.
A Bibliography of Recent Books on Thinking and Cognitive Psychology

Note: The coverage of the core topic is relatively complete for the dozen years under consideration, but books in the following tangential areas have been included only rather unsystematically if at all: testing, intelligence, hypnosis, parapsychology, perception, learning, dreams, psycholinguistics, communication and information theory, prejudice and other attitudes, forcible indoctrination, computer simulation of cognitive processes, and works centered on neurological or physiological approaches.
Notes

1. In a paper, “Imagery: The Return of the Ostracized.” Amer. Psychologist, 19:254–264, 1964.
2. Rapaport, D., Gill, M. M., & Schafer, R. Diagnostic Psychological Testing, 2 Vols. Chicago: Yearbook Publishers, 1945–1946.
3. New York: International Universities Press, 1953.
References

1951
1. Abramson, H. A., ed. Problems of Consciousness: Transactions of the First Conference (1950). New York: Josiah Macy, Jr. Foundation.
2. Abramson, H. A., ed. Problems of Consciousness: Transactions of the Second Conference (1951). New York: Josiah Macy, Jr. Foundation.
3. Blake, R. R. & Ramsey, G. V., eds. Perception: An Approach to Personality. New York: Ronald Press.
4. Humphrey, G. Thinking. London: Methuen.
5. Piaget, J. Play, Dreams and Imitation in Childhood (1945). New York: Norton.
6. Rapaport, D., ed. Organization and Pathology of Thought. New York: Columbia University Press.

1952
7. Abramson, H. A., ed. Problems of Consciousness: Transactions of the Third Conference (1952). New York: Josiah Macy, Jr. Foundation.
8. Ashby, W. R. Design for a Brain. New York: Wiley.
9. French, T. M. The Integration of Behavior, Vol. 1. Chicago: University of Chicago Press (Vol. 2, 1954; Vol. 3, 1958).
10. Piaget, J. The Child’s Conception of Numbers (1941). New York: Humanities Press.
11. Piaget, J. The Origins of Intelligence in Children (1936). New York: International Universities Press.
12. Vinacke, W. E. The Psychology of Thinking. New York: McGraw-Hill.

1953
13. Abramson, H. A., ed. Problems of Consciousness: Transactions of the Fourth Conference (1953). New York: Josiah Macy, Jr. Foundation.
14. Price, H. H. Thinking and Experience. Cambridge: Harvard University Press.
15. Walter, W. G. The Living Brain. New York: Norton.

1954
16. Abramson, H. A., ed. Problems of Consciousness: Transactions of the Fifth Conference (1954). New York: Josiah Macy, Jr. Foundation.
17. Jackson, W., ed. Communication Theory. London: Butterworths Sci. Publ.
18. Osgood, C. E. & Sebeok, T. A., eds. Psycholinguistics: A survey of theory and research problems. J. Abnorm. Soc. Psychol., 203 pp. suppl.
19. Piaget, J. The Construction of Reality in the Child (1937). New York: Basic Books.
20. Revesz, G., ed. Thinking and Speaking: a symposium. Amsterdam: North-Holland.
21. Witkin, H. A. et al. Personality Through Perception. New York: Harper.

1955
22. Allport, F. H. Theories of Perception and the Concept of Structure. New York: Wiley.
23. Johnson, D. M. The Psychology of Thought and Judgment. New York: Harper.
24. Kelly, G. A. The Psychology of Personal Constructs, 2 Vols. New York: Norton.
25. Quastler, H., ed. Information Theory in Psychology. Glencoe, Ill.: Free Press.
26. Patrick, C. What Is Creative Thinking? New York: Philosophical Library.

1956
27. Bruner, J. S., Goodnow, J. J., & Austin, G. A. A Study of Thinking. New York: Wiley.
28. Brunswik, E. Perception and the Representative Design of Experiments, 2nd ed. Berkeley: University of California Press.
29. Odier, C. Anxiety and Magic Thinking (1948). New York: International Universities Press.
30. Piaget, J. & Inhelder, B. The Child’s Conception of Space (1948). New York: Humanities Press.
31. Russell, D. H. Children’s Thinking. Boston: Ginn.
32. Smith, M. B., Bruner, J. S., White, R. W., Aberle, D. F., et al. Opinions and Personality. New York: Wiley.

1957
33. Bruner, J. S., Brunswik, E., Festinger, L., Heider, F., Muenzinger, K. F., Osgood, C. E., & Rapaport, D. Contemporary Approaches to Cognition. Cambridge: Harvard University Press.
34. Cherry, C. On Human Communications. New York: Wiley.
35. Festinger, L. A Theory of Cognitive Dissonance. Evanston, Ill.: Row, Peterson.
36. McKellar, P. Imagination and Thinking. New York: Basic Books.
37. Olsen, F. et al. The Nature of Creative Thinking. New York: New York University Press.
38. Osgood, C. E., Suci, G. J., & Tannenbaum, P. H. The Measurement of Meaning. Urbana: University of Illinois Press.
39. Piaget, J. Logic and Psychology (1953). New York: Basic Books.
40. Skinner, B. F. Verbal Behavior. New York: Appleton-Century-Crofts.

1958
41. Bartlett, F. C. Thinking. New York: Basic Books.
42. Broadbent, D. E. Perception and Communication. New York: Pergamon Press.
43. Kubie, L. S. Neurotic Distortion of the Creative Process. Lawrence: University of Kansas Press.
44. Inhelder, B. & Piaget, J. The Growth of Logical Thinking from Childhood to Adolescence (1955). New York: Basic Books.

1959
45. Attneave, F. Applications of Information Theory to Psychology. New York: Holt.
46. Gardner, R., Holzman, P. S., Klein, G. S., Linton, H., & Spence, D. P. Cognitive Control: A Study of Individual Consistencies in Cognitive Behavior [Psychological Issues, Monogr. No. 4]. New York: International Universities Press.
47. Gill, M. M. & Brenman, M. Hypnosis and Related States. New York: International Universities Press.
48. Heider, F. On Perception, Event Structure, and the Psychological Environment [Psychological Issues, Monogr. No. 3]. New York: International Universities Press.
49. Luchins, A. S. & Luchins, E. H. Rigidity of Behavior. Eugene: University of Oregon Books.
50. Paul, I. H. Studies in Remembering: The Reproduction of Connected and Extended Verbal Material [Psychological Issues, Monogr. No. 2]. New York: International Universities Press.
51. National Physical Laboratory. Mechanization of Thought Processes, 2 Vols. London: Her Majesty’s Stationery Office.
52. Schachtel, E. G. Metamorphosis. New York: Basic Books.
53. Tauber, E. S. & Green, M. R. Prelogical Experience. New York: Basic Books.
54. Thomson, R. The Psychology of Thinking. Baltimore: Penguin Books.
55. Wertheimer, M. Productive Thinking, enlarged ed. New York: Harper.

1960
56. Abercrombie, M. L. J. The Anatomy of Judgment. New York: Basic Books.
57. Berlyne, D. E. Conflict, Arousal and Curiosity. New York: McGraw-Hill.
58. Gardner, R. W., Jackson, D. N., & Messick, S. J. Personality Organization in Cognitive Controls and Intellectual Abilities [Psychological Issues, Monogr. No. 8]. New York: International Universities Press.
59. Harms, E. & Guilford, J. P., eds. Fundamentals of psychology: the psychology of thinking. Trans. N. Y. Acad. Sci., 91:1–158.
60. Miller, G. A., Galanter, E., & Pribram, K. H. Plans and the Structure of Behavior. New York: Holt.
61. Mowrer, O. H. Learning Theory and the Symbolic Processes. New York: Wiley.
62. Pötzl, O., Allers, R., & Teler, J. Preconscious Stimulation in Dreams, Associations, and Images: Classical Studies [Psychological Issues, Monogr. No. 7]. New York: International Universities Press.
63. Rokeach, M. The Open and Closed Mind. New York: Basic Books.
64. Sarbin, T. R., Taft, R., & Bailey, D. E. Clinical Inference and Cognitive Theory. New York: Holt, Rinehart & Winston.
65. Shands, H. Thinking and Psychotherapy. Cambridge: Harvard University Press.
66. Solley, C. M. & Murphy, G. The Development of the Perceptual World. New York: Basic Books.
67. Wolff, P. H. The Developmental Psychologies of Jean Piaget and Psychoanalysis [Psychological Issues, Monogr. No. 5]. New York: International Universities Press.

1961
68. Church, J. Language and the Discovery of Reality: A Developmental Psychology of Cognition. New York: Random House.
69. Fiske, D. W. & Maddi, S. R. Functions of Varied Experience. Homewood, Ill.: Dorsey Press.
70. Harvey, O. J. Conceptual Systems and Personality Organization. New York: Wiley.
71. Restle, F. Psychology of Judgment and Choice. New York: Wiley.
72. Schwartz, F. & Rouse, R. O. The Activation and Recovery of Associations [Psychological Issues, Monogr. No. 9]. New York: International Universities Press.

1962
73. Brehm, J. W. & Cohen, A. R. Explorations in Cognitive Dissonance. New York: Wiley.
74. Garner, W. R. Uncertainty and Structure as Psychological Concepts. New York: Wiley.
75. Getzels, J. W. & Jackson, P. W. Creativity and Intelligence. New York: Wiley.
76. Gruber, H., Terrell, G., & Wertheimer, M., eds. Contemporary Approaches to Creative Thinking. New York: Atherton Press.
77. Messick, S. & Ross, J., eds. Measurement in Personality and Cognition. New York: Wiley.
78. Vygotsky, L. S. Thought and Language (1934). New York: Wiley.
27
The Advancement of Learning
Ann L. Brown
Neither the hand nor the mind alone would amount to much without aids and tools to perfect them. (Bacon, Novum Organum, 1623)
Source: Educational Researcher, 23 (1994): 4–12.

This loosely translated quotation is taken from Francis Bacon’s Novum Organum, not from Vygotsky, as one might well imagine. In this article, I argue that designing aids and tools to perfect the mind is one of the primary goals of educational research. In this spirit, the major themes of the article are that:

• Instruction is a major class of aids and tools to enhance mind.
• To design instruction, we need appropriate theories of learning and development.
• Enormous advances have been made in this century in our understanding of learning and development.
• School practices in the main have not changed to reflect these advances.
• The question posed is, Why?

My title, The Advancement of Learning, is also taken from Bacon (1605). The title is a metaphor, as I will view the advancement of learning particularly during the 30 years or so since the cognitive revolution. Contemporary theories, unlike those of the past, concentrate on the learning of complex ideas as it occurs in authentic situations including, but not limited to, schools. In keeping with Bacon, I will paint a general picture of progress but at the same time add a cautionary note concerning the infanticide rate of our profession. We repeatedly throw out babies along with bathwater, when we
should build cumulatively. No community can afford to lose so many valuable offspring in the service of progress. I will begin with a personal odyssey. In rereading the Presidential Addresses from the past 10 years or so, I realized that this genre, the odyssey, is a popular one. Indeed, the metaphor of an odyssey was the leitmotif of Eliot Eisner’s 1993 address. Pivotal to this narrative genre is the retelling of the myriad interesting life experiences of those who subsequently went on to become President of AERA. Now here’s my problem. I am a psychologist. I have always been a psychologist of sorts. I started my academic career as an undergraduate studying learning, and I am still doing that today, in my fashion. But what I did then and what I do now are as distinct as night and day. I was well prepared for my career as a learning theorist. In high school, I specialized in 18th century literature and 19th century history, and was on my way to study history in college. Why switch? I saw a television program on animal learning, on how animals learn naturally in their environments, an introduction to ethology. The heroes of this piece were Huxley, Lorenz, Thorpe, and Tinbergen. Fascinated, I looked up animal learning in my handy guide to universities and found that to study learning you needed a degree in psychology. Thus prepared I set out for an interview, having seen one television program on ethology and having read Freud’s Psychopathology of Everyday Life on the train getting there. By chance the head of department was an expert in 18th century literature. We discussed poetry for 2 hours. I got a scholarship to study psychology! So in the early 60s I started out for London to study animal learning. I arrived in Iowa, or maybe it was Kansas, feeling a little like Dorothy in The Wizard of Oz. The cognitive revolution had not yet come to London. What followed was 3 years of exposure to behaviorist learning theory. 
Rather than learning about animals adapting to their natural habitats, I learned about rats and pigeons learning things that rats and pigeons were never intended to learn. Pan-Associationism. Experimental psychologists in England (and Iowa) at that time were enthralled with a certain form of behaviorism. Dominating the field were the all-encompassing learning theories of Hull/Spence, Tolman, and Skinner.1 These theories shared certain features that limited to a greater or lesser extent their ability to inform educational practice. All derived their primary data from rats and pigeons learning arbitrary things in restricted situations. They shared a belief that laws of learning of considerable generality and precision could be found. These basic principles of learning were thought to apply uniformly and universally across all kinds of learning and all kinds of situations. The principles were intended to be species-, age-, domain-, and context-independent. Pure learning was tested in impoverished environments where the skills to be learned had little adaptive value for the species in question. Paul Rozin (1976) argued that by studying the behavior
of pigeons in arbitrary situations, we learned nothing about the behavior of pigeons in nature, but a great deal about the behavior of people in arbitrary situations. I will illustrate with a surely apocryphal tale related by Mary Catherine Bateson (1984). Her father Gregory Bateson’s favorite tongue-in-cheek psychologist anecdote was the following: It occurred to a thoughtful rat-runner after many years of running rats that as rats do not usually live in mazes, mazes were perhaps less than optimal testing grounds for learning. Therefore, he bought a ferret, a species that in nature does hunt in mazes – rabbit warrens. He baited a maze with fresh rabbit meat and set the ferret to find it. On the first day, the ferret systematically searched the maze and found the rabbit quicker than a rat. But what happened on the second day? The rat, as expected, searched the maze and found the bait more quickly than on the original trial. Learning was said to have occurred. But not so for the ferret. It searched the maze and came to the route that had previously led to the reward, but didn’t go down it. Why? He’d eaten that rabbit yesterday. What the ferret had learned was colored by its expectation of how the world works – for ferrets. (anecdote adapted from pp. 170–171)
How did this dominance of certain forms of behaviorism come about? Psychology as a nascent science didn’t start out that way. One of the few female pioneers in the early part of the century, Mary Calkins (1915), criticized the overwhelmingly male establishment by arguing that psychology started out as the study of consciousness and then set about to explain it away, even to deny its existence. Throughout her career she argued, in the wilderness, that psychology should be the study of “conscious interacting social selves in relation to other selves and objects.” Vygotsky, perhaps, but a far cry from Thorndike, Watson, and Hull. Animal Learning. The dominance of behaviorism in the mid part of the century has often been blamed on the increasing dependence on animals as experimental subjects. Animals are not known for their introspection, and few investigators were concerned whether animal thought was imageless or not, or whether they entertained theories of mind. This argument does not follow through, however, as early work with animals had a distinctly mentalistic flavor. Leonard Hobhouse, in his delightful book, Mind in Evolution (1901), studied a variety of animals, albeit somewhat informally: One reads that the subjects were: “a dog, a cat, an otter, and an elephant” or “a rhesus monkey called Jimmy and a chimpanzee named Professor.” Using a variety of puzzle-like, meaningful situations (a dog opening a gate to escape its own yard, rather than playing in a Thorndikian puzzle box), Hobhouse found evidence for such mental-sounding entities as purpose, planning, cunning, and deceit, mental entities again being studied today (Griffin, 1992). So too,
during the first world war, Kohler’s chimpanzees, such as the famous Sultan, were also seen to be insightful as they set about building towers of boxes to reach fruit hanging out of reach, or combining short sticks into long ones to reach outside cage bars. This mentalism was almost stamped out, but with notable exceptions, such as Lashley’s rats on the jumping stand experiencing vicarious (mental) trial and error, or Tolman’s rats buried in thought at the start box of a maze, troubled by ideas, hypotheses, and mental maps. Lashley and Tolman were atypical, however; Lashley was trained as an ethologist, and Tolman was always a closet cognitivist, and a self-proclaimed cryptomentalist.2 But to the dyed-in-the-wool behaviorist, learning did not imply conscious intent but rather was seen as the autonomous outcome of the formation of S-R bonds stamped in or out by reinforcement contingencies with no need for conscious intent. This position had powerful implications for education, whose residual clings today. Developmental Psychology. Child psychology underwent a similar history. Although at the beginning of the century we saw ingenious studies of children’s thought (witness those of Binet, Baldwin, Piaget, and Darwin for that matter), they were forgotten, and a large part of the field became imprinted on behaviorism. The Zeitgeist affected not only the theories of learning that were tested but also the methods by which they were examined. What were children asked to learn? Some were asked to stack boxes or use sticks to obtain objects out of reach, just like Sultan the chimp (Sobel, 1939). It did not seem to occur to anyone that a set of boxes more readily affords climbing to an ape than to a less agile human toddler. Others were asked to run mazes! They were “run” through a child-size maze of darkened runways where they had to complete routes to reach goal boxes in a similar inferential pattern to that shown by rats. 
It was not until well into the school years that children performed as well at this as did rats (Maier, 1936)! Again, the fact that running in a darkened maze may be a task suitable to no organism, but better suited to rats than preschoolers, did not seem to be open to debate. Children were tested in cages – well, almost – specifically, a Wisconsin General Test Apparatus designed by Harlow3 for use with monkeys that bit. I assume children in the 1960s were not rabid, and, therefore, the physical protection of the experimenter could not have been a prime motivation for this odd practice, engaged in, I might add, by myself and many of my closest friends. The prime motivation was in fact to minimize social or verbal interactions with the child. Deliberately, the child could not see the experimenter’s facial expressions behind a one-way mirror, and hence could not be influenced by them. The fact that a great deal of learning is inherently social was not a topic of discussion; indeed, we explicitly controlled for such undesirable influences. The point of this little walk down memory lane is not only to amuse you, but also to make the point that it was on the basis of studies like these that
children below 7 or so were deemed incapable of inferential reasoning, insightful learning, and all kinds of logical operations, a position later reinforced by simplistic interpretations of Piaget.
Impact on Education
These developments in psychology impacted educational practice. The dominant learning theories for many years encouraged educational psychologists to concentrate on such external factors as reward schedules and transfer gradients. Transfer could be expected only if identical elements of external situations were held constant, thereby capturing the mind willy-nilly. Even though Thorndike, the originator of much of this, gave up on his position concerning learning and transfer in the late 1920s (Thorndike & Gates, 1929), the theories, albeit somewhat disguised, are still alive today. Equally important was the model of the child that emerged. It was received wisdom that young children had limited attention spans. They got bored easily in those boxes, mazes, and cages. So it was assumed that the young bore easily in any learning situation. Similarly, young children performed abysmally in settings designed to exploit animal wit. As a result they were deemed incapable of inferential reasoning, of performing certain types of classification, of insightful learning and transfer in general. Because of these assumed problems of immaturity, it was believed that children in school should work to mastery on simple decontextualized skills for short periods of time under appropriate reinforcement schedules. Despite this pessimistic legacy, behaviorist theories of learning of the midcentury had their clear value. They were in fact remarkably successful at explaining the range of phenomena they set out to explain. For example, Skinnerian theory gave us token economies, fading, scaffolding, and today, valuable clinical methods, such as those used to control nausea during chemotherapy. Tolman was a clear forerunner of cognitive psychology, lending a legitimacy to mental models and states. And Hullian theory has much to say to contemporary connectionism. 
And in defense of psychologists, those concerned with educational practice were only too ready to adopt these theories in the absence of viable alternatives that did include concerns for context, content, and developmental status. Behaviorist conceptions of learning and development postulated 30 years ago had important implications for instruction, both positive and negative. The theories permeated the language of schooling – and are still in evidence. Lauren and Dan Resnick (1991) have made this point forcibly regarding the state of the art in standardized testing, where the design of tests still reflects behaviorist theories of the past. Cognitive learning theory is only now beginning to have an effect on classroom practice and the testing industry. The vocabulary is slowly changing. The practices lag behind. Where we once had
behavioral objectives, we now have cognitive objectives, although it is sometimes a challenge to find the differences.
New Learning Theory
So what’s new in learning theory? Slowly, the cognitive revolution did come to town and upset many accepted beliefs. A dramatic change occurred in what “subjects” were required to learn, even in laboratory settings, accompanied by a dawning awareness that real-life learning is intrinsically entangled with situations. One cluster of such situations is the classroom. The model of the human learner, including the child, was transformed. Learners came to be viewed as active constructors, rather than passive recipients of knowledge. Learners were imbued with powers of introspection, once verboten. One of the most interesting things about human learning is that we have knowledge and feelings about it, sometimes even control of it, metacognition if you will. And, although people are excellent all-purpose learning machines, equipped to learn just about anything by brute force, like all biologically evolved creatures, humans come predisposed to learn certain things more readily than others. We know now that small children understand a great deal about basic principles of biological and physical causality. They learn rapidly about number, narrative, and personal intent. They entertain theories of mind. All are relevant to concepts of readiness for school, and for early school practices. Those interested in older learners began to study the acquisition of disciplined bodies of knowledge characteristic of academic subject areas (e.g., mathematics, science, computer programming, social studies, and history). Higher order thinking returned as a subject of inquiry. Mind was rehabilitated. Psychologists also began considering input from other branches of cognitive science: anthropology, sociology, linguistics, and they began to consider learning settings outside the laboratory, or even the classroom walls. Clearly a strictly laboratory-based psychological theory of learning is, and always was, a chimera.
Community of Learners
I now turn to my current work in urban classrooms, where my colleagues and I are attempting to orchestrate environments to foster meaningful and lasting learning in collaboration with inner-city grade school students and teachers. We refer to this as the Community of Learners (COL) project (Brown & Campione, 1990, 1994).
How did I get here from there? How did I make the journey from testing kids in cages to designing learning communities? To me the journey felt seamless. From studying rote memory for words and pictures, and strategies to enhance it, I progressed to studying memory for stories, narrative, and expository text. As the human mind does not resemble a tape recorder, memory for texts involves seductive simplification and inadvertent elaboration well documented by Bartlett (1932) in the early part of the century. Inferences and strategies abound, and their development in the young interested me. Texts are understood and re-created in the telling. Understanding admits of degree; monitoring one’s understanding of texts requires far more subtle judgment than monitoring whether one can recall lists of words or sentences. It was this move away from rote learning of discrete stimuli to understanding text that led me down the slippery slope toward an area of research with obvious educational implications: reading comprehension and comprehension monitoring. Children have difficulty in recruiting strategies to help them understand lengthy texts. So too the subjective judgment required to monitor whether or not one has understood presents the developmentally young with difficulty, not surprising given the problems college students have with calibrating their attention to avoid the illusion of comprehension. So, my colleagues and I began a series of studies to help children learn from texts, training individual strategies such as questioning, clarifying, and summarizing to help them monitor their progress (Brown, Bransford, Ferrara & Campione, 1983). This was the precursor to the next step, the design of a reading comprehension instructional intervention that would combine these activities in an effort after meaning. Reciprocal teaching, designed by Annemarie Palincsar and me (Palincsar & Brown, 1984), became that intervention, and, as we will see, it is still a central part of the COL. 
Reciprocal teaching involved the development of a mini-learning community, intent not only on understanding and interpreting texts as given, but also on establishing an interpretive community (Fish, 1980) whose interaction with texts was as much a matter of community understanding and shared experience as it was strictly textual interpretation. It was to capture this influence of common knowledge, beliefs, and expectations that the notion of a community of learners was developed. For the past 10 years or so, my colleagues and I have been gradually evolving learning environments that would deliberately foster interpretive communities of grade-school learners.
Engineering of a Community of Learners
The fundamental engineering principle behind the design of a COL is to lure students into enacting roles typical of a research community. I take this
metaphor seriously. The COL classrooms feature a variety of activities that are essentially dialogic in nature, modeled after research seminars, that when working well facilitate interchange, reciprocity, and community. Theoretically, I imagine such classrooms as enculturating multiple zones of proximal development, to use the now popular Vygotskian (1978) term. A zone of proximal development defines the distance between a child’s current level of learning and the level she can reach with the help of people, tools, and powerful artifacts – tools and aids to perfect mind, in Bacon’s terms. Within these multiple overlapping zones, students navigate by different routes and at different rates. But the push is toward upper, rather than lower, levels of competence. These levels are not immutable, but rather constantly changing as participants become increasingly independent at successively more advanced levels. Practically I imagine classrooms as learning communities that have extensions beyond the classroom walls. I will share with you a few essential components (for fuller details, see Brown & Campione, 1990, 1994). One is that we feature students as researchers and teachers, partially responsible for designing their own curriculum. A variety of collaborative activities encourage this. I will discuss just two of them: reciprocal teaching learning seminars and jigsaw teaching sessions. Reciprocal Teaching. Reciprocal teaching began as a method of conducting “reading group,” once an established ritual of the grade-school class. Reciprocal teaching seminars can be led by teachers, parents, peers, or older students. Six or so participants form a group with each member taking a turn leading a discussion about an article, a video, or other materials they need to understand for research purposes. The leader begins the discussion by asking a question and ends by summarizing the gist of the argument to date. 
Attempts to clarify any problems of understanding take place when needed, and a leader can ask for predictions about future content if this seems appropriate. These four activities were chosen because they are excellent comprehension-monitoring devices. Quite simply, if you cannot summarize what you have just read, you do not understand, and you had better do something about it (for more details, see Palincsar & Brown, 1984). Reciprocal teaching was designed to provoke zones of proximal development within which readers of varying abilities could find support. Group cooperation, where everyone is trying to arrive at consensus concerning meaning, relevance, and importance, helps ensure that understanding occurs, even if some members of the group are not yet capable of full participation. Because thinking is externalized in the form of discussion, beginners can learn from the contributions of those more expert than they. So, unlike many decontextualized skills approaches to reading, skills here are practiced in the context of actually reading. Collaboratively, the group, with its variety of expertise, engagement, and goals, gets the job done; usually the text gets understood. The integrity of the task, reading for meaning, is maintained throughout.
Jigsaw. This idea of learning with a clear purpose in mind is a mainstay of all the components of the Community of Learners. In particular it carries over to our version of Aronson’s (1978) jigsaw classroom. Students are asked to undertake independent and collaborative research. As researchers, they divide up units of study and share responsibility for learning and teaching their piece of the puzzle to each other. How does this work? Classroom teachers and domain area specialists together decide on central abiding themes visited at a developmentally sensitive level. Each theme (e.g., changing populations) is then divided into five or six subtopics (endangered species, rebounding populations, introduced species, etc.), dependent in part upon student age and interest. Each group of students conducts research on one subtopic, and then shares its knowledge by teaching it to others. As a concrete example, recent classes of second graders chose to study animal/habitat interdependence. Some children studied how animals protect themselves from the elements or from predators. Others became experts on animal communication or reproductive strategies. Still others studied predator/prey relations. Design teams were then formed that create habitats for an adopted animal or invent an animal of the future. These design teams were configured so that each member had conducted research on part of the knowledge. In each group someone knew about predator/prey relations, someone could talk wisely on the strengths and weaknesses of possible methods of communication, and so forth. All pieces are needed to complete the puzzle, to design the habitat, hence jigsaw. By these methods, expertise is distributed deliberately. Majoring. Expertise is also distributed by happenstance. Variability in expertise arises naturally because of the different research paths followed by groups and individuals. We refer to this phenomenon as majoring. 
Children are free to major in a variety of ways, free to learn and teach whatever they like within the confines of their subtopic. Some become experts on disease and contagion, some concentrate on bizarre reproductive strategies; others major in pesticides or pollution. All contribute their specific knowledge, thereby enriching the intellectual resources of the community. Let us consider just one example of majoring: delayed implantation. This is a reproductive strategy whereby fertilized eggs lie dormant inside the female until environmental conditions are suitable for the survival of offspring, at which point the eggs begin to develop. This principle was discovered by some fifth graders last year, but not by previous cohorts. At least 9 months after their discovery, a group of now sixth graders told me about another example of the principle, the Minnesota Mink, that they had seen in a television program. According to my informants (my commentary in brackets):
• Minks breed aggressively in late winter because their thick coats will protect them from bites and scratching.
[This was an inference. On the program, we learned only that mink shed their valuable heavy winter coats for light summer ones. And the mating minks did look like they were engaged in strenuous activity. The inference was actually an example of transfer of prior knowledge from an animal these students had previously studied, the sea otter, with a heavy coat and notably rough mating habits.]
• The females mate with as many males as possible, and subsequent litters consist of pups that are fathered by more than one male. The students argued that this increased the variability of the gene pool [a biologically appropriate inference].
• The last male to mate has more pups, because, the students argued, if he could still mate at the end of the season, he must be pretty strong [inference based on a Spencerian/Darwinian notion of survival of the fittest].
• The fertilized eggs just sit there, another child corrects, lie dormant, until it is spring, and then start to develop.
• Pups are partly “acquarian.” [I think they meant aquatic.]
The point about my story is not the demonstration of long-term retention of facts, or the assimilation of new facts about a complex biological mechanism, or even the inferential powers the students displayed. It is their excitement about what they are learning sustained over considerable time, and at their own expense (they were no longer accountable for this topic). I was impressed by their confidence in their own developing knowledge and their belief that this is something that the community will respect and value. And by way of metaphorical extension, delayed implantation is what we do with ideas – plant them in the community and hope they come to fruition when the time is ripe.
The Role of Performance. In telling their story, these students were putting on a performance, for my benefit. Everyone in the community is at some stage an actor and an audience. 
Regular exhibitions to a variety of audiences are an important component of the community. The sense of audience for one’s research efforts is not imaginary, but palpable and real. Audiences demand coherence, push for higher levels of understanding, require satisfactory explanations, request clarification of obscure points, and so on. Students do not have to deal only with a single audience, the teacher, as they often do in school. These opportunities to display provide an element of reality testing, also an important feature of many out-of-school activities such as dramatic plays put on by boys’ and girls’ clubs (Heath & McLaughlin, in press). Such groups typically engage in seasonal cycles of planning, preparing, rehearsing, and finally performing. There are deadlines, discipline, and most important, reflection on performance. So, too, in the COL we have cycles of planning, preparing, practicing, and teaching others. Deadlines and performance demand the setting of priorities – what is important to know? What is important to teach? What of our newfound knowledge do we display?
The Classroom Teacher. The classroom teacher is not absent from these proceedings. She learns along with the children as well as assists their efforts. In addition, she periodically calls the whole class into conference to consider the main theme and the relation among the research activities. The aim is to lead the students to higher levels of thinking and to help them set goals for future research. These whole-class discussions provide a reflection period in which to take stock of where they are and where they want to be.
Extending the Learning Community
Inside the School
For the program to run optimally, adults other than the classroom teacher are needed to guide the learning activities. But we have to live with the feasible. How many extra bodies can there be? Parenthetically, I note that at its peak, Dewey’s (1936) Laboratory School had a 4:1 child/adult ratio, not counting adult experts. Because this is unrealistic, the COL relies heavily on the expertise of the children themselves. We use cross-age teaching, both face-to-face and via electronic mail. We use older students as discussion leaders guiding the reciprocal teaching or jigsaw activities of younger students. Such tutoring extends the teaching “capital” available to our students, but it is also a formative aspect of community building.
Outside the School
Any learning community is limited by the combined knowledge of its members. Within traditional schools, members draw on a limited knowledge capital if the faculty and students are relatively static. Or they face jarring discontinuity if there is rapid turnover, as is the case in many inner-city schools. In addition, both teachers’ and students’ expectations concerning excellence, or what it means to learn and understand, may be limited if the only standards are local. Schools are not islands. They exist in wider communities, and we rely on them. For example, experts coaching via electronic mail provide us with an essential resource, freeing teachers from the sole burden of knowledge guardian and allowing the community to extend in ever-widening circles of expertise.
Principles of Learning
A major part of my personal effort in the design experiment (Brown, 1992) of creating community is to contribute to a theory of learning that can capture
and convey the core essential features. The development of theory is critical for two reasons, conceptual understanding and practical dissemination. The development of theory has always been necessary as a guide to research, a lens through which one interprets, that sets things apart and pulls things together. But theory development is essential for practical implementation as well. It is for these reasons that we have been concerned with the development of a set of first principles of learning to guide research and practice. But in this light, it is a sobering thought that for decades the Progressive Education Association of America produced sets of principles (usually 9) every few years, principles that were so vague that they could not lead to a convergence in practice of any kind (Graham, 1967). They included: freedom to develop naturally; work guided by interest; cooperation between home and school; community building; teacher as guide, not taskmaster. All these are principles that I would agree with and will probably reiterate. But what does developing naturally mean? How does one follow interest and guide learning while at the same time helping chart legitimate pathways of intellectual inquiry? Without more specificity, more models, more documentation, more evaluation, these principles become part of a common vocabulary, but influence practice little. Descriptions of current “innovative” programs also share a family resemblance in rhetoric, but again one might ask, do they result in any consensual practice? My own rhetoric in describing principles of learning is far from safe from these criticisms. And the problem of dissemination is a real one. As a cautionary tale, consider the fate of reciprocal teaching. The program has enjoyed widespread dissemination. It has been picked up by researchers, teachers, and textbook publishers, and has become part of the discourse of the educational community. 
But too often something called reciprocal teaching is practiced in such a way that the principles of learning it was meant to foster are lost, or at best relegated to a minor position. The surface rituals of questioning, summarizing, and so forth are engaged in, divorced from the goal of reading for understanding that they were designed to serve. These “strategies” are sometimes practiced out of the context of reading texts. Quite simply, if one wants to disseminate a program on the basis of principles of learning rather than surface procedures, one must be able to specify what those principles are in such a way that they can inform practice. Adaptation and modification are an organic part of any implementation process. When working with new teachers, we encourage implementation as evolution (Majone & Wildavsky, 1978) constrained by first principles. Here, by way of illustration, we will discuss a few of these first principles of learning. A more complete list is given in Brown and Campione (1984).
Steps Toward Learning Principles of the COL Program
1. A great deal of academic learning, though not everyday learning, is active, strategic, self-conscious, self-motivated, and purposeful. Effective learners
operate best when they have insight into their own strengths and weaknesses and access to their own repertoires of strategies for learning. For the past 20 years or so, this type of knowledge and control over thinking has been termed metacognition (Brown, 1978). Interest in things metacognitive is, of course, not new; it is just that a concentrated period of research has reaffirmed what was already known but not established very well. And that is progress. A little recognized progenitor of this position was actually Binet, known in this country primarily for the introduction of intelligence testing. Binet was also interested in the education of the child-like mind. True to the newfound confidence in testing, Binet designed tests of what he called autocriticism to root out metacognitive lacunae. For example, what is wrong with these sentences?
• An unfortunate cyclist fractured his skull and died at once; he has been taken to the hospital and we are afraid he won’t be able to recover.
• Yesterday we found a woman’s body sliced in 18 pieces; we believe she killed herself.
Gruesome Victoriana indeed, but as Binet pointed out, “You would be surprised at how many of the thoughtless young are quite happy with this nonsense.” “Après le mal, le remède.” Binet believed diagnosis to be of little use if it were not followed by remediation. 
“If it is not possible to change intelligence, why measure it in the first place?” Given this philosophy, not shared by many in the early part of the century who began to believe in the immutability of IQ, Binet developed a remedial curriculum for the “thoughtless young.” The curriculum, called Mental Orthopedics, was intended to strengthen the child’s “unreflective and inconsistent mind.” As the thoughtless child “does not know that he does not understand,” he needs help “to observe, to listen and to judge better.” The curriculum was specifically designed to train, in Binet’s terms, “habits of work, effort, attention, reasoning and self-criticism,” leading to the “pleasures of intellectual self-confidence” (all quotations from Binet, 1909). Unfortunately for us, he was more than a little vague about how we might do this. Actual descriptions of the training or its outcomes do not survive, a problem in general for past innovative programs. One might argue that all this talk of strategies and metacognition is silly. Who indeed would want passive, unmotivated, purposeless, indeed mindless, learning? There is certainly a place for mindlessness in human learning; a great deal of learning does occur incidentally, and humans have reasoning biases that allow them to get by on this most of the time (Bartlett, 1958; Tversky & Kahneman, 1974). But scholarship, the domain of schools, demands intentional learning (Bereiter & Scardamalia, 1989). In this context, who could possibly argue against mindful learning? My point is not that people
argued against mindful learning; rather, that they did not campaign actively for it. Remember, a belief that rote learning trains the mind has been around for a long time. Advocates of fact acquisition, in and of itself and by whatever means, still stalk the land. One legacy of behaviorism was a concern with capturing the mind in spite of itself. Understanding and reflection were not prominent features of the psychological learning theories of the mid-century. The need for a resurgence of interest in mind and its uses was overdue.
2. Classrooms as Settings for Multiple Zones of Proximal Development. I take it as given that learners develop at different rates. At any time they are ripe for new learning more readily in some arenas than others. They do not come “ready for school” in some cookie-cutter fashion. The central Vygotskian notion of zones of proximal development is one of learning flowering between lower and upper bounds of potential, depending on environmental support. Bacon’s aids, tools, and guides to perfect mind serve to push as much as possible toward the upper bounds of competence. This is also a position that needed to be reinvented. The set of influential contrasting theories that has influenced American schools includes errorless learning, mastery learning, skill building, and so on: All attempt to aim instruction at the child’s existing level of competence, often interpreted as lower levels of performance. Indeed, many interpret Dewey as suggesting emphasis on lower bounds when he argued in favor of teaching to the child’s level. I argue that an essential role for teachers is to guide the discovery process toward forms of disciplined inquiry that would not be reached without expert guidance, to push for the upper bounds.
3. Legitimization of Differences. A central principle of COL is that individual differences be recognized and valued. 
I borrowed the term from studies of out-of-school learning (Heath, 1991), but I also see reflections in Howard Gardner’s (1983) concern for fostering multiple intelligences in school and Lave and Wenger’s (1991) description of multiple ways into communities of practice. Can we do this in schools, can we rejoice in diversity? What if classrooms were designed explicitly to capitalize on varieties of talent to provide multiple “ways in” – through art, drama, technological skills, content knowledge, reading, writing, teaching, social facilitation, and so forth? Indeed, it is very much our intention to increase diversity in COL classrooms. Traditionally, school agendas have aimed at just the opposite, decreasing diversity. This tradition is based on the false assumption that there exist prototypical, normal students who, at a certain age, can do a certain amount of work, or grasp a certain amount of material, in the same amount of time (Becker, 1972). In our program, although we assuredly aim at conformity on the basics (everyone must read, write, think, reason, etc.), we also aim at nonconformity in the distribution of expertise and interests so everyone can benefit from the subsequent richness of available knowledge. The essence of teamwork is pooling expertise. Teams composed of members with homogeneous ideas and skills are denied access to such richness.
4. A Community of Discourse. It is a common belief that higher thought is an internalized dialogue. To foster this we create the active exchange and reciprocity of a dialogue in our classrooms, which are intentionally designed to foster interpretive communities (Fish, 1980). The sociologist Wuthnow (1989) argued that changes in communities of discourse led the way to powerful movements in society – the Reformation, the Enlightenment, and European Socialism. At a less grandiose level, our baby COLs foster change by encouraging newcomers to adopt the discourse structure, goals, values, and belief systems of the community. Ideas are seeded (or implanted) in discussion. Sometimes these ideas migrate throughout the community via mutual appropriation and negotiated meaning, sometimes they lie fallow, and sometimes they bloom. These interpretive communities (Fish, 1980) give place to multiple voices in Bakhtin’s (1986) sense of voice as the speaking personality.
5. Community of Practice. Learning and teaching depend heavily on creating, sustaining, and expanding a community of research practice. Members of the community are critically dependent on each other. No one is an island; no one knows it all; collaborative learning is not just nice, but necessary for survival. This interdependence promotes an atmosphere of joint responsibility, mutual respect, and a sense of personal and group identity.
These five principles are closely intertwined, forming as they do a system. Multiple zones of proximal development presuppose distributed expertise, distributed expertise presupposes legitimization of differences, and so on. Two final pairs of principles form systemic clusters: (a) the need for deep conceptual content that is sensitive to the developmental level of the students; and (b) the need for assessment procedures that are authentic, transparent, and aligned with the curriculum (Frederiksen & Collins, 1989). I have space to discuss just the first set.
Need for a Theory of Development
I am reminded of a story told by Jerry Bruner in his book Actual Minds, Possible Worlds (1986). After he had given a presentation, a member of the audience stood up and said she had a question about his claim that any subject could be taught to a child at any age in some intellectually honest way. Bruner was expecting the usual question about calculus in the first grade. But no, the question was much more thoughtful: “How do you know what’s honest?” Now that really is the pivotal question. It is not an easy question to answer. Most contemporary school reform projects finesse the problem by adopting a “one-size-fits-all” philosophy. The principles and structure of the program are the same, independent of age. The developmental model is missing. Of course, from some theoretical stances, learning and development are synonymous: learning = development; development is simply the outcome of learning, a truly Skinnerian argument.
Implicit developmental assumptions are governing school practices nonetheless. We teach the young social studies in reference to their own neighborhood. Why? Because someone decided this was developmentally appropriate? A unit on boats was thought suitable for third graders at the Lincoln School, and 6-year-olds in the Chicago Lab School studied “occupations serving the household.” Why do we teach fractions (American history, biology) when we do? It is traditional in educational circles to make up developmental theory. My favorite example is that of G. Stanley Hall, sometimes called the father of developmental psychology. Brushing aside the need for empirical validation, Hall (1881) championed a developmental-stage theory made up of cultural epochs, a notion subsequently picked up by Dewey. Hall argued that a curriculum should mimic the history of mental evolution. Young children at the “savage” stage should study material from the corresponding historical epoch, that is, ancient myths and fables. High school boys should study the knights of the feudal period because, developmentally, they were in the period of chivalry and honor. Young women were not accorded a corresponding period! There existed no scientific justification for these developmental stages whatsoever. This story is not just one of historical curiosity. In contemporary curriculum design, in both science and history, a simplistic interpretation of Piagetian theory has led to the consistent underestimation of young students’ capabilities. This slant on Piagetian theory encourages sensitivity to what children of a certain age cannot do because they have not yet reached a certain stage of cognitive operations. The “theory” still prevails in the face of 30 years of ingenious work by developmental psychologists emphasizing the impressive cognitive abilities that children do possess. 
Especially relevant to the design of, for example, science curricula is the painstaking documentation of children’s evolving knowledge about biological and physical causality. Similarly, we know a great deal about children’s impressive reasoning processes within contexts that they do understand. Again my point is that the design of school practice is influenced by theories of development more typical of the 1950s than the 1990s. It is essential to the philosophy of the COL that the students be engaged in research in an area of inquiry that is based on deep disciplinary understanding, and that follows a developmental trajectory based on research about children’s developing understanding within a domain.
Deep Disciplinary Understanding. Although it is surely romantic to think of young children entering the community of practice of adult academic disciplines, awareness of the deep principles underlying disciplinary understanding enables us to design academic practices for the young that are stepping stones to mature understanding, or at least are not glaringly inconsistent with the end goal. For example, in the domain of ecology and environmental science, a contemporary understanding of the underlying biology would necessitate a ready
familiarity with biochemistry and genetics, not within the grasp of the young. Instead of watering down such content, we invite young students into the world of the 19th-century naturalist, scientists who also lacked modern knowledge of biochemistry and genetics. Ideally, by the time students are introduced to contemporary disciplinary knowledge, they will have developed a thirst for that knowledge, as indeed has been the case historically.
Developing Understanding Within a Domain. I take seriously the fact that a scientific understanding of the growth of children’s thinking in a domain should serve as the basis for setting age-appropriate goals. As we learn more about children’s knowledge and theories about the biological and physical world (Carey & Gelman, 1991), we are better able to design a spiraling curriculum such as that intended by Bruner (1969). Topics are not just revisited willy-nilly at various ages at some unspecified level of sophistication, as is the case in many curricula that are self-described as spiraling, but each revisit is based on a deepening knowledge of that topic, critically dependent on past experience and on the developing knowledge base of the child. It should matter what the underlying theme is at, say, kindergarten and Grade 2; it should matter that the sixth-grade students have experienced the second-grade curriculum, and so on. In designing the ecology/environmental science/biology strand, we seek guidance from developmental psychology concerning students’ evolving biological understanding (Carey, 1985; Hatano & Inagaki, 1987). We know that by age six, children can fruitfully investigate the concept of a living thing, a topic of great interest that they refine over a period of years, gradually assimilating plants into this category. Second graders concentrate on design criteria for animal/habitat mutuality and interdependence.
Sixth graders study the effect of broad versus narrow niches, and by eighth grade the effect of variation in the gene pool on adaptation and survival is not too complex a research topic. Whereas second graders begin to consider adaptation and habitats in a simple way, sixth through eighth graders come to distinguish among structural, functional, and behavioral adaptations, biotic and abiotic interdependence, and so forth. Similarly, a consideration of extant research governs our approach to reasoning within a domain. Again in biology, we permit teleological reasoning (Keil, 1992) and an overreliance on causality, but then we press for an increasingly sophisticated consideration of chance, probability, and necessity that underlies mature disciplinary thinking. Let us not forget domain-general scientific reasoning (Brown, 1990) if such exists. Do children understand the difference between hypothesis and evidence? What is their understanding of “the scientific method”? Indeed, what should it be? Francis Bacon’s or Karl Popper’s? Dare we share with them the insights of Peter Medawar that scientists as human beings do what everyday people do? They are not omniscient. They tell good stories, they create imaginary worlds. Indeed, the scientific method itself
like any other explanatory process is a dialogue between fact and fancy, the actual and the possible, between what could be true and what is in fact the case – it is a story of justifiable beliefs about a possible world. (Medawar, 1982, p.111)
And then there is the age-old problem for a developmental psychologist – transition mechanisms. What triggers conceptual change? In short, the amount of work involved in mapping a spiraling curriculum that is truly developmentally sensitive is quite overwhelming. But it would be more so if we fail to capitalize on the impressive amount we already know by throwing out the bathwater and the babies.
Conclusion
There is a conundrum running throughout this article. I have argued that:
• School practices are influenced by outmoded theories of learning and development that are relics of psychology’s behaviorist past;
• Contemporary theories are better suited to inform the design of schooling because they take as their data base the learning of complex systems of knowledge characteristic of what we want schools to enculturate; and
• The new theories are making little headway at influencing school practices.
To quote Bacon again, “All things change, but nothing perishes.” Why? I argue that this is because what the new theories ask is so hard. It is easier to organize drill and practice in decontextualized skills to mastery, or to manage 164 behavioral objectives, than it is to create and sustain environments that foster thought, thought about powerful ideas. We are asking a great deal from everyone in the learning community. But we know a great deal more about how to do it now than a century ago. Advancement in our understanding of learning is slow but real. So, I conclude with a paraphrase of quotations from John F. Kennedy, Lee Shulman, and Jerry Bruner, to show my catholic tastes:
“We choose to do this, not because it is easy, but because it is hard.” (Kennedy, 1962)
“Those that understand, teach honestly.” (Shulman, 1986, p. 14)
Those that teach honestly teach ideas that are “lithe and beautiful and immensely generative.” (Bruner, 1969, p. 21)
I believe that a century of research has helped us know what these ideas are and better prepared us to design instruction in the form of aids and tools to perfect hand and mind.
Notes
This article was the presidential address at the AERA Annual Meeting in April 1994. The work reported in this article was supported by grants from the James S. McDonnell Foundation, the Andrew W. Mellon Foundation, the Evelyn Lois Corey Research Fund, and Grant HD-06864 from the National Institute of Child Health and Human Development. But the preparation of the article was supported principally by the Spencer Foundation, whom I would like to thank for giving me time to think. I would like to thank my many colleagues and friends who contributed to the research agenda in this article, but notably I thank my husband and colleague, Joseph C. Campione, for contributions too deep for telling.
1. For descriptions and retrospectives on the major psychological learning theories of the mid century, see Koch (1959).
2. Koch, 1959.
3. Koch, 1959.
References
Aronson, E. (1978). The jigsaw classroom. Beverly Hills, CA: Sage.
Bacon, F. (1605). The advancement of learning. Oxford: The Clarendon Press.
Bacon, F. (1623). Novum organum. Oxford: The Clarendon Press.
Bakhtin, M. M. (1986). Speech genres and other late essays (C. Emerson & M. Holquist, Eds., V. W. McGee, Trans.). Austin, TX: University of Texas Press.
Bartlett, F. C. (1932). Remembering: A study in experimental and social psychology. Cambridge: Cambridge University Press.
Bartlett, F. C. (1958). Thinking: An experimental and social study. New York: Basic Books.
Bateson, M. C. (1984). With a daughter’s eye: A memoir of Margaret Mead and Gregory Bateson. New York: Morrow.
Becker, H. (1972). A school is a lousy place to learn anything in. American Behavioral Scientist, 16, 85–105.
Bereiter, C., & Scardamalia, M. (1989). Intentional learning as a goal of instruction. In L. B. Resnick (Ed.), Knowing, learning, and instruction: Essays in honor of Robert Glaser (pp. 361–392). Hillsdale, NJ: Erlbaum.
Binet, A. (1909). Les idées modernes sur les enfants. Paris: Ernest Flammarion.
Brown, A. L. (1978). Knowing when, where, and how to remember: A problem of metacognition. In R. Glaser (Ed.), Advances in Instructional Psychology, 1 (pp. 77–165). Hillsdale, NJ: Erlbaum.
Brown, A. L. (1990). Domain-specific principles affect learning and transfer in children. Cognitive Science, 14, 107–133.
Brown, A. L. (1992). Design experiments: Theoretical and methodological challenges in creating complex interventions in classroom settings. The Journal of the Learning Sciences, 2(2), 141–178.
Brown, A. L., Bransford, J. D., Ferrara, R. A., & Campione, J. C. (1983). Learning, remembering, and understanding. In P. H. Mussen (Series Ed.) & J. H. Flavell & E. M. Markman (Vol. Eds.), Handbook of child psychology: Vol. 3. Child development (4th ed., pp. 77–166). New York: Wiley.
Brown, A. L., & Campione, J. C. (1990). Communities of learning and thinking, or A context by any other name. In D. Kuhn (Ed.), Contributions to Human Development, 21, 108–125.
Brown, A. L., & Campione, J. C. (1994). Guided discovery in a community of learners. In K. McGilly (Ed.), Classroom lessons: Integrating cognitive theory and classroom practice (pp. 229–270). Cambridge, MA: MIT Press/Bradford Books.
Bruner, J. S. (1969). On knowing: Essays for the left hand. Cambridge, MA: Harvard University Press.
Bruner, J. S. (1986). Actual minds, possible worlds. Cambridge, MA: Harvard University Press.
Calkins, M. W. (1915). The self in scientific psychology. American Journal of Psychology, 26, 495–524.
Carey, S. (1985). Conceptual change in childhood. Cambridge, MA: Bradford Books, MIT Press.
Carey, S., & Gelman, R. (1991). The epigenesis of mind. Hillsdale, NJ: Erlbaum.
Dewey, J. (1936). The theory of the Chicago experiment. In K. C. Mayhew & A. C. Edwards (Eds.), The Dewey School: The laboratory school of the University of Chicago, 1896–1903 (pp. 463–477). New York: Appleton-Century.
Fish, S. (1980). Is there a text in this class? The authority of interpretive communities. Cambridge: Harvard University Press.
Frederiksen, J., & Collins, A. (1989). A systems approach to educational testing. Educational Researcher, 18(9), 27–32.
Gardner, H. (1983). Frames of mind: The theory of multiple intelligences. New York: Basic Books.
Graham, P. S. (1967). Progressive education, from Arcady to Academe: A history of the Progressive Education Association, 1919–1955. New York: Columbia University, Teachers College.
Griffin, D. R. (1992). Animal minds. Chicago: University of Chicago Press.
Hall, G. S. (1881). The contents of children’s minds. Princeton Review, 11, 249–272.
Hatano, G., & Inagaki, K. (1987). Everyday biology and school biology: How do they interact? The Newsletter of the Laboratory of Comparative Human Cognition, 9, 120–128.
Heath, S. B. (1991). “It’s about winning!” The language of knowledge in baseball. In L. B. Resnick, J. M. Levine, & S. D. Teasley (Eds.), Perspectives on socially shared cognition (pp. 101–126). Washington, DC: American Psychological Association.
Heath, S. B., & McLaughlin, M. W. (in press). Learning for anything every day. Journal of Curriculum Studies.
Hobhouse, L. T. (1901). Mind in evolution. London: Macmillan.
Keil, F. C. (1992). The origins of autonomous biology. In M. R. Gunnar & M. Maratsos (Eds.), Minnesota symposium on child psychology: Modularity and constraints on language and cognition (pp. 103–137). Hillsdale, NJ: Erlbaum.
Kennedy, J. F. (1962). Televised address from Rice University, September 12.
Koch, S. (Ed.) (1959). Psychology: A study of a science: General systematic formulations, learning, and special processes. Vol. 2. New York: McGraw-Hill.
Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation. New York: Cambridge University Press.
Maier, N. R. F. (1936). Reasoning in children. Journal of Comparative Psychology, 21, 357–66.
Majone, G., & Wildavsky, A. (1978). Implementation as evolution. In H. E. Freeman (Ed.), Policy studies review annual. Vol. 2 (pp. 103–117). Beverly Hills: Sage Publications.
Medawar, P. (1982). Pluto’s republic. Oxford: Oxford University Press.
Palincsar, A. S., & Brown, A. L. (1984). Reciprocal teaching of comprehension-fostering and monitoring activities. Cognition and Instruction, 1(2), 117–175.
Resnick, L. B., & Resnick, D. P. (1991). Assessing the thinking curriculum: New tools for educational reform. In B. R. Gifford & M. C. O’Connor (Eds.), Future assessment: Changing views of aptitude, achievement and instruction. Boston: Academic Press.
Shulman, L. S. (1986). Those who understand teach: Knowledge growth in teaching. Educational Researcher, 15(2), 4–14.
Sobel, B. (1939). The study of the development of insight in preschool children. Journal of Genetic Psychology, 55, 381–385.
Thorndike, E. L., & Gates, A. I. (1929). Elementary principles of education. New York: Macmillan.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124–1131.
Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. (M. Cole, V. John-Steiner, S. Scribner, & E. Souberman, Eds. and Trans.). Cambridge, MA: Harvard University Press.
Wuthnow, R. (1989). Communities of discourse: Ideology and social structure in the Reformation, the Enlightenment, and European socialism. Cambridge, MA: Harvard University Press.
28 Paradigms of Knowledge and Instruction S. Farnham-Diggory
Anyone who tries to review a corpus of literature for this journal must grapple with an extraordinarily difficult problem: identifying exactly what a set of studies has in common. Common terminology is not enough. One person’s motivation may be another’s cognitive strategy. Methodological differences often crucially differentiate studies that go by the same names (e.g., mastery learning). Figuring out what the studies really have in common means delving well below terminology to underlying assumptions. Once identified, such assumptions should constitute categories that can be strongly defended on theoretical grounds. They could even be true. People who write textbooks, as compared to people who undertake such limited projects as writing review articles, face categorization problems that are practically insurmountable. That’s why textbook authors so often clump and list research findings in ways that are essentially arbitrary. And that’s why their hapless students (and I speak as an instructor of hundreds of the hapless) have no alternative but to try to memorize arbitrary lists of research findings and hope for the best on examinations. The first textbook I wrote, Cognitive Processes in Education, was published in 1972. The revision was published in 1992. Normally, a revision is accomplished in 3 years. It took me 20 years because the field of educational psychology was changing so rapidly. I produced three completely revised drafts in manuscript before the fourth draft finally went to press. This rapid change forced me to try to figure out, once and for all, how the instructional literature could be classified logically.
Source: Review of Educational Research, 64(3) (1994): 463–477.
I have concluded that there are exactly three instructional paradigms and that, within the framework of these paradigms, exactly five types of knowledge can be acquired. I believe this categorization system provides solid stepping stones through research swamps. I believe it also provides guidelines for curriculum design. I believe that my system helps resolve controversies – between types of reading instruction, for example. Above all, I believe that logical categorization of instruction and learning, as compared to political rhetoric, is the first step in genuine school reform. However, in this article, I will discuss only research issues, especially as they relate to literature reviews. I’ll first provide a very brief, introductory summary. I will then go into details and address issues that the summary raises.
The Three Core Instructional Paradigms
Every instructional program – those designed for laboratory research, written up in curriculum guides, observed in schools, and informally practiced by communities – can be classified as fitting one of three mutually exclusive models that I am calling behavior, development, or apprenticeship models, although I am using these terms in limited ways. As shown in Table 1, the nature of the model is determined by two factors: how the model distinguishes novices from experts and what the mechanism of transformation is. I use the term expert relatively, to refer to whatever level of competence a program may set. I also use the term novice relatively, to refer to whatever beginning level a program may specify. Mechanism of transformation refers to what happens inside the head that turns a novice into an expert. I now turn to details and issues.
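As a reading aid, the two classifying factors can be written down as a small lookup. The sketch below is in Python and does no more than transcribe the rows of Table 1; the class name `InstructionalModel`, the tuple `MODELS`, and the helper `model_for_mechanism` are hypothetical names introduced for illustration and are not part of the classification system itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionalModel:
    """One row of Table 1 (field names are illustrative assumptions)."""
    name: str
    expert_novice_distinction: str
    mechanism: str


# Rows transcribed from Table 1; wording lower-cased for matching.
MODELS = (
    InstructionalModel(
        "behavior",
        "quantitative differences on same scale(s)",
        "incrementation",
    ),
    InstructionalModel(
        "development",
        "differences in qualitative models (personal beliefs)",
        "perturbation",
    ),
    InstructionalModel(
        "apprenticeship",
        "sociological differences in the culture of practice",
        "acculturation",
    ),
)


def model_for_mechanism(mechanism: str) -> InstructionalModel:
    """Return the single model whose mechanism of transformation matches."""
    wanted = mechanism.strip().lower()
    matches = [m for m in MODELS if m.mechanism == wanted]
    if len(matches) != 1:
        raise ValueError(f"no unique model for mechanism {mechanism!r}")
    return matches[0]


if __name__ == "__main__":
    print(model_for_mechanism("Perturbation").name)
```

Because each mechanism appears in exactly one row, the lookup always returns a single model, which mirrors the claim that the three models are mutually exclusive.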
The Behavior Model
In the behavior model, novices and experts are on the same scale(s), and transformation is accomplished through the mechanism of incrementation. A novice is low, and an expert is high. A novice gets to be an expert by accumulating something, getting better, getting faster, getting more, and so forth.

Table 1: Instructional model criteria
Instructional paradigm    Expert-novice distinction                               Key mechanism of transformation
Behavior                  Quantitative differences on same scale(s)               Incrementation
Development               Differences in qualitative models (personal beliefs)    Perturbation
Apprenticeship            Sociological differences in the culture of practice     Acculturation
A founding father of this model was, of course, Edward Thorndike (Farnham-Diggory, 1992, pp. 485–491; Joncich, 1968; Thorndike, 1913–1914). Anything that exists, Thorndike used to say, exists in some amount. (The original version of this often-paraphrased statement was in an essay by Thorndike that appeared in the Seventeenth Yearbook of the National Society for the Study of Education, 1918, p. 16). What is taught, how often it is reviewed, and so on can be measured. What is learned, how well it is learned, and so forth, can be measured. Education, being thus quantifiable, can therefore be a science, and Thorndike turned it into one. My own most recent experience with this type of instructional model has concerned reading. For several years at the University of Delaware, I directed a remedial reading center that introduced a program called Intensive Literacy (Farnham-Diggory, 1992, appendix to chap. 9, pp. 296–310). This was a modification of a tutorial program originated by neurologist Samuel Orton (1925) and adapted for whole-class instruction (with modifications of her own) by Romalda Spalding (Spalding & Spalding, 1986). In this program, students are taught letter-sound correspondences called phonograms, as well as rules such as “y not i is used at the end of English words.” Eventually, students learn about 70 phonograms and about 30 rules. Once an initial set of phonograms has been mastered, students begin spelling and reading words. They will progress through a 1700-word list, arranged by frequency. They study between 20 and 30 words per week – writing word lists from dictation, reading them back, reviewing previous lists, and so on. Students are repeatedly assessed on a test keyed to the word list, so the teacher will know how far up the list the student has traveled. Each student builds a personal wordbook annotated according to rules. By the end of a year, with daily instruction, students will have compiled a personal glossary of over 700 words.
There is much more to the program than I have described here, but the principles of countability and incrementation should be clear.
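The countability of the program lends itself to a quick arithmetic check. The following Python sketch takes the figures given above (a 1700-word list, 20 to 30 words studied per week, a glossary of over 700 words) and assumes, purely for illustration, a constant weekly pace with no weeks set aside for review or assessment. The constant and function names are hypothetical.

```python
import math

# Figures taken from the program description above.
WORD_LIST_LENGTH = 1700      # words in the full frequency-ordered list
WORDS_PER_WEEK = (20, 30)    # slowest and fastest weekly pace mentioned
GLOSSARY_TARGET = 700        # size of the personal glossary after a year


def weeks_needed(total_words: int, words_per_week: int) -> int:
    """Whole weeks required to cover total_words at a constant pace.

    Assumption (not from the source): every week adds the same number
    of new words, with no review-only weeks.
    """
    return math.ceil(total_words / words_per_week)


if __name__ == "__main__":
    for pace in WORDS_PER_WEEK:
        print(
            f"{pace} words/week: "
            f"{weeks_needed(GLOSSARY_TARGET, pace)} weeks for "
            f"{GLOSSARY_TARGET} words, "
            f"{weeks_needed(WORD_LIST_LENGTH, pace)} weeks for the full list"
        )
```

Under these assumptions, 700 words take between 24 and 35 weeks, which fits within a single year of instruction, whereas the full 1700-word list would take between 57 and 85 weeks.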
The Development Model
In the development model, novices and experts are distinguished on the basis of their personal theories and explanations, sometimes called qualitative models, of events or experiences. The novice’s model is different in many complex ways from the expert’s model (an example is given below). Instruction begins with probes of the student’s personal theory. By questioning, contradicting, and challenging that theory (the process that I have called perturbation), the student is pushed to revise it. The result is essentially a whole new way of thinking, a wide-ranging qualitative shift. I presented a detailed example in my book, Schooling (Farnham-Diggory, 1990, pp. 96–01), taken from the “Shadows” curriculum designed by Deborah Smith (1989; Smith is now at Michigan State University). The curriculum is
a science unit for first or second graders. Through interviews, Smith determined (in line with Piaget’s findings) that young children believe that a shadow is a substance projected from the front of the body, exists in the dark, and is something like a mirror image. Smith designed a series of daily projects that perturbed these beliefs. For example, the children collected data on shadows of nonhuman objects, on shadow movements, on double shadows (from double light sources), and so forth. Eventually, through interviewing the children again, Smith demonstrated that they had constructed a new theory – that shadows result when light is blocked, that light is emitted in all directions, that mirror images result from bouncing light, not from blocked light, and so on. A founding father of the development instructional model was, of course, Jean Piaget (see his chap., “The Problem of Shadows,” in The Child’s Conception of Physical Causality, 1951). Interest in this type of instruction has been more recently revived under the name of conceptual change (Posner, Strike, Hewson, & Gertzog, 1982).
The Apprenticeship Model
In the apprenticeship model, novices and experts are from different worlds, and a novice gets to be an expert through the mechanism of acculturation into the world of the expert. Actual participation in this world is critical for two reasons: (a) much of the knowledge that the expert transmits to the novice is tacit, and (b) the knowledge often varies with context. To illustrate: My own most recent experience with this model of instruction has been in the field of archaeology – literally, out in the field as a novice member of excavation teams. It has become clear that I am, to begin with, an outsider, but it is considerably less clear (to me) what I am outside of. The experience, as a whole, I would describe as coming to understand shapes emerging from a fog. For example, I cannot spot flakes – artifacts that have been chipped from core rocks by human hands. I have walked across fields with experts who spot flakes, exclaim at their beauty, and explain to me, time and time again, exactly what the criteria are (a curve here, a point there). What they cannot explain to me is how they spot the flakes in the first place. We are walking across a newly plowed field after a rain. Rocks, crop stubble, flakes, and heaven knows what else have been raised to the surface and washed clean. It is in the midst of all this perceptual rubble that experts spot the flakes. I remain befogged. But I am hopeful that eventually I will somehow acquire their tacit knowledge. The acquisition of such knowledge, through membership in a culture, is what apprenticeship is mainly about. Another example: as an archaeology apprentice, it is necessary for me to learn the lines of intellectual allegiance and enmity. I mention the name of a well-known authority. “Oh, him,” my apprentice master says dismissively.
Now what do I do? Ask “What’s wrong with him?” The answer is going to be largely uninterpretable (to an apprentice), since it is linked to unknown histories, affinities, and styles. There are, of course, published materials that explicate controversies and critically review them. But nowhere is it explained which of these materials should be taken seriously and which constitute just another set of “Oh, hims.” I am again acquiring tacit knowledge through immersion in a culture. As a final example, there is the problem of learning what constitutes scientific rigor. You find a bit of bone or pot. What can you arguably extrapolate from these bits about the nature of the ancient culture? In one excavation, it was discovered that treasures were found only in the grave of a child. The archaeologists concluded that the culture must have been relatively advanced – because status was ascribed. That is, status (as indexed by accompanying treasures) obviously could not have arisen simply from having lived a long time. The presence of something like royal families, wherein even young members are given burial honors, is an indication of a relatively advanced culture. As an apprentice, I must learn how it got decided that this interpretation is scientifically justifiable. What I think is that maybe some kid discovered a treasure hoard, tried to dig for it, and got buried when the dirt caved in on him. But I have learned to keep such ruminations to myself. Little by little, out of the fog, the principles of this particular scientific culture will emerge. That is what it means to be an apprentice – what it has meant since all that flaking began. Recent interest in cognitive apprenticeship and situated cognition was sparked by Collins, Brown, and Newman (1989); Brown, Collins, and Duguid (1989); and Rogoff (1990). Two useful books on craft apprenticeships are by Coy (1989) and Rorabaugh (1986). 
Regarding school programs, I reviewed eight apprenticeship projects in science and the humanities, including an entire secondary school built around marine studies, in Schooling (Farnham-Diggory, 1990, chaps. 4 and 5). In Cognitive Processes in Education (Farnham-Diggory, 1992), I devoted a whole chapter to a detailed example of a multidisciplinary curriculum called “Whale” (Watterson, Rendell, & Bell, 1988).
I was surprised to discover that there doesn’t appear to be a cognitive instructional model, as such. Cognitive processes, strategies, knowledge bases, and so forth can be taught within behavior, development, or apprenticeship frameworks. The fact that cognitive procedures are being explicitly taught is not a defining characteristic of a core instructional paradigm. In fact, it is impossible to teach anything in any format without activating cognition. However, cognitive science has a major role to play in my overall classification system, as I will shortly explain.
Exclusivity and Inclusivity
The three core models are defined as mutually exclusive. Definitional criteria were deliberately selected so that a form of instruction fitting one model
would not fit the others. For example, both the behavior model and the development model are defined as culture free, while the apprenticeship model is not. Both the development and the apprenticeship models define expertise as qualitative changes in cognition and/or life styles, while the behavior model is defined as not doing so. I recognize that more than one model may appear to be operating simultaneously, but close analysis will show that one or two models are functioning as modules within a parent instructional paradigm. For example, my apprenticeship training included behavior modules on sorting pottery shards and some major developmental shifts in beliefs about archaeology. Issues of instructional hierarchies and interfacing of modules are, of course, important but are beyond the scope of this essay.
The Five Core Learning Paradigms
Within the framework of the three instructional paradigms, five types of knowledge may be acquired – declarative, procedural, conceptual, analogical and logical. I formulated this typology through processes that mathematicians aptly call brute force. I was trying to find connections between knowledge as cognitive scientists currently study it and learning as experimental psychologists used to study it. For example, the declarative knowledge studied by cognitive psychologists got into heads somehow. But cognitive psychologists weren’t concerned with acquisition. The psychologists who were concerned with acquisition used to be called verbal learning theorists. They produced a huge experimental literature prior to the onset of the cognitive revolution in the 1960s. As it turned out, I discovered that five distinct types of knowledge, as defined by cognitive psychologists, could be linked to five distinct experimental paradigms of learning psychology, as conducted from the 1930s through the 1960s. Below, I present very brief, simplified summaries of what I mean. (For more details and references, see Farnham-Diggory, 1992, pp. 77–80, chap. 4, and pp. 139–152).
Declarative Knowledge
This is knowledge that can be declared, usually in words, through lectures, books, writing, verbal exchange, braille, sign language, mathematical notation, and so on. Acquisition of declarative knowledge has been studied primarily in the field of verbal learning, and the direction of the field, over the years, has been toward understanding the crucial role of meaning. The field began with heroic attempts to extrude meaning from verbal learning and only gradually devised ways of defining and quantifying meaning units (Ebbinghaus, 1964; Underwood, 1982; van Dijk & Kintsch, 1983).
Procedural Knowledge
Procedural knowledge is in the form of action sequences. It is knowledge that must be demonstrated. (One may also have verbal knowledge of an action sequence, but the part that can be talked about is declarative knowledge.) Acquisition of procedural knowledge has been extensively studied and reported in the skill-learning literature. Additionally, there is a vast literature on skill learning in everyday life – reports of athletes, composers, business entrepreneurs, and many others. Despite differences in terminology, there turn out to be three phases of skill development that everyone, in and out of experimental psychology laboratories, talks about: (a) analysis, (b) practice to the point of automaticity, and (c) attention management. These are mutually exclusive phases. Analyzing football strategies (by watching, e.g., video replays) isn’t practicing. Automaticity, by definition, precludes analysis. Attention management (usually referred to by performers as concentration) requires strictly controlled sequences of thought and crucially avoids on-line analysis of automatized behaviors (Bilodeau, 1969; Russell & Branch, 1979; Shiffrin & Schneider, 1977).
Conceptual Knowledge
Concepts are of two types: (a) categories and (b) schemata. Categories are defined usually as lists of attributes – dogs have tails, ears, four legs, bark. Schemata add in spatial and temporal attributes. They are map-like and/or script-like. The acquisition of conceptual knowledge was extensively studied in the concept-learning (sometimes called concept-attainment or concept-formation) experimental literature. Concepts come into existence through repeated exposure to examples that are similar in some respects and dissimilar in others. When one acquires a concept, one has, in effect, learned to extract commonalities. This extraction must be performed by the learner, or a concept – category or schema – will not, by definition, have been acquired. If a student is told outright: “This is the principle . . . ,” only a bit of declarative knowledge will have been imparted. If a student repeatedly rehearses the same experience, only a bit of procedural knowledge will have been acquired (Hilgard, 1948; Klausmeier & Harris, 1966; Reber, 1967; Reed, 1973).
Analogical Knowledge
Analogical knowledge, sometimes called imagery, preserves specific correspondences between what is outside in the world and what is inside the head. It may come into existence through a single exposure. The senses were stimulated, and the memory of that sensory pattern remains.
This type of learning was studied within the framework of what used to be called the one-trial learning paradigm. Dogs, given one severe shock, would never go into that red box again. In more humane versions of the paradigm, pictures that subjects had seen only once were mixed in with unfamiliar pictures. Subjects were astonishingly accurate at picking out the pictures they had seen before. This type of learning is, in effect, something like a sensory imprint (Brush & Overmier, 1985; Standing, 1973).
Logical Knowledge
Logical knowledge is a system of causal implications, a mental model of what is connected to what and what leads to what. The implications and connections may or may not be true by objective scientific standards, but they are characterized, as Piaget pointed out, by a compelling feeling of necessity. This is because they have come into existence by the exercise of one’s own reasoning. The early experimental literature on the acquisition of logical knowledge came from the field called problem solving and also from the so-called naive theories introduced by social psychologists (Heider, 1958).
Learning and Instruction
At first glance, one might think that the acquisition of certain forms of knowledge could occur only within certain instructional paradigms (e.g., declarative or procedural knowledge within the behavior paradigm, logical knowledge within the development paradigm). That’s not actually the case. All five types of knowledge can be acquired within the framework of all three types of instruction. However, particular instructional paradigms dictate different knowledge acquisition strategies and objectives. This is summarized in Table 2. I will not go into further detail here because that would lead into issues of curriculum design, which is not what this essay is about.
Teaching Tactics
It may be surprising (it surprised me) that methods of teaching are not defining characteristics of instructional paradigms or of knowledge acquisition. There are, in my experience, exactly four teaching methods, and they can, and usually do, appear whenever a teacher is present. Much learning, of course, occurs in the absence of a teacher, but in those cases self-instruction is usually occurring. That, too, may take one or more of the following forms:
Table 2: Paradigms of knowledge and instruction

Instruction paradigm | Knowledge paradigms: declarative, procedural, conceptual, analogical, logical
Behavior | Experts and novices are on the same measurement scales, and instruction enables novices to systematically accrue all five types of knowledge, until they reach expert levels.
Development | Experts and novices have different beliefs, and instruction enables novices to acquire all five types of knowledge in ways that challenge them to reconstruct their beliefs.
Apprenticeship | Experts and novices are in different worlds, and instruction enables novices to acquire all five types of knowledge (often tacit) in ways that facilitate their entry into the culture of expertise.
(a) Talking – lecturing, telling, reading from notes, presenting information verbally, talking back and forth (including dialectic), questioning, and so on, including talking to oneself.
(b) Displaying – modeling, showing, demonstrating.
(c) Coaching – pointing out cues, suggesting changes, guiding (all this while the student is doing something).
(d) Arranging the learning environment – setting up a self-instructing situation (e.g., arranging for students in a geology lab to classify rocks by checking them against a chart).
All other teaching tactics – reinforcement, use of media, and so forth – can appear in all four categories. Involvement of students as co-teachers can also occur in all four categories, as can the degree of social interaction generally.
Different combinations and proportions of teaching methods may occur in different instructional and learning paradigms. For example, there may be more lecturing where declarative knowledge is being imparted, but not necessarily. Arranging for students to browse through a library may impart more declarative knowledge than lecturing does. My point is that, while it is important to be aware of teaching tactics, these tactics do not uniquely characterize the paradigms under discussion here.
Implications for Research
The foregoing paradigms carry a number of implications for research in instruction and learning. For example, they help clarify calls for alternative forms of research, and they provide parsimonious criteria for classifying research that is currently being conducted. I will briefly discuss a few of these implications and then conclude.
Calls for Alternative Research
An excellent example is the presidential address to the American Educational Research Association delivered by Elliot Eisner in 1993. Eisner presents a personal odyssey of his own intellectual growth – from painting, through teaching painting to children in the ghettos of Chicago, through graduate school at the University of Chicago, through specialization in formal educational evaluation, and eventually to a confrontation (still unresolved, as he describes it) with discrepancies between educational research as it is traditionally practiced and the true ways that (in Eisner’s experience) human minds grow and work. As Eisner sums up his thesis:
If there are different ways to understand the world, and if there are different forms that make such understanding possible, then it would seem to follow that any comprehensive effort to understand the processes and outcomes of schooling would profit from a pluralistic rather than a monolithic approach to research. How can such a pluralism be advanced? What would it mean for the way we go about our work? I hope that questions of these kinds will become an agenda for our research in the future. (Eisner, 1993, p. 8)
The paradigms that I have described provide practicable ways of specifying the agenda for which Eisner is calling. The “monolithic” approach could be described as a behavior instructional paradigm that primarily fosters the acquisition of certain types of declarative and procedural knowledge. The “pluralistic” approach could be described as development or apprenticeship paradigms (the latter being especially familiar to art students) that would foster conceptual, analogical, and logical knowledge, as well as a wider variety of declarative and procedural knowledge. The question then becomes: Just how monolithic is current educational research?
The Nature of Current Research
I decided to approach this question by analyzing a volume of the Journal of Educational Psychology (1992, Vol. 84; see appendix). This is, of course, a very different task from observing in classrooms and laboratories. I had to depend on what authors said they did, and, in view of journal page constraints, experimenters and subjects must have done a great deal more than authors could describe. So I have doubtless made some errors, but I think the general picture will be clear.
Method. As listed in the appendix, there were 43 articles altogether. A few were omitted because they were commentaries, reprints from earlier years in the history of the journal, or had foreign language ambiguities.
Listed first in each appendix entry are authors, titles, and page numbers.
Listed second are the core instructional models that explicitly or implicitly influenced subject responses – behavior, development, apprenticeship, or unspecified. Only 11 of these articles explicitly manipulated instruction. These are coded EI (for experimental instruction). However, because all subjects had attended school, I also attempted to classify the nature of relevant school instruction (SI). Unspecified SI means that no portion of any school instruction (e.g., arithmetic, reading) was described in any terms by the author(s), and the school instruction’s possible relevance to the theory and procedure of the research project was therefore never addressed.
Listed third in the appendix entries is the type of knowledge activated and/or enhanced by experimental manipulations – declarative, procedural, conceptual, analogical, logical, or unclassifiable composite (UC). This last was usually an achievement or aptitude test battery. For each report, I asked myself what kind of knowledge was primarily activated by the procedures described in its methods section. Declarative knowledge was activated by reading, writing, talking, filling out questionnaires, drawing lines under words, and so forth. Procedural knowledge was activated by strategy training, rehearsing, eye movements, and so on. (I recognize that doing anything activates some procedural knowledge, but my concern here was with explicit skills of theoretical interest to the investigator.) Conceptual knowledge was activated by requirements to formulate schemata or categories. Analogical knowledge was activated by specific sensory stimuli – for example, the cartoon cat, Garfield; diagrams; and so forth. Logical knowledge was activated by reasoning about people (attribution theory), mental models of mechanical contrivances, belief systems, and problem solving.
Results.
As will be immediately evident from scanning Table 3, the most outstanding result of my analysis of these articles is that the nature of school instruction was seldom specified in them. Of the four papers that did mention (if only in a sentence or two) the nature of the school program, two were behavior models, and two were apprenticeship models. Of the latter, one was described as a play-based kindergarten program, and the second was a language arts program that required outside reading. Because both programs implied that instruction involved some degree of acculturation, I classified them as apprenticeships. Of the 11 experimental instructional programs, 6 were behavior models, 1 was a development model, and 4 were apprenticeship models. In three of the apprenticeship studies, researchers simply administered questionnaires about activities (e.g., participation in community sports) that involved acculturation. The remaining apprenticeship study examined the acculturation of mainstreamed special education students. In the single development study, students were challenged with controversial questions, and older students were expected to give responses that qualitatively differed from those of younger students.
Table 3: Number of instruction and knowledge paradigms in Journal of Educational Psychology, Vol. 84, 1992

Instruction paradigm | School instruction (N = 43) | Experimental manipulation (N = 11/43)
Behavior | 2 | 6
Development | 0 | 1
Apprenticeship | 2 | 4
Unspecified | 39 | 0

Knowledge paradigm | Number of studies (in 43 studies)
Declarative | 38
Procedural | 26
Conceptual | 8
Analogical | 8
Logical | 19
Unclassifiable composite | 14
The lower part of Table 3 lists the knowledge categories. A number of the studies activated more than one type of knowledge (see appendix), so the knowledge category list total exceeds the article total (43). It is clear from the table that declarative and procedural knowledge occur most frequently. A close third is logical knowledge – mostly arising from problem solving (e.g., algebra) and from belief systems (e.g., judgments of peers). Fourth in frequency is the unclassifiable composite of standardized test scores, performance ratings, and so on. There were only eight studies that activated conceptual and/or analogical knowledge.
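The tallying logic behind Table 3 can be illustrated with a small sketch. It is not the procedure actually used for the analysis, which was carried out by hand; the dictionary layout and the function name `tally` are assumptions made for illustration only. Three entries are transcribed from the appendix, and the knowledge counts are copied from Table 3 to show why the knowledge total exceeds the number of articles.

```python
from collections import Counter

# Three entries transcribed from the appendix. The dict layout is a
# hypothetical encoding chosen for this sketch, not the author's format.
entries = [
    {"authors": "Stader & Licht", "si": "behavior", "ei": None,
     "knowledge": ["declarative", "procedural", "logical"]},
    {"authors": "Benbow", "si": "unspecified", "ei": None,
     "knowledge": ["UC"]},
    {"authors": "Nicholls & Nelson", "si": "unspecified", "ei": "development",
     "knowledge": ["declarative", "logical"]},
]

def tally(entries):
    """Count school instruction (SI), experimental instruction (EI),
    and knowledge codes across a list of coded articles."""
    si = Counter(e["si"] for e in entries)
    ei = Counter(e["ei"] for e in entries if e["ei"] is not None)
    # An article may carry several knowledge codes, so this counter
    # can sum to more than the number of articles.
    knowledge = Counter(k for e in entries for k in e["knowledge"])
    return si, ei, knowledge

si, ei, knowledge = tally(entries)
print(si, ei, knowledge)

# Knowledge counts as reported in the lower part of Table 3.
table3_knowledge = {"declarative": 38, "procedural": 26, "conceptual": 8,
                    "analogical": 8, "logical": 19, "UC": 14}
total_codes = sum(table3_knowledge.values())
print(total_codes, "knowledge codes across 43 articles")
```

Because each article contributes one SI code but possibly several knowledge codes, the instruction counts in the upper part of the table sum to the number of articles, while the knowledge counts do not.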
Discussion and Conclusions
I want first to emphasize that I did not conduct a reliability study and that some of my classifications can be (and undoubtedly will be) disputed. However, the classification of school instruction as unspecified is not disputable. Description of school instruction is either present in the article, or it isn’t. Mostly, it isn’t. I found this result surprising and disturbing. The majority of researchers who are conducting studies of reading, writing, arithmetic, and other school subjects or classroom activities are not presenting any information about their subjects’ school training in these areas. The standard justification is that school programs constitute randomized variation. While this may make sense statistically, the question is whether it makes sense logically. A 5-year period of school training in a particular subject, such as reading, surely constitutes a powerful treatment effect. Have researchers perhaps trapped themselves into a quintessential catch-22? Are they alleging that rigorous educational research can be conducted only if
major treatments (school training) and avowed objectives (improvements in school training) are defined as random errors? Surely this is not the case. Surely most researchers avoid describing school programs because it is often very difficult to map classroom instruction onto theories that drive laboratory research. One specific purpose of my classification system has been to facilitate such mappings. School and laboratory programs can both be classified within the framework of three instructional paradigms and five knowledge paradigms. Once that is done, the links between school and laboratory can be much more easily identified. Reviews, as well, should begin with an analysis of the core instructional and knowledge paradigms that underlie the research programs of interest. It is this underlying logic that cuts through fashionable, but often confusing, terminology; that identifies the lines of research that the program is truly linked to; and that reveals the degree to which educational science as a whole is (or is not) advancing.
Appendix
Classification of research reports in the Journal of Educational Psychology, Vol. 84, 1992
Stader, S. R., & Licht, B. G. Effects of questionnaire administration condition on children’s achievement-related beliefs, 28–34; behavioral SI; declarative, procedural, logical.
Marsh, H. W. Content specificity of relations between academic achievement and academic self-concept, 35–42; unspecified SI; declarative, logical; UC.
Vaughn, S., Haager, D., Hogan, A., & Kouzekanani, K. Self-concept and peer acceptance in students with learning disabilities: a four- to five-year prospective study, 43–50; unspecified SI; declarative, analogical, logical; UC.
Benbow, C. P. Academic achievement in mathematics and science of students between ages 13 and 23: are there differences among students in the top one percent of mathematical ability? 51–61; unspecified SI; UC.
Low, R., & Over, R. Hierarchical ordering of schematic knowledge relating to area-of-rectangle problems, 62–69; unspecified SI; declarative, procedural, conceptual, analogical, logical.
Hegarty, M., Mayer, R. E., & Green, C. E. Comprehension of arithmetic word problems: evidence from students’ eye fixations, 76–84; unspecified SI; declarative, procedural, conceptual, logical.
Verschaffel, L., De Corte, E., & Pauwels, A. Solving compare problems: an eye movement test of Lewis and Mayer’s consistency hypothesis, 85–94; unspecified SI; declarative, procedural, conceptual, logical.
Herdman, C. M., & LeFevre, J. Individual differences in the efficiency of word recognition, 95–102; unspecified SI; declarative, procedural.
Bisanz, G. L., Das, J. P., Varnhagen, C. K., & Henderson, H. R. Structural components of reading time and recall for sentences in narratives: exploring changes with age and reading ability, 103–114; unspecified SI; declarative, procedural, conceptual.
Woloshyn, V. E., Pressley, M., & Schneider, W. Elaborative-interrogation and prior-knowledge effects on learning of facts, 115–124; unspecified SI; declarative, procedural.
Gustafsson, J., & Undheim, J. O. Stability and change in broad and narrow factors of intelligence from ages 12 to 15 years, 141–149; unspecified SI, apprenticeship EI; UC.
Hong, E., & O’Neil, H. F. Instructional strategies to help learners build relevant mental models in inferential statistics, 150–159; unspecified SI, behavior EI; declarative, procedural, analogical, logical.
Imai, M., Anderson, R. C., Wilkinson, I. A. G., & Yi, H. Properties of attention during reading, 160–173; unspecified SI, behavior EI; declarative, procedural.
Treiman, R., & Weatherston, S. Effects of linguistic structure on children’s ability to isolate initial consonants, 174–181; apprenticeship SI; declarative, procedural.
Haenggi, D., & Perfetti, C. A. Individual differences in reprocessing of text, 182–192; unspecified SI; declarative, procedural.
Breznitz, Z., & Share, D. L. Effects of accelerated reading rate on memory for text, 193–199; unspecified SI; declarative, procedural.
Maki, R. H., & Serra, M. Role of practice tests in the accuracy of test predictions on text material, 200–210; unspecified SI; declarative, procedural.
Rabinowitz, M., Freeman, K., & Cohen, S. Use and maintenance of strategies: the influence of accessibility to knowledge, 211–218; unspecified SI; declarative, procedural.
Hall, V. C., & Edmondson, B. Relative importance of aptitude and domain knowledge on immediate and delayed posttests, 219–223; unspecified SI, apprenticeship EI; declarative; UC.
Nicholls, J. G., & Nelson, J. R. Students’ conceptions of controversial knowledge, 224–230; unspecified SI, development EI; declarative, logical.
Pressley, M., Shuder, T., Bergman, J. L., & El-Dinary, P. B. A research-educator collaborative interview study of transactional comprehension strategies instruction, 231–246; behavior SI; declarative.
Duda, J. L., & Nicholls, J. G. Dimensions of achievement motivation in schoolwork and sport, 290–299; unspecified SI; apprenticeship EI; declarative, logical.
Pierson, L. H., & Connell, J. P. Effect of grade retention on self-system processes, school engagement, and academic performance, 300–307; unspecified SI; declarative; UC.
Costanzo, M. Training students to decode verbal and nonverbal cues: effects on confidence and performance, 308–313; unspecified SI; behavior EI; declarative, procedural, analogical, logical.
Juvonen, J. Negative peer reactions from the perspective of the reactor, 314–321; unspecified SI; declarative, logical.
Juvonen, J., & Bear, G. Social adjustment of children with and without learning disabilities in integrated classrooms, 322–330; unspecified academic SI; apprenticeship EI; declarative, logical; UC.
Fantuzzo, J. W., King, J. A., & Heller, L. R. Effects of reciprocal peer tutoring on mathematics and school adjustment: a component analysis, 331–339; unspecified SI; behavior EI; declarative, procedural, logical; UC.
Sawyer, R. J., Graham, S., & Harris, K. R. Direct teaching, strategy instruction, and strategy instruction with explicit self-regulation: effects on the composition skills and self-efficacy of students with learning disabilities, 340–352; unspecified SI; behavior EI; declarative, procedural, conceptual, logical.
Spector, J. E. Predicting progress in beginning reading: dynamic assessment of phonemic awareness, 353–363; unspecified SI; declarative, procedural.
Torgesen, J. K., Morgan, S. T., & Davis, C. Effects of two types of phonological awareness training on word learning in kindergarten children, 364–370; unspecified SI; behavioral EI; declarative, procedural.
Weaver, C. A., & Kintsch, W. Enhancing students’ comprehension of the conceptual structure of algebra word problems, 419–428; unspecified SI; declarative, procedural, conceptual, logical.
Schommer, M., Crouse, A., & Rhodes, N. Epistemological beliefs and mathematical text comprehension: believing it is simple does not make it so, 435–443; unspecified SI; declarative, logical.
Mayer, R. E., & Anderson, R. B. The instructive animation: helping students build connections between words and pictures in multimedia learning, 444–452; unspecified SI; declarative, procedural, analogical, logical.
Patterson, M. E., Dansereau, D. F., & Newbern, D. Effects of communication aids and strategies on cooperative teaching, 453–461; unspecified SI; behavior EI; declarative, procedural, analogical.
Tirre, W. C., & Pena, C. M. Investigation of functional working memory in the reading span test, 462–472; unspecified SI; declarative, procedural; UC.
Swanson, H. L. Generality and modifiability of working memory among skilled and less skilled readers, 473–488; unspecified SI; declarative, procedural, analogical, conceptual; UC.
Allen, L., & Cipielewski, J. Multiple indicators of children’s reading habits and attitudes: construct validity and cognitive correlates, 489–503; apprenticeship EI; declarative, procedural, analogical; UC.
Goldman, S. R., & Murray, J. D. Knowledge of connectors as cohesion devices in text: a comparative study of native-English and English-as-a-second-language speakers, 504–519; unspecified SI; declarative, conceptual, logical.
Wang, A. Y., Thomas, M. H., & Ouellette, J. A. Keyword mnemonic and retention of second-language vocabulary words, 520–528; unspecified SI; declarative, procedural.
Lobel, T. E., & Bempechat, J. Socialization of achievement: influence of mothers’ need for approval on children’s achievement cognitions and behavior, 529–536; unspecified SI; declarative, logical.
Goff, M., & Ackerman, P. L. Personality-intelligence relations: assessment of typical intellectual engagement, 537–552; unspecified SI; UC.
Marsh, H. W. Extracurricular activities: beneficial extension of the traditional curriculum or subversion of academic goals, 553–562; unspecified SI; UC.
Cashin, W. E., & Downey, R. C. Using global students rating items for summative evaluation, 563–572; unspecified SI; UC.
Note. Articles are listed in the order of their appearance in the Journal of Educational Psychology, rather than alphabetically. Articles of a certain type may have been grouped together by the journal editor, and I didn’t want to disrupt those patterns. SI = school instruction. EI = experimental instruction. UC = unclassifiable knowledge composite.
References
Bilodeau, E. A. (Ed.). (1969). Principles of skill acquisition. New York: Academic Press.
Brown, J. S., Collins, A., & Duguid, P. (1989). Situated cognition and the culture of learning. Educational Researcher, 18(1), 32–42.
Brush, F. R., & Overmier, J. B. (Eds.). (1985). Affect, conditioning, and cognition: Essays on the determinants of behavior. Hillsdale, NJ: Erlbaum.
Collins, A., Brown, J. S., & Newman, S. E. (1989). Cognitive apprenticeship: Teaching the crafts of reading, writing, and mathematics. In L. B. Resnick (Ed.), Knowing, learning, and instruction: Essays in honor of Robert Glaser (pp. 453–494). Hillsdale, NJ: Erlbaum.
Coy, M. W. (Ed.). (1989). Apprenticeship: From theory to method and back again. New York: SUNY.
Ebbinghaus, H. (1964). Memory. New York: Dover.
Eisner, E. W. (1993). Forms of understanding and the future of educational research. Educational Researcher, 22(7), 5–11.
Farnham-Diggory, S. (1972). Cognitive processes in education (1st ed.). New York: Harper & Row.
Farnham-Diggory, S. (1990). Schooling. Cambridge, MA: Harvard University Press.
Farnham-Diggory, S. (1992). Cognitive processes in education (2nd ed.). New York: HarperCollins.
Heider, F. (1958). The psychology of interpersonal relations. New York: Wiley.
Hilgard, E. R. (1948). Theories of learning. New York: Appleton-Century-Crofts.
Joncich, G. (1968). The sane positivist: A biography of Edward L. Thorndike. Middletown, CT: Wesleyan University Press.
Klausmeier, H. J., & Harris, C. W. (Eds.). (1966). Analyses of concept learning. New York: Academic Press.
Orton, S. T. (1925). Word-blindness in school children. Archives of Neurology and Psychiatry, 14, 582–615.
Piaget, J. (1951). The child’s conception of physical causality. London: Routledge & Kegan Paul.
Posner, G., Strike, K., Hewson, P., & Gertzog, W. (1982). Accommodation of a scientific conception: Toward a theory of conceptual change. Science Education, 66, 211–227.
Reber, A. S. (1967). Implicit learning of artificial grammars. Journal of Verbal Learning and Verbal Behavior, 6, 855–863.
Reed, S. K. (1973). Psychological processes in pattern recognition and categorization. Cognitive Psychology, 3, 382–407.
Rogoff, B. (1990). Apprenticeship in thinking. New York: Oxford University Press.
Rorabaugh, W. J. (1986). The craft apprentice: From Franklin to the machine age in America. New York: Oxford University Press.
Russell, B., & Branch, T. (1979). Second wind. New York: Ballantine.
Shiffrin, R. M., & Schneider, W. (1977). Controlled and automatic human information processing II: Perceptual learning, automatic attending, and a general theory. Psychological Review, 84, 127–190.
Smith, D. C. (1989). The role of teacher knowledge in teaching conceptual change science lessons. Unpublished doctoral dissertation, University of Delaware.
Spalding, R., & Spalding, W. (1986). The writing road to reading. New York: Morrow.
Standing, L. (1973). Learning 10,000 pictures. Quarterly Journal of Experimental Psychology, 25, 207–222.
Thorndike, E. L. (1913–1914). Educational psychology (Vols. 1–3). New York: Teachers College Press.
Thorndike, E. L. (1918). The nature, purposes, and general measurements of educational products. In G. M. Whipple (Ed.), The measurement of educational products. Seventeenth Yearbook of the National Society for the Study of Education, Part II (pp. 16–24). Bloomington, IL: Public School Publishing Co.
Underwood, B. J. (1982). Studies in learning and memory: Selected papers. New York: Praeger.
van Dijk, T. A., & Kintsch, W. (1983). Strategies of discourse comprehension. New York: Academic.
Watterson, P. R., Rendell, F., & Bell, A. (1988). Whale: Computer-assisted topic. Glasgow: Jordanhill College of Education, Sales and Publications.
29 Health Promotion by Social Cognitive Means Albert Bandura
I am deeply honored to be a recipient of the Healthtrac Award. It is a special honor to be recognized by a foundation that promotes the betterment of human health in the ways I value highly. In comparing myself to the figure Larry so generously described, I feel like a Swiss yodeler following Pavarotti.
The field of health is changing from a disease model to a health model. It is just as meaningful to speak of levels of vitality and healthfulness as of degrees of impairment and debility. Health promotion should begin with goals, not means.1 If health is the goal, biomedical interventions are not the only means to it. A broadened perspective expands the range of health-promoting practices and enlists the collective efforts of researchers and practitioners who have much to contribute from a variety of disciplines to the health of a nation.
The quality of health is heavily influenced by lifestyle habits. This enables people to exercise some measure of control over their health. By managing their health habits, people can live longer and healthier and retard the process of aging. Self-management is good medicine. If the huge health benefits of these few habits were put into a pill, it would be declared a scientific milestone in the field of medicine.
Source: Health Education & Behavior, 31(2) (2004): 143–164.
Supply-Side versus Demand-Side Approaches
Current health practices focus heavily on the medical supply side. The growing pressure on health systems is to reduce, ration, and delay health services to contain health costs. The days for the supply-side health system are limited. People are living longer. This creates more time for minor dysfunctions to develop into chronic diseases. Demand is overwhelming supply. Psychosocial factors partly determine whether the extended life is lived efficaciously or with debility, pain, and dependence.2,3 Social cognitive approaches focus on the demand side. They promote effective self-management of health habits that keep people healthy through their life span. Aging populations will force societies to redirect their efforts from supply-side practices to demand-side remedies. Otherwise, nations will be swamped with staggering health costs that consume valuable resources needed for national programs.
Social Cognitive Theory

This article focuses on health promotion and disease prevention by social cognitive means.4,5 Social cognitive theory specifies a core set of determinants, the mechanism through which they work, and the optimal ways of translating this knowledge into effective health practices. The core determinants include knowledge of health risks and benefits of different health practices, perceived self-efficacy that one can exercise control over one’s health habits, outcome expectations about the expected costs and benefits for different health habits, the health goals people set for themselves and the concrete plans and strategies for realizing them, and the perceived facilitators and social and structural impediments to the changes they seek. Knowledge of health risks and benefits creates the precondition for change. If people lack knowledge about how their lifestyle habits affect their health, they have little reason to put themselves through the travail of changing the detrimental habits they enjoy. But additional self-influences are needed for most people to overcome the impediments to adopting new lifestyle habits and maintaining them. Beliefs of personal efficacy play a central role in personal change. This focal belief is the foundation of human motivation and action. Unless people believe they can produce desired effects by their actions, they have little incentive to act or to persevere in the face of difficulties. Whatever other factors may serve as guides and motivators, they are rooted in the core belief that one has the power to produce desired changes by one’s actions. Health behavior is also affected by the outcomes people expect their actions to produce. The outcome expectations take several forms. The physical outcomes include the pleasurable and aversive effects of the behavior and the accompanying material losses and benefits. Behavior is also partly regulated by the social reactions it evokes.
The social approval and disapproval the behavior produces in one’s interpersonal relationships is the second major class of outcomes. The third set of outcomes concerns the positive and negative self-evaluative reactions to one’s health behavior and health status. People adopt
personal standards and regulate their behavior by their self-evaluative reactions. They do things that give them self-satisfaction and self-worth and refrain from behaving in ways that breed self-dissatisfaction. Motivation is enhanced by helping people to see how habit changes are in their self-interest and the broader goals they value highly. Personal goals, rooted in a value system, provide further self-incentives and guides for health habits. Long-term goals set the course of personal change. But there are too many competing influences at hand for distal goals to control current behavior. Short-term attainable goals help people to succeed by enlisting effort and guiding action in the here and now. Personal change would be easy if there were no impediments to surmount. The perceived facilitators and obstacles are another determinant of health habits. Some of the impediments are personal ones that deter performance of healthful behavior. They form an integral part of self-efficacy assessment. Self-efficacy beliefs must be measured against gradations of challenges to successful performance. For example, in assessing personal efficacy to stick to an exercise routine, people judge their efficacy to get themselves to exercise regularly in the face of different obstacles: when they are under pressure from work, are tired, feel depressed, are anxious, face foul weather, and have more interesting things to do. If there are no impediments to surmount, the behavior can be easy to perform and everyone is efficacious. The regulation of behavior is not solely a personal matter. Some of the impediments to healthful living reside in health systems rather than in personal or situational impediments. These impediments are rooted in how health services are structured socially and economically.
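The exercise example above, in which efficacy is judged against a graded set of obstacles, can be illustrated with a small sketch. It is only an illustration under stated assumptions: the 0 to 100 confidence scale, the averaging rule, the 50-point threshold, and the function names are hypothetical and are not taken from the article; only the list of obstacles comes from the text.

```python
from statistics import mean

# Obstacles named in the text for sticking to an exercise routine.
OBSTACLES = [
    "under pressure from work",
    "tired",
    "feeling depressed",
    "anxious",
    "foul weather",
    "more interesting things to do",
]


def efficacy_strength(ratings: dict[str, int]) -> float:
    """Average confidence across all obstacles (assumed 0-100 scale)."""
    for obstacle in OBSTACLES:
        if obstacle not in ratings:
            raise ValueError(f"missing rating for: {obstacle}")
        if not 0 <= ratings[obstacle] <= 100:
            raise ValueError(f"rating out of range for: {obstacle}")
    return float(mean(ratings[o] for o in OBSTACLES))


def weakest_obstacles(ratings: dict[str, int], threshold: int = 50) -> list[str]:
    """Obstacles rated below the (hypothetical) threshold, in listed order."""
    return [o for o in OBSTACLES if ratings[o] < threshold]


if __name__ == "__main__":
    ratings = dict(zip(OBSTACLES, [80, 40, 30, 60, 90, 60]))
    print(efficacy_strength(ratings))
    print(weakest_obstacles(ratings))
```

Rating each obstacle separately, rather than asking one global question, is what lets the low-confidence situations stand out as targets for skill building.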
Primacy of Efficacy Belief in Causal Structures

Self-efficacy is a focal determinant because it affects health behavior both directly and by its influence on the other determinants. Efficacy beliefs influence goals and aspirations. The stronger the perceived self-efficacy, the higher the goals people set for themselves and the firmer their commitment to them. Self-efficacy beliefs shape the outcomes people expect their efforts to produce. Those of high efficacy expect to realize favorable outcomes. Those of low efficacy expect their efforts to bring poor outcomes. Self-efficacy beliefs also determine how obstacles and impediments are viewed. People of low efficacy are easily convinced of the futility of effort in the face of difficulties. They quickly give up trying. Those of high efficacy view impediments as surmountable by improvement of self-management skills and perseverant effort. They stay the course in the face of difficulties. Figure 1 shows the paths of influence in the posited sociocognitive causal model. Beliefs of personal efficacy affect health behavior both directly and by their impact on goals, outcome expectations, and perceived facilitators and impediments.
[Figure 1 diagram. Nodes: Self-Efficacy; Outcome Expectations (Physical, Social, Self-Evaluative); Goals; Sociostructural Factors (Facilitators, Impediments); Behavior.]
Figure 1: Structural paths of influence wherein perceived self-efficacy affects health habits both directly and through its impact on goals, outcome expectations, and perception of sociostructural facilitators and impediments to health-promoting behavior
Overlap in Health Belief Models

There are many psychosocial models of health behavior. They are founded on the common metatheory that psychosocial factors are heavy contributors to human health. For the most part, the models include overlapping determinants but under different names. In addition, facets of a higher order construct are often split into seemingly different determinants, as when different forms of anticipated outcomes of behavioral change are included as different constructs under the name of attitudes, normative influences, and outcome expectations. Following the timeless dictum that the more the better, some researchers overload their studies with a host of factors that contribute only trivially to health habits because of redundancy. Figure 2 shows the factors the various health models select and their overlap with determinants in social cognitive theory. Most of the factors in the different models are mainly different types of outcome expectations. Perceived severity and susceptibility to disease in the health-belief model are the expected negative physical outcomes. The perceived benefits are the positive outcome expectations. In the theories of reasoned action and planned behavior, attitudes toward the behavior and social norms produce intentions that are said to determine behavior. Attitude is measured by perceived outcomes and the value placed on those outcomes. As defined and operationalized, these are outcome expectations, not attitudes as traditionally conceptualized. Norms are measured by perceived social pressures and one’s motivation to comply with them. Norms correspond to expected social outcomes for a given behavior. Goals may be distal ones or proximal ones. Intentions are essentially proximal goals. “I aim to do x” and “I intend to do x” are really the same thing. Perceived control in the theory of planned behavior overlaps with perceived self-efficacy.
[Figure 2 chart. Rows (Theories): Social Cognitive Theory, Health Belief Model, Theory of Reasoned Action, Theory of Planned Behavior, Protection Motivation Theory. Columns (Psychosocial Determinants of Health Behavior): Self-Efficacy; Outcome Expectations (Physical, Social, Self-Evaluative); Goals (Proximal, Distal); Impediments (Personal & Situational, Health System).]
Figure 2: Summary of the main sociocognitive determinants and their areas of overlap in different conceptual models of health behavior
Regression analyses reveal substantial redundancy of predictors bearing different names.6 For example, after the contributions of perceived self-efficacy and self-evaluative reactions to one’s health behavior are taken into account, neither intentions nor perceived behavioral control adds any incremental predictiveness. Most of the models of health behavior are concerned only with predicting health habits. But they do not tell you how to change health behavior. Social cognitive theory offers both predictors and principles on how to inform, enable, guide, and motivate people to adopt habits that promote health and reduce those that impair it.4
Threefold Stepwise Implementation Model

The social utility of health promotion programs can be enhanced by a stepwise implementation model. In this approach, the level and type of interactive guidance is tailored to people’s self-management capabilities and motivational preparedness to achieve desired changes. The first level includes people with a high sense of efficacy and positive outcome expectations for behavior change. They can succeed with minimal guidance to accomplish the changes they seek. Individuals at the second level have self-doubts about their efficacy and the likely benefits of their efforts. They make halfhearted efforts to change and are quick to give up when they run into difficulties. They need additional support and guidance by interactive means to see them through tough times. Much of the guidance can be provided through tailored print or telephone consultation. Individuals at the third level believe that their health habits are beyond their personal control. They need a great deal of personal guidance in a structured mastery program. Progressive successes build belief in their ability to exercise control and bolster their staying power in the face of difficulties and setbacks. Thus, in the stepwise model, the form and level of enabling interactivity is tailored to the participants’ changeability readiness. The following sections are devoted to a more detailed consideration of how to enable people at these various levels of changeability to improve their health status and functioning.
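The three levels described above can be read as a simple triage rule. The sketch below is one possible rendering, not a procedure given in the article: the 0 to 100 scales, the cutoffs of 70 and 30, and the names `GuidanceLevel` and `assign_level` are all assumptions introduced for illustration.

```python
from enum import Enum


class GuidanceLevel(Enum):
    MINIMAL = 1      # level 1: can succeed with minimal guidance
    INTERACTIVE = 2  # level 2: tailored print or telephone consultation
    STRUCTURED = 3   # level 3: personal guidance in a structured mastery program


def assign_level(
    efficacy: float,
    outcome_expectation: float,
    high: float = 70.0,  # hypothetical cutoff for a "high" score
    low: float = 30.0,   # hypothetical cutoff for "beyond my control"
) -> GuidanceLevel:
    """Map assumed 0-100 scores onto the three levels of the stepwise model."""
    if efficacy >= high and outcome_expectation >= high:
        return GuidanceLevel.MINIMAL
    if efficacy < low:
        return GuidanceLevel.STRUCTURED
    # Self-doubt about efficacy or about the likely benefits of effort.
    return GuidanceLevel.INTERACTIVE


if __name__ == "__main__":
    for scores in [(85, 80), (50, 40), (10, 60)]:
        print(scores, assign_level(*scores).name)
```

In practice the cutoffs would have to be set empirically; the point of the sketch is only that guidance intensity is chosen from measured efficacy and outcome expectations rather than applied uniformly.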
Public Health Campaigns

Societal efforts to get people to adopt healthful practices rely heavily on public health campaigns. These population-based approaches promote changes mainly in people with high perceived efficacy for self-management and positive expectations that the prescribed changes will improve their health. Meyerowitz and Chaiken7 examined four possible mechanisms through which health
communications could alter health habits: by transmitting information on how habits affect health, by arousing fear of disease, by increasing perceptions of one’s personal vulnerability or risk, or by raising people’s beliefs in their efficacy to alter their habits. They found that health communications foster adoption of healthful practices to the extent that they raise beliefs in personal efficacy. To help people reduce health-impairing habits by health communications requires a change in emphasis from trying to scare people into health to enabling them with the self-management skills and self-beliefs needed to take charge of their health habits. In longitudinal analyses of community-based health campaigns, Rimal8,9 found that perceived self-efficacy governs whether individuals translate perceived risk into a search for health information and whether they translate acquired health knowledge into healthful behavioral practices. Those of low self-efficacy take no action even though they are knowledgeable about lifestyle contributors to health and perceive themselves to be vulnerable to disease. Maibach and colleagues10 found that both people’s preexisting self-efficacy beliefs that they can exercise control over their health habits and the self-efficacy beliefs instilled by a community health campaign contributed to adoption of healthy eating habits and regular exercise (Figure 3).
Overprediction of Refractoriness

Our theories overpredict the resistance of health habits to change. This is because they are developed by studying mainly refractory cases while ignoring successful self-changers. For example, nicotine is one of the most addictive substances. Smoking is said to be intractable because it is compelled by biochemical and psychological dependencies. Each puff sends a reinforcing nicotine shot to the brain. Prolonged use is said to create a relapsing brain disease.
[Figure 3 path diagram. Nodes: Preexisting Self-Efficacy, Community Health Campaign, Change in Self-Efficacy, Adoption of Health Habits. Path coefficients as extracted: .35 (.32), .35 (.44), .16 (.16).]
Note: The initial numbers on the paths of influence are the significant path coefficients for adoption of healthy eating patterns; the numbers in parentheses are the path coefficients for regular exercise.10
Figure 3: Paths of the influence of perceived self-efficacy on health habits in community-wide programs to reduce risk of cardiovascular disease
The problem with this theorizing is that it predicts far more than has ever been observed. More than 40 million people in the United States have quit smoking on their own. Where was their brain disease? How did the smokers cure the disease on their own? Superimposed on the 40 million self-quitters, the dismal relapse curves that populate our journals are but a tiny ripple in the vast sea of successes. Carey and his colleagues verified longitudinally that heavy smokers who quit on their own had a stronger belief in their efficacy at the outset than did continuous smokers and relapsers.11 Successful self-changers combine efficacy belief with outcome expectations that benefits will outweigh disadvantages of the lifestyle changes. The same is true for alcohol and narcotic addiction. Lee Robins12 reported a remarkably high remission rate for heroin addiction among Vietnam veterans without the benefit of treatment. Vaillant13 has shown that a large share of alcoholics eventually quit drinking without treatment, assistance from self-help groups, or radical environmental change. Granfield and Cloud14 put it well when they characterized the inattention to successful self-changes in substance abuse as “the elephant that no one sees.”
Enhancement of Health Impact by Interactive Technologies

The absence of individual guidance places limits on the power of one-way mass communication. The revolutionary advances in interactive technology can increase the scope and impact of health promotion programs. On the input side, health communications can now be personally tailored to factors known to affect health behavior. Tailoring communications does not guarantee better outcomes. The benefits of individualization will depend on the predictive value of the tailored factors. If weak or irrelevant factors are targeted, individualization will not provide incremental benefits. Development of measures for key social cognitive determinants known to affect health behavior can provide guidance for tailoring strategies. On the behavioral adaptation side, individualized interactivity further enhances the impact of health promotion programs. Social support and guidance during early periods of personal change and maintenance increase long-term success. Here, too, the impact of social support will depend on its nature. Converging evidence across diverse spheres of functioning reveals that social support has beneficial effects only if it raises people’s beliefs in their efficacy to manage their life circumstances.15 If social support is provided in ways that foster dependence, it can undermine coping efficacy. Effective enablers provide the type of support and guidance that is conducive to self-efficacy enhancement for personal success.5 Interactive computer-assisted feedback provides a convenient means for informing, enabling, motivating, and guiding people in their efforts to make lifestyle changes. The personalized feedback can be adjusted to participants’
efficacy level, the unique impediments in their lives, and the progress they are making. The feedback may take a variety of forms, including individualized print communications, telephone counseling, and linkage to supportive social networks. I shall describe shortly a self-management system that encompasses these various enabling features.
Socially Mediated Pathways of Influence

There is another way in which the power of population-based approaches to health promotion can be strengthened. There is only so much that large-scale health campaigns can do on their own, regardless of whether they are tailored or generic. There are two pathways through which health communication can alter health habits (Figure 4). In the direct pathway, media promote changes by informing, modeling, motivating, and guiding personal changes. In the socially mediated pathway, the media link participants to social networks and community settings. These places provide continued personalized guidance, natural incentives, and social supports for desired changes. The major share of behavioral changes is promoted within these social milieus.16 Psychosocial programs for health promotion will be increasingly implemented via interactive Internet-based systems. People at risk for health problems typically ignore preventive or remedial health services. For example, young women at risk of eating disorders resist seeking help. But they will use Internet-delivered guidance because it is readily accessible, convenient, and provides a feeling of anonymity. Studies by Taylor and colleagues17 attest to its potential. Through interactive guidance, women reduced dissatisfaction with their weight and body shape, altered dysfunctional attitudes, and rid themselves of disordered eating behavior. Interactive technologies are a tool, not a panacea. They cannot do much if individuals cannot motivate themselves to take advantage of what they have to offer. These systems need to be structured in ways that build motivational and self-management skills as well as guide habit changes. Otherwise, those who need the guidance most will use this tool least.

[Figure 4 diagram, titled “Dual Paths of Influence”: Media Influence leads to Behavior Change directly and also through Connections to Social Systems.]
Figure 4: Paths of influence through which mass communications affect psychosocial changes both directly and via a socially mediated pathway by linking viewers to social networks and community settings
Promoting Society-Wide Changes by Serial Dramas

The social-linking function via the media is illustrated in global applications of serial television dramas founded on social cognitive theory that address some of the most urgent global problems.18 These include soaring population growth and the transmission of AIDS. Hundreds of episodes in these long-running serials get people deeply involved in the lifestyle changes being modeled. The serials dramatize the everyday problems people struggle with, model solutions to them, and provide people with incentives and strategies for bettering their lives. The story lines model family planning, women’s equality, environmental conservation, AIDS prevention, and a variety of life skills. It is of limited value to motivate people to change if they are not provided with appropriate resources and environmental supports to realize those changes. The dramatizations, therefore, link people to community resources where they can receive a lot of continued supportive guidance. Worldwide applications in Africa, Asia, and Latin America are raising people’s efficacy to exercise control over their family lives, enhancing the status of women, and fostering the adoption of contraceptive practices to lower the rates of childbearing. A controlled study in Tanzania compared changes in family planning and contraception use in the half of the country that received a dramatic series with the rest of the country that did not.19 Compared to the control region, more families in the broadcast area went to family planning clinics and adopted family planning and contraceptive methods (Figure 5). The dramatic series produced similar changes later, when it was broadcast in the former control region of the country. Some of the story lines centered on safer sexual practices to prevent the spread of AIDS. Infection rates are high among long-distance truckers and prostitutes at truck stops.
The dramatic productions focused on self-protective and risky sexual practices and modeled how to curb the spread of HIV infection. Compared with residents in the control region, those in the broadcast region increased belief in their personal risk of HIV infection through unprotected sexual practices, talked more about HIV infection, reduced the number of sexual partners, and increased condom use.20,21 The greater the exposure to the modeled behavior, the stronger the effects on perceived efficacy to control family size and risky sexual practices.
[Figure 5 line graph. Y-axis: Mean Number of New Family Planning Adopters Per Clinic (0 to 120); x-axis: year (1989 to 1996); series: Broadcast Region, Control Region, Control => Broadcast.]
Note: The period 1990 to 1992 is the prebroadcast baseline. The values for 1993 to 1994 are the family planning adoption levels in the broadcast region (solid line) and the control region (dotted line). The values for 1995 to 1996 are the adoption levels when the serial was aired in the previous control region (solid line).20
Figure 5: Mean number of new family planning adopters per clinic in the Ministry of Health Clinics in the broadcast region and those in the control region

Self-Management Model

Health habits are not changed by an act of will. Changing them requires motivational and self-regulatory skills. Self-management operates through a set of psychological subfunctions. People have to learn to monitor their health behavior and
the circumstances under which it occurs, and how to use proximal goals to motivate themselves and guide their behavior. They also need to learn how to create incentives for themselves and to enlist social supports to sustain their efforts. DeBusk and his colleagues22 have developed a self-management model for health promotion and disease risk reduction founded on the self-regulatory mechanisms of social cognitive theory. This self-management model combines self-regulatory principles with computer-assisted implementation (Figure 6). It includes exercise programs to build cardiovascular capacity, nutrition programs to reduce dietary fat to lower risk of heart disease and cancer, weight reduction programs, and smoking cessation programs. For each risk factor, people are provided detailed guides on how to improve their health functioning. They monitor their health habits, set themselves short-term goals, and report the changes they are making. The computer mails personalized reports that include feedback of progress toward subgoals. The feedback also provides guides on how to manage troublesome situations and new subgoals to realize. Efficacy ratings identify areas in which self-regulatory skills must be developed if beneficial changes are to be achieved and maintained. A single implementer, assisted with a computerized implementation system, provides intensive, individualized guidance in self-management to large numbers of people.
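The monitor, set subgoal, report, and feedback cycle described above can be sketched in a few lines. This is a hypothetical illustration, not the system developed by DeBusk and colleagues: the `WeeklyReport` fields, the 10% subgoal increment, the 0 to 100 efficacy scale, and the threshold of 50 are all assumptions, and the sketch covers only habits where more is better (for example, minutes of exercise).

```python
from dataclasses import dataclass


@dataclass
class WeeklyReport:
    habit: str
    subgoal: float   # short-term target the person set, e.g. minutes per week
    achieved: float  # self-monitored amount the person reports
    efficacy: int    # confidence to keep the habit up (assumed 0-100 scale)


def feedback(report: WeeklyReport, step: float = 0.10, low_efficacy: int = 50) -> dict:
    """Build a personalized progress report with the next subgoal."""
    if report.subgoal <= 0:
        raise ValueError("subgoal must be positive")
    met = report.achieved >= report.subgoal
    # Raise the subgoal modestly after a success; otherwise keep it attainable.
    next_subgoal = round(report.subgoal * (1 + step), 1) if met else report.subgoal
    return {
        "habit": report.habit,
        "progress_pct": round(100 * report.achieved / report.subgoal),
        "met_subgoal": met,
        "next_subgoal": next_subgoal,
        # Low efficacy flags an area where self-regulatory skills need work.
        "needs_skill_guidance": report.efficacy < low_efficacy,
    }


if __name__ == "__main__":
    print(feedback(WeeklyReport("exercise", subgoal=100, achieved=120, efficacy=40)))
```

Because each report is computed from the participant's own data, a single implementer can review flagged cases while routine feedback is generated automatically.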
[Figure 6 diagram, titled “Self-Regulatory Delivery System.” Components: Physician, Program Implementor, Patient, and a Computerized System (Data Base, Self-Regulatory Change Programs); links labeled Progress Reports and Phone Contact.]
Figure 6: Computer-assisted self-regulatory system for altering health habits
In tests of the preventive value of this self-management system, employees in the workplace lowered elevated cholesterol by altering eating habits high in saturated fats (Figure 7). They achieved even larger reductions if their spouses took part in the program. The more room for dietary change, the larger the reduction in plasma cholesterol. A single nutritionist implemented the entire program at minimal cost for large numbers of employees. Nonadherence to drug therapies is a pervasive, serious problem. It worsens health conditions and raises medical costs. Moreover, it may lead physicians to prescribe stronger medications or more drastic interventions in response to the seeming failure of the prescribed treatment. A major public health nightmare is that excessive use of drugs and erratic compliance will breed hardier strains of pathogens that render existing medications ineffective. The success of the self-management system in promoting adherence is shown in a program by West and his colleagues23 to reduce sodium intake in patients suffering from heart failure (Figure 8). It strengthened patients’ efficacy to adhere to a low-sodium diet. They achieved substantial reduction in sodium intake and maintained it during a 6-month period. At each time point, the higher the perceived self-efficacy, the greater the sodium reduction. Haskell and his associates24 used the self-management system to promote lifestyle changes in patients suffering from coronary artery disease. This places them at high risk of heart attacks. At the end of 4 years, those receiving medical care by their physicians showed no change or they got worse. In contrast, those aided in self-management by nurse implementers achieved big reductions in multiple risk factors: They lowered their intake of saturated fat, lost weight, lowered their bad cholesterol, raised their good cholesterol, exercised more, and increased their cardiovascular capacity (Figure 9). 
The program also altered the physical progression of the disease. Those receiving the self-management program had 47% less buildup of plaque on their artery walls (Figure 10). They also had fewer coronary events, fewer hospitalizations, and fewer deaths.
[Figure 7 bar charts, two panels. Y-axis: Reduction in Plasma Cholesterol (mg/dl), 0 to 30. Legend: Self-Regulation, Control. Left-panel categories: Subject, Subject & Spouse. Right-panel categories: Small and Large Room for Dietary Change.]
Note: The panel on the left summarizes the mean cholesterol reductions achieved in applications in the workplace by participants who used the self-management system either by themselves or along with their spouses, or did not receive the system to provide a control baseline. The right panel presents the mean cholesterol reductions achieved with the self-management system by participants whose daily cholesterol and fat intake was high or relatively low at the outset of the program.
Figure 7: Levels of reduction in plasma cholesterol achieved with the self-regulation system
[Figure 8 line graphs, two panels: Self-Efficacy (axis 80 to 100) and Sodium Intake (axis 2000 to 3500); x-axis: Weeks (B, 2, 6, 8, 20).]
Figure 8: Enhancement of perceived self-regulatory efficacy and reduction of sodium intake through the aid of the self-management system
[Figure 9 bar chart. Axis: Percent Change (−30 to 30). Categories: LDL Cholesterol, HDL Cholesterol, Triglycerides, Dietary Fat, Weight, Exercise, Cardiovascular Capacity, Framingham Risk Index. Legend: Self-Regulatory System, Physicians’ Care.]
Source: Plotted from data of Haskell et al.24
Figure 9: Reduction in multiple risk factors by patients with coronary atherosclerosis depending on whether they received the usual care from their physicians or training in self-management of health habits
[Figure 10 line graph. Y-axis: Cumulative Cardiac Events (0 to 50); x-axis: Year (0 to 5); series: Physicians’ Care, Self-Regulatory System.]
Source: Plotted from data of Haskell et al.24
Figure 10: Differences in the number of cardiac deaths, hospitalizations for nonfatal myocardial infarction, and other cardiac events for patients who received the usual care from their physician or training in self-management of health habits
The success of the self-management system has been compared in five hospitals to the standard medical postcoronary care in patients who have already suffered a heart attack. At the end of the 1st year, the self-management system is more effective in reducing risk factors and increasing cardiovascular functioning than the standard medical care. The self-management system is well received because it is individually tailored to people’s needs. It provides continuing personalized guidance that enables people to exercise control over their own change. It is a home-based program that does not require any special facilities, equipment, or group meetings plagued with high drop-out rates. It can serve large numbers of people simultaneously under the guidance of a single implementer. It is not constrained by time and place. It combines the high individualization of the clinical approach with the large-scale applicability of the public health approach. It provides valuable health promotion services at low cost. In the present applications, the computer is used as a coordinating and mailing system to guide self-directed change and to provide feedback of progress. By linking the interactive aspects of the self-management model to the Internet, one can vastly expand its availability to people wherever they may live, at whatever time they may choose to use it.
Health Promotion in Children Through Interactive Media

The interactive capabilities of electronic technologies are beginning to be creatively enlisted for health promotion. A company in Silicon Valley is developing interactive video games that raise children’s perceived self-efficacy and enable them to manage chronic health conditions.25
In a role-playing video game for diabetic children, players win points depending on how well they understand the diabetic condition and regulate the diet, insulin, and blood sugar levels of two wacky diabetic pachyderms, Packy and Marlon. They set out to retrieve the food and diabetes supplies snatched by pesky enemy critters in a diabetes summer camp. To succeed, children have to boost the elephants’ health by managing their diabetes as they fight off the pesky critters using their trunks as water cannons and peanut launchers. The better the children manage the meals, blood glucose, and insulin dosage of the pachyderm duo to stay in the safe zone, the more points the children win. Children love the video game. They quickly become experts in how to manage diabetes (Figure 11). In assessments conducted 6 months later,26 the interactive role playing raises the children’s self-care efficacy. They talk more freely about their diabetes and their feelings about it. They adopt dietary and insulin practices to keep their blood sugar level under control. They reduce urgent doctor visits for diabetes emergencies by 77%. Control children who played a video game unrelated to health decreased their self-care and increased emergency doctor visits by 7%.

[Figure 11 bar charts, four panels: Self-Efficacy, Communication, Diabetes Self-care, Urgent Doctor Visits. Y-axes: Percent Change.]
Figure 11: Changes exhibited in a 6-month follow-up in perceived self-efficacy to manage different aspects of diabetes, child-initiated discussions about diabetes, level of diabetes self-care, and number of emergency doctor visits by children who had the benefit of the role-playing video game and diabetic control children who played other entertainment video games26
Asthmatic children learn how to manage their condition by helping an asthmatic dinosaur named Bronchiasaurus stay strong and healthy while on a risky mission in an environment riddled with allergens. In the interactive game, children learn how to avoid asthma triggers, to keep the air free of respiratory irritants, to track peak flow, and to take medication. The video game improves knowledge about asthma, enhances perceived efficacy to avoid things that trigger asthma attacks, and improves use of emergency medications.27 Children with cystic fibrosis are taught how to deal with their lung problem by using medications and physical therapy to keep the lungs of a virtual puppy clear. Another interactive video game discourages children from smoking promoted by the Blackburn Tobacco Company. A daring surgeon enters the body in microscopic size with lasers to repair the damage done by smoking to save the smoker’s life. He clears phlegm from the bronchial tubes, removes tar deposits and precancerous cells from the throat and lungs, removes plaque and a deadly blood clot in the arteries, and enters the brain to conquer nicotine addiction. The children become experts in the harmful effects of smoking. They lose any appetite for it. These health-promoting videos are being widely distributed to families by pediatricians. This is but the beginning in the creative use of the interactive video technology to promote childhood health.
Childhood Health Promotion Models
Many of the lifelong habits that jeopardize health are formed during childhood and adolescence. For example, unless youngsters take up the smoking habit in their teens, they rarely become smokers in adulthood. It is easier to prevent detrimental health habits than to try to change them after they become deeply entrenched as part of a lifestyle. Prevention should be given priority but rarely is. Health habits are rooted in familial practices. But schools have an important role to play in promoting the health of a nation. This is the only place where all children can be easily reached. It is a natural setting for promoting healthful eating and exercise habits, discouraging smoking and other types of substance abuse, and building generic self-management skills. An effective preventive program includes four major components. The first component is informational. It informs children of the health risks and benefits of different lifestyle habits. The second component develops the social and self-management skills for translating informed concerns into effective preventive practices. The third component builds a resilient sense of efficacy to support the exercise of control in the face of difficulties and setbacks that inevitably arise. The final component enlists and creates social supports for desired personal changes. Educational efforts to promote the health of youths usually produce weak results. They provide factual information about health. But they usually do little
to equip children with the skills and efficacy beliefs that enable them to manage the emotional and social pressures to adopt detrimental health habits. Managing health habits involves managing social relationships, not just targeting a specific health behavior for change. Health promotion programs that include the essential elements of the self-management mastery model prevent or reduce injurious health habits. Health knowledge can be conveyed readily, but changes in values, self-efficacy, and health habits require greater effort. The more behavioral mastery experiences provided, the greater the beneficial effect.28 The more intensive the program, and the better the implementation, the stronger the impact.29 Comprehensive approaches that integrate guided mastery health programs with family and community efforts are more successful in promoting health and in preventing detrimental habits than are programs in which the schools try to do it alone.30 Schools are inadequately equipped with the resources, training, and incentives to undertake health promotion and early modification of habits that jeopardize health. As in other social systems, schools focus on areas in which they are evaluated. They are not graded for health promotion. When preventive programs are grudgingly allowed in schools, they try to do too much, with too little, in too short a time, with fitful quality of implementations to achieve much. Such efforts often do more to discredit psychosocial approaches through deficient implementation than to advance the health of youths. Health promotion must be structured as a part of a societal commitment that makes the health of its youth a matter of high priority. A serious commitment must provide the personnel, incentives, resources, and the operational control needed to do the job well. The programs should be in the school, but not of the school. 
New school-based models of health promotion should operate together with the home, the community, and the society at large. Schools’ health-related practices need changing as well. Schools that are provided with a brief health promotion curriculum and encouraged to lower the fat content of their lunch offerings and enhance their physical activity offerings produce lasting improvements in children’s eating and exercise habits.31 It is the height of irony to strive to promote healthful habits in schoolchildren while schools promote in their lunch program fast foods and house vending machines that dispense sodas and candy in return for substantial payments to schools by commercial enterprises.
Self-Management of Chronic Diseases
The weight of disease is shifting from acute to chronic maladies. The self-management of chronic diseases is another example of the use of self-regulatory and self-efficacy theory to develop cost-effective models with high social
utility. Biomedical approaches are ill-suited for chronic diseases because they are devised mainly for acute illness. The treatment of chronic disease must focus on self-management of physical conditions over time. Holman and Lorig32 devised a generic self-management program in which patients are taught pain control techniques, self-relaxation, and proximal goal setting combined with self-incentives as motivators to increase level of activity. Participants are also taught problem-solving self-diagnostic skills and how to take greater initiative for their health care in dealings with health personnel. These skills are developed through modeling of self-management skills, guided mastery practices, and informative feedback. In the self-management of arthritis, the program is implemented in groups in community settings by leaders who lead active lives despite their arthritis (Figure 12). A 4-year follow-up with arthritic patients reveals that it retards the biological progression of diseases, raises perceived efficacy, reduces pain, markedly decreases the use of medical services by 43%, and improves the quality of life. Both the baseline efficacy beliefs and the efficacy beliefs instilled by the self-management program predict the health benefits 4 years later.
[Figure 12 bars: Disease Progression; Self Efficacy; Pain; Physician Visits. Vertical axis: Percent Change, from −40 to +20.]
Note: The 9% biological progression of the disease is much less than the 20% disease progression one would normally expect during 4 years for this age group.
Source: Plotted from data of Lorig (1990).37
Figure 12: Enduring healthful changes achieved by training in self-management of arthritis as revealed in a follow-up assessment 4 years later
The self-management approach provides a generic model that can be adapted with supplementary components to different chronic diseases. Indeed, the self-management program produces similar health benefits for people suffering from other types of chronic diseases, such as heart disease, lung disease, stroke, and arthritis.33
Socially Oriented Approaches to Health
The field of health has been plagued by a contentious dualism. It gets politicized in battles between individualist approaches and structuralist approaches to health. The individualist proponents argue that people can exercise a good deal of control over their health. So it is their responsibility to maintain it. The structuralist proponents argue that health is largely the product of social, environmental, political, and economic conditions, over which individuals have little control. In actuality, health promotion needs both approaches, not contentious debates. The quality of health of a nation is a social matter, not just a personal one. It requires changing the practices of social systems that impair health rather than just changing the habits of individuals. We do not lack sound policy prescriptions in the field of health. What is lacking is the collective efficacy to realize them. The main focus of a social approach is on collective enablement for changing social, political, and environmental conditions that affect health.4 Socially oriented approaches seek to raise public awareness of health hazards, to educate and influence policy makers, to build community capacity to change health policies and practices, and to mobilize the collective citizen action needed to override vested political and economic interests that benefit from existing unhealthful practices. Social cognitive theory extends the conception of human agency to collective agency.34,35 People do not operate as isolates. They work together to improve the quality of their lives. Their shared beliefs in their collective efficacy to accomplish social change play a key role in the policy and public health approaches to health promotion and disease prevention. For example, cigarette smoking is the most personally preventable cause of death.
People got smoke-free workplaces, restaurants, public buildings, and airliners through their own collective action, not through the governmental agencies with the responsibility to protect national health. Lobbyists get legislators to block tobacco regulation (Figure 13). The more tobacco money the legislators get, the more dutifully they vote against tobacco regulation. The political impediments to legislative initiatives take the form of the obstructive triad – defeat, defang, and deregulate. The obstructive strategy is to defeat legislative initiatives, preferably in congressional committees, to spare legislators public votes that may be unpopular with their constituents. Laws provide the general guidelines. Congressional staff must convert them
[Figure 13 panels: U.S. House of Representatives; U.S. Senate. Vertical axes: % Voting for Tobacco Control. Horizontal axes: Money received, binned as $0–300, $301–2,750, $2,751–37,750 in one panel and $0–1,500, $1,501–9,250, $9,251–19,550, $19,551–61,989 in the other.]
Source: Public Citizen Health Research Group, 1993.38
Figure 13: Relationship between the amount of campaign money legislators receive from the tobacco industry and their likelihood of voting against legislation to regulate tobacco products
into operational regulations. If you cannot defeat the legislation, defang it by translating the law into regulations that circumvent the intent of the legislation. If you cannot defang it, deregulate the regulators to undermine the monitoring and implementation of the legislation. With industry lobbyists and legislators erecting protective barriers, the social battles over health shift increasingly to grassroots initiatives at local levels.
Enablement for Community Self-Help
While collective efforts are made to change unhealthful social practices, people need to improve their current life circumstances over which they have some control. We need to devote more attention to psychosocial models on how best to enable people to work together to improve their health at local levels. The approaches that work best promote community self-help. But people need to be given the necessary resources and enabling guidance to help themselves. Otherwise, simply to tell people with intractable problems to fend for themselves is an evasion of societal responsibility. Unsupported prescription of local self-help can be easily used as a political subterfuge for civic neglect. A community effort to reduce infant mortality resulting from unsanitary conditions in poor Latino neighborhoods provides one example of effective collective enablement.36 The community was fully informed of the impact of unsanitary conditions on children’s health through the local media, churches, schools, and
neighborhood meetings conducted by influential persons in the community. The residents were taught how to install plumbing systems, sanitary sewerage facilities, and refuse storage. They were also taught how to secure the financing needed from different local and governmental sources. This enabling self-help program greatly improved sanitation and markedly reduced infant mortality.
Components of Psychosocial Models for Social Change
There are three major components in the social cognitive theory for promoting psychosocial changes society-wide.16,18 The first component is a sound theoretical model that specifies the determinants of psychosocial change and the mechanisms through which they produce their effects. This knowledge provides the guiding principles. The second component is a translational and implementational model that converts theoretical principles into an innovative operational model by specifying the content, strategies of change, and their mode of implementation. The third component is a social diffusion model on how to promote adoption of psychosocial programs in diverse cultural milieus. It does so by making functional adaptations of the programs to different sociostructural circumstances, providing enabling guidance, and enlisting the necessary resources to achieve success. We construct theories and clarify how they work. But we do not profit from our successes because we fail to develop effective translational and social diffusion models. If we are to contribute significantly to the betterment of human health, we must broaden our perspective on health promotion and disease prevention beyond the individual level. This calls for a more ambitious socially oriented agenda of research and practice. We can further amplify our impact on human health by making creative use of evolving interactive technologies that expand the scope and impact of health promotion efforts. But this is another story. And I have come to the end of this one. As you venture forth to promote your own health and that of others, may the efficacy force be with you.
Note
A major portion of this article was presented as the Healthtrac Foundation Lecture at the convention of the Society for Public Health Education in Philadelphia, November 9, 2002.
References
1. Nordin I: The limits of medical practice. Theor Med Bioeth 20:105–123, 1999.
2. Fries JF, Crapo LM: Vitality and Aging: Implications of the Rectangular Curve. San Francisco, Freeman, 1981.
3. Fuchs V: Who Shall Live? Health Economics and Social Choice. New York, Basic Books, 1974.
4. Bandura A: Self-Efficacy: The Exercise of Control. New York, Freeman, 1997.
5. Bandura A: Psychological aspects of prognostic judgments, in Evans RW, Baskin DS, Yatsu FM (eds.): Prognosis of Neurological Disorders (2nd ed.). New York, Oxford University Press, 2000, pp. 11–27.
6. Dzewaltowski DA, Noble JM, Shaw JM: Physical activity participation: Social cognitive theory versus the theories of reasoned action and planned behavior. J Sport Exerc Psychol 12:388–405, 1990.
7. Meyerowitz BE, Chaiken S: The effect of message framing on breast self-examination attitudes, intentions, and behavior. J Pers Soc Psychol 52:500–510, 1987.
8. Rimal RN: Closing the knowledge-behavior gap in health promotion: The mediating role of self-efficacy. Health Commun 12:219–237, 2000.
9. Rimal RN: Perceived risk and self-efficacy as motivators: Understanding individuals’ long-term use of health information. J Communic 8:633–654, 2001.
10. Maibach E, Flora J, Nass C: Changes in self-efficacy and health behavior in response to a minimal contact community health campaign. Health Commun 3:1–15, 1991.
11. Carey MP, Kalra DL, Carey KB, Halperin S, Richards CS: Stress and unaided smoking cessation: A prospective investigation. J Consult Clin Psychol 61:831–38, 1993.
12. Robins LN: The Vietnam drug user returns. Special Action Office Monograph. Ser. A, No. 2. Washington, DC, Government Printing Office, 1974.
13. Vaillant GE: The Natural History of Alcoholism Revisited. Cambridge, MA, Harvard University Press, 1995.
14. Granfield R, Cloud W: The elephant that no one sees: Natural recovery among middle-class addicts. J Drug Iss 26:45–61, 1996.
15. Bandura A: Social cognitive theory in cultural context. J Appl Psychol 51:269–290, 2002.
16. Bandura A: Social cognitive theory of mass communications, in Bryant J, Zillman D (eds.): Media Effects: Advances in Theory and Research (2nd ed.). Hillsdale, NJ, Lawrence Erlbaum, 2001, pp. 121–153.
17. Taylor CB, Winzelberg A, Celio A: Use of interactive media to prevent eating disorders, in Striegel-Moor R, Smolak L (eds.): Eating Disorders: New Direction for Research and Practice. Washington, DC, American Psychological Association, 2001, pp. 255–270.
18. Bandura A: Environmental sustainability by sociocognitive deceleration of population growth, in Schmuck P, Schultz W (eds.): The Psychology of Sustainable Development. Dordrecht, the Netherlands, Kluwer, 2002, pp. 209–238.
19. Rogers EM, Vaughan PW, Swalehe RMA, Rao N, Svenkerud P, Sood S: Effects of an entertainment-education radio soap opera on family planning behavior in Tanzania. Stud Fam Plann 30:1193–1211, 1999.
20. Vaughan PW, Rogers EM, Swalehe RMA: The Effects of “Twende Na Wakati,” an Entertainment-Education Radio Soap Opera for Family Planning and HIV/AIDS Prevention in Tanzania. Unpublished manuscript, University of New Mexico, Albuquerque, 1995.
21. Vaughan PW, Rogers EM, Singhal A, Swalehe RM: Entertainment-education and HIV/AIDS prevention: A field experiment in Tanzania. J Health Communic 5:81–100, 2000.
22. DeBusk RF, Miller NH, Superko HR, Dennis CA, Thomas RJ, Lew HT, Berger WE III, Heller RS, Rompf J, Gee D, Kraemer HC, Bandura A, Ghandour G, Clark M, Shah RV, Fisher L, Taylor CB: A case-management system for coronary risk factor modification. Ann Intern Med 120:721–729, 1994.
23. West JA, Bandura A, Clark E, Miller NH, Ahn D, Greenwald G, DeBusk RF: Self-Efficacy Predicts Adherence to Dietary Sodium Limitation in Patients With Heart Failure. Unpublished manuscript, Stanford University, Stanford, CA, 1999.
24. Haskell WL, Alderman EL, Fair JM, Maron DJ, Mackey SF, Superko HR, Williams PT, Johnstone IM, Champagne MA, Krauss RM, Farquhar JW: Effects of intensive multiple risk factor reduction on coronary atherosclerosis and clinical cardiac events in men and women with coronary artery disease. Circulation 89:975–990, 1994.
25. Lieberman DA, Brown SJ: Designing interactive video games for children’s health education, in Morgan K, Satava RM, Sieburg HB, Mattheus R, Christensen JP (eds.): Interactive Technology and the New Paradigm for Healthcare. Amsterdam, IOS Press and Ohmsha, 1995, pp. 201–210.
26. Brown SJ, Lieberman DA, Gemeny BA, Fan YC, Wilson DM, Pasta DJ: Educational video game for juvenile diabetes care: Results of a controlled trial. Med Inform 22:77–89, 1997.
27. Lieberman DA: Interactive video games for health promotion: Effects on knowledge, self-efficacy, social support, and health, in Street RL, Gold WR, Manning T (eds.): Health Promotion and Interactive Technology: Theoretical Applications and Future Directions. Hillsdale, NJ, Lawrence Erlbaum, 1997, pp. 103–120.
28. Bruvold WH: A meta-analysis of adolescent smoking prevention programs. Am J Public Health 83:872–880, 1993.
29. Connell DB, Turner RR, Mason EF: Summary of findings of the school health education evaluation: Health promotion effectiveness, implementation, and costs. J School Health 55:316–321, 1985.
30. Perry CL, Kelder SH, Murray DM, Klepp K: Communitywide smoking prevention: Long-term outcomes of the Minnesota heart health program and the class of 1989 study. Am J Publ Health 82:1210–1216, 1992.
31. Luepker RV, Perry CL, McKinlay SM, Nader PR, Parcel GS, Stone EJ, Webber LS, Elder JP, Feldman HA, Johnson CC, Kelder SH, Wu M: Outcomes of a field trial to improve children’s dietary patterns and physical activity: The child and adolescent trial for cardiovascular health (CATCH). JAMA 275:768–776, 1996.
32. Holman H, Lorig K: Perceived self-efficacy in self-management of chronic disease, in Schwarzer R (ed.): Self-Efficacy: Thought Control of Action. Washington, DC, Hemisphere, 1992, pp. 305–323.
33. Lorig K, Sobel DS, Stewart AL, Brown BW, Bandura A, Ritter P, Gonzalez VM, Laurent DD, Holman HR: Evidence suggesting that a chronic disease self-management program can improve health status while reducing hospitalization: A randomized trial. Med Care 37:5–14, 1999.
34. Bandura A: Exercise of human agency through collective efficacy. Curr Dir Psychol Sci 9:75–78, 2000.
35. Bandura A: Social cognitive theory: An agentic perspective. Annu Rev Psychol 52:1–26. Palo Alto, CA, Annual Reviews Inc., 2001.
36. McAlister AL, Puska P, Orlandi M, Bye LL, Zbylot P: Behaviour modification: Principles and illustrations, in Holland WW, Detels R, Knox EG (eds.): Oxford Textbook of Public Health (2nd ed.), Vol. 3. Applications in Public Health. Oxford, UK, Oxford University Press, 1991, pp. 3–16.
37. Lorig K: Self-Efficacy: Its Contributions to the Four Year Beneficial Outcome of the Arthritis Self-Management Course. Paper presented at the meeting of the Society for Behavioral Medicine, Chicago, April 1990.
38. Public Citizen Health Research Group: Health Letter 9(11), 1993.
30
Models of the Learner
Jerome Bruner
Topics, including the topics of keynote addresses to learned societies, have a hermeneutic history. The hermeneutic history of a topic, we are cautioned, must be taken into account if we are fully to interpret its meaning. The topic of my paper, Models of the Learner, is no exception. It has such a history and has a proximal origin in a set of exchanges – first as a conversation and then as the topic of a more formal learned discussion. Let me set forth the beginning narrative of that hermeneutic circle (or spiral) and continue it in the discussion that follows. The setting was an international conference in the not very Orwellian summer of 1984, a conference ostensibly on the vexed subject of how to improve the quality of education. Sponsored jointly by the Van Leer Jerusalem Foundation and the Aspen Institute, it took place in a handsome mansion overlooking one of the scenic lakes on the outskirts of Berlin – a mansion that had been reconstructed on the ruins of the residence of the infamous Goebbels, Hitler’s Minister of Culture, or was he the Minister of Propaganda? The participants were appropriately distinguished: some Deans of famous faculties of education, more than a sprinkling of great names in what everybody would agree is educational research, and a handful of psychologists and associated behavioral scientists whose work bore that tangential relation to the process of education that excites the optimism of educators with respect to the relevance of “pure” research. We were perhaps two dozen in number, and it was a convivial company. After a day and a half of discussion on topics of great generality, all conducted at a level of striking knowledgeability, someone proposed that we
Source: Educational Researcher, 14 (1985): 5–8.
could really not get to the heart of the matter unless we had more clearly in mind some working model of what a learner was, how he or she operated, and above all, what we thought to be an adequate learning environment for our putative learner. It was proposed to the plenary session that we give over the next morning to these issues. I was among those asked to prepare some sort of statement on the matter. The discussion that ensued was lively. What it left behind in my mind and what several of us discussed later was the flat-footed impossibility of ever settling institutional questions of education without first making a decision – yes, a political decision – on the nature of learning and learners. Yet for all that, the decision about learning and learners was perforce a decision about an ideal, about how we conceived what a learner should be in order to assure that a society of a particularly valued kind could be safeguarded. There is no completely naturalistic way of resolving the question about what model of the learner we want to enshrine at the center of our practice of education. For there are many ways to learn and many ways of encouraging different forms of learning with different ends in view. At the heart of the decision process there must be a value judgment about how the mind should be cultivated and to what end. While I wish to consider alternate models of the learner, I have no illusion that I can do so just in the spirit of a naturalist or as a student of the learning process. In fact, models of the learner that are on offer in the psychological literature, in the cognitive sciences, or in AI are themselves constructions based on a selection not simply of data, but of the conditions under which learning is studied. 
As I tried to say a few years ago, it is possible to construct not only experimental studies but “real life” situations that make people (or pigeons, for that matter) look stupid or clever, generative or passive, combinatorial or rote (Bruner, 1982). Then the theoretical model that is constructed becomes, as it were, the text of the culture, and life is made to imitate text in the same subtle ways in which, in another closely related domain, life imitates art. Please do not interpret what I am saying in the relativistic sense that all theories or models of the learner are equally true or even equally right. Rather, what I wish to say is that any model of learning is right or wrong for a given set of stipulated conditions, including the nature of the tasks one has in mind, the form of the intention one creates in the learner, the generality or specificity of the learning to be accomplished, and the semiotics of the learning situation itself – what it means to the learner. This is not to say that a new or different model of the learner is needed for every task or situation in which learning takes place. To put it in the current jargon, it is absurd to insist that each and every theory of learning is utterly domain specific, that nothing general can be said about learners or learning or learning environments. You do not quite need a different model of a learner to talk about learning how to play chess, learning how to play the flute, learning mathematics, and learning to read the sprung rhymes in the verse of Gerard
Manley Hopkins. Even if I do have to say it in folk psychology rather than in programming talk, all of them will involve attention and memory and courage and even, pace AI, some heuristics for maintaining frustration tolerance. The issue, as we shall see yet again, is that learning is indeed context sensitive, but that human beings, given their peculiarly human competence, are capable of adapting their approach to the demands of different contexts. But I am tipping my hand, for it is only later that I wanted to talk about a general model of a learner as one equipped to discriminate and deal differentially with a wide variety of possible worlds exhibiting different conditions, yet worlds in which one can cope. Let me now take a fast gallop through the landscape we surveyed that day from the phoenix nest on the site of Goebbels’ house in the exurbs of Berlin when we got down to our formal discussion of “models of the learner.”
Models of the Learner
Tabula rasa. The first, and perhaps the most ancient is really based on the Aristotelian notion of mimesis. In its 18th century version, it rested on the premise that experience writes on the wax tablet of the mind. One learns from experience (rather than through divine revelation or through received texts). Or as Locke put it, nothing gets into the mind save through the senses – but as Leibniz countered, nothing except mind itself. This view takes as a central premise that such order as there is in mind is a reflection of the order that exists in the world, and that is why the concept of association is always so central to empiricist theories. Things that are together in space and time in the world succeed, under the sway of this principle, in being together in the mind. I need not go into the troubles of empiricism. They have been raked over historically by everybody from Aristotle (whose sensus communis was something of a constructivist takeover bid) through the Schoolmen, from Kant through Wittgenstein and Chomsky. I want, rather, to take it as a given, a cultural text in Geertz’s sense, to be examined for its cultural significance in shaping our practices. I want to note only that, given belief in associationist empiricism, we adopted ideas about learning procedures to fit and constructed learning environments that in fact made people look like little empiricists – averting our eyes from all instances where it didn’t, as for example in the acquiring of a language. And when we were forced to look at that, we concocted Augustinian theories about it and devised nonsense syllable research in support of them. The formula for success in empiricism is to have experience.
Hypothesis generator. There is a class of learner models that represents a reaction against the rather passive view of empiricist, tabula rasa notions. They have in common a notion of intentionality at their center.
The learner, rather than being the creature of experience, selects that which is to enter.
The principle of selection varies from theory to theory: from the sensus communis of Aristotle and the vis integretiva of Aquinas that sorted the associated input of experience in the light of the principles of reason, to the principles of wish-fulfillment and ego defense of Freud that permitted us to experience (or interpret) only those parts of experience that were adequate compromises between the demands of conflicting needs. What exactly generates hypotheses or programs the filter, which selects and organizes what gets through the senses into the mind, varied widely and was always seriously underdefined. Even such towering learning theorists as Edward Tolman, Lev Vygotsky, and John Dewey, all of whom took the view that experience came shaped by hypotheses rather than by the world, were grandly vague in their specification of how hypotheses came into being – though Dewey and Vygotsky gave special pride of place to the role of language as a hypothesis-generator, a place that promised more than it delivered. It was never altogether clear how to extrapolate an educational posture from hypothesis theories, save in one respect. Emphasis was on an active curiosity guided by self-directed projects – a feature of Progressivism in America and in the unrealized pedagogy of Vygotsky’s followers in the Soviet Union, unrealized save in the discipline of “defectology.” The formula for success in learning, according to the hypothesis formulation, is to have a good theory.
Nativism. At least three forms of muted nativism have shaped our models of the learner. One derives from Immanuel Kant. A second comes from Gestalt theory. The third, derivative of Descartes, is still with us in Chomsky’s theory of mind. In a deep historical sense, they are all inheritors of the tradition of Platonism. All share one central concept: Mind is inherently or innately shaped by a set of underlying categories, hypotheses, forms of organizing experience.
The task of the learner is to work his or her way through the cluttered surface structure of sense to an underlying or ideal or deep organization that provides a richer or righter or more predictive or more generalizable representation of reality. Where evolutionism entered this view (as with ethnologists and, in a shriller form, in sociobiology) it is assumed that the fit between the categories or hypotheses of mind and the world that they represent is a product of natural selection. For all their disagreements on details, Nativist theories have one big thing in common: The opportunity to use and exercise the innate powers of mind is all. That is the formula for success as well. Constructivism. Probably the most powerful expression of this view comes from Jean Piaget, although a more rigorous and considered expression of it can be found in the writings of the philosopher Nelson Goodman. The tenet of Piaget’s constructivism is that the world is not found, but made, and made according to a set of structural rules that are imposed on the flow of experience. By structural rules it is intended to emphasize that knowledge is not local but derived from a structure of the whole – that local operations reflect universal
operations of the system as a whole. Learning is bound within the limits of the rules of the system; it consists of realizations of the general rules in application to particulars. Development consists of a series of stage-like progressions, stage change consisting of a change in the rules of the system and later rule systems absorbing earlier ones as special cases. The learning dynamic of the system at any stage is provided by an unstable equilibrium or dialectic between assimilating experience to the rules and accommodating the rules to experience. When the equilibrium becomes unstable enough, the structure changes. The constructivist model of the learner places strong emphasis on self-propelled operations on the world as the way to mastery – a pretty wideband conception. Its formula for success is that nothing succeeds like a theoretical system, and one succeeds supremely only by going to a higher system that subsumes it. Novice-to-Expert. This view of the learner has so recently emerged that it is hard to characterize. It is very practical, in some respects highly anti-theoretical. It operates within domains, almost at times denying the utility of a general theory – or perhaps that is a sign of its immaturity. It begins with the premise that if you want to find out about learning, ask first about what is to be learned, find an expert who does it well, and then look at the novice and figure out how he or she can get there. To aid in this task, simulate the novice’s performance and the expert’s in a computer program, and see what transformations and heuristics will get you from the one to the other. You may even be helped by studying and simulating some typical mid-stages. Such generality as may be present in learning different tasks will eventually show up in the simulations. The immediate challenge is to get the novice to be an expert as quickly and as painlessly as possible, and never mind high theory. 
The formula for success is “be specific and be explicit.” Or, a computer programmer is a better friend than a philosopher of mind. Or, it is more important to get through the keyhole than to see the sky. Or, and perhaps more seriously, subordinate the learner to the steps he must take to attain expertise. In sketching these views about the model of the learner, I have omitted an important issue, one that had better be treated independently of each, for it is curiously extrinsic to all of them. It is the issue of the carrot and the stick – the role of motivation in learning. It has been a source of embarrassment in the history of the subject from the Stoics to Skinner. Let me state its dilemma in the starkest way. How can knowing something be affected by whether the knowledge gained leads to reward or to punishment? If the theory of reinforcement related to the acquisition of knowledge, God would not have had to expel poor Adam and Eve from the garden for eating of the tree of knowledge. He would have arranged, Huck Finn style, for them to have developed a very bad stomach ache from the consumption of green apples. Instead, He knew that knowledge, once attained, is irreversible and for better or for worse. And so, if I may be Miltonian, He had to condemn them to a new way of life where that knowledge could be put to use.
It is the use of knowledge rather than knowledge itself that is affected by the nature of its consequences. Use implies performance; performance entails action. The carrot and the stick are instruments for affecting action, not thought. Thus the degree to which models of the learner feature reinforcement is the degree to which they concentrate on the behavior of the learner rather than on his or her mind. It is not surprising, then, that even in the heyday of the Empiricists (who thought of themselves as philosophers of mind) virtually nothing was said about carrots and sticks. Indeed, as Crane Brinton reminded us a generation ago in his classic Anatomy of Revolution, the precepts of Empiricism (particularly in John Locke) were designed to justify man’s freedom from the authority of King and Clergy. He was, in this new dispensation, his own knowledge getter. Thus Jonathan Edwards could preach to his flock in Northampton on the frontier of the Massachusetts Bay Colony in the late 17th century that they too, like Isaac Newton, could by their own efforts of mind unlock the secrets of God. It is interesting, then, that most theories depicted the learner, either implicitly or explicitly, as self-motivated – at least while they concentrated on learning as a means of acquiring knowledge. Indeed, we can say that the carrot and the stick – reinforcement – have to do not with learning but with morality: how one acts on the basis of what one knows. Even then, the connection between reward and punishment on the one side and virtuous action on the other remains as obscure as ever. The debate over the effectiveness of, say, prisons rages as incoherently as ever. And the thought controls imposed by dictators are much more concerned with censorship and other means of stopping the flow of information than they are with tinkering with schedules of reinforcement.
A Model of Models
I have already tipped my hand, as I confessed in passing. There is no reason, save ideology and the exercise of political control, to opt for a single model of the learner. We do learn from experience, when that is all we have to go on. On occasion we act like induction machines, though it is rarely so dark out that we can’t do better than that. Indeed, given half a chance, we generate hypotheses that take us way beyond the information given – often with good effect, and always with some risk, which requires courage and the buffering of a support system. There is every reason to believe that a nervous system evolved in nature and more latterly and swiftly in culture endows us with a set of useful presuppositions about both nature and culture. How else can we account for the swift mastery of language and other symbolic forms to which we take so easily and with insufficient knowledge for proper induction? How can we doubt that a culture that regulates its moods and acts according to such abstract inventions as interest rates, social slights,
gross national products, and loyalty to Alma Mater is made up of people who not only construct the world in which they live but share it as an ontological given? It is even true that if you want to be a postman or a trust officer, you would do well to look closely at how they go about their business and then try to simulate them as a clever clone, hopefully keeping your tongue in your cheek and your powder dry the while. What it amounts to, as I have already hinted, is treating all models of the learner as stipulative, and then inquiring into the conditions under which they might be effective or useful or comforting. If you genuinely believe that it improves a nation’s confidence in its control over things to keep children in schools for a good part of the day, then do so. Or if you think formal schooling is structurally inevitable in a society with more dissensus than consensus, again keep them in school. These are reasons of politics, and they plainly have a place in any debate, for education is political too. But if you see children learning mathematics by rote, you can also say (this time on more naturalistic yet practical grounds) that somebody got confused about models and slipped in an empiricist one in place of a constructionist one. In a word, the best approach to models of the learner is a reflective one that permits you to “go meta,” to inquire whether the script being imposed on the learner is there for the reason that was intended or for some other reason. There is not one kind of learning. It was the vanity of a preceding generation to think that the battle over learning theories would eventuate in one winning over all the others. Any learner has a host of learning strategies at command. The salvation is in learning how to go about learning before getting irreversibly beyond the point of no return. 
We would do well to equip learners with a menu of their possibilities and, in the course of their education, to arm them with procedures and sensibilities that would make it possible for them to use the menu wisely. Here the hermeneutic circle ends. You cannot improve the state of education without a model of the learner. Yet the model of the learner is not fixed but various. A choice of one reflects many political, practical, and cultural issues. Perhaps the best choice is not a choice of one, but an appreciation of the variety that is possible. The appreciation of that variety is what makes the practice of education something more than a scripted exercise in cultural rigidity.
31 Child’s Talk: Learning to Use Language Jerome Bruner
Source: Child Language Teaching and Therapy, 1 (1985): 111–114.
This is a short but fascinating book which sums up Jerome Bruner’s work on children’s language over a ten-year period. As anyone who has read any of his earlier work would expect, there is a major emphasis in the book on the child’s ability to USE the linguistic structures he acquires to communicate needs, wishes, and intentions, and to ‘conduct joint action with another’ (p. 7). Learning to talk is not simply learning the words, or the syntax, but learning how to do things with them. It follows, for Bruner, that an exclusive focus on grammatical structure is unlikely to be fruitful in explaining the mystery of language development in the child, because the functions which the linguistic forms serve are ignored. In his preface Bruner is explicit about his lack of interest in the earliest phase of developmental linguistics, in the 1960s, when the field was dominated by the study of syntax. It was his move to Oxford as Watts Professor of Experimental Psychology in 1972, to an intellectual climate sympathetic to his views on language, which stimulated a research programme which forms the basis for his book. Bruner and his collaborators recorded conversations between young children and their mothers in their homes, and used these data to try to answer the question of how children are assisted by their linguistic community to develop the language they use. The claim put forward in the introduction to the book is that a crucial factor in the child’s acquisition is the ‘transactional format’: [Language acquisition] begins when the mother and infant create a predictable format of interaction that can serve as a microcosm for communicating and for constituting a shared reality. The transactions that occur in
such formats constitute the input from which the child then masters grammar, how to refer and mean, and how to realize his intentions communicatively (p. 18).
The formats are a principal feature of what Bruner calls the Language Acquisition Support System, a framework for interaction which inducts the child into the language to be learned via its appropriate uses. In the first part of the book Bruner examines some early formats which he claims ‘provide a type case for the framing of early communication’. These forms are games like peekaboo, hide and seek, and object exchange, and they each provide a highly ritualized setting in which joint action by mother and child takes place, regulated by (relatively restricted though variable) language. In examining these formats in his longitudinal studies, Bruner was able to document the child’s developing mastery of roles, and of the language of transactions in the game. He also noted that children become able to generalize the game formats to contexts in which they had not occurred before. Bruner moves on from these early game routines to discuss the growth of reference, which, it is again argued, develops within strictly constrained routines or formats. One of these is the ‘reading format’ (studied by Bruner and Anat Ninio) in which mother and child are looking at picture books together. It turns out that in this format there are a restricted range of utterance types by the mother, and that their order in the discourse is stable and predictable. The mother’s utterances direct the child’s attention towards easily identifiable objects, ask questions about them, label them, and provide feedback for the child’s attempts at labelling. Again a context shared between mother and child, in which the child can gradually become familiar with a restricted range of linguistic possibilities, serves as the framework within which a crucial aspect of language is learned. The linguistic possibilities cannot be TOO restricted, however, otherwise change cannot take place. It is in addressing this issue that Bruner uses the metaphor (familiar to poker players) of ‘raising the ante’. 
Within a familiar framework the mother will be stricter about what she will accept as appropriate behaviour as the child’s linguistic ability improves. Bruner gives as an example a mother’s changing response to her child’s vocalisations in the reading game, once he had started to use his first words. She responded by treating him as someone who was linguistically more capable, and required him to produce identifiable lexical items in his responses, and would not accept babbles that she could not understand. The child had to repeat or modify his utterance. ‘She became much firmer in her demands’ (p. 83). The demands that are made are however fine-tuned to the child’s current capacities. This maternal raising of the ante is seen, then, as an adaptation of these shared contexts to the child’s developing linguistic abilities, which is presumably crucial to their constructive role in development.
The third important topic in the book is that of procedures of request – invitations, requests for objects, and requests for assistance in action. Once again early development is found to depend on the scaffolding provided by familiar formats. The ‘specification’ for requests in English is made clear for the child by his participation in routines structured for him by the adult community. The book presents a well-worked-out and coherent view of the Language Acquisition Support System, not only as a necessary framework for the child learning to talk, but also as a way for adults to induct the child into the culture which he is entering. Readers of this journal, particularly those concerned with very young children, or with older children who for some reason are just starting to talk, will find it thought-provoking and challenging. It is not a practical book, in any sense, but in its attempt to face head-on some of the most central and difficult questions about language development, from the perspective of the child as a language user in a cultural setting, it cannot fail to be instructive. There are however caveats to be entered, which concern the extent to which the notion of format can be generalized. First, the data which are carefully considered in the book are from children from middle-class homes. One is aware of other modes of acculturation, in different homes, which could well not involve some or many of the formats which emerge in Bruner’s data. For the theory to hold, it would be necessary to identify, in these alternative environments, similar routines which did the work of peekaboo, reading picture books, etc. More seriously, if the substitute formats were not available, we would expect that the functions they underpin would simply not develop. The second problem of generalization is with respect to language development later than that addressed by Bruner. 
He largely restricts himself to the first 18 months of life, and so it might be regarded as unreasonable to raise issues that concern children older than this. But once we start to think about language development even up to the age of three, it is a little unclear what the explanatory role of formats is, perhaps because as the child’s linguistic abilities improve dramatically, it seems likely that identifying and isolating formats will prove a problem. There is between two-and-a-half and three years of age a fairly dramatic influx of various kinds of verb modification into the English-speaking child’s language. The modals can and will, for example, are likely to make their appearance, most often in sentence-initial position in utterances which are requests of various kinds. Now if it is possible to identify a ‘steady format’ (p. 127) for requesting, that persists up to the age of three (and one would need convincing that procedures could be specified for format identification), the weight of explanation for the changes in the child’s request language (from pointing, to single word, to multi-item declarative, to can-initial structure) has to fall somewhere else – perhaps on the mother’s linguistic fine-tuning within the steady format and not on the
format at all. And if we consider other verb-forms, like the tense markers which typically appear at the same time, what is/are the format/s which assist the child to their appropriate use? (Bruner in fact suggests right at the end of the book that children may ‘develop primitive concepts of aspectual time’ (p. 134) from the sequential structure that formats have. But this is an aside which is not particularly illuminating.) The reader will find that the book stimulates thought on these and other issues central to language acquisition studies. The book is well produced and has been carefully copy-edited. A paperback version is available.
32 The Reflexivity of Cognitive Science: The Scientist as Model of Human Nature Jamie Cohen-Cole
In 1963, Bernard Berelson edited a collection of essays by leading scholars in the behavioral sciences. The essays found their origin in a series of radio broadcasts for Voice of America, the radio-based propaganda arm of the United States Information Agency. These programs aimed both to cover the immediate topic at hand and to carry out the general mandate of the Voice of America – showing the virtues of the American way to people around the world (on Voice of America see Heil, 2003; Krugler, 2000; Shulman, 1990). To accomplish this dual aim, the programs’ more specific goal was to explain how the various behavioral sciences operated. Their broader goal – unstated but very real – was to demonstrate the connection between American democracy and the objective and scientific study of society. At the conclusion of his address on psycholinguistics, George Miller remarked: ‘the scientist is Everyman, looking just as you and I. We go and look for the things we want, and when we find them we find part of ourselves’ (1963: 150). These comments raise several issues worth close attention. First, Miller collapsed the distinction between the scientist and ‘Everyman’. From this perspective, the psychology of the scientist is the same as the psychology of the human subject. Second, the salient feature of human nature (or of the scientific process) is the process of searching. To Miller, searching provides both knowledge of the world and knowledge of oneself. Clearly, Miller’s image of objectivity was not one that required the scientist to stand apart from the object of knowledge.
Source: History of the Human Sciences, 18(4) (2005): 107–139.
Miller’s conclusion engaged in a double reflexivity, linking the scientist’s self to the human subjects (Ss) studied and, at the same time, connecting self-knowledge to knowledge of the world. While Miller may be unusual in his ability to engage two forms of reflexive argument in the space of two sentences, the mere fact that he engaged in reflexivity at all should not be particularly surprising. Several genres have suggested the importance of seeing the interconnection of psychologists’ own selves with the human selves they seek to describe. First, history of psychology literature has noted how psychologists have engaged in reflexivity by linking their topics of study to their own subjectivity (Capshew, 1999; Danziger, 1990; Morawski, 1992, 2000; Richards, 1987). Second, work in science studies has argued for analyzing science by breaking down the analytic boundaries between the natural world and the social world (Latour, 1993). If applied to history of psychology, this work would imply the analytic value of breaking down distinctions between the natural world described by psychology (the human mind) and the minds and social worlds of psychologists themselves. Even if one does not accept the value of engaging in reflexive practices in the human sciences, there remain good reasons to believe that human nature consistently leads human scientists to engage in reflexive practices. If human selves are socially constructed or if people are ‘made up’ as Ian Hacking has argued (1986), then one would expect to find that human scientists have often used their self-knowledge as a generative feature in their work. Science made from such self-knowledge would then have helped bring into being an external psychological and social world that, to some extent, mirrored the human scientist’s understanding of himself or herself. 
Historians have shown us numerous examples of this phenomenon, from Sigmund Freud, to William James, to Gordon Allport (Anzieu, 1986; Nicholson, 1998; Richards, 1987; Schorske, 1981; Toews, 1991). While it might not be surprising to find such reflexivity (some might say ‘lack of objectivity’) in social psychology and psychoanalysis, the same interplay between self-knowledge and scientific psychological knowledge has pervaded even those parts of experimental psychology that have been regarded as the most ‘objective’ and methodologically rigorous. Neo-behaviorists Edward Tolman, B. F. Skinner, Clark Hull, and Edwin Guthrie regularly worked between their senses of themselves and their scientific investigations (Hilgard and Bower, 1966: 104; Smith, 1986; 1990: 237–66). Likewise, the arch-operationist S. S. Stevens took the normative rules of data collection that he prescribed for psychologists and translated them into his studies of audition. By the time he was done, Stevens had produced a theory of hearing in which the brain acts as if it were a scientist following operationist rules of method by making measurements of the cochlea’s electrical output to calculate the loudness of a particular sound (Stevens, 1936). Although it may be the case that the possibility of ‘making up’ people and the nature of the subject matter make it difficult, if not impossible, for human
scientists to escape reflexive practices, this article concerns itself with another reason for the pervasiveness of reflexivity in the human sciences. Because of their history and the history of the cultures within which they emerged, human scientists have lacked sufficient authority as experts to maintain their autonomy and insulate the objects of knowledge they produce from lay audiences. The issue for the human sciences extends beyond the fact that nonexperts invariably possess some form of folk psychology (or folk sociology, or folk anthropology, etc.). For just as all (or most) adults possess some form of theory of mind, so too do we find that all (or most) people possess naïve physics. If people did not possess folk physics then Jean Piaget’s studies on children’s construction of time, space, matter, and volume would have looked quite different; moreover, college students, including 70 per cent of engineering students, would not retain a belief in Aristotelian physics after taking a course in introductory mechanics.1 The difference I want to highlight here is that while physicists have had the authority to hold forth on topics such as quantum mechanics and, for the most part in recent history, remain unchallenged,2 human scientists have not often been accorded a similar authority by their societies. This lack of autonomy has meant that the barriers between professional human science and folk psychology, sociology, and anthropology have been both low and permeable. 
Not only have the human sciences generously borrowed and shared ideas, facts, categories, and ways of thinking with other parts of the societies within which they developed, they have also been particularly answerable to non-experts (Barber, 1952; Merton and Wolfe, 1995; Rose, 1996).3 Consequently, the human sciences have been largely unable to establish a reliable epistemic or social distinction between the intrinsic aspects of their fields and their extrinsic social, cultural, political, and institutional contexts. The porous nature of the boundaries surrounding their disciplines has opened possibilities for human scientists to engage in reflexive work. It has provided them with the opportunity to turn a wide variety of ideas (whether popular or expert) about human nature on themselves and their colleagues. Human scientists have been able to deploy categories, methods and arguments drawn either from the human sciences or from folk knowledge to legitimate their own endeavors or attack foes within their own disciplines. This article focuses on a particular moment in the history of human sciences in which this sort of reflexivity played a significant role: the early days of revolution in which cognitive science supplanted behaviorism as the hegemonic science of human nature.4 In the struggle that marked the cognitive revolution we see little use of Jamesian or Freudian deep and thorough self-examination in efforts to make a science of the human. Rather, behaviorists and their foes regularly traversed the boundary between scientific and folk psychology as a strategy for legitimating their work. Reflexivity provided the combatants with weapons to attack their foes and also methods and concepts to form their respective sciences of human nature.
To enhance their public standing, they sought to make their own thought processes match folk ideas of scientific thinking. They applied the same categories of selfhood found in popular culture and social psychology to themselves. They collapsed the distinction between normative rules for scientific thinking and the actual processes of human thinking. As cognitive scientists like George Miller and Herbert Simon (1966a; 1966b; 1980) crossed back and forth between scientific descriptions of the human and normative discussions of the best way for scientists to think, they borrowed from the folk and social psychological image of right thinking to inform their own personal and public images. These very same scientific self-images would form the basis for the image of human nature that cognitive science produced.
The (Disciplinary) Politics of Psychological Theory
In the years after the Second World War, intellectuals and social scientists developed a language with which to discuss social and political issues in terms of categories of thought. They produced normative accounts of mind, characterizing some forms of thinking as better than others. Academics valued flexible, interdisciplinary minds in their colleagues and students (Cohen-Cole, 2003: ch. 3). A wide range of Americans joined them in assigning political and social meaning to open-mindedness, equating it with democratic values. To social critics, personal autonomy and freedom of thought distinguished the United States from the Soviet Union and also offered hope to those who bemoaned the disappearance of individualism under the pressures of mass society, suburbanization, and social conformism.5 This psychologization of politics appeared, for instance, in two different landmark works in 1950, The Authoritarian Personality (hereafter cited as TAP) and The Lonely Crowd. The latter work examined how, under the influence of mass society and late capitalism, Americans had lost their traditional individualism to social conformity. TAP argued that democratic people exhibit open-minded, tolerant, and flexible minds while authoritarian people are closed-minded and rigid, leaving them insensible to empirical realities. This particular characterization of the cognitive attributes of democratic and undemocratic minds was part of and followed a much larger research tradition in social psychology that examined the psychological structures connected to certain political orientations (Crutchfield, 1955; Maslow, 1943; Rokeach, 1948, 1950, 1951b, 1951c; Stanger, 1936a, 1936b). While these psychological works approached the politics of mind from the position of psychological expertise, others examined the problem from the side of political analysis. 
In his ‘Long Telegram’ of 1946, the work in which he articulated the containment doctrine that would define America’s cold war strategy and form the basis of much of the country’s foreign policy for the subsequent 50 years, George Kennan used essentially the same psychological categories to
describe Soviets as TAP used to describe authoritarians. Importantly, Kennan’s telegram preceded the appearance of TAP by several years. Later on, Pulitzer prize-winning historian Arthur Schlesinger, Jr, an advisor to John F. Kennedy, likewise noted how communism’s rigid ideology ‘obscured reality’.6 There were those at the time who believed that priority for explaining the mind lay not with human scientists but with politicians. For instance, in 1953 Homer Ferguson, Republican Senator from Michigan, attacked Harvard’s Russian Research Center for wasting taxpayers’ money by usurping Congress’ role and priority in defining, identifying, characterizing, and locating the communist mind. Gesturing to the work led by his colleague, Joseph McCarthy, Ferguson noted: ‘there has been research, by the Congress, in the way the communist mind works…’ He added that Congress had done a better job in understanding communist mentality than Harvard’s Russian Research Center (Kelso, 1953). This widespread practice of treating politics in terms of thinking styles had implications for the disciplinary structure of psychology and for research programs within the field. Because of the low and permeable boundaries between expert and folk psychology, the choice to pursue one kind of psychology rather than another was filled with political meaning. Moreover, again and again psychologists would engage in a casual form of reflexivity, investing not only their models of human nature, but also their own thought processes with political meaning. To many psychologists, pursuit of political change could come about through the development of the right kinds of psychological theories. For instance, in 1950, Theo Lentz articulated a typical political argument for pursuing psychological study of specific forms of human subjectivity (1950: 213–14). In this paper, Lentz’s argument relied on a common juxtaposition of claims. 
In particular, he linked disciplinary reform (the advancement of social psychology) with political reform (development of world government) with reconceiving human nature (by making world-mindedness a facet of the human mind) with the call for psychologists to be imaginative. In a 1953 grant proposal, Jerome Bruner articulated a similar argument for how the ‘world crisis’ required the development of cognitive psychology.7 Although there were certainly variations in the political goals (not everyone called for world government), norms of thought (not everyone called for world-mindedness), models of human nature, and disciplinary goals (not everyone called for social psychology), a wide range of psychologists, social scientists, public figures, and foundation officials made these sorts of links. They tied the promotion of certain ‘better’ forms of human nature to particular scientific models of human nature and to the disciplinary reconstruction of the human sciences (Bryson, 1948; Cantril, 1949, 1950; Frank, 1951; Kluckhohn, 1952; Maslow, 1946, 1948; Mooney, 1954; Tolman, 1948). Although social scientists used their tools of psychological analysis to critique specific social groups or specific modes of thinking, they also regularly
Curriculum, Instruction and Learning
used these techniques to talk about their own discipline. In one instance, Milton Rokeach’s discussion of psychological ideas indicates the way in which those theories could carry political significance. Rokeach had devoted substantial effort to understanding and explaining the emotional and cognitive deficits of the closed-minded person and the corresponding virtues of the open-minded person (Rokeach, 1948, 1949, 1950, 1951a, 1951b, 1951c, 1951d). His analysis of these categories, much like the arguments in TAP, associated open-mindedness with democracy and closed-mindedness with racism, ethnocentrism, or authoritarianism. On the basis of this work, Rokeach’s description of behaviorism and psychoanalysis as presenting a model of the closed-minded person, and of Gestalt psychology as presenting a model of the open-minded person, was laden with political implications. Specifically, the model of human thinking presented in Gestalt theory was that of the democratic citizen, while the model presented in behaviorism and psychoanalysis was appropriate to people who were subjects of totalitarian states (Rokeach, 1960: 65).8 As the language and arguments of psychologists like Bruner, Lentz, and Rokeach and folk psychologists like Kennan, Schlesinger, and Ferguson (and McCarthy) indicate, political and cultural values were embedded in the effort to develop understanding of the open, autonomous mind. Their arguments involved contentious disciplinary politics in which the overtly political categories of open, rational, and democratic thinking were contrasted with those of the closed, prejudiced, authoritarian mind and deployed in a disciplinary struggle to define what constituted scientific psychology.
Psychology, the Science of Behavior or the Science of Mind?
In the 1950s at the center of psychology’s struggle was the question of whether the study of mind could properly be understood as a part of scientific psychology. In a 1958 invited address to the American Psychological Association, the philosopher of science Herbert Feigl noted:
‘Intuition,’ ‘insight,’ ‘understanding,’ and ‘empathy’ have been key words in the strife of psychological movements. These terms are used honorifically by one party, but they are suspect (if not on the index verborum prohibitorum) with the other party. (1959: 118)
These two parties struggled over whether intuition, insight, and understanding belonged in psychology.9 Feigl’s sense of two parties was by no means an outsider’s idiosyncratic reading of the field. George Miller, Eugene Galanter, and Karl Pribram, for instance, divided the field into ‘optimists’ who believed that human behavior was determined by the environment and could be described completely by stimulus-response chains and ‘pessimists’ who believed that other things (such as mental processes) were necessary to explain human nature.10
At the end of the Second World War, and for the following 10 years, Miller, Galanter and Pribram’s optimists controlled the discourse on what was and what was not scientific psychology. At that time, psychology was anything but the science of mind.11 The ‘fundamental’ and scientific center of the discipline was experimental psychology, which was dominated by the behaviorist and operationist concerns that made mind an improper subject for scientific study.12 The central importance of ‘learning theory’ in the scientific end of the discipline gives another indication of how many in psychology deemed the study of mind to be unscientific.13 Learning theorists sought to explain how people and other organisms act differently in different circumstances. The main branch of learning theorists followed in the footsteps of J. B. Watson and Edward Thorndike. These psychologists included E. R. Guthrie, Clark Hull, Kenneth Spence, B. F. Skinner, and their followers. They, their students, and their theories dominated experimental psychology, at least in numbers.14 Although it is important to recognize the differences among these psychologists, they shared a fundamental perspective on the nature of human (and animal) subjectivity. For them, most, or all, of what one needed to know about psychology could and should be explained on the basis of the environment’s effects on the subject. From this perspective, organisms ‘learned’ to solve problems not from ‘understanding’, but through random trial and error and the association of particular behavioral responses with a reward or other stimulus. 
Although there were certainly other approaches to learning represented by Gestalt theorists or Edward Chance Tolman or David Krech (né Krechevsky) that emphasized the importance of insight, hypothesis, and cognition in learning, these were distinct minority positions.15 It is certainly the case that large sections of psychology – particularly clinical and educational psychology (the fastest-growing components) – were concerned with mind.16 But social and clinical psychology had, at best, a marginal status as a scientific endeavor. Those who studied mind may have been psychologists, but their status as scientists was questionable in their own community. The more psychologists were concerned with mind, the less they qualified as scientists within the discipline (Johnson, 1956: 712; Kelley, 1955; McGuire, 1956: 153; Rogers, 1955; Skaggs, 1945: 234–48; Strupp, Castore, Lake, Merrill and Bellak, 1956: 153–7). As the psychologist E. Parker Johnson put it in 1956:
Practically everyone who is not a psychologist knows that [psychology] is the science or study of the mind, and anyone with a dictionary may easily confirm this. . . . But, oddly enough, many modern psychologists refuse to accept this definition. Why? … The word mind … [is] by its very definition beyond the ken of science which, by its very definition, is built on the observation of observable events. . . . . Many protest, indeed, that scientifically speaking there is no such thing as the mind to be studied! (1956: 712)
Two critical assumptions were necessary for this argument: the definition of mind and the definition of science. With these two definitions taken as given, there was little compatibility between mind and science. But those two definitions were under attack even as this paper was published.
Attacks on Behaviorism
At the end of the Second World War a broad range of academics called for developing a science that could account for the autonomous, the creative, and the rational aspects of human nature. This effort involved a struggle with proponents of behaviorism and (somewhat less) of operationism and positivism – those who saw humans as (mere) products of their environment and their basic drives (such as hunger). Drawing on the politics of thinking, a primary strategy for advocates of the science of mind was to attack the thought processes of behavioristic psychologists. These attacks made use of the normative categories of thinking drawn from social and folk psychology. Specifically, the features of the closed-minded, conformist person appeared as characteristics of the behavioristic psychologists. In these critiques, the center of scientific psychology (which was primarily behavioristic) appeared as uncreative, narrow-minded, rigid (Solomon, 1955: 170), and dogmatic (Allport, 1940; Ericson, 1941: 76; Harlow, 1958: 674); in short, it appeared to be governed by an ideology that confused methodological rigor with true (i.e. creative) science (Brower, 1949: 326, 328, 330, 332; Bruner, 1957a: 156). The primary reason offered for using epithets such as ‘narrow’ and ‘ideological’ to describe operationist, behaviorist, and positivist psychology was its reported aversion to and slighting of the study of the mind. Some psychologists, for instance, argued that ‘narrow operationism’ had limited the ‘freedom’ of psychology to pursue its ‘ultimate purpose, the scientific understanding of man’s cognitive behavior’ (Gruber, Hammond and Jessor, 1957: v). Just like ethnocentric, closed-minded people described in The Authoritarian Personality or Gordon Allport’s Studies in Prejudice, behavioristic psychologists were pictured as conformists, intolerant of difference. 
Of operationists, one psychologist noted, ‘their discussions and criticisms have produced a social climate in which the psychological theorist may hesitate to present theories which contain non-operational definitions’ (Prentice, 1946: 247). Another commented: ‘it has been noted that psychologists seem over conscientious and even compulsive in their efforts to be simon-pure [sic] and scientific almost to the point of fetish. Colleagues suspected of indiscretions are ostracized and avoided’ (Thorne, 1956: 152). The philosopher Alain Locke, an important advocate of social pluralism, noted that ‘in the cause of scientific objectivity’ positivism and behaviorism had become dogmatic and had ‘squeezed values and ideals out completely in a fanatical cult of “fact” ’ (1942: 197).
Critics of behavioristic psychology regularly suggested that it was a religious phenomenon. One of the standard procedures of behavioristic psychology – the assumption that rats could stand in for humans or other organisms as experimental subjects17 – came under attack as religious dogma (Bitterman, 1960: 705, 711).18 Donald Hebb participated in this critique, noting that psychologists deviating from the formula of stimulus and response theory could place themselves in a ‘larger demonology’. ‘As for “insight,” “purpose,” “attention” ’, Hebb continued, ‘any one of these may still be an invocation of the devil, to the occasional psychologist’ (1949: 4).19 This characterization of behaviorism was by no means a simple matter of internal disciplinary debate. The picture of behaviorism as rigid, religious ideology spilled outside of the discipline of psychology into academic journals outside the field and into popular non-fiction (Birnbaum, 1955: 15, 30; Koestler, 1964: 560–1 ).20 Sigmund Koch, editor of a three-volume survey of scientific psychology commissioned by the National Science Foundation (NSF), devoted his epilogue to an extended critique of the reigning methodology of behavioristic psychology. In Koch’s eyes, both behaviorism and the use that psychologists made of philosophy of science (operationism and logical positivism) could be subsumed under a single heading, the ‘Age of Theory’. Koch commented again and again on the ‘ideology’ (1959a: 732, 734–5, 769, 776–7, 786) or ‘code’ (1959a: 783) of the ‘age of theory’, its ‘reigning stereotypes’ (1959a: 783), ‘lack of realism’ (in contrast with ‘increase in realism’ as the age waned) (1959a: 748, 770),21 and its ‘narrow’ approach (1959a: 769–70). Koch noted the age of theory’s ‘hypothetico-deductive prescription’ (1959a: 776–8), its ‘doctrine’ (1959a: 785–6), its ‘programmatic’ thinking style and attachment to ‘facile’ mythology of perfection (1959a: 786), as well as its ‘autism’ and ‘autisms’ (1959a: 770, 785). 
In the parlance of the psychological theory of the time, autism meant lack of connection with reality and often implied lack of creativity (see McKellar, 1957). Noting that psychology sought security and respectability in following ‘fashionable theory of proper science’, Koch argued that ‘the dependence of the Age of Theory on prescription from extrinsic sources is but the most recent chapter in a consistent story of such extrinsic determination of ends and means’ (1959a: 783). With the waning of the Age of Theory, Koch saw reason to be hopeful. He noted that ‘for the first time in its history, psychology seems ready – or almost ready – to assess its goals and instrumentalities with primary reference to its own indigenous problems’ (1959a: 783). Koch continued:
[Psychology] seems ready to think contextually, freely, and creatively about its own refractory subject matter, and to work its way free from dependence on simplistic theories of correct scientific conduct. (ibid.)
In other words, psychology’s ideological subscription to an external vision of science had narrowed its range and restricted its creativity (see also Hunt,
1956: 6; Koch, 1959a: 748, 786). Koch continued by pointing out that this new independence had led psychology to be more open-minded. Here Koch highlighted psychology’s maturity and autonomy and contrasted it with its past history (during the Age of Theory) in which it depended upon an outdated model of science derived from external authority framed, and subsequently rejected, by logicians of science. Thus behavioristic psychology had relinquished its own autonomy to a philosophy of science that philosophers themselves no longer believed in (1959a: 787–8). Koch’s assessment of the Age of Theory operated by a system similar to the one social psychologists used in their analysis of the prejudiced or authoritarian mind.22 Both critiques pointed out a lack of autonomy, a tendency to follow rules imposed by others, a lack of realism, and narrow, stereotyped thinking. Koch added that with psychology’s independence from external rules, it had also recently matured from focusing on rigor alone. ‘From the beginning, some pooled image of the form of science was dominant [in psychology]: respectability held more glamour than insight, caution than curiosity, feasibility than fidelity or fruitfulness’ (1959a: 784). To Koch this transition had enabled a more modern ‘open and liberated conception of psychology’ which allowed for ‘the role of creativity in all aspects of the scientific enterprise’ (1959a: 785–6).
Philosophy of Science, Natural Science, and Their Use in Psychology
Philosophy of science was at the center of this debate within psychology. Whether following a formal school or a more informal understanding of science, psychologists were highly attuned to philosophical issues (Green, 1992; Winston and Blais, 1996). Casual references to modern philosophers and philosophical issues that peppered articles in psychology journals indicate the depth of familiarity psychologists had with philosophy of science (for example, see MacCorquodale and Meehl, 1948: 96). In fact, psychology paid more attention to philosophy than did the other sciences. As Herbert Feigl put it in an invited lecture to the American Psychological Association, ‘The majority of physicists want to unmuddle themselves without the aid of philosophical clarifiers. But I have found psychologists and social scientists much more hospitable’ (1959: 115). There was symmetry between behaviorists and anti-behaviorists. While behaviorists such as Kenneth Spence commonly welcomed Feigl’s positivist brand of philosophy of science, anti-behaviorists subscribed to an anti-positivist vision of science (Bergmann and Spence, 1941; Bruner, 1957a: 155–7; Spence, 1957). The normative account of scientific practice the anti-behaviorists adopted was one that emphasized the insightful aspects of science. Rather than functioning in the role of collating data, to anti-behaviorists the scientist’s mind had to be active and creative.
The anti-positivist vision of science was articulated by men vested with enormous cultural authority to speak for the nature of the scientific endeavor. These men included leaders of the scientific establishment such as James Bryant Conant (1950, 1951a, 1951b, 1952), J. Robert Oppenheimer (1956, 1958), Warren Weaver,23 and Jerrold Zacharias.24 They were joined by best-selling science writers such as Jacob Bronowski and Arthur Koestler (Bronowski, 1956; Koestler, 1964). Sociologists such as Bernard Barber and Talcott Parsons articulated similar anti-positivist visions of science (Barber, 1952).25 Likewise, historians of science including Thomas Kuhn (who was a protégé of Conant) attacked the positivist vision of science (Conant, 1947; Kuhn, 1962).26 All of these figures argued that science was a process that involved creativity, insight, ideas, and invention as much as the collection of data.27 Jacob Bronowski’s vision of science indicates just how closely the critiques of behaviorism and positivism were bound. For him, positivist and operationist philosophers failed to grasp the creative nature of scientific work. Attacking the behaviorist flavor of operationist philosophy and the rigidity of logical positivism, Bronowski argued: These accounts of science seem to me to be mistaken, on two counts. First, they fly in the face of historical evidence. . . . And second, both schools fly in the face of contemporary evidence.
Scientists know, Bronowski argued, ‘that science is not something which insects or machines can do. What makes it different is a creative process. . . . and this has sadly tiptoed out of the mechanical worlds of the positivists and the operationalists, and left them empty.’ From this point about the nature of science, Bronowski appended an argument about the nature of human thought. The world which the human mind knows and explores does not survive if it is emptied of thought. And thought does not survive without symbolic concepts. The symbol and the metaphor are as necessary to science as to poetry. (1956: 48–9)
In Bronowski’s eyes then, all thinking and science rely on the creative use of symbolic concepts. Moreover he took his vision of cognition and proper science to be opposed to that advocated by positivist philosophers of science. Oppenheimer concurred with Bronowski. ‘Truth’, he proclaimed in a 1955 invited address to the American Psychological Association, ‘is not the whole thing; certitude is not the whole of science. Science is an immensely creative and enriching experience; and it is full of novelty and exploration’ (1956: 130). Oppenheimer cautioned psychologists against borrowing from obsolete classical physics the mistaken view that the physical world is determinate (ibid.: 134). Even more, he warned psychologists against quantification for
its own sake, noting that such fascination with number had been typical of and appropriate for Babylonian prophecy and magic. More modern sciences, Oppenheimer suggested, should be pluralistic enough to value descriptive naturalistic approaches. Tellingly, the cognitive-developmentalist Jean Piaget was Oppenheimer’s candidate as someone who deserved respect despite his lack of statistically robust results (1956: 135). Oppenheimer’s arguments would have been useful to the anti-behavioristic psychologists. His argument against rigor for its own sake, suggestion that zeal for quantification was superstitious, and call for methodological pluralism gave ammunition to anti-behavioristic psychologists.28 Jerome Bruner, for instance, echoed many of Oppenheimer’s points in a hostile review (1957a) of a book by Kenneth Spence (Spence, 1956). The negative characterization of behavioristic psychology adopted the categories of analysis that social psychologists had used to describe rigid, closed-minded, ideological people. Critics of behaviorism framed a positive counterpart to behaviorism’s narrowness. This positive version of psychology would value and reward creative insight among its practitioners rather than seeing merit in rigorous methodology alone. In this version of psychology, psychologists would be autonomous of narrow positivist philosophy of science, independent of external influence, open-minded, flexible, realistic, interdisciplinary, and creative (Koch, 1959a: 784–6).
Scientific Thinking as the Content of Cognitive Psychology
In this section I turn to a direct challenge to the content of behaviorism, a challenge right on behaviorism’s home turf: the scientific study of normal (and universal) human nature. At stake was whether human nature could or could not be completely accounted for by stimulus-response connections. In the 1950s, an array of scientists joined forces from several fields to critique behaviorism by arguing for the existence of thought or behavior that was autonomous from stimulus. Central to this endeavor of creating a cognitive rather than behavioral psychology was proving that human behavior was creative and was not simply the product of experience.29 To those who insisted that thinking could not be explained solely by conditioning, the project extended beyond the claim that cognition was an irreducible aspect of normal human nature. Instead, there were very specific modes of thought that they ascribed to the normal human: human cognition was supposed to operate much like the thinking of a particular sort of person – the good scientist. But as discussed above, there was not unanimity in America about how to conduct proper science. When cognitivists compared human thinking to the scientific process, they were quite selective in their choice of the model of science. They adopted a vision of science that emphasized the creative and insightful nature of the scientific process.
From the earliest days of cognitive science, studies of human mental processes treated thinking, perception, and language as relying on scientific methods such as hypothesis formation and theory construction. The 1956 A Study of Thinking by Bruner, Goodnow, and Austin explained everyday cognitive processes by comparing them to scientific thinking. The authors wrote: ‘the development of formal categories is, of course, tantamount to science-making’ (1956: 6; hereafter cited as BGA):
Let us take as an example of concept attainment the work of a physicist who wishes to distinguish between substances that undergo fission under certain forms of neutron bombardment from substances that do not. . . . This kind of problem is hardly unique. The child seeks to distinguish cats and dogs by means other than the parent’s say-so, the Army psychiatrist seeks out traits that will predict ultimate adjustment to and performance in the Army. (ibid.: 233)
While these psychologists saw science in everyday cognition, their metaphor was more focused than merely linking one to the other. Not just any form of science would be the model of human thinking. They selected only certain aspects of the scientific process to compare to thinking: inference, invention, problem-solving, making hypotheses, and model construction (BGA, 1956: 10, 14, 17, 19, 31, 37–8, 54, 56, 92, 233, 244, 246). BGA saw their account of science as opposing the dogma of ‘naive realism’ in which science is a ‘voyage of discovery’ that sought to ‘discover the islands of truth’. In contrast with this vision, they argued, ‘science and common-sense inquiry alike do not discover the ways in which events are grouped in the world; they invent ways of grouping’ (1956: 7; emphasis added). This emphasis on the inventive nature of science drew support from both the biological and the physical sciences. BGA cited Ernst Mayr’s point that ‘species are not “discovered” but “invented”’ (1956: 19; Mayr, 1952).30 The physical sciences taught ‘the revolution of modern physics is as much as anything a revolution against naturalistic realism in the name of a new nominalism’ (7). From the perspective of this nominalism, they asked: in what sense do the categories ‘such as tomatoes, lions, snobs, atoms, and mammalia exist’? The answer was that ‘they exist as inventions, not as discoveries’ (ibid.). According to this account, the categories atoms and tomatoes are both invented, not discovered, by people. Atoms are invented by scientists and tomatoes by everyone – but neither is discovered. The nominalistic lessons of modern physics had two sorts of implications. On the one hand, these lessons were relevant to the argument about which sorts of scientific thinking were appropriate as metaphors for everyday cognition. In this regard, the claim was that nominalistic philosophy of science was better than realist philosophy of science as a model for human nature. 
On the other hand, the reference to nominalism could also serve as a critique of behaviorism. This was because stimuli were equivalent to people not when
they were objectively, measurably equivalent to the experimenter, but when people constructed psychological categories that grouped the stimuli together (8). This point could make meaningless the behaviorist effort to relate observable and measurable stimuli and responses.31 As Bruner put it in 1951, Let us begin by stating a heuristic theory of perception. We shall assume that the organism is always set or tuned or expectant; he is, in short, ready for certain classes of stimulus events to occur. The tuning of the organism, and we shall discuss its determinants presently, we shall call an hypothesis. It is a predisposition to organize and classify the perceptual field in a certain way at a certain moment. Stimulus information enters the prepared organism. We use the term stimulus information rather than stimuli for what we wish to denote here is not the energy characteristics of stimulation, but the cue characteristics provided by stimulation – its signaling value. . . . The data of the scientist are not the raw cues of stimulation, but the perceptions of the scientist which occur when those cues confirm perceptual hypotheses which he has acquired. In this important sense, then, the scientist’s data are not found, but created.32
On this account, data, stimuli, and responses do not exist independently of expectations. Both the psychologist and the people he or she studies do not experience the world in ‘raw’ form. Human perception is so thoroughly laden by prior hypotheses and theory that it is impossible for any scientist to make purely objective pointer-readings.33 While the implications of nominalism for behaviorism were not explicitly drawn in A Study of Thinking, Bruner did make this last point explicit in his review of a book by the behaviorist Kenneth Spence only a year later (1957a). In this sense, nominalism was not merely the proper model of human thinking, it was also a better model for psychological research. The argument, then, was that it is human nature to think nominalistically, and that it is good for psychologists (and other scientists) to think nominalistically. Thus there was an equivalence of good scientific thinking and normal human thinking. An implication here is that the only people who held to the dogma of realism were naïve philosophers and behaviorists. And realism, since it was dogma, was abnormal or ill – or, in Koch’s terms, autistic. While A Study of Thinking focused on categorizing the different ways that people go about understanding the world, Eugene Galanter and Murray Gerstenhaber’s 1956 article ‘On Thought’ drew on the conceptual model-building aspects of science. This article argued that thinking and understanding were much like building an internal model of the world. These models would be either like maps or like the three-dimensional scale models scientists and inventors constructed to understand and represent large-scale physical phenomena. Galanter and Gerstenhaber extended the technical nature of the model analogy of thinking by suggesting ‘the environment will be a “machine,” or “mechanism.” … The process by which the behavior of the mechanism is predicted is called “thinking” ’ (1956: 219).34
Whether they came to the study of mind from psychology or other disciplines, the early cognitive scientists held that creative thinking, although praiseworthy, was not exceptional. Instead, theory construction and creative problem-solving were the cognitive scientists’ model of everyday thinking and problem-solving. Learning was not so much a process of acquiring facts about the world as of developing a skill or acquiring proficiency with a conceptual tool that could then be deployed creatively (BGA, 1956: 6–7; Chomsky, 1959a; Galanter and Gerstenhaber, 1956; Miller, Galanter and Pribram, 1960; Newell, Shaw and Simon, 1958; Newell, Shaw and Simon, 1962; Simon, 1966a; Simon, 1967). For instance, according to the MIT linguist Noam Chomsky, the acquisition and use of language was an active and creative process. In his eyes, a child learning a language was not acquiring specific words so much as operating like a scientist by actively developing a theory of how to speak properly. Such theories are nothing other than the grammar of the language in question. Chomsky also held that adults require similar theories in order to produce and comprehend sentences (1956: 113, 116; 1957). This view of language users and learners as scientists was not neutral with respect to either linguistic theory or philosophy of science. According to Chomsky the linguist or psychologist looking to account for the ‘actual behavior of speaker, listener, and learner’ would fail if he or she followed the purely empiricist rules of scientific method as described and advocated, for instance, by B. F. Skinner (1950, 1956). Indeed, the ‘mechanisms’ or ‘theories’ of grammar that Chomsky saw each individual as possessing could never be observed directly, but only be inferred from their behavior. While Chomsky’s view of language set constraints on the proper method for scientists seeking to account for language, it also framed humans as following certain forms of method as well. 
Specifically, native ‘hypothesis-forming’ abilities enable children to rapidly learn grammar, a process which Chomsky described as constructing an ‘abstract deductive theory’ or ‘an extremely complex mechanism’ for producing or recognizing proper sentences (1959b: 56–7). Chomsky supplemented his claims for the active and thoughtful nature of learning by citing the neurologist Roger Sperry’s argument that even simple conditioning requires insight (cited in Chomsky, 1959b: 44; Sperry, 1955). Chomsky’s views of language, learning, thinking and the ways to study them both typified early cognitive science and catalyzed much later work in the field (see, for instance, Galanter and Gerstenhaber, 1956; Miller, Galanter and Pribram, 1960).35 Practitioners in the field discussed the human mind as if it were a complex machine or computer capable of reasoning, hypothesis formation, and insight. These cognitive processes were taken as innate human abilities that were necessary for both learning and perception. To many cognitive scientists the computer provided a useful metaphor to combat stimulus-response behaviorism. George Miller, for instance, recalls that the metaphor allowed cognitive psychologists the opportunity to have a
mechanism to support their views (Baars, 1986: 212). If the computer could demonstrate higher thinking, then surely it would not be pure speculation to attribute those thought processes to people. There could, therefore, be a defensible science of thinking. The computer metaphor was also used to make an anti-behaviorist point and emphasize the way in which human nature is creative and (partly) autonomous of the environment. Hebb, for instance, argued that the ‘computer analogy’ developed by Miller, Galanter and Pribram (1960), and Donald Broadbent (1958), ‘can readily include an autonomous central process as a factor in behavior’ (1960: 740; emphasis added). Although Hebb made a common, cognitively oriented interpretation of the computer analogy, it is worth noting its paradoxical framing.36 Specifically, why did he believe that human autonomy could be illustrated by a machine? As Miller, Galanter and Pribram argued, the mechanism that psychologists choose as a model of humanness does not necessarily force a particular vision of that nature (1960: 41). It may have been the case that computers did things that could not be predicted, but it seems unlikely that committed stimulus-response behaviorists would have concluded that human cognition is autonomous from that result. Certainly prior failures by psychologists to perfectly predict human behavior had not convinced behaviorists that organisms are autonomous. Early cognitive scientists and artificial intelligence researchers selected quite specific features of human nature to model with computer programs. For the cognitive scientists Newell, Shaw and Simon, and Miller, Galanter and Pribram, there was a clear route to convincing their audiences that mind did in fact exist and that it was possible to study it scientifically. This was to build models of the forms of human thought that Americans most widely saw as requiring higher reasoning. 
The cognitive scientists thus selected quite specific and widely recognized problems to model. For instance, Herbert Simon and Allen Newell built a program that could solve a logic problem that had recently been featured on a television show (1962). The computer models that cognitive scientists built used heuristic (rather than strictly logical or deterministic) methods to play chess, re-derive the proofs in Russell and Whitehead’s Principia Mathematica, or produce novel proofs of the theorems in Euclid’s Elements (Miller, Galanter and Pribram, 1960; Newell, Shaw and Simon, 1958; Newell, Shaw and Simon, 1962). The picture of human thinking that cognitive scientists inscribed in their computer models depended on the accounts of science provided by Henri Poincaré (1952), Michael Polanyi (1958), and George Polya (1945; 1954),37 and shared much with the anti-positivist one developed by men such as Conant, Oppenheimer, Kuhn, Bronowski, and Zacharias. Notably, Herbert Simon opined in 1958 that Bruner, Goodnow and Austin’s use of ‘strategies’ to describe thinking was ‘the nearest thing in the psychological literature to the use of programs to describe behavior’.38 In the time since these early works, cognitive science has continued to use computer models to ascribe such scientific methods as
hypothesis-making, theory construction, and inference to everyday thought (for instance, Johnson-Laird, 1988). There was, however, the possibility that the computer could have been used to model not the rational, autonomous, cognitive, creative version of human nature, but the reactive, reflex-driven version of human nature. Consider the models of human thinking proposed by Clark Hull, E. G. Boring (1946), Saul Gorn (1959), and Howard Hoffman (1962). In every one of these cases, psychologists developed models that strengthened the behaviorist vision of human nature. Clark Hull, for instance, compared learning to making a series of connections linking stimulus and response on a telephone switchboard and thereby emphasized its non-cognitive, non-insightful aspects (Smith, 1990: 249–54). Gorn designed a computer program and Hoffman built an electrical device (an ‘analogue lab’) to simulate stimulus-response learning. In both cases, the models indicate that the behaviorist vision of human nature, rather than being at stake, was assumed to be true. E. G. Boring’s robot model of human nature was likewise dependent on the stimulus-response model. Having failed Norbert Wiener’s challenge to give a single example of any human mental function that computers could not perform, Boring proceeded to outline all of the characteristics that the computer should have if it were to be a good model of human nature (1946: 178).39 The remarkable feature of Boring’s mechanical model is the specific characteristics of humanness that the computer was supposed to mimic. To Boring, a computer or robot would be a good model of human nature if it exhibited the properties of stimulus-response learning developed by behaviorists (1946: 183–4).
Unlike Newell, Simon and Shaw, Boring did not argue that a computer could be considered a good model of human nature if it could produce novel solutions to mathematics problems or play chess (Newell, Shaw and Simon, 1959).40 In other words, unlike cognitive scientists, Boring used the computer model of mind to argue against mentalism. Miller, Galanter and Pribram pointed out that the behaviorist, anti-cognitive implications that Hull drew from his analogy between a telephone switchboard and organisms were not necessitated by the model itself. They noted that until shortly before they wrote, switchboards needed human operators to work (1960: 41). Thus, they showed that the switchboard metaphor of thinking and learning could be used to imply that human thought processes were necessary to properly link a stimulus to a response. Of course, Hull did not take this option. His model emphasized the non-thinking aspects both of telephone switching and of learning. Although Miller, Galanter and Pribram noted that the telephone switchboard did not force behaviorist conclusions, they did not also point out a related conclusion: much as Boring had demonstrated, the computer model does not require a cognitive vision of the human mind. Set the task of designing something like a Turing test, behaviorists and cognitivists framed very different sorts of questions to put to the computer. Boring looked to see if the computer could follow stimulus-response rules of
learning. The cognitivists looked to see if the computer could play chess or solve problems according to heuristic methods. The way that these two groups of human scientists understood themselves and their own thinking was at the center of this difference. That Boring, Hull, Gorn, and Hoffman looked at machines and saw a way to support behaviorists’ claims about organisms says more about these psychologists than it does about the organisms or the machine models they worked with. Likewise, that Hebb, Miller, Galanter, Pribram, Newell, Simon, and Shaw looked at computers and saw a model of autonomous human cognitive processes says more about these scientists than about either computers or people. Whether behaviorist or cognitivist, the meaning that scientists read into mechanical metaphors depended on highly value-laden visions of human nature. It turned on what they already knew about thinking, human nature, themselves, and the scientific process.
Conclusion
This article has examined the elision of a normative vision of ‘right thinking’ with a descriptive account of ‘thinking’. Cognitive scientists inscribed a highly political and value-laden notion of proper thinking into their descriptive accounts of human thinking. In their struggle with behaviorism, contests over the structure of the world (i.e. human nature and the nature of mind) were often contests over proper scientific methodology at the same time. Conversely, statements about whether rats or computers exhibited insight were also claims about the kind of thinking in which scientists themselves ought to engage. Psychologists who ascribed insight to rats, computers, or people were likely to see insight as an important component of the scientific process. In opposition, behaviorists who denied rodent insight and focused on learning through trial and accumulated experience were more likely to frame the empirical experience as the foundation of proper scientific method – of which they were practitioners.41 Making a science of the mind involved developing a cognitive psychology – an image of normal human nature that was universal, independent (at least in part) from the environment and instinct, creative, and autonomous. This move involved, for the most part, dropping the normative descriptions of better and worse kinds of people and personality types described in political discourse and social psychology. Rather than contrasting the open and closed mind, the democratic and authoritarian, the creative and conformist personality, cognitive scientists looked at computers and saw the open-minded, creative, flexible, and heuristic thinking processes that they deemed to be characteristic of human nature. Although lacking language that identified better and worse forms of thinking, this science of autonomy substituted normalizing for normative
terminology. Instead of identifying better and worse forms of humanness, it identified as universally human the specific forms of human nature that political discourse, social psychology, and anti-positivist philosophy of science had already marked as good. Thus, the better forms of human thinking constructed by social psychology (democratic, broad, open, flexible, creative) became the only forms of human thinking. The person portrayed by this psychology was fundamentally the same regardless of social situation, personality, or culture. By implication, that which had been marked as other forms of humanness were consigned to a pathology worse than illness – to non- or sub-humanness. The permeable boundary between expert and non-expert knowledge of human nature afforded cognitive scientists a collection of tools that, used reflexively, could further their research programs. The politics and social psychology of right thinking in cold war American culture gave anti-behaviorists techniques to turn on their own discipline and with which to mark themselves as open-minded and behaviorists as authoritarian ideologues. Antipositivist philosophy of science offered cognitive scientists not only a defense of their own kind of thinking, but also a model for how humans in general think. As George Miller put it in the Voice of America address discussed at the beginning of this article: A scientist … searches through ideas as well as through objects in order to find what he seeks. And he does not look indiscriminately – always he carries an image of what he seeks. . . . He is looking for something that matches up to his image of what the world must be, something that meets a test he himself imposes, something that has meaning only in terms of the standards he lives by.
Miller and his fellow cognitive scientists did just that – they looked for human nature by holding an image of what they were looking for in their minds. The image they held was none other than their own self-image. Concluding his address Miller remarked, ‘the scientist is Everyman, looking just as you and I. We go and look for the things we want, and when we find them we find part of ourselves’ (1963: 149–50).
Notes
1. On the difficulty of teaching physics see Clement (1982: 66–71); Gardner (1991: 152–8).
2. One example of invasion of the physicists’ domain was the case of Aryan physics (Beyerchen, 1992).
3. Of course the same could be said about the modern physical sciences. See, for instance, Galison (2003); Wise (1988; 1993).
4. In an examination of a later period and somewhat different vision of mind than that considered in this article, Gigerenzer (1991; 1992) has shown how cognitive psychologists’ own ways of thinking and working became the model for human cognition. For general discussion of the cognitive revolution see Baars (1986); Dupuy (2000); Gardner
(1985); Greenwood (1999); Mandler (2002). Robins, Gosling, and Craik have provided an empirical study of how cognitive psychology supplanted behaviorism (1999). Thomas Leahey has argued that the cognitive revolution was not actually a ‘real’ revolution (1992). On the other hand, historians and psychologists continue to use the term. Without seeking to judge the validity of the term, this article follows actors’ categories in adopting the term ‘cognitive revolution’.
5. For discussion of the deficiencies in how communists thought, see Allen (1949) and Shils (1954).
6. ‘Ideology vs Democracy’: text of a speech by Arthur Schlesinger, Jr, before the Indian Council on World Affairs, New Delhi, 15 February 1962; papers of J. Robert Oppenheimer, Box 65, Folder: Schlesinger, Arthur, Manuscript Division, Library of Congress.
7. Schlesinger (1949).
8. Archives of the Ford Foundation, Grant Files, Reel R-004, Grant 53–78, Project Proposal, Jerome S. Bruner to Ford Foundation, 1/9/53.
9. For a similar, but slightly less politically charged, argument, see Tolman (1948: 207–8). For similar discussion in regards to theory in the social sciences in general, see Dunlop, Gilmore, Kluckhohn, Parsons and Taylor (1941).
10. There is a double sense in which the place of insight and intuition in psychology was contested. First, there was the question of whether the discipline should study insight and intuition. Second, there was the question of whether psychologists themselves display insight and intuition in their work. I address the relationship of these two questions later in this article. For interesting examination of the connection between the subject matter of psychology and the psychological states of psychologists, see Morawski (1992) and Richards (1987).
11. The split was noted by many leaders of psychology. The social psychologist Gordon Allport parsed the division in psychology similarly. To Allport, psychology was split between ‘Lockeans’ and ‘Leibnizians’. Allport’s behavioristic Lockeans believed that humans could be completely explained by (and were determined by) their experiences and the stimuli that impinged upon their senses. According to Allport, Leibnizians believed that the human mind was, at least in part, autonomous (Allport, 1955). Others who saw a similar division in psychology included three presidents of the APA (Tolman, 1937; Hebb, 1960; Rogers, 1947; Hilgard, 1949; Harlow, 1958; and Miller, 1969), the editor of The Journal of Experimental Psychology (Melton), and the author of one of the most widely read textbooks in psychology (Hilgard). See Harlow (1956: 274); Hebb (1960); Hilgard (1948: 9); Kahn (1955: 171–2); Kelley (1955: 172–3); Melton (1956); Miller, Galanter and Pribram (1960: 7–8); Rogers (1955); Tolman (1948). Hilgard’s book is particularly significant due to its wide readership. In surveys conducted in 1953–4 and 1958–9 Hilgard’s book was recommended for graduate education by over half of surveyed departments. Hilgard’s text was one of only six books to rate so highly (Sundberg, 1960).
12. David McClelland noted that ‘psychologists used to be interested in what went on in people’s heads. . . . [B]ut with the rise of modern scientific psychology we lost interest in ideas, by and large’ (McClelland, 1955: 297).
13. Positivist, operationist and behavioristic psychology had such claim on being the ‘fundamental’ part of psychology that it was a notable event when social and clinical psychologists characterized their own work as ‘fundamental’ (Koch, 1959b: 5). In his presidential address to the American Psychological Association, Donald Hebb remarked that learning had been ‘the fundamental issue in psychology’ ever since the work of J. B. Watson and Edward Thorndike in the early 20th century (Hebb, 1960: 736).
14. ‘[B]etween 1944 and 1950, 70 per cent of the articles on learning and motivation in the Journal of Experimental Psychology and the Journal of Comparative and Physiological Psychology cited Hull’ (Logue, 1985: 180). Logue cites evidence collected in Spence (1952).
15. A useful overview of learning theory, including both perspectives, is Hilgard and Bower (1966). Discussion of the marginality of Gestalt theory may be found in Ash (1985); Sokal (1984). On understanding and cognition see Krechevsky (1932a, 1932b); Tolman (1948).
16. In descending order of size of APA membership, the top ten psychological specialties in 1952 were clinical, educational, experimental, industrial, vocational, social, general, developmental, personality, and physiological (Sanford, 1952; cited in Hunt, 1956: 14). For excellent analysis of the changing composition of the discipline cf. Capshew (1999).
17. John Dollard and Neal Miller articulated the primary reason for using rats to study people. ‘The basic facts and concepts can best be introduced by the discussion of a simple experiment on albino rats. In using results from an experiment of this kind we are working on the hypothesis that people have all the learning capacities of rats so that any general phenomenon of learning found in rats will also be found in people, although, of course, people may display additional phenomena not found in rats. Even though the facts must be verified at the human level, it is often easier to notice the operation of principles after they have been studied and isolated in simpler situations so that one knows exactly what to look for’ (Dollard and Miller, 1950: 63). For a similar discussion about the inter-species applicability of laws of behavior see Skinner (1956: 221–33).
18. For other attacks on this methodological assumption see Beach (1950) and Dukes (1960).
19. For other examples of psychologists noting the religious nature of behaviorism see Bruner (1957b: 341); Harlow (1953: 23); Hunt (1956: 7).
20. This image of religion and particularly Catholicism as dogmatic and anti-scientific (and anti-democratic) appeared particularly strongly during the Second World War and afterwards in the writings of Sidney Hook (1945; 1991). For further discussion of the religion versus science polarization see Cohen-Cole (2003: ch. 2, sec. 1); Hollinger (1995); Purcell (1972).
21. Koch also suggested a contrast between the ‘realities of science’ and its converse the ‘facile myth’ advanced by the Age of Theory (1959a: 786).
22. For discussion of conformity and other features of the closed mind see Cohen-Cole (2003).
23. Warren Weaver, ‘Science and Faith’, delivered on Layman’s Sunday in the Congregational Church of New Milford, Connecticut, 16 May 1954. Papers of Vannevar Bush, Box 117, Folder: Weaver, Warren (1948–1954), Manuscript Division, Library of Congress.
24. Oppenheimer was known as the ‘father’ of the Atomic Bomb because he had been scientific director of the Manhattan Project. Conant, in some sense, was the grandfather of the bomb because of his work in OSRD, the government agency that oversaw scientific development for the Second World War. On the cultural authority of Manhattan Project alumni after the Second World War, see Kevles (1987). Weaver served as a grants officer for the Rockefeller Foundation. On Weaver, see Kohler (1991). Zacharias had worked in the Rad Lab during the Second World War and in MIT’s Research Laboratory of Electronics afterwards. He organized PSSC, the first of numerous NSF-funded secondary school science curricula. On Zacharias, see Goldstein (1992); Rudolph (2002).
25. On Parsons’s philosophy of science see Camic (1991). For discussion of how these two and other social scientists followed Conant’s lead in defining science, see Cohen-Cole (2003: ch. 3).
26. On the relationship between Conant and Kuhn see Fuller (2000).
27. Although Kuhn is well known for his anti-positivist stance, he did not join his contemporaries in seeing creativity as the most important component of scientific thought. For instance, as a participant in a series of conferences sponsored by the
National Science Foundation, devoted to characterizing and improving creativity in science, in 1959 Kuhn scolded his audience for focusing too much on ‘divergent’ and ‘flexible’ thinking in scientific work to the exclusion of the forms of cognition that characterize normal science (Kuhn, 1959).
28. The psychologist Abraham Maslow had made similar arguments about the proper structure of science almost a decade earlier (1946). Focused on the general problems with ‘means-centering’ in all sciences, Maslow did not explicitly point to behavioristic psychology as the object of his critique. The terms of his criticism of ‘means-centering’, however, were so typical of charges leveled at behavioristic psychology (narrowness, over-concern with rigor, etc.) that Maslow was almost certainly intending his article as a thinly veiled critique of main-line experimental psychology. Several years later, Maslow published an article making the direct link between ‘means-centering’ and psychology (1949: 261).
29. Surveying the field in his 1960 presidential address to the APA, Donald Hebb remarked: ‘though this opposition of aims may seem over-simplified, I believe it is fundamentally sound. How [should we] understand otherwise the learning theorist’s [i.e. the behaviorist’s] bland refusal even to discuss attention or purpose, or the cognitive psychologist’s happy preference for phenomena he cannot explain – so long as the other cannot explain them either?’ (1960: 737).
30. Six years earlier, the psychologist Hadley Cantril, with whom Bruner had worked during the Second World War, made similar points. He noted that ‘variables scientists use do not exist in their own right. They are only aspects abstracted out of the total situation by scientists as inquiring human beings endowed with the capacity to manipulate ideas.’ Citing Einstein and Infeld, Cantril also noted the ‘creative imagination’ necessary in science (Cantril, 1950: 491; Einstein and Infeld, 1942: 95).
31. On this point, Koch argued: ‘If stimuli and responses are acknowledged to depend for their identification on the perceptual sensitivities of human observers, then the demand for something tantamount to a language of pointer readings, whether as simple energy-source or movement descriptions, or as disjunctions of fixed-stimulus “indicators” and response “measures,” must be given up. And if this demand must be given up, then much time-worn argumentation as to the intrinsic ambiguity of experiential language, or in fact any language the end-terms in which are not S and R, become idle and beside the point. If, further, the requirement is asserted that S be specified in a way which includes its inferred meaning for the organism then any basis for a difference in epistemological status between an S-R language and what has been called “subjectivistic” language is eliminated’ (1959a: 768–9; original emphasis). For a similar argument see also Bruner (1957a: 156; 1957b: 340–58); Hebb (1954: 404); Miller, Galanter and Pribram (1960: 21–5).
32. ‘Cognition and the Limits of Scientific Inquiry’, unpublished paper read at the Institute for the Unity of Science at the American Academy of Arts and Sciences, 1951. Jerome S. Bruner Papers, Harvard University Archives, HUG 4242.28.
33. Bruner based this argument on a series of experiments he conducted on how expectation determines what people see. For historians of science the most notable of these experiments is one that examines how individuals experience Gestalt switches in their perception when viewing mis-colored playing cards such as black hearts (Bruner and Postman, 1949). This paper was one of Thomas Kuhn’s central examples of Gestalt switching in The Structure of Scientific Revolutions.
34. Bruner later made a similar argument. ‘I am inclined to think of mental development as involving the construction of a model of the world in the child’s head, an internalized set of structures for representing the world around him’ (Bruner, 1979[1962]: 103).
35. Subsequently, Chomsky collaborated with the Harvard psychologist George Miller in developing a cognitively oriented field of psycholinguistics (Chomsky and Miller,
1958; Chomsky and Miller, 1963a and 1963b; Miller and Chomsky, 1963). In addition to these co-authored works, Miller and the cognitive psychologists he trained spent much time investigating the psychological reality of Chomsky’s understanding of language (Harvard University Center for Cognitive Studies, 1961–9; Miller, 1967). For more discussion of Chomsky’s linguistics work, its anti-behaviorist implications, and the ways that psychologists adopted it, see Cohen-Cole (2003: chs 5–6).
36. For discussion of this paradox, see Crowther-Heyck (1999).
37. Cited in Miller, Galanter and Pribram (1960: 87, 160, 167–9, 179, 180, 183, 191). For additional references to Poincaré and Polya, see Newell, Shaw and Simon (1962).
38. ‘Bibliography – Miscellaneous Comments on Starred Items’, Commentary – 1958 [bundled] (Consulting – Social Science Research Council – Conferences and Seminars – Summer Research Training Institute on Simulation of Cognitive Processes – 1958). Herbert Simon Digital Archive. http://diva.library.cmu.edu/Simon/index.html[.] See also Newell, Shaw and Simon (1958: 153, note 2); Newell and Simon (1959: 10).
39. For discussion of the exchange between Boring and Wiener, see Galison (1994: 247, 251–2).
40. Cited in Miller, Galanter and Pribram (1960: 189).
41. Lawrence Smith has noted the tendency of neo-behaviorists to see the mode of animal thinking described in their work as functioning in essentially the same fashion as their announced prescriptions (philosophy of science) for their own thinking (1986).
Bibliography
Unpublished Sources
Jerome Bruner Papers, Harvard University Archives.
Ford Foundation Archives, Grant Files.
J. Robert Oppenheimer Papers, Manuscript Division, Library of Congress.
Herbert Simon Digital Archive, http://diva.library.cmu.edu/Simon/index.html
Published Sources
Adorno, T. W., Frenkel-Brunswik, E., Levinson, D. J. and Sanford, R. N. (1950) The Authoritarian Personality. New York: Harper & Brothers.
Allen, R. B. (1949) ‘Communists Should Not Teach in American Colleges’, Educational Forum 13: 433–40.
Allport, G. W. (1940) ‘The Psychologist’s Frame of Reference’, Psychological Bulletin 37: 1–28.
Allport, G. W. (1955) Becoming: Basic Considerations for a Psychology of Personality. New Haven, CT: Yale University Press.
Anzieu, D. (1986) Freud’s Self-Analysis. London: Hogarth Press and the Institute of Psycho-analysis.
Ash, M. G. (1985) ‘Gestalt Psychology: Origins in Germany and Reception in the United States’, in C. E. Buxton (ed.) Points of View in the Modern History of Psychology. Orlando, FL: Academic Press, pp. 295–344.
Baars, B. J. (1986) The Cognitive Revolution in Psychology. New York: Guilford Press.
Barber, B. (1952) Science and the Social Order. Glencoe: Free Press.
Beach, F. A. (1950) ‘The Snark Was a Boojum’, American Psychologist 5: 115–24.
Berelson, B., ed. (1963) The Behavioral Sciences Today. New York: Basic Books.
Bergmann, G. and Spence, K. W. (1941) ‘Operationism and Theory in Psychology’, Psychological Review 48: 1–14.
Beyerchen, A. D. (1992) ‘What We Now Know about Nazism and Science’, Social Research 59: 615–41.
Birnbaum, L. C. (1955) ‘Behaviorism in the 1920s’, American Quarterly 7: 15–30.
Bitterman, M. E. (1960) ‘Toward Comparative Psychology of Learning’, American Psychologist 15: 704–12.
Boring, E. G. (1946) ‘Mind and Mechanism’, American Journal of Psychology 59: 173–92.
Broadbent, D. E. (1958) Perception and Communication. New York: Pergamon.
Bronowski, J. (1956) Science and Human Values. New York: Julian Messner.
Brower, D. (1949) ‘The Problem of Quantification in Psychological Science’, Psychological Review 56: 325–33.
Bruner, J. S. (1957a) ‘Mechanism Riding High: Review of Kenneth W. Spence, Behavior Theory and Conditioning’, Contemporary Psychology 2: 155–7.
Bruner, J. S. (1957b) ‘Neural Mechanism in Perception’, Psychological Review 64: 123–52.
Bruner, J. S. (1979[1962]) ‘On Learning Mathematics’, in On Knowing: Essays for the Left Hand. Cambridge, MA and London: Harvard University Press, pp. 97–111.
Bruner, J. S., Goodnow, J. J. and Austin, G. A. (1956) A Study of Thinking. London: John Wiley & Sons.
Bruner, J. S. and Postman, L. J. (1949) ‘On the Perception of Incongruity: a Paradigm’, Journal of Personality 18: 206–23.
Bryson, L. (1948) Science and Freedom. New York: Columbia University Press.
Camic, C. (1991) ‘Introduction: Talcott Parsons Before The Structure of Social Action’, in C. Camic (ed.) Talcott Parsons: the Early Essays: Edited and with an Introduction by Charles Camic. Chicago, IL: University of Chicago Press.
Cantril, H. (1949) ‘Toward a Scientific Morality’, The Journal of Psychology 27: 363–76.
Cantril, H. (1950) ‘An Inquiry Concerning the Characteristics of Man’, Journal of Abnormal and Social Psychology 45: 490–503.
Capshew, J. H. (1999) Psychologists on the March: Science, Practice, and Professional Identity in America, 1929–1969. Cambridge and New York: Cambridge University Press.
Chomsky, N. (1956) ‘Three Models for the Description of Language’, IRE Transactions on Information Theory IT-2: 113–24.
Chomsky, N. (1957) Syntactic Structures. The Hague: Mouton.
Chomsky, N. (1959a) ‘Review of B.F. Skinner, Verbal Behavior’, in J. A. Fodor and J. J. Katz (eds) The Structure of Language. Englewood Cliffs, NJ: Prentice-Hall.
Chomsky, N. (1959b) ‘Review of B.F. Skinner, Verbal Behavior’, Language 35: 26–58.
Chomsky, N. and Miller, G. A. (1958) ‘Finite State Languages’, Information and Control 1: 91–112.
Chomsky, N. and Miller, G. A. (1963a) ‘Finitary Models of Language Users’, in R. D. Luce, R. Bush and E. Galanter (eds) Handbook of Mathematical Psychology, Vol. II. New York: Wiley, pp. 419–91.
Chomsky, N. and Miller, G. A. (1963b) ‘Introduction to the Formal Analysis of Natural Languages’, in R. D. Luce, R. Bush and E. Galanter (eds) Handbook of Mathematical Psychology, Vol. II. New York: Wiley, pp. 269–322.
Clement, J. (1982) ‘Student Preconceptions of Introductory Mechanics’, American Journal of Physics 50: 66–71.
Cohen-Cole, J. (2003) Thinking About Thinking in Cold War America. PhD thesis. Princeton, NJ: Princeton University.
Colodny, R. G., ed. (1966) Mind and Cosmos: Essays in Contemporary Science and Philosophy. Pittsburgh, PA: University of Pittsburgh Press.
Conant, J. B. (1947) On Understanding Science: A Historical Approach. New Haven, CT: Yale University Press.
Conant, J. B., ed. (1950) Harvard Case Histories in Experimental Science. Cambridge, MA: Harvard University Press.
Conant, J. B. (1951a) On Understanding Science. New York: New American Library.
Conant, J. B. (1951b) Science and Common Sense. New Haven, CT: Yale University Press.
Conant, J. B. (1952) Modern Science and Modern Man. New York: Doubleday.
Crowther-Heyck, H. (1999) ‘George A. Miller, Language, and the Computer Metaphor of Mind’, History of Psychology 2: 37–64.
Crutchfield, R. S. (1955) ‘Conformity and Character’, American Psychologist 10: 191–8.
Danziger, K. (1990) Constructing the Subject: Historical Origins of Psychological Research. Cambridge and New York: Cambridge University Press.
Dollard, J. and Miller, N. E. (1950) Personality and Psychotherapy: An Analysis in Terms of Learning, Thinking, and Culture. New York: McGraw-Hill.
Dukes, W. F. (1960) ‘The Snark Revisited’, American Psychologist 15: 157.
Dunlop, J. T., Gilmore, M. P., Kluckhohn, C. K., Parsons, T. and Taylor, O. H. (1941) ‘Toward a Common Language for the Area of the Social Sciences’ (unpublished typescript). Cambridge, MA: Harvard University.
Dupuy, J.-P. (2000) The Mechanization of the Mind: On the Origins of Cognitive Science. Princeton, NJ: Princeton University Press.
Einstein, A. and Infeld, L. (1942) The Evolution of Physics, 2nd edn. New York: Simon & Schuster.
Ericson, S. C. (1941) ‘Unity in Psychology: a Survey of Some Opinions’, Psychological Review 48: 73–82.
Feigl, H. (1959) ‘The Philosophical Embarrassments of Psychology’, American Psychologist 14: 115–28.
Frank, L. K. (1951) Nature and Human Nature: Man’s New Image of Himself. New Brunswick, NJ: Rutgers University Press.
Fuller, S. (2000) Thomas Kuhn: A Philosophical History for Our Times. Chicago, IL: University of Chicago Press.
Galanter, E. and Gerstenhaber, M. (1956) ‘On Thought: The Extrinsic Theory’, Psychological Review 63: 218–27.
Galison, P. (1994) ‘The Ontology of the Enemy: Norbert Wiener and the Cybernetic Vision’, Critical Inquiry 21: 228–68.
Galison, P. L. (2003) Einstein’s Clocks and Poincaré’s Maps: Empires of Time, 1st edn. New York and London: W.W. Norton.
Gardner, H. (1985) The Mind’s New Science: A History of the Cognitive Revolution. New York: Basic Books.
Gardner, H. (1991) The Unschooled Mind: How Children Think and How Schools Should Teach. New York: Basic Books.
Gigerenzer, G. (1991) ‘From Tools to Theories: a Heuristic of Discovery in Cognitive Psychology’, Psychological Review 98: 254–67.
Gigerenzer, G. (1992) ‘Discovery in Cognitive Psychology: New Tools Inspire New Theories’, Science in Context 5: 329–50.
Goldstein, J. S. (1992) A Different Sort of Time: The Life of Jerrold Zacharias. Cambridge, MA: MIT Press.
Gorn, S. (1959) ‘On the Mechanical Simulation of Habit-Forming and Learning’, Information and Control 2: 226–59.
Green, C. D. (1992) ‘Of Immortal Mythological Beasts: Operationism in Psychology’, Theory & Psychology 2: 291–320.
Greenwood, J. D. (1999) ‘Understanding the “Cognitive Revolution” in Psychology’, Journal of the History of the Behavioral Sciences 35: 1–22.
Gruber, H. E., Hammond, K. R. and Jessor, R. (1957) ‘Foreword’, in Contemporary Approaches to Cognition. Cambridge, MA: Harvard University Press, pp. v–vi.
Salkind_Chapter 32.indd 327
9/4/2010 10:32:39 AM
328
Curriculum, Instruction and Learning
Hacking, I. (1986) ‘Making up People’, in T. C. Heller, M. Sosna and D. E. Wellbery (eds) Reconstructing Individualism: Autonomy, Individuality and the Self in Western Thought. Stanford, CA: Stanford University Press, pp. 222–36. Harlow, H. F. (1953) ‘Mice, Monkeys, Men, and Motives’, Psychological Review 60: 23–31. Harlow, H. F. (1956) ‘Current and Future Advances in Physiological and Comparative Psychology’, American Psychologist 11: 273–7. Harlow, H. F. (1958) ‘The Nature of Love’, American Psychologist 13: 673–85. Harvard University Center for Cognitive Studies (1961–9) Annual Reports. Cambridge, MA: HUCCS. Hebb, D. O. (1949) The Organization of Behavior: a Neuropsychological Theory. New York: John Wiley & Sons. Hebb, D. O. (1954) ‘The Problem of Consciousness and Introspection’, in E. D. Adrian (ed.) Brain Mechanisms and Consciousness. Oxford: Blackwell. Hebb, D. O. (1960) ‘The American Revolution’, American Psychologist 15: 735–45. Heil, A. L. (2003) Voice of America: a History. New York and Chichester, Sx: Columbia University Press. Hilgard, E. R. (1948) Theories of Learning, 1st edn. New York: Appleton-Century-Crofts. Hilgard, E. R. and Bower, G. H. (1966) Theories of Learning, 3rd edn. New York: Appleton-Century-Crofts. Hoffman, H. S. (1962) ‘The Analogue Lab: a New Kind of Teaching Device’, American Psychologist 17: 684 –94. Hollinger, D. A. (1995) ‘Science as a Weapon in Kulturkämpfe in the United States During and After World War II, Isis 86: 440–54. Hook, S. (1945) ‘Democracy and Education: Introduction’, in The Authoritarian Attempt to Capture Education: Papers from the Second Conference on the Scientific Spirit and Democratic Taith. New York: King’s Crown Press, pp. 10–12. Hook, S. (1991) Reason, Social Myths and Democracy. Buffalo, NY: Prometheus Books. Hunt, W A. (1956) The Clinical Psychologist. Springfield, IL: Charles C. Thomas. Johnson, E. P. (1956) ‘On Readmitting the Mind’, American Psychologist 11: 712–14. Johnson-Laird, P. N. 
(1988) The Computer and the Mind: an Introduction to Cognitive Science. Cambridge, MA: Harvard University Press. Kahn, T. C. (1955) ‘Clinically and Statistically Oriented Psychologists Split Our Profession’, American Psychologist 10: 171–2. Kelley, G. A. (1955) ‘I Itch Too’, American Psychologist 10: 172–3. Kelso, J. (1953) Harvard Study of Russia Called ‘Insane’ – Costs U.S. $450,000. The Boston Post, 28 September: 1. Kennan, G. F. (2000) ‘Long Telegram’, in D. Merrill and T. G. Paterson (eds) Major Problems in American Foreign Relations, Volume II, Since 1914. Boston, MA: Houghton Mifflin, pp. 210–12. Kevles, D. J. (1987) The Physicists: The History of a Scientific Community in Modern America. Cambridge, MA: Harvard University Press. Kluckhohn, C. (1952) ‘Universal Values and Anthropological Relativism’, in Modern Education and Human Values. Pittsburgh, PA: University of Pittsburgh Press, pp. 87–112. Koch, S. (1959a) ‘Epilogue: Some Trends of Study I’, in S. Koch (ed.) Psychology: a Study of a Science, Volume 3, Formulations of the Person and the Social Context. New York: McGraw-Hill, pp. 729–88. Koch, S. (1959b) ‘Introduction to Volume 3’, in S. Koch (ed.) Psychology: a Study of a Science, Volume 3, Formulations of the Person and the Social Context. New York: McGraw-Hill, pp. 1–6. Koestler, A. (1964) The Act of Creation. New York: Macmillan. Kohler, R. E. (1991) Partners in Science: Foundations and Natural Scientists, 1900–1945. Chicago, IL: University of Chicago Press.
Salkind_Chapter 32.indd 328
9/4/2010 10:32:39 AM
Cohen-Cole
The Reflexivity of Cognitive Science
329
Krechevesky, I. (1932a) ‘The Genesis of “Hypotheses” in Rats’, Psychological Review 45: 107–33. Krechevesky, I. (1932b) ‘ “Hypothesis” vs. “Chance” in the Pre-Solution Period in Sensory Discrimination-Learning’, University of California Publications in Psychology 6: 27– 44. Krugler, D. F. (2000) The Voice of America and the Domestic Propaganda Battles, 1945–1953. Columbia: University of Missouri Press. Kuhn, T. S. (1959) ‘The Essential Tension: Tradition and Innovation in Scientific Research’, in C. W. Taylor (ed.) The Third (1959) University of Utah Research Conference on the Identification of Scientific Talent. Salt Lake City: University of Utah Press, pp. 162–74. Kuhn, T. S. (1962) The Structure of Scientific Revolutions. Chicago, IL: University of Chicago Press. Latour, B. (1993) We Have Never Been Modern. Cambridge, MA and London: Harvard University Press. Leahey, T. H. (1992) ‘The Mythical Revolutions of American Psychology’, American Psychologist 47: 308–18. Lentz, T. F. (1950) ‘The Attitudes of World Citizenship’’, Journal of Social Psychology 32: 207–14. Locke, A. (1942) ‘Pluralism and Intellectual Democracy’, in Science, Philosophy and Religion, 2nd Symposium of the Conference on Science, Philosophy and Religion. New York: CSPR, pp. 196–209. Logue, A. W. (1985) ‘The Growth of Behaviorism: Controversy and Diversity’, in C. E. Buxton (ed.) Points of View in the Modern History of Psychology. Orlando, FL: Academic Press, pp. 169–96. McClelland, D. C. (1955) ‘The Psychology of Mental Content Reconsidered’, Psychological Review 62: 297–302. MacCorquodale, K. and Meehl, P. E. (1948) ‘On a Distinction between Hypothetical Constructs and Intervening Variables’, Psychological Review 55: 98–107. McGuire, F. L. (1956) ‘On the Issue “What is Science?” ’, American Psychologist 11: 152–3. McKellar, P. (1957) Imagination and Thinking: a Psychological Analysis. New York: Basic Books. Mandler, G. 
(2002) ‘Origins of the Cognitive Revolution’, Journal of the History of the Behavioral Sciences 38: 339–53. Maslow, A. H. (1943) ‘The Authoritarian Character Structure’, Journal of Social Psychology 18: 401–11. Maslow, A. H. (1946) ‘Problem-Centering versus Means-Centering in Science’, Philosophy of Science 13: 326–31. Maslow, A. H. (1948) ‘Cognition of the Particular and of the Generic’, Psychological Review 55: 22–39. Maslow, A. H. (1949) ‘The Expressive Component of Behavior’, Psychological Review 56: 261–72. Mayr, E. (1952) ‘Concepts of Classification and Nomenclature in Higher Organisms and Microorganisms’, Annals of the New York Academy of Science 56: 391–7. Melton, A. W. (1956) ‘Present Accomplishments and Future Trends in Problem-Solving and Learning Theory’, American Psychologist 11: 278–81. Merton, R. K. and Wolfe, A. (1995) ‘The Cultural and Social Implications of Sociological Knowledge’, American Sociologist 26: 15–39. Miller, G. A. (1963) ‘Thinking, Cognition, and Learning’, in B. Berelson (ed.) The Behavioral Sciences Today. New York: Basic Books, pp. 139–50. Miller, G. A. (1967) ‘Project Grammarama’, in The Psychology of Communication. New York: Basic Books, pp. 125–87. Miller, G. A. and Chomsky, N. (1963) ‘Finitary Models of Language Users’, in R. D. Luce, R. R. Bush and E. Galanter (eds) Handbook of Mathematical Psychology, Vol. II. New York: Wiley, pp. 419–91.
Salkind_Chapter 32.indd 329
9/4/2010 10:32:39 AM
330
Curriculum, Instruction and Learning
Miller, G. A., Galanter, E. and Pribram, K. H. (1960) Plans and the Structure of Behavior. New York: Henry Holt. Mooney, R. L. (1954) ‘Groundwork for Creative Research’, American Psychologist 9: 544 –8. Morawski, J. G. (1992) ‘Self-Regard and Other-Regard: Reflexive Practices in American Psychology, 1890 –1940’, Science in Context 5: 281–308. Morawski, J. G. (2000) ‘Just One More “Other” in Psychology?’, Theory & Psychology 10: 63–70. Newell, A., Shaw, J. C. and Simon, H. (1958) ‘Elements of a Theory of Human Problem Solving’, Psychological Review 65: 151–66. Newell, A., Shaw, J. C. and Simon, H. (1959) ‘Report on a General Problem-Solving Program’, Proceedings of the International Conference on Information Processing. Paris. Newell, A., Shaw, J. C. and Simon, H. A. (1962) ‘The Processes of Creative Thinking’, in H. E. Gruber, G. Terrell and M. Wertheimer (eds) Contemporary Approaches to Creative Thinking. New York: Atherton Press, pp. 63–119. Newell, A. and Simon, H. (1959) The Simulation of Human Thought (No. P–1734). Santa Monica, CA: RAND Corporation. Nicholson, I. A. M. (1998) ‘Gordon Allport, Character, and the “Culture of Personality”, 1897–1937’, History of Psychology 1: 52–68. Oppenheimer, J. R. (1956) ‘Analogy in Science’, American Psychologist 11: 127–35. Oppenheimer, J. R. (1958) ‘Theory Versus Practice in American Values and Performance’, in E. E. Morison (ed.) The American Style: Essays in Value and Performance. New York: Harper & Brothers, pp. 111–23. Poincaré, H. (1952) Science and Method, trans. F. Maitland. New York: Dover. Polya, G. (1945) How to Solve It. Princeton, NJ: Princeton University Press. Polya, G. (1954) Mathematics and Plausible Reasoning. Princeton, NJ: Princeton University Press. Polyani, M. (1958) Personal Knowledge. Chicago, IL: University of Chicago Press. Prentice, W. C. H. (1946) ‘Operationism and Psychological Theory: a Note’, Psychological Review 53: 247–9. Purcell, E. 
A., Jr (1972) The Crisis of Democratic Theory: Scientific Naturalism & the Problem of Values. Lexington: University Press of Kentucky. Richards, G. (1987) ‘Of What is History of Psychology a History?’, British Journal of the History of Science 20: 201–11. Riesman, D., w. t. a. o. Glazer, N. and Denney, R. (1950) The Lonely Crowd: a Study of the Changing American Character. New Haven, CT: Yale University Press. Robins, R. W., Gosling, S. D. and Craik, K. H. (1999) ‘An Empirical Analysis of Trends in Psychology’, American Psychologist 54: 117–28. Rogers, C. R. (1955) ‘Persons or Science? A Philosophical Question’, American Psychologist 10: 267–78. Rokeach, M. (1948) ‘Generalized Mental Rigidity as a Factor in Ethnocentrism’, Journal of Abnormal and Social Psychology 43: 259–78. Rokeach, M. (1949) ‘Rigidity and Ethnocentrism: a Rejoinder’’, Journal of Personality 17:467–74. Rokeach, M. (1950) ‘The Effect of Perception Time upon Rigidity and Concreteness of Thinking’, Journal of Experimental Psychology 20: 206–16. Rokeach, M. (1951a) ‘A Method for Studying Individual Differences in “NarrowMindedness” ’, Journal of Personality 30: 219–33. Rokeach, M. (1951b) ‘ “Narrow-Mindedness” and Personality’, Journal of Personality 30:234–51. Rokeach, M. (1951c) ‘Prejudice, Concreteness of Thinking, and Reification of Thinking’, Journal of Abnormal and Social Psychology 46: 83–91. Rokeach, M. (1951d) ‘Toward the Scientific Evaluation of Social Attitudes and Ideologies’, Journal of Psychology 31: 97–104.
Salkind_Chapter 32.indd 330
9/4/2010 10:32:39 AM
Cohen-Cole
The Reflexivity of Cognitive Science
331
Rokeach, M. (1960) The Open and Closed Mind: Investigations into the Nature of Belief Systems and Personality Systems. New York: Basic Books. Rose, N. (1996) Inventing Ourselves: Psychology, Power and Personhood. Cambridge and New York: Cambridge University Press. Rudolph, J. L. (2002) Scientists in the Classroom: The Cold War Reconstruction of American Science Education. New York: Palgrave. Sanford, S. H. (1952) ‘Annual Report of the Executive Secretary’, American Psychologist 7: 686–96. Schlesinger, A. M., Jr (1949) The Vital Center: The Politics of Freedom. Boston, MA: Houghton-Mifflin. Shils, E. (1954) ‘Authoritarianism: “Right” and “Left” ’, in R.Christie and M. Jahoda (eds) Studies in the Scope and Method of the Authoritarian Personality. Glencoe, IL: Free Press, pp. 24 – 49. Shorske, C. E. (1981) Fin-de-Siècle Vienna: Politics and Culture. New York: Vintage Books. Shulman, H. C. (1990) The Vèice of America: Propaganda and Democracy, 1941–1945. Madison: University of Wisconsin Press. Simon, H. and Newell, A. (1962) ‘Computer Simulation of Human Thinking and Problem Solving’, in M. Greenberger (ed.) Management and the Computer of the Future. New York: MIT Press and Wiley. Simon, H. A. (1966a) ‘Scientific Discovery and the Psychology of Problem Solving’, in R. G. Colodny (ed.) Mind and Cosmos. Pittsburgh, PA: University of Pittsburgh Press, pp. 22– 40. Simon, H. A. (1966b) ‘Thinking by Computers’, in R. G. Colodny (ed.) Mind and Cosmos. Pittsburgh, PA: University of Pittsburgh Press, pp. 3–21. Simon, H. A. (1967) ‘Understanding Creativity’, in J. C. Gowan, G. D. Demos and E. P. Torrence (eds) Creativity: Its Educational Implications. New York: John Wiley & Sons, pp. 43–53. Simon, H. A. (1980) ‘Cognitive Science: The Newest Science of the Artificial’, Cognitive Science 4: 33– 46. Skaggs, E. B. (1945) ‘Personalistic Psychology as Science’, Psychological Review 52: 234 – 48. Skinner, B. F. 
(1950) ‘Are Theories of Learning Necessary?’, Psychological Review 57: 193–216. Skinner, B. F. (1956) ‘A Case History in Scientific Method’, American Psychologist 11: 221–33. Smith, L. D. (1986) Behaviorism and Logical Positivism: A Reassessment of the Alliance. Stanford, CA: Stanford University Press. Smith, L. D. (1990) ‘Metaphors of Knowledge and Behavior in the Behaviorist Tradition’, in D. E. Leary (ed.) Metaphors in the History of Psychology. Cambridge and New York: Cambridge University Press, pp. 237–66. Sokal, M. (1984) ‘The Gestalt Psychologists in Behaviorist America’, American Historical Review 89: 1240–63. Solomon, L. N. (1955) ‘The Paradox of the Experimental Clinician’, American Psychologist 10: 170–1. Spence, K. W. (1952) ‘Clark Leonard Hull: 1884 –1952’, American Journal of Psychology 65: 639– 46. Spence, K. W. (1956) Behavior Theory and Conditioning. New Haven, CT: Yale University Press. Spence, K. W. (1957) ‘The Empirical Basis and the Theoretical Structure of Psychology’, Philosophy of Science 24: 97–108. Sperry, R. W. (1955) ‘On the Neural Basis of the Conditioned Response’, British Journal of Animal Behavior 3: 41– 4. Stanger, R. (1936a) ‘Fascist Attitudes: an Exploratory Study’, Journal of Social Psychology 7: 309–19.
Salkind_Chapter 32.indd 331
9/4/2010 10:32:39 AM
332
Curriculum, Instruction and Learning
Stanger, R. (1936b) ‘Fascist Attitudes: Their Determining Conditions’, Journal of Social Psychology 7: 438–54. Stevens, S. S. (1936) ‘A Scale for the Measurement of a Psychological Magnitude’, Psychological Review 43: 405–16. Strupp, H. H., Castore, G. F, Lake, R. A., Merrill, R. M. and Bellak, L. (1956) ‘Comments on Rogers’ “Persons or Science” ’, American Psychologist 11: 153–7. Sundberg, N. D. (1960) ‘Basic Readings in Psychology’, American Psychologist 15: 343–5. Thorne, F C. (1956) ‘Psychologists, Heal Thyselves!’ American Psychologist 11: 152. Toews, J. E. (1991) ‘Historicizing Psychoanalysis: Freud in His Time and for Our Time’, Journal of Modern History 63: 504– 45. Tolman, E. C. (1948) ‘Cognitive Maps in Rats and Men’, Psychological Review 55: 189–208. Winston, A. S. and Blais, D. J. (1996) ‘What Counts as an Experiment?: a Transdisciplinary Analysis of Textbooks, 1930–1970’, American Journal of Psychology 109: 559–616. Wise, M. N. (1988) ‘Mediating Machines’, Science in Context 2: 77–113. Wise, M. N. (1993) ‘Mediations: Enlightenment Balancing Acts, or the Technologies of Rationalism’, in P. Horwich (ed.) World Changes: Thomas Kuhn and the Nature of Science. Cambridge, MA: MIT Press, pp. 207–56.
33 History, Culture, Learning, and Development Patricia M. Greenfield, Ashley E. Maynard and Carla P. Childs
Source: Cross-Cultural Research, 34(4) (2000): 351–374.
We feel very privileged to be part of the special issue honoring and remembering Dr. Ruth Munroe. In the history of our field, she was a pioneering figure who introduced a cross-cultural approach to all aspects of human development into the field of cross-cultural psychology. With her husband, Dr. Robert L. Munroe, she blazed a path for fruitful collaboration between members of anthropology and psychology departments. There are a number of important ways in which the research that we will present in this article can be considered the fruit of intellectual and empirical seeds planted by Dr. Munroe, in collaboration with her husband. First, the Munroes carried out longitudinal research, making connections between two different parts of the life cycle in Kenya. Second, they made cross-cultural investigations of children’s work, highlighting both the importance of work as a shaper of children’s development and the sensitivity of children’s work to larger ecological forces. Third, they realized the potential impact of economic factors on cognitive performance and were able to test this relationship through controlled cross-cultural study. The research presented in this article stands on the shoulders of Ruth Munroe’s collaborative research program in all three respects: It is longitudinal, it centers on children’s work, and it connects larger economic forces with pathways of socialization and human development.
In human history, there have been three major ecological adaptations: hunting and gathering, agriculture, and commerce with advanced technology. Like the Munroes, we hypothesize that each ecology emphasizes a different
set of skills, different developmental pathways, and different processes of socialization or informal education. Human development is an adaptation to two types of characteristics: the characteristics of the surrounding ecology, such as the climate and type of land available, and the characteristics of cultural practices that arise as adaptations to those ecologies (Weisner, 1984). It follows that different socialization patterns are necessary to prepare children for a changing environment or for an environment that is different from the one in which parents themselves were raised. In this first diachronic study of the impact of ecocultural changes on the practices of informal education, we demonstrate how the distal variable of historical epoch affects proximal variables in the teaching of weaving in a Zinacantec Maya community, resulting in changed teaching practices from one generation to the next. We also investigate a closely related change from a small, closed stock of woven patterns to a new variety of woven patterns, with constant innovation. Last, we show how ecocultural variability in subsistence patterns affects the representation of cultural artifacts.
Informal Education
Processes of informal education have been documented by many researchers in recent years (e.g., Greenfield, 1984; Greenfield & Lave, 1982; Lave & Wenger, 1990; Rogoff, 1990). We have come to understand informal education as an apprenticeship process that expresses cultural goals. Past studies of informal education have focused on scaffolding processes from parent to child in the teaching of everyday tasks (Rogoff, Mistry, Göncü, & Mosier, 1993), and the apprenticeship of crafts such as carpentry, tailoring (Lave & Wenger, 1990), and weaving (Childs & Greenfield, 1980). No study has yet considered how processes of apprenticeship change with changes in the ecocultural environment.
The research presented in this article investigates the historical transition from agriculture to commerce, focusing on the implications of this transition for learning and development. We focus on three areas of learning and development: the creation of artifacts, apprenticeship, and the symbolic representation of those artifacts. The data are taken from research conducted with two generations of participants in Nabenchauk, a Zinacantec Maya hamlet in the highlands of Chiapas, Mexico.
History, Culture, and Socialization
Our investigation also relates to larger questions concerning the relationship between history, culture, and the socialization of the individual. This relationship is central to the field of cultural psychology and, particularly,
to the sociohistorical approach. Culture at any given moment is the product of historical change, as well as a reflection of cultural constancy and conservatism. The process of cultural transmission from one generation to another links culture at one historical moment with culture at another historical moment. What is called cultural transmission from the point of view of society is called socialization from the point of view of the family, and development from the point of view of the individual. Socialization is intrinsically future oriented; it prepares children for an adulthood that has not yet arrived. It follows that changing socialization patterns should be a key component of the psychological adaptation to social change. However, an important question in conditions of ecocultural change is, do parents merely repeat the socializing process that they underwent as children? Or do parents develop new methods and processes as societal conditions – in this case, economic conditions – change? And what, if any, are the consequences of such changes in socialization for the development of children?
The sociohistorical research tradition, derived from Vygotsky (1962, 1978), emphasizes that development is constructed through social interaction, cultural practices, and the internalization or cognitive appropriation of symbolic tools (Saxe, 1990). Although the historical dimension of cultural practices and symbolic tools is emphasized – that is, we understand how the practices and tools fit with the development of the culture itself over time – the developmental implications of historical change for those cultural practices and symbolic tools have not been studied directly. To do this, diachronic evidence comparing the development and socialization of one generation with that of the next is required.
In taking up these issues of the connection between history and individual development, it is important to consider how, methodologically, to connect macro conditions on the societal level to the micro level of individual development and socialization. We use both quantitative and qualitative analyses to demonstrate the relationship of the macro conditions of a society undergoing ecocultural change to the micro level of individual development and behavior.
Results from Our Study of the First Generation of Weavers
In the first video study of informal education in a nonindustrial society, Childs and Greenfield (1980) looked at the interactional processes involved in the transmission of weaving skill from one generation to the next in Zinacantán. This study was a sequel to another in which the authors compared the cognitive consequences of weaving, the most complex skill acquired by Zinacantec girls, with those of formal schooling, received predominantly by boys at the time the data were collected (Greenfield & Childs, 1977).
In 1969 and 1970, weaving instruction in Zinacantán was characterized by a relatively error-free scaffolding process based on observation of models, obedience to developmentally sensitive commands, and use of help when needed (Childs & Greenfield, 1980; Greenfield, 1984). This mode of informal instruction was well adapted both to the superordinate Zinacantec goal of preserving the baz’i, or “true” (i.e., traditional Zinacantec) way of life (Greenfield & Lave, 1982), and to the innate nature of Zinacantec children (Brazelton, Robey, & Collier, 1969). In terms of developmental theory, weaving apprenticeship followed a Vygotskian model of learning, with its emphasis on guidance by a more skilled “other” (Vygotsky, 1978). In weaving, the “true” way involved learning to construct the repertoire of only four traditional Zinacantec patterns. Pattern innovation and the creation of new patterns were simply not a part of the culture or the transmission process. In 1969 and 1970, the transmission of weaving skill was a relatively error-free, scaffolded process. Teachers stayed close to their pupils and prevented errors before they happened. In the intervening two decades since the first weaving data were collected, profound social change has occurred in Zinacantán. Many Zinacantecs have become entrepreneurs, joining the modern Mexican economy (Cancian, 1990). The community is in the process of a transition from agriculture and a subsistence economy to commerce, entrepreneurship, and cash. Both men and women have become involved in the new cash economy. Some men who formerly farmed are now involved in the transport business, running a van service back and forth to the former colonial city of San Cristobal de las Casas. Some girls and women weave and embroider servilletas, pieces of cloth suitable for use as placemats by tourists and other outsiders who buy them. 
Whereas the method of apprenticeship practiced in 1969 and 1970 was adapted to transmitting a tradition intact, another method of apprenticeship, trial-and-error learning, with its emphasis on the learner’s own discovery process, should foster the development of an ability to innovate. If innovation had, in fact, entered the culture as a value orientation in response to or as part of commercial entrepreneurship, we thought that weaving education would make a corresponding shift. Earlier, the teacher had carefully built a scaffold of help for the learner, providing help before the learner had an opportunity to make a serious error. Because the learner, in this situation, was afforded very little opportunity to make a mistake, let alone to explore, we predicted that the methods of teaching and learning would change to a more independent trial-and-error approach. From the point of view of developmental theory, this is the model of learning emphasized and valued by Piaget (1965/1997). Independence could also come from having a mother engaged in her own commercial activity. A mother might, for example, use her time to create a commodity to sell, assigning another daughter to teach the younger one how to weave.
Our study of the effects of cultural change on developmental processes has three parts. In the first part, we present our qualitative data on the changes in the woven artifacts. We predicted that the shift toward entrepreneurship would engender greater innovation in Zinacantec woven artifacts, and this prediction was confirmed. We also predicted that greater innovation would be preceded by a different socialization pattern, one not oriented to the maintenance of tradition. This prediction was confirmed by the results of the second part of our study: We show, using qualitative and quantitative data, how informal weaving apprenticeship at home moved from a more controlled, interdependent style to a more independent, trial-and-error style. In the third part, we present both qualitative and quantitative findings from studies of Zinacantecs’ representations of woven patterns, linking changes in representation to economic movement away from subsistence and toward commerce.
From Tradition to Innovation: The Creation of Artifacts
Innovation was in sharp contrast to the values and practices observed in Zinacantán in 1969 and 1970. At that time, tradition, rather than innovation, was valued; there was but a single baz’i, or “true way,” to do everything, from speaking to dressing. In 1969 and 1970, woven artifacts, like other parts of the culture, were stable and unchanging, limited by tradition. Woven patterns were limited to two red-and-white striped configurations, one multicolor stripe, and one gray-and-white basket-weave pattern. Figure 1 shows two Zinacantec boys dressed alike in 1970. Indeed, in 1970, all males dressed virtually alike; for example, these two boys are wearing red-and-white striped ponchos in the only available pattern with no distinctive embroidery. By 1991, each poncho had unique, innovative elements of design. Two examples, each with different brocaded designs along the lower edge (Figures 2a and 2b) and elaborate embroidered designs on each side (Figure 2a), reflect a contemporary trend of pattern innovation. No two pieces of clothing or other woven items were alike in their brocaded and embroidered designs. We saw both new motifs and new recombinations of old motifs. Although the garments were now unique, one element remained the same – the configuration of the red-and-white background stripe (see Figure 2b, top). However, the ratio of red to white had increased dramatically. (The red and white background stripe of the poncho was an important stimulus in the pattern representation experiment described in the last section.)
Figure 1: Two brothers wearing the Zinacantec Poncho, 1970
Note: The ponchos are red-and-white striped, with white predominating (see Figure 6, top). Photo courtesy of Sheldon Greenfield.
Figures 2a and 2b: Two different Zinacantec Ponchos, 1991
Note: Figure 2a shows the whole garment with embroidered flowers on two sides and a bottom band of brocade-woven deer. Figure 2b is a detail from another poncho showing a bottom band of brocade-woven flowers. The background is a red-and-white stripe. Note the high ratio of red to white in the fringe of each poncho. Photos courtesy of Lauren Greenfield.
From Interdependence to Independence: Apprenticeship Methods
Based on our research in 1969 and 1970, we concluded that the goal of Zinacantec education and socialization was the intergenerational replication of tradition: Learning to weave meant learning to weave a few specific patterns. Because tradition was maintained by a more controlled apprenticeship process, with the teacher guiding the pupil very closely, we predicted that innovation would be the result of a less controlled, less guided apprenticeship process, in other words, a more independent process. Our focus for studying changes in apprenticeship methods was on the learning processes involved in the important cultural technology of weaving, the most complex skill in the culture, a skill acquired by virtually all Zinacantec females.
Recall that the particular way in which weaving was taught in 1969 and 1970 fostered the goal of maintaining tradition: The learning process was a relatively error-free one in which the teacher, usually the mother, sensitively provided help, models for observation, and verbal direction in accord with the developmental level of the learner. The mother provided a scaffold of help that allowed the learner to complete a weaving she could not have done by herself. There were no failures; every young girl successfully learned to weave. Because the process was highly structured by the older generation and did not allow room for learner experimentation and discovery, the method of informal education (or apprenticeship) was, as mentioned earlier, well adapted for the continuation of tradition and the status quo. In the 1991 and 1993 data, we expected to see more trial-and-error learning, with the learner spending more time weaving unassisted and having to ask for help herself.
Method Participants The participants were 72 Zinacantec girls, ranging in age from 3 to 19 (mean = 11.8, median = 12). The first generation of girls was observed learning to weave in 1970. The second generation, virtually all daughters, nieces, and goddaughters of the first generation, was mainly observed in 1991. Two descendants of the original sample, too young to weave in 1991, were observed in 1993. The girls had varying experience in learning to weave. For some girls, the videotaped session in our protocol was their very first weaving session. Others had woven various items before, ranging in difficulty from very low to very high. Participants were recruited in two ways: by a Zinacantec research assistant, Xun Pavlu, who visited people he knew in the community and asked them to participate, and by word of mouth, as people in the community began to know the researchers and feel comfortable with them.
340
Curriculum, Instruction and Learning
Procedure Participants were videotaped for one hour in front of their homes (or inside if it was raining). Participants and their mothers were interviewed about the girls’ experience in weaving. A more extensive description of the procedures is presented in Greenfield, Maynard, and Childs (1999). The medium of communication between researchers and participants was the Maya language of Tzotzil.
Coding of the Videotapes The videotapes were extensively coded by Childs. One major variable of interest in this article is the proportion of time in which learner and teacher were engaged in collaborative weaving activity. This was defined as the proportion of time the learner and teacher/helper were observed working together. For this article, we measured collaborative activity during two segments of the weaving process that are relatively difficult to carry out: attaching the endstick (for woven items that do not have fringe and must therefore be woven to the end of the warp threads), and the first cycle of weaving (the first time a weft thread is inserted into the warp). For learners who attached an endstick during their video observation, we used this segment for our measure; for learners who were making fringed items and therefore did not attach an endstick, we used the first cycle of weaving for our measure. A second measure of interest was the extent of the learner’s observational activity; this was defined as time spent observing the teacher demonstrate some aspect of the weaving process. Interrater reliability for these measures had been established by Childs and Greenfield (1980), based on the 1970 study. However, for the historical comparison presented in this article, Childs recoded all of the 1970 videotapes, as well as coding the 1991/1993 data; this recoding of the old data prevented “historical drift” in the coding process and ensured that the old and new videos were coded in exactly the same way. New interrater reliability was also established. Interrater reliability for collaborative weaving activity and learner observation was based on a random sample of eighteen weaving segments from eighteen learners. Greenfield served as the reliability coder. The correlation between the two coders for proportion of collaborative weaving activity was .8872 (p < .001); for proportion of learner observation, the correlation was .9703 (p < .001).
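The interrater check described above amounts to correlating two coders' proportions for the same segments. A minimal sketch follows, assuming a Pearson correlation (the source says only "correlation"); the function name and the six data points are hypothetical and are not the study's values.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a coder's scores do not vary")
    return cov / sqrt(var_x * var_y)

# Hypothetical proportions of collaborative time for six segments (not study data).
coder_a = [0.10, 0.35, 0.50, 0.62, 0.80, 0.95]
coder_b = [0.12, 0.30, 0.55, 0.60, 0.78, 0.90]
print(round(pearson_r(coder_a, coder_b), 3))
```

Values near 1 indicate that the two coders ranked and spaced the segments almost identically, which is the pattern the reported coefficients describe.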
Another variable of interest in the current article was the generational status of the teacher. This was a 4-point scale: no helper, helper younger than the learner, helper in same age cohort as learner, helper in older generation than learner. Generational status of the teacher/helper was coded from the video record aided by notes taken at the time of the observation, familiarity with the weavers and their families, and family tree records.
Textile Commerce Scale To assess mothers’ and daughters’ experiences in textile commerce, we used our own interview data supplemented by access to a survey of the community carried out in 1991 by the Stanford Medical Project. From these data sources, we created a textile-commerce scale. Mother-daughter textile commerce scores are an additive composite of various binary items: for example, whether either mother or daughter sold her weavings, whether daughter wound balls of thread for wages, or whether mother or daughter worked in a family retail store selling thread.
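An additive composite of binary items, as described above, can be sketched as a simple count of the items that apply. The item names below are hypothetical labels for the examples the text mentions, and treating an unanswered item as 0 is an assumption made here, not something the source states.

```python
# Hypothetical item names; the source names only a few example items.
TEXTILE_ITEMS = (
    "sells_weavings",
    "daughter_winds_thread_for_wages",
    "works_in_family_thread_store",
)

def commerce_score(responses, items=TEXTILE_ITEMS):
    """Sum of binary (True/False) items; an item missing from responses counts as 0."""
    return sum(1 for item in items if responses.get(item, False))

# One illustrative mother-daughter pair: two of the three items apply.
family = {"sells_weavings": True, "works_in_family_thread_store": True}
print(commerce_score(family))
```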
Results Qualitative Results The 1970 video data reveal a highly structured apprenticeship method. Katal Pavlu, age 9, is one example of a girl learning to weave. In the videotape of Katal’s learning session, we first see Katal there by herself. Very soon after the tape starts, her mother enters to help on her own initiative, without being summoned by her daughter. Her mother is very much there, continuously helping or doing part of the weaving for her daughter. This type of involved participation is illustrated in a frame from the videotape, shown in Figure 3. In the video frame, four hands on the loom symbolize the closely assisted style of weaving apprenticeship typical of the era. Katal grew up and had daughters of her own. In the video of her daughter, Loxa Santis, learning to weave 21 years later, also at about age 9, we see a style of teaching geared more toward independent learning (see Figure 4). In the video of Loxa learning to weave, her mother is not there
Note: Nabenchauk (1970), video by Patricia Greenfield.
Figure 3: Four hands on the loom: Xunka’ helps her daughter, Katal Pavlu
Note: Loxa is about the same age as her mother was in Figure 3. Her teacher is her older sister, Xunka’ Santis. Nabenchauk, 1991, video by Patricia Greenfield.
Figure 4: Katal Pavlu’s daughter, Loxa Santis, learns to weave 21 years later
at all to help her. This might be because she is busy embroidering a blouse sold on order to one of the researchers; this is an example of textile commerce. In addition, Figure 4 illustrates that the teacher no longer comes from the older generation; it is Loxa’s older sister, Xunka’, a member of the peer generation. Although Xunka’ is the teacher, she is paying little attention to the learner (note the direction of the teacher’s visual attention in Figure 4, away from the learner). Indeed, Loxa, the learner, has to call Xunka’ several times, taking the initiative to get her attention. We see, in the diachronic study of one family over two generations, how learning has moved from a more interdependent style of apprenticeship to a more independent style of apprenticeship.
Quantitative Results But how general is this historical case study? Its generality was confirmed by the quantitative analysis of our entire sample of weaving learners from both historical periods. Using structural equation modeling as our primary means of statistical analysis, we demonstrated the predicted relationship between historical period and learner independence-interdependence. Moreover, as predicted, this relationship was mediated by mother and daughter’s participation in textile-related commerce. Figure 5 shows how participation in textile-related commerce creates a pathway by which historical period, a very distal variable, influences collaboration, a proximal variable in cultural apprenticeship, through the mediation of involvement in textile commerce. The model shows that from one historical period to the next, participation in textile-related commerce increases significantly (a positive link of .38 between
[Path diagram: Historical period → Textile commerce (0.38**); Textile commerce → Collaboration (–0.28*)]
Note: *Parameter is significant at the .05 level. **Parameter is significant at the .01 level. Using EQS (Bentler, 1980, 1995) with maximum likelihood estimation, we found a good fit between model and data. The comparative fit index (CFI) for the tested model was 1.000, and the model chi-square was nonsignificant, χ² = .029, p = .8659. (For the CFI and chi-square test, good fit is indicated by a value greater than .90 and by nonsignificant results, respectively. A CFI of 1 is the maximum possible.) The model includes all weaving participants videotaped attaching the endstick or weaving the first weft thread (N = 69).
Figure 5: Path diagram of relationship among the variables of historical period, mother-daughter involvement in textile commerce, and teacher-learner collaboration
historical period and textile commerce, significant at the .01 level). More involvement in textile-related commerce led, in turn, to less collaborative activity between learner and teacher (a negative link of .28 between textile commerce and collaboration, significant at the .05 level); in other words, it led to a decrease in learner-teacher interdependence. In line with our diachronic case study, the relationship between the generational status of the teacher/helper and the amount of collaborative activity also generalized to the sample as a whole. In other words, just as Katal’s mother provided more collaborative help to Katal in 1970 than Loxa’s older sister provided to Loxa in 1991, so too there was an overall significant correlation between the generational status of the teacher and amount of collaborative weaving between teacher and learner.1 This result indicated that older teacher/helpers (who were also more skilled weavers) provided more collaborative assistance to weaving learners than younger teacher/helpers did. In other words, on the average, a mother would provide more collaborative help to a weaving learner than an older teenage sister, who would, in turn, provide more collaborative help to the learner than a younger cousin. There was additional evidence of an increase, from 1970 to the 1990s, in learner independence and trial-and-error learning. Between the first and second generation of learners, we found a significant increase in the proportion of time learners spent working independently (attaching the endstick without help from the teacher)2 and a significant decrease in the proportion of time learners spent watching their teachers demonstrate this part of the weaving process.3 Together, these findings paint a picture of an increase in learner independence and trial-and-error experimentation from 1970 to the 1990s.
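In path analysis, the indirect effect of a distal variable through a mediator is conventionally estimated as the product of the coefficients along the path. The authors do not report this product; the sketch below only multiplies the two coefficients given in Figure 5, and the function name is hypothetical.

```python
def indirect_effect(*path_coefficients):
    """Product of the standardized coefficients along one mediated path."""
    product = 1.0
    for coefficient in path_coefficients:
        product *= coefficient
    return product

# Coefficients reported in Figure 5.
period_to_commerce = 0.38
commerce_to_collaboration = -0.28
effect = indirect_effect(period_to_commerce, commerce_to_collaboration)
print(round(effect, 4))
```

The negative sign of the product matches the direction the text describes: later historical period, more textile commerce, less teacher-learner collaboration.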
Discussion These findings indicate that changes in apprenticeship accompany changes in the ecocultural milieu. Our path model demonstrates the relationships among the variables, from the distal variable of historical period to the experiential variable of mother-daughter participation in textile commerce to the most proximal variable of teacher-learner collaboration. Other analyses showed that in the 1990s, weaving learners spent more time working independently and less time observing a model provided by the teacher than their mothers, aunts, and godmothers had when they learned to weave in 1970. On a theoretical level, our findings indicate that processes of scaffolded guidance (the processes emphasized by Vygotsky, 1978) are emphasized more when cultures are in a more stable, tradition-maintaining state. In contrast, processes of independent trial-and-error experimentation (the processes emphasized by Piaget, 1965/1977) are used more when cultures are in a more dynamic, innovation-oriented state.4 As the Zinacantecs moved from one state to the other in our time slice of two decades, the emphasis in their modes of cultural learning changed accordingly. However, change was uneven. As predicted, it was most concentrated in those families who had made the greatest shift to a commercial way of life.
From Specificity to Abstraction: Symbolic Representation Subsistence involves exchanges and contributions of very specific items. In sharp contrast, a cash economy involves the abstraction of money, which is a totally generalized medium of exchange. Our study of the shift from specificity to abstraction focuses on participants’ ability to represent Zinacantec woven patterns. Our hypothesis was that the historical increase in commerce and use of money would lead to an increase in abstract (as opposed to detailed) representation of woven patterns. We also thought that this historical shift would be mediated by commercial involvement.
Method Participants Participants were 202 Zinacantec children and young adults, ranging in age from 3.5 to 22, with a mean age of 11.54 years. Participants were recruited by the same Zinacantec assistant as in the study of weaving apprenticeship, Xun Pavlu.
Materials Materials included a wooden frame and sticks that could be arranged in striped patterns inside the frame. The sticks were available in three widths – narrow, medium, and broad. In each width, sticks came in several colors, including pink, orange, red, and white. In addition, participants were provided examples of Zinacantec woven items, one male poncho and one female shawl; they were asked to use the sticks to represent the poncho and the shawl. The poncho (such as was shown in Figure 1) and shawl each had a distinctive configuration of stripes. The male pattern contains a simple red-and-white stripe, whereas the female pattern contains a more complex red-and-white stripe. Examples of each configuration, circa 1969, are shown in Figure 6. The particular items to be represented were whatever the participant was wearing (shawl if a girl, poncho if a boy), plus another contemporary standard example from the clothing of the opposite sex.
Procedure Participants came to the home of a Zinacantec family to be individually tested on the pattern representation procedure. Each participant was asked to use the colored sticks to represent two patterns, the pattern for the men’s poncho and the pattern for the women’s shawl (see Figure 6). (Additional pattern representation tasks are analyzed in Greenfield, Maynard, & Childs, 1999.)
Note: Two examples of each red-and-white stripe pattern are shown (photo by Carla Childs).
Figure 6: Striped configuration for male Poncho (upper left), striped configuration for female Shawl (lower right)
Family Commerce Scale Because we had both boys and girls in the data set, we made a scale of family participation in nontextile commerce. Like the textile commerce scale, this scale was derived from interview and census data. Almost all items could equally apply to boys or girls. The scale included such items as the family owning a television, working in a local shop, and selling peaches.
Results Styles of Representation in 1969 and 1970 Detailed representation. Skilled weavers often produced an accurate analysis of the configuration of stripes (Greenfield & Childs, 1977). Their analytic representations were always specific or detailed: Each thread in a broad stripe was represented by a separate, thin stick, just as a weaver would construct a broad stripe by putting together several individual threads (see Figure 7). The accuracy of the pattern analysis can be seen by comparing the configurations of red-and-white stripes constructed in the experiment (see Figure 7) with the actual woven patterns (see Figure 6). Abstract representation. Figure 8 shows an abstract representation of the same woven patterns. This is a style of representation never used by the 1969 and 1970 Zinacantec weavers but used by U.S. college students (Greenfield & Childs, 1977). Like Zinacantec weavers, these college students carried out an accurate, analytic representation of the configuration of stripes (compare Figure 8 with the actual woven patterns in Figure 6). However, this representation uses a single broad stick for a broad stripe rather than combining several narrow ones into a single stripe. As a representation of the two patterns, this strategy is equally accurate. However, it is less specific or detailed, thus more general or abstract.
Figure 7: Detailed representation of Poncho and Shawl
Figure 8: Abstract representation of Poncho and Shawl
Styles of Representation in 1991 Detailed representations showing a line-by-line or thread-by-thread construction of the patterns remained in 1991. However, many participants used the abstract style, as depicted in Figure 8. Abstraction had been added to analysis of the woven patterns. Our hypothesis was that it was participation in the money economy, with its abstract medium of exchange, that caused this change to a more abstract and less detailed style of representation.
Quantitative Analysis The historical change toward increasingly abstract representation and the role of commercial involvement in this shift were tested by means of structural equation modeling. The structural equation model (Greenfield, Maynard, & Childs, 1999) confirmed our hypotheses that there was a historical shift from detailed to abstract representation of the woven patterns and that this shift was mediated by involvement with commerce. For the purpose of creating a quantitative variable that could be used in a structural equation model, abstract representation was based on the number of medium and broad sticks (as opposed to thin sticks) used to represent stripes in the poncho and shawl. The model showed that from one historical period to the next, participation in nontextile commerce increased and that this increase led, in turn, to a more abstract mode of representation. Although we tend to associate both formal schooling and maturational age with the development of abstraction, the Wald statistical test indicated that neither of these variables contributed to mediating the historical increase in abstract visual representation.
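The quantitative abstraction variable described above, the number of medium and broad sticks used, can be sketched as a count over a participant's stick arrangement. The data layout (width, color pairs) and the two sample arrangements are hypothetical illustrations, not the authors' coding scheme or data.

```python
ABSTRACT_WIDTHS = {"medium", "broad"}

def abstraction_score(sticks):
    """Count sticks whose width is medium or broad; sticks are (width, color) pairs."""
    return sum(1 for width, _color in sticks if width in ABSTRACT_WIDTHS)

# Hypothetical representations of one broad red stripe flanked by white.
detailed = [("narrow", "white")] + [("narrow", "red")] * 4 + [("narrow", "white")]
abstract = [("narrow", "white"), ("broad", "red"), ("narrow", "white")]
print(abstraction_score(detailed), abstraction_score(abstract))
```

A thread-by-thread arrangement built only from narrow sticks scores 0, while an arrangement that stands a single broad stick in for the whole stripe scores higher.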
Discussion Our results indicate that ecocultural patterns affect the cognitive representation of cultural artifacts. That is, those Zinacantec children whose families were more involved in commercial activity tended to show a more abstract style of
representing the woven patterns they were presented. Representational style is not a static quality of a given population. Instead, representational style can change and adapt in the face of changing ecocultural conditions.
General Conclusions The diachronic study of ecocultural change has reaped rich rewards: It reveals theoretical and empirical links between individual processes of cultural apprenticeship and societal processes of cultural change. Our qualitative and quantitative findings point to a process of reciprocal change in which societal conditions provide an ecological push toward new modes of cultural apprenticeship, as new modes of apprenticeship create a younger generation with the skill profile appropriate to the changed societal conditions. Our findings indicate that processes of cultural learning and cultural transmission change as cultures change over time. Based on our earlier analysis, we predicted that socioeconomic changes in the culture would be accompanied by a change in the cultural goals of socialization – that innovation would begin to replace conformity to tradition and, most important for this study, that informal education would reflect the changing value system by coming to rely more on trial-and-error methods, less on demonstration and help (Greenfield, Brazelton, & Childs, 1989). We have found that the Zinacantecs used scaffolded guidance in weaving apprenticeship when they were in a more stable, tradition-maintaining state. In contrast, they used more independent, trial-and-error learning when they moved to a more dynamic, innovation-oriented state. As predicted, the teaching style associated with innovation and independence was used more in those families who had made a greater shift from agriculture to commerce. Sociocultural forces on the macro level affected the process of cultural apprenticeship on the micro level. Changes in processes of cultural apprenticeship produced a new generation that was well adapted to the changed ecological niche. In other words, there was a tight relationship between a changing ecological niche and a changing developmental niche (Super & Harkness, 1986). 
One aspect of this adaptation was changes in the creation of cultural artifacts. With the shift from a more interdependent to independent style of weaving apprenticeship, girls had the independence to be more creative in their weaving, going outside the traditional frameworks of what a woven piece of clothing should look like and innovating with new designs and colors. At the same time, commerce itself was a socializing force that affected cognitive representation. As the economy moved from subsistence to money-based commerce in our window of two decades, styles of representing textiles became more abstract and less tied to the detailed way
in which the textiles were created. Our quantitative analysis showed that this change in representational style was mediated by participation in the commercial economy. Our diachronic study is a kind of longitudinal study on the family, rather than individual, level. This new methodology was able to demonstrate links between cultural change, variability in the production and representation of cultural artifacts, and the apprenticeship process by which people learn to produce those artifacts. When the Zinacantecs were in a more homogeneous, agriculture-based ecocultural pattern, socialization processes fostered a continuance of tradition and a more specific style of representation. As many families moved to a more commercial, money-based ecocultural pattern, socialization processes changed to stimulate independent learning, innovation, and abstraction.
Notes
1. r = .3152; p = .026, two-tailed test; n = 50. This correlation is based on all weaving learners who had teachers or helpers.
2. One-way analysis of variance, F(1, 58) = 5.0793, p = .0280.
3. One-way analysis of variance, F(1, 58) = 11.1965, p = .0014.
4. The link to Piaget and Vygotsky was suggested by R. Gelman (personal communication, 1991).
References
Bentler, P. M. (1980). Multivariate analysis with latent variables: Causal modeling. Annual Review of Psychology, 31, 419–456.
Bentler, P. M. (1995). EQS: Structural Equation Program Manual. Encino, CA: Multivariate Software.
Brazelton, T. B., Robey, J. S., & Collier, G. A. (1969). Infant development in the Zinacantecan Indians of southern Mexico. Pediatrics, 44, 274–383.
Cancian, F. (1990). The Zinacantan cargo waiting list as a reflection of social, political, and economic changes, 1952–1987. In L. Stephen & J. Dow (Eds.), Class, politics, and popular religion in Mexico and Central America (pp. 63–76). Washington, DC: American Anthropological Association.
Childs, C. P., & Greenfield, P. M. (1980). Informal modes of learning and teaching: The case of Zinacanteco weaving. In N. Warren (Ed.), Studies in cross-cultural psychology (Vol. 2, pp. 269–316). New York: Academic Press.
Greenfield, P. M. (1984). A theory of the teacher in the learning activities of everyday life. In B. Rogoff & J. Lave (Eds.), Everyday cognition (pp. 117–138). Cambridge, MA: Harvard University Press.
Greenfield, P. M., Brazelton, T. B., & Childs, C. P. (1989). From birth to maturity in Zinacantan: Ontogenesis in cultural context. In V. Bricker & G. Gossen (Eds.), Ethnographic encounters in Southern Mesoamerica: Celebratory essays in honor of Evon Z. Vogt (pp. 177–216). Albany: Institute of Mesoamerican Studies, State University of New York.
Greenfield, P. M., & Childs, C. P. (1977). Weaving, color terms, and pattern representation: Cultural influences and cognitive development among the Zinacantecos of southern Mexico. Inter-American Journal of Psychology, 11, 23–48.
Greenfield, P. M., & Lave, J. (1982). Cognitive aspects of informal education. In D. Wagner & H. Stevenson (Eds.), Cultural perspectives on child development (pp. 181–207). San Francisco: Freeman.
Greenfield, P. M., Maynard, A. E., & Childs, C. P. (1999). Historical change, cultural learning, and cognitive representation in Zinacantec Maya children. Manuscript submitted for publication.
Lave, J., & Wenger, E. (1990). Situated learning: Legitimate peripheral participation. New York: Cambridge University Press.
Piaget, J. (1977). Developments in pedagogy. Reprinted in H. E. Gruber & J. J. Vonèche (Eds.), The essential Piaget: An interpretive reference and guide (pp. 696–719). New York: Basic Books. (Original work published 1965)
Rogoff, B. (1990). Apprenticeship in thinking: Cognitive development in social context. New York: Oxford University Press.
Rogoff, B., Mistry, J. J., Göncü, A., & Mosier, C. (1993). Guided participation in cultural activities by toddlers and caregivers. Monographs of the Society for Research in Child Development, 58 (7, Series No. 236).
Saxe, G. (1990). Culture and cognitive development: Studies in mathematical understanding. Hillsdale, NJ: Lawrence Erlbaum.
Super, C., & Harkness, S. (1986). The developmental niche: A conceptualization at the interface of society and the individual. International Journal of Behavioural Development, 9, 545–570.
Vygotsky, L. S. (1962). Thought and language. Cambridge, MA: Harvard University Press.
Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Cambridge, MA: Harvard University Press.
Weisner, T. S. (1984). Ecocultural niches of middle childhood: A cross-cultural perspective. In W. A. Collins (Ed.), Development during middle childhood: The years from six to twelve (pp. 334–369). Washington, DC: National Academy Press.
34 Biology and Cognition Jean Piaget (Translated by Martin Faigel)
This article is a summary of the conclusions of a work in progress on “Biology and Cognition”; from this stems the rather general character of the following observations. In order to compare cognitive and biological mechanisms, we must first state that the former are an extension and utilization of organic auto-regulations, of which they are a form of end-product. To demonstrate this, one can begin by noting the close parallels between the major problems faced by biologists and those faced by theoreticians of the intelligence or of cognition. Secondly, one can analyze the functional analogies and especially the structural isomorphisms between organic life and the means of cognition: “nested” structures, structures of order, multiplicative correspondence, etc. One can also attempt a sort of comparative epistemology of the different levels of behavior (the “logic” of the instincts or of the learning processes, etc.). Finally, one can examine the explanations current among biologists to account for the formation of intelligence. But if these various analyses bring into relief the continuity between organic life and cognitive mechanisms, on the other hand it still remains to be seen that the latter constitute differentiated and specialized organs for reacting physiologically to the external world. Or in other words, that at the same time that they are an elaboration of organic structures in general, they fulfill particular functions, although still of a biological nature. The following pages are based on this premise, but it should be understood that it is not a question of contrasting cognition with organic behavior but rather of placing the functions of the former within the framework of the latter. Source: Diogenes, 14 (1966): 1–22.
Salkind_Chapter 34.indd 351
9/4/2010 10:32:19 AM
A. The Functions Specific to Cognition In studying the functional relationships and the partial structural isomorphisms between cognitive and organic functions, one notes the existence of a remarkable number of similarities but also a certain number of differences which show that cognition also has specific functions. Moreover the contrary would be unthinkable since if organisms were self-sufficient – without instincts, acquired ability, or intelligence – it would indicate a radical duality of kind between life and cognition, since cognitive mechanisms do in fact exist. This in turn would raise inextricable difficulties for an epistemology simply trying to explain how science is able to arrive at objective knowledge. I. Behavior, the extension of the environment and the closing of the “open system.” To begin with the basic facts of ethology, the majority of perceptions characteristic of animals are of a utilitarian and practical kind. Instinct is always at the service of the three fundamental needs of nutrition, self-defense, and reproduction. If with migrations or different types of social organization it seems to pursue derivative ends, they are derivative only in the sense that these interests, grafted onto the three principal ones, are still based on them and are ultimately subordinate to the survival of the species and to the possible survival of the individual. The elementary forms of perceptual or sensory-motor learning fall within a similar functional structure, and it is the same for a very large part of routine or sensory-motor intelligence. Nevertheless, in this latter area one must admit that with mammals and especially Anthropoids there is some development of activity which remains functional but involves comprehension for its own sake: we know that young mammals play and that this, despite K. 
Groos, is not just an exercise of the instincts, but a general exercise of the activities possible at a given level, without present utility or without being put into use. Now, play is but one pole of the functional processes operating in the course of individual development, the other pole being non-playful exercise, where the young subject “learns to learn” (Harlow) in a context of cognitive adaptation and not solely of play. One of our children, aged about one year, chanced to pass through the bars of his play-pen a toy which he wanted but which, being too long, had to be placed vertically in order to make the passage possible. He was not satisfied by his chance success, but he put it outside again and repeated his efforts until he “understood.” This beginning of disinterested knowledge is without doubt equally accessible to chimpanzees. But whether exclusively utilitarian or involved in this transition from “know-how” to “understanding,” animal cognition thus already quite clearly demonstrates a specific function, in comparison with survival, nutrition, or reproduction in their purely organic aspects: this is the function of extension of the environment. To search for food instead of drawing it from the earth or from the atmosphere like a plant, is already to enlarge the environment.
Piaget
Biology and Cognition
353
To search for the female and to care for offspring is to assure to reproduction more spatial-temporal control than that of the purely physiological function. And to explore for the sake of exploration (like the rats described by Blodgett), without immediate utility, to the point of learning for its own sake, as this already appears within the realm of sensory-motor intelligence, is to extend even further the part of the environment that is actually put to use. It is clear that during later development the mere existence of instruments for intelligent cognition, even if it pursued only utilitarian ends at the start, creates a new functional situation, since every organ tends to develop and maintain itself for its own sake: from this stem the fundamental cognitive needs of comprehension and invention; but they in turn lead to an ever-growing extension of the environment, this time as an object of consciousness. One can express biologically this slow extension, later to become more and more accelerated with man, of the accessible environment to needs at first biological and later more specifically cognitive, by relating it to the fundamental traits of the living system. An organism, according to Bertalanffy, is an “open system” precisely in the sense that it retains its form only through a continuous flow of exchanges with the environment. Now, an open system is a constantly threatened system, and it is not for nothing that the basic concerns of survival, food and reproduction lead to behavior which results in the extension of the usable environment. This extension must then be translated into terms which express its actual function: it is essentially an attempt to close the system and this precisely because it is too “open.” From the point of view of probability (and it is the only one suitable here) the particular risk to the open system is that its immediate environment or its frontiers will not supply the necessary elements for its survival. 
To close the system would instead be to circumscribe an area capable of ensuring survival. One sees at once that the closing of the system is a goal constantly pursued but never achieved. It is not that the initial needs of food, protection, or reproduction are infinite, far from it. Rather it is that, as soon as various actions serving to satisfy these needs are developed, thanks to a slight enlargement of the initial environment, the cognitive controls of these actions lead sooner or later to an unlimited extension of the system, and this for two reasons. The first is related to the probability of encounter with desired elements (food and sex) or feared ones (protection). So long as a living creature does not have differentiated sensory organs, exterior events affect it only through immediate contacts and cease to exist as soon as the immediacy disappears. There exist then only momentary needs which disappear as soon as they are satisfied and reappear later, according to a periodic cycle of varying length. However, as soon as a cognitive control develops and olfactory or visual organs indicate food or danger some distance away, the needs are modified by this extension itself: even if the appetite is momentarily satisfied, the absence of visible nourishment or its odor becomes a disturbing modification of the
possibilities of recurrence and creates a new need in the form of the need to search, although there may be no immediate desire to be satisfied. Similarly, awareness of enemies, even a safe distance away, engenders a new need for vigilance and watchfulness. In other words, the appearance of a cognitive control leads to its alteration as a consequence of function, and this change involves an enlargement of the environment without the possibility, on this elementary level, of ever closing the “open system.” Moreover we should note that a similar general extension of the environment begins already on an organic scale previous to sensory controls. This is the dissemination of seed in the sexual reproduction of plants, a good example of spontaneous extension without cognitive control. What would happen if a cognitive control permitted the plant to be informed by feedback of the relative lack of success of this manner of propagation? II. Behavior and cognitive controls. The second reason for the enlarging of the environment which aims at closing the “open system” but which constantly pushes back the limits of this closure is progress in the internal mechanisms of cognitive regulation. Here we reach an essential point about the nature of the cognitive process and the way it develops. Let us take an ordinary physiological cycle (A × A′) → (B × B′) → … (Z × Z′) → (A × A′) → …, where A, B … Z are the elements of the organism and A′, B′ … Z′ are the elements of the milieu with which they are in basic interaction. One can then schematize the intervention of a developing cognitive mechanism as a control which reacts to the presence of some external element or other, A′, informs the relevant organs, A, and thus participates in the process A→B, facilitating its development. From the beginning therefore, cognitive response has a role of control and leads to compromise, intensification, change, compensation or other regulation of the physiological process.
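The schema just described can be set down compactly. What follows is only a notational sketch: the symbol C for the cognitive control is an assumption introduced here for illustration and does not appear in the original text, which names only the elements A, B … Z of the organism and A′, B′ … Z′ of the milieu.

```latex
% Physiological cycle: each organic element interacts with its
% counterpart in the milieu, and the sequence closes on itself.
\[
(A \times A') \rightarrow (B \times B') \rightarrow \cdots
\rightarrow (Z \times Z') \rightarrow (A \times A') \rightarrow \cdots
\]
% Cognitive control C (hypothetical symbol): it registers the external
% element A', informs the organ A, and so facilitates the step A -> B.
\[
A' \longrightarrow C \longrightarrow A ,
\qquad C \ \text{facilitates} \ A \rightarrow B .
\]
```

Read this way, the control stands outside the cycle proper and acts on one of its transitions, which is what allows it to become in turn the object of further controls.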
But it goes without saying that this elementary response, which can take the form of tropisms or of only slightly differentiated reflexes, precisely because it is a regulating mechanism involves the possibility of, and even requires, indefinite development, for it is in the nature of a regulating agency to be able to correct itself through the control of controls. In the case of our elementary scheme the chain or feedback leading from A′ to A, which comprises a signal from A′, or afference, and an effect on A, or effection, results in two kinds of possible improvements or controls of behavior to the second power, while internal or physiological regulation affects the process A→B: (1) there can be refinements in the recording of A′ in the form of various conditionings which assimilate new signals or cues within the initial set of perceptive schemata and thus constantly enrich the perceptive keyboard with controls differentiating the initial total stimulus; (2) above all there will be refinements in the reactive systems affecting A, and it is here that new controls show their possibilities in an uninterrupted sequence, of which sensory-motor development in man’s growth pattern gives a particularly striking example: on the basic reflex schemata such as suction,
grasping, or oculomotor reflexes a succession of more and more complex behaviors is built, whose two general principles are the accommodation of assimilatory schemata leading to their differentiation, and above all the reciprocal assimilation of schemata (vision and touch, etc.), leading to their coordination. Now from the point of view which concerns us here, the double basic significance of this development, which produces sensory-motor intelligence, is (a) that the progress we have observed is due to a control of controls which results in the exercise of cognitive functions for their own sake, independently of utilitarian or strictly biological basic needs (nutrition, etc.), and (b) that consequently this pushes further and further back the “closure” of a system open to the environment. That this progress is due to a control of controls is evident, to begin with, in the differentiation by accommodation of the assimilatory systems. For on one hand this accommodation is carried out by trial and error, and this is typical of feedback systems where the action is corrected according to its results. But on the other hand, this trial and error control does not develop from nothing, but from within a previous framework of reflexes or acquired assimilatory schemata, and these initial schemata are the basic controls whose differentiation is elicited by a superimposed regulation. The coordination of schemata by reciprocal assimilation also involves the control of previous regulations by new ones, and these secondary regulations are especially important since they are related to actions. For the coordination of schemata is a process which simultaneously moves forwards and backwards, since it arrives at a new synthesis which modifies in its turn the schemata thus coordinated.
The internal progression of the mechanism of cognitive control then implies its exercise, that is to say, the formation of a series of new interests no longer subject to the initial interests which are activated by the functioning per se of the system. These interests are the functional expression of the mechanism of cognitive assimilation itself but, again we see, as a direct extension of the initial sensory process. The resulting enlarged environment is therefore both the environment, in the biological sense of all the stimuli which affect the organism in its physiological cycle, and the cognitive milieu, considered as all the objects of interest to the consciousness. But this new extension of the environment is unable to close the “open system” since it remains subject to probabilities of occurrence or, in other words, to the chance experiences of the subject. It is only with imagination or thought, which multiplies at an accelerating rate the spatial-temporal distances characterizing the field of action and comprehension of the subject, that the closing of the open system becomes a possibility. But this requires inter-individual or social exchanges as well as individual exchanges with the environment, and we shall return to this problem later. III. Organic equilibrium and cognitive equilibrium. If the first essential function of cognitive mechanisms is thus the progressive closing of the “open system”
of the organism thanks to an indefinite extension of the environment (and this function is indeed an essential part of the process even if, or above all if, it never reaches complete stability), this function leads to a series of others. The second one to remember is of equally fundamental importance, for it relates to the system’s mechanisms of equilibrium. Living systems are essentially self-regulating. If what we have discussed is correct, the development of cognitive functions is clearly, in accordance with our hypothesis, the creation of specialized organs of control for the regulation of exchanges with the exterior, at first of a physiological type, directed at materials and forces, and later purely functional, that is to say, bearing essentially on the functioning of actions and of behavior. But once differentiated organs come into being, are their controls identical to those of the organism? Or in other words, are the forms of equilibrium the same? The body of known facts leads to the reply, yes and no. They are the same regulations or the same forms of equilibrium in the sense that cognitive organization is an extension of living organization and therefore introduces an equilibrium in the sectors where the organic equilibrium is inadequate – in its particular sphere (as we have seen) and in its accomplishments. But the controls and the cognitive equilibrium differ from the organic equilibrium precisely in that they succeed where the latter is incomplete. The evolution of organized life appears as an uninterrupted sequence of assimilations of the environment to more and more complex forms, but the very diversity of these forms shows that none of them has been able to put this assimilation in a state of lasting equilibrium. If each group or species is in equilibrium, their succession demonstrates a perpetual beginning anew. 
It is therefore first of all in the relationship between assimilation and adjustment that the cognitive functions introduce something new. To begin with the development of knowledge, it seems at first sight that we are in the presence of a completely comparable phenomenon. Not to mention the diversity of instincts or of elementary learning processes, the evolution of the human sciences does not always give us a picture of coherent development easily able to introduce new adjustments required by experience into a permanent assimilatory framework by enlarging or simply differentiating it. But there is an exception, and this is the major one of logico-mathematical structures, important enough by itself but notably increased in significance by the fact that these structures provide the principal assimilatory schemes used by the experimental sciences. In effect, logico-mathematical structures present the unique example of a continuously evolving development, such that no new structurization has had to eliminate its predecessors. Of course these can be poorly adapted to an unforeseen situation but only in the sense that they are unable to resolve a new problem and not that they are contradicted by the very terms of this problem, as it can happen in physics. Thus, in the relationship between assimilation and accommodation, logico-mathematical structures involve a sui generis type of equilibrium.
On one hand they can be viewed as the continuous construction of new schemes of assimilation – the assimilation of previous structures in a new, integrated one, and the assimilation of experimental data in the structures thus created. But on the other hand, they show a permanent accommodation in the sense that they are not modified by the newly created structures (except to be amplified) or by the experimental data which the latter are capable of assimilating. Certainly, new data on physical experience can pose unexpected problems for mathematicians and lead to the creation of theories which can absorb them; but the creation in this case is not drawn from an accommodation in the manner of the concepts of physics. On the contrary it is derived from previous structures or schemata at the same time that it is adapted to the new reality. One can then propose an interpretation which might appear to be rash but which seems to have a true biological foundation if one agrees, as everything seems to suggest, that the primary source for the coordination of actions, out of which come mathematics, can be found in the general laws of the system: it is that the equilibrium between assimilation and accommodation reached by logico-mathematical structures constitutes the simultaneously flexible, or dynamic, and stable state vainly sought after by the succession of forms, at least in the realm of behavior, during the evolution of organized life. While this evolution is marked by a continuous series of disequilibriums and equilibriums, logico-mathematical structures achieve a permanent equilibrium despite the new additions which characterize their evolution. This brings us to the problem of “vection” or of “progress” raised by many present-day biologists. Vection, which seems to be proved by organic evolution, is characterized by the remarkable union of two apparently antithetical qualities, whose cooperation is necessary for the major accomplishments of adaptation. 
One has been especially stressed by Schmalhausen: this is an increasing integration which makes the processes of development more and more autonomous with regard to the environment. The other, stressed by Rensch and by J. Huxley, is the increasing “widening” of possibilities for influencing the environment, and by consequence penetration into environments which become more and more extended. It goes without saying that these two aspects in combination can be found in the development of the sciences. It is to the extent that human intelligence has found in logico-mathematical structures an instrument of integration increasingly independent of experience that it has made a greater conquest of the experienced environment. But once again, because of the very nature of their equilibrium, the cognitive structures develop from the organic ones through extension. They have a similar nature but, as we have seen, in the case of cognition it has developed into forms which are inaccessible to the organic equilibrium. With regard to vection, the difference appears in the following way. The process of integration pointed out by Schmalhausen involves only a certain type of integration, which can be described as current or synchronous,
and it therefore has to reconstitute itself in every new group without being able to integrate the entire phyletic past as a sub-system both retained and developed (to put it concretely, mammals have lost some of the characteristics of reptiles by becoming mammals, etc.). The unique character of the integration characteristic of cognitive evolution is, on the contrary, as we have seen, that it is more than temporary and integrates previous structures as subsystems of the current integration. This integration, surprisingly both diachronic and synchronous, occurs without conflict in mathematics (whose “crises” are only those of growth with but momentary contradictions); however, in the experimental sciences a new theory can contradict previous ones. It remains notable though that a new theory always aims at a maximum of integration of the past, so that the best theory is the one which integrates previous results, adding necessary retroactive corrections. IV. The dissociation and conservation of forms. But this achievement is due to another specific character of the cognitive functions in contrast with organic life: this is the possible dissociation of form and content. An organic form is inseparable from the matter which it organizes, and in any particular case it is suited only to a limited and well-defined group of substances, whose modification necessitates a change in form. Once again we find a similar situation (given the continuity between the living system and the cognitive one) in elementary forms of consciousness such as sensory-motor or perceptual schemes, although they are already more generalized than the innumerable forms of biological organization. 
But with the development of intelligence, operative systems become still more generalized, although at the level of concrete operations (classes and relations) they may still be related to their contents, just as structurization is to the structured matter when it can proceed only step by step without sufficient deductive mobility. Finally, with the hypothetico-deductive activity which propositional combination permits, it becomes possible to elaborate a formal logic, in the sense of an organizing structure applicable to any kind of content whatever. This is what makes it possible to create “pure” mathematics, viewed as an assemblage of organized forms prepared to organize anything, but ceasing temporarily to act insofar as it is dissociated from application. Once again we find a biological situation impossible on an organic level, where micro-organisms are capable of “transduction” of genetic messages from one species to another, but only as content or matter, and where genetic “transduction” of an organization understood as a form dissociated from all substance has not yet been observed! But on the cognitive level, this refining of form leads to accomplishments constantly sought after, one might say, in the organic domain but never fully achieved. It is possible to establish certain analogies between the conservation of biological forms (so evident in the regulatory self-conservation of the chromosome) and the exigencies of conservation characteristic of different forms of intelligence, from sensory-motor intelligence (a system for the
permanency of objects) to operative conservation. In this respect it might seem that an artificial comparison is being made between quasi-physical systems on the one hand and normative or ideating ones on the other. But once one is aware of the basic nature of regulation characteristic of elementary cognitive functions and the sequence from regulation to action, the comparison becomes more natural, for organic conservation is in fact the outcome of regulatory mechanisms. But the analogies thus touched upon nonetheless run into an important difference, and this is what concerns us here: organic conservation is never more than approached. Moreover, this is also true for preliminary cognitive forms (perceptual constants), while only the operative conservations of intelligence are rigorous and “necessary,” on account of the dissociation of form and content. Conservation is closely related to operative reversibility, which is its source and which, in addition, demonstrates the particular form of equilibrium reached by logico-mathematical structures. We must then be at the very heart of the difference which, deep within their similarity, distinguishes the constructive work of intelligence from that of organic transformations. The basic analogy is that both have to struggle incessantly against the irreversibility of events and the deterioration of energy and information. And both systems deal with the problem by elaborating organized and balanced systems whose principle is to compensate for deviation and error. Thus, beginning with controls of a homeostatic1 nature – genetically as well as physiologically – there is a fundamental tendency towards reversibility of which the attempted conservation of the system is the result. 
Whatever may be the eventual explanations, still to be worked out, used to resolve the problem of the anti-chance function necessary to the organization and evolution of life (exceptions to Carnot’s principle or various forms of conciliation) there remains however that an auto-regulatory system involves actions oriented in two opposed directions and that it is this partial reversibility whose progress we can follow in the development of cognitive controls. But as we have pointed out above, the result of the general interplay of reflective abstractions and of reconstructions converging with this evolution, is that the evolution which marks the progress of each level with respect to the preceding one is based more on the regulation of regulations, and so on a reflexive refining of the system or on superimposed controls, rather than on a simple horizontal extension. This is why the mechanism of the “operations” of thought represents more than an extension of previous controls and constitutes a sort of limiting process towards the point where strict reversibility establishes itself as soon as the retroactive action of feedback becomes an “inverse operation,” thus ensuring the exact functional equivalence of the two possible directions of the construction. V. Social life and the general coordination of action. But the most remarkable aspect of human knowledge in its mode of formation, as compared with the evolutionary transformations of organisms and the forms of knowledge
achieved by animals, is its collective as well as individual nature. One can of course observe the outlines of a similar characteristic in a number of animal species, especially the chimpanzee. Nevertheless, the novelty with man is that external or educative transmission (as opposed to the hereditary or internal transmission of the instincts) has led to an organization capable of fathering civilizations. We should first note that, if it is necessary to distinguish between two types of development, one organic (characteristic of a single organism) and the other genealogical (comprising lines of descent, whether social or genetic), the history of human science combines these two developments in a single whole: ideas, theories, and schools of thought develop genealogically, and one can construct for them genealogical trees representing the relationship of structures. But they are so well integrated into a single intellectual organism that the succession of thinkers is comparable, to quote Pascal, to a single man endlessly learning. Now, human societies have been described, in turn, as the result of individual initiative propagating itself by imitation, as totalities acting from the outside on individuals, or as systems of complex interactions producing individual action, which is always in conjunction with a more or less important part of the group, as well as producing the entire group defined as the system of these interactions. In the area of cognition, it seems evident that the individual operations of intelligence and the operations that ensure the actual exchange in cognitive cooperation are one and the same thing, since the “general coordination of the actions,” which is the source of logic, is an inter-individual as well as intra-individual coordination, inasmuch as these “actions” are collective as well as individual. It is therefore a meaningless question to ask if logic or mathematics are essentially individual or social. 
The epistemic subject which creates them is both an individual, placed off-center with respect to his specific “me,” and the sector of the social group, off-centered with respect to the constraining idols of the tribe; and these two types of displacement show the same intellectual interactions or general coordination of action which is constitutive of cognition. The result is thus (and this is the final basic difference which we shall point out between biological and cognitive organization) that the most general forms of thought, since they are capable of being dissociated from their content, are because of this the medium for cognitive exchange or inter-individual regulation, at the same time that they arise out of common functions characteristic of all living systems. Certainly, from a psychogenetic point of view, these inter-individual or social (and not hereditary) regulations form a new element with respect to individual thought, which if deprived of them is exposed to all kinds of egocentric deformation, and they are a necessary condition for the constitution of a decentralized, epistemic subject. But from a logical point of view, these higher controls are still dependent on the conditions of all general coordination of action and so have the same biological origins.
B. Organic Regulation and Cognitive Regulation
This collective re-elaboration of forms already built out of elements pertaining to biological organization also helps to locate the remaining observations within their true framework. Our hypothesis is thus that cognitive functions are a specialized organ for the regulation of exchanges with the external world, although they derive their instruments from biological organization in its general forms. I. Life and truth. It might seem that the necessary existence of a differentiated organ is self-evident, since the specific character of knowledge is to attain truth, while it is specific of life only to seek its persistence. But if we do not know exactly what life consists of, we know even less about cognitive “truth.” There is general agreement that it is something other than a faithful copy of reality, for the good reason that such a copy is impossible, since only the copy could provide the knowledge of the model to be copied and since this knowledge, on the other hand, is necessary for the copy! To attempt it leads to a simple phenomenism, where subjectivity constantly interferes with the perceived datum, which itself demonstrates an inextricable connection between subject and object. If truth is not a copy, it is then an organization of reality. But organized by what subject? If we take the human subject, the risk in this case is expanding egocentrism into anthropocentrism – which will also be sociocentrism – and the gain is minimal. Consequently philosophers concerned with the absolute have had recourse to a transcendental subject which goes beyond man and especially “nature” so as to place truth outside spatial-temporal and physical contingencies and to make nature intelligible in a non-temporal or eternal perspective. But the question then is whether it is possible to leap over one’s shadow and to reach the “Subject” in se, without his remaining, in spite of all, “human, too human,” to quote Nietzsche.
For the trouble is that from Plato to Husserl the transcendental subject has constantly changed shape, with no improvements other than those due to the progress of the sciences themselves, hence of the real model and not the transcendental one. Our intention then is not to run away from nature, since no one escapes nature, but to investigate it step in step with the effort of science because, whatever the philosophers may think, it has still not given up all its secrets and because, before putting the absolute in the clouds, it may be useful to look at the inside of things. Consequently, if truth is an organization of reality, the first question is to understand how one organizes an organization, and this is a question for biology. In other words, since the epistemological problem is to know how science is possible, we should exhaust the possibilities of immanent organization before having recourse to the transcendental. But if truth is not egocentric and should no longer be anthropocentric, is it then necessary to reduce it to a biocentric organization? If truth is more than man, is it necessary to look for it in protozoa, termites, or chimpanzees?
If one defined it as a vision of the world shared in common by all living creatures, including man, the result would be a meager one. But the character of life is to surpass itself constantly, and if one seeks the secret of rational organization in the living system, including its own mechanisms of progress, the method then consists of trying to understand knowledge by its very construction, which is not the least bit absurd, since it is essentially construction. II. The deficiencies of the organism. From a cognitive point of view, these progressive evolutions, which are just as essential as the initial state, seem inherent to the living system itself. Its organization is that of the system of all exchanges with the environment; it tends then to spread out into the entire environment but it never completely succeeds. This is where cognition comes in to assimilate functionally the whole universe without being limited to material physiological assimilation. The living system creates forms and it tends to conserve them in as much stability as possible, but without success. And this again is where cognition comes in to extend material forms into forms of action or of operation which are then capable of conservation under their applications to the various contents from which they are dissociated. This living system is a source of homeostasis at every step; its regulations ensure equilibrium by the evolution of quasi-reversible mechanisms.
However, this equilibrium remains fragile and resists the surrounding irreversibility during but transient stages, so that evolution appears to be a series of disequilibriums and of returns to equilibrium, partially giving way to a mode of structuring that comprises the integrations and reversible mobility which cognitive mechanisms only are able to accomplish completely by integrating control into the construction itself in the form of “operations.” In short, the need for differentiated organs to regulate exchanges with the external world results from the inability of the living system to carry out its own program, implicit in the very laws of its organization. For on one hand, it involves genetic mechanisms which are formative and not merely transmittive; but their method of formation (as it is now understood) founded on the recombination of genes, ensures only a limited construction, bounded by the needs of hereditary programming which is necessarily restricted, as it is unable to conciliate construction and conservation within a single coherent dynamic (as cognition will do), and as it lacks sufficiently flexible information on the environment. On the other hand, phenotypes,2 that achieve a certain amount of interaction with the environment, fall within a norm of reactions in itself bounded; but above all their individual achievements remain both limited and without influence on the whole (for want of the social or external interactions which are made possible for man by cognitive exchanges) except through genetic recombinations, with their afore-mentioned limitations. This double deficiency of organisms in their material exchanges with the environment is partly compensated by the constitution of structured behaviors, created by the system as an extension of its internal program. For behavior is nothing more than the very organization of life, but applied or generalized to
a larger sector of material and energy exchanges than those which are already ensured by the physiological organization. And functional implies that the emphasis is on the actions and forms or schemes of action that extend organic forms. Nonetheless, these new exchanges, like all the others, consist in adaptations to the environment, that take into account its events and their sequence; but above all they consist of assimilations which use the environment and often even impose shapes upon it through constructions or arrangements of objects satisfying the needs of the organism. Like all organization, this behavior involves regulations, whose function is to control constructive adaptations and assimilations by acting on information on the results received in the course of action or by the elaboration of anticipations which allow the forecasting of favorable events or of obstacles and the preparation of the necessary compensations. These regulations, which are differentiated with regard to the internal control of the organism (since we are concerned here with behavior) constitute the cognitive functions. And the problem then is to understand how they widen the scope of organic regulation to the point where they can carry out the internal program of the system without being subject to the deficiencies we mentioned. III. Instinct, learning, and logico-mathematical structures. The basic facts here are in the first place, that cognitive controls begin by using only the instruments of organic adaptation in general, that is to say, heredity with its limited variations and phenotypic accommodation: from these stem the hereditary modes of cognition such as those that appear in instinctive behavior. But subsequently the deficiencies of the initial system that are corrected only slightly on the new behavioral level turn up at the level of this innate cognition.
This is what causes, but only during the later stages of evolution, the final break-up of instinct and the separation of its two components, internal organization and phenotypic adaptation. What results then (and this is not immediately upon dissociation, but as an effect of complementary reconstructions in two opposite directions), is the double emergence of logico-mathematical structures and of experimental science, still undifferentiated in the practical intelligence of Anthropoids, who are geometers3 as much as they are technicians, and in the technical intelligence of the beginnings of humanity. The three fundamental types of knowledge are innate skill, whose prototype is instinct, knowledge of the physical world, which extends the learning process as a function of the environment, and logico-mathematical knowledge; and the connection between the first and the latter two seems essential to an understanding of the way in which higher forms are indeed an organ for controlling interchanges. We shall return to this point in conclusion. Instinct indeed already includes some cognitive controls as may be observed, for example, in the feedback system formed by Grassé’s “stigmergies.”4 But these controls remain limited and rigid, precisely because they develop within a framework of hereditary programming, and programmed controls are not capable of invention. Certainly it happens that animals are able to deal with
unforeseen situations through readjustments which foretoken intelligence. The coordination of schemata that occurs on this occasion can be compared with the innate coordinations of the instinctual, trans-individual cycle, which gives an important indication of the possible functional relationship between instinct and intelligence, despite the difference of epigenotypic5 and phenotypic levels which characterizes them. But the phenotypic developments of instinct remain very limited and its deficiency thus remains tied to its nature, which demonstrates that a form of cognition that remains linked to the simple mechanisms of organic adaptation, despite some traces of cognitive regulation, scarcely approaches the achievements of intelligence. Though the area of learning stricto sensu, that which lies beyond the innate, begins with protozoa, it grows only very slowly until the cerebralization of the higher vertebrates, and however remarkable the exceptions that begin to appear with insects, it shows no systematic development until the primates. IV. The break-up of instinct. The fundamental phenomenon of this scission, or in other words, the almost total disappearance in the Anthropoids and man, of a cognitive organization which remained dominant throughout the entire evolution of animal behavior, is thus highly significant. This is not, as it is generally said, because a new mode of cognition, that is to say, intelligence considered en bloc, replaces a superseded one. More deeply, it is because a still quasi-organic form of cognition develops into new forms of control which take the place of the preceding form but do not replace it. Properly speaking, they inherit it, dissociating it and using its components in two complementary directions. What disappears with the dismemberment of instinct is hereditary programming, and this benefits two new types of cognitive self-regulation, that are both flexible and constructive. 
One might then say that this is in fact a replacement, and indeed a complete one. But one then forgets two essential factors. Instinct does not consist exclusively of hereditary mechanisms – such a concept is an extreme one, as Viaud has properly pointed out. On the one hand, instinct derives its programs and above all its “logic” from an organized activity which originates in the most general forms of the living system. On the other hand, it extends this programming by individual or phenotypic actions that contain an important element of adaptation and even of assimilation, in part learned and in certain cases almost intelligent. Now, what vanishes with the disappearance of instinct is only the central or median part, that is to say, programmed control, while the other two components – the origins of organization and the results of individual or phenotypic adjustment – remain. Intelligence therefore inherits instinct while it rejects the methods of programmed regulation in favor of constructive self-regulation. What it retains allows it to follow the two complementary directions of interiorization, towards sources, and of exteriorization, towards learned or experienced adjustments.
The condition for this double evolution is naturally the construction of a new mode of control, and this must be remembered to begin with. These controls, which are no longer programmed but from now on are flexible, begin with the usual corrective activity, carried out as a function of the results of actions and of anticipations. But as participants in the construction of schemes of assimilation and in their coordination, under the combined influence of progressive and retroactive effects they end up moving in the direction taken by operations themselves, in as much as these are viewed as controls for precorrection and not just correction, and as the inverse operation is viewed as an action ensuring complete and not simply approximate reversibility. It is thanks to this new kind of control, that constitutes a differentiated organ for deductive verification as well as for construction, that intelligence can evolve simultaneously in the two directions of reflexive interiorization and experimental exteriorization we have just discussed. It is clear that this double orientation does not involve, and in fact has nothing in common with a sharing of the spoils of instinctual cognition. On the contrary, what remains of instinct is only its sources of organization and its end-products such as exploration and individual research. For intelligence to use the former and extend the latter, it must therefore turn to new constructions, of which some release the pre-conditions for general coordination of action through the use of reflective abstraction, and others absorb the experimental data into the operatory systems thus constructed. But it remains no less true that these two directions carry on the functions of two of the previous components of instinct. 
After the break-up of instinct, a new cognitive evolution begins and in fact it starts from scratch since the innate mechanisms of instinct have disappeared and, no matter how hereditary the cerebral nervous system and intelligence, seen as an ability to learn and invent, may be, the work to be done henceforth is phenotypic. Moreover, it is because this intellectual evolution starts from scratch that one generally finds it so difficult to relate it to the living system or above all to the structures, remarkable in their own right, of instinct. This is a good example of what one might call “convergent evolving reconstructions.” In the case of human intelligence, this reconstruction is in fact so complete that hardly any theoreticians of logico-mathematical knowledge have thought to explain it in the clearly necessary framework of biological organization. This was true at least before mechanophysiology showed the connection between logic, cybernetic models and the neurophysiological activity of the brain, or before McCulloch described the logic of neurons. V. Knowledge and society. But if such complete reconstruction is possible, it is because intelligence, by discarding the prop provided by hereditary structures and moving towards constructed and phenotypic controls, turns away from the trans-individual cycles of instinct only in order to engage in interindividual and social interaction. There does not seem to be any discontinuity here, since we already find group action in chimpanzees.
One might say in this connection that from a cognitive point of view the social group plays the same role that “population” does from the point of view of genetics and therefore from that of instinct. In this sense society is the supreme unit, and the individual succeeds in inventing or in creating intellectual structures only to the degree that he is the seat of collective interactions whose level and value naturally depend on that of the society in general. The great man who seems to initiate new trends is only a point of intersection or of synthesis, of ideas elaborated by continuous co-operation, and even when he dissents from majority opinion he is responding to underlying needs of which he is not the source. This is why the social environment actually does for intelligence what genetic recombinations in the entire population did for evolutionary variation or the trans-individual cycle of the instincts. But society, however external and educative its methods of transmission and interaction may be in comparison with those of hereditary transmission or combination, is no less than the latter a product of life. And “collective representations,” as Durkheim called them, still presuppose the existence of a nervous system in the members of the group. This is why the important question is not to weigh the merits of the individual versus the group (like asking which came first, the chicken or the egg): it is to distinguish between logic, whether in the course of solitary reflection or co-operation, and errors or insanities in collective opinion or in the individual consciousness. For, despite Tarde, there are not two logics, one serving the group and the other, the individual. 
There is only one way of coordinating actions A and B in a nested relationship or in one of order, etc., regardless of whether these are the actions of various individuals, one or some for A and another or others for B, or the actions of the same person (who did not after all invent them alone, since he is a member of the whole society). It is in this sense that cognitive controls or operations are the same whether in a single brain or in a system of co-operations (which is the meaning in French of the word coopération). * In sum, and however banal the thesis might seem, it is worth stressing that cognitive functions are extensions of organic controls and that they constitute a differentiated organ for regulating exchanges with the external world, for this hypothesis implies far more than these few pages can suggest.
Notes

1. According to Cannon, homeostasis means the regulatory mechanism which maintains equilibrium as a physiological system, plus, as we have since discovered, the organic function which ensures hereditary transmission (genetic homeostasis).
2. By phenotypes we mean the form which individual organisms take with relation to the milieu, as opposed to the “genotype” or hereditary form.
3. See the interesting experiments of I. Meyerson and P. Guillaume.
4. Grassé calls “stigmergies” certain hereditary behavioral regulations of termites. They form small pellets of matter in building their homes, and when these reach a specific volume, the pellets are then used in building as supports, floors, etc., in accordance with a new set of laws, but without a particular order of succession.
5. The epigenotype is a structure (using the definition suggested by the work of Waddington) which includes both genotypic and epigenotypic structures, that is, related to an embryonic development interacting with the environment.
35 Neural Bases of Intelligence and Training Mark R. Rosenzweig
Although research on biological bases of intelligence has not yet had much influence on special education, it may well have major impact over the next few decades. Opinions on the relevance and probable importance of biological research for special education vary, however, among members of different professional fields. Positive predictions come from many of those who are active in investigating the biological bases of intelligence and learning or who are acquainted with this research. But such optimistic forecasts may not be widely shared by professionals in special education, as we will discuss later. To provide readers with further bases for estimating possible developments in this field, this article will focus on research on neural bases of intelligence and training, stressing the advances that are being made but also acknowledging problems and limitations. Before entering on the review proper, let us note some dramatic predictions that the incidence of mental retardation will decrease markedly by the end of the century. Advances of science and technology, coupled with a low birthrate and improved medical services, could halve the incidence of biologically caused mental retardation in the United States by the year 2000, according to the U.S. President’s Committee on Mental Retardation (USPCMR, 1976b). Further, retardation caused by sociocultural-socioenvironmental factors can reasonably be expected to drop by one-third in the next 10 to 20 years, according to the same source (USPCMR, 1976b). The latter prediction is based on increasing educational opportunities and evolution of life styles among those segments of the population now in the poverty sector. Both alleviation
Source: The Journal of Special Education, 15(2) (1981): 105–123.
and prevention of retardation can be aided by research on biological bases of intelligence. A number of biological causes of retardation have already been discovered. Growing knowledge of the neural mechanisms of learning and memory should permit a more systematic search for causal factors, whose isolation is an important step in finding treatments and methods of prevention for both biologically and socioenvironmentally induced retardation. In view of these prospects, those in the field of special education may wish to consider the progress of research on biological bases of learning and intelligence, both for its implications for their own work and for that of their students. The present article will review briefly some of the main lines of research.1 The main sections of this review are the following: (a) research on neural mechanisms of intelligence, concentrating on mechanisms of learning and memory as behaviors that are basic to intelligence; (b) effects of training and differential experience on the brain and behavior; and (c) a discussion of further predictions, including comment on divergent views of neuroscientists and professionals in special education.
Neural Mechanisms of Intelligence

Intelligent behavior is considered by many investigators to depend both upon elaborate, orderly networks of neurons in the brain and upon the capacity for altering some aspects of these networks. Complex neural circuitry is required for such functions as processing both sensory information and internally generated signals, comparing percepts with memory stores, programming coordinated muscular responses, and monitoring ongoing bodily activity. Alteration of some aspects of the circuits is needed in order to store new information, to change existing patterns of response, and to elaborate new patterns. Such alterations may be achieved either by changing functional characteristics of existing synaptic junctions between neurons or by formation of new connections (or removal of old ones). Deficits in intelligence may thus arise from several causes. Genetic defects may prevent the formation of the requisite complex and orderly sets of neural structures (Huttenlocher, 1975; Purpura, 1975), or they may prevent the normal plasticity of synaptic junctions. Inadequate early nutrition or inadequate early secretion of certain hormones (e.g., thyroid hormone) may also impair the growth of normally complex neural circuits. Diseases, vascular accidents, or mechanical injury may destroy or impair the functions of parts of the nervous system needed for complex, adaptive behavior. Lack of adequate stimulation and experience during development may prevent both the full growth of important neural circuits and also the acquisition of knowledge needed for later intelligent behavior.2 It has been estimated that about half the cases of mental retardation have some type of neurological or genetic deficit (Myklebust & Boshes, 1965).
Let us now consider neural mechanisms of learning and memory, since these are basic for intelligence. Neural mechanisms of learning and memory have been topics of clinical investigation for a century, and they have been investigated with animal subjects since the beginning of the present century. The French psychologist T. Ribot published his influential Diseases of Memory in 1882. In 1902 the American psychologist S. I. Franz published an important article in which he combined two experimental techniques – the new technique of E. L. Thorndike for studying animal learning, and the technique of making experimental brain lesions. These procedures allowed Franz to study the effects of localized lesions on learning and memory. Karl Lashley later worked with Franz and then continued this line of research. By the 1890s some investigators focused attention on the junctions between neurons as the likely site of plastic changes (Tanzi, 1893). This kind of junction did not have a name yet, but a little later in that same decade, Sherrington, in his chapter in Foster’s Neurophysiology (Foster & Sherrington, 1897), gave it the name “synapse.” Sherrington also stated that the synapse was likely to be strategic for learning. He put it in this picturesque way:

Shut off from all opportunity of reproducing itself and adding to its number by mitosis or otherwise, the nerve cell directs its pent-up energy towards amplifying its connections with its fellows, in response to the events which stir it up. Hence, it is capable of an education unknown to other tissues. (p. 1117)
But investigators of the 1890s did not possess the techniques needed for detailed work at the level of the synapse. That had to await electron microscopy and intracellular recording of neural potentials; we will now examine the dramatic findings made in recent years by the use of these and other sophisticated techniques.
Neurochemistry of Learning and Memory

By the 1960s several groups of biochemists had undertaken research on learning and memory. They were interested in the nucleic acids that hold the genetic instructions of cells and that direct the manufacture of enzymes and other proteins; the chemical agents that transmit signals from one neuron to the next across the synaptic gap were another focus of interest. Since biochemical contents and processes are closely similar in the brains of all mammals, the biochemists have worked chiefly with laboratory rats and mice. Often in collaboration with psychologists, these investigators employed two main strategies: Either they trained animals and looked for small chemical changes in the brain, or they attempted to see how various pharmacological agents would either improve or impair the formation of long-term memories in animals.
Results of extensive research indicate strongly that synthesis of protein in the brain is required soon after training if long-term memories are to be formed (Dunn, 1980; Flood & Jarvik, 1976). If inhibitors of protein synthesis are given to animal subjects shortly before training in dosages that are effective for a few hours posttraining, learning proceeds normally and memory is present for a short period, but tests 24 hours or more later reveal that there is no long-term memory. Formation of memory can also be modulated by administering certain synaptic transmitters, excitant or depressant drugs, and certain hormones or hormone fractions (see Dunn, 1980). For example, mild doses of excitant drugs aid memory formation, whereas mild doses of depressants impair memory formation; neither of these effects works through alteration of protein synthesis (Flood, Jarvik, Bennett, Orme, & Rosenzweig, 1978). Acetylcholine is the transmitter agent at many central synapses as well as at peripheral synapses, and agonists or antagonists of cholinergic function have been reported, respectively, to improve or impair learning in studies with laboratory animals (e.g., Deutsch, 1971; Stratton & Petrinovich, 1963). Some of the research on neurochemistry of learning and memory done with laboratory rodents has proven applicable to human subjects. For example, serial verbal learning in normal human subjects was reported to be enhanced by arecoline, a cholinergic agonist, and by choline, a precursor of acetylcholine, but to be impaired by scopolamine, a cholinergic antagonist. Those subjects who showed poor scores under control conditions were also those who were more affected by both the enhancing and impairing drugs. In other words, the drugs may be useful in bringing individuals towards an optimal level of cholinergic activity, and may not be able to improve those who are already at that level (Sitaram, Weingartner, & Gillin, 1978).
Both storage and retrieval of verbal material were enhanced in normal human subjects by physostigmine, which inhibits the enzyme acetylcholinesterase and thus prolongs activity of acetylcholine (Davis, Mohs, Tinklenberg, Pfefferbaum, Hollister, & Kopell, 1978). Both Sitaram et al. and K. L. Davis et al. noted that in Alzheimer’s disease and other presenile dementias, the cortex shows a decrease in the enzyme that synthesizes acetylcholine, and both groups of investigators suggested that research should be done to see whether cholinergic agents might aid such patients. A subsequent pilot study with Alzheimer’s disease patients has reported that while physostigmine alone did not cause improvement, there was facilitation of memory when it was coupled with lecithin, a precursor of acetylcholine (Peters & Levin, 1979). Perhaps these agents could also aid some kinds of retarded individuals. Vasopressin (antidiuretic hormone) has been known for several years to play an important part in regulation of fluid balance in the body. More recently vasopressin has been shown to occur in the brain, and it may be a synaptic transmitter. Furthermore, administration of vasopressin has been shown to aid memory formation in rodents (e.g., de Wied, van Wimersma
Greidanus, Bohus, Urban, & Gispen, 1976). Following up on this lead, a pilot study with four cases of amnesia (three caused by concussions and one by alcoholism) found that administration of vasopressin over a few days brought recovery of memory in each case (Oliveros, Jandali, Timsit-Berthier, Remy, Benghezal, Audibert, & Moeglen, 1978). Further research on this topic can be expected in the near future. It is true that attempts to aid hyperkinetic children with learning problems or retarded children by pharmacological treatments have not demonstrated the value of such therapy (e.g., Adelman & Compas, 1977). Nevertheless, to call such attempts premature suggests that their time will come, and the favorable results mentioned in this section indicate that rapid progress is being made toward the identification of effective pharmacological treatments for certain types of learning disorders.
Synaptic Changes and Processes in Learning

Two main kinds of synaptic changes have been found to occur during or after training. First, the effectiveness of already existing synapses may alter; that is, transmission at existing synapses may either increase or decrease. Second, the number of synapses in a region of brain may increase (or possibly decrease) after training has occurred. Let us take up change in the number of synapses first and then change in effectiveness of existing synapses. In a number of experiments since the early 1960s, laboratory rats have been given differential experience by assigning them to different living conditions – an enriched condition (EC), in which 10 to 12 animals are placed in a large cage with a variety of stimulus objects that are changed daily; the standard-colony condition (SC), with three animals housed together; or the impoverished condition (IC), with a single animal in a colony cage. Later, we will consider cerebral effects of these conditions more fully, but here we should note differences in numbers of synaptic junctions. Most of the synapses on neurons in the cerebral cortex are made on dendritic spines, small projections from the surface of the receptive branches (dendrites) of neurons (Figure 1). Globus, Rosenzweig, Bennett, and Diamond (1973), measuring numbers of healthy spines per unit of dendrite length in 40 EC and 40 IC rats, found significantly more spines in EC than in IC littermates. Subsequently, Greenough (1976) found significantly greater branching of dendrites in EC than in IC rats. Combining these effects, it is clear that rats develop greater numbers of cortical synaptic junctions in EC, where the opportunities for informal learning are greater than in IC. More recently, Chang and Greenough (1978) gave formal training to only one cerebral hemisphere in rats. To accomplish this, they first transected the corpus callosum.
Then they gave rats daily maze training with a different maze pattern for each of 30 days. Each day during the maze training some
Figure 1: Diagram of pyramidal neurons in the cerebral cortex. Each neuron has a single axon (1), which conducts nerve impulses away from the cell body (2); it also has several dendrites (3), which receive neural messages from the axons of other cells. To simplify the diagram, we show only a tiny fraction of the axons and synaptic junctions, since each pyramidal neuron receives thousands of contacts. Certain terminal boutons of axons contact the cell body or the surface of dendrites, but many of them end upon dendritic spines, which are little extensions of the dendrites. At the lower left is an enlargement of a synaptic junction: An axon (4) terminates in a bouton (5), which contains synaptic vesicles (6); the synaptic cleft (7) separates the bouton from a dendritic spine (8). The junction is magnified about 5,000 times, whereas the neuron at the right is magnified about 250 times. (After P. Mussen and M. R. Rosenzweig, Psychology: An Introduction. D. C. Heath, 1977.)
rats had an opaque contact lens placed over one eye so that visual information reached only one hemisphere. At the end of the 30-day training period, neurons in the occipital areas of the two hemispheres were analyzed for branching of dendrites. In the hemisphere that received input from the closed eye, branching was no greater than in control animals with no maze experience. But in the hemisphere that received input from the open eye, there was significantly greater branching, as in EC. Thus formal training as well as informal experience caused greater branching of dendrites and presumably a greater number of synaptic contacts. To study synaptic processes in learning in even greater detail, some investigators have worked with invertebrates that possess rather simple invariant nervous systems. In this field some of the best known research has been done
on a large marine snail, Aplysia (Kandel, 1976, 1979). Such animals have rather limited learning abilities, but habituation (decrement in response to a repeated stimulus) proceeds in much the same way in Aplysia as in the human being. Moreover, Aplysia has the advantages for research that certain neural circuits have been traced out completely; the cells of these circuits can be identified and recognized from one Aplysia to the next, and electrical activity of these cells can be recorded by intracellular electrodes as habituation proceeds. With this preparation it has been possible to pinpoint the site of plasticity in habituation of the defensive gill-withdrawal reflex; it is at the synaptic junction between sensory and motor cells. Although the connections are fixed anatomically, the functional gain of the junctions varies. This occurs when an impulse in the sensory nerve causes release of fewer packets of synaptic transmitter than under control conditions. Sensitization also occurs in Aplysia; that is, increased response to a stimulus after strong stimulation of another input. The mechanism of this change has also been studied in some detail. The sensitizing impulses arrive at terminals on the presynaptic junctions and alter their state, so that impulses over the axons to the junction now release more synaptic transmitter than usual. To date, neurophysiological studies of learning in relatively simple invertebrates have been mainly confined to nonassociative learning – habituation and sensitization. Now ways are being found to train some of these animals associatively (Davis & Gillette, 1978; Mpitsos, Collins & McClellan, 1978; Walters, Carew, & Kandel, 1979), and we should soon be seeing results on synaptic mechanisms of associative learning. It is true that these animals look and behave quite differently from human beings, but their neurons are rather similar to ours, and the same synaptic transmitters are found in their nervous systems as in ours. 
Invertebrates also need to adapt, and the basic neural mechanisms by which they do so may be shared by human beings. Investigators working with these invertebrate preparations hope that they will provide keys to the fundamental cellular processes of learning. This would not be the first time that research with simple animals has revealed mechanisms that are important in human biology. Consider the revolution in molecular biology in the last 20 years that has led to a profound increase in knowledge about our hereditary mechanisms – the major experiments in this field were done on the bacterium E. coli.
Effects of Training and Experience on Brain and Behavior

Giving animals formal training or allowing them to gain experience in differential environments brings about measurable changes in a number of aspects of the anatomy, chemistry, and electrophysiology of the brain and also in behavioral measures, as has been reported in many recent research
publications and review articles (e.g., Bennett, Rosenzweig, Morimoto, & Hebert, 1979; Greenough, 1976; Rosenzweig & Bennett, 1976b, 1977, 1978; Rutledge, 1976). We have already noted changes in dendritic branching and in dendritic spines caused by giving animals experience in EC or IC environments. Now let us review briefly some of the other respects in which the nervous system can be altered by differential experience. Beneficial effects of enriched experience on later learning will be noted; in fact, enriched experience has been used as a therapeutic treatment for various kinds of impairment of the brain. We will consider later whether the cerebral changes are in fact related to learning and memory storage or whether they may be caused by other aspects of differential experience.
Effects on Brain Anatomy

Both gross anatomy and microscopic anatomy have revealed differences induced in the brain by experience in EC or IC environments. The weight of standard samples of cerebral cortex is greater in EC than in IC littermate rats. These differences amount to 9% or 10% in occipital cortex and 4% or 5% in total cortex. Significant differences are found after only 4 days of experience in the EC or IC environments, and the differences increase in magnitude as the period of differential experience is extended up to about 30 days. The age at which the animals are assigned to the differential environments has some effect on the magnitude of the cerebral differences; the differences are largest when the experience starts at or shortly after weaning (25 days of age), but significant effects are obtained even if rats are placed in the differential environments at 300 days of age. The difference in cortical weight probably reflects mainly the increased thickness of cortex of EC as compared with IC rats. It has also been found that the cerebral hemispheres of EC rats are both longer and wider than those of IC littermates. When rats under SC conditions (three per cage) have been compared in weights of brain regions with EC and IC littermates, it has been found that the EC animals exceed SC in cortical weight, whereas the IC have significantly lower weights than do SC. That is, enriched experience above the colony level increases cortical weight, while impoverished experience below the colony norm leads to a decrease in cortical weight. Similar relations among the groups have been found for cortical thickness and dendritic branching. Microscopic examination reveals other differences in addition to the numbers of dendritic spines and dendritic branching reported earlier. The average size of synaptic junctions has been found to be greater in EC than in IC rats (Diamond, Lindner, Johnson, Bennett, & Rosenzweig, 1975; West & Greenough, 1972).
Altschuler (1976) reported that 80 days of combined nonspecific and specific training led to a doubling of synaptic density in the hippocampus of trained rats compared with that of control rats.
Effects on Brain Chemistry
An early finding in our laboratory was a small increase in total activity of the enzyme acetylcholinesterase (AChE) in EC as compared with IC rats. Further experiments showed the increase in total AChE to be rather small, but with the more effective enrichment of a seminatural outdoor environment (SNE), total AChE activity in the cortex is clearly and significantly greater in SNE rats than in IC littermates (Rosenzweig, Bennett, Hebert, & Morimoto, 1978).
The less specific enzyme cholinesterase (ChE) has been found in numerous experiments to be greater in EC than in IC littermates. This difference suggested that EC brains might have greater numbers of glial cells than do IC brains, and we found such a difference (Diamond, Law, Rhodes, Lindner, Rosenzweig, Krech, & Bennett, 1966). Nevertheless, we wish to be cautious about the interpretation of ChE as an index to glial number, because blood vessel walls as well as glial cells are rich in ChE activity. Glial cells play several roles in the nervous system – they form insulating sheaths around axons, bridge between neurons and capillaries, and remove dead tissue. It has been speculated that glial cells may also play active roles in learning.
Several experiments have shown the RNA/DNA ratio to be a highly reliable chemical indicator of the EC-IC effect. In 90% of more than 550 EC-IC pairs, the RNA/DNA ratio in occipital cortex of the EC rat has been larger than that of the IC rat with an average difference of 8%. An increase in RNA can support heightened chemical synthetic processes in the brain. We have proposed that long-term increases in RNA with enriched experience represent the integrated effect of a continuing series of pulses of increased RNA synthesis and other biosynthetic processes resulting from a number of individual learning experiences.
As is the case with tissue weight, somewhat greater effects in RNA are found when the animals are assigned to the differential environments at weaning, but highly significant effects are also found if the differential experience begins only at later ages. Significantly greater diversity of brain RNA has been found in EC than in IC rats in double-blind experiments using unique sequence molecular hybridization (Grouse, Schrier, Bennett, Rosenzweig, & Nelson, 1979). It is tempting to interpret this result as reflecting greater diversity of proteins in the brains of EC than in those of IC rats, but such a conclusion would be premature because of complexities in RNA functions.
Effects on Electrophysiology of Brain
Latency of electrophysiological responses of the occipital cortex evoked by flashes of light was measured in EC and IC rats (Edwards, Barry, & Wyspianski, 1969; Mailloux, Edwards, Barry, Rowsell, & Achorn, 1974). Latencies of responses were shorter in the visual cortex of EC rats as compared with those
with standard colony experience. The authors noted that their finding was in agreement with reports that evoked potential latencies are longer in low-intelligence human beings.
Differences in sleep patterns and in the electrophysiology of sleep have been reported between EC and IC animals. McGinty (1971) found that isolation-reared kittens spent less time sleeping than did kittens raised in a complex environment. When the previously isolated kittens were exposed to a complex environment, sleeping time increased. Tagney (1973) similarly reported that EC rats spent more time sleeping than did IC rats; there was apparently no difference between the percentages of time spent in fast-wave or slow-wave sleep. Lambert and Truong-Ngoc (1976) reported that not only did EC rats show more total sleeping time than did IC rats, but the EC rats also had a significantly higher proportion of fast-wave sleep. This finding may be related to the report of Bloch (1976) that formal training increases the percentage of fast-wave sleep in rats, and that preventing the occurrence of fast-wave sleep during the few hours following training impairs consolidation of memory.
Are the Cerebral Effects due to Training or to Other Factors?
The differential experience studies were undertaken originally to investigate effects of different amounts of informal learning on the brain. The cerebral effects obtained could, however, be attributed to other aspects of the experimental situations. Thus the results might have been due to such factors as differential amounts of locomotion, handling, or stress. Each of these possibilities has been ruled out by control experiments (see Rosenzweig & Bennett, 1978). Walsh and Cummins (1975) have suggested that differential arousal may play an important role in causing the cerebral effects, but we have given reasons for rejecting this hypothesis (Rosenzweig, 1979; Rosenzweig & Bennett, 1978).
Recent experiments support the hypothesis that learning as such can produce measurable changes in the brain, as in the Chang and Greenough (1978) study described earlier. In another study (Bennett et al., 1979), some individually caged rats ran self-paced maze trials between food and water stations, solving a different maze pattern each day for 30 days. They developed changes in weights and in RNA/DNA of brain regions similar to those of rats kept in groups in an enriched environment. But rats that ran self-paced trials through an enclosure without maze barriers did not develop changes from control rats kept in small individual cages. Thus the training itself seems to be the cause of the cerebral effects. In other words, training can produce measurable changes in the anatomy and biochemistry of the brain.
Effects on Behavior
The report by Hebb (1949) that rats reared as pets learned mazes more rapidly than did rats reared in laboratory cages provided the impetus for a major effort to investigate and understand the effects of differential experience on subsequent learning or problem-solving behavior. Among reviews on this subject are those by Davenport (1976), Greenough (1976), and Rosenzweig and Bennett (1977). Some of the behavioral differences reported appear to be specific to the species or even to the strain tested, to the ages at which differential experience is given, and to the behavioral test employed. The most consistent finding, although there are some exceptions even here, is that EC rats were significantly superior to SC or IC rats in performance on complex mazes such as the Hebb-Williams maze.
Enriched Experience as Therapy for Brain Damage
The beneficial effects of enriched experience for learning and problem-solving behavior have led to investigation of the use of experience to alleviate behavioral deficits caused by brain damage. In several experiments, cortex was removed from the occipital area of rats; then the subjects were assigned to EC and either SC or IC environments, and several weeks later were tested on a series of problems in the Hebb-Williams maze. The deleterious effects of the lesions on performance were partially overcome by EC experience, regardless of whether the lesions were inflicted on neonatal rats (Schwartz, 1964; Will, Rosenzweig, & Bennett, 1976), on young postweanlings (Will, Rosenzweig, Bennett, Hebert, & Morimoto, 1977), or on adult rats (Will & Rosenzweig, 1976).
In human beings also, there are indications that effects of early brain damage can be alleviated by subsequent enriched experience. Such evidence comes from a study by Holden and Willerman (1973) concerning development and retardation in children diagnosed as having a neurological abnormality at 1 year of age. (Children with Down’s disease were not included in this sample.) The children, part of a large national collaborative study, were evaluated medically at regular intervals. The families were rated on a socioeconomic index. Of the infants from lower-class homes who had been diagnosed as neurologically abnormal, 35% were found to be retarded at age 4 (IQ scores less than 80). Even among the lower-class children classified as neurologically normal at age 1, 14% had IQ scores below 80 at age 4. In contrast, among the children from upper-class homes only 5% of those with a neurological abnormality at 1 showed retarded IQ scores at 4, and none of the neurologically normal were found to be retarded.
This finding – that the neurologically impaired upper-class children were less likely to show retarded IQ scores than were the neurologically normal children from lower-class homes – suggests the power of environmental enrichment.
Effects of early insufficiency of thyroid secretion on the nervous system could also be counteracted by enriched experience. Davenport (1976) prepared “experimental cretin” rats by impairing thyroid function in utero. These animals performed poorly on the Hebb-Williams maze and on other tests, but enriched experience significantly lessened the degree of the behavioral deficits.
Effects of early malnutrition on behavior may also be alleviated by enriched experience. Wells, Geist, and Zimmermann (1972) reported that the deleterious effects of early protein malnutrition in rats on Hebb-Williams scores could be largely overcome by EC experience. It should be noted, however, that many investigators have not found malnutrition by itself to impair intelligent behavior. It may be that heightened motivation masks deficits in previously malnourished animals (Katz, Rosett, & Ostwald, 1979). It has also been noted that in human beings malnutrition usually occurs in a context of poverty and inadequate social stimulation, and it has been suggested (e.g., Richardson, 1976) that it would be too simple to attribute mental retardation or impairment to severe malnutrition as such. Note that the additional background factors are the same ones that have been implicated as causes of sociocultural retardation.
In children, effects of early malnutrition on later growth and on intelligence have been investigated in a study in which Korean infants were adopted into middle- and upper-class American homes. The children differed in their nutritional status at the time they entered the Korean orphanage; some were classified as severely malnourished (below the 3rd percentile for height and weight on Korean norms), some were moderately malnourished (between the 3rd and 25th percentiles), and the rest were classified as well nourished. Some were adopted before the age of 2 years and some later.
In the children adopted before the age of 2, the effects of malnutrition on size or intelligence were almost entirely overcome (Winick, Meyer, & Harris, 1975). Although the previously severely malnourished group still had the lowest mean height and weight, all three groups were considerably above the Korean norms (but below U.S. norms). In both intelligence and school achievement, all three groups were above the U.S. means, and differences among the groups were small. For those adopted beyond the age of 2, the differences among groups in body size were larger, and the previously severely malnourished group did not quite reach the U.S. means in intelligence or achievement (Nguyen, Meyer, & Winick, 1977). Sustained environmental enrichment overcame much of the consequences of early malnutrition, and this treatment was most effective when it was begun before the age of 2.
Age, Neural Plasticity, and Training
Given the findings that training or experience can act as therapy to overcome some consequences of brain damage, are such effects a function of the age of
the individual? Is age a factor in overcoming other deficits, such as sensory handicaps? Is there a critical period in the life of an individual when experience is important in overcoming a deficit and outside of which it will not be effective? Experiments with animals have revealed critical periods for certain neural/behavioral systems but not for others, and these findings may be relevant to special education for different kinds of deficits in human beings. Sensory impairment or distorted sensory input affects development of neural pathways and subsequent behavior, but only during a critical period early in life. If one eye of a kitten or a monkey is kept closed or occluded during the first 2 months after birth, that eye will show gravely impaired acuity and is unlikely ever to gain normal acuity thereafter. On the contrary, closing the eye of an adult cat or monkey for a year will produce only a small transient effect on its acuity. In the case of children with congenital sensory deficits, special training can be of great help, but apparently only if it is begun during the first 2 years. This has been found with hard-of-hearing children (Wedenberg, 1954). Hearing aids are now being prescribed for hard-of-hearing infants, not only to aid programs of training but also so that they can benefit from ambient stimulation and informal experience. In the case of children with congenital strabismus, normal binocular vision can usually be restored if surgery is performed before the age of 2, but not thereafter (Banks, Aslin, & Letson, 1975). On the other hand, other brain systems mature considerably later than the sensory systems, and some may retain plasticity into adulthood. Harlow (1959) showed that while monkeys can learn some problems readily at the age of a few weeks, other problems cannot be learned with an adult level of speed and efficiency until 2 years of age, and some require 4 years or more of maturity. 
Goldman (1974) has confirmed that some brain systems in the monkey take several years to mature and that the same behavioral test may reflect activity of different brain structures in infant and adult monkeys.
Earlier we saw that enriched experience could aid recovery of problem-solving ability in adult-operated as well as infant-operated rats. Evidence with monkeys indicates that training may benefit chiefly those that sustain brain lesions at an early age. It is not yet clear, however, whether there is a true species difference here or whether the results reflect different sites of brain lesions and/or different kinds of test problems (Goldman & Lewis, 1978).
An unexpected finding is that training and testing that show no immediate benefit may nevertheless aid performance many months later. This was seen in a study by Goldman (1976). Lesions were made in orbital prefrontal cortex of monkeys at either 50 days of age or 18 months. At about 27 months of age they were tested for thousands of trials on a delayed spatial-alternation task, and both groups showed severe impairment as compared with controls. Nine to 12 months later, the animals were tested again. Now the early-operated monkeys showed considerable recovery, but the late-operated did not. A further experiment revealed that even the early-operated monkeys did not show
improvement at 3 years of age unless they had been trained at an earlier period. Furthermore, the earlier training can exert a beneficial effect even if it is for quite a different task from that to be tested later. Obviously, these animal experiments can be controlled better than studies with retarded humans or brain-injured patients, and accordingly much more research is needed to advance our understanding of behavioral therapy for inadequate cerebral development or brain lesions. The results to date are encouraging, however, in several respects: They demonstrate that training or enriched experience can promote behavioral recovery after specific brain lesions, that training can be helpful even if given years after the lesion was sustained (although earlier treatment is usually better), and that training may have delayed beneficial effects even if no immediate benefits are seen.
Discussion
The emphasis of this chapter has been on brain mechanisms of intelligence and learning and their relevance for future developments in special education. Some evidence suggests the importance of these data. For example, The 77th Yearbook of the National Society for the Study of Education concluded with a chapter on implications of brain research for education (Chall & Mirsky, 1978). Let us note briefly some of the points made in that chapter, and then consider whether special educators are prepared to consider the changes that brain research may bring to their field.
Implications for Education
A major theme in Chall and Mirsky (1978) was the central role of environmental stimulation and experience, both in normal growth and development of human and animal brains and in overcoming the effects of inherited deficiencies or of injuries. In essence, the neuroscientists writing in the volume were saying to educators that education is indispensable for optimal development of the brain. While some progress was noted concerning pharmacological methods for effecting behavioral and cognitive changes, greater stress was placed upon education. Even though neurological deficiencies and injuries are feared because of their effects on learning and intelligence, evidence was presented to show that training and experience often help to overcome these handicaps. Research with children and with animals was cited to show the value of training; even if immediate success is not evident, it is often helpful in the long run for the learner to continue to practice. For many difficulties related to brain dysfunctions, emphasis was placed on the individual’s practice under
the supervision of a knowledgeable and sensitive teacher. When drugs were mentioned, they were placed in a broader context of training and environmental stimulation.
Another theme was the importance of timing. In general, stimulation and appropriate training help most when given early. But even among adults there can be some recovery of function with proper stimulation and retraining. Epstein (1978; 1979) has suggested the provocative hypothesis that growth of brain and of intelligence proceeds by spurts at well-defined ages, with plateaus between the spurts. According to Epstein, attempts at compensatory education are helpful only if given during the periods of rapid brain growth, not if given during plateaus. Although Epstein has gathered much evidence in support of this hypothesis, it has not yet been tested by other investigators and so must be considered as a topic for further research.
The yearbook was written during a crest of interest in brain lateralization, and it included possible educational implications of differences in function between the left and right hemispheres of the brain. It now seems likely that the hemispheric differences were exaggerated and that they will not be a source of valuable recommendations for educational practice. At any rate, some of the discoverers of hemispheric differences are now stressing the necessity of coordinated activity of the two hemispheres for normal behavior (Gazzaniga & LeDoux, 1978).
Because the brain is so complex and because knowledge about it is constantly growing and changing, the nonspecialist may be perplexed and not know how to incorporate findings of the neurosciences into educational practice. Chall and Mirsky (1978), therefore, hope to see as early as the 1980s a fruitful collaboration between neuroscientists and educators – a collaboration ultimately as productive as the long-standing one between educators and psychologists.
Looking further ahead, they foresee the possibility in the next century of a new speciality of educational neuroscientist or educational neuropsychologist. A practitioner of this new profession would be well versed in the latest pedagogical methods as well as in neuropsychological and neurophysiological methods and techniques. Each child who needed special assistance would be tested by new neurophysiological and neurochemical as well as by behavioral methods in order to assess his/her individual strengths and weaknesses and developmental progress. Such a program would be intended to permit early and continuing identification, assessment, and remediation in individually planned pedagogical efforts.
But are special educators aware of these developments and prepared for cooperative efforts? The answer, of course, can neither be absolute nor certain. However, in 1976, when PCMR (1976a, 1976b) was predicting that a substantial decrease in the incidence of mental retardation could be achieved by the end of this century, another study also appeared – “A Forecast of Events Affecting the Education of Exceptional Children: 1976–2000” (Schipper & Kenowitz, 1976). This was a study undertaken by the National Association of State
Directors of Special Education in order to make present decisions with a view to the future, an effort to avoid “future shock.” A pool of 121 special education administrators from all regions of the United States was surveyed in two successive rounds, using the Delphi method to stimulate brainstorming and planning. More than 800 statements about future events were screened down to 60, and these were rated as to the year in which they were likely to occur and their value to special education – positive, neutral, or negative. The main categories of prediction were legal/statutory, administrative, instructional, and teacher education. It should be noted that not 1 of the 60 predictions dealt with a possible change in the incidence of cases requiring special education.
What changes did the experts foresee in 1973–74 when the survey was conducted? Here are some of the most highly rated changes, along with the median year forecast for each:
Due process procedures are guaranteed to all exceptional children in the public schools (1980).
Seventy percent of all teacher training programs require 6 credits of course work with exceptional children (1980).
Twenty-five states have moved teacher-training programs from the college/university campus to the public schools (1985).
School calendars adapt the hours in the day and the days in the year to the needs of the handicapped (1990).
And here are some of the negatively valued predictions:
The Supreme Court rules that compulsory school attendance is unconstitutional (1990).
The pendulum swings back from mainstreaming to segregating handicapped children in the public schools (1990).
Teacher unions dominate decisions affecting educational enrollments and in-service training systems for special education programs in 25 states (1990).
Only a few of the forecast trends had to do with biological research on retardation or other handicapping conditions. These were the following:
Preservice training of special educators includes an academic and clinical course on basic concepts of medicine and principles of drug-induced behavior modification (1990; somewhat favorably evaluated).
Drugs control 40% of the behavioral results of handicapped conditions (1990; neutrally evaluated).
Non-habit-forming drugs that accelerate learning are administered daily by school personnel to 40% of the public school population (2000; somewhat negatively evaluated).
Note that these predictions all had to do with drugs, and that whereas three-quarters of all the predictions were evaluated favorably, only one of these
three was so evaluated. It appears that this pool of special education administrators did not expect much from biological research, and their view of this field was rather narrow, being largely confined to drug therapy. As was noted in the preceding section, this emphasis on drug therapy is not characteristic of current research on brain mechanisms of learning and memory. If the prediction of the President’s Committee on Mental Retardation (1976a, 1976b) of a substantial decrease in mental retardation by the end of the century proves correct, and if Chall and Mirsky (1978) are correct in foreseeing collaboration between educators and neuroscientists, then special educators may indeed encounter future shock.
Notes
1. The National Institute of Education sponsored a state-of-the-art symposium on neural mechanisms of learning and memory in 1974, and an updated collection of the papers appeared later (Rosenzweig & Bennett, 1976a). A further sign of developing interest by educators in the neural bases of learning is that The 77th Yearbook of the National Society for the Study of Education was devoted to the topic “Education and the Brain” (Chall & Mirsky, 1978).
2. A review of causes of mental retardation was presented in a publication of the U.S. President’s Committee on Mental Retardation (1976a).
References
Adelman, H. S., & Compas, B. E. Stimulant drugs and learning problems. Journal of Special Education, 1977, 11, 377–415.
Altschuler, R. A. Changes in hippocampal synaptic density with increased learning experience in the rat. Society for Neuroscience Abstracts, 1976, 2, 438.
Banks, M. S., Aslin, R. N., & Letson, R. D. Sensitive period for the development of human binocular vision. Science, 1975, 190, 675–677.
Bennett, E. L., Rosenzweig, M. R., Morimoto, H., & Hebert, M. Maze training alters brain weights and cortical RNA/DNA ratios. Behavioral and Neural Biology, 1979, 26, 1–22.
Bloch, V. Brain activation and memory consolidation. In M. R. Rosenzweig & E. L. Bennett (Eds.), Neural mechanisms of learning and memory. Cambridge, Mass.: MIT Press, 1976.
Chall, J. S., & Mirsky, A. F. (Eds.). Education and the brain. The seventy-seventh yearbook of the National Society for the Study of Education, Part II. Chicago: The University of Chicago Press, 1978.
Chang, F. F., & Greenough, W. T. Increased dendritic branching in hemispheres opposite eyes exposed to maze training in split-brain rats. Society for Neuroscience Abstracts, 1978, 4, 469.
Davenport, J. W. Environment as therapy for brain effects of endocrine dysfunction. In R. N. Walsh & W. T. Greenough (Eds.), Environments as therapy for brain dysfunction. New York: Plenum Press, 1976.
Davis, K. L., Mohs, R. C., Tinklenberg, J. R., Pfefferbaum, A., Hollister, L. E., & Kopell, B. S. Physostigmine: Improvement of long-term memory processes in normal humans. Science, 1978, 201, 272–274.
Davis, W. J., & Gillette, R. Neural correlate of behavioral plasticity in command neurons of Pleurobranchaea. Science, 1978, 199, 801–804.
Deutsch, J. A. The cholinergic synapse and the site of memory. Science, 1971, 174, 788–794.
de Wied, D., van Wimersma Greidanus, T. B., Bohus, B., Urban, I., & Gispen, W. H. Vasopressin and memory consolidation. In M. A. Corner & D. F. Swaab (Eds.), Perspectives in brain research. Amsterdam: Elsevier, 1976.
Diamond, M. C., Law, F., Rhodes, H., Lindner, B., Rosenzweig, M. R., Krech, D., & Bennett, E. L. Increases in cortical depth and glia numbers in rats subjected to enriched environment. Journal of Comparative Neurology, 1966, 128, 117–125.
Diamond, M. C., Lindner, B., Johnson, R., Bennett, E. L., & Rosenzweig, M. R. Differences in occipital cortical synapses from environmentally enriched, impoverished, and standard colony rats. Journal of Neuroscience Research, 1975, 1, 109–119.
Dunn, A. J. Neurochemistry of learning and memory: An evaluation of recent data. Annual Review of Psychology, 1980, 31, 343–390.
Edwards, H. P., Barry, W. F., & Wyspianski, J. O. Effect of differential rearing on photic evoked potentials and brightness discrimination in the albino rat. Developmental Psychobiology, 1969, 2, 133–138.
Epstein, H. T. Growth spurts during brain development: Implications for educational policy and practice. In J. S. Chall & A. F. Mirsky (Eds.), Education and the brain. Chicago: The University of Chicago Press, 1978.
Epstein, H. T. Correlated brain and intelligence development in humans. In M. E. Hahn, C. Jensen, & B. C. Dudek (Eds.), Development and evolution of brain size: Behavioral implications. New York: Academic Press, 1979.
Flood, J. F., & Jarvik, M. E. Drug influences on learning and memory. In M. R. Rosenzweig & E. L. Bennett (Eds.), Neural mechanisms of learning and memory. Cambridge, Mass.: MIT Press, 1976.
Flood, J. F., Jarvik, M. E., Bennett, E. L., Orme, A. E., & Rosenzweig, M. R. Memory: Modification of anisomycin-induced amnesia by stimulants and depressants. Science, 1978, 324–326.
Foster, M., & Sherrington, C. S. A textbook of physiology.
Part III. The central nervous system. New York: Macmillan, 1897.
Gazzaniga, M. S., & LeDoux, J. E. The integrated mind. New York: Plenum Press, 1978.
Globus, A., Rosenzweig, M. R., Bennett, E. L., & Diamond, M. C. Effects of differential experience on dendritic spine counts. Journal of Comparative and Physiological Psychology, 1973, 82, 175–181.
Goldman, P. S. An alternative to developmental plasticity: Heterology of CNS structures in infants and adults. In D. G. Stein, J. J. Rosen, & N. Butters (Eds.), Plasticity and recovery of function in the central nervous system. New York: Academic Press, 1974.
Goldman, P. S. The role of experience in recovery of function following orbital prefrontal lesions in infant monkeys. Neuropsychologia, 1976, 14, 401–412.
Goldman, P. S., & Lewis, M. E. Developmental biology of brain damage and experience. In C. W. Cotman (Ed.), Neuronal plasticity. New York: Raven Press, 1978.
Greenough, W. T. Enduring brain effects of differential experience and training. In M. R. Rosenzweig & E. L. Bennett (Eds.), Neural mechanisms of learning and memory. Cambridge, Mass.: MIT Press, 1976.
Grouse, L. D., Schrier, B. K., Bennett, E. L., Rosenzweig, M. R., & Nelson, P. G. Sequence diversity studies of rat brain RNA: Effects of environmental complexity on rat brain total RNA diversity. Journal of Neurochemistry, 1979, 30, 191–203.
Harlow, H. F. The development of learning in the rhesus monkey. American Scientist, 1959, 47, 459–479.
Hebb, D. O. The organization of behavior. New York: Wiley, 1949.
Holden, R., & Willerman, L. Neurological abnormality in infancy, preschool intelligence, and social class. In P. Trapp & P. Himmelstein (Eds.), The exceptional child (2nd ed.). New York: Appleton-Century-Crofts, 1973.
Huttenlocher, P. R. Synaptic and dendritic development and mental defect. In N. A. Buchwald & M. A. B. Brazier (Eds.), Brain mechanisms in mental retardation (UCLA Forum in Medical Sciences, No. 18). New York: Academic Press, 1975.
Kandel, E. R. Cellular basis of behavior: An introduction to behavioral neurobiology. San Francisco: Freeman, 1976.
Kandel, E. R. Small systems of neurons. Scientific American, 1979, 241, 66–76.
Katz, H. B., Rosett, R. E., & Ostwald, R. The compensatory role of food motivation in the maze learning performance of lactationally undernourished rats. Developmental Psychobiology, 1979, 12, 305–315.
Lambert, J.-F., & Truong-Ngoc, A. Influence de l’environnement instrumental et social sur la structure d’un échantillon du cycle veille-sommeil chez le rat Wistar mâle: Corrélations avec les modifications de l’excitabilité du système réticulo-cortical. Agressologie, 1976, 17, 19–25.
Mailloux, J. G., Edwards, H. P., Barry, W. F., Rowsell, H. C., & Achorn, E. G. Effects of differential rearing on cortical evoked potentials of the albino rat. Journal of Comparative and Physiological Psychology, 1974, 87, 475–480.
McGinty, D. J. Encephalization and the neural control of sleep. In M. B. Sterman, D. J. McGinty, & A. M. Adinolfi (Eds.), Brain development and behavior. New York: Academic Press, 1971.
Mpitsos, G. J., Collins, S. D., & McClellan, A. D. Learning: A model system for physiological studies. Science, 1978, 199, 497–506.
Myklebust, H. R., & Boshes, B. Final report, minimal brain damage in children. Washington, D.C.: U.S. Department of Health, Education and Welfare, 1965.
Nguyen, M. L., Meyer, K. K., & Winick, M. Early malnutrition and “late” adoption: A study of their effects on the development of Korean orphans adopted into American families. American Journal of Clinical Nutrition, 1977, 30, 1734–1739.
Oliveros, J. C., Jandali, M. K., Timsit-Berthier, M., Remy, R., Benghezal, A., Audibert, A., & Moeglen, J. M. Vasopressin in amnesia.
Lancet, 1978, 1, 42. Peters, B. H., & Levin, H. S. Effects of physostigmine and lecithin on memory in Alzheimer disease. Annals of Neurology, 1979, 6, 219–222. Purpura, D. P. Normal and aberrant neuronal development in the cerebral cortex of human fetus and young infant (UCLA Forum in Medical Sciences, No. 18). In N. A. Buchwald & M. A. B. Brazier (Eds.), Brain mechanisms in mental retardation. New York: Academic Press, 1975. Richardson, S. A. The influence of severe malnutrition in infancy on the intelligence of children at school age: An ecological perspective. In R. N. Walsh & W. T. Greenough (Eds.), Environments as therapy for brain dysfunction. New York: Plenum Press, 1976. Rosenzweig, M. R. Responsiveness of brain size to individual experience: Behavioral and evolutionary implications. In M. Hahn, C. Jensen, & B. Dudek (Eds.), Development and evolution of brain size: Behavioral implications. New York: Academic Press, 1979. Rosenzweig, M. R., Bennett, E. L. (Eds.). Neural mechanisms of learning and memory. Cambridge, Mass.: MIT Press, 1976. (a) Rosenzweig, M. R., & Bennett, E. L. Enriched environments: Facts, factors, and fantasies. In L. Petrinovich and J. L. McGaugh (Eds.), Knowing, thinking, and believing. New York: Plenum Press, 1976 (b) Rosenzweig, M. R., & Bennett, E. L. Effects of environmental enrichment or impoverishment on learning and on brain values in rodents. In A. Oliverio (Ed.), Genetics, environment, and intelligence. Amsterdam: Elsevier/North-Holland, 1977. Rosenzweig, M. R., & E. L. Bennett. Experiential influences on brain anatomy and brain chemistry in rodents. In G. Gottlieb (Ed.), Studies on the development of behavior and the nervous system (Vol. 4). Early influences. New York: Academic Press, 1978.
Salkind_Chapter 35.indd 387
9/8/2010 12:05:51 PM
388
Curriculum, Instruction and Learning
Rosenzweig, M. R., Bennett, E. L., Hebert, M., & Morimoto, H. Social grouping cannot account for cerebral effects of enriched environments. Brain Research, 1978, 158, 563–576. Rutledge, L. T. Synaptogenesis: Effects of synaptic use. In M. R. Rosenzweig & E. L. Bennett (Eds.), Neural mechanisms of learning and memory. Cambridge, Mass.: MIT Press, 1976. Schipper, W. V., & Kenowitz, L. A. Special education futures—a forecast of events affecting the education of exceptional children: 1976–2000. The Journal of Special Education, 1976, 10, 401–413. Schwartz, S. Effect of neonatal cortical lesions and early environmental factors on adult rat behavior. Journal of Comparagive and Physiological Psychology, 1964, 57, 72–77. Sitaram, N., Weingartner, H., & Gillin, J. C. Human serial learning: Enhancement with arecholine and choline and impairment with scopolamine. Science, 1978, 201, 274–276. Stratton, L. O., & Petrinovich, L. F. Post-trial injection of an anticholinesterase drug and maze learning in two strains of mice. Psychopharmacologia, 1963, 5, 47–54. Tagney, J. Sleep patterns related to rearing rats in enriched and impoverished environments. Brain Research, 1973, 53, 353–361. Tanzi, E. I fatti e le induzioni nell’odierna isologia del sistema nervoso. Revista Sperimentale di Freniatria e di Medicina Legale, 1893, 19, 419– 472. U.S. President’s Committee on Mental Retardation. Mental retardation: The known and the unknown. Washington, D.C., 1976 (DHEW Publication No. (OHD) 76 -21008). (a) U.S. President’s Committee on Mental Retardation. Mental retardation: Century of decision. Washington, D.C. (DHEW Publication No. (OHD) 76-21013). (b) Walsh, R. N., & Cummins, R. A. Mechanisms mediating the production of environmentally induced brain changes. Psychological Bulletin, 1975, 82, 986 –1000. Walters, E. T., Carew, T. J., & Kandel, E. R. Classical conditioning in Aplysia californica. Proceedings National Academy of Sciences U.S.A., 1979, 76, 6675–6679. Wedenberg, E. 
Auditory training of severely hard-of-hearing preschool children. Acta OtoLaryngologica, Suppl. 110, 1954. Wells, A. M., Geist, C. R., & Zimmermann, R. R. Influence of environmental and nutritional factors on problem solving in the rat. Perceptual and Motor Skills, 1972, 35, 235–244. West, R. W, & Greenough, W. T. Effect of environmental complexity on cortical synapses of rats: Preliminary results. Behavioral Biology, 1972, 7, 279–284. Will, B. E., & Rosenzweig, M. R. Effets de l’environnement sur la récupération fonctionnelle après lésions cérébrales chez des rats adultes. Biology of Behavior, 1976, 1, 5–16. Will, B. E., Rosenzweig, M. R., & Bennett, E. L. Effects of differential environments on recovery from neonatal brain lesions, measured by problem-solving scores. Physiology & Behavior, 1976, 16, 603–611. Will, B. E., Rosenzweig, M. R., Bennett, E. L., Hebert, M., & Morimoto, H. Relatively brief environmental enrichment aids recovery of learning capacity and alters brain measures after postweaning brain lesions in rats. Journal of Comparative and Physiological Psychology, 1977, 91, 33–50. Winick, M., Meyer, K. K., & Harris, R. C. Malnutrition and environmental enrichment by early adoption. Science, 1975, 190, 1173–1175.
SAGE LIBRARY OF EDUCATIONAL THOUGHT AND PRACTICE
SAGE DIRECTIONS IN EDUCATIONAL PSYCHOLOGY VOLUME III
Edited by
Neil J. Salkind
Introduction and editorial arrangement © Neil J. Salkind 2011 First published 2011 Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act, 1988, this publication may be reproduced, stored or transmitted in any form, or by any means, only with the prior permission in writing of the publishers, or in the case of reprographic reproduction, in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers. Every effort has been made to trace and acknowledge all the copyright owners of the material reprinted herein. However, if any copyright owners have not been located and contacted at the time of publication, the publishers will be pleased to make the necessary arrangements at the first opportunity. SAGE Publications Ltd 1 Oliver’s Yard 55 City Road London EC1Y 1SP SAGE Publications Inc. 2455 Teller Road Thousand Oaks, California 91320 SAGE Publications India Pvt Ltd B 1/I 1, Mohan Cooperative Industrial Area Mathura Road New Delhi 110 044 SAGE Publications Asia-Pacific Pte Ltd 33 Pekin Street #02-01 Far East Square Singapore 048763 British Library Cataloguing in Publication data A catalogue record for this book is available from the British Library ISBN: 978-0-85702-178-6 (set of five volumes) Library of Congress Control Number: 2010923776 Typeset by Mukesh Technologies Pvt. Ltd., Pondicherry, India. Printed on paper from sustainable resources Printed by MPG Books Group, Bodmin Cornwall
Contents
Volume III
Section II: Curriculum, Instruction and Learning (Continued)
36. Human Intelligence: An Introduction to Advances in Theory and Research
    David F. Lohman
37. Cognitive Demands of New Technologies and the Implications for Learning Theory
    Richard J. Torraco
38. Cognitive Conceptions of Learning
    Thomas J. Shuell
39. Meaning in Complex Learning
    Ronald E. Johnson
40. Phases of Meaningful Learning
    Thomas J. Shuell
41. Growth, Development, Learning, and Maturation as Factors in Curriculum and Teaching
    William C. Trow
Section III: Motivation
42. Maslow, Monkeys and Motivation Theory
    Dallas Cullen
43. Maslow’s Theory of Motivation: A Critique
    Andrew Neher
44. Caught on Fire: Motivation and Giftedness
    Ann Robinson
45. An Empirical Test of Maslow’s Theory of Motivation
    Eugene W. Mathes and Linda L. Edwards
46. Meaningfulness, Commitment, and Engagement: The Intersection of a Deeper Level of Intrinsic Motivation
    Neal Chalofsky and Vijay Krishna
47. Motivation and Human Growth: A Developmental Perspective
    M.S. Srinivasin
48. Evolutionary Perspectives on Human Motivation
    Jutta Heckhausen
49. The Debate about Rewards and Intrinsic Motivation: Protests and Accusations Do Not Alter the Results
    Judy Cameron and W. David Pierce
50. A Comprehensive Expectancy Motivation Model: Implications for Adult Education and Training
    Kenneth W. Howard
51. The Academic Motivation Scale: A Measure of Intrinsic, Extrinsic, and Amotivation in Education
    Robert J. Vallerand, Luc G. Pelletier, Marc R. Blais, Nathalie M. Brière, Caroline Senécal and Evelyne F. Vallières
52. Extrinsic Rewards and Intrinsic Motivation in Education: Reconsidered Once Again
    Edward L. Deci, Richard Koestner and Richard M. Ryan
53. Beyond the Rhetoric: Understanding Achievement and Motivation in Catholic School Students
    Janine Bempechat, Beth A. Boulay, Stephanie C. Piergross and Kenzie A. Wenk
54. Dimensions of School Motivation: A Cross-cultural Validation Study
    Dennis M. McInerney and Kenneth E. Sinclair
55. Achievement Motivation in Children of Three Ethnic Groups in the United States
    Manuel Ramirez III and Douglass R. Price-Williams
56. Motivation and Learning Environment Differences between Resilient and Nonresilient Latino Middle School Students
    Hersholt C. Waxman, Shwu-yong L. Huang and Yolanda N. Padrón
57. Attracting and Retaining Teachers: A Question of Motivation
    Karin Müller, Roberta Alliata and Fabienne Benninghoff
Section II: Curriculum, Instruction and Learning (Continued)
36 Human Intelligence: An Introduction to Advances in Theory and Research
David F. Lohman
Source: Review of Educational Research, 59(4) (1989): 333–373.
What is intelligence? How does it develop? Does it decline? Has cognitive science really changed our understanding of this construct? Old questions about intelligence have been raised with a renewed vigor, and new questions have been posed. In short, there has been a remarkable resurgence of research on human abilities in the past 15 years, fueled in part by legal challenges to intelligence tests, but in even larger measure by a renewed interest in cognition in psychology. New methods of investigation and theories of cognition have been applied to old tests and theories of individual differences. Although the results have not met the loftier expectations of some advocates, progress has been made. The purpose of this paper is to provide a sampling of this progress, to note some of the problems that have attended it, and to suggest some research strategies for future research on human intelligence.
I focus on three research traditions: trait theories of intelligence, information-processing theories of intelligence, and general theories of thinking. The discussion of trait theories of intelligence focuses on Cattell’s (1963) theory of fluid and crystallized abilities, particularly the elaborations of this theory proposed by Horn (1985) and by Snow (1981). Their work provides a convenient framework for the discussion of information-processing theories of intelligence. First, I summarize attempts to build process theories of the major factors identified in Horn’s (1985) model, such as the work of Jensen (1982) and Eysenck (1982) on mental speed, of Hunt (1985) and Frederiksen (1982) on verbal-crystallized abilities, of Sternberg (1977) on fluid-reasoning abilities, and of Pellegrino and Kail (1982) and Lohman (1988) on spatial-visualization ability. This section concludes with a discussion of Sternberg’s (1984, 1985) recent attempts to develop a comprehensive theory of intelligence.
I then turn the problem around. Instead of asking how cognitive science might help us understand existing tests or ability constructs, I ask how a theory of intelligence might be derived from the sort of general theories of thinking currently advanced in cognitive psychology and artificial intelligence (AI). Here the discussion emphasizes Anderson’s (1983) ACT* theory (the latest version of his Adaptive Control of Thought system) and the “New Connectionism” of Rumelhart, McClelland, and the PDP (Parallel Distributed Processing) Research Group (1986). The paper concludes with some speculations about the meaning of the construct intelligence and some suggestions for research on it.
The resurgence of general ability. Several developments converged in the early 1970s to renew interest in the construct intelligence. First, there was the growing realization that the ability profiles provided by multiple-aptitude batteries were not as useful for prediction as many had hoped (McNemar, 1964). Although there were exceptions, the predictive validities of the several scores from multiple-aptitude batteries were repeatedly found to be little better than the corresponding validity of one general factor estimated from the same battery.1 Nor were the specific abilities that Thurstone (1938) and Guilford (1959) had identified of much use in attempts to adapt instructional methods to the ability profile of the learner. Instead, general ability accounted for most of the findings. In their summary of 20 years of research on Aptitude × Treatment interactions, Cronbach and Snow (1977) concluded:
It has become fashionable to decry the use of measures of general ability, and sometimes their use has been prohibited in school systems. The attackers usually insist that the tests do not assess ability to learn, and it is often proposed to substitute measures of achievement or “learning styles.”. . . While we see merit in a hierarchical conception of abilities, with abilities differentiated at coarse and fine levels, we have not found Guilford’s subdivision a powerful hypothesis. . . . Instead of finding general abilities irrelevant to school learning, we find nearly ubiquitous evidence that general measures predict amount learned or rate of learning or both. And, whereas we had expected specialized abilities rather than general abilities to account for interactions, the abilities that most frequently enter into interactions are general. Even in those programs of research that started with specialized ability measures and found interactions with treatment, the data seem to warrant attributing most effects to general ability. (pp. 496–497)
Thus, on one hand, special abilities failed either to predict educational outcomes better than general ability or to predict which students would profit from specialized educational interventions designed to match their particular patterns of abilities. On the other hand, American theorists gradually adopted a hierarchical model of abilities which, while allowing for both broad and narrow abilities, clearly emphasized the role of general ability.
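The comparison behind this conclusion, the validity of several separately weighted aptitude scores against the validity of a single general score, can be made concrete with a small simulation. The sketch below is illustrative only and is not taken from the studies cited: the loadings (0.7), the criterion weights (0.6 and 0.8), the sample size, and all function names are assumptions, and the simulated criterion is built to depend on the general factor alone, so the outcome is assumed rather than discovered. It shows only what "little better" looks like numerically when that assumption holds.

```python
import random


def center(values):
    """Return the values with their mean subtracted."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def corr(x, y):
    """Pearson correlation between two equal-length lists."""
    x, y = center(x), center(y)
    sxy = sum(a * b for a, b in zip(x, y))
    return sxy / (sum(a * a for a in x) * sum(b * b for b in y)) ** 0.5


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - rest) / m[r][r]
    return x


def simulate(n_people=2000, n_tests=4, seed=1):
    """Hypothetical battery: every test loads on one general factor g.

    ASSUMPTION: the criterion depends on g only, not on any specific ability.
    """
    rng = random.Random(seed)
    tests = [[] for _ in range(n_tests)]
    criterion = []
    for _ in range(n_people):
        g = rng.gauss(0, 1)
        for t in tests:
            t.append(0.7 * g + 0.7 * rng.gauss(0, 1))  # general part + specific part
        criterion.append(0.6 * g + 0.8 * rng.gauss(0, 1))
    return tests, criterion


def validities(tests, criterion):
    """Return (validity of the unit-weighted composite, multiple R of all tests)."""
    k, n = len(tests), len(criterion)
    xs = [center(t) for t in tests]
    y = center(criterion)
    # Normal equations for least-squares weights on centered data.
    xtx = [[sum(a * b for a, b in zip(xs[i], xs[j])) for j in range(k)] for i in range(k)]
    xty = [sum(a * b for a, b in zip(xs[i], y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[i] * xs[i][p] for i in range(k)) for p in range(n)]
    composite = [sum(t[p] for t in tests) for p in range(n)]
    return corr(composite, criterion), corr(fitted, criterion)


general_r, multiple_r = validities(*simulate())
print(f"general composite r = {general_r:.3f}, multiple R = {multiple_r:.3f}")
```

In the sample used to fit the weights, the multiple correlation can never be lower than the composite's validity, so the quantity of interest is the size of the gap. Under the assumptions above the gap is very small; a criterion that also drew on specific abilities would widen it.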
The cognitive revolution. The second development was an outgrowth of the cognitive revolution in psychology. From Watson (1925) until Skinner (1953), American psychology was dominated by the belief that mind was not the proper subject matter for psychology. Studies of animal learning or conditioning were the norm. Thinking and reasoning were considered complex behaviors that would be explained sometime in the future after elementary mechanisms of learning were adequately understood. By the mid-1960s, however, this promise was wearing thin. Psychology seemed not to be building toward the explanation of complex phenomena but, if anything, was digging increasingly deeper into reductionism. Some had already called for a rejection of radical behaviorism on theoretical grounds (Chomsky, 1959). But it was the emergence of the computer as a metaphor for mind and as a vehicle for testing theories about thinking that finally dethroned behaviorism. Rather swiftly, the mainstream of psychology moved from conditioning to perception and then to thinking and problem solving. By 1985, in the first paragraph of his introductory text on cognitive psychology, Anderson was proclaiming, “the goal of cognitive psychology is to understand human intelligence and how it works” (p. 1). Thus, in 2 decades, the word intelligence moved from the periphery of American psychology to its center.2
The cognitive revolution had two rather different influences on theories of human intelligence. There were some who saw that the methods and theories of the cognitive psychologists provided a new way to understand what intelligence and other ability tests were really measuring. Carroll (1976), Glaser (1972), Hunt (e.g., Hunt, Frost, & Lunneborg, 1973), Sternberg (1977), and Snow (e.g., Snow, Marshalek, & Lohman, 1976) were leaders in this effort. There were others, however, who were not at all concerned with intelligence as an individual difference construct. These investigators sought to develop theories of human cognition and, at times, to simulate their theories in computer programs that then displayed AI. Both of these efforts will be briefly reviewed in this paper.
The Challenge of Process
Although most research on intelligence has focused on the products of intelligence, both theoreticians and clinicians have long called for greater attention to the process of intelligent thinking.3
Nobody has ever made an inventory of tasks [that define the universe of intellectual tasks], determined the correlation of each with intellect, selected an adequate battery of them, and found the proper weights to attach to each . . . If anybody did this wisely, a large fraction of his labor would be precisely to find out what abilities our present instruments did measure, and how these abilities were related to intellect; or to find out what abilities constituted intellect, and how these abilities were measured by our present instruments. (E. L. Thorndike, Bregman, Cobb, & Woodyard, 1926, p. 2)
Three decades later, in his call for the unification of the two disciplines of scientific psychology – the correlational psychology of mental testing and the experimental psychology of learning – Cronbach (1957) argued:
Sophistication in data analysis has not been matched by sophistication in theory. The correlational psychologist was led into temptation by his own success, losing himself first in practical prediction, then in a narcissistic program of studying his tests as an end in themselves. A naive operationism enthroned theory of test performance in the place of theory of mental processes. (p. 675)
In this Cronbach echoed Thurstone (1947), who considered a factor-analytic study of abilities only the first step in a research program. Ability factors identified in such studies should be investigated in experiments designed to manipulate and thus identify “the processes which underlie” the factors (p. 55). But such experiments had little appeal in a psychology dominated by behaviorism, and so the research program Thurstone advocated had to await the rediscovery of mental process by the mainstream of American experimental psychology.
Cognitive Science and the Computer Metaphor
Recent research on intelligence has been driven by a renewed interest in cognition in psychology and in many other fields. Cognitive science is the term now commonly used to refer to this new blend of computer science, cognitive psychology, linguistics, neuropsychology, philosophy, and instructional psychology. Although roots of the cognitive revolution may be traced to many earlier sources, several observers see 1956 as the pivotal year in the development of cognitive science. In that year, Newell and Simon (see Newell, Shaw, & Simon, 1957) reported their success in devising a computer program that could actually prove theorems in logic. In the same year, Bruner, Goodnow, and Austin published their Study of Thinking, and Miller published a seminal paper on short-term memory in which he argued that the capacity of this memory store seemed to be limited by “the magic number seven” (Newell & Simon, 1972, p. 4). The cognitive revolution gathered momentum in the 1960s and achieved ascendency during the 1970s (see Gardner, 1985).
The computer has contributed importantly to this revolution in at least two ways. The most obvious contribution of the computer has been as a metaphor for human cognition.4 This metaphor has taken several forms. At the simplest level, direct analogies have been made between the hardware of the computer and the human cognitive system. Computers have devices for encoding information from external sources (card readers, keyboards), temporarily storing it (memory buffers), transforming it (central processors), retaining it on long-term storage devices (tapes, disks), and producing output (printers, video displays). Early models of human information processing relied heavily on this analogy in positing similar structures in the human cognitive system. When used in this way, the computer is but the latest mechanical metaphor for mind in psychology (Marshall, 1977). Although more sophisticated than previous metaphors such as the wax tablet or the hydraulic pump, the computer metaphor is incomplete and even misleading. For example, some researchers have begun to question the extent to which theorizing has been artificially constrained by the serial-processing, digital computer. New research programs based on parallel processing may circumvent some of these problems, particularly for modeling perception and other nonlinguistic processes. But, as will be explained, these theories have their critics too.
Some analogies between computers and human cognition go considerably beyond comparisons of the superficial characteristics of system hardware. In particular, it is argued that similar principles govern the functioning of any system that processes information. Fodor (1981) and others who espouse this computational metaphor for thought treat the mind as a device for manipulating symbols. At this level of abstraction, differences in hardware, whether electronic or neurophysiological, are thought to be irrelevant. Whether such an assumption is tenable is a hotly debated issue in cognitive science. However, all would agree that the contribution of the computer has far exceeded its admittedly limited value as a metaphor for the human cognitive system.
The greater contribution of the computer has been as a tool for developing and testing theories of cognition or, as Anderson and Bower (1973) put it, for experimenting on the nature of the connection between stimulus and response. In this way, the computer has changed the evidentiary base to include something other than human behavior. Theories of thinking and learning can be formalized as computer programs. Programs gain a measure of plausibility if they solve problems using sequences of steps that are similar to the steps used by successful human problem solvers or if, when failing to solve problems, they make errors that mimic human errors. A constant exchange between those who study human problem solving in the psychological laboratory and those who attempt to develop computer programs that display AI serves to refine and extend both efforts.
Some would object that such comparisons between humans and computers diminish human dignity. However, cognitive science makes no pretense that computational theories completely account for human cognition. Computational models of thought are in principle no different from computational models of the weather (Miller, 1981). Yet, as Miller observes, no one fears that a tornado might destroy the computer center when the computer is used to model the behavior of tornadoes. Nor do we dismiss efforts to model the weather because such models will never produce rain. Perhaps we expect more from computational models of thought because “the brain is itself a computer in a sense in which the weather is not,” and so a “computer that models an intelligent brain is expected to be a brain” (p. 220).
Contributions of Cognitive Research
Cognitive science has contributed to the understanding of human intelligence in three ways. First, methods and theories of cognitive science have been applied to existing tests of intelligence, either through experimental analysis of tasks taken from intelligence and other ability tests, or through careful study of the problem-solving or other information-processing characteristics of individuals identified as more or less able by existing tests. In this way, cognitive psychology offers a new source of evidence on the construct validity of tests and the ability factors they define. Second, tests of intelligence and narrow abilities are often used to predict performance in some nontest situation (e.g., conventional schooling). Careful study of the knowledge and processing demands of these criterial performances has led to the development of new measurement strategies and suggestions for the refinement of existing measures (Frederiksen, 1984; Snow & Lohman, 1989). Third, cognitive science has sought to move beyond existing definitions of intelligence grounded in individual differences to develop general theories of thinking and learning. New measures are then developed to estimate particular processes or knowledge structures hypothesized by these theories. Patterns of individual differences on these new measures are then investigated, usually by determining relationships between new measures and scores on existing tests or experimental tasks.
The following section contains a brief review of attempts to understand intelligence through the study of existing tests or ability constructs defined by such tests. Cattell’s theory of fluid and crystallized abilities has had a major impact on these efforts, particularly the theories of Horn (1985), Snow (1981), and Sternberg (1985), and so his theory and recent extensions of it are summarized first. Then, experimental research on four of the major ability constructs identified by Cattell, Horn, and other theorists is summarized. The four constructs are verbal-crystallized (Gc) ability, spatial-visualization (Gv) ability, fluid-reasoning (Gf) ability, and mental speed (Gs).
Controversies about Intelligence
Controversies about the nature of intelligence seem to repeat themselves. Two of the most important controversies relate to the question of whether the general (sometimes called g) factor that is commonly equated with intelligence should be viewed as a psychological entity, or whether it is merely a mathematical abstraction. E. L. Thorndike (see E. L. Thorndike et al., 1926) and Thomson (1920) were early advocates of the view that responses to items on intelligence tests represent a particular sample of mental bonds, and thus intelligence was better understood as a mathematical abstraction than as a psychological entity. Humphreys (1985) gives a modern statement of this view. Spearman, on the other hand, interpreted the g factor as the ability to educe relations and correlates. Sternberg’s (1977) early work on analogical reasoning constitutes a modern version of this view. This controversy has important implications for the potential contributions of cognitive theory to a theory of intelligence. If the ubiquitous general factor is simply a mathematical dimension (Humphreys, 1985), then analyses of tasks used on intelligence tests are unlikely to isolate a particular set of mental processes that are the core of intelligence. In fact, tests that are a good measure of this dimension should be composed of maximally heterogeneous items and thus would be psychologically complex (Humphreys, 1986). However, higher order processes such as coordination of existing routines or assembly of new routines (Snow, 1981) might still emerge across diverse performances (see Butcher, 1968, p. 25).5
The second controversy, often correlated with the first, is whether intelligence is an innate cognitive capacity or, instead, an acquired set of cognitive competencies.6 Hereditarians such as Burt (1958), Terman (1922), Jensen (1980), and Eysenck (1982) argue that good intelligence tests are – or should be – measures of this basic, biologically-based capacity. Others, such as Humphreys (1986) and Cronbach (1972), claim that potential and capacity are pie-in-the-sky concepts with no place in a scientific account of human ability. In fact, both argue that the psychology of individual differences would be well rid of the term intelligence. This controversy is reflected in the search for neurological correlates of intelligence test scores among the hereditarians and perhaps in the search for an explanation of intelligence in terms of structural differences (e.g., capacity of working memory, rate of information transfer in memory) by like-minded cognitive psychologists.7 On the other hand, those who believe that abilities are acquired competencies tend to emphasize the importance of knowledge in thinking (Glaser, 1984), to study the development of abilities rather than attempt to explain individual differences at a particular point in time (Kail & Pellegrino, 1985), and to view intelligence as a product of formal schooling, not simply as a predictor of success in that medium (Snow & Yalow, 1982).
The third perennial controversy concerns the question of whether intelligence is unitary, as Spearman emphasized, or has multiple dimensions, as E. L. Thorndike, Thurstone, and Guilford emphasized.
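The sampling view attributed above to E. L. Thorndike and Thomson, in which each test draws on a sample of many independent mental bonds, can be illustrated with a short simulation. The sketch below is a toy model under stated assumptions and not a reconstruction of either author's formal treatment: the number of bonds, the number of bonds sampled per test, the normal distribution of bond strengths, and all names are hypothetical. It shows only that tests drawing on overlapping random samples of uncorrelated bonds will correlate positively with one another, which is the pattern a general factor summarizes, even though no single underlying ability was built into the model.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def simulate_bonds(n_people=1000, n_bonds=200, n_tests=6, bonds_per_test=80, seed=0):
    """Score each person on tests that each sum a random subset of independent bonds.

    ASSUMPTION: bond strengths are independent standard normal draws, so there
    is no common ability in the model beyond the overlap between test samples.
    """
    rng = random.Random(seed)
    # Each test taps its own random sample of the bond pool.
    subsets = [rng.sample(range(n_bonds), bonds_per_test) for _ in range(n_tests)]
    scores = [[] for _ in range(n_tests)]
    for _ in range(n_people):
        bonds = [rng.gauss(0, 1) for _ in range(n_bonds)]
        for t, subset in enumerate(subsets):
            scores[t].append(sum(bonds[i] for i in subset))
    return scores


scores = simulate_bonds()
n_tests = len(scores)
corrs = [
    pearson(scores[i], scores[j])
    for i in range(n_tests)
    for j in range(i + 1, n_tests)
]
print(f"lowest r = {min(corrs):.2f}, highest r = {max(corrs):.2f}")
```

In this toy model the expected correlation between two tests is the share of bonds they have in common, so larger samples from the same pool produce a stronger apparent general factor. This is one way to see why a statistically robust g does not by itself settle whether g is a psychological entity.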
The Theory of Fluid and Crystallized Abilities It is fitting that the most popular current resolution to the debate between Spearman, Thorndike, and Thurstone about the dimensions of intelligence was proposed by an Englishman who received his PhD under Spearman (in 1929),
Salkind_Chapter 36.indd 9
9/4/2010 10:42:06 AM
10
Curriculum, Instruction and Learning
completed a postdoctoral fellowship under E. L. Thorndike (in 1937), conducted research with both Burt and Thurstone (Cattell, 1971, p. ix), and eventually took up permanent residence in the United States. In 1941, shortly after accepting a position at Harvard, Cattell proposed a quasi-hierarchical model of human abilities with two general factors at the apex (rather than the one advocated by Spearman). Each was defined by several of the primary factors Thurstone had identified. Cattell called these two factors fluid intelligence (Gf ) and crystallized intelligence (Gc). In the earliest published account of the theory, Cattell (1943) argued that fluid ability was “a purely general ability to discriminate and perceive relations between any fundaments, new or old” (p. 178). Fluid ability was hypothesized to increase until adolescence and then slowly decline. It was thought to represent the “action of the whole cortex” (p. 178). Further, fluid intelligence was thought to be the cause of the general factor found among ability tests administered to children and among the “speeded or adaptation-requiring” (p. 178) tests administered to adults. Crystallized intelligence, on the other hand, was thought to consist of “discriminatory habits long established in a particular field” that were originally acquired through the operation of fluid ability but that no longer required “insightful perception” (p. 178). The empirical facts Cattell hoped to explain by this theory were the relative independence of individual differences in speed and power in adult intellectual performance and their different patterns of growth and decline. The important psychological distinction in the theory was between process (fluid intelligence) and product (crystallized intelligence) (Cattell, 1963). The theory of fluid and crystallized ability attracted little attention, possibly because Cattell soon left Harvard for a research professorship at the University of Illinois. 
There he turned away from the study of human abilities and returned to his earlier research interest of applying the methods of factor analysis to the study of personality. He later wrote, “I had not learned . . . that more original and vital ideas than mine have collected dust on bookshelves for lack of exegesis by their parent or some scholarly leader” (Cattell, 1971, p. x). Twenty years were to elapse before Cattell was to return to the theory of fluid and crystallized abilities with new data. In the 1963 formulation of the theory, Gf was hypothesized to reflect the physiological integrity of the organism useful for adapting to novel situations that, when invested in particular learning experiences, produced Gc. Thus, Gf was now hypothesized to be physiologically determined, whereas Gc was “a product of environmentally varying, experientially determined investments of Gf” (Cattell, 1963, p. 4). Although intuitively appealing, the hypothesis that Gf reflects physiological influences and is thus a better measure of the true intelligence of an individual is perhaps the most controversial aspect of the theory. Several prominent theorists accept the fluid-crystallized distinction, and some also subscribe to the investment theory of aptitude. But they do so without assuming that
fluid ability represents something more innate than crystallized ability. For example, Cattell’s student and collaborator, Horn (1976), interpreted Gf simply as “facility in reasoning, particularly in figural or non-word symbolic materials” (p. 445). Cronbach (1977) went even further and argued that “fluid ability is itself an achievement” that reflects the “residue of indirect learning from varied experience” (p. 287). More recently, Horn (1985) echoed the same theme: “There are good reasons to believe that Gf is learned as much as Gc, and that Gc is inherited as much as Gf” (p. 289). Gc, said Horn, reflects individual differences in “acculturation learning” whereas Gf reflects individual differences in “casual learning” and “independent thinking” (Horn, 1985, pp. 289–290). Horn and others point out that, if tests of fluid abilities were somehow better estimates of the physiological integrity of the organism and if achievement tests were more a product of experience, then scores on tests of fluid abilities should show relatively higher heritabilities, which they do not (Horn, 1985; Humphreys, 1981; Scarr & Carter-Saltzman, 1982). These theorists also reject using tests of fluid ability as measures of “capacity” or “potential” against which achievement can be gauged (Cronbach, 1977; Humphreys, 1985; R. L. Thorndike, 1963). On the contrary, some argue that fluid abilities are among the most important products of education and experience (Snow & Yalow, 1982).
Recent Changes in Gf–Gc Theory
The most important change in Gf–Gc theory in recent years has been the addition of several other second-order factors to the model. These developments are summarized somewhat differently by Cattell (1971) and by Horn (1985). Horn identified 10 second-order factors: two deep processing factors (Fluid Ability and Crystallized Ability), three perceptual organization factors (Visualization, Clerical Speed, and Auditory Thinking), three associational processing factors (Short-Term Acquisition and Retrieval, Long-Term Storage and Retrieval, and Correct Decision Speed), and two sensory reception factors (Visual Sensory Detection and Auditory Sensory Detection). Figure 1 shows how these factors can be arrayed along a continuum that progresses from surface to deep processing or from infancy to adulthood. The model is frankly speculative. “I know very little about human abilities,” writes Horn (1985). “All I can do is write articles about them, talk about them, and specify models for them. The more I talk and write and model, the more I realize how little I really know about this complex realm of human functioning” (p. 293). Nevertheless, the model summarizes much of what is known about the organization of human abilities, and it is, in the main, consistent with the abilities Carroll (in press) has thus far identified in his massive review and reanalyses of 60 years of factor-analytic studies of human abilities. Recent research on the four most widely studied broad factors in this model is presented in the next section.
[Figure 1 diagram. Two parallel axes: a developmental hierarchy (Infancy, Childhood, Youth, Adulthood) and an information-processing hierarchy (Sensory reception, Association processing, Perceptual organization, Deep processing). Factors shown, from surface to deep: vSD (visual sensory detectors) and aSD (auditory sensory detectors); SAR (short-term acquisition retrieval), TSR (long-term storage retrieval), and CDS (correct decision speed); Gv (broad visualization), Gs (clerical speed), and Ga (broad auditory thinking); Gf (fluid ability) and Gc (crystallized ability). Other labels in the diagram: Sensorimotor circular activities, Awareness, Dealing with visual novelty, Reaction time, Relation eduction, Intensional Extensional Knowledge.]
Note: From “Remodeling Old Models of Intelligence” by J. L. Horn in B. B. Wolman (Ed.), Handbook of Intelligence (p. 295). New York: John Wiley & Sons. Copyright 1985 by John Wiley & Sons, Inc. Reprinted by permission.
Figure 1: A model of ability organization within developmental and information processing hierarchies
Unpacking Existing Tests and Constructs
Tests of Fluid and Crystallized Abilities
Tests of fluid ability require novel problem solving, much like many of the intelligence tests developed during the first half of the century – particularly the so-called nonverbal or performance tests such as matrices or block design. These tests require subjects to reason with moderately novel figural or symbolic stimuli. For this reason, complex spatial tests often load strongly on the Gf factor (Lohman, 1979). Span tests and other measures of what Jensen (1969) calls Level I ability also often load significantly on the Gf factor (Horn, 1985). Tests of crystallized ability, on the other hand, require the examinee to display an understanding of concepts and skills taught in some domain, particularly in school. Verbal knowledge and skills are emphasized, although numerical computation and mechanical knowledge tests often load significantly on Gc factors. Recently the Stanford-Binet was revised along the lines of the theory of fluid and crystallized abilities. The particular version of Gf–Gc theory on which the new Stanford-Binet is based combines the hierarchical model of intelligence of Vernon (1950) and the quasi-hierarchical model of intelligence of Cattell (1963). The three-level hierarchy includes a General Reasoning factor, G, at the top. Three broad group factors – Crystallized Abilities, Fluid-Analytic Abilities, and Short-Term Memory – constitute the second level. Three more specific factors make up the third level. G is interpreted “as consisting of the cognitive assembly and control processes that an individual uses to organize adaptive strategies for solving novel problems” (R. L. Thorndike, Hagen, & Sattler, 1986, p. 3). Thus, the authors adopt Snow’s (1981) definition of Gf as their definition of G. This is a reasonable equation since the Gf factor is invariably highly (Cattell, 1971; Lohman, 1979), or even perfectly (Gustafsson, 1984), correlated with G.
Crystallized abilities are represented by both verbal and quantitative reasoning tasks. These abilities “are greatly influenced by schooling, but they are also developed by more general experiences outside of school” (p. 4). Fluid-analytic abilities are estimated by figural and spatial tasks. Fluid abilities are thought to involve “the flexible reassembly of existing strategies to deal with novel situations.” Further, the authors acknowledge that these abilities are also developed, but they are developed from more general experiences than schooling. Finally, the Short-Term Memory factor is represented by tests requiring memory for beads, sentences, digits, or objects. Thus, the new Stanford-Binet attempts to fit old tasks into a more recent theory of intelligence. But do we really understand these tasks well enough to defend the inference that different combinations of them reflect different abilities? What happens when we try to look at the processes subjects use when solving test items or when acquiring the knowledge they sample? In other words, is it possible to develop process theories of abilities?
Verbal-Crystallized Ability
Specific verbal processes. Verbal abilities hold a prominent place in all theories of intelligence. It is not surprising, then, that some of the first efforts to understand intelligence in terms of cognitive processes focused on verbal abilities. Hunt and his colleagues have reported several studies of the information-processing characteristics of subjects who differed in verbal-crystallized abilities. Their work is of particular interest because it deals with an important facet of intelligence and because it shows the strengths and weaknesses of both the newer cognitive-experimental approach and the traditional correlational approach to the study of intelligence. The aim of this line of research is aptly summarized in the question, “What does it mean to be high verbal?” which was the title of a report by Hunt, Lunneborg, and Lewis (1975). The method used in this and several other studies was to select college students with extremely high or low scores on the verbal section of a college entrance examination, to administer to these subjects a battery of presumably well understood experimental tasks, to estimate information-processing scores for each subject on each experimental task, and then to relate these scores to scores on the reference verbal-ability tests using some type of correlational analysis. For example, in one experimental task, subjects were required to compare pairs of letters of the alphabet, and to respond “yes,” if the two letters were physically identical (as in “aa” or “AA”), or “no,” if they were different (as in “aA” or “ab”). In a second task, similar pairs of letters were presented, but this time pairs were to be judged according to their names.
Thus, in Task 1, the correct answer to the pair “Aa” would be “no,” whereas in Task 2, the correct answer would be “yes.” An information-processing model for Task 1 (Physical Comparison) would posit processes for encoding the appearance of the two letters, comparing these representations, and then responding. A model for Task 2 (Name Comparison) would include all of the processes required by Task 1 plus an additional process to retrieve the name codes. Thus, the difference between the time to respond to a given pair of letters in Task 2 and the same pair of letters in Task 1 provides an estimate of the time needed to perform this additional process. The resulting score is called the NI–PI (Name Identity minus Physical Identity) difference and has been widely studied as a measure of the speed of accessing overlearned name codes. Correlations between the NI–PI score and measures of verbal comprehension are typically about r = −.3, suggesting that subjects high in verbal ability access name codes faster than subjects low in verbal ability.⁸ These and other results are consistent with both a hierarchical model of human abilities and with current theories of the way knowledge is represented in memory. In particular, the information-processing tasks used by Hunt et al. (1975) appear to measure specific verbal abilities found in the lower branches of hierarchical models of abilities. Performance on many of these tasks depends on the subject’s ability (a) to produce a rapid, fluent
response and/or (b) to remember the order in which information was presented. This latter ability is sometimes represented in models of memory by a special type of memory code called a linear order (Anderson, 1983). Such a code preserves the sequential structure of an event: what came first, then next, then next, and last. Spelling tests require this sort of memory code; one must not only remember the correct letters but also their proper sequence. Similarly, sequencing arbitrary phonemes into words, such as when learning a new language, or sequencing arbitrary words into strings of words, such as when memorizing the names of the letters of the alphabet, days of the week, or lines in a poem, seems to depend in part on the ability to code information in this way. Research relating scores on experimental tasks to scores on verbal ability tests also has revealed important limitations in efforts to generalize from laboratory tasks to test behavior. First, seemingly simple experimental tasks can measure different abilities in different subjects. For example, Hunt and others (see, e.g., Hunt, Lunneborg, & Lewis, 1975) have used a sentence verification task in which subjects are shown a phrase such as “star above plus” and a picture which either conforms with or contradicts the sentence. Subjects must determine whether the picture and sentence agree. However, minor variations in procedure can substantially alter the way subjects solve this task (Glushko & Cooper, 1978). More importantly, in any given procedure, subjects can differ in the way they solve the task: some create a mental picture from the phrase and compare it with the picture, and some convert the picture to a verbal description and compare that description with the phrase (MacLeod, Hunt, & Mathews, 1978). A second limitation stems from the low correlations between scores representing particular information processes on experimental tasks and scores on reference tests of verbal abilities.
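The name-identity minus physical-identity difference score described above, and its correlation with a reference verbal test, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `identity_difference` and `pearson_r`, the four subjects, and all reaction times and verbal scores are invented and are not data from Hunt et al. (1975). The invented data are far tidier than real data, so only the sign of the resulting correlation is meaningful, not its size.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


def identity_difference(name_rts, physical_rts):
    """Mean name-comparison RT minus mean physical-comparison RT, in ms."""
    mean_name = sum(name_rts) / len(name_rts)
    mean_physical = sum(physical_rts) / len(physical_rts)
    return mean_name - mean_physical


# Hypothetical subjects: (verbal test score, name-task RTs, physical-task RTs).
subjects = [
    (700, [560, 580, 570], [500, 510, 505]),
    (650, [590, 600, 610], [525, 530, 535]),
    (550, [640, 650, 660], [560, 565, 570]),
    (450, [690, 700, 710], [600, 605, 610]),
]

verbal_scores = [verbal for verbal, _, _ in subjects]
differences = [identity_difference(name, phys) for _, name, phys in subjects]

# A negative r means higher verbal scores go with smaller difference scores,
# that is, with faster access to name codes.
r = pearson_r(verbal_scores, differences)
print(differences)
print(r < 0)
```

Because the difference score subtracts one noisy mean from another, it is usually less reliable than either mean alone, which is one reason correlations with reference tests tend to be modest.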
Keating and MacLean (1987) argue that the main contribution of the information-processing approach to the analysis of intelligence is that it permits investigators to identify particular mental processes such as rate of rotation or speed of lexical access. The value of the process approach diminishes quickly when these parameters show low correlations with other measures or with similarly labeled parameters derived from other tasks. Keating and MacLean are particularly critical of studies in which Hunt abandoned process parameters and instead defined latent “process factors” based on correlations among total reaction time (RT) or errors on experimental tasks. Using composite indices in this way, they claim, comes close to “dismissing the logic of the original cognitive correlates approach” (p. 259). Such composite indices cannot be used to “explain” composite indices computed in the same way on ability tests. Part of the confusion here surely stems from different expectations about what process parameters represent. It is commonly assumed that, by fitting an information-processing model to a task and by decomposing a composite index (total correct or total latency) into component indices, one has also
decomposed individual differences on the task into cleaner components. This is not the case. Actually, individual differences in component scores (e.g., rate of rotation) salvage individual differences relegated to the error term when performance for each individual is summarized in a single score such as number of problems solved correctly, or mean response latency. Recapturing variance from the error term might be a profitable activity but only when items on the task show poor internal consistency. Even then, it must be recognized that such scores do not represent a decomposition of the individual differences variance reflected in total or average scores. Low correlations between scores thought to represent particular verbal processes and reference verbal-ability tests may also mean that much of the knowledge or some of the cognitive processes that account for general crystallized abilities (Gc) as measured by tests are not required by the experimental tasks. Experimental tasks in which subjects are required to infer the meaning of unfamiliar words from context sometimes show much higher correlations than do simple laboratory tasks with both Gc scores and general reasoning scores (Sternberg & Powell, 1983). This suggests that the low correlations obtained by Hunt et al. (1975) may estimate the contribution of specific verbal processes to Gc. Much of the remaining variability in Gc is better attributed to the ability to apply general reasoning skills and prior knowledge to the task of understanding verbal material and learning from it. Reading comprehension. Nowhere is this interdependence of specific component processes, general reasoning abilities, and prior knowledge better demonstrated than in reading. Reading comprehension is highly correlated with general verbal abilities, particularly in school-age populations. 
Thus, research on reading comprehension not only illuminates an important aspect of Gc but also shows how diagnostically useful tests can be derived from theory and how studies of individual differences can in turn reveal needed changes in the theory. J. R. Frederiksen’s (1982) work is perhaps the best example of this reciprocity. Frederiksen began by developing a general model of reading from his own research and that of many other investigators. He eventually distinguished three types of information-processing skills used in reading: word-analysis processes (e.g., encoding single- and multiletter units, using phonics skills), discourse analysis processes (e.g., retrieving word meanings, resolving problems of reference), and integrative processes (e.g., combining information from pictures and text). Frederiksen then constructed a test battery to measure some of these skills. Measures were validated by using both experimental and correlational techniques. Later, training tasks were devised to assist poor readers in acquiring deficient skills. Other theories of reading ability have been advanced in recent years. For example, Perfetti (1986) distinguishes three types of component processes in his theory: lexical access, proposition encoding, and text modeling. Lexical access refers to the process by which word meanings are activated in long-term memory. Individual word meanings are then combined and retained in
working memory in predicate-like structures called propositions. These in turn are combined with the reader’s prior schematic knowledge to form a text model. This model, then, represents the reader’s understanding of the text. Kintsch (1986), in another theory of text comprehension, argues that two types of mental models must be coordinated: a text model, which contains the reader’s representation of the propositions embedded in the text, and a situation model, which might be a mental image of the situation described by the text. For example, in following directions to assemble a toy, the text model might represent the ideas implied by the words, “Attach wheel K to spindle Q using two 5/16 washers and a large hex nut.” The situation mental model might be represented by an image of what one is supposed to do. Pictures, illustrations, good description, metaphor, and analogy facilitate the generation of good situation models. A well structured text that follows a familiar schema and uses familiar words facilitates the construction of a coherent text model. Mental models may be an important link in the individual difference equation as well. A central problem in the definition of verbal abilities has been the overlap between measures of reasoning abilities and measures of verbal comprehension. However, theories of reasoning (Holland, Holyoak, Nisbett, & Thagard, 1987; Johnson-Laird, 1983) also emphasize the construction and the coordination of mental models. Thus, process analyses reveal commonalities between tasks (and the ability constructs they define) not apparent in armchair analyses. A similar argument may account for the high correlation between reasoning and vocabulary scores. The meaning of an unfamiliar word is usually inferred from the contexts in which the word has been embedded (Daalen-Kapteijns & Elshout-Mohr, 1981; Marshalek, 1981; Sternberg & Powell, 1983).
This process is most successful when the learner generates a good schema (or model or working hypothesis) about the meaning of an unfamiliar word when it is first encountered. This schema can then be confirmed or contradicted by evidence from subsequent contexts. Low-verbal subjects are less likely to use this strategy than are high-verbal subjects. Thus, vocabulary tests that use abstract words (i.e., words whose meanings are difficult to infer from a single context) show higher correlations with reasoning than do vocabulary tests of comparable difficulty composed of infrequent words (Marshalek, 1981).
Spatial-Visualization Ability
Spatial tasks have long been used as psychological tests. Before 1915, Porteus had used such “performance” tasks to estimate the intelligence of linguistically different or disabled examinees. Spearman also originally used such “performance” tests as a measure of g, a tradition he attributes to Itard (1801, cited
in Spearman & Wynn Jones, 1950). Spatial tasks also figured prominently in the Army Beta examinations of World War I. However, beginning with Kelley (1928) and then El Koussy (1935), such tasks were studied in their own right, and several specific spatial abilities were identified (Smith, 1964). Nevertheless, spatial or figural reasoning tasks have continued in their role as measures of general abilities, particularly Gf. As with verbal abilities, cognitive research on spatial abilities may be divided into (a) attempts to develop general theories of spatial thinking that ignore individual differences (e.g., Pinker, 1984; Shepard & Cooper, 1982), and (b) attempts to explain individual differences on existing tests of spatial abilities, either through correlations between scores on spatial tests and performance on laboratory tasks or through the construction of information-processing models for particular spatial tests. In contrast to recent research on verbal abilities, however, only a few studies have examined correlations between scores from laboratory tasks and scores from existing tests. Instead, most effort has been directed toward attempts to build information-processing models that describe how subjects solve particular spatial tests (see, e.g., Pellegrino & Kail, 1982). This is because most spatial tests are process-intensive in the same way that most verbal tests are knowledge-intensive. In other words, although some interesting processing occurs when subjects take a vocabulary test (Sternberg & McNamara, 1985), most of the complex processing occurred at the time the words were learned. Conversely, although spatial knowledge has an important impact on spatial problem solving (Lohman, 1988), whether subjects solve such problems depends heavily on the processes they employ during the test.
Theories of spatial thinking (e.g., Kosslyn, 1980) distinguish two types of spatial knowledge: knowledge best modeled by quasi-pictorial mental representations (e.g., appearance of a particular object) and knowledge best modeled by abstract, proposition-based memory representations (concepts of symmetry, proportionality, closure, etc.). Each type of representation can be transformed by a different class of mental operators or procedural knowledge. Quasi-pictorial representations can be subjected to various analog transformations such as a rotation or synthesis (Shepard & Cooper, 1982). Propositional representations can be subjected to the same general and specific cognitive operators (e.g., means-ends analysis) that can be applied to propositional knowledge derived from other sources (e.g., linguistic inputs). Transformations such as rotation, then, are of interest primarily for the constraints they place on the type of mental representation used. Thus, many spatial-ability tests present items which seem to require for their solution analog transformations such as rotation, reflection, transposition, or synthesis. Research on how subjects solve spatial tests has turned up several surprises. One persistent finding has been that all subjects rarely solve figural tasks in the same way. For example, in a series of experiments on visual comparison processes, Cooper (1982) identified two markedly different strategies. Some
subjects appeared to rely on a serial, analytic process to compare forms whereas others relied on a parallel, holistic process. Complex tasks – such as the paper-folding tasks or form-board tasks commonly seen in mental tests – elicit an even wider range of alternative solution methods. Some subjects solve items on such tests by generating mental images that they then transform holistically. These high-spatial subjects excel in generating, retaining, and transforming mental representations that preserve information about the configuration of a figure. They also use their spatial knowledge to decompose unfamiliar visual shapes into simpler, more familiar shapes. Other subjects rely on general reasoning skills or external aids (such as line drawings) to solve problems. Others use still different processes. But most subjects use more than one type of processing, generally shifting from one strategy to another as problems increase in difficulty (Lohman, 1988). Such within-subject variability in solution strategy challenges simple information-processing models of spatial tests. Strategy shifting may partially explain why complex spatial tests are often good measures of g or Gf. Appropriate flexibility in adapting solution methods to meet personal limitations and changing item demands appears to be a central aspect of any process theory of Gf (Snow & Lohman, 1989).
Fluid-Reasoning Ability
There has been considerably more research on reasoning or general fluid ability than on either general crystallized or general visualization abilities. However, attempts to understand how subjects solve Gf tasks such as analogies, classification, and series completion that have ignored differences in processing strategy (by averaging over items) or reduced the need for alternative strategies (by drastically simplifying items) have generally produced experimental tasks that show little relationship with scores on reference Gf tests. Put another way, simple items that are all solved in the same way by all subjects probably require little of what we call intelligence. The effects of simplifying a complex task so that it could be studied experimentally and ignoring within-person strategy shifts were perhaps most evident in Sternberg’s (1977) first investigation of analogical reasoning. Sternberg hypothesized that subjects use several different or “component” processes when solving analogies such as “Up is to down as left is to (a) back (b) right” or A:B::C:D1, D2. According to Sternberg’s theory, subjects (a) first read and understand each term in the analogy (encoding), (b) determine the relationship between the A and B terms (inference), (c) infer the relationship between the A and C terms (mapping), (d) generate an ideal answer by applying the A-B relationship to C (application), and (e) compare their ideal answer with the options provided (comparison). If none of the presented options meet the subjects’ criterion for acceptability, they then recycle through some or all
of the preceding steps (justification) and finally choose an option and respond (response). Component processes were assumed to be executed serially. Different models were then formulated by deleting particular processes (e.g., mapping, justification) and by specifying different modes of execution for a given process (e.g., self-terminating or exhaustive). Three important results were obtained. First, models were quite successful in accounting for variabilities in response latencies and, to a lesser extent, in response errors. Second, the data from most subjects were well fitted by a single model, suggesting that most subjects used the same strategy. Third, estimates of speed of executing particular component operations showed small and inconsistent relationships with reference reasoning tests. Unexpectedly, the highest correlations were observed for the preparation-response component. Thus, the componential analysis appeared successful, but those components hypothesized to reflect the essence of reasoning seemed not to measure reasoning at all. Later studies in which better practiced subjects attempted more complex items did show significant correlations between component scores and scores on reasoning tests (Bethell-Fox, Lohman, & Snow, 1984; Sternberg & Gardner, 1983). It appears that problems must be more than trivially difficult before individual differences in reasoning are observed. Further, items must also vary somewhat in the processing demands they place on examinees.⁹ This means that problems must be moderately novel. Novelty is an ancient theme in the psychology of individual differences. From Stern (1912/1914) to Sternberg (1985), theorists have argued that intelligence is best displayed when tasks are relatively novel. Cognitive psychologists are only beginning to understand how subjects transfer prior learning to analogous situations (Gick & Holyoak, 1983).
The problem, of course, is that what is novel for one person may not be novel for another person or even for the same person at a different time. It appears that inferences about how subjects solve items that require higher level processing must be probabilistic, since the novelty of each item varies for each person. Snow (1981) has integrated these and other research results in the following hypothesis on the nature of fluid and crystallized abilities.
Gc may represent prior assemblies of performance processes retrieved as a system and applied anew in instructional or other performance situations not unlike those experienced in the past, while Gf may represent new assemblies of performance processes needed for more extreme adaptations to novel situations. The distinction is between long-term assembly for transfer to familiar situation vs. short-term assembly for transfer to unfamiliar situations. Both functions develop through exercise, and perhaps both can be understood as variations on a central production system development. (p. 360)
The point about “exercise” derives from E. L. Thorndike’s (1903) theory of learning whereas the point about “production system” derives from the ACT* model of Anderson (1983), which is discussed later.
Mental Speed
The fourth and last broad factor in Horn’s (1985) model that will be examined here is sometimes called General Speed, sometimes Clerical Speed, or sometimes simply, Mental Speed. There is a new interest in this construct, whatever it is called. However, like most other ability constructs, mental speed has a long history in educational and psychological measurement. E. L. Thorndike, Spearman, and Thurstone all addressed the question of whether mental speed should be distinguished from power (or altitude). For example, although mental speed was one of the four dimensions of his model of intelligence, E. L. Thorndike considered speed less important than altitude (see E. L. Thorndike et al., 1926). On the other hand, Spearman (1927), citing studies which showed high correlations between scores on a time limit test and scores on the same test after an extended period of time, concluded (erroneously) that speed and power (or altitude) were interchangeable. Thurstone (1937) proposed a three-dimensional model that related ability, speed, and motivation. Like E. L. Thorndike, he defined ability in terms of power or altitude in his model (although many of the ability factors he identified in his empirical studies were based on simple, highly speeded tests). Individual differences in mental speed have been studied in several paradigms, two of which are summarized here. Research in the first paradigm at first sought to estimate the subjects’ “natural” rate of thinking (Hunsicker, 1925). This search led to the identification of several personality factors such as Carefulness, Persistence, and Impulsivity that described subjects’ typical trade-off between speed and accuracy. It also led to the identification of several cognitive speed factors, such as Perceptual Speed, Clerical Speed, and eventually, to claims of a General Speed factor.
Research in the second paradigm, which may be traced back to Galton (1869), has sought to define intelligence as a physiological rather than as a psychological or sociocultural construct. Thus, the aim is to determine the integrity and efficiency of neurological mechanisms thought to underlie intelligent thought and action. Preferred indicators of intelligence in this paradigm are measures of sensory acuity, speed of detecting a stimulus or discriminating between two stimuli, and, in more recent work, patterns in recordings of electrical activity in the brain. Correlations are then computed between these measures and more global indices of intelligence, such as teacher ratings, course grades, or scores on existing intelligence tests. Work in this paradigm had hardly begun when it was abandoned by most psychologists, partly because of studies like that of Wissler (1901), but perhaps in larger measure because of the success of Binet’s test. Wissler, working under the direction of James McKeen Cattell at Columbia (who had in turn worked with Galton for a short time), found that a measure of RT was uncorrelated with grade point average in a sample of university students. The RT paradigm has recently been revived by Jensen, Eysenck, and others.
Speed factors. Variation in the relative emphasis tests placed on speed or level of performance is an important confound in much of the literature on human abilities. The primary factors identified by Thurstone and his followers, particularly Guilford, were often defined by tests that contained simple, similar, highly speeded items. Complex versions of the same tests administered under conditions which emphasize level or altitude invariably show stronger loadings on the general factor and little evidence of the fractionalization of ability that occurs when simple, speeded tests are administered (Lohman, 1979). This is because individual differences in the speed with which subjects can solve relatively simple problems in a domain show only weak correlation with the complexity of a problem of the same type which subjects can solve when time is not a factor (Horn, 1985; Kyllonen, 1985). The question remains, though, whether some or all of these various speed primaries may define a higher order or General Speed factor. Although several investigators have claimed to have identified a General Speed factor, closer examination shows that such factors are often little more than overblown Clerical Speed or Perceptual Speed factors. General differences in speed of processing may well exist, but they are difficult to identify by factor analyzing speed scores from a battery of tests. The major reasons are that one cannot make unambiguous comparisons of response latencies across individuals unless (a) all subjects correctly solve all items, (b) all subjects adopt the same trade-off between speed and accuracy, and (c) neither of these factors varies systematically across tasks. One way to avoid these problems would be to use a single task that is so simple that everyone can solve it and that is not much influenced by the individual’s decision to emphasize speed or accuracy. Recent studies of reaction time aim to fit both of these criteria.
Recent research on reaction time. The primary dependent measure in much cognitive research is response latency, usually on simple tasks. Those who study individual differences raised the question of whether individual differences in latencies on these laboratory tasks would show any relationship with individual differences on other tasks that presumably required the same processes (Underwood, 1975) or with ability variables commonly assessed by mental tests (Hunt et al., 1973; Snow et al., 1976). But the main goal of researchers like Hunt, Snow, and Sternberg was to develop and test information-processing models of theoretically interesting cognitive tasks or of tests commonly used to estimate important ability constructs, not to propose new measures of mental speed. However, this was precisely the goal of another group of researchers. Led by Jensen in the United States and Eysenck in the United Kingdom, these researchers saw possibilities for new measures of intelligence in response latencies on simple tasks and other indices of cognitive efficiency presumably unaffected by intention or experience.
Jensen’s work. Jensen sparked new interest in the relationship between RT and G (intelligence) by showing significant correlations between choice (or discrimination) RT and measures of G. Jensen’s work has generated much
discussion. In part this is because his goal seems to be to isolate a culture-free measure of intelligence. Individual and group differences on such a measure could then not be interpreted “as reflecting only differences in cognitive contents and skills that persons have chanced to learn in school or acquire in a cultured home” (Jensen, 1980, p. 704). The apparatus Jensen has used in his studies contains a center “home button” surrounded by 8 light/button pairs. Different light/button pairs can be covered to manipulate the number of stimulus–response pairs between 1 and 8. The task is to hold a finger on the home button until one of the exposed lights is activated and then turn it off as quickly as possible by moving the finger from the home button to the button directly below the activated light. Two time intervals are recorded: (a) the time between the onset of the stimulus light and the release of the home button (called RT), and (b) the additional time required to move the finger to the button below the activated light and press it (called movement time). In a typical experiment, subjects receive a few practice trials, followed by 15 trials at each of four levels of task complexity: 1, 2, 4, or 8 light/button pairs exposed. Typically, RT increases linearly with the log of the number of buttons exposed. Jensen found that the slope of this function, which is taken as an estimate of the rate at which a person processes a single unit of information, and G correlated negatively, with r = −.41 being the most often cited correlation. In addition, the correlation between RT and G increases as task complexity is increased from 1 to 8 light/button pairs, suggesting that the greater the information-processing burden, the greater the demand on G. Jensen’s work has been praised by some (e.g., Eysenck, 1982) and criticized by others (e.g., Longstreth, 1984; Carroll, 1987).
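The slope measure described above can be illustrated with a small sketch. It assumes a base-2 logarithm, so that 1, 2, 4, and 8 exposed pairs correspond to 0, 1, 2, and 3 bits (the passage says only "the log"; this linear relation is commonly known as Hick's law). The median RT values are invented for illustration and are not Jensen's data, and `fit_line` is a hypothetical helper name.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# The four complexity levels named in the text.
set_sizes = [1, 2, 4, 8]
# Assumption: information is measured in bits, i.e. log base 2.
bits = [math.log2(n) for n in set_sizes]

# Hypothetical median RTs (ms) for one subject; illustrative only.
median_rt_ms = [300.0, 327.0, 354.0, 381.0]

intercept, slope = fit_line(bits, median_rt_ms)
print(f"intercept = {intercept:.1f} ms, slope = {slope:.1f} ms per bit")
```

Under this reading, each subject contributes one slope (ms per bit), and it is these per-subject slopes that would be correlated with G across subjects.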
In particular, Jensen’s claim that performance on the choice RT task is not influenced by practice, motivation, or instructions to alter speed-accuracy trade-off has been questioned (Carroll, 1987; Longstreth, 1984). Longstreth also raises a number of fundamental questions about Jensen’s procedure, such as the routine confounding of practice with task complexity. Carroll questions the replicability and interpretation of Jensen’s results. He suggests that differences between individuals in average RT may better be described as differences in the variability in RT for a given person over trials. This is because RTs have a lower limit, and thus individuals with more variable RTs would tend to have higher mean RTs because they are more likely to deviate upward from the lower limit. This suggests that the observed correlation between RT and G may in part reflect differences in attentional control and not simply differences in the speed of neural conduction or the rate of neural oscillation, as Jensen hypothesizes. Attempts to replicate Jensen’s findings usually find some relationship between RT and G (most often between the variability of RTs for individual subjects and G, with lower G subjects having more variable RTs). But replications consistently fail to find that low G subjects show greater increases in RT
with increases in the number of exposed light/button pairs than do high G subjects (Barrett, Eysenck, & Lucking, 1986; Carlson, C. M. Jensen, & Widaman, 1983; Jensen, 1987). Although controversy about Jensen’s work continues, there is some consensus on the main findings. First, correlation between G and RT is generally somewhat lower for the simple RT condition (one light/button pair exposed) than for the discriminative RT conditions (two or more light/button pairs exposed). Second, correlations between discrimination RT and G vary widely. However, replicable correlations are generally in the −.2 to −.4 range. Conditions with more light/button pairs (e.g., 8) do not yield dependably higher correlations with G than conditions with fewer light/button pairs (e.g., 2). Indeed, it is a common finding that correlations between RT and G decline as more and more complex information processing is required. More complex tasks allow multiple strategies and are prone to differences in the speed–accuracy trade-off subjects adopt. Third, the variability in RT over trials often correlates as highly with G as does mean or median RT. Thus, attention control (or, conversely, distractibility) may be as important as speed of processing in this task. Fourth, Jensen’s claim that RT increases linearly with the log of the number of exposed light/button pairs has been repeatedly confirmed. However, other investigators have not been able to confirm his claim that individual differences in the slope of this line correlate with G. It is unclear whether this is due to persistent methodological inadequacies in these studies (which usually follow Jensen’s procedures), as Longstreth (1984) notes, or whether this reflects a more fundamental error in Jensen’s theory, as Eysenck (1987b) now claims.
Eysenck’s work. Eysenck (1982; 1988) has proposed a theory of intelligence with an even stronger physiological flavor.
Following Hebb (1949), Eysenck (1988) distinguished among biological intelligence, psychometric intelligence, and social intelligence. Biological intelligence “refers to the structure of the human brain, its physiology, biochemistry, and genetics which are responsible for the possibility of intelligent action” (p. 3). Eysenck considers biological intelligence to be the purest, most fundamental intelligence because it is “least adulterated by social factors.” He claims it can be measured by the electroencephalogram (EEG), evoked potentials, galvanic skin responses, and perhaps reaction times. Psychometric intelligence is defined as that intelligence which is measured by psychometric tests. In addition to the core of biological intelligence, it is determined by cultural factors, education, family upbringing, and socioeconomic status. However, since only a fraction of the variance in psychometric intelligence (i.e., IQ) can be attributed to genetic factors (Eysenck estimates 70%), IQ should not be confused with biological intelligence. Social intelligence reflects the ability to solve problems an individual encounters in life. But since so many noncognitive factors are reflected in such performances, Eysenck (1988) argues that “social intelligence is far too
inclusive a concept to have any kind of scientific meaning” (p. 45). Thus, for Eysenck, intelligence is a concept that is best studied at the physiological (or even neurological) level, only indirectly represented in intelligence tests, and obscured almost entirely in performances in the real world. This is an extreme view and is not widely shared, at least not by American academics. As with Jensen’s work, much of the controversy surrounding Eysenck’s work has centered not so much on the finding of significant correlations between G and EEGs, cortical evoked potentials, and other physiological indices but on the reported magnitude of the correlations. For example, Eysenck’s colleague, Hendrickson (1982), reported a correlation of r = .83 between a measure of evoked potentials and Wechsler IQ for a sample of 219 15-year-old children. In 1984, Eysenck claimed that “several replications . . . have shown the results are essentially reproducible” and that these results were “a most important validation of Galton’s concept” of intelligence (published in Eysenck, 1987a, p. 359). However, by 1988, presumably on the basis of new evidence, Eysenck had changed his mind. “It seems unlikely that the correlation between IQ and a physiological measurement of biological intelligence . . . can exceed the square root of the heritability of IQ,” and thus correlations such as those obtained by Hendrickson (1982) are “inherently improbable and unlikely to be replicated” (Eysenck, 1988, p. 12).
Inspection time. A similar history attends the reports on correlations between inspection time and IQ. Inspection time is the minimum duration for which two different stimuli must be presented if they are to be perceived as different. Nettelbeck and Lally (1976) reported a correlation of r = −.92 between the Wechsler Adult Intelligence Scale performance scale and inspection time, but for a sample of only 10 subjects, 2 of whom were retarded.
The magnitude of the reported correlations gradually declined as larger and less wide-ranging samples were tested. By 1984, Irwin reported correlations of r = −.32 and r = −.09 for auditory and visual inspection times with a verbal intelligence test and correlations of r = −.23 and r = −.27 for those same inspection times with a nonverbal intelligence test for a sample of 50 12-year-old children. In the meantime, Nettelbeck and Kirby (1983) had gathered new data on a large sample of adults and had reanalyzed data from one of their earlier studies. This time they found no correlation between G and slope in the Jensen task and a weak correlation between inspection time and G (r = −.3) when retarded subjects were excluded. They therefore concluded that their earlier correlations had been inflated by the inclusion of retarded subjects, who were “markedly less efficient” (p. 39) on these tasks. Their conclusions run completely counter to earlier claims:
This outcome raised doubt about the validity of combining data from retarded and nonretarded subjects. Our results ran counter to claims that tasks of the kind used [in this study] are largely uninfluenced by cognitive
variables [such as strategy], so that findings are not necessarily explained satisfactorily in terms of a mental speed factor. These measures of timed performance do not, at this time, provide a basis from which a reliable, culture-fair measure of intelligence might be devised. (p. 39)
Summary. Critics of studies that report correlations between measures such as RT, inspection time, evoked potentials, and G cynically argue that the best predictor of the correlation obtained is the date of the study. The first correlation reported is usually strikingly high, but then the magnitude of the reported correlation declines almost linearly with year of publication, eventually stabilizing on a value in the −.1 to −.4 range. Such correlations are theoretically interesting, but they do not justify attempts to replace existing intelligence tests with RT measures, or interpretations of G as a purely physiological phenomenon. One need not descend to the level of neurons to find a plausible account of the role of mental speed in models of intelligence. For example, the rate at which activation spreads through regions of memory, the rate at which an activated memory loses its activation, and the level of activation needed to allow further processing are all important constructs in modern theories of memory (Anderson, 1983). Direct study of these variables would seem more useful than the study of isolated tasks that have not been designed to estimate specific cognitive processes. Even then, variables thought to reflect the physiological action of the cortex are useful only to the extent that they predict individual differences in behavior labeled “intelligent” in the culture. E. L. Thorndike saw this clearly:
Psychologists would of course assume that differences in intelligence are due to differences histological or physiological, or both, and would expect these physical bases of intelligence to be measurable . . . . [However], even if one aimed at discovering the physiological basis of intellect and measuring it in physiological units, one would have to begin by measuring the intellectual products produced by it. For our only means of discovering physiological bases is search for the physiological factors which correspond to intellectual production. (E. L. Thorndike et al., 1926, p. 12)
Individual differences in mental speed have an important impact on all of cognition. But neither theory nor empirical evidence justifies attempts to define G in terms of speed, while ignoring the larger contributions of level or altitude in both process and knowledge to this construct we call intelligence.
Attempts to Move Beyond Existing Tests
It has long been recognized that theories of human intelligence have been limited by the selection of tasks included in particular intelligence tests or in factor-analytic studies of abilities. Several theorists (e.g., Cattell, 1971;
Guilford, 1959) have proposed schemes for defining the universe of intelligent behaviors, cognitive functions or tasks. The framework can then be used to select or construct tests of different facets of intelligence. In this section, I briefly survey two rational models of this sort: Guilford’s (1959, 1985) structure of the intellect (SOI) model and Sternberg’s (1985) triarchic theory of intelligence.
Guilford’s SOI Model
As director of the Aviation Psychology Research Unit during World War II, Guilford saw the number of factorially defined abilities grow as tests were developed to measure abilities hypothesized to be important in the training and performance of air crews. After the war, he continued to investigate new abilities in his Aptitudes Research Project at the University of Southern California. By the mid-1950s, approximately 40 ability factors had been identified in one or both of these efforts (Guilford, 1985). In searching for a way to organize these factors and guide the search for new abilities, Guilford hit upon the idea of grouping abilities by a three-way classification: by the kind of mental process required, by the kind of information processed, and by the mental products generated. The combination of five types of mental processes, four types of content, and six types of product defined the 120 abilities in the structure of the intellect model.¹⁰
Although the model has generated considerable research, it has declined in influence in recent years. Questions have been raised about the factor-analytic methods used to identify factors (Horn & Knapp, 1973), about the seeming fractionation of ability (McNemar, 1964, called the scheme “scatter-brained”), and about the adequacy of the SOI model itself. Some of these challenges have been countered. Elshout, van Hemert, and van Hemert (1975) showed that Guilford’s Procrustean factor-analytic methods were not as bad as Horn and Knapp (1973) had claimed. Following Humphreys’ (1962) suggestion, Guilford (1985) countered criticisms of fractionation by agreeing that higher order abilities may be defined by averaging over cells within the SOI model. In addition, he countered objections that the model did not include auditory abilities by adding another level to the content facet for auditory abilities, raising the total number of cells in the model from 120 to 150.
Nevertheless, levels of facets have no convincing foundation other than rational appeal, and the entire product dimension remains poorly validated (Cronbach & Snow, 1977). Excepting the addition of 30 new auditory abilities, over 20 years of research has produced no substantive changes in the model. Perhaps this is because research sought to demonstrate the validity of the model rather than to identify and correct its weaknesses.
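The cell counts cited for the SOI model follow directly from crossing the three facets, which the short sketch below makes explicit. The level labels are those usually given for Guilford's model and are not listed in this passage, so they should be read as illustrative; only the counts (5 processes, 4 then 5 contents, 6 products) come from the text. The function name `soi_cells` is hypothetical.

```python
from itertools import product

# Level labels are assumptions for illustration; the passage gives only the counts.
OPERATIONS = ["cognition", "memory", "divergent production",
              "convergent production", "evaluation"]          # 5 mental processes
CONTENTS = ["figural", "symbolic", "semantic", "behavioral"]  # 4 types of content
PRODUCTS = ["units", "classes", "relations", "systems",
            "transformations", "implications"]                # 6 types of product

def soi_cells(operations, contents, products):
    """Return every (operation, content, product) combination, one per ability cell."""
    return list(product(operations, contents, products))

original_cells = soi_cells(OPERATIONS, CONTENTS, PRODUCTS)

# The revision described in the text: one more content level for auditory abilities.
revised_cells = soi_cells(OPERATIONS, CONTENTS + ["auditory"], PRODUCTS)

print(len(original_cells))                       # 5 * 4 * 6
print(len(revised_cells))                        # 5 * 5 * 6
print(len(revised_cells) - len(original_cells))  # newly added auditory cells
```

Because the model is a full crossing, adding a single level to any one facet adds a whole slab of cells (here 5 × 6 = 30), which is why one new content level accounts for all 30 new auditory abilities.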
Triarchic Theory
Overview of the theory. Sternberg’s (1985) theory of intelligence contains three subtheories: a contextual subtheory, an experiential subtheory, and a componential subtheory. The contextual subtheory attempts to specify those behaviors that would be considered intelligent in a particular culture. Sternberg argues that, in any culture, contextually intelligent behavior involves purposeful adaptation to the present environment, selection of an optimal environment, or shaping of the present environment to fit better one’s skills, interests, and values. The nature of the adaptation, selection, or shaping can vary importantly across cultures. For example, navigational skills, hunting skills, and academic skills are highly valued as markers of intelligence in different cultures. However, even if a particular task is thought to require intelligence, contextually appropriate behavior is not equally “intelligent” at all points along the continuum of experience with that class of tasks. According to the experiential subtheory, intelligence is best demonstrated when the task or situation is relatively novel or when learners are practicing their responses to the task so that they can respond automatically and effortlessly. Although many have suggested that tasks must be moderately novel to measure intelligence, Sternberg’s theory is unique in its claim that the ability to automatize processing is also a good indicator of intelligence. To date, no convincing evidence has been advanced to support this hypothesis. In the componential subtheory, Sternberg attempts to specify the cognitive structures and processes that underlie all intelligent behavior. Contextually appropriate behavior at relevant points in the experiential continuum is said to be intelligent to the extent to which it involves certain types of processes.
Three types of processes are hypothesized: metacomponents, which control processing and enable one to monitor and evaluate it; performance components, which execute plans assembled by the metacomponents; and knowledge acquisition components, which selectively encode and combine new information and selectively compare new information to old information. Thus, Sternberg’s contextual subtheory describes what types of tasks, situations, and behaviors might be considered intelligent. It is relativistic with respect to individuals and to the sociocultural settings in which they live. In the United States, the prevailing contextual theory of intelligence involves problem-solving, or fluid abilities; knowledge-based, or crystallized abilities; and social and practical abilities. The experiential subtheory claims that intelligence is relative to each individual’s experience with the task or situation. Only the componential subtheory claims to describe the mechanisms of thought that would be used in any intelligent act. Evaluation of the Triarchic Theory. Some argue that intelligence as measured in the tradition of Binet and Wechsler is best construed as scholastic aptitude. This tendency to narrow the scope of intelligence tests has been
countered repeatedly by those who would extend measurement to domains such as social intelligence (E. L. Thorndike, 1920), creativity (Guilford, 1959), or musical ability (Gardner, 1983) that are sampled inadequately or not at all by existing tests. Those who would extend the purview of existing tests tend to view intelligence as an adjective rather than a noun and argue that tests of intelligence should sample all domains of activity that are valued as intelligent in the culture. Sometimes these unmeasured abilities are essential features of the theorist’s implicit theory of intelligence or that of a larger social group. Those who view intelligence as a noun usually equate intelligence with individual differences in a particular type of cognition, such as “eduction of relations and correlates” (Spearman, 1927) or “judgment” (Binet & Simon, 1905). However, others view the noun as a shorthand expression for all individual differences in cognition and argue that a good test of intelligence presupposes a good theory of cognition (Hunt, 1986) or at least a good sample of “the repertoire of intellectual skills and knowledge available to the person at a particular point in time” (Humphreys, 1986, p. 98). Sternberg’s triarchic theory attempts to satisfy both of these demands. His contextual theory recognizes the cultural relativity implied when intelligence is treated as an adjective, and his componential theory “[covers] most if not all of the territory of cognitive psychology” (Carroll, 1986, p. 325). Reactions to Sternberg’s theory have been mixed. Some argue that his triarchic theory is not a theory at all but a “conceptualization” of intelligence (Humphreys, 1984). 
Sternberg’s theory for testing implies that one should model individual performance on cognitive tasks that represent fluid and crystallized abilities, so that component scores and solution strategy may be estimated for the individual; recognize that comparisons of individuals and especially of groups may be misleading, because tasks are differentially novel or practiced for different individuals and groups; and broaden the sample of tasks included on intelligence tests to better represent skills in adapting to the environment, shaping the environment, or selecting new environments. Here, Sternberg (1985) sees a special need for tests that measure “real-world” or practical intelligence. In several studies, questionnaires designed to assess respondents’ tacit knowledge about managing self, others, and career have shown moderate correlations with various objective criteria of success in the domain (Wagner & Sternberg, 1986). Cronbach (1986) agrees that this is a worthwhile goal for measurement, but he is unimpressed with the verbal tests of practical intelligence Sternberg has thus far developed. He claims that Sternberg’s tests are “quizzes on gamesmanship” (p. 24). Sternberg counters that scores on his questionnaires are generally uncorrelated with measures of verbal intelligence. Perhaps Ford’s (1986) research on the measurement of social intelligence can provide some useful cues for the measurement of practical intelligence. He argues that better measures can be obtained when social intelligence is
defined in terms of outcomes (i.e., social competencies) rather than in terms of social cognition (e.g., understanding verbal or pictorial displays of social events). However, practical and social intelligence differ in several respects, and each has its roots in a different tradition. Whereas research on social intelligence stemmed from the observation that academic intelligence was no guarantee of social competence, research on practical intelligence began with the observation that academic intelligence was also no guarantee of “common sense.” Thus, studies of social intelligence are rooted in the research on social judgments, whereas studies of practical intelligence developed from research on “tacit” knowledge – that is, knowledge that is not explicitly taught or discussed but that may facilitate performance or even be necessary for success in some domain. Whether or not Sternberg succeeds in his efforts to develop new measures of practical intelligence or better measures of other aspects of intelligence, he has clearly succeeded in unifying diverse – even antagonistic – traditions in research on intelligence. With his prolific research, writing, and editing activities, Robert Sternberg has probably done more than any other contemporary psychologist to bring back into attention fundamental questions about intelligence – what it is, how it can best be observed and measured, and how it relates to other domains of behavior. (Carroll, 1986, p. 325)
Integrative Theories in Cognitive Science
All of the research efforts described to this point have involved the study of individual differences, either in existing tests of intelligence or achievement, or in tasks taken from the laboratories of experimental cognitive psychologists. However, there is an obvious circularity in attempts to understand the nature of intelligence by studying existing tests of intelligence or by identifying the information-processing characteristics of people who have been labeled high or low ability because of their scores on existing tests. Attempts to specify the cognitive character of the target behaviors or achievements such tests aim to predict expand the circle significantly but do not remove the circularity. What is needed is a general theory of human cognition. Measurements of individual differences could then be derived from this theory rather than in a theoretical vacuum. There have been several attempts to put theory before assessment, particularly in the measurement of reading disabilities (Frederiksen, 1982) and (less successfully) in the measurement of spatial abilities (Poltrock & Brown, 1984). But the term intelligence connotes a much broader effort. A central question in cognitive science is whether human cognition is best modeled as a unitary system or as a collection of independent systems or
modules. This debate parallels the Spearman-Thorndike/Thurstone controversy over g versus multiple factors in differential psychology (see R. M. Thorndike & Lohman, 1989). Much early theorizing presumed a unitary system, as Newell and Simon (1972) advocated in their General Problem Solver. This program aimed to solve a broad array of reasoning problems using general heuristics. By the late 1970s, however, the pendulum was beginning to swing the other way. Led by Chomsky (1980) and Fodor (1981), a modular view of cognition gained popularity. Modularists argue that the mind is best construed as a collection of independent information-processing systems, including systems for language, visual processing, music, and other specialized mental contents. Chomsky even describes such faculties as “mental organs,” analogous to physical organs such as the heart. Modularists point to findings from neuropsychology on apparent localization of different mental functions in different regions of the brain and to factor-analytic and other studies of individual differences which show that musical, spatial, numerical, and other abilities can be distinguished (Gardner, 1983). Most modularists deny the need for a central or executive processor. Some recognize these higher thought processes but argue that cognitive science cannot explain them (Fodor, 1981).
Anderson’s ACT∗ Theory
Several research efforts, most notably that of Anderson and his colleagues, have opposed this side of modularity. In a series of monographs (Anderson & Bower, 1973; Anderson, 1976, 1983), Anderson has developed and refined his Adaptive Control of Thought (ACT) system, culminating in the latest version, ACT∗. The system is too complex to describe more than its general features here. (The reader is referred to Chap. 1 of Anderson, 1983.) First, Anderson (1983) claims that all “higher cognitive processes, such as memory, language, problem solving, imagery, deduction, and induction, are different manifestations of the same underlying system” (p. 1). Nevertheless, ACT∗ posits special-purpose “peripheral systems” that convert information presented to the senses into distinctive perception-based memory representations or codes, such as images (that preserve information about configuration) and temporal strings (that preserve information about temporal order). Other perception-based memory codes (e.g., olfactory, kinesthetic) seem likely, but they have not been much studied. The peripheral systems that create and process these perception-based codes function like the modules Fodor posits. Higher cognitive processes, however, are thought to depend more heavily on a different type of memory representation that preserves the meaning of
Curriculum, Instruction and Learning
an event. Indeed, Anderson (1983, 1985) argues that this type of abstract code dominates long-term memory, even for memories that might appear to be more perception based. For example, much of what we remember about a visual scene depends on our interpretation and understanding of the visual display. On this view, meaning-based representations (such as the idea of roundness) are derived from particular perception-based memories (such as memories for many particular round objects). This multicode theory of memory has several interesting analogs in research on individual differences. For example, specific learning disabilities may be caused by a dysfunction in one or more peripheral systems that encode information from the environment into memory or decode the products of thinking into particular responses. Conversely, the dominance of the meaning-based code in human cognition corresponds to the dominance of the general factor in individual differences on complex tasks that seemingly emphasize different mental contents or processes. Indeed, general ability – as typically estimated – may reflect the ability to create, transform, and retain meaning-based mental representations (Snow & Lohman, 1989). A second feature of Anderson’s ACT∗ theory that can inform theorizing about intelligence is the distinction between declarative and procedural knowledge. These two types of knowledge are posited in many, although certainly not all, AI theories. Declarative knowledge is knowing that something is the case. Procedural knowledge is knowing how to do something.11 Declarative knowledge is represented by a network in which nodes are like idea units, and procedural knowledge is represented by conditional imperatives of the form, “If a certain condition holds, then perform a certain action.” Thus, procedural knowledge is dynamic; declarative is static. 
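The declarative-procedural distinction can be made concrete with a toy sketch. The code below is not ACT∗ and is not drawn from Anderson's work; it only assumes, for illustration, that declarative knowledge can be stored as a small set of (subject, relation, object) links and that procedural knowledge can be written as a condition-action rule. The facts, the rule `inherit_ability`, and the function `run` are all hypothetical.

```python
# Declarative knowledge: a static network of idea units, stored here as
# (subject, relation, object) links. The contents are invented examples.
declarative_memory = {
    ("canary", "is-a", "bird"),
    ("bird", "can", "fly"),
}


def inherit_ability(memory):
    """Production: IF x is-a y AND y can z THEN conclude x can z."""
    new_facts = set()
    for (x, relation, y) in memory:
        if relation != "is-a":
            continue
        for (y2, relation2, z) in memory:
            if relation2 == "can" and y2 == y:
                new_facts.add((x, "can", z))
    # Report only facts that are not already in memory.
    return new_facts - memory


# Procedural knowledge: a list of condition-action rules (productions).
productions = [inherit_ability]


def run(memory, productions):
    """Fire productions repeatedly until none of them adds a new fact."""
    memory = set(memory)
    fired = True
    while fired:
        fired = False
        for production in productions:
            additions = production(memory)
            if additions:
                memory |= additions
                fired = True
    return memory


result = run(declarative_memory, productions)
print(sorted(result))
```

In this sketch the set of links does nothing by itself, whereas the production changes the contents of memory whenever its condition is satisfied, which is one way to read the claim that declarative knowledge is static and procedural knowledge is dynamic.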
Procedural knowledge can be executed automatically, even unconsciously; declarative knowledge is often accessed slowly and consciously. Each is also acquired with different proficiency and by different methods. On one hand, new declarative knowledge can be acquired relatively quickly (often in a single trial), often by elaborating relationships with previously acquired knowledge. On the other hand, proceduralization generally requires more extensive practice. The declarative-procedural distinction has several implications for a theory of intelligence. First, cognitive skills are modeled as forms of procedural knowledge in ACT∗. Therefore, those parts of the theory which describe how declarative knowledge is converted to procedural knowledge also describe an important aspect of ability development. Second, the theory predicts the gradual differentiation of abilities some have hypothesized (Garrett, 1946; Anastasi, 1970), and it can explain how the same task (e.g., division) can require general problem-solving skills for the inexperienced examinee and specific problem-solving skills for the more experienced examinee. Third, attempts to measure declarative and procedural knowledge suggest new
Lohman
Human Intelligence
ways to separate students’ factual knowledge in a domain from their ability to solve unfamiliar problems in the domain. This is an old (Lindquist, 1948) but seldom attained goal in educational measurement. Attempts to assess declarative knowledge usually involve the construction of a map of the examinee’s factual knowledge base. Attempts to assess procedural knowledge emphasize speed of solving problems, methods of classifying them, or errors made in such processing. Kyllonen and Christal (1989a) have shown that Anderson’s theory can be used as a general framework for the assessment of individual differences. They argue that individual differences on a wide variety of cognitive tasks arise from differences in four primary sources: cognitive processing speed, working memory capacity, breadth and pattern of declarative knowledge, and breadth and pattern of procedural knowledge. Working memory occupies a central position in ACT∗ and in applications of the four-sources framework to problems of skill acquisition (Woltz, 1988) and reasoning abilities (Kyllonen & Christal, 1989b). For example, in one series of studies, Kyllonen and Christal (1989b) found strikingly high correlations between theory-based measures of working memory capacity and traditional measures of reasoning ability (or Gf ). While acknowledging that such correlations are open to multiple interpretations, they argue that individual differences in working memory capacity cause individual differences in reasoning. One interesting implication of this view is that attempts to localize reasoning ability in a particular component process (e.g., inference) are bound to fail since working memory capacity affects success across all component stages of reasoning tasks. These are but a few of the sorts of connections that can be made between a general theory of cognition and concepts familiar in measurement, particularly educational measurement. 
Some of these hypotheses may prove useful; others will surely be discarded. Nevertheless, it would appear that any good theory of intelligence must distinguish higher level cognitive representations and the processes that operate on them from lower level representations and the processes that mediate between the world and the individual. Such a differentiation may take the form of a hierarchical system: a base of built-in, primitive mechanisms that operate in parallel with processes not accessible to introspection and a second level of processing that is serial, often open to introspection, and can be modified with some flexibility (Gardner, 1985). A good theory of intelligence must also acknowledge the crucial role of knowledge in all of cognition (Glaser, 1984). A major implication of Anderson’s theory, of research on skill acquisition, and of research on expertise is that aspects of thinking that were once considered elementary, wired-in processes are now understood to be knowledge that has been automatized (“compiled” or “proceduralized”) through practice. Thus, understanding abilities means understanding individual differences in learning and development.
The “New Connectionism”
Critics of the computer metaphor for human thought have long pointed to the discrepancy between the serial, digital “Von Neumann” computer and the parallel, analog nature of much human thought. Cognitive psychologists countered that it was often impossible to distinguish between a serial model, in which one stage of processing follows on the heels of another, and a parallel model for the same task, in which all processes start at the same time, run in parallel, but finish at different times. Further, it was argued that parallel processing could be simulated – albeit clumsily – on a serial computer. These arguments began to lose their appeal as parallel processing computers were constructed and as deliberate efforts were made to make computational models of thought conform better to biological theories of brain function. This new breed of neurally inspired models of cognition is best exemplified in the work of Rumelhart, McClelland, and the PDP Research Group (1986) and their Parallel Distributed Processing (PDP) approach. Instead of a series of operations on symbols, a PDP model contains thousands of connections among hundreds of cognitive units. Excitations or inhibitions are signaled from one unit to another until the network momentarily achieves a stable state. “Thinking” or “action” occurs as strengths of the connections among units are momentarily altered. Memory is thus modeled as the set of relationships among aspects of events encoded in groups. The pattern of connections and their strengths allow particular “memories” to be recreated when the network is activated. The PDP approach signals a significant shift from purely serial models of thinking to parallel models.
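The idea that a stored “memory” is recreated when a network of excitatory and inhibitory connections settles into a stable state can be sketched in a few lines. The example below is not a model from the PDP volumes; it substitutes a small Hopfield-style network with fixed Hebbian weights, chosen only because it is compact. The eight-unit patterns and the function names `train` and `settle` are assumptions made for illustration.

```python
def train(patterns):
    """Build symmetric connection strengths from +1/-1 patterns (Hebbian rule)."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no unit is connected to itself
                    weights[i][j] += pattern[i] * pattern[j]
    return weights


def settle(weights, state, max_sweeps=20):
    """Update units one at a time until no unit changes (a stable state)."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            # Positive terms act as excitation, negative terms as inhibition.
            net = sum(weights[i][j] * state[j] for j in range(len(state)))
            new_value = state[i] if net == 0 else (1 if net > 0 else -1)
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


# Two invented "memories" stored in the same set of connections.
memory_a = [1, 1, 1, 1, -1, -1, -1, -1]
memory_b = [1, 1, -1, -1, 1, 1, -1, -1]
weights = train([memory_a, memory_b])

# A degraded cue: memory_a with its first unit flipped.
cue = [-1, 1, 1, 1, -1, -1, -1, -1]
recalled = settle(weights, cue)
print(recalled == memory_a)
```

Both patterns are held in one weight matrix rather than in separate locations, which is the sense in which the memory is distributed. Note that in this simplified sketch only the unit activations change during settling; the connection strengths stay fixed once they have been set.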
Some have already suggested that a comprehensive account of thinking will require both types of processing – for example, a richly interconnected hierarchy with parallel-processing modules at the base that are dedicated to particular sensory inputs or response systems and a serial, limited capacity system at the apex to model higher order thinking (Gardner, 1985). Such a system mirrors the sort of hierarchical model of human intelligence advocated in various guises by Spearman, Burt, Vernon, and Cattell. The PDP approach also reflects a shift from theories rich in process but short on knowledge to theories that are rich in knowledge but short on process. There has been a gradual realization in all of cognitive science of the importance of an extensive, accessible, and well-organized knowledge base for intelligent performance. In AI, early efforts to avoid knowledge in the interest of simplification only served to make the task of modeling human reasoning “harder than it needed to be” (Dehn & Schank, 1982, p. 373). Similarly, there has been a gradual shift in cognitive psychology from the sort of knowledge-free information-processing models that can be neatly summarized in a flow chart to the study of the role of prior knowledge represented as scripts (Schank & Abelson, 1977), schema (Rumelhart & Ortony, 1977),
mental models (Johnson-Laird, 1983), and belief systems (Carey, 1986). The importance of knowledge has even emerged in process-intensive tasks, such as those used to estimate spatial abilities. Further, many functions formerly represented as wired-in processes in information-processing models are now seen as acquired proficiencies (i.e., procedural knowledge). Indeed, the goal of measuring knowledge- or experience-free cognitive processes may be a measurement pipe dream, as E. L. Thorndike et al. (1926) suggested. In a way, this newfound role of knowledge in cognitive science parallels the gradual realization by differential psychologists that intelligence is not an innate characteristic of the person but an acquired set of competencies (Anastasi, 1986; Cronbach, 1984).
Future Directions
Including Affect
Kant popularized Aristotle’s threefold categorization of mental faculties: cognitive, affective, and conative (knowing, feeling, and willing). By this account, a complete theory of mind must explain not only the cognitive dimension but also the emotional and intentional dimensions. Attempts to simplify the task of understanding intelligence by ignoring emotion and intention may prove as ineffective as early attempts to ignore knowledge in AI. Indeed, theorists are once again beginning to argue that affect must be included in accounts of learning and cognition (Snow & Farr, 1987). Thus, one direction research on intelligence seems to be taking is to expand its horizons to include affective dimensions long recognized as central to intelligence (e.g., Wechsler, 1939) but rarely combined with the systematic study of the cognitive dimensions (see Royce, 1979, however, for one effort). A theory of intelligence thereby becomes more than an account of human cognition. It becomes an account of affect and perhaps even volition as well. Even when intelligence is treated as a noun, its purview knows no bounds.
From Crystallized to Fluid
A second trend in research on intelligence is moving in the opposite direction. Binet’s test was originally designed to predict performance in school. Whatever larger purposes he might have hoped the test might serve, or that others have actually used tests for, it is clear that intelligence tests have always been most heavily used as measures of scholastic aptitude. Researchers have begun to uncover the reasons why such tests predict success in conventional forms of schooling as they have begun to understand the nature of the knowledge and thinking skills that are required by school-learning tasks and that are also estimated by intelligence tests. Items on intelligence tests often appear
to differ markedly from the sort of school-learning tasks they predict. For example, matrix completion problems and/or paper folding problems do not appear to have much in common with understanding a story or solving an algebra word problem. Yet intensive analyses reveal a commonality in the processes students use to solve both test problems and school-learning tasks (Snow & Lohman, 1984). Analyses of existing intelligence tests and of the school-learning tasks such tests were originally designed to predict will continue to be important activities in measurement and instructional psychology. However, the study of school-learning tasks is now viewed by some as the research activity most likely to produce useful results (Cronbach, 1984, p. 300). In fact, there has been a subtle shift in recent years from the study of intelligence to the study of achievement, particularly the acquisition, organization, and use of knowledge in particular domains such as science, mathematics, and literature (Glaser, Lesgold, & Lajoie, 1987). Thus, somewhat paradoxically, new developments in the measurement of intelligence – particularly the sort of intelligence required by and developed through formal schooling – may well come about more through the careful study of achievement than through continued scrutiny of tasks modeled after existing intelligence tests. And there are reasons to be optimistic that such research may produce intelligence tests that are useful for instruction in more ways than are existing tests. This possibility can be better understood if intelligence and achievement are viewed as points on a continuum of transfer or novelty rather than as qualitatively distinct constructs. Figure 2 shows one such continuum. The horizontal line symbolizes the amount of transfer required by the test or the average novelty of the problems for the typical examinee. At the far left, problems on the test duplicate those taught. 
As one moves to the right, problems become increasingly novel and require increasing transfer. For example, if students have learned to add numbers in columns, then one could present these same addition facts in column format to require minimum transfer. Presenting the same facts horizontally would require a bit of transfer; embedding the problems in a sentence would require more transfer; and embedding them in a matrix problem in which the rule is “add row 1 to row 2” requires even more transfer. Perhaps creating the matrix items in the first place requires the most transfer. As this example demonstrates, the continuum of novelty in Figure 2 is not limited to general ability but can apply to narrower ability constructs as well. It also illustrates the principle that the same task can elicit different processes from different people, depending on their prior experience.
[Figure 2 shows a horizontal line running from Familiar (left) to Novel (right), with points labeled, in order, Mastery Tests, Final Exams, General Ach., Fluid Ability, and Insight.]
Figure 2: Hypothetical continuum of transfer for general achievement and ability tests
Important educational objectives may be identified all along this line (Elshout, 1987). Students must learn specific skills, but they must also learn to transfer their learnings to unfamiliar situations and to be creative. Unfortunately, measurement problems increase as one moves from left to right on this scale. Tests that sample no more than those facts and skills explicitly taught are relatively easy to defend, especially when only limited inferences are made from test scores. Tests that require transfer are more difficult to defend because problem novelty varies from individual to individual and because such tests are usually constructed in ways that encourage grander inferences. Some argue that defensible tests of insight (on the far right) are nonexistent. Much of the research on intelligence and intelligence tests conducted by Sternberg, Snow, Hunt, Pellegrino and others during the 1970s could be seen as an effort to start in the middle of Figure 2 and move to the left. Both Snow (1978) and Glaser (1976) argued that the ultimate goal of their research on intelligence was to discover how the thinking skills required by such tests are also required for learning in schools. Although much has been learned from these efforts, dependable methods for encouraging the development of fluid abilities have not been discovered, even though many recommendations have been made (e.g., Wagner & Sternberg, 1984). In part, this may be an inevitable consequence of studying tests that were designed to work rather than to reflect a particular theory of cognition. A more fruitful avenue, for education at least, might be to begin somewhere near the left of Figure 2 and work toward the right. Perhaps then educators might finally learn what to teach the so-called “overachiever,” who scores higher on tests of crystallized abilities than on tests of fluid abilities. 
The recent work of Brown and Ferrara (1985) in estimating a student’s “zone of proximal development” exemplifies one effort toward this goal.
Process Sensitive Tasks
A third trend in research on intelligence is a renewed emphasis on the contextual foundation of the concept “intelligence” in the culture and life history of the individual. In part, this represents a rediscovery of the fact that, as E. L. Thorndike et al. (1926) put it, “measurements of intelligence rest on judgements of value” (p. 12). But it also represents a breaking down of artificial barriers within psychology, such as between learning and the context in which learning occurs (Brown, Collins, & Duguid, 1989; Greeno, 1989) or between learning and development (Chi, 1978; Glaser, 1984). Renewed linkages between the psychologies of learning and development are particularly noteworthy. Understanding how abilities develop is central to the task of understanding what abilities are. It is no accident that qualitative advances in our understanding of the mental processes which produce
intelligent performances have come from those who studied the development of intelligence rather than those who focused exclusively or primarily on the organization of individual differences at a particular point in time. Much of this can be explained by a closer examination of the type of task typically studied by the developmentalist.
All scientific measurements of intelligence that we have at present are measures of some product produced by the person or animal in question, or of the way in which some product is produced [italics added]. A is rated as more intelligent than B because he produces a better product, essay written, answer found, choice made, completion supplied or the like, or produces an equally good product in a better way, more quickly or by inference rather than by rote memory, or by more ingenious use of the material at hand [italics added]. (E. L. Thorndike et al., 1926, pp. 11–12)
Thorndike et al. (1926) here describe two types of tasks: tasks which permit inferences about the nature of intelligence from the type of response made (often a qualitative judgment) and tasks which permit inferences about the rank order of individuals in ability by counting up the number of responses scored “correct” (usually a quantitative judgment). Psychometrics has understandably followed the quantitative route. Items are scaled for difficulty and examinees are ranked by how far up the ladder they can climb. Developmentalists from Piaget to Siegler have followed the qualitative path. The same problem is presented to all children and their developmental level is inferred from the sophistication of the response given. Indeed, early efforts to develop tests which provided a qualitative assessment of intelligence, such as the tests of Healy and Fernald (1911) or even the Binet scale of 1905, “did not emphasize the objective score which the child made so much as his general behavior and the way in which he went about the tasks which were set him” (Freeman, 1926, p. 108). However, judgments about process were clearly less dependable than judgments about whether the subject gave a keyed response, and so qualitative assessments of process were quickly displaced by quantitative assessments of product. Furthermore, tests which provided a score that could be immediately ranked better fit the requirements of a burgeoning test industry that was more interested in identifying who was intelligent than in understanding what intelligence was. By the 1970s, however, cognitive psychologists had developed new methods for testing inferences about process – methods that were more sophisticated and objective than clinical judgments. Many tried to apply these new methods for detecting process to experimental tasks modeled after existing intelligence tests. 
Of all the “strange ironies” which have attended the history of mental testing (Cronbach, 1975), none is stranger than the attempt to apply powerful methods for detecting individual differences in processing strategy to a class of test-like tasks carefully pruned of such differences. It is a tribute to the power of the methods and the ingenuity of the researchers
who employed them that anything interesting was found at all. Perhaps process analyses would be more successful in revealing interesting individual differences in process if they were to be applied to tasks deliberately designed to elicit such differences than to tasks modeled after existing mental tests.
Summary and Evaluation
Summaries broader than the scope of this paper are available (see Snow & Lohman, 1989; Sternberg, 1985), but several themes emerge in all of them. First, much of the optimism about the potential impact of cognitive psychology on the study of human intelligence (e.g., Hunt et al., 1973; Sternberg, 1977) has been tempered by experience. Hunt now sees some fundamental incompatibilities between the correlational and experimental camps in psychology. He notes:
Cronbach [1957] thought that general theories of psychological process ought not to ignore individual differences, and vice versa. He was right, and in a general sense the union of the camps is well underway. In my opinion . . . the way to achieve the scientific union is to concentrate on understanding how individual differences variables, such as age, sex, genetic constitution, and education, influence the processes of cognition. It does not seem particularly fruitful to try to derive the dimensions of . . . [a trait model] of abilities from an underlying process theory. (Hunt, 1987, p. 36)
Like Hunt, Sternberg has also modified his views, although he sees more compatibility than Hunt. In 1977, Sternberg described a method for testing information-processing models of tasks that he called componential analysis. He then compared his method of componential analysis with factor-analytic methods for understanding abilities and found the latter seriously wanting. More recently, he has claimed that “cognitive approaches to intelligence are basically compatible with psychometric and other approaches” (1985, p. 108), each better suited to addressing different questions about the same phenomenon. Sternberg (1985) argues that his triarchic theory recognizes the contributions not only of the correlational and the information-processing approaches to the study of intelligence but also of theorists such as Berry (1972) and E. L. Thorndike et al. (1926) who point out that the list of behaviors and accomplishments valued as “intelligent” varies over cultures and contexts. The conclusion that trait and process approaches are in some ways fundamentally incompatible may seem overly pessimistic. Nevertheless, it at least acknowledges that the two approaches make completely different demands on the basic person by item data matrix. Each partitions the data matrix in completely different ways. The trait theorist focuses on variation
in row means whereas the experimentalist focuses on variation in column means. The trait theorist is concerned with covariances computed over persons whereas the experimentalist should be more concerned with covariances computed over items. It is possible – even likely – to propose a processing model that does an excellent job of accounting for variability in item difficulties or latencies, either for all subjects or separately for each subject, and yet have no explanation for individual differences on the task. On the other hand, the trait theorist constructs measures of broad abilities by making items (or subtests) as heterogeneous as possible (Spearman, 1927; Humphreys, 1985), thereby making a process analysis of the test either impossible or so general that it is uninformative.12 Thus, the two approaches are in some ways complementary but in other ways incompatible (Ippel & Lohman, 1990). There has been a similar tempering of enthusiasm about the prospects for an easy victory over the problem of human intelligence in other quarters of cognitive science – particularly AI. Increasingly, those who have attempted to develop artificially intelligent systems have come to question their efforts and the constraints that the digital computer has placed on their work. In a summary of this recent history of AI, Dehn and Schank (1982) note, “Arrogance about the potential superiority of machine-specific intelligence slowly gave way to a growing respect for human intelligence and its operation. Characteristics of human intelligence . . . that had at first seemed to be weaknesses began to be recognized as strengths” (p. 354). For example, humans tend not to consider all aspects of a problem or to generate and evaluate all possible answers to a problem before deciding upon a course of action. Computers are easily programmed with algorithms that painstakingly consider all factors in a problem before choosing the best answer. 
However, the computer begins to drown in computation as problems increase in complexity, such as when the input is a visual scene or when the number of alternatives that could be generated is unlimited, as in a chess game. Further, this problem will not be solved by building computers with greater computational speed and power. Therefore, AI has shifted from programs that solve problems by brute force to programs modeled after the “satisficing” sort of rules of thumb humans use – balancing effort and time against expected payoff – in complex situations. The recent shift to parallel-processing computers and to models of cognition that conform to current theories of brain function takes an even larger step away from the conventional digital computer and the constraints it imposes on efforts to model human cognition. However, some predict that even these efforts are doomed to fail, either because human cognition is not rule bound (Dreyfus & Dreyfus, 1986) or because higher level cognitive processes such as judgment and reasoning can be influenced by one’s beliefs, values, and intentions (Pylyshyn, 1984; Fodor, 1981). In short, there has been a growing respect for human intelligence, and a realization that it will not yield to ready explanation by the methods of
cognitive science any more than it yielded to ready explanation by the method of factor analysis. Yet factor analysis contributed – and continues to contribute (Carroll, in press; Gustafsson, 1984) – to our understanding of human intelligence. Cognitive science will also continue to contribute to our understanding in spite of the dire warnings of the pessimists and in spite of difficulties already encountered. But it will do so with a little less arrogance and, hopefully, with a little greater appreciation for the contributions of Binet, E. L. Thorndike, and others who have traveled this path before.
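The contrast drawn in this section between the trait theorist's and the experimentalist's partitions of the person-by-item data matrix can be illustrated with a small worked example. The sketch below is not taken from the studies cited; the latencies are invented, and the names `latencies`, `row_means`, `column_means`, and `person_effects` are hypothetical. It assumes an idealized case in which a processing model reproduces every item mean exactly.

```python
# Hypothetical person-by-item matrix of response latencies in milliseconds.
# Rows are persons, columns are items; all values are invented.
latencies = [
    [400, 500, 600],
    [450, 550, 650],
    [500, 600, 700],
]


def mean(values):
    return sum(values) / len(values)


# The trait theorist's view: variation among persons (row means).
row_means = [mean(row) for row in latencies]

# The experimentalist's view: variation among items (column means).
column_means = [mean(column) for column in zip(*latencies)]

# Suppose a processing model predicts each item mean perfectly.
# Removing its predictions leaves only what the model does not explain.
residuals = [
    [score - item_mean for score, item_mean in zip(row, column_means)]
    for row in latencies
]
person_effects = [mean(row) for row in residuals]

print(row_means)
print(column_means)
print(person_effects)
```

In this idealized case the model accounts for all of the variation in item difficulty, yet the differences among persons remain in the residuals untouched, which is the sense in which a successful process model may offer no explanation of individual differences on the task.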
Notes
1. Special abilities often improve the prediction when samples are large or restricted on general ability (R. L. Thorndike, 1986). Note, too, that the issue is not general versus special abilities but whether to give each ability factor a unique weight or to give all the same weight in forming a single composite to be correlated with a criterion. It has long been known that a weighted average differs little from a simple average (Burt, 1907, cited in Butcher, 1968, p. 68). Instability of regression weights for correlated predictors demands it. Pooling correlations from different studies (e.g., Hunter, 1986) further exaggerates the role of general abilities (Linn, 1986). Finally, multiple aptitude batteries can still provide important information for guidance (Tyler, 1986).
2. Like many cognitive psychologists, Anderson (1985) usually uses the word intelligence as a synonym for cognition, not the individual difference construct associated with intelligence tests. The implications of this view for an individual difference interpretation are outlined in the third section of this paper.
3. For example, Freeman (1926) notes the need “to identify the mental processes which are measured by [existing ability tests]” (p. 127). He also provides a remarkably balanced summary of early research on intelligence.
4. Norman (1986) claims that the architecture of the digital computer was heavily influenced by the designers’ tacit theories of human cognition. Nevertheless, many who came later turned the metaphor around and looked for parallels between physical structures in the computer and psychological structures.
5. There are several intermediate cases as well. For example, Cronbach (1977) argues that “intelligence” is an abstraction much like “efficiency”. On this view, one cannot locate production efficiency in a particular department of a factory: rather, it is a term that describes the overall functioning of the system relative to comparable factories. Another possibility is that intelligence is something like Spearman’s (1927) mental energy or Jensen’s (1982) neural efficiency. Once again, one could not isolate “intelligence” in particular processes, but one might equate it with some general characteristics of cognition, such as attentional resources or speed of processing.
6. Fancher (1985) offers a fascinating historical perspective on the controversy. Using biographical sources, he traces the conflict from the disparate life experiences of John Stuart Mill and Francis Galton, through the lives of the major players in this controversy, to the recent debates between Kamin and Eysenck.
7. Humphreys (1986) aptly describes those who openly espouse environmental explanations for intelligence but who then assume some biological capacity not measured by existing intelligence tests that would be assessed by a properly constructed test as “closet hereditarians”. The description seems also to apply to some cognitive scientists.
8. Carroll (1980) suggests that the correlation with verbal ability may be more parsimoniously attributed to a general or perceptual speed dimension. In a hierarchical model, however, factors such as perceptual (or clerical) speed, memory span, and fluency are located below verbal comprehension and thus represent specific verbal abilities. Carroll’s critique is troublesome only if one views verbal comprehension as the sole verbal ability (see Snow & Lohman, 1989).
9. Low correlations with external criteria for all component scores except the intercept parameter are a statistical necessity unless task scores have poor internal consistency. This point is discussed below and in greater detail in Ippel and Lohman (in preparation). Thus, low correlations between components and other variables do not invalidate the models, although they do challenge the goal of estimating component scores for individuals.
10. For the there-is-nothing-new-under-the-sun folks, E. L. Thorndike et al. (1926) proposed that the various “products” of the human intellect be more systematically sampled from tests that differed in content (“including situations containing other human beings,” p. 20) that required different “internal . . . processes” or “operations performed with the words, numbers, pictures, and other content” (p. 21).
11. Although procedural knowledge is said to be developed out of declarative knowledge, Anderson uses the term “procedural knowledge” more restrictively than some theorists. Knowledge of how to do something that is not yet compiled (or automatized) would be called declarative knowledge. Clearly, one can have declarative knowledge of a procedure or can have proceduralized that knowledge and not have a declarative representation of it, or one could have both.
12. As previously suggested, more informative process analyses demand tasks that allow ready inference of how subjects solved a problem, or what knowledge they brought to bear on it by the type of response they gave, not by the presence or absence of a correct response. In other words, the fundamental problem should be one of response categorization, not response scoring. Analyses of individual differences in response latencies introduce even more problems, such as what to do with error-response latencies or how to equate subjects on speed-accuracy trade-off. These problems are routinely ignored or incorrectly dismissed (for further discussion, see Lohman, 1989).
References
Anastasi, A. (1970). On the formation of psychological traits. American Psychologist, 25, 899–910.
Anastasi, A. (1986). Intelligence as a quality of behavior. In R. J. Sternberg & D. K. Detterman (Eds.), What is intelligence? Contemporary viewpoints on its nature and definition (pp. 19–21). Norwood, NJ: Ablex.
Anderson, J. R. (1976). Language, memory, and thought. Hillsdale, NJ: Erlbaum.
Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.
Anderson, J. R. (1985). Cognitive psychology and its implications (2nd ed.). New York: W. H. Freeman.
Anderson, J. R., & Bower, G. H. (1973). Human associative memory. Washington, DC: Winston.
Barrett, P., Eysenck, H. J., & Lucking, S. (1986). Reaction time and intelligence: A replicated study. Intelligence, 10, 9–40.
Berry, J. W. (1972). Radical cultural relativism and the concept of intelligence. In L. J. Cronbach & P. Drenth (Eds.), Mental tests and cultural adaptation (pp. 77–89). The Hague: Mouton.
Bethell-Fox, C. E., Lohman, D. F., & Snow, R. E. (1984). Adaptive reasoning: Componential and eye movement analysis of geometric analogy performance. Intelligence, 8, 205–238.
Binet, A., & Simon, T. (1905). New methods for the diagnosis of the intellectual level of subnormals. L’Année Psychologique, 11, 245–336.
Brown, A. L., & Ferrara, R. A. (1985). Diagnosing zones of proximal development. In J. Wertsch (Ed.), Culture, communication and cognition: Vygotskian perspectives (pp. 273–305). Cambridge, MA: Cambridge University Press.
Brown, J. S., Collins, A., & Duguid, P. (1989). Situated cognition and the culture of learning. Educational Researcher, 18, 32–42.
Bruner, J. S., Goodnow, J., & Austin, G. (1956). A study of thinking. New York: Wiley.
Burt, C. (1958). The inheritance of mental ability. American Psychologist, 13, 1–15.
Butcher, H. J. (1968). Human intelligence: Its nature and assessment. London: Methuen.
Carey, S. (1986). Cognitive science and science education. American Psychologist, 41, 1123–1130.
Carlson, J. S., Jensen, C. M., & Widaman, K. F. (1983). Reaction time, intelligence, and attention. Intelligence, 7, 329–344.
Carroll, J. B. (1976). Psychometric tests as cognitive tasks: A new “structure of the intellect.” In L. B. Resnick (Ed.), The nature of intelligence (pp. 27–56). Hillsdale, NJ: Erlbaum.
Carroll, J. B. (1980). Individual differences in psychometric and experimental cognitive tasks (NU 150–406 ONR Final Report). Chapel Hill, NC: University of North Carolina, L. L. Thurstone Psychometric Laboratory.
Carroll, J. B. (1986). Beyond IQ is cognition. A review of Beyond IQ: A triarchic theory of human intelligence by R. J. Sternberg. Contemporary Psychology, 31, 325–327.
Carroll, J. B. (1987). Jensen’s mental chronometry: Some comments and questions. In S. Modgil & C. Modgil (Eds.), Arthur Jensen: Consensus and controversy (pp. 297–307). New York: The Falmer Press.
Carroll, J. B. (in press). Factor analysis since Spearman: Where do we stand? What do we know? In R. Kanfer, P. L. Ackerman, & R. Cudeck (Eds.), The Minnesota symposium on learning and individual differences: Abilities, motivation, and methodology. Hillsdale, NJ: Erlbaum.
Cattell, R. B. (1943). The measurement of adult intelligence. Psychological Bulletin, 40, 153–193.
Cattell, R. B. (1963). Theory of fluid and crystallized intelligence: A critical experiment. Journal of Educational Psychology, 54, 1–22.
Cattell, R. B. (1971). Abilities: Their structure, growth, and action. New York: Houghton Mifflin.
Chi, M. T. H. (1978). Knowledge structures and memory development. In R. S. Siegler (Ed.), Children’s thinking: What develops? (pp. 73–96). Hillsdale, NJ: Erlbaum.
Chomsky, N. (1959). A review of B. F. Skinner’s Verbal behavior. Language, 35, 26–58.
Chomsky, N. (1980). Rules and representations. New York: Columbia University Press.
Cooper, L. A. (1982). Strategies for visual comparison and representation: Individual differences. In R. J. Sternberg (Ed.), Advances in the psychology of human intelligence (Vol. 1, pp. 77–124). Hillsdale, NJ: Erlbaum.
Cronbach, L. J. (1957). The two disciplines of scientific psychology. American Psychologist, 12, 671–684.
Cronbach, L. J. (1972). Judging how well a test measures: New concepts, new analyses. In L. J. Cronbach & P. Drenth (Eds.), Mental tests and cultural adaptation (pp. 413–427). The Hague: Mouton.
Cronbach, L. J. (1975). Five decades of public controversy over mental testing. American Psychologist, 30, 1–14.
Cronbach, L. J. (1977). Educational psychology (3rd ed.). New York: Harcourt, Brace, Jovanovich.
Cronbach, L. J. (1984). Essentials of psychological testing (4th ed.). New York: Harper and Row.
Cronbach, L. J. (1986). Signs of optimism for intelligence testing. Educational Measurement: Issues and Practice, 5, 23–24.
Cronbach, L. J., & Snow, R. E. (1977). Aptitudes and instructional methods: A handbook for research on interactions. New York: Irvington.
Daalen-Kapteijns, M. M. van, & Elshout-Mohr, M. (1981). The acquisition of word meanings as a cognitive learning process. Journal of Verbal Learning and Verbal Behavior, 20, 386–399.
Dehn, N., & Schank, R. (1982). Artificial and human intelligence. In R. J. Sternberg (Ed.), Handbook of human intelligence (pp. 352–391). Cambridge, MA: Cambridge University Press.
Dreyfus, H. L., & Dreyfus, S. E. (1986). Mind over machine. New York: Free Press.
Elshout, J. J. (1987). Problem solving and education. In E. de Corte, H. Lodewijks, R. Parmentier, & P. Span (Eds.), Learning and instruction: European research in an international context (Vol. 1, pp. 259–274). Oxford, UK: Leuven University Press and Pergamon Press.
Elshout, J. J., Hemert, N. A. van, & Hemert, M. van (1975). Comment on Horn and Knapp on the subjective character of the empirical base of Guilford’s structure-of-intellect model. Onderwijsresearch, 1, 15–25.
Eysenck, H. J. (1982). A model for intelligence. New York: Springer.
Eysenck, H. J. (1987a). A general systems approach to the measurement of intelligence and personality. In S. H. Irvine & S. E. Newstead (Eds.), Intelligence and cognition: Contemporary frames of reference (pp. 349–376). Dordrecht, Netherlands: Martinus Nijhoff.
Eysenck, H. J. (1987b). Intelligence and reaction time: The contribution of Arthur Jensen. In S. Modgil & C. Modgil (Eds.), Arthur Jensen: Consensus and controversy (pp. 285–296). New York: The Falmer Press.
Eysenck, H. J. (1988). The concept of “intelligence”: Useful or useless? Intelligence, 12, 1–16.
Fancher, R. E. (1985). The intelligence men: Makers of the IQ controversy. New York: W. W. Norton & Co.
Fodor, J. A. (1981). Representations: Philosophical essays on the foundations of cognitive science. Cambridge, MA: MIT Press.
Ford, M. E. (1986). A living systems conceptualization of social intelligence: Outcomes, processes, and developmental change. In R. J. Sternberg (Ed.), Advances in the psychology of human intelligence (Vol. 3, pp. 119–171). Hillsdale, NJ: Erlbaum.
Frederiksen, J. R. (1982). A componential theory of reading skills and their interactions. In R. J. Sternberg (Ed.), Advances in the psychology of human intelligence (Vol. 1, pp. 125–180). Hillsdale, NJ: Erlbaum.
Frederiksen, N. (1984). The real test bias: Influences of testing on teaching and learning. American Psychologist, 39, 193–202.
Freeman, F. N. (1926). Mental tests: Their history, principles and application. Boston: Houghton Mifflin.
Galton, F. (1869). Hereditary genius. London: Macmillan.
Gardner, H. (1983). Frames of mind: The theory of multiple intelligences. New York: Basic Books.
Gardner, H. (1985). The mind’s new science. New York: Basic Books.
Garrett, H. E. (1946). A developmental theory of intelligence. American Psychologist, 1, 372–378.
Gick, M., & Holyoak, K. (1983). Schema induction and analogical reasoning. Cognitive Psychology, 15, 1–38.
Glaser, R. (1972). Individuals and learning: The new aptitudes. Educational Researcher, 1, 5–12.
Glaser, R. (1976). The processes of intelligence and education. In L. B. Resnick (Ed.), The nature of intelligence (pp. 341–352). Hillsdale, NJ: Erlbaum.
Glaser, R. (1984). Education and thinking: The role of knowledge. American Psychologist, 39, 93–104.
Glaser, R., Lesgold, A., & Lajoie, S. (1987). Toward a cognitive theory for the measurement of achievement. In R. R. Ronning, J. A. Glover, J. C. Conoley, & J. Witt (Eds.), The influence of cognitive psychology on testing and measurement: The Buros-Nebraska symposium on measurement and testing (Vol. 3, pp. 41–86). Hillsdale, NJ: Erlbaum.
Glushko, R. J., & Cooper, L. A. (1978). Spatial comprehension and comparison processes in verification tasks. Cognitive Psychology, 10, 391–421.
Greeno, J. G. (1989). A perspective on thinking. American Psychologist, 44, 134–141.
Guilford, J. P. (1959). Three faces of intellect. American Psychologist, 14, 459–479.
Guilford, J. P. (1985). The structure-of-intellect model. In B. B. Wolman (Ed.), Handbook of intelligence (pp. 225–266). New York: Wiley.
Gustafsson, J. E. (1984). A unifying model for the structure of intellectual abilities. Intelligence, 8, 179–203.
Healy, W., & Fernald, G. M. (1911). Tests for practical mental classification. Psychological Monographs, 13 (2).
Hebb, D. (1949). The organization of behavior. New York: Wiley.
Hendrickson, D. E. (1982). The biological basis of intelligence: Part 2. Measurement. In H. J. Eysenck (Ed.), A model for intelligence (pp. 197–228). New York: Springer.
Holland, J. H., Holyoak, K. J., Nisbett, R. E., & Thagard, P. R. (1987). Induction: Processes of inference, learning, and discovery. Cambridge, MA: MIT Press.
Horn, J. L. (1976). Human abilities: A review of research theory in the early 1970s. Annual Review of Psychology, 27, 437–485.
Horn, J. L. (1985). Remodeling old models of intelligence. In B. B. Wolman (Ed.), Handbook of intelligence (pp. 267–300). New York: John Wiley & Sons.
Horn, J. L., & Knapp, J. R. (1973). On the subjective character of the empirical base of Guilford’s structure-of-the-intellect model. Psychological Bulletin, 80, 33–43.
Humphreys, L. G. (1962). The organization of human abilities. American Psychologist, 17, 475–483.
Humphreys, L. G. (1981). The primary mental ability. In M. P. Friedman, J. P. Das, & N. O’Connor (Eds.), Intelligence and learning (pp. 87–102). New York: Plenum.
Humphreys, L. G. (1984). A rose is not a rose: A rival view of intelligence. Comment on R. J. Sternberg’s “Toward a triarchic theory of human intelligence.” The Behavioral and Brain Sciences, 7, 292–293.
Humphreys, L. G. (1985). General intelligence: An integration of factor, test, and simplex theory. In B. B. Wolman (Ed.), Handbook of intelligence (pp. 201–224). New York: Wiley.
Humphreys, L. G. (1986). Describing the elephant. In R. J. Sternberg & D. K. Detterman (Eds.), What is intelligence? Contemporary viewpoints on its nature and definition (pp. 97–100). Norwood, NJ: Ablex.
Hunsicker, L. M. (1925). A study of the relationship between rate and ability. Contributions to Education, No. 185. New York: Columbia University, Teachers College.
Hunt, E. (1985). Verbal ability. In R. J. Sternberg (Ed.), Human abilities: An information-processing approach (pp. 31–58). New York: Freeman.
Hunt, E. (1986). The heffalump of intelligence. In R. J. Sternberg & D. K. Detterman (Eds.), What is intelligence? Contemporary viewpoints on its nature and definition (pp. 101–108). Norwood, NJ: Ablex.
Hunt, E. (1987). Science, technology, and intelligence. In R. R. Ronning, J. A. Glover, J. C. Conoley, & J. C. Witt (Eds.), The influence of cognitive psychology on testing: The Buros-Nebraska symposium on measurement and testing (Vol. 3, pp. 11–40). Hillsdale, NJ: Erlbaum.
Hunt, E. B., Frost, N., & Lunneborg, C. (1973). Individual differences in cognition: A new approach to intelligence. In G. Bower (Ed.), The psychology of learning and motivation (Vol. 7, pp. 87–122). New York: Academic Press.
Hunt, E. B., Lunneborg, C., & Lewis, J. (1975). What does it mean to be high verbal? Cognitive Psychology, 7, 194–227.
Hunter, J. E. (1986). Cognitive ability, cognitive aptitudes, job knowledge, and job performance. Journal of Vocational Behavior, 29, 340–362.
Ippel, M. J., & Lohman, D. F. (1990). Cognitive diagnosis: From statistically-based assessment to theory-based assessment. Unpublished manuscript.
Irwin, R. J. (1984). Inspection time and its relation to intelligence. Intelligence, 8, 47–66.
Jensen, A. R. (1969). How much can we boost IQ and scholastic achievement? Harvard Educational Review, 39, 1–123.
Jensen, A. R. (1980). Bias in mental testing. New York: The Free Press.
Jensen, A. R. (1982). Reaction time and psychometric g. In H. J. Eysenck (Ed.), A model for intelligence (pp. 93–132). New York: Springer-Verlag.
Jensen, A. R. (1987). Process differences and individual differences in some cognitive tasks. Intelligence, 11, 107–136.
Johnson-Laird, P. N. (1983). Mental models: Towards a cognitive science of language, inference, and consciousness. Cambridge, MA: Harvard University Press.
Kail, R., & Pellegrino, J. W. (1985). Human intelligence: Perspectives and prospects. New York: Freeman.
Keating, D. P., & MacLean, D. J. (1987). Cognitive processing, cognitive ability, and development: A reconsideration. In P. A. Vernon (Ed.), Speed of information-processing and intelligence (pp. 239–270). Norwood, NJ: Ablex.
Kelley, T. L. (1928). Crossroads in the mind of man. Stanford, CA: Stanford University Press.
Kintsch, W. (1986). Learning from text. Cognition and Instruction, 3, 87–108.
Kosslyn, S. M. (1980). Image and mind. Cambridge, MA: Harvard University Press.
Koussy, A. A. H. El. (1935). The visual perception of space. British Journal of Psychology, 7 (Whole No. 20).
Kyllonen, P. C. (1985). Dimensions of information processing speed (AFHRL-TP-84-56). Brooks AFB, TX: Air Force Human Resources Lab.
Kyllonen, P. C., & Christal, R. E. (1989a). Cognitive modeling of learning abilities: A status report of LAMP. In R. Dillon & J. W. Pellegrino (Eds.), Testing: Theoretical and applied issues (pp. 146–173). New York: Freeman.
Kyllonen, P. C., & Christal, R. E. (1989b). Reasoning ability is (little more than) working memory capacity. Manuscript submitted for publication.
Lindquist, E. F. (1948). The nature and purposes of the Iowa Tests of Educational Development. Unpublished manuscript.
Linn, R. L. (1986). Comments on the g factor in employment testing. Journal of Vocational Behavior, 29, 438–444.
Lohman, D. F. (1979). Spatial ability: A review and reanalysis of the correlational literature (Tech. Rep. No. 9). Stanford, CA: Stanford University, School of Education. (NTIS No. AD-A075 972)
Lohman, D. F. (1988). Spatial abilities as traits, processes, and knowledge. In R. J. Sternberg (Ed.), Advances in the psychology of human intelligence (Vol. 4, pp. 181–248). Hillsdale, NJ: Erlbaum.
Lohman, D. F. (1989). Individual differences in errors and latencies on cognitive tasks. Learning and Individual Differences, 1, 179–202.
Longstreth, L. E. (1984). Jensen’s reaction-time investigations of intelligence: A critique. Intelligence, 8, 139–160.
MacLeod, C. M., Hunt, E. B., & Mathews, N. N. (1978). Individual differences in the verification of sentence-picture relationships. Journal of Verbal Learning and Verbal Behavior, 17, 493–508.
Marshalek, B. (1981). Trait and process aspects of vocabulary knowledge and verbal ability (Tech. Rep. No. 15). Stanford, CA: Stanford University, Aptitude Research Project, School of Education. (NTIS No. AD-A102 757)
Marshall, J. C. (1977). Minds, machines and metaphors. Social Studies of Science, 7, 475–488.
McNemar, Q. (1964). Lost: Our intelligence? Why? American Psychologist, 19, 871–882.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81–97.
Miller, G. A. (1981). Trends and debates in cognitive psychology. Cognition, 10, 215–225.
Nettelbeck, T., & Kirby, N. H. (1983). Measures of timed performance and intelligence. Intelligence, 7, 39–52.
Nettelbeck, T., & Lally, M. (1976). Inspection time and measured intelligence. British Journal of Psychology, 67, 17–22.
Newell, A., Shaw, J. C., & Simon, H. A. (1957). Empirical explorations with the logic theory machine. Proceedings of the Western Joint Computer Conference, 15, 218–239.
Newell, A., & Simon, H. A. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice-Hall.
Norman, D. A. (1986). Reflections on cognition and parallel distributed processing. In D. E. Rumelhart, J. L. McClelland, and the PDP Research Group (Eds.), Parallel distributed processing: Vol. 2. Psychological and biological models (pp. 531–546). Cambridge, MA: MIT Press.
Pellegrino, J. W., & Kail, R. (1982). Process analyses of spatial aptitude. In R. J. Sternberg (Ed.), Advances in the psychology of human intelligence (Vol. 1, pp. 311–366). Hillsdale, NJ: Erlbaum.
Perfetti, C. A. (1986). Reading ability. New York: Oxford University Press.
Pinker, S. (1984). Visual cognition: An introduction. Cognition, 18, 1–63.
Poltrock, S. E., & Brown, P. (1984). Individual differences in visual imagery and spatial ability. Intelligence, 8, 93–138.
Porteus, S. D. (1915). Mental tests for the feebleminded: A new series. Journal of Psycho-Asthenics, 19, 200–213.
Pylyshyn, Z. W. (1984). Computation and cognition. Cambridge, MA: MIT Press.
Royce, J. R. (1979). Toward a viable theory of individual differences. Journal of Personality and Social Psychology, 37, 1927–1931.
Rumelhart, D. E., McClelland, J. L., and the PDP Research Group. (1986). Parallel distributed processing: Vol. 1. Foundations. Cambridge, MA: MIT Press.
Rumelhart, D. E., & Ortony, A. (1977). The representation of knowledge in memory. In R. C. Anderson, R. J. Spiro, & W. E. Montague (Eds.), Schooling and the acquisition of knowledge (pp. 99–136). Hillsdale, NJ: Erlbaum.
Scarr, S., & Carter-Saltzman, L. (1982). Genetics and intelligence. In R. J. Sternberg (Ed.), Handbook of human intelligence (pp. 792–896). Cambridge, MA: Cambridge University Press.
Schank, R. C., & Abelson, R. P. (1977). Scripts, plans, goals, and understanding. Hillsdale, NJ: Erlbaum.
Shepard, R. N., & Cooper, L. A. (1982). Mental images and their transformations. Cambridge, MA: MIT Press.
Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.
Smith, I. M. (1964). Spatial ability. San Diego: Knapp.
Snow, R. E. (1978). Theory and method for research on aptitude processes. Intelligence, 2, 225–278.
Snow, R. E. (1981). Toward a theory of aptitude for learning: Fluid and crystallized abilities and their correlates. In M. P. Friedman, J. P. Das, & N. O’Connor (Eds.), Intelligence and learning (pp. 345–362). New York: Plenum Press.
Snow, R. E., & Farr, M. J. (Eds.). (1987). Aptitude, learning, and instruction: Vol. 3, Conative and affective process analyses. Hillsdale, NJ: Erlbaum.
Snow, R. E., & Lohman, D. F. (1984). Toward a theory of cognitive aptitude for learning from instruction. Journal of Educational Psychology, 76, 347–376.
Snow, R. E., & Lohman, D. F. (1989). Implications of cognitive psychology for educational measurement. In R. Linn (Ed.), Educational Measurement (3rd ed., pp. 263–331). New York: Macmillan.
Snow, R. E., Marshalek, B., & Lohman, D. F. (1976). Correlation of selected cognitive abilities and cognitive processing parameters: An exploratory study (Tech. Rep. No. 3). Stanford, CA: Stanford University, School of Education.
Snow, R. E., & Yalow, E. (1982). Education and intelligence. In R. J. Sternberg (Ed.), Handbook of human intelligence (pp. 493–585). Cambridge, MA: Cambridge University Press.
Spearman, C. E. (1927). The abilities of man. London: Macmillan.
Spearman, C. E., & Wynn Jones, L. L. (1950). Human ability. London: Macmillan.
Spencer, H. (1855). The principles of psychology. London: Williams and Norgate.
Stern, W. (1914). The psychological method of testing intelligence (G. M. Whipple, Trans.). Baltimore: Warwick & York. (Original work published 1912)
Sternberg, R. J. (1977). Intelligence, information processing, and analogical reasoning: The componential analysis of human abilities. Hillsdale, NJ: Erlbaum.
Sternberg, R. J. (1984). Toward a triarchic theory of human intelligence. The Behavioral and Brain Sciences, 7, 269–315.
Sternberg, R. J. (1985). Beyond IQ: A triarchic theory of human intelligence. Cambridge, MA: Cambridge University Press.
Sternberg, R. J., & Gardner, M. K. (1983). Unities in inductive reasoning. Journal of Experimental Psychology: General, 112, 80–116.
Sternberg, R. J., & McNamara, T. P. (1985). The representation and processing of information in real-time verbal comprehension. In S. E. Embretson (Ed.), Test design: Developments in psychology and psychometrics (pp. 21–43). Orlando, FL: Academic Press.
Sternberg, R. J., & Powell, J. S. (1983). Comprehending verbal comprehension. American Psychologist, 38, 878–893.
Terman, L. M. (1922). The great conspiracy. New Republic, 33, 116–120.
Thomson, G. H. (1920). General versus group factors in mental activities. Psychological Review, 27, 173–190.
Thorndike, E. L. (1903). Educational psychology. New York: The Science Press.
Thorndike, E. L. (1920). Intelligence and its uses. Harper’s Magazine, 140, 227–235.
Thorndike, E. L., Bregman, E. O., Cobb, M. V., & Woodyard, E. (1926). The measurement of intelligence. New York: Columbia University, Teachers College.
Thorndike, R. L. (1963). The concepts of over- and under-achievement. New York: Columbia University, Teachers College.
Thorndike, R. L. (1986). The role of general ability in prediction. Journal of Vocational Behavior, 29, 332–339.
Thorndike, R. L., Hagen, E. P., & Sattler, J. M. (1986). The Stanford-Binet intelligence scale: Fourth edition technical manual. Chicago: The Riverside Publishing Company.
Thorndike, R. M., & Lohman, D. F. (1989). A century of ability testing. Chicago: The Riverside Publishing Company.
Thurstone, L. L. (1937). Ability, motivation, and speed. Psychometrika, 2, 249–254.
Thurstone, L. L. (1938). Primary mental abilities. Psychometric Monograph, 1.
Thurstone, L. L. (1947). Multiple factor analysis. Chicago: University of Chicago Press.
Tyler, L. (1986). Back to Spearman? Journal of Vocational Behavior, 29, 445–450.
Underwood, B. J. (1975). Individual differences as a crucible in theory construction. American Psychologist, 30, 128–134.
Vernon, P. E. (1950). The structure of human abilities. London: Methuen.
Wagner, R. K., & Sternberg, R. J. (1984). Alternative conceptions of intelligence and their implications for education. Review of Educational Research, 54, 179–224.
Wagner, R. K., & Sternberg, R. J. (1986). Tacit knowledge and intelligence in the everyday world. In R. J. Sternberg & R. K. Wagner (Eds.), Practical intelligence: Nature and origins of competence in the everyday world (pp. 51–83). Cambridge, MA: Cambridge University Press.
Watson, J. B. (1925). Behaviorism. New York: Norton.
Wechsler, D. (1939). The measurement of adult intelligence. Baltimore: Williams & Wilkins.
Wissler, C. (1901). The correlation of mental and physical tests. Psychological Monographs, 3 (6, Whole No. 16).
Woltz, D. J. (1988). An investigation of the role of working memory in procedural skill acquisition. Journal of Experimental Psychology: General, 117, 319–331.
37 Cognitive Demands of New Technologies and the Implications for Learning Theory
Richard J. Torraco
At a basic level, learning to perform work-related activities requires the engagement of one’s cognitive processes with the task to be accomplished. Few studies trace prescriptions for learning all the way back through human cognitive processes to the specific characteristics of the tasks that determine what should be learned in the first place. This study identifies the cognitive demands of new technologies by first examining the specific requirements of tasks involving the use of new technologies. These tasks are shown to place unique cognitive demands on those who use new technologies. Then, the question is raised, How well do current theories of learning address these cognitive demands? Four theories – those of Scribner, Schon, Wenger, and Hutchins – are analyzed for their power to explain human cognition and learning as they relate to the use of new technologies. Finally, the article offers new directions for future research on learning theory, studies of workplace learning, and theory building.
Although the primary focus of this article is the need for better theory to explain learning at work, I start by addressing the nature of the task itself. This honors Gagne’s (1962) seminal admonition that scholars should first examine the task to be learned in order to specifically address what learning should accomplish and Hackman’s (1969) treatise on the centrality of the task to research on work behavior. The concept of task remains a prominent construct in models of work design (Campion & Medsker, 1992; Hackman & Oldham, 1980; Smith, Henning, & Smith, 1994), work motivation (Locke & Latham, 1990), work complexity (Campbell, 1988; Khurana, 1999; Weick, 1990), and human cognition (Engestrom & Middleton, 1996; Hutchins, 1995; Simon, 1981).
Source: Human Resource Development Review, 1(4) (2002): 439–466.
Characteristics of Tasks Involving New Technologies
Any meaningful discussion of task characteristics must be framed in the context of a work or organizational environment. For the purpose of this article, the work environment is composed of the immediate physical environment of the worker and the organizational demands placed on the worker. The immediate physical environment is constituted by the tools, equipment, electronic devices, and other material resources needed by the worker to accomplish the task. Organizational demands relate to work requirements placed on the individual that go beyond one’s immediate responsibilities, such as project deadlines, the need for administrative approvals, expected rates of transaction, and other requirements imposed on the individual by the organization that indicate how one’s job activities fit with process requirements. Technology is also a key concept in this discussion. Like other tools, technology is a means through which work is accomplished. Berniker’s (1987) definition of technology is adopted for this discussion. Berniker embedded the concept of technology within the larger structure of a technical system, which is
a specific combination of machines, equipment, and methods used to produce some valued outcome. . . . Every technical system embodies a technology. It derives from a large body of knowledge which produces the basis for design decision. Technology refers to a body of knowledge about the means by which we work on the world, our arts and our methods. (p. 10)
Four specific characteristics of tasks involving the use of new technologies are described next. These concepts – contingent versus deterministic tasks, distancing technologies, stochastic events, and systemic interdependence – compose a proposed model of task characteristics associated with new technologies. These concepts were identified as elements of the model through a comprehensive review of the literature that sought task characteristics distinctively associated with the use of new technologies. These four characteristics are central to understanding the unique cognitive demands faced by users of new technologies and are not meant to be an exhaustive review of constructs related to the use of new technologies. Mindful of the need for balance between a model’s comprehensiveness and parsimony, I developed two concepts to explain new phenomena (contingent versus deterministic tasks and distancing technologies), whereas two others are adapted from existing theories (stochastic events and systemic interdependence). Thus, all concepts in the model are either developed and justified by the author or grounded in existing literature.
• Contingent versus deterministic tasks: When applied to what were once routine, predictable tasks (deterministic tasks), new technologies have substantially increased the contingency of these tasks by increasing their complexity and speed (Pentland, 1997).
• Distancing technologies, such as digital displays, controls, and sensor technologies, remove the operator from the operating location and eliminate the physical cues and sentient information from which knowledge can be derived (Woods, O’Brien, & Hanes, 1987; Zuboff, 1988).
• Stochastic events are randomly occurring and unpredictable events that are properties of new technologies (Weick, 1990) and flexible manufacturing systems (Norros, 1996).
• Systemic interdependence is the system of relationships needed to ensure that one’s work is coordinated with that of others within the work system (Adler, 1986).
These four task characteristics of new technologies place unique cognitive demands on those who use them. The next section discusses the cognitive demands associated with each task characteristic. The task characteristics and the cognitive demands associated with them are listed in Table 1.
Table 1: The cognitive demands of task characteristics
Task characteristic: Contingent versus deterministic tasks (Unanticipated problems; Expanded menu of solutions)
Cognitive demand:
• Mental reconstruction of problem and causes
• Capability for systematic search and pragmatic solutions
• Ability to go beyond scripted procedures
Task characteristic: Distancing technologies (Physical separation; Psychological separation)
Cognitive demand:
• Capabilities for inference, imagination, and mental modeling to understand what is going on elsewhere
• Reconciliation of mental representation of work process with actual work process
Task characteristic: Stochastic events (Disruptions to work process; Premature task termination)
Cognitive demand:
• Movement from emotional arousal to constructive thought and action
• Memory (information storage and retrieval) to reconsider means-ends relationships and desired end states
• Improvisation – the abilities of the bricoleur
Task characteristic: Systemic interdependence (Partial versus complete knowledge; Transforming inputs to outputs)
Cognitive demand:
• Interpersonal skills
• Transactive memory
Cognitive demands of contingent tasks. New technologies have fundamentally changed the character of technical work by removing it further than ever from its historical and deterministic origins. The work of technicians and craftsmen has always been contingent on contextual factors because task execution is frequently altered by temporal, material, social, economic, and other factors that reflect the changing properties of the task environment. However, new technologies have substantially increased task contingency by reducing their transparency, increasing their speed, and expanding the menu of options available for task accomplishment. Work processes that were formerly transparent (i.e., separate, observable, and easily deconstructed into their systemic components) have been combined through process engineering (Davenport, 1993) and have disappeared into computer-controlled machines and communication technology (Weick, 1990), thereby reducing their transparency and complicating efforts to discern process interactions. Technicians and customers can no longer see the flow of information and materials and must infer from outputs what occurred earlier in the work process. Replacing the industrial-era belief in “one best method” for each operation (Woodward, 1994), the reduced transparency of technology now masks an Internet-like network of possible paths for processing materials and information. These factors contribute directly to the contingency of tasks associated with the use of new technologies. New technology also feeds social pressures for rapid transactions, especially in customer service situations. Predetermined procedures are frequently abandoned to expedite customer requests. Expanded options for task accomplishment within and among distinct technologies further contribute to the contingent nature of work strategies. Although the Internet offers alternative paths for processing information (e-mail, Web sites), it can be circumvented altogether through the use of the phone, facsimile, satellite links, paper, or personal contact for communication. As witnessed in equipment repair (Orr, 1996), software support (Pentland, 1997), science laboratories (Barley & Bechky, 1994), insurance claim processing (Wenger, 1998), automobile assembly (Graham, 1993), and military operations (Hutchins, 1995; Weick & Roberts, 1993), new technologies have made technical tasks more contingent than ever.
Because technical work is filled with novel or poorly defined problems that cannot be fully anticipated in advance (Barley & Orr, 1997), technicians are often confronted by technology breakdowns of ambiguous origins that cannot be resolved with schematics and procedural knowledge alone. Because their problem-solving algorithms are inadequate for the variety and unpredictability of these problems, technicians must rely instead on pragmatic rules of thumb and other shortcuts afforded by the task environment. Thus, successful performance requires employees to go beyond scripted procedures to resolve problems in innovative ways. This does not mean that job aids, operating procedures, and training in the use of such resources are of no value. Effective work systems should provide such technical assistance in ways that are easily referenced to minimize mental and computational loads, so workers are free to do higher level evaluation and problem solving (Norman, 1988). However, to understand and respond to ambiguous situations, workers must make use of improvised materials, local conditions, and social circumstances, thus deploying contingent work strategies that reflect the changing properties of the task environment.
Salkind_Chapter 37.indd 54
9/4/2010 10:42:19 AM
Torraco
New Technologies
The cognitive demands faced by those who deal with these problems include a considerable amount of systematic mental search to identify pragmatic solutions. Workers frequently must reconstruct the situation that led to the problem, identify the causes, develop a solution strategy, and ensure that the proposed solution is satisfactory and feasible. Assistance in this regard from the information storage and retrieval capabilities of technology may not always help workers deal with these cognitive demands. Technology-based troubleshooting aids that provide workers with problem-solving heuristics, algorithms, and databases are just as likely to hinder as help the worker’s problem-solving efforts (Norman, 1993). Moreover, as Griffith and Northcraft (1996) demonstrated, the implementation of technologies may be better served when less, rather than more, information about the technology is provided to users. Regardless of whether the technology itself is seen as valuable or detrimental, the contingency of tasks associated with new technologies places important cognitive demands on those who must use them. This characteristic of new technologies (contingent vs. deterministic tasks) and the cognitive demands associated with it are listed in Table 1.

Cognitive demands of distancing technologies. Distancing technologies are present in work environments ranging from industrial factories (Zuboff, 1988) to high technology settings (Pentland, 1997). Digital displays, controls, and sensor technologies at the operator’s workstation are symbolic representations that distance workers from the physical and sensory referents present at the actual sites of operation. In the pulp and paper mills studied by Zuboff (1988), instrumentation formerly was located on or close to the operating equipment, allowing the operator to combine data from an instrument reading with data from his or her own senses.
Distancing technologies removed the operator from the operating location and eliminated the physical cues and sentient information in which knowledge was based. In addition to physical distance from customers, the software support technicians studied by Pentland (1997) were expected to solve software problems over the telephone despite customers’ diverse software and hardware configurations. Problem solving was difficult due to the ambiguity of the problems as described by customers, who were unable to identify specific conditions that were relevant to troubleshooting the problems. Technicians had difficulty visualizing the situations that gave rise to the problems. The problems created by physical distance are magnified by the computer controls of most distancing technologies that display information on separate screens of a computer monitor. To recognize irregular patterns among the data or to initiate novel search sequences, the technician must remember what earlier screens have shown and hope that the readings have not changed while subsequent screens are accessed. However, human factors research has shown that it is easier to recognize patterns when data are presented simultaneously rather than serially. Technicians in production control rooms who use technology based on this research can easily create novel search sequences
when they are able to sweep visually across an array of indicators that present data at the lowest level of detail (Woods, O’Brien, & Hanes, 1987). Similar principles of task design have been applied to high-technology work settings. For example, Gill (1996) showed that changing the task characteristics of expert information systems to enhance the user’s sense of control over task activities and their own performance increased the intrinsic motivational character of these tasks, which, in turn, enhanced the workers’ motivation to increase their use of these expert systems. Separation from the operating environment requires workers to interpret symbolic, electronically presented data. The ability to make sense of what is going on at remote operating sites is vital to competent performance in these work environments. Software support technicians and production control room operators have to imagine the conditions in the operating environment that cannot be displayed by their information systems. Before attempting to solve problems, they must first mentally visualize the conditions that give rise to the problems. Thus, the physical and psychological separation created by distancing technologies increases cognitive demands for inference, imagination, and mental modeling to understand what is happening elsewhere. Human-computer-robot manufacturing systems, known as automatic manufacturing technology, provide an example of the performance problems associated with distancing technologies. Early automatic manufacturing technology systems provided the operator with a televised view of the robot’s actions at the point of manufacturing located away from the operator’s control panel. 
But because the televised view distorted the spatial properties of visual feedback to the operator, control panels were relocated closer to the point of operation to allow direct viewing of the automatic manufacturing technology robot by the operator, thus reducing the cognitive demands on the operator (Smith, Henning, & Smith, 1994). The physical and psychological distance from one’s work caused by these technologies can be reduced by designing work environments according to principles of human factors and ergonomics (Salvendy, 1987) and by giving more attention to the importance of inference and mental modeling in employee training and development. A discussion of how well these employee needs are addressed in current theories of learning is presented later in the article. This characteristic of new technologies (distancing technologies) and the cognitive demands associated with it are listed in Table 1.

Cognitive demands of stochastic events. Complexity is added to the task when it is interrupted by stochastic events. Stochastic events are randomly occurring and unpredictable events that are properties of new technologies (Weick, 1990) and flexible manufacturing systems (Norros, 1996). When new technologies are implemented in industrial work processes, they frequently produce system disturbances to which operators must respond, even though they have not yet developed expertise in the use of these technologies (Norros, 1996). Pre–industrial era technologies were predictable and easily understood
because key operating mechanisms followed clear cause-effect relationships. However, today’s technologies are more complex and present problems due to their instability, reduced transparency, and tendency to break down. New technologies have always been accompanied by problems. Early mass production lines were plagued by incessant breakdowns. Although stochastic events are not new, Weick (1990) noted that new technologies are unique in that the uncertainties are permanent rather than transient. Many software-dependent systems are intentionally pushed through product development and quickly delivered to market. Product testing is short-circuited because implementation is often the means by which the technology itself is designed. Such development-delivery tradeoffs result in “buggy” software, incomplete information networks, and password-activated technologies that will not start. Even common technologies are not free of breakdowns (e.g., disconnection from the Internet, power failures, being cut off during telephone calls). Dealing with the disruptions from unfinished technologies and prototypes increases the cognitive demands placed on technicians who must use them. When a sudden, unpredictable event disrupts a task, it triggers emotional arousal (Weick, 1990). Once emotion is stimulated, it increases as long as the interruption remains unexplained, especially when work stoppage is costly or risky. Stochastic events require rapid movement from emotion to action, that is, from arousal, to the search for explanations, to actions that produce information about possible causes. This occurs as the worker tries to subdue emotional interference with thought and action. Sudden work stoppage also forces the reconsideration of means-ends relationships and of desired end states. Are alternative paths available for project completion that circumvent the disabled technology? Can the project be completed elsewhere, by someone else, or at a later time?
How much can the desired end state of the transaction be modified? The cognitive demands on memory (information storage and retrieval) and search for additional information from such disruptions can be considerable. A sudden system failure challenges workers to make do with the tools and materials at hand. As they improvise to complete their tasks, they invoke the skills of contemporary bricoleurs – resourceful craftsmen who make use of whatever materials are available to complete the project (Levi-Strauss, 1966). This characteristic of new technologies (stochastic events) and the cognitive demands associated with it are listed in Table 1.

Cognitive demands of systemic interdependence. The interdependencies needed to ensure that one’s work is coordinated with the work of others have been termed “systemic interdependence” by Adler (1986).

Systemic interdependence requires ongoing and flexible integration of hitherto distinct functions of operations, systems, design, and training. The reciprocal nature of this interdependence in operations is exemplified in the reliance on common databases. Users thereby become dependent on other users’ data input accuracy. (p. 19)
Systemic interdependence requires interpersonal skills and the ability to work effectively with others on the same project despite different social and technical backgrounds. Such interdependence is strengthened through the use of transactive memory systems (Wegner, Erber, & Raymond, 1991). Transactive memory is based on the premise that we need not know a particular subject ourselves if we know where to find information about it. Transactive memory systems are integrated and differentiated structures in which related information is held by different group members working on a common project. It is the sharing of relevant data that yields the higher order insights and generalizations that are valued in these work environments. Workers who contribute to transactive memory systems participate in the sharing and integration of technical knowledge and, in turn, further develop their networks of social and technical interdependencies. This characteristic of new technologies (systemic interdependence) and the cognitive demands associated with it are listed in Table 1. These four characteristics – contingent versus deterministic tasks, distancing technologies, stochastic events, and systemic interdependence – are fundamental elements of new technologies that place unique cognitive demands on those who use them. How well do current theories of learning address these cognitive demands? The next section discusses the extent to which four selected theories explain human cognition and learning as they relate to working with new technologies.
Theories of Learning and Cognitive Demands

The theories of Scribner (Tobach, Falmagne, Parlee, Martin, & Kapelman, 1997), Schon (1983, 1987), Wenger (1998), and Hutchins (1995) are analyzed for their power to explain human cognition as it relates to the use of new technologies. These four learning theories provide comprehensive and meaningful explanations of how learning occurs in the type of work settings discussed here. The criteria used for selecting these theories for this discussion are that each theory (a) describes specific cognitive processes, (b) addresses learning as both an enabler and product of work practices, (c) explains how learning occurs in authentic work settings, (d) is comprehensive in its treatment of the behavioral and environmental influences on learning, and (e) offers propositions that can be generalized to other settings. Because these four theories explain the phenomena discussed in this article better than most other theories, these five selection criteria are discussed in more detail in the final section of the article as desirable characteristics of sound theory.

Scribner’s model of practical thinking at work. Scribner used activity theory as developed by Leont’ev (1981) to bridge the conceptual relationship between knowing and doing in her cognitive studies of work. Activity theory explains purposeful behavior by focusing on the structure of the activity
itself. For Leont’ev, the activity is the appropriate unit of analysis for human behavior. An activity can be analyzed at three levels. First, at the highest level of organization is the motivation of the activity, which provides coherence to the other levels. At the next level are goal-directed actions, carried out in the service of the activity. At the third level are operations, or the specific conditions under which actions are carried out. For example, if our action is traveling from one place to another in the service of some activity (e.g., pursuing leisure and recreation), whether we walk, drive, or use some other means of transportation is an operation that depends on distance and other specific conditions related to the action. Because dynamic relationships exist among the three elements of the theory, the theory presents different levels of analysis for studying work activity. Activities, actions, and operations may change positions in the hierarchy relative to one another according to changing situations, new knowledge, and the intentions of the human agent. Because motivated activities, actions, and operations are defined according to their functions rather than properties inherent in the elements themselves, an activity can lose its motivating force and become an action in the service of another activity (e.g., losing interest in the intrinsic value of one’s job and performing it primarily for income). Hence, questions about performance or the structure of work in different environments can be asked at the level of the activity, the action, and the operation. Because an activity is a dynamic system, methods of studying the activity can change as the activity changes and as new questions about it emerge. Scribner’s model of practical thinking is strongly influenced by the notion of activities as mediators of knowing and doing.
The collection of Scribner’s cognitive studies of work concludes with a paper that presents her model of practical thinking (Tobach et al., 1997). The model is organized around four principles synthesized from Scribner’s studies of dairy workers (Scribner, 1984), industrial machinists (Martin & Scribner, 1991), bartenders (Scribner & Beach, 1993), indigenous literacy in West Africa (Scribner & Cole, 1981), and practical and theoretical arithmetic (Scribner & Fahrmeier, 1983). Scribner’s research sought support for the premise that cognitive skills take shape in the course of participation in socially organized practices. The results of her work are embodied in the four principles of her model: (a) economy of effort functioned as a criterion distinguishing skilled from amateur performance – the “least-effort strategy” was consistently followed by skilled performers whether mental or physical effort was minimized and regardless of resource constraints in the work environment; (b) problem-solving strategies were dependent on specific knowledge about materials and conditions in the immediate task environment; (c) diversity and flexibility of solution modes distinguished expert problem solvers from beginners; and (d) more experienced workers replaced all-purpose algorithms with a menu of solution modes fitted to properties of specific problems in changing environments. Scribner (cited in
Tobach et al., 1997) summarized the four principles in this way: “Thinking in the dairy was goal-directed and regulated by a principle of economy which, operating under changing conditions and on the basis of knowledge and information in the environment, generated flexible solution procedures adapted to particular occasions of use” (p. 380).

Scribner’s work and the cognitive demands of new technologies. Scribner’s work demonstrated that workers seek pragmatic solutions through economy of effort regardless of the contingent or deterministic structure of the task. Her study of working intelligence (Scribner, 1984) fully accounts for task unpredictability and the need to go beyond scripted procedures to accommodate the changing demands of the task environment. For example, because each dairy order was different, delivery drivers modified their problem framing and arithmetic solutions to conform to the benefits of either their calculators or paper-and-pencil computations. Ways of solving problems followed means of arriving at solutions. Systemic interdependence requires knowledge of how one’s work fits in with the work of others and the ability to work with others on interdependent tasks. Scribner’s theory emphasizes workers’ ability to capitalize on available resources to find successful work strategies, including the efficiencies and reduced effort of relying on one’s coworkers to accomplish related tasks. However, a dominant theme in Scribner’s work, the importance of contextual factors in cognitive studies, is reflected in her theory as a multiplicity of influences, both social and material, on the cognitive strategies people adopt to accomplish their work. Interpersonal relations and interdependencies among workers constitute one of several key factors identified by Scribner that shape one’s repertoire of work behaviors.
Scribner’s model emphasizes that successful work strategies are goal directed and vary adaptively with the changing properties of the problems and resources encountered by workers in the task environment. The model explains how workers might respond to stochastic events by relying on flexible solution strategies and improvising with available tools and materials. Task disruptions might trigger the reassessment of means-ends relationships, and solutions would reflect Scribner’s concept of mental and physical effort saving. Contextual factors would strongly influence how workers in a production environment learn and adapt their skills on the job. The power and endurance of Scribner’s model are evident. Even though it was developed 20 years ago, before technologies considered new today were developed, her theory explains how workers adapt to task contingency and respond to stochastic events. Although today’s workers might use Scribner’s least-effort strategies and context-specific solutions to achieve competence in today’s high-technology work environments, it is not known how well Scribner’s model of practical thinking addresses the cognitive demands of distancing technologies or explains the roles of inference and mental modeling to enable more effective
use of these technologies. A summary of how Scribner’s theory addresses the cognitive demands of new technologies is given in Table 2.

Schon’s theory of reflection in action. Schon (1983) argued for a new epistemology of practice that takes as its point of departure the competence and artistry already embedded in skillful practice – especially, the reflection in action through which professionals think about what they are doing while they are doing it. Reflection in action is a theory of learning that explains how reflective practitioners use knowledge and problem solving in their work. Reflection in action is an iterative process that moves through the stages of (a) assessment of the situation, (b) testing of one’s preliminary sense of the problem through experiments, (c) examination of results, and (d) reassessment leading to another cycle of problem reformulation. Learning occurs through an iterative process of purposeful actions, discovered consequences, implications, reassessments, and further actions. Using reflection in action, we conduct experiments to examine the validity of our judgments and, in the process, expose ourselves to new possibilities for learning. According to Schon (1983), “the situation talks back, the practitioner listens, and as he appreciates what he hears, he reframes the situation once again” (p. 131). This theory of learning prompted Schon to raise a critical question: What kind of professional education would be appropriate to an epistemology of practice based on reflection in action? His subsequent work (Schon, 1987) answered this question by proposing that university-based professional schools should learn from such deviant traditions of education for practice as studios of art and design, conservatories of music and dance, athletic coaching, and apprenticeship in the crafts, all of which emphasize coaching and learning by doing.
Professional education, Schon (1987) argued, should be redesigned to combine the teaching of applied science with coaching in the use of reflection in action strategies. He proposed a generalized educational setting, the reflective practicum, as a model for professional development in which learning occurs by doing, with the help of coaching, especially through a dialogue of reciprocal reflection in action between coach and student. The reflective practicum is a methodology for implementing reflection in action in the sense that it brings together the necessary material and contextual resources, along with the coach’s personal and technical support for critical reflection. It provides an environment in which students can learn by doing, not simply through trial and error, but through critical reflection as students are coached in reflection in action strategies. Reflection in action begins with a situation that yields spontaneous routinized responses. As long as the situation appears normal, our responses are tacit and spontaneously delivered without conscious deliberation. Yet routine responses sometimes produce a surprise – an unexpected outcome, pleasant or unpleasant, that does not fit our present knowledge schema. This unexpected consequence triggers reflection. We think about the consequence and ponder why it occurred and, at the same time, we ask, “How have I been thinking about this?” Our thoughts turn back on the surprising phenomenon and, at the same time, back on themselves. Thus, reflection in action is a critical function through which we consciously or unconsciously question the assumptions of our present knowledge. Schon contrasts reflection in action with the technical rationality of prevailing curricula for professional education. Technical rationality is based on an objectivist view of practice that posits that reality can be known objectively – the reality to be known is distinct from the practitioner’s knowing. According to this view, professional knowledge is founded on facts and data; formal inquiry serves to measure, predict, and control the phenomenon of interest. On the other hand, reflection in action rests on a constructivist view of the reality that professionals face in practice. Reality and its meaning are negotiable, and what is known is influenced by the process of coming to know it. The dynamics of reflection in action cut across the positivist dichotomies of research-practice, means-ends, and knowing-doing. For the reflective professional, practice is researchlike, means and ends are interdependent and may be transformed depending on how the problem is framed, and practice involves personal interaction with the situation in which knowing and doing are inseparable.

Table 2: Summary of learning theories and the cognitive demands of new technologies

Task characteristic: Contingent versus deterministic tasks
Scribner:
• Problem-solving strategies are adaptive and dependent on specific knowledge of materials and changing conditions in the task environment.
• Workers seek pragmatic solutions that reflect economy of mental and physical effort.
Schon:
• Reflection in action enables workers to spontaneously construct solutions to problems that cannot be fully anticipated in advance.
Wenger:
• Communities of practice allow workers to reach pragmatic solutions through mutual engagement.
• Communities of practice legitimize peripheral learning and foster adaptation and sense making in changing work environments.
Hutchins:
• In Hutchins’s model of cultural cognition, practice, learning, and work environment are all simultaneously transformed.
• Workers use tools to transform the (navigation) task by mapping it into a domain, using representations and heuristics, where the answer or the path to the solution is apparent.

Task characteristic: Distancing technologies
Scribner:
• Although Scribner’s principle of context-dependent problem solving has been broadly applied to work settings, it is not known how well the theory addresses the cognitive demands of distancing technologies.
Schon:
• Reflection in action is a means for making sense of new situations through an iterative process of purposeful action, discovered consequences, implications, reconstruction of our understanding, and further actions.
Wenger:
• Communities of practice provide a social context for learning about distancing technologies and enable the sharing of representations of these technologies among members across locations.
Hutchins:
• Hutchins’s model explains how workers’ mental representations of their work allow navigation at night when navigators are “distanced” from the sentient cues that relate the ship’s position to its environment.

Task characteristic: Stochastic events
Scribner:
• Scribner’s model explains how workers improvise with available tools and materials and use flexible solution strategies to respond to task disruptions and stochastic events.
Schon:
• Workers respond to stochastic events through reflection in action. Reflection in action probes the unexpected disruption and allows for tentative understanding, testing, and reframing of the event to reach a resolution.
Wenger:
• Communities of practice strive to make sense of stochastic events as members exchange individual perspectives on their meaning.
• The sharing of interpretations of stochastic events is likely to include explanations for successfully resolving these disruptions.
Hutchins:
• Hutchins describes how workers in crisis overcome emotional arousal and construct solutions from procedural knowledge, environmental shortcuts, and bricolage. In these situations, practice, learning, and work environment are all simultaneously transformed.

Task characteristic: Systemic interdependence
Scribner:
• Scribner’s model identifies mutual dependencies among colleagues as means for adapting to the changing demands of the task environment.
Schon:
• Relationships among coaches and students based on reciprocal reflection in action are central to professional development in Schon’s model.
Wenger:
• Communities of practice build support and interdependencies that foster the sharing of members’ insights and generalizations.
• Membership in a community of practice provides access to the knowledge of individuals and of the community of practice.
Hutchins:
• Hutchins’s conception of distributed cognition reflects the overlapping knowledge among navigation team members and emphasizes networks of interdependence and shared expertise.

Schon’s work and the cognitive demands of new technologies. Schon’s theory accounts for contingent tasks by acknowledging that professionals are frequently confronted by novel situations and must construct their interpretations and responses accordingly. Schon recognized that procedural knowledge and problem-solving algorithms have limited applications in practice, where most problems are contextual and difficult to predict.
The capability for reflection in action addresses these cognitive demands by allowing workers to bypass scripted procedures to arrive at solutions for problems that cannot be fully anticipated in advance. Reflection in action also explains the cognitive processes needed to respond effectively to stochastic events. A sudden disruption arouses emotion and triggers reassessment of means-ends relationships. Schon’s discussion of mental experimentation explains how workers might respond to a sudden systems failure by probing the unexpected disruption, forming a tentative understanding of the event, testing their understanding, and reframing the problem to arrive at a solution. Thus, workers respond to emergent situations by constructing new knowledge through reflection in action. Reflective practitioners are continuous learners, and those involved in professional practice are regularly confronted by new situations that may be uncertain, ill defined, and incoherent. Problem novelty and ambiguity are among the challenges facing those using distancing technologies that separate people from the physical cues and information present at the operating location. Those who have embraced reflection in action for solving
problems and making sense of new situations come to rely on their cognitive strategies for constructing understandings of the new problems confronted in practice. Although referring to the problem solving used to resolve architectural problems of an ambiguous nature, the following statement by Schon (1987) applies to other reflective practitioners, including those who use new technologies: “Their designing is a web of projected moves and discovered consequences and implications, sometimes leading to reconstruction of the initial incoherence – a reflective conversation with the materials of a situation” (p. 42). Because relationships among coaches and students based on reciprocal reflection in action are central to professional development in Schon’s model, the theory reflects the systemic interdependence involved in work situations that include the use of new technologies. One learns and refines reflection in action strategies through ongoing exchanges of reciprocal reflection in action with others. As colleagues in a network of practitioners, those working with new technologies share in shaping each other’s problem-solving strategies during reflective practice. A summary of how Schon’s theory addresses the cognitive demands of new technologies is given in Table 2.

Communities of practice. Communities of practice are informal associations of workers who share common work problems and seek the benefits of learning from one another. In such communities, learning occurs primarily through participation in social practice (Wenger, 1998). Underlying communities of practice as an observable phenomenon is Wenger’s theory of social learning. The theory embodied in communities of practice builds on previous work in social learning theory and situated cognition. Social learning theory explains learning as a product of the reciprocal interactions among behavior, cognition, and environmental factors.
Learning can occur directly, especially when one’s learning self-efficacy is high, or vicariously through behavior modeling by others (Bandura, 1977). Situated cognition originates with engagement in the activity itself, not with a preconceived model of how learning should occur. Situated cognition follows an “activity-perception-representation” model, in which the cognitive dynamics of learning appear less open to the predetermined knowledge schemas that are dominant in formal instruction (Brown, Collins, & Duguid, 1989). When people lack experience with a situation or are introduced to a new concept, presenting a relevant model may catalyze the formation of mental representations of what is learned. Along with new perceptions and relevant past experiences, the model becomes part of the present context for learning, in which the learner’s activities and perceptions precede mental representation. Four constructs compose the framework for Wenger’s theory of learning: practice, the shared historical and social resources, frameworks, and perspectives that can sustain mutual engagement in action; community, the social configurations in which our enterprises are defined as worth pursuing and our participation is recognizable as competence; identity, how learning
changes who we are and creates personal histories of becoming in the context of our communities; and meaning, the ability to experience our life and world as meaningful. Wenger’s assumptions about learning and the nature of knowledge include the premise that meaning – our ability to experience the world and our engagement with it as meaningful – is ultimately what learning is to produce. Another assumption that grounds communities of practice is that engagement in social practice is the fundamental process by which we learn and so become who we are. Thus, communities of practice provide a broad conceptual framework for thinking about learning as a process of social participation. The concept of practice is carefully defined by Wenger as experiences that include both the explicit and the tacit. Practice involves the language, tools, documents, images, symbols, well-defined roles, specified criteria, codified procedures, regulations, and contracts that various practices make explicit for a variety of purposes. But practice also includes the implicit relations, tacit conventions, subtle cues, untold rules of thumb, and so on. Most of these are never articulated, yet they are unmistakable signs of membership in communities of practice and are crucial to the success of their organizations. Learning in practice addresses the need for members to acquire skills and information, but learning goes beyond gaining competence. Members use competence to form an identity of participation. “Practice connotes doing, but not just doing in and of itself. It is doing in a historical and social context that gives structure and meaning to what we do. In this sense, practice is always social practice” (Wenger, 1998, p. 47). The central issue in learning is becoming a member of a community of practice, not simply learning about practice. A community of practice is a learning community to the extent that it is able to continuously reconfigure the identities of its members and of itself. 
This flexibility of organization allows it to negotiate and renegotiate the nature of its practice. Identity in a community is fostered by allowing members to participate peripherally, yet legitimately, in practice. Legitimacy and peripheral participation in practice are often mutually exclusive. Newcomers seeking to participate in the work of a community of practice are granted peripherality (e.g., as students) but denied legitimacy. Conversely, newcomers may be granted legitimacy but are denied the opportunity for development through peripheral participation. The periphery of practice not only is an important site for learning but can be a valuable source of innovation. Sustaining the peripherality of members’ perspectives is sought increasingly as a way to generate fresh insights for practice and new directions for the future. Wenger’s work and the cognitive demands of new technologies. Participation in communities of practice allows each member to draw on collective knowledge to construct responses to unanticipated or poorly structured problems, thus enabling members to respond effectively to contingent tasks associated with new technologies. Wenger (1998) illustrated his theory with ethnographic
accounts of insurance claims processors who had to respond to customers’ questions about claims coverage given only standardized forms and procedures and without full knowledge of how contested claims were ultimately resolved. The tasks they faced were made more contingent by customers’ concerns about copayments and company concerns about overpayments, especially in cases of multiple coverages. Workers tried to make sense of these ambiguous situations primarily through social configurations – the networks that claims processors spontaneously formed with each other, not by following claims processing procedures. Communities of practice allowed workers to go beyond standardized procedures and reach pragmatic solutions through mutual engagement. Stochastic events interrupt work, trigger affective responses, and challenge workers to make sense of unexplained disruptions. Members of a community of practice have the advantage of drawing on collective experiences and emotional support from other members to arrive at explanations and responses for stochastic events. A defining feature of communities of practice is the ability to generate fresh perspectives on practice from members who each develop unique identities within their community of practice. Because unexplained events elicit perspectives from members, some who are central to practice, and others, as newer members, who are more peripheral to practice, responses to a stochastic event are diverse and more likely to include a strategy for explaining and resolving the disruption. Communities of practice are manifestations of Wenger’s theory of social learning, and the relationships and expertise acquired by employees at work are explained in terms of social learning dynamics. This process of social learning helps meet the cognitive demands of distancing technologies, which require users to possess the capabilities for inference, imagination, and mental modeling to understand what is going on elsewhere. 
Communities of practice have emerged in high technology environments where distancing technologies are present (Marshall & Shipman, 1995; Orr, 1996). Communities of practice enable members to make sense of distancing technologies by supporting a communal memory that allows individuals to understand these technologies without needing to know everything about them and by sharing representations of these technologies among members across locations. Communities of practice provide a social context for learning about new technologies that gives structure and meaning to this process for members. The creation of identity is at the core of how communities of practice enable members to meet the cognitive demands of new technologies. Wenger (1998) maintained that who we are and what we can do are transformed through the process of becoming members of communities of practice. Identity and membership permit further engagement in social practice and access to collective knowledge, thus providing the basis for establishing the systemic interdependencies needed by users of new technologies. Although a member may lack specific knowledge about a problem, communities of practice
provide collective knowledge that enables a response to the unpredictability and ambiguity of new technology (Orr, 1996). Legitimate peripheral participation (Lave & Wenger, 1991) and identity (Wenger, 1998) allow members of communities of practice to share their insights and generalizations and foster the development of systemic interdependence. Thus, communities of practice can build support and personal interdependencies that help to meet the cognitive demands of new technologies (Weick & Roberts, 1993). A summary of how Wenger’s theory addresses the cognitive demands of new technologies is given in Table 2. Cultural cognition. Hutchins conceptualized cognition as a complex phenomenon in which practice, learning, and the work environment are all simultaneously transformed. Hutchins (1995) stated, “The very same processes that constitute the conduct of activity and that produce changes in the individual practitioners of navigation also produce changes in the social, material, and conceptual aspects of the setting” (p. 374). These changes occur at different rates and degrees of intensity and reflect histories of different lengths, but they all intersect during any moment in human practice. In the course of task performance, learning occurs and subsequent actions are carried out that create elements of representational structure (e.g., written notes or an improvised tool) that survive beyond the end of the task. The artifacts of learning become elements of the environment, just as the environment influences the nature of learning. It is because these processes interact simultaneously that Hutchins considered cognition at work a fundamentally cultural process. Hutchins (1995) argued that as sociocultural systems, work environments have cognitive properties that are distinct from the cognition of those who perform the work. 
He confronted contemporary thinking in cognitive science by challenging the adequacy of symbolic processing alone to explain how we use cognitive abilities to solve environmental problems. In this regard, Hutchins stated,
Notice that when the symbols are in the environment of the human and the human is manipulating the symbols, the cognitive properties of the human are not the same as the properties of the system that is made up of the human in interaction with these symbols. The properties of the human in interaction with the system produce some kind of computation. But that does not mean that the computation is happening inside the person’s head. (p. 361)
This premise that knowledge can only be created through human interaction with a sociocultural system that includes environmental artifacts is the foundation for Hutchins’s theory of cognition. Hutchins’s work and the cognitive demands of new technologies. Hutchins (1995) opened Cognition in the Wild by describing a stochastic event – the USS Palau loses all power and risks running aground in a narrow channel
while entering San Diego harbor. Only through expert navigational skills and some luck is the crew able to recover the vessel and safely come to anchor. To meet the cognitive demands of these situations, workers must quickly overcome emotional arousal and construct solutions from procedural knowledge, environmental shortcuts, and bricolage (Levi-Strauss, 1966). Hutchins showed that this process is strongly shaped by the tools and techniques of practice, themselves historically developed. Learning is made easier in work settings where tools are used in public and the details of technology are observable, as they are in the practice of navigation. Hutchins described how the difficulty of piloting large ships is made easier by implementing the fix cycle – a series of procedures in which representations of the position of the ship in its environment are propagated across a series of representational media from initial telescope sightings to the actions taken to correct the ship’s course. These tools transform the complex task of navigation by mapping it into a domain, using the navigation chart and other artifacts, where the answer or the path to the solution is apparent. The fix cycle and other strategies allow navigation at night when navigators are distanced from the sentient cues that relate the ship’s position to its environment. They must rely on radar and limited environmental prompts to inform actions to maintain the ship’s course. Navigators’ mental representations of the ship’s position in oceanic darkness strongly influence the nature of the activities navigators use to monitor the ship’s course during their watch. The likelihood of encountering unanticipated contingent tasks increases when navigating through infrequently traveled waters and especially when piloting ships in the restricted waters of harbors and coastlines. 
Contingent tasks require the generation of novel responses (e.g., altering course and speed in response to approaching pleasure craft or changing weather) that may not be part of established procedures, because this type of navigation requires both adherence to restricted waters protocol (i.e., more frequent implementation of the fix cycle) and a collective awareness among navigation team members of the possibility of encountering an unscripted situation. Hutchins’s model of cultural cognition explains how navigators learn and adapt to rapidly changing navigation conditions through a process in which practice, learning, and the work environment are all simultaneously transformed. Systemic interdependence is accounted for in Hutchins’s model by overlapping distributions of knowledge among members of the navigation team and by the structure of shipboard authority and decision making. Hutchins clearly described the areas of overlapping knowledge among navigation team members, showed how the career trajectories of navigators are advanced through mastering ever-increasing areas of knowledge, and emphasized a decision-making process in which key personnel and environmental cues interact simultaneously, especially during crisis. Hutchins’s notion of distributed cognition reflects the network of interdependencies and the sharing of expertise associated with systemic interdependence.
The fix cycle also illustrates a central premise of Hutchins’s theory of cultural cognition – technology is best used to address the cognitive demands of complex tasks by using it to simplify the task, not to amplify cognitive ability. Illustrating the same point, Norman (1997) gave the example of using a computer for writing. Instead of designing computers and software programs to help the author create ideas with dialog boxes, menu choices, and other symbolic clutter, the computer should be used as a word processor to simplify the output process. Rather than attempting to use technology to extend one’s cognitive abilities, technology should transform what are normally difficult cognitive tasks into easy ones. A summary of how Hutchins’s theory addresses the cognitive demands of new technologies is given in Table 2.
Implications for Further Research
This section summarizes key ideas from the preceding discussion and offers directions for further research on learning theory, future studies of workplace learning, and theory building. Implications for research on learning theory. Several implications for further research emerge from this examination of learning theory and the cognitive demands of new technologies. First, the four theories examined in the article address some of the cognitive demands of new technologies discussed more completely than others. All four theories explain cognitive mechanisms related to how workers deal with the cognitive demands of contingent versus deterministic tasks, stochastic events, and systemic interdependence, albeit from different theoretical perspectives (see Table 2). However, the task characteristic of distancing technologies is only partially addressed by these theories. Hutchins’s theory offers the most complete treatment of this task characteristic with its explicit description of the cognitive strategies and navigation techniques used by navigators to pilot ships in unknown waters at night. In addition, the theories of Wenger and Schon offer plausible explanations of how one might adapt to the cognitive demands of distancing technologies. However, the relevance of Scribner’s theory to this task characteristic is speculative. Distancing technologies separate the worker physically and psychologically from elements of the task environment and require capabilities for inference, imagination, and mental modeling to understand what is going on elsewhere. Technologies with distancing properties such as those enabled by the Internet and satellite technology are among the most recent, complex, and rapidly developing of technologies affecting the workplace. 
Because technologies with distancing properties are complex and have very short design-implementation cycles (i.e., they quickly become obsolete and replaced by newer technologies), there is little time to assimilate considerations
from new users and applications before the next generation of technologies is introduced. Thus, the turnover, complexity, and rapid development cycles associated with these technologies account, in part, for why they are incompletely addressed by theories of learning. Future research is needed to further examine the requirements these technologies place on users. Moreover, we need to know more about the human and environmental factors that support effective learning in this context. How do users develop the capabilities for inference, imagination, and mental modeling associated with the effective use of these technologies? What resources and environmental conditions are most conducive to developing expertise in the use of distancing technologies? Research to address these and other questions is needed to formulate new or revised theoretical explanations of effective learning in the use of these technologies. Future research is also needed to examine questions specific to other learning theories examined in this article. Schon’s theory of reflection in action reconceptualized teaching and learning in the professions. Although it explains how professionals engage in reflective practice, its applicability to nonprofessionals (technicians, supervisors, and skilled personnel who may not be considered professionals) is less apparent. Are skilled nonprofessionals included among those for whom learning through reflection in action is intended to apply? Although there is ample evidence that the capability for reflective practice is not limited to professionals, the scope and application of this theory to various populations of employees and occupations remains a question open to further study. Those who have closely studied communities of practice have raised some concern about their nebulous nature. Wenger and Snyder (2000) stated that “the organic, spontaneous and informal nature of communities of practice makes them resistant to supervision and interference” (p. 141). 
This makes communities of practice difficult to identify, assess, and cultivate. Thus, their existence in organizational contexts presents a paradox. Communities of practice create a type of value increasingly sought by organizations, yet the active development of such communities by organizations destroys them. How can communities of practice be fostered if their organic, spontaneous, and informal nature makes them resistant to supervision and interference? Further research might also attempt to extend Scribner’s model of practical thinking. Although Scribner’s least-effort strategies and context-specific solutions provide valuable insights into the cognitive strategies used by those in the work settings she studied, how well does Scribner’s model explain the contingent, stochastic, interdependent nature of today’s work? Further research to address these questions promises to yield valuable revisions and extensions of these theoretical explanations of how learning occurs in contemporary work environments. What can we learn from these theories? The learning theories examined in this article are but four among many theories that have been developed to explain learning in a variety of contexts including experiential learning,
learning in formal educational settings, workplace learning and on-the-job training, informal and incidental learning, role- and occupation-specific learning, and other types of learning. The ubiquity of learning and the broad range of contexts in which it occurs constrain the ability of learning theories to explain more than a particular domain within this diverse phenomenon. Even so, a class of learning theories is available to those seeking theoretical explanations of learning in work settings, and from among these, specific theories are available that adequately address the types of work settings and technical tasks discussed in this article. A central contention of this article is that relatively few learning theories that have been applied by scholars to work settings fully capture the behavioral and environmental dynamics of this distinctive phenomenon. Because the four theories selected for discussion here explain this phenomenon better than other theories, their attributes merit further examination. What is noteworthy here is not which of these four theories (Scribner’s, Schon’s, Wenger’s, or Hutchins’s) is the best or right theory for explaining this type of learning but the acknowledgment that this class of theories has theoretical properties that enable them to provide effective explanations of work-related learning and that distinguish them from other learning theories. Some features of sound theories. The four theories discussed in this article provide meaningful explanations of how learning occurs in the type of work settings discussed here. Why is this? Considering the need to reflect the workplace, what are the features of a good theory of learning? Five attributes embodied in the four theories covered in this article are summarized next. 1. Each theory describes specific cognitive processes. All the theories make explicit the cognitive processes for learning and describe the dynamics of learning in particular environments. 
Schon described the dynamics of learning as iterative cycles of reflection in action. Hutchins explained learning as a sociocultural process that occurs simultaneously with the activities of practice and changes in the environment. Scribner described specific solution strategies and how they were derived through learning by experienced workers. Wenger described how learning in practice is generated by the dynamic tension between experience and competence. All four theories describe specific cognitive processes and clearly explain how the dynamics of learning relate to other aspects of their theories. 2. Each theory addresses learning as an enabler and product of work practices. Rather than treating either learning or work practice as dominant, each theory reflects their reciprocal relationship by grounding learning in the conduct of practice. Work practice is one of four central concepts in Wenger’s theory of social learning; Schon proposed the reflective practicum as the setting to operationalize his theory of learning; Scribner showed how cognitive skills were dependent on the materials and conditions of practice; and
Hutchins proposed practice as the intersection of work activity, learning, and the environment, where all are simultaneously transformed. 3. Each theory explains how learning occurs in authentic work settings. Each theorist relied on ethnography or intimate knowledge of practice to describe the work settings and define the tasks in which learning was studied. None of the studies from which these theories were derived were purely theoretical or carried out in laboratories or other experimental settings. 4. Each theory is comprehensive in its treatment of the multiple ways in which knowledge can be generated. Rather than conceptualizing working knowledge as arising from cognitive or environmental sources alone, each theory accounts for multiple ways in which knowledge about work is generated and used. The theories explain how working knowledge can emerge from personal reflection and experience; from practice-specific tools, techniques, and conditions; through relationships with others; and through associations with other elements of the system and the environment. 5. Each theory offers propositions that can be generalized to other settings. Although each theory was based in studies of specific work environments and occupations, all theories offer principles of learning that have been applied elsewhere. Scribner’s “least-effort strategy” has been demonstrated in nonindustrial settings (Scribner & Cole, 1981; Scribner & Fahrmeier, 1983). Schon’s reflection in action has been applied to the preparation of architects, urban planners, artists, musicians, and athletes (Schon, 1987). Wenger’s theory has been used to explain communities of practice among photocopier repair technicians (Orr, 1996), refrigeration technicians (Henning, 1998), and insurance claims processors (Wenger, 1998). 
Hutchins’s original work on navigation technology has been applied to airline pilots (Hutchins & Klausen, 1996) and to the design of the human-computer interface (Hutchins, Hollan, & Norman, 1986). Each theory offers new knowledge about learning that can be applied and extended through further research. These five features of the learning theories discussed in this article provide a basis for developing better theories of learning. In addition to the criteria for evaluating theory offered by Bacharach (1989), Patterson (1983), and Whetten (1989) that can be applied to all theories, the features listed above are specifically applicable to theories intended to model the dynamics of learning and working. An additional distinction that cuts across these five features is also present in the four theories discussed here – the theories of Scribner, Schon, Wenger, and Hutchins conceptualize learning and working as phenomena that occur simultaneously. Learning and working are inseparable. Judging from the volume and variety of such studies, research on workplace learning is appealing to
many researchers from a variety of disciplines. Many of these studies focus primarily or exclusively on the phenomenon of learning and give secondary or cursory consideration to work activity. How appropriate is such an approach for the study of learning and working? The four theorists discussed in this article provide a clear, coherent response to this question – studies of learning and working should treat these as phenomena that occur simultaneously. The four theories examined here ground learning in the conduct of work practice and emphasize their reciprocal relationship. This important premise is the basis for the following model for studies of learning and working. The fabric of work activity is woven with fibers of work and fibers of learning (see Figure 1). Although learning and working are inseparable during most work activity, there are periods during which one or the other is the dominant or exclusive activity, such as during routine tasks that are performed unconsciously or during periods when learning is uninterrupted by task demands. These separations of learning from working are shown in Figure 1 as discontinuities in the fabric of work activity. However, much work activity can be characterized generally as a phenomenon in which learning and working are inseparable. This feature of work activity is evident in Scribner’s model of practical thinking, Schon’s reflection in action, Wenger’s concept of practice, and Hutchins’s cultural cognition. Future studies of learning and working need to treat these as phenomena that occur simultaneously. This requires giving greater attention to the basic questions such studies seek to answer. Research questions such as, How does learning occur in a particular setting? generally examine only the learning fibers of the fabric of work activity. 
Although such studies may include references to the conditions or context of the work itself, they do so in a way that marginalizes these factors and provides a central focus on learning. Unlike questions that probe learning only, broader questions that examine the entire fabric of work activity might ask, What is happening as someone works through a task or project? Such a question is more likely to reveal how the fibers of learning and working are woven together to constitute the type of work activity described in this article.
Figure 1: The fabric of work activity (labels in the figure: Working, Learning, Work Activity)
Because the theories
of Scribner, Schon, Wenger, and Hutchins conceptualize learning and working as phenomena that occur simultaneously, this represents an additional feature that distinguishes these from other learning theories and can be added to the five distinctions previously discussed. Those interested in developing broader, more integrative theories of how learning and working occur in contemporary work environments might arrive at better theories of learning and working using this perspective. Implications for theory-building research. Precise and logical conceptual development is the theorist’s central task when working back and forth from general domains to specific concepts and from existing knowledge to new theory. Just as the empirical researcher provides a detailed account of all data sources, instrumentation, and methods of data collection and analysis, the theorist allows other scholars to replicate the theorizing process by explicitly tracing all paths from existing knowledge to new theory. In short, clearly stated relationships among carefully selected concepts produce better theory. Theorizing that is replicable and provocative is more likely to advance our knowledge by stimulating further inquiry that leads to new knowledge. But how does a theory become provocative? Provocativeness (or fruitfulness) is the capacity of a theory to change research and/or practice in the field. A theorist who wishes to accurately model a cross-disciplinary phenomenon such as the cognitive demands of new technologies must confront the limits of her or his own discipline in relation to the cross-disciplinary system being modeled by the theory. Theorists who are interested in cross-disciplinary phenomena, but who venture too far away from domains they understand in attempting to explain these phenomena, risk developing theory that is poorly informed in unfamiliar domains. 
On the other hand, theorists who embrace the multiple content domains needed to model cross-disciplinary phenomena are likely to produce provocative theory. So the theorist (or cross-disciplinary theory-building team) starts with the accumulated knowledge from fields related to the phenomenon of interest. In the case of the model proposed in this article, relevant knowledge was needed from human factors/ergonomics, industrial engineering, information technology, psychology, cognitive science, and education. The model presented here was constructed to be a carefully selected combination of ideas synthesized from knowledge in all of these areas. Mindful of the need for balance between a theory’s comprehensiveness and parsimony, specific concepts, some created to explain new phenomena and others adapted from existing theories, were integrated into the full model. Because the new model contains thinking from several disciplines, it is more likely to stimulate new ideas within the theorist’s own discipline. New theories are provocative, in part, because they stretch our thinking across existing paradigms and beyond the boundaries of our discipline.
References Adler, P. S. (1986). New technologies, new skills. California Management Review, 29(1), 9–28. Bacharach, S. B. (1989). Organizational theories: Some criteria for evaluation. Academy of Management Review, 14(4), 496–515. Bandura, A. (1977). Social learning theory. Englewood Cliffs, NJ: Prentice Hall. Barley, S. R., & Bechky, B. A. (1994). In the backrooms of science: The work of technicians in science labs. Work and Occupations, 21(1), 85–126. Barley, S. R., & Orr, J. E. (1997). Between craft and science: Technical work in U.S. settings. Ithaca, NY: ILR Press. Berniker, E. (1987, November). Understanding technical systems. Paper presented at the Symposium on Management Training Programs: Implications of New Technologies, Geneva, Switzerland. Brown, J. S., Collins, A., & Duguid, P. (1989). Situated cognition and the culture of learning. Educational Researcher, 18(1), 32–42. Campbell, D. J. (1988). Task complexity: A review and analysis. Academy of Management Review, 13(1), 40–52. Campion, M. A., & Medsker, G. J. (1992). Job design. In G. Salvendy (Ed.), Handbook of human factors. New York: John Wiley. Davenport, T. H. (1993). Process innovation: Reengineering work through information technology. Boston: Harvard Business School Press. Engestrom, Y., & Middleton, D. (1996). Cognition and communication at work. New York: Cambridge University Press. Gagne, R. M. (1962). Military training and principles of learning. American Psychologist, 17, 83–91. Gill, T. G. (1996). Expert systems usage: Task change and intrinsic motivation. MIS Quarterly, 20, 301–329. Griffith, T. L., & Northcraft, G. B. (1996). Cognitive elements in the implementation of new technology: Can less information provide more benefits? MIS Quarterly, 20, 99–110. Graham, L. (1993). Inside a Japanese transplant: A critical perspective. Work and Occupations, 20(2), 147–173. Hackman, J. R. (1969). Toward understanding the role of tasks in behavioral research. Acta Psychologica, 31, 97–128. 
Hackman, J. R., & Oldham, G. R. (1980). Work redesign. Reading, MA: Addison-Wesley. Henning, P. H. (1998). Ways of learning: An ethnographic study of the work and situated learning of a group of refrigeration service technicians. Journal of Contemporary Ethnography, 27(1), 85–136. Hutchins, E. (1995). Cognition in the wild. Cambridge, MA: MIT Press. Hutchins, E., Hollan, J., & Norman, D. A. (1986). Direct manipulation interfaces. In D. A. Norman & S. Draper (Eds.), User centered system design: New perspectives in human-computer interaction. Hillsdale, NJ: Lawrence Erlbaum. Hutchins, E., & Klausen, T. (1996). Distributed cognition in an airline cockpit. In Y. Engestrom & D. Middleton (Eds.), Cognition and communication at work. New York: Cambridge University Press. Khurana, A. (1999). Managing complex production processes. Sloan Management Review, 40(2), 85–97. Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation. New York: Cambridge University Press. Leont’ev, A. N. (1981). Problems of the development of mind. Moscow: Progress. Levi-Strauss, C. (1966). The savage mind. Chicago: University of Chicago Press.
Locke, E. A., & Latham, G. P. (1990). A theory of goal setting and task performance. Englewood Cliffs, NJ: Prentice Hall. Marshall, C. C., & Shipman, F. M. (1995). Making large-scale information resources serve communities of practice. Journal of Management Information Systems, 11(4), 65–87. Martin, L. M. W., & Scribner, S. (1991). Laboratory for cognitive studies of work: A case study of the intellectual implications of new technology. Teachers College Record, 92(4), 582–602. Norman, D. A. (1988). Knowledge in the head and in the world. In The psychology of everyday things. New York: Basic Books. Norman, D. A. (1993). Things that make us smart: Defending human attributes in the age of the machine. Reading, MA: Addison-Wesley. Norman, D. A. (1997). Melding mind and machine. Technology Review, 100, 29–31. Norros, L. (1996). System disturbances as springboard for development of operators’ expertise. In Y. Engestrom & D. Middleton (Eds.), Cognition and communication at work. New York: Cambridge University Press. Orr, J. E. (1996). Talking about machines: An ethnography of a modern job. Ithaca, NY: ILR Press. Patterson, C. H. (1983). Theories of counseling and psychotherapy. Philadelphia: Harper and Row. Pentland, B. T. (1997). Bleeding edge epistemology: Practical problem solving in software support hot lines. In S. R. Barley & J. E. Orr (Eds.), Between craft and science: Technical work in U.S. settings. Ithaca, NY: ILR Press. Salvendy, G. (1987). Handbook of human factors. New York: John Wiley. Schon, D. A. (1983). The reflective practitioner: How professionals think in action. New York: Basic Books. Schon, D. A. (1987). Educating the reflective practitioner. San Francisco: Jossey-Bass. Scribner, S. (1984). Studying working intelligence. In B. Rogoff & J. Lave (Eds.), Everyday cognition: Its development in social context. Cambridge, MA: Harvard University Press. Scribner, S., & Beach, K. D. (1993). An activity theory approach to memory.
Applied Cognitive Science, 7, 185–190. Scribner, S., & Cole, M. (1981). The psychology of literacy. Cambridge, MA: Harvard University Press. Scribner, S., & Fahrmeier, E. (1983). Practical and theoretical arithmetic (Working Paper No. 3). New York: Industrial Literacy Project, City University of New York. Simon, H. A. (1981). The sciences of the artificial (2nd ed.). Cambridge, MA: MIT Press. Smith, T. J., Henning, R. A., & Smith, K. U. (1994). Sources of performance variability. In G. Salvendy & W. Karwowski (Eds.), Design of work and development of personnel in advanced manufacturing. New York: Wiley-Interscience. Tobach, E., Falmagne, R. J., Parlee, M. B., Martin, L. M. W., & Kapelman, A. S. (1997). Mind and social practice: Selected writings of Sylvia Scribner. New York: Cambridge University Press. Wegner, D. M., Erber, R., & Raymond, P. (1991). Transactive memory in close relationships. Journal of Personality and Social Psychology, 61(6), 923–929. Weick, K. A. (1990). Technology as equivoque: Sensemaking in new technologies. In P. S. Goodman & L. S. Sproull and Associates (Eds.), Technology and organizations. San Francisco: Jossey-Bass. Weick, K. A., & Roberts, K. H. (1993). Collective mind in organizations: Heedful interrelating on flight decks. Administrative Science Quarterly, 38, 357–381. Wenger, E. (1998). Communities of practice: Learning, meaning, and identity. New York: Cambridge University Press. Wenger, E., & Snyder, W. M. (2000). Communities of practice: The organizational frontier. Harvard Business Review, 78(1), 139–145.
Whetten, D. A. (1989). What constitutes a theoretical contribution? Academy of Management Review, 14(4), 490–495. Woods, D. P., O’Brien, J. F., & Hanes, L. F. (1987). Human factors challenges in process control: The case of nuclear power plants. In G. Salvendy (Ed.), Handbook of human factors. New York: John Wiley. Woodward, J. (1994). Industrial organization: Theory and practice (4th ed.). London: Oxford University Press. Zuboff, S. (1988). In the age of the smart machine: The future of work and power. New York: Basic Books.
38 Cognitive Conceptions of Learning Thomas J. Shuell
Source: Review of Educational Research, 56(4) (1986): 411–436.
Psychologists and educators have long been interested in understanding how people learn, for the concept of learning is central to many different human endeavors. Teaching, child rearing, counseling, and a wide variety of training situations, to name just a few areas, are all concerned in one way or another with individuals learning new knowledge and/or behavior. There is, of course, a long history of empirical research on learning dating back to the classic research of Ebbinghaus (1913) first published in 1885. During the first half of the present century, research on learning flourished (nearly all of it within the behavioral tradition of psychology), and learning theory exerted a strong influence on research and practice in many different spheres of psychology and education. This influence and interest in learning remained strong well into the 1960s. During the late 1960s and early 1970s, however, the zeitgeist of psychology began to change from a behavioristic to a cognitive orientation. Concern for the mind and the way it functions returned to scientific psychology. This cognitive orientation was clearly evident in research on topics such as meaningful verbal learning (Ausubel, 1962, 1963), discovery learning (e.g., Bruner, 1957, 1961), imagery (Paivio, 1969, 1971), “mathemagenic” behaviors (behaviors that give birth to learning) (Rothkopf, 1965, 1970), generative learning (Wittrock, 1974, 1978), and mnemonics (e.g., Bower, 1970). Nevertheless, during the period from about 1960 to 1980, research on learning per se – that is, a concern for those factors that produce changes in an individual’s behavior and/or knowledge – diminished drastically. For a variety of reasons (some of which will be discussed below), cognitive psychologists’ interest in learning gave way to other concerns. Cognitive psychologists occasionally
acknowledged the importance of learning, but little effort was devoted to furthering our understanding of how learning occurs. In appraising this situation, Voss (1978) concluded that “although the concept of learning may be found in cognitive psychology, it also must be conceded that the cognitive view of learning is vague, is abstract, and, most important, is lacking a substantive data base” (p. 13). Similar conclusions were voiced by other cognitive psychologists (e.g., J. R. Anderson, 1982; Greeno, 1980a; Langley & Simon, 1981). Since about 1975, however, cognitive psychologists have shown a growing interest in learning, and a new era of research on learning may be at hand. Much, but certainly not all, of this more recent research represents an information-processing orientation and involves sophisticated computer models of learning. As one might expect, these cognitive conceptions of learning (both the earlier and the more recent ones) differ from traditional, behavioristic conceptions of learning in ways that enrich our understanding of how humans acquire new knowledge and new ways of doing things. The purpose of this article is to examine current conceptions of learning, primarily from the vantage point of modern-day cognitive psychology. To provide an appropriate perspective, however, similarities and differences between traditional and cognitive conceptions of learning will be discussed. After first highlighting some characteristics of traditional conceptions of learning, ways in which cognitive psychology has influenced research on learning will be considered. Next, several cognitive theories of learning will be described. Finally, implications for future research on learning and for educational practices will be outlined.
Traditional Conceptions of Learning During the 100 years since Ebbinghaus’ pioneering research, nearly all research on learning has been conducted within a behavioral framework. Although the Gestalt psychologists of the 1910s to 1930s (perhaps the chief forerunners of modern-day cognitive psychology) occasionally discussed learning, they were more interested in perception than in learning, and they usually interpreted learning in terms of perceptual principles of organization. For a variety of reasons (see, e.g., Stevenson, 1983), traditional research on learning focused primarily on animal learning rather than human learning (although this research has not been totally void of cognitive influence – see, e.g., Kimble, 1984). As a result, most research on learning has involved relatively simple forms of learning. Even in the case of human learning, most traditional studies of learning have employed simple tasks that involve memorization more than comprehension. But before continuing, perhaps it would be useful to consider what we normally mean by the term learning.
The Concept of Learning The concern for learning, of course, focuses on the way in which people acquire new knowledge and skills and the way in which existing knowledge and skills are modified. Nearly all conceptions of learning have involved – either explicitly or implicitly – three criteria for defining learning (see, e.g., Shuell & Lee, 1976): (a) a change in an individual’s behavior or ability to do something, (b) a stipulation that this change must result from some sort of practice or experience, and (c) a stipulation that the change is an enduring one. The primary purpose of the latter two qualifications is to exclude certain types of behavioral changes that do not seem to represent what we mean by learning (maturation, temporary changes due to drugs, etc.). Although there appears to be general agreement among behavioral and cognitive conceptions of learning with regard to the defining characteristics of the underlying phenomenon, there are also a number of important differences between the two orientations. The only formal definition of learning from a cognitive perspective that I have been able to find (Langley & Simon, 1981) fits the above criteria almost perfectly: “Learning is any process that modifies a system so as to improve, more or less irreversibly, its subsequent performance of the same task or of tasks drawn from the same population” (p. 367). The main difference appears to be the emphasis on the performance of a system rather than on the behavior of an individual. Cognitive conceptions of learning, however, focus on the acquisition of knowledge and knowledge structures rather than on behavior per se, on “ . . . discrete change between states of knowledge rather than [on] change in probability of response” (Greeno, 1980a, p. 716). The significance of this difference is not as minor as it might appear, for if it is knowledge that one learns, “ . . . 
then behavior must be the result of learning, rather than that which itself is learned” (Stevenson, 1983, p. 214). There also tends to be general (although not complete) agreement among behavioral and cognitive conceptions of learning that both environmental factors and factors internal to the learner contribute to learning in an interactive manner (e.g., Brown, Bransford, Ferrara, & Campione, 1983). As one might expect, however, the different positions disagree on which side of this learner-environment equation is most important. For example, behavioral approaches focus on changing the environment in order to influence learning (e.g., by providing reinforcement when the appropriate response is made), whereas cognitive approaches focus more on changing the learner (e.g., by encouraging the person to use appropriate learning strategies). There are also considerable differences with regard to both what is learned (e.g., behavior vs. structured knowledge) and the factors that influence the learning process (e.g., reinforcement vs. strategies for obtaining feedback).
The Transition Begins Although the seeds of modern-day cognitive psychology were present during the 1930s (e.g., Bartlett, 1932; Tolman, 1932), they did not grow to fruition, especially with regard to learning, for many years. During the 1960s, research on learning, especially verbal learning (the main body of research on human learning during this period), began to undergo a change that reflected views more consistent with cognitive interpretations of behavior. Investigators began to question, for example, whether simple conceptions of learning could adequately handle the more complex forms of learning encountered in real-life situations such as the classroom. The debate about whether classical conditioning and operant conditioning represent one or two different types of learning (see Kimble, 1961) was extended by Gagné’s (1962, 1965) postulation of eight types of learning, including complex forms of learning such as concept learning and problem solving. People started to realize that even simple learning materials (e.g., nonsense syllables, isolated words) have meaning and that this meaningfulness can influence the learning process (e.g., Underwood & Schulz, 1960). The realization that learners were not passive during learning (e.g., Bruner, 1957; Miller, Galanter, & Pribram, 1960) began to spread. For example, subjects often selected a stimulus (the “functional stimulus”) that differed from the one intended by the experimenter (the “nominal stimulus”) (Underwood, 1963), and when allowed (e.g., the free-recall paradigm), they organized the material being learned in meaningful ways, even in the absence of obvious bases of organization (Shuell, 1969; Tulving, 1968). Thus, a transition had begun from a strictly behavioristic orientation to one that involved more cognitive activities. But somewhere in the transition the concern for learning got set aside. There are many reasons for the demise of interest in learning. 
Among the more obvious reasons are the following:
1. The appearance of experimental data that were difficult to reconcile with existing theories of learning (see Stevenson, 1983; White, 1970). Included among the many examples of this problem are age changes in the solution of reversal-nonreversal shift problems (Kendler & Kendler, 1962), the presence of organizational patterns in free recall (Shuell, 1969), and the transfer data that led to the notion of the functional stimulus (Underwood, 1963).
2. The feeling that one must understand the nature of the performance system before one can investigate learning (Newell & Simon, 1972). It is difficult to study transitions between knowledge states without first knowing something about the knowledge states between which the transition is being made, a problem directly analogous to the classical requirements for operational definitions and criterion specification.
3. The realization that the laws of learning depend on the context in which it occurs and the prior knowledge of the learner (for a good discussion of this point, see Siegler, 1983).
4. The ability of fresh ideas to capture the interest of investigators becoming bored with decades of traditional thinking about learning – that is, the zeitgeist of cognitive psychology.
In addition, the cognitive psychologists of the 1960s and 1970s became interested in identifying and describing the various stages and processes involved in human information processing. This focus led naturally to a concern for the nature of the memory system rather than learning – that is, how knowledge is represented in memory rather than how changes in knowledge take place. Different research questions were being asked; different paradigms were being employed; different assumptions were being made; and different theories were being developed.
The Influence of Cognitive Psychology Cognitive psychology is concerned with various mental activities (such as perception, thinking, knowledge representation, and memory) related to human information processing and problem solving, and it presently represents the mainstream of thinking in both psychology and education. The emphasis is no longer strictly on behavior, but on the mental processes and knowledge structures that can be inferred from behavioral indices and that are responsible for various types of human behavior. Thus, with regard to learning, the search by learning psychologists of the 1950s and 1960s for atheoretical, functional relationships (Underwood, 1964) has shifted to a concern for the thought processes and mental activities that mediate the relationship between stimulus and response (see, e.g., Wittrock, 1986). Nevertheless, cognitive psychology has influenced learning theory and research in several significant ways, including (a) the view of learning as an active, constructive process; (b) the presence of higher-level processes in learning; (c) the cumulative nature of learning and the corresponding role played by prior knowledge; (d) concern for the way knowledge is represented and organized in memory; and (e) concern for analyzing learning tasks and performance in terms of the cognitive processes that are involved.
Learning as an Active Process Cognitive approaches to learning stress that learning is an active, constructive, and goal-oriented process that is dependent upon the mental activities of the learner. This view, of course, contrasts with the behavioral orientation that
focuses on behavioral changes requiring a predominantly passive response from the learner to various environmental factors. Although operant conditioning requires the learner to make an overt response (so that it can be reinforced), the active nature of learning suggested by cognitive psychologists is very different. The cognitive orientation, for example, focuses on the mental activities of the learner that lead up to a response, and it explicitly acknowledges the following: (a) the role of metacognitive processes such as planning and setting goals and subgoals (e.g., Brown et al., 1983; Flavell, 1981); (b) the active selection of stimuli (e.g., the distinction between functional and nominal stimuli; Underwood, 1963); (c) the attempt by learners to organize the material they are learning, even when no obvious bases of organization are present in the materials being learned (e.g., Shuell, 1969; Tulving, 1968); (d) the generation or construction of appropriate responses (e.g., Wittrock, 1974); and (e) the use of various learning strategies (e.g., Weinstein & Mayer, 1986). The suggestion that memory (e.g., Bartlett, 1932; Cofer, 1973; Jenkins, 1974) and learning (e.g., Wittrock, 1974) both require the learner to actively construct new knowledge and strategies is appealing to many cognitive psychologists, but these views are plagued with a theoretical paradox (see, e.g., Bereiter, 1985). The problem arises when a learner acquires a new cognitive structure that is more advanced or complex than the structures that are presently possessed. The paradox involves the need to explain how the learner can acquire the new cognitive structure without already having an existing cognitive structure more advanced or complex than the one being acquired – a situation that is easier to explain in terms of innate mental structures than in terms of learning.
Bereiter (1985) suggests 10 “resources” that permit one to avoid this “learning paradox,” but few studies currently support their validity.
Higher-Level Processes in Learning Most cognitive conceptions of learning acknowledge the hierarchical nature of the psychological processes responsible for learning. Miller, Galanter, and Pribram’s (1960) book, Plans and the Structure of Behavior, proved very influential in popularizing the notion that behavior is hierarchically organized. Since the late 1970s, the higher-level (superordinate, executive) processes of learners have typically been referred to as metacognition (see, e.g., Brown, 1978; Brown et al., 1983; Flavell, 1979). Although such analyses raise the homunculus or “inner man” problem, such concerns need not be fatal. (For a discussion of this problem, see Brown et al., 1983.) Generally, two types of metacognitive activities are involved in learning. The first involves regulation and orchestration of the various activities that must be carried out in order for learning to be successful (planning, predicting what information is likely to be encountered, guessing, monitoring the learning process, etc.) (e.g., Brown, 1978). Since learning is goal oriented,
the learner must somehow organize his or her resources and activities in order to achieve the goal. The second is concerned with what one does and does not know about the material being learned and the processes involved in learning it. Flavell and Wellman (1977) suggest four general classes of metacognitive knowledge: (a) tasks – knowledge about the way in which the nature of the task influences performance on the task; (b) self – knowledge about one’s own skills, strengths, and weaknesses; (c) strategies – knowledge regarding the differential value of alternative strategies for enhancing performance; and (d) interactions – knowledge of ways in which the preceding types of knowledge interact with one another to influence the outcome of some cognitive performance. An example of the hierarchical nature of learning is Sternberg’s (1984a, 1984b) componential theory of knowledge acquisition. Sternberg suggests that performance is regulated by nine metacomponents (executive processes) such as “recognition of just what the problem is that needs to be solved” (Sternberg, 1984b, p. 165). These metacomponents operate on lower-level performance components (processes used in the execution of a task, such as encoding and comparison) and three knowledge-acquisition components: 1. Selective encoding (sifting out relevant information from irrelevant information, in the stimulus environment, in order to select information for further processing). 2. Selective combination (combining selected information in such a way as to render it interpretable; that is, integrating it in some meaningful way). 3. Selective comparison (rendering newly encoded or combined information meaningful by perceiving its relations to old information previously stored). (Sternberg, 1984b, p. 
168) These knowledge-acquisition components operate on a variety of cues present in the material being learned, although cue utilization is affected by moderating variables such as number of occurrences, variability of contexts, location of cues, importance of the to-be-learned information, and density of the information to be learned (Sternberg, 1984a).
The Role of Prior Knowledge Learning is cumulative in nature; nothing has meaning or is learned in isolation. Cognitive conceptions of learning place considerable importance on the role played by prior knowledge in the acquisition of new knowledge. Whereas traditional research on verbal learning was concerned with transfer and the effect of proactive inhibition on retention, concern for what the learner had already acquired focused on associations between individual stimuli and responses rather than on the acquisition of meaning from organized bodies of knowledge.
In the early 1970s, several studies (Bransford & Johnson, 1972; Dooling & Lachman, 1971) demonstrated that what the learner already knows and the extent to which this knowledge is activated at the time of learning has important implications for what will be acquired and for whether or not the material being studied will make any sense to the learner. Realizations such as these led to the development of schema theory (e.g., R. C. Anderson, 1984), which stresses that the organized, structured, and abstract bodies of information (known as schemata) that a learner brings to bear in learning new material determine how the task is interpreted and what the learner will understand and acquire from studying the task. The traditional concept of transfer was concerned with the way prior learning influences later learning, and this influence was explained in terms of the similarity between stimuli and responses in the two situations. The newer cognitive concern for the role of prior knowledge in learning, however, recognizes that for meaningful forms of learning this process is more complex than the one suggested by earlier approaches to transfer. For example, Bransford and Franks (1976) suggest that the role of prior knowledge is to establish “boundary constraints” for identifying both the “sameness” and the “uniqueness” of novel information: “From the present perspective, growth and learning do not simply involve an expansion of some body of interconnected facts, concepts, etc. Learning involves a change in the form of one’s knowledge so that it can set the stage for new discoveries” (p. 112). Likewise, within the context of cognitive development, Siegler (1983) and Siegler and Klahr (1982), among others, have emphasized the importance of prior knowledge (especially the rules used to perform various tasks) in determining when children are ready to learn new material. 
Another change that has occurred recently is an emphasis on domain-specific knowledge and learning skills (e.g., Glaser, 1984). Although this change in thinking cannot be attributed directly to the rise of cognitive psychology, it has had a substantial influence on cognitive conceptions of learning. Traditional research on learning sought general laws applicable to all individuals and all subject-matter areas. However, recent research on individuals with differing levels of expertise in a particular subject, such as physics (e.g., Chi, Glaser, & Rees, 1982), has shown convincingly that experts and novices solve problems in fundamentally different ways. Although controversy remains over the relative importance of domain-specific knowledge and general, domain-independent learning strategies (e.g., Block, 1985; Glaser, 1984; Sternberg, 1985), it is generally recognized (e.g., Glaser, 1985; Keil, 1984) that both are important in most learning situations. Consequently, there is an important relationship between the emphasis on domain-specific knowledge and the concern for prior knowledge that is evident in research on cognitive learning.
The Question of What is Learned One major difference between behavioral and cognitive conceptions of learning concerns the nature of what an individual learns. Behavioral approaches typically suggest either that the learner acquires associations or “bonds” between a stimulus and a response (e.g., Thorndike, 1913) or that the issue of what an individual might acquire internally (i.e., “theories” of learning) is totally irrelevant for understanding the factors responsible for learning (i.e., changes in behavior) (Skinner, 1950). Cognitive psychologists, on the other hand, are primarily concerned with meaning rather than with behavior per se – that is, a concern for the manner in which an individual extracts meaning from some experience. The emphasis is on understanding, not merely on learning how to perform a task, and on the acquisition of knowledge rather than on the acquisition of behavior. If knowledge is what an individual learns, then behavior is the result of learning rather than what an individual acquires (Stevenson, 1983). Generally, this knowledge is best represented by complex knowledge structures rather than by simple associations.1 These knowledge structures are usually conceptualized as networks of information specifying the relationship among various facts and actions (e.g., J. R. Anderson, 1980; Norman et al., 1975). There are, however, other ways of conceptualizing what an individual acquires in cognitive learning. For example, both Scandura (1970, 1977) and Siegler (1983) have suggested that rules are useful units for characterizing what people learn. Actually, it seems likely that humans have several different ways and/or modes for representing knowledge.
For example, a distinction is frequently made in cognitive psychology between propositional (or declarative) and procedural knowledge, and it appears likely that there are several additional forms of knowledge representation (e.g., Gagné & White, 1978; Shuell, 1985) and several different memory systems (Tulving, 1985).
Cognitive Process Analysis One important consequence of the cognitive influence on learning has been an interest in analyzing performance and cognitive abilities in terms of the cognitive processes involved in performing a cognitive task, including performance on tests of mental ability such as intelligence (e.g., Carroll, 1976; Snow & Lohman, 1984; Sternberg, 1979), inductive reasoning (e.g., Pellegrino & Glaser, 1982), and deductive reasoning (e.g., Johnson-Laird, 1985). For example, Sternberg (1977) proposed that analogical reasoning – which some (e.g., Rumelhart & Norman, 1981) have suggested is the basis of cognitive learning – involves six cognitive processes: (a) encoding the various terms that
make up the analogy, (b) inferring the relationship between the first two terms of the analogy, (c) mapping or discovering a higher-order rule that relates the first and third terms of the analogy, (d) applying the results of the inferring and mapping components to the third term in order to generate an appropriate fourth term, (e) an optional justification process in which one of the answers provided is selected as being the closest to the “ideal” answer produced by the application process, and (f) a response process whereby the solution is translated into a response. This type of cognitive process analysis also has been applied to various types of instructional tasks such as the learning of geometry (e.g., Greeno, 1978, 1980b), physics (e.g., Champagne, Klopfer, & Gunstone, 1982; Heller & Reif, 1984), reading (e.g., Omanson, Beck, Voss, & McKeown, 1984), and addition and subtraction (e.g., Carpenter, Moser, & Romberg, 1982). Such analyses can help us to better understand both the cognitive processes involved in learning and the instructional techniques most likely to facilitate that learning.
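As a rough illustration of how the six components described above might fit together, the following Python sketch runs a toy attribute-based analogy (A : B :: C : ?) through encoding, inference, mapping, application, justification, and response. The attribute strings, the function names, and the overlap-counting rule used for justification are assumptions made for this example only; they are not part of Sternberg's (1977) account.

```python
def encode(term):
    """(a) Encoding: turn 'attr=value;attr=value' text into attribute-value pairs."""
    return dict(pair.split("=") for pair in term.split(";"))

def infer(a, b):
    """(b) Inference: record which attributes change from A to B, and to what."""
    return {attr: b[attr] for attr in a if b.get(attr) != a[attr]}

def map_terms(a, c):
    """(c) Mapping: find the attributes on which A and C correspond."""
    return {attr for attr in a if c.get(attr) == a[attr]}

def apply_rule(c, change, shared):
    """(d) Application: carry the A-to-B change over to C to build an 'ideal' D."""
    ideal = dict(c)
    for attr, new_value in change.items():
        if attr in shared:
            ideal[attr] = new_value
    return ideal

def justify(ideal, options):
    """(e) Justification: pick the offered answer closest to the ideal answer."""
    def overlap(option):
        return sum(option.get(attr) == value for attr, value in ideal.items())
    return max(options, key=lambda name: overlap(options[name]))

def respond(choice):
    """(f) Response: translate the chosen solution into an overt answer."""
    return f"answer: {choice}"

def solve_analogy(a, b, c, options):
    a, b, c = encode(a), encode(b), encode(c)
    options = {name: encode(text) for name, text in options.items()}
    ideal = apply_rule(c, infer(a, b), map_terms(a, c))
    return respond(justify(ideal, options))

# small circle : large circle :: small square : ?
result = solve_analogy(
    "size=small;shape=circle",
    "size=large;shape=circle",
    "size=small;shape=square",
    {"1": "size=large;shape=circle", "2": "size=large;shape=square"},
)
print(result)
```

Separating the components in this way mirrors the point of process analysis: each step can fail or be slow independently, so each can be examined on its own.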
Cognitive Theories of Learning Most cognitive conceptions of learning reflect an overriding concern for the more complex forms of learning, that is, the types of learning frequently characterized as “meaningful” or where one “learns for understanding.” For the most part, cognitive psychologists have been interested in the latter approach. As Norman (1978) put it: I do not care about simple learning. . . . that only takes 30 minutes. I want to understand real learning, the kind we all do during the course of our lives. . . . I want to understand the learning of complex topics. . . . [those] with such a rich set of conceptual structures that it requires learning periods measured in weeks or even years. (p. 39)
One problem with meaningful learning is that it is difficult to define. Although an operational definition is not readily available, it is possible to provide examples of the differences that concern many investigators. It makes little sense to most people, for example, to say that one “understands” his or her phone number; we “can learn, know, or remember a phone number, but not understand one” (Markman, 1981, p. 63). Only information that is structured or organized can be thought of as being meaningful and can serve as an object of understanding (Bransford & McCarrell, 1974; Moravcsik, 1979). Although some investigators would apparently limit cognitive learning to the acquisition of information that is structured or organized, higher-order thought processes are involved in many forms of simpler learning as well (e.g., when elaboration occurs or when mnemonics are used – see Pressley & Levin, 1983a, 1983b). It seems reasonable to suggest that all of these different types of situations fall within the domain of cognitive learning.
Various attempts have been made over the years to articulate the role of learning from a cognitive or human information-processing perspective (for a detailed discussion of these approaches, see Bower and Hilgard, 1981). The following discussion will focus on those theories that have most influenced current thinking and research on cognitive learning.
Early Conceptions During the late 1950s and early 1960s, several writers began to formulate cognitive theories of learning. For example, Bruner (1957, 1961) talked about learning in terms of “discovery” and “going beyond the information given.” According to Bruner (1957), learning occurs when an “organism . . . code[s] something in a generic manner so as to maximize the transferability of the learning to new situations” (p. 51). He goes on to identify four general sets of conditions under which such learning will occur: (a) the “set to learn” or “attitude toward learning”, (b) an appropriate need state (in which an “optimal” level of motivation is discussed), (c) prior mastery of the original learning (and its importance for generic coding), and (d) diversity of training. The first systematic model of cognitive learning, however, was Ausubel’s (1962, 1963) subsumption theory of meaningful verbal learning. Ausubel makes clear that the theory is concerned only with “meaningful” (as opposed to “rote”), “reception” (as opposed to “discovery”) learning. According to this theory, new, potentially logical information is subsumed (incorporated) into the learner’s existing cognitive structure. The availability of an existing cognitive structure – hierarchically organized with progressive differentiation within a given field of knowledge from more inclusive concepts to less inclusive subconcepts – is seen as the major factor affecting meaningful learning, and the use of “advance organizers” (models or other types of representation that provide a structured overview of the material to be learned) can help ensure that such availability exists. Another major factor is the extent to which the new material is discriminable from the existing cognitive structure that subsumes it. This discriminability can be facilitated by repetition and/or by explicitly pointing out the similarities and differences between the new materials and their presumed subsumers in cognitive structure. 
Finally, the retention of meaningful material was thought to be influenced by repetition, the length of time that relevant subsuming concepts had been part of the learner’s cognitive structure, the use of appropriate exemplars, and multicontextual exposure. Another early theory of cognitive learning was Wittrock’s (1974) model of generative learning. According to this model (Wittrock, 1974, 1978), people learn meaningful material by generating or constructing relationships among new information and knowledge already stored in long-term memory. These verbal and imaginal elaborations occur as the learner seeks to discover the
underlying rule or relationship “by drawing inferences [about the rule], applying it, testing it, and relating it to other rules and to experience” (Wittrock, 1978, p. 26). It was recognized that individuals might proceed differently and that different instructional adjuncts could elicit the appropriate cognitive processes. It appears that the primary mechanisms of learning, according to the generative model, consist of the learner making inferences about potential relationships and then actively seeking feedback on the adequacy of these relationships. Bransford and Franks (1976) suggested that understanding or comprehension involves the acquisition of novel information that is difficult, if not impossible, for the traditional, “memory metaphor” model of learning to explain. They suggest that learning that involves understanding (i.e., comprehension) occurs via a process of decontextualization. That is, knowledge is initially acquired in a specific context; in order for understanding to occur, this knowledge must become more abstract so that it can be related to a variety of different situations. A mechanism for this decontextualization process is not suggested, but Bransford and Franks suggest that concepts and knowledge become abstract by virtue of being used to clarify a number of situations, and thus stress the importance of the learner encountering relevant examples. Most of the more recent work on cognitive learning has occurred within the area of artificial intelligence (AI), where the goal has been to develop computer programs that can learn. A concern for simulating learning was present in some of the early work on AI – for example, the EPAM model (Feigenbaum, 1959; Simon & Feigenbaum, 1964). Since about 1975, this interest has intensified, especially with regard to J. R. Anderson’s (1982, 1983) ACT theory. The programs of interest here are those intended to serve as models or theories of human cognitive learning. 
In general, attempts to define cognitive learning have emphasized a system of processes, relationships among concepts and/or facts, and the restructuring of schemata. The similarities and differences between behavioral and cognitive conceptions of learning can be illustrated by considering several prominent theories of cognitive learning and the mechanisms considered to be responsible for the psychological changes we refer to as learning.
Rumelhart and Norman The first comprehensive theory of cognitive learning was Rumelhart and Norman’s (1978) attempt to account for the process of learning within a schema-based theory of long-term memory, although they emphasized that “learning is not a unitary process: No single mental activity corresponding to learning exists. . . . and no single theoretical description will account for the multitude of ways by which learning might occur”
(p. 50).2 Rumelhart and Norman suggest three qualitatively different kinds of learning: (a) accretion, or the encoding of new information in terms of existing schemata; (b) restructuring or schema creation, or the process whereby new schemata are created; and (c) tuning or schema evolution, or the slow modification and refinement of a schema as a result of using it in different situations. Most models of memory involve learning by accretion. New information is interpreted in terms of preexisting schemata, and this process occurs most readily when the material being learned is consistent with schemata already available in memory. The new information is added to knowledge already in memory without any changes being made in the way that knowledge is organized. Accretion involves the acquisition of factual information that some people might refer to as memorization. Resnick (1984) refers to this type of learning as schema instantiation and suggests that it is similar to the Piagetian concept of assimilation. Norman (1978) suggests that “ . . . accretion learning requires study, probably with the use of mnemonic aids (and deep levels of processing). It can be tested by conventional recall and recognition techniques” (p. 40). Interference from related topics tends to be high and transfer to related topics tends to be low. Tuning and restructuring are similar to the Piagetian concept of accommodation (Resnick, 1984). Restructuring may occur without any formal addition of new knowledge – that is, the learner may already have all of the necessary information and the only thing that occurs is a reorganization of existing knowledge. 
Rumelhart and Norman (1978) suggest two basic ways for restructuring to occur: (a) schema induction, which is a form of learning by contiguity in which certain spatial or temporal co-occurrence of schemata results in the formation of a new schema, and (b) patterned generation, in which a new schema is patterned (copied with modifications) on an old schema. Restructuring occurs as a result of encountering examples, analogies, and metaphors, as well as through tutorial interactions such as Socratic dialogue. Tests of restructuring should include conceptual tests and questions that require inference or problem solving (Norman, 1978). Generally, learning that involves the creation of new schemata occurs as the result of analogical processes – that is, we learn new schemata by relating new information to old schemata in analogical ways (Rumelhart & Norman, 1981). Tuning involves the slow and gradual refinement of existing schemata, a process that lasts a lifetime. Norman (1978) suggests that tuning is “ . . . best accomplished by practice at the task or in using the concepts of the topic matter. Tests of tuning should be measures of speed and smoothness, [including] performance under stress or pressure” (p. 42). With tuning there is low interference from related topics, and transfer to related topics is high with regard to general knowledge and very low with regard to specific (tuned) knowledge.
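The contrast among accretion, restructuring, and tuning can be made concrete with a small Python sketch. It assumes, purely for illustration, that a schema is a set of named slots with default values plus a list of stored instances; the `Schema` class, the function names, the "car" example, and the learning rate are hypothetical and are not drawn from Rumelhart and Norman (1978).

```python
from dataclasses import dataclass, field

@dataclass
class Schema:
    name: str
    slots: dict                                   # slot name -> typical value
    instances: list = field(default_factory=list)  # facts stored under the schema

def accrete(schema, facts):
    """Accretion: add new facts under an existing schema; its organization is unchanged."""
    schema.instances.append(dict(facts))

def pattern_generate(old, new_name, changes):
    """Restructuring by patterned generation: copy an old schema with modifications."""
    return Schema(new_name, {**old.slots, **changes})

def tune(schema, slot, observed, rate=0.1):
    """Tuning: nudge a numeric slot value slightly toward what was just observed."""
    schema.slots[slot] += rate * (observed - schema.slots[slot])

car = Schema("car", {"wheels": 4, "typical_speed_kmh": 100.0})

accrete(car, {"color": "red", "owner": "neighbor"})   # new fact, same structure
truck = pattern_generate(car, "truck", {"wheels": 6})  # new schema modeled on the old
tune(car, "typical_speed_kmh", 120.0)                  # small refinement through use

print(len(car.instances), truck.slots["wheels"], car.slots["typical_speed_kmh"])
```

In this sketch only `pattern_generate` creates a new knowledge structure; `accrete` and `tune` leave the set of schemata unchanged, which is the distinction the three terms are meant to capture.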
John Anderson’s ACT Most cognitive psychologists distinguish between declarative and procedural knowledge. Declarative knowledge is our knowledge about things and is usually thought to be represented in memory as an interrelated network of facts (e.g., 2 + 3 = 5, 5 × 4 = 20) that exist as propositions. Procedural knowledge is our knowledge of how to perform various skills (e.g., produce the correct sum when given an addition problem, solve a word problem). John Anderson (1982, 1983) has developed a computer program (i.e., a theory) called ACT (or ACT∗, as the current version is called) that is capable of learning procedural knowledge such as solving geometry proofs and other types of problems. In contrast to Rumelhart and Norman’s (1978) belief that there are many forms of learning, ACT is based on the presumption that a single set of learning processes is “ . . . involved in the full range of skill acquisition, from language acquisition to problem solving to schema abstraction” (Anderson, 1983, p. 255). Since ACT is the most explicit and comprehensive of current cognitive theories of learning, it will be described in some detail. The distinction between declarative and procedural knowledge is a fundamental part of the ACT theory. Declarative knowledge is represented in ACT as a network of propositions (i.e., statements of relationships among concepts, events, etc.), and procedural knowledge is represented as a system of productions (i.e., statements of the circumstances under which a certain action should be carried out and the details of what should be done when that action is appropriate). The theory is concerned with the acquisition of both declarative and procedural knowledge, as well as the transition between the two, although the emphasis is more on the latter than the former. 
According to ACT, knowledge in a new domain always begins as declarative knowledge; procedural knowledge is learned by making inferences from facts available in the declarative knowledge system. Anderson (1982, 1983) suggests that three stages are involved in learning procedural knowledge: the declarative stage, the knowledge compilation stage, and the procedural stage. These stages are similar to the three phases of skill learning suggested by Fitts (1964). The ACT theory is basically organized for problem solving in the belief that problem solving is the basic mode of cognition (Anderson, 1982; Newell, 1980). Consequently, the ACT system is organized in a hierarchical, goal-structured manner, with both performance and the various learning mechanisms operating under the control of some goal or subgoal. When new information is encountered, it is coded probabilistically into a network of existing propositions as declarative knowledge. The activation of various propositions in this network is determined by the strength of nodes – that is, points in the knowledge structure representing specific
concepts, relationships among concepts (propositions), or images – which varies directly with practice and inversely with the passage of time. This declarative knowledge has little, if any, direct control on behavior. Rather, the impact of declarative knowledge on behavior, according to Anderson (1982), is filtered through an interpretive system that is well oiled in achieving the goals of the system. . . . New information should enter in declarative form because one can encode information declaratively without committing control to it and because one can be circumspect about the behavioral implications of declarative knowledge. (pp. 380–381)
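The statement that node strength varies directly with practice and inversely with the passage of time can be sketched numerically. The Python function below assumes, for illustration only, that each practice event contributes an amount that decays as a power function of elapsed time; the functional form, the decay value, and the name `node_strength` are assumptions of this example and are not specified in the text above.

```python
def node_strength(practice_times, now, decay=0.5):
    """Sum the decayed contributions of all earlier practice events.

    practice_times: moments (in arbitrary time units) at which the node was used.
    now: the current moment; only earlier events contribute.
    decay: assumed power-law decay exponent (illustrative value).
    """
    return sum((now - t) ** -decay for t in practice_times if t < now)

one_use = node_strength([0], now=10)
three_uses = node_strength([0, 1, 2], now=10)
one_use_later = node_strength([0], now=100)

print(one_use, three_uses, one_use_later)
```

More practice events raise the total, and a longer delay lowers it, which is all that the verbal description requires.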
During the declarative stage, general problem-solving procedures are used to interpret new information in a way that directs the learner’s behavior toward dealing with the task at hand. At some point this declarative knowledge is compiled into higher-order procedures (productions) that apply the knowledge and increase efficiency in dealing with the learning task (e.g., problem). Finally, ACT uses an adaptive production system that engages in the type of learning referred to in the preceding section (on Rumelhart and Norman) as tuning, a process that refines the procedure. Three learning mechanisms are used as the basis of this tuning: (a) generalization, a process by which production rules become broader in their range of applicability; (b) discrimination, a process by which production rules become narrower in their range of applicability; and (c) strengthening, a process by which better rules are strengthened and poorer rules are weakened. An example of how the ACT theory would explain the way a child learns to do addition problems would begin with statements (perhaps spoken by the teacher or read in a textbook) of certain facts such as: “In addition problems, one first adds the numbers in the rightmost column;” “Next, you add the numbers in the second column”3 and so forth. With some practice (and perhaps examples by the teacher), these statements of fact are transformed into the ability to actually do what these statements say needs to be done. (While many educators are aware that knowing about something does not necessarily mean that the student has acquired the procedures for translating that knowledge into practice, this distinction is made explicitly by cognitive psychology.) The ability to carry out the actions specified might be represented as productions (P) such as the following:4
P1. IF the goal is to do an addition problem, THEN add the numbers in the rightmost column.
P2. IF the goal is to do an addition problem and the rightmost column has already been added, THEN add the numbers in the second column.
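Productions P1 and P2 can be read as condition-action rules operating on a working memory. The Python sketch below is one minimal way to express that reading; it is not Anderson's ACT program. The working-memory layout, the carry handling, and the extra test in P1 that the rightmost column has not yet been added (needed so the rule stops firing) are assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Production:
    name: str
    condition: Callable[[dict], bool]   # IF part, tested against working memory
    action: Callable[[dict], None]      # THEN part, modifies working memory

def add_column(wm, col):
    """Add one column (0 = rightmost), recording the digit and any carry."""
    total = sum(row[col] for row in wm["rows"]) + wm["carry"]
    wm["answer"].append(total % 10)
    wm["carry"] = total // 10
    wm["columns_done"] = col + 1

P1 = Production(
    "P1",
    condition=lambda wm: wm["goal"] == "addition" and wm["columns_done"] == 0,
    action=lambda wm: add_column(wm, 0),
)
P2 = Production(
    "P2",
    condition=lambda wm: wm["goal"] == "addition" and wm["columns_done"] == 1,
    action=lambda wm: add_column(wm, 1),
)

def run(productions, wm, max_cycles=10):
    """Recognize-act cycle: fire the first matching production until none match."""
    fired = []
    for _ in range(max_cycles):
        match = next((p for p in productions if p.condition(wm)), None)
        if match is None:
            break
        match.action(wm)
        fired.append(match.name)
    return fired

# 27 + 45, with each number stored as digits from right to left.
wm = {"goal": "addition", "rows": [[7, 2], [5, 4]],
      "answer": [], "carry": 0, "columns_done": 0}
fired = run([P1, P2], wm)
result = int("".join(str(d) for d in reversed(wm["answer"])))
print(fired, result)
```

Note that each rule is tied to a particular column, so a three-column problem would need a further production; this rigidity is what motivates compiling such rules into more general ones.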
With additional experience, these (along with other productions) might be compiled into the following, higher-order productions taken from J. Anderson (1982, p. 371):
P3. IF the goal is to do an addition problem, THEN the subgoal is to iterate through the columns of the problem.
P4. IF the goal is to iterate through the columns of an addition problem and the rightmost column has not been processed, THEN the subgoal is to iterate through the rows of the rightmost column and set the running total to zero.
These and other productions would then be compiled into yet other, more general productions that would enable the student to solve addition problems smoothly and efficiently. As other tasks are encountered, however, generalization may occur with the result that various production rules will become broader in their range of applicability. Generalization in ACT is similar to the traditional concept of generalization, except that in ACT generalization involves the learner (i.e., the program) searching for appropriate similarities among production rules and then creating a new production rule that combines those features that the two rules have in common. The search for rules is, of course, the feature that distinguishes this cognitive version of generalization from more traditional behavioristic ones. For example, in learning to solve addition problems, the student acquired production P3 above. Later, in learning to solve subtraction problems, the following production may be acquired:
P5. IF the goal is to do a subtraction problem, THEN the subgoal is to iterate through the columns of the problem.
The similarity between productions P3 and P5 is noticed, and the following generalization is formed:
P6. IF the goal is to do an LV problem, THEN the subgoal is to iterate through the columns of the problem.
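The step from P3 and P5 to P6 can be sketched as a comparison of two rules that keeps what they share and puts a variable where they differ. The Python code below is a simplified illustration under that assumption; the tuple encoding of conditions and the helper names `generalize` and `matches` are hypothetical and do not reproduce the generalization mechanism actually implemented in ACT.

```python
def generalize(rule_a, rule_b, variable="LV"):
    """Build a broader rule from two rules with the same action.

    Positions where the two conditions agree are kept; positions where
    they differ are replaced by a variable. Returns None if the rules
    are not similar enough to combine in this simple way.
    """
    if rule_a["then"] != rule_b["then"] or len(rule_a["if"]) != len(rule_b["if"]):
        return None
    condition = tuple(x if x == y else variable
                      for x, y in zip(rule_a["if"], rule_b["if"]))
    return {"if": condition, "then": rule_a["then"]}

def matches(rule, goal, variable="LV"):
    """A rule applies when every non-variable element equals the goal's element."""
    return len(rule["if"]) == len(goal) and all(
        p == variable or p == g for p, g in zip(rule["if"], goal))

iterate_columns = ("subgoal", "iterate", "columns")
P3 = {"if": ("goal", "do", "addition", "problem"), "then": iterate_columns}
P5 = {"if": ("goal", "do", "subtraction", "problem"), "then": iterate_columns}
P6 = generalize(P3, P5)

# The general rule is added alongside the originals rather than replacing them.
rules = [P3, P5, P6]
new_goal = ("goal", "do", "multiplication", "problem")
print(P6["if"], [matches(r, new_goal) for r in rules])
```

Only the generalized rule applies to the unfamiliar multiplication goal, which is the sense in which generalization supports transfer.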
LV is a “local variable” defined by the specific instances in which the production might apply. The new, more general production would not replace the two original ones; they would continue to apply in special circumstances. Transfer is facilitated, according to ACT, if the same components are taught in two different procedures so that the commonality is more likely to be noticed and generalization can occur. Thus, the transfer involved in learning to drive a new car will be greater if the individual has previously driven several different cars rather than only a single car, a position that is consistent with the results of a number of transfer studies (see e.g., Shuell & Lee, 1976, pp. 71–72) and more recent work on cognitive learning (e.g., Sternberg, 1984a). There are, of course, many situations in which the range of applicability of a production needs to be limited – that is, discrimination needs to
occur if the learner is to produce appropriate behavior. For example, in learning to solve addition problems, the student may have acquired the following production:
P7. IF the goal is to iterate through the rows of a column and the top row has not been processed, THEN the subgoal is to add the digit of the top row into the running total.
Through the use of this production in a variety of different types of addition problems, generalization may have occurred. In fact, when first encountering subtraction problems, the learner may attempt to employ this production, which has worked in the past. Obviously, if the student is to be successful in solving subtraction problems, he or she must learn to discriminate between production P7 and a similar production in which the action specified involves subtracting (rather than adding) the digit from the running total. Discrimination in ACT depends on the learner experiencing both correct and incorrect application of the production, a requirement that is consistent with the well-documented need for the learner to encounter both positive and negative exemplars in concept learning (see, e.g., Tennyson & Park, 1980). Two types of discrimination are involved in ACT. Action discrimination involves learning a new action and can occur only when feedback is obtained about what action is correct in the situation being considered. Condition discrimination involves restricting the conditions under which the old action was carried out, although the new, more restrictive productions coexist with the original production rather than replacing it. Generalization and discrimination are viewed as the inductive components of the learning system embodied in ACT. Due to the nature of induction, generalization and discrimination will err and produce incorrect and/or inappropriate productions – for example, overgeneralizations and useless discriminations. A mechanism that strengthens successful productions will help to ensure that appropriate behavior will occur. While the strengthening mechanism in ACT is fairly complex, it functions basically by modifying the probability attached to a given production, depending on the positive and negative feedback it receives.
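The strengthening idea described above, in which feedback changes the probability attached to a production, can be sketched as follows. The Python code assumes, for illustration, that each production carries a numeric strength, that positive feedback adds to it while negative feedback shrinks it, and that competing productions are chosen in proportion to strength. The update values, the rule names, and both helper functions are assumptions of this example rather than details given in the text.

```python
def strengthen(strengths, name, positive, gain=1.0, shrink=0.75):
    """Raise a production's strength after success; reduce it after failure."""
    if positive:
        strengths[name] += gain
    else:
        strengths[name] *= shrink

def selection_probabilities(strengths, candidates):
    """Probability of choosing each matching production, proportional to strength."""
    total = sum(strengths[name] for name in candidates)
    return {name: strengths[name] / total for name in candidates}

# Two competing rules for processing a digit while working subtraction problems:
# the overgeneralized adding rule and its discriminated, subtracting variant.
strengths = {"P7_add_digit": 1.0, "P7_subtract_digit": 1.0}

for _ in range(3):  # three subtraction problems with feedback
    strengthen(strengths, "P7_subtract_digit", positive=True)
    strengthen(strengths, "P7_add_digit", positive=False)

probs = selection_probabilities(strengths, ["P7_add_digit", "P7_subtract_digit"])
print(strengths, probs)
```

Both rules remain in the system, as in the account of discrimination above; feedback only shifts how likely each is to be used.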
Implications for Future Research A new wave of research on learning is beginning within the various cognitive sciences. Although much of this research holds promise for new and more powerful theories of human learning, considerable work remains to be done before a truly viable and comprehensive theory (or theories) of learning (i.e., capable of accounting for both simpler and more complex forms of learning) is available. As might be expected, given its relative newness,
much of this research has focused on theoretical discussions of its nature and empirical demonstrations that certain types of processes and factors have been overlooked in traditional research on human learning – e.g., learners construct appropriate responses rather than merely react to environmental stimuli (see Wittrock, 1974), and encoding plays a crucial role in learning (see Siegler, 1983; Sternberg, 1984a). A number of problems should be addressed by future research, and some of the challenges will be discussed in terms of: (a) variables that affect the learning process; (b) the relationship between knowledge and learning, including the role of prior knowledge and domain-specific versus domain-independent (general) aspects of learning; and (c) phases of learning.
Variables Affecting Learning Little is known about the specific variables (e.g., environmental events) that influence the learning process. Future research should develop more precise, operational definitions of variables that can influence cognitive learning (e.g., those that a teacher or counselor might use in trying to facilitate the learning of a student or client) so that they can be systematically investigated. Current theories of cognitive learning have identified various functions that must be performed if learning is to occur. For example, Sternberg (1984a) suggests that many, if not all, cognitive theories of learning incorporate three functions that must be performed if learning is to occur: (a) the collection of new information (encoding), (b) the combination of disparate pieces of new information, and (c) the relating of new to old information. It seems to me that several other functions, such as evaluation, are also involved. Little research has been done on variables that affect these factors within a complex learning situation (e.g., the specific variables that determine what does and what does not get encoded), although Sternberg (1984a) identifies and discusses five variables that appear to affect the learning of verbal concepts: (a) the number of occurrences of the new item of knowledge, (b) the variability of contexts in which multiple occurrences of the new item of knowledge occur, (c) location of cues relative to the to-be-learned item of knowledge, (d) importance of the to-be-learned item of knowledge, and (e) density of items of knowledge to be learned. It is interesting to note how similar most of these variables are to those variables responsible for more traditional types of learning (e.g., practice, contiguity, and reinforcement). (For a discussion of how these variables provide the basic conditions for learning to occur, see Shuell and Lee, 1976.) 
In fact, it should be evident from the preceding sections that many of the current theories use traditional concepts from the psychology of learning to explain cognitive learning – for example, generalization. It has been suggested that new schemata are learned by establishing analogies between old and new schemata (Rumelhart & Norman, 1981). If such is the case, it seems likely that
the process is one of generalization and transfer. But generalizations based on analogies may be rather different from traditional conceptions of generalization, since analogies involve structured relationships whereas traditional conceptions typically involve unidimensional stimuli and responses. Tversky (1977) has proposed a contrast model of similarity in which perceived similarity is determined by the individual matching features (both common and distinctive) between two objects or families of objects (e.g., an analogy). The extent to which psychological processes related to generalization are the same in the two situations (traditional vs. cognitive learning) is an empirical question that remains to be investigated. For instance, if there are generalization gradients for analogies similar to those that exist for simpler forms of learning (and this assumption is not unreasonable), then what are the relevant dimensions in analogies along which generalization can occur and the structures to which they apply? A variety of other variables such as elaboration and advance organizers that have been investigated within the traditional framework of research on learning clearly involve mental activities and make assumptions similar to those discussed above for cognitive learning. In addition, Norman (1978) has suggested various ways in which interference and transfer (two very traditional factors in research on human learning) might be involved in various types of cognitive learning. Although a detailed discussion of the possible integration of these variables and phenomena with the types of concerns associated with cognitive learning is beyond the scope of the present article, a simple example may prove useful. Contiguity (the proximity of two events) is well established as one of the fundamental variables affecting traditional types of learning (e.g., Shuell & Lee, 1976). 
In these simpler forms of learning, contiguity is nearly always defined in terms of time intervals (e.g., the time between the conditioned stimulus and unconditioned stimulus in classical conditioning, the time between response and reinforcement in operant conditioning), but other forms of contiguity (e.g., spatial, semantic) appear just as reasonable. Thus, in learning more complex material, contiguity between disparate pieces of information may determine the likelihood that the individual will induce a schema. In Sternberg’s (1984a) list of five variables affecting the acquisition of concepts from text, one (location of cues) is a clear example of contiguity, and another (density of items) could involve contiguity (e.g., the point at which cognitive overload occurs). In some cases, the learner may actively try to establish contiguity through the use of various learning strategies. Although contiguity may seem like an esoteric variable to some educators, it is a variable over which teachers and instructional designers (e.g., textbook authors) have considerable control; if more were known about the way contiguity affects meaningful learning, perhaps they could use it more effectively. In any case, a combination of concerns from traditional learning psychology and modern-day cognitive psychology should serve as a focal point of future research on cognitive learning.
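Tversky's (1977) contrast model, mentioned earlier in this section, expresses similarity as a weighted combination of common and distinctive features: S(a, b) = θ·f(A ∩ B) − α·f(A − B) − β·f(B − A), where A and B are the feature sets of the two objects. The Python sketch below computes this quantity; the weights, the use of a simple feature count for f, and the feature sets themselves are illustrative assumptions.

```python
def contrast_similarity(a, b, theta=1.0, alpha=0.5, beta=0.5, salience=len):
    """Tversky-style contrast similarity between two feature collections.

    theta weights the common features; alpha and beta weight the features
    distinctive to a and to b. salience plays the role of f (here, a count).
    """
    a, b = set(a), set(b)
    return (theta * salience(a & b)
            - alpha * salience(a - b)
            - beta * salience(b - a))

addition = {"column_wise", "right_to_left", "carry"}
subtraction = {"column_wise", "right_to_left", "borrow"}
balanced = contrast_similarity(addition, subtraction)

# With unequal weights the measure is asymmetric, unlike a distance.
rich = {"f1", "f2", "f3"}
sparse = {"f1"}
rich_to_sparse = contrast_similarity(rich, sparse, alpha=0.8, beta=0.2)
sparse_to_rich = contrast_similarity(sparse, rich, alpha=0.8, beta=0.2)

print(balanced, rich_to_sparse, sparse_to_rich)
```

Because the features, not a single stimulus dimension, carry the comparison, a measure of this kind is one candidate for defining the "dimensions" along which generalization between analogies might be studied.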
Knowledge and Learning Traditional conceptions and theories of learning are, for the most part, content free – that is, learning occurs in basically the same way, or follows the same principles, in all situations. Gradually, however, it has become increasingly clear that the amount of knowledge that one possesses has a substantial impact on the learning process (e.g., Chi, Glaser, & Rees, 1982). For example, adults normally are able to remember more (e.g., have considerably longer digit spans) than children, yet Chi (1978) found that 10-year-old chess experts remembered more about the placement of chess pieces on a board than adults who were only novice chess players (the traditional finding of adult superiority was obtained when the same subjects were asked to remember digits). In addition, individuals who know a great deal about something (experts) encode new material related to that knowledge in a different way than individuals who know little about the topic (novices) (see, e.g., Chi et al., 1982; Siegler, 1983). While these expert/novice differences demonstrate that cognitive learning involves qualitative and not merely quantitative changes, we need to know more than the nature of the differences; we need to know how the transition between novice and expert takes place, especially if education is to facilitate the process. There is also evidence that learning is much more domain specific than earlier learning theorists believed (for a good discussion of this point, see Glaser, 1984). For example, Chase and Ericsson (1981) report on an average college student (SF) who over a period of 25 months of practice steadily increased his average digit span from seven to over 80 digits. This feat was accomplished by encoding the digits into running times (SF was an avid and proficient long-distance runner) and developing an elaborate, hierarchical retrieval structure. 
But the skills SF learned in acquiring the largest memory span ever reported in the literature are domain specific; SF’s memory span is normal (about seven symbols) when recalling other types of stimuli such as random consonants. Apparently SF lacked the knowledge base relevant to consonants, for example, that would be necessary to demonstrate his proficiency with other types of stimuli. Yet it seems unlikely that all learning is domain specific. If it were, then it would be difficult to explain how individuals deal with novel situations or learn material that is totally new to them. Obviously, learning involves both domain-specific and domain-independent processes. One challenge for future research to address is how these two aspects of learning interrelate with one another and with the skill and/or knowledge that is being acquired. Another issue that is likely to be addressed by future research is the relationship between various types of knowledge. As already noted, cognitive psychologists frequently distinguish between declarative and procedural knowledge, and other types of knowledge have also been suggested (e.g., Gagné & White, 1978). But is one type more basic than other types? For
example, Rumelhart and Norman (1981) propose that “. . . all knowledge is properly considered as knowledge how but . . . the system can sometimes interrogate this knowledge how to produce knowledge that” (p. 343). This emphasis on “learning by doing” (Anzai & Simon, 1979) – “. . . expertise comes about through the use of knowledge and not by analysis of knowledge” (Neves & Anderson, 1981, p. 83) – is reminiscent of themes heard in education, although the emphasis is somewhat different. In any case, these issues have important implications for educational practices, since different types of knowledge have different instructional requirements.
Phases of Learning The notion that learning progresses in what might be thought of as phases or stages is not a new idea. Some 30 years ago, Fleishman and Hempel (1954, 1955) provided evidence that psychomotor learning proceeds in this manner with performance in the various stages drawing upon different abilities. Other researchers provided evidence for stages in both paired-associate (e.g., McGuire, 1961; Underwood, Runquist, & Schulz, 1959) and free-recall learning (Labouvie, Frohring, Baltes, & Goulet, 1973). In addition, Brainerd (1985) and Brainerd, Howe, and Desrochers (1982) developed a sophisticated mathematical model of learning. Nearly all of this research, however, deals with relatively simple forms of learning. Very little empirical evidence is available on the phases that learners might go through in learning more complex, meaningful material. In recent years, several cognitive theorists have suggested that stages are involved in cognitive learning. For example, Bransford and Franks (1976) have argued that learning that involves understanding moves from concrete to abstract representations, and J. R. Anderson’s (1982) ACT theory postulates that learning proceeds from declarative knowledge to procedural knowledge. Other types of stages or phases of learning are possible, and it is reasonable to expect that different variables may be involved during the various phases. For example, in school we typically expect students to acquire complex bodies of knowledge with some degree of understanding. When the individual begins this undertaking, he or she normally begins by acquiring a number of relatively disparate pieces of information (e.g., the “basic facts” stressed in most classrooms).
During this early phase of learning, pictorial and verbal mnemonics (or various other learning strategies) may facilitate learning by providing the conceptual glue necessary to hold these disparate pieces in memory, and variables such as repetition may play a relatively important role. As learning progresses, however, and the individual begins to fit some of the pieces together, mnemonics may play a less important (or different) role and other types of factors (organizational strategies?)
may play an increasingly important role. Still later, as performance becomes well established, mnemonics may have little or no effect on learning since the underlying knowledge structure now holds the information together in some meaningful, integrated whole – to use an extremely elementary example, C - A - T has become CAT. Thus, given variables may facilitate acquisition during one phase of learning and have little, if any, effect during other phases. Although retention is not really a phase of learning in the sense being discussed here, perhaps the relationship is similar enough to provide a useful analogy. Elaboration clearly has a facilitative effect on learning, but it has been found not to affect retention independent of learning (Olton, 1969); likewise, immediate feedback normally facilitates learning, but delayed feedback appears to facilitate retention (Surber & Anderson, 1975).5 The more important forms of human learning that interest most cognitive scientists and educators involve what is fundamentally a long-term process involving weeks, months, and even years. The phases that we go through as we engage in long-term learning are unknown at present, but they undoubtedly exist and deserve our attention.
Implications for Education Changes in the way we think about learning and what we know about the way learning occurs have important implications for those situations in which we want to facilitate changes in what people know and/or do. In education, for example, corresponding changes are occurring in the way we think about teaching. Since learning is an active process, the teacher’s task necessarily involves more than the mere dissemination of information. Rather, if students are to learn desired outcomes in a reasonably effective manner, then the teacher’s fundamental task is to get students to engage in learning activities that are likely to result in their achieving these outcomes, taking into account factors such as prior knowledge, the context in which the material is presented, and the realization that students’ interpretation and understanding of new information depend on the availability of appropriate schemata. Without taking away from the important role played by the teacher, it is helpful to remember that what the student does is actually more important in determining what is learned than what the teacher does. Although many educators have long advocated that teachers actively engage their students in the learning process, there has not been a great deal of scientific knowledge to support these contentions. “Open education” and “discovery learning” are just two examples of educational practices that failed to produce encouraging results due, at least in part, to the lack of a viable theory of cognitive learning. Many other educators, of course, have advocated “back to the basics” and other approaches stressing more behavioral
forms of learning variables. With the advent of cognitive theories of learning and knowledge of how specific learning processes in the student are engaged by specific instructional variables, we may have the beginning of a viable body of scientific knowledge on how best to capitalize on the active nature of learning. Some of the cognitive research on learning discussed earlier may form the basis for this endeavor, although the nature of an instructional situation does have some unique characteristics. Theories of learning from instruction are somewhat different from regular theories of learning (Shuell, 1980), and important research on cognitive theories of learning from instruction (e.g., Leinhardt & Putnam, 1986; Snow & Lohman, 1984) is beginning to appear in the literature. With regard to prior knowledge, we know that students often begin learning with substantial misconceptions about the material they are studying (e.g., Champagne, Klopfer, & Gunstone, 1982) and that remnants of these misconceptions even persist in students who receive high grades in the course (Champagne, Klopfer, & Anderson, 1980; Gunstone & White, 1981). Students also make systematic errors (such as always subtracting the smaller digit from the larger digit, regardless of which one is on top) sometimes referred to as “buggy algorithms” (Brown & VanLehn, 1982; Resnick, 1982). These errors are not careless mistakes or even the result of faulty reasoning; rather, they represent what students reasonably consider to be appropriate ways of dealing with the problem on which they are working, given their current knowledge structure (i.e., prior knowledge). Analysis of these errors can provide the teacher (or textbook writer, etc.) with useful insights into the type of instruction that has the best chance of being successful; at the very least, it highlights the crucial role played by prior knowledge in any real-life learning situation.
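The systematic character of such an error can be made concrete with a brief sketch. The Python fragment below is only an illustration under stated assumptions, not a reconstruction of any model in Brown and VanLehn (1982) or Resnick (1982); the function names are hypothetical, and non-negative whole numbers with the larger number on top are assumed. It contrasts standard column subtraction with the rule of always taking the smaller digit from the larger one.

```python
def subtract_correct(top: int, bottom: int) -> int:
    """Standard column subtraction with borrowing (assumes top >= bottom >= 0)."""
    result, place, borrow = 0, 1, 0
    while top > 0 or bottom > 0:
        t = top % 10 - borrow      # current top digit, less any earlier borrow
        b = bottom % 10            # current bottom digit
        if t < b:                  # top digit too small: borrow from the next column
            t += 10
            borrow = 1
        else:
            borrow = 0
        result += (t - b) * place
        place *= 10
        top //= 10
        bottom //= 10
    return result


def subtract_smaller_from_larger(top: int, bottom: int) -> int:
    """Hypothetical 'buggy' rule: in every column, take the smaller digit
    from the larger digit, regardless of which one is on top."""
    result, place = 0, 1
    while top > 0 or bottom > 0:
        t, b = top % 10, bottom % 10
        result += abs(t - b) * place   # never borrows
        place *= 10
        top //= 10
        bottom //= 10
    return result


if __name__ == "__main__":
    for top, bottom in [(58, 23), (52, 17), (300, 1)]:
        print(top, bottom,
              subtract_correct(top, bottom),
              subtract_smaller_from_larger(top, bottom))
```

On problems that require no borrowing (e.g., 58 − 23) the two procedures agree, which may help explain why such a rule can go unnoticed; they diverge in a predictable way only when a top digit is smaller than the digit beneath it.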
What these concerns mean is that the teacher’s role is different from the one frequently envisioned in traditional conceptions of teaching. What have changed are the focus and the realization that good teachers are not merely people who can articulate a large number of relevant facts and ideas (although a sound understanding of the subject matter they are teaching is certainly essential); effective teachers must know how to get students actively engaged in learning activities that are appropriate for the desired outcome(s). This task involves the appropriate selection of content, an awareness of the cognitive processes that must be used by the learner in order to learn the content, and an understanding of how prior knowledge and existing knowledge structures determine what, and whether, the student learns from the material presented (and hopefully being studied). Consequently, we need to know more about the way in which specific content and instructional procedures engage and/or elicit the psychological processes and knowledge structures appropriate for the desired learning outcome(s) to be achieved – fortunately, some advances are beginning to be made in this direction (e.g., Winne & Marx, 1983).
Summary The cognitive sciences have begun to give serious consideration to research on human learning, and several different theories of cognitive learning have been suggested. Although the orientation of those interested in cognitive learning differs considerably from the more traditional, behavioral orientation toward learning, there are also similarities and common concerns between the two approaches. Learning is now viewed as being active, constructive, cumulative, and goal oriented. Yet, concerns for cognitive learning do not necessarily invalidate traditional concerns of learning psychology, and for investigators who look at learning in simpler terms, many of the traditional concerns of learning research remain viable. Individual learners go about learning in different ways (Bruner, 1985), and there are different types of learning outcomes (e.g., Gagné, 1965, 1984; Rumelhart & Norman, 1978). Thus, the more traditional principles of learning may be appropriate for certain types of learning while new principles need to be forged for other types of learning, especially those more complex forms of learning in which the desired outcome involves the understanding of relationships among many separate pieces of information. The possibility of identifying and integrating these multiple aspects of learning presents an important challenge to future research on learning and its application to a variety of applied problems, including classroom learning and instruction.
Notes
1. Most discussions of knowledge structures by cognitive psychologists go beyond the associative networks and habit-family hierarchies sometimes discussed by associationists.
2. The suggestion that there is more than one type of learning is not unique to current concerns for cognitive learning. For example, Kimble (1961) discussed differences between classical and operant conditioning, and Gagné (1965) postulated eight different types of learning ranging from classical conditioning to problem solving.
3. The process of carrying will be ignored for the sake of simplicity.
4. While individual productions may appear to some readers as being very similar to stimulus-response associations in which a particular response is under the control of a discriminating stimulus, production systems are different from S-R associations in the way individual productions are interrelated in an organized system under the control of various goals and subgoals. Note also how control of the system can be shifted from a goal to a subgoal, as in production P3 below.
5. For a more complete discussion of this point, see Shuell and Lee (1976).
References
Anderson, J. R. (1980). Cognitive psychology and its implications. San Francisco: Freeman.
Anderson, J. R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369–406.
Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.
Anderson, R. C. (1984). Some reflections on the acquisition of knowledge. Educational Researcher, 13(9), 5–10.
Anzai, Y., & Simon, H. A. (1979). The theory of learning by doing. Psychological Review, 86, 124–140.
Ausubel, D. P. (1962). A subsumption theory of meaningful verbal learning and retention. Journal of General Psychology, 66, 213–224.
Ausubel, D. P. (1963). The psychology of meaningful verbal learning. New York: Grune & Stratton.
Bartlett, F. C. (1932). Remembering: A study in experimental and social psychology. Cambridge, England: Cambridge University Press.
Bereiter, C. (1985). Toward a solution of the learning paradox. Review of Educational Research, 55, 201–226.
Block, R. A. (1985). Education and thinking skills reconsidered. American Psychologist, 40, 574–575.
Bower, G. H. (1970). Analysis of a mnemonic device. American Scientist, 58, 496–510.
Bower, G. H., & Hilgard, E. R. (1981). Theories of learning (5th ed.). Englewood Cliffs, NJ: Prentice-Hall.
Brainerd, C. J. (1985). Model-based approaches to storage and retrieval development. In C. J. Brainerd & M. Pressley (Eds.), Basic processes in memory development: Progress in cognitive development research (pp. 143–207). New York: Springer-Verlag.
Brainerd, C. J., Howe, M. L., & Desrochers, A. (1982). The general theory of two-stage learning: A mathematical review with illustrations from memory development. Psychological Bulletin, 91, 634–665.
Bransford, J. D., & Franks, J. J. (1976). Toward a framework for understanding learning. In G. H. Bower (Ed.), Psychology of learning and motivation (Vol. 10, pp. 93–127). New York: Academic Press.
Bransford, J. D., & Johnson, M. K. (1972). Contextual prerequisites for understanding: Some investigations of comprehension and recall. Journal of Verbal Learning and Verbal Behavior, 11, 717–726.
Bransford, J. D., & McCarrell, N. S. (1974). A sketch of a cognitive approach to comprehension: Some thoughts about understanding what it means to comprehend. In W. B. Weimer & D. S. Palermo (Eds.), Cognition and the symbolic process (pp. 189–229). Hillsdale, NJ: Lawrence Erlbaum Associates.
Brown, A. L. (1978). Knowing when, where, and how to remember: A problem of metacognition. In R. Glaser (Ed.), Advances in instructional psychology (Vol. 1, pp. 77–165). Hillsdale, NJ: Lawrence Erlbaum Associates.
Brown, A. L., Bransford, J. D., Ferrara, R. A., & Campione, J. C. (1983). Learning, remembering, and understanding. In P. H. Mussen (Ed.), Handbook of child psychology: Vol. III. Cognitive development (4th ed., pp. 77–166). New York: Wiley.
Brown, J. S., & VanLehn, K. (1982). Towards a generative theory of “bugs.” In T. P. Carpenter, J. M. Moser, & T. A. Romberg (Eds.), Addition and subtraction: A cognitive perspective (pp. 117–135). Hillsdale, NJ: Lawrence Erlbaum Associates.
Bruner, J. S. (1957). Going beyond the information given. In J. S. Bruner, E. Brunswik, L. Festinger, F. Heider, K. Muenzinger, C. Osgood, & D. Rapaport, Contemporary approaches to cognition (pp. 41–69). Cambridge, MA: Harvard University Press.
Bruner, J. S. (1961). The act of discovery. Harvard Educational Review, 31, 21–32.
Bruner, J. (1985). Models of the learner. Educational Researcher, 14(6), 5–8.
Carpenter, T. P., Moser, J. M., & Romberg, T. A. (Eds.) (1982). Addition and subtraction: A cognitive perspective. Hillsdale, NJ: Lawrence Erlbaum Associates.
Carroll, J. B. (1976). Psychometric tests as cognitive tasks: A new “structure of intellect.” In L. B. Resnick (Ed.), The nature of intelligence (pp. 27–56). Hillsdale, NJ: Lawrence Erlbaum Associates.
Champagne, A. B., Klopfer, L. E., & Anderson, J. H. (1980). Factors influencing the learning of classical mechanics. American Journal of Physics, 48, 1074–1079.
Champagne, A. B., Klopfer, L. E., & Gunstone, R. F. (1982). Cognitive research and the design of science instruction. Educational Psychologist, 17, 31–53.
Chase, W. G., & Ericsson, K. A. (1981). Skilled memory. In J. R. Anderson (Ed.), Cognitive skills and their acquisition (pp. 141–189). Hillsdale, NJ: Lawrence Erlbaum Associates.
Chi, M. T. H. (1978). Knowledge structures and memory development. In R. S. Siegler (Ed.), Children’s thinking: What develops? (pp. 73–96). Hillsdale, NJ: Lawrence Erlbaum Associates.
Chi, M. T. H., Glaser, R., & Rees, E. (1982). Expertise in problem solving. In R. Sternberg (Ed.), Advances in the psychology of human intelligence (Vol. 1, pp. 7–75). Hillsdale, NJ: Lawrence Erlbaum Associates.
Cofer, C. N. (1973). Constructive processes in memory. American Scientist, 61, 537–543.
Dooling, D. J., & Lachman, R. (1971). Effects of comprehension on retention of prose. Journal of Experimental Psychology, 88, 216–222.
Ebbinghaus, H. (1913). Memory. (H. A. Ruger & C. E. Bussenius, Trans.). New York: Teachers College. (Original work published 1885)
Feigenbaum, E. A. (1959). An information-processing theory of verbal learning (Paper No. P-1817). Santa Monica, CA: RAND Corp.
Fitts, P. M. (1964). Perceptual-motor skill learning. In A. W. Melton (Ed.), Categories of human learning (pp. 243–285). New York: Academic Press.
Flavell, J. H. (1979). Metacognition and cognitive monitoring: A new area of cognitive-developmental inquiry. American Psychologist, 34, 906–911.
Flavell, J. H. (1981). Cognitive monitoring. In W. P. Dickson (Ed.), Children’s oral communication skills (pp. 35–60). New York: Academic Press.
Flavell, J. H., & Wellman, H. M. (1977). Metamemory. In R. V. Kail, Jr. & J. W. Hagen (Eds.), Perspectives on the development of memory and cognition (pp. 3–33). Hillsdale, NJ: Lawrence Erlbaum Associates.
Fleishman, E. A., & Hempel, W. E., Jr. (1954). Change in factor structure of a complex psychomotor test as a function of practice. Psychometrika, 19, 239–252.
Fleishman, E. A., & Hempel, W. E., Jr. (1955). The relation between abilities and improvement with practice in a visual discrimination reaction task. Journal of Experimental Psychology, 49, 301–310.
Gagné, R. M. (1962). The acquisition of knowledge. Psychological Review, 69, 355–365.
Gagné, R. M. (1965). The conditions of learning. New York: Holt, Rinehart and Winston.
Gagné, R. M. (1984). Learning outcomes and their effects: Useful categories of human performance. American Psychologist, 39, 377–385.
Gagné, R. M., & White, R. T. (1978). Memory structures and learning outcomes. Review of Educational Research, 48, 187–222.
Glaser, R. (1984). Education and thinking: The role of knowledge. American Psychologist, 39, 93–104.
Glaser, R. (1985). All’s well that begins and ends with both knowledge and process: A reply to Sternberg. American Psychologist, 40, 573–574.
Greeno, J. G. (1978). A study of problem solving. In R. Glaser (Ed.), Advances in instructional psychology (Vol. 1, pp. 13–75). Hillsdale, NJ: Lawrence Erlbaum Associates.
Greeno, J. G. (1980a). Psychology of learning, 1960–1980: One participant’s observations. American Psychologist, 35, 713–728.
Greeno, J. G. (1980b). Some examples of cognitive task analysis with instructional implications. In R. E. Snow, P.-A. Federico, & W. E. Montague (Eds.), Aptitude, learning, and instruction: Vol. 2. Cognitive process analyses of learning and problem solving (pp. 1–21). Hillsdale, NJ: Lawrence Erlbaum Associates.
Gunstone, R. F., & White, R. T. (1981). Understanding of gravity. Science Education, 65, 291–300.
Heller, J. I., & Reif, F. (1984). Prescribing effective human problem-solving processes: Problem description in physics. Cognition and Instruction, 1, 177–216.
Jenkins, J. J. (1974). Remember that old theory of memory? Well, forget it! American Psychologist, 29, 785–795.
Johnson-Laird, P. N. (1985). Deductive reasoning ability. In R. J. Sternberg (Ed.), Human abilities: An information-processing approach (pp. 173–194). New York: Freeman.
Keil, F. C. (1984). Mechanisms of cognitive development and the structure of knowledge. In R. J. Sternberg (Ed.), Mechanisms of cognitive development (pp. 81–99). New York: W. H. Freeman.
Kendler, H. H., & Kendler, T. S. (1962). Vertical and horizontal processes in problem solving. Psychological Review, 69, 1–16.
Kimble, G. A. (1961). Hilgard and Marquis’ conditioning and learning (2nd ed.). New York: Appleton-Century-Crofts.
Kimble, G. A. (1984, August). The psychology of learning enters its second century. Master lecture presented at the meeting of the American Psychological Association, Toronto.
Labouvie, G. V., Frohring, W. R., Baltes, P. B., & Goulet, L. R. (1973). Changing relationship between recall performance and abilities as a function of stage of learning and timing of recall. Journal of Educational Psychology, 64, 191–198.
Langley, P., & Simon, H. A. (1981). The central role of learning in cognition. In J. R. Anderson (Ed.), Cognitive skills and their acquisition (pp. 361–380). Hillsdale, NJ: Lawrence Erlbaum Associates.
Leinhardt, G., & Putnam, R. T. (1986, April). The skill of learning from classroom lessons. Paper presented at the annual meeting of the American Educational Research Association, San Francisco.
Markman, E. M. (1981). Comprehension monitoring. In W. P. Dickson (Ed.), Children’s oral communication skills (pp. 61–84). New York: Academic Press.
McGuire, W. J. (1961). A multiprocess model for paired-associate learning. Journal of Experimental Psychology, 62, 335–347.
Miller, G. A., Galanter, E., & Pribram, K. L. (1960). Plans and the structure of behavior. New York: Holt, Rinehart and Winston.
Moravcsik, J. (1979). Understanding. Dialectica, 33, 201–216.
Neves, D. M., & Anderson, J. R. (1981). Knowledge compilation: Mechanisms for the automatization of cognitive skills. In J. R. Anderson (Ed.), Cognitive skills and their acquisition (pp. 57–84). Hillsdale, NJ: Lawrence Erlbaum Associates.
Newell, A. (1980). Reasoning, problem solving, and decision processes: The problem space as a fundamental category. In R. S. Nickerson (Ed.), Attention and performance VIII (pp. 693–718). Hillsdale, NJ: Lawrence Erlbaum Associates.
Newell, A., & Simon, H. A. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice-Hall.
Norman, D. A. (1978). Notes toward a theory of complex learning. In A. M. Lesgold, J. W. Pellegrino, S. D. Fokkema, & R. Glaser (Eds.), Cognitive psychology and instruction (pp. 39–48). New York: Plenum Press.
Norman, D. A., Rumelhart, D. E., & the LNR Research Group. (1975). Explorations in cognition. San Francisco, CA: Freeman.
Olton, R. M. (1969). The effect of a mnemonic upon the retention of paired-associate verbal material. Journal of Verbal Learning and Verbal Behavior, 8, 43–48.
Omanson, R. C., Beck, I. L., Voss, J. F., & McKeown, M. G. (1984). The effects of reading lessons on comprehension: A processing description. Cognition and Instruction, 1, 45–67.
Paivio, A. (1969). Mental imagery in associative learning and memory. Psychological Review, 76, 241–263.
Paivio, A. (1971). Imagery and verbal processes. New York: Holt, Rinehart and Winston.
Pellegrino, J. W., & Glaser, R. (1982). Analyzing aptitudes for learning: Inductive reasoning. In R. Glaser (Ed.), Advances in instructional psychology (Vol. 2, pp. 269–345). Hillsdale, NJ: Lawrence Erlbaum Associates.
Pressley, M., & Levin, J. R. (Eds.) (1983a). Cognitive strategy research: Educational applications. New York: Springer-Verlag.
Pressley, M., & Levin, J. R. (Eds.) (1983b). Cognitive strategy research: Psychological foundations. New York: Springer-Verlag.
Resnick, L. B. (1982). Syntax and semantics in learning to subtract. In T. P. Carpenter, J. M. Moser, & T. A. Romberg (Eds.), Addition and subtraction: A cognitive perspective (pp. 136–155). Hillsdale, NJ: Lawrence Erlbaum Associates.
Resnick, L. B. (1984). Comprehending and learning: Implications for a cognitive theory of instruction. In H. Mandl, N. L. Stein, & T. Trabasso (Eds.), Learning and the comprehension of text (pp. 431–443). Hillsdale, NJ: Lawrence Erlbaum Associates.
Rothkopf, E. Z. (1965). Some theoretical and experimental approaches to problems in written instruction. In J. D. Krumboltz (Ed.), Learning and the educational process (pp. 193–221). Chicago: Rand McNally.
Rothkopf, E. Z. (1970). The concept of mathemagenic activities. Review of Educational Research, 40, 325–336.
Rumelhart, D. E., & Norman, D. A. (1978). Accretion, tuning, and restructuring: Three modes of learning. In J. W. Cotton & R. L. Klatzky (Eds.), Semantic factors in cognition (pp. 37–53). Hillsdale, NJ: Lawrence Erlbaum Associates.
Rumelhart, D. E., & Norman, D. A. (1981). Analogical processes in learning. In J. R. Anderson (Ed.), Cognitive skills and their acquisition (pp. 335–359). Hillsdale, NJ: Lawrence Erlbaum Associates.
Scandura, J. M. (1970). The role of rules in behavior: Toward an operational definition of what (rule) is learned. Psychological Review, 77, 516–533.
Scandura, J. M. (1977). Problem solving: A structural/process approach with instructional implications. New York: Academic Press.
Shuell, T. J. (1969). Clustering and organization in free recall. Psychological Bulletin, 72, 353–374.
Shuell, T. J. (1980). Learning theory, instructional theory, and adaptation. In R. E. Snow, P.-A. Federico, & W. E. Montague (Eds.), Aptitude, learning, and instruction: Vol. 2. Cognitive process analyses of learning and problem solving (pp. 277–302). Hillsdale, NJ: Lawrence Erlbaum Associates.
Shuell, T. J. (1985). Knowledge representation, cognitive structure, and school learning: A historical perspective. In L. H. T. West & A. L. Pines (Eds.), Cognitive structure and conceptual change (pp. 117–130). Orlando, FL: Academic Press.
Shuell, T. J., & Lee, C. Z. (1976). Learning and instruction. Monterey, CA: Brooks/Cole.
Siegler, R. S. (1983). Five generalizations about cognitive development. American Psychologist, 38, 263–277.
Siegler, R. S., & Klahr, D. (1982). When do children learn? The relationship between existing knowledge and the acquisition of new knowledge. In R. Glaser (Ed.), Advances in instructional psychology (Vol. 2, pp. 121–211). Hillsdale, NJ: Lawrence Erlbaum Associates.
Simon, H. A., & Feigenbaum, E. A. (1964). An information-processing theory of some effects of similarity, familiarization and meaning. Journal of Verbal Learning and Verbal Behavior, 3, 385–396.
Skinner, B. F. (1950). Are theories of learning necessary? Psychological Review, 57, 193–216.
Snow, R. E., & Lohman, D. F. (1984). Toward a theory of cognitive aptitude for learning from instruction. Journal of Educational Psychology, 76, 347–376.
Sternberg, R. J. (1977). Intelligence, information processing, and analogical reasoning: The componential analysis of human abilities. Hillsdale, NJ: Lawrence Erlbaum Associates.
Sternberg, R. J. (1979). The nature of mental abilities. American Psychologist, 34, 214–230.
Sternberg, R. J. (1984a). A theory of knowledge acquisition in the development of verbal concepts. Developmental Review, 4, 113–138.
Sternberg, R. J. (1984b). Mechanisms of cognitive development: A componential approach. In R. J. Sternberg (Ed.), Mechanisms of cognitive development (pp. 163–186). New York: W. H. Freeman.
Sternberg, R. J. (1985). All’s well that ends well, but it’s a sad tale that begins at the end: A reply to Glaser. American Psychologist, 40, 571–573.
Stevenson, H. (1983). How children learn – The quest for a theory. In P. H. Mussen (Ed.), Handbook of child psychology: Vol. I. History, theory, and methods (4th ed., pp. 213–236). New York: Wiley.
Surber, J. R., & Anderson, R. C. (1975). Delay-retention effect in natural classroom settings. Journal of Educational Psychology, 67, 170–173.
Tennyson, R. D., & Park, O. C. (1980). The teaching of concepts: A review of instructional design research literature. Review of Educational Research, 50, 55–70.
Thorndike, E. L. (1913). Educational psychology: Vol. 2. The psychology of learning. New York: Teachers College.
Tolman, E. C. (1932). Purposive behavior in animals and men. New York: Appleton-Century-Crofts.
Tulving, E. (1968). Theoretical issues in free recall. In T. R. Dixon & D. L. Horton (Eds.), Verbal behavior and general behavior theory (pp. 2–36). Englewood Cliffs, NJ: Prentice-Hall.
Tulving, E. (1985). How many memory systems are there? American Psychologist, 40, 385–398.
Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327–352.
Underwood, B. J. (1963). Stimulus selection in verbal learning. In C. N. Cofer & B. S. Musgrave (Eds.), Verbal behavior and learning: Problems and processes (pp. 33–48). New York: McGraw-Hill.
Underwood, B. J. (1964). Laboratory studies of verbal learning. In E. R. Hilgard (Ed.), Theories of learning and instruction (The sixty-third yearbook of the National Society for the Study of Education, Part I, pp. 133–152). Chicago: University of Chicago Press.
Underwood, B. J., Runquist, W. N., & Schulz, R. W. (1959). Response learning in paired-associate lists as a function of intralist similarity. Journal of Experimental Psychology, 58, 70–78.
Underwood, B. J., & Schulz, R. W. (1960). Meaningfulness and verbal learning. Philadelphia: Lippincott.
Voss, J. F. (1978). Cognition and instruction: Toward a cognitive theory of learning. In A. M. Lesgold, J. W. Pellegrino, S. D. Fokkema, & R. Glaser (Eds.), Cognitive psychology and instruction (pp. 13–26). New York: Plenum Press.
Weinstein, C. E., & Mayer, R. E. (1986). The teaching of learning strategies. In M. C. Wittrock (Ed.), Handbook of research on teaching (3rd ed., pp. 315–327). New York: Macmillan.
White, S. H. (1970). The learning theory tradition and child psychology. In P. H. Mussen (Ed.), Carmichael’s manual of child psychology (3rd ed., Vol. 1, pp. 657–701). New York: John Wiley.
Winne, P. H., & Marx, R. W. (1983). Matching students’ cognitive processes and teacher skills to enhance learning from teaching. (Instructional Psychology Research Group, Final Report). Burnaby, B. C., Canada: Simon Fraser University.
Wittrock, M. C. (1974). Learning as a generative process. Educational Psychologist, 11, 87–95.
Wittrock, M. C. (1978). The cognitive movement in instruction. Educational Psychologist, 13, 15–29.
Wittrock, M. C. (1986). Student’s thought processes. In M. C. Wittrock (Ed.), Handbook of research on teaching (3rd ed., pp. 297–314). New York: Macmillan.
39 Meaning in Complex Learning Ronald E. Johnson
What role does meaningfulness play in the learning of complex verbal materials? If meaningfulness does facilitate learning, how is the facilitation accomplished? What are the means by which learning may be made meaningful? Answers to these questions depend upon a satisfactory definition of meaningfulness and valid measures of the construct. Unfortunately, a review of the literature shows that meaningfulness has been neglected both theoretically and empirically. Perhaps the most important reason for neglect has been the intuitive certainty that meaningfulness does influence learning. In addition, investigators have had so much faith in intuition that meaningfulness levels typically have been decreed simply by personal judgment (e.g., English, Welborn, & Killian, 1934). When empirical assessments have been attempted, researchers have been stymied by excessive reliance upon the classical methods of assessing meaningfulness. Neglect also stems from the lack of agreement in defining meaningfulness (Alston, 1964; Creelman, 1966; Fries, 1954). Given the differences in theoretical viewpoints, it is not surprising to find Bousfield (1961) viewing meaning as “an unnecessary concept for verbal learning” (p. 81), while Osgood (1961) views meaning as “the single most important variable in human learning, verbal or otherwise” (p. 91).

A thesis of this critical review is that meaningfulness is potentially the most powerful variable for explaining the learning of complex verbal discourse. In the review, the possibility is examined that meaningfulness may be pivotal in explaining the effects of other variables. Next, it is argued that the classical methods of measuring meaningfulness are generally inappropriate for calibrating the meaningfulness of verbal discourse. Attention is drawn to variables that influence meaningfulness, and suggestions are made regarding requisite conditions for adequately measuring meaningfulness. Finally, the need for additional research is emphasized by sampling problem areas in which productive research could be conducted.

Source: Review of Educational Research, 45(3) (1975): 425–459.
Curriculum, Instruction and Learning
Meaning – A Theory of Associational Reference

Through experience, a person acquires knowledges about an object or idea. Such knowledges include associations about attributes, properties, functions, interrelationships, contextual correlates, and affect. The synthesis of these knowledges constitutes meaning. Although meaning may consist of a vast number of associations, the constituent knowledges are not just a random concatenation. Certain attributes, such as form, are more likely to be salient and to be entered into associational structure. Similarly, organization is derived from learners’ tendencies to bias input data through reliance on preferred cognitive transformations such as grouping and contrast (Campbell, 1958; Deese, 1965). The associational acquisition may or may not include the name of the object or idea. Furthermore, the referential knowledges are not necessarily in conscious awareness, although the existence of such knowledges may be verified by various semantic analyses and experimental techniques.

As compared with previous theories in philosophy or linguistics, the present view of meaning is an extreme version of a referential or ideational theory (Alston, 1964).1 Stated as an ideational theory, meaning is asserted to be nothing more nor less than a person’s referential knowledge about a word, an object, or idea. Rephrased as a theory of reference, the referential object of a word or phrase is asserted to be a conceptual class designated by referential associations. Rather than restricting the term referent to an objective tangible object, the referent of a word is presumed to be a psychological entity. Each word in the subjective lexicon thus represents an entity of particular associations. Such referential associations are not assumed to be derived from a hypothetical construct called meaning; instead, the referential associations are the meaning. By thinking about a concept in relation to other concepts, however, the learner can expand or change existing meanings by acquiring new referential associations.

The adoption of an ideational-referential stance does provide heuristic advantages in examining the relationship of meaning and learning. Although objections have been raised to similar theories of meaning (Alston, 1964, pp. 10–31; Church, 1961, pp. 124–132; Fodor, Bever, & Garrett, 1974, pp. 141–170; Lyons, 1968, pp. 400–442; Miller, 1965, pp. 16–18), the objections are not disabling. Space does not permit analysis of all criticism, but the reassertion of an ideational-referential theory argues the necessity for reexamining the most common objections (see Table 1).
Table 1: Examination of criticisms of referential or ideational theories of meaning in relation to present view

1) Criticism: Two phrases may have identical referents and yet have different meanings (Miller, 1965).
   Example: “George Washington” vs. “first President of the United States.”
   Present theory: Since each of the two phrases arouses different associations, the phrases do not have the same referents or meaning even though ostensibly referring to the same individual.

2) Criticism: “Sentences are meaningful, but their meaning cannot be given by their referent, for they may have none.” (Miller, 1965, p. 16).
   Example: Abstract words such as rule or duty. Also, combinations of words/concepts that have never been experienced directly.
   Present theory: The referent need not be a concrete tangible object – abstract words also arouse associations. For each sentence, referential associations are aroused by the individual words, the various combinations of words, earlier sentences, and the psychological situation in which the sentence is uttered.

3) Criticism: Phrases may have identical meaning and yet have different referents (Alston, 1964; Black, 1968; Pollio, 1974).
   Example: Utterance “I” has similar meaning for each person, but referent of the phrase depends upon the speaker.
   Present theory: Objection unnecessarily assumes a single objective referent which is invariant for all users, and further assumes that meaning is invariant in all lexical contexts.

4) Criticism: Sentences can express hypothetical events, assertions of nonexistence, untrue events, and nonsensical events (Black, 1968; Church, 1961, p. 124).
   Example: “The centaur is a strange creature.” “It was nothing.” “The cat began to bark.” “The apple was calendar.”
   Present theory: No claim is made that semantic propositions mirror reality or truth, or that referential associations are veridically congruent with reality, but verbal descriptions of hypothetical events do arouse associations or meaning.

5) Criticism: Since proper names refer only to a particular person, object, or situation, this restriction allegedly eliminates the possibility of meaning (Fodor et al., 1974, p. 145; Terwilliger, 1968, pp. 149–150).
   Example: A particular name such as “Ralph Jones.”
   Present theory: Referential associations (i.e., meaning) are established to the entity designated by the proper noun. Such associations may include memorial representations in the form of referential imagery (Paivio, 1971, pp. 50–77), either as a particular image, as one of a number of particular images, or as a schematized generic image.

6) Criticism: Function words such as articles, conjunctions, and prepositions have no external or denotative reference (Fodor et al., 1974; Glanzer, 1962; Pollio, 1974), and some words in a sentence do not arouse a distinguishable idea (Alston, 1964, pp. 12, 24).
   Example: Words such as in, on, with, despite.
   Present theory: Function words do have denotative meaning (Carroll, 1964a) and do arouse associations (Kanungo, 1968; Palermo & Jenkins, 1964). Each word in discourse is a unique functional component of the total meaning which is engendered; any change in wording, including substitution of synonyms, usually alters meaning (Alston, 1964, pp. 44–49; Lyons, 1968; Quine, 1960).

7) Criticism: Referential associations to a word may indicate the arousal of a meaning opposite to the usual meaning.
   Example: In arouses association of out. Similarly, big produces little or small, man results in associate of woman, and buy arouses sell.
   Present theory: Componential analyses of semantic features show that antonymous associations are closely related semantically to the stimulus word (Clark, 1970; Lyons, 1968). The single reversed semantic feature indicates as much focus on the critical defining dimension as occurs in the production of a synonymous associate.
Meaningfulness

Assuming meaning to be a synthesis of referential associations, meaningfulness may be defined literally as the extensiveness of the network of referential associations. However, the type, relevance, and organization of associations may be more important than quantity in determining the “fullness” of meaning. In any event, a differentiation between meaning and meaningfulness has important educational implications in that the “fullness” of meaning appears critical in learning. Learning may be said to be meaningful to the extent that the new learning task can be related to the existing cognitive structure of the learner, i.e., to the residual of his earlier learnings. The presence of meaning, however, does not guarantee meaningfulness. A person’s associations to a conceptual entity may include all of the associations that normatively define the concept. Yet, the sparseness of associations, or the quality of the associations, may make it difficult for the learner to establish useful associational linkages. Whether a concept is meaningful thus depends upon the associational background of the learner and also the semantic structure of the concept within the linguistic community.
Indirect Experimental Manipulations of Meaningfulness

If meaningfulness is a powerful variable in learning, its influence ought to be evident even in experimental comparisons designed to test the influence of other variables. To illustrate, the active learner learns better than the passive learner, and this superiority may result because the active learner is more successful in relating the new material to existing ideas. Bobrow and Bower (1969) compared the remembering of learners who generated their own linking sentences for a noun pair as opposed to merely reading an equivalent linking sentence. When learners generated their own associative links, recall was superior, and Bobrow and Bower concluded that recall was facilitated when learners comprehended the meaning of the sentence. In their words, “the mere act of searching for something in memory at the time of input of a noun pair is not the beneficial factor. Rather it appears that the memory search has to be relevant to constructing a relational bridge between the two nouns” (p. 457).

Similar conclusions were reached by R. C. Anderson and his associates (Anderson, Goldberg, & Hidde, 1971; Anderson & Kulhavy, 1972). In the 1971 experiment, learners who were required to fill in blanks at the end of sentences learned more than those who read whole sentences. As interpreted by Anderson et al., the completing of the sentence forced the learner to comprehend the other words in the sentence. Thus, the advantage was thought to result from “the process of giving meaningful representation to the words” (Anderson et al., 1971, p. 398).
In another experiment, Watts and Anderson (1971) inserted different types of questions into textual prose. A criterion test required learners to select correct examples of the learned concepts. Some learners received textual questions requiring them to identify the example used in the passage. Other learners received inserted questions requiring them to identify a new example. On the final criterion test, the group that applied their knowledge to new examples showed the best overall performance. As interpreted by Watts and Anderson (1971), the application questions induced the learners to process the text more thoroughly. In the present context, the “more thorough processing” suggests that the questions induced the learners to relate the new learning to old learnings, i.e., to learn meaningfully. Associations that are aroused during learning usually are semantically related to the new content (Clark & Card, 1961; Fillenbaum, 1971; Sachs, 1967), but the insertion of questions into text also can influence meaningfulness by directing the learner to relevant associations or concepts (e.g., Frase, 1970; Rothkopf & Bisbicos, 1967). Questions appearing prior to the relevant content influenced the learning of that content, but relatively little incidental information was remembered. When the inserted questions occurred after the relevant segment of text, learners recalled incidental information as well as relevant information. Questions also can influence the depth of processing (Rickards & Di Vesta, 1974). Interspersed post-questions directing attention to the learning of specific facts resulted in high levels of factual recall, but poor recall of superordinate generalizations. Querying the learner’s knowledge of superordinate generalizations, however, resulted in high recall both of the superordinate statements and also the subordinate facts. 
When the questions requested the retrieval of subordinate facts to substantiate the superordinate statement, as opposed to simply requesting the recall of the superordinate idea, the patternings of later recall suggested a greater cognitive integration of the superordinate and subordinate facts. Consistent with the results of Rickards and Di Vesta (1974), the effects of questioning procedures may be viewed as a direct derivative of the extent to which the questions induce the retrieval and processing of a particular set of referential associations. Orientation also may be provided by organizational subheadings. The title of a passage ordinarily orients the learner to the central theme, and the learner’s apprehension of the theme presumably would increase the meaningfulness of relevant segments. In a test of this hypothesis, learners who were permitted to see a thematic title showed significantly greater recall of the passage (Dooling & Lachman, 1971). A second experiment indicated that superiorities in recognitive performance were limited to content words that were semantically related to the theme. Evidence that the locus of the effect is in the learning phase is shown by the finding that receiving the title after the passage has been learned does not influence recall (Bransford & Johnson, 1972; Dooling & Mullet, 1973).
Meaningfulness also may be increased by inducing appropriate learner strategies. Bower and Clark (1969) presented 12 successive serial lists consisting of 10 concrete nouns. One group of learners was told to construct a meaningful story woven around the words to be remembered. A yoked control group received the usual serial learning instructions. Immediate recall performances were virtually perfect, but on a delayed recall test, the average median recall of the narrative group was 93%, whereas the control group recalled only 13%. The difference in performance presumably resulted from the learners’ differential success in relating the learning material to some central theme or organizational framework. In short, the narrative strategy allowed the learners to learn meaningfully. Consistent with this interpretation, Thieman’s (1973) extension of Bower and Clark’s (1969) experiment led to the conclusion that the differences in remembering corresponded directly with the degree of meaningful processing induced by the learning task.

Finally, the ubiquitous influence of meaningfulness also appears critical in the distinction between short-term and long-term memory. Since 1965 the verbal learning literature has been dominated by multistore theories of memory in which processing flows from a limited capacity short-term store to a larger capacity long-term store (Atkinson & Shiffrin, 1968; Waugh & Norman, 1965). Encoding in the short-term store is assumed to be mainly acoustical, and forgetting is assumed to be rapid unless the learner engages in repetitive rehearsal. Long-term memories, however, are relatively permanent, and the encodings are presumed to be semantic. Entry into the long-term store is assumed to be directly related to the amount of time spent in short-term storage. At both the empirical and theoretical levels, the distinctions between short-term memory and long-term memory are fading rapidly (Craik, 1973; Craik & Lockhart, 1972; Wickelgren, 1973). 
As a theoretical replacement for multistore models, Craik (1973) suggests that differences in capacity, encoding, and rates of forgetting depend upon the learner’s “depth of processing.” A rehearsal process that simply holds or maintains the trace in short-term memory, for example, does not lead to long-term retention, whereas rehearsal that involves associative encoding does (Craik & Watkins, 1973; Gardiner, 1974; Jacoby, 1973; Woodward, Bjork & Jongeward, 1973). Thus, linguistic units that are encoded meaningfully are more likely to be remembered. The behavioral evidence that purportedly differentiates two types of memories actually may reflect differences in the extent to which linguistic units are encoded meaningfully. To summarize, some variables appear to exert their influence by increasing or decreasing meaningfulness. Note, however, that the experimenters did not intentionally manipulate meaningfulness. Let us now examine direct attempts to measure meaning, and inquire into the validity of such techniques for complex learning materials.
Classical Methods of Measuring Meaning

Associational Frequency

The meaningfulness of verbal units in isolation has been assessed by the frequency of individuals reporting an association (Archer, 1960; Glaze, 1928; Noble, 1961), the mean production of associates (Noble, 1952), and categorical ratings of associational frequency (Noble, 1961). Such techniques have successfully measured variations in meaningfulness among nonsense syllables. Among words, however, differentiations in levels of meaningfulness are virtually nonexistent, and such measurements are not obviously applicable when the words occur in a prose context.
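The first two indices named above can be illustrated with a minimal sketch. It assumes that each respondent's associates to a stimulus are recorded as a list; the function names and the response data are hypothetical and are not taken from the cited norms.

```python
from statistics import mean

def association_value(responses):
    """Proportion of respondents who reported at least one association."""
    return sum(1 for r in responses if r) / len(responses)

def mean_production(responses):
    """Mean number of associates produced per respondent (a production-type index)."""
    return mean(len(r) for r in responses)

# Hypothetical data: four respondents, one list of associates each.
word_responses = [["stove", "food", "sink"], ["cook"], ["table", "knife"], ["eat", "warm"]]
syllable_responses = [[], ["gooey"], [], []]

print(association_value(word_responses), mean_production(word_responses))
print(association_value(syllable_responses), mean_production(syllable_responses))
```

With data of this kind, a familiar word typically sits at or near the ceiling of the first index, which is one way of seeing why such measures differentiate poorly among words.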
Word Frequency and Readability Indices

A classical measure of meaningfulness is the frequency of occurrence of words (Thorndike & Lorge, 1944). Over the entire frequency range, frequently occurring words are more likely to be meaningful. The degree of relationship, however, is not strong. If one considers recall to be a validating measure of meaningfulness, for example, the relationship between word frequency and recall is slight or nonexistent (Hall, 1971; Saltz, 1971; Underwood & Schulz, 1960). Word frequency also has been an important component in readability formulas, along with variables such as word length, number of syllables, sentence length, number of personal words, and measures of grammatical complexity (e.g., Flesch, 1948). However, in eight validation studies comparing readability indices with independent measures of comprehension or retention, only three studies reported validity coefficients higher than .50, whereas four studies had coefficients below .50 (Klare, 1963, pp. 148–156). Aside from disappointing validity, readability indices can be computed only for large sections of prose, and the method cannot accurately gauge the meaningfulness of individual linguistic subunits.
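As an illustration of how a readability index is computed, the following sketch uses the Flesch reading-ease formula in its commonly stated form, which depends only on average sentence length and average syllables per word. The counts are supplied by the caller, and the example counts are hypothetical.

```python
def flesch_reading_ease(n_words, n_sentences, n_syllables):
    """Flesch reading ease: higher scores indicate easier text."""
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

# Hypothetical passage: 100 words, 5 sentences, 150 syllables.
score = flesch_reading_ease(n_words=100, n_sentences=5, n_syllables=150)
print(round(score, 3))
```

Because the index is built from averages over a whole sample, it becomes unstable as the sample shrinks toward a single sentence or phrase, which is consistent with the limitation noted above.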
Word Associations

The classical technique of free association also has been used to uncover the structure of meaning. In an examination of associative overlap, Deese (1959) demonstrated that the frequency with which words in a list elicited each other as associates was highly correlated (r = .88) with free recall. Such associative networks, derived from the cognitive operations of contrast and grouping, were presumed to be the essence of meaning (Deese, 1965). Deese’s theorizing was based upon free-association responses given to single-word stimuli, but he concludes that the meaning of sentences also is derived from the
associational structures of the individual words. The major difference, he assumes, is that associations to words in a sentence are influenced by syntactic and referential constraints. As evidence, Deese (1965, pp. 168–170) cites Clark’s (1966) study in which participants produced sentence-associations having the same grammatical format as the stimulus sentence. To the sentence The lazy student failed the exam, sentence associations included The smart girl passed the test and The industrious pupil passed the course. Individuality in producing sentence-associations was quite evident, but the sentence-associations did bear semantic similarities to the original sentences. Word replacements were similar in kind and in frequency to the distribution of associations given to the same stimulus words in the classical free association task (Deese, 1965, p. 169). Are the meanings of sentences derived from the dominant associations aroused by the individual words? To the stimulus words of working, good, from, comes, and health, the corresponding primary associates are hard, bad, to, goes, and sickness (Palermo & Jenkins, 1964). Yet, if the words are arranged in the sentence Good health comes from working, an individual does not give as a sentence association the words Bad sickness goes to hard. Clearly, when words are placed into sentences, associations are aroused that are not predictable from the individual words (Barclay, Bransford, Franks, McCarrell, & Nitsch, 1974). What, then, were the forces that resulted in sentential associations that were both semantically related and in the form of a sensible sentence? Perhaps the simplest explanation is that subjects were required to provide associational sentences in the same grammatical form as the original sentence. If grammatical congruence had not been required, the number of primary associates probably would have been substantially smaller. 
Equally important, Clark’s subjects apparently were producing associational sentences in response to the overall meaning (Gestalten) of the stimulus sentence. Understanding and responding to a sentence requires not only knowledge of the individual words, but also an understanding of the relationships among the words (Anisfeld, 1970; Fillenbaum, 1974a; Olson, 1970). Individuals do retrieve and synthesize referential associations that link the individual words of a sentence.
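The notion of words in a list eliciting each other as associates can be sketched as an average over all ordered pairs of list members. This is one plausible reading of an inter-item index, not a reconstruction of Deese's exact computation; the function name and the association norms are hypothetical.

```python
from itertools import permutations

def interitem_strength(words, norms):
    """Average proportion with which list words elicit one another as associates.

    norms[stimulus][response] holds the proportion of respondents who gave
    `response` as an associate to `stimulus`; missing entries count as 0.
    """
    pairs = list(permutations(words, 2))
    total = sum(norms.get(s, {}).get(r, 0.0) for s, r in pairs)
    return total / len(pairs)

# Hypothetical single-word association norms.
norms = {
    "king": {"queen": 0.5, "crown": 0.1},
    "queen": {"king": 0.6},
    "crown": {"king": 0.3},
}
print(interitem_strength(["king", "queen", "crown"], norms))
print(interitem_strength(["king", "pelt", "season"], norms))
```

An index of this kind is computed entirely from norms for isolated words, so it cannot register the associations that arise only when the words are combined into a sentence.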
Associative Communality

Howe (1972) has argued that the meaningfulness of sentences may be determined by the communality of word associations. According to Howe, meaningful words are more likely to produce the same associational response in each person. The occurrence of a variety of associates presumably reflects the greater possibilities of associative interference. This view is very similar to Martin’s (1968) theory that units of low meaning are likely to be variably
encoded; units of high meaning are likely to be encoded in the same way on each occasion. Ironically, a greater number of associates, when evoked from individuals, is one index of higher meaningfulness (m via Noble’s (1952) production method), whereas the occurrence of a greater number of different associates in a group of individuals, as measured only for the first occurring associate, is said to signal the existence of lower meaningfulness. As evidence that associative communality measures meaningfulness, Howe (1972) cites Clark’s (1966) data relating the learning of individual sentences to the normative popularity of associations given in response to the stimulus-sentences. Recall of sentence parts was found to be best when the corresponding sentence associations showed less diversity. Is associative communality a valid measure of meaningfulness? As would be expected, communality does predict recall. Furthermore, communality conceivably might signal the degree of accord with associational backgrounds. Common associates, however, also could hinder or be irrelevant to the required learning. Like Deese (1965), Howe’s (1972) premise is that the meaningfulness of a sentence is derived from the word-associations given to the individual words. The theory thereby is subject to the same limitations noted for Deese’s theory and Clark’s (1966) data.
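The contrast between a production index and a communality index can be made concrete with a small sketch that counts the different first associates given by a group and the share of the group giving the most common one. The function name and the data are hypothetical.

```python
from collections import Counter

def communality(first_associates):
    """Return (number of different first associates, share giving the modal associate)."""
    counts = Counter(first_associates)
    modal_count = counts.most_common(1)[0][1]
    return len(counts), modal_count / len(first_associates)

# Hypothetical first associates from four respondents per stimulus.
high_communality = ["chair", "chair", "chair", "desk"]
low_communality = ["rug", "sky", "dog", "pen"]

print(communality(high_communality))
print(communality(low_communality))
```

Under this scheme a stimulus that evokes many different responses across a group scores low, even though the same stimulus could score high on a measure that counts associates produced by each individual.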
Cloze Procedure

In the cloze procedure, readers’ success in guessing deleted words is said to measure readability or comprehensibility. Might cloze also be used to assess meaningfulness? A passage containing frequently used words, rather than uncommon words, ordinarily would be more meaningful. Meaningfulness also might be higher when verbal passages contained higher proportions of concrete referents and lower proportions of abstract referents. In agreement, Coleman (1971) found substantial correlations between cloze scores and the densities of either concrete or abstract nouns. Similarly, a passage containing some redundancies ordinarily would be easier to comprehend than prose without redundancy. The extremely high correlations with measures of redundancy (Coleman, 1971; MacGinitie, 1971; Taylor, 1954), however, suggest that missing words often could be inserted without understanding the passage. Learning performances on high redundancy passages also may be misleadingly high. As evidenced in programmed instruction, learning may be hindered when the learner is provided with too many cues (Anderson, 1970). Research on reading also shows that excessive cues or redundancy may make the passage become less meaningful to the learner by reducing his attention and making him less active in his learning efforts (Samuels, 1970). In sum, the cloze procedure does tap certain aspects of meaningfulness, although the extent to which these dimensional taps are contaminated remains
to be seen (Bormuth, 1965; Weaver & Kingston, 1963). Second, the technique offers simplicity of measurement for comparisons of two or more passages. Disadvantages include the excessively heavy reliance on redundancy. Another disadvantage is that the method offers only gross comparisons among passages, and does not offer calibrations of smaller linguistic units such as the sentence or phrase.
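A minimal sketch of the procedure follows. It assumes the conventional scheme of deleting every fifth word and scoring exact replacements; the deletion interval, function names, and sample passage are illustrative assumptions rather than details given in the studies cited.

```python
def make_cloze(text, n=5):
    """Blank out every n-th word; return the mutilated passage and the answer key."""
    words = text.split()
    answers = {}
    for i in range(n - 1, len(words), n):
        answers[i] = words[i]
        words[i] = "_____"
    return " ".join(words), answers

def cloze_score(answers, guesses):
    """Proportion of deleted words that were replaced exactly (case-insensitive)."""
    hits = sum(1 for i, w in answers.items() if guesses.get(i, "").lower() == w.lower())
    return hits / len(answers)

passage, key = make_cloze("the quick brown fox jumps over the lazy dog again today")
print(passage)
print(cloze_score(key, {4: "jumps", 9: "once"}))
```

The score is a single proportion for the whole passage, which illustrates why the method supports only gross comparisons among passages and not calibrations of individual sentences or phrases.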
Semantic Differential

A valid measure of semantic similarity conceivably could be an appropriate measure of the meaningfulness of complex verbal units. Osgood, Suci, and Tannenbaum (1957) have proposed that the connotative meaning of a word can be represented in a three-dimensional semantic space on the factorial dimensions of evaluation, activity, and potency. Words close to each other in semantic space are said to be similar in meaning. An acknowledged limitation of Osgood’s semantic differential, as a general measure of meaning, is that it mainly measures connotative meaning. The technique has not been successfully applied to the measurement of denotative meaning. The words boat, income, bright, and eat occupy virtually identical locations in semantic space, but it is obvious that the differences in meaning are great. Whether the semantic differential could be revised to include dimensions measuring denotation is not known, but the number of potentially independent dimensions appears insurmountably large. In addition, another limitation of the semantic differential is that its usefulness is restricted largely to isolated words rather than words in context.
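Closeness in semantic space can be sketched as the Euclidean distance between two profiles on the three dimensions. The factor scores below are hypothetical and serve only to show how words that differ greatly in denotation could still lie near one another.

```python
import math

def semantic_distance(profile_a, profile_b):
    """Euclidean distance between two (evaluation, activity, potency) profiles."""
    return math.dist(profile_a, profile_b)

# Hypothetical factor scores on (evaluation, activity, potency).
word_a = (1.0, 2.0, 2.0)
word_b = (1.0, 2.0, 2.5)    # denotatively unrelated, connotatively similar
word_c = (1.0, -1.0, -2.0)

print(semantic_distance(word_a, word_b))
print(semantic_distance(word_a, word_c))
```

Since the distance reflects only the three connotative dimensions, a small value says nothing about whether two words denote similar things.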
Toward Adequate Measures of Meaningfulness

As noted, inadequacies are evident in existing methods of measuring meaningfulness in verbal discourse. Even when the classical methods reveal differences among words (Paivio, Yuille, & Madigan, 1968), the relationship to meaningfulness is not always obvious. To illustrate, in the Paivio et al. study, more associations supposedly signaled greater meaningfulness, but the following alignment of production values does not arouse faith in the scaling procedure: FATIGUE, 3.88; FACT, 4.29; SALUTATION, 5.24; PASSION, 5.68; SCIENCE, 6.56; PELT, 6.76; SOVEREIGN, 7.12; WHALE, 7.24; FIRE, 7.36; SEASON, 7.88; PRAIRIE, 8.16. The classical methods of measuring meaningfulness (Archer, 1960; Noble, 1961) also have located almost all words at the extreme high end of the scale, and thus have been insensitive to actual differences in meaningfulness. One possible solution might be to use scaling procedures that force raters to make differentiations. Using an eliminative method analogous to
the forced-choice technique, for example, Johnson’s (1973) raters judged meaningfulness by eliminating prose units of lower meaningfulness until only a specified proportion of the content remained. Similar procedures could be fruitful in calibrating the meaningfulness of a sample of isolated words.

Another source of difficulty is that experimenters have focused on prose segments that were inappropriate in size. Some procedures are applicable only to gross comparisons between lengthy passages. Other methods have failed, in part, because the unit of analysis was the single word. If the previous assessment was correct, the associational linkages among words make it highly unlikely that the meaning of a sentence can be derived solely from an analysis of words studied in isolation. A successful measure of meaningfulness, then, must be responsive to semantic relationships among words. Compare, for example, “the nail file was used to remove the small screw,” with “the nail screw was used to remove the small file.” The words in the two phrases are identical, as are the grammatical structures, but the exchange of file and screw has rendered the second phrase into a less meaningful assertion. In an analogous experiment, Rosenberg (1968) found that sentences containing words having strong associative relations with each other, such as The old king ruled wisely, were remembered better than poorly integrated sentences such as The poor king dined gravely. Even with strongly integrated sentences, however, the arousal of particular referents depends upon the associational context provided by other words in the sentence (Barclay et al., 1974). For example, when nouns from different taxonomic categories were presented in a list, recall patterns showed evidence of categorical clustering (Bousfield, 1953). When such nouns were integral parts of sentences, however, Cofer (1968) found that clustering and recall were disrupted by sentential context. 
Similarly, the importance of context is evident from Harris and Brewer’s (1973) experiments on the accuracy of recalling tenses of verbs. When sentences contained a temporal adverb such as “yesterday,” as opposed to a nontemporal adverb such as “accidentally,” memory for verb tense was more accurate. According to Harris and Brewer, the temporal context imparted greater meaning to verb tense, and hence resulted in better remembering. Similar outcomes were evident in the recall of other sentence elements in which full meaning depended upon a sentential context designating a particular time, place, or speaker (Brewer & Harris, 1974). Grammatical structure also determines the particular associational linkages induced by words in a sentence. The grammatical usage of a word, e.g., train as either a noun or a verb, may determine meaning, and differences in meaningfulness are evident even when different grammatical usages lead to essentially the same semantic interpretations, e.g., fill as a verb or as a noun (Brown, 1958b, pp. 247–253; Carroll, 1970). Even when the object and subject of a sentence remain the same, as in active and passive transformations, different
understandings are engendered (Anisfeld & Klenbort, 1973; Herriot, 1970; Offir, 1973). As a consequence, slight variations in syntax can lead to sizable differences in recall (Bock & Brewer, 1974). The recall of two temporally ordered events, for example, is best when the presentation order in the sentence corresponds to the actual ordering of events (Blount & Johnson, 1973; Clark & Clark, 1968). Thus, as evidence has increasingly shown (e.g., Fodor et al., 1974; Weisberg, 1971), sentence structure serves as one basis for sentence interpretation. Here again we have evidence for contextual influence in determining meaningfulness. In sum, meaningfulness judgments are not likely to be valid unless the linguistic units are in their appropriate verbal contexts (Jenkins, 1974). Classical measures of the meaningfulness of individual words hinted at the possibility that such measures, in conjunction with a set of some unknown combinatorial rules, might predict the learning of any possible combination of words and phrases. That aim has not been achieved, and the contextual determinancy of meaningfulness makes it unlikely that adequate combinatorial rules will be devised. Given the infinite number of possible word combinations, and the realization that contextual specificity does not allow prediction to other combinations of words, is it worthwhile to measure the meaningfulness of prose subunits? Although the generality of measurement is disappointing, the potential usefulness of such endeavors appears undeniable. One possible use might be pilot assessments of textual subunits prior to their use. Important segments rated low in meaningfulness could be revised to increase the probability of learning. At first glance, the task appears forbidding, but reliable ratings can be achieved with small numbers of raters (Johnson, 1973). 
Furthermore, when categorical ratings are made of phrase units in prose, the task requires relatively little time beyond that of normal reading, and the relative rankings are very similar to those produced by the eliminative method. Equally important, measures of the meaningfulness of prose could be useful either as independent or dependent variables in research on learning. Experimental variations in semantic or syntactic variables, for example, presumably would influence the meaningfulness of linguistic subunits, and the ease with which the subunits were learned. Through such studies, generalizable relationships could be established.
A Sampler of Needed Research
Assessing Meaning
Empirical studies are needed to determine whether meaningfulness ratings are more valid when raters make global judgments in contrast to ratings made of dimensional indicants such as amount or quality of aroused imagery.
If raters are instructed to attend simultaneously to the various components of meaningfulness, the task may be too complex. Alternately, if raters are told simply to furnish an overall global rating, they may develop differing reliances on the various component dimensions. An empirical solution may be to collect global ratings and also ratings of the component dimensions, and then determine the degree of overlap and the extent to which the various ratings predict criterion performances. The development of alternative methods of measuring meaning and meaningfulness also warrants research priority. Methods used in measuring comprehension (Carroll, 1972), for example, perhaps could be adapted for global measurements of meaningfulness. In developing analytical measures of particular aspects of meaning, researchers perhaps could improvise on techniques used with individual words. Fillenbaum and Rapoport (1971), for example, show much ingenuity in analyzing similarity judgments by techniques such as nonmetric multidimensional scaling, graph theoretic analysis, and hierarchical clustering analysis. Fillenbaum (1974a) probed other aspects of meaning through analyses of attempted paraphrases (also see Fillenbaum, 1974b; Gleitman & Gleitman, 1970), as well as raters’ judgments of sentence equivalence, informativeness, and semantic plausibility. Using a method of componential paraphrasing, followed by empirical analyses of subjects’ sortings of words according to “similarity of meaning,” Miller’s (1969, 1972) research provides additional directionality for developing alternative methods. 
To appreciate the facilitation that could result from new methods of assessing meaning, consider the catalytic influence of Bransford and Franks’ (1971) research in stimulating additional research (Barclay, 1973; Barclay & Reid, 1974; Bransford, Barclay, & Franks, 1972; Cofer, 1973; Franks & Bransford, 1972, 1974; Johnson, Bransford, & Solomon, 1973; Paris & Carter, 1973; Peterson & McIntyre, 1973; Potts, 1972; Singer & Rosenberg, 1973). Although Bransford’s methodological procedures and conclusions have been criticized (Katz, 1973; Katz & Gruenewald, 1974; Reitman & Bower, 1973), the powerful impact of Bransford and his associates appears derived from the use of an explicit method for showing that learners have considerable recognitive difficulty in distinguishing between the semantic content of the message and their own implicitly generated semantic content. In agreement with Bartlett’s (1932) reconstructive theory of remembering, the semantic content of a message appears to become fused with existing referential knowledges. The adoption of a referential theory of meaning argues the importance of developing methods that assess the referential associations of discourse units. As noted earlier, word associations to individual stimulus words do not necessarily predict the associations that will be given when the stimulus words are embedded in discourse. Yet, free associations to prose units undoubtedly tap certain aspects of the structure of meaning. Typically, the bulk of the associations can be categorized as being either opposites;
synonyms; superordinates; subordinates; logical coordinates, e.g., apple-pear; or functional, e.g., needle-thread (Karwoski & Berthold, 1945; Moran, Mefferd, & Kimble, 1964). Is there a code that allows translation of such word associations into structural or theoretical representations of meaning? In an important paper, Clark (1970) postulates the existence of associational rules that govern the production of word associations. Theorizing from the linguistic viewpoints of Katz and Fodor (1963) and of Chomsky (1965), Clark assumes a componential approach in which a word is comprehended as a set of syntactic and semantic features. The features of man, for example, might be characterized as: +Noun, +Det –, +Count, +Animate, +Human, +Adult, and +Male. In the free-association task, an “associating rule” is applied to the list of features, such as “change the sign of the last feature” (e.g., +Male to –Male), and an association is then produced which is congruent with the altered feature list, e.g., woman. “Changing the sign of the last feature” is also labeled the “minimal contrast rule” because the antonymous associations produced by the rule have the maximum number of features in common with the stimulus. Within the context of a referential theory of meaning, the “last” feature which is reversed appears centrally related to the semantic dimension on which the word is primarily defined. The word short, for example, leads to the association of long because length is the primary defining attribute of short. In experience, the psychological attribution of short is contrasted with the anchoring alternative of long. The “marking rule” describes another associational transformation common in word associations (Clark, 1970). With antonymic words, one member of the pair is marked or positive with regard to the presence of a feature, whereas the opposite member is neutral or unmarked (Greenberg, 1966). 
For example, dog is unmarked with regard to the classification of sex, but bitch is marked. In association tasks, marked stimulus words show a greater tendency to produce their unmarked counterparts as associational responses (Clark, 1970). The marked stimulus of better, for example, produces the unmarked response of good more often than good produces better. In the recall of sentences, unmarked words are remembered better than marked, and qualitative changes in memory also tend to proceed from the marked to the unmarked form rather than in the reverse direction (Benjafield & Giesbrecht, 1973; Carpenter, 1974; Clark & Card, 1969). Significantly, however, memorial change toward the unmarked form tends to occur only when such change allows the preservation of the original meaning of the sentence (Brewer & Lichtenstein, 1974). Although linguistic criteria are used to differentiate marked from unmarked words (Greenberg, 1966), Deese (1973) notes that unmarked members of a pair occur more frequently in written language and that children learn to use unmarked words prior to marked words. Equally important, marked words tend to be rated negatively on evaluative scales of affect (Deese, 1973). Given such differential experiences, it is not surprising that unmarked words have priority in associational structures. For the marking
rule, as well as other associational rules, regularities in associative responses may prove useful in delineating the structure of meaning. Similar associational transformations presumably mediate the referents aroused to linguistic units in prose. With linguistic units larger than one word, however, verbal associations elicited in free association probably represent associational structure only indirectly. In analyses of verbs of motion, for example, Miller (1972, pp. 345, 369) suggests that some concepts have associational representations which are blends of existing concepts or words. Such associational blends, coupled with the person’s ability to derive a fused representation of separate semantic knowledges, may play a determining role in the excellence of college students in judging the adequacy of their own paraphrases (Fillenbaum, 1974b). In any event, with passages of prose containing many ideational units, the associational blends sometimes become even more composite, until it is perhaps appropriate to speak of schemas (Bartlett, 1932), surrogate structures (Pompi & Lachman, 1967), themes (Dooling & Lachman, 1971), or conceptual macro-structures (Bower, 1974). If schemas do mediate the recall of prose, research ought to be able to discover the basis of such mediation. The foggy notion of schema, unfortunately, has not been operationalized empirically (Oldfield, 1972), but ingenious experimenters should be capable of closing this empirical gap. Perhaps schemas can be inferred from various regularities in recall. Recall patternings in prose, for example, appear analogous to the clusterings observed in categorized lists (Bousfield, 1953), and such clusterings appear to represent the influence of organizational processes during learning (Mandler, 1967). If schemas do determine clustering in the recall of prose, it should be possible to infer the schemas from clustering. 
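As a rough illustration of how clustering in a recall protocol might be quantified before any inference about schemas is attempted, the sketch below computes a simple adjacent-repetition ratio: the proportion of adjacent recalled units that fall in the same category. The function name, the category assignments, and the two recall orders are invented for illustration; with prose, the "categories" would have to be hypothesized thematic groupings of idea units, which is itself an assumption.

```python
def repetition_ratio(recall_order, category_of):
    """Proportion of adjacent recalled units that share a category.

    recall_order: list of recalled units in the order produced.
    category_of:  dict mapping each unit to a (hypothesized) category.
    """
    if len(recall_order) < 2:
        return 0.0
    adjacent_pairs = zip(recall_order, recall_order[1:])
    repetitions = sum(category_of[a] == category_of[b] for a, b in adjacent_pairs)
    return repetitions / (len(recall_order) - 1)


# Hypothetical units and categories, for illustration only.
category_of = {
    "dog": "animal", "cat": "animal", "horse": "animal",
    "apple": "fruit", "pear": "fruit", "plum": "fruit",
}
clustered = ["dog", "cat", "horse", "apple", "pear", "plum"]
scattered = ["dog", "apple", "cat", "pear", "horse", "plum"]

print(repetition_ratio(clustered, category_of))  # 4 of 5 adjacent pairs match
print(repetition_ratio(scattered, category_of))  # no adjacent pairs match
```

A high ratio for one candidate grouping of idea units, relative to alternative groupings, would be one piece of evidence that the grouping corresponds to an organizing structure in recall.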
Similarly, patternings of errors also might signal the existence of schemas in remembering (Bartlett, 1932). Learners under the influence of preexisting schematic knowledge, for example, made recognitive errors which were thematically congruent with their schema (Sulin & Dooling, 1974). Schemas also might be inferred from the effectiveness of linguistic segments in inducing the recall of other linguistic segments. The use of cuing to assess memory structures in prose learning has its counterpart in the cuing techniques used to induce the remembering of list members (e.g., Mandler, 1967; Tulving & Pearlstone, 1966; Slamecka, 1968). Just as with previous research, it may be assumed that the cuing taps existing superordinate categories that have not been remembered. With the development of such methods for measuring schemas, insight may be gained into the manner in which referential meanings are translated into recall.
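The "minimal contrast rule" attributed to Clark (1970) earlier in this section lends itself to a small sketch. The code below represents a word as an ordered list of signed features, reverses the sign of the last feature, and looks up words whose feature list matches the altered list. The feature list for man follows the example in the text, with the contextual "+Det" entry omitted for simplicity; the entries for woman, boy, and girl, and the function name, are assumptions added only to make the lookup possible.

```python
# Each word is an ordered tuple of (feature, sign) pairs; True stands for "+".
# Only "man" follows the text; the other entries are hypothetical.
LEXICON = {
    "man":   (("Noun", True), ("Count", True), ("Animate", True),
              ("Human", True), ("Adult", True), ("Male", True)),
    "woman": (("Noun", True), ("Count", True), ("Animate", True),
              ("Human", True), ("Adult", True), ("Male", False)),
    "boy":   (("Noun", True), ("Count", True), ("Animate", True),
              ("Human", True), ("Adult", False), ("Male", True)),
    "girl":  (("Noun", True), ("Count", True), ("Animate", True),
              ("Human", True), ("Adult", False), ("Male", False)),
}


def minimal_contrast(word, lexicon=LEXICON):
    """Apply 'change the sign of the last feature' and return matching words."""
    features = lexicon[word]
    last_name, last_sign = features[-1]
    altered = features[:-1] + ((last_name, not last_sign),)
    return [w for w, f in lexicon.items() if f == altered]


def shared_features(word_a, word_b, lexicon=LEXICON):
    """Count the features on which two words carry the same sign."""
    return sum(fa == fb for fa, fb in zip(lexicon[word_a], lexicon[word_b]))


print(minimal_contrast("man"))          # ['woman']
print(shared_features("man", "woman"))  # 5 of 6 features in common
print(shared_features("man", "girl"))   # 4 of 6 features in common
```

In this toy lexicon the response produced by the rule shares more features with the stimulus than any other non-identical entry, which is the sense in which the text calls the contrast "minimal."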
Empirical Issues
The adoption of a referential theory of meaning bespeaks the importance of relating learning to the organizational structure of existing referential knowledges. The organization and availability of referential associations, for
example, can be influenced by the input order of a sequence of sentences (Anderson, J., & Hastie, 1974). Furthermore, even subtle differences in the focus of a sentence can arouse different associational referents. The sentence “It was Mr. Smith who ordered the coffee,” for example, presupposes that coffee was ordered by someone, whereas “It was the coffee that Mr. Smith ordered” presupposes that Mr. Smith ordered something. When recognitive paraphrases of such sentences violated presuppositional knowledges, changes in wording were more easily detected than when the alternative phrasings did not violate presuppositions (Offir, 1973). Referential emphasis within a paragraph appears to be another factor influencing the associative representation of embedded sentences (Perfetti & Goldman, 1974). For a sentence such as The serfs rebelled against the baron, the extent to which the paragraph focused on either the subject or object of the sentence was related to the effectiveness of that subject or object as a retrieval cue in remembering the remaining portion of the sentence. Additional evidence for the importance of referential availability may be found in the results of Haviland and Clark (1974) and Moeser and Bregman (1972). In the latter study, when learners attempted to acquire an artificial phrase-structure language without the aid of semantic referents, there was practically no learning of the syntactic rules even after hundreds of trials. With the referential availability of geometrical forms that portrayed the syntactic relations, the learners readily learned the grammatical rules. An important area for empirical investigation is the determination of variables influencing the availability of referential knowledges. One determinant may be the structural organization of the learning material (Anderson, J., & Hastie, 1974).
Material that possesses a logical or hierarchical structure may facilitate the arousal of subsuming associations (Ausubel, 1963; de Villiers, 1974). Meaningfulness also may be fostered by the adoption of a set to learn meaningfully rather than by rote (Ausubel, 1963). In turn, the major consequence of adopting a meaningful set may be the arousal of referential associations relating to the material to be learned. Similarly, gaining access to appropriate referential associations can be accomplished by redirection of the learner’s set (e.g., Luchins, 1942). The quality and organization of aroused referents also appear important in learning. Sentences containing pronouns as the subject are remembered better than sentences with nouns as the subject, even though the pronouns themselves are not remembered better (Martin & Walter, 1969). In contrast, the denotative specificity of nouns in prose, as gauged by superordinate-subordinate status, is positively related to remembering, even when the nouns are equivalent in concreteness-abstractness (August, Proctor, Hynes, & Johnson, Note 1). Similarly, memory is better for sentences having specific verbs than for sentences having general verbs (Thios, 1975). Increases in denotative specificity via the restriction of a noun modifier, however, do not influence remembering (August et al., Note 1). Are such differences in
remembering due to differences in the parceling of referential associations? The partitioning of referents for superordinates and subordinates, and for verbs, occurs along the referential boundaries of existing concepts, whereas the denotative restriction enjoined by an adjectival modifier is an arbitrary parceling of a noun’s referential class. As an alternate hypothesis, the effects of adjectival modification might be related to the extent to which the parceling induces the retrieval of concrete associates to the noun (Anderson, 1974). What are the dimensional or functional attributes of referents that influence learning? Are referents more easily aroused to specific categories such as diamond than to general categories such as gem? Or, as suggested by Brown (1958a) and by Loftus and Bolton (1974), perhaps the retrievability of referents is partially determined by usage habits. Do subordinate nouns evoke referential associations that are qualitatively different from the referents of superordinate nouns? For example, are referents to subordinate nouns more likely to be concrete? Are the meanings of general words stored in the format of a specific exemplar (Anderson & McGaw, 1973)? If so, why are errors in recall more likely to be memorial changes from specificity to generality than from generality to specificity (August et al., Note 1)? Is there a quantitative difference in the number of referents evoked by superordinates and subordinates (Smith, Shoben, & Rips, 1974)? Does the storage node for a word contain only referential distinguishers, and not the referential attributes common to the superordinate of the word (Collins & Quillian, 1969)? The answers to questions like these will add to our knowledge regarding the processes by which meaningfulness influences learning and retention. 
Studies of the component dimensions of meaningfulness could test Paivio’s (1971) conclusion that meaningfulness is important only when a particular sequential ordering is required in recall. Under other conditions, says Paivio, imagery is a more important predictor than meaningfulness. As evidence, Paivio cites a widely quoted study by Paivio, Smythe, and Yuille (1968) in which imagery influenced learning even when differences in meaningfulness were equated. When meaningfulness was varied and imagery was constant, meaningfulness exerted no additional effect on learning. An examination of the words used in the Paivio et al. study, however, suggests the possibility of bias in the selectional procedure. As assessed by the production method, the mean number of associates to the high-m list averaged only two more than the low-m list. The high-m list was designed to be high in meaningfulness and low in imagery, and included words such as abode, molecule, theologian, and whalebone. Suppose, instead, that the high-m list was composed of words such as answer, cost, idea, law, and duty, and the low-m list was composed of words such as labyrinth, rosin, and edifice. In such a comparison, the outcome might be different. Paivio’s (1971) dual coding hypothesis has received considerable experimental support, but Goldfarb, Wirtz, and Anisfeld’s (1973) evidence suggests that all verbal material is coded for referential meaning, and that
differences in recognitive memory for abstract and concrete phrases are due to differences in denotative distinctiveness rather than imagibility. Since denotative distinctiveness was operationally defined by judgments of personal relevancy, the generality of the Goldfarb et al. conclusion is uncertain. Paivio and Olver (1964), however, found that stimulus imagery did not influence the learning of paired associates when specificity was held constant, whereas stimulus specificity was significantly correlated with recall (r = .41) even when imagery was held constant. Imagery and denotative specificity also show some independence in factor analyses (Paivio, 1968; Spreen & Schulz, 1966). Coupled with the conflicting research evidence on the longevity of learning mediated by imagery (Begg & Robertson, 1973; Postman & Burns, 1973), research clearly is still needed on the relationships among imagery, meaningfulness, denotative specificity, and learning. Research also is needed to determine the influence of multiple referents on learning. Words such as tripod have quite limited sets of referential associations. Others, e.g., triangle, have extensive associations. For some concepts with multiple referents, such as scare, the referential associations are all related conceptually, whereas the multiple referents of words such as light are related to different denotative meanings. Based upon studies of verbal learning and retention, it might be predicted that ambiguous words and phrases would be more susceptible to negative transfer and interference. Complicating the prediction, however, is the fact that frequently occurring nouns, as measured in the Thorndike-Lorge (1944) count, are more likely to have a greater number of meanings (Saltz & Modigliani, 1967). Even with Thorndike-Lorge frequencies controlled, however, Saltz and Modigliani found superior learning of paired associates when the response terms were nouns having a greater number of meanings. 
Contrary to expectations, Saltz and Modigliani (1967) found that the number of meanings was virtually unrelated to Noble’s (1952) production measure of meaningfulness. If associative production is unrelated to the number of meanings, what associations are being tapped? Saltz (1971) suggests that the associations given to a stimulus word tend to exhaust a single meaning, and that words with high-m values differ from low-m words in the richness of their connotative meanings. Other explanations are possible, and it is clear that analytical investigations are needed to determine the relationships between the production of associations, the number and types of meanings associated with a linguistic unit, and learning. Empirical studies also could delineate the conditions under which separate knowledges become fused. Unification tends to be enhanced by a correct temporal sequencing of events (Clark & Clark, 1968), perceived cause and effect relationships (Fillenbaum, 1971), pronominalization (Lesgold, 1972), and the use of the definite article (de Villiers, 1974). When a series of sentences was perceived as a unified story, rather than an unrelated set, de Villiers’ (1974) learners recalled more sentences, recalled the sentences more often in
their story order, and more often showed gist recall. Furthermore, ratings of thematic centrality were directly related to sentential recall, whereas ratings of imagery were unrelated to remembering. In contrast, when not viewed as a story, centrality ratings were unrelated to recall, and imagery ratings were directly related. The associative relatedness of the input units thus appears critical in determining semantic fusion. As further evidence, when semantically related sentences are presented, learners cannot later discriminate the input sentences from distractor sentences containing semantically compatible content (Bransford & Franks, 1971; Franks & Bransford, 1974; Peterson & McIntyre, 1973). If presented with a lengthy series of semantically unrelated sentences, however, learners are quite accurate in discriminating old from new sentences (Shepard, 1967). A related empirical problem is that of understanding memory for gist. Semantic changes in recognitive foils are detected much more readily than syntactical changes (Sachs, 1967; Begg & Wickelgren, 1974), and the verbatim recall of prose is a rarity (Bartlett, 1932; Johnson, 1974). The learner’s remembering of gist displays itself through the recall of verbal equivalences, the selective remembering of important content (Johnson, 1970), and the occurrence of meaning-preserving errors (Fillenbaum, 1966). Since judgments about the equivalence of meaning involve some subjectivity, experimenters have tended to study verbatim memory and to avoid studies of gist. As demonstrated by Fillenbaum (1966), however, gist can be studied objectively, and there is critical need for describing and understanding the transformational changes that occur from the original input of sentences to the display of gist. Probes also are needed to ascertain the relationships between learning and the organizational complexity of the referential associations.
Perhaps the major characteristic differentiating abstract from concrete units is the complexity of the referent package. Verbal units that are concrete, such as chair, have referential attributes that are organized conjunctively. To be a chair, an object must have a base, a seat, and a back. The referential dimensions of a concrete unit ordinarily can be specifically denoted, and a potential instance or example of the category can be identified by noting the co-occurrences of the criterial attributes. Such co-occurrences, e.g., size and weight, are so regular that children often have difficulty in disentangling the attributes on occasions in which the attributes are not correlative (Ervin & Foster, 1960; Piaget, 1947/1960). The defining attributes of abstract categories are less obvious or distinct and are more highly interrelated with other concepts (Goldfarb et al., 1973). As shown by Carroll’s (1964b) analyses of immigrant and tort, abstract concepts are more likely to have complex referential systems requiring knowledges of relationships and disjunctive combinations. From a research viewpoint, the descriptive classification of concepts needs to progress beyond Bruner, Goodnow, and Austin’s (1956) categories of conjunctive, disjunctive, and relational.
Empirical information also is needed on the referential attributes that are salient to the learner. Deese (1973), for example, notes the pervasiveness of spatial information in our cognitive and affective categories. Apparently, then, spatiality is salient to the learner, and spatial information has a high probability of becoming incorporated into referential structures. In identifying other salient attributes, Rips, Shoben, and Smith (1973) had raters judge how typical each word (e.g., chicken) was of its superordinate category (e.g., birds). A multidimensional scaling procedure was then applied to the ratings to suggest salient dimensions (e.g., size and predaciousness). Using another method of identifying salient attributes, Bruner, Olver, and Greenfield (1966) had children judge how objects were alike and different. Developmental changes were found in the various modes by which equivalence judgments were made. Similarly, a mapping of salient semantic attributes might be obtained through the use of a set of structured questions to judges. Scattered throughout the developmental literature, there are studies reporting developmental changes in children’s understanding of different tasks and instructions (e.g., Luria, 1961; Piaget, 1947/1960; Piaget & Inhelder, 1968/1973). Data also exist on the ages at which particular words are acquired, developmental changes in word associations (Entwisle, 1966), and systematic progressions in the usage of different syntactical structures (Fodor, et al., 1974; McNeill, 1970; Menyuk, 1969). Surprisingly little research, however, has focused on semantic development (Anglin, 1970; McNeill, 1970; Palermo & Molfese, 1972). The discovery of developmental regularities in the acquisition of meanings could provide insights into both the cognitive functioning of children (Barclay & Reid, 1974) and also the structural representation of meaning in adults. 
For many of the research questions raised in the present review, counterpart questions exist regarding developmental regularities in semantic development. Although theoretical statements on the developmental acquisitions of meanings are virtually nonexistent, recent speculations by E. V. Clark (1973) and by Nelson (1974) appear to have ended the drought. Recent theorizing on the acquisition of semantic meanings in adults, to be discussed in the next section, also may provide impetus for comparable theorizing on developmental regularities in the acquisition and use of referential knowledges.
Associative Network Models
The conceptual emphasis of the present review has much in congruence with the recent spate of associative network models of semantic memory (Anderson & Bower, 1973; Collins & Quillian, 1972; Kintsch, 1972; Quillian, 1968; and Rumelhart, Lindsay, & Norman, 1972). Quillian’s (1968) view, for example, is that the full meaning of a concept consists of all the memory nodes that can be reached from the concept via an exhaustive tracing process. In these network
models, the basic unit of analysis typically is the “proposition,” consisting of a “relation” (usually verbs, adjectives, conjunctions) and one or more “arguments” (usually a noun or other proposition). Within a proposition, the semantic destiny of a lexical item is partially determined by case-grammar rules regarding acceptable parsings. If the word has been encountered previously, the existing storage node is used. Otherwise, a new node is formed automatically. Meanings of lexical items thus become defined through cumulative entries of propositional statements containing the unit. After entry into memory, words also can gain new meanings through the operation of various inferential and transformational rules. Although the network models are couched in the familiar jargon of associationism, such speculations represent new vistas for psychologists and educators. In the empirical testing of the network models, one important consequence may be a shift in the dominant learning paradigm from serial lists and paired associates to studies of prose. With respect to theories of meaning, the network theories may become battlefields that will provide insights into the role of meaningfulness in learning. Since Anderson and Bower’s (1973) associative model (HAM) is the most explicit formulation and has received considerable support, it is appropriate to sample some of their assumptions that deserve empirical testing. Most basic, perhaps, is Anderson and Bower’s (1973) assertion that propositional organization is required for the formation of associations. Although Rohwer (1966) has demonstrated that propositional formats often aid learning, words ostensibly can be associated without benefit of verb or other propositional connectives. Furthermore, even if propositional structure is required, the structural components of such propositions are not obvious. 
Anderson and Bower postulate the existence of a context subtree containing location and temporal information, and also a fact subtree representing a topic, predications about the topic, and adverbial or adjectival modifiers of the predications. The resultant associative structures may be represented graphically by sentence diagrams similar to the tree-like parsings rendered by grammarians and generations of schoolboys. Figure 1, for example, shows Anderson and Bower’s (1973, p. 160) associative representation of “During the night in the park the hippie touched the debutante.” Note that other associative paths might be assumed, but Anderson and Bower (1973, p. 167) allow representation only of ideational combinations
[Figure 1: Anderson & Bower’s (1973) associative representation of “During the night in the park the hippie touched the debutante.” Tree diagram not reproduced; the recoverable node labels are C, F, L (Park), T (Night), S (Hippie), P, R (Touch), and O (Debutante).]
that represent direct predications. Might not a direct object (e.g., debutante), however, be directly associated with a location (e.g., park)? In what sense, psychologically, is location information (e.g., park) more closely associated with the subject (e.g., hippie) than with the object (e.g., debutante)? Is the verb concept (e.g., touch) semantically closer to the object (e.g., debutante) than to the subject (e.g., hippie)? Might the associational structure of a proposition be related to the syntactical structure in which the proposition is phrased? Anderson and Bower (pp. 295–314) examined the validity of their structural hierarchy by providing cues for stimulating the recall of previously unrecalled components. Their predictions of directional differences in recall received only marginal support from the data. Overall, then, the validity of HAM’s associational structure is not obvious, and reaction-time probes and other techniques might provide evidence of different associational structures. Anderson and Bower (1973, p. 284) assume that all associative links within a proposition are actually transformed into long-term memory associations. On the initial input, for example, the sentence in Figure 1 becomes parsed into 13 associational linkages that are equivalent in strength. With such an assumption, unfortunately, the problem of meaningfulness is skirted, and prior associational memories are assigned no role in determining whether an association will be formed. Differential recall of propositional components does occur (Anderson & Bower, 1973, pp. 295–329), and this fact suggests differential encodings during input. A successful model of associative memory needs to account for this fact, as well as evidence showing that the encoding and recall of linguistic units are related to dimensional characteristics such as abstractness-concreteness, semantic importance, and interest (Gomulicki, 1956; Johnson, 1974). 
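The structural questions raised above can be made concrete with a small sketch. The code below encodes one reading of the tree in Figure 1 as a list of labeled links: eight structural links (C, F, L, T, S, P, R, O) plus five links joining the terminal nodes to the concepts park, night, hippie, touch, and debutante. This decomposition is an assumption inferred from the figure labels; it is offered only because it yields 13 links, the number cited in the text. The node names are hypothetical, and treating the number of links on a path as a proxy for associative closeness is likewise an assumption of the sketch, not a claim made by Anderson and Bower.

```python
from collections import deque

# (parent, link label, child); names are illustrative, not HAM's own notation.
LINKS = [
    ("proposition", "C", "context"),
    ("proposition", "F", "fact"),
    ("context", "L", "location"),
    ("context", "T", "time"),
    ("fact", "S", "subject"),
    ("fact", "P", "predicate"),
    ("predicate", "R", "relation"),
    ("predicate", "O", "object"),
    ("location", "concept", "park"),
    ("time", "concept", "night"),
    ("subject", "concept", "hippie"),
    ("relation", "concept", "touch"),
    ("object", "concept", "debutante"),
]


def build_graph(links):
    """Return an undirected adjacency map so paths can run in either direction."""
    graph = {}
    for parent, _label, child in links:
        graph.setdefault(parent, set()).add(child)
        graph.setdefault(child, set()).add(parent)
    return graph


def path_length(graph, start, goal):
    """Breadth-first search for the number of links on the shortest path."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None  # the two nodes are not connected


GRAPH = build_graph(LINKS)
print(len(LINKS))                                   # 13 links in total
print(path_length(GRAPH, "hippie", "park"))         # 6
print(path_length(GRAPH, "debutante", "park"))      # 7
print(path_length(GRAPH, "touch", "debutante"))     # 4
print(path_length(GRAPH, "touch", "hippie"))        # 5
```

Under this reading, the location is one link closer to the subject than to the object, and the verb is one link closer to the object than to the subject. Whether such differences in path length correspond to any psychological difference is precisely the empirical question posed in the text.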
Another challenge to researchers and theorists is to develop a memory system sensitive to the fact that associative representations depend on learners’ encoding strategies. The accuracy of remembering semantic content depends, in part, upon the learner’s set for accurate remembering (Brockway, Chmielewski, & Cofer, 1974). Similarly, with certain types of encoding strategies, the learner shows a long-term remembering of the grammatical voice of input sentences (Anderson & Bower, 1973, pp. 224–228). An adequate theory of semantic memory also needs to account for persistent encoding biases such as preferences for recoding linguistic units into abbreviated forms, and for the remembering of gist rather than surface detail. Additional structural components also may be required to adequately represent the full array of semantic information. Function words, for example, are given only token representation in HAM (pp. 139, 206), but our earlier analysis argued that each word carries elements of meaning. Furthermore, if a separate component is needed for storing temporal information, why is there not comparable representation for spatial information? As another example, informational predications regarding the meaning of verbs occur only infrequently in everyday discourse, and the operation of HAM appears
Curriculum, Instruction and Learning
to offer limited opportunity for establishing equivalences in the meanings of verbs (Anderson & Bower, 1973, pp. 193–196). Yet, contrary to what might be predicted from HAM, subsets of verbs, such as verbs of motion, appear to be organized in meaning by rather complex structures of shared semantic components (Miller, 1972). Finally, HAM appears deficient in representing semantic information regarding the structural importance of the various propositions within a message. Informational input in HAM is not categorized according to importance or saliency, and all propositional inputs have equal representation. According to Anderson and Bower (1973, pp. 383–386), the saliency of propositions is determined by input frequency and recency, but there is evidence that structural importance is related to remembering even when input frequency is equivalent for all propositions (Johnson, 1970). On this issue, and on other issues, theorists and researchers have ample opportunity for challenging Anderson and Bower’s conception of associative meaning.
Concluding Comment
The present designation of meaningfulness as an area needing research is predicated on the assumption that meaningfulness is a critical variable in learning. Empirical support for this assumption is evident in a study relating the recall of textual prose to meaningfulness (Johnson, 1973). Textual subunits rated in the highest level of meaningfulness were recalled approximately three to eighteen times better than subunits ranked in the lowest level of meaningfulness. Further evidence of the importance of meaningfulness may be found in paraphrasing studies (Fillenbaum, 1974a, 1974b) in which subjects were explicitly instructed “not to improve the sentences or make them more sensible, but to paraphrase them, rewording each in a way that captures its meaning as accurately as possible.” However, when the semantic content violated the paraphrasers’ existing knowledge, they nevertheless paraphrased the content more meaningfully even though they were aware of differences between the original content and their own paraphrases. In the words of Fillenbaum (1974b), “even in the peculiar circumstances of the psychological laboratory Ss seem to be acting on the basic assumption that what is described in discourse will be sensible, that what is described will conform to the customary order of events and will satisfy normal qualitative and causal relations between events or actions” (p. 577). As Bartlett (1932) said in his classical work on remembering, the person’s learning may be characterized as “an effort after meaning.” To quote Bartlett (1932), “there is a constant effort to get the maximum possible of meaning into the material presented” (p. 84). How may this be accomplished? Quoting Bartlett again, “such effort is simply the attempt to connect something that is given with something other than itself” (p. 227). Thus, when learning occurs, the learner inevitably attaches the new experience to the residual of previous experiences.
In turn, the residual of past experiences, organized into schemas,
determines the quality of remembering. Bartlett’s (1932) research, as well as that of others (e.g., Campbell, 1958; Johnson, 1962; Paul, 1959), has provided convincing evidence that qualitative distortions in remembering are related to the individual’s cognitive structure. Furthermore, whether a particular verbal unit is remembered or not remembered is also determined by the organized residual of the learner’s past experiences (Gomulicki, 1956; Johnson, 1974; Zangwill, 1972). It has been more than 40 years since the publication of Bartlett’s (1932) work. Since that time, there have been methodological advances which allow for a renewed attack on the problem of meaning. The time does seem ripe for theoretical and empirical reexaminations of the role of meaning in complex learning.
Note
1. Dixon’s (1965, pp. 23–104) historical account of western thoughts about language, from Plato to Chomsky, documents the ever-present attempts to grapple with the concept of meaning. A review of earlier experimental attempts to assess meaning may be found in Creelman (1966). Modern philosophers and linguists continue to write copiously and polemically on the topic of meaning, and reference to this literature may be initiated through Alston (1964), Lehrer and Lehrer (1970), and Lyons (1968).
Reference Note
1. August, G. J., Proctor, D. L., Hynes, K. P., & Johnson, R. E. Recall of prose as a function of denotative specificity. Paper presented at the meeting of the American Psychological Association, New Orleans, September 1974.
References
Alston, W. P. Philosophy of language. Englewood Cliffs, N.J.: Prentice-Hall, 1964.
Anderson, J., & Hastie, R. Individuation and reference in memory: Proper names and definite descriptions. Cognitive Psychology, 1974, 6, 495–514.
Anderson, J. R., & Bower, G. H. Human associative memory. Washington, D.C.: V. H. Winston, 1973.
Anderson, R. C. Control of student mediating processes during verbal learning and instruction. Review of Educational Research, 1970, 40, 349–369.
Anderson, R. C. Concretization and sentence learning. Journal of Educational Psychology, 1974, 66, 179–183.
Anderson, R. C., Goldberg, S. R., & Hidde, J. L. Meaningful processing of sentences. Journal of Educational Psychology, 1971, 62, 395–399.
Anderson, R. C., & Kulhavy, R. W. Learning concepts from definitions. American Educational Research Journal, 1972, 9, 385–390.
Anderson, R. C., & McGaw, B. On the representation of meanings of general terms. Journal of Experimental Psychology, 1973, 101, 301–306.
Anglin, J. M. The growth of word meaning. Cambridge, Mass.: M.I.T. Press, 1970.
Anisfeld, M. False recognition of adjective-noun phrases. Journal of Experimental Psychology, 1970, 86, 120–122.
Anisfeld, M., & Klenbort, I. On the functions of structural paraphrase: The view from the passive voice. Psychological Bulletin, 1973, 79, 117–126.
Archer, E. J. A re-evaluation of meaningfulness of all possible CVC trigrams. Psychological Monographs, 1960, 74 (10, Whole No. 497).
Atkinson, R. C., & Shiffrin, R. M. Human memory: A proposed system and its control processes. In K. W. Spence & J. T. Spence (Eds.), The psychology of learning and motivation: Advances in research and theory (Vol. 2). New York: Academic Press, 1968.
Ausubel, D. P. The psychology of meaningful verbal learning. New York: Grune & Stratton, 1963.
Barclay, J. R. The role of comprehension in remembering sentences. Cognitive Psychology, 1973, 4, 229–254.
Barclay, J. R., Bransford, J. D., Franks, J. J., McCarrell, N. S., & Nitsch, K. Comprehension and semantic flexibility. Journal of Verbal Learning and Verbal Behavior, 1974, 13, 471–481.
Barclay, J. R., & Reid, M. Semantic integration in children’s recall of discourse. Developmental Psychology, 1974, 10, 277–281.
Bartlett, F. C. Remembering. London: Cambridge University Press, 1932.
Begg, I., & Robertson, R. Imagery and long-term retention. Journal of Verbal Learning and Verbal Behavior, 1973, 12, 689–700.
Begg, I., & Wickelgren, W. A. Retention functions for syntactic and lexical vs. semantic information in sentence recognition memory. Memory and Cognition, 1974, 2, 353–359.
Benjafield, J., & Giesbrecht, L. Context effects and the recall of comparative sentences. Memory and Cognition, 1973, 1, 133–136.
Black, M. The labyrinth of language. New York: Praeger, 1968.
Blount, H. P., & Johnson, R. E. Grammatical structure and the recall of sentences in prose. American Educational Research Journal, 1973, 10, 163–168.
Bobrow, S. A., & Bower, G. H. Comprehension and recall of sentences. Journal of Experimental Psychology, 1969, 80, 455–461.
Bock, J. K., & Brewer, W. F. Reconstructive recall in sentences with alternative surface structures. Journal of Experimental Psychology, 1974, 103, 837–843.
Bormuth, J. R. Cloze tests as a measure of ability to detect literary style. International Reading Association Proceedings, 1965, 287–290.
Bousfield, W. A. The occurrence of clustering in the recall of randomly arranged associates. Journal of General Psychology, 1953, 49, 229–240.
Bousfield, W. A. The problem of meaning in verbal learning. In C. N. Cofer & B. S. Musgrave (Eds.), Verbal learning and verbal behavior. New York: McGraw-Hill, 1961.
Bower, G. H. Selective facilitation and interference in retention of prose. Journal of Educational Psychology, 1974, 66, 1–8.
Bower, G. H., & Clark, M. C. Narrative stories as mediators for serial learning. Psychonomic Science, 1969, 14, 181–182.
Bransford, J. D., Barclay, J. R., & Franks, J. J. Sentence memory: A constructive versus interpretative approach. Cognitive Psychology, 1972, 3, 193–209.
Bransford, J. D., & Franks, J. J. The abstraction of linguistic ideas. Cognitive Psychology, 1971, 2, 331–350.
Bransford, J. D., & Johnson, M. K. Contextual prerequisites for understanding: Some investigations of comprehension and recall. Journal of Verbal Learning and Verbal Behavior, 1972, 11, 717–726.
Brewer, W. F., & Harris, R. J. Memory for deictic elements in sentences. Journal of Verbal Learning and Verbal Behavior, 1974, 13, 321–327.
Brewer, W. F., & Lichtenstein, E. H. Memory for marked semantic features versus memory for meaning. Journal of Verbal Learning and Verbal Behavior, 1974, 13, 172–180.
Brockway, J., Chmielewski, D., & Cofer, C. N. Remembering prose: Productivity and accuracy constraints in recognition memory. Journal of Verbal Learning and Verbal Behavior, 1974, 13, 194–208.
Brown, R. How shall a thing be called? Psychological Review, 1958, 65, 14–21. (a)
Brown, R. Words and things. New York: Free Press, 1958. (b)
Bruner, J. S., Goodnow, J. J., & Austin, G. A. A study of thinking. New York: Wiley, 1956.
Bruner, J. S., Olver, R. R., & Greenfield, P. M. Studies in cognitive growth. New York: Wiley, 1966.
Campbell, D. T. Systematic error on the part of human links in communication systems. Information and Control, 1958, 1, 334–369.
Carpenter, P. A. On the comprehension, storage, and retrieval of comparative sentences. Journal of Verbal Learning and Verbal Behavior, 1974, 13, 401–411.
Carroll, J. B. Language and thought. Englewood Cliffs, N.J.: Prentice-Hall, 1964. (a)
Carroll, J. B. Words, meanings, and concepts. Harvard Educational Review, 1964, 34, 178–202. (b)
Carroll, J. B. Comprehension by 3rd, 6th, and 9th graders of words having multiple grammatical functions (Final Report, Project No. 0-0439, Grant No. OEG-2-9-400439-1059, U.S. Office of Education). Princeton, N.J.: Educational Testing Service, 1970. (ERIC Document Reproduction Service No. ED 048 311)
Carroll, J. B. Defining language comprehension: Some speculations. In J. B. Carroll & R. O. Freedle (Eds.), Language comprehension and the acquisition of knowledge. Washington, D.C.: V. H. Winston, 1972.
Chomsky, N. Aspects of the theory of syntax. Cambridge, Mass.: M.I.T. Press, 1965.
Church, J. Language and the discovery of reality. New York: Random House, 1961.
Clark, E. V. What’s in a word? On the child’s acquisition of semantics in his first language. In T. E. Moore (Ed.), Cognitive development and the acquisition of language. New York: Academic Press, 1973.
Clark, H. H. The prediction of recall patterns in simple active sentences. Journal of Verbal Learning and Verbal Behavior, 1966, 5, 99–106.
Clark, H. H. Word associations and linguistic theory. In J. Lyons (Ed.), New horizons in linguistics. Baltimore: Penguin, 1970.
Clark, H. H., & Card, S. K. Role of semantics in remembering comparative sentences. Journal of Experimental Psychology, 1969, 82, 545–553.
Clark, H. H., & Clark, E. V. Semantic distinctions and memory for complex sentences. Quarterly Journal of Experimental Psychology, 1968, 20, 129–138.
Cofer, C. N. Free recall of nouns after presentation in sentences. Journal of Experimental Psychology, 1968, 78, 145–152.
Cofer, C. N. Constructive processes in memory. American Scientist, 1973, 61, 537–543.
Coleman, E. B. Developing a technology of written instruction: Some determiners of the complexity of prose. In E. Z. Rothkopf & P. E. Johnson (Eds.), Verbal learning research and the technology of written instruction. New York: Teachers College Press, 1971.
Collins, A. M., & Quillian, M. R. Retrieval time from semantic memory. Journal of Verbal Learning and Verbal Behavior, 1969, 8, 240–247.
Collins, A. M., & Quillian, M. R. How to make a language user. In E. Tulving & W. Donaldson (Eds.), Organization of memory. New York: Academic Press, 1972.
Craik, F. I. M. A “levels of analysis” view of memory. In P. Pliner, L. Krames, & T. Alloway (Eds.), Communication and affect. New York: Academic Press, 1973.
Craik, F. I. M., & Lockhart, R. S. Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior, 1972, 11, 671–684.
Craik, F. I. M., & Watkins, M. J. The role of rehearsal in short-term memory. Journal of Verbal Learning and Verbal Behavior, 1973, 12, 599–607.
Creelman, M. B. The experimental investigation of meaning. New York: Springer, 1966.
Deese, J. Influence of inter-item associative strength upon immediate free recall. Psychological Reports, 1959, 5, 305–312.
Deese, J. The structure of associations in language and thought. Baltimore: Johns Hopkins Press, 1965.
Deese, J. Cognitive structure and affect in language. In P. Pliner, L. Krames, & T. Alloway (Eds.), Communication and affect. New York: Academic Press, 1973.
de Villiers, P. A. Imagery and theme in recall of connected discourse. Journal of Experimental Psychology, 1974, 103, 263–268.
Dixon, R. M. W. What is language? A new approach to linguistic description. London: Longmans, Green & Co., 1965.
Dooling, D. J., & Lachman, R. Effects of comprehension on retention of prose. Journal of Experimental Psychology, 1971, 88, 216–222.
Dooling, D. J., & Mullet, R. L. Locus of thematic effects in retention of prose. Journal of Experimental Psychology, 1973, 97, 404–406.
English, H. B., Welborn, E. L., & Killian, C. D. Studies in substance memorization. Journal of General Psychology, 1934, 11, 233–260.
Entwisle, D. R. The word associations of young children. Baltimore: Johns Hopkins Press, 1966.
Ervin, S. M., & Foster, G. The development of meaning in children’s descriptive terms. Journal of Abnormal and Social Psychology, 1960, 61, 271–275.
Fillenbaum, S. Memory for gist: Some relevant variables. Language and Speech, 1966, 9, 217–227.
Fillenbaum, S. On coping with ordered and unordered conjunctive sentences. Journal of Experimental Psychology, 1971, 87, 93–98.
Fillenbaum, S. Or: Some uses. Journal of Experimental Psychology, 1974, 103, 913–921. (a)
Fillenbaum, S. Pragmatic normalization: Further results for some conjunctive and disjunctive sentences. Journal of Experimental Psychology, 1974, 102, 574–578. (b)
Fillenbaum, S., & Rapoport, A. Structures in the subjective lexicon. New York: Academic Press, 1971.
Flesch, R. F. A new readability yardstick. Journal of Applied Psychology, 1948, 32, 221–233.
Fodor, J. A. A review of Language and thought, by J. B. Carroll. The Modern Language Journal, 1965, 49, 384–386.
Fodor, J. A., Bever, T. G., & Garrett, M. F. The psychology of language: An introduction to psycholinguistics and generative grammar. New York: McGraw-Hill, 1974.
Franks, J. J., & Bransford, J. D. The acquisition of abstract ideas. Journal of Verbal Learning and Verbal Behavior, 1972, 11, 311–315.
Franks, J. J., & Bransford, J. D. Memory for syntactic form as a function of semantic context. Journal of Experimental Psychology, 1974, 103, 1037–1039.
Frase, L. T. Boundary conditions for mathemagenic behaviors. Review of Educational Research, 1970, 40, 337–347.
Fries, C. C. Meaning and linguistic analysis. Language, 1954, 30, 57–68.
Gardiner, J. M. Levels of processing in word recognition and subsequent free recall. Journal of Experimental Psychology, 1974, 102, 101–105.
Glanzer, M. Grammatical category: A rote learning and word association analysis. Journal of Verbal Learning and Verbal Behavior, 1962, 1, 31–41.
Glaze, J. A. The association value of nonsense syllables. Journal of Genetic Psychology, 1928, 35, 255–269.
Gleitman, L. R., & Gleitman, H. Phrase and paraphrase. New York: W. W. Norton, 1970.
Goldfarb, C., Wirtz, J., & Anisfeld, M. Abstract and concrete phrases in false recognition. Journal of Experimental Psychology, 1973, 98, 25–30.
Gomulicki, B. R. Recall as an abstractive process. Acta Psychologica, 1956, 12, 77–94.
Greenberg, J. H. Language universals. The Hague: Mouton, 1966.
Hall, J. F. Verbal learning and retention. Philadelphia: J. B. Lippincott, 1971.
Harris, R. J., & Brewer, W. F. Deixis in memory for verb tense. Journal of Verbal Learning and Verbal Behavior, 1973, 12, 590–597.
Haviland, S. E., & Clark, H. H. What’s new? Acquiring new information as a process in comprehension. Journal of Verbal Learning and Verbal Behavior, 1974, 13, 512–521.
Herriot, P. An introduction to the psychology of language. London: Methuen & Co. Ltd, 1970.
Howe, E. S. Number of different free associates: A general measure of associative meaningfulness. Journal of Verbal Learning and Verbal Behavior, 1972, 11, 18–28.
Jacoby, L. L. Encoding processes, rehearsal, and recall requirements. Journal of Verbal Learning and Verbal Behavior, 1973, 12, 302–310.
Jenkins, J. J. Remember that old theory of memory? Well, forget it. American Psychologist, 1974, 29, 785–795.
Johnson, M. K., Bransford, J. D., & Solomon, S. K. Memory for tacit implications of sentences. Journal of Experimental Psychology, 1973, 98, 203–205.
Johnson, R. E. The retention of qualitative changes in learning. Journal of Verbal Learning and Verbal Behavior, 1962, 1, 218–223.
Johnson, R. E. Recall of prose as a function of the structural importance of the linguistic units. Journal of Verbal Learning and Verbal Behavior, 1970, 9, 12–20.
Johnson, R. E. Meaningfulness and the recall of textual prose. American Educational Research Journal, 1973, 10, 49–58.
Johnson, R. E. Abstractive processes in the remembering of prose. Journal of Educational Psychology, 1974, 66, 772–779.
Kanungo, R. Paired-associate learning of function words. Psychonomic Science, 1968, 10, 47–48.
Karwoski, T. F., & Berthold, F., Jr. Psychological studies in semantics: II. Reliability of the free association tests. Journal of Social Psychology, 1945, 22, 87–102.
Katz, J. J., & Fodor, J. A. The structure of a semantic theory. Language, 1963, 39, 170–210.
Katz, S. Role of instructions in abstraction of linguistic ideas. Journal of Experimental Psychology, 1973, 98, 79–84.
Katz, S., & Gruenewald, P. The abstraction of linguistic ideas in “meaningless” sentences. Memory and Cognition, 1974, 2, 737–741.
Kintsch, W. Notes on the structure of semantic memory. In E. Tulving & W. Donaldson (Eds.), Organization of memory. New York: Academic Press, 1972.
Klare, G. R. The measurement of readability. Ames: Iowa State University Press, 1963.
Lehrer, A., & Lehrer, K. (Eds.), Theory of meaning. Englewood Cliffs, N.J.: Prentice-Hall, 1970.
Lesgold, A. M. Pronominalization: A device for unifying sentences in memory. Journal of Verbal Learning and Verbal Behavior, 1972, 11, 316–323.
Loftus, E. F., & Bolton, M. Retrieval of superordinates and subordinates. Journal of Experimental Psychology, 1974, 102, 121–125.
Luchins, A. S. Mechanization in problem solving: The effect of Einstellung. Psychological Monographs, 1942, 54 (6, Whole No. 248).
Luria, A. R. The role of speech in the regulation of normal and abnormal behavior. New York: Liveright, 1961.
Lyons, J. Introduction to theoretical linguistics. Cambridge, England: Cambridge University Press, 1968.
MacGinitie, W. H. Discussion of Professor Coleman’s paper. In E. Z. Rothkopf & P. E. Johnson (Eds.), Verbal learning research and the technology of written instruction. New York: Teachers College Press, 1971.
Mandler, G. Organization and memory. In K. W. Spence & J. T. Spence (Eds.), The psychology of learning and motivation. New York: Academic Press, 1967.
Martin, E. Stimulus meaningfulness and paired-associate transfer: An encoding variability hypothesis. Psychological Review, 1968, 75, 421–441.
Martin, E., & Walter, D. A. Subject uncertainty and word-class effects in short-term memory for sentences. Journal of Experimental Psychology, 1969, 80, 47–51.
McNeill, D. The acquisition of language: The study of developmental psycholinguistics. New York: Harper & Row, 1970.
Menyuk, P. Sentences children use. Cambridge, Mass.: M.I.T. Press, 1969.
Miller, G. A. Some preliminaries to psycholinguistics. American Psychologist, 1965, 20, 15–20.
Miller, G. A. A psychological method to investigate verbal concepts. Journal of Mathematical Psychology, 1969, 6, 169–191.
Miller, G. A. English verbs of motion: A case study in semantics and lexical memory. In A. W. Melton & E. Martin (Eds.), Coding processes in human memory. Washington, D.C.: V. H. Winston, 1972.
Moesner, S. D., & Bregman, A. S. The role of reference in the acquisition of a miniature artificial language. Journal of Verbal Learning and Verbal Behavior, 1972, 11, 759–769.
Moran, L. J., Mefferd, R. B., & Kimble, J. P., Jr. Idiodynamic sets in word association. Psychological Monographs, 1964, 78 (2, Whole No. 579).
Nelson, K. Concept, word, and sentence: Interrelations in acquisition and development. Psychological Review, 1974, 81, 267–285.
Noble, C. E. An analysis of meaning. Psychological Review, 1952, 59, 421–430.
Noble, C. E. Measurements of association value (a), rated associations (a′), and scaled meaningfulness (m′) for the 2100 CVC combinations of the English alphabet. Psychological Reports, 1961, 8, 487–521.
Offir, C. E. Recognition memory for presuppositions of relative clause sentences. Journal of Verbal Learning and Verbal Behavior, 1973, 12, 636–643.
Oldfield, R. C. Frederick Charles Bartlett: 1886–1969. American Journal of Psychology, 1972, 85, 133–140.
Olson, D. R. Language and thought: Aspects of a cognitive theory of semantics. Psychological Review, 1970, 77, 257–273.
Osgood, C. E. Comments on Professor Bousfield’s paper. In C. N. Cofer & B. S. Musgrave (Eds.), Verbal learning and verbal behavior. New York: McGraw-Hill, 1961.
Osgood, C. E., Suci, G. J., & Tannenbaum, P. The measurement of meaning. Urbana, Illinois: University of Illinois Press, 1957.
Paivio, A. A factor-analytic study of word attributes and verbal learning. Journal of Verbal Learning and Verbal Behavior, 1968, 7, 41–49.
Paivio, A. Imagery and verbal processes. New York: Holt, Rinehart, and Winston, 1971.
Paivio, A., & Olver, M. Denotative-generality, imagery, and meaningfulness in paired-associate learning of nouns. Psychonomic Science, 1964, 1, 183–184.
Paivio, A., Smythe, P. C., & Yuille, J. C. Imagery versus meaningfulness of nouns in paired-associate learning. Canadian Journal of Psychology, 1968, 22, 427–441.
Paivio, A., Yuille, J. C., & Madigan, S. A. Concreteness, imagery, and meaningfulness values for 925 nouns. Journal of Experimental Psychology Monograph, 1968, 76 (1, Pt. 2).
Palermo, D. S., & Jenkins, J. J. Word association norms. Minneapolis: University of Minnesota Press, 1964.
Palermo, D. S., & Molfese, D. L. Language acquisition from age five onward. Psychological Bulletin, 1972, 78, 409–428.
Paris, S. G., & Carter, A. Y. Semantic and constructive aspects of sentence memory in children. Developmental Psychology, 1973, 9, 109–113.
Paul, I. H. Studies in remembering: The reproduction of connected and extended verbal material. Psychological Issues, 1959, 1, No. 2.
Perfetti, C. A., & Goldman, S. R. Thematization and sentence retrieval. Journal of Verbal Learning and Verbal Behavior, 1974, 13, 70–79.
Peterson, R. G., & McIntyre, C. W. The influence of semantic ‘relatedness’ on linguistic integration and retention. American Journal of Psychology, 1973, 86, 697–706.
Piaget, J. The psychology of intelligence. Patterson, New Jersey: Littlefield, Adams, 1960. (English-version reprint of 1947 edition.)
Piaget, J., & Inhelder, B. Memory and intelligence. New York: Basic Books, 1973. (English-version reprint of 1968 edition.)
Pollio, H. R. The psychology of symbolic activity. Reading, Mass.: Addison-Wesley, 1974.
Pompi, K. F., & Lachman, R. Surrogate processes in the short-term retention of connected discourse. Journal of Experimental Psychology, 1967, 75, 143–150.
Postman, L., & Burns, S. Experimental analysis of coding processes. Memory and Cognition, 1973, 1, 503–507.
Potts, G. R. Information processing strategies used in the encoding of linear orderings. Journal of Verbal Learning and Verbal Behavior, 1972, 11, 727–740.
Quillian, M. R. Semantic memory. In M. L. Minsky (Ed.), Semantic information processing. Cambridge, Mass.: M.I.T. Press, 1968.
Quine, W. V. O. Word and object. Cambridge, Mass.: M.I.T. Press, 1960.
Reitman, J. S., & Bower, G. H. Storage and later recognition of exemplars of concepts. Cognitive Psychology, 1973, 4, 194–206.
Rickards, J. P., & Di Vesta, F. J. Type and frequency of questions in processing textual material. Journal of Educational Psychology, 1974, 66, 354–362.
Rips, L. J., Shoben, E. J., & Smith, E. E. Semantic distance and the verification of semantic relations. Journal of Verbal Learning and Verbal Behavior, 1973, 12, 1–20.
Rohwer, W. D., Jr. Constraint, syntax and meaning in paired-associate learning. Journal of Verbal Learning and Verbal Behavior, 1966, 5, 541–547.
Rosenberg, S. Association and phrase structure in sentence recall. Journal of Verbal Learning and Verbal Behavior, 1968, 7, 1077–1081.
Rothkopf, E. Z., & Bisbicos, E. Selective facilitative effects of interspersed questions on learning from written material. Journal of Educational Psychology, 1967, 58, 56–61.
Rumelhart, D. E., Lindsay, P. H., & Norman, D. A. A process model for long-term memory. In E. Tulving & W. Donaldson (Eds.), Organization of memory. New York: Academic Press, 1972.
Sachs, J. S. Recognition memory for syntactic and semantic aspects of connected discourse. Perception and Psychophysics, 1967, 2, 437–442.
Saltz, E. The cognitive bases of human learning. Homewood, Illinois: Dorsey, 1971.
Saltz, E., & Modigliani, V. Response meaningfulness in paired-associates: T-L frequency, m, and number of meanings (dm). Journal of Experimental Psychology, 1967, 75, 313–320.
Samuels, S. J. Effects of pictures on learning to read, comprehension and attitudes. Review of Educational Research, 1970, 40, 397–407.
Shepard, R. N. Recognition memory for words, sentences, and pictures. Journal of Verbal Learning and Verbal Behavior, 1967, 6, 156–163.
Singer, M., & Rosenberg, S. T. The role of grammatical relations in the abstraction of linguistic ideas. Journal of Verbal Learning and Verbal Behavior, 1973, 12, 273–284.
Slamecka, N. J. An examination of trace storage in free recall. Journal of Experimental Psychology, 1968, 76, 504–513.
Smith, E. E., Shoben, E. J., & Rips, L. J. Structure and process in semantic memory: A featural model for semantic decisions. Psychological Review, 1974, 81, 214–241.
Spreen, O., & Schulz, R. W. Parameters of abstraction, meaningfulness, and pronounceability for 329 nouns. Journal of Verbal Learning and Verbal Behavior, 1966, 5, 459–468.
Sulin, R. A., & Dooling, D. J. Intrusion of a thematic idea in retention of prose. Journal of Experimental Psychology, 1974, 103, 255–262.
Taylor, W. L. Application of ‘cloze’ and entropy measures to the study of contextual constraints in continuous prose. (Doctoral dissertation, University of Illinois, 1954). Dissertation Abstracts, 1955, 15, 464–465. (University Microfilms No. MicA 55–592)
Terwilliger, R. F. Meaning and mind. New York: Oxford University Press, 1968.
Thieman, T. J. Levels of processing serial lists embedded in narratives. Journal of Experimental Psychology, 1973, 100, 423–425.
Thios, S. J. Memory for general and specific sentences. Memory and Cognition, 1975, 3, 75–77.
Thorndike, E. L., & Lorge, I. The teacher’s word book of 30,000 words. New York: Teachers College Press, 1944.
Tulving, E., & Pearlstone, Z. Availability versus accessibility of information in memory for words. Journal of Verbal Learning and Verbal Behavior, 1966, 5, 381–391.
Underwood, B. J., & Schulz, R. W. Meaningfulness and verbal learning. Chicago: Lippincott, 1960.
Watts, G. H., & Anderson, R. C. Effects of three types of inserted questions on learning from prose. Journal of Educational Psychology, 1971, 62, 387–394.
Waugh, N. C., & Norman, D. A. Primary memory. Psychological Review, 1965, 72, 89–104.
Weaver, W. W., & Kingston, A. J. A factor analysis of the cloze procedure and other measures of reading and language ability. Journal of Communication, 1963, 13, 252–261.
Weisberg, R. W. On sentence storage: The influence of syntactic versus semantic factors on intrasentence word associations. Journal of Verbal Learning and Verbal Behavior, 1971, 10, 631–644.
Wickelgren, W. A. The long and short of memory. Psychological Bulletin, 1973, 80, 425–438.
Woodward, A. E., Jr., Bjork, R. A., & Jongeward, R. H., Jr. Recall and recognition as a function of primary rehearsal. Journal of Verbal Learning and Verbal Behavior, 1973, 12, 608–617.
Zangwill, O. L. Remembering revisited. Quarterly Journal of Experimental Psychology, 1972, 24, 123–138.
40
Phases of Meaningful Learning
Thomas J. Shuell
Source: Review of Educational Research, 60(4) (1990): 531–547.
Learning is a much more complex and drawn out process than generally acknowledged. The type of complex, meaningful learning that occurs in school and throughout the life span occurs over a period of weeks, months, and years, and there is good reason to believe that the nature of the learning process changes as the task of mastering a complex body of knowledge unfolds. For example, there is good evidence that experts and novices in a field respond to tasks in fundamentally different ways (e.g., Chi, Glaser, & Rees, 1982). As one progresses from the initial encounter with a complex body of knowledge to the point where the expert is able to demonstrate understanding of that knowledge in ways that are more-or-less automatic, a task that once constituted a problem for the new learner (and elicited various problem-solving strategies) becomes little more than a simple recall task for the more experienced and sophisticated learner.
This article will explore the notion that distinct stages or phases can be identified along the journey from knowing virtually nothing about a complex body of knowledge to the demonstration of a highly proficient mastery of that knowledge. After a discussion of several issues related to a phase theory of learning, research relevant to phases in simpler types of learning will be presented, followed by a similar discussion of phases in more meaningful learning. Finally, implications of these reviews will be discussed with regard to both theories of learning and educational practices.
The idea of stages is certainly not new to psychology. There are the developmental stages of Piaget and Bruner and a long-standing concern for stages in problem solving (Andre, 1986; Mayer, 1983). Over the years, a variety of stage
Curriculum, Instruction and Learning
theories have been suggested for various types of learning (e.g., Anderson, 1982; Fleishman & Hempel, 1954, 1955; McGuire, 1961; Underwood, Runquist, & Schulz, 1959), and Brainerd (1985; Brainerd, Howe, & Desrochers, 1982) has developed a sophisticated mathematical two-stage model of learning. A concern for stages is clearly evident, at least implicitly, in the current literature on cognitive learning – for example, the growing body of literature on expert-novice differences (e.g., Chi, Glaser, & Rees, 1982). A number of factors have contributed to this concern for stages, or phases, in meaningful, cognitive learning. Current theories of learning, for instance, emphasize that learning is an active, constructive, cumulative, and goal-oriented process that involves problem solving (e.g., Shuell, 1986a, 1990). This view of learning as a complex, drawn-out process (e.g., Norman, 1978) that depends on factors from many sources suggests that learning may change as it progresses. This possibility, coupled with evidence that performance is strongly influenced by one’s prior knowledge (e.g., Bransford & Johnson, 1972; Chiesi, Spilich, & Voss, 1979), makes a concern for phases in meaningful learning an appropriate and timely pursuit. Although many cognitive theorists seem to accept the notion of phases in meaningful learning, there have been few systematic attempts to explore the issue in depth. Most of the empirical evidence on stages of learning deals with simpler forms of learning. Although the evidence of phases in long-term meaningful learning is not as convincing at present as one would like, there is good reason to postulate their presence. The following review combines evidence from empirical studies and theoretical discussions of both simple and meaningful forms of learning in order to evaluate this possibility.
An Overview of the Problem
Imagine yourself about to embark on a long journey, a journey that involves learning a complex body of knowledge with which you currently are unfamiliar. At first, the new terrain appears strange, although certain similarities with familiar territory can be identified. During the first leg of this journey you find yourself primarily memorizing isolated facts (i.e., landmarks), for you do not yet possess a schema for interpreting and integrating the various pieces of information that are encountered. Initially, for example, you may find that mnemonics are helpful in remembering these more-or-less isolated facts. As learning progresses, however, and you begin to group and organize the facts and integrate them into higher order structures, you may find that mnemonics play a less beneficial role. In their place, various types of organizational aids (e.g., developing hierarchies and matrices) that were of little help initially may begin to play a more important role.
But the nature of the learning process is not the only thing that changes as learning progresses; the learning process also becomes more diverse. Initially,
the learner must rely on experiences associated with a particular course or on a few books selected for self-study. As the learner becomes more familiar with the territory through which he or she is traveling, the learner is likely to encounter a variety of relevant books, to attend lectures, to discuss issues with other students (at the same and/or more advanced levels), to use his or her knowledge to interpret various situations (e.g., a play, a movie, the failure of something to work in the way it is supposed to, the behavior of other people), and so forth. In short, meaningful learning in any field is a much more complex process than often realized; different types of learning are involved, and – as this article will address – various phases or stages occur during which the nature of the learning process changes in systematic ways.
What Is a Phase of Learning?
Because special connotations are usually associated with the use of the term stage within psychology, it seems advisable to address at the outset the way in which phase will be used in this article. Generally speaking, the term stage is used to refer to distinct time periods. Each period is characterized by psychological functioning that is qualitatively different from that which occurs during other periods. The most notable examples, of course, are developmental stages such as those of Jean Piaget. Developmental stages of this type are structural in nature and apply across all domains. For example, most developmental stage theories consider it impossible for a child to be in the formal-operations stage in mathematics and in the concrete-operations stage in social studies, and thus developmental stages are considered to be independent of specific content domains.
There is a growing body of literature, however, that challenges the validity of developmental stages conceived in this way (e.g., Keil, 1986). There is increasing evidence that the qualitative changes that occur with age are the result of knowledge-based competencies within a particular content domain, although the possibility of general competencies arising from similarities among the various domains is not ruled out.
An example of such knowledge-based competencies is demonstrated in a study by Chi (1978). There is a great deal of evidence, reviewed by Chi, that memory span (the number of items that can be recalled after a single presentation) increases in a linear manner with age, with adults remembering up to twice as much as children five to seven years old. Chi’s study involved children in the third through eighth grade (mean age was 10.5 years) who were experts at playing chess and adults who were only novice chess players. The typical finding of adult superiority in memory span was obtained when the subjects were asked to remember digits (6.1 vs. 7.8). However, when the subjects were asked to remember the placement of pieces on a chess board, the performance of the 10-year-old experts far surpassed that of the adults (9.3 vs. 5.9).
This finding is often cited as evidence that knowledge differences may explain many of the developmental differences (and perhaps developmental stages as well) typically found in the literature.
Karmiloff-Smith (1984, 1986) makes a useful distinction among stage, phase, and level. Stage is used to refer to periods of time that differ qualitatively from both the preceding and succeeding stages. These stages, by definition, cover performance in a variety of content domains, and, once a person achieves a particular stage, he or she cannot return to a preceding one. Although a phase may include behavior across several domains, phases are recurrent in that individuals can pass through the various phases in each of many different content domains. Phases are based on:
. . . the hypothesis that children (and adults, for that matter) attack any new problem by going through the same three phases, both within the various parts of particular domains and across different domains. The phase concept is focused on underlying similarity of process [italics added], whereas the stage concept usually refers to similarity of structure. (Karmiloff-Smith, 1984, p. 41)
Level refers to qualitative changes within a particular domain (e.g., the proper use of modifiers such as adjectives and adverbs) and accounts for specific changes within that domain. “Like stages, and unlike phases, levels are not recurrent. Once a child is at, say, level 3 in a specific domain, she does not return to level 1” (Karmiloff-Smith, 1984, pp. 41–42). Within this type of conceptual framework, it makes most sense to think in terms of learning phases rather than learning stages, and, consequently, that term is used in this article.
Do Phases Add Anything Worthwhile to Our Understanding of Human Learning?
If learning is a continuous process – and most of us would agree that it is – then one might reasonably ask what the notion of phases adds to our understanding of complex, meaningful learning. Both theoretical and practical implications exist if in fact the nature of the learning process changes in fundamental ways as learning progresses. Theoretically, it means that learning is a much more complex process than we had imagined. Not only must the type of learning be considered when conducting research but factors related to the length of time that the learning has been taking place must be considered, and prior knowledge will need to be considered in a much more explicit manner than is typically the case. In addition, concern for boundary conditions of various learning principles will need to include factors related to the phase of learning in which the learner is working.
On the more practical side, there are also implications for teaching. Just as one should not teach in the same way when different types of outcomes
are sought (e.g., the acquisition of concepts vs. the acquisition of facts), one should teach differently when different phases of learning are involved. The teaching methods employed, as well as the content, should be appropriate for the phase of learning in which the students are engaged. For example, one would teach differently if a new topical area is just being introduced than if the students had already gained some proficiency in the domain. Thus, introductory courses should be taught differently from more advanced courses – at least in part – but, in more instances than not, introductory and advanced courses in a particular content area are taught in basically the same way.
Procedures for Identifying Phases
In order to study phases of learning, objective and methodologically valid techniques must be used to distinguish among the various phases. Merely postulating their existence, defining them in terms of how much time has passed (how much practice/experience has occurred), and/or giving them plausible names are clearly insufficient methods of establishing their existence. The qualitative differences that presumably exist among the phases must be specified and verified in some objective way.
One, and perhaps the best, way to differentiate among phases is to identify variables that function differently and/or have different effects during the various phases. For example, the distinction between short-term and long-term memory has been validated in this manner, for the effects of acoustic similarity are much greater in short-term than in long-term memory, whereas the effects of semantic similarity are much greater in long-term than in short-term memory (e.g., Baddeley, 1966; Conrad, 1964; Kintsch & Buschke, 1969). Expanding on this approach, a matrix can be established to portray the way in which relevant factors/variables operate in the various phases. Such a matrix, from Dreyfus and Dreyfus (1986), is presented in Table 1. In developing a matrix of this type, the defining factors/variables should be specified in an objective manner, and there needs to be clear evidence that these variables/processes actually change in the systematic manner specified.
Various techniques have been used to identify the presence of stages or subproblems in problem solving. For example, Restle and Davis (1962) calculated the number of stages involved in solving a problem by dividing the square of the average time (across subjects) to solve a problem by the square of the standard deviation. Hayes (1965, 1966) and Thomas (1974) identified subproblems by comparing response times at each step of a well-defined problem. They assumed that response time is faster as a subject completes a subproblem and slower as he or she begins working on a new subproblem (presumably because time is required to think about a solution to the next subproblem).
Table 1: Stages of skill acquisition
Skill level | Components | Perspective | Decision | Commitment
Novice | Context-free | None | Analytical | Detached
Advanced beginner | Context-free and situational | None | Analytical | Detached
Competent | Context-free and situational | Chosen | Analytical | Detached understanding and deciding. Involved in outcome
Proficient | Context-free and situational | Experienced | Analytical | Involved understanding. Detached deciding
Expert | Context-free and situational | Experienced | Intuitive | Involved
Note: From Mind Over Machine: The Power of Human Intuition and Expertise in the Era of the Computer (p. 50) by Hubert L. Dreyfus and Stuart E. Dreyfus, 1986, New York: The Free Press. Copyright 1986 by Hubert L. Dreyfus and Stuart E. Dreyfus. Reprinted by permission of The Free Press, a Division of Macmillan, Inc.
All three techniques are based on questionable assumptions and serve merely to identify the number of stages involved. They provide little, if any, information on the nature of each stage or the variables that affect learning during that stage. In addition, they do not lend themselves to the more complex and meaningful forms of learning addressed in this article.
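With those caveats in mind, the calculations described above can be sketched in a few lines of Python. This is only an illustration, not a reconstruction of the original procedures: the function names, the choice of the sample standard deviation, and the `ratio` threshold for flagging a slow step are all assumptions introduced here, and the data are made up. One way to motivate the Restle and Davis ratio (an assumption not stated in the text) is that if total solution time were the sum of k independent stages with identical exponential durations of rate λ, the mean would be k/λ and the variance k/λ², so the squared mean divided by the variance would equal k.

```python
from statistics import mean, stdev


def estimate_stage_count(solution_times):
    """Squared mean solution time divided by squared standard deviation.

    Assumption: the sample standard deviation is used; the text does not
    say which estimator Restle and Davis (1962) applied.
    """
    if len(solution_times) < 2:
        raise ValueError("need solution times from at least two subjects")
    avg = mean(solution_times)
    sd = stdev(solution_times)
    if sd == 0:
        raise ValueError("solution times must vary across subjects")
    return (avg ** 2) / (sd ** 2)


def find_subproblem_starts(step_times, ratio=1.5):
    """Return indices of steps that may open a new subproblem.

    Hypothetical rule: a step is flagged when its response time is at least
    `ratio` times that of the preceding step, following the assumption that
    responses slow down when a new subproblem begins. Step 0 is always
    flagged because the first step necessarily opens the first subproblem.
    """
    starts = [0] if step_times else []
    for i in range(1, len(step_times)):
        if step_times[i] >= ratio * step_times[i - 1]:
            starts.append(i)
    return starts


if __name__ == "__main__":
    # Made-up solution times for three subjects (arbitrary units).
    print(estimate_stage_count([10, 20, 30]))

    # Made-up response times for six successive steps of one problem.
    print(find_subproblem_starts([8.0, 4.0, 2.0, 9.0, 5.0, 3.0]))
```

Both functions return only a count or a set of boundaries, which is consistent with the criticism above: nothing in the output describes what happens within a stage or which variables affect learning during it.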
Phase Theories in Simpler Forms of Learning
Various forms of a phase theory of learning have existed since the earliest days of research on learning. For example, Bryan and Harter’s (1897, 1899) well-known studies of telegraph operators learning Morse code provide evidence for a phase theory of learning. The beginner first learns the alphabet of dots and dashes and sends and receives words on a letter-by-letter basis. With practice, the operator begins to combine these individual letters into higher order units that correspond to words, and, with continued practice, he or she combines words into units composed of several words (i.e., phrases and short sentences).
During the 1950s and 1960s, phase theories of paired-associate learning and skill learning were proposed. Later, John Anderson (1982) suggested a phase theory of procedural learning. These various theories of learning phases in complex forms of human learning will be discussed in the remainder of this section. Phase theories concerned with more meaningful forms of learning will be discussed in the subsequent section, as will expert-novice differences in processing meaningful material and the corresponding concern for the nature and development of intellectual competence that implies, at least implicitly, a phase theory of learning.
Paired-Associate Learning
During the early 1960s, several theorists proposed multiprocess theories of paired-associate (PA) learning. In their influential book, Meaningfulness and Verbal Learning, Underwood and Schulz (1960) suggested a two-stage analysis of PA learning consisting of a response learning, or response recall, stage (in which the subject learns the various responses that are used on the list) and an associative, or hook up, stage (in which the connection between each response and its corresponding stimulus is acquired). A number of other investigators, however, suggested two-process theories consisting of a stimulus differentiation phase and an associative phase in situations in which the responses were all well learned (see Battig, 1968, for a comprehensive discussion of multiprocess theories in PA learning).
McGuire’s 1961 report of his 1954 dissertation suggested the first three-stage theory consisting of (a) stimulus encoding, or stimulus predifferentiation (in which the subject learns to discriminate among the various stimuli on the list); (b) mediation (in which a link is found for associating each stimulus with the appropriate response); and (c) response learning (in which the subject must learn the various response items that are being used). For example, in learning the list DOG-TREE, CAT-TABLE, and MAN-CHAIR, the learner must first learn to differentiate among dog, cat, and man (the task would be more difficult, of course, if the stimuli were XJC, YJB, and XKB). Note that it is not necessary to learn the stimuli, because they are always presented. Then he or she must find a way of linking each member of the pair (the cat is on the table, etc.). In addition, the subject must learn that tree, table, and chair (but not house) are the various responses that are appropriate.
Evidence to support these stage theories of paired-associate learning is based on findings that response meaningfulness affects paired-associate learning to a greater extent than stimulus meaningfulness and that intralist response similarity (i.e., all of the responses in the list are more-or-less synonymous in meaning vs. being unrelated) has a facilitative effect on response learning but a detrimental effect on overall learning (that presumably includes both the response-learning and associative phases; Underwood, Runquist, & Schulz, 1959). In addition, McGuire (1961) presented support for his three-stage theory in a detailed analysis of correct responses and intrusion errors in the learning of pairs in which the stimuli were solid black circles of varying diameters and the responses were numbers.
In a related study involving free-recall learning, Labouvie, Frohring, Baltes, and Goulet (1973) compared the correlation patterns between free-recall performance of pictorial stimuli (with recall commencing either immediately after presentation or after a 30-second delay) and a battery of eight intelligence and memory-ability tests. Although the same acquisition curves were obtained under immediate and delayed recall, there were systematic differences in the patterns of correlations obtained for the two conditions. Intelligence variables
correlated to a fairly high extent (.53 to .77) with recall during the later stages of acquisition under conditions of delayed recall, but the correlations were considerably less (.29 to .45) during the early stages of delayed recall and during all stages of immediate recall. Memory variables, on the other hand, were significantly correlated (.55 to .56) with performance during early stages of acquisition under conditions of immediate recall but not under the other conditions. Thus, it appears that the task demands involved in learning may change systematically as learning progresses.
Skill Learning
More research has been done on phases in skill learning than in any other type of research on learning, beginning with Bryan and Harter’s (1897, 1899) classic studies on learning Morse code. During the mid-1950s, a series of factor analytic studies by Fleishman and Hempel (1954, 1955) revealed systematic changes in the particular combination of psychomotor abilities (e.g., reaction time, manual dexterity, rate of movement, spatial relations) most important for performance as learning progressed. For example, nonmotor abilities such as verbal ability and spatial relations play an important role early in learning, but their importance decreases progressively with practice. Motor abilities (e.g., reaction time, rate of movement), on the other hand, play an increasingly important role as learning progresses, as does a factor specific to the task itself.
Later, Fitts (1962, 1964) suggested that skill learning consists of three phases: (a) cognitive, (b) associative, and (c) autonomous (although the labels applied to the three phases differ somewhat in his various writings). The initial phase can be relatively short, depending on the complexity of the task and consisting of “the time required to understand instructions, to complete a few preliminary trials, and to establish the proper cognitive set for the task” (Fitts, 1964, p. 262). The intermediate phase involves mediation and learning to associate various responses to specific cues as well as cognitive set learning. During the late phase, highly skilled performance continues to improve indefinitely.
It should be noted that Fitts (1964) emphasizes that skill learning is a continuous process, without distinct stages as such. Instead, we should think of gradual shifts in the factor structure of skills, or in the nature of the processes (strategies and tactics; executive routines and subroutines) employed, as learning progresses. The evolving process is revealed by the organization of behavior into larger and larger units . . . and toward hierarchical organization. (pp. 261–262)
Dreyfus and Dreyfus (1986) have also suggested a phase theory of skill learning. Based on their observations of skill acquisition in airplane pilots, chess players, automobile drivers, and adults learning a second language, they
suggest that five stages are involved in learning a complex skill: (a) novice, (b) advanced beginner, (c) competent, (d) proficient, and (e) expert. The phases are defined largely by the manner in which four factors (components, perspective, decision, and commitment) operate in the respective phases (see Table 1). Each stage is described in considerable detail, but the data on which their theory is based are never presented. Consequently, it is difficult to determine the validity of the various stages or whether five stages (e.g., rather than three) are really needed to explain the data. Benner (1984), starting with the five stages as a given, attempted to validate the Dreyfus model in an interview study of expert and novice nurses. The nurses were asked for their perceptions of what was important in a series of case studies and how they would seek a solution to the problem involved in the case. Unfortunately, the study does not provide a clear test of the theory.
Procedural Learning
In some ways, Anderson’s (1982, 1987) research could be included in the section on meaningful learning, for it deals with intellectual skills, such as solving mathematics problems, and Anderson argues that the model applies to all complex learning. It is represented in this section, however, because it has been presented consistently as a model of skill learning. Anderson proposes three phases of procedural learning very similar to Fitts’ (1964) three phases of skill learning; Anderson refers to them as the (a) declarative, (b) knowledge-compilation, and (c) procedural phases.
Anderson (1982, 1987) makes the common distinction between declarative, or propositional knowledge (knowledge about something), and procedural knowledge (knowledge of how to do something), and he argues that we begin learning a new domain by encoding a set of facts in largely unanalyzed form that we subsequently can interpret without allowing it to control our behavior. That is, we can withhold judgment about the behavioral implications of such declarative knowledge until we see examples of and reflect upon ways in which it can be used. With additional experience, we begin to combine some of this declarative knowledge into procedures that allow us to apply it on a limited basis that still does not demand explicit control of our behavior. Ultimately, procedures evolve that do control our behavior without a great deal of conscious thought and effort – that is, the behavior becomes automatic.
Phase Theories in Meaningful Learning
Phase theories have also been discussed with regard to more meaningful forms of learning, although usually on a more implicit and less well-developed basis than the phase theories discussed in the preceding section. Although there
is considerable agreement among various investigators on the viability of phases in long-term meaningful learning, the empirical evidence to support their presence is not overwhelming at present. Thus, the following review of the literature is based more on theoretical arguments than on empirical evidence, although the latter will be discussed whenever possible.
Perhaps the earliest discussion of stages in meaningful learning was Wallas’ (1926) suggestion that problem solving involves four stages: preparation, incubation, illumination, and verification. Unfortunately, Wallas’ stages, as well as similar ones suggested by other investigators, are based more on introspection than on sound scientific investigations of any kind. A good critique of this literature is contained in Mayer (1983).
Within the context of a schema-based theory of long-term memory, Rumelhart and Norman (1978) have suggested three qualitatively different types of learning: (a) accretion, or the encoding of new information in terms of existing schemata; (b) tuning, or schema evolution, the slow modification and refinement of a schema as a result of using it in different situations; and (c) restructuring, or schema creation, the process by which new schemata are created. Rumelhart and Norman imply that these three kinds of learning occur sequentially, but, whereas there is consistency in listing accretion as the first phase, they interchange the order of tuning and restructuring in their discussion.
Spiro, Coulson, Feltovich, and Anderson’s (1988) cognitive flexibility theory focuses on advanced knowledge acquisition. This phase of learning occurs between one’s initial attempts to study a subject area and the high levels of expertise that come with massive amounts of experience. According to Spiro et al.:
This often neglected intermediate stage is important because the aims and means of advanced knowledge acquisition are different from those of introductory learning. In introductory learning the goal is often mere exposure to content and the establishment of a general orientation to a field; objectives of assessment are likewise confined to the simple effects of exposure (e.g., recognition and recall). At some point in learning about a knowledge domain the goal must change; at some point students must ‘get it right.’ This is the stage of advanced knowledge acquisition. (p. 1)
Although the phase aspect of the Spiro et al. theory is based more on their experience than on sound empirical evidence that the various phases actually exist, it does provide an example of current thinking among cognitive psychologists on the topic. Probably the best developed and most empirically based phase theory of meaningful learning is Karmiloff-Smith’s (1984, 1986) theory of cognitive development, discussed in the following section. The subsequent section will explore the implications of research on expert-novice differences (and the corresponding concern for the development of competence) for a phase theory of learning.
Developmental Learning
Based on evidence from several studies, Karmiloff-Smith (1984, 1986) has developed a knowledge-based theory of cognitive development. She believes that the theory is relevant to individuals of all ages who are learning a new content area. Phases and levels are distinguished from stages (as described earlier in this article), and she postulates the involvement of three phases/levels referred to as: (a) procedural, (b) metaprocedural, and (c) conceptual.1
During the procedural phase/level, the individual’s responses are generated primarily by data-driven processes generated by the individual’s adapting to external stimuli. The person’s behavior is controlled predominantly by the environment. During this initial phase/level, one observes behavioral change with no attempt to develop an overall organization capable of linking the isolated behavioral units into a consistent whole.
During the second (or metaprocedural) phase, the individual begins to work in a “top-down” manner on the mental representations formed during the first phase – that is, the person begins to reflect or think about these representations as entities in their own right. During this phase, external stimuli become secondary to an internal representation that the person imposes on the environment. The person’s external behavior may actually deteriorate somewhat from what was observed during the preceding phase, for external stimuli are ignored as he or she experiments with the internal representation.
The third (or conceptual) phase is governed by a subtle control mechanism that modulates the interaction between the data-driven processes characteristic of the first phase and the top-down processes characteristic of the second phase. The person is now in control of both environmental stimuli and the internal representations that guide his or her behavior.
During this phase, the individual is able to consider environmental feedback without jeopardizing the structure of the internal representations.
Expert-Novice Differences
Research on expert-novice differences grew out of a concern for the nature of intellectual competence and the way it develops. Because experts and novices are presumed to differ primarily, if not exclusively, in terms of the experience they have had in a particular subject-matter domain, we are dealing once again with a knowledge-based approach to learning. Although there is general agreement that a continuum exists as the individual moves from novice to expert in a particular field, most of the research to date has been concerned with describing differences in the way the two groups solve problems.
It should be noted that in this research novice typically refers to someone who has had limited experience with the field or material being investigated, not someone with no experience. For example, in research on physics problem solving, a novice
might have had one undergraduate course in physics, whereas an expert might be a professor of physics or someone with comparable experience. Such minimal experience for the novice is necessary in order to have a reasonable basis for comparison, for data on how novices solve a problem could not be obtained if the subjects could not solve the problem at all. A number of qualitative differences between experts and novices have been identified (for a brief review, see Glaser & Chi, 1988; Shuell, 1986b). For example, in solving physics problems, experts tend to perform a qualitative analysis of the problem prior to deciding which equations to use, whereas novices tend to focus on equations from the onset and engage in a direct syntactic translation (e.g., identifying variables and then plugging them into an equation) rather than generating a physical representation of the problem situation. Likewise, novices tend to focus on literal objects and/or key terms explicitly mentioned in the problem, whereas experts tend to identify features that reflect the states and conditions of the physical situation described in the problem (Chi et al., 1982). Thus, novices might respond to (identify or classify) a problem in terms of “friction” or “gravity,” whereas experts might refer to it in terms of “given initial conditions” or “no external force” (p. 64). Few attempts have been made to identify stages or phases that might exist between the two states, although Voss, Greene, Post, and Penner (1983) discuss differences among undergraduates (novices), graduate students, and experts. Chi (1978) distinguishes among novice, advanced novice, and expert. 
Champagne, Klopfer, and Gunstone (1982) differentiate between uninstructed, or preinstructional students (i.e., those who have no experience studying the topic); novice (those with minimal experience in the field – i.e., the typical novice in expert-novice studies); and experts in their discussion of research relevant to the teaching of physics. Not only do Champagne et al. (1982) provide a detailed description of differences between the schemata of students in these three phases of learning based on their analysis of various empirical studies, they also discuss ways in which these differences are related to teaching students in each phase. Uninstructed students, for example, use principles that are little more than generalized rules derived from their everyday experiences. Consequently, these principles tend to be imprecise due to the students’ vague understanding of concepts, errors of magnitude, and inappropriate formulations of general rules. For novices, however, principles involve relationships between physical variables in the form of equations or rules. Although the major laws of physics are expressed in equation form, there is no indication that these equations serve an organizing function (e.g., as schemata). For experts, principles represent major laws of physics in a highly abstract form that expresses relationships with great generality. Each principle includes the conditions under which the principle applies and has an associated schema that serves to organize the relevant material.
Conclusions
Meaningful cognitive learning is an active, constructive, and cumulative process that occurs gradually over a period of time (Shuell, 1986a). It is a goal-oriented process best characterized in terms of problem solving (Anderson, 1987; Bereiter, 1989; Shuell, 1990). Learning is not merely an additive process – qualitative, as well as quantitative, changes occur, and qualitative differences are evident in both the substance of what is being learned and in the learning processes most appropriate for acquiring additional knowledge. The preceding review of the literature reveals reasonable agreement among investigators that a learner passes through a series of phases as his or her knowledge about something evolves. During these phases, the learning process and the variables influencing it change in systematic ways. The exact number of phases that might be involved is not clear (although most theories have postulated three), and the characteristics of each phase have not been worked out in any detail. Needless to say, the number of phases and the defining characteristics of each one must be established on the basis of sound methodology, as discussed earlier in this article. Merely postulating their existence is not enough. Nevertheless, it may prove useful to attempt an initial description of what the various phases might be like as well as some speculative comments about the transition from one phase to another.
Initial Phase
During the initial phase of learning, the individual encounters a large array of facts and pieces of information that are more-or-less isolated conceptually. Merely because someone familiar with the topic (teacher, expert, etc.) may see an organizing structure with many interrelationships among the various facts does not mean that the novice learner can make sense out of them. Initially, there appears to be little more than a wasteland with few landmarks to guide the traveler on his or her journey toward understanding and mastery. Under the circumstances, the learner does the only thing that is reasonable: memorizes facts and uses preexisting schemata to interpret the isolated pieces of data. Some of this new information is added to existing knowledge structures – for example, Rumelhart and Norman’s (1978) notion of accretion – and these preexisting knowledge structures are used for interpreting the new information and giving it meaning. If no meaning can be found, the information remains as isolated facts. Because the learner has little specific knowledge of the domain, the initial processing is global in nature (Sternberg, 1984). The learner must rely on general, domain-independent problem solving strategies and knowledge
from other domains to interpret the new information, to make comparisons and contrasts, and to find analogies that appear relevant to the learner (Anderson, 1987; Brown, Bransford, Ferrara, & Campione, 1983). The information acquired during this initial phase is concrete rather than abstract and bound to the specific context in which it occurs (Bransford & Franks, 1976). Thus, the encounter with a new domain of knowledge involves the rote learning of more-or-less isolated facts (we memorize new terms or what appear to be key facts – if we are learning a structured body of knowledge such as history, literature, or psychology – or we identify and try to remember key landmarks if we are learning to navigate around a large city).2 Gradually, the learner begins to form an overview of what the new domain is all about. In pursuing this task, our prior knowledge provides some help (or in certain cases hindrance) by suggesting initial possibilities and by establishing boundary constraints that assist in identifying both the sameness and the uniqueness of the new information (Bransford & Franks, 1976). Analogies from other domains may be used to represent the new domain, although these initial analogies must be modified as learning progresses (Anderson, 1987). The sophisticated learner may make assumptions, based on previous learning experiences, such as (a) “the knowledge [I am learning] has a structure that is more complex than [presently evident],” (b) “[I am] going to have trouble judging the importance of information and [it is] better to err on the side of overestimating importance,” and (c) “familiar words may have special meanings in the [new] domain” (Bereiter, 1989, p. 4). The fog that has shrouded the terrain is beginning to lift, but it is still difficult to see things clearly. During the initial phase, relatively simple forms of learning (e.g., operant conditioning, verbal learning) account for a large part of the learning that occurs.
Classical conditioning may also be relevant with regard to establishing an emotional/affective predisposition to learning within that domain. Early stages of concept learning (e.g., grouping) may also occur, but the learner has acquired insufficient information for more complex forms of propositional and procedural learning to occur – such as Rumelhart and Norman’s (1978) tuning and restructuring. Thus, one might reasonably expect mnemonic strategies (a form of elaborative encoding) to have a greater effect on learning than chunking (a form of reductive encoding).3
Intermediate Phase
Gradually, the learner begins to see similarities and relationships among these conceptually isolated pieces of information. The fog continues to lift but still has not burnt off completely. As these relationships become better developed, they are formed into higher order structures and networks. New schemata that provide the learner with more conceptual power are formed, but these
new structures and schemata do not yet allow the learner to function on a fully autonomous, or automatic, basis. More meaningful forms of propositional and procedural learning predominate – what Spiro et al. (1988) refer to as advanced knowledge acquisition – and the student must now “ ‘get it right’ . . . attain a deeper understanding of content material, reason with it, and apply it flexibly in diverse contexts” (Spiro et al., 1988, p. 1). We extend our knowledge by applying it to new situations and by learning by doing – that is, the information acquired during the initial phase is now applied to the solution of various problems that the learner encounters, including understanding and explaining various situations such as might be involved in answering an essay question. An important advantage of this phase is that we can try out new knowledge in various ways and receive feedback on its appropriateness without its having autonomous control over our behavior (Anderson, 1982). Thus, there is the opportunity for reflection. As our knowledge becomes more abstract and more capable of being generalized to a variety of situations, it becomes less dependent on the specific context in which it was originally acquired (Karmiloff-Smith, 1984, 1986). During this phase, there may be a temporary deterioration in performance as all of these competing factors are sorted out (Karmiloff-Smith, 1984, 1986; Lesgold et al., 1988). Does learning automatically progress to this intermediate phase? Not necessarily. To insure that the transition occurs from the initial to the intermediate phase, certain things need to occur. Unfortunately, these things often are missing from an educational system that emphasizes the accumulation of more and more factual information – that is, an additive model of learning. 
In order for information to become more abstract, or decontextualized, Bransford and Franks (1976) suggest that concepts and knowledge should be used to clarify different situations, and they stress the importance of encountering relevant examples, a recommendation that is similar to Spiro et al.’s (1988) emphasis on learning by cases. The teacher and/or the learner can additionally employ various organizational strategies such as outlining and cognitive mapping (that can help the learner to identify and develop higher order relationships in the information being learned) and use the information to solve problems of various types (learning by doing). Variables such as mnemonics, for example, that had substantial effects on learning during the initial phase may have little, if any, effect on learning during the intermediate phase.
Terminal Phase
During the last phase of learning, the knowledge structures and schemata formed during the intermediate phase become better integrated and function more autonomously. In most situations, performance will be automatic,
unconscious, and effortless, because relevant knowledge structures now control behavior in a more direct manner (Anderson, 1982). The individual relies heavily, if not exclusively, on domain-specific strategies for solving problems, answering questions, and so forth. The emphasis in this phase is on performance rather than learning, because any change in performance is most likely the result of different task requirements rather than changes in one’s cognitive structure or potential for performing in a particular manner. In fact, performance (e.g., solving a mathematics problem) that may have involved learning during an earlier phase may involve little, if any, learning during the terminal phase. The ability to perform a task (including answering certain questions about a complex body of knowledge) that is accomplished in a straightforward, automatic manner (i.e., one merely utilizes preexisting procedures) involves neither learning nor problem solving. The learning that does occur during this phase most likely consists of either: (a) the addition of new facts to preexisting schemata (i.e., accretion), or (b) increasingly higher levels of interrelationships (e.g., where the schemata consist of other schemata rather than facts). In one sense, learning in a particular domain never ends, but a point is reached when the expert (not necessarily defined in the traditional sense) functions autonomously on automatic pilot, giving little thought and/or exerting little mental effort to the control of what he or she is doing.
Transition Between Phases
The most problematic part of any phase theory concerns the transition between phases. What, for example, is the nature of the change that occurs as one moves from one phase to the next? And what factors lead to the changes that are purported to occur? To many people, phases suggest the presence of separate and distinct entities with clear-cut boundaries between adjoining stages. But it seems unlikely that such is the case. It probably is best to think of learning as a continuous process; the boundaries between phases are most likely fuzzy, and the transitions between phases gradual rather than dichotomous (see Fitts’ [1964] quote cited earlier). The truth of the matter is that we currently have a very poor understanding of how these transitions occur and what factors precipitate them. But this type of problem is not unique to psychology; the physical sciences have similar problems – for example, understanding the changes and precipitating factors when a steady stream of water turns into a series of discrete drops. In both instances, the separate states (phases) can be documented, but the transition escapes current understanding. It may be the case, at least with phases of learning, that during the transition characteristics of both phases are operating in an overlapping manner. Thus, the learner might continue to rely on mnemonics even though their usefulness has diminished and the need for organization has become paramount. Such duplication could even
serve a functional purpose in that new behavior is often unstable and the involvement of more than one factor could minimize the potentially negative effect of phenomena such as regression and forgetting. Finally, it is reasonable in an educational context to raise the issue as to whether the transitions between phases can be stimulated or encouraged. Given our lack of understanding of how these transitions operate, little can be said that might be helpful in this regard. Nevertheless, it does not seem unreasonable (neither is it anything new) to speculate that transitions can be facilitated by encouraging learners to utilize strategies consistent with the phase they are about to enter. However, the good teacher, as well as the good learner, will be aware that premature involvement of facts may be counterproductive. Let them be available in working memory, but let them enter into the learning process in their own due time. One final possibility is worth considering – namely, that phases and the transitions between them may be by-products of the learning process rather than an integral part. If such is the case, attempts to facilitate transitions per se will accomplish little.
Other Considerations
There appears to be sufficient rationale to support the notion that learning a complex body of knowledge – whether it be the type we learn in school, the compilation of life experiences, or the mastering of the skills inherent in a craft, trade, or profession – involves a series of phases during which the learning process is fundamentally different. It is usually assumed that these phases are organized in a linear manner, but it is possible that they may be organized in a hierarchical, spiral, and/or concentric manner as well (Wade, 1989). Earlier phases may be subsumed into subsequent phases (in much the same way as the developmental stages of Piaget), or new phases may exist side-by-side (as in the developmental stages of Jerome Bruner).4 The present analysis and review has focused on cognitive aspects of learning, but learning, especially the type of long-term learning being discussed, involves emotional, affective, and social aspects as well. The extent to which these various aspects of learning interrelate in a manner conducive or detrimental to learning may also vary as a function of the phase of learning. For example, an individual may begin studying a domain of knowledge with considerable enthusiasm and interest only to discover later that the domain was not what he or she originally expected, and the converse can exist as well (e.g., begin with low expectations and a dislike that becomes more positive as learning progresses). In closing, it must be cautioned that, although a phase analysis of learning is appealing in many ways, much more evidence is needed if the existence of phases is to be established in a scientifically valid manner. Some of the methodological concerns have been discussed in this article, but delineation of the
phases (with regard to both the number of phases that might be involved and the characteristics of each phase) must await future research. In the meantime, the realization that phases most likely exist in the learning of complex and potentially meaningful knowledge provides useful insights into the learning process (including a basis for explaining why certain variables affect learning in some situations but not in others). In teaching such knowledge, it also suggests that we should pay attention to the way the teaching/learning process changes as learning progresses.
Notes
1. Phases and levels are similar in that both involve equivalency of process across either various domains or within a specific domain, respectively. They differ in that an individual progresses through the same three phases whenever he or she acquires a new body of knowledge (e.g., physics vs. literature), but, once he or she moves from Level 1 to Level 2 in a particular domain (e.g., the American novel), returning to Level 1 in that domain is not possible.
2. Too frequently, rote learning and meaningful learning are pitted against one another in a good/bad or either/or manner. In reality, both play an important role in learning from instruction, for at times it is intelligent to memorize something by rote, especially in the present context where rote learning is a means to an end rather than necessarily being an end in itself.
3. See Norman (1978) for a detailed discussion of learning strategies, general characteristics, modes of testing, and transfer relevant to accretion, restructuring, and tuning.
4. The distinctions that Karmiloff-Smith (1984, 1986) makes among stages, phases, and levels are also relevant here.
References
Anderson, J. R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369–406.
Anderson, J. R. (1987). Skill acquisition: Compilation of weak-method problem solutions. Psychological Review, 94, 192–210.
Andre, T. (1986). Problem solving and education. In G. D. Phye & T. Andre (Eds.), Cognitive classroom learning: Understanding, thinking, and problem solving (pp. 169–204). Orlando, FL: Academic Press.
Baddeley, A. D. (1966). The influence of acoustic and semantic similarity on long-term memory for word-sequences. Quarterly Journal of Experimental Psychology, 18, 302–309.
Battig, W. F. (1968). Paired-associate learning. In T. R. Dixon & D. L. Horton (Eds.), Verbal behavior and general behavior theory (pp. 149–171). Englewood Cliffs, NJ: Prentice-Hall.
Benner, P. (1984). From novice to expert: Excellence and power in clinical nursing practice. New York: Addison-Wesley.
Bereiter, C. (1989, March). The role of an educational learning theory: Explaining difficult learning. In W. J. McKeachie (Chair), Toward a unified approach to learning as a multisource phenomenon. Symposium conducted at the meeting of the American Educational Research Association, San Francisco.
Brainerd, C. J. (1985). Model-based approaches to storage and retrieval development. In C. J. Brainerd & M. Pressley (Eds.), Basic processes in memory development: Progress in cognitive development research (pp. 143–207). New York: Springer-Verlag.
Brainerd, C. J., Howe, M. L., & Desrochers, A. (1982). The general theory of two-stage learning: A mathematical review with illustrations from memory development. Psychological Bulletin, 91, 634–665.
Bransford, J. D., & Franks, J. J. (1976). Toward a framework for understanding learning. In G. H. Bower (Ed.), Psychology of learning and motivation (Vol. 10, pp. 93–127). New York: Academic Press.
Bransford, J. D., & Johnson, M. K. (1972). Contextual prerequisites for understanding: Some investigations of comprehension and recall. Journal of Verbal Learning and Verbal Behavior, 11, 717–726.
Brown, A. L., Bransford, J. D., Ferrara, R. A., & Campione, J. C. (1983). Learning, remembering, and understanding. In P. H. Mussen (Ed.), Handbook of child psychology: Vol. III. Cognitive development (J. H. Flavell & E. M. Markman, Vol. Eds.) (4th ed., pp. 77–166). New York: John Wiley & Sons.
Bryan, W. L., & Harter, N. (1897). Studies in the physiology and psychology of the telegraphic language. Psychological Review, 4, 27–53.
Bryan, W. L., & Harter, N. (1899). Studies on the telegraphic language: The acquisition of a hierarchy of habits. Psychological Review, 6, 345–375.
Champagne, A. B., Klopfer, L. E., & Gunstone, R. F. (1982). Cognitive research and the design of science instruction. Educational Psychologist, 17, 31–53.
Chi, M. T. H. (1978). Knowledge structures and memory development. In R. S. Siegler (Ed.), Children’s thinking: What develops? (pp. 73–96). Hillsdale, NJ: Lawrence Erlbaum Associates.
Chi, M. T. H., Glaser, R., & Rees, E. (1982). Expertise in problem solving. In R. Sternberg (Ed.), Advances in the psychology of human intelligence (Vol. 1, pp. 7–75). Hillsdale, NJ: Lawrence Erlbaum Associates.
Chiesi, H. L., Spilich, G. J., & Voss, J. F. (1979). Acquisition of domain-related information in relation to high and low domain knowledge. Journal of Verbal Learning and Verbal Behavior, 18, 251–273.
Conrad, R. (1964). Acoustic confusions in immediate memory. British Journal of Psychology, 55, 75–83.
Dreyfus, H. L., & Dreyfus, S. E. (1986). Mind over machine: The power of human intuition and expertise in the era of the computer. New York: Free Press.
Fitts, P. M. (1962). Factors in complex skill training. In R. Glaser (Ed.), Training research and education (pp. 177–197). Pittsburgh: University of Pittsburgh Press.
Fitts, P. M. (1964). Perceptual-motor skill learning. In A. W. Melton (Ed.), Categories of human learning (pp. 243–285). New York: Academic Press.
Fleishman, E. A., & Hempel, W. E., Jr. (1954). Changes in factor structure of a complex psychomotor test as a function of practice. Psychometrika, 19, 239–252.
Fleishman, E. A., & Hempel, W. E., Jr. (1955). The relation between abilities and improvement with practice in a visual discrimination reaction task. Journal of Experimental Psychology, 49, 301–310.
Glaser, R., & Chi, M. T. H. (1988). Overview. In M. T. H. Chi, R. Glaser, & M. J. Farr (Eds.), The nature of expertise (pp. xv–xxviii). Hillsdale, NJ: Lawrence Erlbaum Associates.
Hayes, J. R. (1965). Problem topology and the solution process. Journal of Verbal Learning and Verbal Behavior, 4, 371–379.
Hayes, J. R. (1966). Memory, goals, and problem solving. In B. Kleinmuntz (Ed.), Problem solving: Research, method, and theory. New York: John Wiley & Sons.
Karmiloff-Smith, A. (1984). Children’s problem solving. In M. E. Lamb, A. L. Brown, & B. Rogoff (Eds.), Advances in developmental psychology (Vol. 3, pp. 39–90). Hillsdale, NJ: Lawrence Erlbaum Associates.
Karmiloff-Smith, A. (1986). Stage/structure versus phase/process in modelling linguistic and cognitive development. In I. Levin (Ed.), Stage and structure: Reopening the debate (pp. 164–190). Norwood, NJ: Ablex.
Keil, F. C. (1986). On the structure-dependent nature of stages of cognitive development. In I. Levin (Ed.), Stage and structure: Reopening the debate (pp. 144–163). Norwood, NJ: Ablex.
Kintsch, W., & Buschke, H. (1969). Homophones and synonyms in short-term memory. Journal of Experimental Psychology, 80, 403–407.
Labouvie, G. V., Frohring, W. R., Baltes, P. B., & Goulet, L. R. (1973). Changing relationship between recall performance and abilities as a function of stage of learning and timing of recall. Journal of Educational Psychology, 64, 191–198.
Lesgold, A., Rubinson, H., Feltovich, P., Glaser, R., Klopfer, D., & Wang, Y. (1988). Expertise in a complex skill: Diagnosing x-ray pictures. In M. T. H. Chi, R. Glaser, & M. J. Farr (Eds.), The nature of expertise (pp. 311–342). Hillsdale, NJ: Lawrence Erlbaum Associates.
Mayer, R. E. (1983). Thinking, problem solving, cognition. New York: W. H. Freeman.
McGuire, W. J. (1961). A multiprocess model for paired-associate learning. Journal of Experimental Psychology, 62, 335–347.
Norman, D. A. (1978). Notes toward a theory of complex learning. In A. M. Lesgold, J. W. Pellegrino, S. D. Fokkema, & R. Glaser (Eds.), Cognitive psychology and instruction (pp. 39–48). New York: Plenum Press.
Restle, T., & Davis, J. H. (1962). Success and speed of problem solving by individuals and groups. Psychological Review, 69, 520–536.
Rumelhart, D. E., & Norman, D. A. (1978). Accretion, tuning, and restructuring: Three modes of learning. In J. W. Cotton & R. L. Klatzky (Eds.), Semantic factors in cognition (pp. 37–53). Hillsdale, NJ: Lawrence Erlbaum Associates.
Shuell, T. J. (1986a). Cognitive conceptions of learning. Review of Educational Research, 56, 411–436.
Shuell, T. J. (1986b). Individual differences: Changing concepts in research and practice. American Journal of Education, 94, 356–377.
Shuell, T. J. (1990). Teaching and learning as problem solving. Theory into Practice, 29, 102–108.
Spiro, R. J., Coulson, R. L., Feltovich, P. J., & Anderson, D. K. (1988). Cognitive flexibility theory: Advanced knowledge acquisition in ill-structured domains (Tech. Rep. No. 5). Springfield, IL: Southern Illinois University School of Medicine, Conceptual Knowledge Research Project.
Sternberg, R. J. (1984). Mechanisms of cognitive development: A componential approach. In R. J. Sternberg (Ed.), Mechanisms of cognitive development (pp. 163–186). New York: W. H. Freeman.
Thomas, J. C. (1974). An analysis of behavior in the hobbits-orcs problem. Cognitive Psychology, 6, 257–269.
Underwood, B. J., Runquist, W. N., & Schulz, R. W. (1959). Response learning in paired-associate lists as a function of intralist similarity. Journal of Experimental Psychology, 58, 70–78.
Underwood, B. J., & Schulz, R. W. (1960). Meaningfulness and verbal learning. Philadelphia: Lippincott.
Voss, J. F., Greene, T. R., Post, T. A., & Penner, B. C. (1983). Problem-solving skill in the social sciences. In G. H. Bower (Ed.), The psychology of learning and motivation: Advances in research and theory (Vol. 17, pp. 165–213). New York: Academic Press.
Wade, L. (1989). General and domain-specific elements related to stages of learning. Unpublished manuscript, State University of New York at Buffalo.
Wallas, G. (1926). The art of thought. New York: Harcourt Brace Jovanovich.
41
Growth, Development, Learning, and Maturation as Factors in Curriculum and Teaching
William C. Trow
In bringing together the diverse aspects of educational psychology indicated by the title of this chapter, and in focusing them on instructional practice, it would seem appropriate first to look for the hypotheses that lie behind both research and practice. The ideas embodied in these hypotheses can then be noted in the implications of the research that has been done. The treatment of this chapter will therefore concentrate on learning and the curriculum since growth and development are reviewed in a separate issue of this journal.
Hypotheses for Research and for School Practice Affecting Teaching and the Curriculum
The basic idea that underlies all the others in modern education is to be found in John Dewey’s recurrent emphasis on the experience of the child as of central concern. Regard for his all-round development has shifted the emphasis from forcing him to acquire knowledge, to making provision for an inner living motivation (22). We cannot know what to teach, how to teach, or when to teach until we know whom we are teaching, and from what homes, groups, and cultures our students come, and until we can identify
Source: Review of Educational Research, XXI(3) (1951): 186–195.
and project their current needs against the future needs of a dynamic society (55). If these things are to be done, man in general, and the pupil in particular, must be viewed not as a free soul, a capricious creature, but as a dynamic, lawful system interacting with the energy systems of the environment. Anderson (3) contrasted the effects on one’s view of pupil capacity, motivation, practice, and transfer when man is viewed (a) as a “free soul” and (b) as a “dynamic lawful system.” American psychology supports the latter under which there have been identified two basic systems of behavior and of learning, the association system and the field system. If man is viewed as a dynamic lawful system, it means, first, that the individuals in a learning group must be perceived as a group of dynamically interactive personalities (27). Included in this perception will be what Snygg and Combs (69) called the “phenomenological” approach, which seeks to explain behavior on the basis of the behaver’s perception of himself and of the world, and deal with it accordingly. It means, second, that the objectives of education will include the development of attitudes toward other individuals and groups (50), but they must be “right” attitudes. Research is therefore needed on value concepts and their role and influence in social behavior (87). Modern education is actually reoriented toward human values (14), yet probably few teachers realize they are applying axiology when they try to answer a child’s, “Why do we have to?” If axiology is viewed as the science of preferential behavior (53), the way is open for the coalescence of value concepts with those derived from the study of personality and referred to as needs. The third implication of the view of man as a dynamic lawful system, then, is that the educator, in working out a curriculum to promote pupil learning, will be a student of the dynamics of normal personality. 
As expounded by Dollard and Miller (29) and others, knowledge of personality is derived from psychoanalysis, social psychology, and anthropology, from which areas light is being thrown on the nature of social motivation and drive, of fear and conflict, and on the cultural conditions of learning. Harsh and Shrickel (41) traced successive developmental stages which produce changes in motivation, ability, and learned adjustment patterns and which must be more adequately reflected in the school program. For example, Miller (56) found that rats learned to get relief from an electric shock by striking each other, and when alone they struck a celluloid doll. The rats, like the child in play who beats the “mamma doll,” revealed the Freudian mechanism of displaced aggression, as would the child restricted by too narrow curriculum demands. Kluckhohn and Murray (51) in one of the chapters in Personality: In Nature, Society and Culture, to which a number of authors have contributed, outlined their dynamic, organismic concept of personality. They see it as a continuity of functional forces and forms manifested thru sequences of organized integrative processes. Its functions, among others, are self-expression and the reduction of
tensions and conflicts thru social conformity, identification, and the creation of a design for living that permits periodic and harmonious appeasement of most of the needs, and gradual progression toward distant goals. If education is to profit from the empirical approach here indicated, as Bode (15) pointed out, it will have to accept Dewey’s view (28) and abandon its hope for truth as conformity to an alleged immutable cosmic order. Money (57) suggested a kind of compromise in view of the fact that the difference between private delusions, widely accepted beliefs, and scientific facts, is one of actual or possible validation. Since some beliefs, referred to as absolutes, have eased psychosomatic existence, may they not be considered as axioms that might be changed in content by common agreement; thus satisfying the philosophical need for stability and also the scientific need for change? In any case, scientific method has a long way to go in the social realm before it will be able to keep the thinking even of trained minds out of the mental channels of the child, the primitive, and the psychotic. These channels of thinking were mapped in detail by Werner (85) as follows, each with its scientific opposite: the syncretic to the discrete, the diffuse to the articulated; the indefinite to the definite, the rigid to the flexible, and the labile to the stable.
Learning and the Curriculum
Since the relation of mental hygiene and guidance to the curriculum and teaching is not included in this issue of the Review, suffice it to say with Wright (88) that two kinds of childhood needs should be differentiated: the normative as they are appreciated by the adult, and the psychological as felt by children. But since the child responds not to generalities but to specifics, it is important, if his basic needs are to provide effective motivation, that curriculum activities be cognitively well-structured so that the activities lead clearly to the goals sought. Blair (11) indicated the nature of the psychologically effective curriculum as one which makes provision for varying maturity and experience levels, gears learning activities to pupil needs and goals, provides units of experience that have structure and meaning for the pupil, and selects and appraises projected pupil activities in terms of their transfer value to life situations. Brownell (19), however, pointed to the discrepancy between learning experiments and improvement in teaching, and called for long-term studies of the learning process in the classroom. He contended (18) that less emphasis in initial learning should be placed on speed and accuracy in the product, and more on improvement of the process and the establishment of principles transferable to new procedures. Detailed applications of educational psychology to the processes of learning and instruction are to be found in the Forty-Ninth Yearbook of the
National Society for the Study of Education (58). In this volume, a number of authors who themselves have long been conducting research in this field interpreted their findings and those of their colleagues. Following chapters on the nature of learning and of motivation, a second section describes the ways children learn motor types of activities, concepts and generalizations, interests and attitudes, personal and social adjustments, esthetic responses, and the technics of problem solving. Implications of learning principles are then pointed out for the different school levels and for teaching procedures. English (31) presented a brief discussion of the nature of learning and a history of learning theory, while the most intensive and extensive treatment was given by Hilgard (45), who critically reviewed the current theories. Tolman (78) sought to explain the divergencies in theory by asserting that there is more than one kind of learning and listed six kinds including field expectancies, field cognition modes, drive discriminations and motor patterns. A detailed report of the psychological studies from which the theories are derived is to be found in the Annual Review of Psychology (70), the first issue of which appeared last year. In it there is a critical review of learning theory, with a 101-item bibliography by Melton, an analysis of problem-solving processes by Johnson, a report of growth and age changes especially in motor functions and mental abilities by Jones and Bayley, and a review of studies of assorted items from the field of educational psychology by Cronbach. Piaget (62) continuing his long series of studies on the genetics of mental functions traced four stages in the development of what he called the moral judgment of the child by analyzing children’s acceptance of the rules in the game of marbles. Gebhard (37) found that the attractiveness of an activity is determined not only by past experience of success but by the expectation of future success. 
Grace (39) concluded that verbal approval was more effective with the well-adjusted and emotionally stable. Postman (63), drawing from a 332-item bibliography, summarized the history and present status of the law of effect. In this connection, mention should be made of Thorndike’s Selected Writings from a Connectionist’s Psychology (76), in which studies published from 1913 to 1947 were chosen and arranged by the author himself including seven on learning. Postman (65) is the senior author of one study in which it was found that forgetting in the form of retroactive inhibition is smaller when a change of set (e.g., direction of association or type of logical relationship) is used. In another study (64) he concluded that the subject’s readiness for a particular type of test is a factor influencing set. A 136-item bibliography accompanies an article (36) on the measurement of transfer of training which concluded that various methods of measurement suggest that different functions are being measured. Not only is it time that some of these functions are isolated, but the conditions under which what is learned in school will be used when needed require still further study.
School experimentation is bedeviled by the almost insuperable difficulty of the control of innumerable variables. Some, however, have faced up to the task. Anderson (2) found little difference in the result when pupils were taught arithmetic by a “drill method” (connectionist) and a “meaning method” (gestalt), altho the latter proved superior, for pupils scoring high on ability but low on initial achievement, in improving transfer to different kinds of materials. A study of pupil interests (47) revealed that variations are determined largely by opportunities and incentives of the environment, which means that the learning factor is important; and since elementary pupils revealed more interest in school than high-school pupils, it was concluded that the schools can use their influence to create more satisfactory learning situations. Studies of learning when learner differences are extreme have important implications for the wide range of abilities in the schools. Cruickshank (25) found that mentally retarded boys were less competent than normal boys of the same mental age in solving arithmetic problems. Strauss and Lehtinen (72) summarized 20 years of outstanding research on brain-injured children, reviewing especially-devised test situations, the characteristic behavior found, and applications to learning situations. Gesell (38) added a collection of papers reporting significant aspects of his work at the Yale Clinic previous to his retirement. And Terman and Oden (74) published the fourth volume of the Genetic Studies of Genius – The Gifted Child Grows Up, a 25-year follow-up of the original superior group, with educational and other implications, among which is the conclusion that in spite of the environmentalist’s efforts, the hereditary hypothesis seems to stand up.
General Sources Based on Research Studies with Implications for the Curriculum and Teaching
Brickman (17) reviewed 16 texts and reference works in the field of educational psychology which were published during the years 1945–1948, and since then other texts have appeared including those by Beaumont and Macomber (8), Simpson (67), and Trow (79). Specific relationships of educational measurements and of a knowledge of individual differences to the curriculum were pointed out by Freeman (35) and Cook (24) respectively. Williams (86), after a survey of 223 titles, analyzed the approaches directed toward reducing and controlling intergroup tensions to which curriculum activities can make an important contribution. Growth factors influencing the curriculum have received detailed standard treatment by a number of authors including Breckenridge and Vincent (16) and Hurlock (48, 49). Rasey (66) brought out a unique volume which develops the implications of the whole-child concept and includes reports from the autobiographies of 1600 students with critic-teacher comments.
Olson (60) is the author of a significant report of research which has been continuing over a period of 20 years. The results of longitudinal studies are presented with important implications for elementary-school activities. Remedial reading, self-selection, and promotional policies, among other problems, receive attention. Beck in Human Growth (9) described the physiological changes accompanying adolescence as was done in the film by the same name. Havighurst (42) defined an interesting concept, the “developmental task,” as one which, if successfully accomplished, leads to happiness and success with later tasks, but if not results in unhappiness, disapproval by society, and difficulty with later tasks. At the elementary-school level, Averill (5) prepared a textbook for the whole period, and Forest (33) for students of early childhood education. Lee and Lee (52) emphasized the “integrative approach” in handling the elementary-school subjects, and Hildreth (44) developed the principles of organized and unified learning in harmony with behavioral development, interpreting them in terms of realistic life experiences. At the high-school level two rather unique documents have appeared. One of them (73) presents in pamphlet form discussions of a hypothetical workshop group relating learning to diverse viewpoints and to curriculum experiences. The other (43), the “Prairie City” studies of the development of personality, is full of implications for programs of character education. Mention should also be made of a selected list of 72 references on gifted children (82), and of the Yearbook of the National Society for the Study of Education on The Education of Exceptional Children (59).
Curriculum and Co-curriculum – What Shall Students Learn?
While curriculum revisions are predominantly in the field of social learning, among the others that might be selected for special mention is Cureton’s (26) work on physical fitness appraisal and guidance, which, along with other influences is likely to have the effect of improving the physical education program. Luchins and Luchins (54) carried forward the structural approach to the comprehension of spatial relationships, supplementing Wertheimer’s chapter on the area of a parallelogram which appeared in his Productive Thinking. Geometry teachers will find here suggestions for improving pupil comprehension. Meaning in arithmetic was studied by Van Engen (83) who favored what he called the concept of operational arithmetic, in which meanings are derived from acts or operations, in contrast with the social-meaning theory, the structural-meaning theory, and the nihilistic theory of numbers as meaningless symbols, which latter is perhaps far too prevalent.
A number of papers at the college level deal with the place of educational psychology in the training of teachers. Anderson (1, 4) defined the field of educational psychology, enumerating the contributions it can make to the education of teachers and indicated suggestions for content. Trow (80) carried the analysis further enunciating the objective as effective pupil participation, this to be attained in proportion to the extent to which the teacher can learn to structure the school environment, organize activities (curriculum), recognize proper objectives in terms of needs and values, and teach appropriate behavior (knowledge, skills, and attitudes). Details were elaborated by others: Blair analyzed the content of various texts (12) and pointed out what teachers should know about adolescence (13). Bruce indicated relationships with general psychology (21) and the importance of a knowledge of child development (20). Freeman (34) and Cook (23) emphasized the importance of a study of individual differences and of educational measurement (24) for the curriculum and for the education of teachers. The direct approach has been made to the high-school pupil in order to facilitate his social learning thru the medium of the textbook. Thorpe (77) discussed understanding ourselves and others, maintaining personal and social integrity, character and religion, and personality and the welfare of society. Smart and Smart (68) were primarily concerned with the feelings and attitudes of children and their parents, and Duvall (30) prepared a text for family living which includes chapters on personality development, family interrelationships, boy-girl relationships, and preparation for marriage. 
Weiner (84) reported on the prolonged preacademic curriculum of the Wayne County Training School and furnished a guide for the objective observation of preacademic achievements, a program designed for high-grade feeble-minded children, but which should be taken to heart by elementary teachers in regular schools. The important implications of play therapy for general educational activities were elaborated by Axline (6) and helpful hints for an elementary-school council were provided by O’Toole (61). The lush growth of group dynamics at Bethel and elsewhere forces the question of its place in the regular school situation where the teacher perforce has the roles both of leader and of resource person, as well as others. After doing background reading on group dynamics (10) and the psychodrama technic (40) developed by Moreno, the reader may wish to consider some of the implications of considering the class as a group (75, 81). From the social-work angle, group work has been going on for some time, and Strang (71) showed its relevance to schools and institutions of higher learning. Such efforts would seem appropriate since Bath (7) in a follow-up study found little to distinguish the winners from the non-winners of a junior high-school efficiency certificate in a good citizenship program of 20 years ago. Perhaps the most promising program for the future was initiated by the Horace Mann-Lincoln Institute of School Experimentation. If effective improvements are to be made in the curriculum they will be made
not so much as a consequence of what is done to teachers as of what they themselves do. This involves (46) research on the educational program thru cooperative teacher research and planning. More schoolwork than in the past will probably be carried on thru problem-centered group activities (32). This term designates an educational process by which teachers and students work cooperatively to solve problems related to the experiences, interests, and concerns of young people, in the process of which attitudes are structured and self-evaluation is encouraged.
Bibliography
1. Anderson, George Lester. “Educational Psychology and Teacher Education.” Journal of Educational Psychology 40: 275–84; May 1949.
2. Anderson, George Lester. “Quantitative Thinking as Developed Under Connectionist and Field Theories of Learning.” In Swenson, Esther; Anderson, G. Lester; and Stacey, Chalmers L. Learning Theory in School Situations. Minneapolis: University of Minnesota Press, 1949. 103 p.
3. Anderson, George Lester. “Theories of Behavior and Some Curriculum Issues.” Journal of Educational Psychology 39: 133–40; March 1948.
4. Anderson, George Lester. “What the Psychology of Learning Has To Contribute to the Education of the Teacher.” Journal of Educational Psychology 41: 362–65; October 1950.
5. Averill, Lawrence A. The Psychology of the Elementary School Child. New York: Longmans, Green and Co., 1949. 459 p.
6. Axline, Virginia Mae. Play Therapy: The Inner Dynamics of Childhood. Boston: Houghton Mifflin Co., 1947. 379 p.
7. Bath, John A. “A Study of Selected Participants and Non-Participants in a Program Directed Toward the Development of Initiative and Good Citizenship.” Journal of Experimental Education 16: 161–75; March 1948.
8. Beaumont, Henry, and Macomber, Freeman G. Psychological Factors in Education. New York: McGraw-Hill Book Co., 1949. 318 p.
9. Beck, Lester F. Human Growth. New York: Harcourt, Brace and Co., 1949. 124 p.
10. Benne, Kenneth D.; Bradford, Leland P.; and Lippitt, Ronald. Group Dynamics and Social Action. New York: Anti-Defamation League of B’nai B’rith, 1950. 62 p.
11. Blair, Glenn M. “How Learning Theory Is Related to Curriculum Organization.” Journal of Educational Psychology 39: 161–66; March 1948.
12. Blair, Glenn M. “The Content of Educational Psychology.” Journal of Educational Psychology 40: 267–74; May 1949.
13. Blair, Glenn M. “What Teachers Should Know About the Psychology of Adolescence.” Journal of Educational Psychology 41: 356–61; October 1950.
14. Bode, Boyd H., and others. Modern Education and Human Values. Pittsburgh: University of Pittsburgh Press, 1947. 165 p.
15. Bode, Boyd H. “John Dewey’s Philosophy of Education.” New Republic 121: 10–39; October 1949.
16. Breckenridge, Marian E., and Vincent, Elizabeth Lee. Child Development. Revised edition. Philadelphia: Saunders Co., 1943. 1949. 622 p.
17. Brickman, William W. “Educational Psychology: A Review.” School and Society 68: 218–23; September 1948.
18. Brownell, William A. “Criteria of Learning in Educational Research.” Journal of Educational Psychology 39: 170–82; March 1948.
19. Brownell, William A. “Learning Theory and Educational Practice.” Journal of Educational Research 41: 481–97; March 1948.
20. Bruce, William F. “How Can the Psychology of Development in Infancy and Childhood Help Teachers?” Journal of Educational Psychology 41: 348–55; October 1950.
21. Bruce, William F. “The Relations of Educational Psychology with General Psychology.” Journal of Educational Psychology 40: 261–66; May 1949.
22. Caswell, Hollis L. “Influence of John Dewey on the Curriculum of American Schools.” Teachers College Record 51: 144–46; December 1949.
23. Cook, Walter W. “Individual Differences and Curriculum Practice.” Journal of Educational Psychology 39: 141–48; March 1948.
24. Cook, Walter W. “What Educational Measurement in the Education of Teachers?” Journal of Educational Psychology 41: 339–47; October 1950.
25. Cruickshank, William M. “Arithmetic Ability of Mentally Retarded Children.” Journal of Educational Research 42: 161–70, 279–88; November–December 1948.
26. Cureton, Thomas K. Physical Fitness Appraisal and Guidance. St. Louis: C. V. Mosby Co., 1947. 566 p.
27. Dennis, Wayne, editor. Current Trends in Social Psychology. Pittsburgh: University of Pittsburgh Press, 1948. 299 p.
28. Dewey, John. Reconstruction in Philosophy. Enlarged edition. Boston: Houghton Mifflin Co., 1948. 224 p.
29. Dollard, John, and Miller, Neal E. Personality and Psychotherapy. New York: McGraw-Hill Book Co., 1950. 488 p.
30. Duvall, Evelyn Millis. Family Living. New York: Macmillan Co., 1950. 410 p.
31. English, Horace B. Learning as Psychotechnology. Columbus: Ohio State University, 1949. 81 p.
32. Evans, Hubert M., editor. “The Problem-Centered Group and Personal-Social Problems of Young People.” Teachers College Record 51: 438–59; April 1950.
33. Forest, Ilse. Early Years at School. New York: McGraw-Hill Book Co., 1949. 381 p.
34. Freeman, Frank S. “The Study of Individual Differences in the Education of Teachers.” Journal of Educational Psychology 41: 366–72; October 1950.
35. Freeman, Frank S. “How the Curriculum Is Evaluated and Modified Through Educational Measurement.” Journal of Educational Psychology 39: 167–69; March 1948.
36. Gagne, Robert M.; Foster, Harriet; and Crowley, Miriam E. “The Measurement of Transfer of Training.” Psychological Bulletin 45: 97–130; March 1948.
37. Gebhard, Mildred E. “The Effect of Success and Failure upon the Attractiveness of Activities as a Function of Experience, Expectation, and Need.” Journal of Experimental Psychology 38: 371–88; August 1948.
38. Gesell, Arnold. Studies in Child Development. New York: Harper and Brothers, 1948. 224 p.
39. Grace, Gloria Lauer. “The Relation of Personality Characteristics and Responses to Verbal Approval in a Learning Task.” Genetic Psychology Monographs 37: 73–99; 1948.
40. Grambs, Jean E. “Dynamics of Psychodrama in the Teaching Situation.” Sociatry 1: 383–99; March 1948.
41. Harsh, Charles M., and Schrickel, H. G. Personality, Development and Assessment. New York: The Ronald Press Co., 1950. 518 p.
42. Havighurst, Robert J. Developmental Tasks and Education. Chicago: University of Chicago Press, 1948. 86 p.
43. Havighurst, Robert J., and Taba, Hilda. Adolescent Character and Personality. New York: John Wiley and Sons, 1949. 315 p.
44. Hildreth, Gertrude. Child Growth Through Education. New York: The Ronald Press Co., 1948. 437 p.
45. Hilgard, Ernest R. Theories of Learning. New York: Appleton-Century-Crofts, 1948. 409 p.
46. Horace Mann-Lincoln Institute of School Experimentation. “The Social-Cultural Context of the School Program.” Teachers College Record 49: 325–29; February 1948.
47. Horace Mann-Lincoln Institute of School Experimentation. “Child Development and the Curriculum.” Teachers College Record 49: 314–24; February 1948.
48. Hurlock, Elizabeth B. Adolescent Development. New York: McGraw-Hill Book Co., 1949. 566 p.
49. Hurlock, Elizabeth B. Child Development. Second edition. New York: McGraw-Hill Book Co., 1950. 669 p.
50. John Dewey Society. Intercultural Attitudes in the Making. Ninth Yearbook. New York: Harper and Brothers, 1947. 246 p.
51. Kluckhohn, Clyde, and Murray, Henry A., editors. Personality: In Nature, Society, and Culture. New York: Alfred A. Knopf, 1948. 561 p.
52. Lee, Jonathan Murray, and Lee, Doris May. The Child and His Curriculum. Second edition. New York: Appleton-Century-Crofts, 1950. 710 p.
53. Lepley, Ray, editor. Value: A Cooperative Inquiry. New York: Columbia University Press, 1949. 487 p.
54. Luchins, Abraham S., and Luchins, Edith H. “A Structural Approach to the Teaching of the Concept of Area in Intuitive Geometry.” Journal of Educational Research 40: 528–33; March 1947.
55. MacLean, Malcolm S. “Adolescent Needs and Building the Curriculum.” Trends in Student Personnel Work. (Edited by E. G. Williamson.) Minneapolis: University of Minnesota Press, 1949. p. 27–39.
56. Miller, Neal E. “Theory and Experiment Relating Psychoanalytic Displacement to Stimulus-Response Generalization.” Journal of Abnormal and Social Psychology 43: 155–78; April 1948.
57. Money, John. “Delusion, Belief, and Fact.” Psychiatry 11: 33–38; February 1948.
58. National Society for the Study of Education. Learning and Instruction. Forty-Ninth Yearbook, Part I. Chicago: University of Chicago Press, 1950. 352 p.
59. National Society for the Study of Education. The Education of Exceptional Children. Forty-Ninth Yearbook, Part II. Chicago: University of Chicago Press, 1950. 350 p.
60. Olson, Willard C. Child Development. Boston: D. C. Heath and Co., 1949. 417 p.
61. O’Toole, John F., Jr. “A Study of the Elementary School Student Council.” Elementary School Journal 50: 259–67; January 1950.
62. Piaget, Jean. The Moral Judgment of the Child. Glencoe, Ill.: Free Press, 1948. 418 p.
63. Postman, Leo J. “The History and Present Status of the Law of Effect.” Psychological Bulletin 44: 489–563; November 1947.
64. Postman, Leo J., and Jenkins, William O. “An Experimental Analysis of Set in Rote Learning.” Journal of Experimental Psychology 38: 683–89; December 1948.
65. Postman, Leo J., and Postman, Dorothy L. “Change in Set as a Determinant of Retroactive Inhibition.” American Journal of Psychology 61: 236–42; April 1948.
66. Rasey, Marie I. Toward Maturity. New York: Hinds, Hayden, and Eldredge, 1947. 242 p.
67. Simpson, Robert G. Fundamentals of Educational Psychology. New York: J. B. Lippincott Co., 1949. 380 p.
68. Smart, Mollie S., and Smart, Russel C. Living and Learning with Children. Boston: Houghton Mifflin Co., 1949. 271 p.
69. Snygg, Donald, and Combs, Arthur W. Individual Behavior, A New Frame of Reference. New York: Harper and Brothers, 1948. 386 p.
70. Stone, Calvin P., editor. Annual Review of Psychology. Stanford, Calif.: Annual Reviews, 1950. 330 p.
71. Strang, Ruth. “Group Work in Schools and Institutions of Higher Learning.” A Decade of Group Work. (Edited by Charles E. Hendry.) New York: Association Press, 1948. p. 95–104.
72. Strauss, Alfred A., and Lehtinen, Laura E. Psychopathology and Education of the Brain-Injured Child. New York: Grune and Stratton, 1947. 220 p.
73. Sugarman, Myrtle F., editor. Effective Learning for Use in Junior High School. Denver Public Schools, 1949. 72 p.
74. Terman, Lewis M., and Oden, Melita H. The Gifted Child Grows Up. Stanford, Calif.: Stanford University Press, 1947. 448 p.
75. Thelen, Herbert A. “Human Dynamics in the Classroom.” Journal of Social Issues 6: 30–55; 1950.
76. Thorndike, Edward L. Selected Writings from a Connectionist’s Psychology. New York: Appleton-Century-Crofts, 1949. 370 p.
77. Thorpe, Louis P. Personality and Youth. Dubuque, Iowa: William C. Brown Co., 1949. 378 p.
78. Tolman, Edward C. “There Is More than One Kind of Learning.” Psychological Review 56: 144–55; May 1949.
79. Trow, William Clark. Educational Psychology. Revised edition. Boston: Houghton Mifflin Co., 1950. 761 p.
80. Trow, William Clark. “Educational Psychology Charts a Course.” Journal of Educational Psychology 40: 285–94; May 1949.
81. Trow, William Clark, and others. “Psychology of Group Behavior.” Journal of Educational Psychology 41: 322–38; October 1950.
82. U.S. Office of Education. “Selected References on Gifted Children.” Understanding the Child 17: 56–64; April 1948.
83. Van Engen, Henry. “An Analysis of Meaning in Arithmetic.” Elementary School Journal 49: 321–29, 395–400; February–March 1949.
84. Weiner, Bluma Beryl. “The Use of Systematic Classroom Observation To Aid in Curriculum Planning and Guidance for Young Mentally Retarded Boys.” American Journal of Mental Deficiency 52: 331–36; April 1948.
85. Werner, Heinz. Comparative Psychology of Mental Development. Revised edition. Chicago: Follett Publishing Co., 1948. 564 p.
86. Williams, Robin M., Jr. The Reduction of Intergroup Tensions: A Survey of Research on Problems of Ethnic, Racial, and Religious Group Relations. Social Science Research Council Bulletin No. 57, 1947. 153 p.
87. Woodruff, Asahel D. “Motivation Theory and Educational Practice.” Journal of Educational Psychology 40: 33–40; January 1949.
88. Wright, Herbert F. “How the Psychology of Motivation Is Related to Curriculum Development.” Journal of Educational Psychology 39: 149–56; March 1948.
Section III: Motivation
42 Maslow, Monkeys and Motivation Theory
Dallas Cullen
Source: Organization, 4(3) (1997): 355–373.
The influence of Abraham Maslow’s (1943) hierarchy of needs is ubiquitous in management education and theory. Despite the common belief that Maslow’s theory is outdated and ignored (see, for example, Greiner, 1992: 61), current textbooks present the theory in approving terms. It is described as the ‘most widely recognized theory of motivation’ (Hellriegel et al., 1995: 174), the ‘most well-known need theory’ (Moorhead and Griffin, 1995: 83) and a ‘classic paper’ (Luthans, 1995: 150). At the level of theory, Maslow’s hierarchy is so pervasive that it has almost become invisible, in that its basic framework and concepts are accepted without question. Managerial practices that permit or encourage employee autonomy and personal growth are justified on the grounds that such practices will enable employees to satisfy the esteem and self-actualization needs in Maslow’s hierarchy. Some of the recent literature on employee empowerment, for example, suggests that only those employees who value higher order needs such as personal growth will respond positively to being given greater autonomy in their work (Lawler, 1992: 83). This argument both follows from and incorporates the concept of ‘growth need strength’ in Hackman and Oldham’s (1980) theory of job design. In turn, the theory of job design draws on expectancy theory, which itself incorporates the needs in Maslow’s hierarchy (Porter and Lawler, 1968: 131). These links are not in any way hidden, but the question arises of why there is this acceptance of Maslow’s theory as the starting point, given that this acceptance coexists with the recognition that there is, at best, limited empirical evidence for the hierarchy (see, for example, Mitchell and Moudgill, 1976; Wahba
and Bridwell, 1976). Indeed, Maslow himself wrote in his journal in 1962 that ‘My motivation theory was published 20 years ago, and in all that time nobody repeated it, or tested it, or really analyzed it or criticized it. They just used it, swallowed it whole with only the most minor modifications’ (Lowry, 1982: 63). In terms of research, the usual explanation is that the theory has not been disconfirmed; rather, research has not yet supported it, because of methodological problems in interpreting, operationalizing and measuring its concepts (Wahba and Bridwell, 1976: 235). The textbook explanations for including the needs hierarchy are that it ‘implicitly states the goals people value’ (Hellriegel et al., 1995: 176), makes managers ‘aware of the diverse needs of employees at work’ (Luthans, 1995: 152) and ‘makes a certain amount of intuitive sense’ (Moorhead and Griffin, 1995: 85). Because the hierarchy seems to describe what the average employee seeks, it gives management a simple and quick means of understanding differences or changes in employee motivation (Huczynski, 1993: 24). These explanations, however, beg the question. Why are there problems in interpreting the hierarchy’s central concepts? Why does it make intuitive sense? One answer to these questions lies in recognizing that Maslow’s humanistic psychology contains both a ‘democratic’ premise, which emphasizes authenticity, self-fulfillment and respect for the choices, preferences and values of each individual, and an ‘aristocratic’ premise, which emphasizes vocational competence, self-criticism and deference to the choices, preferences and values of the self-actualizing elite (Aron, 1977). The presence of these contradictory premises might then lead to difficulties in measuring or implementing the hierarchy’s concepts. But why does Maslow’s theory contain these contradictory premises? One explanation has focussed on the hierarchy’s links to liberal democratic theory. 
For example, Buss (1979) argues that Maslow’s humanistic psychology was a liberal reaction to the conservatism embodied in both the positivistic methodology of behaviorist psychology and the pessimistic determinism of Freudian psychoanalysis. The democratic premise of individual autonomy and self-fulfillment in Maslow’s hierarchy is the psychological counterpart of liberal democratic theory’s stress on individual rights. However, in modern liberal democratic societies control belongs to a small and powerful elite rather than the masses, and the aristocratic premise reflects this reality. Consequently, Buss believes, the contradictions in Maslow’s theory are real rather than conceptual, since the theory is grounded in the historical and social realities of the growth of democratic elitism. Shaw and Colimore (1988) take this analysis one step further. Given the close link between politics and economics, they argue, Maslow’s theory is best understood as an affirmation of capitalist ideology. Since growth comes through vocational achievement, Maslow’s theory glorifies individual initiative for personal gain. At the same time, its hierarchical structure justifies the class system found in capitalistic societies, treating this system as both inescapable and beneficial.
While Buss believed that the needs hierarchy was a reflection of Maslow’s own liberal values, Shaw and Colimore (1988: 69) were ‘stunned’ by the ‘non-liberating quality’ of Maslow’s contributions to the management literature. However, they, like Buss, conclude that Maslow’s contradictions are unintentional and unconscious reflections of the society of which he was a part. This rather benign assessment suggests that Maslow was unaware of the implications of his theory, an assessment that I believe reflects an incomplete understanding of the framework in which he developed the hierarchy. That framework was his own earlier empirical research on the importance of dominance in explaining non-human primate (that is, monkey and ape) and human behavior. Maslow’s dominance studies were a significant contribution to the primatology research of the 1930s and references to them occur throughout a variety of more recent literature in that field (see, for example, de Waal, 1996; Fedigan, 1992; Haraway, 1989; Rowell, 1974). Whereas writers on primatology who mention Maslow’s dominance research often note that he is most known for the needs hierarchy (Haraway, 1989: 409, who describes this fact as ‘ironic’; de Waal, 1996: 99), other writers’ assessments of Maslow’s work slight or ignore this connection. Neither Aron’s nor Buss’s discussions refer to the dominance research, while Shaw and Colimore (1988: 63), in noting Maslow’s ‘lifelong fascination with individual superiority and social dominance’, cite but do not discuss it. My focus in this paper is an analysis of the ways in which the dominance research was the foundation of self-actualization theory and the implications of the link between the dominance studies and the needs hierarchy. Such an analysis, I believe, suggests why the hierarchy is intuitively appealing and, consequently, why it remains so influential. 
As I will demonstrate, Maslow explained dominance in terms of the characteristics of the individuals involved, not in terms of the attributes of either the interaction between them or the setting in which this interaction occurred. Instead, group organization or behavior was due to individual psychology. A given individual’s ability to be dominant over others was due to that individual’s acknowledged natural superiority, and differences in human or monkey groups and cultures occurred because of differences in the exercise of dominance by the individuals in those groups and cultures. As I will also demonstrate, the incorporation of these ideas into the needs hierarchy means that Maslow’s theory justifies managerial power, and enables managers to adopt motivational practices that appear to be responsive to employee needs while at the same time absolving them of accountability for the ineffectiveness of those practices. To set the context for this analysis, I begin with a description of Maslow’s career up to the point at which the needs hierarchy was published. I then discuss Maslow’s primate research and his study of dominance in humans and demonstrate how the findings from these studies form the basis for the needs hierarchy. I next move to the implications of the dominance
research for motivation theory, and finish with a discussion of the implications of a reexamination of Maslow’s monkey research in light of current primatological research.
Maslow’s Early Career
Maslow was the first doctoral student of Harry Harlow (Suomi and LeRoy, 1982: 341), who later gained fame for his work on monkeys raised by surrogate mothers (see, for example, Harlow, 1974). In ‘an effort to see who was more correct, Freud or Adler, sex or dominance’ (Lowry, 1982: 55), Maslow focussed his doctoral research on the relationship between sexual behavior and dominance behavior in monkeys. He completed his doctorate in 1934 and became a research assistant to Edward Thorndike in the Institute for Educational Research at Columbia University the following year. Thorndike had his assistants complete a variety of intellectual and scholastic aptitude tests; Maslow’s tested IQ was an ‘astounding’ 195 (Hoffman, 1988: 74). For Maslow, this was evidence of his ‘factual superiority’, which meant that ultimately he was correct in his observations, intuitions and conclusions (see, for example, Lowry, 1982: 122–3). For his part, Thorndike was so impressed that he permitted Maslow to study whatever he wanted, which allowed Maslow to extend his studies of dominance and sexuality to humans (Wilson, 1972: 141). Maslow collected most of these data during 1936 (Hoffman, 1988: 75, 80), the same year that his studies of dominance in monkeys were published. Between this time and the publication of the needs hierarchy in 1943, his intellectual interests and activities also included other disciplines. In 1937, he began teaching at Brooklyn College, where he co-authored a textbook on abnormal psychology (Maslow and Mittelmann, 1941) and, relying primarily on his intuition, provided informal therapy to students (Hoffman, 1988: 142). He wrote a book chapter on the influence of culture on personality (Maslow, 1937c), and, in the summer of 1938, following anthropologist Ruth Benedict’s suggestion, undertook fieldwork among the Blackfoot in southern Alberta (Hoffman, 1988: 114).
During this general time period, he also studied with the German psychoanalysts and Gestalt psychologists who had left Nazi Germany to work at the New School of Social Research (Hoffman, 1988: 87), including Kurt Goldstein, who had initially coined the term ‘self-actualization’ (Maslow, 1943: 382). He also continued to publish other papers based on his monkey studies (Maslow, 1937a; 1940a). While the studies of human dominance and sexuality were critical to the development of the needs hierarchy (Cullen, 1994; Lowry, 1973), for Maslow, ‘My primate research is the foundation upon which everything rests’ (Hoffman, 1988: 49). His confidence in the insights he gained from ‘my monkeys’ was in part due to his belief that this research involved a form of
‘loving perception’; it was ‘more “true”, more “accurate”, in a certain sense, more objectively true’ because he was both ‘fond of’ and ‘fascinated’ by the monkeys (Maslow, 1971: 17, emphasis in original). We turn now to what he learned from his monkeys.
Dominance in Monkeys
While the study of animal behavior is of value and interest in and of itself, its greater appeal for many people is its potential for teaching us about human behavior. This is particularly true of the study of non-human primates, especially the great apes (orangutans, chimpanzees and gorillas) and the Old World (Asian and African) monkeys such as macaques and baboons. One reason for this importance is that studies of non-human primates can give us insights into the behavior of our own earliest ancestors. How has human behavior evolved? What behaviors have led to, or ensured, our survival as a species? A second, but clearly related, reason is that non-human primate behavior can give us insights into behaviors that present-day humans share regardless of culture. In other words, monkeys and apes can tell us something about ‘true’ human nature, or, as Haraway (1986: 77) has observed, ‘what is “beneath”, “at the heart of”, or “outside”, our own behavior’. Maslow himself clearly believed that this was the case. As noted earlier, his dissertation research was an attempt to compare the relative importance of sex and dominance in explaining behavior. He reported, however, that his initial attempt to study this relationship was a ‘failure’, both because of the ‘complexity of the problem’ and because his ‘own personality and social norms acted like a sieve or a filter’ (Maslow, 1937a: 488). In order to achieve ‘impartiality and objectivity’ he turned to animal studies, which allowed him to develop ‘a specific objective criterion or scale by which to judge human behavior’ and to see relationships among dominance, sexuality and social behavior that were ‘less confused by repression, inhibition, social norms and social values’ (Maslow, 1937a: 489). Comparative research of this type, Maslow argued, was a means of developing insights into ‘general humanness’ (1937a: 487).
Given this potential significance, it is not surprising that primatology is a field of conflicting and contested data and interpretations, a field which is ‘politics by other means’ (Haraway, 1986). Conscious and unconscious social and political considerations influence what is observed about primate behavior; in turn, what is observed about primate behavior has social and political meaning. Consequently, each era of primatological research reflects the wider concerns of its time period (Haraway, 1989). Primatology in the 1930s, a time of political, social and economic turmoil, concentrated on the general themes of aggression and its control, cooperation and competition, and the means by which social order was maintained. Primatology researchers identified the
dominance hierarchy, which is, quite literally, the ‘pecking order’ among the members of a group, as that means of social control. To the researchers of the era, the dominance hierarchy was the ‘foundation of cooperation’, ensuring that the social order did not collapse into destructive competition (Haraway, 1978a: 33). Moreover, dominance was inextricably linked with sexuality: the potentially destructive competition amongst males was for access to sexually receptive females, who accepted a subordinate status in order to gain access to desired items (such as food) which the males’ greater size enabled them to control. This difference in size and subsequent subordination of females, it was argued, influenced or led to the creation of a patriarchal family unit (see, for example, Yerkes, 1939: 131). In other words, dominance was necessary and male dominance was natural. For those working within this paradigm, the issue was not the presence or absence of dominance, but rather its manifestations and underlying sources. In 1929, Yerkes and Yerkes emphasized that ‘Dominance and subordination are evident in every group of primates . . . Dominance may be by either sex, but dominance there must be’ (p. 250). Ten years later, in a review of the ‘social psychology’ of vertebrates, Crawford (1939: 418) observed that ‘exploration of the significance of the concept of dominance has hardly begun’, since knowledge of the connection between dominance relations and other social relations, as well as the factors that determined which animal was dominant, were still not fully clear. However, some answers were emerging. In his discussion of dominance in primates, Crawford focussed on two main studies: Maslow’s four papers and Zuckerman’s (1932) study of dominance in baboons. Zuckerman had argued that, because female primates (unlike other animals) are continually sexually receptive, and hence continually sexually active, male and female primates live in permanent groupings. 
The fighting and aggression inherent in this continual association is regulated through a dominance hierarchy in the form of a harem, in which a male ‘overlord’ controls access to as many females as he can, thus ensuring his own reproductive success. Zuckerman’s theory of females’ continual sexual receptivity is now considered to be an oversimplification (Fedigan, 1992: 158), but, at the time, it was enormously influential, helping to establish dominance as a fact rather than as a concept (Haraway, 1978b: 47). Maslow’s own papers refer to Zuckerman’s ‘excellent study’ which gives ‘a clear indication of the importance of the dominance principle in primate sociology’ (1936a: 261). At the same time, however, Maslow contended Zuckerman had ‘missed the full significance’ and ‘grossly underestimated the importance’ of dominance as a cause of social behavior (1936a: 262, 275). The constant sexual activity in monkeys, Maslow believed, occurred because of both the hormonal cycle and a dominance drive (1936b: 330), a conclusion that was based on his own observational and experimental studies.
During most of 1932 and the first few months of 1933, Maslow had made observations of small groups of monkeys housed at the Vilas Park Zoo in Madison, Wisconsin (Maslow, 1936a). An ‘experienced observer’ like Maslow was easily able to determine which monkey was dominant in a group: it ‘struts’ rather than ‘slinks’, has a ‘cocky, aggressive and confident air’, and ‘stares fixedly and ferociously’ at the other monkeys (Maslow, 1936a: 266). Maslow later labelled this stare the ‘Gaze’: it was a ‘look of command’ that is ‘level, unwavering, unyielding, even unself-conscious & spontaneous’, with which ‘the overlord just looked, & the other dropped his [sic] eyes as if he’d been mastered & admitted it’ (Lowry, 1982: 89, 30). From his observations, Maslow concluded (as had Zuckerman) that each group had a dominance hierarchy with one monkey who was the overlord. Dominance was related to size and age, but was not related to gender. Females could be dominant over both males and other females, and the behavior of dominant females was the same as that of dominant males. However, dominant females seemed to lose their dominance when they came into heat. These observations were followed in 1933–4 by the study of experimental pairings of macaque (rhesus) monkeys at the Primate Laboratory at the University of Wisconsin (Maslow, 1936b, 1936c; Maslow and Flanzbaum, 1936). In the pairings, monkeys which had been caged separately were brought together in another chamber for a varying number of brief experimental periods, during which all their behavior was recorded. Based on his analysis of these behaviors, Maslow (1936c: 183) decided that the best indicators of dominance were mounting (taking the male role in sexual behavior) and bullying, while the best indicators of subordinance were cringing and flight.
It was on this basis that he argued that there was a continuum of sexual behavior, with one end being sexual behavior motivated by ‘sexual drive’ and the other end being sexual behavior motivated by ‘dominance drive’, with the latter type being used as a ‘power weapon’ (Maslow, 1936b: 319, 336). Maslow also concluded that all monkeys have this dominance drive; a subordinate monkey is one ‘whose dominance has been overshadowed by greater dominance’ (1936a: 264). What, then, causes a given monkey’s ability to overshadow another? According to Maslow, dominance is ‘determined by or actually is a composite of social attitudes, attitudes of aggressiveness, confidence or cockiness that are at times challenged, and which must then, of course, be backed up by physical prowess’ (Maslow and Flanzbaum, 1936: 305). In the experimental pairings, dominance was usually established quite rapidly since ‘one animal seemed, in most cases, to assume at once that he [sic] was dominant, and that the other animal seemed, just as naturally, to admit that he was subordinate’ (Maslow and Flanzbaum, 1936: 303–4). The superiority of one was accepted by both.
The Nature of Dominance
The picture of dominance that emerges in Maslow’s monkey studies is an unpleasant one. In Zuckerman’s baboons and Maslow’s macaques, dominance was ‘rough, brutal and aggressive; it is of the nature of a powerful, persistent, selfish urge’ that resulted in bullying and fighting (Maslow, 1940a: 316). Furthermore, dominance status (that is, a given monkey’s position in the dominance hierarchy) was ‘jealously guarded and affirmed’ (Maslow, 1940a: 319). However, not all non-human primates expressed dominance in such a brutal manner. Among chimpanzees, who were considered to be the most sociable of the great apes (Yerkes and Yerkes, 1929: 557), dominance was ‘mostly of a friendly kind’ (Maslow, 1940a: 314). One dominant male exercised his dominance in a teasing and playful, rather than a vicious, way (Maslow, 1935: 57). Another dominant chimpanzee tolerated, or was even apparently amused by, displays of anger or aggression by a subordinate (Maslow, 1940a: 315–16). Maslow was the first researcher to call attention to these genuine differences in the exercise of dominance (de Waal, 1996: 126); what is important to the development of the needs hierarchy is the conclusion he drew about them. He made what he called the ‘far-reaching and important’ suggestion that differences in group behavior were based on differences in individual personality (Maslow, 1940a: 322). The manner in which dominant chimpanzees behaved, as compared to the manner in which dominant baboons and macaques behaved, led to differences in chimpanzee society as compared to baboon and macaque society. As a result, Maslow hypothesized that cultural (or sub-cultural) differences among humans might be based on differences in the manner in which dominant individuals in those cultures behaved. What, then, would lead a dominant human to behave like a benevolent chimpanzee rather than a despotic baboon? The answer, according to Maslow, was the individual’s sense of emotional security.
We turn now to the reasoning that led him to this conclusion. Maslow’s discussions of his human data reveal the same intermingling of dominance, sexuality and superiority as do his discussions of his monkey data. However, observing and measuring dominance in humans was more complicated than observing and measuring dominance in monkeys. For example, whereas a woman might dominate her husband, treating him with condescension, pity or aloofness, she might also be dominated by other people, an outcome which would not be apparent if one observed her only with her husband (Maslow, 1937b: 405–7). Consequently, Maslow focussed on people’s feeling of dominance, that is, the attitude which was analogous to the feeling of confidence he had observed in his dominant monkeys. He initially labelled this human attitude ‘dominance-feeling’ (Maslow, 1937b), but over time renamed it ‘ego-level’ (Maslow, 1939) and finally ‘self-esteem’ (Maslow, 1940b, 1942b), although he used the terms interchangeably.
This renaming was intended to avoid the power-seeking connotation of ‘dominance-feeling’ (Maslow, 1942b: 269). Whatever its name, the attitude was difficult to define, so he provided a list of ‘near-synonyms’ that dominant people used to describe their own feelings about themselves, including self-confidence, self-respect, forcefulness of personality, feelings that others do and ought to admire and respect one, and a consciousness of superiority in a general sense (Maslow, 1937b: 407). This feeling of superiority that high-dominance people experienced was a ‘calm, objective recognition of facts that exist’; when factual inferiority was recognized, however, it did not lead to feelings of inferiority (Maslow, 1937b: 420). The data on the relationship between self-esteem and sexual behavior were collected in intensive, unstructured interviews totalling, on average, about 15 hours with each subject (Maslow, 1940b: 257). Initially, Maslow interviewed both men and women, but he found that ‘the men were far more evasive and tended to lie, exaggerate, or distort their sexual experiences’, whereas women, once they had agreed to participate, were more open (Hoffman, 1988: 77). In addition, Maslow found that interviewing women ‘was more fun – illuminating for me, the nature of women, who were certainly, to a shy boy, still mysterious’ (Wilson, 1972: 157), or, as his biographer describes it, the 28-year-old Maslow ‘got a thrill of excitement interviewing the women’ (Hoffman, 1988: 77). Consequently, the study was limited to women. In all, Maslow interviewed about 140 women, practically all of whom were middle-class college women between the ages of 20 and 28 (Maslow, 1942b: 270). Initially, he recruited subjects through word of mouth, but found that most of these volunteers were moderate to high in self-esteem (Maslow, 1937b: 418).
In order to find more low self-esteem women, he developed a test of self-esteem (Maslow, 1940b) which he used to identify potential subjects; he then persuaded these women to ‘decide to submit to interview’ (Maslow, 1942b: 266). Based on the information Maslow elicited in the interviews, he assigned each woman a score on a scale of self-esteem. He also scored both her attitude toward sex and her sex drive, with the latter rating based on such factors as the ease, intensity and frequency of climax in heterosexual acts and the number of everyday stimuli which were sexually arousing (Maslow, 1942b: 264). Although he calculated correlations among these scores, Maslow relied more on the qualitative relationships, the ‘relationships as they impressed the experimenter’ (Maslow, 1942b: 272), to draw his conclusion that a woman’s sexual attitudes and behavior were more closely related to her self-esteem than to her sex drive. Judging from the items in the self-esteem scale (Maslow, 1940b: 267–70), high self-esteem women are also male-identified. They prefer men for company in sports, intellectual activities and conversation, and consider most other women catty and petty. Not surprisingly, they dominate most of the
women of their own age that they know; they also dominate most of the men of their own age that they know. In more general terms, high self-esteem women are more independent, socially poised, extroverted, relaxed and unconventional than low self-esteem women, who are timid, shy, modest, neat and retiring (Maslow, 1942b: 261). Low self-esteem women, however, were more honest than high self-esteem women (Maslow, 1942b: 261). Maslow does not seem to have fully realized the implications of this difference. He clearly believed that, since he had established good rapport and stressed the importance of telling the truth, his subjects were completely frank (Maslow, 1939: 5) and, as a result, his ratings were valid. Consequently, the possibility that his rating of self-esteem was simply a measure of a woman’s willingness to discuss sex with him, or that high self-esteem, male-identified women might have lied, exaggerated and distorted their sexual experiences (as did men) does not appear to have seriously influenced his interpretation of his results. Instead, he concluded that high self-esteem women were psychologically free and more natural, whereas low self-esteem women were inhibited and over-socialized (Maslow, 1939: 32).1 Maslow also concluded that among humans, and women in particular, dominance behavior is affected and inhibited by local and general cultural pressures and socialization as well as the specific situation (Maslow, 1939: 4); as a result, high self-esteem people might not always demonstrate their dominance. At the same time, however, being dominant is not always a sign of high self-esteem, since some people compensate for their low self-esteem by acting in a dominant way. Thus, Maslow had to differentiate between ‘true’ and ‘compensatory’ dominance behavior. Compensatory dominance behavior gives the impression of being ‘strained and unnatural . . .
aggressive and louder than seems to be appropriate to the situation’ (Maslow, 1937b: 418), and occurs in people who have a ‘great craving’ for dominance, people who ‘feel weak but wish to appear strong’ (Maslow, 1937b: 422, 417). This craving occurs because of emotional or psychological insecurity. While insecure people feel isolated and rejected, are suspicious of others and crave power and status, secure people feel liked or loved, trust others and have a feeling of strength (Maslow, 1942a: 334–5). Maslow’s sense of the significance of emotional security had been reinforced by his experiences among the Blackfoot in 1938. Whereas the average person in North American society in general tended to be insecure (Maslow, 1942b: 269), Maslow saw the Blackfoot as very emotionally secure, a condition that he believed was due to the adults’ emphasis on instilling a sense of personal responsibility in children (Hoffman, 1988: 125). In addition, there was an almost perfect correlation between wealth and ability among ‘my Blackfoot Indians’ (Maslow, 1965: 137). Because of their emotional security, wealthy people were generous, giving away the goods that their ability had enabled them to acquire, and, as a result, they were admired and
loved by others (Maslow, 1971: 204). Consequently, in Maslow’s perception (which may or may not have been accurate), dominance in Blackfoot society had positive connotations: capable people, appropriately rewarded for their capability, benefitted their society as a whole. Much as the less dominant in chimpanzee society had no reason to fear the more dominant, the less dominant in Blackfoot society had no reason to fear the more dominant. Indeed, they had every reason to praise and support them. The Blackfoot were also important for another reason. Maslow was originally a cultural relativist, arguing that ‘we must treat the individual first as a member of a particular cultural group, and only after this can we attempt to treat him [sic] as a member of the general human species’ (Maslow, 1937c: 409). However, he came to feel that ‘my Indians were first human beings and secondly Blackfoot Indians’ (Hoffman, 1988: 128, emphasis in the original). It was this feeling that led him to the concept of a ‘fundamental’ or ‘natural’ personality structure (Hoffman, 1988: 128), in other words, the universal theory of human motivation found in the needs hierarchy. In that hierarchy, emotional security is achieved through satisfaction of the love or belongingness needs. Satisfaction of the esteem needs comes through a willing and genuine acceptance of a given individual’s factual superiorities by both that individual and others. Hence, only the emotionally secure superior individual will develop the need to self-actualize. Self-actualization means taking one’s place among the elite who ‘enjoy responsibility’ and who are ‘parental or fatherly . . . stern as well as loving’ (Maslow, 1965: 131). 
A self-actualizer, because of his or her ‘deep feeling of identification, sympathy and affection’ for humanity in general, has a ‘genuine desire to help’ those ‘creatures whom he [sic] must regard with, if not condescension, at least the knowledge that he can do many things better than they can, that he can see things that they cannot see’ (Maslow, 1954: 217). The self-actualizing elite are able to provide this help through their clearer perception of reality, which makes their judgments a ‘partial basis for a true science of values, and consequently of ethics, social relations, politics, religion, etc.’ (Maslow, 1954: 204). Most important, only self-actualizers are ‘fully human’ (see, for example, Hoffman, 1996: 70).
Dominance and Motivation Theory
Haraway (1978a: 21) has observed that the primatology of the 1930s represented a union between the political and the physiological (see also Sperling, 1991); Maslow’s needs hierarchy is the psychological outcome of that union. The psychological is based not just on the political but also on the physiological. This is explicit in the lowest of the needs in the hierarchy, the physiological needs, but also in the concept that the hierarchy itself is innate, and hence physiological or biological.
The assumption that the needs hierarchy is innate leads to both the democratic and aristocratic premises in Maslow’s theory. An innate hierarchy of needs means that all people possess or have these needs: they are universal, ahistorical and not linked to gender, class or culture. At the same time, however, the biological basis of the needs hierarchy leads to its aristocratic premise. Just as only some people have the biological potential to become extremely tall while others do not, only some people have the biological potential to self-actualize while others do not. Whether or not the individual is able to develop this potential to self-actualize depends on the type of environment in which she or he lives. For Maslow, the ‘good’ society is one in which the ‘biological elite’ are given the opportunity to develop their superiority, but are protected from the ‘almost inevitable malice of the biologically nongifted’ (Hoffman, 1996: 71), who cannot accept the reality that their inferiority is a matter of biological chance. This opportunity for the elite to develop their potential is particularly crucial in organizations, because of the commitment to work in self-actualizing people’s lives: ‘These highly evolved individuals assimilate their work into the identity, into the self, i.e., work actually becomes part of the self, part of the individual’s definition of himself [sic]’ (Maslow, 1965: 1). Self-actualizers are the living embodiment of the Protestant work ethic, in that ‘Salvation Is a By-Product of Self-Actualizing Work and Self-Actualizing Duty’ (Maslow, 1965: 6, capitals and emphasis in the original). Salvation is also available to the masses since proper management of the ways in which people work and earn their living ‘can improve them and improve the world and in this sense be a utopian or revolutionary technique’ (Maslow, 1965: 1).
There are, however, limits to what proper management can achieve, in that it will be successful, and is appropriate, only when employees are already developed (Maslow, 1965: 15–33). In other words, there are limits to what even enlightened management can achieve if employees are incapable of growth. One reason for the intuitive appeal of Maslow’s theory is now apparent. In a ‘good’ organization, the truly superior will be able to rise to the upper levels of that organization, while the inferior will properly remain in the lower levels. Managers are entitled to their positions since they deserve them, just as subordinates deserve their positions. If managers have achieved their positions because of their recognized genuine superiority, then their dominance can be assumed to be ‘of the “chimpanzee” sort, older-brotherly, responsible, affectionate, etc.’ (Maslow, 1965: 18). By responsibly and affectionately directing the activities of subordinates, managers are enabling those subordinates to grow and develop. At the same time, however, a subordinate’s level of growth and development is ultimately dependent not on the manager’s behavior but on the limited potential that that subordinate was born with. Thus, Maslow’s hierarchy justifies managerial power in organizations while minimizing managerial accountability for what occurs in those organizations.2
The incorporation of Maslow’s needs hierarchy into other motivational theories extends these effects, while at the same time reinforcing the apparent validity of Maslow’s theory. Many textbooks (see, for example, Hellriegel et al., 1995: 187, and Luthans, 1995: 154) explicitly equate the lower and higher needs in Maslow’s hierarchy with, respectively, the hygiene factors and motivators in Herzberg’s (1966) two-factor theory. While Herzberg (1982: 292–3) considered this comparison an act of ‘creativity’ on the part of textbook writers (who, according to Herzberg, are compelled to provide personal input when they write about other people’s theories), the comparison gives credence to both Maslow and Herzberg. Similarly, Porter and Lawler’s (1968) expectancy theory uses a modification of Maslow’s theory to specify the needs that determine which rewards for effective performance will be satisfying, valued and hence lead to greater future effort. Expectancy theory is an application of cognitive psychology, in that it explains behavior in terms of the individual’s perception of environmental events (that is, in terms of whether or not an environmental event is felt to satisfy a need) rather than in terms of the environment itself. However, cognitive psychology, like Maslow’s theory itself, is psychology as ideology. Focussing on the individual’s subjective reactions to external events deflects an examination of those external events and thus serves to perpetuate the status quo and the interests of the powerful (Sampson, 1981), in much the same way as does attributing differences in power to biological inevitability. The commonality between the biological and cognitive approaches is perhaps best illustrated by the moderating variable of growth need strength in Hackman and Oldham’s (1980) job design application of expectancy theory. Some people, they observe, have a strong need for self-development, while others do not. 
The latter ‘may not recognize the existence of such opportunities [for self-development], or may not value them, or may even find them threatening and balk at being “pushed” or stretched too far by their work’ (Hackman and Oldham, 1980: 85). Similarly, Lawler (1992: 83) cautions that employees who do not value the higher-order needs of achievement, competence and personal growth will be frustrated rather than motivated by the work structure of involvement-oriented organizations. Again, those who cannot or will not grow will resist enlightened management’s attempts to enable them to do so.
Primatology and Motivation Theory
The fundamental problem with motivation theory’s use of Maslow’s hierarchy is not necessarily the concept of dominance as such since, in both human and monkey societies, some individuals are able to dominate other individuals. Indeed, the study of the maintenance of patterns of dominance and subordinance is an essential feature of the analysis of organizations.
Motivation
Nor is the problem necessarily the hierarchy’s basis in primatology data, since management theory has relied on insights drawn from other animal studies, as the example of organizational behavior modification illustrates. Skinner (see, for example, 1953) developed the principles of reinforcement theory in studies of rats and pigeons, and then applied these principles to understanding human behavior. Rather, the issue is the nature of the animal data on which Maslow based his understanding of dominance. His assumption that it was women’s self-esteem that enabled them to dominate others was based on his earlier conclusion that his monkeys’ confidence had enabled them to dominate others. However, his monkeys were both caged and isolated from one another except for the brief experimental periods they spent together. The methodology of Maslow’s monkey research led him to an individualistic conception of dominance, a conception which minimizes, if not ignores, the impact of the social setting and environment on the relationship between the more dominant and the less dominant. If we rely on a theory based on animal data that was collected more than 60 years ago, we are obligated to consider the accuracy and validity of that data. While Maslow assumed that ‘the behavior of caged animals differs in no fundamental way’ from that of uncaged animals, and that the behavior of animals in the zoo or laboratory was not ‘abnormal or perverted’ (1936a: 268, emphasis in original), we need to consider whether or not this is the case. In recent years, a major source of primatological data has been extended observations of free-living monkeys and apes in their natural habitats (for example, Goodall, 1990; Strum, 1987); as a consequence, primatologists’ understanding of dominance, aggression and competition has been significantly revised. The focus has shifted from explanations based on a given monkey’s size and physical strength to explanations based on that individual’s social skills.
Concurrently, there has been growing attention to the means by which monkeys use cooperation as a means of maintaining the social systems in which they live (Fedigan, 1992: xix–xx). Most baboons and macaques, for example, live in multi-male, multi-female groups in which there are separate dominance hierarchies for each sex. Baboon and macaque societies are essentially female-bonded groups, since mothers and daughters form the permanent basis of the troop; at puberty, the males leave their original group and move to a new one (Napier and Napier, 1985: 71). Female hierarchies are relatively stable and primarily based on kinship, with the status of the mother being transferred to her daughters. While females rely on family members to maintain their position, they also form alliances with non-relatives (see, for example, Chapais, 1992; Datta, 1992). In the adult male hierarchy, while rank is determined by age and fighting ability, it is also determined by the length of time a male has been in the group and by his ability to form and maintain alliances with other members
Cullen
Maslow, Monkeys and Motivation Theory
of the group. Dominant males are not isolated superiors, but rather are enmeshed in the ‘systems of social reciprocity they can actively construct’ (Strum, 1987: 152) through the exchange of social favors such as grooming or assistance in times of conflict. The formation of alliances or enduring cooperative relationships is a purposive act, involving complex social skills. Each individual monkey must recognize all of the others, be aware of and take into account its own relationship with each of those others, plus be aware of and take into account the relationships among the others (de Waal, 1982: 182) before it can begin to use those relationships for its own purposes. The experimental methods Maslow used did not permit him to see the social skills involved in establishing and maintaining dominance in nonhuman primate societies. As a result, he overestimated the autonomy of the dominant individual, and instead saw this individual as able to function independently and separately from others in the social setting. Concurrently, he underestimated the extent to which the dominant individual needed to pay attention to social links with others and use interpersonal skills in order to develop and maintain those links. His focus on the characteristics of the dominant individual in turn led to his belief that differences in the way those individuals behaved caused the differences in the nature of dominance in primate societies. However, the tendency of free-living monkeys and apes to form alliances, combined with the extent to which individuals are able to leave one group and enter another, provides an alternative explanation for these differences. The dominant individual in a group in which the less dominant are constrained from leaving can intimidate and oppress those subordinates, whereas the dominant individual whose subordinates can escape must behave more considerately (de Waal, 1996: 127). 
This environmental interpretation of differences in dominance style would seem to have more relevance for complex social settings such as organizations than does Maslow’s individualistic interpretation. Indeed, this new interpretation suggests another reason for both the initial and continuing intuitive appeal of Maslow’s theory. In times of economic growth and plentiful jobs, when employees are easily able to leave one organization for another, managers must treat those employees considerately, in order to ensure that they remain in the organization. These are precisely the conditions that existed in the 1960s, when Maslow’s theory entered the management literature. Organizations were able to recruit and retain employees by appearing to allow those employees the opportunity to develop their innate potential. In times of economic downturn, when jobs are not plentiful, managers can treat employees less considerately, threatening to discard them, or actually doing so through downsizing, restructuring, reengineering and rightsizing. However, these oppressive uses of control can be disguised by the contention that the remaining employees are ‘empowered’, and thus still apparently allowed to develop their innate potential.
Perhaps a more intriguing aspect of current primatology’s understanding of dominance, however, is the possibility that it provides for another perspective from which to view and develop theories of motivation. What would be the form of a motivation theory in which people’s prime goal is to create and strengthen social bonds in order that they can survive individually and collectively? What would be the form of a motivation theory that is based on cooperation as a fundamental construct? What would be the form of a motivation theory that stresses the influence of the social and environmental setting rather than the needs and reactions of the individual? What implications would such theories have for managerial practices? Applying insights from a discipline as contentious and contested as primatology is not without danger. If we make use of such a field, we cannot ignore its debates and changes. The point is not that it is inappropriate to rely on principles drawn from animal studies, but rather that we need to recognize the source of these principles, and the ways in which they may reflect and reproduce values and assumptions under the guise of objective science. We cannot achieve this recognition when we adopt these principles and then repeat them, mantra-like, without critical analysis. How many more examples of Maslow and his monkeys does management theory contain?
Notes

I especially want to thank Linda Fedigan for encouraging my exploration of the primatology literature. I appreciate the always valuable comments of Barbara Townley, and the supportive advice of the editors and reviewers of Organization. I also want to thank Karen Farkas for her careful reading of an earlier version of this paper, A.D. (Tony) Fisher for his advice about Blackfoot culture, and Hope Olson for her material support.

1. See Cullen (1994) for a more detailed critique of these studies and a discussion of their implications for the women in management literature.
2. Another reason for the intuitive appeal of Maslow’s theory may lie in the ways in which it reflects and reinforces the sexuality of organizations (Hearn and Parkin, 1987) through its conflation of self-actualization with masculinity, dominance and sexuality. However, this aspect of its appeal, and the implications for the analysis of gender in organizations, will not be developed in this paper.
References

Aron, A. (1977) ‘Maslow’s Other Child’, Journal of Humanistic Psychology 17(2): 9–24.
Buss, A.R. (1979) ‘Humanistic Psychology as Liberal Ideology: The Socio-Historical Roots of Maslow’s Theory of Self-Actualization’, Journal of Humanistic Psychology 19(3): 43–55.
Chapais, B. (1992) ‘The Role of Alliances in Social Inheritance of Rank among Female Primates’, in A.H. Harcourt and F.B.M. de Waal (eds) Coalitions and Alliances in Humans and Other Animals, pp. 29–59. New York: Oxford University Press.
Crawford, M.P. (1939) ‘The Social Psychology of the Vertebrates’, Psychological Bulletin 36: 407–46.
Cullen, D. (1994) ‘Feminism, Management and Self-Actualization’, Gender, Work and Organization 1: 127–37.
Datta, S.B. (1992) ‘Effects of Availability of Allies on Female Dominance Structure’, in A.H. Harcourt and F.B.M. de Waal (eds) Coalitions and Alliances in Humans and Other Animals, pp. 61–82. New York: Oxford University Press.
Fedigan, L.M. (1992) Primate Paradigms: Sex Roles and Social Bonds. Chicago: University of Chicago Press.
Goodall, J. (1990) Through a Window: My Thirty Years with the Chimpanzees of Gombe. Boston, MA: Houghton Mifflin.
Greiner, L.E. (1992) ‘Resistance to Change During Restructuring’, Journal of Management Inquiry 1: 61–5.
Hackman, J.R. and Oldham, G.R. (1980) Work Redesign. Reading, MA: Addison-Wesley.
Haraway, D. (1978a) ‘Animal Sociology and a Natural Economy of the Body Politic, Part I: A Political Physiology of Dominance’, Signs: Journal of Women in Culture and Society 4: 21–36.
Haraway, D. (1978b) ‘Animal Sociology and a Natural Economy of the Body Politic, Part II: The Past is the Contested Zone: Human Nature and Theories of Production and Reproduction in Primate Behavior Studies’, Signs: Journal of Women in Culture and Society 4: 37–60.
Haraway, D. (1986) ‘Primatology is Politics by Other Means’, in R. Bleier (ed.) Feminist Approaches to Science, pp. 77–118. New York: Pergamon.
Haraway, D. (1989) Primate Visions: Gender, Race, and Nature in the World of Modern Science. New York: Routledge.
Harlow, H.F. (1974) Learning to Love. New York: Jason Aronson.
Hearn, J. and Parkin, W. (1987) ‘Sex’ at ‘Work’: The Power and Paradox of Organization Sexuality. New York: St Martin’s Press.
Hellriegel, D., Slocum, J.W. Jr and Woodman, R.W. (1995) Organizational Behavior, 7th edn. Minneapolis/St Paul: West.
Herzberg, F. (1966) Work and the Nature of Man. New York: New American Library.
Herzberg, F. (1982) The Managerial Choice: To be Efficient and to be Human. Salt Lake City, UT: Olympus Publishing Co.
Hoffman, E. (1988) The Right to be Human. Los Angeles: Jeremy P. Tarcher.
Hoffman, E., ed. (1996) Future Visions: The Unpublished Papers of Abraham Maslow. Thousand Oaks, CA: Sage.
Huczynski, A.A. (1993) Management Gurus: What Makes Them and How to Become One. London: Routledge.
Lawler, E.E. III (1992) The Ultimate Advantage: Creating the High-Involvement Organization. San Francisco, CA: Jossey-Bass.
Lowry, R.J., ed. (1973) Dominance, Self-esteem, Self-actualization: Germinal Papers of A. H. Maslow. Monterey, CA: Brooks/Cole.
Lowry, R.J., ed. (1982) The Journals of Abraham Maslow. Lexington, MA: Lewis.
Luthans, F. (1995) Organizational Behavior, 7th edn. New York: McGraw-Hill.
Maslow, A.H. (1935) ‘Individual Psychology and the Social Behavior of Monkeys and Apes’, International Journal of Individual Psychology 1: 47–59.
Maslow, A.H. (1936a) ‘The Role of Dominance in the Social and Sexual Behavior of Infra-Human Primates: I. Observations at Vilas Park Zoo’, Journal of Genetic Psychology 48: 261–77.
Maslow, A.H. (1936b) ‘The Role of Dominance in the Social and Sexual Behavior of Infra-Human Primates: III. A Theory of Sexual Behavior of Infra-Human Primates’, Journal of Genetic Psychology 48: 310–38.
Maslow, A.H. (1936c) ‘The Role of Dominance in the Social and Sexual Behavior of Infra-Human Primates: IV. The Determination of Hierarchy in Pairs and in a Group’, Journal of Genetic Psychology 49: 161–98.
Maslow, A.H. (1937a) ‘The Comparative Approach to Social Behavior’, Social Forces 15: 487–90.
Maslow, A.H. (1937b) ‘Dominance-Feeling, Behavior, and Status’, Psychological Review 44: 404–29.
Maslow, A.H. (1937c) ‘Personality and Patterns of Culture’, in R. Stagner Psychology of Personality, pp. 408–28. New York: McGraw-Hill.
Maslow, A.H. (1939) ‘Dominance-Feeling, Personality and Social Behavior in Women’, Journal of Social Psychology 10: 3–39.
Maslow, A.H. (1940a) ‘Dominance-Quality and Social Behavior in Infra-Human Primates’, Journal of Social Psychology 11: 313–24.
Maslow, A.H. (1940b) ‘A Test for Dominance-Feeling (Self-Esteem) in College Women’, Journal of Social Psychology 12: 255–70.
Maslow, A.H. (1942a) ‘The Dynamics of Psychological Security–Insecurity’, Character and Personality 10: 331–44.
Maslow, A.H. (1942b) ‘Self-Esteem (Dominance-Feeling) and Sexuality in Women’, Journal of Social Psychology 16: 259–94.
Maslow, A.H. (1943) ‘A Theory of Human Motivation’, Psychological Review 50: 370–96.
Maslow, A.H. (1954) Motivation and Personality. New York: Harper.
Maslow, A.H. (1965) Eupsychian Management. Homewood, IL: Irwin.
Maslow, A.H. (1971) The Farther Reaches of Human Nature. New York: Viking.
Maslow, A.H. and Flanzbaum, S. (1936) ‘The Role of Dominance in the Social and Sexual Behavior of Infra-Human Primates: II. An Experimental Determination of the Behavior Syndrome of Dominance’, Journal of Genetic Psychology 48: 278–309.
Maslow, A.H. and Mittelmann, B. (1941) Principles of Abnormal Psychology: The Dynamics of Psychic Illness. New York: Harper.
Mitchell, V.F. and Moudgill, P. (1976) ‘Measurement of Maslow’s Need Hierarchy’, Organizational Behavior and Human Performance 16: 334–49.
Moorhead, G. and Griffin, R.W. (1995) Organizational Behavior, 4th edn. Boston, MA: Houghton Mifflin.
Napier, J.R. and Napier, P.H. (1985) The Natural History of the Primates. Cambridge, MA: MIT Press.
Porter, L.W. and Lawler, E.E. III (1968) Managerial Attitudes and Performance. Homewood, IL: Irwin.
Rowell, T.E. (1974) ‘The Concept of Social Dominance’, Behavioral Biology 11: 131–54.
Sampson, E.E. (1981) ‘Cognitive Psychology as Ideology’, American Psychologist 36: 730–43.
Shaw, R. and Colimore, K. (1988) ‘Humanistic Psychology as Ideology: An Analysis of Maslow’s Contradictions’, Journal of Humanistic Psychology 28(3): 51–74.
Skinner, B.F. (1953) Science and Human Behavior. New York: Macmillan.
Sperling, S. (1991) ‘Baboons with Briefcases vs. Langurs in Lipstick: Feminism and Functionalism in Primate Studies’, in M. di Leonardo (ed.) Gender at the Crossroads of Knowledge: Feminist Anthropology in the Postmodern Era, pp. 204–34. Berkeley: University of California Press.
Strum, S.C. (1987) Almost Human: A Journey into the World of Baboons. New York: Random House.
Suomi, S.J. and LeRoy, H.A. (1982) ‘In Memoriam: Harry Harlow (1905–1981)’, American Journal of Primatology 2: 319–42.
de Waal, F. (1982) Chimpanzee Politics: Power and Sex among Apes. New York: Harper & Row.
de Waal, F. (1996) Good Natured: The Origins of Right and Wrong in Humans and Other Animals. Cambridge, MA: Harvard University Press.
Wahba, M.A. and Bridwell, L.G. (1976) ‘Maslow Reconsidered: A Review of Research on the Need Hierarchy Theory’, Organizational Behavior and Human Performance 15: 212–40.
Wilson, C. (1972) New Pathways in Psychology: Maslow and the Post-Freudian Revolution. London: Victor Gollancz.
Yerkes, R.M. (1939) ‘Social Dominance and Sexual Status in the Chimpanzee’, The Quarterly Review of Biology 14: 115–36.
Yerkes, R.M. and Yerkes, A.W. (1929) The Great Apes: A Study of Anthropoid Life. New Haven, CT: Yale University Press.
Zuckerman, S. (1932) The Social Life of Monkeys and Apes. London: Kegan.
43
Maslow’s Theory of Motivation: A Critique
Andrew Neher
This critique will evaluate Abraham Maslow’s theory of motivation, including each of its basic propositions. Although other critics have addressed various aspects of Maslow’s theory, no one, as far as I know, has taken on Maslow’s basic theory in toto. Two decades after his death, Maslow is still revered as one of the founders and guiding lights of humanistic psychology. Unfortunately, humanistic psychologists have yet to probe the flaws in Maslow’s theory in any concerted or thorough fashion. Why is this? Maybe it stems from motivations such as loyalty to the cause, but it may also relate to the tendency of humanistic psychologists to be “accepting” rather than “critical.” Of course, Maslow is known outside of humanistic psychology circles. Maslow himself sought to apply his theory to fields in the borderlands of psychology, where it still wields influence in some quarters – for example in the fields of management (Maslow, 1967), religion (Maslow, 1964), and science (Maslow, 1969). In addition, Maslow is routinely cited when general psychology texts discuss humanistic psychology. Texts in “adjustment” courses, in particular, tend to pay him much attention, sometimes to the extent of recommending that students evaluate their own lives to see how well they conform to Maslow’s ideas concerning the “good life.” On the other hand – the field of management excepted (e.g., Huizinga, 1970) – Maslow is seldom cited in the research literature on motivation, which means that his theory, to a significant extent, lies outside the mainstream of testing and critical evaluation that is the lifeblood of any vital theory.

Source: Journal of Humanistic Psychology, 31(3) (1991): 89–112.
Thus there are many reasons to take a close look at Maslow’s theory and bring its flaws into the light of day. This article is a contribution to that effort.
Maslow’s Theory Outlined

Most of Maslow’s basic theory is found in the 1970 edition of his book, Motivation and Personality, although I will draw from some of his other works from time to time. According to his theory

1. Each of us is endowed at birth with a full and, to an important extent, unique complement of needs that, allowed expression by our environment, will guide our growth in a healthy direction.
2. These needs function in a hierarchical manner. The bottom step of Maslow’s 5-step hierarchy, or pyramid, includes physiological needs (for food, water, and so on). Then come safety needs; next, needs for love and intimacy; then self-esteem needs; and, finally, at the apex of the pyramid, self-actualization (e.g., intellectual and esthetic) needs. By hierarchy is meant that needs lower on the pyramid must generally be satisfied before needs at higher levels are “activated.” For example, starving people (deprived on level one) will find it difficult to be very concerned about their relationships with others (needs on level three) until they are fed.
3. Needs on the first four levels are called deficiency-needs (or D-needs) because they drive us to gratify the need, at which point the need lapses in its importance to us until deprivation again motivates us to take action to satisfy the need. Self-actualization needs (on the fifth and highest level), on the other hand, are called being-needs (or B-needs) because, among other unique features, they sustain our interest without our being driven by feelings of deprivation.
4. The level of self-actualization, the end-point of the process outlined above, constitutes the highest level of human experience.

To illustrate his theory, Maslow described a number of people he considered self-actualizers, including such well-known figures as Abraham Lincoln and Eleanor Roosevelt. All of these people, according to Maslow, share various personality traits (which Maslow subsumed under rubrics such as being-cognition and being-values). These include being relatively creative, spontaneous, able to see the “large picture,” nonjudgmental, and rich in emotional life; in particular, self-actualizers are more apt to experience euphoric heights of emotion that Maslow labeled peak experience. To summarize, we are born with certain needs, some of which, such as hunger, are prepotent in that they occupy our attention until they are satisfied. But such motivations are not what make us fully human. Only by living a life in which these lower needs are satisfied can we rise to our full human potential,
Neher
Maslow’s Theory of Motivation
becoming self-actualized, as we free ourselves to become involved in higher pursuits such as art, literature, and science, and to experience the finer human qualities of broad understanding, tolerance, and the sublime emotions. Stated in rough outline, Maslow’s theory finds ready acceptance with many people. The theory seems reasonable and fits many of our preconceptions: For example, of course hungry people are concerned with little else besides finding food. But as we take a closer look we will see that almost every aspect of Maslow’s theory is burdened with a multitude of problems. We will see that many of these problems stem from the extreme stands that his theory, as a close examination will show, tends to take. The problem of overstatement is not unique, of course, to Maslow. It is a common trait of theorists who attempt, as Maslow did, to develop a perspective in opposition to prevailing theories. In Maslow’s case, the prevailing theories of motivation stemmed from psychoanalysis on the one hand and behaviorism on the other. Thus we should not be surprised that Maslow overstated his case in an attempt to make his theory distinctive when compared with competing theories. Other problems involve some of Maslow’s more peripheral statements that contradict many of the assumptions of his own theory. To some extent, Maslow seemed to have second thoughts about his theory, but these modifications never filtered down to his general theoretical statements. This might have been intentional, in part, because these qualifications to his theory have the effect, as we shall see, of “watering it down” and making it less distinctive. But perhaps the most significant basis of these inconsistencies was Maslow’s tendency, which he himself recognized, to be impressionistic, rather than conceptually rigorous, in his thinking and writing (Daniels, 1982, pp. 62, 70–71). Finally, still other problems concern the internal logic of his theory.
Maslow’s Theory Critiqued

Let us evaluate the various components of Maslow’s theory in the order that they were presented earlier.

1. Each of us is endowed at birth with a complete, and, to some extent, unique complement of needs that, allowed expression by our environment, will foster our growth in a healthy direction (Maslow, 1970, pp. 77–104).

Few psychologists would disagree that our lower needs in general (hunger, need for intimacy, and so on) are innate. But many would question whether, in general, the higher needs (intellectual, esthetic) are innate as Maslow claimed (1970, pp. 100–101). Although there is good evidence for the innate nature of some of the higher needs (e.g., the curiosity drive; Eisenberger, 1972), others, such as esthetic motivations, are probably largely shaped by cultural experience. Maslow’s tendency to downgrade the role of the environment in
forming the human psyche has been noted by several critics (e.g., Aron, 1977; Daniels, 1982; Geller, 1982; Smith, 1973) and seems to be related to his rejection of the behaviorist perspective, which traditionally committed the opposite error of viewing environmental influence as all-important (Maslow, 1970, pp. 88–89). According to Maslow, once [lower needs are met] each person proceeds to develop in his own style, uniquely, using these necessities for his own private purposes. In a very meaningful sense, development then becomes determined from within rather than from without. . . . The role of the environment is ultimately to permit him or help him to actualize his own potentialities [because] he “knows” better than anyone else what is good for him. (1968, pp. 34, 160, 198)
To sum up, Maslow believed that, given basic support and nurturance from the environment, our inborn needs are sufficient to foster our psychological growth in a healthy direction. Thus it is clear that Maslow is squarely in the camp of the nativists, who stress the role of hereditary influences in human experience. In this regard, he is in accord with many other humanistic psychologists (e.g., Carl Rogers) and, as a consequence, suffers along with them from a number of difficulties. If the most culture can do, or should do, is provide for basic needs and freedom of expression, then most of the structure of cultures around the world must be viewed as potentially disruptive. In particular, child-rearing practices may conflict with innate needs of children to develop in directions other than those sanctioned by the culture. As Maslow said, “Our human instincts [including our needs] are so weak that they need protection against culture, against learning – in a word, against being overwhelmed by the environment” (1970, p. 103). Of course, as Maslow admitted (1970, p. 278), our culture is relatively tolerant, but he believed that we still need to tip “the balance [even more] in favor of spontaneity, the ability to be expressive . . . creative, etc.” (1968, p. 198). Let us take Maslow at his word, and let us take language as our example. According to the widely accepted Sapir-Whorf hypothesis, the particular language we speak determines to some extent the way in which we are able to think about the world (Whorf, 1956). If this is so, then teaching our own language to our children has the effect, in part, of putting their thoughts in an intellectual straitjacket – perhaps, unfortunately, in ways that conflict with their innate needs to conceptualize the world in their own unique fashion. So perhaps we should “protect” our children from hearing our language so that they can create their own. 
But, of course, we know that, although children inherit a genetic ability to learn language that they hear in their environment (Piattelli-Palmarini, 1980), they do not inherit the ability to create, from scratch, their own language (Malson, 1972). And, if they could, can you imagine the problem of attempting to communicate with one another, each
of us in a different language? Much the same could be said, of course, of a multitude of other cultural traits that serve as a common basis for human relations in any culture. One way to understand our need to learn the folkways of our culture is to remember that the trend in human evolution has been away from strong genetic programming. Instead, we develop our “humanness,” to a significant extent, through being socialized into the norms of our particular culture. In fact, our genetic heritage seems to consist, to a large degree, of a potential to adapt to any of the wide variety of cultures that have ever existed; that is, our genetic endowment seems very flexible in this regard. And, although we each inherit a unique mix of needs and potentials, these require for their development a context of cultural inputs (language, and so on) that are, at least initially, imposed upon us. This is because, as young children, our nervous systems are not sufficiently developed to allow us to choose from among these inputs. Of course, parents should be sensitive to their children’s unique individual needs, but it is hardly possible to tailor basic cultural inputs (language is a good example) to the individual. Naturally, as we mature most of us are increasingly able to choose the life experiences that best “fit” us, but these choices are a product of the unique mix of genes and culture that each of us embodies by the time we are old enough to make these choices. In sum, Maslow’s list of needs ignores considerations such as these. It does not include the need to learn language or any of the other cultural traits that create our humanness and bind us socially. To repeat, his theory implies that the imposition of cultural norms is unnecessary at best, and, at worst, destructive of our unique potential as individuals. In this regard, he and many other humanistic psychologists are in the mainstream of Western values that tend to glorify the individual.
Maslow’s failure to acknowledge the need to learn cultural norms may have stemmed from more than one source. On the one hand, he may have assumed that, with the advent of pluralistic societies such as ours, we all need to pick and choose our own path, and that the best basis for this is the unique mix of needs we each inherit. But, as has already been pointed out, this assumption is undermined by the fact that we are helpless as children to “pick and choose” until we have already been socialized into the language patterns and other basic norms of our particular culture. Of course, if Maslow’s theory does apply only to pluralistic societies, then it is culture specific rather than universal in application. On the other hand, Maslow may have been reacting against the obvious failures of our own society, his solution being to base human development on the “wisdom” of the unique biological makeup of each of us rather than on bankrupt cultural priorities. However, a good argument can be made that extreme individualism, whether or not it is founded on the notions of individual biological uniqueness that Maslow favored, in fact, fosters much of the social alienation and dehumanization that plagues our society. One critic noted the “irony that those as deeply
concerned about the human condition as . . . Maslow . . . should have developed a theory the practical recommendations of which sustain and strengthen the very dehumanization against which in part they are reacting” (Geller, 1982, p. 72). Thus the nativist position is more than just a theoretical issue. For example, we have all known parents who have hesitated to “put their own trip” on their child, for fear of violating their child’s unique nature, to the point where they became ineffective as parents. And we have all known children who have, in conformance with pop-psych beliefs, agonized over who the “real me” is as distinct from the “me whom my parents created.” But these hesitancies and agonies, of course, are predicated on the notion that there is a more or less complete, original “me” waiting to blossom given only a nurturing and accepting, but otherwise neutral environment. So whether or not the assumptions of Maslow – and other nativists such as Carl Rogers – are valid is a very significant question, with very real ramifications. Another difficulty with the nativist position concerns its internal logic. If all that we require to become self-actualized is that our culture provide for our basic needs and freedom of expression, then our genetic potential is indeed potent. As Maslow said, our “inner nature . . . tends strongly to persist” (1968, p. 190). But, if this is so, then why was Maslow, in agreement with many other humanistic psychologists, so fearful that our culture will misdirect us in ways that violate this potential? Elsewhere Maslow said that “this inner nature . . . is weak and delicate . . . and easily overcome by . . . cultural pressure” (1968, p. 4). Maslow seemed to want it both ways – a strong innate tendency to self-actualize on the one hand, but also a disturbing weakness in the face of cultural dictates on the other. But, of course, he cannot have it both ways. 
At least one assumption must be wrong – or, more likely, less extreme versions of both might be correct. A final issue related to Maslow’s nativist position concerns values rather than logic. Along with other nativists, Maslow maintained, in essence, that we have to live with whatever the genetic roll of the dice provides us, because environmental influences (other than providing for our needs) are viewed either as relatively insignificant or as potentially insensitive to our innate tendencies (Daniels, 1988, p. 25). Where behaviorists have traditionally said “You can become whatever you want, and we’ll show you how,” Maslow, and other nativist theorists, have said, “You can become what your native potential allows you to become, and nothing else.” Although the behaviorists are undoubtedly overly optimistic in their view, Maslow seems overly pessimistic. In this case, Maslow goes against the grain of Western values, which maintain that practically unlimited possibilities are open to any of us. To summarize, Maslow’s tendency to emphasize the role of our innate needs in directing the course of healthy psychological development, and his tendency to downgrade the importance of cultural input in this process, leads to a view of human development that is one-sided and consequently very
difficult to support. Thus we start to see some of the problems that stem from Maslow’s tendency to take extreme stands. Now let us move ahead and examine the second component of Maslow’s theory. 2. Our needs function in a hierarchical fashion, so that our basic needs (for food, etc.) are prepotent, in that generally they must be satisfied before we can feel “free” of them and move on to satisfy our higher needs (Maslow, 1970, pp. 35–51). Actually, in advanced societies our physiological and safety needs (the first two steps on Maslow’s need-pyramid) are often satisfied, whereas the next two steps – needs for love and for self-esteem – constitute stumbling blocks for many people. In simpler societies, on the other hand, the situation is often the reverse. In such societies, people may periodically go hungry and suffer from life-threatening illnesses, but nevertheless, unless these problems are severe (Turnbull, 1974), people in these societies typically exhibit strong social ties and a strong sense of self. In fact, it appears that a certain degree of hardship in meeting basic needs can bring people together and give them a sense of purpose as they cooperate to overcome adversity. Most of us can probably recall experiences of our own that illustrate this process. For example, many couples say that struggling together to make ends meet when they were young fostered strong bonds between them, compared with their later years when they had finally achieved a life of ease and comfort. If these examples are valid, they stand Maslow’s need-hierarchy on its head: In these instances, deprivation at lower-need levels (survival needs) seems to facilitate need satisfaction at higher levels (e.g., the achievement of intimacy) rather than hinder it as Maslow would predict. Aside from such anecdotal evidence, some researchers, particularly in the field of management, have attempted to test Maslow’s hierarchy in a more systematic fashion. 
In general, these researchers have wanted to determine if Maslow’s theory can clarify the factors involved in job choice and job satisfaction. Here is a sampling of these studies, many of which are summarized in Wahba and Bridwell (1979). Some of these studies have been designed to test Maslow’s particular ordering of needs in his hierarchy. Briefly, the results of these studies are equivocal; results range from some support (Graham & Balloun, 1973; Mathes, 1981; Wuthnow, 1978), to no support (Miner & Dachler, 1973), to outright refutation (Wofford, 1971). Other studies have attempted to test Maslow’s assertion that need satisfaction leads to a diminution of that need in the future. These studies show a similar spread, from some support (Alderfer, 1969; Graham & Balloun, 1973), to no support (Lawler & Suttle, 1972), to results that indicate that need-satisfaction leads to heightened salience of the need (Hall & Nougaim, 1968)! Obviously the research picture is rather equivocal. However, research of this nature seldom yields definitive answers and should not be considered, in
and of itself, the last word. Thus let us take a closer look at Maslow’s assertion that “need gratification diminishes the strength of the need,” because, in spite of its quality of seeming obvious, I believe it is highly questionable. First of all, no one denies that need satisfaction leads to a temporary decrease in the strength of a need. But most needs are cyclical, in that they are satisfied for a time, only to resurface later. Hunger and sex are obvious examples. What Maslow meant is that, over the long term, the strength of a need that is readily and easily satisfied will decline. For example: “If a mother kisses her child often, the drive itself disappears and the child learns not to crave kisses” (Maslow, 1970, p. 63). As with much of Maslow’s theory, this statement seems reasonable at first glance. It certainly ties in with much of our experience, as well as with other theories, such as psychoanalysis, that are widely accepted: When we express our needs, we are less “bothered” by them. But there is another possibility. Behaviorists would probably maintain that kissing, for example, is usually more valued by adults than by children, partly because of the pleasures that have been associated with it on so many different occasions. And, strangely enough, we can probably all think of examples from our own experience that support this alternative perspective. So which is it? Over the long term, do needs “dry up” or “well up” when they are satisfied? Unfortunately, there is no ready answer to this question, and psychologists remain divided on the issue. If Maslow meant that we should oversatiate our needs (e.g., eat until we are sick of eating) then we would probably agree that needs would tend to “dry up,” but there is no indication that he had this in mind. The point is that Maslow’s assumption – that satisfying needs reduces their strength in the long run, which is so crucial to his theory as a whole – is much more tenuous than he indicated. 
It is important to keep in mind that Maslow put himself in such a tenuous position because he was intent on eliminating the lower needs, in this process, as a motivational force in our lives; this was his prescription for moving up the needs hierarchy to the level of self-actualization. At this point, we need to examine another of Maslow’s assumptions that is not obvious on first inspection – namely, that the highest level in his need hierarchy, self-actualization, is, ideally, autonomous. It is obvious that our motivations to engage in creative, intellectual, or esthetic pursuits (pursuits on the highest level of the hierarchy) may, in fact, stem from lower needs – such as needs to gain social recognition, enhance our self-esteem, or even, perhaps, to satisfy physiological survival drives. In general, both psychoanalysts and behaviorists would agree with this view, citing mechanisms such as sublimation on the one hand and conditioned associations on the other. Maslow himself made the point that “the cognitive capacities . . . are a set of adjustive tools, which have, among other functions, that of satisfaction of our basic needs. . . . Acquiring knowledge and systematizing the universe [are], in part, techniques for the achievement of basic safety in the world” (1970, pp. 47–48). But, of course, it is central to Maslow’s theory that these
lower motivations, when they are present, detract from the true essence of self-actualization. In Maslow’s theory, remember, the road to self-actualization requires having already satisfied these basic needs. This means that Maslow must, as he said, “distinguish the artistic and intellectual products of basically satisfied people from those of basically unsatisfied people” (1970, p. 46), to make sure their accomplishments are not contaminated by lower needs. Not an easy task. If the self-actualization needs are, ideally, autonomous, how then did Maslow explain the mechanism through which this occurs? His main theme, of course, was that the self-actualization needs evolved biologically (Maslow, 1970, pp. 100–101). The problem was that he was not clear how this came about. Now, our higher needs might have evolved to serve lower needs, and/or they might have evolved because they are adaptive in their own right. If they evolved to meet our lower needs, then we must somehow explain how, on a biological level, they have become autonomous. If they evolved because they are adaptive in their own right, we must postulate that creative, intellectual, and artistic endeavors facilitate survival in and of themselves and thus have been incorporated into the gene pool. As far as I know, Maslow never discussed these possibilities. Maslow’s chief explanation for the autonomous nature of the self-actualization needs invoked Gordon Allport’s (1937) notion of “functional autonomy [in which the higher need] develops only on the basis of the lower, but eventually, when well established, may become relatively independent of the lower” (Maslow, 1970, pp. 103–104). For example, consider the following scenario: Let us imagine that you have a natural talent for music for which you are praised (which satisfies social recognition and self-esteem needs) in your younger years. 
As you grow up, your interest in music itself is enhanced because of its association with social rewards, and thus you develop your musical skills more and more “for their own sake.” Also, behaviorists would predict this increasingly autonomous interest in music on the basis that the “schedule” of social reinforcement becomes intermittent and unpredictable. But, as reasonable as this scenario is, it is a poor fit with the rest of Maslow’s theory. It requires some initial degree of lower-need deprivation, which violates his conception of the self-actualizing process, and, because it derives from an environmentalist perspective, it goes against the grain of his biological bias. Actually, it is questionable whether Maslow truly understood the implications of the functional autonomy theory. In sum, Maslow never adequately accounted, as far as I can determine, for the autonomous nature that he postulated for the self-actualization needs. Now let us address in greater detail Maslow’s belief that satiation of lower needs leads to self-actualization. This is such an important assertion that we need to be clear concerning what Maslow said about it: “Gratification of any basic need . . . is a move in the healthy direction” (1970, pp. 61–62), and “a man who is thwarted in any of his basic needs may fairly be envisioned simply as . . . less than fully human” (1970, p. 57). Seems pretty clear. Then
what can we make of a statement such as “the complete absence of frustration, pain or danger is dangerous. To be strong, a person must acquire frustration-tolerance” (1968, p. 200)? Obviously there is a contradiction here: Maslow said that thwarting of basic needs is unhealthy, but also that lack of frustration is unhealthy. Despite such contradictions, it is clear that Maslow’s theory favors a high level of need satisfaction. So let us go back to his basic theoretical position and see why, in fact, it does present great difficulties. Let us imagine what kind of circumstances would produce consistent gratification, remembering that partial gratification will produce less movement toward self-actualization. Using the hunger drive as an example, perhaps the only way that consistent gratification could be achieved is through eating small amounts of food almost continuously (although intravenous feeding would achieve a similar result). We can imagine similar conditions for other needs – for example, sexual gratification should be available just as soon as the urge arises. Do not make the mistake of dismissing this as farfetched. To the extent we allow ourselves to be hungry, or sexually unsatisfied, our efforts will be directed towards satisfying our lower needs rather than towards self-actualization. Following this logic, then, parents who want to raise self-actualized children should strive to meet their basic needs as soon as they arise, ideally before the children begin to feel much deprivation or motivation to make efforts to satisfy these needs. Now, if you are beginning to think that this approach might lead to problems, you are not alone. Researchers have found, not surprisingly, that parents who “pamper, indulge, and fawn over the youngster in such ways as to teach him that his every wish is a command to others” (Millon, 1969, p. 
263) tend to raise children who are narcissistic, are exploitive of others, have little self-control, and lack competency skills (Millon, 1969, pp. 261–266). In fact, there are many threads of research and theory in psychology that postulate, contrary to Maslow, that some frustration and deprivation is necessary for healthy psychological development. Among these are (a) Robert White’s competence theory (1959), (b) Yerkes-Dodson’s law (Yerkes & Dodson, 1908), (c) Hans Selye’s eustress theory (1974), and (d) Alfred Adler’s compensation theory (Ansbacher & Ansbacher, 1959). In fact, these perspectives are far from esoteric; their essence can be found in any number of self-help books written for the general public (e.g., Bloomfield & Felder, 1985; Brown, 1983; Houston, 1981). In spite of their differences, all of these perspectives agree on one or more of the following points: (a) that a moderate amount of deprivation stimulates our creative potential; (b) that this keeps us motivated and interested in life; and (c) that this leads to a sense of competence that helps us deal with the vicissitudes of living. Nietzsche said it in a particularly pithy (and extreme) fashion: “What does not kill me makes me stronger.” In addition, research indicates that some degree of deprivation, and thus challenge, is necessary to keep us from feeling bored. In particular, this research indicates a connection between low levels of deprivation and
psychosomatic illness (Goldberg, 1978). Note that this finding also conflicts with the widely noted position of Holmes and Rahe (1967), who, along with Maslow, believe that the less deprivation and stress (in their theory, stress that arises from having to adjust to change) the better. George Bernard Shaw’s memorable comment on the matter is certainly an overstatement, but it clearly states the alternative view to Maslow’s: “The only thing worse than not getting what you want is getting what you want.” Finally, it might be said that conditions that allow for consistent gratification of needs are probably only possible in advanced affluent societies such as ours. In fact, Maslow’s theory could be considered elitist in this regard (Smith, 1973, p. 29). This also makes it difficult to imagine the evolutionary conditions that would give rise to a self-actualization potential which could be realized only in a society that didn’t come into being until recently. So where did Maslow go wrong? His error, I think, lies in overstating his position. We can all agree that extreme need deprivation is ordinarily psychologically damaging. But this doesn’t mean that the opposite condition, extreme ease of need gratification, is psychologically healthy. As with many issues, a moderate position is the most defensible. Of course, as we have seen, Maslow did vacillate on this issue. This is understandable when we realize, on the one hand, how important his absolutist stand is to his theory as a whole. After all, if some deprivation is psychologically healthy, then not only does his theory lose much of its distinctiveness, but its chain of reasoning loses one of its crucial links: if we are deprived at lower-need levels, how then, in Maslow’s way of thinking, are we able to move up the need hierarchy and become fully self-actualized? 
On the other hand, as we have also seen, Maslow experienced great difficulty maintaining his absolutist stand in the face of so much opposing theory and research. Now we are ready to discuss the third component of Maslow’s theory. 3. The self-actualization needs differ qualitatively from the lower (or “deficiency”) needs in that they motivate us in the absence of a sense of deficiency – hence they are called “being” needs (Maslow, 1968, pp. 29–37). As Maslow said, being motivation involves a state “of desirelessness, purposelessness, [and] lack of D-need (deficiency-need)” (1971, p. 128). If Maslow were referring to the psychological state that often persists for a period following the gratification of a need, this would be an obvious statement. However, it is clear that he was describing a more or less ongoing level of functioning. Now we can grant that, for example, compared with eating a meal, there is a different feeling associated with creating art, writing literature, or getting involved in a favorite building project. Our involvement with these activities seems self-sustaining, persistent, and intrinsically rewarding, and this is certainly the quality that Maslow tried to capture in his theory. But we have seen that, when it comes to Maslow’s theory, initial impressions are often misleading. So let us take a closer look at this aspect of his theory.
Let us begin by examining the logic of Maslow’s assumption that we can be motivated in the absence of a sense of deficiency. Another way to state this is that we can be motivated to gain or achieve something even though we don’t lack it in the first place. Not very logical. As Salvatore Maddi says, “In order to define a motive, you must specify a goal state that is to be achieved. . . . And once you define a goal, you are of necessity assuring that the person having the motive is in a deprived state until he reaches the goal” (1968, p. 83). Think about your own experiences with higher-level needs. Don’t you find yourself setting goals, perhaps very long-range goals, but goals that consist of something you lack at present? If you achieve your goals, don’t you typically set new goals for yourself, and the cycle repeats itself? Certainly this has a different quality than eating a meal, but the difference doesn’t seem to have to do with deficiency, as Maslow maintained. Rather, the difference seems to involve such matters as experiencing greater freedom to choose higher-level motivations, or challenges – deprivations if you will – that are practically unlimited in their potential scope. These characteristics of higher-order motivations might arise because a wide variety of such motivations can meet a multitude of lower-level needs or because these motivations have truly become functionally autonomous or both. In any case, the basis of the distinctiveness of the self-actualization needs seems not to hinge on the absence of a sense of deprivation. Maslow’s discontent with motivation based on deprivation stemmed from his rejection of the traditional behaviorist position, which postulated tension or drive reduction – that is, overcoming deprivation, especially with respect to basic needs – as the sole basis of motivation (Maslow, 1968, p. 38). 
Behaviorists traditionally ignored higher drives such as curiosity and exploration, which seem to involve pursuing challenges and thus heightened drive states (Berlyne, 1960). However, it now appears that these higher drives are capable of being satiated, at least in some species (Eisenberger, 1972). Because satiation implies a prior state of deprivation, these findings call into question Maslow’s assumption that these higher motives operate in the absence of feelings of deprivation. To sum up, what appears to be unique about higher-order needs is not the absence of feelings of deprivation, but rather a number of other characteristics, including the purposeful choosing of challenges, and thus deprivations, which can provide almost limitless motivation and satisfaction. Now we come to the final component of Maslow’s theory of motivation. 4. The level of self-actualization, which is the end-point of the process outlined above, constitutes the highest level of human experience (Maslow, 1970, pp. 149–180; Maslow, 1971). Let us start with a quotation from Maslow: “Western civilization has generally believed that the animal in us was a bad animal” (1970, pp. 82–83). So, to some extent, does Eastern civilization, and most important, so, to some extent, did Maslow. Where Maslow differed from both Western and Eastern traditions is in the route he favored to overcome
our animal nature, by which he meant our basic needs that we share with other animals – needs for food, sex, and so on. You will remember that Maslow’s prescription runs as follows: “The easiest technique for releasing the organism from the bondage of the lower . . . needs is to gratify them” (1970, p. 61). Of course, we have already seen that it is questionable whether this approach is effective, but how does it compare with more traditional approaches? Now, traditionally in both East and West, the most common way to overcome lower needs is to deny and to suppress them. Of course, Maslow’s approach probably fits our modern-day affluent society much better, which often seems to believe that the best way to overcome temptation is to give in to it. But Maslow’s value judgment is the same as the traditional one – that a part of our basic biological makeup is sufficiently unworthy that it should be eliminated as an important concern in our lives (Daniels, 1988, p. 23). You may agree or disagree with Maslow’s value judgment (it makes little sense to me), but, for a theorist such as Maslow, who claimed to be taking his lead from basic biological characteristics, it seems strangely nonbiological. Why are these lower needs seen as unworthy (the term lower itself reinforces this assumption)? Maslow, in particular, considered them lower partly because he believed that they are basically selfish in nature (Maslow, 1968, p. 202). However, research in sociobiology has demonstrated that many lower drives, including the traditional archvillain, sex, are, biologically speaking, largely altruistic in nature. For example, animals will sometimes risk their own lives to conceive, or later to protect, their offspring (Wilson, 1980). Moving on to the characteristics of people who have attained self-actualization, Maslow once more had difficulty being consistent. 
We already know that “The perfectly healthy [self-actualized] man has no sex needs or hunger needs, or needs for safety, or for love, or for prestige, or self-esteem” (Maslow, 1970, p. 57). But elsewhere Maslow maintained that self-actualized people “tend to be good animals, hearty in their appetites and enjoying themselves without regret or shame or apology” (1970, p. 156). Of course, it makes no sense to say that people with no hunger needs are hearty in their appetites. This is yet another instance of Maslow contradicting Maslow. Maslow also granted that need satisfaction is not the only route to self-actualization: “There are apparently innately creative people in whom the drive to creativeness seems to be more important than any other counterdeterminant” (1970, p. 52). By this, Maslow meant that some people are chiefly motivated by higher-level needs even though they have failed to satisfy needs lower in the hierarchy. Examples would include artists or scientists who are so wrapped up in their work that they forgo eating, or sex, or meaningful relationships of any kind, for lengthy periods. Finally, Maslow admitted that his formula – satisfying lower needs is the way to achieve self-actualization – does not always work: “I have individual subjects in whom apparent basic-need-gratification is compatible with
‘existential neurosis,’ meaninglessness, valuelessness, or the like” (1971, pp. 300–301). Maslow suggested that, to deal with this difficulty, he needed to modify his basic theory: “It is now more clear to me that gratification of the basic needs is not a sufficient condition for self-actualization” (1971, p. 300). And this is indeed a drastic modification. What, then, did Maslow propose as a sufficient condition for achieving self-actualization? Although he was far from clear on this point (Maslow, 1971, pp. 39, 301), he seems to have concluded that, because the potential for self-actualization is genetically based, some people will inherit it and some people won’t (Frick, 1982, pp. 32–40). To expand on his reasoning, according to the principle of genetic variation, inherited needs are likely to be distributed more or less according to a normal curve, with some individuals demonstrating a high level of the need, others a low level, but most people a moderate level. This principle should apply as well to self-actualization needs, if they are indeed genetic in character. Thus some individuals would be expected to inherit a very low self-actualization potential. In the extreme case, for example, seeking to specify a process by which retarded individuals could function consistently at the level of higher motivations would probably be a futile endeavor. For such people to satisfy completely their lower needs might indeed be a misguided effort, because other motivations may not be available to sustain them. Thus Maslow recognized that a low genetic potential for self-actualization might account for the feelings of “meaninglessness” he said he observed in some people who were gratified in their basic needs. The problem is that this view clashes with other statements of his regarding self-actualization – for example, “What a man can be, he must be. He must be true to his own nature. This need we may call self-actualization” (1970, p. 46). 
This statement, of course, conveys quite a different conception of self-actualization; according to it, we would conclude that anyone can potentially become self-actualized. But, as we have just seen, Maslow elsewhere realized that his genetic theory in fact limits self-actualization to a favored proportion of the population. But, of course, Maslow cannot have it both ways. One of these positions must be wrong. Let us conclude this section on self-actualization with a look at the people Maslow cited as self-actualized. Remember, they include such well-known personalities as Abraham Lincoln and Eleanor Roosevelt. Now, according to Maslow, to be self-actualized, individuals should “have been satisfied in their basic needs throughout their lives, particularly in their earlier years” (Maslow, 1970, p. 53). Thus, achieving a high level of need satisfaction late in life won’t do; this situation fits the alternative “deprivation followed by fulfillment” model of human well-being, rather than Maslow’s “constant-fulfillment” model. Now, if you are familiar with the early lives of Abraham Lincoln and Eleanor Roosevelt, you know that they both had extraordinary challenges and deprivations to overcome. In other words, they fail to qualify as exemplars of Maslow’s theory. Why did Maslow include such individuals in his attempts to support his theory? The answer seems to be that Maslow chose
his sample of self-actualizers on the basis of their adult traits, not their past life experiences (Maslow, 1970, pp. 149–180). Thus, unfortunately, instead of serving as a test of his theory (Does a consistently high level of need gratification produce self-actualized individuals?), his sample chiefly shows that if you look for people who meet any particular criteria of psychological health, you can probably find people who meet those criteria. For this reason, his demonstration of the traits of self-actualizers is “circular” and has little bearing on his theory. On the other hand, Maslow’s sample does demonstrate that some adults seem able to function much of the time at higher-need levels. However, most of the possible mechanisms for achieving self-actualization – we have discussed these in previous sections – are not encompassed by Maslow’s theory. In any particular instance, of course, it is difficult to know which of these mechanisms might be involved: for example, (a) gratification of lower needs in later life, (b) repression of lower needs, (c) a particularly strong genetic self-actualization potential, (d) a linkage between the two levels by which the pursuit of higher needs helps to meet lower needs, or (e) the achievement of functional autonomy of higher needs. Most likely, different combinations of these mechanisms operate in different people at different times. With respect to the traits of self-actualizers, you will remember that such people are said to be exceptionally creative, spontaneous, and nonjudgmental. However, in spite of the value Maslow seemed to attach to being nonjudgmental, Maslow’s theory is very judgmental – about what produces and what constitutes a self-actualized individual. In this, he is allied with other nativist theorists such as Carl Rogers. 
That is, because they postulate a more or less predetermined and unchanging human nature, they have a framework for judging whether or not people are pursuing the “correct path” to self-actualization. In contrast, behaviorists, for example, traditionally make no judgments about what an ideal human is like, because our human potential, in their view, is not fixed, but rather is infinitely malleable. Of course, either of these extreme positions is difficult to support. A final characteristic of self-actualizers deserves comment, and that is their ability to experience heights of emotion – what Maslow called peak experience, or what is more commonly referred to as mystical experience. Remember that, according to Maslow, people become self-actualized, and thus more likely to have peak experiences, when their lower needs have been met. However, as we have already said, both Eastern and Western traditions favor deprivation and suppression as a means of curtailing the lower needs, and this same approach, carried to an extreme, constitutes perhaps the most common path to mystical experience (Neher, 1990, pp. 107–121). At one point, and contrary to his theory, Maslow admitted that “higher needs may occasionally emerge, not after gratification, but rather after forced or voluntary deprivation, renunciation, or suppression of lower basic needs [as is]
reported to be common in Eastern cultures” (1970, pp. 59–60).

Probably all of us have experienced the ecstasy that can follow fulfillment after a long period of deprivation – for example, reunion with a loved one after a lengthy separation. But how do we make sense of deprivation practices of mystics, East and West, whose fulfillment, when it comes, seems to be in the form of transcendental feelings or visions of achieving oneness with a higher essence? St. Teresa’s accounts of ecstatic union with spiritual beings are probably the best-known examples. Perhaps, as with much of experience, fulfillment is more a matter of expectation and perception than of external reality (Neher, 1990, pp. 122–130).

Short of such extremes, most of us can remember when we have purposefully deprived ourselves of basic needs; going camping is a good example. Having to concern ourselves with providing shelter, keeping warm, and catching and preparing fish to eat may only prove what Cicero said: “Hunger is the best seasoning for meat.” But such experiences also seem to provide a connection with our primal roots (i.e., our basic needs) that can be very meaningful and invigorating. All these examples of purposeful need deprivation in the service of achieving apparently higher states of being tend, of course, to undermine further Maslow’s belief that satiating lower needs constitutes the most reasonable path to self-actualization and peak experience.

To summarize, the problem here is not that the level of self-actualization is not worth attaining. The problems are that, first, there is a serious question whether its attainment is a consequence of the process Maslow advocated. In particular, the requirement that lower-level motivations must first be eliminated, through satiating them, is highly questionable on a number of grounds. And, second, there is good reason to believe that lower motivations are not always burdensome. In fact, they can make their own unique and significant contribution to our lives.
Conclusion

With respect to the main outlines of his theory, Maslow certainly deserves credit for his general thesis: Undoubtedly, we do have a difficult time reaching the heights of experience if we are preoccupied with attaining the base essentials of life. However, many of the details of his theory need modification. In particular, the four components of the theory need some reworking.

1. We do inherit needs, but among these are needs that Maslow failed to acknowledge as necessary for developing as fully functioning humans. These needs involve the necessity for a great deal of cultural input, more than just what is necessary to gratify our lower needs. In particular, many
higher needs undoubtedly require encouragement from the environment for their development.

2. There probably is some sort of need hierarchy, in that our basic needs are ordinarily more urgent in their demands than are higher-level needs. However, it is not clear that, in the long run, satisfying our lower needs diminishes their urgency, which Maslow felt was necessary for higher needs to emerge. In fact, for many reasons, a moderate level of need gratification seems to be more growth enhancing than the high levels of need gratification that Maslow favored. In addition, there is probably more linkage between various need levels than Maslow proposed. In particular, the higher needs may not be as autonomous as Maslow’s theory suggests. For example, if we could, we might often trace them to their origin, either in evolutionary or individual experience, in helping us meet lower needs.

3. Higher-level needs seem not to operate apart from a sense of deficiency, as Maslow proposed. However, higher needs certainly are distinctive in that, unlike lower needs, we are able to choose our higher motivations (or challenges, and thus deprivations) because they are farthest removed from essential survival needs.

4. The level of self-actualization, as Maslow described it, is unique to humans and is worthy of attainment. However, his widely cited sample of self-actualized individuals does not support his theory that a history of high levels of satiation of basic needs, which is intended to eliminate them as motivations, is required for the attainment of self-actualization. In fact, there are many reasons to believe that “lower” motivations, far from always being a burden, can provide important fulfillments and satisfactions of their own. Nevertheless, there are a number of possible mechanisms, most of which Maslow’s theory fails to encompass, that may be involved in the achievement of self-actualization.

In the face of these many problems, humanistic psychologists have a choice.
They can ignore the difficulties, preserve Maslow’s teachings intact, and consequently run the risk of ideological atrophy as has happened, to some extent, in psychoanalysis. Or they can view Maslow’s theory as a serious scientific contribution that therefore deserves scrutiny and modification in the light of new insights and new information. The particulars of his theory aside, Maslow certainly deserves credit for a number of accomplishments. He attacked behaviorism, as well as psychoanalysis, at some of their most vulnerable points, and encouraged us to think about alternative ways of viewing motivation. And he encouraged us to devote more attention to the example of psychologically healthy individuals and what they can teach us about the positive aspects of living. There is little question that these are worthy accomplishments.
References

Alderfer, C. P. (1969). An empirical test of a new theory of human needs. Organizational Behavior and Human Performance, 4, 142–175.
Allport, G. W. (1937). The functional autonomy of motives. American Journal of Psychology, 50, 141–156.
Ansbacher, H., & Ansbacher, R. (1959). The individual psychology of Alfred Adler. New York: Basic Books.
Aron, A. (1977). Maslow’s other child. Journal of Humanistic Psychology, 17(2), 9–24.
Berlyne, D. E. (1960). Conflict, arousal, and curiosity. New York: McGraw-Hill.
Bloomfield, H., & Felder, L. (1985). The Achilles syndrome: Transforming your weaknesses into strengths. New York: Random House.
Brown, W. (1983). Welcome stress. Minneapolis, MN: Compcare.
Daniels, M. (1982). The development of the concept of self-actualization in the writings of Abraham Maslow. Current Psychological Reviews, 2, 61–76.
Daniels, M. (1988). The myth of self-actualization. Journal of Humanistic Psychology, 28(1), 7–38.
Eisenberger, R. (1972). Explanation of rewards that do not reduce tissue needs. Psychological Bulletin, 77, 319–339.
Frick, W. (1982). Conceptual foundations of self-actualization. Journal of Humanistic Psychology, 22(4), 33–52.
Geller, L. (1982). The failure of self-actualization theory. Journal of Humanistic Psychology, 22(2), 56–73.
Goldberg, P. (1978). Executive health. New York: McGraw-Hill.
Graham, W., & Balloun, J. (1973). An empirical test of Maslow’s need hierarchy. Journal of Humanistic Psychology, 13(1), 97–108.
Hall, D. T., & Nougaim, K. E. (1968). An examination of Maslow’s need hierarchy in an organizational setting. Organizational Behavior and Human Performance, 3, 12–35.
Holmes, T. H., & Rahe, R. H. (1967). The social readjustment rating scale. Journal of Psychosomatic Research, 11, 213–218.
Houston, J. (1981). The pursuit of happiness. Glenview, IL: Scott, Foresman.
Huizinga, G. (1970). Maslow’s need hierarchy in the work situation. Groningen, Netherlands: Wolters-Noordhoff.
Lawler, E., & Suttle, J. L. (1972). A causal correlational test of the need hierarchy concept. Organizational Behavior and Human Performance, 7, 265–287.
Maddi, S. (1968). Personality theories. Belmont, CA: Dorsey.
Malson, L. (1972). Wolf children and the problem of human nature. New York: Monthly Review Press.
Maslow, A. (1964). Religions, values, and peak experiences. Columbus, OH: Ohio State University.
Maslow, A. (1967). Eupsychian management: A journal. Homewood, IL: Irwin-Dorsey.
Maslow, A. (1968). Toward a psychology of being (2nd ed.). New York: Van Nostrand.
Maslow, A. (1969). The psychology of science: A reconnaissance. New York: Harper & Row.
Maslow, A. (1970). Motivation and personality (2nd ed.). New York: Harper & Row.
Maslow, A. (1971). The farther reaches of human nature. New York: Viking.
Mathes, E. (1981). Maslow’s hierarchy of needs as a guide for living. Journal of Humanistic Psychology, 21(4), 69–72.
Millon, T. (1969). Modern psychopathology. Philadelphia: W. B. Saunders.
Miner, J. B., & Dachler, H. P. (1973). Personal attitudes and motivation. Annual Review of Psychology, 24, 379–402.
Neher, A. (1990). The psychology of transcendence (2nd ed.). New York: Dover.
Piattelli-Palmarini, M. (1980). Language and learning. Cambridge, MA: Harvard University Press.
Selye, H. (1974). Stress without distress. Philadelphia: Lippincott.
Smith, M. B. (1973). On self-actualization: A transambivalent examination of a focal theme in Maslow’s psychology. Journal of Humanistic Psychology, 13(2), 17–33.
Turnbull, C. M. (1974). The mountain people. New York: Simon & Schuster.
Wahba, M. A., & Bridwell, L. G. (1979). Maslow reconsidered: A review of research on the need hierarchy theory. In R. M. Steers & L. W. Porter (Eds.), Motivation and work behavior (pp. 47–55). New York: McGraw-Hill.
White, R. (1959). Motivation reconsidered: The concept of competence. Psychological Review, 66, 297–333.
Whorf, B. (1956). Language, thought, and reality. Cambridge, MA: MIT Press.
Wilson, E. O. (1980). Sociobiology. Cambridge, MA: Harvard University Press.
Wofford, J. C. (1971). The motivational bases of job satisfaction and job performance. Personnel Psychology, 24, 501–518.
Wuthnow, R. (1978). An empirical test of Maslow’s theory of motivation. Journal of Humanistic Psychology, 18(3), 75–77.
Yerkes, R., & Dodson, J. D. (1908). The relation of strength of stimulus to rapidity of habit formation. Journal of Comparative Neurology and Psychology, 18, 459–482.
44
Caught on Fire: Motivation and Giftedness
Ann Robinson
I do not mean zeal without capacity, nor capacity without zeal. —Sir Francis Galton (1869)
Source: Gifted Child Quarterly, 40(4) (1996): 177–178.

A few years ago, a film based on real events surrounding a British team of Olympic runners found favor with moviegoers. The film, Chariots of Fire, traced the development of two great athletes as they prepared for the 1924 Olympic games after the First World War. One long distance runner, Eric Liddell, was a devout missionary from Scotland; the other was Harold Abrahams, a young Jewish runner utterly absorbed by his sport. Despite early difficulties and experiences with ethnic prejudice, Abrahams went on to become an institution in British athletics. Liddell pursued a life in the church. The film is a revealing examination of zeal, or the eagerness to work. In the film, each of the two leading actors communicates the joy, the desire and the powerful identification with one’s talents that we have come to understand as integral to giftedness.

Indeed, as a construct, motivation permeates our field (Feldhusen, 1986). We can trace its modern roots to the Victorian Sir Francis Galton who believed that great achievements called for both intellect and enthusiasm. Moving into the 1940’s, 1950’s and 1960’s, motivation was a focal point for researchers like White (1959) and McClelland (1961). Researchers began to refine their understanding of intrinsic and extrinsic motivation. Students who worked for personal feelings of satisfaction were intrinsically motivated and thought to be more likely to continue learning for its own sake than those
who achieved in school because of extrinsic rewards. Later achievement motivation researchers discriminated between task versus ego involvement. A student with task involvement learned because he or she was “carried away” with the activity itself. Ego-involved students were more likely to work in order to best others.

Prominent figures in gifted education like E. Paul Torrance recognized the importance of “falling in love with an idea.” Here, motivation became an emotional state rather than a behavior or an action. More recently, Csikszentmihalyi conceptualized such feelings as flow – an optimal experience which transports the person beyond themselves. As classroom teachers, we have embraced Renzulli’s use of task commitment to describe the persistence necessary to the development of talent. The widespread use of task commitment by schools to define giftedness and to identify gifted students is testimony to the consensus that motivation counts in the real world. And count it does. There can be no more valuable outcome for education than the love of learning.

Unfortunately, in an attempt to categorize and to create handy taxonomies, educators have too often compartmentalized intellect and feeling. We speak of cognitive and affective domains as if they do not meet in the same individual. By artificially divorcing our cogitations from our passions, we have committed Descartes’ error – the belief that heart and mind are quite separate organs or entities. A thoughtful teacher observing a child happily, passionately, and zealously engaged in learning knows quite the contrary.
In This Issue

First, Gottfried and Gottfried trace the development of academic intrinsic motivation from childhood through early adolescence. In their longitudinal study, gifted children were more likely than a comparison group to report higher motivation across all subject areas. The authors conclude that the enjoyment of learning is greater for gifted students and that motivation is important for the development of giftedness. Their contribution to our knowledge base includes the developmental finding that motivation in gifted children remains stable over time.

Next, Chan examines the motivational orientations and metacognitive abilities of gifted children and their average achieving peers. Using the framework of attribution theory, she notes that gifted children have greater confidence in their feelings of control over success and failure in school than do their agemates. Gifted children are more likely to report that they can control the amount of effort they put into a school task and the strategies they use to learn them.
In “Gifted and Non-Selected Children’s Perceptions of Academic Achievement, Academic Effort, and Athleticism,” Udvari and Rubin extend the landmark study by Tannenbaum (1962). They studied younger children and they introduced gender as a variable. Their results indicate that gifted children are more tolerant of “brilliant” peers than average children are, that neither group actively disparaged effort nor did they particularly reward it, and that athleticism continued to be the most important contributor to social acceptability.

In the next article, Kurt Heller of Germany summarizes the considerable literature on gender differences in mathematics and the natural sciences from a motivational perspective. This important contribution to the knowledge base distills a vast literature with significant implications for the development of gifts and talents in girls and women. He examines the hypothesis that the lowered performance of girls and women may be due to their attributions about their abilities in these subject areas. To support his conjecture, he reviews empirical research which indicates that girls and women hold unrealistically low expectations of their abilities in mathematics and science. Then, Heller reports two studies of his own on attribution retraining programs which successfully modify attributions of high school and college women and which subsequently raise their level of achievement. Neither study reviewed has been accessible to English-speaking scholars until now.

What happens when motivation is diverted from a healthy course? How frequently is that likely to happen among gifted students? Is it truly unhealthy? In “The Incidence of Perfectionism in Gifted Students,” Parker and Mills explore these questions and conclude that gifted students and a comparison group of age peers do not differ significantly in the incidence of perfectionism. They also suggest that the anecdotal reportage of perfectionism among gifted youth may be the result of differential labeling. What is viewed as healthy effort among the general cohort may be viewed by others as unhealthy overachievement among gifted students. Finally, they urge the field to develop a more precise distinction between striving which stimulates excellence and striving which inhibits it.

Our “In the Public Interest” shares the reflections of Pamela Clinkenbeard on what the literature on motivation and giftedness has to offer us as we set about developing the talents of our students in the schools. She notes that the studies which conceptualize motivation as a trait or state lead us to include measures or markers of motivation in the identification of gifted students. Leading us further, she points out that the field will benefit from viewing the motivation to learn as the outcome as well as an identification “input” of our programs and services.

Finally, we close with two book reviews which contribute to our understanding of giftedness and motivation. First, Pat Haensly reviews Karen Arnold’s study of high school valedictorians. Her thoughtful review of Arnold’s thoughtful text poses an important question. To what extent does
the traditional avenue of recognition for school achievement – class standing – divert talented young people from a life happily lived and creatively expressed? The review leads us to the fitting, final piece in this special issue on giftedness and motivation, Gary Davis’ review of a biography of E. Paul Torrance by Garnet Millar. Working through document analysis, interviews and extensive conversations with Dr. Torrance himself, Millar has produced a portrait of a man who fell in love with an idea and made it a way of life.
References

Csikszentmihalyi, M. (1991). Flow: The psychology of optimal experience. New York: Harper Perennial.
Feldhusen, J. F. (1986). A conception of giftedness. In R. J. Sternberg & J. E. Davidson (Eds.), Conceptions of giftedness. Cambridge, England: Cambridge University Press.
Galton, F. (1869). Hereditary genius: An inquiry into its laws and consequences. London: Macmillan & Co.
McClelland, D. (1961). The achieving society. New York: The Free Press.
Renzulli, J. S. (1978). What makes giftedness? Re-examining a definition. Phi Delta Kappan, 60, 180–184, 261.
Tannenbaum, A. (1962). Adolescent attitudes toward academic brilliance. New York: Teachers College Press.
White, R. (1959). Motivation reconsidered: The concept of competence. Psychological Review, 66, 297–333.
45
An Empirical Test of Maslow’s Theory of Motivation
Eugene W. Mathes and Linda L. Edwards
Maslow’s (1970) theory of motivation suggests that there are five basic classes of needs and that they are hierarchically organized as follows: physiological, security, belongingness, esteem, and self-actualization. Each need level is prepotent to the next higher need level. This means that an individual initially attempts to satisfy his or her physiological needs, and only when they are satisfied does the individual attempt to satisfy security needs. Once security needs are satisfied the individual attempts to satisfy belongingness needs and so on. Although a number of studies have shown that satisfaction of physiological (Cofer & Appley, 1964), security (Maslow, Birsh, Honigmann, McGrath, Plason, & Stein, 1952), belongingness (Rogers & Dymond, 1954), and esteem needs (Maslow, 1939; 1940; 1942) facilitates self-actualization, there is no evidence demonstrating that these lower needs form the hierarchy specified by Maslow.

The purpose of the study reported below was to test the hierarchical aspect of Maslow’s theory of motivation. To accomplish this end, student subjects (36 males, 76 females) were given self-report inventories: the Security-Insecurity Scale of Maslow et al. (1952); a belongingness scale devised by the authors; Rosenberg’s (1965) Self-Esteem Scale; and Shostrom’s (1965) measure of self-actualization, the Personal Orientation Inventory (POI). Physiological need satisfaction was not measured because it was assumed that the subjects’ physiological needs were satisfied. It was hypothesized that subjects scoring above the median on one of these measures of need satisfaction would obtain significantly higher average
Source: Journal of Humanistic Psychology, 18(1) (1978): 75–77.
scores on all of the measures of need satisfaction further up the hierarchy than subjects scoring below the median. Specifically, three hypotheses were made:

Hypothesis 1. Subjects scoring above the median on the measure of security need satisfaction would obtain significantly higher average belongingness satisfaction, esteem satisfaction, and self-actualization scores than subjects scoring below the median on the security measure.

Hypothesis 2. Subjects scoring above the median on the measure of belongingness need satisfaction would obtain significantly higher average scores on the measures of esteem need satisfaction and self-actualization than subjects scoring below the median on the belongingness measure.

Hypothesis 3. Subjects scoring above the median on the measure of esteem need satisfaction would obtain a significantly higher average score on the measure of self-actualization than subjects scoring below the median on the esteem measure.
To test the first hypothesis, subjects were split into secure and insecure groups by means of a median split of Security-Insecurity Scale scores. The average scores of these two groups on the Belongingness, Self-Esteem, and POI scales were then compared by means of t tests. As Table 1 shows, Hypothesis 1 was entirely supported by the women’s data but only partially supported by the men’s. Although the secure men scored significantly higher on the POI than the insecure men, significant differences were not found for the other two scales. To test the second hypothesis, subjects were split into belonging and nonbelonging groups by means of a median split of Belongingness Scale scores. The average scores of these two groups on the Self-Esteem and POI scales were then compared by means of t tests. As Table 1 shows, Hypothesis 2 was not supported.
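The median-split comparison described above can be illustrated with a minimal sketch. It rests on several assumptions that the article does not confirm: that the t tests were Student's tests with pooled variance, that subjects scoring exactly at the median were placed in the lower group, and that the function names `median_split` and `pooled_t` (hypothetical, introduced here only for illustration) capture the two steps. The scores are invented and are not data from the study.

```python
from math import sqrt
from statistics import mean, median, variance


def median_split(predictor, outcome):
    """Group outcome scores by whether the paired predictor score exceeds the median.

    Assumption: scores equal to the median go to the low group; the article
    does not say how ties were handled.
    """
    cut = median(predictor)
    high = [y for x, y in zip(predictor, outcome) if x > cut]
    low = [y for x, y in zip(predictor, outcome) if x <= cut]
    return low, high


def pooled_t(low, high):
    """Return (t, df) for a two-sample Student's t test with pooled variance."""
    n1, n2 = len(low), len(high)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(low) + (n2 - 1) * variance(high)) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(high) - mean(low)) / se, df


# Hypothetical scores for eight subjects (not data from the study).
security = [10, 12, 15, 18, 20, 22, 25, 30]
poi = [90, 95, 92, 97, 100, 104, 102, 108]

low, high = median_split(security, poi)
t, df = pooled_t(low, high)
print(f"low mean={mean(low):.2f}, high mean={mean(high):.2f}, t={t:.2f}, df={df}")
```

Turning the t statistic into a p value requires the t distribution, which the standard library does not provide; a statistics package would be needed for that step. The same two functions would apply to each hypothesis by swapping in the relevant lower-level measure as the predictor.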
Table 1: Mean satisfaction scores of subjects scoring above and below the median on lower level need satisfaction measures

                Women                                        Men
Measure         Insecure Ss   Secure Ss   p                  Insecure Ss   Secure Ss   p
Belonging       18.58         21.16       .0013              19.28         21.11       n.s.
Self-Esteem     48.37         60.47       .0011              53.89         55.22       n.s.
POI             96.52         106.00      .0054              92.61         107.50      .0006

Measure         Unloved Ss    Loved Ss    p                  Unloved Ss    Loved Ss    p
Self-Esteem     52.74         56.11       n.s.               53.50         57.11       n.s.
POI             98.42         104.11      n.s.               99.50         102.50      n.s.

Measure         Low Self-Esteem   High Self-Esteem   p       Low Self-Esteem   High Self-Esteem   p
POI             98.08             104.45             n.s.    98.22             101.44             n.s.
To test the third hypothesis, subjects were split into high and low self-esteem groups by means of a median split of Self-Esteem scale scores. The average scores of these two groups on the POI were then compared by means of a t test. Table 1 shows that Hypothesis 3 was not supported. The results of this study suggest that Maslow’s hierarchical theory of motivation should be modified to include only two or three levels. Security was shown to be a prerequisite to self-actualization, while belongingness and esteem were shown not to be essential prerequisites.
References

Cofer, C. N., & Appley, M. H. Motivation: Theory and research. New York: Wiley, 1964.
Maslow, A. H. Dominance-feeling, personality and social behavior in women. Journal of Social Psychology, 1939, 10, 3–39.
Maslow, A. H. A test for dominance-feeling (self-esteem) in women. Journal of Social Psychology, 1940, 12, 255–270.
Maslow, A. H. Self-esteem (dominance feeling) and sexuality in women. Journal of Social Psychology, 1942, 16, 259–294.
Maslow, A. H. Motivation and personality (Revised ed.). New York: Harper and Row, 1970.
Maslow, A. H., Birsh, E., Honigmann, I., McGrath, F., Plason, F., & Stein, M. Manual for the security-insecurity inventory. Palo Alto, Calif.: Consulting Psychologists Press, 1952.
Rogers, C. R., & Dymond, R. F. (Eds.). Psychotherapy and personality change. Chicago: University of Chicago Press, 1954.
Rosenberg, M. Society and the adolescent self-image. Princeton, N.J.: Princeton University Press, 1965.
Shostrom, E. L. A test for the measurement of self-actualization. Educational and Psychological Measurement, 1965, 24, 207–218.
46
Meaningfulness, Commitment, and Engagement: The Intersection of a Deeper Level of Intrinsic Motivation
Neal Chalofsky and Vijay Krishna
Source: Advances in Developing Human Resources, 11(2) (2009): 189–203.

The managerial and popular literature has been increasingly referring to the “baby boomers” in America (the disproportionately large generation born just after World War II) nearing retirement age and questioning the meaning and purpose of their work and their lives. At the same time, their children, Generations X and Y, have started their careers asking the same questions. The classic motivation theorists and humanistic psychologists clearly supported the notion that individuals have an inherent need for a work life that they believe is meaningful (Alderfer, 1972; Herzberg, Mausner, & Snyderman, 1959; Maslow, 1943, 1954, 1971; McClelland, 1965; McGregor, 1960; Rogers, 1959, 1961). Maslow (1971) wrote that individuals who do not perceive the workplace as meaningful and purposeful will not work up to their professional capacity.

There is a long history of research and discourse about what motivates employees and the relationship between job satisfaction and performance/productivity. The need or content theories of the 1960s and 1970s and their emphasis on the individual gave way to the reinforcement and person–environment interaction theories of the 1970s through the 1990s and their emphasis on performance, organizational systems, and productivity. Most of the research, therefore, has been in relation to these theories. The resurgence of interest in intrinsic factors such as meaning, purpose, spirituality, and commitment
and the recent introduction of engagement has resulted in an increase in both the popular and scholarly literature concerning the role of work as a motivator in the organization (Csikszentmihalyi, 1990; Fox, 1994; Lockwood, 2007; Meyer & Herscovitch, 2001).

Employee commitment and engagement have emerged as very important constructs in organizational research on account of their favorable relationship with employee behaviors that promote organizational retention and performance. According to Porter (1968), commitment involves the willingness of employees to exert higher efforts on behalf of the organization, a strong desire to stay in the organization, and acceptance of the major goals and values of the organization (as cited in Porter, Steers, Mowday, & Boulian, 1974). A number of studies have shown a positive correlation between employee commitment and job performance (Hunter & Thatcher, 2007; Pool & Pool, 2007). Angle and Perry (1981) showed in their research that organizational commitment correlates positively with the ability of employees and the organization to adapt to unforeseeable events. Studies also suggest that organizational commitment supports organizational citizenship behaviors that are central to flatter organizations, effective teams, and empowerment (Dessler, 1999). Kanter (1968), in her study of 19th-century American utopian societies such as the Shakers, showed that commitment-producing strategies distinguished successful from unsuccessful societies: “commitment is central to the understanding of both human motivation and system maintenance” (p. 499). According to Senge (1993), personnel commitment is one of the key requirements to become a learning organization. Be it a utopian society or a learning organization, commitment is seen as one of the key factors for organizational survival and growth.
Despite the tremendous interest that organizational commitment research generates (Beck & Wilson, 2000), questions about the process and determinants of organizational commitment remain unanswered (Cohen, 2003; Meyer & Herscovitch, 2001). One possible reason for this lack of a clear understanding of the motivational processes is the separation of the intrinsic aspects of motivation from the organizational and contextual factors that affect its development.

Although there has been some research that suggests that employee engagement is related to workforce efficiency and productivity, very little empirical research exists that explains the processes through which engagement develops. Engagement has been defined as “the extent to which employees commit to something or someone in their organization, [and] how hard they work and how long they stay as a result of that commitment” (Corporate Leadership Council, 2004).

The purpose of this article is to explore a deeper level of intrinsic motivation, meaningfulness, and to discuss the connections between meaning of work and meaning at work, represented by the concepts of employee commitment and engagement as organizational and contextual factors. A holistic approach to workplace motivation that combines the intrinsic aspects of
work motivation with the contextual and organizational factors has not been developed in the literature. This approach is important because although motivation is an individual and personal process, it is also significantly influenced and shaped by the contextual and organizational factors. Hence, while studying motivational factors, it is necessary to consider both the individual and the organizational factors that affect its development. This article attempts to fill this gap by generating a conceptual frame of a deeper level of motivation, namely, meaningfulness or meaningful work, and outlines the connection between meaning of work and meaning at work that is expressed in terms of employee commitment and engagement. This article seeks to contribute to the organizational behavior field by linking these streams of research and conceptual development that have not been connected previously. The integrative approach adopted in this article provides a new perspective on the connections between workplace motivation, employee commitment, and employee engagement.
Conceptual Background

In preindustrial society, work was performed in the same community setting where people lived. Consequently, people knew one another closely and saw the connection between their work and how that work benefited the rest of the community. The work of an individual was intricately tied to the wellbeing of the self and the community. There was no separation of work from self, community, and life. The twin forces of reduction in agricultural work and rise of mechanical work meant more people becoming wage earners who were working for others (Brisken, 1996). In 1860, half the working population was self-employed; by 1900, two thirds were wage earners. Work became governed by the clock, by uniform standards, and by supervisors. “Reason demanded that workers subordinate their own experience of natural rhythms to the logic of efficiency” (Brisken, 1996, p. 100).

The industrial era separated work from the community and created the bureaucracy to house, organize, and control work. There was little or no contact between the organization where employees worked and the community where they lived. Work was no longer an integral part of community life; it was detached, separated, and contained within specific buildings and times. In bureaucracies, hierarchies separated executives from workers, and internal competition forced workers against workers as they fought to move up the increasingly narrow upper levels of the organization. Wall Street further separated the owners from the employees. Now there are people who commute from New York or Boston to Washington and beyond, as well as people all over the globe who work in virtual teams and even virtual organizations. Consequently, people are not only moving work further away but are further away from the rest of their
lives. As work has become separated from the community and life, it has lost its original sense of meaning as an integral aspect of human existence. One hypothesis is that motivation only became an issue because meaning disappeared when the work became separated from the rest of life and community. “As a consequence motivation theories have become surrogates for the search for meaning” (Sievers, 1984, p. 3). There is very little research based on the premise that meaningful work is lost when work becomes separated from being a natural and integral part of the community.

In the 1960s and 1970s, the classic motivation theorists and humanistic psychologists clearly supported the notion that individuals have an inherent need for a work life that they believe is meaningful (Alderfer, 1972; Herzberg et al., 1959; Maslow, 1943, 1971; McGregor, 1960; Rogers, 1959, 1961). Maslow (1971) wrote that individuals who do not perceive the workplace as meaningful and purposeful will not work up to their professional capacity. They theorized that individuals are motivated to take certain actions based on fulfilling needs believed to be inherent in all humans. These theorists all proposed that as these needs move from the basic survival needs to higher-order needs, they become more intrinsic and reflective in nature. The higher-order needs reflect life values: working toward a higher cause, meaningfulness, and life purpose. Maslow (1971) expressed these values as Being values, referred to as B-values. B-values included truth, transcendence, goodness, uniqueness, aliveness, justice, richness, and meaningfulness. Maslow believed that individuals have the potential to reach what he called self-actualization, which is the process of developing one’s potential, of expressing oneself to the fullest possible extent in a manner that is personally fulfilling. It is not an end-state but an ongoing process of becoming.
Near the end of his life, Maslow wrote of people who seemed to transcend self-actualization. He labeled this phenomenon “Theory Z” after McGregor’s (1960) “Theories X and Y.” In this state, people are devoted to a task, vocation, or calling that transcends the dichotomies of work and play. Maslow (1971) viewed this as a dynamic process of expanding the capabilities of the self to virtually unlimited potential. Also noteworthy were the thoughtful concepts from Rogers (1961), Locke (1975), and Ackoff (1981). Rogers believed that people find purpose when they experience freedom to be exactly who they are in a fluid and changing manner. Locke (1975) wrote that people strive to attain goals to satisfy their emotions and desires. Ackoff (1981) described purpose and meaning as progress toward an ideal that converts mere existence into significant living by making choice meaningful.
Meaning of Work
In the late 1990s and early 2000s, spirituality and meaning at work emerged as a reaction to the loss of job security, as well as other factors (Darling & Chalofsky, 2004). One set of events was the environmental
disasters of Chernobyl, the chemical pollution at Bhopal, and the big oil spills off the coasts of Canada and Europe. These sparked an increase in the collective consciousness about corporate social responsibility. The second set of events was the ethics scandals at Enron, WorldCom, and others. There have been a host of books, articles, and other media questioning our misuse of this planet, the role of work in capitalist societies, and our moral, ethical, and spiritual stance around life’s meaning and purpose (Holbecke & Springnett, 2004). In the past several years, organizations had been attempting to attract and retain highly qualified workers in advance of a projected labor shortage and amid increasing global competition. More recently, the economic downturn that began in 2007/2008 has been causing tremendous turmoil in employment. Yet new young professionals are still expressing a preference to work for socially responsible, ethically driven organizations that allow the “whole self” to be brought to work. And the “baby boomers” in America have been going through midlife and early retirement questioning the meaning and purpose of work in their lives, especially those who went through the downsizings of the 1990s (both the ones who lost their jobs and the survivors). When asked how they feel about work, according to one consulting group, these people talk about a sense of loss; a lack of purpose, trust, and commitment; a loosening of emotional ties to the workplace; and a questioning of whether their work is worthwhile (Holbecke & Springnett, 2004).
According to the Society for Human Resource Management’s (2008b) workplace forecast report, 4 of the 10 key themes identified were the following:
• The implications of increased global competitiveness, especially the need for an educated and skilled workforce
• Demographic changes, especially the aging of the workforce, the impending retirement of the baby boom generation, and the greater demand for work/life balance
• Growing need to develop retention strategies for current and future workforce
• Demographic shifts leading to a shortage of high-skill workers
Other findings from their survey that were relevant include the following:
• Growth in the number of employees with caring responsibilities (elder care, child care, and both elder care and child care at the same time)
• Generational issues – recognizing and catering to groups such as Generation Y (born 1980–2000), Generation X (born 1965–1980), and so on
As mentioned earlier, the United States and the rest of the world were going through a chaotic economic decline, and even before the economic turmoil fully emerged, employees identified job security as their top concern (Society for Human Resource Management, 2008a). The Society for Human Resource
Management study identified contributors to employee job satisfaction, and the rest of the top four were the following: benefits, compensation, and feeling safe in the work environment. The top four contributors to job satisfaction were actually not satisfiers, based on Herzberg, but basic hygiene factors, or lower-order levels of Maslow’s hierarchy. And they were rated high, at least in part, because of the dismal economic situation. So to call them contributors to satisfaction, or motivational factors, is a misnomer. But five of the top 10 contributors to job satisfaction are motivational:
• Opportunities to use skills and abilities
• Relationship with immediate supervisor
• The work itself
• Meaningfulness of job
• Flexibility to balance life and work issues
What all these findings point to is the American workforce’s desire to be part of an organization that is going to take care of them and help them take care of their families, support their growth through skill and knowledge development, understand their need to have some work–life balance, and use their skills and abilities in a way that is meaningful.
Motivation and Meaning
The literature refers to values as intrinsic motivators to performing a task and deriving satisfaction from the accomplishment of that task (or job). Although the emphasis may be on the congruence of the task with our beliefs, objectives, and anticipated rewards, motivation is seen as focused on the accomplishment of the task. The common assumption is that we are motivated by values based on result or outcome. Meaning, on the other hand, is more deeply intrinsic than values, suggesting three levels of satisfaction: extrinsic, intrinsic, and something even deeper. This level of intrinsic motivation is about the meaning of the work itself to the individual. Csikszentmihalyi (1990), in his attempt to define meaning, readily acknowledged the difficulty the task presents by suggesting that any definition of the term would undoubtedly be circular. However, he pointed to three ways in which the word may be defined, two of which are (a) having a purpose or the significance of something and (b) the intentions one holds. Similarly, Dirkx (1995) subscribed to the theory that work is one of the ways that a mature adult cares for oneself and others. This was expressed by respondents in the Schaefer and Darling (1996) study, who defined work as an opportunity for service to others and not distinct from the rest of life. The term may also be definitive of one’s uniqueness and a way of expressing one’s self in the world.
The significance of Csikszentmihalyi’s research lay in showing how intrinsically motivated people are driven by the work itself rather than by the accomplishment of the task. He included people in a wide range of occupations and activities and discovered a particular kind of experience where people’s performance seemed effortless. They described the feeling of being able to continue forever in their task and wanting to learn additional skills to master more demanding challenges. The fun, the sense of mastery, and the potential for growth of self were what he labeled flow. In addition, they were disappointed when the work was finished because they were no longer in the flow state. This flow state was very similar to Maslow’s peak experiences at the self-actualization level.
The work itself is but one aspect of Chalofsky’s (2003) construct of meaningful work. Chalofsky identified three themes: sense of self, the work itself, and the sense of balance. These themes represent a deeper level of motivation than the traditional intrinsic values of a sense of accomplishment, pride, satisfaction of finishing a task, and praise from a supervisor. This emerging new paradigm links back to some of the work of the content theorists but takes their thinking and the concept of intrinsic motivation to a deeper evolutionary level.
Sense of Self
The idea of people needing to bring their whole selves (mind, body, emotion, and spirit) to their work is critical to finding meaning in work. People often fail to bring their whole selves to work out of fear of rejection, prejudice, or misunderstanding. “We work hard to create physical safety in our workplaces. Can we also create mental, emotional, and spiritual safety – safety for the whole person?” (Richards, 1995, p. 87). Mitroff and Denton (1999), in their groundbreaking study of spirituality in the workplace, found that the word that best described what people were feeling was a loss of interconnectedness, and what upset them the most was not being able to bring their complete selves into the workplace. For those people who felt adrift spiritually, their work and the workplace ceased to be a source of deeper meaning, satisfaction, and connection. Helping individuals integrate their work and spiritual lives might mean that the time people spend working in their lifetime is more joyful, balanced, meaningful, and spiritually nourishing (Gibbons, 2007). These more fulfilled individuals might then return to their families, friends, and communities contented, refreshed, and ready to contribute. Because of this integration, one might expect these people to be more ethical and more productive workers – which would benefit their employers. Moreover, a values-based organizational culture might help businesses to become humane, socially active, and environmentally responsible.
Before one can bring the whole self to work, one has to first be aware of one’s own values, beliefs, and purpose in life. The sense of self also includes constantly striving to reach one’s potential and believing in one’s ability to reach that potential. And it includes an alignment between one’s purpose in life and the purpose for the work. Fulfillment, in part, comes from feeling that what we do on this earth makes a difference to other people. In fact, Maslow’s (1971) views expressed in The Farther Reaches of Human Nature would warrant the term selfless-actualization rather than self-actualization (Greene & Burke, 2007). His last work espoused human development beyond the self in self-actualization. Maslow’s (1971) message was that people must ultimately move from a focus on self to a focus on and concern for other people to achieve the highest level of human nature. People who move beyond self-actualization “are, without a single exception, involved in a cause outside of their skin: in something outside of themselves, some calling or vocation” (p. 42). Meeting the self-actualization needs focuses on achieving a personal identity and complete acceptance of self and then moving beyond to a higher connection with others.
The Work Itself
In the not-so-distant past, managers made decisions about the structure and process of work activities, in the name of efficiency (Thomas, 2000). Jobs were broken down into tasks, which involved certain competencies, and specific and measurable objectives. But work has now changed dramatically. Organizations have realized that they need to rely more and more on workers to make decisions about how the work should get accomplished. This requires more worker autonomy, flexibility, empowerment, continuous learning, risk taking, and creativity. Thomas captures what the research has demonstrated with his list of the four most critical intrinsic rewards: sense of meaning and purpose, sense of choice, sense of competence, and sense of progress. Although the work itself relates back to both Maslow’s self-actualization and Alderfer’s growth levels, and to an extent Herzberg’s motivators, the focus is on carrying out one’s life purpose through the work itself. “This is what I was meant to do.” It is not about productivity or another end state. It is about working and growing as a never-ending process.
Professionalism is a related concept about taking pride in your work, a commitment to quality, a dedication to the interests of the client (be they internal or external), and a sincere desire to help. The premise of Good Work (Gardner, Csikszentmihalyi, & Damon, 2001) also speaks to professionalism but expands the concept to include ethics and social responsibility. The authors define good work as “work of expert quality that benefits the broader society” (p. ix). And people know that they are doing good work because it feels good. This may sound too simple, but people know when the work they are doing is good and meaningful. It is about trusting both one’s judgment and
one’s intuition. The more we know ourselves, the more we can evaluate and change our professional behavior, our moral and ethical judgment, and how our performance affects those around us.
Sense of Balance
To paraphrase a Zen Buddhist saying, work and pleasure should be so aligned that it is impossible to distinguish one from the other. The sense of balance at its ideal is that life is so integrated that it does not matter what one is doing so long as it is meaningful. But given that most of us do not live in an ideal world, a sense of balance concerns the choices we make between the time spent at paid work, unpaid work (work at home, with family, as a volunteer), and at pleasurable pursuits, such that no one area of our lives is so dominant that we cease to value the other areas. All work and no play is stressful and overwhelming and usually results in our health, family, and social lives suffering – even when the work is meaningful. All play and no work quickly becomes boring and meaningless. We also need to balance the nourishing of our different selves (mental, physical, emotional, and spiritual) because, in the less than ideal world, we do not have the luxury of meeting all our needs through one major activity. So we need to take the time to learn, to keep fit, to reflect, to meditate or pray, and to give to others. Again, because we usually worry most about doing our paid work, we do not take the time to care for ourselves. And when we do not take care of ourselves, we usually cannot be there for others. So we end up running on the proverbial treadmill until we finally realize we are not meeting our own or anyone else’s needs. The statistics we read in the media on work-related stress, people being overweight and less than physically fit, depression, divorce, and even workplace violence speak for themselves. Employees today are defining success on their own terms and some are opting out of the corporate rat race. Instead of living to work, people are working to live.
They are tired of the inflexibility of standard work hours and the lack of concern for work–family balance and are leaving corporate positions in favor of more flexible career options. Meaningful work is not just about the meaning of the paid work we perform; it is about the way we live our lives. It is the alignment of purpose, values, and the relationships and activities we pursue in life. It is about living our lives and performing our work with integrity. It is about integrated wholeness.
Meaning at Work
Meaning at work implies a relationship between the person and the organization or the workplace, in terms of commitment and engagement. Richards (1995) talked about the situation that when there is meaning at work, “[only
then] will our work become more joyful [and] our organizations will flourish with commitment, passion, imagination, spirit, and soul” (p. 94). As noted earlier, commitment involves the willingness of employees to exert greater effort on behalf of the organization, a strong desire to stay in the organization, and acceptance of the major goals and values of the organization (as cited in Porters et al., 1974).
Commitment
The term work commitment refers to a broader concept than organizational commitment and includes the different forms commitment can take in the workplace. According to Morrow (1993), there are five universal forms of work commitment, namely, (a) work ethic endorsement, (b) career commitment, (c) affective organizational commitment, (d) continuance organizational commitment, and (e) job involvement. The third form refers to an affective or psychological bond that ties an employee to his/her organization. The primary drivers of this form of commitment are identification with the organization’s goals and values, congruence between individual and organizational goals, and internalization of organizational values and mission. Of all the forms of commitment, affective commitment has been found to have the strongest positive relationship with desirable outcomes (Eisenberger, Huntington, Hutchison, & Sowa, 1986). Organizations that want to foster affective commitment must in turn show their commitment to the employees by providing supportive work environments. The research that has examined the relationship between perception of organizational support and organizational commitment has found a consistent positive relationship between them. The perception of organizational support construct holds that “employees form a global belief concerning the extent to which the organization cares about them and values their contribution to the organization” (Aselage & Eisenberger, 2003, p. 492). Employees will be loyal to their organization if their organization values and appreciates them (Tyler, 1999, as cited in Fuller, Barnett, Hester, & Relyea, 2003).
Organizations that are committed to employee development, employee well-being, and employees’ need for actualization tend to have employees with high commitment (Dessler, 1999). Paul and Anantharaman (2004), in their research study, found that of all the human resource management variables that correlate with commitment, the human resource development variables of (a) career development, (b) development-oriented appraisal, (c) comprehensive training, and (d) employee-friendly work environment have the strongest correlation. In a study on culture and employee-friendly/humane organizations, Chalofsky (2008) found that there was an interdependent relationship based
on the values of the organizational culture. Although no organization can be all things to all people, the organizations that were studied work hard to recognize and support employees’ work, family, leisure, personal, and community needs. They know that if work–life balance is provided, more of the whole employee is able to focus (and wants to focus) on the work. Employees of the organizations are not there just because they have great benefits. The benefits are a result of the culture, because the culture values employees. In turn, employees have an overwhelming commitment to their organizations. It is all intertwined and synergistic. This was evident in the overwhelming alignment between the organizations’ missions and their commitment to their employees, customers, suppliers, and community. The organization supports the whole person, and the whole person is engaged in the organization.
Engagement
Employee engagement has emerged as the most recent “business driver” of organizational success (Lockwood, 2007). A number of consulting companies (e.g., Gallup, Blessing-White) have surveyed their clients and have found a concern that the majority of employees are not engaged in their work and their organizations. One survey (Blessing-White, Inc., 2005) found that some of those employees who are not engaged may care about the organization and their work but do not feel there is a good fit between their capabilities and their tasks. Others are not dissatisfied enough to leave the organization but are biding their time and are not committed to either their work or the organization. The rest are actively looking to leave the organization. Engaged employees, on the other hand, work harder, are more committed, and are more likely to go “above and beyond” the requirements and expectations of their work (Lockwood, 2007). Engaged employees tend to feel that their work actually positively affects their physical health and their psychological well-being (Crabtree, 2005). The findings of Blessing-White, Inc. (2006) were similar: Engaged employees were proud to work in their organizations and trusted their immediate managers. Overall, their emotional connections were positive. Emotionally based commitment to the work and the organization results in higher levels of engagement and commitment based on developmental, financial, or professional rewards (Corporate Leadership Council, 2004).
Conclusion: Meaningfulness, Commitment, and Engagement
One of the primary challenges organizations are facing today concerns motivating employees to carry out broader and more proactive roles. The current workforce is becoming more emergent and less traditional. An
emergent workforce is driven by opportunity, as opposed to a traditional workforce that believes that tenure dictates growth (Campbell, 2002). Hence, organizations will need to develop novel approaches to motivation to retain an emergent workforce. Given the current state of the economy, it may seem that hiring and retention are not as important as they were thought to be several years ago. But organizations that want to be sustainable and successful over the long term still need to consider how to attract and grow high-performing and committed employees. In view of the ineffectiveness of extrinsic motivational factors in fostering employee commitment and engagement, and the limited impact of traditional intrinsic factors in isolation, this article develops a conceptual framework of the relationship between commitment and engagement and a deeper level of intrinsic motivation, namely, meaningful work. This article builds on the premise that people with the highest levels of productivity and fulfillment view themselves as inseparable from their work (Mohrman & Cohen, 1995), are intrinsically motivated by the work itself (Csikszentmihalyi, 1990), and are professionally committed to and engaged with the organization. This approach combines the individual aspect of motivation emanating from a psychological perspective with a contextual dimension of motivation that highlights the importance of workplace environment and culture. Although the commitment construct has been researched for more than four decades, the research pertaining to engagement is of recent origin. Most of the engagement literature at this time is based on survey results generated by consulting companies rather than on empirical research. More research needs to be conducted concerning engagement as a viable construct and the relationship between engagement, commitment, and meaningfulness.
The connections among the concepts of meaningful work, employee commitment, and engagement can give human resource development practitioners and managers powerful tools to develop workplace strategies that can greatly improve employee satisfaction, fulfillment, and loyalty. Organizational productivity, retention, and sustainability will be enhanced, and individuals will feel good about their work and how it affects the rest of their lives.
References
Ackoff, R. L. (1981). Creating the corporate future: Be planned or be planned for. New York: Wiley.
Alderfer, C. P. (1972). Existence, relatedness and growth: Human needs in organizational settings. New York: Free Press.
Angle, H. L., & Perry, J. L. (1981). An empirical assessment of organizational commitment and organizational effectiveness. Administrative Science Quarterly, 26, 1–13.
Aselage, J., & Eisenberger, R. (2003). Perceived organizational support and psychological contracts: A theoretical integration. Journal of Organizational Behavior, 24, 491–509.
Beck, K., & Wilson, C. (2000). Development of affective organizational commitment: A cross-sequential examination of change with tenure. Journal of Vocational Behavior, 56, 114–136.
Blessing-White, Inc. (2005). Employee engagement report 2005. Princeton, NJ: Author.
Blessing-White, Inc. (2006). Employee engagement report 2006. Princeton, NJ: Author.
Brisken, A. (1996). The stirring of the soul in the workplace. San Francisco: Jossey-Bass.
Campbell, B. (2002). The high cost of turnover: Why holding on to your employees can improve your bottom line. Black Enterprise, 33(5), 61.
Chalofsky, N. (2003). An emerging construct for meaningful work. Human Resource Development International, 6, 69–83.
Chalofsky, N. (2008). Work-life programs and organizational culture: The essence of workplace community. Organization Development Journal, 26, 11–18.
Cohen, A. (2003). Multiple commitments at work: An integrative approach. Hillsdale, NJ: Lawrence Erlbaum.
Corporate Leadership Council. (2004). Driving performance and retention through employee engagement. Washington, DC: Author.
Crabtree, S. (2005). Engagement keeps the doctor away. Gallup Management Journal. Retrieved November 12, 2007, from http://gmj.gallup.com/content/14500/EngagementKeeps-Doctor-Away.aspx
Csikszentmihalyi, M. (1990). Flow: The psychology of optimal experience. New York: Harper Perennial.
Darling, J., & Chalofsky, N. (2004). Spirituality in the workplace. In M. Marquardt (Ed.), Encyclopedia of life support systems (EOLSS). Oxford, UK: EOLSS. Retrieved February 5, 2009, from http://www.eolss.net/outlinecomponents/HumanResources-Management.aspx
Dessler, G. (1999). How to earn your employees’ commitment. Academy of Management Executive, 13, 58–67.
Dirkx, J. (1995). Earning a living or building a life? Reinterpreting the meaning of work in the practice of workplace education. Paper presented at the Academy of Human Resource Development Conference, San Antonio, TX.
Eisenberger, R., Huntington, R., Hutchison, S., & Sowa, D. (1986). Perceived organizational support. Journal of Applied Psychology, 71, 500–507.
Fox, M. (1994). The reinvention of work: A new vision of livelihood for our time. New York: Harper Collins.
Fuller, J. B., Barnett, T., Hester, K., & Relyea, C. (2003). A social identity perspective on the relationship between perceived organizational support and organizational commitment. Journal of Social Psychology, 143, 789–791.
Gardner, H., Csikszentmihalyi, M., & Damon, W. (2001). Good work: When excellence and ethics meet. New York: Basic Books.
Gibbons, P. (2007). Spirituality at work: A pre-theoretical overview. Retrieved September 8, 2008, from http://www.paulgibbons.net
Greene, L., & Burke, G. (2007). Beyond self-actualization. Texas State University, School of Health Administration. Retrieved September 22, 2008, from http://ecommons.txstate.edu/cgi/viewcontent.cgi?article=1001&context=sohafacp
Herzberg, F., Mausner, B., & Snyderman, B. B. (1959). The motivation to work. New York: Wiley.
Holbecke, L., & Springnett, N. (2004). In search of meaning in the workplace. Unpublished report, Roffey Park Institute, London.
Hunter, L. W., & Thatcher, S. M. (2007). Feeling the heat: Effects of stress, commitment, and job experience on job performance. Academy of Management Journal, 50, 953–968.
Kanter, R. M. (1968). Commitment and social organization: A study of commitment mechanisms in utopian communities. American Sociological Review, 33, 499–517.
Locke, E. A. (1975). Personnel attitudes and motivation. Annual Review of Psychology, 26, 457–498.
Lockwood, N. R. (2007). Leveraging employee engagement for competitive advantage: HR’s strategic role (SHRM Research Quarterly Report). Alexandria, VA: Society for Human Resource Management.
Maslow, A. H. (1943). A theory of human motivation. Psychological Review, 50, 370–396.
Maslow, A. H. (1954). Motivation and personality. New York: Harper.
Maslow, A. H. (1971). The farther reaches of human nature. New York: Penguin.
McClelland, D. C. (1965, November/December). Achievement motivation can be developed. Harvard Business Review, 43, 7–16.
McGregor, D. (1960). The human side of enterprise. New York: McGraw-Hill.
Meyer, J. P., & Herscovitch, L. (2001). Commitment in the workplace: Toward a general model. Human Resources Management Review, 11, 299–326.
Mitroff, I., & Denton, E. (1999). A study of spirituality in the workplace. Sloan Management Review, 40, 83–92.
Mohrman, S. A., & Cohen, S. G. (1995). When people get out of the box: New relationships, new systems. In A. Howard (Ed.), The changing nature of work (pp. 365–410). San Francisco: Jossey-Bass.
Morrow, P. (1993). The theory and measurement of work commitment. Greenwich, CT: JAI Press.
Paul, A. K., & Anantharaman, R. N. (2004). Influence of HRM practices on organizational commitment: A study among software professionals in India. Human Resource Development Quarterly, 15, 77–88.
Pool, S., & Pool, B. (2007). A management development model: Measuring organizational commitment and its impact on job satisfaction among executives in a learning organization. Journal of Management Development, 26, 353–369.
Porters, L. W., Steers, R. M., Mowday, R. T., & Boulin, P. V. (1974). Organizational commitment, job satisfaction, and turnover among psychiatric technicians. Journal of Applied Psychology, 59, 603–609.
Richards, R. (1995). Artful work: Awakening joy, meaning, and commitment in the workplace. San Francisco: Berrett-Koehler.
Rogers, C. (1959). A theory of therapy, personality, and interpersonal relationships as developed in the client-centered framework. In S. Koch (Ed.), Psychology: A study of science (Vol. 3, pp. 184–256). New York: McGraw-Hill.
Rogers, C. (1961). On becoming a person. Boston: Houghton Mifflin.
Schaefer, C., & Darling, J. (1996). Contemplative disciplines in work and organizational life. Spring Valley, NY: High Tor Alliance.
Senge, P. (1993). The fifth discipline: The art and practice of the learning organization. New York: Doubleday.
Sievers, B. (1984). Motivation as a surrogate for meaning (Arbeitspapiere des Fachbereichs). Wuppertal, Germany: Bergische Universität.
Society for Human Resource Management. (2008a). Job satisfaction survey report. Alexandria, VA: Author.
Society for Human Resource Management. (2008b). Workplace forecast. Alexandria, VA: Author.
Thomas, K. (2000). Unlocking the mysteries of intrinsic motivation. OD Practitioner, 32(4), 27–30.
47 Motivation and Human Growth: A Developmental Perspective M.S. Srinivasin
Introduction
Motivation is a subject of perennial interest in management, psychology and leadership. However, most modern motivational theories suffer from two inadequacies – a lack of sufficient attention to the higher motives of the mental, moral and spiritual being in humans; and a too-heavy insistence on performance rather than on growth. What is not recognized fully is that motivation can be a means or lever of human development in the organization. A human being is not merely a knowledge, skill and productivity engine created solely for filling the coffers of an organization or meeting its bottom line and deadlines. It is a complex living entity with a sacred essence, created for a higher purpose. Most wisdom-traditions of the world agree that this higher purpose is a progressive unfolding of the human potential, culminating in fully blossomed flowers of humanity. This article provides a conceptual framework for understanding the process of motivation from an evolutionary and developmental perspective.
Source: Journal of Human Values, 14(1) (2008): 63–71.
Hierarchy of Motives
Equality of humans may be a spiritual truth, but is not yet an actual fact of life because individuals are at various levels of development. Needs, values and attitudes of individuals depend on their nature and the level of their inner
development. The task or challenge of corporate leadership is, therefore, to understand intuitively this inner spirit of an employee and provide him with an individualized motivational programme that matches his unique needs. But how is this motivational level of each individual employee to be determined? This is where the importance of the well-known ‘need hierarchy of motives’ model of Abraham Maslow comes in. This model identifies five basic human needs and arranges them in an ascending order. They are: first, biological ones for sex, survival and other physical needs; second, those for material and emotional security; third, social needs for affection, autonomy, achievement, status, recognition and attention; and, finally, the highest need of all, self-actualization. According to Maslow, as each of these needs becomes substantially satisfied, the next need becomes dominant. So the right motivation requires a clear understanding of these motivational needs of each individual and a focus on satisfying them (Robins 1997: 214). This need hierarchy model of Maslow, after a powerful initial impact on management thinkers and professionals, later went out of favour for supposedly better theories. Maslow’s idea was criticized on many points. For example, it was accused of ignoring the cultural factor, of lacking empirical validity, and of treating as hierarchical needs that are in fact parallel. All these criticisms can be valid, for no concept or theory can hope to explain or encompass the incredible complexity of human nature and its motives. But Maslow’s need hierarchy model has two plus points over other modern motivational theories. First, it recognizes the process of evolution, viewing the human being as an evolving entity, moving progressively towards higher and higher levels of motivation; second, its intuition or idea is broader and more comprehensive than other modern theories. However, from the viewpoint of Indian spiritual vision, Maslow’s model has two flaws. First, it ignores or fails to articulate clearly the higher intellectual, moral and spiritual motives in man; and, second, from a holistic perspective, it needs to be integrated with a comprehensive vision of human development. This is where the Indian vision of human development can rectify and complement Maslow’s model.
Evolution and Motivation: The Indian Paradigm
According to Indian thought, there are four stages in the evolution of humans that takes them towards their spiritual goal. Every human being begins the evolutionary journey as a physical entity driven by biological and security needs. He progresses to becoming a vital being with emotional and vital needs.1 There are two sub-stages in the evolution of the vital human. First, he becomes someone who lives predominantly in his emotional and pragmatic mind with its need for mutuality, harmonious relationship, enjoyment
and pragmatic adaptation to life. In the need hierarchy of Maslow these social needs constitute only one part of our emotional needs. At the next stage, the vital human becomes a person of strong will and abundant vital energy, the leader or the warrior type, with needs for power, achievement, conquest, expansion, name and fame. These ‘esteem’ needs in Maslow’s theory are, again, only one part of the needs of the human type of will and power. Alexander and Napoleon are archetypal vital men of power, while in the corporate world, great and successful entrepreneurs and executives like Carnegie and Ford of the old economy, and Gates and Grove of the new economy, are predominantly vital men. As the person progresses further, he becomes the intellectual, moral and artistic type of personality with intellectual, ethical and aesthetic needs for knowledge, values, ideals and vision; in other words, the mental human.2 He looks beyond physical and vital needs, seeking to understand higher aims, values and laws of life, and trying to organize it according to these higher verities. Socrates and Plato, Tagore and Leonardo da Vinci, Einstein, Confucius and Gandhi are different types of mental men who have reached the higher plateaus of the human mind. One of the major aims of the social philosophy and practices of ancient Indian and Chinese civilization is to create a society governed by the mental and moral motives of dharma. As the mental human reaches the highest peak of intellectual, ethical and aesthetic development, he becomes aware of a spiritual reality beyond the mind and awakens to this highest spiritual need for self-realization, truth and God. He begins to become the spiritual human. The Vedic and Upanishadic sages, St Francis of Assisi, Meister Eckhart, and modern age sages like Sri Aurobindo, Vivekananda and Ramana Maharishi are different types of accomplished spiritual men.
We must note here that the stages of an individual’s evolution depend mainly on the dominant temperament and motives that shape and drive his life, and not on academic status or mental development. In the process of evolution, mind and vital need develop simultaneously, although some vital persons may be at a transitional stage from the vital to the mental phase of development. Take for example someone like Andy Grove of Intel, the microchip giant. He started his career as a brilliant research engineer with a doctorate in chemical engineering, did some outstanding research work in fluid mechanics and semiconductor physics, and wrote six books. But when we look at his later life as CEO of Intel, we can see that his dominant temperament and motives are those of the vital human, with an aggressive push for power, dominance, achievement, name and fame. These four types or stages in human evolution can be placed in a corresponding four-fold motivational spectrum. At the lower end of the spectrum are the outwardly motivated who need the stimulus of external reward or punishment to remain active. At the higher end first come the self-motivated who feel an intrinsic joy in work and, therefore, need no external stimulus to remain motivated. Next come the ethically motivated who feel the need to
contribute to or serve a higher moral or social cause. The ethically awakened individual seeks not only joy in work, but a higher meaning as well. The last and the highest is spiritual motivation, which develops when the individual is awakened to his spiritual self beyond his body and mind. Let us now try to relate these four stages of evolution to their motivation spectrum. The physical human who is bound to the needs and instincts of the body is at the lowest level of the motivation spectrum. For his higher evolution and development, his vital and emotional being have to be awakened by external motivators like the need for wealth, power, enjoyment and success. The vital man is capable of self-motivation and self-dedication to a higher moral or spiritual cause. When he awakens to these higher motives and dedicates himself to a higher ideal, he not only accelerates his own higher evolution, but also becomes a dynamic instrument for the higher evolution of the collectivity. The vital being, inspired by higher values, can be a very effective and heroic leader and crusader for manifesting these higher values in the outer life. Some of the Indian kings like Ashoka, Shivaji and Akbar, and statesmen of the West like Winston Churchill and Abraham Lincoln belong to this category. However, if there is a lack of sufficient mental or spiritual illumination in the mind, the vital man can become an aggressive and intolerant tyrant, forcefully championing a narrow dogmatic idea. Similarly, when the mental human awakens to the spiritual realm, he may blossom into a high thinker, sage or saint, sowing luminous, kindly or inspiring ideals in the consciousness of people. But if there is a lack of strength in the will or vital force, the mental or moral individual will be ineffective as a leader.
So, to fully realize moral and spiritual potentialities, both vital and mental humans must pursue a mental, moral and spiritual education and discipline, leading to a deepening, widening and refinement of mind and heart, linking their consciousness and will to a spiritual inspiration and energy. One such discipline is the karma yoga or yoga of action of the Indian scripture, the Bhagavad Gita. A main principle of this discipline, which has direct relevance for the corporate world, is to renounce the eager and anxious seeking of rewards of action and concentrate all our energies on the present, on the work to be done. If we have faith in God, we may add to this a consecration of all our activities to the divine power. The karma yoga path of the Gita leads to motiveless action, driven not by human motives – vital, mental or moral – but by a universal spiritual force, transcending the individual and collective ego. Thus, Indian spiritual vision links motivation with human development in an integrated perspective. This scheme provides a broad and general framework for understanding and identifying the process of motivation in an evolutionary perspective. However, as mentioned earlier, human evolution is a complex process that cannot be rammed into any mental formula. We are at once a physical, vital, mental and a spiritual being. The motives and impulses of all these parts exist simultaneously within us although some of
them may be dormant, weak or unmanifest.3 The stage of our inner development depends on the most dominant, conscious or manifest part of our personality. For example, if the dominant part is vital we are in the second, vital stage of development. We also admit that this Indian scheme of human evolution is only one among many other possible formulas. Other schemes with different systems of classification are also possible and equally valid, but the Indian concept is preferable because we find it integral, embracing all the fundamental elements constituting the human organism.
Beyond Job Satisfaction
This brings us to one of the major objectives of modern motivational strategies – job satisfaction. Job satisfaction happens when the nature of work and the rewards received for this work match the motivational needs of an employee. But mere job satisfaction cannot be the highest ideal for an evolving human being. In an evolving world, growth and progress is an eternal law and a higher need. Anything that does not grow disintegrates and perishes. So we have to create a work culture that consciously promotes and accelerates the progressive evolution of the individual by awakening in him the dormant higher needs. So the aim of motivational strategy has to be not only to satisfy the employee’s present needs, but also to awaken higher needs. This means the physical being has to be awakened to his vital and mental needs, and helped to become the vital and mental being; the vital human to his mental, moral and aesthetic needs to bring the light of a higher culture to his life of raw desire and ambition; and the mental or moral individual to his highest spiritual goal. The need for this evolutionary transition to higher needs is indicated by a lack of interest in the needs and activities of the present stage of development, and a growing interest in the needs and activities of higher stages. Here is an example from the Harvard Business Review illustrating this transition. Mark was a star at the large West Coast bank where he had worked for three years. He had an MBA from a leading business school and he had distinguished himself as a skilled lending officer. He excelled in every work task the bank gave him. He was smart and knew no other way to approach a task than to give it his all. The bank paid Mark well and senior managers had every intention of promoting him. But over time Mark grew more and more unhappy. He was seriously considering leaving the organization.
Fortunately for both Mark and the bank, after consulting a counsellor, he was able to identify the cause of his unhappiness: he was no longer interested in his present job, which involved number crunching and interaction with customers. He wanted a more intellectually stimulating job. Using this insight, he was able to find a new assignment that required conceptual and analytical thinking, making him happy and satisfied (Butler and Waldrop 1999).
It is very difficult to say with precision or certainty what the psychological factors behind Mark’s motivational problem were. One could be a shift in his life-motives from the vital to the mental level. However, sometimes this awakening to higher motives may express itself not in the professional life of the person, but in his hobbies and extra-professional interests. For example, it was reported in a leading business journal that a top executive from a big business house was very much interested in the field of unified theory in physics and in his spare time read every available book on the subject.
The Corporate World in the Motivation Map
We are now in a better position to relate the motivational process sketched so far to the present state of the corporate world. Our modern age represents a rapid and increasing ‘vitalization’ and ‘mentalization’ (terms coined by Sri Aurobindo) of the human mass. So the pure physical type of personality, satisfied with basic minimum needs, is becoming rarer and rarer, for in the hyper-competitive and charged atmosphere of the corporate world, with its new thrust towards empowerment, knowledge, innovation and relentless chasing of deadlines, there is not much scope for the physical human. However, most of the shop floor and clerical workforce in the corporate world may perhaps live predominantly in their physical consciousness, but with a growing awakening to vital and mental needs. Moving up to the managerial cadre, we have some interesting insights on executive motivation from two psychologists, Timothy Butler and James Waldrop, as elaborated in their article in the Harvard Business Review. According to these two Harvard psychologists, most executives in business are driven by seven basic ‘business core functions’ related to their deeply embedded life-interests or needs. They are: application of technology; enterprise control; managing people and relationships; quantitative analysis; counselling and mentoring; theory development and conceptual thinking; and influencing through language and ideas (Butler and Waldrop 1999). The first four factors are predominantly needs of the vital and pragmatic mind, while the last three are needs of the thinking and communicating mind. But this classification is based on the expression of life needs of people in their professional life. For a better understanding of the motivational level of people, we have to take into consideration the nature of their extra-professional activities.
Moreover, there are probably a considerable number of people in the corporate world who are seeking a moral and spiritual fulfilment or meaning in and through work. For example, the US Academy of Management recently launched a new magazine, Journal of Management, Spirituality and Religion, focusing on these higher needs and broader issues emerging in the management community.
However, motivation is not only individual, but also collective. Just like an individual, the collectivity can also move up the motivational ladder in the course of its natural evolution. Contemporary business is perhaps in such a state of evolutionary transition towards some higher mental and moral needs. The first major change is what we may call the people-knowledge factor, a shift in the strategic motive of business from reliance on a mechanical and mass application of technology to the living knowledge or creativity of people or individual employees. As Michael Burns, chairman and CEO of Mercer Human Resource Consulting, points out: ‘The last decade has been technology fuelled productivity. Now is the turn of the knowledge-economy’ (Burns 2007). And the knowledge economy is people-centric. Christopher Barret of the Harvard Business School explains:
We can’t just manage by systems which are invariably defined in financial terms, we need to focus on people and on developing, managing and building our capacities through them. . . . Because they are the ones with the expertise and that is replacing capital as the scarce strategic resource. The new model, of the Individualized Corporation that we have evolved requires companies to leverage individual competencies, capacities, knowledge and skills. This is going to be the source of competitive advantage. (1999: 61)
Barret gives the following example of ISS, a Denmark-based firm which is in the cleaning business:
It is a business with minute margins, so they have to focus on costs. They could have regarded their employees as labourers who were asked to go and do their job, directed in the classical hierarchal form. But what they did instead was to create individual teams that worked together on cleaning contracts. . . . Then they engaged in education . . . where they took the front-line people through a series of training sets. The first obviously was teaching them how to clean properly. The second was to work together in a team. Third, they started focusing on quality. Fourth they got their teams to focus on customer service and listening to customers. Fifth, the teams were taught to read financials. Eventually the teams became interested in what the customer wanted and became capable of interpreting data. This is innovation. You get costs down by driving responsibility down the organization, creating entrepreneurial initiative and leveraging ideas across the organization – it’s a different philosophy. (ibid.)
The second factor is the growing interest in ethics. There are two important features in the emerging ethical debate in business. First is the recognition of the motivational power of ethics. As former CEO of Johnson & Johnson, James Burke, says:
Here we believe strongly in three things, decentralization, managing for the long-term, and the ethical principles embodied in our Credo. Credo is the sort of thing that inspires the best in people. I think that all of us have
a basic moral imperative hidden somewhere in us. In some people it is more central to their being, but it’s always there. To tap that well-spring creates energy that you can’t get elsewhere. (1986: 19)
The second feature is the growing demand for fairness and transparency. As founder of Infosys N.R. Narayana Murthy states: ‘Investors, customers, employees and vendors have all become more discerning and are demanding greater transparency and fairness in all dealings’ (Skaria 1999: 25). This shows that the corporate world as a whole is becoming more sensitive to ethical issues.
The third factor is the concept of corporate social responsibility (CSR), which is spreading fast in the business community. CSR seems to be the new fad in business and management. As a columnist in the business section of a leading Indian daily points out:
Call it guilt cleansing or genuine concern for the downtrodden; the fact is that from single-minded devotion to bottom line till a few years ago, corporations are increasingly putting their mind and money to the bottom of social pyramid. Philanthropy indeed is fast becoming an integral part of corporate culture. Today nearly every major corporate house is supporting some cause or social initiative. And they are no longer taking it as charity but as a responsibility. In today’s world being a good and responsible corporate citizen is as important as increasing your business. (Vishwajeet 2006)
For example, in India most of the major players in the new economy like Satyam, Wipro, Infosys, and Dr Reddy’s Laboratory have their charitable trusts working on social causes. In the US, two icons of the new economy, Bill Gates and Andy Grove, have their own foundations.
The Path Ahead
These mental and moral needs emerging in the corporate mind hold great promise for the future evolution of business, but these needs have to be explored to their highest potential. This requires a deep insight into the psychological and spiritual sources of knowledge and ethics, which must be harnessed for the higher evolution of business. If businesses can do this, it will give a quantum thrust to their future evolution. This higher evolution is not a matter of idealism, but a crucial choice that will determine the future status of individuals and collectivities. Tex Gunning (2007), a vice-president of the Unilever Group, in his valedictory address to the CII national summit on corporate social responsibility, said:
Many companies did not exist more than 60 to 70 years because they do not evolve. . . . Earning money was essential but it was not the essence of life. Companies have to create social capital, economic capital, spiritual capital and intellectual capital. Companies that don’t create this kind of
wealth would be dissolved or swept away. We have to act now out of choice or have change forced on us.
These prophetic words from the mind of a top business executive display an instinctive recognition of what Sri Aurobindo perceived with a more conscious, enlightened and far-seeing vision in the beginning of the twentieth century. ‘In the next stage of human progress,’ said Sri Aurobindo, ‘it is not a material but a spiritual, moral and psychological progress that has to be made . . . [and] whatever race or whatever country that seizes on the lines of these evolution and fulfills it will be the leader of humanity’ (Sri Aurobindo 1972a, 1972b). In the scheme of nature, whatever does not evolve either becomes extinct or has to play second fiddle to the leaders who surge ahead. However, there is one more important factor related to this higher evolution, which we would like to touch upon briefly before concluding our discussion. Human motivation or action has an inner intent as well as an outer content. The word ‘motive’ is normally used to describe mainly the inner intent. For example, if I become moral out of fear of hell in the life after death or because of karmic consequences, then my motivation is ethical only in the outer content and not in the inner intent, which is still the vital motive of fear. In this sense, the mental and moral needs emerging in business are very much mixed. There is a change only in the outer content, but not much in the inner intent, which is still made up of vital needs like productivity, competitive advantage, and the pressure of outer circumstances. However, our human organism is ‘psychosomatic’. Our body and mind, thoughts, feelings and actions have a mutual interaction and influence. An outer action, when it is done with sincerity, persistence and conviction, has corresponding inner results. For example, someone who becomes moral out of vital or material needs may one day become conscious of the inherent joy of virtue and as a result the lower needs may drop away.
Or else, as he grows mentally, he may awaken to the fact that ethics is an integral part of the higher laws of life, and as a result, a corresponding change may occur in the inner motives of action. For instance, the modern environmental movement is the result of such a mental awakening to the laws of physical nature. When there is a similar awakening to the psychological and spiritual ecology of universal nature, and when these higher laws of life are implemented and institutionalized in the corporate life, then it will give a decisive thrust to the higher evolution of the collective life of humanity. The corporate mind in business has to consciously strive for this higher awakening.
Notes
1. We use the term ‘vital’ to denote that part of our consciousness that is the source of our emotions, passions, enthusiasm, energy, and the dynamic will for action and execution.
2. We use the word ‘mental’ for that part of our consciousness that houses our intellectual, ethical and aesthetic intelligence. A human being can achieve his full development, or in other words, become the true mental human, only when he develops fully all the potentialities of his higher mental nature, made of the rational, ethical and aesthetic being, and governs the rest of his nature with this higher element in him. This is the reason why in our scheme of human development we have placed the mental above the vital in the evolutionary ladder. Beyond the perfection of ‘humanhood’ there is what we may call the perfection of ‘soulhood’, which can be achieved only by realizing our spiritual nature beyond mind.
3. Even when we are fully awakened to the higher needs and try to organize our life around them, lower needs are still present – perhaps very much suppressed and held down, but not mastered. Therefore, they can cast their overt or covert influence over our actions. So the intellectual, the artist and the saint can still be swayed by vital motives like name and fame and power and wealth. This is the reason why the path of yoga in which the seeker makes a conscious effort to rise beyond the mind into the spiritual consciousness is so difficult. Even after we have kindled the fire of aspiration and kept it burning, our lower nature may still throw its smoke and dust and filth into the sacred flame and disturb the inner sacrifice, or even extinguish the flame. This fact of the inner life is symbolically conveyed in Indian mythology in the image of titanic beings disturbing the fire-sacrifice rituals of the rishis.
References
Barret, Christopher (1999), ‘Interview: Create a Purpose to Engage People’, Business Today, 7 May, 61–69.
Burke, James (1986), Interview, in Thomas R. Horton, ed., What Works For Me, pp. 16–25 (New York: Random House).
Burns, Michael (2007), ‘Interview: Now it is the Turn of the Knowledge Economy’, Business Today, 15 June.
Butler, Timothy and James Waldrop (1999), ‘Job Sculpting: The Art of Retaining Your Best People’, Harvard Business Review, September–October, 41–63.
Robins, Stephen (1997), Organizational Behavior (New Delhi: Prentice-Hall).
Gunning, Tex (2007), ‘Corporates Should have a Conscience’, Hindu, 16 June.
Skaria, George (1999), ‘The Well-governed Corporation’, Business Today, 21 November, 25–31.
Sri Aurobindo (1972a), Collected Works: Bande Matharam (Pondicherry: Sri Aurobindo Ashram).
Sri Aurobindo (1972b), Collected Works: Supplement (Pondicherry: Sri Aurobindo Ashram).
48 Evolutionary Perspectives on Human Motivation Jutta Heckhausen
Source: American Behavioral Scientist, 43(6) (2000): 1015–1029.
Before Charles Darwin’s theory gained influence in the social and behavioral sciences, the traditional philosophical and theological views distinguished human motivation from animal motivation as something governed by the “free will,” as opposed to by instinct. The growing acceptance of Darwinian ideas resulted in three major innovations in psychology, which led to a segregation rather than integration of approaches. First, McDougall (1908) argued that a set of basic instincts and drives guides not only animal but also human behavior. His approach is reflected in modern ethological approaches to fundamental behavioral systems, such as aggression (Bischof, 1985; Lorenz, 1966), parenting (Bischof, 1985; Bowlby, 1969), and foraging (L. Tinbergen, 1960; N. Tinbergen, 1951). Second, simultaneously with McDougall’s (1908) ideas about human motivational drives, Sigmund Freud developed his psychodynamic theory, which conceptualizes behavior and cognition as influenced by latent and unconscious drives of the individual. This approach found its continuation in personality conceptions of motivation and their specific diagnostic instruments, namely, projective tests (McClelland, 1971; Murray, 1938). Third, the ability to adjust instinctual behavior to changing environmental conditions is a key feature of human behavior, which should have precedents in early forms of intelligent behavior in related animal species. The pioneer of comparative research in learning (i.e., associative) capacity was Thorndike (1898). His groundbreaking work, together with James’s (1890) conception
of “habit,” laid the foundation for behaviorism, which unfortunately dominated psychology at the expense of all other approaches for nearly three decades. In consequence of the excessive and prolonged domination of psychology by behaviorism, human motivation appeared to be an unworthy domain of psychological research. Nevertheless, the field made important advances in terms of adopting models of instrumentality of behavior (Vroom, 1964) and of decision rationality by way of combining the expectations about outcomes with perceived outcome value as determinants of human motivation and thus behavioral investment. Atkinson (1957) combined this expectancy-value approach with an interindividual-difference construct of motive strength, thus creating a predictive model of motivated behavior. However, the model became ever more cognitive and thus segregated from ethological and comparative approaches to motivation. The human-animal gap widened even more with the rise of attributional theory in motivation (Kelley, 1967; Weiner, 1972), which may have been, in part, a reaction to the overdominance of behaviorism. The modern revival of human motivation in psychology (e.g., see J. Heckhausen & Dweck, 1998) was largely associated with the cognitive paradigm and its integration with an interindividual-difference approach to motives (H. Heckhausen, 1991). This course of scientific evolution has largely bypassed the issue of evolutionary precursors of motivated human behavior. At the same time, comparative psychology has focused on cognitive phenomena to the exclusion of phenomena of motivational engagement and disengagement.
Why Should Evolutionary Psychology Be Interested in Motivation?
Evolutionary psychology has thus far paid little attention to phenomena of motivation and emotion (see review in Schneider & Dittrich, 1990), and has mostly focused on the cognitive functioning involved in social exchange (e.g., Cosmides & Tooby, 1992), risk perception (e.g., Gigerenzer, Todd, & the ABC Research Group, in press; Rode & Wang, 2000 [this issue]), foraging and food preferences (Stephens & Krebs, 1986; Rozin, 2000 [this issue]), mate choice (Buss, 1994; Todd, 2000 [this issue]), and parenting (e.g., Keller, 2000 [this issue]; Mann, 1992). An evolutionary approach to motivation and emotion must first ask the question of how the organism can direct its behavior to seek favorable and avoid harmful environments and outcomes (Schneider & Dittrich, 1990). Hypothetically, one might postulate either of two extreme types of mechanisms: The first is fixed stimulus-response patterns, which are preadapted by genetically transferred programs of behavior, what Mayr (1974) referred to as “closed behavior programs.” The alternative mechanism would be one
Salkind_Chapter 48.indd 248
9/4/2010 10:40:57 AM
Heckhausen
Evolutionary Perspectives on Human Motivation
that directly guides the organism’s behavior in view of the requirements of maximizing inclusive fitness (Hamilton, 1964; Wilson, 1975), a view promoted by radical sociobiologists. Both these extreme alternative mechanisms seem unlikely to play a key role in human behavioral regulation. Fixed or “closed behavior programs” (Mayr, 1974) are not flexible enough to effectively guide the behavior of a species living in a highly complex material and social environment. Intentional pursuit of ultimate goals of reproductive fitness would exceed the capacity of a central regulating mechanism, in terms of both the complexity and coordination of subsystems. Instead of such extreme models, an approach that integrates the operation of “open behavioral programs” (Mayr, 1974) or behavioral modules (Cosmides & Tooby, 1994; Rozin, 1976) and more general processes of behavior direction associated with emotional states and motivational tendencies (Hamburg, 1963; Plutchik, 1980; Scherer, 1984) is more promising. Evolutionary psychology has furnished an impressive range of research programs in various domain-specific modules preadapted to solve specific tasks involved in the optimization of inclusive fitness (Cosmides & Tooby, 1994; Fodor, 1983; Rozin, 1976; Tooby & Cosmides, 1992). However, little attention has been invested and consequently no consensus has been achieved with respect to the regulation of behavior across domains. In complex situational settings that afford more than one module of behavior (e.g., foraging and mate selection), the organism needs to manage cross-domain trade-offs. Moreover, in a mobile species capable of highly varied and flexible behavior, the attainment of proximate goals may require prolonged effort even in the absence of immediate situational affordances. 
This constellation of challenges for behavioral regulation requires mediational mechanisms, which help the organism select the most appropriate behavior given a certain combination of need state, environmental opportunity, and expected control. Emotional mediation between situational affordances and the organism’s responses provides an overall directionality to behavior, and thus enables the organism to activate behavior that tightly fits its specific needs and the environmental opportunities. An example is sexual excitement in rhesus monkeys that facilitates a variety of behavior patterns ranging from mounting to grooming or even masturbating, depending on the presence and behavior of a potential mate. Emotional mediation also allows the organism to put learning experiences acquired within its own ontogenesis to use, rather than having to rely on phylogenetically evolved preadapted and fixed stimulus-response connections. Learning the relation between a certain behavior and a certain desired (or feared) outcome makes it possible to bring behavior under the control of anticipated consequences. In sum, motivational and emotional mechanisms might provide the missing link to the environment-need fit in the activation and deactivation of behavioral and cognitive modules. In this way, behavioral regulation by motivation may be part of a multilevel architecture of the mammal and, indeed, the human mind.
Motivation
How Could Motivational Psychology Profit from an Evolutionary Perspective?

Motivational psychology started out with the great complexity involved in adult human action. The first motivational research was focused on the concepts of volition and free will (Ach, 1910; James, 1890; Wundt, 1896). What could be more cerebral, and thus discrepant, from the regulation of behavior in animals? However, phenomena of volition in human motivation did not suddenly occur with modern man. Motivational mechanisms, including those of volition and free will, had evolutionary precursors. Evolution cannot invent solutions to environmental or regulative challenges because it is not teleologically guided. Therefore, nature needs to work with what evolution has already brought about in previously evolved species. This is as true for behavioral programs as it is for older brain structures and for basic body plans of anatomy (Rumbaugh, Savage-Rumbaugh, & Washburn, 1994). It is known from comparative psychological research that various complex psychological mechanisms can be traced to simpler, more basic processes in nonhuman species (e.g., Leger, 1992; Roitblad, 1987). To be sure, the evolutionary heritage is not necessarily the best solution for present problems but merely the best solution selected in the phylogenetic past, given the constraints of already existing canalizations of phylogeny at the time. Although evolutionary psychology, with its present emphasis on specific cognitive modules involved in foraging, decision making, and risk-related behavior, focuses on those modules believed to be a product of hunter-gatherer evolution (Barkow, Cosmides, & Tooby, 1992), I would argue that key modules involved in the motivational regulation of human behavior go back as far as early mammal or even vertebrate evolution.
To arrive at a model of the origins and evolution of motivational processes, I should start with a task analysis of survival and reproductive fitness in terms of its motivational implications. The basic survival functions involve the internal regulation of body metabolism by way of breathing, cardiovascular functioning, and balancing of substances, as well as regulatory challenges involving control of the immediate environment to attain food, liquid, and shelter; avoid predators; seek a mate; reproduce; and (in some species) care for offspring. It can probably be said that for most invertebrates, these challenges of inclusive fitness are mastered by way of closed-behavior programs (Mayr, 1974), which comprise genetically fixed stimulus-response connections (e.g., hunger triggers species-typical foraging behavior, followed by consummatory activity). However, even in some invertebrates and lower vertebrates, these stimulus-response connections are modifiable by need states in the sense that higher need lowers the threshold for the stimulus-typical response (Ewert, 1976; Kravitz, 1988). Thus the animal may, for example, react with sexual behavior even to objects that are remotely similar in appearance to conspecifics. This modification of response threshold may be the very earliest form of
flexibilization of the fixed stimulus-response connections in closed behavior programs (Mayr, 1974). However, these closed connections between need states, behavior, need-relevant stimulus, and responses provide no degree of freedom for multiple behavioral options and the adaptation of behavioral means to variations in the environment. In vertebrate – and especially mammal – evolution, open behavior programs evolved that provide greater degrees of freedom to flexibly adapt behavior to environmental conditions for foraging, predator avoidance, reproduction, and other challenges. An example is adaptively varied insect foraging patterns in birds across an array of food patches with varying food availability, familiarity, and under conditions of high (with hatchlings) versus low need states (e.g., Krebs, 1980; McFarland, 1977; L. Tinbergen, 1960). The evolutionary precursors of emotional processes probably evolved hand in hand with the transition from vertebrates, which fed by filtering nutritious particles out of water, to those actively searching for larger individual pieces of nutrients. The latter need to regulate their movement patterns, whereas the former had no choice. Recent work on the first steps in the evolution of neocortical structures at the transition between invertebrate and vertebrate strata (e.g., Appendicularia) shows that extrapyramidal neocortical structures resemble those dedicated to visual orientation in more complex species and occur strictly contingent on the ability to move in the water, rather than being fixed to a certain place. A transition species even showed an ontogenetic contingency, with the juvenile form moving about and endowed with a minute and basic neocortical structure, which the stationary adult form loses.
Emotional states may have come about with the emergence of neocortical structures, which allowed the secondary projection and integration of sensory input and motor programs with the vegetative and endocrinological systems that had evolved even earlier for the maintenance of internal bodily equilibrium. The types of species that are associated with this milestone in the evolution of motivational regulation are reptiles. It has been shown with contemporary caimans (Keating, Kormann, & Horel, 1970) that artificial stimulation of certain central cortical areas elicited directed-flight behavior, including the circumvention of obstacles and involving heavy breathing and vocalizations. Thus, these reptiles exhibited all the constitutional aspects of motivated behavior and both vegetative and motoric behavioral patterns of emotional responses. Comparative research in learning patterns provides strikingly convergent evidence for the transition from fixed to emotionally mediated (in the broadest sense) connections between behavior and environmental events. Species differences in the response to changes in reinforcement incentives (food pellets) reflect probably the earliest step in this evolutionary advance. Bitterman’s (1975) classical comparative study of learning revealed that whereas fish and certain turtle species exhibit a direct relation between resistance to extinction and magnitude of reinforcement (i.e., resistance to extinction increases with
magnitude of reinforcement), mammals such as rats show an inverse relation between extinction and reinforcement (resistance to extinction decreases with magnitude of reinforcement). These findings may be interpreted as an impressive illustration of the mediating effects of emotional states. In species with more sophisticated neuronal systems, behavior changes do not simply mirror changes in reinforcement. Higher developed species instead react to the change in incentives by disproportionately decreasing the operant behavior after decreases in incentives and disproportionately increasing it after increases in incentives. It is an intriguing question whether this phylogenetic transition may be associated with the evolution of reptiles and thus converge with the transition to earliest forms of behavior motivated by emotional states (for findings on caiman behavior, see Keating et al., 1970). In mammals, emotional reactions are found that mediate between stimulus and reaction and provide a general directionality of behavior, for example, in terms of appetence with regard to favored and needed food or avoidance with regard to predators or superior rivals. This general directionality of behavior then allows the specific behavioral means to be adjusted in accordance to the specific affordances of the environment. Such emotional mediators can become effective incentives of behavior, not only via conscious expectations but also by way of Pavlovian conditioning of emotional responses to stimulus constellations. This way, certain situations and behavioral patterns become marked emotionally, and are thus incorporated into internal mental representations and modifiable by learning (Schneider & Dittrich, 1990). Hence, even without any insight into the ultimate goals of behavior in terms of reproductive fitness, the organism is steered toward maximizing inclusive fitness in the various domains of survival and reproduction. 
The major motivational systems of prosocial (altruistic) behavior, aggression, affiliation, power, and achievement lend directionality and dynamics to behavior by way of need (push) and incentives (pull) and involve motive-specific emotions (H. Heckhausen, 1991). Although the completeness of this list may be debated and various longer lists have been proposed, the systems mentioned play a key role in regulating behavior by way of a hidden agenda that maximizes reproductive fitness while being experienced by the organism as highly need- and situation-specific motivators of behavior. A telling case in point is altruistic behavior, which is costly to the individual, yet holds benefits for inclusive fitness, and thus is an ultimate goal for adaptation (Hoffman, 1981). The mediation between proximate incentives and this ultimate goal is provided by empathic affective experiences, which motivate the individual to invest altruistic behavior in ameliorating distress in others. As altruism researcher Hoffman (1981) notes,

Empathy may be uniquely suited for bridging the gap between egoism and altruism, since it has the property of transforming another person’s misfortune into one’s own feeling of distress. . . . an aversive state that may often best be alleviated by helping the victim. (p. 133)
Basic Motivational Modules as Domain-General Regulators of Human Behavior

A common feature of all motivated behavior is that the organism attempts to achieve outcomes in the environment by its own activity. In activities such as trying to find food, winning a mate, or struggling with a rival, the organism strives for control in terms of bringing about desired outcomes and preventing undesired ones. I therefore argue that the most fundamental and universal of motivational modules should relate to this basic endeavor to control the environment (J. Heckhausen & Schulz, 1995, in press). The striving for control should also be shared with the broadest range of species and go back the furthest into the phylogenetic past, at least as far back as those species that first acquired a notable flexibility in their behavior programs (Gallistel, 1990; Rumbaugh & Sterritt, 1986). From a functionalistic perspective, one would hypothesize a set of basic motivational modules that would together favor an overall preference for controlling the environment and maximizing one’s resources and capacities for control (J. Heckhausen & Schulz, in press; Schulz & Heckhausen, 1997). Because of the dearth of comparative psychological research into motivational processes, one has to rely on reasoning about functional requirements of behavior regulation in active, complex, and resource-needy organisms such as mammals. First, one might expect a selectively enhanced attentional readiness and sensitivity to detect contingencies between behavior and external stimuli. Such a module for detecting behavior-event contingencies would help the organism to generally learn about its effectiveness in bringing about events in the environment and identify specific behavioral patterns as causes for certain desired or dreaded outcomes. Second, control striving is promoted by an inherent preference for behavior-event contingencies.
By inherent, it is meant that the preference holds even when there is no reinforcement with regard to a specific need, such as hunger, thirst, and so on. There is ample evidence for this assumption both with regard to humans and to other mammals (see review in Rumbaugh et al., 1994; White, 1959). Animals of various mammal species have been shown to become listless and depressed when experiencing uncontrollable negative events (Overmier & Seligman, 1967). Operant conditioning studies with mammals show that behavior-event contingencies are preferred to event-event contingencies even in the absence of consummatory behavior (see review in White, 1959). Chimpanzees favor objects that can be moved, changed, or made to emit sounds and light (Welker, 1956); monkeys spend hours solving mechanical puzzles (Harlow, 1953); and both children and rats prefer response-elicited rewards to receiving the same rewards without having to respond (Singh, 1970; see also aversion of freeloading phenomenon, Osborne, 1977). These preferences for behavior-event contingencies
are already in place at the very beginning of life. Even human neonates are able to detect behavior-event contingencies (Janos & Papousek, 1977; Papousek, 1967). Papousek (1967) found, for example, that very young infants learned head movements contingent on acoustic signals and milk reinforcement. Even after complete satiation, when the milk had lost its reinforcing potential, signals elicited prompt head movements and pleasure on the occurrence of the expected contingent presentation of the milk bottle.1 The third motivational module, which would favor control behavior, is a tendency to repeat responses when they have led to desirable consequences. This is the classical behaviorist notion of the “law of effect” (Thorndike, 1898) and operant conditioning (Skinner, 1938), which has been shown to hold for an extensive variety of vertebrate species, ranging from fish to birds, rats, monkeys, and humans. However, as discussed above, there appear to be interesting interspecies differences in the response to changes in reinforcers, so that species with elaborated neocortex structures exhibit enhanced reactions to shifts in incentives. The fourth motivational module is an asymmetric pattern of affect reactions to negative and positive changes in the environment. This asymmetry in affective responses is closely related to the basic forms of affective transformations discussed in the previous paragraph. Frijda (1988) has proposed “the laws of emotion” that humans affectively respond to negative change more strongly than to positive change. After a change for the worse, the negative emotions are stronger and typically last much longer than the positive emotions that follow a change for the better. In terms of control behavior, an interesting fact is that positive emotions of pride, feeling satisfied with the environment, and so on would hardly motivate the individual to become active to change the environment. 
In contrast, a negative emotion after a negative change motivates the individual to do something to change the environment to get rid of the noxious situation. Thus, the asymmetry in responding emotionally to positive and negative changes leads to a selective promotion of control behavior directed at changing the environment. Bitterman’s (1975) findings on interspecies differences may suggest the transition in phylogeny when this asymmetry evolved. The fifth motivational module involved in promoting control behavior is curiosity and exploration. Those species that operate based on open-behavior programs (Mayr, 1974) rely heavily on the acquisition of experience and knowledge during each organism’s ontogenesis. Experience and knowledge acquisition is, of course, most promoted when the organism exposes itself to novel situations. It is striking how similar and almost stereotypical mammal species with more complex neocortices are with regard to their typical exploratory behavior; they gaze at, walk around, sniff, touch, and manipulate an unknown object or animal (Schneider, 1996). It would seem likely that curiosity and exploration is a universal motivational system in higher mammals. An organism can only profit from the experiences of exploration when they are stored in
some kind of mental representation, as expectancies, schemata, and so on. With greater neocortical capacity came the ability to store more complex schemata about object relations and causal connections. Violations of expectancies can then become instigators of curiosity and elicit exploratory behavior. This phenomenon of a preference for moderate discrepancy has been widely researched in the wake of Helson’s (1964) adaptation-level theory (McClelland, 1953). As a sixth component of motivational regulation, humans exhibit a perception of personal control, mastery, and self-efficacy (Bandura, 1982; Harter, 1974; Watson, 1966; White, 1959). The developmental origin of this mastery perception is a generalized awareness of behavior-event contingency (Watson, 1966) that emerges during the first 2 or 3 years of life and provides a motivational resource for active control attempts (J. Heckhausen, 1989; see also review in J. Heckhausen & Schulz, 1995). Such a generalized conception of one’s own competence, efficacy, and control enables the organism to view activities directed at attaining outcomes in the environment as opportunities to experience and test competence, thus creating a motivational resource for overcoming difficulties and pursuing effortful activities, even in the face of obstacles or long-term delays of gratification. Anticipated self-reinforcement, then, is the missing link to adult human achievement motivation (H. Heckhausen, 1991). Unfortunately, very little is known about the potential nonhuman primate or even mammal precursors of such generalized concepts. Rumbaugh and Sterritt (1986) suggested that perceptions of control may have had both proximate reinforcement value as a buffer against anxiety with overwhelmingly novel stimulation and ultimate reproductive advantage by facilitating the development of new activities and experiences.
With regard to the phylogenetic availability of the phenomenon of perceived control, it should be taken into account that perceived control most likely requires an awareness of self, which phylogenetically did not evolve before the higher primates. In the great apes, however, self-recognition seems to be present (Gallup, 1970, 1979), and thus a notion of one’s own competence may play a role as a motivating factor of control activities, as well. In addition to these modules that would steer an organism toward selecting activities directed at achieving goals in the environment, there are probably other facilitative processes that help to focus attention and behavior on a chosen goal of control. Among these should be mechanisms of intention-based priming, which enhance the salience of goal-relevant cues and benefits while degrading irrelevant and particularly conflicting goals and their respective cues. Modern approaches to human motivation (Gollwitzer, 1990; H. Heckhausen & Gollwitzer, 1986; H. Heckhausen, 1991; Kuhl, 1984) have put such long-forgotten volitional processes back into the larger field of motivational psychology. Moreover, recent models of control behavior and developmental regulation have addressed self-regulatory processes as part of motivational engagement and disengagement (J. Heckhausen, 1999; J. Heckhausen & Schulz, 1993, 1998; Schulz & Heckhausen, 1996).
The motivational modules discussed so far are all directed at engaging the organism with goals of controlling the external world. However, striving for control of a particular goal may become dysfunctional when the goal turns out to be unattainable or the costs of striving become excessive and harm other, more important goal pursuits. Under such presumably not uncommon circumstances, the organism needs to disengage from a control goal so as to avoid wasting behavioral and motivational resources in futile goal pursuits, becoming frustrated (a consequence of emotion-laden goal commitment), or even becoming depleted in self-esteem and hopefulness (for those species that can construct a conception of their own competence). Deactivating behavior programs is not a challenge uniquely encountered by humans. All behavior that is to some extent flexible and involves choosing among options can go awry and should be susceptible to deactivation. Activities such as searching for food on a particular patch, chasing prey, courting a potential mate, or fighting a rival can turn out futile and thus wasteful or even directly destructive. Thus, mechanisms that promote engagement in goal pursuits need to be balanced by those allowing disengagement. Such mechanisms seem to be in place. Animals do not follow a prey until they collapse from exhaustion, exploit a patch until collecting the last grain of food, or fight a superior rival until they are killed. Instead, there seem to be discontinuous mechanisms of goal deactivation that allow the animal to switch from complete engagement to disengagement in a sudden, discrete manner. The mechanisms involved in such deactivation of goal pursuit can be seen as the building blocks for human self-regulation of goal pursuit and coping with failure and losses. They enable the individual to switch behavioral and motivational resources over to comparatively more promising goal pursuits, and to avoid frustration with blocked goals.
In addition to these two important functions of goal disengagement, humans also have to compensate for the negative consequences of failure experiences on self-esteem and general self-related conceptions of competence. Self-protective processes of reinterpreting failure or loss (e.g., by self-serving causal attributions; Snyder, Stephan, & Rosenfield, 1978) are probably unique to humans, although they do not rely on conscious processing of information. In fact, they may be all the more effective the less intentional they are (Brandtstädter & Renner, 1992; Brandtstädter, Wentura, & Greve, 1993).
Summary and Conclusion

The history of psychology has disconnected motivational and comparative perspectives that had once inspired each other. After the prolonged reign of behaviorism, both comparative and motivational psychology have become dominated by a strong emphasis on cognitive processes at the expense, and
to the exclusion of, affect-related processes. This has made it difficult to formulate and pursue an evolutionary approach to human motivation. However, evolutionary psychology should be keenly interested in motivational issues, given that problems of behavior and self-regulation are not resolved by merely addressing cognitive skills and modules. Instead, organisms with a substantial neocortex and behavioral flexibility require mechanisms that mediate between environmental challenges and behavior and allow adjustment of behavioral means in accordance with a complex and changing environment. From the point of view of motivational psychology, the paradigm of evolutionary psychology can provide a good approximation to the likely phylogenetic origins of specific motivational processes in other primates, mammals, and vertebrates. The problems of behavior regulation share common features across an impressive range of different species, and may have led to the selection of a few basic modules involved in effecting change in the environment. The set of potential motivational modules discussed promotes control behavior directed at the environment and is broadly applicable across domains of functioning and tasks involved in reproductive fitness.
Note

1. A related but functionally distinct issue is the preference for self-determination or self-controlled selection of goals for behavior. For example, Washburn and Rumbaugh (1991) report that rhesus monkeys perform better on tasks that they had selected themselves than on tasks assigned to them. A similar argument is made by Deci and Ryan (1985; Ryan, Kuhl, & Deci, 1997) with regard to basic psychological needs in humans. This preference for autonomy is a most interesting phenomenon with regard to those species that live in hierarchical social structures. Choosing one’s own behavioral goals counteracts the dominance of high-status individuals. It may have benefits for the individual, but certainly not for group stability.
References

Ach, N. (1910). Über den Willensakt und das Temperament [On acts of will and temperament]. Leipzig, Germany: Quelle & Meyer.
Atkinson, J. W. (1957). Motivational determinants of risk-taking behavior. Psychological Review, 64, 359–372.
Bandura, A. (1982). Self-efficacy mechanisms in human agency. American Psychologist, 37, 122–147.
Barkow, J. H., Cosmides, L., & Tooby, J. (Eds.). (1992). The adapted mind: Evolutionary psychology and the generation of culture. New York: Oxford University Press.
Bischof, N. (1985). Das Rätsel Ödipus [The Oedipus mystery]. München, Germany: Piper.
Bitterman, M. E. (1975). The comparative analysis of learning. Science, 188, 699–709.
Bowlby, J. (1969). Attachment and loss: Attachment (Vol. 1). New York: Basic Books.
Brandtstädter, J., & Renner, G. (1992). Coping with discrepancies between aspirations and achievements in adult development: A dual-process model. In L. Montada, S.-H. Filipp, & M. R. Lerner (Eds.), Life crises and experiences of loss in adulthood (pp. 301–319). Hillsdale, NJ: Lawrence Erlbaum.
Brandtstädter, J., Wentura, D., & Greve, W. (1993). Adaptive resources of the aging self: Outlines of an emergent perspective. International Journal of Behavioral Development, 16, 323–349.
Buss, D. M. (1994). The evolution of desire: Strategies of human mating. New York: Basic Books.
Cosmides, L., & Tooby, J. (1992). Cognitive adaptations for social exchange. In J. H. Barkow, L. Cosmides, & J. Tooby (Eds.), The adapted mind: Evolutionary psychology and the generation of culture (pp. 163–228). New York: Oxford University Press.
Cosmides, L., & Tooby, J. (1994). Origins of domain-specificity: The evolution of functional organization. In L. A. Hirschfeld & S. A. Gelman (Eds.), Mapping the mind: Domain specificity in cognition and culture (pp. 85–116). Cambridge, UK: Cambridge University Press.
Deci, E. L., & Ryan, R. M. (1985). Intrinsic motivation and self-determination in human behavior. New York: Plenum.
Ewert, J.-P. (1976). Neuro-Ethologie. Einführung in die neurophysiologischen Grundlagen des Verhaltens [Neuro-ethology: Introduction to the neurophysiological foundations of behavior]. Berlin: Springer.
Fodor, J. (1983). The modularity of mind. Cambridge, MA: MIT Press.
Frijda, N. H. (1988). The laws of emotion. American Psychologist, 43, 349–358.
Gallistel, C. R. (1990). The organization of learning. Cambridge: MIT Press.
Gallup, G. G., Jr. (1970). Chimpanzees: Self-recognition. Science, 167, 86–87.
Gallup, G. G., Jr. (1979). Self-recognition in chimpanzees and man: A developmental and comparative perspective. In M. Lewis & L. Rosenblum (Eds.), The child and its family: The genesis of behavior (Vol. 2, pp. 107–126). New York: Plenum.
Gigerenzer, G., Todd, P. M., & ABC Research Group. (in press). Simple heuristics that make us smart. New York: Oxford University Press.
Gollwitzer, P. M. (1990). Action phases and mind-sets. In E. T. Higgins & R. M. Sorrentino (Eds.), Handbook of motivation and cognition: Foundations of social behavior (Vol. 2, pp. 53–92). New York: Guilford.
Hamburg, D. A. (1963). Emotions in the perspective of human evolution. In P. H. Knapp (Ed.), Expression of emotions in man (pp. 300–317). New York: International University Press.
Hamilton, W. D. (1964). The genetical evolution of social behavior. Journal of Theoretical Biology, 7, 1–52.
Harlow, H. F. (1953). Mice, monkeys, men, and motives. Psychological Review, 60, 23–32.
Harter, S. (1974). Pleasure derived from cognitive challenge and mastery. Child Development, 45, 661–669.
Heckhausen, H. (1991). Motivation and action. New York: Springer.
Heckhausen, H., & Gollwitzer, P. M. (1986). Information processing before and after the formation of an intent. In F. Klix & H. Hagendorf (Eds.), In memoriam Hermann Ebbinghaus: Symposium on the structure and function of human memory (pp. 1071–1082). Amsterdam: Elsevier.
Heckhausen, J. (1989). Normatives Entwicklungswissen als Bezugsrahmen zur (Re)Konstruktion der eigenen Biographie [Normative conceptions about development as a frame of reference for (re)constructing one’s own biography]. In P. Alheit & E. Hoerning (Eds.), Biographisches Wissen: Beiträge zu einer Theorie lebensgeschichtlicher Erfahrung (pp. 202–282). Frankfurt, Germany: Campus.
Heckhausen, J. (1999). Developmental regulation in adulthood: Age-normative and sociostructural constraints as adaptive challenges. New York: Cambridge University Press.
Heckhausen, J., & Dweck, C. S. (Eds.). (1998). Motivation and self-regulation across the life span. New York: Cambridge University Press.
Salkind_Chapter 48.indd 258
9/4/2010 10:40:58 AM
Heckhausen
Evolutionary Perspectives on Human Motivation
259
Heckhausen, J., & Schulz, R. (1993). Optimisation by selection and compensation: Balancing primary and secondary control in life-span development. International Journal of Behavioral Development, 16, 287–303. Heckhausen, J., & Schulz, R. (1995). A life-span theory of control. Psychological Review, 102, 284–304. Heckhausen, J., & Schulz, R. (1998). Developmental regulation in adulthood: Selection and compensation via primary and secondary control. In J. Heckhausen & C. S. Dweck (Eds.), Motivation and self-regulation across the life span (pp. 50–77). New York: Cambridge University Press. Heckhausen, J. & Schulz, R. (in press). The primacy of primary control is a human universal: A reply to Gould’s critique of the life-span theory of control. Psychological Review. Helson, H. (1964). Adaptation-level theory. New York: Harper and Row. Hoffman, M. L. (1981). Is altruism a part of human nature? Journal of Personality and Social Psychology, 40, 121–137. James, W. (1890). The principles of psychology (Vol. 2). New York: Holt, Rinehart & Winston. Janos, O., & Papousek, H. (1977). Acquisition of appetition and palpebral conditioned reflexes by the same infants. Early Human Development, 1, 91–97. Keating, E. G., Kormann, L. A., & Horel, J. A. (1970). The behavioral effects of stimulating and ablating the reptilian amygdala (Caiman sklerops). Physiology and Behavior, 5, 55–59. Keller, H. (2000). Human parent-child relationships from an evolutionary perspective. American Behavioral Scientist, 43, [957–969]. Kelley, H. H. (1967). Attribution theory in social psychology. In D. Levine (Ed.), Nebraska symposium on motivation (pp. 192–238). Lincoln: Nebraska University Press. Kravitz, E. A. (1988). Hormonal control of behavior: Amines and the biasing of behavioral output in lobsters. Science, 241, 1775–1781. Krebs, J. R. (1980). Optimal foraging, predation risk and territory defense. Area, 68, 83–90. Kuhl, J. (1984). 
Motivational aspects of achievement motivation and learned helplessness: Toward a comprehensive theory of action control. In B. A. Maher & W. B. Maher (Eds.), Progress in experimental personality research (Vol. 13, pp. 99–171). New York: Academic Press. Leger, D. W. (1992). Biological foundations of behavior: An integrative approach. New York: HarperCollins. Lorenz, K. (1966). Ethologie, die Biologie des Verhaltens [Ethology, the biology of behavior]. In F. Gessner & L. V. Bertalanffy (Eds.), Handbuch der Biologie ( Vol. 2, pp. 341–559). Frankfurt, Germany: Athenäum. Mann, J. (1992). Nurturance or negligence: Maternal psychology and behavioral preference among preterm twins. In J. H. Barkow, L. Cosmides, & J. Tooby (Eds.), The adapted mind: Evolutionary psychology and the generation of culture (pp. 367–390). New York: Oxford University Press. Mayr, E. (1974). Behavior programs and evolutionary strategies. American Scientist, 62, 650–659. McClelland, D. C. (1953). The achievement motive. New York: Appleton-Century-Crofts. McClelland, D. C. (1971). Assessing human motivation. New York: General Learning Press. McDougall, W. (1908). An introduction to social psychology. London: Methuen. McFarland, D. J. (1977). Decision making in animals. Nature, 269, 15–21. Murray, H. A. (1938). Explorations in personality. New York: Oxford University Press. Osborne, S. R. (1977). The free food (contrafreeloading) phenomenon: A review and analysis. Animal Learning and Behavior, 5, 221–235.
Salkind_Chapter 48.indd 259
9/4/2010 10:40:58 AM
260
Motivation
Overmier, J. B., & Seligman, M.E.P. (1967). Effects of inescapable shock upon subsequent escape and avoidance responding. Journal of Comparative and Physiological Psychology, 63, 28–33. Papousek, H. (1967). Experimental studies of appetitional behavior in human newborns and infants. In H. W. Stevenson, E. H. Hess, & H. L. Rheingold (Eds.), Early behavior: Comparative developmental approaches (pp. 249–277). New York: John Wiley. Plutchik, R. (1980). Emotion. A psychoevolutionary synthesis. New York: Harper and Row. Rode, C., & Wang, X. T. (2000). Risk-sensitive decision-making examined within an evolutionary framework. American Behavioral Scientist, 43, 926–939. Roitblad, H. L. (1987). Introduction to comparative cognition. New York: Freeman. Rozin, P. (1976). The evolution of intelligence and access to the cognitive unconscious. In J. M. Sprague & A. N. Epstein (Eds.), Progress in psychobiology and physiological psychology (pp. 245–277). New York: Academic Press. Rozin, P. (2000). Evolution and adaptation in the the understanding of behavior, culture, and mind. American Behavioral Scientist, 43, 970–986. Rumbaugh, D. M., Savage-Rumbaugh, E. S., & Washburn, D. A. (1994). Learning, prediction, and control with an eye to the future. In M. M. Haith, J. B. Benson, R. J. Roberts, Jr., & B. F. Penning-ton (Eds.), The development of future-oriented processes (pp. 119–138). Chicago: University of Chicago Press. Rumbaugh, D. M., & Sterritt, G. M. (1986). Intelligence: From genes to genius in the quest for control. In W. Bechtel (Ed.), Integrating scientific disciplines. Dordrecht: Martinus Nijhoff. Ryan, R. M., Kuhl, J., & Deci, E. L. (1997). Nature and autonomy: An organizational view of social and neurobiological aspects of self-regulation in behavior and development. Development and Psychopathology, 9, 701–728. Scherer, K. R. (1984). On the nature and function of emotion: A component process approach. In K. R. Scherer & P. Ekman (Eds.), Approaches to emotion (pp. 
293–317). Hillsdale, NJ: Lawrence Erlbaum. Schneider, K. (1996). Intrinsisch (autotelisch) motiviertes Verhalten – dargestellt an den Beispielen des Neugierverhaltens sowie verwandter Verhaltenssysteme (Spielen und leistungsmotiviertes Handeln) [Intrinsic (autotelic) behavior – discussed on examples of curious behavior and related behavioral systems]. In J. Kuhl & H. Heckhausen (Eds.), Enzyklopädie der Psychologie: Motivation, Volition und Handlung (pp. 119–152). Göttingen, Germany: Hogrefe. Schneider, K., & Dittrich, W. (1990). Evolution und Funktion von Emotionen [Evolution and function of emotions]. In K. R. Scherer (Ed.), Enzyklopädie der Psychologie: Psychologie der Emotion (pp. 41–114). Göttingen, Germany: Hogrefe. Schulz, R., & Heckhausen, J. (1996). A life-span model of successful aging. American Psychologist, 51, 702–714. Schulz, R., & Heckhausen, J. (1997). Emotions and control: A life-span perspective. In K. W. Schaie & M. P. Lawton (Eds.), Annual review of gerontology and geriatrics ( Vol. 17, pp. 185–205). New York: Springer. Singh, D. (1970). Preference for bar-pressing to obtain reward over freeloading in rats and children. Journal of Comparative and Physiological Psychology, 73, 320–327. Skinner, B. F. (1938). The behavior of organisms: An experimental approach. New York: Appleton-Century-Crofts. Snyder, M. L., Stephan, W. G., & Rosenfield, D. (1978). Attributional egotism. In J. H. Harvey, W. Ickes, & R. F. Kidd (Eds.), New directions in attribution research ( Vol. 2, pp. 91–117). Hills-dale, NJ: Lawrence Erlbaum. Stephens, D. W. E., & Krebs, J. R. (1986). Foraging theory. Princeton, NJ: Princeton University Press.
Salkind_Chapter 48.indd 260
9/4/2010 10:40:58 AM
Heckhausen
Evolutionary Perspectives on Human Motivation
261
Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals. The Psychological Review Monograph Supplements, 2 (Whole No. 8). Tinbergen, L. (1960). The natural control of insects in pinewoods. Factors influencing the intensity of predation in songbirds. Archives Neerlandaiscs de Zoologie, 13, 265–343. Tinbergen, N. (1951). The study of instinct. London: Oxford University Press. Todd, P. M. (2000). The ecological rationality of mechanisms evolved to make up minds. American Behavioral Scientist, 43, 940–956. Tooby, J., & Cosmides, L. (1992). The psychological foundation of culture. In J. H. Barkow, L. Cosmides, & J. Tooby (Eds.), The adapted mind: Evolutionary psychology and the generation of culture (pp. 19–136). New York: Oxford University Press. Vroom, V. H. (1964). Work and motivation. New York: John Wiley. Washburn, D. A., & Rumbaugh, D. M. (1991). Ordinal judgments of numerical symbols by macaques (Macaca mulatta). Psychological Science, 2, 190–193. Watson, J. S. (1966). The development and generalization of ‘contingency awareness’ in early infancy: Some hypotheses. Merrill-Palmer Quarterly, 12, 123–135. Weiner, B. (1972). Theories of motivation, Chicago: Markham. Welker, W. L. (1956). Some determinants of play and exploration in chimpanzees. Journal of Comparative Physiological Psychology, 49, 84 – 89. White, R. W. (1959). Motivation reconsidered: The concept of competence. Psychological Review, 66, 297–333. Wilson, E. O. (1975). Sociobiology: The new synthesis. Cambridge, MA: Harvard University Press. Wundt, W. (1896). Grundriß der Psychologie [Foundations of psychology]. Leipzig: Engelmann.
Salkind_Chapter 48.indd 261
9/4/2010 10:40:58 AM
Salkind_Chapter 48.indd 262
9/4/2010 10:40:58 AM
49 The Debate about Rewards and Intrinsic Motivation: Protests and Accusations Do Not Alter the Results
Judy Cameron and W. David Pierce
Source: Review of Educational Research, 66(1) (1996): 39–51.
Our research (Cameron & Pierce, 1994) has clearly touched a nerve. The results of our meta-analysis indicate that rewards can be used effectively to enhance or maintain an individual’s intrinsic interest in activities. These findings are challenging to those who espouse the view that rewards and reinforcement are generally detrimental to a person’s intrinsic motivation. Our article has drawn criticism because the data from approximately 100 experiments show that there is only one small negative effect of reward, an effect that is highly circumscribed and easily avoided. This finding is disconcerting to those who contend that the negative effects of reward are substantial, generalized and occur across many conditions.
Our analysis of 20 years of research is the most extensive review of the literature on rewards and intrinsic motivation to date. Because of its thoroughness, the data, analysis, and conclusions must be taken seriously. Faced with the evidence, researchers who have argued that rewards produce harmful effects under a wide range of conditions are put in a difficult position. One option they can take is to reanalyze the data in an attempt to show that rewards have strong negative effects on intrinsic motivation. Our data are readily available for additional analyses, and our procedures are clearly outlined in the original article. Failing this option, a second strategy is to suggest that the findings are invalid due to intentional bias,
deliberate misrepresentation, and inept analysis. Our critics have chosen the second strategy. Lepper, Keavney, and Drake (1996); Ryan and Deci (1996); and Kohn (1996) have responded to the results of our meta-analysis by accusing us of asking inappropriate questions, omitting important moderator variables, excluding critical experiments, and contradicting other reviews on the topic. In addition, they criticize our meta-analytic procedures and decisions as flawed. In response to these criticisms, we show that all relevant studies were included in our analyses and that the questions and reward conditions we assessed expand on previous reviews to provide a more comprehensive picture of the effects of rewards on intrinsic motivation. We answer the statistical concerns of our critics and show that our analysis is appropriate, accurate, and robust. Most importantly, we show that none of the objections raised by our critics negates our findings. The results and conclusions of our meta-analysis remain important, especially for those involved in education and other applied settings. An issue of prime concern to educators is how to use rewards effectively to promote learning without disrupting students’ intrinsic interest. Contrary to Ryan and Deci’s (1996) claim that our “theoretical position acknowledged no conditions under which one should expect negative effects” (p. 33), our results provide important clarifications about the conditions under which rewards produce positive or negative effects on intrinsic motivation. Of primary importance in classroom situations is the finding that rewards can be used to maintain or enhance students’ intrinsic interest in schoolwork. Verbal praise and performance feedback increase the value of an activity. When tangible rewards are offered contingent on level of performance or are given unexpectedly, students remain motivated in the subject area. 
A slight negative effect can be expected when a teacher offers a tangible reward without regard to the students’ level of performance. Under this condition, when the rewards are withdrawn, students will continue to like their schoolwork as much as others, but they may spend slightly less time on it in a free period. This negative effect can be easily prevented by offering students rewards for successful solution of problems, completion of work, or for attaining specified levels of performance on particular tasks. The point is that teachers can reward the level and quality of students’ work without disrupting motivation and interest in learning. These conclusions are not altered by the comments of Kohn, Ryan and Deci, and Lepper et al. In the following commentary we address our critics’ concerns. Our response is organized in two sections; the first deals with the general issues that have been raised by our critics, and in the second we focus on specific statistical criticisms.
General Issues
The Overall Question
One issue of contention involves our decision to begin our meta-analysis by investigating the overall effect of reward on intrinsic motivation (overall effect hypothesis). Lepper and his colleagues state that “to ask about the ‘overall’ or ‘in general’ effects of rewards or reinforcers is to pose a fundamentally meaningless question” (p. 7). They argue that the question is senseless and misleading, a view echoed by Kohn and by Ryan and Deci.
We maintain that the overall effect hypothesis is central to an understanding of this area of research. One reason is practical. Many educators, parents, and administrators have adopted Kohn’s (1993) position that overall, rewards and incentive systems are harmful. In the present context, this stance means that rewards negatively affect students’ intrinsic interest, a question of overall effect. Others involved in education are still open to the possibility that rewards may be beneficial. A classroom teacher who wishes to implement an incentive system is first of all interested in whether rewards disrupt intrinsic interest in the subject matter. Of course, it may be advantageous to target particular subgroups or implement additional measures, but the question of the overall effect of reward is crucial to one’s teaching strategy.
Another reason to address the main effect hypothesis is that academic journals, introductory textbooks, newspapers, and some of our critics continue to point to the overall negative or harmful effects of reward and reinforcement. In a prominent scientific journal, Nature, we learn that “it has been repeatedly shown that if people are rewarded for performing a task they find intrinsically pleasurable, they do it less, not more” (Sutherland, 1993, p. 767). A major introductory psychology textbook informs us that
when an extrinsic reward is given, the motivation becomes extrinsic and the task itself is enjoyed less. When the extrinsic rewards are withdrawn, the activity loses its material value. . . . The moral is: A reward a day makes work out of play. (Zimbardo, 1992, p. 454, italics in the original)
Even in this issue of Review of Educational Research, Kohn asserts that “there is more than adequate justification for avoiding the use of incentives to control people’s behavior, particularly in a school setting” (p. 3). These examples are but a small sample of the claims made about the overall effects of reward. Many university students, educators, and parents have been exposed to this negative main effect assumption and base their own understanding and use of rewards on it. Social policy in our schools and other institutions reflects these beliefs. Because of this, an analysis of the general effects of reward is warranted.
In their critiques of our meta-analysis, Lepper et al. and Ryan and Deci indicate that they and others have long recognized that the negative overall effect hypothesis is incorrect. Nonetheless, numerous writers interpret the research findings as indicative of an overall negative effect and decry the use of rewards in educational and work settings (e.g., see Kohn, 1993). As a result, many parents, teachers, and others are reluctant to use rewards – any rewards – under any circumstances! Lepper and his colleagues suggest that reversing this incorrect conclusion will be harmful. They imply that we are trying to propagate our own myth – that rewards have no negative effects. We do not want to add any more myths to this research area. So let us be clear in stating that our research demonstrates that rewards have either positive or negative effects depending on the way they are administered. Importantly, the only negative effect of reward on intrinsic motivation occurs under a circumscribed set of conditions, namely, when rewards are tangible and promised to individuals without regard to any level of performance.
The Role of Moderator Variables
A major focus of our meta-analysis was to assess the effects of various moderator variables. The moderators we included (type of reward, reward expectancy, and reward contingency) were chosen because of their theoretical and practical importance in the literature on intrinsic motivation as well as replication over a number of experiments. Our results indicate that the detrimental effects of reward are limited and depend on multiple moderators. All of our critics, Lepper et al., Ryan and Deci, and Kohn, are concerned that we failed to assess the impact of additional important moderators. The implication of their comments is that decremental effects of reward occur under numerous conditions and are far more widespread than our analysis suggests. Interestingly, however, as we describe below, an analysis of additional moderators would, in fact, show the opposite.
Lepper et al. point to studies that assessed the impact of initial task interest and reward salience on intrinsic motivation. Other moderator variables hypothesized to influence intrinsic motivation include reward attractiveness, presence or absence of the experimenter, task difficulty, reward magnitude, and so on. It is critical to point out that the few studies designed to investigate the impact of these moderators typically begin with the one condition that produces a negative effect. Furthermore, such moderators have been shown to enhance, mitigate, or reverse the negative effects of expected, tangible, noncontingent reward. For example, Ross (1975) found that salient rewards make the negative effect of tangible, expected, noncontingent reward greater. McLoyd (1979), on the other hand, demonstrated that individuals offered a noncontingent, tangible reward experienced an increase in intrinsic motivation when the task was less interesting, while Williams’s (1980) research
indicated that the negative effects of tangible, expected, noncontingent reward could be offset by offering attractive rewards. In other words, the variables we have not assessed are moderators that have typically been added to the conditions that produce the single negative effect of reward found in our meta-analysis. Thus, an analysis of studies that included moderators that increase the negative effects of expected, tangible, noncontingent reward would serve to place further restrictions on the circumstances under which rewards undermine intrinsic motivation. That is, the negative effect phenomenon may be even more circumscribed than our data indicate, a finding contrary to the implications hinted at by our critics. Presently, however, there is no way to assess the theoretical or applied importance of these moderator variables. This is because only one or two studies have replicated the same moderator procedures on a common dependent measure of intrinsic motivation. If the effects of moderators such as reward salience, reward attractiveness, and so on were systematically replicated, a subsequent meta-analysis could be conducted to determine the conditions that moderate the negative effect on intrinsic motivation of tangible, expected, noncontingent rewards when they are removed. Of course, such an analysis would simply extend our findings and show that tangible, expected, noncontingent rewards produce negative effects on intrinsic motivation only when other conditions are present. For example, in terms of reward attractiveness, Williams’s (1980) research shows that when tangible, expected, noncontingent, unattractive rewards are given, intrinsic motivation decreases; the same reward condition with attractive rewards does not produce a decrement. Although present theoretical accounts (e.g., cognitive evaluation theory, the overjustification hypothesis) may be able to organize such circumscribed effects, the theories would become less and less generalizable. 
In applied settings, negative effects of reward on intrinsic motivation would depend on so many conditions that there would be little need for concern. Both Kohn (1996) and Ryan and Deci (1996) raise the question of moderators in the context of our finding that verbal praise produces positive effects both on the free time students spend on tasks and on attitude measures of intrinsic motivation. Specifically, they claim that verbal praise directed at controlling student behavior has negative effects on intrinsic motivation, whereas informational praise does not. We did not conduct an analysis on the control-informational dimension of verbal reward because these variables appear in only one or two studies. In addition, most research on this topic has been conducted without adequate no-feedback control groups (e.g., Ryan, 1982). Until a sufficient number of experiments with control groups are conducted, a meta-analysis of conditions that have few replications would not be reliable or beneficial to our understanding of reward and intrinsic motivation. We note, however, that although there are so few studies on this topic, the effects of controlling and informational verbal reward
were analyzed in a recent meta-analysis by Tang and Hall (1995). They found no significant effects on either of these dimensions.
In sum, although our meta-analysis was designed to assess the effects of several moderators on reward and intrinsic motivation, Lepper et al., Ryan and Deci, and Kohn have suggested that many additional important moderators were omitted. As we have shown, an analysis of additional moderators would not alter our conclusions or change any of the results of our meta-analysis. That is, negative effects of reward on intrinsic motivation are highly conditional and occur solely in the presence of multiple moderators. In educational settings, negative effects can be avoided by praising students for their work and making tangible rewards contingent on performance.
Our Findings in Context
Both Ryan and Deci and Lepper et al. argue that our findings contradict previous narrative reviews and other meta-analyses of reward and intrinsic motivation. Lepper et al. are not consistent on this point, and in a later section of their critique they concede that “other recent meta-analyses, . . . as well as numerous previous narrative reviews, have reached exactly [our] conclusion” (p. 7). In this section, we show that our results are in accord with other summaries of reward and intrinsic motivation and that our review advances the knowledge in this area. We briefly comment on three other meta-analyses on this topic (Rummel & Feinberg, 1988; Tang & Hall, 1995; Wiersma, 1992).
The most recent meta-analysis on rewards and intrinsic motivation, conducted by Tang and Hall (1995), was designed to test several theoretical propositions about the overjustification effect. Fifty studies were included, largely a subset of the experiments examined in our review. One analysis concerned assessing the effects of expected, tangible, task-contingent (noncontingent) reward on the free time measure of intrinsic motivation. Tang and Hall found a negative effect, as did we. Also, in accord with our findings, they found no detrimental effect with unexpected, tangible reward. It is difficult to compare our findings on the effects of verbal reward on free time with their study, because their analysis included only two effect sizes (their result was not significant).
Tang and Hall (1995) reported a negative effect on the free time measure for performance-contingent reward, whereas we found no significant effect. This difference in findings is due to Tang and Hall’s classification of performance-contingent reward as well as to their omission of several relevant studies. Of the seven studies that Tang and Hall analyzed as performance contingent, six are actually task-contingent reward procedures, as defined by Deci and Ryan (1985).
We used Deci and Ryan’s definitions and identified 10 studies of performance-contingent reward; overall, there was no evidence of a negative effect. Additional measures of intrinsic motivation (e.g., attitude toward task) that we examined were not reported by Tang and Hall.¹
The meta-analyses by Wiersma (1992) and Rummel and Feinberg (1988) were discussed in our original article (Cameron & Pierce, 1994). Wiersma analyzed 20 studies, and Rummel and Feinberg analyzed 45 studies. We cannot compare our findings with those of Rummel and Feinberg, because they averaged over different dependent measures of intrinsic motivation. Our meta-analysis shows that this is inappropriate, because the free time and attitude measures do not necessarily covary with the same experimental treatment. In addition, in both Rummel and Feinberg’s and Wiersma’s analyses, many of the effect sizes reported came from studies where one reward condition was compared to another reward condition. The lack of a no-reward group makes a comparison of findings problematic. Wiersma does, however, report effect size estimates for six experiments on free time that compared a no-reward condition to an expected, tangible, noncontingent reward condition. Though we have not conducted a meta-analysis on his results, we computed the average of the six independent effect sizes and found a negative effect, a finding compatible with our original conclusions.
All in all, our findings for rewards that are tangible, expected, and noncontingent are consistent with other meta-analyses. Our research, however, went beyond an analysis of the one negative reward procedure and assessed the effects of reward under a variety of conditions. In terms of other reward procedures (e.g., verbal reward, performance-contingent reward) and other measures of intrinsic motivation (e.g., attitude toward a task), we failed to find any detrimental effects on intrinsic motivation. That is, our study showed that most reward procedures can be used to maintain or enhance intrinsic motivation; the negative effect other reviews have detected is only a small part of a larger picture. Thus, our meta-analysis provides a more complete account of the effects of rewards on intrinsic motivation.
The Completeness of Our Review
A criticism put forward by Kohn, as well as by Ryan and Deci, is that we failed to include several critical experiments in our meta-analysis. The implication is that had such studies been included, our results would have been different.
Kohn cites a number of studies that he believes we have overlooked. Most of these studies were located in our original search and were not included in our meta-analysis because of the lack of an adequate no-reward control condition. In addition, as we reported in our original article, our meta-analysis included studies published up to and including 1991. The studies from the period 1992–1994 cited by Kohn (Boggiano et al., 1992; Freedman, Cunningham, & Krismer, 1992; Gottfried, Fleming, & Gottfried, 1994) were, of course, not included. Of these, Freedman et al. varied the amount of reward but had no nonreward control group. The article by Boggiano et al. reported past research in order to develop a theory or model of students’
achievement patterns. Gottfried et al. examined parental motivational practices; their study did not include any of the reward conditions or dependent measures that we analyzed in our meta-analysis. Earlier studies by Birch, Marlin, and Rotter (1984) and Fabes, Fultz, Eisenberg, May-Plumlee, and Christopher (1989) concerned food preferences and prosocial behavior, respectively. Clearly, all these studies are off topic. Other papers that Kohn cites as missing are, in fact, included in our analyses (a list of all studies is presented in Cameron & Pierce, 1994, pp. 399–403). In contrast to Kohn, Lepper et al. charge us with including too many “bad” studies. An essential criterion of a reliable meta-analysis, however, is that all the studies done in a field are examined, independently of one’s own theoretical position and the degree to which the results of any particular study may be promising. We have met this criterion. In fact, our meta-analysis on the effects of rewards on intrinsic motivation is the most comprehensive review of this literature to date. The results are based on a large number of studies, and, to our knowledge, no relevant published studies were omitted. Due to the large sample of studies included in our analyses, any single study that may have been overlooked would not alter the conclusions. Overall, our results were based on all the available evidence, and the findings are central to an understanding of the effects of rewards on intrinsic motivation.
Meta-Analytic Issues
In addition to the general criticisms discussed above, Lepper and his associates object to our use of meta-analysis for assessing the research on the effects of rewards on intrinsic motivation. In particular, they contend that the distributions of effect sizes in our article indicate that meta-analytic tests should not have been conducted. In accord with Ryan and Deci (1996) and Kohn (1996), they further suggest that the statistical procedures used in our meta-analyses must be flawed. Specifically, they criticize the technique of aggregating effect sizes within a single study when moderator variables are present. In this section, we respond to our critics’ meta-analytic and statistical concerns. We show that our analyses are appropriate, that the data are approximately normal and homogeneous, that inclusion or exclusion of outliers does not alter the results, and that our procedures yield correct estimates for the effects of rewards on intrinsic motivation at each level of analysis.
The Appropriateness of Meta-Analysis
There are two main issues that concern Lepper et al. with regard to our use of meta-analytic techniques for assessing the effects of rewards on intrinsic motivation. First, they suggest that the apparent normality of our distributions for the
critical measures of intrinsic motivation (free time, attitude) is deceptive. Their second concern is that the data are not homogeneous (equal spread of effect sizes) and that meta-analytic tests should therefore not have been performed. As Lepper et al. acknowledge (p. 13–14), our distributions of effect sizes approximate a normal shape. However, they attribute the normality of these distributions to the inclusion of “pure zero cases” and random estimates. They argue that our inclusion of “pure zero cases” in our graphic portrayal of effect sizes (Cameron & Pierce, 1994, Figures 1 and 2) guarantees a normal distribution around the value of zero. Pure zero cases refer to studies that did not provide sufficient information to calculate effect sizes or random estimates (4 cases for free time and 17 cases for attitude). The truth is that we did not include pure zero cases in these figures. This is clearly stated on pages 379 and 384 of our original article. The normality of the distributions centering around zero is not due to pure zero cases. Thus, Lepper and his associates need not be concerned. In terms of our use of random estimates of effect sizes, our procedure is innovative and may be more appropriate than merely assigning a zero effect to the experiment or omitting the study itself. The procedure depended on the information available in each study. When t or F values were nonsignificant and were reported as less than some value (e.g., < 1), a random number between 0.01 and that value was selected; and an effect size was then calculated. In other cases, t or F values were not available, but means or directions of means were reported. In these situations, a random number between 0.01 and the critical value of t or F at p = .05 was drawn, and an effect size was then calculated. (For more information, see Cameron & Pierce, 1994, p. 376). 
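The random-estimate procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure as actually implemented: the text says only that "a random number" was selected and an effect size "then calculated" (the details are in Cameron & Pierce, 1994, p. 376). The uniform draw, the function names, the group sizes, and the conversion d = t × sqrt(1/n1 + 1/n2) for two independent groups are all assumptions made for the example.

```python
import math
import random


def random_t_estimate(upper_bound, rng, lower_bound=0.01):
    """Draw a stand-in t value between 0.01 and the reported bound.

    Assumption: the draw is uniform; the source does not name a distribution.
    """
    return rng.uniform(lower_bound, upper_bound)


def effect_size_from_t(t_value, n_reward, n_control, direction):
    """Convert t to a standardized mean difference (assumed conversion).

    direction is +1 or -1 and comes from the reported direction of the means,
    which the text says was always known.
    """
    magnitude = abs(t_value) * math.sqrt(1.0 / n_reward + 1.0 / n_control)
    return direction * magnitude


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Case 1: a study reports only "t < 1", with a negative direction of effect.
t_draw = random_t_estimate(upper_bound=1.0, rng=rng)
d_case1 = effect_size_from_t(t_draw, n_reward=20, n_control=20, direction=-1)

# Case 2: no t is reported, only the direction of the means; the bound is the
# critical t at p = .05 (about 2.02 for 38 degrees of freedom, two-tailed).
t_draw = random_t_estimate(upper_bound=2.02, rng=rng)
d_case2 = effect_size_from_t(t_draw, n_reward=20, n_control=20, direction=+1)

print(round(d_case1, 3), round(d_case2, 3))
```

Because the sign is taken from the reported direction and only the magnitude is drawn at random, the sketch is consistent with the argument that random estimates cannot by themselves pull the distribution toward zero.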
With regard to the normality of our distributions, it is important to note that the direction of effect for random estimates was always known. If more studies had had negative effects, the distribution would have been pulled in that direction. The actual shape of the distribution shows that positive and negative effect sizes occurred with similar frequency. This is based not on our use of random estimates but on the actual direction of effects reported in such studies. In other words, the use of random estimates in no way biases the results toward an average zero effect size. The normality of the distributions centering around zero is not due to this, and, again, there is no need for concern. The point is that the effect size distributions approximated a normal shape, and meta-analytic tests could be used with confidence. Although Lepper et al. agree that our distributions are normal, they argue that our data are heterogeneous (lacking equal spread) and therefore inappropriate for meta-analysis. Our decision to use meta-analytic procedures involved a consideration of several issues. Initially, we were concerned with the normality of the distribution of effect sizes. We showed that the distributions were approximately normal and reported the degree of kurtosis and skewness of the free time distribution in the original article (p. 381). Next we considered the results of the Q test for homogeneity. It is well known that this
test is liberal in the sense that the null hypothesis (homogeneity) is too often rejected (Hunter, Schmidt, & Jackson, 1982). Because of this problem, we set the critical value of Q farther out on the chi-square distribution, just below the value at the .01 level (that is, p > .01). Homogeneity was achieved by excluding extreme effect sizes. The exclusion of outliers is not unusual and is recommended by Hedges (1987) as a method for obtaining more equal spread of the effect sizes. To assess any biases due to the removal of outliers, we reported all analyses with extreme values included and excluded. In addition, we identified the studies with extreme values and discussed the conditions that may have led to these atypical results. Inspection of our original article shows that the results do not change to any extent by excluding outliers. The validity of our meta-analysis is also increased by the use of the CL statistic (McGraw & Wong, 1992). CL is another way to express effect size. Importantly, McGraw and Wong conducted 118 tests (simulations) to show that the CL statistic is robust with respect to violations of normality and homogeneity. Because of this, we used CL in all our analyses and reported results identical to those of the other meta-analytic tests. In sum, the distribution of effect sizes for the critical measures of intrinsic motivation approximated a normal shape. The normality was not due to the inclusion of “pure zero cases” or random estimates as Lepper and his associates have suggested. Homogeneity of effect sizes was achieved by excluding outliers. All results were reported with outliers included and excluded; our findings were not altered to any extent by the exclusion of outliers. In addition, given our use of the CL statistic, we are confident that our analyses are appropriate and that the results are accurate and valid.
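The two statistics discussed in this section can be illustrated with a short sketch. The effect sizes and sample sizes below are hypothetical, and the formulas are the standard textbook ones (inverse-variance weights for Q; CL computed as the normal probability of d / sqrt(2) under equal variances); whether the original analyses used exactly these forms is an assumption.

```python
import math


def d_variance(d, n1, n2):
    """Large-sample variance of a standardized mean difference."""
    return (n1 + n2) / (n1 * n2) + d * d / (2.0 * (n1 + n2))


def q_statistic(effects):
    """Return (weighted mean d, Q, degrees of freedom).

    effects is a list of (d, n1, n2) tuples, one per independent study.
    Q is referred to a chi-square distribution on k - 1 degrees of freedom.
    """
    weights = [1.0 / d_variance(d, n1, n2) for d, n1, n2 in effects]
    mean_d = sum(w * e[0] for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e[0] - mean_d) ** 2 for w, e in zip(weights, effects))
    return mean_d, q, len(effects) - 1


def common_language(d):
    """CL effect size assuming normal, equal-variance groups."""
    z = d / math.sqrt(2.0)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical studies; the last one is an extreme value.
studies = [(-0.30, 20, 20), (-0.10, 15, 15), (0.05, 25, 25), (1.40, 12, 12)]

mean_all, q_all, df_all = q_statistic(studies)
mean_trim, q_trim, df_trim = q_statistic(studies[:-1])

print(f"with outlier:    mean d = {mean_all:.2f}, Q = {q_all:.2f}, df = {df_all}")
print(f"without outlier: mean d = {mean_trim:.2f}, Q = {q_trim:.2f}, df = {df_trim}")
print(f"CL for d = -0.26: {common_language(-0.26):.3f}")
```

Reporting both rows side by side mirrors the practice described above of presenting every analysis with extreme values included and excluded.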
Aggregation of Effect Sizes in Meta-Analysis Lepper et al., Ryan and Deci, and Kohn are critical of the method of aggregating effect sizes within a study to yield a single estimate for each meta-analytic test. They contend that such procedures yield inaccurate estimates of the effects of reward on intrinsic motivation. Underlying this criticism is the supposition that the effects of important moderators and interactions were not detected in our analyses. Again, the implication of these comments is that negative effects of reward are more prevalent than our results communicate. In response to this concern, we first note that aggregation of effect sizes within a study is a common procedure in meta-analysis that avoids violation of the assumption of independence (Cooper, 1989; Hedges & Olkin, 1985). The procedures for aggregation are clearly described in our original article (pp. 376–377). It is important to point out that a serious statistical violation occurs when more than one effect size from an individual experiment is
entered into a single meta-analysis. Typically, in such cases, a control group is compared with more than one experimental treatment within a study, several effect sizes are calculated, and each is entered into a single meta-analytic test. The major problem is that the effect sizes are not independent (errors among observations are correlated). If the dependencies in such data were properly accounted for, the error term would become larger and mean effect sizes would become smaller. Another problem is that a particular study will contribute more weight to the overall meta-analytic outcome than a study yielding only one effect size. Other meta-analyses on reward and intrinsic motivation favored by Lepper et al. (p. 5) have violated the assumption of independence by entering several (sometimes over 10) effect sizes from one study into a single meta-analytic test (e.g., Rummel & Feinberg, 1988; Tang & Hall, 1995). The implication is that conclusions based on these meta-analyses could be incorrect. The way to achieve independence and at the same time retain effect sizes for an analysis of the impact of various moderators is to (a) aggregate them into a single estimate for an overall analysis of the effects of rewards on intrinsic motivation and (b) conduct further analyses of the effects of various moderator variables. For factorial designs, the main effect of reward is entered into an analysis of the overall effects of reward; interaction effects that have been replicated in a sufficient number of experiments are then analyzed separately. These are the procedures we used in our meta-analyses. As we indicated previously, the moderators we analyzed (reward type, reward expectancy, and reward contingency) were chosen because of their theoretical and applied importance as well as replication. Lepper et al. are concerned that aggregation of the moderators (rather than separate analyses) yields inaccurate estimates of the effects of reward on intrinsic motivation (p. 
11–13). As mentioned earlier, the moderators not assessed in our analyses (e.g., presence of experimenter, reward attractiveness, salience, distraction, etc.) have appeared in only one or two studies, and in these studies they have been added to the tangible, expected, noncontingent reward condition to decrease, mitigate, or increase the negative effect. In terms of such studies, it is possible to obtain an unbiased estimate of the effect size of tangible, expected, noncontingent reward. When the results are pooled across all studies, the effects of any additional moderators are averaged out. That is, although any one of these manipulations may push intrinsic interest up (e.g., reward attractiveness) or down (e.g., surveillance, reward salience) in a given study, their effects are expected to cancel out across many studies. In other words, the best estimate of the effect size of tangible, expected, noncontingent reward when additional moderators are present is the average of all the comparisons of the rewarded conditions with nonrewarded control groups. Of course, additional meta-analyses could be conducted on the effects of these moderators if they were sufficiently replicated. As we pointed out,
however, because they are added to the one reward procedure that produces a reliable negative effect, the results would show that decremental effects of reward on intrinsic motivation depend on even stricter conditions than our analysis indicates. This is demonstrated in Lepper et al.’s analysis of three factorial experiments (Calder & Staw, 1975; Loveland & Olley, 1979; McLoyd, 1979) that crossed initial task interest (high, low) with reward (reward, no reward). Lepper et al. (p. 10) show that in these three studies, rewarding activities with high intrinsic interest yields a large negative effect size. In contrast, rewarding a task with low initial interest produces a positive effect size. In each of these studies, the reward procedure involved tangible, expected, noncontingent (or task-contingent) rewards – the one procedure that produces a negative effect on the free time measure of intrinsic motivation. Thus, if Lepper et al.’s analysis is reliable, the results indicate that tangible, expected, noncontingent rewards are harmful only when delivered for more interesting tasks. It is worth mentioning here, however, that a study excluded in Lepper et al.’s analysis (Mynatt et al., 1978) also crossed task interest with tangible, expected, noncontingent reward but found positive effects of reward for both low- and high-interest tasks. Given that there are so few studies of the interest variable, the results from this one study could substantially alter Lepper et al.’s conclusions about the importance of level of task interest when rewards are tangible, expected, and noncontingent. In summary, the procedures used in our meta-analysis yield correct estimates for the effects of reward on intrinsic motivation at each level of analysis. Our critics have implied that analyses of additional moderators and interactions would yield more general negative effects of reward on intrinsic motivation. 
However, as we have shown, further analyses would actually reveal that positive effects of reward are more general and that decremental effects of reward occur under even more restricted circumstances than our results indicate.
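The two-step approach to aggregation described in this section, (a) one aggregated estimate per study for the overall analysis and (b) separate analyses by moderator, can be sketched as follows. The records are hypothetical, and the use of simple unweighted means is an assumption made for brevity; the procedures actually used are described in Cameron and Pierce (1994, pp. 376–377).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (study label, moderator condition, effect size d).
records = [
    ("Study A", "expected", -0.40),
    ("Study A", "unexpected", 0.10),
    ("Study A", "expected + surveillance", -0.60),
    ("Study B", "expected", -0.20),
    ("Study C", "unexpected", 0.05),
]

# Step (a): one aggregated estimate per study for the overall analysis.
by_study = defaultdict(list)
for study, _condition, d in records:
    by_study[study].append(d)
per_study = {study: mean(values) for study, values in by_study.items()}
overall = mean(per_study.values())

# Step (b): a separate analysis for each moderator level; each study
# contributes at most one effect size to any one level.
by_condition = defaultdict(dict)
for study, condition, d in records:
    by_condition[condition][study] = d
per_condition = {c: mean(v.values()) for c, v in by_condition.items()}

# Entering every effect size directly lets Study A count three times.
naive = mean(d for _, _, d in records)

print(f"aggregated overall: {overall:.2f}, naive pooled: {naive:.2f}")
```

In this toy data set the naive pooled mean is more negative than the aggregated one simply because the study with three comparisons is counted three times, which is the weighting problem raised above.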
Conclusion A prominent view in education and social psychology is that rewards decrease a person’s intrinsic motivation. Our meta-analysis of 20 years of research suggests that this view is incorrect. The findings from approximately 100 studies indicate that rewards can be used effectively to enhance or maintain intrinsic interest in activities. The only negative effect of reward occurs under a highly specific set of conditions, circumstances that are easily avoided. Not surprisingly, these results have not been well received by those who argue that rewards produce negative effects on intrinsic motivation under a wide range of conditions. In response to the findings, Lepper, Keavney, and Drake (1996), Ryan and Deci (1996), and Kohn (1996) have suggested that the questions asked in our meta-analysis were inappropriate, that critical studies were excluded,
that important negative effects were not detected, and that the techniques used in our meta-analysis were unsuitable. In this response, we have shown that the questions asked are fundamental to an understanding of the relationship between rewards and intrinsic motivation and that our meta-analytic techniques are appropriate, robust, and statistically correct. Our meta-analysis includes all relevant studies on the topic, and the results clearly show that negative effects of rewards occur under limited conditions. All told, the results and conclusions of our meta-analysis are not altered by our critics’ protests and accusations. Our findings have important practical implications. In applied settings, the results indicate that verbal rewards (praise and positive feedback) can be used to enhance intrinsic motivation. When tangible rewards (e.g., gold stars, money) are offered contingent on performance on a task or are delivered unexpectedly, intrinsic motivation is maintained. A slight negative effect of reward can be expected when tangible rewards are offered without regard to level of performance. Under this condition, when the rewards are withdrawn, individuals report as much interest in the activity as those in a nonrewarded group, but they may spend slightly less time on it in a free period.2 This negative effect can be prevented by rewarding people for completing work, solving problems successfully, or attaining a specified level of performance. In other words, rewards can be used effectively in educational and other applied settings without undermining intrinsic motivation.
Notes
1. Tang and Hall (1995) reported effect sizes for questionnaire measures of intrinsic motivation. The studies they analyzed used questionnaire items to index attributions of causality; moral obligation; attitude toward the task; perceptions of luck, ability, effort, and difficulty; feelings of competence; negative affect; self-esteem; and so on. Tang and Hall combined the effect sizes of all these measures and reported meta-analyses based on this composite index. They did not examine attitude toward the task separately, as we did. Thus, we cannot compare our findings on the attitude measure of intrinsic motivation.
2. It may be informative to consider how serious the negative effect of expected, tangible, noncontingent reward on free time really is. How much less time would students spend on academic subjects if a teacher implemented this reward procedure and then removed it? Results from our meta-analysis indicate that the average effect size for a comparison between people who receive an expected, tangible, noncontingent reward and nonrewarded individuals on time on task following withdrawal of reward is –0.26. In the original experiments, time on task was typically measured over an 8-minute period. In order to convert the effect size of –0.26 to real time, one needs to know the pooled standard deviation of rewarded and nonrewarded groups. Because many researchers reported only t or F statistics, we will use a well-designed study by Pretty and Seligman (1984) to estimate a pooled standard deviation. Their study reported two experiments with large sample sizes and readily available statistical information. Both experiments compared a condition of expected, tangible, noncontingent reward (N = 30) with a nonrewarded control group (N = 30) on 8 minutes of free time. The pooled standard deviation was 2.6 minutes.
Using this estimate of error, we are able to convert the negative effect size from the meta-analysis into real time. An effect size of −0.26 would mean that in an 8-minute period, the average individual who is promised a noncontingent, tangible reward will spend about 41 seconds less time on the task when the reward procedure is withdrawn than the average nonrewarded individual. Given this result, what would happen if a teacher implemented this incentive procedure in a reading program and then removed it? According to the estimate, students who are offered gold stars for reading would spend about 3 minutes, 25 seconds less time reading in a 40-minute free-choice period than students not given the incentive. Of course, this is a hypothetical example, but it does illustrate the magnitude of this negative effect size in terms of real time.
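The conversion in Note 2 can be checked with a short calculation. The sketch below uses only the figures given in the note (an effect size of –0.26, a pooled standard deviation of 2.6 minutes, and an 8-minute free-time period); the variable and function names are illustrative. The 40-minute figure rests on the note's own assumption that the difference scales linearly with the length of the period, and the reported 3 minutes, 25 seconds matches five times the rounded 41 seconds.

```python
EFFECT_SIZE = -0.26        # mean effect on free time reported in the note
POOLED_SD_MINUTES = 2.6    # pooled SD from Pretty and Seligman (1984)
SESSION_MINUTES = 8        # length of the free-time period in the experiments


def effect_in_seconds(effect_size, pooled_sd_minutes):
    """Convert a standardized mean difference into seconds."""
    return effect_size * pooled_sd_minutes * 60.0


diff_8min = effect_in_seconds(EFFECT_SIZE, POOLED_SD_MINUTES)

# Assumption carried over from the note: the difference scales linearly with
# the length of the period (40 minutes is five 8-minute periods).
scale = 40 / SESSION_MINUTES
diff_40min = round(diff_8min) * scale  # scale the rounded 41 seconds
minutes, seconds = divmod(abs(diff_40min), 60)

print(f"{abs(diff_8min):.1f} s less per 8 minutes")               # 40.6 s
print(f"{int(minutes)} min {int(seconds)} s less per 40 minutes")  # 3 min 25 s
```

Scaling the unrounded 40.56 seconds instead would give about 3 minutes, 23 seconds, so the two figures differ only by rounding.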
References
Birch, L. L., Marlin, D. W., & Rotter, J. (1984). Eating as the “means” activity in a contingency: Effects on young children’s food preference. Child Development, 55, 431–439.
Boggiano, A. K., Shields, A., Barrett, M., Kellam, T., Thompson, E., Simons, J., & Katz, P. (1992). Helplessness deficits in students: The role of motivational orientation. Motivation and Emotion, 16, 271–296.
Calder, B. J., & Staw, B. M. (1975). Self-perception of intrinsic and extrinsic motivation. Journal of Personality and Social Psychology, 31, 599–605.
Cameron, J., & Pierce, W. D. (1994). Reinforcement, reward, and intrinsic motivation: A meta-analysis. Review of Educational Research, 64, 363–423.
Cooper, H. M. (1989). Integrating research: A guide for literature reviews (2nd ed.). Beverly Hills, CA: Sage.
Deci, E. L., & Ryan, R. M. (1985). Intrinsic motivation and self-determination in human behavior. New York: Plenum.
Fabes, R. A., Fultz, J., Eisenberg, N., May-Plumlee, T., & Christopher, F. S. (1989). Effects of rewards on children’s prosocial motivation: A socialization study. Developmental Psychology, 25, 509–515.
Freedman, J. L., Cunningham, J. A., & Krismer, K. (1992). Inferred values and the reverse-incentive effect in induced compliance. Journal of Personality and Social Psychology, 62, 357–368.
Gottfried, A. E., Fleming, J. S., & Gottfried, A. W. (1994). Role of parental motivation practices in children’s academic intrinsic motivation and achievement. Journal of Educational Psychology, 86, 104–113.
Hedges, L. (1987). How hard is hard science, how soft is soft science? The empirical cumulativeness of research. American Psychologist, 42, 443–455.
Hedges, L., & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando, FL: Academic.
Hunter, J. E., Schmidt, F. L., & Jackson, G. B. (1982). Meta-analysis: Cumulating research findings across studies. Beverly Hills, CA: Sage.
Kohn, A. (1993). Punished by rewards. Boston: Houghton Mifflin.
Kohn, A. (1996). By all available means: Cameron and Pierce’s defense of extrinsic motivators. Review of Educational Research, 66, 1–4.
Lepper, M. R., Keavney, M., & Drake, M. (1996). Intrinsic motivation and extrinsic rewards: A commentary on Cameron and Pierce’s meta-analysis. Review of Educational Research, 66, 5–32.
Loveland, K. K., & Olley, J. G. (1979). The effect of external reward on interest and quality of task performance in children of high and low intrinsic motivation. Child Development, 50, 1207–1210.
McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111, 361–365.
McLoyd, V. C. (1979). The effects of extrinsic rewards of differential value on high and low intrinsic interest. Child Development, 50, 1010–1019.
Mynatt, C., Oakley, D., Arkkelin, D., Piccione, A., Margolis, R., & Arkkelin, J. (1978). An examination of overjustification under conditions of extended observation and multiple reinforcement: Overjustification or boredom? Cognitive Therapy and Research, 2, 171–177.
Pretty, G. H., & Seligman, C. (1984). Affect and the overjustification effect. Journal of Personality and Social Psychology, 46, 1241–1253.
Ross, M. (1975). Salience of reward and intrinsic motivation. Journal of Personality and Social Psychology, 32, 245–254.
Rummel, A., & Feinberg, R. (1988). Cognitive evaluation theory: A meta-analytic review of the literature. Social Behavior and Personality, 16, 147–164.
Ryan, R. M. (1982). Control and information in the intrapersonal sphere: An extension of cognitive evaluation theory. Journal of Personality and Social Psychology, 43, 450–461.
Ryan, R. M., & Deci, E. L. (1996). When paradigms clash: Comments on Cameron and Pierce’s claim that rewards do not undermine intrinsic motivation. Review of Educational Research, 66, 33–38.
Sutherland, S. (1993). Impoverished minds. Nature, 364, 767.
Tang, S., & Hall, V. (1995). The overjustification effect: A meta-analysis. Applied Cognitive Psychology, 9, 365–404.
Wiersma, U. J. (1992). The effects of extrinsic rewards in intrinsic motivation: A meta-analysis. Journal of Occupational and Organizational Psychology, 65, 101–114.
Williams, B. W. (1980). Reinforcement, behavior constraint, and the overjustification effect. Journal of Personality and Social Psychology, 39, 599–614.
Zimbardo, P. G. (1992). Psychology and life (13th ed.). New York: Harper Collins.
50 A Comprehensive Expectancy Motivation Model: Implications for Adult Education and Training Kenneth W. Howard
Motivating adult learners has always been a critical concern of adult education theorists and practitioners. Motivation has been defined as a hypothetical mechanism which controls goal-directed behavior (Reykowski, 1965). Various theoretical frameworks have been used to explain motivation in the context of adult education. Of these various frameworks, perhaps Maslow’s (1943) five-stage self-actualization model has been the most prominent (Gilmore, 1974). In recent years, expectancy theory has begun to gain popularity as a model for understanding educational motivation. Derived from social learning theory generally and cognitive or field theory specifically, it views people as purposeful beings who interact proactively with their environments based on their expectancies about the likelihood that their efforts will result in outcomes that they value. In other words, they choose to perform in ways that they believe are likely to benefit them (McMillan, 1980). Such a model has relevance to adult education and training, not only as a means of increasing learner motivation and performance in the learning situation, but also for refining enrollment strategies, reducing dropout rates, and insuring that learning has a practical application for the learner.
Source: Adult Education Quarterly, 39(4) (1989): 199–210.
The Development of Expectancy Theory Expectancy theory has its origins in the theories of Lewin (1938) and Tolman (1932), who postulated that human behavior was a result of the interaction of the individual and the environment, in the context of a specific situation, and that individuals develop beliefs about the probability of various possible outcomes of their behaviors, preferring some outcomes over others. Julian Rotter (1954, 1971), a social learning theorist, expanded on Lewin’s ideas regarding expectancy and motivation by adding elements of stimulus-response theory, suggesting that behaviors are motivated by the interaction of three factors: expectancy, reinforcement value, and the specific psychological situation. Some outcomes hold greater reinforcement value than others because they satisfy stronger needs. Specific situational cues (e.g., novelty of the situation, other people present) may alter expectancy or reinforcement values. Building on the work of Lewin, Tolman, and others, Vroom (1964a) developed valence-instrumentality-expectancy (VIE) theory in its classical form. He postulated that the force of motivation behind any behavior was a product of valence, instrumentality, and expectancy. He defined expectancy as the individual’s subjective estimation of the likelihood of successfully performing a particular behavior, instrumentality as the individual’s subjective estimation of the likelihood that the behavior would be rewarded, and valence as the positive or negative value that the individual placed on the reward. Vroom (1964b) spelled out three basic assumptions underlying VIE theory: (a) that anticipation of reward energizes individual behavior, (b) that perceived value of various outcomes gives direction to individual behavior, and (c) that learned connections develop between behavior and outcome expectancy. Expectancy theory originated as a theory of work motivation and job satisfaction (Vroom, 1964a).
Hence, most early applications of the theory were focused on business and industry, as were most early expectancy research studies. Yet despite the fact that expectancy theory had become the dominant motivation model in industry, it had been largely ignored by educators and educational administrators (Wright, 1985). However, in recent years adult education theorists have begun to recognize that expectancy theory has significant implications for adult education, particularly in accounting for the importance of barriers (internal or external) in predicting dropout (Darkenwald, 1981). Swedish theorists Rubenson and Hoghielm (1976, 1978) adapted Vroom’s VIE theory to explain and predict dropout from adult education. This was later refined by Borgstrom (1980). Their model described Force (Vroom’s force of motivation), the strength of which determines if the individual completes or drops a course, as resulting from valence (the extent to which the individual regards a course as a fruitful means of satisfying perceived needs) and expectancy (the extent to which the individual feels capable of completing or coping with a course).
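Vroom’s classical formulation, on which the dropout model builds, can be written compactly. The symbols below are shorthand introduced here for illustration (F for force of motivation, V for valence, I for instrumentality, E for expectancy); treating expectancy and instrumentality as subjective probabilities between 0 and 1 is an assumption that follows from their definition as estimated likelihoods.

```latex
F = V \times I \times E, \qquad 0 \le E \le 1, \quad 0 \le I \le 1
```

One consequence of the product form is that if any single term is zero, the force of motivation is zero regardless of the other two, and a negative valence yields a negative force.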
Results of Expectancy Research Researchers have tested aspects and variations of expectancy theory with a variety of adult populations involved in both traditional and non-traditional educational settings, including: undergraduate university students (Arvey & Dunnette, 1980; Butler & Womer, 1985; Constantinople, 1967; Henson, 1976; Mitchell & Knudson, 1971; Mitchell & Nebeker, 1973; Polczynski & Shirland, 1976; Schmitt, 1975); adult GED students (Darkenwald, 1987; Moore & Davies, 1984); community college students (Malloch & Michael, 1981; Pritchard & DeLeo, 1973); graduate students (Miskel, DeFrain, & Wilcox, 1980), and public school teachers (Miskel, DeFrain, & Wilcox, 1980; Wright, 1985). Expectancy theory has also undergone extensive research in business and industry settings, in addition to the limited research in educational settings described above. Excellent comparative analyses of these studies and their results have been compiled by Heneman and Schwab (1972) and House, Shapiro, and Wahba (1974). Results of expectancy research have been mixed: the expectancy basis for motivation is supported but the individual elements of the theory are not consistently supported. Certainly, a simple, multiplicative model as proposed by Vroom cannot be supported. For example, while some studies support a multiplicative relationship between the VIE process variables (Lawler, 1968), others support an additive relationship (Feldman, 1974), others support both under different conditions (Butler & Womer, 1985), and others support neither an additive nor a multiplicative relationship (Hackman & Porter, 1968; Pritchard & Sanders, 1973). Some have found significant effects for all of the VIE process variables; others have found significant effects for some of the variables but not others (Arvey & Dunnette, 1980; House, Shapiro, & Wahba, 1974; Malloch & Michael, 1981; Moore & Davies, 1984; Pritchard & DeLeo, 1973). 
Although most studies agree that some combination of the VIE process variables is predictive of effort, a large number of studies have demonstrated that the VIE process variables alone are not predictive of performance (House, Shapiro, & Wahba, 1974). These studies have identified various other variables (e.g., ability, self-esteem, various personality traits) which either intervene between motivation and performance or influence the VIE process variables (Arvey & Dunnette, 1980; Butler & Womer, 1985; Darkenwald, 1987; House, Shapiro, & Wahba, 1974; Henson, 1976; Malloch & Michael, 1981; Mitchell & Knudson, 1971; Mitchell & Nebeker, 1973; Moore & Davies, 1984). These mixed results can be traced to two major areas: problems with research methodology and the lack of a sufficiently comprehensive model that better describes the complex relation between both the expectancy process variables and the other variables. Because several researchers have cogently and thoroughly addressed methodological problems in expectancy research (Butler & Womer, 1985; House, Shapiro, & Wahba, 1974), these are not discussed in detail in this paper. A few theorists (Graen, 1969; Lawler, 1973)
have either commented on the need for a more complex model of expectancy motivation or have suggested specific additions or changes to existing models. While these have been positive contributions, none have proved sufficiently comprehensive. Therefore, this paper addresses the need for a comprehensive model of expectancy motivation.
A Comprehensive Expectancy Motivation Model A comprehensive expectancy motivation model must meet three criteria. First, it must accurately describe the dynamics of the fundamental process variables. Second, it must place expectancy motivation in the context of a cycle that explains not only the influence of expectancy motivation on the actual behavior of individuals but also the influence of actual performance, reward, and need satisfaction on expectancy motivation. Third, it must describe the influence of other variables on the motivation process. This paper proposes such a model.
The Primary Expectancy Motivation Variables In this model, motivation is seen as the product of four primary process variables (see Figure 1): effort-performance (E-P) expectancy, performance-reward (P-R) expectancy, reward-need satisfaction (R-N) expectancy, and valence (V). E-P expectancy is defined as an individual’s perception of the likelihood that his or her effort will lead to successful performance of a specific behavior(s) in a specific situation. P-R expectancy is defined as the perception of the likelihood of being rewarded for successful performance. R-N expectancy is defined as the perception of the likelihood that those rewards will meet important personal needs. Valence (V) is defined as the value the individual places on the object (e.g., performance, reward, or need satisfaction) of any of the above expectancies.
[Figure 1: The primary expectancy motivation variables. Diagram labels: Motivation; Effort; Performance; Reward; Need Satisfaction; E-P Expectancy; P-R Expectancy; R-N Expectancy; E1, E2, E3.]
For example, an individual’s motivation in a learning situation would be high if that person: (a) perceived a high likelihood of performing successfully in the classroom and transferring those behaviors to the job (E-P), (b) perceived that improved job performance was likely to be rewarded by recognition from co-workers and supervisor (P-R), (c) perceived a high likelihood that recognition would meet basic acceptance needs (R-N), and (d) placed a high value (V) on each of the above. How a situation is viewed varies among individuals, who will have different expectancies and valences. For example, a learner with internal perceived locus of control would be more likely to value (and expend effort towards) intrinsic (i.e., built-in, learner-centered) than extrinsic (i.e., educator-administered) rewards.
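The product form of the model can be made concrete with a small sketch. The class name, the 0-to-1 scale for every variable, the use of a single overall valence, and the numerical values are assumptions made for illustration only; the model as stated specifies just that motivation is the product of the four primary process variables.

```python
from dataclasses import dataclass


@dataclass
class LearnerPerceptions:
    ep: float       # E-P: effort will lead to successful performance
    pr: float       # P-R: successful performance will be rewarded
    rn: float       # R-N: the reward will satisfy an important need
    valence: float  # V: value placed on the outcomes (single value assumed)

    def motivation(self) -> float:
        """Product of the four primary process variables."""
        return self.ep * self.pr * self.rn * self.valence


# Hypothetical learners: both value the outcomes equally, but the second
# doubts that classroom effort will transfer to performance on the job.
confident = LearnerPerceptions(ep=0.9, pr=0.8, rn=0.8, valence=0.9)
doubts_transfer = LearnerPerceptions(ep=0.3, pr=0.8, rn=0.8, valence=0.9)

print(f"confident learner: {confident.motivation():.3f}")
print(f"doubtful learner:  {doubts_transfer.motivation():.3f}")
```

Because the terms multiply, a low value on any one expectancy depresses motivation even when the other three are high, which is how the sketch reflects individual differences in how a situation is viewed.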
Expectancy Motivation as a Dynamic Process Figure 2 illustrates the dynamic nature of the expectancy motivation process variables. The model is a cyclical one: The outcomes of motivation (i.e., effort, performance, reward, and need satisfaction) affect the individual’s level of motivation on a continuous basis. Initial motivation is based on an individual’s subjective prediction of the probability of performance, reward, and need satisfaction. However, initial motivation results in actual effort, which in turn may result in actual performance, reward, and need satisfaction. Based on these observed results, the individual tests the accuracy of initial predictions (i.e., expectancies) and revises current E-P, P-R, and R-N expectancies. In any given situation, motivation directly influences only the amount of effort a person will expend towards performing required behaviors (e.g., learning tasks). Actual effort is the only variable directly related to motivation; the rest are indirectly related. Actual effort may or may not result in successful performance. Initial success (or progress) may increase a person’s E-P expectancy and, thereby, motivation to continue efforts toward performing subsequent tasks. Similarly, initial lack of success results in lower E-P expectancy. Successful performance of
[Figure 2: Expectancy motivation as a dynamic process. Diagram labels: Motivation, E-P, P-R, R-N, Effort, Performance, Reward, Need Satisfaction.]
initial learning tasks motivates the learner to work toward subsequent learning tasks. Conversely, poor performance of initial learning tasks may lead to lower motivation on subsequent learning tasks. Continued poor performance may lower E-P expectancy to the point that the learner may decide that continued effort is wasted and drop out of the learning activity. Actual performance may or may not result in rewards. In a given situation, consistent reward for successful performance improves an individual’s P-R expectancy. Similarly, lack of reward (or inconsistent or inequitable rewards) results in lowered P-R expectancy. Reinforcement of newly learned behavior improves the learner’s P-R expectancy and therefore increases that person’s motivation to continue in the activity. If learned behavior is not reinforced, P-R expectancy and resultant motivation are decreased. Actual rewards may or may not meet the individual’s needs. If rewards satisfy the individual’s needs, that person’s R-N expectancy – and resulting motivation – will be increased. Learning programs tailored to unique learner needs result in higher motivation. This model implies that performance would have a stronger impact on satisfaction than satisfaction would have on performance. In other words, successful performance in a learning situation results in increased learner satisfaction (performance-reward-need satisfaction), setting up a cycle of reinforcement which becomes stronger over time.
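The feedback cycle described in this section, in which observed results lead the learner to revise expectancies, can be sketched as a simple update rule. The paper gives no formula, so everything here is an assumption made for illustration: the function names `revise` and `simulate_ep`, the update rate of 0.3, and the drop-out threshold of 0.2.

```python
from typing import List


def revise(expectancy: float, succeeded: bool, rate: float = 0.3) -> float:
    """Shift an expectancy toward 1.0 after success, toward 0.0 after failure.

    `rate` (an assumed parameter) controls how strongly one outcome counts.
    """
    target = 1.0 if succeeded else 0.0
    return expectancy + rate * (target - expectancy)


def simulate_ep(initial: float, outcomes: List[bool],
                quit_below: float = 0.2) -> List[float]:
    """Track E-P expectancy across tasks; stop once it falls below `quit_below`."""
    history = [initial]
    for succeeded in outcomes:
        history.append(revise(history[-1], succeeded))
        if history[-1] < quit_below:
            break  # the learner decides further effort is wasted
    return history


print(simulate_ep(0.5, [True, True, True]))   # early successes raise E-P
print(simulate_ep(0.5, [False] * 10))         # repeated failure ends in drop-out
```

The same rule could be applied to P-R and R-N expectancies with reward and need-satisfaction outcomes in place of task success.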
The Influence of Other Variables on Expectancy Motivation As Figure 3 illustrates, expectancies are not only modified by ongoing feedback in the current situation but also by the individual’s past experience. Personal experience in similar situations provides the individual with a basis for determining E-P, P-R, and R-N expectancies. Observed experience (e.g., knowledge obtained by directly observing others’ experiences in similar situations) and communicated experience (e.g., shared information from others about their experiences in similar situations) are other sources. Repeated exposure to similar situations develops an individual’s knowledge, skills, and abilities (KSAs). An individual with moderate motivation and a high skill level will probably perform better than one with moderate motivation but a lower skill level. An individual whose effort frequently results in successful performance will have higher self-esteem than one who experiences frequent failures. Lower self-esteem translates as lower E-P expectancy and, therefore, lower motivation. P-R and R-N expectancies are similarly influenced by the individual’s past experience in similar situations. Personality variables also influence expectancy motivation. Finally, uncontrollable environmental forces sometimes interfere with actual performance and reward. A turbulent environment decreases E-P and P-R expectancies.
[Figure 3: A comprehensive expectancy motivation model. Diagram labels: Self Esteem, Past Experience, Ability (KSAs), Motivation, E-P, P-R, R-N, Effort, Environ. Conditions, Performance, INT/EXT, Reward, Need Satisfaction.]
Implications for Adult Education and Training Increasing and maintaining learner motivation is a fundamental concern of adult educators. One might broadly conceptualize learning situations as having three stages, each with a different motivational focus: Pre-Learning (i.e., the period immediately prior to the learning situation), Learning (i.e., the actual learning situation), and Post-Learning (i.e., the period immediately following the learning situation). In the Pre-Learning stage, prospective learners must be motivated to become initially involved in learning. In the Learning stage, learners must be motivated to continue and take an active part in learning activities. In the Post-Learning stage, learners must be motivated to apply what they have learned. In each stage the same internal process variables – E-P, P-R, and R-N expectancies, and Valence – determine the level of motivation. The implications of this model are clearest for planned, structured adult education programs with specific learning objectives. Increased planning and structure provide the adult educator with more opportunities to manipulate the expectancy variables. Similarly, increased structure and specific learning objectives increase the ability of the learner to formulate clear expectancies regarding the learning situation. However, even in less structured learning situations or learning situations with broader goals (e.g., liberal education programs, self-directed learning projects) the same principles still apply: Learners that believe the learning goals are achievable and will result in personal rewards that meet their individual needs will be more motivated than those who do not. Similarly, learners involved in self-directed learning activities can plan and structure their learning according to the principles of the Comprehensive Expectancy Motivation Model in order to maximize their motivation in the context of their learning projects.
Pre-Learning: Motivating Initial Involvement To motivate learners to become involved in a specific learning project, the adult educator must persuade them that: (a) the learning tasks are within their ability to perform, given reasonable effort (increased E-P expectancy); (b) successful performance of these tasks will be rewarded, both in the learning situation and in practical application, (increased P-R expectancy); and (c) the reward will satisfy their needs (increased R-N expectancy). The individual’s perception of the situation – not the objective reality – influences motivation at this stage, since the person has no direct experience to go on. This means that the learners will formulate expectancies based on past experience in similar learning programs, and particularly on what others have said about the learning program in question. Marketing can be a key factor in maximizing motivation. Brochures should clearly describe learning objectives and demonstrate how they translate into improved performance (E-P). They should clearly state the minimum experience and the KSA levels for which a program is designed (E-P). In the case of industry-based programs, they should describe organizational sanctions or incentives in support of programs (P-R); supervisors should be made aware of programs so that they can encourage appropriate employees to attend (E-P and P-R). Word-of-mouth marketing from employees currently or previously involved in similar programs can also be motivating (E-P, P-R, and R-N). This assumes that program objectives are, in fact, based on assessed needs related to typical tasks, and that the learning program is designed in such a way that successful performance in the learning program is analogous and transferable to practical performance. It also suggests that in-house trainers should actively work with management to build support for job-related education, as well as for specific learning programs. 
One way of maximizing job relevance, organizational support, and effective word-of-mouth marketing would be to involve representatives of targeted groups in the design of programs. Another option would be to include a representative cross-section of staff on an advisory committee.
Learning: Motivating Continued Involvement During this stage learner motivation is much more fluid and may be influenced by actual experience in the learning situation. Learners’ initial motivation may decrease if their experience in the learning situation leads them to believe: (a) that they cannot perform the learning tasks (E-P), (b) that learning task performance will not translate to performance on the job (E-P), (c) that performance will not be rewarded in either the learning situation or in the practical settings (P-R), or (d) that the rewards will not satisfy their needs (R-N). If motivation drops significantly, they may become uninvolved or
may drop out altogether. The learner’s perception is still the only thing that counts. However, now we are dealing with the learner’s perception of his or her own actual learning experience. Adult educators should attempt to build success into learning designs. The curriculum should build on skills that learners already possess. Learning should be in steps that are challenging, yet achievable, and the tasks related to the practical setting. A variety of opportunities for performing should be offered, allowing for different learning styles and incorporating both intrinsic and extrinsic rewards. Successful performance builds learner motivation directly, through experience in the learning situation, and indirectly, by building self-esteem. The adult educator should attempt early in the program to engage the individual learners in explicit goal setting, focusing on clarifying expectancies regarding the learning situation. Specifically, the focus should be on whether the program will accomplish the learner’s goals (R-N) and whether the learner can – with reasonable effort – achieve the learning objectives (E-P). Learning contracts are ideal for use in such goal-setting activities. Learning contracts, though strongly validated by field practice, have been criticized for their lack of a theoretical base (Polczynski & Shirland, 1976). Expectancy theory would appear to provide a strong theoretic basis for contract learning. The adult educator should be alert at this stage for adaptations necessary to bring the learning activities in line with overall learner expectancies. On the other hand, adult educators should not devote excessive time to unrelated warm-up exercises, ice breakers, and strategies aimed at making learners feel good about themselves, since the model presented here does not support the assumption that such strategies will improve motivation to learn. Rather, it suggests that learner practice be encouraged as early in the program as possible. 
Learner practice should be followed with immediate, constructive feedback from the adult educator and other learners. Clear ground rules for feedback, set early in the program, allow for reinforcement of learner expectancies that effort will in fact result in successful performance on learning tasks. Such feedback can also help shape performance in the learning situation into performance that can be more easily transferred to real-life situations which are more likely to reward learners in ways that satisfy their needs. Effective problem-solving methods, imparted early in the program, can help maintain learner motivation by providing the learner with the tools to improve performance, which, in turn, will both directly and indirectly raise expectancy levels.
Post-Learning: Motivating Application of Learning In the Post-Learning stage, the learners must be motivated to apply the skills learned. Learners’ motivation may decrease if they develop the perception
that: (a) learning task performance will not translate to actual performance (E-P), (b) actual performance will not be rewarded (P-R), or (c) the rewards will not satisfy their needs (R-N). These issues need to be addressed toward the end of the program. Learners’ problem-solving strategies should also be refined and action and contingency plans developed for implementing their new skills. Conscious planning is helpful in maintaining motivation. The problem of maintaining motivation can also be dealt with by breaking up the program into a series of sessions, interspersed with opportunities for practical application. This gives the learner the opportunity to “phase in” actual performance in small, achievable steps, thus undergirding self-esteem and building an objective, experiential foundation on which to base expectancies. Finally, the adult educator should encourage learners to form support groups during the Post-Learning period. In job-related training, the adult educator should attempt to educate learners’ supervisors to the need to reinforce successful performance during this period through constructive feedback and by suggesting opportunities to use the new skills.
Testing the Comprehensive Expectancy Motivation Model The Comprehensive Expectancy Motivation Model presented in this paper suggests a number of hypotheses regarding motivation that can be tested empirically:
1. The learner’s expectancies would change and become more accurate and consistent with continuing experience in any situation.
2. The expectancies of individuals with prior experience in similar situations would be more accurate and consistent than those of others who had not.
3. Successful performance in a learning situation would increase the learner’s E-P expectancy; failure to perform would decrease it.
4. Consistent reward in a learning situation would increase the learner’s P-R expectancy; lack of rewards or inconsistent rewards would decrease it.
5. Lack of fit between rewards and the learner’s perceived needs would decrease the learner’s R-N expectancy.
6. Successful performance would have a greater impact on learner satisfaction than learner satisfaction would have on performance.
7. Expectancy and ability combined would be a better predictor of successful performance than either would separately. At the start of any learning situation, expectancy motivation would predict effort, while ability would be more strongly correlated with performance. However, with increased experience in any given situation the correlation between expectancy motivation and performance would become stronger.
8. Learner practice should be encouraged as early in the program as possible since the model does not support the assumption that warm-up exercises, ice breakers, and other strategies aimed at making the learners feel good about themselves improve motivation to learn.
The Comprehensive Expectancy Motivation Model provides a framework that encompasses and explains the dynamic relationships among most of the commonly observed adult learning principles. Adult educators have long observed that adults are more motivated to learn when involved in setting their own learning goals, when given opportunities for relevant practice, when the “payoff” of learning is immediate, and so forth. This paper has described how these principles can be integrated into a single, predictive model that can be tested empirically.
References
Arvey, R., & Dunnette, M. (1980). Task performance as a function of perceived effort-performance and performance-reward contingencies (Technical Report No. 4003). Washington, DC: Office of Naval Research.
Borgstrom, L. (1980). Drop-out in municipal adult schools in the context of allocation policy. In R. Hoghielm and K. Rubenson (Eds.), Adult education for social change (pp. 105–130). Stockholm: Stockholm Institute of Education.
Butler, Jr., J., & Womer, N. (1985). Hierarchical vs. non-nested tests for contrasting expectancy-valence models: Some effects of cognitive characteristics. Multivariate Behavioral Research, 20, 335–352.
Constantinople, A. (1967). Perceived instrumentality of the college as a measure of attitudes toward college. Journal of Personality and Social Psychology, 5(2), 196–201.
Darkenwald, G. (1981). Retaining adult students. Columbus, OH: National Center for Research in Vocational Education.
Darkenwald, G. (1987). Dropout as a function of discrepancies between expectations and actual experiences of the classroom social environment. Adult Education Quarterly, 37, 152–163.
Feldman, J. (1974). Note on the utility of certain weights in expectancy theory. Journal of Applied Psychology, 59(6), 727–730.
Gilmore, R. (1974). Expectancy beliefs, ability, and personality in predicting academic performance. Journal of Educational Research, 156(4), 28–37.
Graen, G. (1969). Instrumentality theory of work motivation: Some experimental results and suggested modification. Journal of Applied Psychology, 53, 2.
Hackman, J., & Porter, L. (1968). Expectancy theory predictions of work effectiveness. Organizational Behavior and Human Performance, 3, 417–426.
Heneman, H., & Schwab, D. (1972). Evaluation of research on expectancy theory predictions of employee performance. Psychological Bulletin, 78(1), 1–9.
Henson, R. (1976). Expectancy beliefs, ability, and personality in predicting academic performance. Journal of Educational Research, 70, 41–44.
House, R., Shapiro, H., & Wahba, A. (1974). Expectancy theory as a predictor of work behavior and attitude: A re-evaluation of empirical evidence. Decision Sciences, 5, 481–506.
Lawler, E. (1968). A correlation-causal analysis of the relationship between expectancy attitudes and job performance. Journal of Applied Psychology, 52, 462–468.
Lawler, E. (1973). Motivation in work organizations. Monterey, CA: Brooks-Cole.
Lewin, K. (1938). The conceptual representation and the measurement of psychological forces. Durham, NC: Duke University Press.
Malloch, D., & Michael, W. (1981). Predicting student grade point average at a community college from scholastic aptitude tests and from measures representing three constructs in Vroom’s expectancy theory model of motivation. Educational and Psychological Measurement, 41, 1127–1135.
Maslow, A. (1943). A theory of human motivation. Psychological Review, 50, 370–396.
McMillan, J. (1980). Social psychology and learning. In J. H. McMillan (Ed.), The social psychology of school learning. New York: Academic Press.
Miskel, C., DeFrain, J., & Wilcox, K. (1980). A test of expectancy work motivation in educational organizations. Educational Administration Quarterly, 16(1), 70–92.
Mitchell, T., & Knudson, B. (1971). Instrumentality theory predictions of students attitudes towards business and their choice of business as an occupation. Journal of Applied Psychology, 57, 61–67.
Mitchell, T., & Nebeker, D. (1973). Expectancy theory predictions of academic effort and performance. Journal of Applied Psychology, 57, 61–67.
Moore, R., & Davies, J. (1984). Predicting GED scores on the bases of expectancy, valence, intelligence, and pretest skill levels with the disadvantages. Educational and Psychological Measurement, 44, 483–490.
Polczynski, J., & Shirland, L. (1976). Expectancy theory and contract grading combined as an effective motivational force for college students. Journal of Educational Research, 70, 238–241.
Pritchard, R., & DeLeo, R. (1973). Experimental test of the valence-instrumentality relationship in job performance. Journal of Applied Psychology, 57, 264–270.
Pritchard, R., & Sanders, M. (1973). The influence of valence, instrumentality, and expectancy on effort and performance. Journal of Applied Psychology, 57, 55–60.
Reykowski, J. (1965). Motivation as a component of the regulatory system of behavior. In M. Jones (Ed.), Human Motivation (pp. 71–85). Lincoln, NE: University of Nebraska Press.
Rotter, J. (1954). Social learning and clinical psychology. Englewood Cliffs, NJ: Prentice-Hall.
Rotter, J. (1971). Clinical psychology. Englewood Cliffs, NJ: Prentice-Hall.
Rubenson, K. (1976). Recruitment in adult education: A research strategy. Stockholm: Stockholm Institute of Education.
Rubenson, K., & Hoghielm, R. (1978). The teaching process and study dropouts in adult education. Stockholm: Stockholm Institute of Education.
Schmitt, N. (1975). A causal-correlational analysis of expectancy theory hypotheses. Psychological Reports, 37, 427–431.
Tolman, E. (1932). Purposeful behavior in animals and men. New York: Appleton-Century-Crofts.
Vroom, V. (1964a). Work and motivation. New York: John Wiley.
Vroom, V. (1964b). Some psychological aspects of organizational control. In W. W. Cooper (Ed.), New perspectives in organizational research (pp. 72–86). New York: John Wiley.
Wright, R. (1985). Motivating teacher involvement in professional growth activities. The Canadian Administrator, 24(5), 1–6.
51 The Academic Motivation Scale: A Measure of Intrinsic, Extrinsic, and Amotivation in Education Robert J. Vallerand, Luc G. Pelletier, Marc R. Blais, Nathalie M. Brière, Caroline Senécal and Evelyne F. Vallières
One of the most important psychological concepts in education is certainly that of motivation. Indeed, much research has shown that motivation is related to various outcomes such as curiosity, persistence, learning, and performance (for a review of the literature see Deci and Ryan, 1985). In light of the importance of these consequences for education, one can easily understand researchers’ interest in motivation in educational settings. Several conceptual perspectives have been proposed in order to better understand academic motivation (see The Educational Psychologist, 1991, Issue 4, for a complete number devoted to academic motivation). One useful perspective posits that behavior can be intrinsically motivated, extrinsically motivated, or amotivated (Deci and Ryan, 1985, 1991). This theoretical approach has generated a considerable amount of research and appears rather pertinent for the field of education (see Deci and Ryan, 1985; Deci, Vallerand, Pelletier, and Ryan, 1991). This approach is detailed below.
Source: Educational and Psychological Measurement, 52 (1992): 1003–1017.
Intrinsic Motivation In general, intrinsic motivation (IM) refers to the fact of doing an activity for itself, and the pleasure and satisfaction derived from participation (Deci,
1975; Deci and Ryan, 1985). An example of IM is the student that goes to class because he or she finds it interesting and satisfying to learn more about certain subjects. Deci and Ryan posit that IM stems from the innate psychological needs of competence and self-determination. Thus, activities that allow individuals to experience such feelings will be engaged in again freely out of IM. While most researchers posit the presence of a global IM construct, certain theorists (Deci, 1975) have proposed that IM might be differentiated into more specific motives. Unfortunately, these authors have not indicated which types of IM follow from the more general IM construct. More recently, a tripartite taxonomy of intrinsic motivation has been postulated ( Vallerand, Blais, Brière, and Pelletier, 1989). This taxonomy is based on the IM literature which reveals the presence of three types of IM that have been researched on an independent basis. These three types of IM can be identified as IM to know, to accomplish things, and to experience stimulation. These types of IM are described more fully below. Intrinsic motivation to know (IM-to know). This type of IM has a vast tradition in educational research. It relates to several constructs such as exploration, curiosity, learning goals, intrinsic intellectuality, and finally the IM to learn (e.g., Gottfried, 1985; Harter, 1981). To the above perspectives which are more specific to the realm of education, may be added others that are more global such as that of the epistemic need to know and understand, and that of the search for meaning (see Vallerand et al., 1989). Thus, IM-to know can be defined as the fact of performing an activity for the pleasure and the satisfaction that one experiences while learning, exploring, or trying to understand something new. For instance, students are intrinsically motivated to know when they read a book for the sheer pleasure that they experience while learning something new. 
Intrinsic motivation toward accomplishments (IM-to accomplish things). This second type of IM has been studied in developmental psychology as well as in educational research under concepts such as mastery motivation (Harter, 1981). In addition, other authors have postulated that individuals interact with the environment in order to feel competent, and to create unique accomplishments (Deci, 1975; Deci and Ryan, 1985, 1991). Finally, to the extent that individuals focus on the process of achieving rather than on the outcome, achievement motivation can be seen as being subsumed under the umbrella of IM-to accomplish things. Thus, IM-to accomplish things can be defined as the fact of engaging in an activity for the pleasure and satisfaction experienced when one attempts to accomplish or create something. Students who extend their work beyond the requirements of a term paper in order to experience pleasure and satisfaction while attempting to surpass themselves display IM toward accomplishments. Intrinsic motivation to experience stimulation (IM-to experience stimulation). Finally, IM-to experience stimulation is operative when someone engages in
an activity in order to experience stimulating sensations (e.g., sensory pleasure, aesthetic experiences, as well as fun and excitement) derived from one’s engagement in the activity. Research on the dynamic and holistic sensation of flow, on feelings of excitement in IM, on aesthetic stimulating experiences, and peak experiences is representative of this form of IM (e.g., Csikszentmihalyi, 1975). Students who go to class in order to experience the excitement of a stimulating class discussion, or who read a book for the intense feelings of cognitive pleasure derived from passionate and exciting passages represent examples of individuals who are intrinsically motivated to experience stimulation in education.
Extrinsic Motivation Contrary to IM, extrinsic motivation (EM) pertains to a wide variety of behaviors which are engaged in as a means to an end and not for their own sake (Deci, 1975). Recently, Deci, Ryan and their colleagues (Deci and Ryan, 1985, 1991) have proposed that three types of EM can be ordered along a self-determination continuum. From lower to higher levels of self-determination, they are: external regulation, introjection, and identification¹. External regulation corresponds to EM as it generally appears in the literature. That is, behavior is regulated through external means such as rewards and constraints. For instance, a student might say: “I study the night before exams because my parents force me to.” With introjected regulation, the individual begins to internalize the reasons for his or her actions. However, this form of internalization, while internal to the person, is not truly self-determined since it is limited to the internalization of past external contingencies. Thus, the individual might say: “I study the night before exams because that’s what good students are supposed to do.” To the extent that the behavior becomes valued and judged important for the individual, and especially that it is perceived as chosen by oneself, then the internalization of extrinsic motives becomes regulated through identification. The individual might say, for instance: “I’ve chosen to study tonight because it is something important for me.”
Amotivation In addition to intrinsic and extrinsic motivation, Deci and Ryan (1985) have recently posited that a third type of motivational construct is important to consider in order to fully understand human behavior. This concept is termed amotivation. Individuals are amotivated when they do not perceive contingencies between outcomes and their own actions. They are neither intrinsically nor extrinsically motivated. When amotivated, individuals experience feelings of incompetence and expectancies of uncontrollability. They perceive their
behaviors as caused by forces out of their own control. They feel undeceived, and start asking themselves why in the world they go to school. Eventually they may stop participating in academic activities. Although scales assessing motivation toward education do exist, no scale currently allows assessment of all the constructs discussed above. Harter’s (1981) Intrinsic vs Extrinsic Orientation Scale pits IM against EM on the same continuum and thus prevents an independent assessment of these two constructs. In addition, it does not measure the different types of EM and amotivation. Gottfried’s (1985) Children’s Academic Intrinsic Motivation Inventory assesses only intrinsic interest toward learning in various subjects (e.g., reading, social sciences) as well as toward school in general. Thus, it does not measure the different types of IM, EM, or amotivation. Furthermore, while Ryan and Connell (1989) have recently developed a scale that does assess IM, identification, introjection, and external regulation, the psychometric properties of this scale have not been fully presented. In addition, this scale does not include the different types of IM or amotivation. Finally, it should be noted that all of the above scales are aimed at elementary and beginning high-school students. No existing scale seems to assess motivation toward post-secondary studies within the present theoretical framework. In light of the importance of conducting research on academic motivation with an instrument based on a valid theoretical conceptualization, and the fact that no scale to date seems to assess IM, EM, and amotivation toward post-secondary studies, Vallerand et al. (1989) developed and validated in French the Echelle de Motivation en Education (EME). This scale is made up of seven subscales of four items each assessing the three types of IM (IM to know, to accomplish things, and to experience stimulation), three types of EM (external, introjected, and identified regulation), and amotivation.
In the EME, motivation is operationalized as the underlying “why” of behavior (Deci and Ryan, 1985), and the scale focuses on the perceived reasons for engaging in the activity. Thus, the scale asks the question “Why do you go to college?” and items represent possible answers to that question, thus reflecting the different types of motivation. Here are some sample items from the scale: Amotivation subscale, “Honestly I don’t know; I really feel that I’m wasting my time in college”; External Regulation, “In order to get a more prestigious job later on”; Introjected Regulation, “To prove to myself that I can do better than just a high-school degree”; Identified Regulation, “Because eventually it will allow me to enter the job market in a field that I like”; IM-to know, “Because I experience pleasure and satisfaction while learning new things”; IM-Accomplishment, “For the pleasure I experience while surpassing myself in my studies”; IM-Stimulation, “For the high feeling that I experience while reading on various interesting subjects.” Preliminary (Daoust, Vallerand, and Blais, 1988; Vallerand and Bissonnette, in press) and validation studies (Vallerand et al., 1989), which involved
more than 3,000 students, revealed that the EME has satisfactory internal consistency levels (a mean alpha score of .80), as well as high indices of temporal stability (a mean test-retest correlation of .75) over a one-month period. Results of a confirmatory factor analysis (with LISREL) also confirmed the seven-factor structure of the EME. Finally, the construct validity of the scale was supported by a series of correlational analyses among the seven subscales, as well as between these scales and other psychological constructs relevant to education, such as interest toward school, time spent in academic activities, being distracted in class, academic satisfaction, positive emotions in the classroom, and nihilism toward education. These findings replicated the results reported earlier on the role of the different IM, EM, and amotivation in various educational outcomes. In addition, earlier versions as well as the current version of the EME were able to predict dropout behavior in high school and junior college (see Vallerand et al., 1989). The French version of the EME therefore appears to represent a reliable and valid measure of IM, EM, and amotivation in education. Because the EME was initially validated in French, it was thus not available to researchers conducting research with English-speaking students. In light of the psychometric qualities of the EME, the findings it has yielded, and the importance of assessing motivation from a sound theoretical perspective, it was decided to cross-culturally validate the EME in English. To validate a scale into another language involves much more than translation (Brislin, 1986; Vallerand, 1989). In addition to appropriate translation, one must conduct research in order to show that this new version of the scale shares the same psychometric properties as the original scale. Thus, the overall purpose of the present study was to translate the scale in English and to conduct initial assessment of its psychometric properties.
The Current Investigation
Purpose
A four-fold purpose guided this investigation: (a) to translate the EME in English using appropriate cross-cultural procedures, (b) to replicate the seven-factor structure of the AMS through confirmatory factor analysis (with LISREL), (c) to assess the reliability (internal consistency and temporal stability) of the seven subscales, and (d) to assess whether the results from the Vallerand et al. (1989) study, which revealed that females reported higher levels of IM to know, IM to experience stimulation, identification, and introjection, but lower levels of amotivation than males, would be replicated with a population of English-speaking students.
Method
Translation of the EME in English
In line with recent approaches to cross-cultural scale translation (Brislin, 1986; Vallerand, 1989), three steps were taken. First, the scale was translated from French to English. This was done with the parallel back-translation procedure (Brislin, 1986). Back translation first involves translating the scale from the original to the target language by a bilingual individual. This translation is then translated back to the original language by another bilingual individual without the use of the original scale. To the extent that the original scale is appropriately retranslated, this method provides an initial assessment of the adequacy of the translated version of the scale. The parallel back-translation procedure necessitates the use of two independent back-translation sequences. This approach is preferred to the single back-translation method because it prevents the occurrence of certain biases that could result from the two specific bilingual individuals used in the back translation. In this study, four bilingual individuals (two social psychologists and two graduate students in social psychology) well cognizant of Deci and Ryan’s (1985) motivation theory conducted the parallel back-translation procedure. This led to two preliminary English versions of the AMS that were evaluated in the next phase.
In the second phase, the items produced by the two back-translations were thoroughly assessed by a committee. The committee was formed of the individuals who participated in the back-translation procedures and the authors of the original version of the scale (the EME). The committee selected the items that had been retranslated appropriately, that is, those that had retained the original meaning and had been conveyed in acceptable English. Once the 28 English items were selected, the committee prepared the scale format and instructions so that they would be identical to the ones used with the original French-Canadian version.
Thus, the experimental version of the English AMS lists 28 items that may represent reasons why students go to college. These reasons are scored on a 7-point scale anchored by the end points “Not at all” (1) and “Exactly” (7), with a midpoint at 4 (“Moderately”). Third and finally, a pretest was conducted with 10 junior-college students in order to determine whether the AMS was clear and formulated in a language to which post-secondary students can relate (Vallerand, 1989). Students were asked to read the AMS and to verbalize any questions they might have about the items or instructions. This led to some minor modifications to the instructions.
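The scoring scheme just described (28 items, seven subscales of four items, each item rated from 1 to 7) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the text does not give the item-to-subscale key, so `ITEM_KEY` below is hypothetical (items 1 to 4 assigned to the first subscale, and so on) and must be replaced with the published key. Summing the four items of a subscale is also an assumption; it gives a possible range of 4 to 28, which appears consistent with the subscale means reported in Table 3.

```python
SUBSCALES = [
    "amotivation",
    "external_regulation",
    "introjected_regulation",
    "identified_regulation",
    "im_to_know",
    "im_accomplishment",
    "im_stimulation",
]

# HYPOTHETICAL key: consecutive blocks of four items per subscale.
# The real questionnaire key is not given in the text; substitute it here.
ITEM_KEY = {
    name: list(range(4 * i + 1, 4 * i + 5)) for i, name in enumerate(SUBSCALES)
}


def score_ams(responses):
    """Return one summed score per subscale.

    responses: dict mapping item number (1..28) to a rating (1..7).
    """
    if sorted(responses) != list(range(1, 29)):
        raise ValueError("expected ratings for items 1 through 28")
    for item, rating in responses.items():
        if rating not in range(1, 8):
            raise ValueError(f"item {item}: rating must be an integer from 1 to 7")
    return {
        name: sum(responses[item] for item in items)
        for name, items in ITEM_KEY.items()
    }
```

For example, a respondent who answers “Moderately” (4) to every item would receive a score of 16 on each of the seven subscales.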
Procedures
The AMS was completed by 745 university students from the province of Ontario. This sample was composed of 484 females and 261 males with a
mean age of 21.0 years. In order to assess the temporal stability of the AMS, a second sample of 57 university students (27 males and 30 females) with a mean age of 19.3 years also completed the AMS twice over a one-month period. Students were informed that we were interested in better understanding the reasons why they go to the university. To this end, we asked students to complete the AMS. Students were told that they did not have to complete the questionnaire but that their collaboration would be very much appreciated. Subjects completed the AMS in class at the beginning of the period.
Statistical Analyses
The statistical analyses consisted of a confirmatory factor analysis (with LISREL), internal consistency estimates (Cronbach alphas) and test-retest correlations for the seven subscales, and an analysis of variance on the subscale means in order to test for sex differences.
Results and Discussion
Confirmatory Factor Analysis
The data were subjected to a confirmatory factor analysis with LISREL VI (Jöreskog and Sörbom, 1984). This analysis tests the extent to which the theoretical model, in this case the seven-factor model corresponding to the seven subscales, adequately represents the covariance matrix of the data. The fit of the model was assessed through several indices, namely a chi-square statistic and three of the most widely used fit indices: the Goodness of Fit Index (GFI), the Adjusted Goodness of Fit Index (AGFI), and the Normed Fit Index (NFI). These three indices vary from 0 to 1, where 1 indicates a perfect fit for the model.
In the initial model, seven factors were postulated. Each factor corresponded to one of the seven subscales and was made up of its four corresponding items. No cross-loadings were postulated. Although the confirmatory factor analysis of the initial measurement model yielded fit values of .89 for the NFI, .87 for the AGFI, and .89 for the GFI, the model did not reach statistical nonsignificance (χ² = 1228.27, df = 329, p < .001). Correlations between pairs of measured-variable residuals were added to the model on the basis of the inspection of the modification indices. This resulted in 26 correlated residuals added to the model. With these additions the fit indices for the final measurement model showed that the model fits the data reasonably well, NFI = .93, AGFI = .91, GFI = .94, although the model did not reach statistical nonsignificance (χ² = 748.64, df = 303, p ≤ .001).
This improvement in fit was highly significant, difference in χ² = 479.63, df = 26, p < .001. In order to assess whether the inclusion of these theta delta values in the model could bias the interpretation of the model, the initial parameter estimates from the initial model were correlated with those from the final model. Results from the correlations involving the lambda x parameters yielded a .99 correlation value, while those including the lambda x and phi parameters indicated a .98 correlation value. These results underscore the fact that including the additional parameters in the model did not bias interpretation of the model. In sum, results from the confirmatory factor analysis replicated the findings obtained with the original French-Canadian version (the EME), and confirmed the seven-factor structure of the AMS. Loadings from the final model, which were all significant, are presented in Table 1.
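The degrees of freedom and the chi-square difference reported for the two measurement models can be checked with simple counting. The sketch below is a reader's check, not the authors' procedure. It assumes a standard parameterization that the text does not spell out: every item loads freely on one factor, factor variances are fixed for identification, the seven factors are freely intercorrelated, and each item has one residual variance. The function name `cfa_df` is introduced here for illustration only.

```python
def cfa_df(n_items, n_factors, n_correlated_residuals=0):
    """Degrees of freedom of a simple-structure CFA (assumed parameterization)."""
    # Unique variances and covariances among the observed items.
    moments = n_items * (n_items + 1) // 2
    loadings = n_items                      # one free loading per item
    residual_variances = n_items            # one residual variance per item
    factor_covariances = n_factors * (n_factors - 1) // 2
    free_parameters = (
        loadings + residual_variances + factor_covariances + n_correlated_residuals
    )
    return moments - free_parameters


df_initial = cfa_df(28, 7)                            # 406 - 77
df_final = cfa_df(28, 7, n_correlated_residuals=26)   # 406 - 103

# Chi-square values as reported in the text.
chi2_initial = 1228.27
chi2_final = 748.64
chi2_difference = round(chi2_initial - chi2_final, 2)
df_difference = df_initial - df_final

print(df_initial, df_final, chi2_difference, df_difference)
```

Under these assumptions the counts reproduce the reported values: 329 and 303 degrees of freedom, and a chi-square difference of 479.63 on 26 degrees of freedom.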
Reliability
The internal consistency of the subscales was assessed with the use of the Cronbach alpha. Values appear in the first column of Table 2. It can be seen that values varied from .83 to .86, except for the Identification subscale, which had an alpha value of .62. These findings are remarkably similar to those obtained with the original version of the scale (EME), where values varied from .76 to .86, except for the Identification subscale, which had a value of .62. Overall, considering the fact that these subscales are made up of only 4 items, they appear to display adequate levels of internal consistency equivalent to those obtained with the original scale.
In order to assess the temporal stability of the AMS, a second sample of 57 university students completed the AMS twice over a one-month period. Results from the test-retest correlations appear in the last column of Table 2. It can be seen that the correlations are fairly high, ranging from .71 to .83, with a mean test-retest correlation of .79. These results are once again very similar to those obtained with the French-Canadian version (the EME), and support the temporal stability of the English version of the scale. In addition, the alpha values for the pretest and posttest appear in Table 2. It can be seen that these values are quite acceptable, varying from .72 to .91 at the pretest, and from .78 to .90 at the posttest. The alpha values for the Identification subscale were .72 and .78 at the pretest and posttest, respectively, thereby further supporting the reliability of that subscale. In sum, these results provide support for the internal consistency and the temporal stability of the AMS.
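For readers who want to reproduce this kind of reliability analysis on their own data, the two statistics used here can be sketched with the Python standard library. The function names `cronbach_alpha` and `pearson_r` are introduced for illustration; the only figures taken from the article are the seven test-retest correlations listed in Table 2, whose mean is recomputed at the end.

```python
from statistics import mean, variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for one subscale.

    item_scores: list of respondents, each a list of k item ratings.
    """
    k = len(item_scores[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of the summed subscale score.
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def pearson_r(x, y):
    """Pearson correlation, e.g. between pretest and posttest subscale scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Test-retest correlations of the seven subscales, as listed in Table 2.
TEST_RETEST = [0.83, 0.83, 0.73, 0.71, 0.79, 0.83, 0.80]
mean_test_retest = round(mean(TEST_RETEST), 2)
print(mean_test_retest)
```

Applied to the Table 2 values, the last line recovers the mean test-retest correlation of .79 reported in the text.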
Analyses of Variance on the Subscale Means
Means of the seven subscales as a function of sex appear in Table 3. A sex × scale repeated-measures analysis of variance, with repeated measures on the scale factor, revealed the presence of main effects for sex, F(1, 743) = 21.10, p < .001, and scale, F(6, 738) = 1035.18, p < .001. The latter main effect revealed that all subscales differed from each other except for two pairs: the Introjection and IM to Accomplish subscales, and the Identification and External Regulation subscales. The most important forms of motivation for the students in this sample were, in decreasing order: identification, external regulation, IM to know, introjection, IM toward accomplishments, IM to experience stimulation, and amotivation. However, these main effects must be interpreted in light of the significant sex × scale interaction, F(6, 738) = 3.87, p < .001. Results from the simple main effects revealed that female students scored higher than males on the 3 IM subscales (knowledge, accomplishment, and stimulation), as well as on the Identification and Introjection subscales. However, no sex differences were found on the other subscales (all Fs > 4.03, ps > .05).

Table 1: Standardized loadings from the confirmatory factor analysis (LISREL)

Factor                                  Item 1   Item 2   Item 3   Item 4
Amotivation                             1.059    0.750    1.025    0.940
External regulation                     1.143    1.024    1.139    1.262
Introjected regulation                  1.384    1.321    1.398    1.225
Identified regulation                   0.582    0.808    0.749    0.783
Intrinsic motivation, knowledge         0.953    0.918    1.223    1.226
Intrinsic motivation, accomplishment    1.198    1.174    1.261    1.292
Intrinsic motivation, stimulation       0.878    1.424    1.449    1.445

Table 2: Internal consistency values (Cronbach alpha) and test-retest correlations of the AMS 7 subscales: Samples 1 and 2

                          Alpha       Alpha pretest   Alpha posttest   Test-retest
                          sample 1    sample 2        sample 2         correlations sample 2
Subscale                  (n = 745)   (n = 57)        (n = 57)         (n = 57)
Amotivation               .85         .91             .88              .83
External Regulation       .83         .85             .89              .83
Introjected Regulation    .84         .76             .83              .73
Identified Regulation     .62         .72             .78              .71
IM-to Know                .84         .85             .90              .79
IM-Accomplishment         .85         .90             .87              .83
IM-Stimulation            .86         .88             .84              .80

Table 3: Means (and standard deviations) for males and females on the AMS: Sample 1

Subscales                                   Males (n = 261)   Females (n = 484)
Amotivation                                 6.74 (3.96)       6.51 (4.14)
External Regulation                         21.78 (4.79)      21.80 (5.27)
Introjected Regulation*                     16.0 (5.82)       17.80 (5.81)
Identified Regulation*                      21.60 (3.57)      22.19 (3.98)
Intrinsic Motivation – Knowledge*           18.89 (4.22)      20.46 (4.74)
Intrinsic Motivation – Accomplishment*      15.93 (5.03)      17.52 (5.39)
Intrinsic Motivation – Stimulation*         12.21 (5.33)      13.83 (5.75)
* Females scored significantly higher (p < .01) than males.
General Discussion
The purpose of the present study was to cross-culturally validate the English version of the EME. Results revealed that the AMS has adequate levels of reliability and factorial validity, very much in line with those of the original French-Canadian version. With respect to the reliability of the scale, results from this study revealed that the internal consistency of all subscales was adequate, typically ranging in the .80s, with the exception of
the Identification subscale, which yielded values of .62 in the large sample, and .72 and .78 with the second sample used to assess the temporal stability of the scale. Finally, it should be reiterated that all AMS subscales displayed acceptable levels of temporal stability, with a mean test-retest correlation value of .79 over a one-month period. These last results support the contention that the AMS measures students’ rather stable motivational orientations toward education.
With respect to the validity of the AMS, the present results are also very encouraging on at least three counts. First, results from the confirmatory factor analysis confirmed the seven-factor structure of the AMS and thus provided some support for the factorial validity of the scale. Second, results from the confirmatory factor analysis and the pattern of means of the IM subscales yielded preliminary support for the discriminant validity of the three IM subscales. Finally, gender differences on the various subscale means generally reproduced findings from the original study (Vallerand et al., 1989). The only difference between these two studies is that in the Vallerand et al. (1989) study females were also less amotivated than males and there were no sex differences on the IM Accomplishment subscale (although the means were in the predicted direction). These differences between the results of the Vallerand et al. study and this study could be due to several factors, including distinctions between the French- and English-Canadian cultures, differences between the motivation of university students (this study) and junior-college students (the Vallerand et al., 1989 study), as well as specificities (e.g., age, socio-economic background) of the samples used in the present and Vallerand et al. (1989) studies. Future research is needed in order to more fully understand these sex differences.
However, one thing seems rather clear: In line with past research in education (e.g., Daoust et al., 1988; Vallerand and Bissonnette, in press; Vallerand et al., 1989), it appears that female students display a more self-determined motivational profile than male students.
Overall, the findings from the present series of studies replicated the results obtained with the French-Canadian version (EME). It now appears that preliminary support exists for the reliability and some elements of validity of the AMS. Although these findings are indeed very encouraging, they must nevertheless be regarded as only preliminary in nature. A complete assessment of the psychometric properties of the scale will necessitate additional research. In that perspective, recent research of ours (Vallerand, Pelletier, Blais, Brière, Senécal, and Vallières, in press) has shown that the AMS has elements of concurrent and construct validity. Specifically, it was found that the scale was correlated as hypothesized with other motivational scales such as that of Gottfried (1985). In addition, the AMS correlated as predicted from cognitive evaluation theory (Deci and Ryan, 1985) with motivational antecedents and consequences. Future research in that direction would therefore appear fruitful.
In addition, it seems appropriate to reiterate that the operational definition of the AMS directly reflects the conceptual definition of intrinsic/extrinsic motivation, which refers to one’s perceived reasons for engaging in a given activity (the “why” of behavior), be they for the activity itself or for reasons lying outside the activity. Such an equivalence between the conceptual and operational definitions of motivation should lead to more meaningful research. Furthermore, it should also be noted that, in contrast to unidimensional instruments (e.g., Gottfried, 1985), the AMS assesses several types of motivation in a multidimensional fashion. These types of motivation go beyond the usual IM/EM distinction and allow a finer analysis of the motivational forces in education, thereby opening the door to innovative research.
In sum, even though the AMS represents a recent scale whose evaluation should be pursued in future research, results from the present study provide support for the adequacy of its psychometric properties. Not only does the AMS represent an adequate cross-cultural adaptation of the original French-Canadian version (the EME), but it represents a reliable and valid scale in its own right. The psychometric properties of the AMS, as well as the flexibility allowed through its multidimensional structure, should make it a useful tool in motivation research in educational settings.
Note
1. Deci and Ryan (1985) also include integrated regulation as one type of extrinsic motivation. However, integrated regulation was not initially included in the Echelle de Motivation en Education (EME) and therefore is not assessed in the Academic Motivation Scale (AMS). Two major reasons supported this initial decision. First, pilot data revealed that integrated regulation did not come out as a perceived reason for participating in educational activities. Second, factor analyses on experimental forms of the EME revealed that integrated regulation did not distinguish itself from identified regulation. The above findings may have been due to a host of potential factors including the fact that young adults may be too young to have achieved a sense of integration with respect to school activities. Future research would appear necessary on this issue.
References
Brislin, R. W. (1986). The wording and translation of research instruments. In W. Lonner and J. Berry (Eds.), Field methods in cross-cultural research (pp. 137–164). Beverly Hills, CA: Sage.
Csikszentmihalyi, M. (1975). Beyond boredom and anxiety. San Francisco: Jossey-Bass.
Daoust, H., Vallerand, R. J., and Blais, M. R. (1988). Motivation and education: A look at some important consequences. Canadian Psychology, 29 (2a), 172. (abstract).
Deci, E. L. (1975). Intrinsic motivation. New York: Plenum Press.
Deci, E. L. and Ryan, R. M. (1985). Intrinsic motivation and self-determination in human behavior. New York: Plenum Press.
Deci, E. L. and Ryan, R. M. (1991). A motivational approach to self: Integration in personality. In R. Dienstbier (Ed.), Nebraska Symposium on Motivation: Vol. 38. Perspectives on motivation (pp. 237–288). Lincoln, NE: University of Nebraska Press.
Deci, E. L., Vallerand, R. J., Pelletier, L. G., and Ryan, R. M. (1991). Motivation in education: The self-determination perspective. The Educational Psychologist, 26, 325–346.
Gottfried, A. E. (1985). Academic intrinsic motivation in elementary and junior high school students. Journal of Educational Psychology, 77, 631–645.
Harter, S. (1981). A new self-report scale on intrinsic versus extrinsic orientation in the classroom: Motivational and informational components. Developmental Psychology, 17, 300–312.
Jöreskog, K. G. and Sörbom, D. (1984). LISREL VI. Chicago, IL: National Educational Resources.
Ryan, R. M. and Connell, J. P. (1989). Perceived locus of causality and internalization: Examining reasons for acting in two domains. Journal of Personality and Social Psychology, 57, 450–461.
Vallerand, R. J. (1989). Vers une méthodologie de validation trans-culturelle de questionnaires psychologiques: Implications pour la recherche en langue française [Toward a cross-cultural validation methodology for psychological scales: Implications for research conducted in the French language]. Canadian Psychology, 30, 662–680.
Vallerand, R. J. and Bissonnette, R. (in press). Intrinsic, extrinsic, and amotivational styles as predictors of behavior: A prospective study. Journal of Personality.
Vallerand, R. J., Blais, M. R., Brière, N. M., and Pelletier, L. G. (1989). Construction et validation de l’Echelle de Motivation en Education (EME) [Construction and validation of the Echelle de Motivation en Education (EME)]. Canadian Journal of Behavioral Sciences, 21, 323–349.
Vallerand, R. J., Pelletier, L. G., Blais, M. R., Brière, N. M., Senécal, C., and Vallières, E. F. (in press). On the assessment of intrinsic, extrinsic, and amotivation in education: Evidence on the concurrent and construct validity of the Academic Motivation Scale. Educational and Psychological Measurement.
52
Extrinsic Rewards and Intrinsic Motivation in Education: Reconsidered Once Again
Edward L. Deci, Richard Koestner and Richard M. Ryan
Source: Review of Educational Research, 71(1) (2001): 1–27.

Gold stars, best-student awards, honor rolls, pizzas for reading, and other reward-focused incentive systems have long been part of the currency of schools. Typically intended to motivate or reinforce student learning, such techniques have been widely advocated by some educators, although, in recent years, a few commentators have questioned their widespread use. The controversy has been prompted in part by psychological research that has demonstrated negative effects of extrinsic rewards on students’ intrinsic motivation to learn. Some studies have suggested that, rather than always being positive motivators, rewards can at times undermine rather than enhance self-motivation, curiosity, interest, and persistence at learning tasks. Because of the widespread use of rewards in schools, a careful summary of reward effects on intrinsic motivation would seem to be of considerable importance for educators.
Accordingly, in the Fall 1994 issue of Review of Educational Research, Cameron and Pierce (1994) presented a meta-analysis of extrinsic reward effects on intrinsic motivation, concluding that, overall, rewards do not decrease intrinsic motivation. Implicitly acknowledging that intrinsic motivation is important for learning and adjustment in educational settings (see, e.g., Ryan & La Guardia, 1999), Cameron and Pierce nonetheless stated that “teachers have no reason to resist implementing incentive systems in the classroom” (p. 397). They also advocated abandoning Deci and Ryan’s (1980) cognitive
evaluation theory (CET), which had initially been formulated to explain both positive and negative reward effects on intrinsic motivation.
In the Spring 1996 issue of RER, three commentaries were published (Kohn, 1996; Lepper, Keavney, & Drake, 1996; Ryan & Deci, 1996) arguing that Cameron and Pierce’s meta-analysis was flawed and that its conclusions were unwarranted. In that same issue, Cameron and Pierce (1996) responded to the commentaries by claiming that, rather than reanalyzing the data, the authors of the three commentaries had suggested “that the findings are invalid due to intentional bias, deliberate misrepresentation, and inept analysis” (p. 39). Subtitling their response “Protests and Accusations Do Not Alter the Results,” Cameron and Pierce stated that any meaningful criticism of their article would have to include a reanalysis of the data. Subsequent to that interchange, Eisenberger and Cameron (1996) published an article in the American Psychologist summarizing the Cameron and Pierce (1994) meta-analysis and claiming that the so-called undermining of intrinsic motivation by extrinsic rewards, which they said had become accepted as reality, was in fact largely a myth.
We do not claim that there was “intentional bias” or “deliberate misrepresentation” in either the Cameron and Pierce (1994) meta-analysis or the Eisenberger and Cameron (1996) article, but we do believe, as Ryan and Deci argued in 1996, that Cameron and Pierce used some inappropriate procedures and made numerous errors in their meta-analysis. Therefore, because we believe the problems with their meta-analysis made their conclusions invalid, because we agree that a useful critique of their article must involve reanalysis of the data, and because the issue of reward effects on intrinsic motivation is extremely important for educators, we performed a new meta-analysis of reward effects on intrinsic motivation (Deci, Koestner, & Ryan, 1999).
Our meta-analysis included 128 experiments, organized so as to provide a test of CET, much as Cameron and Pierce had done. The new meta-analysis, which we summarize in this article, showed that, in fact, tangible rewards do significantly and substantially undermine intrinsic motivation. The meta-analysis provided strong support for CET and made clear that there is indeed reason for teachers to exercise great care when using reward-based incentive systems.
The new meta-analysis was published in Psychological Bulletin (Deci et al., 1999). Included in that article was an appendix table (here reproduced with permission as Table 1a) listing every study in the meta-analysis and explaining exactly where errors were made by Cameron and Pierce, how our meta-analysis corrected their errors, and what studies were included in ours that had been overlooked or omitted by them. The table allows interested readers to see for themselves exactly how it is that Cameron and Pierce’s meta-analysis and our meta-analysis arrived at such different conclusions.
In the seven years since the publication of Cameron and Pierce’s (1994) article, academics, school administrators, and classroom teachers from
many countries have spoken to us about the article, making it clear that the conclusions of the article had been widely disseminated and that the issue of reward effects is of considerable interest to educators around the world. Given the great importance of this issue for education, then, the current article is intended to set the record straight for the many readers of RER. In this article, we provide a brief description of CET, because it has guided much of the research in the field. This is followed by a summary of the methods and results of our meta-analysis and, finally, a discussion of the relevance of the results for education.
Cognitive Evaluation Theory
CET proposes that underlying intrinsic motivation are the innate psychological needs for competence and self-determination. According to the theory, the effects on intrinsic motivation of external events such as the offering of rewards, the delivery of evaluations, the setting of deadlines, and other motivational inputs are a function of how these events influence a person’s perceptions of competence and self-determination. Events that decrease perceived self-determination (i.e., that lead to a more external perceived locus of causality) will undermine intrinsic motivation, whereas those that increase perceived self-determination (i.e., that lead to a more internal perceived locus of causality) will enhance intrinsic motivation. Furthermore, events that increase perceived competence will enhance intrinsic motivation so long as they are accompanied by perceived self-determination (e.g., Ryan, 1982), and those that decrease perceived competence will diminish intrinsic motivation. Finally, rewards (and other external events) have two aspects. The informational aspect conveys self-determined competence and thus enhances intrinsic motivation. In contrast, the controlling aspect prompts an external perceived locus of causality (i.e., low perceived self-determination) and thus undermines intrinsic motivation.
As noted, CET applies not only to reward effects but to the effects of various other external factors such as evaluations (Smith, 1975), deadlines (Amabile, DeJong, & Lepper, 1976), competition (Deci, Betley, Kahle, Abrams, & Porac, 1981), and externally imposed goals (Mossholder, 1980), as well as to the general climate of classrooms, schools, and other interpersonal settings (e.g., Deci, Connell, & Ryan, 1989; Deci, Schwartz, Sheinman, & Ryan, 1981). In this article, however, we focus only on CET as an explanation for reward effects.
In making predictions about reward effects on intrinsic motivation, CET analyzes the type of reward and the type of reward contingency to determine whether the reward is likely to be experienced as informational or controlling. The theory acknowledges that in some cases both the informational and
controlling aspects will be somewhat salient, so, in those situations, additional factors are taken into account in making predictions. We begin our discussion of CET’s reward-effect predictions by distinguishing between verbal rewards and tangible rewards, considering verbal rewards first and then moving on to tangible rewards.
Verbal Rewards
Although we do not usually use the term verbal rewards, preferring instead to speak of “positive feedback,” we do use that term here in order to include the positive-feedback studies within the general category of reward effects. Verbal rewards typically contain explicit positive performance feedback, so CET predicts that they are likely to enhance perceived competence and thus enhance intrinsic motivation. In the meta-analysis, we tested the hypothesis that verbal rewards would enhance intrinsic motivation.
Nonetheless, verbal rewards can have a significant controlling aspect leading people to engage in behaviors specifically to gain praise, so verbal rewards have the potential to undermine intrinsic motivation. The theory therefore suggests that the interpersonal context within which positive feedback is administered can influence whether it will be interpreted as informational or controlling. As used here, the term interpersonal context refers to the social ambience of settings, such as classrooms, as they influence people’s experience of self-determination (Deci & Ryan, 1991). When studied in laboratory experiments, the interpersonal climate is usually manipulated in terms of the interpersonal style used by the experimenter when providing the feedback (e.g., Ryan, 1982; Ryan, Mims, & Koestner, 1983). An interpersonal context is considered controlling to the extent that people feel pressured by it to think, feel, or behave in particular ways. Verbal rewards administered within such a context are thus more likely to be experienced as controlling rather than informational. For example, CET suggests that if a teacher uses an interpersonal style intended to make students do what he or she wants them to, verbal rewards administered by that teacher are likely to be experienced as controlling.
In a supplemental meta-analysis involving five studies, we tested the prediction that controlling positive feedback would lead to less intrinsic motivation than informational positive feedback.
Tangible Rewards
Unlike verbal rewards, tangible rewards are frequently offered to people as an inducement to engage in a behavior in which they might not otherwise engage. Thus, according to CET, tangible rewards will tend to be experienced as controlling, and as a result they will tend to decrease intrinsic motivation.
Salkind_Chapter 52.indd 308
9/4/2010 10:40:25 AM
Deci et al.
Extrinsic Rewards and Intrinsic Motivation
309
The meta-analysis tested the hypothesis that, overall, tangible rewards would decrease intrinsic motivation. In order for tangible rewards to be experienced as controlling, however, people would need to be engaging in the behavior for the rewards; that is, they would need to expect that the behavior would lead to the rewards. If tangible rewards are given unexpectedly to people after they have finished a task, the rewards are less likely to be experienced as the reason for doing the task and are thus less likely to be detrimental to intrinsic motivation. The meta-analysis tested the hypothesis that unexpected tangible rewards would not undermine intrinsic motivation, whereas expected tangible rewards would.
Expected tangible rewards can be administered through various contingencies; that is, they can be made contingent upon different aspects of task-related behavior. In making more refined predictions about the effects of expected tangible rewards on intrinsic motivation, CET takes account of task contingency. Ryan et al. (1983) specified three types of reward contingencies: task-noncontingent rewards, which do not require engaging in the activity per se but are instead given for some other reason such as simply participating in the experiment; task-contingent rewards, which require doing or completing the target activity; and performance-contingent rewards, which require performing the activity well, matching a standard of excellence, or surpassing a specified criterion (e.g., doing better than half of the other participants). A further distinction has been made between task-contingent rewards that specifically require completing the target task (herein referred to as completion-contingent rewards) and those that require engaging in the activity but do not require completing it (herein referred to as engagement-contingent rewards).
We (e.g., Deci & Ryan, 1985) have considered the completion-contingent and engagement-contingent rewards to constitute the single category of task-contingent rewards because the effects of these two reward contingencies have seemed to be remarkably similar; however, we separated them for this meta-analysis in order to evaluate whether the effects of completion-contingent and engagement-contingent rewards are, in fact, the same.
Because task-noncontingent rewards do not require doing, completing, or doing well at the target task, there is no reason to expect these rewards to be experienced as either informational or controlling with respect to the task. Accordingly, the meta-analysis tested the hypothesis that intrinsic motivation would not be affected by these rewards.
Engagement-contingent rewards specifically require that people work on the task, so the rewards are likely to be experienced as controlling the task behavior. Because these rewards carry little or no competence affirmation, they are unlikely to increase perceived competence, and thus there will be nothing to counteract the negative effects of the control. Thus, the meta-analysis tested the hypothesis that engagement-contingent rewards would undermine intrinsic motivation.
Motivation
Completion-contingent rewards require that people complete the task to obtain the rewards, so the rewards are likely to be experienced as even more controlling than engagement-contingent rewards. However, with completion-contingent rewards, receipt of the rewards conveys competence if the task required skill and the person had a normative sense of what constitutes good performance on the task. To the extent that the rewards do represent competence affirmation, this implicit positive feedback could offset some of the control. Still, averaged across different types of tasks, the competence-affirming aspect of completion-contingent rewards is not expected to be strong relative to the controlling aspect, so we tested the hypothesis that completion-contingent rewards would undermine intrinsic motivation at a level roughly comparable to that of engagement-contingent rewards. Parenthetically, because the category of task-contingent rewards is composed of engagement-contingent and completion-contingent rewards, we also expected this larger category to yield significant undermining of intrinsic motivation.
Finally, performance-contingent rewards are linked to people’s performance, so there is even stronger control. People have to meet a standard to maximize rewards, and thus there is a strong tendency for these rewards to undermine intrinsic motivation. However, performance-contingent rewards can also convey substantial positive competence information when a person receives a level of reward that signifies excellent performance. In those cases, there would be a tendency for performance-contingent rewards to affirm competence and, thus, to offset some of the negative effects of control. In the meta-analysis, we tested the hypothesis that performance-contingent rewards would undermine intrinsic motivation, but we also expected that other factors would influence the effects of these rewards on intrinsic motivation.
One such factor is whether or not the level of reward implies excellent performance. Thus, we examined the hypothesis that performance-contingent rewards would be more undermining of intrinsic motivation if the rewards did not convey high-quality performance.
Another factor that is expected to influence the effects of performance-contingent rewards is the interpersonal context (as was the case with verbal rewards). If the interpersonal climate within which these rewards are administered is demanding and controlling, the rewards are expected to be more undermining of intrinsic motivation. Although few studies have manipulated the interpersonal context of performance-contingent rewards, Ryan et al. (1983) compared a performance-contingent rewards group in which the rewards were administered in a relatively controlling manner and one in which they were administered in a relatively noncontrolling manner. As predicted, the controlling administration of performance-contingent rewards led to undermining of intrinsic motivation relative to the noncontrolling administration. In terms of education, this is a particularly important finding because it suggests that when rewards are used in the classroom, it is important that the climate of the classroom be supportive rather than controlling so that the students will be less likely to experience the rewards as controlling.
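The reward taxonomy and the CET hypotheses described above can be summarized as a small lookup. The following Python sketch is only an illustration: the class, field, and function names are hypothetical and do not come from the studies reviewed, and the hypothesis strings paraphrase the predictions stated in the text.

```python
from dataclasses import dataclass
from enum import Enum


class Contingency(Enum):
    UNEXPECTED = "unexpected"
    TASK_NONCONTINGENT = "task-noncontingent"
    ENGAGEMENT_CONTINGENT = "engagement-contingent"
    COMPLETION_CONTINGENT = "completion-contingent"
    PERFORMANCE_CONTINGENT = "performance-contingent"


@dataclass(frozen=True)
class TangibleReward:
    """Hypothetical description of how a tangible reward was offered."""
    expected: bool                     # announced before the task began?
    requires_engagement: bool = False  # must work on the target task
    requires_completion: bool = False  # must finish the target task
    requires_standard: bool = False    # must meet a performance standard


def classify(reward: TangibleReward) -> Contingency:
    """Map a reward description onto the categories used in the text."""
    if not reward.expected:
        return Contingency.UNEXPECTED
    # Check the most demanding requirement first.
    if reward.requires_standard:
        return Contingency.PERFORMANCE_CONTINGENT
    if reward.requires_completion:
        return Contingency.COMPLETION_CONTINGENT
    if reward.requires_engagement:
        return Contingency.ENGAGEMENT_CONTINGENT
    return Contingency.TASK_NONCONTINGENT


# Paraphrase of the hypotheses tested for each category.
CET_HYPOTHESIS = {
    Contingency.UNEXPECTED: "no undermining",
    Contingency.TASK_NONCONTINGENT: "no effect",
    Contingency.ENGAGEMENT_CONTINGENT: "undermining",
    Contingency.COMPLETION_CONTINGENT: "undermining",
    Contingency.PERFORMANCE_CONTINGENT: "undermining, moderated by feedback and context",
}

# A reward promised simply for working on the task.
reward_for_working = TangibleReward(expected=True, requires_engagement=True)
category = classify(reward_for_working)
print(category.value, "->", CET_HYPOTHESIS[category])
```

Checking the strictest requirement first reflects the way the categories nest: a reward that requires meeting a standard also requires working on the task, but it is classified by its most demanding condition.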
Method
Our meta-analytic strategy (Deci et al., 1999) involved a hierarchical approach in which the results of 128 experiments were examined in two separate meta-analyses. The first involved 101 of the studies that had used a free-choice behavioral measure of intrinsic motivation, and the second involved 84 of the studies that had used self-reported interest as a dependent variable. In a hierarchical meta-analysis, one begins with the most general category and reports the composite effect size. If the set of effects is heterogeneous, then one proceeds to differentiate the overall category into meaningful subcategories in an attempt to achieve homogeneity of effects within the subcategories. Thus, in both meta-analyses (i.e., with the two dependent measures), we began by calculating the effects of all rewards on intrinsic motivation and then systematically differentiated the reward conditions. Only after we had exhausted all possible moderator variables did we discard outliers to create homogeneity within subcategories. Using this approach, we ended up discarding only about 4% of the effects as outliers, whereas Cameron and Pierce (1994) had discarded approximately 20% of the effects as outliers.
In the differentiation, studies were first separated into those that examined verbal rewards versus those that examined tangible rewards. Then tangible rewards, which have been extensively studied, were analyzed as follows. The effects of rewards that were unexpected versus expected were examined separately. Studies of expected tangible rewards were then separated into four groups, depending on what the rewards were contingent upon. The groups were as follows: task noncontingent (rewards that did not explicitly require working on a task), engagement contingent (rewards that did require working on the task), completion contingent (rewards that required finishing a task), and performance contingent (rewards contingent upon a specified level of performance at a task).
As described subsequently, because the performance-contingent reward effects on the free-choice measure were heterogeneous, that category was further differentiated. Finally, in categories in which the effect sizes were heterogeneous after all theoretically based differentiations had been completed, we compared the effects of the reward types on schoolchildren versus college students, an issue that had not been considered previously but emerged from an inspection of the data and seemed very important in terms of the educational relevance of the results.
Inclusion criteria for studies that spanned the period 1971 to 1996 were the following. First, because intrinsic motivation is pertinent to tasks that people experience as interesting and because the field of inquiry has always been defined in terms of reward effects on intrinsic motivation for interesting tasks, we included only studies or conditions within studies if the target task was at least moderately interesting (i.e., if it either was not defined a priori as a boring task by the experimenter or did not have a prereward interest rating below the midpoint of the scale). In contrast, Cameron and Pierce (1994) had aggregated across boring and interesting tasks without even addressing the issue in their article. Second, the analyses included only studies that assessed intrinsic motivation after the rewards had been clearly terminated, because while the reward is in effect participants’ behavior reflects a mix of intrinsic and extrinsic motivation. Cameron and Pierce, however, included assessments which they called intrinsic motivation but which had been taken while the reward contingency was still in effect. Third, studies were included only if they had an appropriate no-reward control group. Cameron and Pierce had made numerous comparisons based on questionable selections of control groups, at times even using inappropriate control groups when appropriate ones were available.
In conducting the meta-analyses, we used Cohen’s d as the measure of effect size. It reflects the difference between the means of two groups divided by the pooled within-group standard deviation, adjusted for sample size (Hedges & Olkin, 1985). The mean of the control group was subtracted from the mean of the rewards group, so a negative d reflects an “undermining effect,” whereas a positive d reflects an “enhancement effect.” Means, standard deviations, t tests, F tests, and sample sizes were used to calculate d values. For any study in which insufficient data were provided to calculate an effect size, we assigned an effect of d = 0.00, and we included those imputed values in all analyses.
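As a rough illustration of the effect-size definition just given, the following Python sketch computes d from group means and standard deviations, and also from a reported t statistic. It assumes the commonly cited small-sample correction factor 1 − 3/(4·df − 1) as the adjustment for sample size; whether DSTAT applies exactly this form is an assumption here. The function names and the example numbers are hypothetical.

```python
import math


def pooled_sd(sd_reward, n_reward, sd_control, n_control):
    """Pooled within-group standard deviation of two independent groups."""
    numerator = (n_reward - 1) * sd_reward ** 2 + (n_control - 1) * sd_control ** 2
    return math.sqrt(numerator / (n_reward + n_control - 2))


def effect_size_d(mean_reward, sd_reward, n_reward,
                  mean_control, sd_control, n_control,
                  small_sample_correction=True):
    """Reward mean minus control mean, in pooled-SD units.

    Negative values indicate undermining, positive values enhancement.
    """
    d = (mean_reward - mean_control) / pooled_sd(
        sd_reward, n_reward, sd_control, n_control)
    if small_sample_correction:
        df = n_reward + n_control - 2
        d *= 1 - 3 / (4 * df - 1)  # assumed form of the sample-size adjustment
    return d


def d_from_t(t, n_reward, n_control):
    """Uncorrected d recovered from an independent-groups t statistic."""
    return t * math.sqrt(1 / n_reward + 1 / n_control)


# Hypothetical free-choice times in seconds: rewarded group 200, control 260.
d = effect_size_d(200.0, 100.0, 20, 260.0, 100.0, 20)
print(round(d, 2))  # -0.59, an undermining effect
```

Because the control mean is subtracted from the reward mean, the sign convention matches the one used throughout the results.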
All effect-size computations and summary analyses were done with DSTAT (Johnson, 1993), a meta-analytic software program. Each calculation of a composite effect size is accompanied by a 95% confidence interval (CI) (for additional methodological details, see Deci et al., 1999).
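The composite effect sizes, confidence intervals, and homogeneity checks referred to above can be sketched as follows. This is a minimal fixed-effect illustration using inverse-variance weights and the Q statistic, assuming the standard large-sample variance of d; it is not a reproduction of DSTAT, and the function names and study values are hypothetical.

```python
import math


def variance_of_d(d, n_reward, n_control):
    """Large-sample variance of d for two independent groups."""
    total = n_reward + n_control
    return total / (n_reward * n_control) + d ** 2 / (2 * total)


def composite_effect(studies, z=1.96):
    """Combine (d, n_reward, n_control) tuples.

    Returns the weighted mean d, its 95% confidence interval, and the
    homogeneity statistic Q (compared with chi-square on k - 1 df).
    """
    weights = [1 / variance_of_d(d, n1, n2) for d, n1, n2 in studies]
    total_weight = sum(weights)
    d_plus = sum(w * s[0] for w, s in zip(weights, studies)) / total_weight
    se = math.sqrt(1 / total_weight)
    q = sum(w * (s[0] - d_plus) ** 2 for w, s in zip(weights, studies))
    return d_plus, (d_plus - z * se, d_plus + z * se), q


# Three hypothetical studies with similar effects: Q stays small.
similar = [(-0.5, 20, 20), (-0.3, 30, 30), (-0.4, 25, 25)]
d_plus, ci, q = composite_effect(similar)
print(round(d_plus, 2), [round(x, 2) for x in ci], round(q, 2))

# Mixing an enhancing and an undermining effect: Q becomes large,
# which is the signal to differentiate the category further.
mixed = [(0.6, 40, 40), (-0.6, 40, 40)]
_, _, q_mixed = composite_effect(mixed)
print(round(q_mixed, 2))
```

In the hierarchical strategy, a Q value above the chi-square critical value (3.84 for 1 df and 5.99 for 2 df at the .05 level) would lead to splitting the category by a moderator before any outliers are considered.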
Results
Effects of All Rewards
Although the early discussions of extrinsic reward effects on intrinsic motivation (e.g., deCharms, 1968) tended to consider extrinsic rewards as a unitary concept, even the very first investigations of this issue differentiated the concept. Deci (1971, 1972b) distinguished between tangible rewards and verbal rewards (i.e., positive feedback), reporting that tangible rewards decreased intrinsic motivation, while verbal rewards increased it. Furthermore, Deci (1972a) differentiated task-contingent rewards from task-noncontingent rewards, finding that task-contingent rewards decreased intrinsic motivation but task-noncontingent rewards did not, and Lepper, Greene, and Nisbett (1973) distinguished between rewards that were expected and those that were unexpected, finding that expected rewards decreased intrinsic motivation but unexpected rewards did not. Accordingly, given that different rewards and different reward contingencies seem to have different effects on intrinsic motivation, aggregating across all types of rewards meta-analytically is, in a sense, a meaningless endeavor, because the outcome will depend primarily on how many studies of each type of reward or reward contingency are included in the meta-analysis (Ryan & Deci, 1996).
Nonetheless, because Cameron and Pierce (1994) calculated the effect of all rewards on intrinsic motivation in their meta-analysis, we also calculated it for comparative purposes. The effect of all types of rewards across all relevant studies revealed significant undermining for the free-choice behavioral measure of intrinsic motivation (k = 101; d = –0.24; CI = –0.29, –0.19),1 although the overall effect for the self-report measure was not significant. These and other major results are summarized in Table 1.
Table 1: Major results of the meta-analysis of the effects of extrinsic rewards on free-choice intrinsic motivation and self-reported interest, shown as Cohen’s composite d, with k effects included

                                         Free-choice behavior    Self-reported interest
Reward category                          d          k            d          k
All rewards                              −0.24*     101          0.04       84
  Verbal rewards                         0.33*      21           0.31*      21a
    College                              0.43*      14a
    Children                             0.11       7a
  Tangible rewards                       −0.34*     92           −0.07*     70
    Unexpected                           0.01       9a           0.05       5a
    Expected                             −0.36*     92           −0.07*     69
      Task noncontingent                 −0.14      7a           0.21       5a
      Engagement contingent              −0.40*     55           −0.15*     35a
        College                          −0.21*     12a
        Children                         −0.43*     39a
      Completion contingent              −0.44*     19a          −0.17*     13a
      Performance contingent             −0.28*     32           −0.01      29a
        Maximal reward                   −0.15*     18a
        Not maximum reward               −0.88*     6a
        Positive feedback control        −0.20*     10a
        Negative feedback control        −0.03      3a

a These categories were not further differentiated and are homogeneous.
Some of the studies used to determine the overall composite effect size (i.e., for all rewards) in each meta-analysis had multiple reward conditions, so the sums of the numbers of effect sizes in the most differentiated categories of each meta-analysis are greater than the numbers in the all-rewards category. There were 150 effect sizes in the most differentiated categories for the free-choice analyses, of which 6 were removed as outliers, and there were 114 effect sizes in the most differentiated categories of the self-report analyses, of which 6 were removed as outliers.
* Significant at p < .05 or greater.
As already mentioned, we expected that all rewards would not affect intrinsic motivation in a uniform way, and thus we both expected and found that the set of effects for the all-rewards category was heterogeneous. Consequently, we proceeded with more differentiated analyses of specific types of rewards, based on both theoretical and empirical considerations. We first separated studies of verbal rewards from those of tangible rewards.
Verbal Rewards (Positive Feedback)
We first tested the CET prediction that, on average, verbal rewards would enhance intrinsic motivation. Twenty-one studies examined the effects of verbal rewards on free-choice intrinsic motivation, and 21 examined their effects on self-reports of interest. Results indicated that verbal rewards enhanced intrinsic motivation: for the behavioral measure, d = 0.33 (CI = 0.18, 0.43), and for self-reports, d = 0.31 (CI = 0.19, 0.44). However, there are two important caveats to this general finding.
First, because the set of effect sizes for verbal-reward effects on free-choice behavior was heterogeneous, we inspected the studies to determine whether there was any obvious pattern in the results. We noticed that the effects of verbal rewards on schoolchildren appeared to be different from the effects on college students, so we conducted separate analyses for schoolchildren and college students. It turned out that verbal rewards enhanced free-choice intrinsic motivation for college students (k = 14; d = 0.43; CI = 0.27, 0.58) but not for children (k = 7; d = 0.11; CI = –0.11, 0.34), a point that is very important when thinking about educational practices.
Second, CET has emphasized that although positive feedback can enhance intrinsic motivation, it can actually undermine intrinsic motivation if it is administered with a controlling interpersonal style. Five studies examined the administration of verbal rewards with an informational versus controlling interpersonal style, so we did a supplemental analysis of these studies. The results indicated, as hypothesized, that although informationally administered verbal rewards enhanced intrinsic motivation (d = 0.66; CI = 0.28, 1.03), controllingly administered verbal rewards undermined intrinsic motivation (d = –0.44; CI = –0.82, –0.07).
To summarize, research indicates that verbal rewards (i.e., positive feedback) tend to have an enhancing effect on intrinsic motivation; however, verbal rewards are less likely to have a positive effect for children than for older individuals. Furthermore, verbal rewards can even have a negative effect on intrinsic motivation if the interpersonal context within which they are administered is controlling rather than informational.
Tangible Rewards
Next, we tested the CET prediction that, overall, tangible rewards (including material rewards, such as money and prizes, and symbolic rewards, such as trophies and good player awards) would decrease intrinsic motivation, because tangible rewards are frequently used to persuade people to do things they would not otherwise do, that is, to control their behavior. The meta-analysis included 92 tangible reward studies with a free-choice measure and 70 with a self-report measure. As predicted by CET, results indicated that, on average, tangible rewards significantly undermined both free-choice intrinsic motivation (d = –0.34; CI = –0.39, –0.28) and self-reported interest (d = –0.07; CI = –0.13, –0.01). Of course, we have regularly argued that a full understanding of the effects of tangible rewards requires a consideration of additional factors such as reward contingency and interpersonal context, but these results do highlight the general risks associated with the use of tangible rewards as a motivator.
Because age effects had emerged for verbal rewards, we also compared the effects of tangible rewards in studies of children versus college students. This revealed that even though tangible rewards significantly undermined intrinsic motivation for both groups, the undermining effect was significantly greater for children than for college students on both behavioral and self-report measures of intrinsic motivation. The real-world implications of this pattern of results are extremely important. There is great concern about children’s motivation for school work, as well as for other behaviors such as sports, art, and prosocial activities, and a study conducted by Boggiano, Barrett, Weiher, McClelland, and Lusk (1987) indicated that adults tend to view salient extrinsic rewards as an effective motivational strategy for promoting these behaviors in children.
However, the age-effect analyses indicate that, although tangible rewards may control immediate behaviors, they have negative consequences for subsequent interest, persistence, and preference for challenge, especially for children. In summary, the age effects that emerged from our meta-analysis indicate that tangible rewards have a more negative effect on children than on college students and that verbal rewards have a less positive effect on children than on college students.
Unexpected Rewards and Task-Noncontingent Rewards
We next tested the CET prediction that unexpected rewards would not be detrimental to intrinsic motivation, whereas expected rewards would. The reasoning was that if people are not doing a task in order to get a reward, they are not likely to experience their task behavior as being controlled by the reward. The meta-analysis supported the hypothesis. Nine studies of free-choice behavior revealed no undermining (d = 0.01; CI = –0.20, 0.22), and five studies of self-reported interest revealed similar results (d = 0.05; CI = –0.19, 0.29).
In contrast, analyses of expected rewards did yield undermining for both free-choice behavior (k = 92; d = –0.36; CI = –0.42, –0.30) and self-reported interest (k = 69; d = –0.07; CI = –0.13, –0.01). It is interesting in this regard to note that verbal rewards are generally unexpected, and that may be one of the reasons they do not typically have a negative effect on intrinsic motivation.
According to CET, rewards not requiring task engagement should be unlikely to affect intrinsic motivation for the task because the rewards are not given for doing the task. Although relatively few studies of task-noncontingent rewards have been done, the meta-analysis revealed no evidence that these rewards significantly affected either measure of intrinsic motivation (k = 7; d = –0.14; CI = –0.39, 0.11, for free-choice behavior and k = 5; d = 0.21; CI = –0.08, 0.50, for self-reported interest).
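The link between the reported confidence intervals and statements of significance can be made explicit with a short check. The Python sketch below uses the free-choice values quoted in this section; the helper names are hypothetical, and the standard error is only an approximation recovered from the width of a 95% interval.

```python
def ci_excludes_zero(lower, upper):
    """A 95% CI that excludes zero corresponds to p < .05 (two-tailed)."""
    return lower > 0 or upper < 0


def approximate_se(lower, upper, z=1.96):
    """Standard error implied by a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z)


# Free-choice results quoted in the text: (d, CI lower, CI upper).
reported = {
    "unexpected": (0.01, -0.20, 0.22),
    "expected": (-0.36, -0.42, -0.30),
    "task-noncontingent": (-0.14, -0.39, 0.11),
}

for label, (d, lower, upper) in reported.items():
    verdict = "significant" if ci_excludes_zero(lower, upper) else "not significant"
    print(f"{label}: d = {d:+.2f}, {verdict}, SE ~ {approximate_se(lower, upper):.2f}")
```

Only the interval for expected rewards lies entirely below zero, which is why that category alone is described as showing undermining.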
Engagement-Contingent Rewards
Engagement-contingent rewards are offered explicitly for engaging in an activity. When children were told they would get a good player award for working on an art activity (Lepper et al., 1973), the reward was engagement contingent. Similarly, when college students were told they would receive a reward if they performed a hidden-figures activity, the reward was engagement contingent (Ryan et al., 1983). In neither case was there a performance requirement: Participants did not have to finish the task or do well on it; they simply had to work on it.
More studies have used engagement-contingent rewards than any other reward contingency, and that is particularly true for studies of children. Results of the meta-analyses confirmed that engagement-contingent rewards significantly diminished intrinsic motivation measured in both ways (k = 55; d = –0.40; CI = –0.48, –0.32, for free-choice and k = 35; d = –0.15; CI = –0.25, –0.06, for self-reports). Furthermore, the undermining on the free-choice measure, while significant for both children and college students, was significantly stronger for children than for college students. The strength of the undermining on self-reports did not differ for the two groups.
Completion-Contingent Rewards
The first study of reward effects on intrinsic motivation in humans (Deci, 1971) employed completion-contingent rewards. In it, participants were offered $1 for each of four puzzles they completed within a specified amount of time. As already mentioned, the pressure associated with the completion-contingent rewards was greater than that associated with engagement-contingent rewards, but we expected this to be offset somewhat by the implicit competence affirmation provided by the reward. Overall, we predicted an undermining effect for this category of rewards comparable to that for engagement-contingent rewards (Ryan et al., 1983).
Twenty studies examined completion-contingent reward effects on free-choice behavior, and 15 examined effects on self-reports. Analyses revealed that completion-contingent rewards significantly undermined intrinsic motivation for both dependent measures. Because the effects for these rewards on free-choice behavior were heterogeneous and there were no age effects, we had to remove one outlier to achieve homogeneity. With the outlier removed, the results were as follows: k = 19; d = –0.44; CI = –0.59, –0.30. For self-reports, the effects were also heterogeneous, and again there were no age effects; thus, we had to remove two outliers. With these outliers removed, we also found significant undermining by the completion-contingent rewards (k = 13; d = –0.17; CI = –0.33, –0.00, for self-reports).2 As expected, the effects of engagement-contingent and completion-contingent rewards were virtually identical.
Task-Contingent Rewards
In the first taxonomy of reward contingencies, Ryan et al. (1983) included task-contingent rewards, and Cameron and Pierce included the category in their meta-analysis. Because the task-contingent reward category is simply the aggregate of engagement-contingent rewards and completion-contingent rewards, this category is redundant. However, for comparative purposes, we mention it here. Task-contingent rewards undermined intrinsic motivation assessed with both measures (k = 74; d = –0.39; CI = –0.46, –0.32, for free choice and k = 48; d = –0.12; CI = –0.20, –0.04, for self-reports). Again, the undermining tended to be worse for children.
Performance-Contingent Rewards
From the standpoint of CET, performance-contingent rewards are the most interesting type of tangible rewards. Performance-contingent rewards were defined by Ryan et al. (1983) as rewards given explicitly for doing well at a task or for performing up to a specified standard. Examples of performance-contingency studies include the Ryan et al. study, in which all participants in the performance-contingent-rewards condition received $3 for “having done well at the activity,” and the Harackiewicz, Manderlink, and Sansone (1984) study, in which participants received a reward because they were said to have performed better than 80% of other participants.
According to CET, performance-contingent rewards have the potential to affect intrinsic motivation in two ways, one quite positive and one quite negative. Performance-contingent rewards can maintain or enhance intrinsic motivation if the receiver of the reward interprets it informationally, as an affirmation of competence. Yet, because performance-contingent rewards are often used as a vehicle to control not only what the person does but how well he or she does it, such rewards can easily be experienced as very controlling, thus undermining intrinsic motivation. According to CET, it is the relative salience of the informational versus controlling aspects of performance-contingent rewards which determines their ultimate effect on intrinsic motivation.
In most experiments examining performance-contingent rewards, all participants receive rewards as if they had done very well (which, of course, does not happen in the real world). Therefore, these studies do not address the effects of receiving only partial rewards or no rewards under performance contingencies, a circumstance that is more common in the real world and would undoubtedly diminish both perceived competence and perceived self-determination and accordingly have a very negative effect on intrinsic motivation. There can thus be little doubt that research on the effects of performance-contingent rewards markedly underestimates the negative effects of this type of reward, since it has focused largely on people who succeed at the contingency. In contrast, a real-world contingency in which only those achieving above the 80th percentile receive a reward, if veridically applied, would mean that 80% of participants would end up getting no reward and, implicitly, receiving negative competence feedback.
The meta-analyses for the overall effects of performance-contingent rewards included 32 studies with a free-choice measure and 30 with a self-report measure. Performance-contingent rewards significantly undermined free-choice behavior (d = –0.28, CI = –0.38, –0.18), whereas results for the self-report studies were not significant. We did not do further analyses of studies with the self-report measure because the set of effects was homogeneous with only one outlier removed. However, the effects for the free-choice measure were quite heterogeneous. Consequently, we separated the effects into four categories based on the following two considerations.
First, different studies of performance-contingent rewards have used different control groups; specifically, some have used control groups in which participants received neither rewards nor feedback, whereas others have used control groups in which participants received no rewards but did receive the same feedback conveyed by the rewards to the participants who received rewards. In this latter instance, for example, if the rewards were given for doing better than 80% of the participants, participants in a no-reward control group that received feedback would have been told that they did better than 80% of the participants. To examine the combined effects of performance-contingent rewards and the feedback inherent within them, one would compare the rewards condition with a no-rewards, no-feedback condition. On the other hand, to examine the effects of the rewards per se, independent of the feedback conveyed by them, one would compare the rewards group with a no-rewards group that received comparable feedback.
Second, although the definition of performance-contingent rewards used in the majority of studies involves giving rewards to all participants as if they had performed well, some studies gave rewards in a way that conveyed to some or all of the participants that they had not performed well. These participants got less than the maximum available rewards, thus indicating that their competence was not optimal. For example, in a study conducted by Rosenfield, Folger, and Adelman (1980) that involved a feedback control group, rewarded participants got a small reward for performing in the bottom 15% of all participants, and the corresponding control group received the comparable “negative” feedback without the reward. Clearly, this and other such studies are quite different from the more typical studies of performance-contingent rewards in which all participants receive the same maximum reward for having done well.
Studies involving different types of control groups and different levels of performance were aggregated without comment by Cameron and Pierce (1994). In our meta-analysis, however, because performance-contingent reward effects were not homogeneous, we examined four categories of performance-contingent rewards rather than simply discarding outliers as Cameron and Pierce had done. The four categories were as follows: effects involving no-feedback control groups in which everyone received the maximum possible rewards, effects involving no-feedback control groups in which not all participants received the maximum possible rewards, effects involving comparable-feedback control groups in which all participants received positive feedback, and effects involving comparable-feedback control groups in which all participants received negative feedback.
With the free-choice measure, for studies that compared no-feedback control groups and participants who received the maximum possible rewards, there was significant undermining (k = 18; d = –0.15; CI = –0.31, –0.00).2 For studies with no-feedback control groups in which all participants did not receive the maximum possible rewards, there was also significant undermining (k = 6; d = –0.88; CI = –1.12, –0.65). The same was true for studies with comparable-feedback control groups in which everyone received positive feedback (k = 10; d = –0.20; CI = –0.37, –0.03). However, for the three studies with comparable-feedback control groups in which participants received negative feedback, there was not a significant effect for reward versus no reward. The group in which at least some participants got less than the maximum possible rewards and the control group received no feedback stands out and deserves special mention. This represents the type of performance-contingent rewards that one would typically find in the real world, in that here rewards are a direct function of performance. Those who perform best get the largest rewards, and those who perform less well get smaller rewards or no rewards. The analysis showed that this type of reward had the largest undermining effect of any category used in the entire meta-analysis (d = –0.88), indicating clearly that rewarding people as a direct function of performance runs a very serious risk of negatively affecting their intrinsic motivation.
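The four-category breakdown of performance-contingent rewards can be laid out as a small data structure, which makes it easier to see which category produced the largest undermining. The sketch below is only an illustration: the class name, field names, and category labels are our own assumptions, and the numbers are transcribed from the free-choice results reported above (the text gives no composite d for the negative-feedback category, so it is left empty).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Category:
    control_group: str                   # "no-feedback" or "comparable-feedback"
    reward_outcome: str                  # what the rewards or feedback signified
    k: int                               # number of studies, as reported in the text
    d: Optional[float]                   # composite effect size; None if not reported
    ci: Optional[Tuple[float, float]]    # confidence interval; None if not reported


# Values transcribed from the free-choice results reported in the text.
CATEGORIES: List[Category] = [
    Category("no-feedback", "all received maximum rewards", 18, -0.15, (-0.31, -0.00)),
    Category("no-feedback", "not all received maximum rewards", 6, -0.88, (-1.12, -0.65)),
    Category("comparable-feedback", "positive feedback", 10, -0.20, (-0.37, -0.03)),
    # The text reports only that this effect was not significant.
    Category("comparable-feedback", "negative feedback", 3, None, None),
]


def strongest_undermining(categories: List[Category]) -> Category:
    """Return the category with the most negative reported composite d."""
    reported = [c for c in categories if c.d is not None]
    return min(reported, key=lambda c: c.d)


worst = strongest_undermining(CATEGORIES)
print(worst.control_group, "|", worst.reward_outcome, "| d =", worst.d)
```

Under these assumptions the lookup returns the no-feedback category in which not all participants received the maximum reward, the same category singled out in the text.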
Summary of the Primary Analyses
To summarize the primary findings from the meta-analyses, when free-choice behavior was used as the dependent measure, all rewards, all tangible rewards, all expected rewards, engagement-contingent rewards, completion-contingent rewards, task-contingent rewards, and performance-contingent rewards significantly undermined intrinsic motivation. Only verbal rewards enhanced intrinsic motivation in general, but verbal rewards did undermine intrinsic motivation if they were given with a controlling interpersonal style. The undermining of intrinsic motivation by tangible rewards was worse for children than for college students, and the enhancement by verbal rewards was weaker for children than for college students. The most damaging reward contingency was the commonly used one of performance-contingent rewards in which not all participants receive maximum rewards. When self-reported interest served as the dependent measure, all tangible rewards, all expected rewards, engagement-contingent rewards, completion-contingent rewards, and task-contingent rewards significantly undermined intrinsic motivation. Verbal rewards enhanced self-reported interest.
Supplemental Analyses
To further clarify the limiting conditions and moderator effects of rewards, we performed two supplemental analyses. First, to determine whether the undermining of intrinsic motivation is simply a transitory phenomenon, we examined the effects of tangible rewards on the free-choice behavior of children, dividing the studies into three groups: those for which intrinsic motivation was assessed immediately after the reward was terminated, those for which it was assessed a few days later, and those for which it was assessed at least a week later. Analyses indicated that timing of the dependent measure did not affect the results. For all three groups, the composite effect sizes were between –0.40 and –0.53, all statistically significant. If anything, the undermining was strongest in the studies in which the measure was taken at least a week after the rewards were given.
Second, although our primary meta-analyses included only studies for which the target activity was initially interesting, whereas Cameron and Pierce collapsed across interesting and dull tasks without analyzing task effects, we conducted a set of analyses to consider this issue empirically. In our first analysis, we included data from the dull-task conditions and repeated the overall meta-analysis. For the free-choice analyses, every undermining effect that had appeared when only initially interesting tasks were included also appeared after the dull-task conditions were added in; for the self-report analyses, all except one of the effects that had indicated significant
undermining when only interesting tasks were used were again significant when the dull-task conditions were included. The one exception for self-report studies was that the inclusion of the dull-task data led the undermining of self-reported interest in the completion-contingent condition to drop to nonsignificance.
In our second analysis, we examined the 13 studies that had included both interesting and dull tasks, assessing the effects of tangible rewards separately for interesting and dull tasks. For the 11 studies with a free-choice measure, results indicated a large undermining by rewards in the interesting-task conditions (d = –0.68; CI = –0.89, –0.47) but not in the dull-task conditions (d = 0.18; CI = –0.03, 0.39). For 5 studies with self-reports, there was also significant undermining with the interesting task (d = –0.37; CI = –0.67, –0.07) but not the dull task (d = 0.10; CI = –0.09, 0.40).
In summary, it is clear that rewards do not undermine people’s intrinsic motivation for dull tasks because there is little or no intrinsic motivation to be undermined. But neither do rewards enhance intrinsic motivation for such tasks. From our perspective (see, e.g., Ryan & Deci, 2000; Ryan & Stiller, 1991), the issue of promoting self-regulation of uninteresting activities is addressed with the concept of internalization rather than reward effects on intrinsic motivation. In other words, if a task is dull and boring, the issue is not whether the rewards will lead people to find the task intrinsically interesting because rewards do not add interest value to the task itself. Rather, the issue is how to facilitate people’s understanding the importance of the activity to themselves and thus internalizing its regulation so they will be self-motivated to perform it.
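The contrast between interesting and dull tasks can be checked with simple arithmetic on the reported intervals. The sketch below assumes, purely for illustration, that the reported CIs are symmetric 95% normal-theory intervals (the text in this section does not state the confidence level); the function name `se_from_ci` is hypothetical.

```python
def se_from_ci(ci_low: float, ci_high: float, z_crit: float = 1.96) -> float:
    """Approximate standard error implied by a symmetric normal-theory CI.

    Assumes the interval is d +/- z_crit * SE, which is an assumption here.
    """
    return (ci_high - ci_low) / (2.0 * z_crit)


# Free-choice results for the studies that included both task types.
interesting = {"d": -0.68, "ci": (-0.89, -0.47)}
dull = {"d": 0.18, "ci": (-0.03, 0.39)}

for label, result in (("interesting", interesting), ("dull", dull)):
    se = se_from_ci(*result["ci"])
    z = result["d"] / se
    print(f"{label}: SE ~ {se:.3f}, z ~ {z:.2f}, |z| > 1.96: {abs(z) > 1.96}")
```

Under that assumption both intervals imply about the same standard error, so the difference in conclusions comes from the size and sign of d rather than from a difference in precision.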
Summary and Conclusions
To summarize, results of the meta-analysis make clear that the undermining of intrinsic motivation by tangible rewards is indeed a significant issue. Whereas verbal rewards tended to enhance intrinsic motivation (although not for children and not when the rewards were given controllingly) and neither unexpected tangible rewards nor task-noncontingent tangible rewards affected intrinsic motivation, expected tangible rewards did significantly and substantially undermine intrinsic motivation, and this effect was quite robust. Furthermore, the undermining was especially strong for children. Tangible rewards – both material rewards, such as pizza parties for reading books, and symbolic rewards, such as good student awards – are widely advocated by many educators and are used in many classrooms, yet the evidence suggests that these rewards tend to undermine intrinsic motivation for the rewarded activity. Because the undermining of intrinsic motivation by tangible rewards was especially strong for school-aged children, and because studies have linked
intrinsic motivation to high-quality learning and adjustment (e.g., Benware & Deci, 1984; Ryan & Grolnick, 1986), the findings from this meta-analysis are of particular import for primary and secondary school educators. Specifically, the results indicate that, rather than focusing on rewards for motivating students’ learning, it is important to focus more on how to facilitate intrinsic motivation, for example, by beginning from the students’ perspective to develop more interesting learning activities, to provide more choice, and to ensure that tasks are optimally challenging (e.g., Cordova & Lepper, 1996; Deci, Schwartz, et al., 1981; Harter, 1974; Reeve, Bolt, & Cai, 1999; Ryan & Grolnick, 1986; Zuckerman, Porac, Lathin, Smith, & Deci, 1978). In these ways, we will be more able to facilitate the type of motivation that has been found to promote creative task engagement (Amabile, 1982), cognitive flexibility (McGraw & McCullers, 1979), and conceptual understanding of learning activities (Benware & Deci, 1984; Grolnick & Ryan, 1987). The results of the meta-analysis also provided strong support for CET. Specifically, the predictions made by CET, based on an analysis of whether reward types and reward contingencies are likely to be experienced as informational or controlling, were uniformly supported and were particularly strong for the behavioral measure. Thus, although Cameron and Pierce argued that CET should be abandoned and stated that there is no reason for teachers to resist using rewards in the classroom, it is clear that CET provides an excellent account of reward effects and that there is, in fact, good reason for teachers to think carefully about when and how to use rewards in the classroom.
Appendix
A list of each study used in our meta-analyses. A (D) indicates an unpublished dissertation. The second column indicates types of rewards and/or reward contingencies, followed by whether participants were children or undergraduates, followed by whether the dependent measure was free-choice behavior or self-reported interest. (Codes appear in Notes to the Appendix.) Finally, we explain whether our treatment of the study and results differed from Cameron and Pierce’s. If a study was coded the same, the same control groups were used in the comparisons, and the effect sizes we reported did not differ from the effect sizes Cameron and Pierce reported by more than 0.10 in either direction, we noted that the study was the same in the two meta-analyses. If there was a difference, we explained what it was.
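The labelling rule described above can be written out as a small decision function. This is a hedged sketch of the rule as stated in the text, not code used in the meta-analysis: the function name, its arguments, and the example effect sizes are hypothetical, and the "Nearly the same" branch follows the definition given in the notes to the table (same coding and control groups, effect sizes differing by more than 0.10).

```python
def comparison_label(same_coding: bool, same_controls: bool,
                     d_ours: float, d_cp: float, tol: float = 0.10) -> str:
    """Label a study's treatment relative to Cameron and Pierce (1994)."""
    if not (same_coding and same_controls):
        # In the table these cases receive a written explanation instead.
        return "Different (explained in the table)"
    # A tiny epsilon guards against floating-point noise at exactly 0.10.
    if abs(d_ours - d_cp) <= tol + 1e-9:
        return "Same"
    return "Nearly the same"


# Hypothetical effect sizes, for illustration only.
print(comparison_label(True, True, -0.30, -0.25))
print(comparison_label(True, True, -0.30, -0.05))
print(comparison_label(False, True, -0.30, -0.30))
```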
Table 1a: Studies used in our meta-analyses compared with Cameron and Pierce (1994)
Study | Variables | Comparison with Cameron & Pierce’s (1994) analysis
Amabile et al., 1986, Exp. 1 | E, 1, F, S | Same.¹
Amabile et al., 1986, Exp. 3 | E, 2, S | Same.
Anderson et al., 1976 | V, E, 1, F | This had multiple no-reward control groups. We selected the one recommended as appropriate by the study’s authors and comparable to ones used for other studies in this meta-analysis. C. & P.² used a control group that the authors said was inappropriate, in which the experimenter avoided eye contact with the young children and ignored their attempts to interact, even though there were just the two people in the room. The study’s authors said that this condition was uncomfortable, even painful, for both the children and experimenter. Not surprisingly, that group showed free-choice intrinsic motivation that was considerably lower than any other group.
Anderson & Rodin, 1989 | V, 2, S | Nearly the same.³ Both meta-analyses treated the composite dependent variable as self-report.
Arkes, 1979 | C, 2, F, S | Same.
Arnold, 1976 | E, 2, S | Same.
Arnold, 1985 | E, C, 2, S | Same.
Bartelme, 1983 (D) | P, 2, S | Excluded, type I.⁴
Blanck et al., 1984, Exp. 1 | V, 2, F, S | Same for free-choice; nearly the same for self-report.
Blanck et al., 1984, Exp. 2 | V, 2, F, S | Excluded, type II.⁵
Boggiano & Ruble, 1979 | E, P, 1, F | Excluded, type II.
Boggiano et al., 1982 | E, 1, F | Same.
Boggiano et al., 1985 | E, C, P, 1, F | The study’s authors crossed reward contingency with salience of reward. They referred to the two reward contingencies as task contingent and performance contingent, and C. & P. coded them that way, treating the task-contingent conditions as engagement contingent.⁶ However, the salience manipulation in the task-contingent condition changed the contingency. In the low-salience group, rewards were given for simply working on the puzzles, which makes them engagement contingent, but in the high-salience group, rewards were given for each puzzle “completed,” which makes them completion contingent.
Brennan & Glover, 1980 | E, 2, F | This was engagement contingent because participants got rewards if they “work with the Soma puzzle for at least 8 minutes,” but C. & P. coded it task noncontingent. Further, C. & P. combined two control groups, including one that had not worked on the task for the same amount of time as the rewards group during the experimental period, but we used only the control group that had worked on the task for the same amount of time.
Table 1a: (Continued)
Study | Variables | Comparison with Cameron & Pierce’s (1994) analysis
Brewer, 1980 (D) | E, P, 1, F, S | Excluded, type I.
Brockner & Vasta, 1981 | C, 2, F, S | Same.
Butler, 1987 | V, 1, S | Nearly the same.
Calder & Staw, 1975 | C, D, 2, S | This study provided monetary rewards for completing a set of puzzles, thus making it completion contingent, but C. & P. coded it engagement contingent. Also, C. & P. collapsed across interesting and dull tasks.⁷
Chung, 1995 | E, P, D, 1, F | Excluded, type III.⁸
Cohen, 1974 (D) | V, P, 2, F, S | Excluded, type I.
Crino & White, 1982 | V, 2, F, S | Same.
Dafoe, 1985 (D) | N, P, 1, F, S | Excluded, type I.
Daniel & Esser, 1980 | P, D, 2, F, S | In this study, participants were told “they could win up to $2 depending on how quickly they correctly assembled the puzzles.” This conveyed that the rewards depended on doing well relative to a standard and not just on finishing the puzzles. Thus, we coded it performance contingent, but C. & P. coded it completion contingent. Also, C. & P. collapsed across interesting and dull tasks.
Danner & Lonky, 1981, Exp. 2 | V, E, 1, F, S | Nearly the same.
Deci, 1971, Exp. 1 | C, 2, F, S | Same.
Deci, 1971, Exp. 3 | V, 2, F, S | Same.
Deci, 1972a | N, 2, F | Same.
Deci, 1972b | V, C, 2, F | Same.
Deci et al., 1975 | V, 2, F | Excluded, type II.
DeLoach et al., 1983 | E, 1, F | Same.
Dimitroff, 1984 (D) | E, 1, F, S | Excluded, type I.
Dollinger & Thelen, 1978 | V, P, 1, F, S | This had three tangible rewards groups, a verbal rewards group, and a control group. C. & P. inappropriately collapsed across verbal and tangible rewards, and they did not use the free-choice data.
Earn, 1982 | N, 2, F, S | Rewards were given “simply for participating in the study” which makes it task noncontingent, but C. & P. coded it engagement contingent.
Efron, 1976 (D) | V, E, P, 2, S | Excluded, type I.
Eisenstein, 1985 | U, C, D, 1, F | Excluded, type II.
Enzle et al., 1991 | P, 2, F | Excluded, type II.
Fabes, 1987, Exp. 1 | C, P, 1, F | Same for the performance-contingent condition. For the other condition, participants were given rewards “when they finished” a block construction, making it completion contingent, but C. & P. coded it engagement contingent.
Fabes, 1987, Exp. 2 | C, 1, F | This study used the same procedure as the completion-contingent condition in Fabes (1987, Exp. 1), making it completion contingent, but C. & P. coded it engagement contingent.
Table 1a: (Continued)
Study | Variables | Comparison with Cameron & Pierce’s (1994) analysis
Fabes et al., 1986 | E, 1, F, S | Excluded, type II.
Fabes et al., 1988 | E, 1, F, S | Same for free-choice, but C. & P. did not include the self-report. In this study, children selected a face ranging from frown to smile to reflect how much they enjoyed the task, a procedure that is common for obtaining self-report data from young children.
Fabes et al., 1989 | E, 1, F | Excluded, type II.
Feehan & Enzle, 1991, Exp. 2 | C, 2, F | Excluded, type II.
Goldstein, 1977 (D) | V, C, P, 1, F, S | Excluded, type I.
Goldstein, 1980 (D) | C, 2, F | Excluded, type I. This included competition conditions but we did not use those because competition has a complex effect on intrinsic motivation (Reeve & Deci, 1996).
Greene & Lepper, 1974 | U, E, P, 1, F | Same for the two unexpected groups and the engagement-contingent group, but C. & P. excluded the performance-contingent group.
Griffith, 1984 (D) | E, D, 1, F | Excluded, type I. To be comparable to most other studies in this meta-analysis, we included only participants who worked in the individual context.
Griffith et al., 1984 | C, 1, F | Children were rewarded for finishing reading a passage up to the bookmark, which makes it completion contingent, but C. & P. coded it engagement contingent. (The McLoyd, 1979 study used the same instructions and C. & P. did code it completion contingent.)
Hamner & Foster, 1975 | E, C, D, 2, S | Same coding for completion contingent. In engagement contingent, participants were paid “75 cents for the 20 minute task,” but C. & P. coded it as task noncontingent. Also, C. & P. collapsed across interesting and dull tasks.
Harackiewicz, 1979 | V, E, P, 1, S | Same for verbal rewards. Nearly the same for engagement contingent. C. & P. excluded the two performance-contingent rewards groups.
Harackiewicz & Manderlink, 1984 | P, 1, S | Same.
Harackiewicz et al., 1984, Exp. 1 | P, 2, F, S | Same.
Harackiewicz et al., 1984, Exp. 2 | U, P, 2, F, S | Same coding, but C. & P. made an error in the self-report effect size for performance contingent, showing it as enhancement when in fact it was undermining with a d = –0.16.
Harackiewicz et al., 1984, Exp. 3 | P, 2, F, S | Same.
Harackiewicz et al., 1987 | P, 1, S | Same.
Hitt et al., 1992 | E, D, 2, F, S | Excluded, type III.
Hyman, 1985 (D) | E, P, 1, F | Excluded, type I.
Karniol & Ross, 1977 | E, P, 1, F | Same except we coded the performance-contingent conditions for whether participants got the maximum rewards with implicit positive feedback or less than maximum rewards with implicit negative feedback.
Kast & Connor, 1988 | V, IC, 1, S | Excluded, type II.
Koestner et al., 1987 | V, 2, F, S | Same.
Table 1a: (Continued)
Study | Variables | Comparison with Cameron & Pierce’s (1994) analysis
Kruglanski et al., 1971 | N, 1, S | Rewards were given “because you have volunteered for this study …” so they were task noncontingent, but C. & P. coded them engagement contingent.
Kruglanski et al., 1972 | U, 1, S | Same.
Kruglanski et al., 1975, Exp. 1 | C, 1, S | Participants were rewarded either for the number of coin flips they guessed correctly or for the number of block constructions they completed correctly, making it completion contingent, but C. & P. coded it performance contingent.
Kruglanski et al., 1975, Exp. 2 | P, 1, S | It explored moderation by endogenous versus exogenous rewards. There were two reward groups and two control groups. In one pair, people worked on a stock market game and earned cash after each trial for good investments. The control group was the same as the experimental group except they were told they had to give back their earnings, so it was not a reasonable no-reward control group. In the other pair of conditions, money was not mentioned to the no-reward control group. We excluded the pair of conditions without a proper control group, but C. & P. collapsed across the two pairs of conditions.
Lee, 1982 (D) | P, 2, F, S | Excluded, type I.
Lepper et al., 1973 | U, E, 1, F | Same coding. Same effect sizes for engagement contingent. C. & P. made an error in calculating the effect size for unexpected rewards.
Lepper et al., 1982, Exp. 3 | E, 1, F | Excluded, type II.
Liberty, 1986, Exp. 1 (D) | C, 2, F, S | Excluded, type I.
Liberty, 1986, Exp. 2 (D) | C, 2, F, S | Excluded, type I.
Loveland & Olley, 1979 | E, D, 1, F | Same coding, but C. & P. collapsed across interesting and dull tasks.
Luyten & Lens, 1981 | C, P, 2, F, S | Same for performance contingent. In the other rewards condition participants were paid after each of three puzzles they solved, so it was completion contingent, but C. & P. coded it as engagement contingent.
McGraw & McCullers, 1979 | C, 2, S | Same.
McLoyd, 1979 | C, D, 1, F | Coded the same, but C. & P. collapsed across interesting and dull tasks.
Morgan, 1981, Exp. 1 | E, 1, F, S | Same on free-choice; nearly the same on self-report.
Morgan, 1981, Exp. 2 | E, 1, F, S | Same.
Morgan, 1983, Exp. 1 | E, 1, F, S | Same on free-choice; nearly the same on self-report.
Morgan, 1983, Exp. 2 | E, 1, F, S | Same.
Mynatt et al., 1978 | E, D, 1, F | Coded the same, but C. & P. collapsed across interesting and dull tasks.
Newman & Layton, 1984 | E, D, 1, F | Excluded, type II.
Ogilvie & Prior, 1982 | E, 1, F | Same.
Okano, 1981, Exp. 1 | E, 1, F, S | Excluded, type II.
Okano, 1981, Exp. 2 | N, E, 1, F, S | Excluded, type II.
Table 1a: (Continued)
Study | Variables | Comparison with Cameron & Pierce’s (1994) analysis
Orlick & Mosher, 1978 | V, U, P, 1, F | Same coding for verbal and unexpected. In performance contingent, children got rewards “if you do a good job today and tomorrow on the balance board,” but C. & P. coded it as completion contingent. There were discrepancies in the effect sizes.
Pallak et al., 1982 | V, U, P, 1, F | Same for verbal and unexpected. C. & P. did not report how they coded the tangible expected rewards condition, which was performance contingent.
Patrick, 1985 (D) | E, P, 1, F, S | Excluded, type I.
Perry et al., 1977 | E, 1, F, S | Excluded, type II.
Picek, 1976 (D) | E, P, 2, F, S | Excluded, type I.
Pittman et al., 1977 | P, 2, F, S | Same coding, but C. & P. used only self-report. We also used free-choice persistence, calculated as the number of trials.
Pittman et al., 1980 | V, IC, 2, F | Same except that C. & P. did not do an analysis of informational versus controlling positive feedback.
Pittman et al., 1982, Exp. 1 | N, E, 1, F | Same codings and nearly the same free-choice effects. C. & P. imputed a self-report value of 0.00, but participants were not asked how interesting or enjoyable they found the activity.
Pittman et al., 1982, Exp. 2 | E, 1, F | Nearly the same.
Porac & Meindl, 1982 | C, 2, F | C. & P. coded this engagement contingent, but participants received $1.50 for each puzzle solved. C. & P. reported a comparison for 40 experimental and 20 control participants, but there were only 50 participants in the study. We calculated the reward effect size based on a comparison of the rewarded groups with neutral and extrinsic mind sets versus the non-rewarded groups with neutral and extrinsic mind sets, because that comparison provided corresponding reward versus no-reward conditions.
Pretty & Seligman, 1984, Exp. 1 | V, U, E, 2, F, S | Same for unexpected and engagement contingent. Nearly the same for verbal on free-choice.
Pretty & Seligman, 1984, Exp. 2 | U, E, 2, F, S | Same.
Reiss & Sushinsky, 1975, Exp. 1 | E, 1, F | Same.
Rosenfield et al., 1980 | P, 2, F, S | This study had performance-contingent, completion-contingent, and task-noncontingent groups, and a control group with feedback comparable to that in performance contingent. There was no appropriate control group for completion contingent or task noncontingent. It also crossed tangible rewards with positive versus negative feedback. C. & P. reported a verbal effect for positive versus negative feedback, and then they collapsed across feedback to examine tangible-reward effects. We did a moderator analysis of rewards signifying positive versus negative feedback. C. & P. listed a performance-contingent self-report d = 2.80, but the correct d was 0.22. For free-choice, there was a modest discrepancy.
Ross, 1975, Exp. 1 | E, 1, F, S | Same for free-choice; they did not include self-report.
Ross, 1975, Exp. 2 | E, 1, F, S | Nearly the same for free-choice; they did not include self-report.
Ross et al., 1976 | N, E, 1, F | Same for engagement contingent. In the other group, children were rewarded “for waiting,” which is task noncontingent, but C. & P. coded it engagement contingent.
Ryan, 1982 | IC, 2, F | We included this study only in the supplemental meta-analysis of Informational versus Controlling verbal rewards. C. & P. excluded it.
Table 1a: (Continued)
Study | Variables | Comparison with Cameron & Pierce’s (1994) analysis
Ryan et al., 1983 | V, E, P, IC, 2, F, S | Same on verbal and engagement contingent. There were two performance-contingent groups, one informational and one controlling. There were three no-reward control groups, one with informational positive feedback, one with controlling positive feedback, and one with no-feedback. We compared performance-contingent both to comparable-feedback controls and no-feedback controls in the moderator analyses. C. & P. did only the comparable-feedback comparisons. Also, C. & P. did not do an informational-controlling comparison.
Salancik, 1975 | P, 2, F, S | Same coding. C. & P. collapsed across positive and negative feedback conditions, but we did a moderator analysis for positive versus negative.
Sansone, 1986 | V, 2, S | Same.
Sansone, 1989 | V, 2, S | Same.
Sansone et al., 1989 | V, 2, S | Same.
Sarafino, 1984 | E, 1, F, S | Same.
Shanab, 1981 | V, 2, F, S | Same.
Shiffman-Kaufman, 1990 (D) | E, P, 1, F, S | Excluded, type I. For comparability with other studies, we used only data from the 10-day assessments.
Smith, 1975 (D) | V, U, P, 2, F, S | Excluded, type I.
Smith, 1980 (D) | E, D, 1, F | Excluded, type I. In this study, there was also a condition called positive feedback, but the statements were not competence feedback.
Smith & Pittman, 1978 | P, 2, F, S | Same for self-report. C. & P. imputed a score of 0.00 for free-choice performance, even though means and significance tests were reported.
Sorensen & Maehr, 1976 | C, 1, F | Excluded, type II.
Staw et al., 1980 | C, 2, S | Participants got a $1 reward for completing 15 puzzles, making it completion contingent, but C. & P. coded it engagement contingent.
Swann & Pittman, 1977, Exp. 1 | N, E, 1, F | Same.
Swann & Pittman, 1977, Exp. 2 | E, 1, F | There were two engagement-contingent groups, an engagement-contingent plus verbal-rewards group, and two no-reward control groups. There was not a control group for the engagement plus verbal group. We compared the two engagement to the two control groups, but C. & P. used all three reward groups.
Taub & Dollinger, 1975 | P, 2, S | Same.
Thompson et al., 1993 | E, 2, F | Excluded, type III.
Tripathi & Agarwal, 1985 | V, E, 2, F, S | Nearly the same.
Tripathi & Agarwal, 1988 | E, P, 2, F, S | Same for engagement contingent on free-choice. For performance contingent, there were two tasks, with free-choice data reported for only one. Both we and C. & P. used the data for the one task and assigned d = 0.00 for the other, but C. & P. averaged the effects whereas we combined them meta-analytically. In the self-report data, C. & P. combined the engagement and performance conditions so it is unclear which analysis they were used in.
Table 1a: (Continued)
Study | Variables | Comparison with Cameron & Pierce’s (1994) analysis
Vallerand, 1983 | V, 1, S | Same.
Vallerand & Reid, 1984 | V, 2, S | Same.
Vasta & Stirpe, 1979 | C, 1, F | This study had pre-post data for a rewards group and a control group. C. & P. did pre-post analyses for the rewards group and ignored the control group. We compared the rewards group to the control group with pre-post analyses. We coded it completion contingent, but C. & P. did not code it.
Weinberg & Jackson, 1979 | P, 2, S | Same.
Weiner, 1980 | C, 2, F, S | Participants received $.25 for each anagram completed, which makes it completion contingent, but C. & P. coded it performance contingent.
Weiner & Mander, 1978 | E, P, 2, F, S | Same.
Williams, 1980 | E, 1, F, S | Same.
Wilson, 1978 (D) | E, D, 2, F, S | Excluded, type I.
Wimperis & Farr, 1979 | N, C, 2, S | In one group, participants received $1.75 for being in the study, making it task noncontingent, but C. & P. coded it engagement contingent. In the other, participants “were paid for each model or subunit completed,” making it completion contingent, but C. & P. coded it performance contingent.
Yuen, 1984 (D) | E, 2, F, S | Excluded, type I.
Zinser, 1982 | V, 1, F | Same.
Note: (D) = Unpublished Dissertation; V = Verbal Rewards; U = Unexpected Tangible Rewards; N = Task-Noncontingent Rewards; E = Engagement-Contingent Rewards; C = Completion-Contingent Rewards; P = Performance-Contingent Rewards; D = Dull-Task condition included in study and used in supplemental meta-analysis; IC = Informational versus Controlling comparison was made in supplemental meta-analysis. The code of 1 means the participants were children and the code of 2 means they were undergraduates. Finally, F means that the free-choice dependent measure was used and S means that the self-report measure was used.
¹ Same means that Cameron and Pierce and we coded the study the same, used the same control groups, and found effect sizes that did not differ from each other by more than 0.10 in either direction.
² C. & P. refers to Cameron and Pierce.
³ Nearly the same means the studies were coded the same and the same control groups were used, but that the effect sizes were different by more than 0.10, probably due to differences in estimation of standard deviations. If the discrepancy is large, we make note of that.
⁴ “Excluded, type I” refers to dissertations, and Cameron and Pierce excluded all dissertations.
⁵ “Excluded, type II” refers to studies that Cameron and Pierce excluded for no apparent reason.
⁶ Cameron and Pierce (1994) did not use the term “engagement-contingent.” When we say they coded a reward engagement-contingent, it means that they coded it as both “task-contingent” and what they referred to as “not contingent using a behavioral definition.” Because the intersection of those two codes is equivalent to our engagement-contingent code, we say that they coded it as engagement-contingent to minimize confusion for the reader. Similarly, they did not use the term completion-contingent, but what they coded as both “task-contingent” and “contingent using a behavioral definition” is equivalent to what we call completion-contingent.
⁷ These studies used both interesting and uninteresting tasks. We excluded the uninteresting tasks from the primary meta-analyses and included them in the supplemental meta-analysis concerned with initial task interest. Cameron and Pierce collapsed across the interesting and dull tasks even though it has been firmly established in the literature that initial task interest interacts with reward effects.
⁸ “Excluded, type III” refers to studies that Cameron and Pierce excluded because they were published after Cameron and Pierce’s cut-off date.
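The variable codes defined in the note to Table 1a lend themselves to a simple lookup. The sketch below is an illustrative decoder only; the dictionary name `CODES`, the function name `decode`, and the shortened descriptions are assumptions introduced here, while the code letters and their meanings come from the table note.

```python
# Code meanings paraphrased from the note to Table 1a.
CODES = {
    "V": "verbal rewards",
    "U": "unexpected tangible rewards",
    "N": "task-noncontingent rewards",
    "E": "engagement-contingent rewards",
    "C": "completion-contingent rewards",
    "P": "performance-contingent rewards",
    "D": "dull-task condition included",
    "IC": "informational versus controlling comparison",
    "1": "children",
    "2": "undergraduates",
    "F": "free-choice measure",
    "S": "self-report measure",
}


def decode(variables):
    """Expand a comma-separated code string from the Variables column."""
    return [CODES[token.strip()] for token in variables.split(",")]


print(decode("V, E, 1, F"))
```

Stripping whitespace around each token keeps the decoder tolerant of uneven spacing in the extracted table; an unknown code raises a `KeyError` rather than being silently skipped.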
Notes
1. The value k represents the number of effects considered in calculating a composite effect size. Because, for any given calculation, the data were aggregated across all relevant conditions within a study in order to ensure independence of effect sizes, k also represents the number of studies that were included in the calculation of a composite effect size. The value d represents the composite effect size corrected for reliability (Hedges & Olkin, 1985). In regard to CIs, if both endpoints are on the same side of 0.00, it indicates that the mean for the reward groups is significantly different from the mean for the no-reward groups.
2. Although one end of the CI appears to be 0.00, it was actually slightly negative and was rounded to 0.00. A significance test indicated that the composite effect size was significant.
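The reading of k, d, and the CI described in these notes can be illustrated with a generic fixed-effect composite. The sketch below is not the authors' procedure (they cite Hedges & Olkin, 1985, and the exact computations are not given here); it assumes inverse-variance weighting and a 95% normal-theory interval, and the three (d, variance) pairs are invented purely for illustration.

```python
import math


def excludes_zero(ci_low, ci_high):
    """True when both CI endpoints lie strictly on the same side of 0."""
    return (ci_low < 0 and ci_high < 0) or (ci_low > 0 and ci_high > 0)


def fixed_effect_composite(effects):
    """Inverse-variance weighted mean of (d, variance) pairs.

    Returns (composite d, ci_low, ci_high), assuming a 95% normal interval.
    """
    weights = [1.0 / variance for _, variance in effects]
    total = sum(weights)
    d_bar = sum(w * d for w, (d, _) in zip(weights, effects)) / total
    se = math.sqrt(1.0 / total)
    return d_bar, d_bar - 1.96 * se, d_bar + 1.96 * se


# Hypothetical per-study effects (d, sampling variance); not data from the paper.
effects = [(-0.40, 0.04), (-0.10, 0.08), (-0.60, 0.05)]
d_bar, low, high = fixed_effect_composite(effects)
unweighted = sum(d for d, _ in effects) / len(effects)

print(f"k = {len(effects)}, composite d = {d_bar:.2f}, CI = {low:.2f}, {high:.2f}")
print(f"simple average of the same effects = {unweighted:.2f}")
print("reward and no-reward means differ:", excludes_zero(low, high))

# A printed endpoint of -0.00 is numerically zero, so this check returns False;
# Note 2 explains that the unrounded endpoint was slightly negative.
print(excludes_zero(-0.31, -0.00))
```

The weighted composite and the simple average differ because more precise studies receive more weight, which is one reason combining effects meta-analytically can give a different answer from averaging them.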
References
Amabile, T. M. (1982). Social psychology of creativity: A consensual assessment technique. Journal of Personality and Social Psychology, 43, 997–1013.
Amabile, T. M., DeJong, W., & Lepper, M. R. (1976). Effects of externally imposed deadlines on subsequent intrinsic motivation. Journal of Personality and Social Psychology, 34, 92–98.
Benware, C., & Deci, E. L. (1984). Quality of learning with an active versus passive motivational set. American Educational Research Journal, 21, 755–765.
Boggiano, A. K., Barrett, M., Weiher, A. W., McClelland, G. H., & Lusk, C. M. (1987). Use of the maximal-operant principle to motivate children’s intrinsic interest. Journal of Personality and Social Psychology, 53, 866–879.
Cameron, J., & Pierce, W. D. (1994). Reinforcement, reward, and intrinsic motivation: A meta-analysis. Review of Educational Research, 64, 363–423.
Cameron, J., & Pierce, W. D. (1996). The debate about rewards and intrinsic motivation: Protests and accusations do not alter the results. Review of Educational Research, 66, 39–52.
Cordova, D. I., & Lepper, M. R. (1996). Intrinsic motivation and the process of learning: Beneficial effects of contextualization, personalization, and choice. Journal of Educational Psychology, 88, 715–730.
deCharms, R. (1968). Personal causation. New York: Academic Press.
Deci, E. L. (1971). Effects of externally mediated rewards on intrinsic motivation. Journal of Personality and Social Psychology, 18, 105–115.
Deci, E. L. (1972a). Effects of contingent and non-contingent rewards and controls on intrinsic motivation. Organizational Behavior and Human Performance, 8, 217–229.
Deci, E. L. (1972b). Intrinsic motivation, extrinsic reinforcement, and inequity. Journal of Personality and Social Psychology, 22, 113–120.
Deci, E. L., Betley, G., Kahle, J., Abrams, L., & Porac, J. (1981). When trying to win: Competition and intrinsic motivation. Personality and Social Psychology Bulletin, 7, 79–83.
Deci, E. L., Connell, J. P., & Ryan, R. M. (1989). Self-determination in a work organization. Journal of Applied Psychology, 74, 580–590.
Deci, E. L., Koestner, R., & Ryan, R. M. (1999). A meta-analytic review of experiments examining the effects of extrinsic rewards on intrinsic motivation. Psychological Bulletin, 125, 627–668.
Deci, E. L., & Ryan, R. M. (1980). The empirical exploration of intrinsic motivational processes. In L. Berkowitz (Ed.), Advances in experimental social psychology (Vol. 13, pp. 39–80). New York: Academic Press.
Deci, E. L., & Ryan, R. M. (1985). Intrinsic motivation and self-determination in human behavior. New York: Plenum.
Deci, E. L., & Ryan, R. M. (1991). A motivational approach to self: Integration in personality. In R. Dienstbier (Ed.), Nebraska Symposium on Motivation: Vol. 38. Perspectives on motivation (pp. 237–288). Lincoln: University of Nebraska Press.
Deci, E. L., Schwartz, A. J., Sheinman, L., & Ryan, R. M. (1981). An instrument to assess adults’ orientations toward control versus autonomy with children: Reflections on intrinsic motivation and perceived competence. Journal of Educational Psychology, 73, 642–650.
Eisenberger, R., & Cameron, J. (1996). Detrimental effects of reward: Reality or myth? American Psychologist, 51, 1153–1166.
Grolnick, W. S., & Ryan, R. M. (1987). Autonomy in children’s learning: An experimental and individual difference investigation. Journal of Personality and Social Psychology, 52, 890–898.
Harackiewicz, J. M., Manderlink, G., & Sansone, C. (1984). Rewarding pinball wizardry: The effects of evaluation on intrinsic interest. Journal of Personality and Social Psychology, 47, 287–300.
Harter, S. (1974). Pleasure derived by children from cognitive challenge and mastery. Child Development, 45, 661–669.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. New York: Academic Press.
Johnson, B. T. (1993). DSTAT 1.10: Software for the meta-analytic review of literatures [Software and manual]. Hillsdale, NJ: Erlbaum.
Kohn, A. (1996). By all available means: Cameron and Pierce’s defense of extrinsic motivators. Review of Educational Research, 66, 1–4.
Lepper, M. R., Greene, D., & Nisbett, R. E. (1973). Undermining children’s intrinsic interest with extrinsic rewards: A test of the “overjustification” hypothesis. Journal of Personality and Social Psychology, 28, 129–137.
Lepper, M. R., Keavney, M., & Drake, M. (1996). Intrinsic motivation and extrinsic rewards: A commentary on Cameron and Pierce’s meta-analysis. Review of Educational Research, 66, 5–32.
McGraw, K. O., & McCullers, J. C. (1979). Evidence of a detrimental effect of extrinsic incentives on breaking a mental set. Journal of Experimental Social Psychology, 15, 285–294.
Mossholder, K. W. (1980). Effects of externally mediated goal setting on intrinsic motivation: A laboratory experiment. Journal of Applied Psychology, 65, 202–210.
Reeve, J., Bolt, E., & Cai, Y. (1999). Autonomy-supportive teachers: How they teach and motivate students. Journal of Educational Psychology, 91, 537–548.
Rosenfield, D., Folger, R., & Adelman, H. (1980). When rewards reflect competence: A qualification of the overjustification effect. Journal of Personality and Social Psychology, 39, 368–376.
Ryan, R. M. (1982). Control and information in the intrapersonal sphere: An extension of cognitive evaluation theory. Journal of Personality and Social Psychology, 43, 450–461.
Ryan, R. M., & Deci, E. L. (1996). When paradigms clash: Comments on Cameron and Pierce’s claim that rewards do not undermine intrinsic motivation. Review of Educational Research, 66, 33–38.
Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. American Psychologist, 55, 68–78.
Ryan, R. M., & Grolnick, W. S. (1986). Origins and pawns in the classroom: Self-report and projective assessments of individual differences in children’s perceptions. Journal of Personality and Social Psychology, 50, 550–558.
Ryan, R. M., & La Guardia, J. G. (1999). Achievement motivation within a pressured society: Intrinsic and extrinsic motivations to learn and the politics of school reform. In T. C. Urdan (Ed.), Advances in motivation and achievement: The role of context (Vol. 11, pp. 45–85). Greenwich, CT: JAI Press.
Ryan, R. M., Mims, V., & Koestner, R. (1983). Relation of reward contingency and interpersonal context to intrinsic motivation: A review and test using cognitive evaluation theory. Journal of Personality and Social Psychology, 45, 736–750.
Ryan, R. M., & Stiller, J. (1991). The social contexts of internalization: Parent and teacher influences on autonomy, motivation and learning. In M. L. Maehr & P. R. Pintrich (Eds.), Advances in motivation and achievement (Vol. 7, pp. 115–150). Greenwich, CT: JAI Press.
Smith, W. E. (1975). The effect of anticipated vs. unanticipated social reward on subsequent intrinsic motivation. Unpublished doctoral dissertation, Cornell University, Ithaca, NY.
Zuckerman, M., Porac, J., Lathin, D., Smith, R., & Deci, E. L. (1978). On the importance of self-determination for intrinsically motivated behavior. Personality and Social Psychology Bulletin, 4, 443–446.
53
Beyond the Rhetoric: Understanding Achievement and Motivation in Catholic School Students
Janine Bempechat, Beth A. Boulay, Stephanie C. Piergross and Kenzie A. Wenk

In the early 1980s, James Coleman’s work on the academic advantage associated with Catholic school membership generated a variety of research studies that examined the nature and extent of these early findings (Coleman, Hoffer, & Kilgore, 1982). Since then, a growing literature has documented that low-income students of color in Catholic high schools tend to outperform their peers in public schools on virtually every measure of pre- and post-secondary achievement, including GPA, SAT scores, enrollment in higher-track coursework, and high school completion (Bryk, Lee, & Holland, 1993; Carbonaro, 2003; Ellison & Hallinan, 2004; Morgan, 2001; Sander & Krautman, 1995). More recently, research on college acceptance has found that, on average and relative to their public school peers, students who graduate from Catholic high schools are more likely to attend college and to be admitted to more selective colleges (Altonji, Elder, & Taber, 2005; Eide, Goldhaber, & Showalter, 2004). The research on what has come to be called the “Catholic school advantage” cannot be taken lightly. It is known that urban Catholic schools advance their students’ achievement with far fewer resources, and with curricula and pedagogy that are not necessarily on the cutting edge of educational research (Bempechat, Drago-Severson, & Boulay, 2002; Cattaro, 2002a).
Source: Education and Urban Society, 40(2) (2008): 167–178.
Furthermore, inner-city Catholic schools achieve the greatest success with students who are the most disadvantaged and at risk for school failure, for both demographic and public policy reasons (Ilg, Massucci, & Cattaro, 2004; Peterson & Walberg, 2002). In other words, students at risk – those who are poor, whose first language is not English, who are members of an ethnic minority, and whose own parents have limited educations – are the most likely to suffer the negative consequences of the resurgence of school segregation and the increasing use of school promotion examinations (Heubert, 2002; Orfield, Frankenberg, & Lee, 2002). Yet enduring concerns about self-selection, although not inappropriate, have made it commonplace to attribute the higher achievement of Catholic school students to factors having nothing to do with pedagogy. Because Catholic schools are schools of choice, students who enroll may be smarter, be better off materially, and have parents who are themselves better educated and therefore more motivated to ensure academic excellence in their children (Chubb, 2005; Goldberger & Cain, 1982). It could also be the case that administrators select the most well-behaved students and expel the most disruptive from their midst, making teachers’ jobs that much more manageable (Hoxby, 2003; Salganik & Karweit, 1982). Despite increased evidence of negative selection (i.e., that Catholic schools educate under- rather than over-achieving students; Sander, 2001), the positive outcomes associated with Catholic school enrollment seem to be routinely dismissed. This is regrettable because something impressive is going on in urban Catholic schools, something from which we can and should want to learn (Cattaro, 2002b). As educators and students of educational reform, our goal is to look beyond outcome variables to probe the underlying factors that motivate students to achieve.
The purpose of this article is to suggest new directions for research that go beyond an enumeration of outcome scores. Specifically, we present a program of research that we designed to build grounded theory on achievement and motivation in urban Catholic high school students. We first provide the reader with a brief background on the research context in which our work has evolved.
The Motivational Underpinnings of Success

Research in achievement motivation has demonstrated that students’ beliefs about what it takes to do well in school are better predictors of their performance than even achievement test scores (Grant & Dweck, 2003). Beliefs about reasons for success and failure are particularly powerful because they predict the extent to which students will persist in the face of difficulty (Weiner, 2005). For example, students who tend to attribute poor performance to internal factors within their control, such as lack of effort, are more likely to feel ashamed and work harder for the next assignment or test.
In contrast, students who tend to implicate external factors over which they have little or no control, such as a difficult test or a teacher who does not like them, are more likely to believe that investing more effort for the next test will be of little consequence for their ultimate performance (Weiner, 1994). To the extent that urban Catholic school students are outperforming their public school peers academically, is it possible that they adhere to beliefs that place them at an advantage motivationally? We asked more than 1,000 public and Catholic school fifth and sixth graders to read short scenarios that described success or failure experiences and to indicate the extent to which effort (e.g., “I’m careful in my work”), ability (e.g., “Everyone knows I do math badly”), or external factors (e.g., “The teacher likes me”) could explain the outcomes, if they themselves had actually lived through these experiences. Overall, we found that, relative to their public school peers, African American and Latino Catholic school students attributed success and failure to causes that were helpful for learning (Bempechat et al., 2002). For example, these students were much less likely than their public school peers to believe that success could be because of external factors, such as luck or an easy test. Relative to their public school peers, the Catholic school African American students were much less likely to believe that failure could be because of external factors, such as being disliked by the teacher or having studied the wrong material. Again, this is a helpful belief because it implies that failure is controllable and potentially avoidable. This first phase of our work provided evidence that, at the elementary school level, Catholic school students seem to hold more adaptive beliefs about learning than do their public school peers. However, these findings were limited by the very method we chose to employ. 
Because we used a questionnaire, the particular learning beliefs that students responded to and the way in which they responded (a 5-point scale) were dictated by us from the outside (an etic perspective). However, what we as researchers feel are important constructs may not match what students believe to be important (an emic perspective; Strauss, 1987). We thus launched a 4-year, longitudinal investigation of the ways in which adolescents in Catholic high schools conceptualize and speak about learning, achievement, and motivation in the context of their educational experiences. Our goal is to build a grounded theory by focusing on the issues that students raise in response to open-ended and semistructured interviews (Bempechat, 2003).
Developing Grounded Theory: A Longitudinal Investigation

We designed our study to address the following research question: How do low-income adolescents of color construct meaning about learning, achievement, and motivation? More specifically, how do these students conceptualize the role that education plays in their lives, present and future? How do they perceive and interpret teachers’ goals for them? What are the ways in which family and peers foster or inhibit their school progress?

At each of two urban Catholic high schools, we have been following a group of 20 students, half females and half males, all of whom come from low-income families. These students are not exemplary pupils. Many are struggling to stay in good academic standing. When we began this study, half of the students were 9th graders and half were 10th graders. At Sienna High, the students are African American. At Norman High, the students are of Dominican descent.

With the schools’ and parents’ permission, we conducted two individual interviews with each student in the spring of 2000. With few exceptions, each student was interviewed by the same member of our research team. Each interview lasted about 45 minutes, or one school period. We audiotaped all the interviews for later transcription, and all students were assured that their comments and opinions would remain anonymous. They also understood that they were free to refuse to answer any question and knew that they had the option of withdrawing from the study at any time.

We designed the first interview to elicit open-ended descriptions of students’ learning experiences. Our goal was to let the students dictate the topics of discussion. We were very concerned about not putting words into the students’ mouths. We probed their responses by asking them to say more or give examples to illustrate the points they were making. After an initial reading of these interview transcripts, we developed the second interview as a semistructured questionnaire designed to examine perceived parent and peer support for learning and to ask students to speak about the meanings of academic-related words such as learning, motivation, success, failure, ability, and effort.
After we reviewed the material from both interviews, we raised common themes in a focus group interview at the end of the school year. During a breakfast meeting at Norman High and a pizza lunch at Sienna High, the research team members asked the students to comment on issues that emerged in many of the interviews. Following conventions of qualitative analyses, we first read each interview to get a sense of what each individual student was expressing – his or her own beliefs, concerns, and questions. We then read “across” the interviews, paying particular attention to words and phrases that the students used frequently and spontaneously. The themes that emerged – culture of caring, personal responsibility, and adaptive achievement beliefs – provide distinct categories, but the reader will see that they are very much interrelated.
“You Need to Work Up”: The Culture of Caring

The students we interviewed described their school as a caring environment, where teachers take a deep interest in both their academic and psychosocial well-being. Many students described Sienna and Norman High Schools as “family,” even though Norman High has an enrollment of 1,100 students. Enrique described his perceptions in terms of his adjustment to the school:

I thought it was going to be a lot harder for me, and it’s like, everybody here at the school is just like one big family, they try to help out a lot. But the teachers, they help any time they can, like if they see you falling off, like, in the beginning, I was doing, I was like doing pretty good. But once the basketball season started, I kinda fell off a little bit. The teachers were like, “I see your grades dropping a little, you need to work up.” So I started staying after, started to keep working. . . . It’s like they care for you so much at this school, they make sure they don’t want nobody, you know, to fall down in their grades and fail and not be able to, you know, reach their goals in life.
The students described their teachers variously, as “nice,” “mean,” “cool,” “boring,” and “strict.” Regardless of the characterization, the students’ overall comments describe a faculty who care deeply about them and believe in their ability to learn. Abel experienced this as being pushed to higher levels, whereas for Darnell, in particular, this realization came through the fact that many teachers know who he is:

Umm, the teachers here are like really cool. They, they’re not narrow minded about just one thing. They’re for everybody, they help everybody. If you’re, you’re doing bad or something, or you need help. They are there, like, to put you up to the other level that you should be in. (Abel)

There’s more people like, that know your name, like teachers, some teachers … they might like know, like know the kids that really want to succeed in life. And they won’t really know the kids, um, that really like, have low grades – they won’t know their name, and they really wouldn’t care about cuz it’s only the people that have to care about. (Darnell)
“It Shows If You Don’t Have Effort”: Personal Responsibility in Learning

The theme of personal responsibility emerged in the many comments that students made about the importance of effort and the necessity of setting goals. For many students, effort can be a double-edged sword that leaves them unsure about how much and when they should invest effort in their learning (Covington & Dray, 2002). Most students recognize that effort will enhance their academic performance. Yet they also realize that if they have to try hard, this implies that they probably are not smart. We did not hear this view from any of the students we interviewed. Quite to the contrary, the students spoke about the importance of persistence, as in Juana’s comments:

Like, because I mostly think that effort is something that you put in, and it shows if you have effort, it shows if you doesn’t have effort, don’t have effort. Because, umm, if you … let’s say you learn something but you don’t get it and you give up. You know, there’s no effort there. It’s learning something that you don’t get because you don’t get it automatically. Umm, it really plays a part in school because . . . also, umm, if you don’t have effort . . . in school you’re not going to get everything, you’re not like a genius, you know. Albert Einstein didn’t get everything, you know. Umm, but like if you have effort and you want to do something so bad, like, it often turns into ability. Because if you have ability and you have effort, to do something bad you’re going to eventually. . . . Well, it depends on how much time it takes, but you’re going to eventually make a difference in your school, work, and whatever you do.
The notion that effort – an unstable quality – can be eventually transformed into ability – a more enduring trait – is notable because it is more common among very young children (Nicholls, 1978; Nicholls, Nolen, & Thorkildsen, 1995). As early as the second grade, most students begin to view the relationship between effort and ability as compensatory. In other words, they begin to endorse the view that the harder they have to try, the “dumber” they must be. A great deal of research attention has been paid to classroom factors that can promote the mature view that Juana articulates (Cheung & Rudowicz, 2003; Eccles, Roeser, Vida, Fredricks, & Wigfield, 2006; Schunk & Pajares, 2002).

Nadia talked about her desire to make it onto the honor roll, noting that her failure thus far is her own doing:

[It’s a challenge] being on the honor roll. [laughing] I can’t get grades good enough to get on the honor roll. . . . I ain’t studying hard. I want it, just to be on it, cause like, ever since I was like in sixth grade and up I’ve never been on the honor roll. But, like, in elementary school I was on the honor roll a lot. And now no more.
Indeed, the students who conveyed some dissatisfaction with their performance blamed only themselves, mirroring findings from our questionnaire study of elementary school students.
“Failure Is Not Really Something Bad”: Adaptive Achievement Beliefs

For many students, the experience of failure can be debilitating and lead to learned helplessness and feelings of inability. Educators have found that many students can be helped by reorienting their perception of failure toward the belief that mistakes and setbacks are a natural part of learning (Lepper, Corpus, & Iyengar, 2005). In this context, Margarita’s comments about failure demonstrate that she values seeing the positive in what many consider a negative experience:
What [failure] means, umm … it’s not really something bad. It something that you need to try again. Umm, even if you fail at something that you want to do, don’t, don’t give up. Never, never give up. Umm, it’s something that probably takes motivation away from you. But, at the same time it wants you . . . umm, brings you new ideas to your head to break the obstacles. Like can I try this again, can I do something else. Umm, want me to move on our something. Like, it’s not really a bad term because we people are like saying failure, they are like, oh! you’re a failure. It’s not really bad because you have so many chances in this life and you can always try again. And, umm, I think, like let’s say I was to fail at something. I would try my hardest to do it again. To . . . like . . . make sure that something that I did wrong is fixed. Or because failure is the best way of learning, because when you fail something you learn, you learn more, because you want to succeed. And if you fail again, you even learn more. But like, and then at the same time when you fail and then you succeed it brings like … if you just succeed it’s just like I succeed, but if you fail and then you succeed, like, you learn more because you’re like, I failed at something but then I didn’t give up. . . . And then I got it right. That’s what I think [failure] means for me.
We found that, even when speaking about academic challenges, many students fell back on effort as a means of strategizing their way out of difficulty. Notably, Hector can articulate a strategy for coping with challenge even when he dislikes the work in question:

Chemistry class, umm, that’s the toughest class for any first year. To be honest with you, that’s a tough class. And, and you know, I’m not going to lie to you but I can’t stand that class. I hate, I hate that class. [laughing] And, umm, it’s tough man. You have to memorize the periodic tables, the atoms. You know, stoichiometry, this and that, it’s tough man. I mean, but like I said you got to show perseverance and never say never. And, go for it. And that’s what I try to do.
Learning from Catholic Schools

The most interesting finding of this research is that, when given the opportunity to express their views, these Catholic school students focused on their teachers’ commitment to them as learners and articulated mature and sophisticated views about their learning. The level of support and care that these students expressed has been reported in previous research on Catholic schools (Nelson & Bauch, 1997). Importantly, this finding dovetails not only with Noddings’s (2005) work on the positive psychosocial influence of caring adults in students’ lives but also with Wentzel’s (2002) recent research on social motivation. Her research has revealed that students who feel cared for and who have supportive teachers who mentor them tend to do better in school, both academically and socially. Furthermore, they tend to be supportive of their peers and more prosocial in and out of the classroom (Wentzel, 2004).

However, we are struck by the degree to which educational goals and expectations were clearly communicated and understood by all students, an observation that has been made in previous work on Catholic school pedagogy (Hill, Foster, & Gendler, 1990). This is even more compelling when we consider that many of the students we interviewed fit the literature’s definition of those who are at risk for school failure (RAND, 2005). The students we interviewed perceived that their teachers not only hold them to high standards but also offer the support they need to meet these standards. For these students, the standards are not mysterious – they are clear and unambiguous, and they apply to everyone. In setting such goals, teachers are communicating the belief that all students have what it takes to achieve at the level expected of them. It is certainly the case that in Catholic schools, as in public schools, children become increasingly aware of who learns faster or who is “smarter” (Marsh, Hau, & Craven, 2004). Nonetheless, the message that all teachers can promote is that despite differences in rates of learning, everyone can and will learn.

As all of us who study education reform know, higher standards in and of themselves do not guarantee higher achievement – they must be accompanied by ongoing support (Heubert, 2002). This support, according to the students we interviewed, was both emotional and academic. Teachers provided age-appropriate, pragmatic suggestions that helped them focus their efforts in ways that were likely to improve their performance. From the perspective of achievement motivation theory, the suggestions themselves can help to foster a sense of control over how well they do in school.
Furthermore, to the extent that the teachers are offering up strategies for dealing with difficulty, they may be modeling persistence, a component of motivation that is critical for school success (Eccles et al., 2006).

Finally, the students in our study perceived the standards and support they received to be ongoing, ebbing and flowing with variations in their performance. In other words, these students understood that their teachers would not tolerate performance that did not meet their definition of an acceptable standard. These students knew that they could not rest on previous laurels without being taken to task, should the quality of their work deteriorate. According to these students, their teachers appeared to be relentless in their pursuit of high-quality work from their pupils. As Nicholls (1978; Nicholls et al., 1995) and others have shown, this insistence serves to communicate an unwavering belief in students’ ability to master the required work, a conviction that is a powerful motivator for all students (EdSource, 2006; Rosenthal, 2002).

We cannot know the extent to which the adaptive beliefs about learning, endorsed by Margarita, Hector, and their peers, were fostered by the teachers’ pedagogical styles or were the result of factors having nothing to do with the school. Our goal remains to identify the influences that the students themselves perceive as important. In this regard, it appears that these students appreciate teachers who believe in them, who closely monitor their progress, and who provide a variety of emotional and academic supports to help them excel in school. In our view, the more we understand about how students think about and interpret their educational experiences, the better equipped we are to develop models of intervention that promise success for all students. Our hope is that the important lessons that we can glean from the success of Catholic schools will not be lost in the ongoing debate over self-selection.
Authors’ Note

The authors gratefully acknowledge the ongoing support of Sr. Kathleen Carr, CSJ, superintendent of schools, Archdiocese of Boston; Robert J. McCarthy, president, David Paskind, associate principal, Sister Ellen Powers, CSJ, former president/principal, and the faculty and students of North Cambridge Catholic High School; and David M. DeFillippo, principal, Christopher Sullivan, assistant principal, and the faculty and students of Central Catholic High School, Lawrence. This work was supported in part by a Spencer Foundation Small Grants Award.
References
Altonji, J. G., Elder, T. E., & Taber, C. R. (2005). Selection on observed and unobserved variables: Assessing the effectiveness of Catholic schools. Journal of Political Economy, 113(1), 151–184.
Bempechat, J. (2003). Meeting the psychological and emotional needs of young adolescents: Exploring achievement and motivation in Catholic high school students. Washington, DC: National Catholic Education Association.
Bempechat, J., Drago-Severson, E., & Boulay, B. A. (2002). Attributions for success and failure in mathematics: A comparative study of Catholic and public school students. Catholic Education: A Journal of Inquiry and Practice, 5, 357–372.
Bryk, A., Lee, V., & Holland, P. (1993). Catholic schools and the common good. Cambridge, MA: Harvard University Press.
Carbonaro, W. J. (2003). Sector differences in student learning: Differences in achievement gains across school years and during the summer. Catholic Education: A Journal of Inquiry and Practice, 7(2), 219–245.
Cattaro, G. M. (2002a). Catholic schools: Enduring presence in urban America. Education and Urban Society, 35(1), 100–110.
Cattaro, G. M. (2002b). Immigration and pluralism in urban Catholic schools. Education and Urban Society, 34(2), 199–211.
Cheung, C., & Rudowicz, E. (2003). Underachievement and attributions among students attending schools stratified by student ability. Social Psychology of Education, 6(4), 303–323.
Chubb, J. E. (2005). Within our reach: How America can educate every child. Lanham, MD: Rowman & Littlefield.
Coleman, J., Hoffer, T., & Kilgore, S. (1982). Cognitive outcomes in public and private schools. Sociology of Education, 55, 65–76.
Covington, M. V., & Dray, E. (2002). The developmental course of achievement motivation: A need-based approach. In A. Wigfield & J. S. Eccles (Eds.), Development of achievement motivation (pp. 33–56). San Diego, CA: Academic Press.
Eccles, J. S., Roeser, R., Vida, M., Fredricks, J., & Wigfield, A. (2006). Motivational and achievement pathways through middle childhood. In L. Balter & C. S. Tamis-LeMonda (Eds.), Child psychology: A handbook of contemporary issues (2nd ed., pp. 325–355). New York: Psychology Press.
EdSource. (2006). Similar students, different results: Why do some schools do better? Palo Alto, CA: Author.
Eide, E. R., Goldhaber, D. D., & Showalter, M. H. (2004). Does Catholic high school attendance lead to attendance at a more selective college? Social Science Quarterly, 85(5), 1335–1352.
Ellison, B. J., & Hallinan, M. T. (2004). Ability grouping in Catholic and public schools. Catholic Education: A Journal of Inquiry and Practice, 8(1), 107–129.
Goldberger, A., & Cain, G. (1982). The causal analysis of cognitive outcomes in the Coleman, Hoffer, and Kilgore report. Sociology of Education, 55, 103–122.
Grant, H., & Dweck, C. S. (2003). Clarifying achievement goals and their impact. Journal of Personality and Social Psychology, 85(3), 541–553.
Heubert, J. (2002). First, do no harm. Educational Leadership, 60(4), 26–30.
Hill, P. T., Foster, G. E., & Gendler, T. (1990). High schools with character. Santa Monica, CA: RAND.
Hoxby, C. (2003). The economics of school choice. Cambridge, MA: National Bureau of Economic Research Conference Report.
Ilg, T. J., Massucci, J. D., & Cattaro, G. M. (2004). Brown at 50: The dream is still alive in urban Catholic schools. Education and Urban Society, 36(3), 355–367.
Lepper, M. R., Corpus, J. H., & Iyengar, S. S. (2005). Intrinsic and extrinsic motivational orientation in the classroom: Age differences and academic correlates. Journal of Educational Psychology, 97(2), 184–196.
Marsh, H. W., Hau, K., & Craven, R. (2004). The big-fish-little-pond effect stands up to scrutiny. American Psychologist, 59(4), 269–271.
Morgan, S. L. (2001). Counterfactuals, causal effect heterogeneity, and the Catholic school effect on learning. Sociology of Education, 74, 341–374.
Nelson, M. D., & Bauch, P. A. (1997, March). African American students’ perceptions of caring teacher behaviors at Catholic and public schools of choice. Paper presented at the American Educational Research Association, Chicago.
Nicholls, J. G. (1978). The development of the concepts of effort and ability, perception of own attainment, and the understanding that difficult tasks require more ability. Child Development, 49, 800–814.
Nicholls, J. G., Nolen, S. B., & Thorkildsen, T. A. (1995). Big science, little teachers: Knowledge and motives concerning student motivation. In J. G. Nicholls & T. A. Thorkildsen (Eds.), Reasons for learning: Expanding the conversation on student-teacher collaboration (pp. 5–20). New York: Teachers College Press.
Noddings, N. (2005). Care and moral education. In H. S. Shapiro & D. E. Purpel (Eds.), Critical social issues in American education: Democracy and meaning in a globalizing world (pp. 297–308). Mahwah, NJ: Lawrence Erlbaum.
Orfield, G., Frankenberg, E. D., & Lee, C. (2002). The resurgence of school segregation. Educational Leadership, 60(4), 16–20.
Peterson, P. E., & Walberg, H. J. (2002). Countering the negative effect of poverty on learning. Chicago: Heartland Institute.
RAND. (2005). Children at risk: Consequences for school readiness and beyond. Santa Monica, CA: Author.
Rosenthal, R. (2002). The Pygmalion effect and its mediating mechanisms. In J. Aronson (Ed.), Improving academic achievement: Impact of psychological factors on education (pp. 25–36). San Diego, CA: Academic Press.
Salganik, L., & Karweit, N. (1982). Voluntarism and governance in education. Sociology of Education, 55, 152–161.
Sander, W. (2001). The effects of Catholic schools on religiosity, education, and competition (Occasional Paper NCSPE-OP-32). New York: Teachers College.
Sander, W., & Krautman, A. C. (1995). Catholic schools, dropout rates and educational attainment. Economic Inquiry, 33(2), 217–233.
Schunk, D. H., & Pajares, F. (2002). The development of academic self-efficacy. In A. Wigfield & J. S. Eccles (Eds.), Development of achievement motivation (pp. 15–31). New York: Academic Press.
Strauss, A. (1987). Qualitative analysis for social scientists. Cambridge, UK: Cambridge University Press.
Weiner, B. (1994). Integrating social and personal theories of achievement strivings. Review of Educational Research, 64(4), 557–573.
Weiner, B. (2005). Motivation from an attributional perspective and the social psychology of perceived competence. In A. J. Elliot & C. S. Dweck (Eds.), Handbook of competence and motivation (pp. 73–84). New York: Guilford.
Wentzel, K. R. (2002). Are effective teachers like good parents: Teaching styles and student adjustment in early adolescence. Child Development, 73(1), 287–301.
Wentzel, K. R. (2004). Understanding classroom competence: The role of social-motivational and self-processes. In R. V. Kail (Ed.), Advances in child development and behavior (Vol. 32, pp. 231–241). San Diego, CA: Elsevier.
54
Dimensions of School Motivation: A Cross-cultural Validation Study
Dennis M. McInerney and Kenneth E. Sinclair

Source: Journal of Cross-Cultural Psychology, 23(3) (1992): 389–406.

In a multicultural society such as Australia, educators are concerned with the school performance of children from various minority groups. Within the context of Australian education, aboriginal children appear particularly disadvantaged with regard to academic achievement and school retention, whereas the children of certain migrant minority groups appear, in the latter part of the century, to be performing particularly well. In recent studies, McInerney (1989, 1990, 1991a, 1991b; McInerney & Sinclair, 1991) has examined a range of factors that are considered influential in determining the success or otherwise of particular groups within school settings, and in particular the studies have focused on key variables that predict school retention for these groups. In a study of aboriginal, migrant, and Anglo-Australian students (McInerney, 1988, 1989), a hypothesized set of influential background variables was examined using the Facilitating Conditions Questionnaire (FCQ). Parental influence emerged as the major discriminating variable for those aboriginal children who continued with school. It was also apparent that the child’s feelings toward school and the perceived support the child received from teachers and friends to continue with school were critical variables distinguishing the aboriginal school-leaver from the nonleaver. Other variables, such as negative peer influence and perceived value of school, did not appear to be important discriminant variables. Although parental influence emerged as the most important discriminant variable for the nonaboriginal groups, affect to school and the positive influence of teachers and peers on the child’s decision making appeared less important. For these groups the perceived value of school and negative peer influence appeared relatively more important. Convergent evidence for the importance of parental influence on the child’s decision to continue with school was obtained in a further study with the same sample using the Behavioural Intentions Questionnaire (McInerney, 1990).

In addition to external factors such as parental encouragement and peer influence, factors intrinsic to the person, such as desire for achievement, competitiveness, and self-reliance, also play an important role in influencing a student’s application to learning and schooling. In the international literature a key construct used to examine differential school performance across cultural groups has been achievement motivation. However, the methodological and conceptual difficulties involved in measuring and defining achievement motivation for cross-cultural use have been discussed in a large number of publications (see Davidson & Thomson, 1980; De Vos, 1968, 1973; De Vos & Caudill, 1973; Draguns, 1979; Maehr, 1974; Maehr & Nicholls, 1980; Pedersen, 1979).

A theoretical model with clear and significant implications for methodological improvements in cross-cultural research on achievement motivation is Maehr’s Personal Investment Model (Braskamp & Maehr, 1983; Maehr, 1984; Maehr & Braskamp, 1986), which provides the framework for the present study. Three critical components are designated by this model in determining an individual’s personal investment (or motivation) in a specific situation. The first is Sense of Self, which refers to the more or less organized collections of perceptions, beliefs, and feelings related to who one is. Sense of Self is presumed to be composed of a number of components such as sense of competence, sense of autonomy, and sense of purpose, each contributing to the motivational orientation of the individual.
The second component, Personal Incentives, refers to the motivational focus of activity, especially what the person defines as “success” and “failure” in a particular situation. Among possible personal incentives are task goals (e.g., experiencing adventure, novelty, or working to understand something), ego goals (e.g., doing better than others), social-solidarity goals (e.g., pleasing others and making others happy), and extrinsic-reward goals (e.g., working for a prize or reward of some kind). Each of these components is subdivided into two facets described in Figure 1. The third component, Perceived Alternatives, refers to the behavioral alternatives that a person perceives to be available and appropriate (in terms of the individual’s sociocultural norms) in a given situation. Each of these components may be influenced by the design of the task, the personal experience and access to information of the individual, and the sociocultural context. In summary, personal investment or motivation in a particular task or behavior is a function of the sense of self, the feelings toward the behavior or task, the personal incentives operating, and the perceived options available. Each of the dimensions, Maehr maintains, is significant in any individual or situation interaction and has been considered, at some time, important in explaining and interpreting the differential performances and motivation of various cultural groups in school settings.

Figure 1: Dimensions of Maehr’s personal investment model
Personal Investment = Sense of Self + Affect + Personal Incentives
Sense of Self: SR (self-reliance), SE (self-esteem), GD (goal directed)
Personal Incentives:
– Ego: Competitiveness (co), Power (pw)
– Extrinsic: Recognition (rc), Token rewards (tn)
– Social Solidarity: Social concern (sc), Affiliation (af)
– Task Rewards: Task involvement (ta), Striving for excellence (ex)
ACTION POSSIBILITIES

The purpose of the present article is to describe the construction and validation of an instrument entitled the Inventory of School Motivation (ISM), which is based on the Personal Investment Model. The scale was developed (a) to test the “sense of self” and “personal incentives” dimensions of the Maehr model, (b) to test the applicability of the model and instrument in cross-cultural settings, and (c) to provide an instrument for measuring dimensions of motivation in classroom settings.
Method

Subjects

In total, 2,152 subjects were surveyed, comprising 492 aboriginal students, 487 migrant-background students, and 1,173 Anglo students drawn from Year 7 to Year 10 in 12 NSW high schools. There were approximately equal numbers of males and females.
Materials

Inventory of School Motivation (ISM)

A presurvey of adult community members of the three groups was undertaken to ensure the cultural relevance of the items. An instrument was devised to evaluate the nature of school motivation for aboriginal-, Anglo-, and migrant-background children. For the ISM, questions were written to measure the following 11 dimensions of the Maehr model: self-reliance (e.g., I can do things as well as most people at school), self-esteem (e.g., at times I feel that I’m no good at anything at school), goal directed (e.g., it is good to plan ahead to complete my schooling), competitiveness (e.g., winning is important to me), power (e.g., I often try to be the leader of a group), recognition (e.g., having other people tell me that I did well is important to me), token rewards (e.g., getting merit certificates would make me work harder at school), social concern (e.g., it is very important for students to help each other at school), affiliation (e.g., I try to work with friends as much as possible at school), task involvement (e.g., the more interesting the schoolwork the harder I try), and striving for excellence (e.g., I try hard to make sure that I am good at my schoolwork). Items were measured on a Likert-type scale, from strongly agree (1) to strongly disagree (5). There were 100 questions in the final pool of items in the Inventory of School Motivation, with approximately 9 questions targeting each dimension of the model. The questions were randomly distributed throughout the form and included 24 negatively worded items to guard against response bias. Items comprising the questionnaire are found in the appendix.
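The article does not describe how the 24 negatively worded items were scored. A common practice, assumed here and not taken from the source, is to reverse-key such items so that every item points in the same direction before scales are formed. The sketch below illustrates this for the 1 to 5 response format; the respondent data and the choice of item 20 as a negatively worded item are hypothetical.

```python
def reverse_score(response, low=1, high=5):
    """Mirror a Likert response around the scale midpoint (1<->5, 2<->4)."""
    if not low <= response <= high:
        raise ValueError(f"response {response} outside {low}..{high}")
    return low + high - response


def score_items(responses, negative_items):
    """Return a copy of `responses` with negatively worded items reverse-keyed.

    responses: dict mapping item number -> raw response (1..5)
    negative_items: set of item numbers assumed to be negatively worded
    """
    return {
        item: reverse_score(value) if item in negative_items else value
        for item, value in responses.items()
    }


# Hypothetical respondent and hypothetical negative-item key (not from the article).
raw = {2: 1, 13: 2, 20: 5}
scored = score_items(raw, negative_items={20})
```

With this keying, a "strongly disagree" (5) on a negatively worded item is treated like a "strongly agree" (1) on a positively worded one.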
Procedure

Administration of the Survey

Each survey session began with a standardized explanation of the purpose of the survey and a request for the support of the students in completing the survey accurately. To ensure that procedures adopted for the survey were standardized from school to school, to avoid any difficulties students might have completing the survey due to poor reading skills, and to ensure that the majority of students completed the questionnaire in the available time, the chief researcher read the questionnaire (including the standardized directions) aloud while students filled in their responses. Students who experienced difficulties in answering questions or who required other assistance simply raised their hand and one of the research assistants went to their aid. In this way the procedure of the survey was not interrupted.
Statistical Analyses

Preliminary Data Reduction and Statistical Analysis

Factorial Study 1. Preliminary analysis consisted of determining whether the designed instrument had construct validity for the full group as well as for each of the separate groups (aboriginal, Anglo, and migrant). As the Maehr model hypothesizes 11 dimensions relating to sense of self and personal incentives, a principal axis factor analysis with orthogonal (varimax) rotation, setting the NFACTORS parameter at 11, was performed on the data for the full group and for each separate group. Pairwise deletion of missing data was used to maximize the amount of data available for each analysis.
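The analyses above were run in SPSS. As a rough illustration of what the orthogonal (varimax) rotation step does, the following sketch applies the standard SVD-based varimax iteration to a small matrix of unrotated loadings. It covers only the rotation, not principal axis extraction or pairwise deletion, and the loadings are invented for illustration.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate a loading matrix orthogonally toward simple structure.

    Returns (rotated_loadings, rotation_matrix).
    """
    L = np.asarray(loadings, dtype=float)
    n_items, n_factors = L.shape
    R = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = L @ R
        # Gradient of the varimax criterion with respect to the rotation.
        col_ss = (rotated ** 2).sum(axis=0)
        grad = L.T @ (rotated ** 3 - rotated @ np.diag(col_ss) / n_items)
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt  # nearest orthogonal matrix to the gradient
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return L @ R, R


# Hypothetical unrotated loadings for six items on two factors.
unrotated = np.array([
    [0.7, 0.5],
    [0.6, 0.6],
    [0.8, 0.4],
    [0.5, -0.6],
    [0.6, -0.5],
    [0.4, -0.7],
])
rotated, rotation = varimax(unrotated)
```

Because the rotation is orthogonal, each item's communality (the row sum of squared loadings) is unchanged; only the distribution of loadings across factors shifts.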
Factor analysis of the set of 100 items for the full group (N = 2,152, M = 1,042, F = 1,110) resulted in 10 theoretically interpretable factors accounting for 98.2% of the variance in these items (although the last three factors consisted of doublets). Factors were named on the basis of the content of the items with factor loadings exceeding .30. From this analysis it was apparent that, for the full group of subjects, the Inventory of School Motivation gave broad support for the existence of several discrete parameters that may influence student motivation in school settings, even though the analysis failed to find all 11 separate dimensions hypothesized in the Maehr model. The following dimensions were demonstrated: Self-Esteem, Self-Reliance, Affiliation, Social Concern, and Power (defined by group leadership). To a lesser extent, the existence of the dimensions Token Rewards and Competition was supported. The items designed by Maehr to measure ego and extrinsic rewards (viz., competitiveness, power, recognition, and token rewards) formed one general factor that we termed Extrinsic Motivation. Task rewards (viz., task involvement and striving for excellence) formed one factor that we called Intrinsic Motivation. It also included items written to measure Goal-Directed behavior.

In order to assess the cross-cultural validity of the model and its reliability, a further series of principal axis factor analyses was performed on the three groups in the sample (aboriginal, Anglo, and migrant). In each case a varimax solution was chosen and the NFACTORS parameter was set to 11. Key dimensions of the Maehr model (Intrinsic Motivation, Extrinsic Motivation, Self-Esteem, Self-Reliance, Affiliation, Social Concern, and Power [group leadership]) emerged again as major factors. The consistency of the findings across the four groups (the full sample and the three subgroups) argues very strongly for the reliability of the ISM as well as for its construct validity. It gives strong support to the theoretical model from which it is derived. The ability of the model to illustrate characteristics of specific relevance to each group indicates its validity for use in a cross-cultural context.
Discussion

The similarity of the factor pattern matrices across the three groups argues strongly for the etic validity of the constructs, whereas the differences that emerged in the composition of factors in the several groups support the emic validity of the scales derived from the constructs. It remains to demonstrate the relative importance of these dimensions for each group in determining performance level in educational settings. All scales were analyzed by means of the reliability subprogram of the SPSS package (Nie & Hull, 1981) for each group. Cronbach’s alphas were calculated for each scale. In general there was a high degree of reliability for each of the scales analyzed (with the majority being in excess of .70). Factor score variables were produced to represent the factors for each of the groups in later analyses. Reliability estimates are presented in Table 2.
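For readers who want to see what the reliability coefficient reported above measures, the following sketch computes Cronbach's alpha from a respondents-by-items score matrix using the usual formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). It is an illustration only, not the SPSS reliability subprogram, and the small score matrix is invented.

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha for a (respondents x items) matrix of item scores."""
    X = np.asarray(scores, dtype=float)
    n_items = X.shape[1]
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = X.var(axis=0, ddof=1)      # variance of each item
    total_variance = X.sum(axis=1).var(ddof=1)  # variance of the scale total
    return n_items / (n_items - 1) * (1 - item_variances.sum() / total_variance)


# Hypothetical responses of four students to a three-item scale.
demo_scores = [
    [1, 2, 1],
    [2, 2, 3],
    [4, 5, 4],
    [5, 4, 5],
]
alpha_demo = cronbach_alpha(demo_scores)
```

Alpha approaches 1 when items rise and fall together across respondents, which is why scales with few or weakly related items tend to show lower values.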
The Significant Predictors

Multiple Regression and Intention to Complete Schooling

A series of stepwise multiple-regression analyses (based on listwise deletion of missing data) was conducted to ascertain which variables were of most significance for each of the three groups in predicting school performance (in particular, motivation to continue with school beyond the minimum school-leaving age). The criterion variable was the expressed intention of the subject to continue with school and complete the Higher School Certificate (the final year of study in NSW schools). The predictor variables included were scales derived from the factor analyses described earlier. Table 2 presents the list of predictor variables for each of the three groups in the study. Because results from multiple-regression analyses can be severely affected by intercorrelations among the predictor variables, the predictor variables were correlated with one another using the Pearson correlation program from SPSS (Nie, Hull, Jenkins, Steinbrenner, & Bent, 1975). The pattern of correlations among the factor score scales indicated very low levels of intercorrelation (< .14).
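The intercorrelation check described above can be sketched as follows: compute the Pearson correlation matrix of the predictors and report the largest absolute off-diagonal value, which can then be compared with a ceiling such as the .14 reported in the text. The function name and the toy data are assumptions made for illustration.

```python
import numpy as np


def max_abs_intercorrelation(predictors):
    """Largest absolute Pearson correlation between any two predictor columns."""
    X = np.asarray(predictors, dtype=float)
    r = np.corrcoef(X, rowvar=False)                # columns are variables
    off_diagonal = r[~np.eye(r.shape[0], dtype=bool)]
    return float(np.abs(off_diagonal).max())


# Hypothetical scores on three mutually uncorrelated predictors.
uncorrelated = [
    [1, 1, 1],
    [1, -1, -1],
    [-1, 1, -1],
    [-1, -1, 1],
]
largest_r = max_abs_intercorrelation(uncorrelated)
```

A value near zero, as in this toy case, indicates that the regression weights are unlikely to be distorted by overlap among predictors.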
Further Model Testing and Data Reduction

Factorial Study 2. To test the model further and to reduce the number of items comprising the ISM (it was intended to develop a set of [composite] scales that might be used to assess the motivational characteristics of students within school settings), the data were subjected to a further set of principal axis factor analyses with varimax rotation. It was felt that limiting the NFACTORS parameter to 11 may have prevented a number of other salient dimensions of the Maehr model from emerging in earlier analyses. Consequently, for each group (Anglo, aboriginal, and migrant), a further factor analysis was performed without any limitation on the number of factors to be obtained. These factor analyses of the ISM clearly identified important dimensions of the Maehr model, with the pattern of factor loadings providing support for the scales that the ISM was designed to measure. Although the unrestricted factor analyses generated more factors supportive of the Maehr model than the analyses restricted a priori to 11 factors, they also generated a large number of trivial and poorly defined factors. In an attempt to remove these factors and to reduce the item set from the 100 original variables to a set more manageable for general classroom purposes, each factor analysis was scrutinized carefully in order to isolate those items that did not factor out for a particular group on any factor (there were only a small number of these) and those items that loaded on poorly defined or trivial factors. Through this procedure it was possible to select, for further analysis, items of greatest relevance to each particular group.
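The screening step described above can be pictured as a simple filter over a loading matrix. The sketch below lists, for each factor, the items whose loading reaches a cutoff, and separately lists items that reach it on no factor. The .30 cutoff follows the criterion used elsewhere in the article; applying it to absolute loadings, and the loadings themselves, are assumptions made for illustration.

```python
def salient_items(loadings, threshold=0.30):
    """Group items by the factors on which they load at or above `threshold`.

    loadings: dict mapping item number -> list of loadings, one per factor
    Returns (items_by_factor, items_loading_on_no_factor).
    """
    n_factors = len(next(iter(loadings.values())))
    by_factor = {factor: [] for factor in range(1, n_factors + 1)}
    unassigned = []
    for item in sorted(loadings):
        hits = [
            factor
            for factor, value in enumerate(loadings[item], start=1)
            if abs(value) >= threshold  # assumption: absolute loading is used
        ]
        if not hits:
            unassigned.append(item)
        for factor in hits:
            by_factor[factor].append(item)
    return by_factor, unassigned


# Hypothetical loadings of four items on two factors.
demo_loadings = {
    1: [0.62, 0.10],
    2: [0.55, 0.31],
    5: [0.12, 0.08],
    10: [-0.05, 0.48],
}
by_factor, unassigned = salient_items(demo_loadings)
```

Items in the second list correspond to those that "did not factor out" and would be candidates for removal from the reduced item set.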
The reduced set of items for each group was subjected to a principal axis factor analysis using varimax rotation. Pairwise deletion of missing data was utilized. Table 1 presents a comparison of the factor structure for the three groups. Items defining each factor are included. Table 2 presents the multiple regression results for the three groups on the ISM for the intention to complete the Higher School Certificate.

Table 1: Factor patterns for the ISM across three groups and items defining each factor for each group

Factor (order of factor, a) | Aboriginal | Migrant | Anglo
Intrinsic rewards | f1 | f1 | f1
Extrinsic rewards | f2 | f2 | f2
Self-reliance | f3 | f11 | f5
Affiliation | f4 | f5 | f6
Competition | f5 | f7 | f7
Recognition | f6 | f3 | f3
Social concern | f7 | f6 | f8
Self-esteem | f8 | f4 | f4
Goal directed | f10 | f10 | f10
Power | f11 | f9 | f9
Confidence (aboriginal) | f9 | – | –
Token reward | – | f8 | –
Success | – | – | f11

Defining items (b), by scale

Intrinsic
  Anglo: 7, 13, 16, 22, 30, 33, 34, 40, 44, 56, 63, 66, 68, 70, 79, 89
  Aboriginal: 4, 7, 9, 11, 12, 13, 16, 22, 28, 30, 33, 34, 38, 39, 40, 48, 54, 56, 57, 63, 69, 75, 79, 83, 89, 96
  Migrant: 7, 13, 40, 56, 59, 60, 63, 66, 69, 74, 75, 79, 83, 84, 89, 93
Extrinsic
  Anglo: 3, 15, 18, 27, 32, 52, 53, 65, 88
  Aboriginal: 8, 14, 15, 23, 24, 27, 32, 41, 44, 53, 65, 72, 73, 78, 91
  Migrant: 3, 6, 15, 18, 27, 32, 53, 65, 88, 94
Recognition
  Anglo: 12, 17, 20, 23, 24, 28, 41, 73, 91
  Aboriginal: 3, 6, 17
  Migrant: 8, 12, 20, 23, 28, 41, 24, 50, 73, 91
Self-esteem
  Anglo: 45, 55, 67, 77, 80, 81, 82, 95, 98
  Aboriginal: 45, 77, 81, 100
  Migrant: 45, 55, 57, 67, 70, 82
Self-reliance
  Anglo: 31, 59, 60, 69, 75, 83, 97
  Aboriginal: 18, 60, 61, 66, 74, 90
  Migrant: 31, 97
Affiliation
  Anglo: 35, 36, 37, 42, 47
  Aboriginal: 35, 36, 37, 42, 47
  Migrant: 35, 36, 37, 42, 47, 61
Competition
  Anglo: 1, 2, 14, 43, 76
  Aboriginal: 1, 2, 76
  Migrant: 1, 2, 14, 43, 76, 99
Social concern
  Anglo: 10, 29, 46, 61, 74, 85
  Aboriginal: 10, 21, 29, 46
  Migrant: 10, 21, 29, 46, 85
Power
  Anglo: 62, 71, 86, 94
  Aboriginal: 86, 88
  Migrant: 62, 71, 86
Goal directed
  Anglo: 54, 84, 87
  Aboriginal: 59, 87, 84
  Migrant: 22, 38, 39, 48
Success
  Anglo: 90, 93
  Aboriginal: –
  Migrant: –
Token
  Anglo: –
  Aboriginal: –
  Migrant: 72, 78, 80, 90, 95, 98
Confidence
  Anglo: –
  Aboriginal: 80, 95, 98
  Migrant: –

a. Order of factor.
b. Items are listed if they loaded 0.3 or greater on the factor.

Table 2: Sets of beta weights and multiple-correlation coefficients for each group (aboriginal, migrant, Anglo) on predictor variables drawn from the Inventory of School Motivation (ISM) and intention to complete the Higher School Certificate

Aboriginal (n = 492)
Predictor variable (factor score scale) | alpha (1) | fnum (2) | beta (3) | ord (4)
Intrinsic rewards | 935 | 1 | 419** | 1
Extrinsic rewards | 90 | 2 | –002 |
Self-reliance | 44 | 3 | 102* | 4
Affiliation | 67 | 4 | –060 |
Competition | 72 | 5 | –039 |
Recognition | 75 | 6 | –047 |
Social concern | 68 | 7 | –014 |
Self-esteem | 54 | 8 | 014 |
Goal directed | 71 | 10 | 372** | 2
Power | *** | 11 | –036 |
Confidence (ab) | 54 | 9 | 162** | 3
Token reward (mig) | – | – | – | –
Success (Anglo) | – | – | – | –
Multiple R: 627

Migrant (n = 487)
Predictor variable (factor score scale) | alpha (1) | fnum (2) | beta (3) | ord (4)
Intrinsic rewards | 88 | 1 | 423** | 1
Extrinsic rewards | 91 | 2 | –036 |
Self-reliance | 60 | 11 | 053 |
Affiliation | 72 | 5 | –095* | 5
Competition | 82 | 7 | 223** | 2
Recognition | 85 | 3 | 084* | 7
Social concern | 63 | 6 | 121** | 4
Self-esteem | 72 | 4 | 091* | 6
Goal directed | 74 | 10 | 185** | 3
Power | 75 | 9 | 026 |
Confidence (ab) | – | – | – | –
Token reward (mig) | *** | 8 | –087* | 6
Success (Anglo) | – | – | – | –
Multiple R: 591

Anglo (n = 1,173)
Predictor variable (factor score scale) | alpha (1) | fnum (2) | beta (3) | ord (4)
Intrinsic rewards | 89 | 1 | 241** | 3
Extrinsic rewards | 87 | 2 | –052* | 8
Self-reliance | 81 | 5 | 262** | 2
Affiliation | 71 | 6 | –070** | 7
Competition | 80 | 7 | 032 |
Recognition | 84 | 3 | 051* | 9
Social concern | 63 | 8 | 125** | 5
Self-esteem | 79 | 4 | 152** | 4
Goal directed | 76 | 10 | 429** | 1
Power | 66 | 9 | –046 | 10
Confidence (ab) | – | – | – | –
Token reward (mig) | – | – | – | –
Success (Anglo) | *** | 11 | –105* | 6
Multiple R: 665

Note: 1. Reliability coefficients (Cronbach’s alpha); 2. Order of factor; 3. Standardized beta weights; 4. Order of importance of the significant predictor variables; 5. All coefficients are presented without decimal points. *p < .05. **p < .01. ***Reliability not available due to limitation of the Reliability program (Nie & Hull, 1981). A minimum of three items is required to constitute a scale.

These results indicate the usefulness of the ISM in determining the salient predictors of intentional behavior for the three groups studied. For each group (aboriginal, migrant, and Anglo), the combined set of culturally determined predictor variables developed from the personal investment theoretical framework was significantly related to the criterion variable. The multiple-regression analyses therefore indicate the usefulness of the ISM in explaining and describing the nature of motivation for students from different cultural backgrounds in school settings, given the adequacy of the ISM for the three groups in the first place.

In essence, the major correlates of intention for the nontraditional aboriginal students in this study were level of intrinsic motivation, desire to complete schooling (or lack of it), and level of confidence and self-reliance. Factors often alleged to be important determinants of aboriginal motivation in the school setting, such as affiliation, social concern, self-esteem, and recognition, did not emerge as important predictors in this study.

A greater number of predictors was important for the migrant group, with eight scales significantly related to the criterion. Intrinsic Motivation, Competition, Goal Directed (to have a better future), and Social Concern accounted for most of the explained variance in the criterion variable. After Intrinsic Motivation, Goal Directed (to have a better future) and Competition emerged as the two most significant factors. This interesting finding supports the notion that many migrant children do well in Australian schools because of encouragement by their parents to work hard for a better future, and therefore to compete. Other variables that were expected to be significant for the aboriginal group but were not (viz., affiliation, social concern, recognition, and self-esteem) attained significance for the migrant group.
For the Anglo group, all variables were found to be significantly related to the intention to complete schooling except for Competition. The most important predictor variable was Goal Directed (to complete schooling), followed by Self-Reliance and Intrinsic Motivation. Extrinsic Motivation was found to be significantly and negatively related to the intention to complete schooling. The more reward dependent the Anglo student is, the less likely he or she is to hold the intention to finish school.
Summary

Although direct numerical comparisons across the groups are not possible, as each regression equation is based on a different set of predictor variables, some generalizations can be made. For all groups, Intrinsic Motivation appears to be a major predictor. It was the single most important predictor for the aboriginal and migrant groups, whereas for the Anglo group Goal Directed (to complete schooling) emerged as the single most important predictor, followed by Intrinsic Motivation. For all groups, Goal-Directed motivation was a significant predictor, but the nature of the goal direction varied across groups. The goal was school completion for the Anglo and aboriginal groups, and it was pinpointed as the student’s desire to complete schooling and to do better than his or her parents for the migrant group.

A narrower range of predictors was significant for the aboriginal group. Apart from the two intrinsically oriented scales, aboriginal motivation to continue schooling is largely determined by feelings of self-reliance within the school setting. An attributional model of motivation thus appears particularly salient to this group of students. Attribution theory (Weiner, 1974) maintains that children who perceive that they lack ability (an internal, stable, and uncontrollable factor) or perceive that the situation is beyond them (an external, stable, and uncontrollable factor) will withdraw from the task. These feelings of inadequacy may become intractable and lead to learned helplessness in school situations (Dweck & Goetz, 1978).

Extrinsic Motivation emerged as a low-level predictor for the Anglo group, being negatively related to the intention of completing school. There was a negative, though nonsignificant, relationship between extrinsic motivation and intention to complete school for the other two groups. Clearly, to the extent students say they intend to complete schooling, they are less dependent on external rewards. Conversely, those children who perceive little value in schooling and/or dislike it are likely to depend on rewards to keep them at the task of learning. Competition was not an important predictor for either the aboriginal or the Anglo group; however, it was the second most important predictor for the migrant group. Power Motivation (indicated through a desire to be group leader) was not an important predictor for any group. Motivational characteristics such as Affiliation and Social Concern, often claimed to be important for the aboriginal group, emerged as more important predictors for the migrant and Anglo groups, throwing into stark relief the cluster of variables that was found to be significant for the aboriginal group.
Prediction and Behavior: Are They Related?

Discriminant Analyses and Returning to School

As a final test of the validity of the Inventory of School Motivation, a series of discriminant analyses (with stepwise variable selection and minimization of Wilks’s lambda) using the significant predictors from the initial analyses was performed with a subset of the data on those subjects who had continued with school or left it before completing the Higher School Certificate. In other words, we set out to examine the value of the predictor variables identified and discussed earlier in distinguishing between those who remained at school and those who left after Year 10. Subjects consisted of 658 Anglo students (M = 313, F = 345), 283 migrant students (M = 154, F = 129), and 85 aboriginal students (M = 42, F = 43).

Analyses with the Anglo group indicated that all of the predictor variables except Affiliation were retained in the analysis. The most important of these variables (based on standardized canonical discriminant coefficients) were Goal-Directed (school), Self-Reliance, and Success Motivation. Using this discriminant analysis, 72% of the sample were correctly classified as being at school or having left school (p < .001). Analyses with the migrant group indicated that the best set of predictors was Intrinsic Motivation, Recognition, Self-Esteem, Affiliation, Competition, Token Reward, and Power. This combination of variables correctly classified 63% of the sample as being at school or having left (p < .001). Analyses with the aboriginal group indicated that the best set of predictor variables was Goal Directed (school), Self-Reliance, and Confidence. Using this combination of variables, 70% of the sample were correctly classified (p = .002). Table 3 presents the comparison of the major discriminant variables for each group studied.
Table 3: A comparison of the discriminant variables drawn from the Inventory of School Motivation across three groups (aboriginal, Anglo, and migrant) on continuing with school or leaving school after Year 10

Discriminant variables (row order): Goal directed, Self-reliance, Confidence, Competition, Intrinsic, Power, Token, Self-esteem, Recognition, Social concern, Extrinsic, Affiliation

Standard canonical discriminant coefficients, in printed order for each group:
Aboriginal: .962*, .449, .405
Anglo: .697, .485, .379, .403, .173, .207, .232, .121, .199, .096
Migrant: .579, .551, .331, .305, .216, .189, .267

% of group correctly classified: Aboriginal 70%**, Anglo 72%**, Migrant 63%**

*Standardized canonical discriminant function coefficients indicate the relative importance of the variable to the discriminant equation. The higher the number the more important the variable.
**Significant at the .001 level.
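The percentages of correctly classified students reported for the discriminant analyses are hit rates of a fitted discriminant function. The sketch below shows the idea for two groups using a plain Fisher linear discriminant with a midpoint cutoff. It does not reproduce the stepwise selection or Wilks's lambda criterion used in the study, the midpoint cutoff assumes equal priors, and the data (group 0 = left school, group 1 = stayed) are invented.

```python
import numpy as np


def fisher_discriminant(X, y):
    """Fit a two-group linear discriminant; return (weights, cutoff)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-group covariance matrix.
    pooled = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    weights = np.linalg.solve(pooled, m1 - m0)
    cutoff = weights @ (m0 + m1) / 2  # midpoint between the group centroids
    return weights, cutoff


def percent_correct(X, y, weights, cutoff):
    """Percentage of cases whose predicted group matches the actual group."""
    predicted = (np.asarray(X, dtype=float) @ weights > cutoff).astype(int)
    return 100.0 * float(np.mean(predicted == np.asarray(y)))


# Hypothetical scores on two predictors for eight students.
scores = [[1, 2], [2, 1], [2, 3], [3, 2], [6, 7], [7, 6], [7, 8], [8, 7]]
stayed = [0, 0, 0, 0, 1, 1, 1, 1]
weights, cutoff = fisher_discriminant(scores, stayed)
hit_rate = percent_correct(scores, stayed, weights, cutoff)
```

In real data the groups overlap, so the hit rate falls below 100%; it should also be judged against the rate expected from always predicting the larger group.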
Discussion

The pattern of discriminant variables for each group bears comparison. The major discriminant variables for the Year 10 aboriginal and Anglo groups are strikingly similar and stand in marked contrast to the pattern established for the migrant group. In the former case a self-efficacy model explains behavior; that is, an aboriginal or Anglo child who feels confident, is self-assured, and has a sense of purpose in schooling continues with schooling. An interesting difference between these two profiles should be highlighted, however. The range of variables relevant to the Anglo group is much greater, suggesting a more complex interplay of factors in the Anglo child’s decision to continue with school. In the case of the aboriginal children there is clear evidence that the explanatory base for their decision making is much narrower and relates very much to feelings of confidence and assurance within the school setting; this finding is of great importance. Contrary to expectations, competition was not found to be an important discriminant variable for the Anglo group.

The pattern of discriminant variables for the aboriginal group appears even more telling when compared with the migrant group. In this latter case, the significant variables are rewards and competition. This pattern gives clear support to the hypothesis that the children of migrants are more competitive, independent, and desirous of proving their capacity to obtain rewards, both through self-satisfaction and extrinsic modes (such as recognition, marks, and power through group leadership), than Anglo or aboriginal children. The success rate of migrant children at school and their retention levels increasingly appear better than norms established for the nonmigrant groups. It should be noted that Goal Directed (to improve one’s life-style), which was an important predictor variable for this group, was not a discriminant variable. It is possible that those migrant children who leave school hope to obtain an occupation that will enable them to do better than their parents, even though they may lack the competitive drive and intrinsic motivation that characterize their non-leaving confreres.

Variables such as Affiliation, Social Concern, Competition, and Self-Esteem, which according to generally held beliefs about aboriginal students should have been discriminant variables, did not emerge as such. Given the adequacy of the dimensions in the first place, there seems little justification for emphasizing these variables in any analysis of aboriginal student performance at school. Greater attention should be given to investigating the development of school confidence and self-reliance in aboriginal students as well as the development of a positive sense of the value of schooling.
Appendix: Items Comprising the Inventory of School Motivation

Predicted factors:
(ta) working for the inherent interest
(ex) striving for excellence
(co) competitiveness
(pw) power
(afi) affiliation
(sc) social concern
(re) recognition
(tn) token rewards
(gd) goal directed
(sr) self-reliance
(se) self-esteem

1. I want to do well at school to be better than my classmates.
2. Winning is important to me.
3. I try to do well at school to please my teachers.
4. I like being given the chance to do something again to make it better.
5. I often try new things on my own.
6. I work hard at school for rewards from the teacher.
7. I want to do well at school to show that I can do it.
8. I work best in class when I can get some kind of reward.
9. The more interesting the schoolwork the harder I try.
10. It is very important for students to help each other at school.
11. I don’t mind working a long time at schoolwork that I find interesting.
12. Having other people tell me that I did well is important to me.
13. I try hard to make sure that I am good at my schoolwork.
14. I am happy only when I am one of the best in class.
15. I work hard at school for presents from my parents.
16. I try to do well at school to please my parents.
17. Praise from my teachers for my good schoolwork is important to me.
18. I don’t often make mistakes at school.
19. I am always getting into trouble at school.
20. Getting a reward for my good schoolwork is not very important to me.
21. I like to help other students do well at school.
22. I want to do well at school so that I can have a good future.
23. Praise from my friends for good schoolwork is important to me.
24. Getting merit certificates would make me work harder at school.
25. Students shouldn’t depend on their friends for help with schoolwork.
26. I usually do the wrong things at school.
27. I like my teacher to show my work to the rest of the class.
28. I like to be encouraged for my schoolwork.
29. I care about other people at school.
30. When I get good marks I work harder at school.
31. I can do things as well as most people at school.
32. I work hard because I want the teacher to take notice of what I say.
33. I like to see that I am improving in my schoolwork.
34. I need to know that I am getting somewhere with my schoolwork.
35. I do not like working with other people at school.
36. I can do my best work at school when I am working with others.
37. I try to work with friends as much as possible at school.
38. I aim my schooling toward getting a good job.
39. I want to do well at school so that I have something better to look forward to than my parents.
40. I work hard to try to understand something new at school.
41. At school I work best when I am praised.
42. I do better work by myself at school.
43. Coming first is very important to me.
44. Getting good marks is everything for me at school.
45. At times I feel that I’m not good at anything at school.
46. I enjoy helping others with their schoolwork even if I don’t do so well myself.
47. When I work in groups at school I don’t do my best.
48. I try hard to do well at school so I can get a good job when I leave.
49. Not doing better than my friends in class is important to me.
50. Having people notice my good schoolwork is not really important to me.
51. I just do my schoolwork day by day without thinking about the future.
52. I try to do well at school to please my friends.
53. I like my schoolwork to be compared with others.
54. It is good for me to plan ahead so I can do well at school.
55. I feel I always need help with difficult schoolwork.
56. When I am improving in my schoolwork I try even harder.
57. Marks are the best way to know that you’ve done well at school.
58. No one pays much attention to me at school.
59. I am bright enough to continue my schooling to the Higher School Certificate.
60. I like to think things out for myself at school.
61. I don’t worry about other students, I just do my own work.
62. I often try to be the leader of a group.
63. Most of the time I feel that I can do my schoolwork.
64. Kids usually pick on me at school.
65. I work hard because I want to feel important in front of my school friends.
66. I don’t need anyone to tell me to work hard at school; I do it myself.
67. I often think that there are things I can’t do at school.
68. The harder the problem the harder I try.
69. On the whole I am pleased with myself at school.
70. How I get on with other students is more important than how I get on with my schoolwork.
71. At school I don’t like being in charge of a group.
72. Getting rewards of money would make me work harder at school.
73. I want to be praised for my good schoolwork.
74. As long as I am doing my own work well other students don’t matter much.
75. I am very confident at school.
76. I work harder if I’m trying to be better than others.
77. I wish I had a little more confidence in my schoolwork.
78. Praise for good work is not enough, I like a reward.
79. I try hard at school because I am interested in my work.
80. Trying hard at school is not much fun if the competition is too strong.
81. I often worry that I am not very good at school.
82. Other students have to help me a lot with my work.
83. I think that I can do quite well at school.
84. I work hard at school so that I can go on to Year 12.
85. It makes me unhappy if my friends aren’t doing well at school.
86. It is very important for me to be a group leader.
87. It is good to plan ahead to complete my schooling.
88. I work hard at school because I want the class to take notice of me.
89. I am always trying to do better in my school work.
90. Things hardly ever bother me at school.
91. Praise from my parents for good schoolwork is important to me.
92. If I’m working alone, difficult schoolwork doesn’t bother me.
93. I succeed at whatever I do at school.
94. I work hard at school so that I will be put in charge of things.
95. I only like to do things at school that I feel confident at.
96. I often forget the time when I’m working on something interesting at school.
97. I think I’m as good as everybody else at school.
98. I always choose easy work for myself to do at school so that I don’t have too much trouble.
99. I don’t like trying to be better than someone else.
100. I don’t like being told my marks.
Authors’ Note

This research was supported, in part, by two grants from the Australian Institute of Aboriginal Studies. We would like to thank Don Apearritt and Jenny Tjugiarto for their invaluable assistance. Requests for reprints should be sent to Dennis McInerney, School of Education and Language Studies, University of Western Sydney, Macarthur, P.O. Box 555, Campbelltown, NSW 2560, Australia.
55 Achievement Motivation in Children of Three Ethnic Groups in the United States

Manuel Ramirez III and Douglass R. Price-Williams
In a recent article on achievement motivation, Maehr (1974) points out that McClelland’s well-known work in this area (1961) has given minimal attention to the fact that motives to achieve may be actualized in different ways in different cultures. He states:

The important principle is that achievement and achievement motivation must be understood in terms of the sociocultural context in which they are found, as well as in terms of generalized descriptions of achieving norms or abstract constructions of psychological processes (p. 894).
In addition, Maehr points out that: Much of the research that attempts to understand the motivational patterns of ethnic and cultural groups involves placing children in a ‘middle-class-biased’ performance setting and then observing behavior (p. 894).
Maehr suggests that we would do well to pursue an ethnographic approach to the study of achievement motivation in cross-cultural research. He argues for an experimental anthropology of motivation. Gallimore, Weiss, and Finney (1974) agree with this point of view.
Source: Journal of Cross-Cultural Psychology, 7(1) (1976): 49–60.
In reviewing research on delay of gratification, they note: A methodological problem common to many cross-cultural, cross-ethnic investigations is the use of behavior observation classifications irrelevant or inappropriate to one or more groups about which comparative statements are made (p. 78).
De Vos (1968) has indicated that McClelland’s definition of achievement motivation is based on a Western view of psychodynamics – that it is dependent on a conception of human behavior as individualistically motivated. To support his argument, De Vos cites the importance of affiliation in motivation among “successful” Japanese, indicating that in Japan, striving for success is more often motivated by a concern for the reaction of others than by the pursuit of what in the West is considered self-satisfaction. A similar orientation toward achievement was also observed among Japanese Americans (Caudill and De Vos, 1956).

Gallimore, Weiss, and Finney (1974) have noted that affiliation is critical to achievement among Hawaiians. These investigators observed that Hawaiian parents socialized their children to be attentive to the concerns and expectations of others, and that this type of training makes children more responsive to affiliation and social rewards. In particular, Gallimore, Boggs, and Jordan (1974) found that young Hawaiians regard contributions to and continuing affiliation with the family system as more important goals than personal achievement and independence as these are represented by McClelland’s conceptualization of achievement motivation. The fact that Hawaiian culture and socialization emphasize identification with the family may be the critical variable in understanding why McClelland’s measures for n Achievement may not be appropriate for them.

McClelland’s definition of n Achievement is consonant with socialization that encourages children to view themselves as individuals separate from their families. It is not likely that measures based on his definition would be appropriate for assessment of achievement motivation in most Mexican-American and Black children.
We hypothesize that many Mexican-American and Black children, like Hawaiians and Japanese, are socialized to identify themselves with their family and ethnic group, and to cooperate for attainment of mutual goals: socialization in Mexican-American and Black cultures has strong affiliation components. Recent research by Gray (1975) supports this hypothesis for Mexican-American children. Using a questionnaire, she found that Mexican-American children expressed a greater tendency to want to achieve for others than did Anglo children.

The research reported below studied achievement motivation in children of three ethnic groups in Houston, Texas: Mexican-American, Black, and Anglo. It was predicted that Mexican-American and Black children would score higher in family achievement, that is, achievement oriented toward goals which would benefit the family or which would gain recognition from family members. It was also predicted that the Anglo children in the study would score higher on n Achievement.
Method

The subjects were 180 fourth-grade children (mean age, 10.4) from Catholic parochial schools in Houston, Texas.¹,² Sixty children were Mexican-American, 60 Black, and 60 Anglo. Half of the subjects in each group were male, half were female. There were also equal numbers of children of the lower and middle socioeconomic classes in each sex and ethnic group. Father’s occupation was used as an indicator of SES (Moore and Holtzman, 1965).

A research team administered a short questionnaire in English to all fourth-grade children at the schools from which subjects were drawn. The questionnaire contained items concerning the language(s) spoken by the child and the parents, family activities, number of persons residing in the child’s home and their relationship to the child, and the size of the home. The children were also asked to draw a human figure. Those who had difficulty answering the questions or drawing the human figure were eliminated; the others were placed in a pool from which the subjects for the study were selected on the basis of ethnicity, sex, and SES.
Mexican-Americans

The majority of Mexican Americans selected for this study are bilingual. These people are well identified with the traditional Mexican-American system of values, that is, they have close ties to members of their extended families, they are familiar with both Mexican and Mexican-American history, and their interpersonal relationships are characterized by warmth and a commitment to mutual help. Child-rearing practices emphasize respect for adults, family, and religious authority, and there is strong identification with Mexican Catholic ideology. The majority of the children selected to participate in this study were second- and third-generation Americans.
Blacks

The Black residents of the areas of Houston from which our subjects were selected differ in many respects from Black populations in most urban settings in the United States. Most of these people are bilingual (French/English) and most of the adults were reared in rural areas of Louisiana. Observations of these subjects indicated an emphasis on strong ties to the extended family, respect for adults, respect for family and religious authority, and identification with the teachings of the Catholic Church.
Anglo-Americans

The majority of Anglos from which we chose our subjects were Caucasians who made no indication that they identified with their original ethnic groups. None of the children were bilingual. Observations of the Anglo families indicated that there was a strong emphasis on encouraging children to develop identities separate from those of the family group. Children were also encouraged to be individually competitive.
Procedure

The subjects were asked to tell a story to each of seven line drawings depicting a person(s) in a setting related to education. The tester asked each child to tell the most interesting story he could think of. In composing the story, each child was asked to answer three questions: (1) What is happening? (2) What happened before? (3) How will the story end? The content of each of the seven cards in the set is as follows: (1) student and teacher, (2) student and mother, (3) student and father, (4) two students of the same ethnic group, (5) two students, one of darker complexion than the other, (6) student, parents, and principal, and (7) student studying alone. Different male and female sets of cards were constructed for each of the ethnic groups. The subjects were tested individually in two separate sessions. Three cards were administered during the first session and four during the second. The subjects were tested by a member of their ethnic group.

To score for n Achievement, the version of the McClelland scoring system devised by Riccuiti and Clark (1957) was used in abbreviated form. A maximum of four points could be given for each story. One point was given for each of the following categories:³ (1) imagery – reference made to achievement or to a goal related to achievement (competition with a standard of excellence); (2) instrumental activity – any activity independent of the original statement indicating that the character in the story is doing something to attain an achievement goal; (3) positive outcome of instrumental activity – activity leads to attainment of the achievement goal; and (4) thema – the plot of the story revolves around achievement.

The scoring categories for family achievement⁴ are as follows: (1) imagery – reference made to achievement or attainment of an achievement goal (competition with a standard of excellence) from which the family would benefit or that would gain recognition from family members; (2) instrumental activity – any activity independent of the original statement that helps the character achieve for his family; (3) positive outcome of instrumental activity – activity leads to attainment of the achievement goal; and (4) thema – achievement is the central plot or theme of the story.

Those who scored the stories were trained with the manual by McClelland, Atkinson, Clark, and Lowell (Atkinson, 1958). All stories were scored blind, without knowledge of the sex or ethnic group membership of the subject.
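The four-category scoring rule described in the Procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' scoring procedure: the names `StoryCoding`, `story_score`, and `subject_score` are hypothetical, the same one-point-per-category rule is assumed to apply to family achievement as well as n Achievement, no dependency between categories is enforced, and a subject's total is assumed to be the sum over the stories told.

```python
from dataclasses import dataclass


@dataclass
class StoryCoding:
    """A coder's yes/no judgments for one story (hypothetical structure)."""
    imagery: bool                # achievement goal is referenced
    instrumental_activity: bool  # character acts to attain the goal
    positive_outcome: bool       # the activity attains the goal
    thema: bool                  # achievement is the central plot


MAX_PER_STORY = 4  # one point per category, as stated in the text
N_CARDS = 7        # seven line drawings per subject


def story_score(coding: StoryCoding) -> int:
    """Return 0-4 points: one point for each category judged present."""
    return sum([
        coding.imagery,
        coding.instrumental_activity,
        coding.positive_outcome,
        coding.thema,
    ])


def subject_score(codings: list[StoryCoding]) -> int:
    """Assumed aggregation: sum of story scores across a subject's cards."""
    return sum(story_score(c) for c in codings)
```

Under these assumptions a subject's score on either measure ranges from 0 to `MAX_PER_STORY * N_CARDS`, that is, 0 to 28.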
Results

The findings listed in Table 1 show that Mexican-American and Black subjects scored higher on family achievement than did Anglo children, while Anglo children scored higher than Mexican Americans and Blacks on n Achievement. A 3 × 2 × 2 ANOVA revealed significant ethnic effects for both the family achievement (F = 5.79, p < .01) and the n Achievement data (F = 5.73, p < .01). Sex and SES effects were nonsignificant for both the n Achievement and family achievement data. A separate ANOVA on the n Achievement data from the three parent cards of the SSPST yielded a significant ethnic effect (F = 6.87, p < .01). Mean scores for each subgroup on these three cards are contained in Table 2.

A close examination of the data revealed that the highest scores on n Achievement were those of Mexican-American males; Mexican Americans scored lower than Anglos as a group because of the lower scores of Mexican-American females. Black and Anglo females also scored lower on n Achievement than the males of their respective groups. Females in all three ethnic groups scored higher than males on family achievement.

Results of post hoc comparisons (Tukey) showed that Mexican Americans scored significantly higher than Black Americans on n Achievement ($\bar{X}_{MA} - \bar{X}_{BA} = 2.39$, p < .05) and significantly higher than Anglos on family achievement ($\bar{X}_{MA} - \bar{X}_{AA} = 1.81$, p < .05). There was no significant difference between Anglos and Mexican Americans on n Achievement.
Table 1: Ethnic group means and standard deviations of scores on family achievement and need achievement

               Black-American              Mexican-American               Anglo-American
           Family ach.     Need ach.   Family ach.     Need ach.   Family ach.     Need ach.
Sex        Mean   S.D.   Mean   S.D.   Mean   S.D.   Mean   S.D.   Mean   S.D.   Mean   S.D.
Male       3.11   3.75   3.13   4.22   3.51   3.29   5.77   4.62   1.65   2.27   5.65   6.72
Female     3.19   2.54   2.50   3.16   3.54   2.97   4.66   4.25   1.77   3.14   5.39   5.01

N = 30 for each group.
Table 2: Ethnic group means and standard deviations on need achievement scores on the three parent cards of the SSPST

             Black-American       Mexican-American       Anglo-American
Sex         N    Mean   S.D.      N    Mean   S.D.      N    Mean   S.D.
Male       30    2.80   1.16     30    2.80   1.63     30    2.17   1.44
Female     30    2.23   1.17     30    3.23   1.81     30    1.83   1.41
Also, Anglo subjects scored significantly higher than Blacks on n Achievement ($\bar{X}_{AA} - \bar{X}_{BA} = 2.68$, p < .01), but Blacks scored significantly higher than Anglos on family achievement ($\bar{X}_{BA} - \bar{X}_{AA} = 1.44$, p < .05).
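The post hoc mean differences reported in this section can be checked roughly against the male and female cell means in Table 1. The sketch below is an illustrative check, not part of the original analysis: the names `cell_means`, `group_mean`, and `mean_difference` are hypothetical, and each ethnic group mean is assumed to be the unweighted average of its male and female cell means, which is reasonable here because every cell has N = 30. Small discrepancies from the published differences are to be expected, since the table entries are already rounded to two decimals.

```python
# Cell means (male, female) transcribed from Table 1; N = 30 per cell.
cell_means = {
    "family": {
        "Black": (3.11, 3.19),
        "Mexican": (3.51, 3.54),
        "Anglo": (1.65, 1.77),
    },
    "n_ach": {
        "Black": (3.13, 2.50),
        "Mexican": (5.77, 4.66),
        "Anglo": (5.65, 5.39),
    },
}


def group_mean(measure: str, group: str) -> float:
    """Unweighted mean of the male and female cells (equal cell sizes)."""
    male, female = cell_means[measure][group]
    return (male + female) / 2


def mean_difference(measure: str, higher: str, lower: str) -> float:
    """Difference between two ethnic group means on one measure."""
    return group_mean(measure, higher) - group_mean(measure, lower)


# Comparisons reported in the Results, with the published differences.
reported = [
    ("n_ach", "Mexican", "Black", 2.39),
    ("family", "Mexican", "Anglo", 1.81),
    ("n_ach", "Anglo", "Black", 2.68),
    ("family", "Black", "Anglo", 1.44),
]

for measure, higher, lower, published in reported:
    computed = mean_difference(measure, higher, lower)
    print(f"{measure}: {higher} - {lower} = {computed:.3f} (published {published})")
```

The recomputed differences agree with the published values to within a few hundredths, which also supports reading the Table 1 columns in the order Black-American, Mexican-American, Anglo-American.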
Discussion

The results obtained here support the contention by Maehr (1974) that contextual conditions are important in expressions of achievement motivation and that the particular form in which achievement is expressed is determined by the definition which culture gives to it. The importance of contextual conditions for eliciting achievement responses is most evident in our finding that Mexican-American and Black children tended to score higher on n Achievement than Anglo children on those cards with parental figures, but scored lower than Anglos overall.⁵ These findings seem to be in line with those obtained by Schwartz (1969) with Mexican-American children in Los Angeles. Schwartz found that Mexican Americans, in contrast to Anglo-Americans, were more concerned about adult than about peer approval of their actions. Since the Mexican-American and Black subjects in the current study expressed achievement motivation in the form of family achievement, it seems likely that if more cards in the test set had contained scenes with parent figures the overall n Achievement scores of these subjects would have been higher.

The most important cultural determinant of achievement motivation, at least for the members of the three groups studied here, may be the degree to which identification with the family is encouraged in socialization. The Mexican-American and Black groups seemed to encourage children to identify with the family early in life and to remain so identified, while the Anglo group seemed to encourage children to consider themselves as separate individuals early in life. The finding that females in all three cultural groups scored higher than males on family achievement and lower on n Achievement may indicate that the females were socialized to identify with the family more than were the males.
The discovery that Mexican-American males scored higher on n Achievement than the other subgroups and were exceeded on family achievement only by Mexican-American females may indicate that Mexican-American males have been socialized both to achieve for the self and for the family. This may be a result of the separation of the sex roles in Mexican-American culture (Madsen, 1964; Ramirez and Castaneda, 1974; also Tuddenham, Brooks, and Melkovich, 1974) and Mexican culture (Diaz-Guerrero, 1955). Researchers have indicated that Mexican and Mexican-American males are not subjected to as much pressure as females to adhere to convention and that as they get older they interact less with family members and more with persons outside the extended family. Tuddenham et al. (1974) found that Mexican-American mothers reported more sex differences in behaviors of their ten-year-old children than Black, Anglo, or Oriental mothers. The results of this study apply only to motivation attributed to like-sexed pictures. Future research should counterbalance sex of the main character in the pictures to ensure that data are not affected by the fact that achievement in most cultures is frequently associated with the male role. In the past, it has been all too readily concluded that Mexican Americans and Blacks have little motivation to achieve, and it has been assumed that somehow their cultures interfere with the development of this motivation. The results of the current study, however, show that the aforementioned conclusions are unjustified. That is, members of certain cultural groups may have appeared to exhibit little achievement motivation because the particular methodology used did not tap achievement motivation as interpreted by that cultural group and/or because the achievement motivation expressed was not recognized as such due to the narrow definition of achievement used.
Authors’ Note

The work described in this paper was supported by a grant from the Center for Research in Social Change and Economic Development, Rice University, Houston, Texas, financed under ARPA order 738. Subsequent analysis of results was supported in part by Research Grant HD 04612, NICHD, Mental Retardation Research Center, UCLA; by the California Department of Mental Hygiene; and by the University of California.
Notes

1. The authors would like to thank the Diocese of Houston for making subjects available for this study.
2. All of the schools were in neighborhoods which were ethnically homogeneous, and most of the instructional and administrative personnel in the schools were of the same ethnic group as the community and children.
3. The four scoring categories given above are those found by Riccuiti and Clark (1957) to have the greatest validity in scoring for n Achievement.
4. Family achievement should not be confused with s Power, as defined by McClelland et al. (1972).
5. The card that elicited most stories with family achievement themes from Mexican-American and Black children showed a child, parent, and a school principal in the principal’s office. A common story to this card was the following: The child is experiencing difficulty in school or has no interest in his(her) studies; the parents are asked to go to school to confer with the principal; after the conference, the parents take an interest in the child’s progress in school and ask the child to study more; this motivates the child to work hard and he(she) succeeds in school, making his(her) parents proud of him(her).
References

Atkinson, J. W. [ed.] (1958) Motives in Fantasy, Action and Society. Princeton, NJ: Van Nostrand.
Caudill, W. and G. A. De Vos (1956) “Achievement culture and personality: the case of the Japanese Americans.” Amer. Anthropologist 58: 1102–1126.
De Vos, G. A. (1968) “Achievement and innovation in culture and personality,” in E. Norbeck, D. Price-Williams, and W. M. McCord (eds.) Personality: An Interdisciplinary Approach. New York: Holt, Rinehart & Winston.
Diaz-Guerrero, R. (1955) “Neurosis and the Mexican family structure.” Amer. J. of Psychiatry 112: 411–417.
Gallimore, R., J. W. Boggs, and C. Jordan (1974) Culture, Behavior, and Education: A Study of Hawaiian-Americans. Beverly Hills: Sage.
Gallimore, R., L. B. Weiss, and R. Finney (1974) “Cultural differences in delay of gratification: a problem of behavior classification.” J. of Personality and Social Psychology 30, 1: 72–80.
Gray, T. (1975) “A bicultural approach to the issue of achievement motivation.” Ph.D. Dissertation: Stanford University, School of Education.
Madsen, W. (1964) Mexican Americans of South Texas. New York: Holt, Rinehart & Winston.
Maehr, M. L. (1974) “Culture and achievement motivation.” Amer. Psychologist 29: 887–895.
McClelland, D. C. (1961) The Achieving Society. New York: Free Press.
McClelland, D. C., W. N. Davis, R. Kalin, and E. Wanner (1972) The Drinking Man. New York: Free Press.
Moore, B. M. and W. Holtzman (1965) Tomorrow’s Parents: A Study of Youth and Their Families. Austin: Univ. of Texas Press.
Ramirez, M. and A. Castaneda (1974) Cultural Democracy, Bicognitive Development and Education. New York: Academic Press.
Riccuiti, H. N. and R. A. Clark (1957) A Comparison of Need-Achievement Stories Written by Experimentally “Relaxed” and “Achievement Oriented” Subjects: Effects Obtained with New Pictures and Revised Scoring Categories. Princeton, NJ: Educational Testing Service.
Schwartz, A. J. (1969) “Comparative values and achievement of Mexican-American and Anglo pupils.” Center for the Study of Evaluation, UCLA Graduate School of Education, Report No. 37.
Tuddenham, R. D., J. Brooks, and L. Melkovich (1974) “Mothers’ reports of behavior of ten-year-olds: relationship with sex, ethnicity and mother’s education.” Developmental Psychology 10, 6: 959–995.
56 Motivation and Learning Environment Differences between Resilient and Nonresilient Latino Middle School Students

Hersholt C. Waxman, Shwu-yong L. Huang and Yolanda N. Padrón
Although many programs and school-based interventions have been found to be effective for some types of students at risk of failure, these programs and interventions have not necessarily been effective for Latino students because programs need to specifically address many of the concerns of these students. Furthermore, even within the general Latino population, it cannot be assumed that all Latino students have similar backgrounds, motivation, and perceptions toward school (Reyes & Valencia, 1993). Some Latino students, for example, have been very successful academically in school, whereas other Latino students have experienced failure and despair in school. Consequently, it may be necessary to first look at Latino students who have done well in school and then see how they differ from less successful Latino students. One area of research that has important implications for the educational improvement of Latino students is that of examining resilient students, or students who succeed in school despite the presence of adverse conditions (Gordon & Song, 1994; Matsen, 1994; McMillan & Reed, 1994; Wang & Gordon, 1994; Winfield, 1991). Although the resilience construct has been widely used in areas like developmental psychopathology (Garmezy, 1991; Matsen, 1994; Matsen, Best, & Garmezy, 1990;
Source: Hispanic Journal of Behavioral Sciences, 19(2) (1997): 137–155.
Rutter, 1987, 1990), its application to educational phenomena has been fairly recent. Wang, Haertel, and Walberg (1994) defined educational resilience as “the heightened likelihood of success in school and other life accomplishments despite environmental adversities brought about by early traits, conditions, and experiences” (p. 46). Alva (1991) used the term academic invulnerability to describe students who “sustain high levels of achievement motivation and performance, despite the presence of stressful events and conditions that place them at risk of doing poorly in school and ultimately dropping out of school” (p. 19). Some Latino students do well in school despite coming from at-risk environments, and it is important to know why these resilient students succeed, whereas other Latino students (i.e., nonresilient students) from equally stressful environments do not. This approach is important because it focuses on the predictors of academic success rather than on academic failure. This focus may also help us design more effective educational interventions because it enables us to specifically identify those alterable factors that distinguish resilient and nonresilient students. The research thrust in this area is to extend previous studies that merely identified and categorized students at risk and to shift to studies that focus on identifying potential individual and school processes that lead to and foster success (Winfield, 1991). In other words, the construct of educational resilience is not viewed as a fixed attribute of some students but, rather, as alterable processes or mechanisms that can be developed and fostered for all students. Fixed attributes of individuals such as students’ ability have not been found to be characteristic of resilient students (Bernard, 1993; Gordon & Song, 1994; Matsen et al., 1990). On the other hand, there are several alterable processes or characteristics that have been found to be associated with resilient children. 
Bernard (1993), for example, maintained that there are four attributes or personal characteristics that resilient children have: (a) social competence like responsiveness, (b) problem-solving skills, (c) autonomy, and (d) a sense of purpose. McMillan and Reed (1994) described four factors that appear to be related to resiliency: (a) individual attributes, (b) positive use of time, (c) family, and (d) school. There have been very few studies, however, that have actually compared resilient and nonresilient Latino students on these characteristics. Furthermore, the research in this area has not typically used the resilience construct. Instead, it has generally focused on characteristics that have differentiated more successful and less successful students. In one such study, Alva (1991) examined the characteristics of a cohort of 10th-grade Mexican American students and found that successful or invulnerable students reported higher levels of educational support from their teachers and friends and were more likely to “(a) feel encouraged and prepared to attend college, (b) enjoy coming to school and being involved in high school activities, (c) experience fewer conflicts and difficulties in their intergroup relations with other students, and (d) experience fewer family
conflicts and difficulties” (p. 31). She also supported the view that research on students at risk needs to focus on aspects of school success rather than school failure, and she maintained that educational policies need to focus on expanding both the protective resources and students’ subjective appraisals (e.g., perceptions or attitudes toward their classroom environment). In a study designed to understand successful high school students, Reyes and Jason (1993) examined factors that distinguished the success and failure of Latino students from an inner-city high school. Based on the students’ 9th-grade attendance rates and academic achievement, they identified 24 10th-grade students as being at high risk for dropping out of school and 24 students as being at low risk. They individually interviewed each participant on a number of topics that covered four areas: (a) family background, (b) family support, (c) overall school satisfaction, and (d) gang pressures. They found that there were no differences between the two groups on (a) socioeconomic status, (b) parent-student involvement, and (c) parental supervision. Low-risk students, however, reported significantly more satisfaction with their school than did high-risk students. On the other hand, high-risk students were more likely to respond that they had (a) been invited to join a gang and (b) brought a weapon to school. The two studies previously described are examples of the growing body of research trying to address the issue of why some Latino students do well and succeed in school, whereas others have not been successful. One concern with these studies, however, is that they typically use only one indicator of success (e.g., grades or achievement data for 1 year) rather than measures that more accurately reflect the construct of educational resilience or being successful over time despite attending at-risk school environments.
Furthermore, these studies do not examine important psychosocial behaviors that have been found to significantly influence students’ cognitive and affective outcomes and several key motivational variables, like achievement motivation and academic self-concept.
Purpose of the Present Study
Although basic skills deficiencies are often cited as the most critical educational problem for Latino students and other students at risk of failure (Slavin, 1989), fostering or maintaining an effective classroom learning environment has been suggested as a means of enabling them to be successful in school (Chavez, 1988; Padrón, 1992; Pierce, 1994). There have been a few studies that have looked at the classroom learning environment of students at risk of academic failure (Duncan & Newby, 1993; Pierce, 1994; Waxman, 1989; Waxman, Huang, Knight, & Owens, 1992), but those studies have not specifically compared resilient and nonresilient students’ perceptions of their classroom learning environment and instructional learning environment in inner-city schools. Similarly, there have been very few studies that have examined the
classroom learning environment of Latino students (Padrón, 1992). It is especially important to examine the learning environment of Latino students because there is some preliminary evidence that they perceive their learning environments very differently from English-monolingual students (Padrón, 1989) and African American students (Waxman, 1989). Furthermore, several studies have found that students perceive that there are differences in the ways high and low achievers are treated in the classroom (Babad, 1990; Weinstein, 1983, 1989; Weinstein & Middlestadt, 1979). Another concern with the prior research in the field is that most of the studies on learning environments have not included measures of students’ motivation and aspirations. It is important to include students’ motivation and aspirations as aspects of the learning environment because they have been found to be highly related to both students’ academic achievement and the classroom learning environment (Cheng, 1994; Knight & Waxman, 1990, 1991; Uguroglu & Walberg, 1986). Furthermore, the variables of student motivation and classroom learning environments have often been researched and discussed separately, but they are so closely related conceptually that they need to be empirically examined together (Knight & Waxman, 1990). Only a limited number of studies, however, have investigated both the classroom learning environment and students’ motivation. The purpose of the present study is to compare resilient and nonresilient Latino students’ motivation and classroom learning environment in mathematics. In addition, other important background characteristics such as academic aspirations, attendance record, and students’ personal time allocation are compared between the two student groups because they have previously been found to be important variables that are related to students’ academic achievement (Dossey, Mullis, Lindquist, & Chambers, 1988).
Furthermore, grade- and sex-related differences are examined in the present study because they have been previously found to affect at-risk students’ attitudes toward their classroom environment (Duncan & Newby, 1993; Waxman & Eash, 1983). This study specifically addresses the following research questions:
1. Are there significant differences between resilient and nonresilient Latino students on background characteristics, academic aspirations, attendance records, and time allocation?
2. Are there significant differences between resilient and nonresilient Latino students on the dimensions of academic self-concept, achievement motivation, involvement, affiliation, satisfaction, and parent involvement?
3. Are there significant differences in the dimensions of academic self-concept, achievement motivation, involvement, affiliation, satisfaction, and parent involvement by students’ sex and grade level?
4. To what extent do students’ background characteristics, classroom and instructional learning environment, and motivation discriminate resilient from nonresilient students?
Methods
Participants
The present study was conducted in the five middle schools of a multicultural school district located in a major metropolitan city in the south central region of the United States. The school district was selected because it had relatively equal representations of Latino, African American, Asian American, and White students in each school and classroom. About 25% of the students enrolled in the district were Latinos, 30% were Whites (i.e., White, non-Latino), 25% were African Americans, and 20% were Asian Americans. In addition, this district was selected because Latino students represented an unsuccessful minority group. Latino middle school students in this district scored significantly lower than all other ethnic groups on statewide standardized achievement tests in mathematics and on the district-administered Four-Step Problem Solving Test (Hofmann, 1986). Furthermore, Latino students in this district had a significantly higher drop-out rate than all the other ethnic groups. Finally, we selected this school district because there is no tracking and students are heterogeneously grouped for mathematics. In other words, each mathematics class would generally include both resilient and nonresilient students. The majority of these middle school Latino students were foreign born, and the second largest group was born in the United States but entered elementary school speaking a primary language other than English. Both groups of students typically received limited primary language instruction and were generally placed in submersion classroom environments with little special assistance. Most of the Latino students in the district came from working-class families. Most of these Latino parents do not have high school degrees, but they do have stable jobs in this urban community, which has a large number of thriving businesses.
Despite the fact that the school district is classified by the state as below average in property wealth, only 15% of the students come from low-income families. There is a very strong academic orientation in this district, as evidenced by the facts that nearly two thirds of the students in the district attend college and only 6% of the students drop out of school. Furthermore, the composite standardized achievement test scores for middle school students in the district show students scoring at around the 70th percentile.
Instruments
The following three standardized instruments were adapted and incorporated for use in the present study: (a) the Multidimensional Motivational Instrument (MMI; Uguroglu, Schiller, & Walberg, 1981; Uguroglu & Walberg, 1986),
(b) the Classroom Environment Scale (CES; Fraser, 1982, 1986), and (c) the Instructional Learning Environment Questionnaire (ILEQ; Knight & Waxman, 1989, 1990). All of the items on these instruments were modified to a personal form in the present study, which elicits an individual student’s responses to his or her role in their mathematics class, rather than a student’s perception of the class as a whole (Fraser, 1991). The Achievement Motivation and Academic Self-Concept scales from the MMI were used in the present study. The instrument has been found to have test-retest reliability and construct and predictive validity (Uguroglu et al., 1981; Uguroglu & Walberg, 1986). The Achievement Motivation scale measures the extent to which students feel the intrinsic desire to succeed and earn good grades in mathematics, and the Academic Self-Concept scale measures the extent to which students exhibit pride in their classwork and expect to do well in mathematics. The CES is a questionnaire that has been widely used in a variety of different educational settings to measure students’ perceptions of their relationships with students and teachers as well as the organizational structure of the classroom. The content and concurrent validities of the CES have been established through correlational studies and classroom observation (Fisher & Fraser, 1983; Fraser, 1982, 1986; Moos, 1979). Adequate internal consistency reliability coefficients were also obtained in previous studies (Fisher & Fraser, 1983; Fraser, 1982, 1986; Moos, 1979). For the present study, the two scales that were used were (a) the Involvement scale, which measures the extent to which students participate actively and attentively in their mathematics class, and (b) the Affiliation scale, which measures the extent to which students know, help, and are friendly toward each other in their mathematics class. The ILEQ measures students’ perceptions of several aspects of their instructional learning environment. 
It has been found to have adequate internal consistency reliability coefficients and test-retest reliability coefficients (Knight & Waxman, 1989, 1990; Waxman et al., 1992). For the present study, the two scales that were used were (a) the Satisfaction scale, which measures the extent of students’ enjoyment of their mathematics class and school work in mathematics, and (b) the Parent Involvement scale, which measures the extent to which parents are interested and involved in what their children are doing in mathematics. Each scale from the three instruments includes four items, and all of the items were measured on a 4-point, Likert-type scale: 1 (not at all true), 2 (not very true), 3 (sort of true), and 4 (very true). Each student’s responses to the items within the same scale were summed and averaged. Consequently, a mean value of 4 indicates that the student responded favorably to the scale, whereas a mean value of 1 indicates that the student responded unfavorably to the scale. Several background items selected from the National Educational Longitudinal Study of 1988 (NELS:88) were also included in the final study
survey (Hafner, Ingels, Schneider, & Stevenson, 1990). These items included questions about students’ (a) background characteristics (e.g., mathematics grades), (b) academic aspirations (e.g., how far they will go in school), (c) attendance (e.g., number of days missed), and (d) time allocation (e.g., time spent on homework). Students’ mathematics achievement was measured using the Four-Step Problem Solving Test (Hofmann, 1986). This test consists of 10 nonroutine mathematics problems, each with four related questions: (a) reading to understand the problem, (b) selecting a strategy, (c) solving the problem, and (d) reviewing and extending the problem. It is a multiple-choice, paper-and-pencil test designed to measure the mathematical problem-solving skills of middle school students. The range for the total test is 0 to 40. The school district in the present study annually administers the Four-Step Problem Solving Test to all middle school students to assess their problem-solving achievement in mathematics, which is the district’s top priority in mathematics.
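The scoring rule described for the six survey scales (four items per scale, each answered from 1 to 4, then averaged) can be illustrated with a minimal sketch. The function name, the validation checks, and the example responses are hypothetical and are not taken from the study. The last constant assumes one point per question on the Four-Step Problem Solving Test, which is consistent with the stated 0 to 40 range but is not spelled out in the text.

```python
from statistics import mean

# Response labels of the 4-point, Likert-type scale described in the text.
RESPONSE_LABELS = {
    1: "not at all true",
    2: "not very true",
    3: "sort of true",
    4: "very true",
}

ITEMS_PER_SCALE = 4  # each of the six scales has four items


def scale_score(item_responses):
    """Average one student's four item responses (each 1-4) into a scale score."""
    if len(item_responses) != ITEMS_PER_SCALE:
        raise ValueError("each scale has exactly four items")
    if any(r not in RESPONSE_LABELS for r in item_responses):
        raise ValueError("responses must be integers from 1 to 4")
    return mean(item_responses)


# Hypothetical responses for one student on one scale (not data from the study).
example = [3, 4, 3, 4]
print(scale_score(example))  # 3.5

# Assumption: one point per question, so 10 problems x 4 questions gives the 0-40 range.
FOUR_STEP_MAX_SCORE = 10 * 4
print(FOUR_STEP_MAX_SCORE)  # 40
```

A score near 4 corresponds to a favorable response to the scale and a score near 1 to an unfavorable one, as the text states.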
Procedures
The scales from the three instruments and the background items from the NELS:88 survey were combined into one survey and were administered concurrently by trained researchers near the end of the school year during students’ regular mathematics class. We selected two scales from each of the three instruments because the school district only allowed us about 45 minutes to administer the combined survey instrument. Students were informed by the researchers that the questionnaires were not tests and that completed questionnaires would not be seen by their teachers or other school personnel. All middle school students in the district were asked to complete the questionnaire as part of an ongoing evaluation of the mathematics curriculum. The response rate for the student questionnaire was about 97%, and it took students approximately 40 minutes to complete. From the entire population of Latino students in the district who completed the questionnaire, a stratified sample of 60 resilient and 60 nonresilient Latino students was randomly selected to be included in the present study. Students identified as gifted or talented, special education, or developmental were excluded from the population to avoid potential effects related to ability differences. Students were classified as resilient if they (a) scored at or above the 75th percentile on the district-administered, standardized Four-Step Problem Solving Test over a 2-year period and (b) reported receiving A’s or B’s in mathematics over a 2-year period. Students were classified as nonresilient if they (a) scored at or below the 25th percentile on the Four-Step Problem Solving Test for a 2-year period, (b) reported receiving C’s, D’s, or F’s for mathematics this year, and (c) reported receiving B’s, C’s, D’s, or F’s in mathematics the previous year. A stratified sampling technique was used to
obtain an equal number of students by sex and grade within each student group (i.e., resilient or nonresilient). Chi-square tests were used to compare the frequencies of responses between resilient and nonresilient students on the background items from the NELS:88 survey. A three-way multivariate analysis of variance (MANOVA) was used to determine (a) whether there are motivational and perceptual differences by students’ sex, grade, and student classification (resilient or nonresilient) and (b) whether there are any interaction effects by sex, student classification, and/or grade level. As a follow-up procedure, univariate analysis of variance (ANOVA) and post hoc multiple comparison tests were also performed to determine where the significant differences were. Finally, descriptive discriminant analysis was used to determine the extent to which the two groups differ with respect to their classroom learning environment, instructional learning environment, motivation, and background characteristics. To ensure adequate reliability and validity of the six scales used in this study, internal consistency reliability (Cronbach’s alpha) and discriminant validity (correlations between scales) were examined. These coefficients were calculated using the individual student as the unit of statistical analysis. The results indicated that the mean alpha coefficient of these scales was .60, and the individual coefficients ranged from .42 to .73, indicating that the survey instrument has adequate reliability given the small number of items per scale. The mean correlation between the scales was .29, and the individual correlations between scales ranged from .11 to .59, indicating that the survey instrument has adequate discriminant validity. We also examined the reliability and validity coefficients separately for resilient and nonresilient students but did not find any substantial differences between the two groups of students.
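For readers unfamiliar with the internal consistency coefficient reported above, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores), applied to one four-item scale. The function name and the response matrix are hypothetical illustrations, not the authors' code or data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one scale.

    responses: list of per-student lists, one response per item.
    """
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every student must answer every item")
    # Variance of each item across students (sample variance).
    item_variances = [variance(column) for column in zip(*responses)]
    # Variance of each student's summed scale total.
    total_variance = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of five students to a four-item scale (not study data).
hypothetical_scale = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 3],
    [1, 2, 2, 1],
    [3, 3, 2, 3],
]
print(round(cronbach_alpha(hypothetical_scale), 2))
```

With only four items per scale, alpha tends to be lower than it would be for a longer scale measuring the same construct, which is the point the text makes when it judges coefficients between .42 and .73 adequate.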
Results
The descriptive and chi-square results for the two student groups revealed that there were no significant differences between the two groups on whether they spoke a non-English language before they started school, χ²(1) = 1.35, p = .256. About 76% of the resilient students indicated that they spoke a language other than English before they started school, whereas about 67% of the nonresilient students responded that they also spoke a language other than English before starting school. There were, however, statistically significant differences between the two groups on the extent to which students were held back a grade in school, χ²(1) = 23.48, p < .001. About 53% of the nonresilient students indicated that they were held back a grade in school, compared with only 13% of resilient students. There were significant differences between the two student groups on their academic aspirations. Resilient students were significantly more likely to indicate that they were sure that they would graduate from high school,
χ²(2) = 17.01, p < .001, and they were significantly more likely to respond that they would graduate from college and attend graduate school, χ²(4) = 29.00, p < .001. About 78% of the resilient students indicated that they would graduate from high school, compared with only 43% of the nonresilient students. Similarly, over 90% of the resilient students indicated that they would graduate from college or attend graduate school, compared with only about 46% of the nonresilient students. There were also significant differences between the two groups on attendance records. Resilient students were less likely than nonresilient students to report cutting or skipping classes, χ²(3) = 10.53, p = .015, and being late for school, χ²(4) = 21.87, p < .001. There were statistically significant differences between the two groups on two of the time allocation items. Resilient students reported that they spent significantly more time doing mathematics homework each week than nonresilient students, χ²(4) = 11.71, p = .020. Resilient students also indicated that they spent more time on additional reading than nonresilient students, χ²(4) = 21.81, p < .001. There were no significant differences between the two groups on the amount of time they spent watching television on weekends, χ²(4) = 4.03, p = .402, or during the weekdays, χ²(4) = 4.89, p = .298, or on the amount of time spent listening to CDs, tapes, or the radio, χ²(4) = 7.54, p = .110. The three-way MANOVA results indicated that there are significant main effects of group and grade on middle school students’ motivation and perceptions of their learning environment. Resilient students’ overall motivation and perceptions of their mathematics classroom learning environment were significantly different from those of nonresilient students, F(6, 103) = 7.36, p = .0001. Students’ overall motivation and perceptions of their learning environment also differed by grade, F(12, 206) = 1.97, p = .0280.
There were, however, no significant main effects for sex, or interaction effects of (a) group by sex, (b) group by grade, (c) sex by grade, or (d) group by sex and grade. The descriptive and univariate ANOVA results for students’ motivation and perceptions by group and grade revealed that resilient students had significantly higher perceptions of involvement, F(1, 108) = 33.52, p < .001, satisfaction, F(1, 108) = 15.48, p < .001, academic self-concept, F(1, 108) = 28.10, p < .001, and achievement motivation, F(1, 108) = 13.15, p < .001, than nonresilient students. There were no significant differences between the two groups of students on the Affiliation, F(1, 108) = 3.18, p = .077, and Parent Involvement, F(1, 108) = 0.13, p = .718, scales. With the exception of parent involvement, the mean values for the resilient students were over 3.0, which indicates a highly positive attitude and motivation. With the exception of affiliation, the mean values on the scales for the nonresilient students ranged from 2.6 to 2.9. These values are slightly higher than the 2.5 midpoint of the 1-to-4 response scale, which indicates that nonresilient students had slightly positive perceptions and motivation. The standard deviations were similar for the two groups, with the exception of achievement motivation, for which nonresilient students had greater variation among their responses.
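The degrees of freedom reported for the multivariate and univariate tests can be reconstructed from the design, which is a useful consistency check. The sketch below assumes a full-factorial 2 (group) × 2 (sex) × 3 (grade) design with complete data for all 120 students and six dependent scales, and it assumes the multivariate F is the exact transform of Wilks's lambda. The text does not name the multivariate statistic, so that last point is an inference from the reported values. Function names are hypothetical.

```python
def error_df(n_students, n_cells):
    """Error degrees of freedom for a full-factorial design with complete data."""
    return n_students - n_cells


def wilks_exact_f_df(n_dvs, effect_df, err_df):
    """Degrees of freedom of the exact F transform of Wilks's lambda.

    Exact transforms exist when the effect has 1 or 2 degrees of freedom.
    """
    if effect_df == 1:
        return (n_dvs, err_df - n_dvs + 1)
    if effect_df == 2:
        return (2 * n_dvs, 2 * (err_df - n_dvs + 1))
    raise ValueError("exact F shown here only for effect_df of 1 or 2")


N_STUDENTS = 120    # 60 resilient + 60 nonresilient
N_CELLS = 2 * 2 * 3  # group x sex x grade (sixth, seventh, eighth)
N_SCALES = 6         # six motivation and learning environment scales

err = error_df(N_STUDENTS, N_CELLS)
print(err)                                 # 108, the univariate error df
print(wilks_exact_f_df(N_SCALES, 1, err))  # (6, 103), group main effect
print(wilks_exact_f_df(N_SCALES, 2, err))  # (12, 206), grade main effect
```

Under these assumptions the values agree with the reported F(6, 103) for group, F(12, 206) for grade, and the univariate F(1, 108) and F(2, 108) tests.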
In regard to the grade-related differences, sixth-grade students reported significantly higher involvement than seventh-grade students, F(2, 108) = 5.20, p = .007. Eighth-grade students also reported significantly higher achievement motivation than seventh-grade students, F(2, 108) = 3.40, p = .037. There were no significant differences on the Affiliation, F(2, 108) = 2.48, p = .088, Satisfaction, F(2, 108) = 2.35, p = .100, Parent Involvement, F(2, 108) = 0.91, p = .405, and Academic Self-Concept scales, F(2, 108) = 0.77, p = .464, among the three grade levels. The standard deviations were generally similar across the three grade levels.
A discriminant function analysis was performed to determine the extent to which the two groups differ with respect to their classroom learning environment, instructional learning environment, motivation, and background characteristics. To reduce the large number of variables examined in this study to a more parsimonious model, only those variables that were previously found to differ significantly between the two groups were entered directly into a discriminant model to see how well they were able to discriminate between the two groups of students. Descriptive discriminant analysis was used instead of predictive discriminant analysis because the purpose of the analysis was to describe the MANOVA results (Huberty & Barton, 1989). The direct entry model examines the independent contribution of each of the variables in determining group membership. The model produced a Wilks’s lambda of .501, F(12, 107) = 8.87, which was statistically significant at the p < .0001 level. The discriminant function had a canonical correlation of .71, indicating a moderately strong relationship between the groups and the discriminant function. The squared canonical correlation coefficient for the model was .50, indicating that about 50% of the variance between the two groups can be explained by the 12 variables in this model.
A classification matrix revealed that overall, 86% of the cases were correctly classified, with 90% of the resilient student cases correctly classified and 83% of the nonresilient student cases correctly classified. The standardized discriminant function coefficients describe the impact or independent contribution of a given variable on the grouping variable, holding constant the impact of all the other discriminating variables. The results indicated that the variables of not held back in school, academic aspirations, and expectations for high school graduation were found to have the greatest impact, after adjusting for all the other variables in the analysis. The variables time spent on homework and academic self-concept were found to have the least impact on the grouping variable. The canonical structure coefficients for each variable provide an indication of the relative contribution of each variable to the overall discriminant function. They describe how closely a variable and the discriminant function are related. The results indicated that 10 of the 12 independent variables included in the discriminant analysis were found to have structure coefficient values of .40 or greater and have the greatest practical significance for distinguishing
between resilient and nonresilient students. These variables are academic aspirations, involvement, academic self-concept, expectations for high school graduation, not held back in school, satisfaction, late for school, time spent reading additional material, achievement motivation, and time spent on homework. Only the variables of days missed in school and cut or skipped class do not appear to be highly related to the discriminant function.
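The discriminant statistics reported in this section are tied together by standard identities for the two-group case, where there is a single discriminant function: Wilks's lambda equals 1 minus the squared canonical correlation, and lambda converts to an exact F with (p, N - p - 1) degrees of freedom. The sketch below checks the reported values against those identities. It assumes all 120 sampled students were included in the analysis, and the function names are hypothetical.

```python
def wilks_lambda_from_canonical_r(canonical_r):
    """Wilks's lambda for a single discriminant function (two groups)."""
    return 1.0 - canonical_r ** 2


def f_from_wilks_two_groups(wilks_lambda, n_cases, n_predictors):
    """Exact F transform of Wilks's lambda when there are two groups."""
    df1 = n_predictors
    df2 = n_cases - n_predictors - 1
    f_value = (1.0 - wilks_lambda) / wilks_lambda * df2 / df1
    return f_value, (df1, df2)


# Reported canonical correlation of .71 implies a lambda close to the reported .501.
print(round(wilks_lambda_from_canonical_r(0.71), 3))  # 0.496

# Reported lambda of .501 with 12 predictors; N = 120 is an assumption.
f_value, df = f_from_wilks_two_groups(0.501, n_cases=120, n_predictors=12)
print(round(f_value, 2), df)  # 8.88 (12, 107)
```

The small gaps from the reported lambda of .501 and F of 8.87 are consistent with rounding in the published figures, and the recovered degrees of freedom match the reported F(12, 107).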
Discussion
In the present study, we specifically focused on Latino middle school students from a multicultural, metropolitan school district and found that motivation and psychosocial processes differed significantly between resilient and nonresilient students. Despite coming from the same school environment and similar home environments, some Latino students have done exceptionally well in their mathematics classes, whereas others have done quite poorly. As expected, we also found that resilient students are much more motivated than their nonresilient classmates and that they are much more satisfied and involved with their mathematics classes. These findings are similar to other studies that have found that student satisfaction differentiates resilient and nonresilient students (Alva, 1991; Reyes & Jason, 1993). Another interesting finding of the present study was that there was not a statistically significant difference between resilient and nonresilient students on the extent to which they spoke a language other than English before arriving at school. About 76% of the resilient students and 67% of the nonresilient students spoke a non-English language before they started going to school. This finding lends support to other studies that have similarly found that language factors are not significant predictors and do not hinder Latino secondary students’ academic achievement (Adams, Astone, Nunez-Wormack, & Smodlaka, 1994; Buriel & Cardoza, 1988). Unlike previous studies, the findings from the present investigation did not reveal any sex-related differences. The grade-related differences found in this study revealed that sixth-grade students were more involved than seventh-grade students and that eighth-grade students had higher achievement motivation than seventh-grade students. Although the difference was not statistically significant, seventh-grade students were also found to be less satisfied with their mathematics class than sixth- and eighth-grade students.
Additional studies may want to specifically investigate why seventh-grade Latino students have lower perceptions than other middle school students. Curriculum factors and/or instructional processes may need to be explored. Another important finding from this study related to the high academic aspirations held by resilient Latino students. Although the findings for the nonresilient Latino students are very similar to the overall national results for Hispanic students from the NELS:88 (Peng, Wright, & Hill, 1995), the
results for resilient students are much higher. Although there were no significant differences found between resilient and nonresilient students on their perceptions of parental involvement, which measures aspects of (a) parental interest (e.g., “My parents are often interested in what I do in mathematics”) and (b) parent expectations (e.g., “My parents expect me to do well in mathematics”), it is still possible that there might be differences in parents’ aspirations for their children. Another possible explanation for resilient Latino students’ high aspirations stems from the overall high academic press and expectations for students in the district. After all the quantitative data were collected, we informally asked two of the middle school mathematics teachers to help us explain why resilient students do significantly better in mathematics and have higher academic aspirations. They cited several personality traits like persistence and positive work habits that they thought distinguished resilient from nonresilient students. We specifically asked them about family characteristics that might distinguish the two groups of students, but they could not identify any family demographics that they thought were different. Further studies, however, may need to explicitly focus on students’ home and family characteristics. Students can be exposed to inappropriate educational experiences in either the family, school, or community (Pallas, Natriello, & McDill, 1989). Community demographics and family conditions, however, cannot be greatly changed by educators, whereas educational policy and practice can be modified to improve the education of students at risk (Comer, 1987; Waxman, 1992). 
Policymakers, administrators, teachers, and parents need to know why some students are successful and do well in school, whereas other students (a) from identical socioeconomic backgrounds, (b) from similar home environments, (c) with similar ability, and (d) from the same schools and classrooms do not do well academically. Examining these factors will allow us to investigate the circumstances that place these students at risk as well as those processes or factors that foster success. One of the major advantages of the approach of studying educational resilience is that it shifts us away from the educational research and policy perspective that has primarily focused on school failure and predictors of school failure to one that now focuses on the academic success of students who come from disadvantaged circumstances. In the present study, we examined indicators of at least four important factors that McMillan and Reed (1994) identified as being related to resiliency: (a) individual attributes, such as students’ motivation; (b) school and classroom factors, like satisfaction, involvement, and affiliation; (c) family factors, like parent involvement; and (d) positive use of time, like doing homework. Future studies should investigate other indicators of these four factors as well as examine other variables or factors that differentiate resilient and nonresilient Latino students. Nelson-LeGall and Jones (1991), for example, argue that classroom help-seeking behavior is a strategy or skill that allows learners to cope with academic difficulties and thus become
a protective mechanism in the classroom learning context. Clark (1991) similarly suggests that social identity and support networks are resilient behaviors that need to be fostered and developed by students, and Barbarin (1993) maintains that we need to focus on the coping processes students use to mediate risk factors. These variables and others, like peer-group support, problem-solving skills, and students’ cognitive learning strategies, need to be explored in future studies. Although the present study specifically focused on examining motivational and psychosocial differences between resilient and nonresilient Latino students, other theoretical and conceptual work in the area has focused on the processes and mechanisms that can be developed and altered to facilitate students’ resilient behaviors. Rutter (1987, 1990), for example, has identified four processes that can be developed to facilitate resiliency: (a) reducing the risk impact and changing students’ exposure to the risk, (b) reducing the negative chain reactions that often follow exposure to risks, (c) improving students’ self-efficacy or self-esteem, and (d) opening up or creating new opportunities for students. Matsen (1994) has similarly described four strategies for fostering resiliency: (a) reducing vulnerability and risk, (b) reducing stressors, (c) increasing available resources, and (d) mobilizing protective processes. Swanson and Spencer (1991) provide some specific suggestions for enhancing most of these resiliency processes. They maintain that to reduce the risk impact, we should (a) increase access to academically challenging programs for disadvantaged students, (b) forge alliances between schools, churches, organizations, and businesses, and (c) increase funding for early childhood programs. 
To reduce negative chain reactions, Swanson and Spencer argue that teacher training, teacher recruitment, and teacher retention need to be addressed and altered, and parent involvement in schools also needs to be increased. To improve students’ self-efficacy, they argue that schools should recognize and demand academic performance and also redesign classrooms into heterogeneous ability groups rather than track by ability level. Finally, to open up opportunities, they maintain that there should be increased funding for compensatory education, student financial aid, pilot programs, and updated technological equipment. They also call for integrating resources from schools, businesses, and communities to help students make a smooth transition from the school to work environment. Although the results of the present causal-comparative study do not allow us to lend support to Rutter’s (1987, 1990), Matsen’s (1994), and Swanson and Spencer’s (1991) research, the findings from this study suggest that future experimental studies examining areas such as improving students’ motivation and self-efficacy may be warranted. An important methodological consideration that needs to be examined in other studies is the criteria chosen to define educationally resilient and nonresilient Latino students. Several specific criteria were chosen for the present study. First, standardized achievement test scores for a 2-year period were
used. Because the construct of educational resilience suggests sustained success or success over time, it is important that at least two measures of achievement over time are used. Standardized test scores are admittedly a narrow measure of students’ achievement, but they do represent one of the primary outcomes that school districts use to assess their educational accomplishments. The addition of student grades as a criterion helps support the success criteria. Again, we used grades from a 2-year period to examine the resilient criteria. Resilient students received A’s or B’s for a 2-year period, whereas nonresilient students typically received C’s or less for the 2-year period. Finally, the selection of only mathematics test scores suggests that a student may be educationally resilient in one content or subject area but not resilient in another. Given the large body of research that has found that there are content-specific attitudinal, instructional, curricular, and achievement differences for students (Needels & Gage, 1991; Stodolsky, 1988; Stodolsky & Grossman, 1995), it may be important to conduct content-specific research on resilience before we determine whether educational resilience is a content-specific or generic phenomenon. In other words, additional studies should examine whether educational resilience is content specific (i.e., different according to the content area examined) or generic (i.e., similar across all content areas). Although the findings from the present study have some important educational implications, further descriptive, correlational, and especially experimental research is needed to verify these results. Longitudinal studies are also essential to adequately study the educational resilience phenomenon. It is important to investigate at what point resilience develops, and it is also necessary to look at the long-term stability of the construct.
Further studies also need to specifically examine how aspects of the classroom learning environment and instructional learning environment can be changed so that they can serve as protective mechanisms for students in at-risk school environments (Waxman, 1992). In addition, affective or motivational training programs may need to be developed and implemented to see if they improve Latino students’ affective and cognitive outcomes. These and similar issues should be examined so that we can continue to understand why some Latino students are resilient and how we can help other students develop resiliency and become more successful.
References
Adams, D., Astone, B., Nunez-Wormack, E., & Smodlaka, I. (1994). Predicting the academic achievement of Puerto Rican and Mexican-American ninth-grade students. Urban Review, 26, 1–14.
Alva, S. A. (1991). Academic invulnerability among Mexican-American students: The importance of protective resources and appraisals. Hispanic Journal of Behavioral Sciences, 13, 18–34.
Babad, E. (1990). Measuring and changing teachers’ differential behavior as perceived by students and teachers. Journal of Educational Psychology, 82, 683–690.
Barbarin, O. A. (1993). Coping and resilience: Exploring the inner lives of African American children. Journal of Black Psychology, 19, 478–492.
Bernard, B. (1993). Fostering resiliency in kids. Educational Leadership, 51(3), 44–48.
Buriel, R., & Cardoza, D. (1988). Sociocultural correlates of achievement among three generations of Mexican American high school students. American Educational Research Journal, 25, 177–192.
Chavez, R. C. (1988). Theoretical issues relevant to bilingual multicultural climate research. Educational Issues of Language Minority Students, 3, 5–14.
Cheng, Y. C. (1994). Classroom environment and student affective performance: An effective profile. Journal of Experimental Education, 62, 221–239.
Clark, M. L. (1991). Social identity, peer relations, and academic competence of African-American adolescents. Education and Urban Society, 24, 41–52.
Comer, J. P. (1987). New Haven’s school community connection. Educational Leadership, 44(6), 13–16.
Dossey, J. A., Mullis, I. V. S., Lindquist, M. M., & Chambers, D. L. (1988). The mathematics report card: Trends and achievement based on the 1986 national assessment (Rep. No. 17-M-01). Princeton, NJ: National Assessment of Educational Progress.
Duncan, L., & Newby, R. (1993). Attitudes of at-risk students toward their school environment. Texas Researcher, 4, 39–46.
Fisher, D. L., & Fraser, B. J. (1983). Validity and use of Classroom Environment Scale. Educational Evaluation and Policy Analysis, 5, 261–271.
Fraser, B. J. (1982). Development of short forms of several classroom environment scales. Journal of Educational Measurement, 19, 221–227.
Fraser, B. J. (1986). Classroom environment. London: Croom Helm.
Fraser, B. J. (1991). Validity and use of classroom environment instruments. Journal of Classroom Interaction, 26(2), 5–11.
Garmezy, N. (1991). Resilience and vulnerability to adverse developmental outcomes associated with poverty. American Behavioral Scientist, 34, 416–430.
Gordon, E. W., & Song, L. D. (1994). Variations in the experience of resilience. In M. C. Wang & E. W. Gordon (Eds.), Educational resilience in inner-city America: Challenges and prospects (pp. 27–43). Hillsdale, NJ: Lawrence Erlbaum.
Hafner, A., Ingels, S., Schneider, B., & Stevenson, D. (1990). A profile of the American eighth grader: NELS:88 Student descriptive summary. Washington, DC: U.S. Department of Education, National Center for Educational Statistics.
Hofmann, P. S. (1986). Construction and validation of a testing instrument to measure problem-solving skills of students. Unpublished doctoral dissertation, Temple University, Philadelphia.
Huberty, C. J., & Barton, R. M. (1989). An introduction to discriminant analysis. Measurement and Evaluation in Counseling and Development, 2, 158–168.
Knight, S. L., & Waxman, H. C. (1989, January). Development and validation of the instructional learning environment questionnaire. Paper presented at the annual meeting of the Southwest Educational Research Association, Houston, TX.
Knight, S. L., & Waxman, H. C. (1990). Investigating the effects of the classroom learning environment on students’ motivation in social studies. Journal of Social Studies Research, 14, 1–12.
Knight, S. L., & Waxman, H. C. (1991). Students’ cognition and classroom instruction. In H. C. Waxman & H. J. Walberg (Eds.), Effective teaching: Current research (pp. 239–255). Berkeley, CA: McCutchan.
Matsen, A. S. (1994). Resilience in individual development: Successful adaptation despite risk and adversity. In M. C. Wang & E. W. Gordon (Eds.), Educational resilience in inner-city America: Challenges and prospects (pp. 3–25). Hillsdale, NJ: Lawrence Erlbaum.
Matsen, A. S., Best, K. M., & Garmezy, N. (1990). Resilience and development: Contributions from the study of children who overcome adversity. Development and Psychopathology, 2, 425–444.
McMillan, J. H., & Reed, D. F. (1994). At-risk students and resiliency: Factors contributing to academic success. The Clearing House, 67, 137–140.
Moos, R. H. (1979). Evaluating educational environments: Procedures, measures, findings, and policy implications. San Francisco: Jossey-Bass.
Needels, M., & Gage, N. L. (1991). Essence and accident in process-product research on teaching. In H. C. Waxman & H. J. Walberg (Eds.), Effective teaching: Current research (pp. 3–31). Berkeley, CA: McCutchan.
Nelson-LeGall, S., & Jones, E. (1991). Classroom help-seeking behavior of African-American children. Education and Urban Society, 24, 27–40.
Padrón, Y. N. (1989, April). A comparison of bilingual and English-monolingual students’ perceptions of their classroom learning environment in reading. Paper presented at the annual meeting of the American Educational Research Association, San Francisco.
Padrón, Y. N. (1992). Comparing bilingual and monolingual students’ perceptions of their classroom learning environment. In H. C. Waxman & C. D. Ellett (Eds.), The study of learning environments (Vol. 5, pp. 108–113). Houston: University of Houston.
Pallas, A. M., Natriello, G., & McDill, E. L. (1989). The changing nature of the disadvantaged: Current dimensions and future trends. Educational Researcher, 18(5), 16–22.
Peng, S. S., Wright, D., & Hill, S. T. (1995). Understanding racial-ethnic differences in secondary school science and mathematics. Washington, DC: National Center for Education Statistics.
Pierce, C. (1994). Importance of classroom climate for at-risk learners. Journal of Educational Research, 88, 37–42.
Reyes, O., & Jason, L. A. (1993). Pilot study examining factors associated with academic success for Hispanic high school students. Journal of Youth and Adolescence, 22, 57–71.
Reyes, P., & Valencia, R. R. (1993). Educational policy and the growing Latino student population: Problems and prospects. Hispanic Journal of Behavioral Sciences, 15, 258–283.
Rutter, M. (1987). Psychosocial resilience and protective mechanisms. American Journal of Orthopsychiatry, 37, 317–331.
Rutter, M. (1990). Psychosocial resilience and protective mechanisms. In J. Rolf, A. Masten, D. Cichetti, K. Nuechterlein, & S. Weintraub (Eds.), Risk and protective factors in the development of psychopathology (pp. 181–214). New York: Cambridge University Press.
Slavin, R. E. (1989). Students at risk of school failure: The problem and its dimensions. In R. E. Slavin, N. L. Karweit, & N. A. Madden (Eds.), Effective programs for students at risk (pp. 3–19). Boston: Allyn & Bacon.
Stodolsky, S. S. (1988). The subject matters: Classroom activity in math and social studies. Chicago: University of Chicago.
Stodolsky, S. S., & Grossman, P. L. (1995). The impact of subject matter on curricular activity: An analysis of five academic subjects. American Educational Research Journal, 32, 227–249.
Swanson, D. P., & Spencer, M. B. (1991). Youth policy, poverty, and African-Americans: Implications for resilience. Education and Urban Society, 24, 148–161.
Uguroglu, M. E., Schiller, D. P., & Walberg, H. J. (1981). A multidimensional motivational instrument. Psychology in the Schools, 18, 279–285.
Uguroglu, M. E., & Walberg, H. J. (1986). Predicting achievement and motivation. Journal of Research and Development in Education, 19, 1–12.
Wang, M. C., & Gordon, E. W. (Eds.). (1994). Educational resilience in inner-city America: Challenges and prospects. Hillsdale, NJ: Lawrence Erlbaum.
Wang, M. C., Haertel, G. D., & Walberg, H. J. (1994). Educational resilience in inner cities. In M. C. Wang & E. W. Gordon (Eds.), Educational resilience in inner-city America: Challenges and prospects (pp. 45–72). Hillsdale, NJ: Lawrence Erlbaum.
Waxman, H. C. (1989). Urban Black and Hispanic elementary school students’ perceptions of classroom instruction. Journal of Research and Development in Education, 22, 57–61.
Waxman, H. C. (1992). Reversing the cycle of educational failure for students in at-risk school environments. In H. C. Waxman, J. Walker de Felix, J. Anderson, & H. P. Baptiste (Eds.), Students at risk in at-risk schools: Improving environments for learning (pp. 1–9). Newbury Park, CA: Corwin.
Waxman, H. C., & Eash, M. J. (1983). Utilizing students’ perception and context variables to analyze effective teaching: A process-product investigation. Journal of Educational Research, 76, 322–325.
Waxman, H. C., Huang, S. L., Knight, S. L., & Owens, E. W. (1992). Investigating the effects of the classroom learning environment on the academic achievement of at-risk students. In H. C. Waxman & C. D. Ellett (Eds.), The study of learning environments (Vol. 5, pp. 92–100). Houston: University of Houston.
Weinstein, R. S. (1983). Student perceptions of schooling. Elementary School Journal, 83, 287–312.
Weinstein, R. S. (1989). Perceptions of classroom processes and student motivation: Children’s views of self-fulfilling prophecies. In C. Ames & R. Ames (Eds.), Research on motivation in education: Goals and cognitions (Vol. 3, pp. 187–221). San Diego, CA: Academic Press.
Weinstein, R. S., & Middlestadt, S. E. (1979). Students’ perceptions of teacher interactions with male high and low achievers. Journal of Educational Psychology, 71, 421–431.
Winfield, L. F. (1991). Resilience, schooling, and development in African-American youth: A conceptual framework. Education and Urban Society, 24, 5–14.
57
Attracting and Retaining Teachers: A Question of Motivation
Karin Müller, Roberta Alliata and Fabienne Benninghoff
Introduction
Matching vacant teaching posts with qualified candidates is a key issue for the organization and running of schools. Given the cyclical patterns of teacher supply and demand, this matching operation is not an easy one. In a bid to overcome its short-term, annual recruitment horizon and to take early political action in order to avoid a shortage or surplus of teachers, the Canton of Geneva’s Education Department put in place a human resources planning system (Gestion prévisionnelle des enseignants [GPE]) which allows the Department to forecast demand for teachers up to five years in advance. All in all, the Education Department employs 7300 teachers at primary and secondary school level, teaching students from age 4 to age 19. However, forecasting the number of teachers needed is not enough for purposes of directing policy responses. What are the most significant measures for attracting and retaining competent teachers within the profession? The Canton of Geneva’s human resources planning system for teachers consists of complementary tools (i.e. a database, a dashboard of indicators, a prospective system and also surveys) to deliver information that will enable decision makers to identify areas where action might be particularly effective. The two surveys are aimed at a better understanding of the key stages in teaching careers: the motivation for entering teaching and the reasons for leaving the profession and taking early retirement.
Source: Educational Management Administration & Leadership, 37(5) (2009): 574–598.
The objective of our present article is to define to what extent an understanding of these different types of teacher motivation can provide a decision framework for defining teacher policies that will make it possible to attract, retain and develop effective teachers.
Education System in Switzerland
Swiss Institutional Background
Switzerland has a federalist system where responsibility for education is divided between the Confederation (e.g. vocational training and tertiary education) and the cantons (e.g. compulsory schooling). However, responsibilities are not distributed in a simple, dichotomic way between the Confederation and the cantons. The Confederation and the cantons cooperate and provide mutual support for each other, in a spirit of ‘co-operative federalism’. The new Federal Constitution of 18 April 1999 confirms the historical sovereignty of the 26 cantons: ‘the cantons are sovereign insofar as their sovereignty is not limited by the Federal Constitution; they shall exercise all rights which are not transferred to the Confederation’ (Article 3). In concrete terms, this means that the cantons have the right to legislate in certain domains. This is also the case for the education sector, and, according to the Federal Constitution:
1) Education is a cantonal matter.
2) The cantons are to ensure sufficient primary education, open to all children. This education shall be compulsory, and shall be placed under state direction or supervision. It shall be free in all public schools. The school year shall begin between mid-August and mid-September. (Article 62)
Since the Swiss system of education is essentially the responsibility of the cantons, it is not correct to talk about a Swiss education system, since Switzerland does not have a single ‘Ministry of Education’ but rather 26 independent and distinct systems. Within the cantons, educational responsibilities are administered by the cantonal departments of education.
Teacher Policy on the Current Political Agenda
The Teacher’s Key Role in a Changing Environment
Interest in teacher policy research has intensified over the last few years for a number of reasons. First of all, key correlations exist between teacher quality and working conditions, on the one hand, and student learning, on the other. These correlations offer extensive political leverage for
improving school performance (e.g. Rivkin et al., 1998; Gustafsson, 2003; SECTQ, 2004). Second, given the size of the teacher workforce, policies that address issues like working conditions or curriculum reforms have a major impact on the organization and coordination of schools. Recent research projects have thus focused particularly on understanding the teacher’s role in respect of changes in society, the economy and schools in order to define effective teacher policies. The high level of international involvement (a total of 25 countries) in a recent study conducted by the Organisation for Economic Co-operation and Development (OECD), focusing on ‘attracting, developing and retaining effective teachers’, illustrates the scale of global interest (OECD, 2005). In Switzerland, worries about attracting, recruiting and retaining teachers have also been addressed by the Swiss Conference of Cantonal Ministers of Education (CDIP) that has drawn up guidelines for a recruiting strategy for teachers (Müller et al., 2003).
Geneva’s Human Resources Planning System for Teachers
Education being primarily the responsibility of cantons, Geneva’s Department of Education is also in charge of the planning and management of teaching personnel. In 2001, the Education Department decided to set up the GPE, making it possible to anticipate recruitment needs and define policy options over a mid-term horizon of four to five years. The planning system sets out to capture the most relevant factors influencing the supply and demand of teachers and to provide valuable assistance to policymakers for the recruitment of competent teachers. The GPE management tool is made up of four instruments: (1) the database, which constitutes the central database for teaching personnel; (2) the dashboard, with indicators that make it possible to track the evolution of the education system; (3) the prospective system, which is used as a tool to estimate quantitative needs for teaching personnel; and (4) the surveys, which permit the identification of key factors that are likely to influence the movements of teaching personnel (motivation for entering the profession and motivation for taking early retirement).
Theoretical Framework and Analysis of the Literature
Work Motivation Theories
When it comes to work motivation, many theoretical strands have been put forward to explain the relationship between individual motivation, job satisfaction and performance at work. The underlying hypothesis is that, with given individual capacities (intellectual, physical, know-how) and the organization put in place by a firm or administration (technical, human resources,
administrative), motivation can directly influence the individual performance of each employee – and ultimately influence the success of an organization. Although there are multiple definitions of motivation, a certain consensus has evolved on the main dimension that characterizes motivation. In fact, since motivation is difficult to observe directly, it has been defined by the behaviour that individuals are supposed to develop (Roussel, 2000). Vallerand and Thill (1993: 18) summarize the concept of motivation as a ‘hypothetical construct that is used to describe internal and/or external forces that generate the kickoff, the direction, the intensity, and the persistence of behaviour’. As a result, motivation can be defined as ‘a process that activates, orients, reinforces and maintains the behaviour of individuals towards the achievement of intended objectives’ (Roussel, 2000: 5). Ryan and Deci’s (2000a: 54) definition of motivation underlines this process-oriented concept: ‘to be motivated means to be moved to do something. A person who feels no impetus or inspiration to act is thus characterized as unmotivated, whereas someone who is energized or activated toward an end is considered motivated’. Based on Kanfer’s (1990) taxonomy of theories of motivation, there are three main paradigms that regroup current theoretical approaches: the first paradigm regroups need-motive-value approaches: according to these motivation theories, what leads an individual to start a type of behaviour, to direct it towards specific objectives and to support it both intensely and persistently is explained by needs, values and motives that have to be satisfied (e.g. Maslow’s need hierarchy theory, Alderfer’s ERG theory, Herzberg’s dual-factor theory, McClelland’s achievement motivation theory, Adams’ equity theory). 
The second paradigm regroups cognitive-choice theories: this paradigm rests on the guiding principle that ‘behaviour is determined by the subjective value of the objectives towards which the individual is working, but also by their expectancy to see their behaviour producing the required results’ (Oubraye-Rossel and Roussel, 2001) (e.g. Vroom’s expectancy theory, Weiner’s attribution theory). The third paradigm regroups self-regulation/metacognition theories: these theories try to explain how goals can have an effect on individual work motivation and to understand the processes that determine the objectives chosen by the worker. These theories include Carver and Scheier’s control theory, Locke’s goal-setting theory and Bandura’s social learning theory. Self-regulation is a fairly new construct of motivation, and recent research on strategies for enhancing motivation has focussed on its promotion. The term self-regulated can be used to describe performance guided by three key processes: self-observation (monitoring one’s activities), self-judgement (self-evaluation of one’s performance) and self-reaction (reactions to performance outcomes) (Zimmermann and Schunk, 2001). We place our study within the theoretical framework of the first paradigm, which aims to identify the internal and external forces that have an impact on an individual’s motivations. More specifically, we make reference to the cognitive evaluation theory (Deci, 1971; Deci, 1975; Amabile et al., 1976;
Zuckermann et al., 1978) that has been extended into the self-determination theory (Ryan et al., 1985; Gagné and Deci, 2005). These theories draw a distinction between two fundamental types of motivation. According to Ryan and Deci (2000a: 55) ‘the most basic distinction is between intrinsic motivation, which refers to doing something because it is inherently interesting or enjoyable, and extrinsic motivation, which refers to doing something because it leads to a separable outcome’. Intrinsic motivation is also described as an ‘inherent tendency to seek out novelty and challenges, to extend and exercise one’s capacities, to explore, and to learn’ (Ryan and Deci, 2000b: 70), while extrinsic motivation regulates behaviour ‘in order to attain a separable outcome’ (Ryan and Deci, 2000b: 71). Self-determination theory considers extrinsic motivation from the angle of autonomy and control. It states that extrinsic motivation varies greatly with regard to its degree of autonomy: from external regulation (controlled motivation) right through to integrated regulation (autonomous motivation). The latter results from external values and behavioural regulations that tend to be internalized through socialization, thus leading to self-regulated behaviour (goal internalization). This means that ‘a behavioural regulation and the value associated with it have been internalized. Internalization is defined as people taking in values, attitudes or regulatory structures, such that the external regulation of a behaviour is transformed into an internal regulation and thus no longer requires the presence of an external contingency’ (Gagné and Deci, 2005: 334). To sum up, research findings on work motivation generally identify three sources of work motivation: intrinsic motivation, extrinsic motivation, and goal internalization as a subgroup of extrinsic motivation. Studies (e.g. 
Deci, 1971; Lepper et al., 1973; Deci, 1975) that analysed the relationship between intrinsic and extrinsic motivations showed that they are not necessarily independent of each other and that they can interact positively or negatively. These studies revealed, for example, that extrinsic rewards, such as pay, can have a detrimental effect on intrinsic interest and task persistence. However, these undermining effects of extrinsic rewards do not occur automatically. According to Kanfer (1990: 88): ‘Fisher (1978), for example, showed that financial rewards did not affect intrinsic motivation in situations consistent with societal norms about the role of pay for time and effort in real jobs’.
Motivation in Organization Theory and Human Resources Management
Employee motivation is regarded as a critical factor by organization and human resource management theories, since organizations that can create work environments that attract, motivate and retain effective individuals will be better positioned to succeed in a competitive environment. As a consequence, these theories set out to define organizational designs and human resource strategies that ensure high employee motivation.
Motivation-based organization theories that adopt a behavioural view emphasize the difference between intrinsic and extrinsic motivations (Argyris, 1964; McGregor, 1960; Osterloh et al., 2001). Drawing on the findings of psychological approaches, such as the cognitive evaluation theory (Deci, 1975) and the observed relationship between extrinsic and intrinsic motivations, organization theories aim to develop strategies to manage the potential trade-off between the two types of motivation. Osterloh and Frey (2000) state that there are three aspects that should be taken into account when considering the integration into an organization of market elements, such as profit centres or variable pay for performance: increased control, reduced personal relationships and also performance-based rewards have potentially negative effects on intrinsic motivations. Qualified and motivated employees are considered to be a key factor for organizational success, according to resource-based human resource management theory (Wright and McMahan, 1992). Human resource management strategies are used to develop policies to select, develop, motivate and retain employees. Among these workforce management approaches, motivational inducement systems are applied in order to energize, direct, or sustain behaviour within organizations. Leonard et al. (1999) distinguish four commonly employed inducement systems applied in organizations: reward systems, managerial systems, task systems and social systems.
Findings Regarding Teacher Motivation
In line with the theoretical framework of work motivation cited above, both Kyriacou and Coulthard’s (2000) and Obin’s (2002) findings on the motivational choices that prompt people to enter teaching lead to three distinct categories: (1) intrinsic reasons related to the teaching activity itself, such as the transmission of subject knowledge and expertise; (2) extrinsic reasons, such as working conditions, autonomy, pay level, job security and status; and (3) altruistic reasons, such as the desire to help children to succeed and the consideration of teaching as a socially valuable profession. Within the self-determination theory, this latter category may be considered as internalized extrinsic motivation, since it represents values associated with the teaching profession. Surveys carried out in the UK (Sturman, 2004), Australia (MCEETYA, 2003) and France (Esquieu, 2003; Esquieu, 2005) reveal a remarkable stability of motivational hierarchy: extrinsic aspects of a teacher’s job play an important role in respect of job security, flexibility to organize work and autonomy in pedagogic choices. Salary and financial benefits, though, are less important for those considering teaching. Some research findings suggest that pay incentives are unsuccessful in increasing teacher motivation, since teachers are mainly motivated by gratification derived from higher-order needs, such as social relations and esteem (Sylvia and Hutchinson, 1985).
Barmby and Coe (2004) conclude from their literature survey that working conditions are nevertheless important considerations for teachers: stress, long hours and relatively low remuneration are decisive factors that discourage potential candidates from choosing teaching as a career. Moreover, research into teacher motivation has revealed that key correlations exist between a student’s motivation and the teacher’s motivation. Pelletier et al. (2002: 193) found that ‘by the same way students could become less self-determined when exposed to controlling teachers, our results indicate that, when teachers are pressured by the school’s administration or by colleagues to behave in a specific manner, they also indicate that they are less self-determined toward their work’. Furthermore, the less teachers are self-determined towards teaching, the more controlling they become with students, which has a negative effect on the student’s intrinsic motivation and self-determination (Reeve et al., 1999). However, existing research does not establish a clear consensus regarding the benefits of teacher motivation for increased levels of student achievement (Bishay, 1996). In addition, a recent study conducted in the UK (Day et al., 2006) quantitatively analysed how teachers’ motivation varies over their professional lifecycle. The authors identified six professional life phases related to a teacher’s experience and their relationship with specific motivational or demotivational factors. The first phase (0–3 years of experience) was thus associated with a crucial motivational factor, namely the support of the school and department leaders. Conversely, declining pupil behaviour had a negative impact on the motivation of this population of ‘novice’ teachers. As far as the second phase was concerned (4–7 years), the study identified the management of heavy workloads as being the most demotivating factor.
In phase 3 (8–15 years), holding positions of responsibility, with the possibility of progression in their career, had a positive impact on the motivation of this teacher group. In phase 4 (16–23 years), further career advancement and good results had a positive impact on teacher motivation. Phase 4 was also associated with a large number of negative motivational factors, however, such as managing heavy workloads, facing additional responsibilities in school or demands outside of school, achieving a work-life balance, a feeling of career stagnation, lack of support in school and poor pupil behaviour. As for phase 5 (24–30 years), the most important reasons for teacher demotivation were a lack of support in school and bad pupil behaviour. Finally, in phase 6 (31 years and above), teachers generally considered that they had positive teacher-pupil relations and appreciated pupils’ progress. In contrast, however, health issues were beginning to surface, and teachers were demotivated by government policies and pupil behaviour. Research into teacher motivation is also often related to research into job satisfaction. According to Scholl (2002a: 2) these are ‘related but distinct behavioural forces with different determinants and different outcomes’. While motivation is generally ‘future directed’ and has previously been defined as a
process that activates, orients, and maintains the behaviour of individuals towards the achievement of intended objectives, job satisfaction is defined as the ‘extent to which expectations are met resulting in positive feelings’ (Scholl, 2002b: 3) and is therefore more ‘present directed’. Scholl (2001: 1) states that ‘dissatisfaction generally manifests itself in low membership motivation (absenteeism, turnover), and may result in the reduction of Extra Role Behaviour originally motivated by one of the inducement systems’. Research focusing on teachers and retention shows that teachers are more satisfied with their job, (1) if they feel supported by the school administration and by parents, (2) if they benefit from a certain autonomy in carrying out their job, and (3) if student behaviour and the school atmosphere are pleasant (NCES, 1997; Forneck et al., 2000; Gonik et al., 2000). Conversely, the physical and psychological fatigue of teachers increases, (1) if they face difficult relations with students and parents, (2) if they are subject to numerous reforms (pedagogic, organizational, technological, etc.), (3) if administrative tasks are increased, and (4) if they believe that teaching has lost its positive image (Spear et al., 2000; Basaglia and D’Oria, 2003; Cros and Obin, 2003; Papart, 2003). Studies that investigated specific reasons given by teachers for leaving their job mention the following factors as being particularly decisive: too heavy a workload, numerous government initiatives and reforms, the desire to take up a new challenge, a discouraging school situation (student behaviour, school management, etc.), stress, and personal circumstances (Smithers and Robinson, 2003; Luekens et al., 2004).
Towards a Decision Framework for an Effective Teacher Policy
Our present study sets out to develop a decision framework for an effective teacher policy based on teacher motivation. First of all, taking work motivation theory as a basis, we single out those motivations that are particularly significant for explaining decisions to enter or leave the teaching profession. Having identified these main sources of teacher motivation, we then focus on those that are potentially accessible to human resource policy measures, in a bid to identify a teacher workforce policy that will make it possible to attract, develop and retain effective teachers.
Method and Data Sources
Method
The GPE has been conducting an annual survey since 2002, in a bid to better understand teachers’ motivation for entering and also for leaving the teaching profession. In this article, we present the results of the most recent
surveys (candidates: 2004/5 academic year; teachers taking early retirement: 2003/4 academic year). The results were homogeneous across the survey years, which suggests that the findings may be transferable. Anonymous questionnaires were sent by post to all the candidates who fulfilled all the recruitment requirements (population 1) and to all the teachers taking early retirement (population 2).
Participants and Instruments
Survey of Motivations for Entering Teaching
The most recent survey among potential future teachers was distributed to 590 candidates who fulfilled all the recruitment requirements. The participation rate was 52% (306 questionnaires returned).2 Women constitute the majority in the candidate survey (66%). Their share is higher among applicants for primary teaching posts (82%) than among applicants for secondary-level posts (53%). Even though most candidates are aged between 20 and 29 (48%), a large proportion are between 30 and 39 years of age (32%) or even aged 40 or more (20%). In general, women candidates are younger than male candidates. This tendency is more marked for those applying for posts as primary teachers. Candidates took the decision to enter teaching at very different times: 36% decided to take up teaching 5 or more years ago, 42% between 1 and 5 years ago and 24% less than a year ago. Recent decisions are more common among candidates applying to be secondary teachers. It should also be noted that a quarter of candidates decided to enter the teaching profession after initial professional experience in another field. The questionnaire drawn up for the candidates’ survey included 43 questions on the motivation for entering teaching and 12 questions measuring the respondents’ socio-demographic characteristics. In order to structure the analysis, 35 items were grouped in seven motivation categories: (1) humanistic values; (2) professional vocation; (3) working conditions; (4) personal experience; (5) social status; (6) mobility; and (7) choice by default.
Survey of Motivations for Leaving Teaching
The questionnaire for teachers taking early retirement was sent to 204 teachers, 121 of whom (59%) replied. Somewhat more than 50 percent of the teachers decided to take early retirement less than one year prior to reaching retirement age, and slightly more than one third made this choice less than three years prior to retirement age.3 The average age of teachers taking advantage of the early retirement plan (Plan d’encouragement au départ anticipé, Plend4) is 59 years. Teachers in primary education – the
majority of whom are women – are generally younger when they leave the profession – at an average age of 57 years. In terms of the geographical location of the last school in which they worked, we saw that two-thirds of the teachers came from urban areas, one teacher in five from rural areas and 11% from suburban areas. For the purpose of this second survey, the questionnaire was structured in three sections. In the first section, the teacher was invited to draw up an assessment of their career and to reflect on the positive features and weaknesses of the teaching profession. The second section was the longest in the questionnaire, since it included a question made up of 38 items, each of which constituted a reason for leaving teaching. These items were grouped in eight categories related to: (1) work conditions; (2) workload; (3) quality of relationships with principals; (4) fatigue and health; (5) private life (a wish to spend more time with the family); (6) school policy; (7) Plend characteristics; and (8) private life (a wish to spend more time on leisure activities). The third and last section measured five socio-demographic characteristics of participants.
Data Analysis
Data analysis was carried out in four stages: (1) a descriptive data analysis (frequencies) was used to draw up profiles of teachers according to their socio-demographic characteristics together with their motivation for entering or leaving the teaching profession; (2) a bivariate analysis (chi-squared tests) was applied, taking into account motivation for entering or leaving teaching together with socio-demographic variables, such as gender and the educational level being taught; (3) a multivariate analysis (factor analysis, cluster analysis) made it possible to identify different groups on the basis of the teacher’s motivational profile with regard to their decision to enter or leave teaching. The objective of the factor analysis is to reduce the large number of variables to fewer dimensions and to achieve a two-dimensional representation of the essential information. This reduction is possible on account of the correlations that exist between the variables and is achieved by constructing synthetic variables, through a linear combination of the initial variables (Benzécri, 1973; Lebart et al., 1995). As far as cluster analysis is concerned, this consists in grouping the closest elements together in order to produce homogeneous classes of individuals (Gordon, 1981; Lebart et al., 1995). Then (4), in order to map the motivations in more detail, we depicted the reasons for entering or leaving teaching on two matrixes. The x-axis represents the respective percentage of teachers who agreed with the entry or exit motivations suggested in the questionnaire (scale 1 to 10). On the y-axis, we classified each motivation on a scale according to its accessibility and responsiveness to policy measures in order to identify potential leverage (scale 1 to 10). The upper right quadrant of
the two resulting matrixes thus sets out the critical motivations that are highly responsive to political actions taken by educational decision makers. Finally, by comparing these two matrixes, we set out to identify transversal teacher policy priorities.
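The fourth stage can be illustrated with a minimal sketch. It assumes that the percentage of agreement is rescaled linearly onto the importance axis and that the quadrants are separated at the midpoint of both scales; neither detail is specified in the text. The function names `to_scale` and `quadrant` are hypothetical, and the responsiveness scores in the example are illustrative placeholders rather than the ratings used in the study.

```python
def to_scale(percent_agree):
    """Map a percentage of agreement (0-100) onto the importance axis (0-10).

    Assumption: a simple linear rescaling; the text does not spell out the mapping.
    """
    return percent_agree / 10.0


def quadrant(importance, responsiveness, midpoint=5.0):
    """Return the matrix quadrant for one motivation.

    importance: x-axis value (share of respondents agreeing, rescaled).
    responsiveness: y-axis value, a judgement-based score.
    midpoint: hypothetical cut-off separating low from high on both axes.
    """
    vertical = "upper" if responsiveness >= midpoint else "lower"
    horizontal = "right" if importance >= midpoint else "left"
    return f"{vertical} {horizontal}"


# Percentages come from the survey results; the responsiveness scores
# are illustrative placeholders, not the authors' ratings.
motivations = {
    "Attractiveness of retirement package": (63, 8.0),
    "To profit from remaining energy": (84, 1.0),
    "Motivation of social security benefits and salary": (42, 8.0),
}

for label, (percent, responsiveness) in motivations.items():
    print(label, "->", quadrant(to_scale(percent), responsiveness))
```

Under these assumptions, a motivation lands in the upper right quadrant only when it is both widely shared and open to policy action, which is the combination the matrix is designed to isolate.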
Results
In the following, we highlight four groups of results that are organized on the basis of the data analysis stages set out above. The first three analyses are grouped according to survey.
Analysis of Motivations for Entering Teaching
Descriptive Analysis of Entry/Exit Motivations
The main motivations for teaching are grouped into three categories. Humanistic values are those which motivate candidates the most – for example, the wish to work in contact with children and young people (91%), to help them succeed (95%), or the desire to transmit knowledge to them (88%), and the desire to give all students an equal chance (86%). Motivations associated with professional vocation – for example, identification with the teaching profession (76%), the possibility of exercising a profession they feel passionately about (93%) – and work conditions linked to the characteristics of the profession – for example, the possibility to work in a spirit of cooperation (91%) and to carry out an evolutionary and demanding job (91%) – also constitute key motivation categories for entering the profession (Table 1).
Differences in Motivational Orientation with Respect to Education Level and Gender
The global results presented above obviously mask certain disparities. It is clear, for example, that significant differences (chi-squared tests, p < 0.05) exist among the motivations as a function of education level and gender. As far as education level is concerned, candidates applying to be primary teachers have a tendency to place more importance on the humanistic values and psychological aspects of teaching, as well as on the social role and the evolutionary and demanding aspect of the job, whereas candidates applying to be secondary teachers are more attracted by work conditions, and mainly by the flexibility of the schedule and the holidays. With regard to gender, we find that female candidates are more motivated by the relational and psychological aspects of the teaching profession. Furthermore, their job applications have more frequently been stimulated by previous professional experience in teaching.
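As an illustration of the bivariate stage, the following sketch runs a Pearson chi-squared test on a 2x2 table for one item, the appreciation of flexibility in schedule and activity rate (59% of the 129 primary candidates, 72% of the 165 secondary candidates). It rests on two assumptions: the counts are reconstructed from rounded percentages, and no continuity correction is applied. The original analysis may have used different tables or software, and the helper names `chi2_2x2` and `count_from_percent` are hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test of independence for a 2x2 table.

    Rows are groups, columns are agree / do not agree:
        group 1: a, b
        group 2: c, d
    Returns (statistic, p_value) for 1 degree of freedom,
    without continuity correction.
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    # With 1 degree of freedom the upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


def count_from_percent(percent, n):
    """Approximate a count from a rounded percentage (an assumption)."""
    return round(percent / 100.0 * n)


# Flexibility in schedule and activity rate:
# 59% of 129 primary candidates, 72% of 165 secondary candidates.
primary_yes = count_from_percent(59, 129)
secondary_yes = count_from_percent(72, 165)
statistic, p_value = chi2_2x2(
    primary_yes, 129 - primary_yes,
    secondary_yes, 165 - secondary_yes,
)
print(f"chi2 = {statistic:.2f}, p = {p_value:.3f}")
```

With these reconstructed counts the difference between education levels falls below the 0.05 threshold, in line with the direction reported for secondary candidates.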
Table 1: Entry motivations by education level (classified by categories) (in %)

Entry motivations | Total (n = 306) | Primary level (n = 129) | Secondary level (n = 165)
Category 1: Humanistic values
Wish to help children and young people to succeed | 95 | 98 | 92
Opportunity to put key values into practice | 91 | 95 | 86
Wish to work in contact with children and young people | 91 | 98 | 85
Interest in work where human relations are important | 89 | 95 | 84
Wish to transmit knowledge to children and young people | 88 | 85 | 90
Interest in didactic and pedagogical aspects | 86 | 92 | 81
Opportunity to give all students an equal chance | 86 | 85 | 87
Interest in psychological aspects | 83 | 91 | 76
Wish to contribute to improving society | 82 | 86 | 78
Category 2: Professional vocation
Profession that can be exercised with passion | 93 | 94 | 92
Identification with the teaching profession | 76 | 78 | 73
Teaching is still a useful profession | 76 | 74 | 78
An opportunity to avoid routine | 72 | 80 | 66
A vocation | 68 | 67 | 69
A profession for life | 56 | 56 | 56
A profession to be exercised for a few years only | 15 | 15 | 15
Category 3: Work conditions
Category 3a: Characteristics of the profession
Wish to work in a spirit of cooperation and sharing experience | 91 | 95 | 86
Motivated by an evolutionary and demanding job | 91 | 96 | 86
Appreciation of the autonomy and independence of teaching | 80 | 76 | 82
Interest in the possibilities for continuous training/professional development | 78 | 85 | 74
Motivated by on-the-job teacher training | 71 | 71 | 70
An opportunity to take on interesting responsibilities | 47 | 50 | 45
Category 3b: Extrinsic conditions
Opportunity to reconcile private and professional life | 71 | 70 | 72
Appreciation of flexibility in schedule and activity rate | 68 | 59 | 72
Appreciation of a stable and secure job | 62 | 54 | 67
An opportunity to grant importance to family life | 61 | 62 | 59
Wish for sufficient holidays and leisure time | 48 | 41 | 52
Motivation of social security benefits and salary | 42 | 33 | 49
Constitutes an interesting complementary activity | 26 | 24 | 27
Category 4: Personal experience
Currently the most appropriate choice | 72 | 71 | 73
Motivated by previous teaching experience | 70 | 64 | 72
Education pursued confirms this choice | 61 | 68 | 56
Category 5: Social status
Finds it meaningful to exercise a profession of general interest | 65 | 64 | 66
Wishes to practice a profession that has an important social role | 58 | 68 | 50
Teaching is a profession valued by society | 22 | 24 | 21
Aspires to attain the social status associated with the profession | 19 | 13 | 23
Category 6: Mobility
Interest in working in different sectors | 57 | 56 | 57
Opportunity to work in different schools and locations within the canton | 37 | 44 | 33
Permits a professional change | 34 | 31 | 37
Permits work in different cantons and countries | 29 | 32 | 27
Category 7: Choice by default
Main objective of studies | 38 | 45 | 33
Gave up an academic or research career | 23 | 19 | 26
Difficulty in finding another job | 14 | 9 | 18

Note: The percentages refer to the respondents who stated that their choice to enter teaching was influenced ‘quite a lot’ or ‘very much’ by each motivation.
Motivational Typologies of Teachers
A multiple correspondence analysis (Benzécri, 1973; Lebart et al., 1995) summarized the various response categories for the entry motivation variables in factors whose values were estimated for each individual. A hierarchical cluster analysis was performed on the resulting factor values, using Ward’s (1963) algorithm, in order to establish groups of teachers who were as homogeneous and as distinct from other groups as possible. This analysis of the motivational profiles of teacher candidates yielded four groups. The ‘passionate’ group takes in 37% of all candidates. They identify strongly with the motivations related to the social dimension of teaching and the evolutionary nature of the job (e.g. an interest in professional development). They are also strongly motivated by the prospect of transferring their subject knowledge to students. The ‘engaged’ group accounts for another 37% of all candidates. Their motivational profile corresponds largely to that of the previous group. Their degree of agreement is less strong, however. Finally, candidates with ‘mitigated’ (9%) and ‘disillusioned’ (17%) motivations represent the last two groups. Even though they are somewhat motivated by the working conditions and humanistic values, they acknowledge that teaching is not their preferred professional choice.
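The clustering step can be sketched as follows. This is a small, unoptimized illustration of agglomerative clustering with Ward’s minimum-variance criterion, not the software used in the study. The function name `ward_clusters` is hypothetical, and the factor scores are toy values invented for the example; the multiple correspondence analysis that would produce real scores is not reproduced here.

```python
from itertools import combinations


def ward_clusters(points, k):
    """Agglomerative clustering with Ward's minimum-variance criterion.

    points: list of equal-length tuples (e.g. factor scores per respondent).
    k: number of clusters at which merging stops.
    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged.
        na, nb = len(a[0]), len(b[0])
        dist2 = sum((x - y) ** 2 for x, y in zip(a[1], b[1]))
        return na * nb / (na + nb) * dist2

    while len(clusters) > k:
        # Pick the pair of clusters whose merger is cheapest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        a, b = clusters[i], clusters[j]
        na, nb = len(a[0]), len(b[0])
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(a[1], b[1])]
        merged = (a[0] + b[0], centroid)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c[0]) for c in clusters)


# Toy factor scores for six hypothetical respondents (not survey data).
scores = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0),
          (3.0, 3.1), (3.2, 2.9), (2.9, 3.0)]
print(ward_clusters(scores, 2))
```

Ward’s criterion merges, at each step, the two groups whose union adds least to the within-group variance, which is why it tends to produce the compact, clearly separated profiles described above.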
Analysis of Motivations for Leaving Teaching
Descriptive Analysis of Entry/Exit Motivations
The eight types of motivation defined above can be grouped into two categories of factors: (1) motivational factors internal to the profession, or negative private motivations, which influence leaving decisions (pushing factors); and (2) motivational factors external to the profession, or positive private motivations, which attract teachers towards the choice of departure (pulling factors). Five types of motivation correspond to the pushing factors category: these motivations are related to changes in work conditions, workload, fatigue and health, relationships with principals and school policies. Three types of motivation are pulling factors: these motivations are associated with private life – for family or leisure related reasons – or with the Plend characteristics. According to the results of our study, two pushing factors have a key influence on the early retirement decision: changes in work conditions (e.g. the manner of implementing institutional changes [50%], the effort put into disciplining rather than into teaching students [44%]), and workload, such as the evolution of work contents (55%) and an increasing workload (50%). Two pulling factors also made a considerable contribution to the choice of those opting for early retirement: the Plend characteristics – especially with regard to the advantageous conditions involved (e.g. the attractiveness of the retirement package [63%]) – and the desire to spend more time on leisure activities (to profit from their remaining energy [84%] and to devote time to their hobbies [69%]) (Table 2).
Differences in Motivational Orientation with Respect to Education Level and Gender
As with the survey on entry motivations, the global results for the motivational factors behind early retirement also mask a number of disparities. For example, there are significant differences (chi-squared tests, p < 0.05) between education levels. Aspects related to changes in workload content were mentioned much more frequently by teachers in primary education than by secondary teachers. In the same way, work conditions – and more particularly the feeling of lack of freedom or autonomy – also pushed primary education teachers to leave the profession prematurely more than secondary teachers. Other factors, such as advantageous Plend conditions, also motivated primary education teachers more than secondary teachers. As far as the gender variable is concerned, we found that women were over-represented among the teachers who were motivated by the wish to spend time on non-professional activities. Men, however, were over-represented among teachers motivated by social changes and, more particularly, by the perception of a decline in pupils’ competencies.
Table 2: Motivation for leaving teaching by education level (classified by categories) (in %)

Exit motivations | Total (n = 121) | Primary level (n = 40) | Secondary level (n = 81)
Category 1: Work conditions
Manner of implementing institutional changes | 50 | 58 | 46
Too much effort going into disciplining rather than into teaching students | 44 | 50 | 40
Deterioration of profession’s image | 44 | 43 | 44
Student behaviour | 33 | 33 | 33
Students’ competence level | 25 | 15 | 30
Feeling out of touch with students | 13 | 8 | 15
Category 2: Workload
Evolution of work contents | 55 | 70 | 47
Increasing work load | 50 | 58 | 46
Contents of institutional changes | 44 | 45 | 44
Feeling of lack of freedom, autonomy | 19 | 35 | 10
Category 3: Relationships with principals
Relations with education department (administration) | 22 | 18 | 24
Lacking support of school management | 19 | 18 | 19
Category 4: Fatigue and health
Lacking the energy required to teach | 26 | 26 | 26
Health reasons | 16 | 29 | 25
Category 5: Private life (family)
To devote time to family | 34 | 46 | 27
To take up non-professional activities (volunteering, etc.) | 28 | 40 | 22
Spouse is already retired | 26 | 41 | 18
Category 6: School policy
Relations with school leaders | 15 | 8 | 20
Unsatisfactory professional development | 15 | 23 | 10
Category 7: Characteristics of early retirement conditions (Plend)
Attractiveness of retirement package | 63 | 82 | 53
Likely disappearance of retirement package | 47 | 51 | 44
Category 8: Private life (leisure)
To profit from remaining energy | 84 | 90 | 81
To devote time to hobbies | 69 | 79 | 65
To travel | 41 | 39 | 42
Others
Tired of teaching a specific school subject | 13 | 8 | 1
Feeling unable to keep up with teaching content | 11 | 21 | 6

Note: The percentages refer to the respondents who stated that their early retirement choice was influenced ‘quite a lot’ or ‘very much’ by each motivation.
Motivational Typologies of Teachers
In this second survey, the combined method of multiple correspondence analysis and hierarchical cluster analysis set out above was similarly employed. The analysis of the profiles of teachers taking early retirement shows that a teacher’s overall assessment of his or her career correlates with his or her attitude towards institutional, pedagogic and social changes. The largest group of teachers taking early retirement (49%) have a ‘positive assessment of their career’ in overall terms. They do not mention ‘changes’ as being a decisive factor in their decision to leave. They succeeded in adapting their professional commitment to an evolving environment. Thirty-two per cent of teachers finish their career with a fairly ‘mixed assessment’. Their decision to leave has been influenced by pedagogical and institutional changes and an increasing workload. Also, they feel that the image of the teaching profession has lost a lot of its appeal. The remaining 19% of teachers have an overall ‘negative assessment of their career’. Their decision to leave has been largely influenced by institutional and pedagogic changes and increasing workload. They also mention insufficient support from their professional environment (school leaders and administration).
Evaluation by Matrix Analysis
The matrix for candidates (Figure 1) shows that altruistic motivations and intrinsic motivations rank high on the scale of motivation for becoming a teacher (x-axis) but low in respect of their accessibility and responsiveness to political action (y-axis) (lower right quadrant). Certain extrinsic motivations are highly ranked by teachers and are also susceptible to potential policy measures: i.e. possibilities for professional development, image of the profession, the evolving nature of the job and autonomy. However, there are a number of extrinsic factors, such as salary and job mobility, that score high with regard to their accessibility to political action but are of relatively low importance on the motivational scale (upper-left quadrant). This might be explained by the fact that Swiss teachers have a high salary level compared with other countries in Europe (OECD, 2005). Looking at the reasons for taking early retirement from teaching (Figure 2), private motivations, such as spending more time with the family, and on hobbies and travelling, rank high on the motivational scale. However, these private motivations offer little scope for potential policy intervention (lower-right quadrant). Factors that are highly responsive to political measures and have a key influence on a teacher’s decision to take early retirement are the way that institutional changes are carried out, the content of reforms, an increasing workload and advantageous pre-retirement benefits.
Figure 1: Motives for entering teaching and responsiveness to policy measures (x-axis: importance for becoming a teacher, 0–10; y-axis: degree of responsiveness to policy measures, 0–10)
Figure 2: Motives for leaving teaching and responsiveness to policy measures (scatter plot; x-axis: importance for leaving teaching, 0.00–10.00; y-axis: degree of responsiveness to policy measures, 0.00–10.00)
A Decision Framework for Defining Teacher Policies

How can the results of our previous analysis of teachers’ motivation for entering and leaving the profession help in defining teacher policies aimed at attracting and retaining teachers? From our matrix analysis we have shortlisted three issues of particular interest, since they affect both the attraction of new candidates and the retention of experienced teachers. Their transversal character offers promising leverage for anchoring teacher policies over the full length of teachers’ careers. More specifically, these three transversal issues relate to (1) job characteristics (e.g. activities), (2) working conditions, and (3) the image of the teaching profession (see Table 3). They all show similar patterns in respect of teacher motivation: initially they have a positive impact but, over the years, they develop into the main reasons for leaving teaching.

Table 3: Transversal issues to attract, develop and retain teachers

| Transversal issues | Motivations for entering teaching | Motivations for leaving teaching | Motivational inducement systems involved |
|---|---|---|---|
| Job characteristics | Little job routine; working in a social network providing various human contacts (students, colleagues, parents) | Increasing workload (e.g. increasing diversity of tasks, more administrative work); increasing number of meetings | Task system (e.g. job definition, job description) |
| Job characteristics | An evolving and demanding job | Dissatisfaction with content and the way that institutional reforms have been implemented | Leadership system (e.g. change implementation); professional development system (e.g. enhancement of teacher’s competencies) |
| Job characteristics | Transmission of knowledge to young people | Too much effort going into disciplining rather than into teaching students; student behaviour | Task system (e.g. evolution of teacher’s responsibilities and professional activities); social system (e.g. perception of teacher’s role in society) |
| Working conditions | Autonomy in pedagogical choices and activities | Lack of autonomy and flexibility | Task system (e.g. structures and processes to carry out professional activities); professional development system (e.g. opportunities to acquire skills and knowledge) |
| Working conditions | Autonomy in performing teaching activities | Lack of hierarchical support; lack of flexibility | Leadership system (e.g. guidance and support to carry out professional activities); social system (e.g. teamwork and feedback procedures); reward system (e.g. pay and working conditions) |
| Professional image | Identification with teaching profession | Degradation of teaching profession’s image | Task system (e.g. vision creation and mission development); social system (e.g. shared vision and set of norms) |

With regard to job characteristics, one key factor is the way teachers face change in the course of their career. Table 3 shows that the lack of job routine is something that attracts teacher candidates to the profession. However, frequent changes in the activities involved in their job and in their professional environment, due to school reforms for example, can become a key reason for losing the motivation to teach.

Furthermore, Table 3 shows a similar pattern for working conditions, especially with regard to autonomy. It is important for teachers at the start of their career to have sufficient autonomy to implement their pedagogical choices and their professional activities. However, the reality for experienced teachers is somewhat different. They regret having too little autonomy and flexibility with regard to pedagogical choices and feel there is a lack of hierarchical support for specific measures, which leads to major frustration and to teachers leaving the profession.

Finally, strong identification with the teaching profession fades over time. It seems that the initial enthusiasm for teaching cannot, unfortunately, be maintained over the years. More experienced teachers systematically regret that the professional image of teaching has deteriorated over the course of their career and that they no longer identify with the current profession.

These changes highlight a key question for school principals and other practitioners: how can the initial motivational factors be maintained as teachers progress in their career? Our research offers a number of answers to this fundamental question. Teaching-policy levers ought, in fact, to prevent gaps from developing between the motivations for entering and for leaving the profession. Taking the motivational inducement system of Leonard et al. (1999), we can identify five determinants for leveraging the motivation of teachers: (1) task system; (2) leadership system; (3) reward system; (4) social system; and (5) professional development system. The last column of Table 3 indicates which levers could be used on a general basis to address issues related to job characteristics, working conditions and professional image. More specifically: what kind of measures can school authorities put in place to prevent a loss in teacher motivation?

Table 4 summarizes potential measures for keeping the initial motivational factors alive over a teacher’s career. It is evident that several inducement systems are required in parallel in order to tackle motivational issues. Teacher policies can only be successful if they address motivational determinants in a complementary manner. Research on educational leadership shows that effective leadership has a positive impact on teaching and learning. Leithwood et al. (2004) identify three sets of practices that make up the basic core of successful leadership: setting directions, developing people, and redesigning the organization. Developing people by providing teachers with the necessary support and training to succeed is therefore a key task for those in leadership roles.
Table 4: Policy measures derived from motivational inducement systems

| Inducement system | Characteristics of job activities | Working conditions | Professional image |
|---|---|---|---|
| Task system | Determine teacher’s job definitions and job descriptions; communicate expected competencies, organizational goals and the role which teachers are expected to play (e.g. subject matter and pedagogical knowledge, organizational and communication skills) | Provide organizational structures and processes that allow flexibility and autonomy in carrying out teaching activities; offer possibilities to diversify teacher’s activities (e.g. project involvement, professional experience in other domains) | Create visions of current and future evolution of the teaching profession; evaluate missions with regard to social and institutional changes |
| Leadership system | Provide a professional network (e.g. school leaders, mentors) to provide a framework for and support teachers’ activities at their workplace; build strong leadership systems in order to lead and support institutional and pedagogic changes | Build adequate leadership to guide and support teachers in order to achieve common objectives (e.g. supervision, feedback) | Integrate teachers as responsible actors of change and job evolution; offer possibilities for taking on new responsibilities (e.g. school leadership, mentoring, training) |
| Reward system | Provide a professional network and feedback (e.g. school leaders, mentors) to provide a framework for and support teachers’ activities at their workplace; let teachers express their preferences in respect of their activities (choice of degree, of school, of branch) | Competitive salary; emphasize flexible working conditions and job autonomy (e.g. flexible working hours, possibility of working part time, job sharing); possibility of reducing workload towards the end of the career | Provide attractive working conditions to improve image |
| Professional development system | Conceive of teacher’s professional development as a continuing activity over the teacher’s career; identify individual training needs and adapt specific professional development possibilities; complement teaching reforms with training opportunities that facilitate implementation of changes in a positive way | Provide a work environment and resources that allow teachers to carry out their professional activities; provide opportunities for self-development and self-realization at the workplace | Maintain high standards for professional development in order to retain highly qualified staff |
| Social system | Clarify expectations with regard to the teacher’s function, role and profile with key stakeholders (e.g. policymakers, parents, teachers); prepare (future) teachers adequately for the reality of the job: confront expectations and reality early on in initial teacher education through practical experience and field training | Encourage team building and peer recognition among teachers | Share and develop visions of the teaching profession in cooperation with key stakeholders; develop strategies to maintain and promote the image of teaching |
First of all, change management emerges as one of the key elements with regard to evolving job characteristics. A clear understanding of the job definition and its evolution as a result of changing roles and modified expectations is crucial (task system). Furthermore, a strong leadership system is required in order to implement change and reforms. Professional development systems should provide additional support throughout a teacher’s career and, finally, interactions with key stakeholders, such as parents, administrative authorities, political organizations and business associations, are necessary in order to clarify and share the teacher’s function, roles and profiles (social system).

Working conditions provide an important additional lever of teacher policy, since they touch on all five motivational inducement systems. In general, working conditions should support a teacher’s motivation to carry out their professional work in a flexible and autonomous manner by providing the opportunity to work in a professional network and by offering hierarchical support. Moreover, working conditions should give teachers the opportunity to keep up with evolving teaching contents and materials.

Finally, it is important to develop and enhance the professional image of teachers both inside and outside the school system. The task system, for example, allows visions regarding the current and future evolution of the teaching profession to be updated. It would appear, in fact, that some of the perceived loss of a teacher’s image can be explained by the evolution of the job, which is perceived negatively by older teachers who fail to see their initial role confirmed. Furthermore, attractive working conditions and stringent requirements for the continuing education of teachers are measures that also help maintain a positive professional image outside the school system.
From our previous analyses based on theories of work motivation and organizational behaviour, we see that employee motivation is a critical element in terms of its influence on individual performance and on the capacity of organizations to attain their objectives. Set in the context of schools, teacher motivation plays an essential role with regard to student learning as well as to a school’s capacity to achieve its objectives as an organization. As a result, teacher motivation plays a key role in defining policies to attract, maintain and develop teachers, as has been illustrated by the measures identified above. Our suggested policy measures have been prioritized in respect of their potential impact on teacher motivation. Additional criteria, however, such as their political and economic feasibility, are to be considered for deciding on their final implementation.
Notes

1. In Switzerland, a distinction is drawn between pre-schools (Kindergarten, école enfantine or scuola dell’infanzia) and childcare outside the family (day nurseries, day-care mothers, play groups). Children of all cantons are entitled to have access to pre-school education before they enter compulsory education. Cantons and/or communes are responsible for organising and funding pre-school education.
2. Currently we do not have any information on non-respondents but we intend to collect data on the whole population in forthcoming surveys so that we can compare the basic characteristics of respondents and non-respondents.
3. In Switzerland, the official retirement age for men is 65 years and, for women, 63 years.
4. The Plend early retirement plan (Plan d’encouragement au départ anticipé) was introduced in 1994 as a permanent measure, forming part of the Canton of Geneva’s human resources policy for its public administration. A certain number of conditions must be fulfilled in order to benefit from this retirement plan; these relate to age (women: a minimum of 57 years old; men: a minimum of 58 years old) and seniority (a minimum of 10 years’ service as an employee with the Canton of Geneva).
References

Amabile, T.M., DeJong, W. and Lepper, M.R. (1976) ‘Effects of Externally Imposed Deadlines on Subsequent Intrinsic Motivation’, Journal of Personality and Social Psychology 34: 92–8.
Argyris, C. (1964) Integrating the Individual and the Organization. New York: Wiley.
Basaglia, G. and D’Oria, V.L. (2003) ‘Image and Health of Teachers in Italy: Framework, Problems and Proposals’, Appendix 4. In: OCDE (2003). Attracting, Developing and Retaining Effective Teachers. Country Background Report for Italy. OCDE Activity. Available at: http://www.oecd.org/els/education/teacherpolicy. Accessed 14 May 2005.
Barmby, P. and Coe, R. (2004) Recruiting and Retaining Teachers: Findings from Recent Studies. Paper presented at the British Educational Research Association Conference, Manchester 14–18 September, Curriculum, Evaluation and Management Centre, University of Durham.
Benzécri, J.P. (1973) L’analyse Des Données. Tome 1: La Taxinomie. Tome 2: L’analyse Des Correspondances, 2nd edn. 1976. Paris: Dunod.
Bishay, A. (1996) ‘Teacher Motivation and Job Satisfaction: A Study Employing the Experience Sampling Method’, Journal of Undergraduate Sciences 3: 147–54.
Cros, F. and Obin, J.P. (2003) Attirer, Former et Retenir Des Enseignants de Qualité, Rapport de base nationale de la France dans le cadre de l’activité de l’OCDE. Available at: http://www.oecd.org/els/education/teacherpolicy. Accessed 14 May 2005.
Day, C., Stobart, G., Sammons, P., Kington, A., Gu, Q., Smees, R. and Mujtaba, T. (2006) Variations in Teachers’ Work, Lives and Effectiveness: Final report for the VITAE Project. London: Department for Education and Skills.
Deci, E.L. (1971) ‘Effects of Externally Mediated Rewards on Intrinsic Motivation’, Journal of Personality and Social Psychology 18: 105–15.
Deci, E.L. (1975) Intrinsic Motivation. New York: Plenum.
Esquieu, N. (2003) Être Professeur en Lycée et Collège en 2002. Note d’information 03.37. Paris: Ministère de l’éducation nationale, de l’enseignement supérieur et de la recherche. Available at: http://www.education.gouv.fr/stateval. Accessed 23 May 2005.
Esquieu, N. (2005) Portrait des Enseignants de Collèges et Lycées Interrogation de 1000 Enseignants du Second Degré en mai-juin 2004. Note d’information 05.07. Paris: Ministère de l’éducation nationale, de l’enseignement supérieur et de la recherche. Available at: http://www.education.gouv.fr/stateval. Accessed 23 May 2005.
Federal Constitution of the Swiss Confederation of 18 April 1999, RS 101.
Fisher, C.D. (1978) ‘The Effects of Personal Control, Competence, and Extrinsic Reward Systems on Intrinsic Motivation’, Organizational Behavior and Human Performance 21: 273–88.
Forneck, H.J. and Schriever, F. (2000) Die Individualisierte Profession. Untersuchung der Lehrerinnen und Lehrerarbeitszeit und -Belastung im Kanton Zürich. Bildungsdirektion des Kantons Zürich. Available at: http://www.bildungsdirektion.zh.ch/internet/bi/de/publikationen/studien/evaluationen.html. Accessed 25 May 2005.
Gagné, M. and Deci, E.L. (2005) ‘Self-Determination Theory and Work Motivation’, Journal of Organizational Behavior 26: 331–62.
Gonik, V., Kurth, S. and Boillat, M.A. (2000) Analyse du Questionnaire Sur L’état de Santé Physique et Mentale des Enseignants Vaudois. Rapport Final. Lausanne: Institut universitaire romand de la Santé au Travail.
Gordon, A.D. (1981) Classification: Methods for the Exploratory Analysis of Multivariate Data. London: Chapman and Hall.
Gustafsson, J.E. (2003) ‘What Do We Know About Effects of School Resources on Student Achievement’, Review of Educational Research 66: 77–110.
Kanfer, R. (1990) ‘Motivation Theory and Industrial and Organizational Psychology’, in M.D. Dunnette and L.M. Hough (eds) Handbook of Industrial and Organizational Psychology, vol. 1, pp. 75–170. Palo Alto, CA: Consulting Psychologists Press.
Kyriacou, C. and Coulthard, M. (2000) ‘Undergraduates’ Views of Teaching as a Career Choice’, Journal of Education for Teaching 26(2): 117–26.
Lebart, L., Morineau, A. and Piron, M. (1995) Statistiques Exploratoires Multidimensionnelles. Paris: Dunod.
Leithwood, K. et al. (2004) How Leadership Influences Student Learning. Learning from Leadership Project. Minnesota: University of Minnesota, Center for Applied Research and Educational Improvement and Toronto: University of Toronto, Ontario Institute for Studies in Education.
Leonard, N.H., Beauvais, L.L. and Scholl, R.W. (1999) ‘Work Motivation: The Incorporation of Self-Concept-Based Processes’, Human Relations 52: 969–98.
Lepper, M.R., Greene, D. and Nisbett, R.E. (1973) ‘Undermining Children’s Intrinsic Interest with Extrinsic Rewards: A Test of the “Overjustification” Hypothesis’, Journal of Personality and Social Psychology 28: 129–37.
Luekens, M., Lyter, D. and Fox, E. (2004) Teacher Attrition and Mobility: Results from the Teacher Follow-Up Survey, 2000–01. NCES 2004–301. Washington, DC: National Center for Education Statistics (NCES).
MCEETYA (Ministerial Council on Education, Employment, Training and Youth Affairs) (2003) Demand and Supply of Primary and Secondary School Teachers in Australia. Melbourne: MCEETYA.
McGregor, D. (1960) The Human Side of Enterprise. New York: McGraw-Hill.
Müller, K., Bortolotti, R. and Bottani, N. (2003) Stratégie de Recrutement des Enseignantes et Enseignants. Etudes et rapports 17A. Berne: Conférence Suisse des directeurs cantonaux de l’instruction publique (CDIP).
NCES (National Center of Education Statistics) (1997) Job Satisfaction among America’s Teachers: Effects of Workplace Conditions, Background Characteristics, and Teacher Compensation. Statistical Analysis Report, July 1997. Washington, DC: US Department of Education, Office of Educational Research and Improvement. Available at: http://www.nces.ed.gov/pubs97/97471.html. Accessed 12 June 2005.
Obin, J.P. (2002) Enseigner, un Métier pour Demain. Rapport au ministre de l’éducation nationale. Mission de réflexion sur le métier d’enseignant. Available at: http://www.education.gouv.fr/rapport/obin.pdf. Accessed 12 June 2005.
OECD (Organization of Economic Cooperation and Development) (2003) Le Rôle des Systèmes Nationaux de Certification Pour Promouvoir l’Apprentissage Tout au Long de la vie. Rapport de base de la Suisse. Paris: OECD.
OECD (Organization of Economic Cooperation and Development) (2005) ‘Teachers Matter: Attracting, Developing and Retaining Effective Teachers’, Education and Training Policy. Paris: OECD.
Osterloh, M. and Frey, B.S. (2000) ‘Motivation, Knowledge Transfer, and Organizational Forms’, Organization Science 11: 538–50.
Osterloh, M., Frey, B. and Frost, J. (2001) ‘Managing Motivation, Organization and Governance’, Journal of Management and Governance 5: 231–39.
Oubraye-Rossel, N. and Roussel, P. (2001) Le Soi et la Motivation. Notes du Laboratoire Interdisciplinaire de recherche sur les Ressources Humaines et l’Emploi (LIRHE), Note No. 345. Toulouse: LIRHE.
Papart, J.P. (2003) La Santé des Enseignants et des Éducateurs de L’enseignement Primaire, Rapport à L’organisation du Travail. Versoix: Actions en santé publique. Available at: http://www.geneve.ch/primaire/corps_enseignant.html. Accessed 12 June 2005.
Pelletier, L.G., Legault, L. and Séguin-Lévesque, C. (2002) ‘Pressure from Above and Pressure from Below as Determinants of Teachers’ Motivation and Teaching Behaviors’, Journal of Educational Psychology 94: 186–96.
Reeve, J., Bolt, E. and Cai, Y. (1999) ‘Autonomy-Supportive Teachers: How they Teach and Motivate Students’, Journal of Educational Psychology 91: 537–48.
Rivkin, S., Hanushek, E. and Kain, J. (1998) Teachers, Schools, and Academic Achievement: Working Paper 6691. Cambridge, MA: National Bureau of Economic Research (NBER).
Roussel, P. (2000) La Motivation au Travail – Concept et Theories. Notes du Laboratoire Interdisciplinaire de recherche sur les Ressources Humaines et l’Emploi (LIRHE), Note No. 326. Toulouse: LIRHE.
Ryan, R.M., Connell, J.P. and Deci, E.L. (1985) ‘A Motivational Analysis of Self-Determination and Self-Regulation in Education’, in C. Ames and R.E. Ames (eds) Research on Motivation in Education: The Classroom Milieu, pp. 13–51. New York: Academic Press.
Ryan, R.M. and Deci, E.L. (2000a) ‘Intrinsic and Extrinsic Motivations: Classic Definitions and New Directions’, Contemporary Educational Psychology 25: 54–67.
Ryan, R.M. and Deci, E.L. (2000b) ‘Self-Determination Theory and the Facilitation of Intrinsic Motivation, Social Development, and Well-Being’, American Psychologist 55: 68–78.
Scholl, R.W. (2001) ‘Motivation Diagnostic Framework using Sources of Motivation Framework’. Available at: http://www.cba.uri.edu/scholl/Notes/Motivation_Diagnosis2.html. Accessed 7 March 2006.
Scholl, R.W. (2002a) ‘Motivation’. Available at: http://www.cba.uri.edu/scholl/Notes/Motivation.html. Accessed 7 March 2006.
Scholl, R.W. (2002b) ‘Analysis and Diagnosis of Behavioral Problems’. Available at: http://www.cba.uri.edu/scholl/Notes/Behavioral_Diagnosis.html. Accessed 7 March 2006.
SECTQ (The Southeast Center of Teaching Quality) (2004) Teacher Working Conditions are Student Learning Conditions: A report to Governor Mike Easley on the 2004 North Carolina Teacher Working Conditions Survey. Chapel Hill, NC: The Southeast Center of Teaching Quality. Available at: http://www.teachingquality.org/TWC.htm. Accessed 31 May 2005.
Smithers, A. and Robinson, P. (2003) Factors Affecting Teachers’ Decision to Leave the Profession. Nottingham: Department for Education and Skills (DfES).
Spear, M., Gould, K. and Lee, B. (2000) Who Would be a Teacher? A Review of Factors Motivating and Demotivating Prospective and Practicing Teachers. Slough: National Foundation for Educational Research (NFER).
Sturman, L. (2004) Contented and Committed? A Survey of Quality of Working Life Amongst Teachers. Slough: National Foundation for Educational Research (NFER).
Swiss Conference of Cantonal Ministers of Education (CDIP) (2006) Simplified Diagram of the Swiss Education System. Available at: http://www.edk.ch/PDF_Downloads/Bildungswesen_CH/BildungCH_e.pdf. Accessed 7 March 2006.
Sylvia, R.D. and Hutchinson, T. (1985) ‘What Makes Ms. Johnson Teach? A Study of Teacher Motivation’, Human Relations 38: 841–56.
Vallerand, R.J. and Thill, E.E. (1993) ‘Introduction au Concept de Motivation’, in R.J. Vallerand and E.E. Thill (eds) Introduction À La Psychologie De La Motivation, pp. 201–38. Laval (Quebec): Editions etudes vivantes.
Ward, J.H. (1963) ‘Hierarchical Grouping to Optimize an Objective Function’, Journal of the American Statistical Association 58: 236–44.
Wright, P.M. and McMahan, G.C. (1992) ‘Theoretical Perspectives for Strategic Human Resource Management’, Journal of Management 18: 295–320.
Zimmerman, B.J. and Schunk, D.H. (2001) Self-Regulated Learning and Academic Achievement: Theory, Research and Practice. Hillsdale, NJ: Erlbaum.
Zuckerman, M., Porac, J., Lathin, D., Smith, R. and Deci, E.L. (1978) ‘On the Importance of Self-Determination for Intrinsically Motivated Behavior’, Personality and Social Psychology Bulletin 4: 443–46.
SAGE LIBRARY OF EDUCATIONAL THOUGHT AND PRACTICE
SAGE DIRECTIONS IN EDUCATIONAL PSYCHOLOGY VOLUME IV
Edited by
Neil J. Salkind
Introduction and editorial arrangement © Neil J. Salkind 2011 First published 2011 Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act, 1988, this publication may be reproduced, stored or transmitted in any form, or by any means, only with the prior permission in writing of the publishers, or in the case of reprographic reproduction, in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers. Every effort has been made to trace and acknowledge all the copyright owners of the material reprinted herein. However, if any copyright owners have not been located and contacted at the time of publication, the publishers will be pleased to make the necessary arrangements at the first opportunity. SAGE Publications Ltd 1 Oliver’s Yard 55 City Road London EC1Y 1SP SAGE Publications Inc. 2455 Teller Road Thousand Oaks, California 91320 SAGE Publications India Pvt Ltd B 1/I 1, Mohan Cooperative Industrial Area Mathura Road New Delhi 110 044 SAGE Publications Asia-Pacific Pte Ltd 33 Pekin Street #02-01 Far East Square Singapore 048763 British Library Cataloguing in Publication data A catalogue record for this book is available from the British Library ISBN: 978-0-85702-178-6 (set of five volumes) Library of Congress Control Number: 2010923776 Typeset by Mukesh Technologies Pvt. Ltd., Pondicherry, India. Printed on paper from sustainable resources Printed by MPG Books Group, Bodmin Cornwall
Contents

Volume IV

Section III: Motivation (Continued)

58. Interpersonal Relationships, Motivation, Engagement, and Achievement: Yields for Theory, Current Issues, and Educational Practice (Andrew J. Martin and Martin Dowson)
59. Classroom and Individual Differences in Early Adolescents’ Motivation and Self-Regulated Learning (Paul R. Pintrich, Robert W. Roeser and Elisabeth A.M. De Groot)
60. Atkinson’s Theory of Achievement Motivation: First Step toward a Theory of Academic Motivation? (Martin L. Maehr and Douglas D. Sjogren)
61. Motivation and Engagement across the Academic Life Span: A Developmental Construct Validity Study of Elementary School, High School, and University/College Students (Andrew J. Martin)
62. Motivation and Achievement: A Quantitative Synthesis (Margaret E. Uguroglu and Herbert J. Walberg)
63. Academic Motivation and Achievement among Urban Adolescents (Joyce F. Long, Shinichi Monoi, Brian Harper, Dee Knoblauch and P. Karen Murphy)
64. Intrinsic Motivation and School Misbehavior: Some Intervention Implications (Howard S. Adelman and Linda Taylor)
65. Reinforcement, Reward, and Intrinsic Motivation: A Meta-Analysis (Judy Cameron and W. David Pierce)
66. Motivation in Transition (Barbara Stauber)

Section IV: Research Design, Measurement and Statistics and Evaluation

67. Why P Values Are Not a Useful Measure of Evidence in Statistical Significance Testing (Raymond Hubbard and R. Murray Lindsay)
68. Alphabet Soup: Blurring the Distinctions between p’s and α’s in Psychological Research (Raymond Hubbard)
69. Research Methods: Experimental Design (Julian C. Stanley)
70. What Can We Learn from International Assessments? (Robert J. Mislevy)
71. Power, Control, and Validity in Research (Randall M. Parker)
72. Testing Reasoning and Reasoning about Testing (Walt Haney)
Section III: Motivation (Continued)
58 Interpersonal Relationships, Motivation, Engagement, and Achievement: Yields for Theory, Current Issues, and Educational Practice

Andrew J. Martin and Martin Dowson

Source: Review of Educational Research, 79(1) (2009): 327–365.

Few would dispute the importance of high-quality interpersonal relationships in young people’s capacity to function effectively, including in their academic lives. The literature consistently notes the substantial role that relationships play in students’ success at school (e.g., Creasey et al., 1997; Culp, Hubbs-Tait, Culp, & Starost, 2000; Field, Diego, & Sanders, 2002; Marjoribanks, 1996; Martin, Marsh, McInerney, Green, & Dowson, 2007; Pianta, Nimetz, & Bennett, 1997; Robinson, 1995). Guided by a core definition of relationship as “a state of connectedness between people, especially an emotional connection” (Webster’s Online Dictionary, 2007), we suggest that the concept of relationships provides an organizing framework for considering theories, issues, and practices relevant to achievement motivation. We also seek to demonstrate that the greater the connectedness on personal and emotional levels (also referred to as relatedness and relational processes) in the academic context, the greater the scope for academic motivation, engagement, and achievement.

The purposes of this article are multifold. It elucidates the ways in which relationships affect achievement motivation and the benefits accrued from considering a relational perspective on achievement motivation. It describes a number of important motivation- and achievement-related theories and demonstrates the central role of interpersonal relationships in each of these theories. It explores practical implications of a relational understanding of both theory and current issues in terms of practices relating to student-, teacher/classroom-, and school-level actions. Finally, it concludes with an integrative framework that summarizes theory, constructs, mechanisms, and practices relevant to the relational dynamics underpinning motivation, engagement, and achievement in the academic context. Figure 1 presents an organizing framework for this review.
Part I: The Importance and Process of Relatedness
The role of relatedness in academic, social, emotional, and cognitive development
How relatedness affects achievement motivation
Foreseen yields of positive relationships for achievement motivation
Part II: Relatedness and Theories of Achievement Motivation
The role of relatedness in:
• Attribution theory
• Expectancy-value theory
• Goal theory
• Self-determination theory
• Self-efficacy theory
• Self-worth motivation theory
Part III: Trilevel Approach to Action from a Relational Perspective
Student-level action
• Universal student programs and intervention
• Targeted student programs for at-risk populations
• Extracurricular activity
• Cooperative learning
• Mentoring
Teacher/classroom-level action
• Connective instruction
• Professional development
• Teacher retention and training
• Classroom composition
School-level action
• School as community
• Effective leadership
Part IV: Integrative Model of Theory and Practice
Connecting:
• Theory to Constructs to Mechanisms to Practice
Figure 1: Organizing framework for review
Martin and Dowson
Relationships, Student Motivation and Engagement
Part I: The Importance and Process of Relatedness
Why Positive Interpersonal Relationships Are Important for Young People
A substantial body of research demonstrates the importance of positive interpersonal relationships for healthy human functioning (e.g., see Berkowitz, 1996; Bronfenbrenner, 1986; De Leon, 2000; Fyson, 1999; Glover, Burns, Butler, & Patten, 1998; Hill, 1996; Moos, 2002; Royal & Rossi, 1996; Sarason, 1993; Weisenfeld, 1996). Relationships are a major source of happiness and a buffer against stress (Argyle, 1999; Glover et al., 1998; McCarthy, Pretty, & Catano, 1990). Through relationships, individuals receive instrumental help for tasks and challenges, emotional support in their daily lives, and companionship in shared activities (Argyle & Furnham, 1983; Gutman, Sameroff, & Eccles, 2002; Irwin, 1996). Conversely, the loss of relationship is a source of unhappiness and distress (Bronfenbrenner, 1974; Cowen, 1988; Gaede, 1985).
Interpersonal relationships are also important for social and emotional development (Abbott & Ryan, 2001; Kelly & Hansen, 1987; McCarthy et al., 1990). For example, during childhood and adolescence, key aspects of development involve, and rely on, positive relationships (Damon, 1983; Hartup, 1982). Relationships are also a critical factor in young people’s engagement and motivation at school (Ainley, 1995; Battistich & Hom, 1997; Hargreaves, Earl, & Ryan, 1996; Pianta, 1998). This latter issue is the focus of our review.
Relationships and Achievement Motivation: Causal Effects and Value-Added Explanations
Motivation is defined as a set of interrelated beliefs and emotions that influence and direct behavior (Wentzel, 1999; see also Green, Martin, & Marsh, 2007; Martin, 2007, 2008a, 2008b, in press). We propose that relationships affect achievement motivation by directly influencing motivation’s constituent beliefs and emotions. Ongoing social interactions teach individuals about themselves and about what is needed to fit in with a particular group. Accordingly, individuals develop beliefs, orientations, and values that are consistent with their relational environment. Hence, relatedness in the academic domain teaches students the beliefs, orientations, and values needed to function effectively in academic environments. In turn, these beliefs (if positive and adaptive) direct behavior in the form of enhanced persistence, goal striving, and self-regulation. In high-quality relationships, individuals not only learn that particular beliefs are useful for functioning in particular environments, but they actually internalize the beliefs valued by significant others (Wentzel, 1999).
In this way, beliefs held by others become a part of the individual’s own belief system. In the academic context, for example, good relationships with a particular teacher are likely to lead students to internalize at least some of that teacher’s beliefs and values about school and schoolwork. These internalized beliefs and values then have the potential to be transferred to other academic settings. Thus, students learn not only how to behave in a particular academic setting but also how to be a student in academic situations more generally (Ryan & Deci, 2000). Relatedness is an important self-system process in itself. As such, it has an energizing function on the self, working through the activation of positive affect and mood (Furrer & Skinner, 2003). This intrapersonal energy, gained from interpersonal relationships, provides a primary pathway toward motivated engagement in life activities. A complementary perspective on these processes is provided by the need to belong hypothesis. This hypothesis suggests that “human beings have a pervasive drive to form and maintain at least a minimum quantity of lasting, positive, and significant interpersonal relationships” (Baumeister & Leary, 1995, p. 497). When the need for belongingness is fulfilled, this fulfillment produces positive emotional responses. In the academic domain, these emotional responses are said to drive students’ achievement behaviors, including their responses to challenge, self-regulation, participation, and strategy use (Meyer & Turner, 2002). Relatedness affects individuals’ motivation and behavior by way of positive influences on other self-processes relevant to achievement motivation. For example, in the context of a student’s life, positive emotional attachments to peers, teachers, and parents promote not only healthy social, emotional, and intellectual functioning but also positive feelings of self-worth and self-esteem (Connell & Wellborn, 1991). 
This is important because self-worth and self-esteem are both related to sustained achievement motivation (Covington, 2002; Thompson, 1994). Finally, relatedness is linked to key psychological needs in a way that fosters achievement motivation. Work on autonomy in previous decades is a good example. Autonomy and relatedness have been linked (under various terminologies) in work on (a) agency (i.e., existence of an organism as an individual, giving rise to self-expansion and self-protection) and communion (i.e., participation of the individual in a larger organism, giving rise to cooperation) by Bakan (1966); (b) the importance of both individuational and relational needs along the lines proposed by Angyal (1941, 1965), who identified orientations toward self-determination and self-surrender as complementary needs, and by Maslow (1968), who recognized the need for love and belongingness in the path to self-actualization; and (c) individualism and interdependence (Waterman, 1981) under a framework that provides support for the scope of individualistic values to facilitate helping, cooperation, and other prosocial behaviors. Indeed, these early integrations of autonomy and relatedness have been influential in later theorizing
on motivation specifically (e.g., see Deci & Ryan, 2000) and personality more generally (e.g., see McAdams, Hoffman, Mansfield, & Day, 1996).
Benefits Accrued through Positive Interpersonal Relationships
There are a number of benefits accrued through taking relatedness into account when examining achievement motivation theories and processes. First, relatedness serves as an explanatory construct through which diverse theories of achievement motivation can be integrated. In fact, relatedness may even transcend broader divisions of psychology beyond motivation psychology. For example, the belongingness hypothesis has wide application in educational, personality, and social psychology (Baumeister & Leary, 1995). Second, relatedness provides a useful diagnostic tool with which to view and understand adaptive behavior in the classroom and to treat achievement motivation problems in the classroom that are other-related. For example, adjustment and adaptation problems in school have been linked to the failure of learning environments to meet students’ need to belong (Baumeister & Leary, 1995; Wentzel, McNamara Barry, & Caldwell, 2004). Third, relatedness recognizes and actively accommodates the interconnectedness of the social, academic, and affective dimensions of the self and the need for educational programs to recognize this interconnectedness (Weissberg, Kumpfer, & Seligman, 2003). Thus, the concept of relatedness can act as an impetus and explanation for educational programs that accommodate the whole self. Fourth, positive relationships are valued outcomes in their own right. The present review deals with relatedness as a means to greater theoretical and practical clarity with respect to achievement motivation. However, positive relationships can also be recognized as important end states in themselves. Thus, whatever their value for clarifying human motivation and achievement, relationships and relatedness are critical for understanding human functioning more widely.
In addition to these more direct benefits derived through a closer understanding of relatedness in the classroom, there may also be indirect yields from a closer consideration of relatedness. Relatedness may help explain why the effect of adaptive beliefs on achievement motivation varies across contexts. For example, there is variation across studies with respect to the effects of various beliefs and goals on achievement motivation. Performance goals have been shown to be both adaptive and maladaptive for achievement motivation. Clearly, these results are inconsistent (for examples of the ongoing debate over the adaptiveness of performance orientation, see Brophy, 2005; Harackiewicz, Barron, Pintrich, Elliott, & Thrash, 2002; Kaplan & Middleton, 2002; Martin, 2006c), and it may be that relatedness can explain some of this inconsistency. Specifically, relatedness may act as a mediating variable with respect to the interface of goals and achievement motivation. In performance-oriented environments where students experience positive relationships, these environments may be perceived by students as being supportive in the
path to achievement. When this is the case, achievement motivation may be facilitated and sustained in the context of a performance orientation. On the other hand, a performance-oriented environment in the context of poor relationships may be perceived as a “dog-eat-dog” context rather than a supportive one. Hence, relatedness could be a mediating process that can inform current theoretical debates and empirical inconsistencies.
Part II: Relatedness and Theories of Achievement Motivation
The Role of Interpersonal Relationships and the Other in Achievement Motivation Theory
Our analysis of motivation-related theory falls largely within the social-cognitive domain and primarily utilizes social-cognitive perspectives (e.g., Dweck & Leggett, 1988; Schunk, 1991). This social-cognitive analysis brings into consideration six theoretical viewpoints. Each of these viewpoints, while maintaining the relevance of relationships to their conceptualizations, differs in the way in which interpersonal relationships are invoked. These viewpoints are attribution theory, expectancy-value theory, goal theory, self-determination theory, self-efficacy theory, and self-worth motivation theory. It is important to note that not all theories are historically social-cognitive theories per se. Rather, we invoke their social-cognitive elements for the purposes of our synthesis. We also recognize that other theories (not addressed here) include social-cognitive elements as a source of influence.
Rationale for the Choice of Theories
Theories in this study represent major frameworks in achievement motivation that have been developed over the past 40 years and that drive current research (McInerney & Van Etten, 2004). At the time of writing we conducted a somewhat expeditious search of the Education Resources Information Center (ERIC) database limited to publications that are: (a) journal articles, (b) peer reviewed, (c) dealing with motivation and/or achievement as keywords from the six theoretical positions outlined, (d) written in English, and (e) published since 2000 (inclusive). Through searches of keyword and/or mapping onto subject headings, this identified close to 1,500 articles dealing with “self-efficacy”, “self-worth/self-esteem”, “achievement goals”, “goal orientation”, “attribution/s”, “expectancy/ies”, and “self-determination”. Whilst we recognize that this is an ever changing and fluid tally that does not denote these constructs’ relative importance or substance, we present the tallies to demonstrate the current and recent relevance of these constructs and the theories to which they relate in published educational research.
These theories also share a common social-cognitive heritage. Social-cognitive theories examine, inter alia, cognition and behavior (e.g., attributions, expectancies, purposes, perceived needs, capacities, and vulnerabilities) that are contextually located and influenced. This is not to imply that the place of relationships is explicit and central in each theory; however, when it comes to operationalizing the theories in achievement motivation research, there is often a clear relevance for interpersonal relationships. Indeed, this relevance is the focus of the present review. Although we propose that relationships are important to achievement motivation, this does not mean that the role of self-generated cognitions and emotions should be ignored. We recognize – as do the theories we examine – that the self has powerful generative capacities of its own. Similarly, we recognize that in addition to relatedness and its impact on motivation, engagement, and achievement, there is the key issue of students’ academic proficiency. This proficiency encompasses general skills such as critical thinking, self-regulation, and metacognition, as well as more-specific skills, such as decoding texts, comprehension, and mathematical reasoning. Hence, we suggest that relatedness is a necessary but not sufficient condition for explaining variation in educational outcomes.
Review of Theories
Attribution theory. According to attribution theory, the causes individuals attribute to events have an impact on the way they cognitively, affectively, and behaviorally respond on future occasions (Schell, Bruning, & Colvin, 1995; Weiner, 1986, 1994). Four attributions are typically identified in the literature: attributions to luck, task difficulty, ability, and effort. For example, failure on an exam may be attributed to bad luck, difficult questions, low ability, or insufficient effort. These causal attributions can also be mapped according to their locus, stability, and controllability (Weiner, 1994). Thus, the causes of an event may be located within the person or external to the person, may be stable or unstable, or may be controllable or uncontrollable. The control dimension is of particular interest in this review because it tends to be a significant determinant of students’ responses to setback, pressure, and fear of failure (Borkowski, Carr, Rellinger, & Pressley, 1990; Groteluschen, Borkowski, & Hales, 1990; Martin, Marsh, & Debus, 2001b). One means by which students gain a sense of control is through the feedback they receive from significant others such as their parents and teachers (Fabricius & Hagen, 1984; Weiner, 1986). The significance of this other person is an important mechanism for a sense of control, and this significance is established, at least in part, through the nature and strength of the relationship. It has been suggested that control (or helplessness) is learned by observing powerful models, such as parents (Peterson, Maier, & Seligman, 1993). Furthermore, parents and teachers
who provide reinforcement and feedback that are commensurate with students’ performance enhance students’ perceived control over educational outcomes (Perry & Tunna, 1988; Thompson, 1994). Hence, a defining aspect of students’ attributional profiles is in part relationally determined. Put simply, students can learn control from these significant others and the way these significant others relate to them. It has also been suggested that attributions in the interpersonal context give rise to socially based emotions (Hareli & Weiner, 2002). Recent work has proposed that socially based emotions are the result of attributional inferences focusing on the perceived causes of a particular outcome (Hareli & Weiner, 2002). This can have two impacts. First, it affects the observer’s emotions directly. In an adaptive scenario, a student attributing another student’s success to effort can experience positive affect and feelings of admiration for that student. On the other hand, a student attributing another student’s poor performance to a lack of ability may experience negative affect (Hareli & Weiner, 2002). In both cases, emotion is evoked in the academic context through the attributions students make about others’ academic outcomes. There is a second way socially based emotions emerge as a result of attributional inferences. Here, observers’ inferences about the cause of an event can shape the student’s emotions and behavior. For example, observers (e.g., teachers, parents) view a student’s performance and make inferences about the causes of the outcome, and these then influence the student’s reactions to the outcome and subsequent behavior. In the adaptive scenario described above, a teacher explicitly attributing a student’s success to effort can evoke positive affect and feelings of pride in the student. On the other hand, a teacher explicitly attributing poor performance to a lack of ability may evoke negative affect and shame in that student.
Again, academically related emotion is evoked through the attributions for success and failure in a relational context, and this emotion has achievement motivation relevance. Taken together, on the matter of relatedness and attributions, these findings underscore “the interconnection of the self and others in achievement settings, and the necessity of a transactional analysis to understand the social dynamics that accompany achievement performance” (Hareli & Weiner, 2002, p. 191). Expectancy-value theory. Atkinson (1957) viewed the motivation to achieve success as a product of the individual’s perceived probability of success and the incentive value of that success. Similarly, the motivation to avoid failure was seen as a product of perceived probability of failure and the negative incentive value of failure. More recent formulations of expectancy-value theory (e.g., Eccles, 1983; Wigfield, 1994; Wigfield & Tonks, 2002) have refined and extended Atkinson’s original formulation by suggesting that (a) the expectancy-value framework can be applied to the whole range of behavior, not just risk-taking behaviors; (b) the strength of an individual’s motivation is based on the valuing of proximal and distal outcomes associated with a behavior or pattern of behaviors; and (c) motivation is dependent on the perception of the
likelihood of a desired outcome occurring, contingent on a behavior or pattern of behaviors (see also Nicholls, Cheung, Lauer, & Patashnick, 1989; Wigfield & Tonks, 2002). In an educational context, students who believe they are capable of mastering their schoolwork typically have positive expectations for success and, hence, high motivation and achievement (Nicholls et al., 1989). What further contributes to students’ motivation and achievement is their valuing of an academic task, as well as the interface of their expectancies and task values (Arbreton & Blumenfield, 1997; Eccles, 1983). In a recent model representing the development of students’ expectancies for success and task values, Wigfield and Tonks (2002) identified the role of significant socializers’ attitudes, beliefs, and behaviors in the development of students’ expectancies and values. In particular, expectancies and values are influenced by the socializers with whom students have significant relationships. Thus, expectancy-value theory implicates relationships as an important component of its theoretical framework, and expectancies and values may be conceptualized as being, in part, relationally determined.
Goal theory. Goal theory focuses on the meaning students attach to achievement situations and the purpose for their actions (Ames, 1992; Barker, Dowson, & McInerney, 2002; Dweck, 1992; Pintrich, Marx, & Boyle, 1993). Goals proposed in early theorizing were the desire to affirm competence (mastery goal) and the desire to demonstrate superiority (performance goal). More-recent developments in goal theory have added social goals. Social goals focus on social reasons for achievement, such as affiliating with others, gaining approval from others (e.g., parents and peers), and complying with group norms (Dowson & McInerney, 2001, 2003; Elliot, 1997, 1999; McInerney, Roche, McInerney, & Marsh, 1997; Middleton & Midgley, 1997; Urdan & Maehr, 1995).
Goal theorizing has now also introduced an approach and avoidance distinction (e.g., Barker et al., 2002; Elliot, 1997). Goals may be conceptualized as being directed toward approach or toward avoidance. Approach goals are those that draw participation in an activity. Avoidance goals drive withdrawal from activities or avoidance of negative implications and consequences. Mastery, performance, and social goals can be located on approach–avoidance axes. A mastery avoidance goal, for example, represents the desire not to fail at developing mastery, a performance avoidance goal as the desire not to demonstrate lack of ability, and a social avoidance goal as, for example, working mainly to avoid disapproval from parents and teachers (Barker et al., 2002; Dowson & McInerney, 2003; Elliot, 1997; Martin, 2001, 2002b, 2006a). Whether directed toward approach or avoidance, the goals students adopt, their relative importance, and their effects on motivation and achievement are related to the influence of others (e.g., McInerney, Hinkley, Dowson, & Van Etten, 1998; Wentzel, 1994). For example, Martin et al. (2007) demonstrated a significant link between the quality of teacher-student relationships
and students’ mastery orientation and avoidance goals (see also Anderman & Maehr, 1994; Meece, 1991, for other aspects of teacher behavior and students’ goals). They also demonstrated a significant association between (a) students’ relationships with peers and their mastery and avoidance goals and (b) students’ relationship with parents or caregivers and these goals (see also Creasey et al., 1997 for the influence of relational contexts with peers and parents). Indeed, there may be different impacts of teachers, parents, and peers on different goals. For example, Martin et al. (2007) found relationships with teachers had the most impact on students’ mastery and avoidance goals, and Dowson and McInerney (2003) found that parents may have the most impact on students’ social goals. All this suggests that the goals students adopt, and the way these goals are expressed, are not independent of the influence of the relationships students have with teachers, peers, and parents. For this reason, students’ goals can be conceptualized as both arising from and being fulfilled in relational contexts (see also Lemos, 1996; Stipek, Givvin, Salmon, & MacGyvers, 1998; Taylor, 1995).
Self-determination theory. Of the theories reviewed here, self-determination theory is among the most explicit in its recognition of relatedness as a fundamental ingredient of motivation. It proposes that for one to be motivated and to function at optimal level, a set of psychological needs must be supported (Deci & Ryan, 2000; La Guardia & Ryan, 2002; Reeve, Deci, & Ryan, 2004). These needs are relatedness, competence, and autonomy. Relatedness refers to the connection and sense of belonging with others. This connectedness and belonging provides the required emotional security that individuals need to actively explore and effectively deal with their worlds.
From a learning perspective, a strong sense of relatedness better positions students to take on challenge, set positive goals, and establish high expectations that extend and motivate them. Moreover, relatedness needs constitute a motivating force for internalizing social regulations and adapting to interpersonal circumstances (La Guardia & Ryan, 2002). In turn, meeting these relatedness needs is likely to enable students to negotiate the affective and social world of the classroom and school, and this enhanced affective and social integration interfaces with enhanced motivational processes (Furrer & Skinner, 2003; Weissberg et al., 2003; Wentzel et al., 2004). For example, to the extent that home and school expectations and goals are aligned, children who are more warmly involved with their parents experience better academic functioning in class, and children with a heightened sense of relatedness with parents are more engaged at school and display higher self-esteem while at school (Avery & Ryan, 1987; Ryan, Stiller, & Lynch, 1994). Quality relatedness with parents also predicts quality relatedness with teachers (Ryan et al., 1994). Self-efficacy theory. Self-efficacy theory is centrally relevant to individuals’ belief in their capacity to successfully carry out given tasks and the consequent impact this self-belief has on motivation and achievement
(Bandura, 1986, 1997; Schell et al., 1995; Schunk & Miller, 2002). Self-efficacy is hypothesized to support a generative capacity such that individuals high in self-efficacy generate and test alternative courses of action when they do not meet with initial success (Schunk, 1991; Schunk & Miller, 2002). High self-efficacy can also enhance one’s functioning through elevated levels of effort and persistence and can also enhance one’s ability to deal with problematic situations by influencing cognitive and emotional processes related to the situation (Bandura, 1986, 1997; Zimmerman, Bandura, & Martinez-Pons, 1992). Students can gain a sense of self-efficacy through the problem-solving modeling and supportive communication of significant others (Bandura, 1997). Moreover, those with whom students identify and to whom they are closely connected are more-powerful channels of this modeling and positive communication (Bandura, 1997; Meece, 1997; Schunk & Miller, 2002). In this sense, relatedness is a mechanism through which modeling takes place. Furthermore, a key interpersonal influence on self-efficacy is the vicarious influence from others through social models (Bandura, 1997). For these reasons, efficacious self-beliefs, and the extent to which these are held by self, can be conceptualized as a relationally influenced process. And although self-efficacy is often discussed in individualistic terms, both the extent to which self-efficacy beliefs change over time and the ways these beliefs affect motivation and achievement are determined in the social domain (e.g., Bandura, 1986; Parker & Martin, in press). Hence, self-efficacy may be conceptualized in relational terms rather than in solely individual terms (Schunk, 1991; Schunk & Miller, 2002).
Perhaps a focus for future research is whether relationships are a moderator of these processes such that relatedness (e.g., high, low) and modeling (e.g., yes, no) interact to affect achievement motivation or whether relatedness is a mediator of these processes such that modeling predicts achievement motivation by way of relational factors.
Self-worth motivation theory. Self-worth motivation theory describes the bases of, and the processes involved in, protecting or enhancing one’s self-worth (Covington, 1992, 1998, 2002). According to this theory, students’ self-worth is largely derived through their ability to perform academically and competitively (Covington, 2002; Robinson, 1995). One reason students come to equate their worth with ability is that their worth, in part communicated to them by significant others, is made conditional on achievement. These conditional relationships, then, have a significant impact on students’ propensity to self-protect (Covington, 1992; Martin, 2002c, 2007; Martin & Marsh, 2003). In turn, such self-protection can have a negative impact on students’ engagement and achievement (Covington, 1992; Martin, Marsh, & Debus, 2001a, 2001b, 2003; Thompson, 1994). This suggests that students’ relationships, especially the conditionality of those relationships, affect their self-worth and then their motivation and achievement. Thus, self-worth theory may also be conceptualized in relational terms.
From an empirical perspective, Martin, Marsh, Williamson, and Debus (2003) have shown that students’ motive to protect self-worth and the specific strategies in which they engage to do this are influenced by significant others. In particular, they found that students’ parents were a factor in their fear of failure. They also found that the characteristic way in which that fear was responded to (e.g., through self-handicapping or defensive pessimism) was often linked to the characteristic way in which their parents dealt with their own fear. This impact of the family and relatedness is supported by other research demonstrating the intergenerational transmission of fear of failure and the impact of approval withdrawal on students’ fear of failure (Elliot & Thrash, 2004).
Summary of Key Relational Ideas Emanating from Theory
The discussion above identifies key motivation- and achievement-related concepts, ideas, and processes underpinned or directed by relatedness, connectedness, and belonging. A summary of these linkages is presented in Table 1. Attribution theory focuses on the causes ascribed to outcomes and events in one’s life and the impact of these causal attributions on behavior, affect, and cognition. Personal attributions may be learned from, or modeled on, the attributional “styles” or patterns of others. Specific consequences of attributions (such as a sense of personal control) can also be developed through feedback from and observation of significant others. Self-efficacy refers to a belief in one’s capacity and agency to achieve a desired outcome. This sense of capacity and agency can be instilled through direct or vicarious influence, modeling, and open communication from others. Related to this, expectancies and values have also been substantively linked to socializers’ beliefs, attitudes, and behaviors. Goal theory focuses on the why of behavior, which can be communicated through the values and expectations of significant others (working at individual, group, and organizational levels). Self-determination theory focuses on the psychological need for relatedness, which is satisfied through the warmth, support, and nurturance of significant others. Self-worth motivation theory focuses on the link between worth and achievement. It demonstrates that this link is in part determined by relationships in the child’s life in which worth, affirmation, and approval are communicated in either conditional or unconditional ways.
Table 1: Summary of key theories and key concepts relevant to relatedness
Theory | Key concepts | Link to relatedness or the other
Attribution theory | Perceived causes of an event or outcome shape behavior, affect, and cognition; key causal ascriptions – control, locus, stability | Perceived causes learned or inferred from significant others; dimensions such as control shaped by feedback from others
Expectancy-value theory | Positive expectations and high value placed on task or outcome enhance motivation | Socializers’ beliefs, attitudes, and behaviors communicate level of expectation and nature of value
Goal theory | Reasons for engaging in a particular behavior or pursuing a particular goal | Communicated through others’ values, expectations, and group norms
Self-determination theory | Relatedness a psychological need | Relatedness need met through warmth, support, and nurturance
Self-efficacy | Belief in capacity to achieve in a specific domain or task | Modeled and communicated by significant others; vicarious influence from others
Self-worth motivation theory | Link between worth and achievement; fear of failure | Relationships (approval, affirmation) conditional on level of achievement; specific response to fear of failure linked to how significant others respond
Part III: A Trilevel Approach to Action from a Relational Perspective To the extent that relatedness is central to achievement motivation theory, then educational practice relevant to motivation can also be framed in relational terms. A useful heuristic by which to organize and consider educational practice rests on the multiple tiers at which educational outcomes unfold and at which intervention and practice can be directed. Tiered approaches to intervention and practice are not uncommon and have recently been advocated as best practice in addressing diverse education- and health-based problems and challenges (e.g., see National Institutes of Health, 2008, and National Institute of Child Health and Human Development, 2008, for links to research along these lines). Such tiered approaches are now identified as particularly effective in reaching diverse populations with varying degrees and types of need. The tiered approach is also a useful way of organizing the discussion of relational action. Accordingly, we consider relatedness at the three levels that typically characterize the natural structure of students’ educational environs, namely, (a) practice at the level of the student, (b) practice at the level of the teacher or classroom, and (c) practice at the level of the school. We argue that analyzing action in this trilevel fashion represents an integrative means by which to address relational practice in the context of theory. To support this argument, we point to the fact that previous research has focused on one or more of these three levels to enhance the quality of pedagogy (Hill & Rowe, 1996; Kontos & Wilcox-Herzog, 1997b; Marzano, 2003), improve middle schooling (Eccles, 1999), enhance the educational outcomes of boys (Martin, 2003a, 2003b, 2004; Weaver-Hightower, 2003), assist Indigenous Australian students (Munns, 1998), address the educational needs of disadvantaged students (Battistich & Hom, 1997; Becker & Luthar, 2002),
smooth educational transition (Barratt, 1998; Maehr & Midgley, 1996; Martin, 2008a), and build resilience and buoyancy (Cunningham, Brandon, & Frydenberg, 1999; Howard & Johnson, 2000; Martin & Marsh, 2006, 2008, in press). The key principles derived from theory outlined in Part II are also useful in identifying key elements to consider at each of the three levels of intervention. Thus, we should be looking to practice at each level that involves or encompasses key constructs and mechanisms detailed in the key theories discussed in Part II. Along these lines, Pintrich (2003) recently identified substantive questions for the development of a motivational science. Taken together, these questions underscore the importance of considering, conceptualizing, and articulating a model of motivational practice from salient and seminal theorizing related to self-efficacy, attributions, expectancy and valuing, goal orientation, self-determination, and self-worth perspectives. As we discuss each level of practice, it is important to recognize that no one practice is a sufficient condition for an encompassing approach to relational intervention. Moreover, in the context of a tiered model, approaches are most effective if integrated. For example, a school implementing cooperative learning, mentoring, or an expanded approach to extracurricular activity as its only targeted effort to meet the relational needs of its students is unlikely to achieve the interpersonal yields of schools doing more than this. Likewise, the benefits to be derived from practice will be limited if there is not sufficient depth such that the fullness of any one practice is not amply addressed. We propose, then, that a powerful implementation of the various practices described below will rest on breadth, depth, quality, and integration.
Practice at the Student Level At the student level, we emphasize universal student programs and intervention, targeted student programs assisting at-risk populations, extracurricular activity, cooperative learning, and mentoring. Although there are many other practices at the student level that facilitate relatedness, we emphasize these practices because they are underpinned by elements of theory described above, represent opportunities to enhance connectedness between students, and are grounded in individual, student-to-student, or student-to-adult approaches to enhancing educational outcomes.
Universal Student Programs and Intervention In terms of the theoretical foundations described earlier, there are many in-school and out-of-school programs in which students engage that not only enhance academic outcomes and prevent maladaptive outcomes but also offer scope for personal growth and development (indeed, a recent issue
of American Psychologist, 58(6–7), 2003, focused on such programs and interventions for young people). Even broadly based relational programs offer scope to build bridges to students’ academic lives. Such programs typically range in specific purpose but are often aimed at enhancing or intervening in students’ emotional, social, physical, behavioral, and academic development. These programs comprise positive interpersonal relationships and support, helping students feel valued, developing supportive relationships, establishing a meaningful place for the individual in a group, and fostering individuals’ usefulness to others (Dryfoos, 1990; Martin, 2008a; Nation et al., 2003; Weissberg et al., 2003). Martin (2005, 2008a) also identified elements that contribute to effective motivation and engagement interventions based on the seminal theory described above. The first element comprised optimistic expectations held by adults for the students, directly invoking self-efficacy principles through the modeling of efficacious behavior by adults and expectancy–value principles through communicating efficacy-related expectations to students (e.g., see Bandura, 1997; Wigfield & Tonks, 2002). A focus on mastery was a second element, invoking principles of goal theory that identify the importance of significant adults in shaping students’ goals (e.g., see Anderman & Maehr, 1994; Creasey et al., 1997; Meece, 1991). These adults are also influential in shaping the climate, the third element identified by Martin. Specifically, a climate of cooperation, consistent with goal theory and relevant climate research (Ames, 1992; Dweck, 1992; Elliot, 1997; Qin, Johnson, & Johnson, 1995; Roeser, Midgley, & Urdan, 1996; Urdan, Midgley, & Anderman, 1998), evokes a sense of belonging that fulfills relatedness needs, consistent with self-determination theory (Deci & Ryan, 2000; La Guardia & Ryan, 2002).
This climate of cooperation also serves to diminish evaluative concerns and a consequent fear of failure, in keeping with tenets of self-worth motivation theory (Covington, 1992, 1998, 2002; Martin & Marsh, 2003).
Targeted Student Programs for At-Risk Populations: Special Focus on Indigenous Students As discussed, universal intervention programs typically involve practices directed at all students, whether they be high or low achievers, motivated, or unmotivated. However, there has been some concern that such programs may increase the gap between the strong and the struggling students such that the strugglers gain but the strong gain more (e.g., Ceci & Papierno, 2005). We propose that a relational perspective on educational practice may hold specific and differentiated benefits for groups that are at risk, even under a universal intervention paradigm. To illustrate, we focus on students from disadvantaged groups. Although these groups are by no means exhaustive of student groups at risk, they are an informative means of examining the potential for a relational approach in addressing their educational needs.
In many countries, Indigenous students represent a distinct group of disadvantaged students. In Australia, for example, across reading, mathematical literacy, and scientific literacy, Indigenous students achieve at a much lower standard than their non-Indigenous counterparts, and the dropout rate in high school is markedly higher for Indigenous groups (Groome & Hamilton, 1995; Martin, 2003c; Munns, 1998). Research conducted among Indigenous students has found that the impact of positive relationships on a number of educational outcomes can be substantial (see, e.g., Collins, 1993; Groome & Hamilton, 1995; Richer, Godfrey, Partington, Harslett, & Harrison, 1998). Given the fact that many Indigenous students experience difficulties with their teachers, interpersonal relationships are a critical concern when schools are seeking to enhance Indigenous students’ educational outcomes (Richer et al., 1998). Reviews point to three levels of relationships relevant to the educational needs of Indigenous students (Martin, 2006a, 2006b; Munns, 1998; see also Fanshawe, 1989). The first involves an active daily connection with the school. This relationship is underpinned by ongoing connections with the Indigenous community, Indigenous Studies as part of the general curriculum, and a focus on the interests of Indigenous students as a policy priority. Together, these aspects of relationship with school enhance students’ academic and nonacademic morale (Fanshawe, 1989; Martin, 2006a, 2006b; Munns, 1998). The second, interpersonal relationships, involves teachers’ getting to know students, developing trust within the class and school, and developing Indigenous cultural knowledge and understanding. The third, pedagogical relationships, involves connecting with students by means of challenging and interesting work, effective instructional strategies, and positive expectations held by teachers for students.
In the context of Indigenous education, predictors of this relationship include teacher satisfaction, appropriate and respectful views of students’ Indigenous status, collaborative lesson planning, and effective early intervention policies and programming (Munns, 1998). Taken together, school, interpersonal, and pedagogical relatedness can be an organizing concept for improving educational outcomes of Indigenous students – and potentially the educational outcomes of other disadvantaged minorities and groups. In line with this, lessons learned through Indigenous education are echoed in those learned in other cultural settings. Graham (1994), for example, developed a taxonomy for considering motivation among African Americans. Notwithstanding the important historical and social factors that distinguish them from other racial groups, Martin (2003c) suggested that this framework provided a useful means by which to think about Indigenous students’ educational status and outcomes. According to Graham, a central element of such a motivational psychology must address socialization antecedents of achievement strivings. Similarly, pedagogical principles
have been drawn from the work of Ladson-Billings with exemplary teachers of African American students (Ladson-Billings, 1995). According to Ladson-Billings, culturally responsive teachers create social interactions through maintaining fluid teacher-student relationships, demonstrating connectedness with all students, developing a community of learners, and encouraging students to learn collaboratively. As can be readily surmised, these are principles of effective teaching that should be effective with any group. However, they have particular scope for classrooms characterized by diversity, and in particular with students who are academically disadvantaged, such as Indigenous minorities (e.g., Indigenous Australians, Native Americans) and educationally disadvantaged ethnic minorities and groups (e.g., African Americans and Mexican Americans), where they are most needed.
Extracurricular Activity Extracurricular involvements traverse in-school and out-of-school programs. Extracurricular involvement encompasses, among other things, activities such as sport, music, dance, clubs, and church. The weight of evidence suggests that most extracurricular activities are a positive influence in young people’s lives, including in their educational, social, and emotional lives (Barber, Eccles, & Stone, 2001; Cooper, Valentine, Nye, & Lindsay, 1999; Eccles & Barber, 1999; Marsh, 1992; Marsh & Kleitman, 2002; Valentine, Cooper, Bettencourt, & DuBois, 2002). Significantly, relatedness and belonging are important reasons such activities are thought to yield positive effects. Extracurricular activity provides young people with safe and caring environments (McLaughlin, Irby, & Langman, 1994) in which prosocial adults (Mahoney, Schweder, & Stattin, 2001; Roth & Brooks-Gunn, 2000) are able to promote self-efficacy and model effective behaviors, consistent with self-efficacy theory (Bandura, 1997; Schunk & Miller, 2002). Extracurricular activity helps develop social skills and social capital (Broh, 2002), thereby building a student’s sense of control, as articulated by attribution theory (Weiner, 1986, 1994; see also Perry & Tunna, 1988; Thompson, 1994), and autonomy, consistent with a self-determination perspective (Deci & Ryan, 2000; La Guardia & Ryan, 2002; Reeve et al., 2004). Moreover, extracurricular activity provides an adolescent with a sense of belonging to a personally valued group (Brown & Evans, 2002), harnessing principles from expectancy-value and self-determination frameworks (Deci & Ryan, 2000; Wigfield & Tonks, 2002). To the extent that these connections and modeling are aligned with academic goals, they have the potential to promote achievement motivation. Hence, through a relational framework underpinned by principles salient in theorizing, extracurricular activity can facilitate educational and other outcomes.
Cooperative Learning Also relevant at the student level and related in part to goal theory is the relative emphasis on cooperative (relational) and competitive (anti- or at least arelational) activities among students. Cooperation can be operationally defined as the presence of joint goals, mutual rewards, shared resources, and complementary roles (Qin, Johnson, & Johnson, 1995). In cooperative situations, students strive to reach their goals through the support and joint focus of others in their group or class. In competitive situations, students strive to reach their goals individually, or against (rather than with) others (Anderman & Maehr, 1994; Barker et al., 2002). Thus, whereas cooperation is focused on the notion of relatedness and mutual action with the other, the notion of competition tends to be antithetical to it. Evidence suggests that cooperative efforts are more effective than competitive efforts for many learning-related tasks, such as those involving decoding and recall of information (Barker et al., 2002; Johnson, Maruyama, Johnson, Nelson, & Skon, 1981), and more conducive to higher level thinking and problem solving (Johnson et al., 1981; Qin et al., 1995; Slavin, 1983). Cooperative learning theorists might explain such findings by arguing that the pursuit of joint goals and mutual rewards and the sharing of intellectual and physical resources (all factors relying on relatedness and inter-connectedness) contribute to the advancement of achievement and motivation underpinning these outcomes.
Mentoring Within the school environment, mentoring harnesses relatedness between younger students and older students (or adults) who provide support and guidance in particular domains. Mentoring is implemented in numerous ways, including high school students “adopting” elementary school students, elementary school activity days (e.g., high school students teaching younger students skills for better schoolwork), former students visiting the school (e.g., to encourage reading or to identify postschool pathways relying on academic engagement), underachievers choosing a teacher–mentor to work with, or pairings in partnership with local industry (see Noble & Bradford, 2000). It has been suggested that the enhanced interpersonal connectedness that is part of these programs contributes directly to engagement and achievement gains (Karcher, Davis, & Powell, 2002). In a recent model representing the development of students’ expectancies for success and task values, Wigfield and Tonks (2002) emphasized the role of significant socializers’ (e.g., mentors) beliefs and behaviors on the academic development of students. From a self-efficacy perspective, students gain a sense of efficacy, at least in part, through the problem-solving modeling and supportive communication of others (Bandura, 1997). Mentors are likely to be powerful channels of modeling and positive communication, and so quality relatedness in the mentor process is an important part of this.
Practice at the Teacher and Classroom Level A pervading theme underpinning the theoretical traditions in Part II is the role of teachers (and classroom factors) in shaping students’ achievement motivation. Attribution theory proposes that students gain a sense of control and locus through feedback from teachers or by observing models demonstrating a sense of control (Fabricius & Hagen, 1984; Perry & Tunna, 1988; Peterson et al., 1993; Thompson, 1994; Weiner, 1986). Expectancy-value theory identifies the role of significant socializers’ attitudes, beliefs, and behaviors in the development of students’ expectancies and values (Wigfield & Tonks, 2002). From a goal theory perspective, teacher-set tasks, assessment, and grouping strategies influence the goals students adopt (Anderman & Maehr, 1994; Meece, 1991). Belongingness in the classroom, central to self-determination theory, is cultivated by the teacher and the students collected in the classroom (Deci & Ryan, 2000; La Guardia & Ryan, 2002; Reeve et al., 2004). Students gain a sense of self-efficacy through the modeling and supportive communication of teachers (Bandura, 1997). From a self-worth motivation perspective, Martin, Marsh, Williamson, et al. (2003; see also Covington, 1992, 1998; Thompson, 1994) have shown that students’ motive to protect self-worth is influenced by teachers, while other research has demonstrated the impact of approval withdrawal on students’ fear of failure (Elliot & Thrash, 2004). Indeed, teacher and classroom practice can be a vehicle for providing students with a sense of being at one with the group along the lines of communion posited by Bakan some four decades ago and yet let students retain the complementary but nonoverlapping sense of personal agency that is a hallmark of student motivation, engagement, and achievement (Bakan, 1966; see also, for early work, Angyal, 1941, 1965; Maslow, 1968; Waterman, 1981; for later work, see Deci & Ryan, 2000; McAdams et al., 1996).
All this being the case, it is clear that the means by which teachers and classroom practice affect achievement motivation are directly and indirectly shaped by relational factors and processes. At the teacher and classroom level, we suggest that instructional, professional development, teacher retention and training, and organizational practices can be conceptualized in terms of these relational factors and processes. In particular, the emerging concept of connective instruction may have implications for teachers’ ongoing professional development, the importance of teacher retention and attracting prosocial and positive (young) adults to teacher training, and the nature of classroom composition in affecting the motivation and engagement of students and classroom climate. Although not the only teacher and classroom practices that affect achievement motivation, they are a useful and informative means by which to frame practice in relational terms.
Connective Instruction To the extent that relationships are a vital underpinning of student motivation, engagement, and achievement, teachers who frame practice in relational terms are more likely to foster motivated, engaged, and achieving students. Many studies support this contention (e.g., Abbott & Ryan, 2001; Battistich & Hom, 1997; Elicker & Fortner-Wood, 1995; Fyson, 1999; Kontos & Wilcox-Herzog, 1997a, 1997b; Martin, 2006d). Specifically, research supports the following points:
a. Students’ sense of support (e.g., being liked, respected, and valued by the teacher) predicts their expectancies for success and valuing of subject matter. Indeed, support from the teacher is a consistently influential factor in motivation and achievement (Goodenow, 1993a).
b. Students who believe that their teacher is caring also believe they learn more (Teven & McCroskey, 1997).
c. Students’ feelings of acceptance by teachers are associated with emotional, cognitive, and behavioral engagement in class (Connell & Wellborn, 1991).
d. Teachers who support a student’s autonomy tend to facilitate greater motivation, curiosity, and desire for challenge (Flink, Boggiano, & Barrett, 1990).
e. Teachers higher in warmth tend to develop greater confidence in students (Ryan & Grolnick, 1986).
Conversely, research also supports the following conclusions:
f. When teachers are more controlling, students tend to show less mastery motivation and lower confidence (Deci, Schwartz, Sheinman, & Ryan, 1981).
g. Teachers who are not perceived as warm typically evince lower motivation and achievement among students (Kontos & Wilcox-Herzog, 1997b).
Relationships, therefore, are central to the issue of teaching and instruction.
The concept of connective instruction, built on the previously proposed pastoral pedagogy (Cavanagh, 2001; Hunter, 1994; Martin, 2006a, 2006b), relational pedagogy (Bergum, 2003; Boyd, MacNeil, & Sullivan, 2006; Gadow, 1999), and connective pedagogy (Corbett, 2001a, 2001b; Corbett & Norwich, 1999), is relevant here. Pastoral pedagogy, introduced by Hunter (1994), described how modern teachers harness principles of the Christian pastorate to shape the ethical development of students (see also Cavanagh, 2001). Relational pedagogy refers to pedagogy that has as its foundation the need for good relationships between student and teacher that must also be accompanied by enhanced student learning (Boyd et al., 2006). Extending
Gadow’s (1999) work, connective pedagogy deals with the delivery of teaching that interpersonally connects with learners, seeks to make the learning material meaningful (i.e., another form of connection), connects with external sectors to maximize student development, and looks to connect with significant others, such as parents, in students’ lives (Corbett, 2001a, 2001b; Corbett & Norwich, 1999). Martin (2006a, 2006b; see also Martino & Pallotta-Chiarolli, 2003; Munns, 1998, for cognate perspectives) offered an adaptation of these notions to more centrally position relatedness and connectedness between teacher and student in the context of instruction itself. Martin proposed such instruction – connective instruction – as that which connects the student and teacher on three levels: the level of substance and subject matter, the interpersonal level, and the instructional level (see also Martino & Pallotta-Chiarolli, 2003; Munns, 1998). Hence, connective instruction comprises three relationships: the substantive relationship (the connection between the student and the subject matter and substance of what is taught – i.e., connecting to the what), the interpersonal relationship (the connection between the student and the teacher himself or herself – i.e., connecting to the who), and the instructional relationship (the connection between the student and the instruction or teaching – i.e., connecting to the how). Although connective instruction emphasizes the impact of teacher on student, there is also an impact of student(s) on teacher such that the teacher is able to refine or adjust subject matter, interpersonal relatedness, and instruction on the basis of students’ responses to the teacher’s connective instruction. Connective instruction, then, may be viewed as a bidirectional process that is mutually beneficial and enhancing to both teacher and student.
Substantive connectiveness (connecting to the what).
The first relationship in connective instruction is that between the student and the actual subject matter and nature of tasks conducted in the teaching and learning context. Core elements of subject matter that facilitate students’ connection to the teaching and learning context include setting tasks that are appropriately challenging, assigning work that is important and meaningful, building variety into content and assessment tasks, and utilizing material that arouses curiosity and is interesting to young people (e.g., Covington, 1998; Martin, 2002a, 2003a, 2003b; McInerney, 2000). These elements reflect content, subject matter, and learning tasks to which a student can meaningfully connect. These are a means by which the student engages with the what of teaching and learning. A good deal of this component of relational pedagogy rests on the valuing dimension of expectancy-value theory and the mastery dimension of goal theory, which emphasize relevance, contextual dimensions of subject matter, utility, interest, and satisfaction in learning (see Eccles, 1983; Elliot, 1997, 1999; McInerney, 2000; Wigfield, 1994; Wigfield & Tonks, 2002).
Interpersonal connectiveness (connecting to the who). The second relationship in the connective instruction framework is that between the student and the teacher. Previously identified characteristics of quality interpersonal relationships in the teaching and learning context include actively listening to students’ views, allowing students to have input into decisions that affect them, getting to know students, showing no favoritism but affirming all students, accepting students’ individuality, and having positive but attainable expectations for students (Martin, 2002a, 2003a, 2003b; Slade, 2001; see also Flink et al., 1990; Goodenow, 1993a; Teven & McCroskey, 1997, for research confirming the yields of such relational characteristics). These elements are a means by which the student engages with the who in the teaching and learning context. This component explicitly invokes interpersonal relationships as central to learning and instruction – and by implication is perhaps most closely aligned with self-determination theory and its relatedness construct (Ryan & Deci, 2000). Whereas other theories might rely on interpersonal relatedness more as a conduit for their constructs and processes (e.g., for enhancing self-efficacy, control, self-worth, expectations, valuing) – self-determination theory quite centrally comprises the need for interpersonal relatedness as an important end in itself.
Instructional connectiveness (connecting to the how). The third relationship in connective instruction is that between the student and the teaching or instruction itself.
Elements of effective instruction include maximizing opportunities for students to develop competence, providing clear feedback to students, explaining things clearly and carefully, injecting variety into teaching methods, encouraging students to learn from their mistakes, clearly demonstrating to students how schoolwork is relevant or meaningful, ensuring all students keep up with the work, and allowing for opportunities to catch up (e.g., Baird, 1999; Bandura, 1997; Covington, 1997; Craven, Marsh, & Debus, 1991; Martin, 2002a, 2003a, 2003b). These elements characterize high-quality instructional practice and are a means by which the student engages with the how of teaching and learning. They bring into consideration teacher-based behaviors that emphasize effective feedback and reward (attribution theory), nurturing of students’ expectancies and valuing of subject matter (expectancy-value theory), development of a mastery and improvement focus (goal theory), use of modeling (self-efficacy theory), and reduction of achievement stress and fear of failure (self-worth motivation theory). The role of the student in connective instruction. Connective instruction also recognizes that teaching is not a unidirectional process. Rather, at each of the three levels (substantive, interpersonal, and instructional) there is the opportunity for the teacher to refine or adjust the relevant level. For example, in response to a lack of student interest in a particular lesson, the teacher might adjust subject matter, how he or she is relating interpersonally to students, the instructional techniques themselves, or a combination of these.
Hence, in the true spirit of relatedness, there exists a bidirectional process potentially mutually beneficial to all parties. In sum, connective instruction explicitly recognizes that relatedness is an instructional need and that students are likely to be more engaged and motivated when this need is met (Battistich & Hom, 1997; Burroughs & Eby, 1998; Chavis & Newbrough, 1986; N. Fry, 1994; Fyson, 1999; McCarthy et al., 1990). Through meeting this relatedness need, connective instruction facilitates students’ identification with the school and provides a connection with instruction on a more meaningful basis (see Munns, 1998). Jointly, identification with school and connection with instruction are proposed to promote adaptive academic engagement and motivation.
Professional Development Seminal motivation theory and conceptualizing around instruction itself (e.g., connective instruction) can also be a basis for teacher education and professional development (Bergum, 2003; Boyd et al., 2006; Cavanagh, 2001; Corbett, 2001a; Hunter, 1994; Martin, 2006a, 2006b). Teacher training and preservice education have been a focus of much prior research, with a number of journals specifically devoted to it. However, relatively less attention has been given to the professional development of teachers in the workforce. Teacher professional development (or in-servicing) has the potential for enhancing the educational outcomes of students and assisting teachers to operate more effectively in the classroom (Rowe & Rowe, 1999). Cherubini, Zambelli, and Boscolo (2002) examined the effects of professional development on teachers’ success in facilitating student motivation. Teachers participated in professional development related to theoretical and methodological aspects of motivation research and strategies to modify and sustain student motivation. Their findings showed that participants increased their practical knowledge about student motivation, were better able to identify and consider motivational problems, and planned new instructional programs to sustain their students’ motivation (see also Schorr, 2000). Similarly, Stipek et al. (1998) found that teachers participating in professional development focusing on student motivation were more likely to emphasize mastery and understanding in their teaching, to encourage student autonomy, and to create psychologically safer classroom environments. Participating teachers also made more-accurate assessments of students’ motivation – an important precursor to effective and targeted intervention (Martin, 2008a). Recent reviews have pointed to the need for teacher professional development in assisting disengaged and disadvantaged students. 
It is noteworthy that one of the key areas targeted for such professional development is improving teacher-student relationships (Becker & Luthar, 2002). Integrating theory and research detailed in Parts II and III suggests that professional development along these lines should focus on (a) developing a sense of community among
students through relationally supportive school structures (Battistich & Hom, 1997; Cumming, 1996); (b) cultivating cooperative and mastery-oriented climates as articulated in goal theory (Qin et al., 1995); (c) integrating students within their peer groups (Bolger, Patterson, & Kupersmidt, 1998) to develop a sense of belonging consistent with self-determination theory; (d) developing competence and personal control in the context of interpersonal relatedness (Connell & Wellborn, 1991) along the lines of that articulated under self-efficacy and attribution principles, respectively; (e) reducing emphases on teacher-as-authority (Flink et al., 1990), consistent with connective instructional principles introduced above (see also Bergum, 2003; Boyd et al., 2006; Cavanagh, 2001; Corbett, 2001a, 2001b; Hunter, 1994; Martin, 2006a, 2006b); and (f) providing positive role modeling (Hernandez, 1995), consistent with self-efficacy theory. These are all means of intentionally directing professional development toward relational understandings of teaching and learning. This accords with our overall relational conceptualization of motivation- and achievement-related theory, key issues, and practices described above.
Teacher Retention and Training
In almost every organizational setting, the workplace is changing, and at a seemingly increasing pace (Schabaracq & Cooper, 2000). Most employees work long hours, often not sufficiently remunerated (Dollard, 2006). Reports of an increasing lack of control, less input into decision making, and less involvement in the scheduling of work tasks and methods of work are consistently associated with poorer well-being (Karasek & Theorell, 1990). Indeed, stress-related workers’ compensation claims continue to rise at an alarming rate. For example, in Australia (the context for the present authors), stress-related claims increased by more than 60% between 1996–1997 and 2002–2003 (Office of the Australian Safety and Compensation Council, 2006), and in the United States, more than half of working adults say they are concerned about the amount of stress in their lives (Stambor, 2006). Of particular relevance to this review, some researchers place school teachers among the group of employees facing many or all of the above pressures (Martin & Marsh, in press). Such research has identified stress, disengagement, heavy workloads, little support, and high turnover in this challenging setting (G. Fry & Martin, 1994; Mayer, 2006; McCormack, Gore, & Thomas, 2006; Richardson & Watt, 2006; Smithers & Robinson, 2003) – factors that significantly hamper individual career and employment development. It is important to note that such factors also lead to high rates of teacher attrition, high mobility, and even difficulties attracting sufficient numbers of teachers into teacher training (G. Fry & Martin, 1994; Organisation for Economic Co-operation and Development, 2005; Smithers & Robinson, 2003; Vinson, 2002). One of the effects of teacher attrition and mobility is that there are fewer opportunities for consistent and stable relationships between student and
teacher and, by implication, fewer consistent prosocial and positive adults in students’ lives. Similarly, failure to attract potentially good teachers to teaching means a smaller pool of such people available to children and young people, with a consequent cost to children’s and young people’s potentially supportive interpersonal relationships. The present review, then, echoes calls in other research for the support teachers and schools need to deal more effectively with the stressors that lead to attrition, mobility, and alternative career choices (G. Fry & Martin, 1994; Martin & Marsh, in press; Mayer, 2006; McCormack et al., 2006; Organisation for Economic Co-operation and Development, 2005; Richardson & Watt, 2006; Smithers & Robinson, 2003; Vinson, 2002).
Classroom Composition
From a relational perspective, it is also important to consider the nature and number of students in the classroom. If, as key theories (e.g., goal theory, self-efficacy theory, attribution theory) propose, motivation and achievement are affected by goal climates, peers, and models with whom one identifies (e.g., other students), then it follows that research and practice must look more closely at the composition of students in the classroom. To date, most multilevel research examining variance in achievement and motivation at the classroom level attributes such variance to the teachers themselves (e.g., see Hill & Rowe, 1996; Papaioannou, Marsh, & Theodorakis, 2004; Rowe & Rowe, 1999). Relatively little research, however, has attempted to disentangle the effects of the teacher from those of the class. If, for example, there is an effect of class composition on motivation and engagement, then there are implications from a relational perspective. Some immediate questions from an achievement motivation perspective would be: What students are collected together? How many are there? Where are they seated? Whom do they work with or alongside? How do they interact? How do they get on? Disentangling the relative role of teacher from that of class composition is most appropriately handled by multilevel cross-classification analyses in which there are multiple teachers, each of whom teaches multiple classes. Marsh, Martin, and Cheng (2008) conducted such analyses and showed that there were some differences between classes but that these differences did not always generalize over different classes taught by the same teacher. Hence, over and above teacher effects are the effects of class composition. The researchers concluded that both the quality of the teaching and the classroom composition are factors in motivation (see also Martin & Marsh, 2005).
This finding has implications for classroom climate research, which suggests that the motivational climate may also be a function of the particular collection of students in that class. Whereas in recent years there has been substantial focus on teacher effectiveness and characteristics of effective teachers, it might now be timely to revisit the issue of class composition and
perhaps from a relational perspective. More specifically, in the context of achievement motivation, research might investigate the characteristics of effective classrooms, the students collected together in the classroom, the bases on which they are collected together, and how they interact. Moving beyond the students themselves are other factors relevant to the classroom and its environment that affect relatedness among students and between students and teachers. These include such factors as the classroom’s physical space (encompassing size, organization of furniture and equipment, lighting, temperature, etc.), its location in the school itself (e.g., in terms of noise, proximity to other classrooms for ease of movement, etc.), and even the time of day at which classroom activities are conducted. Prior work has been conducted into cognate issues such as seating arrangement (Hastings & Schwieso, 1995; Marx, Fuhrer, & Hartig, 1999), streaming (Marsh, 1987; Marsh & Hau, 2003), single-sex class composition (Marsh, 1989; Marsh & Rowe, 1996; Martin, 2004; Martin & Marsh, 2005), and the physicality of the learning environment (O’Hare, 1998; Stone, 2001). Hence, class composition and other class environment factors from a relational and achievement motivation perspective are an avenue for further research. Moreover, from a relational perspective, such research would also need to establish how much variance in achievement motivation at the class level is a function of teacher-student interactions (i.e., class-level variance due to teacher-student relatedness) and how much is unique to student-student interactions (i.e., class-level variance due to student-student relatedness).
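The idea of separating teacher effects from class effects, discussed above, can be written out as a simple variance-components sketch. This is an illustration only and is not the model estimated by Marsh, Martin, and Cheng (2008); the notation (y, t, c, e, and the sigma and rho terms) is assumed here for exposition. It considers a motivation score for student i in class j taught by teacher k, with mutually independent random effects:

```latex
% Illustrative three-level variance-components model
% (notation assumed for exposition; not taken from the source)
y_{ijk} = \mu + t_k + c_{jk} + e_{ijk},
\qquad t_k \sim N(0, \sigma_t^2), \quad
c_{jk} \sim N(0, \sigma_c^2), \quad
e_{ijk} \sim N(0, \sigma_e^2)

% With independent effects, total variance is the sum of the components,
% and each level's share is its component divided by the total
\operatorname{Var}(y_{ijk}) = \sigma_t^2 + \sigma_c^2 + \sigma_e^2,
\qquad
\rho_{\text{teacher}} = \frac{\sigma_t^2}{\sigma_t^2 + \sigma_c^2 + \sigma_e^2},
\qquad
\rho_{\text{class}} = \frac{\sigma_c^2}{\sigma_t^2 + \sigma_c^2 + \sigma_e^2}
```

Under this sketch, the teacher component and the class component can be separated only when each teacher teaches more than one class, which is the design requirement noted in the text. Dividing the class component further into teacher-student and student-student relatedness would presumably require class-level measures of each.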
Practice at the School Level
The theories informing this discussion deal primarily with intrapsychic, individualistic constructs that are directed at individuals or relatively small groups and activated by individuals such as teachers, counselors, psychologists, and the like. Although the issue of relatedness may be more aligned with research and practice at the individual and interpersonal level, it is important to consider what application of theory can be directed at the school level. A thoroughgoing treatment of relatedness would encompass integrated recommendations at all levels: student, teacher or classroom, and school. For example, hypothesized under goal theory are mastery and performance classroom climates that also have implications for whole-school climates (e.g., see Duda, 2001; Middleton & Midgley, 1997; Papaioannou et al., 2004; Roeser et al., 1996; Urdan et al., 1998). The notion of fear of failure and disengagement at the school level is not inconsistent with predictions under need achievement and self-worth motivation theories (Atkinson, 1957; Covington, 1992, 1998; McClelland, 1965). Work in the areas of attributions and learned helplessness shows that through observing potent models, even relatively large groups can acquire helpless behaviors
and dispositions (Peterson et al., 1993). Indeed, recent multilevel modeling research has examined school-level variance in constructs central to self-efficacy, expectancy-value, goal, self-worth motivation, and self-determination theories (Marsh et al., 2008; Martin & Marsh, 2005). Hence, there are extensions of achievement motivation theory and research to school-level considerations that are logical and defensible. Given this, we address two issues relevant to such considerations: school as community and effective leadership. Again, they are not the only school-level practices that are relevant to relationships, but they are a useful means by which to consider relatedness at a school level as relevant to achievement motivation.
School as Community
Cooperative climates develop a sense of community and belonging, consistent with predictions under goal and self-determination theories (Ames, 1992; Dweck, 1992; Elliot, 1997; Qin et al., 1995; Ryan & Deci, 2000). A sense of community affects young people’s sense of self and efficacy. It can also affect their engagement. In the educational context, Becker and Luthar (2002) suggest that an important means of enhancing motivation is through promoting a sense of belonging in school. However, it has been suggested that there can be tension between the emphasis on social cohesion (e.g., school as community) and a strong academic mission – with schools often pursuing one more than the other. Indeed, research under the goal theory framework has attempted to resolve similar dissonance through the articulation of multiple goals (e.g., see Heyman & Dweck, 1992; Urdan & Maehr, 1995; Wentzel, 1992). Encouragingly, it has been found that achievement can result from an integrated emphasis on social cohesion and academic mission (Shouse, 1996) and that psychological school membership (students’ perceived belonging) is significantly linked to academic motivation and achievement (Goodenow, 1993b). Conversely, alienation may be conceptualized, not just in relational terms (i.e., not feeling at home in a particular institution), but also in academic terms (i.e., not being able to relate to particular content or the presentation of that content). For these reasons, relational perspectives would support greater school-level action to enhance a sense of community, belonging, and connectedness at school (following others, e.g., Cumming, 1996; Hernandez, 1995; Mann, 1989).
Effective Leadership
In our discussion of teacher- and classroom-level practice, we described how feedback, modeling of efficacy and control, effective reward contingencies, expectations, set tasks, assessment and grouping strategies, supportive communication, and the transfer of fear and approval are means
by which teachers relationally influence students’ achievement motivation. It is not inconceivable that similar dynamics are relevant at upper levels, such as at the school executive or leadership level. Research into school effectiveness consistently emphasizes the importance of effective leadership (Edmonds, 1979; Levine & Lezotte, 1990; Marzano, 2003; Sammons, 1999). There are many features of effective leadership that have parallels with motivation and achievement theories, including visibility and energy that serve as modeling behavior (see self-efficacy theory), high expectations for staff and students (see expectancy-value theory), openness to feedback and input that can enhance teachers’ sense of control and autonomy (see attribution and self-determination theory), and advocacy for the school that demonstrates valuing (see expectancy-value theory). Other relational features include emotional and professional support of staff, mutual respect between staff and the executive, connectedness to the student body, interest in and involvement with parents, and links to the community and industry (Blum, Butler, & Olson, 1987; Hallinger & Murphy, 1987; Levine & Lezotte, 1990; Sammons, Hillman, & Mortimore, 1995). In implementing school-level action along these lines, however, it is important not to underestimate the yields of intervention at the student and classroom levels. For example, in the context of the multiple and sharp developmental trajectories occurring through childhood and adolescence, the impact of relational intervention may be greater when directed to students and classrooms than when directed to school executives.
Part IV: Integrative Model of Theory and Practice
In finalizing our review, we synthesize its key elements into an integrative model of theory and relational practice. Table 2 presents this model and summarizes the relevant theories, their component constructs, recommended educational practice, and the mechanisms and conduits within the theories that inform or implement such practice. Also evident in the table are some of the congruencies between central constructs in the model, including competence-based constructs such as self-efficacy, expectancies, and worth, and control-based constructs such as control and autonomy. The table also shows that there are commonalities in terms of the mechanisms that are the means by which these theories and component constructs are relationally translated to educational practice. These include the roles of modeling, communication of expectations, task assignment, skill development, reward contingencies, and feedback to students – all central to motivation- and achievement-related theories detailed in Part II. It is also evident in Table 2 that interpersonal relationships are directly or indirectly present in the way theory is manifested in students’ academic lives.
Table 2: Summary of constructs, mechanisms, and practice relevant to relatedness

Theory: Attribution theory
Key constructs relevant to review: • Perceived control • Perceived locus • Helplessness
Mechanisms or conduits: • Feedback to students • Reward contingencies • Observation of and identification with relevant others

Theory: Expectancy-value theory
Key constructs relevant to review: • Expectancy for success • Valuing of school, subjects, etc.
Mechanisms or conduits: • Communication of expectancies • Communication of valuing • Modeling of valuing • Responses to or treatment of students in class

Theory: Goal theory
Key constructs relevant to review: • Mastery goals • Performance goals • Social goals • Motivational climate • (Approach and avoidance extensions)
Mechanisms or conduits: • Tasks set • Assessment and grading practices • Development of climate • Reasons for learning valued by relevant others

Theory: Self-determination theory
Key constructs relevant to review: • Relatedness or belonging • Autonomy • Competence
Mechanisms or conduits: • Warmth, support, and nurturance • Promoting independence • Self-responsibility

Theory: Self-efficacy
Key constructs relevant to review: • Self-efficacy • Control
Mechanisms or conduits: • Modeling • Positive communication from relevant others • Vicarious influence

Theory: Self-worth motivation theory
Key constructs relevant to review: • Self-worth • Fear of failure • Disengagement
Mechanisms or conduits: • Approval, affirmation • Conditions of love, approval • Intergenerational transfer of love • Reward contingencies • Grading practices

Trilevel educational practice
Practice at student level: • Universal student programs and intervention • Targeted student programs and intervention • Extracurricular activity • Cooperative learning • Mentoring
Practice at teacher and classroom level: • Connective instruction • Professional development • Teacher retention and training • Classroom composition
Practice at the school level: • School as community • Effective leadership
Moving beyond theory, Table 2 suggests that interpersonal relationships play a pivotal part in resolving complex or critical concerns with respect to current and prospective educational practice. For these reasons, we argue that motivation- and achievement-based theory, key issues, and practice may be conceptualized from a relational perspective. Hence, the interplay of theory and practice from a relational perspective provides direction for educators seeking to enhance students’ achievement motivation.
Conclusion
This review has elucidated the multiple ways in which interpersonal relationships affect motivation and achievement, the benefits derived from relational perspectives on motivation and engagement, achievement motivation theories relevant to relationships, and relational practices underpinning student-, teacher- or classroom-, and school-level actions. Theory and research support the proposition that positive relationships with significant others are cornerstones of young people’s capacity to function effectively in social, affective, and academic domains. With a focus on the latter, we conclude that high-quality interpersonal relationships in students’ lives contribute to their academic motivation, engagement, and achievement. Further, relational elements of educational theory provide guidance for educational practice directed at student motivation and achievement. Taken together, this integration of relationally based theory and practice holds implications for researchers studying issues relevant to motivation and achievement and is also relevant to educators seeking to enhance educational outcomes that rely in large part on the extent to which their students are interpersonally connected to the significant others in their academic lives.
References
Abbott, J., & Ryan, T. (2001). The unfinished revolution: Learning, human behavior, community and political paradox. Alexandria, VA: Association for Supervision and Curriculum Development. Ainley, J. (1995). Students’ views of their schools. Unicorn, 21, 5–16. Ames, C. (1992). Classrooms: Goals, structures and student motivation. Journal of Educational Psychology, 84, 261–271. Anderman, E. A., & Maehr, M. L. (1994). Motivation and schooling in the middle grades. Review of Educational Research, 64, 287–310. Angyal, A. (1941). Foundations for a science of personality. Cambridge, MA: Harvard University Press. Angyal, A. (1965). Neurosis and treatment: A holistic theory. New York: J. Wiley. Arbreton, A., & Blumenfield, P. (1997). Change in competence beliefs and subjective task values across the elementary school years: A 3-year study. Journal of Educational Psychology, 89, 451–469. Argyle, M. (1999). The development of social coping skills. In E. Frydenberg (Ed.), Learning to cope: Developing as a person in complex societies (pp. 81–106). Oxford, UK: Oxford University Press. Argyle, M., & Furnham, A. (1983). Sources of satisfaction and conflict in long-term relationships. Journal of Marriage and the Family, 45, 481–493. Atkinson, J. W. (1957). Motivational determinants of risk-taking. Psychological Review, 64, 359–372. Avery, R. R., & Ryan, R. M. (1987). Object relations and ego development: Comparison and correlates in middle childhood. Journal of Personality, 56, 547–569. Baird, J. R. (1999). Learning to convert ignorance into understanding. In J. R. Baird (Ed.), Reflecting, teaching, learning: Perspectives on educational improvement. Cheltenham, Victoria, Australia: Hawker Brownlow Education.
Bakan, D. (1966). The duality of human existence: Isolation and communion in Western man. Boston: Beacon Press. Bandura, A. (1986). Social foundations of thought and action: A social cognitive theory. New Jersey: Prentice Hall. Bandura, A. (1997). Self-efficacy: The exercise of control. New York: Freeman. Barber, B. L., Eccles, J. S., & Stone, M. R. (2001). Whatever happened to the jock, the brain, and the princess? Young adult pathways linked to adolescent activity involvement and social identity. Journal of Adolescent Research, 16, 429–455. Barker, K., Dowson, M., & McInerney, D. M. (2002). Performance approach, performance avoidance and depth of information processing: A fresh look at relations between students’ academic motivation and cognition. Educational Psychology, 22, 571–589. Barratt, R. (1998). The future: The shape of middle schooling in Australia. Curriculum Perspectives, 18, 53–75. Battistich, V., & Hom, A. (1997). The relationship between students’ sense of their school as a community and their involvement in problem behaviors. American Journal of Public Health, 87, 1997–2001. Baumeister, R. F., & Leary, M. R. (1995).
The need to belong: Desire for interpersonal attachments as a fundamental human motivation. Psychological Bulletin, 117, 497–529. Becker, B. E., & Luthar, S. S. (2002). Social-emotional factors affecting achievement outcomes among disadvantaged students: Closing the achievement gap. Educational Psychologist, 37, 197–214. Bergum, V. (2003). Relational pedagogy: Embodiment, improvisation, and interdependence. Nursing Philosophy, 4, 121–128. Berkowitz, B. (1996). Personal and community sustainability. American Journal of Community Psychology, 24, 441–460. Blum, R. E., Butler, J. A., & Olson, N. L. (1987). Leadership for excellence: Research-based training for principals. Educational Leadership, 45, 25–29. Bolger, K. E., Patterson, C. J., & Kupersmidt, J. B. (1998). Peer relationships and self-esteem among children who have been maltreated. Child Development, 69, 1171–1197. Borkowski, J., Carr, M., Rellinger, E., & Pressley, M. (1990). Self-regulated cognition: Interdependence of meta-cognition, attributions, and self-esteem. In B. F. Jones & L. Idol (Eds.), Dimensions of thinking and cognitive instruction (pp. 53–92). Hillsdale, NJ: Lawrence Erlbaum. Boyd, R., MacNeil, N., & Sullivan, G. (2006). Relational pedagogy: Putting balance back into students’ learning. Curriculum Leadership: An Electronic Journal for Leaders in Education, 13. Retrieved from http://www.curriculum.edu.au/leader/relational_pedagogy:_putting_balance_back_into_stu,13944.html?issueID=10277 Broh, B. A. (2002). Linking extracurricular programming to academic achievement: Who benefits and why? Sociology of Education, 75, 69–91. Bronfenbrenner, U. (1974). The origins of alienation. Scientific American, 231, 53–61.
Bronfenbrenner, U. (1986, February). Alienation and the four worlds of childhood. Phi Delta Kappan, 430–436. Brophy, J. (2005). Goal theorists should move on from performance goals. Educational Psychologist, 40, 167–176. Brown, R., & Evans, W. P. (2002). Extracurricular activity and ethnicity: Creating greater school connection among diverse student populations. Urban Education, 37, 41–58. Burroughs, S. M., & Eby, L. T. (1998). Psychological sense of community: A measurement system and explanatory framework. Journal of Community Psychology, 26, 509–532. Cavanagh, S. L. (2001). The pedagogy of the pastor: The formation of the social studies curriculum in Ontario. Canadian Journal of Education, 26, 401–417. Ceci, S. J., & Papierno, P. B. (2005). The rhetoric and reality of gap closing: When the “have-nots” gain but the “haves” gain even more. American Psychologist, 60, 149–160. Chavis, D., & Newbrough, J. R. (1986). The meaning of “community” in community psychology. Journal of Community Psychology, 14, 335–340. Cherubini, G., Zambelli, F., & Boscolo, P. (2002). Student motivation: An experience of inservice education as a context for professional development of teachers. Teaching and Teacher Education, 18, 273–288. Collins, G. (1993). Meeting the needs of Aboriginal students. Aboriginal Child at School, 21, 3–17. Connell, J. P., & Wellborn, J. G. (1991). Competence, autonomy, and relatedness: A motivational analysis of self-system processes. In M. R. Gunnar & L. A. Sroufe (Eds.), Self processes in development: Minnesota Symposium on Child Psychology: Vol. 29 (pp. 244–254). Hillsdale, NJ: Lawrence Erlbaum. Cooper, H., Valentine, J. C., Nye, B., & Lindsay, J. J. (1999). Relationships between five after-school activities and academic achievement. Journal of Educational Psychology, 91, 369–378. Corbett, J. (2001a). Supporting inclusive education: A connective pedagogy. London: Routledge-Falmer. Corbett, J. (2001b).
Teaching approaches which support inclusive education: A connective pedagogy. British Journal of Special Education, 28, 55–59. Corbett, J., & Norwich, B. (1999). Learners with special educational needs. In P. Mortimore (Ed.), Understanding pedagogy and its impact on learning (pp. 115–136). London: Paul Chapman. Covington, M. V. (1992). Making the grade: A self-worth perspective on motivation and school reform. Cambridge, UK: Cambridge University Press. Covington, M. V. (1998). The will to learn: A guide for motivating young people. Cambridge, UK: Cambridge University Press. Covington, M. V. (2002). Rewards and intrinsic motivation: A needs-based developmental perspective. In F. Pajares & T. Urdan (Eds.), Academic motivation of adolescents. Greenwich, CT: Information Age. Cowen, E. (1988). Resilient children, psychological wellness and primary prevention. American Journal of Community Psychology, 16, 591–607. Craven, R. G., Marsh, H. W., & Debus, R. L. (1991). Effects of internally focused feedback and attributional feedback on the enhancement of academic self-concept. Journal of Educational Psychology, 83, 17–26. Creasey, G., Ottlinger, K., Devico, K., Murray, T., Harvey, A., & Hesson-McInnis, M. (1997). Children’s affective responses, cognitive appraisals, and coping strategies in response to the negative affect of parents and peers. Journal of Experimental Child Psychology, 67, 39–56. Culp, A. M., Hubbs-Tait, L., Culp, R. E., & Starost, H. J. (2000). Maternal parenting characteristics and school involvement: Predictors of kindergarten cognitive competence among Head Start children. Journal of Research in Childhood Education, 15, 5–17.
Cumming, J. (1996). From alienation to engagement: Opportunities for reform in the middle years of schooling: Vol. 3. Teacher action. Canberra: Australian Curriculum Studies Association. Cunningham, E. G., Brandon, C. M., & Frydenberg, E. (1999). Building resilience in early adolescence through a universal school-based preventive program. Australian Journal of Guidance and Counselling, 9, 15–24. Damon, W. (1983). Social and personality development: Infancy through adolescence. New York: Norton. Deci, E. L., & Ryan, R. M. (2000). The darker and brighter sides of human existence: Basic psychological needs as a unifying concept. Psychological Inquiry, 11, 319–338. Deci, E. L., Schwartz, A. J., Sheinman, L., & Ryan, R. M. (1981). An instrument to assess adults’ orientations toward control versus autonomy with children: Reflections on intrinsic motivation and perceived competence. Journal of Educational Psychology, 73, 642–650. De Leon, G. (2000). The therapeutic community: Theory, model and method. New York: Springer. Dollard, M. (2006). Throwaway workers. InPsych, 28, 8–12. Dowson, M., & McInerney, D. M. (2001). Psychological parameters of students’ social and work avoidance goals: A qualitative investigation. Journal of Educational Psychology, 93, 35–42. Dowson, M., & McInerney, D. M. (2003). What do students say about their motivational goals? Towards a more complex and dynamic perspective on student motivation. Contemporary Educational Psychology, 28, 91–113. Dryfoos, J. G. (1990). Adolescents at risk: Prevalence and prevention. New York: Oxford University Press. Duda, J. L. (2001). Achievement goal research in sport: Pushing the boundaries and clarifying some misunderstandings. In G. C. Roberts (Ed.), Advances in motivation in sport and exercise (pp. 129–182). Champaign, IL: Human Kinetics. Dweck, C. S. (1992). The study of goals in psychology. Psychological Science, 3, 165–167. Dweck, C., & Leggett, E. (1988).
A social-cognitive approach to motivation and personality. Psychological Review, 95, 256–273. Eccles, J. (1983). Expectancies, values, and academic behaviors. In J. Spence (Ed.), Achievement and achievement motives (pp. 75–146). San Francisco: Freeman. Eccles, J. S. (1999). The development of children ages 6 to 14. Future of Children, 9, 30–42. Eccles, J. S., & Barber, B. L. (1999). Student council, volunteering, basketball, or marching band: What kind of extracurricular involvement matters? Journal of Adolescent Research, 14, 10–43. Edmonds, R. R. (1979). Effective schools for the urban poor. Educational Leadership, 37, 15–27. Elicker, J., & Fortner-Wood, C. (1995). Adult-child relationships in early childhood programs: Research in review. Young Children, 51, 69–78. Elliot, A. J. (1997). Integrating the “classic” and “contemporary” approaches to achievement motivation: A hierarchical model of approach and avoidance achievement motivation. In M. L. Maehr & P. R. Pintrich (Eds.), Advances in motivation and achievement: Vol. 10 (pp. 143–179). Greenwich, CT: JAI Press. Elliot, A. J. (1999). Approach and avoidance motivation and achievement goals. Educational Psychologist, 34, 169–189. Elliot, A. J., & Thrash, T. M. (2004). The intergenerational transmission of fear of failure. Personality and Social Psychology Bulletin, 30, 957–971. Fabricius, W. V., & Hagen, J. W. (1984). Use of causal attributions about recall performance to assess meta-memory and predict strategic memory behavior in young children. Developmental Psychology, 20, 975–987.
Fanshawe, J. P. (1989). Personal characteristics of an effective teacher of adolescent Aboriginals. Aboriginal Child at School, 17, 35–48.
Field, T., Diego, M., & Sanders, C. (2002). Adolescents’ parent and peer relationships. Adolescence, 37, 121–130.
Flink, C., Boggiano, A. K., & Barrett, M. (1990). Controlling teaching strategies: Undermining children’s self-determination and performance. Journal of Personality and Social Psychology, 59, 916–924.
Fry, N. (1994). Meeting in the middle: Preparing teachers for working with young adolescents. Unicorn, 20, 21–27.
Fry, G., & Martin, A. J. (1994). Factors contributing to identification and incidence of stress during the school practicum as reported by supervising teachers. In T. A. Simpson (Ed.), Teacher Educators’ Annual Handbook. Queensland, Australia: QUT Press.
Furrer, C., & Skinner, E. (2003). Sense of relatedness as a factor in children’s academic engagement and performance. Journal of Educational Psychology, 95, 148–162.
Fyson, S. J. (1999). Developing and applying concepts about community: Reflections from the field. Journal of Community Psychology, 27, 347–365.
Gadow, S. (1999). Relational narrative: The postmodern turn in nursing ethics. Scholarly Inquiry for Nursing Practice, 13, 57–70.
Gaede, S. D. (1985). Belonging: Our need for community in church and family. Grand Rapids, MI: Academic Books.
Glover, S., Burns, J., Butler, H., & Patten, G. (1998). Social environments and the emotional wellbeing of young people. Family Matters, 49, 11–16.
Goodenow, C. (1993a). Classroom belonging among early adolescent students: Relationships to motivation and achievement. Journal of Early Adolescence, 13, 21–43.
Goodenow, C. (1993b). The psychological sense of school membership among adolescents: Scale development and educational correlates. Psychology in the Schools, 30, 79–90.
Graham, S. (1994). Motivation in African-Americans. Review of Educational Research, 64, 55–117.
Green, J., Martin, A. J., & Marsh, H. W. (2007). Motivation and engagement in English, mathematics and science high school subjects: Towards an understanding of multidimensional domain specificity. Learning and Individual Differences, 17, 269–279.
Groome, H., & Hamilton, A. (1995). Meeting the educational needs of Aboriginal adolescents. Canberra, Australia: AGPS.
Groteluschen, A. K., Borkowski, J. G., & Hales, C. (1990). Strategy instruction is often insufficient: Addressing the interdependency of executive and attributional processes. In T. Scruggs & B. Wong (Eds.), Intervention research in learning disabilities (pp. 81–101). New York: Springer-Verlag.
Gutman, L. M., Sameroff, A., & Eccles, J. S. (2002). The academic achievement of African American students during early adolescence: An examination of multiple risk, promotive, and protective factors. American Journal of Community Psychology, 30, 401–428.
Hallinger, P., & Murphy, J. F. (1987). Assessing and developing instructional leadership. Educational Leadership, 45, 54–61.
Harackiewicz, J. M., Barron, K. E., Pintrich, P. R., Elliott, P. R., & Thrash, T. M. (2002). Revision of achievement goal theory: Necessary and illuminating. Journal of Educational Psychology, 94, 638–645.
Hareli, S., & Weiner, B. (2000). Accounts for success as determinants of perceived arrogance and modesty. Motivation and Emotion, 24, 215–236.
Hareli, S., & Weiner, B. (2002). Social emotions and personality inferences: A scaffold for a new direction in the study of achievement motivation. Educational Psychologist, 37, 183–193.
Hargreaves, A., Earl, L., & Ryan, J. (1996). Schooling for change: Reinventing education for early adolescents. Washington, DC: Falmer Press.
Hartup, W. W. (1982). Peer relations. In C. B. Kopp & J. B. Krakow (Eds.), The child: Development in a social context (pp. 514–575). Reading, MA: Addison-Wesley.
Hastings, N., & Schwieso, J. (1995). Tasks and tables: The effects of seating arrangements on task engagement in primary classrooms. Educational Research, 37, 279–291.
Hernandez, A. E. (1995). Do role models influence self-efficacy and aspirations in Mexican American at-risk females? Hispanic Journal of Behavioral Sciences, 17, 256–263.
Heyman, G. D., & Dweck, C. S. (1992). Achievement goals and intrinsic motivation: Their relation and their role in adaptive motivation. Motivation and Emotion, 16, 231–247.
Hill, J. L. (1996). Psychological sense of community: Suggestions for future research. Journal of Community Psychology, 24, 431–438.
Hill, P. W., & Rowe, K. J. (1996). Multilevel modelling in school effectiveness research. School Effectiveness and School Improvement, 7, 1–34.
Howard, S., & Johnson, B. (2000). What makes the difference? Children and teachers talk about resilient outcomes for children “at risk.” Educational Studies, 26, 321–337.
Hunter, I. (1994). Rethinking the school: Subjectivity, bureaucracy and criticism. New York: St. Martin’s Press.
Irwin, J. L. (1996). Developmental tasks of early adolescents: How adult awareness can reduce at-risk behavior. Clearing House, March/April, 222–225.
Johnson, D. W., Maruyama, G., Johnson, R., Nelson, D., & Skon, L. (1981). Effects of cooperative, competitive, and individualistic goal structures on achievement: A meta-analysis. Psychological Bulletin, 89, 47–62.
Kaplan, A., & Middleton, M. J. (2002). Should childhood be a journey or a race? Response to Harackiewicz et al. (2002). Journal of Educational Psychology, 94, 646–648.
Karasek, R. A., & Theorell, T. (1990). Healthy work: Stress, productivity, and the reconstruction of working life. New York: Basic Books.
Karcher, M. J., Davis, C., & Powell, B. (2002). The effects of development mentoring on connectedness and academic achievement. School Community Journal, 12, 35–50.
Kelly, J. A., & Hansen, D. J. (1987). Social interactions and adjustment. In V. B. Van Hasselt & M. Hersen (Eds.), Handbook of adolescent psychology (pp. 131–146). New York: Pergamon Press.
Kontos, S., & Wilcox-Herzog, A. (1997a). Influences on children’s competence in early childhood classrooms. Early Childhood Research Quarterly, 12, 247–262.
Kontos, S., & Wilcox-Herzog, A. (1997b). Teachers’ interactions with children: Why are they so important? Research in review. Young Children, 52, 4–12.
Ladson-Billings, G. (1995). But that’s just good teaching! The case for culturally relevant pedagogy. Theory into Practice, 34, 159–165.
La Guardia, J. G., & Ryan, R. M. (2002). What adolescents need: A self-determination theory perspective on development within families, school, and society. In F. Pajares & T. Urdan (Eds.), Academic motivation of adolescents: Vol. 2 (pp. 193–219). Greenwich, CT: Information Age.
Lemos, M. S. (1996). Student’s and teacher’s goals in the classroom. Learning and Instruction, 6, 151–171.
Levine, D. U., & Lezotte, L. W. (1990). Unusually effective schools: A review and analysis of research and practice. Madison, WI: National Center for Effective Schools Research and Development.
Maehr, M. L., & Midgley, C. (1996). Transforming school cultures. Boulder, CO: Westview Press.
Mahoney, J. L., Schweder, A. E., & Stattin, H. (2001). Structured after-school activities as moderator of depressed mood for adolescents with detached relations to their parents. Journal of Community Psychology, 30, 69–86.
Mann, D. (1989). Effective schools as a dropout prevention strategy. NASSP Bulletin, 73(518), 77–83.
Marjoribanks, K. (1996). Family socialization and children’s school outcomes: An investigation of a parenting model. Educational Studies, 22, 3–11.
Marsh, H. W. (1987). The big-fish-little-pond effect on academic self-concept. Journal of Educational Psychology, 79, 280–295.
Marsh, H. W. (1989). Effects of attending single-sex and coeducational high schools on achievement, attitudes, behaviors, and sex differences. Journal of Educational Psychology, 81, 70–85.
Marsh, H. W. (1992). Extracurricular activities: Beneficial extension of the traditional curriculum or subversion of academic goals? Journal of Educational Psychology, 84, 553–562.
Marsh, H. W., & Hau, K. (2003). Big-Fish–Little-Pond effect on academic self-concept: A cross-cultural (26-country) test of the negative effects of academically selective schools. American Psychologist, 58, 364–376.
Marsh, H. W., & Kleitman, S. (2002). Extracurricular school activities: The good, the bad, and the nonlinear. Harvard Educational Review, 72, 464–511.
Marsh, H. W., Martin, A. J., & Cheng, J. (2008). A multilevel perspective on gender in classroom motivation and climate: Potential benefits of male teachers for boys? Journal of Educational Psychology, 100, 78–95.
Marsh, H. W., & Rowe, K. J. (1996). The negative effects of school-average ability on academic self-concept: An application of multilevel modeling. Australian Journal of Education, 40, 65–87.
Martin, A. J. (2001). The Student Motivation Scale: A tool for measuring and enhancing motivation. Australian Journal of Guidance and Counselling, 11, 1–20.
Martin, A. J. (2002a). Improving the educational outcomes of boys. Final report to ACT Department of Education, Youth and Family Services, Canberra, Australia. Retrieved September 30, 2008, from http://www.det.act.gov.au/__data/assets/pdf_file/0005/17798/Ed_Outcomes_Boys.pdf
Martin, A. J. (2002b). Motivation and academic resilience: Developing a model of student enhancement. Australian Journal of Education, 46, 34–49.
Martin, A. J. (2002c). The lethal cocktail: Low self-belief, low control, and high fear of failure. Australian Journal of Guidance and Counselling, 12, 74–85.
Martin, A. J. (2003a). Boys and motivation: Contrasts and comparisons with girls’ approaches to schoolwork. Australian Educational Researcher, 30, 43–65.
Martin, A. J. (2003b). Enhancing the educational outcomes of boys: Findings from the A.C.T. investigation into boys’ education. Youth Studies Australia, 22, 27–36.
Martin, A. J. (2003c). The role of significant others in enhancing the educational outcomes and aspirations of Indigenous/Aboriginal students. Aboriginal Studies Association Journal, 12, 23–26.
Martin, A. J. (2004). School motivation of boys and girls: Differences of degree, differences of kind, or both? Australian Journal of Psychology, 56, 133–146.
Martin, A. J. (2005). Exploring the effects of a youth enrichment program on academic motivation and engagement. Social Psychology of Education, 8, 179–206.
Martin, A. J. (2006a). A motivational psychology for the education of Indigenous students. Australian Journal of Indigenous Education, 35, 30–43.
Martin, A. J. (2006b). Pastoral pedagogy: A great composition comprising the song, the singer, and the singing. US Department of Education. (ERIC Document Reproduction Service No. ED490483)
Martin, A. J. (2006c). Personal bests (PBs): A proposed multidimensional model and empirical analysis. British Journal of Educational Psychology, 76, 803–825.
Martin, A. J. (2006d). The relationship between teachers’ perceptions of student motivation and engagement and teachers’ enjoyment of and confidence in teaching. Asia-Pacific Journal of Teacher Education, 34, 73–93.
Martin, A. J. (2007). Examining a multidimensional model of student motivation and engagement using a construct validation approach. British Journal of Educational Psychology, 77, 413–440.
Martin, A. J. (2008a). Enhancing student motivation and engagement: The effects of a multidimensional intervention. Contemporary Educational Psychology, 33, 239–269.
Martin, A. J. (2008b). Motivation and engagement in music and sport: Testing a multidimensional framework in diverse performance settings. Journal of Personality, 76, 135–170.
Martin, A. J. (in press). Age appropriateness and motivation, engagement, and performance in high school: Effects of age-within-cohort, grade retention, and delayed school entry. Journal of Educational Psychology.
Martin, A. J., & Marsh, H. W. (2003). Fear of failure: Friend or foe? Australian Psychologist, 38, 31–38.
Martin, A. J., & Marsh, H. W. (2005). Motivating boys and motivating girls: Does teacher gender really make a difference? Australian Journal of Education, 49, 320–334.
Martin, A. J., & Marsh, H. W. (2006). Academic resilience and its psychological and educational correlates: A construct validity approach. Psychology in the Schools, 43, 267–282.
Martin, A. J., & Marsh, H. W. (2008). Academic buoyancy: Towards an understanding of students’ everyday academic resilience. Journal of School Psychology, 46, 53–83.
Martin, A. J., & Marsh, H. W. (in press). Workplace and academic buoyancy: Psychometric assessment and construct validity amongst school personnel and students. Journal of Psychoeducational Assessment.
Martin, A. J., Marsh, H. W., & Debus, R. L. (2001a). A quadripolar need achievement representation of self-handicapping and defensive pessimism. American Educational Research Journal, 38, 583–610.
Martin, A. J., Marsh, H. W., & Debus, R. L. (2001b). Self-handicapping and defensive pessimism: Exploring a model of predictors and outcomes from a self-protection perspective. Journal of Educational Psychology, 93, 87–102.
Martin, A. J., Marsh, H. W., & Debus, R. L. (2003). Self-handicapping and defensive pessimism: A model of self-protection from a longitudinal perspective. Contemporary Educational Psychology, 28, 1–36.
Martin, A. J., Marsh, H. W., McInerney, D. M., Green, J., & Dowson, M. (2007). Getting along with teachers and parents: The yields of good relationships for students’ achievement motivation and self-esteem. Australian Journal of Guidance and Counselling, 17, 109–125.
Martin, A. J., Marsh, H. W., Williamson, A., & Debus, R. L. (2003). Self-handicapping, defensive pessimism, and goal orientation: A qualitative study of university students. Journal of Educational Psychology, 95, 617–628.
Martino, W., & Pallotta-Chiarolli, M. (2003). So what’s a boy: Addressing issues of masculinity and schooling. Buckingham, UK: Oxford University Press.
Marx, A., Fuhrer, U., & Hartig, T. (1999). Effects of classroom seating arrangements on children’s question-asking. Learning Environments Research, 2, 249–263.
Marzano, R. (2003). What works in schools. Alexandria, VA: ASCD.
Maslow, A. (1968). Toward a psychology of being. Princeton, NJ: Van Nostrand.
Mayer, D. (2006). The changing face of the Australian teaching profession: New generations and new ways of working and learning. Asia-Pacific Journal of Teacher Education, 34, 57–61.
McAdams, D. P., Hoffman, B. J., Mansfield, E. D., & Day, R. (1996). Themes of agency and communion in significant autobiographical scenes. Journal of Personality, 64, 339–378.
McCarthy, M., Pretty, G., & Catano, V. (1990). Psychological sense of community and burnout. Journal of College Student Development, 31, 211–216.
McClelland, D. C. (1965). Toward a theory of motive acquisition. American Psychologist, 20, 321–333.
McCormack, A., Gore, J., & Thomas, K. (2006). Early career teacher professional learning. Asia-Pacific Journal of Teacher Education, 34, 95–113.
McInerney, D. (2000). Helping kids achieve their best. Sydney, Australia: Allen and Unwin.
McInerney, D. M., Hinkley, J., Dowson, M., & Van Etten, S. (1998). Children’s beliefs about success in the classroom: Are there cultural differences? Journal of Educational Psychology, 90, 621–629.
McInerney, D. M., Roche, L., McInerney, V., & Marsh, H. W. (1997). Cultural perspectives on school motivation: The relevance and application of goal theory. American Educational Research Journal, 34, 207–236.
McInerney, D. M., & Van Etten, S. (2004). Big theories revisited. Greenwich, CT: Information Age.
McLaughlin, M. W., Irby, M. A., & Langman, J. (1994). Urban sanctuaries: Neighborhood organizations and the lives and futures of inner city youth. San Francisco: Jossey-Bass.
Meece, J. L. (1991). The classroom context and student’s motivational goals. In M. L. Maehr & P. R. Pintrich (Eds.), Advances in motivation and achievement (pp. 261–285). Greenwich, CT: JAI Press.
Meece, J. L. (1997). Child and adolescent development for educators. New York: McGraw-Hill.
Meyer, D. K., & Turner, J. C. (2002). Discovering emotion in classroom motivation research. Educational Psychologist, 37, 107–114.
Middleton, M. J., & Midgley, C. (1997). Avoiding the demonstration of lack of ability: An unexplored aspect of goal theory. Journal of Educational Psychology, 89, 710–718.
Moos, R. H. (2002). The mystery of human context and coping: An unraveling of clues. American Journal of Community Psychology, 30, 67–88.
Munns, G. (1998). “They just can’t hack that”: Aboriginal students, their teachers and responses to schools and classrooms. In G. Partington (Ed.), Perspectives on Aboriginal and Torres Strait Islander education (pp. 171–187). Katoomba, Australia: Social Science Press.
Nation, M., Crusto, C., Wandersman, A., Kumpfer, K. L., Seybolt, D., Morrisey-Kane, E., et al. (2003). What works in prevention: Principles of effective prevention programs. American Psychologist, 58, 449–456.
National Institute of Child Health and Human Development. (2008). Accessed September 30, 2008, at http://www.nichd.nih.gov/
National Institutes of Health. (2008). Accessed September 30, 2008, at http://www.nih.gov/
Nicholls, J. G., Cheung, P. C., Lauer, J., & Patashnick, M. (1989). Individual differences in academic motivation: Perceived ability, goals, beliefs, and values. Learning and Individual Differences, 1, 63–84.
Noble, C., & Bradford, W. (2000). Getting it right for boys … and girls. London: Routledge.
Office of the Australian Safety and Compensation Council. (2006). Compendium of workers’ compensation statistics, Australia, 2002–2003. Canberra: Commonwealth of Australia, Department of Employment and Workplace Relations.
O’Hare, M. (1998). Classroom design for discussion-based teaching. Journal of Policy Analysis and Management, 17, 706–720.
Organisation for Economic Co-operation and Development. (2005). Teachers matter: Attracting, developing and retaining effective teachers. Paris: Author.
Papaioannou, A., Marsh, H. W., & Theodorakis, Y. (2004). A multilevel approach to motivational climate in physical education and sport settings: An individual or a group level construct. Journal of Sport and Exercise Psychology, 26, 90–118.
Parker, P. D., & Martin, A. J. (in press). Personal capacity building for the human services: What is the relative salience of curriculum and individual differences in predicting self-concept amongst college/university students? Learning and Individual Differences.
Perry, R. P., & Tunna, K. (1988). Perceived control, Type A/B behavior, and quality of instruction. Journal of Educational Psychology, 80, 102–110.
Peterson, C., Maier, S. F., & Seligman, M. E. P. (1993). Learned helplessness: A theory for the age of personal control. New York: Oxford University Press.
Pianta, R. C. (1998). Applying the concept of resilience in schools: Cautions from a developmental systems perspective. School Psychology Review, 27, 407–428.
Pianta, R. C., Nimetz, S. L., & Bennett, E. (1997). Mother-child relationships, teacher-child relationships, and school outcomes in preschool and kindergarten. Early Childhood Research Quarterly, 12, 263–280.
Pintrich, P. R. (2003). A motivational science perspective on the role of student motivation in learning and teaching contexts. Journal of Educational Psychology, 95, 667–686.
Pintrich, P. R., Marx, R. W., & Boyle, R. A. (1993). Beyond cold conceptual change: The role of motivational beliefs and classroom contextual factors in the process of conceptual change. Review of Educational Research, 63, 167–199.
Qin, Z., Johnson, D. W., & Johnson, R. T. (1995). Cooperative versus competitive efforts and problem solving. Review of Educational Research, 65, 129–144.
Reeve, J., Deci, E. L., & Ryan, R. M. (2004). Self-determination theory: A dialectical framework for understanding sociocultural influences on student motivation. In D. McInerney & S. Van Etten (Eds.), Big theories revisited (pp. 31–60). Greenwich, CT: Information Age.
Richardson, P. W., & Watt, H. M. G. (2006). Who chooses teaching and why? Profiling characteristics and motivation across three Australian institutions. Asia-Pacific Journal of Teacher Education, 34, 27–56.
Richer, K., Godfrey, J., Partington, G., Harslett, M., & Harrison, B. (1998). Attitudes of Aboriginal students to further education: An overview of a questionnaire survey. Paper presented at Australian Association for Research in Education Annual Conference, Adelaide, Australia.
Robinson, N. S. (1995). Evaluating the nature of perceived support and its relation to perceived self-worth in adolescents. Journal of Research on Adolescence, 5, 253–280.
Roeser, R. W., Midgley, C., & Urdan, T. C. (1996). Perceptions of the school psychological environment and early adolescents’ psychological and behavioral functioning in school: The mediating role of goals and belonging. Journal of Educational Psychology, 88, 408–422.
Roth, J., & Brooks-Gunn, J. (2000). What do adolescents need for healthy development? Implication for youth policy. Social Policy Report, Society for Research in Child Development, 16, 3–19.
Rowe, K. J., & Rowe, K. S. (1999). Investigating the relationship between students’ attentive-inattentive behaviours in the classroom and their literacy progress. International Journal of Educational Research, 31, 1–138.
Royal, M. A., & Rossi, R. (1996). Individual level correlates of sense of community: Findings from workplace and school. Journal of Community Psychology, 24, 395–416.
Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. American Psychologist, 55, 68–78.
Ryan, R. M., & Grolnick, W. S. (1986). Origins and pawns in the classroom: Self-report and projective assessments of individual differences in children’s perceptions. Journal of Personality and Social Psychology, 50, 550–558.
Ryan, R. M., Stiller, J., & Lynch, J. H. (1994). Representations of relationships to parents, teachers, and friends as predictors of academic motivation and self-esteem. Journal of Early Adolescence, 14, 226–249.
Sammons, P. (1999). School effectiveness: Coming of age in the twenty-first century. Lisse, Netherlands: Swets and Zeitlinger.
Sammons, P., Hillman, J., & Mortimore, P. (1995). Key characteristics of effective schools: A review of school effectiveness research. London: Office of Standards in Education and Institute of Education.
Sarason, S. B. (1993). American psychology and the needs for transcendence and community. American Journal of Community Psychology, 21, 185–202.
Schabaracq, M. J., & Cooper, C. L. (2000). The changing nature of work and stress. Journal of Managerial Psychology, 15, 227–241.
Schell, D., Bruning, R., & Colvin, C. (1995). Self-efficacy, attribution, and outcome expectancy mechanisms in reading and writing achievement: Grade-level and achievement level. Journal of Educational Psychology, 87, 386–398.
Schorr, R. Y. (2000). Impact at the student level: A study of the effects of a teacher development intervention on students’ mathematical thinking. Journal of Mathematical Behavior, 19, 209–231.
Schunk, D. (1991). Goal setting and self-regulation: A social cognitive perspective on self-regulation. In M. L. Maehr & P. R. Pintrich (Eds.), Advances in motivation and achievement (pp. 85–113). Greenwich, CT: JAI Press.
Schunk, D. H., & Miller, S. D. (2002). Self-efficacy and adolescents’ motivation. In F. Pajares & T. Urdan (Eds.), Academic motivation of adolescents (pp. 29–52). Greenwich, CT: Information Age.
Shouse, R. C. (1996). Academic press and sense of community: Conflict, congruence, and implications for student achievement. Social Psychology of Education, 1, 47–68.
Slade, M. (2001). Listening to boys. Boys in Schools Bulletin, 4, 10–18.
Slavin, R. (1983). Cooperative learning. New York: Longman.
Smithers, A., & Robinson, P. (2003). Factors affecting teachers’ decision to leave the profession (Research Report RR430). UK: Department of Education and Skills.
Stambor, Z. (2006). Stressed out nation. Monitor on Psychology, 37(4), 28–29.
Stipek, D., Giwin, K. B., Salmon, J. M., & MacGyvers, V. L. (1998). Can a teacher intervention improve classroom practices and student motivation in mathematics? Journal of Experimental Education, 66, 319–337.
Stone, N. J. (2001). Designing effective study environments. Journal of Environmental Psychology, 21, 179–190.
Taylor, R. D. (1995). Social contextual influences on family relations. In M. Maehr & P. R. Pintrich (Eds.), Advances in motivation and achievement (pp. 229–253). Greenwich, CT: JAI.
Teven, J. J., & McCroskey, J. C. (1997). The relationship of perceived teacher caring with student learning and teacher evaluation. Communication Education, 46, 1–9.
Thompson, T. (1994). Self-worth protection: Review and implications for the classroom. Educational Review, 46, 259–274.
Urdan, T. C., & Maehr, M. L. (1995). Beyond a two goal theory of motivation and achievement: A case for social goals. Review of Educational Research, 65, 213–243.
Urdan, T. C., Midgley, C., & Anderman, E. M. (1998). The role of classroom goal structure in students’ use of self-handicapping strategies. American Educational Research Journal, 35, 101–122.
Valentine, J. C., Cooper, H., Bettencourt, B. A., & DuBois, D. L. (2002). Out-of-school activities and academic achievement: The mediating role of self-beliefs. Educational Psychologist, 37, 245–256.
Vinson, T. (2002). Inquiry into the provision of public education. Sydney, Australia: Pluto Press.
Waterman, A. S. (1981). Individualism and interdependence. American Psychologist, 36, 762–773.
Weaver-Hightower, M. (2003). The “boy turn” in research on gender and education. Review of Educational Research, 73, 471–498.
Webster’s Online Dictionary. (2007). http://www.websters-online-dictionary.org
Weiner, B. (1986). An attributional theory of motivation and emotion. New York: Springer-Verlag.
Weiner, B. (1994). Integrating social and personal theories of achievement striving. Review of Educational Research, 64, 557–573.
Weisenfeld, E. (1996). The concept of “We”: A community social psychology myth? Journal of Community Psychology, 24, 337–346.
Weissberg, R. P., Kumpfer, K. L., & Seligman, M. E. P. (2003). Prevention that works for children and youth: An introduction. American Psychologist, 58, 425–432.
Wentzel, K. R. (1992). Motivation and achievement in adolescence: A multiple goal perspective. In D. H. Schunk & J. L. Meece (Eds.), Student perceptions in the classroom (pp. 287–306). Hillsdale, NJ: Lawrence Erlbaum.
Wentzel, K. R. (1994). Relations of social goal pursuit to social acceptance, classroom behaviour, and perceived social support. Journal of Educational Psychology, 84, 173–182.
Wentzel, K. R. (1999). Social-motivational processes and interpersonal relationships: Implications for understanding motivation at school. Journal of Educational Psychology, 91, 76–97.
Wentzel, K. R., McNamara Barry, C., & Caldwell, K. A. (2004). Friendships in middle school: Influences on motivation and school adjustment. Journal of Educational Psychology, 96, 195–203.
Wigfield, A. (1994). Expectancy-value theory of achievement motivation: A developmental perspective. Educational Psychology Review, 6, 49–78.
Wigfield, A., & Tonks, S. (2002). Adolescents’ expectancies for success and achievement task values during the middle and high school years. In F. Pajares & T. Urdan (Eds.), Academic motivation of adolescents (pp. 53–82). Greenwich, CT: Information Age.
Zimmerman, B., Bandura, A., & Martinez-Pons, M. (1992). Self-motivation for academic attainment: The role of self-efficacy and personal goal setting. American Educational Research Journal, 29, 663–676.
59
Classroom and Individual Differences in Early Adolescents’ Motivation and Self-Regulated Learning
Paul R. Pintrich, Robert W. Roeser and Elisabeth A.M. De Groot
Source: The Journal of Early Adolescence, 14(2) (1994): 139–161.
Current research on early adolescence and schooling has examined the relations between biological changes and schooling (e.g., Simmons & Blyth, 1987), cognitive development and schooling (e.g., Entwisle, 1990; Keating, 1990), and social and motivational development and schooling (e.g., Eccles et al., 1993). Although this research has begun to provide us with excellent descriptions of how different characteristics of junior high and senior high schools influence the course of adolescent development, there has not been much integration across the different domains of development. In fact, Keating (1990) suggests the need for models that “open the gate of our typically closed-system models of thinking, learning, and instruction” (p. 76) to include motivational, cognitive, and contextual dimensions. The purpose of this article is to explicitly integrate the motivational and cognitive domains of adolescent development and examine their interrelations in the classroom context of middle schools.
One hallmark of adolescent thinking is the ability to monitor and regulate one’s thinking and learning (Keating, 1990). Although there has been research on the development of adolescents’ ability to regulate their learning (Sternberg & Powell, 1983), there has been very little research on how this ability is related to students’ motivational beliefs, goals, and values. In fact, Keating (1990) noted that “tracking the development of these motivational aspects that relate to cognitive performance remains a key topic for future
research” (p. 76). In the classroom context, the use of cognitive and self-regulatory strategies has been shown to be an important component of student performance and achievement (Pintrich & De Groot, 1990; Zimmerman & Martinez-Pons, 1986). This research has shown that students who use cognitive strategies such as elaboration (e.g., summarizing, paraphrasing) and organization (e.g., making outlines, drawing up tables or charts) engage the content at a deeper level of processing and are more likely to recall the information and be able to use it at a later date. In contrast, students who do not use any strategies to help them encode the information, or who rely only on rote rehearsal strategies, seem to process the information at a more superficial or surface level and do not perform as well on memory and transfer tasks (Weinstein & Mayer, 1986).
Besides these basic cognitive strategies, research has shown that metacognitive control and self-regulatory strategies also are important for learning (Brown, Bransford, Campione, & Ferrara, 1983; Keating, 1990). Metacognitive control strategies include planning (e.g., setting goals), monitoring (e.g., tracking attention and comprehension, self-testing for understanding), and regulating (e.g., rereading, adjusting reading speed) strategies that help guide and direct students’ cognition. Beyond these metacognitive control strategies, a variety of other self-regulatory strategies are important for performance (Corno, 1986, 1993). In the classroom context, students’ ability to manage and regulate their effort (i.e., persist with difficult tasks, maintain attention with uninteresting tasks) seems to be an important component of self-regulated learning (Pintrich & De Groot, 1990; Zimmerman & Martinez-Pons, 1986).
For the most part, however, cognitive development research has not focused on how students are motivated to use these cognitive and self-regulatory strategies. Nevertheless, it does appear that certain motivational beliefs and affective reactions are related to how adolescents approach and become cognitively involved in different classroom academic tasks (Graham & Golan, 1991; Nolen, 1988; Pintrich, 1989; Pintrich & De Groot, 1990; Pintrich & Schrauben, 1992). In our research, we have focused on three motivational components: value, expectancy, and affect. As in all motivational theories, these motivational beliefs and reactions are assumed to lead to three general types of motivated behavior: choice (i.e., choosing to do some tasks and not others), level of activity or engagement (i.e., engaging the task at a sustained and deep level), and persistence (i.e., continued effort in the face of difficulty). However, motivational research does not usually examine the use of different strategies or resources through which the learner would enact these motivational beliefs. Given that the use of different cognitive and self-regulatory strategies represents motivated behavior in terms of a sustained and deeper level of cognitive engagement, we have focused on the relations between the motivational and cognitive components, thereby describing a motivated and self-regulating learner.
In our model, we have defined value components in terms of two constructs – goal orientation and task value beliefs (Pintrich, 1989; Pintrich & Garcia, 1991; Pintrich & Schrauben, 1992; Pintrich, Smith, Garcia, & McKeachie, 1993). Goal orientation refers to two general approaches to academic tasks in line with a more qualitative view of motivation (Ames, 1992) whereby the different goal orientations can lead students in qualitatively different directions as they perform an academic task. The two dimensions are an intrinsic goal orientation where the student focuses on mastery and learning and an extrinsic goal orientation where the student approaches the task with a concern about grades, pleasing others, or besting others (cf. Dweck & Leggett, 1988; Harter, 1981). In contrast, task value beliefs reflect a more quantitative approach to motivation where higher levels of task value should result in more motivated behavior. Following Eccles (1983), we have proposed that there are three general aspects of task value beliefs – interest, utility, and importance. Interest refers to students’ personal interest and liking of the course material. Utility is students’ perceptions of how useful the course material is to them. Importance concerns students’ beliefs about how significant the course content is for them and their future goals (Pintrich, 1989). Conceptually, we propose that goal orientation and task value are separate motivational components. However, although we have found that intrinsic goal orientation, extrinsic goal orientation, and task value beliefs are separate factors with college age adolescents (Pintrich & Garcia, 1991; Pintrich et al., 1993), in our work with middle school adolescents (Pintrich & De Groot, 1990), we have found that these beliefs are less differentiated and form a general value factor that reflects intrinsic goal orientation and high levels of interest, utility, and importance beliefs. 
In this article, we focus on this general factor that we have labeled intrinsic value in line with our earlier work (Pintrich & De Groot, 1990). The second component in our research on motivational beliefs is the expectancy component, self-efficacy. Self-efficacy beliefs are defined as students’ judgments of their capability to accomplish a task in a specific situation and have been linked to a number of positive performance and achievement outcomes (Bandura, 1986; Schunk, 1985). In our work, we have operationalized self-efficacy beliefs at a slightly more global level to include adolescents’ judgments of their capabilities to learn and succeed in a specific course (Pintrich & De Groot, 1990). The third motivational component in our model is the affective construct of test anxiety. There seem to be two aspects of test anxiety – a worry component and an emotionality component (Liebert & Morris, 1967). The worry component is more cognitive in nature and refers to negative thoughts and self-talk that interfere with effective performance. The emotionality component includes the experience of negative emotions and physiological arousal. A large number of studies show that high levels of test anxiety disrupt performance and have detrimental effects on student achievement (Hill & Wigfield, 1984).
Following the Keating (1990) call for more integrative research on motivation and self-regulation, the first research question in this study concerns the relations between these three motivational components (intrinsic value, self-efficacy, and test anxiety) and adolescents’ use of cognitive and self-regulatory strategies. In a previous study (Pintrich & De Groot, 1990), we found that higher levels of intrinsic value and self-efficacy were positively related to the use of cognitive and self-regulatory strategies for middle school adolescents. These findings parallel other studies (e.g., Ames & Archer, 1988; Graham & Golan, 1991; Nolen, 1988) that have shown that adopting an intrinsic, mastery, and task-involved orientation to a learning task results in deeper levels of cognitive processing and that higher levels of self-efficacy lead to more strategy use and self-regulated learning (Pintrich & Schrauben, 1992; Schunk, 1989). In addition, we found in the previous study that test anxiety was not related to cognitive strategy use but was negatively related to adolescents’ self-regulation (Pintrich & De Groot, 1990), as has been found in other studies (e.g., Benjamin, McKeachie, & Lin, 1987). In our earlier study, however, we only had measures of students’ motivation and cognition at one point in time, which limited our ability to examine the development of the relations over time. In the present study, we have measures of both motivation and self-regulated learning at two points in time. One purpose of the present study is to provide a partial replication of the results from our earlier study. In addition, we go beyond those results by describing the relations between motivation and self-regulation over the course of a school year in middle school classrooms. Many of the studies that have directly investigated the relations between motivation and cognition have not examined classroom context effects.
Accordingly, our second general research question concerns how adolescents’ classroom experiences are related to their motivation and self-regulated learning. As Eccles et al. (1993) have suggested, there are a number of dimensions of classrooms that can have a positive or negative influence on the course of adolescent development, especially adolescents’ motivation. In this article, we focus on three general aspects of middle school adolescents’ classroom experience – the nature of academic work, the teacher’s instructional style, and cooperative goal structure. In research on the influence of classroom characteristics on adolescent development, there is an important conceptual and methodological issue concerning the nature and measurement of the classroom characteristics. Ames (1992) and others (e.g., Ryan & Grolnick, 1986; Weinstein, 1989; Winne & Marx, 1982) have argued strongly for the inclusion of students’ perceptions of classroom experience as an important mediator of actual classroom experience. Research on classroom climate in general (e.g., Moos, 1979) and specific research on how individuals construct meaning regarding the “functional significance” (Ryan & Grolnick, 1986) of context or how they create a “psychological climate” (Maehr, 1984) of a particular environment
suggests the importance of student perceptions of the classroom. We follow in this tradition and use student perceptions of the classroom as our measure of the classroom characteristics. At the same time, following the analyses used by Ryan and Grolnick (1986), we examine the relative effects of the classroom as a whole by including an overall classroom mean as one measure of the environment, and the disparity in an individual’s perceptions of the classroom by including a deviation measure that subtracts the classroom mean from the individual’s scale score. By including both these measures, we can examine questions regarding overall classroom influences as well as the individual-within-a-context questions, thereby shedding some light on the relative importance of classroom and individual differences effects. In addition, many of these studies of students’ perceptions of the classroom environment (e.g., Ames & Archer, 1988; Ryan & Grolnick, 1986) do not examine or factor out students’ entering individual characteristics before measuring their classroom perceptions. For example, both Ames and Archer (1988) and Ryan and Grolnick (1986) used the more common design of giving a classroom perception measure followed at some later point in time by measures of outcomes (e.g., individual differences in perceived competence, goal orientation, self-efficacy, intrinsic motivation, strategy use). This type of design does not control for preexisting individual differences. In this study, we have a pretest measure of students’ motivation and cognition followed by a measure of classroom perceptions and a posttest on student motivation and cognition. This design allows us to examine the relative influence of both entering individual differences and classroom perceptions on students’ motivation and cognition.
In terms of the classroom dimensions, academic work has been shown to influence learners by focusing their attention on particular aspects of the content, specifying ways to process information, and promoting interest (Doyle, 1983). Blumenfeld, Mergendoller, and Swarthout (1987) also have argued that features of academic classroom work can influence students’ motivation and cognition. In addition, motivation researchers have suggested that providing students with some choice and control over their learning will result in higher levels of motivation and interest (Ames, 1992; Deci & Ryan, 1985). Eccles et al. (1993) have found that there is usually a decline in the amount of autonomy, choice, and control over academic work as adolescents enter middle school with concomitant declines in their motivation. Accordingly, our measure of academic work included students’ perceptions of how much interest is generated by the work, the amount of choice they have, and whether the work helps them learn the course content. It was expected that perceptions of the classroom work as productive would be related to higher levels of student motivation, especially more positive intrinsic value beliefs, as well as more frequent use of cognitive and self-regulatory strategies.
The second dimension of classroom experience includes students’ perceptions of the teacher’s instructional behavior. Students’ perceptions of their teachers and their instructional behavior have been shown to be related to motivational beliefs (Eccles, 1983) as well as actual academic performance (Brophy & Good, 1986). The research from the process-product paradigm suggests that both management behavior (e.g., maintaining control and order) as well as instructional behavior (e.g., providing clear explanations) are important characteristics of good teachers (Brophy & Good, 1986). More recently, Blumenfeld, Puro, and Mergendoller (1992) have suggested that teachers need to combine instructional strategies to facilitate student motivation (i.e., enhancing student interest and value for the course material; cf. Brophy, 1983) with instructional strategies that enhance cognitive engagement (i.e., clear explanations, pressing for understanding through questioning and feedback). We included students’ perceptions of these instructional behaviors and expected that students who perceived more frequent use of these behaviors would report more positive motivational beliefs and higher levels of cognitive and self-regulatory strategy use. Finally, cooperative goal structures in the classroom have been shown to enhance student motivation including attributions, interest and value beliefs, and self-efficacy and perceptions of competence (Slavin, 1980). In addition, students’ perceptions of classroom mastery and cooperative goal structures are related to their use of cognitive strategies (Ames & Archer, 1988) as well as their overall academic achievement (Slavin, 1980). Although there are many studies of cooperative goal structures in classrooms, very few classroom studies have examined the relative contributions of the opportunity to work together with other aspects of the classroom. 
It may be that the nature of the academic work or the teacher’s instructional behavior may have an influence on students’ motivation and cognition over and above any effect from cooperative learning opportunities. We expected that students’ perceptions of the opportunity to work with other students would be positively correlated with students’ motivational beliefs and their use of cognitive and self-regulatory strategies. However, we also were interested in examining the relative strength of the three classroom variables as predictors of students’ motivation and self-regulated learning. The third general research question concerns the relations between the individual student characteristics of motivation and self-regulated learning and the three classroom characteristics. Many motivational models tend to emphasize the strength of individual differences as predictors of students’ achievement (Corno & Snow, 1986), whereas classroom research on tasks (Blumenfeld et al., 1991; Doyle, 1983) and more recent work on situated cognition (Brown, Collins, & Duguid, 1989) suggest that the situational features of the classroom have a powerful influence on students’ motivation and cognition. Given the design of the study, which includes measures of students’ motivation and self-regulated learning at the beginning of the school year and
at the end, we can examine the relative strength of individual differences versus classroom characteristics as predictors of end-of-the-year student motivation and self-regulated learning. In summary, three main research questions were investigated: (a) What is the relation between adolescents’ motivational beliefs (intrinsic value, self-efficacy, and test anxiety) and self-regulated learning (use of cognitive strategies, use of metacognitive control strategies) over the course of a school year? (b) What is the relation between characteristics of the classroom environment and adolescents’ motivation and self-regulated learning? and (c) What is the relative strength of adolescent personal characteristics and perceptions of classroom experience as predictors of motivation and self-regulated learning at the end of the school year?
Method

Subjects

The sample included 100 seventh-grade students from 14 classrooms in two middle schools. The subjects were predominantly White, middle-class adolescents from a small city in southeastern Michigan. There were 55 girls and 45 boys in the sample. The mean age was 12 years, 3 months.
Measures

The students responded twice to a self-report questionnaire, the Motivated Strategies for Learning Questionnaire (MSLQ) (see Pintrich & De Groot, 1990; Pintrich et al., 1993), which included 56 items on student motivation, cognitive strategy use, and self-regulation that formed the scales for the motivational and self-regulatory variables. Administration of the questionnaire took place during the fall semester (October, Time 1) and again in the spring of the following year (May, Time 2). During the spring administration, students answered a version of the MSLQ that included, in addition to the same 56 items on the first questionnaire, 12 items that asked students about the class work, their teacher, and the opportunities to work with other students in that class. These 12 items were adapted from the classroom climate scales of Moos (1979) and formed the scales for the classroom experience variables. Scale construction for the motivational and self-regulation variables was guided by previous work done with the MSLQ with this age group (Pintrich & De Groot, 1990). Three motivation scales were formed. The Intrinsic Value scale (Time 1 alpha = .87, Time 2 alpha = .90) was constructed by taking the average of the students’ responses to the nine items concerning intrinsic interest (“I think what we are learning in this Science class is interesting.”)
and perceived importance of course work (“It is important for me to learn what is being taught in this Social Studies class.”), as well as preference for challenge and mastery goals (“I prefer class work that is challenging so I can learn new things.”). It should be noted that the items were keyed in terms of the class in which the students took the questionnaire; that is, if they were in a science class, the items referred to science. The Self-Efficacy scale (alphas = .91, .92) consisted of nine items regarding perceived competence and confidence in performance of class work (e.g., “I expect to do very well in this class.” “I am sure that I can do an excellent job on the problems and tasks assigned for this class.”). Finally, four items constituted the Test Anxiety scale (alphas = .75, .84), including items concerning worry and cognitive interference on tests (“I am so nervous during a test I cannot remember facts I have learned.” “When I take a test I think about how poorly I am doing.”). In addition to the three motivation scales, two cognitive scales, Cognitive Strategy Use and Self-Regulation, were constructed. The Cognitive Strategy Use scale (alphas = .83, .88) consisted of 13 items averaged to form the scale score. Items pertained to the use of rehearsal strategies (e.g., “When I read material for science class, I say the words over and over to myself to help me remember.”), elaboration strategies such as summarizing and paraphrasing (e.g., “When I study for this English class, I put important ideas into my own words.”), and organizational strategies (e.g., “I outline the chapters in my book to help me study.”). 
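The scale scores and alphas reported for these scales can be illustrated with a minimal Python sketch. It assumes that each scale score is the mean of a student's item responses and that "alpha" is Cronbach's coefficient alpha computed in the conventional way. The function names and the response data are hypothetical and are not taken from the MSLQ data.

```python
from statistics import mean, pvariance


def scale_scores(responses):
    """Average each student's item responses to form a scale score."""
    return [mean(items) for items in responses]


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-student item-response lists."""
    k = len(responses[0])  # number of items on the scale
    # Variance of each item across students.
    item_vars = [pvariance([row[i] for row in responses]) for i in range(k)]
    # Variance of the students' summed scores.
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: four students answering a three-item scale.
responses = [
    [6, 7, 6],
    [4, 4, 5],
    [2, 3, 2],
    [5, 5, 6],
]

print(scale_scores(responses))
print(round(cronbach_alpha(responses), 3))
```

Alpha rises toward 1 as the items covary more strongly, which is why it is read as an index of a scale's internal consistency.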
The Self-Regulation scale (alphas = .63, .71) was constructed from nine items that asked about metacognitive strategies such as planning, skimming, and comprehension monitoring (e.g., “I ask myself questions to make sure I know the material I have been studying,” “I find that when the teacher is talking I think of other things and don’t really listen to what is being said,” and “I often find that I have been reading for class but don’t know what it is all about,” with the latter two items reflected before scale construction). In addition, students’ strategies for managing their effort such as persistence at difficult or boring tasks and working diligently were included in the self-regulation scale (e.g., “Even when study materials are dull and uninteresting, I keep working until I finish,” and “When work is hard, I either give up or study only the easy parts,” with the latter reflected before scale construction). Factor analysis was used as a guide to create three classroom perception scales. A varimax rotated solution generated three interpretable factors that accounted for 67% of the variance. The first scale, the Productive Classroom Work scale (alpha = .83) consisted of five items concerned with how the student perceived the class assignments in terms of utility and interest, choice of work, and subject matter in general (e.g., “Students have some choice over the topics for class reports.”). The Teacher Effectiveness scale (alpha = .85) was constructed by taking the mean of five items regarding the teacher’s treatment of the subject matter in a clear and interesting manner, good classroom management, and fair grading procedures (e.g., “The teacher explains
the material well.” “The teacher has good control of this class.”). Finally, the Cooperative Work scale comprised two items (alpha = .79). Items asked whether or not the teacher encouraged students to work together on assignments or provided opportunities to do so (e.g., “The teacher encourages us to work on assignments together,” and “I had the opportunity to work with other students in this class.”). Three different versions of these three scales were created, following Ryan and Grolnick (1986). First, an individual difference perception score was created that represented the individual’s mean score on the scale regardless of classroom. Second, these scales were then aggregated to the classroom level to create social consensus scores for perceptions of work, teacher, and opportunities to work cooperatively with every student in the same classroom assigned the classroom mean. Finally, an individual student deviation score for these three class perceptions was computed by subtracting the classroom mean from his or her individual score on the same scale.
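The three versions of each classroom perception scale can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function name, the record layout, and the scores are hypothetical. Each student keeps an individual score, receives the classroom mean as the social consensus score, and receives a deviation score equal to the individual score minus the classroom mean.

```python
from collections import defaultdict
from statistics import mean


def classroom_scores(records):
    """records: list of (student_id, classroom_id, scale_score) tuples.

    Returns {student_id: (classroom_mean, deviation)}, where deviation is
    the student's own score minus the mean of his or her classroom.
    """
    by_class = defaultdict(list)
    for _, classroom, score in records:
        by_class[classroom].append(score)
    class_means = {c: mean(scores) for c, scores in by_class.items()}
    return {
        student: (class_means[classroom], score - class_means[classroom])
        for student, classroom, score in records
    }


# Hypothetical perception scores for five students in two classrooms.
records = [
    ("s1", "A", 6.0),
    ("s2", "A", 4.0),
    ("s3", "B", 5.0),
    ("s4", "B", 3.0),
    ("s5", "B", 4.0),
]

for student, (consensus, deviation) in classroom_scores(records).items():
    print(student, consensus, deviation)
```

Deviation scores sum to zero within each classroom, so they carry only within-classroom differences, while the consensus score carries only between-classroom differences.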
Results

Gender differences were examined in preliminary analyses on all of the motivational and cognitive variables and revealed only one significant difference for boys and girls on the self-efficacy measure. Accordingly, in all analyses except those including self-efficacy, gender was not included in the analyses. The first question of the study concerned the relations between the motivational and self-regulated learning components, and the results generally replicated our previous findings. Table 1 displays the zero-order correlations and summary statistics for the motivational and self-regulated learning variables at Time 1 and Time 2. The autocorrelations of the motivational and cognitive scales at Times 1 and 2 were moderately large, ranging from .47 to .61. As predicted and paralleling our earlier results (Pintrich & De Groot, 1990), at both Time 1 and Time 2, higher levels of intrinsic value (r = .66, r = .76) and self-efficacy (r = .41, r = .61) were correlated with high levels of cognitive strategy use. Test anxiety was not significantly related to cognitive strategy use. Similar to the cognitive strategy use results, higher levels of intrinsic value (r = .69, r = .73) and self-efficacy (r = .50, r = .67) were related to higher levels of self-regulation. Test anxiety was negatively related to self-regulation at both Time 1 and Time 2 (r = −.25, r = −.29). In terms of the relations between the motivational beliefs and self-regulated learning variables over time, five separate regressions were run. Each of the Time 2 variables was predicted by the five Time 1 variables, and the results are presented in Table 2. In general, the dependent measure was predicted most strongly by the parallel measure at Time 1. For self-efficacy, test anxiety, and cognitive strategy use at Time 2, the strongest and only significant predictors were self-efficacy at Time 1 (beta = .47), test anxiety at Time 1
Table 1: Summary statistics and zero-order correlations for motivation and self-regulated learning variables at Time 1 and Time 2

Variable             Time  1        2        3        4        5        6        7        8        9        10
1. Intrinsic value   1     –
2. Intrinsic value   2     .47***   –
3. Self-efficacy     1     .53***   .41***   –
4. Self-efficacy     2     .35***   .72***   .57***   –
5. Test anxiety      1     –.14     –.17*    –.40***  –.30***  –
6. Test anxiety      2     –.01     –.25**   –.32***  –.41***  .60***   –
7. Strategy use      1     .66***   .51***   .41***   .43***   –.08     .01      –
8. Strategy use      2     .48***   .76***   .26**    .61***   –.05     –.09     .61***   –
9. Self-regulation   1     .69***   .38***   .50***   .47***   –.25**   –.11     .73***   .49***   –
10. Self-regulation  2     .38***   .73***   .29***   .67***   –.14     –.29**   .50***   .73***   .49***   –

Mean (standard deviation) pairs, in source order: 5.41 (1.03), 5.56 (1.10), 3.42 (1.49), 5.41 (1.10), 5.45 (1.01), 3.23 (1.63), 4.99 (0.92), 5.08 (0.96), 4.84 (1.00), 4.88 (1.07).

Note: N = 100. *p < .05; **p < .01; ***p < .001.
Table 2: Standardized regression effects of Time 1 motivation and self-regulated learning variables on Time 2 motivation and self-regulated learning variables

Predictorsᵃ         Intrinsic valueᵇ  Self-efficacyᵇ  Test anxietyᵇ  Strategy useᵇ  Self-regulationᵇ
Intrinsic value     .20               –.17            .16            .11            –.02
Self-efficacyᶜ      .19               .47***          –.20           .07            .03
Test anxiety        –.07              .08             .54***         .14            –.05
Strategy use        .42**             .21             .05            .48***         .31*
Self-regulation     –.17              .19             –.03           .09            .26
Total adjusted R²   .30***            .36***          .36***         .37***         .25***

Note: N = 100. ᵃ Time 1 variables. ᵇ Time 2 variables. ᶜ For self-efficacy, there was a significant gender difference, so gender was added as the first term in the regression equation. The standardized regression coefficient for gender on self-efficacy at Time 2 was β = –.01 (p = .94). *p < .05; **p < .01; ***p < .001.
(beta = .54) and cognitive strategy use at Time 1 (beta = .48), respectively. These variables accounted for about 36% of the variance in self-efficacy, test anxiety, and cognitive strategy use at the end of the year. A different pattern of results was found for intrinsic value and self-regulation at Time 2. The strongest predictor of intrinsic value at the end of the year was cognitive strategy use at Time 1 (beta = .42). Time 1 level of intrinsic value had a positive, but nonsignificant effect on intrinsic value at the end of the year (beta = .20, p = .13). For self-regulation at the end of the year, students’ reported use of cognitive strategies was the best predictor (beta = .31), with prior levels of self-regulation having a positive effect that was not significant (beta = .26, p = .08). The cognitive and motivation variables at Time 1 accounted for 30% of the variance in students’ intrinsic value and 25% of their self-regulation at the end of the year.
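For intuition about how standardized regression weights relate to zero-order correlations, the two-predictor case has a simple closed form. The sketch below is a simplification and not the five-predictor models reported in Table 2: it predicts Time 2 self-regulation from only two Time 1 variables, using three correlations from Table 1 (.50 for Time 1 strategy use with Time 2 self-regulation, .49 for Time 1 with Time 2 self-regulation, and .73 between the two Time 1 predictors). The function and variable names are hypothetical.

```python
def two_predictor_betas(r_y1, r_y2, r_12):
    """Standardized weights for regressing y on predictors 1 and 2.

    r_y1, r_y2: correlations of the outcome with each predictor.
    r_12: correlation between the two predictors.
    """
    denom = 1 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2


# Predictor 1: Time 1 strategy use; predictor 2: Time 1 self-regulation.
beta_strategy, beta_selfreg = two_predictor_betas(r_y1=0.50, r_y2=0.49, r_12=0.73)
print(round(beta_strategy, 2), round(beta_selfreg, 2))
```

Because the two Time 1 predictors correlate at .73, each weight is well below the corresponding zero-order correlation. The same overlap among predictors is one reason a Time 1 measure can correlate substantially with its Time 2 counterpart and still receive a nonsignificant weight.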
The second research question addressed the relations of these motivational and cognitive variables to children’s perceptions of their classrooms. The productive classroom work and teacher effectiveness perception scales shared modest correlations with the cooperative work scale (r = .38, r = .37, respectively), whereas perceptions of productive work and teacher effectiveness had a fairly high intercorrelation (r = .69). From these original scales, we constructed two additional measures of the classroom environment. First, we aggregated student perceptions to the class level, yielding a social consensus measure for each of the three scales. Next, an individual deviation score was calculated for each of the three class perceptions by subtracting the classroom mean from individual student perceptions. We ran t tests on the classroom mean and deviation perceptions, and one significant gender difference was found. Boys viewed their teachers’ instructional and management styles more favorably than did girls. This difference was manifest in the deviation teacher perception scores, with boys being above the classroom mean and girls below the classroom mean on the average across all classrooms. Because of this difference, gender was used as a control in the regression analyses in the final section. Table 3 presents the zero-order correlations and summary statistics for the classroom perception variables and the motivational and self-regulated learning variables at Time 1 and Time 2. Table 3 includes the original scales, the social consensus measures, and the deviation scores for each of the classroom perceptions. In general, the results for the deviation scores mirrored the results for the original scales, so only the results for the deviation and classroom-level perceptions will be discussed here.
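The zero-order correlations reported in this section are Pearson correlations between pairs of scale scores. As a minimal sketch, assuming the conventional formulas, the Python below computes r and the t statistic used to test whether a correlation differs from zero. The function names and the small data set are hypothetical; the final line applies the t formula to the reported r = .69 between productive work and teacher effectiveness, taking N = 89 from the note to Table 3.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic (df = n - 2) for testing whether r differs from zero."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


# Hypothetical scale scores for six students on two perception scales.
productive_work = [4.2, 5.0, 3.1, 6.3, 5.5, 4.8]
teacher = [4.9, 5.6, 3.8, 6.1, 5.9, 5.0]

r = pearson_r(productive_work, teacher)
print(round(r, 2), round(t_for_r(r, len(teacher)), 2))
print(round(t_for_r(0.69, 89), 2))
```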
Recall that correlations involving the deviation scores represent the relationship of the motivation and cognition variables with a student’s perception of the classroom as more or less than the class average on a given construct. Table 3 shows significant relationships between intrinsic value and the deviation perceptions of productive class work (r = .44, r = .70), teacher effectiveness (r = .26, r = .50), and cooperative work (r = .24, r = .32), with the larger magnitude relations occurring at Time 2. The same pattern emerged for self-efficacy and cognitive strategy use. To the extent that students perceived their teacher as more effective, their work as interesting and productive, and opportunities to work together more than did their classmates, they also reported higher levels of self-efficacy and strategy use, with stronger effects at Time 2. Students who showed lower levels of test anxiety at both Time 1 and Time 2 (r = −.21, r = −.20) perceived their teachers as more effective than did their classmates. Finally, higher levels of self-regulation were related to favorable perceptions of productive work at Time 1 and Time 2, and of the teacher’s effectiveness at the end of the year. Table 3 also displays the correlations among the motivation and cognitive strategy variables and the social consensus measures of the classroom environment. No significant correlations between the social consensus measures
Table 3: Summary statistics and zero-order correlations for classroom perception variablesᵃ with motivation and self-regulated learning at Time 1 and Time 2

                          Individual difference               Classroom-level aggregate           Student deviation scores for
                          class perceptions of                perceptions of                      classroom perceptions ofᵇ
Student motivation  Time  Prod. work  Teacher     Coop. work  Prod. work  Teacher     Coop. work  Prod. work  Teacher     Coop. work
Intrinsic value     1     .38***      .19*        .15         .01         –.02        –.15        .44***      .26**       .24*
Self-efficacy       1     .26**       .30**       .17         .13         .07         .11         .23**       .33***      .13
Test anxiety        1     –.11        –.12        .01         –.01        .07         –.06        –.12        –.21*       .04
Strategy use        1     .40***      .18*        .21*        .06         .02         –.14        .43***      .22*        .30**
Self-regulation     1     .33***      .10         .05         .13         .06         –.05        .30**       .09         .08
Intrinsic value     2     .82***      .60***      .33***      .43***      .34***      .11         .70***      .50***      .32***
Self-efficacy       2     .56***      .40***      .32***      .39***      .22*        .24**       .42***      .34***      .24*
Test anxiety        2     –.21*       –.18*       –.06        –.11        –.03        –.28**      –.18        –.20*       .06
Strategy use        2     .69***      .47***      .32***      .40***      .32***      .19*        .56***      .35***      .27**
Self-regulation     2     .60***      .41***      .26**       .37***      .29**       .25**       .48***      .29**       .16
Mean                      4.79        5.52        5.55        4.79        5.52        5.55        0.00        0.00        0.00
Standard deviation        1.33        1.19        1.25        0.70        0.76        0.54        1.13        0.91        1.13

Note: N = 89. Prod. work = productive work; Coop. work = cooperative work. ᵃ Classroom perceptions variables are presented in three forms. In the first column are individual difference classroom perceptions, the second column presents within-classroom averages of student perceptions, and the third column presents individual difference scores minus the classroom average. ᵇ Because these are deviation scores from classroom means, a positive correlation indicates the extent to which a student’s individual perception score being higher than the class average is related to increased levels of a motivation or strategy variable. *p < .05; **p < .01; ***p < .001.
and students’ Time 1 motivation or cognition emerged. At Time 2, however, higher classroom level perceptions of a teacher’s effectiveness, as well as productive work were related to increased levels of intrinsic value, self-efficacy, strategy use, and self-regulation. In addition, higher classroom perceptions of cooperative work were related to increased levels of students’ self-efficacy, strategy use, and self-regulation and decreased levels of students’ test anxiety at the end of the year. The third research question was concerned with what predicts a student’s motivation and self-regulated learning at the end of the school year. In order to assess the relative impact of the student’s entry-level characteristics (Time 1 measures of intrinsic value, self-efficacy, strategy use, etc.) and their classroom experiences as measured by their perceptions (productive work, teacher effectiveness, cooperative work) on these motivational and cognitive components, regression analyses were used. Five separate regression analyses were run, each examining how an initial level of a motivational belief or self-regulatory learning strategy and the student’s classroom experience predicted the end-of-the-year measure of the same motivational or cognitive variable. Both the social consensus and deviation class perception variables were included in these analyses. The results are summarized in Table 4. These regressions also were done with multiplicative interaction terms included, and no significant interactions emerged between the entry characteristics and classroom perceptions on Time 2 variables. Table 4: Standardized regression effects of Time 1 motivation, self-regulated learning, and class perception variables on Time 2 motivation and self-regulated learning variables Intrinsic value a
Predictors: Intrinsic value b; Self-efficacy b; Test anxiety b; Strategy use b; Self-regulation b; Gender; Class perception of productive work; Class perception of teacher; Class perception of cooperative work; Deviation perception of productive work; Deviation perception of teacher; Deviation perception of cooperative work; Adjusted R²
Self-efficacy a
Test anxiety a
.17* __ — — — –.01
.46*** — — — –.08
.52*** — — .17
.52*** –.08
.53*** –.27*
–.07 .54***
Strategy use a Self-regulation a
.45*** — –.03
.37*** .04
.02 –.01
.30* .03
.22 .04
.05
–.27**
.13
.16
.31***
–.13
.36***
.33**
.11
–.05
–.02
.03
.07
.04
.07
.13
–.01
.00
.69***
.54***
.39***
.61***
.44***
Note: N = 89. a Time 2 variables. b Time 1 variables. *p < .05; **p < .01; ***p < .001.
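The regressions summarized in Table 4 use two kinds of classroom perception variables: a class-level "social consensus" score and an individual "deviation" score. The following minimal sketch shows one plausible way to derive both from raw ratings, assuming that the consensus score is the class mean and the deviation score is the student's rating minus that mean. The function name, the tuple layout, and the sample ratings are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from statistics import mean


def split_perceptions(ratings):
    """Split each rating into a class-level and a deviation component.

    ratings: list of (student_id, class_id, score) tuples (hypothetical layout).
    Returns {student_id: (class_mean, deviation)}, where deviation is the
    student's score minus the mean score of his or her class (an assumption
    about how the deviation variable is defined).
    """
    by_class = defaultdict(list)
    for _, class_id, score in ratings:
        by_class[class_id].append(score)
    class_means = {c: mean(scores) for c, scores in by_class.items()}
    return {
        student_id: (class_means[class_id], score - class_means[class_id])
        for student_id, class_id, score in ratings
    }


# Made-up ratings of "productive work" for two classes of two students each.
ratings = [
    ("s1", "A", 4.0),
    ("s2", "A", 2.0),
    ("s3", "B", 5.0),
    ("s4", "B", 3.0),
]
scores = split_perceptions(ratings)
for student_id, (class_mean, deviation) in scores.items():
    print(student_id, class_mean, deviation)
```

Under this definition the deviation scores sum to zero within each class, so the class-level and deviation predictors carry separate between-classroom and within-classroom information.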
In general, Table 4 shows, as one would expect, that the Time 1 estimates of the student’s motivational and self-regulatory characteristics had large, positive effects on the Time 2 measures of the same construct. The one exception to this is for intrinsic value, where the initial level had only a small effect on value at the end of the year (beta = .17). In addition, these results suggest that both class-level environmental variables and individual difference perceptions of the classroom affect students’ year-end motivation and self-regulatory learning. Specifically, classroom-level assessments of productive work were related positively to Time 2 measures of intrinsic value (beta = .52), self-efficacy (beta = .53), and cognitive strategy use (beta = .30). In addition, the extent to which a student perceived the work as more productive than did his or her classmates also had positive effects on value, self-efficacy, strategy use, and self-regulation above and beyond those effects due to the general consensus of what tasks were like in a class. Test anxiety at Time 2 was the only variable for which perceptions of productive work were not significantly predictive. Two other significant effects emerged. First, to the extent a student perceived fewer opportunities to work cooperatively, higher levels of test anxiety were reported. Finally, higher levels of self-efficacy were related to lower classroom-level perceptions of a teacher’s effectiveness. With the exceptions of test anxiety and self-regulation, initial personal levels of these motivational and cognitive constructs, as well as perceptions of productive work, accounted for large proportions of the variance in Time 2 levels of intrinsic value (adjusted R² = .69), self-efficacy (adjusted R² = .54), and strategy use (adjusted R² = .61). Favorable perceptions of productive classroom work and Time 1 measures of self-regulation have equal effects on end-of-the-year self-regulation, accounting for a fair amount of the variance (adjusted R² = .44). Finally, modest levels of the variance in year-end test anxiety are accounted for by opportunities for students to work cooperatively and initial levels of test anxiety (adjusted R² = .39).
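The adjusted R² values reported above correct the raw R² for the number of predictors relative to the sample size. As a hedged illustration, the sketch below applies the standard adjustment formula with N = 89 (from the note to Table 4) and an assumed count of eight predictors per regression (one Time 1 measure, gender, three class-level perceptions, and three deviation perceptions, as read from the predictor list). The unadjusted value of .72 is a made-up input for demonstration, not a figure from the study.

```python
def adjusted_r2(r2, n, p):
    """Standard adjusted R-squared.

    r2: unadjusted proportion of variance explained
    n:  number of observations
    p:  number of predictors (excluding the intercept)
    """
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical unadjusted R-squared; N = 89 as in the table note;
# p = 8 is an assumption based on the predictor list.
example = adjusted_r2(0.72, n=89, p=8)
print(round(example, 3))
```

With a sample of this size and eight predictors, the adjustment lowers R² by only a few hundredths, so the ordering of the five outcomes by variance explained would be the same on either scale.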
Discussion
In terms of our first research question on the relations between motivation and cognition, the results replicated our earlier results (Pintrich & De Groot, 1990) with a different sample of early adolescents. In fact, the direction and magnitude of the correlations were very similar over the two studies. Students who had positive motivational beliefs, which included a general intrinsic orientation focused on learning and mastery, positive perceptions of interest and value regarding course material, and high self-efficacy beliefs, were more likely to report using cognitive and self-regulated learning strategies that will result in deeper processing of the material and better understanding. At the same time, students who reported higher levels of test anxiety were less likely to be self-regulating. This finding is in line with a general
information processing and social cognitive view of anxiety (Bandura, 1986; Benjamin et al., 1987) that highlights the interfering effects of anxiety on cognitive processing. This overall pattern of results for the three motivational components and self-regulated learning components has been found in a number of other experimental and correlational studies (e.g., Graham & Golan, 1991; Nolen, 1988; see review by Pintrich & Schrauben, 1992) and seems to represent a fairly reliable and valid set of findings. The fact that motivational beliefs and self-regulated learning variables were linked to each other naturally raises the question of causality. That is, do positive motivational beliefs drive or power cognitive engagement and self-regulation, or does being self-regulating and cognitively skilled result in more positive motivational beliefs? The regression results predicting Time 2 motivation and self-regulated learning from Time 1 variables without the classroom perceptions included showed, for the most part, that the best predictor of later motivation or cognition was earlier motivation or cognition. The main exception to this general finding was that use of cognitive strategies earlier in the year was the best predictor of later intrinsic orientation, suggesting that students who are more cognitively engaged report more mastery goals and higher levels of interest and value later in the year. This finding of deeper processing leading to qualitatively better motivation is the reverse of the usual suggested path of mastery goals leading to better cognitive engagement (cf. Ames, 1992; Graham & Golan, 1991; Nolen, 1988). However, considered with the zero-order correlations, this finding suggests that the relations between motivation and cognition are reciprocal, especially in the classroom setting, as suggested by social cognitive theory (Bandura, 1986).
Accordingly, although goal theory usually assumes a unidirectional influence from motivational goals to cognitive engagement, the results here begin to specify the nature of the reciprocal relations between motivation and cognitive self-regulation as suggested by Keating (1990). It may be more useful for future research to focus on describing in more detail the reciprocal nature of the relations between motivation and self-regulated learning rather than trying to define the one correct causal sequence in a deterministic fashion. More important, it appears that the functional significance of the classroom context influenced the motivational and self-regulated learning variables over time. In terms of the second general question, students’ motivational beliefs were positively related to positive features of the classroom as would be predicted from Eccles et al. (1993). Students reported that they were more likely to be focused on learning and mastery and have higher levels of interest and value for the course material when the classes they were in provided them with some choice of tasks, the teacher made the work interesting, provided good explanations, and allowed them to work with others. In addition, these same features of the classroom were related to higher levels of self-efficacy and lower levels of test anxiety. Students also
reported that they were more likely to use cognitive strategies for learning and to regulate their own thinking and effort in classrooms that had these positive features. These findings are interesting, but given that the classroom features were measured by students’ perceptions, it was important to control for initial levels of students’ motivation and self-regulated learning as well as examine both general classroom effects and individual perceptions of the classroom. By examining students’ entry characteristics and later classroom perceptions together (as in Table 4), we attempted to determine the relative contribution of entry characteristics and classroom features to students’ end-of-the-year motivation and self-regulated learning. In general, the results showed that both entry characteristics and classroom features contributed very significantly to student outcomes with high levels of variance accounted for by the predictors, but the relative strength varied depending on the outcome. First, in terms of the relative strength of the different classroom variables, both between-classroom and within-classroom variables were significantly related to motivation and cognition. In terms of the motivational beliefs, intrinsic value was strongly influenced by both between- and within-classroom features of classroom academic work. In fact, perceptions of productive classroom work were more than three times more important for end-of-the-year intrinsic value than was students’ entry level of intrinsic value. Students who were in classrooms that allowed task choice and had interesting tasks showed higher levels of intrinsic motivation in general; students who perceived more within-classroom choice and interest were more intrinsically motivated later in the year, regardless of their initial levels of intrinsic value.
This suggests that intrinsic value may be more context dependent and that teachers can influence students’ general orientation to the academic work and facilitate students’ interest, value, and focus on mastery and learning. In contrast, entry level of test anxiety was a much stronger predictor of later test anxiety than any of the classroom perception variables, suggesting that test anxiety is a more traitlike characteristic of students that is brought with them to the classroom situation, at least in terms of these three dimensions of the classroom. In fact, test anxiety had the lowest overall variance accounted for by the predictors. However, in classrooms where there was more opportunity to work collaboratively, test anxiety was lower, regardless of initial levels of anxiety or within-classroom perceptions, suggesting one general classroom strategy that all teachers can use to help students become less anxious (cf. Hill & Wigfield, 1984). Self-efficacy, in contrast to the more situational intrinsic value and more traitlike anxiety, was predicted by both initial self-efficacy levels and between-classroom work equally. In addition, students who perceived their classroom work as more productive than did their classmates felt more efficacious. Accordingly, classroom work that provided more choice and was more interesting was related to higher self-efficacy levels, regardless of initial
levels of efficacy. Interestingly and unexpectedly, in classrooms where the teachers were perceived as more effective overall, students had lower self-efficacy. This may be due to the perception that these teachers are so effective at management and instruction that students are less willing to attribute their success at learning to themselves and hence have somewhat lower self-efficacy for learning. At the same time, it is important to note that the perceptions of efficacy were not so low as to have any detrimental effect on cognitive engagement. Students who perceived their teacher as effective were still cognitively engaged (see Table 3). Future research will have to examine this question more carefully by including attributional and control belief scales about responsibility for learning. The same basic pattern that was shown in the self-efficacy results held for cognitive strategy use and self-regulation; entry levels on these cognitive variables accounted for approximately equal proportions of the variance in outcome levels as classroom work perceptions. In terms of cognitive strategy use, there was both an overall between-classroom effect as well as a within-classroom effect. Students were more likely to use cognitive strategies if they were in classrooms that had task choice and interesting tasks overall as well as if they perceived more choice and interest within their class. For self-regulation, only the within-classroom perception was significantly related to use of regulatory strategies. Accordingly, it appears that both student entry characteristics and between- and within-classroom characteristics have an influence on adolescents’ motivational beliefs and their self-regulated learning. Adolescents bring with them to the classroom certain motivational beliefs and levels of strategy use and self-regulation that influence their later motivation and self-regulated learning. 
These individual differences are important and do relate to future achievement (cf. Corno & Snow, 1986). At the same time, classroom features, particularly the nature of the classroom work, can influence these student outcomes as well (Eccles et al., 1993). In earlier studies (e.g., Ryan & Grolnick, 1986), the within-classroom differences seemed to be more important than the social consensus between classroom differences as predictors of student motivation. Our results show both may be important, even after initial individual differences are taken into consideration. Accordingly, there may be two levels of the functional significance of the classroom: an overall between-classroom level that represents a social consensus regarding adolescents’ perceptions and a within-classroom level that reflects adolescents’ differing perceptions of the classroom. Both levels of these classroom features can provide the context in which individual differences in student motivation and self-regulation operate, demonstrating the interplay between not only adolescent motivation and cognition but also the classroom context in middle schools. Finally, in terms of the educational implications of this research, there are several important suggestions for teachers in middle schools. First,
motivational beliefs in adolescents, particularly their interest, value, and intrinsic goals for class work, are not stable traits that imply that middle school students are either motivated or not. It appears that the nature of the classroom work can influence these motivational beliefs. If students are given work that is interesting, allows some choice, and provides opportunities to work cooperatively with one another, then they will be more likely to be motivated and cognitively engaged. Second, it may not be that difficult to implement some of these changes in classroom work. Although Lepper and his colleagues (Lepper & Malone, 1987; Malone & Lepper, 1987) have provided suggestions to improve intrinsic motivation through the use of choice and control options in fantasy and simulation situations, the simple provision of choice for the timing of when tasks are completed offers students some control over their learning but does not usurp the teacher’s responsibility for curriculum management nor does it require the use of nontraditional tasks. Others have shown that teachers can use somewhat traditional tasks and still increase motivation and cognition. For example, Blumenfeld (1992) has shown that instructional activities that are based on children’s experience and real-life events and that ask students to apply their knowledge can foster both motivation and cognitive engagement. This highlights a third important implication of our work: the need for teachers to consider both motivation and cognition simultaneously, and not simply focus on motivating the students without considering the cognitive consequences of motivational enhancement. For example, whereas cooperative groups may be more motivating, they can lead to less cognitive engagement due to group distractions. 
Blumenfeld (1992) has suggested that not only do teachers need to “bring the task to the students” by making the tasks more motivating, interesting, and relevant, but they also need to “bring the student to the task” (p. 110) by making the students accountable for deeper levels of cognitive engagement through evaluation and assessment procedures. Not only should tasks be interesting, but there should be a press for deeper cognitive engagement through the use of higher level questioning during class instruction and requests for written work that requires this type of thinking. By considering both student motivation and cognition and how they are influenced by the nature of classroom instruction and tasks, teachers will be able to create classrooms that are both motivating and thoughtful, a context that can only benefit the development of young adolescents in middle schools.
References
Ames, C. (1992). Classrooms: Goals, structures, and student motivation. Journal of Educational Psychology, 84, 261–271.
Ames, C., & Archer, J. (1988). Achievement goals in the classroom: Student learning strategies and motivation processes. Journal of Educational Psychology, 80, 260–267.
Bandura, A. (1986). Social foundations of thought and action: A social cognitive theory. Englewood Cliffs, NJ: Prentice-Hall.
Benjamin, M., McKeachie, W., & Lin, Y. (1987). Two types of test anxious students: Support for an information processing model. Journal of Educational Psychology, 73, 816–824.
Blumenfeld, P. (1992). The task and the teacher: Enhancing student thoughtfulness in science. In J. Brophy (Ed.), Advances in research on teaching (Vol. 3, pp. 81–114). Greenwich, CT: JAI.
Blumenfeld, P., Mergendoller, J., & Swarthout, D. (1987). Task as a heuristic for understanding student learning and motivation. Journal of Curriculum Studies, 19, 135–148.
Blumenfeld, P., Puro, P., & Mergendoller, J. (1992). Translating motivation into thoughtfulness. In H. Marshall (Ed.), Redefining student learning: Roots of educational change (pp. 207–239). Norwood, NJ: Ablex.
Blumenfeld, P., Soloway, E., Marx, R., Krajcik, J., Guzdial, M., & Palincsar, A. (1991). Motivating project-based learning: Sustaining the doing, supporting the learning. Educational Psychologist, 26, 369–398.
Brophy, J. (1983). Conceptualizing student motivation. Educational Psychologist, 18, 200–215.
Brophy, J., & Good, T. (1986). Teacher behavior and student achievement. In M. Wittrock (Ed.), Handbook of research on teaching (pp. 328–375). New York: Macmillan.
Brown, A. L., Bransford, J. K., Campione, J. C., & Ferrara, R. A. (1983). Learning, remembering and understanding. In J. Flavell & E. Markman (Eds.), Handbook of child psychology: Vol. 3. Cognitive development (pp. 77–166). New York: Wiley.
Brown, J., Collins, A., & Duguid, P. (1989). Situated cognition and the culture of learning. Educational Researcher, 18, 32–42.
Corno, L. (1986). The metacognitive control components of self-regulated learning. Contemporary Educational Psychology, 11, 333–346.
Corno, L. (1993). The best-laid plans: Modern conceptions of volition and educational research. Educational Researcher, 22, 14–22.
Corno, L., & Snow, R. (1986). Adapting teaching to individual differences among learners. In M. Wittrock (Ed.), Handbook of research on teaching (pp. 605–629). New York: Macmillan.
Deci, E., & Ryan, R. (1985). Intrinsic motivation and self-determination in human behavior. New York: Plenum.
Doyle, W. (1983). Academic work. Review of Educational Research, 53, 159–200.
Dweck, C., & Leggett, E. L. (1988). A social cognitive approach to motivation and personality. Psychological Review, 95, 256–273.
Eccles, J. (1983). Expectancies, values and academic behaviors. In J. T. Spence (Ed.), Achievement and achievement motives (pp. 75–146). San Francisco: Freeman.
Eccles, J., Midgley, C., Wigfield, A., Buchanan, C. M., Reuman, D., Flanagan, C., & Mac Iver, D. (1993). Development during adolescence: The impact of stage-environment fit on young adolescents’ experiences in schools and in families. American Psychologist, 48, 90–101.
Entwisle, D. (1990). Schools and the adolescent. In S. Feldman & G. Elliott (Eds.), At the threshold: The developing adolescent (pp. 197–224). Cambridge, MA: Harvard University Press.
Graham, S., & Golan, S. (1991). Motivational influences on cognition: Task involvement, ego involvement, and depth of information processing. Journal of Educational Psychology, 83, 187–194.
Harter, S. (1981). A new self-report scale of intrinsic versus extrinsic orientation in the classroom: Motivational and informational components. Developmental Psychology, 17, 302–312.
Hill, K., & Wigfield, A. (1984). Test anxiety: A major educational problem and what can be done about it. Elementary School Journal, 85, 105–126.
Keating, D. (1990). Adolescent thinking. In S. Feldman & G. Elliott (Eds.), At the threshold: The developing adolescent (pp. 54–89). Cambridge, MA: Harvard University Press.
Lepper, M., & Malone, T. (1987). Intrinsic motivation and instructional effectiveness in computer-based education. In R. Snow & M. Farr (Eds.), Aptitude, learning, and instruction: Vol. 3. Conative and affective process analyses (pp. 255–286). Hillsdale, NJ: Lawrence Erlbaum.
Liebert, R. M., & Morris, L. W. (1967). Cognitive and emotional components of test anxiety: A distinction and some initial data. Psychological Reports, 20, 975–978.
Maehr, M. (1984). Meaning and motivation: Toward a theory of personal investment. In R. Ames & C. Ames (Eds.), Research on motivation in education (Vol. 1, pp. 39–73). San Diego, CA: Academic Press.
Malone, T., & Lepper, M. (1987). Making learning fun: A taxonomy of intrinsic motivations for learning. In R. Snow & M. Farr (Eds.), Aptitude, learning, and instruction: Vol. 3. Conative and affective process analyses (pp. 223–253). Hillsdale, NJ: Lawrence Erlbaum.
Moos, R. (1979). Evaluating educational environments. San Francisco: Jossey-Bass.
Nolen, S. (1988). Reasons for studying: Motivational orientations and study strategies. Cognition and Instruction, 5, 269–287.
Pintrich, P. R. (1989). The dynamic interplay of student motivation and cognition in the college classroom. In C. Ames & M. Maehr (Eds.), Advances in motivation and achievement: Vol. 6. Motivation and enhancing environments (pp. 117–160). Greenwich, CT: JAI.
Pintrich, P. R., & De Groot, E. V. (1990). Motivational and self-regulated learning components of classroom academic performance. Journal of Educational Psychology, 82, 33–40.
Pintrich, P. R., & Garcia, T. (1991). Student goal orientation and self-regulation in the college classroom. In M. Maehr & P. R. Pintrich (Eds.), Advances in motivation and achievement: Vol. 7. Goals and self-regulatory processes (pp. 371–402). Greenwich, CT: JAI.
Pintrich, P. R., & Schrauben, B. (1992). Students’ motivational beliefs and their cognitive engagement in academic tasks. In D. Schunk & J. Meece (Eds.), Student perceptions in the classroom: Causes and consequences (pp. 149–183). Hillsdale, NJ: Lawrence Erlbaum.
Pintrich, P. R., Smith, D., Garcia, T., & McKeachie, W. J. (1993). Reliability and predictive validity of the Motivated Strategies for Learning Questionnaire (MSLQ). Educational and Psychological Measurement, 53, 810–813.
Ryan, R., & Grolnick, W. (1986). Origins and pawns in the classroom: Self-report and projective assessments of individual differences in children’s perceptions. Journal of Personality and Social Psychology, 50, 550–558.
Schunk, D. (1985). Self-efficacy and school learning. Psychology in the Schools, 22, 208–223.
Schunk, D. (1989). Social cognitive theory and self-regulated learning. In B. Zimmerman & D. Schunk (Eds.), Self-regulated learning and academic achievement: Theory, research, and practice (pp. 83–110). New York: Springer-Verlag.
Simmons, R. G., & Blyth, D. A. (1987). Moving into adolescence: The impact of pubertal change and school context. Hawthorne, NY: Aldine.
Slavin, R. E. (1980). Cooperative learning. Review of Educational Research, 50, 315–342.
Sternberg, R., & Powell, J. (1983). The development of intelligence. In J. Flavell & E. Markman (Eds.), Handbook of child psychology (Vol. 3, pp. 341–419). New York: Wiley.
Weinstein, C. E., & Mayer, R. E. (1986). The teaching of learning strategies. In M. Wittrock (Ed.), Handbook of research on teaching (pp. 315–327). New York: Macmillan.
Weinstein, R. (1989). Perceptions of classroom processes and student motivation: Children’s views of self-fulfilling prophecies. In C. Ames & R. Ames (Eds.), Research on motivation in education: Vol. 3. Goals and cognitions (pp. 187–221). New York: Academic Press.
Winne, P., & Marx, R. (1982). Students’ and teachers’ views of thinking processes for classroom learning. Elementary School Journal, 82, 459–518.
Zimmerman, B., & Martinez-Pons, M. (1986). Development of a structured interview for assessing student use of self-regulated learning strategies. American Educational Research Journal, 23, 614–628.
60
Atkinson’s Theory of Achievement Motivation: First Step toward a Theory of Academic Motivation?
Martin L. Maehr and Douglas D. Sjogren
Educators generally agree that a major variable affecting classroom performance is motivation. However, important as motivational variables may be in understanding, predicting, and controlling classroom behavior, there is a paucity of information and theory associated with them. There are many theories of human motivation; but little attempt has been made to extend these theories in a systematic way to educational situations. Furthermore, the occasional application of psychological theory to education has not typically eventuated in a theory of academic motivation nor a unified and coherent body of information. As a result, there is very little in the way of motivation theory which is clearly of help to the classroom teacher or to education in general. The situation is not without hope, however. The work of several theorists has shown promise of evolving postulates and hypotheses relevant to the teaching-learning process. Prominent among these is the theory of achievement motivation, particularly as formulated by Atkinson (1957, 1964, 1965; Atkinson & Feather, 1966).1 This theory has provided a productive approach to a variety of behavioral phenomena; thus it has been suggested (e.g., Atkinson, 1966; Weiner, 1967) that it may also give direction to educationally relevant research. Can it indeed serve as a first step toward a theory of academic motivation? It is the goal of this paper to consider that question.
Source: Review of Educational Research, 41(2) (1971): 143–161.
Atkinson’s Theory of Achievement Motivation
Essentially, Atkinson’s theory of achievement motivation can be summarized in the following equation:
$$T_a = T_s + T_{-f} + T_{ext},$$
where
$T_a$ = an active impulse to undertake a particular achievement-oriented activity;
$T_s = M_s \times P_s \times I_s$;
$M_s$ = tendency to approach success, usually assessed with the aid of the Thematic Apperception Test (TAT);
$P_s$ = subjective probability of success, ranging on a scale from 0.00 to 1.00;
$I_s$ = incentive value of success; it is assumed that $I_s = 1 - P_s$;
$T_{-f} = M_{af} \times P_f \times I_f$;
$M_{af}$ = tendency to avoid failure, usually assessed with the aid of the Test Anxiety Questionnaire (TAQ);
$P_f$ = subjective probability of failure; $P_f = 1 - P_s$;
$I_f$ = incentive value of failure; $I_f = 1 - P_f$; in computing the values in the equation, the sign is assumed to be negative ($-I_f$);
$T_{ext}$ = positive extrinsic tendency to perform the activity; these are tendencies which are not associated with pride in achievement per se; included, e.g., would be motives to comply or seek approval which may eventuate in achievement behavior in a given context; the inclusion of $T_{ext}$ in the formula represents a recent recognition of the fact that social contexts typically also bring non-nAch motives to bear on the achieving situation.
Although its essentials are contained in the above equation, certain assumptions and characteristics of the theory might profitably be emphasized. First, it should be noted that the theory is assumed to be appropriate in “achievement situations,” i.e., situations in which a person not only sees himself as responsible for a somewhat uncertain outcome but knows that the outcome for which he is responsible will be evaluated against a standard of excellence. In other words, achievement situations are situations which require skill and competence. Second, it is assumed that in such achievement situations two conflicting predispositions will be energized: a motive to approach success and a motive to avoid failure. Furthermore, it should be noted that these conflicting motives are part of the person’s enduring personal orientation and that the relative strength of these two motives will vary from person to person. Thus, unlike most expectancy × value theories, a personality or individual difference factor is considered along with situational factors.
Finally, it should be emphasized that the theory, again in contrast to most expectancy × value theories, assumes that the value factor ($I_s$, $I_f$) is directly dependent on the expectancy factor ($P_s$, $P_f$); the incentive values of success and failure are inverse linear functions of the probability of success and failure. It is noteworthy that this simplifying assumption has some empirical justification, at least in the case of $P_s$ and $I_s$ (Litwin, 1958). Given this general theoretical framework, a variety of specific predictions can be made. The major, over-riding hypothesis is that, in achievement situations, Ss for whom $M_s > M_{af}$ (achievement-oriented Ss) will exhibit lowest motivation where $P_s = 0.00$ or $1.00$ and highest motivation where $P_s = 0.50$. Conversely, Ss for whom $M_{af} > M_s$ (failure-threatened Ss) will exhibit lowest motivation when $P_s = 0.50$.
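The predictions stated above follow directly from the equation once the incentive assumptions are substituted: the two achievement-related terms reduce to (Ms − Maf) × Ps × (1 − Ps), which peaks at Ps = .50 when Ms exceeds Maf and bottoms out there when Maf exceeds Ms. The sketch below is one way to check this numerically. The function name and the motive strengths (2 and 1) are illustrative assumptions rather than values from the literature, and the extrinsic tendency is set to zero by default.

```python
def resultant_tendency(ms, maf, ps, t_ext=0.0):
    """Resultant tendency T_a = T_s + T_-f + T_ext for one task.

    ms:    strength of the motive to approach success (illustrative units)
    maf:   strength of the motive to avoid failure (illustrative units)
    ps:    subjective probability of success, between 0 and 1
    t_ext: extrinsic tendency, assumed to be zero unless supplied
    """
    if not 0.0 <= ps <= 1.0:
        raise ValueError("ps must lie between 0 and 1")
    pf = 1.0 - ps             # probability of failure
    i_s = 1.0 - ps            # incentive value of success
    i_f = -(1.0 - pf)         # incentive value of failure, entered with negative sign
    t_s = ms * ps * i_s       # tendency to approach success
    t_f = maf * pf * i_f      # tendency to avoid failure (zero or negative)
    return t_s + t_f + t_ext


grid = [i / 10 for i in range(11)]  # Ps = 0.0, 0.1, ..., 1.0

# Achievement-oriented case (Ms > Maf): where is motivation highest?
peak = max(grid, key=lambda ps: resultant_tendency(2.0, 1.0, ps))

# Failure-threatened case (Maf > Ms): where is motivation lowest?
trough = min(grid, key=lambda ps: resultant_tendency(1.0, 2.0, ps))

print(peak, trough)
```

Because the curve is symmetric about Ps = .50, the model treats a task with Ps = .10 and one with Ps = .90 as equally motivating, which is the symmetry that later empirical work has questioned.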
Supporting Studies
That the Atkinson theory represents more than intriguing speculation is clear from a perusal of the literature. There is a substantial body of data which indicates that the formulation is viable at least for the moment. Perhaps the major prediction of the theory is that achievement-oriented Ss will be more motivated toward moderately difficult tasks than failure-threatened Ss; in educational terms, achievement-oriented Ss will be more inclined toward challenge. Using choice-preference, persistence, or level of performance as the index of motivation, support for this general hypothesis was obtained in a variety of situations.
Choice-preference. In a number of studies involving competitive, gamelike situations (McClelland, 1958; Atkinson, Earl, & Litwin, 1960; Litwin, 1958, 1966; Atkinson & Litwin, 1960), achievement-oriented Ss consistently showed a greater tendency to choose alternatives or perform tasks which had an intermediate probability of success. Kogan and Wallach (1967), among others, suggested this differential tendency to select or choose tasks of intermediate difficulty is limited to the narrow confines of competitive games. However, it is clear from other studies utilizing non-game-like experimental contexts that this is not true. For example, in a study concerned with aspiration level on an intellectual task and without the presence of a game-like atmosphere, Moulton (1965) found the predicted differential tendencies. This differential tendency of achievement-oriented and failure-threatened Ss to opt for challenge was also exhibited in curricular (Isaacson, 1964) and job (Mahone, 1960) choices. In short, the theory correctly predicts a differential preference for intermediate success levels; moreover, this prediction is not to be limited to the playing of competitive games. While there is strong support for the hypothesis that achievement-oriented Ss are more likely than failure-threatened Ss to prefer tasks of moderate difficulty, it should not be concluded that the predicted patterns of motivational increase-decrease across all $P_s$ levels have also received confirmation. One may,
as Heckhausen (1968) did, question whether the predicted symmetrical curves about Ps = .50 have validity, since achievement-oriented Ss have often been found to prefer levels slightly less than Ps = .50. Perhaps of greater interest is the fact that a U-shaped function about Ps = .50 has not typified the choices of failure-threatened Ss. Thus it may be misleading to conclude that failure-threatened Ss actually avoid the Ps = .50 level, as the theory asserts. For the most part, the data only indicate that failure-threatened Ss prefer it less than achievement-oriented Ss. Thus, when the preference patterns of achievement-oriented and failure-threatened Ss are compared, achievement-oriented Ss do indeed exhibit a higher level of preference for moderately difficult tasks. However, considering the preference pattern of failure-threatened Ss separately, it is clear that their preference for moderately difficult tasks is not consistently less than their preferences for either difficult or easy tasks. If anything, they too exhibit a preference for moderate difficulty levels, though not perhaps to the same degree as achievement-oriented Ss. That achievement-oriented Ss have a greater preference for moderately difficult tasks than failure-threatened Ss seems clear. What remains to be demonstrated is that failure-threatened Ss exhibit maximum avoidance in the moderate difficulty range, as the theory predicts.
Persistence. The Atkinson model has also been effectively applied to the prediction of task persistence. In a series of papers, Feather (1961, 1962, 1963) not only developed a rationale for the application of the theory to the study of persistence but provided evidence confirming the validity of this application.
In theory, achievement-oriented Ss should be more motivated, or approach-oriented, to the task (and hence, persist longer) when Ps = .50; failure-threatened Ss should be less motivated, in fact avoidance-oriented, toward the task (and hence, persist for a shorter period) under the same Ps = .50 condition. In confirmation of this prediction, Feather found that when Ss were told that a task was easy (presumably, Ps > .50) and then experienced failure (presumably lowering Ps to or in the direction of .50), achievement-oriented Ss exhibited greater persistence than failure-threatened Ss. Similarly, when Ss were told that a task was hard and then succeeded, achievement-oriented Ss likewise showed greater persistence.
Indirect confirmation of the Atkinson model in the case of persistence behavior was also obtained by Maehr and Videbeck (1968). In a study on the effects of general risk-taking tendency (defined by a behavioral measure) on preference for reinforcement levels, these researchers found that regardless of general risk orientation, Ss were most persistent at moderate (50%) reinforcement-success schedules and least persistent at minimal (15%) and maximal (85%) schedules. Since the task was of an achievement nature and since the Ss (undergraduate male volunteers) were presumably high in success motivation, the data were interpreted as supportive of the hypothesis that achievement-oriented Ss are more motivated and, hence, more persistent when Ps = .50 than at lower or higher Ps levels. However, it is important to emphasize that while these findings are in accord with Feather’s results
and in a sense enhance them, the study did not clearly define a difference between achievement-oriented and failure-threatened Ss. All in all, there seems to be strong support for the Atkinson model in the case of persistence behavior.
Performance. Attempts to apply the Atkinson model to performance have eventuated in conflicting results, as Atkinson and others (Atkinson, 1967; Klinger, 1966) noted. Achievement-oriented Ss have not necessarily shown their best performance under a Ps = .50 condition (e.g., Smith, 1964; O’Connor, Atkinson, & Horner, 1966) and only occasionally have failure-threatened Ss shown deteriorated performance at this level (e.g., Karabenick & Youssef, 1968). Atkinson (Atkinson & Feather, 1966, p. 335; Atkinson, 1967) suggested that other conditions or tendencies (T_ext) may summate with the tendency for success and the tendency to avoid failure, resulting in extreme and counterproductive drive in the case of achievement-oriented Ss and in moderate and productive resultant achievement motivation in the case of failure-threatened Ss. The validity of this explanation is yet to be determined. It is, however, worth noting that the lack of success in predicting performance is not altogether surprising in view of the complexity of the factors typically involved. Furthermore, one might argue that preferences and tendencies to persist should have a long-term or eventual effect on outcomes in educational situations, even though short-term studies of performance do not reveal the predicted patterns. But such arguments do not settle the matter; they only suggest a direction for future research.
Conclusion. It seems clear that Atkinson’s theory accurately predicts a differential orientation toward moderate success levels on the part of achievement-oriented and failure-threatened Ss. That the results have been less clear in the case of performance data may be at least partially explainable by the more complicated nature of performance.
Questions of whether failure-threatened Ss actually avoid the moderate levels and equally prefer either minimal or maximal success levels still remain. Finally, although it would be amiss to conclude that the model has validity only in the socially competitive game-like situation, it must be pointed out that most of the studies have at least implied some form of social competition in eliciting achievement. Moreover, in one study (de Charms & Davé, 1965) specifically designed to be self- rather than socially-competitive, the predicted effects were not found.
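The summation that Atkinson proposed to account for the performance data can be written out explicitly. The following is a sketch under the standard statement of the model, which is assumed here rather than quoted from this section: M_s and M_af denote the motive to succeed and the motive to avoid failure, P_s the subjective probability of success, and T_ext the sum of extrinsic tendencies.

```latex
\begin{align*}
T_s    &= M_s \, P_s \, (1 - P_s) \\
T_{-f} &= -\, M_{af} \, (1 - P_s) \, P_s \\
T_A    &= T_s + T_{-f} + T_{ext}
        = (M_s - M_{af}) \, P_s \, (1 - P_s) + T_{ext}
\end{align*}
```

On this reading a positive T_ext raises total motivation for both groups by the same amount. The explanation cited under Performance additionally assumes that performance declines once total motivation becomes too strong; that assumption is not part of the equation itself.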
Unresolved Issues
Granted that there is support for the basic conceptions of the model, are these conceptions really applicable to educational situations? Is the theory so structured that it can serve not only as a guide for educationally relevant research but also as a frame of reference for interpreting educational behavior and structuring instructional procedure? Without doubt, some issues need to be resolved before experimentation and application can proceed very far.
Measurement
A very critical area is the need for a better instrument for measuring Ms. The most commonly used procedure at the present time is to score responses to certain pictures from the TAT according to the achievement theme of the responses (McClelland, Atkinson, Clark, & Lowell, 1953; Atkinson, 1958). The index obtained through such a procedure is not notably reliable, perhaps primarily because subtle environmental cues present during the administration of the test can have strong effects on Ss’ responses (Klinger, 1967; Weinstein, 1969). Aside from such technical reservations, it may also be noted that the administration and scoring of such an instrument are so difficult that even if the instrument were reliable, it would not be practical for an applied situation. Attempts to alleviate this situation have been made. In particular, it may be noted that a number of objective measures of Ms have been tried, such as the Edwards Personal Preference Schedule (Edwards, 1954) and the Iowa Picture Interpretation Test (Hurley, 1955, 1957; Johnston, 1957). These have not had notable success and are questionable substitutes for a projective-type measure (Atkinson & Litwin, 1960; Heckhausen, 1967). Similarly, Atkinson and O’Connor (1966) tried unsuccessfully to develop an objectively scoreable questionnaire which would assess Ms on the basis of behavioral preferences. Some hope in this regard may be found in an instrument developed by Mehrabian (1968, 1969) and given preliminary testing by Weiner and Kukla (1970). A recently developed test by Herman (1969, 1970) likewise holds promise. However, both the Mehrabian and the Herman instruments have had only limited testing and their utility in an applied setting is yet to be determined. In view of such lack of success with nonprojective measures, one might conclude that this confirms the initial assumption of McClelland et al.
(1953), based on Freud, that motives are best expressed in fantasy; what you obtain on objective measures is subject to strong social restraints which preclude the expression of actual inclinations. However, since the projective devices have not provided ultimate success in this regard either, it is difficult to accept this assumption. Perhaps test development has simply proceeded along inappropriate lines. For example, designers of objective instruments have typically developed items on an a priori or theoretical basis and then determined the validity of the instrument on the basis of correlations with standard projective measures or designated achievement behavior. A more appropriate tactic might be to build a scale empirically by selecting items that differentiate between people who perform a task with high achievement motivation and those who perform with low achievement motivation.
n-Achievement or n-Competition?
It was previously concluded that Atkinson’s theory is not limited to the confines of competitive, game-like situations. But the extent to which external (social) as
opposed to internal standards are effective remains an intriguing question. To put it differently, was Atkinson describing the behavior of people who like to win (over others), or simply, as he implied, the behavior of people who like to do well in terms of either external or internal standards? Perhaps there may be different types of achievement-motivated Ss: those who are self-competitive and those who are socially competitive. In their early work, McClelland et al. (1953, p. 107 ff.) speculated on this possibility. More recently, Veroff (1969) suggested that nAch may take different forms at different age levels. Presumably, in the earlier stages of development nAch is reflected in a drive for competence (cf. White, 1959, 1960) which is relatively independent of the social comparison processes described by Festinger (1954). Later, during middle childhood when social comparison processes become both possible and important, nAch may be exhibited in an inclination toward socially competitive patterns. Finally, during adolescence, the individual presumably arrives at some sort of integrated nAch pattern which is a product of his experiences during both the early and late stages. Thus he might ultimately be more or less autonomous in his achievement orientation. Conceivably then, at least two types of achievement pattern might eventuate: a self-competitive and an other-competitive pattern. It may well be that the Atkinson model, while not limited to competitive games, is primarily appropriate to the other-competitive type. The preponderance of research has employed social norms of one kind or another in communicating the standard of excellence. What further enhances the importance of this question is that its answer may solve yet another perplexing problem.
Throughout the literature one finds repeated references to the fact that women often do not respond to achievement cues in the same manner as men nor reflect the achievement behavior predicted by nAch theory (McClelland et al., 1953; Veroff, Wilcox, & Atkinson, 1953; Skolnick, 1966; McClelland, 1966). Quite possibly such disconcerting sex differences can be appropriately understood as differences in competitive orientation rather than as differences in achieving tendency per se. In any case, however one wishes to approach the sex difference problem (see, e.g., French & Lesser, 1964; Horner, 1968; Houts & Entwistle, 1968; Weston & Mednick, 1970), the competitive forms that achievement may take emerge as a crucial question.
nAch and Long-Term Achievement
Yet another important question is what effect changes in Ps during performance may have. The results of Feather’s research (1961, 1962, 1963) seem to clearly indicate that both achievement-oriented and failure-threatened Ss persist under the expectancy conditions as the theory would predict. However, it must be kept in mind that in these studies Ps was a presumed function of E’s initial characterization of a task as easy or hard followed by S’s success or failure. Certainly an educator would be interested in persistence
(and motivation) under those circumstances. However, he is quite likely to be more interested in persistence under slightly different circumstances, viz., under circumstances in which varying degrees (or ratios) of success are achieved in the course of performance. Such a situation, in which Ps must be self-determined by S based on the reinforcement he has thus far received, is more analogous to the typical situation. The question is, is nAch theory really applicable to a situation in which the actual ratio of success to persistence is of concern? One may infer from Atkinson and Cartwright (1964) that it should be. However, empirical data in this regard are limited. The notion that persistence (and motivation) will be greatest for achievement-oriented Ss when there is a challenge (Ps approaching .50) is intuitively plausible. One might, however, wonder whether continued effort at a task in which Ps remains at or around .50 might not become somewhat discouraging. An even more serious reservation in this regard concerns the issue of whether the failure-threatened S would be most persistent both when success is achieved 0.00% and 100% of the time in an ongoing achievement process.
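The situation described above, in which Ps is built up from the successes and failures experienced so far, can be sketched numerically. The updating rule below (a running proportion of successes, started from a weak prior of Ps = .50) is purely an illustrative assumption and is not proposed in the literature reviewed here; the resultant tendency again assumes the standard form (Ms - Maf) x Ps x (1 - Ps), and all names and motive strengths are hypothetical.

```python
def resultant_tendency(ps, ms, maf):
    """Assumed standard form of the model: (Ms - Maf) * Ps * (1 - Ps)."""
    return (ms - maf) * ps * (1.0 - ps)


def track_tendency(outcomes, ms, maf, prior_ps=0.5, prior_weight=2.0):
    """Return (trial, estimated Ps, resultant tendency) after each outcome.

    Ps is estimated as a running success ratio. The prior (prior_ps counted
    as prior_weight imaginary trials) is an illustrative assumption.
    """
    successes = prior_ps * prior_weight
    trials = prior_weight
    history = []
    for n, succeeded in enumerate(outcomes, start=1):
        successes += 1.0 if succeeded else 0.0
        trials += 1.0
        ps = successes / trials
        history.append((n, ps, resultant_tendency(ps, ms, maf)))
    return history


# A failure-threatened S (Ms < Maf) who fails on every one of ten trials.
always_failing = track_tendency([False] * 10, ms=1.0, maf=2.0)

# An achievement-oriented S (Ms > Maf) who alternates success and failure.
alternating = track_tendency([True, False] * 5, ms=2.0, maf=1.0)

for n, ps, t in always_failing:
    print(f"trial {n:2d}  Ps={ps:.3f}  tendency={t:+.3f}")
```

Under these assumptions the failure-threatened S becomes steadily less inhibited as uninterrupted failure drives the estimated Ps toward zero, while the alternating sequence holds the achievement-oriented S near Ps = .50. Whether real Ss behave this way over a continuing task is exactly the open empirical question raised above.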
Cross-cultural Generalizability
Problems in motivating students are doubtless no more salient than in the case of the confrontation between the middle-class school and the “other” class and culture child. The viability of Atkinson’s theory within educational psychology depends to a considerable extent on its cross-cultural applicability. Is the theory applicable outside of white middle-class culture? Most of the confirming studies reported in the literature were conducted with middle-class and predominantly white Ss. Furthermore, if the theory is found to be relevant only in socially competitive situations (a moot point raised previously), it hardly needs to be added that the typical classroom is not the situation in which the non-middle-class child could be expected to exhibit whatever competitive predispositions he might have. Equally germane is Katz’s (1967) criticism that nAch indexes typically do not tap those areas in which disadvantaged children have a desire or hope for achievement. Supporting this is the finding that lower-class Black Ss (Rosen, 1959), and in some cases (Littig, 1968) upper- and middle-class Black Ss, exhibit lower achievement imagery on the conventional projective measures. There is, however, some evidence that the theory is not entirely devoid of cross-cultural applicability. First of all, Mingione’s work (1965, 1968) indicates the direction that cross-cultural assessment might take. Reasoning that the usual TAT procedure does not effectively elicit achievement imagery in Black Ss, Mingione (1965) developed pictorial stimuli which were designed to be more “culture fair.” Nevertheless, her results still showed differences between Black and white Ss with white Ss exhibiting more achievement imagery. Subjects in this study were grade school children living in
North Carolina. In a subsequent study using “disadvantaged” Ss residing in Connecticut and employing verbal stimuli in eliciting achievement imagery, Mingione (1968) found no differences between Black and white Ss. Second, de Charms and Carpenter (1969) provided strong support for the validity of Atkinson’s model in the case of Black disadvantaged fifth- and seventh-grade children. In this study high nAch Ss (TAQ was not employed) exhibited greater preferences for moderate success levels on spelling and arithmetic tasks than low nAch Ss. In the case of the spelling task, low nAch Ss actually exhibited elevated preferences at low and high Ps levels and lowered preferences at moderate levels, a predicted pattern that has seldom been found to occur regardless of subject population. What is most interesting, however, is that the experimental situation was not socially competitive in nature and was consistent with the realities of school performance. It will be remembered that when a similar non-competitive condition prevailed in the case of a game-like experiment using middle-class Ss (de Charms & Davé, 1965), Atkinson’s predictions were not confirmed. It is difficult to fully explain the incongruity existing in the case of these two studies. One might hypothesize that whereas a competitive atmosphere may be necessary to elicit achievement behavior in middle-class Ss, it is not in the case of Black lower-class Ss. It does not seem logical to argue that the type of task made a difference since game-like tasks have been effective in most other studies, but a possible Situation × Task × Subject interaction should not be ruled out and is worth exploring. Although there is at least some evidence of the cross-cultural applicability of the theory in the case of preference data involving minimal time and personal commitment, one might wonder whether the predicted effects would occur in a continuing task.
For example, would the failure-threatened culturally disadvantaged child really continue to persist and perform well with little or no success, as he should according to the Atkinson equation? At least on an intuitive basis, it is possible to accept the hypothesis that failure-threatened white middle-class Ss may at times exhibit a tendency to work hard although they achieve little success; perhaps this is simply because persistence under all circumstances is considered a virtue by the middle class. On the same basis, however, it may be questioned whether the culturally disadvantaged S will persist when achieving little success. In fact, implicit in the theory itself is an argument against the validity of the usual predictions in the case of culturally disadvantaged Ss. Presumably, the nAch motives are energized only when the individual sees himself as responsible for achieving or not achieving the standard of excellence. In other words, perception of responsibility for and control over behavior is a requisite for the obtainment of the predicted curvilinear function. It may be noted, however, that not only nAch but also perception of responsibility may vary from situation to situation and from individual to individual. The work of Rotter and his associates (Rotter, 1966; Lefcourt, 1966) on locus of control is interesting in this regard (see Note 2). Their work strongly suggests that the culturally disadvantaged S may see the typical achievement
situation as externally based and controlled rather than as a function of his own behavior. For this S, then, what was meant to be an achievement situation may well be perceived as a gambling situation, in the control of chance or the fates. In a gambling situation, a linear relationship between Ps and motivation is predicted for both achievement-oriented and failure-threatened Ss (Atkinson, 1964, p. 251). In any case, it is clear that in applying the Atkinson model to other than white middle-class groups, other critical factors must be taken into account.
In discussing the cross-cultural generalizability of the theory it is well to refer to two additional factors that need further study: mode of success and variations in task. There is a growing amount of evidence which indicates that children from different socioeconomic groups show different responses to different modes of feedback (Zigler & Kanzer, 1962; Zigler & Child, 1969; Stuempfig & Maehr, 1970). Within the wider context of nAch theory research, one finds clear interest in and guidelines for studying the effects of various modes of success on varying personality types. For example, French (1958) reported that S’s personal orientation (in terms of nAff and nAch) moderated responsiveness to different modes of feedback communication. And Atkinson (1966) emphasized that any consideration of academic motivation would have to take account of motives other than nAch (see Note 3). The recent inclusion of the T_ext term in the formal statement of the theory is indicative of the recognition of the importance of non-nAch motives in determining achievement in social settings. Yet all of this is only a bare beginning on a problem which is perhaps primary to the educator. Regarding the relationship of achievement motivation and task, theoretical concerns have tended to preclude research in this area. However, if the theory is applied to education, the question takes on major importance.
All of this would seem to suggest that the development of Atkinson’s theory toward a general theory of academic motivation might well proceed by focusing on a major motivational problem: the achievement of disadvantaged children. Not only would this serve an important social need; it could conceivably force the kind of theoretical adjustments necessary to the development of a productive theory of academic motivation. For example, it is clear from the discussion thus far that the theory cannot now be applied to widely diverse cultural groups without taking factors other than nAch into account. Possibly all of these additional factors can be appropriately subsumed under the T_ext term. Possibly this strains the theory to the breaking point and a new conception is demanded. Thus Klinger and McNelly (1969), in considering differential achieving orientations of different social groups, suggested a role theory interpretation which, they argued, provides a more parsimonious and productive conception of achieving behavior as it occurs within given social contexts. Doubtless there are other possibilities, but the point to be made here is that application of the theory within a cross-cultural context is a potentially profitable way of determining if and how Atkinson’s model can evolve into a theory of academic motivation.
A Practical Problem
There is one final question that must be raised: how might one implement the theory if it is to serve as a guide to practice? The theory is primarily concerned with how specialized environmental events differentially affect certain kinds of persons. Although educators are concerned with individual differences in response to environmental events, the exigencies of education make it incumbent that they be more concerned with how environmental variables have a greater or lesser effect on all students regardless of individual differences. It is difficult, if not impossible at present, to create a specialized environment for each student. A theory of academic motivation must tell the educator how to manipulate environments in order to obtain the greatest over-all effects. Thus the question of concern to the educator is: how do I manipulate the school environment so that I can maximize motivation in all students regardless of personality differences? That one can effectively approach achievement motivation in this way can be seen in the work of Alschuler (1968) who, by organizing the learning task as a self-competitive game in which the individual was responsible for setting his own goals and was graded on his own terms, significantly increased students’ achievement. Along a similar line, de Charms (1968) developed a theory of motivation which, while not ignoring the personality dimension, gave full due to the possibility that the manipulation of environment alone may determine (effectively if not completely) achievement motivation. Thus, de Charms suggested that when environmental events reduce S’s freedom or make him feel like a “pawn” he will be less achievement oriented than when he performs freely or as an “origin.” Such social-psychological approaches as are implicit in the work of Alschuler and de Charms, among others, probably ought to receive increased consideration.
Furthermore, there is no reason why the body of literature associated with Atkinson’s theory cannot at least serve as a heuristic device in work along these lines.
Implications for the Study of Academic Motivation
Although one can level many criticisms against Atkinson’s theory, the fact remains that it does suggest new interpretations and potentially fruitful hypotheses related to a number of educational practices. Some of these deserve to be outlined here (see also Atkinson, 1966; Weiner, 1967).
Ability Grouping
One example of the applicability of the theory to education involves the practice of ability grouping. Ability grouping is based on the assumption that
intelligence is the most important variable affecting performance in school: students will achieve more because the teaching can be geared to the ability level of the group. This assumption is probably not without validity, but consider how achievement motivation might affect performance in ability groups. First, it must be assumed that achievement motivation is not correlated with ability; there is evidence to support this assumption (Mahone, 1960; O’Connor et al., 1966). If, then, students are placed into groups homogeneous according to ability but heterogeneous with respect to achievement motivation, what behaviors are predicted? Assuming also that there is more of a challenge (Ps → .50) in a homogeneous ability group, the theory would predict that those students with high achievement motivation would show high preference for and strong persistence at tasks; those students with low achievement motivation would have low preference for and low persistence at tasks. Ultimately, these variations in persistence and preference would be expected to affect performance. These predictions were substantiated to some extent in a study by O’Connor et al. (1966). They found that achievement-oriented students showed greater growth in scholastic achievement and more interest in school work when placed in an ability-grouped class. Failure-threatened students showed no difference in scholastic achievement but had less interest in school work when placed in such a class. The investigators speculated whether their finding might explain such phenomena as the differential performance of college freshmen who often come from high ability groups but who vary greatly in their performance and persistence in college. The evidence is perhaps not sufficient to justify changes in ability-grouping practices; but as Smith (1969, p. 243) suggested, it is sufficient to justify experimentation using variation in both achievement motivation and ability as a basis for forming homogeneous groups.
Programmed Instruction
Another example is to be found in the case of programmed instruction. This educational innovation was to have solved the “motivational problem”; that it has not is obvious to any teacher (see Maehr, 1968). The limitations of programmed instruction in this regard may be partially attributable to the fact that programmers have been content with certain over-simplified assumptions concerning motivation. A typical assumption is that a high success ratio will elicit maximum motivation in all Ss. In light of the discussion thus far, this is obviously a misleading assumption. Atkinson’s theory directly repudiates the notion that all Ss will be equally motivated by the same success ratios. While achievement-oriented Ss will in all probability be more motivated across all success ratios than failure-threatened Ss, the major prediction of the theory relates to the point that maximal motivation in the case of each will occur at quite different success levels. Achievement-oriented Ss will be maximally motivated under moderate success ratios; conversely, failure-threatened Ss will be maximally motivated under either high or low success
levels but minimally motivated at intermediate levels. In other words, the typical error rate will, according to the Atkinson theory, maximize motivation of failure-threatened but not achievement-oriented Ss. Whether or not this application of the theory to programmed tasks is valid is yet to be determined. Preliminary studies by Kight and Sassenrath (1966) and Shrable and Sassenrath (1969) possibly yield some support in this regard. However, since the measure of achievement orientation employed in these studies was the Iowa Picture Interpretation Test, a questionable alternative to the TAT procedure (see Heckhausen, 1967), their findings must be viewed as suggestive rather than definitive. In any case, the point to be stressed is that Atkinson’s theory provides a potentially fruitful perspective for analyzing behavior on programmed tasks.
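A rough worked illustration of this prediction follows. It assumes the standard form of the model for the resultant tendency, a hypothetical unit difference between the two motives, and an arbitrarily chosen success ratio of .90 to stand in for a low-error program; none of these values comes from the studies cited.

```latex
\begin{align*}
T_r(P_s) &= (M_s - M_{af}) \, P_s \, (1 - P_s) \\[4pt]
M_s - M_{af} = +1:\quad T_r(.50) &= +.25, & T_r(.90) &= +.09 \\
M_s - M_{af} = -1:\quad T_r(.50) &= -.25, & T_r(.90) &= -.09
\end{align*}
```

Under these assumptions the high success ratio cuts the achievement-oriented S's resultant tendency to roughly a third of its maximum, while it leaves the failure-threatened S far less inhibited than a moderate ratio would.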
Independent Study
Independent study is another educational practice that might be more effective if the results of research on achievement motivation were considered. (See Alexander & Hines, 1967, for a discussion of independent study.) According to the original conception of achievement motivation (McClelland et al., 1953), the person high in nAch is presumably highly inclined to take responsibility for his achievement behavior. Subsequently, Bartman (as quoted in Heckhausen, 1967) found evidence that high nAch Ss were indeed more able to handle an independent program of study than low nAch Ss. Moreover, McKeachie (1961) and Morris (1967), while exploring personality-environment interactions in the classroom, uncovered an interesting fact. Both studies found evidence to suggest that high nAch Ss outperform low nAch Ss in classrooms which allow for or are more dependent on self-motivation. More specifically, on the basis of Atkinson’s model it might be expected that achievement-oriented and failure-threatened students would differ in the goals and tasks they select for themselves. The success-motivated student would seek out situations that were challenging. The behavior of the person with low achievement motivation would be less certain. He may seek out very easy tasks and learn little or he may seek out tasks so difficult that he is not able to learn them. Either result would be undesirable. Clearly, experimentation in independent study with achievement motivation as an independent variable is called for.
Personnel Practices
A final example of the kinds of questions that the theory prompts concerns personnel practices. McClelland (1961) suggested that high productivity in a society is associated with the society’s capacity to develop and effectively employ achievement-motivated persons. Such people seek out, perform well, and persist in challenging situations or in situations that present a substantial degree of risk to the individual.
Are the personnel practices in education such that individuals with high achievement motivation are attracted to and remain in education? Quite possibly not. The techniques generally used to attract and hold teachers, e.g., tenure, salary schedules, pay, vacations, fringe benefits, certification, etc., are techniques most likely to appeal to the person with low achievement motivation. If teachers do tend to be low in achievement motivation, what effect might this have? One may wonder, e.g., about the kind of environment created by the failure-threatened teacher. What kind of rewards would he employ? To what extent would he encourage independence and present a challenge? What kind of materials would he select and recommend? One may also wonder how students varying in nAch respond to such environments. Along a quite different line, one might question whether the lack of innovation in education is directly attributable to the lack of nAch extant in the educational establishment. Might it be that the supposedly static condition of education is a direct result of personnel practices which tend to discourage the recruitment and retention of the risk taker and innovator? At the least it can be said that Atkinson’s model stimulates a host of questions in this area, questions that ought to be confronted and dealt with by educational researchers.
Toward a Theory of Academic Motivation
It has been amply illustrated that Atkinson’s theory suggests a variety of studies bearing on a broad range of educational questions. The heuristic value of the theory can hardly be denied. Just as evident are the limitations of the theory in its present form. It is not now, nor does it presume to be, a theory of academic motivation. Although it suggests a variety of insights into the educational process, it can provide only limited advice for the practitioner. It is our view that the limitations can be overcome. Ways of doing just that have been suggested. In any case, we feel it is obvious that the theory can serve as a first step toward a theory of academic motivation.
Notes
1. The work of Crandall and her associates (e.g., Crandall, Katkovsky, & Crandall, 1965; Crandall, 1969; McGhee & Crandall, 1968; see also Battle, 1965, 1966), which in some sense derives from the work of Rotter (1954, 1966), is particularly interesting, as is the work of Cattell (e.g., Cattell, Sealey, & Sweeney, 1966; Cattell & Butcher, 1968; Hundleby & Cattell, 1968) and Sarason, Davidson, Lightall, Waite, & Ruebush (1960). However, the educationally relevant findings proceeding from these and other approaches to motivation will be only incidentally treated in this paper. It may also be noted that much of the earlier work on achievement motivation was reviewed by Crandall (1963).
2. Rotter (1966) suggested that nAch and locus of control are indeed separable factors. Therewith, a number of researchers (Feather, 1967, 1969; Weiner & Kukla, 1970) have begun to explore the interaction of these variables and/or the relationship of the attribution of causality to achievement motivation. De Charms (1968) actually reformulated achievement motivation theory in locus of causality terms.
3. Parenthetically, it may be noted that a number of studies (e.g., McKeachie, 1961; McKeachie, Lin, Milholland, & Isaacson, 1966; McKeachie, Isaacson, Milholland, & Lin, 1968; Morris, 1967) have explored the interaction of classroom environment (e.g., achievement oriented, affiliation oriented) and personal orientation (e.g., low-high nAch). In general, results have provided some but not consistent confirmation of the basic notion that high and low nAch Ss should be most motivated in presumably matching high-low nAch environments. One of the obvious difficulties with these studies is the definition and assessment of environments. What is an achievement or an affiliation environment? Be that as it may and complex as the problem is, it deserves further consideration on the part of educational researchers.
References
Alexander, W. M., & Hines, V. A. Independent study in secondary schools. New York: Holt, 1967.
Alschuler, A. S. How to increase motivation through climate and structure. Achievement Motivation Development Project Working Paper No. 8, Harvard University, Graduate School of Education, 1968.
Atkinson, J. W. Motivational determinants of risk-taking behavior. Psychological Review, 1957, 64, 359–373.
Atkinson, J. W. (Ed.) Motives in fantasy, action, and society. Princeton, N. J.: D. Van Nostrand, 1958.
Atkinson, J. W. An introduction to motivation. Princeton, N. J.: D. Van Nostrand, 1964.
Atkinson, J. W. Some general implications of conceptual developments in the study of achievement-oriented behavior. In M. R. Jones (Ed.), Human motivation: A symposium. Lincoln: University of Nebraska Press, 1965.
Atkinson, J. W. Mainsprings of achievement oriented activity. In J. D. Krumboltz (Ed.), Learning and the educational process. Chicago: Rand McNally, 1966.
Atkinson, J. W. Implications of curvilinearity in the relationship of efficiency of performance to strength of motivation for studies of individual differences in achievement-related motives. Paper presented at National Academy of Sciences meeting, University of Michigan, October 24, 1967.
Atkinson, J. W., Bastian, J. R., Earl, R. W., & Litwin, G. H. The achievement motive, goal setting, and probability preferences. Journal of Abnormal and Social Psychology, 1960, 60, 27–36.
Atkinson, J. W., & Cartwright, D. Some neglected variables in contemporary conceptions of decision and performance. Psychological Reports, 1964, 14, 575–590.
Atkinson, J. W., & Feather, N. T. (Eds.) A theory of achievement motivation. New York: Wiley, 1966.
Atkinson, J. W., & Litwin, G. H. Achievement motive and test anxiety conceived as motive to approach success and motive to avoid failure. Journal of Abnormal and Social Psychology, 1960, 60, 52–63.
Atkinson, J. W., & O’Connor, P. Neglected factors in studies of achievement-oriented performance: Social approval as incentive and performance decrement. In J. W. Atkinson & N. T. Feather (Eds.), A theory of achievement motivation. New York: Wiley, 1966.
Battle, E. S. Motivational determinants of academic task persistence. Journal of Personality and Social Psychology, 1965, 2, 209–218.
Cattell, R. B., & Butcher, H. J. The prediction of achievement and creativity. Indianapolis: Bobbs-Merrill, 1968.
Cattell, R. B., Sealey, A. P., & Sweeney, A. B. What can personality and motivation source trait measurements add to the practice of school achievement? British Journal of Educational Psychology, 1966, 36, 280–295.
Crandall, V. C., Katkovsky, W., & Crandall, V. J. Children’s beliefs in their own control of reinforcements in intellectual-academic achievement situations. Child Development, 1965, 36, 91–109.
Crandall, V. J. Achievement. In The sixty-second yearbook of the National Society for the Study of Education, Part I, Child Psychology, 1963.
Crandall, V. J. Sex differences in expectancy of intellectual and academic reinforcement. In C. P. Smith (Ed.), Achievement-related motives in children. New York: The Russell Sage Foundation, 1969.
de Charms, R. Personal causation. New York: Academic Press, 1968.
de Charms, R., & Carpenter, V. Measuring motivation in culturally disadvantaged school children. In H. J. Klausmeier & G. T. O’Hearn (Eds.), Research and development toward the improvement of education. Madison, Wis.: Dembar Educational Services, 1969.
de Charms, R., & Davé, P. Hope of success, fear of failure, subjective probability, and risk-taking behavior. Journal of Personality and Social Psychology, 1965, 1, 558–568.
Edwards, A. L. Edwards Personal Preference Schedule. New York: Psychological Corporation, 1954.
Feather, N. T. The relationship of persistence at a task to expectation of success and achievement-related motive. Journal of Abnormal and Social Psychology, 1961, 63, 552–561.
Feather, N. T. The study of persistence. Psychological Bulletin, 1962, 59, 94–115.
Feather, N. T. Persistence at a difficult task with alternative task of intermediate difficulty. Journal of Abnormal and Social Psychology, 1963, 66, 604–609.
Feather, N. T. Valence of outcome and expectation of success in relation to task difficulty and perceived locus of control. Journal of Personality and Social Psychology, 1967, 7, 372–376.
Feather, N. T. Attribution of responsibility and valence of success and failure in relation to initial confidence and task performance. Journal of Personality and Social Psychology, 1969, 13, 129–144.
Festinger, L. A theory of social comparison processes. Human Relations, 1954, 7, 117–140.
French, E. G. Effects of interaction of motivation and feedback of performance. In J. W. Atkinson (Ed.), Motives in fantasy, action, and society. Princeton, N. J.: D. Van Nostrand, 1958.
French, E. G., & Lesser, G. S. Some characteristics of the achievement motive in women. Journal of Abnormal and Social Psychology, 1964, 68, 119–128.
Heckhausen, H. The anatomy of achievement motivation. New York: Academic Press, 1967.
Heckhausen, H. Achievement motive research: Current problems and some contributions towards a general theory of motivation. In W. J. Arnold (Ed.), Nebraska symposium on motivation, 1968. Lincoln: University of Nebraska Press, 1968.
Hermans, H. J. M. The validity of different strategies of scale construction in predicting academic achievement. Educational and Psychological Measurement, 1969, 29, 877–883.
Hermans, H. J. M. A questionnaire measure of achievement motivation. Journal of Applied Psychology, 1970, 54, 353–363.
Horner, M. Sex differences in achievement motivation and performance in competitive and non-competitive situations. (Doctoral dissertation, University of Michigan) Ann Arbor, Mich.: University Microfilms, 1968. No. 69-12,135.
Houts, P. S., & Entwistle, D. R. Academic achievement effort among females: Academic attitudes and sex-role orientation. Journal of Counseling Psychology, 1968, 15, 284–286.
Hundleby, J. D., & Cattell, R. B. Personality structure in middle childhood and the prediction of school achievement and adjustment. Child Development Monographs, 1968, No. 121.
Hurley, J. R. The Iowa Picture Interpretation Test: A multiple choice variation of the TAT. Journal of Consulting Psychology, 1955, 19, 372–376.
Hurley, J. R. Achievement imagery and motivational instructions as determinants of verbal learning. Journal of Personality, 1957, 25, 274–282.
Isaacson, R. L. Relation between n achievement, test anxiety, and curricular choices. Journal of Abnormal and Social Psychology, 1964, 68, 447–452.
Johnston, R. A. A methodological analysis of several revised forms of the Iowa Picture Interpretation Test. Journal of Personality, 1957, 25, 283–293.
Karabenick, S. A., & Youssef, Z. I. Performance as a function of achievement motive level and perceived difficulty. Journal of Personality and Social Psychology, 1968, 10, 414–419.
Katz, I. The socialization of academic motivation in minority group children. In D. Levine (Ed.), Nebraska symposium on motivation, 1967. Lincoln: University of Nebraska Press, 1967.
Kight, H. R., & Sassenrath, J. M. Relation of achievement motivation and test anxiety to performance in programmed instruction. Journal of Educational Psychology, 1966, 57, 14–17.
Klinger, E. Fantasy need achievement as a motivational construct. Psychological Bulletin, 1966, 66, 291–308.
Klinger, E. Modeling effects on achievement imagery. Journal of Personality and Social Psychology, 1967, 7, 49–62.
Klinger, E., & McNelly, F. W. Fantasy need achievement and performance: A role analysis. Psychological Review, 1969, 76, 574–591.
Kogan, N., & Wallach, M. A. Risk-taking as a function of the situation, the person and the group. In New directions in psychology III. New York: Holt, 1967.
Lefcourt, H. M. Internal versus external control of reinforcement. Psychological Bulletin, 1966, 65, 206–220.
Littig, L. W. Negro personality correlates of aspiration to traditionally open and closed occupations. Journal of Negro Education, 1968, 37, 31–36.
Litwin, G. H. Motives and expectancies as determinants of preference for degrees of risk. Unpublished honors thesis, University of Michigan, 1958.
Litwin, G. H. Achievement motivation, expectancy of success, and risk-taking behavior. In J. W. Atkinson & N. T. Feather (Eds.), A theory of achievement motivation. New York: Wiley, 1966.
Maehr, M. L. Some limitations of the application of reinforcement theory to education. School and Society, 1968, 96, 108–110.
Maehr, M. L., & Videbeck, R. Predisposition to risk and persistence under varying reinforcement-success schedules. Journal of Personality and Social Psychology, 1968, 9, 96–100.
Mahone, C. H. Fear of failure and unrealistic vocational aspiration. Journal of Abnormal and Social Psychology, 1960, 60, 253–261.
McClelland, D. C. Risk taking in children with high and low need for achievement. In J. W. Atkinson (Ed.), Motives in fantasy, action, and society. Princeton, N. J.: D. Van Nostrand, 1958.
McClelland, D. C. The achieving society. Princeton, N. J.: D. Van Nostrand, 1961.
McClelland, D. C. Longitudinal trends in the relation of thought to action. Journal of Consulting Psychology, 1966, 30, 479–484.
McClelland, D. C., Atkinson, J. W., Clark, R. A., & Lowell, E. L. The achievement motive. New York: Appleton, 1953.
McKeachie, W. J. Motivation, teaching methods, and college learning. In M. R. Jones (Ed.), Nebraska symposium on motivation, 1961. Lincoln: University of Nebraska Press, 1961.
McKeachie, W. J., Isaacson, R. L., Milholland, J. E., & Lin, Y. G. Student achievement cues and academic achievement. Journal of Consulting and Clinical Psychology, 1968, 32, 26–29.
McKeachie, W. J., Lin, Y. G., Milholland, J., & Isaacson, R. Student affiliation motives, teacher warmth and academic achievement. Journal of Personality and Social Psychology, 1966, 4, 457–461.
Mehrabian, A. Male and female scales of the tendency to achieve. Educational and Psychological Measurement, 1968, 28, 493–502.
Mehrabian, A. Measures of achieving tendency. Educational and Psychological Measurement, 1969, 29, 445–451.
Mingione, A. D. Need for achievement in Negro and white children. Journal of Consulting Psychology, 1965, 29, 108–111.
Mingione, A. D. Need for achievement in Negro, white, and Puerto Rican children. Journal of Consulting and Clinical Psychology, 1968, 32, 94–95.
Morris, J. L. Teacher-student interaction as a determinant of academic grades in the secondary school. The Australian Journal of Education, 1967, 11, 13–23.
Moulton, R. W. Effects of success and failure on level of aspiration as related to achievement motives. Journal of Personality and Social Psychology, 1965, 1, 399–406.
O’Connor, P., Atkinson, J. W., & Horner, M. Motivational implications of ability grouping in schools. In J. W. Atkinson & N. T. Feather (Eds.), A theory of achievement motivation. New York: Wiley, 1966.
Rosen, B. C. Race, ethnicity, and the achievement syndrome. American Sociological Review, 1959, 24, 47–60.
Rotter, J. B. Social learning and clinical psychology. Englewood Cliffs, N. J.: Prentice-Hall, 1954.
Rotter, J. B. Generalized expectancies for internal versus external control of reinforcement. Psychological Monographs, 1966, 80 (1, Whole No. 609).
Sarason, S. B., Davidson, K. S., Lightall, F. F., Waite, R. R., & Ruebush, B. K. Anxiety in elementary school children. New York: Wiley, 1960.
Shrable, K., & Sassenrath, J. M. Effects of achievement motivation and test anxiety on performance in programmed instruction. Paper presented at American Educational Research Association convention, Los Angeles, February 1969.
Skolnick, A. Motivational imagery and behavior over twenty years. Journal of Consulting Psychology, 1966, 30, 463–478.
Smith, C. P. Relationships between achievement-related motives and intelligence, performance level, and persistence. Journal of Abnormal and Social Psychology, 1964, 68, 523–532.
Smith, C. P. (Ed.) Achievement-related motives in children. New York: The Russell Sage Foundation, 1969.
Stuempfig, D. W., & Maehr, M. L. Persistence as a function of conceptual structure and quality of feedback. Child Development, 1970, 41(4), in press.
Veroff, J. Social comparison and the development of achievement motivation. In C. P. Smith (Ed.), Achievement-related motives in children. New York: The Russell Sage Foundation, 1969.
Veroff, J., Wilcox, S., & Atkinson, J. W. The achievement motive in high school and college age women. Journal of Abnormal and Social Psychology, 1953, 48, 108–119.
Weiner, B. Implications of the current theory of achievement motivation for research and performance in the classroom. Psychology in the Schools, 1967, 4, 164–171.
Weiner, B., & Kukla, A. An attributional analysis of achievement motivation. Journal of Personality and Social Psychology, 1970, 15, 1–20.
Weinstein, M. S. Achievement motivation and risk preference. Journal of Personality and Social Psychology, 1969, 13, 153–172.
Weston, P. J., & Mednick, M. T. Race, social class and the motive to avoid success in women. Journal of Cross-Cultural Psychology, 1970, 1, 284–291.
White, R. W. Motivation reconsidered: The concept of competence. Psychological Review, 1959, 66, 297–333.
White, R. W. Competence and the psychosexual stages of development. In M. R. Jones (Ed.), Nebraska symposium on motivation. Lincoln: University of Nebraska Press, 1960.
Zigler, E., & Child, I. L. Socialization. In G. Lindzey & E. Aronson (Eds.), The handbook of social psychology, Vol. III. (2nd ed.) Reading, Mass.: Addison-Wesley, 1969.
Zigler, E., & Kanzer, P. The effectiveness of two classes of verbal reinforcers on the performance of middle- and lower-class children. Journal of Personality, 1962, 30, 157–163.
61 Motivation and Engagement across the Academic Life Span: A Developmental Construct Validity Study of Elementary School, High School, and University/College Students
Andrew J. Martin
Students in elementary school, high school, and university/college share a great deal in common. In each context, students are required to apply themselves over a sustained period of time to develop their academic skills, engage with key performance demands, negotiate the rigors of competition, deal with setback and adversity, cope with possible self-doubt and uncertainty, and develop psychological and behavioral skills to effectively manage the ups and downs of the ordinary course of academic life. Given these congruencies across distinct educational stages, it is feasible to propose that there will be core and common constructs relevant and meaningful across the academic life span. This study seeks to assess this issue in the context of academic motivation and engagement; more specifically, it seeks to assess the validity of recently developed academic motivation and engagement instrumentation in the context of students from elementary school, high school, and university/college. Analyses conducted in this investigation across these three distinct educational stages are proposed as a developmental construct validity study of academic motivation and engagement.
Source: Educational and Psychological Measurement, 69(5) (2009): 794–824.
Substantive Background: An Integrative Framework for Motivation and Engagement and Implications for Measurement
The substantive background to the study centers on academic motivation and engagement and the need for more pragmatic and integrative approaches to their measurement and theorizing. In critical reviews of motivation and engagement research, it has been suggested that such research oftentimes yields limited practical implications and applications and that there is a need to devise research that advances scientific understanding but that also has applied utility. Hence, there have been calls to give greater attention to use-inspired basic research in education and psychology contexts (Stokes, 1997; see also Greeno, 1998; Pintrich, 2000, 2003). Critical reviews of motivation and engagement research also point to the fact that such research is diverse and fragmented. As a result, there have also been calls for more integrative approaches to its research and theorizing (Bong, 1996; Murphy & Alexander, 2000; Pintrich, 2003). It is in this context that the Motivation and Engagement Wheel (Martin, 2001, 2002, 2007a) was developed. The wheel is presented in Figure 1. As Figure 1 shows, there are two levels at which the wheel has been conceptualized: the integrative higher order level, comprising 4 factors, and the lower (or first-order) level, comprising 11 factors. As discussed fully by Martin (2007a, 2008a, 2008b), higher order factors (and corresponding first-order factors) are adaptive cognitions (self-efficacy, valuing, mastery orientation), adaptive behaviors (planning, task management, persistence), impeding/maladaptive cognitions (anxiety, failure avoidance, uncertain control), and maladaptive behaviors (self-handicapping, disengagement).
Initially this wheel was developed to better understand motivation and engagement among high school students; however, in the present study its application to elementary school and university students is assessed from a developmental construct validity perspective (described below).
Higher Order Dimensions of Motivation and Engagement
Martin (2007a, 2008a, 2008b) proposed that over the past four decades a number of psychological theories and models have been developed that explain the nature of human cognition and behavior. He demonstrated that there are significant commonalities across these theories and models, which provide direction as to fundamental (higher order) dimensions of motivation and engagement. These commonalities operate at three levels. The first level delineates cognitive and behavioral elements, including work encompassing cognitive and behavioral orientations in learning strategies (Pintrich & DeGroot, 1990; Pintrich & Garcia, 1991), cognitive antecedents of behavioral strategies used to negotiate environmental demands (Buss & Cantor, 1989), cognitive-behavioral approaches to engagement and behavior change (Beck, 1995), and cognitive-affective and behavioral dimensions to academic engagement (Miller, Greene, Montalvo, Ravindran, & Nichols, 1996; Miserandino, 1996). The second level demonstrates the differential empirical strength of distinct aspects of motivation and engagement – for example, self-efficacy reflects highly adaptive motivation (Bandura, 1997; Pajares, 1996), anxiety impedes individuals’ engagement (Sarason & Sarason, 1990; Spielberger, 1985), and behaviors such as self-handicapping reflect quite maladaptive engagement (Martin, Marsh, & Debus, 2001a, 2001b, 2003; Martin, Marsh, Williamson, & Debus, 2003). The third level informs the structure of motivation and engagement frameworks, such as those hypothesizing and empirically demonstrating hierarchical models of human cognition and behavior that encompass specific factors under more global characterizations (e.g., Elliot & Church, 1997; Marsh & Shavelson, 1985; Shavelson, Hubner, & Stanton, 1976).
Figure 1: Motivation and engagement wheel. Adaptive cognition: self-efficacy, valuing, mastery orientation. Adaptive behavior: planning, task management, persistence. Impeding/maladaptive cognition: anxiety, failure avoidance, uncertain control. Maladaptive behavior: self-handicapping, disengagement.
Source: Adapted from Martin (2003a).
Taken together and in consideration of the joint issues of motivational and behavioral orientations; cognitive-behavioral frameworks; differing empirical levels of adaptive, impeding, and maladaptive dimensions in applied settings; and hierarchical models of cognition and behavior, Martin (2007a, 2008a, 2008b) proposed that motivation can be characterized in terms of four higher order dimensions: (a) adaptive cognition, (b) adaptive behavior, (c) impeding/maladaptive cognition, and (d) maladaptive behavior. These dimensions and their component first-order factors have been synthesized under the Motivation and Engagement Wheel (Martin, 2001, 2003a, 2003c, 2007a, 2008b) presented in Figure 1.
First-Order Dimensions of Motivation and Engagement
Pintrich (2003) identified core substantive questions for the development of a motivational science. Taken together, these questions underscore the importance of considering, conceptualizing, and articulating a model of motivation from salient and seminal theorizing related to self-efficacy, control, valuing, goal orientation, need achievement, self-worth, and self-regulation. These, it is suggested, provide a useful heuristic for the identification of first-order constructs for operationalizing the Motivation and Engagement Wheel. As discussed fully by Martin (2001, 2002, 2003c, 2007a), (a) self-efficacy theory (e.g., Bandura, 1997) is reflected in the self-efficacy dimension of the wheel, (b) attributions and control are reflected in the uncertain control dimension (tapping the controllability element of attributions; see Connell, 1985; Weiner, 1994), (c) valuing (e.g., Eccles, 1983; Wigfield & Tonks, 2002) is reflected in the valuing dimension, (d) self-determination (in terms of intrinsic motivation; see Ryan & Deci, 2000) and motivation orientation (see Dweck, 1986; Martin & Debus, 1998; Nicholls, 1989) are reflected in the mastery orientation dimension, (e) self-regulation (e.g., Martin, 2001, 2002, 2003c, 2007a; Martin, Marsh, & Debus, 2001a, 2001b, 2003; Zimmerman, 2002) is reflected in the planning, task management, and persistence dimensions, and (f) need achievement and self-worth (e.g., Atkinson, 1957; Covington, 1992; Martin & Marsh, 2003; McClelland, 1965) are reflected in the failure avoidance, anxiety, self-handicapping, and disengagement dimensions. Hence, the wheel comprises 11 lower, or first-order, dimensions (see Figure 1).
Measurement and the Motivation and Engagement Scale
Alongside the Motivation and Engagement Wheel is its accompanying instrumentation, the Motivation and Engagement Scale (MES). Typically administered to high school students, the Motivation and Engagement Scale–High School (MES-HS; Martin, 2001, 2003c, 2007a, 2007b, 2008a) demonstrates a strong factor structure that is invariant across gender and age (but there are mean-level differences such that females generally report higher levels of motivation than do males, and middle high school students report lower motivation than do junior and senior high school students) and is reliable and normally distributed. It has also been found to predict a variety of educational outcomes such as enjoyment of school, classroom participation, educational aspirations, and achievement-related outcomes such as school grades. To extend this line of research, the present investigation assesses parallel forms of the MES: for elementary school students, the Motivation and Engagement Scale–Junior School (MES-JS), and for college or university students, the Motivation and Engagement Scale–University/College (MES-UC). Over the past few years, there has been growing research around the Motivation and Engagement Wheel and its accompanying instrumentation, the MES. The MES is robust in the high school (Martin, 2007a), workplace (Martin, in press-b; see also Martin, 2005b, 2005c), music (Martin, 2008b), sport (Martin, 2008b), and physical activity domains (Martin, Tipler, Marsh, Richards, & Williams, 2006). The wheel and MES are useful as bases for educational intervention (Martin, 2005a, 2008b). The wheel and MES are helpful foundations for assessing group-level (climate) effects (Martin & Marsh, 2005). Finally, the wheel and MES are useful in addressing more specific educational issues such as domain specificity (Green, Martin, & Marsh, 2007), teacher effects (Martin & Marsh, 2005), and the role of parents and teachers in the motivation and engagement process (Martin, 2003b, 2006).
However, to date, there has been no thoroughgoing and detailed scoping of the wheel and MES across the span of education – that is, across elementary school, high school, and university samples (but see Martin, in press-b, for brief research in the context of sport, music, work, and daily life motivation and engagement). The present study does so from a proposed developmental construct validity perspective.
Methodological Background: A Developmental Construct Validity Perspective
Researchers in psychology and education have increasingly emphasized the need to develop and evaluate instruments within a construct validation framework (e.g., see Marsh, 2002; Marsh & Hau, 2007). Investigations that adopt a construct validation approach can be classified as within-network or between-network studies. Moreover, it is proposed here that when construct validity is assessed across distinct educational stages it constitutes something of a developmental construct validity perspective. Specifically, it is proposed that a dual within- and between-network approach across elementary school, high school, and university represents a developmental construct validity approach to assessing the generality of motivation and engagement across the academic life span.
Within-Network Validity
Beginning with a logical analysis of internal consistency of the construct definition, measurement instruments, and generation of predictions, within-network studies typically employ empirical techniques such as exploratory factor analysis, confirmatory factor analysis (CFA), and reliability analysis. The present study conducts within-network analyses across the three samples using CFA to test the multidimensional motivation and engagement framework and reliability analysis to test the internal consistency of scores. Consistent with previous studies of high school students (e.g., Green et al., 2007; Martin, 2001, 2003c, 2007a) and across diverse performance settings such as music and sport (Martin, 2008b), it is hypothesized that at each educational stage (elementary school, high school, and university), the motivation and engagement instrumentation (MES) will evince a sound first- and higher order factor structure and comprise reliable scores.
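As an illustration of what a reliability analysis of internal consistency involves, the following minimal sketch computes Cronbach's alpha for one subscale. It assumes alpha is the coefficient of interest, which this passage does not specify, and the response matrix is entirely hypothetical (five respondents answering three items on a 1 to 5 scale); it is not MES data.

```python
# Sketch of an internal-consistency check, assuming Cronbach's alpha
# as the reliability coefficient. The responses are hypothetical.
from statistics import variance


def cronbach_alpha(responses):
    """Compute Cronbach's alpha.

    responses: list of respondents, each a list of k item scores.
    """
    k = len(responses[0])
    # Transpose to get one sequence of scores per item.
    items = list(zip(*responses))
    item_variances = [variance(item) for item in items]
    # Variance of each respondent's total (summed) score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses to a three-item subscale.
hypothetical_subscale = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 3],
]

alpha = cronbach_alpha(hypothetical_subscale)
print(round(alpha, 2))
```

In a study of the kind described here, a coefficient of this sort would be computed for each first-order scale within each educational stage. A full analysis would normally rely on a dedicated statistics package rather than a hand-rolled function.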
Between-Network Validity
Between-network research explores relationships between a target central framework and a set of factors external to the framework. It typically does so through statistical procedures such as correlation, regression, or structural equation modeling (SEM) analyses to examine relationships between measures and instruments. The present study conducts between-network analyses across the three samples by assessing (a) the invariance of factor structure across gender, age groups, and educational stages (elementary school, high school, university/college); (b) mean-level differences across educational stages; and (c) the empirical links between the hypothesized first- and higher order factors and a set of cognate between-network measures (enjoyment of school or university, class participation, positive intentions, academic buoyancy, homework/assignment completion). Each of these between-network techniques is described in turn.
Factorial invariance in the structure of motivation and engagement. As described by Martin (2007a, 2008b), insufficient attention is given to analyses of the factor structure of motivation and engagement and the extent to which a given motivation and engagement instrument and its components are invariant across different groups. Such concerns about factor structure invariance are most appropriately evaluated using CFA to determine whether – and how – the structure of motivation and engagement varies according to key subpopulations (see Hattie, 1992; Marsh, 1993). Martin (2004, 2007a) has previously shown the MES factor structure (factor loadings, uniquenesses, correlations/variances) to be invariant across early-, mid-, and late-adolescent samples and also across gender. The present study is an opportunity to assess invariance across gender and age within elementary school and university.
It is also an opportunity to assess invariance across elementary school, high school, and university samples. Consistent with previous studies of high school students (e.g., Green et al., 2007; Martin, 2001, 2003c, 2007a) and across diverse performance settings such as music and sport (Martin, 2008b), it is hypothesized that factor structure (including loadings, correlations/variances, and uniquenesses) across gender, age, and educational stage will evince relative invariance.
Mean-level educational stage effects. Very little research has assessed mean levels of motivation and engagement across the academic life span: elementary school, high school, and university. The transition from elementary to middle school has been found to pose difficulties and challenges unique to that time (Anderman & Midgley, 1997; Roeser, Eccles, & Sameroff, 2000), and a decline in student motivation and engagement is typically found to emerge after this transition (see Martin, 2001, 2003c, 2004, 2007a; Wigfield & Tonks, 2002), including changes in subjective task value (Wigfield, Eccles, Mac Iver, Reuman, & Midgley, 1991). As students move on to university/college, some research has found them to be more confident in the quantity and quality of their abilities, whereas other research finds it a difficult transition with less support and structure and a major challenge in asserting one’s identity among highly capable peers (Martin, Marsh, Williamson, et al., 2003). Increasingly, universities and colleges are recognizing the stresses and strains of undergraduate life and the difficulties in making a successful transition from high school (see Martin, Milne-Home, Barrett, & Spalding, 1997; Martin, Milne-Home, Barrett, Spalding, & Jones, 2000).
Indeed, Martin and colleagues (Martin, Marsh, Williamson, et al., 2003) found university to present distinct challenges that instill doubts and uncertainties that in some cases lead to self-handicapping, poorer academic performance, and eventual dropout. Taken together, then, it is hypothesized that elementary school students will evince relatively higher mean levels of motivation and engagement than high school and university students do; however, no predictions are made regarding the relative mean levels of the latter two groups. Motivation, engagement, and cognate correlates. Consistent with the construct validity approach, it is proposed that five between-network constructs provide a theoretically relevant basis for examining the external validity of the MES across the academic life span: positive intentions, class participation, enjoyment of school, academic buoyancy, and homework/assignment completion. In terms of positive intentions, several researchers have shown that students higher in motivation and engagement are more likely to take advanced or optional courses and also more likely to report future course enrolment intentions (Meece, Wigfield, & Eccles, 1990). In addition to positive intentions, class participation is deemed a feasible between-network construct. Learning environments that foster student participation are found to enhance students’ commitment to learning (Richter & Tjosvold, 1980), whereas a lack of participation is found to lead to unsuccessful educational
outcomes such as emotional withdrawal and poor identification with the school (Finn, 1989). Enjoyment of school is another feasible between-network construct. Elliot and Sheldon (1997), for example, included enjoyment as one of the five key variables in their study of goal pursuit. Even research in higher education finds that enjoyment is a key factor in students’ engagement at university (Lee, Sheldon & Turban, 2003). Martin and Marsh (2006, 2008a, 2008b) have shown academic buoyancy to be a factor relevant to students’ ability to deal with academic setback in the ordinary course of academic life and also have shown a variety of motivation and engagement factors to be significantly associated with such buoyancy. It is also proposed that in addition to these four intrapsychic measures, there is a need for more behavioral measures (Green et al., 2007) that in the present study take the form of homework/assignment completion. Consistent with previous studies of high school students (e.g., Green et al., 2007; Martin, 2001, 2003c, 2007a) and across diverse performance settings such as music and sport (Martin, 2008b, in press-a, in press-b), it is hypothesized that the adaptive dimensions will be positively (to a modest or strong degree) associated with these correlates, the impeding/maladaptive dimensions will be associated at near-zero or negatively (to a weak or modest degree), whereas maladaptive dimensions will be more markedly negatively (to a modest or strong degree) associated with these correlates.
Aims of the Study The overarching aim of this study is to examine the developmental construct validity of motivation and engagement across elementary school, high school, and university samples. More specifically, this study assesses a recently developed integrative motivation and engagement instrumentation across the academic life span with a view to assessing (a) within-network validity in terms of first- and higher order factor structure and reliability and (b) between-network validity in terms of invariance of factor structure across groups (gender, age, educational stage), mean-level differences across educational stage, and associations with cognate correlates.
Method Elementary School Sample and Procedure The elementary school sample comprised 624 upper-age elementary students in five schools. All schools were located in urban areas drawing from two capital cities in Australia. Students were age 9 to 11.5 years (n = 114, 56%
female and 44% male) and 11.5 years to 13 years (n = 510, 38% female and 62% male). The mean age of students was 11.13 (SD = 0.69) years. Teachers read the MES-JS (Martin, 2007b) items aloud to students during class or pastoral care/tutorial groups. The rating scale was first explained, and sample items were presented. Students were then asked to complete the instrument as the teacher read out each item in turn and to return the completed form to the teacher at the end of class or pastoral care/tutorial group. Previous work has been conducted in a smaller urban and rural elementary school sample (Martin, Craven, & Munns, 2006); however, this work comprised only a factor analysis of the MES-JS, with no invariance testing, mean-level analyses, analyses in the context of the academic life span, or external validity checks. The present study, then, is a significant progression on previous work.
High School Archive Sample and Procedure The high school sample comprised data collected from 21,579 high school students from 58 Australian schools. Thirty-six schools were government, and 22 schools were independent, and they were from urban and regional areas across most states in Australia. Students were age 12 to 13 years (n = 6,640, 49% female and 51% male), 14 to 15 years (n = 7,894, 43% female and 57% male), and 16 to 18 years (n = 7,045, 44% female and 56% male). The mean age of students was 14.52 (SD = 1.57) years. The high school sample is something of an archive sample that has been compiled over recent years across numerous research projects. Portions of the data have been reported elsewhere with a more substantial construct validity study by Martin (2007a) assessing the MES-HS among 12,237 high school students, all of whom are included as part of the present archive sample of 21,579 students. The reader is urged to consult Martin (2007a; see also Martin, 2008b, in press-a, in press-b) for these academic motivation and engagement data in the context of other performance domains such as sport, music, and work as the first substantial large-sample investigation into the MES-HS. The archive dataset represents the integration of data collected over the previous 5 years and so can be considered to be relatively current. Teachers administered the MES-HS (Martin, 2001, 2003c, 2007a, 2007b) to students during class or pastoral care/tutorial groups. The rating scale was first explained, and sample items were presented. Students were then asked to complete the instrument on their own and to return the completed form to the teacher at the end of class or pastoral care.
University Sample and Procedure University (college) respondents were 420 undergraduate students from two Australian universities. One university is well established and one of the oldest in the country (68% of sample). The other is a more recently
established institution (32%). Most respondents were women (80%), and 20% were men. Most students were enrolled in education (66%), with other students enrolled in arts (18%), psychology/social science (8%), social work (3%), science (3%), and communications (2%). Most were full-time students (96%), with 4% part-time. Most were in their first year of study (65%), with 25% in second year, 7% in third year, and 3% in fourth or fifth year. The mean age of students was 21.47 (SD = 6.62) years, with 60% under 20 years of age and 40% 20 years and over. Students completed the instrument in lecture or tutorial time. Students were asked to complete the MES-UC (Martin, 2007b) on their own and return the completed instrument at the end of the lecture or tutorial they were attending at the time.
Materials Motivation and Engagement Scale General overview. The MES-JS (Martin, 2007b), MES-HS (Martin, 2001, 2003c, 2007a, 2007b), and MES-UC (Martin, 2007b) are instruments that measure elementary, high school, and university students’ motivation and engagement, respectively. Adapted from the MES-HS, the MES-JS and MES-UC assess motivation and engagement through three adaptive cognitive dimensions (self-efficacy, valuing, mastery orientation), three adaptive behavioral dimensions (persistence, planning, task management), three impeding/maladaptive cognitive dimensions (anxiety, failure avoidance, uncertain control), and two maladaptive behavioral dimensions (self-handicapping, disengagement). Each of the 11 factors comprises four items – hence, the MES is a 44-item instrument. The MES-JS and MES-UC comprise the same number of items (44) and the same number of first-order (11) and higher order (4) factors as the original high school instrument (MES-HS). As much as possible, item adaptation aimed to make simple and transparent word and terminology changes in order to remain very parallel to the high school form. In the appendix, a sample item from the MES-HS is presented along with its MES-JS and MES-UC adaptations (see Martin, 2007a, for a full account of the origins of and rationale for the scale and item development). To simplify the survey for younger students, the MES-JS asks students to rate themselves on a shorter scale of 1 (strongly disagree) to 5 (strongly agree), whereas for the MES-HS and MES-UC, students rate themselves on a scale of 1 (strongly disagree) to 7 (strongly agree). In most studies using the MES (e.g., Martin, 2007a, 2008a, 2008b, in press-a), the 7-point rating scale is typically used. However, the elementary school sample posed a distinct challenge in that a simpler survey form was desirable: Pilot work indicated that students had difficulty teasing apart the finer-grained rating points on the 7-point scale.
Adaptive cognitive and behavioral dimensions. Each adaptive dimension falls into one of two groups: cognitions and behaviors. Adaptive cognitions include self-efficacy, mastery orientation, and valuing. Adaptive behaviors include persistence, planning, and task management. Self-efficacy is students’ belief and confidence in their ability to understand or to do well in their school or university work, to meet challenges they face, and to perform to the best of their ability. Valuing of school or university is how much students believe what they do and learn at school or university is useful, important, and relevant to them. Mastery orientation entails being focused on understanding, learning, solving problems, and developing skills. Planning is how much students plan their work and how much they keep track of their progress as they are doing it. Task management refers to the way students use their time, organize their timetables, and choose and arrange where they prepare for school or university and school or university tasks. Persistence reflects students’ capacity to persist in situations that are challenging and at times when they find it difficult to do what is required. Impeding and maladaptive cognitive and behavioral dimensions. Impeding/maladaptive cognitive dimensions are anxiety, failure avoidance, and uncertain control. Anxiety has two parts: feeling nervous and worrying. Feeling nervous is the uneasy or sick feeling students get when they think about their school or university work or tasks. Worrying is their fear of not doing very well in their school or university work. Failure avoidance occurs when the main reason students try at school or university is to avoid doing poorly or to avoid being seen to do poorly. Uncertain control assesses students’ uncertainty about how to do well or how to avoid doing poorly. Maladaptive behavioral dimensions are self-handicapping and disengagement.
Self-handicapping occurs when students reduce their chances of success at school or university. Examples are engaging in other activities when they are meant to be doing their school or university work or preparing for upcoming school or university tasks. Disengagement occurs when students give up or are at risk of giving up at school or university or in particular school or university activities.
Between-Network Correlates Students were also administered items that explored their enjoyment of school or university (4 items; e.g., elementary school item: “I like school,” Cronbach’s α = .94; high school item: “I like school,” α = .91; university item: “I like university,” α = .91), class participation (4 items; e.g., elementary school item: “I get involved in things we do in class,” α = .90; high school item: “I get involved in things we do in class,” α = .90; university item: “I get involved in things we do in class,” α = .93), positive intentions (4 items; e.g., high school item: “I intend to complete school,” α = .82; university item: “I intend to complete university,” α = .72), and academic buoyancy (4 items; e.g., elementary
school item: “I think I’m good at dealing with schoolwork pressures,” α = .78; high school item: “I think I’m good at dealing with schoolwork pressures,” α = .80; university item: “I think I’m good at dealing with university pressures,” α = .84). These measures were rated on a scale of 1 (strongly disagree) to 7 (strongly agree) and were adapted directly from Martin (2007a, 2008b; see also Martin & Marsh, 2006, 2008a, 2008b), who has shown them to be reliable, a good fit to the data in CFA, and significantly associated with motivation and engagement in other performance domains such as sport and music. Homework/assignment completion (“How often do you do and complete your assignments?”) was a single item assessed on a rating scale of 1 (never) to 5 (always).
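The reliabilities quoted for these four-item measures are Cronbach's α values. As a reminder of what the coefficient summarizes, the following is a minimal Python sketch of the standard formula, α = k/(k − 1) × (1 − sum of item variances / variance of the total score). The function name and the five-respondent ratings are hypothetical illustrations and are not data from the study.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: one list of ratings per item, aligned so that position i in
    every list belongs to the same respondent.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per respondent, summing across the k items.
    totals = [sum(ratings) for ratings in zip(*items)]
    sum_item_variances = sum(variance(ratings) for ratings in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))

# Hypothetical 7-point ratings from five respondents on a four-item scale.
demo_items = [
    [5, 6, 7, 4, 6],
    [4, 6, 7, 5, 6],
    [5, 7, 6, 4, 7],
    [6, 6, 7, 3, 5],
]
demo_alpha = cronbach_alpha(demo_items)
```

Items that move together across respondents push the total-score variance well above the sum of the item variances, which is what drives α toward 1.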
Confirmatory Factor Analysis and Structural Equation Modeling CFA and SEM, performed with LISREL 8.80 (Jöreskog & Sörbom, 2006), were used to test the hypothesized models. In CFA and SEM, the researcher posits an a priori structure and tests the ability of a solution based on this structure to fit the data by demonstrating that (a) the solution is well defined, (b) parameter estimates are consistent with theory and a priori predictions, and (c) the subjective indices of fit are reasonable (McDonald & Marsh, 1990). Maximum likelihood was the method of estimation used for the models. In evaluating goodness of fit of alternative models, the root mean square error of approximation (RMSEA) is emphasized, as are the comparative fit index (CFI), the non-normed fit index (NNFI), and an evaluation of parameter estimates. For RMSEAs, values at or less than .05 and .08 are taken to reflect a close and reasonable fit, respectively (see Jöreskog & Sörbom, 1993). The CFI and NNFI vary along a 0 to 1 continuum in which values at or greater than .90 and .95 are typically taken to reflect acceptable and excellent fits to the data, respectively (McDonald & Marsh, 1990). The CFI contains no penalty for a lack of parsimony, whereas the RMSEA contains penalties for a lack of parsimony.
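The cutoffs described above can be made concrete with a short sketch. It assumes the common point-estimate formula RMSEA = sqrt(max(χ² − df, 0) / (df × (N − 1))); the exact computation in LISREL may differ slightly, and the sample sizes are taken to be the full reported samples. The function names and the label for values beyond the cutoffs are hypothetical. The inputs are the first-order CFA statistics reported in the Results.

```python
import math

def rmsea(chi2, df, n):
    """Point estimate of RMSEA from chi-square, degrees of freedom, and sample size."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def rmsea_label(value):
    """Apply the .05 (close) and .08 (reasonable) cutoffs cited in the text."""
    if value <= 0.05:
        return "close"
    if value <= 0.08:
        return "reasonable"
    return "outside the stated cutoffs"

def incremental_label(value):
    """Apply the .90 (acceptable) and .95 (excellent) cutoffs for CFI and NNFI."""
    if value >= 0.95:
        return "excellent"
    if value >= 0.90:
        return "acceptable"
    return "outside the stated cutoffs"

# (chi-square, df, N) for the first-order 11-factor CFA in each sample.
first_order_cfa = {
    "elementary school": (1881.10, 847, 624),
    "high school": (28217.75, 847, 21579),
    "university": (1697.75, 847, 420),
}
estimates = {name: rmsea(*stats) for name, stats in first_order_cfa.items()}
```

Under these assumptions the three estimates round to the RMSEA values the chapter reports for the first-order models, which illustrates how the index discounts the very large chi-square in the high school sample for its sample size.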
Missing Data For large-scale studies, the inevitable missing data are a potentially important problem, particularly when the amount of missing data exceeds 5% (e.g., Graham & Hoffer, 2000). A growing body of research has emphasized potential problems with traditional pairwise, listwise, and mean substitution approaches to missing data (e.g., Graham & Hoffer, 2000), leading to the implementation of the expectation maximization (EM) algorithm, the most widely recommended approach to imputation for data that are missing at random, as operationalized using missing value analysis in LISREL. In fact, less than 5% of the MES data were missing in each of the elementary
school, high school, and university samples, and so the EM algorithm was implemented for all samples. Also explored were alternative approaches to this problem, which showed that results based on the EM algorithm used here were very similar to those based on the traditional pairwise deletion methods for missing data – as would be expected to be the case when there were so few missing data.
Multigroup Confirmatory Factor Analysis and Tests of Invariance Two broad sets of invariance tests were conducted. The first assessed invariance within samples. The second assessed invariance between samples. For the within-sample invariance tests, for each of elementary school, high school, and university, multigroup CFAs were conducted to assess invariance across gender and age. For the between-sample invariance tests, three invariance analyses were conducted: between high school and university on the original 7-point rating scale; between elementary school, high school, and university using a common 5-point rating scale (reliabilities for the transformed 5-point variables: high school α range = .75 to .81; university α range = .66 to .86); and between elementary school and university on a common 5-point rating scale (the common 5-point rating scale was derived by aggregating the first and last 2 points of the 7-point rating scale). Although the chi-square difference test is the most straightforward means of assessing differences between nested models, problems associated with such tests exist (e.g., see McDonald & Marsh, 1990; Tabachnick & Fidell, 2007). Hence, in formally assessing differences in models, emphasis is given to differences in fit indices (Cheung & Rensvold, 2002).
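The derivation of the common 5-point scale can be sketched as a simple recode. This is one reading of "aggregating the first and last 2 points," namely that ratings 1 and 2 are merged into the lowest category and ratings 6 and 7 into the highest, with the three middle points shifted down by one; the mapping and the function name are assumptions for illustration, not a procedure documented in the chapter.

```python
# Assumed mapping: 1-2 -> 1, 3 -> 2, 4 -> 3, 5 -> 4, 6-7 -> 5.
SEVEN_TO_FIVE = {1: 1, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 5}

def collapse_to_five_point(rating):
    """Recode a 7-point rating onto the assumed common 5-point scale."""
    if rating not in SEVEN_TO_FIVE:
        raise ValueError(f"expected an integer rating from 1 to 7, got {rating!r}")
    return SEVEN_TO_FIVE[rating]

# Hypothetical 7-point responses from one student on a four-item factor.
seven_point_responses = [7, 6, 4, 2]
five_point_responses = [collapse_to_five_point(r) for r in seven_point_responses]
```

A recode of this kind keeps the scale midpoint at the middle category while giving high school and university responses the same number of categories as the elementary school form.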
Multiple-Indicator, Multiple-Cause Models Notwithstanding the importance of testing for invariance in factor structure, there is also reason to investigate the mean-level developmental effects on the 11 facets of the MES-JS, MES-HS, and MES-UC. Kaplan (2000) suggested the multiple-indicator, multiple-cause (MIMIC) approach, which is similar to a regression model in which latent variables (e.g., multiple dimensions of motivation and engagement) are “caused” by discrete grouping variables (e.g., educational stage) that are represented by single indicators. This MIMIC model assessed the role of educational stage (elementary school, high school, university) as a predictor of motivation and engagement. Being a multinomial predictor and using high school as the reference point, educational stage was represented by two dummy variables: high school (0) versus elementary school (1) and high school (0) versus university (1); hence, positive beta weights for both dummy variables indicate
higher scores for elementary school and university students compared with high school students, and negative beta weights for both dummy variables indicate lower scores for elementary school and university students compared with high school students.
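The dummy coding described above can be sketched as follows. The stage labels and function name are hypothetical; the coding itself follows the text, with high school as the reference category scored 0 on both variables.

```python
STAGES = ("elementary", "high school", "university")

def stage_dummies(stage):
    """Return (elementary_vs_high_school, university_vs_high_school) dummy codes."""
    if stage not in STAGES:
        raise ValueError(f"unknown educational stage: {stage!r}")
    return (1 if stage == "elementary" else 0,
            1 if stage == "university" else 0)

coded = {stage: stage_dummies(stage) for stage in STAGES}
```

Because high school students score 0 on both predictors, each beta weight is a contrast between one of the other stages and high school, which is why the sign of a coefficient can be read directly as "higher than" or "lower than" high school.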
Results First- and Higher Order Confirmatory Factor Analysis In the first instance, an 11-factor model was examined using CFA. The CFA yielded a very good fit to the data for elementary school (χ² = 1,881.10, df = 847, p < .001, CFI = .98, NNFI = .97, RMSEA = .04), high school (χ² = 28,217.75, df = 847, p < .001, CFI = .98, NNFI = .98, RMSEA = .04), and university (χ² = 1,697.75, df = 847, p < .001, CFI = .96, NNFI = .95, RMSEA = .05). Factor loading ranges and means are presented in Table 1. Taken together, for all three samples the loadings are acceptable. This is supported by the acceptable reliability coefficients (e.g., see Henson, 2001) also presented in Table 1. Correlations for the sample are presented in Table 2. Predictably, for the three samples all adaptive dimensions were strongly (significantly) positively correlated and correlated strongly (significantly) negatively with maladaptive dimensions and slightly (but significantly) negatively or at near-zero with impeding/maladaptive dimensions. Maladaptive dimensions were markedly (significantly) positively correlated, as were impeding/maladaptive dimensions. For the three samples, all correlations indicate lower levels of shared variance between factor groupings than within factor groupings. In addition to the first-order dimensions constituting the 11 facets of the Motivation and Engagement Wheel, there is also hypothesized a higher order structure delineated by adaptive cognitive dimensions, adaptive behavioral dimensions, impeding/maladaptive cognitive dimensions, and maladaptive behavioral dimensions. In higher order models, correlations between first-order dimensions are constrained to be zero, and relations among these first-order dimensions are explained in terms of higher order dimensions. For each of elementary school, high school, and university samples, the higher order CFAs comprised the 44 items, the 11 first-order dimensions, and the 4 higher order dimensions.
The higher order elementary school structure fit the data very well (χ² = 2,155.87, df = 886, p < .001, CFI = .97, NNFI = .97, RMSEA = .05), as did the higher order model for high school students (χ² = 36,732.07, df = 886, p < .001, CFI = .98, NNFI = .98, RMSEA = .04) and university students (χ² = 1,968.82, df = 886, p < .001, CFI = .95, NNFI = .94, RMSEA = .05). Table 2 presents higher order correlations, which broadly confirm cluster correlations in the first-order model.
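The 847 degrees of freedom reported for the first-order model can be recovered by counting. The sketch below assumes a standard specification that the chapter does not spell out: each of the 44 items loads on exactly one of the 11 factors, uniquenesses are uncorrelated, all factor correlations are free, and each factor's scale is fixed (by fixing either its variance or one loading, which leaves the count unchanged). The function name is hypothetical.

```python
def first_order_cfa_df(n_items, n_factors):
    """Degrees of freedom for a simple-structure CFA with freely correlated factors."""
    # Distinct variances and covariances among the observed items.
    observed_moments = n_items * (n_items + 1) // 2
    loadings = n_items                                  # one loading per item
    uniquenesses = n_items                              # one residual variance per item
    factor_correlations = n_factors * (n_factors - 1) // 2
    free_parameters = loadings + uniquenesses + factor_correlations
    return observed_moments - free_parameters

mes_df = first_order_cfa_df(n_items=44, n_factors=11)
```

Under these assumptions the count is 990 − 143 = 847. Estimating the same model separately in two or three groups doubles or triples this figure (1,694 and 2,541), which corresponds to the Model 1 degrees of freedom listed in Table 3.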
Table 1: Cronbach’s alphas, confirmatory factor analysis (CFA) loadings, and multiple-indicator, multiple-cause modeling standardized betas
Dimension | Cronbach’s α ES / HS / UNI | CFA loadings range (mean) ES / HS / UNI | HS (0) vs. ES (1) | HS (0) vs. UNI (1)
Adaptive cognition
Self-efficacy | .76 / .77 / .71 | .60–.72 (.67) / .63–.75 (.69) / .54–.71 (.62) | .24*** (ES > HS) | .15*** (U > HS)
Mastery orientation | .82 / .81 / .82 | .69–.79 (.73) / .65–.78 (.72) / .63–.82 (.73) | .30*** (ES > HS) | .31*** (U > HS)
Valuing | .74 / .77 / .70 | .49–.77 (.65) / .55–.76 (.68) / .49–.70 (.61) | .50*** (ES > HS) | .38*** (U > HS)
Higher order | | ES: Range = .84–.90, Mean = .87; HS: Range = .84–.92, Mean = .87; UNI: Range = .75–.89, Mean = .80 | .45*** (ES > HS) | .36*** (U > HS)
Adaptive behavior
Planning | .87 / .77 / .73 | .73–.89 (.80) / .57–.79 (.70) / .33–.91 (.66) | .33*** (ES > HS) | .26*** (U > HS)
Task management | .86 / .82 / .82 | .61–.88 (.78) / .71–.85 (.76) / .62–.87 (.74) | .26*** (ES > HS) | .24*** (U > HS)
Persistence | .79 / .81 / .75 | .63–.79 (.70) / .60–.79 (.71) / .59–.75 (.66) | .25*** (ES > HS) | .24*** (U > HS)
Higher order | | ES: Range = .72–.80, Mean = .76; HS: Range = .84–.88, Mean = .86; UNI: Range = .59–.90, Mean = .74 | .35*** (ES > HS) | .30*** (U > HS)
Impeding/maladaptive cognition
Anxiety | .75 / .77 / .78 | .52–.74 (.65) / .61–.74 (.68) / .55–.82 (.69) | .04*** (ES > HS) | .22*** (U > HS)
Failure avoidance | .84 / .79 / .85 | .61–.85 (.76) / .65–.84 (.70) / .71–.83 (.77) | −.18*** (HS > ES) | −.14*** (HS > U)
Uncertain control | .78 / .79 / .80 | .65–.73 (.69) / .62–.75 (.69) / .62–.82 (.72) | −.50*** (HS > ES) | −.28*** (HS > U)
Higher order | | ES: Range = .51–.87, Mean = .69; HS: Range = .56–.83, Mean = .69; UNI: Range = .51–.74, Mean = .65 | −.47*** (HS > ES) | −.24*** (HS > U)
Maladaptive behavior
Self-handicapping | .82 / .81 / .87 | .68–.77 (.73) / .61–.78 (.72) / .72–.84 (.79) | −.47*** (HS > ES) | −.26*** (HS > U)
Disengagement | .70 / .81 / .72 | .33–.85 (.63) / .65–.84 (.74) / .50–.79 (.65) | −.31*** (HS > ES) | −.13*** (HS > U)
Higher order | | ES: Range = .72–.89, Mean = .81; HS: Range = .70–.87, Mean = .79; UNI: Range = .64–.80, Mean = .72 | −.49*** (HS > ES) | −.24*** (HS > U)
***p < 0.001
Note: ES = elementary school; HS = high school; UNI = university. Means, standard deviations, skewness, and kurtosis are available from the author on request. High school results are bolded to assist readability.
Table 2: Interscale correlations in confirmatory factor analysis: first- and higher order solutions
Each cell: Elementary school / High school / University
First-order correlations | Self-efficacy | Mastery orientation | Valuing | Planning | Task management | Persistence | Anxiety | Failure avoid | Uncertain control | Self-handicapping
Self-efficacy | –
Mastery orientation | .78 / .73 / .60 | –
Valuing | .75 / .76 / .61 | .72 / .78 / .71 | –
Planning | .60 / .55 / .41 | .56 / .54 / .42 | .51 / .57 / .43 | –
Task management | .57 / .58 / .25 | .50 / .56 / .42 | .52 / .58 / .39 | .63 / .79 / .57 | –
Persistence | .71 / .68 / .64 | .58 / .59 / .48 | .52 / .65 / .64 | .59 / .74 / .65 | .63 / .66 / .46 | –
Anxiety | −.08 / .03 / −.08 | .03 / .21 / .17 | .04 / .14 / .08 | −.19 / .11 / .13 | −.11 / .15 / .09 | −.19 / .07 / .08 | –
Failure avoid | −.24 / −.16 / −.24 | −.15 / −.05 / −.11 | −.22 / −.11 / −.28 | −.23 / −.02 / −.15 | −.20 / −.02 / −.10 | −.29 / −.09 / −.31 | .50 / .43 / .39 | –
Uncertain control | −.54 / −.34 / −.50 | −.38 / −.10 / −.12 | −.42 / −.17 / −.13 | −.39 / −.17 / −.21 | −.35 / −.15 / −.10 | −.52 / −.27 / −.38 | .40 / .49 / .47 | .57 / .53 / .45 | –
Self-handicapping | −.47 / −.37 / −.30 | −.37 / −.26 / −.26 | −.49 / −.32 / −.32 | −.36 / −.33 / −.30 | −.36 / −.32 / −.24 | −.45 / −.40 / −.45 | .26 / .19 / .17 | .50 / .45 / .53 | .62 / .53 / .36 | –
Disengagement | −.59 / −.62 / −.47 | −.59 / −.56 / −.36 | −.75 / −.71 / −.63 | −.45 / −.51 / −.26 | −.48 / −.51 / −.26 | −.59 / −.60 / −.54 | .11 / .06 / .10 | .36 / .32 / .40 | .51 / .43 / .39 | .65 / .59 / .51
Higher order correlations | Adaptive cognitions | Adaptive behaviors | Impeding/maladaptive cognitions | Maladaptive behaviors
Adaptive cognitions | –
Adaptive behaviors | .86 / .78 / .77 | –
Impeding/maladaptive cognitions | −.46 / −.16 / −.29 | −.56 / −.14 / −.33 | –
Maladaptive behaviors | −.79 / −.75 / −.69 | −.74 / −.68 / −.66 | .70 / .61 / .73 | –
Multigroup Confirmatory Factor Analysis and Invariance Tests Eight models were tested in each of the multigroup CFAs assessing invariance of factor structure across gender, age, and educational stage. The initial five models related to the first-order factor structure. The first model allowed all factor loadings, uniquenesses, and correlations to be freely estimated; the second held first-order factor loadings invariant across groups; the third held first-order factor loadings and correlations/variances invariant; the fourth held first-order factor loadings and uniquenesses invariant; and the fifth held first-order factor loadings, uniquenesses, and correlations/variances invariant. The final three models focused on invariance of higher order loadings and correlations/variances: The sixth freely estimated the higher order loadings and correlations/variances, the seventh held higher order loadings invariant, and the eighth held higher order loadings and correlations/variances invariant. Within-sample invariance tests. For elementary school, results in Table 3 indicate that when successive elements of the first- and higher order factor structure are held invariant across groups, the fit indices are predominantly comparable across (Table 3 also indicates χ², df, and p values) (a) males and females (ranges: CFIs = .97 for first-order and .96 for higher order solutions; NNFIs = .98 for first-order and .97 for higher order solutions; RMSEAs = .05 for first-order and higher order solutions) and (b) younger (9–11.5 years) and older (11.5–13 years) students (ranges: CFIs = .97 for first-order and .96 for higher order solutions; NNFIs = .96 for first-order and higher order solutions; RMSEAs = .05 for first-order and higher order solutions).
For high school, the fit indices are predominantly comparable across (a) males and females (ranges: CFIs = .98 for first-order and .97 for higher order solutions; NNFIs = .98 for first-order and .97 for higher order solutions; RMSEAs = .04 for first-order and higher order solutions) and (b) early (12–13 years), middle (14–15 years), and late (16–18 years) adolescence (ranges: CFIs = .98 for first-order and .97 for higher order solutions; NNFIs = .98 for first-order and .97 for higher order solutions; RMSEAs = .04 for first-order and higher order solutions). For university, the fit indices are predominantly comparable across (a) males and females (ranges: CFI = .93 to .94 for first-order and .92 to .93 for higher order solutions; NNFIs = .93 for first-order and .92 for higher order solutions; RMSEAs = .06 for first-order and higher order solutions) and (b) younger (17–19 years) and older (20 or more years) students (ranges: CFIs = .94 for first-order and .92 for higher order solutions; NNFIs = .93 for first-order and .92 for higher order solutions; RMSEAs = .05 to .06 for first-order and .06 for higher order solutions). For all three samples, the application of recommended criteria for evidence of lack of invariance (i.e., a change of .01 in fit indices; see Cheung & Rensvold, 2002) indicates that there is invariance across groups.
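The fit-index criterion applied above can be sketched as a small check. It assumes that a lack of invariance is flagged only when the comparative fit index drops by more than .01 between a less constrained and a more constrained model, a reading consistent with the chapter treating the university gender range of .94 to .93 as evidence of invariance. The function names are hypothetical, and decimal arithmetic is used so that a drop of exactly .01 is not misclassified by floating-point rounding.

```python
from decimal import Decimal

def cfi_drop(less_constrained, more_constrained):
    """Drop in CFI when invariance constraints are added (inputs as strings)."""
    return Decimal(less_constrained) - Decimal(more_constrained)

def suggests_noninvariance(less_constrained, more_constrained, criterion="0.01"):
    """True when the drop in CFI exceeds the assumed criterion."""
    return cfi_drop(less_constrained, more_constrained) > Decimal(criterion)

# University sample, males versus females, first-order solutions: CFI .94 to .93.
university_gender_flag = suggests_noninvariance("0.94", "0.93")
```

Whether a change of exactly .01 should count as evidence against invariance is a judgment call; the strict inequality here is an assumption made for illustration.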
Table 3: Invariance tests across gender and age group
Each cell: Elementary school / High school / University
Model | χ² | df | Comparative fit index | Nonnormed fit index | Root mean square error of approximation
Invariance across males and females
First-order parameters are free (Model 1: no invariance) | 2,947 / 28,707 / 2,720 | 1,694 / 1,694 / 1,694 | .97 / .98 / .94 | .97 / .98 / .93 | .05 / .04 / .05
First-order factor loadings invariant (Model 2) | 3,084 / 28,859 / 2,761 | 1,727 / 1,727 / 1,727 | .97 / .98 / .94 | .97 / .98 / .93 | .05 / .04 / .05
Model 2 + correlations/variances invariant | 3,165 / 29,343 / 2,923 | 1,793 / 1,793 / 1,793 | .97 / .98 / .94 | .97 / .98 / .93 | .05 / .04 / .06
Model 2 + uniquenesses invariant | 3,108 / 31,109 / 2,983 | 1,771 / 1,771 / 1,771 | .97 / .98 / .94 | .97 / .98 / .93 | .05 / .04 / .06
Model 2 + correlations/variances, uniquenesses invariant | 3,269 / 31,759 / 3,162 | 1,837 / 1,837 / 1,837 | .97 / .98 / .93 | .96 / .98 / .93 | .05 / .04 / .06
Higher order parameters free | 3,409 / 39,563 / 3,285 | 1,849 / 1,849 / 1,849 | .96 / .97 / .93 | .96 / .97 / .92 | .05 / .04 / .06
Higher order factor loadings invariant (Model 3) | 3,413 / 39,595 / 3,320 | 1,855 / 1,855 / 1,855 | .96 / .97 / .93 | .96 / .97 / .92 | .05 / .04 / .06
Model 3 + correlations/variances invariant | 3,558 / 40,077 / 3,455 | 1,876 / 1,876 / 1,876 | .96 / .97 / .92 | .96 / .97 / .92 | .05 / .04 / .06
Invariance across age groups
First-order parameters are free (Model 1: no invariance) | 3,011 / 30,639 / 2,728 | 1,694 / 2,541 / 1,694 | .97 / .98 / .94 | .96 / .98 / .93 | .05 / .04 / .05
First-order factor loadings invariant (Model 2) | 3,036 / 31,021 / 2,792 | 1,727 / 2,607 / 1,727 | .97 / .98 / .94 | .96 / .98 / .93 | .05 / .04 / .05
Model 2 + correlations/variances invariant | 3,156 / 32,005 / 2,924 | 1,793 / 2,739 / 1,793 | .97 / .98 / .94 | .96 / .98 / .93 | .05 / .04 / .06
Model 2 + uniquenesses invariant | 2,993 / 32,800 / 2,875 | 1,771 / 2,695 / 1,771 | .97 / .98 / .94 | .96 / .98 / .93 | .05 / .04 / .06
Model 2 + correlations/variances, uniquenesses invariant | 3,091 / 33,857 / 3,004 | 1,837 / 2,827 / 1,837 | .96 / .98 / .94 | .96 / .98 / .93 | .05 / .04 / .06
Higher order parameters free | 3,320 / 41,931 / 3,208 | 1,849 / 2,812 / 1,849 | .96 / .97 / .93 | .96 / .97 / .92 | .05 / .04 / .06
Higher order factor loadings invariant (Model 3) | 3,325 / 42,050 / 3,217 | 1,855 / 2,824 / 1,855 | .96 / .97 / .93 | .96 / .97 / .93 | .05 / .04 / .06
Model 3 + correlations/variances invariant | 3,364 / 42,582 / 3,261 | 1,876 / 2,866 / 1,876 | .96 / .97 / .92 | .96 / .97 / .92 | .05 / .04 / .06
Note: High school results are bolded to assist readability. All chi-square values significant at p < .001. Maximum 90% confidence interval range for all first-order root mean square errors of approximation (RMSEAs) = .04 to .06. Maximum 90% confidence interval range for all higher order RMSEAs = .04 to .07.
Between-sample invariance tests. The final set of invariance tests assessed first- and higher order factor structure across elementary school, high school, and university samples. This is a direct assessment of the generalizability of the framework and measurement across diverse settings. Fit indices in Table 4 (Table 4 also indicates χ², df, and p values) show that when successive elements of the factor structure are held invariant across high school and university samples on the original 7-point rating scale (ranges: CFIs and NNFIs = .98 for first-order and higher order solutions; RMSEAs = .04 for first-order and higher order solutions), there is invariance across all first-order and higher order parameters. In terms of elementary school, high school, and university samples on a common 5-point scale (the common 5-point rating scale was derived by aggregating the first and last 2 points of the 7-point rating scale), there is also invariance across the three samples (ranges: CFIs and NNFIs = .98 for first-order and higher order solutions; RMSEAs = .04 for first-order and higher order solutions). Finally, when assessing invariance between elementary school and university samples (thereby omitting the extremely large high school sample that could bias invariance findings), there is also evidence of invariance when aspects of factor structure (loadings, correlations/variances, uniquenesses) are systematically constrained to be equal (ranges: CFI = .96 to .97 for first-order and .96 for higher order solutions; NNFI = .96 to .97 for first-order and .96 for higher order solutions; RMSEAs = .05 for first-order and higher order solutions). For each of these three sets of between-sample invariance tests, the application of recommended criteria for evidence of lack of invariance (i.e., a change of .01 in fit indices) indicates that there is invariance across elementary school, high school, and university domains.
Multiple-Indicator, Multiple-Cause Modeling
The previous analyses explored possible differences in factor structure as a function of educational stage. It was also of interest to explore possible mean-level differences in motivation and engagement as a function of educational stage (elementary school, high school, university). Multiple-indicator, multiple-cause (MIMIC) modeling was the analytical method used to examine this and involved structural equation models in which educational stage was used as a predictor of the first- and higher order factors of the wheel. The first-order model yielded a good fit to the data (χ² = 39,347.85, df = 914, p < .001, CFI = .95, NNFI = .94, RMSEA = .04), as did the higher order model (χ² = 45,508.66, df = 966, p < .001, CFI = .95, NNFI = .94, RMSEA = .05). Beta coefficients are presented in Table 1 along with the main effects for educational stage. Results show that there are significant stage differences on all motivation and engagement factors. Compared with high school students, elementary school and university students are significantly higher on all
Table 4: Invariance tests across samples

Model                                                          χ²      df   CFI   NNFI   RMSEA

Invariance high school and university (7-point scale)
First-order parameters are free (Model 1: no invariance)   28,875   1,694   .98    .98     .04
First-order factor loadings invariant (Model 2)            29,002   1,727   .98    .98     .04
Model 2 + correlations/variances invariant                 29,249   1,793   .98    .98     .04
Model 2 + uniquenesses invariant                           29,110   1,771   .98    .98     .04
Model 2 + correlations/variances, uniquenesses invariant   29,291   1,837   .98    .98     .04
Higher order parameters free                               37,548   1,849   .98    .98     .04
Higher order factor loadings invariant (Model 3)           37,563   1,855   .98    .98     .04
Model 3 + correlations/variances invariant                 37,609   1,876   .98    .98     .04

Invariance elementary, high school, university (5-point scale)
First-order parameters are free (Model 1: no invariance)   26,203   2,541   .98    .98     .04
First-order factor loadings invariant (Model 2)            26,645   2,607   .98    .98     .04
Model 2 + correlations/variances invariant                 27,480   2,739   .98    .98     .04
Model 2 + uniquenesses invariant                           26,878   2,695   .98    .98     .04
Model 2 + correlations/variances, uniquenesses invariant   27,550   2,827   .98    .98     .04
Higher order parameters free                               34,745   2,812   .98    .98     .04
Higher order factor loadings invariant (Model 3)           34,823   2,824   .98    .98     .04
Model 3 + correlations/variances invariant                 35,171   2,866   .98    .98     .04

Invariance elementary and university (5-point scale)
First-order parameters are free (Model 1: no invariance)    3,472   1,694   .97    .97     .05
First-order factor loadings invariant (Model 2)             3,657   1,727   .97    .97     .05
Model 2 + correlations/variances invariant                  3,931   1,793   .97    .96     .05
Model 2 + uniquenesses invariant                            3,895   1,771   .97    .96     .05
Model 2 + correlations/variances, uniquenesses invariant    4,181   1,837   .96    .96     .05
Higher order parameters free                                4,403   1,849   .96    .96     .05
Higher order factor loadings invariant (Model 3)            4,419   1,855   .96    .96     .05
Model 3 + correlations/variances invariant                  4,561   1,876   .96    .96     .05

Note: CFI = comparative fit index; NNFI = nonnormed fit index; RMSEA = root mean square error of approximation. All chi-square values significant at p < .001. Maximum 90% confidence interval range for all first-order RMSEAs = .03 to .05. Maximum 90% confidence interval range for all higher order RMSEAs = .04 to .05.
adaptive dimensions. Also, compared with high school students, elementary school and university students are significantly lower in uncertain control, self-handicapping, and disengagement. However, compared with high school students, elementary school and university students are significantly higher on anxiety and failure avoidance. As a general finding, there is a greater difference between elementary and high school students than between high school and university students. Again, however, note that the high school and university 1-to-7 rating continuum was transformed to a 1-to-5 rating continuum to place high school and university on the same scale of measurement as elementary school; hence, caution is advised when interpreting these findings. Due to the large high school sample, caution is also advised when interpreting the significance of the MIMIC results, and this being the case, greater emphasis is given to findings in relation to self-efficacy, mastery orientation, valuing of school, planning, task management, persistence, uncertain control, and self-handicapping that yielded standardized beta values greater than .30.
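The transformation of the 1-to-7 continuum to a 1-to-5 continuum can be sketched as follows. The study states only that the first and last 2 points of the 7-point scale were aggregated, so the exact mapping below (1 and 2 become 1, 6 and 7 become 5, the middle points shift down by one) is an assumed reading, and the function name is hypothetical.

```python
# Assumed mapping: the two lowest and the two highest points of the 7-point
# scale are merged; the three middle points shift down by one.
SEVEN_TO_FIVE = {1: 1, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 5}


def collapse_to_five_point(response: int) -> int:
    """Recode a single 7-point response onto the common 5-point scale."""
    if response not in SEVEN_TO_FIVE:
        raise ValueError(f"expected an integer from 1 to 7, got {response!r}")
    return SEVEN_TO_FIVE[response]


# Hypothetical responses covering every point of the original scale.
responses = [1, 2, 3, 4, 5, 6, 7]
print([collapse_to_five_point(r) for r in responses])  # [1, 1, 2, 3, 4, 5, 5]
```

Because two pairs of categories are merged, the recoded scores carry less variance at the extremes than the original ratings, which is one reason the cross-stage comparisons are interpreted with caution.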
Motivation, Engagement, and Between-Network Cognate Correlates
As indicated earlier, consistent with the between-network construct validity approach, it was of interest to explore the nature of relationships between each facet of motivation and a set of key between-network correlates across the three educational stages. To this end, the three samples were also administered items that explored enjoyment of school or university (elementary school, high school, university), class participation (elementary school, high school, university), positive academic intentions (high school, university), academic buoyancy (elementary school, high school, university), and homework completion (high school, university). For each of the three samples, first- and higher order CFAs were conducted. The first-order elementary school CFA yielded a very good fit to the data (χ² = 2,915.33, df = 1,393, p < .001, CFI = .98, NNFI = .98, RMSEA = .04) and showed that (a) adaptive dimensions are significantly positively associated with these between-network constructs and (b) impeding/maladaptive and maladaptive dimensions (particularly uncertain control, self-handicapping, and disengagement) are negatively correlated with these constructs. Table 5 presents findings. These first-order findings were broadly supported in the high school sample (χ² = 52,112, df = 1,650, p < .001, CFI = .98, NNFI = .98, RMSEA = .04) and the university sample (χ² = 3,251.39, df = 1,650, p < .001, CFI = .96, NNFI = .96, RMSEA = .05). Interestingly, and consistent with Martin (2007a; see also Martin & Marsh, 2006, 2008a, 2008b), academic buoyancy is a notable exception in being more markedly correlated with impeding/maladaptive cognitions than with maladaptive behaviors,
Table 5: First- and higher order correlations with between-network constructs
(each cell: Elementary school / High school / University)

Dimension                          Enjoyment            Participation        Positive intent      Homework completion  Buoyancy

First-order correlations
Adaptive cognitions
  Self-efficacy                    .43 / .57 / .45      .44 / .51 / .45      − / .67 / .68        − / .35 / .05        .42 / .38 / .41
  Mastery orientation              .57 / .55 / .37      .48 / .45 / .36      − / .56 / .56        − / .34 / .01        .35 / .20 / .16
  Valuing                          .55 / .63 / .51      .42 / .46 / .44      − / .68 / .72        − / .39 / .15        .31 / .25 / .26
Adaptive behaviors
  Planning                         .40 / .49 / .21      .40 / .46 / .34      − / .49 / .30        − / .42 / .12        .47 / .35 / .19
  Task management                  .40 / .48 / .11      .33 / .41 / .26      − / .50 / .23        − / .40 / .11        .39 / .27 / .13
  Persistence                      .46 / .54 / .35      .51 / .48 / .41      − / .60 / .53        − / .48 / .13        .53 / .37 / .27
Impeding/maladaptive cognitions
  Anxiety                          −.11 / −.04 / −.23   −.16 / −.08 / −.15   − / .02 / −.10       − / .04 / −.06       −.62 / −.69 / −.74
  Failure avoidance                −.16 / −.17 / −.33   −.24 / −.15 / −.19   − / −.18 / −.35      − / −.14 / −.11      −.34 / −.31 / −.39
  Uncertain control                −.28 / −.26 / −.29   −.40 / −.25 / −.24   − / −.32 / −.31      − / −.24 / .05       −.52 / −.47 / −.54
Maladaptive behaviors
  Self-handicapping                −.32 / −.34 / −.28   −.40 / −.30 / −.30   − / −.40 / −.34      − / −.37 / −.19      −.29 / −.25 / −.25
  Disengagement                    −.67 / −.68 / −.57   −.49 / −.46 / −.33   − / −.68 / −.67      − / −.47 / −.19      −.29 / −.29 / −.23

Higher order correlations
  Adaptive cognitions              .59 / .67 / .55      .52 / .54 / .52      − / .73 / .81        − / .42 / .10        .41 / .31 / .34
  Adaptive behaviors               .55 / .59 / .32      .55 / .53 / .46      − / .62 / .50        − / .50 / .16        .61 / .39 / .28
  Impeding/maladaptive cognitions  −.28 / −.20 / −.38   −.41 / −.21 / −.26   − / −.19 / −.29      − / −.12 / −.06      −.66 / −.74 / −.87
  Maladaptive behaviors            −.66 / −.71 / −.62   −.54 / −.50 / −.41   − / −.72 / −.73      − / −.53 / −.11      −.33 / −.33 / −.30

Note: Elementary school r > +/−.07, significant at p < .05; high school r > +/−.02 significant at p < .05 (but note large sample); university r > +/−.12 significant at p < .05. A lone "−" marks a construct not administered to the elementary school sample.
largely a function of its very high correlation with anxiety (discussed fully in Martin & Marsh, 2006, 2008a, 2008b). Again, however, due to the large high school sample, caution is advised when interpreting the correlations – emphasis is given to the size and direction of the correlation coefficients themselves rather than to their significance levels. The higher order factor analysis for elementary school (χ² = 3,361.64, df = 1,453, p < .001, CFI = .97, NNFI = .97, RMSEA = .05) provides general support for the first-order findings. Higher order correlations are also presented in Table 5 (again, due to the large samples involved, emphasis is given to the size and direction of the correlation coefficients themselves rather than to their significance levels). Consistent with the elementary school findings, the higher order factor analysis for high school (χ² = 67,868.55, df = 1,724, p < .001, CFI = .98, NNFI = .98, RMSEA = .04) provides support for the first-order findings, as does the higher order model for the university sample (χ² = 3,683.58, df = 1,724, p < .001, CFI = .95, NNFI = .95, RMSEA = .05).
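As a worked check on how the reported fit statistics relate to one another, the sketch below recomputes the RMSEA point estimate for the two high school models from their χ² and df values. It assumes the common single-group formula, sqrt(max(χ² − df, 0) / (df × (N − 1))), and assumes that both models were estimated on the full high school sample of 21,579. Software packages differ in minor details (for example, N versus N − 1), so this is an approximation rather than a reproduction of the original output.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


N_HIGH_SCHOOL = 21_579  # high school sample size reported in the study

first_order = rmsea(52_112, 1_650, N_HIGH_SCHOOL)
higher_order = rmsea(67_868.55, 1_724, N_HIGH_SCHOOL)

# Both values round to the .04 reported in the text.
print(round(first_order, 2), round(higher_order, 2))  # 0.04 0.04
```

The example also shows why a very large χ² can coexist with a small RMSEA: the statistic is scaled by sample size, so misfit per degree of freedom stays small when N is in the tens of thousands.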
Discussion
Through the integration of multivariate measurement and the hypothesized motivation and engagement framework, this study supports the developmental construct validity of motivation and engagement at the elementary school, high school, and university/college levels. From this developmental construct validity perspective, perhaps the most significant yield of this study is the predominantly comparable findings across three very distinct educational stages. The data confirm the hypothesized generality of the wheel and its accompanying instrumentation among very young students in elementary school through to mature-age students in university. In some ways, the most revealing tests are the multigroup invariance analyses across the elementary school, high school, and university samples. These analyses directly address the question posed at the outset of the study regarding the generality of the proposed motivation and engagement framework in diverse educational settings. The invariance data suggest that there is generality – and developmental validity – of the framework across the academic life span. Notwithstanding the important consistencies across the three educational stages, findings also suggest issues distinct to each academic setting. For example, the data show that elementary school students reflect higher levels of motivation and engagement, and this is consistent with prior work showing declines between elementary and middle or high school (e.g., Anderman & Midgley, 1997; Roeser et al., 2000; Wigfield et al., 1991; Wigfield & Tonks, 2002). In terms of university students, there is some question as to their level of motivation relative to school students, with some research recognizing
the challenges they face in higher education and other research reporting on their confidence in their abilities (e.g., see Martin, Marsh, Williamson, et al., 2003; Pitts, 2005). The present data shed light on these competing views by showing that, notwithstanding equivalence in factor structure, university students reflect higher mean levels of motivation and engagement than do their high school counterparts. In the case of all MIMIC analyses, however, due to the large samples involved, emphasis is given to the size and direction of the standardized beta coefficients rather than to the attained significance levels. Because the constructs within the wheel have a theoretical basis, researchers are able to draw on theory to provide direction for intervention aimed at addressing facets within the wheel. Research shows that targeted intervention is more effective than intervention that does not focus on specific target behaviors (O’Mara, Marsh, Craven, & Debus, 2006), and so it is proposed that intervention programs seeking to build specific academic skills and competencies need to provide targeted support that can do this. The wheel provides a basis for doing so. Martin (2007a; see also Martin, 2008b, in press-a, for strategy in sport and music settings) has proposed specific classroom strategy that targets each of these dimensions, and this strategy incorporated into intervention work has demonstrated significant yields for students (Martin, 2005a, 2008a). In addition to what Martin (2007a) suggests in terms of specific classroom strategy, there are other approaches to intervention that have more of a measurement basis to them. One such approach that Martin (2008b) has previously proposed in relation to motivation and engagement involves performance profiling. 
Performance profiling (Butler & Hardy, 1992) has very direct synergies with the wheel both in form and substance – indeed, Martin (2008b) has demonstrated how performance profiling can be conducted with the wheel in the domains of sport and music. Performance profiling provides a means by which to effectively and parsimoniously contextualize individuals’ profiles in reference to a set of psychological and behavioral criteria. Although there are various ways and levels to profile under a performance profiling schedule, the example in the present study is the mean-level profile (rounded) for the high school sample as a whole (n = 21,579). In Figure 2, the traditional performance profiling format (see Butler & Hardy, 1992; see also Martin, 2008b; Weinberg & Gould, 1999) has been adapted to interface with the Motivation and Engagement Wheel. At the individual level, the profile would reflect the student’s mean scores on each dimension. Alternatively, it could be readily employed at a class or school level, bringing into focus the issue of multilevel models of motivation and engagement (for multilevel research along these lines, see Marsh, Martin, & Cheng, 2008; Martin & Marsh, 2005).
[Figure 2 image: wheel-format performance profile with a 1-to-7 scale on each dimension. Adaptive cognition: self-efficacy, mastery orientation, valuing. Adaptive behavior: planning, task management, persistence. Impeding/maladaptive cognition: anxiety, failure avoidance, uncertain control. Maladaptive behavior: self-handicapping, disengagement.]
Source: Adapted from Butler and Hardy (1992), Martin (in press-b), and Weinberg and Gould (1999).
Figure 2: Performance profile for motivation and engagement, reflecting mean level/7 (rounded to nearest 0.5) profile for high school sample (n = 21,579)
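The rounding step behind a profile of this kind can be sketched as follows. The mean scores are hypothetical placeholders on the 1-to-7 scale, not values from the study, and the helper name and the choice to round exact ties upward are assumptions (the study states only that means were rounded to the nearest 0.5).

```python
import math


def round_to_half(score: float) -> float:
    """Round to the nearest 0.5; exact ties are rounded up (an assumed convention)."""
    return math.floor(score * 2 + 0.5) / 2


# Hypothetical dimension means on the 1-7 scale; not values from the study.
mean_scores = {
    "self-efficacy": 5.83,
    "planning": 4.12,
    "anxiety": 4.75,
    "disengagement": 2.26,
}

profile = {dimension: round_to_half(mean) for dimension, mean in mean_scores.items()}
print(profile)
```

The explicit floor-based helper is used because Python's built-in round resolves ties to the nearest even integer, which would make tie handling less predictable for a profile chart. The same function could be applied to an individual student's means or to class-level or school-level means.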
Limitations, Future Directions, and Conclusion
This study provides an enhanced understanding of the validity of motivation and engagement in the context of three educational stages: elementary school, high school, and university. There are, however, a number of potential limitations important to consider when interpreting findings. First, although the large sample involved in the study is a distinct strength of the research, it posed some challenges when interpreting data, with the need to emphasize the practical significance of findings as much as or more than their statistical significance. It is also important to recognize that the data presented in this study are all self-reported. Although this is a logical and defensible methodology in its own right given the substantive focus, it is important to conduct research that examines the same constructs using data derived from additional sources such as achievement data and reports from teachers and parents. Just as important as the self-report nature of findings is the fact that the data presented in the study are cross-sectional. Tracking the same students over time and assessing factor structure and interrelationships from a longitudinal perspective would shed further light on the developmental processes
relevant to motivation and engagement. In addition, examining reliability and stability of the scores over time and the causal ordering of motivation and engagement in relation to the cognate constructs assessed here are other issues of interest in longitudinal work. The nature of quantitative survey-based methods also warrants some further comment. Although Martin, Marsh, Williamson, and Debus (2003) conducted qualitative work among university samples, future research might encompass qualitative work that can more fully scope the detailed nature and extent of motivation and engagement across the academic life span. Alongside this qualitative work, there may also be yields in multilevel approaches to developmental construct validity in motivation and engagement. Advances in statistical software enable researchers to more accurately assess the relative influence of individual-, class-, and school-level factors using multilevel modeling (see Goldstein, 2003), and so future research can readily explore the influence of class- and school-level motivation climates relative to individual-level variation in motivation and engagement as relevant to developmental construct validity. To conclude, the research presented here supports the developmental construct validity of the Motivation and Engagement Wheel and its accompanying instrumentation, the MES, across the academic life span. The findings of this investigation hold implications for researchers studying issues relevant to motivation and engagement across the academic life span. The findings also present new insights and opportunities for educators seeking to enhance the educational outcomes of their students – outcomes that are affected by motivation and engagement and the extent to which educators can effectively measure and enhance them.
References
Anderman, E. A., & Midgley, C. (1997). Changes in personal achievement goals and the perceived classroom goal structures across the transition to middle level schools. Contemporary Educational Psychology, 22, 269–298.
Atkinson, J. W. (1957). Motivational determinants of risk-taking. Psychological Review, 64, 359–372.
Bandura, A. (1997). Self-efficacy: The exercise of control. New York: Freeman.
Beck, A. T. (1995). Cognitive therapy: Basics and beyond. New York: Guilford.
Bong, M. (1996). Problems in academic motivation research and advantages and disadvantages of their solutions. Contemporary Educational Psychology, 21, 149–165.
Buss, D. W., & Cantor, N. (1989). Personality psychology: Recent trends and emerging directions. New York: Springer-Verlag.
Butler, R. J., & Hardy, L. (1992). The performance profile: Theory and application. Sport Psychologist, 6, 253–264.
Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9, 233–255.
Connell, J. P. (1985). A new multidimensional measure of children’s perceptions of control. Child Development, 56, 1018–1041.
Covington, M. V. (1992). Making the grade: A self-worth perspective on motivation and school reform. Cambridge, UK: Cambridge University Press.
Dweck, C. S. (1986). Motivational processes affecting learning. American Psychologist, 41, 1040–1048.
Eccles, J. (1983). Expectancies, values, and academic behaviors. In J. Spence (Ed.), Achievement and achievement motivation.
Elliot, A. J., & Church, M. A. (1997). A hierarchical model of approach and avoidance achievement motivation. Journal of Personality and Social Psychology, 72, 218–232.
Elliot, A. J., & Sheldon, K. M. (1997). Avoidance achievement motivation: A personal goals analysis. Journal of Personality and Social Psychology, 73, 171–185.
Finn, J. D. (1989). Withdrawing from school. Review of Educational Research, 59, 117–142.
Goldstein, H. (2003). Multilevel statistical models. London: Edward Arnold.
Graham, J. W., & Hoffer, S. M. (2000). Multiple imputation in multivariate research. In T. D. Little, K. U. Schnable, & J. Baumert (Eds.), Modeling longitudinal and multilevel data: Practical issues, applied approaches, and specific examples (pp. 201–218). Mahwah, NJ: Lawrence Erlbaum.
Green, J., Martin, A. J., & Marsh, H. W. (2007). Motivation and engagement in English, mathematics and science high school subjects: Towards an understanding of multidimensional domain specificity. Learning and Individual Differences, 17, 269–279.
Greeno, J. (1998). The situativity of knowing, learning, and research. American Psychologist, 53, 5–26.
Hattie, J. (1992). Self-concept. Hillsdale, NJ: Lawrence Erlbaum.
Henson, R. K. (2001). Understanding internal consistency reliability estimates: A conceptual primer on coefficient alpha. Measurement and Evaluation in Counseling and Development, 34, 177–189.
Jöreskog, K. G., & Sörbom, D. (1993). LISREL 8: Structural equation modeling with the SIMPLIS command language. Chicago: Scientific Software International.
Jöreskog, K. G., & Sörbom, D. (2006). LISREL 8.80. Chicago: Scientific Software International.
Kaplan, D. (2000). Structural equation modeling: Foundations and extensions. Thousand Oaks, CA: Sage.
Lee, F. K., Sheldon, K. M., & Turban, D. B. (2003). Personality and goal-striving process: The influence of achievement goal patterns, goal level, and mental focus on performance and enjoyment. Journal of Applied Psychology, 88, 256–265.
Marsh, H. W. (1993). The multidimensional structure of academic self-concept: Invariance over gender and age. American Educational Research Journal, 30, 841–860.
Marsh, H. W. (2002). A multidimensional physical self-concept: A construct validity approach to theory, measurement, and research. Psychology: The Journal of the Hellenic Psychological Society, 9, 459–493.
Marsh, H. W., & Hau, K.-T. (2007). Applications of latent-variable models in educational psychology: The need for methodological-substantive synergies. Contemporary Educational Psychology, 32, 151–171.
Marsh, H. W., Martin, A. J., & Cheng, J. (2008). A multilevel perspective on gender in classroom motivation and climate: Potential benefits of male teachers for boys? Journal of Educational Psychology, 100, 78–95.
Marsh, H. W., & Shavelson, R. J. (1985). Self-concept: Its multifaceted, hierarchical structure. Educational Psychologist, 20, 107–125.
Martin, A. J. (2001). The Student Motivation Scale: A tool for measuring and enhancing motivation. Australian Journal of Guidance and Counselling, 11, 1–20.
Martin, A. J. (2002). Motivation and academic resilience: Developing a model of student enhancement. Australian Journal of Education, 14, 34–49.
Martin, A. J. (2003a). How to motivate your child for school and beyond. Sydney, Australia: Bantam.
Martin, A. J. (2003b). The relationship between parents’ enjoyment of parenting and children’s school motivation. Australian Journal of Guidance and Counselling, 13, 115–132.
Martin, A. J. (2003c). The Student Motivation Scale: Further testing of an instrument that measures school students’ motivation. Australian Journal of Education, 47, 88–106.
Martin, A. J. (2004). School motivation of boys and girls: Differences of degree, differences of kind, or both? Australian Journal of Psychology, 56, 133–146.
Martin, A. J. (2005a). Exploring the effects of a youth enrichment program on academic motivation and engagement. Social Psychology of Education, 8, 179–206.
Martin, A. J. (2005b). Perplexity and passion: Further consideration of the role of positive psychology in the workplace. Journal of Organizational Behavior Management, 24, 203–205.
Martin, A. J. (2005c). The role of positive psychology in enhancing satisfaction, motivation, and productivity in the workplace. Journal of Organizational Behavior Management, 24, 113–133.
Martin, A. J. (2006). The relationship between teachers’ perceptions of student motivation and engagement and teachers’ enjoyment of and confidence in teaching. Asia-Pacific Journal of Teacher Education, 34, 73–93.
Martin, A. J. (2007a). Examining a multidimensional model of student motivation and engagement using a construct validation approach. British Journal of Educational Psychology, 77, 413–440.
Martin, A. J. (2007b). The Motivation and Engagement Scale. Sydney, Australia: Lifelong Achievement Group. www.lifelongachievement.com.
Martin, A. J. (2008a). Enhancing student motivation and engagement: The effects of a multidimensional intervention. Contemporary Educational Psychology, 33, 239–269.
Martin, A. J. (2008b). Motivation and engagement in music and sport: Testing a multidimensional framework in diverse performance settings. Journal of Personality, 76, 135–170.
Martin, A. J. (in press-a). How domain specific are motivation and engagement across school, sport, and music? A substantive-methodological synergy assessing young sportspeople and musicians. Contemporary Educational Psychology.
Martin, A. J. (in press-b). Motivation and engagement in diverse performance domains: Testing their generality across school, university/college, work, sport, music, and daily life. Journal of Research in Personality.
Martin, A. J., Craven, R. G., & Munns, G. (2006). Motivation and engagement in young children: How well does a high school conceptualization generalize to junior school? In R. G. Craven, J. S. Eccles, & T. M. Ha (Eds.), Self-concept, motivation, social and personal identity for the 21st century. Proceedings of the Fourth International Biennial SELF Research Conference, Ann Arbor, University of Michigan.
Martin, A. J., & Debus, R. L. (1998). Self-reports of mathematics self-concept and educational outcomes: The roles of ego-dimensions and self-consciousness. British Journal of Educational Psychology, 68, 517–535.
Martin, A. J., & Marsh, H. W. (2003). Fear of failure: Friend or foe? Australian Psychologist, 38, 31–38.
Martin, A. J., & Marsh, H. W. (2005). Motivating boys and motivating girls: Does teacher gender really make a difference? Australian Journal of Education, 49, 320–334.
Martin, A. J., & Marsh, H. W. (2006). Academic resilience and its psychological and educational correlates: A construct validity approach. Psychology in the Schools, 43, 267–282.
Martin, A. J., & Marsh, H. W. (2008a). Academic buoyancy: Towards an understanding of students’ everyday academic resilience. Journal of School Psychology, 46, 53–83.
Martin, A. J., & Marsh, H. W. (2008b). Workplace and academic buoyancy: Psychometric assessment and construct validity amongst school personnel and students. Journal of Psychoeducational Assessment, 26, 168–184.
Martin, A. J., Marsh, H. W., & Debus, R. L. (2001a). A quadripolar need achievement representation of self-handicapping and defensive pessimism. American Educational Research Journal, 38, 583–610.
Martin, A. J., Marsh, H. W., & Debus, R. L. (2001b). Self-handicapping and defensive pessimism: Exploring a model of predictors and outcomes from a self-protection perspective. Journal of Educational Psychology, 93, 87–102.
Martin, A. J., Marsh, H. W., & Debus, R. L. (2003). Self-handicapping and defensive pessimism: A model of self-protection from a longitudinal perspective. Contemporary Educational Psychology, 28, 1–36.
Martin, A. J., Marsh, H. W., Williamson, A., & Debus, R. L. (2003). Self-handicapping, defensive pessimism, and goal orientation: A qualitative study of university students. Journal of Educational Psychology, 95, 617–628.
Martin, A. J., Milne-Home, J., Barrett, J., & Spalding, E. (1997). Stakeholder perceptions of the institution: To agree or not to agree. Journal of Institutional Research in Australasia, 6, 53–67.
Martin, A. J., Milne-Home, J., Barrett, J., Spalding, E., & Jones, G. (2000). Graduate satisfaction with university and perceived employment preparation. Journal of Education and Work, 13, 199–214.
Martin, A. J., Tipler, D. V., Marsh, H. W., Richards, G. E., & Williams, M. R. (2006). Assessing multidimensional physical activity motivation: A construct validity study of high school students. Journal of Sport and Exercise Psychology, 28, 171–192.
McClelland, D. C. (1965). Toward a theory of motive acquisition. American Psychologist, 20, 321–333.
McDonald, R. P., & Marsh, H. W. (1990). Choosing a multivariate model: Noncentrality and goodness-of-fit. Psychological Bulletin, 107, 247–255.
Meece, J. L., Wigfield, A., & Eccles, J. S. (1990). Predictors of mathematics anxiety and its influence on young adolescents’ course enrolment intentions and performance in mathematics. Journal of Educational Psychology, 82, 60–70.
Miller, R. B., Greene, B. A., Montalvo, G. P., Ravindran, B., & Nichols, J. D. (1996). Engagement in academic work: The role of learning goals, future consequences, pleasing others, and perceived ability. Contemporary Educational Psychology, 21, 388–422.
Miserandino, M. (1996). Children who do well in school: Individual differences in perceived competence and autonomy in above-average children. Journal of Educational Psychology, 88, 203–214.
Murphy, P. K., & Alexander, P. A. (2000). A motivated exploration of motivation terminology. Contemporary Educational Psychology, 25, 3–53.
Nicholls, J. G. (1989). The competitive ethos and democratic education. Cambridge, MA: Harvard University Press.
O’Mara, A. J., Marsh, H. W., Craven, R. G., & Debus, R. (2006). Do self-concept interventions make a difference? A synergistic blend of construct validation and meta-analysis. Educational Psychologist, 41, 181–206.
Pajares, F. (1996). Self-efficacy beliefs in achievement settings. Review of Educational Research, 66, 543–578.
Pintrich, P. R. (2000). Educational psychology at the millennium: A look back and a look forward. Educational Psychologist, 35, 221–226.
Pintrich, P. R. (2003). A motivational science perspective on the role of student motivation in learning and teaching contexts. Journal of Educational Psychology, 95, 667–686.
Pintrich, P. R., & DeGroot, E. (1990). Motivational and self-regulated learning components of classroom academic performance. Journal of Educational Psychology, 82, 33–40.
Pintrich, P. R., & Garcia, T. (1991). Student goal orientation and self-regulation in the college classroom. In M. Maehr & P. R. Pintrich (Eds.), Advances in motivation and achievement: Goals and self-regulatory processes (Vol. 7, pp. 371–402). Greenwich, CT: JAI.
Pitts, S. (2005). Valuing musical participation. Hants, UK: Ashgate.
Richter, F. D., & Tjosvold, D. (1980). Effects of student participation in classroom decision making on attitudes, peer interaction, motivation and learning. Journal of Applied Psychology, 65, 74–80.
Roeser, R., Eccles, J. S., & Sameroff, A. J. (2000). School as a context of early adolescents’ academic and social-emotional development: A summary of research findings. Elementary School Journal, 100, 443–471.
Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. American Psychologist, 55, 68–78.
Sarason, I. G., & Sarason, B. R. (1990). Test anxiety. In H. Leitenberg (Ed.), Handbook of social and evaluation anxiety (pp. 475–495). New York: Plenum.
Shavelson, R. J., Hubner, J. J., & Stanton, G. (1976). Self-concept: Validation of construct interpretations. Review of Educational Research, 46, 407–441.
Spielberger, C. D. (1985). Assessment of state and trait anxiety: Conceptual and methodological issues. Southern Psychologist, 2, 6–16.
Stokes, D. (1997). Pasteur’s quadrant: Basic science and technological innovation. Washington, DC: Brookings Institution Press.
Tabachnick, B. G., & Fidell, L. A. (2007). Using multivariate statistics (5th ed.). Boston: Pearson/Allyn & Bacon.
Weinberg, R. S., & Gould, D. (1999). Foundations of sport and exercise psychology (2nd ed.). Champaign, IL: Human Kinetics.
Weiner, B. (1994). Integrating social and personal theories of achievement striving. Review of Educational Research, 64, 557–573.
Wigfield, A., Eccles, J., Mac Iver, D., Reuman, D., & Midgley, C. (1991). Transitions at early adolescence: Changes in children’s domain-specific self-perceptions and general self-esteem across the transition to junior high school. Developmental Psychology, 27, 552–565.
Wigfield, A., & Tonks, S. (2002). Adolescents’ expectancies for success and achievement task values during middle and high school years. In F. Pajares & T. Urdan (Eds.), Academic motivation of adolescents (pp. 53–82). Greenwich, CT: Information Age.
Zimmerman, B. J. (2002). Achieving self-regulation: The trial and triumph of adolescence. In F. Pajares & T. Urdan (Eds.), Academic motivation of adolescents (pp. 1–28). Greenwich, CT: Information Age.
Appendix
Sample Motivation and Engagement Scale Items
Motivation and Engagement Scale–Junior School
Self-efficacy: “If I try hard, I believe I can do my schoolwork well”
Valuing: “Learning at school is important”
Mastery orientation: “I feel very pleased with myself when I really understand what I’m taught at school”
Planning: “Before I start a project, I plan out how I am going to do it”
Task management: “I usually do my homework in places where I can concentrate”
Persistence: “If I can’t understand my schoolwork, I keep going over it until I do”
Anxiety: “When I have a project to do, I worry about it a lot”
Failure avoidance: “The main reason I try at school is because I don’t want to disappoint my parents”
Uncertain control: “When I get a bad mark I don’t know how to stop that happening again”
Self-handicapping: “I sometimes don’t work very hard at school so I can have a reason if I don’t do well”
Disengagement: “I’ve given up being interested in school”
Motivation and Engagement Scale–High School
Self-efficacy: “If I try hard, I believe I can do my schoolwork well”
Valuing: “Learning at school is important”
Mastery orientation: “I feel very pleased with myself when I really understand what I’m taught at school”
Planning: “Before I start an assignment, I plan out how I am going to do it”
Task management: “When I study, I usually study in places where I can concentrate”
Persistence: “If I can’t understand my schoolwork at first, I keep going over it until I do”
Anxiety: “When exams and assignments are coming up, I worry a lot”
Failure avoidance: “Often the main reason I work at school is because I don’t want to disappoint my parents”
Uncertain control: “When I get a bad mark I’m often unsure how I’m going to avoid getting that mark again”
Self-handicapping: “I sometimes don’t study very hard before exams so I have an excuse if I don’t do as well as I hoped”
Disengagement: “I’ve pretty much given up being involved in things at school”
Motivation and Engagement Scale–University/College
Self-efficacy: “If I try hard, I believe I can do my university work well”
Valuing: “Learning at university is important”
Mastery orientation: “I feel very pleased with myself when I really understand what I’m taught at university”
Planning: “Before I start an assignment, I plan out how I am going to do it”
Task management: “When I study, I usually study in places where I can concentrate”
Persistence: “If I can’t understand my university work at first, I keep going over it until I do”
Anxiety: “When exams and assignments are coming up, I worry a lot”
Failure avoidance: “Often the main reason I work at university is because I don’t want to disappoint others”
Uncertain control: “When I get a bad mark I’m often unsure how I’m going to avoid getting that mark again”
Self-handicapping: “I sometimes don’t study very hard before exams so I have an excuse if I don’t do as well as I hoped”
Disengagement: “I’ve pretty much given up being involved in things at university”
62
Motivation and Achievement: A Quantitative Synthesis
Margaret E. Uguroglu and Herbert J. Walberg
Source: American Educational Research Journal, 16(4) (1979): 375–389.
What is the typical correlation between measures of motivation and educational achievement? One recent review, discussed below, suggests a figure of approximately .50. But it is by no means clear that this figure would be estimated from a second sample of studies; and it remains to be shown whether certain motivation constructs such as locus of control are more valid than others, as claimed by some authors, or that the relation of motivation and achievement is stronger for some students, for example, boys or girls or older or younger participants, as held by others. The purpose of this study is to use new analytic techniques for synthesizing research to produce objective estimates of the motivation-achievement correlation and to examine its dependency on types of motivation, achievement, and samples that have been investigated in recent years. A complete discussion of motivation and learning in the context of a 6-factor theory of educational productivity is presented in Walberg & Uguroglu (1979).
Method
Sample
The analysis makes use of two samples of studies with correlations between motivation and achievement measures. The first set of data, which is termed the calibration sample, consists of 122 correlations compiled by Bloom (1976, pp. 252–256) from 22 studies published between 1953 and 1974. Bloom does not explicitly describe the search and selection procedure. No systematic bias appears in the sampling; 56 of the 122 correlations in the calibration sample, however, were taken from a single large-scale study of mathematics achievement by Crosswhite in 1972 (Bloom, 1976), which was taken into account in the analysis in the present paper. The second sample of correlations, termed the validation sample, was obtained from two sources. (The validation sample contained the following 18 studies: Chang, 1976; Cobb, Chissom, & Davis, 1975; Cole, 1974; Frymier, Henning, Henning, Norris, & West, 1975; Handel, 1975; Kagan & Zahn, 1975; Kennelly & Kinley, 1975; Lewis & Adank, 1975; Portes & Wilson, 1976; Prawat, 1976; Prendergast & Binder, 1975; Primavera, Simon, & Primavera, 1974; Pugh, 1976; Schultz & Pomerantz, 1974; Simon & Simon, 1975; Stenner & Katzenmeyer, 1976; White & McConnel, 1974; White & Simmons, 1974.) To obtain a group of studies representative of current psychological research, all studies cited in Psychological Abstracts International for 1974, 1975, and 1976 under the descriptors Motivation, Self-concept, Self-esteem, Internal-External Locus of Control, and Academic Achievement Motivation were scanned. In addition, to obtain a representative set of studies on a basic school subject, 1,578 studies discussed in Weintraub et al.’s (1974, 1975, 1976) “Summary of Investigations Relating to Reading,” published annually in Reading Research Quarterly, were searched. From these two sources, all studies were included that reported one or more Pearson product-moment correlations between any measure of motivation and any educational achievement or ability measure and that did not represent special populations, namely, juvenile delinquents, mentally or physically handicapped, the learning disabled, college students, or adults. Ability correlations were included for comparative purposes. This selection yielded 110 correlations from 18 studies. The combined samples contain 232 correlations between motivation measures and achievement measures based on 40 studies of a total of 36,946 students. These students were in addition to the more than 498,044 represented in four correlations from the Coleman, Campbell, Hobson, McPartland, Mood, Weinfeld, and York (1966) Equality of Educational Opportunity study in the calibration sample. Without the Coleman figures, the size of the samples ranged from 42 to 2,213 with a mean of 438 students and an even distribution across size.
Procedure
Although physical scientists have long quantitatively summarized numerical results across studies, until recently there were few examples in psychological research (Jones & Fiske, 1953). Recent writings of Gage, Glass and Smith, Light and Smith, and Rosenthal, however, make a persuasive case for the need and value of such synthesis in educational and psychological research (see Glass, 1978). The present research makes use of the techniques developed by Glass. Twenty-seven characteristics (Table I) were recorded for each of the 232 correlations from the 40 studies so that the analysis could reveal how the sizes of the correlations are related to the characteristics of the samples and measures. Most of these characteristics are self-explanatory, but several require explanation. Concurrent validity studies are those that obtain motivation and achievement measures roughly at the same time; predictive validity studies are those which obtained motivation measures at least three months before the achievement measures. Reliability was recorded for all motivation measures for which it was reported in the original study; unreported reliabilities were estimated at .77, the mean of the reported reliabilities from the validation sample. The mean was considered a neutral value and used so that studies that did not report reliabilities would not be deleted from the analysis; special regressions, however, were run to test the possibility that such estimation might cause misleading results. Mathematics Self-concept was coded separately because 56 correlations from one study of Crosswhite in 1972 (Bloom, 1976) involved this particular measure; none of the other motivation measures is specific to particular school subjects.
Table I: Variable list and coding conventions
Variable                                 Coding of categorical variables and range of continuous variables
Sample                                   Calibration = 0, Validation = 1
Grade                                    1 through 12
Size of sample                           42 through 200,000
Sex of sample                            Male = 1, Mixed = 1.5, Female = 2
Validity                                 Predictive = 0, Concurrent = 1
Motivation Measure Reliability           .48 through .91
Motivation Measure Reliability Source    Reported in study = 0, Estimated = 1
Nationality                              U.S. population = 0, other than U.S. population = 1
Observed Correlation                     –.31 through .98
Motivation Measures (all coded: not present = 0, present = 1)
  General Self-concept
  Academic Self-concept
  Locus of Control/Field Dependence
  Achievement Motivation
  Mathematics Self-concept
Achievement Measures (all coded: not present = 0, present = 1)
  General Achievement Test
  Reading Achievement Test
  Math Achievement Test
  Language Achievement Test
  Other Achievement Test
  General Grade Point Average
  Math Grade Index
  Language Grade Index
  Social Studies Grade Index
  Other Grade Index
  General Ability
  Verbal Ability
  Nonverbal Ability
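The coding conventions in Table I can be sketched as a small record-to-row encoder. This is an illustrative sketch only: the class name `CorrelationRecord`, the function `encode`, the dictionary keys, and the example record are hypothetical and do not come from the original study. Only the numeric codes and the .77 substitute reliability are taken from the text, and the achievement-measure indicators (omitted here for brevity) would be handled in the same way as the motivation-measure indicators.

```python
from dataclasses import dataclass
from typing import Optional

MOTIVATION_MEASURES = (
    "General Self-concept",
    "Academic Self-concept",
    "Locus of Control/Field Dependence",
    "Achievement Motivation",
    "Mathematics Self-concept",
)
SEX_CODES = {"male": 1.0, "mixed": 1.5, "female": 2.0}
MEAN_REPORTED_RELIABILITY = 0.77  # substituted when a study reports none


@dataclass
class CorrelationRecord:
    """Hypothetical container for one correlation and its study features."""
    r: float
    sample: str                   # "calibration" or "validation"
    grade: int                    # 1 through 12
    sex: str                      # "male", "mixed", or "female"
    validity: str                 # "predictive" or "concurrent"
    reliability: Optional[float]  # None if not reported in the study
    motivation: str               # one of MOTIVATION_MEASURES


def encode(rec: CorrelationRecord) -> dict:
    """Turn a record into the numeric codes listed in Table I."""
    if rec.motivation not in MOTIVATION_MEASURES:
        raise ValueError(f"unknown motivation measure: {rec.motivation}")
    row = {
        "sample": 0 if rec.sample == "calibration" else 1,
        "grade": rec.grade,
        "sex": SEX_CODES[rec.sex],
        "validity": 0 if rec.validity == "predictive" else 1,
        "reliability": (MEAN_REPORTED_RELIABILITY
                        if rec.reliability is None else rec.reliability),
        "reliability_source": 0 if rec.reliability is not None else 1,
        "observed_r": rec.r,
    }
    # One 0/1 indicator per motivation measure; exactly one is "present".
    for name in MOTIVATION_MEASURES:
        row[name] = 1 if name == rec.motivation else 0
    return row


# Made-up example record, for illustration only.
example = CorrelationRecord(r=0.30, sample="validation", grade=6, sex="mixed",
                            validity="concurrent", reliability=None,
                            motivation="General Self-concept")
print(encode(example))
```

Coding sex as 1, 1.5, and 2 lets mixed-sex samples sit midway between all-male and all-female samples on a single numeric predictor, which is how the variable can enter a regression as one term.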
Results and Discussion
Table II shows stem-and-leaf diagrams (Tukey, 1977) of the correlations for the two samples. The first decimal place of the correlations is represented on the stem on the left of the vertical line, and the second decimal place is represented as a leaf to the right of the line; for example, the highest and lowest outlying correlations for the calibration sample are .71 and .07.
Table II: Correlations between motivation and achievement
Stem    Calibration sample         Validation sample
 .9   |                          | 8
 .9   |                          |
 .8   |                          |
 .8   |                          |
 .7   |                          |
 .7   | 1                        |
 .6   | 77                       |
 .6   | 12344                    |
 .5   | 6889                     | 67
 .5   | 011111111122344          | 014
 .4   | 55666678                 | 566778888
 .4   | 00011122222334           | 0000001122334
 .3   | 555556677789999          | 556788889999
 .3   | 00000011123333334444     | 000000011122223333444
 .2   | 55566677777888899999     | 5666777777889
 .2   | 00011233334              | 000011111233444
 .1   | 556699                   | 577889
 .1   |                          | 00012234
 .0   | 7                        | 5
 .0   |                          | 2
–.0   |                          | 134
–.0   |                          | 7
–.1   |                          |
–.1   |                          |
–.2   |                          |
–.2   |                          |
–.3   |                          | 1
–.3   |                          |
These diagrams, unlike the usual bar graphs, reveal each data point as well as the shape and irregularities of the distribution. The calibration sample reveals a slight tendency toward bimodality with peaks at about .30 and .51. The validation sample is more normally distributed but has negative correlations based on younger children in the primary grades. The validation sample also has two outlying correlations, –.31 and .98. The sources of these correlations (Cobb, Chissom & Davis, 1975; Prendergast & Binder, 1975) yield convincing reasons neither for the unusual results nor for deleting the correlations, although the possibility of their undue influence on the general results was checked in the analysis. Despite these irregularities, the central tendencies and spreads of the samples are reasonably close. The median correlations for the calibration and validation samples are .35 and .30, respectively. The middle 50% of the correlations are bounded by .28 and .46 in the calibration sample and by .21 and .40 in the validation sample, yielding comparable interquartile ranges of .18 and .19. The median correlation for the combined sample, .325, suggests that motivation typically “accounts” for about 10.6% of the variance in achievement and ability. However, since the range of values after liberally trimming the extreme, perhaps questionable outlying values, is at least from .10 to .60 (Table II), the variance accounted for under partly identifiable conditions ranges from 1% to 36%. What characteristics of the sample and the measures account for this range of variation? To answer this question, the correlations for the different samples and categories may be compared.
This is done in two ways in the present study: first, by tabulation of the means, standard deviations, and numbers of correlations for the subgroups to show how many times certain relations have been investigated (and also to introduce gradually the notion of comparison of averages of correlations, which may be unfamiliar to some readers); but because these simple comparisons do not control for the other variables investigated, a second method, multiple regression analysis, is employed.
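The way a stem-and-leaf display preserves every data point, and the step from a correlation to the variance "accounted for", can be illustrated with a short sketch. The rows below are the calibration-sample stems and leaves as read from Table II; the names `CALIBRATION`, `expand`, and `variance_explained` are illustrative and not part of the original analysis.

```python
from statistics import median

# Calibration-sample rows as read from Table II: (stem, leaves).
CALIBRATION = [
    (".7", "1"),
    (".6", "77"), (".6", "12344"),
    (".5", "6889"), (".5", "011111111122344"),
    (".4", "55666678"), (".4", "00011122222334"),
    (".3", "555556677789999"), (".3", "00000011123333334444"),
    (".2", "55566677777888899999"), (".2", "00011233334"),
    (".1", "556699"),
    (".0", "7"),
]


def expand(rows):
    """Rebuild the individual correlations: stem '.3' + leaf '5' -> 0.35."""
    return sorted(float(stem + leaf) for stem, leaves in rows for leaf in leaves)


def variance_explained(r):
    """Proportion of variance shared by two measures correlated at r."""
    return r ** 2


values = expand(CALIBRATION)
print(len(values), "correlations, median", median(values))

# Squaring the combined median and the trimmed range quoted in the text.
for label, r in [("combined median", 0.325), ("low", 0.10), ("high", 0.60)]:
    print(f"{label}: r = {r:.3f}, r^2 = {variance_explained(r):.1%}")
```

Because the leaves carry the second decimal place, the display doubles as a sorted data listing, so medians and quartiles can be read off by counting leaves rather than by returning to the raw data.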
Uncontrolled Comparisons
Table III shows the results by grade level: the correlations are notably small in first grade, perhaps because of reading problems, and tend to be higher in the later grades, perhaps because of self-insight. The numbers of correlations are reasonably well distributed across grade levels, but there are fewer at the lower grade levels, an age level that has been less frequently investigated. Table IV shows that the correlations are slightly lower for boys, and that girls have been somewhat less often studied. Table V shows that concurrent and predictive mean correlations are approximately the same but concurrent correlations are considerably more frequently investigated and also more variable.
Table III: Correlations by grade
Grade    Mean correlation    Standard deviation    Number of correlations
1        .07                 .19                   9
2        .25                 .14                   4
3        .25                 .08                   15
4        .29                 .07                   6
5        .35                 .10                   27
6        .36                 .12                   37
7        .31                 .14                   40
8        .38                 .09                   12
9        .41                 .18                   26
10       .29                 .08                   13
11       .36                 .11                   19
12       .44                 .18                   24
Table IV: Correlations by sex
Sex        Mean correlation    Standard deviation    Number of correlations
Males      .35                 .15                   66
Females    .37                 .15                   49
Mixed      .32                 .13                   117
Note: Not all samples were separated by sex.
Table V: Correlations by type of validity
Validity      Mean correlation    Standard deviation    Number of correlations
Concurrent    .34                 .17                   145
Predictive    .34                 .11                   87
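The "comparison of averages of correlations" can be made concrete by pooling the grade-level means of Table III, weighting each by its number of correlations. The sketch below is illustrative and is not the authors' procedure; the name `pooled_mean` is hypothetical, and because the published subgroup means are rounded to two decimals, the pooled value only approximates the overall mean reported for all 232 correlations.

```python
# (grade, mean correlation, number of correlations), from Table III.
TABLE_III = [
    (1, 0.07, 9), (2, 0.25, 4), (3, 0.25, 15), (4, 0.29, 6),
    (5, 0.35, 27), (6, 0.36, 37), (7, 0.31, 40), (8, 0.38, 12),
    (9, 0.41, 26), (10, 0.29, 13), (11, 0.36, 19), (12, 0.44, 24),
]


def pooled_mean(groups):
    """Weight each subgroup mean by its count and return (mean, total count)."""
    total_n = sum(n for _, _, n in groups)
    weighted_sum = sum(mean * n for _, mean, n in groups)
    return weighted_sum / total_n, total_n


overall, n_correlations = pooled_mean(TABLE_III)
print(f"{n_correlations} correlations, pooled mean r = {overall:.3f}")
```

The same weighting applied to Table IV or Table V should give nearly the same overall value, since each table partitions the same 232 correlations.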
Table VI cross-classifies the two samples by motivation measure. With the exception of Academic Self-concept in the calibration sample, the averages of the correlations across the five categories of motivation lie in the limited range of .29 to .32. The calibration sample has correlations only for Academic and Mathematics Self-concept whereas the validation sample has a wider variety of measures. These results are due to the sampling procedures: Bloom (1976) restricted motivation to Academic Self-concept and obtained 56 correlations from an international study of mathematics achievement carried out in a number of countries by Crosswhite in 1972. The validation sampling more broadly reflects the educational and psychological literature, but also shows that recent studies have produced twice as many correlations for General Self-concept as the other three motivation measures combined.
Table VI: Correlations by type of motivation measure
Type of motivation measure    Mean correlation    Standard deviation    Number of correlations
Calibration Sample
  General Self-concept        —                   —                     —
  Academic Self-concept       .43                 .14                   66
  Locus of Control            —                   —                     —
  Achievement Motivation      —                   —                     —
  Mathematics Self-concept    .31                 .08                   56
  Sample total                                                          122
Validation Sample
  General Self-concept        .29                 .17                   74
  Academic Self-concept       .31                 .14                   10
  Locus of Control            .32                 .13                   13
  Achievement Motivation      .31                 .12                   13
  Mathematics Self-concept    —                   —                     —
  Sample total                                                          110
Total
  General Self-concept        .29                 .17                   74
  Academic Self-concept       .41                 .15                   76
  Locus of Control            .32                 .13                   13
  Achievement Motivation      .31                 .12                   13
  Mathematics Self-concept    .31                 .08                   56
  Overall total                                                         232
Note: The Locus of Control category also includes Field Dependence.
The achievement measures (Table VII) are broadly categorized as achievement tests, grade indices, and ability tests. The grade indices correlated the highest with motivation measures, which may suggest greater accuracy on the part of the teacher in evaluating student progress and also the role of the teacher in the motivation of the student.
Regression-controlled Results
To control the effects of the independent variables for one another, multiple regression analysis was employed (Daniel & Wood, 1971). To find a parsimonious, nonredundant model that explains the maximum amount of variance and that is not distorted by aberrant, outlying data points or studies (Walberg & Rasher, 1976), 12 regressions were run. The first includes 25 variables (all 27 variables in Table I except Achievement Motivation and Other Grade Index whose effects are represented in the constant term) and accounts for 40.7% of the variance in the correlations. The final equation includes only the eight consistently significant variables and accounts for nearly as much variance: 39.2%. The variables deleted make no contribution to the explained variance because they are either insufficiently related to the size of the correlations, are redundant with the significant variables, or both.
Table VII: Univariate statistics of correlations by type of achievement measure
Type of achievement measure    Mean correlation    Standard deviation    Number of correlations
Achievement Tests
  General                      .28                 .17                   40
  Reading                      .35                 .16                   30
  Math                         .30                 .09                   69
  Language                     .28                 .17                   6
  Other                        .44                 .03                   4
  Category total                                                         149
Grade Index
  General GPA                  .45                 .15                   38
  Math                         .40                 .07                   18
  Language                     .51                 .09                   2
  Social Studies               .50                 .04                   2
  Other                        .60                 .01                   2
  Category total                                                         62
Ability Tests
  General                      .16                 .16                   7
  Verbal                       .39                 .08                   5
  Nonverbal                    .25                 .07                   9
  Category total                                                         21
Several intermediate steps deleted variables at successively higher significance levels until only those at the .05 level remain. Another step showed that deleting the two outlying correlations, .98 and –.31 (Table II), had little effect on the explained variance or the regression weights. Several steps tested the possibility of whether deleted variables that were at borderline significance at earlier steps would gain significance if entered one at a time. Another step set the largest number of cases from the Coleman et al. (1966) research to an intermediate level of 300 to test the possibility that the apparent tendency of larger studies to produce smaller correlations is attributable only to the Coleman et al. (1966) research, which has far higher sample sizes than the other studies. This was found to be true; the rushed schedule of the research and substandard administrative testing conditions might have accounted for the anomalous trend, which is not confirmed in the data from the other 39 studies. The penultimate step (discussed further below) entered the product of grade level and sex to determine if the two variables interact in the determination of the size of the correlation of motivation and achievement measures. The final equation shows that the magnitude of the correlation between motivation and achievement is parsimoniously predicted by the following regression weights for the significant variables (all probabilities less than .05; t-values given in parentheses):
\[
\begin{aligned}
\text{Estimated correlation} = {} & .48 + \underset{(5.30)}{.10}\,(\text{Calibration Sample}) + \underset{(5.45)}{.01}\,(\text{Grade Level}) \\
& - \underset{(7.63)}{.16}\,(\text{Math Self-concept}) - \underset{(2.73)}{.06}\,(\text{General Achievement}) \\
& + \underset{(3.81)}{.11}\,(\text{Math Grade Index}) - \underset{(3.39)}{.15}\,(\text{General IQ Test}) \\
& - \underset{(2.37)}{.09}\,(\text{Nonverbal IQ Test}) - \underset{(3.09)}{.34}\,(\text{Motivation Reliability})
\end{aligned}
\]
The first weight after the constant means that, controlled for the other significant variables, the correlation for the calibration sample is .10 higher on average than in the validation sample. Table VI shows that the two selection procedures resulted in samples that differ in the types of motivational variables investigated; the 66 correlations for Academic Self-concept in the calibration sample which average .12 higher than in the validation sample largely explain the difference. Grade level has a weight of .01, which means that a one-unit rise in level adds this much to the estimated correlation; for example, 12th-grade samples, other things being equal, yield correlations .10 higher on average than second-grade samples (see Table III for the trend – uncontrolled for the other variables). The closer linkage between motivation and achievement among older students may be attributable to their wider and longer experience in comparing their ability and performance to age peers. Grade level is the only student characteristic that is significant, and it is notable that there is no significant difference on average between the correlations for boys and girls (see also Table IV). Some previous research indicates a possible interaction between age and sex in the determination of the relation of psychological traits and educational performance (Walberg, 1969); the association between the two could become closer with age in samples of girls than in samples of boys. To test this possibility, the product of grade level and sex was added to the 11th equation; it was found to be nonsignificant but showed the same trend as the previous research.
Thus, from the large number of correlations analyzed in this research, it must be concluded that neither sex nor its interaction with grade level appears to be a significant determinant of the size of the correlations, although both these possibilities warrant further research. The next five coefficients identify characteristics of the motivation and educational outcome measures that significantly affect the correlations between them. Two of the weights reveal differences between Crosswhite’s (Bloom, 1976) 56 correlations involving mathematics achievement and the 176 correlations from the other 39 studies: correlations involving Mathematics Self-concept are on average .16 lower, and those involving Mathematics Grade Index are .11 higher than the other correlations with the other independent variables controlled. These controlled estimates do not correspond to the uncontrolled comparisons in Tables VI and VII because they are adjusted for differences between the calibration and validation samples as well as the
large number of mathematics correlations obtained from high school samples. The significant differences associated with mathematics should be viewed as provisional since they are based only on a single study. The correlations of ability and motivation were included for comparative purposes. The regression equation indicates that correlations of general ability and nonverbal ability are, respectively, .15 and .09 lower than for verbal ability and the achievement correlations. These findings provide support for Cattell’s (1963) theoretical dimension of fluid and crystallized aptitudes: crystallized aptitudes such as verbal-educational achievement are more susceptible to internal influences such as motivation as well as external influences such as home environment and instruction. The last weight shows that more reliable motivation measures are less closely associated with achievement than are less reliable measures (see Notes 1 and 2). Humphreys (1970) showed the reason for such a seemingly paradoxical finding in a general discussion of reliability assessment and factor analysis of psychological traits. Most reliabilities reported for psychological and educational tests, including those for the present samples, are indexes of internal consistency or factor purity rather than stability across occasions or forms. Because internal consistency estimation only requires one administration, it is most frequently used; indeed, item selection from pilot forms, based on correlations of items with the total test scores (a common practice) leads to unifactorial instruments that have lower validity, that is, correlations with external criteria, because the test content is narrowed. The regression weight for the motivation reliability in the present data supports this interpretation in that an increase in the reliability across the range from the lowest to the highest value, .48 to .91, is associated with a .15 drop in the correlation between motivation and achievement (see Note 3).
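The interpretation of the regression weights can be checked with a small sketch that treats the final equation as a prediction function and differences two predictions. The weights are the rounded values reported in the text; the dictionary keys and the function name `predicted_r` are illustrative assumptions, not the authors' notation.

```python
# Rounded weights of the final regression equation, as reported in the text.
WEIGHTS = {
    "constant": 0.48,
    "calibration_sample": 0.10,
    "grade_level": 0.01,
    "math_self_concept": -0.16,
    "general_achievement": -0.06,
    "math_grade_index": 0.11,
    "general_iq_test": -0.15,
    "nonverbal_iq_test": -0.09,
    "motivation_reliability": -0.34,
}


def predicted_r(**features):
    """Estimated correlation; any feature not supplied is taken as 0."""
    allowed = set(WEIGHTS) - {"constant"}
    unknown = set(features) - allowed
    if unknown:
        raise KeyError(f"unknown features: {sorted(unknown)}")
    return WEIGHTS["constant"] + sum(
        WEIGHTS[name] * value for name, value in features.items()
    )


# Twelfth grade versus second grade, all else equal.
grade_gap = (predicted_r(grade_level=12, motivation_reliability=0.77)
             - predicted_r(grade_level=2, motivation_reliability=0.77))

# Highest versus lowest motivation reliability in the data (.91 versus .48).
reliability_gap = (predicted_r(motivation_reliability=0.91)
                   - predicted_r(motivation_reliability=0.48))

print(f"grade gap: {grade_gap:+.2f}")
print(f"reliability gap: {reliability_gap:+.3f}")
```

Because the model is linear, each contrast depends only on the weight and the size of the change in that one variable, which is why the other variables can be held at any common value when two predictions are compared.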
Conclusion
The mean correlation between motivation and achievement from samples of studies in psychological and educational literature is .338 (with a standard error of .009), which indicates that motivation measures on average “account” for 11.4% of the variance in achievement. There is considerable dispersion around the mean (Table II), and regression yields a more precise estimate and points to the significant factors that determine the size of the correlations associated with particular samples and measures. Perhaps the most general and useful estimates are .48 – .34 (.77) + .01 (1) = .24 and .48 – .34 (.77) + .01 (12) = .35 for the estimated correlation from the validation sample containing more recent and representative literature, for motivation measures with the average reliability of .77, and for first- and 12th-grade samples. These estimates are applicable to both sexes, to General Self-concept, Academic Self-concept,
Locus of Control and Field Dependence, and Achievement Motivation, and most of the specific subject achievement measures and indexes. The exceptions to the estimates are: the correlations from the pre-1974, less representative literature that are .10 higher on average, the lower correlations for General and Nonverbal ability measures, and the mixed pattern for mathematics correlations that may be unique to a single large study. Motivation measures appear to be associated with less variance in educational achievement on average than are other factors in learning which are replicable correlates; the squares of the two estimates above yield figures ranging from 5% in first grade to 12% in twelfth grade. Similarly derived and partly overlapping estimates suggest that ability is associated with about 60% of the achievement variance; quality of instruction, about 15%; amount of time spent in learning, 15%; sociopsychological characteristics of the classroom group, 60%; and educationally-relevant aspects of the child’s home environment, about 60% on average in elementary and secondary school samples (Walberg, Note 1). The relations among the replicable, productive factors in learning, however, appear to be multiplicatively interactive; as a consequence, any factor such as motivation that is at very low level for a particular student or group can be a potent deterrent to learning. Although the present analysis shows that motivation measures are relatively weak correlates of learning, it also suggests that multiple or multifactorial, and hence more valid, measures of motivation rather than more internally consistent or homogeneous single measures, are likely to improve predictions. 
Because of their replicated correlations with achievement and potential for psychometric improvement, motivation measures clearly deserve inclusion in general research on classroom learning along with the other factors to determine the causal directions and weights for the factors as well as to point to the most effective ways to make learning more productive.
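The two summary estimates in the Conclusion, and the variance shares obtained by squaring them, can be recomputed from the rounded weights as follows. This is a sketch under the assumption that only the constant, the reliability weight, and the grade weight apply (validation sample, no other indicator variables); the names are illustrative. Computed from the two-decimal weights printed in the text, the estimates come out about .01 below the published .24 and .35, which is presumably a rounding effect of the published coefficients.

```python
CONSTANT = 0.48
RELIABILITY_WEIGHT = -0.34
GRADE_WEIGHT = 0.01
MEAN_RELIABILITY = 0.77  # average motivation reliability used in the text


def estimate(grade):
    """Estimated motivation-achievement correlation for a given grade level."""
    return CONSTANT + RELIABILITY_WEIGHT * MEAN_RELIABILITY + GRADE_WEIGHT * grade


for grade in (1, 12):
    r = estimate(grade)
    print(f"grade {grade}: estimated r = {r:.3f}, variance share = {r * r:.1%}")
```

The squared values land close to the roughly 5% and 12% figures quoted for first and twelfth grade.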
Notes
1. Since achievement reliabilities are rarely reported, the reliability of the achievement measures was set at .85 for all correlations. The reliabilities of the motivation measures were taken directly from each study or were set to .77 (the mean of all the motivation reliabilities from the validation sample) in cases in which a motivation reliability was not reported. Using the appropriate reliabilities, all correlations were corrected for attenuation (Guilford, 1954, p. 400) to see how strongly the results reported in the text would be affected after adjusting for the error of measurement. The mean correlation changed from .338 for the observed to .421 when corrected for attenuation; however, in the final regression model, the variance accounted for did not fluctuate in the first four decimal places. The b weights and t values were also similar for all variables except motivation reliability, which is discussed in the text.
2. Using the Fisher Z transformation formula (Glass & Stanley, 1970, p. 265), all correlations were also adjusted for the uneven skewness which results when there is other than a normal distribution in analyses not discussed in the text. Unless a correlation is zero, skewness occurs increasingly with the higher positive or lower negative correlations. The Z transformation adjusts the raw correlation so significance tests can be applied. The mean observed correlation changed from .338 to .366 when transformed; however, in the final regression model, the variance accounted for did not fluctuate. The b weights and t values were also similar for all variables. Thus, the Z transformation does not change the essential results reported in the text.
3. Various methods were used to check whether one correlation abnormally affected the others, but the most rigorous method used to see whether one study affected the others was the Tukey jackknife (Mosteller & Tukey, 1977, p. 135). Each study was weighted inversely to the number of correlations it contributed to the total 232, and then 40 regressions were run eliminating each study one after the other. The results of the final jackknifed regression showed the following:
\[
\begin{aligned}
\text{Estimated correlation} = {} & .31 + \underset{(1.34)}{.25}\,(\text{Calibration Sample}) + \underset{(3.09)}{.02}\,(\text{Grade Level}) \\
& - \underset{(5.14)}{.15}\,(\text{Math Self-concept}) - \underset{(1.11)}{.05}\,(\text{General Achievement}) \\
& + \underset{(6.11)}{.11}\,(\text{Math Grade Index}) - \underset{(1.90)}{.14}\,(\text{General IQ Test}) \\
& - \underset{(1.81)}{.11}\,(\text{Nonverbal IQ Test}) - \underset{(.03)}{.00}\,(\text{Motivation Reliability})
\end{aligned}
\]
The t values for the sample and general achievement became nonsignificant; it could be concluded that there was no difference between the calibration sample and the validation sample. Motivation Reliability according to the Tukey method makes no difference; however, the grade-level effect was strengthened from .01 to .02. Finally, except where just mentioned, all results were similar in sign, magnitude, and significance as stated in the text.
The authors thank Maurice J. Eash, Dean of the College of Education; Harriet Talmage, Director of the Office of Evaluation Research; Sue Pinzur Rasher; Patricia Wang; and Aurelia Jones at the University of Illinois at Chicago Circle for institutional support; and Diane Schiller, Barbara K. Iverson, and Donna Hetzel for their ideas and encouragement. The research presented in this article was supported by the National Institute of Education (Grant No. NIE-G78-0090) and the National Science Foundation (Grant No. NSF-78-17374); the points of view and opinions stated do not necessarily represent the official position or policy of either agency.
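The two adjustments described in Notes 1 and 2 can be sketched with the standard textbook formulas: the correction for attenuation divides an observed correlation by the square root of the product of the two reliabilities, and the Fisher Z transformation is the inverse hyperbolic tangent of the correlation. The function names below are illustrative. Applying the formulas to the mean correlation of .338 is only an illustration; the notes report means of individually adjusted correlations (.421 and .366), which need not equal the adjusted mean.

```python
import math


def correct_for_attenuation(r, reliability_x, reliability_y):
    """Estimate the correlation between error-free scores on both measures."""
    return r / math.sqrt(reliability_x * reliability_y)


def fisher_z(r):
    """Fisher's Z transformation: 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)


def inverse_fisher_z(z):
    """Convert a Z value back to the correlation metric."""
    return math.tanh(z)


r_observed = 0.338          # mean observed correlation from the text
motivation_rel = 0.77       # default motivation reliability from Note 1
achievement_rel = 0.85      # assumed achievement reliability from Note 1

r_corrected = correct_for_attenuation(r_observed, motivation_rel, achievement_rel)
z_value = fisher_z(r_observed)

print(f"corrected for attenuation: {r_corrected:.3f}")
print(f"Fisher Z: {z_value:.3f}, back-transformed: {inverse_fisher_z(z_value):.3f}")
```

The Z transformation stretches correlations increasingly as they move away from zero, so averaging transformed values weights the larger correlations somewhat more than averaging the raw values does.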
Reference Note
1. Walberg, H. J. A psychological theory of educational productivity. Invited paper read at the annual meeting of the American Psychological Association, Toronto, September 1978.
References Bloom, B. S. Human characteristics and school learning. New York: McGraw-Hill, 1976. Catell, R. B. (Ed.). Handbook of multivariate experimental psychology. Chicago, Ill.: Rand McNally, 1963. Cattell, R. B., & Child, D. Motivation and dynamic structure. New York: Wiley, 1975. Chang, T. S. Self-concepts, academic achievement, and teacher’s rating. Psychology in the Schools, 1976, 13, 111–113. Cobb, P . R., Chissom, B. S., & Davis, M. W. Relationships among perceptual-motor, selfconcept, and academic measures for children in kindergarten, grades one and two. Perceptual and Motor Skills, 1975, 41, 539–546.
Cole, J. L. The relationship of selected personality variables to academic achievement of average aptitude third grade. The Journal of Educational Research, 1974, 67, 329–333.
Coleman, J. S., Campbell, E. Q., Hobson, C. J., McPartland, J., Mood, A. M., Weinfeld, F. D., & York, R. L. Equality of educational opportunity. Washington, D.C.: U.S. Department of Health, Education and Welfare, 1966.
Daniel, C., & Wood, F. S. Fitting equations to data. New York: Wiley-Interscience, 1971.
Frymier, J. R., Henning, M. J., Henning, W., Norris, L., & West, S. C. A longitudinal study of academic motivation. The Journal of Educational Research, 1975, 69, 63–66.
Glass, G. V Integrating findings: The meta-analysis of research. In L. S. Schulman (Ed.), Review of Research in Education (Vol. 5). Itasca, Ill.: Peacock, 1978.
Glass, G. V, & Stanley, J. C. Statistical methods in education and psychology. Englewood Cliffs, N.J.: Prentice-Hall, 1970.
Guilford, J. P. Psychometric methods. New York: McGraw-Hill, 1954.
Handel, A. Attitudinal orientations and cognitive functioning among adolescents. Developmental Psychology, 1975, 11, 667–675.
Humphreys, L. G. A skeptical look at the factor-pure test. In C. E. Lunneborg (Ed.), Current problems and techniques in multivariate psychology. Seattle: University of Washington Press, 1970.
Jones, L. V., & Fiske, D. W. Models for testing the significance of combined results. Psychological Bulletin, 1953, 50, 375–382.
Kagan, S., & Zahn, G. L. Field dependence and the school achievement gap between Anglo-American and Mexican-American children. Journal of Educational Psychology, 1975, 67(5), 643–650.
Kennelly, K., & Kinley, S. Perceived contingency of teacher administered reinforcements and academic performance of boys. Psychology in the Schools, 1975, 12, 449–453.
Lewis, J., & Adank, R. Intercorrelations among measures of intelligence, achievement, self-esteem and anxiety in two groups of elementary school pupils exposed to two different models of instruction. Educational and Psychological Measurement, 1975, 35, 499–501.
Mosteller, F., & Tukey, J. W. Data analysis and regression. Reading, Mass.: Addison-Wesley, 1977.
Portes, A., & Wilson, K. L. Black-white differences in educational attainment. American Sociological Review, 1976, 41, 414–431.
Prendergast, M. A., & Binder, D. M. Relationships of selected self-concept and academic achievement measures. Measurement and Evaluation in Guidance, 1975, 8, 92–95.
Primavera, L. H., Simon, W. E., & Primavera, A. M. The relationship between self-esteem and academic achievement: An investigation of sex differences. Psychology in the Schools, 1974, 11, 213–216.
Pugh, M. D. Statistical assumptions and social reality: A critical analysis of achievement models. Sociology of Education, 1976, 49, 34–40.
Schultz, C. B., & Pomerantz, M. Some problems in the application of achievement motivation to education: The assessment of motive to succeed and probability of success. The Journal of Educational Psychology, 1974, 66, 599–608.
Simon, W. E., & Simon, M. C. Self-esteem, intelligence and standardized academic achievement. Psychology in the Schools, 1975, 12, 97–100.
Steanner, A. J., & Katzenmeyer, W. G. Self-concept, ability and achievement in a sample of sixth grade students. The Journal of Educational Research, 1976, 69, 270–273.
Thorndike, R. L. (Ed.). Educational measurement. Washington, D.C.: American Council on Education, 1971.
Tukey, J. W. Exploratory data analysis. Reading, Mass.: Addison-Wesley, 1977.
Walberg, H. J. Physics, femininity and creativity. Developmental Psychology, 1969, 1, 47–54.
Walberg, H. J., & Rasher, S. P. Improving regression models. Journal of Educational Statistics, 1976, 1, 253–277.
Walberg, H. J., & Uguroglu, M. E. Motivation and educational productivity; theories, results, and implications. In L. J. Fyans, Jr. (Ed.), Achievement motivation; recent trends in theory and research. New York: Plenum, 1979.
Weintraub, S., Robinson, H. M., Smith, H. K., Plessas, G. P., & Rowls, M. Summary of investigations relating to reading, July 1973 to June 1974. Reading Research Quarterly, 1974–1975, (Vol. 10).
Weintraub, S., Robinson, H. M., Smith, H. K., Plessas, G. P., Roser, N. L., & Rowls, M. Summary of investigations relating to reading, July 1974 to June 1975. Reading Research Quarterly, 1975–1976, (Vol. 11).
Weintraub, S., Robinson, H. M., Smith, H. K., Plessas, G. P., & Rowls, M. Summary of investigations relating to reading, July 1975 to June 1976. Reading Research Quarterly, 1976–1977, (Vol. 12).
White, W. F., & McConnel, J. Affective responses and school achievement among eighth grade boys and girls. Perceptual and Motor Skills, 1974, 38, 1,295–1,301.
White, W. F., & Simmons, M. First grade readiness predicted by teachers perception of students maturity and students perception of self. Perceptual and Motor Skills, 1974, 39, 395–399.
63 Academic Motivation and Achievement among Urban Adolescents

Joyce F. Long, Shinichi Monoi, Brian Harper, Dee Knoblauch and P. Karen Murphy
Source: Urban Education, 42(3) (2007): 196–221.

The time of adolescence can be fraught with peril, particularly during the transition from middle school to high school. Moreover, the transitional adjustments of urban minority adolescents can be even more troublesome (Seidman, Aber, Allen, & French, 1996) because of already stressful home and neighborhood environments (Gillock & Reyes, 1996; Reyes, Gillock, Kobus, & Sanchez, 2000; Seidman, Allen, Aber, Mitchell, & Feinman, 1994). Upon entering high school, students often encounter a larger, more heterogeneous student body, whole-class instruction, higher levels of competition (Bryk & Thum, 1989), and rigid academic ability tracking (Seidman & French, 1997). In addition, there can be a loss of social status for ninth-grade students who are now the youngest in the school (Eccles et al., 1993). These transitions also can be accompanied by increased stress levels, decreased self-esteem (Alvidrez & Weinstein, 1993), academic underachievement, and social maladjustment (Reed, McMillan, & McBee, 1995). Furthermore, the size and bureaucracy of urban public schools (Seidman et al., 1994) may further exacerbate these transitional characteristics.

If urban students are to successfully maneuver through the increasing challenges and academic rigors of high school, their motivation to learn must be supported throughout the transition. Motivation can be defined as a “temporal sequence that is started, sustained, directed, and finally terminated,” which examines “why people think and behave as they do” (Graham & Weiner, 1996, p. 63). Although motivational factors “are at the heart of contemporary
concerns about the status of African Americans in general and their academic achievements in particular” (Graham, 1994, p. 55), researchers know very little about how motivational variables relate to achievement in classrooms where African American students predominate. Several explanations postulated by sociologists, however, do address the underperformance of African American students relative to their Caucasian counterparts, and their conclusions include factors unique to this ethnic group (Steinberg, Dornbusch, & Brown, 1992). The intergenerational legacy of slavery and discrimination, for example, may force African Americans to develop an oppositional identity, which rejects the values of the dominant culture (Fordham & Ogbu, 1988). When African American students reject the effortful pursuit of academic excellence as “acting White,” this practice results in failure and estrangement from opportunities for mainstream success (Ogbu, 1988). In addition, negative stereotypes about their group of origin can be threatening to African American students and diminish their motivational beliefs (Aronson, Quinn, & Spencer, 1998). For example, images in the electronic and print media can stereotypically represent African Americans as being deficient in verbal and intellectual abilities compared with other ethnic groups. If students’ awareness of this stereotype is coupled with a deliberate affiliation toward this disparaged group, a stereotype threat can be initiated, which produces a specific psychological anxiety that inhibits the efficacy and cognitive performance of African American students (Aronson & Good, 2002; Steele & Aronson, 1995). This was clearly demonstrated in a series of experiments that examined negatively stereotyped intellectual abilities of African American students at Stanford University (Steele, Spencer, & Aronson, 2002). 
White and Black students were invited, one at a time, to enter a laboratory where they were administered a brief section of the Graduate Record Examination. Students in the treatment condition were informed that the test was a measure of intellectual prowess, whereas those in the control section were merely instructed to complete the examination to the best of their ability. The researchers hypothesized that for Black students in the treatment condition, the risk of confirming negative stereotypes about intellectual ability relative to other racial groups would heighten anxiety and impede performance. This was indeed the case: Black and White students in the control condition performed similarly on the examination, whereas White students outperformed Black students by a full standard deviation among those for whom the stereotype was made salient. As such, any model attempting to account for the academic achievement of African American students must attend to multiple influences and factors (Graham, 1994). Thus, this research project is directed toward understanding how three sources of motivation may relate to the academic achievement of predominantly African American urban students during their transition to high school. More specifically, the motivational variables selected for their
association with achievement include interest (Schiefele, Krapp, & Winteler, 1992), self-efficacy (Bandura, 1997), and achievement goal orientation (Ames, 1992).
Interest

William James (1958) remarked that a century ago, no other topic had received more pedagogical attention than interest. Conceptualized by Dewey (1899) as the formation of a relationship between a person and an object, some describe interest as being deep-seated and originating in the individual (e.g., Renninger, 2000; Schiefele, 1991). Others term interest to be a temporary response relative to the attractiveness of a situation or object (e.g., Hidi & Baird, 1988). Regardless of the source of origination, however, interest is described as energizing the underlying needs or desires of the learner (Alexander, Murphy, Woods, Duhon, & Parker, 1997) in a way that can positively influence the cognitive (Schiefele, 1996), affective (Sansone & Smith, 2000), and volitional (Dewey, 1899) components of individual learners. Interests are further categorized according to the degree of content specificity being considered. More precisely, topic interests focus on a single area, and domain interests relate to “a range of activities, text passages dealing with the field, or body of knowledge in general” (Tobias, 1994, p. 47). When interest is conceptualized as a domain-specific motivational variable, educators use this information to investigate why students are motivated to learn specific subject matter over others when all the activities appear to have the same value and provide similar challenges (Alexander & Murphy, 1998). However, interest’s potential for energizing learning appears to be limited when students possess lower levels of knowledge (Alexander, Kulikowich, & Schulze, 1994). Thus apparently positive findings that poor Black elementary students have higher science interest levels than their affluent White neighbors may not be cause for rejoicing because the White children possessed more knowledge of science (Wenner, 2003).
This interest and knowledge partnership is also exemplified by decades of research into the juncture of interest and achievement (e.g., grades). One meta-analysis of these studies (Schiefele et al., 1992) revealed correlations ranging from .17 in literature to .35 in science among students from all grade levels, but the authors were unable to test for developmental differences because the distribution of studies across grade levels was unbalanced. More recently, empirical studies into this connection have continued in Europe, Australia, Africa, Canada, and the United States, indicating that academic interest is a cross-cultural phenomenon. However, the overwhelming majority of the work is done among Caucasian students, so we know considerably less about the academic interests of urban Black students and how they specifically relate to achievement.
In addition, distinct gender differences appear to exist. Women, for example, can display a greater interest in music (Marjoribanks & Mboya, 2004), human biology, and social/moral issues, whereas men may exhibit preferences for scientific research and environmental preservation (Gardner & Tamir, 1989). However, the same meta-analysis cited earlier (Schiefele et al., 1992) indicated that the academic performance of female students was “less associated with their interests” than their male counterparts (p. 202). Of these studies, few have noted gender differences being related to academic transitions.
Self-Efficacy Beliefs

Social cognitive theory (Bandura, 1977, 1997) suggests that self-efficacy beliefs powerfully influence the choices people make, the amount of effort they expend, and their level of persistence. Self-efficacy is defined as “people’s judgments of their capabilities to organize and execute courses of action required to attain designated types of performances” (Bandura, 1986, p. 391). Individuals with high self-efficacy beliefs view difficult tasks as challenges, remain committed to their goals, and increase their efforts when faced with failure. As such, their perseverance typically results in performance accomplishments. In contrast, individuals who have low self-efficacy beliefs do not embrace difficult tasks because they are seen as personal threats. When confronted with difficult tasks, individuals with low self-efficacy focus on their weaknesses, obstacles, or negative outcomes and easily give up. Because failure profoundly affects efficacy beliefs (Bandura, 1993), efficacy beliefs are correlated with academic choices, changes, and achievement. Consequently, efficacy beliefs can powerfully determine and predict the level of success that individuals will attain (Pajares, 1996). Schunk (1989) reported on the predictive utility of self-efficacy beliefs in regard to academic performance, noting that significant and positive correlations (rs = .33 to .42) were found between self-efficacy beliefs and the number of arithmetic problems that students completed during a lesson. Such correlations (rs = .27 to .84) were also found between self-efficacy and the proportion of problems solved correctly. Similarly, researchers discovered a strong correlation between self-efficacy beliefs and skill in reading and writing tasks among college students (Shell, Murphy, & Bruning, 1989). Gender differences in student academic self-efficacy beliefs have been reported, particularly in the domains of mathematics and writing.
Pajares and Miller (1994) indicated that male undergraduates in their study expressed higher mathematical self-efficacy than did female undergraduates, whose poorer performance in math problems was “largely due to lower judgments of their capability” (p. 200). During elementary years, girls and boys exhibited no differences in their mathematics self-efficacy, but by middle school, boys displayed higher efficacy than did girls (Wigfield, Eccles, & Pintrich, 1996). Furthermore, fifth-grade girls reported having higher writing self-efficacy
than did their male counterparts, but neither girls nor boys differed in writing performance (Pajares & Valiante, 1997). By ninth grade, girls and boys still possessed similar writing performance levels, but boys expressed higher self-efficacy for writing (Pajares & Johnson, 1996). These gender results appear to indicate that efficacious beliefs can change over time. As such, Eccles et al. (1993) found that declines in academic performance after a transition to middle school were a reliable predictor of lower self-concept, intrinsic motivation, and confidence in intellectual abilities. These researchers proposed that such declines resulted from a developmental mismatch between the early adolescents and their classroom environment, resulting in negative motivational outcomes especially for struggling students. In one study of poor African American elementary students, GPA significantly declined during the transition to middle school, but students “who felt more academically efficacious in sixth grade” had higher grades than did their peers (Gutman & Midgley, 2000, p. 237). A gap, however, exists in the literature regarding the efficacy beliefs of minority youth following their transition to high school.
Goal Orientations

A goal orientation framework incorporates learning contexts, personal academic goal orientations, learning behaviors, and academic achievement (Anderman & Maehr, 1994; Eccles & Midgley, 1989). Specifically, research has found that students’ contextual goal structures were determinants of their achievement goal orientations, which in turn influenced their learning behaviors and academic achievement. Therefore, achievement goal orientations seem to be a predictive factor for adolescents’ academic performance across changes in learning environments, such as during school transitions (Eccles & Midgley, 1989). Achievement goal orientations have reflected students’ reasons for engaging in academic tasks (Ames, 1992; Dweck & Leggett, 1988). Within the goal orientation literature, at least three conceptually distinct types of achievement goal orientations have been identified (Pintrich & Schunk, 2002): learning, performance, and work-avoidant. Learning goal orientations pertain to an individual’s willingness to master the skills necessary for academic tasks or to increase knowledge and understanding with effort (Pintrich, 2000). When students with a learning goal orientation “encounter difficulties, they are likely to seek help or if necessary to persist with their own self-regulated learning efforts, buoyed by the belief that these efforts are worthwhile and the confidence that they will pay off eventually” (Brophy, 2004, p. 90). Performance goal orientations, on the other hand, represent social comparisons, such as a desire to gain favorable judgments from others while avoiding negative judgments of one’s competence (Dweck, 1986). When comparing the two goal orientations, learning goal orientations were
considered to link to “a motivational pattern … likely to maintain achievement behavior,” whereas performance goal orientations tended to develop “a failure-avoiding pattern of motivation” (Ames, 1992, p. 262). Generally, however, their contribution to achievement has yielded inconsistent research results (Brophy, 2004; Pintrich & Schunk, 2002). More recently, Elliot and Harackiewicz (1996) further distinguished performance goals by separating them into two types: performance-approach and performance-avoidance goals. They determined that performance-approach goal orientations represented a desire to seek favorable judgments of competence, which is positively associated with academic achievement. In contrast, performance-avoidant goals represented a desire to avoid unfavorable judgments of competence and were negatively correlated with academic achievement. The third category of goal orientations, work-avoidant, focused on a student’s desire to finish assigned work with a minimum amount of effort (Meece, Blumenfeld, & Hoyle, 1988); this orientation is consistently reported as being detrimental to achievement behaviors. Researchers also have suggested that achievement goal orientations could change during school transitions. In a cross-sectional study across grade levels with predominantly White, middle-class adolescents, students were more oriented toward performance goals and less oriented toward learning goals in middle school than in elementary school (Midgley, Anderman, & Hicks, 1995). However, little is known about how the complete range of students’ achievement goal orientations and achievement behaviors may change during the transition to high school (Anderman, Austin, & Johnson, 2001; Newman, Myers, Newman, Lohan, & Smith, 2000), and current research has not frequently examined the predictive value of these goal orientations in urban African American students.
Several studies, however, have examined gender differences in achievement goal orientations with mixed results. Some research (e.g., Meece & Miller, 2001; Middleton & Midgley, 1997) reported that adolescent gender differences existed only in work-avoidant goals (i.e., boys endorsing them more strongly than did girls). Contrary to these findings, other studies have noted significant gender differences in academic achievement goal orientations, suggesting that male adolescents were more oriented to performance goal orientations and less oriented to learning goal orientation than were female adolescents (e.g., Anderman & Midgley, 1997; Pajares, Britner, & Valiante, 2000). In sum, these inconsistent results indicate that additional research is necessary.
Integrative Impact

How do these three motivational variables (i.e., interest, self-efficacy, and goal orientation) collectively interact and affect achievement? With regard to self-efficacy and goal orientation, judgments of competence or self-efficacy in middle school students “figure into motivation differentially depending on what
goal (orientation) dominates” (Anderman & Maehr, 1994, p. 298). Studies examining the juncture of goal orientation and interest in middle school and college students found that positive relationships existed when interest (task or subject) was paired with learning or mastery goals (Gehlbach, 2006; Van Yperen, 2003) and the relationship was especially strong for adolescent girls. When goals were joined with achievement, however, the results appeared to developmentally differ: in middle school students, “increases in mastery goal orientation related to higher levels of content knowledge and better grades” (Gehlbach, 2006, p. 366), whereas college students consistently linked performance goals with grades (Harackiewicz, Durik, & Barron, 2005). Lent, Brown, and Hackett (1994) formulated a theory of career interest development that featured self-efficacy, interest, and goals. They envisioned the linear process as originating with self-efficacy, progressing to interests, and then affecting goals. An additional direct link from self-efficacy to goals represented their belief that self-efficacy has both a direct and an indirect effect on goals. After testing this theory among engineering students in contrasting Black and White university samples, Lent et al. (2005) found that Black students reported stronger self-efficacy, technical interests (e.g., reading books about engineering issues), and educational goals (e.g., becoming an engineering major). However, further research on the model indicated that only mathematics efficacy and interest were found to affect grade performance (Lent, Lopez, & Bieschke, 1993). Some conclude that goals precede interest (Krapp, 1999), and others identify goals as an outcome of interest (Lent et al., 1994). Consequently, these variations have led to conclusions that the relationship may be reciprocal rather than unidirectional (Hidi & Harackiewicz, 2000). 
In addition, the relationship between goals and interest may be moderated by socioeconomic status. One study among high school students in South Africa found a connection between mastery goals and interest only in middle class participants, whereas interest and performance goals were more strongly aligned in their lower class counterparts (Marjoribanks & Mboya, 2004). As such, these limited findings provide us with an inadequate road map for distinguishing how goal orientation, interest, and self-efficacy might affect the achievement of urban eighth- and ninth-grade students. Although we readily acknowledge that a number of variables are likely mediating cognitive, affective, or motivational processes involved in learning, we constructed a motivational model that conceptualized achievement goal orientations, gender, and self-efficacy as contributing to domain interests, which in turn affected academic achievement. Specifically, our research questions included the following inquiries: (a) To what extent do gender, achievement goal orientations, and self-efficacy predict domain interests in urban adolescents who are predominately African American? and (b) Within the same urban setting, to what extent do gender, achievement goal orientations, self-efficacy, and domain interests predict achievement?
Methods

Participants

One site of this research project was a high school in a large urban district in the midwestern United States. Because students in the school have been performing below state proficiency levels in all subjects and grades (“Phi Delta,” 2002), the entire system was categorized as being in a state of “academic emergency.” This resulted in the curriculum of required courses (i.e., mathematics, English, social studies, and science) becoming more explicitly aligned with state proficiency exams, and teachers were expected to strictly adhere to the content of curriculum documents provided by central administration. The sample of eighth-grade students (n = 255) was drawn from three middle schools, which were feeder institutions for the single high school from which the ninth-grade sample (n = 159) was selected. The eighth-grade students consisted of 123 boys (48%) and 132 girls (52%). The ethnic breakdown was 87% African American or African American mix, 10% Caucasian, and 3% other (i.e., Hispanic, Native American, and Asian). Approximately 61% of the students received free or reduced lunches. For the ninth-grade sample, there were 83 boys (53%) and 75 girls (47%). Ethnically, students were 72% African American or African American mix, 22% Caucasian, and 6% Other. Almost 56% of the ninth-grade students received free or reduced lunches.
Measures

Interest and self-efficacy. Students self-reported interest and efficacious belief levels in six subject domains: history, mathematics, science, reading, computer science, and art. The interest portion of the measure was composed of two items in which students identified their levels of interest and importance for each domain. Although not equivalent, researchers do consider value to be one valence of interest (Renninger, 2000). The self-efficacy portion consisted of three questions for each domain (e.g., ability to perform well in a math course, think through a math problem, and solve a math problem). Students selected from a 10-point modified Likert-type response scale ranging from 0 (strongly disagree) to 9 (strongly agree) and recorded their responses on a Scantron sheet. After the grade reports were gathered, we chose to compile composite interest and self-efficacy scores only for the four core domains (mathematics, science, reading, and history) required by the entire student population. The interest/self-efficacy scale had a Cronbach’s alpha of .92.

Goal orientations. Achievement goal orientations were measured using 18 items related to learning, performance-approach, and performance-avoidance goals adapted from the Patterns of Adaptive Learning Survey (PALS; Midgley
et al., 1998). An additional fourth achievement goal orientation (i.e., work-avoidant) was assessed using a six-item scale adapted from previous work by Meece and colleagues (Meece et al., 1988). Sample items include “I want to do as little school work as possible” and “I would feel successful in school if I did better than most of the other students.” The original scales for the four goals were assessed using 5-point Likert-type scales. In this study, however, a 10-point Likert-type scale was employed to make scales of measurement consistent among all variables. On the basis of factor analysis with varimax rotation, a three-factor structure was identified for both the eighth- and ninth-grade goal items. An examination of items loaded on the factors revealed that in eighth and ninth grades, the goal items were divided into three primary goals: learning, performance, and work-avoidant goals (Cronbach’s alpha reliability for these factors ranged from .77 to .86). For the subsequent analysis of this study, composite mean scores were calculated for each of these three achievement goal orientations.

Additional data about gender and ethnicity were gathered using a demographic measure that was part of the test package. Academic records, including final grade reports and participation in school lunch programs, were collected from the schools at the end of each school year (both eighth and ninth grades). Grades for core subjects (reading/literature, history/social studies, math, and science) were used as indices of academic achievement. They were coded using a 4-point scale and then averaged to calculate a composite GPA.
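The reliabilities reported for these measures (.92 for the interest/self-efficacy scale and .77 to .86 for the goal factors) are Cronbach's alpha values. As a minimal sketch of how such a coefficient is computed, the example below applies the standard alpha formula to a handful of invented ratings on the 0 to 9 response scale. The function name `cronbach_alpha` and the data are assumptions for illustration only; they are not the study's items or responses.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item ratings.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(responses[0])
    item_variances = [variance([resp[i] for resp in responses]) for i in range(k)]
    total_variance = variance([sum(resp) for resp in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings from five students on three items (0-9 scale).
ratings = [
    [8, 7, 9],
    [5, 5, 6],
    [2, 3, 2],
    [6, 6, 7],
    [9, 8, 9],
]
print(f"alpha = {cronbach_alpha(ratings):.2f}")
```

Alpha approaches 1 when the items rise and fall together across respondents and falls toward 0 when they are unrelated.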
Procedures

Members of a trained research team administered a test packet that was part of a longitudinal study on resilience. The eighth-grade participants were tested in May during regular class periods. In the following school year, ninth-grade students were tested either in December or February, depending on which semester they participated in a required core class – Exploration of Literature and Composition. In that way, every freshman student had the same opportunity to participate in the study and to become part of a larger longitudinal research project designed to focus on the transition from middle to high school.
Results and Discussion Descriptive Statistics and Correlational Analyses Tables 1 and 2 present the means, standard deviations, and correlations for eighth- and ninth-grade students. Mean student scores in both grades for the five motivational variables (three goal orientations, self-efficacy, and interest)
Salkind_Chapter 63.indd 143
9/4/2010 10:52:45 AM
144
Motivation
Table 1: Bivariate correlations, means, and standard deviations for gender, motivational variables, domain interests, and academic achievements for the eighth graders (n = 255) 1
2
1. Gender — 2. Learning goal orientation .179** 3. Performance goal orientation 4. Work-avoidance goal orientation 5. Self-efficacy 6. Domain interests 7. GPA
–.116
3
4
M
SD
6
7
— 5.727
— 1.856
4.663
1.824
—
4.348
2.344
.588** .166** –.088 — .633** .153* –.121 .872** — .239** –.061 –.169** .204** .166**
5.774 5.875 2.130
1.950 1.951 0.823
— .137*
—
–.168** –.313** .407** –.024 .043 .192**
5
—
*p ≤ .05. **p ≤ .01.
Table 2: Bivariate correlations, means, and standard deviations for gender, motivational variables, domain interests, and academic achievements for the ninth graders (n = 159)

Variable                            1        2        3        4        5        6        7        M        SD
1. Gender                           —                                                              —        —
2. Learning goal orientation        –.072    —                                                     5.627    1.924
3. Performance goal orientation     –.302**  .283**   —                                            4.663    1.883
4. Work-avoidance goal orientation  –.243**  –.112    .538**   —                                   4.340    1.916
5. Self-efficacy                    –.101    .618**   .277**   .025     —                          5.921    1.814
6. Domain interests                 –.069    .670**   .309*    –.016    .889**   —                 5.896    1.884
7. GPA                              .060     .154     –.095    –.217**  .135     .026     —        1.472    1.091

*p ≤ .05. **p ≤ .01.
were moderate, ranging in eighth grade from 4.348 for work avoidance to 5.875 for academic domain interests and in ninth grade from 4.340 for work avoidance to 5.921 for self-efficacy. Because no significant differences were detected between the eighth- and ninth-grade levels of any motivational variable, and we found no significant differences in interest or efficacy scores across the four core domains, we collapsed domain ratings into one composite score for interest as well as for self-efficacy. However, the mean GPA (2.130 for eighth grade) significantly decreased in ninth grade (1.427), as demonstrated by the independent-samples t test, t(412) = 6.968, p < .001. This drastic drop in achievement is reminiscent of already-cited research among poor Black students in which GPA significantly declined after the transition to middle school (Gutman & Midgley, 2000) and also corresponds to the literature’s contention that academic underachievement can accompany the transition to high school (Reed et al., 1995). To assess the overall relationships among the variables in the study, we examined the zero-order intercorrelations for eighth- and ninth-grade students. All of the correlation values are interpreted as Pearson correlation coefficients, although the correlations between gender, a dichotomous variable,
Long et al.
Motivation and Academics
and the remaining continuous variables were computed by the point biserial correlation formula. By assigning two different numerical values to each category (1 for male, 2 for female) of the dichotomous variable, the values of the point biserial correlation coefficients are numerically equivalent to those that are obtained by the Pearson correlation formula (Gravetter & Wallnau, 2000). Results (Tables 1 and 2) suggested that motivational patterns both correspond to and differ from previously published studies. Domain interests of these adolescents in both eighth and ninth grades were significantly and moderately related to their learning goals (r = .633, .618), strongly correlated with self-efficacy (r = .872, .889), but were less intensely connected to performance goals (r = .153, .309). In addition, performance goals were associated with learning goals at a low level (r = .137, .283) but were more powerfully connected with work-avoidant goals (r = .407, .538). Moreover, achievement correlated with domain interest (r = .166), self-efficacy (r = .204), and learning goals (r = .239), but only in eighth grade. The relationship between work-avoidant goals and achievement, however, existed in both grades (r = –.169, –.217). These findings contrast with studies of college students in which performance goals correlated only with grades (Harackiewicz et al., 2005), but support other empirical work with middle school students that associated learning goals and interest (Gehlbach, 2006). Furthermore, the connection between performance goals and interest is corroborated by the South African high school study among lower class students. Nonetheless, the high correlation between interest and self-efficacy indicates that when these predominantly Black urban students believe they are competent in mastering materials within a domain, they also are likely to be interested in that domain.
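The equivalence between the point biserial and Pearson coefficients noted above can be checked numerically. The sketch below is only illustrative: the gender codes follow the 1 = male, 2 = female scheme described in the text, but the GPA values are made up and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson correlation computed from sums of squares and cross-products."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def point_biserial(group, y, low=1, high=2):
    """Point biserial r: mean difference over the population SD of y, times sqrt(p * q)."""
    y_low = [v for g, v in zip(group, y) if g == low]
    y_high = [v for g, v in zip(group, y) if g == high]
    p = len(y_high) / len(y)  # proportion in the higher-coded category
    q = 1 - p
    return (mean(y_high) - mean(y_low)) / pstdev(y) * sqrt(p * q)


# Hypothetical data: gender coded 1 = male, 2 = female, with invented GPAs.
gender = [1, 1, 1, 2, 2, 2, 2]
gpa = [1.5, 2.0, 2.5, 2.0, 3.0, 3.5, 2.5]

print(round(pearson_r(gender, gpa), 4), round(point_biserial(gender, gpa), 4))
```

Both calls return the same value, and with this coding a positive coefficient means the higher-coded group (female) has the higher mean.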
Moreover, the eighth-grade data do parallel research findings that support the ongoing relationship between self-efficacy and achievement (Pajares, 1996) as well as between domain interest and achievement (Schiefele et al., 1992). Gender differences did exist in eighth-grade learning and work-avoidance goal orientations (r = .179, –.168, respectively) and academic achievement (r = .192). Using gender as an independent variable and motivational variables as dependent variables, a multivariate analysis of variance (MANOVA) showed a significant main effect, Wilks’s Λ = .897, F(6, 248) = 4.764, p < .001. Female eighth-grade students tended to hold stronger learning or mastery-oriented goals and obtained higher GPA scores, whereas boys expressed stronger preferences for work-avoidant goals. Gender differences continued to be present in ninth-grade boys’ work-avoidant (r = –.243) and performance (r = –.302) goal orientations (i.e., significant main effect, Wilks’s Λ = .892, F(6, 152) = 3.055, p < .01). These results conflict with prior research reports noting insignificant gender differences in learning goals among middle school students (Middleton & Midgley, 1997) and yet support the tendency of male adolescents to endorse work-avoidant goals more strongly than their female counterparts (Meece & Miller, 2001).
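For a two-group comparison such as gender, Wilks's lambda converts to an exact F statistic, so the MANOVA values reported above can be roughly cross-checked. This is a sketch under stated assumptions: the number of dependent variables (six) is inferred from the reported degrees of freedom rather than stated in the text, and the function name is hypothetical.

```python
def wilks_to_f(wilks_lambda, n_total, n_dvs):
    """Exact F for a two-group MANOVA; returns (F, df1, df2)."""
    df1 = n_dvs
    df2 = n_total - n_dvs - 1
    f_value = ((1 - wilks_lambda) / wilks_lambda) * (df2 / df1)
    return f_value, df1, df2


# Reported lambdas and sample sizes; n_dvs=6 is an assumption taken from F(6, 248) and F(6, 152).
for label, lam, n in [("8th grade", 0.897, 255), ("9th grade", 0.892, 159)]:
    f_value, df1, df2 = wilks_to_f(lam, n, n_dvs=6)
    print(f"{label}: F({df1}, {df2}) = {f_value:.2f}")
```

Because the published lambdas are rounded to three decimals, the recomputed F values should only be expected to land near the reported 4.764 and 3.055, not to match them exactly.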
Contributors to Domain Interest and Achievement
To answer the first research question, which examined the contribution of gender, self-efficacy, and goal orientation to domain interest, a regression analysis was employed. The five predictor variables (i.e., gender, learning, performance, work-avoidant goal orientations, and self-efficacy) were entered into the equation simultaneously. This approach allowed us to identify the unique contribution of each predictor to the designated outcome variable (Cohen, Cohen, West, & Aiken, 2003). Those results appear in Table 3 for eighth- and ninth-grade students. The predictors of the dependent variable, domain interests, accounted for 78% of the overall variance among eighth-grade students, and of those predictors, the learning goal orientation (β = .18) and academic self-efficacy (β = .77) were significant. Similarly, 81% of the variance of ninth graders’ domain interests was explained by the same variables: learning goal orientation (β = .17) and academic self-efficacy (β = .77). The second research question, which focused on the predictive power of gender, goal orientations, self-efficacy, and domain interest on academic achievement, was addressed with a hierarchical regression analysis. An advantage of hierarchical regression analysis over simultaneous regression analysis is that this approach allowed us to examine the unique contribution of predictors to an outcome variable after controlling for the overlaps among the five predictors established in the first regression equation (Cohen et al., 2003). These results are presented in Table 4. In step 1, the five predictors were entered in the equation: gender, learning, performance, work-avoidant goal orientations, and self-efficacy. Of these five predictors, gender (β = .16) significantly contributed to academic achievement, indicating that eighth-grade female students tended to have higher GPAs than did their male counterparts.
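The adjusted R² values reported for the domain-interest models in Table 3 follow from the unadjusted R², the sample size, and the number of predictors. The short sketch below applies the standard adjustment formula to the published figures; the function name is hypothetical, and the inputs are the rounded values from the table.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k predictors fitted to n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Table 3 inputs: five predictors entered simultaneously in each grade.
for label, r2, n in [("8th graders", 0.78, 255), ("9th graders", 0.82, 159)]:
    print(f"{label}: adjusted R2 = {adjusted_r2(r2, n, k=5):.2f}")
```

With samples this large relative to five predictors, the adjustment shrinks R² only slightly, which is consistent with the near-identical adjusted and unadjusted values in the table.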
In step 2, domain interests were entered into the equation to control the contribution of gender on academic achievement. Gender remained a significant predictor (β = .16),
Table 3: Regression of domain interests on gender; learning, performance, and work-avoidance goal orientations; and academic self-efficacy

Domain interests                    8th Graders                   9th Graders
Gender                              β = .03                       β = .03
Learning goal orientation           β = .18**                     β = .17**
Performance goal orientation        β = .01                       β = .09
Work-avoidance goal orientation     β = .01                       β = –.06
Self-efficacy                       β = .77**                     β = .77**
Total                               R² = .78** (Adj. R² = .78)    R² = .82** (Adj. R² = .81)

**p ≤ .01.
Table 4: Regression of academic achievement on gender; learning, performance, and work-avoidance goal orientations; academic self-efficacy; and domain interests

Academic achievement                8th Graders                   9th Graders
Step 1
Gender                              β = .16*                      β = .02
Learning goal orientation           β = .11                       β = .09
Performance goal orientation        β = –.05                      β = –.04
Work-avoidance goal orientation     β = –.08                      β = –.18
Self-efficacy                       β = .15                       β = .10
Total                               R² = .10** (Adj. R² = .08)    R² = .07* (Adj. R² = .04)
Step 2
Gender                              β = .16*                      β = .04
Learning goal orientation           β = .14                       β = .19
Performance goal orientation        β = –.05                      β = .01
Work-avoidance goal orientation     β = –.07                      β = –.22*
Self-efficacy                       β = .28*                      β = .56**
Domain interests                    β = –.17                      β = –.60**
Total                               R² = .11** (Adj. R² = .09)    R² = .14** (Adj. R² = .10)
                                    ΔR² = .01                     ΔR² = .07**

*p ≤ .05. **p ≤ .01.
academic self-efficacy became a significant predictor (β = .28), and domain interests did not significantly contribute to academic achievement. More specifically, eighth-grade students with higher academic self-efficacy beliefs were likely to receive higher GPAs than those who were less efficacious. Overall, this model explained 9% of the variance of academic achievement for eighth-grade students. Not surprisingly, the ninth-grade data portrayed a different picture. In step 1, none of the five predictors were significant. Step 2 of the hierarchical regression analysis showed that the overall ninth-grade model accounted for 10% of the variance of academic achievement, and of this 10% variance, 4% was explained by the work-avoidant goal orientation (β = –.22) and academic self-efficacy (β = .56), whereas domain interests accounted for the remaining 6%. A notable finding, as shown in Table 4, is that both domain interests and work-avoidant goals negatively contributed to academic achievement (β = –.60 and –.22, respectively). These results were likely skewed by the dramatic decrease in ninth-grade GPA because domain interest levels remained comparable in both grades (5.88 and 5.90, respectively). Nonetheless, it appears that ninth-grade students were more likely to endorse work-avoidant goals, resulting in lower GPA scores, but those who believed they were academically efficacious had higher GPAs (Gutman & Midgley, 2000). To summarize, the data revealed the following findings about this primarily poor, urban, African American, adolescent sample. First, students
expressed moderate levels of all three motivational variables (i.e., self-efficacy, domain interest, and personal goal orientations) in both grades, but grades were significantly lower in high school. Second, levels of efficacy and learning goals strongly predicted domain interest in both grades. Third, self-efficacy consistently contributed to achievement at each grade level. Fourth, although interest’s contribution to achievement could have been masked by self-efficacy and goal orientation in middle school, interest emerged as a significant (albeit negative) contributor to achievement in high school. Fifth, the negative effect of work-avoidant goals on achievement became prominent in high school. Sixth, gender’s effect on motivation and achievement varied between grades. These results confirm both Krapp (1999) and Lent et al.’s (1994) assumptions regarding the significant effect of goals and self-efficacy on interest. However, they do not support related findings by Lent and associates (1993) that both self-efficacy and interest positively affect grades. In addition, the overall contribution of motivational variables and gender represented only a small percentage of variance in achievement, confirming that achievement is a complex phenomenon composed of many factors that were not included in our model. Nonetheless, if motivation starts, sustains, and directs a sequence (Graham & Weiner, 1996), then its contribution to the learning process can be neither undervalued nor overestimated.
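The step-2 increments in the hierarchical models (Table 4) can be examined with the usual F test for a change in R². The following is a minimal sketch using the rounded values from the table, assuming one predictor (domain interests) was added at step 2 to a six-predictor full model; the function name is hypothetical.

```python
def f_change(r2_full, delta_r2, n, k_full, k_added=1):
    """F test for the R-squared increment when k_added predictors join the model."""
    df2 = n - k_full - 1
    f_value = (delta_r2 / k_added) / ((1 - r2_full) / df2)
    return f_value, k_added, df2


# Table 4, step 2: full-model R2 and the reported change in R2 for each grade.
for label, r2, delta, n in [("8th graders", 0.11, 0.01, 255), ("9th graders", 0.14, 0.07, 159)]:
    f_value, df1, df2 = f_change(r2, delta, n, k_full=6)
    print(f"{label}: F change({df1}, {df2}) = {f_value:.2f}")
```

On these rounded inputs the ninth-grade increment yields a far larger F than the eighth-grade one, in line with the table marking only the ninth-grade ΔR² as significant.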
Conclusions and Implications
Several provocative conclusions and implications emerge from these findings. First, if learning goals and self-efficacy significantly contribute to students’ domain interests, then interest’s power (Alexander et al., 1997) depends on positive beliefs about ability (Bandura, 1986), a willingness to master skills necessary for academic tasks, and an effortful investment into increasing levels of knowledge and understanding (Pintrich, 2000). As such, we could conclude that when student learning goals and self-efficacy are encouraged to grow, domain interests will likewise increase and empower achievement across a variety of subjects and domains. Unfortunately, however, domain interest’s consistent relationship with achievement across many cultures was lower in these eighth-grade students than the averages previously reported (Schiefele et al., 1992) and was insignificant in ninth grade. Furthermore, interest did not empower achievement at either grade level. Why did this occur? Although the literature has noted that declines in achievement can differentially affect motivation after developmental milestones (e.g., transition from elementary to middle school), self-efficacy’s effect increased (from low to moderate) but interest’s effect was not positive. Simply acknowledging that most interest and efficacy research has been conducted among Caucasian students, however, does not
warrant our concluding that these disparate outcomes represent a cultural anomaly. Instead, we believe it may be more appropriate to suggest that their interest was simply more responsive to contextual factors. Until we know how interest develops, this suggestion remains provocatively unresolved. We do acknowledge, however, that students within a system characterized as being in an academic emergency may not have opportunities for their existing domain interests to be utilized during instruction. The vocational interests of 71 students from both eighth and ninth grades who participated in the entire 2-year study remained surprisingly consistent and stable longitudinally, yet few students believed their teachers could identify their individual or vocational interests (Long, 2003). Thus a student’s desire to pursue pediatric nursing, for example, would not necessarily empower domain interest or learning in science and math. Furthermore, if domain interest is unaccompanied by correspondingly high levels of knowledge (Alexander et al., 1994), then even students who are able to identify their levels of domain interest as being moderately strong are likely unable to capitalize on interest’s potential power to support their learning. Essentially, interest’s energizing ability is fueled by knowledge acquisition, which was low in eighth grade (M = 2.130) and plummeted in ninth grade (M = 1.427). Because new knowledge must be constructed from existing knowledge (Bransford, Brown, & Cocking, 1999), even students who possess the will, desire, and value for learning require substance with which to build cognitive schema and understanding. Thus, these levels of GPA indicate that students’ construction of knowledge is being severely hampered by other factors not examined in this empirical project. 
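The size of the GPA drop referred to here can be recomputed from summary statistics alone. The sketch below runs a pooled-variance independent-samples t test on the means, standard deviations, and sample sizes listed in Tables 1 and 2. Note one assumption: Table 2 lists the ninth-grade mean as 1.472 whereas the running text gives 1.427, and the table value is used here; the function name is hypothetical.

```python
from math import sqrt


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t test (equal variances assumed) from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


# Eighth grade (Table 1) versus ninth grade (Table 2).
t_value, df = pooled_t(2.130, 0.823, 255, 1.472, 1.091, 159)
print(f"t({df}) = {t_value:.2f}")
```

With the table values the statistic lands close to the reported t(412) = 6.968, whereas substituting 1.427 for the ninth-grade mean would give a somewhat larger value.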
Second, because self-efficacy levels consistently contributed to domain interest as well as achievement at each grade level, we need to reexamine this outcome in light of previous findings in the literature. Typically, self-beliefs among Caucasian students decline following a developmental school transition (Eccles et al., 1993), but this did not occur in our urban, minority sample. Conceptually regarded as being sensitive to experiences of failure, their levels of efficacious beliefs were moderately strong and surprisingly robust, but their skill and effort did not match the outcome (Bandura, 1986). This can occur in settings where social bias and inferior resources impair academic achievement and “self-efficacy may exceed actual performance,” indicating that rather than not knowing what to do, students “are unable to do what they know” (Pajares, 1996, p. 568). Such a scenario is reminiscent of the sociological premises referenced in the introduction of this article (i.e., oppositional identity, stereotype threat). As such, African American students who adopt oppositional identities to combat the negative impact of actual and perceived discrimination within their school setting may feel confident in their ability to successfully execute a given academic task or be interested in an academic domain and still simultaneously express a deliberate disdain for academic behaviors associated
with successful outcomes. Steinberg (1991) examined beliefs about the rewards of success among African American high school students and concluded that it was extremely difficult for Black high school students to join a peer group that encouraged academic excellence. Thus the paramount importance of the peer group for African American students relative to other ethnic groups has led researchers to conclude that Black high school students who desire to excel academically may be faced with isolation or be cut off from the social networks that exist among their high school peers (Witherspoon, Speight, & Thomas, 1997). Third, neither learning goals (a contributor to domain interest) nor performance goals significantly factored into achievement at any grade level. Again, this contrasts with research in predominantly non-Black samples at comparable ninth-grade developmental levels (e.g., Gehlbach, 2006). However, these students’ work-avoidant goals did emerge as negatively influential on ninth-grade GPA. When faced with the challenges associated with high school (e.g., larger, more heterogeneous student body; rigid academic ability tracking), achievement was actually affected by students’ apparent desire to finish assigned work with a minimum amount of effort (Meece et al., 1988) rather than seeking help or persisting (Brophy, 2004). Even more important, although the literature and our eighth-grade findings support the notion that boys tended to hold work-avoidant goal orientations (Meece & Miller, 2001), these gender differences disappeared in high school. Within the present research design, it is impossible to determine if the more prominent adoption of work-avoidant goals actually encouraged the decline in GPA or resulted from the decline. We can only state that avoidant goals emerged as a significant factor among both ninth-grade boys and girls, sadly contrasting with the eighth-grade girls’ strength in higher grades and stronger learning goals.
Although this finding could indicate the emergence of a cross-gender developmental trend, the literature neither corroborates nor negates this conclusion. Thus, this pattern may be unique to African American students, especially if their environment inadvertently encourages early acquisition of avoidance goals for strategic purposes. This possibility became evident to us when the first author was a reading tutor in one of the urban middle schools featured in this study. After a new female tutee was able to rather easily read the designated passage, the student was asked why she needed extra help. She responded by stating that their currently assigned novel was “very boring.” Therefore, she and a large group of her girlfriends complained and requested a replacement. When their pleas were denied because the book was mandated by the system’s language arts curriculum, they formulated a plan to finish the dull book more quickly. Their scheme consisted of pretending they could decipher only one-syllable words when it was their turn to read aloud. Their halting responses so frustrated the teacher that she took over reading the book, which was covered much more rapidly. As they
moved to another text, the girls concluded that their strategy had been very successful. Unfortunately, their skills in constructing and implementing the work-avoidant goal later resulted in the group receiving lower reading grades and remediation (indicative of the negative relationship between avoidance and GPA). On one level, these middle school students’ proficiency in formulating strategies that utilized work-avoidant goals could appear to be the result of a developmental mismatch with their required text. However, there may be cultural reasons why African Americans may be particularly vulnerable to this goal orientation. More specifically, students of color attend more readily to curriculum presented in a humanized narrative form (Banks, 1988). In addition, Bennett (1990) reported that African American students tend to evidence a learning style that stresses a visual/global rather than a verbal/analytical approach as well as a preference for reasoning by inference rather than formal logic. Furthermore, Hale (2003) argued that culturally appropriate pedagogy must consider three interacting spheres of influence: classroom instruction, cultural enrichment, and instructional accountability. Boykin (1983) also found that Black students evidenced a preference for energetic involvement in several activities simultaneously rather than routine, step-by-step learning. Although these findings are well-known among ethnicity researchers, they are less apparent to educators and are rarely afforded pedagogical consideration, particularly in urban high schools where whole-class instruction and higher levels of competition (Bryk & Thum, 1989) sharply conflict with cultural differences in learning styles and communication preferences. 
In discussing how African American students may best reconcile the absence of these culturally relevant teaching practices, Ogbu (2003) recommends “accommodation without assimilation,” or the adoption of attitudes and behaviors that lend themselves to academic success in the school setting while still embracing cultural norms that are acceptable in less formal settings. This alternative is preferred to other assimilationist behaviors that can significantly correlate with psychological distress (i.e., emulation of Whites, disguising true academic attitudes and behaviors, and the deliberate isolation from other African Americans). To reverse the apparent tendency for African American students to assume work-avoidant goal orientations, perhaps high school students need opportunities to value both the culture of the school and their African American community, stressing the value of one without undermining loyalty to the other. In sum, our findings indicate that students’ motivational beliefs and dispositions, such as self-efficacy, domain interest, and achievement goal orientations, develop “partly as a consequence of the educational environments they experience” (National Research Council, 2004, p. 33). Thus, factors that hinder the relationship between motivation and achievement can consist of poor resources, dilapidated facilities or equipment, ineffective
teachers (Pajares, 1996), or other indigenous factors often associated with academic transitions (i.e., increased stress levels, decreased self-esteem, and loss of social status). Qualitative methods (e.g., student interviews about motivation that capture their own words, close examinations of the climate of testing and its effect on motivation) as well as survey measures should be used in the future to distinguish why minority students’ existing levels of motivation fail to produce acceptable achievement levels. Furthermore, although the generalizability of these findings is limited to urban schools where there is an academic emergency that directly and indirectly affects all participants within the ecological educational system, our results suggest the need to further investigate such locales where a mandatory curriculum in all core courses is being implemented. A decade ago, Graham (1994) concluded that “Black subjects maintain undaunted optimism and positive self-regard even in the face of achievement failure” (p. 103), and our findings echo the same refrain today. Because these students possessed moderate levels of all three motivational variables, they cannot be technically classified as unmotivated. However, the purchasing power of their motivational resources seems reflective of an impoverished academic state. Some of our outcomes parallel developmental patterns discovered by other researchers after an academic transition (e.g., declines in achievement and shifts from learning goals), but other results (e.g., decline in the effectiveness of interest; both boys and girls utilizing work-avoidant goals) distinctively differ and seem more connected with cultural and contextual factors. Thus, we support suggestions by the National Research Council (2004) for fostering motivation in urban high schools. 
Their environmental recommendations included redesigning courses and instructional methods to increase engagement and learning, providing resources, assessing understanding and skills, creating smaller learning communities, coordinating communication within the community, and eliminating tracking. It is very likely that such improvements will not only support the growth of motivation but also contribute to its potency and effectiveness in empowering achievement.
References
Alexander, P. A., Kulikowich, J. M., & Schulze, S. K. (1994). How subject-matter knowledge affects recall and interest. American Educational Research Journal, 3, 313–337.
Alexander, P. A., & Murphy, P. K. (1998). Profiling the differences in students’ knowledge, interest, and strategic processing. Journal of Educational Psychology, 90, 435–447.
Alexander, P. A., Murphy, P. K., Woods, B. S., Duhon, K. E., & Parker, D. (1997). College instruction and concomitant changes in students’ knowledge, interest, and strategy use: A study of domain learning. Contemporary Educational Psychology, 22, 125–146.
Alvidrez, J., & Weinstein, R. S. (1993). The nature of “schooling” in school transitions: A critical re-examination. Prevention in Human Services, 10, 7–26.
Ames, C. (1992). Classrooms: Goals, structures, and student motivation. Journal of Educational Psychology, 84, 261–271.
Anderman, E. M., Austin, C. C., & Johnson, D. M. (2001). The development of goal orientation. In A. Wigfield & J. S. Eccles (Eds.), Development of achievement motivation (pp. 197–220). San Diego, CA: Academic Press.
Anderman, E. M., & Maehr, M. L. (1994). Motivation and schooling in the middle grades. Review of Educational Research, 64, 287–309.
Anderman, E. M., & Midgley, C. (1997). Changes in achievement goal orientations, perceived academic competence, and grades across the transition to middle-level schools. Contemporary Educational Psychology, 22, 269–298.
Aronson, J., & Good, C. (2002). Development and consequences of stereotype vulnerability in adolescents. In F. Pajares & T. Urdan (Eds.), Academic motivation of adolescents (pp. 178–198). Greenwich, CT: Information Age.
Aronson, J., Quinn, D., & Spencer, S. (1998). Stereotype threat and the academic underperformance of minorities and women. In J. Swim & C. Stangor (Eds.), Prejudice: The target’s perspective (pp. 83–103). San Diego, CA: Academic Press.
Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review, 84, 191–215.
Bandura, A. (1986). Social foundations of thought and action: A social cognitive theory. Upper Saddle River, NJ: Prentice Hall.
Bandura, A. (1993). Perceived self-efficacy in cognitive development and functioning. Educational Psychologist, 28, 117–148.
Bandura, A. (1997). Self-efficacy: The exercise of control. New York: Freeman.
Banks, J. (1988). Multicultural education: Development, dimensions and challenges. Phi Delta Kappan, 75, 22–28.
Bennett, C. (1990). Comprehensive multicultural education: Theory and practice (2nd ed.). Boston: Allyn & Bacon.
Boykin, A. W. (1983). The academic performance of Afro-American children. In J. Spence (Ed.), Achievement and achievement motives (pp. 324–371). San Francisco: Freeman.
Bransford, J., Brown, A., & Cocking, R. (Eds.). (1999). How people learn: Brain, mind, experience, and school. Washington, DC: National Academy Press.
Brophy, J. (2004). Motivating students to learn (2nd ed.). Mahwah, NJ: Lawrence Erlbaum.
Bryk, A., & Thum, Y. (1989). The effects of high school organization on dropping out: An exploratory investigation. American Educational Research Journal, 26, 353–383.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum.
Dewey, J. (1899). Interest as related to will. Chicago: University of Chicago.
Dweck, C. S. (1986). Motivational processes affecting learning. American Psychologist, 41, 1040–1048.
Dweck, C. S., & Leggett, E. L. (1988). A social-cognitive approach to motivation and personality. Psychological Review, 95, 256–273.
Eccles, J., & Midgley, C. (1989). Stage/environment fit: Developmentally appropriate classrooms for early adolescents. In R. Ames & C. Ames (Eds.), Research on motivation in education (Vol. 3, pp. 139–181). New York: Academic Press.
Eccles, J., Midgley, C., Wigfield, A., Buchanan, C., Reuman, D., Flanagan, C., et al. (1993). Development during adolescence: The impact of stage-environment fit on young adolescents’ experiences in schools and in families. American Psychologist, 48, 90–101.
Elliot, A. J., & Harackiewicz, J. M. (1996). Approach and avoidance goals and intrinsic motivation: A mediational analysis. Journal of Personality and Social Psychology, 70, 461–475.
Fordham, S., & Ogbu, J. (1988). Black students’ school success: Coping with the “burden of ‘acting white.’” Urban Review, 18, 178–204.
Gardner, P. L., & Tamir, P. (1989). Interest in biology: A multidimensional construct. Journal of Research in Science Teaching, 26, 409–423.
Gehlbach, H. (2006). How changes in students’ goal orientations relate to outcomes in social studies. The Journal of Educational Research, 99, 358–370.
Gillock, K., & Reyes, O. (1996). High school transition-related changes in urban minority students’ academic performance and perceptions of self and school environment. Journal of Community Psychology, 24, 245–261.
Graham, S. (1994). Motivation in African Americans. Review of Educational Research, 64, 55–117.
Graham, S., & Weiner, B. (1996). Theories and principles of motivation. In D. C. Berliner & R. C. Calfee (Eds.), Handbook of educational psychology (pp. 63–84). New York: Macmillan.
Gravetter, F. J., & Wallnau, L. B. (2000). Statistics for the behavioral sciences (5th ed.). Belmont, CA: Wadsworth/Thomson Learning.
Gutman, L. M., & Midgley, C. (2000). The role of protective factors in supporting the academic achievement of poor African American students during the middle school transition. Journal of Youth and Adolescence, 29, 223–248.
Hale, J. (2003). Learning while Black: Creating educational excellence for African American children. Baltimore, MD: Johns Hopkins University Press.
Harackiewicz, J. M., Durik, A. M., & Barron, K. E. (2005). Multiple goals, optimal motivation, and the development of interest. In J. P. Forgas, K. D. Williams, & S. M. Laham (Eds.), Social motivation: Conscious and unconscious processes (pp. 21–39). Cambridge, UK: Cambridge University Press.
Hidi, S., & Baird, W. (1988). Strategies for increasing text-based interest and students’ recall of expository texts. Reading Research Quarterly, 23, 465–483.
Hidi, S., & Harackiewicz, J. M. (2000). Motivating the academically unmotivated: A critical issue for the 21st century. Review of Educational Research, 70, 151–179.
James, W. (1958). Talks to teachers. New York: Norton.
Krapp, A. (1999). Interest, motivation and learning: An educational-psychological perspective. European Journal of Psychology of Education, 14(1), 23–40.
Lent, R. W., Brown, S. D., & Hackett, G. (1994). Toward a unifying social cognitive theory of career and academic interest, choice, and performance. Journal of Vocational Behavior, 45, 79–122.
Lent, R. W., Brown, S. D., Sheu, H., Schmidt, J., Brenner, B. R., Gloster, C. S., et al. (2005). Social cognitive predictors of academic interests and goals in engineering: Utility for women and students at historically Black universities. Journal of Counseling Psychology, 52, 84–92.
Lent, R. W., Lopez, F. G., & Bieschke, K. J. (1993). Predicting mathematics-related choice and success behaviors: Test of an expanded social cognitive model. Journal of Vocational Behavior, 42, 223–236.
Long, J. F. (2003). Connecting with the content: How teacher interest affects student interest in a core course. Unpublished doctoral dissertation, The Ohio State University, Columbus.
Marjoribanks, K., & Mboya, M. (2004). Learning environments, goal orientations, and interest in music. Journal of Research in Music Education, 52, 155–166.
Meece, J. L., Blumenfeld, P. C., & Hoyle, R. (1988). Students’ goal orientations and cognitive engagement in classroom activities. Journal of Educational Psychology, 80, 514–523.
Meece, J. L., & Miller, S. D. (2001). A longitudinal analysis of elementary school students’ achievement goals in literacy activities. Contemporary Educational Psychology, 26, 458–480.
Middleton, M. J., & Midgley, C. (1997). Avoiding the demonstration of lack of ability: An unexplored aspect of goal theory. Journal of Educational Psychology, 89, 710–718.
Midgley, C., Anderman, E., & Hicks, L. (1995). Differences between elementary and middle school teachers and students. Journal of Early Adolescence, 15, 90–113.
64 Intrinsic Motivation and School Misbehavior: Some Intervention Implications Howard S. Adelman and Linda Taylor
With the “cognitive revolution” in psychology, new work on the construct of intrinsic motivation has emerged. This work has relevance for researchers and practitioners concerned with behavior and learning problems. The purpose of this article is to highlight the importance of understanding intrinsic motivation, with specific respect to research and practice focused on school misbehavior. The value of the concepts of self-determination, competence, and relatedness in studying motivation for devious and deviant behavior is discussed. Then, these concepts are applied to the problem of categorizing intrinsically motivated misbehavior. Finally, implications for intervention are explored to suggest directions for formal research and experimental practice.
Source: Journal of Learning Disabilities, 23(9) (1990): 541–550.
An Intrinsic View of Motivation
The following draws primarily on the work of Deci and his colleagues, because their theoretical ideas are consistent with a large amount of theory and research, and they consistently apply their work to schooling, clinical intervention, and special education populations (e.g., Deci, 1975, 1980; Deci & Chandler, 1986; Deci & Ryan, 1985; Ryan, Connell, & Deci, 1985). That perspective specifies three fundamental psychological needs motivating
human activity – self-determination, competence, and relatedness (see Note 1). These are seen as the intrinsic motivating forces that lead individuals to seek out challenges; and, seeking and conquering challenges are widely viewed as fundamental to development of the internal structures that guide subsequent action. In contrast to growth-oriented-behavior theorists, intrinsic motivation theorists (e.g., Brehm & Brehm, 1981; Condry, 1977; Deci & Ryan, 1985; McGraw, 1978) emphasize that individuals are especially vulnerable to events that (a) exert pressure and control or (b) lead to repeated failure/negative feedback, or to outcomes that are unpredictable or uncontrollable. Obvious examples are demands for conformity enforced with punishment for noncompliance. Less obvious examples are efforts to use material and social rewards to control behavior. For instance, even when students “shape up” to obtain an offered reward, they may perceive the situation as another effort to control them (i.e., limit their self-determination). Several research reviews concur that the use of rewards, surveillance, and deadlines, and other actions that exert pressure and control on an individual, can undermine feelings of self-determination and lead to psychological reactance (see Note 2) (e.g., Brehm & Brehm, 1981; Condry, 1977; Deci & Ryan, 1985; McGraw, 1978; Ryan et al., 1985). It should be stressed, however, that it is the surrounding context, not events themselves, that is seen as determining whether external control produces negative effects. As Deci and Chandler (1986) pointed out,
In the realm of education, both the general classroom context and the specific context for any given child seem to be determined primarily by the teachers’ orientations and intentions. For example, when teachers offer rewards or impose deadlines with the intent of controlling the children’s behavior – of getting the children to do what they want them to do – the rewards and deadlines have predicted negative effects (e.g., Deci, Nezlek, & Sheinman, 1981). On the other hand, when these events are presented as informative structures, as ways of acknowledging independent achievement or creative initiations, for example, they do not have negative effects. (p. 589)
From the above perspective, then, a considerable amount of the motivation underlying school misbehavior can be understood in terms of a student’s (a) growth-oriented activity stemming from psychological needs for self-determination, competence, and relatedness, and (b) reactions to threats to these three psychological needs. In the latter instance, degree of threat is dependent on how the student perceives events and their context. The growing body of empirical support relevant to intrinsic motivation is too large to review here (see Deci & Ryan, 1985; Weiner, 1980). A few studies relevant to the concept of self-determination should suffice to illustrate the value of intrinsic motivation constructs in understanding school behavior problems. Much of the research generated by reactance theory is germane to self-determination (see Brehm & Brehm, 1981). Particularly pertinent are studies
of the relation between hostility and defiance and one’s objective control over events. Most of these studies involve an experimental manipulation under laboratory conditions, wherein one person threatens the freedom of another. A representative study (Worchel, 1974) operationalizes threat by taking away a specific choice and replacing it with an alternative. Results show predicted increases in hostility due to frustrated expectations, with degree of hostility related to degree of disconfirmation of the expected choice and attractiveness of the alternative. Research on perceived control also is relevant to self-determination, in that having perceived control is necessary for people to feel self-determining (see Note 3). The bulk of work in this area emphasizes the relation of (a) positive perceptions of control to well-being and (b) negative perceptions to problem situations. For instance, research on stress and health has documented benefits of increased perceived control (see Langer, 1983); studies of school achievement and perceived control also report a strong positive relationship (see Stipek & Weisz, 1981; Weisz & Cameron, 1985). A specific example of direct relevance to school misbehavior is the work of Allen and Greenberger (1980) on destructive behavior. These investigators reported a series of laboratory studies on increases in sense of personal control as related to damage to the physical environment. In a task involving destruction of a block tower, undergraduates reported that (a) the act increased their feelings of perceived control and success, (b) the increase was greatest for those placed in a state of low general control (failure) prior to destruction, and (c) the degree of the complexity of and objective control over destruction was positively related to feelings of success. 
In the same report, data from an interview with a sample of one hundred twenty 18- to 20-year-old males regarding their motivation for vandalism relate such acts to a sense of self-determination. (Feelings of accomplishment about damaging the environment are illustrated by the respondent who had smashed a high school locker; he recalled thinking with pride each time he passed it, “There’s my little destruction to this brand new school.”) As Allen and Greenberger (1980) suggested, failure situations can lead to a state of low perceived control. Thus, students who do poorly would be expected to have lower levels of perceived control than those who achieve to their satisfaction. Indeed, our research group has found lower levels of perceived control among students diagnosed as learning disabled as contrasted to nonproblem student samples (Adelman, Smith, Nelson, Taylor, & Phares, 1986). Similarly, in a study at an inner city, lower SES area junior high, Nichols (1985) found decreasing levels of perceived control among a contrast group not enrolled in an experimental leadership-training program. Assuming that lower levels of perceived control reflect threats to self-determination, both theory and research suggest that students who do poorly at school may manifest some form of deviant or devious behavior in order to increase, at least temporarily, feelings of self-determination.
Toward Categorizing Intrinsic Motivation for Misbehavior
Once one adopts an intrinsic view of motivation, the concepts can be applied in efforts to categorize intrinsic motivational underpinnings of devious and deviant behavior. Before briefly illustrating the point with a working schema, however, the basic context for such work is stressed, namely, the problem of describing school misbehavior.
Categorizing School Misbehavior
What makes a particular act at school deviant? Charles (1985) stated that classroom misbehavior is “behavior that the teacher judges to be inappropriate for a given time or place” (p. 4). That is, an act that bothers the teacher usually is seen as deviant. This is not to say the act should be identified as misbehavior, or that the student should be seen as a discipline problem. It is simply a recognition that, as a representative of a powerful system, the teacher’s perception generally prevails and is the starting point for intervention. Similarly, any effort to categorize and classify troublesome school behavior is influenced by the rationale used by the classifier. For example, one could simply categorize misbehavior in terms of designations used on behavior rating scales – arguing, high or low activity levels, crying, hitting and fighting, destroying things, not following rules and directions, not participating in class or not coming to school, lying, cheating, and so forth. (Such descriptive terms imply nothing about motivation.) Most practitioners and researchers, however, are not satisfied with descriptive terms. They prefer to classify misbehavior with respect to its disruptive influence or postulated underlying pathology. For instance, some refer to acting out (as contrasted with withdrawn) students with labels such as uncooperative, noncompliant, disrespectful, or inappropriately aggressive. These labels do not specify the motivational underpinnings for misbehavior, but the behavior commonly is explained in terms of youngsters’ desires to “get attention,” “flout authority,” or “avoid doing the assigned work.” In contrast, some professionals see deviant and devious behavior as a manifestation of disabilities associated with psychological and physiological malfunctioning. Thus, they use diagnostic labels such as attention deficit-hyperactivity disorder, oppositional defiant disorder, depression, and so forth.
One implication of the underlying-pathology view is that misbehavior is not always rationally motivated; another related implication is that, although observers often infer rational intention, they may be in error. (Some intervenors offer these implications as a basis for not considering rational motives in dealing with misbehavior; others are unconcerned about motives because their practices are not based on differential diagnoses of cause.)
Efforts to improve the classification of children’s psychological and educational problems involve an interplay of conceptual and empirical activity. Progress has accelerated with advances in methodology (Achenbach, 1988); however, the consensus among reviewers is that considerable problems remain, especially with respect to validation of taxonomies (Achenbach & Edelbrock, 1983; Dreger, 1982; Quay, 1986; Quay, Routh, & Shapiro, 1987; Rutter & Gould, 1985; Rutter & Tuma, 1988). As Quay et al. (1987) stressed in discussing the classification of childhood psychopathology, the field needs “categories or dimensions that (a) can be discriminated from other syndromes and thus reliably diagnosed or measured; and also meet one or more of the following criteria: (b) are associated with different causes, (c) have different outcomes, or (d) respond to different interventions” (p. 493). These reviewers conclude that “the application of these criteria would seem to rule out many of the categories of DSM-III and not a few of the empirically derived dimensions as well” (p. 493).
One of our prerequisite tasks in exploring motivational underpinnings for school misbehavior is that of categorically describing manifestations of such behavior. Because of the deficiencies related to current schemes for classifying children’s problems, we have found it helpful for now to return to the use of basic dimensions to describe observable manifestations. The basic descriptive dimensions we use are (1) general type of behavior, (2) overtness, (3) energy level, (4) pervasiveness, and (5) frequency. That is, first we group the acts into the two major categories that are highlighted by multivariate analytic studies (cf. Quay & Werry, 1986): (a) acting-out behavior and (b) withdrawal (physical and psychological).
Second, we distinguish each category in terms of whether it is manifested overtly or covertly, whether the act is displayed in a highly intensive or passive manner, and whether it is seen in a narrow or broad range of situations. Then, we rate its frequency of occurrence. We find that this categorization provides sufficient descriptive differentiation among troublesome acts to allow us to proceed with exploring the motivational underpinnings for such behavior.
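For readers who want to record observations along these five descriptive dimensions, the scheme can be sketched as a small data structure. This is only an illustrative sketch, not the authors' instrument: the class and field names are hypothetical, the 1-to-5 frequency scale is an assumption (the text does not specify a rating scale), and the example act is invented.

```python
from dataclasses import dataclass
from enum import Enum


class BehaviorType(Enum):
    ACTING_OUT = "acting-out"
    WITHDRAWAL = "withdrawal"


class Overtness(Enum):
    OVERT = "overt"
    COVERT = "covert"


class EnergyLevel(Enum):
    INTENSIVE = "highly intensive"
    PASSIVE = "passive"


class Pervasiveness(Enum):
    NARROW = "narrow range of situations"
    BROAD = "broad range of situations"


@dataclass(frozen=True)
class ActDescription:
    """One troublesome act described on the five basic dimensions."""
    behavior_type: BehaviorType
    overtness: Overtness
    energy_level: EnergyLevel
    pervasiveness: Pervasiveness
    frequency: int  # assumed scale: 1 (rare) to 5 (very frequent)

    def __post_init__(self):
        # Reject ratings outside the assumed 1-5 scale.
        if not 1 <= self.frequency <= 5:
            raise ValueError("frequency must be between 1 and 5")

    def summary(self) -> str:
        return (f"{self.overtness.value}, {self.energy_level.value} "
                f"{self.behavior_type.value} in a {self.pervasiveness.value} "
                f"(frequency {self.frequency}/5)")


# Invented example: open, forceful destruction of materials in one class only.
example = ActDescription(
    BehaviorType.ACTING_OUT,
    Overtness.OVERT,
    EnergyLevel.INTENSIVE,
    Pervasiveness.NARROW,
    frequency=4,
)
print(example.summary())
```

Note that the record is purely descriptive; in keeping with the text, it says nothing yet about why the act occurred.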
Categorizing Motivational Underpinnings
As with learning problems, the largest proportion of deviant and devious behavior seen at school is not likely the result of internal pathology (Adelman & Taylor, 1983, 1986a; Winett, Stefanek, & Riley, 1983). Indeed, the majority of such behavior probably is motivated and rational, judging from the frequency with which misbehavior is characterized as an act of defiance, a diversion, revenge, an effort to deceive, manipulate, be nonconforming or anticonforming, and so forth. (We do recognize that some behavior problems are an unintentional by-product of efforts to cope without having acquired the skills for doing so appropriately. Purely unintentional misbehavior, however,
seems most prevalent among young children who have not yet experienced a great deal of failure or frustration and among individuals with significant disabilities.) In categorizing intentional misbehavior, a motivational perspective requires distinguishing between motivational subgroups. To begin with, it is useful to differentiate proactive from reactive behavior. Proactive behavior is the individual’s effort to seek out or establish conditions that produce feelings of satisfaction. With respect to misbehavior, such actions reflect an approach tendency. Reactive behavior is seen in efforts to cope and defend against conditions that produce unpleasant feelings. With respect to misbehavior, such actions reflect an avoidance tendency. Furthermore, as conceived by Deci and his colleagues, the intrinsic motivational bases for most intentional behavior can be viewed as stemming from a desire to feel self-determining, competent, and related to others. From this theoretical base, a substantial portion of misbehavior at school can be understood in terms of students’ attempts to act in ways that increase feelings of control, competence, and connectedness. That is, some misbehavior reflects proactive efforts to do things that will lead to such feelings; other behavior reflects reactive efforts to deal with threats that interfere with such feelings (see Figure 1). For example, students often are compelled into situations in which they feel they cannot perform effectively and, under such circumstances, may react in negative or inappropriate ways, to avoid or protest what is happening. Over a period of time, this reactive behavior that was initially designed to defend against aversive situations can become an established pattern of coping (Adelman & Taylor, 1986a). The same action may reflect proactive or reactive motivation and stem from a desire to feel self-determining, competent, or related to others. The misbehavior may take the form of overt or covert actions. 
[Figure 1: Motivational underpinnings for intentional misbehavior in the classroom. The figure crosses intentional misbehavior (proactive, reactive) with type of intrinsic motive (self-determination, competence, relatedness).]
Examples of the former include open, direct defiance and aggression, direct physical or psychological withdrawal, and nonconformity or conformity to deviant models;
examples of the latter include manipulation, deception, passive withdrawal, and psychologically induced physical illness. The importance of distinguishing the underlying motivation for misbehavior at school can be illustrated by thinking about six students who are noncompliant, particularly when it comes to following school rules. Student A sets off the fire alarm in the hall. His action is based on a proactive effort to stir up some excitement. He is seeking the challenge of breaking the law and the feeling of competence that results from not getting caught. Student B paints graffiti all over the bathroom wall. Her act also is proactive; but the behavior is intended to increase her acceptance by a subgroup of peers who she believes expect her to defy the rules (i.e., she is trying to conform to the standards set by deviant role models). Student C proactively seeks to satisfy her need for self-determination by dressing according to her own view of what is best for her, rather than adhering to the school’s dress code. In contrast, Student D’s action is reactive: He defiantly breaks the dress code as a direct protest against what he views as a threat to his self-determination. Student E’s action is also reactive: She refuses to do classroom assignments, as a way of diverting attention from the fact that she lacks the skills to perform competently in class. Finally, Student F has been rebuffed in his efforts to establish a positive relationship with his teacher, and he reacts by withdrawing and giving her “the silent treatment.”
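The six student examples can be laid out against the two-by-three schema of Figure 1. The sketch below is one possible reading, not the authors' own coding: the names are hypothetical, and assigning Student B (seeking peer acceptance) to relatedness is an interpretation of the text.

```python
from enum import Enum
from typing import NamedTuple


class Orientation(Enum):
    PROACTIVE = "proactive"
    REACTIVE = "reactive"


class Motive(Enum):
    SELF_DETERMINATION = "self-determination"
    COMPETENCE = "competence"
    RELATEDNESS = "relatedness"


class Case(NamedTuple):
    act: str
    orientation: Orientation
    motive: Motive


# Codings paraphrased from the six examples in the text.
STUDENTS = {
    "A": Case("sets off the fire alarm",
              Orientation.PROACTIVE, Motive.COMPETENCE),
    "B": Case("paints graffiti on the bathroom wall",
              Orientation.PROACTIVE, Motive.RELATEDNESS),
    "C": Case("dresses by her own standards, not the dress code",
              Orientation.PROACTIVE, Motive.SELF_DETERMINATION),
    "D": Case("breaks the dress code as a protest",
              Orientation.REACTIVE, Motive.SELF_DETERMINATION),
    "E": Case("refuses to do classroom assignments",
              Orientation.REACTIVE, Motive.COMPETENCE),
    "F": Case("withdraws and gives the silent treatment",
              Orientation.REACTIVE, Motive.RELATEDNESS),
}


def by_cell(cases):
    """Group student labels by (orientation, motive) cell of the schema."""
    grid = {(o, m): [] for o in Orientation for m in Motive}
    for label, case in cases.items():
        grid[(case.orientation, case.motive)].append(label)
    return grid


GRID = by_cell(STUDENTS)
for (orientation, motive), labels in GRID.items():
    print(f"{orientation.value:9} | {motive.value:18} | {', '.join(labels)}")
```

Under this reading each of the six cells holds exactly one student, which underlines the point of the examples: outwardly similar noncompliance (Students C and D both break the dress code) can fall in different cells.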
Implications for Intervention
Interventions to deal with misbehavior can be categorized in terms of phases, namely, efforts to prevent and anticipate misbehavior, actions to be taken during misbehavior, and steps to be taken afterwards. An understanding of intrinsic motivation in general, and reactive and proactive deviance in particular, is critical for each of these intervention phases. For example, with respect to prevention, most professionals, regardless of theoretical orientation, recognize that social and school program improvements could reduce learning and behavior problems significantly. There is increasing acceptance that a primary preventive step involves normative changes in classroom programs. From the perspective of intrinsic motivation theory, such changes include designing classroom instruction to better match the broad range of differences in students’ intrinsic motivation, as well as their differences in capability. Indeed, such changes are considered to be an essential prerequisite to individual intervention (e.g., Adelman, 1989; Adelman & Taylor, 1985, 1986b; Maher & Zins, 1987; Millman, Schaefer, & Cohen, 1981). However, even if primary and secondary preventive steps are taken, there remains the necessity of intervening with individuals who continue to be troublesome. Discussions of practices for dealing with such students often are organized around the topics of discipline, classroom management, and
student behavioral self-management. An appreciation of the role intrinsic motivation plays in deviant and devious behavior suggests approaches to such behavior that go beyond current disciplinary and management practices. Before discussing these matters, however, it is important to acknowledge the necessity of dealing with the impact of misbehavior and to highlight practical and research implications related to minimizing negative motivational and behavioral repercussions.
Dealing with the Impact of Misbehavior: A Motivational Perspective
The first concern of school personnel is almost always the impact of misbehavior, and rightly so. Such behavior disrupts; it may be hurtful; it may disinhibit others. Thus, when a youngster misbehaves, a natural reaction is to want that youngster to experience, and other students to see, the consequences of misbehaving, in hopes that the consequences will deter subsequent misbehavior. That is, because the impact of misbehavior is typically the first concern, the primary focus of intervention usually is on discipline (e.g., Charles, 1985; Dreikurs, Grunwald, & Pepper, 1982; Hyman, Flanagan, & Smith, 1982; Knoff, 1987; Wolfgang & Glickman, 1986). Given the primary role assigned to disciplinary practices in responding to school misbehavior, it is essential that their impact on intrinsic motivation be considered and investigated. Thus, some motivational concerns are highlighted here as a stimulus for practice and research.
Discipline. Knoff (1987) presents three definitions of discipline as applied in schools:
(a) … a punitive intervention; (b) … a means of suppressing or eliminating inappropriate behavior, of teaching or reinforcing appropriate behavior, and of redirecting potentially inappropriate behavior toward acceptable ends; and (c) … a process of self-control whereby the (potentially) misbehaving student applies techniques that interrupt inappropriate behavior, and that replace it with acceptable behavior. (p. 119)
In contrast to the first definition, which specifies discipline as punishment, Knoff (1987) sees the other two as nonpunitive, or, as he calls them, “positive, best practices approaches” (p. 119). He appears to make this distinction because of the general consensus that punishment is an undesirable form of discipline, to be used only in an emergency. School personnel often see punishment as the only recourse in dealing with a student’s misbehavior. That is, they use the most potent negative consequences available to them in a desperate effort to control an individual and make it clear to others that acting in such a fashion will not be tolerated.
A demand for future compliance is usually made, along with threats of harsher punishment if compliance is not forthcoming. Furthermore, the discipline may be administered in a way that suggests that the student is officially seen as an undesirable person. As with many emergency procedures, the benefits of using punishment may be offset by a variety of negative consequences (e.g., increases in negative attitudes toward school and school personnel, which often lead to other forms of misbehavior). Thus, as soon as the emergency is resolved, the emphasis often shifts from punishment to implementing logical consequences.
Logical Consequences and Recipient Perceptions. Guidelines for managing misbehavior generally emphasize the desirability of presenting discipline as reasonable, fair, and nondenigrating. Intrinsic motivation theory specifically stresses that recipients of positive, best-practices approaches experience them as legitimate disciplinary acts that neither denigrate one’s sense of worth nor reduce one’s sense of autonomy (e.g., Deci & Ryan, 1985). To these ends, discussions of classroom management practices usually emphasize establishing and administering logical consequences. This idea is evident in situations where there are naturally occurring consequences (e.g., if you touch a hot stove, you get burned). In classrooms, there may be little ambiguity about the rules; unfortunately, the same often cannot be said about “logical” penalties. Even when the consequence for a particular rule infraction has been specified ahead of time, its logic may be more in the mind of the teacher than in the eye of the student. Indeed, the distinctions made by Knoff (1987) reflect an observer’s perspective of discipline. In the recipient’s view, any act of discipline may be experienced as punitive (e.g., unreasonable, unfair, denigrating, disempowering).
Basically, consequences involve depriving students of something they want and/or making them experience something they do not want to experience. Consequences usually take the form of (a) removal/deprivation (e.g., loss of privileges, removal from an activity); (b) reprimands (e.g., public censure); (c) reparations (to compensate for any losses arising from the misbehavior); and (d) recantations (e.g., apologies, plans for avoiding future problems). For instance, teachers commonly deal with acting-out behavior by removing a student from an activity. To the teacher, this step (often described as “time out”) may be seen as a logical way to stop the student from disrupting others by isolating him or her; or, the logic may be that the student needs a cooling-off period. It may be reasoned that (a) by misbehaving the student has shown that she or he does not deserve the privilege of participating (assuming the student likes the activity), and (b) the loss will lead to improved behavior in order to avoid future deprivation. Most teachers have little difficulty explaining their reasons for using a particular consequence. However, if the intent really is to have students perceive consequences as logical and nondebilitating, it seems logical to first
Motivation
determine whether the recipient sees a particular disciplinary act as a legitimate response to his or her misbehavior. Moreover, it is important to recognize the difficulty of administering consequences in a way that minimizes the negative impact on the recipient’s perceptions of self. That is, although the intent is to stress that it is the misbehavior and its impact that are bad, the student can too easily experience the process as a characterization of her or him as a bad person. Organized sports, such as youth basketball and soccer, offer examples of an established, accepted set of consequences that gives major consideration to the recipients’ perceptions. In these arenas, the referee is able to use the rules and related criteria to identify inappropriate acts and apply penalties; moreover, he or she is expected to do so with positive concern for maintaining the youngster’s dignity as well as engendering respect for others. For discipline to be seen as a logical consequence, it may be necessary to take steps to convey that (a) disciplinary responses are not personally motivated acts of power (e.g., an authoritarian action) and, at the same time, (b) the social order has established rational reactions to a student’s behavior that negatively affects others. Also, if the intent of the discipline is a long-term reduction in future misbehavior, it may be necessary to take steps to help students learn right from wrong, to respect others’ rights, and to accept responsibility. Toward these ends, motivational theorists suggest it may be useful to (a) establish a publicly accepted set of consequences, to increase the likelihood that students will experience them as socially just (e.g., reasonable, firm but fair), and (b) administer such consequences in ways that allow students to maintain a sense of integrity, dignity, and autonomy (e.g., Brehm & Brehm, 1981; Deci & Ryan, 1985).
All of this is best achieved under conditions wherein students are empowered (e.g., are involved in deciding how to rectify the situation and avoid future misbehavior and are given opportunities for subsequent positive involvement and reputation-building at school). From a motivational perspective, then, it is essential to (a) gain a better understanding of recipient perceptions of discipline and (b) develop disciplinary practices that minimize negative repercussions. These are both areas where there is a dearth of direct research.
Addressing Underlying Motivation

Beyond discipline, there is a need for research on interventions designed to address the roots of misbehavior, especially the underlying motivational bases for such behavior. Consider students who spend most of the day trying to avoid all or part of the instructional program. An intrinsic motivational interpretation of the avoidance behavior of many of these youngsters is that it reflects their perception that school is not a place where they experience
Adelman and Taylor
Intrinsic Motivation and School Misbehavior
a sense of competence, autonomy, and/or relatedness to others. Over time, these perceptions develop into strong motivational dispositions and related patterns of misbehavior. Relevant interventions for such problems begin with major changes in social and school programs. The aims of such changes vis-à-vis motivational problems are to (a) prevent and overcome negative attitudes toward school and learning, (b) enhance motivational readiness for learning and overcoming problems that arise, (c) maintain intrinsic motivation throughout learning and problem-solving processes, and (d) nurture the type of continuing motivation that results in students’ engaging in activities away from school that facilitate maintenance, generalization, and expansion of learning and problem solving. Failure to attend to these motivational concerns in a comprehensive, normative manner results in approaching passive and often hostile students with practices that can instigate and exacerbate many learning and behavior problems. After accomplishing broad programmatic changes to the degree feasible, intervention with a misbehaving student involves remedial steps directed at specific factors associated with unintentional, proactive, and/or reactive deviance. Because the concern here is with intentional behavior problems, the focus in the following sections is primarily on reactive and proactive misbehavior. First, a few implications for counseling and consulting are highlighted and then implications for general changes in school programs are discussed.

Counseling and Consulting. Understanding the motivational ideas discussed above can profoundly influence research and practice focused on counseling individuals who misbehave and consulting with their teachers and parents. For instance, with intrinsic motivation in mind, the following assessment questions arise:

1. Is the misbehavior unintentional or intentional?
2. If it is intentional, is it reactive or proactive?
3. If the misbehavior is reactive, is it a reaction to threats to self-determination, competence, or relatedness?
4. If it is proactive, are there other interests that might successfully compete with the satisfaction derived from deviant behavior?

The answers to these questions may be based on perspectives of cause that are related by teachers, parents, and the identified student. (In ruling out a skill deficit, data also are needed on the youngster’s basic abilities.) However, because of attributional biases, one can expect these interested parties to offer different causal views (Jones & Nisbett, 1972; Miller & Ross, 1975; Monson & Snyder, 1977). Rather than confounding assessment, such differences can help clarify the student’s underlying motivation and how others interpret that motivation. Both matters can be seen as central to planning corrective strategies aimed at affecting the student’s intrinsic motivation.
That is, differing perceptions can compound a problem by resulting in different analyses of what is wrong and what should be done. Awareness of differences in perceived cause enables interveners to explore how those differences are affecting the actions of each interested party and clarify which perceptions may be counterproductive to resolving the problem. With respect to resolving the problem, intrinsic motivational theory suggests that individual corrective interventions for those who misbehave reactively require steps designed to reduce reactance and enhance positive motivation for participating in an intervention. For youngsters highly motivated to pursue deviance (e.g., those who proactively engage in criminal acts), even more is needed. Intervention might focus on helping these youngsters identify and follow through on a range of valued, socially appropriate alternatives to deviant activity. From the theoretical perspective presented above, such alternatives must be capable of producing greater feelings of self-determination, competence, and relatedness than usually result from the youngster’s deviant actions. To these ends, motivational analyses of the problem can point to corrective steps for implementation by teachers, clinicians, parents, or the students themselves. If misbehavior is unintentional, the focus of intervention at school, clinic, and home probably needs to be directed only at reducing stress and building skills. However, if the behavior is intentional, all interested parties probably should be encouraged to (a) eliminate situations leading to reactivity and establish alternative ways for the student to cope with what cannot be changed, or (b) establish activity options designed to redirect proactive misbehavior toward prosocial interests and behavior. 
For example, consultants might help teachers and parents understand the motivational bases for a youngster’s misbehavior and facilitate environmental and programmatic changes that would take into account the youngster’s need to feel self-determining, competent, and related. Similarly, in direct counseling with students whose misbehavior is intentionally reactive, short-term work might stress increasing a student’s awareness and how she or he can work with significant others to produce circumstances that better match his or her psychological needs. Comparable counseling might be provided to those exhibiting proactive deviance; however, evidence from delinquent populations suggests that short-term counseling in such cases is rather ineffective. Indeed, for both groups, it must be acknowledged that little is known about how effective even long-term psychotherapy or behavior change strategies might be. Nevertheless, long-term intervention generally provides the time necessary to deal with students’ affect, increasing their understanding of why they behave as they do, and exploring the possibility of change (Lambert, Shapiro, & Bergin, 1986). The question of how well such outcomes can be achieved awaits appropriate evaluative research (Adelman, 1986; Maher & Bennett, 1984).
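The assessment questions and the matching of intervention focus to the type of misbehavior can be read as a small decision procedure. The following Python sketch is offered only as an illustration of that logic and is not part of the authors' framework. The names `Deviance`, `classify`, and `intervention_focus` are hypothetical, and the focus strings paraphrase the text above rather than quote a validated protocol.

```python
from enum import Enum
from typing import List, Optional


class Deviance(Enum):
    """Hypothetical labels for the three categories discussed in the text."""
    UNINTENTIONAL = "unintentional"
    REACTIVE = "reactive"
    PROACTIVE = "proactive"


# Intervention focus per category, paraphrased from the discussion above.
FOCUS = {
    Deviance.UNINTENTIONAL: ["reduce stress", "build skills"],
    Deviance.REACTIVE: [
        "eliminate situations leading to reactivity",
        "establish alternative ways to cope with what cannot be changed",
    ],
    Deviance.PROACTIVE: [
        "establish activity options that redirect toward prosocial interests",
    ],
}


def classify(intentional: bool, reactive: Optional[bool] = None) -> Deviance:
    """Apply assessment questions 1 and 2 in order."""
    if not intentional:
        return Deviance.UNINTENTIONAL
    if reactive is None:
        # Question 2 must be answered before an intentional case can be placed.
        raise ValueError("intentional misbehavior needs a reactive/proactive judgment")
    return Deviance.REACTIVE if reactive else Deviance.PROACTIVE


def intervention_focus(kind: Deviance) -> List[str]:
    """Return a copy of the suggested focus list for a category."""
    return list(FOCUS[kind])
```

In practice the inputs to such a procedure are judgments gathered from teachers, parents, and the student, and the text stresses that these parties can be expected to disagree. The sketch therefore shows only the branching structure, not how the judgments themselves are reached.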
From a motivational perspective, an appropriate test of the efficacy of long-term psychotherapeutic and behavior change interventions for intentional misbehavior requires more than specifying what one wants youngsters to understand and do. It requires interventions that systematically address intrinsic motivation as key process or “enabling” objectives. To be specific, the intervention must deal with the initial attitudes these youngsters are likely to bring to the counseling situation. They are unlikely to approach the process positively or even neutrally; that is, there are negative attitudes to overcome. Assuming negative attitudes are overcome, the intervenor must be able to (a) enhance the youngster’s motivational readiness to develop a working relationship, and (b) maintain the youngster’s positive intrinsic motivation for as long as intervention is needed. In terms of the motivational concepts discussed above, from the beginning of the intervention until its successful completion, the process should strive to stimulate feelings of self-determination, competence, and interpersonal relatedness. Finally, the intervention should focus on intrinsic motivation as an outcome objective. That is, the process should nurture the type of ongoing intrinsic motivation that results in the youngster engaging in activities away from the intervention setting that facilitate maintenance and generalization of problem-solving behavior.

School Program Changes to Deal with Reactive Misbehavior. A student who perceives school personnel and activities as threats to his or her self-determination, competence, or sense of relatedness to others may react in protective ways. For instance, a student who expects to do poorly on an assigned classroom task may misbehave as a way of avoiding the activity (Brehm & Brehm, 1981; Kaplan, 1980). If the teacher’s reaction to the misbehavior is to threaten or apply punitive measures, the student may react in increasingly negative ways.
The case of Bret provides an example: Because of his many experiences of failure at school, Bret tends to perceive learning situations as threatening. Even before he knows much about a task, he expects to have difficulty coping. Thus, he feels vulnerable, fearful, and sometimes angry at being pushed into such situations. He would like to avoid them, and if he cannot do so directly, he tries indirect ways, such as diverting the teacher to a discussion of other matters. When he cannot manipulate the situation effectively, he engages in various acting-out behaviors, such as arguing, inciting the class to disruption, or regularly missing school. This often leads to a power struggle with the teacher and results in Bret being sent to the principal or home. After a number of such experiences, he has developed rather strong negative expectations and attitudes about school and teachers and has learned a rather large range of behaviors to protect himself from what he perceives as bad situations. Unfortunately, the more he displays such behavior, the more those around him tend to think of him as uncontrollable and incorrigible.
A great deal of the negative behavior of students such as Bret may reflect reactions to immediate school pressures. Those with long or intense histories of school problems may develop the general expectation that most classroom experiences are hurtful. Consequently, a student may approach all classroom situations looking for the worst – and, thus, perceiving it. Even when a teacher offers “exciting” new opportunities, the student may not perceive them as such. If the intention is to address the motivational underpinnings for reactive misbehavior, two intervention objectives seem fundamental: (1) to minimize external demands to perform and conform (e.g., eliminate threats) and (2) to explore learning activities with the student in order to identify nonthreatening and interesting replacements (e.g., establish a program of intrinsically motivated activity). To these ends, intervention focuses first on assessing (if feasible) the nature of any perceived threats. Such an assessment is guided by motivational thinking about threats to perceived self-determination, competence, and relatedness. The data are then used to replace threatening situations and tasks with activity that produces positive perceptions with respect to identified psychological needs. Even if the specific areas of threat cannot be assessed, one can proceed to work with the student to eliminate aspects of the program that he or she appears to be reacting against. In making changes, it is important to realize that students with extremely negative perceptions of teachers and school programs are not likely to be open to “new” activities that look like “the same old thing.” There have to be vivid variations in the alternatives for students to perceive differences. Several key elements of such interventions are summarized after the following discussion of proactive misbehavior. School Program Changes to Deal with Proactive Misbehavior. 
Proactive misbehavior is aimed at directly producing feelings of satisfaction. That is, non-cooperative, disruptive, and aggressive behavior may be rewarding or satisfying to an individual because the behavior itself is exciting or because the behavior leads to desired outcomes (e.g., peer recognition, feelings of autonomy and/or competence). Intentional negative behavior stemming from such motivation can be viewed as the direct pursuit of deviance. In practice, it is not easy to differentiate reactive from proactive misbehavior. For example, one student may proactively engage in decorating school walls with graffiti because he or she finds it to be an interesting and exciting act; another may engage in the activity because of norms established by a valued peer group. Still another may reactively engage in such behavior because of anger toward school authorities. (Subsequently, this last student may fall in with negative role models, such as gang members, and adopt their pattern of proactive misbehavior, e.g., delinquent acts that are intrinsically interesting and exciting.) And, of course, students involved in deviant behavior inevitably come into conflict with school authorities and soon manifest additional reactive misbehavior.
Proactive misbehavior, such as staying home from school to watch TV or hang out with friends, participating with gangs, using drugs, and baiting authority, may be much more interesting and exciting to some students than any activity that school offers. That is probably why proactive misbehavior is so difficult to alter. From the perspective of intrinsic motivation theory, the fundamental objective of intervention in such cases is to establish a program of intrinsically motivated activity powerful enough to compete with the satisfaction gained from the misbehavior. This means the intervenor must be able to explore options well beyond the norm in offering nonthreatening and interesting learning activities to replace the student’s current school program. At the same time, because such students are unlikely to give up their pursuit of deviance quickly, it may be necessary, initially, to accommodate a wider range of behavior than typically is accepted in schools. That is, if the intention is to recapture the interest of such students, one may have to increase one’s tolerance, for a while, of certain “bad manners” (e.g., some rudeness, some swearing), eccentric mannerisms (e.g., strange clothing and grooming), and temporary nonparticipation. To be more specific, it may be necessary to begin by exploring with the student (a) topical interests (e.g., sports, rock music, movies and TV shows, computer games, auto mechanics) and (b) desired activities (e.g., working with certain individuals, use of nonstandard materials, special status roles). Discussion and sampling of the area of interest may have to be continued until the student identifies a specific facet that she or he would like to learn more about. Concomitantly, the intervenor may have to redefine rules and standards so that limits on behavior are expanded for such students (i.e., certain behaviors are tolerated and not treated as misbehavior). 
Failure to do so may account for the large proportion of these students who are pushed out or drop out due to constant conflict over misconduct. A case example suggests the extremes that may have to be attempted: Harry would come to school, but he had no interest in working on what his teachers had planned. He spent much of the time talking to friends and looking for exciting ways to make the time pass. He was frequently in the midst of whatever trouble was occurring in class. He was unresponsive to threats of punishment. He readily accepted suspensions. It seemed clear that unless something dramatic were done he would be expelled from school. Rather than letting the tragedy run its course, school personnel decided to try an experimental intervention. The teacher set aside time to help Harry identify one area of personal interest that he would like to learn more about. After some discussion, he indicated that he wanted to be a rock musician and would be interested in learning more about how people got into the field; he would also like to spend time improving his musical skills. Based on his stated interest, several interesting and realistic activities were identified that he would pursue, such as writing letters to musicians and agencies, instrument instruction and practice, and reading relevant publications. It was clear,
however, that the one topic and those few activities would not hold his interest all day. Indeed, it was likely that what had been planned would involve him for only 1 to 2 hours a day. Thus, it was easy to anticipate that he would simply fall into his pattern of misbehaving for the remainder of the day, and the experimental effort to counter his misbehavior by building an intrinsically motivating program would be defeated. The solution devised for this problem was as simple as it was controversial. Harry was scheduled to come to school for only that period of time for which he had planned a program he intended to pursue. The reasoning for this approach was twofold: (1) It is clear that students such as Harry work only on what they have identified as desirable, and (2) they not only waste the rest of their time, but use it to pursue deviant behavior. If they are not at school for a full day, they are less likely to get into as much trouble at school. But, more important, the less that school personnel are in the position of coercing and punishing them, the less likely the problem will be confounded by misbehavior that is a reaction to such practices. Moreover, when such students no longer are expending energy misbehaving, they are in a better position to work with the teacher to develop an increasing range of academic interests. Indeed, it was a matter of only a few weeks before Harry indicated several additional areas of interest, including a desire to improve his reading. To accommodate his interests, his school day was expanded. Within a period of several months, he was regularly attending school all day, pursuing a combination of personally designated areas of interest and an increasing amount of the basic curriculum.
Clearly, there are many practical, economic, and legal problems involved in a strategy such as cutting back on the length of a student’s school day. However, those problems should be considered in the context of the cost to society (and individuals) of ignoring the fact that forcing certain students to stay in school all day might actually interfere with correcting their problems. It may be better to temporarily reduce a student’s time at school for positive reasons than for punishment (e.g., suspensions), or because of truancy. For older students, of course, a shortened day paired with a part-time job or apprenticeship is already an accepted and often productive strategy. Key Program Elements in Addressing Motivational Problems. As stated above, it is important to realize that students with extremely negative perceptions of teachers and school programs may not respond to changes that look like “the same old thing.” It seems necessary to, at the very least, make exceptional efforts to have students (a) view the teacher as supportive (rather than hostile and controlling or indifferent) and (b) perceive curricular and behavioral options as personally valuable and obtainable. Comprehensive, motivationally oriented intervention research is needed to clarify ways to produce major changes in students’ perceptions about such matters. (See Adelman & Taylor, 1983, 1986b, for a more extensive discussion.)
Three key classroom program elements that warrant attention are highlighted below:

1. Options. Provision of a range of valued curricular and behavioral options is hypothesized as essential to establishing a classroom program that is a good match with a student’s psychological needs for self-determination, competence, and relatedness. (By definition, a good match means the program is nonthreatening.) In extreme cases it may be necessary to deemphasize temporarily the standard curriculum and pursue only activities to which the student makes a personal commitment. Furthermore, in extreme cases it may be necessary to accommodate, again temporarily, a wider range of deviant behavior than usually is tolerated from a student (e.g., limits set by existing standards and rules may have to be widened), to minimize psychological reactance and resultant increases in reactive behavior problems.

2. Student Decision Making. In terms of intrinsic motivation, options alone are insufficient. Also necessary is a structure that facilitates (e.g., supports) student decision making with respect to choosing options (and that allows for student-initiated changes in program plans). Thus, from a motivational perspective, another key element in dealing with misbehavior is student involvement in decision making about daily school activities and consequences for misbehavior. With respect to cause, it is hypothesized that students not included in decision making have little commitment to what is decided. Moreover, some of these students may perceive themselves as coerced and react in deviant ways in order to regain a sense of self-determination. With respect to correction, decision-making processes that maximize student perceptions of choice and control are essential.

3. Continuous Information on Functioning. Because of the potential negative impact of too much emphasis on extrinsic punishment and rewards, great care must be taken in providing student feedback on progress (see Deci, 1975, 1980; Deci & Ryan, 1985). The implications of research on this matter are that information given should highlight success not only in terms of task performance but with respect to student effectiveness in making good decisions and on how outcomes relate to the intrinsic reasons underlying student actions. Feedback, of course, is also supposed to clarify directions for future progress. Research is needed to clarify conditions that maximize the likelihood of feedback contributing to, rather than undermining, the student’s feelings of competence, self-determination, and relatedness.
Concluding Comments

In the above brief discussion of interventions related to school misbehavior, we have been able to touch upon only a few major areas of practice. Table 1 provides a more comprehensive perspective on the nature and scope of needed intervention activity.
As the preceding discussion underscores, and as is highlighted in Table 1, an understanding of the intrinsic motivational bases for deviant and devious behavior generates profound implications for intervention research and practice. For example, such an understanding points to assessment questions, classification concepts, and corrective strategies that might otherwise be ignored. It also highlights the need for comprehensive research programs to develop and evaluate interventions that address the motivational underpinnings of school misbehavior. Data from such research could shed considerable light on cause and correction with respect to all psychoeducational problems.

Table 1: Focus of interventions for dealing with misbehavior

I. Preventing Misbehavior
   A. Expand Social Programs
      1. Increase economic opportunity for low income groups
      2. Augment health and safety prevention and maintenance (encompassing parent education and direct child services)
      3. Extend quality day care and early education
   B. Improve Schooling
      1. Personalize classroom instruction (e.g., accommodating a wide range of motivational and developmental differences)
      2. Provide status opportunities for nonpopular students (e.g., special roles as assistants and tutors)
      3. Identify and remedy skill deficiencies early
   C. Follow Up All Occurrences of Misbehavior to Remedy Causes
      1. Identify underlying motivation for misbehavior
      2. For unintentional misbehavior, strengthen coping skills (e.g., social skills, problem-solving strategies)
      3. If misbehavior is intentional but reactive, work to eliminate conditions that produce reactions (e.g., conditions that make the student feel incompetent, controlled, or unrelated to significant others)
      4. For proactive misbehavior, offer appropriate and attractive alternative ways the student can pursue a sense of competence, control, and relatedness
      5. Equip the individual with acceptable steps to take instead of misbehaving (e.g., options to withdraw from a situation or to try relaxation techniques)
      6. Enhance the individual’s motivation and skills for overcoming behavior problems (including negative attitudes toward school)

II. Anticipating Misbehavior
   A. Personalize Classroom Structure for High-Risk Students
      1. Identify underlying motivation for misbehavior
      2. Design curricula to consist primarily of activities that are a good match with the identified individual’s intrinsic motivation and developmental capability
      3. Provide extra support and direction so the identified individual can cope with difficult situations (including steps that can be taken instead of misbehaving)
   B. Develop Consequences for Misbehavior That Are Perceived by Students as Logical (i.e., that are perceived by the student as reasonable, fair, and nondenigrating reactions that do not reduce one’s sense of autonomy)

III. During Misbehavior
   A. Try to base response on understanding of underlying motivation. (If uncertain, start with assumption that the misbehavior is unintentional)
   B. Reestablish a calm and safe atmosphere
      1. Use understanding of student’s underlying motivation for misbehaving to clarify what occurred. (If feasible, involve participants in discussion of events)
      2. Validate each participant’s perspective and feelings
      3. Indicate how the matter will be resolved, emphasizing use of previously agreed-upon logical consequences that have been personalized in keeping with understanding of underlying motivation
      4. If the misbehavior continues, revert to a firm but nonauthoritarian statement including that it must stop or else the student will have to be suspended
      5. As a last resort use crisis back-up resources:
         a. If appropriate, ask student’s classroom friends to help
         b. Call for help from identified back-up personnel
      6. Throughout the process, keep others calm by dealing with the situation with a calm and protective demeanor

IV. After Misbehavior
   A. Implement Discipline – Logical Consequences/Punishment
      1. Objectives in using consequences
         a. To deprive student of something he or she wants
         b. To make student experience something he or she does not want
      2. Forms of consequences
         a. Removal/deprivation (e.g., loss of privileges, removal from activity)
         b. Reprimands (e.g., public censure)
         c. Reparations (e.g., of damaged or stolen property)
         d. Recantations (e.g., apologies, plans for avoiding future problems)
   B. Discuss the Problem with Parents
      1. Explain how they can avoid exacerbating the problem
      2. Mobilize them to work preventively with school
   C. Work Toward Prevention of Further Occurrences (see I & II)
Authors’ Note

We want to thank Ed Deci and an anonymous reviewer for their valuable feedback and suggestions.

Notes

1. Deci and Ryan (1985) define self-determination as “a basic, innate propensity … that leads organisms to engage in interesting behaviors … out of choice rather than obligation or coercion, and those choices are based on an awareness of one’s organismic needs and a flexible interpretation of external events” (p. 38). They define competence as the need people have to be effective. Relatedness is the need for warmth from and involvement with others (Deci & Chandler, 1986).

2. Psychological reactance is Brehm’s term for the motivation to protect or restore options or freedoms (Brehm & Brehm, 1981). Such motivation is aroused when an option (freedom) that is important and believed to be available is removed or threatened.

3. There are significant variations in the way theorists have defined perceived control. In our work, we define it as the degrees of freedom one expects to have over processes that one believes must be pursued to accomplish specific outcomes (including decision-making processes and outcomes). In studying how such perceptions affect behavior, we also emphasize the importance of the degree of value one places on having control over a specific process or outcome (Adelman, Smith, Nelson, Taylor, & Phares, 1986).
References

Achenbach, T. M. (1988). Integrating assessment and taxonomy. In M. Rutter, A. H. Tuma, & I. S. Lann (Eds.), Assessment and diagnosis in child psychopathology (pp. 28–41). New York: Guilford.
Achenbach, T. M., & Edelbrock, C. S. (1983). Taxonomic issues in child psychopathology. In T. H. Ollendick & M. Hersen (Eds.), Handbook of child psychopathology (pp. 65–93). New York: Plenum Press.
Adelman, H. S. (1986). Intervention theory and evaluating efficacy. Evaluation Review, 10, 65–83.
Adelman, H. S. (1989). Prediction and prevention of learning disabilities: Current state of the art and future directions. In L. Bond & B. Compas (Eds.), Primary prevention in the schools (pp. 106–145). Newbury Park, CA: Sage.
Adelman, H. S., Smith, D., Nelson, P., Taylor, L., & Phares, V. (1986). An instrument to assess students’ perceived control at school. Educational and Psychological Measurement, 46, 1005–1017.
Adelman, H. S., & Taylor, L. (1983). Learning disabilities in perspective. Glenview, IL: Scott, Foresman.
Adelman, H. S., & Taylor, L. (1985). Toward integrating intervention concepts, research, and practice. In S. I. Pfeiffer (Ed.), Clinical child psychology: An introduction to theory, research, and practice (pp. 57–92). New York: Grune & Stratton.
Adelman, H. S., & Taylor, L. (1986a). Children’s reluctance regarding treatment: Incompetence, resistance, or an appropriate response? School Psychology Review, 15, 91–99.
Adelman, H. S., & Taylor, L. (1986b). An introduction to learning disabilities. Glenview, IL: Scott, Foresman.
Allen, V. L., & Greenberger, D. B. (1980). Destruction and perceived control. In A. Baum & J. E. Singer (Eds.), Advances in environmental psychology: Vol. 2, Applications of personal control (pp. 85–109). Hillsdale, NJ: Erlbaum.
Brehm, S. S., & Brehm, J. W. (1981). Psychological reactance: A theory of freedom and control. New York: Academic Press.
Charles, C. M. (1985). Building classroom discipline: From models to practice (2nd ed.). New York: Longman.
Condry, J. (1977). Enemies of exploration: Self-initiated versus other-initiated learning. Journal of Personality and Social Psychology, 35, 459–477.
Deci, E. L. (1975). Intrinsic motivation. New York: Plenum Press.
Deci, E. L. (1980). The psychology of self-determination. Lexington, MA: Lexington Books.
Deci, E. L., & Chandler, C. L. (1986). The importance of motivation for the future of the LD field. Journal of Learning Disabilities, 19, 587–594.
Deci, E. L., Nezlek, J., & Sheinman, L. (1981). Characteristics of the rewarder and intrinsic motivation of the rewardee. Journal of Personality and Social Psychology, 40, 1–10.
Deci, E. L., & Ryan, R. M. (1985). Intrinsic motivation and self-determination in human behavior. New York: Plenum Press.
Dreger, R. M. (1982). The classification of children and their emotional problems: An overview – II. Clinical Psychology Review, 2, 349–385.
Dreikurs, R., Grunwald, B. B., & Pepper, F. C. (1982). Maintaining sanity in the classroom: Classroom management techniques (2nd ed.). New York: Harper & Row.
Hyman, I., Flanagan, D., & Smith, K. (1982). Discipline in the schools. In C. R. Reynolds & T. B. Gutkin (Eds.), The handbook of school psychology (pp. 454–480). New York: Wiley.
Jones, E., & Nisbett, R. (1972). The actor and the observer: Divergent perceptions of the causes of behavior. In E. E. Jones, D. E. Kanouse, H. H. Kelley, R. E. Nisbett, S. Valens, & B. Weiner (Eds.), Attribution: Perceiving the causes of behavior (pp. 79–94). Morristown, NJ: General Learning Press.
Kaplan, H. B. (1980). Deviant behavior in defense of self. New York: Academic Press.
Knoff, H. M. (1987). School-based interventions for discipline problems. In C. A. Maher & J. E. Zins (Eds.), Psychoeducational interventions in the schools (pp. 118–140). New York: Pergamon.
Lambert, M. J., Shapiro, D. A., & Bergin, A. E. (1986). The effectiveness of psychotherapy. In S. L. Garfield & A. E. Bergin (Eds.), Handbook of psychotherapy and behavior change (3rd ed.). New York: Wiley.
Langer, E. J. (1983). The psychology of control. Beverly Hills, CA: Sage.
Maher, C. A., & Bennett, R. E. (1984). Planning and evaluating special education services. Englewood Cliffs, NJ: Prentice-Hall.
Maher, C. A., & Zins, J. E. (Eds.). (1987). Psychoeducational interventions in the schools. New York: Pergamon.
McGraw, K. O. (1978). The detrimental effects of reward on performance: A literature review and a prediction model. In M. R. Lepper & D. Greene (Eds.), The hidden costs of reward (pp. 33–60). Hillsdale, NJ: Erlbaum.
Miller, D. T., & Ross, M. (1975). Self-serving biases in the attribution of causality: Fact or fiction? Psychological Bulletin, 82, 213–225.
Millman, H. L., Schaefer, C. E., & Cohen, J. J. (1981). Therapies for school behavior problems. San Francisco: Jossey-Bass.
Monson, T. C., & Snyder, M. (1977). Actors, observers, and the attribution process: Toward a reconceptualization. Journal of Experimental Social Psychology, 13, 89–111.
Nichols, B. K. (1985). Self-perception of control and esteem as related to participation in a leadership training program. Unpublished doctoral dissertation, University of California, Los Angeles.
Quay, H. C. (1986). A critical analysis of DSM-III as a taxonomy of psychopathology in childhood and adolescence. In T. Millon & G. L. Klerman (Eds.), Contemporary directions in psychopathology: Toward the DSM-IV (pp. 151–165). New York: Guilford.
Quay, H. C., Routh, D. K., & Shapiro, S. K. (1987). Psychopathology of childhood: From description to validation. Annual Review of Psychology, 38, 491–532.
Quay, H. C., & Werry, J. S. (Eds.). (1986). Psychopathological disorders of childhood (3rd ed.). New York: Wiley.
Rutter, M., & Gould, M. (1985). Classification. In M. Rutter & L. Hersov (Eds.), Child and adolescent psychiatry: Modern approaches (2nd ed., pp. 304–321). Oxford: Blackwell.
Rutter, M., & Tuma, A. H. (1988). Diagnosis and classification: Some outstanding issues. In M. Rutter, A. H. Tuma, & I. S. Lann (Eds.), Assessment and diagnosis in child psychopathology. New York: Guilford.
Ryan, R. M., Connell, J. P., & Deci, E. L. (1985). A motivational analysis of self-determination and self-regulation in education. In C. Ames & R. E. Ames (Eds.), Research on motivation in education: The classroom milieu (pp. 13–51). New York: Academic Press.
Stipek, D. J., & Weisz, J. R. (1981). Perceived control and children’s academic achievement: A review and critique of the locus of control research. Review of Educational Research, 51, 101–137.
Weiner, B. (1980). Human motivation. New York: Holt, Rinehart & Winston.
Weisz, J. R., & Cameron, A. M. (1985). Individual differences in the student’s sense of control. In C. Ames & R. E. Ames (Eds.), Research on motivation in education: The classroom milieu (pp. 13–51). New York: Academic Press.
Winett, R. A., Stefanek, M., & Riley, A. W. (1983). Preventive strategies with children and families: Small groups, organizations, communities. In T. H. Ollendick & M. Hersen (Eds.), Handbook of child psychopathology (pp. 485–521). New York: Plenum Press.
Wolfgang, C. H., & Glickman, C. D. (1986). Solving discipline problems: Strategies for classroom teachers (2nd ed.). Boston: Allyn & Bacon.
Worchel, S. (1974). The effect of three types of arbitrary thwarting on the instigation to aggression. Journal of Personality, 42, 300–318.
65 Reinforcement, Reward, and Intrinsic Motivation: A Meta-Analysis
Judy Cameron and W. David Pierce
Reinforcement theory has had a significant impact on education. Education professors routinely teach the basic elements of behavior theory. As a consequence, most classroom teachers have at least some rudimentary understanding of the principles of reinforcement. These principles are often used to promote learning and to motivate students. In recent years, however, there has been a growing concern over the application of reward systems in educational settings. Several researchers have presented evidence and argued that incentive systems based on reinforcement may have detrimental effects. The contention is that reinforcement may decrease an individual’s intrinsic motivation to engage in a particular activity. To illustrate, if a child who enjoys drawing pictures is externally reinforced (e.g., with points or money) for drawing, the child may come to draw less once the reward is discontinued. In other words, one alleged effect of reinforcement is that it undermines intrinsic interest in a task.
The literature concerned with the effects of reinforcement on intrinsic motivation draws mainly from experimental investigations. In an article published in the American Psychologist, Schwartz (1990) cited the intrinsic motivation experiment of Lepper, Greene, and Nisbett (1973) and concluded that
reinforcement has two effects. First, predictably it gains control of [an] activity, increasing its frequency. Second, … when reinforcement is later withdrawn, people engage in the activity even less than they did before reinforcement was introduced. (p. 10)
Source: Review of Educational Research, 64(3) (1994): 363–423.
While several researchers agree with this conclusion (e.g., Kohn, 1993; Sutherland, 1993), others continue to favor the use of reinforcement principles in applied settings (e.g., Hopkins & Mawhinney, 1992). This is, obviously, an important issue. Incentive systems are often implemented (or not) in schools, industry, hospitals, and so forth on the basis of research findings and conclusions. The present article evaluates the literature concerned with the effects of reinforcement and reward on intrinsic motivation by a meta-analysis of the relevant experimental investigations. Several researchers draw a distinction between intrinsic and extrinsic motivation. Intrinsically motivated behaviors are ones for which there is no apparent reward except the activity itself (Deci, 1975). Extrinsically motivated behaviors, on the other hand, refer to behaviors in which an external controlling variable can be readily identified. According to Deci (1975), intrinsic motivation is demonstrated when people engage in an activity for its own sake and not because of any extrinsic reward. The result of such behavior is an experience of interest and enjoyment; people feel competent and self-determining, and they perceive the locus of causality for their behavior to be internal. Intrinsically motivated behavior is seen to be innate and is said to result in creativity, flexibility, and spontaneity (Deci & Ryan, 1985). In contrast, extrinsically motivated actions are characterized by pressure and tension and result in low self-esteem and anxiety (Deci & Ryan, 1985). A great deal of debate has surrounded the intrinsic/extrinsic distinction. Several critics (e.g., Guzzo, 1979; Scott, 1975) point out difficulties in identifying intrinsically motivated behaviors. Although many human behaviors appear to occur in the absence of any obvious or apparent extrinsic consequences, they may, in fact, be due to anticipated future benefits (Bandura, 1977) or intermittent reinforcement (Dickinson, 1989). 
From this perspective, intrinsically motivated behavior is simply behavior for which appropriate controlling stimuli have yet to be specified. In spite of these conceptual difficulties, other social scientists frequently accept the intrinsic/extrinsic distinction. In fact, a large body of research is concerned with the effects of extrinsic rewards and reinforcers on behavior that is thought to have been previously maintained by intrinsic motivation. The next section of this article presents a description of the early studies concerned with the effects of reward and reinforcement on intrinsic motivation, the various research designs used to further investigate the issue, the variables investigated, and major findings.
The Effects of Reward and Reinforcement on Intrinsic Motivation
The terms reward and reinforcement have frequently been used synonymously. Although this is the case, behavioral psychologists make an important distinction between the two terms. A reinforcer is an event that
increases the frequency of the behavior it follows. A reward, however, is not defined by its effects on behavior. Rewards are stimuli that are assumed to be positive events, but they have not been shown to strengthen behavior. Incentive systems (e.g., classroom token economies) may be based on reward or reinforcement and are designed to increase motivation. Because of these distinctions (between reward and reinforcement), this review separates those studies dealing with effects of reward from those concerned with the effects of reinforcement on intrinsic motivation.
The Early Studies
The first laboratory investigations to test the effects of reward on intrinsic motivation were conducted by Deci (1971, 1972a, 1972b). In the first experiment (1971), 24 college students, fulfilling a course requirement, were presented with a puzzle-solving task (Soma, a commercial puzzle, produced by Parker Brothers, composed of seven different shapes that can be solved in a variety of ways). The Soma puzzle was chosen because it was believed that college students would be intrinsically interested in the task. The study was made up of three 1-hour sessions over a 3-day period. Twelve subjects were assigned to an experimental group; the other 12 to a control group. During each session, subjects were individually taken to a room and asked to work on the Soma puzzles in order to reproduce various configurations which were drawn on a piece of paper. Four puzzles were presented in a session, and subjects were given 13 minutes to solve each one. In the second session only, experimental subjects were told that they would receive $1.00 for each puzzle solved. Control subjects were offered no money. In the middle of each session, the experimenter made an excuse to leave the room for 8 minutes. Subjects were told that they could do as they pleased. During these 8-minute periods, the experimenter observed the subjects through one-way glass and recorded the time that each subject spent engaged on the Soma task. The amount of time spent on the task during the free periods was taken to be the measure of intrinsic motivation, the dependent variable.
Deci hypothesized that reward (money) would interfere with subsequent intrinsic motivation and that subjects in the experimental group would spend less time on the task in the third session than they had in the first. He suggested that there would be a significant difference between the experimental and control subjects on this measure. 
Using a one-tailed t test, Deci found the difference between the two groups to be significant at p < .10. The rewarded group spent less time on the task than the control group. Although social scientists do not generally accept results at p > .05 as significant, and although Deci (1971) noted the marginal nature of his result, the data have been taken as support for the hypothesis that
If a person is engaged in some activity for reasons of intrinsic motivation, and if he begins to receive the external reward, money, for performing the activity, the degree to which he is intrinsically motivated to perform the activity decreases. (Deci, 1971, p. 108)
Deci’s experiment is often cited as groundbreaking evidence for the negative effects of reinforcement on intrinsic motivation (e.g., Kohn, 1993). Given the distinction between reward and reinforcement, however, Deci’s (1971) experiment, at most, demonstrates that rewards may have a negative impact on a person’s interest in a task. Nonetheless, his study was the first to investigate an issue that was of prime concern to many psychologists. The experiment provided researchers with a way to measure intrinsic motivation and with a paradigm to investigate the negative effects of reward.
In another experiment, Deci (1971; Experiment 3) used the same experimental paradigm to investigate the effects of verbal reward. The reward contingency introduced in the second session was verbal praise, rather than money. During the second phase, subjects in the experimental group were told after each trial that their performance was very good or much better than average. Deci found that the reinforced group spent significantly more time on the task (difference scores between Session 3 and Session 1) than those who received no praise (p < .05). These results suggest that social rewards may increase the motivation to perform an activity.
One of the best known and most cited studies on the detrimental effects of reward on behavior is the work of Lepper, Greene, and Nisbett (1973). In this study, nursery school children were observed in a free-play period to determine their initial interest in an activity (drawing). Two observers sat behind a one-way glass and recorded the amount of time each child was engaged in the activity. Those children who spent the most time on the task were selected as subjects for the experiment. Three experimental conditions were employed. In the “expected-reward” condition, children were offered a “good-player” award, which they received for drawing with magic markers. Children in the “unexpected-reward” group received the award but were not promised it beforehand, and “no-reward” subjects did not expect or receive an award. In a subsequent free-play session, those children who were promised an award (expected-reward subjects) spent significantly less time drawing than the other two groups. Furthermore, the expected-reward group spent less time drawing in the postexperimental session than they had in the initial session (preexperimental free-play session). The unexpected-reward and no-reward subjects showed slight increases in time on task from preexperimental to postexperimental sessions.
Lepper et al. (1973) concluded that their results provided “empirical evidence of an undesirable consequence of the unnecessary use of extrinsic rewards” (p. 136). However, those who received an unexpected reward spent more time on the task during the postexperimental free-play period than either the
expected-reward or the control group. Because the unexpected- and expected-reward groups are both reward conditions, the conclusion that these results demonstrate the negative effects of reward may not be warranted. This is because reward was held constant in the unexpected-reward and expected-reward groups; what differed was promise or no promise. That is, the promises made or the instructions given could have produced these results. Nonetheless, the findings of Lepper et al.’s (1973) study are frequently cited in journal articles and introductory psychology textbooks as evidence that extrinsic rewards and reinforcement undermine intrinsic interest in a task.
The early studies by Deci (1971) and Lepper et al. (1973) have raised a number of issues and controversies that have generated considerable research. Some psychologists have claimed that the original findings provide evidence for the view that reinforcement decreases intrinsic motivation (e.g., Schwartz, 1990). Others recognize that not all types of reinforcement undermine intrinsic interest (e.g., Deci & Ryan, 1985). Still others argue that one must demonstrate that rewards are, in fact, reinforcers before any statements about the effects of reinforcement can be made (Feingold & Mahoney, 1975; Mawhinney, 1990). Several researchers are cautious about equating reward with reinforcement; their focus has been to discover when and under what conditions reward is detrimental (Bates, 1979; Morgan, 1984). In order to address these issues, researchers have employed a variety of research paradigms.
Between-Group Designs
Studies designed to assess the effects of reward on intrinsic motivation have been conducted using between-group designs. Typically, one of two methods is employed. The first method, referred to as a before-after design (Deci & Ryan, 1985), involves a three-session paradigm. In these studies, a baseline measure of intrinsic motivation on a particular task is taken. This entails measuring time on task in the absence of extrinsic reward, usually from a session of short duration (e.g., 10 minutes). Subjects are then assigned to a reward or no-reward (control) condition, and an intervention with extrinsic rewards is carried out. Following this, reward is withdrawn, and time on task is again measured. The procedure is identical for both groups except that control subjects do not experience the intervention in the second session. Mean differences in time on task between pre- and postintervention are calculated for each group, and the scores for the experimental and control subjects are then statistically compared. Any difference between the two groups is considered evidence of the effects of withdrawal of reward.
One advantage to the before-after procedure is that it allows the researcher to examine differences within groups from pre- to postexperimental sessions as well as differences between groups. In most studies of this type, however, only differences between groups are investigated. This is because the before-after
procedure has generally been used to identify individuals who show an initial interest in a specific task; those people are then selected as subjects for the study. In such cases, differences between rewarded and nonrewarded subjects are usually measured in the after-reward session only. Most researchers have used an after-only between-groups experimental design to assess the effects of rewards on intrinsic motivation. In this approach, no pretreatment measure of intrinsic interest is collected. In the typical experiment, subjects are presented with a task that is assumed to be intrinsically motivating – solving and assembling puzzles, drawing with felt-tipped pens, word games, and so on. Experimental subjects are rewarded with money or grades, candy, praise, good-player awards, and so forth for performing the activity. In some studies, the reward is delivered contingent on a certain level of performance on the task; in others, subjects are simply rewarded for participating in the task. Control subjects are not rewarded. The reward intervention is usually conducted over a 10-minute to 1-hour period. All groups are then observed during a nonreward period. This usually occurs immediately after the experimental session, although some researchers have observed subjects several weeks later. If experimental subjects spend less time on the task (during the postreward observation) than the controls, reinforcement/reward is said to undermine intrinsic motivation. The amount of time subjects spend on the task during the nonreward period is one of the major ways in which intrinsic motivation has been measured, and it is usually referred to as free time on task.
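The before-after comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores are invented for the example and come from no study, the function names are hypothetical, and Welch's t statistic stands in for whatever test a given experiment actually used.

```python
from math import sqrt
from statistics import mean, variance


def difference_scores(pre, post):
    """Post-minus-pre change in free time on task, one value per subject."""
    if len(pre) != len(post):
        raise ValueError("pre and post need one score per subject")
    return [after - before for before, after in zip(pre, post)]


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent groups (an assumed choice of test)."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical free time on task in seconds; illustrative values only.
reward_pre = [300, 280, 310, 290]
reward_post = [250, 260, 270, 240]
control_pre = [295, 305, 285, 300]
control_post = [300, 310, 280, 305]

reward_change = difference_scores(reward_pre, reward_post)
control_change = difference_scores(control_pre, control_post)

# A negative t means the rewarded group's time on task fell relative to controls.
t_stat = welch_t(reward_change, control_change)
print(mean(reward_change), mean(control_change), round(t_stat, 2))
```

In an after-only design the same comparison would be run directly on the postreward scores, since no baseline exists from which to form difference scores.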
Findings from the Group Design Studies
Generally, the results of the group design studies examining the main effects of rewards are conflicting. While some researchers have found that rewards lead to decreased time on the task relative to control groups (e.g., Deci, 1971; Fabes, 1987; Morgan, 1981), others report the opposite (e.g., Brennan & Glover, 1980; Deci, 1972a; Harackiewicz, Manderlink & Sansone, 1984). Some studies report no significant differences (e.g., Amabile, Hennessey, & Grossman, 1986; DeLoach, Griffith & LaBarba, 1983).
Not all studies use the free-time measure of intrinsic motivation. Other dependent variables have included self-reports of task enjoyment, interest, and satisfaction; performance during the free time period (number of puzzles/problems solved, number of drawings completed, etc.); and willingness to volunteer for future projects without reward. Overall, the results from studies employing these measures are conflicting and do not help to clarify the issue of whether reward leads to decreased intrinsic motivation.
A number of reviewers (e.g., Bates, 1979; Deci & Ryan, 1985; Dickinson, 1989; Morgan, 1984) have noted the contradictory nature of the findings and have attempted to identify the conditions under which extrinsic reward produces decrements in intrinsic motivation. Some of the conditions thought to be critical in determining the impact of rewards include the type of reward (tangible or verbal), reward expectancy (whether reward is expected – i.e., offered
beforehand or received unexpectedly), and reward contingency (whether reward is delivered simply for performing the task or is contingent on some specified level of performance). Although this categorization system is useful, an examination of the literature within each category reveals conflicting results.
Type of Reward
When verbally praised subjects are compared to a control group, some researchers have found an increase in intrinsic motivation (e.g., Deci, 1971) while others report no significant differences (e.g., Orlick & Mosher, 1978). The same holds true when subjects receiving tangible rewards are compared to controls. While some results provide evidence for a decrease in intrinsic motivation following the receipt of a tangible reward (e.g., Danner & Lonkey, 1981), others indicate an increase (e.g., Rosenfield, Folger, & Adelman, 1980).¹
Reward Expectancy
Comparisons between subjects who receive an unexpected tangible reward and subjects who receive no reward are also not clear cut. Some results indicate that unexpected-reward subjects show a decrease in intrinsic motivation (e.g., Orlick & Mosher, 1978); others have found no significant differences (e.g., Greene & Lepper, 1974). Experiments designed to investigate the effects of expected tangible rewards are also contradictory. Some studies, comparing subjects offered an expected reward to nonrewarded controls, show a negative effect of reward on intrinsic motivation (e.g., Deci, 1971; Lepper, Greene, & Nisbett, 1973). Others, however, demonstrate that expected-reward subjects show an increase in intrinsic motivation relative to controls (e.g., Brennan & Glover, 1980).
Reward Contingency
Morgan (1984) and Deci and Ryan (1985) suggest that reward contingency may play a critical role in determining the negative effects on intrinsic motivation. Again, however, results from such studies vary. When rewards are delivered contingent on some level of performance, some researchers have found a positive effect (e.g., Karniol & Ross, 1977); others report negative findings (e.g., Ryan, Mims, & Koestner, 1983). When rewards are delivered contingent on engagement in the task regardless of subjects’ level of performance, an undermining effect has been found in some studies (e.g., Lepper, Greene & Nisbett, 1973; Morgan, 1983, Experiment 1). Others report no decrease in intrinsic motivation (e.g., Pittman, Emery & Boggiano, 1982).
Within-Subject Designs
One of the criticisms of the group design research is that researchers employing such a design often refer to their reward manipulation as a reinforcement procedure. By definition, a reinforcer is an event that
increases the frequency of the behavior it follows. In most studies on intrinsic motivation, researchers have not demonstrated that the events used as rewards increased the frequency of the behavior studied. In addition, critics (e.g., Feingold & Mahoney, 1975; Mawhinney, 1990) suggest that the measurement phases in the group design research are too brief to detect any temporal trends and transition states. In order to address these issues, a few studies have been conducted using a repeated measures, within-subject design. In this paradigm, the amount of time subjects spend on a particular task is measured over a number of sessions. Reinforcement procedures are then implemented over a number of sessions. In the final phase, reinforcement is withdrawn, and time on task is again repeatedly measured. Intrinsic motivation is indexed as a difference in time on task between pre- and postreinforcement phases where differences are attributed to the external reinforcement. In general, no substantial differences have been found when rate of performance and time on task in postreinforcement sessions are compared to prereinforcement phases (although, see Vasta & Stirpe, 1979).
The advantage of within-subject designs is that the researcher can determine whether the rewards used are actual reinforcers – that is, whether behavior increases during the reinforcement phase. Statements can then be made about the effects of reinforcement, rather than reward. However, only a handful of studies have been conducted using this type of design. Critics of within-subject research (e.g., Deci & Ryan, 1985) suggest that results from these designs are not generalizable because so few subjects are studied in any one experiment. A further criticism has to do with the lack of a control group. The argument is that in the within-subject designs there is no group that performs the activity without reinforcement; thus, one cannot know if there is an undermining effect relative to a control group. 
Finally, for these studies, the definition of a reinforcer is necessarily circular. That is, reward becomes reinforcement only after its effects are shown to increase behavior.
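The logic of the within-subject paradigm can be illustrated with a brief Python sketch. The session values below are hypothetical, the function name is an assumption made for the example, and a simple comparison of phase means stands in for the visual or statistical analysis an actual study would report.

```python
from statistics import mean


def summarize_phases(baseline, reinforcement, withdrawal):
    """Summarize one subject's time on task across the three phases."""
    baseline_mean = mean(baseline)
    reinforcement_mean = mean(reinforcement)
    withdrawal_mean = mean(withdrawal)
    return {
        # The reward counts as a reinforcer only if behavior rose while it was in effect.
        "functions_as_reinforcer": reinforcement_mean > baseline_mean,
        # Index of intrinsic motivation: postreinforcement minus prereinforcement.
        "post_minus_pre": withdrawal_mean - baseline_mean,
    }


# Hypothetical minutes on task per session for a single subject.
result = summarize_phases(
    baseline=[10, 12, 11],
    reinforcement=[15, 16, 17],
    withdrawal=[11, 10, 12],
)
print(result)
```

A value of post_minus_pre near zero alongside a demonstrated reinforcement effect corresponds to the general pattern reported for these designs, in which time on task returns to its prereinforcement level.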
Theoretical Accounts of the Literature
Although the results of laboratory investigations into the effects of reward and reinforcement on intrinsic motivation appear contradictory and confusing, a general contention in many textbooks and journal articles is that reward and/or reinforcement is detrimental to an individual’s intrinsic motivation (e.g., Kohn, 1993; McCullers, 1978; Schwartz, 1990; Zimbardo, 1988). In an attempt to account for the disparate outcomes, a few psychologists have offered theoretical explanations. Three major accounts are outlined below.
The Overjustification Hypothesis
One explanation that has been put forth to account for the detrimental effects of reward is termed the overjustification effect (Lepper, Greene, & Nisbett, 1973). This hypothesis is largely based on attribution (Kelly, 1967) and self-perception (Bem, 1972) theories. A person’s perceptions about the causes of behavior are hypothesized to influence future motivation and performance. In the presence of external controls, people attribute their behavior to an external agent; when this is removed, future motivation and performance decrease. Conversely, behavior is attributed to internal causes in the absence of obvious external controls. In this case, motivation and performance are not affected.
A decrease in intrinsic motivation following the withdrawal of a reward has been termed the overjustification effect because it is thought that an external reward provides overjustification for participating in an already attractive activity. Put another way, when individuals are rewarded for engaging in an already interesting activity, their perceptions shift from accounting for their behavior as self-initiated to accounting for it in terms of external rewards. That is, they are faced with too many reasons (justifications) for performing the activity, and the role of intrinsic motivation is discounted, resulting in a decline in intrinsic motivation. Lepper (1981) has suggested that extrinsic rewards lead to a decrease in intrinsic motivation when they allow perceptual shifts of causality. According to Lepper, this occurs when there is sufficient initial interest in an activity, when the extrinsic rewards are salient, and when rewards do not increase perceived competence.
Cognitive Evaluation Theory
Deci and Ryan (1985) suggest that the overjustification hypothesis should not be considered a theory of motivation. They argue that self-attributions may affect intrinsic motivation, but they do not see them as necessary mediators. Instead, Deci and Ryan offer cognitive evaluation theory as an explanation for intrinsic motivation. Cognitive evaluation theory is based on the assumption that people have innate needs for competence and self-determination. From this perspective, a person’s intrinsic motivation is affected by changes in feelings of competence and self-determination. According to Deci and Ryan (1985), events facilitate or hinder feelings of competence and self-determination depending on their perceived informational, controlling, or amotivational significance. Events seen as informational indicate skill in performing a task; hence, competence is facilitated, which leads to increased intrinsic motivation. A controlling event is one perceived as an attempt to determine behavior. This type of event diminishes an individual’s self-determination
and intrinsic motivation. An amotivational event provides negative feedback, indicating a lack of skill, which reduces one’s competence and intrinsic motivation. Cognitive evaluation theory focuses on a person’s experiences of an activity. For this reason, Deci and Ryan (1985) emphasize the importance of self-report measures of task interest, satisfaction, and enjoyment as more indicative of intrinsic motivation than the free time-on-task measure. According to cognitive evaluation theory, rewards are not always harmful. Verbal rewards may be informational and lead to an increase in intrinsic motivation. Tangible rewards, on the other hand, are seen as controlling when their delivery is stated before the reward period (expected rewards). This is because the cognitive evaluation process is believed to begin while the rewarded activity is occurring. Further, rewards promised to persons for engaging in a task without a performance criterion (referred to as expected task contingent rewards by Deci & Ryan, 1985) are controlling and decrease intrinsic motivation. Deci and Ryan suggest that rewards delivered to a person contingent on a specified level of performance are more complicated. This type of reward can be informational or controlling, but the difficulty is that its function can only be determined by how well a person performs in relation to the specified standard. If the individual performs well, the reward is informational, and, if performance is poor, it is controlling. Rummel and Feinberg (1988) conducted a meta-analysis to assess cognitive evaluation theory. Subjects who received rewards that were defined to convey “controlling” information were compared to groups receiving other types of rewards or no reward. The dependent measure of intrinsic motivation was a combination of both free time-on-task measures and self-reports of satisfaction and task interest. Results provided support for cognitive evaluation theory. 
Rummel and Feinberg concluded that controlling, extrinsic rewards have detrimental effects on intrinsic motivation. In Rummel and Feinberg’s meta-analysis, rewards were defined as controlling after the fact. That is, when a reward was found to produce a negative effect, it was seen as controlling, and the study was selected for the analysis. This exemplifies the major difficulty with cognitive evaluation theory. Rewards are defined as controlling or informational after their effect on performance has been measured. Although cognitive evaluation theory may account for the diverse findings of the effects of reward on intrinsic motivation, there are difficulties with this interpretation. One problem is that feelings of competence and self-determination are seen as causes of changes in intrinsic motivation, but they are not measured. They are assumed to be operating because behavior changes. In other words, the existence of competence, self-determination, and intrinsic motivation is inferred from the very behavior it supposedly causes. Rewards are defined as controlling if measures of
intrinsic motivation decrease and informational when the dependent variable indexes an increase in motivation.
Behavioral Accounts
An operant analysis of behavior involves consideration of a prior learning history and the three-term contingency, the S^D: R → S^r relationship. The three terms are: (a) discriminative stimulus (S^D) or setting event, (b) the response (R) or behavior, and (c) contingent reinforcement (S^r). Flora (1990) has suggested that all of the empirical results of the intrinsic motivation research can be accounted for by considering the promised reward procedures (expected reward) as discriminative stimuli. That is, telling a person that he or she will receive a reward is a stimulus event that precedes the operant and, as such, is a discriminative stimulus rather than a reinforcer. From this perspective, if behavior is regulated by its consequences (i.e., reinforcement), no loss of intrinsic motivation is expected. When individuals who are engaged in a task are reinforced for doing the task, they will, once the reinforcer is withdrawn, spend as much time on the activity as they originally did. A behavioral view suggests that it is only when rewards function as discriminative stimuli that one might expect to observe a decline in intrinsic motivation. Although discriminative stimuli are part of the three-term contingency and affect the probability of an operant, they can and do have very different effects from reinforcers. Task performance evoked by instructions and promises of reward (S^Ds) can be influenced by a number of factors such as the subject’s history with respect to whether promised rewards were actually received, the subject’s verbal repertoire, the nature of prior exposure to the object being offered as the reward, and so on (Dickinson, 1989).
Summary
The overjustification effect, cognitive evaluation theory, and recent behavioral explanations each attempt to account for the disparate effects of reward and reinforcement on intrinsic motivation. Given the diverse findings reported in this literature, however, it is not clear at this point what effect reward or reinforcement has on intrinsic motivation. Reviewers on all sides of the issue tend to be highly critical of research designed outside of their own paradigm, and, more often than not, findings from studies in opposing camps are not considered relevant. For these reasons, the literature and its interpretations are still contentious. Because a substantial number of experimental studies have been carried out to assess the effects of reward and reinforcement on intrinsic motivation, one way to evaluate their effects is to conduct a meta-analysis.
The Present Meta-Analysis
The primary purpose of the present meta-analysis is to make a causal statement about the effects of extrinsic rewards and reinforcement on intrinsic motivation. This analysis should be useful in addressing a number of concerns. Of major importance is whether the bulk of evidence suggests that extrinsic rewards and/or reinforcement produce decrements in intrinsic motivation. If so, what is the size of the relationships being uncovered? Also, do different patterns emerge with different reward types (e.g., tangible, verbal rewards), reward expectancies (expected, unexpected), or reward contingencies (e.g., rewards delivered for engaging in a task, completing or solving a task, or meeting a specified level of performance)? In the following sections of this article, the research questions addressed in the present meta-analysis are outlined, the steps involved in conducting the meta-analysis are described, and the findings are presented and discussed.
Research Questions
The following questions have been addressed in this meta-analysis:
1. Overall, what is the effect of reward on intrinsic motivation? In order to answer this question, a meta-analysis of the group design experiments was conducted. Subjects who received a tangible reward and/or an extrinsic verbal reward were compared to a nonrewarded control group. This analysis should shed light on the overall effects of reward on intrinsic motivation.
2. What are the effects of specific features of reward on intrinsic motivation? Several researchers note that reward interacts with other variables to produce increments or decrements in intrinsic motivation. That is, intrinsic motivation is affected differently by the type of reward implemented, the reward expectancy, and the reward contingency. Specifically, researchers have investigated the following: (a) the effect of reward type on intrinsic motivation (i.e., whether rewards are verbal or tangible), (b) the effect of reward expectancy on intrinsic motivation (i.e., whether rewards are expected, that is, promised and delivered to subjects, or unexpected, that is, delivered to subjects but not promised), (c) the effect of reward contingency on intrinsic motivation (i.e., whether rewards are delivered to subjects for participating in an experimental session regardless of what they do, for engaging in a task, for completing or solving a task, or for attaining a specific level of performance). All analyses performed on these features were conducted with group design studies in which a rewarded group was compared to a control group. These
analyses should lead to a greater understanding of the specific conditions under which reward affects intrinsic motivation. Although the present analyses present a breakdown of several features of reward, there are other moderator variables mentioned in the literature (e.g., salience of reward, task type, reward attractiveness, goals of individuals, etc.). These conditions may interact with reward to affect intrinsic motivation. Unfortunately, these variables appear in only one or two studies and are, thus, not amenable to a meta-analysis. At this point, placing emphasis on interaction effects that have few replications would not be beneficial to an understanding of reward and intrinsic motivation.
3. Overall, what is the effect of reinforcement on intrinsic motivation? One of the criticisms of the group designs has been that reward is frequently cited as synonymous with reinforcement, yet no evidence has been provided to indicate that the rewards used in group designs are actual reinforcers. In the single-subject, repeated measures designs, researchers have demonstrated that the rewards administered increased behavior and can be considered as reinforcers. For this reason, a separate analysis was conducted with the single-subject designs where subjects served as their own controls. This analysis should allow a more definitive statement to be made about the effects of reinforcement on intrinsic motivation.
Method
Selection of Studies
A basic list of studies was assembled by conducting a computer search of the psychological literature (PSYCH LIT) using intrinsic motivation as the search term. The meta-analysis started with Deci (1971), and relevant articles published up to September 1991 were identified. Studies not listed on the computer database were identified through the bibliographies of review articles, chapters, books, and papers located in the original search. Two sets of studies were collected (between-group designs and within-subject designs). The main analysis entailed assessing the overall effects of reward on intrinsic motivation from studies involving group designs. Criteria for including studies in the sample were: (a) that the study involve an experimental manipulation of a reward condition and include a nonrewarded control group; (b) that any characteristics of rewarded subjects be either held constant or varied but be represented identically for both rewarded and control groups; and (c) that studies be published (no unpublished documents were collected) and written in English.²
In addition, only studies that measured intrinsic motivation as a dependent variable were included. Intrinsic motivation has been measured as free time on task after withdrawal of reward; self-reports of task interest, satisfaction, and/or enjoyment; performance during the free time period (number of puzzles/problems solved, number of drawings completed, etc.); and subjects’ willingness to participate in future projects without reward. One study which met the criteria was excluded (Boggiano & Ruble, 1979) because the statistical contrasts used in the article were not logical given the sample size of the study.³ Other studies were omitted from the sample if some subjects in a reward condition were not actually given a reward (e.g., Pritchard, Campbell, & Campbell, 1977). The resulting sample consisted of 83 documents, reporting 96 independent studies. A major criticism of the meta-analytic technique has been that researchers often lump different measures together. This has been referred to as the apples-and-oranges problem in that it is argued that logical conclusions cannot be drawn from comparisons of studies using different measures of the dependent variable (see Glass, McGaw, & Smith, 1981). In order to avoid this problem, separate analyses were conducted on the overall effect of reward for each measure of intrinsic motivation. Using this strategy, 61 studies compared a rewarded group to a control group on the free-time measure; 64 studies investigated the attitude (task interest, enjoyment, and satisfaction) measure; 11 studies assessed the willingness to volunteer for future studies without reward measure; and 12 studies measured performance during the free-time period. In order to assess the impact of specific features of reward, further analyses were conducted with data from the 96 group design studies. 
In these analyses, subjects assigned to different types of rewards (tangible, verbal), reward expectancies (unexpected, expected), and reward contingencies were compared to nonrewarded control groups. The second meta-analysis was conducted on studies that employed a within-subject, multiple-trials design. In this type of design, subjects served as their own controls. These experiments are conducted in three phases with a number of sessions in each phase. Baseline measures of intrinsic motivation are taken in the first phase; reinforcement procedures are then implemented over a number of sessions, and in the third phase reinforcement is withdrawn. Changes in intrinsic motivation are measured as differences between the pre- and postreinforcement phase. Single-subject studies were included in this analysis when a reinforcement effect was demonstrated (i.e., the rewards used showed an increase in behavior) and when baseline, reinforcement, and postreinforcement phases involved repeated measures. One study reporting a reinforcement effect was excluded (Vasta, Andrews, McLaughlin, Stirpe, & Comfort, 1978, Experiment 1) because the authors reported only one measure of behavior
during the postreinforcement phase. Two studies used a repeated measures group design to assess the effects of reinforcement between and within groups (Greene, Sternberg & Lepper, 1976; Mynatt, Oakley, Arkkelin, Piccione, Margolis, & Arkkelin, 1978). Although subjects’ performance in these studies was measured repeatedly as in the single-subject designs, only group effects were reported. In addition, the rewards used in these studies were not shown to be reinforcers for some of the rewarded groups. Thus, these two studies were not included in the meta-analysis of single-subject designs (Mynatt et al., 1978, are included in the meta-analysis of group designs because their study included a nonrewarded control group). In all, five studies were selected for the within-subject meta-analysis. A list of studies included in the meta-analyses is presented in Appendix A.
Coding of Studies
Once all relevant articles had been collected, each study was read and coded. The following general information was extracted from each report: (a) author(s), (b) date of publication, (c) publication source, (d) population sampled (children or adults), (e) sample size, (f) type of experimental design (before-after groups design, after-only groups design, or single-subject multiple-trial design), and (g) type of task used in the study. The following aspects of the independent variable were also coded: (a) reward type (tangible or verbal), (b) reward expectancy (expected or unexpected) and (c) reward contingency. Reward contingency was coded according to Deci and Ryan’s (1985) taxonomy. Task noncontingent rewards referred to rewards delivered to subjects for participating in an experimental session regardless of what they did in the session. The term task contingent reward was used to mean that a reward was given for actually doing the task and/or for completing or solving the task. Performance contingent rewards were defined as rewards delivered for achieving a specified level of performance. In addition to using Deci and Ryan’s classification, contingency was also coded in accord with a behavioral perspective. Using operant definitions, rewards were defined as noncontingent or contingent. Noncontingent rewards referred to rewards delivered for participating in the study or engaging in the task regardless of any level of performance. Contingent rewards were defined as rewards dependent on performance (i.e., rewards given for completing a puzzle, solving a task, and/or attaining a specified level of performance). Other characteristics of studies that were coded were: (a) type of dependent measure (e.g., free time on task, task interest, etc.), (b) whether experimenter was blind to conditions, and (c) whether experimenter was present or absent during the post-reward phase. 
As well, statistical information was recorded, and effect sizes were calculated from appropriate contrasts.
Descriptive characteristics and effect sizes of the reviewed studies are summarized in Appendix C.
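The two contingency codings described above can be made concrete with a small record type. The following is only an illustrative sketch and not the authors' coding form: the names `RewardBasis`, `CodedStudy`, `deci_ryan_category`, and `operant_category` are hypothetical, and the mapping simply restates the definitions given in the text (rewards for participating or engaging are operantly noncontingent; rewards for completing, solving, or reaching a performance standard are operantly contingent).

```python
from dataclasses import dataclass
from enum import Enum


class RewardBasis(Enum):
    """What the subject had to do to receive the reward (hypothetical labels)."""
    PARTICIPATING = "participating in the session"
    ENGAGING = "engaging in the task"
    COMPLETING = "completing or solving the task"
    PERFORMANCE_LEVEL = "attaining a specified level of performance"


# Deci and Ryan's (1985) taxonomy as summarized in the text.
_DECI_RYAN = {
    RewardBasis.PARTICIPATING: "task noncontingent",
    RewardBasis.ENGAGING: "task contingent",
    RewardBasis.COMPLETING: "task contingent",
    RewardBasis.PERFORMANCE_LEVEL: "performance contingent",
}

# Operant (behavioral) coding as summarized in the text.
_OPERANT = {
    RewardBasis.PARTICIPATING: "noncontingent",
    RewardBasis.ENGAGING: "noncontingent",
    RewardBasis.COMPLETING: "contingent",
    RewardBasis.PERFORMANCE_LEVEL: "contingent",
}


def deci_ryan_category(basis: RewardBasis) -> str:
    return _DECI_RYAN[basis]


def operant_category(basis: RewardBasis) -> str:
    return _OPERANT[basis]


@dataclass(frozen=True)
class CodedStudy:
    """One coded report; field names are illustrative, not from the source."""
    authors: str
    year: int
    population: str      # "children" or "adults"
    sample_size: int
    design: str          # e.g. "after-only groups design"
    reward_type: str     # "tangible" or "verbal"
    expectancy: str      # "expected" or "unexpected"
    basis: RewardBasis
```

Under this sketch, a reward promised merely for engaging in a task is "task contingent" in Deci and Ryan's terms but "noncontingent" in operant terms, which is why the same study can fall into different categories under the two codings.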
Intercoder Reliability
From the pool of relevant studies, 10 were randomly selected and independently coded by the second author. A standardized coding form⁴ was created that allowed the second coder to extract information regarding independent variables (reward type, reward expectancy, reward contingency), dependent variables (measures of intrinsic motivation), sample sizes, type of task used in the study, and calculation of effect sizes for available contrasts. Reliability calculated as percentage agreement was 93.4%. For 6 of the 10 studies, agreement was 100%. Disagreements in the other four studies involved (a) miscommunication of formulas to use for calculating effect size (for two studies), (b) mislabeling of reward expectancy (in one study), and (c) a misreading of the number of subjects in a group (in one study). Disagreements were resolved through discussion and a more careful reading of the studies and coding criteria.
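Percentage agreement is the share of coded items on which the two coders gave the same value. A minimal sketch follows; the function name and the example codes are hypothetical, and the source does not report how many items entered the 93.4% figure.

```python
def percent_agreement(coder_a, coder_b):
    """Percentage of items on which two coders assigned identical codes."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty set of items")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)


# Hypothetical codes for one study: reward type, expectancy, contingency, n.
first = ["tangible", "expected", "task contingent", 24]
second = ["tangible", "unexpected", "task contingent", 24]
agreement = percent_agreement(first, second)  # 3 of 4 items match
```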
Computation and Analysis of Effect Sizes
The procedures used in the meta-analysis of the group design studies followed those of Hedges and Olkin (1985). Meta-analysis is a statistical technique for aggregating the results of many experimental studies which compare two groups on a common dependent measure. Once the studies and groups to be compared are identified, the statistical result of each study is transformed into a measure called an effect size. An effect size is found by converting the findings from each study into a standard deviation unit. The effect size indicates the extent to which experimental and control groups differ in the means of a dependent variable at the end of a treatment phase. In its simplest form, the effect size calculated, g, is the difference between the means of the rewarded group and a nonrewarded control group divided by the pooled standard deviation of the two groups. When means or standard deviations were not available from reports, effect size was calculated from t tests, F statistics, and p-level values (e.g., p < .05) by using Hedges and Becker’s (1986) formulas. Formulas for calculating effect size are listed in Appendix B. One problem that arises in conducting a meta-analysis is determining effect sizes from studies with limited information. In a few studies, for example, contrasts are simply reported as t or F < 1.00. In such cases, effect size estimates were calculated by making t or F equal to a number between 0.01 and 1.00 chosen from a random numbers table. When results from a study were not reported or were reported as nonsignificant and when t or F values were not available but means and/or direction of means were
known, a random number between 0.01 and the critical value of t or F at p = .05 was chosen to calculate an estimate of effect size. When results for an outcome measure were not reported or were reported as nonsignificant and when means and direction were unknown, the effect size for that measure was set at 0.00 (indicating exactly no difference between rewarded and nonrewarded groups). For each analysis, results were calculated with 0.00 values included and with 0.00 values omitted. For several studies, more than one effect size was calculated. For example, if a single study contained two measures of intrinsic motivation (e.g., free time on task, attitude) and two types of reward groups plus a control group (e.g., tangible reward, verbal reward), a total of four effect sizes was calculated (e.g., free time-tangible reward, free time-verbal reward, attitude-tangible reward, attitude-verbal reward). In order to satisfy the independence assumption of meta-analytic statistics (Hedges & Olkin, 1985), only one effect size per study was entered into each analysis. When two or more effect sizes from one study were appropriate for a particular analysis, these effect sizes were averaged. To illustrate, for the estimate of the overall effect of reward on the free-time measure of intrinsic motivation, some studies assessed the effects of several types of rewards. If a single study, for example, contained two or more reward groups (e.g., expected reward, unexpected reward) and a control condition, the two effect sizes were averaged so that the study contributed only one effect size to the overall analysis of reward. For an analysis of the effects of expected reward on intrinsic motivation, only the one appropriate effect size from the study would be used. This strategy retained as much data as possible without violating the assumption of independence. 
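The effect-size rules described above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, the formulas are the standard two-group expressions (assumed, not verified, to correspond to those listed in Appendix B), and Python's `random.Random` stands in for the random numbers table.

```python
import math
import random
from statistics import mean


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled within-group standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def g_from_means(m_reward, sd_reward, n_reward, m_control, sd_control, n_control):
    """g = (rewarded mean - control mean) / pooled SD; negative g is a decrement."""
    sd = pooled_sd(sd_reward, n_reward, sd_control, n_control)
    return (m_reward - m_control) / sd


def g_from_t(t, n_reward, n_control):
    """Recover g from an independent-groups t statistic (the sign carries direction)."""
    return t * math.sqrt((n_reward + n_control) / (n_reward * n_control))


def impute_t(upper, sign, rng):
    """Draw |t| uniformly between 0.01 and `upper` when only a bound is reported.

    `upper` is 1.00 for results reported as t < 1.00, or the critical t at
    p = .05 for nonsignificant results whose direction is known. The caller
    supplies the critical value; it is not computed here.
    """
    return sign * rng.uniform(0.01, upper)


def study_effect(effect_sizes):
    """Average a study's eligible effect sizes so it contributes one value per analysis."""
    return mean(effect_sizes)


# Hypothetical study: two reward groups, each compared with the same control group.
g_expected = g_from_means(8.0, 2.0, 20, 10.0, 2.0, 20)    # expected reward vs. control
g_unexpected = g_from_means(10.5, 2.0, 20, 10.0, 2.0, 20)  # unexpected reward vs. control
g_overall = study_effect([g_expected, g_unexpected])       # single entry for the overall analysis
```

For an analysis restricted to expected rewards, only `g_expected` would be entered, which mirrors the independence rule stated in the text.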
Average effect sizes were obtained by weighting each g index by the number of participants on which it was based (see Cooper, 1989). As was previously mentioned, in the single-subject, repeated measure designs, there is no separate control group; subjects serve as their own controls. An increase or decrease in intrinsic motivation is indexed by a difference in the amount of time spent on the task between baseline and postreinforcement sessions. Effect sizes for these studies were calculated by subtracting the average time spent by all subjects in the baseline phase from the average time spent by all subjects in the postreinforcement phase. This number was then divided by the pooled standard deviation. After all effect sizes were calculated, the analyses were run on the computer program Meta (Schwarzer, 1991). Results reported in this article are based on the weighted integration method (Hedges & Olkin, 1985). Using this technique, effect sizes g are converted to ds by correcting them for bias (g is an overestimation of the population effect size, particularly for small samples; see Hedges, 1981). To obtain an overall effect size, each effect size is weighted by the reciprocal of its variance, and the weighted ds are averaged. This procedure gives more weight to effect sizes that are more
reliably estimated. Once mean effect sizes are calculated, 95% confidence intervals are constructed around the weighted mean. In order to verify the accuracy of the computer program, one analysis (the overall effect of reward on free time) was hand calculated. All obtained values from the meta-analysis program and the hand calculations were identical within rounding error. To determine whether each set of effect sizes in a sample shared a common effect size (i.e., was consistent across studies), a homogeneity statistic, Q, was calculated. Q has an approximate chi-square distribution with k–1 degrees of freedom, where k is the number of effect sizes (Hedges & Olkin, 1985). The null hypothesis is that the effect sizes are homogeneous (i.e., effect sizes in a given analysis are viewed as values sampled from a single population; variation in effect sizes among studies is merely due to sampling variation). For purposes of the present analyses, samples were considered homogeneous at p > .01. When samples are not homogeneous, studies can be classified by characteristics, such that effect sizes within categories are homogeneous. This strategy was undertaken by examining the effects of different types of rewards, reward expectancies, and reward contingencies. As a supplementary analysis, homogeneity was attained by removing outliers. That is, studies were omitted when they provided estimates that were inconsistent with those from other studies. Outliers in each data set were first identified using Tukey’s (1977) procedure. These outliers were then omitted from the analysis. If homogeneity was still not attained, other studies that reduced the homogeneity statistic by the largest amount were removed. Hedges (1987) has pointed out that this is a common procedure in both the physical and social sciences. In one area of physics, for example, Hedges (1987) found that data from 40% of the available studies were omitted from calculations. 
For meta-analyses of psychological topics, Hedges (1987) notes that removal of up to 20% of the outliers in a group of heterogeneous effect sizes usually results in a high degree of homogeneity. In an article in Psychological Bulletin, McGraw and Wong (1992) noted that one of the problems with effect size statistics (e.g., d) is that many readers of meta-analyses have difficulty interpreting the meaning and generalizability of findings. McGraw and Wong have introduced another way to look at effect size, by a statistic they call the common language effect size indicator (CL). CL refers to the probability that a score sampled from one distribution will be greater than a score sampled from some other distribution. McGraw and Wong suggest that CL is a useful way to talk about effect size because it is easily interpretable. They provide an example in which a sample of young adult men is compared to a sample of young adult women on the variable height. A CL of .92 indicates the probability of a male being taller than a female. Put another way, in any random pairing of young adult men and women, the male will be taller than the female 92 out of 100 times.
CL is calculated from means and standard deviations. Additionally, an effect size, d, can be converted to CL by multiplying d by 1/√2 or 0.707 to obtain a Z value (K.O. McGraw, personal communication, April 24, 1992). The upper tail probability associated with this value corresponds to CL and can be calculated using the unit normal curve. To test the robustness of the CL statistic, McGraw and Wong (1992) conducted a series of 118 tests (simulations) to determine the implications of violating the assumption that sample data come from populations of values that are normally distributed with equal variances. They found small discrepancies between the estimate of CL under the normality assumption and the estimate of CL when the normality assumption was violated in terms of skewness and kurtosis. The worst case discrepancy was 0.1 which occurred with a large violation of the equal variance assumption, considerable negative skewness, and a large violation of kurtosis. Given the robustness of CL and the ease with which it can be interpreted, results from the present analyses have also been expressed using the CL statistic. The meta-analytic procedures used in the present review include: (a) the estimation of average effect sizes and 95% confidence intervals, (b) homogeneity analyses to determine whether effect sizes are drawn from the same population, (c) removal of outliers to attain homogeneity, and (d) conversion of average effect sizes to the common language statistic (CL). Note that outliers are included and excluded in each analysis.
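Steps (a) and (b) of the procedure summarized above can be sketched as follows. This is an illustrative reconstruction, not the Meta program (Schwarzer, 1991): the function names are hypothetical, and the bias correction and variance expressions are the commonly cited Hedges and Olkin (1985) approximations, which are assumed here rather than taken from the article's appendix. The significance of Q is left to a chi-square table with k - 1 degrees of freedom.

```python
import math


def bias_corrected_d(g, n1, n2):
    """Small-sample correction of g (approximation: 1 - 3 / (4N - 9), N = n1 + n2)."""
    return g * (1 - 3 / (4 * (n1 + n2) - 9))


def variance_of_d(d, n1, n2):
    """Approximate sampling variance of d for two independent groups."""
    return (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))


def integrate(studies):
    """Weighted integration of effect sizes.

    `studies` is a list of (g, n_reward, n_control) tuples, one per study.
    Returns the weighted mean d, its 95% confidence interval, and the
    homogeneity statistic Q with its degrees of freedom.
    """
    ds = [bias_corrected_d(g, n1, n2) for g, n1, n2 in studies]
    weights = [1 / variance_of_d(d, n1, n2) for d, (_, n1, n2) in zip(ds, studies)]
    total_weight = sum(weights)
    mean_d = sum(w * d for w, d in zip(weights, ds)) / total_weight
    se = math.sqrt(1 / total_weight)
    ci95 = (mean_d - 1.96 * se, mean_d + 1.96 * se)
    q = sum(w * (d - mean_d) ** 2 for w, d in zip(weights, ds))
    return {"mean_d": mean_d, "ci95": ci95, "Q": q, "df": len(studies) - 1}


# Hypothetical input: three small studies with differing effects.
result = integrate([(-0.40, 15, 15), (0.10, 30, 30), (0.05, 25, 25)])
significant = not (result["ci95"][0] <= 0.0 <= result["ci95"][1])
```

Because each weight is the reciprocal of a variance, larger studies pull the mean toward their own estimate, and an effect is treated as nonsignificant when the interval includes 0.00.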
Results from Group Designs
The Overall Effect of Reward on Intrinsic Motivation
To assess the overall effect of reward on intrinsic motivation, descriptive and meta-analytic procedures were performed on each of the four different measures of intrinsic motivation (free time on task, attitude, performance during the free-time period, and willingness to volunteer for future studies without reward). For each measure, negative effects represent a decrement in intrinsic motivation; positive effects indicate an increment.
Direction of Effects
The number of studies collected for each analysis of the overall effects of rewards on intrinsic motivation and the direction of their effects is presented in Table 1. On the free-time measure, the majority of studies showed that reward decreased intrinsic motivation. However, when intrinsic motivation was measured by attitude toward a task, performance during the free-time period, or willingness to volunteer for future studies without reward, more studies showed positive effects.
Table 1: Number of studies and direction of effects for reward versus control groups on four measures of intrinsic motivation
Number of studies                                        | Free time | Attitude | Performance in free time | Willingness to volunteer
Showing a positive effect of reward                      | 22        | 31       | 6                        | 6
Showing a negative effect of reward                      | 34        | 15       | 4                        | 4
Showing no effect                                        | 1         | 1        | –                        | –
With lack of sufficient information to calculate effects | 4         | 17       | 2                        | 1
Total                                                    | 61        | 64       | 12                       | 11
Distribution of Effect Sizes
Frequency distributions of the data are shown in Figure 1. Studies that found no significant differences but did not provide sufficient information to calculate effect sizes are not portrayed in the graphs. When intrinsic motivation was measured as time on task following the removal of a reward (free time), effect sizes ranged from –1.94 to +1.06. The bulk of experiments found effects between –0.59 and +0.19. Using Tukey’s (1977) procedure, one negative outlier was identified in the free-time data. This effect (g = –1.94) was calculated from a study conducted by Morgan (1983, Experiment 1). In this study, subjects who received an expected, task contingent (noncontingent), tangible reward were compared to no-reward control subjects. The large negative effect could be due to the type of reward (tangible), the reward expectancy, and/or the reward contingency. All of these features are examined in further analyses. In addition, this study was somewhat different from other studies in that subjects who performed the activity for a reward were observed by other subjects. That is, subjects were offered a reward for engaging in an activity while their performance on the task was being watched. Thus, the large negative effect could be a result of an interaction of reward type, expectancy, contingency, and surveillance. The attitude measure of intrinsic motivation refers to subjects’ self-reports of task interest, enjoyment, and/or satisfaction. Effect sizes ranged from –0.69 to +1.98 with the majority of effects falling between –0.19 and +0.59. Two positive outliers in this data set come from studies conducted by Vallerand (1983) and Butler (1987). In both of these studies, extrinsic verbal reward is compared to a no-reward group. The effect of verbal reward on intrinsic motivation is investigated in a subsequent analysis. Effect sizes on the performance measure ranged from –3.72 to +0.96; the median was +0.03. 
One large negative outlier (–3.72) comes from a study conducted by Deci (1971, Experiment 2). This study differed from others in that it was a field experiment where students working for a college newspaper were paid to write headlines. Only eight subjects participated,
Figure 1: Frequency distributions of effect sizes for overall reward versus control groups on four measures of intrinsic motivation
[Four histogram panels: Free Time, Attitude, Performance, and Willingness to Volunteer. Horizontal axis: effect sizes, in intervals of 0.20. Vertical axis: number of studies.]
and two subjects in the control group dropped out and were not included in the analysis. On the willingness-to-volunteer measure, effect sizes ranged from –0.63 to +0.68. There were no outliers in this sample. To establish whether the CL statistic (McGraw & Wong, 1992) could be used confidently in the analyses, the extent to which the free-time distribution of effect sizes deviated from normality was determined. Obtained values for skewness and kurtosis were –0.21 and 0.55, respectively (where normal skewness and kurtosis equal 0.00). McGraw and Wong tested the effect that violations from normality would have on CL. Based on their findings and the skewness and kurtosis values obtained here, in the meta-analysis of effect sizes for the free-time measure, one could expect, at worst, an underestimate of 0.02 and an overestimate of 0.04 for CL. Given this small discrepancy, the implication is that the CL statistic can be used and interpreted without any serious concern about violations of normality and homogeneity of variance.
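The conversion from d to CL described in the Method section can be sketched with the standard normal distribution. This is a minimal illustration, assuming normally distributed scores with equal variances; the function names are hypothetical and the input values are made up for demonstration.

```python
from math import sqrt
from statistics import NormalDist


def cl_from_d(d):
    """CL as the standard normal probability at Z = d / sqrt(2) (about 0.707 * d)."""
    return NormalDist().cdf(d / sqrt(2))


def cl_from_moments(mean_a, sd_a, mean_b, sd_b):
    """Probability that a score drawn from A exceeds a score drawn from B.

    Assumes both populations are normal; the difference of two independent
    normal scores has variance sd_a**2 + sd_b**2.
    """
    z = (mean_a - mean_b) / sqrt(sd_a**2 + sd_b**2)
    return NormalDist().cdf(z)


no_effect = cl_from_d(0.0)        # equal distributions: either score is equally likely to be larger
small_positive = cl_from_d(0.2)   # slightly above one half
small_negative = cl_from_d(-0.2)  # mirror image of the positive case
```

A CL near .50 therefore corresponds to an effect size near 0.00, and CL for a negative d is the complement of CL for the same positive d.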
Meta-Analysis of Effect Sizes
The overall meta-analysis of effect sizes presented in Table 2 allows one to determine whether rewarded subjects showed less intrinsic motivation than nonrewarded subjects as measured by time on task following the removal of reward (free time); self-reports of task interest, satisfaction, and enjoyment (attitude); performance during the free-time period; and willingness to volunteer for future studies without reward. For each measure of intrinsic motivation, an analysis was conducted which included all studies that provided sufficient information to calculate effect sizes (see “All known effects” in Table 2). When samples were not homogeneous, outliers were identified and removed using Tukey’s (1977) procedure. If samples were still significantly heterogeneous, additional outliers were removed. Homogeneity was attained for the free-time and attitude measures by omitting approximately 20% of the effect sizes, a typical meta-analytic procedure. An examination of Table 2 indicates that the procedure of including and excluding outliers does not drastically alter mean effect sizes.
On the free-time measure, rewarded subjects showed less intrinsic motivation than nonrewarded controls (mean weighted d = –0.04), but this effect was not significant (i.e., the confidence interval included 0.00). When the mean effect of the homogeneous sample was converted to CL, results indicate that, given a sample of studies designed to investigate the effects of reward on time on task, 51 out of 100 studies would show that overall, rewarded subjects spend less time on the task than nonrewarded controls (assuming that all studies are of equal importance and have the same characteristics).
Results from the attitude measure indicate greater intrinsic motivation for rewarded subjects. This effect was small at 0.14 (from the homogeneous sample) but differed significantly from the value of 0.00 (i.e., the confidence interval did not include 0.00).
The CL statistic was .54 and can be interpreted to mean that, in comparisons of rewarded to nonrewarded subjects, rewarded subjects will show a more positive attitude toward a task than nonrewarded subjects in 54 out of 100 studies. Rewarded subjects also showed a tendency to score higher on performance measures and to volunteer for the future projects more than nonrewarded subjects, but these effects were not significant.
Studies that could not be represented with effect sizes were given a value of 0.00. When these studies were included in the overall analyses (see “All reports” in Table 2), the mean effect size for each measure was little changed.
Overall, the results show that reward does not significantly affect intrinsic motivation as measured by free time on task following removal of reward, by performance during the free-time period, or by subjects’ willingness to volunteer for future projects without reward. When intrinsic motivation is measured by attitude toward a task, rewarded subjects report higher intrinsic motivation than nonrewarded subjects. It is important to point out that these main effect results should be viewed with caution. This is because many studies show interaction effects that are obscured when results are aggregated. Previous reviewers (e.g., Deci & Ryan, 1985; Morgan, 1984) have suggested that reward type, reward expectancy, and reward contingency may influence the effect of reward on intrinsic motivation. In subsequent analyses, effect sizes have been partitioned into groups based on these characteristics in an attempt to test potential moderator variables and to establish homogeneity of variance.

Table 2: Overall effect of reward versus control groups on four measures of intrinsic motivation

Analysis                                                     k   Sample size  Mean weighted d  95% CI for d    Q        CL
Free time on task
  All known effects (zeros excluded)                         57  3539         −0.06            −0.13 to 0.01   225.51*  .48
  Outliers removed using Tukey’s procedure (zeros excluded)  56  3459         −0.03            −0.10 to 0.04   177.40*  .49
  Additional outliers removed (no zeros)                     44  2634         −0.04            −0.12 to 0.04   66.39    .49
  All reports (zeros and outliers included)                  61  3858         −0.06            −0.12 to 0.01   225.80*  .48
Attitude
  All known effects (zeros excluded)                         47  3184         +0.21            0.14 to 0.29    167.50*  .56
  Outliers removed using Tukey’s procedure (zeros excluded)  45  3034         +0.17            0.09 to 0.24    110.70*  .55
  Additional outliers removed (no zeros)                     39  2680         +0.14            0.06 to 0.22    58.03    .54
  All reports (zeros and outliers included)                  64  4431         +0.15            0.09 to 0.21    177.07*  .54
Performance during free time period
  All known effects (zeros excluded)                         10  575          +0.08            −0.09 to 0.25   27.90*   .52
  Outliers removed using Tukey’s procedure (zeros excluded)  9   569          +0.09            −0.08 to 0.26   21.63*   .52
  Additional outliers removed (no zeros)                     8   509          −0.0004          −0.18 to 0.18   11.73    .50
  All reports (zeros and outliers included)                  12  770          +0.06            −0.09 to 0.21   28.07*   .52
Willingness to volunteer
  All known effects (zeros excluded)                         10  561          +0.05            −0.12 to 0.23   17.38    .52
  All reports (zeros and outliers included)                  11  609          +0.05            −0.12 to 0.22   17.42    .52

Note: Negative effect sizes indicate a decrease in intrinsic motivation for reward/reinforcement groups; positive effect sizes indicate an increase. k = number of effect sizes; sample size = sum of n in all studies; mean weighted d = mean of weighted effect sizes (weighted by sample size); CI = confidence interval; Q = homogeneity statistic for mean effect sizes; CL = common language effect size statistic. *Significance indicates rejection of the hypothesis of homogeneity. *p < .01.
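The pooled quantities reported in Table 2 (mean weighted d, its 95% confidence interval, and the homogeneity statistic Q) can be illustrated with a short sketch. It assumes inverse-variance weights and a common large-sample approximation for the variance of d; the article states only that effect sizes were weighted by sample size, so this should not be read as the authors' exact procedure. The function names and the study-level values are hypothetical.

```python
import math


def d_variance(d, n_e, n_c):
    """Large-sample variance of d (an assumed approximation)."""
    return (n_e + n_c) / (n_e * n_c) + d * d / (2.0 * (n_e + n_c))


def pool(effects):
    """Pool a list of (d, n_experimental, n_control) tuples.

    Returns the weighted mean d, its 95% confidence interval, and Q.
    """
    weights = [1.0 / d_variance(d, n_e, n_c) for d, n_e, n_c in effects]
    total_w = sum(weights)
    mean_d = sum(w * e[0] for w, e in zip(weights, effects)) / total_w
    half_width = 1.96 / math.sqrt(total_w)
    # Q sums the weighted squared deviations from the pooled mean.
    q = sum(w * (e[0] - mean_d) ** 2 for w, e in zip(weights, effects))
    return mean_d, (mean_d - half_width, mean_d + half_width), q


# Hypothetical study-level values, invented purely for illustration.
studies = [(-0.30, 20, 20), (0.10, 40, 40), (0.05, 60, 60), (-0.15, 25, 25)]
mean_d, (lo, hi), q = pool(studies)
significant = not (lo <= 0.0 <= hi)  # the CI criterion used in the text
print(f"mean weighted d = {mean_d:+.3f}, 95% CI = ({lo:+.3f}, {hi:+.3f}), "
      f"Q = {q:.2f}, differs from 0: {significant}")
```

Larger studies receive larger weights in this scheme, and Q grows as individual effect sizes scatter more widely around the pooled mean than sampling error alone would predict.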
Interactions: Effect Size as a Function of Reward Characteristics5
In the following section, type of reward and its impact on effect size are presented. Studies are included that measured the effects of either verbal or tangible reward (e.g., money) on intrinsic motivation. The second part of this section involves an analysis of reward expectancy (i.e., expected and unexpected rewards). Finally, reward contingency is assessed. Specifically, the question here is whether effect size varies as a function of reward delivered for engaging in a task, completing or solving a task, or achieving a certain level of performance. Studies that could not be represented as effect sizes due to lack of sufficient information are not included in further analyses presented in this article.6
Type of Reward
The purpose of the present analyses is to assess the effects of different types of rewards (i.e., tangible and verbal) on intrinsic motivation. Because few studies assessed intrinsic motivation as a function of “performance during the free-time period” and “willingness to volunteer,” no further analyses on these measures have been conducted.
Effect sizes for both types of reward on the free-time and attitude measures are presented in funnel distributions in Figure 2. Funnel graphs are used to plot effect size against sample size of the study. The advantage of a funnel display is that it capitalizes on a well-known statistical principle (Light & Pillemer, 1984). That is, the larger the sample, the closer the effect size will come to represent the true underlying population value; variability due to sampling error decreases. Conversely, smaller samples are more prone to sampling error and are likely to deviate considerably from the true mean. For these reasons, the distribution is expected to take the shape of an inverted funnel.
An inspection of the funnel distribution of effect sizes for the free-time measure indicates that, overall, larger samples tend to concentrate around zero; greater variation is evident with smaller samples. Verbal reward appears to produce a positive effect. Results of tangible reward suggest a negative effect. These differences suggest that, on the free-time measure, the effects of reward depend on the type of reward. On the attitude measure, positive effects emerge from both tangible and verbal reward studies; verbal reward appears to produce a slightly more positive effect. There is no indication of a publication bias because studies with small sample sizes and near zero effects are represented in the funnel distribution (for a discussion of this issue, see Light & Pillemer, 1984).
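The funnel shape described above follows from the way the sampling error of d shrinks as sample size grows. The sketch below illustrates this with an assumed large-sample approximation for the standard error of d in two groups of equal size; the function name `se_of_d`, the population effect of zero, and the chosen sample sizes are illustrative assumptions, not values taken from the studies.

```python
import math


def se_of_d(d: float, n_total: int) -> float:
    """Approximate standard error of d for two groups of equal size."""
    n = n_total / 2.0  # per-group size (assumed equal split)
    variance = 2.0 / n + d * d / (4.0 * n)
    return math.sqrt(variance)


TRUE_D = 0.0  # assumed population effect, for illustration only
for n_total in (20, 40, 80, 160):
    half_width = 1.96 * se_of_d(TRUE_D, n_total)
    print(f"N = {n_total:3d}: about 95% of observed d within "
          f"±{half_width:.2f} of {TRUE_D:.2f}")
```

The printed band narrows steadily as N increases, which is the narrowing toward the top of an inverted funnel when sample size is plotted on the vertical axis.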
[Figure 2 panels: FREE TIME; ATTITUDE. Horizontal axis: effect size; vertical axis: sample size; separate symbols for tangible and verbal reward.]
Figure 2: Funnel distributions of effect sizes for tangible and verbal reward on two measures of intrinsic motivation

Although it is not possible to rule out experimenter bias (Rosenthal, 1966), the funnel graphs demonstrate that sampling variability may account for the fact that some researchers find reward has a detrimental effect while others do not.
The results from the meta-analysis of the effects of reward type presented in Table 3 indicate that, when studies compared subjects who received a verbal reward (i.e., praise or positive feedback) to those who did not receive a reward, rewarded subjects demonstrated significantly higher intrinsic motivation as measured by both time on task and attitude. On the time measure, homogeneity was attained by removing one outlier. This extreme positive value (+1.61) was obtained from a study conducted in India (Tripathi & Agarwal, 1985). Because all other studies in this analysis came from North America, the large effect size may have been due to differences in the population studied.7 Three outliers from studies measuring the effects of verbal reward were removed to achieve homogeneity on the attitude measure. Inspection of these outliers suggested that they did not differ in obvious ways from other studies in the sample except for their tendency to generate extreme values of effect size. From these analyses, one can estimate that the probability of a sample of verbally rewarded subjects’ being more highly intrinsically motivated than nonrewarded subjects is 0.61 (CL) as measured by time on task and attitude toward task.
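The outlier screening mentioned above cites Tukey (1977) without spelling out the rule. A common reading, assumed here, is the fence rule that flags values lying more than 1.5 interquartile ranges beyond the quartiles. In the sketch below only the value +1.61 comes from the text; the function name and the remaining effect sizes are hypothetical.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values outside Tukey-style fences (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical effect sizes; only +1.61 is a value reported in the text.
effect_sizes = [0.10, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 1.61]
print(tukey_outliers(effect_sizes))
```

With these illustrative values the only point beyond the fences is 1.61. Quartile conventions differ slightly between software packages, so borderline cases can be classified differently under another convention.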
Table 3: Effect size as a function of the type of reward delivered

Reward type  Analysis                                  k   Sample size  Mean weighted d  95% CI for d    Q        CL
Free time on task
  Verbal     All known effects                         15  958          +0.42            0.29 to 0.56    29.37*   .62
  Verbal     Outliers removed using Tukey’s procedure  14  918          +0.38            0.25 to 0.52    18.96    .61
  Tangible   All known effects                         51  2983         −0.20            −0.28 to −0.12  181.01*  .44
  Tangible   Outliers removed using Tukey’s procedure  47  2761         −0.22            −0.30 to −0.14  97.55*   .44
  Tangible   Additional outliers removed               43  2591         −0.21            −0.29 to −0.13  63.53    .44
Attitude
  Verbal     All known effects                         15  1024         +0.45            0.31 to 0.58    69.71*   .63
  Verbal     Outliers removed using Tukey’s procedure  13  874          +0.30            0.15 to 0.43    26.75*   .58
  Verbal     Additional outliers removed               12  785          +0.39            0.24 to 0.53    8.73     .61
  Tangible   All known effects                         37  2362         +0.09            0.004 to 0.17   143.29*  .52
  Tangible   Outliers removed using Tukey’s procedure  33  2149         +0.05            −0.04 to 0.13   50.56    .52

Note: Negative effect sizes indicate a decrease in intrinsic motivation for reward/reinforcement groups; positive effect sizes indicate an increase. k = number of effect sizes; sample size = sum of n in all studies; mean weighted d = mean of weighted effect sizes (weighted by sample size); CI = confidence interval; Q = homogeneity statistic for mean effect sizes; CL = common language effect size statistic. *Significance indicates rejection of the hypothesis of homogeneity. *p < .01.
Studies assessing the effects of tangible reward on intrinsic motivation show a decrease on the free-time measure as indicated by a negative mean effect size that differed significantly from 0.00. The CL statistic of .44 implies that subjects who receive a tangible reward will show a decrease in intrinsic motivation as measured by time on task in 56 out of 100 studies. The mean effect size on attitude for subjects given a tangible reward was positive, but once outliers were removed, the mean did not differ significantly from 0.00.
In summary, subjects rewarded with verbal praise or positive feedback show significantly greater intrinsic motivation than nonrewarded subjects. Those who receive a tangible reward evidence significantly less intrinsic motivation than nonrewarded subjects, as measured by time on task, but they do not differ in their reports of task interest or enjoyment.
The next step in the analysis involves a further breakdown of the effects of tangible reward. The goal is to identify variables that may moderate the effects of tangible reward on intrinsic motivation and to establish within-group homogeneity. One factor that may impact effect size is whether the rewards implemented in the studies were promised to subjects prior to the experimental sessions or whether they were received unexpectedly.
Reward Expectancy
Within the intrinsic motivation literature, researchers draw a distinction between expected and unexpected reward. Expected rewards refer to a procedure whereby subjects are offered a reward prior to the experimental session and delivered the reward following the session. Subjects who receive an unexpected reward have not been promised the reward beforehand. These terms are generally used to describe procedures involving the administration of tangible rewards. In most studies on verbal reward, praise was delivered unexpectedly and was not contingent on any specified level of performance. The few studies on verbal reward that did employ expected and/or contingency procedures did not produce effect sizes that deviated much from the mean effect size presented in Table 3. For this reason, no further subdivision of effect sizes from verbal reward studies was undertaken.
The following analyses concern the effects of tangible reward. Results are displayed in Table 4. Only six studies assessed the effects of unexpected tangible reward on the time measure of intrinsic motivation; five studies investigated attitude. The average effect sizes for unexpected tangible reward versus control groups on free time and attitude were slightly positive but did not differ from 0.00. These results indicate that subjects receiving an unexpected reward do not differ significantly from nonrewarded control subjects on measures of intrinsic motivation. For the expected tangible reward versus control comparisons, expected reward subjects demonstrated significantly less intrinsic motivation on the free-time measure. On attitude, when homogeneity was attained, the two groups did not differ.
In the following section of this article, studies comparing expected, tangible reward groups to nonrewarded controls were further subdivided into groups based on reward contingency.
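The significance criterion used throughout these analyses is whether the 95% confidence interval for the mean weighted d includes 0.00. As a small sketch, the check below applies that criterion to the homogeneous-sample rows of Table 4; the values are copied from the table, while the function name and row labels are introduced here for illustration.

```python
def differs_from_zero(lower: float, upper: float) -> bool:
    """True when a confidence interval excludes 0.00."""
    return lower > 0.0 or upper < 0.0


# (label, mean weighted d, CI lower, CI upper), as reported in Table 4
# for the homogeneous samples.
rows = [
    ("free time, unexpected", 0.01, -0.24, 0.25),
    ("free time, expected", -0.25, -0.33, -0.16),
    ("attitude, unexpected", 0.06, -0.16, 0.28),
    ("attitude, expected", 0.07, -0.02, 0.16),
]

for label, d, lower, upper in rows:
    verdict = "differs from 0.00" if differs_from_zero(lower, upper) else "n.s."
    print(f"{label}: d = {d:+.2f} ({lower:+.2f} to {upper:+.2f}) -> {verdict}")
```

Only the expected-reward comparison on the free-time measure excludes zero, in line with the description given in the text.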
Table 4: Effect size as a function of reward expectancy for tangible reward versus control comparisons

Reward expectancy  Analysis                                  k   Sample size  Mean weighted d  95% CI for d    Q        CL
Free time on task: Tangible reward versus control
  Unexpected       All known effects                         6   275          +0.01            −0.24 to 0.25   7.38     .50
  Expected         All known effects                         50  2825         −0.23            −0.30 to −0.15  185.48*  .44
  Expected         Outliers removed using Tukey’s procedure  46  2603         −0.25            −0.33 to −0.17  101.36*  .43
  Expected         Additional outliers removed               42  2408         −0.25            −0.33 to −0.16  64.78    .43
Attitude: Tangible reward versus control
  Unexpected       All known effects                         5   311          +0.06            −0.16 to 0.28   12.42    .52
  Expected         All known effects                         35  2126         +0.10            0.01 to 0.19    135.26*  .53
  Expected         Outliers removed using Tukey’s procedure  32  1961         +0.07            −0.02 to 0.16   50.48    .52

Note: Negative effect sizes indicate a decrease in intrinsic motivation for reward/reinforcement groups; positive effect sizes indicate an increase. k = number of effect sizes; sample size = sum of n in all studies; mean weighted d = mean of weighted effect sizes (weighted by sample size); CI = confidence interval; Q = homogeneity statistic for mean effect sizes; CL = common language effect size statistic. *Significance indicates rejection of the hypothesis of homogeneity. *p < .01.

Reward Contingency
In some studies, subjects were promised a tangible reward that was delivered for participating in the study or for engaging in a specific task. In other studies, a tangible reward was offered for solving a puzzle, completing a task, and/or attaining a certain level of performance. Rewards administered in these various ways have been labeled by Deci and Ryan (1985) as task noncontingent (rewards offered for participating in the study regardless of what subjects do), task contingent (rewards offered for engaging in a task, and/or completing or solving a task), and performance contingent (rewards offered for attaining a specified level of performance). Table 5 presents results from the meta-analysis of these comparisons.
Table 5 indicates that when subjects who are promised a tangible reward regardless of what they do in the study (task noncontingent) are compared to nonrewarded controls, no significant difference emerges on the free-time measure of intrinsic motivation. No analyses were conducted with this type of reward contingency on the attitude measure because only two studies of this type assessed attitude. Subjects who receive an expected tangible reward for doing, completing, or solving a task (task contingent) show significantly less intrinsic motivation than controls, as measured by time on task, once reward is withdrawn. On attitude, they show less intrinsic motivation, but this difference is not significant. When rewards are delivered contingent on a certain level of performance, there is no significant effect on the free-time measure; subjects in this condition do, however, report a more positive attitude than controls.
Table 5: Effect size as a function of reward contingency (as defined by Deci & Ryan, 1985) for expected tangible reward versus control comparisons

Reward contingency        Analysis                                  k   Sample size  Mean weighted d  95% CI for d    Q        CL
Free time on task: Expected tangible reward versus control
  Task noncontingent      All known effects                         6   225          +0.55            +0.27 to 0.83   20.02*   .65
  Task noncontingent      Outliers removed                          4   124          +0.10            −0.26 to 0.45   1.86     .53
  Task contingent         All known effects                         45  2257         −0.32            −0.41 to −0.24  130.90*  .41
  Task contingent         Outliers removed using Tukey’s procedure  44  2177         −0.28            −0.37 to −0.19  94.99*   .42
  Task contingent         Additional outliers removed               40  2015         −0.23            −0.32 to −0.14  62.08*   .44
  Performance contingent  All known effects                         10  484          −0.12            −0.31 to 0.06   26.22*   .47
  Performance contingent  Outliers removed using Tukey’s procedure  8   439          −0.13            −0.34 to 0.06   17.83    .46
Attitude: Expected tangible reward versus control
  Task contingent         All known effects                         21  1217         −0.07            −0.18 to 0.05   53.75*   .48
  Task contingent         Outliers removed using Tukey’s procedure  20  1157         −0.01            −0.13 to 0.10   36.24*   .49
  Task contingent         Additional outliers removed               19  1058         −0.08            −0.20 to 0.04   21.76    .48
  Performance contingent  All known effects                         14  819          +0.38            0.24 to 0.52    70.03*   .61
  Performance contingent  Outliers removed using Tukey’s procedure  13  762          +0.29            0.14 to 0.43    27.35*   .58
  Performance contingent  Additional outliers removed               11  682          +0.19            0.04 to 0.35    11.54    .55

Note: Negative effect sizes indicate a decrease in intrinsic motivation for reward/reinforcement groups; positive effect sizes indicate an increase. k = number of effect sizes; sample size = sum of n in all studies; Mean weighted d = mean of weighted effect sizes (weighted by sample size); CI = confidence interval; Q = homogeneity statistic for mean effect sizes; CL = common language effect size statistic. *Significance indicates rejection of the hypothesis of homogeneity. *p < .01. No effect size was calculated for the attitude measure of task noncontingent rewards because there were only two studies that fit in this category.
Studies employing various reward contingencies were also categorized using behavioral definitions. Rewards delivered for participating in a study or for engaging in a task are referred to as noncontingent rewards. Rewards are called contingent when they are offered for solving a puzzle, completing a task, or reaching a specified level of performance. The results of this analysis are shown in Table 6. The findings indicate that, when reward contingency is defined behaviorally, subjects demonstrate a decrease in intrinsic motivation on the free-time measure when expected tangible rewards are not contingent on successful performance. On the attitude measure, noncontingent rewards produce no significant effect. Rewards contingent on successful performance do not produce significant effects on either the free-time or attitude measures.

Table 6: Effect size as a function of reward contingency (as defined behaviorally) for expected tangible reward versus control comparisons

Reward contingency  Analysis                                  k   Sample size  Mean weighted d  95% CI for d    Q        CL
Free time on task: Expected tangible reward versus control
  Contingent        All known effects                         18  906          −0.12            −0.26 to 0.01   37.44*   .47
  Contingent        Outliers removed                          16  861          −0.13            −0.26 to 0.01   29.06    .46
  Noncontingent     All known effects                         40  2017         −0.27            −0.35 to −0.18  167.05*  .42
  Noncontingent     Outliers removed using Tukey’s procedure  38  1894         −0.26            −0.35 to −0.16  100.86*  .43
  Noncontingent     Additional outliers removed               34  1728         −0.26            −0.36 to −0.16  54.66    .43
Attitude: Expected tangible reward versus control
  Contingent        All known effects                         20  1224         +0.24            0.12 to 0.36    88.64*   .57
  Contingent        Outliers removed using Tukey’s procedure  17  1087         +0.11            −0.01 to 0.23   22.24    .53
  Noncontingent     All known effects                         17  913          −0.04            −0.17 to 0.09   50.14*   .49
  Noncontingent     Outliers removed using Tukey’s procedure  16  853          +0.03            −0.10 to 0.17   31.52*   .49
  Noncontingent     Additional outliers removed               15  833          +0.05            −0.08 to 0.19   27.91    .48

Note: Negative effect sizes indicate a decrease in intrinsic motivation for reward/reinforcement groups; positive effect sizes indicate an increase. k = number of effect sizes; sample size = sum of n in all studies; Mean weighted d = mean of weighted effect sizes (weighted by sample size); CI = confidence interval; Q = homogeneity statistic for mean effect sizes; CL = common language effect size statistic. *Significance indicates rejection of the hypothesis of homogeneity. *p < .01.

The major difference between a behavioral classification of contingency and Deci and Ryan’s categorization system concerns those studies where subjects are given a reward for completing or solving a task. The first experiment conducted by Deci (1971) is an example of a study coded as task contingent using Deci and Ryan’s categories and contingent using a behavioral framework. In this study, subjects were paid money for each puzzle they solved. Deci and Ryan classified such reward procedures as task contingent because the rewards were not contingent on how well subjects performed relative to some standard. From a behavioral perspective, however, completion or solution of a task is seen as dependent on successful performance; these studies were labeled contingent. Thus, performance contingent rewards as defined by Deci and Ryan (1985) include only those studies where subjects are offered a reward for attaining a certain level of performance; using a behavioral definition, studies coded as contingent include both rewards that are contingent on completing or solving a task and rewards that are contingent on reaching a specified level of performance. Because these two types of reward contingencies may have opposite effects on intrinsic motivation, a separate analysis was conducted on studies in which reward was delivered for completing or solving a task. Results given in Table 7 show no significant differences between rewarded and control groups on the free-time or attitude measures for this type of reward contingency. These findings suggest that contingent rewards (which include performance contingent rewards), as defined behaviorally, do not harm intrinsic motivation.

Table 7: Effect size as a function of rewards contingent on task completion or solution for expected tangible reward versus control comparisons

Measure    k   Sample size  Mean weighted d  95% CI for d    Q        CL
Free time  8   423          −0.12            −0.32 to 0.08   11.21    .47
Attitude   6   405          −0.05            −0.25 to 0.14   6.89     .48

Note: Negative effect sizes indicate a decrease in intrinsic motivation for reward/reinforcement groups; positive effect sizes indicate an increase. k = number of effect sizes; sample size = sum of n in all studies; Mean weighted d = mean of weighted effect sizes (weighted by sample size); CI = confidence interval; Q = homogeneity statistic for mean effect sizes; CL = common language effect size statistic. *Significance indicates rejection of the hypothesis of homogeneity. *p < .01.
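The two coding schemes compared above can be summarized as a lookup from the basis on which a reward is delivered to the category each scheme assigns. The sketch below restates the definitions given in the text; the enum and dictionary names are hypothetical and are used only to make the difference between the schemes explicit.

```python
from enum import Enum


class Basis(Enum):
    """What the reward is delivered for (categories taken from the text)."""
    PARTICIPATING = "participating in the study"
    ENGAGING = "engaging in a task"
    COMPLETING = "completing or solving a task"
    PERFORMANCE_LEVEL = "attaining a specified level of performance"


# Deci and Ryan's (1985) categories, as described in the text.
DECI_RYAN = {
    Basis.PARTICIPATING: "task noncontingent",
    Basis.ENGAGING: "task contingent",
    Basis.COMPLETING: "task contingent",
    Basis.PERFORMANCE_LEVEL: "performance contingent",
}

# Behavioral definitions, as described in the text.
BEHAVIORAL = {
    Basis.PARTICIPATING: "noncontingent",
    Basis.ENGAGING: "noncontingent",
    Basis.COMPLETING: "contingent",
    Basis.PERFORMANCE_LEVEL: "contingent",
}

for basis in Basis:
    print(f"{basis.value}: Deci and Ryan = {DECI_RYAN[basis]}; "
          f"behavioral = {BEHAVIORAL[basis]}")
```

Rewards for completing or solving a task, as in Deci (1971), are the case where the schemes part ways: task contingent under Deci and Ryan's categories, contingent under the behavioral definition.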
Summary of Results from Group Designs
A summary of the various analyses conducted on the group design studies and the major findings is given in Figure 3 (see Note 8). When all types of reward are aggregated, overall, the results indicate that reward does not negatively affect intrinsic motivation on any of the four measures (free time on task once reward is withdrawn, self-reports of attitude, performance during the free-time measure, willingness to volunteer for future studies without reward). When rewards are subdivided into reward type (verbal, tangible), reward expectancy (expected, unexpected), and reward contingency, the findings demonstrate that people who receive a verbal reward spend more time on a task once the reward is withdrawn; they also show more interest and enjoyment than nonrewarded persons. Tangible reward produces no decrement in intrinsic motivation when it is received unexpectedly. Expected tangible rewards produce differing effects depending on the manner in which they are administered. Individuals who receive an expected reward for solving or completing a task or for achieving a specific level of performance do not spend less time on a task than controls once the reward is withdrawn. They do, however, report more interest, satisfaction, and enjoyment of the task when the reward is given for a certain level of performance. The detrimental effects of reward appear when rewards are offered to people simply for engaging in a task, independent of successful performance. Under these conditions, once the reward is removed, individuals spend less time on the task than controls; they do not, however, report a less favorable attitude toward the task.

INTRINSIC MOTIVATION
  Free Time: Reward n.s.
    Verbal (dw = 0.38)
    Tangible (dw = −0.21)
      Unexpected n.s.
      Expected (dw = −0.25)
        Contingent n.s.
        Noncontingent (dw = −0.26)
        Contingent on task completion or solution n.s.
        Performance contingent n.s.
        Task noncontingent n.s.
        Task contingent (dw = −0.23)
  Attitude: Reward (dw = 0.14)
    Verbal (dw = 0.39)
    Tangible n.s.
      Unexpected n.s.
      Expected n.s.
        Contingent n.s.
        Noncontingent n.s.
        Contingent on task completion or solution n.s.
        Performance contingent (dw = 0.19)
        Task contingent n.s.
  Performance: Reward n.s.
  Willingness to Volunteer: Reward n.s.
Note: dw = mean weighted effect size (based on homogeneous samples); n.s. = not significant; analyses in regular type indicate no effect; analyses in bold indicate a negative effect; underlined analyses indicate a positive effect. When no dw is reported, there was no significant effect. No analyses were conducted on the attitude measure for task noncontingent reward because only two studies assessed this measure.
Figure 3: A summary of the meta-analysis of the effects of reward versus control groups on intrinsic motivation
Results from Single-Subject Designs
To determine the effects of reinforcement on intrinsic motivation, an analysis was conducted on effect sizes from single-subject, repeated measures designs where the rewards used were shown to be reinforcers for each subject in the study. That is, rewards were shown to increase behavior during a reinforcement phase. An increase or decrease in intrinsic motivation was measured as a difference between behavior during the pre- and postreinforcement phases. Five studies contributed an effect size to this analysis. Four studies showed that subjects spent more time on the task during the postreinforcement phase than the baseline phase. One study (Vasta & Stirpe, 1979) showed a decrease in time on task immediately following the removal of reward but an increase in time when intrinsic motivation was measured 2 weeks later. To make this analysis comparable to the analysis of group design studies, however, only differences between the immediate postreinforcement phase and baseline were analyzed. The average effect size and confidence interval for this analysis was +0.34 (–0.28, 0.96), indicating no significant change in intrinsic motivation from baseline to postreinforcement phases. Effect sizes were homogeneous (Q = 2.96, df = 4). These results suggest that reinforcement does not alter people’s intrinsic motivation.
As noted previously, two studies used a between- and within-group repeated measures design to assess the effects of reinforcement on intrinsic motivation (Greene, Sternberg, & Lepper, 1976; Mynatt, Oakley, Arkkelin, Piccione, Margolis, & Arkkelin, 1978). Although these studies did not meet the criteria for inclusion in the meta-analysis of within-subject designs, it is possible to assess the within-group effects for reward conditions that were comparable in both studies. Both Greene et al. (1976) and Mynatt et al. (1978) included a group of subjects rewarded for playing with activities that they had spent the most time with during baseline phases (high interest condition) and a group that was rewarded for playing with activities they had spent the least time with during baseline (low interest condition). In terms of the high interest conditions, Mynatt et al. did not find a reinforcement effect but reported a decrease in intrinsic motivation from baseline to postreward phases. Greene et al. reported a reinforcement effect for the high interest group and a decrease in intrinsic motivation between baseline and postreinforcement sessions. It is difficult to draw conclusions from only two studies. Nonetheless, because a decline in intrinsic motivation occurred with or without a reinforcement effect, it may be that reinforcement is not the critical variable.
Both studies reported a reinforcement effect for the low interest conditions, but there was no change in intrinsic motivation from baseline to postreinforcement phases. Again, conclusions based on two studies are tenuous. One interpretation, however, is that the time spent on low interest activities was so low that a decline in intrinsic motivation could not be detected. Alternatively, reinforcement does not interrupt intrinsic motivation for low interest activities.
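The homogeneity result reported above for the single-subject analysis (Q = 2.96, df = 4) can be checked against a chi-square distribution, on the usual assumption that Q follows a chi-square distribution with k − 1 degrees of freedom when effect sizes are homogeneous. The sketch below uses a closed-form upper-tail probability that holds only for even degrees of freedom; the function name is hypothetical.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability, closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum (x/2)^i / i! for i = 0 .. df/2 - 1, built up term by term.
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(2.96, 4)
print(f"Q = 2.96, df = 4, p = {p_value:.2f}")
```

Under this assumption the p value is about .56, far above the .01 criterion used in the tables, so the hypothesis of homogeneity is not rejected for the five single-subject effect sizes.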
Discussion A major contention in education and psychology is that rewards and reinforcement negatively impact a person’s intrinsic motivation. The view is that, if people are reinforced or rewarded for activities they already spend time on and enjoy, they will be less motivated to engage in the activity than they were prior to the introduction of reward, once the reward is no longer forthcoming. In other words, rewards and reinforcement are said to decrease people’s intrinsic motivation. Over the past 20 years, dozens of studies have been conducted to investigate this issue. The primary objective of this article was to assess the research findings by conducting a meta-analysis of results from experiments on the effects of reward and reinforcement on intrinsic motivation. What follows is a discussion of the results obtained from the meta-analysis. The vast majority of studies have assessed the effects of reward on intrinsic motivation by using group designs. Rewarded subjects are compared to nonrewarded controls. Intrinsic motivation is measured by differences between groups on attitude, time spent on a task following the removal of reward (free time), performance during the free-time period, and willingness to volunteer for future studies without reward. The main meta-analysis reported in this article was conducted on results from these studies. This analysis concerned assessing the overall effects of reward on intrinsic motivation as well as the effects of a number of reward characteristics. The results suggest that in the laboratory, overall, reward does not negatively impact intrinsic motivation on any of the four measures analyzed here. A separate analysis was conducted using single-subject, repeated measures designs. A few researchers employed this type of design to evaluate the effects of reinforcement on intrinsic motivation. 
The rewards used in these studies were shown to be reinforcers, and intrinsic motivation was indexed as differences in subjects’ behavior between pre- and postreinforcement sessions. Results from the meta-analysis indicate no effect of reinforcement on intrinsic motivation. That is, the evidence suggests that reinforcement does not decrease a person’s intrinsic motivation to engage in an activity. In terms of rewards and extrinsic reinforcement, our overall findings suggest that there is no detrimental effect on intrinsic motivation. These findings are based on laboratory experiments, but a similar conclusion was reached by Workman and Williams (1980) in their review of the effects of extrinsic rewards on intrinsic motivation in the classroom. Generally, on task behavior, Workman and Williams found that external reinforcement increased and maintained intrinsic motivation for prolonged periods (up to 12 months). Thus, it no longer seems appropriate to argue against the use of incentive systems in applied settings. The findings from both experimental and applied research run contrary to the views expressed by many psychologists and educators (e.g., Deci & Ryan,
1985; Kohn, 1993; Levine & Fasnacht, 1974; Schwartz, 1990). For example, Deci and Ryan (1987) state that:
In general [italics added], rewards have been found to undermine intrinsic motivation. When people received rewards for working on an interesting activity, they tended to display less interest in and willingness to work on that activity after the termination of the rewards than did people who had worked on the activity without receiving a reward. (p. 1026)
Results from the present meta-analysis suggest that this statement is erroneous. The findings indicate that, in general, rewarded people are not less willing to work on activities and they do not display a less favorable attitude toward tasks than people who do not receive rewards. When rewards are broken down into reward type, expectancy, and contingency, results indicate that, on the free-time measure, verbal reward produces an increase in intrinsic motivation; tangible rewards produce no effect when they are delivered unexpectedly, and they are not detrimental when they are expected and contingent on level of performance or completing or solving a task. Expected tangible rewards produce a decrease in intrinsic motivation as measured by free time on task when they are given to individuals simply for engaging in an activity. On the attitude measure, verbal reward produces an increase in intrinsic motivation, and tangible rewards do not lead to a decrease in intrinsic motivation under any conditions. An increase in intrinsic motivation is shown on the attitude measure when individuals are offered a reward for performing to a set of standards. Thus, the present results suggest that rewards are detrimental only under a highly specified set of circumstances. That is, when subjects are offered a tangible reward (expected) that is delivered regardless of level of performance, they spend less time on a task than control subjects once the reward is removed. The same condition has no effect on attitude. Given these results, why is it that one commonly finds general statements condemning reinforcement and/or reward in journal articles and introductory textbooks? The present meta-analysis makes it clear how circumscribed the negative effect of reward really is. One possibility is that terms such as tangible, expected, unexpected, contingent and noncontingent become very confusing to a reader sorting through this literature. 
Consider, at its simplest, a study investigating the effects of expected reward on intrinsic motivation. Suppose the results showed a negative effect for expected reward. When discussing findings, do the researchers talk about the negative effects of the promise of reward or about the negative effects of reward, in general? There is no doubt that conclusions reached from such studies are often made about reward or reinforcement in general, not promise of reward. This has led to a great deal of misunderstanding about the overall effects of reward and reinforcement on intrinsic motivation. Even an informed reader can have difficulty keeping in mind what a particular study is investigating. It may be for this reason that rewards are
often equated with reinforcers and, overall, have come to be seen as harmful. It is hoped that the present meta-analysis has helped to clarify the issue.
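The group-design comparisons discussed in this section contrast rewarded subjects with nonrewarded controls on a measure such as free time on task. As a minimal sketch of how one such contrast can be expressed as a standardized effect size, the following computes a Hedges-type g (a mean difference scaled by the pooled standard deviation, with a small-sample correction). The function name, the argument names, and the numbers in the usage lines are hypothetical illustrations, not data or code from the studies reviewed, and the correction factor is the common approximation rather than necessarily the exact estimator used in the present analysis.

```python
import math


def hedges_g(mean_reward, sd_reward, n_reward, mean_control, sd_control, n_control):
    """Standardized mean difference (rewarded minus control) with small-sample correction.

    A negative value means the rewarded group scored lower than the control group.
    """
    df = n_reward + n_control - 2
    # Pooled within-group variance.
    pooled_var = ((n_reward - 1) * sd_reward**2 + (n_control - 1) * sd_control**2) / df
    d = (mean_reward - mean_control) / math.sqrt(pooled_var)
    # Common approximation to the small-sample bias correction.
    correction = 1 - 3 / (4 * df - 1)
    return d * correction


# Hypothetical free-time scores (seconds on task), invented for illustration only.
g = hedges_g(mean_reward=300, sd_reward=100, n_reward=20,
             mean_control=350, sd_control=100, n_control=20)
print(round(g, 3))
```

Under this sign convention, a negative g on the free-time measure would correspond to rewarded subjects spending less time on the task than controls once the reward is removed.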
Theoretical Implications
How do results from the present meta-analysis fit in with the various theories that have been formulated to account for the negative effects of rewards on intrinsic motivation? Advocates of cognitive evaluation theory (e.g., Deci & Ryan, 1985) would probably not have difficulty reconciling results from the free-time measure of intrinsic motivation. According to cognitive evaluation theory, competence and self-determination underlie intrinsic motivation. Rewards can facilitate or hinder competence and self-determination depending on whether they are perceived as informational, controlling, or amotivational. From this perspective, results from the meta-analysis would suggest that verbal rewards increase a person’s intrinsic motivation because of their informational value. Verbal praise would be seen to lead an individual to feel competent in performing a task; hence, intrinsic motivation would increase. Because the cognitive evaluation process is said to take place while the rewarded activity is occurring, unexpected rewards would not alter a person’s intrinsic motivation. On the other hand, rewards offered to people for participating in a task, in spite of how well they perform, would be perceived as controlling and would decrease intrinsic motivation.
The problem for cognitive evaluation theory arises when one considers results from the attitude measure of intrinsic motivation. Deci and Ryan (1985) suggest that interest, enjoyment, and satisfaction are central emotions that accompany intrinsic motivation. A person’s experience of an activity is a focal point of cognitive evaluation theory. In other words, cognitive evaluation theory depends on an internal attitude change that is later expressed behaviorally as time on task. Results from the present meta-analysis indicate that reward does not negatively affect attitude. Individuals who receive verbal praise report greater interest than nonrewarded people.
Tangible rewards produce no change in attitude when they are given for doing, completing, or solving a task; a positive effect is evident when rewards are contingent on a specified level of performance. One way of mitigating the findings for cognitive evaluation theory is to question the reliability of the attitude measure. In many studies, the attitude measure was obtained from a single-item Likert scale. An additional problem is that the questions designed to assess attitude toward the task may have been unable to separate subjects’ liking of the reward from their liking of the task. If the attitude measures are unreliable, they will fail to reflect true differences between rewarded and nonrewarded groups. This may be one way to handle the puzzling results; however, it also suggests that there has been no test of the major mediator proposed by the theory.
The problem of operationalizing the construct of intrinsic motivation was recently addressed in a meta-analysis by Wiersma (1992; see Note 9). Results from Wiersma’s study depended on whether intrinsic motivation was operationalized as a free-time measure or as a measure of task performance during the rewarded period. Free-time measures showed a decline in intrinsic motivation; performance measures showed an increase. As noted, in the present analyses, results from the attitude measure do not coincide with the free-time measure. Additionally, measures of intrinsic motivation as performance during free time or as willingness to volunteer for future studies do not clarify the issue of operationalization of intrinsic motivation. Given the lack of covariation among the measures, it seems appropriate to devote further research to clarifying the concept of intrinsic motivation and to developing suitable measures. A different solution is offered by Rigby, Deci, Patrick, and Ryan (1992), who suggest that attention be directed toward the concept of self-determination rather than a pursuit of the intrinsic/extrinsic dichotomy. Others concur but suggest that researchers should focus on goal definitions (Sansone & Morgan, 1992). A final alternative would be to agree that constructs such as self-determination, goal definition, and intrinsic motivation are scientifically unclear and that it would be more appropriate to deal with the effects of reward and reinforcement on behavior (e.g., Bandura, 1977, 1986; Dickinson, 1989). Such a course of action would mean abandoning cognitive evaluation theory.
Another theoretical explanation that has been proposed to account for the effects of rewards on intrinsic motivation is the overjustification effect (Lepper, Greene, & Nisbett, 1973). The view is that people’s perceptions about the causes of their behavior influence future motivation.
Rewards lead to a decrease in intrinsic motivation when people’s perceptions shift from accounting for their behavior as self-initiated to accounting for it in terms of external reward. Because the present analysis did not evaluate subjects’ perceptions about the causes of their behavior, it is impossible to determine whether overjustification explains the results. Further research that measures subjects’ attributions to internal and external factors is warranted.
Finally, how would the findings of the meta-analysis be interpreted from a behavioral perspective? The results from single-subject designs indicate that reinforcement does not produce decrements in intrinsic motivation. This finding is compatible with a behavioral view. That is, behaviorists maintain that behavior returns to baseline after reinforcement is withdrawn. If the rewards used in the group design studies are reinforcers, one would expect behavior to eventually return to baseline. Research designed to investigate the effects of reward on intrinsic motivation has typically measured time on task for a brief 8- to 10-minute period, immediately following the removal of reward. Thus, if verbal praise were a reinforcer, one might interpret the positive effect as a carryover of the reinforcement procedure. Another interpretation is that the positive effect is the result of an extinction burst. That is, when reinforcement is first withdrawn, the immediate, short-term effect is that rate
of response increases. After a period of time, behavior would return to baseline. In terms of the negative effect of expected, noncontingent, tangible reward, some writers (e.g., Dickinson, 1989; Flora, 1990) have suggested that such a reward procedure does not represent a reinforcement contingency. The promise of a reward is seen by behaviorists as a discriminative stimulus (SD), and the negative effect is understood as the result of a bribe. A difficulty with this interpretation is that it does not account for findings from other conditions where promise of reward does not produce a negative effect. Further research is necessary to determine when and under what conditions promises of rewards function as bribes. Our data suggest that promises linked to noncontingent reward may function as bribes rather than as positive incentives.
Practical Implications
The present findings suggest that verbal praise and positive feedback enhance people’s intrinsic interest. This is an important finding. Most social interaction in business, education, and clinical settings involves verbal feedback from managers, teachers, and therapists. When praise and other forms of positive feedback are given and later removed, people continue to show intrinsic interest in their work. In contrast to recent claims made by Kohn (1993, p. 55), verbal praise is an extrinsic motivator that positively alters attitudes and behavior.
Rewards can have a negative impact on intrinsic motivation when they are offered to people for engaging in a task without consideration of any standard of performance. In a classroom, this might occur if a teacher promised students tangible rewards simply for doing an activity. For example, a teacher who promises stars or other awards to students for spending time doing math problems may undermine intrinsic motivation. In such a case, one could expect rewarded individuals to enjoy the task as much as those who are not offered an incentive, but they may spend less time on the activity in a study period when the reward is no longer forthcoming. According to our results, this would not occur if the teacher used the same rewards but made them contingent on successful completion of the problems.
Overall, the present review suggests that teachers have no reason to resist implementing incentive systems in the classroom. This conclusion is based on our findings, which show that verbal praise enhances intrinsic motivation and that other rewards and reinforcement leave intrinsic motivation largely unaffected. A small negative effect occurs when tangible rewards are promised without regard to a standard of performance. Under this circumstance, the promise of reward may act as a bribe.
Importantly, on a practical level, the implication is that reward offered in educational and other settings should be delivered contingent on performance.
Notes
1. Although there was an overall positive effect of tangible reward on intrinsic motivation, Rosenfield et al. (1980) also found that rewards that did not indicate ability led to less intrinsic interest.
2. In addition to studies reported in English, five relevant Japanese experiments were identified by the CD-ROM search. The information in the abstracts was not adequate to code the findings. Therefore, these studies are not included in the meta-analysis.
3. Boggiano and Ruble (1979) reported that 147 children participated in the study. There were two reward conditions (task contingent, performance contingent) and a nonrewarded control group. The contrast for the control versus task-contingent reward groups on the free-time measure is reported as t(130) = 2.0, p < .05; the contrast for the control versus performance-contingent reward groups is reported as t(130) = 1.16, n.s.
4. A copy of the coding form is available on request from the first author.
5. A list of the experiments included in each interaction is available on request from the first author.
6. Further analyses which include studies that index effect size as 0.00 are available in Cameron (1992).
7. The present review does not assess cultural differences in the impact of reward on intrinsic motivation. However, it is interesting to note that, although the study from India (Tripathi & Agarwal, 1985) shows an extreme positive value for the effect of verbal praise on the free-time measure, the direction of the result is consistent with the North American studies.
8. A few researchers have assessed the effects of expected tangible rewards on intrinsic motivation relative to unexpected tangible rewards (e.g., Enzle & Ross, 1978; Fazio, 1981; Lepper & Greene, 1975). Other researchers have conducted studies comparing expected noncontingent reward groups to expected contingent reward groups (e.g., Farr, 1976; Phillips & Lord, 1980; Pinder, 1976). Such studies concern direct comparisons between the two types of reward expectancies (expected versus unexpected) and the two types of reward contingencies (noncontingent versus contingent) without reference to a nonrewarded control group. Results from meta-analyses conducted on these comparisons and a list of studies included in such analyses can be obtained in Cameron (1992). One significant effect emerged from these analyses: subjects who received an expected tangible reward showed less intrinsic motivation on the free-time measure than subjects who received an unexpected tangible reward. The average effect size and confidence interval for this comparison was –0.26 (–0.45, –0.06).
9. Wiersma (1992) reported results of a meta-analysis of 23 experiments on reward and intrinsic motivation. These studies make up a subset of those analyzed in the present article. Effect sizes from Wiersma’s study were not always based on a comparison of a reward condition to a no-reward condition. This makes it impossible to directly compare our findings with those of Wiersma.
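Note 8 reports an average effect size of –0.26 with a confidence interval of (–0.45, –0.06). As a rough check on what such an interval implies, the sketch below backs out the standard error and a z statistic. It assumes a symmetric 95% interval based on the normal distribution; the confidence level and the helper name `implied_se` are assumptions made for illustration and are not stated in the note.

```python
def implied_se(lower, upper, z_crit=1.96):
    """Standard error implied by a symmetric normal-theory confidence interval."""
    return (upper - lower) / (2 * z_crit)


mean_effect = -0.26           # average effect size reported in Note 8
lower, upper = -0.45, -0.06   # interval reported in Note 8

se = implied_se(lower, upper)   # assumes a 95% interval (z_crit = 1.96)
z = mean_effect / se
excludes_zero = not (lower <= 0 <= upper)

print(f"implied SE = {se:.3f}, z = {z:.2f}, interval excludes zero: {excludes_zero}")
```

Because the interval lies entirely below zero, the calculation is consistent with the note's description of the expected-versus-unexpected comparison as a significant effect.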
References
Amabile, T. M., Hennessey, B. A., & Grossman, B. S. (1986). Social influences on creativity: The effects of contracted-for reward. Journal of Personality and Social Psychology, 50, 14–23.
Bandura, A. (1977). Social learning theory. Englewood Cliffs, NJ: Prentice-Hall.
Bandura, A. (1986). Social foundations of thought and action: A social cognitive theory. Englewood Cliffs, NJ: Prentice-Hall.
Bates, J. A. (1979). Extrinsic reward and intrinsic motivation: A review with implications for the classroom. Review of Educational Research, 49, 557–576.
Bem, D. J. (1972). Self-perception theory. In L. Berkowitz (Ed.), Advances in Experimental Social Psychology (Vol. 6, pp. 1–62). New York: Academic.
Boggiano, A. K., & Ruble, D. N. (1979). Competence and the overjustification effect: A developmental study. Journal of Personality and Social Psychology, 37, 1462–1468.
Brennan, T. P., & Glover, J. A. (1980). An examination of the effect of extrinsic reinforcers on intrinsically motivated behavior: Experimental and theoretical. Social Behavior and Personality, 8, 27–32.
Butler, R. (1987). Task-involving and ego-involving properties of evaluation: Effects of different feedback conditions on motivational perceptions, interest, and performance. Journal of Educational Psychology, 79, 474–482.
Cameron, J. (1992). Intrinsic motivation revisited. Unpublished doctoral dissertation, University of Alberta, Canada.
Cooper, H. M. (1989). Integrating research: A guide for literature reviews (2nd ed.). Beverly Hills: Sage.
Danner, F. W., & Lonkey, E. (1981). A cognitive developmental approach to the effects of rewards on intrinsic motivation. Child Development, 52, 1043–1052.
Deci, E. L. (1971). Effects of externally mediated rewards on intrinsic motivation. Journal of Personality and Social Psychology, 18, 105–115.
Deci, E. L. (1972a). Intrinsic motivation, extrinsic reinforcement, and inequity. Journal of Personality and Social Psychology, 22, 113–120.
Deci, E. L. (1972b). The effects of contingent and noncontingent rewards and controls on intrinsic motivation. Organizational Behavior and Human Performance, 8, 217–229.
Deci, E. L. (1975). Intrinsic Motivation. New York: Plenum.
Deci, E. L., & Ryan, R. M. (1985). Intrinsic Motivation and Self-Determination in Human Behavior. New York: Plenum.
Deci, E. L., & Ryan, R. M. (1987). The support of autonomy and the control of behavior. Journal of Personality and Social Psychology, 53, 1024–1037.
DeLoach, L. L., Griffith, K., & LaBarba, R. C. (1983). The relationship of group context and intelligence to the overjustification effect. Bulletin of the Psychonomic Society, 21, 291–293.
Dickinson, A. M. (1989). The detrimental effects of extrinsic reinforcement on “intrinsic motivation.” The Behavior Analyst, 12, 1–15.
Enzle, M. E., & Ross, J. M. (1978). Increasing and decreasing intrinsic interest with contingent rewards. Journal of Experimental Social Psychology, 14, 588–597.
Fabes, R. A. (1987). Effects of reward contexts on young children’s task interest. Journal of Psychology, 121, 5–19.
Farr, J. L. (1976). Task characteristics, reward contingency and intrinsic motivation. Organizational Behavior and Human Performance, 16, 294–307.
Fazio, R. H. (1981). On the self-perception explanation of the overjustification effect: The role of salience and initial attitude. Journal of Experimental Social Psychology, 17, 417–426.
Feingold, B. D., & Mahoney, M. J. (1975). Reinforcement effects on intrinsic interest: Undermining the overjustification hypothesis. Behavior Therapy, 6, 357–377.
Flora, S. R. (1990). Undermining intrinsic interest from the standpoint of a behaviorist. The Psychological Record, 40, 323–346.
Glass, G. V., McGaw, B., & Smith, M. L. (1981). Meta-Analysis in Social Research. Beverly Hills: Sage.
Greene, D., & Lepper, M. R. (1974). Effects of extrinsic rewards on children’s subsequent intrinsic interest. Child Development, 45, 1141–1145.
Greene, D., Sternberg, B., & Lepper, M. R. (1976). Overjustification in a token economy. Journal of Personality and Social Psychology, 34, 1219–1234.
Guzzo, R. A. (1979). Types of rewards, cognitions and work motivation. Academy of Management Journal, 22, 75–86.
Harackiewicz, J. M., Manderlink, G., & Sansone, C. (1984). Rewarding pinball wizardry: Effects of evaluation and cue value on intrinsic interest. Journal of Personality and Social Psychology, 47, 287–300.
Hedges, L. V. (1981). Distribution theory for Glass’s estimator of effect size and related estimators. Journal of Educational Statistics, 6, 107–128.
Hedges, L. V. (1987). How hard is hard science, how soft is soft science? The empirical cumulativeness of research. American Psychologist, 42, 443–455.
Hedges, L., & Becker, B. J. (1986). Statistical methods in the meta-analysis of research on gender differences. In J. S. Hyde & M. C. Linn (Eds.), The psychology of gender: Advances through meta-analysis (pp. 14–50). Baltimore: Johns Hopkins University Press.
Hedges, L., & Olkin, I. (1985). Statistical Methods for Meta-Analysis. Orlando: Academic.
Hopkins, B. L., & Mawhinney, T. C. (1992). Pay for performance: History, controversy, and evidence. New York: Haworth.
Karniol, R., & Ross, M. (1977). The effect of performance relevant and performance irrelevant rewards on children’s intrinsic motivation. Child Development, 48, 482–487.
Kelley, H. H. (1967). Attribution theory in social psychology. In D. Levine (Ed.), Nebraska symposium on motivation (Vol. 15, pp. 192–238). Lincoln: University of Nebraska Press.
Kohn, A. (1993). Why incentive plans cannot work. Harvard Business Review, 71(5), 54–63.
Lepper, M. R. (1981). Intrinsic and extrinsic motivation in children: Detrimental effects of superfluous social controls. In W. A. Collins (Ed.), Aspects of the development of competence: The Minnesota symposia on child psychology (Vol. 14, pp. 155–214). Hillsdale, NJ: Erlbaum.
Lepper, M. R., & Greene, D. (1975). Turning play into work: Effects of adult surveillance and extrinsic rewards on children’s intrinsic motivation. Journal of Personality and Social Psychology, 31, 479–486.
Lepper, M. R., Greene, D., & Nisbett, R. E. (1973). Undermining children’s intrinsic interest with extrinsic reward: A test of the “overjustification” hypothesis. Journal of Personality and Social Psychology, 28, 129–137.
Levine, F. M., & Fasnacht, G. (1974). Token rewards may lead to token learning. American Psychologist, 29, 817–820.
Light, R. J., & Pillemer, D. B. (1984). Summing Up: The Science of Reviewing Research. Cambridge, MA: Harvard University Press.
Mawhinney, T. C. (1990). Decreasing intrinsic “motivation” with extrinsic rewards: Easier said than done. Journal of Organizational Behavior Management, 11, 175–191.
McCullers, J. C. (1978). Issues in learning and motivation. In M. R. Lepper & D. Greene (Eds.), The Hidden Costs of Reward: New Perspectives on the Psychology of Human Motivation (pp. 5–18). Hillsdale, NJ: Erlbaum.
McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111, 361–365.
Morgan, M. (1981). The overjustification effect: A developmental test of self-perception interpretations. Journal of Personality and Social Psychology, 40, 809–821.
Morgan, M. (1983). Decrements in intrinsic interest among rewarded and observer subjects. Child Development, 54, 636–644.
Morgan, M. (1984). Reward-induced decrements and increments in intrinsic motivation. Review of Educational Research, 54, 5–30.
Mynatt, C., Oakley, D., Arkkelin, D., Piccione, A., Margolis, R., & Arkkelin, J. (1978). An examination of overjustification under conditions of extended observation and multiple reinforcement: Overjustification or boredom? Cognitive Therapy and Research, 2, 171–177.
Orlick, T. D., & Mosher, R. (1978). Extrinsic awards and participant motivation in a sport related task. International Journal of Sport Psychology, 9, 27–39.
Phillips, J. S., & Lord, R. G. (1980). Determinants of intrinsic motivation: Locus of control and competence information as components of Deci’s cognitive evaluation theory. Journal of Applied Psychology, 65, 211–218.
Pinder, C. C. (1976). Additivity versus nonadditivity of intrinsic and extrinsic incentives: Implications for work, motivation, performance, and attitudes. Journal of Applied Psychology, 61, 693–700.
Pittman, T. S., Emery, J., & Boggiano, A. K. (1982). Intrinsic and extrinsic motivational orientations: Reward-induced changes in preference for complexity. Journal of Personality and Social Psychology, 42, 789–797.
Pritchard, R. D., Campbell, K. M., & Campbell, D. J. (1977). Effects of extrinsic financial rewards on intrinsic motivation. Journal of Applied Psychology, 62, 9–15.
Rigby, C. S., Deci, E. L., Patrick, B. C., & Ryan, R. M. (1992). Beyond the intrinsic–extrinsic dichotomy: Self-determination in motivation and learning. Motivation and Emotion, 16, 165–185.
Rosenfield, D., Folger, R., & Adelman, H. F. (1980). When rewards reflect competence: A qualification of the overjustification effect. Journal of Personality and Social Psychology, 39, 368–376.
Rosenthal, R. (1966). Experimenter effects in behavioral research. New York: Appleton-Century-Crofts.
Rummel, A., & Feinberg, R. (1988). Cognitive evaluation theory: A meta-analytic review of the literature. Social Behavior and Personality, 16, 147–164.
Ryan, R. M., Mims, B., & Koestner, R. (1983). Relation of reward contingency and interpersonal context to intrinsic motivation: A review and test using cognitive evaluation theory. Journal of Personality and Social Psychology, 45, 736–750.
Sansone, C., & Morgan, C. (1992). Intrinsic motivation and education: Competence in context. Motivation and Emotion, 16, 249–270.
Schwartz, B. (1990). The creation and destruction of value. American Psychologist, 45, 7–15.
Schwarzer, R. (1991). Meta: Programs for secondary data analysis, MS-DOS Version 5.0 [Computer program]. Dubuque, IA: Wm. C. Brown.
Scott, W. E., Jr. (1975). The effects of extrinsic rewards on “intrinsic motivation.” Organizational Behavior and Human Performance, 25, 311–335.
Sutherland, S. (1993). Impoverished minds. Nature, 364, 767.
Tripathi, K. N., & Agarwal, A. (1985). Effects of verbal and tangible rewards on intrinsic motivation in males and females. Psychological Studies, 30, 77–84.
Tukey, J. W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.
Vallerand, R. J. (1983). The effect of differential amounts of positive verbal feedback on the intrinsic motivation of male hockey players. Journal of Sport Psychology, 5, 100–107.
Vasta, R., Andrews, D. E., McLaughlin, A. M., Stirpe, L. A., & Comfort, C. (1978). Reinforcement effects on intrinsic interest: A classroom analog. Journal of School Psychology, 16, 161–168.
Vasta, R., & Stirpe, L. A. (1979). Reinforcement effects on three measures of children’s interest in math. Behavior Modification, 3, 223–244.
Wiersma, U. J. (1992). The effects of extrinsic rewards in intrinsic motivation: A meta-analysis. Journal of Occupational and Organizational Psychology, 65, 101–114.
Workman, E. A., & Williams, R. L. (1980). Effects of extrinsic rewards on intrinsic motivation in the classroom. Journal of School Psychology, 18, 141–147.
Zimbardo, P. G. (1988). Psychology and life (11th ed.). Glenview, IL: Scott, Foresman.
Appendix A: Studies Included in the Meta-Analysis of Group Designs
Amabile, T. M., Hennessey, B. A., & Grossman, B. S. (1986). Social influences on creativity: The effects of contracted-for reward. Journal of Personality and Social Psychology, 50, 14–23.
Anderson, R., Manoogian, S. T., & Reznick, J. S. (1976). The undermining and enhancing of intrinsic motivation in preschool children. Journal of Personality and Social Psychology, 34, 915–922.
Anderson, S., & Rodin, J. (1989). Is bad news always bad? Cue and feedback effects on intrinsic motivation. Journal of Applied Social Psychology, 19, 449–467.
Arkes, H. R. (1979). Competence and the overjustification effect. Motivation and Emotion, 3, 143–150.
Arnold, H. J. (1976). Effects of performance feedback and extrinsic reward upon high intrinsic motivation. Organizational Behavior and Human Performance, 17, 275–288.
Arnold, H. J. (1985). Task performance, perceived competence, and attributed causes of performance as determinants of intrinsic motivation. Academy of Management Journal, 28, 876–888.
Blanck, P. D., Reis, H. T., & Jackson, L. (1984). The effects of verbal reinforcement of intrinsic motivation for sex-linked tasks. Sex Roles, 10, 369–386.
Boal, K. B., & Cummings, L. L. (1981). Cognitive evaluation theory: An experimental test of processes and outcomes. Organizational Behavior and Human Performance, 28, 289–310.
Boggiano, A. K., Harackiewicz, J. M., Besette, J. M., & Main, D. S. (1985). Increasing children’s interest through performance contingent reward. Social Cognition, 3, 400–411.
Boggiano, A. K., & Hertel, P. T. (1983). Bonuses and bribes: Mood effects in memory. Social Cognition, 2, 49–61.
Boggiano, A. K., Ruble, D. N., & Pittman, T. S. (1982). The mastery hypothesis and the overjustification effect. Social Cognition, 1, 38–49.
Brennan, T. P., & Glover, J. A. (1980). An examination of the effect of extrinsic reinforcers on intrinsically motivated behavior: Experimental and theoretical. Social Behavior and Personality, 8, 27–32.
Brockner, J., & Vasta, R. (1981). Do causal attributions mediate the effects of extrinsic rewards on intrinsic interest? Journal of Research in Personality, 15, 201–209.
Butler, R. (1987). Task-involving and ego-involving properties of evaluation: Effects of different feedback conditions on motivational perceptions, interest, and performance. Journal of Educational Psychology, 79, 474–482.
Calder, B. J., & Staw, B. M. (1975). Self-perception of intrinsic and extrinsic motivation. Journal of Personality and Social Psychology, 31, 599–605.
Crino, M. D., & White, M. C. (1982). Feedback effects in intrinsic/extrinsic reward paradigms. Journal of Management, 8, 95–108.
Daniel, T. L., & Esser, J. K. (1980). Intrinsic motivation as influenced by rewards, task interest, and task structure. Journal of Applied Psychology, 65, 566–573.
Danner, F. W., & Lonkey, E. (1981). A cognitive developmental approach to the effects of rewards on intrinsic motivation. Child Development, 52, 1043–1052.
Deci, E. L. (1971). Effects of externally mediated rewards on intrinsic motivation. Journal of Personality and Social Psychology, 18, 105–115.
Deci, E. L. (1972a). Intrinsic motivation, extrinsic reinforcement, and inequity. Journal of Personality and Social Psychology, 22, 113–120.
Deci, E. L. (1972b). The effects of contingent and noncontingent rewards and controls on intrinsic motivation. Organizational Behavior and Human Performance, 8, 217–229.
DeLoach, L. L., Griffith, K., & LaBarba, R. C. (1983). The relationship of group context and intelligence to the overjustification effect. Bulletin of the Psychonomic Society, 21, 291–293.
Dollinger, S. J., & Thelen, M. H. (1978). Overjustification and children’s intrinsic motivation: Comparative effects of four rewards. Journal of Personality and Social Psychology, 36, 1259–1269.
Earn, B. M. (1982). Intrinsic motivation as a function of extrinsic financial rewards and subjects’ locus of control. Journal of Personality, 50, 360–373.
Fabes, R. A. (1987). Effects of reward contexts on young children’s task interest. Journal of Psychology, 121, 5–19.
Fabes, R. A., Eisenberg, N., Fultz, J., & Miller, P. (1988). Reward, affect and young children’s motivational orientation. Motivation and Emotion, 12, 155–169.
Freedman, S. M., & Phillips, J. S. (1985). The effects of situational performance constraints on intrinsic motivation and satisfaction: The role of perceived competence and self-determination. Organizational Behavior and Human Decision Processes, 35, 397–416.
Greene, D., & Lepper, M. R. (1974). Effects of extrinsic rewards on children’s subsequent intrinsic interest. Child Development, 45, 1141–1145.
Griffith, K. M., DeLoach, L. L., & LaBarba, R. C. (1984). The effects of rewarder familiarity and differential reward preference in intrinsic motivation. Bulletin of the Psychonomic Society, 22, 313–316.
Hamner, W. C., & Foster, L. W. (1975). Are intrinsic and extrinsic rewards additive: A test of Deci’s cognitive evaluation theory of task motivation. Organizational Behavior and Human Performance, 14, 398–415.
Harackiewicz, J. M. (1979). The effects of reward contingency and performance feedback on intrinsic motivation. Journal of Personality and Social Psychology, 37, 1352–1363.
Harackiewicz, J. M., Abrahams, S., & Wageman, R. (1987). Performance evaluation and intrinsic motivation: The effects of evaluative focus, rewards, and achievement orientation. Journal of Personality and Social Psychology, 53, 1015–1023.
Harackiewicz, J. M., & Manderlink, G. (1984). A process analysis of the effects of performance-contingent rewards on intrinsic motivation. Journal of Experimental Social Psychology, 20, 531–551.
Harackiewicz, J. M., Manderlink, G., & Sansone, C. (1984). Rewarding pinball wizardry: Effects of evaluation and cue value on intrinsic interest. Journal of Personality and Social Psychology, 47, 287–300.
Hom, H. L. (1987). A methodological note: Time of participation effects on intrinsic motivation. Personality and Social Psychology Bulletin, 13, 210–215.
Karniol, R., & Ross, M. (1977). The effect of performance relevant and performance irrelevant rewards on children’s intrinsic motivation. Child Development, 48, 482–487.
Koestner, R., Zuckerman, M., & Koestner, J. (1987). Praise, involvement, and intrinsic motivation. Journal of Personality and Social Psychology, 53, 383–390.
Kruglanski, A. W., Alon, S., & Lewis, T. (1972). Retrospective misattribution and task enjoyment. Journal of Experimental Social Psychology, 8, 493–501.
Kruglanski, A. W., Friedman, I., & Zeevi, G. (1971). The effects of extrinsic incentive on some qualitative aspects of task performance. Journal of Personality, 39, 606–617.
Kruglanski, A. W., Riter, A., Amitai, A., Margolin, B. S., Shabatai, L., & Zaksh, D. (1975). Can money enhance intrinsic motivation?: A test of the content-consequence hypothesis. Journal of Personality and Social Psychology, 31, 744–750.
Lepper, M. R., Greene, D., & Nisbett, R. E. (1973). Undermining children’s intrinsic interest with extrinsic reward: A test of the “overjustification” hypothesis. Journal of Personality and Social Psychology, 28, 129–137.
Loveland, K. K., & Olley, J. G. (1979). The effect of external reward on interest and quality of task performance in children of high and low intrinsic motivation. Child Development, 50, 1207–1210.
Luyten, H., & Lens, W. (1981). The effect of earlier experience and reward contingencies on intrinsic motivation. Motivation and Emotion, 5, 25–36.
McGraw, K. O., & McCullers, J. C. (1979). Evidence of a detrimental effect of extrinsic incentives on breaking a mental set. Journal of Experimental Social Psychology, 15, 285–294.
McLoyd, V. C. (1979). The effects of extrinsic rewards of differential value on high and low intrinsic interest. Child Development, 50, 1010–1019.
Morgan, M. (1981). The overjustification effect: A developmental test of self-perception interpretations. Journal of Personality and Social Psychology, 40, 809–821.
Morgan, M. (1983). Decrements in intrinsic interest among rewarded and observer subjects. Child Development, 54, 636–644.
Mynatt, C., Oakley, D., Piccione, A., Margolis, R., & Arkkelin, J. (1978). An examination of overjustification under conditions of extended observation and multiple reinforcement: Overjustification or boredom? Cognitive Therapy and Research, 2, 171–177.
Ogilvie, L., & Prior, M. (1982). The overjustification effect in retarded children: durability and generalizability. Australia and New Zealand Journal of Developmental Disabilities, 8, 213–218.
Orlick, T. D., & Mosher, R. (1978). Extrinsic awards and participant motivation in a sport related task. International Journal of Sport Psychology, 9, 27–39.
Pallak, S. R., Costomiris, S., Sroka, S., & Pittman, T. S. (1982). School experience, reward characteristics, and intrinsic motivation. Child Development, 53, 1382–1391.
Pittman, T. S., Cooper, E. E., & Smith, T. W. (1977). Attribution of causality and the overjustification effect. Personality and Social Psychology Bulletin, 3, 280–283.
Pittman, T. S., Davey, M. E., Alafat, K. A., Wetherill, K. V., & Kramer, N. A. (1980). Informational versus controlling verbal rewards. Personality and Social Psychology Bulletin, 6, 228–233.
Pittman, T. S., Emery, J., & Boggiano, A. K. (1982). Intrinsic and extrinsic motivational orientations: reward-induced changes in preference for complexity. Journal of Personality and Social Psychology, 42, 789–797.
Porac, J. F., & Meindl, J. (1982). Undermining overjustification: Inducing intrinsic and extrinsic task representations. Organizational Behavior and Human Performance, 29, 208–226.
Pretty, G. H., & Seligman, C. (1984). Affect and the overjustification effect. Journal of Personality and Social Psychology, 46, 1241–1253.
Reiss, S., & Sushinsky, L. W. (1975). Overjustification, competing responses, and the acquisition of intrinsic interest. Journal of Personality and Social Psychology, 31, 1116–1125.
Rosenfield, D., Folger, R., & Adelman, H. F. (1980). When rewards reflect competence: A qualification of the overjustification effect. Journal of Personality and Social Psychology, 39, 368–376.
Ross, M. (1975). Salience of reward and intrinsic motivation. Journal of Personality and Social Psychology, 32, 245–254.
Ross, M., Karniol, R., & Rothstein, M. (1976). Reward contingency and intrinsic motivation in children: a test of the delay of gratification hypothesis. Journal of Personality and Social Psychology, 33, 442–447.
Ryan, R. M., Mims, B., & Koestner, R. (1983). Relation of reward contingency and interpersonal context to intrinsic motivation: A review and test using cognitive evaluation theory. Journal of Personality and Social Psychology, 45, 736–750.
Salancik, G. R. (1975). Interaction effects of performance and money on self-perception of intrinsic motivation. Organizational Behavior and Human Performance, 13, 339–351.
Sansone, C. (1986). A question of competence: the effects of competence and task feedback on intrinsic interest. Journal of Personality and Social Psychology, 51, 918–931.
Sansone, C. (1989). Competence feedback, task feedback, and intrinsic interest: An examination of process and context. Journal of Experimental Social Psychology, 25, 343–361.
Sansone, C., Sachau, D. A., & Weir, C. (1989). Effects of instruction on intrinsic interest: The importance of context. Journal of Personality and Social Psychology, 57, 819–829.
Sarafino, E. P. (1984). Intrinsic motivation and delay of gratification in preschoolers: the variables of reward salience and length of expected delay. British Journal of Developmental Psychology, 2, 149–156.
Shanab, M. E., Peterson, D., Dargahi, S., & Deroian, P. (1981). The effects of positive and negative verbal feedback on the intrinsic motivation of male and female subjects. The Journal of Social Psychology, 115, 195–205.
Shapira, Z. (1976). Expectancy determinants of intrinsically motivated behavior. Journal of Personality and Social Psychology, 34, 1235–1244.
Smith, T. W., & Pittman, T. S. (1978). Reward, distraction, and the overjustification effect. Journal of Personality and Social Psychology, 36, 565–573.
Staw, B. M., Calder, B. J., Hess, R. K., & Sandelands, L. E. (1980). Intrinsic motivation and norms about payment. Journal of Personality, 48, 1–14.
Swann, W. B., Jr., & Pittman, T. S. (1977). Moderating influence of verbal cues on intrinsic motivation. Child Development, 48, 1128–1132.
Taub, S. I., & Dollinger, S. J. (1975). Reward and purpose as incentives for children differing in locus of control expectancies. Journal of Personality, 43, 179–195.
Tripathi, K. N., & Agarwal, A. (1985). Effects of verbal and tangible rewards on intrinsic motivation in males and females. Psychological Studies, 30, 77–84.
Tripathi, K. N., & Agarwal, A. (1988). Effect of reward contingency on intrinsic motivation. The Journal of General Psychology, 115 (3), 241–246.
Vallerand, R. J. (1983). The effect of differential amounts of positive verbal feedback on the intrinsic motivation of male hockey players. Journal of Sport Psychology, 5, 100–107.
Vallerand, R. J., & Reid, G. (1984). On the causal effects of perceived competence on intrinsic motivation: A test of cognitive evaluation theory. Journal of Sport Psychology, 6, 94–102.
Weinberg, R. S., & Jackson, A. (1979). Competition and extrinsic rewards: Effect on intrinsic motivation and attribution. Research Quarterly, 50, 494–502.
Weiner, M. J. (1980). The effect of incentive and control over outcomes upon intrinsic motivation and performance. The Journal of Social Psychology, 112, 247–254.
Weiner, M. J., & Mander, A. M. (1978). The effects of reward and perception of competency upon intrinsic motivation. Motivation and Emotion, 2, 67–73.
Wicker, F. W., Brown, G., Wiehe, J. A., & Shim, W. Y. (1990). Moods, goals, and measures of intrinsic motivation. The Journal of Psychology, 124, 75–86.
Williams, B. W. (1980). Reinforcement, behavior constraint and the overjustification effect. Journal of Personality and Social Psychology, 39, 599–614.
Wimperis, B. R., & Farr, J. L. (1979). The effects of task content and reward contingency upon task performance and satisfaction. Journal of Applied Social Psychology, 9 (3), 229–249.
Zinser, O., Young, J. G., & King, P. E. (1982). The influence of verbal reward on intrinsic motivation in children. The Journal of General Psychology, 106, 85–91.
Studies Included in the Meta-Analysis of Single-Subject Designs
Davidson, P., & Bucher, B. (1978). Intrinsic interest and extrinsic reward: The effects of a continuing token program on continuing nonconstrained preference. Behavior Therapy, 9, 222–234.
Feingold, B. D., & Mahoney, M. J. (1975). Reinforcement effects on intrinsic interest: Undermining the overjustification hypothesis. Behavior Therapy, 6, 357–377.
Mawhinney, T. C., Dickinson, A. M., & Taylor, L. A. (1989). The use of concurrent schedules to evaluate the effects of extrinsic rewards on “intrinsic motivation.” Journal of Organizational Behavior Management, 10, 109–129.
Vasta, R., Andrews, D. E., McLaughlin, A. M., Stirpe, L. A., & Comfort, C. (1978). Reinforcement effects on intrinsic interest: A classroom analog. Journal of School Psychology, 16, 161–168.
Vasta, R., & Stirpe, L. A. (1979). Reinforcement effects on three measures of children’s interest in math. Behavior Modification, 3, 223–244.
Appendix B: Formulas for calculating effect size, g

1.
$$g = \frac{\bar{X}_E - \bar{X}_C}{S_p}$$
where
$\bar{X}_E$ = mean of experimental group
$\bar{X}_C$ = mean of control group
$S_p$ = pooled standard deviation

$$S_p^2 = \frac{(n_E - 1)S_E^2 + (n_C - 1)S_C^2}{n_E + n_C - 2}$$
where
$S_p^2$ = pooled variance
$S_E^2$ = variance of experimental group
$S_C^2$ = variance of control group
$n_E$ = sample size of experimental group
$n_C$ = sample size of control group

2.
$$g = t\sqrt{\frac{2}{n}}$$
for equal ns; n = sample size of each group

3.
$$g = t\sqrt{\frac{1}{n_E} + \frac{1}{n_C}}$$
for unequal ns

4.
$$g = \sqrt{F}\,\sqrt{\frac{n_E + n_C}{n_E n_C}}$$
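The Appendix B formulas can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the sample numbers are hypothetical and do not come from the chapter. The `sign` argument in `g_from_f` is also an added assumption, included because F carries no direction and the sign of g has to be taken from the reported means.

```python
import math


def pooled_sd(n_e, sd_e, n_c, sd_c):
    """Pooled standard deviation S_p (square root of the pooled variance)."""
    pooled_var = ((n_e - 1) * sd_e ** 2 + (n_c - 1) * sd_c ** 2) / (n_e + n_c - 2)
    return math.sqrt(pooled_var)


def g_from_means(mean_e, mean_c, n_e, sd_e, n_c, sd_c):
    """Formula 1: difference between group means divided by S_p."""
    return (mean_e - mean_c) / pooled_sd(n_e, sd_e, n_c, sd_c)


def g_from_t(t, n_e, n_c):
    """Formula 3 (unequal ns); reduces to formula 2, t * sqrt(2/n), when n_e == n_c."""
    return t * math.sqrt(1 / n_e + 1 / n_c)


def g_from_f(f, n_e, n_c, sign=1):
    """Formula 4: g from a two-group F ratio; `sign` (+1 or -1) is an assumption
    that restores the direction of the effect, which F alone does not carry."""
    return sign * math.sqrt(f) * math.sqrt((n_e + n_c) / (n_e * n_c))


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the functions.
    print(g_from_means(12.0, 10.0, 20, 4.0, 20, 4.0))
    print(g_from_t(2.0, 8, 8))
    print(g_from_f(4.0, 8, 8))
```

Because F equals t squared when only two groups are compared, `g_from_f` and `g_from_t` agree in magnitude for the same study, which gives a quick consistency check on formulas 3 and 4.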
JPSP
Child dev
Greene, Lepper (1974)
Ross (1975) Exp. 2
Child dev
Greene, Lepper (1974)
JPSP
JPSP
Lepper et al. (1973)
Ross (1975) Exp. 1
JPSP
Lepper et al. (1973)
Child dev
J. Exp. Soc Psych
Kruglanski et al. (1972)
JPSP
Org Beh & Hum Perf
Deci (1972b)
Ross (1975) Exp. 1
JPSP
Deci (1972a)
Greene, Lepper (1974)
JPSP
JPSP
J of Pers.
Kruglanski et al. (1971)
Deci (1972a)
J of Pers.
Kruglanski et al. (1971)
Deci (1972a)
JPSP
JPSP
Deci (1971) Exp. 2
JPSP
JPSP
Deci (1971) Exp. 1
Deci (1971) Exp. 3
JPSP
Deci (1971) Exp. 1
Deci (1971) Exp. 3
Journal
Author(s)
Children
Children
Children
Children
Children
Children
Children
Children
Children
Adults
Adults
Adults
Adults
15–16 yrs
15–16 yrs
Adults
Adults
Adults
Adults
Adults
Subjects
Drum
Playing drum
Playing drum
Drawing
Drawing
Drawing
Drawing
Drawing
5 games
Soma
Soma
Soma
Soma
Creativity & recall
Creativity & recall
Soma
Soma
Writing headlines
Soma
Soma
Task
T
T
T
T
T
T
T
T
T
T
T
T
V
T
T
V
V
T
T
T
Reward type
E
E
E
U
U
E
U
E
U
E
E
E
U
E
E
U
U
E
E
E
Expectancy
Not, TC
Not, TC
Not, TC
Not, TC
Not, TC
Not, NC
Cont, TC
Cont, TC
Not, TC
Not, TC
Cont, TC
Cont, TC
Cont, TC
Contingency
Free time
Free time
Free time
Free time
Free time
Free time
Free time
Free time
Attitude
Free time
Free time
Free time
Free time
Volunteer
Attitude
Attitude
Free time
Performance
Attitude
Free time
Dep. measure
52
20
20
13
13
15
18
18
36
24
32
32
48
16
16
12
12
4
12
12
N exp.
14
20
20
15
15
15
15
15
33
16
32
32
48
16
16
12
12
2
12
12
–0.81
+0.56
–0.54
+0.22
+0.06
–0.70
+0.57b
–0.72
–0.66
+0.08b
–0.10
+0.75
+0.29
–0.63
–0.69
0.00a
+0.82
–3.72
0.00a
–0.54
Effect size (g)ab
N control
A/O
A/O
A/O
A/O
A/O
A/O
B/A
B/A
A/O
A/O
A/O
A/O
A/O
A/O
A/O
B/A
B/A
Field study
B/A
B/A
Design
Appendix C: Characteristics of Studies Included in the Meta-Analysis
JPSP
Calder, Staw (1975)
Org Beh & Hum Perf
Org Beh & Hum Perf
JPSP
JPSP
Arnold (1976)
Arnold (1976)
Ross et al. (1976)
Ross et al. (1976)
Adults
Adults
Adults
Adults
Adults
Adults
Children
15–16-yr.olds
14–15-yr.olds
Children
Subjects
A/O
A/O
Multiple trials
Multiple trials
Children
Children
Adults
Adults
Children
Children
SS Repeated Children measures
A/O
A/O
A/O
B/A
Org Beh & Hum Perf
Hamner, Foster (1975)
A/O
JPSP
Org Beh & Hum Perf
Hamner, Foster (1975)
A/O
B/A
Org Beh & Hum Perf
Salancik (1975)
A/O
JPSP
Org Beh & Hum Perf
Salancik (1975)
A/O
Anderson et al. (1976) Anderson et al. (1976)
JPSP
Reiss, Sushinsky (1975)
A/O
JPSP
JPSP
Kruglanski et al. (1975) Exp. 2
A/O
Behavior Therapy
JPSP
Kruglanski et al. (1975) Exp. 1
A/O
Feingold, Mahoney (1975)
J of Pers
Taub, Dollinger (1975)
Design
Calder, Staw (1975)
Journal
Author(s)
Drawing
Drawing
Computer game
Computer game
Drawing
Drawing
Dot-to-dot connections
Puzzles
Puzzles
Scoring questions
Scoring questions
Train game
Train game
Listening to songs
2 tasks
2 tasks
Coding
Task
T
T
T
T
T
V
T
T
T
T
T
T
T
T
T
T
T
Reward type
E
E
E
E
E
U
E
E
E
E
E
E
E
E
E
E
E
Expectancy
Not, NC
Not, TC
Not, TC
Not, TC
Not, TC
Cont
Not, TC
Not, TC
Cont, TC
Not, NC
Cont, PC
Cont, PC
Not, TC
Cont, PC
Cont, PC
Cont, PC
Contingency
Free time
Free time
Volunteer
Attitude
Free time
Free time
# of connections
Volunteer
Attitude
Attitude
Attitude
Attitude
Free time
Free time
Attitude
Attitude
Attitude
Dep. measure
12
12
17
17
36
18
5
20
20
37
31
38
38
16
40
24
124
N exp.
12
12
36
36
46
46
–
20
20
30
30
39
39
16
40
24
124
N control
+0.44
–0.64
+0.02
0.00a
+0.04
+1.07
+0.34
+0.28
+0.22b
+0.19
–0.23
–0.01b
–0.12b
–0.84
+0.39
+1.15
0.00a
Effect size (g)ab
Child Dev
Child Dev
Child Dev
Child Dev
Per & Soc Psy Bull
Cog Ther & Res
Mot & Emotion
Mot & Emotion
Mot & Emotion
Mot & Emotion
Mot & Emotion
Mot & Emotion
Int J. of Sport Psy
Int J. of Sport Psy
Swann, Pittman (1977) Exp. 1
Swann, Pittman (1977) Exp. 2
Karniol, Ross (1977)
Karniol, Ross (1977)
Pittman et al. (1977)
Mynatt et al. (1978)
Weiner, Mander (1978)
Weiner, Mander (1978)
Weiner, Mander (1978)
Weiner, Mander (1978)
Weiner, Mander (1978)
Weiner, Mander (1978)
Orlick, Mosher (1978)
Orlick, Mosher (1978)
B/A
B/A
A/O
A/O
A/O
A/O
A/O
A/O
B/A mult. trials
A/O
A/O
A/O
A/O
A/O
A/O
Int J. of Sport Psy B/A
Child Dev
Swann, Pittman (1977) Exp. 1
A/O
Children
Children
Children
Adults
Adults
Adults
Adults
Adults
Adults
Children
Adults
Children
Children
Children
Children
Children
Adults
Stabilometer
Stabilometer
Stabilometer
Decoding cartoons
Decoding cartoons
Decoding cartoons
Decoding cartoons
Decoding cartoons
Decoding cartoons
Educ games
Gravitation
Slide show
Slide show
Drawing
Drawing
Drawings
Soma
V
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
U
U
E
E
E
E
E
E
E
E
E
E
E
E
E
E
E
Cont, TC
Cont, PC
Not, TC
Cont, PC
Not, TC
Cont, PC
Not, TC
Not, TC
Cont, PC
Cont, PC
Not, TC
Not, TC
Not, NC
Not, NC
Cont, PC
Free time
Free time
Free time
Performance
Performance
Attitude
Attitude
Free time
Free time
Free time
Attitude
Free time
Free time
Free time
Free time
Free time
Attitude
11
12
14
30
30
30
30
30
30
10
60
20
17
39
20
20
30
12
12
12
30
30
30
30
30
30
10
20
20
20
26
20
20
30
–0.22
–0.82
–0.34
–0.39b
–0.39b
0.00a
0.00a
–0.54
–0.34
+1.01
–0.20
+0.15
–0.04
–0.15b
–0.78b
–0.21b
+0.41b
Orlick, Mosher (1978)
JPSP
Shapira (1976)
SS Repeated Children measures
Behavior Therapy
J of School Psych
Mot & Emotion
Mot & Emotion
Child Dev
JPSP
JPSP
Child Dev
Child Dev
J. Applied Soc Psych
J. Applied Soc Psych
J. Applied Soc Psych
Davidson, Bucher (1978)
Vasta et al. (1978)
Arkes (1979)
Arkes (1979)
Loveland, Olley (1979)
Harackiewicz (1979)
Harackiewicz (1979)
McLoyd (1979)
McLoyd (1979)
Wimperis, Farr (1979)
Wimperis, Farr (1979)
Wimperis, Farr (1979)
Research Quarterly
A/O
Dollinger, Thelen (1978) JPSP
Weinberg, Jackson (1979)
A/O
JPSP
A/O
A/O
A/O
A/O
A/O
A/O
B/A
B/A
A/O
A/O
A/O
Adults
Adults
Adults
Adults
Children
Children
16-yr.-olds
16-yr.-olds
Children
Adults
Adults
SS Repeated Children measures
Children
Adults
Adults
Smith, Pittman (1978)
A/O
JPSP
Subjects
Smith, Pittman (1978)
Design
Journal
Author(s)
Stabilometer
Erector sets
Erector sets
Erector sets
Reading books
Reading books
Hidden puzzles
Hidden puzzles
Drawing
Soma
Soma
Coloring
Playing with clown
Mazes
Labyrinth
Labyrinth
Task
T
T
T
T
T
T
T
V
T
T
T
T&V
T
T&V
T
T
Reward type
E
E
E
E
E
E
E
U
E
E
E
U
E
E
E
E
Expectancy
Cont, PC
Both
Cont, PC
Not, TC
Cont, TC
Cont, TC
Not, TC
Not, TC
Cont, TC
Cont, TC
Not
Both
Cont, TC
Cont, TC
Contingency
Attitude
Volunteer
Attitude
Attitude
Performance
Free time
Attitude
Attitude
Free time
Attitude
Free time
Time
# of responses
Attitude
Performance
Attitude
Dep. measure
40
32
16
16
36
36
31
31
12
32
32
6
3
48
66
66
N exp.
40
16
16
16
18
18
31
31
12
32
32
–
–
12
33
33
N control
0.00a
+0.69
+1.36
+0.56
–0.40
–0.22
–0.38
+0.59
0.00a
+0.03
–0.16
+0.74
+1.83
0.00a
0.00a
–0.10b
Effect size (g)ab
Behavior Mod
Soc Beh & Pers
J of Soc Psych
J of Soc Psych
J of Soc Psych
JPSP
JPSP
JPSP
JPSP
JPSP
JPSP
J of Pers
J of Pers
JPSP
JPSP
J Applied Psych A/O
J Applied Psych A/O
J Applied Psych A/O
JPSP
JPSP
Vasta, Stirpe (1979)
Brennan, Glover (1980)
Weiner (1980)
Weiner (1980)
Weiner (1980)
Rosenfield et al. (1980)
Rosenfield et al. (1980)
Rosenfield et al. (1980)
Rosenfield et al. (1980)
Rosenfield et al. (1980)
Rosenfield et al. (1980)
Staw et al. (1980)
Staw et al. (1980)
Williams (1980)
Williams (1980)
Daniel, Esser (1980)
Daniel, Esser (1980)
Daniel, Esser (1980)
Morgan (1981) Exp. 1
Morgan (1981) Exp. 1
Adults
Adults
Children
Children
Adults
Adults
Adults
Children
Children
Adults
Adults
Adults
Adults
Adults
Adults
Adults
Adults
Adults
Adults
Adults
Adults
Puzzles
Puzzles
Puzzles
Puzzles
Puzzles
4 games
4 games
Puzzles
Puzzles
Ad Lib
Ad Lib
Ad Lib
Ad Lib
Ad Lib
Ad Lib
Anagrams
Anagrams
Anagrams
Soma
Math problems
Water jar problem
Water jar problem
T
T
T
T
T
T
T
T
T
T
T
T
V
V
V
T
T
T
T
T
T
T
E
E
E
E
E
E
E
E
E
E
E
E
E
E
E
E
E
E
E
E
E
E
Not, TC
Not, TC
Cont, TC
Cont, TC
Cont, TC
Not, TC
Not, TC
Not, TC
Cont, PC
Cont, PC
Cont, PC
Cont, PC
Cont, PC
Cont, PC
Not, NC
Not
Cont, PC
Cont, PC
Attitude
Free time
Volunteer
Attitude
Free time
Attitude
Free time
Volunteer
Attitude
Volunteer
Attitude
Free time
Volunteer
Attitude
Free time
Performance
Volunteer
Attitude
Free time
Time
Volunteer
Attitude
27
27
32
32
32
24
24
47
47
30
30
30
30
30
30
24
24
24
19
4
18
20
27
27
32
32
32
24
24
46
46
27
27
27
59
59
59
24
24
24
39
–
17
20
0.00a
–0.31
–0.98
+0.08
–0.19b
–0.52
0.00a
–0.32
+0.34
+0.19
+0.27
+2.80
+0.65
–0.76
–0.64
+0.48
+0.35
0.00a
+1.06
–0.46
–0.43b
–0.04
A/O
A/O
B/A
B/A
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
B/A
SS Repeated Children measures
A/O
J Exp Soc Psych
McGraw, McCullers (1979)
A/O
J Exp Soc Psych
McGraw, McCullers (1979)
Journal
JPSP
JPSP
J of Res in Pers
J of Res in Pers
Pers & Soc Psych Bull
J of Soc Psych
J of Soc Psych
Child Dev
Child Dev
Child Dev
Child Dev
Org Beh & Hum Perf
Org Beh & Hum Perf
Mot & Emotion
Author(s)
Morgan (1981) Exp. 2
Morgan (1981) Exp. 2
Brockner, Vasta (1981)
Brockner, Vasta (1981)
Pittman et al. (1980)
Shanab et al. (1981)
Shanab et al. (1981)
Danner, Lonkey (1981)
Danner, Lonkey (1981)
Danner, Lonkey (1981)
Danner, Lonkey (1981)
Boal, Cummings (1981)
Boal, Cummings (1981)
Luyten, Lens (1981)
A/O
Field study
Field study
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
Design
Adults
Adults
Adults
Children
Children
Children
Children
Adults
Adults
Adults
Adults
Adults
Children
Children
Subjects
Wood models
Coding data
Coding data
Class inclusion
Class inclusion
Class inclusion
Class inclusion
Soma
Soma
Soma
Soma
Soma
Puzzles
Puzzles
Task
T
T
T
V
V
T
T
V
V
V
T
T
T
T
Reward type
E
E
E
U
U
E
E
U
U
U
E
E
E
E
Expectancy
Not, TC
Cont, TC
Not, NC
Not, TC
Not, TC
Cont, TC
Cont, TC
Not, TC
Not, TC
Contingency
Free time
Free time
Free time
Attitude
Free time
Attitude
Free time
Attitude
Free time
Free time
Attitude
Free time
Attitude
Free time
Dep. measure
10
21
21
30
30
30
30
20
20
24
25
26
20
20
N exp.
10
22
22
30
30
30
30
20
20
12
26
26
20
20
N control
–0.96
+0.38
+1.64
–0.08
–0.10
–1.23
–1.33
+0.43
+0.64
+0.80
–0.58
–0.37
+0.04
–0.77
Effect size (g)ab
Child Dev
Child Dev
Child Dev
Pallak et al. (1982)
Pallak et al. (1982)
JPSP
Pittman et al. (1982) Exp. 2
Pallak et al. (1982)
JPSP
Pittman et al. (1982) Exp. 1
Org Beh & Hum Perf
Porac, Meindl (1982)
JPSP
J General Psych
Zinser et al. (1982)
Pittman et al. (1982) Exp. 1
Social Cognition
Boggiano et al. (1982)
JPSP
Am. J Psych
Fabes et al. (1981)
Pittman et al. (1982) Exp. 1
Mot & Emotion
Luyten, Lens (1981)
J of Pers
Mot & Emotion
Luyten, Lens (1981)
J of Pers
Mot & Emotion
Luyten, Lens (1981)
Earn (1982)
Mot & Emotion
Luyten, Lens (1981)
Earn (1982)
Mot & Emotion
Luyten, Lens (1981)
A/O
A/O
A/O
A/O
A/O
Children
Children
Children
Children
Children
Children
Children
Adults
Adults
Adults
Children
Children
Adults
Drawing
Drawing
Drawing
Drawing
Matching games
Matching games
Matching games
Anagrams
Anagrams
Soma
Hidden pictures
Hidden pictures
Algorithms heuristic tasks
Wood models
Wood models
Wood models
Wood models
Wood models
T
V
V
T
T
T
T
T
T
T
V
T
T
T
T
T
T
T
U
E
U
E
E
E
E
E
E
E
U
E
E
E
E
E
E
E
Not, TC
Not, TC
Not, TC
Not, NC
Not, TC
Not, TC
Not, TC
Not, TC
All
Cont, PC
Cont, PC
Cont, PC
Not, TC
Not, TC
Free time
Free time
Free time
Free time
Attitude
Free time
Free time
Attitude
Free time
Free time
Free time
Free time
Performance
Volunteer
Attitude
Free time
Volunteer
Attitude
15
14
14
28
20
10
10
40
40
40
64
81
57
10
10
10
10
10
12
12
12
28
10
10
10
20
20
20
32
84
19
10
10
10
10
10
–0.44
+0.32
–0.48
–0.05
0.00a
+0.25
+0.37
+0.18
–0.28
–0.21
+0.08
+0.28
–0.53
+1.08
+0.08
–0.91
–1.15
–0.88
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
Adults
Adults
Adults
Adults
Adults
J Management
Aust & N.Z. J Dev. Dis.
Social Cognition
JPSP
JPSP
JPSP
JPSP
JPSP
JPSP
Child Dev
Child Dev
Child Dev
Boggiano, Hertel (1983)
Ryan et al. (1983)
Ryan et al. (1983)
Ryan et al. (1983)
Ryan et al. (1983)
Ryan et al. (1983)
Ryan et al. (1983)
Morgan (1983) Exp. 1
Morgan (1983) Exp. 1
Morgan (1983) Exp. 2
J Management J Management
Crino, White (1982) Crino, White (1982)
Ogilvie, Prior (1982)
J Management
Crino, White (1982)
Child Dev
Pallak et al. (1982)
Journal
Crino, White (1982)
Author(s)
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
B/A
A/O
A/O A/O
A/O
A/O
Design
Children
Children
Children
Adults
Adults
Adults
Adults
Adults
Adults
Adults
Children
Adults
Adults Adults
Adults
Children
Subjects
Puzzles
Puzzles
Puzzles
Hidden puzzles
Hidden puzzles
Hidden puzzles
Hidden puzzles
Hidden puzzles
Hidden puzzles
Memory task
Drawing
Puzzles
Puzzles Puzzles
Puzzles
Drawing
Task
T
T
T
V
V
T
T
T
T
T
T
V
V V
V
T
Reward type
E
E
E
E
E
E
E
E
E
U
E
U
U U
U
E
Expectancy
Not, TC
Not, TC
Not, TC
Not, TC
Not, TC
Cont, PC
Cont, PC
Not, TC
Contingency
Free time
Attitude
Free time
Attitude
Free time
Attitude
Free time
Attitude
Free time
Attitude
Free time
Volunteer
Volunteer Attitude
Attitude
Free time
Dep. measure
40
40
40
64
64
16
16
32
32
46
26
20
20 20
20
15
N exp.
40
20
40
32
32
16
16
32
32
46
26
10
10 10
10
12
N control
–0.59
–0.27b
–1.94
0.00a
+0.47
0.00a
–0.35
0.00a
–0.46
+0.02
–0.08
+0.64
+0.49 +0.07
+0.01
–0.16
Effect size (g)ab
B/A
B/A
Bull Psych Society
Sex Roles
Sex Roles
Br. J Dev Psych
Br. J Dev Psych
J Exp. Psych
Bull Psych Society
Bull Psych Society
JPSP
JPSP
JPSP
JPSP
JPSP
JPSP
JPSP
JPSP
DeLoach et al. (1983)
Blanck et al. (1984)
Blanck et al. (1984)
Sarafino (1984)
Sarafino (1984)
Harackiewicz et al. (1984)
Griffith et al. (1984)
Griffith et al. (1984)
Pretty, Seligman (1984) Exp. 1
Pretty, Seligman (1984) Exp. 1
Pretty, Seligman (1984) Exp. 1
Pretty, Seligman (1984) Exp. 1
Pretty, Seligman (1984) Exp. 1
Pretty, Seligman (1984) Exp. 1
Pretty, Seligman (1984) Exp. 2 Pretty, Seligman (1984) Exp. 2
B/A
B/A
B/A
B/A
B/A
B/A
B/A
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
J Sport Psych
Vallerand (1983)
A/O
Child Dev
Morgan (1983) Exp. 2
Adults
Adults
Adults
Adults
Adults
Adults
Adults
Adults
Children
Children
16-yr.-olds
Children
Children
Adults
Adults
Children
Children
Children
Soma
Soma
Soma
Soma
Soma
Soma
Soma
Soma
Reading books
Reading books
Hidden puzzles
Riddles
Riddles
Word game
Word game
Connect dots
Slideshow game
Puzzles
T
T
T
V
V
T
T
T
T
T
T
T
T
T
V
V
T
V
E
E
E
U
U
U
U
E
E
E
E
E
E
U
U
E
E
Not, TC
Not, TC
Not, TC
Not, TC
Not, TC
Not, TC
Cont, PC
Not, TC
Not, TC
Not, TC
Not, TC
Attitude
Attitude
Free time
Attitude
Free time
Attitude
Free time
Attitude
Free time
Performance
Free time
Attitude
Attitude
Free time
Attitude
Free time
Free time
Attitude
20
30
30
30
30
30
30
30
30
64
64
47
85
85
70
70
26
40
20
30
30
30
30
30
30
30
30
32
32
47
15
15
69
69
26
10
0.00
–0.16
–0.13
+0.46
+0.35
+0.42
+0.06
–0.05
–0.75
0.00a
0.00a
+0.33
0.00a
–0.41
+0.46
+0.56
0.00a
+1.98
Journal
JPSP
JPSP
JPSP
JPSP
JPSP
JPSP
JPSP
JPSP
JPSP
JPSP
JPSP
J Sport Psych
Acad. Man. J.
Author(s)
Pretty, Seligman (1984) Exp. 2
Pretty, Seligman (1984) Exp. 2
Harackiewicz et al. (1984) Exp. 1
Harackiewicz et al. (1984) Exp. 1
Harackiewicz et al. (1984) Exp. 1
Harackiewicz et al. (1984) Exp. 2
Harackiewicz et al. (1984) Exp. 2
Harackiewicz et al. (1984) Exp. 2
Harackiewicz et al. (1984) Exp. 2
Harackiewicz et al. (1984) Exp. 3
Harackiewicz et al. (1984) Exp. 3
Vallerand, Reid (1984)
Arnold (1985)
B/A
B/A
B/A
B/A
B/A
B/A
B/A
B/A
B/A
B/A
B/A
B/A
B/A
Design
Adults
Adults
Adults
Adults
Adults
Adults
Adults
Adults
Adults
Adults
Adults
Adults
Adults
Subjects
Computer game
Stabilometer
Pinball
Pinball
Pinball
Pinball
Pinball
Pinball
Pinball
Pinball
Pinball
Soma
Soma
Task
T
V
T
T
T
T
T
T
T
T
T
T
T
Reward type
E
E
E
E
U
U
E
E
E
E
E
U
U
Expectancy
Both
Cont, PC
Cont, PC
Cont, PC
Cont, PC
Cont, PC
Cont, PC
Cont, PC
Contingency
Attitude
Attitude
Performance
Attitude
Performance
Attitude
Performance
Attitude
Performance
Attitude
Free time
Attitude
Free time
Dep. measure
26
28
26
26
15
15
15
15
32
32
32
30
30
N exp.
16
28
26
26
15
15
15
15
32
32
32
30
30
N control
–0.04
+0.53b
+0.04
+0.32
+0.44
+0.15
–0.43
+0.18
+0.16
+0.03
+0.07
+0.38
+0.06
Effect size (g)ab
Social Cognition
Social Cognition
Org Beh & Hum Dec P
Org Beh & Hum Dec P
Psych Studies
Psych Studies
Psych Studies
Psych Studies
Psych Studies
Psych Studies
JPSP
JPSP
JPSP
JPSP
JPSP
Boggiano et al. (1985)
Boggiano et al. (1985)
Freedman, Phillips (1985)
Freedman, Phillips (1985)
Tripathi, Agarwal (1985)
Tripathi, Agarwal (1985)
Tripathi, Agarwal (1985)
Tripathi, Agarwal (1985)
Tripathi, Agarwal (1985)
Tripathi, Agarwal (1985)
Sansone (1986) Exp. 1
Amabile et al. (1986) Exp. 1
Amabile et al. (1986) Exp. 1
Amabile et al. (1986) Exp. 3
Harackiewicz et al. (1987)
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
16-yr.-olds
Adults
Children
Children
Adults
Adults
Adults
Adults
Adults
Adults
Adults
Adults
Adults
Children
Children
Puzzles
3 tasks
3 tasks
3 tasks
Identify names
Puzzles
Puzzles
Puzzles
Puzzles
Puzzles
Puzzles
Proof reading
Proof reading
Puzzles
Puzzles
T
T
T
T
V
V
V
V
T
T
T
T
T
T
T
E
E
E
E
U
E
E
E
E
E
E
E
E
E
E
Cont, PC
Not, TC
Not, TC
Not, TC
Not, TC
Not, TC
Not, TC
Cont, PC
Not, TC
Cont, PC
Not, TC
Attitude
Attitude
Attitude
Free time
Attitude
Performance
Attitude
Free time
Performance
Attitude
Free time
Attitude
Attitude
Free time
Free time
24
30
56
56
44
20
20
20
20
20
20
47
52
26
26
27
30
57
57
11
20
20
20
20
20
20
47
47
13
13
–0.10
0.00a
0.00a
0.00a
+0.68
+0.54
+0.48
+1.61
+0.54
+0.54
+0.41
+0.68
+0.75
–0.10
–0.79
JPSP
JPSP
J Ed Psych
J Ed Psych
J Gen Psych
J Gen Psych
Koestner et al. (1987)
Koestner et al. (1987)
Butler (1987)
Butler (1987)
Tripathi, Agarwal (1988)
Tripathi, Agarwal (1988)
J of Psych
Fabes (1987) Exp. 1
J of Psych
Pers & Soc Psych Bull
Hom (1987) Exp. 2
J of Psych
Pers & Soc Psych Bull
Hom (1987) Exp. 1
Fabes (1987) Exp. 2
Pers & Soc Psych Bull
Hom (1987) Exp. 1
Fabes (1987) Exp. 1
Journal
Author(s)
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
A/O
Design
Adults
Adults
Children
Children
Adults
Adults
Children
Children
Children
Adults
Adults
Adults
Subjects
Problem solving
Problem solving
Problem solving
Problem solving
Hidden puzzles
Hidden puzzles
Block building
Block building
Block building
Solving anagrams
Pursuit rotor task
Pursuit rotor task
Task
T
T
V
V
V
V
T
T
T
V
T
T
Reward type
E
E
U
U
U
U
E
E
E
?
?
?
Expectancy
Cont, PC
Not, TC
Not, TC
Cont, PC
Not, TC
?
Not
Not
Contingency
Free time
Free time
Performance
Attitude
Attitude
Free time
Free time
Free time
Free time
Performance
Attitude
Free time
Dep. measure
20
20
50
50
35
35
14
19
18
28
26
26
N exp.
10
10
50
50
18
18
14
19
19
28
26
26
N control
+1.18
+0.03
+0.39
+1.59
0.00a
+0.51
–0.45
–0.87
–0.82
–0.37
0.00a
+0.11b
Effect size (g)ab
JPSP
J App Soc Psych A/O
J Org Beh SS Repeated Adults Management measures
J of Psych
J of Psych
Sansone et al. (1989)
Anderson, Rodin (1989)
Mawhinney et al. (1989)
Wicker et al. (1990)
Wicker et al. (1990)
Adults
Adults
Adults
Adults
Adults
Think Tac Toe
Think Tac Toe
Video game
Brain teasers
Computer games
Identify names
Beanbag game
Problem solving
T
T
T
V
V
V
T
T
E
E
E
U
U
U
E
E
Not, TC
Not, TC
Not
Not, TC
Both
Attitude
Free time
Time
Attitude
Attitude
Attitude
Free time
Attitude
Notes:
Design: B/A = before-after groups design, A/O = after-only groups design, SS = single-subject design
Reward type: T = tangible, V = verbal
Reward expectancy: E = expected, U = unexpected
Reward contingency: cont = contingent, not = not contingent; NC = nontask contingent, TC = task contingent, PC = performance contingent
a indicates effect sizes given a value of zero (nonsignificant results with no report of means or direction of means)
b indicates estimated effect sizes
JPSP = Journal of Personality and Social Psychology
J of Pers = Journal of Personality
Org Beh & Hum Perf = Organizational Behavior and Human Performance
J Exp Soc Psych = Journal of Experimental Social Psychology
Child Dev = Child Development
Per & Soc Psy Bull = Personality and Social Psychology Bulletin
Cog Ther & Res = Cognitive Therapy and Research
Mot & Emotion = Motivation and Emotion
Int J of Sport Psy = International Journal of Sport Psychology
J of School Psych = Journal of School Psychology
J Applied Soc Psych = Journal of Applied Social Psychology
Design: A/O; A/O; A/O
Sansone (1989): J Exp Soc Psych; A/O
Fabes et al. (1988): Mot & Emotion; A/O
Tripathi, Agarwal (1988): J Gen Psych; A/O
Subjects: Children; Adults
N exp.: 29; 29; 3; 10; 40; 82; 14; 40
N control: 29; 29; –; 10; 40; 41; 14; 10
Effect size (g): 0.00a; 0.00a; +0.15; +0.90; +0.12; +0.46; –1.34; +0.26b
Appendix C: (Continued )
Behavior Mod = Behavior Modification; Soc Beh & Pers = Social Behavior and Personality; J of Soc Psych = Journal of Social Psychology; J Applied Psych = Journal of Applied Psychology; J of Res Pers = Journal of Research in Personality; J General Psych = Journal of General Psychology; J Management = Journal of Management; Aust & N.Z. J Dev Dis = Australia and New Zealand Journal of Developmental Disabilities; J Sport Psych = Journal of Sport Psychology; Bull Psych Society = Bulletin of the Psychonomic Society; Br J Dev Psych = British Journal of Developmental Psychology; J Exp Psych = Journal of Experimental Psychology; Acad Man J = Academy of Management Journal; Org Beh & Hum Dec P = Organizational Behavior and Human Decision Processes; Psych Studies = Psychological Studies; J Org Beh Management = Journal of Organizational Behavior Management
66 Motivation in Transition Barbara Stauber
Source: Young: Nordic Journal of Youth Research, 15(1) (2007): 31–47.
Introduction
This article focuses on young people’s motivation in their transitions from school to work. The context for these considerations is a general pattern of change in youth transitions throughout Europe, which, as recent research shows, have become more prolonged, fragmented, insecure and, in many respects, reversible – a process described as ‘Yoyoisation’ (EGRIS, 2001). The metaphor of the yo-yo illustrates the phenomenon that youth transitions no longer represent a (linear) status passage from youth to adulthood, but include forward and backward movements, ups and downs, between both statuses, creating an ever longer transitional period of semi-dependencies. Broadly speaking, this change is part of a general destandardization of the life course that has taken place during the last 30 years (Hurrelmann, 2003), in conjunction with major changes in two fundamental social institutions: work and family, and their gendered notions of what constitutes ‘normal’ male work and ‘normal’ female family life (Stauber, 2006). These changes place increasing demands on the younger generation in terms of their life decisions and career orientations, and increasingly these demands have to be coped with by the individual. Formal qualifications are necessary, but no longer sufficient. Concepts such as self-organization or even self-socialization (Heinz, 2002) are indicative of these new demands. Such concepts are, however, predicated on certain material and psychological conditions, above all a sense of ‘social hold’ in order to provide what Anthony Giddens (1991) has called ‘ontological security’. And it is precisely these soft prerequisites that are missing in risky transitions: stable and reliable relationships and a certain amount of social capital, providing opportunities which help to create and maintain a certain level of motivation.
The article draws on recently completed research, the European research project Yo-Yo (Youth Policy and Participation), in which the interrelation between motivation and conditions for transitions into work has been explored in depth (Walther et al., 2006; see also Walther, 2006). After introducing some basic concepts, I will briefly present three of its case study projects in order to discuss exemplary research findings on the issue of motivational change in youth transitions. The analysis will focus on a series of modes of participation identified in the research, which have general relevance for motivating young people.
The European Research Project Yo-Yo – Youth Policy and Participation
The Yo-Yo-project can be situated at the interface of several European discourses on young people: the debate on social exclusion; the respective policy recommendations, such as (lifelong) learning, informal/nonformal learning, youth citizenship and ‘activation’; and, connected to these discourses, the topic of participation. Focusing on the latter, the project sought to attain a more profound understanding of what participation means, to enable young people to regain a pro-active attitude towards their individual transition project. The Yo-Yo-research project was carried out at a comparative level, encompassing nine European countries.1 Methodologically it is based on case study analysis (28 case studies), which included semi-structured qualitative interviews with 365 young people (with around 70 per cent interviewed twice), expert interviews (141 individuals working in the field of school to work transitions) and video projects with young people.
The key research question for the Yo-Yo-project was whether motivational change could be facilitated by increased possibilities of participation. This core question was broken down into two further questions: what are the prerequisites, settings and, above all, opportunity structures in case study projects which facilitate motivational experiences for young people? How do different projects provide for different target groups in different contexts? A broad range of projects was selected for case study analysis, clustered with respect to their function in the transition system and their profile as either gravitating towards ‘soft’ youth work (highly participatory but without direct career related outcomes in terms of transitions) or ‘hard’ employment-centred schemes (intended to lead to integration into work, but not necessarily in a participatory way).2 All projects represent best practice and have been chosen
because, compared to normal practice in the respective transition systems, they are exemplary in terms of facilitating participation. Key concepts of the research are motivation, participation and, for comparative purposes, the concept of transition regimes.
Motivation
Motivation theory has widely influenced educational research and practice (Maehr and Meyer, 1997), but has still not really been incorporated into transition research and policies. This is all the more surprising since motivation has to be regarded as an increasingly important, but at the same time scarce, resource that can easily be lost, for instance in institutionally misled trajectories (see Walther et al., 2002). In line with the expectancy-value approach of Allan Wigfield and Jacquelynne Eccles (2000), we started with a double theoretical assumption that motivation, in the sense of a proactive attitude among young people towards their transitions, can result on the one hand from subjective needs and interests, and on the other hand from the perceived probability of achieving subjectively relevant goals. Both aspects are contextual (Ryan and Deci, 2000), that is, they relate to different contextual conditions, and are open to experiences and potential change: needs or interests arise from interaction with the social world, and these may change with their fulfilment, giving rise to other needs or interests. The perception of the probability of achieving goals depends on the facilities available to young people to experience self-efficacy (Bandura, 1994).
The notion of motivational change corresponds with the way we look at young people’s transitions. In this respect, change is perceived as the natural order, that is, as normal and common, whereas stability represents a more unusual condition (Abbott, 1997) – a perspective which has become increasingly salient under the conditions of late modernity and destandardized life-courses (Fornäs, 1995). In these motivational changes, intrinsic and extrinsic aspects of motivation are interwoven: the desire to engage in a specific activity (e.g. enjoying dance) often has extrinsic aspects related to the instrumental quality or consequences of this activity.3 Young people’s eagerness to engage with their own transition biographies is in constant flux, depending on their experiences. With these sets of experiences, every young person, throughout his or her transition to adulthood, develops a personal motivational career,4 influenced by empowering or frustrating experiences, facilitated or hindered by more or less favourable contexts – in their broader social surroundings as well as in concrete pedagogical settings (du Bois-Reymond and Stauber, 2005). With this dynamic, biographical concept, motivation is embedded in social contexts, which are partly shaped by individuals, but, to a far greater extent, are out of their hands. This relates to opportunity structures, which in the Yo-Yo-project focused on the perspective of participation.
Participation
In order to define this widely discussed concept, we used two central lines of distinction:
• participation as active influence versus passive (or formalized) involvement; and
• participation as a principle versus participation as a goal of policies.
Our concept of participation contrasts with formal participation models, which in theory ‘allow’ young people to have a voice without considering the unequal distribution of tools and access structures that enable them to do so in practice. Instead, we include concepts such as ‘active citizenship’ (European Commission, 2001), which considers the scopes and tools for active engagement. Such an understanding of participation is missing in young people’s transitions from school to work. Above all in the hard policy sector (education, training and labour market policies), with its powerful gatekeepers, young people are channelled into trajectories in which participation is postponed to an ill-defined ‘later’. Instead, we argue that participation should be an integral principle of policies from design to practice. Translated to the supporting measures for youth transitions, which are at the forefront of our research, this principle would encompass four dimensions: (1) voluntary (and thereby self-chosen) attendance at the project; (2) involvement in project related decision-making; (3) social and civic engagement by means of citizenship tools within a community approach; and (4) biographical self-determination. The latter is the most demanding level of participation, but the dimensions are interrelated: as soon as young people start to feel that they can ‘participate in the making of their own future’ (Project Leader of Cityteam, The Netherlands), the other dimensions can become subjectively meaningful. This could also operate the other way around: as soon as young women and men feel that they are part of a community, have influence within it, are recognized, acknowledged and ‘mirrored’, their subjective goals and abilities can become clearer to them. Here, both the expectation of success and the needs/interests aspect of motivation are equally relevant – eventually they are interlinked in the sense that the probability of achieving goals influences the definition of interests (Cocks and Watt, 2004).
Transition Regimes
The concept of transition regimes is necessary for transnational comparison. This concept draws on the famous regime differentiation of Gösta Esping-Andersen (1990), enlarged by Duncan Gallie and Serge Paugam (2000), and transfers it to the transition topic (Walther et al., 2006). Transition regimes refer to the interplay of socio-economic structures, institutions, cultural norms
and the agency of pedagogues, parents, peers and young people themselves in all areas that are relevant for youth transitions. Such areas include labour markets and the ways in which they are regulated; education and training and how they are organized; compensatory programmes for unemployed youth and concepts of ‘disadvantage’; as well as mechanisms relating to gender, ethnicity and, more generally, difference. Considering that the regime approach relates more to the general ‘Gestalt’ (Kaufmann, 2003) than to specificities of individual transition systems, and that there are considerable variations and also developmental dynamics within and between the countries subsumed under one regime, we refer to five broad regime types: the universalistic transition regime of the Scandinavian countries, the liberal regime of the Anglo-Saxon countries, the employment-centred regime in Continental countries, the sub-protective regime in Southern Europe, and the post-socialist transition regime, including a very heterogeneous group of East-European countries. These regimes can also be distinguished by the ways in which they deal with young people’s motivation, in which they function as motivational systems, and the ways in which they regulate flows of students and applicants by encouraging or ‘cooling’ individual aspirations (Goffman, 1963). Very roughly, we can distinguish the universalistic regime, which predominantly works through motivating young people to find out what suits them best, allows for biographical orientation and empowers individual aspirations, from the work-oriented regime types which operate much more through the principle of sorting out and thus dampening down the motivation of those who do not fit into certain educational clusters.
The starting point of our research was the view that in destandardized transitions from school to work, individual motivation to orient one’s career, take decisions, complete education or training, or seek alternatives if chosen careers turn out to be inappropriate, is an increasingly important prerequisite. It is, above all, in such critical situations that maintaining a proactive attitude becomes crucial. This was confirmed by both groups of young people we interviewed, most visibly by the young women and men who managed to develop alternative careers in terms of choice biographies, which they are able to realize thanks to personal competencies and family resources, coupled with favourable conditions on the local labour market. But the need for a proactive attitude was also confirmed by those young women and men whose options were restricted by low qualification levels, few family resources and a labour market segmentation that is unfavourable to them. This second group of young women and men, whom we accessed in the projects, mostly look back on a turbulent motivational career with severe setbacks. In identifying their motivational changes, we applied two strategies. Where they formed part of young people’s narratives in the interviews, motivational careers could be reconstructed by interview analysis. In other cases, we combined evaluation of the interviews with other sources of information (expert interviews, and
observation of these young people via the video projects). Although the following section highlights mostly positive motivational change, the concept (and our results) includes upward as well as downward movement in motivational careers. Thus, motivational change cannot simply be equated with progress or success.
Exemplary Cases from the Yo-Yo-Project
There is insufficient space in this article to carry out an extended case-study analysis. Instead, we have selected a few exemplary practices from a rather complex and comprehensive research project, restricting our analysis to only three (out of 28) case study examples in order to show the interrelationship between participation and motivation: the Italian project ArciRagazzi, located in the context of the sub-protective transition regime; the Dutch Cityteam project, located in a mixed regime type, including elements from employment-centred, liberal, but also universalistic regimes; and the project Lifting the Limits in Northern Ireland, exemplary of the liberal regime. We will briefly outline these case study projects by looking at their structures, goals, target groups and also some contextual conditions for evaluating their specific participatory strengths, but also some limitations.
ArciRagazzi, Palermo (Italy) – A Youth Work Project5
ArciRagazzi in Palermo, Sicily, is a youth association organizing leisure and cultural activities. It is a local branch of a national association, partly financed through membership fees, partly by public and private funding. A prime objective is to provide young people, especially those living in deprived neighbourhoods or having been released from detention centres, with life perspectives beyond unemployment and/or involvement with the Mafia. This is even more important in a context where youth unemployment stands at 60 per cent and the expectation is of a prolonged waiting period before entering a regular job. Typically for southern Italy, this affects young people from all educational levels. Therefore, participants at ArciRagazzi are heterogeneous in terms of class and education. Taking a specific community approach, the project sets up participatory planning initiatives in which children, adolescents and families collaborate to improve public buildings and spaces, and offers all kinds of cultural activities based on young people’s skills and wishes, such as handicrafts workshops, fairs and concerts. Through debates, meetings and assemblies to decide even on project management guidelines, young people are given tools for active participation. The project initiates career orientation and transitions to work, and in this regard also aims at the enhancement of entrepreneurship. In fact, several young men and women have built semi-professional careers, moving from simple engagement in voluntary activities to becoming freelance project leaders.
Cityteam (The Netherlands) – Preparing Young People for Training and Work
Cityteam is based in three cities in The Netherlands: Utrecht, Zoetermeer and Rotterdam – the last of these, however, has closed down owing to local funding problems which arose during the timeframe of the research. Cityteam provides professional orientation through a flexible combination of workshops, voluntary work and internships in private companies, accompanied by career counselling and coaching, in order to open up individual pathways for each of its participants. The target group of Cityteam consists of a mix of young people from different ethnic backgrounds and mostly with risky transitions (such as school dropout, low qualification levels and unemployment). A smaller number of participants, mostly younger than 20 and with at most secondary qualifications, are less at risk but felt they needed to take time out for orientation. The share of the first group of young people has increased lately, owing to the fact that Cityteam has become more involved in providing programmes for publicly funded reintegration trajectories. Cityteam may be regarded as an example of an independent transition institution and public-private partnership, the latter occurring more and more often in The Netherlands; but it seems to be becoming increasingly dependent on scarce public funding. This endangers its approach because of the predominance of youth at risk in what have up to now been rather balanced groups. With its focus on temporary work it also reflects the trend in labour market policies towards greater flexibility. However, it still has a strong focus on biographical participation.
Lifting the Limits, Armagh (Northern Ireland) – A Participatory Training Scheme
Lifting the Limits is a year-long programme for young mothers between 16 and 25 in the countryside of Armagh, Northern Ireland. Financed by public funding, it is under increasing pressure due to funding shortfalls and problems finding new premises. It combines personal empowerment and support for these young mothers, with formally acknowledged training guaranteeing their inclusion in future employment. Facilitating structures for their participation are carefully adapted to their needs: a salary of about 8400 Euros per annum, reduced working hours of 25 hours a week, and contributions towards childcare and travel. The participating young women are trained by two peer support workers (former participants of the programme) to do outreach work as community leaders. They directly implement their training into practical work with the same group they themselves belong to: young mothers. The prospect of an immediate switch from trainee to peer-educator creates a highly empowering space for personal and interpersonal development, such as leadership skills, initiative skills and problem solving directly linked to the community. In the female peer-context young mothers can use
each other as a type of gender role-model that is different from the norm. While a lot of their learning is informal, it is at the same time formally recognized, and successful completion of the training even provides access to higher education qualifications in community youth work, social work and community development. Leaving aside other important areas for analysis, we will immediately move on to modes of participation, which were revealed to be important for (re-)motivating disengaged young people (Walther et al., 2006).
Modes of Participation
Through case-study analysis, based on the accounts of both project workers and young people and drawing together subjective experiences and the specific approaches of the projects, different modes of participation were identified. It is important to note that some of these modes correspond to the original function and objectives of the projects, whereas others are a result of the courage of the project workers to transcend institutionally set boundaries.
Choice
The first thing is, nobody should tell you: ‘do this and that’ but it’s you in the first place who has to take decisions … a sort of self-experimentation. We also realize we have made some mistakes during this project, but it was nice, even making mistakes, growing up, it was like self-training. (Pamela, female, 21, ArciRagazzi)
The importance of choice was striking in the accounts of the young people interviewed. One could say that for young people choice represents a metaprinciple of self-determination and participation in late modern societies. Because it belongs to the set of cultural demands for individualization, choice has also been described as an ambiguous concept, above all if the necessary prerequisites to take decisions are neglected. From the perspective of the Foucauldian discourse on governmentality, the issue of choice is even suspected of underpinning the advanced liberal model of social control (Rose, 1999). This meta-discussion in a way passes over issues of subjective relevance, which of course are influenced by ideological discourses, but which have to be respected anyway. The projects under discussion furthermore minimize the ambiguity of choice, because most of them provide a sensitive network of support and companionship for young people in their decision-making, which helps to mitigate the inequalities of advanced liberalism. Choice operates at various levels: it concerns an individual’s decision to attend a project or not, but also how to ‘use’ a project to meet individual
needs and interests. Both aspects have been identified as being crucial for young people in order to identify with a project and to regard it as an integral element of their transition. Of course, we find different conditions in the various projects to provide for both dimensions: those that aim to help young people find their orientation most often provide more options in this regard than those that stick to a specific task – for example training – for which they are paid. Biographical participation comes into play whenever projects allow participants to develop individually appropriate and subjectively meaningful relations not only with the project, but with their life-decisions as well. This interrelation is expressed most clearly by those responsible for the Lifting the Limits project, working with young single mothers:
… a kind of self-determination – having freedom in choices around things like … that they don’t want more children, or what they want for their children, or that they don’t have to be with the father of the children, or that they do want to be with the father of their children despite family opposition. And it’s having the courage and the self-confidence to stand up and say, ‘These are the choices I want to make and these are the right choices’, and not to be bullied as a result of that. (Project worker, female, Lifting the Limits)
Such empowerment includes a strong notion of negotiating gender. Motivation built up by such empowerment may include financial or material incentives or wages that meet financial needs to build a bridge to more intrinsic experiences. In the case of Lifting the Limits, the access given to paid work and higher education can be regarded as such a bridge. Choice in some projects is explicitly related to a low threshold approach – especially when they attempt to attract young people at risk, who may have already developed a negative street-life-attitude. It is even more important for such projects to reflect on regime-related structures, which young people have experienced as pressure, force or as stigmatization. They need to understand the cycle of de-motivation and try to move beyond it by offering alternative structures.
Flexible Outcomes and Biographical Fit
In existing arrangements, if a young person comes in, it is already assumed that he will become a painter. I think this is nonsense. A young person should decide for him- or herself what he or she would like to become. (Cityteam director, male)
Especially with regard to the pre-vocational sector, the degree to which projects either allow for open outcomes or pre-define outcomes can be seen as a criterion for distinguishing participatory from non-participatory
approaches in terms of biographical self-determination. At the same time, it may be seen as a criterion for being more or less ascribing in terms of gendered vocational routes. However, this criterion has to be combined with biographical fit to the young person concerned. One of the original principles of the Cityteam project is that the young person him- or herself is supported to find out what personal goals to follow and what transition steps to take within the project and afterwards. This principle unites what has been worked out as basic motivational factors: sticking with young people’s interests and needs, and providing them with encouraging experiences to achieve their goals, which they can get closer to step-by-step. This demands a high level of flexibility, by which projects are sometimes overburdened. In the case of Cityteam, for example, this could mean that internships cannot be organized at the right time – which could then lead to a withdrawal of motivation:
When I started at Cityteam I was looking forward to it a lot. But after a few months, I didn’t care to get up anymore to go there. Because you didn’t do anything and you might as well be sitting at home. (Liv, female, 21, Cityteam)
Projects targeting youth at risk are particularly reliant on the principle of open outcomes, whereas in participatory training and employment projects the situation is completely different: young people enter these projects because they have a clearly defined objective, and because it is attractive to them to achieve it. In these cases, there is no contradiction between a predefined outcome and motivation, but how the means of achieving their goal is presented, and how much room there is for participation and personal decision-making are all the more crucial.
Individuality and Strengths
One mode of participation is even more basic in terms of motivating young people: the mode of personal acceptance – ‘come as you are’ is a principle that acknowledges individual needs, peculiarities, obligations and constraints across different transitional strands (Thomson et al., 2002), which young people have to cope with and shape. This counter-strategy to negative labelling and stigmatization responds to young women’s and men’s need for recognition as individuals with normal aspirations. As long as this need is ignored, young people feel alienated from the start and cannot look at the next steps as their own.6 Lifting the Limits is an example of a project that manages to positively represent the situation of young single mothers by underlining the expertise deriving from the daily practice of these young women. This shift from what is normally considered a ‘deficit’ (all the more in terms of labour market
opportunities) into a competence which gives access to paid work and even higher education is highly empowering. Lydia, 23, recalls being faced for the first time with having to take a group of young women in her local community for training: ‘It first was a big “no way” … I didn’t think I could do it – but then it felt amazing that I could do that’, which ‘showed me that I could do everything I wanted to do, despite having a child … The training has given me a sense of independence and shown me that I have a choice in how I live my life.’ Indeed, Lydia uses the opportunity of getting access to higher education by planning to study community work after her time in the project. Crucially, the success of this example depends to a great extent on an open access structure within the educational system, as can be found in the liberal transition regime. Similarly, very different projects such as Cityteam and ArciRagazzi have in common a refusal to ascribe stagnation and disadvantage in young people’s transitions to individual deficits, and focus on strengths instead. This attitude is even more important in case study projects that address youth at risk (such as homeless youth), because it forms a basic prerequisite for gaining access to young people who would otherwise be excluded. These projects shift attention from areas in which young people fail to meet formal standards to activities in which they are strong. The approach allows for experiences of success. Instead of lowering the education and training level, which would mean reproducing rather than overcoming the deficit-perspective, it changes the ‘subject’ by shifting attention away from areas in which persons fail to meet formal standards to activities which they can excel at – precisely because they are related to their subjective interests. This approach avoids the mistaken assumption, often made in formal learning settings, that the issue of increased self-efficacy is isolated from the subjective relevance of goals, and combines them instead. Of course, focusing on strengths is closely related to the principle of allowing project-related decision-making and a general climate that incorporates more holistic approaches, in which young people can feel acknowledged in their (special) strengths as well as in their (special) problems.
Space
I believe experimenting is important in the transition from school to work, namely having time and opportunities to realize what you like, what you don’t, and what you want to do in life … Experimenting also involves the possibility of making mistakes, and discovering your potential. (Project worker, male, ArciRagazzi)
We found various constellations whose interpretation of participation involved providing places and spaces to be appropriated and shaped by young people themselves according to their own needs and interests. Projects such as ArciRagazzi and Lifting the Limits focus on community-related participation
and enlarge these spaces to the local level. They aim at fostering a mutual relationship between subjective motivation and active (youth) citizenship. By offering young people the possibility of influencing, shaping and changing their social environment, they provide a means of both constructing personal biographies and actively participating in the community. In ArciRagazzi the connection between space as part of the wider community and space as something to be jointly shaped and decided upon is enacted by internal democratic procedures which are realized in the external community. Participation in practical terms means that project workers do not make full use of their decision-making power, but share it with the young women and men involved in the projects who actively co-decide. This democratic principle is closely related to the rationale of focusing on young people’s strengths and requires confidence in young people’s competencies and in their willingness to invest them in a cooperative context. Giving young people space expresses belief in their strength. As one German project-worker puts it:
If you would just let them do, far better things would come about than you and I would even consider. (Project worker, Kompass-Job-in-Club, East Germany)
What this translates to is an explicit empowerment of young people to create spaces for experimentation, in which experiences of self-efficacy can be made. Thus, the issue of space extends to the issue of accessibility (choice) and individuality (recognition). This relates to the external qualities of spaces to represent cool places (Skelton and Valentine, 1998), laden with youth cultural value, as in the case of ArciRagazzi, which is an attractive place for young people to meet. Such projects can be used by young people for self-presentation, which is an important aspect of identity work and thereby closely related to subjective needs and interests as an important level of motivation (Stauber, 2004). But equally important is the recognition that these locations also seem to represent warm places: by providing reliable bonds and a warm and welcoming atmosphere these spaces can in some cases even replace missing families and become homes for young people during a certain phase of their transitions. This second meaning points to a deeper need of young people and represents the ‘holding’ component to cool places. This is reaffirmed by the way some projects offer opportunities for self-presentation, allowing young people to decide for themselves how far they want to become visible, and when they prefer to draw back into the secure backstage of the project. The fashion-shows related to the dressmaking training provided by the German project, La Silhouette, are a perfect symbol of this balance, with individual young women stepping into the spotlight, but cushioned by the bigger group. The combination of cool and warm places allows for the creation of ‘communities of practice’ in which the interplay of meaning, belonging and identity (Wenger, 1998) can even produce a stronger desire for engagement.
Responsibility
Giving young people possibilities to use and to shape space implies giving them a share of responsibility for what is happening in the project. This means taking them seriously, addressing them as adults. Participation in project-related decision-making, involving shared responsibility for the group, a common task or goal, enhances in young people a feeling of making a personal contribution and being socially important, which increases self-esteem. This is most obvious in the case of ArciRagazzi where young people at an early age can become project leaders and run a child recreation centre quasi autonomously. In this case, the task for the project workers is to keep a balance between transferring responsibility to the young people and maintaining a holding frame together with an encouraging atmosphere in which mistakes are accepted as learning experiences. Such a combination of personal challenge and a feeling of security encourages young people and motivates them to take on more responsibility with time.7 The motivating effect of approaches that balance self-responsibility and support is aptly described by a young woman from La Silhouette:
They help you to a certain degree, but you have to make your own contribution. Which means they don’t take the thing out of your hands. No matter how clumsy your problem is. (Dani, 20, female, Germany)
Taking responsibility does not necessarily have to be an individual project, but often happens in interaction between individuals and their social surroundings. Some projects consciously use this link between social responsibility and biographical meaningfulness by providing opportunities for social engagement: work with children, with the elderly, social engagement in communities, for example.
Trust and Reliability
I haven’t got this kind of relation either with my father or with my mother, I can’t talk about certain things with them … He [the project worker] made me see what it really means to listen to somebody. (Pamela, female, 21, ArciRagazzi)
In one way or another, all constellations of participation are based on and depend on relationships of trust. Trust is both a basic need, which grows with the experience of social marginalization, and a prerequisite for increased self-efficacy. First of all, this relates to the relationships between young people and project workers. Project workers often represent an alternative type of adult compared to teachers, employment officers and parents. This ‘otherness’ is based not only on professional habitus, for example as youth worker, but also on a ‘different’ socio-cultural orientation towards liberal,
alternative milieus, relying more on personal authority than on power. As Pamela’s quote shows, coming to know these project workers can represent a new intergenerational experience. But also at the peer-level of other participants, ‘significant others’ (Mead, 1934) represent a strong motivational force:
We were very supportive of each other all the time. There was a few of us went through different things during the project, you know, outside of work, and everyone was always involved in supporting each other. (Laura, female, 19, Lifting the Limits)
As mentioned above, in some cases projects could provide a substitute for missing families and become homes for young people during a certain phase of their transitions. Correspondingly, many young people referred to the projects they were involved in as a ‘family’. Nevertheless, reference to relationships of trust cannot simply be equated with harmony. On the contrary, in the context of de-standardized transitions it is rather unlikely that young people’s and project workers’ values, views and interests will converge. Allowing for a culture of conflict is therefore an important issue in most projects, and the respective learning experiences are a necessary prerequisite for participation in its project-, community- and biography-related dimensions.
Concluding Discussion: Motivators and Motivational Changes
To conclude our discussion, it first has to be stressed that those modes of participation, which were shown to be relevant for motivating disengaged young people, are not equally applied by the projects under discussion because of their different functions and positions within the transitional system and their level of equipment and resources. These structural differences are also reflected in the different dimensions through which participation is actually foreseen and facilitated by the projects. It is even more important to underline the relative strength of these projects, which in different aspects and constellations show what could be important contributions to a successful link between participation and motivation. A second observation is that, as far as the link between participation and motivation is concerned, the major common trait shared by the various participants’ experiences is the activation of a virtuous circle in which participatory and motivational aspects are interconnected with each other. However, the projects have different approaches to the two aspects which are regarded as driving forces for motivation – namely to start from individual needs and interests, and to enhance the probability of achieving self-chosen goals.
Some projects relate to and start from existing interests and desires expressed by young people; others pick up what remaining motivation young people have maintained despite a series of de-motivating experiences; others provide the opportunity for young people to access the space and develop their own interests. Some projects start out by addressing young people’s competencies and raise their feeling of self-efficacy through an advance in terms of trust. Also, the legitimate possibility of taking ‘time out’ may be experienced as breaking with past experiences of failure. And there are projects which, by providing different (material and immaterial) resources and support, increase the means which the concerned young women and men have at their disposal to reach their goals. The third observation is that most of the projects explicitly follow a biographical perspective, aiming at providing (and using) the chance of a ‘fresh start’, without ignoring the hindering/de-motivating factors of the transitional process to date. This new process starts with personal recognition, which gives young people a sense of their personal expertise and a general sense that they ‘matter’ (Amundson, 1998). It goes on to provide young people with spaces and opportunity structures by which they can find out what is subjectively meaningful to them in order to take subjectively relevant decisions (also ‘against’ the logic of institutions). This implies an awareness of structural conditions, namely the availability of means which are necessary for shaping their own trajectory (in terms of opportunities, resources and competencies). And it often implies an increased reflexivity regarding gender roles and respective scopes of agency. Most rewarding, in this regard are projects (such as Lifting the Limits) in which new competencies get certified and officially acknowledged. 
Getting in contact with meaningfulness in a biographical sense does not necessarily have to be an individual project, but often comes about in the interaction between individuals and their social surroundings. Personal counselling and coaching appear to be valuable resources in this respect, helping to incorporate individual experiences and learning steps into the broader framework of biographical development. This enables young women and men to develop what has been called ‘biographicity’: to acknowledge their biography to date, and to develop confidence in their biographical progress in future, while being open to biographical change (Alheit and Dausien, 1999). Biographicity can be regarded as crucial for motivational management, above all, if it includes reflection on personal limits and structural borders and leads to a sense of realism regarding the possible scope of one’s own efforts. In this way it becomes obvious that motivation is a prerequisite of (biographical) learning, whereas motivational change itself has to be seen as a learning process. Fourth, in all projects, the modes of participation that turned out to be motivators for young people go hand in hand with the perception of social
hold, allowing the achievement of a sense of belonging and relatedness: self-determination is not realized in an individualized way but is combined with companionship, support and reliable relationships. It is striking that all these participatory modes rely on this interactive dimension – regardless of whether they are more related to needs fulfilment or the expediency of goals. Choice is an issue of trust (in someone’s ability to make the right decisions) and thus implies a strong interactive component. The same applies to flexible outcomes. Focusing on individuality and strengths means trusting in young people’s abilities from the start, which shapes the framework of interaction from the beginning. Giving young people responsibility for project procedures and space links the motivational issues of increased self-efficacy and subjectively relevant goals, and puts them into the interactive framework of reliable relationships. It is this link between freedom and bonds, choice and hold, weak ties and strong ties (Granovetter, 1977), self-determination and reliable support which makes the difference both to the individualized situation which young people failed to cope with and the negative ascriptions they have experienced in the past. So, participatory approaches are shown to be the most sustainable motivators, as long as they include these soft qualities of social bonds and reliability. This last point has some theoretical consequences. Whereas the issue of biographical meaningfulness is not new in motivation theory, the importance of relatedness must be highlighted, since it is often underestimated. Through such an appreciation, we can come to a much more differentiated understanding of the motivation of young people (Ryan and Deci, 2000).
Perspectives
While focusing on the question of how to motivate young, disengaged people, our research has produced interrelated findings, which have some general relevance for theorizing youth transitions. Transitions have to be considered much more from the perspective of motivational careers, which demands a contextualized understanding of structural and biographical (de-)motivators. As regards the latter, active participation is revealed to be an important motivator, as long as participatory measures have a strong interactive dimension of relationships of trust, and as long as the activities they are involved in have biographical relevance for young people. These insights into the development of motivational careers – as the biographical motors of transitions – are basic, but nevertheless their latent potential is underestimated. Theoretical and, to an even greater extent, political discourses on changed demands of (youth) transitions often neglect the ‘soft’ prerequisites to cope with new transition-related demands; discourses on (lifelong) learning, competence building and (key) competences often overlook the unspectacular yet crucial components of subjective relevance and intersubjective support.
This is exactly what these projects under discussion have understood and why they are outstanding examples – despite some limitations – of best practice: late modern youth needs biographical orientation in connection with social hold and reliability – perhaps much more than previous generations. But the increasingly precarious financial situation these projects find themselves in after three years of evaluation shows that their approaches are far from being established. On the contrary, the specific qualities which enable them to make a difference in transition systems, and which in some countries represent the very spirit and soul of the projects, have in some cases already been abolished (e.g. Danish Open Youth Education) or are in acute danger of being reduced to mere pre-vocational education or vocational training (as in Northern Ireland and Germany) in the context of current workfare trends in transition policies.
Notes
1. The United Kingdom, Ireland, Portugal, Spain, Italy, Romania, Denmark, The Netherlands and, distinguished because of still existing differences, East and West Germany. The research (2002–04) was funded by the European Commission under the 5th Framework Program. For further information see www.iris-egris.de/yoyo.
2. The projects under discussion have been clustered into:
• youth work projects;
• projects focusing on the integration of youth at risk;
• projects focusing on preparation for training and work;
• projects which represent training and employment schemes; and
• projects with a highly participatory approach to training and employment.
Whereas the first two categories can be associated with the soft sectors of youth policies, the last three belong, via their link to training and employment, to the hard sector of transition policies.
3. For the fluid boundaries between intrinsic and extrinsic motivation, see the continuum model in Ryan and Deci (2000: 72).
4. See Erving Goffman’s idea of ‘career’, related to ‘any social strand of any person’s course through life’ (Goffman, 1968: 119), and also Bloomer and Hodkinson (2000), who have adopted it in their concept of learning careers.
5. For these classifications see Note 2.
6. The importance of such personal recognition was also apparent during the interviews. As soon as interviewees felt encouraged to present themselves as experts of their own life situation, a more participatory climate developed and the communication situation became more symmetrical.
7. This shows the difference between the right to take responsibility (including tools and opportunity structures) and the rights and responsibilities approach on which repressive workfare policies rely.
References
Abbott, Andrew (1997) ‘On the Concept of Turning Point’, Comparative Social Research 16: 85–105.
Alheit, Peter and Dausien, Bettina (1999) ‘Biographicity as a Basic Resource of Lifelong Learning’, paper presented at the European Conference Lifelong Learning inside and outside Schools, Bremen, 25–27 February, URL (consulted November 2005): http://www.erill.unibremen.de/lios/sections/s4_alheit.html
Amundson, Norman (1998) Active Engagement: Enhancing the Career Counselling Process. Richmond: Ergon Communications.
Bandura, Albert (1994) Self-efficacy: The Exercise of Control. New York: Freeman.
Bloomer, Martin and Hodkinson, Phil (2000) ‘Learning Careers: Continuity and Change in Young People’s Dispositions to Learning’, British Educational Research Journal 26(5): 583–97.
Cocks, Rachel J. and Watt, Helen M.G. (2004) ‘Relationships among Perceived Competence, Intrinsic Value and Mastery Goal Orientation in English and Maths’, Australian Educational Researcher 31(2): 81–112.
du Bois-Reymond, Manuela and Stauber, Barbara (2005) ‘Biographical Turning Points in Young People’s Transitions to Work Across Europe’, in Helena Helve and Gunilla Holm (eds) Contemporary Youth Research: Local Expressions And Global Connections, pp. 63–75. Aldershot: Ashgate.
EGRIS (European Group of Integrated Social Research) (2001) ‘Misleading Trajectories: Transition Dilemmas of Young Adults in Europe’, Journal of Youth Studies 4(1): 101–18.
Esping-Andersen, Gösta (1990) The Three Worlds of Welfare Capitalism. Cambridge: Cambridge University Press.
European Commission (2001) A New Impetus for European Youth. European Commission White Paper, URL (consulted August 2005): http://europa.eu.int/comm/dgs/education_culture/publ/pdf/youth-wb/en.pdf
Fornäs, Johan (1995) Cultural Theory and Late Modernity. London: Sage.
Gallie, Duncan and Paugam, Serge (eds) (2000) Welfare Regimes and the Experience of Unemployment in Europe. Oxford: Oxford University Press.
Giddens, Anthony (1991) Modernity and Self-Identity. Cambridge: Polity Press.
Goffman, Erving (1963) ‘On “Cooling the Mark Out”: Some Aspects of Adaptation and Failure’, in Arnold Rose (ed.) Human Behaviour and Social Processes, pp. 482–505. Boston, MA: Houghton Mifflin.
Goffman, Erving (1968) Asylums: Essays on the Social Situation of Mental Patients and Other Inmates. Harmondsworth: Penguin.
Granovetter, Marc (1977) ‘The Strength of Weak Ties’, American Journal of Sociology 78(6): 1360–80.
Heinz, Walter R. (2002) ‘Self-socialisation and Post-traditional Society’, in Richard A. Settersten and Timothy J. Owens (eds) New Frontiers of Socialisation, pp. 41–64. Oxford: Elsevier Science.
Hurrelmann, Klaus (2003) ‘Der entstrukturierte Lebenslauf. Die Auswirkungen der Expansion der Jugendphase’ (The De-standardized Life Course. Effects of the Expansion of Youth), Zeitschrift für Soziologie der Erziehung und Sozialisation (Journal for Sociology of Education and Socialization) 23(2): 115–26.
Kaufmann, Franz-Xaver (2003) Varianten des Wohlfahrtsstaats. Der deutsche Sozialstaat im internationalen Vergleich (Variations of Welfare State. The German Sozialstaat in International Comparison). Frankfurt: Suhrkamp.
Maehr, Martin L. and Meyer, Heather A. (1997) ‘Understanding Motivation and Schooling: Where We’ve Been, Where We Are, and Where We Need to Go’, Educational Psychology Review 9(4): 371–403.
Mead, George Herbert (1934) Mind, Self and Society. Chicago, IL: C.W Morris.
Rose, Nikolas (1999) Powers of Freedom: Reframing Political Thought. Cambridge: Cambridge University Press.
Ryan, Richard M. and Deci, Edward L. (2000) ‘Self-Determination Theory and the Facilitation of Intrinsic Motivation, Social Development, and Well-Being’, American Psychologist 55(1): 68–78.
Skelton, Tracey and Valentine, Gill (eds) (1998) Cool Places: Geographies of Youth Cultures. London and New York: Routledge.
Stauber, Barbara (2004) Junge Frauen und Männer in Jugendkulturen. Selbstinszenierungen und Handlungspotentiale (Young Women and Men in Youth Cultures. Performing Selves and Agency Potentials). Opladen: Leske and Budrich.
Stauber, Barbara (2006) ‘Biography and Gender in Youth Transitions’, in Manuela du Bois-Reymond and Lynne Chisholm (eds) The Modernization of Youth Transitions in Europe, New Directions for Child and Adolescent Development 113: 63–75.
Thomson, Rachel, Bell, Robert, Holland, Janet, Henderson, Sheila, McGrellis, Sheena and Sharpe, Sue (2002) ‘Critical Moments: Choice, Chance and Opportunity in Young People’s Narratives of Transition to Adulthood’, Sociology 36(2): 335–54.
Walther, Andreas (2006) ‘Regimes of Youth Transitions, Choice, Flexibility and Security in Young People’s Experiences Across Different European Contexts’, Young 14(2): 119–41.
Walther, Andreas, du Bois-Reymond, Manuela and Biggart, Andy (eds) (2006) Participation in Transition: Motivation of Young Adults in Europe for Learning and Working. Frankfurt: Peter Lang.
Walther, Andreas, Stauber, Barbara, Biggart, Andy, du Bois-Reymond, Manuela, Furlong, Andy, Lòpez Blasco, Andreu, Morch, Sven and Pais, José Machado (eds) (2002) Misleading Trajectories – Integration Policies for Young Adults in Europe? An EGRIS Publication, Opladen: Leske + Budrich.
Wenger, Etienne (1998) Communities of Practice. Learning, Meaning, and Identity. Cambridge: Cambridge University Press.
Wigfield, Allan and Eccles, Jacquelynne S. (2000) ‘Expectancy-value Theory of Achievement Motivation’, Contemporary Educational Psychology 25(1): 68–81.
Section IV: Research Design, Measurement and Statistics and Evaluation
67 Why P Values Are Not a Useful Measure of Evidence in Statistical Significance Testing Raymond Hubbard and R. Murray Lindsay
The most important task before us in developing statistical science is to demolish the P-value culture, which has taken root to a frightening extent in many areas of both pure and applied science, and technology. (Nelder, 1999, p. 261)
My personal view is that p-values should be relegated to the scrap heap and not considered by those who wish to think and act coherently. (Lindley, 1999, p. 75)
Source: Theory & Psychology, 18(1) (2008): 69–88.
Much empirical work in psychology focuses on hypothesis testing. The typical empirical paper develops, tests, and reports the results of a number of explicit hypotheses relating to the topic at hand. The outcomes of these hypothesis tests are said to contribute toward the creation of a body of knowledge within the discipline. For the most part, psychology researchers rely on p values from statistical significance tests when evaluating the merits of their hypotheses. Based on an annual random sample of issues from 12 American Psychological Association journals for the period 1990–2002, for example, Hubbard (2004) estimated that 94% of empirical papers used significance tests. Given their universality, it seems reasonable to presume that p values play an integral part in knowledge development. In addition, the ubiquity of p values strongly suggests that researchers are intimately familiar with their capabilities. But this is not always the case. Thus, for instance, many
investigators erroneously believe that the p value indicates the probability that (1) the results occurred because of chance, (2) the results are replicable, (3) the alternative hypothesis is true, (4) the results are important, and (5) the results will generalize. (For specific examples showing where each of these five misuses of p values may be found in the psychology literature, see Bakan, 1966; Carver, 1978; Cohen, 1994; Falk & Greenbaum, 1995; Gigerenzer, 1993; Gigerenzer, Krauss, & Vitouch, 2004; Krämer & Gigerenzer, 2005; Krantz, 1999; Krueger, 2001; Nickerson, 2000; Schmidt, 1996; and Thompson, 1999, among others.)1 This paper is not concerned with such misinterpretations of p values, damaging though they are. Rather, it examines the inherent problems associated with the p value as a plausible measure of evidence per se.
Although the origin of the modern p value is generally credited to Karl Pearson (1900), who introduced it in his χ² test (he actually called it the P, χ² test), it was Sir Ronald Fisher who was responsible for popularizing statistical significance testing and p values in the many editions of his classic books Statistical Methods for Research Workers and The Design of Experiments, first published in 1925 and 1935, respectively. Fisher used discrepancies in the data to reject the null hypothesis, that is, he calculated the probability of the data on a true null hypothesis, or Pr(x | H0). Formally, p = Pr(T(X) ≥ T(x) | H0). P is the probability of getting a test statistic T(X) greater than or equal to the observed result, T(x), in addition to more extreme ones, conditional on a true null hypothesis, H0, of no effect or relationship. (Disturbingly, Freund and Perles [1993] remark that differences in the definition of the p value abound in textbooks. See Good [1981], also.)
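The definition p = Pr(T(X) ≥ T(x) | H0) can be made concrete with a small numerical sketch. The example below is illustrative only and is not taken from the paper: it assumes the test statistic is T = |z|, the absolute value of a standard normal z statistic, and the function name `two_sided_p_value` is hypothetical.

```python
import math


def two_sided_p_value(z: float) -> float:
    """Return p = Pr(|Z| >= |z| | H0), where Z is standard normal under H0.

    Assumption (not from the paper): the test statistic is T = |z|.
    """
    # For a standard normal Z, Pr(|Z| >= a) = erfc(a / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# The three z values conventionally associated with p = .05, .01 and .001.
for z in (1.960, 2.576, 3.291):
    print(f"z = {z:.3f}  ->  p = {two_sided_p_value(z):.4f}")
```

The three z values shown give p values of roughly .05, .01, and .001. Note that the calculation conditions on H0 being true throughout; nothing in it refers to the probability of H0 itself.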
So, the p value is a measure of the (im)plausibility of the actual observations (as well as more extreme and unobserved ones) obtained in an investigation, assuming a true null hypothesis. The rationale is that if the data are seen as being rare or highly discrepant under H0, this constitutes inductive evidence against H0. The idea that rare occurrences comprise evidence against a hypothesis has a pedigree dating back to the first ‘significance test’ by John Arbuthnot in 1710 concerning the birth rates of males and females in London, and is continued in the work of Mitchell, LaPlace, and Edgeworth, among others. (See Baird, 1988 and Gigerenzer et al., 1989 for synopses of this early history of statistical testing.) Traditionally, a p value of .05 has been used as a benchmark to indicate inductive evidence against the null hypothesis, with values like p < .01, p < .001, etc., furnishing even stronger evidence against H0. Fisher (1959) considered the p value to be an objective way for researchers to assess the (im)plausibility of the null hypothesis: ... the feeling induced by a test of significance has an objective basis in that the probability statement on which it is based is a fact communicable to and verifiable by other rational minds. The level of significance in such cases fulfils the conditions of a measure of the rational grounds for the disbelief [in the null hypothesis] it engenders. (p. 43)
But a critical question remains: does the p value, in fact, provide an objective, useful, and unambiguous measure of evidence in hypothesis testing? We argue in this paper that it does not. More specifically, a review of the statistics literature points to several reasons – statistical, logical, the relative nature of evidence, etc. – why the p value fails visibly as a credible measure of evidence. Our premise is simple: that p values continue to saturate empirical work is taken as prima facie testimony that most psychology (and other) scholars – W. Edwards, Lindman, and Savage (1963), Gigerenzer and his colleagues (Gigerenzer, 1993; Gigerenzer et al., 2004; Gigerenzer & Murray, 1987; Gigerenzer et al., 1989), and Nickerson (2000) being notable exceptions – remain unaware of many of the reasons why this index is a defective measure of evidence. To illustrate this, even those who see value in statistical significance testing (e.g., Abelson, 1997; Chow, 1996, 1998; Cortina & Dunlap, 1997; Hagen, 1997; and Mulaik, Raju, & Harshman, 1997) simply never bring up, much less defend, the issue of the adequacy of the p value as a measure of evidence qua evidence. We hope that the present review will help to rectify this situation. As a secondary goal we propose, like Cohen (1994) and Loftus (1996), that instead of/along with reporting p values in individual studies, researchers should provide estimates of sample statistics, effect sizes, and their confidence intervals. We also stress, following Fisher, the importance of replication with extension research (the grist for meta-analyses) in developing a cumulative knowledge base. For comparisons of population estimates from this research, we recommend the criterion of overlapping confidence intervals. Sufficiently overlapping confidence intervals indicate reasonable estimates of the same population parameter.
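The overlapping-confidence-intervals criterion recommended above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the intervals are normal-approximation 95% intervals for a mean, the study values are invented, the helper names `mean_ci` and `intervals_overlap` are hypothetical, and the check only reports whether two intervals overlap at all (the paper does not define what counts as 'sufficiently' overlapping).

```python
import math
from typing import Tuple


def mean_ci(mean: float, sd: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% confidence interval for a mean (normal approximation)."""
    half_width = z * sd / math.sqrt(n)
    return (mean - half_width, mean + half_width)


def intervals_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical numbers for an original study and a replication.
original = mean_ci(mean=0.42, sd=1.0, n=100)
replication = mean_ci(mean=0.30, sd=1.0, n=150)
print(original, replication, intervals_overlap(original, replication))
```

Reporting the two intervals side by side keeps the effect estimates and their precision in view, which a bare pair of p values would not.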
Why P Values Are an Inadequate Measure of Evidence in Statistical Significance Testing
P Values Exaggerate the Evidence against the Null Hypothesis
This is the most damning criticism of the p value as a measure of evidence.
Two-Sided Nulls
P values exaggerate the evidence against a two-sided (point null or ‘small interval’) hypothesis (Berger, 1986; Berger & Sellke, 1987). An exact, or point null, hypothesis takes the form H0: θ = θ0 versus HA: θ ≠ θ0, where θ0 is a specific value of θ. More realistically, Berger and Delampady (1987) argue, exact hypotheses are better represented as tests such as H0: |θ – θ0| ≤ ε versus HA: |θ – θ0| > ε, where ε is ‘small’.
Using a Bayesian significance test for a normal mean, James Berger and Thomas Sellke (1987, pp. 112–113) showed that for p values of .05, .01, and .001, respectively, the posterior probabilities of the null, Pr(H0 | x), for n = 50 are .52, .22, and .034. For n = 100 the corresponding figures are .60, .27, and .045. Clearly these discrepancies between p and Pr(H0 | x) are pronounced, and cast serious doubt on the use of p values as reasonable measures of evidence. In fact, Berger and Sellke (1987) demonstrated that data yielding a p value of .05 in testing a normal mean nevertheless resulted in a posterior probability of the null hypothesis of at least .30 for any objective (symmetric priors with equal prior weight given to H0 and HA) prior distribution. It is important at this juncture to emphasize the distinction between the p value, Pr(x | H0), and the posterior probability of the null, Pr(H0 | x).2 The p value gives the probability of the observed (and more extreme) data conditional on a true null hypothesis. Even though it may sound similar, this is not the same thing as the probability of the null being true conditional on the observed data. There is an asymmetric relationship between Pr(x | H0) and Pr(H0 | x). Despite this, a number of psychologists, including Carver (1978), Cohen (1994), and Nickerson (2000), note that many researchers are confused over the meaning of the two expressions, and tend to view the p value as the probability that the null is true. Berger and Sellke (1987) put this succinctly: ‘Indeed, most nonspecialists interpret p precisely as Pr(H0 | x)’ (p. 114). Berger and Sellke’s (1987) research led them to conclude that p values can be highly misleading measures of evidence. That is, the use of p values makes it relatively easy to obtain statistically significant findings, such that p = .05 can indicate no evidence against H0. 
Researchers and practitioners, on the other hand, tend to interpret a .05 value as constituting much greater evidence against the null. Continuing in the same vein, Berger and Delampady (1987) found similar discrepant results between p values versus Pr(H0 | x) in both normal and binomial situations. This prompted them to recommend that formal use of p values should be abandoned when testing precise (point null and small interval) hypotheses, a conclusion supported by Nester (1996). And, of course, psychologists overwhelmingly test point null and small interval hypotheses. George Casella and Roger Berger (1987), however, showed that Berger and Sellke’s (1987) results for two-sided hypotheses do not necessarily extend to the one-sided testing problem. This outcome maintained hope for the efficacy of the p value as a measure of evidence, at least in more restricted circumstances. Casella and Berger believe that the p value is useful as a quick and crude inferential index. Berger and Sellke (1987) responded: Our basic view of the Casella–Berger article, however, is that it pounds another nail into the coffin of P values. To clarify why, consider what it is that makes a statistical concept valuable; of primary importance is that the concept must convey a well-understood and sensible message for the vast majority of problems to which it is applied. (p. 135)
Berger and Sellke find no such ‘well-understood and sensible message’ with respect to p values because they do not provide easily interpretable measures of evidence against H0 over the spectrum of everyday testing problems. Dickey (1987) agreed with Berger and Sellke’s position regarding the drawbacks of p values, while Dollinger, Kulinskaya, and Staudte (1996) found them wanting even as a measure of evidence for normal data in a one-sided testing context. And in any case, surely science requires more than the quick, crude, restrictive form of inference that Casella and Berger (1987) appear willing to settle for. In light of the above discussion, one would have to concur with Berger and Berry’s (1988) sobering opinion that there should be concern about the validity of research findings based on moderately small, including .05, p values.
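The posterior probabilities quoted earlier in this section can be approximated with a few lines of code. The sketch below is illustrative only: it assumes a two-sided test of a normal mean with known variance, equal prior weight on H0 and HA, and a N(θ0, σ²) prior on θ under HA. The prior choice and the function name `posterior_null` are assumptions made for this example, not a statement of the exact procedure in Berger and Sellke (1987).

```python
from math import exp, sqrt
from statistics import NormalDist


def posterior_null(p_value: float, n: int, prior_null: float = 0.5) -> float:
    """Pr(H0 | x) for a two-sided test of a normal mean with known variance.

    Assumption of this sketch: under HA, theta has a N(theta0, sigma^2) prior.
    """
    # |z| statistic that corresponds to the given two-sided p value.
    z = NormalDist().inv_cdf(1.0 - p_value / 2.0)
    # Bayes factor for H0: density of the sample mean under H0 divided by its
    # marginal density under HA (prior on theta integrated out).
    bayes_factor = sqrt(1.0 + n) * exp(-0.5 * z * z * n / (n + 1.0))
    posterior_odds = bayes_factor * prior_null / (1.0 - prior_null)
    return posterior_odds / (1.0 + posterior_odds)


for n in (50, 100):
    print(n, [round(posterior_null(p, n), 3) for p in (0.05, 0.01, 0.001)])
```

Under these assumptions the results round to the figures cited in the text (.52, .22, .034 for n = 50 and .60, .27, .045 for n = 100), which makes the gap between p and Pr(H0 | x) easy to explore for other sample sizes or prior weights.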
Frequentist ‘Calibration’ of P Values

It is conceivable that the work cited above, which raises serious doubts about the usefulness of p values, may be ignored or dismissed by mainstream (Neyman–Pearson) frequentist statisticians because of its ‘subjective’ Bayesian orientation (see Neyman, 1977). But what if p values are found wanting as a measure of evidence among those espousing ‘objective’ relative frequency approaches to statistical testing? Here, Sellke, Bayarri, and Berger’s (2001) findings should serve as a salutary warning even to entrenched (Neyman–Pearson) frequentists. To fully appreciate the importance of this issue requires some background information, supplied below. It is not understood by many researchers that in classical statistical testing there are two, quite different, measures of ‘statistical significance.’ One is Fisher’s p value, which is an inferential index of the strength of the evidence against H0, is a data-based random variable, and is applicable to individual studies. The other is the α level from a Neyman–Pearson hypothesis test. This test is concerned with minimizing Type II, or β, errors (i.e., false acceptance of a null hypothesis) subject to a bound on Type I, or α, errors (i.e., false rejections of a null hypothesis). In addition, α is a prescription for behaviors (accepting or rejecting H0), not a means of assessing evidence; is a pre-selected fixed measure, not a random variable; and applies only to long-run repeated random sampling from the same population, not to single experiments (Hubbard, 2004). The Neyman–Pearson theory of hypothesis testing, with α as the significance level, is generally accepted as constituting frequentist statistical orthodoxy (Hogben, 1957; Nester, 1996; Royall, 1997).3 So the Neyman–Pearson model is the one typically portrayed in statistics textbooks.
Conversely, social science methods texts, in a misguided attempt to present a single, unified model of statistical testing, have tended to anonymously mix together the two incompatible measures of statistical significance, p’s and α’s. Needless to say, this has resulted in massive confusion among members of the scholarly
community about exactly what ‘statistical significance’ means – is it denoted by a p value, an α level, and/or the ubiquitous p < α criterion (Hubbard & Armstrong, 2006)? The upshot is that many researchers (e.g., Bayarri & Berger, 1999, 2000, 2004; Berger, 2003; Berger & Sellke, 1987; Gigerenzer, 1993; Goodman, 1993, 1999; Hubbard, 2004; Hubbard & Bayarri, 2003a, 2003b, 2005) state that the p value is routinely misinterpreted as a frequentist Type I error probability. An empirical literature in which p values and α levels are erroneously seen to be interchangeable, but in which investigators overwhelmingly report p’s rather than the α’s required of Neyman–Pearson frequentist orthodoxy (see Hubbard, 2004), sets the backdrop for Sellke et al.’s (2001) study. As seen above, Berger and his colleagues had already shown the p value to be a poor gauge of evidence in a Bayesian context. They now wanted to determine if p values are useful measures of evidence against H0 when considered from a Neyman–Pearsonian perspective. Accordingly, Sellke et al. (2001) devised a method for ‘calibrating’ p values so that they can be interpreted as Neyman–Pearson frequentist error probabilities.4 The end result of this calibration is as follows:

$$\alpha(p) = \left(1 + [-e\,p\,\log(p)]^{-1}\right)^{-1}$$

where log denotes the natural logarithm. Consequently, p = .05 translates into frequentist error probability α(.05) = .289 in rejecting H0 – a result suggesting no evidence against H0. Even α(.01) = .111. These findings convey in a non-Bayesian manner the severe problems involved in using p values as credible measures of evidence against the null hypothesis.
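The calibration formula is simple enough to evaluate directly. The following is a minimal sketch; the function name `calibrate` is an illustrative choice, and the guard restricting p to values below 1/e is an assumption of this example (the quantity −e p log p is usually presented as a bound only in that range).

```python
from math import e, log


def calibrate(p: float) -> float:
    """Calibrated error probability alpha(p) = (1 + [-e p log(p)]^-1)^-1."""
    # Assumption of this sketch: only p values below 1/e are accepted.
    if not 0.0 < p < 1.0 / e:
        raise ValueError("expected 0 < p < 1/e")
    bound = -e * p * log(p)  # natural logarithm
    return 1.0 / (1.0 + 1.0 / bound)


for p in (0.05, 0.01, 0.001):
    print(p, round(calibrate(p), 3))
```

For p = .05 and p = .01 this reproduces the values .289 and .111 given in the text.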
Frequentist Performance of P Values

As reported in a number of studies (e.g., Berger, 2003; Hubbard & Bayarri, 2003a; and especially Sellke et al., 2001), a simulation of the frequentist performance of p values is revealing. Whereas α’s can be constrained to some pre-assigned (e.g., .05) level, p values share no similar obligation. That is, p’s do not behave in the frequentist manner of α’s. This is dramatically illustrated by accessing an applet at www.stat.duke.edu/~berger, which permits a simulation of the frequentist properties of p values. As an example, suppose we wish to conduct some tests on the effectiveness of a new psychotherapy, P-T. The statistical test would be H0: P-T = 0 versus HA: P-T ≠ 0. The simulation, based on a long series of such tests on normal data (variance known), records how often H0 is true for p values in given ranges, say p approximately equal to .05 or .01. Otherwise expressed, this frequentist simulation of the behavior of p values demonstrates that even when we obtain ‘statistically significant’ outcomes near the .05 or .01 levels, these results often arise from true null hypotheses of no effect or association. More specifically, assuming that one-half of the null hypotheses in the P-T tests are true, Sellke et al. (2001, p. 63) warned that:
1. Of the subset of P-T tests for which the p value is close to the .05 level, at least 22% (and typically about 50%) come from true nulls.
2. Of the subset of P-T tests for which the p value is close to the .01 level, at least 7% (and typically about 15%) come from true nulls.5

As Berger (2003) understated the case: ‘The harm from the common misinterpretation of p = 0.05 as an error probability is apparent’ (p. 4). A p value of .05 may provide no evidence against the null hypothesis.
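A rough stand-in for the kind of simulation described above can be written in a few lines. This is a sketch under stated assumptions, not a reconstruction of the applet: the test statistic is a single z value with unit variance, half of the nulls are true, and every false null has the same standardized effect. The parameter `effect` and the function name `share_of_true_nulls` are hypothetical choices for this example.

```python
import random
from math import erfc, sqrt


def share_of_true_nulls(n_tests: int = 400_000, prior_null: float = 0.5,
                        effect: float = 2.0, p_low: float = 0.049,
                        p_high: float = 0.05, seed: int = 1) -> float:
    """Among tests whose p value falls in [p_low, p_high], the share with H0 true."""
    rng = random.Random(seed)
    selected = 0
    selected_true_null = 0
    for _ in range(n_tests):
        null_is_true = rng.random() < prior_null
        # Mean of the z statistic: 0 under H0, `effect` under HA (an assumption).
        z = rng.gauss(0.0 if null_is_true else effect, 1.0)
        p = erfc(abs(z) / sqrt(2.0))  # two-sided p value for a z statistic
        if p_low <= p <= p_high:
            selected += 1
            selected_true_null += null_is_true
    return selected_true_null / selected


print(share_of_true_nulls())
```

An effect of about two standard errors is close to the least favorable case for the null, so the share of true nulls should land near the lower end of the range quoted above; larger or smaller values of `effect` should push it upward. The p-value window of .049 to .05 follows the example given in note 5.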
P Values and Sample Size

P Values and Small Versus Large Samples

Sample size is hugely influential in determining significance levels. Royall (1986), for example, cites well-known statisticians whose interpretations of p values in small versus large sample studies are totally contradictory: some argue that a given p value in a small sample study is stronger evidence against H0 than the same p value in a large sample study, and vice versa. As such, a given p value does not have a fixed, objective meaning – it is contingent upon (at least) the sample size. Indeed, as Marden (2000) points out, the p value is not very useful with large sample sizes. Because almost no null hypothesis is exactly true (Tukey, 1991), when sample sizes are large enough almost any null hypothesis will have a tiny p value. Hand’s (1998) concerns about the relevance of significance testing in data-mining situations, where every p value will be statistically significant to several zeros following the decimal point, are simply Marden’s observation writ bold.
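Marden’s point can be illustrated numerically. The sketch below assumes a one-sample z test with known unit variance and, for simplicity, a sample mean that sits exactly at a tiny true effect of 0.01 standard deviations; both the effect size and the function name `two_sided_p` are hypothetical.

```python
from math import erfc, sqrt


def two_sided_p(effect_in_sd: float, n: int) -> float:
    """Two-sided p value when the sample mean equals `effect_in_sd` exactly."""
    z = effect_in_sd * sqrt(n)  # z statistic for a one-sample test, unit variance
    return erfc(abs(z) / sqrt(2.0))


# The same negligible effect at increasing sample sizes.
for n in (100, 10_000, 1_000_000):
    print(n, two_sided_p(0.01, n))
```

The effect never changes, yet the p value moves from clearly nonsignificant to vanishingly small purely because n grows.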
Lindley’s ‘Paradox’

Moreover, the problems with p values and sample sizes do not end here. We must consider also Lindley’s ‘paradox’ (Lindley, 1957). He showed that for any level of significance, p, and for any nonzero prior probability of the null hypothesis, Pr(H0), a sample size can be found such that the posterior probability of the null, Pr(H0 | x), is 1 – p. That is, a null hypothesis that is soundly rejected at, say, the .05 level by a Fisherian significance test can nevertheless have 95% support from a Bayesian viewpoint. That these inferences are diametrically opposed is the paradox. The rationale behind this conundrum, Johnstone (1986) explains, is that no matter how small the p value, the likelihood ratio Pr(x | H0)/Pr(x | HA) approaches infinity as the sample size increases. Consequently, for large n, a small p value can actually be interpreted as evidence in favor of H0 rather than against it. The question of the objectivity and usefulness of the p value as a measure of evidence is shattered by this argument.
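The paradox can be made concrete by holding the p value fixed and letting n grow. The sketch below assumes a two-sided test of a normal mean with known variance and a N(θ0, σ²) prior on θ under HA; that prior, the equal prior weights, and both function names are assumptions of this illustration rather than Lindley’s own setup.

```python
from math import exp, sqrt
from statistics import NormalDist


def posterior_null(p_value: float, n: int, prior_null: float = 0.5) -> float:
    """Pr(H0 | x) when the data sit exactly at the given two-sided p value."""
    z = NormalDist().inv_cdf(1.0 - p_value / 2.0)
    # Bayes factor for H0 under the assumed N(theta0, sigma^2) prior on theta.
    bayes_factor = sqrt(1.0 + n) * exp(-0.5 * z * z * n / (n + 1.0))
    odds = bayes_factor * prior_null / (1.0 - prior_null)
    return odds / (1.0 + odds)


def smallest_n_supporting_null(p_value: float, prior_null: float = 0.5) -> int:
    """Smallest n at which Pr(H0 | x) reaches 1 - p despite 'rejection' at level p."""
    n = 1
    while posterior_null(p_value, n, prior_null) < 1.0 - p_value:
        n += 1
    return n


print(smallest_n_supporting_null(0.05))
```

Under these assumptions a result that is ‘significant at .05’ gives the null 95% posterior support once the sample reaches the order of tens of thousands of observations.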
Experimental Designs and P Values

How different investigators might conceive the planning and execution of a study can also lead to p values with widely varying magnitudes. As an example of this, let us examine Fisher’s (1935, ch. 2) classic experiment of the ‘lady tasting tea,’ as described by Lindley (1993). The lady in question claimed she could distinguish whether milk or tea had been poured first into a cup (of tea). In the experiment, the lady is presented with six pairs of cups of tea, and she must determine whether milk or tea entered the cup first. The null hypothesis – that she cannot, in fact, discriminate – is that she would guess 50% right (R) and 50% wrong (W). Suppose that she gets the first five results right and the last one wrong, or RRRRRW. The p value for this outcome, Lindley notes, is 7(½)^6, or .110, which is not statistically significant at the .05 level. This p value, like all of them, consists of two parts. In this case: 6(½)^6 = .094 (probability of observed outcomes) + (½)^6 = .016 (probability of more extreme outcomes). The justification for the inclusion of the latter in the calculation of p values is given in a later section of the paper. Suppose instead of the above design, another researcher decides to repeat the experiment until the lady makes her first mistake. In this case, and with the same RRRRRW data, the p value is now statistically significant at the .032 level [(½)^6 + (½)^6 = .016 + .016 = .032]. The two parts of this p value are explained as follows: (½)^6 = .016 (probability of observed outcomes) – but without this expression being multiplied by 6 because the mistaken choice, W, must always come at the end (see, e.g., Goodman, 1999) – + (½)^6 = .016 (probability of more extreme outcomes). Of course, these experimental results make no sense. The exact same data, obtained in the exact same sequence, should yield the exact same p values. But they do not.
And all because two different investigators held alternate conceptions as to how the experiment should be run.
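The two p values for the same RRRRRW record can be checked with exact arithmetic. In this sketch the function names are illustrative; the calculation itself follows the two designs described above (six fixed trials versus stopping at the first mistake).

```python
from fractions import Fraction
from math import comb


def p_fixed_trials(n_trials: int = 6, n_wrong: int = 1) -> Fraction:
    """P(at most n_wrong mistakes in n_trials) when every answer is a 50/50 guess."""
    favorable = sum(comb(n_trials, k) for k in range(n_wrong + 1))
    return Fraction(favorable, 2 ** n_trials)


def p_stop_at_first_mistake(first_mistake_trial: int = 6) -> Fraction:
    """P(first mistake occurs on the given trial or later) under guessing."""
    observed = Fraction(1, 2 ** first_mistake_trial)      # five R's, then W
    more_extreme = Fraction(1, 2 ** first_mistake_trial)  # at least six R's in a row
    return observed + more_extreme


print(p_fixed_trials(), float(p_fixed_trials()))
print(p_stop_at_first_mistake(), float(p_stop_at_first_mistake()))
```

The exact values are 7/64 ≈ .109 and 2/64 ≈ .031; the figures .110 and .032 in the text come from adding the rounded components. Either way, identical data fall on opposite sides of the .05 threshold depending only on the stopping rule.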
Effect Sizes and P Values

One must surely question the p value as a measure of evidence when it has nothing to say about the effect size obtained in a study (Gelman & Stern, 2006). For instance, a small sample study with a large effect can yield the same p value as a large sample study with a small effect size. To illustrate this, consider Freeman’s (1993) hypothetical medical trials in which all patients receive both treatments A and B and are asked to express their preference (see Table 1). The results of trial 1, with its 75% preference rate for A over B, would be considered as indicative of a potentially enormous preference for A. Trial 4, on the other hand, with a 50.07% preference rate for A, would be regarded as overwhelming evidence that preferences for A versus B are all but identical. Very few researchers would view the results of these four trials as being equivalent, yet they all produce a p value of .041. (Freeman does not specify which particular statistical test was used in making these comparisons.) Gibbons’ (1986) assertion, therefore, in an article titled ‘P-Values’, that ‘An investigator who can report only a P value conveys the maximum amount of information contained in the sample...’ (p. 367) is seen to be simply not credible. Far from conveying such information, Berger, Boukai, and Wang (1997) caution that the interpretation of p values will change drastically from problem to problem. Contrary to Fisher’s claims, the p value is not an objective measure of evidence against a hypothesis, a topic that is pursued below.

Table 1

Trial   No. preferring A   No. preferring B   % preferring A
1                     15                  5            75.0
2                    114                 86            57.0
3                  1,046                954            52.3
4              1,001,455            998,555            50.07
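The pattern in Table 1 can be explored with a simple calculation. Because the test behind the reported p value is not specified, the sketch below assumes a two-sided sign test using the normal approximation with a continuity correction, applied to the counts as printed. The function name `sign_test_p` is hypothetical, and under these assumptions the results need not match .041 exactly.

```python
from math import erfc, sqrt

# Counts as printed in Table 1: (trial, number preferring A, number preferring B).
TRIALS = [(1, 15, 5), (2, 114, 86), (3, 1_046, 954), (4, 1_001_455, 998_555)]


def sign_test_p(n_a: int, n_b: int) -> float:
    """Two-sided p value for H0: preference for A is 50% (normal approximation)."""
    n = n_a + n_b
    # Continuity-corrected distance of the count from n/2, in standard deviations.
    z = max(abs(n_a - n / 2.0) - 0.5, 0.0) / (sqrt(n) / 2.0)
    return erfc(z / sqrt(2.0))


for trial, n_a, n_b in TRIALS:
    share_a = 100.0 * n_a / (n_a + n_b)
    print(f"trial {trial}: {share_a:.2f}% prefer A, p = {sign_test_p(n_a, n_b):.3f}")
```

With this assumed test the four p values should fall in the same narrow neighborhood, while the observed preference for A ranges from 75% down to barely above 50%, which is the point of the example.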
P Values and Subjectivity

A further example of the fallibility of the p value as an objective measure of evidence is seen in the choice of one-sided versus two-sided statistical significance tests (Goodman & Royall, 1988; Royall, 1997). Although two-sided tests are the norm, researchers are sometimes advised that if they expect a departure from H0 in a specific direction they can halve the p value, say from .05 to .025. That is, Goodman and Royall (1988) comment, even though the data are the same, the p value is altered by the researcher’s subjective impressions about the likely outcome of the study. They also note that similar changes to p values occur when the research involves multiple comparisons.
P Values Are Logically Flawed

P Values Are Logically Flawed Measures of Support for Hypotheses

Schervish (1996) demonstrated that p values fail to meet the simple logical condition required by a measure of support, namely, that if hypothesis H implies hypothesis H′, there should be at least as much support for H′ as there is for H. In the course of this work, he lamented that he had been
unable to construct a consistent interpretation of the p value as anything resembling a measure of support for a hypothesis even in simple, much less multiparameter, problems. Schervish warned that ‘common informal use of P values as measures of support or evidence for hypotheses has serious logical flaws’ (p. 203). Further, because they are not as different as they might have seemed (i.e., point null and one-sided hypotheses are, in fact, at opposite ends of a continuum of hypotheses spanned by interval hypotheses), Schervish argued that the interpretation of the p value as a measure of evidence should be consistent across the different hypotheses tested – point null, one-sided, and interval. This, of course, is not the case. Thus, Schervish’s research supports the claim of Berger and Sellke (1987) and Bayarri and Berger (2000) that the p value is not amenable to a reasonably objective interpretation as evidence over the spectrum of testing problems. And this, together with much other information presented in this paper, runs counter to Frick’s (1996) claim that a p value creates a common measure of strength of evidence across statistical tests.
The P Value Computes Not the Probability of the Observed Data under H0, But This Plus the Probability of More Extreme Data

This is a major weakness regarding the usefulness of p values. Because they are defined as a procedure for establishing the probability of an outcome, as well as more extreme ones, on a null hypothesis, significance tests are affected by how the probability distribution is spread over unobserved outcomes in the sample space. That is, the p value denotes not only the probability of what was observed, but also the probabilities of all the more extreme events that did not arise. How is it that these more extreme, unobserved, cases are involved in calculating the p value? To find out, we revisit Lindley’s (1993) analysis of the ‘lady tasting tea’. Recall the lady was right (R) about the outcomes of the first five experiments, and wrong (W) about the sixth, i.e., RRRRRW. This result has probability (½)^6, or a statistically significant p value of .016. But, Lindley continues, Fisher saw the flaw in this argument because every possible result with the six pairs of cups is significant at p = .016. To guard against this, Fisher proposed that any result where just one W occurs out of six supports the lady’s ability to discriminate, and should be included in the calculation of the p value. There are six possibilities, including RRRRRW, so the p value is now 6(½)^6 = .094, which is not significant. Fisher’s significance rationale is no longer the p value for a given outcome on a true null hypothesis, but that and similar outcomes; in our case, one mistake in six taste tests. He was aware, however, that this situation also was not feasible. This is because the most likely result in the ‘which comes first, milk or tea’ taste test is sheer guessing: 50% R and
50% W. For example, Lindley asserts, for 128 taste tests (64 R, 64 W) the p value – C(128, 64)(½)^128 – is approximately .05. But this brings us back to square one; if this result is the most likely, then all other outcomes have a smaller probability. That is, all 128 taste tests will be significant at the p = .05 level. In order to circumvent this issue, Fisher suggested that if one error in six (RRRRRW) is significant, more extreme outcomes, such as no mistakes at all (RRRRRR), must necessarily be significant. Therefore, these more extreme results should be incorporated when calculating the p value. For the outcome RRRRRW, with probability (½)^6 or p = .016, there are five others (RRRRWR, RRRWRR, etc.) as extreme, and one (RRRRRR) more extreme, so the overall probability is 7(½)^6 = .110, which is not significant. And this p value has two components: 6(½)^6 = .094 (probability of observed data) plus (½)^6 = .016 (probability of more extreme data). This p = .110 is, of course, the same value cited earlier. Many statisticians (e.g., Berger & Berry, 1988; Berger & Delampady, 1987; Freeman, 1993; Goodman, 1999; Royall, 1997) charge that a valid measure of strength of evidence cannot be dependent on the probabilities of unobserved outcomes. Jeffreys (1939) acknowledged this illogic in p values: What the use of P implies ... is that a hypothesis that may be true may be rejected because it has not predicted observable results that have not occurred. This seems a remarkable procedure. (p. 316)
Royall (1997) insists that there is no value to Fisherian significance tests because they are at odds with the law of likelihood and its implication of the ‘irrelevance of the sample space’ (p. 68). As he explains: The law of likelihood says that the evidence in an observation, X = x, as it pertains to two probability distributions labeled θ1 and θ2, is represented by the likelihood ratio, f(x; θ1) / f(x; θ2). In particular, the law implies that for interpreting the observation as evidence for hypothesis H1: θ = θ1 vis-à-vis H2: θ = θ2, only the likelihood ratio is relevant. What other values of X might have been observed, and how the two distributions in question spread their remaining probability over the unobserved values is irrelevant – all that counts is the ratio of the probabilities of the observation under the two hypotheses. (p. 22)
Or as Freeman (1993) says, echoing Birnbaum’s (1962) and A.W.F. Edwards’ (1992) seminal contributions: … the likelihood principle is the one secure foundation for all statistics. I find the arguments in favour of it compelling and the counterarguments unconvincing. Since p-values and all other frequentist methods violate this principle, they must necessarily be unsatisfactory tools of statistical inference. (pp. 1444 –1445)6
Specification of an Alternative Hypothesis

Evidence Is Relative

When an alternative hypothesis is specified, it is possible to identify those outcomes as extreme or more so than the observed event. Consequently, Royall (1997) states, it is not low probability under A that makes an observation evidence against A. Rather, it is low probability under A compared with the probability under a different hypothesis B, and this makes it evidence against A versus B. This line of reasoning necessitates a weighing of the evidence between two rival hypotheses, a situation impossible in Fisherian significance tests, where there is only the null hypothesis. Fisher never saw the need for an alternative hypothesis, and vigorously opposed its later inclusion by Jerzy Neyman and Egon Pearson (Gigerenzer & Murray, 1987; Hubbard & Bayarri, 2003a). Note, then, Johnstone’s (1986) observation that the law of likelihood provides a better measure of evidence than p values for evaluating the plausibility of two (or more) rival hypotheses.7 More specifically, if the likelihood ratio Pr(x | H0)/Pr(x | HA) exceeds 1, then the evidence is in favor of H0 over HA, and vice versa. Unfortunately, Fisher’s disjunction only addresses Pr(x | H0); it is silent about Pr(x | HA). The p value is a tail-area probability and not a likelihood ratio.
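To see how a likelihood ratio behaves where the p value does not, the tea-tasting record RRRRRW can be evaluated under two simple hypotheses. The sketch below is illustrative: the alternative value of 0.9 for the probability of a correct answer is a hypothetical choice (the text names no specific alternative), and `likelihood_ratio` is an invented helper.

```python
from math import comb, isclose


def likelihood_ratio(n_right: int, n_wrong: int, theta_null: float = 0.5,
                     theta_alt: float = 0.9, multiplicity: int = 1) -> float:
    """Pr(x | H0) / Pr(x | HA) for a record with the given numbers of R and W.

    `multiplicity` is the design-dependent count of orderings; it appears in
    both numerator and denominator, so it cancels.
    """
    def likelihood(theta: float) -> float:
        return multiplicity * theta ** n_right * (1.0 - theta) ** n_wrong

    return likelihood(theta_null) / likelihood(theta_alt)


fixed_six_trials = likelihood_ratio(5, 1, multiplicity=comb(6, 1))
stop_at_first_mistake = likelihood_ratio(5, 1, multiplicity=1)
print(fixed_six_trials, stop_at_first_mistake)
print(isclose(fixed_six_trials, stop_at_first_mistake))
```

The ratio is below 1, so under this assumed alternative the data favor HA over H0, and the value is the same for both experimental designs. That is the ‘irrelevance of the sample space’ in miniature: the stopping rule that changed the p value leaves the likelihood ratio untouched.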
Our Interest Is in the Alternative Hypothesis

Specifying an alternative hypothesis is not just a means of covering values more extreme than those observed on a null hypothesis. The alternative (research) hypothesis is the one the investigator is interested in. Berkson (1942) recognized this when posing an early challenge to Fisher’s paradigm of null hypothesis testing: In the null hypothesis schema we are trying only to nullify something . . . . But ordinarily evidence does not take this form. With the corpus delicti in front of you, you do not say, ‘Here is evidence against the hypothesis that no one is dead’. You say, ‘Evidently, someone has been murdered’. (p. 326)
For statistical tests to be scientifically useful they should speak to the research hypothesis, and not be fixated with rejection of the null hypothesis. This is consistent with Goodman and Royall’s (1988) complaint that p values blinker us into thinking that a hypothesis can only be weakened, rather than strengthened, by the data. But Fisher’s methodology denies the existence of an alternative/research hypothesis. In this matter, it is sometimes thought that Fisherian significance testing has an implicit alternative hypothesis that is simply the complement of the null. But, as Hubbard and Bayarri (2003a) point out, this is difficult to formalize. For instance, what is the complement of an N (0, 1) model? Is it the mean differing from 0, the variance differing
from 1, the model not being Normal? Formally, Fisher only had the null model in mind, and wanted to see if the data were compatible with it.
Confidence Intervals, Not P Values

The foregoing discussion makes it clear that p values are neither objective nor credible measures of evidence in statistical significance testing. Moreover, the authenticity of many published studies with p < .05 findings must be called into question. Rather than the preoccupation with p values and testing, the goal of empirical research in individual studies should be the estimation of sample statistics, effect sizes, and the confidence intervals (CIs) surrounding them. CIs underscore the superiority of estimation over testing. Scientific advance typically necessitates plausible estimates of the magnitude of effect sizes in the population (A.W.F. Edwards, 1992; Lindsay, 1995), and the CI provides this. CIs also indicate the precision or reliability of the estimate via the width of the interval. Also, because they are in the same metric as the point estimate, CIs make it easier to see whether the results are substantively, rather than statistically, significant. And, of course, a CI can be used as a significance test; a 95% CI not including the null value is equivalent to rejecting the hypothesis at the .05 level. Furthermore, initial results need to be replicated and extended. Here again, CIs assume a pivotal role. Specifically, we advocate the criterion of overlapping CIs around point estimates across similar studies as a measure of replication success. Substantially overlapping CIs suggest tenable estimates of the same population parameter, and we applaud the very useful recent work in this area (e.g., Cumming & Finch, 2001, 2005; Fidler, Thomason, Cumming, Finch, & Leeman, 2005; Goldstein & Healy, 1995; Schenker & Gentleman, 2001; Schmidt, 1996; Smithson, 2003; Thompson, 2002; Tryon, 2001).8 It is the systematic replication and extension of the results of previous studies, and not p values from individual ones, that fosters cumulative knowledge development.
That this statement appears to have eluded many applied researchers, as well as editors and reviewers, is puzzling because Fisher (1966) himself put only provisional stock in statistically significant results from single studies: ‘we thereby admit that no isolated experiment, however significant in itself, can suffice for the experimental demonstration of any natural phenomenon’ (p. 13). Fisher was a major proponent of replication: ‘Fisher had reason to emphasize, as a first principle of experimentation, the function of appropriate replication in providing an estimate of error’ (Fisher Box, 1978, p. 142). Indeed, Fisher Box (1978) insinuates that Fisher coined the term ‘replication’: ‘The method adopted was replication, as Fisher called it; by his naming of what was already a common experimental practice, he called attention to its functional importance in experimentation’ (p. 142). Fisher (1966) encouraged in particular the importance of replication with extension research: ‘we may, by
deliberately varying in each case some of the conditions of the experiment, achieve a wider inductive basis for our conclusions, without in any degree impairing their precision’ (p. 102). It is easy, therefore, to imagine Fisher agreeing with the sentiments put forward in both the psychology (e.g., Falk, 1998; Hubbard, 2004; Neuliep & Crandall, 1990, 1993; Rosenthal, 1990; Rosnow & Rosenthal, 1989; Sohn, 1998; Thompson, 1994) and statistics (e.g., Bayarri & Mayoral, 2002; Chatfield 1995; Ehrenberg & Bound, 1993; Guttman, 1977; Lindsay & Ehrenberg, 1993; Nelder, 1986, 1999; Ottenbacher, 1996; Rosenbaum, 1999, 2001) disciplines that there is an urgent need for more replication with extension research.
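The role this section assigns to confidence intervals, both as an estimate with a visible precision and as a replication check, can be sketched as follows. The data are made up for illustration, the interval is a large-sample normal approximation (a t-based interval would be wider for samples this small), and the helper names `mean_ci`, `excludes`, and `overlap` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(sample, level: float = 0.95):
    """Normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * stdev(sample) / sqrt(len(sample))
    center = mean(sample)
    return center - half_width, center + half_width


def excludes(ci, null_value: float = 0.0) -> bool:
    """True when the interval leaves out the null value (rejection at 1 - level)."""
    low, high = ci
    return not (low <= null_value <= high)


def overlap(ci_a, ci_b) -> float:
    """Length of the region shared by two intervals (0.0 if they are disjoint)."""
    return max(0.0, min(ci_a[1], ci_b[1]) - max(ci_a[0], ci_b[0]))


# Made-up effect measurements from an original study and a replication.
original = [0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2, 1.0]
replication = [0.9, 1.2, 1.0, 1.4, 1.1, 0.8, 1.3, 1.1]

ci_original = mean_ci(original)
ci_replication = mean_ci(replication)
print(ci_original, excludes(ci_original))
print(ci_replication, overlap(ci_original, ci_replication))
```

Unlike a bare p value, each interval reports the size of the effect and its precision in the original metric, and the shared region gives a direct, if informal, reading of whether the two studies are estimating the same quantity.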
Conclusions

Over the last few decades a considerable literature has emerged in psychology critical of the misuse of statistical significance testing. Much of the literature has dealt with how researchers invest these tests with far greater capabilities than they possess. Moreover, this frequently involves gross misinterpretations of the meaning of p values. Works like these are to be welcomed. During this same time, however, little has appeared in psychology (or elsewhere in the social sciences) about the severe limitations of the p value as a measure of evidence per se. In other words, it is bad enough for researchers to misuse a measure that is useful, but it strains credulity to do so when that measure is seriously flawed in itself. And this paper has demonstrated – from a multitude of perspectives – that the p value is just that. Hence Nelder’s (1999) call to ‘demolish’ the p value culture. In concluding, we note that there is more than a hint of irony in the fact that Fisher’s sanctioning of the vital role of replication has been overlooked, while at the same time his widely misunderstood and defective p values blanket the empirical literature. This has occurred, even though, as Steiger (1990) expressed: ‘An ounce of replication is worth a ton of inferential statistics’ (p. 176). It is past time to redress this imbalance. Accordingly, we hope that the present paper will help stimulate further public discussion on methods of data analysis and knowledge development within the field.
Notes

1. These works, particularly Nickerson’s (2000) tour de force, also offer excellent reviews of the statistical significance testing controversy in psychology.
2. From Bayes’ theorem, the posterior probability of the null hypothesis using our terminology is calculated as follows:

$$\Pr(H_0 \mid x) = \frac{\Pr(x \mid H_0)\Pr(H_0)}{\Pr(x \mid H_0)\Pr(H_0) + \Pr(x \mid H_A)\Pr(H_A)}$$

Readers are referred to several articles in the psychology literature making use of this formula (e.g., Cohen, 1994; Falk & Greenbaum, 1995; Hagen, 1997; Nickerson, 2000).
3. Fisher is also a frequentist in the sense that a p value of .05 on a true null hypothesis yielded in a single study would be interpreted to mean that the probability of obtaining such an observed value (and more extreme ones) is only 5%. He is not, however, a frequentist in the long-run repeated sampling mode like Neyman–Pearson. See Hubbard and Bayarri (2003a) for further discussion of this.
4. The details of this calibration are too involved to consider here. They can be found in Sellke et al. (2001).
5. Interested readers are encouraged to experiment with the applet, where one can specify the initial percentage of true nulls, the small ranges of p values to investigate (e.g., p = .05 might be chosen as p between .049 and .05), and the value of the normal means, μ’s, that occur under HA in the simulation.
6. Freeman’s (1993) appraisal of the usefulness of p values in data analysis is instructive, reflecting as it does a 180° change of opinion: This paper started life as an attempt to defend p-values ... I have, however, been led inexorably to the opposite conclusion, that the current use of p values as the ‘main means’ of assessing and reporting the results of clinical trials is indefensible. (p. 1443)
7. See also Glover and Dixon’s (2004) support of the likelihood principle as a means of adjudicating knowledge claims in psychology.
8. Despite the advantages in using CIs over p values, reforms in statistical practice in psychology have been problematic (Hubbard & Ryan, 2000). Fidler, Thomason, Cumming, Finch, and Leeman (2004), for example, report on the difficulties encountered by Loftus (1993) in his efforts to decrease the emphasis on significance testing while editor of Memory & Cognition. During his tenure, Fidler et al. (2004) note, the proportion of articles using error bars (both CIs and standard error bars) increased to 41% as compared with 7% under his predecessor. Unfortunately, after Loftus left his editorial position, this proportion fell to 24%. Clearly, effecting changes in the manner in which statistical evidence is presented in the literature will be no easy task. Yet it is surely an important one.
References
Abelson, R.P. (1997). A retrospective on the significance test ban of 1999 (If there were no significance tests, they would be invented). In L.L. Harlow, S.A. Mulaik, & J.H. Steiger (Eds.), What if there were no significance tests? (pp. 117–141). Mahwah, NJ: Erlbaum. Baird, D. (1988). Significance tests, history and logic. In S. Kotz & N.L. Johnson (Eds.), Encyclopedia of statistical sciences (pp. 466–471). New York: Wiley. Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66, 423–437. Bayarri, M.J., & Berger, J.O. (1999). Quantifying surprise in the data and model verification (with comments). In J.M. Bernardo, J.O. Berger, A.P. Dawid, & A.F.M. Smith (Eds.), Bayesian statistics (Vol. 6, pp. 53–82). Oxford: Clarendon. Bayarri, M.J., & Berger, J.O. (2000). P values for composite null models. Journal of the American Statistical Association, 95, 1127–1142. Bayarri, M.J., & Berger, J.O. (2004). The interplay of Bayesian and frequentist analysis. Statistical Science, 19, 58–80. Bayarri, M.J., & Mayoral, A.M. (2002). Bayesian design of ‘successful’ replications. The American Statistician, 56, 207–214.
Berger, J.O. (1986). Are p-values reasonable measures of accuracy? In I.S. Francis, B.F.J. Manly, & F.C. Lam (Eds.), Pacific Statistical Congress (pp. 21–27). Amsterdam: Elsevier. Berger, J.O. (2003). Could Fisher, Jeffreys and Neyman have agreed on testing? (with comments). Statistical Science, 18, 1–32. Berger, J.O., & Berry, D.A. (1988). Statistical analysis and the illusion of objectivity. American Scientist, 76, 159–165. Berger, J.O., Boukai, B., & Wang, Y. (1997). Unified frequentist and Bayesian testing of a precise hypothesis (with comments). Statistical Science, 12, 133–160. Berger, J.O., & Delampady, M. (1987). Testing precise hypotheses (with comments). Statistical Science, 2, 317–352. Berger, J.O., & Sellke, T. (1987). Testing a point null hypothesis: The irreconcilability of p values and evidence (with comments). Journal of the American Statistical Association, 82, 112–139. Berkson, J. (1942). Tests of significance considered as evidence. Journal of the American Statistical Association, 37, 325–335. Birnbaum, A. (1962). On the foundations of statistical inference (with comments). Journal of the American Statistical Association, 57, 269–326. Carver, R.P. (1978). The case against statistical significance testing. Harvard Educational Review, 48, 378–399. Casella, G., & Berger, R.L. (1987). Reconciling Bayesian and frequentist evidence in the one-sided testing problem (with comments). Journal of the American Statistical Association, 82, 106–139. Chatfield, C. (1995). Model uncertainty, data mining and statistical inference (with comments). Journal of the Royal Statistical Society A, 158, 419– 466. Chow, S.L. (1996). Statistical significance: Rationale, validity and utility. Thousand Oaks, CA: SAGE. Chow, S.L. (1998). Précis of statistical significance: Rationale, validity and utility (with comments). Behavioral and Brain Sciences, 21, 169–239. Cohen, J. (1994). The earth is round ( p < .05). American Psychologist, 49, 997–1003. 
Cortina, J.M., & Dunlap, W.P. (1997). On the logic and purpose of significance testing. Psychological Methods, 2, 161–172. Cumming, G., & Finch, S. (2001). A primer on the understanding, use and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61, 532–574. Cumming, G., & Finch, S. (2005). Inference by eye: Confidence intervals and how to read pictures of data. American Psychologist, 60, 170–180. Dickey, J.M. (1987). Comment on Berger and Sellke. Journal of the American Statistical Association, 82, 129–130. Dollinger, M.B., Kulinskaya, E., & Staudte, R.G. (1996). When is a p-value a good measure of evidence? In H. Rieder (Ed.), Robust statistics, data analysis and computer intensive methods (pp. 119–134). New York: Springer Verlag. Edwards, A.W.F. (1992). Likelihood (Expanded ed.). Baltimore, MD: Johns Hopkins University Press. Edwards, W., Lindman, H., & Savage, L.J. (1963). Bayesian statistical inference for psychological research. Psychological Review, 70, 193–242. Ehrenberg, A.S.C., & Bound, J.A. (1993). Predictability and prediction (with comments). Journal of the Royal Statistical Society A, 156, 167–206. Falk, R. (1998). Replication – A step in the right direction: Commentary on Sohn. Theory & Psychology, 8, 313–321. Falk, R., & Greenbaum, C.W. (1995). Significance tests die hard: The amazing persistence of a probabilistic misconception. Theory & Psychology, 5, 75–98.
Fidler, F., Thomason, N., Cumming, G., Finch, S., & Leeman, J. (2004). Editors can lead researchers to confidence intervals, but can’t make them think: Statistical reform lessons from medicine. Psychological Science, 15, 119–126. Fidler, F., Thomason, N., Cumming, G., Finch, S., & Leeman, J. (2005). Still much to learn about confidence intervals: Reply to Rouder and Morey (2005). Psychological Science, 16, 494 – 495. Fisher, R.A. (1925). Statistical methods for research workers. Edinburgh: Oliver & Boyd. Fisher, R.A. (1935). The design of experiments. Edinburgh: Oliver & Boyd. Fisher, R.A. (1959). Statistical methods and scientific inference (2nd ed.). Edinburgh: Oliver & Boyd. Fisher, R.A. (1966). The design of experiments (8th ed.). Edinburgh: Oliver & Boyd. Fisher Box, J. (1978). R. A. Fisher: The life of a scientist. New York: Wiley. Freeman, P.R. (1993). The role of p-values in analysing trial results. Statistics in Medicine, 12, 1443–1452. Freund, J.E., & Perles, B.M. (1993). Observations on the definition of P-values. Teaching Statistics, 15, 8–9. Frick, R.W. (1996). The appropriate use of null hypothesis testing. Psychological Methods, 1, 379–390. Gelman, A., & Stern, H. (2006). The difference between ‘significant’ and ‘not significant’ is not itself statistically significant. The American Statistician, 60, 328–331. Gibbons, J.D. (1986). P-Values. In S. Kotz & N.L. Johnson (Eds.), Encyclopedia of statistical sciences (pp. 366–368). New York: Wiley. Gigerenzer, G. (1993). The superego, the ego, and the id in statistical reasoning. In G. Keren & C.A. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 311–339). Hillsdale, NJ: Erlbaum. Gigerenzer, G., Krauss, S., & Vitouch, O. (2004). The null ritual: What you always wanted to know about significance testing but were afraid to ask. In D. Kaplan (Ed.), The SAGE handbook of quantitative methodology for the social sciences (pp. 391– 408). Thousand Oaks, CA: SAGE. 
Gigerenzer, G., & Murray, D.J. (1987). Cognition as intuitive statistics. Hillsdale, NJ: Erlbaum. Gigerenzer, G., Swijtink, Z., Porter, T., Daston, L., Beatty, J., & Kruger, L. (1989). The empire of chance. New York: Cambridge University Press. Glover, S., & Dixon, P. (2004). Likelihood ratios: A simple and flexible statistic for empirical psychologists. Psychonomic Bulletin & Review, 11, 791–806. Goldstein, H., & Healy, M.J.R. (1995). The graphical interpretation of a collection of means. Journal of the Royal Statistical Society A, 158, 175–177. Good, I.J. (1981). Some logic and history of hypothesis testing. In J.C. Pitt (Ed.), Philosophy in economics (pp. 149–174). Dordrecht: D. Reidel. Goodman, S.N. (1993). P values, hypothesis tests, and likelihood: Implications for epidemiology of a neglected historical debate. American Journal of Epidemiology, 137, 485– 496. Goodman, S.N. (1999). Toward evidence-based medical statistics. 1: The P value fallacy. Annals of Internal Medicine, 130, 995–1004. Goodman, S.N., & Royall, R.M. (1988). Evidence and scientific research. American Journal of Public Health, 78, 1568–1574. Guttman, L. (1977). What is not what in statistics. The Statistician, 26, 81–107. Hagen, R.L. (1997). In praise of the null hypothesis statistical test. American Psychologist, 52, 15–24. Hand, D.J. (1998). Data mining: Statistics and more? The American Statistician, 52, 112–118. Hogben, L. (1957). Statistical theory. New York: Norton.
Hubbard, R. (2004). Alphabet soup: Blurring the distinctions between p’s and α’s in psychological research. Theory & Psychology, 14, 295–327. Hubbard, R., & Armstrong, J.S. (2006). Why we don’t really know what statistical significance means: Implications for educators. Journal of Marketing Education, 28, 114–120. Hubbard, R., & Bayarri, M.J. (2003a). Confusion over measures of evidence (p’s) versus errors (α’s) in classical statistical testing (with comments). The American Statistician, 57, 171–182. Hubbard, R., & Bayarri, M.J. (2003b). P values are not error probabilities. Institute of Statistics and Decision Sciences, Working Paper, No. 03–26. Durham, NC: Duke University Working Papers Series, 27708–0251. Hubbard, R., & Bayarri, M.J. (2005). Comment on Christensen. The American Statistician, 59, 353. Hubbard, R., & Ryan, P.A. (2000). The historical growth of statistical significance testing in psychology – and its future prospects. Educational and Psychological Measurement, 60, 661–681. Jeffreys, H. (1939). Theory of probability. Oxford: Clarendon. Johnstone, D.J. (1986). Tests of significance in theory and practice (with comments). The Statistician, 35, 491–504. Krämer, W., & Gigerenzer, G. (2005). How to confuse with statistics or: The use and misuse of conditional probabilities. Statistical Science, 20, 223–230. Krantz, D.H. (1999). The null hypothesis testing controversy in psychology. Journal of the American Statistical Association, 94, 1372–1381. Krueger, J. (2001). Null hypothesis significance testing: On the survival of a flawed method. American Psychologist, 56, 16–26. Lindley, D.V. (1957). A statistical paradox. Biometrika, 44, 187–192. Lindley, D.V. (1993). The analysis of experimental data: The appreciation of tea and wine. Teaching Statistics, 15, 22–25. Lindley, D.V. (1999). Comment on Bayarri and Berger. In J.M. Bernardo, J.O. Berger, A.P. Dawid, & A.F.M. Smith (Eds.), Bayesian Statistics (Vol. 6, p. 75). Oxford: Clarendon. Lindsay, R.M. 
(1995). Reconsidering the status of tests of significance: An alternative criterion of adequacy. Accounting, Organizations and Society, 20, 35–53. Lindsay, R.M., & Ehrenberg, A.S.C. (1993). The design of replicated studies. The American Statistician, 47, 217–228. Loftus, G.R. (1993). Editorial comment. Memory & Cognition, 21, 1–3. Loftus, G.R. (1996). Psychology will be a much better science when we change the way we analyze data. Current Directions in Psychological Science, 5, 161–171. Marden, J.I. (2000). Hypothesis testing: From p values to Bayes factors. Journal of the American Statistical Association, 95, 1316–1320. Mulaik S.A., Raju, N.S., & Harshman, R.A. (1997). There is a time and a place for significance testing. In L.L. Harlow, S.A. Mulaik, & J.H. Steiger (Eds.), What if there were no significance tests? (pp. 65–115). Mahwah, NJ: Erlbaum. Nelder, J.A. (1986). Statistics, science and technology (with comments). Journal of the Royal Statistical Society A, 149, 109–121. Nelder, J.A. (1999). From statistics to statistical science (with comments). The Statistician, 48, 257–269. Nester, M.R. (1996). An applied statistician’s creed. The Statistician, 45, 401– 410. Neuliep, J.W., & Crandall, R. (1990). Editorial bias against replication research. Journal of Social Behavior and Personality, 5, 85–90. Neuliep, J.W., & Crandall, R. (1993). Reviewer bias against replication research. Journal of Social Behavior and Personality, 8, 22–29. Neyman, J. (1977). Frequentist probability and frequentist statistics. Synthese, 36, 97–131.
Nickerson, R.S. (2000). Null hypothesis statistical testing: A review of an old and continuing controversy. Psychological Methods, 5, 241–301. Ottenbacher, K.J. (1996). The power of replications and replications of power. The American Statistician, 50, 271–275. Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. London, Edinburgh and Dublin Philosophical Magazine and Journal of Science, 50, 157–175. Rosenbaum, P.R. (1999). Choice as an alternative to control in observational studies (with comments). Statistical Science, 14, 259–304. Rosenbaum, P.R. (2001). Replicating effects and biases. The American Statistician, 55, 223–227. Rosenthal, R. (1990). Replication in behavioral research. Journal of Social Behavior and Personality, 5, 1–30. Rosnow, R.L., & Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological science. American Psychologist, 44, 1276–1284. Royall, R.M. (1986). The effect of sample size on the meaning of significance tests. The American Statistician, 40, 313–315. Royall, R.M. (1997). Statistical evidence: A likelihood paradigm. New York: Chapman & Hall. Schenker, N., & Gentleman, J.F. (2001). On judging the significance of differences by examining the overlap between confidence intervals. The American Statistician, 55, 182–186. Schervish, M.J. (1996). P values: What they are and what they are not. The American Statistician, 50, 203–206. Schmidt, F.L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for the training of researchers. Psychological Methods, 1, 115–129. Sellke, T., Bayarri, M.J., & Berger, J.O. (2001). Calibration of p values for testing precise null hypotheses. The American Statistician, 55, 62–71. Smithson, M. (2003). Confidence intervals. Thousand Oaks, CA: SAGE. Sohn, D. (1998). 
Statistical significance and replicability: Why the former does not presage the latter. Theory & Psychology, 8, 291–311. Steiger, J.H. (1990). Structural model evaluation and modification: An interval estimation approach. Multivariate Behavioral Research, 25, 173–180. Thompson, B. (1994). The pivotal role of replication in psychological research: Empirically evaluating the replicability of sample results. Journal of Personality, 62, 157–176. Thompson, B. (1999). If statistical significance tests are broken/misused, what practices should supplement or replace them? Theory & Psychology, 9, 165–181. Thompson, B. (2002). What future quantitative social science research could look like: Confidence intervals for effect sizes. Educational Researcher, 31, 25–32. Tryon, W.W. (2001). Evaluating statistical difference, equivalence, and indeterminacy using inferential confidence intervals: An integrated alternative method of conducting null hypothesis statistical tests. Psychological Methods, 6, 371–386. Tukey, J.W. (1991). The philosophy of multiple comparisons. Statistical Science, 6, 100–116.
68
Alphabet Soup: Blurring the Distinctions between p’s and α’s in Psychological Research
Raymond Hubbard
Source: Theory & Psychology, 14(3) (2004): 295–326.
It is my personal belief that an objective look at the record will show that Fisher contributed a large number of statistical methods, but that Neyman contributed the basis of statistical thinking. (Lucien LeCam, quoted in Reid, 1982, p. 268)
Extensive confusion prevails among psychologists concerning the reporting and interpretation of results of classical statistical tests. The reason for this confusion is that textbooks on statistical methods in psychology and the social sciences usually present the subject matter as a single, comprehensive, uncontroversial theory of statistical inference. These texts rarely allude to the fact that classical statistical inference as it is generally portrayed is in fact an anonymous hybrid consisting of the union of the ideas developed by Ronald Fisher, on the one hand, and Jerzy Neyman and Egon Pearson, on the other (Gigerenzer, 1993; Gigerenzer & Murray, 1987; Gigerenzer et al., 1989; Huberty, 1993; Huberty & Pike, 1999). It is a union that neither side would have agreed to, given the pronounced philosophical and methodological differences between them. In fact, a bitter debate raged over the years between the Fisherian and Neyman–Pearson camps. The seminal work of Gigerenzer and his colleagues (Gigerenzer, 1993; Gigerenzer & Murray, 1987; Gigerenzer et al., 1989) and Huberty (1993; Huberty & Pike, 1999) notwithstanding, most researchers in psychology and
elsewhere remain uninformed about the historical development of methods of statistical inference, and of the mixing of Fisherian and Neyman–Pearson concepts. In particular, there is widespread failure to acknowledge the incompatibility of Fisher’s evidential p value (actually, Karl Pearson [1900] introduced the modern p value, but Fisher popularized it) with Neyman–Pearson’s Type I error rate, α (Goodman, 1993). The difference between evidence (p’s) and errors (α’s) is not some semantic splitting of hairs. Rather, it points to the basic distinctions between Fisher’s notions of significance testing and inductive inference, versus Neyman–Pearson’s ideas on hypothesis testing and inductive behavior. But since statistics textbooks often surreptitiously blend concepts from both sides, misunderstandings concerning the reporting and interpretation of statistical tests are virtually guaranteed. Adding insult to injury, the confusion over measures of evidence versus errors is so completely ingrained that it is not even seen as being a problem among the rank and file of researchers. As proof of this, even critics of statistical testing often fail to distinguish between p’s and α’s, and the repercussions this has on the meaning of empirical results. Thus, many of these critics (e.g. Carver, 1978; Kirk, 1996; Krueger, 2001; Rozeboom, 1960; Wilkinson & the APA Task Force on Statistical Inference, 1999) unwittingly adopt a Fisherian stance inasmuch as they talk almost exclusively in terms of p, as opposed to α, values. Still other critics or discussants of statistical testing (e.g. American Psychological Association, 1994, 2001; Cohen, 1990, 1994; Dar, Serlin, & Omer, 1994; Falk & Greenbaum, 1995; Loftus, 1996; Mulaik, Raju, & Harshman, 1997; Nickerson, 2000; Rosnow & Rosenthal, 1989; Schmidt, 1996) are inclined, erroneously, to use p’s and α’s interchangeably. 
Additional examples of recent studies critiquing the merits of statistical testing that nonetheless continue to offer incorrect advice regarding the interpretation of p’s and α’s are readily adduced from the literature (e.g. Chow, 1996, 1998; Clark, 1999; Daniel, 1998; Dixon & O’Reilly, 1999; Finch, Cumming, & Thomason, 2001; Grayson, Pattison, & Robins, 1997; Hyde, 2001; Macdonald, 1997; Nix & Barnette, 1998). The varying levels of confusion exhibited in so many articles dealing with the meaning and interpretation of classical statistical tests point to the need to become familiar with their historical development. Krantz (1999) would surely agree with this assessment. In light of the above concerns, the present paper addresses how the confusion between p’s and α’s came about. I do this by first reporting on the major differences in the structure of the Fisherian and Neyman–Pearson schools of thought. In doing so, I typically let the protagonists speak for themselves. This is an absolute necessity given that their own names are conspicuously absent from the textbooks used to help teach psychologists about statistical methods. Because textbook authors almost uniformly do not cite and discuss Fisher’s and Neyman–Pearson’s respective contributions to the statistics literature, it is hardly surprising to learn that present researchers are not familiar with them. Second, I show how
the competing ideas from the two camps have been inadvertently merged. The upshot is that although Neyman–Pearson theory claimed the mantle of statistical orthodoxy some fifty or so years ago (Hogben, 1957; LeCam & Lehmann, 1974; Nester, 1996; Royall, 1997; Spielman, 1974), it is Fisher’s influence which dominates statistical testing procedures in psychology today. Third, empirical evidence is gathered from a random sample of articles in 12 psychology journals for the period 1990–2002 detailing the widespread confusion among researchers caused by the mixing of Fisherian and Neyman–Pearson perspectives. This evidence is manifested in how researchers, through their misunderstandings of the differences between p’s and α’s, almost universally misreport and misinterpret the outcomes of statistical tests. They can scarcely help it, for such misreporting and misinterpretation is virtually sanctioned in the advice found in APA Publication Manuals (1994, 2001). The end result is that applications of classical statistical testing in psychology are largely meaningless. And this signals the need for changes in the way in which it is taught in the classroom. More specifically, hopes for eliminating (or at least drastically reducing) the mass confusion over the meanings of p’s and α’s must rest on acquainting students with the fundamentals of the historical development of Fisherian and Neyman–Pearson statistical testing. The present paper attempts to do this.
Comparing and Contrasting the Fisherian and Neyman–Pearson Paradigms
Fisher’s Paradigm of Significance Testing
Fisher’s ideas on significance testing, popularized in the many editions of his widely influential books Statistical Methods for Research Workers (1925) and The Design of Experiments (1935a), were enthusiastically received by practitioners. At the heart of his conception of inductive inference is what he termed the null hypothesis, H0. Although briefly dabbling with Bayesian approaches (Zabell, 1992), Fisher quickly renounced the methods of inverse probability, or the probability of a hypothesis (H) given the data (x), Pr(H | x), instead championing the direct probability, Pr(x | H). In particular, Fisher used disparities in the data to reject the null hypothesis, that is, the probability of the data conditional on a true null hypothesis, or Pr(x | H0). Consequently, a significance test is a means of determining the probability of a result, in addition to more extreme ones, on a null hypothesis of no effect or relationship. In Fisher’s model the researcher proposes a null hypothesis that a sample comes from a hypothetical infinite population with a known sampling distribution. As Gigerenzer and Murray (1987) comment, the null hypothesis is
rejected ‘if our sample statistic deviates from the mean of the sampling distribution by more than a criterion, which corresponds to alpha, the level of significance’ (p. 10).¹ In other words, the p value from a significance test is regarded as a measure of the implausibility of the actual observations (as well as more extreme and unobserved ones) obtained in an experiment or other study, assuming a true null hypothesis. The rationale for the significance test is that if the data are seen as being rare or highly discrepant under H0 this constitutes inductive evidence against H0. Fisher (1966) noted that ‘It is usual and convenient for experimenters to take 5 per cent as a standard level of significance, in the sense that they are prepared to ignore all results which fail to reach this standard’ (p. 13). Thus, Fisher’s significance testing revolves around the rejection of the null hypothesis at the p ≤ .05 level. If, in an experiment, the researcher obtains a p value of, say, .05 or .01 on a true null hypothesis, it would be interpreted to mean that the probability of obtaining such an extreme (or more extreme) value is only 5% or 1%. (Hence, Fisher is a frequentist, but not in the same sense as Neyman–Pearson.) For Fisher (1966), then, ‘Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis’ (p. 16). In the Fisherian paradigm, an event is deemed established when we can conduct experiments that rarely fail to yield statistically significant (p ≤ .05) results. As mentioned earlier, Fisher considered p values from single experiments as supplying inductive evidence against the null hypothesis, with smaller p values indicating greater evidence (Johnstone, 1986, 1987b; Spielman, 1974). According to Fisher’s famous disjunction, a p value ≤ .05 on the null hypothesis shows that either a rare event has occurred or else the null hypothesis is false (Seidenfeld, 1979). 
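To make the definition concrete, the following is a minimal sketch of how a Fisherian p value might be computed in the simplest textbook case. It assumes a z test (normally distributed data with a known standard deviation); the function name `z_test_p_value` and the numbers in the example study are hypothetical and are not taken from Fisher or from the articles discussed here.

```python
from statistics import NormalDist


def z_test_p_value(sample_mean, null_mean, sigma, n, two_sided=True):
    """Probability, under H0, of a result at least as extreme as the one observed.

    Assumes a z test: normally distributed data with known standard deviation sigma.
    """
    standard_error = sigma / n ** 0.5
    z = (sample_mean - null_mean) / standard_error
    standard_normal = NormalDist()
    if two_sided:
        # Departures from H0 in either direction count as "more extreme".
        return 2.0 * (1.0 - standard_normal.cdf(abs(z)))
    # One-sided: only larger sample means count as "more extreme".
    return 1.0 - standard_normal.cdf(z)


# Hypothetical study: H0 says the population mean is 100 (sigma = 15);
# a sample of 100 cases has mean 103, so z = 2.0.
p_two_sided = z_test_p_value(103.0, 100.0, 15.0, 100)
p_one_sided = z_test_p_value(103.0, 100.0, 15.0, 100, two_sided=False)
print(f"two-sided p = {p_two_sided:.4f}, one-sided p = {p_one_sided:.4f}")
```

Note that the calculation uses only the null hypothesis and the data from a single study. No alternative hypothesis, error rate, or long-run sequence of repetitions enters into it, which is why the result is read here as a measure of evidence against H0.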
Fisher was sure that statistics could play a major role in fostering inductive inference, that is, drawing inferences from the particular to the general, from samples to populations. According to him, ‘Inductive inference is the only process known to us by which essentially new knowledge comes into the world’ (Fisher, 1966, p. 7). But Fisher (1958) was wary that mathematicians (certainly Neyman) did not necessarily subscribe to his inductivist viewpoint: In that field of deductive logic, at least when carried out with mathematical symbols, [mathematicians] are of course experts. But it would be a mistake to think that mathematicians as such are particularly good at the inductive logical processes which are needed in improving our knowledge of the natural world, in reasoning from observational facts to the inferences which those facts warrant. (p. 261)
Fisher never wavered in his belief that inductive reasoning was the chief mechanism of knowledge development, and for him the p values from significance tests were evidential.
Neyman–Pearson’s Paradigm of Hypothesis Testing
The Neyman–Pearson (1928a, 1928b, 1933) statistical paradigm is widely accepted as the norm in classical statistical circles (Carlson, 1976; Hogben, 1957; LeCam & Lehmann, 1974; Nester, 1996; Oakes, 1986; Royall, 1997; Spielman, 1974). Their work on hypothesis testing, terminology they preferred to distinguish it from Fisher’s ‘significance testing’, was quite distinct from the latter’s framework of inductive inference. The Neyman–Pearson approach postulates two competing hypotheses: the null hypothesis (H0) and the alternative hypothesis (HA). In justifying the need for an alternative hypothesis, Neyman (1977) wrote: … in addition to H [the null hypothesis] there must exist some other hypotheses, one of which may conceivably be true. Here, then, we come to the concept of the ‘set of all admissible hypotheses’ which is frequently denoted by the letter Ω. Naturally, Ω must contain H. Let H̄ denote the complement, say Ω – H = H̄. It will be noticed that when speaking of a test of the hypothesis H, we really speak of its test ‘against the alternative H̄.’ This is quite important. The fact is that, unless the alternative H̄ is specified, the problem of an optimal test of H is indeterminate. (p. 104)
Neyman–Pearson considered Fisher’s usage of the occurrence of rare or implausible results to reject H0 to be an inadequate vehicle for hypothesis testing. Something more was needed. They wanted to see whether this same improbable outcome under H0 is more likely to occur under a competing hypothesis. As Pearson later explained: ‘The rational human mind did not discard a hypothesis until it could conceive at least one plausible alternative hypothesis’ (E.S. Pearson, 1990, p. 82). Even William S. Gosset (‘Student’, of t test fame), a man whom Fisher admired, saw the need for an alternative hypothesis. In response to a letter from Pearson, Gosset wrote that ‘the only valid reason for rejecting any statistical hypothesis, no matter how unlikely, is that some alternative hypothesis explains the observed events with a greater degree of probability’ (quoted in Reid, 1982, p. 62). The inclusion of an alternative hypothesis by Neyman–Pearson critically distinguishes their approach from Fisher’s, and this was an issue of great contention between the two camps over the years. In Neyman–Pearson theory, the investigator selects a (typically point) null hypothesis and tests it against the alternative hypothesis. Their work introduced the probabilities of committing two kinds of error, namely false rejection (Type I error) and false acceptance (Type II error) of the null hypothesis. The former probability is called α, while the latter probability is called β. Eschewing Fisher’s ideas about hypothetical infinite populations, Neyman–Pearson results are predicated on the assumption of repeated
random sampling from a defined population (Gigerenzer & Murray, 1987). Therefore, Neyman–Pearson theory is best equipped for handling situations where repeated random sampling has meaning, such as in the case of quality-control experiments. In these narrow circumstances, the Neyman–Pearson frequentist interpretation of probability makes sense: α is the long-run relative frequency of Type I errors conditional on the null being true and β is the counterpart for Type II errors. The Neyman–Pearson theory of hypothesis testing introduced the entirely new concept of the power of a statistical test. The power of a test, or (1 – β), is the probability of rejecting a false null hypothesis. Since the power of a test to detect a particular effect size in the population can be calculated before conducting the research, it is useful in the design of experiments. In Fisher’s significance-testing scheme, however, there is no alternative hypothesis (HA), making the ideas about Type II errors and the power of the test irrelevant. Fisher (1935b) pointed this out when rebuking Neyman and Pearson without naming them: ‘In fact … “errors of the second kind” are committed only by those who misunderstand the nature and application of tests of significance’ (p. 474). And he subsequently added: The notion of an error of the so-called ‘second kind,’ due to accepting the null hypothesis ‘when it is false’ … has no meaning with respect to simple tests of significance, in which the only available expectations are those which flow from the null hypothesis being true. (Fisher, 1966, p. 17)
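The long-run reading of α and of power (1 – β) described above can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: a one-sided z test with unit standard deviation, a hypothetical effect size of 0.5 standard deviations under HA, samples of 25, and a pre-set α of .05. The function name `rejection_rate` and the random seed are arbitrary choices, not part of Neyman–Pearson theory.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(true_mean, n, alpha, repetitions, rng):
    """Share of repeated samples in which H0: mean = 0 is rejected.

    One-sided z test with known sigma = 1; alpha is fixed before any data are drawn.
    """
    critical_z = NormalDist().inv_cdf(1.0 - alpha)
    rejections = 0
    for _ in range(repetitions):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        z = fmean(sample) * n ** 0.5  # (mean - 0) / (1 / sqrt(n))
        if z > critical_z:
            rejections += 1
    return rejections / repetitions


rng = random.Random(2004)  # arbitrary seed, for reproducibility only

# H0 true: the long-run rejection rate estimates the Type I error rate, alpha.
type_i_rate = rejection_rate(0.0, 25, 0.05, 10_000, rng)

# H0 false (hypothetical effect of 0.5 SD): the rejection rate estimates power, 1 - beta.
simulated_power = rejection_rate(0.5, 25, 0.05, 10_000, rng)

# Power can also be calculated before any data are collected.
analytic_power = 1.0 - NormalDist().cdf(NormalDist().inv_cdf(0.95) - 0.5 * 25 ** 0.5)

print(f"Type I rate = {type_i_rate:.3f}, power = {simulated_power:.3f} "
      f"(analytic {analytic_power:.3f})")
```

In this sketch α is a property of the testing procedure over many repetitions, and it is fixed in advance. It says nothing about the strength of evidence in any single sample, which is the role Fisher assigned to the p value.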
Fisher denied the need for an alternative hypothesis, and strenuously opposed its incorporation by Neyman–Pearson (Gigerenzer & Murray, 1987; Hacking, 1965). Fisher (1966), however, touches upon the concept of the power of a test when discussing the ‘sensitiveness’ of an experiment: By increasing the size of the experiment we can render it more sensitive, meaning by this that it will allow of the detection of a lower degree of sensory discrimination, or, in other words, of a quantitatively smaller departure from the null hypothesis. Since in every case the experiment is capable of disproving, but never of proving this hypothesis, we may say that the value of the experiment is increased whenever it permits the null hypothesis to be more readily disproved. (pp. 21–22)
And Neyman (1967) was, of course, familiar with this: ‘The consideration of power is occasionally implicit in Fisher’s writings, but I would have liked to see it treated explicitly’ (p. 1459). Whereas Fisher’s view of inductive inference centered on the rejection of the null hypothesis, Neyman and Pearson had no time at all for the very idea of inductive reasoning. Their concept of inductive behavior sought to provide rules for making decisions between two hypotheses,
regardless of the researcher’s belief in either one. Neyman (1950) made this quite explicit: Thus, to accept a hypothesis H means only to decide to take action A rather than action B. This does not mean that we necessarily believe that the hypothesis H is true … [while rejecting H] … means only that the rule prescribes action B and does not imply that we believe that H is false. (pp. 259–260)
Neyman–Pearson theory, therefore, substitutes the idea of inductive behavior for that of inductive inference. According to Neyman (1971): The description of the theory of statistics involving a reference to behavior, for example, behavioristic statistics, has been introduced to contrast with what has been termed inductive reasoning. Rather than speak of inductive reasoning I prefer to speak of inductive behavior. (p. 1)
And ‘The term “inductive behavior” means simply the habit of humans and other animals (Pavlov’s dogs, etc.) to adjust their actions to noticed frequencies of events, so as to avoid undesirable consequences’ (Neyman, 1961, p. 148; see also Neyman, 1962). Further defending his preference for inductive behavior over inductive inference, Neyman (1957) acknowledged his suspicions about the latter ‘because of its dogmatism, lack of clarity, and because of the absence of consideration of consequences of the various actions contemplated’ (p. 16). In presenting his decision rules for taking action A rather than B, Neyman (1950) emphasized that ‘the theory of probability and statistics both play an important role, and there is a considerable amount of reasoning involved. As usual, however, the reasoning is all deductive’ (p. 1). The deductive character of the Neyman–Pearson model proceeds from the general to the particular. They came up with a ‘rule of behavior’ for selecting between two alternative courses of action, accepting or rejecting the null hypothesis, such that ‘in the long run of experience, we shall not be too often wrong’ (Neyman & Pearson, 1933, p. 291). Whether to accept or reject the hypothesis in their framework depends on the cost trade-offs involved with committing a Type I or Type II error. These costs are independent of statistical theory. They must be estimated by the researcher in the context of each particular problem. Neyman and Pearson (1933) advised: … in some cases it will be more important to avoid the first [type of error], in others the second [type of error]. . . . From the point of view of mathematical theory all we can do is to show how the risk of errors may be controlled or minimised. The use of these statistical tools in any given case, in determining just how the balance should be struck, must be left to the investigator. (p. 296)
After heeding such advice, the researcher would design an experiment to control the probabilities of the α and β error rates, with the ‘best’ test being the one that minimizes β subject to a bound on α (Lehmann, 1993). In determining what this bound on α should be, Neyman (1950) later stated that the control of Type I errors was more important than that of Type II errors: The problem of testing statistical hypotheses is the problem of selecting critical regions. When attempting to solve this problem, one must remember that the purpose of testing hypotheses is to avoid errors insofar as possible. Because an error of the first kind is more important to avoid than an error of the second kind, our first requirement is that the test should reject the hypothesis tested when it is true very infrequently. . . . To put it differently, when selecting tests, we begin by making an effort to control the frequency of the errors of the first kind (the more important errors to avoid), and then think of errors of the second kind. The ordinary procedure is to fix arbitrarily a small number α … and to require that the probability of committing an error of the first kind does not exceed α. (p. 265)
Consequently, α is specified or fixed prior to the collection of the data. Because of this, Neyman–Pearson methodology is sometimes labeled the fixed α (Huberty, 1993), fixed level (Lehmann, 1993) or fixed size (Seidenfeld, 1979) approach. This contrasts α with Fisher’s p value, which is a random variable whose distribution is uniform over the interval [0, 1] under the null hypothesis. The α and β error rates define a ‘critical’ or ‘rejection’ region for the test statistic, say z or t > 1.96. If the test statistic falls in the critical region, H0 is rejected in favor of HA, otherwise H0 is retained (Goodman, 1993; Huberty, 1993). Furthermore, descriptions of Neyman–Pearson theory refer to the rejection of H0 when H0 is true – the Type I error probability, α – as the ‘significance level’ of a test. As we shall see below, calling the Type I error probability the significance level of a statistical test was something quite unacceptable to Fisher. It has also helped to create enormous confusion among researchers concerning the meaning and interpretation of ‘statistical significance’. Recall that Fisher regarded his significance tests as constituting inductive evidence against the null hypothesis in single experiments (Johnstone, 1987a; Kyburg, 1974; Seidenfeld, 1979). Neyman–Pearson hypothesis tests, on the other hand, do not permit an inference to be made about the outcome of any individual hypothesis that the researcher is examining. Neyman and Pearson (1933) were unequivocal about this: ‘We are inclined to think that as far as a particular hypothesis is concerned, no test based upon the theory of probability can by itself provide any valuable evidence of the truth or falsehood of that hypothesis’ (pp. 290–291). But since scientists are in the business of gleaning evidence from individual studies, this limitation of Neyman–Pearson theory is acute. 
Nor, for that matter, does the Neyman–Pearson model allow an inference to be made in the case of ongoing, repetitive studies. Thus, Grayson, Pattison and Robins (1997) were incorrect when they stated that ‘one implication of a strictly frequentist [Neyman–Pearson]
approach is that we can only make inferences on the basis of a long run of repeated trials’ (p. 68). Neyman–Pearson theory is strictly behavioral; it is non-evidential in both the short and long runs. As Neyman (1942) wrote: ‘it will be seen that the theory of testing hypotheses has no claim of any contribution to … “inductive reasoning” ’(p. 301). Fisher (1959) recognized this, commenting that the Neyman–Pearson ‘procedure is devised for a whole class of cases. No particular thought is given to each case as it arises, nor is the tester’s capacity for learning exercised’ (p. 100). Instead, the investigator is only allowed to make a decision about the likely outcome of a hypothesis as if it had been subjected, as Fisher (1956) observed, to ‘an endless series of repeated trials which will never take place’ (p. 99). In the vast majority of applied work, repeated random sampling does not occur; empirical findings are usually limited to a single sample. Fisher conceded that Neyman and Pearson’s contribution, which he referred to as an ‘acceptance procedures’ approach, had merit in the context of quality-control decisions. For example, he acknowledged: ‘I am casting no contempt on acceptance procedures, and I am thankful, whenever I travel by air, that the high level of precision and reliability required can really be achieved by such means’ (Fisher, 1955, p. 69). 
This concession aside, Fisher (1959) was resolute in his objections to Neyman–Pearson ideas about hypothesis testing as an appropriate method for guiding scientific research: The ‘Theory of Testing Hypotheses’ was a later attempt, by authors who had taken no part in the development of [significance] tests, or in their scientific application, to reinterpret them in terms of an imagined process of acceptance sampling, such as was beginning to be used in commerce; although such processes have a logical basis very different from those of a scientist engaged in gaining from his observations an improved understanding of reality. (pp. 4–5)
He insisted that … the logical differences between [acceptance procedures] and the work of scientific discovery by physical or biological experimentation seem to me so wide that the analogy between them is not helpful, and the identification of the two sorts of operation is decidedly misleading. (Fisher, 1955, pp. 69–70)
In further distancing himself from Neyman–Pearson methodology, Fisher (1955) drew attention to the fact that: From a test of significance, however, we learn more than that the body of data at our disposal would have passed an acceptance test at some particular level; we may learn, if we wish to, and it is to this that we usually pay attention, at what level it would have been doubtful; doing this we have a genuine measure of the confidence with which any particular opinion may be held, in view of our particular data. From a strictly realistic
viewpoint we have no expectation of an unending sequence of similar bodies of data, to each of which a mechanical ‘yes or no’ response is to be given. What we look forward to in science is further data, probably of a somewhat different kind, which may confirm or elaborate the conclusions we have drawn; but perhaps of the same kind, which may then be added to what we have already, to form an enlarged basis for induction. (p. 74)
The above discussion shows that Fisher and Neyman–Pearson disagreed vehemently over both the nature of statistical methods and their approaches to the conduct of science per se. Indeed, ongoing exchanges of a frequently acrimonious nature passed between Fisher and Neyman–Pearson as both sides promulgated their respective conceptions of statistical analysis and the scientific method.
Minding One’s p’s and α’s
Users of statistical techniques in the social and medical sciences are almost totally unaware of the distinctions, described above, between Fisher’s ideas on significance testing and Neyman–Pearson thoughts on hypothesis testing (Gigerenzer, 1993; Goodman, 1993, 1999; Huberty, 1993; Royall, 1997). This is through no fault of their own; after all, they have been taught from numerous well-regarded textbooks on statistical methods. Unfortunately, many of these same textbooks combine, without acknowledgement, incongruous ideas from both the Fisherian and Neyman–Pearson camps. This is something that both sides found appalling. Ironically, as will be seen, the end result of this unintentional mixing of Fisherian with Neyman–Pearson ideas is that although the latter’s work came to be accepted as statistical orthodoxy about fifty years ago (Hogben, 1957; Spielman, 1974), it is Fisher’s methods that flourish today. As Royall (1997) observed: The distinction between Neyman–Pearson tests and [Fisher’s] significance tests is not made consistently clear in modern statistical writing and teaching. Mathematical statistical textbooks tend to present Neyman–Pearson theory, while statistical methods textbooks tend to lean more towards significance tests. The terminology is not standard, and the same terms and symbols are often used in both contexts, blurring the differences between them. (p. 64)
Johnstone (1986) and Keuzenkamp and Magnus (1995) maintain that statistical testing usually follows Neyman–Pearson formally, but Fisher philosophically. For instance, Fisher’s notion of disproving the null hypothesis is taught along with the Neyman–Pearson concepts of alternative hypotheses, Type II errors and the power of a statistical test. In addition, textbook descriptions of Neyman–Pearson theory often refer to the Type I error probability as the ‘significance level’ (Goodman, 1999; Kempthorne, 1976; Royall, 1997). But the quintessential example of the bewilderment caused by the forging of Fisher’s ideas on inductive inference with the Neyman–Pearson principle of
inductive behavior is the widely unappreciated fact that the former’s p value is incompatible with the Neyman–Pearson hypothesis test in which it has become embedded (Goodman, 1993). Despite this fundamental incompatibility, the end result of this mixing is that the p value is now indelibly linked in researchers’ minds with the Type I error rate, α. And this is precisely what Fisher (1955) had earlier complained about when he accused Neyman– Pearson of attempting ‘to assimilate a test of significance to an acceptance procedure’ (p. 74). Because of this assimilation, much empirical work in psychology and the social and biological sciences proceeds in the following manner: the investigator states the null (H0) and alternative (HA) hypotheses, the Type I error rate/significance level, α, and presumably – but rarely – calculates the statistical power of the test (e.g. t). These steps are in accordance with Neyman–Pearson convention. After this, the test statistic is computed for the sample data, and in an effort to have the best of both worlds, an associated p value (significance probability) is calculated. The p value is then erroneously interpreted as a frequency-based ‘observed’ Type I error rate, α (Goodman, 1993), and at the same time as an incorrect (i.e. p < α) measure of evidence against H0.
The p Value as a Type I Error Rate
Even staunch critics of, and other commentators on, statistical testing in psychology occasionally commit this error. Thus Dar et al. (1994) noted: ‘The sample p value, in the context of null hypothesis testing, is involved … in determining whether the predetermined criterion of Type I error, the alpha level, has been surpassed’ (p. 76). Likewise, Meehl (1967) misreported that the investigator ‘gleefully records the tiny probability number “p < .001,” and there is a tendency to feel that the extreme smallness of this probability of a Type I error is somehow transferable’ (p. 107, my emphasis). Nickerson (2000) also makes the mistake of drawing a parallel between p values and Type I error rates: ‘The value of p that is obtained as the result of NHST is the probability of a Type I error on the assumption that the null hypothesis is true’ (p. 243). He later goes on to compound this mistake by adding: ‘Both p and α represent bounds on the probability of Type I error … p is the probability of a Type I error resulting from a particular test if the null hypothesis is true’ (p. 259, my emphasis). Here, Nickerson misinterprets a p value as an ‘observed’ Type I error rate, something which is impossible since the latter applies only to long-run frequencies, not to individual instances. Neyman (1971) expressed this as follows: It would be nice if something could be done to guard against errors in each particular case. However, as long as the postulate is maintained that the observations are subject to variation affected by chance (in the sense of frequentist theory of probability), all that appears possible to do is to control the frequencies of errors in a sequence of situations. (p. 13)
The p value is not a Type I error rate, long-run or otherwise; it is a measure of inductive evidence against H0. Type I errors play no role in Fisher’s paradigm. This misinterpretation of his evidential p value as a Neyman–Pearson Type I error rate severely upset Fisher, who was adamant that the significance level of a statistical test had no ongoing sampling interpretation. With regard to the .05 level, Fisher (1929) early on warned that this does not mean that the researcher ‘allows himself to be deceived once in every twenty experiments. The test of significance only tells him what to ignore, namely all experiments in which significant results are not obtained’ (p. 191). The significance level, for Fisher, was a measure of evidence for the ‘objective’ disbelief in the null hypothesis; it had no long-run frequentist characteristics. Again, Fisher (1950) protested that his tests of significance … have been most unwarrantably ignored in at least one pretentious work on ‘Testing Statistical Hypotheses’ … Pearson and Neyman have laid it down axiomatically that the level of significance of a test must be equated to the frequency of a wrong decision ‘in repeated samples from the same population.’ This idea was foreign to the development of tests of significance given by the author in 1925. (p. 35.173a, my emphasis)
Seidenfeld (1979) exposed the difference between the two schools of thought on this crucial matter: … such a frequency property has little or no connection with the interpretation of the [Fisherian significance] test. To repeat, the correct interpretation is through the disjunction, either a rare event has occurred or the null hypothesis is false. (p. 79)
In highlighting the discrepancies between p’s and α’s, Gigerenzer (1993) offered the following: For Fisher, the exact level of significance is a property of the data (i.e., a relation between a body of data and a theory); for Neyman and Pearson, alpha is a property of the test, not of the data. Level of significance [p value] and alpha are not the same thing. (p. 317, my emphasis)
Despite the above cautions about p values not being Type I error rates, it is sobering to note that even well-known statisticians such as Barnard (1985), Gibbons and Pratt (1975) and Hinkley (1987) nevertheless make the mistake of equating them. Yet, as Berger and Delampady (1987) warn, the interpretation of the p value as an error rate is strictly forbidden: P-values are not a repetitive error rate … A Neyman–Pearson error probability, α, has the actual frequentist interpretation that a long series of α level tests will reject no more than 100α% of true H0, but the data-dependent P-values have no such interpretation. (p. 329)
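The contrast between a fixed α and a data-dependent p value can be illustrated with a small simulation. The sketch below assumes one-sided z tests on samples of 20 standard normal observations with H0 true in every experiment; the function name `simulate_p_values`, the sample size and the number of experiments are hypothetical choices made only for illustration.

```python
import random
from statistics import NormalDist


def simulate_p_values(n_experiments: int, n: int, seed: int = 0) -> list:
    """One-sided p values from z tests on samples drawn with H0 (mean 0, sd 1) true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    p_values = []
    for _ in range(n_experiments):
        sample_mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        z = sample_mean * n ** 0.5          # test statistic, N(0, 1) under H0
        p_values.append(1 - std_normal.cdf(z))
    return p_values


alpha = 0.05  # fixed before any data are generated
p_values = simulate_p_values(n_experiments=20_000, n=20)
rejection_rate = sum(p <= alpha for p in p_values) / len(p_values)

print(f"long-run rejection rate at alpha = {alpha}: {rejection_rate:.3f}")
print(f"smallest single p value observed: {min(p_values):.5f}")
```

In a run of this kind the proportion of rejections settles near α, which is a property of the procedure over the whole series. The individual p values, by contrast, vary from experiment to experiment across the unit interval, and a very small one among them is not the error rate of anything.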
At the same time it must be underlined that Neyman–Pearson would not endorse an inferential or epistemic interpretation of statistical testing, as manifested in a p value. Their theory is behavioral, not evidential, and they would likewise complain that the p value is not a Type I error rate. It should therefore be pointed out that in his effort to partially resolve discrepancies between the Fisherian and Neyman–Pearson programs, Lehmann (1993) similarly fails to distinguish between measures of evidence versus error. He refers to the Type I error rate as the significance level of the test, when for Fisher this was determined by p values and not α’s. And we have seen that misconstruing the evidential p value as a Neyman–Pearson Type I error rate was anathema to both Fisher and Neyman–Pearson.
The p Value as a Quasi-measure of Evidence against H0 (p < α)
While the p value is being erroneously reported as a Neyman–Pearson Type I error rate, it will be interpreted simultaneously in an incorrect quasi-Fisherian manner as evidence against H0. If p < α, a statistically significant finding is announced, and the null hypothesis is disproved. For example, Leavens and Hopkins’ (1998) declaration that ‘alpha was set at p < .05 for all tests’ (p. 816) reflects a common tendency among researchers to confuse p’s and α’s in a statistical significance testing framework. Clark-Carter (1997) goes further in this regard by invoking the great man himself: ‘According to Fisher, if p were greater than α then the null hypothesis could not be accepted’ (p. 71). Fisher, of course, would have taken umbrage at such a statement, just as he would have with Clark’s (1999) assertion that ‘in Fisher’s original work in agriculture, alpha was set a priori’ (p. 283), with Huberty (1993, p. 328) and Huberty and Pike’s (1999, p. 11) suggestions that Fisher encouraged the use of α = .05, with Cortina and Dunlap’s (1997) recommendation to compare observed probabilities with predetermined α cut-off values, and with Chow’s (1996) error of investing his (Fisher’s) tests with both p’s and α’s. Fisher had no use for the concept α. Again, in an otherwise thoughtful article titled ‘The Appropriate Use of Null Hypothesis Testing’, Frick (1996) nonetheless makes the mistake of using p’s and α’s interchangeably: ‘Finally, the obtained value of p is compared to a criterion alpha, which is conventionally set at .05 … When p is less than .05, the experimenter has sufficient empirical evidence to support a claim’ (p. 385).
In addition, Nickerson (2000), who earlier misinterpreted a p value as a Type I error rate, follows Frick (1996) in ascribing an evidential meaning to the p value when it is directly compared with this error rate: A specified significance level conventionally designated α (alpha) serves as a decision criterion, and the null hypothesis is rejected only if the value
of p yielded by the test is not greater than the value of α. If α is set at .05, say, and a significance test yields a value of p equal to or less than .05, the null hypothesis is rejected and the result is said to be statistically significant at that level. (Nickerson, 2000, pp. 242–243)
Nix and Barnette (1998) do likewise in a paper subtitled ‘A Review of Null Hypothesis Significance Testing’: ‘As such, p values lower than the alpha value are viewed as a rejection of the null hypothesis, and p values equal to or greater than the alpha value are viewed as a failure to reject’ (p. 6). Yet we have seen that interpreting p values as evidence against the null hypothesis in a single experiment is impossible in the Neyman–Pearson framework. Their approach centers on decision rules with a priori stated error rates, α and β, which are limiting frequencies based on long-run repeated sampling. If a result falls into the critical region, H0 is rejected and HA is accepted, otherwise H0 is accepted and HA is rejected (Goodman, 1993; Huberty, 1993). Interestingly, this last claim contradicts Fisher’s (1966) remark that ‘the null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation’ (p. 16). In the Neyman– Pearson framework one can indeed ‘accept’ the null hypothesis. Neyman’s (1942) advice makes this plain: … we may say that any test of a statistical hypothesis consists in a selection of a certain region, w0, in the n dimensional experimental space W, and in basing our decision as to the hypothesis H0 on whether the experimental point E', determined by the actual observations, falls within w0 or not. If it does, the hypothesis H0 will be rejected, if it does not, it will be accepted. (p. 303, my emphasis)
Note, then, the distinctly Fisherian bent adopted by Wilkinson and the APA Task Force on Statistical Inference (1999) when they recommend: ‘Never use the unfortunate expression “accept the null hypothesis” ’ (p. 599). To reiterate, this advice is at odds with, ostensibly, Neyman–Pearson statistical convention.
Further Confusion over p’s and α’s
In the Neyman–Pearson decision model the researcher is only allowed to say whether or not the result fell in the critical region, not where it fell, as might be indicated by a p value. Thus, if the Type I error rate, α, is fixed at its usual .05 value before (as it must be) the study is carried out, and the researcher subsequently obtains a p value of, say, .0014, this exact value cannot be reported in a Neyman–Pearson hypothesis test (Oakes, 1986). This is because, Goodman (1993, 1999) explains, α is the probability of a set of possible results that may fall anywhere in the tail area of the distribution under the null hypothesis, and we cannot know in advance which of these particular results will arise. This differs from the tail area for the p value,
which is known only after the result is observed, and which, by definition, will always lie exactly on the border of that tail area. Consequently, a predetermined Type I error rate cannot be conveniently renegotiated as a measure of evidence after the result is observed (Royall, 1997). Despite the above, Wilkinson and the APA Task Force on Statistical Inference (1999) again display a Fisherian, rather than a Neyman–Pearsonian, orientation when they state: ‘It is hard to imagine a situation in which a dichotomous accept–reject decision is better than reporting an actual p value’ (p. 599). But it is not at all difficult to imagine such a situation in a Neyman–Pearson context. On the contrary, the dichotomous accept–reject decision is all that is possible in their statistical calculus. Furthermore, p values, exact or otherwise, play no role in the Neyman–Pearson model. By the same reasoning, it is not permissible to report what Goodman (1993, p. 489) calls ‘roving alphas’, whereby p values are assigned a limited number of categories of Type I error rates, such as p < .05, p < .01, p < .001, and so on. As mentioned earlier, and elaborated below, a Type I error rate, α, must be fixed before the data are collected, and any ex post facto reinterpretation of values like p < .05, p < .01, and so on, as variable Type I error rates applicable to different parts of any given study is strictly inadmissible.
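The difference between reporting whether a result fell in the critical region and reporting where it fell can be shown in a few lines. This is a minimal sketch assuming a one-sided z test; the function names and the two illustrative z values are hypothetical and chosen only to make the contrast visible.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def neyman_pearson_decision(z: float, alpha: float = 0.05) -> str:
    """Fixed-alpha rule: report only whether z fell in the one-sided critical region."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)  # set by alpha alone, before the data
    return "reject H0" if z > z_crit else "accept H0"


def fisher_p_value(z: float) -> float:
    """Tail area beyond the observed z, so the observed result sits on its border."""
    return 1 - STD_NORMAL.cdf(z)


# Two hypothetical results that both land in the critical region.
for z in (1.70, 2.99):
    print(z, neyman_pearson_decision(z), round(fisher_p_value(z), 4))
```

Both hypothetical results yield the same Neyman–Pearson decision, because the rule consults only the prespecified critical value. The p values differ because each is a tail area whose boundary is the observed statistic itself, and that boundary is unknown until the data are in.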
Why Must α Be Fixed before the Study Is Carried Out?
At this juncture it is helpful to be reminded why it is necessary to fix α prior to conducting the study, rather than allow it to be flexible (like a p value), and how the failure to do so has tainted researcher behavior. Alpha must be selected before performing the study so as to constrain the Type I error rate to some agreed-upon level. In specifying α in advance, Neyman (1977) admitted that in their efforts to build a frequentist theory of testing hypotheses, he and Pearson were inspired by the French mathematician Borel’s advice that: … (a) the criterion to test a hypothesis (a ‘statistical hypothesis’) using some observations must be selected not after the examination of the results of observation, but before, and (b) this criterion should be a function of the observations ‘en quelque sort remarquable’. (p. 103)
Royall (1997, pp. 119–121) addresses the rationale for prespecifying α, and I paraphrase him here. Suppose a researcher decides not to supply α ahead of time, but waits instead until the data are in. It is decided that if the result is statistically significant at the .01 level, it will be reported as such, and if it falls between the .01 and .05 levels, it will be recorded as being statistically significant at the .05 level. In the Neyman–Pearson model, this test procedure is improper and yields misleading results, namely, that while H0 will occasionally be rejected at the .01 level, it will always be rejected any
time the result is statistically significant at the .05 level. Thus, the research report does not have a Type I error probability of .01, but only .05. If the prespecified α is fixed at the .05 level, no extra observations can legitimize a claim of statistical significance at the .01 level. This, Royall (1997) points out, makes perfect sense within Neyman–Pearson theory, although no sense at all from a scientific perspective, where additional observations would be seen as preferable. And this deficiency in the Neyman–Pearson paradigm has encouraged researchers, albeit unwittingly, to incorrectly supplement their testing procedures with ‘roving alpha’ p values. The Neyman–Pearson hypothesis testing program strictly disallows any ‘peeking’ at the data and (repeated) sequential testing (Cornfield, 1966; Royall, 1997). Cornfield (1966, p. 19) tells of a situation he calls common among statistics consultants: suppose, after gathering n observations, a researcher does not quite reject H0 at the prespecified α = .05 level. The researcher still believes the null hypothesis is false, and that if s/he had obtained a statistically significant result, the findings would be submitted for publication. The investigator then asks the statistician how many more data points would be necessary to reject the null. And, in Neyman–Pearson theory, the answer is no amount of (even extreme) additional data points would allow rejection at the .05 level. As incredible as this answer seems, it is correct. If the null hypothesis is true, there is a .05 chance of its being rejected after the first study. As Cornfield (1966) remarks, however, to this chance we must add the probability of rejecting H0 in the second study, conditional on our failure to do so after the first one. This, in turn, raises the total probability of the erroneous rejection of H0 to over .05. 
Cornfield shows that as the n size in the second study is continuously increased, the significance level approaches .0975 (= .05 + .95 × .05). This demonstrates that no amount of collateral data points can be gathered to reject H0 at the .05 level. Royall (1997) puts it this way: ‘Choosing to operate at the 5% level means allowing only a 5% chance of erroneously rejecting the hypothesis, and the experimenter has already taken that chance. He spent his 5% when he tested after the first n observations’ (p. 111). But once again, despite being in complete accord with Neyman–Pearson theory, this explanation runs counter to common sense. And once again, researchers mistakenly augment the Neyman–Pearson hypothesis-testing model with Fisherian p values, thereby introducing a thicket of ‘roving’ or pseudo-alphas in their reports. Further confounding matters, these variable Type I error ‘p’ values are also interpreted in an evidential fashion. This happens, for example, when p < .05 is deemed ‘significant’, p < .01 is ‘highly significant’, p < .001 is ‘extremely significant’, and so on. Goodman (1993, 1999) and Royall (1997) caution that because of its apparent similarity with the Neyman–Pearson Type I error rate, α, Fisher’s p value has been subsumed within the former’s paradigm. As a result, the p value has been contemporaneously interpreted as both a measure of
evidence and an ‘observed’ error rate. This has created extensive confusion over the meaning of p values and α levels. Unfortunately, as Goodman (1992) warned, … because p-values and the critical regions of hypothesis tests are both tail area probabilities, they are easy to confuse. This confusion blurs the division between concepts of evidence and error for the statistician, and obscures it completely for nearly everyone else. (p. 897)
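Cornfield’s two-look argument discussed above can be checked numerically. The sketch below is an illustration under stated assumptions, not a reproduction of Cornfield’s own calculation: it assumes two-sided z tests on standard normal data with H0 true, a first look after n1 observations, and a second look on the pooled n1 + n2 observations whenever the first look fails to reject. The function name `two_look_type_i_error` and all numerical settings are hypothetical.

```python
import random
from statistics import NormalDist


def two_look_type_i_error(n1, n2, alpha=0.05, trials=100_000, seed=1):
    """Estimate P(reject H0 at either look) when H0 is true.

    Look 1 tests the first n1 observations; if H0 is not rejected, n2 more
    are added and the pooled n1 + n2 observations are tested at the same alpha.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(trials):
        # A sum of n independent N(0, 1) draws is N(0, n), so draw the sums directly.
        sum1 = rng.gauss(0.0, n1 ** 0.5)
        if abs(sum1) / n1 ** 0.5 > z_crit:
            rejections += 1
            continue
        sum2 = rng.gauss(0.0, n2 ** 0.5)
        if abs(sum1 + sum2) / (n1 + n2) ** 0.5 > z_crit:
            rejections += 1
    return rejections / trials


limit = 0.05 + 0.95 * 0.05  # the limiting value quoted from Cornfield
for n2 in (10, 100, 10_000):
    print(n2, round(two_look_type_i_error(n1=10, n2=n2), 4), "limit:", round(limit, 4))
```

Under these assumptions the estimated overall Type I error rate exceeds .05 for every second-stage size, and it moves toward the limiting value as n2 grows, because the pooled statistic becomes nearly independent of the first look. This is the sense in which the experimenter has already spent the 5% at the first test.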
Fisher, Neyman–Pearson and the .05 Level
Finally, it is ironic that the confusion surrounding the differences between p’s and α’s was inadvertently fueled by Neyman and Pearson themselves. This occurred when they used, for convenience, Fisher’s 5% and 1% significance levels to help define their Type I error rates (E.S. Pearson, 1962). Fisher’s popularizing of such nominal levels of statistical significance is a curious, and influential, historical oddity. While working on Statistical Methods for Research Workers, Fisher was denied permission by Karl Pearson to reproduce W.P. Elderton’s table of χ² from the first volume of Biometrika, and therefore created his own version. In doing so, Egon Pearson (1990) wrote: ‘[Fisher] gave the values of [Karl Pearson’s] χ² [and Student’s t] for selected values of P ... instead of P for arbitrary χ², and thus introduced the concept of nominal levels of significance’ (p. 52, my emphasis). As noted, Fisher’s use of 5% and 1% levels was also adopted by Neyman–Pearson. And Fisher (1959) criticized them for doing so, explaining that ‘no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas’ (p. 42). No wonder that many researchers confuse Fisher’s evidential p values with Neyman–Pearson behavioral error rates when both concepts are ossified at the 5% and 1% levels.
Confusion over p’s and α’s: APA Publication Manuals

Adding fuel to the fire, there is also ongoing confusion and equivocation over the interpretation of p values and α levels in various APA Publication Manuals. I noted previously that a prespecified Type I error rate (α) cannot be simultaneously viewed as a measure of evidence (p) once the result is observed. Yet this mistake is actively endorsed as official policy in an earlier APA Publication Manual (1994):

For example, given a true null hypothesis, the probability of obtaining the particular value of the statistic you computed might be [p =] .008. Many statistical packages now provide these exact [p] values. You can report this distinct piece of information in addition to specifying whether you rejected or failed to reject the null hypothesis using the specified alpha level. With an alpha level of .05, the effect of age was statistically significant, F(1, 123) = 7.27, p = .008. (p. 17)
And psychologists are provided with mixed messages by statements such as: ‘Before you begin to report specific results, you should routinely state the particular alpha level you selected for the statistical tests you conducted: [for example,] An alpha level of .05 was used for all statistical tests’ (APA, 1994, p. 17). This is sound advice that is congruent with Neyman–Pearson principles. However, it is immediately followed with some poor advice: ‘If you do not make a general statement about the alpha level, specify the alpha level when reporting each result’ (p. 17). This recommendation promotes unsound statistical practice by virtually sanctioning the habitual use of untenable ‘roving alphas’. Such a situation, whether applied to actual α’s or (more likely) p values, is completely unacceptable: α must be determined prior to, not after, data collection and statistical analysis.

The latest APA Publication Manual (2001), following on the heels of the report of Wilkinson and the APA Task Force on Statistical Inference (1999), continues in a similar vein. Like its 1994 predecessor, for instance, the 2001 Manual distinguishes the two types of probabilities associated with statistical testing, p and α (although both editions incorrectly call the p value a likelihood when it is a probability). But the new version likewise proceeds to dispense erroneous counsel:

The APA is neutral on which interpretation [p or α] is to be preferred in psychological research. . . . Because most statistical packages now report the p value … and because this probability can be interpreted according to either mode of thinking, in general it is the exact probability (p value) that should be reported. (APA, 2001, p. 24, my emphasis)

But this is not the case. The p value is not a Type I error rate, and an α level is not evidential. The Manual goes on to declare:

There will be cases – for example, large tables of correlations or complex tables of path coefficients – where the reporting of exact probabilities could be awkward. In these cases, you may prefer to identify or highlight a subset of values in the table that reach some prespecified level of statistical significance. To do so, follow those values with a single asterisk (*) or double asterisk (**) to indicate p < .05 or p < .01, respectively. When using prespecified significance levels, you should routinely state the particular alpha level you selected for the statistical tests you conducted: An alpha level of .05 was used for all statistical tests. (p. 25)
Such recommendations are misleading to authors. Just after being told to present exact, data-dependent, p values wherever feasible, we are also instructed to use both p values and α levels to indicate prespecified (yet variable) levels of statistical significance. This virtually obligates the researcher to consider a p value as an ‘observed’ Type I error rate. This obligation is all but cemented in another passage, where two examples are given:

Two common approaches for reporting statistical results using the exact probability formulation are as follows:
With an alpha level of .05, the effect of age was statistically significant, F(1, 123) = 7.27, p < .01.
The effect of age was not statistically significant, F(1, 123) = 2.45, p = .12.
The second example should be used only if you have included a statement of significance level earlier in your article. (p. 25)

Note that the first example uses p and α interchangeably, as if they are the same thing. Moreover, p < .01 is not an exact probability. Example two, which does use an exact probability, implies that such probabilities cannot be used without first announcing an ‘overall’ significance level – a prespecified α – to which all others bow. This is not the case (see Note 2).

Because some statisticians, some quantitative psychologists and APA Publication Manuals are unclear about the differences between p’s and α’s, it is to be expected that confusion levels among psychologists in general will be high. I assess the magnitude of this confusion in a sample of APA journals.
Confusion over p’s and α’s: Empirical Evidence

An empirical investigation was made of the way in which the results of statistical tests are reported in the psychology literature. In particular, a randomly selected issue of each of 12 psychology journals – the American Psychologist (AP), Developmental Psychology (DP), Journal of Abnormal Psychology (JAbP), Journal of Applied Psychology (JAP), Journal of Comparative Psychology (JComP), Journal of Consulting and Clinical Psychology (JCCP), Journal of Counseling Psychology (JCouP), Journal of Educational Psychology (JEP), Journal of Experimental Psychology – General (JExPG), Journal of Personality and Social Psychology (JPSP), Psychological Bulletin (PB) and Psychological Review (PR) – was examined for every year from 1990 through 2002 to ascertain the number of empirical articles and notes they contained. Table 1 shows that this examination produced a sample of 1,750 empirical papers, of which 1,645, or 94%, used statistical tests. There is some variability in the percentage of empirical papers using statistical testing, with a low of 40% for AP, and a high of 99.2% for JAbP. Moreover, six of the journals (DP, JAbP, JCCP, JEP, JExPG and JPSP) had in excess of 95% of their published empirical works featuring such tests.

Table 1: The incidence of statistical testing in 12 psychology journals: 1990–2002

| Journal | Number of papers | Number of empirical papers | Number of empirical papers using statistical tests | Percentage of empirical papers using statistical tests |
|---|---|---|---|---|
| AP | 149 | 10 | 4 | 40.0 |
| DP | 222 | 213 | 211 | 99.1 |
| JAbP | 242 | 240 | 238 | 99.2 |
| JAP | 193 | 186 | 176 | 94.6 |
| JCCP | 273 | 236 | 230 | 97.5 |
| JComP | 148 | 146 | 138 | 94.5 |
| JCouP | 176 | 157 | 140 | 89.2 |
| JEP | 193 | 187 | 178 | 95.2 |
| JExPG | 100 | 86 | 85 | 98.8 |
| JPSP | 191 | 188 | 180 | 95.7 |
| PB | 128 | 41 | 32 | 78.0 |
| PR | 110 | 60 | 33 | 55.0 |
| Totals | 2,125 | 1,750 | 1,645 | 94.0 |

Table 2: The reporting of results of statistical tests

| Reporting category | Breakdown | Total | Percentage |
|---|---|---|---|
| ‘Roving alphas’ (R) | p’s: 1,090 | 1,090 | 66.3 |
| Exact p values (Ep) | | 54 | 3.3 |
| Combination of Ep’s with fixed p values and ‘roving alphas’ | Ep + .05: 18; Ep + .01: 5; Ep + .001: 7; Ep + R: 354 | 384 | 23.3 |
| ‘Fixed’ level values: p’s | .05: 47; .01: 17; .001: 17; Other: 7 | 88 | 5.3 |
| ‘Fixed’ level values: ‘Significant’ | .05: 5; .01: 1; .001: –; Other: 1 | 7 | 0.4 |
| ‘Fixed’ level values: α’s | .05: 11; .01: 1; .001: 2; Other: 2 | 16 | 1.0 |
| Unspecified | | 6 | 0.4 |
| Total | | 1,645 | 100.0 |

Even though the Fisherian evidential p value from a significance test is at odds with the Neyman–Pearson hypothesis-testing approach, Table 2 reveals that p values are ubiquitous in psychology’s empirical literature. In marked contrast, α levels are few and far between. Of the 1,645 papers using statistical tests, 1,090, or 66.3%, employed ‘roving alphas’, that is, a discrete, graduated number of p values interpreted variously as Type I error rates and/or measures of evidence against H0, usually p < .05, p < .01, p < .001, and so on. In other words, these p values are sometimes
viewed as an ‘observed’ Type I error rate, meaning that they are not pre-assigned, or fixed, error levels that would be dictated by Neyman–Pearson theory. Instead, these ‘error rates’ are incorrectly determined solely by the data. Corroborating my figures on the incidence of ‘roving alphas’ – 66.3% overall, and 74.4% for the JAP – Finch et al. (2001) found that these were used in 77% of empirical articles in the JAP for the period 1955 to 1999, and in almost 100% of such articles in the British Journal of Psychology for 1999. They concluded: ‘Typical practice is to report relative p values using two or more implicit alpha levels, most commonly .05 and .01 and sometimes .001’ (p. 198, my emphasis). Interestingly, however, they are not critical, as they should be, of this custom of interpreting p values as Type I error rates. Gigerenzer (1993), on the other hand, most certainly is. When discussing how p values are usually reported in the literature – p < .05, p < .01, p < .001 – he observes: ‘Neyman and Pearson would have rejected this practice: These p values are not the probability of Type I errors … [values] such as p < .05, which look like probabilities of Type I errors but aren’t’ (p. 329).

Further clouding the issue, these same p values will be interpreted simultaneously in a quasi-evidential manner as a basis for rejecting H0 if p < α. This includes, in many cases, mistakenly using the p value as a surrogate measure for effect sizes (e.g. p < .05 is ‘significant’, p < .01 is ‘very significant’, p < .001 is ‘extremely significant’, etc.). In short, these ‘roving alphas’ can assume a number of incorrect and contradictory interpretations.

After the ‘roving alphas,’ an additional 54 (3.3%) papers reported ‘exact’ p values, and a further 384 (23.3%) presented various combinations of exact p’s with either ‘roving alphas’ or fixed p values. Conservatively, therefore, some 1,474, or 89.6% (i.e. ‘roving alphas’ plus the combination of exact p’s and ‘roving alphas’), of empirical articles in a sample of psychology journals report the results of statistical tests in a fashion that is at variance with Neyman–Pearson convention. At the same time they violate Fisherian theory, as when p values are misinterpreted as both Type I error rates and as quasi-Fisherian (i.e. p < α) measures of evidence against the null.

Another 6 (0.4%) studies were insufficiently clear about the disposition of a result in their accounts (other than comments such as ‘This result was statistically significant’). I therefore assigned them to the ‘Unspecified’ category. Thus, 111 (6.7%) studies remain eligible for the reporting of ‘fixed’ level α values in the manner intended by Neyman–Pearson. However, 88 of these 111 studies reported ‘fixed p’ rather than fixed α levels. After subtracting this group, a mere 23 (1.4%) studies remain eligible. Of these 23 papers, 7 simply refer to their published results as being ‘significant’ at the .05, .01 levels, and so on, but provide no information about p values and/or α levels. In the final analysis, only 16 of 1,645 empirical papers using statistical tests, or 1%, explicitly used α levels.
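The tallies in the preceding paragraphs can be re-derived from the category counts reported in Table 2. The following is a minimal Python sketch of that arithmetic, added here for illustration only: the dictionary keys and the helper `pct` are hypothetical names, and only the counts themselves come from the table.

```python
# Counts of papers by reporting category, transcribed from Table 2.
# The key names are illustrative labels, not terms used in the article.
counts = {
    "roving_alphas": 1090,
    "exact_p": 54,
    "combination": 384,       # exact p's combined with fixed p's or roving alphas
    "fixed_p": 88,
    "significant_only": 7,
    "fixed_alpha": 16,
    "unspecified": 6,
}

total = sum(counts.values())  # all empirical papers using statistical tests


def pct(n, base=total):
    """Return n as a percentage of base, rounded to one decimal place."""
    return round(100 * n / base, 1)


# 'Roving alphas' plus the combination category: at variance with Neyman-Pearson.
at_variance = counts["roving_alphas"] + counts["combination"]

# Papers left over for 'fixed' level reporting of any kind.
fixed_level = counts["fixed_p"] + counts["significant_only"] + counts["fixed_alpha"]

# Drop the papers that fixed a p value rather than an alpha level.
fixed_not_p = fixed_level - counts["fixed_p"]

print(total)                                              # 1645
print(at_variance, pct(at_variance))                      # 1474 89.6
print(fixed_level, pct(fixed_level))                      # 111 6.7
print(fixed_not_p, pct(fixed_not_p))                      # 23 1.4
print(counts["fixed_alpha"], pct(counts["fixed_alpha"]))  # 16 1.0
```

Keeping the counts in one structure makes it easy to see that every percentage in the text uses the same base of 1,645 papers.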
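Table 3: The reporting of results of statistical tests by journal

| Journal | ‘Roving alphas’ (R): No. | % | Exact p values (Ep): No. | % | Combination of Ep’s with fixed p values and ‘roving alphas’: No. | % | ‘Fixed’ level p’s: No. | % | ‘Fixed’ level ‘Significant’: No. | % | ‘Fixed’ level α’s: No. | % | Unspecified: No. | % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AP | 2 | 50.0 | – | – | 1 | 25.0 | 1 | 25.0 | – | – | – | – | – | – |
| DP | 160 | 75.8 | 11 | 5.2 | 34 | 16.1 | 6 | 2.8 | – | – | – | – | – | – |
| JAbP | 140 | 58.8 | 8 | 3.4 | 82 | 34.5 | 3 | 1.3 | 3 | 1.3 | 1 | 0.4 | 1 | 0.4 |
| JAP | 131 | 74.4 | – | – | 19 | 10.8 | 23 | 13.1 | 1 | 0.6 | 2 | 1.1 | 1 | 0.6 |
| JCCP | 136 | 59.4 | 14 | 6.1 | 68 | 29.7 | 11 | 4.8 | – | – | – | – | – | – |
| JComP | 84 | 60.9 | 4 | 2.9 | 38 | 27.5 | 8 | 5.8 | 1 | 0.7 | 2 | 1.4 | 1 | 0.7 |
| JCouP | 88 | 62.9 | 1 | 0.7 | 40 | 28.6 | 9 | 6.4 | – | – | 2 | 1.4 | – | – |
| JEP | 126 | 70.8 | 5 | 2.8 | 34 | 19.1 | 8 | 4.5 | – | – | 4 | 2.2 | 1 | 0.6 |
| JExPG | 55 | 64.7 | 3 | 3.5 | 14 | 16.5 | 7 | 8.2 | 1 | 1.2 | 3 | 3.5 | 2 | 2.4 |
| JPSP | 133 | 73.9 | 1 | 0.6 | 43 | 23.9 | 3 | 1.7 | – | – | – | – | – | – |
| PB | 16 | 50.0 | 4 | 12.5 | 6 | 18.8 | 4 | 12.5 | – | – | 2 | 6.3 | – | – |
| PR | 19 | 57.6 | 3 | 9.1 | 5 | 15.2 | 5 | 15.2 | 1 | 3.0 | – | – | – | – |
| Totals | 1,090 | 66.3 | 54 | 3.3 | 384 | 23.3 | 88 | 5.3 | 7 | 0.4 | 16 | 1.0 | 6 | 0.4 |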
Table 3 makes it abundantly clear that articles in all 12 psychology journals reflect confusion in the reporting of results of statistical tests. Fisher’s p values (‘roving alphas’, exact, and in various combinations) dominate the way in which the results of statistical tests are portrayed in psychology journals. Neyman–Pearson fixed α’s are practically invisible. Indeed, 5 of the 12 journals in this sample (AP, DP, JCCP, JPSP, PR) report no use of fixed α’s whatsoever. The fact that Neyman–Pearson theory is recognized as statistical orthodoxy is overwhelmingly belied by the results in this paper. In psychology journals, Fisher’s legacy thrives; Neyman–Pearson viewpoints, on the other hand, receive only lip service.
Discussion

Confusion over the interpretation of classical statistical tests is so universal as to render their application almost meaningless. This ubiquitous confusion among researchers over the meaning of p values and α levels is easier to understand when it is pointed out that both expressions are used to refer to the ‘significance level’ of a test. But their interpretations are poles apart. The level of significance expressed by a p value in a Fisherian significance test indicates the probability of encountering data this extreme (or more so) under a null hypothesis. This p value does not pertain to some prespecified error rate, but instead is determined by the data, and has epistemic implications through its use as a measure of inductive evidence against H0 in individual studies. Contrast this with the significance level called α in a Neyman–Pearson hypothesis test. Here, the emphasis is on minimizing Type II, or β, errors (i.e. false acceptance of a null hypothesis) subject to a bound on Type I, or α, errors (i.e. false rejections of a null hypothesis). Moreover, this error minimization applies only to long-run repeated sampling from the same population, not to single experiments, and is a directive for behaviors, not a way of gathering evidence. Viewed from this perspective, the two notions of ‘statistical significance’ could hardly be more distant in meaning. And both the Fisherian and Neyman–Pearson camps, as has been shown repeatedly throughout this paper, were keenly aware of these differences and attempted, in vain as it turns out, to communicate them to research workers. These differences are summarized in Table 4.

Yet these distinguishing characteristics of p’s and α’s are rarely spelled out in the literature. On the contrary, they tend to be used equivalently, even being compared with one another (e.g. p < α), especially in statistics textbooks aimed at applied researchers. Usually, in such texts, an anonymous account of standard Neyman–Pearson doctrine is put forward initially, and is often followed by an equally anonymous discussion of ‘the p value approach’. This transition from α levels to p values (and their intermixing) is typically seamless, as if it constituted a natural progression through different parts of the same coherent statistical whole.
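The contrast drawn above between a pre-assigned α and a data-determined p value can be made concrete with a small simulation. The Python sketch below is an illustration added here, not an analysis from the study. It assumes normally distributed data with a known standard deviation of 1, a true null hypothesis (mean 0), and a two-sided z test; the function names, sample size, number of repetitions and seed are arbitrary choices.

```python
import math
import random

ALPHA = 0.05  # Neyman-Pearson: chosen before any data are seen


def two_sided_p(z):
    """Probability of a standard-normal statistic at least as extreme as z."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_null_studies(n_studies=10000, n=20, seed=2010):
    """Simulate repeated studies in which H0 (mean 0, sd 1) is true.

    Returns one p value per simulated study.
    """
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_studies):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * math.sqrt(n)  # z statistic with known sd = 1
        p_values.append(two_sided_p(z))
    return p_values


p_values = simulate_null_studies()

# Long-run property of the test: share of studies that reject a true H0.
rejection_rate = sum(p < ALPHA for p in p_values) / len(p_values)

# Property of the data: the p value differs from study to study.
print(f"pre-assigned alpha:          {ALPHA}")
print(f"long-run rejection rate:     {rejection_rate:.3f}")
print(f"p values in first 5 studies: {[round(p, 3) for p in p_values[:5]]}")
print(f"smallest / largest p value:  {min(p_values):.4f} / {max(p_values):.4f}")
```

Under these assumptions the rejection rate should settle near the pre-assigned α as the number of simulated studies grows, while the individual p values scatter across the unit interval. No single p value in the list is an ‘observed’ α.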
Table 4: Contrasting p’s and α’s

| p value | α level |
|---|---|
| Fisherian significance level | Neyman–Pearson significance level |
| Significance test | Hypothesis test |
| Evidence against H0 | Type I error – erroneous rejection of H0 |
| Inductive philosophy – from particular to general | Deductive philosophy – from general to particular |
| Inductive inference – guidelines for interpreting strength of evidence in data | Inductive behavior – guidelines for making decisions based on data |
| Data-based random variable | Pre-assigned fixed value |
| Property of data | Property of test |
| Short-run – applies to any single experiment/study | Long-run – applies only to ongoing, identical repetitions of original experiment/study, not to any given study |
| Hypothetical infinite population | Clearly defined population |
Of course, a cynic might argue: so what if researchers cannot correctly distinguish between the meanings of p’s and α’s? S/he may ask just exactly how the confusion over p’s and α’s has impeded knowledge growth in the discipline. But surely continued, misinformed statistical testing cannot be justified on the grounds that its pernicious consequences are difficult to isolate.

Another line of argument that might be adopted by those seeking to maintain the status quo with regard to statistical testing could be that in their interchangeable uses of p values and α’s, researchers are simply picking and choosing from the ‘best parts’ of the (unstated) Fisherian and Neyman–Pearson paradigms. Why should a researcher, one might say, feel obligated to use only one or the other of the statistical testing methods? It might be thought that employing an α level here and a p value there does no damage. But these kinds of rationalizations of improper statistical practice do not wash. The Fisherian and Neyman–Pearson conceptions of statistical testing are incommensurate. The researcher is not at liberty to pick and choose, albeit unintentionally, from two incompatible statistical approaches. Yet the crux of the problem is that this is precisely what the overwhelming majority of psychology (and other) researchers do in their empirical analyses. For example, it would be surprising indeed to learn that the author(s) of even a single article from among the 1,645 examined in our sample consciously planned to analyze the data from both the Fisherian and Neyman–Pearsonian perspectives. The proof of the pudding is seen in the fact that the unawareness among researchers of the distinctions between these perspectives is so ingrained that using p’s and α’s interchangeably is not perceived as even being a problem in the first place.

The foregoing account illustrates why Nickerson’s (2000) statement that ‘reporting p values is not necessarily inconsistent with using an α criterion’ (p. 277) is incorrect. Contrary to Nickerson’s claims (pp. 243 and 242–243, respectively), a p value is not a bound on α, nor is α a decision
criterion for p. Fisher and Neyman–Pearson themselves would have denied such assertions, for they continue to cause trouble. While not writing about the confusion regarding p’s and α’s per se, Tryon’s (1998) comments over general misunderstandings of statistical tests by psychologists are relevant here:

… the fact that statistical experts and investigators publishing in the best journals cannot consistently interpret the results of these analyses is extremely disturbing. Seventy-two years of education have resulted in miniscule, if any, progress toward correcting this situation. It is difficult to estimate the handicap that widespread, incorrect, and intractable use of a primary data analytic method has on a scientific discipline, but the deleterious effects are undoubtedly substantial. (p. 796)
I share Tryon’s sentiments.
Conclusions

Among psychology researchers, the p value is interpreted simultaneously in Neyman–Pearson terms as a deductive assessment of error in long-run repeated sampling situations, and in a Fisherian sense as a measure of inductive evidence in a single study. Clearly, this is asking the impossible. More pointedly, a p value from a Fisherian significance test has no place in the Neyman–Pearson hypothesis-testing framework. And the Neyman–Pearson Type I error rate, α, has no place in Fisher’s significance-testing approach. Contrary to popular belief, p’s and α’s are not the same thing; they measure different concepts.

Such distinctions notwithstanding, the mixing of measures of evidence (p’s) with measures of error (α’s) is standard practice in the classroom and in scholarly journals. Because of this, statistical testing is largely devoid of meaning – a p and/or α value can mean almost anything to the applied researcher, and hence nothing. Furthermore, this mixing of ideas from both schools of thought has not been evenhanded. Paradoxically, while the Neyman–Pearson hypothesis-testing framework began supplanting Fisher’s views on significance testing half a century ago, this is not evident in journal articles. Here, it is Fisher’s legacy that is omnipresent.

Despite trenchant criticism of classical statistical testing (e.g. Schmidt, 1996; Thompson, 1994a, 1994b, 1997; see Note 3), following the recommendations of Wilkinson and the APA Task Force on Statistical Inference (1999), and the continued coverage given to it in the latest APA Publication Manual (2001), it appears that such testing is here to stay. My advice to those who wish to continue using these tests is to be more deliberate about their application. At the very least, researchers should purposely determine whether their concerns are with controlling errors or collecting evidence. If the former, investigators
should adopt a Neyman–Pearson approach to guide behavior, but one which makes a serious attempt to estimate the likely costs associated with Type I and II errors. Researchers choosing this option should discontinue using ‘roving alphas’ and the like, and stick with fixed, pre-assigned, α’s. If the focus of a study is more evidential in nature, the use of Fisher’s p value would be more appropriate, preferably in its exact format so as to once more avoid the ‘roving alphas’ dilemma. For myself, I see little of value in classical statistical testing – whether of the p’s or α’s variety. A better route for assessing the reliability, validity, and generalizability of empirical findings is to establish systematic replication with extension research programs focusing on sample statistics, effect sizes and their confidence intervals (Cumming & Finch, 2001; Hubbard, 1995; Hubbard, Parsa, & Luthy, 1997; Hubbard & Ryan, 2000; Thompson, 1994b, 1997, 1999, 2002). But that is another story.
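The two reporting options recommended above can be kept apart in code as well as in prose. The following Python sketch assumes a two-sided z test purely for simplicity; the function names and the observed statistic are hypothetical, and `NormalDist` comes from Python’s standard `statistics` module. The third function reproduces the ‘roving alphas’ habit only to show what the advice rules out.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def neyman_pearson_decision(z, alpha):
    """Decide for or against H0 using an alpha fixed before data collection.

    Only the decision is returned; no p value enters the report.
    """
    critical = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    return "reject H0" if abs(z) > critical else "do not reject H0"


def fisher_exact_p(z):
    """Exact two-sided p value, reported as evidence from this one study."""
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))


def roving_alpha_label(p):
    """The criticized practice: let the data pick the reported 'level'."""
    for level, label in ((0.001, "p < .001"), (0.01, "p < .01"), (0.05, "p < .05")):
        if p < level:
            return label
    return "n.s."


z = 2.65  # hypothetical observed test statistic

print(neyman_pearson_decision(z, alpha=0.05))  # reject H0
print(f"p = {fisher_exact_p(z):.3f}")          # p = 0.008
print(roving_alpha_label(fisher_exact_p(z)))   # p < .01
```

The design point is that `neyman_pearson_decision` takes α as an input that exists before `z` does, whereas `roving_alpha_label` derives its ‘level’ from the data after the fact.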
Notes

1. The wording by Gigerenzer and Murray here is unfortunate, likely to cause mischief, and necessitates a digression. Their phraseology appears to convey to readers the impression that p values and α levels are the same thing. This uncharacteristic lapse in communication by Gigerenzer and Murray can, no doubt, be attributed to what Nickerson (2000, pp. 261–262) calls ‘linguistic ambiguity’, whereby even experts on matters relating to statistical testing can occasionally use misleading language. In fact, we shall see later that Gigerenzer and his colleagues are acutely aware of the distinctions between p’s and α’s.

2. An anonymous reviewer offered three (speculative) reasons why the APA Publication Manuals bestow erroneous and inconsistent advice. First, across decades of editions, some views were included at one point and not taken out when alternative views were embedded. Second, most of the APA staff do not have training in statistical methods. Third, this same staff do not seem to be overly concerned about statistical issues, as evidenced by (a) the small amount of coverage devoted to them, and (b) the lack of attention in the latest (2001) edition to revisions involving statistical content. See, in this context, Fidler (2002).

3. Indeed, such criticism can also be found in disciplines as diverse as accounting (Lindsay, 1995), economics (McCloskey & Ziliak, 1996), marketing (Hubbard & Lindsay, 2002; Sawyer & Peter, 1983) and wildlife sciences (Anderson, Burnham, & Thompson, 2000). Moreover, this criticism has escalated decade by decade, judging by the number of articles published on the topic (cf. Hubbard & Ryan, 2000).
References

American Psychological Association. (1994). Publication manual of the American Psychological Association (4th ed.). Washington, DC: Author.
American Psychological Association. (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: Author.
Anderson, D.R., Burnham, K.P., & Thompson, W. (2000). Null hypothesis testing: Problems, prevalence, and an alternative. Journal of Wildlife Management, 64, 912–923.
Barnard, G.A. (1985). A coherent view of statistical inference. Technical Report Series, Department of Statistics & Actuarial Science, University of Waterloo, Ontario, Canada.
Berger, J.O., & Delampady, M. (1987). Testing precise hypotheses (with comments). Statistical Science, 2, 317–352.
Carlson, R. (1976). The logic of tests of significance. Philosophy of Science, 43, 116–128.
Carver, R.P. (1978). The case against statistical significance testing. Harvard Educational Review, 48, 378–399.
Chow, S.L. (1996). Statistical significance: Rationale, validity and utility. Thousand Oaks, CA: Sage.
Chow, S.L. (1998). Précis of statistical significance: Rationale, validity and utility (with comments). Behavioral and Brain Sciences, 21, 169–239.
Clark, C.M. (1999). Further considerations of null hypothesis testing. Journal of Clinical and Experimental Neuropsychology, 21, 283–284.
Clark-Carter, D. (1997). The account taken of statistical power in research published in the British Journal of Psychology. British Journal of Psychology, 88, 71–83.
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45, 1304–1312.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–1003.
Cornfield, J. (1966). Sequential trials, sequential analysis, and the likelihood principle. The American Statistician, 20, 18–23.
Cortina, J.M., & Dunlap, W.P. (1997). On the logic and purpose of significance testing. Psychological Methods, 2, 161–172.
Cumming, G., & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61, 532–574.
Daniel, L.G. (1998). Statistical significance testing: A historical overview of misuse and misinterpretation with implications for the editorial policies of educational journals. Research in the Schools, 5, 23–32.
Dar, R., Serlin, R.C., & Omer, H. (1994). Misuse of statistical tests in three decades of psychotherapy research. Journal of Consulting and Clinical Psychology, 62, 75–82.
Dixon, P., & O’Reilly, T. (1999). Scientific versus statistical inference. Canadian Journal of Experimental Psychology, 53, 133–149.
Falk, R., & Greenbaum, C.W. (1995). Significance tests die hard: The amazing persistence of a probabilistic misconception. Theory & Psychology, 5, 75–98.
Fidler, F. (2002). The fifth edition of the APA Publication Manual: Why its statistics recommendations are so controversial. Educational and Psychological Measurement, 62, 749–770.
Finch, S., Cumming, G., & Thomason, N. (2001). Reporting of statistical inference in the Journal of Applied Psychology: Little evidence of reform. Educational and Psychological Measurement, 61, 181–210.
Fisher, R.A. (1925). Statistical methods for research workers. Edinburgh: Oliver & Boyd.
Fisher, R.A. (1929). The statistical method in psychical research. Proceedings of the Society for Psychical Research, London, 39, 189–192.
Fisher, R.A. (1935a). The design of experiments. Edinburgh: Oliver & Boyd.
Fisher, R.A. (1935b). Statistical tests. Nature, 136, 474.
Fisher, R.A. (1950). Contributions to mathematical statistics. London: Chapman & Hall.
Fisher, R.A. (1955). Statistical methods and scientific induction. Journal of the Royal Statistical Society, B, 17, 69–78.
Fisher, R.A. (1956). Statistical methods and scientific inference. Edinburgh: Oliver & Boyd.
Fisher, R.A. (1958). The nature of probability. The Centennial Review of Arts and Sciences, 2, 261–274.
Fisher, R.A. (1959). Statistical methods and scientific inference (2nd rev. ed.). Edinburgh: Oliver & Boyd.
Fisher, R.A. (1966). The design of experiments (8th ed.). Edinburgh: Oliver & Boyd.
Frick, R.W. (1996). The appropriate use of null hypothesis testing. Psychological Methods, 1, 379–390.
Gibbons, J.D., & Pratt, J.W. (1975). P-values: Interpretation and methodology. The American Statistician, 29, 20–25.
Gigerenzer, G. (1993). The superego, the ego, and the id in statistical reasoning. In G. Keren & C.A. Lewis (Eds.), A handbook for data analysis in the behavioral sciences – Methodological issues (pp. 311–339). Hillsdale, NJ: Erlbaum.
Gigerenzer, G., & Murray, D.J. (1987). Cognition as intuitive statistics. Hillsdale, NJ: Erlbaum.
Gigerenzer, G., Swijtink, Z., Porter, T., Daston, L., Beatty, J., & Kruger, L. (1989). The empire of chance. New York: Cambridge University Press.
Goodman, S.N. (1992). A comment on replication, P-values and evidence. Statistics in Medicine, 11, 875–879.
Goodman, S.N. (1993). p values, hypothesis tests, and likelihood: Implications for epidemiology of a neglected historical debate. American Journal of Epidemiology, 137, 485–496.
Goodman, S.N. (1999). Toward evidence-based medical statistics. 1: The P value fallacy. Annals of Internal Medicine, 130, 995–1004.
Grayson, D., Pattison, P., & Robins, G. (1997). Evidence, inference, and the ‘rejection’ of the significance test. Australian Journal of Psychology, 49, 64–70.
Hacking, I. (1965). Logic of statistical inference. New York: Cambridge University Press.
Hinkley, D.V. (1987). Comment. Journal of the American Statistical Association, 82, 128–129.
Hogben, L. (1957). Statistical theory. New York: Norton.
Hubbard, R. (1995). The earth is highly significantly round (p < .0001). American Psychologist, 50, 1098.
Hubbard, R., & Lindsay, R.M. (2002). How the emphasis on ‘original’ empirical marketing research impedes knowledge development. Marketing Theory, 2, 381–402.
Hubbard, R., Parsa, R.A., & Luthy, M.R. (1997). The spread of statistical significance testing in psychology: The case of the Journal of Applied Psychology, 1917–1994. Theory & Psychology, 7, 545–554.
Hubbard, R., & Ryan, P.A. (2000). The historical growth of statistical significance testing in psychology – and its future prospects. Educational and Psychological Measurement, 60, 661–681.
Huberty, C.J (1993). Historical origins of statistical testing practices: The treatment of Fisher versus Neyman–Pearson views in textbooks. Journal of Experimental Education, 61, 317–333.
Huberty, C.J, & Pike, C.J. (1999). On some history regarding statistical testing. In B. Thompson (Ed.), Advances in social science methodology (Vol. 5, pp. 1–22). Stamford, CT: JAI Press.
Hyde, J.S. (2001). Reporting effect sizes: The roles of editors, textbook authors, and publication manuals. Educational and Psychological Measurement, 61, 225–228.
Johnstone, D.J. (1986). Tests of significance in theory and practice (with comments). The Statistician, 35, 491–504.
Johnstone, D.J. (1987a). On the interpretation of hypothesis tests following Neyman and Pearson. In R. Viertl (Ed.), Probability and Bayesian statistics (pp. 311–339). New York: Plenum.
Johnstone, D.J. (1987b). Tests of significance following R.A. Fisher. The British Journal for the Philosophy of Science, 38, 481–499.
Kempthorne, O. (1976). Of what use are tests of significance and tests of hypothesis. Communications in Statistics – Theory and Methods, A5 (8), 763–777.
Keuzenkamp, H.A., & Magnus, J.R. (1995). On tests and significance in econometrics. Journal of Econometrics, 67, 5–24.
Kirk, R.E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56, 746–759. Krantz, D.H. (1999). The null hypothesis testing controversy in psychology. Journal of the American Statistical Association, 94, 1372–1381. Krueger, J. (2001). Null hypothesis significance testing: On the survival of a flawed method. American Psychologist, 56, 16–26. Kyburg, H.E. (1974). The logical foundations of statistical inference. Dordrecht: Reidel. Leavens, D.A., & Hopkins, W.D. (1998). Intentional communication by chimpanzees: A cross-sectional study of the use of referential gestures. Developmental Psychology, 34, 813–822. LeCam, L., & Lehmann, E.L. (1974). J. Neyman: On the occasion of his 80th birthday. The Annals of Statistics, 2, vii–xiii. Lehmann, E.L. (1993). The Fisher, Neyman–Pearson theories of testing hypotheses: One theory or two? Journal of the American Statistical Association, 88, 1242–1249. Lindsay, R.M. (1995). Reconsidering the status of tests of significance: An alternative criterion of adequacy. Accounting, Organizations and Society, 20, 35–53. Loftus, G.R. (1996). Psychology will be a much better science when we change the way we analyze data. Psychological Science, 7, 161–171. Macdonald, R.R. (1997). On statistical testing in psychology. British Journal of Psychology, 88, 333–347. McCloskey, D.N., & Ziliak, S.T. (1996). The standard error of regressions. Journal of Economic Literature, 34, 97–114. Meehl, P.E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34, 103–115. Mulaik, S.A., Raju, N.S., & Harshman, R.A. (1997). There is a time and a place for significance testing. In L.L. Harlow, S.A. Mulaik, & J.H. Steiger (Eds.), What if there were no significance tests? (pp. 65–116). Hillsdale, NJ: Erlbaum. Nester, M.R. (1996). An applied statistician’s creed. Applied Statistics, 45, 401–410. Neyman, J. (1942). 
Basic ideas and some recent results of the theory of testing statistical hypotheses. Journal of the Royal Statistical Society, 105, 292–327. Neyman, J. (1950). First course in probability and statistics. New York: Holt. Neyman, J. (1957). ‘Inductive behavior’ as a basic concept of philosophy of science. International Statistical Review, 25, 7–22. Neyman, J. (1961). Silver jubilee of my dispute with Fisher. Journal of the Operations Research Society of Japan, 3, 145–154. Neyman, J. (1962). Two breakthroughs in the theory of statistical decision making. Review of the International Statistical Institute, 30, 11–27. Neyman, J. (1967). R.A. Fisher (1890–1962), an appreciation. Science, 156, 1456–1460. Neyman, J. (1971). Foundations of behavioristic statistics (with comments). In V.P. Godambe & D.A. Sprott (Eds.), Foundations of statistical inference (pp. 1–19). Toronto: Holt, Rinehart & Winston of Canada. Neyman, J. (1977). Frequentist probability and frequentist statistics. Synthese, 36, 97–131. Neyman, J., & Pearson, E.S. (1928a). On the use and interpretation of certain test criteria for purposes of statistical inference. Part I. Biometrika, 20A, 175–240. Neyman, J., & Pearson, E.S. (1928b). On the use and interpretation of certain test criteria for purposes of statistical inference. Part II. Biometrika, 20A, 263–294. Neyman, J., & Pearson, E.S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London, Ser. A, 231, 289–337. Nickerson, R.S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5, 241–301. Nix, T.W., & Barnette, J.J. (1998). The data analysis dilemma: Ban or abandon. A review of null hypothesis significance testing. Research in the Schools, 5, 3–14.
Oakes, M. (1986). Statistical inference: A commentary for the social and behavioral sciences. New York: Wiley. Pearson, E.S. (1962). Some thoughts on statistical inference. Annals of Mathematical Statistics, 33, 394–403. Pearson, E.S. (1990). ‘Student’: A statistical biography of William Sealy Gosset. Oxford: Clarendon Press. Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. London, Edinburgh and Dublin Philosophical Magazine and Journal of Science, 50, 157–175. Reid, C. (1982). Neyman–from life. Berlin: Springer-Verlag. Rosnow, R.L., & Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological science. American Psychologist, 44, 1276–1284. Royall, R.M. (1997). Statistical evidence: A likelihood paradigm. New York: Chapman & Hall. Rozeboom, W.W. (1960). The fallacy of the null-hypothesis significance test. Psychological Bulletin, 57, 416–428. Sawyer, A.G., & Peter, J.P. (1983). The significance of statistical significance tests in marketing research. Journal of Marketing Research, 20, 122–133. Schmidt, F.L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for the training of researchers. Psychological Methods, 1, 115–129. Seidenfeld, T. (1979). Philosophical problems of statistical inference: Learning from R.A. Fisher. Dordrecht: Reidel. Spielman, S. (1974). The logic of tests of significance. Philosophy of Science, 41, 211–226. Thompson, B. (1994a). Guidelines for authors. Educational and Psychological Measurement, 54, 837–847. Thompson, B. (1994b). The pivotal role of replication in psychological research: Empirically evaluating the replicability of sample results. Journal of Personality, 62, 157–176. Thompson, B. (1997). Editorial policies regarding statistical significance tests: Further comments. 
Educational Researcher, 26, 29–32. Thompson, B. (1999). If statistical significance tests are broken/misused, what practices should supplement or replace them? Theory & Psychology, 9, 165–181. Thompson, B. (2002). What future quantitative social science research could look like: Confidence intervals for effect sizes. Educational Researcher, 31, 25–32. Tryon, W.W. (1998). The inscrutable null hypothesis. American Psychologist, 53, 796. Wilkinson, L., & the APA Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604. Zabell, S.L. (1992). R. A. Fisher and the fiducial argument. Statistical Science, 7, 369–387.
69 Research Methods: Experimental Design Julian C. Stanley
Current Trends
Modern experimental design is less than half a century old. Its grandfather was William Gosset (“Student”); its very active father, Sir Ronald Fisher; its bible, Fisher’s The Design of Experiments (27). Recently two competitors, decision theory and information theory, have risen to challenge the supremacy of the analysis of variance and covariance. Fisher (28:69) spoke out against “the attempt to reinterpret the common tests of significance used in scientific research as though they constituted some kind of acceptance procedure and led to ‘decisions’ in Wald’s sense. . . .” Pearson (59) and Neyman (55) replied to some of his criticisms. Cochran (15) stated that it would be unduly restrictive and harmful to view the function of statistics wholly in terms of decision making. In a technical paper, Lindley (45) suggested that altho undoubtedly one reason for experimenting is to reach decisions, another is to gain knowledge about the state of nature in the information-theory sense of Shannon. He introduced a measure of the information that an experiment provides and formulated a rule of experimentation: Perform that experiment for which the expected gain in information is greatest and continue experimenting until a predesignated amount of information is attained. Garner and McGill (31) compared uncertainty analysis, following Shannon, with a two-way nonorthogonal analysis of variance and concluded that while the two are similar in many respects, uncertainty analysis should
Source: Review of Educational Research, XXVII(5) (1957): 449–459.
be used when the criterion variable has the properties of Stevens’ nominal or ordinal scale, while the analysis of variance must be employed when one desires to retain information about the metric and conditions for an interval or ratio scale are met. Frequently it may be desirable to do both analyses and compare the results. Box (5:975) recorded his opinion that “outside the field of agriculture the sequential situation is by far the most common one.” Altho the mathematical statistics of sequential experimentation have proved formidable, Grundy, Healy, and Rees (32) presented a solution for a simple version of the two-stage experiment: After the first experiment the researcher decides whether or not he needs to perform a second experiment whose extent depends upon the results of the first. Johnson (38) utilized sequential procedures to discriminate between two hypotheses about the ratio of variance components in a simple one-way classification. Ray (61) published tables for sequential tests applicable to the one-way classification and randomized blocks. In a clear, explicit, heuristic article, Box (4) proposed a method of process improvement to be run in the normal course of production by plant personnel themselves, whereby industrial processes regularly yield information on how the product can be improved. Educators might be repaid amply for time spent pondering the applications of Box’s concepts to educational “products.”
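As a rough illustration of the kind of uncertainty analysis mentioned above, the following Python sketch computes Shannon uncertainties and the contingent uncertainty (the information shared by the row and column classifications) for a two-way frequency table. The function names and the small table are hypothetical and chosen only for illustration; they are not taken from Garner and McGill (31).

```python
import math

def uncertainty(counts):
    """Shannon uncertainty, in bits, of a list of category frequencies."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def contingent_uncertainty(table):
    """U(rows) + U(columns) - U(rows, columns) for a two-way frequency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    cells = [c for row in table for c in row]
    return uncertainty(row_totals) + uncertainty(col_totals) - uncertainty(cells)

# Hypothetical data: rows are treatments, columns are nominal response categories.
table = [[30, 10],
         [10, 30]]
print(round(contingent_uncertainty(table), 4))
```

A table whose rows and columns are independent gives a contingent uncertainty of zero, which plays a role loosely analogous to a zero treatment effect in the analysis of variance.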
Fundamental Books
Cochran and Cox (16) expanded their 1950 edition by 36 percent to cover recent developments. This expository handbook of designs is invaluable to the experimenter who already has a year or so of background in statistics. Snedecor’s revision (67) should be quite helpful to educators who can translate agricultural examples into the jargon of their own areas. Davies (22) edited a large volume that, despite its title, has much relevant material; principles of experimental design are widely applicable. Likewise, the contributions of Bennett and Franklin (3), who devoted 280 pages to the analysis of variance and design of experiments, transcend the chemical industry. Perhaps Federer (24) tried to cover too much without presenting design plans except in his examples, thereby making his treatment overly concise and difficult for most educational workers. Finney (25) wrote clearly but abbreviated excessively, providing little detailed help with computations. Ostle (57) included chapters on the analysis of variance and covariance and experimental design. A second edition of the Wishart and Sanders (82) manual appeared. Pearson and Hartley (60) tabled percentage points of the F distribution for the .25, .10, .05, .025, .01, .005, and .001 levels of significance and for the largest variance ratio, besides providing many other useful statistics.
Applied Books
McNemar (48) expanded his coverage of the design and analysis of experiments, especially with respect to statistical models. Cornell’s fresh new approach (19) merits adequate tryouts in one-year sequences; this book is a long step ahead of the usual textbook in educational statistics, but its sections on the analysis of variance demand supplementation along lines suggested by Stanley in his review (19). A similar warning applies with even greater force to Guilford’s revision (33). The chapter by Edwards (23) sets forth elementary principles of experimental design. Brunswik’s “representative design” (9) bears a certain resemblance to recent work on random factors and variance components (20, 64, 81) and therefore might be pursued profitably in those terms.
Primarily Expository Articles
Baker (1) described a systematic method for arranging and analyzing the results of factorial experiments that gives the mean square relating to each degree of freedom. It is applicable either to qualitative or to quantitative factors and is especially suitable for use with a desk calculator. Stanley (70) emphasized the importance of design, stressed random assignment, and outlined the planning of two classroom experiments. Cochran (14) commented about what may well have been the largest experiment ever conducted, listing important factors in the polio field trials. Stanley (73) showed in detail how to analyze scores from counterbalanced examinations, explaining Latin and Greco-Latin square crossover designs. Campbell (11) set forth clearly and systematically considerations of group control design, with special attention to the role of pretests.
Four Basic Articles
Four long, well-written, clarifying articles of great importance to experimenters, tho probably not easy reading for most educators, deserve careful study. Scheffé (64) surveyed the current state of the theory of alternative models in the analysis of variance, showing (on page 259) expected values of mean squares for the mixed model with dependent interactions. His results agree with those of other recent investigators (20, 81) but not with certain earlier recommendations. McNemar (48:309), for instance, included in his expected mean square for the random effect in a mixed model an interaction component of variance that Scheffé omits, thereby leading the former to a more conservative test of significance for the random effect.
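The disagreement can be made concrete with a schematic two-way mixed layout. The notation below is an assumption introduced here for illustration and is not copied from Scheffé (64) or McNemar (48): factor A is fixed with a levels, factor B is random with b levels, there are n observations per cell, and the two parameterizations define the variance components slightly differently.

```latex
\begin{aligned}
E(MS_A)    &= \sigma_e^2 + n\,\sigma_{AB}^2 + nb\,\theta_A^2, \\
E(MS_B)    &= \sigma_e^2 + na\,\sigma_B^2
              \quad\text{(interaction component omitted)}, \\
E(MS_B)    &= \sigma_e^2 + n\,\sigma_{AB}^2 + na\,\sigma_B^2
              \quad\text{(interaction component included)}, \\
E(MS_{AB}) &= \sigma_e^2 + n\,\sigma_{AB}^2, \\
E(MS_E)    &= \sigma_e^2 .
\end{aligned}
```

Under the first form of E(MS_B) the random effect is tested against the within-cell error, F = MS_B / MS_E. Under the second it is tested against the interaction, F = MS_B / MS_AB, whose denominator has fewer degrees of freedom and a larger expectation whenever the interaction variance is positive; hence the more conservative test.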
Cornfield and Tukey (20) dealt with average (expected) values of mean squares for several types of factorial designs, stating that they are absolutely essential to the choice of an error term – tho not sufficient, of course – and found that the customary expected mean squares (64: 259; 81: 963) resulted from their derivations under very general assumptions. Thus it may make sense for Hoyt (35) and others to talk in terms of variance components (or rather, of intraclass correlation) for dichotomously scored test items. Wilk and Kempthorne (81) continued their contributions to the analysis of variance, defining experimental units as “those entities in an experiment to which treatments are assigned at random” (page 951) and suggesting that the term be extended to include periods of time, states of mind, and other poorly defined complexes of conditions. They stressed the need for random sampling of “levels” of a factor when the experimenter wants to generalize to all levels, just as one samples randomly from a population of experimental units if he wants to generalize beyond the experimental units actually employed in the experiment. Wilk and Kempthorne showed how unit-treatment interactions enter into expectations of mean squares, pointing out that if the number of experimental units in the population of experimental units is large, the bias caused by such interactions will be small. We have long needed a thoro, authoritative treatment of the Latin square design. Wilk and Kempthorne (80) couched this in relatively easy language and symbolism, presented finite-model expectations of mean squares from which EMS’s for the other models can be derived quickly, and concluded that analyses of variance for Latin square designs may overestimate the error term for treatment comparisons and underestimate the component of variance due to treatment main effects. 
Nevertheless, when both the Latin square and the randomized block designs are reasonable for a proposed experiment, they recommend the former, tho with caution, because its error term for the treatment mean square will usually be too large. Experimenters owe a debt of gratitude to Scheffé, Cornfield and Tukey, and Wilk and Kempthorne for the care they took to make these articles intelligible to the reader whose mathematics walks with a limp.
Other Articles Concerning Expectations of Mean Squares
In a note (71) and a review (19) Stanley listed and discussed expectations of mean squares for finite, fixed, random, and mixed models, explicitly for two- and three-way crossed classifications with equal numbers of replicates per treatment combination. Extensions to one and to four or more classifications can be made quickly on the basis of principles outlined. Johnson and Stanley (39) exhibited the EMS’s for a mixed-model design involving two independent groups of boys, every one of whom responded to 16 projective cards into which were incorporated three dichotomous factors, each level
of which was represented twice: 2 (2 × 2 × 2) = 16. They showed how randomization at several points in their investigation was essential to the analysis employed. Medley, Mitzel, and Doi (49) provided EMS’s for the three-way design without replication tho not via the finite-model EMS’s.
Designs
Technical journals, such as Biometrics, Biometrika, and Sankhyā, abound with papers extending old experimental designs and proposing new ones. The interested reader is referred to these journals and to Cochran and Cox (16). Only a few articles of this type will be cited here. Zelen (83) explained at a reasonably elementary level his new method for analyzing data from incomplete block designs. Stanley (72) showed that Gellerman’s study was more complex than its author supposed, constituting a sort of split-plot crossover design, and explained how to analyze it. Stanley (69) used scores from two forms of a “satisfaction” inventory to compare crossover and noncrossover designs. Pearce (58) studied local versus remote effects of various treatments applied to different parts of the same organism. Morrison (53) illustrated several designs with at least five factors that make possible the testing of all main effects and two-factor interactions while requiring only half the number of observations of an analogous factorial design. Clarke (12) indicated how four 4 × 4 Greco-Latin squares might be used together to provide enough degrees of freedom for error. Stanley (73) dealt with completely permuted 3 × 3 Greco-Latin square designs. Collier and Stunkard (17) treated the same type of design as Stanley and Beeman (74) tho quite differently.
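The half-replicate idea can be sketched with the simplest case, five factors at two levels each. This is only an illustration under that assumption; Morrison’s article (53) concerns mixed series, and its constructions are not reproduced here. The function names are hypothetical. The sketch builds the 16 runs by setting the fifth factor equal to the product of the other four and then checks that all main-effect and two-factor-interaction contrasts are mutually orthogonal, so that each can be estimated separately.

```python
from itertools import product, combinations

def half_fraction(k=5):
    """Runs of a two-level half replicate: last factor = product of the others."""
    runs = []
    for levels in product((-1, 1), repeat=k - 1):
        last = 1
        for x in levels:
            last *= x
        runs.append(levels + (last,))
    return runs

def effect_columns(runs):
    """Contrast columns for every main effect and two-factor interaction."""
    k = len(runs[0])
    cols = {}
    for i in range(k):
        cols[(i,)] = [r[i] for r in runs]
    for i, j in combinations(range(k), 2):
        cols[(i, j)] = [r[i] * r[j] for r in runs]
    return cols

runs = half_fraction(5)
cols = effect_columns(runs)
# Every pair of distinct effect columns should have a zero dot product.
orthogonal = all(
    sum(x * y for x, y in zip(cols[a], cols[b])) == 0
    for a, b in combinations(cols, 2)
)
print(len(runs), len(cols), orthogonal)
```

The full 2 × 2 × 2 × 2 × 2 factorial would need 32 runs; the half replicate uses 16 while keeping the 5 main effects and 10 two-factor interactions separable, at the price of confounding them with higher-order interactions.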
Components of Variance, Pooling Procedures, and Power
Tukey (77, 78) tackled variance components with a new mathematical procedure. Bulmer (10) gave a simple, reasonably accurate formula for the confidence limits of variance components. Searle (65) employed matrix methods to find sampling variances of estimates of components of variance and covariance for a one-way classification with unequal numbers of observations in the various classes. King (40) recommended that in one-way classifications of a random factor, the number of levels of the factor equal the number of observations per level in order to give nearly maximum power for testing the null hypothesis. Johnson’s study (38) has already been cited. Huntsberger (36) showed that a certain weighting procedure provides greater control over disturbances that might result from pooling sums of squares to secure an error term with greater degrees of freedom than does the familiar sometimes-pool method. Bozivich, Bancroft, and Hartley (8)
examined critically for some random and mixed models the consequences, with regard to resulting errors of the first and second kind, of certain pooling procedures. They provided two qualified recommendations, concluding that “no rule of the form V2/V1 > constant is very satisfactory” (page 1040). Nicholson’s formula (56) for the power of the analysis of variance test holds when the denominator of the F ratio has an even number of degrees of freedom. Fox’s extensive charts (29) with a detailed example should be useful. Commins (18) published a table stating the size of sample needed for various assumed values of population parameters and for four probabilities that the study will yield significant results.
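For readers unfamiliar with variance components, a minimal sketch of the usual analysis-of-variance (method-of-moments) estimates for a balanced one-way classification of a random factor may help. It is a textbook calculation, not the procedure of any article cited above, and the function name and data are hypothetical.

```python
def variance_components(groups):
    """Method-of-moments estimates for a balanced one-way random-effects layout.

    groups: list of equal-length lists, one per randomly sampled level.
    Returns (ms_between, ms_within, sigma2_between).
    """
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal numbers per level")
    grand = sum(sum(g) for g in groups) / (k * n)
    means = [sum(g) / n for g in groups]
    ss_between = n * sum((m - grand) ** 2 for m in means)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (k * (n - 1))
    # E(MS_between) = sigma2_within + n * sigma2_between, so solve for the latter.
    sigma2_between = (ms_between - ms_within) / n  # can be negative in small samples
    return ms_between, ms_within, sigma2_between

# Hypothetical scores: 3 randomly chosen classes with 3 pupils each.
data = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 2.0, 3.0]]
msb, msw, s2_between = variance_components(data)
print(msb, msw, s2_between)
```

The within-level mean square estimates the error variance directly; the between-level component is obtained by subtraction, which is why its sampling behavior, and hence its confidence limits, are harder to pin down.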
A Posteriori Comparisons of Means
Stanley (68) illustrated the “post-mortem” methods of Scheffé, Tukey, Dunnett, and Duncan for comparing various differences among means after an analysis of variance has been performed. Wallace (79) presented Tukey’s unpublished procedure, with applications and comments. Kramer (42, 43) extended Duncan’s multiple range test to include group means with unequal numbers of replications, adjusted means with heterogeneous variances and covariances, covariance analysis, incomplete block designs, lattices, and other situations.
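As one concrete instance of these methods, Scheffé’s criterion for an arbitrary contrast among k means can be written as follows. The notation is an assumption made here for illustration: the sample means and sizes are x̄_i and n_i, N is the total number of observations, MS_W is the within-groups mean square, and F is the tabled upper α point.

```latex
\hat{\psi} = \sum_{i=1}^{k} c_i \bar{x}_i, \qquad \sum_{i=1}^{k} c_i = 0,
\qquad
|\hat{\psi}| > \sqrt{(k-1)\,F_{\alpha;\,k-1,\,N-k}}\;
\sqrt{MS_W \sum_{i=1}^{k} \frac{c_i^2}{n_i}} .
```

A contrast is declared significant when the inequality holds. Because the multiplier (k − 1)F covers every possible contrast simultaneously, the method is conservative when only a few pairwise differences are of interest.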
Applications of the Analysis of Variance to Tests
Hoyt (35) generalized to test items not scored dichotomously his analysis of variance procedure for securing coefficients of equivalence. His result is algebraically equivalent to Cronbach’s alpha. Stanley commented on this procedure in his review (19). Moonan (50, 51, 52) showed how to ascertain the equivalence and stability of examinations and the interaction of items with methods in an experiment, using an orthogonal linear transformation due to Nandi.
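The algebraic equivalence can be checked numerically. The sketch below, with hypothetical function names and made-up ratings, computes the analysis-of-variance coefficient as one minus the ratio of the residual mean square to the persons mean square in a persons-by-items layout, and compares it with coefficient alpha computed from item and total-score variances.

```python
def hoyt_reliability(scores):
    """1 - MS_residual / MS_persons from a persons-by-items score table."""
    n = len(scores)       # persons
    k = len(scores[0])    # items
    grand = sum(map(sum, scores)) / (n * k)
    person_means = [sum(row) / k for row in scores]
    item_means = [sum(col) / n for col in zip(*scores)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_persons = k * sum((m - grand) ** 2 for m in person_means)
    ss_items = n * sum((m - grand) ** 2 for m in item_means)
    ss_resid = ss_total - ss_persons - ss_items
    ms_persons = ss_persons / (n - 1)
    ms_resid = ss_resid / ((n - 1) * (k - 1))
    return 1 - ms_resid / ms_persons

def cronbach_alpha(scores):
    """Coefficient alpha from item variances and the variance of total scores."""
    k = len(scores[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [var(col) for col in zip(*scores)]
    total_var = var([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical ratings: 5 examinees by 4 items, each scored 0 to 3.
scores = [[3, 2, 3, 2], [1, 1, 0, 1], [2, 2, 3, 3], [0, 1, 1, 0], [3, 3, 2, 3]]
print(hoyt_reliability(scores), cronbach_alpha(scores))
```

The two printed values agree up to rounding error, whatever the scoring scale of the items.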
Nonparametric Approaches to the Analysis of Variance
That current darling of psychologists, nonparametric statistics, is treated in Chapter VI of this issue, except as applied to the analysis of variance. Roy and Mitra (62: 374) attempted to make a clear distinction between a “variate” and a “way of classification” in order to differentiate between a “multivariate analysis” situation, an “analysis of variance” situation, and “something of a mixed type.” Hodges and Lehmann (34) found that the
asymptotic Pitman efficiency of the Kruskal-Wallis rank test when compared with the F-test never falls below .864. They then investigated alternative notions of asymptotic efficiency. Sutcliffe (75) showed how to partition sums of squares and associated degrees of freedom for complex contingency tables of frequency data from multiple classification designs, quite analogously to the analysis of variance. It is interesting to compare this method with the Garner-McGill uncertainty analysis (31) based upon information theory. McNemar (47) analyzed seven sets of data both by the analysis of variance and Kellogg V. Wilson’s method, which is similar to Sutcliffe’s but less general, to show that the latter has considerably less power. One might add that with Wilson’s partitioning of chi-square the interaction term may be negative. The upshot of this seems to be that while nonparametric analogues of the analysis of variance are valuable for frequency data, one must be careful not to throw out the baby with the dirty bath water in the interests of simplifying computations.
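For reference, the rank test whose efficiency is discussed above can be computed in a few lines. This is a minimal sketch of the standard H statistic using midranks for tied observations and omitting the tie correction; the function name and data are hypothetical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with midranks for ties (no tie correction)."""
    pooled = sorted(x for g in groups for x in g)
    n_total = len(pooled)
    # Map each distinct value to the average of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    weighted = 0.0
    for g in groups:
        rank_sum = sum(rank_of[x] for x in g)
        weighted += rank_sum ** 2 / len(g)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

# Hypothetical scores for three independent groups.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(h)
```

With k groups, H is referred to the chi-square distribution with k − 1 degrees of freedom when the samples are not too small.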
The Analysis of Covariance
In an important paper, Cox (21) compared covariance analysis with blocking (matching with respect to a covariable) and concluded that methods based upon covariance are preferable to blocking only if the r between the covariate (x) and the dependent variate (y) is at least .6. But if we suspect that the treatment effects are not independent of x – that there is a treatment by x interaction – we should ordinarily prefer to use x quantitatively. Truitt and Smith (76) examined methods for making covariance adjustments in split-plot experiments and testing main effects for significance. We have already mentioned the work of Kramer (42) and Searle (65).
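To show what using x quantitatively involves in the simplest case, the following sketch computes the pooled within-group regression slope of y on x and the covariance-adjusted treatment means for a completely randomized design. It is a standard textbook calculation rather than Cox’s comparison (21), and the function name and data are hypothetical.

```python
def ancova_adjusted_means(groups):
    """Pooled within-group slope and covariance-adjusted treatment means.

    groups: dict mapping a treatment label to a list of (x, y) pairs.
    """
    all_x = [x for pairs in groups.values() for x, _ in pairs]
    grand_x = sum(all_x) / len(all_x)
    sxx = sxy = 0.0
    means = {}
    for label, pairs in groups.items():
        mx = sum(x for x, _ in pairs) / len(pairs)
        my = sum(y for _, y in pairs) / len(pairs)
        means[label] = (mx, my)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx  # common slope, assuming no treatment-by-x interaction
    adjusted = {
        label: my - slope * (mx - grand_x) for label, (mx, my) in means.items()
    }
    return slope, adjusted

# Hypothetical pretest (x) and posttest (y) scores for two treatments.
data = {
    "A": [(1, 2), (2, 4), (3, 6)],
    "B": [(3, 7), (4, 9), (5, 11)],
}
slope, adjusted = ancova_adjusted_means(data)
print(slope, adjusted)
```

The adjustment rests on a single slope common to all treatments; when a treatment by x interaction is suspected, separate slopes would have to be fitted instead.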
Pairing
Jackson and Fleckenstein (37) compared the Thurstone-Mosteller, Scheffé, Bradley-Terry, and Gulliksen methods for analyzing data based upon paired comparisons, concluding that while all four procedures give about the same results, each has advantages for certain situations. For both the fixed and the mixed models, Lev and Kinder (44) offered analysis of variance formulas applicable to a group of several subjects observed in the presence of each of the other subjects of the group, the entire set of possible pairings having been repeated on several occasions. Runkel, Smith, and Newcomb (63) presented a method for computing the interaction effects on variables measured by observing interacting pairs of persons, where not all possible pairs of subjects need be observed.
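Of the four paired-comparison methods named above, the Bradley-Terry model is perhaps the easiest to sketch. It assumes that item i is preferred to item j with probability p_i / (p_i + p_j). The code below fits the worth parameters by a simple fixed-point iteration on the likelihood equations; the function name, the iteration count, and the preference counts are assumptions made for illustration, not details from Jackson and Fleckenstein (37).

```python
def bradley_terry(wins, iterations=200):
    """Fit Bradley-Terry worth parameters by fixed-point iteration.

    wins[i][j] = number of times item i was preferred to item j.
    Returns worths normalised to sum to 1.
    """
    m = len(wins)
    p = [1.0 / m] * m
    for _ in range(iterations):
        new_p = []
        for i in range(m):
            total_wins = sum(wins[i][j] for j in range(m) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(m) if j != i
            )
            new_p.append(total_wins / denom)
        s = sum(new_p)
        p = [v / s for v in new_p]
    return p

# Hypothetical data: each pair of three items judged 10 times.
wins = [[0, 7, 8],
        [3, 0, 6],
        [2, 4, 0]]
worths = bradley_terry(wins)
print([round(w, 3) for w in worths])
```

The fitted worths order the items on a ratio scale; this sketch assumes every item wins at least once and that the comparisons connect all items.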
Miscellaneous
In a long, technical article, Box and Hunter (7) continued the development of “Boxism,” introducing the concept of the “variance function” for an experimental design and defining “rotatable designs.” Cochran (13) discussed combining estimates from several experiments and gave examples. Box and Andersen (6) found that while the analysis of variance test for groups of equal size is both remarkably “robust” (insensitive to extraneous factors not being tested) and “powerful” (sensitive to the specific factors being tested), Bartlett’s test for the homogeneity of a set of variances is affected drastically by departures from mesokurtosis. For 20 variances based upon 9 d.f. each, Bartlett’s test yields a significance level of .718 when kurtosis is 2, instead of the “correct” .05 value! For kurtosis of –1 the corresponding figure is .000004. The authors applied permutation theory to the problem of comparing variances to secure a robust test. Fisher (26) warned against choosing one transformation of data in preference to another because of computational ease without considering how well it conforms to theoretical considerations. Smith (66) stepped in to resolve the long-standing discussion in Biometrics about whether the missing plot estimate should be considered simply a number to be placed in the empty space or an estimate of the lost observation. He pointed out that the standard error to be attached to the estimate depends upon what one intended a priori to estimate. Articles on graphic methods by Barnes, Pearson, and Reiss (2) and Lyle (46) are well worth perusing.
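For orientation, Bartlett’s statistic itself is easy to compute; the sensitivity to kurtosis reported above concerns its null distribution, not the arithmetic. The sketch below uses the standard formula with its usual correction factor; the function name and data are hypothetical, and the code does not reproduce the permutation approach of Box and Andersen (6).

```python
import math

def bartlett_statistic(samples):
    """Bartlett's statistic for homogeneity of variances.

    Under normality it is referred to chi-square with k - 1 degrees of freedom.
    """
    k = len(samples)
    dfs = [len(s) - 1 for s in samples]
    variances = []
    for s in samples:
        m = sum(s) / len(s)
        variances.append(sum((x - m) ** 2 for x in s) / (len(s) - 1))
    df_total = sum(dfs)
    pooled = sum(d * v for d, v in zip(dfs, variances)) / df_total
    m_stat = df_total * math.log(pooled) - sum(
        d * math.log(v) for d, v in zip(dfs, variances)
    )
    correction = 1 + (sum(1 / d for d in dfs) - 1 / df_total) / (3 * (k - 1))
    return m_stat / correction

# Hypothetical samples with visibly different spreads.
stat = bartlett_statistic([[0.0, 2.0], [0.0, 4.0]])
print(stat)
```

The statistic is zero when all sample variances are equal and grows as they diverge; since the chi-square reference distribution depends on normality, non-normal kurtosis can distort the apparent significance level so badly.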
Concluding Remarks
Many of the contributions to experimental design during the past three years should be incorporated rapidly into statistics textbooks designed for students in education and psychology. Authors of such books need the ability and willingness to translate into simpler but still accurate form relevant material published by mathematical statisticians. Then by studying for at least a year, and preferably longer, under a well-qualified instructor, graduate students may come to understand the rudiments of experimental design. To do less than this and still hope for properly designed experiments is asking for a miracle.
Bibliography
1. Baker, Anthony G. “Analysis and Presentation of the Results of Factorial Experiments.” Applied Statistics 6: 45–55; March 1957. 2. Barnes, Benjamin A.; Pearson, Elinor; and Reiss, Eric. “The Analysis of Variance: A Graphical Representation of a Statistical Concept.” Journal of the American Statistical Association 50: 1064–72; December 1955.
3. Bennett, Carl A., and Franklin, Norman L. Statistical Analysis in Chemistry and the Chemical Industry. New York: John Wiley and Sons, 1954. 724 p. See review by F. R. Himsworth in Journal of the American Statistical Association 50: 980–84; September 1955. 4. Box, George E. P. “Evolutionary Operation: A Method for Increasing Industrial Productivity.” Applied Statistics 6: 81–101; June 1957. 5. Box, George E. P. “Review of Oscar Kempthorne’s The Design and Analysis of Experiments.” Journal of the American Statistical Association 50: 974–75; September 1955. 6. Box, George E. P., and Andersen, Sigurd L. “Permutation Theory in the Derivation of Robust Criteria and the Study of Departures from Assumption.” Journal of the Royal Statistical Society, Series B 17: 1–34; No. 1, 1955. 7. Box, George E. P., and Hunter, J. Stuart. “Multi-Factor Experimental Designs for Exploring Response Surfaces.” Annals of Mathematical Statistics 28: 195–241; March 1957. 8. Bozivich, Helen; Bancroft, Theodore A.; and Hartley, Herman O. “Power of Analysis of Variance Test Procedures for Certain Incompletely Specified Models, I.” Annals of Mathematical Statistics 27: 1017–43; December 1956. 9. Brunswik, Egon. Perception and the Representative Design of Psychological Experiments. Berkeley: University of California Press, 1956. 154 p. See review by James J. Gibson in Contemporary Psychology 2: 33–34; February 1957. 10. Bulmer, M. G. “Approximate Confidence Limits for Components of Variance.” Biometrika 44: 159–67; June 1957. 11. Campbell, Donald T. “Factors Relevant to the Validity of Experiments in Social Settings.” Psychological Bulletin 54: 297–312; July 1957. 12. Clarke, Geoffrey M. “A Design for Testing Several Treatments under Controlled Experimental Conditions.” Applied Statistics 4: 199–206; November 1955. 13. Cochran, William G. “The Combination of Estimates from Different Experiments.” Biometrics 10: 101–29; March 1954. 14. Cochran, William G. 
“The 1954 Trial of the Poliomyelitis Vaccine in the United States.” Biometrics 11: 528–34; December 1955. 15. Cochran, William G. “Review of the Wallis-Roberts Statistics: A New Approach.” Journal of the American Statistical Association 51: 664–66; December 1956. 16. Cochran, William G., and Cox, Gertrude M. Experimental Designs. Second edition. New York: John Wiley and Sons, 1957. 617 p. 17. Collier, Raymond O., and Stunkard, Clayton L. “An Analysis of Variance of Multiple Measurements on Subjects Classified in Unequal Groups of One Dimension.” Journal of Experimental Education 25: 255–62; March 1957. 18. Commins, William D. “The Number of Cases and the Probability of Obtaining Significant Results.” Journal of General Psychology 52: 163–66; January 1955. 19. Cornell, Francis G. The Essentials of Educational Statistics. New York: John Wiley and Sons, 1956. 375 p. See review by Julian C. Stanley in Educational and Psychological Measurement 16: 549–54; Winter 1956. 20. Cornfield, Jerome, and Tukey, John W. “Average Values of Mean Squares in Factorials.” Annals of Mathematical Statistics 27: 907–49; December 1956. 21. Cox, David R. “The Use of a Concomitant Variable in Selecting an Experimental Design.” Biometrika 44: 150–58; June 1957. 22. Davies, Owen L., editor. Design and Analysis of Industrial Experiments. New York: Hafner Publishing Co., 1954. 636 p. See review by Harold A. Freeman in Journal of the American Statistical Association 50: 610–11; June 1955. 23. Edwards, Allen L. “Experiments: Their Planning and Execution.” Handbook of Social Psychology. (Edited by Gardner Lindzey.) Cambridge, Mass.: Addison-Wesley Publishing Co., 1954. Vol. 1, p. 259–88. 24. Federer, Walter T. Experimental Design: Theory and Application. New York: Macmillan Co., 1955. 591 p. See review by Richard L. Anderson in Journal of the American Statistical Association 51: 667–69; December 1956.
25. Finney, David J. Experimental Design and Its Statistical Basis. Chicago: University of Chicago Press, 1955. 169 p. See review by William S. Connor in Journal of the American Statistical Association 51: 386–87; June 1956. 26. Fisher, Ronald A. “The Analysis of Variance with Various Binomial Transformations.” Biometrics 10: 130–51; March 1954. 27. Fisher, Ronald A. The Design of Experiments. Sixth edition. New York: Hafner Publishing Co., 1951. 244 p. 28. Fisher, Ronald A. “Statistical Methods and Scientific Induction.” Journal of the Royal Statistical Society, Series B 17: 69–70; No. 1, 1955. 29. Fox, Martin. “Charts of the Power of the F-Test.” Annals of Mathematical Statistics 27: 484–97; June 1956. 30. Gardner, Eric F. “Statistical Methods.” Annual Review of Psychology. Stanford, Calif.: Annual Reviews, 1957. Vol. 8, p. 377–98. 31. Garner, Wendell R., and McGill, William J. “The Relation Between Information and Variance Analyses.” Psychometrika 21: 219–28; September 1956. 32. Grundy, P. M.; Healy, M. J. R.; and Rees, D. H. “Economic Choice of the Amount of Experimentation.” Journal of the Royal Statistical Society, Series B 18: 32–55; No. 1, 1956. 33. Guilford, Joy P. Fundamental Statistics in Psychology and Education. Third edition. New York: McGraw-Hill Book Co., 1956. 565 p. See review by Julian C. Stanley in Contemporary Psychology 1: 302; October 1956. 34. Hodges, Joseph L., Jr., and Lehmann, Eric L. “The Efficiency of Some Nonparametric Competitors of the t-Test.” Annals of Mathematical Statistics 27: 324–35; June 1956. 35. Hoyt, Cyril J. “Relations of Certain Correlational to Variance Ratio Estimates of Test Reliability.” Twelfth Yearbook of the National Council on Measurements Used in Education, Part I. Cambridge, Mass.: the Council (Secy.-Treas.: David V. Tiedeman, Harvard University), 1955. p. 50–55. 36. Huntsberger, David V. 
“A Generalization of a Preliminary Testing Procedure for Pooling Data.” Annals of Mathematical Statistics 26: 734 – 43; December 1955. 37. Jackson, J. Edward, and Fleckenstein, Mary. “An Evaluation of Some Statistical Techniques Used in the Analysis of Paired Comparison Data.” Biometrics 13: 51–64; March 1957. 38. Johnson, Norman L. “Sequential Procedures in Certain Component of Variance Problems.” Annals of Mathematical Statistics 25: 357–66; June 1954. 39. Johnson, Orval G., and Stanley, Julian C. “Attitudes Toward Authority of Delinquent and Non-Delinquent Boys.” Journal of Abnormal and Social Psychology 51: 712–16; November 1955. 40. King, Edgar P . “Optimum Grouping in One-Criterion Variance Components Analysis.” Journal of the American Statistical Association 49: 637–39; September 1954. 41. Kogan, Leonard S. “Applications of Variance-Covariance Designs in Educational Research.” Review of Educational Research 24: 439– 47; December 1954. 42. Kramer, Clyde Y. “Extension of Multiple Range Tests to Group Correlated Adjusted Means.” Biometrics 13: 13–18; March 1957. 43. Kramer, Clyde Y. “Extension of Multiple Range Tests to Group Means with Unequal Numbers of Replications.” Biometrics 12: 307–10; September 1956. 44. Lev, Joseph, and Kinder, Elaine F. “New Analysis of Variance Formulas for Treating Data from Mutually Paired Subjects.” Psychometrika 22: 1–15; March 1957. 45. Lindley, Dennis V. “On a Measure of Information Provided by an Experiment.” Annals of Mathematical Statistics 27: 986–1005; December 1956. 46. Lyle, Philip. “The Construction of Nomograms for Use in Statistics: Part II The Graphical Analysis of the Results of a Factorial Experiment.” Applied Statistics 3: 184 – 95; November 1954.
Salkind_Chapter 69.indd 322
9/4/2010 10:45:32 AM
Stanley
Research Methods
323
47. McNemar, Quinn. “On Wilson’s Distribution-Free Test of Analysis of Variance Hypotheses.” Psychological Bulletin 54: 361–62; July 1957. 48. McNemar, Quinn. Psychological Statistics. Second edition. New York: John Wiley and Sons, 1955. 408 p. See review by Julian C. Stanley in Educational and Psychological Measurement 15: 307–11; Autumn 1955. 49. Medley, Donald M.; Mitzel, Harold E.; and Doi, Arthur N. “Analysis-of-Variance Models and Their Use in a Three-Way Design Without Replication.” Journal of Experimental Education 24: 221–29; March 1956. 50. Moonan, William J. “An Analysis of Variance Method for Determining the External and Internal Consistency of an Examination.” Journal of Experimental Education 24: 239 – 44; March 1956. 51. Moonan, William J. “Computational Illustrations of the Internal and External Consistency Analysis of Examination Responses.” Journal of Experimental Education 25: 181–90; March 1957. 52. Moonan, William J. “Simultaneous Examination and Method Analysis by Variance Algebra.” Journal of Experimental Education 23: 253–57; March 1955. 53. Morrison, Milton. “Fractional Replication for Mixed Series.” Biometrics 12: 1–19; March 1956. 54. Moses, Lincoln E. “Statistical Theory and Research Design.” Annual Review of Psychology. Stanford, Calif.: Annual Reviews, 1956. Vol. 7, p. 233–58. 55. Neyman, Jerzy. “Note on an Article by Sir Ronald Fisher.” Journal of the Royal Statistical Society, Series B 18: 288–94; No. 2, 1956. 56. Nicholson, Wesley L. “A Computing Formula for the Power of the Analysis of Variance Test.” Annals of Mathematical Statistics 25: 607–10; September 1954. 57. Ostle, Bernard. Statistics in Research: Basic Concepts and Techniques for Research Workers. Ames: Iowa State College Press, 1954. 487 p. See review by Allan Birnbaum in Journal of the American Statistical Association 51: 385–86; June 1956. 58. Pearce, S. C. “Experimenting with Organisms as Blocks.” Biometrika 44: 141– 49; June 1957. 59. Pearson, Egon S. 
“Statistical Concepts in Their Relation to Reality.” Journal of the Royal Statistical Society, Series B 17: 204–207; No. 2, 1955. 60. Pearson, Egon S., and Hartley, Herman O. Biometrika Tables for Statisticians.Vol. 1. Fourth edition. Cambridge, England: Cambridge University Press, 1954. 238 p. See review by William B. Michael in Educational and Psychological Measurement 16: 171–72; Spring 1956. 61. Ray, W. D. “Sequential Analysis Applied to Certain Experimental Designs in the Analysis of Variance.” Biometrika 43: 388– 403; December 1956. 62. Roy, Samarendra N., and Mitra, Sujit K. “An Introduction to Some Non-Parametric Generalizations of Analysis of Variance and Multivariate Analysis.” Biometrika 43: 361–76; December 1956. 63. Runkel, Philip J.; Smith, J. E. Keith; and Newcomb, Theodore M.“Estimating Interaction Effects among Overlapping Pairs.” Psychological Bulletin 54: 152–58; March 1957. 64. Scheffé Henry. “Alternative Models for the Analysis of Variance.” Annals of Mathematical Statistics 27: 251–71; June 1956. 65. Searle, S. R. “Matrix Methods in Components of Variance and Covariance Analysis.” Annals of Mathematical Statistics 27: 737– 48; September 1956. 66. Smith, H. Fairfield. “Missing Plot Estimates.” Biometrics 13: 115–18; March 1957. 67. Snedecor, George W. Statistical Methods Applied to Experiments in Agriculture and Biology. Fifth edition. Ames: Iowa State College Press, 1956. 534 p. See review by K. Alexander Brownlee in Journal of the American Statistical Association 52: 100–102; March 1957.
Salkind_Chapter 69.indd 323
9/4/2010 10:45:32 AM
324
Research Design, Measurement and Statistics and Evaluation
68. Stanley, Julian C. “Additional ‘Post-Mortem’ Tests of Experimental Comparisons.” Psychological Bulletin 54: 128–30; March 1957. 69. Stanley, Julian C. “A Comparison of Verbal and Pictorial Self-Rating-Scale Categories.” Journal of Experimental Education 23: 239–46; March 1955. 70. Stanley, Julian C. “Controlled Experimentation in the Classroom.” Journal of Experimental Education 25: 195–201; March 1957. 71. Stanley, Julian C. “Fixed, Random, and Mixed Models in the Analysis of Variance as Special Cases of Finite Model III.” Psychological Reports 2: 369; September 1956. 72. Stanley, Julian C. “Gellerman’s Complex Crossover Design.” Journal of Consulting Psychology 18: 380; October 1954. 73. Stanley, Julian C. “Statistical Analysis of Scores from Counterbalanced Tests.” Journal of Experimental Education 23: 187–207; March 1955. 74. Stanley, Julian C., and Beeman, Ellen Y. “Interaction of Major Field of Study with Kind of Test.” Psychological Reports 2: 333–36; September 1956. 75. Sutcliffe, J. P. “A General Method of Analysis of Frequency Data for Multiple Classification Designs.” Psychological Bulletin 54: 134 –37; March 1957. 76. Truitt, Jeanne T., and Smith, H. Fairfield. “Adjustment by Covariance and Consequent Tests of Significance in Split-Plot Experiments.” Biometrics 12: 23–39; March 1956. 77. Tukey, John W. “Variances of Variance Components: I. Balanced Designs.” Annals of Mathematical Statistics 27: 722–36; September 1956. 78. Tukey, John W. “Variances of Variance Components: II. The Unbalanced Single Classification.” Annals of Mathematical Statistics 28: 43–56; March 1957. 79. Wallace, David L. “Multiple Comparisons in the Analysis of Variance.” National Convention Transactions, Eleventh Annual Convention. New York: American Society for Quality Control (50 Church St.), 1957. p. 279–85. 80. Wilk, Martin B., and Kempthorne, Oscar. “Non-Additivities in a Latin Square Design.” Journal of the American Statistical Association 52: 218–36; June 1957. 81. 
Wilk, Martin B., and Kempthorne, Oscar. “Some Aspects of the Analysis of Factorial Experiments in a Completely Randomized Design.” Annals of Mathematical Statistics 27: 950–85; December 1956. 82. Wishart, John, and Sanders, Harold G. Principles and Practices of Field Experimentation. Second edition. Cambridge, England: W. Heffer and Sons, 1955. 133 p. See review by George A. Baker in Journal of the American Statistical Association 51: 387–88; June 1956. 83. Zelen, Marvin. “The Analysis of Incomplete Block Designs.” Journal of the American Statistical Association 52: 204–17; June 1957.
Salkind_Chapter 69.indd 324
9/4/2010 10:45:32 AM
70 What Can We Learn from International Assessments? Robert J. Mislevy
In broadest terms, international assessment is meant to gather information about schooling in a number of countries and somehow use it to improve students’ learning. My topic is inference in international assessment. What exactly do we hope to learn from international assessments? What evidence do the kinds of information we gather provide about what we want to learn, and how do we interpret it? We will consider issues of population definition and sampling plans; of assessment exercises and background variables; of statistical advances and inferential brick walls. After a few words about the nature of evidence and inference in general, we’ll take a closer look at the kinds of inferences people want to make from international assessments.
Evidence and Inference
Inference is reasoning from what we observe and what we know to explanations, conclusions, or predictions. We are always reasoning in the presence of uncertainty. The information we work with is typically incomplete, inconclusive, and amenable to more than one explanation. We try to establish the weight and coverage of evidence in what we observe. The first question is “Evidence about what?” There is a crucial difference between data and evidence:
Source: Educational Evaluation and Policy Analysis, 17(4) (1995): 419–437.
Salkind_Chapter 70.indd 325
9/4/2010 10:49:21 AM
A datum becomes evidence in some analytic problem when its relevance to one or more hypotheses being considered is established. . . . Evidence is relevant on some hypothesis if it either increases or decreases the likelihood of the hypothesis. Without hypotheses, the relevance of no datum could be established. (Schum, 1987, p. 16)
Assessment data, like clues in a criminal investigation, acquire meaning only in relation to particular inferences. The same data can be direct evidence for some inferences, indirect evidence for others, and wholly irrelevant to still other inferences.
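One way to make the notion of relevance concrete is Bayes' rule: a datum bears on a hypothesis only when it is more (or less) probable if the hypothesis is true than if it is false. The sketch below is an illustration under assumed, invented probabilities; the function names and numbers are hypothetical and are not drawn from Schum (1987).

```python
def posterior(prior: float, p_datum_if_true: float, p_datum_if_false: float) -> float:
    """Probability of the hypothesis after observing one datum (Bayes' rule)."""
    numerator = prior * p_datum_if_true
    return numerator / (numerator + (1.0 - prior) * p_datum_if_false)


def likelihood_ratio(p_datum_if_true: float, p_datum_if_false: float) -> float:
    """Ratio above or below 1 means the datum is relevant; exactly 1 means it is not."""
    return p_datum_if_true / p_datum_if_false


# Hypothetical numbers, chosen only for illustration.
PRIOR = 0.5

# Datum twice as probable under the hypothesis as under its negation: relevant.
relevant_ratio = likelihood_ratio(0.8, 0.4)
relevant_posterior = posterior(PRIOR, 0.8, 0.4)

# Datum equally probable either way: irrelevant to this hypothesis.
irrelevant_ratio = likelihood_ratio(0.3, 0.3)
irrelevant_posterior = posterior(PRIOR, 0.3, 0.3)

print(f"relevant datum:   ratio {relevant_ratio:.2f}, posterior {relevant_posterior:.3f}")
print(f"irrelevant datum: ratio {irrelevant_ratio:.2f}, posterior {irrelevant_posterior:.3f}")
```

The same datum can carry a ratio far from 1 for one hypothesis and a ratio of 1 for another, which is the sense in which data become evidence only relative to a particular inference.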
Objectives of International Assessment
Theisen, Achola, and Boakari (1983) distinguish three purposes that have motivated international assessment, each giving rise to conjectures with their own special inferential challenges. These purposes, and a quick summary of my conclusions about them, are listed below. Afterwards, we’ll consider each in more detail.
1. Comparing relative achievement status by country and subject. Rankings of nations enjoy wide popular interest, but critics point to the difficulty of defining simple indices to compare national educational systems, and question the utility of such comparisons even if they were technically flawless (Bracey, 1992, 1993; Rotberg, 1991). My answer to people who want comparative standings is to give them comparative standings – lots of them: in different topics, at different ages, with different kinds of tasks, both unadjusted and adjusted for factors such as national curricula and proportion of students in school. Recognizing that no single index of achievement can tell the full story and that each has its own limitations, we increase our understanding of how nations compare by increasing the breadth of our vision. Even so, however, simply ascertaining nations’ relative standing tells us little about how to set educational policy or improve instructional practice.
2. Gleaning policy implications in one nation from determinants of achievement in others. This objective suggests that nations’ varying policies and practices constitute “natural experiments,” from which we can infer the causes of educational achievement. But survey data cannot, by their nature, provide direct evidence about causal effects. They can, however, reveal associations that are worth investigating in studies that can tell us about effects of policies and instructional approaches. Technical developments allow us to better focus evidence on our conjectures about these associations in large-scale surveys, to better account for interrelationships among task performance, explanatory variables, and the hierarchical organization of schooling – but without breaking through the inferential wall to causal inference.
3. Reassessing in-country expenditure priorities to boost achievement. The idea here is to provide information to policymakers on the status of achievement and practices within their own nations, in terms that have some international grounding. Inferences concern what students know and can do, and what is happening in classes and schools – without attempting to establish causal relationships. Even without the strong conclusions associated with experiments, nations can benefit from tracking status and change in educational conditions, and from cross-national exploratory analyses for clues about achievement that they can follow up with more focused research of other kinds. From this point of view, international assessments are a useful component in the mix of sources of information, complementing experiments, field studies, and in-depth analyses of specific aspects of learning.
Comparisons of Relative Achievement
Comparative standings can be very useful in policy analysis, and people will construct them whether you want them to or not. It is better to compile them yourself and explain the difficulties in interpreting them rather than leaving the job to outsiders. (Maddison, 1975, p. 170)
To the idea that people like to have a single number we answer that usually they shouldn’t get it. (Box & Tiao, 1973, p. 309)
Before one can gather evidence to rank nations, one must have an operational definition of a nation’s achievement. It must determine the scope of the comparison: Is it a given learning area defined by economical multiple-choice items or by more complex but still school-like tasks, or is it broader indicators such as courses taken or school completion? An operational definition must specify the population under consideration: Is it, say, all 17-year-olds in a nation, just those in school, or only students instructed in the dominant language? How about 9-year-olds, 13-year-olds, and adults? Should the relationships of the tasks to the nations’ diverse curricula and cultures be taken into account, and if so, how? Once these issues are decided, one must create tasks, draw samples of students, collect data, and analyze the results. Much progress has been made over the past 30 years in these technical aspects of international assessments. But different choices along each of these dimensions can lead to different rankings. Gerald Bracey (1991, 1992, 1993) and Iris Rotberg (1990, 1991) point out that the predominantly low rankings of the United States in recent assessments1 by the International Association for the Evaluation of Educational Achievement (IEA) and the International Assessment of Educational Progress (IAEP) depend, in part, on the configurations of choices that were employed. It is fruitless to argue, however, about which choice leads to the “true” rankings. Even if comparisons of achievement are desired, educational systems and students’ accomplishments vary in many aspects from nation to nation. We all know that Consumer Reports presents information on a wide
diversity of aspects of autos. Some, such as headroom and fuel mileage, are precisely specified in engineering terms; others, such as road feel and instrument layout, are based on expert opinion. Trade-offs are noted in overall evaluations, and direct comparisons are made only within groups of cars with similar purposes and prices, such as luxury sedans, sports cars, and minivans. Should we expect comparing nations’ education systems to be so much easier than comparing automobiles? For a given level of expenditure, gathering some evidence about many aspects of education gives us a better understanding of nations’ comparative status than a great deal of evidence for rankings on any single aspect.
Scope of Comparisons
There are many aspects of achievement about which information could be gathered. Cost is always a factor in the choice; so is coverage of evidence. We will focus on what might be called “drop in from the sky” assessment, or administering common prespecified tasks to randomly selected students on a given day (in contrast to, for example, examining students’ performance in depth and over time in the particular areas they are studying). IEA and IAEP both collect this kind of data from students. Because such comparisons, no matter how accurate or well-executed, present an incomplete picture at best, two recent reports supplemented IAEP data with such indices as rates of higher education, employment at various levels of education, and rate of engineers per 10,000 workers (Carson, Huelskamp, & Woodall’s 1993 Sandia Report; the Salganik, Phelps, Bianchi, Nohara, & Smith, 1993 OECD report). Each indicator provides information the others don’t; each has limitations on its usefulness. The United States ranked first in the rate of college graduates, for example, a positive indication of commitment to education and accomplishment of students; this index does not, however, convey the capabilities of the graduates either in absolute terms or in comparison with those of other nations.
A serious limitation of any single-number index of achievement is its inability to communicate the degree and character of variation within nations. The IAEP 1991 survey of mathematics (Lapointe, Mead, & Askew, 1992), for example, ranks U.S. 13-year-olds near the bottom of 15 nations with a mean of 262, substantially below Korea (283) and Taiwan (285), but far above Jordan (246). Outpacing Jordan was unremarkable in light of the nations’ relative stages of economic development, but placing so far behind the leaders was viewed with alarm. When these results are projected into the framework of the 1992 U.S. Trial State Assessment,2 however, we find means for North Dakota, Iowa, and Minnesota on a par with those of Korea and Taiwan, and means for Mississippi and the District of Columbia that match Jordan’s (Salganik et al., 1993)!3
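The arithmetic behind this point can be sketched briefly. The example below uses three invented sub-national units whose enrollment-weighted mean equals the U.S. figure of 262 cited above; the unit means and enrollment shares are hypothetical and are not actual state results.

```python
# Hypothetical sub-national units: (mean score, share of national enrollment in percent).
# These values are invented for illustration; they are not actual state results.
UNITS = {
    "Unit A": (284, 20),
    "Unit B": (260, 60),
    "Unit C": (246, 20),
}


def national_mean(units: dict) -> float:
    """Enrollment-weighted mean across units."""
    total_share = sum(share for _, share in units.values())
    return sum(mean * share for mean, share in units.values()) / total_share


def spread(units: dict) -> int:
    """Gap between the highest and lowest unit means."""
    means = [mean for mean, _ in units.values()]
    return max(means) - min(means)


overall = national_mean(UNITS)
gap = spread(UNITS)

print(f"national mean: {overall:.1f}")        # 262.0
print(f"highest minus lowest unit mean: {gap}")  # 38
```

In this invented case the single national figure conceals a within-nation gap about as large as the gap between the highest and lowest national means quoted in the text.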
Population Definition and Sampling
The evidential value of early international assessments for international comparisons was limited by the disparity of target populations and lack of representativeness of samples. IEA’s First International Mathematics Study, for example, included a survey of students in the final year of secondary study. Table 1, from Wolf (1977), shows vast differences in the proportions of students from the relevant age group enrolled in the participating nations in 1964, resulting from the available resources and the degree of selectivity in their school systems. Even within these diverse in-school populations, representative samples could not always be achieved because of lack of cooperation in some countries and restrictions on attainable students in others (e.g., limitation to urbanized areas or to dominant-language, hence more privileged, schools).
Several strategies have since been employed to address these deficiencies. A first strategy has been simply to gather better evidence: defining populations more consistently across nations, designing more representative samples, and increasing cooperation rates from schools and students. A second has been to turn attention to 9- and 13-year-olds, because in many countries most children of these ages are in school (compare Table 2, from Lapointe, Mead, and Askew, 1992, with Table 1). Comparability is thus improved in exchange for scope of comparison. A third strategy has been to analyze data from selected subpopulations of nations. Husén (1975, pp. 130–131) reported that in the IEA’s First International Mathematics Study (FIMS),
Table 1: Percent of age group enrolled in full-time schooling in the terminal year of secondary school in 1964
Country                        Percent
United States                  75
Belgium (Flemish)              47
Belgium (French)               47
Sweden                         45
Australia                      29
France                         29
Hungary                        28
Finland                        21
England                        20
Scotland                       17
Chile                          16
Italy                          16
India                          14
Netherlands                    13
New Zealand                    13
Thailand                       10
Federal Republic of Germany    9
Iran                           9
Note: Table based on Table 15 of Wolf (1977).
Table 2: Age 13 sampling frame for IAEP mathematics in 1991
Country          % Age-eligible in school    % In-school age-eligible in frame    Comments on frame
Scotland         100                         99
United States    100                         98
Spain            100                         80                                   Spanish-speaking schools; not Catalan
Switzerland      100                         76                                   Public schools, 15 cantons
Ireland          99.8                        93
France           99.7                        98
Jordan           98.5                        96
Hungary          97.8                        99
Korea            95.9                        97
Israel           95.5                        71                                   Hebrew-speaking public schools only
Slovenia         95.4                        97
Canada           94–100                      94
Taiwan           90                          100
Soviet Union     –                           60                                   Russian-speaking schools, 14 republics
Note: Table based on Figure A.2 of Lapointe, Mead, & Askew (1992).
The average mathematics score among United States high school graduates is far below that of all other countries. . . . But when we compare the average score of the top 4% of the corresponding age group, a proportion selected because it represents the lowest relative number of students in any country taking mathematics, ... the range among countries is much narrower than for the entire group of terminal mathematics students. The top 4% of U.S. students score at about the same level as those in other countries.
Similarly, Westbury (1992) found that in the Second International Mathematics Study (SIMS), the top 20% of the U.S. 13-year-old students, who had taken algebra, had an average similar to that of the top 20% of the Japanese students.
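The effect of differential selectivity on such comparisons can be sketched with a deliberately simplified model. Assume, purely for illustration, that two nations have identical standard normal achievement distributions in the full age group, and that each school system retains exactly its highest-achieving students. Neither assumption is realistic; the enrollment rates of 75% and 9% are taken from Table 1, and everything else is hypothetical.

```python
from statistics import NormalDist

# Assumed common achievement distribution for the full age group (mean 0, sd 1).
STD_NORMAL = NormalDist()


def mean_of_top_fraction(p: float) -> float:
    """Mean of the highest-scoring fraction p of a standard normal population.

    Uses E[Z | Z > c] = pdf(c) / p, where c is the cutoff leaving fraction p above it.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    cutoff = STD_NORMAL.inv_cdf(1.0 - p)
    return STD_NORMAL.pdf(cutoff) / p


# Enrollment rates in the terminal year of secondary school, from Table 1.
ENROLLED = {"United States": 0.75, "Federal Republic of Germany": 0.09}

# Mean achievement of enrolled students under the perfect-selection assumption.
enrolled_means = {nation: mean_of_top_fraction(p) for nation, p in ENROLLED.items()}

# Mean of the top 4% of the whole age group: identical by construction.
top_4_percent_mean = mean_of_top_fraction(0.04)

for nation, value in enrolled_means.items():
    print(f"{nation}: mean of enrolled students = {value:.2f} sd units")
print(f"top 4% of age group (either nation) = {top_4_percent_mean:.2f} sd units")
```

Under these assumptions the less selective system shows a much lower mean among enrolled students even though the two age groups are identical, while the comparison of equal proportions of the age group shows no difference at all.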
Context
To a participating student, an international assessment is a personal event in a social and institutional situation. Systematic differences in the ways students from different nations typically react to this situation can reduce the value of “what they do in assessment setting” as evidence about “what they know and can do” as more broadly conceived. Reports from IEA and IAEP indicate a number of threats of this kind to comparability:
• Familiarity. The more familiar the type of task a student is asked to perform, the better he or she will tend to do. The strictly timed, no-talking-no-help, restrictive format tasks that have characterized most international assessments tend to favor students and nations in which such assessment is commonplace – just as any alternative assessment methods would favor the students familiar with them.
• Motivation. Archie Lapointe recalls the assembly before the administration of the 1991 IAEP assessment in Korea, to honor as “champions” of their school the students who had been selected at random to take the assessment. Might not their motivation differ from that of American 13-year-olds, excused from gym class to write impromptu essays in a survey that has no bearing on their grades? In the 1992 reading survey of the U.S. National Assessment of Educational Progress (NAEP), about 3% of U.S. students who performed relatively well on multiple-choice tasks simply didn’t bother to respond to tasks that required a more extended written response (John Donoghue, personal communication). Motivation also depends on the relative match of tasks to students’ capabilities, in content as well as format, and on the length of time students are asked to perform.
• Style. When students from different nations approach a set of tasks with different response styles, summary indices based on, for instance, proportions of correct response, can be misleading. Richard Wolf (1977, p. 33) describes a problem observed in the first IEA mathematics assessment, and still encountered in the most recent IAEP:
Previous IEA studies had revealed interesting differences in the amount of guessing among countries. In the mathematics study, for example, students in the United States appeared to engage in a large amount of guessing; i.e., a substantial number of questions were incorrectly answered and relatively few were omitted. In contrast, Belgian students appeared to engage in very little guessing; that is, relatively few items were answered incorrectly but a substantial number of items were omitted. Belgian students, it was learned, are taught from the beginning of their school careers not to attempt to answer a question unless they are almost certain they know the answer. This is not true in the United States, however, where guessing is often encouraged.
The two distinct dimensions along which the performances of students from these two nations differed – how many of the tasks they attempted, and how well they did on the ones they did attempt – cannot be fully captured by any single-number summary.4
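The two dimensions can be separated with a little bookkeeping. In the sketch below, two invented response records on a 40-item test have the same proportion correct but very different attempt rates and accuracy on attempted items. The class name, field names, and counts are hypothetical; they are not data from any IEA or IAEP study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponseRecord:
    """Counts of one student's correct, incorrect, and omitted responses."""

    correct: int
    incorrect: int
    omitted: int

    @property
    def n_items(self) -> int:
        return self.correct + self.incorrect + self.omitted

    @property
    def proportion_correct(self) -> float:
        """The usual single-number summary: correct answers over all items."""
        return self.correct / self.n_items

    @property
    def attempt_rate(self) -> float:
        """Share of items the student tried to answer."""
        return (self.correct + self.incorrect) / self.n_items

    @property
    def accuracy_when_attempted(self) -> float:
        """Share of attempted items answered correctly."""
        attempted = self.correct + self.incorrect
        if attempted == 0:
            raise ValueError("no items were attempted")
        return self.correct / attempted


# Invented records illustrating the two response styles described above.
frequent_guesser = ResponseRecord(correct=24, incorrect=14, omitted=2)
cautious_omitter = ResponseRecord(correct=24, incorrect=2, omitted=14)

for label, record in (("guesser", frequent_guesser), ("omitter", cautious_omitter)):
    print(
        f"{label}: proportion correct {record.proportion_correct:.2f}, "
        f"attempt rate {record.attempt_rate:.2f}, "
        f"accuracy when attempted {record.accuracy_when_attempted:.2f}"
    )
```

Reporting the pair (attempt rate, accuracy when attempted) alongside the proportion correct keeps the distinction visible that the single summary erases.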
Measures of Achievement
The heart of international assessment comparisons is the set of tasks on which performance is to be compared. The desire for single-number comparisons is driven by the lingering belief in universal “traits” such as “intelligence” and “mathematics achievement” that can be ascertained from performance in settings that somehow transcend culture, background, relevance, and educational experience. Now it is true that we can gather some evidence about achievement by observing performance in settings that hold some relevance across participating nations, perhaps to different degrees in varying nations; and we
can, in each nation, supplement these common tasks with additional tasks particularly relevant to that nation and perhaps selected others. But there is a big step from what we actually observe students do to what we infer about what they know, what they have learned, and what they can do in a variety of settings both in and out of school. This section considers inferential issues that concern the kinds of tasks that might be used in an international assessment, the senses in which they may or may not be comparable, and the various levels of aggregation at which one might summarize and analyze the responses.
Comparability of assessment tasks. The very notion of quantitative comparison of nations demands a common frame of reference. The usual approach in educational measurement follows by analogy from physical measurement. One specifies the situations under which observations will be made (e.g., defining the assessment tasks and administration conditions) and the rules by which observations will be mapped to summary statements (e.g., identification of correct answers for multiple-choice items, scoring guides for open-ended tasks, schemes for aggregating results from individual tasks). The final result summarizes students’ behavior in the same way for all participants, and the precision of the summaries is reliability or measurement accuracy. The usefulness of these summaries for various inferences about students’ capabilities more broadly construed is validity.
For the most part, IEA and IAEP have relied on multiple-choice tasks to indicate achievement. The range of skills such items provoke is at best an incomplete representation of the capabilities that schooling is meant to impart. If students are familiar with tests of this type, however, the rules for what they have to do and how they will be evaluated are straightforward.
Tasks that require constructed solutions probe a broader range of skills, but students’ reactions to them vary even more than with multiple-choice tasks (Hambleton & Kanjee, n.d.). In addition to the motivation problems noted previously, we must be concerned about the comparability of meaning across nations of both the tasks and the standards by which they are to be evaluated. The validity of comparing students’ capabilities from their performance on standard tasks erodes when the tasks are less related to the experiences of some of the students (Estes, 1981). Suppose we ask U.S. and German students to solve statistics problems written in German. Their relative performance may be a valid indicator of their proficiency with “statistics problems written in German,” but the U.S. students’ performance is not a valid indicator of their understanding of statistics. The “functional equivalence” approach to comparisons is to devise tasks for different groups that may differ on the surface, but tap comparable knowledge, skills, and strategies. The obvious fact that students in different nations (and often within a given nation) speak different languages necessitates functional rather than literal equivalence of tasks. To what extent does simply translating tasks solve the problem? Two examples illustrate extremes in the comparability of tasks. The first is drawn from Angoff and Cook’s (1988) calibration of the Scholastic Aptitude
Test (SAT) and the Prueba de Aptitud Academica (PAA), college entrance examinations for high school juniors and seniors in English and Spanish, respectively. Carefully translated versions of 91 multiple-choice mathematics items and 142 multiple-choice verbal items were administered to English-language SAT test-takers in the continental United States and to Spanish-language test-takers in Puerto Rico. Figures 1 and 2 show the relative difficulties of the items in the two languages for the mathematics and verbal sections, respectively. Which items are relatively easy and which are hard is quite similar in the mathematics section, but not in the verbal section. A single-number score in mathematics in either language summarizes performance in about the same way for both language groups, in the sense that which particular items a student with a given score got right would be similar regardless of whether they were in English or Spanish. In contrast, the item-by-item performances of PAA and SAT test-takers with the same overall score would generally be quite different. The mathematics scores are in this sense more comparable across languages than verbal scores.
Angoff and Cook (1988, p. 6) note that these interactions were not entirely unexpected; the observation has often been made that verbal material, however well it may be translated into another language, loses many of the subtleties in the translation process. Even for mathematical items some shift in the order of item difficulties is to be expected, possibly because of differences between Puerto Rico and the United States with respect to the organization and emphasis of the mathematics curriculum in the early grades.
Figure 1: Plot of bs for pretested verbal items (N = 91). [Scatterplot not reproduced. Horizontal axis: SAT group; vertical axis: PAA group. Legend: items originally in Spanish; items originally in English; items selected for equating.]
Figure 2: Plot of bs for pretested mathematical items (N = 142). [Scatterplot not reproduced. Horizontal axis: SAT group; vertical axis: PAA group. Legend: items originally in Spanish; items originally in English; items selected for equating.]
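One simple way to quantify how similarly a set of items orders itself in two groups is the correlation between the two sets of item difficulties. The sketch below uses invented difficulty values for five items per section; they are hypothetical and are not taken from Angoff and Cook (1988), and the choice of a Pearson correlation is an illustrative assumption rather than the method of that study.

```python
from math import sqrt


def pearson(x: list, y: list) -> float:
    """Pearson correlation between two equal-length lists of item difficulties."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented difficulty values (higher = harder) for the same items in each group.
math_group_1 = [-1.5, -0.5, 0.0, 0.8, 1.6]
math_group_2 = [-1.2, -0.4, 0.3, 1.0, 1.9]    # nearly the same ordering

verbal_group_1 = [-1.5, -0.5, 0.0, 0.8, 1.6]
verbal_group_2 = [0.9, -1.8, 1.4, -0.2, 0.1]  # ordering scrambled by translation

r_math = pearson(math_group_1, math_group_2)
r_verbal = pearson(verbal_group_1, verbal_group_2)

print(f"mathematics items: r = {r_math:.2f}")
print(f"verbal items:      r = {r_verbal:.2f}")
```

A correlation near 1 indicates that a total score summarizes performance in much the same way for both groups; a correlation near 0 indicates that equal totals can conceal quite different item-by-item performance.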
The second example is drawn from the 1980 IEA study of written composition, which found that standard task definitions and rules of scoring did not provide a satisfactory foundation for uniformly comparable international scales of writing achievement (De Glopper, 1988, p. 74). As White and Löfqvist (1988, p. 98) explained,
Although it might be thought that this range of [pragmatic writing] tasks comprised a fairly basic set of writing competencies, results of the study indicated that students’ familiarity with, or need for all of these uses of writing varied according to the cultural context. Thus, it was remarked that Chilean and Thai students rarely had the experience of applying for summer jobs, while in Italy the hierarchical organization of schools made it most unlikely that a student/head teacher communication of the type requested would ever occur. For Indonesian students the phenomenon of coming home to an empty house and needing to leave a written message to members of the family was unthinkable . . . . Therefore, if we are to think of these writing topics as in some sense “functional” or basic, it is important to remember that the ways in which each one is focused for the purpose of the IEA study mean that for some of the students concerned, writing in these modes required a considerable imaginative leap – something quite different from the more mundane message transmission envisaged by such a range of tasks.
We can evaluate the coverage of evidence that achievement summaries based on common assessment tasks provide with supplemental studies within
nations, in which targeted, in-depth information is gathered from students who are also administered the common tasks. The resulting relationships show how performance in the common, limited-scope, assessment tasks relates to performance in students’ own school settings and in real life. Depending on the learning area and the scope of the common tasks, these relationships can differ not only from nation to nation, but also from school to school and from neighborhood to neighborhood within nations.
Aggregating performance across tasks. The outcome for every individual task in an international assessment tells a story in its own right. Assessments with hundreds of tasks, like those of IEA and IAEP, tell hundreds of stories – easily too many for even a specialist to digest. For any given inference, though, certain groups of tasks, related by the skills they tap and their relation to the various curricula, tell similar stories. The fundamental law of data aggregation is that collapsing information simultaneously (a) highlights the common pattern and (b) obscures patterns that are unique. The name of the game is to determine groups of tasks that optimize evidence for inferences of interest.
Optimal levels of aggregation can differ for different inferences. As we saw in Figure 1, total scores tell most of the story for comparing the levels of performance of the PAA and SAT samples in mathematics, because the differences between groups were so similar on all the items.5 In this case, weighting the results differently for different items would have little effect on the summary comparison. In international assessments, however, patterns of differences among nations on tasks are often related to the degree to which the nations emphasize topics in their curricula, as indicated by “opportunity to learn” (OTL) ratings the teachers provide (Platt, 1975, p. 46). Though comparisons might be similar among tasks within topic groupings, comparisons can differ across groupings.
When this happens, summaries over topics depend on the numbers of tasks that happen to be present in each topic (Wolfe, 1989). An agency contemplating comparisons based on a single-number summary should, therefore,
1. Attempt to identify interactions (some tools for doing this are discussed in what follows);
2. Limit reports to single-number summaries only if interactions are minimal; and
3. If interactions are found, present them and determine the stability of single-number summary comparisons with respect to alternative meaningful weightings.
If comparisons vary with importance weightings, it is the responsibility of the reporter to justify any particular choice he or she emphasizes. For example, Eugene Johnson (personal communication, 1989), finding only slight differences in the status of nations in the IAEP-88 Mathematics assessment when
weighted by OTL figures from each nation in turn, offered the unweighted aggregate as a fairly representative and consensually defined international “market-basket” of educational tasks. Achievement indices are analogous in this respect to the Department of Labor’s Consumer Price Index (CPI). The CPI is used to track changes in prices of a “market-basket” of goods and services determined through surveys to be representative of a typical American’s purchases. At the highest level of aggregation, the CPI itself is viewed as an (admittedly imperfect) indicator of inflation. More detailed reports present a fuller and typically more variegated picture, such as, “The CPI increased by .5% last month, but this was due mostly to energy costs, which rose rapidly in the Northeast and moderately in the Midwest. Food prices actually fell.” We note that many interest groups closely watch changes in the definition of the CPI. Government benefits such as Social Security payments are often indexed to the CPI, and a relatively minor change in the composition of the market basket can translate to hundreds of millions of dollars in a year. The American Association of Retired Persons (AARP), for example, would prefer an inflation index for Social Security benefits that better matches the expenditures of retired people, including a lower proportion on housing and a higher proportion on energy.
A recent technical development concerning aggregation in international assessment merits comment. Item response theory (IRT) is a scaling approach based on the patterns of regularity among the tasks in a selected group. It characterizes students in terms of their overall tendency to make correct responses (for multiple-choice items) or higher rated performances (for open-ended exercises). It characterizes tasks in terms of their tendency to be answered correctly or receive highly rated responses.
IRT enables comparisons on a common scale from efficient but complex designs for presenting samples of tasks to samples of students (Benefit #1). Because the task and student parameters imply probabilities of possible responses from students at any level of overall proficiency, IRT adds a layer of meaning to the achievement index (Benefit #2). IRT facilitates investigations of item-by-nation interactions, to detect tasks on which nations differ in distinctive, possibly intentional, ways; the model can then be applied just to groupings of tasks that share similar patterns across nations (Benefit #3).6 This last point is particularly important, because IRT models don’t change the fundamental law of aggregation. Model-fit investigations, such as those reported for IAEP’s 1991 mathematics and science assessments (Blaise, 1992), are necessary precursors to IRT scaling.
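The statement that task and student parameters "imply probabilities of possible responses" can be made concrete with a small sketch. The text does not say which IRT model the assessments used; the two-parameter logistic form below, the function names, and the item parameters are all assumptions chosen for illustration only.

```python
import math

def p_correct(theta, b, a=1.0):
    """Probability of a correct response under a logistic IRT model.

    theta: student proficiency; b: item difficulty; a: item discrimination.
    The logistic form is an illustrative assumption, not the model of any
    particular assessment.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def expected_score(theta, items):
    """Expected number-correct score over a list of (a, b) item parameters."""
    return sum(p_correct(theta, b, a) for a, b in items)

# Hypothetical (discrimination, difficulty) pairs for three tasks.
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.5)]

for theta in (-1.0, 0.0, 1.0):
    # A single proficiency value implies a response probability for every
    # task, and hence an expected score on any subset of tasks.
    print(theta, round(expected_score(theta, items), 2))
```

Because every task is placed on the same proficiency scale, students who took different subsets of tasks can still be compared, which is the sense of Benefit #1; the comparison is only as trustworthy as the fit of the model to each group of tasks.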
Comments about International Comparisons
I stated earlier that my answer to people who want comparative standings is to give them comparative standings – lots of them: in different topics, at different ages, with different kinds of tasks; unweighted, weighted by
national curricula guidelines, weighted by surveyed opportunity-to-learn; unadjusted results for the full sample, for students in selected courses of study, for students at or above selected percentiles on within-nation performance.7 I would also provide comparisons of wholly different indices, such as school-completion rates, school achievement data, and job-distribution-by-education characteristics. Because no single index of achievement can tell the full story and each suffers its own limitations, we increase our understanding of how nations compare by increasing our breadth of vision – just as Consumer Reports informs us more fully by rating scores of attributes of automobiles. (Among, say, mini-vans, “best buys” score well for their cost in several categories, and strike effective balances between competing qualities such as performance and economy.) We should continue to improve the techniques we use to define indices, collect data, and analyze results for all such indices – just as Consumer Reports strives toward more comprehensive and more accurate measures of aspects of automobiles’ safety, comfort, and performance. Only with such information can we at once compare nations on aspects we deem comparable and evaluate each in light of its own goals and resources.
Though I do believe indices of educational achievement that are to varying degrees comparable across nations can be useful (for reasons discussed in the section on assessing within-nation priorities), ascertaining nations’ relative standing tells us little about how to set educational policy or improve instructional practice. EPA fuel mileage ratings help us to compare the cars’ fuel economy, but tell engineers nothing about how to boost a given car’s performance. The second motivating objective of international assessment, therefore, has been the attempt to infer the determinants of achievement.
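The recommendation to report standings under several weightings, and to check whether a single-number comparison is stable across them, can be sketched in a few lines. Everything in the example is hypothetical: the nations, topics, percent-correct figures, and weightings are invented solely to show how a ranking can depend on the number of tasks per topic.

```python
def weighted_summary(topic_scores, weights):
    """Weighted mean of topic-level percent-correct scores."""
    total = sum(weights[t] for t in topic_scores)
    return sum(topic_scores[t] * weights[t] for t in topic_scores) / total

def rank_nations(scores_by_nation, weights):
    """Nations ordered from highest to lowest weighted summary."""
    return sorted(
        scores_by_nation,
        key=lambda n: weighted_summary(scores_by_nation[n], weights),
        reverse=True,
    )

# Hypothetical topic-level percent-correct results for two nations.
scores = {
    "Nation A": {"arithmetic": 70.0, "algebra": 50.0, "geometry": 60.0},
    "Nation B": {"arithmetic": 60.0, "algebra": 65.0, "geometry": 58.0},
}

# Alternative weightings: the task counts that happen to be in the pool,
# equal topic weights, and a hypothetical curriculum-based weighting.
weightings = {
    "task counts": {"arithmetic": 30, "algebra": 10, "geometry": 10},
    "equal": {"arithmetic": 1, "algebra": 1, "geometry": 1},
    "algebra-heavy": {"arithmetic": 1, "algebra": 3, "geometry": 1},
}

rankings = {name: rank_nations(scores, w) for name, w in weightings.items()}

# The single-number comparison is stable only if every weighting
# produces the same ordering of nations.
stable = len({tuple(r) for r in rankings.values()}) == 1

for name, order in rankings.items():
    print(name, order)
print("stable across weightings:", stable)
```

In this invented case the ordering reverses when the weighting changes, which is the situation in which a reporter would need to present the topic-level interaction and justify any particular weighting.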
Determinants of Achievement
When studying the effectiveness of schools, investigators must ask whether the primary goal is to provide descriptive data of how things are or, rather, to estimate what outcomes would most likely occur if certain changes were introduced. If the goal is essentially description, then large-scale sample surveys provide excellent data. . . . If, on the other hand, the purpose is to develop informed predictions about how educational outcomes would change in response to new or different mixes of resources, a randomized controlled field trial is preferable, almost necessary. (Platt, 1975, p. 63)
Recognizing that simple comparisons of status provide little guidance for improving education, users of assessment data, national and international alike, consistently plead for more practical advice from assessments (Viadero, 1993). What policies should we enact? How should we structure the curriculum? Which teaching practices should we follow? International assessments
such as those of IEA and IAEP would appear well positioned to respond. IEA assessments, for example, solicit information from students and teachers on some 200 background variables in addition to achievement tasks, including type of school and program, student characteristics (e.g., demographics, educational background, home conditions), learning conditions (e.g., OTL, instructional practices), and kindred variables (e.g., attitudes toward education, aspirations). This section concerns the associations of background variables such as these with achievement. The central message is neither new nor optimistic: We cannot infer the causes of achievement from survey data such as those gathered in international assessment. Does extra instruction in reading help children read better? Of course, we respond. Yet in the 1992 NAEP reading assessment, the amount of reading instruction fourth-graders receive is correlated negatively with their performance on the reading tasks (Mullis, Campbell, & Farstrup, 1993):
Time Spent in Reading Instruction (in Minutes)    30 – 45    60     90
Average proficiency                               220        219    216
With a correlation of about –.1, reading instruction seems to reduce reading performance. But the average difference among students in the population who received various amounts of reading instruction – the prima facie effect – doesn’t necessarily estimate the average causal effect of reading instruction on performance, because factors that may influence instructional time or reading performance are not taken into account in the comparison (Holland & Rubin, 1987). The NAEP report explains that the negative relationship in this example makes sense when we remember that (a) students who get extra help are usually students who seem to need extra help, and (b) students who seem to need extra help usually have low test scores to begin with.8 The problem is that other prima facie effects we interpret as causal effects when they conform to our expectations can be just as wrong for similar reasons. In contrast, the prima facie effect in a randomized experiment is an unbiased estimate of the average causal effect, because all other variables – even ones we are not aware of – are independent of the assignment of the conditions we want to compare.9 Their effects tend to cancel out as sample size increases. Without random assignment, the effects of other variables need not cancel out, so the assumptions required to infer causal effects are very strong: “We might be willing to assume strong ignorability … if each [matched group of individuals in the comparison groups] contains a very homogeneous set of individuals who tend to respond very similarly to [the alternative treatments]” (Holland & Rubin, 1987, p. 27). We might entertain a causal interpretation for the differing results between a drug with white rats in one laboratory and a different drug with the same strain of rats in a
different laboratory because, genetically, laboratory rats of the same strain are almost as alike as identical twins. But in international comparisons, we don’t have this similarity of students in different nations. Even after we match on, say, age, sex, family income, and parents’ education, there remain differences among students from Taiwan, the United States, and Mozambique with respect to culture, attitude, motivation, and values that can moderate the effect of instructional practices. This doesn’t mean that studying the relationships between achievement and other variables is wholly useless. When making comparisons among nonrandomly determined groups, sometimes we can match cases or use statistical techniques such as blocking or regression to take selected variables, or covariates, into account to some extent (Rosenbaum & Rubin, 1983). Suppose we speculate that “the amount of teacher education, including preservice teacher training, is important for the performance of students, particularly in the higher grades in school” (Postlethwaite, 1975, p. 28). It may be that teachers with more training tend to teach in schools in wealthier communities, or are assigned to more advanced classes within schools. If so, the difference among students whose teachers have different amounts of training would depend on these factors as well, and, as in the reading instruction example, they could modify or even reverse the relationship. We would instead compare achievement among students whose teachers had different amounts of training, but who had similar scores at the beginning of the year, took comparable courses in previous years, and lived in the same kinds of neighborhoods. Other factors we don’t have data to match on, or aren’t even aware of, can still make the “matched cases” effect different from the causal effect, but at least we’ve eliminated some of the effects we knew could skew the results.
The availability of strong, well-understood covariates10 never eliminates the potential of a large difference between prima facie and causal effects, but it does require ever stronger relationships among the studied variables and omitted variables to alter the relationship. Thus, studying associations among background variables and achievement variables is useful as circumstantial support for conjectures about the determinants of achievement, or as a source of inspiration for new conjectures. We note in passing that though relationships with achievement are generally found for OTL and economic status variables in international surveys, findings related to more specific instructional practices have been disappointing. Thorndike (1973, p. 178) lamented the early pattern, rarely broken:
In general, the factors that it was possible to identify in the school are at best minimally related to reading achievement, and a relationship that is found in any country rarely appears consistently in the others. Even the variables that one might anticipate a priori would be predictors of achievement do not tend to hold up. For example, indicators of training
of teachers in the teaching of reading, of size of class, and of availability of specialist teachers in the school all turn out to have either no relationship to reading achievement or a relationship the reverse of what one might anticipate.
For pointed conjectures about the effect of particular variables, a well-designed, randomized field study with 200 students can provide stronger evidence than an international assessment with 200,000 students (Wiley & Bock, 1967). As a reviewer of an earlier draft of this article pointed out, however, experiments in education have not always produced the definitive results their advocates hope for. First, any experimental comparison takes place within a context. Potential explanatory variables can be eliminated as explanations of results when, by design, they are not confounded with the studied conditions, but it remains an open question as to whether the same effects would be observed in different contexts. For example, a course that works well in the suburbs may fare poorly in the inner city. Second, unbiased effects of some agent are estimated in an experiment, but because the difference in conditions that experimenters intended to manipulate is not always the difference in conditions that actually occurs, it may be difficult to isolate the explanation. For these reasons, any experiment, like any survey, cannot be considered definitive; it is the accumulation of corroborating evidence in different contexts and different circumstances that ultimately sways our belief.
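The gap between a prima facie effect and an average causal effect, as in the reading-instruction example above, can be illustrated with simulated data. The sketch below is not a model of NAEP: the assumed causal gain, the score scale, and the rule that extra instruction goes to students with low prior skill are all hypothetical choices made to reproduce the logic of the argument.

```python
import random
from statistics import mean

TRUE_EFFECT = 5.0  # assumed causal gain from extra instruction (illustrative)

def simulate(n, randomized, rng):
    """Return the prima facie difference: mean(extra) minus mean(no extra)."""
    treated, control = [], []
    for _ in range(n):
        prior = rng.gauss(0.0, 1.0)  # unobserved prior reading skill
        if randomized:
            extra = rng.random() < 0.5   # coin-flip assignment
        else:
            extra = prior < -0.5         # extra help goes to those who need it
        score = (200.0 + 20.0 * prior
                 + (TRUE_EFFECT if extra else 0.0)
                 + rng.gauss(0.0, 5.0))
        (treated if extra else control).append(score)
    return mean(treated) - mean(control)

rng = random.Random(0)
survey_diff = simulate(20000, randomized=False, rng=rng)
experiment_diff = simulate(20000, randomized=True, rng=rng)

print("survey (nonrandom assignment):", round(survey_diff, 1))
print("experiment (random assignment):", round(experiment_diff, 1))
```

Under these assumptions the survey comparison comes out negative even though extra instruction helps every simulated student, while the randomized comparison recovers a value close to the assumed gain. Enlarging the survey sample would make the negative estimate more precise, not less biased.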
An Analogy from Medical Research
Medical research illustrates how different kinds of studies complement one another. The Centers for Disease Control might carry out an epidemiological study to begin to learn about the cause of an epidemic. A broad range of information is gathered from the local population, concerning nutrition, lifestyle, environment, and health. It is not known which of these variables will bear any relevance to the epidemic, and there are far too many to examine them all in controlled studies. They are examined for associations among background variables and the disease. Factors such as age, sex, and complicating health conditions are taken into account as well as possible. Background variables that are still associated with a disease are so-called risk factors. They seem to be related somehow to the disease, but we cannot conclude from this kind of study that they cause the disease. To learn more, laboratory experiments or clinical field trials are required. For example, early epidemiological studies in the 1950s showed a significant correlation between smoking and lung cancer, but with few background variables taken into account. The eminent statistician Sir Ronald Fisher pointed out that this association does not prove that smoking causes cancer.
Fisher argued that smoking might only be indicative of certain genetic differences between smokers and nonsmokers and that these genetic differences could be related to the development or not of lung cancer. Fisher did feel that “a good prima facie case had been made for further investigation” (Holland, 1986, p. 955).
He did not mean to collect greater amounts of data of the same kind, with an epidemiological survey that merely asked the same questions of a bigger sample of people. This would provide additional evidence about the question “What is the correlation between smoking and cancer?” but not about the question we are really interested in, “Does refraining from smoking reduce a person’s chances of developing lung cancer?” We cannot carry out experiments that would provide the strongest direct evidence for this conjecture, namely, assigning people at random to smoking and not-smoking conditions. Today’s stronger belief that smoking causes cancer rests on converging lines of indirect evidence. First, there are survey data that take more relevant variables into account: “Among his responses to Fisher, McCurdy pointed out that lung cancer rates increased with the amount of smoking and that subjects who stopped smoking had lower lung cancer rates than those who did not” (Holland, 1986, p. 955). Second, experiments with laboratory animals strongly support the conjecture that smoking causes lung cancer in mice – persuasive, though still indirect, evidence about its effect on humans.
Background Variables
If the background variables included in an international assessment are poorly defined or unreliably measured, it is difficult to ascertain their association with achievement, however defined. There are usually tradeoffs between the quality of background information and its cost, in terms of time, money, motivation, or cooperation:
At the end of the year, teachers completed a background questionnaire as well as the opportunity-to-learn (OTL) instrument. The OTL questionnaire required the teacher to indicate, for each of the items in the pool (180 for 8th grade and 136 for 12th grade) whether or not the mathematics on which the item was based had been taught to the target class. The SIMS instrumentation was very demanding of time and effort of those participating. This factor undoubtedly contributed to the relatively low participation rate. [The cooperation rate of selected public districts was about 50%; of private schools, about 40%.] (IEA, 1985, pp. 95–96)
IEA studies have consistently found a relationship between nations’ emphasis on topics and student performance on tasks in those topics (Platt, 1975). Teachers’ ratings on a simple 4-level scale of the degree to which topics have been addressed are clearly an impoverished indicator of students’ educational
experience. Observational studies of classroom process go further, and indeed, IEA carries them out in selected assessments. These studies are expensive because they station a trained observer in the classroom for extended periods of time – probably prohibitively expensive to carry out on a large scale, given the limits of evidence from survey data about the determinants of achievement. IEA also gathers performance data at the beginning and the end of the school year in selected assessments (IEA, 1985). This costs more than collecting data at a single point in time, but holds the promise of greater utility: Pretest performance can serve as a covariate, to allow a sharper focus on associations of achievement during the school year with OTL and reported instructional-practice variables. Matching on pretest scores (literally or statistically) approximates matching on a host of unspecified factors that were operating before the year began, such as socioeconomic status and parental attitudes, although it does not control for their impact during the year. Soliciting background data from the students themselves is quite economical, compared to ascertaining information such as home characteristics from actual observation or record searches. Especially with younger students, though, the trade-off is accuracy: Some indicator systems have relied on student reports for information on background factors. . . . [An] analysis of the quality of responses in the High School and Beyond study provided … sobering results. Correlation coefficients between sophomores’ and parents’ reports of background variables ranged from very low to quite high – for example, .21 for the presence of a specific place to study in the home; .35 for the presence of an encyclopedia in the home (an item used in the NAEP as well); .44 for mother’s occupation; .50 for family income; .56 for whether the family owns or rents its residence; .81 for mother’s education; and .87 for father’s education. 
(Fetters, Stowe, & Owings, 1984, in Koretz, 1992a, pp. 17–18)
In the same vein, IEA (1985, p. 23) found that
Teachers’ opinions about the teaching of geometry were notably at odds with their reported practices. They affirm that an intuitive approach is most meaningful, that concrete models and aides should be used and that activities to improve spatial ability should be included. But in reality the most emphasized approach was a statement of definitions.
We could further speculate about the degree to which their reported practices are at odds with their actual practices.
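The cost of unreliable background data can be illustrated with a short simulation. The sketch assumes classical measurement error (the reporting error is unrelated to both the true value and achievement); the variable names, the true correlation of .5, and the error variance are hypothetical. Under that assumption, the correlation of achievement with a noisy report equals its correlation with the true value multiplied by the correlation between report and true value, so a side study of report accuracy permits a rough correction.

```python
import math
import random

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)
n = 20000

# Hypothetical standardized "true" family income and an achievement score
# constructed to correlate about .5 with it.
true_income = [rng.gauss(0.0, 1.0) for _ in range(n)]
achievement = [0.5 * t + rng.gauss(0.0, math.sqrt(0.75)) for t in true_income]

# Student-reported income: the true value plus independent reporting error.
reported = [t + rng.gauss(0.0, 1.0) for t in true_income]

r_true = pearson(true_income, achievement)
r_reported = pearson(reported, achievement)      # attenuated association
r_report_true = pearson(reported, true_income)   # accuracy of the report

# Correction for attenuation, valid only under the classical-error assumption.
r_corrected = r_reported / r_report_true

print(round(r_true, 2), round(r_reported, 2), round(r_corrected, 2))
```

With student-parent agreement as low as some of the figures quoted above, an observed association could understate the underlying one substantially. The correction is only as good as the assumption that reporting errors are unrelated to achievement, which self-reports from students may well violate.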
Statistical Methodology
To study the effect of teacher training on student achievement, we discussed the utility of matching students on neighborhood, courses taken, and previous scores. We would look at the difference in performance of groups of students
who differ as to the training their teachers had received, but who were matched on another set of background variables we believe influence achievement. But unless we have a very large sample and few matching variables, the number of students in matched groups becomes too small to provide stable estimates. Instead of matching explicitly, we can use regression analyses to the same end, leaning on assumptions of regularity in patterns to make up for lack of data. The outcome of interest, for instance, a reading score, is the dependent variable. Its associations with background variables, taking into account their associations among themselves, are expressed as regression coefficients. A regression coefficient is interpreted in the following way. Consider groups of students who were identical on all the background variables except a particular one. The regression coefficient for that background variable is an estimate of how different the groups’ performances would be, as a function of how different their values are on that particular background variable. It is a way to approximate a prima facie effect in a more complicated situation. If the students had been assigned at random to levels of that background variable, the regression coefficient also estimates the average causal effect of the variable. Otherwise, as is usually the case in assessments, it does not. In the reading example, we knew right away that the prima facie difference was not the causal effect because it didn’t make sense and we could see why. It is harder to maintain our skepticism in regression analysis, for two reasons. The first is that the analysis is more complex than simply comparing prima facie differences, and it is easy to think that complex analyses are doing more than they really are. 
But regression is not more complex because it is carrying out a different kind of inference, or because the nature of the evidence is any different; it is exactly the same kind of reasoning, just applied to more complex data. The second reason is that results often seem to make sense. Al Beaton (personal communication) recalls a comment he heard about the Coleman et al. (1966) study of educational opportunity: “Look at the regression coefficient for teacher’s education. If we provided each teacher with one additional year of schooling, we’d raise students’ scores by 2 points.” Al replied, “But the coefficient for ‘Do you have a vacuum cleaner in your home?’ [a proxy for economic status] is even higher. We should just buy each kid a vacuum cleaner.” This warning about inferring causal effects is by far the most important inferential issue for regression analysis of assessment data, within and across nations. Some additional, more technical, issues should be mentioned as limitations on the potential of regression analyses of such data:
• The poor definition or unreliable measurement of background variables that erodes the strength of their relationships with achievement is reflected as artificially low regression coefficients. A partial solution that uses the same data is to use a statistical model that attempts to first estimate, then adjust for, the effects of poor measurement (more on this below). An alternative solution is to get better data in a different kind of study. Careful
observation of classroom practices in a handful of classrooms provides better information about the associations between instruction and learning than does cursory self-reported data from thousands of classrooms.
• The limited range of some potential explanatory variables also precludes the possibility of finding strong relationships. Regression focuses on differences in outcomes associated with differences in background variables, so practices that are similar among schools will not show up with significant regression coefficients, even if they have large positive impacts (Wolf, 1977, p. 121). If all class sizes are similar within a nation, for example, the regression coefficient for class size will be difficult to estimate and probably not statistically significant, even if results for much larger or much smaller classes might have differed substantially.
• Because background variables are generally associated with one another (e.g., better teachers tend to be employed in schools with more resources and more economically advantaged students), there are limits to the extent to which regression analyses can disentangle their associations with performance. Finding consistent results for a given variable in a wide variety of models adds credibility to its relevance. Finding large variations or even sign changes in its coefficients under different models indicates that the data cannot support strong statements about the relationship.
• When important background factors are omitted from the survey, their effect on achievement appears in regression coefficients for related survey variables. This happened in Beaton’s vacuum cleaner example. Students’ opportunities to learn in their preschool years influenced their later achievement in school. Economic conditions in the home were correlated with these opportunities. Having a vacuum cleaner was correlated with economic conditions in the home.
Because the Coleman study could not directly address the key variables in this chain, the positive effect of early learning on school learning showed up as a coefficient for “Do you have a vacuum cleaner?”
Sound construction of assessment questionnaires and analyses of survey data must lean on results from small-scale, in-depth field studies. These latter studies help identify variables that seem to be important in achievement, so that affordable proxies for them can be included in large-scale surveys. Information from each kind of study thus improves the next round of the other kind.
Structural equations modeling and multilevel analysis. Over the past decade, two extensions of regression analysis have proven useful in analysis of assessment data. They are structural equations modeling (e.g., Muthén, 1988) and multilevel analysis (e.g., Raudenbush & Willms, 1991). By exploiting what is known a priori about the structure of associations among variables, both techniques allow us to fit models more finely attuned to the patterns within data and better focused on the conjectures of interest. Like the move from comparisons of means to multiple regression, though, neither
can overcome the hurdle between model-fitting and conclusive causal inference about determinants of achievement from survey data (although the lure is seductive; sometimes structural equations modeling is even called “causal modeling”). The main idea of structural equations modeling is to build a system of regression equations that, together, incorporate hypotheses such as which variables are associated with others only because they are both associated with a third (“conditional independence”), and which observed variables are noisy measures of the same unobserved “true” value (“measurement error models”). Suppose we determine the relationship between economical but unreliable self-reported income data and actual values in a small side study. With a structural equations model, we can use this relationship to estimate the association between performance and actual income from the attenuated association between performance and self-reported income. If these hypothesized structures are consistent with the data, resulting regression coefficients are better indicators of the associations among background factors and achievement. However, the same data can generally be fit equally well with a variety of models based on different hypothetical relationships and having different implications for some conjectures. Any result that shifts or changes sign under different plausible models is not supported by the data in hand. The main idea of hierarchical analysis is to account for the shared impact of variables in the hierarchical organization of school systems. For example, a teacher’s practices affect all the students in the class, a state’s funding levels affect all the schools in the state, and a nation’s educational policies affect all the schools in the nation. Some variables that affect educational achievement operate at the level of the student, others operate at the level of the school, the district, the class, and so on. 
The regression coefficients for student variables in one class, for example, may be quite different from the coefficients in another class, because the instructional approaches of the teachers differ. The ability to model these differences allows us to explore more facets of associations among background variables and achievement.11 As with comparisons of means, multiple regression, and structural equations modeling, though, one cannot draw causal inferences just because a hierarchical analysis is used, and any estimates of associations that do not hold up under a variety of alternative plausible models cannot be trusted.
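The measurement-error idea described above, recovering the association with actual income from the attenuated association with self-reported income, can be illustrated with a small simulation. This is a minimal sketch and not a full structural equations model: it assumes classical measurement error (the reporting error is independent of both actual income and performance), and every name and number in it (`disattenuate`, the sample sizes, the effect size) is hypothetical rather than drawn from any assessment.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def disattenuate(r_outcome_proxy, r_proxy_true):
    """Correct a correlation with a noisy proxy for the proxy's unreliability.

    Under classical measurement error,
    corr(outcome, proxy) = corr(outcome, true) * corr(proxy, true),
    so dividing by corr(proxy, true) recovers corr(outcome, true).
    """
    if not 0 < r_proxy_true <= 1:
        raise ValueError("r_proxy_true must lie in (0, 1]")
    return r_outcome_proxy / r_proxy_true


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 20000

# Hypothetical standardized data: actual income, a noisy self-report of it,
# and a performance score that depends on actual income.
income = [rng.gauss(0, 1) for _ in range(n)]
reported = [v + rng.gauss(0, 1) for v in income]
performance = [0.5 * v + rng.gauss(0, 1) for v in income]

r_true = pearson(performance, income)          # unknown in a real survey
r_attenuated = pearson(performance, reported)  # what the survey observes

# The "small side study": 500 cases where both reported and actual income are known.
r_side = pearson(reported[:500], income[:500])
r_corrected = disattenuate(r_attenuated, r_side)

print(f"true {r_true:.3f}  attenuated {r_attenuated:.3f}  corrected {r_corrected:.3f}")
```

The corrected value is only as good as the assumed error structure; if reporting error were related to performance, a different model would fit the same data and imply a different correction, which is the caution raised in the text.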
Assessments of In-Country Expenditure Priorities

We have seen that international comparisons of achievement status provide little information to guide educational policy or instructional practice, and that analyses of correlates of achievement provide at best circumstantial evidence about the causes of achievement. Are there good reasons nevertheless to continue to carry out international assessments? Perhaps the best
reason, in my opinion, is that policymakers need indicators of the status of educational achievement and practices within their own nations. Rather than setting policy in ignorance, they can use indices to alert them to problems and successes – without, of course, conclusively identifying causes or remedies. Indices can provide incomplete but affordable data about teaching practices and learning outcomes in the many classrooms of a nation. They provide direct evidence about conjectures concerning “What are things like?” in contrast to circumstantial evidence about conjectures of “Why are things as they are?” or “What shall we do to improve them?” Assessment indices of achievement and practice can thus serve a function much like the one the CPI serves for economic policy.12 Comparisons of automobiles benefit from having common metrics and methods for measuring certain characteristics of performance, such as standard EPA tests for fuel consumption and braking distances from specified speeds. Sometimes the comparisons are directly relevant to our purchasing decision, but even when they are not, knowing results for a wide range of vehicles increases our understanding of the values of the cars we are considering. In the same way, some of the aspects of educational achievement we need to track in our own nation are relevant to other nations as well. Each participating nation benefits from the shared expertise, techniques, and results of an international assessment. International results on achievement, for example, add a dimension of meaning for setting standards for within-nation performance. With within-nation uses in mind, Wolf (1979) stresses that sample designs for IEA assessments “must be arranged to facilitate within-country analysis. This may be more important than provision of national summaries or international comparisons.” Within the United States, NAEP and international assessments help inform the debate about the condition of educational achievement.
Essentially flat profiles of NAEP trends over 2 decades, limited though they are in scope, serve to help evaluate the often-heard indictment that performance has declined precipitously in recent years (Bracey, 1991; Koretz, 1992b). This is not to say that achievement is satisfactory or that improvements cannot be made. Indeed, results from international assessments suggest that they can be: The NAEP data add further support to our growing understanding of instructional and course-taking patterns. . . . At the high-school level, large proportions of students elect to avoid mathematics courses and, to a greater extent, science courses. Even though the United States may retain a larger percentage of students in high school than many other countries, the Second International Mathematics Study found that advanced mathematics course enrollment in the United States was only about average. The Second International Science Study found enrollments in advanced science courses in the United States to be well below other industrialized nations. Despite survey findings from NAEP and other large-scale assessments, consistently revealing that students who have had more coursework also have higher
achievement levels, even students in academic programs often do not enroll in advanced mathematics and science courses. (Mullis, Owen, & Phillips, 1990, pp. 61–62)
As we have also seen, however, results from the OECD report comparing nations and states show average achievement in North Dakota and Minnesota that looks like that of the highest nations in the survey, and averages in Mississippi and the District of Columbia that look like those of developing nations. The policies and practices that improve learning will certainly differ from one school to another, in ways we will only learn through experiments, field trials, and research into the processes of school learning. International assessments can thus play an important role in each nation’s system of educational research. Complementary ways of acquiring different forms of evidence for different purposes are all needed, ranging from the wide-coverage, easier-to-get survey indicators found in international assessments to the insights of in-depth studies of the learning processes of individual students. Links among levels and kinds of research improve what we learn from each. Including measures of opportunity-to-learn from an international assessment in a more comprehensive observational study on classroom practices, as an example, adds contextual grounding and possibilities of improved analysis to the observational study.
Conclusion

In 1973, Harvard University and IEA sponsored a conference summarized in On Educational Policy and International Assessment (Purves & Levine, Eds., 1975). It documented lessons researchers had learned about defining populations and achieving more representative samples, about devising and interpreting assessment tasks, and about how to analyze the data and draw justifiable inferences from them. Further progress has been made on all of these fronts since then. Nevertheless, Marshall Smith’s (1975) cautionary comment on inferential issues is as timely today as it was then; the key issues Smith raises are essentially the same ones discussed in the present article. Despite 30 years spent figuring out what can be learned from international assessment data and developing ways of doing it, expectations of “true” comparative ranking results and of causal conclusions from survey data remain largely unabated. Why is this so? Perhaps Howard Gardner’s conjecture about “the unschooled mind” explains it:

In most domains of knowledge, we develop very powerful theories when we are very young. School and the disciplines are supposed to reformulate those theories and to make them more comprehensive and more accurate. As long as we stay in school, we can maintain the illusion that the effort
has succeeded, but … once we leave school, the illusion disappears and there is a 5-year-old mind dying to get out and express itself … No one has to tell a kid that heavy objects fall more quickly than light objects. It’s totally intuitive. It happens to be wrong. Galileo showed that it was wrong. Newton explained why it was wrong. But, like others with a robust 5-year-old mind, I still believe heavier objects fall more quickly than lighter objects. … The only people on whom these engravings change are experts. Experts are people who actually think about the world in more sophisticated and different kinds of ways. . . . In your area of expertise, you don’t think about what you do as you would when you were five years of age. But I venture to say that if I get to questioning you about something that you are not an expert in, the answers you give will be the answers you would have given before you had gone to school. (Gardner, 1993, p. 5)
A 5-year-old mind thinks about international assessments in terms of games – who won and who lost? – and of prima facie differences as causal effects. This is as true today as it was true 30 years ago, quite independent of advances in the expert perspective. This article maintains that international assessments, done well, can indeed provide useful information to help nations improve schooling – but not by becoming big enough and comprehensive enough to provide “the right answers.” International assessments convey context, clues, and current conditions. But after a point, a dollar spent on international assessment to enhance this kind of information tells less for improving education than the same dollar spent to obtain different, complementary kinds of information, as from field experiments, close observation of classroom processes, and investigations of cognitive aspects of learning. It is less newsworthy and less spectacular than a definitive experiment in physics, but the gradual confluence of evidence of different kinds, of different strengths, from different sources, is the source of accumulating wisdom upon which educational policy should be grounded.
Notes

This paper was prepared for the Conference on the Use of International Educational Data, sponsored by the Board of International Comparative Studies in Education, held February 4, 1994, in Washington, DC. Support was provided by the National Center for Education Statistics, the National Science Foundation, and the Kellogg Foundation. I am grateful for discussions with Gene Johnson, Frank Jenkins, Nick Longford, Nancy Mead, Ina Mullis, Howard Wainer, Ming Mei Wang, and Kentaro Yamamoto. Figures 1 and 2 are reproduced with the permission of the College Board.

1. Especially in science and mathematics assessments (e.g., Lapointe, Mead, & Askew, 1992a and 1992b); U.S. samples have fared better in reading.
2. This projection was possible because (a) the tasks and administration conditions of the two assessments overlapped substantially and (b) they were administered to randomly equivalent groups of U.S. students at the same point in time. The degree to which results from one assessment can be linked with those of another is addressed by Linn (1993) and Mislevy (1992).
3. And of course there exists heterogeneity within North Dakota, Iowa, and Mississippi, and within Korea, Taiwan, and Jordan as well; the point is that single-number summaries at high levels of aggregation not only obscure patterns of attainment, but that they also provide little guidance for policymakers.
4. Including “corrections for guessing,” which produce comparable scores for students who do and do not guess but only under the assumptions that they understand the adjustment and seek to maximize their score under its application.
5. Total score suffices for summarizing levels of observed performance on all the items, but this is not the same as saying differences in total scores are equivalent to differences in mathematical capabilities. A constant shift by which all items become easier or harder for members of one culture for reasons other than the skills of interest cannot be disentangled from this kind of data alone.
6. Sometimes we can account for the difficulties of items in terms of the skills they demand. In data from the U.S. Survey of Young Adult Literacy (Kirsch, Jungeblut, Jenkins, & Kolstad, 1993), for example, Sheehan and Mislevy (1990) were able to account for over 80% of the variation in document literacy IRT item difficulty parameters with descriptors of the complexity of the document and the directive and the cognitive processing requirements of the task. This opens the door to an alternative use of IRT in the upcoming international survey of adult literacy. Whether a single IRT model adequately fits document task performance across nations will be explored first. It may not, because of the interrelationships of the familiarity of content and context of documents in different cultures. But if different fits of IRT models succeed within nations and in each case the same higher level attributes account in essentially the same way for task difficulty, then a functional equivalence among the disparate IRT scales can be achieved. Respondents could then be characterized in terms of their capabilities of carrying out, for instance, two-feature matching tasks in line with a table’s organization, or determining the correct entry in a three-level nested list – as evidenced in their interactions with documents reflecting the contents and contexts of their own cultures.
7. I would even go a step further by providing multiple rankings on a given index, each differing in accordance with a randomly selected draw of sampling-error and measurement-error distributions that characterize uncertainty given a particular operational definition of achievement. Nations with barely distinguishable values would often change ranks in these pseudocomparisons, thereby reducing the risk of overinterpreting small differences.
8. If the correlation between needing help and performance is –.6, and the correlation between needing help and getting help is +.6, then the partial correlation between performance and receiving extra help among students equally in need of help would be +.4.
9. The difference between treatment means in an experiment is an estimate of the average causal effect for the population involved in that experiment, and as such constitutes direct evidence for inference about the effect for that population. The same results are only indirect evidence about different populations, however, with decreasing weight as the populations differ. An experiment showing a curriculum works well in one suburban school is weaker evidence about its effect in disadvantaged urban schools than other similar suburban schools. In educational research, a case is usually built up from patterns of findings across experiments in different settings, in conjunction with survey results and studies of classroom interactions.
10. Which are generally lacking anyway in international assessments, as discussed in the section on Background Variables.
11. In addition to examining the overall effects of variables (“achievement as outcome”), hierarchical analyses have been used to explore the association of policy variables on relationships of student background with achievement relationships within schools or classes (“slopes as outcomes”) and the identification of schools in which achievement is unexpectedly high in light of student background variables (exemplary schools research, or “residuals as outcomes”).
12. Expecting educational indices from surveys to determine educational policy is like expecting economic indices to determine economic policy; the result is bad policy. A reaction to a rapidly rising CPI without deeper analysis of the underlying factors is to impose wage-and-price controls – in effect, making it against the law for the CPI to rise further!
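The partial-correlation figure in the note on needing and receiving extra help can be checked with the standard first-order partial correlation formula. The sketch below is illustrative only: the note does not restate the zero-order correlation between performance and receiving help, so the code back-solves the value implied by the note's three numbers (about –.10) and labels it as inferred rather than quoted. The function names are hypothetical.

```python
from math import sqrt


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, holding z constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def implied_zero_order(r_partial, r_xz, r_yz):
    """Invert partial_corr: the zero-order r_xy implied by a given partial."""
    return r_partial * sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2)) + r_xz * r_yz


r_need_performance = -0.6  # from the note: needing help vs. performance
r_need_help = 0.6          # from the note: needing help vs. getting help

# Inferred, not quoted: the raw performance/help correlation consistent
# with the note's partial correlation of +.4.
r_performance_help = implied_zero_order(0.4, r_need_performance, r_need_help)

print(f"implied zero-order correlation: {r_performance_help:.3f}")
print(f"partial correlation: "
      f"{partial_corr(r_performance_help, r_need_performance, r_need_help):.2f}")
```

The sign reversal is the point of the example: a slightly negative raw association between receiving help and performance is compatible with a positive association among students equally in need of help.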
References

Angoff, W. H., & Cook, L. L. (1988). Equating the scores of the Prueba de Aptitud Academica and the Scholastic Aptitude Test (College Board Report No. 88-2/ETS RR 88-3). New York: College Entrance Examination Board and Princeton, NJ: Educational Testing Service.
Blais, J. G. (1992). IAEP technical report, Vol. 2. Princeton, NJ: The International Assessment of Educational Progress/Educational Testing Service.
Box, G. E. P., & Tiao, G. C. (1973). Bayesian inference in statistical analysis. Reading, MA: Addison-Wesley.
Bracey, G. W. (1991). Why can’t they be like we were? Phi Delta Kappan, 73, 104–117.
Bracey, G. W. (1992). The second Bracey report on the condition of public education. Phi Delta Kappan, 74, 104–117.
Bracey, G. W. (1993). The third Bracey report on the condition of public education. Phi Delta Kappan, 75, 104–117.
Carson, C. C., Huelskamp, R. M., & Woodall, T. D. (1993). Perspectives on education in America. Journal of Educational Research, 86, 259–311.
Coleman, J. S., Campbell, E. Q., Hobson, C. J., McPartland, J., Mood, A. M., Weinfeld, F. D., & York, R. L. (1966). Equality of educational opportunity. Washington, DC: U.S. Department of Health, Education, and Welfare.
De Glopper, K. (1988). The results of the international scoring sessions. In T. P. Gorman, A. C. Purves, & R. E. Degenhart (Eds.), The IEA study of written composition I: The international writing tasks and scoring scales (pp. 59–75). Oxford: Pergamon Press.
Estes, W. K. (1981). Intelligence and learning. In M. P. Friedman, J. P. Das, & N. O’Connor (Eds.), Intelligence and learning (pp. 3–23). New York: Plenum.
Fetters, W. B., Stowe, P. S., & Owings, J. A. (1984). High school and beyond: Quality of responses of high school students to questionnaire items. Washington, DC: National Center for Education Statistics.
Gardner, H. (1993). Educating the unschooled mind. Washington, DC: Federation of Behavioral, Psychological, and Cognitive Sciences.
Hambleton, R. K., & Kanjee, A. (n.d.). Enhancing the validity of cross-cultural studies: Improvements in instrument translation methods. Unpublished manuscript. Amherst, MA: Department of Education, University of Massachusetts.
Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81, 945–960.
Holland, P. W., & Rubin, D. B. (1987). Causal inference in retrospective studies (Res. Rep. RR-87-7). Princeton, NJ: Educational Testing Service.
Husén, T. (1975). Implications of the IEA findings for the philosophy of comprehensive education. In A. C. Purves & D. U. Levine (Eds.), Educational policy and international assessment (pp. 117–143). Berkeley, CA: McCutchen.
International Association for the Evaluation of Educational Achievement (IEA). (1985). Second International Mathematics Study: Study report for the United States. Champaign, IL: Author.
Kirsch, I., Jungeblut, A., Jenkins, L., & Kolstad, A. (1993). Adult literacy in America. Princeton, NJ: Educational Testing Service.
Koretz, D. (1992a). Evaluating and validating indicators of mathematics and science education. RAND Note No. N-2900-NSF. Santa Monica, CA: RAND.
Koretz, D. (1992b). What happened to test scores, and why? Educational Measurement: Issues and Practice, 11, 7–11.
Lapointe, A. E., Mead, N. A., & Askew, J. M. (1992). Learning mathematics (Report No. 22-CAEP-01). Princeton, NJ: Educational Testing Service.
Lapointe, A. E., Askew, J. M., & Mead, N. A. (1992). Learning science (Report No. 22-CAEP-02). Princeton, NJ: Educational Testing Service.
Linn, R. L. (1993). Linking results of distinct assessments. Applied Measurement in Education, 6, 83–102.
Maddison, A. (1975). Commentary on J. Vaizey’s “Implications of the IEA studies for educational planning with respect to organization and resource allocation.” In A. C. Purves & D. U. Levine (Eds.), Educational policy and international assessment (pp. 168–175). Berkeley, CA: McCutchen.
Mislevy, R. J. (1993). Linking educational assessments: Concepts, issues, methods, and prospects (foreword by R. L. Linn). Princeton, NJ: Policy Information Center, Educational Testing Service. (ERIC Document Reproduction Service No. ED 353 302)
Mullis, I. V. S., Campbell, J. R., & Farstrup, A. E. (1993). NAEP 1992 reading report card for the nation and the states. Princeton, NJ: Educational Testing Service.
Mullis, I. V. S., Owen, E. H., & Phillips, G. W. (1990). America’s challenge: Accelerating academic achievement (Report No. 19-OV-01). Princeton, NJ: Educational Testing Service/National Assessment of Educational Progress.
Muthén, B. (1988). Some uses of structural equation modeling in validity studies: Extending IRT to external variables. In H. Wainer & H. Braun (Eds.), Test validity. Hillsdale, NJ: Erlbaum.
Platt, W. J. (1975). Policy making and international studies in educational evaluation. In A. C. Purves & D. U. Levine (Eds.), Educational policy and international assessment (pp. 33–59). Berkeley, CA: McCutchen.
Postlethwaite, T. N. (1975). The surveys of the International Association for the Evaluation of Educational Achievement (IEA). In A. C. Purves & D. U. Levine (Eds.), Educational policy and international assessment (pp. 1–32). Berkeley, CA: McCutchen.
Purves, A. C., & Levine, D. U. (Eds.). (1975). Educational policy and international assessment. Berkeley, CA: McCutchen.
Raudenbush, S. W., & Willms, J. D. (Eds.). (1991). Schools, classrooms, and pupils: International studies of schooling from a multilevel perspective. San Diego: Academic Press.
Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41–55.
Rotberg, I. (1990). I never promised you first place. Phi Delta Kappan, 72, 296–303.
Rotberg, I. (1991). How did all those dumb kids make all those smart bombs? Phi Delta Kappan, 73, 778–781.
Salganik, L. H., Phelps, R. P., Bianchi, L., Nohara, D., & Smith, T. M. (1993). Education in states and nations: Indicators comparing the U.S. states with the OECD countries in 1988 (Report NCES 93-237). Washington, DC: U.S. Department of Education, Office of Educational Research and Improvement.
Schum, D. A. (1987). Evidence and inference for the intelligence analyst. Lanham, MD: University Press of America.
Sheehan, K. M., & Mislevy, R. J. (1990). Integrating cognitive and psychometric models in a measure of document literacy. Journal of Educational Measurement, 27, 255–272.
Smith, M. S. (1975). Commentary on R. L. Thorndike’s “The relation of school achievement to differences in the backgrounds of children.” In A. C. Purves & D. U. Levine (Eds.), Educational policy and international assessment (pp. 111–116). Berkeley, CA: McCutchen.
Theisen, G. L., Achola, P. P., & Boakari, F. M. (1983). The underachievement of cross-national studies of achievement. Comparative Education Review, 27, 46–68.
Thorndike, R. L. (1973). Reading comprehension education in fifteen countries: An empirical study. International Studies in Evaluation, Vol. III. New York: Wiley, and Stockholm: Almqvist & Wiksell.
Viadero, D. (Dec. 8, 1993). NAEP urged to make “report card” more useful. Education Week.
White, J., & Löfqvist, G. (1988). Pragmatic writing tasks. In T. P. Gorman, A. C. Purves, & R. E. Degenhart (Eds.), The IEA study of written composition I: The international writing tasks and scoring scales (pp. 79–99). Oxford: Pergamon Press.
Wiley, D. E., & Bock, R. D. (1967). Quasi-experimentation in educational settings: Comment. The School Review, 75, 353–366.
Wolf, R. (1977). Achievement in America. New York: Teachers College Press.
Wolf, R. (1979). Sampling. Bulletin 4: Secondary Study of Mathematics. Urbana, IL: Second International Mathematics Study.
Wolfe, R. G. (1989, February). An indifference to differences: Problems with the IAEP-88 study. Paper presented at a research conference on the Second International Mathematics Study data, University of Illinois, Champaign.
71 Power, Control, and Validity in Research
Randall M. Parker

Power and control, though usually conceived of as belonging in the realms of warfare and politics, also play an important role in science. In lay terms, power and control refer to the influence one has over others. The media, for example, possess power and control through information dissemination. Dissemination of research results to the public through television, newspapers, and so on, has the power to influence and even control public behavior. The media have come to play a major role in providing information on new scientific developments. For instance, at one time the names Pons and Fleischmann might have conjured images of beauty cream and margarine. Today, many would recognize these as the names of the physicists who discovered cold fusion. Early media reports of cold fusion held great promise for readily available, limitless energy. It all sounded too good to be true. Apparently, it was. While the final nail has yet to be driven into the coffin of cold fusion, it appears to have been a tempest in a teapot, or perhaps more appropriately, fury in a fusion jar (Amato, 1990). Media hype of the results of poorly designed research not only may lead to undeserved recognition for the researcher, but also may harm public welfare. Occasionally, the presentation of overstated research findings in the media causes marked shifts in human behavior. A body of research, for example, suggests that oat bran is effective in lowering serum cholesterol, thus reducing the risk of atherosclerotic heart disease. A recent, highly publicized study, however, indicated that oat bran had no direct effect in lowering serum cholesterol (Swain, Rouse, Curley, & Sacks, 1990). Subsequent television and
Source: Journal of Learning Disabilities, 23(10) (1990): 613–620.
newspaper coverage emphasized the conclusions of Swain et al.’s research and ignored serious problems with the study’s design. Design problems included the fact that the small number of participants (N = 24) were predominantly young (M = 30 yrs.), female dieticians who had low serum cholesterol (M = 186 mg/dl) – hardly a typical or representative group. Moreover, considerable data were lost because 4 of the 24 participants (16.7%) did not complete the study; and the participants’ diet was not recorded or controlled other than including a specified number of oat bran muffins (treatment) or low-fiber wheat muffins (control). As a result of this poorly designed study and the attention it received in the media, consumption of oat bran plummeted, perhaps to the detriment of public health (Raloff, 1990). Because scientific history is replete with examples of specious, even fraudulent, findings, experienced scientists develop a cautious, skeptical attitude toward dramatic discoveries. And yet, this scientific skepticism must not impede exploration of new theories, models, and effects. There are several lines of defense that ensure the validity of scientific results. First are the ethics and competence of scientists. Despite notable exceptions, the vast majority of researchers are ethical, well-trained individuals. Second is the process of peer review that is practiced by most research journals. Peer review, though cumbersome and inequitable in some cases, acts as a quality control mechanism. Third is an intelligent, knowledgeable, well-informed public. Simple principles for conducting valid research are taught in public schools, even though these principles are frequently disregarded by media reporters and readers alike. Another important line of defense is replication of research. The ethics and tradition of science require data, new procedures, and the like, to be shared fully with the scientific community so that peers may attempt to replicate the results. 
However, concerns over retaining proper recognition, guarding legal rights, and protecting potential financial gain motivate some to refuse to place complete data in the public domain. When replication is not possible because complete information is not shared, the process of winnowing scientific fact from fiction is thwarted. Replication by peers is immensely important because repetition of random errors from one study to another is unlikely. Other pitfalls in human research are also unlikely to be repeated in independent replications. One such pitfall is the “Investigator Fudging Effect” (Barber, 1976). Probably because altering or fabricating data is taboo among scientists, words like “fudging,” “trimming,” and “cooking” are used to describe a researcher’s unethical tampering with data. Even Newton was believed to have fudged in Principia by making his data appear more precise than they actually were (Barber, 1976; Westfall, 1973). Sir Ronald Fisher, the father of modern statistics, asserted that Gregor Mendel, or one of Mendel’s assistants, probably cooked the data that led to the Mendelian inheritance ratios (Koestler, 1971). The motivation
for such cheating is likely to be the historical need of scientists to survive in a critical, competitive world. In today’s “publish or perish” scientific environment, it is surprising that research standards essentially remain high. This article began with a discussion of the terms power and control as commonly defined. Power and control, however, have different technical meanings when used in research design. As technically defined, power and control are the keystones of research validity. And research validity is the foundation of this article, the remainder of which will review basic principles of research validity and comment on the Irlen lens research (Blaskey, Scheiman, Parisi, Ciner, Gallaway, & Selznick; O’Connor, Sofo, Kendall, & Olsen; Robinson & Conway) published in this issue of the Journal of Learning Disabilities. Basic principles of research validity will be organized according to the four types of validity – internal, external, statistical conclusion, and construct validity – presented by Campbell and Stanley (1966), Cook and Campbell (1979), and particularly Cook and Campbell (1984).
Internal Validity

Internal validity, which is the most important type of research validity, refers to the extent to which error variance is experimentally controlled. Experimental control may be attained through the use of several techniques. Five common methods of control are (1) random assignment of subjects to treatment and control groups, (2) holding extraneous variables constant or restricting their range, (3) including extraneous variables in the design to measure their effects, (4) employing methods of statistical control, and (5) matching subjects in the treatment and control groups on contaminating extraneous variables (Bolton & Parker, 1987; Christensen, 1980). Random assignment equates the treatment and control groups, on average, on all variables except the independent (treatment) variable. Although random assignment of subjects to groups is the most effective and preferable form of control, it is not feasible, or even possible, in many situations. For example, it is not possible to randomly assign students to high and low reading groups because reading level is a preexisting, predetermined condition. The next most preferable method of control is holding extraneous variables constant or restricting their range. For example, if age were a potentially contaminating variable, the range of age in the sample could be limited. If gender were a potentially contaminating variable, only females (or males) could be studied. Holding variables constant reduces their variation and therefore their covariation with outcome variables. Statistical control involves the statistical manipulation of data to reduce extraneous variation. Such techniques include partial correlation, analysis of covariance, and multivariate strategies (e.g., regression analysis, discriminant analysis, and canonical correlation analysis).
The most problematic form of control is matching subjects on potentially contaminating variables. Matching is most effective when several criteria are met: (1) The researcher can identify and measure likely contaminating variables; (2) the number of variables to be matched is small (two or three); and (3) each member of the treatment group is matched with a member of the control group, as opposed to matching whole groups at a time. Failure to control sources of contaminating, extraneous variance poses many threats to internal validity.
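Two of the control methods discussed in this section, random assignment and one-to-one matching, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a procedure taken from the studies under review: the function names, the greedy nearest-neighbor rule, and the `caliper` tolerance are all hypothetical choices, and matching is shown on a single covariate only.

```python
import random


def randomly_assign(subject_ids, seed=None):
    """Shuffle subjects and split them into treatment and control halves."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def match_one_to_one(treated, pool, caliper):
    """Greedy nearest-neighbor matching on one covariate.

    treated, pool: dicts mapping subject id -> covariate value (e.g., age).
    Each treated subject is paired with the closest unused control whose
    covariate differs by at most `caliper`; unmatched subjects are dropped.
    """
    available = dict(pool)
    pairs = []
    for t_id, t_val in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_val))
        if abs(available[c_id] - t_val) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs


# Random assignment: every subject ends up in exactly one group.
treatment, control = randomly_assign(range(10), seed=1)
print(sorted(treatment), sorted(control))

# Matching individual subjects on age, with a tolerance of one year.
pairs = match_one_to_one({"t1": 10, "t2": 12},
                         {"c1": 11, "c2": 12, "c3": 30},
                         caliper=1)
print(pairs)
```

Random assignment needs no knowledge of the contaminating variables, whereas matching works only for variables the researcher has identified and measured, which is why the criteria listed above restrict it to a small number of variables.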
Threats to Internal Validity
When researchers employ flawed designs and obtain evidence supporting their research hypotheses, the findings often can be explained by “rival hypotheses.” These rival hypotheses are alternative explanations that account for the findings just as well as, or even better than, the research hypothesis. For example, the finding that a new instructional approach resulted in statistically significantly greater gains in achievement than other approaches may be due to the instructor’s enthusiasm for the new approach rather than the content of the program. The following are threats to internal validity that, for certain designs, may explain statistically significant findings just as plausibly as the research hypotheses. 1. History. This term refers to an extraneous event that occurs prior to outcome assessment and correlates with the dependent variable (e.g., several experimental subjects in a study of the relationship of stress to anxiety received IRS audit notices before the posttest was administered). 2. Maturation. This denotes naturally occurring developmental changes in the subject that affect the subject’s performance on the outcome variable (e.g., during a 2-year study of achievement skills of elementary students, the females’ reading achievement grew at a faster rate than the males’). 3. Testing. Testing refers to the fact that pretesting may sensitize subjects in ways that affect the posttest scores. For example, students may remember their pretest responses and answer more items correctly at posttest, or react negatively to retaking the same test and answer haphazardly. 4. Instrumentation. This term refers to deterioration or changes in the accuracy of instruments, devices, or observers used to measure the dependent (outcome) variable (e.g., observers forget their training, or surveys mailed to one geographical area become wrinkled and rain-soaked in transit, affecting their legibility). 5. Statistical Regression. Grouping subjects on the basis of extreme scores tends to be inaccurate because extreme scores tend to regress toward the group mean. For example, high and low reading groups are formed by dividing students at the median on a reading test. To the extent that the
reading test is unreliable, individuals who scored the highest will tend to score lower, and those who scored lowest will tend to score higher at posttest. Furthermore, some students scoring slightly above the mean at pretest will score below the mean at posttest. The reverse is true for those scoring slightly below the mean. The migration of participants across the high-low group boundary due solely to test unreliability obviously results in misclassification and inflated error variance. 6. Mortality. This refers to the loss of subjects during research, due to death, absence, and so forth. If the treatment itself causes participants to drop out, the treatment groups’ posttest mean would be contaminated because only the scores of the “hardy” survivors would be available. 7. Selection. This occurs when subjects volunteer for a treatment or are assigned to treatment and control groups based on their preferences. This assignment results in the groups being different on many variables: They are not comparable groups. 8. Interactions with Selection. Many of the foregoing threats to internal validity may interact with selection to produce effects that may be erroneously attributed to the treatment. People who volunteer for a treatment might be different from a nonvolunteer control group on variables that are related to the outcome variable. Interaction of selection and maturation, for example, might cause the experimental and control groups to be composed of individuals who are maturing at different rates, and this difference might affect the outcome variables. 9. Ambiguity About the Direction of Causal Influence. This threat to internal validity, common in bivariate correlational studies, refers to when it is not clear whether A causes B or B causes A. Consider a situation in which it is found that supervision and productivity are negatively related. Low levels of supervision are related to high productivity and vice versa. Does low supervision cause high productivity, or does high productivity cause low supervision? 10. Diffusion or Imitation of the Treatment. If the treatment is widely diffused or disseminated and/or the treatment and control groups communicate with each other about the treatment, the groups may become indistinguishable on the dependent (outcome) variable. 11. Compensatory Equalization of Treatment. When the treatment is desirable, administrators may be reluctant to tolerate inequity. In Project Follow Through, for example, administrators of the project gave needy control schools Title 1 funds in amounts equivalent to the experimental schools. This action introduced contamination into the study (Cook & Campbell, 1984). 12. Compensatory Rivalry. When membership in experimental and control groups is made public, competitive motivations may result. Saretsky (1972) noted that the performance of students taught by control teachers in a performance contracting experiment was higher during the experiment than
in previous years. It appeared that performance contracting was perceived by the teachers as a threat to their job security, causing them to redouble their teaching efforts. Saretsky called this the “John Henry effect,” comparing the control group’s extra efforts to those of John Henry, a folk song character who competed with a machine in laying rail for the railroad. 13. Resentful Demoralization of Respondents Receiving Less Desirable Treatments. Particularly when an experiment is obtrusive and the experimental group receives preferential or desirable treatment, the no-treatment control group may feel resentful and demoralized. This resentment may lead to posttest differences that are not due to the positive effects of the treatment on the experimental group, but to the negative effects of no treatment on the control group. 14. Local History. This threat to internal validity occurs when data are collected by groups. Idiosyncratic events occurring during group data-gathering sessions are confounded with treatment effects. For example, if individuals assigned randomly to an experimental group receive an enthusiastic presentation of the treatment, and the same enthusiasm is not expressed to the control group, any effects attributed to the treatment may be due in whole or in part to the enthusiasm.
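The statistical regression threat (item 5) lends itself to a small simulation. The sketch below assumes a hypothetical measurement model that is not taken from the text: each observed score is a stable true score plus independent random error, with equal true-score and error variance, so test reliability is about .50. No treatment is applied at any point.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 2000

# Assumed model: observed score = stable true score + independent error.
true_scores = [rng.gauss(100, 10) for _ in range(n)]
pretest = [t + rng.gauss(0, 10) for t in true_scores]
posttest = [t + rng.gauss(0, 10) for t in true_scores]

# Form "high" and "low" groups by a median split on the pretest.
cut = statistics.median(pretest)
high = [i for i in range(n) if pretest[i] > cut]
low = [i for i in range(n) if pretest[i] <= cut]

def mean_of(scores, indices):
    return statistics.fmean(scores[i] for i in indices)

grand_mean = statistics.fmean(pretest)
high_pre, high_post = mean_of(pretest, high), mean_of(posttest, high)
low_pre, low_post = mean_of(pretest, low), mean_of(posttest, low)

# Subjects who would change groups if regrouped on the posttest alone.
post_cut = statistics.median(posttest)
migrated = sum((pretest[i] > cut) != (posttest[i] > post_cut) for i in range(n))

print(f"high group: pretest {high_pre:.1f}, posttest {high_post:.1f}")
print(f"low group:  pretest {low_pre:.1f}, posttest {low_post:.1f}")
print(f"subjects crossing the median: {migrated} of {n}")
```

Under these assumptions the high group's mean drifts down toward the overall mean and the low group's mean drifts up, even though nothing was done to either group. A sizable share of subjects also cross the group boundary purely because of measurement error.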
External Validity
External validity refers to the degree to which research findings can be generalized across time, settings, and persons. Generalizing across time and settings typically necessitates administering the experimental procedures at different times and in different settings. Generalizing across persons requires research samples to be representative of the population of interest. Obtaining an appropriate, representative sample of individuals to study is one way to increase external validity. A variety of sampling methods has been developed to obtain representative samples. The most widely known method is random selection. Unfortunately, most educational and psychological research possesses little external validity because convenience samples, the most readily available or convenient subjects, are used. There are two types of sample designs: probability and nonprobability. In probability samples each sampling unit or case has a known likelihood or probability of being included in the sample. The probability of each sampling unit being included in the sample is unknown in nonprobability samples. Only through probability sampling can one obtain a representative sample. Probability designs include: 1. Simple random samples – samples in which each sampling unit in the sampling frame has an equal probability of being included. A table of random numbers is usually used to select cases randomly.
2. Systematic samples – samples consisting of every Kth sampling unit of the population. This method is not as desirable as simple random sampling, but may be more easily achieved if you are asking others (e.g., teachers or administrators) to select cases. 3. Stratified random samples – samples that are randomly selected from predefined strata in the population. The strata, for example, could be made up of four socioeconomic levels. Cases would be sorted into SES levels and randomly selected from each level. This procedure assures that the sample will contain cases representing each SES level. The number of cases selected from each stratum may be equal or may be proportional to the number of cases in the population. 4. Cluster samples – samples that are selected through simple random sampling or stratified random sampling in several stages. For example, 10 public school districts might be randomly selected at Stage 1. Next, 10 schools (1 from each district) might be randomly selected. Finally, 10 white, 10 Hispanic, and 10 black students might be randomly selected from each school (Nachmias & Nachmias, 1981). Nonprobability designs include the following three types: 1. Convenience samples – samples that are chosen without a sampling plan and because they are readily available. 2. Purposive samples – samples that are selected to be representative through the subjective judgment of the researcher. 3. Quota samples – samples that are selected to be similar to the population on specified characteristics (e.g., age, sex, political affiliations, etc.). The researcher employing this design may select anyone (e.g., friends) as long as they possess the specified characteristics. Failing to obtain a sample that is representative of a population, of a range of settings, and of a range of times results in the inability to generalize the findings of the study beyond the persons, setting, and time employed in the research. Such failures pose serious threats to external validity.
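The first three probability designs can be sketched with Python's standard `random` module, which here stands in for a table of random numbers. The sampling frame, the `ses` field, and the function names are hypothetical. The random starting point in the systematic sample is a common convention that the text does not spell out.

```python
import random
from collections import defaultdict

def simple_random_sample(frame, n, rng):
    """Every unit in the frame has the same chance of being selected."""
    return rng.sample(frame, n)

def systematic_sample(frame, k, rng):
    """Take every k-th unit, starting from a random position (an assumption)."""
    start = rng.randrange(k)
    return frame[start::k]

def stratified_sample(frame, stratum_of, n_per_stratum, rng):
    """Sort units into strata, then draw a simple random sample from each."""
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for units in strata.values():
        sample.extend(rng.sample(units, n_per_stratum))
    return sample

rng = random.Random(7)
# Hypothetical frame: 200 cases spread over four socioeconomic (SES) levels.
frame = [{"id": i, "ses": i % 4} for i in range(200)]

srs = simple_random_sample(frame, 20, rng)
systematic = systematic_sample(frame, 10, rng)
stratified = stratified_sample(frame, lambda u: u["ses"], 5, rng)
```

Drawing the same number of cases from each stratum, as here, is the "equal" option mentioned in item 3. A proportional design would instead scale `n_per_stratum` to each stratum's share of the population.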
Threats to External Validity
1. Interaction Among Treatments. When multiple treatments are administered to the same participants, the effects may be cumulative. Thus, the results of research employing multiple treatments may not be generalizable to situations where the treatments are given singly. This threat is applicable to complex time-series designs. 2. Interaction of Testing and Treatment. The pretest may increase or decrease the subjects’ responsiveness or sensitivity to the treatment; as a consequence, the results are not generalizable to the nonpretested population from which the treatment group was selected. 3. Interaction of Selection and Treatment. This occurs when research subjects are volunteers or individuals who are prone to seek out research participation. Such persons may have traits that tend to enhance or diminish the effects of the treatment. Thus, the results are not generalizable to the population of interest, which includes nonvolunteers. 4. Interaction of Setting and Treatment. Treatments demonstrated in one environment (e.g., the laboratory) may not “work” in other settings (e.g., the classroom). Therefore, this threat to external validity refers to whether effects demonstrated in one setting are generalizable to other settings. 5. Interaction of History and Treatment. The effects observed in a study may be due to special circumstances. For example, teachers show a substantial lowering of anxiety after a stress reduction program, but the study was conducted while the local community was under a tornado warning. Is the observed effect generalizable to other, more mundane circumstances? 6. Generalizing-Across-Effect Constructs. This refers to the degree to which a treatment found to be effective on one outcome will work (i.e., be generalizable) to produce other outcomes. For instance, will students using tinted lenses to ameliorate eyestrain also experience an enhanced self-concept?
Statistical Conclusion Validity
Statistical conclusion validity is concerned with the appropriate use of statistics to derive accurate conclusions. Before proceeding, a few terms must be defined. Let us consider a typical study wherein treatment and control groups are being compared on their performance on an outcome variable. The null hypothesis states that there is no difference between group means on the outcome variable. In this situation the Type I error rate, alpha (α), is the probability of falsely rejecting the null hypothesis, that is, finding a significant difference when the means come from the same population. On the other hand, the Type II error rate, beta (β), is the probability of failing to reject a false null hypothesis, that is, finding no significant difference when the means come from different populations. Let us now proceed with describing the threats to statistical conclusion validity.
Threats to Statistical Conclusion Validity
1. Low Statistical Power. Statistical power is equal to 1.0 – β, or the probability of rejecting a false null hypothesis, that is, finding a statistically significant
difference when the means, in fact, come from different populations. Power is a function of alpha, sample size (N), and the effect size (ES). Effect size refers to the amount of common variance between the independent variable (IV) and the dependent variable (DV), or the degree to which changes in the IV result in changes in the DV. The effect size is different for different statistics. For example, for correlations the ES is equal to Pearson r, whereas for F tests ES is the correlation ratio (SS Between/SS Total) (Cohen, 1988). Increasing the alpha level, sample size, and effect size, singly or in combination, acts to increase statistical power. According to Cohen (1988), power should be about .80, that is, researchers should design their studies so that they have an 8 in 10 chance of obtaining a statistically significant result when one actually exists. However, surveys of research indicate that most studies have much lower power (Cohen, 1962; Lipsey, 1990). By systematically varying alpha, N, and ES, a researcher can produce the desired level of power. Using the power tables provided by Cohen (1988), one can readily estimate the power of a study during the planning stage and thus maximize the probability of finding an effect when one exists. 2. Fishing and the Error Rate Problem. Running many statistical tests in an effort to obtain statistically significant findings beyond those hypothesized is called “fishing.” Fishing produces error rates that are higher than the preset alpha. Running one statistical test on the data set results in a Type I error rate equal to preset alpha (usually .05). However, running two or more independent tests inflates alpha above the predetermined rate. This alpha inflation is also called probability pyramiding (Neher, 1967). Alpha inflation occurs for all independent statistical analyses, whether hypothesized in advance or not. 
“Independent analyses” refers to all separate statistical tests, for example, repeated Pearson rs, multiple t tests, and so forth. Exceptions include multiple F ratios and similar statistics as part of the same complex ANOVA or ANCOVA statistical design, including appropriate follow-up procedures (e.g., Scheffe’s, Tukey’s, and similar multiple comparison tests). Other exceptions are multiple statistics run as part of a multivariate analysis, including regression analysis, multivariate analysis of variance and covariance, discriminant analysis, canonical correlation, and the like. 3. Low Reliability of Measures. Measures with low reliability increase error variance and reduce the power of statistical tests. Using tests comprising more items with high internal consistency, decreasing the intervals between pretests and posttests, and using corrections for unreliability (with great caution) are methods for avoiding this problem. A related problem occurs when simple gain scores (posttest minus pretest scores) are used as a measure of change on the dependent variable. Simple gain scores are notoriously unreliable, because by subtracting the pretest from
the posttest, one is left with less true variance and relatively more error variance. 4. Low Reliability of Treatment Implementation. This is due to the lack of standardization of procedures used to administer the treatment. Using different individuals and different occasions to implement the treatment increases this threat to statistical conclusion validity. 5. Random Irrelevancies in the Experimental Setting. Scores on the outcome variable can be affected by aspects of the experimental setting other than the treatment. This threat may be reduced by choosing settings free of distractions (e.g., irregularly occurring noise) or by focusing participant attention on the treatment and lowering the salience of environmental factors. 6. Random Heterogeneity of Respondents. This occurs when respondents are heterogeneous on variables that are related to the outcome variable. When uncontrolled respondent variables correlate with the outcome variable, error variance is inflated. This problem may be controlled by selecting subjects who are homogeneous on all variables (excluding the independent variable) that are related to the dependent variable. Alternatively, subjects may be “blocked” on such variables, which would then be included in the statistical design.
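The dependence of power on alpha, sample size, and effect size (item 1) can be illustrated with a rough calculation. The sketch below is not Cohen's tabled method. It is a normal approximation for a two-sided comparison of two equal-sized groups, with the effect size expressed as a standardized mean difference d. The choice of d and the function names are assumptions made for illustration, and the approximation slightly understates the sample sizes that exact t-based tables give.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power for comparing two group means (two-sided test)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    noncentrality = d * sqrt(n_per_group / 2)
    # The tiny chance of rejecting in the wrong direction is ignored.
    return _STD_NORMAL.cdf(noncentrality - z_crit)

def approx_n_per_group(d, power=0.80, alpha=0.05):
    """Approximate subjects needed per group to reach the target power."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / d) ** 2)

# Planning-stage use: vary the assumed effect size and see what is required.
for d in (0.2, 0.5, 0.8):
    n_needed = approx_n_per_group(d)
    print(f"d = {d}: about {n_needed} per group "
          f"(approx. power {approx_power(d, n_needed):.2f})")
```

Raising alpha, n, or d in `approx_power` raises the result, which matches the statement above that each of the three acts to increase power.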
Construct Validity
The construct validity of a variable refers to whether the variable is adequately defined and accurately measured by the instruments, procedures, manipulations, and methods employed in the study. A valid construct must be uniquely operationally defined. When a construct is suggested as a cause or effect, it is valid when other constructs cannot be construed as being either the cause or the effect.
Threats to Construct Validity
1. Inadequate Preoperational Explication of Constructs. When constructs are poorly defined, the instruments, procedures, manipulations, and methods used in the study cannot be adequately specified. More importantly, the results of the study cannot be accurately attributed to the constructs of interest. 2. Mono-Operation Bias. Construct validity is limited when a construct is defined by only one measure or operation. Multiply operationalized variables tend to be more valid than single operations because single operations under-represent constructs and contain irrelevancies. Alternative measures of a target allow one to triangulate on the construct.
3. Mono-Method Bias. This occurs when all the manipulations and measures use the same means of presenting the treatments or recording the results. For example, because many leadership studies have employed single paper-and-pencil measures, it has been suggested that leadership theories predict paper-and-pencil behavior better than actually practiced leadership behaviors (Campbell, Daft, & Hulin, 1982). Other measures (e.g., observations) should have been used in these studies. 4. Hypothesis-Guessing Within Experimental Conditions. Research participants may attempt to guess the purpose of the study and alter their behavior accordingly. This alteration of behavior does not involve comparisons to other groups in the study and thus is different from threats to internal validity (e.g., resentful demoralization) discussed earlier. Hypothesis-guessing can be avoided by making hypotheses difficult to guess, by reducing the reactivity and obtrusiveness of the study, and by purposefully giving different hypotheses to different participants. 5. Evaluation Apprehension. Treatment group subjects may, because of being apprehensive about being evaluated by the researcher, present themselves in such a way as to be favorably assessed. If this favorable presentation affects the outcome variable, the experimental results are confounded. 6. Experimenter Expectancies. Some research suggests that experimenter expectancies can influence participants’ behavior and the outcome data. This effect can be avoided by selecting objective individuals who do not know the purposes of the study to administer the treatment and record the data. 7. Confounding Levels of Constructs and Constructs. This threat occurs when the independent variable (IV) has multiple levels. One may conclude, for example, that, overall, the IV does not affect the dependent variable (DV) when only IV1 (Level 1 of the IV) and IV4 do not affect the DV.
However, IV2 and IV3 do affect the DV, obviously indicating a nonlinear relationship. Typically, this threat occurs with weak treatments. Conclusions, nonetheless, are drawn regarding the ineffectiveness of the treatment without noting the low strength of the manipulation. This threat can be controlled by manipulating many levels of the IV and measuring many levels of the DV. 8. Generalizing Across Time. When indicating the impact of a treatment, it would be desirable to indicate how long it took for the effects to appear and how long and at what level the effects were manifested. Ideally, time should be included as a design variable in treatment research. 9. Interaction of Procedures and Treatments. Participants who receive new information or have new experiences as part of the treatment may react differently to the treatment. Subjects in a 2-month study using glasses with tinted lenses to improve their reading performance may react to the fact that at the termination of the study they would be allowed to keep the glasses. If an effect is observed, is it due to the tinted lenses or to the fact that the participants would receive the glasses as a gift?
Comments on the Irlen Lens Research
Prior to presenting a critical analysis of the studies, the author identified the research design employed in each study and consulted tables in Campbell and Stanley (1966) to determine which threats to internal and external validity were likely to exist for each study. Additionally, each study was reviewed to identify threats to statistical conclusion and construct validity. A summary of the threats to validity is presented in Table 1.

Table 1: Threats to validity of three Irlen lens studies^a

Threat                                      Blaskey et al.   O’Connor et al.   Robinson & Conway
Internal validity threats
  History                                         +                +                  −
  Maturation                                      +                +                  +
  Testing                                         +                +                  +
  Instrumentation                                 +                +                  ?
  Statistical regression                          +                +                  +
  Mortality                                       −                ?                  ?
  Selection                                       +                +                  +
  Interactions with selection                     +                +                  +
  Ambiguity re causal influence                   +                +                  −
  Diffusion of the treatment                      −                −                  +
  Compensatory equalization                       +                +                  +
  Compensatory rivalry                            ?                ?                  +
  Resentful demoralization                        ?                ?                  +
  Local history                                   ?                ?                  ?
External validity threats
  Interaction among treatments                    −                −                  −
  Interaction of testing and treatment            −                −                  −
  Interaction of selection and treatment          ?                ?                  ?
  Interaction of setting and treatment            ?                ?                  ?
  Interaction of history and treatment            ?                ?                  ?
  Generalizing-over-effect constructs             ?                ?                  ?
Statistical conclusion validity
  Statistical power                               −                −                  −
  Fishing and the error rate problem              −                −                  −
  Reliability of measures                         ?                ?                  ?
  Reliability of treatment implementation         ?                ?                  ?
  Random irrelevancies                            ?                ?                  ?
  Random heterogeneity                            ?                ?                  ?
Construct validity
  Inadequate explication of constructs            −                −                  −
  Mono-operation bias                             +                +                  +
  Mono-method bias                                −                −                  −
  Hypothesis-guessing                             −                −                  −
  Evaluation apprehension                         −                −                  −
  Experimenter expectancies                       −                −                  −
  Confounding levels of constructs                +                +                  +
  Generalizing across time                        ?                −                  +
  Interaction of procedures and treatments        −                −                  −

^a A plus (+) means the threat was controlled, a minus (−) means it was not, and a question mark (?) means it may not have been controlled.

As can be readily determined from the table, all three studies appear to have substantial threats to validity
(denoted by a minus sign) and may have other, potential threats (denoted by a question mark). In a more positive vein, all three studies appear to have controlled numerous threats to validity (denoted by a plus sign). Because of space limitations, I will review only selected threats to internal and external validity for each study in the following paragraphs. The studies possess flaws other than those I have selected to discuss. Of the three studies, Blaskey et al. use the “cleanest” design – an experimental, pretest-posttest, control-group design with random assignment of participants to groups. This design controls for most threats to internal validity. Mortality, however, was a problem. Eight of 30 subjects did not complete the study, resulting in a relatively high mortality rate (27%). Four other potential threats to internal validity, noted in Table 1 with a question mark, also might have lowered the validity of the study. With respect to external validity, the Blaskey et al. study had two threats, namely interaction among treatments and interaction of testing and treatment. Multiple treatments were given to one group. The Irlen lens treatment group received two pairs of glasses, one with placebo lenses and the other with Irlen lenses, which they wore during different phases of the study. The effects of the Irlen lenses and placebo lenses are, therefore, inseparable, and differences observed between this group and the control group could not be attributed to Irlen lenses alone. Any generalizations would have to be made to individuals wearing both placebo and Irlen lenses. Interaction of testing and treatment may also have occurred, so that generalizations could be made only to groups who received both the pretest and the treatment. O’Connor et al. erroneously claimed to have used an “expanded” Solomon Four-Group Design. 
The Solomon Four-Group Design has a treatment group and control group that are pretested and posttested, and another treatment group and control group that receive only a posttest; subjects are randomly assigned to the four groups. O’Connor et al. had six groups, five of which were pretested and posttested. One control group was posttested only. The design is most similar to the pretest-posttest, control-group design, and possesses the same threats to internal and external validity as the Blaskey et al. study does, except that O’Connor et al. reported no mortality problems. I regard the design of O’Connor et al. as inferior to that of Blaskey et al., because of the former design’s “patched-up” appearance. In addition, O’Connor et al. used random assignment in an idiosyncratic manner. Scotopic individuals were separated from nonscotopic persons, and the scotopic group was randomly assigned to four subgroups; the nonscotopic group was separately randomly assigned to two subgroups. Therefore, valid comparisons could be made within the scotopic group and within the nonscotopic group, but not between groups. Robinson and Conway employed what most closely resembles a single-group, time-series design. This is quite an adequate design, despite lacking a control group. The primary threat to internal validity is history. Since this
study took place over 1 year’s time, many events could have occurred that influenced the outcome. Therefore, changes in the outcome variable could be attributed to these extraneous events as easily as to the treatment. The identifiable threats to external validity are the same as those of the other two studies. All three studies had serious threats to statistical conclusion validity and construct validity. All had low statistical power. Assuming a strong effect size and an alpha of .05, only the Robinson and Conway study approached a power of .50. A power of .50 means there is a 50-50 chance of obtaining a statistically significant finding when a difference exists in the populations. To reach a power of .80, recommended by Cohen (1988), one would need about 100 subjects in each group (Lipsey, 1990). Likewise, all three studies had fishing and error rate problems. Each did a series of univariate tests without correcting for probability pyramiding or the alpha inflation effect. For example, Blaskey et al. ran 22 t tests. Although their stated alpha level was .05, it actually was .68, if my calculations are accurate. In other words, the probability of a Type I error (i.e., finding a statistically significant result when none existed) was 68 chances in 100, an unacceptably high value. All three studies could have used more effective statistical designs. Blaskey et al. might have employed a multivariate analysis of covariance, with pretest scores as covariates. O’Connor et al. should have used standard scores rather than grade equivalents, not only because of statistical concerns, but because grade equivalents are dangerously misleading (Anastasi, 1988). Regardless of the scores used, O’Connor et al. probably should have used a Friedman two-way analysis of variance for the four scotopic subgroups alone, instead of a Kruskal-Wallis ANOVA over six groups. The fact that they obtained statistical significance using a Kruskal-Wallis ANOVA across the six groups is not surprising.
One would have expected a significant finding, because four of the six groups were scotopically sensitive and two were not; by definition, the groups came from different populations. In addition, they used a follow-up test (the “Steele Test,” which is not referenced and is not discussed in major nonparametric statistics texts, e.g., Siegel & Castellan, 1988) to determine whether simple gain scores (posttest minus pretest scores) for the Irlen transparency group were different. This was an error, because simple gain scores tend to be unreliable, even when the original scores possess moderate reliability (Anastasi, 1988). Robinson and Conway also utilized inappropriate measures and statistics. Using raw scores and age scores from developmental measures without statistically controlling for age differences is a fatal flaw. By definition one expects raw scores or age scores on developmental measures to change over time. Simple corrections are unacceptable (e.g., Robinson and Conway added 12 months to the pretest to equate it with the posttest given 12 months later). Actually, the authors should have used one of several techniques available to analyze time-series data (e.g., Cook & Campbell, 1979). The data depicted in Robinson and Conway’s two figures, nonetheless, are quite revealing. It is
clear that all the measures are developmental, because they show relatively smooth, positive slopes over time. The fact that one does not see marked changes in the slopes of any measure suggests that the treatment had no greater effect than the placebo (see Kidder & Judd, 1986, pp. 108–115). Finally, and perhaps most importantly, all three studies suffered from threats to construct validity, particularly the threat labeled “inadequate preoperational explication of constructs.” The construct around which the research revolves, scotopic sensitivity, is not clearly defined, and the procedures for measuring it are problematic. What is scotopic sensitivity? Is it an independent construct unrelated to other vision problems, or are the phenomena it describes encompassed and explained better by other constructs? These and similar questions about scotopic sensitivity must be addressed before valid attempts can be made to determine whether tinted lenses, which appear to ameliorate scotopic sensitivity, can also be used to treat reading disorders. Are the lenses responsible for the effects found in some research, or are there other, more parsimonious explanations? For example, do certain wavelengths of light present in fluorescent lighting negatively affect the reading performance of some students? The reliability and validity of measures of scotopic sensitivity also must be assessed adequately. In effect, one must know the theoretical basis of the phenomenon and be able to measure it consistently and accurately prior to doing valid treatment evaluation research. A related concern is with possible contamination inherent in the procedures used to determine scotopic sensitivity. Based on the descriptions of the diagnostic procedures in the three studies, it appears that one criterion of scotopic sensitivity is that the preferred color overlay must improve performance in reading and similar visual tasks. Reading measures are also used as outcome measures for the studies. 
Using the same or similar measures to define the treatment group and to assess the effects of the treatment is referred to as criterion contamination. Criterion contamination calls into question the validity of all studies testing whether a treatment group with scotopic sensitivity will improve on reading measures with tinted overlays or lenses. By definition, they will improve. Furthermore, as described in the three articles, the procedure for diagnosing scotopic sensitivity may involve social influences that could affect participants’ behaviors on outcome measures. Evaluation apprehension, experimenter expectancies, and a host of other factors might be engendered during the diagnostic process and could alter the behavior of subjects. In short, the diagnostic procedure may act as a potent treatment in itself.
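The logic of criterion contamination can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a model of the three studies: the function name `simulate`, the score scale, the noise level, and the sample size are all hypothetical, and the overlay is given no true effect at all. Students are "diagnosed" when their overlay reading score exceeds their plain score at screening; the diagnosed group's gain is then computed once from those same screening scores and once from a fresh, independent pair of measurements.

```python
import random
import statistics


def simulate(n_students=2000, noise_sd=5.0, seed=0):
    """Return (contaminated_mean_gain, independent_mean_gain).

    All parameters are illustrative assumptions. The overlay has NO true
    effect: every score is ability plus independent measurement noise.
    """
    rng = random.Random(seed)
    contaminated_gains = []
    independent_gains = []
    for _ in range(n_students):
        ability = rng.gauss(50.0, 10.0)

        # Screening session: one reading score without and one with the overlay.
        screen_plain = ability + rng.gauss(0.0, noise_sd)
        screen_overlay = ability + rng.gauss(0.0, noise_sd)

        # "Diagnosis": the overlay must improve reading performance.
        if screen_overlay <= screen_plain:
            continue

        # Contaminated outcome: reuse the scores that defined the group.
        contaminated_gains.append(screen_overlay - screen_plain)

        # Independent outcome: a fresh pair of measurements.
        outcome_plain = ability + rng.gauss(0.0, noise_sd)
        outcome_overlay = ability + rng.gauss(0.0, noise_sd)
        independent_gains.append(outcome_overlay - outcome_plain)

    return (statistics.mean(contaminated_gains),
            statistics.mean(independent_gains))


contaminated, independent = simulate()
print(f"gain measured on the selection scores: {contaminated:.2f}")
print(f"gain measured on independent scores:   {independent:.2f}")
```

Under these assumptions the first figure is positive by construction, because every diagnosed student was selected for having a positive difference, while the second hovers near zero. That is the sense in which a contaminated treatment group will improve "by definition."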
Conclusions
What can the reader conclude from these three studies? Does this research contribute to the debate about the effects of Irlen lenses? It is obvious from the foregoing that the three studies herein do not present definitive, valid findings.
Research Design, Measurement and Statistics and Evaluation
However, to the extent of their heuristic value, they do advance the effort to investigate questions about Irlen lenses. I have learned, and I hope the reader has learned, many interesting things about conducting this kind of treatment evaluation research. Things to do and not to do. And, interestingly, some of the data in the three studies suggest that an effect may be taking place. Isn’t it an exciting challenge to try to tease out what, if anything, is occurring? Despite my railing about the flaws of the three studies, the types of errors made are quite common in published special education and rehabilitation research, though rare in JLD. This observation is not intended to minimize the studies’ faults. I intend only to point out that it is easy to recline in one’s armchair and shoot holes in any piece of research. The perfectly designed study exists only in textbooks; in reality there is no such thing as flawless research. Nevertheless, researchers should develop the expertise to correctly carry out all aspects of their work, or should hire expert consultants to assist them. Two issues are particularly important. First, all easily avoidable errors, for example, selecting an inappropriate design and analysis, should be circumvented in the research planning stage. Second, authors should be aware of, and thoroughly report, the threats to the validity of their research. To do otherwise is not only unethical, it is scientifically indefensible.
References
Amato, I. (1990). Cold fusion: Wanted dead or alive. Science News, 137, 212.
Anastasi, A. (1988). Psychological testing (6th ed.). New York: Harper & Row.
Barber, T. (1976). Pitfalls of human research: Ten pivotal points. New York: Pergamon.
Bolton, B., & Parker, R. (1987). Research in rehabilitation counseling. In R. Parker (Ed.), Rehabilitation counseling: Basics and beyond (pp. 157–187). Austin, TX: PRO-ED.
Campbell, J., Daft, R., & Hulin, C. (1982). What to study: Generating and developing research questions. Beverly Hills, CA: Sage.
Campbell, D., & Stanley, J. (1966). Experimental and quasi-experimental designs for research. Chicago: Rand McNally.
Christensen, L. (1980). Experimental methodology. Boston: Allyn & Bacon.
Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65, 145–153.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (3rd ed.). New York: Academic Press.
Cook, T., & Campbell, D. (1979). Quasi-experimentation: Design and analysis issues for field settings. Chicago: Rand McNally.
Cook, T., & Campbell, D. (1984). The design and conduct of quasi-experiments and true experiments in field settings. In M. Dunnette (Ed.), Handbook of industrial and organizational psychology (pp. 223–326). Chicago: Rand McNally.
Kidder, L., & Judd, C. (1986). Research methods in social relations (5th ed.). New York: Holt, Rinehart & Winston.
Koestler, A. (1971). The case of the midwife toad. New York: Random House.
Lipsey, M. (1990). Design sensitivity: Statistical power for experimental research. Newbury Park, CA: Sage.
Nachmias, C., & Nachmias, D. (1981). Research methods in the social sciences. New York: St. Martin’s.
Neher, A. (1967). Probability pyramiding, research error, and the need for independent replication. Psychological Record, 17, 257–262.
Raloff, J. (1990). About that other study. Science News, 137, 331.
Saretsky, G. (1972). The OEO P.C. experiment and the John Henry effect. Phi Delta Kappan, 53, 579–581.
Siegel, S., & Castellan, N. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New York: McGraw-Hill.
Swain, J., Rouse, I., Curley, C., & Sacks, F. (1990). Comparison of the effects of oat bran and low-fiber wheat on serum lipoprotein levels and blood pressure. The New England Journal of Medicine, 322, 147–152.
Westfall, R. (1973). Newton and the fudge factor. Science, 179, 751–758.
72 Testing Reasoning and Reasoning about Testing Walt Haney
In the United States in recent years, much worry and writing have been occasioned by the apparent lack of influence of social science research on practical affairs, including affairs of schooling. In 1979, for example, Lindblom and Cohen wrote, “In public policy making, many suppliers and users of social research are dissatisfied, the former because they are not much listened to, the latter because they do not hear much they want to listen to” (p. 1). Whatever the merits of the general proposition that social science research has exerted little influence on practical affairs – and in particular, the practice of education – there is one very clear exception to this notion. It is the influence that research on testing has exerted on education. This influence will be traced in detail later in this paper, but for the moment, only two works need to be cited. Twice in the last two decades, the National Academy of Education has set out to explore the relationship between research and education, and how the latter might benefit from the former. These efforts resulted in books to which diverse groups of eminent scholars contributed. Each volume gave extended examples of how research has influenced educational practice. The first example in each volume? In Research for Tomorrow’s Schools (Cronbach & Suppes, 1969), it was “Mental Tests and Pupil Classification.” In Impact of Research on Education (Suppes, 1978), it was “On the Theory-Practice Interface in the Measurement of Intellectual Abilities” by John Carroll. If theory and research concerning standardized testing have influenced educational practice, it is worth tracing the relationship for several reasons.
Source: Review of Educational Research, 54(4) (1984): 597–654.
First, it illustrates the variety of ways in which educational research may influence educational practice. Second, it indicates something of future possibilities and limits for research concerning testing to improve teaching and learning. Third, it suggests a point that “suppliers ... of social research ... not much listened to” may too easily forget – namely that not all influence is useful or of the stuff to which we ought aspire. Thus the purpose of this paper is to recount the history of research and reasoning about mental testing and its role in educational practice. The history of this relationship is here divided into three broad parts. The first concerns the roots of mental testing up until approximately World War I (WWI). The second, lasting from WWI until 1950, deals with the refinement of testing techniques and institutionalization of testing in educational practice. The third, lasting from about 1950 through the 1970s, was a time when education emerged prominently on the national agenda, and the role of testing in the agenda became markedly more mixed. Tests were seen variously as indicators of educational problems, as solutions to some of those same problems, and as sources of other troubles. The last part of this paper, after summarizing what the relationship between research concerning testing and educational practice has been in the past, suggests research regarding testing that might constructively influence educational practice in the future. Before setting out on this story, two explanations are in order. First, what is meant by tests or standardized tests? In this article, these terms are used to mean systematic devices for eliciting and recording samplings of individuals’ knowledge, skills, or attitudes, as represented in problems, questions, or tasks posed with such devices. 
This definition includes instruments commonly recognized as aptitude or achievement tests or as norm-referenced or criterion-referenced tests, but excludes, at least for the sake of this discussion, things like observation scales and teacher-made or classroom tests. Second, the term tests of reasoning is also used, for two reasons. First, although the term ability tests might have been adhered to as Wigdor and Garner (1982) did in the report of the Committee on Ability Testing of the National Research Council, I prefer the gerund reasoning for its transitive connotation, as opposed to the static connotation of ability. As Wigdor and Garner themselves comment in cautioning against the phrase “intelligence test,”
the report is careful to emphasize that tests can only measure abilities at the moment of testing. . . . the Committee cautions that “intelligence test” can be a misleading label insofar as it encourages misunderstandings about the kind of measurement involved or false notions about intelligence: that it is a tangible and well-defined entity like a heart or even that it is a unitary ability. (p. 2)
The second reason for adopting the phrase tests of reasoning is that the history of testing recounted here attempts to show how thinking has evolved
regarding the types of reasoning that performance on standardized tests does and does not represent. Nevertheless, it should be noted that the phrase tests of reasoning is used essentially interchangeably with the way others have used the terms ability tests, mental tests, and standardized tests.
The Roots of Standardized Testing (1880 to WWI)
The history of standardized testing often is traced back to the invention by Alfred Binet of a test of “intelligence” or general school ability at the turn of the century. However, I think it useful to trace it back somewhat further, specifically to around 1880, in order to recount something of the roots and the social role of testing in American society. The year 1880 is, of course, a somewhat arbitrary date with which to begin this story but it seems an appropriate starting point for several reasons.
Francis Galton
First, 1883 was the year of publication of Francis Galton’s Inquiries Into Human Faculty and Its Development. As Boring (1950a) remarks, Inquiries was a work that helped launch “the study of individual differences in human capacity, a subject matter and an interest which affected American psychology profoundly. Galton was the originator of mental tests and of the statistical methods for dealing with individual differences” (p. 161). Elsewhere Boring (1950b) suggests that “this famous book has sometimes been regarded as the beginning of scientific psychology and of mental tests” (p. 483). Though Galton’s early writings were on travel and the study of weather, he was much influenced by the publication in 1859 of Charles Darwin’s Origin of Species (David, 1968, p. 49). If Galton was influenced by his cousin Darwin in terms of the substance of his inquiries, he was influenced mightily in terms of techniques by the Belgian astronomer, mathematician, and man of letters, Adolphe Quetelet. As royal astronomer to the King of Belgium, Quetelet developed a wide-ranging interest in the collection of data about all kinds of social and physical phenomena. In the social realm, he promoted the collection of numerous kinds of anthropometric measurements from large samples of people and showed that these measurements conformed to normal curves. So impressed was he with the prevalence of normal curves that he suggested that the average man, l’homme moyen, constituted a natural ideal, from which individuals deviated by amounts that could be quantified as the probable error. In Galton’s first widely influential volume, Hereditary Genius (1869), in which he referred to Quetelet as the “greatest authority on vital and social statistics” (p. 23), he used a variety of social and physical measurements of
individuals to study the tendency of genius to run in families. Paralleling Quetelet’s use of the normal curve to summarize anthropometric measurements, Galton used it to grade intellectual ability. Galton’s fascination with the normal curve continued in Inquiries (1883) and in a third widely influential work on applied psychological measurement, Natural Inheritance (1889). Galton is probably best known for his invention of the term eugenics and for his concern for applying the tools of anthropometry, genetics, psychometry, and statistics for the improvement of the human race – or at least the race of Anglo-Saxons. But his influence on testing was far wider. Boring (1950b) in fact credits him as follows: To measure the capacities of a large number of persons, and thus to sample the population, requires as a practical matter, the development of apparatus and methods by which the measurements of a single individual can be easily and quickly made, such errors as are introduced by the casual procedure are expected to cancel out in the mass results. For this purpose, Galton invented the test and in particular the mental test, an experimental method of measurement which is characterized by its brevity and which contrasts with the elaborate psychophysical procedure of German psychology. . . . In America, behaviorism was able readily to assimilate the mental test because both are primarily concerned with performance without regard to its conscious causes. (p. 484)
As Walker (1929) wrote, “To exaggerate the influence of Francis Galton upon modern statistical method in education in the United States would be very difficult” (p. 45). It was not just that Galton was a key figure in the development of the “mental test” and its application to problems of human betterment, but that he was the inventor or promoter of statistical techniques which continue, a century later, to be the tools of the testing trade. He was an avid user and promoter of the Gaussian Law of Error, that is the normal curve, and was the inventor of the statistical technique of co-relation, though it remained for Galton’s protégé, Karl Pearson, to put statistical “co-relation” into the form that it is most widely known today, as the Pearson product-moment correlation coefficient.
Civil Service Reform
Another reason for delving back into the last century to find the roots of standardized testing is that well before the turn of the century and the invention of the “new” type of tests, testing had already become a tool of both political and educational reform. On the political front, the U.S. Civil Service Commission was founded to reform the spoils system by using objective examinations in the appointment of federal government employees. Following the assassination of President Garfield by a disappointed office seeker, the Congress passed the Civil Service Act of 1883, which set up the
Commission, with the authority to administer competitive examinations for certain federal jobs. Initially only about 10% of federal employees were covered, but the federal system was expanded so that by the turn of the century around 50% of federal jobs were awarded by civil service examinations. By 1900 several states had adopted similar reform measures to help ensure that instead of being based solely on political connections, government jobs would be awarded on the basis of merit as measured by competitive exams (Hale, 1982; Kavruck, 1956).
Joseph Mayer Rice
In the field of education, tests also had won a place in reform efforts. Indeed, Lawrence Cremin begins his book, The Transformation of the School: Progressivism in American Education 1876–1957 (1964), with the story of Joseph Mayer Rice. Rice was a pediatrician turned educational reformer. Walker (1929) comments that the “earliest statistical investigations of educational problems in this country were largely the work of physicians and anthropologists, familiar with the work of Quetelet” (p. 41). As a physician in New York, Rice became interested in the problems of New York City schools. Following publication of a few minor pieces on the topic and 2 years studying pedagogy in Germany, he was commissioned in 1891 by the Forum magazine to prepare an appraisal of American public education:
He was to place “no reliance whatever” on reports by school officials; his goal was to render an objective assessment for the public. . . . Rice left on January 7, 1892. His tour took him to thirty-six cities and he talked with some 1200 teachers; he returned late in June, his notes crammed with statistics, illustrations and judgments. (Cremin, 1964, p. 4)
In his articles in the Forum, Rice attacked the inefficiency and ineffectiveness of the schools – “the political hacks hiring untrained teachers who blindly led their innocent charges in singsong drill, rote repetition and meaningless verbiage” (Cremin, 1964, p. 5). He called for more “progressive education” in which children would be taught in a meaningful way via a unified curriculum. In a follow-up article on “The Futility of the Spelling Grind,” Rice attacked the practice of rote spelling drills. Rice had collected data on over 30,000 school children and found that there was no relation between the time schools spent on spelling drills and children’s performance on objective tests of spelling. Stanley (1966) points out that Rice’s methods of analyzing his data were remarkably sophisticated for his time, and Engelhart and Thomas (1966), reanalyzing Rice’s data almost 70 years after they were gathered, estimated that they showed a correlation between minutes per day devoted to drill in spelling and spelling achievement on a sentence test of −0.12.¹
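For readers unfamiliar with the statistic reported for Rice's data, the following is a minimal sketch of how a Pearson product-moment correlation is computed. The function name `pearson_r` and the two short lists of numbers are hypothetical and purely illustrative; they are not Rice's data and are not meant to reproduce the Engelhart and Thomas estimate.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical school-level figures (NOT Rice's data): minutes of spelling
# drill per day and mean score on a spelling test.
drill_minutes = [10, 15, 20, 30, 40, 50]
spelling_scores = [72, 75, 70, 74, 71, 73]

r = pearson_r(drill_minutes, spelling_scores)
print(f"r = {r:.2f}")
```

A coefficient near zero, such as the one reported above, indicates that schools spending more time on drill did not systematically obtain higher (or lower) spelling scores.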
Cronbach et al. (1982) suggest that Rice’s survey “was probably the very first to use a standard test of ability” (p. 26), though whether it might reasonably be called a test of ability is a matter open to question. But in any case, well before the turn of the century, Rice had brought the tools of objective measurement, as well as those of muck-raking journalism, to the task of school reform. It was not to be the last time that standardized testing, educational reform, and muck-raking journalism became intertwined.
Alfred Binet
Against this background, there came in the first decade of the new century what was the signal invention in the history of testing. In A History of Experimental Psychology (1950a), Boring suggests that if the 1880s were the decade of Galton in this field, and the 1890s that of J. McKeen Cattell (for his development of numerous tests of sensory capacity, discrimination, and perception), the 1900s belonged to Alfred Binet (p. 573). In his research on mental faculties, Binet originally had employed techniques common among experimental psychologists of his time – measures of characteristics ranging from sensory perception to reaction time and skull size. But in 1904, when he was commissioned to develop a technique for identifying school children unlikely to succeed in normal classes and therefore in need of special instruction, he abandoned such methods and instead chose a quite pragmatic solution, which became the now famous 1905 Binet scale. Binet and his collaborators (initially Henri and later Simon) brought together a wide range of short tasks encompassing a number of quite diverse reasoning skills, such as ordering, naming, and comparing. Many of the specific tasks in the Binet collection were drawn from the work of earlier experimentalists, but Binet’s invention was to ignore the specifics of the reasoning processes involved in particular tasks and to select problems, largely regardless of their content, on which children’s performance showed a correspondence with teachers’ identification of children as average or as feeble-minded. The limitations of the 1905 scale were many: For instance, it was “standardized” on a quite small sample of children selected by teachers. 
Still, the invention of the 1905 scale was “a turning point in the history of mental measurement”: In this scale for the first time, tests of diverse content were combined to strike an average level of performance rather than to measure separately the dimensions of conventional psychological analysis – faculties, sensory thresholds or whatever. Moreover, the inclusion or exclusion of specific test content was an empirical matter, reflecting Binet’s many years of prior experimentation to discover which tests were corroborated by criterion judgments. Lastly, the tests were not precise laboratory determinations but brief simple and eminently practical tools for examining children in a schoolroom. . . .
The publication of the 1905 scale attracted considerable attention from those working on the problems of mental deficiency or of educational classification. Goddard translated it into English for use at the Vineland Training School in New Jersey. Decroly and Degand tried it out in Belgium, and other workers, including Terman were in communication with Binet and suggested tests of their own. (Tuddenham, 1962, pp. 484–485)
Binet and Simon went on to contribute further innovations in the Binet-Simon scales of 1908 and 1911. The 1908 scale was expanded to focus more on the normal than on the subnormal populations. Items of more diverse content were introduced and again ordered by difficulty. However, the most significant innovation was that the performance of a child was expressed as an age level: the age at which a child of average ability could successfully perform a task. This is of course the idea of mental age, as opposed to chronological age. (Though Wolf, 1973, pp. 201–203, points out that Binet himself only used the term “mental level,” and avoided the phrase “mental age.”) The 1911 scale was essentially an extension of the 1908 version with an expanded age range and with provision for reporting mental level in terms of fractional years. Though Binet often is called the father of IQ testing, it was not he but German William Stern who introduced the idea of “intelligence quotient” as the ratio of mental age to chronological age. The IQ had the happy property of allowing the comparison of “intelligence” of persons of different ages (Boring, 1950b). Binet died in 1911, well before his invention found widespread application in the United States and elsewhere. This has left biographers to bemoan the “unfortunate fate of Alfred Binet” (as Sarason, 1976, put it). Not only did Binet, whom Boring (1950b) calls “France’s greatest psychologist of that generation” (p. 573), die relatively young (before his 55th birthday), but his invention came to be applied in the United States and elsewhere in ways that seemed antithetical to Binet’s own ideas regarding his invention. As Tuddenham (1962), Wolf (1961, 1973), and Gould (1981) have suggested, Binet did not view intelligence as measured with his instrument as a unitary entity. He viewed the instrument as a practical device to help solve a practical problem, the identification of children in need of special instruction. 
In his instrument Binet had sought to separate natural intelligence from the effects of instruction, and thus he suggested that his instrument might help “to free a beautiful native intelligence from the trammels of the school” (Binet, 1908, as quoted in Gould, 1981, p. 151). But Binet clearly held that individuals could be instructed in the mental skills that his instrument tapped. Specifically, he maintained that children could be engaged in a sort of “mental orthopedics” that would teach them how to learn. Theodore Simon, interviewed by Wolf (1973) in Paris 50 years after the death of Binet, and having dedicated much of his life to preserving Binet’s memory, maintained that the success of the Binet scale had blinded the world to the “true greatness” of Binet. Among other things, Simon suggested that Binet would have objected “to the use of the IQ” (Wolf, 1973, p. 245).
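Stern's quotient, described above as the ratio of mental age to chronological age, can be written compactly. The symbols MA and CA are notation introduced here for illustration, not taken from the source; the multiplication by 100 is the later scoring convention, commonly associated with Terman's Stanford-Binet, and is shown only for comparison.

```latex
\[
\mathrm{IQ}_{\text{ratio}} = \frac{\mathrm{MA}}{\mathrm{CA}},
\qquad
\mathrm{IQ}_{\text{scaled}} = 100 \times \frac{\mathrm{MA}}{\mathrm{CA}}
\]
```

On this definition a child of 8 with a mental age of 10 has a ratio of 1.25 (125 on the scaled convention), and a child of 12 with a mental age of 15 has the same value, which is the sense in which the quotient allows comparison across ages.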
Testing in the United States at the Start of World War I
Whatever the historical fate of Alfred Binet, it is clear that his invention had a dramatic impact on testing in the United States. As previously noted, Henry Goddard translated the Binet scale into English for use in the Vineland Training School in New Jersey, but Goddard was not alone in such endeavors:
Whipple and Huey both published translations of Binet’s test in 1910. Kuhlmann and Long issued their versions in 1911. Nevertheless, the first thoroughly revised and standardized test on the Binet pattern was the well-known Stanford-Binet of Lewis M. Terman. This test, published in 1916, was a major landmark in the history of psychometrics, and earned enduring fame for its author. (Tuddenham, 1962, p. 492)
Another translator of Binet was Robert Yerkes, who was to have a crucial role in the development of testing in the United States in heading the testing effort in World War I. This pivotal episode, that is, the WWI mental testing, will be described shortly, but first it is worth summing up where things stood at the outbreak of the war. First, on the scientific side of things, much work had been invested in the development and refinement of measures of mental faculties, culminating in the development of the Binet-type intelligence scale. Second, on the educational front, even before the invention of the Binet scale, testing had become a weapon in the armamentarium of educational reformers and muck-raking journalists – though mental testing was soon to become a target, as well as an instrument, of muckraking journalists. Third, also before the development of the new Binet-type tests, testing had been implicated, most obviously in the work of Galton, in matters of politics and human betterment. Fourth, even before WWI, the use of tests, and particularly the new Binet-type tests, had substantially increased in the schools. Chapman (1980), for example, reports on a poll of school systems in 103 cities in 1914 indicating that “eighty-three (81 percent of those responding) were using psychological tests to identify feeble-minded and backward children, and seventy-one cities were using Binet tests specifically” (Chapman, 1980, p. 41). Fifth, the statistical techniques that would come to dominate subsequent work on testing had been invented: the application of the normal curve by Quetelet and Galton, the technique of correlation by Galton and Pearson, the practice of empirical item selection by Binet, and the technique of factor analysis by Spearman (the latter to be described later in this paper). Finally, in the world of business, testing had been established as having some commercial value. 
Surely one of the first “commercial” applications of mental testing was the Anthropometric Laboratory opened by Galton in London in 1884, in which individuals could have themselves subjected to a variety of anthropometric and psychometric measurements for threepence (Boring, 1950b, pp. 486–487). But even before WWI, commercial enterprise in testing had been established on a considerably wider basis. In 1914 the
World Book Company (which merged with Harcourt Brace in 1962) began publishing the Courtis Standard Research Test in Arithmetic (and arranged to publish the Otis Group Intelligence Scale in 1918). And in 1916 Houghton Mifflin entered the testing field with the publication of the Stanford-Binet intelligence scale (Holmen & Doctor, 1972, pp. 34–35). Quite apart from these strictly historical developments, three more general points are worth noting about this early history of testing. One is that early work in psychological measurement (focusing on specific characteristics, such as reaction speed and memory) had been found to have little power to predict general school achievement (see Tuddenham, 1962, pp. 478–481). Another is that the development of the instrument that was subsequently shown to have more power to predict school achievement, that is, the Binet-type scale, was motivated more by an immediate practical problem than by theoretical issues even though it has been noted as a turning point in the history of testing ever since. The third is that measurement of some refinement led to rather crude social classifications (e.g., classifying children as either average or feebleminded). This last is certainly worth noting in the history of testing: No matter if we know a man’s weight to the milligram, society likely will still call him “fat” or “skinny.”
Going to School and College (WWI to 1950)
The testing movement in the United States clearly had considerable momentum before WWI. But the War, and the experience of psychologists in applying their new science in the war effort, helped to boost testing to a prominence that surely even the most ardent promoters could scarcely have imagined. Indeed, Cremin (1964) has suggested that the feverish activity in testing in the early part of this century “would have remained very much a professional phenomenon had it not been for the historical intervention of World War I” (p. 187).
The World War I Testing
Just after the United States declared war on Germany, April 6, 1917, a meeting was convened by Robert Yerkes, president of the American Psychological Association, to discuss how psychologists and psychological science might contribute to the war effort.² Within a month, a tentative plan for the psychological examining of recruits had been submitted to the Surgeon General of the Army (Yerkes, 1921, p. 91). Since the story of the WWI testing has been told in detail elsewhere (Camfield, 1969; Gould, 1981; Jonich, 1968), here we need merely sketch the broad outlines of the endeavor and its consequences for educational testing.
The proposal for the testing of recruits led to the creation of the now famous Army Alpha test for literates and Army Beta test for illiterates. In less than 2 years the group-administered Alpha test was given to more than 1.7 million recruits. Yerkes’ massive account, Psychological Examining in the United States Army (1921), summarizes the results:
Between April 28, 1918, and January 31, 1919, 7,800 men (0.5 per cent) were reported with recommendations for discharge by psychological examiners because of mental inferiority. The recommendations for assignment to labor battalions because of low grade intelligence number 10,014 (0.6+ per cent). For assignment to development battalions, in order that they might be more carefully observed and given preliminary training to discover, if possible, ways of using them in the Army, 9,487 men (0.6+ per cent) were recommended. During the same interval there were reported 4,780 men with mental age below 7 years, 7,875 between 7 and 8 years; 14,814, between 8 and 9 years; 18,878, between 9 and 10 years. This gives a total of 46,347 men under 10 years’ mental age. It is extremely improbable that many of these individuals were worth what it cost the Government to maintain, equip, and train them for military service. (pp. 99–100)
Testing on the Home Front

Exactly how much the psychological examinations were used in military assignment is unclear. But it is clear that the WWI testing contributed to testing in civilian life in several ways. First, it resulted in an immense amount of publicity. Figure 1 shows the average annual number of entries given in the Readers’ Guide to Periodical Literature, over the last 70 years, under three rubrics: educational measurements (or, since 1957, educational tests and measurements), intelligence testing, and intelligence quotient. The last two terms did not appear as separate headings until 1919 and 1937, respectively. The emergence of “intelligence testing” as a separate heading in the Readers’ Guide seems to have been a quite direct consequence of the Army testing, with volume 5 of the Readers’ Guide (covering 1919–1921) listing articles such as:

Applying the Army trade tests in vocational schools. Industrial Arts Magazine, October 1919.
Army intelligence test as a means of prognosis in high school. School and Society, April 16, 1921.
Extension of selective tests to industry. Annals of the American Academy, January 1919.
Intelligence examinations and admission to college. Education Review, February 1921.
Some problems of Americanization as seen by an Army psychologist. School and Society, January 1921.
Figure 1: Average annual number of entries in Readers’ Guide to Periodical Literature under educational (tests and) measurements, intelligence testing, and intelligence quotient, 1912–1982. (See appendix 1.) [Line graph not reproduced: vertical axis, number of entries under rubric (0 to 60); horizontal axis, year (1912 to 1982); series: Intelligence Testing, Educational (Tests and) Measurement, Intelligence Quotient.]
Boring (1950b) recounts that “the advertising that this [Army] testing gave psychology in America reached into the remotest laboratory and swelled college classes, creating a great demand for PhD instructors” (p. 575), and Camfield (1969) reports that after the Armistice, the “Army received hundreds of inquiries about the Army’s intelligence examining and other psychological programs” (p. 279). Second, the army testing clearly swelled the ranks of people directly involved in psychological testing. In 1917, when the membership of the American Psychological Association stood at only 350, more than 500 men were involved in the WWI testing (Camfield, 1969, p. 206). Many of them were to become leaders in the field of testing for much of the first half of the century: Robert Yerkes, Edward Lee Thorndike, Walter Dill Scott, Henry Goddard, Lewis Terman, Walter Bingham, Edwin Boring, and Guy Whipple, to name but a few. As Cronbach (1975) put it, “delighted with this achievement [the Army testing], psychologists then pressed for civilian testing” (p. 1). Third, the data derived from the WWI testing were “for many years the prime source for studies of occupational, ethnic, racial and geographic differences in ability in the United States” (Tuddenham, 1962, p. 495). Yerkes (1921), for instance, devoted whole chapters to “Intelligence Ratings by State” (chap. 5, p. 681) and “Intelligence of the Negro” (chap. 8, p. 705).
Carl Brigham, then a young professor at Princeton University, used the Army data in A Study of American Intelligence (1923) to mount a case for a restrictive immigration policy: “Our study of the army tests of foreign born individuals has pointed at every step to the conclusion that the average intelligence of our immigrants is declining” (p. 197). On the basis of his “objective” and “scientific” analysis of the Army data, Brigham argued for a policy to restrict the immigration of “inferior” races and to prevent the “continued propagation of defective strains in the present population” (p. 210). Yerkes, then professor at Harvard and chairman of the National Research Council’s Information Service, in his foreword to the volume, praised Brigham for adducing “new evidences of the trustworthiness and scientific value of the statistical methods used by military psychologists” (p. vii). In his account of the civil rights movement in the United States leading up to the landmark Supreme Court decision in Brown v. Board of Education of Topeka in 1954, Kluger (1977) points out that as early as 1913, adaptations of the Binet-Simon scale had been used to demonstrate the supposed mental inferiority of Negroes in the United States; but surely the accounts of the WWI testing by prominent scientists such as Yerkes and Brigham, associated with prestigious institutions such as Harvard, Princeton, and the National Research Council, contributed significantly to the atmosphere of xenophobia and racism of the 1920s. It is a highly sobering episode for anyone interested in the application of social science research in practical affairs – for the Immigration Restriction Act was passed in 1924, greatly restricting immigration from nations which, according to the Army data, were the source of inferior mental stock. 
Though interpretations of the Army data contributed only marginally to the passage of the 1924 Act, in retrospect it seems clear that the Army data led to no such conclusions as Brigham and others derived from them regarding native intelligence. As Stephen Jay Gould has shown in The Mismeasure of Man (1981) – a history of efforts to rank and classify classes of people via “scientific” measurement – a much more plausible and economical interpretation of the Army data was that race and social differences in the Army test scores derived simply from differences in language and culture. Gould comments:

The army mental tests could have provided an impetus for social reform, since they documented that environmental disadvantages were robbing from millions of people an opportunity to develop their intellectual skills. Again and again the data pointed to strong correlations between test scores and environment. Again and again those who wrote and administered the tests invented tortuous, ad hoc explanations to preserve their hereditarian biases. How powerful the hereditarian biases of Terman, Goddard, and Yerkes [and he might here have added Brigham] must have been to make them so blind to immediate circumstances. (pp. 222–223)
Controversy over the Interpretation of WWI Test Results

Fortunately, even at the time not all of the conclusions drawn from the Army testing in WWI were taken for granted. Popular accounts of the Army testing in fact triggered “one of the major social controversies of the 1920s” (Cremin, 1964, p. 188). It was an occasion in which testing became not just a tool, but for the first time a target, of muck-raking journalism (Block & Dworkin, 1976). The most famous exchange of opinion was between Walter Lippman and Lewis Terman. In an article in October 1922 in the New Republic entitled “The Mental Age of Americans,” Lippman (1922a) lambasted a Mr. Lothrop Stoddard and his conclusion, drawn from intelligence test data, that the average mental age of Americans is only about 14. Over the next few weeks, Lippman published five more articles in the New Republic, all generally critical of mental testing and the conclusions being drawn from it. In the last article, “A Future for Tests” (1922b), Lippman argued that psychologists went awry in trying to construct general abstract tests of intelligence in “the vain effort to discount training and knowledge” (1922a, pp. 10–11). Nevertheless, he suggested that tests of specific skills and abilities “may ultimately make a serious contribution to a civilization which is constantly searching for more successful ways of classifying people for specialized jobs” and would save psychologists “from the reproach of having opened up a new chance for quackery in a field where quacks breed like rabbits” (p. 119). Faced with such a challenge to their newly applied science, proponents of the new mental tests responded in the same forum. In an article published in the New Republic in December 1922, Lewis Terman made a curious rejoinder, full of sarcasm and disdain for a layman’s foray into the field of mental testing.
Terman suggested that if Lippman was so sure of the effects of experience on mental test scores, he should himself venture into “this enchanting field of research” to investigate the “IQ effects of different kinds of baby talk, different versions of Mother Goose, and different makes of pacifiers and safety pins” (p. 119). Cronbach (1975) observes that in responding to Lippman’s “muck-raking tone and irreverent pinpricks,” Terman “tried to play the same game and was hopelessly overmatched” (p. 12). It was not to be the last time that muck-raking journalists dug in the field of educational testing. Though the Lippman-Terman exchange was surely the most famous exchange of opinion on the matter of mental testing in the 1920s, as early as 1921 there were challenges within the profession of psychological testing to the hereditarian interpretations of mental test scores. In the 1920s, Beardsley Ruml, William Bagley, and other social scientists seriously challenged the notions that intelligence could be measured as a unitary entity and that it was largely heritable and little affected by environment (Chapman, 1980, pp. 144–145; Kluger, 1977, pp. 308–312).
Testing Goes to School

But the disputes, both within and without the profession, seemed to do little to slow the growth and proliferation of mental testing in the schools in the 1920s. On the very day in 1919 that the Army’s Psychological Division was dismantled, Yerkes and Terman wrote to the Rockefeller Foundation to request a grant for “developing and standardizing an intelligence scale for the group examination of school children – a scale for the measurement of native ability” (quoted in Chapman, 1980, p. 90). This led to the development of the National Intelligence Tests, published by the World Book Company (for which Arthur Otis became director of testing), authored by Terman, Thorndike, Yerkes, and others, and publicized as the “direct result of the application of the Army testing methods to school needs” (Gould, 1981, p. 179). Numerous other mental tests were also authored in the years following the War, and Terman became a leader in promoting the application of mental tests for classifying school children into homogeneous instructional groups and for educational and vocational guidance (Chapman, 1980). Such prescriptions fell on fertile ground. Between 1910 and 1920, enrollment in elementary and secondary schools overall had jumped 20% (from 19.3 to 23.3 million), and enrollment in secondary schools had increased a phenomenal 140% (from 0.9 to 2.2 million; U.S. Bureau of the Census, 1975, p. 368). And not only were the schools facing more students, but they were being asked to serve increasingly different kinds of students, as a result of unprecedented immigration into the United States around the turn of the century. Into this situation, Leonard Ayres had thrown what Raymond Callahan (in Education and the Cult of Efficiency from 1910 to 1930, 1962, p. 15) calls an “incendiary bomb,” namely his book Laggards in Our Schools (1909). In this book, Ayres reported data showing that large numbers of children were overage for their grade placement in school.
And Ayres held the schools responsible for this inefficiency. He charged that school programs were “fitted not to the slow child or to the average child but to the unusually bright one” (Ayres, 1909, p. 5, quoted in Callahan, 1962, p. 15). Ayres’ and other criticisms led to the movement to promote school efficiency in years immediately preceding the War. Starch and Elliot (1912, 1913) reported analyses showing that subjective marks of essay exams were highly variable. The same papers received widely varying marks from different teachers, and even when the same teachers did the regrading, marks still varied widely. Such findings provided impetus in the school efficiency movement for more objective and reliable measurement. Among the leaders pressing for improved measurement as a means of making schools more efficient were Edward Thorndike, George Strayer, Elwood Cubberly, and Paul Hanus (Callahan, 1962; Jonich, 1968; Radwin, 1981). Given this background, it seems hardly surprising that schoolmen should react favorably to proposals for using the new scientific tests of intelligence
to effect homogeneous grouping of students into instructional tracks. A 1925 survey of school superintendents in 215 cities over 10,000 in population indicated that “classification of pupils into homogeneous groups” was the number one use of intelligence tests. A 1926 survey of the same population of superintendents indicated that homogeneous ability grouping was employed in 90% of the elementary schools of cities with populations greater than 100,000 (Chapman, 1980, pp. 172–180). If the decade of the 1920s was one of institutionalization of intelligence testing in the schools, the 1930s seem to have been one of proliferation for educational testing in general. Figure 1 indicates that in the 1930s attention to intelligence testing seemed to wane but that interest in educational testing more generally, at least as reflected in the popular literature, was on the increase. The periodical literature indicates a proliferation of tests of different sorts, on subjects ranging from art to woodworking. Bibliographies on testing also indicate such a proliferation. It was in the 1930s that Oscar Buros began his long career as the preeminent bibliographer of testing. His first compilation, entitled Educational, Psychological and Personality Tests of 1933 and 1934, was a mere 44 pages. But by 1938 and publication of what later became known as the first Mental Measurements Yearbook, his compilation had grown to more than 400 pages, listing around 4,000 tests available for use (Buros, 1938). The popular literature of the 1930s was far freer of criticism of testing than that of the 1920s – perhaps in part because some of the more extravagant claims of the early mental testers had meanwhile been modified considerably. Gould (1981) in fact cites “recantations” of earlier deterministic interpretations of heritable intelligence by no less than Goddard (by 1928, p. 172), Brigham (who, according to Gould, p. 
232, apologized in 1930 for claims made in his 1923 book A Study of American Intelligence “with an abjectness rarely encountered in scientific literature”), and Terman (whose book on the use and interpretation of the 1937 revision of the Stanford-Binet clearly left as an open question the relative influence of heredity and environment on IQ scores, pp. 190–191). What had happened? For one thing, by the late 1920s and early 1930s, the movement to introduce standardized tests into the schools as instruments of scientific educational management clearly had succeeded. By 1930, yearly sales of Terman’s group intelligence test by the World Book Company were over .75 million and those of the Stanford Achievement Tests had reached 1.5 million (Chapman, 1980, pp. 111–112). For another, some early proponents of genetic interpretations of mental test scores had been chastened by their foray into public debate. Shortly after the Lippman-Terman debate in the pages of the New Republic, Terman confided to a colleague that he hoped he would “never be led to take part in such a controversy again” – though the head of World Book had advised him that the “attacks on tests may help the business rather than damage it” (Chapman, 1980, pp. 155, 154).
Surely too, the Great Depression of the 1930s altered the thinking of hereditarians. If nothing else, the Depression made it clear that regardless of people’s native endowments, severe environmental conditions could have a tremendous impact on their lot in life. Pastore’s (1949) volume examining the relationship between political views and positions taken on the nature-nurture controversy among 20th-century scientists suggests, for example, that Terman’s thinking was influenced by the economic experience of the 1930s. And Gould notes how times had changed between the 1920s and the 1930s, as “intellectual fashions of jingoism and eugenics [were] swamped in the morass of the Great Depression” (p. 191).
Research and Theory on Testing

If social conditions of the 1930s influenced reasoning about testing during this period, so too did new research and theory on testing. In 1910 Charles Spearman introduced into the English language the term reliability coefficient, defined as the “coefficient between one half and the other half of several measures of the same thing” (p. 281). This notion is, of course, the one by which the accuracy of tests has been assessed for most of this century. To assess the reliability of a test one needs only to calculate the coefficient of correlation between two parallel tests, or the same test given on two occasions, or two sets of items on the same test given just once, or, in the language of the 1954 Technical Recommendations for Psychological Tests and Diagnostic Techniques (American Psychological Association [APA], American Educational Research Association [AERA], & National Council on Measurements Used in Education [NCMUE], 1954, p. 8), the reliability coefficients of equivalence, stability, and internal consistency. As Stanley (1971) commented, “Most of the intervening years have been spent explicating and at times obfuscating the meaning of the key phrases ‘one half,’ ‘other half,’ and ‘same thing’” (p. 370). Stanley went on to make a fascinating suggestion about how the technology of testing might have developed differently:

By 1913, most of the basic test theory, except that unique to factor analysis, had been set forth by Spearman (1913). Numerous details and procedures remained to be worked out, and many applications were yet to be made; but a dozen years before [Ronald] Fisher’s (1925) Statistical Methods for Research Workers introduced the framework of modern experimental design, educational and psychological measurement was headed firmly down the correlational road so well constructed by Francis Galton, Karl Pearson and Charles Spearman. Several Americans, such as Truman L. Kelley (1927) and Edward E. Cureton (1931), gave it further impetus. Kelley’s (1924) major work Statistical Method appeared just a year before Fisher’s and therefore did not reflect the statistical revolution being fomented in England. The history of test
theory after World War I might have been substantially different had Fisher’s book preceded Kelley’s by several years. (p. 371)
Stanley did not expand on what he meant, but if analysis of variance (ANOVA) methods rather than correlational ones had come to predominate in work on testing, two outcomes would have been likely. First, considerably more attention probably would have been directed at the conditions (or in the jargon of experimental design, the “treatments”) associated with variations in test performance. Second, more heed might have been paid to the nature of what was actually being measured. The results of correlational analysis – correlation coefficients and variance explained and unexplained – are metric free. But at least some of the results of analysis of variance – such as effects estimates – are interpreted in terms of the metric scale of the original measurements. But since Fisherian ANOVA techniques had very little influence on testing methodology in the United States before mid-century – and have remarkably little even now, despite the elaboration by Cronbach and others (Cronbach, Gleser, & Rajaratnam, 1972; Cronbach, Rajaratnam, & Gleser, 1963) of generalizability theory as a framework for encompassing both ANOVA and correlational perspectives – testers in the first half of this century had to turn to other methods when it came to making sense of more than two variables at a time.
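Two technical points in this section lend themselves to a small illustration: the reliability coefficient as a correlation between two halves of one test, and the contrast between metric-free correlations and effect estimates expressed in the original score metric. The following is a minimal sketch, not a reconstruction of any historical procedure. All data are invented, and the function names (`pearson_r`, `split_half_reliability`, `mean_difference`, `rescale`) are hypothetical helpers introduced here; the odd/even split of items is likewise an assumption.

```python
from math import isclose, sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def split_half_reliability(item_scores):
    """Correlate each examinee's total on one half of the items with the other.

    item_scores holds one list of 0/1 item scores per examinee. The halves
    are the alternating items (an assumption; other splits are possible).
    """
    first_half = [sum(row[0::2]) for row in item_scores]
    second_half = [sum(row[1::2]) for row in item_scores]
    return pearson_r(first_half, second_half)


def mean_difference(group_a, group_b):
    """A simple effect estimate: the difference between two group means."""
    return sum(group_a) / len(group_a) - sum(group_b) / len(group_b)


def rescale(scores, factor=10.0, shift=200.0):
    """Express the same scores on a different (hypothetical) reporting scale."""
    return [factor * s + shift for s in scores]


# Hypothetical item responses: 6 examinees, 8 items each.
responses = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
reliability = split_half_reliability(responses)

# Hypothetical test scores under two conditions, plus a second measure
# taken on the first group.
treated = [12.0, 15.0, 14.0, 18.0]
control = [10.0, 11.0, 13.0, 12.0]
other_measure = [1.0, 3.0, 2.0, 5.0]

r_raw = pearson_r(other_measure, treated)
r_rescaled = pearson_r(other_measure, rescale(treated))
effect_raw = mean_difference(treated, control)
effect_rescaled = mean_difference(rescale(treated), rescale(control))

print(f"split-half reliability: {reliability:.2f}")
print(f"correlation, raw vs rescaled: {r_raw:.3f} vs {r_rescaled:.3f}")
print(f"mean difference, raw vs rescaled: {effect_raw:.2f} vs {effect_rescaled:.2f}")
```

The correlation is unchanged when the scores are moved to a new scale, whereas the mean difference is multiplied by the scale factor, which is the sense in which correlational results are metric free and effect estimates are not.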
Factor Analysis

The method was factor analysis, and Boring (1950b) suggests that the 1930s were the decade of this method in American educational psychology. Factor analysis is in essence a statistical method for interpreting the intercorrelations among a set of variables, such as scores from three or more tests given to the same population. The inventor of factor analysis was Charles Spearman, who in 1904 published an article in which he set out a method for interpreting the underlying meaning of a set of four intercorrelated variables – the method of tetrad differences. In the same article, he applied his technique to derive his two-factor theory of intelligence, which held that intelligence, as measured on a variety of tests, could be divided into two components: a general factor (Spearman’s g) and a specific factor (Spearman’s s, the components unique to specific tests). Spearman’s theory surely helped to bolster the interpretation, common in the early part of the century, of intelligence tests as measuring an underlying general intelligence; and his technique of factor analysis is called by Gould (1981), who is in other ways highly critical of Spearman’s work, “still the most important technique in modern multivariate statistics” (p. 257). There is not space here to trace the development of factor analysis in the analysis and interpretation of test scores throughout the whole of the first
half-century (see Gould, 1981; Tuddenham, 1962, for historical accounts). But it is worth noting that Gould’s (1981) account is especially valuable because factor analysis often is not treated seriously in popular accounts of mental testing. Jensen (1981), for instance, disclaims that “factor analysis is too complicated to explain here” (p. 53), although he then wanders off into an analogy likening the use of factor analysis to identify the common ingredient measured by mental tests to the job of “a group of scientific minded Martians [who] invade a large well-stocked liquor store and tried to discover the nature of the multitudinous variety of liquids they found in all the different bottles” (p. 56). The analogy is amusing, but of course presupposes that there is a common ingredient, ethanol in booze and Spearman’s g in mental test scores, waiting to be discovered. Yet as Gould explains, any matrix of positive correlations can be resolved into a single major factor, analogous to g, and a set of subsidiary factors, analogous to Spearman’s s factors, or “into a set of ‘simple structure’ factors that usually lack a single dominant direction. Since either solution resolves the same amount of information, they are equivalent in mathematical terms” (p. 269). Historically, factor analysis seemed to become a fairly popular approach to the analysis of mental test scores in the 1920s and 1930s. As early as 1920 Godfrey Thomson in Britain had challenged Spearman’s analysis and instead advocated a multiple-factor interpretation of mental test scores. And in the United States, L. L. Thurstone surely was the most prominent challenger of Spearman’s interpretation of factor analyses of mental test scores (E. L. Thorndike was also a Spearman rival, but less involved in issues of factor analysis; Jonich, 1968).
As early as the 1920s, Thurstone had taken exception to some of the practices common in mental testing, in particular the use of mental age as an appropriate basis for interpreting test performance. And in the 1930s, Thurstone turned his attention more to the use of factor analysis as a means of identifying underlying mental faculties. He contributed substantially to the technical development of factor analysis (in particular inventing the technique of factor rotation by which small sets of independent factors will explain maximum variance). But he is more widely known for his theory of mental abilities, to which his innovations in factor analysis contributed support: namely, the theory of primary multiple abilities. Instead of isolating a single unitary factor g as underlying mental test score performance, Thurstone and his wife, T. G. Thurstone (who collaborated on several publications), used rotated factor analysis to locate as many as 12 and as few as 6 primary mental abilities. Any one test might not tap all of the abilities, but that might be simply because it did not contain items covering some of the primary abilities. The most prominent primary mental abilities (or factors) found in the Thurstones’ studies were those labeled as verbal comprehension, number (that is, computational skills), spatial visualization, associative memory, word fluency, reasoning, and perceptual speed
(though in citing these “primary mental abilities,” it should be noted that the labeling of factors identified through factor analysis, whether rotated or not, is a highly subjective process). In an article published in the Harvard Educational Review in 1939 entitled “The Principal Compulsions of Factor Analysts,” E. E. Cureton aptly summed up, in tongue-in-cheek fashion, where things seemed to stand with factor analysis at the time: Factor theory may be defined as mathematical rationalization. A factor analyst is an individual with a particular obsession regarding the nature of mental ability or personality. By the application of higher mathematics to wishful thinking, he always proves that his fixed idea or compulsion was right or necessary. In the process he usually proves that all other factor analysts are dangerously insane, and that the salvation for them is to undergo his own brand of factor analysis in order that the true essence of their several maladies may be discovered. Since they never submit themselves to this indignity, he classes them all as hopeless cases and searches about for some branch of mathematics which none of them is likely to have studied in order to prove that their incurability is not only necessary but also sufficient. (p. 287)
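Two of the technical claims in this section can be checked numerically. First, if four tests share only a single common factor, every tetrad difference among their intercorrelations vanishes, which is the idea behind Spearman’s method of tetrad differences. Second, rotating a set of factor loadings leaves the reproduced correlations untouched, which is why a dominant-general-factor solution and a rotated solution are mathematically equivalent. The sketch below is illustrative only: the loadings are invented, the helper names are hypothetical, and the 35-degree rotation is an arbitrary choice rather than any criterion Thurstone used.

```python
from math import cos, isclose, radians, sin


def one_factor_correlations(loadings):
    """Correlation matrix implied when every test loads on one common factor."""
    n = len(loadings)
    return [[1.0 if i == j else loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]


def tetrad_differences(r, a, b, c, d):
    """The three tetrad differences among variables a, b, c, d."""
    return (
        r[a][b] * r[c][d] - r[a][c] * r[b][d],
        r[a][b] * r[c][d] - r[a][d] * r[b][c],
        r[a][c] * r[b][d] - r[a][d] * r[b][c],
    )


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def rotation(degrees):
    """2 x 2 orthogonal rotation matrix."""
    t = radians(degrees)
    return [[cos(t), -sin(t)], [sin(t), cos(t)]]


def reproduced(loadings):
    """Common-factor part of the correlations: loadings times their transpose."""
    return matmul(loadings, transpose(loadings))


# 1. Tetrad differences under a single common factor (hypothetical loadings).
r_one = one_factor_correlations([0.9, 0.8, 0.7, 0.6])
tetrads_one = tetrad_differences(r_one, 0, 1, 2, 3)

# Raise one correlation, as would happen if tests 0 and 1 also shared
# a second (group) factor; the tetrads no longer all vanish.
r_two = [row[:] for row in r_one]
r_two[0][1] = r_two[1][0] = r_one[0][1] + 0.15
tetrads_two = tetrad_differences(r_two, 0, 1, 2, 3)

# 2. Rotation invariance: a general-plus-bipolar solution and its rotation.
unrotated = [[0.8, 0.3], [0.7, 0.4], [0.7, -0.4], [0.6, -0.5]]
rotated = matmul(unrotated, rotation(35))

same_fit = all(
    isclose(x, y, abs_tol=1e-12)
    for row_u, row_r in zip(reproduced(unrotated), reproduced(rotated))
    for x, y in zip(row_u, row_r)
)

print("tetrads, one factor:", tetrads_one)
print("tetrads, with group factor:", tetrads_two)
print("rotated loadings reproduce the same correlations:", same_fit)
```

Because both sets of loadings reproduce the same correlations, the data alone cannot choose between them; the choice, like the labeling of the factors, rests on the analyst’s judgment.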
Testing in Survey Research

Apart from the developments in factor analysis, two other developments in testing in the 1920–1950 era should be noted. One was the use of standardized tests in large-scale school surveys, and the other was the increasing use of the new-type objective tests for the purpose of college admission. In his history of educational testing, Daniel Resnick (1982) cites three large-scale surveys between the two world wars as contributing to the use of testing “by the emerging corps of professional guidance counselors.” These were the following:

1. The Pennsylvania Study, which Resnick terms “the most important guidance study ... in the interwar period,” was carried out from 1928 to 1932 and tested nearly all high school seniors in the state. The study was headed by William Learned and Ben Wood, and revealed that “a large portion of the intellectually talented were not going to college.” The study also indicated, in a finding seemingly prescient of similar ones in the 1950s and 1980s, that students who would go on to teachers’ colleges scored more poorly in areas such as science than many of the high school students they would have to teach. One major recommendation of the Pennsylvania study was that high schools and colleges should keep cumulative records, including test scores, on students for use in advising them on their future educational careers.
2. The Eight Year Study, begun in 1933, was the most extensive of the surveys between the wars. Headed by Ralph Tyler, this study was an effort by progressive educators to see whether high school curricula might be reformed, without strict adherence to college entrance requirements, and still not hurt the chances of high school graduates for success in college. In terms of the goals originally motivating the study, the experiment was a success, but Cremin (1964, pp. 250–258) argued that the Eight Year Study helped to hasten the end of the Progressive movement (because it drained attention and funding away from broader efforts), and that its results were never suitably recognized because they were released during World War II.

3. The third study, sponsored by the Regents of the State of New York, “investigated the knowledge, work orientation and level of satisfaction of all students who left secondary school in New York State before, at or after graduation in 1936–37” (Resnick, 1982, p. 186). The study noted that there was too much teaching to the Regents’ high school graduation tests for academic track students and recommended better record-keeping and educational guidance for high school students generally. (p. 185)
Testing and College-Going

Obviously, with the phenomenal growth in secondary school attendance during the first part of the century, much attention was focused on this level; and it was with respect to the transition from high school to college that another major development in testing occurred in the 1920–1950 era, namely, the development of standardized college admissions tests. At the turn of the century, colleges’ standards for admissions were so diverse as to be anarchic. Different colleges required candidates to have studied different subjects, and each college had its own means of testing the preparedness of its applicants. These conditions gave rise to the founding of the College Entrance Examination Board in 1900. Into the 1920s, the exams administered by the College Board were strictly essay examinations in the classics and in specific subjects. But of course by the 1920s, the testing scene was changing. In 1925, the Board appointed a committee of experts to advise it on the suitability of developing the new type of multiple choice tests for use in college admissions. Among the members of the advisory panel were Carl Brigham and Robert Yerkes. The Board accepted the recommendation of the committee that the new tests be tried out, and Carl Brigham was named to head the effort. Within a short time, the Brigham Committee produced a manual on what they called the “Scholastic Aptitude Test,” explicitly distinguishing it from tests of achievement in school subjects, but disclaiming any intention to measure “general intelligence” or “general mental alertness” (Angoff & Dyer, 1971, p. 2).
The first Scholastic Aptitude Test (SAT), for the most part a multiple choice exam, was administered to 8,000 candidates in June 1926. “In the beginning the test offered only a single score, lumping together both mathematical and verbal material into one sum, but in 1930 the presence of two factors was detected and thereafter the mathematical material was scored separately” (Donlon & Angoff, 1971, p. 19). Over the next 10 years the multiple-choice SAT was administered in conjunction with the older College Board essay exams. During almost all of this period, the College Board’s admissions testing program was headed by Carl Brigham, and under Brigham’s leadership the Board introduced a number of innovations. One of these was to include in the SAT an experimental section (which test takers could not identify) containing items to be tried out for possible inclusion in future tests. In 1937 an April administration of the SAT was introduced, and as its use increased, the Board felt the need for making scores on the April and June administrations comparable. “Beginning in June 1941, ... the scores on every form of the SAT were equated directly to some preceding form of the SAT, and ultimately and indirectly to the April 1941 form,” using the 800 scale which had been originally introduced with the first administration of the SAT in 1926 (Angoff & Dyer, 1971, p. 3). It is interesting to note that ever since its inception under the leadership of Brigham in 1926, the SAT has been disclaimed as a measure of intelligence. Yet in one of the minor but longstanding ironies in the field of testing, it has been classified under the rubric “Intelligence – Group” in the Mental Measurements Yearbooks (MMYs) ever since the SAT was listed, in the fourth MMY (Buros, 1953). With the outbreak of World War II (WWII), and the need for accelerating college studies, it was decided to abandon the June essay-type examination used in the College Board admissions testing program for over 40 years.
Initially this was viewed as a temporary measure, but after the war and studies showing that the multiple choice SAT could predict performance in college as well as essay scores, the essay exams were not reintroduced.
Testing in World War II
Curiously, although about 10 million recruits were tested for military assignments in WWII with the Army General Classification Test (AGCT), the military testing in WWII did not receive nearly as much attention in the popular literature as did the WWI testing (see Figure 1).3 There would seem to have been several reasons for the contrast. First, standardized testing was well institutionalized in civilian life by the outbreak of WWII. As previously noted, the WWI testing had done much to contribute to civilian testing, but ironically, just after WWI, the Army’s testing program had been discontinued. It was only with the prospect of the second war that the War Department reinstituted a widespread testing program, again with the help of the National Research Council and many prominent psychologists. Second, in the words of
Walter Bingham, who headed the Army testing during WWII, the AGCT made “no pretense of measuring native intelligence” (quoted in Hale, 1982, p. 23). Instead, the AGCT presented three types of problems: “vocabulary to measure the verbal factor, arithmetic word problems to measure the number and reasoning factors, and block counting to measure the space factor” (Dailey, 1953, p. 379). It was a clear reflection of the work on factor analysis in the 1930s.
Testing at Mid-Century
By mid-century, mental testing had clearly become well established in education in the United States. Several developments during its growth and institutionalization from WWI to 1950 have been noted. First, the testing in WWI helped greatly to promote testing in the schools of the United States, and also sparked a major public controversy in the 1920s. Second, so-called intelligence tests and achievement tests both came to be widely used in the schools, for pupil classification as well as for guidance. Third, though commercial marketing of tests had begun before WWI, testing as a commercial enterprise greatly expanded during this period. Fourth, the technology of testing did not advance markedly during this era, with the notable exception of the development and expanding use of factor analysis. Fifth, standardized tests came to be quite widely used in large-scale research surveys. Sixth, standardized testing had come, by mid-century, to be a major instrument for guiding the transition from secondary school to college. Finally, work on factor analysis clearly contributed to the emergence of tests of differentiated mental abilities. The WWI to 1950 era, after some initial setbacks due to overly grand claims for testing early in the period, was largely one of growth and refinement. Standardized testing of mental abilities clearly had won a prominent place in education in the United States. But as testing, and education, came to have a more prominent place on the national agenda following mid-century, considerable controversy was to reemerge.
Testing and Education on the National Scene after Mid-Century
After the controversy over intelligence testing in the 1920s, the story of standardized testing in the United States in the next two decades had been one of seemingly uninterrupted success and expansion, with the new type of tests being widely adopted in the schools for student classification and guidance, in broad-scale survey research, and increasingly for purposes of college admissions. Although researchers such as Thurstone and Bagley had for two decades argued that, contrary to the publicity of the 1920s, IQ tests did not measure a unitary and general native intelligence, and though prominent
new tests such as the SAT and the AGCT avoided general intelligence and instead were aimed at measuring differentiated mental faculties, at midcentury IQ testing was still very well entrenched in the schools. Sarason (1976) recounts the situation he found as a new school psychologist in the 1940s:
In 1942 I went to work at the Southbury Training School in Connecticut and discovered that Binet testers were a numerous breed who ground out IQ scores the way Detroit did cars. I am not being unkind. During my first year at Southbury I built my own factory because many of the children had not been tested for years, some had been tested with the 1916 [Stanford Binet] scale and it was felt that they should be evaluated by the 1937 revision, and in some instances the results we would obtain would be more valid than those of other testers. We tested, scored and wrote reports – day after day and month after month. Those were the days when an IQ of 70 was the dividing line: Below it you could be admitted, above it you had to stay “outside.” IQ points were like heartbeats; they made a difference. (p. 585)
Standards for Tests
Despite the success of testing, in terms of its adoption into educational practice and the apparent quiet on the popular front (note the declining number of popular articles on testing in the 1945–1955 period shown in Figure 1), the educational role of testing was of increasing concern to professionals in the fields of psychology and educational research. This concern led to the promulgation of professional standards concerning educational testing. In 1953, the American Psychological Association (APA) issued a set of Ethical Standards of Psychologists, which listed cases of test misuse such as the following:
A high school newspaper carried a page-one headline: “Meet the geniuses of the incoming class” and then listed all pupils of IQ 120 and up with numerical scores. Then under the heading “These are not geniuses, but good enough” were listed all the rest, with IQ’s down to the 60’s.
A new battery of tests for reading readiness was introduced in a school. Instead of the customary two or three, 12 beginners were this year described by the test as not ready for reading. They were placed in a special group and given no reading instruction. The principal insisted that if the parents or anyone else tried to teach them to read “their little minds would crack under the strain.” In at least two cases parents did teach them to read with normal progress in the first semester, and later mental tests showed IQ’s above 120. (p. 144)
The 1953 Ethical Standards listed 19 principles concerning the sale and distribution of tests and diagnostic aids. Their general theme was to safeguard psychological tests as “professional equipment,” both to prevent misuse and to avoid their invalidation. In 1954, the APA joined forces with the American Educational Research Association (AERA) and the National Council on
Measurements Used in Education (NCMUE, the forerunner of the National Council on Measurement in Education, NCME) to issue a set of Technical Recommendations for Psychological Tests and Diagnostic Techniques. The 1954 Technical Recommendations were based on the “essential principle” that a test manual should “carry information sufficient to enable any qualified user to make sound judgments regarding the usefulness and interpretation of the test” (p. 2). Altogether the document contained more than 160 standards grouped into six categories: (a) dissemination of information (10 standards); (b) interpretation (18); (c) validity (66); (d) reliability (31); (e) administration and scoring (10); and (f) scales and norms (29). In the section on test interpretation, the 1954 document advised that where a misinterpretation of a test is known to be common, it was essential that the test manual warn against the misinterpretation. Several examples of such misinterpretation were offered. One was the common misinterpretation that intelligence tests are “measures of native ability alone.” Another was the interpretation of the term “IQ.” The Recommendations warned that since ratio IQ scores (that is, mental age divided by chronological age) do not have the same properties as deviation IQ scores (wherein IQ is determined not by the mental age/chronological age ratio, but in terms of a standardized scale with the mean equal to 100 and standard deviation equal to 15 or 16), the term IQ should be avoided when referring to deviation scores.4 In 1955, the AERA and NCME issued Technical Recommendations For Achievement Tests, a reworking of the 1954 Recommendations, aimed more directly at educational tests than at psychological instruments, such as personality inventories or projective tests. This flurry of issuing standards in the early 1950s surely was in part motivated by a desire to enhance the professional image of testing. But it was also a direct reflection of the social success of testing and the fact that with the widespread adoption of tests into educational and psychological practice, some test use was misuse. Evidently there was disquiet among professionals over the proper development and use of tests; and before long, debate on the topic was to reemerge prominently on the public front as well. This reemergence is reflected in Figure 1, in the peak of articles around 1960 listed under the educational tests and measurements rubric. Several events prompted the upswing of popular attention to educational testing.
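The distinction the 1954 Recommendations drew between ratio IQ and deviation IQ scores can be made concrete with a small sketch. It is an illustration only, not a procedure taken from the Recommendations: the function names and the tiny norm group are hypothetical, multiplying the age ratio by 100 is the usual convention rather than something stated in the text, and the scale standard deviation of 15 is one of the two values (15 or 16) mentioned in the text.

```python
from statistics import mean, pstdev


def ratio_iq(mental_age, chronological_age):
    """Ratio IQ: mental age divided by chronological age.

    The factor of 100 is the usual convention and an assumption here;
    the text gives only the ratio itself.
    """
    return 100.0 * mental_age / chronological_age


def deviation_iq(raw_score, norm_scores, scale_sd=15.0):
    """Deviation IQ: where a raw score falls within a norm group,
    re-expressed on a scale with mean 100 and SD scale_sd (15 or 16)."""
    z = (raw_score - mean(norm_scores)) / pstdev(norm_scores)
    return 100.0 + scale_sd * z


# Hypothetical numbers, for illustration only.
print(ratio_iq(mental_age=10, chronological_age=8))
# A raw score one standard deviation above the norm-group mean:
print(deviation_iq(raw_score=60, norm_scores=[40, 60]))
```

The two functions answer different questions. The first compares two ages; the second locates a person within a reference group. That difference is one way to read the Recommendations' caution against applying a single label to both kinds of score.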
The National Defense Education Act and the National Merit Program
Following the launching of the Soviet satellite Sputnik in 1957, the U.S. Congress passed the National Defense Education Act (NDEA) of 1958. Under the NDEA, federal funds were provided for states to
establish and maintain a program for testing aptitudes and abilities of students in public secondary schools, and ... to identify students with outstanding aptitudes and abilities ... a) to provide such information about the aptitudes and abilities of secondary school students as may be needed by secondary school guidance personnel in carrying out their duties; and b) to provide information to other educational institutions relative to the educational potential of students seeking admissions to such institutions. (Goslin, 1963, p. 71)
Another factor contributing to increased national prominence for testing was the founding of the National Merit Scholarship Corporation (NMSC) in 1955. By administering a qualifying test to high school juniors nationwide, the NMSC sought to identify and honor students “who rank at the upper end of the academic ability scale” (NMSC, 1978).
Proliferation of Testing
Proliferation of large-scale testing programs was aided by the development of automated optical scoring equipment by E.F. Lindquist and others at the Measurement Research Center at the University of Iowa in 1955 (Baker, 1971). Oscar Buros’ Mental Measurements Yearbook (MMY) series presents a picture of where testing stood at the end of the 1950s. Buros published his fifth MMY in 1959. But because many of the tests covered in previous MMYs were still in print and widely used, Buros attempted with the publication of Tests in Print (1961) to present a “comprehensive bibliography to tests for use in education, business and industry.” Tests in Print listed over 2,000 tests available for use (and another 800 covered in Buros’ MMYs, but out of print as of 1961). In terms of Buros’ system of classifying tests, the most common types were character and personality tests (14.4% of in-print test titles), vocational tests (13.5%), and intelligence tests (11.2%). In the introduction to Tests in Print (1961), Buros offered the following comments on the proliferation of tests:
At present, no matter how poor a test may be, if it is nicely packaged and it promises to do all sorts of things which no test can do, the test will find many gullible buyers. When we initiated critical test reviewing in The 1938 Yearbook, we had no idea how difficult it would be to discourage the use of poorly constructed tests of unknown validity. Even the better informed test users who finally become convinced that a widely used test has no validity after all are likely to rush to use a new instrument which promises far more than any good test can possibly deliver. . . . The test user who has faith – however unjustified – can speak with confidence in interpreting test results and in making recommendations. The well-informed test user cannot do this; he knows that the best of our tests are highly fallible instruments which are extremely difficult to interpret with assurance in individual cases. Consequently he must interpret test results
cautiously and with so many reservations that others wonder whether he really knows what he is talking about. (pp. xxiii–xxiv)
Commenting on the new test standards embodied in the Technical Recommendations of 1954 and 1955, Buros also offered the following evaluation: “Many of the tests on the market today would cease to exist if these standards should become the accepted criterion for evaluating test manuals” (p. xxviii). Exactly how many standardized tests – good or bad – are given in the schools and elsewhere has never been easy to estimate, but Goslin (1963) suggested that as of 1961 more than 100 million commercially produced ability tests were administered annually. He commented that “it is probably safe to say that there are more ability tests being given annually in the United States than there are people” (p. 54). In terms of use in schools, Goslin suggested that tests were more widely employed at the elementary than at the secondary level, and that at the high school level the primary reported uses of test results (all indicated by more than 50% of a national sample of high schools, though results varied considerably by region and type of school) were (a) to aid in counseling; (b) to aid in placing students in various curricula (business, vocational, college preparatory); and (c) to make homogeneous groups within subject matter areas.
Reemergence of Criticisms
Given such wide use of tests, and the judgment by people such as Buros that many of those marketed and used were of doubtful validity, it is hardly surprising that popular criticisms of testing reemerged in the late 1950s. A sampling of articles of this period follows:
Testing. Can everyone be pigeonholed? Newsweek, July 20, 1959.
What the tests do not test. New York Times Magazine, October 2, 1960.
Are we developing a robot education? Ladies’ Home Journal, August 1959.
Easily the most famous critical work on testing in this period was by Banesh Hoffmann, first in an essay in Harper’s magazine (1961), and then in 1962 in a book titled The Tyranny of Testing. In the latter, Hoffmann criticized the superficiality of multiple-choice tests, arguing that they tend to penalize deep thinkers and to inhibit close reasoning. Other popular treatises critical of testing followed, such as Martin Gross’s The Brain Watchers (1962), Hillel Black’s They Shall Not Pass (1962), and Vance Packard’s The Naked Society (1964). Such criticisms led to considerable concern among professionals. In 1965, the APA released a special issue of American Psychologist entirely devoted
to testing, which the guest editor described at the time as one of the largest editions ever published. Criticisms of testing also led to an examination by no less a body than the U.S. Congress. The immediate trigger for the congressional hearings in 1965 appears to have been concern over the use of personality tests in the selection of federal government personnel (Amrine, 1965a, p. 861). The use of the Minnesota Multiphasic Personality Inventory (MMPI) was subjected to particular ridicule. The flap over personality tests even prompted Washington humorists to foray into the field of test construction. Hence the Art Buchwald Personality Inventory, which included among other true-false items, a la the MMPI, the statement, “A wide necktie is a sign of disease.” As recounted by Michael Amrine, the test was interpreted as follows:
If you had more true than false, you should work for the labor department. If more false than true, you should work for the Peace Corps. If you were evenly divided, true and false, you should apply for work with the Voice of America. If you held your hand over the questions while you answered them, you should go with the FBI, and if you refused to answer some of the questions, you might work for the White House. (1965b, p. 989)
Beyond provoking some good humor and a fat issue of the American Psychologist, the controversies of the early 1960s may also have helped to heighten the sensitivity of testers to privacy rights in personality testing and to the intertwining interests of government and the psychological profession in the business of testing.
The 1966 Test Standards
In 1966, the APA, AERA, and NCME issued a revised version of the 1954 Technical Recommendations, now called Standards for Educational and Psychological Tests and Manuals. In form and content the 1966 Standards were highly similar to the 1954 Recommendations, but where changes were apparent, they often seemed to reflect the increasingly public role of testing and to address the problem of misuse. An introductory passage stressed the need for validating projective techniques, such as the Rorschach, and a new standard in 1966 said, “Promotional material for a test should be accurate and complete and should not give the reader false impressions. ESSENTIAL” (A1.22, p. 7). The 1966 Standards reiterated the warning about frequent misinterpretation (Standard B 1.5) from the 1954 document, but curiously, the specific reference to avoiding the term IQ in referring to deviation scores was dropped. On technical matters, advice in the 1966 Standards also changed somewhat regarding validity and reliability. The 1954 Recommendations had defined four types of test validity: content, predictive, concurrent, and construct. The 1966
Standards treated predictive and concurrent validity together as alternative forms of criterion-related validity. The 1954 document had discussed test reliability largely in terms of correlation coefficients, namely, coefficients of internal consistency, equivalence, and stability. In a sharp change, however, the 1966 Standards reported that the “classification of coefficients ... into several types has been discarded. Such a terminological system breaks down as more adequate statistical analyses are applied and methods are more adequately described” (p. 26). Instead, the 1966 document recommended that “the estimation of clearly labelled components of error variance is the most informative outcome of a reliability study” (p. 26, emphasis in original).
ESEA Title I Evaluation
How much influence such professional standards have exerted on practices of test development and use is doubtful (as Buros observed of the 1954 Recommendations in 1961, and was to observe later of the 1966 Standards and of their 1974 revision). But in 1965, the year of the congressional hearings on personality testing, the U.S. Congress passed a bill that was to have more influence on school testing programs than probably any other act of Congress, or for that matter any other body. This was the Elementary and Secondary Education Act (ESEA) of 1965. ESEA was “an important watershed in the history of American education” (Bailey & Mosher, 1968, p. vii). The vast majority of funds disbursed under ESEA were Title I funds, providing financial assistance to local education agencies for the education of children from low-income families. Between 1966 and 1981, Title I funding grew from around three-quarters of a billion dollars to nearly $4 billion (Grant & Eiden, 1982, p. 173), easily the single largest source of federal funding for education. With the federal funding came strings, in the form of a provision for the evaluation of the effectiveness of Title I-funded programs. Specifically, the ESEA legislation required that local educational agencies (LEAs) could receive funds only if “effective procedures, including provision for appropriate objective measurements of educational achievements, will be adopted for evaluating at least annually the effectiveness of the programs in meeting the needs of educationally deprived students” (Public Law, 89-10, 89th Congress, H.R. 2362, April 11, 1965, as quoted in Bailey & Mosher, 1968, p. 239). Evaluation was intended to ensure that the Title I money would indeed be used to improve the education and learning of disadvantaged students. 
Also, it was anticipated that LEAs would gather objective data on the effectiveness of Title I programs that could be aggregated at the state level and then summarized, so that the federal government could derive a national picture of the extent to which Title I was boosting the learning of educationally deprived students served in the program.
As McLaughlin (1975) recounts in her history and analysis of the first decade of experience with Title I evaluation, aggregation of objective evidence of Title I effectiveness proved impossible. LEAs used different tests to evaluate effectiveness, analyzed the results in different ways, and reported the results, if at all, late and in markedly different ways. Such problems led to the inclusion in section 151 of the Education Amendments of 1974 of requirements that the U.S. Office of Education develop models embodying uniform methods of evaluation that would yield comparable results across the Title I programs of different LEAs. Evaluation models were therefore developed that would allow the estimation of program effectiveness (Tallmadge & Wood, 1976). More than 90% of local Title I evaluations used Model A, the norm-referenced model, for estimating program effectiveness (Reisner, Alkin, Boruch, Linn, & Millman, 1982, p. 10). Exactly how much the testing scene was affected by the development of Title I evaluation procedures in the 1970s is hard to estimate, but it seems safe to say that ramifications were considerable. One sign of this was that all major achievement test series revised in the late 1970s and early 1980s included tables for interpreting results in normal curve equivalents (which was the metric selected for aggregation of Title I evaluation results). Another was that local and state officials interviewed about the Title I evaluation system realized that the federal government was promoting the use of norm-referenced tests (NRTs) – even though they thought that other types of tests would be more appropriate (Comptroller General, 1977, pp. 226–228).5 This development – the increasing use of norm-referenced tests in Title I evaluations to help ensure benefits to disadvantaged students – carried considerable irony in contrast to two other broad developments in testing. 
First, at the same time that the NRT was increasingly being used in Title I evaluations, many educational researchers were becoming highly critical of the use of NRTs for program evaluation. Instead of NRTs, they were promoting criterion-referenced tests (CRTs). Second, as part of Lyndon Johnson’s War on Poverty, Title I was intended to advance the interests of disadvantaged children, and the Title I evaluation and testing system that evolved in the 1970s was meant to ensure that that intention was carried out. Yet during the 1970s, other developments on one hand raised questions as to whether the investment in education for the disadvantaged, as in Title I, might be in vain, and on the other hand, raised doubt as to whether widespread testing in the schools was doing more to retard than to advance the interests of disadvantaged minorities.
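The text names normal curve equivalents (NCEs) as the metric chosen for aggregating Title I results but does not define them. As conventionally defined, an NCE is a normalized standard score with a mean of 50 and a standard deviation of about 21.06, a value chosen so that NCEs of 1, 50, and 99 line up with percentile ranks of 1, 50, and 99. The sketch below assumes that conventional definition; the function name is hypothetical, and nothing here is drawn from the Title I evaluation models themselves.

```python
from statistics import NormalDist


def percentile_to_nce(percentile_rank):
    """Convert a percentile rank (strictly between 0 and 100) to an NCE.

    Assumes the conventional definition: NCE = 50 + 21.06 * z, where z is
    the standard normal deviate corresponding to the percentile rank.
    """
    z = NormalDist().inv_cdf(percentile_rank / 100.0)
    return 50.0 + 21.06 * z


# Percentile ranks of 1, 50, and 99 map to NCEs of roughly 1, 50, and 99;
# ranks in between are stretched or compressed relative to the NCE scale.
for pr in (1, 25, 50, 75, 99):
    print(pr, round(percentile_to_nce(pr), 1))
```

A common rationale for such a metric is that it is treated as an equal-interval scale, so scores from different tests and districts can be averaged in a way that percentile ranks cannot.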
The Criterion-Referenced Testing Cause
NRTs surely constitute the thickest branch in the family tree of testing in the United States. The Army Alpha and Beta, the SAT, the AGCT, and the various intelligence tests previously mentioned all are NRTs, in that their
results are interpreted in terms of performance of some other group of test-takers (a norm group or standardization sample). Two of the most widely used criteria for the inclusion of candidate items in an NRT are item difficulty (the percentage of a tryout sample who get the item correct) and item discrimination (the correlation or association of results on the item and some criterion, such as all of the items on the tryout test or some other criterion such as grade placement or chronological age). These two techniques for selecting items for NRTs can be traced back directly to Binet’s inventions recounted earlier. Constructors of NRTs must of course pay some heed to item content, but as the technical report on the SAT notes, content specifications are “necessarily less rigorous” than item selection in terms of difficulty and discrimination. In terms of informing selection decisions, these criteria contribute to important test characteristics. Difficulty affects the test’s power to discriminate among test-takers – an important characteristic of a selection test since selection decisions are almost always constrained in that not all candidates can be selected. Similarly, item discrimination – the idea that all items on a test discriminate among test-takers in the same way or at least in the same direction – contributes to what might be called the construct coherence of the test. If one is faced with a binary selection decision – that is, to select or not – then such an attribute can make matters much simpler than if a selection instrument taps several different constructs. Desirable though these characteristics of an NRT may be from a selection perspective, advocates of CRTs pointed out in the 1960s and 1970s (Carver, 1974; Glaser, 1963; Madaus, Kellaghan, Rakow, & King, 1979; Popham, 1978a, 1978b; Popham & Husek, 1969) that these characteristics of NRTs and means of item selection are undesirable from other points of view. 
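The two item statistics just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any test publisher's procedure: the function names and the response matrix are hypothetical, difficulty is expressed as a proportion rather than a percentage, and discrimination is computed as the plain Pearson correlation between the 0/1 item score and the total score on the tryout test (with the item itself included in the total).

```python
from math import sqrt


def item_difficulty(responses, item_index):
    """Proportion of the tryout sample answering the item correctly."""
    item = [row[item_index] for row in responses]
    return sum(item) / len(item)


def item_discrimination(responses, item_index):
    """Pearson correlation between the item score (0/1) and the total score.

    Returns None when either variable has no variance, for example when
    every examinee gets the item right or every examinee gets it wrong.
    """
    item = [row[item_index] for row in responses]
    totals = [sum(row) for row in responses]
    n = len(responses)
    mean_item, mean_total = sum(item) / n, sum(totals) / n
    sxy = sum((x - mean_item) * (y - mean_total) for x, y in zip(item, totals))
    sxx = sum((x - mean_item) ** 2 for x in item)
    syy = sum((y - mean_total) ** 2 for y in totals)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


# Hypothetical tryout data: rows are examinees, columns are items (1 = correct).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 0],
]
for i in range(3):
    print(i, item_difficulty(responses, i), item_discrimination(responses, i))
```

In this made-up data the first item is answered correctly by everyone, so its discrimination is undefined. Under selection rules based on these two statistics, such an item would tend to be dropped whatever its content.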
James Popham (Popham, 1978a, 1978b, 1980; Popham & Husek, 1969) became perhaps the best known advocate of CRTs as an antidote to the weaknesses of NRTs. He argued, for example (1978a), that NRTs have three major deficits from an educational point of view:
1. Because of the way they are constructed, NRTs can result in unrecognized mismatches between what is being tested and what is being taught.
2. Because of their generality, NRTs provide imprecise instructional targets.
3. The techniques used in building NRTs, which help to ensure considerable response variance among examinees, can lead to the elimination of items covering important topics or skills just because all examinees get the items right or wrong (and thus do not discriminate among examinees).
Naturally, Popham argued that because of their direct attention to the skills and knowledge measured, “On all three counts, CRTs can correct the inadequacies of NRTs for purposes of instruction and evaluation” (Popham, 1978a, p. 9). Popham even predicted that because of the weaknesses of NRTs and the
purported strengths of CRTs, we would witness a new period in educational assessment, “the criterion-referenced measurement era” (Popham, 1978b, p. 2, emphasis in original). More moderate observers suggested merely that curriculum-sensitive CRTs could play an important role in educational evaluation, even though NRTs might continue to be used to compare outcomes of programs that emphasize different aspects of instruction (Madaus et al., 1979).
The IQ Controversy
The charges leveled against NRTs by Popham and other advocates of CRTs in the 1970s were important and aroused considerable attention in the worlds of education and evaluation. But compared to another major dispute over testing in the 1970s, the case of the CRT advocates seemed nothing but a minor tempest. The larger controversy was the one over IQ and interpretations of IQ test scores. As Figure 1 shows, the early 1970s were the only time in the last 70 years when the number of articles indexed in the Reader’s Guide under the “intelligence quotient” rubric surpassed those under the more general headings “intelligence testing” and “educational tests and measurements.” The controversy was clearly set off by a long article by Arthur Jensen (1969) in the Harvard Educational Review entitled, “How Much Can We Boost IQ and Scholastic Achievement?” The piece began, “Compensatory education has been tried and apparently it has failed.” There followed 120 pages of text in which the Berkeley psychologist reviewed evidence indicating that IQ and scholastic aptitude differences within the white population can be attributed in large measure to genetic inheritance. But the bombshell in the Jensen article was the suggestion that it is not an unreasonable hypothesis that “genetic factors are strongly implicated in the average Negro-white intelligence difference” (p. 82). Jumping from indirect evidence concerning heritability within white populations to the suggestion about differences between populations, and from evidence about IQ test scores to conclusions about native intelligence, were large leaps in logic, but the fallout was direct. Twelve of 15 articles listed in volume 29 of Reader’s Guide (1969–1970) dealt directly with the Jensen article. Summaries of the Jensen piece, more or less accurate, and rebuttals were published in U.S. News and World Report, Time, New Republic, and the New York Times Magazine, to name but a few of the popular periodicals that covered the affair. Cronbach’s (1975) review of the episode suggests the following:
The news media were not able to weigh matters as delicately as Jensen had. Fairly typical is Newsweek’s (“Born dumb,” 1969) summary: “Dr. Jensen’s view put simply is that most blacks are born with less intelligence than most whites.” (p. 4)
The IQ controversy of the 1970s was set off by the Jensen article and refueled by a 1971 article by Richard Herrnstein in Atlantic Monthly. The controversy of the 1970s was political more than scientific – highly reminiscent of the Lippmann-Terman debate of the 1920s. The controversy spawned dozens of popular articles in the early 1970s and literally hundreds of pages in the more scholarly literature. After the original Jensen piece, the Harvard Educational Review alone devoted about 300 pages to follow-ups. A compilation of materials on “Environment, Intelligence and Scholastic Achievement” by the Select Committee on Equal Educational Opportunity of the U.S. Senate (1972), prepared for hearings that were canceled because they promised to be too controversial, ran to more than 600 pages. A book of critical readings on The IQ Controversy by Block and Dworkin (1976), bringing together only the “best of the critical literature,” had more than 500 pages. The volume of print on the IQ controversy can be contrasted with the single listing under the IQ heading in Volume 28 of the Reader’s Guide (which covered the 2 years before the Jensen article) – a Reader’s Digest piece, ironically enough on how “You Can Raise Your Child’s IQ.”
The 1974 Test Standards
The IQ debate of the early 1970s contributed to several developments, including a revision of the professional test standards and several calls for the reform of testing in education and elsewhere. A revision of the 1966 Standards for Educational and Psychological Tests was issued under the same title in 1974 (APA, AERA, & NCME, 1974). The degree of change from the 1966 Standards to the 1974 version, despite the common title, was far greater than between the 1954 Technical Recommendations and the 1966 Standards. Both the 1954 and 1966 documents had been aimed mainly at test developers and the preparation of test manuals. The 1954 Technical Recommendations had advised that “The ultimate responsibility for the improvement of testing rests with test users” (p. 7). Yet the Recommendations, intended as a guide to test development and reporting, offered scarcely a word of direct advice to test users. In the main this discrepancy persisted in the 1966 Standards. But the 1974 document was intended, certainly as a result of the increasingly prominent social role of testing and controversies over that role, to help bridge the gap between test manuals and test use. It was intended “to guide both test developers and test users” (p. 1). As a result, the 1974 Standards were considerably expanded and reorganized into three sections concerning standards for (a) test manuals and reports, (b) reports of research on reliability and validity, and (c) use of tests. Most of the expansion came in the third section, reflecting the broadened goal of the revised document, but there were significant changes elsewhere.
Haney
Testing Reasoning and Reasoning about Testing
In the first section, the document warned that “interpretive scores that lend themselves to gross misinterpretations, such as mental-age or grade equivalent scores, should be abandoned or their use discouraged” (p. 23). In the second section, advice concerning test validity was revised considerably in comparison to the 1966 document. The section still discussed content, construct, and criterion-related validity, but most of the new standards on validity dealt with criterion-related validity. It seems likely that the expanded attention to criterion-related validity was due to two developments. First, the criterion-related validity of a selection test, in terms of its power to predict job performance or performance in job training, was at the time one of the major criteria by which the legal propriety of employment selection was judged (Lerner, 1978). Second, there was more general concern over whether standard tests were fair to individuals of different cultural or ethnic groups. Thus, one of the most significant of the new standards concerning criterion-related validity evidence was the following: A test user should investigate the possibility of bias in tests or test items. Whenever possible, there should be an investigation of possible differences in criterion-related validity for ethnic, sex, or other subsamples that can be identified when the test is given. The manual or research report should give the results for each sample separately or report that no differences were found. (Standard E9, p. 43)
Nevertheless, the major expansion of the 1974 Standards came in the third section, entitled “Standards for the Use of Tests.” The prescriptions in this section dealt with (a) qualifications and concerns of users, (b) deciding to use a test or other method of assessment, (c) test administration and scoring, and (d) interpretation of test scores. In the last category, and one of the most interesting of the new standards, was the following: A test score should be interpreted as an estimate of performance under a given set of circumstances. It should not be interpreted as some absolute characteristic of the examinee or as something permanent and generalizable to all other circumstances. (Standard J 1, p. 68)
This addition to the 1974 standards is fascinating because it would seem to preclude the interpretation of test scores as measures of innate intelligence or fixed ability – exactly the sort of interpretation that gave rise to the IQ controversy of the early 1970s.6
Calls for the Reform of Testing
The preparation and issuance of the new set of professional Standards for Educational and Psychological Tests did little to quell the broader concern, and hence the 1970s saw a remarkable variety of calls for the reform of
testing. What might be called the testing reform movement of the 1970s actually began in 1968 when the National Association of Black Psychologists passed a resolution calling on the APA to “establish a committee to study the misuse of standardized psychological instruments [which are used] to maintain and justify the practice of systematically denying economic opportunities to Black youth.” Pending the findings of such a review, the group called for a moratorium on comparative testing (Williams, Mosby, & Hinson, 1976, p. 6). Over the next several years, concerned over the effects of testing on minority youth, several groups of black professionals called for moratoria on testing in the schools, and in 1972 and 1973, the National Education Association (NEA) voted for immediate moratoria on standardized testing in the schools and testing for the purpose of teacher certification. In 1972 and 1974, the National Association for the Advancement of Colored People (NAACP) in its national convention called for cessation of standardized testing whenever such tests have “not been corrected for cultural bias” (Bollenbacher, 1976; Williams et al., 1976). These groups calling for reform of testing were joined in 1975 by the National Association of Elementary School Principals (NAESP), which in its journal National Elementary Principal issued what Bollenbacher (1976) called “a devastating attack on standardized testing.” The Principal articles were collected and printed in book form under the title The Myth of Measurability (Houts, 1977). Added to these voices was that of the nation’s best-known muckraking reformer, Ralph Nader. In 1975, Nader published an article in Ladies’ Home Journal critical of testing in general, but specifically attacking the Educational Testing Service (ETS). ETS had been founded in 1947 by the American Council on Education, the Carnegie Foundation for the Advancement of Teaching, and the College Entrance Examination Board (Holmen & Docter, 1972, p. 36).
At the time, ETS was seen as a vehicle for consolidating these organizations’ testing programs, and for bringing together the best technical expertise for the development and improvement of testing. The ETS became best known for its development and administration of the SAT, which until 1947 had been administered by the College Board. The Nader organization continued its scrutiny of standardized testing, and ETS in particular, and in 1980 issued a 500-page report entitled The Reign of ETS: The Corporation That Makes Up Minds (Nairn & Associates, 1980). The report leveled numerous charges against ETS and the tests it develops. Among the accusations were that tests like the SAT and the Law School Admissions Test (LSAT) have little power to predict how well students will do in school and have even less power to predict how well they will do in life after schooling; such tests are biased against minority and low-income individuals; and, contrary to disclaimers by ETS and test sponsors, coaching can be effective in boosting scores on aptitude and admissions tests.
Some testers understandably seemed to feel besieged. Some responded with considerable vehemence and a cynicism reminiscent of Terman’s rejoinder to Lippman a half-century earlier. Psychologist Barbara Lerner, former study director of the National Research Council’s Committee on Ability Testing, and College Board President George Hanford both declared that there was a “war on testing” (Hanford, 1980; Lerner, 1979, 1980). In an address to the 1979 APA convention that was subsequently printed and distributed by ETS, Lerner opined, The attack on tests is, to a very frightening degree, an attack on truth itself by those who deal with unpleasant and unflattering truths by denying them and by attacking and trying to destroy the evidence for them. (p. 1)
She suggested specifically that the leaders of the NEA opposed standardized testing because tests revealed what an inadequate job teachers had been doing, and that NAACP leaders had attacked tests “because they show that integration alone cannot solve the problems of illiterate black youth” (p. 4). Fred Hargadon (1980), chairman of the College Board’s Board of Trustees, even went so far as to voice his suspicion, in his less generous moments, that much of the support for doing away with objective testing actually comes from those who, having “made it” on merit, who are now more educated and privileged, want to change the ground rules by which their offspring will be judged, particularly if objective testing might – as it invariably will in some cases – show their offspring to have less talent and less potential than the offspring of minority or low income families. (p. 13)
As with the controversy in the 1920s, though, some testing specialists responded to the public clamor in considerably more thoughtful tones. Bert Green (1978), for example, published an article “In Defense of Measurement,” in which he reiterated the value of standardized multiple choice tests in promoting objective evaluation, but at the same time noted that the apparent precision of test scores tends to foster an overdependence that should be resisted. Henry Dyer, in a review of The Myth of Measurability, pointed out that many of the “outside critics” of the 1970s, though sometimes extreme in their rhetoric, were in the main simply pointing out “many of the things that we ‘inside critics’ have been trying for years to get across to the test using public concerning the limitations, misconceptions, and consequent misuses and abuses of standardized tests as well as other kinds of tests.” How long have we been saying, asked Dyer (1977), that
1) Test norms should never be confused with fixed standards of academic performance? ...
2) No battery of standardized achievement tests can measure everything one cares about in pupils’ learning; that indeed there is always a lot left over that such tests cannot capture and that must be observed by other means? ...
3) IQ must never be regarded as a measure of something immutable inside a pupil’s head that determines his or her life chances “even to the edge of doom?” ...
4) Your so-called intelligence test would be better regarded as a general achievement test that is inevitably affected by the child’s learning opportunities? ...
5) The measures you get from educational and psychological tests of any kind ... are inevitably afflicted with huge errors of measurement? ... [and]
6) A machine-scorable multiple-choice test cannot pretend to provide anything more than an indirect measure of what children know or how they think? (pp. 3–4)
Truth-in-Testing
Dyer’s thoughtful rejoinder to the criticisms of testing in the 1970s was something of a rarity, but even as it was issued, pressures were growing for even heavier reliance on testing in the schools. However, before we turn to the causes of such pressure – namely, concern over the test score decline and the minimum competency testing movement – let us first recount one aspect of the testing reform movement that may have more than passing consequence for the future testing of reasoning. It was the proposal for so-called “truth in testing.” This term refers to a variety of efforts growing out of the testing reform movement of the 1970s to regulate testing, many of which took the form of legislative proposals to require that (a) individual test takers have access to corrected test questions, not just the score they received on the test; (b) test sponsors file information on test development, validity, and cost with government agencies; and (c) test publishers give individual test takers information on the nature and intended uses of test results prior to testing and guarantee their right of privacy concerning their own test scores. Legislative proposals along these lines were considered in the late 1970s in California, New York, Massachusetts, and even in the U.S. Congress. Advocates argued that as a matter of simple justice, test takers should have the opportunity to review the corrected results of tests that could have important consequences for their educational and job opportunities – they should know the basis on which they were being judged. They also argued that test publishers should be more accountable for the intellectual and social consequences of the tests they produced and that the limitations of standardized tests (such as the ones Dyer noted) should be more widely known not just to those who used test results but also to those who took the tests.
The theory was that releasing tests for public scrutiny would help bring these things about (Brown & McClung, 1980; New York State, 1980; Strenio, 1979; U.S. Congress 1980a, 1980b).
The truth-in-testing proposals were backed by many of the groups which earlier in the 1970s had called for test reforms: the NEA, the NAACP, and Public Interest Research Groups affiliated with Ralph Nader, for example. Test publishers were largely opposed. Naturally they were against government regulation in general and specifically worried that public disclosure would invalidate tests for future reuse and would substantially raise test development costs. The first place that a truth-in-testing proposal was passed into law essentially intact was New York in 1979. Despite the considerable controversy over the test disclosure legislation, subsequent events transpired to indicate that at least in minor measure the reformers’ argument was correct. Following passage of the New York law, the College Board decided to release nationwide questions and answers on the Preliminary Scholastic Aptitude Test (PSAT). Question No. 44 from the October 1980 administration of the PSAT showed two pyramids, one with four faces and the other with five faces, with all faces being equal-sized equilateral triangles except for the square base of the five-faced pyramid. Question 44 asked, “If the two pyramids were placed together face-to-face with the vertices of the equal-sized equilateral triangles coinciding, how many exposed faces would the resulting solid have?” The answer originally scored correct was seven, on the reasoning that when the four-faced and the five-faced pyramids were placed together, two faces would disappear, leaving the resulting solid with seven faces. Daniel Lowen, a Florida schoolboy, had marked five as the correct answer. On receiving his results, together with the questions and corrected answers, checking with his father, who happened to be a mechanical engineer, and building a physical model of the problem, young Lowen became convinced that his answer was correct and that the one marked as correct was wrong.
What happened when the pyramids were placed together was that in addition to losing the two faces that joined the pyramids, the resulting solid also lost two additional faces as four equilateral triangles fell into two planes forming two parallelograms out of four equilateral triangles. It was clear that test developers at ETS, as well as a panel of college professors who had reviewed the original item, were wrong and that Daniel Lowen was right. ETS quickly owned up to the mistake. “They were very professional about it,” in the words of Lowen’s father, but the incident caused a flurry of publicity, with front-page headlines such as the following:
Youth Outwits Merit Exam, Raising 240,000 Scores. New York Times, March 17, 1981.
Student Outwits PSAT, Raising 240,000 Scores. Boston Globe, March 18, 1981.
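The geometry behind Lowen's answer can be checked numerically. What follows is only an illustrative sketch, not anything drawn from ETS or from the original test item: the coordinates, the edge length of 2, and all function names are assumptions chosen for convenience. It lists the seven triangles and square left exposed after the two pyramids are joined, then counts how many distinct planes those pieces lie in. Because the joined solid is convex, exposed pieces lying in the same plane form a single face.

```python
import math

R2 = math.sqrt(2.0)

# Assumed placement (edge length 2): a square pyramid, plus a regular
# tetrahedron glued onto the pyramid face (APEX, BASE[0], BASE[1]).
BASE = [(1.0, 1.0, 0.0), (1.0, -1.0, 0.0), (-1.0, -1.0, 0.0), (-1.0, 1.0, 0.0)]
APEX = (0.0, 0.0, R2)
TIP = (2.0, 0.0, R2)  # fourth vertex of the tetrahedron

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def plane_key(face, inside):
    """Rounded (outward unit normal, offset) of the plane containing a face."""
    a, b, c = face[:3]
    n = cross(sub(b, a), sub(c, a))
    length = math.sqrt(dot(n, n))
    n = tuple(x / length for x in n)
    if dot(n, sub(inside, a)) > 0:      # make the normal point away from the interior
        n = tuple(-x for x in n)
    return tuple(round(v, 9) + 0.0 for v in (*n, dot(n, a)))

# Sanity check: the tetrahedron's three new edges all have length 2.
for v in (APEX, BASE[0], BASE[1]):
    assert math.isclose(math.dist(TIP, v), 2.0)

# Pieces still exposed after gluing (the shared triangle is hidden on both solids).
EXPOSED = [
    tuple(BASE),                 # square base of the five-faced pyramid
    (APEX, BASE[1], BASE[2]),
    (APEX, BASE[2], BASE[3]),
    (APEX, BASE[3], BASE[0]),
    (TIP, APEX, BASE[0]),        # three remaining tetrahedron faces
    (TIP, APEX, BASE[1]),
    (TIP, BASE[0], BASE[1]),
]

vertices = BASE + [APEX, TIP]
inside = tuple(sum(v[i] for v in vertices) / len(vertices) for i in range(3))

naive_count = len(EXPOSED)                                   # 4 + 5 - 2
merged_count = len({plane_key(f, inside) for f in EXPOSED})  # coplanar pieces merged

print(naive_count, merged_count)
```

Under these assumed coordinates the naive tally is seven, the answer originally keyed as correct, while the count of distinct planes is five, the answer Lowen marked.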
The incident came only a week after the executive vice president of ETS had written an editorial for the op-ed page of the New York Times in a form parodying multiple choice test items to suggest that among its other failings,
the New York law would not achieve the goal of letting students check the accuracy of scoring (Solomon, 1981). The pyramid problem provoked an outpouring of interest (a deluge of letters, according to one account). A professor of psychiatry even wrote to the New York Times to suggest, facetiously, that marking “eight” as incorrect on the problem would discriminate against “future internists and psychoanalysts who assume from the start that everything has an inside” (Fiske, 1981a, p. C3).7 The pyramid incident was not an isolated one. A New York student reviewing his SAT results discovered that, contrary to how it had been scored, a math question had two correct answers; according to news accounts two faulty items were discovered on the Graduate Record Examination; and another was discovered on the LSAT (Fiske, 1981b, 1981c; “22,000 Scores Revised,” 1981). The errors discovered in admissions tests following passage of the New York truth-in-testing law surely contributed to an air of vaudeville in public attention to testing in 1981, but they also contributed in the words of one headline to “Soul-Searching in the Testing Establishment” (Fiske, 1981c). In an apparent turnaround of opinion, College Board chairman Hargadon asserted that the mistakes constituted persuasive grounds for “making every effort” to expand test disclosure (cited in Jacobson, 1981, p. 1).
The Test Score Decline
The truth-in-testing movement focused mainly on college and postgraduate admissions tests, probably in part because such tests seemed to have more visible consequences for the fate of individual test-takers than did testing of students below the college age, but surely also because college-age test-takers had considerably more political clout than test-takers too young to vote. Meanwhile, however, attention to test score declines among high school students seemed to contribute to a substantial increase in testing among elementary and secondary school students. As Figure 1 shows, writing on testing saw an upswing in the late 1970s. In part this reflected the test reform movement already recounted. In addition, however, it reflected widespread concern over nationally declining test scores among high school students. Just after mid-decade, much interest was aroused over an unprecedented decline in SAT scores. In the preface to a special panel report on the SAT score decline, Sidney Marland, then president of the College Board, wrote, “No topic related to the programs of the College Board has received more public attention in recent years than the unexplained decline in scores earned by students on the Scholastic Aptitude Test” (cited in Wirtz, 1977, p. xii). No one was sure what the decline meant, but whatever the cause, it was seen as a problem. A special panel, convened to
look into the problem, concluded that the 14-year decline in average national SAT scores (from 1963 to 1977, the average SAT verbal score dropped nearly 50 points, and the SAT math average dropped about 30 points) was due to two different sets of factors. The panel suggested that until about 1970, the fall-off in average scores was due mainly to “compositional changes” in the population of students taking the SAT. More students were taking the SAT, and they were of types who tended to earn lower scores. After 1970, however, the panel concluded that the continuing SAT score decline was due to factors other than expansion of the test-taking population among previously low-scoring types of students. Although it eschewed claims to anything but indirect evidence, the panel suggested six factors as contributing to the test score decline in the 1970s: (a) “a significant dispersal of learning activities and emphasis in the schools”; (b) “diminished seriousness of purpose and attention to mastery of skills and knowledge … in the schools, the home and in society generally”; (c) more learning taking place through “viewing and listening modes than through traditional modes”; (d) changes in the “role of the family in the educational process”; (e) “disruption of life in the country” between 1967 and 1975 by the Vietnam War and other events; and (f) a “marked diminution in young people’s learning motivation” (Wirtz, 1977, pp. 46–48).
The Minimum Competency Testing Movement
The suspected ills were many, but in the view of the special panel the prime ones seemed to lie with schools and schooling. The diagnosis was a common one in the late 1970s. One of the most commonly prescribed remedies, minimum competency testing, led to a new type of testing and to considerable controversy over its utility and fairness. Between 1976 and 1980, dozens of states around the country began to implement new testing programs to provide (a) a new standard for the award of the high school diploma, (b) a guide for grade-to-grade promotion, and/or (c) a means of identifying students in need of remedial instruction. Public opinion polls in the 1970s clearly indicated that a vast majority of Americans thought that requiring students to pass a test in order to receive their high school diplomas was a good idea and that the answer to dealing with students unable to keep up with their classmates was to “hold them back” (Gallup, 1974, p. 29). The minimum competency testing movement seemed to be a direct reflection of such sentiments, but the application of minimum competency tests in the schools seemed not to be a simple matter. Numerous articles, several books (e.g., Airasian, Madaus, & Pedulla, 1979; Jaeger & Tittle, 1980; Lazarus, 1981), and even a Public Broadcasting Service television series have been devoted to the problems and prospects of minimum competency testing. But probably no other single incident has better characterized the problems of
minimum competency testing – nor received more publicity – than the federal court case of Debra P. v. Turlington in Florida.
Litigation and the Case of Debra P.
Litigation concerning testing was nothing new in the late 1970s. Title VII of the Civil Rights Act of 1964 required nondiscrimination in employment with respect to race, color, religion, sex, and national origin. The Act also established the Equal Employment Opportunity Commission (EEOC). In 1966 and again in 1970, the EEOC issued guidelines on the use of tests in employment selection (see Novick, 1982, and Wigdor, 1982, for recent accounts of the evolution of federal employment testing guidelines). Both the 1966 and 1970 guidelines were based in part on the APA, AERA, and NCME test standards and were intended to remedy what were widely seen as abysmally low standards of practice in the field of employment testing in the early 1970s. “Spokesmen for test publishers, personnel directors in industry and industrial psychologists agree overwhelmingly that unvalidated tests are commonly used in industry” (Holmen & Docter, 1972, p. 146). The 1970 EEOC guidelines established some very stiff criteria for establishing the legality of employment testing procedures. Specifically, the 1970 guidelines specified that the use of any test that adversely affects the hiring, promotion, or other employment opportunities of classes protected under Title VII constituted discrimination unless (a) the test had been validated in terms of criterion-related validity as described in the APA, AERA, and NCME standards (or where criterion-related validity is not feasible in terms of appropriate content or construct validity), and (b) it has been demonstrated that alternative employment selection procedures having less discriminatory impact are unavailable for use. The nondiscriminatory employment mandate embodied in Title VII of the Civil Rights Act of 1964 led to a tremendous amount of litigation, much of it concerning employment testing.
An employee of the EEOC estimated in the early 1970s that some 15 to 20% of all complaints filed with that agency involved testing, amounting to roughly 1,000 cases per year (Holmen & Docter, 1972, p. 146). Summaries of Court Decisions on Employment Testing 1968-1977 (Psychological Corporation, 1978) listed more than 100 court decisions, most of them based on Title VII. In her review of legal decisions on employment testing, Wigdor (1982) concluded that the “general thrust of judicial rulings … has been to eliminate tests and other selection standards (e.g., requiring a high school diploma) where there is adverse impact. . . . Employment tests are being subjected to a degree of governmental scrutiny that very few human contrivances could bear” (pp. 66–67). But despite the considerable litigation concerning employment testing in the 1970s, there were relatively few cases focusing specifically on educational
testing (though several of the employment cases dealt with the hiring of teachers). One federal court case received considerable attention in the 1960s. It was the case of Hobson v. Hansen (1967), which charged that the use of IQ tests in ability tracking in the public schools of the District of Columbia resulted in the disproportionate placement of black students in lower ability tracks. The court ruled that the procedure constituted illegal segregation and banned the use of IQ tests, in part because the tests had been standardized largely on white and middle-class groups of students and hence, in the opinion of the court, produced inaccurate and misleading results when used with black and disadvantaged students. A second education testing case, also involving IQ testing and the classification of black children, received much publicity in the 1970s. It was the case of Larry P. v. Riles (1972), brought in federal court in California. In this case, brought originally in 1971, plaintiffs charged that the use of IQ tests in the placement of children into classes for the “educable mentally retarded” (EMR) resulted in disproportionate placement of black children in such classes. Pointing out that black children represented only 10% of the general student population in California schools, but constituted some 25% of EMR enrollments, they held that misplacement doomed children to “stigma, inadequate education, and failure to develop the skills necessary to productive success in society.” Initially in 1972, the judge issued a restraining order, temporarily banning the use of IQ tests in EMR placements in California. The full trial on the merits of the Larry P. case did not take place until 1977. The trial lasted more than 5 months and saw scores of expert witnesses – “renowned experts disagreeing sharply,” as the court later put it. The full trial transcript ran to more than 10,000 pages. The court decision in the case was issued October 16, 1979.
In it, federal court judge Robert Peckham ruled in favor of the plaintiffs, against the state, saying in part, In violation of Title VI of the Civil Rights Act of 1964, the Rehabilitation Act of 1973, and the Education for All Handicapped Children Act of 1975, defendants have utilized standardized intelligence tests that are racially and culturally biased, have a discriminatory impact against black children and have not been validated for the purpose of essentially permanent placement of blacks into educationally dead-end, isolated and stigmatizing classes for the so-called educable mentally retarded. (p. 3)
Both the Hobson and Larry P. cases were indictments of the school systems’ tracking systems as much as they were of standardized testing. And both dealt with IQ testing, which was, as previously recounted, under considerable critical scrutiny as far back as the 1920s. Against this backdrop, the Debra P. case represented a considerable departure, for it dealt with tests which by no stretch of the imagination purported to measure “innate” ability. Also, though disputation among testing experts was nothing new in court cases, the clash of experts in the Debra P. case was particularly telling.8
In 1978, the Florida legislature passed a law requiring Florida public school students to pass a “functional literacy” examination in order to receive a high school diploma. Shortly thereafter, the Debra P. v. Turlington case was brought in federal court in Florida on behalf of 10 black students and others who were to be denied diplomas after failing the state’s new graduation test. Data presented during the trial indicated that had the diploma sanction been enforced, 20% of black high school seniors would have been denied diplomas compared with only 2% of white high school seniors. The original Debra P. case lasted several weeks in the spring of 1979. As in previous employment and educational testing cases, considerable reference was made to the APA, AERA, and NCME standards on testing, and there was considerable dispute regarding interpretation of the Standards. George Madaus, an expert witness for the plaintiffs, argued that adequate validation of a test like the Florida high school graduation test required the integration of construct validity studies, criterion-related validity studies, and content validity studies as described in the test Standards (p. 1732).9 Madaus charged that the Florida test had not been properly validated in any of these three areas. When asked what he thought of the use of the Florida test to determine whether or not students ought to receive their high school diplomas, Madaus answered simply, “It’s unconscionable” (p. 1978). The state of Florida called its own expert witnesses in educational testing. One, who testified at length on the test Standards, was William Mehrens. In sharp contrast to Madaus, Mehrens testified that neither construct nor criterion-related validity studies were necessary for a test such as the Florida graduation test (at pp. 2225, 2227, and 2300). Moreover, he opined that the content validity evidence for the Florida test met acceptable professional standards.
Under questioning, Mehrens acknowledged the legitimacy of the plaintiffs’ concerns over whether the content of the Florida test covered what was taught in the schools. He simply did not define content validity to include such concerns. The dispute between expert witnesses Madaus and Mehrens in the Florida case was nothing new in comparison to previous court cases dealing with standardized testing. But what was remarkable in the Florida case, in light of their sharply contrasting interpretations of the 1974 test Standards, was that both Madaus and Mehrens had been members of the committee that drew up the 1974 standards. On the question of whether or not the Florida test was biased, the court heard similarly conflicting testimony. Lawyers for the plaintiffs called on several experts who identified test questions that they considered biased against poor and minority students. But lawyers for the state of Florida called on their own witnesses to defend the test. One of them, James Popham of the University of California, addressed the test bias issue as follows: The test, the minimum competency program in Florida, is not biased against minority youngsters, against youngsters from low economic
backgrounds. It’s just the opposite. It allows one to isolate and thereafter remediate instructional deficits that are currently operative in the state. (p. 3190)
In the initial decision in the Debra P. case in 1979, federal court Judge George Carr enjoined the state from forcing students to pass the competency test as a requirement for high school graduation (primarily because the state testing program had been instituted so quickly as to preclude adequate notice and because students, up until the high school graduating class of 1983, would have undergone some of their schooling in de jure segregated schools). But on the specific issues of test validity and bias, the judge held that the Florida test had “adequate content and construct validity and bore rational relation to valid state interest,” and that “the plaintiffs failed to establish that the test was racially or ethnically biased” (Debra P., 1979, p. 244). Neither side was happy with this decision, so appeal and cross appeal were filed. In 1981, the U.S. Circuit Court of Appeals, Fifth Circuit, issued its decision, ruling that immediate use of the diploma sanction would violate equal protection guarantees and would unfairly punish black students for prior dual schooling (de jure segregated schools were abolished in Florida only around 1971–1972). But on the issue of test validity, the Appeals Court disagreed sharply with the lower court’s decision. It ruled that the trial court clearly erred in its finding on the validity question. “In the field of competency testing,” said the Appeals Court, “an important component of content validity is curricular validity…things that are currently taught” (Debra P., 1981, p. 6770). Since there was insufficient evidence in the record on the issue of whether the Florida test measured what was actually taught in Florida schools, the Appeals Court remanded the case to the lower court for a hearing on the matter. On remand, the district court held a trial on the instructional validity question in 1983.
With the help of a consulting firm, the state of Florida presented four types of evidence to show that the material covered on the graduation test actually was taught in the schools. Though expert witnesses for the plaintiffs disputed much of the evidence and its interpretation, the district court ruled, on the preponderance of evidence, that the Florida graduation test did have instructional validity (Debra P., 1983). This decision was appealed by plaintiffs’ lawyers, but in 1984 the Appeals Court upheld the ruling. In a remarkable denouement to the story of the Debra P. litigation, at the annual meeting of the American Educational Research Association, James Popham argued that the state of Florida should have lost the instructional validity trial, so that educators would be forced to “deliver more on-target instruction” (“Florida,” 1984). The comment is remarkable for two reasons. First, Popham was the head of the consulting firm that conducted the study which led to the state’s victory in the instructional validity trial. Second, the remark, by one of the nation’s best-known
educational researchers and a past president of AERA, was hauntingly similar to the ideas of Joseph Mayer Rice, pioneer of testing and educational research, a century earlier, who also would have used tests to help ensure that schools delivered “on-target” instruction.
Testing at Eighty
This remarkable denouement to the Debra P. case suggests that in some ways the role of testing in education had not changed much in 100 years. Objective tests were viewed by Popham just as they had been by Rice, as a means of holding schools accountable for the learning of their charges. But in the generations between Rice and Popham, reasoning about testing and the influence of testing on educational practice had changed considerably. In the next and concluding section of this paper, I review some of the myriad ways testing has influenced practice in education in this century. But before turning to these conclusions, it is worth pointing out two things. First, this brief history of reasoning about testing has surely given short shrift to several developments and influences involving testing that have had considerable indirect influence on education and thinking about education. The practice of using tests in survey research, for example, pioneered by Rice, extended in the 1930s and 1940s in the Eight Year Study and others, expanded after mid-century. Among the most notable of these surveys were Project TALENT, the Equal Educational Opportunity Survey, the National Longitudinal Survey of the High School Class of 1972, the High School and Beyond surveys (both carried out through the National Center for Education Statistics) and the National Longitudinal Surveys sponsored by the U.S. Department of Labor. A prime example of how such surveys could influence thinking was the report of the Equal Educational Opportunity Survey of 1966, also known after its chief author, James Coleman, as the Coleman Report. Though quietly released and initially seemingly headed for obscurity, the Coleman Report (Coleman et al., 1966) gained widespread attention for what it said about equality of educational opportunity (Grant, 1970).
The most startling findings of the report were that school facilities for minority children were not greatly unequal to those of other children, and, even more surprisingly, that what differences in facilities and resources did exist had little if any discernible relationship to student test scores. The findings were widely (and inaccurately) publicized as indicating that schools don’t make a difference. As Mosteller and Moynihan (1972) pointed out, the Coleman study had a dramatic impact in moving attention in debates about educational equality away from school “inputs” (i.e., things such as funding, school facilities, and teacher characteristics) to focus on “outputs” (i.e., student performance on tests).
Other surveys and research reports based on surveys that surely influenced thinking about education and testing were the Westinghouse/Ohio study of Head Start (Cicirelli, Cooper, & Granger, 1969); works by Jencks and his colleagues (1972, 1977) on the determinants of economic and social success; and Coleman, Hoffer, and Kilgore’s (1981) study based on High School and Beyond data, comparing private and public schools. Periodic reports of the National Assessment of Educational Progress (initiated in 1969 as a sort of check on the educational health of the nation’s youth) also have had some influence, though less than its designers intended (Messick, Beaton, & Lord, 1983; Wirtz & Lapointe, 1982). Also, numerous other specific works on education (e.g., James Bryant Conant’s 1961 Slums and Suburbs, and Rosenthal and Jacobsen’s 1968 Pygmalion in the Classroom) have had some indirect influence on thinking about testing. And, as for litigation, though several cases dealing specifically with testing already have been mentioned, surely no single court decision in the United States has had more impact on U.S. schooling – or greater indirect impact on testing – than the Supreme Court’s 1954 decision in Brown v. Board of Education of Topeka (Kluger, 1977). But despite myriad specific influences on thinking about education and testing in the period after mid-century, a second general point is that by 1980, though standardized testing was doubtless a huge success in terms of proliferation of tests and use of tests by educational institutions, this success was also seen as a weakness; testing seems to have become infected with the inertia of those institutions and to have progressed little from where it stood decades earlier. As far back as 1967, Anastasi suggested that the “isolation of psychometrics from other relevant areas of psychology is one of the conditions that have led to the prevalent public hostility toward testing” (p. 297).
And Oscar Buros, preeminent bibliographer of tests, in one of the last publications before his death in 1978 wrote:
Little progress has been made in the past fifty years – in fact in some areas, we are not doing as well. Except for the tremendous advances in electronic scoring, analysis and reporting of test results, we don’t have a great deal to show for fifty years’ work. Essentially, achievement tests are being constructed today in the same manner they were fifty years ago – the major changes being the use of more sophisticated statistical procedures for doing what we did then – mistakes and all. (1978, p. 1972)
Over Eighty: Whither Testing Now?
If mental testing – the testing of reasoning – has progressed little in the last half-century, what are its prospects in the future? Before attempting to answer this question and specifically to suggest ways in which the influence of testing and research on testing on educational practice might be enhanced
in the future, let us first step back from the historical account just presented to take account of how, over this history, (a) tests have been used in research, (b) tests have been used in schools, and (c) other ways in which testing and research on testing have influenced educational practice.
A Half-Century of Research Using Testing
What sorts of research using standardized tests have been conducted over the last 50 years? What sorts of mental attributes have been sought to be measured in this research? Table I presents data that help to answer these questions. Based on Tests in Print III (TIP III) (Mitchell, 1983), it lists the titles of tests that had 99 or more new citations in the TIP III (plus some other tests with particular relevance to education) and for each shows the number of published citations in Buros’ publications back as far as the second Mental Measurements Yearbook (MMY2) published in 1940. These tests account for the bulk of recently published literature concerning tests. TIP III contained almost 2,700 entries on commercially published tests and provided 12,170 new “references in English on the construction, validity and use” of the 2,700 tests listed in the volume (by “new” I refer to the fact that TIP III listed references that were not included in MMY8 published in 1978 or in earlier Buros publications). The 32 test titles listed in Table I account for half of these 12,700 references. Moreover, these 32 test titles account for more than one-third of the total of roughly 90,000 published references to tests compiled in the entire Buros series. Several patterns are apparent in Table I. First, note that many of the tests most often cited in recent literature have a considerable lineage, with references on many going back before mid-century. This would seem to be an indirect reflection of Buros’ point cited earlier that testing has progressed little in the last 50 years – though it should be noted that many of the test titles listed in Table I have been published in different editions over their lives. Sundberg (1954) conducted a similar analysis showing patterns of references for the 12 tests with most references in MMY4 (1953).
The top 5 in Sundberg’s list all are among the 20 tests with most new citations in TIP III – three decades later. Table I nevertheless indicates something of the ebb and flow of interest in different kinds of tests (and in particular test titles) in recent years. References concerning personality tests seem to have peaked in the 1970s, for example, with the number of references to the MMPI, the Rorschach, the Eysenck Personality Inventory, the Personality Research Form, and the Sixteen Personality Factor Questionnaire all reaching their maximum in MMY8 (1978) or earlier. Note too the appearance of tests aimed at young children in the 1970s (with references to both the Wechsler Preschool and Primary Intelligence Scale and the Bayley Scales of Infant Development appearing in Buros’ 1972 MMY7). This was no doubt an indirect reflection of the upsurge of interest in early childhood education in the 1960s with the founding of Head Start and the arguments by Hunt (1961) and Bloom (1964) that early childhood was a critical period for educational development.
Table I: Tests commonly cited in Buros’ MMY series
Number of references listed in Buros publications. The first numeric column combines the counts from MMY2 (1940), MMY3 (1949), MMY4 (1953), MMY5 (1959), MMY6 (1965), and PTR1 (1970); a hyphen marks no listed references.
Test title (and entry number in TIP III)                        MMY2–PTR1       MMY7     TIP II       MMY8    TIP III      Total
                                                                1940–1970       1972       1974       1978       1983 references
 1. Minnesota Multiphasic Personality Inventory (1498)               2460        831        549       1188        749       5777
 2. Wechsler Intelligence Scale for Children (2602)                   288        519        230        548        645       2230
 3. Wechsler Adult Intelligence Scale (2598)                          222        540        178        351        576       1867
 4. Peabody Picture Vocabulary Test (1771)                             21        202         78        213        301        815
 5. State-Trait Anxiety Test (2300)                                     -         20         45        268        277        610
 6. Wide Range Achievement Test (2621)                                 15         49         35        117        249        465
 7. Eysenck Personality Inventory (859)                                53        121        140        405        245        964
 8. Personality Research Form (1798)                                   13         27         23        132        116        311
 9. Stanford-Binet Intelligence Scale (2289)                          728        258        428        176        203       1793
10. Progressive Matrices (1914)                                       193        194        122        190        200        899
11. California Psychological Inventory (354)                          393        371        166        452        195       1577
12. Sixteen Personality Factor Questionnaire (2208)                   357        295        244        619        182       1697
13. Bender Gestalt Test (280)                                         429        192        144        253        159       1177
14. Rorschach (2030)                                                 3749        455        376        360        155       5095
15. College Board SAT (501)                                           121        298        148        217        152        936
16. Tennessee Self-Concept Scale (2413)                                30         88         90        384        120        702
17. Adjective Checklist (116)                                         102        131         85        202        117        637
18. Multiple Affect Adjective Check List (1547)                        28         60         56        102        108        354
19. Piers-Harris Children’s Self-Concept Scale (1831)                   -          8         10         95        107        220
20. Torrance Test of Creative Thinking (2512)                           -        243         88        229        107        667
21. Thematic Apperception Test (2491)                                1236        297        231        241        105       2110
22. Bayley Scales of Infant Development (270)                           -         20         11         28        101        160
23. Metropolitan Achievement Tests (1473)                              36         25         20         41         99        221
24. Strong-Campbell Interest Inventory (2318)                         614        485        133        289         99       1620
26. Iowa Tests of Basic Skills (1192)                                  17          -         87         58         97        259
32. Stanford Achievement Test (2206)                                   86         44         87         51         80        348
33. Wechsler Preschool and Primary Intelligence Scale (2608)            -         56         30         84         80        250
35. Gates-MacGinitie Reading Test (932)                                 -          -         18         34         77        129
38. Metropolitan Readiness Test (1479)                                 18        124         55        111         73        381
41. Otis-Lennon School Ability Test (1754)                              -          6         10         35         67        118
45. Comprehensive Test of Basic Skills (551)                            -          -          1         13         59         73
46. California Achievement Test (344)                                  41         32         28         33         58        192
TOTAL                                                                                                                      34654
Source: See Appendix 2.
Third, note that in terms of cumulative published references, two test titles clearly stand out: the Minnesota Multiphasic Personality Inventory and the Rorschach, each with over 5,000 cumulative references and no other test title having even half that number. Why should these tests be so widely cited? Surely it is not because of their proven validity and reliability as measurement instruments. As one reviewer of the Rorschach pointed out,
Certainly the validity research on the Rorschach does not warrant its popularity. Rather it seems it is the role the Rorschach has played within the psychodynamic oriented approach to psychopathology that has resulted in its popularity. Few instruments provide data so rich with hypothetical associations as does the Rorschach. When the goal of assessment is to formulate complex personality structures and complex dynamic interactions as the cause of observed behavior, the Rorschach elicits responses which can be multi-interpreted and combined in an endless set of associations to produce speculative complex hypotheses and interpretations. (Peterson, 1978, p. 1042)
If I may offer a uni-interpretation of this passage, it seems that Peterson feels that the Rorschach is popular not because it provides valid answers to specific questions, but because it multiplies the questions. This suggests that at least for some purposes standardized tests are valued not as valid and reliable measurement instruments per se, but because they provide provocative data that can be interpreted in many ways. References to the Rorschach in the Buros bibliographies have sharply declined since the 1960s, and though the MMPI still headed the list of tests with many new citations in TIP III, it appears that the MMPI may be starting to lose its edge in published citations. Also, a considerable proportion of the references to the Rorschach and the MMPI appear to come from the psychological and psychiatric literature, and hence may not provide a very good indication of test use in educational research. Heyneman and Mintz (1976) provide more relevant data. They reviewed some 3,500 proposals for research on children and youth submitted to a variety of federal agencies in fiscal year 1975. They identified about 500 titles of published tests (and another thousand titles that they could not locate in standard test reference bibliographies). They found, however, that relatively few academic achievement test batteries (5% of the test titles) accounted for over one-quarter of all research instruments proposed to be
used. The most commonly identified test titles (each with over 50 instances of proposed use) were the following:
Academic Achievement Batteries:
  Metropolitan Achievement Test
  Stanford Achievement Test
  California Achievement Test
  Wide Range Achievement Test
  Iowa Test of Basic Skills
  Comprehensive Tests of Basic Skills
  Tests of Basic Experiences
Individual (Intelligence):
  Peabody Picture Vocabulary Test
  Wechsler Intelligence Scale for Children
  Stanford Binet Intelligence Scale
  Bayley Scales of Infant Development
Whatever the public concern and litigation over intelligence testing in the 1970s, traditional intelligence tests still retained a prominent place in the armamentarium of researchers in the mid-1970s.
Research on Test Use in Schools
These data do not speak directly, however, to the issue of test use in schools. Dimengo (1978) reports on a survey of “94 test directors of large cities and counties” across the nation. Though only 80% responded, Dimengo reported data indicating that relatively few test titles were used in school systems’ basic testing programs. The tests used at any one grade level by eight or more systems were the following:
  Comprehensive Tests of Basic Skills
  Metropolitan Achievement Test
  Iowa Test of Basic Skills
  California Achievement Test
  Stanford Achievement Test
These findings seem to confirm those of Holmen and Doctor (1972) that sales of a handful of commercial test publishers account for “approximately three quarters of test sales in the country and for probably a higher percentage of educational test sales” (p. 33). But two other aspects of the Dimengo survey are worth noting. First, in marked contrast to the Heyneman and Mintz (1976) findings regarding tests used by researchers, large school systems seem by the late 1970s largely to have abandoned intelligence tests as components of their basic testing programs. Second, in addition to the commercially marketed tests listed above, the Dimengo survey identified state-sponsored tests in California and Florida as being used by five or more of the large school systems surveyed. It seems that with the growth of minimum competency testing and statewide
assessment programs modeled after the National Assessment of Educational Progress, state-sponsored tests are beginning to rival commercially published tests in terms of use in some local education agencies. These survey data on tests used in school systems do not indicate much about how tests are used in schools. Since mid-century there have been several studies of how tests are actually used (Goslin, 1967; Goslin, Epstein, & Hallock, 1965; Hastings, Runkel, & Damrin, 1961; Hastings, Runkel, Damrin, Kane, & Larson, 1960; Kellaghan, Madaus, & Airasian, 1982; Radwin, 1981; D. Resnick, 1981; L. Resnick, 1981; Salmon-Cox, 1981; Sproull & Zubrow, 1981; see also Rudman et al., 1980). Several themes emerge from this literature. First, school people often are not terribly knowledgeable about the technology of testing, with misinterpretations of grade-equivalent scores and even percentile scores quite common. Second, run-of-the-mill school testing with no clear consequences attached to test results often is seen as of little use. Sproull and Zubrow’s (1981) study of central school administrators, for instance, found that such administrators thought standardized test results to be not terribly useful to themselves. They tended to report that the benefits of testing accrue primarily to others, to teachers and principals at the building level, and to outside audiences, such as the federal government. Yet in a parallel study, Salmon-Cox (1981) reported that teachers do not find test results to be of much use: “Teachers desire diagnostic tests that are precise, closely matched to curricula and instruction and timely. Achievement tests of the kind now widely used do not match these criteria” (p. 634). Added to these two findings is the provocative finding from the Hastings et al. (1961) study, that the more school people learn about the technology of testing, the less weight they may place on test results in comparison to other information about students.
These findings would seem perplexing on at least two counts. First, they seem at odds with the repeated social controversies over testing since mid-century. Second, they would seem to run contrary to the notion widely publicized by Pygmalion in the Classroom (Rosenthal & Jacobsen, 1968) that test scores can directly influence teachers’ ratings of students, and teachers’ expectations in turn can influence student performance on IQ tests. The perplexity can be resolved as follows. First, let me address the Pygmalion issue. Rosenthal and Jacobsen (1968) reported results of an experiment in which teachers had been given test data on individual students in their classes, and these test data had affected teachers’ expectations, which in turn appeared to have affected students’ IQ test performance. The Pygmalion study was severely criticized on methodological grounds shortly after it was published (Elashoff & Snow, 1971). Cronbach (1975) even opines that “Pygmalion in the Classroom merits no attention as research” (p. 6) and suggests that the publicity of the study was due to the prevailing mood of public opinion and the propensities of the media to publicize controversial research findings without much attention to the quality of the research evidence. It is also true that several replications of the Pygmalion research have
failed to replicate its findings. But recent reanalyses of “Pygmalion” studies (Raudenbush, 1984) suggest that the effects of test scores on teachers’ opinions may be real under certain conditions. The conditions? When there has been little prior personal contact (defined in Raudenbush’s reanalyses as less than 2 weeks of classroom contact) between teacher and student, with Pygmalion effects particularly strong in grades 1, 2, and 7. Second, in resolving the apparent contrast between research on test use and the recent controversies over testing, note that almost all of the controversies have dealt not with run-of-the-mill school testing programs, but with testing that does, or at least appears to, have considerable consequence for individual test-takers (e.g., minimum competency testing for high school graduation, IQ testing for placement into EMR classes, and college admissions testing). Both explanations suggest two important points. One is that the social success of testing in many ways is a product of the bureaucratization of education. Testing seems not so important in the stuff of teaching and learning, where surely there must be much personal contact, but rather in the interstices of our educational institutions – entry into elementary school, placement in special classes, the transition from elementary to secondary school, high school leaving and college going. This suggests a second point, namely, that to make sense of the last century of reasoning about testing, we cannot rely on instrumental reason alone. Rather, to make sense of the historical role of testing, as Cohen and Rosenberg (1977) observed of the social functions of schooling, we must attend to not only the strictly instrumental functions of tests but also to their expressive qualities, such as feelings, values, and images expressed.
In this regard, it seems clear that attitudes and debates over testing are affected not just by prevailing social conditions and political attitudes – as Gould (1981) noted of the interpretations of the WWI test results and Cronbach (1975) noted of the original Pygmalion study – but in particular by attitudes toward the social role of schooling. At the elementary and secondary levels of schooling, it is quite widely accepted that the value of test information for the purposes of special placement should be judged in terms of the educational benefits derived from such special placement. Even as staunch a defender of mental testing as Arthur Jensen (1980) argues that the only justification for placement of students in special classes is evidence “that alternative treatments are more beneficial to the individuals assigned to them than if everyone got the same treatment” (p. 46). Analogous logic could be applied with regard to postsecondary admissions testing, but interestingly it generally is not. Debates about the value of postsecondary admissions testing generally are couched, by both proponents (such as ETS) and critics (such as Nader), in terms of the predictive value of such test information. Yet as Alexander Astin (1971) noted, “To defend the value of selective admissions on the grounds that aptitude tests and high school grades predict performance is perhaps to miss the main point of education” (p. 639). Astin suggests that the criterion
for determining the value of an admissions program ought to be whether students learn and acquire skills that are useful either to themselves or to society. Despite the cogency of Astin’s point and its similarity to Jensen’s warning about the value of test information in special placements at the elementary and secondary levels, debates over the utility of postsecondary admissions testing continue to focus very largely on the issue of predictive validity. Why? It seems as though the answer derives not from anything particular to standardized testing but rather from attitudes about the aims of education. When it comes to public schooling at the elementary (and perhaps to a somewhat lesser extent, the secondary) level, there is broad social commitment to educating all students to their full educational potential. But at the postsecondary level, the sorting and selection function comes to the fore, and hence the value of selection tests used at this level is judged quite differently than at the elementary level.
The Influence of Testing on Educational Practice
If how we think about the goals of education has affected thinking about the value of testing, it also is true, even if tests have not been shown to have much direct influence on the everyday activities of teachers and school administrators, that testing has had much indirect influence on education and how we think about it. Indeed, Cohen and Garet’s (1975) point about the impact of social science research on educational practice seems particularly apropos to the influence of testing and research involving testing on education: “Most policy-oriented research, at least in education, tends to influence the broad assumptions and beliefs underlying policies, not particular decisions” (p. 39). That this is so seems evident throughout the history of testing in the United States. The spelling tests used by Rice surely had far less impact on particular students and schools than Rice’s findings had on public perceptions of the efficacy of schools. The direct impact of the Binet scales on the placement of Parisian school children into special classes has apparently gone completely unrecorded, but Binet’s invention obviously has had tremendous impact on testing and how people think about testing in the United States. The actual use of the Army tests in WWI in the placement of recruits seems doubtful, but the WWI testing had much indirect impact, as we have seen, on both testing in the schools and assumptions about the intelligence of different social groups. Factor analysis research in the 1920s and 1930s surely had little if any influence on specific decisions, but had great impact in promoting views of differentiated, as opposed to unitary, intelligence. And in more recent years, myriad indirect effects of testing and research using testing are apparent on how we think about education, on how we think about the outcomes of schooling, equality of educational opportunity, and the competence of students and teachers.
Under certain conditions, it seems clear that testing can have much direct impact on educational practice. As Madaus (1981) pointed out in commenting on recent research on test use,
Whenever test results become a key element in important decisions that affect individual life chances (e.g., graduation from high school, grade-to-grade promotion, teacher salary or tenure decisions, school certification or the allocation of funds), the agency that administers the test assumes a great deal of power over the schooling process. When external tests are used in these ways, administrators, teachers and pupils take the results seriously and modify their behavior and attitudes accordingly. (p. 635)
Radwin (1981) presents a case of how this worked, when citywide reading testing in New York City had highly visible consequences for schools in that city. Another case is the emergence of proprietary coaching courses for college and graduate admissions tests (and recently, marketing of computer software tutorials to prepare for admissions tests). These counterexamples to findings of studies of test use serve to reemphasize a prior point. These are not instances of testing influencing the everyday stuff of teaching and learning in regular classrooms, but cases in which testing is linked to educational organization and administration. This indicates again that the social success of testing has been closely linked with the bureaucratization of education. Such bureaucratic use of testing is, of course, nothing new. In this country it goes back at least a century, to the civil service exams of the 1800s, and elsewhere at least several millennia, to the civil service examinations of the Chinese empire (DuBois, 1966). If bureaucratic sanctions, either real or imagined, are linked to particular testing programs, surely the testing can be highly influential. The problem is that when such sanctions are attached to tests, attention may devolve on the instruments themselves rather than on the substance of what the tests are intended to measure.
Testing Reasoning and Future Research
If this is so – that is, if the past suggests that the influence of tests on educational practice has been largely indirect, or if more direct, dependent on social conditions, institutional context, and sanctions attached to their use – what, then, can we suggest about future research regarding testing that might benefit educational practice? Wagner and Sternberg (1984) have argued that it is extremely hard to derive educational implications from the psychometric conception of intelligence (or reasoning). As long as testing is equated with psychometrics – the measurement of reasoning and other mental faculties – I think they are right. But more than anything else, we need to think of testing as more than measurement. Hence I would offer two suggestions for research on testing: first,
to promote thinking and better understanding of the currently prevailing sorts of testing, and second, to develop new types of tests that might better serve not just as measurement instruments, but also as spurs to reasoning. First, we need to develop ways of better understanding what it is that tasks posed on tests represent to individual test-takers. It is clear from unusual examples such as the pyramid problem described earlier and from experience interviewing individual children about their reasoning on specific questions posed on tests (Haney & Scott, 1978; Mehan & Wood, 1975; Powell, 1968), that test questions sometimes simply may not represent what was intended in terms of the type of reasoning elicited. Interviewing individuals about test questions can be terribly time consuming, but direct consideration of how people react to and think about test questions surely is a better method for finding out what test questions mean to those who answer them than is indirect investigation via statistical analysis of right and wrong answers selected, which are, of course, mere artifacts of the reasoning of real interest. This is something that anyone developing an individually administered test must do for him- or herself. But too few people who would use test results have themselves had the experience of trying to figure out what sort of task particular test questions represent for individual test-takers. Second, we need work directed at understanding the implications of reporting test results in various ways, with different labels, scales, and degrees of precision. Testing has been in several instances a victim of its own success. Perhaps the best example of this is in regard to the notions of IQ or general intelligence. Both terms represent useful concepts, but both have fallen into disfavor, it seems to me, not because there is anything inherently wrong with them or unuseful in the ideas they represent. 
Rather, current disfavor stems from the very success with which these terms have been disseminated into popular culture – to such an extent that they are widely misunderstood and frequently ill-used. Such problems have led some to substitute terms such as mental aptitude, cognitive ability, or academic achievement to avoid misunderstanding and ill-use (Green, 1974; Jencks & Crouse, 1982; Shrader, 1982; Slack & Porter, 1980). And to prevent people from believing overmuch in the aptitude/achievement distinction (which is, of course, quite ill-defined when it comes to tests so labeled, distinguishing, if anything at all, only the degree to which instruments are intended to be related to previous instruction, to cover general as opposed to specific skills, and to look into the future vs. the past; Anastasi, 1975), some would call tests with both labels simply ability tests to emphasize the continuity (Haney, 1982; Wigdor & Garner, 1982). Such changes are well-motivated, but the changed nomenclature risks exactly the same dangers that led to the problems. For in both cases, words from common parlance are being attached to a technology such that the common meanings of the words will almost inevitably color, or discolor, understanding of the technology. As long as words from the common lexicon are used to describe tests and test results,
misunderstandings inevitably will creep in, but rather than simply applying ad hoc linguistic remedies to recognized misunderstandings (or simply exhorting test authors to warn against misunderstanding, as the professional test Standards sometimes have done), some systematic investigation should be made into the implications of applying various labels and scales in reporting test results. Psychophysical methods that sociologists have used to investigate the dimensions of social standing (Coleman & Rainwater, 1978) or the card sort experiments used by Hastings et al. (1960) and Goslin (1967) could easily be adapted to such ends. A third area in which useful research might be undertaken is that of test scaling, in particular the development of scales that would better allow the comparison of test results with educational practice. The need for such scales is apparent in the continuing prominence of grade-equivalent scores. Such scores are widely misunderstood and because of this, as previously noted, the 1974 test Standards prescribed that they “should be abandoned or their use discouraged.” Yet at least one test publisher, in an apparent effort to adhere to this prescription, on abandoning the grade-equivalent scale in a new test edition, was forced to reinstitute it when the company discovered that nobody would buy the test without it (“Misleading Data,” 1977). The apparent reason is that of the scales typically offered in common achievement tests, the grade-equivalent is the only one that is directly interpretable in terms of educational practice, namely, grade placement, even if it is commonly misinterpreted. This suggests a real need for the development of better scales to allow test results to be more easily related to educational practice. An example of a relatively new test that offers such a scale is the Degrees of Reading Power (DRP) test. 
DRP results can be interpreted in terms of a readability scale, and because textbooks can be scored on a similar basis, results from this reading test can be compared directly to textbooks that are or might be used. Another idea would be the development of scales relating to typical standards of educational evaluation in schools, standards that would be more meaningful than grade placement. If predicted grade point average has proven a useful construct for the interpretation of college admissions test results, might something along these lines be useful below the college level? Also, latent trait scaling, for instance, as proposed to be used in the new National Assessment (Messick, Beaton, & Lord, 1983), offers possibilities for the development of new and useful scaling methods (Weiss, 1983). A fourth and probably the most potentially important area for new research is computerized testing (Weiss, 1978, 1983). The rapid development of increasingly powerful yet low-cost computers promises to have considerable impact on education (Papert, 1980; Toong & Gupta, 1982), and for this reason, if no other, computers will affect testing in the field of education. But the widespread availability of computing power holds specific potential for improving testing for three reasons. First, computerized testing holds the potential for narrowing the gap between the performance required on tests and the skills and knowledge of real educational interest. As previously
noted, testing has most impact on students and teachers when sanctions, real or perceived, are attached to performance on the test. But when this happens there is great danger that energy and attention will be drawn away from the real skills and knowledge of interest to focus on the instruments themselves. The problem is that large-scale testing programs invariably rely on the multiple choice answer format. Critics of multiple choice tests, such as Banesh Hoffmann (1962), have long argued that merely recognizing a correct answer on such a test is quite a different skill from, and encourages a form of reasoning quite different from, actually creating an answer to a problem for oneself. Passage dependency studies of multiple choice reading tests (indicating that even when the reading passage on which test questions ostensibly are based is missing, subjects can select the “correct” answer at far greater than chance levels) have lent considerable credence to such charges. Moreover, Frederiksen (1984) has reviewed a variety of evidence to argue that the predominance of the multiple choice format is “the real test bias” because multiple choice tests tend not to measure the more complex cognitive abilities. Computerized testing has the potential for freeing large-scale testing from the constraints of the multiple choice format, because via computer a student can be presented with a problem and have to create an answer for him- or herself, with the answer then weighed against a variety of answers or algorithms not apparent to the student when the problem is presented. Obviously such testing will not completely eliminate the discrepancy between the skills and independent reasoning that schooling seeks to impart to students and those that can be systematically tested; but at least it holds potential for narrowing the gap somewhat, and hence for reducing the distortions that testing sometimes prompts in educational practice.
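The idea of weighing a constructed answer against a variety of acceptable answers or algorithms can be made concrete with a minimal sketch. The Python fragment below is only an illustration under stated assumptions, not a description of any operational testing system: the function names (`normalize`, `parse_number`, `score_response`), the three kinds of answer key, and the sample responses are all hypothetical.

```python
from fractions import Fraction


def normalize(response: str) -> str:
    """Lower-case the response and collapse runs of whitespace."""
    return " ".join(response.lower().split())


def parse_number(response: str):
    """Read a response as an exact number (integer, decimal, or fraction).

    Returns None when the response is not numeric.
    """
    try:
        return Fraction(response.replace(" ", ""))
    except (ValueError, ZeroDivisionError):
        return None


def score_response(response, accepted_texts=(), accepted_values=(), checkers=()):
    """Return True if a free-form response matches any part of a hidden key.

    accepted_texts  -- hypothetical list of acceptable verbal answers
    accepted_values -- hypothetical list of acceptable numbers, given as strings
    checkers        -- hypothetical list of functions (text -> bool), standing in
                       for the "algorithms" against which an answer is weighed
    """
    text = normalize(response)
    if text in {normalize(t) for t in accepted_texts}:
        return True
    value = parse_number(text)
    if value is not None and any(value == Fraction(v) for v in accepted_values):
        return True
    return any(check(text) for check in checkers)


if __name__ == "__main__":
    # The student never sees the key; "3/4" and "0.75" are treated as equivalent.
    print(score_response("3/4", accepted_values=["0.75"]))
    print(score_response("2/3", accepted_values=["0.75"]))
```

In this sketch the key is applied only after the student has produced an answer, so nothing in the presentation of the problem offers options to recognize. Whether such matching rules are adequate for any real item is an empirical question the sketch does not settle.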
A second reason for viewing computerized testing with some optimism is that it offers the potential for rapid feedback about performance (Prestwood, 1978). Rapid feedback about performance is a desideratum of most theories of learning, but it is something that typical school testing programs have not offered. Results of both multiple choice exams and essay tests typically are returned to students only after days or even weeks of delay. Yet tests administered by computer offer the potential for results being returned to students virtually instantaneously, while both the problem and the student’s own strategy for solving the problem are well in mind. The tremendous storage capacity of media such as optical laser video disks (which can store the equivalent of 1,000,000 pages of text on a 12-inch disk) may make it economically feasible to have such large sets of items available that the return of feedback on particular items will not invalidate the reuse of the items, a concern behind many test publishers’ opposition to the truth-in-testing proposals in the 1970s. Third, computers offer the potential for testing to be more thoroughly and usefully integrated with instruction (Sleeman & Brown, 1982). Here I refer not just to the integration of instructional and testing sequences in the
same computer-delivered tutorial lesson, but also to the record-keeping and data manipulation power of computers which will allow teachers to reorganize and view test results rapidly and in ways that are relevant to what they currently might be teaching. In short, it will offer teachers both time and power to reason about what their students are learning. These, then, are four promising areas of research regarding testing: research into what test questions represent for individual test-takers, investigations of the meaning connoted by labels and scales used in reporting test results, development of scales for better relating test results to educational practice, and explorations of the potential of computerized testing. But the general rationale motivating these suggestions may be more important than the specifics. Throughout the history of standardized testing since the early part of this century, there seems to have been a general trend away from a unitary view of human intelligence and reasoning, toward more differentiated perspectives. Moreover, recent research (Gardner, 1983; Sternberg, 1977) indicates that human intelligence and reasoning are multifaceted, with different facets evident under different conditions and from different perspectives. Similarly, it would be well to view the testing of reasoning as a differentiated enterprise. In particular, the educational role of standardized testing can be enhanced by broadening our thinking about the ends and means of the endeavor. The testing of reasoning, after all, should mean not just the measurement of cognitive abilities, but also the challenging of thinking by both those who take the tests and those who would use their results.
Notes
1. Rice died in 1934, and Cremin (1964) points out that at the time of his death, Rice was “virtually unknown, remembered only – when at all – as one of the founders of the American testing movement” (p. 8). However, in 1966, as a result of what was billed as “the first symposium centered upon the history of educational research,” the Journal of Educational Measurement published papers on Rice as a “pioneer in progressive education, educational research and educational measurement,” with articles by Graham (1966), Stanley (1966), and Engelhart and Thomas (1966). As the latter point out, Horace Mann is another candidate for the appellation of pioneer in testing in the United States for his introduction of written as opposed to oral exams in the mid-1800s – except that Mann’s work did not stimulate further developments in testing until after the time of Rice.
2. Camfield (1969) points out that planning for the WWI testing began well before U.S. entry into the war. He also shows that the effort was motivated by a desire to establish psychology as a recognized science as well as to contribute to the war effort.
3. Accounts of the WWII testing differ drastically as to the numbers of men tested with the AGCT. Tuddenham (1962, p. 495) says it was “some four million.” Hale (1982, p. 23) puts it as more than 9 million, and Dailey (1953, p. 281), citing data from Science Research Associates (the firm that published a civilian version of the AGCT after the war), gives the figure of “more than 12 million.” Since more than 12 million men did serve in the U.S. Army in WWII, the higher of these figures would seem correct.
4. Though the 1937 revision of the Stanford-Binet had used the ratio IQ, only 2 years later David Wechsler, in the Wechsler-Bellevue intelligence scale, used the deviation score for calculating IQ scores. Wechsler dropped the ratio approach to defining IQ, both because IQ scores calculated in this way tended to vary with age, and because IQ as mental age divided by chronological age made little sense when applied to adults (see Matarazzo, 1972, pp. 90–120). It was not until the 1960 revision that the Stanford-Binet adopted a deviation IQ score.
5. This is an extremely brief account of the history of Title I evaluations. For more thorough descriptions, see McLaughlin (1975); Anderson, Johnson, Fishbein, Stonehill, and Burnes (1978); Cross (1979); and Reisner et al. (1982). For a broader survey of requirements for the evaluation of federally funded programs, see Boruch and Cordray (1980).
6. These and earlier comments on the evolution of the APA, AERA, and NCME test standards are of necessity very brief. For more extended discussion of the evolution of these standards, see Haney (1978) and Novick (1982). The latter describes the evolution of federal guidelines on employment testing as well as the professional test standards.
7. The pyramid problem had appeared on several previous versions of ETS tests, and after the discovery of the error, ETS, to its credit, went back and reanalyzed test data for the question. It found that even with a sample of 800,000 people who had taken the pyramid item, item discrimination analyses showed no indication of the faulty item (Wainer, 1983). Stricker (1982) performed a study to see if disclosing a form of the SAT previously taken by examinees had any apparent effect on scores when they subsequently took another form of the SAT. He found no discernible effect.
8. Shortly after the final decision in the Larry P. case, in a highly similar case, PASE v.
Hannon (1980), another federal judge reached a quite different decision as to the legality of using IQ tests for EMR placements. In light of sharp disagreement among experts, the judge in the PASE case decided to look at the questions on the IQ tests in dispute, and largely on the basis of this examination concluded that the tests were not biased and could continue to be used, together with other information, for the placement of Chicago youngsters. For more detailed information concerning the cases summarized above and others, see Psychological Corporation (1978), Jensen (1980), Wigdor (1982), and Hollander (1982).
9. This citation and others in the next paragraph refer to the pages in the Debra P. trial transcript.
References
Airasian, P., Madaus, G., & Pedulla, J. (1979). Minimal competency testing. Englewood Cliffs, NJ: Educational Technology Publications.
American Educational Research Association & National Council on Measurement in Education. (1955). Technical recommendations for achievement tests. Washington, DC: American Psychological Association.
American Psychological Association. (1953). Ethical standards of psychologists. Washington, DC: American Psychological Association.
American Psychological Association, American Educational Research Association, & National Council on Measurement in Education. (1966). Standards for educational and psychological tests and manuals. Washington, DC: American Psychological Association.
American Psychological Association, American Educational Research Association, & National Council on Measurement in Education. (1974). Standards for educational and psychological tests. Washington, DC: American Psychological Association.
American Psychological Association, American Educational Research Association, & National Council on Measurements Used in Education. (1954). Technical recommendations for psychological tests and diagnostic techniques. Washington, DC: American Psychological Association.
Amrine, M. (1965a). The 1965 congressional inquiry into testing: A commentary. American Psychologist, 20, 859–870.
Amrine, M. (1965b). Now is the time for all good mental cases to come to the aid ... of the psychologists. American Psychologist, 20, 989.
Anastasi, A. (1967). Psychology, psychologists and psychological testing. American Psychologist, 22, 297–306.
Anastasi, A. (1975). Harassing a dead horse (Review of D. R. Green [Ed.], The aptitude-achievement distinction: Proceedings of the second CTB/McGraw-Hill conference on issues in educational measurement). Review of Education, 1, 356–362.
Anderson, J., Johnson, R., Fishbein, R., Stonehill, R., & Burnes, J. (1978). The U.S. Office of Education models to evaluate E.S.E.A. Title I: Experiences after one year of use. Washington, DC: U.S. Office of Education, Office of Planning, Budgeting and Evaluation.
Angoff, W. H., & Dyer, H. (1971). The admissions testing program. In W. Angoff (Ed.), The College Board admissions testing program. New York: College Entrance Examination Board.
Astin, A. (1971). Open admissions and programs for disadvantaged students. Journal of Higher Education, 42, 620–647.
Ayres, L. (1909). Laggards in our schools. New York: Russell Sage.
Bailey, S. K., & Mosher, E. K. (1968). ESEA: The Office of Education administers a law. Syracuse, NY: Syracuse University Press.
Baker, F. (1971). Automation of test scoring, reporting, and analysis. In R. Thorndike (Ed.), Educational measurement. Washington, DC: American Council on Education.
Black, H. (1962). They shall not pass. New York: Random House.
Block, N., & Dworkin, G. (Eds.). (1976). The IQ controversy. New York: Pantheon.
Bloom, B. (1964).
Stability and change in human characteristics. New York: John Wiley and Sons.
Bollenbacher, J. (1976). The testing scene: Chaos and controversy. In Testing in the public interest. Proceedings of the 1976 ETS Invitational Conference. Princeton, NJ: Educational Testing Service.
Boring, E. G. (1950a). A history of experimental psychology (2nd ed.). New York: Appleton-Century-Crofts.
Boring, E. G. (1950b). History, psychology, and science: Selected papers. New York: John Wiley.
Boruch, R. F., & Cordray, D. S. (1980). An appraisal of educational program evaluations: Federal, state, and local agencies. Washington, DC: U.S. Department of Education.
Brigham, C. C. (1923). A study of American intelligence. Princeton, NJ: Princeton University Press.
Brown, R., & McClung, M. (1980). Searching for the truth about truth-in-testing legislation. Denver, CO: Education Commission of the States.
Buros, O. K. (Ed.). (1938). The nineteen thirty-eight mental measurements yearbook. Highland Park, NJ: Gryphon Press.
Buros, O. K. (Ed.). (1940). The nineteen forty mental measurements yearbook. Highland Park, NJ: Gryphon Press (reissued 1972).
Buros, O. K. (Ed.). (1953). The fourth mental measurements yearbook. Highland Park, NJ: Gryphon Press.
Buros, O. K. (Ed.). (1959). The fifth mental measurements yearbook. Highland Park, NJ: Gryphon Press.
Buros, O. K. (Ed.). (1961). Tests in print: A comprehensive bibliography of tests for use in education, psychology, and industry. Highland Park, NJ: Gryphon Press.
Buros, O. K. (Ed.). (1972). The seventh mental measurements yearbook. Highland Park, NJ: Gryphon Press.
Buros, O. K. (Ed.). (1978). The eighth mental measurements yearbook. Highland Park, NJ: Gryphon Press.
Callahan, R. E. (1962). Education and the cult of efficiency. Chicago: University of Chicago Press.
Camfield, T. M. (1969). Psychologists at war: The history of American psychology and the first world war. Doctoral dissertation, University of Texas at Austin. (University Microfilms No. 70–10, 766)
Carver, R. (1974). The two dimensions of tests: Psychometric and edumetric. American Psychologist, 29, 512–518.
Chapman, P. D. (1980). Schools as sorters: Lewis M. Terman and the intelligence testing movement, 1890–1930. Doctoral dissertation, Stanford University. (University Microfilms No. 80–11, 615)
Cicirelli, V. G., Cooper, W. H., & Granger, R. L. (1969). The impact of Head Start: An evaluation of the effects of Head Start on children’s cognitive and affective development. Washington, DC: Office of Economic Opportunity.
Cohen, D., & Garet, M. (1975). Reforming educational policy with applied social science research. Harvard Educational Review, 45(1), 17–43.
Cohen, D., & Rosenberg, B. (1977). Functions and fantasies: Understanding schools in capitalist America. History of Education Quarterly, 11, 113–137.
Coleman, J., Campbell, E. Q., Hobson, C. J., McPartland, J., Mood, A. M., Weinfeld, F. D., & York, R. L. (1966). Equality of educational opportunity. Washington, DC: U.S. Government Printing Office.
Coleman, J., Hoffer, S. N., & Kilgore, S. (1981). Public and private schools. Report to the National Center for Education Statistics by the National Opinion Research Center, Chicago.
Coleman, R., & Rainwater, L. (1978). Social standing in America: New dimensions of class. New York: Basic Books.
Comptroller General. (1977). Problems and needed improvements in evaluating Office of Education programs. Washington, DC: U.S. Government Printing Office.
Conant, J. B. (1961). Slums and suburbs: A commentary on schools in metropolitan areas. New York: McGraw-Hill.
Cremin, L. (1964). The transformation of the school. New York: Vintage.
Cronbach, L. J. (1975). Five decades of public controversy over mental testing. American Psychologist, 30, 1–14.
Cronbach, L., Ambron, S., Dornbush, S., Hess, R., Hornik, R., Phillips, D., Walker, D., & Weiner, S. (1982). Toward reform of program evaluation. San Francisco: Jossey-Bass.
Cronbach, L. J., Gleser, G. G., & Rajaratnam, N. (1972). The dependability of behavioral measurements. New York: John Wiley & Sons.
Cronbach, L. J., Rajaratnam, N., & Gleser, G. G. (1963). Theory of generalizability: A liberalization of reliability theory. British Journal of Statistical Psychology, 16, 137–163.
Cronbach, L., & Suppes, P. (Eds.). (1969). Research for tomorrow’s schools. New York: Macmillan.
Cross, C. T. (1979). Title I evaluation: A case study in congressional frustration. Educational Evaluation and Policy Analysis, 1(2), 15–21.
Cureton, E. E. (1939). The principal compulsions of factor analysts. Harvard Educational Review, 19, 287–295.
Dailey, J. T. (1953). Review of Army general classification test, first civilian edition. In O. K. Buros (Ed.), The fourth mental measurements yearbook. Highland Park, NJ: Gryphon Press.
David, F. N. (1968). Galton, Francis. In D. L. Sills (Ed.), International encyclopedia of the social sciences (Vol. 12, pp. 48–53). New York: Macmillan.
Debra P. v. Turlington, 474 F. Supp. 244 (M.D. Fla., 1979).
Debra P. v. Turlington, 644 F.2d 397 (5th Cir., 1981).
Debra P. v. Turlington, 564 F. Supp. 177 (M.D. Fla., 1983).
Debra P. v. Turlington, 1984. F. Supp. No. 83–3326.
Dimengo, C. (1978). Basic testing programs used in major school systems throughout the United States in the school year 1977–78. Akron, OH: Akron Public Schools Division of Personnel and Administration.
Donlon, T. F., & Angoff, W. H. (1971). The scholastic aptitude test. In W. Angoff (Ed.), The College Board admissions testing program (pp. 15–47). New York: College Entrance Examination Board.
DuBois, P. (1966). A test dominated society: China, 1115 B.C.–1905 A.D. In A. Anastasi (Ed.), Testing problems in perspective (pp. 29–36). Washington, DC: American Council on Education.
Dyer, H. (1977). Criticisms of testing: How mean is the median? Measurement in Education, 8, 1–10.
Elashoff, J., & Snow, R. (1971). Pygmalion reconsidered. Worthington, OH: Charles A. James.
Engelhart, M. D., & Thomas, M. (1966). Rice as the inventor of the comparative test. Journal of Educational Measurement, 3, 141–145.
Fiske, E. (1981a, April 14). Pyramids of test question 44 open a Pandora’s box. New York Times, p. C3.
Fiske, E. (1981b, March 24). A second student wins challenge on math test. New York Times, p. B1.
Fiske, E. (1981c, April 28). Soul searching in the testing establishment. New York Times, p. C1.
Florida should have lost testing case. (1984, April 27). Education USA.
Frederiksen, N. (1984). The real test bias: Influences of testing on teaching and learning. American Psychologist, 39, 193–202.
Gallup, G. (1974). Sixth annual poll of public attitudes towards education. Phi Delta Kappan, 55, 20–32.
Galton, F. (1869). Hereditary genius: An inquiry into its laws and consequences. London: Macmillan.
Gardner, H.
(1983). Frames of mind: The theory of multiple intelligences. New York: Basic Books.
Glaser, R. (1963). Instructional technology and the measurement of learning outcomes – some questions. American Psychologist, 18, 519–521.
Goslin, D. (1963). The search for ability. New York: Russell Sage.
Goslin, D. (1967). Teachers and testing. New York: Russell Sage.
Goslin, D., Epstein, R. R., & Hallock, B. A. (1965). The use of standardized tests in elementary schools. New York: Russell Sage.
Gould, S. (1981). The mismeasure of man. New York: Norton.
Graham, P. A. (1966). Joseph Meyer Rice as a founder of the progressive education movement. Journal of Educational Measurement, 3, 129–133.
Grant, G. (1970). Social science and public policy: A case study of the Coleman report. Doctoral dissertation, Harvard University Graduate School of Education.
Grant, W. V., & Eiden, L. J. (1982). Digest of educational statistics. Washington, DC: U.S. Government Printing Office.
Green, B. F. (1978). In defense of measurement. American Psychologist, 33, 664–670.
Green, D. R. (Ed.). (1974). The aptitude-achievement distinction. Monterey, CA: CTB/McGraw-Hill.
Gross, M. (1962). The brain watchers. New York: Random House.
Hale, M. (1982). History of employment testing. In A. Wigdor & W. Garner (Eds.), Ability testing (Vol. 2, pp. 3–38). Washington, DC: National Academy Press.
Haney, W. (1978). Standards for tests and test use (National Consortium on Testing Staff Circular No. 3). Cambridge, MA: The Huron Institute.
Haney, W. (1982). Disabilities of committee testing: A review. Educational Measurement: Issues and Practice, 1(3), 13–18.
Haney, W., & Scott, L. (1978). Talking with children about testing: A pilot study of test item ambiguity (National Consortium on Testing Staff Circular No. 7). Cambridge, MA: The Huron Institute.
Hanford, G. (1980). Testing the tests. Princeton, NJ: College Board Publications Office.
Hargadon, F. (1980). Two cheers. Princeton, NJ: College Board Publications Office.
Hastings, J., Runkel, P., & Damrin, D. (1961). Effects on use of tests by teachers trained in a summer institute. Urbana: Bureau of Educational Research, University of Illinois. (ERIC Document Reproduction Service No. 002 925)
Hastings, J., Runkel, P., Damrin, D., Kane, R., & Larson, G. (1960). The use of test results. Urbana: Bureau of Educational Research, University of Illinois.
Herrnstein, R. (1971, September). IQ. Atlantic Monthly, 228, 43–64.
Heyneman, S., & Mintz, F. (1976). The frequency and quality of measures utilized in federally sponsored research on children and adolescents. Washington, DC: George Washington University, Social Research Group.
Hobson v. Hansen, 269 F. Supp. 401 (D.D.C. 1967).
Hoffmann, B. (1961, March). The tyranny of multiple-choice tests. Harper’s Magazine, 222, 37–41.
Hoffmann, B. (1962). The tyranny of testing. New York: Collier.
Hollander, P. (1982). Legal context of educational testing. In A. Wigdor & W. Garner (Eds.), Ability testing (Vol. 2, pp. 195–231). Washington, DC: National Academy Press.
Holmen, M. G., & Doctor, R. F. (1972). Educational and psychological testing: A study of the industry and its practices. New York: Russell Sage.
Houts, P. (Ed.).
(1977). The myth of measurability. New York: Hart.
Hunt, J. (1961). Intelligence and experience. New York: Ronald.
Jacobson, R. (1981, March 30). “Discovery of second error poses threat to test,” College Board Chairman says. Chronicle of Higher Education, p. 4.
Jaeger, R., & Tittle, C. (Eds.). (1980). Minimum competency achievement testing: Motives, models, measures and consequences. Berkeley, CA: McCutchan.
Jencks, C., Bartlett, S., Corcoran, M., Crouse, J., Eaglesfield, D., Jackson, G., McClelland, K., Mueser, P., Olneck, M., Schwartz, J., Ward, S., & Williams, J. (1977). Who gets ahead: The determinants of economic success in America. New York: Basic Books.
Jencks, C., & Crouse, J. (1982, June). Should we relabel the SAT ... or replace it? Phi Delta Kappan, 63, 659–663.
Jencks, C., Smith, M., Acland, H., Bane, M. J., Cohen, D., Gintis, H., Heyns, B., & Michelson, S. (1972). Inequality: A reassessment of the effect of family and schooling in America. New York: Basic Books.
Jensen, A. (1969). How much can we boost IQ and achievement? Harvard Educational Review, 39, 1–123.
Jensen, A. (1980). Bias in mental testing. New York: Free Press.
Jensen, A. (1981). Straight talk about mental tests. New York: Free Press.
Joncich, G. (1968). The sane positivist: A biography of Edward L. Thorndike. Middletown, CT: Wesleyan University Press.
Kavruck, S. (1956). Thirty-three years of test research: A short history of test development in the U.S. Civil Service Commission. American Psychologist, 11, 329–333.
Kellaghan, T., Madaus, G. F., & Airasian, P. W. (1982). The effects of standardized testing. Boston: Kluwer-Nijhoff.
Kluger, R. (1977). Simple justice: The history of Brown v. Board of Education and black America’s struggle for equality. New York: Vintage.
Larry P. v. Riles, 343 F. Supp. 401 (N.D. CA., 1972).
Lazarus, M. (1981). Goodbye to excellence: A critical look at minimum competency testing. Boulder, CO: Westview.
Lerner, B. (1978). The Supreme Court and the APA, AERA, and NCME test standards: Past references and future possibilities. Paper presented at the semiannual conference of the National Consortium on Testing, Washington, DC.
Lerner, B. (1979). The war on testing: Detroit Edison in perspective. Princeton, NJ: ETS.
Lerner, B. (1980). The war on testing: David, Goliath and Gallup. The Public Interest, 60, 119–147.
Lindblom, C. E., & Cohen, D. K. (1979). Usable knowledge. New Haven, CT: Yale University Press.
Lippmann, W. (1922a, October). The mental age of Americans. New Republic, 32, 213–215.
Lippmann, W. (1922b, November). A future for tests. New Republic, 33, 9–11.
Madaus, G. (1981, May). Reaction to the ‘Pittsburgh’ papers. Phi Delta Kappan, 62, 634–639.
Madaus, G., Kellaghan, T., Rakow, E. A., & King, D. J. (1979). The sensitivity of measures of school effectiveness. Harvard Educational Review, 49, 207–230.
Matarazzo, J. (1972). Wechsler’s measurement and appraisal of adult intelligence. Baltimore: Williams and Wilkins.
McLaughlin, M. W. (1975). Evaluation and reform: The Elementary and Secondary Education Act of 1965. Cambridge, MA: Ballinger.
Mehan, H., & Wood, H. (1975). The reality of ethnomethodology. New York: Wiley-Interscience.
Messick, S., Beaton, A., & Lord, F. (1983). National Assessment of Educational Progress reconsidered: A new design for a new era. Princeton, NJ: Educational Testing Service.
Misleading data. (1977, May 7). New York Times, p. E19.
Mitchell, J. (Ed.). (1983). Tests in print III. Lincoln: University of Nebraska Press.
Mosteller, F., & Moynihan, D. P. (Eds.). (1972). On equality of educational opportunity. New York: Random House.
Nairn, N., & Associates. (1980). The reign of ETS: The corporation that makes up minds. Washington, DC: Learning Research Project.
National Merit Scholarship Corporation. (1978). Guide to the National Merit Scholarship Program. Evanston, IL: Author.
New York State. (1980, February 20). In the matter of a joint hearing on the subject of truth-in-testing. Proceedings before the Senate Committee on Higher Education and the Assembly Committee on Higher Education of the State of New York (unpublished stenographic record).
Novick, M. (1982). Ability testing: Federal guidelines and professional standards. In A. Wigdor & W. Garner (Eds.), Ability testing (pp. 78–98). Washington, DC: National Academy Press.
Packard, V. (1964). The naked society. New York: McKay.
Papert, S. (1980). Mindstorms: Children, computers and powerful ideas. New York: Basic Books.
PASE v. Hannon O. U.S.D.C. NILL: 49 U.S.L.W. 2087, July 7, 1980.
Pastore, W. (1949). The nature-nurture controversy. New York: King’s Crown Press.
Peterson, R. (1978). Review of Rorschach test. In O. Buros (Ed.), The eighth mental measurements yearbook. Highland Park, NJ: Gryphon Press.
Popham, J. (1978a). The case for criterion-referenced measurement. Educational Researcher, 7, 6–10.
Popham, J. (1978b). Criterion-referenced measurement. Englewood Cliffs, NJ: Prentice-Hall. Popham, J. (1980). Domain specification strategies. In R. Berk (Ed.), Criterion referenced measurement: The state of the art. Baltimore: Johns Hopkins. Popham, J., & Husek, T. (1969). Implications of criterion referenced measurement. Journal of Educational Measurement, 6, 1–9. Powell, J. (1968). The interpretation of wrong answers from a multiple choice test. Educational and Psychological Measurement, 28(2), 403–412. Prestwood, J. S. (1978). Effects of knowledge of results and varying proportions correct on ability test performance and psychological variables. In D. Weiss (Ed.), Proceedings of the 1977 computerized adaptive testing conference (pp. 105–115). Minneapolis: University of Minnesota. Psychological Corporation. (1978). Summaries of court decisions on employment testing. New York: Author. Radwin, E. (1981). A case study of New York City: Citywide reading testing program. Cambridge, MA: The Huron Institute. Raudenbush, S. (1984). Magnitude of teacher expectancy effects on pupil IQ as a function of the credibility of expectancy induction. Journal of Educational Psychology, 76(1), 85–97. Reader’s Guide to Periodical Literature. (1915–1983). (Vols. 3–42). New York: H.W. Wilson Co. Reisner, E., Alkin, M., Boruch, R., Linn, R., & Millman, J. (1982). Assessment of the Title I evaluation and reporting system. Washington, DC: U.S. Department of Education. Resnick, D. (1981, May). Testing in America: A supportive environment. Phi Delta Kappan, 62, 625–628. Resnick, D. (1982). History of educational testing. In A. Wigdor & W. Garner (Eds.), Ability testing (Vol. 2, pp. 173–194). Washington, DC: National Academy Press. Resnick, L. (1981, May). Introduction: Research to inform a debate. Phi Delta Kappan, 62, 623–624. Rosenthal, R., & Jacobson, L. (1968). Pygmalion in the classroom. New York: Holt, Rinehart and Winston.
Rudman, H., Kelly, J., Wanous, D., Mehrens, W., Clark, C, & Porter, A. (1980). Integrating assessment with instruction. A review (1922–1980). East Lansing: Institute for Research on Teaching, Michigan State University. Salmon-Cox, L. (1981, May). Teachers and standardized test: What’s really happening? Phi Delta Kappan, 62, 631–634. Sarason, S. (1976). The unfortunate fate of Alfred Binet and school psychology. Teachers College Record, 77, 579–592. Shrader, W. (Ed.). (1982). New directions in testing and measurement: Measurement, guidance and program improvement (No. 13). San Francisco: Jossey-Bass. Slack, W. V., & Porter, D. (1980). The scholastic aptitude test: A critical appraisal. Harvard Educational Review, 50(2), 154–175. Sleeman, D., & Brown, J. S. (Eds.). (1982). Intelligent tutoring systems. New York: Academic Press. Solomon, R. (1981, March 10). Truth-in-testing is (A) (B) (C). New York Times, p. A19. Spearman, C. (1904). The proof and measurement of association between two things. American Journal of Psychology, 15, 72–101. Spearman, C. (1910). Correlation calculated from faulty data. British Journal of Psychology, 3, 271–295. Sproull, L., & Zubrow, D. (1981, May). Standardized testing from the administrative perspective. Phi Delta Kappan, 62, 628–631. Stanley, J. C. (1966). Rice as a pioneer educational researcher. Journal of Educational Measurement, 3, 135–139.
Stanley, J. C. (1971). Reliability. In R. L. Thorndike (Ed.), Educational measurement (2nd ed.). Washington, DC: Council on Education. Starch, D., & Elliot, E. C. (1912). Reliability of grading high school work in English. School Review, 20, 442–457. Starch, D., & Elliot, E. C. (1913). Reliability of grading high school work in mathematics. School Review, 21, 254–259. Sternberg, R. J. (1977). Intelligence, information processing and analogical reasoning. Hillsdale, NJ: Erlbaum. Strenio, A. J. (1979). The debate over open versus secure testing: A critical review. Cambridge, MA: The Huron Institute. Stricker, L. (1982). Test disclosure and retest performance on the scholastic aptitude test. (Report No. 82–7). New York: College Entrance Examination Board. Sundberg, N. (1954). A note concerning the history of testing. American Psychologist, 9, 150–151. Suppes, P. (Ed.). (1978). Impact of research on education. Washington, DC: National Academy of Education. Tallmadge, G. K., & Wood, C. T. (1976). User’s guide: ESEA Title I evaluation and reporting system. Mountain View, CA: RMC Corporation. Terman, L. (1922). The great conspiracy: The impulse of intelligence testers psychoanalyzed and exposed by Mr. Lippman. New Republic, 23, 116–119. (Reprinted in Block, N. J., & Dworkin, G., Eds., The IQ controversy. New York: Random House, 1976, pp. 30–38.) Toong, H., & Gupta, A. (1982). Personal computers. Scientific American, 247(6), 87–107. Tuddenham, R. (1962). The nature and measurement of intelligence. In L. Postman (Ed.), Psychology in the making (pp. 469–525). New York: Knopf. 22,000 scores revised after error is detected on law school exam. (1981, May 7). New York Times, p. A29. U.S. Bureau of the Census. (1975). Historical statistics of the United States, colonial times to 1970 (Vols. 1 & 2). Washington, DC: United States Government Publications Office. U.S. Congress. (1980a). The educational testing act of 1979. 
(Hearings before the Subcommittee on Elementary, Secondary, and Vocational Education). Washington, DC: U.S. Government Printing Office. U.S. Congress. (1980b). Truth in testing act of 1979: The educational testing act of 1979. (Hearings before the Subcommittee on Elementary, Secondary, and Vocational Education). Washington, DC: U.S. Government Printing Office. U.S. Senate Select Committee on Equal Educational Opportunity. (1972). Environment, intelligence and scholastic achievement. Washington, DC: U.S. Government Printing Office. Wagner, R. K., & Sternberg, R. J. (1984). Alternative conceptions of intelligence and their implications for education. Review of Educational Research, 54(2), 179–223. Wainer, H. (1983). Pyramid power: Searching for an error in test scoring with 830,000 helpers. American Statistician, 37(1), 87–91. Walker, H. M. (1929). Studies in the history of statistical method. Baltimore: Williams & Wilkins. Weiss, D. (Ed.). (1978). Proceedings of the 1977 computerized adaptive testing conference. Minneapolis: University of Minnesota. Weiss, D. (Ed.). (1983). New horizons in testing: Latent trait theory and computerized adaptive testing. New York: Academic Press. Wigdor, A. (1982). Psychological testing and the law of employment discrimination. In A. Wigdor & W. Garner (Eds.), Ability testing (Vol. 2, pp. 39–69). Washington, DC: National Academy Press. Wigdor, A., & Garner, W. (Eds.). (1982). Ability testing: Uses, consequences, and controversies (Vols. 1 & 2). Washington, DC: National Academy Press.
Williams, R. L., Mosby, D., & Hinson, V. (1976). Critical issues in achievement testing of children from diverse educational backgrounds (paper presented at the Invitational Conference on Achievement Testing of Disadvantaged and Minority Students for Educational Program Evaluation.) Washington, DC: U.S. Office of Education. Wirtz, W. (1977). On further examination: Report of the panel on scholastic aptitude test score decline. Princeton, NJ: College Entrance Examination Board. Wirtz, W., & Lapointe, A. (1982). Measuring the quality of education: A report on assessing educational progress. Washington, DC: Wirtz & Lapointe. Wolf, T. (1961). An individual who made a difference. American Psychologist, 16, 245–248. Wolf, T. (1973). Alfred Binet. Chicago: University of Chicago Press. Yerkes, R. M. (Ed.).(1921). Psychological examining in the United States Army. Memoirs of the National Academy of Sciences (Vol. 15).
Appendix 1 (to Figure 1)
Source. Readers’ Guide to Periodical Literature, Vols. 3–42. New York: H. W. Wilson, 1915–1983. In its first three volumes, the Readers’ Guide was published at 5-year intervals. Beginning in 1921 it was published triennially, and starting around 1937 biennially, though the actual period of time covered varied slightly. Since 1966, the Readers’ Guide has been issued annually. Hence the data used in preparing Figure 1 were average annual entries, to account for volumes spanning different lengths of time. Also, the presentation of these data in graphic form in Figure 1 entailed some smoothing to eliminate year-to-year variations. The average annual number of listings in the Readers’ Guide under various rubrics is, however, an imperfect indicator of interest in these topics for several reasons. First, topic headings under which articles are listed have varied somewhat over the years. The heading “Educational Tests and Measurements” did not appear until the volume spanning 1957–59. Thus for previous years, I substituted the preexisting rubric “Educational Measurements.” Also, listings under “IQ” have been discontinued in the last two volumes of the Guide. In addition, changes in the number of articles in “periodicals of general interest” (Preface, Readers’ Guide, Feb. 1978) may reflect not just changes in relative interest in particular topics, but also changes in the volume of articles published and indexed in the Guide. In this regard, it should be noted that the number of pages in the Readers’ Guide per year covered has increased, from about 1,800 pages in volume 5 (covering 3 years, or roughly 600 pages per year) to 1,500 pages in volume 39 (covering a single year). Inconsistency in the indexing system and overlap are possible sources of omission and redundancy. One further problem is that the domain of literature indexed in the Guide changes periodically.
The exact basis for periodical selection is not described for earlier volumes, but since the early 1950s, selection of periodicals for indexing has been done by a group of librarians, the committee on Wilson indexes, drawing on advice from subscribers. Over the period depicted in Figure 1, the number of periodicals indexed has varied from roughly 100 to 130, though the number of periodical titles dropped or added in consecutive volumes rarely seems to have exceeded 10. Nevertheless, one change in periodical coverage should be noted. In volume 19 (covering
April 1953–February 1955), three education journals were dropped from coverage, and this change may partially account for the drop in articles indexed on testing in the mid-1950s. As a result of such complications, the listings in Readers’ Guide provide only a crude indication of the ebb and flow of interest in particular topics.
Appendix 2 (to Table I)
This table was developed using Mitchell (1983), as follows. Test titles were identified using Mitchell’s Table 2 (pp. xxvii–xxviii), and then the number of references for each of these titles in each of the Buros publications was determined by looking up each test title using the TIP III test entry number given after each test title. One discrepancy between Mitchell’s Table 2 and our Table 1 should be noted. As the test with the sixth greatest number of references since the eighth MMY, Mitchell lists the Wide Range Employability Test Sample, whereas we have listed the Wide Range Achievement Test. Consulting the appropriate test entries makes it apparent that the TIP III listing was simply a typographical error and that the WRAT was the test title intended.
SAGE DIRECTIONS IN EDUCATIONAL PSYCHOLOGY
SAGE LIBRARY OF EDUCATIONAL THOUGHT AND PRACTICE
SAGE DIRECTIONS IN EDUCATIONAL PSYCHOLOGY VOLUME V
Edited by
Neil J. Salkind
Introduction and editorial arrangement © Neil J. Salkind 2011 First published 2011 Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act, 1988, this publication may be reproduced, stored or transmitted in any form, or by any means, only with the prior permission in writing of the publishers, or in the case of reprographic reproduction, in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers. Every effort has been made to trace and acknowledge all the copyright owners of the material reprinted herein. However, if any copyright owners have not been located and contacted at the time of publication, the publishers will be pleased to make the necessary arrangements at the first opportunity. SAGE Publications Ltd 1 Oliver’s Yard 55 City Road London EC1Y 1SP SAGE Publications Inc. 2455 Teller Road Thousand Oaks, California 91320 SAGE Publications India Pvt Ltd B 1/I 1, Mohan Cooperative Industrial Area Mathura Road New Delhi 110 044 SAGE Publications Asia-Pacific Pte Ltd 33 Pekin Street #02-01 Far East Square Singapore 048763 British Library Cataloguing in Publication data A catalogue record for this book is available from the British Library ISBN: 978-0-85702-178-6 (set of five volumes) Library of Congress Control Number: 2010923776 Typeset by Mukesh Technologies Pvt. Ltd., Pondicherry, India. Printed on paper from sustainable resources Printed by MPG Books Group, Bodmin Cornwall
Contents
Volume V
Section IV: Research Design, Measurement and Statistics and Evaluation (Continued)
73. Magnitudes of Experimental Effects in Social Science Research
    Lee Sechrest and William H. Yeaton
74. Hypothesis Testing in Relation to Statistical Methodology
    Cherry Ann Clark
75. On Examinee Choice in Educational Testing
    Howard Wainer and David Thissen
76. Historical Views of Invariance: Evidence from the Measurement Theories of Thorndike, Thurstone, and Rasch
    George Engelhard, Jr
77. If Statistical Significance Tests Are Broken/Misused, What Practices Should Supplement or Replace Them?
    Bruce Thompson
78. Musical Aptitude Testing: From James McKeen Cattell to Carl Emil Seashore
    Jere T. Humphreys
79. The Life and Labors of Francis Galton: A Review of Four Recent Books about the Father of Behavioral Statistics
    Brian E. Clauser
80. Regression towards the Mean, Historically Considered
    Stephen M. Stigler
81. Karl Pearson and Statistics: The Social Origins of Scientific Innovation
    Bernard J. Norton
82. A History of Effect Size Indices
    Carl J. Huberty
83. The Role of Assessment in a Learning Culture
    Lorrie A. Shepard
84. The Place of Theory in Educational Research
    Patrick Suppes
85. Curriculum-based Measures: Development and Perspectives
    Stanley L. Deno
86. Tests as Research Instruments
    Robert L. Thorndike
87. My Current Thoughts on Coefficient Alpha and Successor Procedures
    Lee J. Cronbach and Richard J. Shavelson
88. Handbook of Evaluation Research
    Lee Ross and Lee J. Cronbach
89. A Model for Studying the Validity of Multiple-Choice Items
    Lee J. Cronbach and Jack C. Merwin
90. Assisted Assessment: A Taxonomy of Approaches and an Outline of Strengths and Weaknesses
    Joseph C. Campione
91. Standardized Testing
    Roger T. Lennon
92. The Place of Statistics in Psychology
    Jum Nunnally
93. Education in Statistics and Research Design in School Psychology
    Steven G. Little, Howard B. Lee and Angeleque Akin-Little
94. The Role of Measurement Error in Familiar Statistics
    Malcolm James Ree and Thomas R. Carretta
95. Qualitative Methods and the Development of Clinical Assessment Tools
    Jane F. Gilgun
Section IV: Research Design, Measurement and Statistics and Evaluation (Continued)
73
Magnitudes of Experimental Effects in Social Science Research
Lee Sechrest and William H. Yeaton
A problem of which researching psychologists have been aware for years (e.g., Bolles and Messick, 1958; Savage, 1957) but that has had increasing attention over the past decade or so is how to determine how large an effect is achieved by an experimental intervention. Especially for psychologists working in applied areas it is important to know more than that a treatment produces a statistically significant main effect. However, even for theoretical problems it is at least enlightening and often sobering to find out how much of an effect is at stake during the intricacies of a theoretical controversy. For example, Smith (1980) found a .25 standard deviation sex bias effect in published studies in counseling and psychotherapy, a finding that accounts for less than 2% of the variance in results. Unfortunately, a relationship of that magnitude is likely to have little, if any, practical significance.1 A number of suggestions about ways of estimating the magnitude of an experimental effect have been proposed, and clearly some of them are useful – or at least better than relying solely on statistical significance as a criterion. Yet we believe that there are serious shortcomings with existing approaches and that alternatives need to be invented and investigated. In our view, the problems are of sufficient complexity that no one solution will suffice, and multiple approaches will be required.
Source: Evaluation Review, 6(5) (1982): 579–600.
Statistical Significance
Perhaps for want of a better device, authors often resort to statistical significance as an index of effect size, often implying that there is at least a fairly direct relationship between the statistical significance of a finding and its importance in the real world.2 Thus, for example, it is fairly common to find authors noting that a finding is “highly” significant or “very” significant, or reporting p values to four, five, and even six decimal places, as if something of critical importance were contained in those several zeros preceding the final digit. Statistical significance in fact depends on several factors quite unrelated to the magnitude of the experimental effect. First, whether a finding is statistically significant depends on the alpha level one sets (traditionally, .05). The statistical significance of a finding is also a function of sample size, although the relationship of significance level and sample size is nonlinear, being in general a function of $\sqrt{N}$. Consequently, studies with larger numbers of subjects yield smaller p values for equal experimental effects, though sample size tends to be inversely related to the percentage of variance accounted for (Craig et al., 1976). It is also the case that whether one obtains a significant effect at all is a function of the level chosen for beta, the arbitrarily defined probability of Type II errors and the resulting power ($1 - \beta$) of a particular experiment to reject the null hypothesis. It is, of course, true that statistical significance is very much a function of effect size. All other things kept equal, a larger difference between means will be associated with a lower p value.3 Again, however, the relationship is nonlinear, and it is hazardous to make comparisons of significance levels across experiments or even across treatments within an experiment.
To take the latter, a less obvious case, if in comparison to a control group one treatment is significant at the .01 level while another treatment is significant at only the .05 level, it is tempting to conclude that the former is a stronger treatment. For that to be a legitimate conclusion, the difference between means would have to be larger and the error term the same. And to prove true generalizability, one would have to defend the proposition that the two treatments were implemented with equal care and precision, a point to which we will return.
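The dependence of a p value on sample size for a fixed effect can be illustrated with a minimal sketch. It assumes two equal-sized groups and a large-sample normal approximation to the test of a difference between means; the function name and the sample sizes are hypothetical choices made only for illustration.

```python
import math

def two_sided_p_from_d(d, n_per_group):
    """Approximate two-sided p value for a two-group mean difference.

    Assumes equal group sizes and a normal approximation, so that the
    test statistic is z = d * sqrt(n / 2), where d is the mean
    difference in standard deviation units.
    """
    z = abs(d) * math.sqrt(n_per_group / 2.0)
    # For a standard normal Z, P(|Z| > z) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))

# The same quarter-standard-deviation effect at four sample sizes.
for n in (10, 40, 160, 640):
    print(n, two_sided_p_from_d(0.25, n))
```

Under these assumptions the effect is identical in every row, yet the p value shrinks steadily as the number of subjects grows, which is why a p value by itself says little about magnitude.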
What Is an Experimental Effect?
Perhaps it would be desirable at this point to clarify what we mean by an experimental effect and also to indicate how our arguments may be extended to nonexperimental research findings. In experimental research an effect is usually reflected in the difference(s) between measures of central tendency for different experimental groups, or at least that is the simplest case. Thus, in a simple two-group (E and C) design with posttest measures only,
the magnitude of the experimental effect is given by: $\mathrm{Eff}_{\mathrm{mag}} = \bar{X}_E - \bar{X}_C$. If the data are categorical and are cast in the form of a contingency table analysis, say for volunteer donors and nondonors within experimental and control groups, the experimental effect is the excess of observed versus expected donors in the experimental as compared to the control condition. If the data are analyzed by regression analysis, the experimental effect is the regression coefficient associated with an E versus C dummy variable. When we speak of experimental effects, we have in mind a raw estimate or value that cannot, until processed in some kind of index, be compared across studies, or even across different main effects within the same experiment. To be concrete, the effect produced by an analgesic is the difference in measured headache distress between the experimental and placebo drug groups. If, on the average, control subjects report headaches of 68 on a 100-point scale and experimentals report headaches of only 53, then the experimental effect is 68 − 53 = 15 points of pain reduction. Extension of the above ideas to subject variables is straightforward. The “effect” of sex of subject on a dependent variable is the difference between the means for male and female subjects. However, it is also possible to think about correlational results within the same framework. A correlation between two variables is the “effect” of one on the other, and the difference between two correlations enables one to compare two different effects – for example, to answer the question whether the relationship of family size to intelligence is greater than the relationship of socioeconomic status to intelligence.
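The equivalence between the difference in group means and the coefficient on an E versus C dummy variable can be checked with a short sketch. The individual scores below are invented so that the group means match the analgesic example (68 for placebo, 53 for the drug); the function names are likewise hypothetical.

```python
from statistics import mean

def mean_difference(experimental, control):
    """Raw experimental effect: experimental mean minus control mean."""
    return mean(experimental) - mean(control)

def dummy_slope(experimental, control):
    """Least-squares slope of the outcome on a 0/1 dummy (1 = experimental)."""
    x = [1.0] * len(experimental) + [0.0] * len(control)
    y = list(experimental) + list(control)
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Made-up headache ratings on a 100-point scale.
placebo = [70, 66, 68, 71, 65]   # mean 68
drug = [55, 50, 53, 56, 51]      # mean 53

print(mean_difference(drug, placebo))
print(dummy_slope(drug, placebo))
```

Both calls give a difference of 15 points in favor of the drug (a negative sign here, because lower ratings mean less pain), which is the raw effect before any standardization.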
What Is the Problem in Estimating Effect Size?
Having just defined what is meant by an effect, we can now turn to the problem of trying to see how big an effect is, of estimating the magnitude of an experimental effect. The problem is not a simple one, and it has generally been either ignored or treated as if it were simple. Some illustration may help.
1. Suppose a group of children exposed to an early childhood educational enrichment program show a five-point superiority over a control group on a 100-item achievement test. The experimental effect is five points, but how much are those five points worth? Is the program really a good one? Should congressmen vote money to implement it on a nationwide basis? The critical questions cannot be answered.
2. An investigator finds that there is a correlation of .32 between income and utilization of the services of mental health specialists. Is that correlation grounds for supposing that persons with low incomes are being seriously shortchanged in their access to mental health services? Is it a large enough relationship to demand action, or even further study? There is no way of saying – at least not right now.
3. How many lives would an experimental medical unit have to save to be considered impressive? A difference of one life could never be statistically significant, but would five lives or twenty lives be grounds for elation over an operation period of one year? Probably it all depends, but how much and on what (see Rhoads, 1978)?
The foregoing examples are meant to convey the sense of uncertainty, often bordering on the absurd, that must afflict any insightful and honest investigator if asked what his or her results really mean in a practical sense. We believe that the same sense of uncertainty should often afflict even the investigator of a theoretical problem, but it is characteristic of theoretical research in psychology to ignore the issues of significance in other than a statistical sense. Because we do not wish to single out any particular investigation, we do not provide specific references here, but as an example note that one theoretical investigation with statistically significant results hung on the differences in ratings of objects on a five-point scale, with obtained differences being .22 and .25 points for two effects! One scarcely knows whether to conclude that the theory is woefully weak to produce such small differences or impressively powerful to be able to predict them.
A prevalent concept of effect size involves the notion of “accounting for” or “explaining” variance. Presumably a large experimental effect is reflected in a large index, of whatever nature, of variance accounted for.
What Does It Mean to “Account for” Variance?
Nearly all statistics books are vague at best when it comes to explaining what is meant by “accounting for” variance. We have even encountered such circular explanations as that to account for variance means to explain it, with explaining it meaning to be able to account for it. Cohen and Cohen (1975), however, present a clear, understandable exposition of what is entailed in accounting for variance. Space does not permit elaboration here (see Sechrest and Yeaton, 1981c, for a detailed discussion), but put simply, accounting for variance means that one is able to reduce the variance in one’s errors of predictions of scores by applying some knowledge more precise than that a person is a member of a population. In the experimental paradigm, knowing merely that a person was a subject in an experiment enables no better prediction of his or her standing on a dependent variable than the overall mean for the sample. However, if an experimental treatment accounts for some of the total variance, that means that by knowing which treatment condition a subject was in, a better prediction, the mean for the condition, can be made, and variance of errors of prediction will be reduced from the variance of the total sample to some lower value. It will be noted that accounting for 25% or so of the variance in some scores does not make a great deal of difference in the standard deviation of
errors of prediction. Let us suppose that a population exists for which the mean intelligence score is 100, and let us suppose that the standard deviation of those scores is 15, which results in a variance of 225. Accounting for 25% of the variance reduces the standard deviation of errors of predicted intelligence only from 15 to about 13, a less impressive feat than is usually imagined.
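The arithmetic behind the intelligence example can be written out as a minimal sketch, assuming that the unexplained variance is simply the total variance multiplied by one minus the proportion accounted for. The function name is hypothetical.

```python
import math

def sd_of_prediction_errors(sd_total, proportion_explained):
    """Standard deviation of prediction errors after some variance is explained.

    Residual variance is taken as sd_total**2 * (1 - proportion_explained).
    """
    if not 0.0 <= proportion_explained <= 1.0:
        raise ValueError("proportion_explained must lie between 0 and 1")
    return sd_total * math.sqrt(1.0 - proportion_explained)

# Intelligence scores with SD 15 (variance 225), 25% of variance explained.
print(sd_of_prediction_errors(15.0, 0.25))
```

With 25% of the variance accounted for, the residual variance is 168.75 and its square root is just under 13, so the error spread shrinks by only about two points.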
Estimating Size of Effects
The problem of estimating the magnitude of an experimental effect, as we noted, has been recognized for some time, although not so widely that it has achieved any degree of prominence in the research literature (e.g., Soderquist and Hussian, 1978). However, as early as 1935, Kelly developed the correlation measure $\varepsilon^2$ (epsilon squared), which could be used to estimate effect size, though it was Bolles and Messick (1958) who made one of the earliest attempts to deal with the issue of estimating substantive significance, proposing the coefficient of utility U for use in statistically assessing the usefulness of specific experimental variables. Underlying the rule-of-thumb approaches soon to be discussed is the notion that some index of effect size can be devised for which useful, if arbitrary, comparisons may be made of the results of different experiments. We emphasize in advance, however, the arbitrary nature of the rules, for none of them speaks in a direct way to the issue of social or practical importance of findings, only to relative size of effects.
Cohen’s Rule of Thumb
As much as any other person, Cohen has been responsible for bringing the issue of effect size to the attention of the social science community (Cohen, 1977). In an article published in 1962, Cohen analyzed research reports in a complete volume of the Journal of Abnormal and Social Psychology in an attempt to determine the statistical power with which each analysis might confront the null hypothesis. Since this seminal article, several researchers have conducted power analyses in other disciplines (e.g., Brewer [1972] in education, Katzer and Sodt [1973] and Chase and Tucker [1975] in communications research, and Chase and Chase [1976] in applied psychological research). In Cohen’s article it was necessary for him to set an effect size in order to estimate power. Cohen simply stated that some effect sizes represented “small,” “medium,” and “large” effects and confessed that the values he chose were arbitrary but seemed “reasonable,” urging readers to render their own judgment on the matter. Cohen wished to establish a set of metrics for effect sizes that would make it possible to compare effect sizes across experiments. Moreover, he aimed to
make it possible to compare effect sizes across different statistics so that, for example, one could estimate whether an experimental result estimated by a chi-square test for difference between proportions is less or greater than a result estimated by a t-test for difference between means. Cohen’s classification of effect sizes for difference between means as small, medium, and large was based on the ratio $(M_1 - M_2)/\sigma$. The specific values he chose were .25, .50, and 1.00 for small, medium, and large, respectively, though more recently, Cohen (1977) has reduced the initial ratios to .20, .50, and .80. These values for effect sizes are now being cited with some frequency in the literature, despite the fact that they have no compelling rationale other than that they seemed like a good idea at the time. We note that what Cohen means by “small” effect is likely to be really a small effect size in any practical sense. A difference between means of only $.2\sigma$ represents about 1% of the variance in the dependent variable, and even a “large” effect of $.8\sigma$ represents only about 14% of the variance in the dependent variable. To put it another way, Cohen’s small effect size would be reflected in a correlation of only .10 and a large effect in a correlation of only .37. Those are limited aspirations, indeed. Somewhat astoundingly, however, Cohen (1973b) notes that researchers are often implicitly testing for effect sizes smaller than what he has defined as small! The big advantage of Cohen’s rule of thumb is that effect size, by his procedure, is standardized and hence independent of specific population or sample values. One can compare the relative effects of manipulations as diverse as psychotherapy and demand characteristics on dependent variables as diverse as IQ and reduction in cigarette smoking.
Since Cohen’s rule of thumb is standardized, one can, for purposes of statistical power analysis, state an anticipated effect size and do power analysis without the necessity for estimating population variance that would otherwise be required. Glass and his colleagues (Glass et al., 1981; Smith and Glass, 1977; Smith et al., 1980) have provided a practical application of Cohen’s method of standardizing results in their meta-analyses of psychotherapy and drug outcome studies. They converted relevant effects in several hundreds of studies to standard deviation units; that is, effects were expressed in terms of a fraction of a standard deviation of difference between experimental and control groups. From these quantitative syntheses they found, for example, that systematic desensitization results in an average effect of $.48\sigma$. The meta-analytic approach, however, does not address the fundamental question of the value of the resulting effect. We are still uncertain whether $.75\sigma$ or $.40\sigma$ or any other fraction of a standard deviation difference in a dependent measure is worth the money and effort required to produce it. What can a group of children whose math scores are at the fiftieth percentile do that a group whose scores are at the fortieth percentile cannot do?
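The correlations and percentages of variance quoted above for Cohen’s small and large effects can be reproduced with a short sketch. It assumes two groups of equal size, in which case a standardized mean difference d corresponds to a point-biserial correlation of d divided by the square root of d squared plus 4; the function names are hypothetical.

```python
import math

def d_to_r(d):
    """Correlation equivalent of a standardized mean difference.

    Assumes two groups of equal size, for which r = d / sqrt(d**2 + 4).
    """
    return d / math.sqrt(d * d + 4.0)

def variance_explained(d):
    """Proportion of variance in the outcome associated with group membership."""
    return d_to_r(d) ** 2

for label, d in (("small", 0.20), ("medium", 0.50), ("large", 0.80)):
    print(label, d, d_to_r(d), variance_explained(d))
```

Under this assumption the small benchmark corresponds to a correlation near .10 and roughly 1% of the variance, and the large benchmark to a correlation near .37 and roughly 14%. The same conversion applied to a difference of .25 standard deviations gives a little over 1.5% of the variance.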
Sechrest and Yeaton
Experimental Effects
Friedman’s r_m

Friedman (1968) attempted to establish a single generalizable index of magnitude of experimental effect by expressing the relationship between a statistical measure such as t, F, or χ² and sample size as a correlation. Beyond expressing the notion of effect size in a general form, Friedman’s contribution is a table making possible the quick estimation of effect size. One merely needs values for an inferential statistic and for sample size to enter the table, and r_m may be read directly. To take but one example, a t of 2.50 with a sample size of 60 produces an r_m of about .31, indicating that a little less than 10% of the variance is accounted for.
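The worked example above can be reproduced with the usual conversion of t to a correlation, r = √(t²/(t² + df)). This sketch assumes that the sample of 60 is split into two groups, so that df = N − 2 = 58; that reading, and the function name `r_m_from_t`, are our assumptions rather than details given in the text.

```python
import math


def r_m_from_t(t: float, df: int) -> float:
    """Correlation-type effect size implied by a t statistic and its df."""
    return math.sqrt(t * t / (t * t + df))


# t = 2.50 with N = 60; df = N - 2 is assumed (two independent groups).
r = r_m_from_t(2.50, df=58)
print(f"r_m = {r:.2f}, variance accounted for = {100 * r * r:.1f}%")
```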
ω² and Related Statistics

Largely with impetus from Hays (1963, 1973), a statistic he named ω² (omega squared) has come into use in estimating proportions of variance accounted for in experiments involving parametric tests of differences between means – t and ANOVA. Hays noted that ω² is a population value to be estimated from sample data. The formulae for estimating ω² are considered to produce biased estimates in unknown degree. Hays also notes that ω² applies to ANOVA models with fixed effects. The formula for ω² will vary according to the specific design that is involved, but for a simple one-way ANOVA it is:

est ω² = [SS_bet − (J − 1)MS_with]/[SS_tot + MS_with]

Hays also discusses η² (eta squared), a sample statistic useful for descriptive purposes within any one experiment and interpreted in the same way as ω². Actually, η² has traditionally been used to quantify curvilinear relationships (Peters and Van Voorhis, 1940). Its use to estimate proportion of variance accounted for by an experimental treatment (Cohen, 1965) is a direct extension of its capacity to express the relationship between variables not necessarily either ordered or linear in magnitude. Since η is a correlation ratio, η² is interpretable as proportion of variance in one variable accounted for by the other. Hays notes that since η² is a sample statistic, it is subject to capitalizing on chance and usually gives a larger estimate of variance accounted for than does ω². The computational formula for η² is: df_N(F)/[df_N(F) + df_D]. Hays describes an additional statistic for estimating proportion of variance accounted for, ρ_I (rho), the intraclass correlation. According to Hays, ρ_I provides an estimate of variance accounted for in analyses involving a random effects model. It is also a population parameter.
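The one-way formulas above can be sketched in a few lines. The scores are hypothetical and serve only to illustrate the arithmetic; J is the number of groups, and the function name `one_way_effect_sizes` is ours.

```python
def one_way_effect_sizes(groups):
    """Return (eta squared, estimated omega squared, F) for a one-way
    fixed-effects ANOVA computed from raw scores."""
    scores = [x for g in groups for x in g]
    n_total = len(scores)
    j = len(groups)
    grand_mean = sum(scores) / n_total

    ss_tot = sum((x - grand_mean) ** 2 for x in scores)
    ss_bet = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_with = ss_tot - ss_bet

    df_n, df_d = j - 1, n_total - j
    ms_with = ss_with / df_d
    f_ratio = (ss_bet / df_n) / ms_with

    eta_sq = ss_bet / ss_tot
    # est omega^2 = [SS_bet - (J - 1) MS_with] / [SS_tot + MS_with]
    omega_sq = (ss_bet - df_n * ms_with) / (ss_tot + ms_with)
    return eta_sq, omega_sq, f_ratio


# Hypothetical scores for three treatment groups (illustrative only).
groups = [[4, 5, 6, 5], [6, 7, 8, 7], [9, 8, 10, 9]]
eta_sq, omega_sq, f_ratio = one_way_effect_sizes(groups)

# In the one-way case eta squared can also be recovered from F and its df.
eta_sq_from_f = 2 * f_ratio / (2 * f_ratio + 9)
print(f"eta^2={eta_sq:.3f}  est omega^2={omega_sq:.3f}  F={f_ratio:.1f}")
```

With these made-up data the estimate of ω² falls below η², consistent with the observation that the sample statistic tends to give the larger value.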
Although Hays asserted that ω² is applicable only for analyses up to the two-way ANOVA, Fleiss (1969), Vaughn and Corballis (1969), Halderson and Glasnapp (1972), and Dodd and Schultz (1973) have extended the
Research Design, Measurement and Statistics and Evaluation
rationale and computation to include random and mixed models and more complex designs. Halderson and Glasnapp (1972) give generalized rules for estimating magnitudes of effects in factorial and repeated measures ANOVA designs. Refinement in the methodology, notably in the estimation of interaction terms in mixed models (Dwyer, 1974), and attention to assumptions underlying the model (Gaebelein and Soderquist, 1976) have subsequently been proposed.
Comparisons among Effect Size Estimates

Considerable energy has been expended to develop a set of guidelines to assist researchers in the choice of an appropriate statistical analysis of outcomes. Witness the number of statistical analysis and design textbooks available to students and faculty (e.g., Cochran and Cox, 1957; Kirk, 1968; Myers, 1966; Winer, 1962). With the exception of a few Monte Carlo studies (e.g., Carroll and Nordholm, 1975; Keselman, 1975), however, little comparable energy has been invested in estimates of effect size. The choice among these estimates is rather arbitrary. An obvious consideration in the choice of an estimate is the knowledge of exactly what quantity is being estimated. As we noted, the terminology “percentage of variance accounted for” is deficient in important ways as a descriptor of what is being estimated by effect size indicators. Furthermore, the computational formulae offer little intuition as to the actual quantities being estimated. This has the effect of inhibiting comparison, since we cannot ascertain if we are estimating fundamentally different quantities. Knowing that ω² is a population parameter and that partial η² is a sample statistic does make strict comparison impossible, though some “feeling” for the differences between these two estimates is desirable. Further confusion is added when we learn that η² has been referred to in different ways by different researchers. Kennedy (1970) noted that Kerlinger (1964) defined η²_x = SS_x/SS_Tot and claimed that Cohen (1965) and Friedman (1968) in their previous research had defined η²_x as df_N(F_x)/[df_N(F_x) + df_D]. However, Cohen (1973a) subsequently corrected Kennedy’s use of the terminology eta squared for η²_x = df_N(F_x)/[df_N(F_x) + df_D], recognizing this as a formula for partial eta squared.
The confusion between eta squared and partial eta squared was alleviated considerably by Kennedy, who showed by algebraic simplification that partial η²_x = SS_x/(SS_x + SS_e), thus making obvious the fact that the difference between the two eta squared estimates is in the denominator of the two estimates; SS_Tot will change when any of the sources of variation in an experiment change, and the number of these sources will increase as the complexity of the experiment increases. However, SS_x + SS_e only varies as a function of one additional source of variation, namely SS_e.
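The equivalence of the F-based formula and the sums-of-squares form of partial eta squared, and the way the two eta squared estimates part company once other sources of variation are present, can be illustrated with a small sketch. All sums of squares and degrees of freedom here are hypothetical, chosen only to make the arithmetic easy to follow.

```python
# Hypothetical two-way ANOVA source table (illustrative values only).
ss = {"A": 40.0, "B": 25.0, "AxB": 5.0}
df = {"A": 1, "B": 1, "AxB": 1}
ss_error, df_error = 30.0, 36
ss_total = sum(ss.values()) + ss_error


def partial_eta_sq_from_f(f_ratio: float, df_n: int, df_d: int) -> float:
    """Partial eta squared from F: df_N * F / (df_N * F + df_D)."""
    return df_n * f_ratio / (df_n * f_ratio + df_d)


eta_sq = {}
partial_eta_sq = {}
for src in ss:
    f_ratio = (ss[src] / df[src]) / (ss_error / df_error)
    eta_sq[src] = ss[src] / ss_total                 # base: total SS
    partial_eta_sq[src] = ss[src] / (ss[src] + ss_error)  # base: source + error
    # The F-based formula reduces algebraically to the SS form.
    assert abs(partial_eta_sq_from_f(f_ratio, df[src], df_error)
               - partial_eta_sq[src]) < 1e-12
    print(f"{src:4s} eta^2={eta_sq[src]:.3f}  partial eta^2={partial_eta_sq[src]:.3f}")
```

Because each partial value has its own denominator, the partial values for the three sources in this made-up table add up to more than 1, whereas the η² values cannot.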
Though there is no difference between these two estimates in the one-way ANOVA since SS_Tot = SS_x + SS_e, eta squared and partial eta squared will almost always differ in higher order ANOVAs. Even partial eta squares for different sources of variation are not comparable when they have different bases (denominators) and cannot legitimately be added together to obtain a total percentage of variance accounted for (Cohen, 1973a). We wondered immediately if ω² was analogous to η² or to partial η². To answer this question, we consulted Table 1 in Vaughn and Corballis (1969), which gives variance components for fixed, mixed, and random designs in one- and two-way ANOVAs. Since ω² = σ²_x/σ²_tot for the one- and two-way ANOVA, ω² is analogous to η², since both denominators are expressed in terms of total variation. Additionally, it is apparent that Hays’s ω² could be considered the fixed model case of the general components of variance approach. Previously we had wondered why ω² was relevant to fixed models and ρ_I, the intraclass correlation, to random models as stated by Hays (1963). The reason is simply that ω² (an arbitrary symbol) refers to the fixed model case from which its computational formula is derived, while ρ_I is comparable to ω² except that it is the symbol chosen to designate the computational formula taken from the same components of variance approach when the model is random. A third symbol could just as easily have been chosen for those formulae derived from the components of variance approach when the underlying model is mixed.
Table 1: Comparison of effect size estimates in three different studies*

                                       Percentage of variance accounted for
Source of variation                    η² = SS_x/SS_Tot    ω²      Partial η² = r_m²
I.
  Evaluation (E)                       41.0                40.6    56.5
  Attitudes (A)                        23.1                22.5    42.3
  E × A                                 4.3                 3.1    11.9
II. (a) (1-way ANOVA)
  Conditioning                         28.0                20.8    28.0
II. (b) (2-way ANOVA, between Ss)
  Conditioning (C)                      0.2                 0.0     0.0
  Interviewing (I)                      5.9                 2.2     6.4
  C × I                                 7.4                 3.7     7.9
III. (From a 3-way ANOVA)
  Achievement (A)                      57.4                57.6    71.6
  Company policy (C)                   16.8                17.0    42.7
  A × C                                 0.8                 1.2     4.9

*Byrne and Rhamey, 1965 (I); Vitalo, 1970 (II); Lindsay et al., 1967 (III).
To summarize briefly, both ω² and ρ_I can be considered special cases of the components of variance approach. And though it is a population parameter, ω² is more analogous to η², since its denominator (σ²_Tot) is more similar to the denominator in η² (SS_Tot) than to the denominator in the partial η² (SS_x + SS_e).

One means of clarifying the important points made in this section on the comparison of effect sizes is to illustrate the extent of differences among effect size estimates with specific examples taken from the literature. Table 1 shows several effect size estimates calculated from data taken from Byrne and Rhamey (1965). η² and ω² are very comparable, as are partial η² and r_m². However, these two separate sets of estimates are discrepant. Since partial η² is based on a denominator using only source and error SS, sums of squares based on other main effects and interactions are not used in the calculation as they would be in effect size estimates based on total variation (η² and ω²). Consequently, partial η² will be substantially larger than ω² when any other sources (main effects or interactions) account for considerable variance, as is the case in Byrne and Rhamey. Only when SS_Tot approximates SS_x + SS_e (i.e., other sources of variance are close to zero) will these measures be comparable. Also obvious from Table 1 is that partial η²_E, partial η²_A, and partial η²_E×A sum to more than 100%. This is also true of r_m², though this is not true of η² or ω². Since effect sizes are typically of small absolute magnitude, the undesirable feature of accounting for more than 100% variance would not likely be discovered by researchers.

Effect size estimates in Table 1 (taken from Vitalo, 1970) allow comparisons in the one-way ANOVA as well as this two-way case. Here, the proportion of variance accounted for by sources is smaller than proportions in the Byrne and Rhamey study. η², partial η², and r_m² are indeed equal in the one-way ANOVA. Though ω² is more similar to η² than to partial η² and r_m², such a population parameter is not likely to approximate closely sample statistics when the sample size is small, as is the case in Vitalo. Effect size estimates of sources in the two-way ANOVA are generally comparable, since other sources than those being tested account for small proportions of the total variance. Again, ω² is more discrepant from η² than in the Byrne and Rhamey data due to the smaller sample size.

Another interesting comparison among effect size estimates can be made by choosing published studies which have reported effect size estimates. For example, in Lindsay, Marks, and Gorlow (1967) the η² effect size estimate closely parallels respective ω² values due to the large sample size. However, the existence of main effects, which account for substantial portions of the total variance, causes ω² and partial η² to be discrepant.

Hard and fast decision rules regarding choice among effect size estimates are difficult to produce and perhaps undesirable. However, ω² appears to be the logical choice if the researcher wishes to be conservative in statements regarding percentage of total variance accounted for. The fact that partial η²
values summed across all the sources of variation in an experiment may total more than 100% should be considered a weakness of this statistic. It is also a bit uncomfortable to work with percentages that do not share the same base and that do not use 100% as a standard, even when the sum does not surpass 100%. However, given the typical research scenario in which the total SSs is made up largely of error variability, while other sources of variation contribute little to the total, the choice of an appropriate effect size estimate may be based on more practical considerations, such as computational ease. The best practice may be to report two or more estimates and let the reader judge the effectiveness of the results reported in the experiment.
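The observation that partial eta squared values can total more than 100% is easy to verify from the percentages reported in Table 1 for the Byrne and Rhamey (1965) data. The short sketch below simply adds the three sources under each index; the dictionary layout is ours.

```python
# Percentages for Study I (Byrne and Rhamey, 1965) as reported in Table 1.
table_1_study_i = {
    "eta^2":         {"E": 41.0, "A": 23.1, "ExA": 4.3},
    "omega^2":       {"E": 40.6, "A": 22.5, "ExA": 3.1},
    "partial eta^2": {"E": 56.5, "A": 42.3, "ExA": 11.9},
}

# Sum the three sources within each index.
totals = {name: round(sum(col.values()), 1)
          for name, col in table_1_study_i.items()}

for name, total in totals.items():
    flag = "  (exceeds 100%)" if total > 100 else ""
    print(f"{name:14s} total = {total:5.1f}%{flag}")
```

The indices based on total variation stay below 100%, while the partial values, each computed on its own base, do not.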
How Much Variance Is There to Be Explained?

It seems generally and naively to be assumed by those who favor calculations of proportion of variance explained that the actual variance to be explained is 100%. That assumption is unwarranted, since it requires the additional assumption that the dependent measure is measured without error. For the most part, investigators seem conceptually to deal with total variance as if it consisted of two parts: that variance accounted for by the experimental factors and a residual part commonly called “error.” Actually the total variance is better regarded as “partitionable” three ways: variance explained by the experimental factors, reliable variance not accounted for by experimental factors, and error, or unreliable variance. By definition, unreliable variance cannot be accounted for. Consider, for example, a study of a helicopter patrol strategy for decreasing the incidence of specific crimes (e.g., Schnelle et al., 1977). Presumably, the number of crimes occurring during helicopter patrolling would be subject to errors reflected in a host of reliable factors not accounted for by the experimental manipulation (number of criminals in the area, time of year, unemployment rate, etc.). Consequently, what variance could even in principle be explained by the helicopter patrol intervention would be total variance minus the error variance. If the patrol strategy manipulation accounted for 20% of the total variance, when there was only 40% reliable variance, the seeming importance of the experimental factors might be small even though, from the more insightful position where total reliable variance is known, the importance of experimental factors would be greatly enhanced. The argument presented here is reflected in the psychometric relationship between reliability and validity of a measure, it being the case that the maximum achievable validity of a measure is limited to √r_tt.
Thus, if a measure has a reliability coefficient⁴ of .81, the maximum validity coefficient that could be associated with a predictor of that measure would be .90, but it is the reliability coefficient, .81 (i.e., .90²), that indicates the reliable variance to be accounted for. Therefore, if a dependent measure in an experiment had a reliability of .81, rather than estimating proportion of variance accounted for by
an experimental variable against a maximum of 100%, the estimate should be done against the base of 81%. A variable that accounted for 20% of the total variance in such a situation would account for 25% of the reliable variance.
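The arithmetic of this adjustment is simple enough to state as a sketch. The function names are ours; the inputs are the reliability of .81 and the 20% figure used in the text.

```python
import math


def max_validity(reliability: float) -> float:
    """Upper bound on a validity coefficient: the square root of r_tt."""
    return math.sqrt(reliability)


def share_of_reliable_variance(share_of_total: float, reliability: float) -> float:
    """Re-express a proportion of total variance against the reliable base."""
    return share_of_total / reliability


r_tt = 0.81
print(f"maximum validity = {max_validity(r_tt):.2f}")
print(f"20% of total variance = "
      f"{100 * share_of_reliable_variance(0.20, r_tt):.1f}% of reliable variance")
```

The second figure is a little under 25%, which the text rounds.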
What Determines Our Ability to Account for Variance?

Now that the concept of accounting for variance has been explained in detail, it remains to be explained what determines variance accounted for. As a general proposition it can be stated that all measures of variance accounted for are specific to characteristics of the experiments from which the estimates were obtained, and therefore the ultimate interpretation of proportion of variance accounted for is a dubious prospect at best. There are, in fact, several determinants of variance accounted for within any experiment, and there are only inexact ways of knowing about or estimating the importance of those determinants. The problem of interpreting a measure of variance accounted for begins with the fact that all such measures are essentially ratios of variance within some treatment to a more inclusive variance estimate ranging from treatment plus error up to total experimental variance. The fact that a ratio is involved should suggest immediately that estimates of variance accounted for might be unstable, since small changes in the denominator may well change estimates drastically due to decisions made about how an experiment will be carried out. (For a more extended discussion of the following factors that determine our ability to account for variance, see Sechrest and Yeaton, 1981c.)
Built-in Variance

First, the total variance to be accounted for will vary as a consequence of how much variance is built into the experiment. Thus, if experimental subjects are quite heterogeneous in factors associated with scores on dependent measures, there will be a larger total variance than if subjects are homogeneous (Glass and Hakstian, 1969). It should be easier to account for a lot of variance in the self-esteem scores of 15-year-old male delinquents in two experimental conditions than to account for the same proportion of variance in scores of two groups of delinquents whose only commonality is that they all live in the same county. Failure to replicate otherwise consistent results may be explained by the heterogeneity of the subject sample used in the study (e.g., Oakes, 1972).
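The dependence of variance accounted for on subject heterogeneity can be illustrated with a small calculation. The sketch assumes two equal-sized groups and works with population values; the mean difference and standard deviations are hypothetical.

```python
def eta_sq_two_groups(mean_diff: float, within_sd: float) -> float:
    """Population eta squared for two equal-sized groups.

    Each group mean lies mean_diff / 2 from the grand mean, so the
    between-groups variance is (mean_diff / 2) ** 2.
    """
    between_var = (mean_diff / 2.0) ** 2
    return between_var / (between_var + within_sd ** 2)


# The same 5-point treatment effect in a homogeneous and a heterogeneous sample.
homogeneous = eta_sq_two_groups(mean_diff=5.0, within_sd=5.0)
heterogeneous = eta_sq_two_groups(mean_diff=5.0, within_sd=10.0)
print(f"homogeneous sample:   {100 * homogeneous:.1f}% of variance")
print(f"heterogeneous sample: {100 * heterogeneous:.1f}% of variance")
```

The difference between means is identical in the two cases; only the within-group spread, and hence the proportion of variance accounted for, changes.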
Experimental Precision

Another determinant of total variance in an experiment is the precision achieved in planning and the integrity maintained in implementing the experiment (Sechrest et al., 1979; Yeaton and Sechrest, 1981a). Consider,
for example, the almost certain difference between otherwise identical experiments when one of them involves only a single, motivated experimenter, while the other involves several experimenters with little direct interest in the outcome. The second experiment would certainly have a greater total variance, and the apparent experimental effect would be smaller. Note, however, that there is no necessary effect on means of the experimental groups; hence, subtracting one mean from another might well suggest the same effect in the two experiments. There are many sources of imprecision that might cause two experiments to differ even if the same experimental treatment is being employed. Degree of standardization of experimenter demand, clarity of instructions, calibration of apparatus, degree of control achieved with respect to strength of the experimental manipulation, reliability of outcome measures, and many other factors will affect total variance to be explained and, consequently, proportion of variance explainable by any given variable. An interesting instance is provided by two experiments (Brady et al., 1976; Vitalo, 1970) involving the same experimental treatment and a generally serious attempt at replication. Brady et al. state that “the only known deviation from Vitalo’s (1970) study is that the number of subjects was increased from 28 to 32.”⁵ A critical difference in the results of the two studies was that Vitalo reported an F of 13.10 (p < .005) for an experimenters × conditions interaction, while Brady et al. obtained an F of only 1.54 (n.s.). The problem becomes clear when the SS for error is examined, for it is 2.40 in the first study and 19.88 in the second. (All other SSs in the source table were comparable.) For whatever reason, and despite their seemingly careful attempt to replicate Vitalo’s experiment, Brady, Rowe, and Smouse produced a considerably larger amount of unexplained variation in the within-subjects part of their experiment.
Even if one wanted to compare the variance accounted for by two treatments within the same experiment, it is important to recognize that they may contribute differentially to error variance. Consider an experiment in which a drug and a behavioral intervention are to be jointly tested. It may be possible to achieve more control over the drug dosage than over the behavioral manipulation. In such a case, one might be seriously misled about the potential magnitude of the effect produced by the drug, since it would be judged not in terms of its own characteristic error but in terms of the total error associated with it and the behavior manipulation.
Number of Treatments

Another factor which determines the variance one can account for in an experiment is the number of treatments being tested within the experiment. In general we would expect that the more effects that are being analyzed for,
the smaller the error term would be. Thus, one could expect to account by any one variable for a larger proportion of the variance when one or more other variables are being simultaneously studied (Kennedy, 1970). The various indices of variance accounted for utilize different denominators and hence are differentially susceptible to the effects of multiple factors in experiments. Specifically, ω² uses an estimate of total variance in the denominator, while partial η² and r_m use source plus error. Therefore, ω² will always be smaller than the other indices in multifactor experiments and probably is the index to be preferred.
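The effect of adding factors on an index whose denominator is source plus error, as against one whose denominator is total variation, can be seen in a short sketch. The sums of squares are hypothetical; the only thing that changes between the two calls is how much variation a second factor removes from the error term.

```python
# Hypothetical sums of squares (illustrative only): factor A and total are fixed.
ss_a = 20.0
ss_total = 100.0


def indices(ss_other_factors: float):
    """Return (eta squared, partial eta squared) for factor A.

    ss_other_factors is variation that added factors remove from error.
    """
    ss_error = ss_total - ss_a - ss_other_factors
    return ss_a / ss_total, ss_a / (ss_a + ss_error)


one_factor = indices(0.0)    # A studied alone: error SS = 80
two_factor = indices(30.0)   # a second factor absorbs 30 units: error SS = 50

print(f"A alone:         eta^2={one_factor[0]:.3f}  partial={one_factor[1]:.3f}")
print(f"A with factor B: eta^2={two_factor[0]:.3f}  partial={two_factor[1]:.3f}")
```

The index based on total variation is unchanged by the second factor, while the source-plus-error index rises.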
Strength of Treatments

Of great theoretical and practical interest is the fact that proportion of variance accounted for obviously depends on the strength of the experimental treatment (see Sechrest and Redner, 1979; Sechrest et al., 1979; Yeaton and Redner, forthcoming). A weak treatment could account for only a small proportion of the variance in most experiments, while a strong treatment could account for a large proportion. The problem in interpreting proportion of variance accounted for is that we rarely – at least in the social and behavioral sciences – have an independent measure of the strength of the treatment administered. For example, suppose one wished to know whether attitudinal similarity or physical attractiveness is a stronger determinant of interpersonal attraction. One could probably do little better than merely to describe the manipulations used to produce the levels of each factor and conclude that for the levels tested one or the other factor seemed to account for more variance in interpersonal attraction. For the more important theoretical question of which is generally more important, no statement can be made. There is no common metric for the two variables, so one cannot say how much physical attractiveness is equal to how much attitudinal similarity; consequently one could not say whether the treatments were of even approximately equal strength. In only a few cases do experimenters attempt to determine the strength of a treatment employed, other than by its effect on the dependent variable. When the attempt is made, it is often by means of a “manipulation check” whose meaning can only be taken literally. To show, for example, that experimental and control groups differ as they should on a seven-point rating scale gives no clue about the strength of treatment beyond the fact that it was different between the two conditions.
How many scale points of difference between experimental and control groups means would be indicative of a moderately strong treatment? Of a very strong treatment? Without some way of assessing the strength of treatment, it makes little sense to talk about the proportion of variance it accounts for.
Range of Treatments

Still another limitation on interpretations of proportion of variance accounted for is that, for any treatment involving more than two levels, an estimate of proportion of variance accounted for can obscure far more than it reveals. If one were testing the effects of two alternative drugs for controlling blood pressure, even if one of the drugs were more effective than the other, relatively little of the variance in terminal blood pressures might be accounted for by the treatment effect. If, however, one added an untreated control group to the experiment, the treatment effect might, with seeming magic, be doubled. Glass and Hakstian (1969) have addressed this problem and note that it had previously been discussed by Sir Ronald Fisher (1946). A particularly apt example, however, has been provided by Levin (1967). He described an experiment with six experimental conditions analyzed by a one-way ANOVA, with the result that ω² was 37%. However, subsequent analyses indicated that over 85% of the explained variation was attributable to the superiority of one group to all the others.
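The blood pressure example can be given numbers. The group means and the within-group standard deviation below are hypothetical, equal group sizes are assumed, and the calculation uses population values rather than sample estimates.

```python
def eta_sq_from_means(means, within_sd: float) -> float:
    """Population eta squared for equal-sized groups with the given means."""
    grand_mean = sum(means) / len(means)
    between_var = sum((m - grand_mean) ** 2 for m in means) / len(means)
    return between_var / (between_var + within_sd ** 2)


# Hypothetical terminal blood pressures (mm Hg), within-group SD of 10.
two_drugs = eta_sq_from_means([130.0, 134.0], within_sd=10.0)
with_control = eta_sq_from_means([130.0, 134.0, 150.0], within_sd=10.0)

print(f"two drugs only:         {100 * two_drugs:.1f}% of variance")
print(f"with untreated control: {100 * with_control:.1f}% of variance")
```

Neither drug has changed, yet the proportion of variance attributed to "treatment" grows many times over once the untreated group widens the range of conditions compared.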
Real-World Variance

One final problem in interpreting proportion of variance accounted for has to do with its “external validity,” that is, its relationship to any “real-world” context in which one might want to draw inferences about the probable effect of some intervention. The variance that exists within an experiment depends largely on how the experimenter plans and implements the experiment. In the laboratory, when an experimenter studies the probability of a guilty verdict as a function of the physical attractiveness of a defendant, all other potential sources of variance in the determination of the verdict are controlled out of the experiment to as great an extent as possible, thus reducing the error term (unexplained variance) to a value below that likely to exist in the extraexperimental context. We are not arguing that physical attractiveness has no effect outside the social psychology laboratory; but we do argue that the fact that physical attractiveness can be made to affect responses in the laboratory in some degree does not mean that physical attractiveness has the same effect, let alone to the same degree, in the extraexperimental world.
The Search for Effectiveness Criteria

It appears to us that no purely statistical method for assessing magnitude of experimental effects is going to be satisfactory, at least if one leaves the fairly
abstract world of theory building and enters into the realm of practical decision making. On the other hand, it is clearly not going to be satisfactory to continue as if all significant effects were important or to rely on haphazard or intuitive judgments. There appear to be no simple solutions for a whole variety of reasons, some of which would be remediable by changes in editorial policies and in ways in which investigators report their findings. At present it does not appear to us likely that any single procedure or set of rules will soon emerge. What is more likely is that the demands and customs prevalent in different research areas will result in differing opportunities to develop empirical rules for estimation of effect size. Some rules are likely to involve a degree of arbitrariness and good judgment, while others will probably be normative at some level. We have begun to explore several alternative methods and to assess the interrelation of these approaches (Sechrest and Yeaton, 1981a, 1981b; Yeaton and Sechrest, 1981a, 1981b). These initial efforts may provide the first steps toward the development of a set of useful tools for thinking about the outcome of experiments. The ability of these methods to discriminate between large and small experimental effects may be reflected in the acceptability of the procedures to investigators and decision makers. That these methods be impressive enough for acceptance is a telling test of our success in estimating magnitudes of experimental effects.
Notes

1. However, see Sechrest and Yeaton (1981a) for an explanation of why small differences at the means of two distributions may in some circumstances be important at the extremes.
2. A particularly cogent treatment of the test of significance and problems in its interpretation was provided some years ago by Bakan (1966), in a paper still worth reading.
3. Meehl (1967) has noted the paradox in the differences between the approaches and methods of physics and psychology: The better the methods employed in physics, the greater the probability that the experiment will disprove the hypothesis, while in psychology hypotheses are stated in terms of deviations from the null.
4. Space limitations prevent further explanation, but we note that it obviously makes a great deal of difference which reliability coefficient one chooses to estimate variance to be accounted for. Cronbach et al. (1972) present a particularly cogent discussion of the issues involved, and their work should be consulted.
5. Tversky and Kahneman (1971) have demonstrated that attempts to replicate experimental findings are quite likely to fail unless the replication experiment has a substantially larger N with the resulting increase in statistical power. Brady et al. were on the right track in increasing sample size but did not go far enough. In order to know whether they did or did not replicate Vitalo’s findings, it would be necessary to have a table of means as well as an ANOVA table, since the direction of results could have been replicated even though statistical significance was not achieved. Unfortunately, Brady et al. did not give a table of means, nor do they report directions of effects in their text.
References

Bakan, D. (1966) “The test of significance in psychological research.” Psych. Bull. 66: 423–437.
Bolles, R., and S. Messick (1958) “Statistical utility in experimental inference.” Psych. Reports 4: 223–227.
Brady, D., W. Rowe, and A. D. Smouse (1976) “Facilitative level and verbal conditioning: a replication.” J. of Counseling Psychology 23: 78–80.
Brewer, J. K. (1972) “On the power of statistical tests in the American Educational Research Journal.” Amer. Educ. Research J. 9: 391–401.
Byrne, D., and R. Rhamey (1965) “Magnitude of positive and negative reinforcements as a determinant of attraction.” J. of Personality and Social Psychology 2: 884–889.
Carroll, R. M., and L. A. Nordholm (1975) “Sampling characteristics of Kelley’s ε² and Hays’ ω².” Educ. and Psych. Measurement 35: 541–554.
Chase, L. J., and R. B. Chase (1976) “A statistical power analysis of applied psychological research.” J. of Applied Psychology 42: 29–41.
Chase, L. J., and R. K. Tucker (1975) “A power-analytic examination of contemporary communication research.” Speech Monographs 61: 234–237.
Cohen, J. (1977) Statistical Power Analysis for the Behavioral Sciences. New York: Academic Press.
——— (1973a) “Eta-squared and partial eta-squared in fixed factor ANOVA designs.” Educ. and Psych. Measurement 33: 107–112.
——— (1973b) “Statistical power analysis and research results.” Amer. Educ. Research J. 10: 225–229.
——— (1965) “Some statistical issues in psychological research,” in B. B. Wolman (ed.) Handbook of Clinical Psychology. New York: McGraw-Hill.
——— (1962) “The statistical power of abnormal-social psychological research: a review.” J. of Abnormal and Social Psychology 65: 145–153.
———, and P. Cohen (1975) Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. Hillsdale, NJ: Lawrence Erlbaum.
Craig, J. R., C. L. Eison, and L. P. Metze (1976) “Significance tests and their interpretation: an example utilizing published research and ω².” Bull. of the Psychonomic Society 7: 280–282.
Cronbach, L. J., G. C. Gleser, H. Nanda, and N. Rajaratnam (1972) The Dependability of Behavioral Measurement: Theory of Generalizability for Scores and Profiles. New York: John Wiley.
Dodd, D. H., and R. F. Schultz (1973) “Computational procedures for estimating magnitude of effect for some analysis of variance designs.” Psych. Bull. 79: 391–395.
Dwyer, J. H. (1974) “Analysis of variance and the magnitude of effects: a general approach.” Psych. Bull. 81: 731–737.
Fisher, R. A. (1946) Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd.
Fleiss, J. L. (1969) “Estimating the magnitude of experimental effects.” Psych. Bull. 72: 273–276.
Friedman, H. (1968) “Magnitude of experimental effect and a table for its rapid estimation.” Psych. Bull. 70: 245–251.
Gaebelein, J. W., and D. R. Soderquist (1976) “A note on variance explained in the mixed analysis of variance model.” Psych. Bull. 83: 1110–1112.
Glass, G. V., and A. R. Hakstian (1969) “Measures of association in comparative experiments: their development and interpretation.” Amer. Educ. Research J. 6: 403–413.
Glass, G. V., B. McGaw, and M. L. Smith (1981) Meta-Analysis in Social Research. Beverly Hills, CA: Sage.
Salkind_Chapter 73.indd 19
9/4/2010 10:58:49 AM
20
Research Design, Measurement and Statistics and Evaluation
Halderson, J. S., and D. R. Glasnapp (1972) “Generalized rules for calculating the magnitude of an effect in factorial and repeated measures ANOVA designs.” Amer. Educ. Research J. 9: 301–310. Hays, W. L. (1973) Statistics for the Social Sciences. New York: Holt, Rinehart & Winston. ——— (1963) Statistics for Psychologists. New York: Holt, Rinehart & Winston. Katzer, J., and J. Sodt (1973) “An analysis of the use of statistical testing in communication research.” J. of Communication 23: 251–265. Kelly, T. L. (1935) “An unbiased correlation measure.” Proceedings of the National Academy of Sciences 21: 554 –559. Kennedy, J. J. (1970) “The eta coefficient in complex ANOVA designs.” Educ. and Psych. Measurement 30: 885 – 889. Kerlinger, F. N. (1964) Foundations of Behavioral Research. New York: Holt, Rinehart & Winston. Keselman, H. J. (1975) “A Monte Carlo investigation of three estimates of treatment magnitude: epsilon squared, eta squared, and omega squared.” Canadian Psych. Rev. 16: 44 – 48. Kirk, R. E. (1968) Experimental Design: Procedures for the Behavioral Sciences. Belmont, CA: Brooks/Cole. Levin, J. R. (1967) “Comment: misinterpreting the significance of ‘explained variation’.” Amer. Psychologist 22: 675 – 676. Lindsay, C. A., E. Marks, and L. Gorlow (1967) “The Herzberg theory: critique and reformulation.” J. of Applied Psychology 51: 330–339. Meehl, R. P. (1967) “Theory-testing in psychology and physics: a methodological paradox.” Philosophy of Science 34: 103 – 115. Myers, J. L. (1966) Fundamentals of Experimental Design. Boston: Allyn & Bacon. Oakes, W. (1972) “External validity and the use of real people as subjects.” Amer. Psychologist 27: 959 – 962. Peters, C. C., and W. R. Van Voorhis (1940) Statistical Procedures and Their Mathematical Bases. New York: McGraw-Hill. Rhoads, S. E. (1978) “How much should we spend to save a life?” Public Interest 51: 74 – 92. Savage, I. R. (1957) “Nonparametric statistics.” J. of the Amer. Statistical Assoc. 52: 331–344. 
Schnelle, J. R., R E. Kirchner, Jr., J. P. Casey, P. H. Uselton, Jr., and M. P. McNees (1977) “Patrol evaluation research: a multiple-baseline analysis of saturation police patrolling during day and night hours.” J. of Applied Behavior Analysis 10: 33– 40. Sechrest, L., and R. Redner (1979) “Strength and integrity of treatment in evaluation studies,” in Evaluation Reports. Washington, D.C.: National Criminal Justice Reference Service. Sechrest, L., and W. H. Yeaton (1981a) “Assessing the effectiveness of social programs: methodological and conceptual issues,” in S. Ball (ed.) New Directions in Evaluation Research. San Francisco: Jossey-Bass. ——— (1981b) “Empirical bases for estimating effect size,” in R. F. Boruch et al. (eds) Reanalyzing Program Evaluations: Policies and Practices for Secondary Analysis of Social and Educational Programs. San Francisco: Jossey-Bass. ——— (1981c) “Estimating magnitudes of experimental effects.” J. of Supplement Abstract Service, # 2355. Sechrest, L., S. G. West, M. A. Phillips, R. Redner, and W. Yeaton (1979) “Introduction – some neglected problems in evaluation research: strength and integrity of treatments,” in L. Sechrest et al. (eds) Evaluation Studies Review Annual, Vol. 4. Beverly Hills, CA: Sage.
Salkind_Chapter 73.indd 20
9/4/2010 10:58:49 AM
Sechrest and Yeaton
Experimental Effects
21
Smith, M. L. (1980) “Sex bias in counseling and psychotherapy.” Psych. Bull. 87: 392– 407. ———, and G. V Glass (1977) “Meta-analysis of psychotherapy outcome studies.” Amer. Psychologist 32: 752–760. ———, and T. I. Miller (1980) The Benefits of Psychotherapy. Baltimore: Johns Hopkins Univ. Press. Soderquist, D. R., and R. A. Hussian (1978) “The utility of utility indices.” Bull. of the Psychonomic Society 11: 136–138. Tversky, A., and D. Kahneman (1971) “Belief in the law of small numbers.” Psych. Bull. 76: 105 –110. Vaughn, G. M., and M. C. Corballis (1969) “Beyond tests of significance: estimating strength of effects in selected AVOVA designs.” Psych. Bull. 72: 204 –213. Vitalo, R. L. (1970) “Effects of facilitative interpersonal functioning in a conditioning paradigm.” J. of Counseling Psychology 17: 141–144. Winer, B. J. (1962) Statistical Principles in Experimental Design. New York: McGraw-Hill. Yeaton, W. H., and R. Redner (forthcoming) “Measuring strength and integrity of treatments: rationale, techniques, and examples,” in R. Conner (ed) Methodological Advances in Evaluation Research. Beverly Hills, CA: Sage. Yeaton, W. H., and L. Sechrest (1981a) “Critical dimensions in the choice and maintenance of successful treatments: strength, integrity, and effectiveness.” J. of Consulting and Clinical Psychology 49: 156 –167. ——— (1981b) “Estimating effect size,” in P. M. Wortman (ed) Methods for Evaluating Health Services. Beverly Hills, CA: Sage.
Salkind_Chapter 73.indd 21
9/4/2010 10:58:49 AM
Salkind_Chapter 73.indd 22
9/4/2010 10:58:49 AM
74
Hypothesis Testing in Relation to Statistical Methodology
Cherry Ann Clark
The shortcomings in the methodology of statistical hypothesis testing used in educational and psychological research have been emphasized repeatedly in recent behavioral science and statistical literature (Binder, 1963; Edwards, Lindman, and Savage, 1963; Grant, 1962; Lubin, 1962; McNemar, 1960; Mowrer, 1960; Nunnally, 1960; Rozeboom, 1960; Savage, 1957). This chapter reviews the salient points of the criticisms. Since this issue of the REVIEW marks the first time an entire chapter has been devoted to the statistical methodology of hypothesis testing, a brief account of several theories of statistical inference is included to provide a background for evaluating the rationales of significance tests compared with other methods for statistical inferences, as well as the modifications in the function of the null hypothesis in testing statistical hypotheses. The problems in statistical inference which have been associated with the widespread use of significance tests are reviewed. The limitations in and the effectiveness of significance tests as methods for informative inference are described. The applications of interval estimation contrasted with significance tests in the investigation of hypotheses and models and in the development of a body of empirical data are summarized.
Source: Review of Educational Research, XXXIII(5) (1963): 455–473.

Hypothesis Testing and Statistical Inference

Hypothesis testing is a central and complex problem in the methodology of science. It involves comparing the deductions or the predictions from
scientific hypotheses with observational data to eliminate unsatisfactory hypotheses and to give support to satisfactory ones. A primary objective of hypothesis testing methodology is the formulation of rules or criteria, sometimes called decision procedures, to use in determining whether the data should be construed as rejecting or accepting the hypothesis under investigation. Decision procedures are judged by such criteria as consistency, relevance, completeness, and effectiveness. Decision rules are formulated to provide objective, reliable, and valid solutions to important problems in the analysis of data (Buehler, 1959; Hotelling, 1958; Jeffreys, 1957, 1961; Tukey, 1962). The hypotheses to be tested must be examined for their logical validity or consistency, their heuristic value, and their amenability to empirical test. The testing of hypotheses requires that appropriate controls be employed in making observations and in analyzing the data. Systematic errors in the experimental procedures must be minimized if decision rules are to be efficient (Tukey, 1954, 1960a, 1962). Statistical hypothesis testing is a special instance of the general scientific method of testing hypotheses. As in the general method, statistical hypothesis testing combines deductive and inductive methods in intricate ways. For the most part, statistical hypothesis testing has been concerned with determining which one of a dichotomous set of mutually exclusive and exhaustive hypotheses is to be rejected and which one accepted at a specified level of risk in making an erroneous conclusion or decision. Probability theory provides the deductive foundations for theories of statistical inference (Hotelling, 1958). Natural phenomena which are presumed to be subject to random fluctuations, either because of a stochastic process characterizing their behavior or because of random errors in observing or measuring them, provide the inductive basis for statistical inference. 
Both enumerative and eliminative methods of induction are used in treating the data and in formulating statistical generalizations. The question of whether statistical methods based on eliminative or enumerative rules of induction constitute the best foundation for statistical inference is widely disputed (Rozeboom, 1960; Savage and others, 1962). The argument about whether scientific knowledge advances primarily by accumulation of positive instances of phenomena to support a hypothesis or by negative instances to refute a hypothesis is the focus of some of the criticism of the classical theories of statistical inference – namely, the Fisher and Neyman-Pearson theories (Savage, 1961). Another argument in the current controversy on the foundations of statistics focuses on the question of whether there is a sound logical basis for assigning a direct probability to a set of propositions or hypotheses. The frequency or objective school has denied the legitimacy of such a procedure. Probability, for this school, refers to the relative frequency of events within a class of random recurrences (Hotelling, 1958), not to the amount of support for or the degree of belief in a hypothesis. On the other hand, the subjective probability school, among
others, has argued that a fruitful and consistent way to proceed with statistical inferences is by assigning a degree of belief to a statement based upon available information and then by altering the amount of uncertainty about the statement in the light of additional experimental evidence by the use of Bayes’s theorem (Lindley, 1953, 1961; Rozeboom, 1960; Savage, 1961). Another aspect of the controversy is how the likelihood function should be incorporated into statistical theory and practice. One of the serious limitations in the classical theories of significance tests is that they are not dependent upon the likelihood function (Birnbaum, 1962). Statistical hypotheses are more restricted than are scientific hypotheses, just as statistical inferences are more restricted than scientific inferences (Bolles, 1962; Cox, 1958b; Tukey, 1960a). Statistical hypotheses concern the behavior of observable random variables, whereas scientific hypotheses treat the phenomena of nature and man. A null hypothesis is a particular statistical hypothesis which refers to the theoretical probability distribution governing the random variable(s) under investigation and is the hypothesis under test (Kendall and Stuart, 1961). Statistical inference, broadly defined, deals with statements about statistical populations made from given observations with a measured degree of uncertainty (Cox, 1958b). Statistical inference proceeds from observations to conclusions about the populations sampled. Scientific inference, on the other hand, often argues from descriptive facts about populations to an abstraction about the system of phenomena under investigation. A statistical inference frequently is instrumental in formulating a general inference, but usually it constitutes only a small part of the uncertainty connected with the scientific inference (Cox, 1958b; Tukey, 1960a). 
Decision procedures used in making statistical inferences, including those that aid in the selection of statistical design and analysis (Cox, 1958a; Savage, 1961), require that the investigator use his judgment in diverse ways. How such judgment should be allowed to affect the use of the decision procedures is a crucial issue in the present controversy on the foundations of statistical inference (Savage and others, 1962; Tukey, 1954, 1960a, 1962). Much of the criticism of the rationales for significance tests has been aimed at the inconsistencies in the performance of the decision procedures (Berkson, 1942; Birnbaum, 1962; Cox, 1958b; Edwards, Lindman, and Savage, 1963; Pratt, 1961a, b; Savage and others, 1962).
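The subjective school's procedure described above, in which a degree of belief is revised by Bayes's theorem as evidence accumulates, can be illustrated with a small sketch. The two rival hypotheses, the equal initial beliefs, and the trial counts below are assumptions chosen purely for illustration; they do not come from any of the cited works.

```python
def bayes_update(prior, likelihood):
    """Return posterior probabilities proportional to prior times likelihood."""
    joint = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(joint.values())  # total probability of the observed data
    return {h: joint[h] / evidence for h in joint}


def bernoulli_likelihood(p, successes, trials):
    """Likelihood of success rate p, up to a constant that cancels in Bayes's theorem."""
    return p ** successes * (1 - p) ** (trials - successes)


# Hypothetical example: two rival hypotheses about a success rate, held with equal belief.
rates = {"p=0.5": 0.5, "p=0.8": 0.8}
prior = {"p=0.5": 0.5, "p=0.8": 0.5}

# First experiment (assumed data): 8 successes in 10 trials.
first = {h: bernoulli_likelihood(p, 8, 10) for h, p in rates.items()}
posterior = bayes_update(prior, first)

# Second experiment (assumed data): 7 successes in 10 trials.
# The posterior from the first experiment serves as the prior for the second.
second = {h: bernoulli_likelihood(p, 7, 10) for h, p in rates.items()}
posterior_2 = bayes_update(posterior, second)

print(posterior)
print(posterior_2)
```

Updating in two steps gives the same result as a single update on the pooled data (15 successes in 20 trials), which is one sense in which the procedure is consistent.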
Development of Theories of Statistical Inference and Significance Tests

The use of significance tests preceded the development of any systematic theories of statistical inference. Barnard (Savage and others, 1962) and Lehmann (1959) have mentioned their use by two early probability theorists, Daniel Bernoulli and Pierre Simon de Laplace. The principal components
of significance tests were contained in their mathematical studies on the departures of planetary orbits from specific mathematical models. The essential components were the theoretical or mathematical hypotheses and equations and the observations on the extent of the departures of the planetary planes from the expected values. The significant departures were those that were considered unlikely on a particular hypothesis. When Bernoulli found that the values he computed were uniformly improbable on the hypotheses, he concluded that the observations provided evidence for rejecting the initial hypotheses. He did not consider an alternative hypothesis, nor did he have a particular degree of departure in mind before he made his computations. Karl Pearson’s work on goodness of fit tests and the chi square distribution and W. S. Gosset’s derivation of the Student distribution, a small-sample exact sampling distribution, were important influences on subsequent developments in the theory of significance tests.
Contributions of Fisher to Statistical Inference

Sir Ronald Fisher has been called the founder of modern statistical theory. He propounded many basic concepts and methods and developed the first complete theory of statistical inference (Hotelling, 1951; Pearson, 1962; Yates, 1951). Among his contributions to the theory of significance tests were a systematic rationale for critical ratio tests based on the Central Limit Theorem and the Law of Large Numbers; exact small sample tests for which he derived many theoretical sampling distributions; and randomization or permutation tests to be used in complex experimental designs when the assumptions underlying other significance test procedures are not fulfilled. He developed interval estimation methods, which culminated in his theory of fiducial inference with its fiducial intervals. He introduced an extensive formal theory of experimental and statistical design and related methods of analysis, including the analyses of variance and covariance. He set forth the method of maximum likelihood as a foundation for statistical inference and as a basis for developing many statistical techniques. He discussed the concept of sufficient statistics, which has been used extensively in modern statistical theories. He argued that inverse probability has no place in modern science or statistical theory (Fisher, 1956, 1960). He originated the term null hypothesis. He presented significance tests of null hypotheses as examples of a logical disjunction. In this view, the null hypothesis could never be proved by experimentation. On the contrary, the null hypothesis exists only to be rejected by a sufficiently sensitive experiment. Rarely is an experimenter interested in accepting the null hypothesis, an exception being in determining the uniformity of experimental procedures (Fisher, 1960). Fisher discussed the combination of estimation procedures with significance tests after the initial phases of experimentation have been completed.
Hotelling (1951) and Yates (1951), in evaluating Fisher’s impact upon statistical practice in the social sciences, suggested that his books have not had an altogether positive influence. Psychologists, for example, have been content to publish very limited reports on one or at most several significance tests based on null hypotheses of no difference (Grant, 1962; Nunnally, 1960; Rozeboom, 1960). Such statistical tests have very little informative value, since the null hypotheses tested often are known to be false prior to experimentation (Savage, 1954).
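The randomization or permutation tests credited to Fisher above can be sketched for the simplest case, a comparison of two small groups. The version below is a minimal illustration under stated assumptions, not Fisher's own computation: the test statistic (absolute difference in means), the function name, and the scores are all hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the absolute difference in group means.

    Every way of relabelling the pooled scores into groups of the original
    sizes is enumerated; the p-value is the share of relabellings whose
    difference is at least as large as the one actually observed.
    """
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    for chosen in combinations(range(len(pooled)), len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen_set]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical scores for two small groups.
treated = [12, 15, 14, 16]
control = [9, 8, 11, 10]
p_value = permutation_p_value(treated, control)
print(p_value)
```

Because the only assumption is random assignment of units to groups, no theoretical sampling distribution is required. Full enumeration is feasible only for small samples; here there are 70 possible relabellings.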
Contributions of Neyman and Pearson to Statistical Inference

Jerzy Neyman and Egon Pearson collaborated for more than a decade on the development of many aspects of statistical theory and methods. Some of their work was an extension of Fisher’s contributions, and some of it was a reaction to Fisher’s proposals. They expanded and systematized the rationale of significance tests. They argued for the necessity of carefully considering the alternative to the null hypothesis in making any test of significance. They introduced the notions of errors of the first and second kinds, the power of a test in discriminating between the hypothesis under test and the alternative, and the comparison of the power functions of different tests to aid in the selection of the most satisfactory statistical test. They formulated an extensive rationale for the comparison and derivation of different tests in terms of their “nice properties,” such as the uniformly most powerful test, most powerful tests, and unbiased tests. They expanded Fisher’s maximum likelihood method into the test criterion called the Neyman-Pearson lemma, which they used to develop tests of simple statistical hypotheses. They gave considerable attention to the likelihood ratio in the development of likelihood ratio tests, including both univariate and multivariate tests. They pointed out the great complexity of the problems in deriving suitable test criteria for complex hypotheses as contrasted with simple hypotheses. They, like Fisher, demonstrated the wide variety of statistical tests to be derived from the maximum likelihood method. They posited that the size of the sample should be set before experimentation is begun. The decision rules for tests of significance within their theory are based upon hypothetical repetitions of similar size random samples from the population(s) under investigation.
Determination of the size of the sample is a function both of the size of the critical region and of the sensitivity or power of the test in detecting departures from the null hypothesis. They conceptualized the error rate in significance tests as the probability of obtaining a given or more extreme level of significance in an extended series of random fixed sample size experiments. They reasoned that a significance test is a decision procedure to be used as an aid in deciding which one of two actions to take in the face of uncertainty (Lehmann, 1959).
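The dependence of sample size on the size of the critical region and on the power of the test, as described above, can be made concrete for one simple case. The sketch below assumes a one-sided test of a normal mean with known standard deviation; the function names and the numerical values are illustrative assumptions, not part of the Neyman-Pearson papers.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and sd 1


def power_one_sided_z(delta, sigma, n, alpha=0.05):
    """Power of a one-sided z-test when the true mean exceeds the null value by delta."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)  # boundary of the critical region
    return 1 - STD_NORMAL.cdf(z_crit - delta * sqrt(n) / sigma)


def sample_size_one_sided_z(delta, sigma, alpha=0.05, power=0.80):
    """Smallest fixed sample size giving at least the requested power."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)  # controls the error of the first kind
    z_beta = STD_NORMAL.inv_cdf(power)       # controls the error of the second kind
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


# Assumed values: a departure of half a standard deviation from the null hypothesis.
n_required = sample_size_one_sided_z(delta=0.5, sigma=1.0)
print(n_required)
print(power_one_sided_z(0.5, 1.0, n_required))
```

When the departure `delta` is zero, the power reduces to the significance level `alpha`, which is the probability of an error of the first kind.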
In their exposition of the theory of significance tests, Neyman and Pearson incorporated the spirit of probabilistic reasoning for a two-valued problem. Theoretically, the null hypothesis and the alternative hypothesis have equal status: the data are to determine which hypothesis is more probable. The error of the first kind is considered the more serious; it is determined by the size of the critical region or the level of significance. The error of the second kind cannot be controlled directly by the experimenter, but is a function of the size of the critical region and of the characteristics of the power function of the test in relation to the specific alternative hypothesis under consideration. In actuality, the null hypothesis is rarely found to be acceptable in the Neyman-Pearson formulation, for as the sample size increases, a sufficiently sensitive test rejects the null hypothesis at a decreasing level of significance (Kendall and Stuart, 1961; Pratt, 1961b). This fact has been of grave concern to a number of statisticians (Savage and others, 1962). Together with certain other inconsistencies in the performance of the decision rule, it has evoked sharp criticism of this classical method for statistical inference (Pratt, 1961b). The Neyman-Pearson theory indicated the relationship between tests of significance and the method of interval estimation called confidence intervals (Berkson, 1942; Birnbaum, 1961; Bulmer, 1957; Kendall and Stuart, 1961; Lehmann, 1959; Natrella, 1960; Pratt, 1961a). A test of significance is concerned with only those values specified by the hypotheses under consideration, while confidence intervals by their width and by the level of confidence indicate the multitude of statistical hypotheses which are acceptable as well as how acceptable each hypothesis is by its location within the confidence interval.
The Neyman-Pearson rationale showed the practical and theoretical equivalence of tests of significance accompanied by the power function or the operating characteristic curve and confidence intervals (Natrella, 1960). Pratt (1961a, b) reminded statisticians of one important shortcoming of the theory of confidence intervals: the probability is either one or zero that the value of interest is contained within the confidence interval; moreover, the formulation is based upon the frequency interpretation of probability, which does not allow for a straightforward degree of confidence in the likelihood of the obtained statistic. Educational and psychological investigators have rarely used effectively the Neyman-Pearson formulation either for significance testing or for interval estimation.
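Two points from this section lend themselves to a short numerical sketch: the correspondence between a two-sided test and a confidence interval, and the way a sufficiently large sample rejects even a trivial departure from the null hypothesis. The example assumes a normal mean with known standard deviation; the function names and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sided_p_value(sample_mean, mu0, sigma, n):
    """Two-sided p-value for the null hypothesis that the population mean is mu0."""
    z = (sample_mean - mu0) * sqrt(n) / sigma
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def confidence_interval(sample_mean, sigma, n, level=0.95):
    """Confidence interval for the mean, standard deviation assumed known."""
    half_width = STD_NORMAL.inv_cdf(1 - (1 - level) / 2) * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Assumed data: observed mean 0.3 from n = 100 observations with sigma = 1.
low, high = confidence_interval(0.3, 1.0, 100)
for mu0 in [i / 10 for i in range(-2, 9)]:
    rejected = two_sided_p_value(0.3, mu0, 1.0, 100) < 0.05
    outside = not (low <= mu0 <= high)
    # The 5 percent test rejects exactly those values lying outside the 95 percent interval.
    print(mu0, rejected, outside)

# A departure of only 0.02 standard deviations, held fixed while n grows.
p_small_n = two_sided_p_value(0.02, 0.0, 1.0, 100)
p_large_n = two_sided_p_value(0.02, 0.0, 1.0, 1_000_000)
print(p_small_n, p_large_n)
```

The interval reports every hypothesized value that the test would retain, which is why it carries more information than a single accept-or-reject verdict on one null value.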
Contributions of Wald to Statistical Inference

In the short but exceptionally productive time that Wald contributed to statistics, he made many important modifications in the theory of statistical hypothesis testing. He recognized the desirability of incorporating into
statistical theory and practice provision for handling sequentially selected samples of varying sizes of items rather than limiting statistical design and analysis to fixed sample sizes, as required by the Neyman-Pearson method. He devised the sequential probability ratio test to analyze sequentially sampled items in industrial situations, such as quality control. Sampling is continued until a desired degree of precision is obtained as a basis either for accepting the null hypothesis or for rejecting it and accepting the alternative. The dichotomous decision problem of earlier significance tests is expanded into a trichotomous decision problem: either to accept or reject a statistical hypothesis or to continue sampling (Savage, 1954; Schlaifer, 1959). Two constants are selected for the test procedure to give the desired weight to the probability of each of the two kinds of errors. In his Statistical Decision Functions, Wald (1950) gave explicit consideration to the statistician’s role as a decision maker by his detailed specifications of the various factors of the decision situation. Previously, Neyman and Pearson had given indirect consideration to the possible losses consequent to a wrong decision by the different values to be selected for the two kinds of errors. Instead of basing the decision rule upon the specification of error rates as Fisher and Neyman and Pearson had done, Wald gave formal recognition to the assessment of losses involved in making wrong decisions (Bahadur and Robbins, 1950; Lindley, 1953, 1961). The formal mathematical structure of statistical decision theory is based upon the theory of games (Luce and Raiffa, 1957). It requires the statistical consumer or investigator to be able to list a set of possible actions, among which is a preferred action. The action preferred depends on what the “true state of nature” is, in other words, upon the unknown value of some parameter. 
The preferences among the actions are assessed in terms of the losses attached to each possible action to be taken in the presence of each state of nature (the circumstance which may be referred to as the loss function). The decision maker can obtain information about the state of nature at some specified cost for the observations made; he must balance cost against possible losses in the face of insufficient information. The decision function, then, depends upon the given loss function, the set of possible actions, the possible states of nature, the prior information available about the state of nature, and the cost of making observations (Savage, 1954). Point estimation, interval estimation, and hypothesis testing are subsumed within a single theory, known as statistical hypothesis testing. The problem of choosing a design and an analysis is integrated within the theory and is closely associated with the factors determining the choice of the statistical decision function. Another noteworthy contribution of Wald was his demonstration that there is not a unique decision procedure which is equally effective in all conditions. He showed that in those cases where prior information is available, a Bayes solution is admissible and optimum. There are situations, he pointed out,
in which a Bayes solution is not part of an admissible strategy. He also raised the question of formulating a class of admissible hypotheses, and he thereby clarified some of the issues connected with composite hypotheses in the Neyman-Pearson formulation. The theory of testing statistical hypotheses provides a decision procedure for working with multiple hypotheses rather than merely with two hypotheses, as in the theory of significance tests. For example, given three hypotheses, including a null hypothesis and two alternative hypotheses at different distances on either side of the null hypothesis, an optimum statistical decision function often is available to select the most probable of the three hypotheses on the available data (Lehmann, 1959). The mathematical complexity of statistical decision theory and the great amount of information which is required to use it have discouraged its application in behavioral science research. Its theoretical formulations have helped to clarify a number of problems in the conventional use of significance tests (Kaiser, 1960).
Contributions of Bayesian Statistical Theory to Statistical Inference

Bayesian statistical theory, for the most part, is not so dependent on significance tests for informative statistical inferences as is classical statistical theory, namely the theories of Fisher and Neyman and Pearson. With rare exceptions, the British-American school of statistics has been firmly entrenched in the objective or frequency interpretation of probability and has eschewed the use of Bayes’s theorem or the likelihood function as bases for formulating statistical inferences. The Continental school, on the other hand, has been concerned with giving formal mathematical treatment to the subjective theory of probability or with developing statistical methods based on the use of Bayes’s theorem (Savage, 1954). One British scientist working somewhat independently of the British-American school, his work unrecognized until recently by the majority of American statisticians, has made many important technical advances in the use of Bayes’s theorem. Jeffreys (1957, 1961) has treated the problem of significance tests in a unique manner. Tests of significance are used to assess whether the hypothesized parameter(s) can be considered to account adequately for the obtained data. If the approximation of the posterior probability distribution to the null hypothesis, which reflects the prior probability distribution, is not satisfactory, then another parameter is introduced into the model. The process of adjusting the value of the null hypothesis is continued until a satisfactory approximation is attained, that is, within the limits of random sampling variation and errors of measurement. Jeffreys has emphasized the
importance of refining measurement methods for the advancement of scientific knowledge. For Jeffreys one of the objectives of scientific investigation has been the verification of satisfactory null hypotheses. Perusal of the computations required in the use of Bayes’s theorem for t-tests of significance would dismay the average social scientist accustomed to t-tests without consideration of prior probability distributions. The merits of Jeffreys’ work have been recognized by Lindley (1953, 1961), Raiffa and Schlaifer (1961), Savage (1954, 1961), and Savage and others (1962). His work and that of the French and Italian probability theorists have influenced the surgence of Bayesian statistics, especially Bayesian statistical decision theory in the United States (Raiffa and Schlaifer, 1961; Roberts, 1962; Schlaifer, 1959). In his compendious discussion concerning the foundations of statistics of 10 years ago, Savage (Savage, 1954; Edwards, 1956) argued that various aspects of the classical theories of statistics could be reinforced and given a consistent logical framework relevant to the behavior of the scientist and the decision maker by adopting the personal interpretation of probability. At that time, he critically evaluated many of the conventional practices in the use of statistical methods, especially the widespread use of extreme null hypotheses which are known to be false prior to experimentation. Subsequently, Lindley (1961) and Raiffa and Schlaifer (1961) have built upon the work of Jeffreys a still incomplete structure of statistical theory and methods using Bayes’s theorem. Edwards, Lindman, and Savage (1963) have presented to psychologists some of the fundamental ideas and techniques of Bayesian statistics. Bayesian statisticians have pointed out that classical statistical methods do not make full use of available information and that they do not provide concise and relevant answers to the questions of greatest importance to investigators. 
Furthermore, in many situations the decision procedures do not meet the basic criterion of consistency (Pratt, 1961b). Classical and Bayesian formulations have points in common. Both have the following elements: alternative hypotheses or acts, possible parameter values or states, the possibility of sampling (experimentation) to obtain information about the parameter value or state, the sampling or experimental results in the form of descriptive statistics, and either loss structures or error rates. Both seek the best decision rule which will minimize loss or error in the decisions or conclusions made following experimentation. The classical decision maker chooses an act on the basis of the outcome of a sample which is conditional upon the parameter and the type of sample and experiment. Within the decision rules for significance tests, the so-called conditionality of the experiment and of the sample is not adequately represented; the statistician must use his judgment to give appropriate weight to the conditions when he interprets the obtained level of significance (Cox, 1958b).
The Bayesian statistician pursues a different course. He begins by making probabilistic statements about the parameter under investigation. The probabilistic statements are in the form of a prior distribution. Then, dependent upon how precise he wishes to be in his final probabilistic statements about the value of the parameter, he sets a limit, or an upper bound, in the form of an expected value on the cost or worth of experimental data for selecting a terminal act or decision. From this point of view he decides whether it is worthwhile to sample. As sampling information becomes available, the Bayesian modifies his prior distribution in the light of sample evidence and thereby obtains a posterior distribution. By using some specified decision rule, he may decide that he has sufficient information to come to a decision or a conclusion; or he may decide that his uncertainty has not been reduced enough and that, therefore, he must continue sampling until he has a suitable basis for a terminal act or conclusion (Raiffa and Schlaifer, 1961; Roberts, 1962). The Bayesian analysis formalizes many aspects of statistical practice which classical methods leave to the judgment of the investigator. For the orthodox contention that there is often little objective basis for arriving at prior probability distributions, Bayesians have countered that in at least a number of situations the prior distribution assumes little weight in the final or posterior distribution (Edwards, Lindman, and Savage, 1963; Savage and others, 1962). Several interesting solutions to this problem have been indicated by Raiffa and Schlaifer (1961). Edwards, Lindman, and Savage (1963), Lindley (1961), and Savage and others (1962) have discussed some of the advantages of Bayesian significance tests over classical tests. The first-mentioned authors have discussed the importance in many educational and psychological research problems of taking possible losses or gains and prior information into account. 
For example, classical t-tests and F-tests have been used to study the differential effects on several groups of several methods of instruction. In neither approach has it been practical to take into account previously available information about the differential effectiveness of one method versus another; nor has it been customary to give attention to the various kinds of losses, economic or learning, which might be associated with the adoption of one method rather than another. Edwards, Lindman, and Savage (1963) have shown how Bayesian procedures can provide appropriate answers to such problems.
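The passage from prior to posterior distribution, and the decision whether to continue sampling, can be illustrated with a minimal sketch. It assumes a success rate with a conjugate Beta prior and binomial data, and it uses a hypothetical stopping rule (stop once the posterior standard deviation is small enough); neither the model nor the rule is taken from the authors cited, and the batches of observations are invented for illustration.

```python
from math import sqrt


def update_beta(alpha, beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior plus binomial data gives a Beta posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def beta_sd(alpha, beta):
    """Standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return sqrt(alpha * beta / (total * total * (total + 1)))


def sample_until_precise(alpha, beta, batches, max_sd):
    """Update after each batch; stop once the posterior sd is at most max_sd.

    The stopping rule is a hypothetical stand-in for a formal decision rule.
    Returns the final (alpha, beta) and the number of batches actually used.
    """
    used = 0
    for successes, failures in batches:
        if beta_sd(alpha, beta) <= max_sd:
            break
        alpha, beta = update_beta(alpha, beta, successes, failures)
        used += 1
    return alpha, beta, used


# Invented batches of (successes, failures), for illustration only.
batches = [(7, 3), (6, 4), (8, 2), (7, 3)]

# Two investigators with opposed priors (means 0.25 and 0.75) see the same data.
# max_sd=0.0 is never reached, so every batch is used.
skeptic = sample_until_precise(1, 3, batches, max_sd=0.0)
optimist = sample_until_precise(3, 1, batches, max_sd=0.0)

prior_gap = abs(beta_mean(1, 3) - beta_mean(3, 1))
posterior_gap = abs(beta_mean(*skeptic[:2]) - beta_mean(*optimist[:2]))
print(f"gap between prior means:     {prior_gap:.3f}")
print(f"gap between posterior means: {posterior_gap:.3f}")
```

The narrowing gap between the two posterior means is one way to see the Bayesian reply quoted above, that in a number of situations the prior distribution carries little weight in the posterior once sample evidence accumulates.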
The Likelihood Principle and Statistical Inference
The likelihood principle, like Bayes’s theorem, has been recognized as a part of the armamentarium of statistics; but, as with Bayes’s theorem, it has been given little attention as a primary tool for statistical inference. The likelihood function is represented by the distribution function of the
Clark
Hypothesis Testing and Statistical Methodology
random sample variables corresponding to values of given parameters and is determined by the observed outcome of a random variable in any specified experiment (Birnbaum, 1962). Barnard, Fisher, and Birnbaum (Birnbaum, 1962) have argued that the likelihood function alone provides a suitable and sufficient basis for interpreting experimental data. The likelihood function can be interpreted without reference to the structure of the experiment. In contrast with the classical methods of significance tests and confidence levels, its use avoids the problem of having to modify the interpretation of the significance and confidence levels in the light of the experimental frame of reference. The likelihood function clearly asserts the irrelevance of possible experimental outcomes which have not been observed during any given experiment. One of the main objections to the use of the tail areas of the probability distribution for significance tests is just that unobserved experimental possibilities are irrelevant for the formulation of inferences about the actual observations. The likelihood function and Bayes’s theorem are not dependent upon the sequence of or the stopping point in sampling, whereas orthodox significance tests and confidence intervals cannot be used when sampling is stopped arbitrarily.
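The claim that the likelihood function does not depend on the stopping point, whereas tail-area significance levels do, can be checked numerically. The sketch below is an illustration under assumed data (9 successes and 3 failures, not taken from the source) and two assumed designs: a fixed sample of 12 trials, and sampling that continues until the third failure. All function names are hypothetical.

```python
from math import comb


def binomial_likelihood(theta, successes, failures):
    """Likelihood when the number of trials was fixed in advance."""
    n = successes + failures
    return comb(n, successes) * theta**successes * (1 - theta) ** failures


def negative_binomial_likelihood(theta, successes, failures):
    """Likelihood when sampling stopped at the `failures`-th failure."""
    ways = comb(successes + failures - 1, successes)
    return ways * theta**successes * (1 - theta) ** failures


def likelihood_ratio(likelihood, theta_a, theta_b, successes, failures):
    """Relative support for theta_a over theta_b under a given design."""
    return likelihood(theta_a, successes, failures) / likelihood(theta_b, successes, failures)


def binomial_tail_p(successes, failures, theta0=0.5):
    """One-sided p-value, fixed n: P(at least `successes` successes)."""
    n = successes + failures
    return sum(
        comb(n, k) * theta0**k * (1 - theta0) ** (n - k)
        for k in range(successes, n + 1)
    )


def negative_binomial_tail_p(successes, failures, theta0=0.5):
    """One-sided p-value, stop at the `failures`-th failure:
    P(at least `successes` successes occur before sampling stops)."""
    below = sum(
        comb(k + failures - 1, k) * theta0**k * (1 - theta0) ** failures
        for k in range(successes)
    )
    return 1.0 - below


s, f = 9, 3  # assumed data, for illustration only
lr_fixed = likelihood_ratio(binomial_likelihood, 0.75, 0.5, s, f)
lr_stop = likelihood_ratio(negative_binomial_likelihood, 0.75, 0.5, s, f)
p_fixed = binomial_tail_p(s, f)
p_stop = negative_binomial_tail_p(s, f)

print(f"likelihood ratio, fixed n:        {lr_fixed:.4f}")
print(f"likelihood ratio, stop at 3 fail: {lr_stop:.4f}")
print(f"p-value, fixed n:                 {p_fixed:.4f}")
print(f"p-value, stop at 3 fail:          {p_stop:.4f}")
```

The two likelihood ratios coincide because the design enters only through a constant factor that cancels, while the two tail areas differ because each sums over a different set of outcomes that were never observed.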
Use of Significance Tests for Statistical Inference
The test of significant difference has been called the prototype of modern experimental statistics (Tukey, 1960a). It is a qualitative rather than a quantitative procedure for statistical analysis and inference which is used to answer questions such as this: Dare we conclude that the difference is not zero? A classical significance test assays whether the two statistical hypotheses A and B are equal or whether A is less than or greater than B. The failure to obtain a significant difference does not warrant the conclusion that A is equal to B until careful consideration is given to the specific experimental situation, the available evidence, and the assessment of the consequences for such a decision or conclusion (Tukey, 1960a). The precision of the experimental comparison, the power of the statistical test, and the theoretical and the empirical closeness of A to B must be examined before the lack of statistical significance is interpreted as a positive finding (Cox, 1958a; Tukey, 1960a). The effectiveness of confidence intervals in showing the probable relationship between A and B indicates their superiority over significance tests for informative inference. Statisticians have given a great deal of attention to the use and the misuse of significance tests (Anscombe, 1961; Bahadur and Robbins, 1950; Good, 1958; Kish, 1959; Lindley, 1958; Savage, 1957; Selvin, 1957; Sterling, 1959, 1960; Williams, 1959). There has been general agreement that significance tests have been used too frequently in social science research (Grant, 1962; Harrington, 1961; Lubin,
1962; Nunnally, 1960; Rozeboom, 1960; Savage, 1957; and Wilson, 1961) and at the expense of more appropriate procedures, such as confidence and fiducial intervals (Birnbaum, 1961; Grant, 1962; Kish, 1959; Lindley, 1958; and Savage and others, 1962). Significance tests without power functions do not answer such questions as these: How far is the sample statistic from the null hypothesis? How much credence should be given to the null or the alternative hypotheses? The situations in which classical significance tests can be used are limited. Anscombe (1961) has pointed out the usefulness of significance tests as a fundamental method for testing theoretical hypotheses or models. For this purpose, the null hypothesis is given some credence in the light of theoretical and experimental consideration. The investigator wishes to determine in a new sample whether the null hypothesis is tenable, the observed departures being no more than those expected as a result of random sampling variations. When the departures are excessive, the investigator considers whether another parameter should be introduced into the statistical model or whether some other modification in the model should be made. Another use for significance tests occurs when an experimenter wishes to test the adequacy of a particular statistical design, an analysis or technique, a particular stochastic process, or an experimental procedure. He is interested in determining whether the methods used produce the desired precision or uniformity with due allowance for random sampling errors (Anscombe, 1961). Significance tests are also applicable when an investigator is concerned with testing a particular statistical hypothesis, when a set of specific alternatives has not yet been conceptualized (Savage and others, 1962). When a research worker wishes to verify a prediction that an experimental result has been in a specific direction, one-tailed significance tests are appropriate (Anscombe, 1961). 
Significance tests can be informative in later stages of research when estimation is included in the statistical procedures. The null hypothesis of no difference has been judged to be no longer a sound or fruitful basis for statistical investigation. Both the null and the alternative hypotheses are formulated to include estimation of relevant experimental variables (Bush, 1963; Grant, 1962; Nunnally, 1960; Tukey, 1960b). Significance tests can be divided into two groups: those for which a set of alternative hypotheses can be defined, such as tests of specific parameter values, and those for which a set of mutually exclusive and exhaustive hypotheses are not available, such as tests of randomness. The former almost always have comparable interval estimation procedures – at least, such procedures can be derived – whereas the latter pose serious problems for the derivation of interval estimation procedures. When comparable interval estimation procedures are not available, then an investigator has no choice but to use significance tests, as in the use of some distribution-free methods (Kendall and Stuart, 1961; Savage and others, 1962).
There are many statistical problems for which both significance and interval estimation procedures are available. Similar answers are obtained if the procedures are used effectively and appropriately (Berkson, 1942; Birnbaum, 1961; Kendall and Stuart, 1961; Natrella, 1960; Pratt, 1961a). Statisticians have agreed that interval estimation does give a more intuitively understandable summary of the data than do significance tests with power functions (Kish, 1959; Natrella, 1960). Among statistical problems which often can be treated by either testing or interval estimation procedures are (a) simple preference problems, wherein a family of simple hypotheses is ranked in order of credibility on the basis of the data, and (b) composite preference problems, wherein a family of composite hypotheses is ranked on the basis of the data (Lehmann, 1959; Savage and others, 1962). These problems conventionally have been classified under the rubrics of hypothesis testing, point and interval estimation, and discrimination (Savage and others, 1962).
Problems in Statistical Hypothesis Testing
There are many unsolved problems in the theory and the practice of statistical hypothesis testing. Some of the theoretical problems have been mentioned at the beginning of this chapter. In the following paragraphs some of the practical problems are summarized.
Decision Theory and Conclusion Theory
Tukey (1960a) has suggested that some of the misunderstanding surrounding the use of significance tests can be clarified by distinguishing those statistical problems concerned with the necessity of making a choice between two alternatives from those concerned with the accumulation of evidence in support of a theory or set of working hypotheses. Decision theory is typically concerned with problems in which economic losses or rewards, a set of possible actions, and their consequences in various states of nature can be defined. Decision theory counsels the statistical consumer about how to choose wisely among available strategies (Bahadur and Robbins, 1950; Flanagan, 1958). Conclusion theory, on the other hand, is concerned with evaluating the adequacy of evidence. Conclusions need not be made if the evidence is deemed inadequate. A conclusion is accepted relative to the conditions of an experiment until compelling evidence to the contrary is found. Decision procedures choose between two hypotheses in terms of minimizing the risk for an action, while conclusion procedures often are concerned with controlling errors of both the first and the second kinds at suitably low levels in choosing between two hypotheses (Anscombe, 1961; Lindley, 1961; Tukey, 1960a).
Rejection or Acceptance of the Null Hypothesis
Much of the controversy among statisticians and behavioral scientists about the theory and the practice of statistical hypothesis testing has centered upon whether the null hypothesis can reasonably be accepted. Grant (1962) has argued that experiments oriented toward the acceptance of null hypotheses are inappropriate. Binder (1963) has countered that the Neyman-Pearson formulation admits the validity of accepting the null hypothesis, although, unless some adjustment in significance level is made, the rationale is prone to reject the null hypothesis when sample sizes are large or when powerful tests are used (Pratt, 1961b). It is the ready rejection of the null hypothesis by classical methods that is the focus of much of the Bayesian criticism (Edwards, Lindman, and Savage, 1963; Savage and others, 1962). Among statisticians, Anscombe (1961), Good (1958), Lindley (1958, 1961), Savage and others (1962), and Sterling (1960) have recommended that hypothesis testing should be directed toward testing plausible null hypotheses and not toward testing implausible ones that can be rejected by a single observation.
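The tendency of a fixed significance level to reject the null hypothesis when samples are large can be shown with a small calculation. The sketch assumes a two-sided z-test and a tiny standardized departure from the null (0.02 standard deviations, an invented figure), and it supposes for simplicity that the observed sample mean lands exactly on the true value; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def p_value_at_true_effect(effect_size, n):
    """Two-sided z-test p-value when the observed standardized mean
    difference equals `effect_size` exactly (a simplifying assumption)."""
    z = effect_size * sqrt(n)
    return 2 * (1 - NormalDist().cdf(abs(z)))


effect = 0.02  # a departure from the null of little practical interest
for n in (100, 1_000, 10_000, 100_000):
    print(f"n = {n:>7}  p = {p_value_at_true_effect(effect, n):.4f}")
```

With the departure held fixed, the p-value falls steadily as the sample grows, so a null hypothesis that is only trivially false is eventually rejected at any conventional level.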
Statistical Significance and Substantive Significance
That “significance levels do not signify” (Savage and others, 1962) has been widely recognized as a primary problem in the use of significance tests. Among the proposed solutions to the problem have been the inclusion of a distance function to indicate how deviant the data are from the null hypothesis (Bulmer, 1957), the use of a form of variance in analysis of variance to show how well the independent variables account for the obtained results (Bolles and Messick, 1958; Gaito, 1958), the substitution of interval estimation for significance tests to give a quantitative representation of the spread of the obtained values (Birnbaum, 1961; Kish, 1959; Natrella, 1960), the use of Bayesian methods (Edwards, Lindman, and Savage, 1963), and the use of the simple likelihood function (Birnbaum, 1962). Significance tests as methods for informative inference have been criticized severely (Birnbaum, 1962; Edwards, Lindman, and Savage, 1963; Lubin, 1962; Rozeboom, 1960; Savage and others, 1962). Mere inspection of the data often reveals the untenability of the null hypothesis. When there are grounds for believing that random sampling variations alone do not account for the data, a significance test is not the most appropriate method to use (Williams, 1959). Significance tests do not summarize the evidence (Birnbaum, 1962; Lindley, 1958; Tukey, 1962; Wilson, 1961). Much caution must be used in interpreting significance levels, for the obtained level of significance is not consistent from sample to sample or among comparable tests (Pratt, 1961b). The obtained level of significance is dependent upon conditions which do not
modify the obtained level (Birnbaum, 1962). There are many conditions which vitiate the accuracy of the obtained significance levels, even for robust test procedures (Tukey, 1962). Often graphical representation indicates the presence of disturbing conditions, but social scientists do not use such a procedure regularly (Tukey, 1962). Classical significance procedures are not sensitive to the size of the error (Wilson, 1961); often it is the size of the error that an investigator wishes to know to assess his procedures. An artificial level of significance is not a sufficient or an efficient indication of the presence or absence of evidence; yet editorial policies have chosen to ignore this fact in their insistence on extreme levels of significance (Melton, 1962). Publication policies have posed other problems for the interpretation of test results. Sterling (1959) has commented upon the prevalence of Type I errors among psychological publications associated with the suppression of nonsignificant results as a consequence of the inclination of editors to publish mainly articles with statistically significant findings. Cohen (1962) has found that articles on social and abnormal psychology have not been appropriately concerned with the power of the designs and tests used.
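The two publication problems just mentioned, low power and the selective printing of significant results, can be put in rough numerical terms. The sketch below is an illustration only: it assumes a normal approximation to the power of a two-group comparison, and it assumes a hypothetical share of tested hypotheses for which a real effect exists. None of the figures comes from Sterling (1959) or Cohen (1962), and the function names are invented.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sample_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group test of means.

    Uses a normal approximation (an assumption): the test statistic is
    shifted by effect_size * sqrt(n_per_group / 2) under the alternative.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    return (1 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)


def false_positive_share(alpha, power, share_true):
    """Share of significant results that are Type I errors, if only
    significant results are published.

    `share_true` is the hypothetical proportion of tested hypotheses
    for which a real effect exists.
    """
    false_hits = alpha * (1 - share_true)
    true_hits = power * share_true
    return false_hits / (false_hits + true_hits)


power = two_sample_power(effect_size=0.5, n_per_group=30)
print(f"approximate power: {power:.2f}")
for share_true in (0.5, 0.1, 0.0):
    share = false_positive_share(0.05, power, share_true)
    print(f"real effects in {share_true:.0%} of studies -> Type I share of published: {share:.2f}")
```

Under these assumptions the printed literature contains a far higher proportion of Type I errors than the nominal significance level suggests, and the proportion rises as power falls.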
Significance Tests and Statistical Assumptions
The assumptions underlying all phases of statistical methods constitute pervasive problems in statistical practice (Savage, 1957; Savage and others, 1962). Rarely can an investigator be certain that the assumptions are adequately fulfilled in any particular problem (Tukey, 1960a, 1962). Among the important problems regarding assumptions in ordinary statistical practice are the following: (a) the appropriateness of the probability model to characterize the population (Savage, 1957; Neyman, 1960); (b) the inappropriate use of small sample methods before large sample methods have been used to make estimates of variance (Nunnally, 1960); (c) the robustness or sensitivity of statistical methods to violations of such assumptions as normality, independence, homogeneity, additivity (Boneau, 1960); (d) the selection of test procedures with due awareness for the performance and the requirements of the procedures and the nature of the data (Binder, 1959; Boneau, 1960; Kendall and Stuart, 1961; Lehmann, 1959); (e) the use of transformations of data to meet the assumptions of the more powerful statistical methods versus the use of the less powerful distribution-free and nonparametric methods (Boneau, 1962; Lehmann, 1959; Savage, 1957). Always the investigator must remember that different tests based on different assumptions when applied to varying sets of data lead to different results (Tukey, 1962). Often the precision promised is not the precision delivered (Tukey, 1954, 1962). Another basic assumption underlying classical statistical methods is that of randomization. Psychologists have been criticized for not being sufficiently concerned with random sampling procedures, the definition of target
populations, and the limitations in the representative nature of most samples studied (Kish, 1959; McGinnis, 1958; Nunnally, 1960; Stevens, 1960). McGinnis (1958) and Kish (1959) have discussed the validity of significance tests in surveys versus well-controlled experiments.
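The question of robustness raised under point (c) above can be explored by simulation. The following sketch is not a reproduction of any cited study: it assumes two groups of 10 observations, a pooled-variance t-test at the 5 percent level, and an exponential distribution as one arbitrary example of non-normality. The seed, trial count, and function names are all illustrative choices.

```python
import random
from math import sqrt
from statistics import mean, variance

# Two-sided 5% critical value of Student's t with 18 degrees of freedom.
# It matches the default group size below (10 + 10 - 2 = 18).
T_CRIT_DF18 = 2.101


def pooled_t(x, y):
    """Two-sample t statistic with a pooled variance estimate."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))


def rejection_rate(draw, n=10, trials=2000, seed=0):
    """Proportion of trials in which a true null hypothesis is rejected.

    Both groups are drawn from the same population, so every rejection
    is a Type I error. `draw` takes a random.Random and returns one value.
    Keep n=10 unless the critical value is changed to match.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        x = [draw(rng) for _ in range(n)]
        y = [draw(rng) for _ in range(n)]
        if abs(pooled_t(x, y)) > T_CRIT_DF18:
            rejections += 1
    return rejections / trials


normal_rate = rejection_rate(lambda rng: rng.gauss(0.0, 1.0))
skewed_rate = rejection_rate(lambda rng: rng.expovariate(1.0))
print(f"Type I error rate, normal data:      {normal_rate:.3f}")
print(f"Type I error rate, exponential data: {skewed_rate:.3f}")
```

Comparing the two rates against the nominal 0.05 gives a rough empirical reading of how far the obtained significance level drifts when the normality assumption fails; other sample sizes, unequal variances, or dependence would need their own runs.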
Data Analysis and Statistical Hypothesis Testing
Tukey (1962) commented that much of current statistical practice has emphasized the development of elaborate methods at the expense of attention to the analysis of data. Instead of concentrating upon probing the nature of the data and upon the questions that concern scientists, statistical practitioners have been prone to commit errors of the third kind – that is, giving exact answers to the wrong questions – which is perhaps the most serious of the three kinds of errors (Schlaifer, 1959). The routine use of significance tests has perpetrated many errors of the third kind, for, as mentioned above, significance tests do not provide the information that scientists need, and, furthermore, they are not the most effective method for analyzing and summarizing data. Tukey (1954, 1962) recommended that attention should be given to adapting methods to problems rather than to forcing problems into particular methods.
Statistics and Experimentation
Statistical methods have been widely used in the sciences in which strict experimental controls are either not feasible or not desirable. The usefulness of statistical methods for designing and analyzing investigations has been widely recognized; however, there have been dissenters, among them Hogben (Hogben, 1957; Stevens, 1960), who have maintained that the most appropriate solution for the many problems associated with the theory and practice of statistics is the replacement of statistical methods by well-controlled experimentation. Lindley (1958), in his response to Hogben’s criticisms, contended that statistical methodology requires extensive modifications in order to play an effective part in science, but that statistics will doubtless continue to have an important role in all kinds of research. The problem of the relative emphasis which should be given to considerations of statistical versus experimental design does not admit an easy solution.
Statistics and Psychological Theory
If statistics is to make significant contributions to the testing of scientific hypotheses, then the methodology of statistical hypothesis testing must
incorporate theoretical formulations of the subject matter under investigation. Statistical inferences are maximally informative if they are supported by other findings and if they are enmeshed in an expanding network of hypotheses and constructs. The constitution of a set of mutually exclusive and exhaustive statistical hypotheses requires careful consideration of relevant theoretical alternatives in order to select heuristically valuable alternatives (Savage and others, 1962). Recognition of the limitations in the theoretical formulations and in the statistical methods should motivate prudence against unwarranted statistical and scientific inferences (Jeffreys, 1957). Bush (1963) remarked that theories are not substantiated by the goodness of fit between the data and a hypothetical statistical model but rather by the estimation of significant parameters in the statistical model so that information about the functions of experimental and theoretical variables can be accumulated. Estimation is essential for the testing of accurate and fruitful theories.
Conclusion
While the so-called crisis in statistics (Hogben, 1957; Lindley, 1958; Stevens, 1960) has not been resolved, it has highlighted many of the shortcomings in such frequently used methods as significance tests, and it has evoked constructive thought about the very foundations of statistical inference (Savage and others, 1962) as well as about the rationale of statistical methods (Tukey, 1954, 1962). If educational and psychological research workers heed the admonitions and the recommendations of statisticians, a change in many aspects of statistical practices may be anticipated. At least investigators will be wary of the routine application of significance tests as the main basis for statistical inference.
Bibliography
Anscombe, F. J. “Examination of Residuals.” Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley: University of California Press, 1961. Vol. 1, pp. 1–35.
Bahadur, Raghu Raj, and Robbins, Herbert. “The Problem of the Greater Mean.” Annals of Mathematical Statistics 21: 469–87; December 1950.
Berkson, Joseph. “Tests of Significance Considered as Evidence.” Journal of the American Statistical Association 37: 325–35; September 1942.
Binder, Arnold. “Considerations of the Place of Assumptions in Correlational Analysis.” American Psychologist 14: 504–10; August 1959.
Binder, Arnold. “Further Considerations on Testing the Null Hypothesis and the Strategy and Tactics of Investigating Theoretical Models.” Psychological Review 70: 107–15; January 1963.
Birnbaum, Allan. “Confidence Curves: An Omnibus Technique for Estimation and Testing Statistical Hypotheses.” Journal of the American Statistical Association 56: 246–49; June 1961.
Birnbaum, Allan. “On the Foundations of Statistical Inference.” Journal of the American Statistical Association 57: 269–326; June 1962.
Bolles, Robert C. “The Difference Between Statistical Hypotheses and Scientific Hypotheses.” Psychological Reports 11: 639–45; December 1962.
Bolles, Robert, and Messick, Samuel. “Statistical Utility in Experimental Inference.” Psychological Reports 4: 223–27; June 1958.
Boneau, C. Alan. “The Effects of Violations of Assumptions Underlying the t Test.” Psychological Bulletin 57: 49–64; January 1960.
Boneau, C. Alan. “A Comparison of the Power of the U and t Tests.” Psychological Review 69: 246–56; May 1962.
Buehler, Robert J. “Some Validity Criteria for Statistical Inferences.” Annals of Mathematical Statistics 30: 845–63; December 1959.
Bulmer, M. G. “Confirming Statistical Hypotheses.” Journal of the Royal Statistical Society (Series B, Methodological) 19: 125–32; No. 1, 1957.
Bush, Robert R. “Estimation and Evaluation.” Handbook of Mathematical Psychology. (Edited by R. Duncan Luce, Robert R. Bush, and Eugene Galanter.) New York: John Wiley & Sons, 1963. Vol. 1, Chapter 8, pp. 429–69.
Cohen, Jacob. “The Statistical Power of Abnormal-Social Psychological Research: A Review.” Journal of Abnormal and Social Psychology 65: 145–53; September 1962.
Cox, D. R. Planning of Experiments. New York: John Wiley & Sons, 1958. p. 308. (a)
Cox, D. R. “Some Problems Connected with Statistical Inference.” Annals of Mathematical Statistics 29: 357–72; June 1958. (b)
Edwards, Ward. “Savage Statistics.” Contemporary Psychology 1: 14–15; January 1956.
Edwards, Ward; Lindman, Harold; and Savage, Leonard J. “Bayesian Statistical Inference for Psychological Research.” Psychological Review 70: 193–242; May 1963.
Fisher, Sir Ronald A. Statistical Methods and Scientific Inference. New York: Hafner Publishing Co., 1956. p. 175.
Fisher, Sir Ronald A. The Design of Experiments. Seventh edition. New York: Hafner Publishing Co., 1960. p. 248.
Flanagan, John C. “The Dollar Dimension in Testing Decisions.” Contemporary Psychology 3: 164–66; June 1958.
Gaito, John. “The Bolles-Messick Coefficient of Utility.” Psychological Reports 4: 595–98; December 1958.
Good, I. J. “Significance Tests in Parallel and in Series.” Journal of the American Statistical Association 53: 799–813; December 1958.
Grant, David A. “Testing the Null Hypothesis and the Strategy and Tactics of Investigating Theoretical Models.” Psychological Review 69: 54–61; January 1962.
Harrington, Gordon M. “Statistics’ Logic.” Contemporary Psychology 6: 304–305; September 1961.
Hogben, Lancelot. Statistical Theory: The Relationship of Probability, Credibility, and Error. New York: W. W. Norton & Co., 1957. p. 510.
Hotelling, Harold. “The Impact of R. A. Fisher on Statistics.” Journal of the American Statistical Association 46: 35–46; March 1951.
Hotelling, Harold. “The Statistical Method and the Philosophy of Science.” American Statistician 12: 9–14; December 1958.
Jeffreys, Harold. Scientific Inference. Second edition. New York: Cambridge University Press, 1957. p. 236.
Jeffreys, Harold. Theory of Probability. Third edition. New York: Oxford University Press, 1961. p. 447.
Kaiser, Henry F. “Directional Statistical Decisions.” Psychological Review 67: 160–67; May 1960.
Kendall, Maurice G., and Stuart, Alan. The Advanced Theory of Statistics. Revised edition. New York: Hafner Publishing Co., 1961. Vol. 2, “Inference and Relationship,” 676 pp.
Kish, Leslie. “Some Statistical Problems in Research Design.” American Sociological Review 24: 328–38; June 1959.
Lehmann, E. L. Testing Statistical Hypotheses. New York: John Wiley & Sons, 1959. 369 pp.
Lindley, D. V. “Statistical Inference.” Journal of the Royal Statistical Society (Series B, Methodological) 15: 30–76; No. 1, 1953.
Lindley, D. V. “Professor Hogben’s ‘Crisis’ – A Survey of the Foundations of Statistics.” Applied Statistics 7: 186–98; November 1958.
Lindley, D. V. “The Use of Prior Probability Distributions in Statistical Inference and Decisions.” Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley: University of California Press, 1961. Vol. 2, pp. 453–68.
Lubin, Ardie. “Statistics.” Annual Review of Psychology. (Edited by Paul R. Farnsworth, Olga McNemar, and Quinn McNemar.) Palo Alto, Calif.: Annual Reviews, 1962. Vol. 13, pp. 345–70.
Luce, R. Duncan, and Raiffa, Howard. Games and Decisions: Introduction and Critical Survey. New York: John Wiley & Sons, 1957. 509 pp.
McGinnis, Robert. “Randomization and Inference in Sociological Research.” American Sociological Review 23: 408–14; August 1958.
McNemar, Quinn. “At Random: Sense and Nonsense.” American Psychologist 15: 295–300; May 1960.
Melton, Arthur W. “Editorial.” Journal of Experimental Psychology 64: 553–57; December 1962.
Mowrer, O. Hobart. Learning Theory and the Symbolic Processes. New York: John Wiley & Sons, 1960. 473 pp.
Natrella, Mary G. “The Relation Between Confidence Intervals and Tests of Significance.” American Statistician 14: 20–22, 38; February 1960.
Neyman, Jerzy. “Indeterminism in Science and New Demands on Statisticians.” Journal of the American Statistical Association 55: 625–39; December 1960.
Nunnally, Jum. “The Place of Statistics in Psychology.” Educational and Psychological Measurement 20: 641–50; Winter 1960.
Pearson, E. S. “Some Thoughts on Statistical Inference.” Annals of Mathematical Statistics 33: 394–403; June 1962.
Pratt, John W. “Length of Confidence Intervals.” Journal of the American Statistical Association 56: 549–67; September 1961. (a)
Pratt, John W., reviewer. “Testing Statistical Hypotheses by E. L. Lehmann.” Journal of the American Statistical Association 56: 163–67; March 1961. (b)
Raiffa, Howard, and Schlaifer, Robert. Applied Statistical Decision Theory. Boston: Division of Research, Harvard Business School, Harvard University, 1961. 356 pp.
Roberts, Harry V., reviewer. “Applied Statistical Decision Theory by Howard Raiffa and Robert Schlaifer.” Journal of the American Statistical Association 57: 199–202; March 1962.
Rozeboom, William W. “The Fallacy of the Null-Hypothesis Significance Test.” Psychological Bulletin 57: 416–28; September 1960.
Savage, I. Richard. “Nonparametric Statistics.” Journal of the American Statistical Association 52: 331–44; September 1957.
Savage, Leonard J. The Foundations of Statistics. New York: John Wiley & Sons, 1954. 294 pp.
Savage, Leonard J. “The Foundations of Statistics Reconsidered.” Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley: University of California Press, 1961. Vol. 1, pp. 575–86.
Savage, Leonard J., and others. The Foundations of Statistical Inference. New York: John Wiley & Sons, 1962. 112 pp.
Schlaifer, Robert. Probability and Statistics for Business Decisions. New York: McGraw-Hill Book Co., 1959. 732 pp.
Selvin, Hanan C. “A Critique of Tests of Significance in Survey Research.” American Sociological Review 22: 519–27; October 1957.
Sterling, Theodor D. “Publication Decisions and Their Possible Effects on Inferences Drawn from Tests of Significance – or Vice Versa.” Journal of the American Statistical Association 54: 30–34; March 1959.
Sterling, Theodor D. “What Is So Peculiar About Accepting the Null Hypothesis?” Psychological Reports 7: 363–64; October 1960.
Stevens, S. S. “The Predicament in Design and Significance.” Contemporary Psychology 5: 273–76; September 1960.
Tukey, John W. “Unsolved Problems of Experimental Statistics.” Journal of the American Statistical Association 49: 706–31; December 1954.
Tukey, John W. “Conclusions vs. Decisions.” Technometrics 2: 423–33; November 1960. (a)
Tukey, John W. “Where Do We Go from Here?” Journal of the American Statistical Association 55: 80–93; March 1960. (b)
Tukey, John W. “The Future of Data Analysis.” Annals of Mathematical Statistics 33: 1–67; March 1962.
Wald, Abraham. Statistical Decision Functions. New York: John Wiley & Sons, 1950. 179 pp.
Williams, E. J. Regression Analysis. New York: John Wiley & Sons, 1959. 214 pp.
Wilson, Kellogg V. “Subjectivist Statistics for the Current Crisis.” Contemporary Psychology 6: 229–31; July 1961.
Yates, F. “The Influence of Statistical Methods for Research Workers on the Development of the Science of Statistics.” Journal of the American Statistical Association 46: 19–34; March 1951.
75
On Examinee Choice in Educational Testing
Howard Wainer and David Thissen
If you allow choice, you will regret it; if you don’t allow choice, you will regret it; whether you allow choice or not, you will regret both. —Kierkegaard, 1986, p. 24
Throughout the educational process, decisions are made using nonrandomly selected data. Admissions to college are decided among individuals whose dossiers contain mixtures of material; one high school student may opt to emphasize courses in math and science whereas another may have taken advanced courses in French and Spanish. One student might have been editor of the school newspaper, another captain of the football team, and yet a third might have been first violin in the orchestra. All represent commitment and success; how are they to be compared? Is your French better than my calculus? Is such a comparison sensible? Admissions offices at competitive universities face these problems all the time; dismissing them is being blind to reality. Moreover, there is obvious sense in statements like, “I know more physics than you know French.” Or, “I am a better runner than you are a swimmer.” Cross-modal comparisons are not impossible, given that we have some implicit underlying notion of quality. How accurate are such comparisons? Can we make them at all when the differences between the individuals are subtle? How do we take into account the difficulty of the accomplishment? Is being an All-State athlete as distinguished an accomplishment as being a Merit Scholarship finalist?
Source: Review of Educational Research, 64(1) (1994): 159–195.
How can we understand comparisons like these? Perhaps we can begin by considering the more manageable situation that manifests itself when the examination scores of students who are to be compared are obtained from test items that the students have chosen themselves. Such a situation is likely to occur increasingly often in the future, because of the greater contemporary emphasis on assessing what are called generative or constructive processes in learning. To be able to measure such processes, testing programs believe they must incorporate constructed response items into their previously multiple-choice standardized exams. It is hoped that large items such as testlets, essays, mathematical proofs, experiments, portfolios, or other performance-based tasks are better able to measure deep understanding, broad analysis, and higher levels of performance than traditional multiple-choice items. Examples of tests currently using large items are the College Board’s Advanced Placement Examinations in United States History, European History, United States Government and Politics, Physics, Calculus, and Chemistry. There are many others.

When an exam consists, in whole or in part, of constructed response items, it is common practice to allow the examinee to choose a subset of the constructed response questions from a larger pool. It is sometimes argued that, if choice were not allowed, the limitations on domain coverage forced by the small number of items might unfairly affect some examinees. An alternative to choice would be to increase the length of the test; this is not often practical. Another alternative would be to confine the test questions to a core curriculum that all valid courses ought to cover. This option may discourage teachers from broadening their courses beyond that core.

Even in the absence of attractive alternatives, is allowing examinee choice a sensible strategy? Under what conditions can we allow choice without compromising the fairness and quality of the test? We will not provide a complete answer to these questions. We will illustrate some of the pitfalls associated with allowing examinee choice; we will provide a vocabulary and framework that aids in the clear discussion of the topic; we will outline some experimental steps that can tell us whether choice can be implemented fairly; and we will provide some experimental evidence that illuminates the topic. We begin with a brief history of examinee choice in testing, in the belief that it is easier to learn from the experiences of our clever predecessors than to try to relive those experiences.
A Selective History of Choice in Exams

We shall confine our attention to some college entrance exams used during the first half of the 20th century in the United States. The College Entrance Examination Board (CEEB) began testing prospective college students at
the turn of the century. By 1905, exams were offered in 13 subjects: English, French, German, Greek, Latin, Spanish, mathematics, botany, chemistry, physics, drawing, geography, and history (College Entrance Examination Board, 1905). Most of these exams contained some degree of choice. In addition, all of the science exams used portfolio assessment, as it might be called in modern terminology. For example, on the 1905 Botany exam 37% of the grade was based on the examinee’s laboratory notebook. The remaining 63% was based on a 10-item exam. The examinee was asked to answer 7 of those 10 items. This yielded 120 (10 choose 7) different examinee-created “forms.” Question 10 (below), of the 1905 Botany exam, could complicate the computation of the number of choice-created forms:

10. Select some botanical topic not included in the questions above, and write a brief exposition of it.
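The figure of 120 forms quoted above is a binomial coefficient. A minimal sketch of that count follows; the function name `n_forms` is introduced here for illustration only and does not come from the original article.

```python
from math import comb

def n_forms(n_items: int, n_answered: int) -> int:
    """Count the distinct examinee-created forms when an examinee
    must answer n_answered items out of n_items offered."""
    return comb(n_items, n_answered)

# 1905 Botany exam as described in the text: answer 7 of 10 items.
print(n_forms(10, 7))  # 120
```

The count treats every item as a fixed question. An open-ended item such as Question 10 breaks that premise, which is why the text notes that it complicates the computation.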
The graders for each test were identified, with their institutional affiliations. The chemistry and physics exams shared the structure of the botany exam. As the CEEB grew more experienced, the structure of its tests changed. The portfolio aspect disappeared by 1913, when the requirement of a teacher’s certification that the student had, in fact, completed a lab course was substituted for the student’s notebook. By 1921, this certification was no longer required. The extent to which examinees were permitted choice varied; see Table 1, in which the number of possible examinee-created forms are listed for several subjects and years. The contrast between the flamboyance of the English exam and the staid German exam is instructive. Only once, in 1925, was any choice allowed on the German exam (“Answer only one of the following six questions”). The lack of choice seen in the German exam is representative of all of the foreign language exams except Latin. The amount of choice seen in the Physics and Chemistry exams parallels that for most exams that allowed choice. The English exam between 1913 and 1925 is unique in terms of the possible variation.1 No equating of these examinee-constructed forms was considered.

Table 1: Number of possible test forms generated by examinee choice patterns

Year    Chemistry    Physics    English      German
1905    54           81         64           1
1909    18           108        60           1
1913    8            144        7,260        1
1917    252          1,620      1,587,600    1
1921    252          216        2,960,100    1
1925    126          56         48           6
1929    20           56         90           1
1933    20           10         24           1
1937    15           2          1            1
1941    1            1          1            1

By 1941, the CEEB offered 14 exams, but only 3 (American History, Contemporary Civilization, and Latin) allowed examinee choice. Even among these, choice was sharply limited:

• In the American History exam, there were six essay questions. Essays 1, 2, and 6 were mandatory. There were three questions about the American Constitution (labeled 3A, 4A, and 5A) as well as three parallel questions about the British Constitution (3B, 4B, and 5B). The examinee could choose either the A questions or the B questions.
• In the Contemporary Civilization exam, there were six essay questions. Questions 1–4 and 6 were all mandatory. Question 5 consisted of six short-answer items out of which the examinee had to answer five.
• The Latin exam had many parts. In sections requiring translation from Latin to English or vice versa, the examinee often had the opportunity to pick one passage from a pair to translate.
In 1942, the last year of the program, there were fewer exams given; none allowed examinee choice. Why did the use of choice disappear over the 40 years of this pioneering examination program? Our investigations did not yield a definitive answer, although there are many hints that suggest that issues of fairness propelled the CEEB toward the test structure on which they eventually settled. Our insight into this was sharpened during the reading of Brigham’s (1934, p. i) remarkable report on “the first major attack on the problem of grading the written examination.” This exam required the writing of four essays. There were six topics; Topics 1 and 6 were required of all examinees, and there was a choice offered between Topics 2 and 3 and between Topics 4 and 5. The practice of allowing examinee choice was termed “alternative questions” (p. i). Brigham noted (1934, p. 7) that,

When alternative questions are used, different examinations are in fact set for the various groups electing the different patterns. The total score reported to the college is to a certain extent a sum of the separate elements, and the manner in which the elements combine depends on their intercorrelation. This subject is too complex to be investigated with this material.
Brigham’s judgment of the difficulty is mirrored by Harold Gulliksen (1950, p. 338), who wrote that, “In general it is impossible to determine the appropriate adjustment without an inordinate amount of effort. Alternative questions should always be avoided.” Both of these comments suggest that, while adjusting for the effects of examinee choice is possible, it is too difficult to do within an operational
context. We suspect that even these careful researchers underestimated the difficulty of satisfactorily accomplishing such an adjustment; their warnings are strongly reminiscent of Fermat’s marginal comments about his only just proved theorem. This view was supported by Ledyard Tucker (personal communication, February 10, 1993), who said that, “I don’t think that they knew how to deal with choice then. I’m not sure we know how now.” The core of the problem of choice is that, when it is allowed, the examinees’ choices generate what can be thought of as different forms of the test. These forms may not be of equal difficulty. When different forms are administered, standards of good testing practice require that those forms be statistically equated so that individuals who took different forms can be compared fairly. Perhaps, through pretesting and careful test construction, it may be possible to make the differences in form difficulty sufficiently small that further equating is not required. As we will show, an unbiased estimate of the difficulty of any item can only be obtained from a random sample of the examinee population. The CEEB made no attempt to get such a sample. Perhaps that is why Brigham said that, “This subject is too complex to be investigated with this material.”
Are Choice Items of Equal Difficulty?

We have not yet found a situation in which it was plausible to believe that two choice questions were of equal difficulty. However, in only one case could this be shown unequivocally; in operational tests, there are always alternative explanations. These alternatives, although often unlikely, can never be dismissed entirely. To illustrate, we will give three examples and provide references to more. We will then describe the only circumstance we know of in which the difficulty of choice items was established. This last case will provide motivation for the subsequent discussion of how choice can be permitted and tests can still be fair.

Example 1: 1968 AP Chemistry Test. Shown in Table 2 is a result reported by Fremer, Jackson, and McPeek (1968) for a form of the College Board’s Advanced Placement Chemistry Test. One group of examinees chose to take Problem 4, and a second group chose Problem 5. While their scores on the
Table 2: Average scores on AP Chemistry 1968

Choice group    MC section    Choice problem 4    Choice problem 5
1               11.7          8.2                 -
2               11.2          -                   2.7
common multiple-choice section were about the same (11.7 vs. 11.2, out of a possible 25), their scores on the choice problem were very different (8.2 vs. 2.7, on a 10-point scale). There are several possible conclusions to be drawn from this; four among them are:

1. Problem 5 is a good deal more difficult than Problem 4.
2. Small differences in performance on the multiple-choice section translate into much larger differences on the free response questions.
3. The proficiency required to do the two problems is not strongly related to that required to do well on the multiple-choice section.
4. Item 5 is selected by those who are less likely to do well on it.

Investigation of the content of the questions, as well as studies of the dimensionality of the entire test, suggests that Conclusion 1 is the most credible. This interpretation would suggest that scores on these two examinee-created test forms ought to have been equated. They were not.

Example 2: 1989 AP Chemistry Test. In the 1989 version of the AP Chemistry Test much had changed, but there was still examinee choice. Figure 1 shows the expected score on each of two choice items (examinees could opt to answer just one of these two) as a function of examinee proficiency as measured by the entire test (Wainer, Wang, & Thissen, 1991). To calculate these trace lines, an untestable assumption about unobserved examinee performance had to be made; this assumption will be discussed in detail later. It will suffice for now to note that, even with the assumptions that usually underlie item response theory, we also had to assume that relatively poorer performance on Problem 3 by an examinee who otherwise performed equivalently to an examinee who chose Problem 2 must be reflected in differential difficulty of the choice problems, not multidimensionality.
[Figure 1 plot: expected score (0 to 9) against proficiency (−3 to 3), with one trace line each for Problem 2 and Problem 3.]

Figure 1: A comparison of the expected score trace lines for the two choice problems in Part II, Section B, of the 1989 AP Examination in Chemistry
There is about a one-point advantage for an examinee who takes Problem 2 and whose proficiency lies between 0 and 2 on this scale. This is an improvement on the 1968 test, but it is by no means a trivial difference. To put this in perspective, there are about 160 points on the test, and a score of about 45 is sufficient to obtain college credit for completing a one-semester chemistry course. One point is not a trivial amount.

Example 3: 1988 AP United States History Test. The first two examples indicated that those who chose the more difficult problems were placed at a disadvantage. In this example, we identify more specifically the examinees who tended to choose more difficult items. Consider the results shown in Figure 2 from the 1988 administration of the College Board’s Advanced Placement Test in United States History (hereafter AP US History). This test comprises 100 multiple-choice items and two essays. The first essay is mandatory; the second is chosen by the examinee from among five topics. Shown in Figure 2 are the average scores given for each of those topics, as well as the proportion of men and women who chose them. Essay 3 had the lowest average scores for both men and women. The usual interpretation of this finding is that the topic of Essay 3 was the most difficult. This topic was about twice as popular among women as among men. An alternative interpretation of these findings might be that the lowest proficiency examinees chose this topic and that a greater proportion
[Figure 2 plot, titled “The most difficult essay (3) was twice as popular with women as with men”: proportion choosing (0.0 to 0.4) against average score on the choice essay (5.2 to 6.6), plotted separately for men and women for Essays 2 through 6.]

Figure 2: The relative performance of men and women on the choice essays in the 1988 AP Examination in History
Table 3: Item 6 is the hardest; Items 8 and 9 are the easiest

                       Proportion choosing
Choice on Section D    Males    Females
5, 6, 7                0.13     0.16       Hardest
5, 6, 8                0.25     0.35
5, 7, 8                0.28     0.24
7, 8, 9                0.06     0.03
5, 8, 9                0.10     0.07       Easiest
of women than men fell into this category. This illustrates again how any finding within a self-selected sample yields ambiguous interpretations. This same phenomenon shows itself on the chemistry test as well. Shown in Table 3 are the summary proportions of male and female examinees choosing various triples in a choose-3-of-5 section of the 1989 AP Chemistry Test. Of the five choice items, Item 6 is the most difficult, and Items 8 and 9 are the easiest. Note that female examinees chose Item triples 5,6,7 and 5,6,8 considerably more often than male examinees; men chose the easier items more often than women. Because all items counted equally, men had an advantage: not because of any superior knowledge of chemistry but merely because they tended to choose easier items to answer. Similar studies with similar findings have been carried out on all of the AP tests that allow choice (DeMauro, 1991; Pomplun, Morgan, & Nellikunnel, 1992). Fitzpatrick and Yen (1993) also reported substantial sex and ethnic differences in choice on third-, fifth-, and eighth-grade passage-based reading comprehension tests. In the data described by Fitzpatrick and Yen, there is no obvious tendency of any particular group to select more or less difficult items; however, for any particular form of the test, the outcome of the combination of choice and scoring was that some group was placed at a disadvantage. The phenomenon of students choosing poorly is so widespread that evidence of it occurs whenever one looks for it. It is instructive to note that Powers, Fowles, Farnum, and Gerritz (1992), in their description of the effect of choice on a test of basic writing, found that the more that examinees liked a particular topic, the lower they scored on an essay they subsequently wrote on that topic. Their results are summarized in Figure 3. An examinee’s preference for a topic does not predict how well he or she will do on it. 
Perhaps if examinees were explicitly informed that they ought to choose a topic on the basis of how well they think they can score, and not how much they like the topic, they might choose more wisely. Although test developers try to make up choice questions that are of equivalent difficulty, they do not appear to have been completely successful.
[Figure 3 plot, titled “Performance Declines with Preference!”: performance (3.4 to 4.4) against preference (4.0 to 7.0).]

Note: Adapted from Powers, Fowles, Farnum, and Gerritz, 1992, Table 3, p. 17.

Figure 3: In a test of basic writing, examinees scored lower on essay topics that they liked more
Of course, this conclusion is clouded by possible alternative explanations that have their origin in self-selection: For example, the choice items are not unequally difficult; rather, the people who chose them are unequally proficient. So long as there is self-selection, this alternative cannot be completely dismissed, although it can be discredited through the use of covariate information. If it can be shown that choice items are not of equal difficulty, it follows that some individuals will be placed at a disadvantage by their choice of item – they choose to take a test some of the items of which are more difficult than those on a corresponding test for other examinees – and this extra difficulty is not adjusted away in the scoring.

The only unambiguous data on choice and difficulty that we know of involved data gathered and reported by Wang, Wainer, and Thissen (1993) in which examinees were presented with a choice of two items but then required to answer both. In Table 4, we see that, even though Item 12 was much more difficult than Item 11, there were still some students who chose it. The item difficulties in Table 4 were obtained from an operational administration of these items involving more than 18 thousand examinees. Perhaps examinees chose Item 12 because they had some special knowledge that made this item less difficult for them than Item 11. In Table 5 is shown the performance of examinees on each of these items broken down by the items they chose. Note that 11% of those examinees who chose Item 12 responded correctly, whereas 69% of them answered Item 11 correctly. Moreover, this group performed more poorly on both of the items than examinees who chose Item 11. The obvious implication drawn from this example is that examinees do not always choose wisely and that less proficient examinees exacerbate matters through their unfortunate choices. Data from the rest of this experiment consistently supported these conclusions.
Table 4: The number of students choosing each item and the difficulty of those items in the general test-taking population

Item chosen    Number choosing    Item difficulty (b)
11             180                −2.5
12             45                 1.6

Table 5: The proportion of students getting each item correct shown conditional on which item they preferred to answer

                 Item chosen
Item answered    11      12
11               0.84    0.69
12               0.23    0.11
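The cost of an unwise choice can be read directly off Table 5. The short sketch below only restates that arithmetic; the dictionary layout and the function name `cost_of_choice` are illustrative assumptions, not part of the original analysis.

```python
# Proportions correct from Table 5, keyed by (item chosen, item answered).
p_correct = {
    (11, 11): 0.84, (11, 12): 0.23,
    (12, 11): 0.69, (12, 12): 0.11,
}

def cost_of_choice(chosen: int, alternative: int) -> float:
    """Proportion correct that the group choosing `chosen` would have
    gained (positive) or lost (negative) by answering `alternative`."""
    return p_correct[(chosen, alternative)] - p_correct[(chosen, chosen)]

# Examinees who chose Item 12 gave up 0.58 in proportion correct.
print(round(cost_of_choice(12, 11), 2))  # 0.58
# Examinees who chose Item 11 would have lost 0.61 by switching.
print(round(cost_of_choice(11, 12), 2))  # -0.61
```

This comparison is possible only because every examinee in the experiment answered both items; under ordinary choice the off-diagonal cells are never observed.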
Strategies for Fair Testing When Choice Is Allowed

We will now define more carefully the goal of examinee choice. We will then examine two alternative strategies for achieving that goal. The first is to aid students in making wiser choices; the second is to diminish the unintended consequences of a poor choice through statistical equating of the choice items.
What Do We Get When We Allow Choice?

When a test is given we ordinarily estimate a score. Following traditional IRT notation,2 we will call the proficiency of a particular examinee taking the test $\theta$ and the estimate of that proficiency $\hat{\theta}$. Ordinarily, each item on a test provides a somewhat different value of the examinee’s estimated proficiency; sometimes the variance of the distribution of those estimates is used as an index of measurement precision (Wainer & Wright, 1980).

$\theta_{\mathrm{Max}}$. The usual estimate of proficiency is based on estimated performance from a random sample drawn from the entire distribution of items. But suppose we allow choice. What are we estimating? Most practitioners we have spoken to, who favor allowing choice, argue that choice provides the opportunity for the examinees to show themselves to best advantage. We shall call this proficiency $\theta_{\mathrm{Max}}$.

Figure 4 is a graphical depiction of what might occur in a choice situation. The top panel is the distribution of $\theta$ based on information obtained prior to administering the choice items. The distribution is centered on 0. Beneath this panel is a second panel showing the trace lines for two items, assuming that they are answered correctly. Item 1 is relatively easy; Item 2 is somewhat more difficult. Each curve in the bottom panel is constructed by multiplying
[Figure 4 plot, three panels sharing a θ axis from −3 to +3. Top: population density. Middle: probability correct, with trace lines P1(θ) and P2(θ). Bottom: posterior density for Item 1 correct, Item 1 incorrect, Item 2 correct, and Item 2 incorrect, with the scores s1−, s2−, s1+, and s2+ marked on the θ axis.]

Figure 4: A graphical representation depicts how the posterior density is computed from the product of the prior density and the appropriate item trace lines
the prior density by the appropriate trace line (or one minus that trace line, for incorrect responses). These are called the posterior densities after choice and represent our knowledge of that examinee’s ability.

$\theta_{\mathrm{Max}}$ is the characterization of proficiency that would be obtained if the examinees chose the item that would give them the highest score. When we allow choice we are attempting to estimate $\theta_{\mathrm{Max}}$. What we are actually obtaining is $\hat{\theta}_{\mathrm{Max}}$. How is $\hat{\theta}_{\mathrm{Max}}$ related to $\theta_{\mathrm{Max}}$?

To answer this question, it is useful to adopt a Bayesian approach. But first, we must specify the scoring system for the items, because an examinee’s expected score is $\sum P_{\mathrm{item}}\, s$, where $P_{\mathrm{item}}$ is the probability the examinee will obtain score $s$ from the item. We will consider two possibilities:

1. Both items are scored $s = 1$, if correct, $s = 0$, if incorrect (simple summed scoring). In this case, only the information in the second panel of Figure 4 is relevant to the choice: We see that $P_1 > P_2$ for all values of $\theta$, so all examinees maximize their expected score by choosing Item 1.

2. The items are scored using IRT – for example, using the mean of the posterior given the item response. In this case, the score associated with Item 1 correct is $s_1^+$; that for Item 1 incorrect is $s_1^-$. The score associated with Item 2 correct is $s_2^+$; that for Item 2 incorrect is $s_2^-$.3 The locations of the scores are shown in Figure 4. Now the examinee’s task is more difficult; the choice must be Item 2 if and only if

$$P_2 s_2^+ + (1 - P_2)\, s_2^- > P_1 s_1^+ + (1 - P_1)\, s_1^-.$$
To do this computation, examinees must know both P1 and P2 (at least for their own values of θ), as well as the item scores. But the examinees, when choosing an item, do not know their probability of responding correctly to the item. Instead, they have some subjective idea of that probability. We have already provided strong evidence indicating that some examinees do not choose wisely. Moreover, we have seen that the propensity for making optimizing choices varies by sex and ethnic group. As we have already seen, choice items, as currently prepared, are typically not of equal difficulty. This fact, combined with the common practice of not equating choice items for their differential difficulty, yields the inescapable conclusion that it matters what choice an examinee makes. Examinees who chose the more difficult question will, on average, get lower scores than would have been the case had they chosen the easier item. The fact that all examinees do not choose those items that will show their proficiency to best advantage completes this unhappy syllogism: examinee choice is not likely to yield credible estimates of θMax.
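The decision rule for IRT-scored items can be made concrete with a small numerical sketch. Everything specific below is an assumption made for illustration: the two-parameter logistic trace line, the standard normal prior, the grid used for numerical integration, and the item parameters `ITEM1` and `ITEM2`. None of these values comes from the article.

```python
import math

def trace_line(theta, a, b):
    """Two-parameter logistic trace line, P(correct | theta). Assumed model form."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Quadrature grid over theta and an unnormalized standard normal prior.
GRID = [-4.0 + 0.01 * k for k in range(801)]
PRIOR = [math.exp(-0.5 * t * t) for t in GRID]

def posterior_mean(a, b, correct):
    """Mean of prior x trace line (or prior x (1 - trace line) if incorrect)."""
    num = den = 0.0
    for t, w0 in zip(GRID, PRIOR):
        p = trace_line(t, a, b)
        w = w0 * (p if correct else 1.0 - p)
        num += t * w
        den += w
    return num / den

def expected_score(theta, a, b):
    """P * s_plus + (1 - P) * s_minus for one item at a given theta."""
    p = trace_line(theta, a, b)
    return p * posterior_mean(a, b, True) + (1.0 - p) * posterior_mean(a, b, False)

def choose_item_2(theta, item1, item2):
    """True when the inequality in the text favours Item 2."""
    return expected_score(theta, *item2) > expected_score(theta, *item1)

ITEM1 = (1.0, -1.0)  # hypothetical (a, b): the easier item
ITEM2 = (1.0, 1.0)   # hypothetical (a, b): the harder item

for theta in (-2.0, 0.0, 2.0):
    print(theta, choose_item_2(theta, ITEM1, ITEM2))
```

The sketch evaluates the rule at a known θ. An examinee has only a subjective impression of P1 and P2, which is the difficulty the text goes on to describe.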
What Can We Do to Improve Matters?

There appear to be two paths that can be followed: eliciting wiser choices by examinees or equating test forms. The second option removes the necessity for the first; in fact, it makes examinee choice unnecessary. How can we improve examinees’ judgment about which items to select? Estimation of θMax can be done optimally only by asking examinees to answer all items and then scoring just those responses that yield the highest estimate of performance. This strategy is not without its drawbacks. First, it takes more testing time, and choice is often instituted to keep testing time within practical limits. Second, many examinees, on hearing that “only one of the six items will be counted” will only answer one. Thus, this strategy may commingle measures of grit, choice wisdom, and risk aversion with those of proficiency. A more practical approach might be to try to improve the instructions to the examinees about how the test is graded, to guide their choices better. It would be well if the instructions about choice made it clear that there is no advantage to answering a hard item correctly relative to answering an easy one, if such is indeed the case. Current instructions do not address this issue.
For example, the instructions about choice on the 1989 AP Chemistry Test (CEEB, 1990, p. 23), reproduced in their entirety, are:

Solve ONE of the two problems in this part. (A second problem will not be scored.)

Contrast this with the care that is taken to instruct examinees about the hazards of guessing. These are taken from the same test (p. 3):

Many candidates wonder whether or not to guess the answers to questions about which they are not certain. In this section of the examination, as a correction for haphazard guessing, one-fourth of the number of questions you answer incorrectly will be subtracted from the number you answer correctly. It is improbable, therefore, that mere guessing will improve your score significantly; it may even lower your score, and it does take time. If, however, you are not sure of the correct answer but have some knowledge of the question and are able to eliminate one or more of the answer choices as wrong, your chance of getting the right answer is improved, and it may be to your advantage to answer such a question.
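The guessing instructions quoted above rest on simple expected-value arithmetic, sketched below. The one-fourth penalty is taken from the quotation; the premise that each question has five answer choices is an assumption made here (it is the case in which the penalty exactly offsets blind guessing) and is not stated in the quoted instructions.

```python
from fractions import Fraction

def expected_gain_from_guess(n_remaining: int,
                             penalty: Fraction = Fraction(1, 4)) -> Fraction:
    """Expected change in the corrected score from guessing at random
    among n_remaining answer choices: +1 if right, -penalty if wrong."""
    p_right = Fraction(1, n_remaining)
    return p_right - (1 - p_right) * penalty

# Assumed five-choice question; eliminating options improves the odds.
for n_remaining in (5, 4, 3, 2):
    print(n_remaining, expected_gain_from_guess(n_remaining))
# 5 0
# 4 1/16
# 3 1/6
# 2 3/8
```

Under that assumption a blind guess is worth nothing on average, while eliminating even one option makes answering advantageous, which matches the advice in the instructions. No comparable guidance is given for the choice problems.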
Perhaps, with better instructions, the quality of examinee choices can be improved. At the moment, there is no evidence supporting the conjecture that they can be, or if so, by how much. An experimental test of the value of improved instructions could involve one randomly selected group with the traditional instructions and another with a more informative set; which group has higher average scores on the choice section? A more complex experiment could use a paradigm much like that employed by Wang, Wainer, and Thissen (1993) in which examinees were asked to choose from among several items but were then required to answer all of them. This sort of experiment would allow a detailed examination of the change in choice behavior due to the instructions.

While more explicit instructions may help matters somewhat, and ought to be included regardless of their efficacy, we are not sanguine about this option solving the problem of getting $\hat{\theta}_{\mathrm{Max}}$ closer to $\theta_{\mathrm{Max}}$. To do this requires reducing the impact of unwise choice. As we pointed out, there are two terms in the calculation of subjective posterior density. The first is the subjective probability of getting the item correct; improved instructions may help this. The second requires that the examinee have an accurate idea of the relative difficulty of the choice items. Pretesting, when it is possible, would allow us to present to examinees each item’s difficulty in the pretest population. It will not help to characterize the individual variations in item difficulty that are the principal reason for allowing choice.

A more promising path seems to be to make all of the choice problems equally difficult (from the point of view of the entire examinee population) and allow the choice to be governed by whatever special knowledge or proficiency each individual examinee might possess. In this way, we can be sure that, at least on average, the items are as fair as
possible. The problem is that it is at least difficult, and perhaps impossible, to build items that empirically turn out to be exactly equal in difficulty. Another option is to adjust the scores on the choice items statistically for their differential difficulty. We will refer to this statistical adjustment as equating, although the way it is carried out may not satisfy the strict rules that are sometimes associated with that term.
How Does Equating Affect the Examinee’s Task?

Equating appears, at first blush, to make the examinee’s task of choosing more difficult still. If no equating is done, the instructions to the examinee should be:

Answer that item that seems easiest to you

(and we hope that the examinees choose correctly, but we will not know if they do not). If we equate the choice items (give more credit for harder items than easier ones), the instructions should be:

Pick that item which, after we adjust, will give you the highest score.

This task could be akin to the problem faced by competitive divers, who choose their routine of dives from within various homogeneous groups of dives. The diver’s decision is informed by:

• knowledge of the degree of difficulty of each dive,
• knowledge of the concatenation rule by which the dive’s difficulty and the diver’s performance rating are combined (they are multiplied), and
• knowledge, obtained through long practice, of what his or her score is likely to be on all of the dives.
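The diver’s three pieces of knowledge combine into a simple maximization, sketched below. The dive names, degrees of difficulty, and expected ratings are invented for illustration; only the multiplicative concatenation rule comes from the text.

```python
def best_dive(dives):
    """Return the name of the dive whose expected score
    (degree of difficulty x expected performance rating) is highest."""
    return max(dives, key=lambda d: d["difficulty"] * d["expected_rating"])["name"]

# Hypothetical dives within one group; all numbers are made up.
dives = [
    {"name": "A", "difficulty": 1.6, "expected_rating": 8.5},  # 1.6 * 8.5 = 13.6
    {"name": "B", "difficulty": 2.4, "expected_rating": 6.0},  # 2.4 * 6.0 = 14.4
    {"name": "C", "difficulty": 3.0, "expected_rating": 4.5},  # 3.0 * 4.5 = 13.5
]

print(best_dive(dives))  # B
```

The calculation is trivial once the expected ratings are known; the hard part, for divers and examinees alike, is knowing them.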
Armed with this knowledge, the diver can select a set of dives that is most likely to maximize his or her total score. The diver scenario is one in which an individual’s informed choice provides a $\hat{\theta}_{\mathrm{Max}}$ that seems to be close enough to $\theta_{\mathrm{Max}}$ for useful purposes. Is a similar scenario possible within the plausible confines of standardized testing? Let us examine the aspects of required knowledge point by point. Specifying how much each item will count in advance is possible, either by calculating the empirical characteristics of each item from pretest data or, as is currently the case, by specifying how much each one counts by fiat. We favor the former, because it allows each item to contribute to total score in a way that minimizes measurement error. An improvident choice of a priori weights can have a serious deleterious effect on
measurement accuracy (see Lukhele, Thissen, & Wainer, 1993; Wainer & Thissen, 1993a). Specifying the concatenation rule (how examinee performance and item characteristics interact to contribute to the examinee’s score) in advance is also possible but may be quite complex, for example, if IRT is used. Perhaps a rough approximation can be worked out, or perhaps one could present a graphical solution like that shown in Figure 4, but for now this remains a question. The difficulties that we might have with specifying the concatenation rule are largely technical, and workable solutions could probably be developed. A much more formidable obstacle is providing the examinees with enough information so that they can make wise choices. This seems completely out of reach, for, even if examinees know how much a particular item will, if answered correctly, contribute to their final score, it does no good unless the examinees have a good idea of their likelihood of answering the item correctly. The extent to which such knowledge is imperfect would then correspond to the bias (used in its statistical sense) associated with the use of $\hat{\theta}_{\mathrm{Max}}$ to estimate $\theta_{\mathrm{Max}}$. The nature of security associated with modern use of large-scale tests makes impossible the sort of rehearsal that provides divers with accurate estimates of their performance under various choice options. The prospect appears bleak for simultaneously allowing choice and satisfying the canons of good practice that require the equating of test forms of unequal difficulty. The task that examinees face in choosing items when they are adjusted seems too difficult. But is it? There remain two glimmers of hope. The brighter of these rests on the possibility of successfully equating the various choice forms. If we can do this, the examinees should be indifferent as to which items they answer, because successful equating means that an examinee will receive, in expectation, the same score regardless of the form administered.
This is happy but ironic news, for it appears that we can allow choice and have fair tests only when choice is unnecessary. A dimmer possibility is to try to improve examinees’ estimates of their success on the various choices. In a computer administered test, it may be possible to provide some model-based estimates of an examinee’s probable score on each item. To the extent that these estimates are accurate, they might help. Of course, if really good estimates were available, we would not need to test further. Moreover, the value of choice would be greatest when an examinee’s likelihood of success is very different than that predicted from the rest of the test. To answer the question posed at the beginning of this section: When we do not equate selected items, the problem of choice faced by the examinee can be both difficult and important. When we do equate, the selection problem simultaneously becomes much more difficult but considerably less important. This conclusion naturally brings us to the next question.
Research Design, Measurement and Statistics and Evaluation
Under What Conditions Can We Equate Choice Items? How?
Let us reconsider Harold Gulliksen’s (1950, p. 338) advice, “Alternative questions should always be avoided.” We have discussed one possible reason for this – that it makes the examinee’s task too difficult. Our conclusion was that, while it does make the task difficult, this difficulty becomes irrelevant for most uses of the test score if the alternate forms thus constructed can be equated. This raises a second possible explanation for this advice: The equating task is too difficult. Certainly this explanation was the one favored by Tucker (quoted earlier). The only way to equate test forms that are created by choice is to make some (untestable) assumptions about the structure of the missing data that have resulted from the choice behavior. One possible assumption is missing-completely-at-random. Underlying this assumption is the notion that, if we had the examinee’s responses to all of the items, a random deletion of some portion of them would yield, in expectation, the same score as was obtained through the examinee’s choice. In simple terms, we assume that the choice had no effect on the examinee’s score. If we really believed missing-completely-at-random, we could equate without any anchor items because an important consequence of missing-completely-at-random is that all choice groups will have the same proficiency distribution. Data gathered from all of the Advanced Placement exams (Pomplun, Morgan, & Nellikunnel, 1992) suggest that this is not credible. Thus, it is imperative to use required anchor items to establish a common scale for the choice items.
This can be done using traditional or IRT methods (see Dorans, 1990) and is justified if we believe that the missing responses yielded by examinee choice are generated by a process that is, in Little and Rubin’s (1987) terminology, conditionally ignorable.4 What we mean by this weaker assumption is that the probability of an examinee choosing any particular item is independent of his or her likelihood of getting that item correct, conditional on θ. In graphical terms, this means that an item’s trace lines are the same for those individuals who chose it as they would have been for those who omitted it. Subsequent discussions will be clearer if we repeat our characterization of the missing data assumption and the logic surrounding their genesis with more precision. Therefore, suppose yi is the score on test item Yi, and Ri is a choice function that takes the value 1 if Yi is chosen and 0 if not.
In a choice situation, we can observe the distribution of scores, f1( y), for those who opted to take an item. This can be denoted f1( y) = P(Y = y|R = 1).
Wainer and Thissen
What we do not know, but what is crucial if we are to be able to equate the different choice items, is the distribution of scores, f0(y), for those who did not take the item. This is denoted f0(y) = P(Y = y|R = 0). To be able to equate, we need to know the distribution of scores in the unselected population, g(y) = P(Y = y). Note that we can represent g(y) as
g(y) = f1(y) × P(R = 1) + f0(y) × P(R = 0).   (1)
The only piece of this which is unknown is f0(y), the distribution of scores among those individuals who chose not to answer the item. Unless one engages in a special data gathering effort, in which those examinees who did not answer Yi are forced to, f0(y) is not only unknown but unknowable. Thus, the conundrum is that we must equate to ensure fairness, but we cannot equate without knowing f0(y). One approach to such problems, mixture modeling (Glynn, Laird, & Rubin, 1986), involves a hypothesized structure for f0(y). It is convenient to assume that the function f0(y) is the same as f1(y). In formal terms,
f1(y) = P(Y = y|R = 1, θ) = P(Y = y|R = 0, θ) = f0(y) = P(Y = y|θ).   (2)
Or: We assume that the trace lines for the choice item would have been the same for those who didn’t choose it as it was for those who did. If we could gather the appropriate data (forcing those who opted not to answer it to do so), this hypothesis could easily be tested using standard DIF technology (Holland & Wainer, 1993). Although the conditional independence, given R and θ, expressed in Equation 2 has a surface similarity to the conditional independence, given θ, that underlies all of IRT, Equation 2 expresses a much stronger assumption that may or may not be true: Equation 2 states that, if θ is known, knowledge of whether the examinee chooses to answer an item or not does not affect the modeled probability of each response. This assumption is certainly contrary to the perceptions of examinees, who often feel that their choice of an item optimizes their score. However, there is little evidence available that illuminates the relationship between examinees’ preference for a particular item and their eventual score. Contrary to widespread belief, what little experimental evidence there is supports the assumption expressed in Equation 2. Thus, in the absence of contrary data, and because this assumption allows us to employ the existing technology of IRT to equate, we shall use it. For a fuller description of the structure and consequences of assumptions about missing data, the reader is referred to Allen and Holland (1993); of special importance in the examinee choice situation is their distinction between ignorable and forgettable nonresponse.
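The difference between missing-completely-at-random and the weaker assumption of Equation 2 can be illustrated with a small simulation. The sketch below is not drawn from the study: it assumes a logistic trace line, a standard normal proficiency distribution, and a hypothetical choice rule under which more proficient examinees are more likely to choose the item. Choice and correctness are generated independently given θ, so Equation 2 holds by construction, and, unlike in an operational test, the responses of the non-choosers are visible.

```python
import math
import random

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate(n=20000, difficulty=0.0, seed=1):
    """Simulate choice (R) and correctness (Y), independent given theta.

    Every setting here is an illustrative assumption, not an estimate from any test.
    Returns a list of (theta, chosen, correct) tuples.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)
        chosen = rng.random() < logistic(theta)                # hypothetical choice rule
        correct = rng.random() < logistic(theta - difficulty)  # same trace line for R = 1 and R = 0
        records.append((theta, chosen, correct))
    return records

def mean(values):
    values = list(values)
    return sum(values) / len(values)

records = simulate()
choosers = [r for r in records if r[1]]
others = [r for r in records if not r[1]]

# Marginally, the two groups differ in proficiency and in proportion correct,
# so missing-completely-at-random fails.
theta_gap = mean(r[0] for r in choosers) - mean(r[0] for r in others)
marginal_gap = mean(r[2] for r in choosers) - mean(r[2] for r in others)

# Conditional on theta (here a narrow band around 0), the gap nearly vanishes,
# which is what Equation 2 asserts.
band_choosers = [r for r in choosers if abs(r[0]) < 0.25]
band_others = [r for r in others if abs(r[0]) < 0.25]
conditional_gap = mean(r[2] for r in band_choosers) - mean(r[2] for r in band_others)

print(round(theta_gap, 2), round(marginal_gap, 2), round(conditional_gap, 2))
```

In this construction the choosers score higher than the non-choosers only because they are more proficient; at a fixed θ the two groups perform alike. A choice rule that depended on the examinee's chance of success beyond θ would break this pattern and would violate Equation 2.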
What Other Assumptions Are Necessary for Equating?
While ignorable nonresponse is the only assumption that is new to this circumstance, it is not the only assumption required. In addition, we need to assume unidimensionality and fit to the test scoring model employed. These two latter assumptions are well known and can be tested with the test data ordinarily gathered; ignorable nonresponse cannot be. To test ignorable nonresponse requires a special data gathering effort. One example is the sort of data gathering scheme that Wang, Wainer, and Thissen (1993) employed: asking examinees to choose items but then requiring them to answer some of the items they did not choose. This is called sampling from the unselected population and will be discussed in greater detail later. Equating test forms constructed by examinee choice can be straightforward once we have made some assumptions about the unobserved distribution of scores f0(y). While one can derive a formal equating procedure for many assumed characterizations of the missing data, the assumption of conditionally ignorable nonresponse allows us to immediately use the existing machinery for IRT equating. One merely enters the various vectors of item responses and treats what’s missing as having not been presented to the individual. We establish a common scale by requiring a subset of items that all examinees must answer. This anchor test provides a set of items drawn from the unselected population on which we can also test model fit and unidimensionality.
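Treating unchosen items as not presented amounts to leaving them out of the likelihood. The sketch below shows the idea for a two-parameter logistic model with dichotomous items; that model, the item parameters, and the grid search are assumptions made for illustration, not the polytomous model fitted to the AP data.

```python
import math

def p_correct(theta, a, b):
    """Two-parameter logistic trace line (an assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, responses, items):
    """Log-likelihood of theta; a response of None is an item the examinee did not choose."""
    total = 0.0
    for u, (a, b) in zip(responses, items):
        if u is None:
            continue  # treated as if the item had never been presented
        p = p_correct(theta, a, b)
        total += math.log(p if u == 1 else 1.0 - p)
    return total

def ml_theta(responses, items, grid=None):
    """Maximum-likelihood proficiency found by a simple grid search."""
    if grid is None:
        grid = [x / 100 for x in range(-400, 401)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))

# Hypothetical (slope, difficulty) parameters: two anchor items and one choice item.
items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
responses = [1, 0, None]  # the choice item was not selected

print(ml_theta(responses, items))
```

Because the skipped item contributes nothing, the estimate is the same as if the test had contained only the two anchor items. Whether that is appropriate depends entirely on the ignorability assumption discussed above.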
How Can We Test Our Assumptions?
The special assumption required to equate choice items involves the distribution of scores on the choice items from those who did not answer them: f0(y). This distribution is necessary to estimate g(y), the distribution of scores in the unselected population. There are many ways to test the viability of this assumption, but they all require some sort of special data gathering. We will describe two experimental designs that can be used to accomplish this.
Design 1: Within subjects. In a randomly chosen subset of the examinee population, examinees must be required to indicate their choice but then required to answer all items. This design allows us to estimate all three parts of Equation 1 and so allows an explicit test of the assumption stated as Equation 2. A variant of this design, employed by Wang (1992), asked examinees for their choices both before and after they answered the questions. This design is subject to the criticism that examinees might not be particularly judicious in their choices when they know that they will have to answer all the questions anyway. If this conjecture is true, it is likely to affect the estimates of f0(y) and f1(y) more than that of the composite g(y). Using
the good estimates of f1(y) we can get from the operational choice test and the estimates of g(y) from the experimental administration, we can derive f0(y) through Equation 1. We used this design to test the assumption of ignorable nonresponse among some choice items in the 1989 AP Chemistry Test (Wang, Wainer, & Thissen, 1993), using IRT-based DIF technology (Thissen, Steinberg, & Wainer, 1988, 1993; Wainer, Sireci, & Thissen, 1991). Figure 5 shows the estimated trace lines for choice Items 11 and 12 for those examinees that chose each item [f1(y)] as well as for those that did not [f0(y)]. The apparent difference between the two trace lines for Item 11 is somewhat unlikely (χ²(2) = 4), whereas there is no difference at all between the two trace lines for Item 12. Operationally, this means the ordinarily untestable assumption that we used to equate choice forms may be untrue for Item 11. A more extensive experiment seems in order. Note that the differences observed in the trace lines for Item 11, although not quite achieving nominal levels of statistical significance, suggest that Item 11 is easier for those who chose it than for those examinees who did not. This is not always the case. As part of the same study, we found that for another pair of choice items the reverse was true. In none of the cases examined were the differences between f1(y) and f0(y) so large as to generate errors in the equating larger than would have been the case had we not equated.
[Figure 5 plots, two panels titled “Tracelines for item 11” and “Tracelines for item 12”: T(x), from 0.0 to 1.0, against θ, from –3 to +3, with one curve for those who chose item 11 and one for those who chose item 12.]
Figure 5: Graphical tests of the ordinarily untestable assumption that choice items have the same trace lines for those who chose them as for those who did not
Design 2: Between subjects. In a randomly chosen subset of the examinee population, examinees must be randomly assigned to each of the choice items. This will provide us with unbiased estimates of g(y) for each of the choice items and allow us to equate. It will not provide direct estimates of f0(y) and f1(y), but those can be obtained from the portion of the exam in which choice is allowed. As of this writing, an experiment that will have this format is currently being considered for the GRE Writing Test.
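The derivation of f0(y) from Equation 1 is a one-line rearrangement once g(y), f1(y), and the proportion choosing the item are in hand. The sketch below uses hypothetical score distributions over four score points; none of the numbers come from the AP data.

```python
def mixture(f1, f0, p_chosen):
    """Equation 1: g(y) = f1(y) P(R = 1) + f0(y) P(R = 0)."""
    return [p_chosen * a + (1.0 - p_chosen) * b for a, b in zip(f1, f0)]

def recover_f0(g, f1, p_chosen):
    """Solve Equation 1 for f0(y), given g(y), f1(y), and P(R = 1) < 1."""
    if not 0.0 <= p_chosen < 1.0:
        raise ValueError("P(R = 1) must lie in [0, 1) for f0 to be identified")
    return [(gy - p_chosen * a) / (1.0 - p_chosen) for gy, a in zip(g, f1)]

# Hypothetical distributions over the score points y = 0, 1, 2, 3.
f1 = [0.10, 0.20, 0.30, 0.40]  # choosers, as estimated from the operational test
g = [0.16, 0.26, 0.30, 0.28]   # unselected population, from the experimental administration
p_chosen = 0.60                # proportion of examinees who chose the item

f0 = recover_f0(g, f1, p_chosen)
print([round(v, 2) for v in f0])
```

With estimated rather than exact inputs, the recovered values can fall slightly below zero or fail to sum to one, so in practice some smoothing or a model-based estimate would probably be preferred to this raw subtraction.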
Test Dimensionality
Because of the increasing interest in the development of tests that combine the psychometric advantages of multiple-choice items with other features of constructed response items, the following two questions assume importance:
1. Are we measuring the same thing with the constructed response items that we are measuring with the multiple-choice questions?
2. Is it meaningful to combine the scores on the constructed response sections with the multiple-choice score to yield a single reported total score?
Answers to these questions are necessary to build appropriate score-reporting strategies for such hybrid tests. As we shall see, answering these questions is more difficult when the examinee is permitted to choose to answer a subset of the time-consuming constructed response questions (Wainer, Wang, & Thissen, 1991). The use of item response theory to score the test, or to equate forms comprising chosen questions, explicitly requires that the test (or forms) be essentially unidimensional – that all the items measure more or less the same thing. Thus, we must answer the dimensionality question to be able to score the test in a meaningful way. This is explicitly true when using IRT but also must be true when a test score is calculated in many other, less principled rubrics. Are hybrid tests unidimensional? The literature on this subject is equivocal. Bennett, Rock, Braun, Frye, Spohrer, and Soloway (1991) fitted different factor structures to two relatively similar combinations of multiple-choice, constructed response, and constrained constructed response items; a one-factor model was sufficient for one set of data, but a two-factor model was required for another similar set of data. Bennett, Rock, and Wang (1991) examined a particular two-factor model for the combined multiple-choice and constructed response items on the College Board’s Advanced Placement (AP)
Test in Computer Science and concluded that the one-factor model provided a more parsimonious fit. We reanalyzed (Thissen, Wainer, & Wang, 1993) the Computer Science AP data reported by Bennett et al. (1991) and showed that significant, albeit relatively small, factors explain some of the observed local dependence among the constructed response items. We replicated this finding using data from the AP test in chemistry. There was clear evidence that the constructed response problems on both of these tests measure something different than the multiple-choice sections of those tests: There were statistically significant factors for the constructed response items, orthogonal to the general factor. However, there was also clear evidence that the constructed response problems predominantly measure the same thing as the multiple-choice sections: The factor loadings for the constructed response items were almost always larger on the general (multiple-choice) factor than on the constructed response factor(s). The loadings of the constructed response items on the specifically constructed response factors were small, indicating that the constructed response items do not measure something different very well. Given the small size of the constructed response factor loadings, it is clear that it would take many constructed response items to produce a reliable score on the factor underlying the constructed response items alone – many more items than are currently used. When we asked the practical question, “Is it meaningful to combine the scores on the constructed response sections with the multiple-choice score to yield a single reported score?” we were driven to conclude that it probably is; indeed, given the small size of the loadings of the constructed response items on their own specific factors, it would probably not be meaningful to attempt to report a constructed response score separately, because it would not be reliably distinct from the multiple-choice score. 
Our investigation, and hence the above conclusions, utilized much of the same factor analytic technology, founded on complete data, that has become the standard in dimensionality studies (Jöreskog & Sörbom, 1986, 1988). The procedure assumes that estimates of the covariances were obtained from what is essentially a random sample from the examinee population. However, when there are choice items, assuming a noninformative sampling process5 is not credible. What is analogous to Assumption 2 that will allow us to factor analyze the observed covariances and treat the results as if they came from the unselected population? Obviously, missing-completely-at-random would suffice, but this is usually patently false in a choice situation. Can we weaken it? Unfortunately, not much. Suppose we make the obvious assumption that the covariances that we observe are the same as those we do not. Does this allow us to analyze what are observed as if they were the unconditioned covariances? It does not, even with this strong an assumption. To understand why, it is best if we trace the logic mathematically.
What must we assume to allow us to treat Cov( yi, yj | R i × R j = 1) as if they were Cov( yi, yj )? There are many possible assumptions. One, parallel to Assumption 2, would be to assume that the covariance involving a choice item is the same among those examinees who did not choose that item as it was among those that did – that is, Condition 1: Cov( yi, yj | R i × R j = 1) = Cov( yi, yj | R i × R j = 0). But this is not enough. We must also assume that the means for at least one of the two items in the covariance must be the same for those who chose it as it would have been for those who did not. Condition 2: E( yi | Ri = 1) = E( yi | Ri = 0) or E( yj | Rj = 1) = E( yj | R j = 0). A little algebra will confirm that these conditions will yield the desired result.6 How plausible is it that these two conditions will be upheld in practice? Clearly, if one thought that they were likely to be true, what would be the point of providing choice to examinees? Yet, to be able to justify the typical analyses used to answer the crucial dimensionality question, one must posit performance for examinees on the choice items that is essentially the same regardless of whether or not the items were chosen. We find this compelling evidence to look elsewhere for methodologies to answer dimensionality questions when there is choice. The missing data theory described above presents a convincing argument for the necessity of a special data gathering effort to estimate the covariances associated with choice items. We have demonstrated that there is no easy and obvious model that would allow the credible use of the observed covariances as a proxy for the covariances of interest. To obtain these, we need a special data gathering effort analogous to the ones described earlier. Both kinds of designs require a sample from the unselected population. Design 1 is exactly the same as described earlier. Design 2 is slightly different. Design 1: Within subjects. 
In a randomly chosen subset of the examinee population, examinees must be required to indicate their choice but then required to answer all items. As before, this provides estimates of the covariances involving the choice items that are uncontaminated by self-selection. They might suffer the same shortcoming as before; that is, examinees might not be particularly judicious in their choices when they know that they will have to answer all the questions anyway. This will affect any measured relations of yi and Ri but will probably be satisfactory for estimates of the covariances between the items. We have no data to shed light on these conjectures. Design 2: Between subjects. In a randomly chosen subset of the examinee population, examinees must be randomly assigned to all pairs of the choice items. This will provide us with unbiased estimates of Cov(yi, yj) for all pairs of the choice items. It will provide more stable estimates of the covariances between each choice item and all of the required items as well. It will thus
allow us to do dimensionality studies. Obviously, because this design does not gather any choice information, it cannot provide estimates of Cov( yi, Ri).
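The reason equal covariances alone are not enough can be seen from the law of total covariance. The sketch below simplifies the setting to two groups of examinees, those for whom the pair of scores is observed and the rest, and uses hypothetical values throughout. The population covariance is the weighted within-group covariance plus a between-group term that vanishes only when at least one item has the same mean in both groups.

```python
def mixture_covariance(p, cov_observed, cov_other, means_i, means_j):
    """Covariance of two item scores in a population made of two groups.

    p             proportion of examinees in the group whose scores are observed
    cov_observed  covariance of the two scores within that group
    cov_other     covariance within the remaining (unobserved) group
    means_i       (mean in observed group, mean in other group) for item i
    means_j       the same pair of means for item j
    """
    within = p * cov_observed + (1.0 - p) * cov_other
    between = p * (1.0 - p) * (means_i[0] - means_i[1]) * (means_j[0] - means_j[1])
    return within + between

# Conditions 1 and 2 both hold: equal covariances, and equal means on item i.
both_hold = mixture_covariance(0.4, 0.30, 0.30, (1.0, 1.0), (2.0, 0.5))

# Condition 1 alone: equal covariances, but the means of both items differ by group.
only_first = mixture_covariance(0.4, 0.30, 0.30, (1.5, 1.0), (2.0, 0.5))

print(round(both_hold, 3), round(only_first, 3))
```

In the second case the observed covariance understates the population covariance even though it is identical in the two groups, which is why a sample from the unselected population is needed.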
What Can We Learn from Choice Behavior?
Thus far, our proposed requirements prior to implementing examinee choice fairly require a good deal of work on the part of both the examinee and the examiner. We are aware that extra work and expense are not part of the plan for many choice tests. Often, choice is allowed because there are too many plausible items to be asked and too little time to answer them. Is all of this work really necessary? Almost surely. At a minimum, one cannot know whether it is necessary unless it is done. To paraphrase Derek Bok’s comment on the cost of education, if you think doing it right is expensive, try doing it wrong. Yet many well-meaning and otherwise clear-thinking individuals ardently support choice in exams. Why? The answer to this question must, perforce, be impressionistic. We have heard a variety of reasons. Some are nonscientific; an example is “To show the examinees that we care.” The implication is that, by allowing choice, we are giving examinees the opportunity to do their best. We find this justification difficult to accept, because there is overwhelming evidence to indicate that this goal is unlikely to be accomplished. Which is more important – fairness or the appearance of fairness? Ordinarily, the two go together, but, when they do not, we must be fair and do our best to explain why. A second justification (W. B. Schrader, personal communication, March 7th, 1993) is that outstanding individuals are usually outstanding on a small number of things. If the purpose of the exam is to find outstanding individuals, we ought to allow them to have the option to show their maximum performance. We find this argument more convincing, but it is moot in a measurement task that is essentially unidimensional. A third justification might be termed instructional driven measurement (IDM). The argument is that because, in the classroom, students are often provided with choice options, evaluation instruments ought to as well.
This argument can be compelling, especially if one thinks of the choice options being those that teachers make: which topics to cover, in what order, from what perspective. Why should students suffer the consequences of unfortunate choices that were made on their behalf? The central question is: Can these issues be fairly addressed through the mechanism of allowing choice on exams? Let us consider more narrowly what we can learn from the choice behavior. Suppose we administer a test that is constructed of two sections. One section is mandatory, and everyone is required to answer all items. A second section contains choice. Equating of different test forms constructed by choice behavior can be done, if we make the usual assumptions required for IRT as well
as an assumption about the shape of the choice items’ trace lines among those who opted for other items. Suppose, instead, we examine the estimates of proficiency obtained from the mandatory section of the test. How well is proficiency predicted from the choices that examinees make? An illustration of such a test uses data drawn from the 1989 Advanced Placement Examination in Chemistry (Wainer & Thissen, 1993b). A full description of this test, the examinee population, and the scoring model is found in Wainer, Wang, and Thissen (1991). For the purposes of this illustration, we consider only the five constructed response items in Part II, Section D. Section D has five problems (Problems 5, 6, 7, 8, and 9), of which the examinee must answer three. This section accounts for 19% of the total grade. Because examinees had to answer three out of the five questions, a total of 10 choice groups was formed, with each group taking a somewhat different test form than the others. Each group had at least one problem in common with every other group; this overlap can be used to place all examinee selected forms on a common scale. The common items serve the role of the mandatory section described earlier. The fitting of a polytomous IRT model to all 10 forms simultaneously was described in Wainer, Wang, and Thissen (1991). As part of this procedure, we obtained estimates of the mean value of each choice group’s proficiency (μi) as well as the marginal reliability of this section of the test. Our findings are summarized in Table 6. The proficiency scale had a standard deviation of one; those examinees who chose the first three items (5, 6, and 7) were considerably less proficient, on the average, than any other group. The groups labeled 2 through 7 were essentially indistinguishable in performance from one another. Groups 8, 9, and 10 were the best performing groups. 
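The count of 10 choice groups, and the fact that every pair of groups shares at least one problem, follow from simple combinatorics. A short check, using only the problem numbers given in the text:

```python
from itertools import combinations

problems = [5, 6, 7, 8, 9]
forms = list(combinations(problems, 3))  # every way to answer three of the five problems

# Two forms of three problems drawn from five must overlap,
# because two disjoint forms would need six distinct problems.
min_overlap = min(len(set(a) & set(b)) for a, b in combinations(forms, 2))

print(len(forms), min_overlap)
```

The guaranteed overlap is what lets the common problems link the examinee-selected forms to one scale.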
Table 6: Summary statistics for the 10 groups formed by examinee choice on Problems 5–9
Group   Problems chosen   Mean group proficiency (μi)   n       Cronbach’s α
1       5, 6, 7           –1.02                         2,555   0.63
2       6, 7, 9           –0.04                         121     0.65
3       5, 6, 8           0.00*                         5,227   0.57
4       5, 7, 9           0.04                          753     0.64
5       5, 7, 8           0.08                          4,918   0.51
6       6, 7, 8           0.08                          1,392   0.54
7       5, 6, 9           0.09                          457     0.67
8       6, 8, 9           0.40                          407     0.57
9       7, 8, 9           0.43                          898     0.59
10      5, 8, 9           0.47                          1,707   0.59
*The mean for Group 3, the largest group, is fixed at 0.0 to set the location of the proficiency scale.
If we think of Section D as a single item with an examinee falling into one of 10 possible categories, then the estimated proficiency of each examinee is the mean score of everyone in that category. How reliable is this one-item test? We can derive an analog of reliability (see the appendix for a derivation), the squared correlation of proficiency (θ) with estimated proficiency (θ̂), from the between-group variance [var(μi)] and the within-group variance (unity). This index of reliability,
r²(θ̂, θ) = var(μi)/[var(μi) + 1],
is easily calculated. The variance of the μi is .17, and so r²(θ̂, θ) is .15 (= .17/1.17). It is informative to consider how close .15 is to .57, the reliability of these items when actually scored. Suppose we think of the task of selecting three out of five questions to answer as a single testlet. We can calculate the reliability of a test made up of any number of such testlets using the Spearman-Brown prophecy formula. Thus, if we ask the examinee to pick three from five on one set of topics and then three from five on another, we have effectively doubled the test’s length, and its reliability rises from .15 to .26. The estimated reliabilities for tests built of various numbers of such choice testlets are shown in Table 7.
Table 7: Spearman-Brown extrapolation for building a test of specified reliability
Number of testlets*   Reliability
1                     0.15
2                     0.26
3                     0.35
4                     0.41
5                     0.47
10                    0.64
20                    0.78
*Here, each testlet comprises the task of selecting three questions out of five.
How much information is obtained by requiring examinees to actually answer questions and then grading them? The marginal gain for the AP Chemistry Test is very small; see Figure 6, which shows that at all of the important choice points the error of measurement is virtually the same whether the questions chosen are scored for the content of the answers or scored by noting which choices were made.
[Figure 6 plot, titled “There is little gain in accuracy from scoring the choice items except at the highest levels of proficiency”: standard error of proficiency (vertical axis, 0.1 to 0.5) against proficiency (horizontal axis, –3 to 3), with one curve for the total test noting what choices were made and one for the total test scoring choice items. Numbers 1 to 5 designate AP score categories, demarcated by dashed lines.]
Figure 6: A comparison of the standard errors of estimate of proficiency for two versions of the chemistry test derived by scoring the choice items (84) or merely noting which items were chosen (79). At the selection points of interest, scoring the choice items provides almost no practical increase in precision.
Thus, we have seen that, for one test, the marginal gain in information by merely noting the choice is almost the same as that which is available from scoring the items. Interestingly, we did not need to make any assumptions about choice behavior, as we did when we scored the item content in the presence of choice-induced missing data, because there is no missing data if the data are the choices. There is no doubt that more information is available from scoring the constructed response items of the chemistry test than from merely observing which items were chosen to answer. This is reflected in the difference in the size of the reliabilities of the choice test versus the traditionally scored
version. This advantage may be diminished considerably on tests based on constructed response items that are holistically scored. Such tests typically have much lower reliability than analytically scored tests. The reliabilities for the constructed response sections of 20 Advanced Placement Tests are shown in Table 8. Note that there is very little overlap between the distributions of reliability for analytically and holistically scored tests, the latter being considerably less reliable. Chemistry is a little better than average, among analytically scored tests, with a reliability of .78 for its constructed response sections.
Table 8: Reliabilities of constructed response sections of AP tests
Analytically scored (in the order listed): Calculus AB; Physics B; Computer Science; Calculus BC; French Language; Chemistry; Latin – Virgil; Latin – Catullus-Horace; Physics C – Electricity; Music Theory, Biology; Spanish Language; Physics C – Mechanics
Holistically scored (in the order listed): History of Art; French Literature; Spanish Literature; English Language & Composition; English Literature & Composition; American History; European History; Music: Listening & Literature
Score reliability (in the order listed): 0.85 0.84 0.82 0.80 0.79 0.78 0.77 0.76 0.74 0.73 0.72 0.70 0.69 0.63 0.60 0.56 0.49 0.48 0.29
It is sobering to consider how well a test that uses only the information about which options are chosen would compare to one of the less reliably scored tests (i.e., any of the holistically scored tests). The structure of such a choice test might be to offer three or four sets of, say, five candidate essay topics, ask the examinees to choose three of those topics in each set that they would write on, and then stop. Perhaps a more informative analysis of the information available in choices compares it to other sorts of categorical information. Figure 7 shows that more Fisherian information is obtained from examinee choice than is obtained from knowledge of examinee sex and ethnicity but that it is still less than the information obtained from just two (good) multiple-choice items.
[Figure 7 plot, titled “While there is more information in choice patterns than in sex & ethnicity, there is much more information still in item responses”: information (vertical axis) against proficiency (horizontal axis, –3 to 3), with curves for Items 1 and 7, for the choice items, and for sex & ethnicity. Numbers 1 to 5 designate AP score categories, demarcated by dashed lines.]
Figure 7: On the AP Chemistry exam, the information about chemistry knowledge provided by just two multiple-choice items dwarfs that available from sex and ethnicity or even choice behavior
It is not our intention to suggest that it is better to have examinees choose questions to answer than it is to actually have them answer them.7 We observe
only that, if the purpose is accurate measurement, some information can be obtained from the choices and that we can obtain this information without relying on untestable (and perhaps unlikely) assumptions about unobservable choice behavior. Moreover, one should feel cautioned if the test administration
Research Design, Measurement and Statistics and Evaluation
and scoring scheme yield a measuring instrument little different in accuracy than would have been obtained by ignoring the performance of the examinee entirely.
Discussion

We have painted a bleak psychometric picture for the use of examinee choice within fair tests. To make tests with choice fair requires equating the test forms generated by the choice for their differential difficulty. Accomplishing this requires either some special data gathering effort or trust in assumptions about the unobserved responses that, if true, obviate the need for choice. If we can successfully equate choice items, we have thus removed the value of choice in any but the most superficial sense.

To extend these considerations, we need to be explicit about the goals of the test. There are many possible goals of a testing program. In this exposition, we will consider only three: contest, measurement, and device to induce social change.

When a test is a contest, we are using it to determine a winner. We might wish to choose a subset of examinees for admission, for an award, or for a promotion. In a contest, we are principally concerned with fairness. All competitors must be judged under the same rules and under the same conditions. We are not concerned with accuracy, except to require that the test is sufficiently accurate to tell us the order of finish unambiguously.

When a test is used for measurement, we wish to make the most accurate possible determination of some characteristic of an examinee. Usually measurement has some action associated with it; we measure blood pressure and then consider exercise and diet; we measure a child’s reading proficiency and then choose suitable books; we measure mathematical proficiency and then choose the next step of instruction. Similarly, we employ measurement to determine the efficacy of various interventions. How much did the diet lower blood pressure? How much better was one reading program than another? When measuring, we are primarily concerned with accuracy. Anything that reduces error may fairly be included on the test.
When a test is a device to induce social change, we are using the test to influence behavior (Torrance, 1993). Sometimes the test is used as a carrot or a stick to influence the behavior of students; we give the test to get students to study more assiduously. Sometimes the test is used to influence the behavior of teachers; we construct the test to influence teachers’ choice of material to be covered. The recent literature (Popham, 1987) has characterized this goal as measurement driven instruction (MDI). MDI has engendered rich and contentious discussions, and we will not add to them here. The interested reader can begin with Cizek (1993) and work backward through the references provided by him. At first, it might appear that, when
a test is being used in this way, issues of fairness and measurement precision are not important, although the appearance of fairness may be. However, that is false. When a test is used to induce change, the obvious next question must be, “How well did it work?” If we used the test to get students to study more assiduously, or to study certain specific material, or to study in a different way, how much did they do so? How much more have the students learned than they would have under some other condition? The other condition might be no announced test, or it might be with a test of a different format. There are obvious experimental designs that would allow us to investigate such questions – but all require measurement.8 Thus, even when the purpose of the test is to influence behavior, that test still ought to satisfy the canons of good measurement practice.

Thus far, we have confined our discussion to situations in which it is reasonable to assign any of the choice items to any examinee. Such an assumption underlies the notions of equating, which as we have used the term requires essential unidimensionality (using Stout’s, 1990, useful terminology), and also of the experiments we have described to ascertain the difficulty of the choice items in the unselected population. Situations in which examinees are given such a choice we call small choice. Small choice is used most commonly because it is felt that measurement of the underlying construct may be contaminated by the particular context in which the material is embedded. It is sometimes thought that by allowing examinee choice from among several different contexts a purer estimate of the underlying construct may be obtained. Consider, for example, the following two math problems that are intended to test the same conceptual knowledge:

1. The distance between the Earth and the Sun is 93 million miles. If a rocket ship took 40 days to make the trip, what was its average speed?
2. The Kentucky Derby is one and one-fourth miles in length. When Northern Dancer won the race with a time of 2 minutes, what was his average speed?

The answer to both problems may be expressed in miles/hour. Both problems are formally identical, except for differences in the difficulty of the arithmetic. Allowing an examinee to choose between these items might allow us to test the construct of interest (Does the student know the relation Rate × Time = Distance?), while at the same time letting the examinees pick the context within which they feel more comfortable.

Big choice. In contrast to small choice is a situation in which it makes no sense to insist that all individuals attempt all tasks (e.g., it is of no interest or value to ask the editor of the school yearbook to quarterback the football team for a series of plays in order to gauge proficiency in that context). We call this sort of situation big choice. Using more precise language, we would characterize situations involving big choice as multidimensional. Making
comparisons among individuals after those individuals have made a big choice is quite common. College admissions officers compare students who have chosen to take the French Achievement Test against those who opted for one in physics, even though their scores are on completely different scales. Companies that reward employees with merit raises usually have a limited pool of money available for raises and, in the quest for an equitable distribution of that pool, must confront such imponderable questions as “is person A a more worthy carpenter than person B is a statistician?” At the beginning of this account, we set aside big choice while we attempted to deal with the easier problems associated with small choice. Most of what we have discussed so far leans heavily on sampling responses in an unselected population and thus applies primarily to the small choice situation. Can we make useful comparisons in the context of big choice? Yes, but only for tests as contests, at least for the moment. When there is big choice, we can set out rules that will make the contest fair. We are not able to make the inferences that are usually desirable for measurement. To illustrate, let us consider the scoring rules for the decathlon as an illustration of scoring a multidimensional test without choice. Building on this example, we will expand to the situation of multidimensionality and choice. The decathlon is a 10-part track event that is clearly multidimensional. There are strength events like discus, speed events like the 100 m dash, endurance events like the 1,500 m run, and events that stress agility, like the pole vault. Of course, underlying all of these events is some notion of generalized athletic ability, which may predict performance in all events reasonably accurately.9 How is the decathlon scored? In a word, arbitrarily. 
Each event is counted “equally” in that an equal number of points is allocated for someone who equaled the world record that existed in that event at the time that the scoring rules were specified.10 How closely one approaches the world record determines the number of points received (i.e., if one is within 90% of the world record, one gets 90% of the points). As the world record in separate events changes, so too does the number of points allocated. If the world record got 10% better, then 10% more points would be allocated to that event. Let us examine the two relevant questions: Is this accurate measurement? Is this a fair contest? To judge the accuracy of the procedure as measurement, we need to know the qualities of the scale so defined. Can we consider decathlon scores to be on a ratio scale? Is an athlete who scores 8,000 points twice as good as someone who scores 4,000? Most experts would agree that such statements are nonsensical. Can we consider decathlon scores to be on an interval scale? Is the difference between an athlete who scores 8,000 and one who scores 7,000 in any way the same as the difference between one who scores 2,000 and another who scores 1,000? Again, experts agree that this is not true in any meaningful sense.
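The proportional rule just described can be sketched in a few lines. This is only an illustration of the simplified rule as stated in the text, not the official decathlon tables. The function name `proportional_points`, the `lower_is_better` flag (which inverts the ratio for timed events), and the sample marks are assumptions introduced here; the 1,000-point maximum follows Note 10.

```python
def proportional_points(performance, record, max_points=1000.0, lower_is_better=False):
    """Points as a fraction of the world record, per the simplified rule in the text.

    Assumption: for timed events (lower is better) the ratio is inverted,
    so that matching the record still earns max_points.
    """
    if performance <= 0 or record <= 0:
        raise ValueError("performance and record must be positive")
    ratio = record / performance if lower_is_better else performance / record
    return max_points * ratio


# Hypothetical marks: a 63 m throw against a 70 m record is 90% of the record.
throw_points = proportional_points(63.0, 70.0)

# Hypothetical times: 12.5 s against a 10.0 s record is 80% of the record pace.
sprint_points = proportional_points(12.5, 10.0, lower_is_better=True)

print(round(throw_points, 1), round(sprint_points, 1))
```

Any rule of this kind is monotonic in performance within an event, which is the only property the ordinal argument relies on; nothing in the sketch justifies interval or ratio interpretations of the totals.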
Can we consider decathlon scores to be ordinally scaled? Yes. A demonstration uses standard mathematical notation and is virtually identical to the description given in Krantz, Luce, Suppes, and Tversky (1971, p. 14):

Definition: Let A be a set and ≥ be a binary relation on A, i.e., ≥ is a subset of A × A. The relational structure (A, ≥) is a weak order if and only if, for all a, b, c ∈ A, the following two axioms are satisfied:

1. Connectedness: Either a ≥ b or b ≥ a.
2. Transitivity: If a ≥ b and b ≥ c, then a ≥ c.

If such a definition holds, it can be proved that

If A is a finite nonempty set and if (A, ≥) is a weak order, then there exists a real-valued function φ on A such that, for all a, b ∈ A, a ≥ b if and only if φ(a) ≥ φ(b).

φ is then an ordinal scale.
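For a finite set, the two axioms can be checked by brute force, and one possible φ can be built by counting how many elements each item is at least as good as. The sketch below is an illustration under stated assumptions, not a construction taken from Krantz et al.: the function names `is_weak_order` and `ordinal_scale`, the athlete labels, and the marks are all hypothetical.

```python
from itertools import product


def is_weak_order(items, geq):
    """Check connectedness and transitivity of geq(a, b) over a finite set."""
    connected = all(geq(a, b) or geq(b, a) for a, b in product(items, repeat=2))
    transitive = all(
        geq(a, c)
        for a, b, c in product(items, repeat=3)
        if geq(a, b) and geq(b, c)
    )
    return connected and transitive


def ordinal_scale(items, geq):
    """Return one phi with a >= b iff phi(a) >= phi(b); fail if no weak order."""
    if not is_weak_order(items, geq):
        raise ValueError("relation is not a weak order")
    # phi(a) counts the elements that a is at least as good as.
    return {a: sum(1 for b in items if geq(a, b)) for a in items}


# Hypothetical long-jump marks in metres (longer is better).
marks = {"A": 7.10, "B": 7.45, "C": 6.80}
athletes = list(marks)
phi = ordinal_scale(athletes, lambda a, b: marks[a] >= marks[b])

# A hypothetical cyclic preference: A over B, B over C, C over A.
beats = {("A", "B"), ("B", "C"), ("C", "A")}
cyclic_ok = is_weak_order(athletes, lambda a, b: a == b or (a, b) in beats)

print(phi, cyclic_ok)
```

The cyclic relation is connected but not transitive, so no ordinal scale can represent it. The check also needs every pair to be observed, which is exactly what is missing when examinees choose different tasks.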
Translating this into the current context, A might represent the collection of performances on one of the various decathlon events, scaled in seconds or meters or whatever, and φ is the scoring function that translates all of those performances into points. It is straightforward to examine any particular scoring function to see if it satisfies these conditions. Obviously, any function that is monotonic will satisfy them. We conclude that decathlon scoring satisfies the conditions for an ordinal scale. A fair contest must.

This raises an important and interesting issue: If we are using a test as a contest and we wish it to be fair, we must gather data that would allow us to test the viability of the assumptions stated in the definition above. The most interesting condition is that of transitivity. The condition suggests two possible outcomes in a situation involving multidimensional comparisons:

1. There may exist instances in which Person A is preferred to Person B and Person B to Person C, and, last, Person C is preferred to Person A. This happens sufficiently often so that we cannot always attribute it to random error. It means that, in some multidimensional situations, no ordinal scale exists.
2. Data that allow the occurrence of an intransitive triad are not gathered. This means that while the scaling scheme may fail to satisfy the requirements of an ordinal scale, which are crucial for a fair contest, we will never know.

In a situation involving big choice, we do not know if the connectedness axiom is satisfied. How can we test the viability of this axiom if we can observe only a on one person and only b on another?
To get a better sense of the quality of measurement represented by the decathlon, let us consider what noncontest uses might be made of the scores. The most obvious use would be as a measure of the relative advantage of different training methods. Suppose we had two competing training methods – for example, one emphasizing strength and the other endurance. We could then conduct an experiment in which we randomly assigned athletes to one or the other of these two methods. In a pretest, we could get a decathlon score for each competitor and then another after the training period had ended. We could then rate each method’s efficacy as a function of the mean improvement in total decathlon score. While one might find this an acceptable scheme, it may be less than desirable. Unless all events showed the same direction of effect, some athletes might profit more from a training regime that emphasizes strength; others might need more endurance. It seems that it would be far better not to combine scores but, instead, to treat the 10 component scores as a vector. Of course, each competitor would almost surely want to combine scores to see how much his total had increased, but that is later in the process. The measurement task, from which we are trying to understand the relation between training and performance, is better done at the disaggregated level. It is only for the contest portion that the combination takes place. We conclude that scoring methods that resemble those used in the decathlon can only be characterized as measurement in an ordinal sense. And thus, the measures obtained are only suitable for crude sorts of inferences.
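The argument for keeping the 10 component scores as a vector can be illustrated with a small sketch. All numbers below are made up purely for illustration and are not data from any study; the event subset, the variable names, and the helper functions are likewise assumptions. The point is only that two training methods can show identical mean gains in total score while having opposite profiles at the event level.

```python
# Made-up mean gains in points per event for two hypothetical training methods.
events = ["shot put", "discus", "400 m", "1500 m"]
strength_gain = [40.0, 35.0, -5.0, -30.0]
endurance_gain = [-10.0, -5.0, 10.0, 45.0]


def total(gains):
    """Aggregate view: a single number, as in the total decathlon score."""
    return sum(gains)


def per_event_difference(a, b):
    """Disaggregated view: the difference between methods, event by event."""
    return [x - y for x, y in zip(a, b)]


print("total gain, strength: ", total(strength_gain))
print("total gain, endurance:", total(endurance_gain))
for name, diff in zip(events, per_event_difference(strength_gain, endurance_gain)):
    print(f"{name:>8}: strength minus endurance = {diff:+.1f}")
```

On the aggregate the two methods look equivalent; the event-level vector shows that they are not, which is the sense in which the measurement task is better done before the scores are combined.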
When Is a Contest Fair?

In addition to the requirement of an ordinal scale, fair measurement also requires that all competitors know the rules in advance, that the same rules must apply to all competitors equally, and that there is nothing in the rules that gives one competitor an advantage over another because of some characteristic unrelated to the competition. How well do the decathlon rules satisfy these criteria? Certainly the scoring rules, arcane as they might be, are well known to all competitors, and they apply evenhandedly to everyone. Moreover, the measurements in each event are equally accurate for every competitor. Thus, if two competitors both throw the shot the same distance, they will get the same number of points. Last, is a competitor placed at a disadvantage because of unrelated characteristics? No; each competitor’s score is determined solely by his performance in the events. We conclude that the decathlon’s scoring rules comprise a fair contest even though they comprise a somewhat limited measuring instrument.

The decathlon represents a good illustration of what can be done with multidimensional tests. Sensible scoring can yield a fair contest, but it is not
good measurement. There has been an attempt to somehow count all events equally, balancing the relative value of an extra inch in the long jump against an extra second in the 1,500 meter run. But no one would contend that they are matched in any formal way. Such formal matching is possible, but it requires agreement on the metric. The decathlon is a multidimensional test, but it is not big choice as we have previously defined it. Every competitor provides a score in each event (on every item). How much deterioration would result if we add big choice into this mix? Big choice makes the situation worse. One may be able to invent scoring rules that yield a fair contest but do not give an accurate measurement. As one example, consider ABC’s “Super Star’s Competition,” a popular TV pseudosport in which athletes from various sports are gathered together to compete in a series of seven different events. The athletes each select five events from among the seven. The winner of each event is awarded 10 points, second place 7, third place 5, and so on. The overall winner is the one who accumulates the most points. Some events are “easier” than others because fewer and /or lesser athletes elected to compete in that event; nevertheless, the same number of points are awarded. This is big choice by our definition, in that there are events that some athletes could not compete in (i.e., Joe Frazier, a former world champion boxer, chose not to compete in swimming because he could not swim). Are the scores in such a competition measurement? No. Is the contest fair? By the rules of fairness described above, yes, although the missingness of some of the data makes checking key underlying assumptions problematic. The current state of the art allows us to use big choice in a multidimensional context and, under limited circumstances, to have fair contests. We cannot yet have measurement in this context at a level of accuracy that can be called anything other than crude. 
As such, we do not believe that inferences based on such procedures should depend on any characteristic other than their fairness. This being the case, users of big choice should work hard to assure that their scoring schemes are indeed as fair as they can make them. Wainer (1993) and Wainer and Deveaux (1994) provide two detailed case studies describing how this might be accomplished.

When is it not fair? Paul Holland (Allen, Holland, & Thayer, 1993, p. 5) calls big choice “easy choice,” because often big choice is really no choice at all. Consider a choice item in which an examinee is asked to discuss the plot of either (a) The Pickwick Papers or (b) Crime and Punishment from a Marxist perspective. If the student’s teacher chose The Pickwick Papers, there really is no choice. At least, the student had no choice. Because many times in a big choice situation the examinee really has no choice, in that it is not plausible to answer any but a single option, fairness requires the various options to be of equal difficulty. This returns us to the primary point of this account. How are we to ascertain the relative difficulty of big choice items?
Is Big Choice Useful When the Test’s Goal Is to Induce Social Change?

If we wish to use the test to influence instruction, we might evaluate the success of the enterprise by surveying the field before and after the test became widespread. But this is surely only a superficial goal. The primary goal is not the structure of instruction but rather the effects of that instruction on the students. Thus, any attempt to measure the efficacy of an intervention (in this case a particular kind of test structure) must eventually use some sort of measuring instrument.

We must also pay careful attention that the use of a test to induce change does not compromise its fairness. We know of one standardized science test that introduced a very easy item on a new topic as a possible choice. The goal was to influence teachers to cover this new area. Examinees whose teachers covered this topic had a distinct advantage over examinees whose teachers had not. Since the choice was really made months before the test, and by the teacher, not the student, is this fair?
Conclusions

This summary of research is far from conclusive; many questions remain. It would be good to know how far away from unidimensionality a test can be and still yield acceptable measurement when choices are allowed. How far from ignorable can nonresponse be and still be acceptably adjusted for statistically? What kinds of conditioning variables are helpful in such adjustments? What are the most efficient kinds of data-gathering designs? Such questions lend themselves to solutions through careful experimentation and computer simulation.

Can the uncritical use of choice lead us seriously astray? While there are several sources of evidence summarized in this article about the size of choice effects, we focused on just one series of exams. Summaries from other sources, albeit analyzed in several different ways, lead us to believe that the Advanced Placement Tests, often referred to because they currently involve choice, are not unusual. In fact, they may be considerably better than average.

A recent experience with allowing choice in an experimental SAT is instructive (Lawrence, 1992). It has long been felt by math teachers that it would be better if examinees were allowed to use calculators on the mathematics portion of the SAT. An experiment was performed in which examinees were allowed to use a calculator, if they wished. The hope was that it would have no effect on the scores. Calculators did improve scores. The experiment also showed that examinees who used more elaborate calculators got higher scores than those who used more rudimentary ones. Sadly, a preliminary announcement had already been made indicating that the future SAT-M would allow examinees the option of using whatever calculator they wished, or not using one at all.
A testing situation corresponds to measuring people’s heights by having them stand with their backs to a wall. Allowing examinees to bring a calculator to the testing situation, or not, but not knowing for sure whether they had one, or what kind, corresponds to having some persons to be measured for height, unbeknownst to you, bring a stool of unknown and varying height on which to stand. Accurate and fair measurement is no longer possible in either case. Our discussion has concentrated on explicitly defined choice in tests, or alternative questions in the language of the first half of this century. However, in the case of portfolio assessment, the element of choice is implicit and not amenable to many of the kinds of analysis that have been described here. Portfolio assessment may be more or less structured in its demands on the examinee – that is, it may specify the elements of the portfolio more or less specifically. However, to the extent that the elements of the portfolio are left to the choice of the examinee, portfolio assessment more closely resembles ABC’s “Super Star’s Competition” than even the decathlon. In portfolio assessment, how many forms of the test are created by examinee choice? Often, as many as there are examinees! If that is the case, can those forms be statistically equated? No. This fact has clear consequences in the results obtained with portfolio assessment; for instance, Koretz, McCaffrey, Klein, Bell, and Stecher (1992) report that the reliability of the 1992 Vermont portfolio program measures was substantially less than is expected for useful measurement. Can it be otherwise, when the examinees (effectively) construct their own tests?11 Is building examinee choice into a test possible? Yes, but it requires extra work. Approaches that ignore the empirical possibility that different items do not have the same difficulty will not satisfy the canons of good testing practice, nor will they yield fair tests. 
But, to assess the difficulty of choice items, one must have responses from an unselected sample of fully motivated examinees. This requires a special sort of data gathering effort. What are we estimating when we use examinee selected items? If we are interested in θMax, then we need to choose the items for the examinees. The belief that the estimate of θMax obtained from examinee selected items is accurate has been disconfirmed by the data gathered so far. Although these data are of modest scope, they indicate what sorts of data need to be gathered to examine this question more fully. What can we do if the assumptions required for equating are not satisfied across the choice items? If test forms are built that cannot be equated (made comparable), scores comparing individuals on incomparable forms have their validity compromised by the portion of the test that is not comparable. Thus, we cannot fairly allow choice if the process of choosing cannot be adjusted away. Choice is anathema to standardized testing unless those aspects that characterize the choice are irrelevant to what is being tested.
Notes

1. Section II of the 1921 exam asked the examinee to answer 5 of 26 questions. This alone yielded more than 65 thousand different possible “forms.” When coupled with Section III (pick one essay topic from among 15) and Section I (“Answer 1 of the following 3”), we arrive at the unlikely figure shown in Table 1.
2. We will use both the language and notation of item response theory (IRT). This is not necessary; our argument could be phrased in traditional true score theory terms. We chose to place this argument within an IRT framework because it allows greater precision of explanation. This is especially important in later sections where being explicit about the estimand and the assumptions is critical.
3. We are oversimplifying IRT scoring here; for most IRT models, the score associated with each item response actually depends on the other item responses, and so it may be different for each examinee.
4. This varies a little from Little and Rubin’s (1987) conception. They would require the independence of choice given some observed conditioning variable. In our construction, the conditioning variable, θ, is latent. In any operational test, this difference is only a technical one for, when the test is longish, raw score (observable) and θ can be transformed from one to the other easily. This does not apply in situations like adaptive testing in which raw score is unrelated to θ.
5. A sampling process is noninformative in this case if, by knowing an individual’s choice, we learn nothing about how well they will do on the item.
6. Our thanks to Nick Longford for pointing this out to us.
7. Our colleague Nick Longford commented that this “suits perfectly the current American culture in which no one ever actually does anything but is concerned instead with management.”
8. It is not uncommon in education for innovations to be tried without an explicit design to aid in determining the efficacy of the intervention. Harold Gulliksen (personal communication, October 26, 1965) was fond of recounting the response he received when he asked what the control condition was against which the particular education innovation was to be measured. The response was “We didn’t have a control because it was only an experiment.”
9. Actually, it only predicts accurately for top-ranked competitors, who tend to perform “equally” well in all events. There are some athletes who are very much better in one event or another, but they tend to have much lower overall performance than generalists who appear more evenly talented.
10. The Olympic Decathlon scoring rules were first established in 1912 and allocated 1,000 points in each event for a world record performance. These scoring rules have been revised in 1936, 1950, 1964, and 1985. It is interesting to note (Mislevy, 1992) that the 1932 gold medal winner would have finished second under the current (1985) rules.
11. The idea of portfolio assessment includes two components, one of which is examinee choice of material to submit, and the other is that the material is collected over some longer period of time than in a conventional test. The latter idea, collecting responses over a long period of time, is certainly a useful one. However, the former idea, letting the examinees choose their test, leads to noncomparable (and unreliable) scores. Long-term data collection is certainly possible with well-specified prompts, questions, or items that leave the examinee no choice. Portfolio assessment would provide better measurement to the extent that the element of choice was removed.
12. We are grateful to Charles Lewis who suggested this analog for reliability, provided a derivation, and cautioned against its too broad usage.
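The count in Note 1 can be checked directly. The sketch below uses only the figures the note gives (5 of 26, 1 of 15, 1 of 3). Multiplying the three sections together is one reading of how the note combines them, and since Table 1 is not reproduced here, the product is not claimed to match the figure in that table. The variable names are introduced for illustration.

```python
from math import comb

# Section II: choose 5 questions out of 26, order ignored.
section_ii_forms = comb(26, 5)

# Sections III and I as described in Note 1: one topic of 15, one question of 3.
section_iii_forms = 15
section_i_forms = 3

# Assumption: the sections combine multiplicatively.
combined_forms = section_ii_forms * section_iii_forms * section_i_forms

print(section_ii_forms)   # forms from Section II alone
print(combined_forms)     # forms from the three sections together
```

Section II alone gives 65,780 forms, consistent with the note's "more than 65 thousand."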
Appendix¹²

How can we calculate a reliability coefficient from the classification of examinees by their choice of items? Let us assume that we know the mean proficiency of all examinees in each choice group. We will index examinees by $j$ and choice groups by $i$, and the model we use is

$$\theta_{ij} = \mu_i + z_{ij}, \tag{A1}$$

where the proficiency of person $j$ in group $i$ is $\theta_{ij}$ and is distributed normally with mean $\mu_i$ and variance 1. We represent the deviation of each person $j$ within group $i$ from that group’s mean as $z_{ij}$. We estimate $\theta_{ij}$ with $\mu_i$, the mean of group $i$; that is,

$$\hat{\theta}_{ij} = \mu_i. \tag{A2}$$

The correlation between $\hat{\theta}_{ij}$ and $\theta_{ij}$ is analogous to validity if we think of $\theta_{ij}$ as the analog of true score. The square of this correlation can be thought of as a measure of reliability. Keeping this in mind, we can derive a computational formula for $r^2(\hat{\theta}_{ij}, \theta_{ij})$ by noting that

$$r^2(\hat{\theta}_{ij}, \theta_{ij}) = \frac{[\mathrm{cov}(\hat{\theta}_{ij}, \theta_{ij})]^2}{\mathrm{Var}(\theta_{ij}) \times \mathrm{Var}(\hat{\theta}_{ij})}. \tag{A3}$$

In the numerator,

$$\mathrm{cov}(\hat{\theta}_{ij}, \theta_{ij}) = \mathrm{cov}[E(\theta_{ij} \mid i), E(\hat{\theta}_{ij} \mid i)] + E[\mathrm{cov}(\theta_{ij}, \hat{\theta}_{ij} \mid i)] = \mathrm{cov}(\mu_i, \mu_i) + E[\mathrm{cov}(\theta_{ij}, \mu_i \mid i)] = \mathrm{Var}(\mu_i).$$

The rightmost term in the initial expression, $E[\mathrm{cov}(\theta_{ij}, \hat{\theta}_{ij} \mid i)]$, is zero, and hence the expression reduces to the covariance of $\mu_i$ with itself or the variance of $\mu_i$. This is the expression in the numerator that we need to compute (A3). The denominator requires the variance of both $\theta_{ij}$ and $\hat{\theta}_{ij}$. These are easily computed from

$$\mathrm{Var}(\theta_{ij}) = \mathrm{Var}[E(\theta_{ij} \mid i)] + E[\mathrm{Var}(\theta_{ij} \mid i)] = \mathrm{Var}(\mu_i) + 1, \tag{A4}$$

and

$$\mathrm{Var}(\hat{\theta}_{ij}) = \mathrm{Var}(\mu_i). \tag{A5}$$

Substituting these results into (A3) yields

$$r^2(\hat{\theta}_{ij}, \theta_{ij}) = \frac{\mathrm{Var}(\mu_i)}{\mathrm{Var}(\mu_i) + 1}. \tag{A6}$$

The estimate of $\mathrm{Var}(\mu_i)$ we obtained from Section D of AP Chemistry is .17, and hence the estimated reliability [from (A6)] is .15.
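The arithmetic behind the final figure can be reproduced with a short sketch that assembles (A3) from the pieces in (A4) and (A5) and compares the result with the closed form (A6). The function name `choice_reliability` and the `within_var` parameter are assumptions introduced here; the default of 1 is the within-group variance stated in the model.

```python
def choice_reliability(var_mu, within_var=1.0):
    """Squared correlation between group-mean estimates and proficiency.

    var_mu:     variance of the choice-group means, Var(mu_i)
    within_var: variance of proficiency within a group (1 in the text's model)
    """
    if var_mu <= 0 or within_var <= 0:
        raise ValueError("variances must be positive")
    cov = var_mu                      # numerator term: cov(theta_hat, theta)
    var_theta = var_mu + within_var   # (A4)
    var_theta_hat = var_mu            # (A5)
    return cov ** 2 / (var_theta * var_theta_hat)  # (A3)


# Value reported in the text for Section D of AP Chemistry.
r2 = choice_reliability(0.17)
closed_form = 0.17 / (0.17 + 1.0)     # (A6)

print(round(r2, 3), round(closed_form, 3))
```

With Var(μ_i) = .17 the value is about .145, which rounds to the .15 reported above.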
References

Allen, N. L., & Holland, P. W. (1993). A model for missing information about the group membership of examinees in DIF studies. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 241–252). Hillsdale, NJ: Erlbaum.
Allen, N. L., Holland, P. W., & Thayer, D. T. (1993). The optional essay problem and the hypothesis of equal difficulty (ETS Tech. Rep. No. 93–94). Princeton, NJ: Educational Testing Service.
Bennett, R. E., Rock, D. A., Braun, H. I., Frye, D., Spohrer, J. C., & Soloway, E. (1991). The relationship of expert-system scored constrained free-response items to multiple-choice and open-ended items. Applied Psychological Measurement, 14, 151–162.
Bennett, R. E., Rock, D. A., & Wang, M. (1991). Equivalence of free-response and multiple-choice items. Journal of Educational Measurement, 28, 77–92.
Brigham, C. C. (1934). The reading of the comprehensive examination in English. Princeton, NJ: Princeton University Press.
Cizek, G. J. (1993). Rethinking psychometricians’ beliefs about learning. Educational Researcher, 22(4), 4–9.
College Entrance Examination Board. (1905). Questions set at the examinations held June 19–24, 1905. New York: Ginn.
College Entrance Examination Board. (1990). The 1989 Advanced Placement Examinations in Chemistry and their grading. Princeton, NJ: Advanced Placement Programs.
DeMauro, G. E. (1991). The effects of the availability of alternatives and the use of multiple choice or essay anchor tests on constructed response constructs (Draft Report). Princeton, NJ: Educational Testing Service.
Dorans, N. J. (1990). Scaling and equating. In H. Wainer with N. J. Dorans, R. Flaugher, B. F. Green, R. J. Mislevy, L. Steinberg, and D. Thissen, Computerized adaptive testing: A primer (pp. 137–160). Hillsdale, NJ: Erlbaum.
Fitzpatrick, A. R., & Yen, W. M. (1993, April). The psychometric characteristics of choice items. Paper presented at the Annual Meeting of the National Council on Measurement in Education, Atlanta.
Fremer, J., Jackson, R., & McPeek, M. (1968). Review of the psychometric characteristics of the Advanced Placement Tests in Chemistry, American History, and French (Internal Memorandum). Princeton, NJ: Educational Testing Service.
Glynn, R. J., Laird, N. M., & Rubin, D. B. (1986). Selection modeling versus mixture modeling with nonignorable nonresponse. In H. Wainer (Ed.), Drawing inferences from self-selected samples (pp. 115–142). New York: Springer-Verlag.
Gulliksen, H. O. (1950). A theory of mental tests. New York: Wiley. (Reprinted, 1987, Hillsdale, NJ: Erlbaum).
Holland, P. W., & Wainer, H. (1993). Differential item functioning. Hillsdale, NJ: Erlbaum.
Jöreskog, K. J., & Sörbom, D. (1986). PRELIS: A program for multivariate data screening and data summarization. Chicago, IL: Scientific Software.
Jöreskog, K. J., & Sörbom, D. (1988). LISREL 7: A guide to the program and applications. Chicago, IL: SPSS.
Kierkegaard, S. (1986). Either/or. New York: Harper & Row.
Koretz, D., McCaffrey, D., Klein, S., Bell, R., & Stecher, B. (1992). The reliability of scores from the 1992 Vermont Portfolio Assessment Program (Interim Report). Santa Monica, CA: RAND Institute on Education and Training.
Krantz, D. H., Luce, R. D., Suppes, P., & Tversky, A. (1971). Foundations of measurement, Vol. 1. New York: Academic.
Lawrence, I. (1992). Effect of calculator use on SAT-M score conversions and equating (Draft Report). Princeton, NJ: Educational Testing Service.
Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: Wiley.
Lukhele, R., Thissen, D., & Wainer, H. (1993). On the relative value of multiple-choice, free-response, and examinee-selected items in two achievement tests (ETS Tech. Rep. No. 93-28). Princeton, NJ: Educational Testing Service. (Also in press, Journal of Educational Measurement, 31.)
Mislevy, R. J. (1992). Linking educational assessments: Concepts, issues, methods, and prospects (Draft Report). Princeton, NJ: Educational Testing Service.
Pomplun, M., Morgan, R., & Nellikunnel, A. (1992). Choice in Advanced Placement Tests (Unpublished Statistical Report No. SR-92-51). Princeton, NJ: Educational Testing Service.
Popham, W. J. (1987). The merits of measurement-driven instruction. Phi Delta Kappan, 68, 679–682.
Powers, D. E., Fowles, M. E., Farnum, M., & Gerritz, K. (1992). Giving a choice of topics on a test of basic writing skills: Does it make any difference (Research Report No. 92-19)? Princeton, NJ: Educational Testing Service.
Stout, W. F. (1990). A new item response theory modeling approach with applications to unidimensionality assessment and ability estimation. Psychometrika, 55, 293–325.
Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H. Braun (Eds.), Test validity (pp. 147–169). Hillsdale, NJ: Erlbaum.
Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 67–113). Hillsdale, NJ: Erlbaum.
Thissen, D., Wainer, H., & Wang, X. B. (1993). How unidimensional are tests comprising both multiple-choice and free-response items? An analysis of two tests (ETS Tech. Rep. No. 93-32). Princeton, NJ: Educational Testing Service. (Also in press, Journal of Educational Measurement, 31.)
Torrance, H. (1993). Combining measurement-driven instruction with authentic assessment: Some initial observations of the national assessment in England and Wales. Educational Evaluation and Policy Analysis, 15, 81–90.
Wainer, H. (1993). How much more efficiently can humans run than swim? Chance, 6, 17–21.
Wainer, H., & Deveaux, R. (1994). Resizing triathlons for fairness. Chance 7(1): xxx–xxx.
Wainer, H., Sireci, S. G., & Thissen, D. (1991). Differential testlet functioning: Definitions and detection. Journal of Educational Measurement, 28, 197–219.
Wainer, H., & Thissen, D. (1993b). Choosing: A test (ETS Tech. Rep. No. 92–25). Princeton, NJ: Educational Testing Service.
Wainer, H., Wang, X. B., & Thissen, D. (1991). How well can we equate test forms that are constructed by examinees (Tech. Rep. No. 91–15)? Princeton, NJ: Educational Testing Service. (Also in press, Journal of Educational Measurement, 31.)
Wainer, H., & Wright, B. D. (1980). Robust estimation of ability in the Rasch model. Psychometrika, 45, 373–391.
Wang, X. B. (1992). Achieving equity in self-selected subsets of test items. Unpublished doctoral dissertation, University of Hawaii at Manoa, Honolulu.
Wang, X. B., Wainer, H., & Thissen, D. (1993). On the viability of some untestable assumptions in equating exams that allow examinee choice (ETS Tech. Rep. No. 93–31). Princeton, NJ: Educational Testing Service.
76 Historical Views of Invariance: Evidence from the Measurement Theories of Thorndike, Thurstone, and Rasch George Engelhard, Jr
The history of science is the history of measurement (Cattell, 1893, p. 316)
The scientist is usually looking for invariance whether he knows it or not. (Stevens, 1951, p. 20)
Source: Educational and Psychological Measurement, 52 (1992): 275–291.
Stevens (1951) has presented a strong case for the general importance of the concept of invariance within the behavioral sciences. Invariance has also been identified as a fundamental aspect of measurement (Andrich, 1988a; Bock and Jones, 1968; Jones, 1960; Stevens, 1951). In essence, the goal of invariant measurement has been succinctly stated by Stevens: “the scientist seeks measures that will stay put while his back is turned” (1951, p. 21). The concept of invariance has implications for both item calibration and the measurement of individuals. As pointed out by Jones and Appelbaum (1989), developments in item response theory have led to constructive changes in psychological testing and the “primary advantage of IRT over classical test theory resides in properties of invariance” (p. 24). In a chapter on “Mathematics, Measurement and Psychophysics” which appeared in the Handbook of Experimental Psychology, Stevens (1951) described the role of invariance in mathematics and physics, and he argued that “many psychological problems are already conceived as the deliberate search for invariances” (p. 20). In fact, Stevens defined the whole field of science in terms of a quest for invariance and the concomitant generalizability of results. In his words,
The scientist is usually looking for invariance whether he knows it or not. Whenever he discovers a functional relationship his next question follows naturally: under what conditions does it hold? … The quest for invariant relations is essentially the aspiration toward generality, and in psychology, as in physics, the principles that have wide applications are those we prize. (Stevens, 1951, p. 20)
Applying this view of invariance more specifically to measurement issues, Stevens used the concept of invariance to define his familiar scales of measurement – nominal, ordinal, interval, and ratio scales (Stevens, 1946). In his words,
Each of the four classes of scales is best characterized by its range of invariance – by the kinds of transformations that leave the “structure” of the scale undistorted. And the nature of invariance sets limits to the kinds of statistical manipulations that can be legitimately applied to the scaled data. (Stevens, 1951, p. 23)
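Stevens's point that each scale type is characterized by the transformations it tolerates can be illustrated with a small numeric sketch. The temperature example and the helper names below are illustrative assumptions, not taken from Stevens. Celsius readings form an interval scale, so an affine change of units (to Fahrenheit) should preserve ratios of differences but not ratios of the readings themselves, whereas an increasing but non-linear transformation should preserve only rank order.

```python
def celsius_to_fahrenheit(celsius):
    """Affine (interval-scale) change of units: F = 1.8 * C + 32."""
    return 1.8 * celsius + 32.0


def ratio_of_values(readings):
    """Last reading divided by the first; meaningful only on a ratio scale."""
    return readings[-1] / readings[0]


def ratio_of_differences(readings):
    """(third - second) / (second - first); unchanged by affine transformations."""
    return (readings[2] - readings[1]) / (readings[1] - readings[0])


def rank_order(readings):
    """Indices of the readings sorted from lowest to highest (ordinal information)."""
    return sorted(range(len(readings)), key=lambda i: readings[i])


# Hypothetical readings chosen only for illustration.
celsius = [10.0, 20.0, 40.0]
fahrenheit = [celsius_to_fahrenheit(c) for c in celsius]
cubed = [c ** 3 for c in celsius]  # increasing but non-linear transformation

print("ratio of differences:", ratio_of_differences(celsius), ratio_of_differences(fahrenheit))
print("ratio of values:     ", ratio_of_values(celsius), ratio_of_values(fahrenheit))
print("same rank order after cubing:", rank_order(celsius) == rank_order(cubed))
```

In this sketch the statistics that survive a transformation are exactly the ones that can be "legitimately applied" at that scale level, which is the sense in which invariance limits the permissible manipulations.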
Influenced by the insightful work of Mosier (1940, 1941), Stevens pointed out the symmetry between the fields of psychophysics and psychometrics as related to the concept of invariance:
Psychophysics sees the response as an indicator of an attribute of the individual – an attribute that varies with the stimulus and is relatively invariant from person to person. Psychometrics regards the response as indicative of an attribute that varies from person to person but is relatively invariant for different stimuli. Both psychophysics and psychometrics make it their business to display the conditions and limits of these invariances. (Stevens, 1951, p. 31)
The first sentence in this quotation illustrates the idea of sample-invariant item calibration, whereas the second sentence points to the idea of item-invariant measurement of individuals. This duality between psychophysics and psychometrics, which was clearly described by Mosier (1940, 1941) and pointed out even earlier by Guilford (1936), represents one of the five major ideas underlying test theory identified by Lumsden (1976). Measurement problems related to invariance can be meaningfully viewed in terms of these two broad classes – sample-invariant item calibration and item-invariant measurement of individuals. Within each of these two classes, invariance over methods and conditions can be examined. Methods refer to the statistical procedures and models, including the method used to collect the data, employed within the measurement theory. For example, paired comparison and successive interval scaling not only would represent different methods of data collection, but would also require different statistical models. Conditions can refer to subgroupings of items, examinees, or both. For example, test equating is concerned with the development of procedures which yield comparable estimates of an individual’s ability. These estimates are invariant over the subgroups of items (tests) which are used to obtain these ability estimates. As another example, the research on item bias or differential item functioning, as it has come to be labelled, reflects a concern with whether or not the meaning of an individual’s responses on a particular test item varies as a function of irrelevant factors related to membership in various social categories, such as gender, race, and social class.
Sample-Invariant Item Calibration
The basic measurement problem underlying sample-invariant item calibration is how to minimize the influence of arbitrary samples of individuals on the estimation of item scale values. For example, Engelhard (1984) described how Thorndike provided a single adjustment (location) for differences in group characteristics, whereas Thurstone provided for two adjustments (location and scale). Rasch’s (1961) approach to sample-invariant calibration can be viewed as providing three adjustments (location, scale, and an individual level response model). Andrich (1978) has also provided an important comparison between Thurstone and Rasch approaches to item scaling by using paired comparison responses which can also lead to sample-invariant item calibrations. The overall goal of sample-invariant calibration of items is to estimate the location of items on a latent variable of interest which will remain unchanged across subgroups of individuals and also across various subgroups of items. For example, if the goal of sample-invariant calibration is achieved, then the item scale values will not be a function of subgroup characteristics, such as ability level, gender, race, or social class. Further, the calibration of the items should also be invariant over subsets of items, so that if a calibrated item bank (Wright and Bell, 1984) is being developed, the scale values of the items are not affected by the inclusion or exclusion of other items in the bank.
Item-Invariant Measurement of Individuals
In the case of item-invariant measurement, the basic measurement problem involves minimizing the influence of the particular items which happen to be used to estimate an individual’s ability. This problem is also related to the scaling and equating of test scores, as well as to the scoring of each individual’s performance. Solutions to this problem usually include adjustments for item characteristics (item difficulty) and test characteristics (location, dispersion, and shape of item distributions on the latent variable scale). The overall objective is to obtain comparable estimates of individual ability regardless of which items are included in the test. This objective is essentially the problem of equating person measurements obtained on tests composed of different items (Engelhard and Osberg, 1983). Invariance over scoring method also requires attention. In addition to considering invariance over methods, it is important to examine invariance over conditions within this context; an individual’s score should not depend on the scores of other individuals being tested at the same time.
In summary, invariance can be viewed as an important general concept in the physical and behavioral sciences, as well as a key aspect of successful measurement in the behavioral sciences. As pointed out by Bock and Jones (1968), “in a well-developed science, measurement can be made to yield invariant results over a variety of measurement methods and over a range of experimental conditions for any one method” (p. 9).
Three Measurement Theories and Invariant Measurement
The purposes of this section are to describe and to illustrate how the concept of invariance emerged within the measurement theories of Thorndike, Thurstone, and Rasch. As the most cogent statement of the conditions necessary to accomplish invariance is presented in the measurement theory of Rasch, this section begins with his research and then traces the adumbrations of these ideas within the work of Thurstone and Thorndike. It also should be pointed out that all three of these theorists wrote extensively on various measurement problems, and for Thorndike especially it was sometimes difficult to point to one consistent set of principles that constituted his definitive “theory of measurement.” In order to address this issue, certain texts are explicitly cited. It should be understood that these texts are being used to define a particular individual’s “measurement theory.” This endeavor was not much of a problem for Rasch because he was very consistent in his views related to invariance; Thurstone was fairly consistent, whereas Thorndike was the least consistent of the three.
Rasch
Based on psychometric research conducted during the 1950s, Rasch (1980/1960, 1961, 1966a, 1966b) presented a set of ideas and methods which were described by Loevinger (1965) as a “truly new approach to psychometric problems” (p. 151), which can lead to “nonarbitrary measures” (p. 151). One of the major characteristics of this new approach was Rasch’s explicit concern with the development of individual-centered techniques as opposed to the group-based measurement models used by measurement theorists such as Thorndike and Thurstone. In Rasch’s words, “individual-centered statistical techniques require models in which each individual is characterized separately and from which, given adequate data, the individual parameters can be estimated” (1980/1960, p. xx). Problems related to invariance played an important role in motivating the measurement theory of Rasch. As pointed out by Andrich (1988a), Rasch presented “two principles of invariance for making comparisons that in an important sense precede, though inevitably lead to, measurement” (p. 18). Rasch’s concept of specific objectivity, which he formulated in terms of his principles of comparison, forms his version of the goals of invariant measurement (Rasch, 1977). In Rasch’s words,
The comparison between two stimuli should be independent of which particular individuals were instrumental for the comparison; and it should also be independent of which stimuli within the considered class were or might also have been compared. Symmetrically, a comparison between two individuals should be independent of which particular stimuli within the class considered were instrumental for the comparison; and it should also be independent of which other individuals were also compared, on the same or on some other occasion (Rasch, 1961, pp. 331–332).
It is clear in this quotation that Rasch recognized the importance of both sample-invariant item calibration and item-invariant measurement of individuals. In fact, he made them the cornerstones of his quest for specific objectivity. In order to address problems related to invariance, Rasch laid the foundation for the development of a family of measurement models which are characterized by separability of item and person parameters (Masters and Wright, 1984). Rasch’s approach to sample-invariant item calibration involved the comparison of item difficulties obtained in separate groups. In his words,
In relation to attainment tests all the school grades for which the tests are in practice applicable may be considered as forming a total collection of persons, that may be divided into subpopulations, such as single grades, sex groups, and age groups within a grade, social strata, etc. Between the test results in such more or less extensive groups the same fundamental relationship must hold, and if so we shall use the term that the relationship is “relatively independent of population,” the qualification “relatively” pointing to the degree of breakdown that has been applied to the data. (Rasch, 1980/1960, p. 9)
In his book, he used ability groups formed on the basis of raw scores. In essence, Rasch was “looking for trouble in a more or less definite direction; namely, for the possibility that the relative difficulties of the tests may vary with [raw score] that is, with the reading inability of the children” (Rasch, 1961, p. 323). This test of fit (or what Rasch referred to as control of the model) was presented graphically. Essentially, Rasch calibrated items separately in different score groups and then plotted them against the average calibrations across score groups. If these plots are linear and parallel with slopes close to one, then sample-invariant item calibration has been approximated. If the plots are non-linear or non-parallel, then invariance does not hold over these score groups. Because of the formal symmetry in the model proposed by Rasch between items and individuals, he used a similar graphic approach to examine whether or not item-invariant measurement of individuals had been achieved. In this case, ability estimates are obtained separately for item groups, and then plotted against the average ability estimates obtained from different item groups. If these plots are linear and parallel with slopes close to one, then successful item-invariant measurement of individuals has been approximated. If these conditions are not met, then item-invariant measurement of individuals is not possible with these items. Even though there are more sophisticated methods for examining invariance using statistical tests of item and person fit (Wright, 1988; Wright and Stone, 1979), the graphical methods clearly show whether or not invariance has been achieved. As will be seen in the next section, Thurstone used a similar graphical method to examine whether or not his method of absolute scaling was appropriate for a particular set of test data. By focusing on the individual as the level of analysis, Rasch was able to examine test data and to identify when invariance was exhibited. When the data fit the Rasch model, the types of invariance which eluded research workers in the test theory tradition can be obtained. To quote Loevinger,
Rasch is concerned with a different and more rigorous kind of generalization than Cronbach, Rajaratnam and Gleser. When his model fits, the results are independent of the sample of persons and of the particular items within some broad limits. Within these limits, generality is, one might say, complete. (Loevinger, 1965, p. 151)
Detailed descriptions of Rasch measurement are presented in Andrich (1988a), Wright and Stone (1979), Wright and Masters (1982), and Wright (1988).
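The separability of item and person parameters can be checked numerically. The sketch below assumes the standard dichotomous (logistic) form of the Rasch model, which this chapter does not write out; the function names and parameter values are hypothetical. Under that form the log-odds of success is the person's ability minus the item's difficulty, so a comparison of two items should not depend on which person is used, and a comparison of two persons should not depend on which item is used.

```python
import math


def rasch_probability(theta, delta):
    """P(correct) for ability theta and item difficulty delta (assumed logistic form)."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def log_odds(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


def compare_items(theta, delta_i, delta_j):
    """Log-odds difference between items i and j, as seen through one person."""
    return log_odds(rasch_probability(theta, delta_i)) - log_odds(rasch_probability(theta, delta_j))


def compare_persons(delta, theta_a, theta_b):
    """Log-odds difference between persons a and b, as seen through one item."""
    return log_odds(rasch_probability(theta_a, delta)) - log_odds(rasch_probability(theta_b, delta))


# Hypothetical abilities and difficulties, in logits.
abilities = [-2.0, 0.0, 1.5]
difficulties = [-1.0, 0.0, 2.0]

# The same item comparison made through three different persons.
item_comparisons = [compare_items(theta, 0.3, 1.1) for theta in abilities]
# The same person comparison made through three different items.
person_comparisons = [compare_persons(delta, 1.0, -0.5) for delta in difficulties]

print(item_comparisons)
print(person_comparisons)
```

Each list should contain essentially one repeated value (up to floating-point error), which is the numerical counterpart of the two principles of comparison quoted from Rasch (1961).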
Thurstone
Thurstone also recognized the importance of invariant measurement. In fact, as pointed out by Bock and Jones (1968), “in the system of psychological measurement based on the Thurstonian models, we achieved some of the invariance in measurement which is characteristic of the other sciences” (p. 9). In developing his method of absolute scaling for calibrating test items, Thurstone (1925, 1927, 1928a, 1928b) was specifically motivated by the lack of sample-invariance he observed in Thorndike’s scaling method. In his words,
The probable error, or PE [used in Thorndike’s method], is not valid as a unit of measurement for educational scales. Its defect consists in that it does not possess the one requirement of a unit of measurement, namely constancy [emphasis added]. It fluctuates from one age to another. (Thurstone, 1927, p. 505)
The concept of constancy proposed by Thurstone is his version of an invariance condition, and it is an explicit consequence of measurement situations that yield objective measurements. Thorndike’s PE values fluctuate because the item scale values are not sample-invariant, a condition which violates Thurstone’s insight that the “scale value of an item should be the same no matter which age group is used in the standardization” (Thurstone, 1928a, p. 119). As did Rasch, Thurstone used the idea of a continuum to represent the latent variable of interest and assumed that items can be placed at points on this linear scale which would have a fixed position regardless of the group being tested. According to Thurstone, “if any particular test item or particular raw score is to be allocated on the absolute scale, its scale value should be ideally the same whether determined by group one or group two” (1925, p. 438). Thurstone also presented his ideas about invariance graphically. For example, in several of his articles (1925, 1927), he presented overlapping ability distributions and pointed out that the location of the items on the latent variable scale should be invariant over different ability distributions. In order to adjust for differences in the location and variability of two or more distributions, Thurstone assumed a normal distribution of ability for each group and adjusted statistically for differences in locations (means) and scales (standard deviations). In order for these adjustments proposed by Thurstone to lead successfully to sample-invariant item calibration, Thurstone proposed a graphical test of fit that is essentially the same as Rasch’s approach. According to Thurstone,
If the plot of Fig. 4 should be distinctly non-linear, the present scaling method is not applicable. Non-linearity here shows that the two distributions cannot both be normal on the same scale. If the plot is linear, it proves that both distributions may be assumed to be normal on the same scale or base line. (Thurstone, 1927, p. 513).
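Why linearity is the expected result can be seen from a short derivation. The notation is not Thurstone's, and the setup is a simplifying assumption: an item sits at location $b$ on the common scale, ability in group $g$ is normal with mean $\mu_g$ and standard deviation $\sigma_g$, $p_g$ is the proportion of group $g$ passing the item (taken as the share of the group above $b$), and $\Phi$ is the standard normal distribution function.

```latex
\begin{align}
z_g &= \Phi^{-1}(1 - p_g) = \frac{b - \mu_g}{\sigma_g}, \qquad g = 1, 2 \\
b &= \mu_1 + \sigma_1 z_1 = \mu_2 + \sigma_2 z_2 \\
z_1 &= \frac{\mu_2 - \mu_1}{\sigma_1} + \frac{\sigma_2}{\sigma_1}\, z_2
\end{align}
```

Under these assumptions, plotting one group's item values against the other's gives a straight line with slope $\sigma_2 / \sigma_1$ and an intercept that reflects the difference in means; a slope of one corresponds to equal dispersions in the two groups.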
This test of fit can also be presented in the style of the graphical displays used by Rasch as shown by Engelhard (1984). The effects of using Thurstone’s method of absolute scaling, which provides adjustments for differences in the locations and variations of the ability distributions, as compared to Thorndike’s scaling method which simply adjusts for location differences, can be dramatic. Thurstone (1927) presented the results of using Thorndike’s method to calibrate a language scale developed by Trabue (1916). Trabue’s analysis based on Thorndike’s method indicates that the average language ability increases as a function of grade level, whereas the variances remain constant. The results obtained by using Thurstone’s method also indicate that average ability increases with grade level, but the variances of the scores are not constant, as they tend to increase as a function of grade level. These results seem theoretically plausible. Thurstone’s method of absolute scaling is described and illustrated in detail in Engelhard (1984). An “experimental” adjustment for sample effects which occurs with Thurstone’s model for paired comparisons is described in Andrich (1978). Thurstone’s method of absolute scaling can also be used to scale test scores (Gulliksen, 1950), but a more interesting discussion of issues related to item-invariant measurement is presented by Thurstone (1926) in an article on the scoring of individual performance. In this article, Thurstone presented a set of conditions as follows:
1. It should not be required to have the same number of test elements at each step of the scale.
2. It should be possible to omit several test questions at different levels of the scale without affecting the individual score.
3. It should be possible to include in the same scale two forms of test.
4. It should not be required to submit every subject to the whole range of the scale. The starting point and terminal point, being selected by the examiner, should not directly affect the individual score.
5. It should be possible to use the scale so that a rational score may be determined for each individual subject and so that the performance of groups of subjects may be compared.
6. The arithmetical labor in determining individual scores should be a minimum.
7. The procedure should be as far as possible consistent with psychophysical methods so that it will be free from the logical errors involved in the Binet scales and its variants.
Conditions one to five clearly show Thurstone’s concern with item-invariant measurement.
In his 1926 paper, he went on to propose a scoring method which meets these conditions. Thurstone’s approach is presented in detail by Engelhard (1991). In essence, Thurstone proposed what would be recognized today as person characteristic curves. Many of Thurstone’s articles on scaling are included in The Measurement of Values (1959), although his work on absolute scaling is not included in that volume. The technical details and elaborations of Thurstonian models are presented in Bock and Jones (1968). Andrich (1988c) provided a useful overview of Thurstone’s contributions to measurement theory. Although it is not directly relevant for this paper, it is interesting to note that Thurstone (1947), as did Rasch (1953), also used the concept of invariance as an important aspect of his approach to factor analysis.
Thorndike
In 1904, Thorndike published the first edition of his highly influential book entitled An Introduction to the Theory of Mental and Social Measurements. Thorndike’s major aim in writing this book was to “introduce students to the theory of mental measurements and to provide them with such knowledge and practice as may assist them to follow critically quantitative evidence and argument and to make their own researches exact and logical” (1904, p. v). Thorndike’s book was the standard reference on statistics and quantitative methods in the mental and social sciences for the first two decades of this century (Clifford, 1984; Engelhard, 1988; Travers, 1983). Much of this influence can be attributed to Thorndike’s (1904) clear and expository writing style. Thorndike explicitly acknowledged that contemporary work in measurement theory had not been presented in a manner suitable for students without fairly advanced mathematical skills. He set out to present a less mathematical introduction to measurement theory based on the belief that “there is, happily, nothing in the general principles of modern statistical theory but refined common sense, and little in the techniques resulting from them that general intelligence can not readily master” (p. 2). Thorndike, who wrote extensively on educational and psychological measurement, covered topics which ranged from the general statement of his theory (Thorndike, 1904) to the measurement of a variety of educational outcomes (Thorndike, 1910, 1914, 1918, 1921), as well as intelligence (Thorndike, Bregman, Cobb, and Woodyard, 1926).
What were the basic measurement problems identified by Thorndike? Thorndike clearly stated that the “special difficulties” of measurement in the behavioral sciences are as follows:
1. Absence or imperfection of units in which to measure
2. Lack of constancy in the facts measured
3. Extreme complexity of the measurements to be made.
In order to illustrate the problems related to the absence of an accepted unit of measurement, Thorndike (1904) pointed out that the spelling tests developed by Joseph Mayer Rice did not have equal units. Rice assumed that all his spelling words were of equal difficulty, whereas Thorndike argued that the correct spelling of an easy versus a hard word did not reflect equal amounts of spelling ability. Because the units of measurement are unequal, Thorndike asserted that Rice’s results were inaccurate. Without general agreement on units, the meaning of test scores becomes more subjective. Within the framework of this paper, Thorndike was illustrating that obtained scores may not be invariant over subsets of items which vary in difficulty.
Inconstancy is the second major measurement problem identified by Thorndike (1904). Many of the measurement problems encountered in the behavioral sciences are related to random variation inherent in human characteristics. These variations are due not only to the unreliability of tests, but also to within-subject fluctuations. For example, if a person’s motivation is measured repeatedly, these values tend to vary. Thorndike’s concept of “constancy” is also related to the idea of invariance as developed in this paper.
The final measurement problem or “special difficulty” identified by Thorndike pertains to the extreme complexity of the variables and constructs the social and behavioral scientists wish to measure. This problem reflects a concern with dimensionality. Most of the variables worth measuring in the behavioral sciences do not readily translate into unidimensional tests which permit the reporting of a single score to represent the individual’s location on the latent variable or construct of interest. As pointed out by Jones and Appelbaum (1989), if unidimensionality is obtained for all items and over all groups of examinees, then item parameters will be invariant across groups, and ability parameters will be invariant across items. Methods for conducting item factor analyses designed to explore this issue have been summarized by Mislevy (1986), and an approach to this problem has been illustrated by Muraki and Engelhard (1985).
Thorndike’s method for obtaining sample-invariant item calibration is very similar to Thurstone’s method of absolute scaling. As described by Thurstone,
Thorndike’s scaling method consists in first determining the scale value of each item for each grade separately with the mean of each grade as an origin. The difficulty of a test item for Grade V children, for example, is determined by the proportion of right answers to the test item in that grade. When a test item has been scaled in several grades, the scale values so obtained will, of course, be different because of the fact that they are expressed as deviations from different grade means as origins. Thorndike then reduces all these measurements to a common origin in the construction of an educational scale by adding to each scale value the scale value of the mean of the grade (Thurstone, 1927, p. 508).
The major difference between Thorndike’s method of item scaling and Thurstone’s method of absolute scaling is that Thorndike assumed that the variances of the groups are equal. Thurstone criticized this assumption, ... it is clear that in order to reduce the overlapping sentences or test items to a common base line or scale it is necessary to make not one but two adjustments. One of these adjustments concerns the means of the several grade groups, and this adjustment is made by the Thorndike scaling methods. The second adjustment which is not made by Thorndike concerns the variation in dispersion of the several groups when they are referred to a common scale (Thurstone, 1927, p. 509).
In his later work, Thorndike did include an adjustment for the range of scores (Thomson, 1940).
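The difference between making one adjustment and making two can be sketched numerically. The Python sketch below is an illustration under stated assumptions rather than a reconstruction of either author's worked procedure: abilities within each grade are assumed to be normally distributed, the grade means and standard deviations on the common scale are treated as known, and all numbers and function names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1


def within_grade_value(proportion_correct):
    """Item difficulty as a deviation from the grade mean, in that grade's SD units."""
    # Under the normality assumption, an item passed by 50% of a grade sits at
    # the grade mean (value 0); harder items (smaller proportions) score higher.
    return -STANDARD_NORMAL.inv_cdf(proportion_correct)


def mean_only_scale(proportion_correct, grade_mean):
    """One adjustment (means only): dispersion is treated as equal in every grade."""
    return grade_mean + within_grade_value(proportion_correct)


def mean_and_dispersion_scale(proportion_correct, grade_mean, grade_sd):
    """Two adjustments: shift by the grade mean and rescale by the grade SD."""
    return grade_mean + grade_sd * within_grade_value(proportion_correct)


def proportion_passing(item_location, grade_mean, grade_sd):
    """Hypothetical data generator: share of a normal grade group above the item."""
    return 1.0 - NormalDist(grade_mean, grade_sd).cdf(item_location)


# Hypothetical common-scale values, invented for illustration only.
ITEM_LOCATION = 1.0
GRADES = {"Grade V": (0.0, 1.0), "Grade VI": (0.5, 1.5)}  # (mean, SD)

results = {}
for grade, (grade_mean, grade_sd) in GRADES.items():
    p = proportion_passing(ITEM_LOCATION, grade_mean, grade_sd)
    results[grade] = (
        mean_only_scale(p, grade_mean),
        mean_and_dispersion_scale(p, grade_mean, grade_sd),
    )
    print(grade, round(p, 3), [round(value, 3) for value in results[grade]])
```

In this hypothetical case the two-adjustment value is the same in both grades, whereas the mean-only value shifts in the grade whose dispersion differs from the reference grade, which is the discrepancy Thurstone's criticism points to.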
Thorndike’s views of item-invariant measurement of individuals are presented in several places (Thorndike, 1914; Thorndike, Bregman, Cobb, and Woodyard, 1926). Engelhard (1991) has presented a detailed description of Thorndike’s approach as applied to the measurement of reading ability (Thorndike, 1914). Essentially, Thorndike recommended using a set of procedures that are very similar to the methods of scoring individual performance used by Thurstone and Rasch. Thorndike also suggested examining person fit and proposed adjusting reading ability estimates when an individual responded in an inconsistent manner to the test items.
Comparison and Discussion of Three Measurement Theories

The major similarities and differences among the measurement theories of Thorndike, Thurstone, and Rasch are summarized in this section. In general terms, it is clear that Thorndike, Thurstone, and Rasch were all working within a common scaling tradition. They based many of their proposed methods for calibrating test items and measuring individuals on statistical advances made within the field of psychophysics. One of the differences between psychophysics and psychometrics is that the independent variable is usually an observable variable in psychophysics, whereas in psychometrics the construct is usually unobservable. As this construct is not directly observable, these three psychometricians used the idea of a latent continuum to represent this unobservable variable.

Although they all held similar positions on many measurement issues, there were also several important differences between the conceptualizations of Thorndike and Thurstone as compared to the views of Rasch. One of the major differences was the recognition by Rasch that measurement models can and should be developed based on the responses of individuals to single test items. This focus on the individual, rather than on groups, allowed Rasch to avoid making unnecessary assumptions regarding the distribution of abilities which were needed by both Thorndike and Thurstone. As pointed out earlier, Thorndike’s method of scaling test items and Thurstone’s method of absolute scaling were both based on the assumption that abilities were normally distributed. By using the individual, and not the group, as the level of analysis, Rasch invented measurement models which are capable of providing estimates of the location of both items and individuals on a latent variable continuum simultaneously.
This approach also allowed Rasch to develop probabilistic models rather than deterministic ones for modelling the probability of each individual succeeding on a particular test item as a function of his or her ability and the item difficulties. This probabilistic relationship is clearly shown in the familiar S-shaped item characteristic curves. Further, by simultaneously including item calibration and individual measurement within one model, he was able to derive “conditional” estimates
of these parameters which provide a framework for determining whether or not invariance has been achieved. In summary, many of the measurement problems that confront researchers in psychology and education today, such as those related to invariance, are not new. By taking a historical perspective on these measurement problems, one may find it possible to increase the understanding of the measurement problems themselves, to assess the adequacy of solutions proposed by major measurement theorists, and to identify promising areas for future research. Progress, and in some cases lack of progress, towards the solution of basic measurement problems can also be meaningfully documented. Progress is as difficult to define within the field of measurement as in any other field of study (Donovan, Laudan, and Laudan, 1988; Laudan, 1977). The analysis presented in this paper suggests that Rasch’s work provides a theoretical and statistical framework for the practical realization of invariant measurement that was sought by both Thorndike and Thurstone. The simultaneous inclusion of both ability and item difficulty within a probabilistic model defined at the individual level of analysis has provided a general framework in which item and person parameters can be estimated separately. Rasch was able to use recent advances in statistics, such as the concept of sufficiency developed by Fisher (1925), to propose an approach to measurement which provides practical solutions to many testing problems related to invariance. Measurement problems related to invariance are of fundamental importance for the development of meaningful measures in education and psychology. Item-invariant estimates of individual abilities and sample-invariant estimates of item difficulties are essential in order to realize the advantages of objective measurement. The conditions for objective measurement correspond to the concept of invariance as developed in this paper. 
The conditions for objective measurement are as follows: First, the calibration of measuring instruments must be independent of those objects that happen to be used for the calibration. Second, the measurement of objects must be independent of the instrument that happens to be used for the measuring (Wright, 1968, p. 87).
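The two conditions quoted above can be illustrated with the dichotomous Rasch model, in which the probability of success depends only on the difference between a person's ability and an item's difficulty. The Python sketch below is a minimal illustration at the level of the model itself, not of the conditional estimation procedure mentioned earlier; the logit values and function names are hypothetical.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of success under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def log_odds(probability):
    """Natural log of the odds of success."""
    return math.log(probability / (1.0 - probability))


def item_contrast(ability, difficulty_a, difficulty_b):
    """Log-odds gap between two items, as seen through one person's responses."""
    return log_odds(rasch_probability(ability, difficulty_a)) - log_odds(
        rasch_probability(ability, difficulty_b)
    )


def person_contrast(difficulty, ability_a, ability_b):
    """Log-odds gap between two persons, as seen through one item."""
    return log_odds(rasch_probability(ability_a, difficulty)) - log_odds(
        rasch_probability(ability_b, difficulty)
    )


# Hypothetical logit values chosen only for illustration.
ABILITIES = [-2.0, 0.0, 3.0]
DIFFICULTIES = [-1.0, 0.5, 2.0]

# Comparing items with difficulties -1.0 and 0.5: the same gap for every person.
item_gaps = [item_contrast(theta, -1.0, 0.5) for theta in ABILITIES]
# Comparing persons with abilities 3.0 and 0.0: the same gap on every item.
person_gaps = [person_contrast(b, 3.0, 0.0) for b in DIFFICULTIES]

print([round(gap, 6) for gap in item_gaps])
print([round(gap, 6) for gap in person_gaps])
```

Because the ability term cancels in the item comparison and the difficulty term cancels in the person comparison, the comparison of items does not depend on which person is used, and the comparison of persons does not depend on which item is used.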
This paper provides a historical and substantive review of the problems related to invariant measurement and illustrates the progress which has been made toward solving measurement problems related to invariance. Further, this paper contributes to an appreciation of Rasch’s accomplishments and of the elegance of Rasch’s approach to problems related to invariant measurement. As pointed out by Andrich (1988b), Rasch’s achievements did not occur in a “historical vacuum” (p. 13). This paper illustrates the continuity and progress that are evident within the measurement theories of Thorndike, Thurstone, and Rasch.
Note

This research was supported in part by the University Research Committee of Emory University. Support for this research was also provided through a Spencer Fellowship from the National Academy of Education. Earlier versions of this paper were presented at the Fifth International Objective Measurement Workshop at the University of California, Berkeley (March, 1989), and at the Sixth International Objective Measurement Workshop at the University of Chicago (April, 1991). Judith A. Monsaas and Larry Ludlow provided helpful comments on earlier drafts of this paper.
References

Andrich, D. (1978). Relationships between the Thurstone and Rasch approaches to item scaling. Applied Psychological Measurement, 2, 449 – 460. Andrich, D. (1988a). Rasch models for measurement. Newbury Park, CA: Sage Publications, Inc. Andrich, D. (1988b). A scientific revolution in social measurement. Paper presented at the annual meeting of the American Educational Research Association in New Orleans. Andrich, D. (1988c). Thurstone scales. In J. P. Keeves (Ed.), Educational Research, Methodology, and Measurement: An International Handbook. Oxford, England: Pergamon Press. Bock, R. D. and Jones, L. V. (1968). The measurement and prediction of judgement and choice. San Francisco: Holden-Day. Cattell, J. K. (1893). Mental measurement. Philosophical Review, 2, 316 –332. Clifford, G. J. (1984). Edward L. Thorndike: The sane positivist. Middletown, CT: Wesleyan University Press. (Originally published 1968). Donovan, A., Laudan, L., and Laudan, R. (Eds.). (1988). Scrutinizing science: Empirical studies of scientific change. Boston: Kluwer Academic Publishers. Engelhard, G. (1984). Thorndike, Thurstone and Rasch: A comparison of their methods of scaling psychological tests. Applied Psychological Measurement, 8, 21–38. Engelhard, G. (1988, April). Thorndike’s and Wood’s principles of educational measurement: A view from the 1980’s. Paper presented at the annual meeting of the American Educational Research Association in New Orleans. (ERIC Document Reproduction Service No. ED 295 961). Engelhard, G. (1991). Thorndike, Thurstone and Rasch: A comparison of their approaches to item-invariant measurement. Journal of Research and Development in Education, 24(2), 45 – 60. Engelhard, G. and Osberg, D. W. (1983). Constructing a test network with a Rasch measurement model. Applied Psychological Measurement, 7, 283 –294. Fisher, R. A. (1925). Statistical methods for research workers. Edinburgh: Oliver and Boyd. Guilford, J. P. (1936). Psychometric methods.
New York: McGraw-Hill Book Company Inc. Gulliksen, H. (1950). Theory of mental tests. New York: J. Wiley and Sons. Jones, L. V. (1960). Some invariant findings under the method of successive intervals. In H. Gulliksen and S. Messick (Eds.), Psychological scaling: Theory and applications, (pp. 7–20). New York: John Wiley and Sons, Inc. Jones, L. V. and Appelbaum, M. I. (1989). Psychometric methods. Annual review of psychology, 40, 23 – 43. Laudan, L. (1977). Progress and its problems: Toward a theory of scientific change. Berkeley, CA: University of California Press.
Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3, 635–694. Loevinger, J. (1965). Person and population as psychometric concepts. Psychological Review, 72, 143 –155. Lumsden, J. (1976). Test theory. Annual review of psychology, 27, 251–280. Masters, G. N. and Wright, B. D. (1984). The essential process in a family of measurement models. Psychometrika, 49, 529 – 544. Mislevy, R. J. (1986). Recent developments in the factor analysis of categorical variables. Journal of Educational Statistics, 11, 3 – 31. Mosier, C. I. (1940). Psychophysics and mental test theory: Fundamental postulates and elementary theorems. Psychological Review, 47, 355 – 366. Mosier, C. I. (1941). Psychophysics and mental test theory II: The constant process. Psychological Review, 48, 235 – 249. Muraki, E. and Engelhard, G. (1985). Full-information item factor analysis: Applications of EAP scores. Applied Psychological Measurement, 9, 417– 430. Rasch, G. (1953). On simultaneous factor analysis in several populations. Uppsala Symposium on Psychological Factor Analysis. Nordisk Psykologi’s Monograph Series, 3. Rasch, G. (1961). On general laws and the meaning of measurement in psychology. In J. Neyman (Ed.), Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, (pp. 321– 333). Berkeley, CA: University of California Press. Rasch, G. (1966a). An individualistic approach to item analysis. In P. F. Lazarsfeld and N. Henry (Eds.), Readings in Mathematical Social Science (pp. 89–107). Chicago: Science Research Associates. Rasch, G. (1966b). An item analysis which takes individual differences into account. British Journal of Mathematical and Statistical Psychology, 19, 49 – 57. Rasch, G. (1977). On specific objectivity: An attempt at formalizing the request for generality and validity of scientific statements. Danish Yearbook of Philosophy, 14, 58 – 94. Rasch, G. (1980/1960).
Probabilistic models for some intelligence and attainment tests. Chicago: The University of Chicago Press. [Originally published in 1960 by the Danish Institute for Educational Research]. Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677– 680. Stevens, S. S. (1951). Mathematics, measurement, and psychophysics. In S.S. Stevens (Ed.), Handbook of experimental psychology, (pp. 1– 49). New York: Wiley. Thomson, G. H. (1940). The nature and measurement of the intellect. Teachers College Record, 41, 726 –750. Thorndike, E. L. (1904). An introduction to the theory of mental and social measurements. New York: Teachers College, Columbia University. Thorndike, E. L. (1910). Handwriting. Teachers College Record, 11, 83 –175. Thorndike, E. L. (1914). The measurement of ability in reading. Teachers College Record, 15, 207– 277. Thorndike, E. L. (1918). The nature, purposes, and general methods of measurements of educational products. In Whipple, G. M. (Ed.), The seventeenth yearbook of the national society for the study of education. Part II, The measurement of educational products. Bloomington, IL: Public School Publishing Company. Thorndike, E. L. (1921). Measurement in education: Teachers College Record, 22, 371– 379. Thorndike, E. L., Bregman, E. O., Cobb, M. V., and Woodyard, E. (1926). The measurement of intelligence. New York: Bureau of Publications, Teachers College, Columbia University. Thurstone, L. L. (1925). A method of scaling psychological and educational tests. Journal of Educational Psychology, 15, 433 – 451. Thurstone, L. L. (1926). The scoring of individual performance. Journal of Educational Psychology, 17, 446 – 457.
Thurstone, L. L. (1927). The unit of measurement in educational scales. Journal of Educational Psychology, 18, 505 –524. Thurstone, L. L. (1928a). II. Comment of Professor L. L. Thurstone. Journal of Educational Psychology, 19, 117–124. Thurstone, L. L. (1928b). Scale construction with weighted observations. Journal of Educational Psychology, 19, 441– 453. Thurstone, L. L. (1947). Multiple-factor analysis: A development and expansion of the vectors of mind. Chicago: The University of Chicago Press. Thurstone, L. L. (1959). The measurement of values. Chicago: The University of Chicago Press. Trabue, M. R. (1916). Completion-test language scales. Contributions to Education, No. 77. New York: Columbia University, Teachers College. Travers, R. M. W. (1983). How research has changed American schools: A history from 1840 to the present. Kalamazoo, MI: Mythos Press. Wright, B. D. (1968). Sample-free test calibration and person measurement. In Proceedings of the 1967 invitational conference on testing problems. Princeton, NJ: Educational Testing Service. Wright, B. D. (1988). Rasch measurement models. In J. P. Keeves (Ed.), Educational Research, Methodology, and Measurement: An International Handbook. Oxford, England: Pergamon Press. Wright, B. D. and Bell, S. R. (1984). Item banks: What, why and how. Journal of Educational Measurement, 21, 331– 345. Wright, B. D. and Masters, G. (1982). Rating scale analysis: Rasch measurement. Chicago: MESA Press. Wright, B. D. and Stone, M. H. (1979). Best test design: Rasch measurement. Chicago: MESA Press.
77 If Statistical Significance Tests Are Broken/Misused, What Practices Should Supplement or Replace Them? Bruce Thompson
Source: Theory & Psychology, 9(2) (1999): 165–181.

A few years ago Pedhazur and Schmelkin (1991) asserted that ‘probably very few methodological issues have generated as much controversy’ (p. 198) as have the use and interpretation of statistical significance tests. These tests have certainly proven surprisingly resistant to repeated efforts ‘to exorcise the null hypothesis’ (Cronbach, 1975, p. 124). Particularly noteworthy among the historical efforts to accomplish the exorcism have been works by Rozeboom (1960), Morrison and Henkel (1970), Carver (1978), Meehl (1978), Shaver (1985) and Oakes (1986). The entire Volume 61, Number 4 issue of the Journal of Experimental Education (1993) was devoted to these themes. Yet, notwithstanding the long-term availability of these publications, even today some psychologists still do not understand what statistical significance tests do and do not do. In a public-domain brief digest disseminated as a class handout by the US Department of Education Educational Resources Information Center, the present author (Thompson, 1994a) provided some simple tests of understanding of what pCALCULATED actually evaluates:

In which one of each of the following [three] pairs of studies will the pCALCULATED be smaller?

• In two studies each involving three groups of subjects each of size 30, in one study the means were 100, 100, and 90, and in the second study the means were 100, 100, and 100.
• In two studies each comparing the standard deviations (SD) of scores on the dependent variable of two groups of subjects, in both studies SD1 = 4 and SD2 = 3, but in study one the sample sizes were 100 and 100, while in study two the sample sizes were 50 and 50.
• In two studies involving a multiple regression prediction of Y using predictors X1, X2, and X3, and both with sample sizes of 75, in study one R² = .49 and in study two R² = .25 (p. 5).
These judgments do not require calculations or additional information. However, making such judgments does require a genuine understanding of what statistical significance tests are all about.1 It is not clear how well most authors of journal articles would do on the previous three-item evaluation (cf. Falk & Greenbaum, 1995; Nelson, Rosenthal, & Rosnow, 1986; Oakes, 1986; Zuckerman, Hodgins, Zuckerman, & Rosenthal, 1993). Many of us continue to prefer ‘investing ... [these tests] with what appear to be magical powers’ (Pedhazur & Schmelkin, 1991, p. 198). And some of us try to use p values to cling to a mantle of unattainable objectivity. The use of statistical tests has recently stimulated yet more controversy. Harlow, Mulaik and Steiger (1997) provide a compendium of views on these issues (for a review, see Thompson, 1998a). Contemporary commentaries include those provided by Hunter (1997), Kirk (1996), Schmidt (1996) and the present author (Thompson, 1996, 1997). The less positive treatments of statistical significance tests have also provoked reactions from test advocates (cf. Chow, 1988; Frick, 1996; Greenwald, Gonzalez, Harris, & Guthrie, 1996; Hagen, 1997; Robinson & Levin, 1997). Yet even Frick (1996) acknowledged that critics of conventional practices ‘usefully point out the limitations of null hypothesis testing’ (p. 388). Given growing consciousness regarding these limitations, the APA Board of Scientific Affairs recently named a Task Force on Statistical Inference (Shea, 1996). The APA Task Force is charged with recommending policies and practices leading to more informed and thoughtful statistical analyses, including those involving the use of statistical significance tests. Articles within the American Psychologist, published on a seemingly periodic basis, have especially informed the movement of the field as regards statistical significance testing. 
Table 1 lists some of these articles, and also reports citation frequencies for the articles as of 1996. These American Psychologist articles, and the related comments published within the journal, have considerably influenced psychology and the social sciences more generally. For example, Roger Kirk (1996) characterized the two American Psychologist articles by Cohen as ‘classics’, and argued that ‘the one individual most responsible for bringing the shortcomings of hypothesis testing to the attention of behavioral and educational researchers is Jacob Cohen’ (p. 747).

Table 1: Citations of selected American Psychologist articles

                          Number of citations
Year  Author(s)           Pre-1991  1991  1992  1993  1994  1995  1996*  Total
1994  Cohen                      –     –     –     –     –    17     42     59
1991  Rosenthal                  –     –     6     2     2     2      1     13
1990  Cohen                      –    18    36    38    23    28     23    166
1989  Rosnow & Rosenthal         4    18    23    20    14    12     13    104
1988  Kupfersmid                19     3     6     6     2     4      1     41
1987  Dar                       10     2     6     2     2     1      1     24

*The most current Index at the time of this compilation only covered 1996 through September of that year.

The present paper briefly reviews some of the consensus that has arisen or seems to be occurring as regards the use and limits of statistical significance tests. However, the present treatment also explores both (a) recommendations involving changes in research practices and editorial policies and (b) related issues that the field has yet to resolve. Given some consensus that statistical significance tests are broken, misused or at least have somewhat limited utility, the focus of discussion within the field ought to move beyond additional bashing of statistical significance tests, and toward more constructive suggestions for improved practice.
Emerging Consensus

The field appears to have achieved or is approaching consensus regarding certain limitations of statistical significance tests, notwithstanding some psychological resistance (Schmidt & Hunter, 1997; Thompson, 1998b). At least three noteworthy realizations can be briefly cited.
Result Effect Size

First, researchers have recognized that p values are not useful as indices of study effect sizes (although some researchers still may implicitly deem more important those studies reporting smaller p values – cf. Rosenthal & Gaito, 1963; Zuckerman et al., 1993). The calculated p values in a given study are a function of several study features, but are particularly influenced by the confounded, joint influence of study sample size and study effect sizes. Because p values are confounded indices, in theory 100 studies with varying sample sizes and 100 different effect sizes could each have the same single pCALCULATED, and 100 studies with the same single effect size could each have 100 different values for pCALCULATED. This realization led to an important change in the fourth edition of the American Psychological Association Publication Manual (APA, 1994). The manual noted that
Neither of the two types of probability values [statistical significance tests] reflects the importance or magnitude of an effect because both depend on sample size. . . . You are [therefore] encouraged to provide effect size information. (APA, 1994, p. 18; emphasis added)
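The confounding of sample size and effect size described in this subsection can be made concrete with a small calculation. The Python sketch below is only an illustration under simplifying assumptions that the chapter does not make: two equal-sized groups, a known unit standard deviation, and a two-sided z test in place of whatever test a given study would actually use. The effect sizes, sample sizes and function names are hypothetical.

```python
import math


def two_sided_p(effect_size_d, n_per_group):
    """Two-sided p for a two-group mean difference, known-variance z test."""
    # With unit SD and n per group, the standard error of the mean
    # difference is sqrt(2 / n), so z = d * sqrt(n / 2).
    z = effect_size_d * math.sqrt(n_per_group / 2.0)
    return math.erfc(abs(z) / math.sqrt(2.0))


def d_giving_same_z(z, n_per_group):
    """Standardized difference that yields a fixed z at the given group size."""
    return z * math.sqrt(2.0 / n_per_group)


SAMPLE_SIZES = [20, 200, 2000]  # hypothetical per-group sizes

# One effect size, several different p values.
same_effect = [two_sided_p(0.2, n) for n in SAMPLE_SIZES]

# Several different effect sizes, one p value.
matched = [(d_giving_same_z(1.96, n), n) for n in SAMPLE_SIZES]
same_p = [two_sided_p(d, n) for d, n in matched]

for n, p in zip(SAMPLE_SIZES, same_effect):
    print(f"d = 0.20, n = {n:4d} per group: p = {p:.4g}")
for (d, n), p in zip(matched, same_p):
    print(f"d = {d:.3f}, n = {n:4d} per group: p = {p:.4g}")
```

Under these assumptions a fixed standardized difference of 0.2 moves from a p of roughly .53 to roughly .046 to far below .001 as the groups grow from 20 to 200 to 2000, while standardized differences of very different magnitude share a single p of about .05.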
Result Importance

Second, more and more researchers and editors have come to recognize that p values do not evaluate result importance. Therefore, p values cannot be used as an effective vehicle for escaping disagreement and confrontation regarding our subjective judgments of the worth of our results. As Thompson (1993) noted,

… importance is a question of human values, and math cannot be employed as an atavistic escape (à la Fromm’s Escape from Freedom) from the existential human responsibility for making value judgments. If the computer package did not ask you your values prior to its analysis, it could not have considered your value system in calculating p’s, and so p’s cannot be blithely used to infer the value of research results. (p. 365)
Result Replicability

Third, researchers have recognized that pCALCULATED values are not informative regarding the likelihood of result replication in future samples (Thompson, 1996). As Cohen (1994) made so clear, these calculations presume that the null hypothesis exactly describes the population, and then indicate the probability of the sample results (or of sample results even more disparate from the null than those in the actual sample), given the sample size. But what we want to know is the population parameters, given the statistics in the sample and the sample size. This interest in true population values stems from a desire to avoid the discovery of cold fusion, which leads to a single jubilant conference experience, followed by a lifetime of being shunned at all remaining professional meetings. If we could infer the population parameters, given the sample statistics and sample size, then we might have some confidence that future research would yield sample statistics similar to those in our own sample. Unfortunately, the direction of the inference in inferential statistics is from the population and to the sample, and not from the sample to the population (Thompson, 1997). Thus Cohen (1994) concluded that the statistical significance test ‘does not tell us what we want to know, and we so much want to know what we want to know that, out of desperation, we nevertheless believe that it does!’ (p. 997).
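The gap between the probability of the data given the null hypothesis and the probability of the null hypothesis given the data can be shown with Bayes' rule. The Python sketch below is a hedged illustration, not part of the argument as originally stated: it assumes the null is either exactly true or false, with a known prior probability and a known power when it is false, and every number in it is hypothetical.

```python
def probability_null_given_significant(prior_null, alpha, power):
    """P(null true | statistically significant result), by Bayes' rule."""
    significant_and_null = alpha * prior_null
    significant_and_alternative = power * (1.0 - prior_null)
    return significant_and_null / (
        significant_and_null + significant_and_alternative
    )


ALPHA = 0.05
# Hypothetical (prior probability the null is true, power when it is false).
SCENARIOS = [(0.5, 0.5), (0.9, 0.3)]

posteriors = [
    probability_null_given_significant(prior, ALPHA, power)
    for prior, power in SCENARIOS
]
for (prior, power), posterior in zip(SCENARIOS, posteriors):
    print(
        f"P(null) = {prior}, power = {power}: "
        f"P(null | significant) = {posterior:.3f}"
    )
```

In both hypothetical scenarios the test is run at the same .05 level, yet the probability that the null is true after a statistically significant result is about .09 in one and .60 in the other. The significance level alone therefore does not fix the quantity that researchers want to know.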
Recommended Changes in Practice

A few scholars have called for the banning of statistical significance tests (cf. Carver, 1978, 1993). However, the fact that many psychologists misinterpret statistical significance tests is not a reasonable warrant for banning these tests. As Strike (1979) explained, ‘To deduce a proposition with an “ought” in it from premises containing only “is” assertions is to get something in the conclusion not contained in the premises, something impossible in a valid deductive argument’ (p. 13). In logic this fallacy is called a ‘should/would’ or ‘is/ought’ error (Hudson, 1969). But more and more researchers also now realize that ‘virtually any study can be made to show [statistically] significant results if one uses enough subjects’ (Hays, 1981, p. 293). This means that

Statistical significance testing can involve a tautological logic in which tired researchers, having collected data from hundreds of subjects, then conduct a statistical test to evaluate whether there were a lot of subjects, which the researchers already know, because they collected the data and know they’re tired. (Thompson, 1992b, p. 436)
Consequently, attention has now turned toward ways to improve practice. Five potential improvements in practice are suggested here.
Effect Size Reporting

Empirical studies of articles published since 1994 in psychology, counseling, special education and general education suggest that merely ‘encouraging’ effect size reporting (APA, 1994) has not appreciably affected actual reporting practices (e.g. Kirk, 1996; Snyder & Thompson, 1998; Thompson & Snyder, 1997, 1998; Vacha-Haase & Nilsson, 1998). Apparently, when it comes to reporting and interpreting effect sizes, many are called but few choose to be chosen. Consequently, editorial policies at some journals now require authors to report and interpret effect sizes (Heldref Foundation, 1997; Thompson, 1994b; see also Loftus, 1993; Shrout, 1997). It is particularly noteworthy that editorial policies even at one APA journal now indicate that

If an author decides not to present an effect size estimate along with the outcome of a significance test, I will ask the author to provide specific justification for why effect sizes are not reported. So far, I have not heard a good argument against presenting effect sizes. Therefore, unless there is a real impediment to doing so, you should routinely include effect size information in the papers you submit. (Murphy, 1997, p. 4)
Effect sizes are important to report and interpret for at least two reasons. First, these indices can help inform judgment regarding the practical
or substantive significance of results. Statistical significance tests do not bear upon the noteworthiness of results, because improbable events are not necessarily important (see Shaver’s [1985] classic example), and because ‘if the null hypothesis is not rejected, it is usually [only] because the N is too small’ (Nunnally, 1960, p. 643). Second, reporting effect sizes facilitates the meta-analytic integration of findings across a given literature. People who incorrectly believe, either consciously or unconsciously, that statistical significance tests evaluate the probability of population parameters can exaggerate the importance of a single study, because the study then generalizes to the population. Persons who recognize the limits of these statistical tests realize that most single studies are important primarily only as building blocks within a cumulative body of evidence. As Schmidt (1996) noted: Meta-analysis ... has revealed how little information there typically is in any single study. It has shown that, contrary to widespread belief, a single primary study can rarely resolve an issue or answer a question. (p. 127)
Reporting effect sizes helps meta-analysts more easily and more accurately synthesize findings, because the analyst can then avoid using more approximate effects computed based on sometimes tenuous statistical assumptions. Of course, effect size is no more a panacea than is a statistical significance test, for two reasons noted by Zwick (1997). First, because human values are also not part of the calculation of an effect size, any more than values are part of the calculation of p, ‘largeness of effect does not guarantee practical importance any more than statistical significance does’ (p. 4). Second, some researchers seem to have adopted Cohen’s (1988) definitions of small, medium and large effects with the same rigidity that ‘α = .05’ has been adopted. Such rigidity is inappropriate. Cohen only intended these as impressionistic characterizations of result typicality across a diverse literature, and not as rigid universal criteria. However, some empirical studies suggest that the characterization is reasonably accurate (Glass, 1979; Olejnik, 1984), at least as regards a literature historically built with a bias against statistically non-significant results (Rosenthal, 1979). Notwithstanding these caveats, it is suggested that all authors of quantitative studies should report and interpret effect sizes. Because merely encouraging these practices has to date had little or no effect, at some point it may become necessary to require that effect sizes are reported. Of course, a requirement that effect sizes be reported does not inherently require that a whole new system of statistical analyses be invoked; all our classical analytic methods can be used to yield both pCALCULATED and effect size values, even though the methods have traditionally been used only for the first purpose.
Effect Size Interpretability

There are myriad effect sizes from which the researcher can choose. Useful reviews of the choices have been provided by Kirk (1996), Snyder and Lawson (1993) and Friedman (1968), among others. Effect sizes can be categorized into two broad classes: variance-accounted-for measures (e.g. R², eta-squared [η²]) and standardized differences (e.g. Cohen’s d, Hedges’ g) (Kirk [1996] identifies a third, ‘miscellaneous’ class). Variance-accounted-for indices can be computed in all classical statistical analyses because all analyses are correlational, even though some designs are experimental and some are not (Knapp, 1978; Thompson, in press). Furthermore, effect sizes can be further subdivided as being either ‘uncorrected’ (e.g. R², η²) or ‘corrected’ (e.g. adjusted R², omega-squared [ω²]). Because all conventional analyses are least-squares correlational methods that capitalize on all sample variance, including the sampling error variance unique to the sample, all uncorrected variance-accounted-for statistics are positively biased and overestimate population effects. This bias can be statistically removed via the corrected effect size formulas which estimate the influence of the three major factors contributing to sampling error:

1. Samples with smaller sample sizes tend to have more sampling error.
2. Studies with more variables tend to have more sampling error.
3. Samples from populations with larger variance-accounted-for parameters tend to have less sampling error.

Regarding this last influence, the case can be made clear at the extreme for a study involving the statistic r². If the population parameter is 1.0, it is impossible to draw a sample that yields an inaccurate effect size, since from this population every sample involving any number of pairs of scores will yield an r² of 1.0.
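The way a correction responds to the three factors listed above can be seen in one widely used adjusted R² formula. The Python sketch below assumes the familiar correction 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the sample size and k the number of predictors; the text does not say which corrected formula it has in mind, so this choice, the input values and the function names are illustrative assumptions.

```python
def adjusted_r_squared(r_squared, n, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - k - 1)."""
    residual_df = n - n_predictors - 1
    if residual_df <= 0:
        raise ValueError("sample size must exceed number of predictors + 1")
    return 1.0 - (1.0 - r_squared) * (n - 1) / residual_df


def shrinkage(r_squared, n, n_predictors):
    """Estimated positive bias removed by the correction."""
    return r_squared - adjusted_r_squared(r_squared, n, n_predictors)


# Hypothetical studies, each varying one factor relative to the baseline.
baseline = shrinkage(0.25, n=30, n_predictors=3)
larger_sample = shrinkage(0.25, n=300, n_predictors=3)
more_predictors = shrinkage(0.25, n=30, n_predictors=6)
larger_effect = shrinkage(0.75, n=30, n_predictors=3)
perfect_fit = shrinkage(1.0, n=30, n_predictors=3)

print(f"baseline        : {baseline:.4f}")
print(f"larger sample   : {larger_sample:.4f}")
print(f"more predictors : {more_predictors:.4f}")
print(f"larger effect   : {larger_effect:.4f}")
print(f"perfect fit     : {perfect_fit:.4f}")
```

With this formula the estimated bias shrinks as the sample grows, grows as predictors are added, shrinks as the uncorrected effect grows, and vanishes when R² is 1.0, in line with the three influences and the extreme case described above.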
The field has not yet established a single preferred effect size, a preference for variance-accounted-for as against standardized differences indices, or a preference for corrected as against uncorrected indices. It is doubtful that the field will ever settle on a single index to be used in all studies, given that so many choices exist and because the statistics can usually be translated into approximations across the two major classes. However, some pluses and minuses for both variance-accounted-for and standardized differences indices can be noted. On the one hand, variance-accounted-for indices do have the benefit of reinforcing the realization that all classical analyses are correlational (Knapp, 1978; Thompson, in press). This may minimize the autonomic choice of ANOVA as an analytic method based on an unconscious association of ANOVA with the ability to make causal inferences (cf. Humphreys & Fleishman, 1974).
On the other hand, standardized difference effect sizes (e.g. the difference of the experimental group mean minus the control group mean divided by the control group standard deviation) may be more directly interpretable. For example, Saunders, Howard and Newman (1988) argued that a variance-accounted-for effect is ‘still cast in a language that was foreign to (and unusable by) practitioners’ (pp. 207–208); a variance-accounted-for 2 percent effect usually must be expressed in the metric of an outcome variable to be meaningful. However, not all studies involve experiments or a focus on means, and the use of standardized differences can seem stilted in such contexts.

Thus, there are no clear-cut choices of an optimal effect size, or even a class of effect indices. But it does seem reasonable to expect at a minimum that effect sizes should always be presented in an accessible metric (e.g. years added to longevity, on the average, from not smoking; median number of additional months due to an intervention that Alzheimer’s patients were able to live without institutionalization). Several clinical disciplines have explored innovative ways to meet these requirements (see, e.g., the half-dozen articles in a 1988 special issue of Behavioral Assessment, including the report by Saunders et al. [1988]). But continued development of more effective ways to communicate effects remains warranted.
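The standardized difference described at the start of this passage, and the earlier remark that indices can usually be translated into approximations across the two major classes, can be sketched as follows. The scores are hypothetical. The conversion $r = d/\sqrt{d^2 + 4}$ is a common approximation that assumes two groups of equal size and a d based on a pooled standard deviation, so applying it to a control-group-standardized difference is rougher still. None of the function names come from the article.

```python
from math import sqrt
from statistics import mean, stdev

def glass_delta(experimental, control):
    """Mean difference divided by the control group standard deviation."""
    return (mean(experimental) - mean(control)) / stdev(control)

def d_to_r(d):
    """Approximate correlation equivalent of d (assumes equal group sizes)."""
    return d / sqrt(d * d + 4.0)

def r_to_d(r):
    """Inverse of d_to_r under the same equal-group-size assumption."""
    return 2.0 * r / sqrt(1.0 - r * r)

# Hypothetical scores (illustration only).
experimental = [12, 14, 15, 17, 17]
control = [10, 11, 13, 14, 12]
delta = glass_delta(experimental, control)
print(f"standardized difference = {delta:.2f}")
print(f"approximate variance accounted for = {d_to_r(delta) ** 2:.3f}")

# A 2 percent variance-accounted-for effect re-expressed as a
# standardized difference.
d_for_two_percent = r_to_d(sqrt(0.02))
print(f"r squared of .02 corresponds to d of about {d_for_two_percent:.2f}")
```

Under these assumptions the 2 percent effect mentioned above corresponds to a difference of a little under three-tenths of a standard deviation, which may be easier for practitioners to picture but is still not expressed in the metric of the outcome variable.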
Values Explication

Cohen’s (1988) typicality characterizations are not suitable as rigid criteria for noteworthiness, nor were they meant to be so used. The only suitable criteria for evaluating result value (a) must be informed by the personal, idiosyncratic values of each researcher and (b) must take into account the particular context of a given study. Regarding the first point, Huberty and Morris (1988) noted that, ‘As in all of statistical inference, subjective judgment cannot be avoided. Neither can reasonableness!’ (p. 573). Regarding the context of a given study, a 2 percent variance-accounted-for effect size will not be noteworthy to most researchers (or to most readers) in the context of a study like one I once read titled ‘Smiling and Touching Behavior of Adolescents in Fast Food Restaurants’. However, Gage (1978) pointed out that the relationship between cigarette smoking and lung cancer involves roughly this same effect size, and noted that:

Sometimes even very weak relationships can be important. . . . [O]n the basis of such correlations, important public health policy has been made and millions of people have changed strong habits. (p. 21)
Certainly a small variance-accounted-for effect size involving highly valued outcomes, such as longevity, can be noteworthy. But since the judgments of result noteworthiness are inherently value-driven, and are ‘on the average’,
even here some may reach a seemingly reasoned decision that the effect is not noteworthy, or at least not noteworthy enough to merit changed behavior. Many scientists will probably feel uncomfortable declaring their effects in a meaningful metric and then explicating the associated personal or societal values that make these effects noteworthy. Declarations that ‘My results were [statistically] significant’ will have to be replaced with, ‘This intervention extends life expectancy, on the average, by 1.4 years, and given my valuing of life, I believe this result is noteworthy.’ Historically, social scientists have used p statistics as a way to finesse values differences, because conflicting values of different people are not readily reconcilable. Nevertheless, researchers should be expected to declare the values that make their effects noteworthy. Normative practices for evaluating such assertions will have to evolve. Research results should not be published merely because the individual researcher thinks the results are noteworthy. By the same token, editors should not quash research reports merely because they find explicated values unappealing. These resolutions will have to be formulated in a spirit of reasoned comity. But we also must realize that our historical reliance on p values as a way to avoid value assertions led only to feigned objectivity, and not to real objectivity. This feigned objectivity was built on the edifice of misinterpretation of what statistical significance tests really do.
Evidence of Replicability

The cumulation of knowledge about relationships that recur under specified conditions is the sine qua non of science for those psychologists who believe that such laws can reasonably be formulated. For these psychologists evidence of result replicability is critical for creating a warrant that results are noteworthy. The required nature of this warrant has received too little attention in an era when statistical significance tests were thought to evaluate result replicability, when these tests were thought to evaluate (rather than merely to presume) selected population parameters. Several vehicles for establishing these warrants can be noted.

One warrant involves an important contribution that Jacob Cohen made in his 1994 article; this very important contribution has not been as widely noticed as might be hoped (Hagen, 1997). Cohen (1994) carefully distinguished the general class of ‘null’ hypothesis tests from a subclass of null tests he labeled the ‘nil’ hypothesis test. (A related important distinction is what Meehl [1997] has described as ‘strong’ vs ‘weak’ null hypothesis refutation.) For Cohen, a nil null hypothesis always specifies zero difference or zero relationship (e.g. for the especially inappropriate test of a reliability statistic,
$H_0: r_{XX} = 0$; $H_A: r_{XX} \neq 0$), while other non-nil null hypotheses may test an alternative hypothesis such as $H_A: r_{XX} > .7$. Cohen’s important distinction recognizes that a ‘null hypothesis means the hypothesis to be nullified, not necessarily a hypothesis of no difference’ (Chow, 1988, p. 105). Some specific null must be presumed true in the population, or otherwise infinitely many parameters are possible and the $p_{\mathrm{CALCULATED}}$ for the sample results becomes indeterminate (Thompson, 1996). Most researchers use a nil hypothesis as the null partly because this is what most computer packages assume, and partly because methodology for invoking non-nil null hypotheses has some ‘complexity, and it is not yet readily applicable in many designs’ (Dar, Serlin, & Omer, 1994, p. 81). The mindless use of the nil hypothesis obviates the necessity prospectively to extrapolate thoughtful expected effect sizes from prior literature as part of study design. Furthermore, the interpretation of ‘[statistical] significance’ as indicating result value means that some researchers do not retrospectively interpret their study effects in the context of specific previous findings. These failures are most unfortunate, because the prospective and retrospective use of effects from prior studies is itself a check on the replicability of results in a given inquiry.

Empirical evidence for result replicability can be either ‘external’ or ‘internal’ (Thompson, 1993, 1996). ‘External’ replication studies invoke a new sample measured at a different time and/or a different location. Such replications have unfortunately been undervalued (Robinson & Levin, 1997), perhaps because some researchers thought they were already testing replicability by conducting statistical significance tests. ‘Internal’ replicability analyses use the sample in hand to combine the participants in different ways to try to estimate how much the idiosyncrasies of individuality within the sample have compromised sample results.
The major ‘internal’ replicability analyses are cross-validation, the jackknife and the bootstrap (Diaconis & Efron, 1983); the logics are reviewed in more detail elsewhere (cf. Thompson, 1993, 1994c). ‘Internal’ evidence for replicability is never as good as an actual replication (Robinson & Levin, 1997; Thompson, 1997), but is certainly better than presuming that a statistical significance test assures result replicability. And such ‘internal’ replicability evidence is useful for researchers who for practical reasons cannot externally replicate all results prior to graduation or tenure review. It is important that, when used to evaluate result replicability, these logics are not confused with other uses of the same logics (Thompson, 1993). For example, the inferential use of the bootstrap involves using the bootstrap to estimate a sampling distribution when the sampling distribution is not known or assumptions for the use of a known sampling distribution cannot be met. The descriptive use of the bootstrap looks primarily at the variance in parameter estimates across many different combinations of the participants.
The inferential application requires considerably more ‘re-samples’ (see Thompson, 1994c) than the descriptive application recommended here. This is because the inferential focus is on the tails of the estimated sampling distribution (e.g. the 95th percentile of the distribution, for a one-tailed statistical significance test), rather than the descriptive focus on the standard deviation (i.e. the ‘standard error’) of the sampling distribution. Sample statistics in the tails of the sampling distribution are rarer, and therefore many more bootstrap re-samples are required to estimate these very small or large percentiles.

The field has not yet resolved all the issues involved in establishing a sufficient warrant for result replicability, again, perhaps, because some authors incorrectly assumed that statistical tests evaluated the population. The relevant software to conduct ‘internal’ bootstrap analyses is already available (e.g. Lunneborg, 1987, for univariate applications, and Thompson, 1992a, 1995, for multivariate applications). Because replicability evidence is critical to the cumulation of knowledge, more authors should be expected to provide some evidence of result replicability.
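The descriptive use of the bootstrap described above can be sketched for a single correlation coefficient. This is a minimal illustration under stated assumptions, not the software cited in the text: the paired scores are hypothetical, the function names are invented for the example, and 1,000 re-samples is an arbitrary choice. Cases are drawn with replacement from the sample in hand, the statistic is recomputed in each re-sample, and the standard deviation of those estimates serves as an empirical standard error.

```python
import random
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson correlation for two equal-length lists of scores."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def bootstrap_r(xs, ys, n_resamples=1000, seed=0):
    """Recompute r in re-samples of cases drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(xs)
    estimates = []
    while len(estimates) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        # r is undefined when a re-sample has no variance; draw again.
        if len(set(bx)) < 2 or len(set(by)) < 2:
            continue
        estimates.append(pearson_r(bx, by))
    return estimates

# Hypothetical paired scores for ten participants (illustration only).
xs = [2, 4, 5, 7, 8, 10, 11, 13, 14, 16]
ys = [1, 3, 6, 6, 9, 9, 12, 11, 15, 15]

estimates = bootstrap_r(xs, ys)
print(f"sample r = {pearson_r(xs, ys):.3f}")
print(f"mean of bootstrap estimates = {mean(estimates):.3f}")
print(f"empirical standard error = {stdev(estimates):.3f}")
```

A small empirical standard error relative to the estimate suggests the result does not hinge on a few idiosyncratic participants. An inferential use would instead read off extreme percentiles of `estimates`, which is why it needs many more re-samples.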
Reporting Confidence Intervals

Various scholars have recommended that confidence intervals should be used to replace or supplement statistical significance tests (e.g. Dar, Serlin, & Omer, 1994; Meehl, 1997; Schmidt, 1996; Serlin, 1993). However, researchers using confidence intervals must remember that ‘the interval endpoints are themselves random variables’ (Zwick, 1997, p. 5) also estimated using sample data. That is, the confidence interval does not indicate that, given the endpoints, the chances are X percent that the interval will include the parameter (Falk & Greenbaum, 1995; Howson & Urbach, 1994). Furthermore, researchers who mindlessly interpret confidence intervals only against the standard of whether the interval subsumes zero are doing nothing more than a mindless ‘nil’ hypothesis test (Cortina & Dunlap, 1997).

However, confidence intervals do have one very appealing feature, as Schmidt (1996) made clear. Even if all the research in an area of inquiry was based on radically erroneous estimates of parameters (and even if these a priori estimates were used in specifying non-nil null hypotheses), the parameter would still emerge across studies as a series of overlapping confidence intervals converging on the same parameter.

The use of confidence intervals might also militate against the current bias in the literature (a) first favoring the publication of Type I errors and (b) then disfavoring publication of replication studies revealing the previously published Type I error. Setting alpha at a small level does not prevent any Type I errors; rather, the percentage of such errors is capped at a small proportion. But some such errors will unavoidably occur. Because the literature has been biased in favor of statistically significant results (Rosenthal, 1979), such
Type I errors are afforded priority for publication, but the replications with statistically non-significant results will compete at a disadvantage for journal space, and so the self-correction of science through replication will be impeded. Greenwald (1975) cited relevant actual examples. A focus on consistency of findings across studies can be achieved with confidence intervals interpreted in relation to each other, rather than against the nil standard of a zero value. Therefore, it is suggested that more authors should report confidence intervals as part of their results.
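Interpreting intervals in relation to each other, rather than against zero, can be sketched for correlation coefficients. The example assumes the conventional Fisher z approximation (standard error $1/\sqrt{n - 3}$), which presumes roughly bivariate normal data. The article does not prescribe any particular interval method, and the three study results below are hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist

def r_confidence_interval(r, n, level=0.95):
    """Approximate confidence interval for a correlation via Fisher's z."""
    z = atanh(r)                      # transform r to the z metric
    se = 1.0 / sqrt(n - 3)            # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return tanh(z - crit * se), tanh(z + crit * se)

def intervals_overlap(a, b):
    """True when two (low, high) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical (r, n) results from three studies of the same relationship.
studies = [(0.30, 50), (0.42, 120), (0.35, 80)]
intervals = [r_confidence_interval(r, n) for r, n in studies]

for (r, n), (low, high) in zip(studies, intervals):
    print(f"r = {r:.2f}, n = {n:3d}: 95% CI [{low:.3f}, {high:.3f}]")

all_overlap = all(
    intervals_overlap(intervals[i], intervals[j])
    for i in range(len(intervals))
    for j in range(i + 1, len(intervals))
)
print("all intervals overlap:", all_overlap)
```

Asking only whether each interval excludes zero would reduce the display to three nil hypothesis tests. Asking whether the intervals overlap, and where, addresses the consistency of findings across studies.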
Summary

Kirk (1996) recently noted that, ‘Our science has paid a high price for its ritualistic adherence to null hypothesis significance testing’ (p. 756). The overuse and misinterpretation of statistical tests has been frequently decried as well in literatures other than psychology, including medicine (Kraemer, 1992; Pocock, Hughes, & Lee, 1987), business (Sawyer & Peter, 1983), occupational therapy (Ottenbacher, 1984) and speech and hearing (Young, 1993). Nevertheless, the use of statistical significance tests remains common, and some empirical studies reflect even an increased use of these methods (Parker, 1990)! Many have marveled at the robustness of the statistical significance logic against the application of the wooden stake through the heart. For example, Falk and Greenbaum (1995) noted:

We have shown the compelling nature and the robustness of that illusion [that statistical significance tests give us the information we need]. A massive educational effort is required to eradicate the misconception and extinguish the mindless use of a procedure that dies hard. (p. 94)
And Harris (1991) observed, ‘it is surprising that the dragon will not stay dead’ (p. 375). Frick (1996) cited an anonymous reviewer of his defense of statistical significance testing who argued that, ‘A way of thinking that has survived decades of ferocious attacks is likely to have some value’ (p. 379). Of course, this view presumes a completely rational model of science in which scientists are objective, dispassionate logicians never acting merely out of habit; the view also presumes that scientists are always anxious to admit past errors publicly made in the articles they themselves published over the courses of their careers. Five specific suggestions for improved analytic practice have been presented here. It should be noted that these suggestions can be followed even by those psychologists still employing conventional statistical significance tests. But social science will proceed most rapidly when research becomes the search for replicable effects noteworthy in magnitude in the context of both the inquiry and personal or social values.
Note

1. For each of the three pairs of studies, the first study within each pair has a smaller $p_{\mathrm{CALCULATED}}$ value, if conventional nil null hypotheses (i.e. $H_0: M_1 = M_2 = M_3$; $H_0: SD_1 = SD_2$; and $R^2 = 0$) are used.
References American Psychological Association. (1994). Publication manual of the American Psychological Association (4th ed.). Washington, D.C.: Author. Carver, R. (1978). The case against statistical significance testing. Harvard Educational Review, 48, 378 – 399. Carver, R. (1993). The case against statistical significance testing, revisited. Journal of Experimental Education, 61, 287 – 292. Chow, S.L. (1988). Significance test or effect size? Psychological Bulletin, 103, 105 – 110. Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum. Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45, 1304 – 1312. Cohen, J. (1994). The earth is round ( p < .05). American Psychologist, 49, 997 – 1003. Cortina, J.M., & Dunlap, W.P. (1997). Logic and purpose of significance testing. Psychological Methods, 2, 161–172. Cronbach, L.J. (1975). Beyond the two disciplines of psychology. American Psychologist, 30, 116 –127. Dar, R. (1987). Another look at Meehl, Lakatos, and the scientific practices of psychologists. American Psychologist, 42, 145 –151. Dar, R., Serlin, R.C., & Omer, H. (1994). Misuse of statistical tests in three decades of psychotherapy research. Journal of Consulting and Clinical Psychology, 62, 75 – 82. Diaconis, P., & Efron, B. (1983). Computer-intensive methods in statistics. Scientific American, 248(5), 116 – 130. Falk, R., & Greenbaum, C.W. (1995). Significance tests die hard: The amazing persistence of a probabilistic misconception. Theory & Psychology, 5, 75 – 98. Frick, R.W. (1996). The appropriate use of null hypothesis testing. Psychological Methods, 1, 379 – 390. Friedman, H. (1968). Magnitude of experimental effect and a table for its rapid estimation. Psychological Bulletin, 70, 245 – 251. Gage, N.L. (1978). The scientific basis of the art of teaching. New York: Teachers College Press. Glass, G.V. (1979). Policy for the unpredictable (uncertainty research and policy). 
Educational Researcher, 8(9), 12 – 14. Greenwald, A.G. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82, 1– 20. Greenwald, A.G., Gonzalez, R., Harris, R.J., & Guthrie, D. (1996). Effect size and p-values: What should be reported and what should be replicated? Psychophysiology, 33, 175 –183. Hagen, R.L. (1997). In praise of the null hypothesis statistical test. American Psychologist, 52, 15 – 24. Harlow, L.L., Mulaik, S.A., & Steiger, J.H. (Eds.). (1997). What if there were no significance tests? Mahwah, NJ: Erlbaum. Harris, M.J. (1991). Significance tests are not enough: The role of effect size estimation in theory corroboration. Theory & Psychology, 1, 375 – 382. Hays, W.L. (1981). Statistics (3rd ed.). New York: Holt, Rinehart & Winston.
Heldref Foundation (1997). Guidelines for contributors. Journal of Experimental Education, 65, 287– 288. Howson, C., & Urbach, P. (1994). Probability, uncertainty and the practice of statistics. In G. Wright & P. Ayton (Eds.), Subjective probability (pp. 39 – 51). Chichester: Wiley. Huberty, C.J., & Morris, J.D. (1988). A single contrast test procedure. Educational and Psychological Measurement, 48, 567– 578. Hudson, W.D. (1969). The is/ought question. London: Macmillan. Humphreys, L.G., & Fleishman, A. (1974). Pseudo-orthogonal and other analysis of variance designs involving individual-differences variables. Journal of Educational Psychology, 66, 464 – 472. Hunter, J.E. (1997). Needed: A ban on the significance test. Psychological Science, 8, 3 – 7. Kirk, R. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56, 746 – 759. Knapp, T.R. (1978). Canonical correlation analysis: A general parametric significance testing system. Psychological Bulletin, 85, 410 – 416. Kraemer, H.C. (1992). Reporting the size of effects in research studies to facilitate assessment of practical or clinical significance. Psychoendocrinology, 17, 527– 536. Kupfersmid, J. (1988). Improving what is published: A model in search of an editor. American Psychologist, 43, 635 – 642. Loftus, G.R. (1993). Editorial comment. Memory & Cognition, 21, 1 – 3. Lunneborg, C.E. (1987). Bootstrap applications for the behavioral sciences. Seattle: University of Washington Press. Meehl, P.E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806 – 834. Meehl, P.E. (1997). The problem is epistemology, not statistics: Replace significance tests by confidence intervals and quantify accuracy of risky numerical predictions. In L.L. Harlow, S.A. Mulaik, & J.H. Steiger (Eds.), What if there were no significance tests? (pp. 391– 423). 
Mahwah, NJ: Erlbaum. Morrison, D.E., & Henkel, R.E. (Eds.). (1970). The significance test controversy. Chicago, IL: Aldine. Murphy, K.R. (1997). Editorial. Journal of Applied Psychology, 82, 3 – 5. Nelson, N., Rosenthal, R., & Rosnow, R.L. (1986). Interpretation of significance levels and effect sizes by psychological researchers. American Psychologist, 41, 1299 –1301. Nunnally, J. (1960). The place of statistics in psychology. Educational and Psychological Measurement, 20, 641– 650. Oakes, M. (1986). Statistical inference: A commentary for the social and behavioral sciences. New York: Wiley. Olejnik, S.F. (1984). Planning educational research: Determining the necessary sample size. Journal of Experimental Education, 53, 40 – 48. Ottenbacher, K. (1984). Measures of relationship strength in occupational therapy research. The Occupational Therapy Journal of Research, 4, 271– 285. Parker, S. (1990). A note on the growth of the use of statistical tests in Perception & Psychophysics. Bulletin of the Psychonomic Society, 28, 565 – 566. Pedhazur, E.J., & Schmelkin, L.P. (1991). Measurement, design, and analysis: An integrated approach. Hillsdale, NJ: Erlbaum. Pocock, S.J., Hughes, M.D., & Lee, R.J. (1987). Statistical problems in the reporting of clinical trials. The New England Journal of Medicine, 317, 426 – 432. Robinson, D., & Levin, J. (1997). Reflections on statistical and substantive significance, with a slice of replication. Educational Researcher, 26(5), 21– 26. Rosenthal, R. (1979). The ‘file drawer problem’ and tolerance for null results. Psychological Bulletin, 86, 638 – 641.
Rosenthal, R. (1991). Effect sizes: Pearson’s correlation, its display via the BESD, and alternative indices. American Psychologist, 46, 1086 –1087. Rosenthal, R., & Gaito, J. (1963). The interpretation of levels of significance by psychological researchers. Journal of Psychology, 55, 33 – 38. Rosnow, R.L., & Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological science. American Psychologist, 44, 1276 –1284. Rozeboom, W.W. (1960). The fallacy of the null hypothesis significance test. Psychological Bulletin, 57, 416 – 428. Saunders, S.M., Howard, K.I., & Newman, F.L. (1988). Evaluating the clinical significance of treatment effects: Norms and normality. Behavioral Assessment, 10, 207– 218. Sawyer, A.G., & Peter, J.P. (1983). The significance of statistical significance tests in marketing research. Journal of Marketing Research, 20, 122 –123. Schmidt, F.L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for the training of researchers. Psychological Methods, 1, 115 –129. Schmidt, F.L., & Hunter, J.E. (1997). Eight common but false objections to the discontinuation of significance testing in the analysis of research data. In L.L. Harlow, S.A. Mulaik, & J.H. Steiger (Eds.), What if there were no significance tests? (pp. 37– 64). Mahwah, NJ: Erlbaum. Serlin, R.C. (1993). Confidence intervals and the scientific method: A case for Holm on the range. Journal of Experimental Education, 61, 350 – 360. Shaver, J. (1985). Chance and nonsense. Phi Delta Kappan, 67, 57 – 60. Shea, C. (1996). Psychologists debate accuracy of ‘significance test’. Chronicle of Higher Education, 42, A12, A16. Shrout, P. E. (1997). Should significance tests be banned? Introduction to a special section exploring the pros and cons. Psychological Science, 8, 1– 2. Snyder, P.A., & Lawson, S. (1993). Evaluating results using corrected and uncorrected effect size estimates. 
Journal of Experimental Education, 61, 334 – 349. Snyder, P.A., & Thompson, B. (1998). Use of tests of statistical significance and other analytic choices in a school psychology journal: Review of practices and suggested alternatives. School Psychology Quarterly, 13, 335 – 348. Strike, K.A. (1979). An epistemology of practical research. Educational Researcher, 8(1), 10 – 16. Thompson, B. (1992a). DISCSTRA: A computer program that computes bootstrap resampling estimates of descriptive discriminant analysis function and structure coefficients and group centroids. Educational and Psychological Measurement, 52, 905 – 911. Thompson, B. (1992b). Two and one-half decades of leadership in measurement and evaluation. Journal of Counseling and Development, 70, 434 – 438. Thompson, B. (1993). The use of statistical significance tests in research: Bootstrap and other alternatives. Journal of Experimental Education, 61, 361– 377. Thompson, B. (1994a). The concept of statistical significance testing (An ERIC/AE Clearinghouse Digest #EDO-TM-94 - 1). Measurement Update, 4, 5 – 6. (ERIC Document Reproduction Service No. ED 366 654). Thompson, B. (1994b). Guidelines for authors. Educational and Psychological Measurement, 54, 837– 847. Thompson, B. (1994c). The pivotal role of replication in psychological research: Empirically evaluating the replicability of sample results. Journal of Personality, 62, 157–176. Thompson, B. (1995). Exploring the replicability of a study’s results: Bootstrap statistics for the multivariate case. Educational and Psychological Measurement, 55, 84 –94. Thompson, B. (1996). AERA editorial policies regarding statistical significance testing: Three suggested reforms. Educational Researcher, 25(2), 26 –30. Thompson, B. (1997). Editorial policies regarding statistical significance tests: Further comments. Educational Researcher, 26(5), 29 – 32.
Thompson, B. (1998a). Review of What if there were no significance tests? Educational and Psychological Measurement, 58. Thompson, B. (1998b, January). Why ‘encouraging’ effect size reporting isn’t working: The etiology of researcher resistance to changing practices. Paper presented at the annual meeting of the Southwest Educational Research Association, Houston, TX. (ERIC Document Reproduction Service No. ED 416 214). Thompson, B. (in press). Canonical correlation analysis. In L. Grimm & P. Yarnold (Eds.), Reading and understanding multivariate statistics, Vol. 2. Washington, D.C.: American Psychological Association. Thompson, B., & Snyder, P.A. (1997). Statistical significance testing practices in the Journal of Experimental Education. Journal of Experimental Education, 66, 75 – 83. Thompson, B., & Snyder, P.A. (1998). Statistical significance and reliability analyses in recent JCD research articles. Journal of Counseling and Development, 76, 436 – 441. Vacha-Haase, T., & Nilsson, J.E. (1998). Statistical significance reporting: Current trends and usages within MECD. Measurement and Evaluation in Counseling and Development, 31, 46 – 57. Young, M.A. (1993). Supplementing tests of statistical significance: Variation accounted for. Journal of Speech and Hearing Research, 36, 644 – 656. Zuckerman, M., Hodgins, H.S., Zuckerman, A., & Rosenthal, R. (1993). Contemporary issues in the analysis of data: A survey of 551 psychologists. Psychological Science, 4, 49–53. Zwick, R. (1997, March). Would the abolition of significance testing lead to better science? Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.
78
Musical Aptitude Testing: From James McKeen Cattell to Carl Emil Seashore
Jere T. Humphreys
When psychologist Carl Emil Seashore (1866–1949) began the two decades of research that led to the development of his famous tests of musical aptitude,1 he drew upon beliefs and research methods then prevalent in the field of psychology. Many of those beliefs and methods were examined in a previous article.2 The purpose of this article is to describe the remaining major links between late nineteenth-century music-related psychological research and Seashore’s early work: the music-related research of James McKeen Cattell (1860–1944), the leader of the mental testing movement during the 1890s.3

In the 1890s, European and American researchers in the new field of scientific, empirical, laboratory-based psychology (as opposed to philosophical “armchair” psychology) focused their research on sensory perception, the first of the new psychology’s three “great topics.”4 At about the same time, American psychologists assumed the leadership in mental testing research, which was part of the new psychology. Carl Seashore undertook his doctoral studies in the first half of the 1890s, a period that coincided with the birth of sensory psychology and mental testing research in the United States. Not long thereafter, he applied the methods of scientific psychology and mental testing to his research on musical aptitude.
Source: Research Studies in Music Education, 10 (1998): 42–53.
Perception Research

Speculation about sensory perception began in Ancient Greece. Empirical research on the same began during the Renaissance, including studies of the least discernible differences in musical pitch. Theoretical and empirical perception research by Ernst Heinrich Weber (1795–1878), Gustav Theodor Fechner (1801–1887), Hermann Ludwig Ferdinand von Helmholtz (1821–1894), and other physicists led to a fusion of philosophical speculation and physiological research on sensation, which in turn contributed significantly to the emergence of the field of modern psychology. Wilhelm Wundt (1832–1920) of the University of Leipzig, the world’s first famous psychologist, borrowed testing ideas and research methods from these early researchers and helped develop the subfields of experimental psychology and psychophysics.5

Cambridge University researcher Francis Galton (1822–1911) borrowed sensory perception research methods from Wundt and others. An adherent of the long-standing belief that all knowledge is obtained through the five senses, Galton was also influenced by Charles Darwin’s (1809–1882) theory of evolution, the natural selection properties of which implied individual differences between people;6 the widespread belief in faculty psychology, whose adherents held that sensory faculties correspond to faculties of the brain; the emergence of atomistic chemistry, which encouraged psychologists to study the “psychological elements,” or senses; and the development of the concept of the normal, or random, distribution curve.7 Eventually, Galton hypothesized that “a measure of sensory acuity would provide a crude measure of a person’s level of intelligence,” and that mental ability is normally, or randomly, distributed.8 He also came to believe that mental abilities are related to each other, which led him to develop the rudiments of statistical correlation.9 Unlike Wundt, who attempted to identify traits common to all (or most) people, Galton used Wundt’s methods
to measure individual differences in mental ability. Galton’s research, which began in the 1870s, included tests of musical discrimination and perceptions.10
James Cattell
Leadership of the mental testing movement passed from Galton in the 1880s to James Cattell in the 1890s. Cattell graduated from Lafayette College in Easton, Pennsylvania, where his father was president, in 1880. For the next several years, he divided his time between completing a master’s degree at Lafayette, working on a Ph.D. in psychology under Wundt (granted in 1886), studying with Galton at Cambridge, and several other activities. One of those activities was a graduate fellowship at Johns Hopkins University (1882–83), where he and fellow graduate students (two of whom were John Dewey and Joseph Jastrow) helped G. Stanley Hall (1844–1924)
establish one of the first American psychological laboratories.11 He went to the University of Pennsylvania in the late 1880s, where he opened a psychological laboratory and held the first university faculty position in psychology in the United States. After moving to Columbia University in 1891, he provided leadership to the new experimental psychology movement for the next twenty-six years.12 Cattell seems to have begun his sensory perception studies while at Johns Hopkins in 1883.13 He continued at least one of those experiments in Wundt’s laboratory.14 He also seems to have developed his keen interest in experimental apparatus at Leipzig, including those for music research: We have in the [Wundt’s] laboratory two excellent pieces of apparatus for testing the power of distinguishing notes. The one is an organ arrangement, which gives the notes at intervals of four vibrations from 32 to 1024 [Hz.] … The other apparatus is a set of tuning forks made up by König [sic], in Paris. Pairs of tuning forks are taken, one always gives the same note, the other (by means of weights) can be so regulated as to give a note a little lower or higher. Experiments on this subject are being made by three groups of students … In one case, memory of notes is being especially investigated.15
Similarly, in 1888, Cattell described “[c]areful experiments, not yet published,” that had “been carried on for several years past in the Leipsic [sic] laboratory” on the least perceptible differences in loudness and pitch, and on the perception of musical intervals.16 Cattell also seems to have first become interested in individual differences during his time with Hall at Johns Hopkins.17 He took that interest with him to Leipzig, where Wundt, himself uninterested in individual differences,18 allowed Cattell to write a paper on the subject as early as 1885.19 Cattell’s interest in individual differences intensified during his intermittent work with Galton at Cambridge over several years. For example, his letters from Cambridge tell of his “association experiments,”20 which he employed in his mental testing efforts. Cattell studied extensively with Wundt, the early leader in the psychological measurement of sensory perception, and Galton, the pioneering mental tester and the early leader in the measurement of individual differences in sensory perception. The fact that both men incorporated tests of musical perception in their research appears to have influenced Cattell to do the same.
Cattell the Mental Tester
At the University of Pennsylvania, his first full-time position, Cattell gathered for the laboratory “a valuable collection of Koenig’s [sic] apparatus for the study of hearing and the elements of music …”21 Soon thereafter, in 1890,
he published an article in a British journal that scholars believe was the first time the term “mental test” appeared in print.22 In this article, a watershed in the history of mental measurement,23 Cattell described a series of ten tests then in use at Pennsylvania. None of the ten tests involved music, although one measured “Reaction-time for Sound.” However, Cattell listed an additional fifty tests still under development, “which I look on as the more important in order that attention may be drawn to them, and co-operation secured in choosing the best series of tests and the most accurate and convenient methods.” Some of these were music tests.24 Upon his arrival at Columbia, he established the department of psychology and developed what became known as the “Freshman Tests,” which he administered to at least fifty volunteer freshmen each year beginning in 1893. Cattell held great hope for these tests, which he predicted would correlate with each other and with academic grades. In an 1896 article, Cattell and a collaborator described their research methods in some detail and provided preliminary results from what may have been the first predictive study of academic success. Only two tests related to music. For one, a test of hearing (of tones), the researchers simply divided subjects from each year into “normal,” “subnormal,” and “abnormal” categories. The other music test measured the “accuracy of the perception of pitch.” After subjects heard a pitch (F below middle C) played on a monochord, they attempted to match the pitch by adjusting the instrument’s bridge.25 After several more years of data collection, one of Cattell’s graduate students, Clark Wissler, reported more results from the study, including data collected from a small number of female students from Barnard College. 
Wissler correlated the test scores with each other and with senior grade-point averages using the technique of statistical correlation that had been discovered by Galton and developed by Galton’s young associate, Karl Pearson.26 Most of the instruments in the battery were tests of sensory discrimination. In addition to the pitch perception test described above, Wissler discussed a music-related test of “Rhythm and Perception of Time” that measured subjects’ abilities to continue tapping a steady beat on a telegraph key fifty times after hearing a stimulus of ten tapped beats, and two “Imagery” questions that required written responses.27 On the pitch perception test, the “average error” (monochord bridge distance from the “correct” placement) was 7.2 centimeters for freshmen and 3.7 centimeters for seniors. Wissler concluded that women were superior to men and seniors were superior to freshmen on that test, with a “certainty of results” of p < .01 in each case.28 He found no statistically significant differences in pitch perception between freshmen from different years.29 Unfortunately, with one exception, Wissler did not report correlation coefficients between the pitch perception test and the other variables. The exception was a coefficient of r = .01 between pitch perception and reaction time (N = 100). In general, he found only chance intercorrelations between
the physical and mental tests, and moderate intercorrelations between grades for specific courses. Most disappointing of all, he found only chance correlations between individual tests taken as freshmen and overall grades as seniors. Among other things, Wissler complained about the inadequacy of undergraduate grades as a representative measure of students’ abilities to handle “life tasks,” which he deemed “exceedingly complex.” He concluded that Cattell’s physical and mental tests promised little “from a practical point of view.”30
Other Mental Testers
European researchers were beginning to conduct similar studies of mental functions in the 1890s.31 In the United States, where most of the work occurred, Franz Boas (1858–1942) related school children’s test scores to their mental alertness as estimated by teachers; Joseph Jastrow (1863–1944) developed fifteen tests for college students; and James A. Gilbert (b. ?-d. ?) studied the mental and physical development of school children. Like Cattell and Wissler, Boas and Gilbert found only chance relationships between test scores and teacher ratings.32 More important than any of these mental testing efforts was the work of Alfred Binet (1857–1911) and Victor Henri (1872–1940) in France and Hugo Münsterberg (1863–1916) in the United States. These researchers experimented with a radically different type of mental test based on cognitive functioning rather than sensory perception.33 Cattell seems to have recognized as early as 1896 the importance of these new tests: of a strictly psychological character. For the psychologist these are, of course, the most interesting and important. But we are at present concerned with anthropometric work, and measurements of the body and of the senses come as completely within our scope as the higher mental processes.34
Indeed, the mental testing movement soon followed Binet and Henri’s lead. Probably for that reason, Cattell, like Galton before him, turned to other interests.35 He eventually became embittered, in part because “his major contribution to experimental psychology … [was] thoroughly discredited and replaced by the … tests of Alfred Binet.”36
Cattell and Seashore
Several pieces of evidence suggest that Cattell influenced Carl Seashore’s work on musical aptitude testing. First, as a founding member and fourth president of the American Psychological Association, founding editor of the
American Journal of Psychology, founding head of the psychology department at a leading university (Columbia), and leader of the mental testing movement during the 1890s, James Cattell was an extremely prominent psychologist. Second, Cattell was a professional friend of Edward Wheeler Scripture (1864–1945), Seashore’s doctoral mentor at Yale University who himself had taken his doctorate under Wundt in 1891. Scripture was a highly productive researcher, but because of his disagreeable personality, he “was largely estranged from his generation of American psychologists,” except for Cattell, his “best friend among the American psychologists.”37 In addition to the personal relationship between Cattell and Scripture, both Scripture and Seashore adopted “an approach like Cattell’s” to the study of sensation.38 A final set of clues to the link between Cattell and Seashore resides in the James McKeen Cattell Collection held by the Library of Congress. The author located more than seventy pieces of personal correspondence between the two men, the earliest dating from 1899.39 Seashore’s doctoral dissertation, which he completed in 1895, was about neither mental testing nor music. Instead, his interest in mental testing may have come indirectly from the prominent Cattell, whose article on the Columbia “Freshman Tests” appeared after Seashore completed his dissertation but before he published his first article on a music-related study.40 Cattell’s work with music tests probably appealed to Seashore, a former singing school student, church organist and choir director, and college glee club director from an amateur musical family.41
Conclusions
Cattell learned from Wundt about the long German tradition of perception research, with its “precision, accuracy, order, and reproducibility of data and findings.”42 He may have become interested in individual differences under Hall. Under Galton, he developed his interest in the measurement of sensory perception differences between individuals. In addition, Galton’s concepts about statistical correlation undoubtedly formed the basis for Cattell’s hypotheses about relationships between mental ability and academic grades (and other “life tasks”). Despite the failure of Cattell’s tests, his work “was of great importance as it was the first attempt to apply the ‘new psychology’ to problems of individual differences.”43 Nevertheless, he experienced difficulty in selecting valid, measurable dependent variables, a problem that continues to plague today’s music education researchers. His main problem, however, was his presumably false hypothesis about strong relationships between mental ability and sensory perception ability.44 It is not surprising that Seashore and many other American psychologists followed Cattell and not Binet, because the latter’s most important work
appeared a few years later. However, most American researchers eventually joined Binet in defining intelligence as cognitive functioning ability, not as sensory perception ability. By contrast, Seashore’s tests remained largely in the sensory perception realm, and he appears to have followed several other early mental testing researchers when he added (tonal) memory to his list of important “psychological processes,” something that Cattell did not do.45 However, Binet and other mainstream mental testing researchers dropped “sensation, attention, perception, association, and memory” from their test batteries around 1904.46 Seashore retained his 1890s belief about yet another issue: that a series of mental tests could not yield a single score that represents general musical (or intellectual) ability. That was Cattell’s position and Binet’s, but subsequent American testing researchers went on to develop the concept of the intelligence quotient and other unitary measures of mental ability. On that issue, at least, Seashore’s conservatism aligned his work with current thinking, which has now returned to that position.
Implications for Music Education
Cattell, the leading sensory mental tester of the 1890s, formally tested his first complete battery in the 1890s. Seashore, the leading sensory musical tester, formally tested his first complete battery in the 1910s, some twenty years later. Binet and Henri published their first battery of cognitive-type tests in 1904, only a few years after Cattell’s sensory tests failed to predict academic achievement. Other researchers further developed the French tests in the first decade of the twentieth century, and have continued to develop them to this day. By contrast, the field of music waited until 1965 for the appearance of a well-constructed test that corresponded to the second generation of intelligence tests.47 Researchers now question the validity of this second generation of tests, both of intelligence and of musical aptitude. Both types of tests predict performance on school-related tasks,48 but not necessarily on “life tasks.” Two central historical questions remain. First, why did Seashore, unlike Cattell, not abandon the effort after his tests failed to demonstrate predictive validity? In other words, why did Seashore remain committed to sensory measures when his (mostly American) counterparts in other fields shifted from tests of sensory perception to tests of reasoning and judgment? It was partly a matter of timing, Seashore having received his doctoral training and begun his research program during the crucial few years between Cattell’s bold hypothesis about relationships between sensory skills and mental ability and the failure of his statistical correlations to support that hypothesis. By the time failure was reported, Seashore may have already committed himself to his life-long agenda. After all, Binet and Henri’s
“key article” – in which they “argued for mental testing based not on sensory and motor functions but on the psychological processes thought to be involved in intelligence …”49 – appeared in 1895, the year Seashore received his doctorate. It is also probable that Seashore believed that musical ability really is based largely on sensory ability, and therefore is somehow different from other mental abilities. Regardless, years later, Seashore explained that he had “drifted gradually into the field of psychology of music primarily for two reasons: first, my love of music and realization of great possibilities in an unworked field; and, secondly,” because his first research interest, vision, plagued its research subjects with eye fatigue, a problem that did not trouble aural researchers.50 The second question is: Since Seashore did not turn to other types of tests, why did other music researchers not do so either? Almost from the beginning, critics charged that Seashore’s battery was sensory and atomistic, but no one, including his most prominent critic, Columbia University psychologist James Mursell (1893–1963),51 conducted extensive, rigorous research on the tests or developed alternative measures. The lack of research-oriented graduate training in music education undoubtedly hampered the profession’s efforts to test the validity of Seashore’s battery thoroughly and to keep pace with new developments in the mental testing and research worlds generally.52 In addition, Seashore himself – with his Yale Ph.D. in the “scientific” field of psychology, his deanship at the University of Iowa and presidency of the American Psychological Association, and his tireless research efforts and prolific publication record – brought considerable prestige to the field of musical aptitude testing.
Indeed, unlike most other earlier sensory mental testing researchers, Seashore never changed his mind about the sensory nature of mental (musical) aptitude, although eventually he tacitly acknowledged the possibility of other types of “musical capacity” when he wrote, in 1930, that his six published measures “furnish a fairly good index to music capacity on the sensory side.”53 Given his professional prestige and that of the field of psychology, and in the absence of strong evidence against his claims or of alternative approaches to musical aptitude testing, Seashore’s tests stood nearly alone for a long time. An implication of this historical research is that the field of music education can benefit from the research of prominent individuals from outside the field. Wundt, Galton, Cattell, and Seashore developed some of the concepts and research methods and tools still in use today. Each made large intellectual “leaps of faith” and each worked diligently to test his hypotheses. Each rendered the field a great service, directly or indirectly. In particular, this story of James Cattell and Carl Seashore suggests that music educators should consider carefully the contributions and prestige brought by those from outside the field. Seashore himself mentioned that he had been “more
or less justly the butt of criticism from the musical profession.” However, he also wrote that: In the field of diagnosing musical talent, I have had a rather extraordinary following, but unfortunately much of it a gullible and non-critical type on the part of people who would take an isolated element in my procedure and handle it as if it covered the whole situation.54
Clearly, it is incumbent upon music educators to decide which contributions to embrace, which to disregard, and which to use as building blocks for the next generation of ideas.
Notes
1. Carl E. Seashore, Seashore Measures of Musical Talent (New York: The Psychological Corporation, 1919). 2. Jere T. Humphreys, “Precursors of Musical Aptitude Testing: From the Greeks through the Work of Francis Galton,” Journal of Research in Music Education 41 (Winter 1993): 315–27. 3. Edwin G. Boring, A History of Experimental Psychology, 2d ed. (New York: Appleton Century-Crofts, 1950), 569. 4. The other two “great topics,” learning and motivation, did not emerge until a few years later. Humphreys, “Precursors,” 323. 5. Ibid., 316–17. 6. Charles Darwin, in his The Descent of Man and Selection in Relation to Sex, 2d ed. (New York: American Publishers Corporation, 1874), had argued that sensitivity to pitch is important in the natural selection process because “the vocal organs were primarily used and perfected in relation to the propagation of the species” (589). 7. Humphreys, “Precursors,” 318–19. 8. Ibid., 319. Attempts to measure mental ability had occurred at least since the early nineteenth century. According to Florence Goodenough, Mental Testing: Its History, Principles, and Applications (New York: Rinehart and Company, 1949), 3, the first major work along those lines was produced in France and differentiated between mental deficiency and mental disease. Jean-Etienne Dominique Esquirol, Des maladies mentales considérées sous les rapports médical, hygiénique, et médico-légal, vols. I, II and atlas (Paris: J. B. Baillière, 1838). Galton knew from this research that idiots and imbeciles frequently exhibit inferior sensory acuity, so he hypothesized a relationship between sensory acuity and intelligence. Richard Herrnstein, I.Q. in the Meritocracy (Boston: Little, Brown, 1973), 63. 9. Humphreys, “Precursors,” 321. Galton developed his first regression line from the size of “mother” and “daughter” peas. Karl Pearson, The Life, Letters and Labours of Francis Galton, vol. IIIA (Cambridge, England: Cambridge University Press, 1930), 3–5, 69. 10.
Humphreys, “Precursors,” 319–20. 11. Controversy remains over who should be credited with establishing the first psychological laboratory in the United States: Hall at Johns Hopkins in early 1883 or William James at Harvard University around 1876. The controversy centers on whether “James’ room for demonstrational experiments at Harvard” was really a laboratory. J. McKeen Cattell, “Early Psychological Laboratories,” Science 67 (May 1928): 546. 12. Most biographical accounts state that Cattell went to Pennsylvania in 1888, but he himself wrote later that he founded the laboratory there in 1887. Ibid., 546. Published
accounts of Cattell’s life differ in many details as to his exact whereabouts at specific times before his extended stays abroad ended in 1894. Numerous personal letters, primarily to his parents, suggest that he traveled frequently between Leipzig, Cambridge, and various other places in the United States and Europe. E.g., Jim [James McKeen Cattell], Leipzig, letters to “Mama and Papa” [William and Elizabeth Cattell], Philadelphia, 25 July 1894 and 5 December 1894, in James McKeen Cattell Collection, Container #55, “Family Correspondence,” December 1888-March 1903, Manuscript Division, Library of Congress, Washington, DC. The most complete account of Cattell’s activities from 1880 to 1888 appears in Michael M. Sokal, ed., An Education in Psychology: James McKeen Cattell’s Journal and Letters from Germany and England, 1880–1888 (Cambridge, MA: The MIT Press, 1981). 13. Ibid., 70, note 3. 14. J. McKeen Cattell, “Ueber die Trägheit der Netzhaut und des Sehcentrums,” Philosophische Studien 3 (1885): [94–127]; reprinted as “The Inertia of the Eye and Brain,” (no trans.) in James McKeen Cattell, James McKeen Cattell, 1860–1944: Man of Science, vol. I (Lancaster, PA: The Science Press, 1947), 27. (Page citation is to the reprint edition.) 15. [James McKeen Cattell], Leipzig, letter to Francis Galton, Cambridge, England, [n.d. given], quoted in Francis Galton, “On Recent Designs for Anthropometric Instruments,” The Journal of the Anthropological Institute of Great Britain and Ireland 16 (1887): 8. Several of Cattell’s journal entries and letters to his parents beginning in 1884 also contain references to his involvement with apparatus in Wundt’s Leipzig laboratory. E.g., Sokal, Education, 98–105. 16. James McKeen Cattell, “The Psychological Laboratory at Leipsic [sic],” Mind 13 (January 1888): 43. 17. Sokal, Education, 70, note 3.
Although Galton is usually regarded as the first to study individual differences and was the undisputed leader of the movement, Hall was eclectic and progressive and, like other psychologists in the child-study movement, eventually became interested in the characteristics of individual children. Jere T. Humphreys, “The Child-Study Movement and Public School Music Education,” Journal of Research in Music Education 33 (Summer 1985): 82. 18. Humphreys, “Precursors,” 319. 19. J. McKeen Cattell, “Ueber die Zeit der Erkennung und Benennung von Schriftzeichen, Bildern und Farben,” Philosophische Studien 2 (1885): 635–50; translated by R. S. Woodworth as “On the Time Required for Recognizing and Naming Letters and Words, Pictures and Colors,” in Cattell, Cattell, vol. I, 13–25. Years later, Cattell gave this revealing account of Wundt’s reactions to his early work on individual differences: . . . in my second interview with Wundt [probably in 1883] I presented an outline of the work I wanted to undertake, which was the objective measurement of the time of reactions with special reference to individual differences. Wundt said that . . . only psychologists could be the subjects in psychological experiments. I later bought and made the apparatus needed and did the work in my own room, without, however, any interruption in relations that were then becoming friendly. J. McKeen Cattell, “In Memory of Wilhelm Wundt,” Psychological Review 28 (May 1921): 156. 20. E.g., Jim [James McKeen Cattell], Cambridge, letter to “Mama and Papa,” Philadelphia, 23 November 1893, James McKeen Cattell Collection, Container #55. 21. James McKeen Cattell, “Psychology at the University of Pennsylvania,” American Journal of Psychology 3 (April 1890): 282. 22. James McKeen Cattell, “Mental Tests and Measurements,” Mind 15 (January 1890): 373–80. 23. Katherine W. Linden and James D. Linden, Modern Mental Measurement: A Historical Perspective (Boston: Houghton Mifflin Company, 1968), 9–10.
24. Cattell, “Mental Tests,” 378. Cattell acknowledged his indebtedness to Galton in that article, when he wrote that Galton had “already used some of these tests, and I hope the series here suggested will meet with his approval” (373, note 1). 25. J. McKeen Cattell and Livingston Farrand, “Physical and Mental Measurements of the Students of Columbia University,” Psychological Review 3 (1896): 636. 26. Clark Wissler, “The Correlation of Mental and Physical Tests,” Psychological Review Monograph Supplements 3 (Whole No. 16) (June 1901): 1–62. 27. Ibid., 9. 28. Ibid., 6, 15–17. The present author translated Wissler’s archaic statistical terms and symbols to modern usage. 29. Ibid., 21–22. Wissler did not report exact significance levels for these and most of his other statistical tests. 30. Ibid., 54, 61–62. 31. For more information see Linden and Linden, 10. 32. Franz Boas, “Anthropological Investigations in Schools,” Pedagogical Seminary 1 (June 1891): 225–28; Joseph Jastrow, “Some Anthropometric and Psychologic [sic] Tests on College Students: A Preliminary Survey,” American Journal of Psychology 4 (April 1892): 420–28; J. Allen Gilbert, “Researches on Mental and Physical Development of School-Children,” in Studies from the Yale Psychological Laboratory, vol. II, ed. E. W. Scripture (New Haven, CT: Yale University Press, 1894); and J. Allen Gilbert, “Researches upon School Children and College Students,” in University of Iowa Studies in Psychology, vol. I, ed. George T. W. Patrick and J. Allen Gilbert (Iowa City: State University of Iowa, 1897). One of Gilbert’s testing experiments involved music: J. A. Gilbert, “Experiments on the Musical Sensitiveness of School Children,” in Studies from the Yale Psychological Laboratory, vol. I, ed. E. W. Scripture (New Haven, CT: Yale University Press, 1893). 33. A. Binet and V. Henri, “La Psychologie individuelle,” L’année psychologique 2 (1895): 411–65; excerpts translated by Mollie D.
Boring in A Source Book in the History of Psychology, ed. Richard J. Herrnstein and Edwin G. Boring (Cambridge, MA: Harvard University Press, 1965), 428–33; and Hugo Münsterberg, “Zur individual Psychologie,” Centralblatt f. Nervenheilkunde und Psychiatrie 14 (1891): 196–98. 34. Cattell and Farrand, 623. 35. Humphreys, “Precursors,” 322. 36. Michael M. Sokal, “The Unpublished Autobiography of James McKeen Cattell,” American Psychologist 26 (July 1971): 629. Evidence that Cattell never completely gave up on his “Freshman Tests” can be found in a report written in 1922. J. M. Cattell, “The First Year of the Psychological Corporation,” unpublished report, 1 December 1922, in James McKeen Cattell Collection, Container #178, “Subject File,” 1890–1936, Manuscript Division, Library of Congress, Washington, DC. 37. Michael M. Sokal, “Biographical Approach: The Psychological Career of Edward Wheeler Scripture,” in Historiography of Modern Psychology: Aims, Resources, Approaches, ed. Josef Brozek and Ludwig J. Pongratz (Toronto: C. J. Hogrefe, Inc., 1980), 268. Cattell, for example, praised one of Scripture’s books, Edward Wheeler Scripture, The New Psychology (New York: Scribner’s, 1897), when other psychologists criticized it harshly. Sokal, “Biographical,” 266–67. In addition, Cattell apparently helped Scripture become a fellow of the American Association for the Advancement of Science. [Edward Wheeler] Scripture, New Haven, CT, letter to [James McKeen] Cattell, New York, NY, 7 August 1901, in James McKeen Cattell Collection, Container #38, “General Correspondence,” 1884–1944, Manuscript Division, Library of Congress, Washington, DC. In 1902, Scripture visited Cattell about his future plans, just before he was released by Yale later that year. After earning a medical degree in his native Germany and teaching briefly at Johns Hopkins, both Scripture and his wife
obtained positions at Columbia, “possibly with Cattell’s help.” Sokal, “Biographical,” 269–70. 38. Ibid., 256. 39. James McKeen Cattell Collection, Container #38. Years later, Cattell and others collaborated with Seashore to found the nonprofit Psychological Corporation. Their correspondence continued through the 1930s. James McKeen Cattell, New York, NY, various letters, memoranda, and undated manuscripts, James McKeen Cattell Collection, Container #178. 40. One of Seashore’s first music-related publications was his “Hearing-ability and Discriminative Sensibility for Pitch,” in University of Iowa Studies in Psychology, vol. II, ed. G. T. W. Patrick (Iowa City: State University of Iowa, 1899). His doctoral dissertation was “Measurements of Illusions and Hallucinations in Normal Life” (Ph.D. diss., Yale University, 1895); published in Studies from the Yale Psychological Laboratory, vol. III, ed. E. W. Scripture (New Haven, CT: Yale University Press, 1895). Interestingly, Scripture began to study pitch perception, but not mental testing per se, during Seashore’s student years at Yale. E. W. Scripture, “The Method of Regular Variation,” in “Psychological Notes,” American Journal of Psychology 4 (August 1892): 577–84. 41. Carl Emil Seashore, “Carl Emil Seashore,” in A History of Psychology in Autobiography, vol. I, ed. Carl Murchison (Worcester, MA: Clark University Press, 1930; reprint, New York: Russell & Russell, 1961), 236–38, 245. (Page citations are to the reprint edition.) 42. Linden and Linden, 5. 43. Sokal, “Unpublished,” 629. 44. Cattell may yet be proven correct. Recently, Deary reported a strong link between intelligence and auditory ability. Ian J. Deary, “Intelligence and Auditory Discrimination: Separating Processing Speed and Fidelity of Stimulus Representation,” Intelligence 18 (March 1994): 189–213. 45. Philip H. DuBois, A History of Psychological Testing (Boston: Allyn and Bacon, 1970), 28. 46. Ibid., 46.
47. Edwin Gordon, Musical Aptitude Profile (Boston: Houghton Mifflin Company, 1965). The “Musical Sensitivity” portion of this test clearly represents a move away from sensory measurement. 48. Various aptitude tests, including those of musical aptitude, predict achievement in school music. See Jere T. Humphreys, William V. May, and David J. Nelson, “Research on Music Ensembles,” in Handbook of Research on Music Teaching and Learning, ed. Richard Colwell (New York: Schirmer Books, 1992), 651–53. 49. Herrnstein, 65. The “key article” was Binet and Henri, “La psychologie individuelle.” 50. Seashore, “Seashore,” 272. Interestingly, neither Cattell nor Galton investigated the relationships between sensory ability and musical aptitude per se. Rather, Galton gathered anecdotal data about artistic ability and compared them to expected statistical values. Humphreys, “Precursors,” 322. 51. Among his many writings about the Seashore tests is James L. Mursell, The Psychology of School Music Teaching (New York: Silver, Burdett and Company, 1931), 333–35. 52. Jere T. Humphreys, “Applications of Science: The Age of Standardization and Efficiency in Music Education,” The Bulletin of Historical Research in Music Education 9 (January 1988): 17–18. 53. Seashore, “Seashore,” 273–74. Seashore’s statement stands in contrast to the reflections of one of his contemporaries published in the same year, the prominent mental tester Joseph Jastrow of the University of Wisconsin: My interest in the subject goes back to 1893 and before. In that early period Cattell had emphasized the importance of tests of indices of individual differences. . . . But it remained for Binet . . . to recognize in ordinary achievements (not merely in specially arranged sensory, motor, memory,
and intelligence functions, such as I had used) an available means of grading natural aptitudes. Joseph Jastrow, “Joseph Jastrow,” in A History of Psychology in Autobiography, vol. I, ed. Carl Murchison (Worcester, MA: Clark University Press, 1930; reprint, New York: Russell & Russell, 1961), 156. (Page citation is to the reprint edition.) 54. Seashore, “Seashore,” 272.
Bibliography Binet, A., and V. Henri. “La psychologie individuelle.” L’année psychologique 2 (1895): 411– 65. Excerpts translated by Mollie D. Boring. In A Source Book in the History of Psychology, ed. Richard J. Herrnstein and Edwin G. Boring, 428–33. Cambridge, MA: Harvard University Press, 1965. Boas, Frank. “Anthropological Investigations in Schools.” Pedagogical Seminary 1 (June 1891): 225 –28. Boring, Edwin G. A History of Experimental Psychology. 2d ed. New York: Appleton Century-Crofts, 1950. Cattell, J. McKeen. “Early Psychological Laboratories.” Science 67 (May 1928): 543 – 48. ———. “The First Year of the Psychological Corporation.” Unpublished report to the Board of Directors, 1 December 1922. James McKeen Cattell Collection, Container #178, “Subject File, 1890–1936. Manuscript Division, Library of Congress, Washington, D.C. ———. James McKeen Cattell, 1860 –1944: Man of Science. Vol. I (Lancaster, PA: The Science Press, 1947). ———. Cambridge [England], to “Mama and Papa” [William and Elizabeth Cattell], Philadelphia, 23 November 1893, 25 July 1894, 5 December 1894. Letters in the hand of James McKeen Cattell. James McKeen Cattell Collection, Container #55, “Family Correspondence,” December 1888-March 1903. Manuscript Division, Library of Congress, Washington, D.C. (Two of these letters are misfiled in the Collection.) ———. Leipzig [Germany], to Francis Galton, Cambridge [England], [n.d. given]. Quoted in Francis Galton. “On Recent Designs for Anthropometric Instruments.” The Journal of the Anthropological Institute of Great Britain and Ireland 16 (1887): 2–9. ———. “In Memory of Wilhelm Wundt.” Psychological Review 28 (May 1921): 155 – 59. ———. “Mental Tests and Measurements.” Mind 15 (January 1890): 373 – 80. ———. “The Psychological Laboratory at Leipsic [sic].” Mind 13 (January 1888): 37– 51. ———. “Psychology at the University of Pennsylvania.” American Journal of Psychology 3 (April 1890): 281– 83. ———. 
“Ueber die Trägheit der Netzhaut und des Sehcentrums.” Philosophische Studien 3 (1885): 94–127. Reprinted as “The Inertia of the Eye and Brain.” In (no translator) James McKeen Cattell. James McKeen Cattell, 1860 –1944: Man of Science. Vol. I, 26–40. Lancaster, PA: The Science Press, 1947. ———. “Ueber die Zeit der Erkennung und Benennung von Schriftzeichen, Bildern und Farben.” Philosophische Studien 2 (1885): 635 – 50. Excerpts translated by R. S. Woodworth in “On the Time Required for Recognizing and Naming Letters and Words, Pictures and Colors.” In James McKeen Cattell. James McKeen Cattell, 1860 –1944: Man of Science. Vol. I, 13 – 25. Lancaster, PA: The Science Press, 1947. Cattell, J. McKeen, and Livingston Farrand. “Physical and Mental Measurements of the Students of Columbia University. Psychological Review 3 (1896): 618 – 48. Darwin, Charles. The Descent of Man and Selection in Relation to Sex, 2d ed. New York: American Publishers Corporation, 1874. Deary, Ian J. “Intelligence and Auditory Discrimination: Separating Processing Speed and Fidelity of Stimulus Representation.” Intelligence 18 (March 1994): 189 – 213. DuBois, Philip H. A History of Psychological Testing. Boston: Allyn and Bacon, 1970.
Esquirol, Jean-Etienne Dominique. Des maladies mentales considé rées sous les rapports médical, hygienique, et médico-légal. Vols. I, II and Atlas. Paris: J. B. Bailliére, 1838. Galton, Francis. “On Recent Designs for Anthropometric Instruments.” The Journal of the Anthropological Institute of Great Britain and Ireland 16 (1887): 2 – 9. Gilbert, J. A. “Experiments on the Musical Sensitiveness of School Children.” In Studies from the Yale Psychological Laboratory. Vol. I, ed. E. W. Scripture, 80 – 87. New Haven, CT: Yale University Press, 1893. ———. “Researches on Mental and Physical Development of School-Children.” Studies from the Yale Psychological Laboratory. Vol. II, ed. E. W. Scripture, 40 –100. New Haven, CT: Yale University Press, 1894. ———.“Researches upon School Children and College Students.” University of Iowa Studies in Psychology. Vol. I, ed. G. T. W. Patrick and J. Allen Gilbert, 1– 39. Iowa City: State University of Iowa, 1897. Goodenough, Florence L. Mental Testing: Its History, Principles, and Applications. New York: Rinehart and Company, 1949. Gordon, Edwin. Musical Aptitude Profile. Boston: Houghton Mifflin Company, 1965. Herrnstein, Richard J. I. Q. in the Meritocracy. Boston: Little, Brown, 1973. Herrnstein, Richard J., and Edwin G. Boring, eds. A Source Book in the History of Psychology. Cambridge, MA: Harvard University Press, 1965. Humphreys, Jere T. “Applications of Science: The Age of Standardization and Efficiency in Music Education.” The Bulletin of Historical Research in Music Education 9 (January 1988): 1– 21. ———. “The Child-Study Movement and Public School Music Education.” Journal of Research in Music Education 33 (Summer 1985): 79 – 86. ———. “Precursors of Musical Aptitude Testing: From the Greeks through the Work of Francis Galton.” Journal of Research in Music Education 41 (Winter 1993): 315 – 27. Humphreys, Jere T., William V. May, and David J. Nelson. 
“Research on Music Ensembles.” In Handbook of Research on Music Teaching and Learning, ed. Richard Colwell, 651– 68. New York: Schirmer Books, 1992. Jastrow, Joseph. “Joseph Jastrow.” In A History of Psychology in Autobiography. Vol. I, ed. Carl Murchison, 135 – 62. Worcester, MA: Clark University Press, 1930; reprint, New York: Russell & Russell, 1961. ———. “Some Anthropometric and Psychologic [sic] Tests on College Students.” American Journal of Psychology 4 (April 1892): 420 – 28. Linden, Katherine W., and James D. Linden. Modern Mental Measurement: A Historical Perspective. Boston: Houghton Mifflin Company, 1968. Münsterberg, Hugo. “Zur individual Psychologie,” Centralblatt f. Nervenheilkunde und Psychiatrie 14 (1891): 196 – 98. Mursell, James L. The Psychology of School Music Teaching. New York: Silver, Burdett and Company, 1931. Pearson, Karl. The Life, Letters and Labours of Francis Galton. Vol. IIIA. Cambridge, England: Cambridge University Press, 1930. Scripture, [Edward Wheeler], New Haven, CT, to [James McKeen] Cattell, New York, NY, 7 August 1901. Letter in the hand of Edward Wheeler Scripture. James McKeen Cattell Collection, Container #38, “General Correspondence,” 1884 –1944. Manuscript Division, Library of Congress, Washington, D.C. ———. “The Method of Regular Variation,” in “Psychological Notes.” American Journal of Psychology 4 (August 1892): 577– 84. ———. The New Psychology. New York: Scribner’s, 1897. Seashore, Carl Emil. “Carl Emil Seashore.” In A History of Psychology in Autobiography. Vol. I, ed. Carl Murchison, 225 – 97. Worcester, MA: Clark University Press, 1930; reprint, New York: Russell & Russell, 1961.
Seashore, Carl Emil. “Hearing-ability and Discriminative Sensibility for Pitch.” University of Iowa Studies in Psychology. Vol. II, ed. G. T. W. Patrick, 163 –78. Iowa City: State University of Iowa, 1899. ———. “Measurements of Illusions and Hallucinations in Normal Life.” Ph.D. diss., Yale University, 1895; published in Studies from the Yale Psychological Laboratory. Vol. III, ed. E. W. Scripture, 1– 67. New Haven, CT: Yale University Press, 1895. Seashore, Carl E. Seashore Measures of Musical Talent. New York: The Psychological Corporation, 1919. Sokal, Michael M. “Biographical Approach: The Psychological Career of Edward Wheeler Scripture.” In Historiography of Modern Psychology: Aims, Resources, Approaches, ed. Josef Brozek and Ludwig J. Pongratz, 255 – 78. Toronto: C.J. Hogrefe, Inc., 1980. ———. ed. An Education in Psychology: James McKeen Cattell’s Journal and Letters from Germany and England, 1880–1888. Cambridge, MA: The MIT Press, 1981. ———. “The Unpublished Autobiography of James McKeen Cattell.” American Psychologist 26 (July 1971): 626 – 35. Wissler, Clark. “The Correlation of Mental and Physical Tests.” Psychological Review Monograph Supplements 3 (Whole No. 16) (June 1901): 1– 62.
79 The Life and Labors of Francis Galton: A Review of Four Recent Books about the Father of Behavioral Statistics
Brian E. Clauser
Source: Journal of Educational and Behavioral Statistics, 32(4) (2007): 440–444.
If one individual can be credited as the founder of the field of behavioral and educational statistics, that individual is Francis Galton. Galton was not a great mathematical statistician; he made no important contributions to that field. In fact, his efforts to earn an honors degree in mathematics at Cambridge resulted in a physical and mental breakdown (Gillham, 2001). The contributions that justify Galton’s status as father – or grandfather – of the field are based on his rediscovery of statistical methods and his application of those methods to the measurement of the mental and physical characteristics of humans. Galton deserves credit for our use of such basic analytic frameworks as percentile rank, correlation, and regression. He was not the first to describe the mathematical relationship represented by the correlation coefficient, but he rediscovered this relationship and demonstrated its application in the study of heredity, anthropology, and psychology. He is responsible for the term correlation (from co-relation), he discovered the phenomenon of regression to the mean, and he is responsible for the choice of r (for reversion or regression) to represent the correlation coefficient. Galton developed statistical applications for the behavioral sciences. He demonstrated the importance of the normal distribution and the normal cumulative frequency distribution in understanding human characteristics. Through this research and his influence on Karl Pearson (who provided a mathematically superior alternative to Galton’s formulation of the correlation
coefficient), Galton influenced “Student” (William Gosset), R. A. Fisher, and the applied statisticians that have followed. Galton also pioneered the use of surveys in the behavioral sciences. In one study, he asked his fellow members of the Royal Society of London to describe mental images that they experienced. In another, he collected in-depth surveys from eminent scientists for a work examining the effects of nature and nurture on the propensity toward scientific thinking. Galton’s activities did not stop there. Francis Galton was the quintessential Victorian polymath. He was an explorer in Africa years before Stanley uttered the phrase “Dr. Livingstone, I presume.” (In fact, later in life he was involved in an unpleasant and very public controversy with Stanley.) When he returned from Africa, he wrote a manual for travelers with advice on topics as diverse as how to cross a river with a horse, protect provisions from foraging animals, and prepare for medical emergencies in the wild. To the explorer in need of medical assistance, he offered the consolation, “Though there is a great difference between a good physician and a bad one, there is very little between a good one and none at all” (Galton, 1883, p. 14). He collaborated with his cousin, Charles Darwin, providing statistical analysis for results Darwin (1876) presented in his volume on the effects of cross-fertilization. (R. A. Fisher, 1935, later used this as an example of how not to do statistical analysis, but he did so with the advantage of six decades of hindsight.) And he conducted studies that refuted Darwin’s hypothesis of pangenesis, a Lamarckian description of how acquired characteristics could be passed on to offspring. Galton developed weather maps and discovered the existence of the anticyclone. He wrote three monographs on the use of fingerprints and stands as the major influence in the adoption of this technology in criminology. 
He studied and wrote papers on the visions of sane people, statistical evidence for the efficacy of prayer (the results were not supportive), and the mechanism of heredity (Mendel’s work was unknown at the time; Galton conducted his own experiments with peas). Galton’s fascination with and admiration of Darwin’s work and his obsession for measurement of human characteristics led to an interest in inheritance in humans. He was the first to make the case that intelligence and other mental characteristics could be inherited, and he published several books of evidence to support his views (e.g., Hereditary Genius, 1869; Natural Inheritance, 1889). Ultimately, this line of work led to the conclusion that society had control and responsibility for improvement of the human stock. He coined the term eugenics (to describe the science that would support such improvement through the control of human mating) and wrote essays and a novel in support of this science. Among Galton’s final works was an autobiography, and shortly after Galton’s death in 1911 Karl Pearson (1914, 1924, 1930a, 1930b) wrote a monumental four-volume biography on Galton’s life and works. But for the
seven decades following the publication of Pearson’s opus, Galton received relatively little attention from biographers (Forrest’s [1974] volume is a noteworthy exception, although like Pearson’s biography it is now out of print). In recent years, however, there has been a renewed interest in Galton; four volumes have appeared that describe Galton’s life. The interested reader can choose from an array of writing styles and perspectives. The most literary of these efforts comes from the pen of A. S. Byatt in the form of a novel. The Biographer’s Tale tells a story within a story within a story; actually, at the center of the tale are three stories. The narrator/protagonist is attempting to write the biography of a fictional biographer who apparently died while researching three historical figures: Linnaeus, Ibsen, and Galton. The notes for this research are discovered by the protagonist and provide an opportunity to present fascinating (and mostly factual) information about all three of these individuals. Readers who are already convinced that they wish to know more about Galton will likely not be satisfied by the intriguing but all too brief presentation provided by Byatt. The reader who starts with Byatt’s novel likely will decide that he or she wishes to know more, but time spent reading the novel will not have been wasted. Of the three recent volumes taking a more traditional biographical approach, Brookes’s Extreme Measures: The Dark Visions and Bright Ideas of Francis Galton is the most accessible. It has been written with a broad audience in mind and is the least scholarly of the three. Brookes inserts descriptions of his personal experiences at locations visited in the process of researching the book and so establishes a relaxed, narrative style that is pleasantly readable. Brookes’s biography is also the briefest of the three; as such, it lacks detail about some aspects of Galton’s life. 
Although there is discussion of Galton’s obsession for measurement, there is little attention given to his statistical innovation. The volume is also limited by Brookes’s tendency to see every aspect of Galton’s life in relation to his views on eugenics. Brookes fails to place Galton’s views in the context of the times in which he lived. In the process, he makes too little of a distinction between Galton’s views and the final solution practiced by the Nazis decades after Galton’s death. Although Galton’s views of the indigenous populations that he encountered in Africa might well be seen as enlightened by Victorian standards, Brookes views them with a 21st-century perspective and finds evidence of Galton’s intolerance. This intolerance is then used as a basis for interpreting Galton’s eugenic interests. Little attention is given to Galton’s sensitivity about the importance of developing a plan within the constraints of social acceptability. Similarly, Brookes makes no effort to place Galton’s views within the social and historical context of the times; for example, it should be remembered that both Karl Pearson and R. A. Fisher actively participated in the eugenics movement. Gillham’s A Life of Sir Francis Galton: From African Exploration to the Birth of Eugenics differs from Brookes’s effort in several important respects. The Gillham book is less a narrative and although not exhaustive, provides a more
detailed academic account of Galton’s work. It is also much more substantial (at more than 400 pages) and concludes with nearly 40 pages of notes and references. Gillham’s effort differs from Brookes’s in that, although nearly half of his volume falls in the section titled The Triumph of Pedigree, particular emphasis is not placed on eugenics. Eugenics is considered in the context of Galton’s life and work rather than the other way around. Although not written in the conversational tone of Brookes’s biography, Gillham’s style is pleasant and readable. The notes and references will be valuable to the serious reader, and the book also has numerous illustrations. One expects a biography to contain at least a few photographs, and Gillham obliges; more interesting are the many tables and figures from Galton’s own papers and monographs. Finally, Michael Bulmer’s Francis Galton: Pioneer of Heredity and Biometry provides a highly focused review of Galton’s contributions to genetics and applied statistics. Bulmer begins with an introductory chapter that gives an overview of Galton’s life; the reader progresses from Galton’s birth to old age in 41 pages. Following this overview are eight chapters that focus on Galton’s ideas about statistics, hereditary ability, the laws and mechanisms of heredity, eugenics, evolution, and biometry. Bulmer offers a well-balanced description of these areas. He describes Galton’s innovations and contributions, but he is also willing to point out where and when Galton got it wrong. Galton discovered regression to the mean, but his mathematical understanding of regression was limited; Galton carefully collected and studied data to understand the mechanisms and laws of heredity, but he never produced an accurate model to explain those data. Although Bulmer provides a reasonably extensive reference list, he does not include footnotes and does not follow the academic writing practice of providing references in support of his assertions. 
This is somewhat surprising considering that this is far and away the most detailed of the discussions of Galton’s intellectual efforts. Although this lack of referencing will be a disappointment for some readers, Bulmer does provide considerable background and framework for interpreting Galton’s work within the context of the times. For example, in discussing Galton’s statistical theory of heredity he shows how Galton’s views changed over time and contrasts those views with Mendel’s and Pearson’s, among others. Bulmer’s work provides the most detail on Galton’s use of statistics. Although the statistical presentations (occasionally including matrix formulation) will not present a challenge to readers of this journal, much of the discussion is in the context of genetic theories; the relatively naive reader can make his or her way through the text, but Bulmer clearly expects his reader to have some background in this area. Together, these four books provide a range of options for becoming familiar with the contributions that Francis Galton made to statistical and biological science. In addition, any readers who are left wanting more can seek out
Galton’s original works or Pearson’s encyclopedic biography. This said, the best introduction to Galton’s life and work may well be his autobiographical writing on the subject, Memories of My Life (1908). Whatever choices one makes, it is clear that with the range of current works on Galton there is no excuse for ignorance about this foundational figure in the field of applied statistics.
References
Darwin, C. (1876). The effects of cross and self fertilization in the vegetable kingdom. London: John Murray.
Fisher, R. A. (1935). The design of experiments. Edinburgh, UK: Oliver & Boyd.
Forrest, D. W. (1974). Francis Galton: The life and work of a Victorian genius. New York: Taplinger.
Galton, F. (1869). Hereditary genius. London: Macmillan.
Galton, F. (1883). The art of travel. London: John Murray.
Galton, F. (1889). Natural inheritance. London: Macmillan.
Galton, F. (1908). Memories of my life. London: Methuen.
Pearson, K. (1914). The life, letters and labours of Francis Galton. Vol. 1, birth 1822 to marriage 1853. Cambridge, UK: Cambridge University Press.
Pearson, K. (1924). The life, letters and labours of Francis Galton. Vol. 2, researches and middle life. Cambridge, UK: Cambridge University Press.
Pearson, K. (1930a). The life, letters and labours of Francis Galton. Vol. 3A, correlation, personal identification and eugenics. Cambridge, UK: Cambridge University Press.
Pearson, K. (1930b). The life, letters and labours of Francis Galton. Vol. 3B, characterization, especially by letters; index. Cambridge, UK: Cambridge University Press.
80 Regression towards the Mean, Historically Considered
Stephen M. Stigler
Source: Statistical Methods in Medical Research, 6 (1997): 103–114.
1 Introduction
Regression towards the mean is an elementary concept in statistics. When properly understood, it is transparent to the point of being obvious. Yet despite its simplicity, it has been consistently misunderstood and it has repeatedly been the source of major errors in analysis, some with significant policy implications, attracting such names as ‘the regression paradox’, ‘the regression fallacy’ and ‘the regression trap’. Milton Friedman has written ‘I suspect that the regression fallacy is the most common fallacy in the statistical analysis of economic data’,1 a sentiment that could with justice be carried over to any other field where multivariate data are employed for the analysis and formulation of policies. To understand the nature of this phenomenon, of how a simple idea could cause so much difficulty, it will be useful to examine the history of the idea, because the historical origins reveal a number of ways of interpreting it that could, if more widely known, alleviate much confusion. That history is remarkably short, a fact that itself may seem paradoxical. Modern texts on ‘regression analysis’ or ‘applied linear regression’ or ‘multiple regression analysis’ are almost entirely occupied with examining the use of the method of least squares to fit linear relationships to multivariate data, often for predictive purposes. These texts are based on a statistical methodology that dates back to at least 1805 and the work of Legendre and
Gauss and Laplace, methods that were in part foreshadowed by developments a half-century before that.2 Yet the name ‘regression’ itself and the concept I discuss here only date from the period 1877–85, and those same texts on ‘regression analysis’ discuss that concept only sparsely, if at all.
2 The Concept of Regression
Regression can be viewed as a purely mathematical phenomenon or as an intrinsically statistical concept; to begin with, let us consider how it can be expressed verbally, mathematically, and geometrically, since all of these can be traced to the early days of the concept.
Verbally, we may consider a stochastic time-varying phenomenon, where two correlated measurements are taken of the same person or object at two different times. For example, we might consider the scores recorded on two examinations taken by the same individual at two separated times. Suppose the first score is exceptionally high – near the top of the class. How well do we expect the individual to do on the second test? The answer, regression teaches us, is ‘less well’, relative to the class’s performance. And the reasoning is clear: there is a selection effect. The high score on the first occasion is surely due to some mixture of successes in two components, to a high degree of skill (a permanent component) and to a high degree of luck (a transient component). The relative bearings of the two components of skill and luck on the first-time score would require measurement to pin down, but the fact that we expect both to have, on average, contributed to the exceptional first outcome is intuitively plausible, even obvious. And on the second occasion we expect the permanent component of skill to persist (for that is the meaning of permanent) while the transient component of luck will, on average, not be present (for that is the meaning of transient). We would not expect that the ‘luck’ on the second occasion will be bad luck; it may even be good luck – possibly on rare occasions even better than the first time. But it cannot be counted on to persist, and on average there will be no luck at all, neither good nor bad. And so we will have gone from ‘high skill plus good luck’ to ‘high skill alone’, a net decrease; still better than average, but less so than before. 
We expect (with of course no guarantee) regression towards the average. If the first score were exceptionally low, the situation would be reversed, with regression towards the average from below. Geometrically, the phenomenon can be seen in terms of one simple picture. Figure 1 shows a bivariate normal density; both variables are standardized and the correlation is 0.5. The solid object pictured, if complete, would have a total volume of 1.0 contained in the space between the surface and the X–Y plane. It has, however, been sliced apart. First, a cross-sectional slice is taken perpendicular to the X–Y plane and parallel to the Y-axis, intersecting the X-axis at X = x > 0, which might be taken as the exceptionally high first-occasion score. Next, the surface is decapitated parallel to the X–Y plane, such that the level
Figure 1: The bivariate normal surface: A geometric illustration of the concept of regression
curve of intersection (an ellipse) is exactly tangent to the curve of intersection of the first slice (which is a curve proportional to a normal density, the conditional density of Y given X = x). The major and minor axes of the ellipse are shown (they are the lines Y = X and Y = –X), as is the line from the origin through the point of tangency of the two curves. This latter line is the line of the conditional expectation of Y given X = x (this is clear since it must pass through the mode of the conditional density of Y given X, and for the symmetrical normal distributions, the mode, the median, and the mean must all agree). Then in terms of this diagram the regression phenomenon consists of the obvious observation that the line of conditional expectations must be closer to the X-axis than is the major axis of the ellipse – for it would be clearly impossible for the first slice to touch the ellipse at the point the major axis crosses it, unless the ellipse were collapsed to a line segment, as would only be true if the correlation were 1.0. And so, unless there is perfect correlation between X and Y there must be regression towards the average.
Mathematically, there are several different, equivalent ways of deriving the regression phenomenon.
1. You may begin with two standard normal random variables X and Y with correlation ρ and bivariate density
\[ f(x, y) = \frac{1}{2\pi\sqrt{1-\rho^{2}}} \exp\left(-\frac{1}{2(1-\rho^{2})}\left(x^{2} - 2\rho xy + y^{2}\right)\right) \]
Then after some algebra the conditional density of Y given X = x is found to be
\[ f(y \mid x) = \frac{f(x, y)}{f_X(x)} = \frac{1}{\sqrt{2\pi}\,\sqrt{1-\rho^{2}}} \exp\left(-\frac{1}{2}\left(\frac{y - \rho x}{\sqrt{1-\rho^{2}}}\right)^{2}\right) \]
which we recognize as the density of a N(ρx, 1 − ρ²) random variable. Hence the conditional expectation of Y given X = x is ρx, representing regression from x towards the mean of 0.
2. The verbal description given earlier can be expressed mathematically. We may represent
\[ X = S + E_1, \qquad Y = S + E_2, \]
where S, E₁, and E₂ are independent, S is the ‘persistent’ trait and the Eᵢ are the ‘transient’ traits. For the simplest form of the argument, suppose that S and the Eᵢ all have the same distribution, with E(S) = 0 and E(Eᵢ) = 0. Then
\begin{align*}
E(X \mid Y = y) &= E(S + E_1 \mid S + E_2 = y) \\
&= E(S \mid S + E_2 = y) + E(E_1 \mid S + E_2 = y) \\
&= E(S \mid S + E_2 = y) + E(E_1) \quad \text{(by independence)} \\
&= E(S \mid S + E_2 = y).
\end{align*}
But
\begin{align*}
y &= E(S + E_2 \mid S + E_2 = y) \\
&= E(S \mid S + E_2 = y) + E(E_2 \mid S + E_2 = y) \\
&= 2E(S \mid S + E_2 = y),
\end{align*}
and so E(X|Y = y) = 0.5y. Note that this argument does not require normality or even the existence of second moments, although if the correlation exists we would clearly have ρ = 0.5, in agreement with (1).
3. A different approach is not in terms of standardized variables, but rather is framed sequentially, in terms of a conditional distribution. Let X have a normal distribution N(0, c²), and let Y = X + Z, where Z is N(0, b²), independent of X. Then Y is N(0, b² + c²) and the correlation of X and Y is
\[ \rho = \rho_{XY} = \frac{c^{2}}{\sqrt{c^{2}(b^{2}+c^{2})}} = \frac{c}{\sqrt{b^{2}+c^{2}}}. \]
Clearly the conditional expectation of Y given X = x is simply x; what is the conditional expectation of X given Y = y? Finding the bivariate distribution of X and Y and employing a derivation such as that in (1) above tells us that E(X|Y = y) is not y, but rather it is [c²/(b² + c²)]y, clearly closer to the mean of 0 than is y. The fact that E(Y|X = x) is equal to x (rather than being itself closer to the mean of 0) is a reminder that ‘regression towards the mean’ need literally be true only when the variables are
standardized to have the same variances. If we rescale Y to have the same variance as X, by Y′ = ρY, then E(Y′|X = x) = ρx and E(X|Y′ = y) = ρy.
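The second and third derivations can be checked numerically. What follows is only a minimal simulation sketch, not part of the original argument. It assumes Gaussian components throughout (derivation 2 does not require normality), the values b = 1 and c = 2 are arbitrary illustrative choices, and the helper names ls_slope, skill_luck_sample and sequential_sample are hypothetical. Because the conditional expectations in these models are linear through the origin, a least-squares slope serves as an estimate of the coefficient in E(X|Y = y) or E(Y|X = x).

```python
import random


def ls_slope(xs, ys):
    """Least-squares slope for predicting ys from xs (intercept included)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def skill_luck_sample(n, rng):
    """Derivation 2: X = S + E1, Y = S + E2, components i.i.d. (assumed N(0, 1))."""
    xs, ys = [], []
    for _ in range(n):
        s = rng.gauss(0.0, 1.0)             # persistent component ('skill')
        xs.append(s + rng.gauss(0.0, 1.0))  # plus first-occasion 'luck'
        ys.append(s + rng.gauss(0.0, 1.0))  # plus second-occasion 'luck'
    return xs, ys


def sequential_sample(n, rng, b, c):
    """Derivation 3: X ~ N(0, c^2) and Y = X + Z with Z ~ N(0, b^2)."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, c)
        xs.append(x)
        ys.append(x + rng.gauss(0.0, b))
    return xs, ys


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000

# Derivation 2: theory gives E(X | Y = y) = 0.5 y.
x2, y2 = skill_luck_sample(n, rng)
slope_x_on_y_2 = ls_slope(y2, x2)

# Derivation 3: theory gives E(Y | X = x) = x but E(X | Y = y) = c^2/(b^2 + c^2) y.
b, c = 1.0, 2.0
x3, y3 = sequential_sample(n, rng, b, c)
slope_y_on_x_3 = ls_slope(x3, y3)
slope_x_on_y_3 = ls_slope(y3, x3)

# Rescale Y to Y' = rho * Y so both variables have variance c^2;
# theory then gives slope rho in both directions.
rho = c / (b * b + c * c) ** 0.5
y3_rescaled = [rho * y for y in y3]
slope_yp_on_x = ls_slope(x3, y3_rescaled)
slope_x_on_yp = ls_slope(y3_rescaled, x3)

print("derivation 2, X on Y :", round(slope_x_on_y_2, 3))
print("derivation 3, Y on X :", round(slope_y_on_x_3, 3))
print("derivation 3, X on Y :", round(slope_x_on_y_3, 3))
print("rescaled, both ways  :", round(slope_yp_on_x, 3), round(slope_x_on_yp, 3))
print("theoretical rho      :", round(rho, 3))
```

Under these assumptions the estimated slopes should fall close to 0.5, 1, 0.8 and ρ ≈ 0.894 respectively, up to sampling error. The asymmetry in the unscaled case and the symmetry after rescaling mirror the remark that regression towards the mean need literally hold only for variables with equal variances.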
3 Galton and Regression
Francis Galton discovered the phenomenon of regression. Few conceptual advances in statistics can be as unequivocally associated with a single individual. Least squares, the central limit theorem, the chi-squared test – all of these were realized as the culmination of many years of exploration by many people. Regression too came as the culmination of many years’ work, but in this case it was the repeated efforts of one individual. The first glimmers of the idea can be found already in Galton’s 1869 book Hereditary genius. In that work he studied the way talent ran in families, and most of the book consists of lists of eminent people and their eminent relatives – great scientists and their kin with known scientific accomplishments (e.g. the Bernoullis), musicians and their musical kin (e.g. the Bachs), and so forth. But despite the inevitable arbitrariness in his classifications and evaluation of eminence, Galton noted that there was a marked tendency for a steady decrease in eminence the further down or up the family tree one went from the great man (e.g. Jacob Bernoulli or Johann Sebastian Bach) whose fame led to the family’s inclusion in the study. Even with dogs this was true: ‘If a man breeds from strong, well-shaped dogs, but of mixed pedigree, the puppies will be sometimes, but rarely, the equals of their parents. They will commonly be of a mongrel, nondescript type, because ancestral peculiarities are apt to crop out in the offspring.’3 In 1869 Galton only vaguely approached the concept in its verbal form, but he was unable to formulate in a precise way how the accidental ‘cropping out’ of ‘ancestral peculiarities’ might be encompassed in a theory. Still the question kept gnawing at him; over the years 1874 – 88 he revisited this problem repeatedly, and, bit by bit, he overcame it in one of the grand triumphs of the history of science. 
The story is an exciting one, involving science, experiment, mathematics, simulation, and one of the great mental experiments of all time. But it is a long story, one I have examined in detail in my book,2 and so I shall only relate it in outline here. In the years 1874 – 77, Galton launched his first assault upon this conundrum: how and why was it that talent or quality once it occurred tended to dissipate rather than grow. He never lost interest in the study of the inheritance of human genius, but he realized early on that intellectual quality was not an area that permitted either easy measurement on a wide scale or active experimentation. And so he fell back on studies of other measurable qualities, particularly stature – height – in humans, and he began a series of experiments
involving the measurement in successive generations of the diameter of sweet peas. And while considering these experiments, he invented a wonderful machine, the Quincunx, that was to serve as an analogue for hereditary processes and provide the key insight to the solution. Galton had been puzzled by how to reconcile the standard theory of errors with what he observed and knew to be true from experiments. The theory of errors held that a normal population distribution would be produced through the accumulation of a large number of small accidental deviations, and there seemed to be no other way to account for the ubiquitous appearance of that normal outline. Galton’s experiments with sweet peas and his studies of human stature agreed with earlier work by the Belgian statistician Adolphe Quetelet: the world, by and large, was normally distributed. Yet, as Galton realized, this did not square with the fact that in heredity there were large and important causes of deviations at work: inheritance of talent, height, or diameter was not perfect, but these qualities did run in families. The normal distribution he and others found was not the exclusive result of small accidental causes; it had somehow to be reconciled with the influence of the large and invariable causes of heredity. In 1873 Galton had a tradesman make for him a machine he called the Quincunx. It consisted of a board with a funnel at the top through which lead shot could be released to fall through a succession of offset rows of pins, collecting at the bottom in vertical compartments (for a photograph of the original machine see Stigler,2 p. 277). The left panel of Figure 2 shows a schematic rendition. The name ‘Quincunx’ was derived from the similarity of the pattern of pins to the arrangement of cultivated fruit trees in English agriculture, a pattern that was known as quincunxial because it was based on a square of four trees with a fifth in the centre. Galton’s Quincunx was initially intended
Figure 2: A schematic drawing of Galton’s Quincunx4 (two panels, each labelled A and B)
to illustrate the workings of a large number of small accidental causes to produce a normal-like distribution. It might be likened to a dynamic version of Pascal’s triangle: As shot pass from top to bottom they are randomly deflected at each row, and if the machine is well made and in balance the shot will produce an outline at the bottom where the number of shot in each compartment is proportional to the number of paths to that compartment. That is, the number of shot in a compartment will be proportional to the binomial coefficients – a nearly normal distribution if the number of rows of pins is at all large. The Quincunx illustrated the manner in which a large number of small accidents could produce a normal distribution. But what of the large and not-so-accidental causes that Galton found inherited to one degree or another in his studies? The evolutionary progress of the shot through the Quincunx led Galton to his fundamental first insight through one of the great mental experiments in the history of science. I term this a mental experiment because, while Galton clearly in several places described the variant of the Quincunx that performed the experiment, there is no indication that he actually built the apparatus. And having tried to build such a machine, I can testify that it is exceedingly difficult to make one that will accomplish the task in a satisfactory manner. Galton first imagined taking the Quincunx apart in the middle and stretching it out, but to ensure that the stretching does not alter the final distribution of the shot he would add vertical barriers to keep them from straying while they traversed the gap. Galton’s printed diagram from 1889 is shown in the second panel of Figure 2; he illustrated the idea in correspondence as early as 1877 (see Stigler,2 pp. 278–79). Clearly with these barriers the introduction of the gap would have no effect on the distribution of shot among the compartments at the bottom.
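The likeness to Pascal’s triangle can be made concrete by counting paths. What follows is a minimal sketch, assuming an idealised Quincunx in which every pin sends a shot left or right with equal chance; the function name and the choice of ten rows are illustrative only and are not taken from Galton’s machine.

```python
from math import comb


def quincunx_path_counts(rows: int) -> list[int]:
    """Number of distinct paths from the funnel to each bottom compartment."""
    counts = [1]  # one starting position, directly under the funnel
    for _ in range(rows):
        nxt = [0] * (len(counts) + 1)
        for i, c in enumerate(counts):
            nxt[i] += c      # paths deflected to the left at this row
            nxt[i + 1] += c  # paths deflected to the right at this row
        counts = nxt
    return counts


counts = quincunx_path_counts(10)  # ten rows of pins (an arbitrary choice)
print(counts)  # [1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1]

# The path counts are exactly the binomial coefficients.
assert counts == [comb(10, k) for k in range(11)]
```

If every path is equally likely, the expected share of shot in a compartment is its path count divided by the total number of paths (here 2 to the power 10), which is the binomial outline described above.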
Galton then conceived of introducing a barrier at the bottom of the gap, turning the barriers into a second set of compartments like those at the bottom. What effect would that have? Again it is clear that all this would do is to foreshorten the Quincunx; with fewer rows of pins to traverse, the shot would still come to rest in a normal-like distribution, but one that was less disperse than if they had been allowed to finish the course. Galton would then release the shot from this midlevel, but only from one compartment: this would be expected to produce a small normal distribution immediately below the compartment from which they were released. Proceed then to release the remaining compartments, one at a time. Each will produce its own little normal curve; those near the centre being larger than those more extreme, because more shot will have been deposited in the central compartments by the first stage of the Quincunx. And when all have been released, the result – the sum of all the little normal curves – will be as if no interruption at all had taken place! Galton’s imagination had shown how the normal world could be dissected into components, components which could be traced back to the location of the shot at the end of a first stage. The machine was a beautiful match to his
investigations of inheritance. The seeming homogeneity of the final outline could be seen now as a mixture derived from previous generations. Indeed, Galton’s mental experiment can be interpreted as an analogue proof of the mathematical theorem, that a normal mixture of normal distributions is itself normal (or, in the discrete version, that a convolution of binomial distributions with the same p is binomial). You can even see the phenomenon of regression: the expected final position of a shot released from the mid-level is immediately below it, but what is the expected origin of a shot on the bottom level? Clearly towards the centre from its position, since there are more shot originating towards the centre than further away. With the Quincunx in mind, Galton’s later correlation tables take on a whole new meaning. For example, in Table 1 the right-hand column ‘Total no. of adult children’ is seen as the distribution of the shot at the mid-level, the rows of counts as the corresponding little normal curves, and the ‘Totals’ of the bottom row as the final outline of the Quincunx. Even by 1877 Galton had begun to assemble these insights mathematically. He had empirically noted the tendency for ‘reversion’ towards the mean and labelled this ‘r’. In his notation, if c = the dispersion (essentially, standard deviation) of the first generation, d = the dispersion of the second generation and v = the dispersion of the offspring (the little normal curves), then since the position of a second generation individual was the sum of its ‘reverted’ average displacement from its parent (say rz, where z was the first generation position) and its random deviation from that position, these dispersions would be related by d² = v² + r²c². But why did the reversion take the linear form rz? And why would the population dispersion remain stable; that is, what mechanism produced d = c?
The answer to this (that d = c was a necessary consequence of population stability) did not come to Galton until 1885, when, inspired by tables such as Table 1, and with a slight assist from the Cambridge mathematician JH Dickson, he produced a full formulation in terms of the bivariate normal distribution. He summarized and elaborated upon this formulation in his 1889 Natural inheritance.4 His discussion there included the geometric interpretation of regression and the mathematical formulation given earlier as (3) (which we can recognize now as a description of the working of the Quincunx, with X = the reverted first generation position and Z = the displacement of offspring from parent), and much more. He was aware that there were two regression lines. He even described a variance components model for fraternal relationships, and he discussed how to estimate the components of variance. By the time Natural inheritance appeared, he had, while considering problems in physical anthropology and forensic science, noticed that when two variables were expressed in standardized units, the two regression lines had the same slope, and he suggested using that slope, which he termed the ‘index of co-relation,’ as a measure of the strength of the relationship. He interpreted the correlation coefficient both as a regression coefficient and as what we would now term an intraclass correlation coefficient.5
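The two-stage Quincunx of the mental experiment can be checked by exact counting. The sketch below is only an illustration under stated assumptions: an idealised machine with equal left and right deflections, six rows of pins above the gap and four below (numbers chosen arbitrarily), and hypothetical function names. It confirms that the little distributions sum to the uninterrupted outline, that the squared dispersions of the two stages add when there is no reversion (r = 1), and that the expected origin of a shot found at the bottom lies towards the centre.

```python
from fractions import Fraction
from math import comb


def two_stage_counts(n1: int, n2: int) -> list[list[int]]:
    """table[j][k]: paths through mid-level compartment j (after n1 rows)
    that finish in bottom compartment k (after n2 further rows)."""
    return [
        [comb(n1, j) * comb(n2, k - j) if 0 <= k - j <= n2 else 0
         for k in range(n1 + n2 + 1)]
        for j in range(n1 + 1)
    ]


def mean_and_variance(counts: list[int]) -> tuple[Fraction, Fraction]:
    """Exact mean and variance of the compartment index under these counts."""
    total = sum(counts)
    mean = Fraction(sum(i * c for i, c in enumerate(counts)), total)
    var = Fraction(sum(i * i * c for i, c in enumerate(counts)), total) - mean ** 2
    return mean, var


n1, n2 = 6, 4  # hypothetical numbers of rows above and below the gap
table = two_stage_counts(n1, n2)

# Releasing every mid-level compartment in turn rebuilds the full outline.
bottom = [sum(row[k] for row in table) for k in range(n1 + n2 + 1)]
assert bottom == [comb(n1 + n2, k) for k in range(n1 + n2 + 1)]

# With no reversion the squared dispersions of the two stages simply add.
_, first_stage_var = mean_and_variance([comb(n1, j) for j in range(n1 + 1)])
_, offspring_var = mean_and_variance([comb(n2, m) for m in range(n2 + 1)])
_, final_var = mean_and_variance(bottom)
assert final_var == offspring_var + first_stage_var

# Expected mid-level origin of a shot found in bottom compartment k.
k = 8
origin, _ = mean_and_variance([row[k] for row in table])
print(origin - Fraction(n1, 2), "vs", k - Fraction(n1 + n2, 2))  # 9/5 vs 3
```

The last line compares deviations from the centre: a shot found 3 compartments from the centre at the bottom is expected to have started only 9/5 compartments from the centre at the mid-level, while a shot released from any mid-level compartment is expected to land directly below it.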
Heights of
the mid-                               Heights of the adult children                                  Total number of
parents
in inches  Below  62.2  63.2  64.2  65.2  66.2  67.2  68.2  69.2  70.2  71.2  72.2  73.2 Above  Adult children  Mid-parents  Medians
Above         ..    ..    ..    ..    ..    ..    ..    ..    ..    ..    ..     1     3    ..               4            5       ..
72.5          ..    ..    ..    ..    ..    ..    ..     1     2     1     2     7     2     4              19            6     72.2
71.5          ..    ..    ..    ..     1     3     4     3     5    10     4     9     2     2              43           11     69.9
70.5           1    ..     1    ..     1     1     3    12    18    14     7     4     3     3              68           22     69.5
69.5          ..    ..     1    16     4    17    27    20    33    25    20    11     4     5             183           41     68.9
68.5           1    ..     7    11    16    25    31    34    48    21    18     4     3    ..             219           49     68.2
67.5          ..     3     5    14    15    36    38    28    38    19    11     4    ..    ..             211           33     67.6
66.5          ..     3     3     5     2    17    17    14    13     4    ..    ..    ..    ..              78           20     67.2
65.5           1    ..     9     5     7    11    11     7     7     5     2     1    ..    ..              66           12     66.7
64.5           1     1     4     4     1     5     5    ..     2    ..    ..    ..    ..    ..              23            5     65.8
Below          1    ..     2     4     1     2     2     1     1    ..    ..    ..    ..    ..              14            1       ..
Totals         5     7    32    59    48   117   138   120   167    99    64    41    17    14             928          205       ..
Medians       ..    ..  66.3  67.8  67.9  67.7  67.9  68.3  68.5  69.0  69.0  70.0    ..    ..              ..           ..       ..

Table 1: One of Galton’s correlation tables (from Francis Galton, Family likeness in stature, Proceedings of the Royal Society of London 1886; 40: 42–73). Galton’s 1885 crosstabulation of 928 ‘adult children’ born of 205 mid-parents, by their height and their mid-parent’s height.
4
The Understanding of the Regression Phenomenon
It is fair to say that by 1889 Francis Galton had a clear understanding of the concept of regression. He did not have the command of all the mathematical apparatus I used in the discussion of the concept early in this essay, but his written discussion captured the essence of all of the different formulations given, and his mathematics reflected at least that of (2) and (3). Regression was no longer simply an empirical observation; it was a mathematical deduction. He wrote (p. 95): ‘However paradoxical it may appear at first sight, it is theoretically a necessary fact, and one that is clearly confirmed by observation, that the Stature of the adult offspring must on the whole be more mediocre than the stature of their Parents.’4
Questions about how to best estimate the coefficients of the problem, the correlation coefficient and the parameters of the bivariate normal distribution, would be addressed later by Francis Edgeworth and Karl Pearson, but Galton’s grasp of the concepts was as firm as any you are likely to encounter even today. Galton himself was naive in assuming that if data were recorded on a sequence of occasions (not only two) that regression necessarily continued, even at the same rate. Karl Pearson named this ‘Galton’s Law of Ancestral Heredity,’ and even Pearson did not seem to appreciate that the continuation of the phenomenon after the first generation requires rather special assumptions.6 How well did Galton do in communicating that understanding? If judged by the way he is received by a reader a century after he wrote, the answer would have to be, very well indeed. He wrote in clear and direct prose, in terms that we can understand, with the penetration and clarity that are characteristic of only some of the greatest minds. But that is not the standard that is called for. How well did his contemporaries understand his message? Statisticians generally grasped the concept quite well at one level. Edgeworth and Pearson set to work developing the mathematics of regression further, moving towards multiple dimensions and exploring optimum procedures for estimating correlation. 
In 1901 Bowley wrote the earliest English text to include the new statistical methods, and he included a chapter on the mathematics of the bivariate normal distribution, including both lines of conditional expectation.7 A reader who came away from Bowley’s discussion with the impression that the primary importance of regression was for the study of evolution should have been excused. However, Udny Yule incorporated a full appreciation of the idea of two regression lines into his highly influential text from the first edition.8 At least one perceptive early reviewer of Galton, the philosopher John Dewey, called specific attention to the phenomenon of regression, even noting in effect its dependence upon a stationary population, when
he wrote that it might not hold in the inheritance of wealth: ‘The tendency of wealth to breed wealth, as illustrated by any interest table, and the tendency of extreme poverty to induce conditions which plunge children still deeper into poverty, would probably prevent the operation of the law of regression toward mediocrity.’9 Of course Galton could have replied that even then the law would hold in standardized units. Still, there were clearly limitations to the general understanding of regression as a phenomenon capable of dangerously misleading. The biometrician Frank Weldon, who himself had a very good grasp of Galton’s message, wrote in a 1905 lecture that ‘[T]his phenomenon of regression ... is not generally understood. [V]ery few of those biologists who have tried to use [Galton’s] methods have taken the trouble to understand the process by which he was led to adopt them, and we constantly find regression spoken of as a peculiar property of living things, by virtue of which variations are diminished in intensity during their transmission from parent to child, and the species is kept true to type. This view may seem plausible to those who simply consider that the mean deviation of children is less than that of their fathers: but if such persons would remember the equally obvious fact that there is also a regression of fathers on children, so that the fathers of abnormal children are on the whole less abnormal than their children, they would either have to attribute this feature of regression to a vital property by which children are able to reduce the abnormality of their parents, or else to recognize the real nature of the phenomenon they are trying to discuss.’10
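Weldon’s point, that regression runs in both directions at once, is easy to verify numerically. The following is a small sketch on made-up paired values (they are not Galton’s or Weldon’s data, and the helper names are hypothetical): once both variables are put in standardized units, the least-squares slope of children on fathers equals the slope of fathers on children, and both are below one.

```python
from math import isclose, sqrt


def mean(values: list[float]) -> float:
    return sum(values) / len(values)


def ls_slope(x: list[float], y: list[float]) -> float:
    """Least-squares slope for predicting y from x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def standardize(values: list[float]) -> list[float]:
    """Rescale to mean 0 and standard deviation 1."""
    m = mean(values)
    s = sqrt(sum((a - m) ** 2 for a in values) / len(values))
    return [(a - m) / s for a in values]


# Hypothetical paired measurements, invented for illustration only.
fathers = [1.0, 2.0, 3.0, 4.0, 5.0]
children = [1.0, 3.0, 2.0, 6.0, 8.0]

zf, zc = standardize(fathers), standardize(children)
child_on_father = ls_slope(zf, zc)  # predicted child deviation per unit father deviation
father_on_child = ls_slope(zc, zf)  # predicted father deviation per unit child deviation

print(child_on_father, father_on_child)

# Both slopes coincide (this common value is the correlation) and are below 1.
assert isclose(child_on_father, father_on_child)
assert child_on_father < 1
```

Because the common slope is less than one, exceptional fathers are predicted to have less exceptional children, and exceptional children are predicted to have less exceptional fathers; neither statement requires any biological mechanism.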
In the decades after Weldon wrote, the situation did not change. Following Yule, regression was a staple of textbooks. Its mathematics could be said to be well understood by mathematical statisticians, while applied statisticians, if they were aware of it at all, thought of it as either the use of the method of least squares or as only a biological process. The term ‘regression’ soon came to be regarded as archaic, often accompanied by a brief explanation of its roots in biology but with no indication of the relevance of those roots to other applications. In 1924 the economic statistician Frederick C Mills could write, ‘The term is now used generally, as indicated above, though the original meaning has no significance in most of its applications.’11 It was therefore a trap waiting for the unwary, who were legion. The most spectacular instance of a statistician falling into the trap was in 1933, when a Northwestern University professor named Horace Secrist unwittingly wrote a whole book on the subject, The triumph of mediocrity in business.12 In over 200 charts and tables, Secrist ‘demonstrated’ what he took to be an important economic phenomenon, one that likely lay at the root of the great depression: a tendency for firms to grow more mediocre over time. Secrist was aware of Galton’s work; he cited it and used Galton’s terminology. The preface even acknowledged ‘helpful criticism’ from such statistical luminaries
as HC Carver (the editor of the Annals of Mathematical Statistics), Raymond Pearl, EB Wilson, AL Bowley, John Wishart and Udny Yule. How thoroughly these statisticians were informed of Secrist’s work is unclear, but there is no evidence that they were successful in alerting him to the magnitude of his folly (or even if they noticed it). Most of the reviews of the book applauded it.13–15 But there was one dramatic exception: in late 1933 Harold Hotelling wrote a devastating review, noting among other things that ‘The seeming convergence is a statistical fallacy, resulting from the method of grouping. These diagrams really prove nothing more than that the ratios in question have a tendency to wander about.’16 Secrist did not understand the criticism, leading Hotelling to reiterate the lesson in a subsequent letter in even plainer language: ‘When in different parts of a book there are passages from which the casual reader may obtain two different ideas of what the book is proving, and when one version of the thesis is interesting but false and the other is true but trivial, it becomes the duty of the reviewer to give warning at least against the false version.’17 One would think that so public a flogging as Secrist received for his blunder would wake up a generation of social scientists to the dangers implicit in this phenomenon, but that did not happen. Textbooks did not change their treatment of the topic, and if there was any increased awareness of it, the signs are hard to find. In the more than two decades between the Secrist–Hotelling exchange in 1933 and the publication in 1956 of a perceptively clear exposition in a textbook by W Allen Wallis and Harry Roberts, I have only encountered the briefest acknowledgements.18 A paper in Psychometrika by RL Thorndike19 is an exception; like Hotelling’s review, Thorndike’s paper was a reaction to blunders in the literature.
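Hotelling’s remark about ‘the method of grouping’ can be illustrated with a small simulation. This is a sketch under assumptions that are mine rather than Secrist’s or Hotelling’s: each hypothetical firm has a fixed underlying level, its observed ratio in each of two years is that level plus independent noise, and firms are grouped into a top quarter by one year’s ratio. Grouping on the first year makes the leaders appear to decline; grouping the same data on the second year makes the leaders appear to have risen; and the overall spread is essentially the same in both years.

```python
import random
from statistics import mean, pvariance

random.seed(0)  # arbitrary seed, used only so that reruns give the same numbers

# Hypothetical model: a stable level per firm plus ratios that "wander about".
n_firms = 5000
level = [random.gauss(0, 1) for _ in range(n_firms)]
year1 = [m + random.gauss(0, 1) for m in level]
year2 = [m + random.gauss(0, 1) for m in level]


def top_quarter(by: list[float]) -> list[int]:
    """Indices of the firms in the top quarter when ranked by `by`."""
    order = sorted(range(n_firms), key=lambda i: by[i], reverse=True)
    return order[: n_firms // 4]


lead1 = top_quarter(year1)  # leaders chosen on the first year
lead2 = top_quarter(year2)  # leaders chosen on the second year

lead1_then = mean([year1[i] for i in lead1])
lead1_later = mean([year2[i] for i in lead1])
lead2_before = mean([year1[i] for i in lead2])
lead2_then = mean([year2[i] for i in lead2])

print("grouped on year 1:", lead1_then, "->", lead1_later)   # apparent decline
print("grouped on year 2:", lead2_before, "->", lead2_then)  # apparent rise
print("overall variance:", pvariance(year1), pvariance(year2))
```

The apparent ‘convergence’ and the apparent ‘divergence’ come from the same data and differ only in which year was used to form the groups, which is the sense in which the diagrams prove nothing more than that the ratios wander.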
Thorndike disclaimed originality (‘It is not the purpose of this paper to present any scintillating new statistical ideas’), but he clearly expected that his tutorial would be news to many readers. He mentioned only one offender, a psychologist from the University of Iowa named Crissey, but stated, ‘I select this example without malice – I might have selected any of a number of others’. The more common rule over this two-decade period was that textbooks such as the successive revisions of Yule’s book by MG Kendall kept repeating the earlier material with more recent references and enhanced mathematics. Even after 1956, when (perhaps influenced by Wallis and Roberts) the topic attracted increasing attention, blunders persisted. In 1970 a political economist AO Hirschman (who had presumably not read Hotelling’s review, and was evidently innocent of any awareness of the regression phenomenon) cited Secrist’s book, writing, ‘An early, completely forgotten empirical work with a related theme has the significant title The triumph of mediocrity in business, by Horace Secrist, ... The book contains an elaborate statistical demonstration that, over a period of time, initially high-performing firms will on the average show deterioration while the initial low performers will exhibit improvement.’20 Some writers have known of the problem but still
fallen into the trap (see, for example, Friedman21 for a discussion of two of these). Other researchers who have known of the phenomenon but not understood it have been frightened by the spectre of one type of error into making another: in at least one instance,22 researchers were so worried about the possibility of committing the fallacy that they introduced a correction for ‘regression effects’ where, not only was none needed, the ‘correction’ produced an erroneous result! The recurrence of regression fallacies is testimony to its subtlety, deceptive simplicity, and, I speculate, to the wide use of the word regression to describe least squares fitting of curves, lines, and surfaces. Researchers may err because they believe they know about regression, yet in truth have never fully appreciated how Galton’s concept works. History suggests that this will not change soon. Galton’s achievement remains one of the most attractive triumphs in the history of statistics, but it is one that each generation must learn to appreciate anew, one that seemingly never loses its power to surprise.
References
1. Friedman M. Do old fallacies ever die? Journal of Economic Literature 1992; 30: 2129–32.
2. Stigler SM. The history of statistics. Cambridge, MA: Harvard University Press, 1986.
3. Galton F. Hereditary genius. London: Macmillan, 1869: 64.
4. Galton F. Natural inheritance. London: Macmillan, 1889.
5. Stigler SM. Francis Galton’s account of the invention of correlation. Statistical Science 1989; 4: 73–86.
6. Nesselroade J, Stigler SM, Baltes P. Regression toward the mean and the study of change. Psychological Bulletin 1980; 87: 622–37.
7. Bowley AL. Elements of statistics. London: PS King, 1901: 316–26, and later editions.
8. Yule GU. An introduction to the theory of statistics. London: Charles Griffin, 1911 (and many later editions).
9. Dewey J. Galton’s statistical methods. Publications of the American Statistical Association 1889; 7: 331–34. [Quoted in Stigler, The history of statistics. 1986: 301, and at more length in Stigler, A look backward on the occasion of the centenary of JASA, Journal of the American Statistical Association 1988; 83: 583–87.]
10. Strong TB ed. Lectures on the method of science. Oxford: Clarendon Press, 1906: 106–107.
11. Mills FC. Statistical methods. Applied to economics and business. New York: Henry Holt, 1924: 394.
12. Secrist H. The triumph of mediocrity in business. Evanston, IL: Bureau of Business Research, Northwestern University, 1933.
13. Elder RF. Review of The triumph of mediocrity in business by Secrist H. American Economic Review 1934; 24: 121–22.
14. King WI. Review of The triumph of mediocrity in business by Secrist H. Journal of Political Economy 1934; 42: 398–400.
15. Riegel R. Review of The triumph of mediocrity in business by Secrist H. Annals of the American Academy of Political and Social Science 1933; 170: 178–79.
16. Hotelling H. Review of The triumph of mediocrity in business by Secrist H. Journal of the American Statistical Association 1933; 28: 463–65.
17. Secrist H, Hotelling H, Rorty MC. Open letters I. Journal of the American Statistical Association 1934; 29: 196–200; see Stigler SM. The history of statistics in 1933. Statistical Science 1996; 11: 244–52, for a full account of Secrist and Hotelling.
18. Wallis WA, Roberts H. Statistics: a new approach. Glencoe, IL: Free Press, 1956: 258–63.
19. Thorndike RL. Regression fallacies in the matched groups experiment. Psychometrika 1942; 7: 85–102.
20. Hirschman AO. Exit, voice, and loyalty: responses to decline in firms, organizations, and states. Cambridge, MA: Harvard University Press, 1970.
21. Friedman M. Do old fallacies ever die? Journal of Economic Literature 1992; 30: 2129–32.
22. Stigler SM. Psychological functions and regression effect. Science 1979; 206: 1430.
81 Karl Pearson and Statistics: The Social Origins of Scientific Innovation Bernard J. Norton
Source: Social Studies of Science, 8 (1978): 3–34.
Karl Pearson (1857–1936) is widely regarded as the founder of the modern discipline of statistics, and is also famous as a philosopher of science, as a writer on social Darwinism and as a leading mover to install eugenics as the key social science.1 He offers the prospect of a profitable study of the relations which may hold between a man’s scientific work on the one hand and his social and philosophical views on the other – and between both of these and the historical ‘forces’ of his time. It is good to begin by recalling some leading aspects of Pearson’s life and career. He was the son of William and Fanny Pearson. William was a self-made man who had risen from a rural background to become a successful London barrister: Fanny was the daughter of a ship’s captain and owner. In his youth Pearson moved steadily through the educational channels then available to the professional middle classes, going from University College School, via a crammer’s, to King’s College Cambridge where, in 1879, he was third wrangler in the mathematics tripos. In the following year he was awarded a college fellowship, which gave him six years of financial independence. Pearson undertook postgraduate studies in the universities of Heidelberg and Berlin, and later, whilst ostensibly preparing for a legal career, wrote and lectured on German history and on the ‘advanced’ topics of his day – anarchy, socialism, sex, women’s rights, and so on. This radical scholarship was not staunched by his appointment to the chair of applied mathematics and mechanics at University College London in 1884, being in fact supplemented by work in the history and philosophy of science.2 ‘Non-scientific’ writing, interestingly, ceased only
after Pearson’s meeting with W.F.R. Weldon (1860–1906), University College’s professor of zoology, who, on his appointment in 1890, was seeking to inject the then new statistical techniques of Francis Galton into what he (Weldon) had come to regard as the moribund field of evolutionary biology.3 Weldon needed mathematical assistance if he was to succeed, and it was perhaps natural that he should turn to Pearson, as they were colleagues in the cause of university reform.4 Pearson gave more than a little assistance, and from 1893 onwards, began to produce memoir after memoir on the ‘mathematical theory of evolution’, published at first in the mathematical volumes of the Philosophical Transactions of the Royal Society. These memoirs were, in fact, exemplars of a new discipline of biometry, and Pearson’s contributions to biometry over the next fifteen years were to yield developments in statistical theory which Churchill Eisenhart sees as having ‘firmly established statistics as a discipline in its own right’.5 These developments in theory were sustained by institutional moves: in 1901 Pearson and Weldon founded Biometrika, and, on Galton’s demise in 1911, Pearson became the first Galton professor of eugenics at University College London, taking a chair established in that year with funds left by Galton in his will. By 1911, Pearson was already director of a ‘Biometric Laboratory’ within the applied mathematics department at University College, and also director of the ‘Galton Laboratory for National Eugenics’, which had been set up, with Galton’s assistance, in 1906. Now he could combine the two into a Department of Applied Statistics – the first such department.6 The Biometric Laboratory developed statistical methods in a biological context, and the Eugenics Laboratory applied these in work held to show the high dominance of nature over nurture in human affairs.
The two put out a range of publications: Biometrika itself, a range of biometric and eugenic memoirs, tracts on issues of the ‘Day and Fray’, several ‘Studies in National Deterioration’, and, from 1926 onwards, the Annals of Eugenics, now reborn as the Annals of Human Genetics. For many years Pearson’s department was England’s premier source of statistical tuition, attracting students later to achieve fame and posts of importance, and producing publications that were to affect significantly the thought of biologists, psychologists, sociologists and statisticians. Both G. Udny Yule and (looking to a later period) Jerzy Neyman were intimately associated with the department at various times.7 Certainly, in Pearson’s time, statistics was always associated with eugenics, and, more generally, was strongly promoted as a mathematical methodology that was capable of elevating several disciplines – for instance, psychology, anthropology, sociology and craniometry – into truly scientific ones. To the end of his tenure in 1930, Pearson emphasized the need to construct a research institute where a ‘novel calculus could be applied to problems concerning living forms’.8 On retirement, Pearson saw his department divided into a statistics department under E.S. Pearson, and a department of eugenics under R.A. Fisher.
Interestingly, in 1937 there was set up a Weldon chair of biometry, funded by money bequeathed by Weldon’s widow: the first incumbent was to be J.B.S. Haldane. Putting aside the fascinating issues of funding and personnel involved in Pearson’s development of the discipline of statistics, we should now be able to discern a number of clear and important historical problems. One wonders why it is that Pearson should take to evolutionary biology, to biometry, some fifteen years after his graduation as a mathematician. Similarly, one wonders why this biological work, this biometry, should have led to major developments in statistical theory. Then, one wonders how Pearson’s statistics related to his work in the philosophy of science and eugenics – and, indeed, why he should have promoted statistics as a universal methodology for the human sciences. In this paper I will attempt to develop a thesis of the following sort. Pearson entered willingly into biometry when presented with the opportunity by Weldon, not because of Weldon’s exceptional charm or because Pearson was short of problems of his own, but because by the time that he met with Weldon, Pearson had independently developed a pattern of social, philosophical and political thought which disposed him to find Weldon’s programme of mathematical biology one of the greatest possible significance. Before meeting with Weldon, I shall argue, Pearson had grown into a social Darwinist anxious to provide his particular form of Darwinism with a proper scientific basis, and to show that Darwin’s ideas and socialism were complementary, and not opposed, as had been maintained by several leading thinkers of the nineteenth century. Biometry offered him the chance of pursuing these ends. 
Moreover, I shall argue, Pearson’s conception of ‘properly scientific’ (as articulated in his philosophical writings) was one that made it probable that the development of biometry, should it be at all forthcoming, would yield a harvest of statistical methods. Statistics, thus formed, embodied the central tenets of Pearson’s philosophy of science, and, as such, was to be universally recommended. It was to be applied to eugenics in particular, for eugenic thought was a component of Pearson’s social Darwinism before his meeting with Weldon. Pearson’s Darwinism and his philosophy of science, I shall argue, were integrated components in a world view constructed by Pearson in early manhood, when he was attempting to come to terms with the social and intellectual problems posed to him by his life within late-Victorian society. Thus, I shall argue, we must see Pearson’s work in statistics as the outcome of his attempts to deal with his social and intellectual milieu. The thesis is here developed in several sections, and it will perhaps be useful to give a preliminary account of the ordering of these sections and of their contents. I commence with a section entitled ‘Biometry and Statistics’. Here, after providing social and intellectual background to the biometric movement, I attempt to show something of the way in which biometric problems led to
the creation of the statistical ideas for which Pearson is famous and which were to form the core of the tuition offered within his biometric laboratory and his department of applied statistics. At this stage, something of the relationship between the distinctive philosophy of science developed by Pearson before his meeting with Weldon and his subsequent biometric and statistical endeavours should start to become apparent. We should be able to see by the end of this section that the form taken by biometry, and its role as the midwife of statistics may largely be understood via its relations with the philosophical views formed by Pearson before he took to biometry. At this stage too, Pearson’s espousal of statistics as a universal methodology should become comprehensible. The second section, ‘Science, Socialism and Social Darwinism’, addresses the further topic of why it was that Pearson was prepared to be interested in biology when approached by Weldon. It is one thing, after all, to explain (in the manner of Section 1) the particular form taken by biometry, and to exhibit this form as a cause of biometry’s having led to statistics. It is another, distinct task to explain why Pearson should have been prepared to enter into biological work. At the time it was not a recognized or honoured path for the mathematician and seems to have done little for Pearson’s career prospects – as, for example, when he applied without success for the Savilian Chair at Oxford in 1897. The line I take in this second section is that of denying that Pearson was ever primarily interested in biology in its own right. I shall suggest rather that by the time of his meeting with Weldon, Pearson was already an established social Darwinian – that is to say, one who supposed that a scientific guide to human affairs could be obtained from the philosophy of Darwin, suitably interpreted. 
Pearson, I will show, entered into biometry, into evolutionary biology, not only with a view to giving an exemplar of a truly scientific biology, but also with the aim of providing his social Darwinism with suitable underpinnings; he also hoped to show that Darwinism enjoined a move to state socialism, rather than to the laissez-faire capitalism recommended by earlier writers on social Darwinism. At this stage too, we shall see that before meeting Weldon, Pearson’s thought already had a significant eugenic component. In a third section, entitled ‘Scenes from a Victorian Life’, I attempt to trace the development of the patterns of thought which, I claim, predisposed Pearson to take to biometry. Here I will discuss his early days in Cambridge, Heidelberg and London, tracing the incidents and problems thrust upon him by the conditions of his life; I will show how his responses to these led him to the ‘primed’ condition that disposed him to respond so favourably when approached by Weldon, and thus started the major enterprise of his life – the building up of a biometric school of statistics and social biology. Naturally, the explanations I offer have their difficulties, and perhaps foremost amongst these is that of explaining the particular pattern of Pearson’s response to the stimuli of his early life. After all, in human affairs, the same set of stimuli does not always call forth the same response: here I explore the
possibility of explaining Pearson’s making the sort of response that he did in terms of the natural ‘interests’ of persons occupying his sort of social role in later Victorian society. Such a strategy has severe difficulties and these are finally made very clear.
1. Biometry and Statistics
(a) General Background
Biometry was a construct of England of the late 1890s and, to a degree to be determined, reflected its circumstances, some of which were as follows. In ‘scientific’ England, in the home of Darwin, relatively little work had been done on the mechanism of evolution – on the physiology of heredity and variation and the action of natural selection, for example.9 Academic biologists, by and large, had tended to devote their energies to the establishment of the historical evolutionary relationships connecting different groups in the plant and animal kingdoms. Statistics, insofar as it was an institutionalized concern, was basically non-mathematical, despite the existence of good work by Venn, Marshall, Edgeworth and others.10 British social thought of the period contained several streams which we shall see to have been relevant to the development of Pearson’s statistical work. The 1880s saw the onset of various types of socialist thought.11 In 1881 Henry George came to England: in the following year Hyndman set up the Social Democratic Federation, and, in 1883, the Fabian Society was inaugurated. All of this was played out against a growing recognition of the rottenness of urban England. 1883 saw the publication of The Bitter Cry of Outcast London, revealing the conditions of the sub-proletariat, who were to feature in Charles Booth’s Life and Labour of the People in London as the ‘very poor’. 1890 saw the appearance of William Booth’s In Darkest England and the Way Out. 1884, 1886 and 1887 saw large civil disturbances, deeply worrying to the English middle classes. 
At about the same period we find Bradlaugh making a reputation on the strength of atheism, Besant facing prosecution for issuing a tract on birth control, and good popular audiences for the lay sermons of scientific populists like Tyndall, Clifford and Huxley.12 Social Darwinism was a popular genre of thought, with Darwin’s ideas being adapted in many directions to suit the preference of the adaptor.13 Some thinkers still followed Spencer in seeing Darwin’s work as underpinning a social philosophy of individualism and competition, but others (as we shall see) now read a more collectivist message from the pages of the Origin of Species. T.H. Huxley, typically, threw doubt on the value of any such process of extrapolation from nature to man.14 In the 1890s, Francis Galton was one of Britain’s leading ‘men of science’. As several authors have pointed out, he was a man motivated by strong
eugenic views, a man whose attempts to understand human heredity were inspired by the hope of showing the dominance of nature over nurture; and this, in turn, led him to uncover certain crucial statistical notions – notably those of a distribution of variations, of correlation and of regression. Before 1900, Galton was able to attract only a small following for eugenics, which remained more of a catalyst to research than a social movement. But, as several authors have noted, the events of the Boer war, coming as they did in a period occupied with a ‘quest for national efficiency’, were to pave the way for a strong popular interest in eugenics in the first decade of the twentieth century.15 As early as 1913, the Daily Sketch was splashing the birth of Eugenette Bolce, Britain’s – indeed, Hampstead’s – first ‘eugenic baby’.16
(b) Intellectual Structures
Let us now pass from the background to biometry to the subject itself. Statements of its aims were common in the literature, but it may conveniently be regarded as a discipline which applied mathematics to the study of the variations found among the members of large populations, including human populations. Perhaps the standard statement of biometric problems is one due to Weldon, published first in 1893:
The problem of animal evolution is essentially a statistical problem: that before we can properly estimate the changes at present going on in a race or species we must know accurately (a) the percentage of animals which exhibit a given amount of abnormality with regard to a particular character; (b) the degree of abnormality of other organs that accompanies a given abnormality of one; (c) the difference between the death rate per cent in animals of different degrees of abnormality with respect to any organ; (d) the abnormality of offspring in terms of the abnormality of parents and vice-versa. These are all questions of arithmetic; and when we know the numerical answers to these questions for a number of species, we shall know the direction and rate of change in these species at the present day – a knowledge which is the only legitimate basis for speculations as to their past history and future fate.17
The statistical developments which the pursuit of these and related biometric problems led Pearson to were nicely summarized by the sociologist S.A. Stouffer in a paper which conveys something of Pearson’s personal magnetism – one, it should be said, that could attract or repel, but was a strong force in either case.18
I wish I could communicate to you, and especially to those of you who are just now beginning your professional careers in a world of statistics incredibly more sophisticated than that of Karl Pearson’s day, something of the thrill in meeting in person and studying under a man of Pearson’s immense reputation. Author of the Grammar of Science; perfector of simple linear
correlations; inventor of multiple and partial correlation, of curvilinear correlation, of tetrachoric and biserial correlation; discoverer of the χ² function for summarizing multinomial data with magnificent simplicity; builder of a beautiful system of frequency curves derived from a single differential equation which in turn harked back to the hypergeometric series; founder of Biometrika and author or co-author of a prolific literature applying these new statistics to biological and sociological data – Karl Pearson was a hero of Asgard to an American boy vouchsafed a visit to the home of the gods. Indeed, Pearson was Thor himself – for the thunderbolts with which he attacked unsparingly those who dared oppose him were echoing and reechoing.
Why, one asks, did the study of biology, albeit of mathematical biology, lead to such results? Certainly, they are not the inexorable consequence of the successful application of mathematics to evolutionary biology, as readers of D’Arcy Thompson’s On Growth and Form will appreciate.19 The answer, I wish to suggest, resides in the circumstance that, for Pearson, biometry was a branch of biology which stressed very heavily the importance of exact measurement and exact description, without theory, of the observable phenomena of evolutionary biology. To see this point it is useful to consider a particular example, namely that of Pearson’s study of heredity which led to the massive developments in the theory of correlation itemized by Stouffer above. As such, heredity is a particularly good choice, for, as Stouffer’s passage indicates, Pearson’s work in statistical theory was focused very strongly upon the theory of correlation; and it would appear that this was no accident, as Pearson’s statement of the aims and goals of statistics ran as follows:
The purpose of the mathematical theory of statistics is to deal with the relationship between 2 or more variable quantities without assuming that one is a single-valued mathematical function of the rest. The statistician does not think a certain x will produce a single-valued y; not a causative relation but a correlation. The relationship between x and y will be somewhere within a zone and we have to work out the probability that the point (x,y) will lie in different parts of that zone. The physicist is limited and shrinks the zone into a line. Our treatment will fit all the vagueness of biology, sociology, etc. A very wide science.20
Galton had developed the notions of correlation and regression whilst studying heredity in man, but in doing so, he always linked his statistical investigations with exercises in theorizing about the physiology of heredity – about the underlying biological mechanisms that might be responsible for the patterns of correlation and regression which he observed.21 Pearson had absolutely no time for such a combined approach. Science, for him, was the stern business of observation and measurement, and he stressed heavily what is now termed ‘operational definition’. The thrust of his approach may be gauged from the following Pearsonian definition of the problem of heredity.
Heredity. Given any organ in a parent and the same or any other organ in its offspring, the mathematical measure of heredity is the correlation of these organs for pairs of parent and offspring ... The word organ here must be taken to include any characteristic which can be quantitatively measured.22
Pearson’s goal was a phenomenal theory of heredity lacking any theoretical mediation (such as Galton’s ideas on hereditary particles). Given his chosen mathematical measure of heredity, it is unsurprising that biometry should have led to the developments in theory mentioned above. Let us take a particular example – namely, Pearson’s development of the theory of multivariate normal correlation. This was first presented in a memoir of 1896 in which he investigated contemporary claims that a relaxation of natural selection would put evolution into reverse.23 This, of course, was a view that could be supported by citing Galton’s observation that sons regressed linearly upon fathers in respect of stature with a coefficient of regression of about one third. This suggested that if an ‘improved’ population deviating from an original population mean stature by z inches was allowed to reproduce without the operation of selection, then successive generations of posterity would show z/3, z/9, z/27 inches of deviation, and so on. Pearson was anxious to combat this view, and while I prefer to discuss his motivation for so doing at a later point in the paper, it is worth pointing out that even at this early stage the social and eugenic side of biometry was present in Pearson’s published works.24 For, while he treats this problem of regression quite generally, he does make it clear that the human situation is of most concern. Galton, of course, was familiar with the bivariate normal distribution – for that, in good approximation, is the distribution followed by parental and filial statures taken jointly.25 Pearson now, in an attempt to construct a model allowing for the influence of ancestry more distant than the immediate parentage, developed an expression for the joint distribution of n normal variates – an expression, that is, for the multivariate normal correlation surface. 
He hoped that it would transpire that the values of the various correlation coefficients connecting different degrees of ancestry would be such as to yield multiple regression equations which indicate that when a line of ancestry had been long selected (that is, if the grandfather and the great grandfather and so on had been exceptional as well as the immediate parentage), then regression of the sort observed by Galton among the general population would no longer occur. This, indeed, was the start of Pearson’s work on the ‘law of ancestral heredity’, which deserves separate treatment.26 All that matters for the moment is that the very significant step of developing the theory of multivariate normal correlation arose from a concern with a biological problem and from a determination to treat the problem in a particular way. Interestingly, in the same paper Pearson showed that the best value of the correlation coefficient (ρ) of a bivariate normal distribution is given by the formula now said to give the ‘sample product moment coefficient of correlation’.
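The arithmetic behind the regression argument, and the product-moment formula mentioned above, can be illustrated with a short sketch. It is not drawn from Pearson's memoir: the function names and sample values are assumptions made purely for illustration, and the coefficient of one third is simply the figure attributed to Galton in the text.

```python
import math


def regression_deviations(z, coefficient=1 / 3, generations=3):
    """Deviation from the original population mean expected in each
    successive unselected generation, assuming (as in the argument the
    text describes) that every generation regresses by the same factor."""
    return [z * coefficient ** k for k in range(1, generations + 1)]


def product_moment_r(xs, ys):
    """Sample product-moment correlation coefficient of paired values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples of at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

With a hypothetical starting deviation of z = 9 inches, `regression_deviations(9.0)` reproduces the z/3, z/9, z/27 sequence of the argument Pearson wished to combat, and `product_moment_r` applied to paired parent and offspring measurements gives what Pearson's definition treats as the mathematical measure of heredity.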
We can see therefore that Pearson’s massive developments of the statistical theory of correlation, the branch of his work that he invested with the highest significance, originated in his theory-free approach to heredity. He wished to make probabilistic predictions about the outcome of a line of ancestry without the necessity of discussing underlying mechanisms of heredity. This was quite out of step with contemporary biological practice, which was, if anything, a great deal more interested in getting to grips with the underlying physiology of heredity than in the sheer business of prediction. But, said Pearson, on the eve of the rediscovery of Mendel’s ideas, the would-be physiologists were like planetary theorists rushing to prescribe a law of attraction for planets, the very orbital forms of which they have not first ascertained.27
It was in this way that the advent of biometry led to developments in statistical theory – a circumstance, of course, that is quite consistent with the mathematics, once embarked upon, ‘taking up a life of its own’: issues like those of the sampling distribution of the correlation coefficient then ‘arose naturally’ and had to be dealt with. But the point remains that the search for a new mathematical science of heredity, for a science of a particularly austere sort, led to developments in statistical theory. Correlation looms large in Pearson’s work, and this should not surprise us, having seen his definition of the purposes of statistics. But, as Stouffer showed, Pearson’s work was not exhausted by his labours in the field of correlation. Other aspects of his work also arose in a biometric context, and it is not too much to say that they reflect an approach to science with a massive emphasis on the production of mathematical ways of describing observable phenomena, and on ways of checking up on the goodness of the description. Thus, for example, Pearson’s first biometric paper was devoted to developing a method for deciding whether a particular asymmetrical frequency curve found by Weldon when sampling crabs could be resolved as the sum of two normal distributions.28 His second paper developed the series of Pearson curves as a way of describing non-symmetrical and unresolvable distributions of (biological) data.29 And, generally, if the correlational part of Pearson’s work stemmed from a desire to find theory-free connections between different sets of data, then the aim in this other part of his work seems to have been to find ways of accurately describing any given set of data – notably by fitting a curve to it. 
Not all of Pearson’s early statistical developments can be seen as the direct outcome of attempts to deal with specific biological problems, but they can, I think, be reasonably seen as more general developments jibing with the aims for biometry (and, more generally, for science) noted already in Pearson’s approach. The chi-squared goodness of fit test, for example, developed in 1900, is surely a good instance.30 It is not that if we know Pearson’s aims for science, his insistence on mathematical representation of the phenomena as the major goal, then
we are led to the test. That is where his genius came into play. Rather, it is that if we understand these aims and goals we can see the attraction, for him, in pursuing such a mathematical investigation.
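For readers unfamiliar with the chi-squared goodness of fit test mentioned above, the following minimal sketch shows the statistic in its standard textbook form: the sum, over categories, of the squared difference between observed and expected counts divided by the expected count. The function name and the counts in the usage note are hypothetical and are not taken from Pearson's 1900 paper.

```python
def chi_squared_statistic(observed, expected):
    """Chi-squared goodness-of-fit statistic: sum of (O - E)^2 / E over
    the categories, where O are observed counts and E the counts that
    the fitted description predicts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

For hypothetical observed counts of 18, 22 and 20 against expected counts of 20 in each class, the statistic is (4 + 4 + 0) / 20 = 0.4. A small value of this kind indicates that the description fits the data closely, which is the 'checking up on the goodness of the description' that the text identifies as characteristic of Pearson's approach.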
(c) Questions of Method
The remarks just made about the methodological style of biometry may be supported by going to texts, to Pearson’s methodological writings which were largely completed before his entry into biometry. They were most widely publicized in his Grammar of Science, first published in 1892.31 Given the aims and goals of biometry at the level of methodology we can, I hope, see why and how biometry led on to statistics. What I wish to suggest now is that it is no surprise that biometry had these aims and goals, for they came directly out of Pearson’s already formed methodological ideas. These, interestingly, were ones that he could develop and enhance as he developed his statistical thought. In the three editions of the Grammar (1892, 1900, 1911) we find a philosophy of science which resembles some of the views of the later Logical Positivist school of philosophy. In a doctoral thesis Chauncey Riddle has discerned three main components to Pearson’s epistemological writings, namely ‘empiricism, a Kantian emphasis on the role of the mind in organising and interpreting sensation, and a Cartesian faith in mathematics as the key to organised scientific thought’.32 The Grammar, Riddle notes, is ‘largely an attempt to impress the ideas of Mach upon the English speaking world’. This seems entirely correct; Pearson was an instrumentalist and a sensationalist, a man who denied the possibility of getting to grips with the Ding an sich and who expressly ruled out the possibility of a fruitful metaphysics. Metaphysical speculation, he in effect said, was meaningless. Objects, in this philosophy, were mental constructs out of sense data, and what so fascinates one about this aspect of Pearson’s thought is his Kantian emphasis on the possibly active power of the mind in creating experience. 
For he wrote that it may be the perceptive faculty itself, which, without being directly conscious of it, contributes the ordered sequence in time and space to our sense impressions. The routine of perceptions may be due to the recipient and not characteristic of the material.33
Any connection, through experience, between the self and the real world was therefore highly tenuous, and the only goal for science that made sense was an instrumental one. One could not learn about underlying realities, and the postulation of a realist ontology of atoms, molecules and so on was, in this philosophy, rendered incoherent or redundant. All that science could do was to uncover laws that summarized the flow of phenomena and functioned as instruments of prediction, whose ultimate rationale lay in the enhanced
potential for survival that they offered in the evolutionary struggle. This they did best when they partook of the economy and precision granted by expression in mathematical form. Pearson, clearly, saw biometry as an exemplar of his philosophy put into operation. He saw himself as finally ridding biology of its traditional metaphysical integuments, and took pains to introduce two new chapters on biometry in the second edition of the Grammar. Biometry, clearly, was a natural Pearsonian research programme and, it should also be clear, the statistical methods emanating from it must be seen as the mathematical encapsulation of a philosophy of science Pearson had developed before taking up biometry. Good Cartesian that he was, statistics offered a mathematical way of economically describing the flow of appearances in the non-physical sciences. But, good Kantian that he was in other respects, statistics offered the makings of a philosophical revolution which could be carried forward as his work in biometry and statistics grew. As his contributions to the theory of correlation became more refined, Pearson took to suggesting that this work was philosophically profound. For it showed that the great Kant had been wrong in asserting that determinism was a precondition for human experience.34 What was needed, Pearson wrote, was the kind of semi-determinism that the statistical methods of correlation were adapted to handling. The category under which experience fell was not deterministic causation, but, rather, the looser framework now describable via the mathematical theory of correlation. All scientists, he thundered, should desist from trying to conceptualize the world under the category of causation. Instead, they should adopt the new category implicit in his own work, namely that of correlation, under which all our experience whatever of the links between phenomena can be classified.35
All of the foregoing, I hope, lends support to the thesis that biometry begat statistics on account of its peculiar methodological form. This, by turn, was due to the circumstance that before meeting Weldon, Pearson had worked out a distinctive epistemology and methodology for science. In particular, the Kantian tinge of this philosophy made it possible for Pearson to see his work in correlation as being philosophically significant – a feature which undoubtedly sustained his interest in correlation and all its possible ramifications.
2. Science, Socialism and Social Darwinism
I now come to the problem of why it was that fifteen years after graduation, after a period in which he had done no biological work at all, Pearson should have been prepared to embark upon a new career in biometry when tackled
by Weldon in the early 90s. One response, seemingly that of J.B.S. Haldane, is that Pearson’s decision to move in a biological direction rather than some other, and his founding Biometrika rather than, say, Technometrika, were largely accidents of fate: it just happened to be Weldon, a biologist, who wished for assistance.36 It seems to me that such an approach is implausible, for it undervalues the magnitude of Pearson’s response. This may be gauged from the following bibliographical statistics.37 In the period up to 1894 (that is, Pearson’s ‘pre-biometric’ phase), Pearson published 55 items listed as ‘Literary and Historical’ in the official bibliography of his works; thereafter he published only a further 10 items so classified. The period after 1894 contained 405 items listed as ‘Statistical’. Moreover, the section headed ‘Pure and Applied Mathematics and Physical Science’ contains 4 items in the period to 1894, and 32 thereafter, suggesting a more or less uniform rate of productivity in this area. In short, there does seem to have been an amazing turn-about in Pearson’s pattern of work, as if biometry had the power to absorb the interests that were previously being discharged in the production of literary and historical work. We must ask why this change occurred. It is this turn-about by Pearson that I now address, but not before stressing that it would be wrong to see Weldon’s role as an overly simple one. Weldon may have led Pearson to use and develop methods pioneered by Galton, but we have to explain why it was that Galton’s works did not speak to Pearson unmediated by Weldon. Indeed, things are more difficult even than this, for Pearson had encountered Galton’s Natural Inheritance at the date of its publication in 1889, and had given a talk upon it to a Men and Women’s Club of which he was then a member. (I shall return to this club in the next section of the paper.) 
In his talk, Pearson gave a less than fulsome account of Galton’s methods: Personally I ought to say that there is, in my own opinion, considerable danger in applying the methods of the exact sciences to problems in descriptive science, whether they be problems of heredity or of political economy: the grace and logical economy of the mathematical processes are apt to so fascinate the descriptive scientist that he seeks for sociological hypotheses which fit his mathematical reasoning and this without first ascertaining whether the basis of his hypothesis is as broad as that human life to which the theory is to be applied. I write therefore as a very partial sympathiser with Galton’s methods.38
And, in his copy of Galton’s book, Pearson pencilled in his exasperation with Galton’s style of argument. On page 30, for example, he wrote, testily, that It is merely an analogy without any scientific value as to the how still less to the why.39
Yet, later on, Pearson recalled that he had interpreted the introduction to Natural Inheritance to mean that
there was a category broader than causation, namely correlation, of which causation was only the limit, and that this new conception of correlation brought psychology, anthropology, medicine and sociology in large parts into the field of mathematical treatment. It was Galton who first freed me from the prejudice that sound mathematics could only be applied to natural phenomena under the category of causation.40
Clearly, Weldon acted as a middleman, able to reinterpret Galton’s statistical approach to biological matters in a manner that harmonized with Pearson’s stern methodological criteria. Certainly, in the statement of problems due to Weldon, and in Weldon’s early work, we find none of the analogical reasoning and physiological theorizing that Pearson so disliked in Galton’s work. But, if we accept that some methodological refining was necessary if Pearson was to take the biostatistical bait, so to speak, there remains the issue of explaining his subsequent total devotion to biostatistical inquiry, his new devotion to biological inquiry. One still wishes to know why Pearson was so prepared to dive into biological and evolutionary issues fifteen years after graduating as a mathematician. In the remainder of this section, I shall try to show that by the time of his meeting with Weldon, Pearson was intellectually primed to take up just the investigations that he did. In the final section I shall address the issue of how he came to be so primed. It should be remembered that Pearson’s philosophy of science was also a philosophy of life. It is no surprise to learn this when one recalls that Pearson’s ideal was the freethinker, the abider by the ‘ethic of freethought’. This person would have ‘assimilated the results of the highest scientific and philosophical knowledge of the day’, he would be a ‘sound citizen’, trained in the ‘impersonal judgement’ criteria of the scientific intellectual: he would be able to assess, for example, the views of Weismann on the continuity of the germ plasm and to employ this judgement when considering the right conduct of society towards its ‘anti-social members’. This, Pearson averred, would remain an open question until one knew ‘what science has to tell us on the fundamental problems of inheritance’. 
Quite generally, Pearson wrote, in the Grammar, each one of us is now called upon to give a judgement upon an immense variety of problems, crucial for our social existence. If that judgement confirms measures and conduct tending to the increased welfare of society, then it may be termed a moral, or better, a social judgement. It follows then that to ensure a judgement’s being moral, method and knowledge are essential to its formation. It cannot be too often insisted upon that the formation of a moral judgement — that is one which the individual is reasonably certain will lead to social welfare — does not depend solely on the readiness to sacrifice individual gain or comfort, or on the impulse to act unselfishly: it depends in the first place upon knowledge and method. The first demand of the state upon the individual is not for self sacrifice, but for self improvement.41
And, as one reads further into the pages of the Grammar, it becomes clear that what Pearson means by ‘increased welfare of society’ is not some Benthamite entity, but, rather, something crucially related to ideas like those of ‘national survival and supremacy in the inevitable international competition for existence’. Pearson, indeed, is known to social historians as a key promoter of ‘external’ social Darwinism, of the doctrine that the correct way of envisaging the struggle for existence in human affairs is not at the level of man against man, but at that of nation or race against nation or race, with success going to the best organized group. ‘The growth of national and social life’, Pearson wrote, can give us the most wonderful insight into natural selection, and into the elimination of the unstable on the widest and most impressive scale.42
So, for Pearson, morality was dictated by considerations of what would be of avail to a society in its necessary struggle with other societies, and it is in this context that the defence of socialism appears in Pearson’s work – though, as we shall see, his style of socialism was distinctive. Socialism, by which he meant the ‘tendency for social organisation, always prominent in political communities’, could be justified by its power to bestow success in the ‘intense struggle which is ever waging between society and society’. The lesson of history was the lesson of socialism, and science would ultimately balance ‘the individualistic and socialistic tendencies better than Haeckel and Spencer seem to have done’. Certainly, in the face of the severe struggle, physical and commercial, this fight for land, for food and for mineral wealth between existing nations, we have every need to strengthen by training the partially dormant socialist spirit, if we as a nation are to be among the surviving fit.43
This new pattern of organization, said Pearson, must ‘largely proceed from the state’. Here it is that science relentlessly proclaims: a nation needs not only a few prize individuals; it needs a finely regulated social system — of which the members as a whole respond to each external stress by organized reaction — if it is to survive in the struggle for existence.44
And, quite generally, if we look at his writings produced by the time of his meeting with Weldon, we can see that Pearson’s social and ethical thought had a thoroughgoing Darwinian form. It certainly included commitments to the following propositions.45 (i) History must be understood in terms of the principles of Darwinian evolution. At this stage it may become a science, a biological determinism to rival historical materialism.
(ii) In important practice, the Darwinian struggle for existence in history goes on between group and group, with different social mores waxing and waning in influence according to their power to assist the group in its inexorable struggles. (iii) The ultimate legitimation of morality has to be sought in the biological standard of group survival. Only with a people attuned in their outlook, showing Clifford’s ‘tribal conscience’, could there be built up a society with ‘permanent stability’. (iv) On scientific grounds, therefore, the proper goal for the members of a society is the production of ‘a finely regulated social system’ enabling it best to survive in the struggle and to emerge ‘among the surviving fit’. The best way to achieve this was a move to a form of state socialism, run by talented experts. By now, I suggest, we should be able to see why work in evolutionary biology could so attract Pearson; why he was, so to speak, ‘primed’ to respond to Weldon. We can see too, at least in outline (an outline to be filled in in the next section), why eugenics could so attract him – for eugenics was just the branch of evolutionary biology that could be deployed to maximize the fitness of the socialist state envisaged by Pearson. 
No wonder we find that, in 1894, Pearson could write that it would only be when mathematical work on the ‘relative numerical importance of the several factors of natural selection’ had been completed that it would be time to talk about ‘the antagonism of socialist theory to biological laws’.46 Clearly, he was anticipating the results of work that, he would hold, showed that laissez-faire in reproduction led, not as Spencer had predicted, to sociobiological advance, but, in fact, to the proliferation of the unfit at the expense of the professional middle classes.47 Certainly, this general perspective – namely that Pearson was prepared to work in a biological field when approached by Weldon because his thought was already steeped in Darwinian notions needing, given his philosophy, mathematical development – may be supported powerfully by autobiographical evidence. This takes the form of a letter which Pearson wrote to the Manchester Guardian in 1901, replying to its review of his recent work, National Life from the Standpoint of Science. The latter was a gloomy and aggressive jeremiad which had presented a ‘scientific’ view of the nation as that of an organised whole, kept up to a high pitch of internal efficiency by insuring that its numbers are substantially recruited from the better stocks, and kept up to a high pitch of external efficiency by contest, chiefly by way of war with inferior races, and with equal races by the struggle for trade routes and for the sources of raw material and food supply. This is the natural history view of mankind, and I do not think you can in its main features subvert it.48
In his letter, Pearson took great pains to rebut the Guardian’s charge that he was just another politically ignorant biologist turning his microscope to the world of affairs with the usual disastrous consequences. What grounds, he inquired, did the reviewer have for supposing that I may not have spent more years of my life in historical work than in the study of heredity; that I may not possibly have laboured more carefully at history than at biology; that more of my published work may not deal with the former rather than the latter; nay that even my endeavour to understand something of inheritance and of racial struggle may not have arisen from my attempts to read history aright? May it not be that I am convinced that through the principle of evolution by natural selection combined with inheritance, light alone can be thrown on that maze of wars, movements, national survivals and national decays which passes for history in our current textbooks? Is it not just possible that a man who has thought and worked in the historical field may have turned to the biological field because he has been driven by the force of facts to see that the keynote to the history of man lies in the struggle for food and in the struggle to reproduce, which are the great factors at the base of all biological reasoning with regard to the development of animal life? I ask what reason you have for supposing my history an outgrowth of ‘biological consciousness’ rather than that my interest in heredity has arisen from my conviction of its bearing on historical studies.49
Here, it seems plain, we have the source of Pearson’s preparedness to enter the field of evolutionary biology.
3. Scenes from a Victorian Life
If the foregoing analysis is approximately correct, and it is accepted that Pearson’s readiness to enter into biometry and the power of biometry to produce statistics linked to eugenics can be understood in terms of the social, ethical and epistemological ideas which Pearson had developed prior to his meeting with Weldon, then there remains the task of explaining how it is that he came to have this intellectual disposition. It is to this task that I now turn, and I shall proceed by discussing Pearson’s development during his ‘pre-biometric’ phase – that is, the period in which he was an undergraduate, a fellow of King’s and a London-based intellectual. As the section develops it should be possible to clarify the exact nature of Pearson’s ‘non-scientific’ thought.
(a) Cambridge
The roots of Pearson’s philosophy of science and social Darwinism may first be sought in his undergraduate years at King’s College Cambridge.
Here he met Robert Parker the future law lord,50 Henry Bradshaw the librarian, Macaulay the mathematician and Oscar Browning the historian. Then, as ever, he looked for a few close friends, and was especially close with Parker. Like many undergraduates, Pearson did not enjoy a carefree life. His ‘Commonplace Book’ for 1877, for example, suggests a state of mental turmoil which led him to a piece of self-analysis in which he attempted to clarify his views on religion ‘till I was left with some definite idea of what religious belief I have or whether I have any at all’. His answer was vague and rambling, but showed clearly enough a growing contempt for laissez-faire society and for Christianity.51 At times, he wrote, he could believe in a God, but not when he encountered the poverty of Victorian Britain. Pearson, in short, was a candidate for philosophy (as had been Clifford, Marshall and others before him at similar periods in their development),52 and his writings portray him as searching for a creed, for some secular religion upon which he could focus the religious feelings so common among Victorians. This comes out more clearly in a letter to Parker, where Pearson wrote that since all my religious dogmatic faith fell to the ground, I feel that I can only be happy by adding a mystic ideality to everything, and looking at everything from a religious point of view ... It is this spirit of the ideal which Carlyle tries to cast over everything and which delights me so.53
At this time, Pearson’s non-mathematical reading was chiefly in British empiricist philosophy and in German literature – in Goethe, Herder, Schelling and others. Like Carlyle he was an enthusiast for Wilhelm Meister. In February 1879 he read Berkeley’s works, and at about the same time decided to go to Heidelberg to study philosophy and physics.
(b) Heidelberg
In Heidelberg, doubtlessly, Pearson hoped to find a new philosophy, a new creed that would satisfy his need for something in which to believe. We can garner something of his mood and thoughts from his letters, but also from a book, the New Werther, which Pearson published under the pen-name ‘Loki’. The Werther, Pearson was to claim, was written in a deliberately ‘gush style’, but nevertheless it tells a great deal about Pearson’s time in Germany – for, judging from Pearson’s other attempts at fiction, it seems improbable that he had the skill to create a character whose thoughts strayed too far from his own. In the pages of the Werther we learn a great deal about his unhappiness in Cambridge, his decision to turn to Germany – the ‘country of ideas’ – and his love of things German, which was to be reflected in his changing his name from Carl to Karl. In Germany he seems to have developed a mild nature-mysticism and to have kept the company of Raphael Wertheimer, a Jewish law student and radical who
features prominently in the Werther; there he is depicted as introducing Arthur (the autobiographical tragic hero) to socialism, saying of the English that they do not recognise the difference between a French communist, a Russian nihilist, and a German social democrat, but brand them with a common stigma as subverters of society.55
Wertheimer, a social democrat, insisted that We do not wish a revolutionary change in all old laws and customs; we recognise the truths which history has taught, that real change is gradual, and yet also that change is necessary to life. The violence of some persons claiming to be members of the party is due to the ignorant and vicious whom the leaders cannot prevent from joining their banner.56
Clearly, Wertheimer found a convert of sorts in Pearson, who thereafter proclaimed himself a socialist – though, as will become ever more apparent, an elitist state-socialist. This comes out rather clearly in one of the first papers which he wrote after his return from Germany, a short work entitled ‘Anarchy’. In this he wrote with genuine horror of the state of London’s sub-proletariat: Those weak and emaciated beings, weak and feeble as they look, have the power in their millions to throw down the few feet of bricks which guard the arsenals. Those three million could sweep a few thousand police and soldiers before them as the wind blows a handful of chaff.57
He was fearful lest there be an uncontrolled anarchic revolution from below, something he took to be the natural outcome of existing conditions. In its place, Pearson recommended a gradual ‘revolution’ from above, leading to a form of society with ‘forms and grades’ and with power based not on a financial hierarchy but on a hierarchy of ‘power intellectual’ which alone would determine whether the life-calling of a man is to scavenge the streets, or to guide a nation.58
How the transfer was to be effected was unstated, but, Pearson insisted, the new order would need a new religion which would form a real bond ‘between class and class, between man and man solely on the score of their manhood’. Some indication of what this might mean was given in a further paper of the same year, on ‘Political Economy for the Proletariat’, which attacked traditional political economy and compared the ‘individualism of Bentham’ unfavourably with the ‘socialism of Fichte’. Pearson, clearly, was attached to some of Fichte’s ideas, and wrote that in the new order, for which he (Pearson) hoped, the state would be charged with the duty of ‘the improvement of mankind’, and that in the science that would treat of the organization of the state.
All the ordinary categories of political economy — capital, labour, land, trade and so forth — must be judged from this new standpoint, and I fear not a few of the results attained will be found to differ from the mammon-worshipping doctrines of Ricardo and his disciples.59
The nearest extant approach to what he had in mind, wrote Pearson, again reflecting his German experience, was to be found in the work of the Katheder-Socialisten who, under Schmoller, helped frame Bismarck’s social policies. In particular, Pearson singled out the ideas of Held and his school, citing their claims and demands with approval: They demand that the economic man must also be considered as a member of a state organism, they reject the suggestion of an unusually valid natural law, and demand that each existing judicial system must in whole and part be considered critically as a factor of the greatest importance in the formation of economic relations ...60
It seems therefore that in Germany Pearson picked up what might, somewhat anachronistically, be described as a Spenglerian view of the state, one stressing the desirability of an organic unity with hierarchical ranks and grades bound by feelings of common purpose. Shortly we shall see how this political line of thought developed whilst in London in the period prior to his meeting with Weldon. But, for the present, I would like to pause briefly to trace the early development of Pearson’s epistemology and philosophy of science at this period, thinking particularly of his interesting neo-Kantian and instrumentalist perspective upon knowledge. Returning to Heidelberg, we find that Pearson studied philosophy under Kuno Fischer, but read more widely than was required. By May 1879 he was reading Kant’s Metaphysics of Ethics as a follow up to the Critique of Pure Reason, which he had meticulously studied whilst in Cambridge. By 25 May, Pearson felt able to write to Parker, saying more about his work and rejecting the possibility of a metaphysical foundation for ethical judgement. You are certainly right about the foundation of religion not being the pure reason, this Kant I think has conclusively proved in the Kritik der reinen Vernunft. In the Metaphysics of Ethics and the Practical Reason, he attempts to base religion on morality, or a belief in God follows from the necessity of moral order in the Universe. They seem both to me thoroughly unsatisfactory. He even contradicts himself by founding his moral system on a moral sense (conscience, which is innate and universal), which he asserts dogmatically to exist. Is this innate sense the same in the cannibal and the educated man? It is not empirical, according to Kant, and there is no question of its development. If then we can’t found religion on morality we are left alone with the emotions, the feeling of want, religiosity, and quite enough too.61
Perhaps the sequel to this was not surprising. By 20 June Pearson was writing to Parker, telling of a dinner at which he had told Fischer that philosophy was a vain pursuit, and that he (Pearson) ‘felt at a lower ebb of despair with regard to the truth than I have ever felt before in my life’. And, as for truth, it was a dubious affair. Let us consider whether it can be a law of nature. Does anybody know what we mean by this expression? The more I have studied science and physics, the more I see that we know nothing of what we call nature — of electricity, light and attraction we know nothing. What is the sense of calling light a vibration? Or that gravity is a force between particles of matter varying as the inverse square of the distance? ... The term was invented some hundred years ago to describe a phenomenon which it attempts to explain. ... Besides, the whole tendency of modern philosophy since Kant is to assure us that the so-called laws of nature exist in our minds, are a logical necessity of our minds which impress them on the things themselves for they can only observe things in such relations. Fancy truth a function of that absurd humbug man’s mind!62
Faced with such difficulties, Pearson decided temporarily to abandon the study of philosophy, his reason having been shattered ‘by the purely negative results’ found in the works of the philosophers. Briefly thereafter he toyed with the idea of going to Berlin, to work in natural science with Kirchhoff or Helmholtz; but, by October, Pearson had decided to throw over both physics and philosophy and reluctantly to submit to a career at the bar.63 As we have seen, he was to return to philosophy, and would build upon the base, small though it was, that was constructed in Heidelberg – namely his conviction that science described but did not explain; his views on the impossibility of knowing the thing in itself; and his addiction to some of Kant’s ideas. Unsurprisingly, Pearson did not favour Kant’s metaphysical approach to ethics. We have seen this above, but the full force of his distaste came out in a review of 1883, of one of Fischer’s books. In the review Pearson wrote kindly of Kant’s Critique of Pure Reason but harshly of his ethics. And, thinking doubtlessly of the Hegelian revival in Oxford, he noted that there was in the ethical writings an entire change of front, the door is to be thrown open to the whole body of emotionalists, mystics and metaphysical idealists.64
Clearly, Pearson was open to a non-metaphysical account of ethics, and, as we have seen, he was to find – or, more accurately, to suppose that he had found – such an account in his Darwinian explorations. Thus, it might be said that once we understand Pearson’s intellectual development in Heidelberg we are well on the way to understanding how he came to that intellectual state which made him a candidate for the sort of work in biology that would produce statistics and would ally itself with eugenics. It remains now to
consider the remainder of the 1880s, which Pearson spent in London, at first as a lawyer, and later as professor of applied mathematics at University College London.
(c) London and the Men and Women’s Club
Back in London, Pearson’s thought developed steadily. On the philosophical side we find that in October 1884 publishers asked him to edit and complete the late W.K. Clifford’s Commonsense of the Exact Sciences, which he was able to publish in 1885. On the social and ethical side he was able to publish a book of collected essays, the Ethic of Freethought, in 1887. In these writings two trends may be discerned. In the Commonsense, Pearson developed the epistemological ideas which had begun to crystallize whilst in Germany, ideas bringing him closer to the Grammar of Science. While preparing the Commonsense Pearson read the works of Ernst Mach, and when contributing his own ideas on the laws of motion was delighted to be able to record that these views seemed to have ‘the weighty authority of Professor Mach of Prag’. By 1885, it would seem, the creation of his philosophy of science was almost complete.65 Pearson’s social, political and ethical thought underwent a more significant development, for we find an increasing introduction of ‘Darwinian’ ideas when discussing social organization and moral principles. This, perhaps, is unsurprising, for Darwin’s ideas were then on everyone’s tongues. It is hard to say precisely where Pearson’s own style of Darwinism came from, but we do know him to have been a keen student of the writings of Clifford and there is much in Clifford’s essay on ‘The Scientific Basis of Morals’ that found its way into Pearson’s thought. Certainly, he deployed Clifford’s idea of a ‘tribal conscience’.66 The drift to Darwinism is clear enough in the essays that make up the Ethic of Freethought. By 1885, in fact, most of his Darwinian ideas seem to have been formed, and may be discerned in his essay of that year on ‘The Woman’s Question’. 
Here, when discussing women’s rights, he insisted that a decision about the woman’s proper social role should be consequent upon an analysis of the effects of any proposed role on national fitness. We have first to settle what is the physical capacity of woman, what would be the effect of her emancipation on her function of race-reproduction, before we can talk about her ‘rights’, which are, after all, only a vague description of what may be the fittest position for her, the sphere of her maximum usefulness in the developed society of the future. The higher education of women may connote a general intellectual progress for the community, or, on the other hand, a physical degradation of the race, owing to prolonged study having ill effects on woman’s child-bearing efficiency.67
And, by 1887, judging from a paper on ‘Socialism and Sex’, the Darwinian perspective seems to have become total. In this essay we find Pearson outlining all of the theses discussed in Section 2 above, insisting, for example, that the moral or good action is that which tends in the direction of growth of a particular society at a particular time.
that Herder attempted a philosophy of history on the basis of metaphysics and naturally failed. The philosophy of history is only possible since Darwin, and the rationalisation of history by the ‘future Darwin’ will consist in the explanation of human growth by the action of physical and sexualogical laws in varying human institutions.
and that we are students of history, not because we are socialists, but socialists because we have studied history.68
The style of socialism which he advocated was taking clearer shape, but along the lines outlined in the paper on ‘Anarchy’ discussed above. In Pearson’s socialist state, in the state whose structures he increasingly supported by Darwinian rhetoric, persons like himself, ‘labourers with the head’ as he called them, would play a preeminent role. This was made quite clear at several points.69 Pearson’s growing interest in and commitment to sociobiological studies was reflected in his formation, along with Parker, of a ‘Men and Women’s Club’. The secretary of the club was Maria Sharpe, his future wife. By looking at some of the activities of the club we shall, I think, see finally and clearly why and how, by the early 1890s, Pearson was able to plunge into biometry and to link it with eugenics. The club was established in 1885, by Pearson, Parker, Elizabeth Cobb (wife of Cobb the MP) and her sisters Maria and Laetitia Sharpe, for the purpose of frank discussion of the relations between men and women. It was a select middle class group, anxious to avoid scandal, whose members were, by and large, just the sort of people one might expect to find joining the new Fabian Society.70 Members, proposed members and guests included Annie Besant, Havelock Ellis, Olive Schreiner, Eleanor Marx and Mrs Wilson the Hampstead anarchist. Mrs Wilson, interestingly, had written to Pearson in the previous year asking him whether he would care to join her, Sidney Webb and others in a reading of Marx’s Capital.71 The thirty-six meetings of the club covered a wide range of topics: prostitution, then an outrageous scandal; the relative sex drives of men and women; and, above all, patterns of sexual relations in contemporary and
defunct societies. In these surroundings Pearson’s interest in the biological basis of national fitness increased, and we find for example that in contemporary writings he referred to the right to bear children as a sacred one, and inquired if, in ‘a better organized society than the present’, it would not be fitting that either the state should have a voice in the matter, or else that a strong public opinion should often intervene? Shall those who are diseased, shall those who are nighest to the brute have the right to reproduce their like? Shall the reckless, the idle, be they poor or wealthy, those who follow mere instinct without reason, be the parents of future generations? ... Out of the law of inherited characteristics spring problems which strike very deeply into the roots of our present social habits.72
By 1889, the Club was coming apart from flagging interest, but Pearson introduced Galton’s Natural Inheritance to a final meeting, criticizing (as we have seen) its methodological structure. But what, perhaps, is of the greatest interest is his conviction, mentioned in Section 1, that the regression observed by Galton in the general population would not hold for long-selected lines. And, said Pearson, in one of the Club’s closing meetings, I am not advocating a return to group or even to close intermarrying, but a far more careful sexual selection on the part of those members of the community who have a large deviation physically or mentally from mediocrity.73
Here, it seems, is laid bare the basis of Pearson’s preparedness to enter biological work. By 1890 several ideas were converging. Pearson had adopted a Darwinian historicism to justify his state socialism, and, as we can see, his interest in national fitness was moving on from issues of organization to issues of biological efficiency: already he was concerned with eugenic problems, as well as the more general issues of evolution. In the period up to 1890, therefore, we can see the emergence of a framework of thought that would make biometry an attractive proposition, which would make it a science likely to produce statistical results which could be prized for their philosophical significance, and which could be used in eugenic investigations. This should be seen as another phase of Pearson’s socialism, with its emphasis on national fitness and the production of a socialist élite class of administrators of the highest quality.
Conclusion
I have depicted a pattern of intellectual growth and change on Pearson’s part, reflecting in various ways the late-Victorian tide of secularism and religious doubt after the advent of Darwin, and concerns for the urban proletariat.
Pearson, one might say, responded in various ways to the conditions of his life. But to say this is only to invite the further question of why he responded in the manner that he did. Why, one wonders, did he not perhaps become a Christian socialist, or, like the respectably born Hyndman, a revolutionary? Why, in philosophy, did he tread the Machian path when others did not? Why should he have become a Darwinian in ethics when Huxley was inveighing against such moves? Possibly some answers may be obtained by studying Pearson’s social position and the natural interests arising from it.74 He was a brilliant intellectual with no investment in land or capital, with friends similarly located in the ‘nouvelle couche sociale’ which Hobsbawm has seen the Fabians as inhabiting.75 Up to a point, therefore, it may be possible to see Pearson’s élitist socialism as a reflection of this position – for, certainly, it was a form of social organization in which he and his circle would play esteemed roles. His sensationalist philosophy might perhaps be similarly interpreted, as one that eliminated the clergy from the sphere of rational influence and entrenched a new class of scientifically trained persons, again like Pearson. The eugenics concerns may perhaps be seen as jibing with the natural interests of such persons, for it gave a biological foundation to their supremacy. In short, we can see that many of Pearson’s ideas appear to be enhancing the esteem of the group with whom he identified. Whether or not such a harmonization can be seen as explaining his espousal of these ideas is, it seems to me, a question that brings us hard against the philosophical difficulties inherent in explaining an individual’s thought in terms of the interests of a group to which he has attached himself. Perhaps it is unwise to take this issue on at this point. It needs separate treatment. Possibly the case of Pearson and statistics could serve as a useful reference in such discussions.
Notes
I would like to acknowledge gratefully financial assistance from the UK SSRC whilst preparing this paper. I would also like to thank Professor E.S. Pearson for permission to use the Pearson papers.
1. For the best biography of Pearson, see E.S. Pearson, Karl Pearson: An Appreciation of Some Aspects of his Life and Work (Cambridge: Cambridge University Press, 1938). For an account of Pearson’s social Darwinism, see Bernard Semmel, ‘Karl Pearson: Socialist and Darwinist’, British Journal of Sociology, Vol. 9 (1958), 111–25. The best account of secondary literature on Pearson is contained in Churchill Eisenhart’s article on Pearson in the Dictionary of Scientific Biography, Vol. 10 (New York: Charles Scribner’s Sons, 1974).
2. The pattern of development of Pearson’s writings may be discerned in G.M. Morant, A Bibliography of the Statistical and Other Writings of Karl Pearson (Cambridge: Cambridge University Press, 1938).
3. The fullest biography of Weldon is Pearson’s paper ‘W.F.R. Weldon, 1860–1906’, Biometrika, Vol. 5 (1906), 1–50.
4. For an account of Pearson’s involvement, see his biography of Weldon, ibid. note 3. See also K. Pearson, The New University for London: A Guide to its History and a Criticism of its Defects (London: T. Fisher Unwin, 1892).
5. Eisenhart, op. cit. note 1, 450.
6. An excellent account of some of the stages involved in the setting up of the department may be had in Lyndsay Farrall, ‘The Origin and Growth of the English Eugenics Movement 1865–1925’ (unpublished PhD thesis, Indiana University, Bloomington, 1970), available from University Microfilms.
7. Good discussions of the students of Pearson’s department are to be found in Farrall, ibid.
8. The impact of Pearson’s methods on psychology, for example, was significant especially in the area of the study of individual differences. See B. Norton, ‘Charles Spearman and the Doctrine of ‘g’: Genesis and Interpretation’, forthcoming in the Journal of the History of the Behavioural Sciences. For the citation, see E.S. Pearson, op. cit. note 1, 119.
9. It should be recalled that, before the 1870s, there was very little biological research done in the English Universities, and that, at Cambridge, for example, experimental work was seriously introduced only after the appointment of Michael Foster to a praelectorship of physiology at Trinity College in 1870. His protégé F.M. Balfour started England’s leading school of evolutionary biology, and as may be seen by inspecting Balfour’s masterful Treatise on Comparative Embryology, 2 Vols. (London: Macmillan, 1880–81), the paradigm of this school was one of phylogenetic morphology.
10. The best general study of the history of statistics is, perhaps, H.M. Walker, Studies in the History of Statistical Method (Baltimore, Md.: Williams and Wilkins, 1929). The generally non-mathematical tenor of institutionalized statistics prior to 1900 may be seen by inspecting the volumes of the Journal of the Royal Statistical Society for that period.
11. For details put in a way relevant to this paper see Bernard Semmel, Imperialism and Social Reform: English Social-Imperial Thought, 1895–1914 (London: Allen and Unwin, 1960).
12. Perhaps the best account of the radical London intelligentsia of the period is to be had in W.S. Smith, The London Heretics 1870–1914 (New York: Dodd Mead and Co., 1968). See also G. Stedman-Jones, Outcast London (Oxford: The Clarendon Press, 1971).
13. For a good discussion of social Darwinism, see G. Himmelfarb, Victorian Minds (London: Weidenfeld and Nicolson, 1968). See, in particular, Chapter 12, ‘Varieties of Social Darwinism’.
14. See T.H. Huxley, ‘Evolution and Ethics’, in Evolution and Ethics (London: Macmillan, 1911), 46–116. This chapter was based on Huxley’s Romanes Lecture for 1893.
15. For an account, see D. MacKenzie, ‘Eugenics in Britain’, Social Studies of Science, Vol. 6 (1976), 499–532.
16. See the Daily Sketch (3 October 1913).
17. W.F.R. Weldon, ‘On Certain Correlated Variations in Carcinus Moenas’, Proceedings of the Royal Society, Series A, Vol. 54 (1893), 329.
18. S.A. Stouffer, ‘Karl Pearson—An Appreciation on the 100th Anniversary of His Birth’, Journal of the American Statistical Association, Vol. 53 (1958), 23–27, esp. 23.
19. See Ruth D’Arcy Thompson, D’Arcy Wentworth Thompson (London: Oxford University Press, 1968), particularly the postscript by P.B. Medawar, ‘D’Arcy Thompson and “Growth and Form”.’
20. E.S. Pearson, op. cit. note 1, 97.
21. See, for example, Galton’s early mathematical speculations on Darwin’s theory of pangenesis, in F. Galton, Hereditary Genius (London: Macmillan, 1869), especially the closing section, ‘General Considerations’.
22. K. Pearson, ‘Mathematical Contributions to the Theory of Evolution III: Regression, Heredity and Panmixia’, Philosophical Transactions of the Royal Society, Series A, Vol. 187 (1896), 253–318; quotation at 259.
23. Ibid. Donald MacKenzie’s accompanying paper, in the same issue of this journal, gives another vivid illustration of the way in which the study of heredity led Pearson into work in correlation: see D. MacKenzie, ‘Statistical Theory and Social Interests: A Case-Study’, Social Studies of Science, Vol. 8 (1978), 35–83.
24. We shall see this point with increasing force as the paper proceeds. But, for the present, note that Pearson’s 1896 paper (op. cit. note 22) is clearly written with the human condition in mind: see, particularly, 306–08.
25. For accounts of Galton’s involvements in statistics, see V. Hilts, ‘Statistics and Social Science’, in R. Giere and R. Westfall (eds), Foundations of Statistical Method in the 19th Century (Bloomington, Ind.: Indiana University Press, 1973), 243–58. See also R.S. Cowan, ‘Francis Galton’s Statistical Ideas: the Influence of Eugenics’, Isis, Vol. 63 (1972), 509–28.
26. For a discussion, see W. Provine, The Origins of Theoretical Population Genetics (Chicago: The University of Chicago Press, 1971), 179–87.
27. K. Pearson, ‘Mathematical Contributions to the Theory of Evolution VIII’, Philosophical Transactions of the Royal Society, Series A, Vol. 195 (1901), 121.
28. K. Pearson, ‘Contributions to the Mathematical Theory of Evolution’, Philosophical Transactions of the Royal Society, Series A, Vol. 185 (1894), 71–110.
29. K. Pearson, ‘Contributions to the Mathematical Theory of Evolution II: Skew Variations in Homogeneous Material’, Philosophical Transactions of the Royal Society, Series A, Vol. 186 (1895), 343–414.
30. K. Pearson, ‘On the Criterion that a Given System of Deviations from the Probable in the Case of a Correlated System of Variables is Such that it can be Reasonably Supposed to have Arisen from Random Sampling’, Philosophical Magazine, Series 5, Vol. 50 (1900), 157–75.
31. K. Pearson, The Grammar of Science (London: Walter Scott, 1892).
32. C. Riddle, ‘Karl Pearson’s Philosophy of Science’ (unpublished PhD dissertation, Columbia University, New York, 1958). Available from University Microfilms. See Abstract.
33. K. Pearson, op. cit. note 31, 128.
34. See Pearson’s chapter on ‘Contingency and Correlation’, in the third edition of his Grammar of Science (London: Adam and Charles Black, 1911).
35. Ibid., 170.
36. See J.B.S. Haldane, Karl Pearson 1857–1957 (London: Biometrika Trustees, 1958), 10.
37. Taken from Morant, op. cit. note 2.
38. Pearson papers: the text of Pearson’s talk ‘On the Laws of Inheritance according to Galton’ is in Cabinet 5, drawer 6. The Pearson papers are kept at the archive room, University College London.
39. This book is kept in the Pearson archive, University College London.
40. K. Pearson, in Speeches at a Dinner held in University College London, in Honour of Professor Karl Pearson (Cambridge: privately printed, 1934).
41. K. Pearson, op. cit. note 31, 34.
42. Ibid., 425.
43. Ibid., 435.
44. Ibid., 436.
45. These views are gathered from Chapter 9, ‘Life’, of the Grammar of Science (op. cit. note 31), and from the various essays making up Pearson’s The Ethic of Freethought (London: T. Fisher Unwin, 1888). A very brief statement of Pearson’s position is given in the Grammar, 438, where, after asserting that it was ‘a false view of human solidarity’ that would regret ‘that a capable and stalwart race of white men should replace a dark skinned tribe’, he claimed again that the ‘principle of the survival of the fittest ... is from the standpoint of science the sole account we can give of those purely human faculties of healthy activity, of sympathy, of love, and of social action which men value as their chief heritage.’
46. K. Pearson, ‘Socialism and Natural Selection’. This essay was reprinted in Vol. 1 of Pearson’s Chances of Death and Other Studies in Evolution (London: Edward Arnold, 1897). See p. 138.
47. From quite an early stage in his work Pearson floated the idea that the British nation was declining due to the growth in fertility of the lower classes and the diminution of fertility among the professional classes, whom he took to be genetically superior. The most politically effective work that was done in this area was performed, under Pearson’s instruction, by David Heron, whose memoir on ‘The Relation of Fertility in Man to Social Status and on the Changes in this Relation that have taken Place in the Last Fifty Years’, was published in 1906 as one of the Biometric Laboratory’s ‘Studies in National Deterioration’.
48. See K. Pearson, National Life from the Standpoint of Science (London: Adam and Charles Black, 1901), 43–44.
49. K. Pearson, letter to the Manchester Guardian (15 February 1901).
50. Robert Parker (1857–1918) rose to a Baronetcy, and was to be a leading law lord.
51. Pearson papers: the Commonplace Book is in Cabinet 2, drawer 1.
52. A detailed account of some of the tensions felt on religious issues, most useful for comparative purposes, by some leading Victorian intellectuals is contained in F.M. Turner, Between Science and Religion (London and New Haven, Conn.: Yale University Press, 1974).
53. Pearson papers: Pearson to Parker, 18 September 1878.
54. See K. Pearson (‘Loki’), The New Werther (London: C. Kegan Paul and Co., 1880). At the beginning of the book, interestingly, Pearson wrote that its contents ‘truly image the mind of him who has written them, and therefore necessarily to some extent the minds of the children of his generation, who are passing through a like struggle’.
55. Ibid., 33.
56. Ibid., 34.
57. K. Pearson, ‘Anarchy’, The Cambridge Review, Vol. 2 (1881), 268–70.
58. Ibid., 270.
59. K. Pearson, ‘Political Economy for the Proletariat’, The Cambridge Review, Vol. 3 (1881), 123–26.
60. Ibid., 124.
61. Pearson papers: Parker to Pearson, 25 May 1879.
62. Ibid., Pearson to Parker, 20 June 1879.
63. Ibid., Pearson to Parker, October 1879.
64. K. Pearson, ‘Kuno Fischer’s New Critique of Kant’, The Cambridge Review, Vol. 5 (1883), 109–11, esp. 111.
65. See the prefatory remarks by Pearson to W.K. Clifford, The Common Sense of the Exact Sciences (London: Kegan Paul Trench and Co., 1885).
66. Clifford’s ideas were developed in his famous 1875 essay ‘On the Scientific Basis of Morals’, reproduced in W.K. Clifford, Lectures and Essays (London: Macmillan, 1879), Vol. 2, 74–95.
67. Pearson, The Ethic of Freethought, op. cit. note 45, 371. The original essay was written in 1885.
68. Ibid., 428. The essay dates from 1887.
69. See, for example, Pearson’s ‘Socialism in Theory and Practice: being a lecture delivered to a working class audience’, dating from February 1884 and reproduced in The Ethic of Freethought, op. cit. note 45. In this lecture Pearson made it quite clear that while all forms of labour were equally honourable, some forms of labour, namely that done with the brain, were far the most important.
70. For an analysis of the Fabians, see Chapter 14, ‘The Fabians Reconsidered’, of E.J. Hobsbawm’s Labouring Men (London: Weidenfeld and Nicolson, 1968).
71. Pearson papers: Mrs Wilson to Pearson, 22 October 1884.
72. Pearson, The Ethic of Freethought, op. cit. note 45, 391.
73. Pearson papers: op. cit. note 38.
74. This possibility was interestingly discussed by Donald MacKenzie at a colloquium on Eugenics in England held at the University of Leeds, July 1977. See also the papers by MacKenzie cited in notes 15 and 23.
75. See Hobsbawm, op. cit. note 70.
82 A History of Effect Size Indices Carl J. Huberty
During the past several decades, there has been an exponential increase in the frequency of publications criticizing uses of statistical testing; this pattern has occurred across disciplines as diverse as psychology and wildlife studies (Anderson, Burnham, & Thompson, 2000). Concomitantly, there has been an increased emphasis placed on the reporting and interpretation of effect sizes. For example, the American Psychological Association (APA) Task Force on Statistical Inference recently emphasized, “Always provide some effect-size estimate when reporting a p value” (Wilkinson & APA Task Force on Statistical Inference, 1999, p. 599). The Task Force also wrote,

Always present effect sizes for primary outcomes . . . . It helps to add brief comments that place these effect sizes in a practical and theoretical context. . . . We must stress again that reporting and interpreting effect sizes in the context of previously reported effects is essential to good research. (p. 599)
The editorial policies of the following 19 journals now require effect size reporting:
• Career Development Quarterly
• Contemporary Educational Psychology
• Early Childhood Research Quarterly
• Educational and Psychological Measurement
• Exceptional Children
• Journal of Agricultural Education
• Journal of Applied Psychology
• Journal of Community Psychology
• Journal of Consulting & Clinical Psychology
• Journal of Counseling and Development
• Journal of Early Intervention
• Journal of Educational and Psychological Consultation
• Journal of Experimental Education
• Journal of Learning Disabilities
• Language Learning
• Measurement and Evaluation in Counseling and Development
• The Professional Educator
• Reading and Writing
• Research in the Schools

Source: Educational and Psychological Measurement, 62(2) (2002): 227–240.
It is noteworthy that two of these journals are the flagship journals of the American Counseling Association and the Council for Exceptional Children. The fifth edition of the APA (2001) Publication Manual also emphasizes the importance of effect size reporting:

For the reader to fully understand the importance of your findings, it is almost always necessary to include some index of effect size or strength of relationship in your Results section. You can estimate the magnitude of the effect or the strength of the relationship with a number of common effect size estimates. . . . The general principle to be followed ... is to provide the reader not only with information about statistical significance but also with enough information to assess the magnitude of the observed effect or relationship. (pp. 25–26)
For the past two decades or so, the notion of effect size has been fairly common across introductory statistical methods textbooks, particularly in the behavioral sciences. (An exception to this is the book by Moore, 2000, who is, by background and current position, a bona fide statistician; as good as this book is, it does not include the notion of effect size.) The commonality has not, however, carried over to behavioral science journals that typically report results of quantitative studies, although an increased emphasis on the reporting of effect size index values has taken place very recently. For the past three decades or so, the notion of effect size very commonly pertained to differences between (and sometimes among) means of scores on a single outcome variable. Even to this day, if the expression effect size is heard or read, it is estimated that a large percent (>95%?) of empirical researchers and methodologists will think of the univariate (two-group?) mean-comparison context. I would maintain that the effect size notion applies to contexts in addition to that involving univariate mean comparisons. These other contexts include, but are not limited to, multiple regression or prediction, multiple correlation,
multivariate analysis of variance, and univariate proportion comparisons. No consistent historical pattern was found across all types of effect size indices. In fact, some indices that are currently considered effect size indices were not originally proposed as effect size indices. Bases for univariate mean-comparison effect size indices may be categorized according to three interpretations: relationship, group difference, and group overlap. The historical development of these three interpretations of effect size will now be addressed. (As will be evidenced below, these three interpretations may not be considered in any way to reflect “new” thinking.) Some effect size indices in a multiple-response-variable context will follow in a separate section. The article concludes with a Comments section.
Relationship Indices

In the context of data analysis, relationship typically refers to the correlation between two characteristics or attributes for a set of analysis units. According to some documentation (e.g., Cowles, 1989, pp. 123, 132; Hald, 1998, p. 164; Johnson & Kotz, 1997, p. 109; Stigler, 1986, p. 298), Francis Galton (1822–1911) originated the concept of correlation in 1889, although a year earlier he used the word co-relation (Galton, 1888). Stigler (1999, p. 89) maintained, however, that the concept of correlation was reported some 30 years earlier by Charles Darwin (1809–1882), who was a cousin of Galton. Cowles (1989, p. 141) and Stigler (1986, p. 353) disagreed on whether Auguste Bravais (1811–1863) used the idea of correlation in an 1846 paper. (Stigler, 1986, chap. 9, provides an excellent discussion of the history of [simple] correlation.) It was in 1892 that Francis Y. Edgeworth (1845–1926) used the expression coefficient of correlation for the symbol ρ (parameter and statistic were not then commonly differentiated). A disciple of Galton, Karl Pearson (1857–1936), began to popularize the correlation coefficient (he used r) around 1896. Some years later, Pearson (1905) defined and labeled η the correlation ratio. This coefficient was developed in the context of multiple data arrays (like groups in analysis of variance [ANOVA]) that typically suggested a nonlinear relationship between the grouping variable and the outcome variable. In 1924, Ronald A. Fisher (1890–1962) derived the probability distribution of η in the context of ANOVA. But the explicit analysis connection between the ANOVA F test and the correlation ratio was not made until 1935 by Truman L. Kelley (1884–1961). (In an ANOVA context, the η value reflects the correlation between the grouping variable and the outcome variable.) In making the connection, Kelley (1935) proposed an adjustment of the statistic $\eta^2$ (to reduce the estimation bias) that he labeled $\varepsilon^2$. (For a more detailed historical discussion of $\varepsilon^2$ and $\omega^2$, along with the respective estimators, see Glass & Hakstian, 1969.) It may be pointed out that the bias in $\eta^2$ as an estimator for its population counterpart was recognized by Pearson (1923). (Relationships
among $\eta^2$, $\varepsilon^2$, and $\omega^2$ and how the latter two reduce but do not remove the bias related to $\eta^2$ are discussed in some detail by Richardson, 1996, pp. 18–19.) What may be the first textbook that connected $\eta^2$ and $\varepsilon^2$ to ANOVA is that by Peters and Van Voorhis (1940, p. 319). It is interesting that the many editions of the two Fisher (1925, 1935) books (Statistical Methods for Research Workers and The Design of Experiments) did not make the connection. Fisher’s lack of attention to the connection was emphasized by Yates (1951) when he mentioned that “research workers ... pay undue attention to the results of ... tests of significance ... and too little to the estimates of the magnitude of the effects they are investigating” (p. 32). (This appears contrary to Kirk’s 1996, p. 748, statement that Fisher did make the connection in his 1925 book, Statistical Methods for Research Workers.) In a textbook aimed at the behavioral sciences, Diamond (1959) proposed another expression, differentiation ratio, for $\eta^2$ because “it tells us with what success the groups ... have been differentiated by the principle which underlies their classification” (p. 55). A second alternative to $\eta^2$ was proposed in a textbook by William L. Hays (1926–1995) in 1963. The Hays (p. 325) index, denoted “est. $\omega^2$,” is interpreted as an estimator for the strength of association between a grouping variable and an outcome variable. (Later in his book, Hays, p. 547, used $\eta^2$ to denote a correlation ratio.) In sum, then, three different strength-of-relationship estimators were proposed from 1935 to 1963: η, ε, and ω. The proposals for the latter two were made with the idea of reducing estimation bias associated with the first one. (Over the years, notation has been a little bit of an issue in that the three Greek letters are often used to represent sample values.)
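As a rough illustration of how the three estimators relate, the sketch below computes them from the sums of squares of a one-way, fixed-effects layout. The function name and the toy data are hypothetical, and the formulas are the commonly cited textbook forms, assumed here rather than transcribed from Kelley (1935) or Hays (1963).

```python
def anova_effect_sizes(groups):
    """Return (eta^2, epsilon^2, omega^2) for a one-way fixed-effects layout.

    `groups` is a list of lists of scores, one inner list per group.
    Formulas are the usual textbook forms (an assumption of this sketch).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-groups and within-groups sums of squares
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    ss_total = ss_between + ss_within
    ms_within = ss_within / (n_total - k)

    eta2 = ss_between / ss_total
    # Both adjustments subtract the part of SS_between expected by chance
    eps2 = (ss_between - (k - 1) * ms_within) / ss_total
    omega2 = (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)
    return eta2, eps2, omega2


eta2, eps2, omega2 = anova_effect_sizes([[1, 2, 3], [2, 3, 4], [4, 5, 6]])
print(round(eta2, 4), round(eps2, 4), round(omega2, 4))
```

With these toy data the unadjusted value is the largest of the three, which is consistent with the text's point that the two later indices were proposed to reduce the bias of the first.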
When levels of the grouping variable in an ANOVA context are random (rather than fixed), some methodologists suggest that an intraclass correlation coefficient, $\rho_I$, be used as an effect size index. The random-fixed issue was recognized by Hays (1963, p. 424) and further discussed by Vaughan and Corballis (1969). Richardson (1996, p. 19) provided a more detailed discussion of this issue. As alluded to earlier, in a two-group research situation the strength of the relationship pertains to the relationship between the (continuous) outcome variable and the dichotomous grouping variable. If the dichotomy is imposed, then the index of relationship is the biserial correlation coefficient. The biserial r idea was suggested by Pearson (1910). According to Stigler (1999, p. 18), Pearson later used the expressions “biserial r” and “biserial η.” The use of biserial η implies (to me, at least) that the biserial correlation coefficient is simply a special case of η. If the dichotomy is natural (e.g., with gender or with experimental versus control), then the square of the point-biserial correlation coefficient, which is a special case of $\eta^2$, could be considered an effect size index, although it was not so considered 90 years ago. Although indices of relationship have been and are currently considered to assess effect size, it is common to square such an index and consider percent
of shared variance to assess the magnitude of an effect. A little history on such a perspective of effect is mentioned earlier in this section. Standard cutoffs for some such indices have been suggested (see, for example, Cohen, 1969, pp. 277–281). Rosenthal and Rubin (1979) pointed out a problem of assessing an $r^2$ value of, say, .14, as being “small” in some specific research situations. A little later, Rosenthal and Rubin (1982) proposed the binomial effect size display (BESD) as an aid in assessing the “practical importance of any effect indexed by a correlation coefficient” (p. 242). Finally, an index of relationship that was used with two dichotomous variables was originally proposed by George Udny Yule (1871–1951). Three variations of the Yule (1900) index, Q, have been suggested: Pearson coefficient of mean square contingency, Pearson tetrachoric coefficient of correlation, and Tschuprow coefficient (see Cowles, 1989, pp. 142–143, and MacKenzie, 1981, chap. 7, “The Politics of the Contingency Table”). In the current context of comparing two groups using a dichotomous outcome variable, a Q-derived value could be used as an effect size value. A popular index that could be used, but seldom is in research practice, is the so-called C coefficient named after Harald Cramér (1893–1985), which Cramér (1946, p. 282) actually attributed to Karl Pearson. The Cramér C may also be used as an effect size index in the context of comparing multiple proportions. (Use of effect size indices for categorical data was fairly recently discussed by Fleiss, 1994.)
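To make the BESD idea concrete, the following minimal sketch converts a correlation into the two "success rates" that the display reports, .50 + r/2 and .50 − r/2. The function name is hypothetical, and the example value simply reuses the $r^2$ = .14 figure mentioned above.

```python
import math


def besd(r):
    """Return the two BESD 'success rates' implied by a correlation r."""
    return 0.5 + r / 2, 0.5 - r / 2


# An r^2 of .14 corresponds to r of about .37
r = math.sqrt(0.14)
higher, lower = besd(r)
print(f"r = {r:.3f}: success rates {higher:.3f} vs. {lower:.3f}")
```

Seen this way, an effect that accounts for 14% of the variance separates the two displayed rates by r itself, that is, by about 37 percentage points.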
Group Difference Indices

In the two-group mean-comparison situation, the typical effect size index considered is a standardized mean difference. Such an index was proposed by Jacob Cohen (1923–1998) in 1962 when he used the letter d. A standardized difference was also included in a discussion by Hays (1963, p. 329), which involved a fairly direct relationship between the population counterpart of d, δ, and $\omega^2$. During the 1970s and 1980s, there were some discussions as to which standard deviation should be used as the denominator in d. Two suggestions made were the standard deviation pooled across the two groups proposed by Cohen (1969, p. 18) and the standard deviation of the control group (the definition of which is not always clear) proposed by Glass (1976). The letter d was used by both Cohen and Glass. (Hedges, 1981, took an exception to these two proposals because of bias in the estimators and suggested an adjusted d, denoted by g.) Cohen (1962, p. 148) also proposed a standardized-difference type of an index that might be used in a multiple group context. Here, as in Cohen (1969, p. 267), the letter f was used. This index reflects the variability of the group means relative to a standard deviation. (As pointed out by Cohen, 1969, p. 274, there is a relationship between f and η: $f^2 = \eta^2/(1 - \eta^2)$.)
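The choice of denominator discussed above can be sketched in a few lines. All function names here are hypothetical; the pooled standard deviation uses n − 1 denominators, and the small-sample correction factor 1 − 3/(4·df − 1) is the commonly used approximation, assumed here rather than quoted from Hedges (1981).

```python
import math
from statistics import mean, stdev


def pooled_d(x, y):
    """Standardized mean difference with the SD pooled across both groups."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


def control_sd_d(treatment, control):
    """Standardized mean difference using the control-group SD only."""
    return (mean(treatment) - mean(control)) / stdev(control)


def adjusted_g(x, y):
    """Pooled d shrunk by an approximate small-sample bias correction."""
    df = len(x) + len(y) - 2
    return pooled_d(x, y) * (1 - 3 / (4 * df - 1))


def f_from_eta2(eta2):
    """Cohen's f from the correlation ratio: f^2 = eta^2 / (1 - eta^2)."""
    return math.sqrt(eta2 / (1 - eta2))


treated, control = [2, 4, 6], [1, 3, 5]
print(pooled_d(treated, control), control_sd_d(treated, control), adjusted_g(treated, control))
```

In this toy case the two groups have equal spread, so the pooled and control-group versions coincide; they diverge as soon as the group variances differ, which is why the choice of denominator was debated.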
About the same time, Winer (1962, p. 57) proposed an index for “the effect of treatment j” (p. 274): $\tau_j = \mu_j - \mu$, where $\mu_j$ is the mean for population j and $\mu$ is the grand mean across all of the populations. A short time later, Cohen (1969, p. 269) suggested a standardized mean difference in the context of more than two groups; parameterwise, the index is $\delta = (\mu_{\max} - \mu_{\min})/\sigma$, where $\mu_{\max}$ is the largest mean, $\mu_{\min}$ is the smallest mean, and $\sigma$ is the standard deviation common to all of the populations involved. (The use of a standardized difference in more complex ANOVA designs is discussed by Olejnik & Algina, 2000, pp. 248–258.) When the outcome variable is dichotomous, group differences are assessed by comparing two proportions. Cohen (1962) suggested the simple difference in proportions, $|P_1 - P_2|$, as an effect size index in the two-group context. For testing equality of multiple proportions, Cohen (1962) suggested the use of the ratio of the largest proportion to the smallest proportion as an effect size index.
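The four indices in this paragraph are simple enough to state directly in code. The sketch below uses hypothetical function names and takes the grand mean to be the unweighted mean of the population means, which is an assumption of this illustration rather than something the text specifies.

```python
def treatment_effects(means):
    """Winer-style effects: each population mean minus the grand mean.

    Assumes the grand mean is the unweighted mean of the population means.
    """
    grand = sum(means) / len(means)
    return [m - grand for m in means]


def range_delta(means, sigma):
    """Cohen's multi-group index: (largest mean - smallest mean) / sigma."""
    return (max(means) - min(means)) / sigma


def proportion_difference(p1, p2):
    """Two-group index for a dichotomous outcome: |P1 - P2|."""
    return abs(p1 - p2)


def proportion_ratio(proportions):
    """Multi-group index: largest proportion divided by smallest proportion."""
    return max(proportions) / min(proportions)


means = [10.0, 12.0, 14.0]
print(treatment_effects(means), range_delta(means, sigma=4.0))
print(proportion_difference(0.60, 0.45), proportion_ratio([0.2, 0.5, 0.4]))
```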
Group Overlap Indices

Building on the earlier work of Kelley (1920, 1923), in 1937, John W. Tilton (1891–1980) suggested that the amount of group overlap be considered, in two-group univariate mean comparisons, in determining whether two means are significantly different. Tilton (1937) proposed that “the comparison of means should be supplemented whenever possible by an explicit measure of overlapping, such as the percentage of area common to the two distributions” (p. 657) and that this calculation be based on “two perfectly normal distributions” (pp. 661–662). Tilton’s notion of group overlap as related to two-group statistical testing sat dormant for about 30 years, until it was revisited by Dunnette (1966) and Alf and Abrahams (1968) and a few years later by Elster and Dunnette (1971). Dunnette (1966) restated Tilton’s idea: “The greater the amount of overlap, the less effective is the predictor in separating the two distributions” (p. 142). (He used predictor for what we currently call an outcome variable.) Alf and Abrahams (1968) presented a fair bit of detail of calculating the percent of group overlap assuming two normal distributions for the outcome variable. Elster and Dunnette (1971) studied the robustness of Tilton’s (1937) measure of overlap when the two distributions of outcome variable scores are nonnormal. Oakes (1986, p. 54) mentioned that the misclassified proportion was considered by Eysenck (1971, p. 34) to distinguish the theoretical interest in differences in IQ scores between races. The concept of group overlap as an effect size basis was also considered by Cohen (1969, pp. 19–21) in the context of a two-group mean comparison. Group overlap was also considered in a two-group context by Kraemer and Andrews (1982) when they suggested using D as the standard normal deviate that corresponds to the proportion of
analysis units in one group that are less than the median score of the other group. (For a recent discussion of some parametric and nonparametric effect size indices, see Hogarty & Kromrey, 2001.) It was Levy (1967) who may have been the first to relate the notion of group overlap to univariate predictive discriminant analysis (PDA). What he considered was the proportion of misclassified units of analysis into the two groups as a “simple matter to proceed from the usual test of statistical significance to a measure of the substantive significance” (p. 38). (It may be noted that the outcome variable in the original study design will play the role of a predictor variable in the PDA, a conceptual variable role reversal.) Levy’s idea of group overlap was not explicitly connected to that of Tilton (1937). Some elaboration on the group overlap idea as applied to univariate two-group comparisons was given about 20 years ago by Huberty and Holmes (1983). More recently, Huberty and Lowman (2000) proposed the use of the better-than-chance notion in using group overlap assessed via a PDA as a basis for effect size in the multiple outcome variable context; here, the letter I is used (see also Hess, Olejnik, & Huberty, 2001).
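Two of the overlap ideas above lend themselves to a short numerical sketch. The first function gives the proportion of area common to two normal distributions with equal variance whose means differ by d standard deviations; the second expresses a hit rate as an improvement over chance, in the form (observed − chance)/(1 − chance). Both function names are hypothetical, and the formulas are assumed standard forms rather than transcriptions from Tilton (1937) or Huberty and Lowman (2000).

```python
from statistics import NormalDist


def normal_overlap(d):
    """Area common to two equal-variance normal curves whose means differ by d SDs."""
    return 2 * NormalDist().cdf(-abs(d) / 2)


def better_than_chance(hit_rate, chance_rate):
    """Improvement-over-chance index: 0 means chance-level, 1 means perfect."""
    return (hit_rate - chance_rate) / (1 - chance_rate)


for d in (0.0, 0.5, 1.0, 2.0):
    print(f"d = {d:.1f}: overlap = {normal_overlap(d):.3f}")

# e.g., a 75% hit rate when 50% would be expected by chance
print(better_than_chance(0.75, 0.50))
```

The loop shows the inverse relation Dunnette described: as the standardized separation grows, the shared area shrinks.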
Multivariable Indices

In this section, multivariable refers to multiple response variables. A discussion of a design with one or more grouping variables and one response variable was given in the previous sections. The concept of multiple correlation was originated by Pearson and Lee (1897), and in 1914, Pearson proposed the expression coefficient of multiple correlation when he used the symbol R. The association of an effect size index with a multiple regression analysis (MRA) or a multiple correlation analysis (MCA) has been virtually ignored by applied researchers in the behavioral sciences. In relating MRA to ANOVA, Cohen (1977, p. 410) suggested an f-type index, $f^2 = R^2/(1 - R^2)$; the statistical test of interest here is that the true multiple correlation coefficient is zero. (Cohen’s use of f here is consistent with what he used in an ANOVA context when $f^2 = \eta^2/(1 - \eta^2)$.) This index reflects a signal-to-noise ratio. A better-than-chance effect size index in an MRA or an MCA zero-correlation context was recently suggested by Huberty (1994b): $R^2_{\text{adj}} - k/(N - 1)$, where $R^2_{\text{adj}}$ is an adjusted $R^2$ value (which depends on whether an MRA or an MCA is the focus), k is the number of X variables, and N is the sample size. The expression $k/(N - 1)$ represents the chance value of $R^2$ under the null hypothesis that $\rho^2 = 0$; thus, $R^2_{\text{adj}} - k/(N - 1)$ is a better-than-chance index of effect size. For testing that the true regression weight for $X_j$ is zero, Maxwell (2000, p. 435) suggested using $f^2 = (\rho^2 - \rho^2_{(-j)})/(1 - \rho^2)$ as an effect size index, where $\rho_{(-j)}$ denotes the population multiple correlation coefficient involving all X variables except $X_j$. (It is not clear if adjusted $R^2$ values are to be used to calculate Maxwell’s sample $f^2$ value.)
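The regression-based indices in this section can be written out as follows. This is only a sketch with hypothetical function names; the adjusted $R^2$ is taken as an input because, as noted above, the appropriate adjustment depends on whether an MRA or an MCA is the focus, and sample values are simply substituted for the population quantities in the single-predictor index.

```python
def cohen_f2(r2):
    """f^2 = R^2 / (1 - R^2), a signal-to-noise type ratio."""
    return r2 / (1 - r2)


def better_than_chance_r2(r2_adj, n, k):
    """Adjusted R^2 minus the chance value k / (N - 1)."""
    return r2_adj - k / (n - 1)


def f2_for_predictor(r2_full, r2_without_j):
    """f^2 for one predictor: (R^2_full - R^2_without_j) / (1 - R^2_full).

    Uses sample R^2 values in place of the population rho^2 terms
    (an assumption of this sketch).
    """
    return (r2_full - r2_without_j) / (1 - r2_full)


print(cohen_f2(0.20))
print(better_than_chance_r2(r2_adj=0.30, n=51, k=5))
print(f2_for_predictor(r2_full=0.50, r2_without_j=0.40))
```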
The effect size concept is also applicable in the context of grouping variable effects with multiple outcome variables. This is the multivariate analysis of variance (MANOVA) context. The development and discussion of multivariate indices of strength of relationship appear to have started in the early 1970s. The relevant literature was pretty much summarized by Maurice M. Tatsuoka (1922–1996) in 1973. The use of a multivariate effect size index was first (at least in the behavioral sciences) proposed by Tatsuoka (1970). Tatsuoka (1973, p. 48) and Olejnik and Algina (2000, p. 272) provided other early 1970s references of MANOVA-related effect size indices. As in the univariate mean-comparison context, the proposed MANOVA effect size indices are simple transformations of statistical test criteria. For example, one effect size index is, simply, $\eta^2 = 1 - \Lambda$, where $\Lambda$ is the MANOVA criterion originated by Samuel S. Wilks (1906–1964) in 1932, which he described as a generalization of the univariate correlation ratio (Cooley & Lohnes, 1971, p. 225). Cramer and Nicewander (1979) proposed three additional indices that may be used as effect size indices in a MANOVA context, one of which is $\tau^2 = 1 - \Lambda^{1/r}$, where $r = \min(p, q)$, p denotes the number of outcome variables, and q denotes the hypothesis degrees of freedom. A little later, Serlin (1982) proposed $\eta^2_{PB} = U/r$, where U denotes the Bartlett-Pillai test criterion. (Transformations of other MANOVA criteria are discussed by Huberty, 1994a, pp. 194–196.) Just as for the univariate relationship effect size indices, an adjustment was proposed for the multivariate counterparts by Tatsuoka (1973) (see also Huberty, 1994a, p. 195; Olejnik & Algina, 2000; Tatsuoka, 1993). There is a multivariate index based on group overlap that may be utilized in a group comparison context. Following a one-factor MANOVA, group overlap may be assessed using a PDA.
It is recognized that there is a role reversal for the multiple response variables and the lone grouping variable. A PDA hit rate is determined, and then the hit rate is transformed to a better-than-chance index, I, which may serve as an effect size index (Huberty & Lowman, 2000). It should be noted that the I index is applicable under covariance heterogeneity as well as under covariance homogeneity – univariate or multivariate. An abbreviated time line depicting some originations related to effect size developments is given in Figure 1.
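Because the MANOVA indices described in this section are simple transformations of test criteria, they can be sketched directly from a Wilks Λ or a Bartlett-Pillai U value. The function names and the numerical inputs below are hypothetical and serve only to show the arithmetic.

```python
def eta2_from_wilks(wilks_lambda):
    """eta^2 = 1 - Lambda."""
    return 1 - wilks_lambda


def tau2_from_wilks(wilks_lambda, p, q):
    """tau^2 = 1 - Lambda^(1/r), with r = min(p, q).

    p: number of outcome variables; q: hypothesis degrees of freedom.
    """
    r = min(p, q)
    return 1 - wilks_lambda ** (1 / r)


def eta2_from_pillai(pillai_u, p, q):
    """Serlin's index: U / r, with r = min(p, q)."""
    return pillai_u / min(p, q)


print(eta2_from_wilks(0.36))
print(tau2_from_wilks(0.36, p=2, q=3))
print(eta2_from_pillai(0.80, p=2, q=3))
```

When r is 1 (a single outcome variable or a two-group comparison), the first two indices coincide; for larger r the root in the second index yields a smaller value from the same Λ.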
Comments

The recent rise in the popularity of the effect size concept in the behavioral sciences was mentioned earlier in this article. Elmore (2001) recently counted 61 effect size choices. Summaries of many of the available choices have been provided by Cortina and Nouri (2000), Kirk (1996, in press), Rosenthal (1994), Snyder and Lawson (1993), and Thompson (2002).
[Figure 1: timeline running from 1900 to 1990, marking indices attributed to Yule, Pearson, Kelley, Wilks, Cramér, Peters & Van Voorhis, Tilton, Hays, Winer, Cohen, Tatsuoka, Glass, Cramer & Nicewander, Hedges, Kraemer & Andrews, Serlin, and Rosenthal & Rubin]
Figure 1: Some approximate years in historical developments related to effect size
One might hypothesize that this increased popularity of effect size use is a response to the critics of statistical testing. Such criticisms go back a number of decades. There was a large number of concerns pertaining to significance tests and substantive significance in the collection of writings in Morrison and Henkel (1970), but no one explicitly proposed an effect size index. (One of the 31 writings in this book is a critique of statistical testing by Joseph Berkson (1899–1982), M.D., which appeared in a 1942 issue of the American Statistical Association Journal.) How one considers the “effect” in the interpretation process of study results depends, of course, on the study purpose and on the study design. There are study purposes that pertain to intervariable relationships, group mean differences, and group proportion differences. With respect to a comparative-group design, the effect size index to consider in a multiple-group context may depend on whether the groups are independent (between-groups design), dependent (within-groups design), or both (split-plot design) and on the number of outcome variables (see Olejnik & Algina, 2000, for elaborations; see also Fern & Monroe, 1996, for a discussion of related restrictions in the use of effect size indices). Numerical indices that are currently used or could be used to reflect a magnitude of effect have been available for a number of decades. As one might surmise, some relationship indices, for example, $\eta^2$, were originated for a purpose other than to reflect size of effect in a group-comparison study. A number of limitations to the use of effect size indices in comparative studies were pointed out by Olejnik and Algina (2000), four of which are
• Limited reliability of outcome variable scores
• Outcome variable variance heterogeneity
• Design quality
• Definition of grouping variable levels
How, in general, is an effect size index value utilized? Of course, it is utilized in the context of statistical testing wherein the researcher arrives at a referent distribution tail-area value, a probability value denoted here by P. Suppose the researcher also determines an effect size index value, say, E. Two approaches to using the P and E values are the following:
1. Using P, decide if an effect is obtained, and then use E to determine how big the effect is.
2. Consider the P value and the E value jointly; if the P value is small and the E value is substantial, then a real effect is obtained.
The predominance of statistical testing in the behavioral sciences, at least, has led to some standards pertaining to magnitudes of P. The dominant use
of P = .05 as a standard across all types of research studies and across all types of statistical analyses is somewhat puzzling. Just as puzzling is the use of some cutoffs for describing magnitudes of E. The interpretation of the index value magnitude is, perhaps, the biggest limitation of the use of E. It appears that the only cutoffs to which applied researchers have paid attention are those standards initiated by Cohen (1969). As astutely noted by Olejnik and Algina (2000), “There is little empirical justification for these standards” (p. 277). Furthermore, as Thompson (2001) recently noted regarding Cohen’s criteria, “If people interpreted effect sizes with the same rigidity that α = .05 has been used in statistical testing, we would merely be being stupid in another metric” (pp. 82–83). As admirable as it was for Cohen (1969) to initiate some effect size magnitude guidelines, much more empirical and design research is needed to establish guidelines for different designs and different data conditions. With respect to the latter, very little thought has been given to the common condition of unequal variances or unequal covariance matrices in group comparison studies. There is an approach to effect size estimation that may be useful in a group comparison context with one or more outcome variables and with or without variance/covariance homogeneity. The index proposed is based on group overlap and involves classifying analysis units into the criterion groups. (More thinking is needed for multiple grouping variables.) The classification accuracy may be transformed to an index, I, that is a better-than-chance classification index. Some very initial guidelines for classification accuracy were proposed by Huberty and Holmes (1983) and for I values by Huberty and Lowman (2000).
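The transformation from classification accuracy to the better-than-chance index is not spelled out in this passage. A commonly used form, assumed here, is I = (H_o - H_e)/(1 - H_e), where H_o is the observed hit rate from the classification analysis and H_e is the hit rate expected by chance. The sketch below is illustrative only: the function names are hypothetical, and the proportional chance rate (the sum of squared group proportions) is one of several possible chance baselines.

```python
def proportional_chance_rate(group_sizes):
    """Chance hit rate when units are assigned in proportion to group sizes.

    This baseline is an assumption made for illustration.
    """
    total = sum(group_sizes)
    return sum((n / total) ** 2 for n in group_sizes)


def improvement_over_chance(observed_hit_rate, chance_hit_rate):
    """Better-than-chance index I = (H_o - H_e) / (1 - H_e)."""
    if not 0.0 <= chance_hit_rate < 1.0:
        raise ValueError("chance_hit_rate must be in [0, 1)")
    return (observed_hit_rate - chance_hit_rate) / (1.0 - chance_hit_rate)


if __name__ == "__main__":
    # Hypothetical two-group study: 30 and 70 units, 85% classified correctly.
    h_e = proportional_chance_rate([30, 70])
    print(improvement_over_chance(0.85, h_e))
```

Under this form, I = 0 indicates classification no better than chance, I = 1 indicates perfect classification, and negative values indicate classification worse than chance. Because only hit rates enter the computation, nothing in the index itself requires equal variances or equal covariance matrices.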
References
Alf, E., & Abrahams, N. M. (1968). Relationship between percent overlap and measures of correlation. Educational and Psychological Measurement, 28, 779–792.
American Psychological Association. (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: Author.
Anderson, D. R., Burnham, K. P., & Thompson, W. L. (2000). Null hypothesis testing: Problems, prevalence, and an alternative. Journal of Wildlife Management, 64, 912–923.
Berkson, J. (1942). Tests of significance considered as evidence. American Statistical Association Journal, 33, 325–335.
Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65, 145–153.
Cohen, J. (1969). Statistical power analysis for the behavioral sciences. New York: Academic Press.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York: Academic Press.
Cooley, W. W., & Lohnes, P. R. (1971). Multivariate data analysis. New York: Wiley.
Cortina, J. M., & Nouri, H. (2000). Effect size for ANOVA designs. Thousand Oaks, CA: Sage.
Cowles, M. (1989). Statistics in psychology: An historical perspective. Hillsdale, NJ: Lawrence Erlbaum.
Cramér, H. (1946). Mathematical methods of statistics. Princeton, NJ: Princeton University Press.
Cramer, E. M., & Nicewander, W. A. (1979). Some symmetric, invariant measures of multivariate association. Psychometrika, 44, 43–54.
Diamond, S. (1959). Information and error. New York: Basic Books.
Dunnette, M. D. (1966). Personnel selection and placement. Belmont, CA: Wadsworth.
Edgeworth, F. W. (1892). Correlated averages. Philosophical Magazine (5th series), 34, 190–204.
Elmore, F. (2001, April). A primer on basic effect size concepts. Paper presented at the annual meeting of the American Educational Research Association, Seattle, WA.
Elster, R. S., & Dunnette, M. D. (1971). The robustness of Tilton’s measure of overlap. Educational and Psychological Measurement, 31, 685–697.
Eysenck, H. J. (1971). Race, intelligence and education. London: Temple Smith.
Fern, E. F., & Monroe, K. B. (1996). Effect-size estimates: Issues and problems in interpretation. Journal of Consumer Research, 23, 80–105.
Fisher, R. A. (1924). On a distribution yielding the error functions of several well known statistics. Proceedings of the International Congress of Mathematics, 2, 805–813.
Fisher, R. A. (1925). Statistical methods for research workers. Edinburgh, UK: Oliver and Boyd.
Fisher, R. A. (1935). The design of experiments. Edinburgh, UK: Oliver and Boyd.
Fleiss, J. L. (1994). Measures of effect size for categorical data. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 245–260). New York: Russell Sage Foundation.
Galton, F. (1888). Co-relations and their measurement. Proceedings of the Royal Society of London, 45, 135–145.
Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5, 3–8.
Glass, G. V., & Hakstian, A. R. (1969). Measures of association in comparative experiments: Their development and interpretation. American Educational Research Journal, 6, 403–414.
Hald, A. (1998). A history of mathematical statistics from 1750 to 1930. New York: Wiley.
Hays, W. L. (1963). Statistics for psychologists. New York: Holt, Rinehart & Winston.
Hedges, L. V. (1981). Distribution theory for Glass’s estimator of effect size and related estimators. Journal of Educational Statistics, 6, 107–128.
Hess, B., Olejnik, S., & Huberty, C. J. (2001). The efficacy of two improvement-over-chance effect sizes for two-group univariate comparisons under variance heterogeneity and non-normality. Educational and Psychological Measurement, 61, 909–936.
Hogarty, K. Y., & Kromrey, J. D. (2001, April). We’ve been reporting some effect sizes: Can we guess what they mean? Paper presented at the annual meeting of the American Educational Research Association, Seattle, WA.
Huberty, C. J. (1994a). Applied discriminant analysis. New York: Wiley.
Huberty, C. J. (1994b). A note on interpreting an R2 value. Journal of Educational and Behavioral Statistics, 19, 351–356.
Huberty, C. J., & Holmes, S. E. (1983). Two-group comparisons and univariate classification. Educational and Psychological Measurement, 43, 15–26.
Huberty, C. J., & Lowman, L. L. (2000). Group overlap as a basis for effect size. Educational and Psychological Measurement, 60, 543–563.
Johnson, N. L., & Kotz, S. (Eds.). (1997). Leading personalities in statistical sciences. New York: Wiley.
Kelley, T. L. (1920). Measurement of overlapping. Journal of Educational Psychology, 11, 458–461.
Kelley, T. L. (1923). Statistical method. New York: Macmillan.
Kelley, T. L. (1935). An unbiased correlation ratio. Proceedings of the National Academy of Sciences, 21, 554–559.
Kirk, R. E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56, 746–759.
Kirk, R. E. (in press). The importance of effect magnitude. In S. F. Davis (Ed.), Handbook of research methods in experimental psychology. Oxford, UK: Blackwell.
Kraemer, H. C., & Andrews, G. (1982). A nonparametric technique for meta-analysis effect size calculation. Psychological Bulletin, 91, 404–412.
Levy, P. (1967). Substantive significance of significant differences between two groups. Psychological Bulletin, 67, 37–40.
MacKenzie, D. A. (1981). Statistics in Britain, 1865–1930. Edinburgh, UK: Edinburgh University Press.
Maxwell, S. E. (2000). Sample size and multiple regression analysis. Psychological Methods, 5, 434–458.
Moore, D. S. (2000). The basic practice of statistics. New York: Freeman.
Morrison, D. E., & Henkel, R. E. (Eds.). (1970). The significance test controversy. Chicago: Aldine.
Oakes, M. (1986). Statistical inference: A commentary for the social and behavioral sciences. Chichester, UK: Wiley.
Olejnik, S., & Algina, J. (2000). Measures of effect size for comparative studies: Applications, interpretations, and limitations. Contemporary Educational Psychology, 25, 241–286.
Pearson, K. (1905). Mathematical contributions to the theory of evolution, XIV: On the general theory of skew correlation and non-linear regression (Drapers’ Company Research Memoirs, Biometric Series II). London: Dulau.
Pearson, K. (1910). On a new method of determining correlation, when one variable is given by alternative and the other by multiple categories. Biometrika, 7, 248–257.
Pearson, K. (1914). On certain errors with regard to multiple correlation occasionally made by those who have not adequately studied this subject. Biometrika, 10, 181–187.
Pearson, K. (1923). On the correction necessary for the correlation ratio, η. Biometrika, 14, 412–417.
Pearson, K., & Lee, A. (1897). On the distribution of frequency (variation and correlation) of the barometric height of divers stations. Philosophical Transactions of the Royal Society of London, 190, 423–469.
Peters, C. C., & Van Voorhis, W. R. (1940). Statistical procedures and their mathematical bases. New York: McGraw-Hill.
Richardson, J. T. E. (1996). Measures of effect size. Behavior Research Methods, Instruments, & Computers, 28, 12–22.
Rosenthal, R. (1994). Parametric measures of effect size. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 231–244). New York: Russell Sage Foundation.
Rosenthal, R., & Rubin, D. B. (1979). A note on percent variance explained as a measure of the importance of effects. Journal of Applied Social Psychology, 9, 395–396.
Rosenthal, R., & Rubin, D. B. (1982). A simple, general purpose display of magnitude of experimental effect. Journal of Educational Psychology, 74, 166–169.
Serlin, R. C. (1982). A multivariate measure of association based on Pillai-Bartlett procedure. Psychological Bulletin, 91, 413–417.
Snyder, P., & Lawson, S. (1993). Evaluating results using corrected and uncorrected effect size estimates. Journal of Experimental Education, 61, 334–349.
Stigler, S. M. (1986). The history of statistics. Cambridge, MA: Belknap.
Stigler, S. M. (1999). Statistics on the table. Cambridge, MA: Harvard University Press.
Tatsuoka, M. M. (1970). Discriminant analysis: The study of group differences. Champaign, IL: Institute for Personality and Ability Testing.
Tatsuoka, M. M. (1973). An examination of the statistical properties of a multivariate measure of strength of association. Final Report to U.S. Office of Education on Contract No. OEG-5-72-0027.
Tatsuoka, M. M. (1993). Effect size. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 461–479). Hillsdale, NJ: Lawrence Erlbaum.
Thompson, B. (2001). Significance, effect sizes, stepwise methods, and other issues: Strong arguments move the field. Journal of Experimental Education, 70, 80–93.
Thompson, B. (2002). “Statistical,” “practical,” and “clinical”: How many kinds of significance do counselors need to consider? Journal of Counseling and Development, 80, 64–71.
Tilton, J. W. (1937). The measurement of overlapping. Journal of Educational Psychology, 28, 656–662.
Vaughan, G. M., & Corballis, M. C. (1969). Beyond tests of significance: Estimated strength of effects in selected ANOVA designs. Psychological Bulletin, 72, 204–213.
Wilkinson, L., & American Psychological Association Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604. [Reprint available from http://apa.org/journals/amp/amp548594.html]
Wilks, S. S. (1932). Certain generalizations of the analysis of variance. Biometrika, 39, 471–494.
Winer, B. J. (1962). Statistical principles in experimental design. New York: McGraw-Hill.
Yates, F. (1951). The influence of Statistical Methods for Research Workers on the development of the science of statistics. American Statistical Association Journal, 46, 19–34.
Yule, G. U. (1900). On the association of attributes in statistics. Philosophical Transactions of the Royal Society, A, 194, 257–319.
83 The Role of Assessment in a Learning Culture Lorrie A. Shepard
This article is about classroom assessment – not the kind of assessments used to give grades or to satisfy the accountability demands of an external authority, but rather the kind of assessment that can be used as a part of instruction to support and enhance learning. On this topic, I am especially interested in engaging the very large number of educational researchers who participate, in one way or another, in teacher education. The transformation of assessment practices cannot be accomplished in separate tests and measurement courses, but rather should be a central concern in teaching methods courses.
The article is organized in three parts. I present, first, an historical framework highlighting the key tenets of social efficiency curricula, behaviorist learning theories, and “scientific measurement.” Next, I offer a contrasting social-constructivist-conceptual framework that blends key ideas from cognitive, constructivist, and sociocultural theories. In the third part, I elaborate on the ways that assessment practices should change to be consistent with and support social-constructivist pedagogy.
The impetus for my development of an historical framework was the observation by Beth Graue (1993) that “assessment and instruction are often conceived as curiously separate in both time and purpose” (p. 291, emphasis added). As Graue notes, the measurement approach to classroom assessment, “exemplified by standardized tests and teacher-made emulations of those tests,” presents a barrier to the implementation of more constructivist approaches to instruction.
Source: Educational Researcher, 29(7) (2000): 4–14.
[Figure 1, three panels.
20th Century Dominant Paradigm (circa 1900s–2000+): Social Efficiency Curriculum; Hereditarian Theory of IQ; Associationist & Behaviorist Learning Theories; Scientific Measurement.
Dissolution of Old Paradigm: New Views of Instruction/Old Views of Testing (circa 1980s–2000+): Instruction; Traditional Testing.
Emergent Paradigm (circa 1990s–2000+): Reformed Vision of Curriculum; Cognitive & Constructivist Learning Theories; Classroom Assessment.]
Figure 1: An historical overview illustrating how changing conceptions of curriculum, learning theory, and measurement explain the current incompatibility between new views of instruction and traditional views of testing
To understand the origins of Graue’s picture of separation and to help explain its continuing power over present-day practice, I drew the chronology in Figure 1. A longer-term span of history helps us see that those measurement perspectives, now felt to be incompatible with instruction, came from an earlier, highly consistent theoretical framework (on the left) in which conceptions of “scientific measurement” were closely aligned with traditional curricula and beliefs about learning. To the right is an emergent, constructivist paradigm in which teachers’ close assessment of students’ understandings, feedback from peers, and student self-assessments would be a central part of the social processes that mediate the development of intellectual abilities, construction of knowledge, and formation of students’ identities. The best way to understand dissonant current practices, shown in the middle of the figure, is to realize that instruction (at least in its ideal form) is drawn from the emergent paradigm, while testing is held over from the past.
Historical Perspectives: Curriculum, Psychology, and Measurement
The historical framework I present here is familiar to you. Yet, it is important to remind ourselves where traditional views of testing came from and to appreciate how tightly entwined these views of testing are with past models of curriculum and instruction – because dominant theories of the past continue to operate as the default framework affecting and driving current practices and perspectives. Belief systems of teachers, parents, and policymakers derive from these old theories. A more elaborated version of the paradigm that has predominated throughout the 20th century can be shown as a set of interlocking circles (Figure 2). The central ideas of social efficiency and scientific management
The Curriculum of Social Efficiency
• Scientific management of schools like factories
• Carefully specified educational objectives based on job analysis
• Utilitarian content, antagonism toward academic content except for elite few
• Science of exact measurement, precise standards
• Differentiated curriculum based on predicted social roles
Hereditarian Theory of Intelligence
• IQ as innate, unitary, and fixed
Associationist & Behaviorist Learning Theories
• Concept of mind replaced by stimulus-response associations
• Accumulation of atomistic bits of knowledge
• Learning tightly sequenced & hierarchical
• Limited transfer, each objective taught explicitly
• Test-teach-test to ensure learning
• Tests isomorphic with learning
• Motivation based on positive reinforcement of many small steps
Scientific Measurement
• IQ tests to sort pupils by ability
• Objective tests to measure achievement
Figure 2: Interlocking tenets of curriculum theory, psychological theories, and measurement theory characterizing the dominant 20th-century paradigm
in the curriculum circle were closely linked, respectively, to hereditarian theories of individual differences and to associationist and behaviorist learning theories. These psychological theories were, in turn, served by scientific measurement of ability and achievement. In the early 1900s, the social efficiency movement grew out of the belief that science could be used to solve the problems of industrialization and urbanization. According to social efficiency theory, modern principles of scientific management, intended to maximize the efficiency of factories, could be applied with equal success to schools. This meant taking F. W. Taylor’s example of a detailed analysis of the movements performed by expert bricklayers and applying similar analyses to every vocation for which students were being prepared (Kliebard, 1995). Then, given the new associationist or connectionist psychology with its emphasis on fundamental building blocks, every step would have to be taught specifically. Precise standards of measurement were required to ensure that each skill was mastered at the desired level. And because it was not possible to teach every student the skills of
every vocation, scientific measures of ability were also needed to predict one’s future role in life and thereby determine who was best suited for each endeavor. For John Franklin Bobbitt, a leader in the social efficiency movement, a primary goal of curriculum design was the elimination of waste (1912), and it was wasteful to teach people things they would never use. Bobbitt’s most telling principle was that each individual should be educated “according to his capabilities.” These views led to a highly differentiated curriculum and a largely utilitarian one that disdained academic subjects for any but college preparatory students. Alongside these curriculum theories, Edward Thorndike’s (1922) associationism and the behaviorism of Hull (1943), Skinner (1938, 1954) and Gagne (1965) conceived of learning as the accumulation of stimulus-response associations. The following quotation from B. F. Skinner is illustrative:

The whole process of becoming competent in any field must be divided into a very large number of very small steps, and reinforcement must be contingent upon the accomplishment of each step. This solution to the problem of creating a complex repertoire of behavior also solves the problem of maintaining the behavior in strength . . .. By making each successive step as small as possible, the frequency of reinforcement can be raised to a maximum, while the possibly aversive consequences of being wrong are reduced to a minimum. (Skinner, 1954, p. 94)

Note that this viewpoint promotes a theory of motivation as well as one of cognitive development. Several key assumptions of the behavioristic model had consequences for ensuing conceptualizations of teaching and testing:
1. Learning occurs by accumulating atomized bits of knowledge;
2. Learning is tightly sequenced and hierarchical;
3. Transfer is limited, so each objective must be explicitly taught;
4. Tests should be used frequently to ensure mastery before proceeding to the next objective;
5. Tests are isomorphic with learning (tests = learning);
6. Motivation is external and based on positive reinforcement of many small steps.
It is no coincidence that Thorndike was both the originator of associationist learning theory and the “father” of “scientific measurement,” a name given him by Ayers in 1918. Thorndike and his students fostered the development and dominance of the “objective” test, which has been the single most striking feature of achievement testing in the United States from the beginning of the century to the present day. Recognizing the common paternity of behaviorist learning theory and objective testing helps us to understand the continued intellectual kinship between one-skill-at-a-time test items and instructional practices aimed at mastery of constituent elements.
New Stone Reasoning Tests in Arithmetic (1908)
1. James had 5 cents. He earned 13 cents more and then bought a top for 10 cents. How much money did he have left? Answer: ________

Sones-Harry High School Achievement Test, Part II (1929)
1. Write "25% of" as "a decimal times." ............ (________)
2. Write in figures: one thousand seven and four hundredths ............ (________)

The Modern School Achievement Tests, Language Usage
1. I borrowed a pen ________ my brother.
a. off
b. off of
c. from

The Barrett-Ryan Literature Test: Silas Marner
1. Dolly Winthrop is:
a. an ambitious society woman.
b. a frivolous girl.
c. a haughty lady.
d. a kind, helpful neighbor.

Examples of True-False Objective Test (Ruch, 1929)
1. Tetanus (lockjaw) germs usually enter the body through open wounds. True False

American History Examination, East High School (Sam Everett and Effey Riley, 1928)
I. Below is a list of statements. Indicate by a cross (X) after it, each statement that expresses a social heritage of the present-day American nation. Place a (0) after each statement that is not a present-day social heritage of the American nation.
1. Americans believe in the ideal of religious toleration. _____
2. Property in land should be inherited by a man's eldest son. _____
3. Citizens should have the right to say what taxes should be put upon them. _____
II. To test your ability to see how an intelligent knowledge of past events helps us to understand present-day situations and tendencies. (Note: Write your answer in essay form on a separate sheet of paper.) State your reasons for every position assumed.
4. Take some economic fact or group of facts in American History about which we have studied and briefly show what seems to you to be the actual significance of this fact in the past, present and future of America.
5. Show this same three-fold relationship using some political fact or facts.
6. Show this same three-fold relationship using a religious fact or facts.

Note: The first four examples are borrowed from Ross (1941); the last two, including the Everett-Riley American History Examination, appeared in Ruch (1929).
Figure 3: Examples from some of the earliest 20th-century “standard” tests and objective-type classroom tests
Looking at any collection of tests from early in the century, as shown in Figure 3, one is immediately struck by how much the questions emphasized rote recall. To be fair, at the time, this was not a distortion of subject matter caused by the adoption of objective-item formats. One hundred years ago, various recall, completion, matching, and multiple-choice test types, along with some essay questions, fit closely with what was deemed important to learn. However, once curriculum became encapsulated and represented by these types of items, it is reasonable to say that these formats locked in a particular and outdated conception of subject matter. The dominance of objective tests in classroom practice has affected more than the form of subject-matter knowledge. It has also shaped beliefs about the nature of evidence and principles of fairness. In a recent assessment project, for example, both teachers and researchers were surprised to find that despite our shared enthusiasm for developing alternatives to standardized tests we nonetheless operated from different assumptions about how “standardized” assessments needed to be in classrooms. More surprising still, it was teachers who held beliefs more consistent with traditional principles of scientific measurement. From the perspective of our teacher colleagues, assessment needed to be an official event, separate from instruction (Bliem & Davinroy, 1997). To ensure fairness, teachers believed that assessments had to be uniformly administered, so they were reluctant to conduct more intensive
individualized assessments with only below-grade-level readers. Because of the belief that assessments had to be targeted to a specific instructional goal, teachers felt more comfortable using two separate assessments for separate goals, “running records” to assess fluency and written summaries to assess comprehension rather than, say, asking students to retell the gist of a story in conjunction with running records. Most significantly, teachers wanted their assessments to be “objective,” and this was the word they used. They worried often about the subjectivity involved in making more holistic evaluations of student work and preferred formula-based methods, such as counting miscues, because these techniques were more “impartial.” Any attempt to change the form and purpose of classroom assessment to make it more fundamentally a part of the learning process must acknowledge the power of these enduring and hidden beliefs.
Conceptual Framework: New Theories of Curriculum, Learning and Assessment
To consider how classroom assessment practices might be reconceptualized to be more effective in moving forward the teaching and learning process, I elaborated the principles of a “social-constructivist” conceptual framework, borrowing from cognitive, constructivist, and sociocultural theories.1 (Though these camps are sometimes warring with each other, I predict that it will be something like this merged, middle-ground theory that will eventually be accepted as common wisdom and carried into practice.) The three-part figure (Figure 4) was developed in parallel to the three-part historical paradigm to highlight, respectively, changes in curriculum, learning theory, and assessment. In some cases, principles in the new paradigm are the direct antitheses of principles in the old. The interlocking circles again are intended to show the coherence and inter-relatedness of these ideas taken together. The cognitive revolution reintroduced the concept of mind. In contrast to past, mechanistic theories of knowledge acquisition, we now understand that learning is an active process of mental construction and sense making. From cognitive theory we have also learned that existing knowledge structures and beliefs work to enable or impede new learning, that intelligent thought involves self-monitoring and awareness about when and how to use skills, and that “expertise” develops in a field of study as a principled and coherent way of thinking and representing problems, not just as an accumulation of information. At the same time, rediscovery of Vygotsky (1978) and the work of other Soviet psychologists led to the realization that what is taken into the mind is socially and culturally determined. Fixed, largely hereditarian theories of intelligence have been replaced with a new understanding that cognitive abilities are “developed” through socially supported interactions. Although
Shepard
The Role of Assessment in a Learning Culture
Reformed Vision of Curriculum
• All students can learn
• Challenging subject matter aimed at higher order thinking & problem solving
• Equal opportunity for diverse learners
• Socialization into the discourse & practices of academic disciplines
• Authenticity in the relationship between learning in and out of school
• Fostering of important dispositions and habits of mind
• Enactment of democratic practices in a caring community

Cognitive & Constructivist Learning Theories
• Intellectual abilities are socially and culturally developed
• Learners construct knowledge and understandings within a social context
• New learning is shaped by prior knowledge and cultural perspectives
• Intelligent thought involves "metacognition" or self monitoring of learning and thinking
• Deep understanding is principled and supports transfer
• Cognitive performance depends on dispositions and personal identity

Classroom Assessment
• Challenging tasks to elicit higher order thinking
• Addresses learning processes as well as learning outcomes
• An on-going process, integrated with instruction
• Used formatively in support of student learning
• Expectations visible to students
• Students active in evaluating their own work
• Used to evaluate teaching as well as student learning
Figure 4: Shared principles of curriculum theories, psychological theories and assessment theory characterizing an emergent, constructivist paradigm
Vygotsky was initially interested in how children learn to think, over time the ideas of social mediation have been applied equally to the development of intelligence, expertise in academic disciplines, and meta-cognitive skills, and to the formation of identity. Indeed, a singularly important idea in this new paradigm is that both development and learning are primarily social processes. These insights from learning theory then lead to a set of principles for curriculum reform. The slogan that “all students can learn” is intended to refute past beliefs that only an elite group of students could master challenging subject matter. A commitment to equal opportunity for diverse learners means providing genuine opportunities for high-quality instruction and “ways into” academic curricula that are consistent with language and interaction patterns of home and community (Au & Jordan, 1981; Brown, 1994; Heath, 1983; Tharp & Gallimore, 1988). Classroom routines and the ways that teachers and students talk with each other should help students gain experience with the ways of thinking and speaking in academic disciplines. School learning should be authentic and connected to the world outside of school not only to make learning more interesting and
Research Design, Measurement and Statistics and Evaluation
motivating to students but also to develop the ability to use knowledge in real-world settings. In addition to the development of cognitive abilities, classroom expectations and social norms should foster the development of important dispositions, such as students’ willingness to persist in trying to solve difficult problems. To be compatible with and to support this social-constructivist model of teaching and learning, classroom assessment must change in two fundamentally important ways. First, its form and content must be changed to better represent important thinking and problem solving skills in each of the disciplines. Second, the way that assessment is used in classrooms and how it is regarded by teachers and students must change. Furthermore, to enable this latter set of changes within classrooms, I argue that teachers need help in fending off the distorting and de-motivating effects of external assessments.
Improving the Content and Form of Assessments
The content of assessments should match challenging subject matter standards and serve to instantiate what it means to know and learn in each of the disciplines. Therefore, a broader range of assessment tools is needed to capture important learning goals and processes and to more directly connect assessment to ongoing instruction. The most obvious reform has been to devise more open-ended performance tasks to ensure that students are able to reason critically, to solve complex problems, and to apply their knowledge in real-world contexts. In addition, if instructional goals include developing students’ metacognitive abilities, fostering important dispositions, and socializing students into the discourse and practices of academic disciplines, then it is essential that classroom routines and corresponding assessments reflect these goals as well. This means expanding the armamentarium for data gathering to include observations, clinical interviews, reflective journals, projects, demonstrations, collections of student work, and students’ self-evaluations, and it means that teachers must engage in systematic analysis of the available evidence. In this article, I do not elaborate further on needed changes in the content and form of assessment primarily because this aspect of reform has received the most attention to date. Although I cannot claim that common practice has moved significantly beyond the end-of-chapter test, there are nonetheless already promising models being developed and used in literacy, mathematics, science, history, and so forth. For example, Pat Thompson (1995) provided the set of questions in Figure 5 to illustrate how non-algorithmic problems can help students “see” a mathematical idea. Two additional open-ended tasks are shown in Figure 6 and serve to illustrate the point that good assessment tasks are interchangeable with good instructional tasks.
a) Can you see 3/5 of something?
b) Can you see 5/3 of something?
c) Can you see 5/3 of 3/5?
d) Can you see 2/3 of 3/5?
e) Can you see 1 ÷ 3/5?
f) Can you see 5/4 ÷ 3/4?
Figure 5: An Example of a set of questions designed to help students visualize part-whole relationships as a way to understand fractions (Thompson, 1995)
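As a reading aid, the arithmetic that items (c) through (f) of Figure 5 are meant to make visible can be written out as follows. These worked values are an illustrative sketch added for the reader and are not part of Thompson's (1995) original figure; the notation is an assumption.

```latex
\begin{align*}
\text{(c)}\quad \tfrac{5}{3}\ \text{of}\ \tfrac{3}{5} &= \tfrac{5}{3}\times\tfrac{3}{5} = 1 \\
\text{(d)}\quad \tfrac{2}{3}\ \text{of}\ \tfrac{3}{5} &= \tfrac{2}{3}\times\tfrac{3}{5} = \tfrac{2}{5} \\
\text{(e)}\quad 1 \div \tfrac{3}{5} &= 1\times\tfrac{5}{3} = \tfrac{5}{3} \\
\text{(f)}\quad \tfrac{5}{4} \div \tfrac{3}{4} &= \tfrac{5}{4}\times\tfrac{4}{3} = \tfrac{5}{3}
\end{align*}
```

The point of the task is that a student who can "see" 3/5 as three of five equal parts can read each of these results directly off a drawing, without invoking an invert-and-multiply rule.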
Grade 4 Mathematics Problem Set (Mathematical Sciences Education Board, 1993)
All of the bridges in this part are built with yellow rods for spans and red rods for supports, like the one shown here. This is a 2-span bridge like the one you just built. Note that the yellow rods are 5 cm long.
[Diagram: a 2-span bridge with the rods labeled "yellow" and "red"]
1. Now, build a 3-span bridge.
a. How many yellow rods did you use? _____
b. How long is your bridge? _____
c. How many red rods did you use? _____
d. How many rods did you use altogether? _____
2. Try to answer these questions without building a 5-span bridge. If you want, build a 5-span bridge to check your answers.
a. How many yellow rods would you need for a 5-span bridge? _____
b. How long would your bridge be? _____
c. How many red rods would you need? _____
d. How many rods would you need altogether? _____
3. Write a rule for figuring out the total number of rods you would need to build a bridge if you knew how many spans the bridge had.

Grade 5 Science Tasks (California Learning Assessment System, 1994)
Fossils
You are a paleontologist (a scientist who studies past life forms). You were digging and just discovered a large group of fossils.
Directions: Open BAG A and spread the fossils on the table. Use the hand lens to carefully observe each fossil. Sort your fossils into groups. You may make as many groups as you like. Write answers to these questions in your journal.
1. Draw your groups. Circle and number each group.
2. How many groups do you have?
3. List the number of each group and tell why you sorted your fossils into these groups.
BAG B has a fossil that was found in the area near where you were digging.
Directions: Open BAG B. Take out the new fossil and compare it with the other fossils on the table.
4. Does this new fossil fit into one of your groups? If YES, how are they alike?
5. If the new fossil does not fit into any of your groups, describe a new group in which this fossil would fit.
6. Choose one of the fossils and draw a picture of it.
7. In what kind of habitat (environment) do you think this fossil might have once lived? Why?
Figure 6: Examples of open-ended assessment tasks intended to engage students in thinking and reasoning about important content
Protecting Classroom Assessment from the Negative Effects of High-Stakes Accountability Testing
The arguments advanced thus far – in support of social-constructivist learning theory, challenging curriculum for all students, and imaginative new forms of assessment – follow closely the rhetoric of standards-based reform. I have avoided using that term, however, because, from the beginning, standards-based reform has additionally placed great faith in externally imposed standards and “tests worth teaching to.” More recently, the standards movement has been corrupted, in many instances, into a heavy-handed system of rewards
and punishments without the capacity building and professional development originally proposed as part of the vision (McLaughlin & Shepard, 1995). Although both large-scale, system-monitoring assessments and classroom assessments could benefit from the same kinds of substantive reform and alignment of content with important learning goals, there is more at stake here than reform of assessment format. If we wish to pursue seriously the use of assessment for learning, which I consider in the next section, it is important to recognize the pervasive negative effects of accountability tests and the extent to which externally imposed testing programs prevent and drive out thoughtful classroom practices. In presenting these ideas to an audience of educational researchers and teacher educators, I used the image of Darth Vader and the Death Star to convey the overshadowing effects of accountability testing. The negative effects of high-stakes testing on teaching and learning are well known (e.g., Madaus, West, Harmon, Lomax, & Viator, 1992). Under intense political pressure, test scores are likely to go up without a corresponding improvement in student learning. In fact, distortions in what and how students are taught may actually decrease students’ conceptual understanding. While some had imagined that teaching to good tests would be an improvement over low-level basic-skills curricula, more recent experiences remind us that all tests can be corrupted. And all can have a corrupting influence on teaching (Whitford & Jones, 2000). Moreover, as Darling-Hammond (1988), McNeil (1988), and others have pointed out, external accountability testing leads to the de-skilling and de-professionalization of teachers, even – in my own state recently – to the denigration of teaching. High-stakes accountability teaches students that effort in school should be in response to externally administered rewards and punishment rather than the excitement of ideas. 
And accountability-testing mandates warn teachers to comply or get out (or move, if they can, to schools with higher scoring students). Again, these ideas are not new. It is likely that teacher educators say something about this litany of complaints in teacher preparation courses. But, what do diatribes against testing teach candidates about more meaningful forms of assessment? Given their own personal histories, our students are able to hate standardized testing and at the same time reproduce it faithfully in their own pre-post testing routines, if they are not given the opportunity to develop and try out other meaningful forms of assessment situated in practice. So we must teach them how to do assessment well. Also, teacher candidates need to find support and a way of protecting their own developing understandings of constructivist assessment practices from the onslaught of test-driven curricula. I have in mind here something like the double-entry teaching that teachers had invented in Linda McNeil’s (1988) study of the Contradictions of Control. In contrast to teachers who trivialized content and taught defensively as a means to control and win compliance
from students, McNeil found that excited and engaging teachers in the magnet schools she studied found ways to resist and hold off the pernicious effects of proficiency testing on their curriculum. Specifically, they helped students keep parallel sets of notes, one set for the real knowledge and one for the knowledge they would need for the test. They did this rather than give over the entire course to the “fragments and facts” required on the test. This is only one example of a strategy for resistance. As I continue next to describe productive ways to use assessment in classrooms, I emphasize the need sometimes to “mark” informal assessment occasions for students as they occur within the normal flow of classroom discourse – because this helps students become self-aware about how assessment can help learning. Similarly, I believe we should explicitly address with our teacher education students how they might cope with the contesting forces of good and evil assessment as they compete in classrooms to control curriculum, time, and student attitudes about learning.
Using Assessment in the Process of Learning
A Learning Culture
Improving the content of assessments is important but not sufficient to ensure that assessment will be used to enhance learning. In this section, I consider the changes in classroom practices that are also needed to make it possible for assessment to be used as part of the learning process. How might the culture of classrooms be shifted so that students no longer feign competence or work to perform well on the test as an end separate from real learning? Could we create a learning culture where students and teachers would have a shared expectation that finding out what makes sense and what doesn’t is a joint and worthwhile project, essential to taking the next steps in learning? I believe that our international colleagues are ahead of us in thinking about the difficulties of making these cultural changes. Sadler (1998) in Australia, for example, writes about “the long-term exposure of students to defective patterns of formative2 assessment” (p. 77). Perrenoud in Switzerland (1991) notes that there are always certain students in a class who are willing to work harder to learn more and who, therefore, go along with formative assessment. But other children and adolescents are “imprisoned in the identity of a bad pupil and an opponent” (p. 92). According to Perrenoud, “every teacher who wants to practice formative assessment must reconstruct the teaching contract so as to counteract the habits acquired by his pupils” (p. 92). Tunstall and Gipps (1996) have studied classrooms in Great Britain where teachers have developed more interactive ways of discussing work and criteria with students as a means to redistribute power and establish more collaborative relationships with students.
To accomplish the kind of transformation envisioned, we have not only to make assessment more informative, more insightfully tied to learning steps, but at the same time we must change the social meaning of evaluation. Our aim should be to change our cultural practices so that students and teachers look to assessment as a source of insight and help instead of an occasion for meting out rewards and punishments. In the paragraphs that follow, I summarize briefly several specific assessment strategies: dynamic assessment, assessment of prior knowledge, the use of feedback, teaching for transfer, explicit criteria, student self-assessment, and evaluation of teaching. Each of these strategies serves a social, motivational purpose as well as a cognitive, informational one. None of these strategies by themselves will be effective if they are not part of a more fundamental shift in classroom practices and expectations about learning.
Dynamic, On-Going Assessment
In order for assessment to play a more useful role in helping students learn, it should be moved into the middle of the teaching and learning process instead of being postponed as only the end-point of instruction. Dynamic assessment – finding out what a student is able to do independently as well as what can be done with adult guidance – is integral to Vygotsky’s idea of a zone of proximal development. This type of interactive assessment, which allows teachers to provide assistance as part of assessment, does more than help teachers gain valuable insights about how understanding might be extended. It also creates perfectly targeted occasions to teach and provides the means to scaffold next steps. Although formal dynamic assessments are assumed to involve an adult working with only one child, these ideas about social mediation of learning can be extended to groups, especially if students are socialized into the ways of talking in a community of practice and become accustomed to explaining their reasoning and offering and receiving feedback about their developing competence as part of a social group. Note that these ideas, based on activity theory and Lave and Wenger’s (1991) concept of legitimate peripheral participation, provide a profoundly different view of motivation from behaviorist reinforcement and create no separation between cognitive and motivational goals. According to Lave and Wenger’s theory, learning and development of an identity of mastery occur together as a newcomer becomes increasingly adept at participating in a community of practice. If one’s identity is tied to group membership, then it is natural to work to become a more competent and full-fledged member of the group.
Prior Knowledge
Prior knowledge and feedback are two well-established ideas, the meaning of which may have to be reexamined as learning theories are changed to
take better account of social and cultural contexts. For example, assessing my prior knowledge using a checklist or pre-test version of the intended end-of-unit test may not be very accurate unless I already have sophisticated experience with the teacher’s measures and conceptual categories. Open discussion or “instructional conversations” (Tharp & Gallimore, 1988) are more likely to elicit a more coherent version of students’ reasoning and relevant experiences and can be a much more productive way for novice teachers to learn about the resources brought by students from diverse communities. In my own experience working in schools, I have noticed two divergent sets of teaching practices that address students’ prior knowledge. First, many teachers rely on a traditional, pretest-posttest design to document student progress, but then do not use information from the pretest in instruction. At the same time, a significant number of teachers, especially in reading and language arts, use prior knowledge activation techniques, such as Ogle’s (1986) KWL strategy, but without necessarily attending to the assessment insights provided. We have a great deal of work to do to develop and model effective assessment strategies, for starting points as well as for other stages of learning. One question we may want to consider is whether assessment should become so much a part of normal classroom discourse patterns that scaffolding and ongoing checks for understanding are embedded (and therefore disguised). Or whether assessment steps should be marked and made visible to students as an essential step in learning. In our efforts to change the culture of the classroom, it may be helpful, at least in the short term, to label prior knowledge activation techniques as instances of “assessment.” What safer time to admit what you don’t know than at the start of an instructional activity?
Feedback
We take it for granted that providing feedback to the learner about performance will lead to self-correction and improvement. For the most part, however, the existing literature on feedback will be of limited value to us in reconceptualizing assessment from a constructivist perspective, because the great majority of existing studies are based on behaviorist assumptions. Typically, the outcome measures are narrowly defined, feedback consists of reporting of right and wrong answers to the learner, and the end-of-study test may differ only slightly from the prior measure and from instructional materials. More promising are studies of scaffolding and naturalistic studies of expert tutoring – but these studies also reveal how much we have to learn about effective use of feedback. For example, Lepper, Drake and O’Donnell-Johnson (1997) found that the most effective tutors do not routinely correct student errors directly. Instead they ignore errors when they are inconsequential to the solution process and forestall errors that the student has made previously by offering hints or asking leading questions. Only when the
forestalling tactic fails do expert tutors intervene with a direct question intended to force the student to self-correct, or they may engage in debugging, using a series of increasingly direct questions to guide the student through the solution process. According to Lepper et al.’s analysis, the tendency of expert tutors to use indirect forms of feedback when possible was influenced by their desire to maintain student motivation and self-confidence while not ignoring student errors. This is a balancing act that new teachers must learn to perform as well.
Transfer
There is a close relationship between truly understanding a concept and being able to transfer knowledge and use it in new situations. In contrast to memorization – and in contrast to the behaviorist assumption that each application must be taught as a separate learning objective – true understanding is flexible, connected, and generalizable. Not surprisingly, research studies demonstrate that learning is more likely to transfer if students have the opportunity to practice with a variety of applications while learning (Bransford, 1979). Although there appears to be disagreement between cognitivists and situativists regarding knowledge generalization (Anderson, Reder, & Simon, 1996), in fact, both groups of researchers acknowledge the importance of being able to use what one has learned in new situations (Bransford, Brown, & Cocking, 1999). Cognitivists focus more on cognitive structures, abstract representations, and generalized principles that enable knowledge use in new situations, while situativists are concerned about “learning to participate in interactions in ways that succeed over a broad range of situations” (Greeno, 1996, p. 3). In working with pre-service teachers, I have suggested that a goal of teaching should be to help students develop “robust” understandings (Shepard, 1997). The term was prompted by Marilyn Burns’s (1993) reference to children’s understandings as being “fragile” – they appear to know a concept in one context but not to know it when asked in another way or in another setting. Sometimes this fragility occurs because students are still in the process of learning and sometimes because the framing of the problem, clues, and other supports available in the familiar context are not available in another. All too often, however, mastery appears pat and certain but does not travel to new situations because students have mastered classroom routines and not the underlying concepts.
To support generalization and ensure transfer, that is, to support robust understandings, “Good teaching constantly asks about old understandings in new ways, calls for new applications, and draws new connections” (Shepard, 1997, p. 27). And good assessment does the same. We should not, for example, agree to a contract with our students which says that the only fair test is one with familiar and well-rehearsed problems.
Explicit Criteria
Frederiksen and Collins (1989) used the term transparency to express the idea that students must have a clear understanding of the criteria by which their work will be assessed. In fact, the features of excellent performance should be so transparent that students can learn to evaluate their own work in the same way that their teachers would. According to Frederiksen and Collins,
The assessment system (should) provide a basis for developing a metacognitive awareness of what are important characteristics of good problem solving, good writing, good experimentation, good historical analysis, and so on. Moreover, such an assessment can address not only the product one is trying to achieve, but also the process of achieving it, that is, the habits of mind that contribute to successful writing, painting, and problem solving (Wiggins, 1989). (Frederiksen & Collins, 1989, p. 30)
Having access to evaluation criteria satisfies a basic fairness principle (we should know the rules for how our work will be judged). More importantly, however, giving students the opportunity to get good at what it is that the standards require speaks to a different and even more fundamental sense of fairness, which is what Wolf and Reardon (1996) had in mind when they talked about “making thinking visible” and “making excellence attainable.”
Self-Assessment
Student self-assessment serves cognitive purposes, then, but it also promises to increase students’ responsibility for their own learning and to make the relationship between teachers and students more collaborative. As Caroline Gipps (1999) has suggested, this does not mean that the teacher gives up responsibility, but that rather, by sharing it, she gains greater student ownership, less distrust, and more appreciation that standards are not capricious or arbitrary. In case studies of student self-evaluation practices in both an Australian and English site, Klenowski (1995) found that students participating in self-evaluation became more interested in the criteria and substantive feedback than in their grade per se. Students also reported that they had to be more honest about their own work as well as being fair with other students, and they had to be prepared to defend their opinions in terms of the evidence. Klenowski’s (1995) data support Wiggins’s (1992) earlier assertion that involving students in analyzing their own work builds ownership of the evaluation process and “makes it possible to hold students to higher standards because the criteria are clear and reasonable” (p. 30).
Evaluation of Teaching
In addition to using assessment to monitor and promote individual students’ learning, classroom assessment should also be used to examine and improve teaching practices. This includes both ongoing, informal assessments of students’ understandings to adjust lessons and teaching plans as well as more formal and critical action-research studies. As I have suggested with other assessment strategies, here again I believe it will be helpful for teachers to make their investigations of teaching visible to students, for example, by discussing with them decisions to redirect instruction, stop for a mini-lesson, and so forth. This seems to be fundamentally important to the idea of transforming the culture of the classroom. If we want to develop a community of learners – where students naturally seek feedback and critique their own work – then it is reasonable that teachers would model this same commitment to using data systematically as it applies to their own role in the teaching and learning process.
Conclusion
In conclusion, let me acknowledge that this social-constructivist view of classroom assessment is an idealization. The new ideas and perspectives underlying it have a basis in theory and empirical studies, but how they will work in practice and on a larger scale is not known. Clearly, the abilities needed to implement a reformed vision of curriculum and classroom assessment are daunting. Being able to ask the right questions at the right time, anticipate conceptual pitfalls, and have at the ready a repertoire of tasks that will help students take the next steps requires deep knowledge of subject matter. Teachers will also need help in learning to use assessment in new ways. They will need a theory of motivation and a sense of how to develop a classroom culture with learning at its center. Given that new ideas about the role of assessment are likely to be at odds with prevailing beliefs, teachers will need assistance to reflect on their own beliefs as well as those of students, colleagues, parents, and school administrators. I am reminded of Linda Darling-Hammond’s (1996) acknowledgement in her presidential address that John Dewey anticipated all of these ideas 100 years ago. But as Cremin (1961) explained, the successes of progressive education reforms never spread widely because such practice required “infinitely skilled teachers” who were never prepared in sufficient numbers to sustain these complex forms of teaching and schooling. So, we are asking a lot of ourselves and others. Nonetheless, we must try again. This vision should be pursued because it holds the most promise for using assessment to improve teaching and learning. To do otherwise means that day-to-day instructional practices will continue to reinforce and reproduce
the status quo. Our goal should be to find ways to fend off the negative effects of externally imposed tests and to develop instead classroom assessment practices that can be trusted to help students take the next steps in learning.
Epilogue
I would be remiss if I did not take this opportunity to provide at least a brief sketch of what we might do concretely to work toward a proposed vision of assessment in the service of learning. Happily for an organization of researchers, I suggest more research – but research of a particular kind embedded in the dilemmas of practice. I also suggest that we develop and pursue an agenda of public education to help policymakers and the general citizenry understand the differences between large-scale, system monitoring tests and what we hope for from teachers on a daily basis.
A Program of Research
To develop effective practices based on social-constructivist perspectives, it will be important to conduct studies in classrooms where instruction and assessment strategies are consonant with this model. In many cases this will mean “starting over again” and not assuming that findings from previous research studies can be generalized across paradigms. For example, as suggested earlier, there are hundreds of studies on feedback but nearly all conform to behaviorist assumptions – instruction is of short duration, posttests closely resemble pretests, feedback is in the form of being told the correct answers, and so forth. New studies will be needed to further our understandings of feedback provided in ways that reflect constructivist principles, for example, as part of instructional scaffolding, assessment conversations, and other interactive means of helping students self-correct and improve. Similarly, the research literature on motivation makes sweeping claims about the risks of evaluating students, especially when they are tackling difficult problems. Yet, these findings are based on students’ experiences with traditional, inauthentic and normative forms of assessment, where students took little responsibility for their own learning, and criteria remained mysterious. If the classroom culture were to be shifted dramatically, consistent with social-constructivist learning perspectives, then the effects of assessing students on difficult problems would have to be reexamined. Thus we face the challenge of trying to find out what works at the same time that we are attempting to create new contexts and new cultural expectations that will fundamentally alter the very relations we are trying to study. We also need to study what makes sense in terms of teacher development and change. Many of the most exciting current assessment projects are being
conducted in classrooms but still have researchers at the helm, taking central responsibility for the development of curriculum, assessment tasks, and technology-based delivery systems. We know that for teachers to make meaningful changes in pedagogical beliefs and accompanying practices, they themselves will need to try out and reflect on new approaches in the context of their own classrooms (Putnam & Borko, 1997). In deference to the enormous constraints on teachers’ time, we should also look for ways to introduce new practices incrementally, for example, to develop a portfolio for one subject area or one curriculum unit before trying to do it in all subject areas.
To consider how particular classroom assessment strategies might be used to create a learning culture as well as improve achievement, teams of teachers in schools might undertake projects aimed at any one of the assessment elements. For example, one team might want to introduce self-assessment and conference with students about how (or whether) self-assessment helps them learn. Another team of teachers might agree to meet regularly to share examples of “assessment insights,” that is, specific occasions when assessment data from a student, written or oral, helped the teacher intervene in a better way because she understood what the student was thinking. Yet another group of teachers might focus on using feedback explicitly to help students make their work better.
When I say that our research efforts should be embedded in the dilemmas of practice, I am echoing the call for more collaborative forms of research advanced in recent reports by the National Research Council (1999) and National Academy of Education (1999) as well as by Alan Schoenfeld (1999) in his presidential address to the AERA.
In contrast to a traditional, linear progression from research to development and dissemination, these authors argue for investing in research projects that would advance fundamental understandings at the same time that they would work to solve practical problems in real-world settings. If researchers and professional educators share responsibility for improving educational outcomes, it is hoped that research will lead to continuous improvement of practice and not require a separate translation phase to be useful. In the context of an agenda for improving classroom assessment, this model for research would mean conducting studies aimed at general explanatory principles regarding prior knowledge, self-assessment, and the like, at the same time that practical issues are addressed such as the initial obstacles of negative student attitudes, time seemingly stolen from instruction, and the inevitable demand for better materials and instructional tasks that elicit the kind of thinking and dialogue envisioned.
A Public Education Agenda
Researchers in the United States have engaged policymakers and the public on the topic of testing but have focused almost exclusively on the features of state and district accountability testing programs – what the content should be, whether there should be high-stakes consequences, and so forth. In contrast,
we have much to learn from assessment experts in the United Kingdom who have pursued a fundamentally different course of action emphasizing the key role of formative assessment in effective teaching. Beginning in 1989, researchers representing England, Northern Ireland, Scotland, and Wales met as a Task Group of the British Educational Research Association and ultimately established themselves as the Assessment Reform Group. The group is concerned with policy issues and has attempted to have a dialogue with policymakers. Although members of the group have been involved with either the development or evaluation of the National Assessment Programme, they “have become more and more convinced of the crucial link between assessment, as carried out in the classroom, and learning and teaching” (Assessment Reform Group, 1999, p. 1). They commissioned a major review of research examining the impact of assessment on students’ learning (Black & Wiliam, 1998a), and they have issued two policy-oriented “little books” summarizing the important tenets of assessment for learning and urging government policies that would give more than lip service to the importance of improving formative assessment (Assessment Reform Group, 1999; Black & Wiliam, 1998b). They have argued for (a) reframing of bureaucratic requirements, such as standards for teacher education and school inspections, to ensure that teachers are skilled assessors of students’ learning; (b) increased funding, especially for teacher professional development; and (c) reducing obstacles, especially the influence of external tests that dominate teachers’ work. Assessment experts in the U.S. should consider whether a similar public education endeavor would be worthwhile and what message we would choose to convey. 
At a minimum, we should try to get beyond the currently popular sound-bite of “instructionally relevant assessment,” because, unfortunately, legislators and school board members have taken up this slogan with the intention that once-per-year accountability testing can be used to diagnose individual student needs. Yes, end-of-year tests can be used to evaluate instruction and even tell us something about individual students; but such exams are like shopping mall medical screenings compared to the in-depth and ongoing assessments needed to genuinely increase learning. By pursuing a public education agenda like that undertaken in the U.K. we could help policymakers understand the limits to what can be accomplished with accountability tests (and thereby fend off their negative effects) and at the same time garner the support and flexibility that teachers and researchers will need to develop powerful examples and to enact more pervasive shifts in classroom practices.
Notes
This article was presented as the presidential address at the 2000 AERA Annual Meeting in New Orleans, LA. The work reported herein was supported in part by grants from the Office of Educational Research and Improvement, U.S. Department of Education, to the Center for Research on
Evaluation, Standards, and Student Testing (CRESST) (Award No. R305B60002) and to the Center for Research on Evaluation, Diversity and Excellence (CREDE) (Award No. R306A60001). The findings and opinions expressed in this article do not reflect the positions or policies of the Office of Educational Research and Improvement or the U.S. Department of Education.
1. A more detailed discussion of this framework and supporting literature review are provided in Shepard (in press).
2. Sadler (1998) uses the term formative assessment to mean assessment “that is specifically intended to provide feedback on performance to improve and accelerate learning” (p. 77). He acknowledges that teachers may have difficulty using feedback in positive ways because of students’ negative coping strategies developed in response to past practices.
References
Anderson, J. R., Reder, L. M., & Simon, H. A. (1996). Situated learning and education. Educational Researcher, 25, 5–11.
Assessment Reform Group. (1999). Assessment for learning: Beyond the black box. Cambridge: University of Cambridge School of Education.
Au, K. H., & Jordan, C. (1981). Teaching reading to Hawaiian children: Finding a culturally appropriate solution. In H. Trueba, G. P. Guthrie, & K. H. Au (Eds.), Culture in the bilingual classroom: Studies in classroom ethnography (pp. 139–152). Rowley, MA: Newbury House.
Ayers, L. P. (1918). History and present status of educational measurements. Seventeenth Yearbook of the National Society for the Study of Education, Part II, 9–15.
Black, P., & Wiliam, D. (1998a). Assessment and classroom learning. Assessment in Education: Principles, Policy, and Practice, 5(1), 7–74.
Black, P., & Wiliam, D. (1998b). Inside the black box: Raising standards through classroom assessment. London: School of Education, King’s College.
Bliem, C. L., & Davinroy, K. H. (1997). Teachers’ beliefs about assessment and instruction in literacy. Unpublished manuscript, University of Colorado at Boulder.
Bobbitt, F. (1912). The elimination of waste in education. The Elementary School Teacher, 12, 259–271.
Bransford, J. D. (1979). Human cognition: Learning, understanding, and remembering. Belmont, CA: Wadsworth.
Bransford, J. D., Brown, A. L., & Cocking, R. R. (1999). How people learn: Brain, mind, experience, and school. Washington, DC: National Academy Press.
Brown, A. L. (1994). The advancement of learning. Educational Researcher, 23, 4–12.
Burns, M. (1993). Mathematics: Assessing understanding. White Plains, NY: Cuisenaire Company of America.
California Learning Assessment System. (1994). A sampler of science assessment – elementary. Sacramento: California Department of Education.
Cremin, L. (1961). The transformation of the school: Progressivism in American education, 1876–1957. New York: Vintage Books.
Darling-Hammond, L. (1988). Accountability and teacher professionalism. American Educator, 12, 8–13.
Darling-Hammond, L. (1996). The right to learn and the advancement of teaching: Research, policy, and practice for democratic education. Educational Researcher, 25, 5–17.
Frederiksen, J. R., & Collins, A. (1989). A systems approach to educational testing. Educational Researcher, 18, 27–32.
Gagne, R. M. (1965). The conditions of learning. New York: Holt, Rinehart & Winston.
Gipps, C. V. (1999). Socio-cultural aspects of assessment. In P. D. Pearson & A. Iran-Nejad (Eds.), Review of Research in Education (Vol. 24, pp. 355–392). Washington, DC: American Educational Research Association.
Graue, M. E. (1993). Integrating theory and practice through instructional assessment. Educational Assessment, 1, 293–309.
Greeno, J. G. (1996, July). On claims that answer the wrong questions. Stanford, CA: Institute for Research on Learning.
Heath, S. B. (1983). Ways with words: Language, life, and work in communities and classrooms. Cambridge: Cambridge University Press.
Hull, C. L. (1943). Principles of behavior: An introduction to behavior theory. New York: Appleton-Century.
Klenowski, V. (1995). Student self-evaluation process in student-centered teaching and learning contexts of Australia and England. Assessment in Education, 2, 145–163.
Kliebard, H. M. (1995). The struggle for the American curriculum: 1893–1958 (2nd ed.). New York: Routledge.
Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation. Cambridge, England: Cambridge University Press.
Lepper, M. R., Drake, M. F., & O’Donnell-Johnson, T. (1997). Scaffolding techniques of expert human tutors. In K. Hogan & M. Pressley (Eds.), Scaffolding student learning: Instructional approaches & issues. Cambridge, MA: Brookline Books.
Madaus, G. F., West, M. M., Harmon, M. C., Lomax, R. G., & Viator, K. A. (1992). The influence of testing on teaching math and science in grades 4–12. Chestnut Hill, MA: Center for the Study of Testing, Evaluation, and Educational Policy, Boston College.
Mathematical Sciences Education Board. (1993). Measuring up: Prototypes for mathematics assessment. Washington, DC: National Academy Press.
McLaughlin, M. W., & Shepard, L. A. (1995). Improving education through standards-based reform: A report of the National Academy of Education panel on standards-based educational reform. Stanford, CA: National Academy of Education.
McNeil, L. M. (1988). Contradictions of control: School structure and school knowledge. New York: Routledge.
National Academy of Education. (1999, March). Recommendations regarding research priorities: An advisory report to the National Educational Research Policy and Priorities Board. New York: New York University.
National Research Council. (1999). Improving student learning: A strategic plan for education research and its utilization. Washington, DC: National Academy Press.
Ogle, D. M. (1986). K-W-L: A teaching model that develops active reading of expository text. The Reading Teacher, 39(6), 564–570.
Perrenoud, P. (1991). Towards a pragmatic approach to formative evaluation. In P. Weston (Ed.), Assessment of pupils’ achievement: Motivation and school success (pp. 77–101). Amsterdam: Swets and Zeitlinger.
Putnam, R. T., & Borko, H. (1997). Teacher learning: Implications of new views of cognition. In B. J. Biddle, T. L. Good, & I. F. Goodson (Eds.), International handbook of teachers and teaching (Vol. 2, pp. 1223–1296). Dordrecht, The Netherlands: Kluwer.
Ross, C. C. (1941). Measurement in today’s schools. New York: Prentice-Hall.
Ruch, G. M. (1929). The objective or new-type examination. Chicago: Scott Foresman.
Sadler, D. R. (1998). Formative assessment: Revisiting the territory. Assessment in Education: Principles, Policy and Practice, 5, 77–84.
Schoenfeld, A. H. (1999). Looking toward the 21st century: Challenges of educational theory and practice. Educational Researcher, 28(7), 4–14.
Shepard, L. A. (1997). Measuring achievement: What does it mean to test for robust understanding? Princeton, NJ: Policy Information Center, Educational Testing Service.
Shepard, L. A. (in press). The role of classroom assessment in teaching and learning. In V. Richardson (Ed.), Handbook of research on teaching (4th ed.). Washington, DC: American Educational Research Association.
Skinner, B. F. (1938). The behavior of organisms: An experimental analysis. New York: Appleton-Century-Crofts.
Skinner, B. F. (1954). The science of learning and the art of teaching. Harvard Educational Review, 24, 86–97.
Tharp, R. G., & Gallimore, R. (1988). Rousing minds to life: Teaching, learning, and schooling in social context. New York: Cambridge University Press.
Thompson, P. W. (1995). Notation, convention, and quantity in elementary mathematics. In J. T. Sowder & B. P. Schappelle (Eds.), Providing a foundation for teaching mathematics in the middle grades (pp. 199–221). New York: State University of New York Press.
Thorndike, E. L. (1922). The psychology of arithmetic. New York: Macmillan.
Tunstall, P., & Gipps, C. (1996). Teacher feedback to young children in formative assessment: A typology. British Educational Research Journal, 22, 389–404.
Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Cambridge, MA: Harvard University Press.
Whitford, B. L., & Jones, K. (2000). Kentucky lesson: How high stakes school accountability undermines a performance-based curriculum vision. In B. L. Whitford & K. Jones (Eds.), Accountability, assessment, and teacher commitment: Lessons from Kentucky’s reform efforts. Albany, NY: State University of New York Press.
Wiggins, G. (1989). A true test: Toward more authentic and equitable assessment. Phi Delta Kappan, 70, 703–713.
Wiggins, G. (1992). Creating tests worth taking. Educational Leadership, 49, 26–33.
Wolf, D. P., & Reardon, S. F. (1996). Access to excellence through new forms of student assessment. In J. B. Baron & D. P. Wolf (Eds.), Performance-based student assessment: Challenges and possibilities (pp. 1–31). Chicago: University of Chicago Press.
84 The Place of Theory in Educational Research1 Patrick Suppes
In every modern society, the education of its citizens, young and old, is a major concern. In some developing countries, the educational activities of the government consume as much as a third of the national budget. In the United States today, it is estimated that educational activities require at least a hundred billion dollars a year. Most educational activities in this country and elsewhere are like other forms of social and economic activity in society in that only a slight effort is made to study the character of the activities and to understand them as intellectual, economic, or social processes. It is true that there has been a longer tradition, even if a fragile one, of studying the character of education, but I think all members of this Association are very much aware that educational research is a minor activity compared with education as a whole. All of us probably feel on occasion that there is little hope that educational research, given the small national effort devoted to it, will have any real impact on education as a whole. Such pessimistic thoughts are not historically, I think, supported by the evidence, especially when we look at the evidence outside of education as well as inside.
By looking outside education I digress for a moment to examine some instances of the impact of science on society. All of the characteristic features of electronic communication and rapid transportation of our society are unique products of the long tradition of science and technology, and the case is especially strong that the changes that have taken place recently, for example, the widespread introduction of color television, have depended in a direct way on prior scientific research.
Source: Educational Researcher, 3 (1974): 3–10.
It might be useful to mention eight outstanding recent cases that have been studied for the National Science Foundation (Battelle Report, 1973), because the listing of these cases gives a better sense of the diversity of important recent contributions to society arising from specific scientific work. The eight cases all represent developments that almost certainly would never have taken place simply on the basis of either enlightened common sense or some approach of bare empiricism. The eight cases range across a variety of scientific theories and technologies and a variety of segments of society in their applications. They are the heart pacemaker; the development of hybrid grains and the green revolution; electrophotography, which led to office copiers or, as we say in ordinary parlance, Xerox machines; input-output economic analysis developed originally in the thirties by Leontief; organophosphorus insecticides; oral contraceptives, which rest on relatively delicate matters of steroid chemistry; magnetic ferrites, which are widely used in communications equipment and computers; and videotape recorders, which depended upon a confluence of electromagnetic and communication theory and the technology of audio recording. Compared with the impact of some of these scientific and technological developments, the initial cost of research and development has been relatively minor. As these examples illustrate, research can have an impact in our society, and it certainly does in many different ways. To a large extent, education pays more lip service to research than do other main segments of the society. Every large school system has as part of its central office staff some sort of research unit. The schools and colleges of education associated with institutions of higher education throughout the country are all charged with research responsibilities, some of which are specifically written into the legislative charter of the institution. 
When the Office of Education was established by federal legislation more than a hundred years ago in 1867, the first section of the Act defined the chief purpose of the new bureau, later called the Office of Education, as one of “collecting such statistics and facts as shall show the condition and progress of education in the several states and territories, and of diffusing information respecting the organization and management of schools and school systems and methods of teaching.” There is not in this charge to the Office of Education a serious thrust of theory, and it is fair to say that most of the efforts of the Office of Education have not been directed toward the nurturing of educational theory, but rather to the more mundane and empirical matters of collecting statistics and facts and of disseminating information about the nation’s schools. The point I am making in leisurely fashion is that for at least a hundred years there has been a serious respect for facts and statistical data about education and also for many empirical studies, often of excellent design and execution, to evaluate the learning of students, the effectiveness of a given method of instruction, and so forth. At least until recently, the empiricism of education has been
more enlightened and sophisticated than the empiricism of medicine, which represents an investment comparable to education in our society. The period running from the beginning of this century to the onset of World War II has sometimes been described as the golden age of empiricism in education. Certainly it was marked by a serious effort to move from a priori dogmas and principles of education to consideration of empirical results and even experimental design of inquiries to test the relative efficiency or power of different approaches to a given part of the curriculum. Detailed analysis of the nature of tests and how to interpret the results was begun, and serious attempts, especially by Edward Thorndike and his collaborators, were made to apply a broad range of results from educational psychology to actual problems of learning in the classroom. Unfortunately, this golden age of empiricism was replaced not by a deeper theoretical viewpoint toward educational research, but by a noticeable decline of research. To some extent, the overenthusiastic empiricism of the 1920s promoted a negative reaction from teachers, administrators, and parents. Opposition to achievement tests, to standardization, and to too much ‘objectivity’ in education became rife. A summary of many of the disappointments in the empirical movement in education may be found in the 1938 Yearbook of the National Society for the Study of Education. Although in many respects John Dewey can be identified with the development of the empirical tradition, it is important to note that his work and that of his close collaborators is not notable for the sophistication of its scientific aspects; Dewey himself, it can properly be said, continually stood on shifting ground in advocating empirical and innovative attitudes toward teaching. 
In fact, one does not find in Dewey the emphasis on tough-minded empirical research that one would like, but rather a kind of hortatory expression of conviction in the value of methods of inquiry brought directly to the classroom, and indeed more directly to the classroom than to the scientific study of what was going on in the classroom. Beginning in the 1950s and especially since Sputnik, we have had a new era of a return to research, and without doubt much valuable work has been done in the last two decades. It is also important to recognize, of course, that much of the thrust for curriculum reform and change in the schools has been bolstered by one form or another of new romanticism untouched by sophisticated consideration of data or facts. This superficial sketch of the historical developments over the past hundred years leads to the conclusion that research, let alone any theoretically oriented research, has occupied almost always a precarious place in education. It might therefore be thought that the proper theme for a presidential address would be the place of research in education and not the more specialized and restricted topic of the place of theory in educational research. However, as the examples I have cited from the National Science Foundation study indicate, there is more than meets the eye on the problems
of developing an adequate body of theory in educational research, and success in developing such a body of theory can impact significantly on the place of research in education. I would like to turn to this question in more detail as my first point of inquiry.
1. Why Theory?
There are five kinds of argument I would like to examine that can be used to make the case for the relevance of theory to educational research. The first is an argument by analogy, the second is in terms of the reorganization of experience, the third is as a device for recognizing complexity, the fourth is a comparison with Deweyean problem solving, and the fifth concerns the triviality of bare empiricism. I now turn to each of these arguments.
Argument by analogy. The success of theory in the natural sciences is recognized by everyone. More recently, some of the social sciences, especially economics and psychology in certain parts, have begun to achieve considerable theoretical developments. It is argued that the obvious and universally recognized importance of theory in the more mature sciences is strong evidence for the universal generalization that theory is important in all sciences, and consequently, we have an argument by analogy for the importance of theory in educational research. However, since at least the eleventh century, when Anselm tried to use an argument by analogy to prove the existence of God, there is proper skepticism that an argument by analogy carries much weight. Although the success of the natural sciences in the use of theory provides an excellent example for educational research, it does not follow that theory must be comparably useful as we move from one subject to the other.
Reorganization of experience. A more important way to think about the role of theory is to attack directly the problem of identifying the need for theory in a subject matter. In all cases where theory has been successful in science I think we can make an excellent argument for the deeper organization of experience the theory has thereby provided. A powerful theory changes our perspective on what is important and what is superficial.
Perhaps the most striking example in the history of physics is the law of inertia, which says that a body shall continue uniformly in its direction of motion until acted upon by some external force. Aristotle and other ancient natural philosophers were persuaded that the evidence of experience is clear: A body does not continue in motion unless it is acted upon by force. We can all agree that our own broad experience is exactly that of Aristotle’s. It was a deep insight and represented a radical reorganization of how to think about the world to recognize that the theory of motion is correctly expressed by laws like that of inertia and seldom by our direct commonsense experience.
A good example in education of the impact of theory on reorganizing our way of thinking about our discipline is the infusion of economic theory that has taken place in the last decade with such vigor and impact. (A good survey is to be found in the two-volume reader edited by Blaug, 1968, 1969.) The attempt, for instance, to develop an economic theory of productivity for our schools can be criticized in many different ways, but it still remains that we have been forced to think anew about the allocation of resources, especially of how we can develop a deeper running theory for the efficient allocation of resources to increase productivity and, at the same time, to develop a better theory for the measurements of input and output and the construction of production functions. Let me give one example from some of my own discussions with economists, especially with Dean Jamison. Starting from the economists’ way of looking at output, it is natural to ask how we can measure the output of an elementary school, for example. What I find striking is the lack of previous discussion of this problem in the literature of education. (Exceptions are Page, 1972, and Page & Breen, 1973.) Even if we restrict ourselves to measurements of academic skills, and indeed only to the academic skills assessed on standard achievement tests, we still have the problem of how to aggregate the measurement of these skills to give us an overall measure of output. If one accepts the fact, as most of us do, that academic achievement alone is not important, but that a variety of social and personal skills, as well as the development of a sense of values and of moral autonomy, are needed, one is really nonplussed by even crude assessments of these individual components. There is, of course, the well-worn answer that the things that matter most are really ineffable and immeasurable, but this romantic attitude is not one for which I have much tolerance. 
I am simply struck in my own thinking by the difficulty of making a good assessment, and my sense of the difficulties has been put in focus by trying to deal with some of the theoretical ideas economists have brought to bear in education. Recognition of complexity. One of the thrusts of theory is to show that what appear on the surface to be simple matters of empirical investigation, on a deeper view, prove to be complex and subtle. The basic skills of language and mathematics at any level of instruction, but primarily at the most elementary level, provide good examples. If we are offered two methods of reading it is straightforward to design an experiment to see whether or not a difference of any significant magnitude between the two methods can be found in the achievement of students. It has been progress in education to recognize that such problems can be studied as scientific problems, and it is a mark of the work of the first half of this century, the golden age of empiricism as I termed it earlier, to firmly establish the use of such methods in education. It is an additional step, however, and one in which the recognition of theory is the main carrier of progress to recognize that the empirical comparison of two methods of teaching reading or of teaching subtraction, to take an example
that has been much researched, is by no means to provide anything like the theory of how the child learns to read or learns to do arithmetic. A most elementary perusal of psychological considerations of information processing shows at once how far we are from an adequate theory of learning even the most elementary basic skills. It is a requirement of theory, but not of experimentalism, to provide analysis of the process by which the child acquires a basic skill and later uses it. It is a merit of theory to push for a deeper understanding of the acquisition and not to rest until we have a complete process analysis of what the child does and what goes on inside his head as he acquires a new skill. The history of physics can be written around the concept of the search for mechanisms ranging from the reduction of astronomical motions to compositions of circular motions in the time of Ptolemy to the gravitational and electromagnetic mechanisms of modern physics. It has been to a partial extent, and should be to a greater extent, a primary thrust of theory in educational research to seek mechanisms or processes that answer the question of why a given aspect of education works the way it does. This should be true whether we consider the individual learning of a child beginning school or the much broader interaction between adolescents, their peer groups, and what is supposed to take place in their high school classrooms. For educational purposes we need an understanding of biosocial mechanisms of influence as much as in medicine we need an understanding of biochemical mechanisms for the control of disease in a host organism. The search beyond the facts for a conception of mechanism or of explanation forces upon us a recognition of the complexity of the phenomena and the need for a theory of this complexity. Why not Deweyean problem solving? 
The instrumental view of knowledge developed by Peirce and Dewey led, especially in the hands of Dewey, to an emphasis on the importance of problem solving in inquiry. As Dewey repeatedly emphasized, inquiry is the transformation of an indeterminate situation that presents a problem into one that is determinate and unified by the solution of the initial problem. Dewey’s conception of inquiry can be regarded as a proper corrective to an overly scholastic and rigid conception of scientific theory, but the weakness of replacing classical conceptions of scientific theory by inquiry as problem solving is that the articulation of the historically and intellectually important role of theory in inquiry is neglected or slighted. In any case, even if we accept some of Dewey’s criticisms of classical philosophical conceptions of theory, we can argue for the importance of the development of scientific theories as potential tools for use in problem solving. It would be a naive and careless view of problem solving to think that on each occasion where we find ourselves in an indeterminate situation we can begin afresh to think about the problem and not to bring to bear a variety of sophisticated systematic tools. This sounds so obvious that it is hard to believe anyone could disagree with it. Historically,
Suppes
The Place of Theory in Educational Research
however, it is important to recognize that under the influence of Dewey educational leadership moved away from development and testing of theory, and Dewey himself did not properly recognize the importance of deep-running systematic theories.2 The newest version of the naive problem-solving viewpoint is to be found in the romantics running from John Holt to Charles Silberman, who seem to think that simply by using our natural intuition and by observing what goes on in classrooms we can put together all the ingredients needed to solve our educational problems. To a large extent these new romantics are the proper heirs of Dewey, and they suffer from the same intellectual weakness – the absence of the felt need for theoretically based techniques of analysis. The continual plague of romantic problem solvers in education will only disappear, as have plagues of the past, when the proper antidotes are developed. My belief about these antidotes is that we need deep-running theories of the kind that have driven alchemists out of chemistry and astrologers out of astronomy. Triviality of bare empiricism. The best general argument for theory in educational research I have left for last. This is the obvious triviality of bare empiricism as an approach to knowledge. Those parts of science that have been beset by bare empiricism have suffered accordingly. It is to be found everywhere historically, ranging from the sections on natural history in the early Transactions of the Royal Society of the seventeenth century to the endless lists of case histories in medicine, or as an example closer to home, to studies of methods of instruction that report only raw data. At its most extreme level, bare empiricism is simply the recording of individual facts, and with no apparatus of generalization or theory, these bare facts duly recorded lead nowhere. They do not provide even a practical guide for future experience or policy. They do not provide methods of prediction or analysis.
In short, bare empiricism does not generalize. The same triviality may be claimed for the bare intuition of the romantics. Either bare empiricism or bare intuition leads not only to triviality, but also to chaos in practice if each teacher is left only to his or her own observations and intuitions. Reliance on bare empiricism or bare intuition in educational practice is a mental form of streaking, and nudity of mind is not as appealing as nudity of body.
2. Examples of Theory in Educational Research
There are good examples of theory in educational research. I want to consider a few and examine their characteristic features. After surveying five main areas in which substantial theories may be found, I turn to the general question of whether we can expect developments of theory strictly within educational research, or whether we should think of educational research as applied science,
drawing upon other domains for the fundamental theories considered, on the model, for example, of pharmacology in relation to biochemistry, or electrical engineering in relation to physics. Statistical design. The bible of much if not most educational research is a statistical bible, and there is little doubt that the best use of statistics in educational research is at a high level. It is sometimes thought by research workers in education that statistical design is simply used in experimental studies and that it does not represent a theoretical component, but I think a more accurate way of formulating the situation is this. When the substantive hypotheses being tested are essentially empirical in character and are not drawn from a broader theoretical framework, then the only theoretical component of the study is the statistical theory required to provide a proper test of the hypotheses. As a broad generalization I would claim that the best-developed theory used in educational research is the theory of statistical design of experiments. The sophisticated level that has been reached in these matters by the latter part of the twentieth century is one of the glories of science in the twentieth century, and the dedication to insisting on proper organization of evidence to make a strong inference has been one of the most creditable sides of educational research over the past fifty years. The opprobrium heaped on matters statistical in educational circles arises, I think, from two main sources. One is that on occasion the teaching traditions have been bad and students have been taught to approach the use of statistics in rote or cookbook fashion, without reaching for any genuine understanding of the inference procedures and their intellectual justification. The second is that the mere use of statistics is not a substitute for good theoretical analysis about the substantive questions at hand.
There is no doubt that excellent statistical methods have been used more than once to test utterly trivial hypotheses that could scarcely be of interest to anyone. Neither of these defects, however, makes a serious case for the unimportance of statistical theory. Test theory. My second example is closely related to the first, but is more specific to educational matters. The educational practice of basing decisions on tests has a long and venerable history, the longest and most continuous history being the examinations for mandarins in China, running from the twelfth century to the downfall of the empire at the end of the nineteenth century. The great traditions of testing in Oxford and Cambridge are famous and in previous years notorious. As tradition has it, students preparing for the Mathematical Tripos at Cambridge worked so intensely and so feverishly that many of them went from the examination room directly to the hospital for a period of recuperation. The position that a man achieved in the Mathematical Tripos at Cambridge in the nineteenth century was one of the most important facts about his entire career. The competitive spirit about examinations for admittance to college or graduate school in this country is not at all a new phenomenon, but rather it
represents an old and established cultural tradition. What is new in this century is the theory of tests. In all of that long history of 700 years of Chinese examinations there seems to have been no serious thought about the theory of such tests or even a systematic attempt to collect data of empirical significance. It is an insight that belongs to this century, and historically will be recorded as an important achievement of this century, to recognize that a theory of tests is possible and has to a considerable extent been developed. By these remarks I do not mean to suggest that the theory of tests has reached a state of perfection, but rather that definite and clear accomplishments have taken place. It is in fact a credit to the theory that many of the more important weaknesses of current tests are explicitly recognized. Certainly the concepts of validity and reliability of tests, and the more specific axioms of classical test theory, represent a permanent contribution to the literature of educational theory. (Lord & Novick’s systematic treatise, 1968, provides a superb analysis of the foundations of the classical theory.) Learning theory. In the March 1974 issue of the Educational Researcher, W. J. McKeachie has an article entitled “The Decline and Fall of the Laws of Learning.” He examines what has happened to Thorndike’s Law of Effect and Law of Exercise, especially in the more recent versions of reinforcement theory advocated by Skinner. McKeachie is right in his analysis of the decline and fall of classical laws of learning, but I think that over the past two decades the specific and more technical development of mathematical models of learning that have not made sweeping claims as being the only laws of learning or as being adequate to all kinds of learning have accomplished a great deal and represent a permanent scientific advance. 
Moreover, the development of mathematical models of learning has not been restricted to simple laboratory situations, but has encompassed results directly relevant to subject-matter learning ranging from elementary mathematics to acquisition at the college level of a second language. It is not to the point in this general lecture to enter into details, but because a good deal of my own research is in this area, I cannot forbear a few more remarks about what has been accomplished. In the case of mathematics, we can give a detailed mathematical theory of the learning of elementary mathematical concepts and skills by students. The details of the theory are a far cry from the early pioneering work of Thorndike. In fact, the mathematical tools for the formulation of detailed theory were simply not available during the time of Thorndike. I would not want to claim that the theories we can currently construct and test are the last word on these matters. The analysis of specific mathematical skills and concepts has been achieved by moving away from the simple-minded conception of stimulus and response found in Skinner’s writings. In a previous paper given to this Association, I criticized in detail some of the things Skinner has had to say about the learning of mathematics (Suppes, 1972). I shall not repeat those criticisms, but rather in the
present context, I shall emphasize the positive and try to sketch the kind of theoretical apparatus that has been added to classical stimulus-response theories of learning in order to have a theory of adequate structural depth to handle specific mathematical concepts and skills. As many of you would expect, the basic step is to postulate a hierarchy of internal processing on the part of the student – processing that must include the handling at least in schematic form of the perceptual format in which problems are presented, whether they are arithmetic algorithms or simple problems of a geometric character. An internal processing language is postulated and the basic mechanism of learning is that of constructing subroutines or programs for the handling of particular concepts and skills (Suppes, 1969b; Suppes & Morningstar, 1972, Ch. 4; Suppes, 1972). There is one important theoretical point about such work that I would like to make, because I think that ignoring this theoretical point represents a major error on the part of some learning psychologists and also of physiological psychologists. The point is that it is a mistake to think of precisely one internal processing language and one particular subroutine for a given skill or concept being learned in the same form by each student. What we can expect in an area like mathematics is behavioral isomorphism, but not internal isomorphism, of subroutines. It is important to think about the theory in this way and not to expect a point-for-point confirmation of the internal programs constructed by the student as he acquires new skills and concepts. To assume that the physiology of human beings is so constructed that we can infer from the physiology how particular tasks are learned and organized internally is as mistaken as to think that from the specification of the physical hardware of a computer we can infer the structure of programs that are written for that computer. 
It is one reason for thinking that the contributions of physiological psychologists to educational psychology are necessarily limited in principle and not simply in practice. This seems to me worth mentioning because currently physiological psychology is the fashion, and if we are not careful we will begin to hear that the next great hope in educational psychology will be the contributions we can expect from physiological psychology. I am making the strong claim that in principle this may not be possible, and that we can proceed independently within educational research to develop powerful theories of learning without dependence on the latest news from neurophysiology. The kind of examples I have sketched for elementary mathematics can also be extended to language skills and to the important problem of reading. Much of my own recent work has been concerned with first- and second-language acquisition, but I shall not try to expand upon these matters except again to say that what is important about current work in these areas is that specific theories of considerable structural depth, using tools developed in logic for semantics and in linguistics for syntax, have been constructed to provide a richness of theory and a potential for subsequent development that
has not existed until the past decade or so (Smith, 1972; Suppes, 1970, 1971, 1974; Suppes, Smith, & Léveillé, 1972). I am sanguine about the possibilities for the future and believe that substantive contributions of importance to education may be expected from learning theory throughout the rest of this century. Theories of instruction. One of the most interesting and direct applications of modern work in mathematical models of learning has been to the burgeoning subject of theories of instruction. A theory of instruction differs from a theory of learning in the following respect. We assume that a mathematical model of learning will provide an approximate description of the student’s learning, and the task for a theory of instruction is then to settle the question of how the instructional sequence of concepts, skills, and facts should be organized to optimize for a given student his rate of learning. My colleague, Richard Atkinson, has been successfully applying such methods for the past several years, and some of the results he has achieved in beginning reading skills are especially striking (Atkinson, 1972, 1974; Atkinson & Paulson, 1972). The mathematical techniques of optimization used in theories of instruction draw upon a wealth of results from other areas of science, especially from tools developed in mathematical economics and operations research over the past two decades, and it would be my prediction that we will see increasingly sophisticated theories of instruction in the near future. Continuing development of computer-assisted instruction makes possible detailed implementation of specific theories in ways that would hardly be possible in ordinary classrooms. The application by Atkinson and his collaborators that I mentioned earlier has this character, and some of my own work in elementary mathematics is of the same sort. 
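The division of labor between a learning model and a theory of instruction can be illustrated with a small sketch. It is not the procedure used by Atkinson or in the work cited here; it assumes, purely for illustration, a simple all-or-none learning model in which an unlearned item becomes learned with probability `c` on each presentation and an unlearned item is answered correctly by guessing with probability `g`. The names `ItemBelief`, `update`, and `choose_next` are hypothetical. The learning model supplies the belief update; the instructional rule then selects the item most likely to be unlearned, which under these assumptions maximizes the expected number of items learned on the next presentation.

```python
from dataclasses import dataclass


@dataclass
class ItemBelief:
    # Current probability that the student has NOT yet learned this item.
    p_unlearned: float


def update(belief: ItemBelief, correct: bool, c: float, g: float) -> ItemBelief:
    """Revise the belief about one item after it has been presented once.

    Assumed all-or-none model (an illustrative assumption, not the source's):
      c = probability an unlearned item becomes learned on a presentation
      g = probability of a correct guess on an unlearned item
    """
    p = belief.p_unlearned
    if correct:
        # Bayes' rule: a correct answer comes from a lucky guess (unlearned)
        # or from an item that was already learned.
        denom = p * g + (1.0 - p)
        posterior = (p * g) / denom if denom > 0 else 1.0
    else:
        # In this model an error can only occur on an unlearned item.
        posterior = 1.0
    # The presentation itself may produce learning with probability c.
    return ItemBelief(posterior * (1.0 - c))


def choose_next(beliefs: dict) -> str:
    """Greedy instructional rule: present the item most likely to be unlearned."""
    return max(beliefs, key=lambda name: beliefs[name].p_unlearned)


if __name__ == "__main__":
    beliefs = {"3 + 4": ItemBelief(0.9), "7 - 2": ItemBelief(0.6)}
    item = choose_next(beliefs)
    beliefs[item] = update(beliefs[item], correct=False, c=0.25, g=0.5)
    print(item, beliefs[item].p_unlearned)
```

The greedy rule is only the simplest possible policy; the optimization techniques mentioned in the text address longer horizons and students whose parameters differ from item to item.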
In the case of the elementary-school mathematics programs, what we have been able to do is to derive from plausible qualitative assumptions a stochastic differential equation describing the trajectory of students through the curriculum, with the constants of the solution of the differential equation corresponding to unique parameters of each individual student (Suppes, Fletcher, & Zanotti, 1973). The fits to data we have achieved in this effort are about as good as any I have ever achieved, and I think we can now speak with confidence in this area of student trajectories in the same spirit that we speak of trajectories of bodies in the solar system. But again, I emphasize that this is only the beginning, and the promise of future developments seems much more substantial. Economic models. As I have already remarked, economists’ vigorous interest in education over the past decade has been one of the most salient features of new theoretical work in educational research. Some of us may not like thinking about education as primarily an investment in human capital, and no doubt the concepts of economics introduced into discussions of educational policy in the past few years are alien to many people in education, including a goodly number of educational researchers. Measurements of productivity, for example, that depend mainly on a measurement of output
that counts only the number of bodies that pass through a given door to receive accreditation rightly raise questions in the minds of many of us, as do other measures the economists use, sometimes with apparently too much abandon. Moreover, the theoretical tools from economics that have been brought to bear in the economics of education are as yet not thoroughly developed. It is too often the case that an economic model for a particular educational process actually consists of nothing more than an empirical linear-regression equation that has little, if any, theoretical justification back of it. (See, for example, the otherwise excellent articles of Chiswick & Mincer, 1972, and Griliches & Mason, 1972.) All the same, it is my feeling that the dialogue that has begun and that is continuing at an accelerated pace between economists and the broad community of educational researchers is an important one for our discipline. The broad global concepts that economists are used to dealing with provide in many respects a good intellectual antidote to the overly microscopic concerns of educational psychology that have dominated much of the research in education in past decades. I do not mean to suggest by this remark that we should eliminate the microscopic research – I have been too dedicated to it myself to recommend anything of the sort – but rather to say that it is good to have both kinds of work underway, and to have serious intellectual concentration on the broad picture of what is happening in our educational system. The sometimes mindless suggestions of outsiders about how priorities in education should be reallocated or how particular functions should be reduced are best met not by cries of outrage, but by sober-minded and careful intellectual analysis of our priorities in allocation of resources.
Economic theory, above all, provides the appropriate tools for such an analysis, and I am pleased to see that a growing circle of educational researchers are becoming familiar with the use of these tools and are spending a good deal of time thinking about their applications in education.
3. Sources of Theory
I promised earlier to examine the more general question of whether theory in educational research is chiefly a matter of applying theories developed in economics, psychology, sociology, anthropology, and other sciences close in spirit to the central problems of education. I firmly believe such applications will continue to play a major role in educational research as they have in the past, but I also resist the notion that theoretically based work in educational research must wait for the latest developments in various other scientific disciplines before it can move forward. Other areas of applied science show a much more complicated and tangled history of interaction between the basically applied discipline and the fundamental discipline nearest to it. Physics is not just applied mathematics, nor is electrical engineering just applied
physics. These disciplines interact and mutually enrich each other. The same can be said for education. In the earlier history of this century it was difficult to disentangle progress in educational psychology from progress in more general experimental psychology, and recently some of the best young economists have claimed the economics of education as the primary area of economics in which they will develop their fundamental contributions. The role of educational researchers should be not merely to test theories made by others, but, when the occasion demands and the opportunity is there, to create new theories as well. Some areas, like the theory of instruction, seem ripe for this sort of development. Another area that I like to call the theory of talking and listening, or what we might call in more standard terms, the theory of verbal communication, seems ripe also for developments special to education, and I do not propose that we wait for linguists and logicians to set us on the right theoretical tracks. What is important is not the decision as to whether the theories should be made at home or abroad, but the positive decision to increase significantly the theory-laden character of our research. Another point needs to be made about these matters of the source of theory. One of the favorite economic generalizations of our time is that this is the age of specialization. Not every man can do everything equally well, as most of us know when faced with the breakdown of a television set or a washing machine or some other modern device of convenience. This same attitude of specialization should be our attitude toward theory. Not everyone should have the same grasp of theory nor the same involvement in its development. Physics has long recognized such a division of labor between experimental and theoretical physics, and I have come to believe that we need to encourage a similar division in educational research. 
Ultimately, the most important work may be empirical, but we need both kinds of workers in the vineyard and we need variety of training for these various workers, not only in terms of different areas of education, but also in terms of whether their approach is primarily theoretical or experimental. It is a mark of the undeveloped character of current educational research that we do not have as much division of labor and specialization of research technique as seems desirable. According to one apocryphal story about the late John von Neumann, he was asked in the early fifties to put together a master list of unsolved problems in mathematics comparable to the famous list given by Hilbert at the beginning of the century. Von Neumann answered that he did not know enough about the various branches of mathematics as they had then developed to provide such a list. I shall be happy when the same kind of developments are found in educational research, and when not only inquiring reporters but also colleagues across the hall recognize that the theoretical work in learning theory, or theories of instruction, or the economics of education, or what have you, is now too richly developed and too intricate to have more than amateur opinions about it.
It is often thought and said that what we most need in education is wisdom and broad understanding of the issues that confront us. Not at all, I say. What we need are deeply structured theories in education that drastically reduce, if not eliminate, the need for wisdom. I do not want wise men to design or build the airplane I fly in, but rather technical men who understand the theory of aerodynamics and the structural properties of metal. I do not want a banker acting like a sage to recommend the measures to control inflation, but rather an economist who can articulate a theory that will be shown to work and who can make explicit the reason why it works (or fails). And so it is with education. Wisdom we need, I will admit, but good theories we need even more. I want to see a new generation of trained theorists and an equally competent band of experimentalists to surround them, and I look for the day when they will show that the theories I now cherish were merely humble way stations on the road to the theoretical palaces they have constructed.
Notes
1. Presidential address to the American Educational Research Association, Chicago, April 17, 1974. Some of the research reported in this article has been supported by National Science Foundation Grant NSFGJ-443X.
2. The most detailed expression of Dewey’s (1938) view of scientific inquiry as problem solving is to be found in his Logic. A critical, but I think not unsympathetic, analysis of this work is to be found in my account of Nagel’s lectures on Dewey’s logic (Suppes, 1969a).
References
Atkinson, R. C. Ingredients for a theory of instruction. American Psychologist, 1972, 27, 921–931. Republished in M. C. Wittrock (Ed.), Changing education: Alternatives from educational research. Englewood Cliffs, N.J.: Prentice-Hall, 1973.
Atkinson, R. C. Teaching children to read using a computer. American Psychologist, 1974, 29, 169–178.
Atkinson, R. C., & Paulson, J. A. An approach to the psychology of instruction. Psychological Bulletin, 1972, 78, 49–61.
Blaug, M. (Ed.) Economics. Vol. 1. Harmondsworth, Middlesex, England: Penguin Books, 1968.
Blaug, M. (Ed.) Economics. Vol. 2. Harmondsworth, Middlesex, England: Penguin Books, 1969.
Chiswick, B. R., & Mincer, J. Time-series changes in personal income inequality in the United States from 1939, with projections to 1985. Journal of Political Economy, 1972, 80, S34–S66.
Dewey, J. Logic, the theory of inquiry. New York: Holt, 1938.
Griliches, Z., & Mason, W. M. Education, income, and ability. Journal of Political Economy, 1972, 80, S74–S103.
Lord, F. M., & Novick, M. R. Statistical theories of mental test scores. New York: Addison-Wesley, 1968.
McKeachie, W. J. The decline and fall of the laws of learning. Educational Researcher, 1974, 3, 7–11.
National Science Foundation, Science, Technology, and Innovation. The place of theory in educational research. Columbus, Ohio: Battelle, Columbus Laboratories, 1973.
Page, E. B. Seeking a measure of general educational advancement: The Bentee. Journal of Educational Measurement, 1972, 9, 33–43.
Page, E. B., & Breen, T. F., III. Educational values for measurement technology: Some theory and data. In W. E. Coffman (Ed.), Frontiers of educational measurement and information systems, 1973. Boston: Houghton Mifflin, 1973.
Smith, R. L. The syntax and semantics of ERICA. (Tech. Rept. No. 185) Stanford, Calif.: Institute for Mathematical Studies in the Social Sciences, Stanford University, 1972.
Suppes, P. Nagel’s lectures on Dewey’s logic. In S. Morgenbesser, P. Suppes, & M. White (Eds.), Philosophy, science, and method. New York: St. Martin’s Press, 1969. (a)
Suppes, P. Stimulus-response theory of finite automata. Journal of Mathematical Psychology, 1969, 6, 327–355. (b)
Suppes, P. Probabilistic grammars for natural languages. Synthese, 1970, 22, 95–116. Republished in D. Davidson & G. Harman (Eds.), Semantics of natural language. Dordrecht, Holland: Reidel, 1972.
Suppes, P. Semantics of context-free fragments of natural languages. (Tech. Rept. No. 171) Stanford, Calif.: Institute for Mathematical Studies in the Social Sciences, Stanford University, 1971. Republished in K. J. J. Hintikka, J. M. E. Moravcsik, & P. Suppes (Eds.), Approaches to natural language. Dordrecht, Holland: Reidel, 1973.
Suppes, P. Facts and fantasies of education. Phi Delta Kappa Monograph, 1972. Republished in M. C. Wittrock (Ed.), Changing education: Alternatives from educational research. Englewood Cliffs, N.J.: Prentice-Hall, 1973.
Suppes, P. The semantics of children’s language. American Psychologist, 1974, 29, 103–114.
Suppes, P., Fletcher, J. D., & Zanotti, M. Models of individual trajectories in computer-assisted instruction for deaf students. (Tech. Rept. No. 214) Stanford, Calif.: Institute for Mathematical Studies in the Social Sciences, Stanford University, 1973.
Suppes, P., & Morningstar, M. Computer-assisted instruction at Stanford, 1966–68: Data, models, and evaluation of the arithmetic programs. New York: Academic Press, 1972.
Suppes, P., Smith, R., & Léveillé, M. The French syntax and semantics of PHILIPPE, Part I: Noun phrases. (Tech. Rept. No. 195) Stanford, Calif.: Institute for Mathematical Studies in the Social Sciences, Stanford University, 1972.
85
Curriculum-based Measures: Development and Perspectives
Stanley L. Deno
Curriculum-based measurement (CBM) (Deno, 1985) is an approach to measuring the academic growth of individual students. The essential purpose of CBM has always been to aid teachers in evaluating the effectiveness of the instruction they provide to individual students. However, research and development on CBM has extended it to educational decisions well beyond those for which it was originally created. Thus, the early work on improving the effectiveness of special education for students with learning disabilities has been expanded to screening and identification of students who are at risk for academic failure, developing schoolwide accountability systems, addressing the problem of disproportionate representation, evaluating growth in early childhood, assessing attainments in content area learning, measuring literacy in students who are deaf, assessing students who are English language learners (ELL), and predicting success on high-stakes assessments. This article presents a brief history of the development of CBM and reflections on current efforts to use CBM to address a variety of educational problems.
Development of CBM
CBM originated in the data-based program modification (DBPM) model (Deno & Mirkin, 1977), which outlined how a variety of progress monitoring
Source: Assessment for Effective Intervention, 28(3–4) (2003): 3–12.
data could be used to make educational programming decisions for students in special education. The DBPM model was designed to be used by special education resource teachers in improving their interventions with students who were struggling academically. While the model showed how the data could be used to make decisions for students in special education, its validity as an approach for improving special education had not been established empirically. To explore the validity of DBPM, an empirical research and development program was conducted for six years through the federally funded Institute for Research on Learning Disabilities (IRLD) at the University of Minnesota. The ultimate question pursued through the research program was whether a formative evaluation system could be developed for use by teachers to improve their effectiveness in teaching students with academic disabilities. Ultimately, a comparative study demonstrated that teachers were more effective when using such a model (Fuchs et al., 1984). In the course of conducting the CBM research program through the IRLD, a set of generic progress monitoring procedures was developed that met conventional reliability and validity criteria in the areas of reading, spelling, and written expression. Three key questions were addressed: (a) What are the outcome tasks on which performance should be measured? (“What to measure”); (b) How must the measurement activities be structured to produce technically adequate data? (“How to measure”); and (c) Can the data be used to improve educational programs? (“How to use”). The questions were answered through systematic examination of three key issues relevant to each – the technical adequacy of the measures, the treatment validity or utility of the measures, and the logistical feasibility of the measures. The framework for developing the measures has been specified elsewhere and will not be included here (Deno & Fuchs, 1987).
The results of the research on progress monitoring led to the development of an assessment approach typically referred to as CBM (Deno, 1985). Extensive field applications of research illustrating “what to measure,” “how to measure,” and “how to use” the data have occurred and are described in a variety of publications (cf. Fuchs, Fuchs, & Maxwell, 1988; Shinn, 1989, 1998). In summary, these studies resulted in basic skills measures that are now widely used to improve educational decisions in a variety of contexts.
CBM and CBA
The term curriculum-based assessment (CBA) became popular in the field of special education with the publication of a special issue of Exceptional Children on that topic (Tucker, 1985). In that issue, Tucker described CBA as a practice that had existed for a long time – the practice of using what is to be learned as the basis for assessing what has been learned. While his description is appealing, it does not clearly distinguish CBA from traditional
Deno
Curriculum-based Measures
psychometric test construction where a table of specifications is used to define the content domains of a test and the tests are then designed to test whether that intended content has been learned. Four salient differences between CBA and traditional psychometric testing can be identified: (a) in CBA, the very curriculum materials that serve as the media for instruction are used as the test stimuli; (b) direct observation and recording of student performance in response to selected curriculum materials are emphasized as a basis for collecting information that is used to make assessment decisions; (c) interobserver agreement is the primary technique used to establish the reliability of information collected through CBA; and (d) social validity is typically the basis for justifying the use of information gathered through CBA. Given these emphases, it is common for CBA proponents to argue that the information gathered from student performance in the curriculum more adequately reflects the real goals of instruction in the classroom than most standardized achievement tests because it relates more directly to what is being taught. Further, the content and materials of daily instruction are viewed as a fairer and firmer basis for making judgments about student learning.
CBM as Distinct from CBA
Since the focus of this article is on CBM, distinguishing between CBM and CBA is necessary. The term assessment as used in CBA is very broad, referring to information gathered for purposes of decision-making. Thus, curriculum-based assessment is often used to refer to any information-gathering practices that occur when obtaining information about student performance in the curriculum. Such practices can include scoring a student’s worksheets to determine the percentage of questions answered correctly; doing an error analysis of a student’s oral reading from instructional text; or establishing “mastery” of a new skill based on performance on an end-of-unit test. In CBA, typically, different assessment information is collected for different decisions. A variety of different, but related, approaches to CBA are represented in the current literature (e.g., Bigge, 1988; Howell et al., 1993; Idol, Nevin, & Paolucci-Whitcomb, 1986; Shinn, 1989).
“Measurement” rather than “Assessment”
From the perspective provided here, CBM is a separate and distinct subset of CBA procedures. As such, it refers to a specific set of procedures for measuring student growth in basic skills developed at the University of Minnesota through the Institute for Research on Learning Disabilities (Deno, 1985). The procedures were developed as part of a larger program of research directed toward designing a practically feasible and effective formative evaluation
system that special education teachers could use to build more effective instructional programs for their students. As part of that formative evaluation system it was necessary to create a simple, reliable, and valid set of measurement procedures that teachers could use to frequently and repeatedly measure the growth of their students in the basic skills of reading, spelling, and written expression. As with CBA, the measurement procedures of CBM become “curriculum-based” when they are used within the context of the local school’s curriculum.
CBM and “General Outcome Measurement”
As continued development of CBM has occurred, evidence has been generated leading to the conclusion that the generic measurement procedures of CBM can provide technically adequate and instructionally relevant data using stimulus materials drawn from sources other than a school’s curriculum (Fuchs & Deno, 1994). For that reason, the terms general outcome measurement (GOM) (Fuchs & Deno, 1994) and dynamic indicators of basic skills (DIBS) (Shinn, 1995, 1998) have been coined to refer to the generic measurement procedures used with stimulus materials that are not drawn from the curriculum. This “uncoupling” of CBM from the local school’s curriculum has made it increasingly possible, both in research and practice, to capitalize on using standardized stimulus materials without the loss of the relevance of CBM for making everyday instructional programming decisions. Further, it has enabled extensions of CBM to areas of skill development where schools do not always have a curriculum (e.g., secondary reading and written expression – Espin & Deno, 1993; Espin, Scierka, Skare, & Halverson, 1999; early literacy – Kaminski & Good, 1996; English language learning – Baker & Good, 1995). It has also facilitated development of computer-based applications (Fuchs et al., 1993), enabled aggregation of data across schools to make district-level evaluation decisions (Marston & Magnusson, 1988), been used as a component of effective classroom intervention packages (Fuchs et al., 1997), and opened new avenues to assessing reading and writing in students who are deaf and hard of hearing (Chen, 2002; Devenow, 2002). Further examples of the development and extensions of the generic CBM procedures for measurement are illustrated in other articles in this special issue.
An Example of CBM
A distinctive characteristic of CBM when used to improve individual student programs is the individual student progress graph which illustrates the responsiveness of a student to various program modifications made by the teacher. Figure 1 shows the results of using CBM procedures with a student
in reading over the course of a single school year. Specifically, we see the results of repeated sampling of student performance in reading different passages from the same book throughout the year. The number of words read aloud correctly in one minute from those passages is plotted on the graph. In addition, we see how performance changes in relation to changes made in that student’s program. It is this continuous evaluation framework that was tested by Fuchs, Deno, and Mirkin (1984) and found effective in accelerating student growth.
[Figure 1: CBM formative evaluation graph. Vertical axis: CBM Growth Score (Reading), 0 to 140; horizontal axis: Weeks, 1 to 25. Labels on the graph: PreSpEd, Prereferral Intervention, SpEd Resource Program, Goal, Peer Level. Legend: correct, incorrect.]
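The decision logic behind a progress graph of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the studies cited: the weekly scores, the baseline, the goal, and the function names `weekly_growth` and `aimline_slope` are all hypothetical, and an ordinary least-squares slope is assumed as the estimate of growth.

```python
from statistics import mean


def weekly_growth(scores):
    """Least-squares slope of scores over weeks 1..n (words correct gained per week)."""
    weeks = range(1, len(scores) + 1)
    w_bar, s_bar = mean(weeks), mean(scores)
    num = sum((w - w_bar) * (s - s_bar) for w, s in zip(weeks, scores))
    den = sum((w - w_bar) ** 2 for w in weeks)
    return num / den


def aimline_slope(baseline, goal, weeks_to_goal):
    """Weekly gain needed to move from the baseline score to the goal in the time available."""
    return (goal - baseline) / weeks_to_goal


# Hypothetical data: words read aloud correctly in one minute, one probe per week.
scores = [20, 22, 21, 25, 27, 28]

observed = weekly_growth(scores)
needed = aimline_slope(baseline=20, goal=80, weeks_to_goal=25)

# If observed growth falls short of the aimline, the graph signals a program change.
on_track = observed >= needed
print(f"observed {observed:.2f}/week, needed {needed:.2f}/week, on track: {on_track}")
```

In this made-up case the observed slope is below the slope of the aimline, which is the situation in which a teacher using the graph would consider modifying the student's program.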
Characteristics of CBM
Repeated Measurement on a Single Task
The generic measurement procedures used in CBM and GOM are based on obtaining repeated samples of student performance on equivalent forms of the same task across time. Changes in performance on this task are then interpreted to reflect generalizable change in a student’s proficiency. The procedures are analogous to what occurs when we measure changes in a child’s height and weight using a scale or ruler. The concept is simple, but it is uncharacteristic of education. From the outset, CBM development created a system that allows teachers to focus clearly on the target of their instruction, based on the assumption that successful intervention requires that teachers receive clear and unambiguous feedback about the general effects of their instructional efforts.
If teachers are either (a) uncertain about the overall effects of their efforts or (b) believe they have been successful simply because a student learns the specific content that has been taught, their efforts to improve growth will be unsuccessful. The uncertainty present in (a) can stem from the fact that teachers do not have “vital sign” indicators for learning, such as pulse rate and temperature, that they can use to monitor the effects of their instruction in basic skills outcomes. In many respects teachers must operate like early pilots who had to resort to feel; that is, to “flying by the seat of their pants” because instruments to indicate aircraft altitude and attitude had not yet been developed. Unfortunately, without instrumentation it was possible for pilots to believe they were flying straight and level when, in fact, they were headed directly toward the ground. Similarly, it is possible for teachers to believe that they have been successful because a student has learned what has been taught. The misfortune in this case is that it is possible to successfully teach something that might not contribute to developing the overall proficiency for which the curriculum is designed. The problem is similar to what occurs when taking a golf lesson and learning to do what the instructor has taught, but then finding, much to our disappointment, that we play no better on the golf course. What we teach in an area like reading, and what students learn from that instruction, does not always contribute to general improvements in reading. The measurement tasks of CBM are empirically selected and, therefore, reflect whether the instruction we are providing does, in fact, result in improvement in general reading outcomes.
Empirically Selected Tasks
In developing CBM procedures, a two-part strategy is used to identify tasks that teachers can reasonably use to evaluate their instruction. The first part of the strategy – initial task selection – is based on research using a criterion validity paradigm to select tasks that would seem to be the best candidates for repeated performance measurement (Deno, 1985; Fuchs & Maxwell, 1988; Marston, 1989). The second part of the task selection strategy is to test the instructional utility of the measures by evaluating the student achievement effects of teachers using the CBM data to make instructional evaluation decisions (Fuchs, Deno, & Mirkin, 1984; Fuchs et al., 1989).
Reliability
A related consideration in empirical task selection is that CBM data are used to make important instructional intervention decisions. For that reason, in addition to meeting the criterion validity requirement, the tasks selected for use in CBM are those for which reliable measures could be constructed (Marston, 1989).
Establishing the reliability of CBM always includes not only inter-observer agreement, but test/retest and alternate form reliabilities as well. The latter reliability is particularly important since multiple forms are used in CBM.
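Two of the reliability checks named here can be illustrated with a short Python sketch. The data and the function names `percent_agreement` and `pearson_r` are hypothetical, and the choice of simple percent agreement and a Pearson correlation is an assumption made for illustration; the source does not specify which agreement or correlation statistics were used.

```python
from math import sqrt


def percent_agreement(obs_a, obs_b):
    """Percentage of scored words on which two observers gave the same score."""
    if len(obs_a) != len(obs_b):
        raise ValueError("both observers must score the same words")
    matches = sum(a == b for a, b in zip(obs_a, obs_b))
    return 100.0 * matches / len(obs_a)


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical word-by-word scoring of one passage (1 = read correctly, 0 = error).
observer_1 = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
observer_2 = [1, 1, 0, 1, 0, 1, 0, 1, 1, 1]
agreement = percent_agreement(observer_1, observer_2)  # 9 of 10 words match

# Hypothetical words-correct scores of five students on two alternate passages.
form_a = [42, 55, 61, 38, 70]
form_b = [45, 53, 64, 40, 68]
alternate_form_r = pearson_r(form_a, form_b)

print(f"inter-observer agreement: {agreement:.1f}%")
print(f"alternate-form correlation: {alternate_form_r:.2f}")
```

The same `pearson_r` function would serve for a test/retest check by passing scores from two administrations of the same form.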
Economical and Efficient
A number of additional characteristics considered in developing CBM procedures relate to the need for them to be logistically feasible within the context of ongoing instruction, as listed below.
Time efficient. Since frequent, repeated measurements are required for growth measurement and evaluation, CBM tasks must be of short duration.
Multiple forms. Each repeated measurement of CBM must be in response to a stimulus task that is unfamiliar to the student so that any increase in performance represents real growth in general proficiency rather than the effects of practice. Thus, for any task used, it must be simple to create many equivalent forms.
Inexpensive. Since many forms must be made available for teachers to use frequently, the task must be one that would not require expensive production of materials.
Easy to teach. Since it is likely that many teachers, paraprofessionals, and students will administer the measures, the task must be one that can be easily taught.
Issues in Implementing CBM
Establishing parameters such as these in task selection for CBM has always been important because it delimits the range and variety of tasks included in any search for valid indicators of basic proficiency. In addition, specifying the characteristics of a practically feasible task on which to do frequent, repeated measurement enables the developer to focus criterion validity research on only those tasks that could reasonably be part of a classroom-based, ongoing formative evaluation system. Unfortunately, these reasons for limiting task selection have not always been fully understood or appreciated by many on their first encounter with CBM. Thus, both potential users and developers of alternative measures may increase the complexity and scope of measures intended to assess curriculum outcomes, resulting in impractical measures that cannot be used as part of routine classroom instruction. Paradoxically, the fact that CBM procedures do not require tasks for measurement that seem sufficiently complex can mislead critics into believing that the CBM measures are invalid (Shinn, 1998). A good illustration of this problem may be found in reading where the evidence has been developed
that, when structured properly, reading aloud from text can be used to develop a global indicator of reading proficiency (Deno, Mirkin, & Chiang, 1982). The major criticism of measuring reading by having students read aloud from connected discourse is that such a task does not reflect a student’s comprehension of text. On technical grounds this criticism is invalid (Fuchs et al., 1988; Good & Jefferson, 1998). The criterion validity research on using this task in reading measurement provides a solid empirical basis for concluding that the number of words read aloud correctly from text in a 1-minute time sample is a good indication of a student’s general reading proficiency. CBM reading scores relate sensibly to standardized achievement test scores, to students’ age and grade, to teachers’ judgments of reading proficiency, and to placements in regular, compensatory, and special education programs. Nevertheless, critics will argue that CBMs in reading should include a “direct measure of comprehension” such as answering comprehension questions or retelling the story that has been read. While it is possible to argue on empirical grounds that reading aloud from text indexes comprehension better than most so-called “direct measures” (Fuchs et al., 1988), it is important to clarify that tasks such as answering comprehension questions or retelling the story do not meet the requirements for CBM tasks outlined above. To use either task would (a) consume far too much time to be used for repeated measurement in CBM (students would have to read fairly lengthy passages so that question asking or story retelling would be possible); (b) cost too much in the development of multiple equivalent forms; and (c) as in the case of story retell, be difficult to teach others to score reliably. Thus, while these tasks have been used as criterion measures in CBM task selection, they must be excluded as candidates for repeated measurement on other important grounds.
As CBM developers have painfully learned, however, neither empirically nor technologically valid reasons are enough to persuade many people. In a study of the barriers to successful use of CBM (Yell, Deno, & Marston, 1992), “face validity” issues stood out as among the most important concerns for teachers. In their survey, Yell et al. also found interesting differences between teachers and administrators. That is, teachers focused on the immediate impact of using CBM on a frequent basis and expressed concern about the additional time required in conducting CBM. In fact, three of the five barriers frequently identified by teachers refer to time-associated problems – despite the efficient nature of the measures. The administrators’ views of problems associated with implementing CBM were quite different from those of teachers. The emphasis in their responses was that it was difficult to develop effective teacher use of the CBM procedures. Three of the five barriers most frequently identified by administrators addressed difficulties related to a lack of teacher resourcefulness in using the CBM data responsively to modify and evaluate their instruction. Of interest is the fact that the single most frequently identified barrier from the administrators’ perspective was the
“natural resistance” that occurred when any change in practice was required of school personnel.
Reflections on CBM in the Broader Context of Assessment
The results of the CBM research program have provided a basis for developing standardized measurement procedures that can be used to formatively evaluate the effects of modifications in the instructional programs for individual students. Indeed, the research conducted on student achievement effects of special education teachers using these procedures provides a basis for concluding that instructional effectiveness can be improved through the use of CBM in formative evaluation. At the same time, the CBM procedures have been used to data-base the full range of intervention decisions that are made for students who are academically “at risk.” In addition, CBM/GOM/DIBS are being used to address the problem of disproportionate representation (Minneapolis Public Schools, 2001) in a problem-solving model that emphasizes prereferral intervention evaluation (Shinn, 1995; Tilly & Grimes, 1998); to appraise growth in early childhood (Kaminski & Good, 1996); to assess attainments in content area learning (Espin & Foegen, 1996); and to predict success on high stakes assessments (e.g., Deno, Reschly-Anderson, Lembke, Zorka, & Callender, 2002; Good, Simmons, & Kameenui, 2001). Developments in using CBM procedures have accelerated dramatically in the past five years. For example, textbooks now routinely include descriptions of how CBM is used in both assessment and remediation (Henley, Ramsey, & Algozzine, 2002; Mercer, 1997; Spinelli, 2002; Taylor, 2000), and dissemination is extensive – much of it likely due to the functional utility of the measures. In addition, the generic nature of the procedures may have allowed a wide range of potential users and developers to feel “ownership” over both the procedures and the data collected, setting the standardized procedures of CBM apart from most standardized tests, which are the commercial property of test developers and test publishers.
It will be interesting to track the relative use of growth measures like CBM and status measures like commercial standardized tests. Little work has been done in the private sector to develop progress-monitoring systems. The reason is unclear but probably stems from the fact that development of educational and psychological measurement in the United States has been directed toward discriminating between individuals for purposes of classification. That is, to describe differences between individuals rather than differences within an individual across time. Differences between individuals are important when the primary function of assessment is to sort individuals into groups for making selection decisions rather than to examine individual growth. Those of us working in school programs are very aware that assessments commonly are conducted to
classify students as eligible for alternative programs like special education, Title I, and gifted education. In all of these cases, the decision to be made has rested on distinguishing the relative achievements or accomplishments of a subgroup of students within the general student population. Since the economic and social consequences of these decisions are potentially very important, it is not surprising that responsible decision makers would seek assessment procedures that discriminate and quantify differences between individuals as justification for these decisions. Interest in examining individual performance to ascertain attainment of “standards” is increasing. Criterion performance on particular tasks is gaining prominence in the view of decision makers. Important also in this shift to alternative approaches to performance assessment is not only the increased emphasis on criterion performance but also the nature of the tasks selected for assessment purposes. Authenticity has become the prime characteristic to be embraced when tasks are selected, and, for that reason, face validity has now become paramount in task selection. Indeed, the argument is that authenticity and face validity can take the place of the more traditional reliability and validity criteria of psychometrics. If we are interested in developing CBM procedures for continued use in student progress monitoring, we must see the recommendations associated with alternative approaches to assessment as helpful. Contained in those recommendations is an emphasis on individual attainment that is at the basis of progress monitoring. Discriminating growth relative to a performance standard is an important shift away from the emphasis on making distinctions between individuals. At the same time we should not be sanguine about the possibility that the focus will now become individual growth rather than sorting and classifying students.
Indeed, those of us concerned with the education and habilitation of people with disabilities have already seen that the emphasis on attaining performance standards has resulted in a tendency to exclude such persons from the assessment process. A second concern is that the race to develop alternatives has resulted in expectations far exceeding reality. Establishing authenticity and instructional utility as characteristics for assessment are admirable ideals, but just as developing a cure for cancer requires more than specifying the goal, developing assessment procedures with particular characteristics requires more than asserting their importance. Contrary to assumptions currently made by advocates of “authentic assessment,” the technical knowledge required for accomplishing our goals is neither available nor unnecessary. Any reading of the current literature on the results of efforts to develop and use new alternative approaches to assessment reveals that the effort is fraught with difficulty. Many years ago, Jerome Bruner (1965) argued that achievements in developing technology that increases our powers of observation are the basis of most of our greatest scientific achievements. If that is so, the development of improved procedures for assessing individual growth may well result in
breakthroughs that increase our knowledge of human development and our success in optimizing that development. Most certainly, breakthroughs in assessment technology that expand our knowledge in the long run will result from the types of intense research and development efforts presented in this special issue rather than from engaging in the politics of education.
References
Baker, S. K., & Good, R. H. (1995). Curriculum-based measurement of English reading with bilingual Hispanic students: A validation study with second-grade students. School Psychology Review, 24, 561–578.
Bigge, J. (1988). Curriculum-based instruction. Mountain View, CA: Mayfield Publishing Co.
Bruner, J. S. (1965). On Knowing: Essays for the left hand. New York: Atheneum.
Chen, Y. (2002). Assessment of reading and writing samples of deaf and hard of hearing students by curriculum-based measurements. Unpublished doctoral dissertation. University of Minnesota.
Deno, S. L. (1985). Curriculum-based measurement: The emerging alternative. Exceptional Children, 52, 219–232.
Deno, S. L., & Fuchs, L. S. (1987). Developing curriculum-based measurement systems for data-based special education problem solving. Focus on Exceptional Children, 19(8), 1–15.
Deno, S. L., & Mirkin, P. K. (1977). Data-based program modification: A manual. Reston, VA: Council for Exceptional Children.
Deno, S. L., Mirkin, P., & Chiang, B. (1982). Identifying valid measures of reading. Exceptional Children, 49(1), 36–45.
Deno, S. L., Reschly-Anderson, A., Lembke, E., Zorka, H., & Callender, S. (2002). A model for schoolwide implementation: A case example. Presentation at the National Association of School Psychologists Annual Meeting, Chicago.
Devenow, P. S. (2002). A study of the CBM maze procedure as a measure of reading with deaf and hard of hearing students. Unpublished doctoral dissertation. University of Minnesota.
Espin, C. A., & Deno, S. L. (1993). Content-specific and general reading disabilities of secondary-level students: Identification and educational relevance. Journal of Special Education, 27, 321–337.
Espin, C. A., & Foegen, A. (1996). Validity of three general outcome measures for predicting secondary student performance on content-area tasks. Exceptional Children, 62, 497–514.
Espin, C. A., Scierka, B. J., Skare, S., & Halverson, N. (1999). Criterion-related validity of curriculum-based measures in writing for secondary students. Reading and Writing Quarterly, 15, 5–27.
Fuchs, D., Fuchs, L. S., Mathes, P., & Simmons, D. (1997). Peer-Assisted Learning Strategies: Making classrooms more responsive to student diversity. American Educational Research Journal, 34, 174–206.
Fuchs, L. S., & Deno, S. L. (1994). Must instructionally useful performance assessment be based in the curriculum? Exceptional Children, 61(1), 15–24.
Fuchs, L., Deno, S. L., & Mirkin, P. (1984). Effects of frequent curriculum-based measurement and evaluation on pedagogy, student achievement, and student awareness of learning. American Educational Research Journal, 21, 449–460.
Fuchs, L. S., Fuchs, D., & Hamlett, C. L. (1989). Effects of instrumental use of curriculum-based measurement to enhance instructional programs. Remedial and Special Education, 10(2), 43–52.
Fuchs, L. S., Fuchs, D., & Hamlett, C. L. (1993). Technological advances linking the assessment of students’ academic proficiency to instructional planning. Journal of Special Education Technology, 12, 49–62.
Fuchs, L. S., Fuchs, D., & Maxwell, L. (1988). The validity of informal reading comprehension measures. Remedial and Special Education, 9, 20–28.
Good, R., & Jefferson, G. (1998). Contemporary perspectives on curriculum-based measurement validity. In M. R. Shinn (Ed.), Advanced applications of curriculum-based measurement (pp. 61–88). New York: The Guilford Press.
Good, R. H. III, Simmons, D. C., & Kameenui, E. J. (2001). The importance and decision-making utility of a continuum of fluency-based indicators of foundational reading skills for third-grade high stakes outcomes. Scientific Studies of Reading, 5(3), 257–288.
Henley, M., Ramsey, R. A., & Algozzine, R. F. (2002). Teaching students with mild disabilities (4th ed.). Boston: Allyn & Bacon.
Idol, L., Nevin, A., & Paolucci-Whitcomb. (1986). Models of curriculum-based assessment. Rockville, MD: Aspen Publishers, Inc.
Kaminski, R. A., & Good, R. H. (1996). Toward a technology for assessing basic early literacy skills. School Psychology Review, 25, 215–227.
Marston, D. (1989). A curriculum-based approach to assessing academic performance: What it is and why do it. In M. R. Shinn (Ed.), Curriculum-based measurement: Assessing special children (pp. 19–78). New York: Guilford Press.
Marston, D., & Magnusson, D. (1988). Curriculum-based measurement: District level implementation. In J. Graden, J. Zins, & M. Curtis (Eds.), Alternative educational delivery systems: Enhancing instructional options for all students (pp. 137–172). Washington, D.C.: National Association of School Psychologists.
Mercer, C. D. (1997). Students with learning disabilities (5th ed.). Upper Saddle River, NJ: Merrill/Prentice-Hall.
Minneapolis Public Schools. (2001). Report of the external review committee on the Minneapolis Problem Solving Model. Minneapolis, MN: Author.
Shinn, M. R. (Ed.). (1989). Curriculum-based measurement: Assessing special children. New York: Guilford Press.
Shinn, M. (1995). Best practices in curriculum-based measurement and its use in a problem-solving model. In J. Grimes & A. Thomas (Eds.), Best practices in school psychology III (pp. 547–568). Silver Spring, MD: National Association of School Psychologists.
Shinn, M. R. (Ed.). (1998). Advanced applications of curriculum-based measurement. New York: Guilford.
Spinelli, C. (2002). Classroom assessment for students with special needs in inclusive settings. Upper Saddle River, NJ: Merrill/Prentice Hall.
Taylor, R. L. (2000). Assessment of special students (5th ed.). Needham Heights, MA: Allyn & Bacon.
Tilly, W. D., & Grimes, J. (1998). Curriculum-based measurement: One vehicle for systematic educational reform. In M. R. Shinn (Ed.), Advanced applications of curriculum-based measurement (pp. 32–88). New York: Guilford.
Tucker, J. (1985). Curriculum-based assessment: An introduction. Exceptional Children, 52(3), 266–276.
Yell, M., Deno, S. L., & Marston, D. (1992). Barriers to implementing curriculum-based measurement. Diagnostique, 18(1), 99.
86 Tests as Research Instruments Robert L. Thorndike
General Treatments
The period in question has seen the publication of two general treatises on test construction and theory which will represent lasting contributions to the testing literature. The first of these is Gulliksen’s Theory of Mental Tests (48). Gulliksen undertook the presentation of a complete integrated picture of the rational and statistical theory underlying the analysis of a single test. No attempt was made to deal fully with multivariate analysis. The book brings together under one cover a great deal of material that will be useful to the student of test theory. The reviewer was particularly interested in the treatments of (a) the effects of heterogeneity and of curtailment on a correlated variable, (b) the statistical definition of parallel tests, (c) the statistics of speeded tests, and (d) the general logic of weighting subtests to yield a total score. The second important book is Educational Measurement (60), edited by Lindquist and written by 21 contributing authors. This volume is divided into three sections, dealing respectively with “The Functions of Measurement in Education,” “The Construction of Achievement Tests,” and “Measurement Theory.” Specific chapters will be mentioned in connection with specific topics. In addition to the above two sources, Thorndike (70) brought out a text dealing with the phases of test development and analysis for personnel selection. The treatment is oriented around the use of tests for selecting and classifying military, civil service, or industrial personnel. Introductory texts on test preparation, designed primarily for the classroom teacher, have been prepared
Source: Review of Educational Research, XXI(5) (1951): 450 – 462.
by Travers (72) and Micheels and Karnes (62). An extensive bibliography of selected references on test construction, mental test theory, and statistics has been prepared by Goheen and Kavruck (43).
Preparation of Test Items
Guidance in the preparation of test items has been provided by Ebel (60), while Davis (60) commented on editorial considerations in item writing. Flanagan (36) and Travers (73) have each protested against the purely empirical approach to item preparation and have urged the importance of rational analysis and the formulation of definite hypotheses as the basis for item preparation. Travers contrasted the approach of the technician, who is only interested in empirical validity, with that of the scientist, who is interested in developing and testing hypotheses, and pleaded for more of the scientific approach in test construction. Flanagan indicated the importance of determining the critical requirements of any job or segment of education, analyzing the knowledge and skill required to succeed in those requirements, and relating each test item directly to some required knowledge or skill.
Item Analysis
Davis (60) gave a comprehensive discussion of the logic and procedures for item analysis, and indicated appropriate ways of using item analysis data. The literature is covered through 1949 and is supplemented by the author’s own critical discussion of such problems as correction for chance, optimum difficulty distributions, and the appropriate use of item statistics in preparing different types of tests. A number of reports consider specific aspects of item difficulty. Cadwell (16) reported data which confirm previous indications that judges can estimate the relative difficulty of test items with fair success, but are not able to make accurate judgments of absolute difficulty level. The problem of using word frequency counts as indicators of difficulty of vocabulary test items received the attention of several writers (29, 56, 79). The relationship appears to be very slight within the range of commoner words and when vocabulary knowledge is measured by testing precise choice of meaning. When relatively rare words are included and when only broad discriminations of meaning are required, a substantial relationship appears. There appear, thus, to be two somewhat distinct aspects of vocabulary involved – range and precision. The optimum shape of test score distribution was discussed by Ferguson (34). Ferguson indicated that when the function of a test is to make the maximum number of discriminations among individuals tested, the optimum shape of distribution is rectangular, and that this shape of distribution can be
approximated by selecting items within a narrow range of difficulties around the 50 percent level. There are, of course, as Davis (60) indicated, a number of other purposes for which a test may be used, for which quite different score distributions may be required. Mollenkopf (63) has investigated the effect of position in the test and speeded vs. unspeeded administration on indexes of item difficulty and discrimination. Both types of indexes were found to be disturbed for the later items of speeded tests. The effect was judged to be due to the distorting selection introduced in a speeded test, where those who complete the test tend to be the able on the one hand and the careless on the other. The relative precision of different indexes of item-test correlation has been compared by Doppelt and Potts (31) and by Flanagan (35). The results are in agreement in indicating that correlations estimated from the upper and lower 27 percent are only slightly less precise than biserial correlations, though both of these show sampling fluctuations substantially larger than those to be expected for a product-moment correlation with the same size of group. Bedell (5) developed a routine for determining the number of items from a test to retain for maximum validity, while Gulliksen (46) and Gleser and DuBois (42) each proposed approximation procedures for selecting from a total group of items the subset which yields a score with maximum validity. All these procedures are concerned with maximizing the correlation for the specific sample. It is not clear for any of them that the correlation will still be a maximum in a new sample in which regression effects change the relative size of correlations for different items. Walker (1) proposed applying the logic of sequential analysis to item analysis.
In this application of sequential analysis, one examines item data for a small number of pairs of cases from the upper and lower extremes of the group and decides that the results (a) require rejection of the hypothesis of zero correlation between item and test, (b) are completely compatible with the hypothesis of no relation, or (c) do not permit a decision. In the last case, additional cases are added pair by pair until a decision is possible in one direction or the other. Where analysis was being done by hand, without benefit of IBM equipment, the sequential procedure could presumably result in a substantial time saving. However, the only decision which it permits is that an item’s correlation with a criterion score is or is not different from zero. This is rarely a useful piece of information when carrying out an item analysis, since practically every item will have a positive validity coefficient, and the decision which one must make is which are the most desirable items to use from among a group all of which have positive correlations with the total score. The sequential analysis type of thinking has also been discussed by Moonan (64) in connection with the use of tests and test scores and by Kimball (55) in connection with the checking of test scoring. In the first instance, the proposal is that those pupils whose performance on a sample of
test material permits a decision that they surpass a required limit be exempted from further testing and spend their time in some other way (an idea related to the procedure that has occasionally been practiced in schools of exempting certain students with superior class records from final examinations). In the second case, the idea is to rescore a limited sample of a set of papers and to continue check-scoring until it is possible to state with specified confidence that the scoring does or does not meet specified limits of accuracy. In spite of the interest of these investigators, the present writer doubts whether sequential analysis has any very important contribution to make in the construction and use of tests.
In using tests, weighting the part scores when they are combined is always a nuisance. Horst (49, 51) developed a technique for calculating what part of a specified total testing period (not including time for instructions and practice) should be allotted to each subtest to give a maximum prediction of a criterion. The solution applies, of course, to the present sample, and the lengths will not, in general, be those which will yield the most valid test in a new sample.
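The item indexes discussed in this section can be made concrete with a small sketch. The example below is a minimal illustration, not the procedure of any study cited: it assumes dichotomously scored items, takes difficulty as the proportion of examinees passing, and uses the simple difference between the proportions passing in the upper and lower 27 percent as the discrimination index, rather than a correlation estimated from those groups. The function name `item_statistics`, the data layout, and the data themselves are invented for illustration.

```python
def item_statistics(responses, fraction=0.27):
    """Return difficulty and upper-minus-lower discrimination for each item.

    responses: one list of 0/1 item scores per examinee (hypothetical layout).
    fraction:  share of examinees placed in each extreme group.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]

    # Rank examinees by total score, highest first.
    order = sorted(range(n_examinees), key=lambda i: totals[i], reverse=True)
    group_size = max(1, round(fraction * n_examinees))
    upper, lower = order[:group_size], order[-group_size:]

    stats = []
    for j in range(n_items):
        # Difficulty here is the proportion passing, so higher means easier.
        difficulty = sum(row[j] for row in responses) / n_examinees
        p_upper = sum(responses[i][j] for i in upper) / group_size
        p_lower = sum(responses[i][j] for i in lower) / group_size
        stats.append({
            "item": j,
            "difficulty": difficulty,
            "discrimination": p_upper - p_lower,
        })
    return stats


# Invented data: ten examinees, four items of increasing difficulty.
example = [
    [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 0, 0],
    [1, 1, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0],
]
for row in item_statistics(example):
    print(row)
```

An index of this kind describes the present sample only; as noted above for the validity-maximizing procedures, the ordering of items may shift in a new sample.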
Reliability and Homogeneity
Problems of estimating the precision and singleness of meaning of the score resulting from a test continue to attract attention and arouse controversy. Thorndike (60) has reviewed the underlying logic, the experimental procedures for gathering data, the statistical procedures for computing indexes, and the uses and limitations of reliability data. Horst (50) has developed a generalized formula for estimating reliability which is applicable when the number of scores or ratings varies from person to person. A number of writers have been concerned with the concept of test homogeneity. Roughly speaking, a homogeneous test is one in which all the items are measuring the same trait or the same combination of traits. However, just what is to serve as an index of homogeneity is not clear. The extent to which a set of items will “scale,” as defined by Stouffer (68) and others (20), seems to be the criterion for some. However, low “scalability” may result from instability of response to single items as well as from heterogeneity of the items in a set. Carroll (18) considered other possible indicators of homogeneity. He suggested that the items of a test be sorted into groups for general difficulty level and the examinees be sorted into groups with respect to total test score. If a three-dimensional plot is now prepared in which one dimension is item difficulty and a second is total score level, and if in the third dimension we plot the percent succeeding with each item in each ability category, these percentages should fall on a smooth surface if the test is to be considered homogeneous. Gage and Damrin (38) investigated the properties of the type of homogeneity index proposed by Loevinger. They found that it yielded numerical
magnitudes completely different from the standard reliability coefficient, though both are indexes with a maximum possible value of 1.00, and that the homogeneity index was unrelated to test length. Furthermore, the index appeared to be a function of item difficulty, tending to be a maximum when the items are widely spaced in difficulty. Kriedt and Clark (57) compared the characteristics of a test composed of items selected by “scale analysis” with one based on traditional item analysis procedures. Scale analysis yielded a less reliable test and one with items which split the group in quite uneven fractions, though the test did possess a higher degree of “scalability.” The concern with homogeneity may eventually prove fruitful, but it seems doubtful that procedures which have so far been developed to produce or express it are useful in the preparation of ability tests. How homogeneous a test should be depends, of course, on the purpose for which the scores are to be used. Gulliksen (47) and Cronbach and Warrington (23) have attacked the problem of getting from a single test administration a usable estimate of the reliability of a speeded test. The ordinary split-test or Kuder-Richardson procedure has a spurious element which tends to make it an overestimate. These authors developed formulas to indicate a lower bound for the reliability coefficient. It was pointed out that when the split-test reliability and the lower-bound estimate differ only slightly, the reliability can be bracketed within useful limits. When the upper and lower bounds differ widely, no useful estimate can be obtained, and one must fall back on the separate administration of parallel forms of the test. Clark (19) reported an empirical analysis of the effect of different item splits upon the split-half reliability coefficient. Using intelligence test material, he concluded that the particular item split is insignificant as a source of variation.
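For reference, the two ordinary single-administration estimates mentioned above can be written out. This is a sketch of the standard textbook forms (an odd-even split-half correlation stepped up by the Spearman-Brown formula, and Kuder-Richardson formula 20), assuming dichotomously scored items. It does not implement the lower-bound formulas of Gulliksen or of Cronbach and Warrington, which the review does not reproduce. The function names and the data are hypothetical.

```python
def variance(values):
    """Population variance (divide by n), as used in KR-20."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_reliability(responses):
    """Odd-even split, stepped up to full length by Spearman-Brown."""
    first = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    second = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    r_halves = pearson(first, second)
    return 2 * r_halves / (1 + r_halves)


def kr20(responses):
    """Kuder-Richardson formula 20 for 0/1 item scores."""
    n = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / variance(totals))


# Invented data: ten examinees, four items.
example = [
    [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 0, 0],
    [1, 1, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0],
]
print(round(split_half_reliability(example), 2), round(kr20(example), 2))
```

Both estimates rest on the assumption of an unspeeded test; with speeded material they are subject to the overestimation described above.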
The reliability of difference scores which serve as the basis for differential diagnosis and guidance has come in for empirical study and theoretical discussion. Doppelt and Bennett (30) have reported evidence on the reliability over a three-year period for differences between pairs of tests in the Differential Aptitude Test Battery. The difference scores have a reliability of about .50, as compared with one of .70 to .75 for the component scores. Derner, Aborn and Canter (28) and Gilhooly (40) reported reliability data for the subtests of the Wechsler, and questioned the extensive use made of differences between these only moderately reliable and rather substantially correlated subscores.
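A drop of this size is about what the classical formula for the reliability of a difference would lead one to expect. The sketch below assumes the two tests have equal variances. The intercorrelation of .45 is an illustrative value chosen because it reproduces the reported figure; it is not a value given in the review.

```python
def difference_reliability(r_xx, r_yy, r_xy):
    """Reliability of the difference X - Y for tests with equal variances.

    r_xx, r_yy: reliabilities of the two component scores.
    r_xy:       correlation between the two component scores.
    """
    return ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)


# Components with reliability .725 (midway between .70 and .75) and an
# assumed intercorrelation of .45 (hypothetical, for illustration only).
print(round(difference_reliability(0.725, 0.725, 0.45), 2))
```

The formula shows why substantially correlated subscores are a poor basis for difference interpretation: as the intercorrelation approaches the average reliability, the reliability of the difference approaches zero.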
Validity and the Criterion
A comprehensive treatment of the topic of validity was written by Cureton (60). More limited discussions of theoretical problems of validity were
prepared by Anastasi (3) and by Gulliksen (45). Anastasi emphasized that validity refers to some practical criterion rather than to some hypostatized trait, and that an extraneous factor, such as socio-economic status, is to be considered a distorter of test scores only if it affects the test scores without similarly affecting the criterion measure. Gulliksen questioned the desirability of unquestioningly accepting “expert judgment” in selecting a criterion variable. He proposed that alternate criterion measures be studied in terms of their intercorrelations and factor structure, and that the sensibleness of the correlations between criterion and predictors serve as one cue to the appropriateness of the criterion. There have been several articles oriented around producing more satisfactory practical indexes of the effectiveness of tests. Brogden (9) developed a coefficient of selective efficiency which under certain circumstances is equivalent to a biserial or product-moment correlation coefficient, and which indicates what proportion of the maximum possible gain in criterion score results from selection at a specified level on a particular predictor. Jarrett (52) proposed the use of percent increase in output as a practical measure of the effectiveness of a selection procedure in those cases in which number of units produced is a meaningful criterion variable. Brogden and Taylor (11) developed the notion of “the dollar criterion” as a common yardstick for appraising personnel procedures – an appealing notion but one fraught with numerous practical difficulties. Brogden and Taylor (12) have also considered the various classes of deficiencies in criterion measures and discussed ways of combating them. Brokaw (13) tested empirically the theoretical relationship between the length of the subtests in a battery and the validities of the resulting composite scores. 
His results were in accord with expectation, in that cutting the length of the separate tests in half produced only a slight drop in the validity of the composite. Travers and Wallace (74) discussed the problem of inconsistent validity data in successive groups. The discussion was illustrated by data from two consecutive classes in a school of dentistry. Change in the subjective weighting of entrance criteria was proposed as a probable explanation.
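The direction of Brokaw's result follows from the classical relation between test length and validity. The sketch below applies that relation to a single test, not to a composite of several subtests as in Brokaw's study, and the reliability and validity values used are hypothetical.

```python
from math import sqrt


def validity_after_length_change(r_xy, r_xx, k):
    """Validity of a test whose length is multiplied by k.

    r_xy: validity of the original test against the criterion.
    r_xx: reliability of the original test.
    k:    length ratio (0.5 halves the test, 2 doubles it).
    Assumes the added or removed parts are parallel to the rest.
    """
    return r_xy * sqrt(k) / sqrt(1 + (k - 1) * r_xx)


# Hypothetical test: validity .50, reliability .90, cut to half length.
print(round(validity_after_length_change(0.50, 0.90, 0.5), 3))
```

With a reliable test, halving the length lowers the validity only by a few hundredths, which is the pattern the review describes as being in accord with expectation.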
Cross Validation
A set of papers discussed the importance of cross validation, i.e., the verification of test weights or item selection on a new sample. Mosier (65) distinguished among a number of related concepts which come within the general area of cross validation. Katzell (54) indicated the crucial role of cross validation when items are being selected by validity against an external criterion, and Cureton (26) documented this point by a synthetic example. Cureton (25) indicated that, especially where predictors are highly correlated, much of the difference in regression weights between variables may be due
to chance fluctuations, and proposed a principal components type of analysis to provide a closer approximation to “true score” regression weights. Wherry (80) tried to reconcile the advantages of theoretical analysis of regression weights and empirical tests by cross validation.
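The need for cross validation can be shown with a small simulation in the spirit of a synthetic example, though it is not a reproduction of Cureton's. It assumes items and a criterion that are pure chance, selects the items with the highest validities in one sample, and then applies the same key to a second sample. The sample sizes, the seed, and the function names are all invented for illustration.

```python
import random


def pearson(x, y):
    """Product-moment correlation; returns 0.0 if either list is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def simulate(n_people=30, n_items=100, n_keep=10, seed=0):
    """Select items on sample A, then check the same key on sample B."""
    rng = random.Random(seed)

    def draw_sample():
        # Items and criterion are independent noise: true validity is zero.
        items = [[rng.randint(0, 1) for _ in range(n_items)]
                 for _ in range(n_people)]
        criterion = [rng.gauss(0, 1) for _ in range(n_people)]
        return items, criterion

    items_a, criterion_a = draw_sample()
    items_b, criterion_b = draw_sample()

    # Item validities in the selection sample.
    validities = [pearson([row[j] for row in items_a], criterion_a)
                  for j in range(n_items)]
    keep = sorted(range(n_items), key=lambda j: validities[j],
                  reverse=True)[:n_keep]

    def score(items):
        return [sum(row[j] for j in keep) for row in items]

    original = pearson(score(items_a), criterion_a)
    cross = pearson(score(items_b), criterion_b)
    return original, cross


original, cross = simulate()
print("validity in selection sample:", round(original, 2))
print("validity in new sample:      ", round(cross, 2))
```

Because the selected items capitalize on chance in the first sample, the apparent validity there is large, while the validity in the second sample should fall near zero, within sampling error.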
Special Types of Scoring and Scaling
A number of investigators have considered problems relating to the more effective scoring of tests or scaling of the resulting scores. Guilford and Michael (44) discussed the problem of developing relatively univocal scores of pure factors. They feel that since in many cases the purity of a test is contaminated by only a single additional factor, the use of a single suppressor variable will often be an effective procedure for obtaining a relatively pure score. Feifel and Lorge (33) studied the qualitative differences between vocabulary responses of children and adults, indicating the possible value of qualitative differentiations in scoring. Fruchter (37) studied the information to be gained from working with error scores. In some speeded tests, number wrong and number right have quite low intercorrelations. Fruchter found that a number of error scores defined a distinct factor, which might perhaps be considered to be “carefulness.” Glaser (41) criticized “inconsistency scores,” based on the number of changes in response from one testing to a later testing, as involving a basic statistical artifact. He pointed out that the number of changes which an individual makes will depend upon the number of items which are at or near the threshold of reaction for that individual. The problem of handling patterns of item responses was discussed by Meehl (61), who pointed out that it is possible for items to have joint validity for discriminating criterion groups even though neither of them has validity individually. This will happen when the intercorrelation of the items is different in each of the criterion groups. Meehl indicated that empirical studies are under way to determine whether “configural scoring” appears profitable for items of the Minnesota Multiphasic Personality Inventory.
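Meehl's point about joint validity can be illustrated with two invented criterion groups. In the sketch below (hypothetical data, not MMPI items), each item is endorsed by half of each group, so neither item discriminates when taken alone. The two items agree within one group and disagree within the other, so the response pattern separates the groups completely.

```python
# Each tuple is one person's responses to two true-false items (1 = true).
group_a = [(1, 1), (0, 0), (1, 1), (0, 0)]  # items agree
group_b = [(1, 0), (0, 1), (1, 0), (0, 1)]  # items disagree


def endorsement_rate(group, item):
    """Proportion of the group answering 'true' to the given item."""
    return sum(person[item] for person in group) / len(group)


def configural_score(pattern):
    """Score 1 when the two responses agree, 0 when they differ."""
    return 1 if pattern[0] == pattern[1] else 0


# Single items: identical endorsement rates in both groups (no validity).
for item in (0, 1):
    print(item, endorsement_rate(group_a, item), endorsement_rate(group_b, item))

# Pattern score: every member of group A scores 1, every member of B scores 0.
print([configural_score(p) for p in group_a])
print([configural_score(p) for p in group_b])
```

In this extreme case the item intercorrelation is +1.00 in one group and -1.00 in the other, which is the condition the text names for joint validity without individual validity.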
Cronbach (21) developed a procedure which facilitates the joint tabulation of three subscores and the search for patterns of the three which discriminate criterion groups. Du Mas (32) developed an expression for similarity between pairs of profiles based upon the percent of like-signed lines in the two profiles. However, the test of significance for this index is faulty, since it assumes independence in the signs of the different segments. Mosier (60) prepared a general critique of profiles as devices for representing test results. The use of special testing devices which give the student immediate knowledge of results and which permit him to try alternate responses until he reaches the correct one has been studied by Pressey (66) and by Jones and Sawyer (53). It is reported in each case that this type of testing procedure results in improved learning.
A procedure for scaling test results which does not depend upon assumptions of normality has been developed by Gardner (39). The routine makes use of tables of the Pearson Type III curve, and is applicable to distributions with any degree of skewness. Flanagan (60) prepared a general discussion of the problems of units and scaling in psychological tests.
Distortions of Test Performance
Cronbach (22) presented further evidence of the influence of “response sets,” i.e., such individual idiosyncrasies as readiness to guess, emphasis on speed as opposed to accuracy, preference for some particular response option, and the like, upon test performance. A number of suggestions are offered for minimizing these influences. Cross (24) confirmed previous studies which show that the self-report type of inventory is readily subject to faking.
Differential Prediction
The basic theoretical and practical problems in using a test battery for differential prediction of success in a number of fields, which were the focus of so much work during World War II, continue to attract interest. The problems were reviewed briefly by Wesman and Bennett (78). They questioned the desirability of concentrating on tests which differentiate two fields to the exclusion of those which measure qualities common to both. Thorndike (71) stated the problems involved both in designing a battery to be used in assigning men to many categories and in setting up administrative procedures for the use of the resulting scores. Brogden (10) investigated the gain in effectiveness of differential assignment resulting from the use of several different predictor variables rather than only a single predictor. Brogden’s analysis is limited to the specific situations where either (a) assignment is made between only two different categories, or (b) there is zero correlation between all the predictor scores. The results from these special cases made him perhaps unduly optimistic as to the gains to be expected from differential prediction in the general case in which there are a number of assignment categories and the predictor scores for all of the categories have substantial correlations.
Tests in the Study of Communities
Recent nationwide testing programs have made possible several studies in which the community rather than the person is the unit being studied.
Davenport and Remmers (27) related state means on the World War II A-12 V-12 test to various socio-economic facts about the states, finding correlations as high as .83. Thorndike (1) studied community variables related to intelligence and achievement test scores of school children tested in the standardization of the revised Metropolitan Achievement Tests. He found the community variables to give fairly substantial predictions of the children’s intelligence test score, but only slight prediction of the achievement measures. Lennon (59) studied the interrelations of community intelligence and achievement measures using the same source of data.
New Tests
An important addition to the stock of individual intelligence tests for children is the Wechsler Intelligence Scale for Children (77). This test is made up of the same subtests and yields the same types of scores as the well-known scale for adults. The Leiter International Performance Scale, 1948 Revision (58) provides an individual performance test for adults. A nonreading vocabulary test, requiring subjects to indicate to which one of a set of pictures a particular word relates, has been developed by Ammons and Ammons (2). In the testing of motor performance, Van der Lugt brought to this country and translated for use a series for the testing of manual ability for adults (75) and a psychomotor test series for children (76). Both were originally developed in Europe, with norms and other statistics on European groups. The measurement of listening comprehension has been an area of marked activity, and reports of tests and their use have been prepared by Blewett (7), Brown (14), and Spache (67). Typically, these tests seem to have fairly adequate reliability and low enough correlations both with measures of reading and with measures of general intelligence to indicate that they are measuring skills which are in some degree unique. Increased dependence on newer audio-visual media of instruction will make measures of ability to learn by looking and listening as important as ability to learn by reading, as Caffrey (17) pointed out in his plea for “auding-age” norms. Two variations of the Thematic Apperception Test have been prepared with the objective of adapting the materials to use with specific groups. Thompson (69) prepared a form incorporating Negro figures for use with Negro subjects, while Bachrach and Thompson (4) developed a form showing figures of crippled individuals for use with the handicapped. Bellak and Bellak (6) prepared a test for children which makes use of animal figures.
The Blacky Pictures, prepared by Blum (8), have been published for research use. This series again makes use of a dog (Blacky) as the central figure in a series of pictures designed to permit sex-related interpretations. An inventory of interests and activities has been prepared by Burgess, Cavan, and Havighurst (15) for the specific purpose of work with adults 60 and over.
Concluding Statement
The reviewer has not identified, in the work of the past three years which he has reviewed, any instances of especially noteworthy advance in test invention or in test theory. There has been real progress in consolidating and organizing test doctrine for the user, and certain of the new tests will undoubtedly give good service.
Bibliography
1. American Educational Research Association. Growing Points in Educational Research. 1949 Official Report. Washington, D.C.: the Association, a department of the National Education Association, 1949. Thorndike, Robert L., “Community Factors Related to Intelligence and Achievement of School Children,” p. 265–71; Walker, Helen M., “Item Selection by Sequential Sampling,” p. 181–87.
2. Ammons, Robert B., and Ammons, Helen S. Full-Range Vocabulary Test. New Orleans: the Author, Tulane University, 1948.
3. Anastasi, Anne. “The Concept of Validity in the Interpretation of Test Scores.” Educational and Psychological Measurement 10: 67–78; Spring 1950.
4. Bachrach, Arthur J., and Thompson, Charles E. Thematic Apperception Test, Modification for the Handicapped. Cleveland: Society for Crippled Children, 1949.
5. Bedell, B. J. “Determination of the Optimum Number of Items To Retain in a Test Measuring a Single Ability.” Psychometrika 15: 419–30; December 1950.
6. Bellak, Leopold, and Bellak, Sonya S. Children’s Apperception Test. New York: Children’s Psychiatric Service Co., 1949. 13 p.
7. Blewett, Thomas T. “An Experiment in the Measurement of Listening at the College Level.” Journal of Educational Research 44: 575–85; April 1951.
8. Blum, Gerald S. The Blacky Pictures. New York: Psychological Corp., 1950.
9. Brogden, Hubert E. “A New Coefficient: Application to Biserial Correlation and to Estimation of Selective Efficiency.” Psychometrika 14: 169–82; September 1949.
10. Brogden, Hubert E. “Increased Efficiency of Selection Resulting from Replacement of a Single Predictor with Several Differential Predictors.” Educational and Psychological Measurement 11: 173–95; Summer 1951.
11. Brogden, Hubert E., and Taylor, Erwin K. “The Dollar Criterion—Applying the Cost Accounting Concept to Criterion Construction.” Personnel Psychology 3: 133–54; Summer 1950.
12. Brogden, Hubert E., and Taylor, Erwin K. “The Theory and Classification of Criterion Bias.” Educational and Psychological Measurement 10: 159–86; Summer 1950.
13. Brokaw, Leland D. The Comparative Composite Validities of Batteries of “Short” Versus “Long” Tests. Research Bulletin 50–1. San Antonio, Texas: Lackland Air Force Base, Human Resources Research Center, January 1950. 30 p.
14. Brown, James I. “The Construction of a Diagnostic Test of Listening Comprehension.” Journal of Experimental Education 18: 139–46; December 1949.
15. Burgess, Ernest W.; Cavan, Ruth S.; and Havighurst, Robert J. Your Activities and Attitudes. Chicago: Science Associates, 1949. 4 p.
16. Cadwell, Dorothy H. B. “Accuracy of Prediction of Item Difficulty for a Recent Civil Service Examination for Clerks.” Canadian Journal of Psychology 4: 18–25; March 1950.
17. Caffrey, John. “The Establishment of Auding-Age Norms.” School and Society 70: 310–12; November 12, 1949.
18. Carroll, John B. “Criteria for the Evaluation of Achievement Tests: From the Point of View of Their Internal Statistics.” Proceedings of the 1950 Invitational Conference on Testing Problems. Princeton, N.J.: Educational Testing Service, 1951. p. 95–99.
19. Clark, Edward L. “Methods of Splitting vs. Samples as Sources of Instability in Test-Retest Reliability Coefficients.” Harvard Educational Review 19: 178–82; May 1949.
20. Coombs, Clyde H. “The Concepts of Reliability and Homogeneity.” Educational and Psychological Measurement 10: 43–56; Spring 1950.
21. Cronbach, Lee J. “‘Pattern Tabulation’: a Statistical Method for Analysis of Limited Patterns of Scores, with Particular Reference to the Rorschach Test.” Educational and Psychological Measurement 9: 149–71; Summer 1949.
22. Cronbach, Lee J. “Further Evidence on Response Sets and Test Design.” Educational and Psychological Measurement 10: 3–31; Spring 1950.
23. Cronbach, Lee J., and Warrington, Willard G. “Time-Limit Tests: Estimating Their Reliability and Degree of Speeding.” Psychometrika 16: 167–88; June 1951.
24. Cross, Orrin H. “A Study of Faking on the Kuder Preference Record.” Educational and Psychological Measurement 10: 271–77; Summer 1950.
25. Cureton, Edward E. “Approximate Linear Restraints and Best Predictor Weights.” Educational and Psychological Measurement 11: 12–15; Spring 1951.
26. Cureton, Edward E. “Validity, Reliability and Baloney.” Educational and Psychological Measurement 10: 94–96; Spring 1950.
27. Davenport, Kenneth S., and Remmers, Herman H. “Factors in State Characteristics Related to Average A-12 V-12 Test Scores.” Journal of Educational Psychology 41: 110–15; February 1950.
28. Derner, Gordon G.; Aborn, Murray; and Canter, Aaron H. “The Reliability of the Wechsler-Bellevue Subtests and Scales.” Journal of Consulting Psychology 14: 172–79; May–June 1950.
29. Dolch, Edward W. “Tested Word Knowledge Versus Frequency Counts.” Journal of Educational Research 44: 457–70; February 1951.
30. Doppelt, Jerome E., and Bennett, George K. “A Longitudinal Study of the Differential Aptitude Tests.” Educational and Psychological Measurement 11: 228–37; Summer 1951.
31. Doppelt, Jerome E., and Potts, Edith M. “Constancy of Item-Test Correlation Coefficients Computed from Upper and Lower Groups.” Journal of Educational Psychology 40: 378–81; October 1949.
32. duMas, Frank M. “The Coefficient of Profile Similarity.” Journal of Clinical Psychology 5: 123–31; April 1949.
33. Feifel, Herman, and Lorge, Irving D. “Qualitative Differences in the Vocabulary Responses of Children.” Journal of Educational Psychology 41: 1–18; January 1950.
34. Ferguson, George A. “On the Theory of Test Discrimination.” Psychometrika 14: 61–68; March 1949.
35. Flanagan, John C. “The Effectiveness of Short Methods for Calculating Correlation Coefficients.” American Psychologist 6: 404; July 1951.
36. Flanagan, John C. “The Use of Comprehensive Rationales in Test Development.” Educational and Psychological Measurement 11: 151–55; Spring 1951.
37. Fruchter, Benjamin. “Error Scores as a Measure of Carefulness.” Journal of Educational Psychology 41: 279–91; May 1950.
38. Gage, Nathaniel L., and Damrin, Dora E. “Reliability, Homogeneity and Number of Choices.” Journal of Educational Psychology 41: 385–404; November 1950.
39. Gardner, Eric F. “Value of Norms Based on a New Type of Scale Unit.” Proceedings of the 1948 Conference on Testing Problems. Princeton, N.J.: Educational Testing Service, 1949. p. 67–74.
40. Gilhooly, Francis M. “Wechsler-Bellevue Reliability and the Validity of Certain Diagnostic Signs of the Neuroses.” Journal of Consulting Psychology 14: 82–87; March–April 1950.
41. Glaser, Robert. “Methodological Analysis of the Inconsistency of Response to Test Items.” Educational and Psychological Measurement 9: 727–39; Winter 1949.
42. Gleser, Goldine C., and Du Bois, Philip H. “A Successive Approximation Method of Maximizing Test Validity.” Psychometrika 16: 129–39; March 1951.
43. Goheen, Howard W., and Kavruck, Samuel. Selected References on Test Construction, Mental Test Theory, and Statistics, 1929–1949. Washington, D.C.: U.S. Civil Service Commission, 1950. 209 p.
44. Guilford, J. Paul, and Michael, William B. “Approaches to Univocal Factor Scores.” Psychometrika 13: 1–22; March 1950.
45. Gulliksen, Harold. “Intrinsic Validity.” American Psychologist 5: 511–17; October 1950.
46. Gulliksen, Harold. “Item Selection to Maximize Test Validity.” Proceedings of the 1948 Conference on Testing Problems. Princeton, N.J.: Educational Testing Service, 1949. p. 13–17.
47. Gulliksen, Harold. “The Reliability of Speeded Tests.” Psychometrika 15: 259–69; September 1950.
48. Gulliksen, Harold. Theory of Mental Tests. New York: John Wiley and Sons, 1950. 486 p.
49. Horst, Paul. “Determination of the Optimal Test Length to Maximize the Multiple Correlation.” Psychometrika 14: 79–88; June 1949.
50. Horst, Paul. “A Generalized Expression for the Reliability of Measures.” Psychometrika 14: 21–31; March 1949.
51. Horst, Paul. “Optimal Test Length for Maximum Battery Validity.” Psychometrika 16: 189–202; June 1951.
52. Jarrett, Rheem F. “Per Cent Increase in Output of Selected Personnel as an Index of Test Efficiency.” Journal of Applied Psychology 32: 135–45; April 1948.
53. Jones, Howard L., and Sawyer, Michael O. “A New Evaluative Instrument.” Journal of Educational Research 42: 381–85; January 1949.
54. Katzell, Raymond A. “Cross Validation of Item Analyses.” Educational and Psychological Measurement 11: 16–22; Spring 1951.
55. Kimball, Allyn W. “Sequential Sampling Plans for Use in Psychological Test Work.” Psychometrika 15: 1–15; March 1950.
56. Kirkpatrick, James J., and Cureton, Edward E. “Vocabulary Item Difficulty and Word Frequency.” Journal of Applied Psychology 33: 347–51; August 1949.
57. Kriedt, Philip H., and Clark, Kenneth E. “‘Item Analysis’ versus ‘Scale Analysis’.” Journal of Applied Psychology 33: 114–21; April 1949.
58. Leiter, Russell G. Leiter International Performance Scale, 1948 Revision. Washington, D.C.: Psychological Service Center, 1948.
59. Lennon, Roger T. “Relation Between Intelligence and Achievement Test Results for a Group of Communities.” Journal of Educational Psychology 41: 301–308; May 1950.
60. Lindquist, Everet F., editor. Educational Measurement. Washington, D.C.: American Council on Education, 1951. Cureton, Edward E., “Validity,” p. 621–94; Davis, Frederick B., “Item Selection Techniques,” p. 266–328; Ebel, Robert L., “Writing the Test Item,” p. 185–249; Flanagan, John C., “Units, Scores, and Norms,” p. 695–763; Mosier, Charles I., “Batteries and Profiles,” p. 764–810; Thorndike, Robert L., “Reliability,” p. 560–620.
61. Meehl, Paul E. “Configural Scoring.” Journal of Consulting Psychology 14: 165–71; May–June 1950.
62. Micheels, William J., and Karnes, M. Ray. Measuring Educational Achievement. New York: McGraw-Hill Book Co., 1950. 496 p.
63. Mollenkopf, William G. “An Empirical Study of the Effects on Item-Analysis Data of Changing Item Placement and Test Time Limit.” Psychometrika 15: 291–315; September 1950.
Salkind_Chapter 86.indd 254
9/4/2010 11:00:33 AM
Thorndike
Tests as Research Instruments
255
64. Moonan, William J. “Some Empirical Aspects of the Sequential Analysis Technique as Applied to an Achievement Examination.” Journal of Experimental Education 18: 195 – 207; March 1950. 65. Mosier, Charles I. “Problems and Designs of Cross Validation.” Educational and Psychological Measurement 11: 5 –11; Spring 1951. 66. Pressey, Sidney L. “Development and Appraisal of Devices Providing Immediate Automatic Scoring of Objective Tests and Concomitant Self-Instruction.” Journal of Psychology 29: 417– 47; April 1950. 67. Spache, George D. “The Construction and Validation of a Work-Type Auditory Comprehension Reading Test.” Educational and Psychological Measurement 10: 249 – 53; Summer 1950. 68. Stouffer, Samuel A., and others. “Studies in Social Psychology in World War II.” Measurement and Prediction. Vol. 4. Princeton, N.J.: Princeton University Press, 1950. 756 p. 69. Thompson, Charles E. Thematic Apperception Test—Thompson Modification. Cambridge, Mass.: Harvard University Press, 1949. 70. Thorndike, Robert L. Personnel Selection: Test and Measurement Techniques. New York: John Wiley and Sons, 1949. 358 p. 71. Thorndike, Robert L. “The Problem of Classification of Personnel.” Psychometrika 15: 215 – 35; September 1950. 72. Travers, Robert M. W. How To Make Achievement Tests. New York: Odyssey Press, 1950. 180 p. 73. Travers, Robert M. W. “Rational Hypotheses in the Construction of Tests.” Educational and Psychological Measurement 11: 128 – 37; Spring 1951. 74. Travers, Robert M. W., and Wallace, Wimburn L. “Inconsistency in the Predictive Value of a Battery of Tests.” Journal of Applied Psychology 34: 237–39; August 1950. 75. Van der Lugt, Maria-Johanna A. V.D.L. Adult Psychomotor Test Series for the Measurement of Manual Ability. New York: the Author, New York University, Department of Psychology, 1948. 61 p. 76. Van der Lugt, Maria-Johanna A. V. D. L. Psychomotor Test Series for Children for the Measurement of Manual Ability. 
New York: the Author, New York University, Department of Psychology, 1949. (Mimeo.) 77. Wechsler, David. Wechsler Intelligence Scale for Children; Manual. New York: Psychological Corp., 1949. 113 p. 78. Wesman, Alexander G., and Bennett, George K. “Problems of Differential Prediction.” Educational and Psychological Measurement 11: 265 – 72; Summer 1951. 79. Wesman, Alexander G., and Seashore, Harold G. “Frequency versus Complexity of Words in Verbal Measurement.” Journal of Educational Psychology 40: 395 – 404; November 1949. 80. Wherry, Robert J. “Comparison of Cross-Validation with Statistical Inference of Betas and Multiple R from a Single Sample.” Educational and Psychological Measurement 11: 23 – 28; Spring 1951.
Salkind_Chapter 86.indd 255
9/4/2010 11:00:33 AM
Salkind_Chapter 86.indd 256
9/4/2010 11:00:33 AM
87
My Current Thoughts on Coefficient Alpha and Successor Procedures
Lee J. Cronbach and Richard J. Shavelson
Where the accuracy of a measurement is important, whether for scientific or practical purposes, the investigator should evaluate how much random error affects the measurement. New research may not be necessary when a procedure has been studied enough to establish how much error it involves. But with new measures, or measures being transferred to unusual conditions, a fresh study is in order. Sciences other than psychology have typically summarized such research by describing a margin of error; a measure will be reported, followed by a plus or minus sign and a numeral that is almost always the standard error of measurement (which will be explained later).

The alpha formula is one of several analyses that may be used to gauge the reliability (i.e., accuracy) of psychological and educational measurements. This formula was designed to be applied to a two-way table of data where rows represent persons (p) and columns represent scores assigned to the person under two or more conditions (i). Condition is a general term often used where each column represents the score on a single item within a test. But it may also be used, for example, for different scorers when more than one person judges each article and any scorer treats all persons in the sample. Because the analysis examines the consistency of scores from one condition to another, procedures like alpha are known as internal consistency analyses.
Source: Educational and Psychological Measurement, 64(3) (2004): 391– 418.
Origin and Purpose of These Notes

My 1951 Article and Its Reception

In 1951, I published an article entitled “Coefficient Alpha and the Internal Structure of Tests.” The article was a great success and was cited frequently [no less than 5,590 times].1 Even in recent years, there have been approximately 325 social science citations per year.2 The numerous citations to my article by no means indicate that the person who cited it had read it, and do not even demonstrate that he or she had looked at it. I envision the typical activity leading to the typical citation as beginning with a student laying out his research plans for a professor or submitting a draft report, and it would be the professor’s routine practice to say, wherever a measuring instrument was used, that the student ought to check the reliability of the instrument. To the question, “How do I do that?” the professor would suggest using the alpha formula because the computations are well within the reach of almost all students undertaking research and because the calculation can be performed on data the student will routinely collect. The professor might write out the formula or simply say, “You can look it up.” The student would find the formula in many textbooks that would be likely to give the 1951 article as a reference, so the student would copy that reference and add one to the citation count. There would be no point for him or her to try to read the 1951 article, which was directed to a specialist audience. And the professor who recommended the formula may have been born well after 1951 and not only be unacquainted with the article but uninterested in the debates about 1951 conceptions that had been given much space in it. (The citations are not all from nonreaders; throughout the years, there has been a trickle of articles discussing alpha from a theoretical point of view and sometimes suggesting interpretations substantially different from mine. These articles did little to influence my thinking.)

Other signs of success: There were very few later articles by others criticizing parts of my argument. The proposals or hypotheses of others that I had criticized in my article generally dropped out of the professional literature.
A 50th Anniversary

In 1997, noting that the 50th anniversary of the publication was fast approaching, I began to plan what has now become these notes. If it had developed into a publishable article, the article would clearly have been self-congratulatory. But I intended to devote most of the space to pointing out the ways my own views had evolved; I doubt whether coefficient
alpha is the best way of judging the reliability of the instrument to which it is applied. My plan was derailed when various loyalties impelled me to become the head of the team of qualified and mostly quite experienced investigators who agreed on the desirability of producing a volume (Cronbach, 2002) to recognize the work of R. E. Snow, who had died at the end of 1997. When the team manuscript had been sent off for publication as a book, I might have returned to alpha. Almost immediately, however, I was struck by a health problem that removed most of my strength, and a year later, when I was just beginning to get back to normal strength, an unrelated physical disorder removed virtually all my near vision. I could no longer read professional writings and would have been foolish to try to write an article of publishable quality. In 2001, however, Rich Shavelson urged me to try to put the thoughts that might have gone into the undeveloped article on alpha into a dictated memorandum, and this set of notes is the result. Obviously, it is not the scholarly review of uses that have been made of alpha and of discussions in the literature about its interpretation that I intended. It may nonetheless pull together some ideas that have been lost from view. I have tried to present my thoughts here in a nontechnical manner with a bare minimum of algebraic statements, and I hope that the material will be useful to the kind of student who in the past was using the alpha formula and citing my 1951 article.
My Subsequent Thinking

Only one event in the early 1950s influenced my thinking: Frederick Lord’s (1955) article in which he introduced the concept of randomly parallel tests. The use I made of the concept is already hinted at in the preceding section. A team started working with me on the reliability problem in the latter half of the decade, and we developed an analysis of the data far more complex than the two-way table from which alpha is formed. The summary of that thinking was published in 1963, but is beyond the scope of these notes. The lasting influence on me was the appreciation we developed for the approach to reliability through variance components, which I shall discuss later.3

From 1970 to 1995, I had much exposure to the increasingly prominent, statewide assessments and innovative instruments using samples of student performance. This led me to what is surely the main message to be developed here. Coefficients are a crude device that does not bring to the surface many subtleties implied by variance components. In particular, the interpretations being made in current assessments are best evaluated through use of a standard error of measurement, as I discuss later.
Conceptions of Reliability

The Correlational Stream

Emphasis on individual differences. Much early psychological research, particularly in England, was strongly influenced by the ideas on inheritance suggested by Darwin’s theory of natural selection. The research of psychologists focused on measures of differences between persons. Educational measurement was inspired by the early studies in this vein and it, too, has given priority to the study of individual differences, that is, differences between persons. When differences were being measured, the accuracy of measurement was usually examined. The report has almost always been in the form of a reliability coefficient. The coefficient is a kind of correlation with a possible range from 0 to 1.00. Coefficient alpha was such a reliability coefficient.

Reliability seen as consistency among measurements. Just what is to be meant by reliability was a perennial source of dispute. Everyone knew that the concern was with consistency from one measurement to another, and the conception favored by some authors saw reliability as the correlation of an instrument with itself. That is, if, hypothetically, we could apply the instrument twice and on the second occasion have the person unchanged and without memory of his first experience, then the consistency of the two identical measurements would indicate the uncertainty due to measurement error, for example, a different guess on the second presentation of a hard item. There were definitions that referred not to the self-correlation but to the correlation of parallel tests, and parallel could be defined in many ways (a topic to which I shall return). Whatever the derivation, any calculation that did not directly fit the definition was considered no better than an approximation. As no formal definition of reliability had considered the internal consistency of an instrument as equivalent to reliability, all internal consistency formulas were suspect. I did not fully resolve this problem; I shall later speak of developments after 1951 that give a constructive answer. I did, in 1951, reject the idealistic concept of a self-correlation, which at best is unobservable; parallel measurements were seen as an approximation.

The split-half technique. Charles Spearman, just after the start of the 20th century, realized that psychologists needed to evaluate the accuracy of any measuring instrument that they used. Accuracy would be naively translated as the agreement among successive measures of the same thing by the same technique. But repeated measurement is suspect because participants learn on the first trial of an instrument and, in an ability test, are likely to earn better scores on later trials. Spearman, for purposes of his own research, invented the split-half procedure in which two scores are obtained from a single testing by scoring
separately the odd-numbered items and the even-numbered items.4 This is the first of the internal consistency procedures, of which coefficient alpha is a modern exemplar. Thus, with a 40-item test, Spearman would obtain total scores for two 20-item half-tests, and correlate the two columns of scores. He then proposed a formula for estimating the correlation expected from two 40-item tests. In the test theory that was developed to provide a mathematical basis for formulas like Spearman’s, the concept of true score was central. Roughly speaking, the person’s true score is the average score he or she would obtain on a great number of independent applications of the measuring instrument.

The problem of multiple splits. Over the years, many investigators proposed alternative calculation routines, but these either gave Spearman’s result or a second result that differed little from that of Spearman; we need not pursue the reason for this discrepancy. In the 1930s, investigators became increasingly uncomfortable with the fact that comparing the total score from Items 1, 3, 5, and so on with the total on Items 2, 4, 6, and so on gave one coefficient, but that contrasting the sum of scores on Items 1, 4, 5, 8, 9, and so on with the total on Items 2, 3, 6, 7, 10 and so on would give a different numerical result. Indeed, there were a vast number of such possible splits of a test, and therefore any split-half coefficient was, to some degree, incorrect. In the period from the 1930s to the late 1940s, quite a number of technical specialists had capitalized on new statistical theory being developed in England by R. A. Fisher and others, and these authors generally presented a formula whose results were the same as those from the alpha formula.
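The split-half procedure and the difficulty with multiple splits can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than a reconstruction of Spearman's own computation: the score table is made up, the function names `split_half` and `spearman_brown` are hypothetical, and the step-up from half-test to full-test correlation uses the standard Spearman-Brown form, which the text refers to only as "a formula."

```python
import numpy as np

def spearman_brown(r, factor=2.0):
    """Project a reliability r onto a test `factor` times as long.

    factor=2 is the split-half case (two half-tests -> full test).
    """
    return factor * r / (1.0 + (factor - 1.0) * r)

def split_half(scores, half_a):
    """Split-half coefficient for an (n persons x k items) score array.

    `half_a` lists the column indices of one half; the remaining
    columns form the other half. The correlation between the two
    half-test totals is stepped up to full length.
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    half_b = [j for j in range(k) if j not in set(half_a)]
    total_a = scores[:, list(half_a)].sum(axis=1)
    total_b = scores[:, half_b].sum(axis=1)
    r = np.corrcoef(total_a, total_b)[0, 1]
    return spearman_brown(r, 2.0)

# Hypothetical data (assumption): 6 persons x 4 items, invented for illustration.
scores = np.array([
    [5, 4, 4, 3],
    [3, 3, 2, 4],
    [4, 2, 5, 3],
    [1, 2, 1, 0],
    [2, 0, 3, 1],
    [0, 1, 0, 2],
])

odd_even = split_half(scores, [0, 2])   # Items 1, 3 versus Items 2, 4
alternate = split_half(scores, [0, 3])  # Items 1, 4 versus Items 2, 3
print(odd_even, alternate)
```

The two calls use the same data and the same formula yet return different coefficients, which is the discomfort the text describes: the result depends on an arbitrary choice of split.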
Independent of these advances, which were almost completely unnoticed by persons using measurement in the United States, Kuder and Richardson developed a set of internal consistency formulas that attempted to cut through the confusion caused by the multiplicity of possible splits. They included what became known as K-R 20, which was mathematically a special case of alpha that applied only to items scored one and zero. Their formula was widely used, but there were many articles questioning its assumptions.

Evaluation of the 1951 article. My article was designed for the most technical of publications on psychological and educational measurement, Psychometrika. I wrote a somewhat encyclopedic article in which I not only presented the material summarized above but reacted to a number of publications by others that had suggested alternative formulas based on a logic other than that of alpha, or commented on the nature of internal consistency. This practice of loading an article with a large number of thoughts related to a central topic was normal practice and preferable to writing half a dozen articles on each of the topics included in the alpha article. In retrospect, it would have been desirable for me to write a simple article laying out the formula, the rationale and limitations of internal consistency methods, and the interpretation of the coefficients the formula yielded. I was not
aware for some time that the 1951 article was being widely cited as a source, and I had moved on once the article was published to other lines of investigation.

One of the bits of new knowledge I was able to offer in my 1951 article was a proof that coefficient alpha gave a result identical with the average coefficient that would be obtained if every possible split of a test were made and a coefficient calculated for every split. Moreover, my formula was identical to K-R 20 when it was applied to items scored one and zero. This, then, made alpha seem preeminent among internal consistency techniques. I also wrote an alpha formula that may or may not have appeared in some writing by a previous author, but it was not well known. I proposed to calculate alpha as

$$\alpha = \left(\frac{k}{k-1}\right)\left(1 - \frac{\sum s_i^2}{s_t^2}\right).$$

Here, k stands for the number of conditions contributing to a total score, and s is the standard deviation, which students have learned to calculate and interpret early in the most elementary statistics course. There is an $s_i$ for every column of a p × i layout (see Table 1a) and an $s_t$ for the column of total scores (usually test scores). The formula was something that students having an absolute minimum of technical knowledge could make use of.

Not only had equivalent formulas been presented numerous times in the psychological literature, as I documented carefully in the 1951 article, but the fundamental idea goes far back. Alpha is a special application of what is called the intraclass correlation, which originated in research on marine populations where statistics were being used to make inferences about the laws of heredity.5 R. A. Fisher did a great deal to explicate the intraclass correlation and moved forward into what became known as the analysis of variance.
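The alpha formula, and the two identities claimed for it, can be checked numerically. The Python sketch below is illustrative only. The data are invented and the function names are hypothetical. For the averaging claim it assumes the variance-based split-half coefficient 2(1 − (s_a² + s_b²)/s_t²) taken over splits into equal halves; the text does not say which split-half formula the proof used, so that choice is an assumption.

```python
from itertools import combinations
import numpy as np

def cronbach_alpha(scores):
    """alpha = k/(k-1) * (1 - sum(s_i^2) / s_t^2) for an n x k score array."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1)        # s_i^2, one per column
    total_var = scores.sum(axis=1).var(ddof=1)   # s_t^2 of the row totals
    return k / (k - 1) * (1.0 - item_var.sum() / total_var)

def kr20(binary):
    """K-R 20 for items scored 1/0, with p*q as each item's variance."""
    binary = np.asarray(binary, dtype=float)
    k = binary.shape[1]
    p = binary.mean(axis=0)                      # proportion scoring 1
    total_var = binary.sum(axis=1).var(ddof=0)   # same convention as p*q
    return k / (k - 1) * (1.0 - (p * (1 - p)).sum() / total_var)

def mean_split_half(scores):
    """Average variance-based split-half coefficient over all equal splits.

    Assumes an even number of columns. Each split is visited twice
    (as A/B and as B/A), which leaves the average unchanged.
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    total_var = scores.sum(axis=1).var(ddof=1)
    coefs = []
    for half_a in combinations(range(k), k // 2):
        half_b = [j for j in range(k) if j not in half_a]
        var_a = scores[:, list(half_a)].sum(axis=1).var(ddof=1)
        var_b = scores[:, half_b].sum(axis=1).var(ddof=1)
        coefs.append(2.0 * (1.0 - (var_a + var_b) / total_var))
    return float(np.mean(coefs))

# Hypothetical data (assumption): invented score tables, persons as rows.
scores = np.array([[5, 4, 4, 3], [3, 3, 2, 4], [4, 2, 5, 3],
                   [1, 2, 1, 0], [2, 0, 3, 1], [0, 1, 0, 2]])
binary = np.array([[1, 1, 1, 0], [1, 0, 1, 1], [0, 0, 1, 0],
                   [1, 1, 1, 1], [0, 0, 0, 0], [1, 0, 0, 0]])

print(cronbach_alpha(scores), mean_split_half(scores))
print(cronbach_alpha(binary), kr20(binary))
```

Under these assumptions each printed pair agrees to rounding error. The agreement with K-R 20 requires only that the item and total variances use the same divisor, since alpha depends on their ratio.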
Table 1a: Person × Item score (Xpi) sample matrix

                         Item
Person    1      2      …      i      …      k      Sum or total
1         X11    X12    …      X1i    …      X1k    X1
2         X21    X22    …      X2i    …      X2k    X2
…         …      …      …      …      …      …      …
p         Xp1    Xp2    …      Xpi    …      Xpk    Xp
…         …      …      …      …      …      …      …
n         Xn1    Xn2    …      Xni    …      Xnk    Xn

Note: Table added by the editor.

The various investigators who applied Fisher’s ideas to psychological measurement were all relying on aspects of analysis of variance, which did not begin to command attention in the United States until about 1946.6 Even so, to make so much use of an easily calculated translation of
a well-established formula scarcely justifies the fame it has brought me. It is an embarrassment to me that the formula became conventionally known as Cronbach’s α. The label alpha, which I applied, is also an embarrassment. It bespeaks my conviction that one could set up a variety of calculations that would assess properties of test scores other than reliability, and alpha was only the beginning. For example, I thought one could examine the consistency among rows of the matrix mentioned above (see Table 1a) to look at the similarity of people in the domain of the instrument. This idea produced a number of provocative ideas, but the idea of a coefficient analogous to alpha proved to be unsound (Cronbach & Gleser, 1953). My article had the virtue of blowing away a great deal of dust that had grown up out of attempts to think more clearly about K-R 20. So many articles tried to offer sets of assumptions that would lead to the result that there was a joke that “deriving K-R 20 in new ways is the second favorite indoor sport of psychometricians.” Those articles served no function once the general applicability of alpha was recognized. I particularly cleared the air by getting rid of the assumption that the items of a test were unidimensional, in the sense that each of them measured the same common type of individual difference, along with, of course, individual differences with respect to the specific content of items. This made it reasonable to apply alpha to the typical tests of mathematical reasoning, for example, where many different mental processes would be used in various combinations from item to item. There would be groupings in such a set of items, but not enough to warrant formally recognizing the groups in subscores. Alpha, then, fulfilled a function that psychologists had wanted fulfilled since the days of Spearman. The 1951 article and its formula thus served as a climax for nearly 50 years of work with these correlational conceptions. 
It would be wrong to say that there were no assumptions behind the alpha formula (e.g., independence), but the calculation could be made whenever an investigator had a two-way layout of scores with persons as rows and columns for each successive independent measurement.7 This meant that the formula could be applied not only to the consistency among items in a test but also to agreement among scorers of a performance test and the stability of performance of scores on multiple trials of the same procedure, with somewhat more trust than was generally defensible.
The Variance-Components Model

Working as a statistician in an agricultural research project station, R. A. Fisher designed elaborate experiments to assess the effects on growth and yield of variations in soil, fertilizer, and the like. He devised the analysis of variance as a way to identify which conditions obtained superior effects. This analysis
gradually filtered into American experimental psychology, where Fisher’s F test enters most reports of conclusions. A few persons in England and Scotland, who were interested in measurement, did connect Fisher’s method with questions about reliability of measures, but this work had no lasting influence. Around 1945, an alternative to analysis of variance was introduced, and this did have an influence on psychometrics. In the middle 1940s, a few mathematical statisticians suggested a major extension of Fisherian thinking into new territory. Fisher had started with agricultural research and thought of environmental conditions as discrete choices. A study might deal with two varieties of oats, or with several kinds of fertilizer, which could not be considered a random sample from a greater array of varieties. Fisher did consider plots to be sampled from an array of possible plots. That is, he would combine Species A with Fertilizer 1 and measure the results in some number of scattered areas. Similar samples of plots were used for each of the other combinations of species and fertilizer. In the postwar literature, it was suggested that one or both factors in a two-way design might be considered random. This opened the way for a method that reached beyond what Fisher’s interpretation offered. I have already mentioned the sampling of persons and the sampling of items or tasks, which can be analyzed with the new components-of-variance model, as will be seen. Burt, working in London and subject to the influence of Fisher, had carried the variance approach in the direction that became generalizability (G) theory, with alpha as a simplified case (Cronbach, Gleser, Nanda, & Rajaratnam, 1972).8 His notes for students in the 1930s were lost during World War II, and his ideas only gradually became available to Americans in articles where students had applied his methods. In 1951, Burt’s work was unknown to U.S. psychometricians.
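To make the components-of-variance idea concrete, the following Python sketch estimates person, condition, and residual components from a crossed persons × conditions table with one score per cell. It is an illustration under assumptions, not a procedure given in the text: the estimators are the usual expected-mean-square estimators for a two-way random design, the function name `variance_components` is hypothetical, and the data are invented.

```python
import numpy as np

def variance_components(scores):
    """Estimate variance components for a crossed n x k table.

    Rows are persons, columns are conditions (items, raters, ...),
    with one observation per cell. Returns the estimated person,
    condition, and residual components.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    person_means = x.mean(axis=1)
    cond_means = x.mean(axis=0)

    # Sums of squares for the two-way layout without replication.
    ss_p = k * ((person_means - grand) ** 2).sum()
    ss_i = n * ((cond_means - grand) ** 2).sum()
    ss_res = ((x - grand) ** 2).sum() - ss_p - ss_i

    # Mean squares.
    ms_p = ss_p / (n - 1)
    ms_i = ss_i / (k - 1)
    ms_res = ss_res / ((n - 1) * (k - 1))

    # Expected-mean-square estimators (both factors treated as random).
    return {
        "person": (ms_p - ms_res) / k,
        "condition": (ms_i - ms_res) / n,
        "residual": ms_res,
    }

# Hypothetical data (assumption): 6 persons x 4 conditions.
scores = np.array([[5, 4, 4, 3], [3, 3, 2, 4], [4, 2, 5, 3],
                   [1, 2, 1, 0], [2, 0, 3, 1], [0, 1, 0, 2]])
vc = variance_components(scores)
k = scores.shape[1]

# Ratio of person variance to person variance plus residual / k.
alpha_from_components = vc["person"] / (vc["person"] + vc["residual"] / k)
print(vc, alpha_from_components)
```

For a k-condition total, the ratio computed in the last step is algebraically equal to coefficient alpha, which is one way to see alpha as a simplified case of the variance-components approach. The condition component does not enter that ratio.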
Basics of Alpha

We obtain a score Xpi for person p by observing him in condition i. The term condition is highly general, but most often in the alpha literature it refers either to tests or to items, and I shall use the symbol i. The conditions, however, might be a great variety of social circumstances, and it would very often be raters of performance or scorers of responses. If the persons are all observed under the same condition, then the scores can be laid out in a column with persons functioning as rows; and when scores are obtained for two or more conditions, adding the columns for those conditions gives the score matrix (see Table 1a).9 We usually think of a set of conditions i with every person having a score on the first condition, on the second condition, and so on, although if there is an omission we will generally enter a score of 0 or, in the case of the scorer
failing to mark the article, we will have to treat this as a case of missing data. The alternative, however, is where each person is observed under a different series of conditions. The obvious example is where person p is evaluated on some personality trait by acquaintances, and the set of acquaintances varies from person to person, possibly with no overlap. Then there is no rational basis for assigning scores on the two persons to the same column. Formally, the situation where scores are clearly identified with the same condition i is called a crossed matrix because conditions are crossed with persons. In the second situation, there is a different set of conditions for each person; therefore, we may speak of this as a nested design because raters are nested within the person. Virtually all the literature leading down to the alpha article has assumed a crossed design, although occasional side remarks will recognize the possibility of nesting. Note that we also have a nested design when different questions are set for different persons, which can easily happen in an oral examination and may happen in connection with a portfolio. Second, a distinction is to be made between the sample matrix of actual observations (see Table 1a) and the infinite matrix (see Table 1b) about which one wishes to draw conclusions. (I use the term infinite because it is likely to be more familiar to readers than the technical terms preferred in mathematical discourse.) We may speak of the population-universe matrix for a conception where an infinite number of persons all in some sense of the same type respond to an infinite universe of conditions, again of the same type.10 The matrix of actual data could be described as representing a sample of persons crossed with a sample of conditions, but it will suffice to speak of the sample matrix. The alpha literature and most other literature prior to 1951 assumed that the sample matrix and the population matrix were crossed. 
Mathematically, it is easy enough to substitute scores from a nested sample matrix by simply taking the score listed first for each as belonging in column 1, but this is not the appropriate analysis.

Table 1b: Person × Item score (Xpi) infinite (population-universe) matrix

                         Item
Person    1      2      …      i      …      k → ∞
1         X11    X12    …      X1i    …      X1k
2         X21    X22    …      X2i    …      X2k
…         …      …      …      …      …      …
p         Xp1    Xp2    …      Xpi    …      Xpk
…         …      …      …      …      …      …
n → ∞     Xn1    Xn2    …      Xni    …      Xnk

Note: Table added by the editor.

All psychometric theory of reliability pivots on the concept of true score. (In G Theory, this is renamed “Universe Score,” but we need not consider the reasons here.) The true score is conceptualized as the average score the person would reach if measured an indefinitely large number of times, all
measurements being independent, with the same or equivalent procedures [average over k → ∞; see Table 1b]. The difference between the observed score and the person’s true score is the error. It is uncorrelated from one measurement to another, which is another statement of the independence principle. The concept of error is that random errors are unrelated to the true score and have a mean of zero over persons, or over repeated measurements.

The conception of true score is indefinite until the term equivalent is given a definite meaning. This did not occur until Lord (1955) cataloged various degrees in which parallel tests might resemble one another. At one extreme, there could be parallel tests where the content of Item 5 appeared in a second form of the instrument in other wording as, let us say, Item 11. That is to say, the specific content of the two tests, as well as the general dimensions running through many items, were duplicated. At the other extreme were random-parallel tests, where each test was (or could reasonably be regarded as) a random sample from a specified domain of admissible test items. It was the latter level of parallelism that seemed best to explain the function of coefficient alpha; it measured the consistency of one random sample of items with other such samples from the same domain.

A rather obvious description of the accuracy with which an instrument measures individual differences in the corresponding true score is the correlation of the observed score with the true score. Coefficient alpha is essentially equal to the square of that correlation. (The word essentially is intended to glide past a full consideration of the fact that each randomly formed instrument will have a somewhat different correlation with the true score.) Reliability formulas developed with assumptions rather different from those entering alpha are also to be interpreted as squared correlations of observed score with the corresponding true score, so alpha is on a scale consistent with tradition.
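The link between the squared correlation of observed with true score and a ratio of variances takes only a few lines to derive. The sketch below uses notation that is assumed here and not taken from the text: X for the observed score, T for the true score, E for the error, and σ² for a variance. It relies only on the statement that error is unrelated to true score.

```latex
\begin{align*}
X &= T + E, \qquad \operatorname{Cov}(T, E) = 0,\\
\operatorname{Cov}(X, T) &= \operatorname{Var}(T) + \operatorname{Cov}(E, T) = \sigma_T^2,\\
\rho_{XT} &= \frac{\operatorname{Cov}(X, T)}{\sigma_X \, \sigma_T} = \frac{\sigma_T}{\sigma_X},\\
\rho_{XT}^2 &= \frac{\sigma_T^2}{\sigma_X^2}.
\end{align*}
```

On this reading, a reliability coefficient interpreted as the squared correlation is the share of observed-score variance that is true-score variance.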
It might seem logical to use the square root of alpha in reports of reliability findings, but that has never become the practice.

The observed score is regarded as the sum of the true score and a random error. That statement and the independence assumption, which has its counterpart in the development of other reliability formulas, lead to the simple conclusion that the variance of observed scores is the sum of the error variance and the true score variance. It will be recalled that variance is really the square of the standard deviation. Each individual taking a test has a particular true score, which I may label T, and the true scores have a variance. The observed score has been broken into fractions representing error and true score. We may, therefore, interpret alpha as reporting the percentage of the observed individual differences (as described in their variance) that is attributable to true variance in the quality measured by this family of randomly parallel tests.11

In thinking about reliability, one can distinguish between the coefficient generated from a single set of n persons and k items and the value that would be obtained using an exceedingly large sample and averaging
Cronbach and Shavelson
Coefficient Alpha
coefficients over many random drawings of items. The coefficient calculated from a finite sample is to be considered an estimate of the population value of the coefficient. Little interest attaches to the consistency among scores on a limited set of items and a particular group of people. This is the usual consideration in research where data from the sample are used to infer relations in the population. In the history of psychometric theory, there was virtually no attention to this distinction prior to 1951, save in the writings of British-trained theorists. My 1951 article made no clear distinction between results for the sample and results for the population. It was not until Lord’s (1955) explicit formulation of the idea of random parallel tests that we began to write generally about the sampling, not only of persons, but of items. This two-way sampling had no counterpart in the usual thinking of psychologists. No change in procedures was required, but writing had to become more careful to recognize the sample-population distinction. The alpha formula is constructed to apply to data where the total score in a row of Table 1a will be taken as the person’s observed score. An equivalent form of the calculation applicable when the average is to be taken as the raw score yields the same coefficient. The alpha coefficient also applies to composites of k conditions. When an investigator wants to know what would happen if there were k′ conditions, the solution known as the Spearman-Brown Formula applies. My 1951 article embodied the randomly parallel-test concept of the meaning of true score and the associated meaning of reliability, but only in indefinite language. Once Lord’s (1955) statement was available, one could argue that alpha was almost an unbiased estimate of the desired reliability for this family of instruments. 
The almost in the preceding sentence refers to a small mathematical detail that causes the alpha coefficient to run a trifle lower than the desired value. This detail is of no consequence and does not support the statement made frequently in textbooks or in articles that alpha is a lower bound to the reliability coefficient. That statement is justified by reasoning that starts with the definition of the desired coefficient as the expected consistency among measurements that had a higher degree of parallelism than the random parallel concept implied. We might say that my choice of the true score as the expected value over random parallel tests and the coefficient as the consistency expected among such tests is an assumption of my argument.
There is a fundamental assumption behind the use of alpha, an assumption that has its counterpart in many other methods of estimating reliability. The parts of the test that identify columns in the score table (see Table 1a) must be independent in a particular sense of the word. The parts are not expected to have zero correlations. But it is expected that the experience of responding to one part (e.g., one item) will not affect performance on any subsequent item. The assumption, like all psychometric assumptions, is unlikely to
Research Design, Measurement and Statistics and Evaluation
be strictly true. A person can become confused on an item that deals with, say, the concept of entropy, and have less confidence when he encounters a later item again introducing the word. There can be fatigue effects. And, insofar as performance on any one trial is influenced by a person’s particular state at the time, the items within that trial are, to some degree, influenced by that state. One can rarely assert, then, that violations of independence are absent, and it is burdensome (if not impossible) to assess the degree and effect of nonindependence.12 One therefore turns to a different method or makes a careful judgment as to whether the violation of the assumption is major or minor in its consequence. If the problem is minor, one can report the coefficient with a word of caution as to the reasons for accepting it and warning that the nonindependence will operate to increase such coefficients by at least a small amount. When the problem is major, alpha simply should not be used. An example is a test given with a time limit so that an appreciable number of students stop before reaching the last items. Their score on these items not reached is inevitably zero, which raises the within-trial correlation in a way that is not to be expected of the correlations across separately timed administrations.
The alpha formula is not strictly appropriate for many tests constructed according to a plan that allocates some fraction of the items to particular topics or processes. Thus, in a test of mathematical reasoning, it may be decided to build 20% of the items around geometric shapes. The correlations among the several forms of the test that could be constructed by randomly sampling geometric items for that allocation will be higher than the correlations among forms whose items are sampled from the domain in general. The tests are not random parallel.
When the distribution of content is specified formally, it is possible to develop a formula to fit those specifications, but this is difficult and not appropriate when the allocation of items is more impressionistic than strict. In such an instance, one is likely to fall back on alpha and to recognize in the discussion that the coefficient underestimates the expected relationship between observed scores and true scores formed from tests, all of which satisfy the constraint. That is to say, alpha tends to give too low a coefficient for such tests. An extension of alpha to fit specifically the stratified parallel test (sometimes called stratified alpha; Cronbach, Schonemann, & McKie, 1965) can be based on the battery reliability formula that Jackson and Ferguson published in an obscure monograph.13
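As a rough illustration of why alpha runs low for tests built to a content plan, the sketch below computes ordinary alpha and a stratified coefficient on simulated data. The stratified formula used here, 1 − Σ_j V(X_j)(1 − α_j)/V(X), is the form commonly given for stratified alpha. The function names, the simulated two-stratum test, and the random seed are assumptions made for this example and are not taken from the chapter.

```python
import numpy as np

def cronbach_alpha(scores):
    """Coefficient alpha for a persons x items score matrix."""
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]
    sum_item_var = x.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = x.sum(axis=1).var(ddof=1)        # variance of total scores
    return k / (k - 1) * (1.0 - sum_item_var / total_var)

def stratified_alpha(scores, strata):
    """Stratified alpha; `strata` gives one label per item (column).

    Each stratum must contain at least two items.
    """
    x = np.asarray(scores, dtype=float)
    strata = np.asarray(strata)
    total_var = x.sum(axis=1).var(ddof=1)
    error = 0.0
    for label in np.unique(strata):
        block = x[:, strata == label]            # items of one stratum
        stratum_var = block.sum(axis=1).var(ddof=1)
        error += stratum_var * (1.0 - cronbach_alpha(block))
    return 1.0 - error / total_var

# Hypothetical data: 500 persons, 4 "geometry" and 4 "algebra" items.
# Every item shares a general factor; items in a stratum share one more.
rng = np.random.default_rng(0)
n = 500
general = rng.normal(size=(n, 1))
geometry = rng.normal(size=(n, 1))
algebra = rng.normal(size=(n, 1))
items = np.hstack([
    general + geometry + rng.normal(size=(n, 4)),
    general + algebra + rng.normal(size=(n, 4)),
])
strata = ["geometry"] * 4 + ["algebra"] * 4

alpha_plain = cronbach_alpha(items)
alpha_strat = stratified_alpha(items, strata)
print(f"alpha = {alpha_plain:.3f}, stratified alpha = {alpha_strat:.3f}")
```

In this simulated case the stratified coefficient exceeds plain alpha, in line with the remark that alpha tends to give too low a coefficient when forms share a content allocation.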
Variance Components and Their Interpretation
I no longer regard the alpha formula as the most appropriate way to examine most data. Over the years, my associates and I developed the complex generalizability (G) theory (Cronbach, Rajaratnam, & Gleser, 1963; Cronbach et al.,
1972; see also Brennan, 2001; Shavelson & Webb, 1991), which can be simplified to deal specifically with a simple two-way matrix and produce coefficient alpha. From 1955 to 1972, we exploited a major development in mathematical statistics of which psychologists were unaware in the early 1950s. Subsequently, I had occasion to participate in the analysis of newer types of assessments, including the use of performance samples where the examinee worked on a complex realistic problem for 30 minutes or more, and as few as four such tasks might constitute the test (Cronbach, Linn, Brennan, & Haertel, 1997). The performance was judged by trained scorers so that the data generated could be laid out in a two-way matrix.14 Here I sketch out the components of variance approach to reliability focusing on the simplest case where coefficient alpha applies, the Person × Condition data matrix (see Table 1a). Random sampling of persons and conditions (e.g., items, tasks) is a central assumption to this approach.
Giving Sampling a Place in Reliability Theory
Measurement specialists have often spoken of a test as a sample of behavior, but the formal mathematical distinction between sample of persons and populations of persons, or between a sample of tasks and a population [a universe] of tasks, was rarely made in writings on test theory in 1951 and earlier [see discussion of Fisher above]. Nevertheless, the postwar mathematical statistics literature suggested that one or both factors in a two-way design might be considered random. This opened the way for a method, the components of variance method, that reached beyond what Fisher’s interpretation offered.15
Random sampling, now, is almost invariably an assumption in the interpretation of psychological and educational data where conclusions are drawn, but the reference is to sampling of persons from the population. We are thinking now of a person universe matrix from which one can sample not only rows (persons) but also columns (conditions). Thus, the alpha article flirted with the thought that conditions are randomly sampled from the universe, but this idea did not become explicit until much later. Now, it is most helpful to regard the random sampling of persons as a virtually universal assumption and the random sampling of conditions that provide the data as an assumption of the alpha formula when the result is interpreted as applying to a family of instruments that are no more similar to each other than random samples of conditions would be. Investigators who want to postulate a higher degree of similarity among the composites would find alpha and related calculations underestimating the accuracy of the instrument.
The [random sampling] assumptions just stated are not true in any strict sense, and a naive response would be to say that if the assumptions are violated, the alpha calculations cannot be used. No statistical work would
be possible, however, without making assumptions and, as long as the assumptions are not obviously grossly inappropriate to the data, the statistics calculated are used, if only because they can provide a definite result that replaces a hand-waving interpretation. It is possible at times to develop a mathematical analysis based on a more complex set of assumptions, for example, recognizing that instruments are generally constructed according to a plan that samples from domains of content rather than being constructed at random. This is more troublesome in many ways than the analysis based on simple assumptions, but where feasible it is to be preferred.
Components of Variance
In the random model with persons crossed with conditions, it is necessary to recognize that the observed score for person p in condition i (Xpi) can be divided into four components, one each for (1) the grand mean, (2) the person (p), (3) the condition (i), and (4) a residual consisting of the interaction of person and condition (pi) confounded with random error (e), written pi,e:
$$X_{pi} = \mu + (\mu_p - \mu) + (\mu_i - \mu) + (X_{pi} - \mu_p - \mu_i + \mu).$$
The first of these, the grand mean, μ, is constant for all persons. The next term, μp – μ, is the person’s true score (μp) expressed as a deviation from the grand mean (μ) – the person effect. The true score, it will be recalled, is the mean that would be expected if the person were tested by an indefinitely large number of randomly parallel instruments drawn from the same universe. (In G Theory, it is referred to as the universe score because it is the person’s average score over the entire universe of conditions.) The μi term represents the average of the scores on item i in the population and is expressed as a deviation from μ – the item effect. The fourth term is the residual consisting of the interaction of person p with item i, which, in a p × i matrix, cannot be disentangled from random error, e. The residual simply recognizes the departure of the observed score from what would be expected in view of the μi level of the item and the person’s general performance level, μp. (In most writings, the residual term is divided into interaction and error, although in practice it cannot be subdivided because with the usual matrix of scores Xpi from a single test administration, there is no way to take such subdivision into account.) Except for μ, each of the components that enter into an observed score varies from one person to another, one item to another, and/or in unpredictable ways.
Recognizing that score components vary, we now come to the critically important equation that decomposes the observed-score variance into its component parts:16
$$V(X_{pi}) = V_p + V_i + V_{Res}.$$
Here, V is a symbol for the population variance. (In the technical literature, the symbol σ² is used.) The term on the left refers to the variation in scores in the extended matrix that includes all persons in the population and all items in the universe [see Table 1b]. It characterizes the extent of variation in performance. The equation states that this variance can be decomposed into three components, hence the name Components of Variance approach. The first term on the right is the variance among persons, the true-score variance. This is systematic, error-free variance among persons, the stuff that is the purpose and focus of the measurement. This variance component gives rise to consistency of performance across the universe of conditions. The i component of variance describes the extent to which conditions (items, tasks) vary. And the residual represents what is commonly thought of as error of measurement, combining the variability of performance to be expected when an individual can sometimes exceed his norm by gaining insight into a question and sometimes fall short because of confusion, a lapse of attention, and so forth.
The last equation is only slightly different from the statement made in connection with alpha and more traditional coefficients: The observed variance is the sum of true-score variance and error variance. The novelty lies in the introduction of the μi. In the long history of psychological measurement that considered only individual differences, the difference in item means is disregarded, having no effect on individual standings when everyone responds to the same items. Spearman started the tradition of ignoring item characteristics because he felt that the person’s position on the absolute score scale was of no interest.
He reasoned that the person’s score depended on a number of fairly arbitrary conditions, for example, the size and duration of a stimulus such as a light bulb, and on the background, as well as on the physical brightness itself. His main question was whether the persons who were superior at one kind of discrimination were superior at the next kind, and for this he was concerned only with ranks. Psychologists shifted attention from ranks to deviation scores, partly because these are sensitive to the size of differences between individuals in a way that ranks are not, are easier to handle mathematically, and fit into a normal distribution. (For a time, it was believed that nearly all characteristics are normally distributed, as a matter of natural law.) When psychologists and educators began to make standardized tests, some of them tried to use natural units, but this quickly faded out because of the sense that the individual’s score depended on the difficulty of the items chosen for the test. The rankings on arithmetic tests could be considered stable from one set of items to another, where the score itself was seen as arbitrary. Consequently, it was the statistics of individual differences observed in tests that received the greatest emphasis. Nonetheless, the absolute level of the person’s performance is of significance in many circumstances. This is especially true in the many educational tests
used to certify that the person has performed adequately. The critical score indicating minimal adequate performance is established by careful review of the tasks weighed by experts in the domain of the test. This score is established for the family of tests in general, not separately for each form in turn. When a candidate takes a form for which μi is unusually low, the number of examinees passing is reduced for no good reason. Therefore, persons using tests for absolute decisions must be assured that the choice of form does not have a large effect on a person’s chances of passing, which means that a low Vi, the variance of the condition means μi, is wanted.
The analysis that generates estimates of the three components is simple. One first performs an analysis of variance, ordinarily using one of the readily available computer programs designed for that purpose. Instead of calculating F ratios, one converts the mean squares (MS) for rows, columns, and a residual to components of variance. These equations apply:
$$\hat{V}_{Residual} = MS_{Residual}$$
$$\hat{V}_i = (MS_i - MS_{Residual})/n_p$$
$$\hat{V}_p = (MS_p - MS_{Residual})/n_i$$
It is to be understood that these components describe the contributions of the three sources to variation in scores at the item level. We are looking not at the decomposition of a particular item but at a typical result, in a sense averaged over many persons and items. These estimates are readily converted to estimates that would apply to test scores and to averages over specified numbers of persons. The components of variance are determined with the assumption that the average of scores in the row (see Table 1a) would lead to the composite score. Specifically, if randomly sampled tests of 20 items are applied, and the average score on the 20 items is reported, then VˆResidual for this average score is 1/20 of VResidual for a single item score. Results reached with that understanding are readily converted to the total score scale. If your interpretation is based on the total scores over 20 items, VˆResidual for this total score is 20 times greater than VˆResidual , but I shall stay with averages for observed scores because this keeps formulas a bit simpler.
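The conversion from mean squares to variance components can be sketched in a few lines. This is a minimal illustration for a complete persons × items matrix with one score per cell. The function name `variance_components`, the dictionary keys, and the small score matrix are hypothetical choices for this example, not part of the chapter.

```python
import numpy as np

def variance_components(scores):
    """Estimate V_p, V_i and V_Residual from a persons x items matrix
    using the mean squares of a two-way ANOVA without replication."""
    x = np.asarray(scores, dtype=float)
    n_p, n_i = x.shape
    grand = x.mean()
    person_means = x.mean(axis=1)
    item_means = x.mean(axis=0)

    # Sums of squares for rows (persons), columns (items) and residual.
    ss_p = n_i * ((person_means - grand) ** 2).sum()
    ss_i = n_p * ((item_means - grand) ** 2).sum()
    resid = x - person_means[:, None] - item_means[None, :] + grand
    ss_res = (resid ** 2).sum()

    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    # Mean squares converted to item-level variance components.
    return {
        "p": (ms_p - ms_res) / n_i,
        "i": (ms_i - ms_res) / n_p,
        "res": ms_res,
    }

# Hypothetical 5 persons x 4 items score matrix.
scores = np.array([
    [4, 3, 5, 2],
    [3, 2, 4, 2],
    [5, 4, 5, 3],
    [2, 2, 3, 1],
    [4, 2, 4, 1],
])
vc = variance_components(scores)

# Residual variance for an average over 20 items, and for a 20-item total.
res_average_20 = vc["res"] / 20
res_total_20 = vc["res"] * 20
print(vc, res_average_20, res_total_20)
```

The estimates describe single-item scores. Dividing the residual component by the number of items gives the value for a per-item average, and multiplying by it gives the value for a total score, as described above.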
Interpreting the Variance Components
The output from the analysis of variance is a set of estimates of characteristics of the population-universe matrix [see Table 1b]. The estimates are assumed to apply to any sample matrix. Obviously, they apply to the sample from which they were taken, and, for want of an alternative, the other possible sample matrices are assumed to be similar statistically. Variance components are generally interpreted by converting them to estimates of the corresponding standard deviations. Thus, the square root of
the Vˆp is a standard deviation of the distribution of individuals’ true scores, that is to say, the average score they would obtain if they could be tested on all conditions in the universe. One might consider forming a composite instrument by combining many conditions, the usual test score being a prominent example. If the test score is expressed as a per-condition average, then the standard deviation just calculated applies to the true score on such composites. If, however, as is often the case, the total score over conditions is to be used, then the value of the standard deviation must be multiplied by the number of items to put it on the scale of the composite. The usual rule of thumb for interpreting standard deviations is that two thirds of the scores of persons will fall within one standard deviation of the mean, and 95% of the persons will fall within two standard deviations of the mean. The standard deviation of true scores gives a clearer picture of the spread of the variable being measured than the standard deviation that is calculated routinely from observed scores, because the effect of random errors of measurement is to enlarge the range of observed scores. Working from the Vˆp indicates whether the variable of interest is spread over much of the possible score scale or is confined to a narrow range. μp is the row mean in the population-universe matrix [see Table 1b], and μi is the column mean, that is to say, the population mean for all p under condition i. The variance of column means Vi is therefore the information about the extent to which condition means differ. A standard deviation may be formed and interpreted just as before, this time with the understanding that the information refers to the spread of the items (or, more generally, the spread of the conditions) and not the spread of persons. 
The standard deviation for condition means gives a direct answer to questions such as the following: Do the items in this ability test present similar difficulty? Do the statements being endorsed or rejected in a personality inventory have similar popularity? Do some of the persons scoring this performance exercise tend to give higher scores than others? It is important to reiterate that we are concerned with characteristics of the population and universe. We are arriving at a statement about the probable spread in other samples of conditions that might be drawn from the universe. Where we have a composite of k′ single conditions, the estimated variance for μ i must be divided by k′ (i.e., Vˆi /k ′). The standard deviation is reduced correspondingly, and if the composite is being scored by adding the scores on the elements, the estimated value of Vˆi is k′ times as large as that for single conditions. A comparatively large value of this standard deviation raises serious questions about the suitability of an instrument for typical applications. If students are being judged by whether they can reach a level expressed in terms of score units (e.g., 90% of simple calculations), then the student who happens to be given one of the easier tests has a considerable advantage and the test interpreter may get too optimistic an impression of the student’s ability. Similarly, when one of a group of scorers is comparatively lenient, the students who are
lucky enough to draw that scorer will have an advantage over students who draw one of the others.
To introduce the residual or the RES, it may help to think of a residual score matrix that would be formed by adjusting each Xpi by subtracting out μp for person p and μi for condition i, then adding in the constant (μ) equal to the overall mean of scores in the population. These are scores showing the inconsistency in the individual’s performance after you make allowance for his level on the variable being measured, and the typical scores on the conditions in the universe. The residual scores spread around the value of zero. They represent fluctuations in performance, some of which can be explained by systematic causes, and some of which are due to nonrecurrent variation such as those due to momentary inattention or confusion. A few of the possible systematic causes can be listed:
• In an ability test, the student finds certain subtopics especially difficult and will consistently have a negative residual on such items; for example, the student taking a math test may be confused about tangents, even when he or she is at home with sines and cosines. Deviations can also arise from picking the high-scoring alternative when choosing between attractive options, and also from sheer good or bad luck in guessing.
• In an anxiety inventory, a student who can generally say that he or she has no emotional problems in situation after situation may recognize a timidity about making speeches or otherwise exposing himself or herself to the scrutiny of a group, and thus respond to the related items in a way that deviates from his or her typical response.
Additive Combinations of Variance Components
The interpretation of components gives information about the population-universe matrix, but it is combinations of components that more directly yield answers to the questions of a prospective user of an instrument, including the following: How much do the statistics for the instrument change as k′ is increased or decreased? How much greater precision is achieved by using a crossed rather than a nested design for the instrument? How much is the score from a sample of conditions expected to differ from the universe score? How much is the uncertainty about the universe score arising from such errors of measurement? Adding two or three variance components in an appropriate way estimates the expected observed-score variance for measures constructed by sampling conditions. The word expected signifies that we can estimate only for a particular new set of randomly sampled conditions. I take up first the estimate for nested conditions where different individuals are assessed under different sets of conditions (see Table 2). The most common example is where scores on observations of performance tasks for
Table 2: Statistics applying to two types of designs and two types of decisions

                                          Measurement
Design                                    Absolute                Differential
Nested: Conditions (i) within Persons (p) – i:p
  Universe-score variance                 Vp                      Vp
  Expected observed-score variance        Vp + (Vi + VRes)/k′     Vp + (Vi + VRes)/k′
  Error variance                          (Vi + VRes)/k′          (Vi + VRes)/k′
Crossed: Conditions (i) crossed with Persons (p) – p × i
  Universe-score variance                 Vp                      Vp
  Expected observed-score variance        Vp + (Vi + VRes)/k′     Vp + VRes/k′
  Error variance                          (Vi + VRes)/k′          VRes/k′

Note: It is assumed that each person responds to a sample of k′ conditions and that the score for the person is the average of these scores under separate conditions. If the totals were used instead, the entries in the table would be increased but the patterning would remain the same. The standard error of measurement is the square root of the error variance. The reliability coefficient pertains only to differential measurement and is obtained by dividing the universe-score [true-score] variance by the expected observed-score variance.
each individual are assigned by different scorers selected haphazardly from a pool of qualified scorers. The expected observed-score variance here is a weighted sum of all three components. Assume that there are k′ conditions and that the average score over conditions will be used:
$$\hat{V}_{X_{pi}} = \hat{V}_p + \hat{V}_{Res}/k'$$
where the residual consists of three variance components confounded with one another: V̂i, V̂pi, and ê. The weight of V̂p is 1. The other two components (conditions confounded with the pi interaction and error) are weighted by 1/k′. This allows for the fact that as more conditions are combined, random variability of the average decreases. If future observations will be made by means of a crossed design, everyone being observed under the same set of conditions, then the expected observed variance is Vp plus VRes/k′. The variation in conditions (i) makes no contribution, because everyone is exposed to the same conditions and all scores are raised or lowered on easy and difficult items (respectively) by a constant amount.
In the crossed p × i design (see Table 1a), each person is observed under each condition. The most common example is where scores are available for each individual on each item on a test. The expected observed-score variance here (see Table 2) is a weighted sum of Vp and VRes, where VRes consists of Vpi,e. Again, the weight of Vp is 1. The residual is weighted by 1/k′. A comparison of the residual terms for the nested and crossed design shows that in the nested design, the variance due to conditions cannot be disentangled from the variances due to the person by condition interaction and random error. With a crossed design, condition variance can be disentangled from variance due to the person by condition interaction and error. Consequently, the nested-design residual will be larger than or equal to the crossed-design residual.
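The additive combinations described above can be written as one small function. The sketch follows the prose: the condition component V_i counts as error except in a crossed design used for differential decisions. The function name, argument names, and the numerical component values are hypothetical and chosen only for illustration.

```python
import math

def combine_components(v_p, v_i, v_res, k, design, decision):
    """Universe-score, error and expected observed-score variance for the
    average over k sampled conditions.

    design:   "nested" (i:p) or "crossed" (p x i)
    decision: "absolute" or "differential"
    """
    if design not in ("nested", "crossed"):
        raise ValueError("design must be 'nested' or 'crossed'")
    if decision not in ("absolute", "differential"):
        raise ValueError("decision must be 'absolute' or 'differential'")

    if design == "crossed" and decision == "differential":
        # Everyone meets the same conditions, so condition means drop out.
        error = v_res / k
    else:
        # Condition differences contribute to error.
        error = (v_i + v_res) / k

    return {
        "universe": v_p,
        "error": error,
        "observed": v_p + error,
        "sem": math.sqrt(error),  # standard error of measurement
    }

# Hypothetical item-level components and a 10-condition instrument.
nested = combine_components(0.50, 0.20, 1.00, 10, "nested", "differential")
crossed = combine_components(0.50, 0.20, 1.00, 10, "crossed", "differential")
print(nested)
print(crossed)
```

With these assumed values the nested design carries the larger error variance, which is the comparison drawn in the paragraph above.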
The Standard Error
A much more significant report on the measuring instrument is given by the residual (error) variance and its square root, the standard error of measurement (SEM). This describes the extent to which an individual’s scores are likely to vary from one testing to another when each measurement uses a different set of conditions. In the nested design, the error variance equals the expected observed score variance as calculated above minus Vp. This leaves us with the weighted sum of the i and residual components of variance, both of which represent sources of error.
The rule of thumb I suggest for interpreting the standard error assumes that errors of measurement for any person are normally distributed, and the standard error tends to be the same in all parts of the range. Both of these assumptions can be questioned. Indeed, when complex analyses are used to estimate a standard error in each part of the range, it is usual for the standard error to show a trend, higher in some ranges of universe [true] scores than others. Here again, we rely on the rule of thumb, because it is impractical to interpret the standard error without it.
Observed scores depart in either direction from the person’s universe score. Two thirds of the measurements, according to the usual rule of thumb, fall within one SEM of the universe score, and 95% fall within two SEM. Here we have a direct report on the degree of uncertainty about the person’s true level of performance. The figure is often surprisingly large and serves as an important warning against placing heavy weight on the exact score level reached. For many purposes, a useful scheme is to report scores as a band rather than a single number. Thus, in a profile of interest scores, one would have an array of bands, some spanning a low range and some spanning a high range, but usually with a good many that overlap to a large degree.
This discourages emphasis on which interest is strongest and encourages attention to the variety of categories in which the person expresses interest.
For a design with conditions (e.g., scorers) nested within persons, the residual or measurement error includes differences in condition means as well as unsystematic (random) variation (due to the p × i interaction confounded with random error; see Table 2). In this case, we speak about what may be called absolute measurement, where the level of a person’s score, and not just his or her standing among peers, is of concern. Many educational applications of tests require a judgment as to whether the examinee has reached a predetermined score level. Examinees are not in competition; all may meet the standard, or none.
For a design with conditions (e.g., items) crossed with persons, the residual or measurement error does not include differences in condition means. So the residual is an index of relative or differential error disentangled from differences in condition means. In contrast to absolute measurement, this
differential measurement is concerned with the relative standing of persons. In selection, when there are a limited number of positions to be allotted, the highest scoring individuals are given preference. Few practical decisions are based directly on such simple rankings, but this is the formulation that permits statistical analysis. It should be noted also that where the correlation between one instrument and another is to be the basis for interpreting data, the interpretation is differential. It was his interest in correlations that led Spearman originally to define the reliability coefficient so that it applied to differential measurement (which ignores the contribution of variation in μi to error). This tradition dominated the literature on reliability down through the alpha article.
Many tests convert the raw score to a different form for use by interpreters. Thus, the raw score on an interest inventory is often expressed as a percentile rank within some reference distribution. There is no way to apply internal consistency analysis directly to such converted scores. One can, however, express the bounds on the probable true score on the raw score scale, as has been illustrated. Then each limit can be rescaled to apply to the new scale. As an illustration, suppose that raw scores 40, 50, and 60 convert to percentile scores 33, 42, and 60, respectively. Then an observed score of 50 converts to a percentile score of 42. If we have established that two thirds of the raw scores fall between 43 and 57, these can be converted to the new scale supplying an asymmetric confidence range running from approximately 37 to 56. Note that the interval is no longer symmetric around the observed score.
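The band-and-rescale procedure can be sketched as follows. The raw-score band is the observed score plus or minus one SEM (an SEM of 7 is implied by the 43 to 57 range), and each limit is then converted separately. Linear interpolation between the three quoted conversion points is an assumption made only for this illustration. A real percentile table is not linear between anchors, so the limits produced here (about 36 and 55) differ slightly from the approximate figures quoted above. The function names are hypothetical.

```python
import numpy as np

def score_band(observed, sem, n_sem=1.0):
    """Band of n_sem standard errors around an observed raw score."""
    return observed - n_sem * sem, observed + n_sem * sem

def rescale_band(band, raw_points, converted_points):
    """Convert each limit of a raw-score band to another scale.

    Assumes a monotone conversion given at a few anchor points and
    interpolates linearly between them (an illustrative simplification).
    """
    low, high = band
    return (
        float(np.interp(low, raw_points, converted_points)),
        float(np.interp(high, raw_points, converted_points)),
    )

raw_points = [40, 50, 60]         # raw scores quoted in the text
percentile_points = [33, 42, 60]  # their percentile equivalents

observed = 50.0
raw_band = score_band(observed, sem=7.0)  # two thirds of scores: 43 to 57
pct_band = rescale_band(raw_band, raw_points, percentile_points)
pct_observed = float(np.interp(observed, raw_points, percentile_points))

print("raw band:", raw_band)
print("percentile band:", pct_band, "around", pct_observed)
```

Because the conversion is steeper above 50 than below it, the upper half of the converted band is wider than the lower half, which is the asymmetry noted above.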
Reliability Coefficients
We come now to reliability coefficients estimated with variance components. These coefficients describe the accuracy of the instrument on a 0-to-1 scale; the alpha coefficient fits this description. The assumptions underlying the formulas for estimating variance components are quite similar to the assumptions made in connection with alpha. We discuss here only the analysis of the crossed design, which matches the basis for alpha. The principal change is that because variance components are used to make inferences to the population-universe matrix [see Table 1b] rather than describing the sample, the random sampling of persons and of conditions becomes a formal assumption.
In general, the coefficient would be defined as Vp divided by the expected observed variance. We have seen above that the expected observed variance takes on different values, depending on the design used in data collection. Coefficients differ correspondingly. The alpha coefficient applies to a crossed design implying k conditions. It refers to the accuracy of differential measurement with such data. Computing components of variance has the advantage that an observed-score variance is estimated in terms of k′, which may take on any value. Thus, direct calculation of the expected observed variance
(with the implied and important standard error) reaches the result for which the Spearman-Brown formula has traditionally been utilized.17 As the expected observed variance is larger for a nested design than a crossed design [see Table 2], the coefficient is smaller than that from the crossed design. This is important because an instrument developer often sets up the crossed design in checking the accuracy of the instrument when practical conditions make it likely that the actual data obtained will have a nested design.
Differential and absolute measurements and reliability. It will be noted that the alpha coefficient is included as one of the statistics reported with differential decisions and not with absolute decisions. A coefficient could be calculated by formal analogy to the entry in the differential column, but it would be meaningless. A coefficient is concerned with individual differences, and those are irrelevant to absolute decisions.
Homogeneity/heterogeneity of samples of conditions. Whereas the topic of homogeneity was the subject of heated discussion in the late 1940s, it has faded from prominence. There are, however, investigators who believe that good psychological measurement will rely on homogeneous instruments, where homogeneity can be thought of as consistency from one condition to another in the ranking of individuals. A contrary position emphasizes that one needs to represent all aspects of the variable that is the focus of measurement, not narrowing it to a single focal topic. An appropriate statistic for evaluating the homogeneity of conditions is the value of the reliability coefficient when k′ is set at 1. The value of this coefficient is held down not only by diversity among conditions, but also by the sheer unreliability of an individual’s performance in responding many times to the same condition. More advanced techniques, such as factor analysis, can remove much of the ambiguity.
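The relations described in this section can be made concrete with a short Python sketch. It is a hedged illustration, not the procedure of Table 2: the function names and the numerical variance components are invented, and the nested-design error term is assumed to add the condition component Vi to the residual, following the verbal description given earlier. The sketch shows the coefficient for an arbitrary k′, the smaller value under a nested design, the single-condition (k′ = 1) value used as a homogeneity index, and the agreement between direct calculation and the Spearman-Brown formula.

```python
def reliability(v_p, v_i, v_res, k_prime, design="crossed"):
    """Person variance divided by expected observed-score variance.

    crossed: error variance is v_res / k_prime.
    nested:  error variance is (v_i + v_res) / k_prime (assumed form).
    """
    if k_prime <= 0:
        raise ValueError("k_prime must be positive")
    if design == "crossed":
        error = v_res / k_prime
    elif design == "nested":
        error = (v_i + v_res) / k_prime
    else:
        raise ValueError("design must be 'crossed' or 'nested'")
    return v_p / (v_p + error)


def spearman_brown(r_single, k_prime):
    """Classical Spearman-Brown projection from one condition to k_prime."""
    return k_prime * r_single / (1 + (k_prime - 1) * r_single)


# Invented variance components, for illustration only.
v_p, v_i, v_res = 0.50, 0.10, 1.00

crossed_10 = reliability(v_p, v_i, v_res, k_prime=10)
nested_10 = reliability(v_p, v_i, v_res, k_prime=10, design="nested")
single = reliability(v_p, v_i, v_res, k_prime=1)  # homogeneity index
projected = spearman_brown(single, 10)            # same value as crossed_10

print(round(crossed_10, 3), round(nested_10, 3), round(single, 3))
```

Passing the components rather than raw scores keeps the function usable for any k′, which is the advantage the text attributes to working with variance components.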
Recommendations
General Observations and Recommendations
I am convinced that the standard error of measurement, defined in accordance with the relevant cell of Table 2, is the most important single piece of information to report regarding an instrument, and not a coefficient. The standard error, which is a report on the uncertainty associated with each score, is easily understood not only by professional test interpreters but also by educators and other persons unschooled in statistical theory, and by lay persons to whom scores are reported.
There has been a shift in the character of the way measurement is used. The change is obvious in much of educational assessment, where the purpose is to judge individuals or student bodies relative to specified performance standards. Rankings are irrelevant. A similar change is to be seen in screening applicants for employment, where the employer now bears a burden of proof
that the choice of a higher scoring individual is warranted, a policy that seems to work against minority candidates. In making comparisons between candidates, the employer wants to know whether a difference in favor of one of the two would probably be confirmed in another testing. (Questions about the predicted job performance of the candidates are more significant than questions about accuracy of measurement, but inaccurate measurement sets a limit on the accuracy that predictions can obtain.) The investigator charged with evaluating reliability ought to obtain information on the most prominent sources of potential error. For instruments that make use of judgment of scorers or raters, a simple p × i design is inadequate. The alpha coefficient, which relies on that design, is appropriate enough for objectively scored tests where items can be considered a sample from the domain. But even in the limited situation contemplated in a p × i design, the application of the alpha formula does not yield estimates of the three components of variance or the sums listed in Table 2. I cannot consider here data structures in which conditions are classified in more than one way. In general, a person responsible for evaluating and reporting the accuracy of a measurement procedure ought to be aware of the variety of analyses suggested by Table 2 and include in the report on the instrument information for all of the potential applications of the instrument. Sometimes the investigator will know that the instrument is to be used in correlational research only, in which case a reliability coefficient may be the only report needed. But most instruments lend themselves to more diversified applications. 
I suggest that the person making judgments about the suitability of an instrument for its purposes, or about the trust that can be placed in observed scores, consider these questions: In my use of the instrument, will I be concerned with the absolute standing of persons, or groups, or the comparative standing? The choice of a single statistic to summarize the accuracy of an instrument is not the best report that can be made. I recommend that the three separate components of variance be reported. Given this information, the investigator can combine the components or not, according to the competence of his or her likely readership.
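As a minimal sketch of how the three components might be obtained from a persons-by-conditions score matrix, the Python below uses the usual two-way random-effects analysis of variance with one score per cell. That estimator is the standard one rather than something spelled out in these notes, and the function names and the small score matrix are invented for illustration. The sketch also checks that the components, combined for k′ = k in the crossed design, reproduce the value given by the familiar alpha formula.

```python
from statistics import variance


def variance_components(scores):
    """Estimate V_p, V_i, V_res from a crossed persons x conditions matrix."""
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete matrix with at least 2 rows and 2 columns")
    grand = sum(sum(row) for row in scores) / (n * k)
    p_means = [sum(row) / k for row in scores]
    i_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Mean squares for persons, conditions, and the residual.
    ms_p = k * sum((m - grand) ** 2 for m in p_means) / (n - 1)
    ms_i = n * sum((m - grand) ** 2 for m in i_means) / (k - 1)
    ms_res = sum(
        (scores[p][j] - p_means[p] - i_means[j] + grand) ** 2
        for p in range(n)
        for j in range(k)
    ) / ((n - 1) * (k - 1))

    return {
        "V_p": (ms_p - ms_res) / k,
        "V_i": (ms_i - ms_res) / n,
        "V_res": ms_res,
    }


def classical_alpha(scores):
    """Alpha from item variances and total-score variance."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Invented scores: 5 persons (rows) by 3 conditions (columns).
scores = [
    [2.0, 3.0, 3.0],
    [4.0, 4.0, 6.0],
    [5.0, 7.0, 6.0],
    [7.0, 8.0, 9.0],
    [3.0, 5.0, 4.0],
]

comps = variance_components(scores)
k = len(scores[0])
alpha_from_components = comps["V_p"] / (comps["V_p"] + comps["V_res"] / k)
alpha_classical = classical_alpha(scores)

print(comps)
print(round(alpha_from_components, 4), round(alpha_classical, 4))
```

Reporting the dictionary of components, rather than only the final ratio, leaves the reader free to recombine them for another design or another k′.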
Considerations in Conducting a Reliability Study
Aspects of the test plan. The investigator conducting a reliability study should consider a number of points in taking advantage of the information laid out. I write here as if the investigator believes that his or her instrument is likely to be useful in future studies by him or her or by others, and that the investigator is therefore providing guidance for instrumentation in those studies. Of course, the case may be that the investigator is interested in the current set of data and only that set, and has no intention of making further use of the instrument. If so, the investigator will run through these considerations,
giving much weight to some and little weight to others in deciding on the adequacy of the scores for the purpose of that one study. I assume that the investigator is starting with a matrix of scores for persons crossed with conditions, such as are used with the alpha formula.
Independence in sampling. The first step is to judge whether assumptions behind the calculations are seriously violated by the data being used. Violations of the independence assumption can often be regarded as having little consequence, but some violations are serious. The most prominent and frequent misuse of the computations discussed in this article is to apply them to a test where the examinees are unable to complete many items on which they have a reasonable probability of earning a nonzero score. The data may then be used only if it is considered reasonable to truncate the data set, eliminating persons who have too many items not completed, or omitting items toward the end of the set from the calculation. This is a makeshift solution, but it may be necessary.
Heterogeneity of content. Another common difficulty is that conditions fall into psychologically distinct classes, which calls into question the assumption that conditions are randomly sampled. There is no reason to worry about scattered diversity of items, but if, for example, a test in mathematics is planned with some number of geometric-reasoning items and a certain number of numeric-reasoning items, the sampling is not random. This type of heterogeneity is not a bar to use of the formulas. It needs only to be recognized that an analysis that does not differentiate between the two classes of items will report a larger standard error than a more subtle analysis.
How the measurement will be used. Decide whether future uses of the instrument are likely to be exclusively for absolute decisions, for differential decisions, or may include both uses (not necessarily in the same study).
If either type of decision is unlikely to be made with this instrument in future applications, no further information need be stated for it. Once this decision is made, I recommend that the investigator calculate estimates for the components of variance and combine these to fill in numerical values for the rows of each relevant column of Table 2. With respect to differential decisions, the standard error from a nested design will be at least a bit larger than the standard error from a crossed design. This larger error, plus the appearance of greater fairness, favors use of crossed designs wherever feasible. However, in large-scale programs such as tests for college admissions, it may seem easy to provide crossed data, when in fact the data are from a nested design. Examinees tested on different dates, or perhaps in different locales, will take different forms of the test and yet be compared with each other. Where it is practical to obtain crossed data for a reliability study, the program itself will always have a nested design. Likewise, a crossed design with a small group of scorers is feasible for the reliability study, but the crossing is impractical in operational scoring of the instrument.
Number of conditions for the test. Next, specify the standard error considered acceptable for the purpose of the measurement. Calculate the value of k′ that changes the previously calculated standard error to the acceptable one. The original value assumed the decisions would be based on responses to k conditions; the new calculation may produce a higher or lower value of k′. Increasing k′ to the value just calculated may prove too costly, and a compromise must be made between cost and precision. When a test will be used in a variety of contexts, different users may specify different standard errors as acceptable. Anticipating that problem, the original investigator could well set up a table with several values of the standard error and the corresponding k′ required to achieve each one. If the instrument is to be used in correlational research only, it may be easier to specify an acceptable reliability coefficient than a standard error. The equations in the differential column make it simple to convert the acceptable coefficient into an acceptable probable error.
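The planning table suggested here can be sketched as follows. The code is a hedged illustration under stated assumptions: the squared standard error is taken to be the single-condition error variance divided by k′ (the residual alone for differential decisions from a crossed design, the residual plus the condition component where condition means enter the error), the differential coefficient is taken as Vp / (Vp + SE²), and the function names and numerical components are invented.

```python
import math


def conditions_needed(error_var_one_condition, target_se):
    """Smallest whole k' for which error_var / k' does not exceed target_se**2."""
    if target_se <= 0:
        raise ValueError("target_se must be positive")
    ratio = error_var_one_condition / target_se ** 2
    # Small tolerance so floating-point noise does not add a spurious condition.
    return math.ceil(ratio - 1e-9)


def se_from_coefficient(v_p, coefficient):
    """Standard error implied by an acceptable differential coefficient."""
    if not 0 < coefficient < 1:
        raise ValueError("coefficient must lie strictly between 0 and 1")
    return math.sqrt(v_p * (1 - coefficient) / coefficient)


# Invented components: person variance and single-condition residual variance.
v_p, v_res = 0.50, 1.00

# Table of acceptable standard errors and the k' each one requires.
table = [(se, conditions_needed(v_res, se)) for se in (0.5, 0.4, 0.25)]
for se, k_prime in table:
    print(f"SE {se:.2f} -> k' = {k_prime}")

# Starting from an acceptable coefficient instead of a standard error.
se_for_80 = se_from_coefficient(v_p, 0.80)
k_for_80 = conditions_needed(v_res, se_for_80)
print(f"coefficient 0.80 -> SE {se_for_80:.3f} -> k' = {k_for_80}")
```

Rounding k′ up reflects that a test cannot contain a fraction of a condition; whether the resulting length is affordable is the cost-versus-precision compromise discussed above.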
Main Message of These Notes
The alpha coefficient was developed out of the history that emphasized a crossed design used for measuring differences among persons. This is now seen to cover only a small perspective of the range of measurement uses for which reliability information is needed. The alpha coefficient is now seen to fit within a much larger system of reliability analysis.
Notes
The project could not have been started without the assistance of Martin Romeo Shim, who helped me not only with a reexamination of the 1951 paper but with various library activities needed to support some of the statements in these notes. My debt is even greater to Shavelson for his willingness to check my notes for misstatements and outright errors of thinking, but it was understood that he was not to do a major editing. He supported my activity, both psychologically and concretely, and I thank him.
1. [All Editor’s Notes in text, as well as in subsequent endnotes, are in brackets.]
2. [To give some notion of how extraordinary this annual citation frequency is for a psychometric piece, Noreen Webb and I published Generalizability Theory: A Primer in 1991. The average number of social science citations over the past 5 years was 11 per year!]
3. [Cronbach, Rajaratnam, & Gleser (1963).]
4. [In “Coefficient Alpha,” Cronbach (1951, p. 300) cites both Spearman (1910) and Brown (1910) as providing the first definition of a split-half coefficient.]
5. [As applied to reliability, intraclass correlation is a ratio of true-score (typically person) variance to observed-score variance for a single condition which is composed of true-score variance plus error variance.]
6. The articles by others working with Fisher’s ideas employed a number of statistical labels that gave a result identical to my formula but that were unfamiliar to most persons applying measurements. This explains why so little use was made of these formulas. Priority in
applying the appropriate intraclass correlation to measurements probably goes to R. W. B. Jackson (Jackson & Ferguson, 1941). So far as I recall, no one had presented the version that I offered in 1951, except for the Kuder-Richardson report, which did not give a general formula.
7. Violation of independence usually makes the coefficient somewhat too large, as in the case where the content of each test form is constrained, for example, by the requirement that 10% of items in a mathematical reasoning test should be concerned with geometric reasoning. Then, the items can be described as chosen at random within the category specified in the [test] plan, but this is stratified random sampling rather than random sampling. The alpha formula will underestimate the reliability of such instruments (Cronbach, Schonemann, & McKie, 1965).
8. [Cronbach is likely referring to Burt (1936).]
9. Realistically, of course, conditions themselves may be classified in two or more ways, for example, test questions being one basis for classification and scorer being another. The matrices that result when persons are combined with such complex systems of conditions are the subject of generalizability theory (Cronbach, Gleser, Nanda, & Rajaratnam 1972), and did not enter into the 1951 article.
10. To avoid confusion, my colleagues and I adopted the convention of referring to the domain of items from which tests were presumably sampled as the universe of items, reserving the term population for the persons represented in a study.
11. The statements in the preceding two paragraphs are in no way peculiar to alpha. They appear in the theory for any other type of reliability coefficient, with the sole reservation that some coefficients rest on the assumption that every test in a family has the same correlation with the corresponding true score.
12. This assumption of independence enters the derivation of any internal-consistency formula.
13. [Cronbach is likely referring to Jackson and Ferguson (1941).]
14. Most of the analyses involved more complex structures, for instance, a three-way matrix in which persons, tasks, and scorers were treated as separate bases for sorting scores.
15. It may be said at the outset that these methods retained Fisher’s calculations but then went beyond them to an interpretation that would have been meaningless with fixed factors such as species.
16. [$V_p = E(\mu_p - \mu)^2$; $V_i = E(\mu_i - \mu)^2$; $V_{\mathrm{Res}} = E(X_{pi} - \mu_p - \mu_i + \mu)^2$; $V_{X_{pi}} = V_p + V_i + V_{\mathrm{Res}}$, where $E$ is the expectation operator.]
17. [Alpha, expressed in variance-component terms, is
$$\alpha = \frac{V_p}{V_p + \dfrac{V_{\mathrm{Res}}}{k'}}$$
where $k'$ provides the Spearman-Brown adjustment for length of test (or, alternatively, number of tests).]
References
Brennan, R. L. (2001). Generalizability theory. New York: Springer-Verlag.
Brown, W. (1910). Some experimental results in the correlation of mental abilities. British Journal of Psychology, 3, 296–322.
Burt, C. (1936). The analysis of examination marks. In P. Hartog & E. C. Rhodes (Eds.), The marks of examiners (pp. 245–314). London: Macmillan.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334.
Cronbach, L. J. (Ed.). (2002). Remaking the concept of aptitude: Extending the legacy of Richard E. Snow. Mahwah, NJ: Lawrence Erlbaum.
Cronbach, L. J., & Gleser, G. C. (1953). Assessing similarity among profiles. Psychological Bulletin, 50(6), 456–473.
Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: John Wiley.
Cronbach, L. J., Linn, R. L., Brennan, R. L., & Haertel, E. H. (1997). Generalizability analysis for performance assessments of student achievement or school effectiveness. Educational and Psychological Measurement, 57, 373–399.
Cronbach, L. J., Rajaratnam, N., & Gleser, G. C. (1963). Theory of generalizability: A liberalization of reliability theory. British Journal of Statistical Psychology, 16, 137–163.
Cronbach, L. J., Schonemann, P., & McKie, D. (1965). Alpha coefficients for stratified-parallel tests. Educational and Psychological Measurement, 25, 291–312.
Jackson, R. W. B., & Ferguson, G. A. (1941). Studies on the reliability of tests (Bulletin No. 12, Department of Educational Research, Ontario College of Education). Toronto, Canada: University of Toronto Press.
Lord, F. M. (1955). Estimating test reliability. Educational and Psychological Measurement, 15, 325–336.
Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. Newbury Park, CA: Sage.
Spearman, C. (1910). Correlation calculated with faulty data. British Journal of Psychology, 3, 271–295.
88 Handbook of Evaluation Research Lee Ross and Lee J. Cronbach1,2
Within the social sciences, the appearance of a handbook can signal the coming of age of an area of scholarship. In the 1920s the newly flourishing fields of social psychology and child psychology were heralded by the Murchison volumes. Over the years the tradition has continued. As one field after another has come to fruition – organizational theory, communications research, and research on socialization, to name a few – there have been handbooks to announce these events and provide rallying points for an increasingly visible college.
The systematic evaluation of social programs and institutions dates back at least to the “Age of Reform” at the turn of the century (Hofstadter, 1960), but the development of a scholarly community has come slowly. Without guidance, each generation was forced to improvise its own strategies or, at best, to extrapolate from existing texts on general research methodology. After World War II the Lewinian commitment to action research stimulated psychology’s interest in social intervention strategies and spawned Research Methods in Social Relations by Cook, Jahoda, and Deutsch (1951). Still, no textbook specifically concerned with evaluation was available until 1967 when Suchman’s brief text Evaluation Research (180 pages) appeared. Publication of the massive Handbook of Evaluation Research (1975) is thus a significant event; it constitutes a claim that evaluation has come of age in the last decade, that the field’s accumulated wisdom merits codification in two volumes, 38 chapters, and 1,432 pages.
Our review measures the Struening-Guttentag Handbook against a stern, if not wholly unreasonable, standard: that a handbook should be not a random
Source: Educational Researcher, 5 (1976): 9–19.
collection of articles but a mosaic depicting the domain, central concerns, and contributions of its field. It should consolidate methods, theory, and data that are scattered over a vast primary literature. It should instruct neophytes and increase the sophistication of practitioners – and, in the case of evaluation research, increase the sophistication of those who commission and act upon the results of the research. Ideally, it should also inspire and focus the attention of the leaders in the field from whom future advances must come. Does the Handbook – we ask – clearly define the domain of evaluation research and convey its present art and technology, its difficulties, and its possibilities for greater service? Our review begins by describing the origins of the Handbook and outlining its contents. It then proceeds to consider general, nontechnical issues of research strategy and tactics in evaluation. We discuss some of the features – conceptual, as well as political – that distinguish evaluation research from conventional research, and we relate this distinction to certain strengths and weaknesses of the Handbook. The review next takes up design, instrumentation, and analysis and evaluates the advice provided (or omitted) by the Handbook’s chapters. Finally, our review discusses two views of the evaluation enterprise: the conventional view that dominates the field and the Handbook, and an alternative view that seems likely to improve evaluation practice. The review was written by members of the Stanford Evaluation Consortium, which is a group of about 20 Stanford faculty plus an equal number of advanced graduate students from several departments. Most Consortium members have worked on evaluation teams or in the management of social intervention programs – mostly in education, but also in other facets of community services such as counseling and therapy. 
Ideas for the review came initially from many members of the Consortium, but the reviewers ultimately speak for themselves. Lee Ross and Lee Cronbach served as editors for the reviewing team.
Origins, Organization, and Content of the Handbook
The Handbook was conceived in the face of the burgeoning demand for formal evaluations that sprang up in the 1960s. This demand was a product of the period’s dramatic expansion of social services and proposals for services, but, it is interesting now to recall, the demand did not come primarily from tightfisted conservatives. It was Robert McNamara who introduced “program budgeting,” that forerunner of sloganizing about accountability. Francis Keppel and John Gardner launched the National Assessment of Educational Progress, and Robert Kennedy added a requirement for evidence of behavioral change to Title I of the Elementary and Secondary Education Act. The Progressives of the 1890s suspected that governmental services would be mismanaged unless program procedures and accomplishments were closely monitored; their successors in the 1960s shared these suspicions.
In the 1960s there were few evaluation specialists with broad training or experience. Many individuals, of course, had participated in evaluations of particular programs or institutions, using some particular evaluation tool, design, or strategy. In education, notably, some evaluators were trained to conduct intensive observation of events within the classroom while others were schooled in the use of standardized testing, but few were adequately prepared to assess the impact of large-scale innovations and interventions. The vacuum was filled in part by a cadre of systems analysts, economists, applied sociologists, and others willing to apply their field’s generic tools to evaluating social intervention. Each field, in fact, began to define evaluation as one of its branches. The evaluation literature quickly expanded as evaluators both reported their experiences and translated methodological maxims of social science for the use of their peers. A few writers even tried to generalize about problems, priorities, and strategies. Compilations of such articles (starting with Wittrock & Wiley, 1970, and Caro, 1971) began to appear, but still a void remained. There simply was no single source of systematic, comprehensive, or definitive papers. The Struening-Guttentag Handbook, supported by a grant from the National Institute of Mental Health to the Society for the Psychological Study of Social Issues, was obviously conceived as an attempt to fill this void. Work commenced late in the 1960s with a plan for publication about 1971. Gestation and birth, however, were neither painless nor free of complications. Some authors were lost to the project, others lagged in delivering chapters, and those articles submitted early were rendered somewhat obsolete by the delays. The thinking (and rhetoric) of five years ago dominates the Handbook. 
Nevertheless, in a field so fragmented, diffusion of ideas and criticism is slow indeed; so the Handbook’s contents will be news to most rank-and-file evaluators and commissioners of evaluation. Certainly, the diversity of social science disciplines represented will surprise and enlighten many readers. We suspect that some of the more weighty chapters will offer fresh ideas not only to the reader but to some of the Handbook’s contributors as well. It was the Handbook’s avowed purpose to guide evaluators across the entire range of human services and social interventions. Nevertheless, the Handbook’s preface (II: Ch. 1)3 properly stresses the need for an intimate understanding of the special evaluation requirements for each distinctive type of program and every unique setting. One cannot quarrel with either the broad goal or the caveat about specificity, but one can criticize the general absence of typologies, decision rules, or even maxims to bridge the gap between the general and the specific. No chapter undertook the task of clarifying the different types of evaluation settings, tasks, or objectives in a manner designed to help the reader apply the wealth of advice (replete, inevitably, with conflicting priorities and seeming contradictions) that the Handbook contains.
Central issues are taken up in Volume 1 and in the first 100 pages of Volume 2. The flow of topics is suggested by some of the vocabulary in section titles: conceptualization, measures, interviews and records, social ecology, data analysis, cost-benefit, communication of results, and policy. Some of the chapters are entirely nontechnical, even journalistic in flavor, while others present formal methodology at the level of a first-year graduate course in social science. The last 500 pages of Volume 2 deal with evaluations of specific programs or review literature relevant to evaluations in particular areas of intervention. While each article of this group offers material for discussion, this review primarily concerns issues in evaluation more broadly construed. We shall therefore confine our remarks on these chapters of Volume 2 to the following illustrative comments. Brandon’s hundred-page chapter – it will seem like a thousand to many readers – lays out the rationale, procedures, and findings for a social-area analysis that illustrates how regression analysis of survey data can tease out the factors that condition client use of community programs (II:341–429). A critique would have aided the reader, if only to warn of the hazards in drawing firm causal conclusions from correlational data and in choosing arbitrary area units for aggregation of data. Sainsbury describes step by step how an evaluation of a community psychiatric service in Chichester was planned and carried out – the chapter is direct, clear, and of value to the inexperienced (II:125–159). The Durkins (II:275–339) distill – from studies of residential treatments of disturbed children – a list of problems, advice on tactics, and recommendations for integrating evaluation into an operating system. The remaining chapters are more concerned with describing results of evaluation than with upgrading research strategy. 
These include a reprint (II:519–603) of Bronfenbrenner’s competent chapter summarizing results on early education and a literature review by Rabkin on public attitudes towards mental illness (II:431–482). The 38 chapters of the two volumes represent many disciplines – economics, demography, epidemiology, statistics, systems analysis, and psychoanalysis – that have contributed to evaluation research. However, the collection is a strange melange. A third of the articles are reprintings – for example, Donald Campbell’s “Reforms as Experiments” (I:71–100) and Jacob Cohen’s “Multiple Regression as a General Data-Analytic System” (I:571–595). Significantly, three chapters from a standard text (Nunnally, 1967) form the backbone of the technical section. Inevitably, such articles vary considerably in style and level of discourse, and they do not optimally supplement and complement other contributions in the Handbook. It is similarly disappointing to find chapters that were directly commissioned for the book equally lacking in coherence. These range from the solid, original presentation of a specialized topic to a rather platitudinous listing of caveats, from the superficial announcement of important topics to a comprehensive summary of a slice of literature. A noticeable weakness is the lack of articles featuring comparative discussions
and analyses. For instance, no chapter helps the reader understand relative strengths and weaknesses, shared assumptions, distinguishing priorities, etc., of the evaluation disciplines that are so faithfully represented. Equally unfortunate is the failure to provide any comprehensive literature review dealing with methodology. Most disappointing to the present reviewers was the lack of direct confrontation. Heated controversies abound in the evaluation field, and it is apparent that the Handbook contributors disagree in their experiences, outlooks, and recommendations. Nevertheless, the authors obviously have not been invited to explore the conflicts. Neither the prose, which remains uncontentious, measured, and bland, nor the selection and organization of chapters will stimulate discussion. Conflicts that should occasion strenuous debate remain submerged. The reader can grasp the disputed matters only if he is willing to dive deep beneath a sea of truisms. In the course of the present review we shall do our best to bring some of these conflicts to the surface.
Nontechnical Issues of Evaluation
The politics of evaluation research. The Handbook is committed to “evaluation research” as distinguished from “evaluation.” The coded message in such a distinction is that the evaluator is to act as an objective scientist, emphasizing quantitative and reproducible techniques and eschewing judgment. Many of the authors are sophisticated enough to warn that distancing oneself from the project for the sake of objectivity and confining one’s attention to what is quantifiable can result in a sterile and uninformative evaluation. Nevertheless, the ideal most often expressed is “Science in action” – with the emphasis on the science and not on the action. Deming (I:55), for instance, characterizes evaluation simply as the striving for consensus on cause and effect. Over and over, Handbook authors urge the evaluator to formulate sharp questions or hypotheses, to collect data systematically, and to compress them systematically into conclusions. Scientifically valid conclusions are held to be the hallmark of excellence in evaluation.
The commissioning of an evaluation, however, is rarely the product of the inquiring scientific spirit; more often it is the expression of political forces. The Handbook’s authors appreciate the relevance of this political context; indeed, a number seek to raise the evaluator’s political sophistication and consciousness. Knowledge is power, so the maxim goes, and evaluation necessarily plays a role in the struggle for political power. It is “an effort to gain politically significant information on the consequences of political acts” (Cohen, 1970, pp. 236–237). The call for rational decision can itself be an attempt to capture ground from opponents. Certainly, the evaluator who believes that the introduction of evaluation facts and findings will make an argument less political is dangerously deluded.
As Carol Weiss (I:18) warns, the evaluator’s several audiences are waiting to use the data in their own way, as tactics in a continuing political struggle. Far from supplying facts and figures to an economic man, the evaluator is furnishing arms to combatants in a war with fluid lines of battle and transient alliances; whoever can use the evaluators to gain an inch of terrain can be expected to do so. Politicians are no less rational than social scientists, Weiss explains, but their rationality is political in its objectives. They are concerned with bargaining and trade-offs, with satisfying constituents and keeping politically advantageous programs alive whether or not they accomplish their stated goals. These goals themselves, in fact, are apt to be diffuse, exaggerated, and even inconsistent because of the governmental processes that shape them. The relationship between the processes of social intervention and evaluation has proven more complex and disconcerting than the Progressives imagined. Social interventions arise from the observation of distress and from the demands of the distressed. The need for intervention, typically, is easy (and often unnecessary) to document objectively. Any particular intervention conceived by rational and concerned individuals will probably be easy to defend as a necessary effort. It is the demonstration of the effectiveness or sufficiency of the intervention that requires the professional evaluator’s tools. One can imagine happy circumstances in which all parties anticipate that the proposed innovations will prove sufficient. Then, little conflict is likely to arise. The innovator sees the evaluation as likely to increase political and financial support for his program and thus regards the evaluator as an ally. Such circumstances, however, are rare. More typically, the innovator will not be confident about the results of formal evaluation. 
Indeed, the more sophisticated and experienced the innovator, the less reason he will find for optimism. Whether the target of intervention is crime, illiteracy, anomie, or some other social ill, the causes are multiple and only a few of them are affected by any one intervention. When many factors determine outcomes, the evaluator’s task is to detect a small signal amidst a great deal of noise. Moreover, when factors combine multiplicatively, a standard intervention may be necessary but insufficient, and there may be no signal to detect. In all, the standard evaluator’s role (as distinct perhaps from his motives) is inherently a threatening and potentially destructive one from the viewpoint of the innovator or supporter of social reform. In the fight for allocation of resources, arguments based upon necessity are apt to prove more persuasive than the findings of program evaluators. Evaluators, in fact, can produce a strong case for withholding funds in the very instances where no such case could otherwise be mounted in the face of crushing social needs. In short, the reformer’s antipathy to summative (i.e., outcome-oriented) evaluation is not “paranoid” or “unthinking” or “anti-intellectual”; it is simply pragmatic. The Handbook’s contributors take differing and perhaps conflicting stances in advising the evaluator about his encounters with political forces. Campbell warns that program directors and political advocates are apt to block
controlled summative evaluations because, with good reason, they fear “reality testing.” He urges the evaluator to reject the role of bystander in the political process and to urge politicians to support assessments not of single solutions but of planned program variations (I:73). Campbell is also unique among the authors in suggesting (I:95ff.) that the evaluator accept the task of formal, impartial, summative appraisal only when the political community is poised to move with the results. Other authors also wrestle with political dilemmas. David Twain (I:29ff.) settles for the blandest of all advice: Engage in good-natured negotiation with all the persons whose cooperation you need. Howard Davis and Susan Salasin (I:626ff.) go further and suggest that the evaluator abandon detachment and become self-consciously a “change agent,” that he collect data with an eye to their potential leverage in producing social change. In essence, Davis and Salasin advocate a formative role for the evaluator: He is to collaborate in program design, improvement, and monitoring, but he is not to deliver impartial facts to unsympathetic outsiders. Weiss’s thoughtful chapter (I:13–26) conveys her ambivalence. She wants the evaluator to enter the fray: “Only with sensitivity to the politics of evaluation research can the researcher be as creative and strategically useful as he should be” (I:14). But her chapter ends with a pessimistic message we may paraphrase thus: Social programs have rarely proven sufficient to cope with major social ills. In a field where good studies have shown negative results, the next evaluator will contribute little either by designing better studies or by carefully appraising minor variants of the old programs. Talent may be better employed in studying the causes of social problems (including the institutional structures that perpetuate them) and in developing theory to guide social policy than in assessing, ad hoc, patch-on remedies. 
The most forceful comment on the politics of evaluation comes from Gideon Sjoberg (II:29–51), who, alone among the writers, pillories the evaluator as a conservative building conservative assumptions into his designs, his instruments, and his relations to the political system. Evaluators, Sjoberg charges, align themselves with the dominant groups in the system and meekly accept their definitions of program success and the public good. Sjoberg would have the evaluator adopt a “countersystem” position. He is to envision logical alternatives to the social structure; he is to cooperate with the powerless in formulating priorities for research and data collection. It should be clear from the preceding paragraphs that the politics of evaluation receive some intelligent attention in the Handbook. Unfortunately, it is the intelligence of one speaker after another mounting the rostrum to speak his piece – with no sharpening of conflicts, no rebuttals, and no continuity of thought from the politically aware chapters to the chapters on research techniques. Evaluation research and scientific strategy. While some Handbook contributors recognize the unique practical and political problems that attend evaluation
research, they seem less attentive to its unique strategic and conceptual problems. Perhaps the most serious weakness of the Handbook is its failure to make clear how evaluation research is distinct from field research in general. Instead, the Handbook (and, indeed, most existing evaluation literature) stresses the parallels between these two types of research. According to this view, an evaluation is basically a social experiment in which “independent variables” are manipulated (i.e., a social intervention is initiated) and “dependent variables” are measured (i.e., the intervention’s impacts are assessed). Ideally, the evaluator helps with the design and staging of the experiment, with the development of appropriate measures, and with the processing of data – just as any other “experimenter” does. Virtually the only additional obligations upon the evaluator/experimenter are (a) to convince operators to set up the intervention in a manner that facilitates appraisal of effects and (b) to communicate findings in a manner designed to facilitate decisions. This view overlooks the normal relationship between a social scientist’s theoretical or conceptual analysis and the strategic research decisions he makes. Ordinarily, the research design, the sample size, and the choice of operations and measures derive from the experimenter’s view of the phenomena under investigation. Whether in the field or the laboratory, his procedures reflect his educated guesses about underlying processes and about the magnitude of experimental effects relative to sources of noise and error. The evaluator, however, rarely enjoys the freedom to adapt his procedures to his conception of phenomena. He can rarely choose the size of the intervention program and practically never can dictate the nature of the social manipulation embodied in the program. 
The result, invariably, is a research strategy that makes little sense from any social scientist’s view (including the evaluator’s) of the problem providing the target for intervention. As everyone seems to recognize, social interventions are aimed to remedy complexly determined problems and social ills, ills unlikely to be easily overcome. Such an assessment, by conventional experimental wisdom, would dictate a research strategy featuring intensive interventions of long duration, performed upon relatively few subjects. Extensive pretesting with rigorous “manipulation checks” would be undertaken to satisfy the experimenter’s demand that the independent variable manipulation be sufficiently strong to test his conceptual hypothesis. Most likely, the research strategy selected would feature successive studies with continual modification of systematically varied manipulations and measures as the experimenter accumulated wisdom about underlying processes and their impact. Finally, if the researcher’s hypotheses failed, he would not attempt to convince his sponsors that his manipulations were good and his hypotheses correct; instead, he would attempt to collect new data (featuring even more powerful manipulations and more sensitive, if less socially relevant, measures). In every particular this seemingly rational, conceptually dictated research strategy is thwarted by the political, practical, and humanitarian demands upon evaluation research. The tryout period for any intervention is typically brief, and the number of subjects is large. As a result, the outcome data are
limited to a small number of measures, the manipulation checks and attempts to closely monitor process variables are omitted, and the whole process of “fine-tuning,” whereby the experimenter improves his procedures, is neglected. Furthermore, the treatments are mild and usually eclectic in character. The result, invariably, is a research strategy ill-suited for a test of the broader social hypothesis that prompted the intervention efforts. A further and directly related result is the “political” dilemma that we have outlined and seen compellingly described in Weiss’s chapter. The evaluator is likely to provide less ammunition for advocates of social intervention than for conservative critics of social intervention. We do not imply, of course, that the political, practical, and humanitarian considerations that shaped evaluation research are unimportant or illegitimate. But it is important for the evaluator to recognize that he is playing against a loaded deck in a game with enormous stakes for those whose cooperation he seeks – stakes far greater than publishable results and a more impressive academic vita. The Handbook simply does not drive this message home. A few authors (notably Weiss and Twain) describe the practical handicaps and social obligations of the evaluator. Deming emphasizes the importance of observing “process,” although he seems content to treat the intervention as defined and fixed prior to the conduct of research (I:57ff.). Overall, however, the treatment of the relationship between theory and methodology is neither thorough nor penetrating. Interestingly, it is Susser writing on epidemiology who is most articulate on this subject (I:497–517), but the conceptual leap to more usual evaluation terrain is a large one indeed. How one wishes that Susser, or another equally lucid contributor, had more directly spelled out the relevance of the epidemiologist’s general insights for evaluators concerned with nonmedical social ills. 
Program goals and evaluation goals. To what extent should official program goals shape the evaluation effort? More generally, what should be the goal of evaluation? Both questions are much discussed in the evaluation field and in the present Handbook. An orthodox view is that evaluation’s primary function is to determine whether a program is accomplishing its objectives. Evaluators are told that a list of program goals is a prerequisite for evaluation; in education, at least, precise behavioral specification of goals has become something of a dogma. Objections to this philosophy – starting with the practical difficulty of squeezing a list of goals out of the planners – are well represented in the Handbook. Are unintended consequences, positive and negative, less important than explicit program goals? If not, attention to stated goals is not enough. Do officially stated goals coincide with and exhaust the purposes seen for the program by other significant constituencies? Do program goals remain fixed over time, or do they change as a program matures and new insights are gained? Each of these questions is raised in the Handbook. Weiss warns (I:23) that focusing on the objectives often distorts the evaluation, if “bloated promises and political rhetoric” are treated as authentic
program goals. Sjoberg’s critique is even more severe: Organizational goals reflect the priorities of existing systems that the evaluators ought to challenge. Ward Edwards, Marcia Guttentag, and Kurt Snapper (I:159ff.) are particularly concerned with conflicting goals and priorities, a problem many other writers evade in their tunnel-vision search for clean “causal inferences” and “facts.” Edwards et al., in fact, propose a technology for weighting competing goals. Although the scheme is a tool for management, the writers recognize its potential as a medium for expressing the conflicting interests of various segments of the public. These sober reflections on goal-directed management notwithstanding, a number of chapters preach orthodoxy. Thus Gurel: “To begin with, the evaluator must pin down the manager to specific and detailed answers to a list of questions (about) objectives, both immediate and ultimate” (II:22). Davis and Salasin, similarly, make “goal definition” one of four key steps in the technique that goes under the brand name “A VICTORY” (I:656). Once again, a conflict in views, whether fundamental or only rhetorical, is seen within the Handbook. A similar conflict arises when the reader attempts to decode the Handbook’s message on purposes and priorities. The various purposes outlined at different points in the Handbook – to assess needs, to guide a “go/no-go” decision, to provide support for a decision already made, to improve program plans and policies, to assist management by monitoring daily operations, to test social theories – all imply different criteria for excellence in evaluation and different, often contradictory, research tactics. The contributors make confident assertions, leaving it to the reader to integrate or limit the generality of such assertions. Evaluators, states Deming, are always engaged in making causal inquiries with a view to generalizing over future conditions that may depart from those currently being tested. 
Because of this intent to extrapolate, “The problem is not one in statistical significance. . . . Extreme accuracy in an analytic study is wasted effort” (I:65). That is clear enough. Do the Handbook authors who painstakingly describe procedures for assessing statistical significance disagree with Deming, or are they simply giving sound advice about studies that serve some other, unstated, social function? What are the implications of these conflicting views for the policymaker who must commission an evaluation? The reader is given less guidance than he deserves in view of the scope (and price) of the Struening-Guttentag Handbook.
The Technical Side of Evaluation
About a dozen Handbook chapters deal with technical issues. We shall discuss, in turn, the Handbook’s perspective on, and specific contributions concerning, design, instrumentation, and analysis; first, however, two general
observations. The largest block of chapters consists of revised versions of three chapters from Nunnally’s (1967) textbook Psychometric Theory. These chapters constitute a rather conventional introduction to the subject of psychometric theory; they were not prepared initially with a particular eye to evaluation. The editors’ reliance upon these chapters reveals, we suspect, the degree to which conventional research planning and execution are considered suitable in an evaluation. We disagree with such a view and in this section will emphasize the respect in which conventional technical wisdom is ill-suited for evaluation. The editors deserve praise, however, for selecting authors who appreciate the multivariate character of evaluation. The methodology for giving weight to the multiplicity of input variables, program dimensions, and outcome is not well codified, but the Handbook’s suggestions reach far beyond usual present practice. Design. The Handbook relies heavily upon Campbell’s present and past contributions. The Campbell “Reforms” chapter describes a dozen uncommon designs such as the “interrupted time series” in which an institution prior to and after an innovation provides its own control. These designs are instructive when they can be used, and Campbell gives considerable insight into the bases for favoring one or another of them. A Nunnally chapter on design restates the well-known significant arguments of Campbell and Stanley (1963) that contrast the random experiment with studies in which assignment to treatment is not tightly controlled. Nunnally, however, neglects to discuss the Campbell-Stanley point about “external validity.” As they point out, an excellently designed formal experiment conducted on one particular realization of the treatment at one time in one particular community gives no basis for inference about the probable impact of the treatment when the operationalization, the community, personnel, or even simply the time has been changed. 
External validity is discussed elsewhere in the Handbook (especially in Deming’s chapter) but not strongly enough, we fear, to dispel the view that a formal random experiment provides answers about social interventions that are definitive and capable of broad generalization. A major issue for any contemporary discussion of evaluation is the uses and abuses of quasi-experiments, that is, designs comparing groups not randomly assigned. Gilbert, Light, and Mosteller (1975, p. 182) regard such studies as “fooling around with people” and doubt that quasi-experiments can ever resolve uncertainties about the merit of a social program. The controversies surrounding evaluations of compensatory education have hinged particularly on the uncertain comparability of treated and untreated samples. The Campbell-Erlebacher paper of 1970 (I:597–617) condemned the original Head Start research as “tragically misleading” on this basis. Nevertheless, comparisons of nonrandom groups are the rule rather than the exception in evaluation practice. In Bronfenbrenner’s survey of seven recent evaluations in compensatory education, for instance, only three projects randomly assigned
individual children, and two of these three carried additional nonequivalent comparison groups. One study had no randomization, and three others randomized inadequately by selecting two neighborhoods and arbitrarily installing the treatment in one of them (II:529–530). Although the sophistication of evaluators has gradually increased, the notion survives that randomization can be accomplished by randomly deciding which of two neighborhoods or institutions shall receive a program or innovation and which shall not. A randomization between two alternative choices provides only one attainable significance level – 50%. Random choice between many possible patterns of assignment is necessary before conventional levels of significance are attainable. The main argument for nonrandom comparison, beside that of practicality, is that of representativeness. If one wishes to generalize over diverse communities, only one or two of which consent to a random manipulation, the random experiment may give no better answer to the original question than would comparing nonequivalent groups. Rubin (1974) provides the best general discussion we have seen on the trade-offs between random and nonrandom experiments, but an article addressed to the specific concerns of evaluation would have been most welcome. Clarence Sherwood, John Morris, and Sylvia Sherwood (I:183ff.) make a good start in this direction. They quote writers who believe that randomization is everything and also quote other statisticians’ reasons for not relying on randomization. This welcome confrontation quickly becomes confused. Sherwood et al., following Campbell and Erlebacher, note that if groups differ on the pretests, any kind of matching will be subverted by the regression toward different means. They go on to an unjustified argument for matching as an alternative to randomization. What Campbell and Erlebacher offer is a univariate argument; Sherwood et al. 
would have us believe that one escapes the problem if one matches on other bases and uses multiple outcome variables. They illustrate their proposal with a study in which persons applied to, were accepted for, and, in time, occupied a new housing facility for the aged. The control subjects were persons who applied to, but did not occupy, the facility. (They applied after the space was filled, or they did not accept admission.) The heart of the Sherwood scheme is to match persons from the two lists individually, on variables recorded at the time of application. Although this is not a bad design, as matching designs go, the logic remains flawed. The two pools undoubtedly were dissimilar; lateness of application or failure to accept admission must be symptomatic of something – less mobility, social isolation, or perhaps something less obvious. The authors’ assertion (I:192) that subjects matched a posteriori (from different pools) must regress in the same direction is simply wrong. The fact remains: Insofar as there is error of measurement, scores regress toward the means of their own populations. One more comment will alert readers to a point that the Handbook neglects. According to Nunnally, random assignment “absolutely guarantees”
(italics his; I:119) that, prior to treatment, subjects differ only by chance. The evaluator, however, is interested in comparison at the end of treatment, not at the time of assignment. When subjects drop out, the two groups (experimental and control) of survivors are no longer samples of the same population. A random experiment has been executed with respect to only one dependent variable – completion or noncompletion of the course of treatment. Of course, the bias due to (nonrandom) dropouts is likely to be small if dropout rates in both groups are small. Much of what Nunnally says on design is wise, but there are significant oversights. For instance, his consideration of the problem of “units” is largely restricted to a passing comment (I:128) on the practical necessity of working with intact classrooms rather than isolated children in assessing a particular teaching technique. A more relevant ground for using intact classrooms as the unit of study is that one wishes to generalize to classrooms, not individuals in isolation; a design must reflect the ultimate, substantive question addressed by the evaluator. Nunnally displays a surprising enthusiasm for the posttest-only design; it is “the workhorse of evaluation research, and with certain exceptions should remain so” (I:123). Designs employing pretests he characterizes as “unwise.” The chronic objection to premeasures is their potential ability to sensitize the subjects and thus have a “reactive” effect. This can indeed be a hazard (e.g., in research on the effects of advertising), but it is hardly the pervasive threat that Nunnally implies. It is probably less relevant to evaluation than to most other kinds of research. The sensitization problem can often be overcome by adopting dissimilar pretest and posttest measures. Evaluators, moreover, are often asked to determine what kinds of individuals respond best to treatment. 
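The dropout argument can be made concrete with a small simulation. The sketch below is illustrative only: the sample size, the standard-normal "baseline" trait, and the rule that treated subjects with below-average baselines leave half the time are assumptions invented for the example, not figures from the Handbook. It shows two groups that are equivalent at assignment and survivors that are not.

```python
import random
import statistics

def simulate_attrition(n_subjects=20000, dropout_prob=0.5, seed=0):
    """Randomly assign subjects, then let some treated subjects drop out.

    All quantities are hypothetical: 'baseline' is a standard-normal
    pretreatment trait, and only treated subjects with baseline < 0
    drop out, each with probability `dropout_prob`.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n_subjects):
        baseline = rng.gauss(0.0, 1.0)
        # Random assignment: a fair coin decides the group.
        (treated if rng.random() < 0.5 else control).append(baseline)

    # Selective (nonrandom) attrition in the treated group only.
    treated_survivors = [
        b for b in treated
        if not (b < 0 and rng.random() < dropout_prob)
    ]

    gap_at_assignment = statistics.mean(treated) - statistics.mean(control)
    gap_among_survivors = (
        statistics.mean(treated_survivors) - statistics.mean(control)
    )
    return gap_at_assignment, gap_among_survivors

gap_start, gap_end = simulate_attrition()
print(f"baseline gap at assignment:   {gap_start:+.3f}")
print(f"baseline gap among survivors: {gap_end:+.3f}")
```

Under these assumed rules the groups start out nearly identical, yet the surviving treated subjects have a visibly higher baseline than the controls before any treatment effect is considered. Setting `dropout_prob` close to zero shrinks the gap, which matches the remark that the bias is likely to be small when dropout rates are small.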
Nunnally himself specifically suggests correlating “effects” with data on pretreatment status. If pretests were avoided, as he advises, this analysis would be impossible. Instrumentation. Only Nunnally offers a technical discussion of instrument development. The first 30 pages of chapter 9 (vol. I) restate Stevens’s fundamentalist view on scales of measurement. Thus, Nunnally asserts with considerable force that one cannot (for example) multiply numbers that are not “on a ratio scale.” Such a rule would preclude the computation of most standard deviations! Nunnally’s point would be appropriate, perhaps, in a textbook for experimental psychologists concerned with the quantitative form of “laws,” but it has little bearing on evaluation and can only misdirect the Handbook reader. In any case, Nunnally’s dogmatism fails to reflect the years of dispute over Stevens’s position and the rationales regarding distribution shape – distinctly different from Stevens’s – that do pertain to statistical calculations. Although Nunnally gives sound introductory advice on test construction, with particular attention to achievement tests, the presentation does not compare with Thorndike’s Educational Measurement (1971) in thoroughness or tightness of argument on such matters as item writing and analysis.
Nunnally’s advocacy of the “homogeneous” test, in which individual differences on one item are correlated with individual differences on the others, is to be questioned. The justification for homogeneous scales, such as it is, arises from a special concern with individual differences. Such a concern was appropriate in Nunnally’s original textbook, but is of dubious relevance to the evaluator. From the evaluator’s viewpoint, it may make excellent sense to aggregate heterogeneous variables relevant to program outcomes (e.g., alternative forms of social participation). Consider, also, the case where there is no variance on a particular posttest measure because all the trainees successfully reached some ceiling. An item of this sort would be bad from Nunnally’s viewpoint (since it would correlate zero with all other items), but the evaluator would be well-advised to include such a strong indicator of the program’s success. Nunnally’s chapter on reliability and validity is a commendable, up-to-date textbook treatment; less thorough than the corresponding chapters of the Thorndike volume, but easier. The organization and emphasis of this chapter, however, might have been quite different if an evaluator’s special needs and interests had guided the author. Nunnally’s emphasis is apt to mislead the evaluator on some key issues. Psychometrics revolves around the reliability coefficient, and Nunnally follows tradition in saying, “in many applied settings a reliability of .80 is not nearly high enough” (I:345). In his illustrative “applied” task of identifying retarded children, his point is well-taken; in much evaluation research, however, the reliability coefficient of the outcome measure is nearly irrelevant. Often a low coefficient introduces no great difficulty, mainly because the evaluator’s primary concern is with group means. 
Therefore it is the measurement error associated with the mean (and not the much larger error associated with individual scores) that offers the pertinent reliability index for the evaluator. Even a work as massive as the Struening-Guttentag Handbook must leave some important technical topics untouched. The present reviewers, however, are disappointed by the absence of “how-to-do-it” advice on the construction of attitude measures. Evaluators are handicapped by the lack of an adequate current guide on this topic. Another disappointment was the neglect of the important development of matrix sampling and the technical work on criterion-referenced measurement. Statistical analysis. Evaluators trained in traditional research skills have relied heavily on tests of significance and have been guilty of equating statistical significance with social importance. Deming (I:62–63) drives home the message that significance testing in evaluation is “not helpful” and “logically misleading.” When samples are very large, effects of no economic or scientific importance will reach statistical significance. Conversely, socially important effects can remain undetected when the significance tests employed lack power. It obviously is wise, as Nunnally advises the evaluator, to report the confidence intervals associated with estimates of effects.
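A short calculation illustrates why statistical significance and social importance can part company. The sketch below is not drawn from the Handbook; it assumes two independent groups of equal size, a known common standard deviation, and a normal approximation, and the effect sizes and sample sizes are hypothetical.

```python
import math

def two_group_z(effect_sd_units, n_per_group):
    """z statistic, two-sided p value, and 95% CI for a difference in means.

    Assumes two independent, equal-sized groups with a known common
    standard deviation; the effect is given in standard-deviation units.
    """
    se = math.sqrt(2.0 / n_per_group)   # standard error of the difference
    z = effect_sd_units / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    ci = (effect_sd_units - 1.96 * se, effect_sd_units + 1.96 * se)
    return z, p_two_sided, ci

# Hypothetical case 1: a trivial effect (0.02 SD) in a very large sample.
z_trivial, p_trivial, ci_trivial = two_group_z(0.02, 100_000)

# Hypothetical case 2: a substantial effect (0.30 SD) in a small sample.
z_large, p_large, ci_large = two_group_z(0.30, 20)

for label, z, p, ci in [
    ("trivial effect, n=100000", z_trivial, p_trivial, ci_trivial),
    ("large effect, n=20", z_large, p_large, ci_large),
]:
    print(f"{label}: z={z:.2f}, p={p:.4g}, 95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

In the first case the test rejects the null hypothesis decisively even though the interval shows the effect to be negligible; in the second the test fails to reject, while the interval is wide enough to include effects of real consequence. Reporting the interval conveys both facts, where the significance verdict alone conveys neither.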
Despite such scattered displays of wisdom, we fear that the reader-practitioner who relies mainly on significance tests will not feel seriously challenged to mend his ways. According to Nunnally (I:124), if statistical significance is not obtained, “It probably would be better to toss the data sheets into the wastebasket than to try to make sense of them in any other way.” Type-II errors never had so strong a champion. Nunnally’s advice might be defensible (although not universally applicable) in most conventional research. In evaluation research, however, such advice is misguided. First, sources of error variance are numerous, potent, and difficult to control. Even more important is the “degrees of freedom” problem. Programs typically are tried in a modest number of intact groups such as classes (even if the number of individuals involved is huge); accordingly, the sampling error of the mean that results from this procedure is apt to be distressingly large, making nonsignificance the likely finding.
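The "degrees of freedom" problem can be sketched numerically. The example below uses the standard design-effect formula for equal-sized clusters, 1 + (m - 1) × ICC, which is a textbook result and not something the Handbook supplies; the class counts, class size, standard deviation, and intraclass correlation are hypothetical values chosen for illustration.

```python
import math

def design_effect(cluster_size, icc):
    """Design effect for equal-sized clusters: 1 + (m - 1) * icc."""
    return 1.0 + (cluster_size - 1) * icc

def se_of_mean(sd, n_clusters, cluster_size, icc):
    """Return (naive SE, cluster-adjusted SE) for a group mean.

    The naive figure treats every individual as an independent
    observation; the adjusted figure inflates it by the square root
    of the design effect.
    """
    total_n = n_clusters * cluster_size
    naive = sd / math.sqrt(total_n)
    adjusted = naive * math.sqrt(design_effect(cluster_size, icc))
    return naive, adjusted

# Hypothetical program: 20 intact classes of 30 pupils, outcome SD of 10,
# and an assumed intraclass correlation of 0.2.
deff = design_effect(cluster_size=30, icc=0.2)
naive_se, adjusted_se = se_of_mean(sd=10.0, n_clusters=20, cluster_size=30, icc=0.2)
effective_n = (20 * 30) / deff

print(f"design effect:         {deff:.2f}")
print(f"effective sample size: {effective_n:.1f} of 600 pupils")
print(f"naive SE of mean:      {naive_se:.3f}")
print(f"adjusted SE of mean:   {adjusted_se:.3f}")
```

With these assumed values, 600 pupils carry roughly the information of 88 independent observations, and the standard error of the mean is more than two and a half times the naive figure. The point of the sketch is only the direction and rough size of the penalty; the actual intraclass correlation in any given evaluation would have to be estimated from data.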
Two Views of Evaluation
Within the Handbook and in contemporary debate among evaluation experts, two models of evaluation struggle for ascendance.
The mainstream view of evaluation. The model that dominates the field as a whole, and much of the present Handbook, views evaluation as an event that begins, runs alongside a program for a time as the evaluator makes observations and collects data, and ends rather abruptly. This effort ends in a report to an all-powerful decision-maker (usually one outside the program under consideration). This report assesses the extent to which the program as a whole, or particular innovations within the program, satisfied the objectives and expectations of the decision-maker (and those who provide resources). In this mainstream view, the evaluator enters the picture only after the initial events in the life of the program.
1. Proposal for services. Some agency or individual proposes a new service or an alteration in existing services.
2. Demand for evaluation. In the course of gathering support for the proposal, or perhaps as a standing tactic of management, evaluation is mandated. Some individual or agency, inside or outside the operating program, becomes the Evaluator (E). The question of the continuation, expansion, or modification of the program is placed on the future agenda of a Decision-maker (D) who, it is anticipated, will use information from E.
The evaluator thus arrives on the scene because of a potential contest or conflict, charged with the obligation of providing the decision-maker with the “objective facts” needed to judge the contest. His involvement can be traced through the following steps in the life of the program.
3. Program realization. Having at least short-term approval, the program comes into being. Many persons, operating under tight or loose supervision,
with specific or vague guidelines, turn the original proposal into a real program. The evaluator, in all likelihood, enters just before or just after this stage has been completed but plays little role in designing the program itself.
4. E formulates questions. Once commissioned, E takes the first conventional step of the scientific method by stating the questions to be answered or the hypotheses to be tested. Some of the writers say that to properly formulate these questions E must construct a conceptual model of program operations and effects.
5. Delivery. At this point in the life of the project, a decision about research strategy must be made in light of practical, political, ethical, as well as conceptual factors:
E sets assignment rules. E may dictate who will receive a particular treatment and who will receive either no treatment or a contrasting treatment. He then confines his subsequent attention to treatments assigned according to his design. This is social experimentation strictly defined.
E records participation. Often, of course, E cannot dictate treatment assignments. For example, he is commissioned only after some communities or hospitals have adopted a new program – and others have not – or some families – but not others – have chosen to participate in that program. In such circumstances, most “design” options are foreclosed, and E’s task is to collect data on the program as it stands (although he is free, of course, to ruminate about program possibilities not in effect and to collect any data he deems relevant to such possibilities). He records who enters each treatment group, or at least who is in each group at the point of his arrival. Generally, he also collects background data on those individuals.
6. E measures outcomes. The next step is to measure the impact or outcome of the program. How easily do viewers of Sesame Street learn to read? How often do ambulances staffed by specially trained corps-persons save a life? Do those who have passed through Halfway Houses stay out of difficulty? He may design and/or administer instruments of his own, or he may rely upon secondary sources.
7. E processes the data. This step calls primarily for technical expertise. A statistical mill grinds the Measures to a fineness the Gods of Methodology decree and produces a Conclusion if not a Causal Inference.
8. E reports his conclusions to D and shuts up shop. Evaluation has been done. If it was well done, the decision-maker is presented with the evidence he requires to proceed rationally. The program that fails to produce the promised benefits, to an acceptable degree, is terminated to save resources. The program whose benefits have outweighed its costs is maintained or expanded.
An alternative and extended view of evaluation research. Reading through the Handbook, some limitations in the conventional or mainstream view of evaluation become evident. Several authors, like other contributors to the evaluation literature, speak to the need for one or another revision of the
model – but no synoptic view of a better evaluation model emerges from the Handbook. We would emphasize the general features of such a model: (1) Evaluation can constructively enter the picture earlier and can be seen as a continuing part of management rather than as a short-term consulting contract. (2) The evaluator, instead of running alongside the train making notes through the windows, can board the train and influence the engineer, the conductor, and the passengers. (3) The evaluator need not limit his concerns to objectives stated in advance; instead, he can also function as a naturalistic observer whose inquiries grow out of his observations. (4) The evaluator should not concentrate on outcomes; ultimately, it may prove more profitable to study just what was delivered and how people interacted during the treatment process. (5) The evaluator should recognize (and act upon the recognition) that systems are rarely influenced by reports received in the mail. Evaluation thus becomes a component of the evolving program itself, rather than disinterested monitoring undertaken to provide ammunition to the warring factions in a political struggle. Formal reports to outsiders are reduced in significance, and research findings become not conclusions but updatings of the system’s picture of itself.
Some new steps or stages in the life of a program, and some revision of conventional ones, are demanded by the model we have begun to describe. We shall outline them briefly, noting relevant remarks by the Handbook contributors.
1. E undertakes planning studies. The first evaluative activity can well come even before a decision to launch the program is undertaken. David Twain (I: 27ff., 37) notes that investigators can help the agency define the need to be met and to understand the potential client population. Weiss doubts that evaluating well-funded programs is half so good a use of talent as inquiry that serves program formation.
She and others point to the advisability, prior to starting a field evaluation, of discerning whether the program to be realized is sufficiently thought through to be worthy of formal evaluation. Weiss (I:23) suggests that, along with evaluating the effectiveness of typical or average realizations, one “particularly strong version of the program” should be assessed in order to show the best that the intended treatment can do. Planning studies require intimate collaboration between evaluator and program developer. The plan is their joint product, and it may have been discussed with members of the client community and officials outside the immediate system. In a sense, that “the decision-maker” is served by the evaluation is a fiction and perhaps one that has outlived its usefulness. Numerous persons within the system, in numerous roles, engage in reshaping a program from day to day. Others play a part in the political, legislative, and bureaucratic processes that reshape the program from outside. Weiss is one of several who warn that there are multiple decision-makers, that coalitions develop, and that “trade-offs with interest groups, professional guilds, and salient publics”
(I:17) all play a role. Decisions, in the face of such cross-pressures, become more a matter of fine tuning than a question of go/no go.
2. E formulates questions. A major change from the conventional model is that research questions are expected to emerge as the study progresses. Several writers mention that program goals and the values of participants change during evaluation of a program. This thought has had little influence on those government officials who, in contracting for an evaluation, seek to rivet the questions and design into a brass-bound contract. Unfortunately, it has also had little influence on the Handbook contributors who discuss strategy and design. The didactic chapters, in fact, positively oppose letting questions grow out of continuing experience and observation. Twain (I:41) warns against considering any information not called for by a priori theoretical considerations – because the person who collects information that does not fit his theory becomes confused. Nunnally (I:102) makes an even more astonishing statement: “It should go without saying that measures [of outcomes] … should be constructed before rather than after the program is under way.” When the Handbook surveys the actual evaluation practice – as in Bronfenbrenner’s review – the realities of the field assert themselves. Variables and issues emerge as salient only after the team is in the field. Even late in the game, formal measures are being developed for phenomena whose importance has just been recognized. The crucial parental-involvement variable, for instance, came into those evaluations as an afterthought, prompted by direct experience in the field. One must turn to Sjoberg’s isolated chapter for a firm condemnation of the model most of his fellow authors rely upon (II:31–32):
The research process is far more complex than has been depicted. . . . Typically there is a kind of “circular causation.” . . . The format – statement of the problem, collection of the data, and analysis of the results – is typically imposed upon the research process in an ex-post-facto manner. In practice … procedures for collecting data feed back upon the first stages of the design. . . . Changes in strategy during the course of the research project typically lead to a revision of the overall design. The picture of the research process, as idealized in print, is generally a crude approximation to actuality.
3. E reflects and discusses. Instead of storing up the data for use in an ultimate report, E could use the data as they come in, to help the program and evaluation staffs improve what they are doing. Edwards et al. (I:143) condemn the “pseudo-experiment” in which E insists that the program sit still while its picture is being taken. Change is inevitable. Once this is acknowledged, the same logic that warrants E’s engagement at the planning stage warrants his engagement in modifications. This formative role is exemplified in Sainsbury’s report (II:150):
Our close daily contact with the hospital personnel and the responsible senior psychiatrists, and our presence at the monthly meetings of the psychiatric division. . . . were the channels whereby … the research affected policy.
The arguments against such free exchange seem to loom larger in the Handbook than the arguments for it. There is, on the one hand, the frequent concern that the evaluator will lose objectivity (I:44). From a radically different perspective, there is the fear that the administrator will use whatever facts come to his hand to “manipulate” subordinates (II:39).
4. E reports his current picture to outsiders. The evaluator trying to exert a constructive influence will be sensitive to the fact that the operating program evolves and that findings become outmoded. He will also remain aware that his data are being fed into an adversary process. The Weiss and Davis-Salasin papers hint that the evaluator ought not to be most concerned with the program realizations he has been comparing. Rather, he is to imagine what plausible further directions of program development might appeal to various segments of the decision-making community, and to make predictions about those alternatives on the basis of his accumulated knowledge. In so doing, he returns to the planning mode.
Concluding Remarks
The Handbook was set on its final course just as Campbell’s great “Reforms as Experiments” paper appeared calling for hardheaded, summative testing of projects (where, and only where, the political climate is propitious). Campbell’s visions have not been realized, but social experimentation is being even more heavily promoted now than formerly (Riecken and Boruch, 1974; Rivlin and Timpane, 1975; Bennett and Lumsdaine, 1975). Insofar as there are to be further attempts at conventional summative evaluation, much of the fine detail of the Handbook will defend and even enrich the practice. The augmented, more interactive, and more continuous conception of evaluation sketched in the present review represents a shift away from solely summative concerns. Some Handbook contributors speak of the need for such a shift, but they do little to set out the costs and benefits of the two alternatives. Scholarly interchange is lacking in evaluation today, though writing is not. The need is for direct debate on some of the issues that can be perceived within the Handbook. As it becomes clearer what epistemological stance, what view of the political system, or what assumption about the purposes of evaluation leads to each of the divergent positions taken by evaluators, researchers, and decision-makers, babble may be replaced by the kind of argument that brings a field to maturity – maturity in service, as well as of internal clarity.
Notes
1. Preparation of this review has been sponsored by the Stanford Evaluation Consortium, under a grant from the Russell Sage Foundation. Original distribution as part of the Proceedings of the National Academy of Education was sponsored by a grant from The Ford Foundation for support of activities of the Academy concerning public understanding of research on education. The opinions expressed are those of the authors and do not necessarily represent the position of the National Academy of Education, The Ford Foundation, the Russell Sage Foundation, or the Stanford Evaluation Consortium.
2. The task force included Sueann Ambron, Iris Berke, Robert Conry, Lee J. Cronbach, Lincoln Moses, Robin Parker, Barbara Pence, David Rogosa, Lee Ross, Nancy Sanders, Gary Sykes, and William Wagner.
3. This notation will be used throughout to denote reference to one of the two (II) volumes reviewed, with chapter or page references following.
References
Bennett, C.A., & Lumsdaine, A.A. (Eds.). Evaluation and experiment. San Francisco: Academic Press, 1975.
Campbell, D.T., & Stanley, J.C. Experimental and quasi-experimental designs for research. Chicago: Rand McNally, 1963.
Caro, F.G. (Ed.). Readings in evaluation research. New York: Russell Sage Foundation, 1971.
Cohen, D.K. Politics and research: Evaluation of social action programs in education. Review of Educational Research, 1970, 40(2), 213–238.
Cook, S.W., Jahoda, M., & Deutsch, M. Research methods in social relations. New York: Dryden Press, 1951.
Cronbach, L.J., Rogosa, D.R., Floden, R.E., & Price, G.G. Analysis of covariance: Angel of salvation or temptress and deluder? Stanford Evaluation Consortium Occasional Paper, February 1976.
Gilbert, J.P., Light, R., & Mosteller, F. Assessing social innovations: An empirical base for policy. In C.A. Bennett & A.A. Lumsdaine (Eds.), Evaluation and experiment. San Francisco: Academic Press, 1975.
Hofstadter, R. The age of reform. New York: Vintage Books, 1960.
Nunnally, J.C. Psychometric theory. New York: McGraw-Hill, 1967.
Riecken, H.W., & Boruch, R.F. Social experimentation. San Francisco: Academic Press, 1974.
Rivlin, A.M., & Timpane, M.P. (Eds.). Planned variation in education: Should we give up or try harder? Washington, D.C.: Brookings Institution, 1975.
Rubin, D.B. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 1974, 66, 688–702.
Suchman, E. Evaluation research. New York: Russell Sage Foundation, 1967.
Thorndike, R. (Ed.). Educational measurement (2nd ed.). Washington, D.C.: American Council on Education, 1971.
Wittrock, M.C., & Wiley, D.E. (Eds.). The evaluation of instruction. New York: Holt, Rinehart, & Winston, 1970.
89 A Model for Studying the Validity of Multiple-Choice Items Lee J. Cronbach and Jack C. Merwin
Source: Educational and Psychological Measurement, 15 (1955): 337–352.
The multiple-choice technique has deserved popularity as an aid in assessing achievement, ability, and personality. Its advantages include objectivity, adaptability, relative freedom from ambiguity, and relative freedom from response sets. Despite the extensive use of choice techniques, theory regarding the multiple-choice item and its construction has been almost non-existent. This paper lays a basis for needed studies of the properties affecting item efficiency.
The test constructor makes many decisions in preparing an item. For example, he may decide to use one alternative which is distinctly different from and superior to the other two answers offered, or he may select alternatives so similar that judgment becomes difficult. The latter item would report more differences between persons, but a large fraction of these discriminations would be due to chance. The validity of a multiple-choice item depends in some manner on the “closeness” of alternatives and other parameters. We describe here a mathematical model which permits precise analysis of such relationships. Principles derived from the model will ultimately provide a basis for constructing more valid tests.
The model also permits comparison of several proposed modifications of the best-answer technique. In self-report tests, for example, the subject may be asked to rank alternatives from most descriptive of himself to least. Presumably such directions yield more information than marking only the best answer. In ability tests, similar advantages may be sought by means of the sequential-response methods embodied in the “Tab-test” (7) and the Troyer-Angell Self-Scorer (2). Our model permits study
of the effect of these devices on validity, with various item characteristics. We believe that our model is capable of eventual extension to permit a theoretical comparison of choice items with forced response (“select one alternative in every item”) and choice with unforced responses (“mark the correct alternative or leave blank”). Previous work on choice behavior has not been chiefly aimed toward the improvement of tests. Psychophysical studies have primarily sought methods of scaling stimuli; Thurstone has done notable work in this area (10). Our methods of predicting choice draw heavily on the psychophysical models. Hull and co-workers (5, 6, 9, 11), studying discriminative learning in the rat, have used similar models to predict choice in that context. In the test-construction literature, a limited number of papers are relevant to our problem. Adkins and Toops (1) derive upper and lower bounds for the validity of multiple-choice items, on the basis of information obtained by administering the “stem” of the item as an open-ended question. Their model makes weaker assumptions than ours, but permits much less definite conclusions about factors affecting validity. Horst (8) has presented suggestions regarding the construction of multiple-choice items on the basis of a coarse distinction between “those who can make a discrimination” and “those who guess.” Our model acknowledges degrees of difficulty in discriminating, and should therefore be able to examine more subtle effects than Horst’s. The literature on forced-choice rating methods bears on related questions, although here as in psychophysics the interest has been in discrimination between stimuli rather than in measuring the rater.
Prediction of Choice among Stimuli
A choice item consists of a stem and some number of alternatives, for example, “The President of the U. S. in 1835 was (a) John Adams, (b) Andrew Jackson, (c) Abraham Lincoln, (d) John Eaton.” The alternatives constitute a set of stimuli among which the subject is required to discriminate. The basis for discrimination is dictated by the stem, or implied in the test directions.
The Single-Stimulus Experiment
In our model as in that of Adkins and Toops, experience with the possible alternatives (stimuli) presented separately is required in order to predict choice behavior. To study validity, it is necessary to use a group of persons for whom an acceptable criterion score is available. Adkins and Toops present the stem as a sentence to be completed (“The President of the U. S. in 1835 was ...”). The responses given by the subjects provide a list of possible alternatives. The frequency of each response among persons at any level on the criterion is noted.
Rather than use this open-end procedure, one may, as Adkins and Toops note, prepare a pool of alternatives in advance of the single-stimulus experiment. Each alternative is cast in a form which permits the subject to respond to it independently as a recognition question. For example, pupils might be asked to judge the following statements “True” or “False”:
The President of the U. S. in 1835 was John Adams.
The President of the U. S. in 1835 was John Eaton.
Alternatives for interest, personality, or attitude tests would be cast in appropriate form, e.g.:
Do you like to build birdhouses? .......... Yes   No
In this as in the open-end experiment, one determines the proportion of subjects responding positively to each alternative and the relation of this response to the criterion. The methods used below to convert single-stimulus data to predicted choice frequencies may be applied either to the recall or recognition experiment. Each type of experiment has its own advantages, but we regard the recognition procedure as generally superior. The choice test calls for recognition rather than recall; we therefore would expect data from the single-stimulus recognition test, being more similar to the ultimate test, to give better predictions than data from the open-end test method. The open-end method does not provide data on correlation among the alternatives independent of the criterion. In the recognition experiment, the investigator ordinarily makes a concentrated study of a limited group of alternatives, whereas an unlimited number of responses may occur in an open-end study. Therefore, recognition conditions provide more stable data regarding any one response for a given sample size.
Our model assumes experimentally independent responses to the various alternatives, and the experiment should be designed to obtain such responses. If one is willing to assume that all variance in choice behavior is criterion-relevant, it is possible to divide alternatives among subjects randomly so that each subject responds to only one alternative for any item. But if (as is more likely) responses to alternatives are correlated even with the criterion held constant, it is necessary to administer more than one alternative from a given set to the same pupil. These alternatives should ordinarily be presented on separate occasions to obtain approximate experimental independence.
Preferably, the criterion used will be independent of the test under construction. Ratings from supervisors, from teachers, or from psychiatric interviews might be employed, for example.
Often, the best criterion for a test will be some other test measuring the same attribute, particularly in the case of a proficiency or achievement test. If necessary, a total score derived from the single-stimulus experiment may be employed as a criterion for the individual item without doing violence to the model.
Scale Position of Stimulus
Stimuli may be arranged along a continuum of attractiveness for any individual $j$, each stimulus $s$ having a “true scale position” $S_{sj}$ for that person. This continuum represents the power of the stimulus to elicit some “critical” response from the individual in the single-stimulus experiment. In open-end questions, completing a statement with the alternative under examination is itself the critical response. In a recognition experiment, the critical response is saying “Yes”, or “True”, or “Agree”, etc. Let $p_{sj}$ designate the probability of the critical response over many trials. Any stimulus to which person $j$ gives the critical response on 50% of all trials in a single-stimulus experiment ($p_{sj} = .50$) is located at $S_j$, his threshold. If $p_{sj} = .50$, $S_{sj} = S_j$; if $p_{sj} > .50$, $S_{sj} > S_j$.
The person does not respond to a stimulus in the same way on every trial $t$. We account for this by postulating a “momentary scale position” $S_{sjt}$ of the stimulus which varies around $S_{sj}$. These momentary variations in attractiveness may be due to changes in the situation (e.g., distracting noise) or to changes within the person. Whenever $S_{sjt} > S_j$, the person gives the critical response; when $S_{sjt} < S_j$, the response does not occur. We assume that $S_{sjt} - S_{sj}$ varies randomly with normal distribution, zero mean, and standard deviation $\sigma_{sj}$. This $\sigma_{sj}$ is assumed to be the same for all stimuli, as in Case V of the Law of Comparative Judgment. With these assumptions, $p_{sj}$ has a monotonic relation to $S_{sj} - S_j$. Figure 1 presents an example where $S_{sj} - S_j = 1\sigma_{sj}$. On 84 per cent of all trials $j$ will give the critical response. Conversely, if $p_{sj}$ is experimentally determined to be .84, it follows that $S_{sj}$ is 1 s.d. above $S_j$. The normal curve table may be used to convert frequencies of response $p_{sj}$ into standardized scale positions of the form $\frac{S_{sj} - S_j}{\sigma_{sj}}$. In the remainder of this paper, scale positions are assumed to have been thus standardized.
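The table lookup described in this section is the inverse of the standard normal distribution function. A minimal sketch, assuming Python's standard-library `statistics.NormalDist` in place of a printed normal curve table; the function names are introduced here for illustration only:

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def scale_position(p):
    """Standardized scale position (S_sj - S_j) / sigma_sj for a response proportion p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return STANDARD_NORMAL.inv_cdf(p)

def response_probability(z):
    """Inverse mapping: probability of the critical response at standardized position z."""
    return STANDARD_NORMAL.cdf(z)

# Proportions of the critical response and the positions they imply.
for p in (0.84, 0.50, 0.25, 0.05):
    print(p, round(scale_position(p), 3))
```

A proportion of .84 maps to a position just under one standard deviation above the threshold, which is the rounded case used in the text, and a proportion of .50 maps to the threshold itself.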
The following sections discuss the prediction of choice, once $p_{sj}$ is known. The single-stimulus experiments, however, do not necessarily provide direct information on $p_{sj}$. Three cases, dealing with this difficulty in different ways, will be discussed later.
Figure 1: Relationship of $p_{sj}$ to $S_{sj}$ (here, $p_{sj} = .84$). [Normal curve on a standardized scale from −3 to 3, with $S_j$ at 0 and $S_{sj}$ at 1; an area of 84% is marked.]
Choice Behavior
To predict the individual’s choice, we assume that when confronted with alternative stimuli he will give the critical response to whichever stimulus has the highest $S_{sjt}$ (i.e., attractiveness) at that moment. At any instant, the values $S_{s_1jt}, S_{s_2jt}, \ldots, S_{s_njt}$ define a point in $n$-space, where $n$ is the number of alternatives to be presented simultaneously in a choice item. These points are normally distributed about the centroid $(S_{s_1j}, \ldots, S_{s_nj})$. We shall designate this centroid $S_{ij}$. The correlation between momentary scale values for any person is assumed to be zero, i.e., variations on the stimuli are assumed to be independent. This does not require independence of the $S_{sj}$ over persons.
Since choice depends on the relative momentary scale values, we convert the $S_{sjt}$ into deviations about the momentary centroid $S_{ijt}$. This yields a set of deviation scale values of the form
$$S'_{sjt} = S_{sjt} - \frac{1}{n}\sum_{s} S_{sjt}.$$
These points are normally distributed with zero correlation in $n - 1$ space, because the $S'$ must have a zero sum. The variance of $S'_{sjt}$, however, will be $1 - \frac{1}{n}$ rather than 1. This type of projection has been discussed at length elsewhere (3, 4).
Where $S'_{s_1jt}$ is greater than any other $S'_{sjt}$, $s_1$ is more attractive than any other alternative. This condition defines a certain region of the $n - 1$ space. If at any moment $S'_{ijt}$ for a person is in this region, he is expected to choose stimulus $s_1$. There is a similar region where each of the other stimuli will be preferred. To know the expected distribution of $j$’s choices it remains only to determine on what proportion $p'_{sj}$ of the trials $S'_{ijt}$ falls in each region.
We shall illustrate the foregoing argument with both two- and three-choice items. If there are two stimuli, the $S_{ijt}$ will be distributed as in Figure 2, the location of the distribution depending on the $p_{sj}$. Whenever $S_{s_1jt} > S_{s_2jt}$, that is whenever $S_{ijt}$ is below line $L$, the person will choose $s_1$, and vice versa. According to this figure, $j$ chooses $s_2$ more frequently than $s_1$. The frequency of each choice is determined by projecting each point onto a line normal to $L$. This line intersects $L$ where $S'_{s_1jt} = S'_{s_2jt} = 0$. The projected distribution is normal, and Area I (32 per cent in this figure) indicates the proportion of trials on which we expect $j$ to choose $s_1$. He will choose $s_2$ on the remaining 68% of the trials. In the example shown in Figure 2, $S_{s_1j} = .25$ and $S_{s_2j} = 1.00$. These give $S'_{sj}$ of $-.38$ and $.38$, respectively. Due to projection, the $S'_{sjt}$ values have a standard deviation of $\sqrt{1/2}$. Each $S'_{sjt}$ is multiplied by $\sqrt{2}$ so that the normal curve table may be used.
Figure 2: Distribution of $S_{ijt}$ and resulting distribution of $S'_{ijt}$ for a two-choice item. [Axes $S_{s_1j}$ and $S_{s_2j}$; line $L(S_1, S_2)$ separates regions I and II.]
A three-choice item produces a distribution of $S_{ijt}$ in three-space. In this space is a pencil of planes which divide the space into regions, and these planes meet in a line $L$ where all scale values are equal. When the $S_{ijt}$ are projected into a plane ($n - 1 = 2$) normal to $L$, the resulting distribution of $S'_{ijt}$ is as pictured in Figure 3. When $S'_{ijt}$ falls in area I, $s_1$ is chosen, and so on. The proportionate frequency of each choice is determined from the volume of the corresponding sector of the bivariate normal distribution. For the three-choice item, the standard deviation of the $S'_{sjt}$ is $\sqrt{2/3}$. Thus each $S'_{sj}$ must be divided by $\sqrt{2/3}$ to permit use of the normal bivariate tables.
Figure 3: Distribution of $S'_{ijt}$ for a three-choice item. [Regions I, II, III.]
The foregoing argument can be generalized to any value of $n$ but computational labor increases rapidly as dimensions are added. One of the authors (J. M.) has developed a program for performing the computations for the three-choice case on the University of Illinois Electronic Digital Computer. Table 1 gives illustrative results for the prediction of preference, where four stimuli are grouped to form four different triads.
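The centering step can be checked against the illustrative values. The following is a minimal sketch, not the authors' computer program: the function names are introduced here for illustration, and the scale positions are the standardized values listed for the four alternatives in Table 1. It subtracts the mean of the scale positions in a triad from each position, and separately rescales deviations by the projected standard deviation $\sqrt{1 - 1/n}$.

```python
import math

# Standardized scale positions as listed in Table 1.
POSITIONS = {"C": 0.842, "G": 0.000, "R": -0.674, "H": -1.645}

def deviation_scores(triad):
    """Return (mean scale position, deviations S' about that mean) for a set of alternatives."""
    values = [POSITIONS[name] for name in triad]
    mean = sum(values) / len(values)
    return mean, [v - mean for v in values]

def rescale_for_tables(deviations):
    """Divide by the projected s.d. sqrt(1 - 1/n) so that unit-variance tables apply."""
    n = len(deviations)
    projected_sd = math.sqrt(1 - 1 / n)
    return [d / projected_sd for d in deviations]

for triad in ("CGR", "CRH", "GRH", "CGH"):
    mean, devs = deviation_scores(triad)
    print(triad, round(mean, 3), [round(d, 2) for d in devs])
```

The rounded means and deviations agree with the corresponding rows of Table 1. Passing the deviations to `rescale_for_tables` gives the values that would be looked up in a unit-variance table.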
Table 1: Choice frequencies predicted from single stimulus items
Alternatives                      Carpentry (C)   Gardening (G)   Reading (R)   Hiking (H)
Hypothetical $p_{sj}$             .70             .50             .25           .05
$(S_{sj} - S_j)/\sigma_{sj}$      .842            .000            −.674         −1.645
Possible triads                   CGR                   CRH                    GRH                   CGH
$\sum S_{ij}/n$                   +.056                 −.492                  −.773                 −.268
$S'_{sj}$ (respectively)          +.79; −.06; −.73      +1.33; −.18; −1.15     +.77; +.10; −.87      +1.11; +.27; −1.38
$p'_{sj}$ (respectively)          .63; .26; .11         .78; .17; .05          .61; .30; .09         .68; .29; .03
Prediction of Choices after the First
Probability of each subsequent choice may be predicted as well as the first. We assume that in each successive choice the person chooses whichever of the remaining alternatives has the greatest momentary scale value. The plane of $n - 1$ dimensions contains regions corresponding to each possible order of scale positions. The plane of Figure 3 may be divided into six such regions (see Figure 4). Whenever $S'_{ijt}$ falls into region Ia, $j$ will rank the stimuli in the order $s_1, s_2, s_3$. Each of the other areas has its corresponding order. Determining the volume in each subarea obtains results of the type illustrated in Table 2. This table is based on the same hypothetical $p_{sj}$ values as Table 1.
Figure 4: Regions giving probability of each order of responses for a three-choice item. [Regions Ia, Ib, IIa, IIb, IIIa, IIIb.]
It is meaningful in a preference or attitude test to ask subjects to rank alternatives. Such directions are not appropriate in an ability test where one answer is correct and the others completely wrong. Ability tests, however, may be administered with special answer sheets which provide an immediate indication if a wrong answer is selected. Then the person can meaningfully be required to select what he regards as the next best answer. We assume that the relative attractiveness of the remaining alternatives is not altered by feedback from former choices. If a three-choice item regarding the presidency of the United States were administered in this way (using the same $p_{sj}$ as in Table 1), the possible response patterns would have the frequencies listed in Table 3.
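The rule for successive choices amounts to sorting the momentary scale values. The sketch below estimates the probability of each rank order by simulation rather than by integrating over the regions of Figure 4. The scale positions, trial count, seed, and function name are hypothetical choices made for illustration, and unit standard deviation with independent momentary variation is assumed, as the text postulates. It is not intended to reproduce the tabled values.

```python
import random
from collections import Counter

def rank_order_probabilities(positions, trials=100_000, seed=0):
    """Estimate the probability of each complete rank order of the alternatives.

    positions maps an alternative's label to its standardized scale position.
    On every simulated trial each alternative receives an independent
    unit-variance normal momentary value; alternatives are then ranked from
    the highest momentary value to the lowest.
    """
    rng = random.Random(seed)
    labels = list(positions)
    counts = Counter()
    for _ in range(trials):
        momentary = {s: rng.gauss(positions[s], 1.0) for s in labels}
        order = tuple(sorted(labels, key=momentary.get, reverse=True))
        counts[order] += 1
    return {order: n / trials for order, n in counts.items()}

# Hypothetical three-choice item.
orders = rank_order_probabilities({"s1": 1.0, "s2": 0.0, "s3": -1.0})

# First-choice probability of s1 = total over the orders that begin with s1.
p_first_s1 = sum(p for order, p in orders.items() if order[0] == "s1")

for order, p in sorted(orders.items(), key=lambda item: -item[1]):
    print(" > ".join(order), round(p, 3))
print("s1 chosen first:", round(p_first_s1, 3))
```

Summing the orders that share a first element recovers the first-choice probabilities, which is the relation between a rank-order table and a first-choice table under this model.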
Table 2: Probability of each response pattern
Rank order                  Triad
of choice         CGR     CRH     GRH     GCH
CGR               .397    –       –       –
CGH               –       –       –       .543
CRG               .226    –       –       –
CRH               –       .533    –       –
CHG               –       –       –       .131
CHR               –       .248    –       –
GCR               .194    –       –       –
GCH               –       –       –       .255
GRC               .066    –       –       –
GRH               –       –       .403    –
GHC               –       –       –       .036
GHR               –       –       .199    –
RCG               .075    –       –       –
RCH               –       .147    –       –
RGC               .042    –       –       –
RGH               –       –       .233    –
RHC               –       .023    –       –
RHG               –       –       .067    –
HCG               –       –       –       .022
HCR               –       .034    –       –
HGC               –       –       –       .013
HGR               –       –       .058    –
HRC               –       .015    –       –
HRG               –       –       .040    –
Table 3: Probability of each sequential response pattern in an ability item (alternatives and their psj are: Jackson (keyed as correct), .70; Adams, .50; Lincoln, .25; Eaton, .05)

Rank order                        Triad
of choice                  JAL      JLE      JAE
J                         .623     .781     .674
AJ                        .194       –      .255
ALJ                       .066       –        –
AEJ                         –        –      .036
LJ                        .075     .147       –
LAJ                       .042       –        –
LEJ                         –      .023       –
EJ                          –      .034     .022
EAJ                         –        –      .013
ELJ                         –      .015       –

Correct on first trial    .623     .781     .674
Correct on second trial   .269     .181     .277
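The step from Table 2 to Table 3 can be sketched in a few lines: each complete rank order is cut off at the keyed answer, and orders that share the same truncated pattern are summed. The sketch below is only an illustration. It assumes that Carpentry, Gardening and Reading in triad CGR play the roles of Jackson, Adams and Lincoln in triad JAL (they have the same psj), and the function names are hypothetical.

```python
from collections import defaultdict

# Rank-order probabilities for triad CGR, copied from Table 2.
TABLE2_CGR = {
    "CGR": 0.397, "CRG": 0.226,
    "GCR": 0.194, "GRC": 0.066,
    "RCG": 0.075, "RGC": 0.042,
}


def sequential_pattern(order, keyed):
    """Choices made, in order, up to and including the keyed answer."""
    return order[: order.index(keyed) + 1]


def collapse(rank_probs, keyed):
    """Sum the rank-order probabilities that share a sequential pattern."""
    patterns = defaultdict(float)
    for order, prob in rank_probs.items():
        patterns[sequential_pattern(order, keyed)] += prob
    return {pattern: round(total, 3) for pattern, total in patterns.items()}


def correct_on_trial(patterns, trial):
    """Probability that the keyed answer is reached on the given trial."""
    return round(sum(p for pat, p in patterns.items() if len(pat) == trial), 3)


# C is keyed as correct, as Jackson is in Table 3.
patterns = collapse(TABLE2_CGR, keyed="C")
print(patterns)
print(correct_on_trial(patterns, 1), correct_on_trial(patterns, 2))
```

Under the stated mapping, the five patterns C, GC, GRC, RC and RGC correspond to J, AJ, ALJ, LJ and LAJ in the JAL column of Table 3, and the two trial totals correspond to its last two rows.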
Prediction for a Group of Persons

To predict p′sC, the probability of each choice by persons in a criterion group C, the model may be developed under any of three conditions.
Case I. σsj for any j equal for all s.
Case II. σs, all persons being pooled, equal for all s.
Case III. σsj for any j equal for all s, and (Ssj − Sj)/σsj for any s the same for all j in C.

Case III, the most restrictive, assumes that the standardized scale position of any stimulus is the same for all individuals having the same criterion score. This assumes that a person’s response in the single-stimulus experiment is fully determined by his criterion score, his threshold for the critical response and his random variation on any trial. If his threshold is low he will give the critical response more often than others in his group; but the stimuli have the same relative attractiveness for every person in the group.

Determination of p′sC in Case III is a simple matter. Since the standardized scale positions of stimuli are the same for all persons, the distribution of S′ijt differs from person to person only by the factor σj. S′ijt/σj may be denoted by S′iCt. The proportion choosing each response in the entire group is the same as for any person in the group. psC, the proportion in group C giving the critical response to stimulus s, is observed and used in place of psj in the procedures described above.

In Case II, the stimuli are allowed to have different standardized scale positions for persons in the same criterion group. Specific characteristics of the stimulus not represented in the criterion or the threshold can influence the individual’s response. That is, response depends on group and specific factors among the stimuli. To predict choice, it is assumed that the dispersion of momentary scale values σsj is the same for all stimuli, all persons and trials being pooled. To obtain data, each pair of stimuli is administered to an acceptable sample of persons in the single-stimulus recognition experiment, or the same stem is presented at least twice in an open-end experiment. The joint probability of giving the critical response to both alternatives is observed.
The method of computing p′sC is described below. In Case I, it is necessary to observe psj for the individual by presenting each stimulus singly on many independent trials. Such data would be treated as described earlier to determine the individual’s probability of choice, and this result would be averaged over persons in group C to determine p′sC.

Calculations for Case II. The steps in determining p′sC for Case II are as follows:

(1) Compute S′iC from the psC, in the same way as S′ij is determined from the psj.
(2) Observe the joint probability ps1s2C of critical response to each of a pair of stimuli.
(3) From this, with ps1C and ps2C, compute the tetrachoric correlation of ps1Ct with ps2Ct. From our earlier assumptions, this is an estimate of the relation of Ss1jt and Ss2jt, for all persons in C and all trials pooled. Repeat for all pairs of alternatives. Write the matrix of these correlations, with ones in the diagonal.
(4) Remove the first centroid factor a. The residual matrix gives the variances and covariances of the S′sjt.
(5) Employ the square-root method to remove the next n – 1 orthogonal factors b, c, etc.
(6) Divide the space formed by factors b, c, etc. into regions within each of which one of the S′sjt is greatest. (Or, in studying ranking methods, subdivide further in the manner of Figure 4.)
(7) Examine the distribution of responses using the square root of the latent root of each orthogonal factor b, c, etc. as the standard deviation on that dimension. Determine the proportion of the distribution falling within each region; this is the probability of choice, for members of group C taken together.
These operations can be clarified by an example involving three alternatives. Suppose that tetrachoric correlations between responses in the single-stimulus experiment are as follows:

         S1       S2       S3       fsa
S1      1.00      .60      .40      .86
S2       .60     1.00      .20      .77
S3       .40      .20     1.00      .69

The first centroid has the loadings shown in the fsa column. The residual matrix giving the variances and covariances of the S′sjt is:

         S1       S2       S3       fsb
S1       .26     −.06     −.19      .51
S2      −.06      .40     −.33     −.12
S3      −.19     −.33      .52     −.38

Factor b has been arbitrarily placed so that it coincides with S′s1jt. Only one further factor is required to account for all variance, yielding the following factor matrix:

         fsa      fsb      fsc      h²
S1       .86      .51      .00     1.00
S2       .77     −.12      .63     1.00
S3       .69     −.38     −.62     1.00
Σf²     1.35      .64      .88     3.00
σf      1.16      .80      .94
The first factor contains information about individual thresholds, but has no bearing on choice behavior. The distribution of S′ijt is plotted with reference to the b and c axes in Figure 5. The values of S′s1C, S′s2C, S′s3C are assumed to be .4, .4, and −.8 in this figure. It will be noted that the standard deviations of b and c are the square roots of the sums of squares of factor loadings. The original axes are located by normalizing the factor loadings in the b and c columns so that their sum of squares is 1.00, and using the normalized loadings as direction cosines. As planned, the axis for s1 coincides with factor b. The boundaries between regions are indicated in Figure 5. From tables for the bivariate normal distribution the proportions in the various areas are computed as follows: I (ps1), .46; II (ps2), .29; III (ps3), .25.

Figure 5: Distribution of S′ijt for a three-choice item, correlated case (axes b and c; regions I, II, III)

It is evident that computation for Case II is difficult, and perhaps prohibitive as the number of alternatives goes beyond three. But in principle, the model affords an approach of considerable generality.
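The factoring in the three-alternative example can be retraced with a short script. This is a minimal sketch rather than the authors' procedure: it assumes the first centroid loadings are the column sums of the correlation matrix divided by the square root of the grand total, and it reads the "square-root method" as pivoting on one variable at a time (S1 for factor b, then S2 for factor c). All function names are hypothetical.

```python
import math


def centroid_factor(r):
    """First centroid loadings: column sums over the root of the grand total."""
    col_sums = [sum(col) for col in zip(*r)]
    root_total = math.sqrt(sum(col_sums))
    return [c / root_total for c in col_sums]


def remove_factor(r, f):
    """Residual matrix after subtracting the outer product of the loadings."""
    n = len(r)
    return [[r[i][j] - f[i] * f[j] for j in range(n)] for i in range(n)]


def pivot_factor(r, k):
    """One square-root (diagonal) step: a factor lying along variable k."""
    root = math.sqrt(r[k][k])
    return [row[k] / root for row in r]


# Tetrachoric correlations from the example in the text.
R = [[1.00, 0.60, 0.40],
     [0.60, 1.00, 0.20],
     [0.40, 0.20, 1.00]]

f_a = centroid_factor(R)
residual_1 = remove_factor(R, f_a)
f_b = pivot_factor(residual_1, 0)      # factor b placed along S1
residual_2 = remove_factor(residual_1, f_b)
f_c = pivot_factor(residual_2, 1)      # remaining variance, pivoting on S2

for name, loadings in (("a", f_a), ("b", f_b), ("c", f_c)):
    print(name, [round(x, 3) for x in loadings])
```

Carried at full precision, the loadings agree with the tabled values to within about .01; the small differences (for example in the S2 loading on factor b) are what one would expect if the tabled figures were computed from two-decimal intermediate values.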
Determination of Item Validity

The previous sections have indicated how the proportion giving each choice in any criterion group can be predicted. The single-stimulus experiment will have provided sufficient data to prepare a complete contingency table showing the proportion in each criterion group pC and the conditional probability p′sC of a person in each group making each choice. The item will ordinarily be scored by an arbitrary system. For example, the best answer may be credited +1 and the others 0. In a ranking method, the credit may be 2, 1, or 0, depending on whether the correct answer is chosen first, second, or last. With such a scoring system, a table can be prepared showing the joint probability pCx that a person is in criterion group C and will receive score x. If the criterion categories form an interval scale, the product-moment correlation based on this table is a proper validity coefficient. Other appropriate statistical indices may be used for other types of criterion data. Even when the criterion is continuous, rather coarse groupings on the criterion scale will, as Adkins and Toops point out, give adequate estimates of the correlation.

An example will show the validities obtained by the two testing procedures when two equal criterion groups are used. Let the pCx of the high group for the ranking method be .33, .12, .05 respectively and for the low group be .11, .18, .21 respectively. The validity of the item when the ranking method is used with integer scoring weights is .48. For these same groups, the pCx for the best answer method are .33 and .17 for the high group and .11 and .39 for the low group. The validity of the item when the best answer procedure is used is .44.

It is also possible to weight each response pattern to give maximum prediction of the criterion. If the criterion is expressed by an interval scale, the curvilinear correlation σC.x based on the joint probability distribution is the validity coefficient for this optimum weighting. It will be noted that the value of weighting can be assessed without actually computing the weights.
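The product-moment coefficient for the best-answer table can be checked with a few lines of code. The sketch below is an illustration under stated assumptions: the high and low groups are coded 1 and 0 (any linear coding gives the same correlation), the item scores are 1 and 0, and the function name is hypothetical.

```python
import math


def validity(joint, criterion_values, score_values):
    """Product-moment correlation from joint[c][x] = P(group c and score x)."""
    cells = [
        (criterion_values[c], score_values[x], p)
        for c, row in enumerate(joint)
        for x, p in enumerate(row)
    ]
    mean_c = sum(c * p for c, _, p in cells)
    mean_x = sum(x * p for _, x, p in cells)
    cov = sum((c - mean_c) * (x - mean_x) * p for c, x, p in cells)
    var_c = sum((c - mean_c) ** 2 * p for c, _, p in cells)
    var_x = sum((x - mean_x) ** 2 * p for _, x, p in cells)
    return cov / math.sqrt(var_c * var_x)


# Best-answer method: rows are the high and low groups, columns are scores 1 and 0.
best_answer = [[0.33, 0.17],
               [0.11, 0.39]]

r = validity(best_answer, criterion_values=[1, 0], score_values=[1, 0])
print(round(r, 2))  # 0.44
```

The same function accepts a table with more than two score columns, so a ranking-method table with integer weights can be passed in the same way.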
Use of the Model

The model is designed with both theoretical and practical ends in mind. Many questions regarding the design of multiple-choice tests can be studied by the procedures here described, using hypothetical parameters rather than performing actual experiments. The basic parameter of the model is psC, the probability of critical response to each stimulus in each criterion group. For Case II, the other important parameter is the joint probability ps1s2C for each pair of stimuli. With n alternatives, each criterion group calls for n values of psC and n(n – 1)/2 joint probabilities. In addition, pC, the probability of occurrence of each criterion score, is required to determine a validity coefficient. Thus, with n alternatives and C criterion groups, a total of C(n + 1) – 1 independent parameters are invoked to predict validity in Case I and (C/2)(n² + n + 2) – 1 in Case II. The investigator may hypothesize values of these parameters and thereby study the entire range of possible item characteristics.

The procedure for such studies is simpler than the foregoing paragraph suggests, since not all the degrees of freedom in the model affect the results. Many different sets of psC yield the same set of p′sC, and in any set of p′sC there are only n − 1 degrees of freedom. Similarly, the correlations between S′sCt rather than between psC determine choice in Case II, and we need specify only the (n − 1)(n − 2)/2 independent entries in the first residual matrix, not the n(n − 1)/2 correlations. In Case I or III, behavior on any item is predicted from nC − 1 parameters; in Case II, (C/2)(n² – n + 2) – 1 are required. There is reason to believe that moderate changes in pC have only a minor effect on validity, which further simplifies the problem. By varying the parameters appropriately, the effect on validity of any proposed modification of the forced-choice test can be determined. Although the computations required are numerous, modern computing methods make them feasible.
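The parameter counts in this section follow from adding up the quantities named for each criterion group. The sketch below is a bookkeeping aid only: the function names are hypothetical, and the final subtraction of one reflects the assumption that the pC must sum to one across groups.

```python
def full_parameter_count(n, groups, case_ii):
    """Parameters invoked when every psC (and joint probability) is specified."""
    per_group = n + 1                         # n values of psC plus pC
    if case_ii:
        per_group += n * (n - 1) // 2         # one joint probability per pair
    return groups * per_group - 1             # the pC sum to one


def effective_parameter_count(n, groups, case_ii):
    """Parameters that actually affect the predicted behavior."""
    per_group = (n - 1) + 1                   # n - 1 free p'sC plus pC
    if case_ii:
        per_group += (n - 1) * (n - 2) // 2   # entries of the first residual matrix
    return groups * per_group - 1


# Three alternatives and two criterion groups.
n, groups = 3, 2
print(full_parameter_count(n, groups, case_ii=False))       # C(n + 1) - 1
print(full_parameter_count(n, groups, case_ii=True))        # (C/2)(n^2 + n + 2) - 1
print(effective_parameter_count(n, groups, case_ii=False))  # nC - 1
print(effective_parameter_count(n, groups, case_ii=True))   # (C/2)(n^2 - n + 2) - 1
```

For three alternatives and two groups the four counts are 7, 13, 5 and 7, which shows how quickly Case II grows relative to Cases I and III as alternatives are added.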
Studies employing the model in this manner are now in progress. Comparison of validity coefficients predicted by this model to validities empirically obtained should be undertaken for the present model, and also for the Horst and Adkins-Toops models. If the model meets such empirical tests, the test constructor may perform single-stimulus experiments and from them predict which combinations of alternatives would be most valid. This would require him to perform the same calculations as in the theoretical studies discussed above, using observed values of the parameters.

The most difficult stage in these computations for Case I or III is the conversion of S′iCt into values of p′sC. A nomogram to make this conversion for three-choice items has been prepared by the junior author. Copies will be supplied on request. Nomograms for Case II, or for larger numbers of alternatives, would be prohibitively complicated.

This line of investigation should in the long run develop generalizations which will permit the test constructor to recognize immediately, from their parameters, what combinations of alternatives will be effective. More broadly, it should advance our comprehension of the way in which choice items can best serve in measurement.
References

1. Adkins, Dorothy C., and Toops, Herbert A. “Simplified Formulas for Item Selection and Construction.” Psychometrika, II (1937), 165–171.
2. Angell, George W., and Troyer, Maurice E. “A New Self-Scoring Test Device for Improving Instruction.” School and Society, LXVII (1948), 84–85.
3. Cronbach, Lee J. “‘Pattern Tabulation’: A Statistical Method for Analysis of Limited Patterns of Scores, with Particular Reference to the Rorschach Test.” Educational and Psychological Measurement, IX (1949), 149–171.
4. Cronbach, Lee J., and Gleser, Goldine C. “Assessing Similarity between Profiles.” Psychological Bulletin, L (1953), 456–473.
5. Felsinger, J. M., Gladstone, A. I., Yamaguchi, H. G., and Hull, C. L. “Reaction Latency (str) as a Function of the Number of Reinforcements (N).” Journal of Experimental Psychology, XXXVII (1947), 214–228.
6. Gladstone, A. I., Yamaguchi, H. G., Hull, C. L., and Felsinger, J. M. “Some Functional Relationships of Reaction Potential (sEr) and Related Phenomena.” Journal of Experimental Psychology, XXXVII (1947), 510–526.
7. Glaser, Robert, Damrin, Dora E., and Gardner, Floyd M. “The Tab Item: A Technique for the Measurement of Proficiency in Diagnostic Problem Solving Tasks.” Educational and Psychological Measurement, XV (1954), 283–293.
8. Horst, Paul. “The Difficulty of a Multiple Choice Test Item.” Journal of Educational Psychology, XXIV (1933), 229–232.
9. Hull, Clark L., Felsinger, John M., Gladstone, Arthur I., and Yamaguchi, Harry G. “A Proposed Quantification of Habit Strength.” Psychological Review, LIV (1947), 237–254.
10. Thurstone, L. L. “Psychophysical Methods.” In T. G. Andrews (Ed.), Methods of Psychology. New York: John Wiley and Sons, 1948.
11. Yamaguchi, H. G., Hull, C. L., Felsinger, J. M., and Gladstone, A. I. “Characteristics of Dispersions Based on Momentary Reaction Potential (sEr) of a group.” Psychological Review, LV (1948), 216–238.
90
Assisted Assessment: A Taxonomy of Approaches and an Outline of Strengths and Weaknesses
Joseph C. Campione
In the recent critiques of the effectiveness of American schools, both instructional and assessment practices have come under strong attack. The crux of the matter is that there is a considerable amount of evidence that by the middle grade-school years, the majority of students have acquired many of the basic skills involved in reading (decoding), writing (producing a passable essay), and arithmetic (executing computational algorithms), but seem not to understand those activities in a way that allows them to progress beyond entering levels and become truly proficient. This predicament, common enough among regular division students, is even more pronounced among students in special education settings. The further argument is that this pattern is in good part a consequence of the way in which standard instruction is organized, and that it is then reinforced by accepted assessment practices.

Our recent research activities have centered on the development of novel approaches to instruction and assessment that can overcome the limitations we see as characterizing much of current practice. The goal is to develop an overall theoretical framework within which the two sets of activities can be integrated. In this paper, we concentrate primarily on the work we and others have done in the assessment realm, with some discussion of the implications of that work for instructional practice.
Source: Journal of Learning Disabilities, 22(3) (1989): 151–165.
Since the inception of the testing movement at the turn of the century, the goals of assessment have remained the same. The idea is to develop tests that will generate descriptions of individual learners in terms of their strengths and weaknesses that will (a) predict how well they are likely to do in academic settings and (b) inform the development of instructional programs that can facilitate the performance of those predicted to experience particular difficulties. Assessment practices have come under attack for both their predictive and prescriptive features. In the context of prediction, it has been argued that they are likely to misclassify students who come from nonmajority cultural backgrounds. Standard tests rest on the assumption that all students have had equivalent opportunities to acquire the information and skills probed for on those tests. To the extent that this assumption is not true, and it frequently is not, any inferences drawn from those tests are problematic. In the context of the relation between assessment and instruction, typical tests have been criticized in two, almost contradictory, ways. On one level, it is argued that typical ability and achievement tests do not inform instruction, that is, they do not provide the kinds of diagnoses needed to build instructional programs that can overcome student weaknesses. At the same time, there are concerns that those tests do influence instruction, albeit in a negative way. Students, teachers, schools, and so forth, are evaluated in terms of performance on standardized, norm-referenced tests. As we argue below, both the structure and content of those tests help shape and reinforce some of the negative aspects of traditional instruction. In the area of special education, tests serve simultaneously to identify students to be assigned to remedial programs and to define the nature of the disability different children possess. 
Both legal and scientific definitions of various kinds of academic problems (mental retardation, distinct kinds of learning disabilities, etc.) are defined in reference to standard assessment instruments; performance on those tests serves as the basis for student labeling and influences the likelihood that the student will be assigned, for example, to a special education program. Beyond classification, the tests also function to suggest forms of treatment. For example, some students who are having difficulty learning to read may, on the basis of standard ability tests, be found wanting in their auditory sequencing skills. As a consequence, they may be given practice on items designed to sharpen those skills, with the hope that such training will result in improved reading ability. We have labeled this step, from diagnosis to intervention, the “leap to instruction” phenomenon, and have argued that the action is seldom easy to defend (Brown & Campione, 1986). To improve upon the predictive and, particularly, prescriptive aspects of tests requires an understanding of the component skills and processes involved in the target academic tasks and the ways in which they contribute to successful or unsuccessful performance. It is only when we have a strong
theory of the cognitive underpinnings of a given task, along with indications of likely sources of individual differences in their execution, that we can begin to build a diagnostic test. If, according to some theory, it is assumed that processes A, B, and C are involved in effective reading performance, and that skilled and unskilled readers differ most dramatically in use of B, then B should be a main target of assessment. These kinds of theoretical analyses serve to highlight what should be evaluated. Of course, theorists will differ in their analyses and hence spotlight different skills for measurement – the utility of the tests can then also serve as a way of evaluating the theories.

In addition to the what of testing, there is also the how. Even after we have decided to probe for process A, how do we assess it? In this paper we will be concerned with two sets of issues. One has to do with the degree of support testees receive. Skills can be measured in situations where students work unaided on sets of items, and are given but a single chance to demonstrate their proficiency (static tests). The contrast here involves cases where students are given some form of help designed to maximize their performance, with this aided, maximal level taken as providing the clearer picture of student ability (dynamic tests).

The second feature, the degree of contextualization of the assessment, involves two subthemes. To illustrate, assume that the ability to “identify main ideas” is deemed central to reading and writing. That skill can then be measured as either a relatively isolated activity, typically with specially prepared materials, or in the actual context of reading a text for meaning or producing an essay. A second issue concerns the degree of generality one assumes. If “finding the main idea” (or “auditory sequencing”) is regarded as a fairly general characteristic, any measurement in any context will be acceptable.
If, however, the ability to identify main ideas is regarded as varying within individuals over tasks (some may do it well while reading someone else’s text but less well when producing or evaluating their own), the specific match between the target and testing settings takes on much more significance. The majority of extant tests involve static assessment. Intelligence and ability tests also assume considerable generality of the processes under evaluation. And it is easy to see why this is the case; these are the most pleasant assumptions a test developer can make. Static tests, as opposed to dynamic ones, are much less complex to generate. No aid is provided, social interaction between the tester and testee is minimized (though not eliminated – Mehan, 1973), objective and reliable scoring systems can more readily be implemented, and norming is much simpler. The assumption of generality also makes life easier, as the specific context does not become a major concern. Evaluating a given process in one context provides information about its operation in all, or many, others. As a result, a small number of tests can be used to generate a considerable amount of information. Assessment is of course not the whole story. Even if satisfactory and theoretically defensible assessment instruments have been designed, there remains
the problem of translating test results into suggestions for remediation. A good assessment may well highlight the strengths and weaknesses an individual or group of individuals possesses, and hence may indicate the contents of an appropriate instructional program, but it cannot do that in the absence of a theory of learning and instruction that can be used to guide the way in which both the assessment and instructional enterprises are implemented. As we have indicated elsewhere (Brown et al., 1986a), prevailing views about the nature of individual differences in academic aptitude and about the nature of human learning have had profound effects on the identification and treatment of children with learning problems, be they called mentally retarded or learning disabled. In the next sections, we outline what we see as some of the weaknesses of standard instruction and assessment. We also point to some recent attempts to generate alternatives designed to overcome those problems. In this context, we highlight the role of supportive social contexts for learning and their place in both instruction and assessment. Our main goal in this paper is to summarize and organize a family of new approaches to assessment, generically referred to as dynamic assessment, that have been developed as alternatives to standard assessment practices. We want to explore the ways in which the various approaches to dynamic assessment can influence classification of students, perception of student abilities, and instructional practices. The defining feature of these approaches is a reliance on process, rather than product, information. In our treatment, we will be concerned with two aspects of dynamic assessment: (a) whether the skills assessed are assumed to be domain general or domain specific and (b) whether the assessment is structured in a formal, standardized fashion or in a more opportunistic, clinical way. 
Before turning to the assessment issue, we review some concerns with instructional practices.
Traditional Instructional Programs: Criticisms

Elsewhere (Brown & Campione, in press; Campione, Brown, & Connell, in press), we have summarized some of the limitations of standard instructional practices that contribute to findings that students come to be able to perform sets of basic skills on demand, but do not acquire a firm conceptual grasp of the goal of those activities. They can perform the necessary subskills or algorithms on demand, for example, when cued on a standard test, but without understanding their significance, they are not in a position to use the skills flexibly. It is our hypothesis that this result is explainable in terms of a number of features of typical instructional practices.

Throughout the curriculum, there is an emphasis on direct instruction with strong teacher control – the teacher lectures and the students listen. There is little discussion and few opportunities for students to contribute
their own feelings, ideas, or concerns during the course of instruction. This tendency, present in reading and social studies courses, is most pronounced in mathematics, where teachers routinely work out problems on the board and then have the students work independently on related examples (Stodolsky, 1988). With the students forced into a passive role, and not encouraged to contribute their own comments, there is little occasion for the teacher to engage in the kind of on-line diagnosis of individual student capabilities. Teachers respond by proceeding through their lesson sequences following what Putnam (1987) calls a curriculum script; it is the preset curriculum that guides selection of instructional goals and content, not student progress. There is a strong tendency for lower level skills to be taught before higher level understanding, as if they were separable, leading children to misunderstand the goal; they come to believe that reading is decoding, that math consists only of running off well-practiced algorithms, and so forth. There seem to be two general assumptions underlying this sequence. One is that unless basic level skills are mastered, students cannot acquire higher level ones. An alternative is that higher level skills will emerge automatically from mastery of the basic skills. Another consistent feature of much instruction is an emphasis on subskills. Many academic tasks, such as reading for meaning, are complex; and for purposes of instruction it is deemed necessary to make them more manageable for novice learners. Frequently, this means analyzing the global task into the discrete subskills that are assumed to be the components of the overall task. Instruction then focuses on those subskills, and students practice them in relative isolation from the real goal of the activity, such as reading for meaning. 
This emphasis on subskills also results in a lack of explicit instruction regarding the more complex and global strategies that expert studiers deploy flexibly. As one example, analyses of the comprehension process have identified an impressive array of such tactics, ones that are acquired by extremely capable students in the absence of explicit instruction. These activities, such as summarizing and paraphrasing what one has just read, anticipating the author’s argument, and so forth, allow students both to extend their comprehension and to monitor its progress. (If one cannot summarize or if anticipations are disproved, this is evidence that comprehension is not occurring.) However, there is by now considerable evidence that less capable students do not acquire a variety of such cognitive strategies unless they are given detailed and explicit instruction in their use (e.g., Campione, Brown, & Ferrara, 1982; Rohwer, 1973). It is also true that the more complex the strategy in question, the more explicit the instruction needed, even for more capable students (Brown, Bransford, Ferrara, & Campione, 1983; Day, 1986). The idea that complex problem-solving strategies will emerge from instruction aimed at instilling their constituent subskills is difficult to defend. Worse,
even when practice in understanding is finally provided, frequently it too is treated as consisting of decomposed skills (summarizing, inferring, etc.). Such activities are presented as ends in themselves, rather than as a means to a more meaningful end. Little attention is paid to the flexible or opportunistic use of strategies in appropriate contexts. Unfortunately, this emphasis on skill training is particularly stressed for low achieving students, for whom explicit instruction in understanding is particularly necessary. Higher level strategies are rarely taught. Students perceived by their teachers as less capable are seldom asked to engage in sophisticated reasoning processes, but instead are required in the case of reading to concentrate on pronunciation rather than understanding a text (e.g., Collins, 1980), or doing simple computations rather than deploying new procedures (e.g., Petitto, 1985). Hence, weaker students seldom get to practice the higher level skills they are most unlikely to acquire spontaneously.
Traditional Instructional Programs: Consequences

Given this educational history, it is not surprising that students have difficulty understanding and orchestrating their own learning. Many students acquire a distorted view of what academic tasks are and hence come to believe that reading is decoding, that math is executing algorithms, that writing is neatness, and so forth. They come to view the syntax of the domain, rather than its semantics, as its core concept (Resnick, 1982). Given that many of the practices underlying these views are particularly emphasized for students having academic problems, it is not surprising that these students are most likely to have distorted views of the main goals of schooling. They suffer from problems in two main arenas. Their knowledge of the domains is faulty, and they experience particular difficulties attempting to monitor and regulate their online learning and problem-solving attempts. In contemporary parlance, they have problems with metacognition (Brown et al., 1983).

On the positive side, there is growing consensus that interactive learning environments, in which the goal is to enhance students’ conceptual understanding of the semantics, or the meaning, of procedures, produce more insightful intentional learners. One example of such an approach is reciprocal teaching of reading comprehension (Brown & Palincsar, 1982, in press; Palincsar & Brown, 1984), and other examples have been reviewed by Collins, Brown, and Newman (in press). Although we will not go into detail here, reciprocal teaching involves a guided cooperative learning environment in which students take an active part in discussions designed to improve their reading comprehension. As we indicate below, that involvement affords teachers the opportunity to do on-line evaluation of individual student competence and to provide tailored instruction designed to enhance that competence.
Traditional Assessment Practices: Criticisms

Students are subjected to a variety of tests during their school careers. As a general cut, we might distinguish between ability or intelligence tests on the one hand and content area (reading, mathematics, social studies, etc.) tests on the other. These tests are designed to play different roles and to provide different kinds of information. Individual students, particularly those who are candidates for special education programs, are most directly affected by intelligence and ability tests, but all students are affected, at least indirectly, by content area evaluations.

The tests also lead to different kinds of problems and different types of criticisms. For example, intelligence tests are criticized because they may give a distorted view of individual learners, area tests because they lead to a slanted view of academic domains. In some cases, the tests are criticized from opposite ends of some continuum. Intelligence tests are criticized because they do not influence instruction, content area tests because they are overly influential. Intelligence tests are challenged because they may underestimate the potential of some students, area tests because they may paint too optimistic a picture of student progress. Although the criticisms differ, it is our view that the underlying causes are the same in both cases. These include a reliance on static, product-based tests, inappropriate levels of description, and the decontextualized nature of the evaluations. In the next sections, we consider these factors and their impact on the different tests.
Ability and Intelligence Tests Ability and intelligence tests have been criticized in terms of both their predictive and prescriptive properties. Their larger success is in terms of identifying students likely to have particular problems dealing with school learning. Correlations between intelligence test scores and scholastic achievement are consistently high, not surprising as they were developed with that criterion in mind. However, even in the case of prediction, their success is limited, notably when children of poverty are involved. In the case of providing profiles of abilities that can be used to design interventions tailored to the needs of particular students or groups of students, they have been considerably less successful. And finally, they provide a pessimistic view of students who perform poorly. Elsewhere (Campione & Brown, 1987) we have reviewed some hypotheses that might account for the limitations. Static, Product-based Evaluation. Standard ability tests are geared to establishing students’ current levels of performance but yield no direct evidence about the processes that underlie that competence. They may tell us where someone is at a given point in time, but not how that person got there. In this sense, they provide at best a partial picture of student capabilities, a point
made nicely by Vygotsky (1978), who, in his discussion of the zone of proximal development, noted that static tests do not provide information about those functions that have not yet matured but are in the process of maturation, functions that will mature tomorrow but are in the embryonic stage. These functions could be called the “buds” or “flowers” rather than the fruits of development. The actual developmental level characterizes mental development retrospectively, while the zone of proximal development characterizes mental development prospectively. (pp. 86–87)
Vygotsky’s notion is of a testing environment, incorporating some kind of social support, that will create a zone of proximal development in which students will be able to demonstrate the embryonic skills not tapped by static test procedures. In his view, it is the observation of these nascent skills that provides better estimates of an individual’s potential for proceeding beyond current competence. Without such information, the likelihood of misclassifying students is increased. Particularly liable to be misclassified are students who have not had the opportunity to acquire the skills and knowledge assessed on standard tests (Campione, Brown, & Ferrara, 1982; Feuerstein, 1979; Vygotsky, 1978). In addition, without any way of articulating the processes that may have operated, or failed to operate, to produce a given level of performance, it is not possible to determine how to devise an intervention to improve that performance. Level of Description and Degree of Contextualization. Although standard ability tests are product based, they are nonetheless frequently interpreted in terms of sets of psychological processes. It is these inferred processes that are sometimes the basis for intervention attempts (Brown & Campione, 1986a). The problem is that the identified processes tend to be vague abstractions, drawn from a particular psychological theory, which are not readily relatable to performance on school tasks such as reading or mathematical problem solving. Hence, there are no specific suggestions for dealing with the student who is having trouble reading and/or doing mathematics. This is not to say that nothing is done; the question is whether it is appropriate. The processes that emerge from standard ability tests are typically assumed to be quite general ones that operate in many, if not all, academic domains. The belief is in the centrality of general, decontextualized reasoning skills.
This invites the conclusion that instruction aimed directly at those processes will have widespread effects throughout the curriculum – the leap to instruction (Brown & Campione, 1986a). If the analysis were correct, of course, enhancing those skills would represent an efficient way to remedy simultaneously a number of academic difficulties. However, this also leads to an approach that can displace alternatives aimed at teaching more domain-specific skills and competencies – students in resource rooms
may practice their auditory sequencing skills, rather than skills associated with, say, reading comprehension. Static Nature of Evaluation. Another concern with ability tests stems from the conclusions that tend to be drawn. The result of assessment is frequently taken as providing a relatively permanent characterization of the individual in question. The classifications that result, already presumed to reflect general intellectual ability, are further regarded as fixed and unlikely to change. These expectations free teachers and schools from some of the responsibility for effective remediation; they also have a long history: I have always believed that intelligence can, to some extent, be taught, can be improved in every child, and I deplore the pessimism that this question often evokes. There is a frequent prejudice against the educability of intelligence. The familiar proverb which says, “When we are stupid, it is for a long time,” seems to be taken for granted by unscrupulous teachers. They are indifferent to children lacking intelligence; they don’t have any sympathy for them, or even respect, for their intemperance of language is such that they would say “this is a child who will never do anything … he is not gifted, not intelligent at all.” I have too often heard this uncautious language. I remember that during my Baccalaureate exam, one examiner, horrified by one of my answers, declared that I will never have the mind for philosophy. Never! What a big word! Some contemporary philosophers seem to have given their moral support to such lamentable verdicts, asserting that intelligence is a fixed quantity, a quantity which cannot be increased. We must protest and react against this brutal pessimism and show that it has no foundation. If it were not possible to change intelligence, why measure it in the first place? “Après le mal, le remède.” Diagnosis is crucial but remedy must follow. (Binet, 1909, from Brown, 1985, translation.)
Despite Binet’s impassioned plea, the view of intelligence as fixed and immutable continues to be held by many.
Content Area Tests Content area tests differ from ability tests most obviously in the specificity of their content; they are geared toward particular academic domains. While the specific criticisms differ, their source remains the same. Static, Product-based Evaluation. As with ability tests, a major criticism of content area tests includes the point that by resting on a purely product-based assessment approach, they are silent on the processes involved in the acquisition of those products. There are several paths to getting a correct answer on a test, and unless those can be distinguished, the assessment has the potential for providing misleading information. In contrast to the
situation with ability tests, where the concern is that some students may get an item wrong for the wrong reason, thereby underestimating the competence of an individual, with content tests, the greater concern is that some get the item right for the wrong reason. There is by now good reason to believe that typical content area tests can provide a distorted view of progress by overestimating the capabilities of many students. Further, by so doing they reinforce some of the negative features of traditional educational practice that we reviewed earlier. Interestingly, a major mechanism for these effects is the argument that, in contrast to having no effect on instruction, content area testing drives instruction. Students, teachers, and school districts are evaluated against performance on standardized tests, and considerable time is spent preparing students to take those tests. The items tested are in good part those that are taught, and what is taught helps to define, for students and teachers, the nature of the domain in question. Level of Description and Degree of Contextualization. If ability tests emphasize general, global processes, area tests go to the other extreme. They focus on the many subskills assumed to be involved in effective performance within the domain. The questions are whether (a) the skills as tested are in fact relevant to the actual domain and (b) if they are, whether students who can execute those skills as tested can actually use them in the service of the larger tasks of which they are a component. Consider first reading assessment. In line with the notion that basic skills should precede higher level skills, as outlined above, tests at the earlier grades are heavily oriented toward phonics, as compared with comprehension, items. It is not until around fourth grade that this bias begins to change, and it is around this time that weaker students’ performance begins to look particularly poor. 
Students with comprehension problems can do reasonably well on reading tests until this time, and it is not until around the fourth-grade level that their performance begins to diverge rapidly from that of more capable readers. If students do poorly on phonics tests in the early grades, they are given additional practice aimed at phonics skills, with an attendant further reduction in instruction having to do with comprehension. Even at the upper grade levels, test structures continue to reinforce the subskills emphasis. The items designed to evaluate comprehension tend to tap skills in settings divorced from actually reading and understanding large segments of text. That is, the activities are tested as ends in themselves, rather than as means to the end of understanding what is being read. If reading comprehension is assumed to involve acts of finding the main idea, sequencing thoughts and actions, relating causes and consequences, summarizing, and so forth, then those skills appear on standardized tests. However, they appear as isolated activities, often in forms that are not recognizably related to normal reading. A brief, say three-sentence, paragraph is given, and the students’ task is to identify the topic sentence – or
they are asked to select an appropriate title from a set of nominees. It is not that these abilities are themselves unimportant; the objection is that the description of reading ability that results is in terms of performance on a large number of discrete subskills. Even if (some of the) activities are involved in effective reading comprehension, what is important is the reader’s ability to combine and deploy them opportunistically as needed in response to different goals of reading, different text structures, and so forth. And these executive, metacognitive skills are not evaluated. Testing is divorced from the context of reading large segments of text for meaning. Students are asked to perform on items similar to those that appear on worksheets associated with basal reader series; and it is quite possible to master those exercises without actually being able to read with understanding. Stated in another fashion, students are tested on their ability to perform the requisite activities, but are not tested on their understanding of those activities, for example, in terms of when or why they would be appropriate adjuncts to learning. Finally, students perceived to be doing poorly, in part by virtue of their performance on standardized tests, receive more drill and practice on the subskills appearing on those tests. And instruction aimed at more global comprehension skills is further delayed and reduced. Similarly, mathematics evaluations are based on static tests assessing students’ ability to run off algorithms, to solve problems displayed in a recognizable format, and so forth. Evaluations do not tap the extent of students’ understanding of the procedures they are asked to execute. There is a tendency to assume that children who get the right answers know what they are doing, and those who fail do not.
In addition, it is assumed that what a child does now on a test is a reasonable reflection of his or her knowledge, and that knowledge predicts or is equivalent to readiness to learn. It is those assumptions with which proponents of dynamic assessment quarrel. Students can be taught to run off algorithms in a purely rote or mechanical fashion, or they can be led to understand the rationale underlying those algorithms and hence something about the mathematical principles exemplified in them. The tests that are used to evaluate students clearly do not distinguish different possible paths to a correct answer. The major problem with standard static tests featuring problems presented in canonical form is that getting the right answer does not necessarily indicate that a child knows what he or she is doing (Erlwanger, 1973). For example, Peck, Jenks, and Connell (in press) interviewed fourth through sixth graders who had just taken a standardized math test used by their school district for placement in appropriate instructional groups. On the basis of the interviews, they found four types of students – two of the categories are those that tests are meant to separate: (1) those who got the answers right and knew why and (2) those who produced incorrect answers and did not know why. But there were large numbers of students who fell into the other two classifications: (3) those who were right but who did not understand what they were
doing and (4) those who were wrong but who did show evidence of understanding. On the basis of these interviews, Peck et al. reported that 41% of the students were inappropriately placed. Of the latter two groups, Group 3 is the more interesting. Group 4 consisted of children who are scored wrong primarily because they did not conform to the strict rules of the game ($\tfrac{1}{2}$ is correct, $\tfrac{3}{6}$ is incorrect). Of more interest are students who appeared to have worked out the problem correctly, for example,

$$3\tfrac{1}{3} - 2\tfrac{5}{6} = \tfrac{10}{3} - \tfrac{17}{6} = \tfrac{20}{6} - \tfrac{17}{6} = \tfrac{3}{6} = \tfrac{1}{2}$$
This child is in control of the algorithm. However, what happens if he is asked to discuss the answer a little, for example, by being asked if $\tfrac{1}{2}$ or $\tfrac{3}{6}$ is larger? This student insisted that $\tfrac{1}{2}$ is larger because “the denominator of $\tfrac{1}{2}$ is smaller, so the pieces are larger and one of the great big pieces ($\tfrac{1}{2}$) is more than 3 of the tiny (sixth) pieces.” Having been suitably confused, the student had difficulty reworking the problem. This student, like a significant number of his peers, recognizes the problem type when presented in canonical form and can run off the algorithm correctly. However, he cannot resist countersuggestions because he does not have a firm grip on the meaning of what he is doing. Finally, as with reading, poor performance on mathematics tests frequently results in more practice aimed at perfecting algorithm use, again reducing the degree of attention aimed at understanding the underlying principles.
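The arithmetic in this example can be checked mechanically. The following is a minimal sketch in Python, using the standard library's exact rational type; the helper name `mixed` is ours and is introduced only for illustration. It reproduces the student's subtraction and shows the point the student missed: three sixths and one half are the same quantity.

```python
from fractions import Fraction


def mixed(whole, numerator, denominator):
    """Convert a mixed number such as 3 1/3 into a single exact fraction."""
    return Fraction(whole) + Fraction(numerator, denominator)


minuend = mixed(3, 1, 3)       # 3 1/3 as an improper fraction
subtrahend = mixed(2, 5, 6)    # 2 5/6 as an improper fraction
difference = minuend - subtrahend

# Fraction reduces to lowest terms on construction, so 3/6 and 1/2
# compare as equal: neither is "larger" than the other.
three_sixths = Fraction(3, 6)
one_half = Fraction(1, 2)

print(minuend, subtrahend, difference)
print(three_sixths == one_half)
```

A student who can run the algorithm but believes one half exceeds three sixths would agree with the first printed line and disagree with the second, which is exactly the gap between executing a procedure and understanding it.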
Traditional Assessment Practices: Consequences By way of summary, we can point to several consequences of the structure and interpretation of ability and intelligence tests, on the one hand, and content area tests, on the other. Ability tests tend to be applied on a relatively limited basis. They affect directly individual students, or small groups of students, with their main role being to contribute to the classification and placement of students with special needs. In the case of content area tests, the situation is different in a number of ways. These tests are applied to all students and affect many, if not all, of them, at least indirectly. Because of mass testing, and the resultant use of, for example, matrix sampling procedures, little attention is (or can be) paid to the performance of individual students; attention is focused on the mean performance of groups of students (a school, a district, a state, etc.). The problems that result in each case, although different in many ways, result from the same general set of features.
Static, Product-based Evaluation. The emphasis on product, as opposed to process, information poses difficulties of interpretation in both cases. In the case of intelligence tests, if we assume that some testees may not have had the opportunity to acquire the knowledge and skills being assessed, there is considerable potential for underestimating those students’ ability levels, leading to misclassification and mislabeling. In the case of content area tests, the failure to evaluate the reasoning underlying student responses can result in the opposite problem. Students can arrive at the correct answers for the wrong reasons, often having no real understanding of the operations they carry out. In either case, the emphasis on product information results in the tendency on the part of some to assume that possessing the product information is equivalent to competence within the domain. Hence, teaching the specific information (knowledge) or skills (algorithms) contained in the tests is seen as the way to enhance competence – teaching digit span increases intelligence, providing historical facts makes one culturally literate (Hirsch, 1987), and so forth. Level of Description and Degree of Contextualization. We would argue that the levels of description associated with the two sets of tests are inappropriate, and again for opposing reasons. In the case of ability tests, there is an emphasis on quite general skills. Although there have been many debates about the existence of a general factor underlying intelligence, along with arguments about the needs for postulating specific factors, there is a general consensus that the number of factors is not very large. Those factors are then each seen to play a role in many aspects of intellectual performance. Individuals who perform poorly are seen as deficient in the operation of these general capabilities. 
The result of this view is that intelligence tests can provide a pessimistic, and frequently misleading, picture of labeled students, emphasizing that (a) their “disabilities” are in fairly general processing capabilities that can have widespread effects on a variety of academic tasks, and that (b) their potential for change is restricted. The situation with content area tests contrasts sharply; there the items correspond to extremely specific and very narrowly defined subskills. By concentrating on subskills tested in canonical settings, content area tests contribute to a distorted view of the different academic domains (they reinforce the view that reading is decoding and that mathematics is the ability to execute familiar algorithms) and of the nature of delay within those domains (due to incomplete control of basic skills). The structure of content area tests does influence instruction. Preparing groups of students to take and do well on such exams results in precisely the conditions we criticized about instruction in the earlier portion of this paper. Subskills are practiced, the curriculum script (geared to the standardized tests) is followed, little on-line diagnosis of individual student progress is
made, and so forth. Given this match between instruction and testing, it is likely that performance on those tests may overestimate the ability of many students. This elevated evaluation can then serve to indicate that the task of educating students is proceeding better than is actually the case.
Assisted Assessment: An Alternative Approach It is in response to some of these concerns that we and a number of other investigators have turned to alternative methods of assessment. Dynamic assessment, a term initially used by Feuerstein (1979), is the general term used to encompass a number of distinct approaches (see Lidz, 1987, for an overview). Others have used different descriptors, Budoff (1974, 1987a, 1987b) referring to learning potential assessment, Carlson (Carlson & Weidl, 1978, 1979) to testing the limits approaches, Bransford and his colleagues (Bransford, Delclos, Vye, Burns, & Hasselbring, 1987) to mediated assessment, Vygotsky (1978) to evaluation of the zone of proximal development, and we and our colleagues (e.g., Campione & Brown, in press) to assessment via assisted learning and transfer. The common feature is an emphasis on evaluating the psychological processes involved in learning and change. This is seen to contrast with standard methods of assessment that rely on product information. The argument is that individuals with comparable scores on static tests may have taken different paths to those scores, and that consideration of those differences can provide information of additional diagnostic value. The clearest example, of course, is of individuals who have not experienced a full range of the opportunities needed to acquire the skills or information being tested. And it is in this context – testing children from atypical backgrounds or children with school-related problems – that Budoff (1974), Feuerstein (1979), and Vygotsky (1978) did their early work. It was also with these populations that our early work leading to a concern with assessment and instruction was conducted (e.g., Brown, 1974, 1978; Campione & Brown, 1977, 1978; Campione et al., 1982). The methods different workers have developed vary considerably and reflect different goals.
At one level, the concern may be with increasing the predictive validity of the assessment procedure. Or it may be with informing instruction. Although everyone aspires to both, there are trade-offs that result in some procedures being more appropriate in one case than the other. Across approaches, however, the common feature is an emphasis on individuals’ potential for change. In our own work, we have considered several goals of dynamic assessment. The result is a set of approaches to dynamic assessment that lend themselves to different sets of issues. We have been concerned with attempts to improve both prediction and instruction. This schizophrenic attack is not unique to us, but rather can be seen to characterize the field in general.
Before reviewing our programs of research, we will outline a rough taxonomy of approaches to dynamic assessment that have been proposed, and indicate some others that have not yet appeared.
A Proposed Taxonomy Three general dimensions have been considered. The first, which we refer to as focus, looks at the competing ways in which potential for change can be assessed. One is by observing the actual improvement that takes place following some intervention. An alternative is to try to specify the processes that underlie any improvement and to assess the operation of those processes directly. By interaction, we mean the nature of the social interaction involving the examiner and subject. That interaction can be conducted in either a standardized or clinical fashion. And finally, by target, we indicate that dynamic assessment attempts can be geared to either general or domain-specific skills. Focus. There are two general ways in which to evaluate potential for change and hence determine whether that revealed potential has any diagnostic value. The most common is via some form of test-train-test procedure. Students take a particular test, are given some practice and/or instruction on typical test items, and then a posttest. A number of scores result from this sequence. First, there is the pretest score, that which would normally be used for purposes of prediction or classification. There are, in addition, data available from the instructional interaction itself. Next, there is the change score, or how much improvement took place from the pre- to the posttest. Finally, there is the posttest score itself. Each of these is then a candidate predictor score, and the empirical question concerns which is the most useful. This is the general procedure used by, among others, Budoff (e.g., 1974), Vygotsky (see Brown & French, 1979), Campione and Brown (1984, 1987, in press), Carlson and his colleagues (e.g., Carlson & Weidl, 1978, 1979), and Embretson (1987). Below, we indicate some of the differences among these researchers. 
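The scores that fall out of a test-train-test sequence can be made concrete with a small sketch. The code below is illustrative only: the class name `SessionRecord`, the function `candidate_predictors`, and the sample numbers are hypothetical and do not come from any of the studies cited. It simply shows that the pretest, change, and posttest scores are three separate candidate predictors derived from the same two test administrations.

```python
from dataclasses import dataclass


@dataclass
class SessionRecord:
    """One student's results from a test-train-test sequence (hypothetical)."""
    student: str
    pretest: int    # score before any practice or instruction
    posttest: int   # score after the training phase

    @property
    def change(self) -> int:
        """Improvement from pretest to posttest."""
        return self.posttest - self.pretest


def candidate_predictors(record: SessionRecord) -> dict:
    """Collect the scores that could each serve as a predictor."""
    return {
        "pretest": record.pretest,
        "change": record.change,
        "posttest": record.posttest,
    }


# Two made-up students with the same pretest but different responses to training.
a = SessionRecord("A", pretest=10, posttest=16)
b = SessionRecord("B", pretest=10, posttest=11)
print(candidate_predictors(a))
print(candidate_predictors(b))
```

The two invented students are indistinguishable on the static pretest yet differ on the change and posttest scores, which is the kind of additional information the test-train-test procedure is meant to expose. Which of the three scores predicts best remains, as the text notes, an empirical question.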
The alternative approach to evaluating potential for change is to attempt to specify the processes involved in change in more detail and to evaluate their operation. That is, rather than looking at how much learning takes place in a given situation, we can try to specify the skills that underlie learning and evaluate them directly. The goal here is to end up with a description of individuals that reveals which skills are functioning smoothly and which are not being appropriately used. This profile can then be used as the basis for designing enrichment activities. In this approach, pre- and posttests are not necessary, although they could be incorporated. Interaction. For some, the goal is to devise a standardized protocol to govern the provision of help to students during the training portion of
the intervention. That is, given a test-train-test sequence, the training experiences should be as standardized and consistent across students as possible given the restriction that it is impossible to completely standardize any social interaction. Others make the decision to resort to an unstructured, clinical interview approach in which the examiner is given considerable latitude in selection of both the tasks to be presented and the way in which he or she responds to the testee’s statements and actions. For those who choose to standardize the procedure (e.g., Budoff, 1987a, 1987b; Campione & Brown, 1987, in press; Carlson & Weidl, 1978, 1979; Embretson, 1987), the goal is to generate psychometrically defensible quantitative data that can be used for purposes of description and classification. Even here, however, there are differences. Budoff is concerned primarily with a gain score, the difference between the initial and final test performance (taking into account the pretest score). His argument is that gainer status will provide diagnostic information beyond that afforded by the pretest or other standardized scores, and he provides evidence in support of this contention (e.g., Budoff, 1987a). Carlson and his colleagues, as well as Embretson, focus more directly on the posttest score. Their belief is that the intervention provided between testing sessions will, for different sets of reasons, improve performance by minimizing the effects of extraneous (motivational, misunderstanding of instruction, etc.) factors that artificially reduce performance. The higher, less contaminated, levels of performance obtained on the final test should then provide more accurate characterizations of subjects, and should result in greater predictive validity; again, there is evidence to support this claim. In our own work (Campione & Brown, 1974, 1987, in press), we have taken still another approach. Our concern has been with measuring learning and transfer efficiency.
Students are asked to learn new rules or principles, and we provide titrated instruction, beginning with weak, general hints and proceeding through much more detailed instruction. We ask, not how much improvement takes place, but rather how much help students need to reach a specified criterion with regard to rule use, and then how much additional help they need to begin to transfer those rules to novel situations. The idea is that these learning and transfer scores will provide more information about individuals than competing static scores, for example, general intelligence or entering competence. Below, we review some of that work. In designing the interaction between examiner and student, the alternative method, championed by Feuerstein (1979) and the group at Vanderbilt (e.g., Bransford et al., 1987; Burns, 1985; Vye, Burns, Delclos, & Bransford, 1987), is to resort to a more clinical evaluation. The argument is that the most sensitive assessments result when the examiner, rather than being restricted by a standardized protocol, has the flexibility to follow promising leads as they arise during the interview. A crucial feature of the interview is the ability of the examiner to take advantage of cues provided by the student
and use them as an opportunity to probe in more detail an individual’s strengths and weaknesses. The goal here, rather than generating quantitatively useful data, is to generate a rich clinical picture of an individual learner, one that can be used to guide remediation attempts. It is also argued that this is an efficient way to merge assessment and instruction. When the examiner perceives a problem, he or she can select examples and provide instruction designed to clarify misconceptions or show the student how to improve his or her reasoning. Target. The goal of assessment can be either evaluation of relatively general or domain-specific skills and processes. Some (e.g., Feuerstein, 1979) have concentrated on general skills (deficient cognitive functions, in Feuerstein’s terminology), whereas others (e.g., Brown, Campione, Reeve, Ferrara, & Palincsar, in press; Ferrara, 1987) have been concerned with assessments situated within a particular content area. The competing assumptions are in terms of whether one wishes to assess, and eventually modify, problems associated with intelligence, or problems in more restricted cognitive skills (Brown & Campione, 1986b).
Examples and Evaluations Although it is impossible to go into detail here, we would like to illustrate some of the contrasting approaches that have been taken, consider the goals encompassed in each, and review their strengths and weaknesses (see Brown, Campione, & Webber, in preparation).
1. Standardized Interventions/General Skills Efforts in this category are concerned primarily with devising methods to increase the predictive validity of the assessment process. The general approach is to work within the context of tests of fairly general abilities, and proceed on the assumption that assessments conducted at this level will provide information about individuals across a number of situations. The work of Budoff (1974, 1987a, 1987b) provides a good starting point. His approach involves a pre-post design with a standardized instructional component interspersed. His goal is to assess a general learning potential possessed by some educable retarded children that was not tapped by standard static tests. This potential is defined by the pre- to post-gain score. Budoff originally distinguished gainers, those who show a marked pre- to posttest gain; non-gainers, those who improve little; and high scorers, those who did well on the pretest. More recently, Budoff (1987b) has refined his scoring system to take pretest scores more fully into account, moving to a continuum of gain status rather than a tripartite distinction.
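The original tripartite distinction can be sketched as a simple decision rule. The function below is a rough illustration under stated assumptions: the name `gain_status` and both cutoff values are hypothetical, chosen for a made-up 20-item test, and are not Budoff's actual scoring criteria, which the text does not specify.

```python
def gain_status(pretest, posttest, high_cutoff=16, gain_cutoff=4):
    """Classify a student in the spirit of the gainer / non-gainer / high scorer split.

    high_cutoff and gain_cutoff are hypothetical thresholds on a 20-item test.
    """
    if pretest >= high_cutoff:
        # Did well before any instruction; little room left to show gain.
        return "high scorer"
    if posttest - pretest >= gain_cutoff:
        # Marked improvement following the standardized instruction.
        return "gainer"
    # Started low and improved little.
    return "non-gainer"


for pre, post in [(17, 18), (6, 12), (6, 7)]:
    print(pre, post, gain_status(pre, post))
```

Checking the pretest first is one way to keep high scorers from being labeled non-gainers merely because they had little room to improve. The later move to a continuum of gain status would replace these hard cutoffs with a graded score that adjusts for the pretest.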
The main issue concerns the extent to which gainer status provides useful diagnostic and predictive information, and it does. For example, middle class children in special education classes tend to be nongainers, whereas lower class children have a high incidence of gainers. Learning potential status also predicts performance on a variety of laboratory concept-learning tasks and a specially constructed math curriculum. It also is related to successful adaptation to mainstreaming, the ability to find and hold jobs during adolescence, and a number of positive personality characteristics. Carlson and his colleagues (e.g., Carlson & Weidl, 1978, 1979) and Embretson (1987) offer a related approach. Carlson and Weidl’s method, referred to as testing the limits or the integration of specific interventions within the testing procedure, involves modifying the test context in a way designed to enhance performance. Again, they are concerned with standardized interventions designed to facilitate performance and provide a more sensitive index of a general intellectual capability. For example, groups of subjects may be given the Raven Progressive Matrices in the standard administration, or they may be required to verbalize the solution choice before seeing the alternatives or after making their choice, or they may simply be given feedback about the correctness of each choice. These modifications result in higher levels of performance on the Raven, increases that are seen to reflect modifications in subjects’ understanding of the task, their greater comfort or reduced anxiety in the testing situation, and so forth. The issue then concerns the predictive validity of the posttest scores, compared with the pretest scores. One expectation would be that the posttest results would be less useful, as the conditions under which the test norms were collected have been altered. However, Carlson and his colleagues report that the posttest scores are actually the more predictive. 
Interventions that facilitate the performance of individuals also lead to scores more clearly related to a number of criterion measures. Embretson (1987) employs a standardized instructional component intervening between the pre- and posttest. She gave one group of subjects (a) a test of spatial ability, (b) training on folding three-dimensional shapes, and then (c) a readministration of the spatial ability test. Her subjects improved from the first to the second administration, and she found that posttest performance provided a better prediction of text editing performance than did pretest performance. Some of our own work also fits into this category. We were concerned with a number of issues. The first was to test some hypotheses about the role of learning and transfer processes in students varying in scholastic performance. To do this, we had to devise measures of learning and transfer, and then investigate the concurrent and predictive validity of those scores. We worked with inductive reasoning tasks, such as letter series completion and Raven-type matrix problems. Subjects were given a series of static pretests, including tests of general ability (subscales from the Wechsler Intelligence
Salkind_Chapter 90.indd 336
9/4/2010 7:26:31 PM
Campione
Taxonomy of Approaches
337
Scale for Children-Revised [WISC-R] [Wechsler, 1974] and the Raven Coloured Progressive Matrices [Raven, 1956]) and a pretest on the test items. Following this they were given a series of learning and transfer sessions in which we assessed how much instruction they needed in order to learn to use a set of rules independently (learning score), and then how much instruction they needed before they could apply those rules in related but novel settings (transfer score). The addition of transfer probes was dictated by a number of considerations. For one, we have argued elsewhere (e.g., Campione & Brown, 1978; Campione et al., 1982) that transfer performance is highly related to academic success. In addition, we also believe, with Moore and Newell (1974), that appropriate and flexible use of a rule or principle is the hallmark of understanding that rule; we also believe that the ability to understand a principle is related to future success. Both of these points dictate that transfer performance be included as a component of assessment. Our first question was whether the learning and transfer scores would be related to general ability differences, and it turned out that they were (e.g., Campione, Brown, Ferrara, Jones, & Steinberg, 1985; Ferrara, Brown, & Campione, 1986). Lower ability children, as compared with higher ability children, required more instruction to learn a set of rules to some criterion and needed more help to come to apply those rules to novel problems. Further, as the number of features distinguishing the learning and transfer problems increased, the performance decrements of the lower ability students became progressively more pronounced. The second, and more crucial, issue concerned whether those scores would provide information beyond that obtainable from the static tests.
To evaluate that issue, Bryant, Brown, and Campione (1983) took as a criterion measure the subjects’ gains from the pretest to the posttest, and asked which scores or set of scores best predicted those gains. Across a number of studies, the results have been similar. If simple correlations are considered, the guided learning and transfer scores are the best individual predictors of gain; correlations involving static ability scores and gain average around .45, whereas the correlations between learning or transfer scores and gain are on the order of .60. And if static test scores are entered first into a hierarchical regression analysis, the dynamic scores added subsequently consistently account for significant additional variance (from 22% to 40%) in the gain scores. Also consistent across studies, the transfer scores always account for more variance than do the learning scores.

Critique. The strengths and weaknesses of these procedures are clear. There is little doubt that they can lead to more accurate prediction and classification of individual subjects. All of the attempts, which feature interventions designed to facilitate performance, succeed both in improving performance and in showing that the heightened scores can possess more predictive validity than the unaided static test scores. That is true
whether attention is focused on the gain from unaided to aided conditions (Budoff, 1974), posttest performance (Carlson et al., 1978; Embretson, 1987), or on assessments of ease of learning and transfer (Campione et al., in press). The most important feature to emerge from our work is the particular sensitivity of transfer processes. The major drawback to these approaches, in the context of assessment and instruction, is that they provide little information of direct use to teachers. We may be better able, through their use, to identify those likely to experience problems, but we cannot derive information to guide remediation attempts.
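The hierarchical regression logic used in these studies (enter the static scores first, then ask how much additional variance in gain the dynamic scores explain) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are randomly generated, the variable names are hypothetical, and the coefficients are arbitrary. It does not reproduce any of the reported analyses; it only shows how the increment in explained variance is computed.

```python
import numpy as np


def r_squared(predictors, outcome):
    """Proportion of outcome variance explained by a least-squares fit with intercept."""
    design = np.column_stack([np.ones(len(outcome)), predictors])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    residuals = outcome - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((outcome - outcome.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def incremental_r2(static_scores, dynamic_scores, gain):
    """Return R^2 for the static scores alone and the R^2 added by the dynamic scores."""
    r2_static = r_squared(static_scores, gain)
    r2_full = r_squared(np.column_stack([static_scores, dynamic_scores]), gain)
    return r2_static, r2_full - r2_static


# Synthetic, purely illustrative data (not taken from any study).
rng = np.random.default_rng(0)
n = 200
static_scores = rng.normal(size=(n, 2))   # hypothetical: ability test, pretest
dynamic_scores = rng.normal(size=(n, 2))  # hypothetical: learning hints, transfer hints
# Assumed relation: more transfer hints needed goes with smaller gain.
gain = 0.4 * static_scores[:, 0] - 0.6 * dynamic_scores[:, 1] + rng.normal(size=n)

r2_static, delta_r2 = incremental_r2(static_scores, dynamic_scores, gain)
print(f"static only: {r2_static:.3f}, added by dynamic scores: {delta_r2:.3f}")
```

Because the second model contains every predictor of the first, the increment cannot be negative; the substantive question is whether it is large enough to matter, which is what the 22% to 40% figures address.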
2. Clinical Interventions/General Skills

The goal of those in this camp, rather than improving prediction, is providing information about individuals that can inform instructional programs. The focus is more directly on sets of underlying cognitive processes that are presumed quite general, and the assessment is conducted in a clinical, opportunistic fashion that combines evaluation and instruction. In these applications, it is difficult to separate assessment and instruction. We would place Vygotsky (1978) and his treatment of the zone of proximal development in this category. However, the major figure here is clearly Feuerstein (1979). His Learning Potential Assessment Device (LPAD) has been developed in considerable detail, and the program is in wide use in the United States of America. Feuerstein’s stated goal is to evaluate individuals’ ability to profit from instruction, and toward this end the goal of the LPAD is to produce changes in fundamental cognitive processes and even to introduce new cognitive structures. He wishes to evaluate and remediate simultaneously. To accomplish this, Feuerstein eschews the use of standardized approaches and argues for a flexible, individualized, and highly interactive format. He views the examiner as a teacher/observer and the examinee as a learner/performer. Crucial is the role of affect. A neutral unresponsive stance from the examiner is seen to reinforce the examinee’s already negative self-feelings, and the provision of simple positive feedback is viewed as unlikely to be effective. Instead the examiner must function as a teacher, one who is responsive to the examinee in a multiplicity of ways – giving and requiring explanations, selecting examples (including repetition when necessary), summarizing progress, and so forth. The goal is not prediction of future behavior, but a statement of how modifiable the individual’s structures are and where the individual’s deficits lie.
This is accomplished through the mechanism of a cognitive map, composed of seven parameters (content, modality, phase, operations, level of complexity, level of abstraction, and level of efficiency) that pinpoints the directions that instruction should take. The ultimate test of Feuerstein’s approach rests in the improvements in academic performance that eventually result. Here the results are somewhat mixed, with some positive evidence being reported, along with some
less successful attempts (see Bransford, Stein, Arbitman-Smith, & Vye, 1985, for a review and analysis). Burns (1985), a member of the Vanderbilt group, has also employed this approach, using what she calls a mediated assessment procedure. This is in contrast to the graduated prompt procedure that we have used in some of our work (see previous section). A test-train-test paradigm is used. The defining feature of the mediated assessment approach is that the instruction that takes place is exactly that, intensive instruction in which the examiner does everything possible to teach the student how to solve the test problems. Emphasizing the relation between assessment and instruction, one criterion against which the mediated assessment procedure is evaluated is the examinees’ performance both on the types of items practiced during the assessment session and on transfer items. Assessment and instruction are so strongly linked in this approach that unless the instruction has been effective in producing change, the assessment is seen to have failed. Another aim of this approach is the development of individualized intervention programs based on the outcome of the assessment. Working with a stencil design task favored by Feuerstein, Vye et al. (1987) reported some success in this endeavor. Finally, one of the major criticisms of static tests, particularly those addressed at general intellectual functioning, is the view of the learner they generate. Low scorers are seen as having broad cognitive limitations and as unlikely to profit from standard instruction. Vye et al. (1987) also reported data indicating that teachers who observe dynamic testing sessions end up with a more optimistic view of the students than they do if they observe static testing. Critique. The goals of those who espouse this approach are more ambitious than in the previous case. And the criteria for their success, worthwhile improvements in academic performance, are more stringent. 
The main points to be considered in these programs are the same ones to which we have alluded throughout the paper: (a) the level of description or generality of the target processes and (b) the degree of contextualization of the instructed activities. Without going into detail, our main concern is that the emphasis on very general skills, particularly in Feuerstein’s case, leads to an enormous transfer problem. Assessment (and instruction) takes place with specially developed materials that intentionally bear little relation to school-like tasks, and it is quite possible for students to improve their ability to deal with those tasks and yet show no appreciable gains in the academic disciplines. The instruction is divorced from the actual contents and contexts of schooling. As Bransford et al. (1985) pointed out, many of the gains achieved by those who have participated in the Instrumental Enrichment program (Feuerstein, 1980) have been on standardized ability tests. The evidence that gains in reading, writing, and arithmetic will follow remains uncomfortably slim.
In addition, there is the question of how we can evaluate the claim that the assessment process results in the identification of an individually tailored remediation program. Vye et al. (1987) provide evidence that, following mediated dynamic assessment, different children were seen to have different types of problems, and instruction based on that information resulted in clearly improved performance, that is, the assessment procedure did seem to be sensitive to individual differences and did lead to an effective teaching approach. The problem is that we have no way of knowing whether an alternative instructional avenue would have produced equal or better learning. That is, all that can be clearly claimed is that a responsive clinician, familiar with the domain in question and working one on one with a student, can help that student learn. The assessment cannot demonstrate the efficacy of a particular approach compared to alternatives. Based on the reservations we have expressed thus far, the question is, what are the better alternatives? Our current bias is to situate assessment within particular academic domains. Whether the reliance is on a standardized approach to assessment or on a clinically based combination of assessment and instruction, it seems that we have come to know enough about basic academic subjects that we can design powerful techniques (Brown & Campione, 1986a) that can help overcome some of the limitations of traditional assessment and instruction. As examples, we offer two programs that have been developed in our laboratory.
3. Standardized Intervention/Domain-Specific Skills

In her PhD thesis, Ferrara (1987) extended our work on assessment via guided learning and transfer to the field of early mathematics. During the initial learning sessions, the student and tester worked collaboratively to solve problems that the student could not solve independently. The problems were simple addition problems with two single-digit addends, for example, 3 + 2 = ?, presented as word problems, such as:

Cookie Monster starts out with three cookies in his cookie jar, and I’m putting 2 more in the jar. Now how many cookies are there in the cookie jar?
When the student encountered difficulties, the tester provided a sequence of hints or suggestions about how he should proceed, and Ferrara measured the amount of aid needed to achieve this degree of competence, that is, how much help does the student need to master the specific procedures? Following this, Ferrara presented a variety of transfer problems in the same interactive, assisted format. These problems (see Ferrara, 1987, for details) required the student to apply the procedures learned originally to a variety of problems that differed in systematic ways from those worked on initially. Some were quite similar (near transfer: addition problems involving
new combinations of familiar quantities and different toy and character contexts); others more dissimilar (far transfer: 4 + 2 + 3 = ?); and some very different indeed (very far transfer: missing addend problems, 4 + ? = 6). What was scored was the amount of help students needed to solve these transfer problems on their own. The aim of the transfer sessions was to evaluate understanding of the learned procedures. That is, the goal was both to program transfer and to use the flexible application of routines in novel contexts as the measure of understanding. Recall that one of our concerns about standard assessment procedures is their insensitivity to students’ understanding of the routines they can apparently execute – many students get the right answer but for the wrong reasons. Inclusion of a transfer component in the assessment is designed to distinguish students who can use only what they were taught originally from those who, because they understand, can go beyond the specific problem types they have practiced and apply their routines flexibly. After these learning and transfer sessions were completed, a posttest was given to determine how much the student had learned during the course of the assessment/instruction, the gain from pre- to posttest. The first finding was that the dynamic scores were better predictors of gain (mean correlation = −.57) than were the static knowledge and ability scores (mean correlation = .38). Further, in a hierarchical regression analysis, although the static scores when entered first did account for 22.2% of the variance in gain scores, addition of the dynamic scores accounted for an additional 33.7% of the variance, with transfer performance doing the majority of the work; it accounted for 32% of the variance.

Critique. The clearest conclusion that can be drawn is that this work reinforces the view that dynamic assessment procedures can be used to improve prediction.
The learning, and in particular transfer, scores were more strongly associated with gain than were the competing static scores, and they also accounted for variance in gain even when the effects of the static scores were removed. But we think there is more to say. There is also an important sense in which this effort combines assessment with instruction. In addition to helping predict how well individuals may do in some domain, it is highly desirable that the assessment process produce some payoff in terms of contributing directly to the instructional process. In one sense, the approach we have taken does this automatically – instruction is an integral part of assessment. While students are being evaluated, they are also being taught something about the domain in question. Further, the assessment involved not testing what students have already been taught, but skills that they have clearly not as yet mastered; they were asked to learn to solve problems that they could not solve at the outset. And they showed significant improvement in their ability to do just that. In the ideal case, the hints that are given are based on a detailed task analysis of the components of competence within the
domain; as such, they provide a model of how one should proceed to solve the problems presented. If the hints are internalized to some degree, the subjects will have acquired relevant skills. That learning does take place is clear enough – in all the studies we have conducted, subjects have shown large gains from pretest to posttest. Suggested Directions for Research. From our perspective, an ideal way of integrating assessment and instruction would involve the interspersing of dynamic assessment sessions with regular instructional sessions. The assessment segments would provide current information on how quickly individuals were able to acquire and use new skills, as well as helping teach those skills. Although the sessions are time-consuming, the fact that the hints are preprogrammed makes it feasible to carry them out on a computer, and we have in fact done that successfully in our own work. As a result, the assessments could be done without taking up teacher time. The main point is that checking regularly for students’ ability to use new resources would reduce the likelihood that they are acquiring progressive bits of knowledge that remain encapsulated and relatively inaccessible. If tests of current competence are solely in terms of the extent to which particular routines have been mastered and have become usable within familiar contexts, it is easily possible that some students will have acquired a repository of inert facts and procedures. In addition to signaling that some student may be in particular difficulty, it is also desirable that the assessment process provide specific information about the kinds of help that individuals may need to advance more quickly. One approach is to develop sequences of prompts that can be organized qualitatively. In that way, in addition to determining the number of hints individuals require, information about the specific kinds of hints they need would also be available. 
This information could then be used to devise more specific remedial instruction. For example, Ferrara’s hints included simple negative feedback (giving an opportunity for subjects to correct their initial response), verbal memory aids (reminders of the quantities involved in the problem), concrete memory aids, scaffolding (supportive prompts designed to help the child structure the problem), strategy suggestions, and so forth. Although we have not run enough subjects as yet to know if the additional information will be helpful, this approach is one we are currently pursuing. If such procedures could be devised, they would represent a way of combining the best features of the standardized (quantitative data) and clinical (qualitative descriptions) assessment procedures in a single package.
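The idea of recording both how many hints a student needs and which kinds of hints they are can be sketched as follows. This is a hypothetical illustration only: the hint labels paraphrase the categories listed above, their fixed ordering is an assumption, and the `administer` function simulates a student by taking the number of hints needed as an input rather than observing real responses. It is not Ferrara’s procedure or the computerized version mentioned earlier.

```python
from collections import Counter
from dataclasses import dataclass, field

# Assumed ordering from least to most explicit help (labels are illustrative).
HINT_SEQUENCE = [
    "negative_feedback",
    "verbal_memory_aid",
    "concrete_memory_aid",
    "scaffolding",
    "strategy_suggestion",
]


@dataclass
class ProblemRecord:
    problem: str
    hints_given: list = field(default_factory=list)


def administer(problem, hints_needed):
    """Record the preprogrammed hints given until the simulated student succeeds.

    hints_needed stands in for the student's real behavior (an assumption).
    """
    record = ProblemRecord(problem)
    for hint in HINT_SEQUENCE[:hints_needed]:
        record.hints_given.append(hint)
    return record


def summarize(records):
    """Return the quantitative score (total hints) and the qualitative profile (by type)."""
    total = sum(len(r.hints_given) for r in records)
    by_type = Counter(h for r in records for h in r.hints_given)
    return total, by_type


records = [administer("3 + 2 = ?", 2), administer("4 + ? = 6", 4)]
total_hints, hint_profile = summarize(records)
print(total_hints, dict(hint_profile))
```

The total serves as the standardized, quantitative score, while the per-type tally is the kind of qualitative information that might point toward specific remedial instruction.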
4. Clinical Assessment/Domain-Specific Skills

Our final example is the reciprocal teaching of reading and listening comprehension skills (e.g., Brown & Palincsar, in press; Palincsar & Brown, 1984).
The reason for setting the approach within a specific domain is our feeling that it then becomes possible to select target skills to be taught and evaluated that are at an appropriate level of analysis – general enough to be applicable in many situations but powerful enough that they make clear contributions to learning. In the reciprocal teaching application, these guiding skills must also be concrete enough that they can be used to structure a discussion. Working within a domain also makes it considerably easier to contextualize the instruction in a reasonable way; the processes being honed are always practiced in the actual context of the academic task in which we are interested. The social nature of the procedure also makes it possible, as will become clear, to eliminate the need for reliance on subskills that characterizes many other approaches, again a way of increasing the degree of contextualization of the instruction. Reciprocal teaching takes place in a cooperative learning group that features guided practice in applying simple concrete strategies to the task of text comprehension. A teacher and a group of students take turns leading a discussion concerning a segment of text they are jointly trying to understand. The dialogues are organized around four main comprehension-fostering and comprehension-monitoring activities: questioning, summarizing, predicting, and clarifying. These activities were chosen because they are known to facilitate comprehension and because they are used by skilled, but not unskilled, readers. The goal of the enterprise is to have the students become independent readers who use a variety of comprehension strategies opportunistically to aid their comprehension of texts. The task in each dialogue is joint construction of meaning. 
The strategies provide concrete heuristics for getting the discussion moving, teacher modeling provides examples of expert performance, and the reciprocal nature of the procedure guarantees student involvement. The approach embodies five central principles:

1. When leading the discussion, the teacher actively models the target comprehension activities, making them explicit and overt.

2. The strategies are always modeled and practiced in the actual context of constructing meaning from text, never as isolated skills. They are applied as needed to the task of understanding relatively extended segments of text. Although at the outset, individual students are not proficient at this overall activity, the social support provided by the teacher and the rest of the group makes the task a manageable one. It is this social feature of guided cooperative learning (Brown & Palincsar, in press) that makes it possible to practice nascent complex activities in an appropriate context, rather than as decontextualized isolated subskills.

3. Students are made aware of the nature of the strategies and when and how they are to be applied. There is little chance that they will fail to understand the significance of those activities given the explanations offered by teachers and the fact that the activities are used exclusively in the context of reading for meaning.
4. Central to our discussion here, the procedure forces each student to lead some of the discussions. In this role, students make their own level of competence apparent. Consequently, the teacher can engage in the kind of on-line diagnosis that is absent from traditional instruction, and provide instruction geared to the level of that student at that time.

5. Responsibility for the activities is transferred to the students as quickly as possible. As a student masters one level of involvement, the teacher increases his or her demands so that the student is gradually called upon to function at a more advanced level. Again, the emphasis on the teacher’s need to monitor progress is clear.

Finally, as opposed to more formal assessment, the teacher’s assessment of individual student responses need not be precise. The virtue of the procedure is that it is a regular component of classroom activity, taking place on a daily basis. As such, the teacher has many opportunities to monitor each student, and his or her judgments can reflect aggregations of a number of different inputs. The fact that there are many opportunities for evaluation means that no individual one is of particular significance.

This approach embodies some of the features of Feuerstein’s program. Sets of processes involved in the task are specified; an environment is constructed in which students are observed as they engage in those activities; and the teacher acts as both an evaluator and a clinician, capable of discovering strengths and weaknesses and responding to student input by providing feedback, practice, and support as needed. It is also different in fundamental ways. The processes that are targeted are chosen in reference to the academic domain in question, and the activities are always modeled and practiced in context. These two features conspire to minimize, or finesse, the transfer problem.
As the activities are practiced in the context of reading for meaning, we need not be concerned whether they will transfer to that task. Also, if the program is successful, improvements are obtained directly on important school tasks rather than on processing skills that are assumed to be related to performance on those tasks. There is abundant evidence that reciprocal teaching of reading and listening comprehension can be an effective means for dealing with poorly achieving students in the early to middle school years. The issue for future research is to establish that the principles that have been identified can be generalized to other content areas. This approach to integrating assessment and instruction rests on the ability to identify, in other areas, the kinds of activities similar to questioning, summarizing, and so forth, that are general enough to be widely useful but concrete and specific enough that they can both facilitate performance and support a discussion centering on the semantics of the domain. It is our belief that this is possible, and we have made beginnings in the area of elementary biology (see Brown et al., in press) and beginning algebra (Campione, Brown, & Connell, in press). It may not be the easiest thing to do (Brown & Campione, in press), but it seems worth the effort.
Summary

Criticisms of educational practice have become increasingly frequent and strident. Many of the problems can be seen to be a consequence of the ways in which standard instruction and assessment are structured and of the interplay between them. In this paper, we have reviewed some of the sources of the problems and argued that novel approaches to assessment – generically called dynamic assessment, which features the provision of assistance to the examinee – hold some promise for contributing to improvements in both assessment and instruction. Dynamic assessment approaches emphasize potential for growth and can focus on either aspects of improvement that result from some intervention or on specified processes assumed responsible for learning. The skills assessed can be presumed either general and content independent or more specific and domain dependent. Finally, the assessment can be conducted either in a standardized fashion geared to the generation of quantitative data that can facilitate prediction and classification or in a more clinical mode aimed at providing a rich qualitative picture of individual learners that can be used to guide instruction. Both the standardized and clinical procedures have their strengths and weaknesses. The standardized approaches provide quantitative data that serve a number of roles: (1) The dynamic scores are on occasion more predictive of other target behaviors than static scores obtained from the same test. (2) By focusing on potential for change and providing a forum most likely to reveal competence, they can help minimize the likelihood of misclassification of students, particularly those from poverty backgrounds. (3) The provision of assistance makes it possible to evaluate performance in settings just in advance of current capabilities, creating a zone of proximal development in which to gauge progress.
In our own work, this enables us to look specifically at transfer performance, which we see to reflect understanding of newly acquired skills, a feature notably missing from static tests. (4) Teacher perception of student abilities is influenced. Teachers observing dynamic assessment sessions come away with increased confidence that the students can profit from suitable instruction. On the negative side, this approach provides relatively little information to guide instruction. The clinical approaches hold more promise for contributing to educational practice, but are much more difficult to implement and evaluate. The data to support them are still lacking. In either case, attention has been focused primarily on relatively general, content-independent processes. This may contribute to the weaknesses of each approach. Our suggestion is to concentrate on domain-specific, as opposed to domain-general, activities; and it is this tack we have taken in our more recent efforts. By situating research on assessment and instruction in the context of the major school areas, such as reading, science, and mathematics, it is possible to specify in more detail and with more confidence the
skills and activities we wish to evaluate. Further, if we assess performance in the areas in which students are having difficulties and concentrate on skills known to be related to success in the domain, the problem of the leap to instruction is minimized. We thus avoid the problem of transfer. Finally, if we rely on assessments within domains, the view we provide of students may be a more optimistic one. The fact that someone does poorly in early reading does not mean he or she will have difficulty in mathematics or science. Given what we know of teacher bias effects, this is itself a worthwhile benefit. The negative side to this view is that it requires that we develop separate assessment instruments and remedial instructional packages for each domain. Life would be easier if general skills were a major part of the answer. One test would do, and one remedial curriculum would suffice for those having problems. Although we will follow with considerable interest those taking the general skills approach, it is our bet that progress can best be made by devising novel methods of assessment and instruction tailored to specific domains. We believe that dynamic assessment methods will come to play a progressively larger role in those activities.
References

Binet, A. (1909). Les idées modernes sur les enfants. Paris: Ernest Flammarion.
Bransford, J.D., Delclos, V.R., Vye, N.J., Burns, M.S., & Hasselbring, T.S. (1987). State of the art and future directions. In C.S. Lidz (Ed.), Dynamic assessment: An interactional approach to evaluating learning potential (pp. 479–496). New York: Guilford Press.
Bransford, J.D., Stein, B.S., Arbitman-Smith, R., & Vye, N.J. (1985). Improving thinking and learning skills: An analysis of three approaches. In J. Segal, S.F. Chipman, & R. Glaser (Eds.), Thinking and learning skills: Relating instruction to research (Vol. 1, pp. 133–208). Hillsdale, NJ: Erlbaum.
Brown, A.L. (1974). The role of strategic behavior in retardate memory. In N.R. Ellis (Ed.), International review of research in mental retardation (Vol. 7, pp. 55–111). New York: Academic Press.
Brown, A.L. (1978). Knowing when, where, and how to remember: A problem of metacognition. In R. Glaser (Ed.), Advances in instructional psychology (Vol. 1, pp. 77–165). Hillsdale, NJ: Erlbaum.
Brown, A.L. (1985). Mental orthopedics, the training of cognitive skills: An interview with Alfred Binet. In S. Chipman, J. Segal, & R. Glaser (Eds.), Thinking and learning skills (Vol. 2, pp. 319–337). Hillsdale, NJ: Erlbaum.
Brown, A.L., Bransford, J.D., Ferrara, R.A., & Campione, J.C. (1983). Learning, remembering, and understanding. In J.H. Flavell & E.M. Markman (Eds.), Handbook of child psychology (Vol. 3, pp. 77–166). New York: Wiley.
Brown, A.L., & Campione, J.C. (1986a). Psychological theory and the study of learning disabilities. American Psychologist, 41, 1059–1068.
Brown, A.L., & Campione, J.C. (1986b). Academic intelligence and learning potential. In R.J. Sternberg & D.K. Detterman (Eds.), What is intelligence? Contemporary viewpoints on its nature and definition (pp. 39–44). Norwood, NJ: Ablex.
Brown, A.L., & Campione, J.C. (in press). Interactive learning environments and the teaching of science and mathematics. In M.H. Gardner, J.G. Greeno, F. Reif, & A. Schoenfeld (Eds.), Towards a scientific practice of science education. Hillsdale, NJ: Erlbaum.
91 Standardized Testing Roger T. Lennon
Standardized testing is a twentieth-century phenomenon in American education. It is so recent a development that the pioneers in the field are still alive, and practically all of the present-day leaders had their training under some of the pioneers.
The Beginnings of Mental Measurement The origin of the scientific measurement of intelligence is commonly attributed to the work of the French psychologist, Alfred Binet, who in 1905 (by coincidence the date of World Book Company’s founding) with the collaboration of Theodore Simon, issued the Binet-Simon Test of Intelligence, generally regarded as the forebear of many, perhaps most, intelligence tests. Binet was interested in the practical problem of identifying feeble-minded children who could not profit from the ordinary program of school instruction. His work, therefore, foreshadows the use of tests for educational classification and guidance purposes. Binet’s work attracted much interest in other countries, including the United States. Among the American psychologists who were greatly impressed by the potential usefulness of the Binet scale was Lewis M. Terman, of Stanford University, who undertook to produce an adaptation of the scale for American use. Terman’s work resulted in the publication in 1916 of the
Source: NASSP Bulletin: National Association of Secondary-School Principals, 39 (1955): 34 – 40.
Stanford Revision of the Binet-Simon Scale, which immediately established itself as the best available individual intelligence test. Psychologists were quick to sense the exciting possibilities of this new tool for more accurate study of human beings. Dependable measurement of mental ability opened up new research vistas: studies of the nature of genius, in which Terman was vitally interested; new light on the perennial question of heredity and environment; identification of and educational provision for the feebleminded; effects of physical handicaps on intelligence; racial differences in intelligence; and a host of other theoretical and applied problems. But it was clear that any large-scale efforts to exploit the potentialities of these instruments would be severely limited by the time-consuming individual examination method required for the Stanford-Binet and the need for a trained psychometrist to do the testing. Among the first to investigate a practical means of overcoming these limitations was one of Terman’s students at Stanford, Arthur S. Otis, who in 1915 undertook as his doctoral dissertation the preparation of an intelligence test that could be administered to large groups of persons by relatively unskilled examiners. His original group intelligence scale formed the basis for the Army Alpha Examination used in classifying hundreds of thousands in World War I.
The Origins of Standardized Testing Meanwhile, roughly paralleling the development of intelligence measurement, and stemming in part from it, was the development of standardized measures of educational achievement. Achievement testing had, of course, always been an integral part of education. What was new about standardized achievement tests was their systematic utilization of principles of scientific measurement of human abilities, developed in the psychological laboratories. The application of scientific principles to the building of educational tests received its greatest initial impetus from the work of E. L. Thorndike at Teachers College, Columbia University. Thorndike in 1904 published the first textbook on educational measurement, An Introduction to the Theory of Mental and Social Measurement. Shortly after, there began to come from Thorndike and his students pioneering efforts to measure objectively outcomes in various subjects. An awareness of the importance of carefully controlled administration and uniform scoring was carried over from the psychological laboratory. Statistical methodology that had been growing up in connection with the measurement of mental traits was applied with ingenuity and effectiveness to the examination of achievement. Experiments revealed unreliability of grading, and the extent to which marks depended on the whims and prejudices of the teacher or scorer.
The Scientific Movement in Education To understand why education at the end of the century’s second decade was so ready to extend a welcome to standardized testing, it is necessary to appreciate two concepts that were then just beginning to loom large in the philosophy of education:
a. An emerging recognition of and respect for the individuality of each pupil as a person unique in his talent and needs, predicated on the enormous range of differences among individuals that testing had revealed, and an appreciation of the implications – e.g., individualization of instruction, differentiation of goals and curricula, need for guidance services – that follow.
b. The development of a science of education, with major dependence on research and tested knowledge as the keys to improvement of instruction, administration, evaluation, and other phases of educational effort.
What is more natural than that the schoolman should see in standardized testing the obvious, indeed the indispensable, instrument for implementation of these two ideas? Now he had the tools for assaying the intellectual endowment of each of his pupils, for measuring precisely each one’s attainment in the various branches of learning. Could he not with complete objectivity and high precision evaluate the work of pupil, teacher, school, and system? So it seemed to many an early devotee of testing. Of the ferment that the new “scientific movement in education” was in fact causing, evidences abound. Bureaus of research began to make their appearance in large city school systems, in state departments of education, and in state universities; the years 1921–1922 saw the establishment of more than 100 such bureaus in city systems. Generally, the major preoccupation of these bureaus was with testing and measurement problems. Textbooks on statistical research methods applied to education began to make their appearance. In 1915 the Association of Directors of Educational Research (later to become the American Educational Research Association) was formed to serve the growing number of persons engaged in this field.
Publisher with a Vision But for all this evidence that standardized testing in 1920 was “an idea whose time had come,” it is not improbable that the measurement of intelligence and of scholastic outcomes would have been much deferred in its impact on the schools had it not been for the vision of Caspar W. Hodgson, founder of World Book Company, who was quick to sense the implications of the measurement movement for all of education and the probable readiness of schools to adopt these new aids. As early as 1914, the Company, under
Hodgson’s direction, published Courtis Standard Practice Tests in Arithmetic. During World War I, World Book Company made arrangements with Arthur S. Otis for publication of his group intelligence test for school use and in 1918 issued the first of the historic series of Otis intelligence tests under the title Otis Group Intelligence Scale. The Otis test was quickly followed in the next few years by World Book Company’s publication of mental ability tests of such outstanding psychologists as Terman (1920), Haggerty (1920), Yerkes (with Haggerty, Terman, Thorndike, and Whipple) (1920), Miller (1921), Herring (1922), Goodenough (1926) – a roster of authors that reads like a Who’s Who of the pioneers of testing. Almost simultaneous with the Company’s entry into the intelligence testing field was its publication of standardized achievement tests – e.g., Haggerty Reading Examinations (1920), Hudelson English Composition Scale (1920), Henmon French Tests and Henmon Latin Tests (1921) – climaxed in 1923 by publication of the first Stanford Achievement Test by L. M. Terman, T. L. Kelley, and G. M. Ruch, a landmark in the history of educational measurement. Psychologists did not confine their early efforts at measurement to the fields of intelligence and achievement. John L. Stenquist, one of Thorndike’s students at Teachers College, undertook to measure mechanical aptitude, producing Stenquist Mechanical Aptitude Tests, published by World Book Company in 1922. In the same year the Company released Thurstone Employment Tests, measuring clerical aptitude and typing and stenographic skills. Issuance of Downey Will-Temperament Tests (1921–22), the Company’s first cautious venture into the personality field, signalized its early awareness of the importance of measures of non-intellective traits.
A New Area of Educational Publishing B. R. Buckingham, in reviewing the one-quarter century history of AERA in 1941, remarked appropriately that “in 1919 test materials first began to be issued by commercial publishers … passing from an amateur to a professional basis.” The National Research Council in 1919 offered their National Intelligence Tests (1920) to World Book Company. Unique mechanical and editorial problems arose with the production of standardized tests in large quantities – methods of packaging and of handling accessories such as scoring keys and class records, developing a system of nomenclature, cataloging, and such – and in the editorial area, the training and efficient utilization of specialized professional editorial help. Arthur S. Otis was added to the World Book Company Staff (1921) to insure quality of tests and to assist school personnel in their selection and use. Many policies and practices developed by the Company during those early years have become standard and have been adopted in whole or in part by every publisher that has since come into the field.
Recognizing the need for training in tests and measurements, World Book Company promptly addressed itself to the task of consumer education. It established a Test Service Department to provide counsel and guidance on testing matters. The Company inaugurated in 1923 its Test Service Bulletin series to provide simple, non-technical exposition of test topics and practical concrete examples of effective use of tests. It launched the Measurement and Adjustment Series of professional books, under the editorship of Lewis M. Terman, which series was to include such classics as Goodenough’s Measurement of Intelligence by Drawings (1926), Hull’s Aptitude Testing (1928), Kelley’s Interpretation of Educational Measurements (1927), Otis’s Statistical Method in Educational Measurement (1925), and Hildreth’s Psychological Service for School Problems (1930). The Company co-operated with universities in setting up courses in tests and measurements, and in several instances it effected test distributing arrangements with university research bureaus to stimulate promotion of testing by competent persons. It arranged that some of its sales representatives be specially schooled in the use of tests so that they could assist educators. Perhaps most important of all, it strove through provision of clear, easily understood manuals of directions to make the administration, scoring, and interpretation of tests sufficiently simple to be handled by the educator with little or no formal training in measurement.
Testing and Guidance From the earliest days of mental measurements one of the major uses envisioned for test results was in connection with vocational and educational guidance. Early on World Book Company’s list appeared Thurstone Vocational Guidance Tests (1922), along with the mechanical and clerical aptitude tests mentioned earlier. Widespread use of tests for vocational guidance purposes, however, had to wait for the guidance movement to establish a foothold in the schools. This did not take place until the middle or late ‘20’s. The guidance counselor soon found himself, by choice or necessity, engaged in a far more extensive kind of counseling program concerned with problems of personal, social, and emotional adjustment; and, as he had looked to guidance and interest measures to help in his vocational advisement, so he turned to personality measures, problem checklists, and the like for a better understanding of his counselees. World Book Company adopted on the whole a cautious attitude toward instruments of this kind, for it was clear that they did not compare in validity or reliability with achievement or general ability measures and, moreover, that their proper use called for a much higher level of training and sophistication. Apart from Haggerty-Olson-Wickman Behavior Rating Schedules (1930), it was not until the late 1930’s that the Company published any inventories, issuing at that time Washburne
Social-Adjustment Inventory and Pintner Aspects of Personality, the latter being one of the first efforts to measure personality characteristics at the elementary-school level. In 1942 the Company, recognizing the changing emphasis in personality measurement, issued the Individual Record Blank for the Rorschach method of personality diagnosis, together with the authoritative Klopfer-Kelley volume on this foremost of the projective methods, and in 1953 a similar Record Blank for the Thematic Apperception Test, next most popular of the projective measures for personality study. New inventories in the form of Heston Personal Adjustment Inventory (1948), and Gordon Personal Profile (1953), have been added to the Company’s list in late years.
The Scoring Machine In the late ‘20’s and early ‘30’s leaders in the testing movement were becoming aware of the need for greatly improved methods of scoring tests and processing data, particularly in connection with state-wide and other large-scale programs. Largely through the efforts of Ben D. Wood, International Business Machines Corporation became interested in developing a machine to score tests. By 1933 IBM had pilot models of a scoring machine in operation and in 1937 offered the machine for general use. World Book Company collaborated with IBM both by granting permission for use of the answer-sheet arrangement under a basic patent held by the Company and by conducting a series of studies on the reliability and the validity of tests marked for scoring on the IBM test scoring machine. Convinced that under proper circumstances the machine could be used without any harmful effects from a measurement standpoint, the Company moved quickly to adapt several of its tests for scoring on the new machine, being the first of the commercial publishers to take this step.
Toward Higher Standards The search for devices to help in scoring large numbers of tests was only one sign of the continued expansion of testing during the 1930’s. An unceasing stream of new tests, several new journals devoted almost exclusively to measurement problems, and the expansion of state and regional testing programs were other indications of the thriving conditions of the field. Test users were growing more sophisticated; they were asking for higher standards of technical excellence in the tests that they bought. During 1930–1940 World Book Company gradually undertook a large part of those aspects of test development which had exceeded the resources of authors, and supplemented their thoroughgoing subject matter knowledge and psychological expertness with technical and experimental facilities.
Thus much of the development and standardization of such publications as Metropolitan Achievement Tests, Pintner General Ability Tests, the revised editions of Harry A. Greene’s Iowa Silent Reading Tests, Durrell-Sullivan Reading Tests, and the Otis Quick-Scoring Mental Ability Tests were planned and carried on by the staff of the Company. The Test Division increasingly assumed responsibility for experimental work on problems of validity, reliability, equivalence of forms, norms, and the like. Staff and equipment, including scoring and tabulating machines, were expected to handle these operations. This mode of operation was in full effect at the time of development of the 1940 edition of Stanford Achievement Test, for which the Company launched a nation-wide standardization program more comprehensive in scope than any previously undertaken – one that set a pattern for standardization of later editions of this battery and of Metropolitan Achievement Tests, a pattern that no other publisher has yet attempted to equal.
World War II and After Like World War I before it, World War II furthered the testing movement. The extensive work of the armed forces in developing a great variety of tests for use in improved selection and classification aided in two ways: (1) huge research programs, impossible under any other auspices, led to substantial advances in measurement theory and improved understanding of the structure and organization of human abilities; (2) literally millions of persons, brought into contact with standardized tests in situations in which the test results had an intimate and direct bearing on their own careers, acquired a knowledge of the usefulness of these psychological tools. In the years that followed World War II, the Company completely revised both the Metropolitan and Stanford Achievement Tests, issued an entirely new series of high-school achievement tests, new personality instruments, new intelligence measures (both single-score and multi-factor), and other contemporary instruments combined with the well-established titles to form a comprehensive, well-rounded set of offerings for primary, elementary, and secondary schools.
The Way Ahead So much for this little history. It has brought us, at mid-century, and at World Book Company’s golden anniversary, to a point where standardized testing finds itself an accepted and essential part of school practice to an extent surprising in view of its relatively short career. Approximately 75,000,000 standardized tests are given annually in the schools, evidence enough of the reliance that the educator has come to place on test data to guide his activities.
Startling advances in the application of electronic devices to test scoring and to processing of test data already are on the horizon. Professional organizations, including the American Psychological Association, the American Educational Research Association, and the National Council on Measurements Used in Education are concerning themselves with codes of standards for the distribution of tests. A steady stream of research on test theory, construction, and application overflows the pages of the now numerous professional journals in this field. A variety of unusually fine textbooks in the measurement field has appeared in recent years. In short, there is every sign that the testing movement has taken deep root and is in a flourishing and vigorous state of maturity. World Book Company’s constant concern for increasing the effectiveness of testing led it to inaugurate a new series entitled Test Service Notebook, offering articles of a somewhat more technical nature than the popular Test Service Bulletins; to create a Fellowship in Educational Measurement, awarded annually through the American Educational Research Association; to increase its facilities for offering consultant service to school systems in need of professional counsel; and to give continuing vigorous attention to the development of test manuals and other accessories that will be of maximum usefulness in the intelligent application of test results.
92 The Place of Statistics in Psychology Jum Nunnally
Most psychologists probably will agree that the emphasis on statistical methods in psychology is a healthy sign. Although we sometimes substitute statistical elegance for good ideas and overembellish small studies with elaborate analyses, we are probably on a firmer basis than we were in the prestatistical days. However, it will be argued that there are some serious misemphases in our use of statistical methods, which are retarding the growth of psychology. The purpose of this article is to criticize the use of statistical “hypothesis-testing” models and some related concepts. It will be argued that the hypothesis-testing models have little to do with the actual testing of hypotheses and that the use of the models has encouraged some unhealthy attitudes toward research. Some alternative approaches will be suggested. Few, if any, of the criticisms which will be made were originated by the author, and, taken separately, each is probably a well-smitten “straw man.” However, it is hoped that when the criticisms are brought together they will argue persuasively for a change in viewpoint about statistical logic in psychology.
What Is Wrong Most will agree that science is mainly concerned with finding functional relations. A particular functional relationship may be studied either because it is interesting in its own right or because it helps clarify a theory.
Source: Educational and Psychological Measurement, XX(4) (1960): 641–650.
The functional relations most often sought in psychology are correlations between psychological variables, and differences in central tendency in differently treated groups of subjects. Saying it in a simpler manner, psychological results are usually reported as correlation coefficients (or some extension thereof, such as factor analysis) and differences between means (or some elaboration, such as a complex analysis of variance treatment). Hypothesis Testing. After an experiment is completed, and the correlations or differences between means have been obtained, the results must be interpreted. The experimenter is aware of sampling error and realizes that if the experiment is run on different groups of subjects the obtained relations will probably not be the same. How then should he take into account the chance element in the obtained relationship? In order to interpret the results, the experimenter would, as most of us have, rely on the statistical models for hypothesis testing. It will be argued that the hypothesis-testing models are inappropriate for nearly all psychological studies. Statistical hypothesis testing is a decision theory: you have one or more alternative courses of action, and the theory leads to the choice of one or several of these over the others. Although the theory is very useful in some practical circumstances (such as in “quality control”), it is misnamed. It has very little to do with hypothesis testing in the way that hypotheses are tested in the work-a-day world of scientific activity. The most misused and misconceived hypothesis-testing model employed in psychology is referred to as the “null-hypothesis” model. Stating it crudely, one null hypothesis would be that two treatments do not produce different mean effects in the long run. Using the obtained means and sample estimates of “population” variances, probability statements can be made about the acceptance or rejection of the null hypothesis. 
Similar null hypotheses are applied to correlations, complex experimental designs, factor-analytic results, and most all experimental results. Although from a mathematical point of view the null-hypothesis models are internally neat, they share a crippling flaw: in the real world the null hypothesis is almost never true, and it is usually nonsensical to perform an experiment with the sole aim of rejecting the null hypothesis. This is a personal point of view, and it cannot be proved directly. However, it is supported both by common sense and by practical experience. The common-sense argument is that different psychological treatments will almost always (in the long run) produce differences in mean effects, even though the differences may be very small. Also, just as nature abhors a vacuum, it probably abhors zero correlations between variables. Experience shows that when large numbers of subjects are used in studies, nearly all comparisons of means are “significantly” different and all correlations are “significantly” different from zero. The author once had occasion to use 700 subjects in a study of public opinion. After a factor analysis of the results, the factors were correlated with individual-difference
Nunnally
The Place of Statistics in Psychology
variables such as amount of education, age, income, sex, and others. In looking at the results I was happy to find so many “significant” correlations (under the null-hypothesis model) – indeed, nearly all correlations were significant, including ones that made little sense. Of course, with an N of 700 correlations as large as .08 are “beyond the .05 level.” Many of the “significant” correlations were of no theoretical or practical importance. The point of view taken here is that if the null hypothesis is not rejected, it usually is because the N is too small. If enough data is gathered, the hypothesis will generally be rejected. If rejection of the null hypothesis were the real intention in psychological experiments, there usually would be no need to gather data. The arguments above apply most straightforwardly to “two-tail tests,” which are used in most experiments. A somewhat better argument can be made for using the null hypothesis in the one-tail test. However, even in that case, if rejection of the null hypothesis is not obtained for the specified direction, the hypothesis can be reversed and rejection will usually occur. Perhaps my intuitions are wrong – perhaps there are many cases in which different treatments produce the same effects and many cases in which correlations are exactly zero. Even so, the emphasis on the null-hypothesis models is unfortunate. As is well recognized, the mere rejection of a null hypothesis provides only meager information. For example, to say that a correlation is “significantly” different from zero provides almost no information about the relationship. Some would argue that finding “significance” is only the first step, but how many psychologists ever go beyond this first step? Psychologists are usually not interested in finding tiny relationships. However, once this is admitted, it forces either a modification or an abandonment of the null-hypothesis model. 
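The remark that, with an N of 700, correlations of about .08 are "beyond the .05 level" can be checked with a short calculation. The sketch below is not part of the original article. It assumes the Fisher z approximation for a test of zero correlation, and the function names are illustrative only.

```python
import math
from statistics import NormalDist


def critical_r(n: int, alpha: float = 0.05) -> float:
    """Smallest |r| that a two-tailed test of rho = 0 calls significant.

    Assumption: atanh(r) is approximately normal with standard error
    1 / sqrt(n - 3) when the true correlation is zero (Fisher z).
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z_crit / math.sqrt(n - 3))


def two_tailed_p(r: float, n: int) -> float:
    """Approximate two-tailed p value for an observed correlation r."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The threshold for "significance" shrinks steadily as N grows.
for n in (30, 100, 700, 5000):
    print(f"N = {n:5d}: |r| must exceed {critical_r(n):.3f}")

print(f"p for r = .08 with N = 700: {two_tailed_p(0.08, 700):.3f}")
```

Under these assumptions the threshold at N = 700 comes out near .074, so a correlation of .08 is "significant" even though it accounts for well under one per cent of the variance (.08 squared is .0064).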
An alternative to the null hypothesis is the “fixed-increment” hypothesis. In this model, the experimenter must state in advance how much of a difference is an important difference. The model could be used, for example, to test the differential effect of two methods of teaching psychology, in which an achievement test is used to measure the amount of learning. Suppose that the regular method of instruction obtains a mean achievement test score of 45. In the alternative method of instruction, laboratory sessions are used in addition to lectures. The experimenter states that he will consider the alternative method of instruction better if, in the long run, it produces a mean achievement test score which is at least ten points greater than the regular method of instruction. Suppose that the alternative method actually produces a mean achievement test score of 65. The probability can then be determined as to whether the range of scores from 55 upwards covers the “true” value (the parameter). The difficulty with the “fixed-increment” hypothesis-testing model is that there are very few experiments in which the increment can be stated in advance. In the example above, if the desired statistical confidence could not
be found for a ten point increment, the experimenter would probably try a nine point increment, then an eight point increment, and so on. Then the experimenter is no longer operating with a hypothesis-testing model. He has switched to a confidence-interval model, which will be discussed later in the article. The Small N Fallacy. Closely related to the null hypothesis is the notion that only enough subjects need be used in psychological experiments to obtain “significant” results. This often encourages experimenters to be content with very imprecise estimates of effects. In those situations where the dispersions of responses are small, only a small number of subjects is required. However, such situations are seldom encountered in psychology. The question, “When is the N large enough?” will be discussed later in the article. Even if the object in experimental studies were to test the null hypothesis, the statistical test is often compromised by the small N. The tests depend on assumptions like homogeneity of variance, and the small N study is not sufficient to say how well the assumptions hold. The small N experiment, coupled with the null hypothesis, is usually an illogical effort to leap beyond the confines of limited data to document lawful relations in human behavior. The Sampling Fallacy. In psychological experiments we speak of the group of subjects as a “sample” and use statistical sampling theory to assess the results. Of course, we are seldom interested only in the particular group of subjects, and it is reasonable to question the generality of the results in wider collections of people. However, we should not take the sampling notion too seriously, because in many studies no sampling is done. In many studies we are content to use any humans available. College freshmen are preferred, but in a pinch we will use our wives, secretaries, janitors, and anyone else who will participate. 
We should then be a bit cautious in applying a statistical sampling theory, which holds only when individuals are randomly or systematically drawn from a defined population. The Crucial Experiment. Related to the misconceptions above are some misconceptions about crucial experiments. Before the points are argued, a distinction should be made between crucial designs and crucial sets of data. A crucial design is an agreed-on experimental procedure for testing a theoretical statement. Even if the design is accepted as crucial, a particular set of data obtained with the design may not be accepted as crucial. Although crucial designs have played important parts in some areas of science, few of them are, as yet, available in psychology. In psychology it is more often the case that experimenters propose different designs for testing the same theoretical statement. Experimental designs that apparently differ in small ways often produce different relationships. However, this is not a serious bother. Antithetical results should lead to more comprehensive theory. A more serious concern is whether particular sets of experimental data can be regarded as crucial. Even when different psychologists employ the
same design they often obtain different relationships. Such inconsistencies are often explained by “sampling error,” but this is not a complete explanation. Even when the N’s are large, it is sometimes reported that Jones finds a positive correlation, Smith a negative correlation, and Brown a nil correlation. The results of psychological studies are sometimes particular to the experimenter and the time and place of the experiment. This is why most psychologists would place more faith in the results of two studies, each with 50 subjects, performed by different investigators in different places, than in the results obtained by one investigator for 100 subjects. Then we must be concerned not only with the sampling of people but with the sampling of experimental environments as well. The need to “sample” experimental environments is much greater in some types of studies than in others. For example, the need probably would be greater in group dynamic studies than in studies of depth perception.
What Should Be Done

Estimation. Hypotheses are really tested by a process of estimation rather than with statistical hypothesis-testing models. That is, the experimenter wants to determine what the mean differences are, how large the correlation is, what form the curve takes, and what kinds of factors occur in test scores. If, in the long run, substantial differences are found between effects or if substantial correlations are found, the experimenter can then speak of the theoretical and practical implications. To illustrate our dependence on estimation, analysis of variance should be considered primarily an estimation device. The variances and ratios of variances obtained from the analysis are unbiased estimates of different effects and their interactions. The proper questions to ask are, “How large are the separate variances?” and “How much of the total variance is explained by particular classifications?” Only as a minor question should we ask whether or not the separate sources of variance are such as to reject the null hypothesis. Of course, if the results fail to reject the null hypothesis, they should not be interpreted further; but if the hypothesis is rejected, this should be considered only the beginning of the analysis. Once it is realized that the basis for testing psychological hypotheses is that of estimation, other issues are clarified. For example, the Gordian knot can be cut on the controversial issue of “proving” the null hypothesis. If, in the long run, it is found that the means of two differently treated groups differ inconsequentially, there is nothing wrong with believing the results as they stand.

Confidence Intervals. It is not always necessary to use a large N, and there are ways of telling when enough data has been gathered to have faith in statistical estimates. Most of the statistics which are used (means,
variances, correlations, and others) have known distributions, and, from these, confidence intervals can be derived for particular estimates. For example, if the estimate of a correlation is .50, a confidence interval can be set for the inclusion of the “true” value. It might be found in this way that the probability is .99 that the “true” value (see Note 1) is at least as high as .30. This would supply a great deal more information than to reject the null hypothesis only. The statistical hypothesis-testing models differ in a subtle, but important, way from the confidence methods. The former make decisions for the experimenter on an all-or-none basis. The latter tell the experimenter how much faith he can place in his estimates, and they indicate how much the N needs to be increased to raise the precision of estimates by particular amounts. The null-hypothesis model occurs as a special case of the confidence models. If, for example, in a correlational study the confidence interval covers zero, then, in effect, the null hypothesis is not rejected. When this occurs it usually means that not enough data has been gathered to answer the questions at issue.

Discriminatory Power. In conjunction with making estimates and using confidence methods with those estimates, methods are needed for demonstrating the strength of relationships. In correlational studies, this need is served by the correlations themselves. In measuring differences in central tendency for differently treated groups, no strength-of-relationship measure is generally used. One measure that is sometimes used is obtained by converting mean differences for two groups into a point-biserial correlation. This is easily done by giving the members of one group a “group score” of 1 and the members of the other group a “group score” of 2 (any other two numbers would serve the purpose). The dichotomous “group scores” are then correlated with the dependent variable.
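Two of the computations just described can be sketched in a few lines: a one-sided lower confidence bound for a correlation, and the point-biserial correlation obtained by coding group membership as 1 and 2. The article does not say how its .50 and .30 example was computed. The Fisher z approximation used here is an assumption, and the function names are illustrative only.

```python
import math
from statistics import NormalDist, mean


def r_lower_bound(r: float, n: int, confidence: float = 0.99) -> float:
    """One-sided lower confidence bound for a correlation (Fisher z)."""
    z_crit = NormalDist().inv_cdf(confidence)
    return math.tanh(math.atanh(r) - z_crit / math.sqrt(n - 3))


def n_for_lower_bound(r: float, bound: float, confidence: float = 0.99) -> int:
    """Smallest N at which the lower bound for an estimate r reaches `bound`."""
    z_crit = NormalDist().inv_cdf(confidence)
    gap = math.atanh(r) - math.atanh(bound)
    return math.ceil((z_crit / gap) ** 2 + 3)


def point_biserial(group_a, group_b) -> float:
    """Correlate a 1/2 'group score' with the dependent variable."""
    scores = list(group_a) + list(group_b)
    codes = [1] * len(group_a) + [2] * len(group_b)
    m_code, m_score = mean(codes), mean(scores)
    cov = sum((c - m_code) * (s - m_score) for c, s in zip(codes, scores))
    ss_code = sum((c - m_code) ** 2 for c in codes)
    ss_score = sum((s - m_score) ** 2 for s in scores)
    return cov / math.sqrt(ss_code * ss_score)


# How many subjects before an estimate of .50 supports "at least .30"?
n_needed = n_for_lower_bound(0.50, 0.30)
print(n_needed, round(r_lower_bound(0.50, n_needed), 3))

# Hypothetical scores for two small groups, for illustration only.
print(round(point_biserial([1, 2, 3], [2, 3, 4]), 3))
```

Under these assumptions, an estimate of .50 supports the statement "at least .30" at the .99 level once the N is roughly one hundred. The same two functions show how the bound tightens as N grows, which is the information a bare rejection of the null hypothesis does not supply.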
When the N is large, it is an eye-opener to learn what small correlations correspond to “highly significant” differences. There is a general strength-of-relationship measure that can be applied to all comparisons of mean differences. The statistic is Epsilon, which was derived by Kelley (1935) and extended by Peters and Van Voorhis (1940). The latter showed how Epsilon applies to analysis of variance methods and recommended its use in general. Their advice was not followed, and the suggestion here is that we reconsider Epsilon. Epsilon is an unbiased estimate of the correlation ratio, Eta. It is unbiased because “degrees of freedom” are employed in the variance estimates. To show how Epsilon is applied, consider the one classification analysis of variance results shown in Table 1. Epsilon is obtained by dividing the error variance (in the example in Table 1, the within columns variance) by the total variance, subtracting that from one, and taking the square-root of the result. The one classification in Table 1 explains 49 per cent of the total variance, which shows that the classification
Table 1: Hypothetical results illustrating the use of epsilon

Source                                            Sums of squares     df    Variance est.
Experimental treatments (between column means)                510      4           127.50
Within columns                                                490    119             4.12
Total                                                        1000    123             8.13

(Epsilon)² = 1 − (Within var. / Total var.) = 1 − (4.12 / 8.13) = .49
Epsilon = .70
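The arithmetic behind Table 1 can be reproduced directly from the sums of squares and degrees of freedom. The sketch below follows the verbal recipe in the text (error variance over total variance, subtracted from one, then the square root). The function name and the handling of negative values are assumptions added for illustration.

```python
import math


def epsilon(ss_within: float, df_within: int, ss_total: float, df_total: int):
    """Return (epsilon squared, epsilon) from sums of squares and df."""
    var_within = ss_within / df_within  # error (within columns) variance estimate
    var_total = ss_total / df_total     # total variance estimate
    eps_sq = 1 - var_within / var_total
    # Assumption: a negative value, which can arise by chance when the
    # effect is nil, is reported as an epsilon of zero.
    eps = math.sqrt(eps_sq) if eps_sq > 0 else 0.0
    return eps_sq, eps


# Values taken from Table 1.
eps_sq, eps = epsilon(ss_within=490, df_within=119, ss_total=1000, df_total=123)
print(f"epsilon squared = {eps_sq:.2f}, epsilon = {eps:.2f}")
# prints "epsilon squared = 0.49, epsilon = 0.70"

# For comparison, the uncorrected correlation ratio squared (eta squared)
# uses the sums of squares without degrees of freedom.
print(f"eta squared = {510 / 1000:.2f}")
```

The small gap between .49 and .51 reflects the degrees-of-freedom correction that makes Epsilon the unbiased counterpart of Eta.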
has high discriminatory power. Of course, in this case, the null hypothesis would have been rejected, but that is not nearly as important as it is to show that the classification produces strong differences. Whereas Epsilon was applied in Table 1 to the simplest analysis of variance design, it applies equally well to complex designs. Each classification produces an Epsilon, which shows directly the discriminatory power of each (See Peters and Van Voorhis, 1940). Epsilon is simply a general measure of correlation. If levels within a classification are ordered on a quantitative scale and regressions are linear, Epsilon reduces to the familiar r. A Point of View. Statisticians are not to blame for the misconceptions in psychology about the use of statistical methods. They have warned us about the use of the hypothesis-testing models and the related concepts. In particular they have criticized the null-hypothesis model and have recommended alternative procedures similar to those recommended here (See Savage, 1957; Tukey, 1954; and Yates, 1951). People are complicated, and it is hard to find principles of human behavior. Consequently, psychological research is often difficult and frustrating, and the frustration can lead to a “flight into statistics.” With some, this takes the form of a preoccupation with statistics to the point of divorcement from the headaches of empirical study. With others, the hypothesis-testing models provide a quick and easy way of finding “significant differences” and an attendant sense of satisfaction. The emphasis that has been placed on the null hypothesis and its companion concepts is probably due in part to the professional milieu of psychologists. The “reprint race” in our universities induces us to publish hastily-done, small studies and to be content with inexact estimates of relationships. There is a definite place for small N studies in psychology. 
A chain of small studies, each elaborating and modifying the hypotheses and procedures, can eventually lead to a good understanding of a domain of behavior.
However, if such small studies are taken out of context and considered (or published) separately, they usually are of little value, even if null hypotheses are successfully rejected. Psychology had a proud beginning, and it would be a pity to see it settle for the meager efforts which are encouraged by the use of the hypothesis-testing models. The original purpose was to find lawful relations in human behavior. We should not feel proud when we see the psychologist smile and say “the correlation is significant beyond the .01 level.” Perhaps that is the most that he can say, but he has no reason to smile.
Note
1. Technically, it would be more correct to say that the probability is .99 that the range from .30 to 1.00 covers the parameter.
References
Kelley, T. L. “An Unbiased Correlation Ratio Measure.” Proceedings of the National Academy of Science, Washington, XXI (1935), 554–559.
Peters, C. C., and Van Voorhis, W. R. Statistical Procedures and Their Mathematical Bases. New York: McGraw-Hill, 1940.
Savage, R. J. “Nonparametric Statistics.” Journal of the American Statistical Association, LII (1957), 332–333.
Tukey, J. W. “Unsolved Problems of Experimental Statistics.” Journal of the American Statistical Association, XLIX (1954), 710.
Yates, F. “The Influence of Statistical Methods for Research Workers on the Development of the Science of Statistics.” Journal of the American Statistical Association, XLVI (1951), 32–33.
93
Education in Statistics and Research Design in School Psychology
Steven G. Little, Howard B. Lee and Angeleque Akin-Little
In the US, accreditation and training issues have long been debated within the field of school psychology, as has the entry level of training for school psychology practice. The focus of most of the controversy has been on whether school psychology should adopt standards similar to clinical psychology by having the doctoral level as the entry level for practice (Brown, 1989; Cobb, 1989; Coulter, 1989; Fagan, 1989a; Fagan, 1989b; Prasse, 1989; Slate, 1989; Stone, 1989; Welsh, Rosenthal et al., 1990). These arguments tend to focus more on guild issues such as licensure and independent practice than on training in specific areas. Graduate education, particularly doctoral degree education, varies from country to country with regard to historical traditions and specific demands (Noble, 1994). The model of training of school psychologists in the US has some variability regarding amount and focus of training. There are certain commonalities among training programs, but each program tends to have a larger standard or ideal on which it bases its training ‘model.’ Fagan and Wise (2000) define three broad training models: the Scientist-Practitioner, Practitioner and Pragmatic. The Scientist-Practitioner Model is based on the assumption that professional psychologists are expected to do research and that practice should be grounded in general and experimental psychology (Frank, 1984). These programs generally offer the PhD. In the Professional Model, conducting research is de-emphasized while a greater emphasis is placed on preparing
Source: School Psychology International, 24(4) (2003): 437–448.
students for professional practice (Korman, 1974). Programs adhering to this model frequently offer the PsyD (Doctor of Psychology), but may also offer the PhD, Specialist or Masters degree. The third model identified by Fagan and Wise is the Pragmatic Model and it is what is followed by most non-doctoral programs. The curriculum in these programs tends to be ‘highly prescriptive with courses required in direct correspondence to courses and competencies specified in state education agency requirements’ (Fagan and Wise, 2000; p. 203). In other words, pragmatic programs base their training on standards specified by the state in which the program is located as necessary for graduates to attain certification to practice as a school psychologist in public schools. Overall, states in the US have increased their standards for certification to practice over the past 20 years to the point where the minimum accepted standard in most states is a 60 semester hour degree (either Masters or Specialist) which includes a 1200 hour internship. Little and Rodemaker (1997) found that the mean semester hours for Masters school psychology programs in the US was 62.8 compared to 70.8 for Specialist programs. In viewing requirements in both levels of programs they concluded that ‘what differences exist are minor and almost all programs are designed to achieve for their graduates the credential to practice in public schools’ (p. 75). In 1995, Carey and Wilson briefly traced the development of school psychology training standards in the US. School psychology received some recognition from the National Council for the Accreditation of Teacher Education (NCATE) in the early 1970s. The National Association of School Psychologists (NASP) developed written training standards in 1972; however, they were not strictly enforced. 
These standards were revised in 1978 and became more important when NASP became an NCATE affiliate, meaning that NASP could provide program approval (NCATE awards accreditation). These standards included a minimum entry level of a 60-hour program and one-year internship. There are currently 123 Specialist-level programs and 53 Doctoral-level programs approved by NASP (NASP, 2003). The American Psychological Association (APA) developed standards for doctoral programs beginning in 1969 and began accrediting school psychology programs in 1971. These standards for doctoral education include the dissertation or major project, written or oral examinations and a one-year internship. There are currently 56 school psychology programs and 13 programs in ‘Combined Professional-Scientific Psychology’ that include school psychology accredited by the APA (APA, 2002). Fagan (1990) further addressed the issue as it pertained to accreditation by examining differences between programs accredited based on NCATE (NASP) or APA guidelines. NCATE (NASP) guidelines are similar for both non-doctoral and doctoral programs. APA has guidelines for doctoral programs only. An examination of these guidelines finds both to be very similar. Additional research and statistical requirements, however, are expected of the doctoral student. These requirements include the dissertation, additional papers, oral and written examinations and an internship experience.
Little et al.
Statistics and Research Design
It has been argued that the doctorate is more of a research-based degree than subdoctoral degrees (Martens and Keller, 1987). While many make the assumption that doctoral level school psychologists follow the Scientist-Practitioner model more so than subdoctorally trained individuals, in general both have highly similar curricular expectations (Fagan, 1990; Fagan and Wise, 2000). Goh (1977) examined curricular content for school psychology training programs in the mid-1970s. The only areas in which he found significant differences between doctoral and subdoctoral training programs were in ‘Consultation’ and ‘Quantitative Methods.’ Brown and Minke (1986) conducted a comparable study in the mid-1980s and found similar differences in the area of research skills. It is the purpose of this study to expand the work of both Goh and Brown and Minke by examining statistics and research design requirements at various levels of graduate training in school psychology. The focus of this study is on training programs in the US because it provides the greatest concentration of programs which, although they may differ by level, follow generally similar training standards. An internet search of training standards at universities in other English speaking countries is also provided in an attempt to determine if there is consistency internationally with regard to training in statistics and research design.
Methods

A survey requesting information on the type and quantity of statistics and research design courses required of students receiving graduate training in school psychology in the US was sent to each NASP approved and APA accredited program as well as others listed in the Directory of School Psychology Graduate Programs (Thomas, 1998). A total of 181 surveys were sent with a follow-up postcard sent approximately three weeks later. One hundred and eight surveys were returned, giving a return rate of 60 percent. Seven surveys were incomplete or unreadable leaving 101 usable surveys. Of the reporting programs, 22 offered a degree at the Master’s level while 47 were at the Specialist level and 35 were at the doctoral level (seven PsyD and 25 PhD). Twenty-three of the doctoral level programs reported APA accreditation and 21 reported NASP approval. Of subdoctoral programs, 40 reported NASP approval. Overall, 48 programs (47.5 percent) indicated being located in a psychology department, 41 (40.6 percent) in a school of education, and 12 (11.9 percent) in another administrative unit (usually a free standing department). At the MA level (n = 22), 63.6 percent of the programs were located in psychology departments, 31.8 percent in schools of education, and 4.5 percent in another administrative unit. At the EdS level (n = 47), 44.7 percent of the programs were located in psychology departments, 40.4 percent in schools of education, and 14.9 percent in another administrative unit. At the PsyD
level (n = 7), 42.9 percent of the programs were located in psychology departments, 28.6 percent in schools of education, and 28.6 percent in another administrative unit. At the PhD level (n = 25), 40.0 percent of the programs were located in psychology departments, 52.0 percent in schools of education and 8.0 percent in another administrative unit. In an attempt to ascertain requirements for education in statistics and research design in universities outside of the US, an internet search was conducted of universities in countries where English is a primary language. A total of nine universities were identified as having school/educational psychology programs and a website with adequate information on requirements in statistics and research design. These programs were located in Australia (n = 2), Canada (n = 3), England (n = 1), Scotland (n = 2) and New Zealand (n = 1).
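The return rate and the overall administrative-unit percentages reported above follow from the counts given in the text. A minimal sketch of that arithmetic, using only figures stated in the Methods section (the helper name is illustrative):

```python
def pct(part: float, whole: float, digits: int = 1) -> float:
    """Percentage of `whole` represented by `part`, rounded."""
    return round(100 * part / whole, digits)


sent, returned, unusable = 181, 108, 7
usable = returned - unusable  # surveys that could be analyzed

print("return rate:", pct(returned, sent, 0))
print("usable surveys:", usable)

# Overall location of programs, as counts out of the usable surveys.
units = {"psychology department": 48, "school of education": 41, "other unit": 12}
for name, count in units.items():
    print(f"{name}: {pct(count, usable)} percent")
```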
Results

A comparison of programs at the sub-doctoral level indicated no significant difference between MA and EdS programs on the number of students admitted per year (M = 11.67 and M = 12.23 respectively; t(62) = −0.38, p = 0.706); however, a significant difference (t(62) = −3.83, p < 0.001) was noted between the total number of hours required for the degree, with EdS programs (M = 71.16) requiring more hours than MA programs (M = 63.67). No differences were noted between MA and EdS programs in the number of statistics courses required (M = 1.80 and M = 1.81 respectively; t(63) = 0.478, p = 0.638) or the number of research design courses required (M = 11.67 and M = 12.23 respectively; t(61) = −0.13, p = 0.898). A comparison of programs at the doctoral level indicated no significant difference between PsyD and PhD programs on the number of students admitted per year (M = 13.29 and M = 8.32 respectively; t(30) = 1.76, p = 0.088) or between the total number of hours required for the degree (M = 97.17 and M = 102.48 respectively; t(16.32) = 6.48, p = 0.105). In addition, no differences were noted between PsyD and PhD programs in the number of statistics courses required (M = 1.57 and M = 1.96 respectively; t(6.48) = −1.89, p = 0.105) or the number of research design courses required (M = 1.50 and M = 1.96 respectively; t(5.68) = −1.80, p = 0.124). As analyses revealed no significant differences within programs at the doctoral (PhD and PsyD) and subdoctoral (MA and EdS) levels in quantity of statistics or research design course requirements, all further analyses compared doctoral versus subdoctoral training programs. As would be expected, significant differences were observed between subdoctoral and doctoral programs in the number of students admitted each year (M = 12.05 and M = 9.41 respectively; t(94) = 2.03, p = 0.045) and the number of hours required for the degree (M = 68.70 and M = 101.38 respectively;
t(91) = −13.44, p < 0.001). Significant differences were also observed between subdoctoral and doctoral programs in the quantity of required statistics courses (M = 1.44 and M = 2.91 respectively; t(98) = −6.72, p < 0.001) and research design classes (M = 1.08 and M = 1.48 respectively; t(89) = −2.87, p = 0.005). Percentages of the programs that require particular numbers of statistics and research design courses respectively are presented in Tables 1 and 2. The survey also requested information on specific statistics and research design competencies. Seventeen statistics competencies were identified in the survey. As can be seen in Table 3, of all programs, regardless of level, 90 percent or better agreement was reached on the importance of students knowing correlation (99 percent), descriptive statistics (99 percent), ANOVA (96 percent), and t-tests (97 percent). Chi square was the next most common statistics competency with 86 percent of programs agreeing to its importance. This was followed by multiple regression (79 percent), power (76 percent), probability theory (67 percent), nonparametric statistics (58 percent), the central limit theorem (60 percent), ANCOVA (56 percent), exploratory data analysis (48 percent), MANOVA (42 percent), factor analysis (41 percent), discriminant function analysis (30 percent), Latin squares (14 percent), and confirmatory factor analysis (8 percent). A comparison of doctoral and subdoctoral programs using chi square analyses indicated differential emphasis being placed on knowledge of ANCOVA [χ²(1, n = 95) = 20.81, p < 0.001], confirmatory factor analysis [χ²(1, n = 95) = 3.87,
Table 1: Number of required statistics classes by degree in percent

                   Number of statistics classes
Degree           0     1     2     3     4     5     6    Mean
Overall          3    46    25    14     9     1     2    1.91
MA/MS/MEd        9    50    27    14     0     0     0    1.46
Specialist       2    67    20     7     4     0     0    1.44
PsyD             0    29    29    14    29     0     0    2.43
PhD              0     8    32    28    20     4     8    3.04
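The Mean column of Table 1 can be recovered from the percentage distribution in each row, since the mean number of classes is the sum of each class count weighted by its share of programs. The sketch below is an illustrative check, not part of the original analysis. It divides by the row total rather than by 100 because rounded percentages do not always sum to exactly 100.

```python
def implied_mean(percents):
    """Mean count implied by a percent distribution over 0, 1, 2, ... classes."""
    total = sum(percents)  # may be 99 or 101 because of rounding
    return sum(k * p for k, p in enumerate(percents)) / total


# Rows of Table 1: percent of programs requiring 0..6 statistics classes.
table1 = {
    "Overall":    [3, 46, 25, 14, 9, 1, 2],
    "MA/MS/MEd":  [9, 50, 27, 14, 0, 0, 0],
    "Specialist": [2, 67, 20, 7, 4, 0, 0],
    "PsyD":       [0, 29, 29, 14, 29, 0, 0],
    "PhD":        [0, 8, 32, 28, 20, 4, 8],
}

for degree, row in table1.items():
    print(f"{degree:<11} {implied_mean(row):.2f}")
```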
Table 2: Number of required research design classes by degree in percent

                 Number of research design courses
Degree           0     1     2     3    Mean
Overall          9    66    22     4    1.20
MA/MS/MEd       10    62    29     0    1.46
Specialist      12    74    13     0    1.44
PsyD             0    25    75     0    2.43
PhD              4    61    22    13    3.04
Table 3: Percent programs requiring statistics competencies by degree

Competency                        All   MA/MS   EdS   PsyD   PhD
ANCOVA                             56      48    36     83    95
ANOVA                              96      95    93    100   100
Central limit theorem              60      52    59     33    71
Chi square                         86      76    86     83    95
Confirmatory factor analysis        8       0     7     33    10
Correlation                        99     100    97    100   100
Descriptive statistics             99      95   100    100    52
Discriminant function analysis     30      19    18     67    52
Exploratory data analysis          48      38    41     33    71
Factor analysis                    41      35    32     67    38
Latin square                       14      19     0     17    29
MANOVA                             42      33    30     50    67
Multiple regression                79      67    73    100    95
Nonparametric statistics           58      76    52     83    52
Power                              76      67    71    100    86
Probability theory                 67      71    71     50    62
t-tests                            97      95    96    100   100
Table 4: Percent of programs requiring research design competencies by degree

Competency                   All   MA/MS   EdS   PsyD   PhD
Concepts and constructs       90     90     93    100    83
Data collection methods       88     90     91    100    79
Ethics                        88     90     88    100    83
Experimental designs          96     90    100     83    96
Hypothesis testing            97    100     98    100    92
Operational definitions       85     95     81    100    79
Quasi-experimental design     85     80     81    100    92
Reliability                   93     80     93    100   100
Sampling                      86     85     84    100    88
Scientific approach           91     80     95     83    96
Single-subject designs        60     60     56     67    67
Subject selection             83     55     88    100    92
Types of variables            88     85     88    100    88
Validity                      91     75     95    100    96
p = 0.049], discriminant function analysis [χ²(1, n = 95) = 20.01, p < 0.001], exploratory data analysis [χ²(1, n = 95) = 50.80, p = 0.016], Latin squares [χ²(1, n = 95) = 9.88, p = 0.002], MANOVA [χ²(1, n = 95) = 10.85, p < 0.001], multiple regression [χ²(1, n = 95) = 8.28, p = 0.004], and power [χ²(1, n = 95) = 4.83, p = 0.028], with doctoral programs being more likely to require each. Fourteen research design competencies were identified in the survey. As can be seen in Table 4, of all programs, regardless of level, 90 percent or better agreement was reached on the importance of students knowing hypothesis testing (97 percent), experimental design (96 percent), reliability (93 percent), scientific approach (91 percent), validity (91 percent), and
Little et al.
Statistics and Research Design
Table 5: Quantity of statistical packages by level of program (percent of programs)

              Number of packages available
Degree          0     1     2     3    4 or more
Masters        20    50     5    25       0
Specialist      7    46    20    17       9
PsyD            0    43    43    14       0
PhD             0    10    43    38      16
Table 6: Availability of statistical packages by level of program (percent of programs)

              Package available
Degree        SPSS   SAS   Other   None
Masters         68    27     18     23
Specialist      81    21     21     11
PsyD            86    43     29      0
PhD             88    64     36      0
concepts and constructs (90 percent). Data collection methods (88 percent), ethics (88 percent), types of variables (88 percent), operational definitions (85 percent), sampling (86 percent), quasi-experimental designs (85 percent), and subject selection (83 percent) reached a level of agreement of at least 80 percent. Only single subject design (57 percent) was considered important by less than 80 percent of responding programs. A series of chi square analyses indicated no significant differences between doctoral and subdoctoral programs on any of the identified competencies (all p > 0.05). The quantity and variety of statistical software packages was also examined for each level of program (Tables 5 and 6). Consistent with increased statistical competencies with doctoral level education came a greater quantity and variety in statistical software packages. While 20 percent of Masters and seven percent of Specialist programs offered their students no statistical software, all of the PsyD or PhD programs offered at least one. Masters (50 percent) and Specialist programs (46 percent) were most likely to offer one package, with 5 percent of Masters and 20 percent of Specialist programs offering two and 25 percent of Masters and 17 percent of Specialist programs offering three. Forty-three percent of PsyD and 10 percent of PhD programs offered one package, 43 percent of both PsyD and PhD programs offered two packages and 14 percent of PsyD programs and 38 percent of PhD programs offered three packages. Ten percent of PhD programs offered four or more packages. When only one package was offered it was most likely SPSS (80 percent), with 9 percent offering SAS and 11 percent offering assorted other packages. Overall, 68 percent of Masters, 81 percent
Specialist, 86 percent of PsyD and 88 percent of PhD programs offered their students access to SPSS while 27 percent of Masters, 21 percent of Specialist, 43 percent of PsyD and 64 percent of PhD programs offered SAS. Programs outside of the US were sampled solely to determine the quantity of statistics and research design courses required of students. One factor that became evident is that training standards and terminology differ to a large extent from country to country, making anything but the broadest comparisons difficult. The programs surveyed offered various degrees such as Postgraduate Diploma, Master of Educational Psychology, MA, MEd, MSci, and PhD. Results indicated that all of the programs identified required at least some coursework in research design and/or statistics. Programs in Canada most resembled the patterns found in the US. Programs in the other countries surveyed focused more on research methods/design courses. In other countries, a course identified as Statistics was much less likely to be found than in the US or Canada.
Discussion
Results of the survey of school psychology training programs in the US suggest that doctoral programs require a greater number of statistics and research design classes than do subdoctoral programs. This is to be expected as it is argued that the doctoral degree, especially the PhD, is a much more research oriented degree than the Masters or Specialist degrees (Martens and Keller, 1987). If the focus of training at the doctoral level is primarily that of scientist-practitioner and the focus of subdoctoral level training is more pragmatic (Fagan and Wise, 2000), then a larger number of courses in statistics and research design at the doctoral level would be logical. Fagan (1995) reports that the majority of degree recipients, regardless of level of training, become practitioners. One must then ask the question, how important is a knowledge of statistics and research design to school psychology practice and is there a minimum level of competency in these disciplines for practicing school psychologists? Results indicate that there is a greater emphasis on statistics competency at the doctoral level than at the subdoctoral level; however, research design competencies are remarkably similar regardless of level of education. It is clear that all levels require a certain level of competency in research design and an understanding of basic statistical principles. We believe this is essential for the practice of school psychology. The school psychologist is frequently the only one in the school environment who has been educated from the perspective of a scientist. This allows him or her to approach problem solving in ways teachers or other school personnel may not consider. Research and statistical competencies at all levels of training are therefore essential. These results suggest that all levels are receiving at least the basic education to allow this to occur.
In addition, the increased training doctoral-level practitioners receive in statistics and research design should prove valuable to them in expanding their role to include more systemic interventions and being a catalyst for systems change initiatives. Internationally, a similar pattern appears evident. All programs surveyed required some training in research design and/or statistics. While training standards may differ from country to country, the importance of school/educational psychology being grounded in a solid foundation of research understanding appears universal. Differences were observed in specific statistical competencies between subdoctoral and doctoral level programs. Doctoral level programs were more likely to require knowledge of ANCOVA, confirmatory factor analysis, discriminant function analysis, exploratory data analysis, Latin squares, MANOVA, multiple regression and power. While making up a small percentage of the work force, trainers and researchers do provide a vital service to the profession. These individuals are most likely to hold the doctorate. As statistical competency may be considered important to be able to produce and disseminate reliable and valid research, the increased emphasis on statistical competencies at the doctoral level appears justified. It has been over 25 years since anyone has conducted a comprehensive analysis of training standards in school psychology. In a 1977 publication, Goh found a greater emphasis on ‘quantitative methods’ in doctoral programs than in either Masters or Specialist level programs. In the 1980s, Brown and Minke (1986) found that ‘doctoral programs contain a much more comprehensive quantitative-methods curriculum (i.e. four to five courses), whereas the average specialist program requires two such courses’ (p. 1336). This same trend continues today. What this study investigated, and neither Goh nor Brown and Minke examined, was the specifics of training and how they differed depending on the level of training.
The current results suggest several outcomes: training in statistics and research design is considered to be an important component in all levels of training in the preparation of school psychologists; doctoral level programs require a greater number of courses in both statistics and research design; there is little difference in the competencies that students are expected to acquire in research design, regardless of level; and while there are core statistical competencies, doctoral level students are expected to have a greater understanding of a broader array of statistical concepts than subdoctoral students. The results of this study are mostly applicable to the training of school psychologists in the US. While an attempt was made to ascertain training standards at universities in other countries, those data are superficial at best. It would be helpful if future research examined training standards as they currently exist internationally. Such a comparative study of training standards in all areas of school/educational psychology would help trainers in all countries to explore commonalities and differences in training. More effective training models could be developed that would aid in professional growth. These programs could become a potent force for improving the lives and education of children in all nations.
References
American Psychological Association (2002) ‘Accredited Doctoral Programs in Professional Psychology: 2002’, American Psychologist 57: 1096–109.
Brown, D. T. (1989) ‘The Evolution of Entry Level Training in School Psychology: Are We Now Approaching The Doctoral Level?’, School Psychology Review 18: 11–15.
Brown, D. T. and Minke, K. M. (1986) ‘School Psychology Graduate Training: A Comprehensive Analysis’, American Psychologist 41: 1328–38.
Carey, K. T. and Wilson, M. S. (1995) ‘Training school psychologists’, in A. Thomas and J. Grimes (eds) Best Practices in School Psychology – III, pp. 171–78. Washington, DC: National Association of School Psychologists.
Cobb, C. T. (1989) ‘Is it Time to Establish the Doctorate Entry-Level’, School Psychology Review 18: 16–19.
Coulter, W. A. (1989) ‘The Entry Level for School Psychology: A Modest Proposal’, School Psychology Review 18: 20–24.
Fagan, T. K. (1989a) ‘Debate: Is it Time to Establish the Doctorate as the Entry-Level for School Psychology?’, School Psychology Review 18: 9–10.
Fagan, T. K. (1989b) ‘On the Entry Level Debate’, School Psychology Review 18: 34–36.
Fagan, T. K. (1990) ‘Best Practices in the Training of School Psychologists: Considerations for Trainers, Prospective Entry-level and Advanced Students’, in A. Thomas and J. Grimes (eds) Best Practices in School Psychology – II, pp. 723–41. Washington, DC: National Association of School Psychologists.
Fagan, T. K. (1995) ‘Trends in the History of School Psychology in the United States’, in A. Thomas and J. Grimes (eds) Best Practices in School Psychology – II, pp. 59–67. Washington, DC: National Association of School Psychologists.
Fagan, T. K. and Wise, P. S. (2000) School Psychology: Past, Present, and Future, 2nd edn. Bethesda, MD: National Association of School Psychologists.
Frank, G. (1984) ‘The Boulder Model: History, Rationale, and Critique’, Professional Psychology: Research and Practice 15: 417–35.
Goh, D. S. (1977) ‘Graduate Training in School Psychology’, Journal of School Psychology 15: 207–18.
Korman, M. (1974) ‘National Conference on Levels and Patterns of Professional Training in Psychology’, American Psychologist 29: 441–49.
Little, S. G. and Rodemaker, J. E. (1997) ‘Master’s Versus Specialist Degrees in School Psychology: Is There a Difference?’, Journal of Psychological Practice 3: 72–78.
Martens, B. K. and Keller, H. R. (1987) ‘Training School Psychologists in the Scientific Tradition’, School Psychology Review 16: 329–37.
National Association of School Psychologists (2003) ‘NASP-Approved Graduate Programs in School Psychology’, retrieved February 20, 2003, from http://www.nasponline.org/certification/NASPapproved.html.
Noble, K. A. (1994) Changing Doctoral Degrees: An International Perspective. Bristol, PA: The Society for Research into Higher Education and Open University Press.
Prasse, D. P. (1989) ‘Polarity: The Past and the Future’, School Psychology Review 18: 25–29.
Slate, J. R. (1989) ‘Where’s The Data?’, School Psychology Review 18: 3–31.
Stone, B. J. (1989) ‘In Support of the Doctoral Degree as the Entry Level for School Psychologists in the 1990s’, School Psychology Review 18: 32–33.
Thomas, A. (1998) Directory of School Psychology Graduate Programs. Bethesda, MD: National Association of School Psychologists.
Welsh, J. S., Rosenthal, G. T. and Stout, L. J. (1990) ‘Comments on the Second Entry-level Debate: A Modest Reply to a Modest Proposal’, School Psychology Review 19: 122–25.
94
The Role of Measurement Error in Familiar Statistics
Malcolm James Ree and Thomas R. Carretta

Measurement error is an integral part of measurement and is frequently indexed by reliability. The reliability of a measure is the ratio of true variability to total variability. In simple nontechnical language, reliability means precision. Most educational researchers and psychologists learned about reliability in courses in psychometrics. Statistical techniques such as descriptive statistics, regression, or analysis of variance are taught in separate courses. In some instances, the two are combined, such as when Spearman’s true correlation is introduced; this, however, is frequently the only time. More recently, courses covering meta-analytic techniques frequently bridge that gap. The purpose of this article is to show the role of reliability in familiar statistics and to show how ignoring the consequences of (less than perfect) reliability in common statistical techniques can lead to false conclusions and erroneous interpretation. Because of their widespread use in applied research and application, we will illustrate the role of reliability with examples from descriptive statistics, z tests and t tests, correlation, partial correlation, linear regression, test bias analysis, factor analysis, analysis of variance, and analysis of covariance.
Source: Organizational Research Methods, 9(1) (2006): 99–112.

Measurement Error Model
The true score model is the most frequently used measurement error model (see Fuller, 1987). In this model, the basic equation states that the observed score is equal to the true score plus an error score. Furthermore, the error score is assumed to be random and therefore independent of the true score. The true score and error score are not correlated. If the error score is not random, the result is called bias and the consequences may be situationally specific. There are measurement models for nonrandom error, but the current article is limited to random error. The basic true score equation is $O = t + e$, or Observed = True + Error. This yields¹

$$\sigma^2_{\text{observed}} = \sigma^2_{\text{true}} + \sigma^2_{\text{error}}. \tag{1}$$

By definition, reliability (Stanley, 1971) is the ratio of true score variance to observed score variance, or reliability $= r_{XX'} = \sigma^2_{\text{true}} / \sigma^2_{\text{obs}}$. This is equivalent to $r_{XX'} = 1 - (\sigma^2_{\text{error}} / \sigma^2_{\text{obs}})$, or reliability equals 1 minus the proportion of error variance to observed variance. The effects of measurement error on familiar statistical techniques can be determined from these equations. The purpose of the current effort is to explain the consequences of the use of less than perfectly reliable measures in statistical analyses.
The Variable Is Not Reliable
It is not unusual to hear people say, “That test is reliable” or “That is a reliable measure” of some construct. However, Thompson (2003) has forcefully made the point that a test (or other measured variable) is neither reliable nor unreliable. Reliability concerns the scores of the measure and is a consequence of the sample at hand. “It is important to evaluate score reliability in all studies, because it is the reliability of the data in hand that will drive study results, and not the reliability of the scores described in the test manual” (Thompson, 2003, p. 5). Your sample will almost surely differ from the normative sample reported in the test manual. It may differ in composition by gender, ethnicity, age, experience, education, testing circumstances, or many other variables. These differences will cause the reliability of your sample to be different from the reliability reported in the manual, and your results will be driven by the reliability of your sample.

Descriptive Statistics
Mean
The effect of unreliability on the mean is benign. Because the error score is random, the mean of the error score is expected to be zero. Therefore, the expectation of the observed mean equals the true mean. The bias caused by measurement error on the observed mean is nil.
Variance and Standard Deviation
The effects of measurement error on the variance and standard deviation are not so agreeable. Returning to the true score equation, we see that the observed variance is the sum of the true and error variances ($\sigma^2_{\text{observed}} = \sigma^2_{\text{true}} + \sigma^2_{\text{error}}$). Consequently, the observed variance is greater when error increases. Observed score variance is always greater than true score variance when the variable has been measured with less than perfect reliability. If the true variance ($\sigma^2_{\text{true}}$) is 100 and the error variance ($\sigma^2_{\text{error}}$) is 10, the observed variance ($S^2_{\text{obs}}$) will be 110. If the true variance remains 100 and the error variance is 20, the observed variance will be 120. Note that the effect on the standard deviations will appear to be less, at 10.49 ($\sqrt{110}$) and 10.95 ($\sqrt{120}$). In these cases, the reliability of the two scores would be .91 ($\sigma^2_{\text{true}}/\sigma^2_{\text{obs}} = 100/110$) and .83 ($\sigma^2_{\text{true}}/\sigma^2_{\text{obs}} = 100/120$), respectively. The biasing influence on effect size will be discussed in a subsequent section.
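The arithmetic in this example can be checked with a short Python sketch. The function names are illustrative and not part of the original article; the only inputs are the true and error variances quoted in the text.

```python
import math


def observed_variance(true_var: float, error_var: float) -> float:
    """Observed variance under the true score model: true plus error."""
    return true_var + error_var


def reliability(true_var: float, error_var: float) -> float:
    """Reliability as the ratio of true variance to observed variance."""
    return true_var / observed_variance(true_var, error_var)


# True variance of 100 with error variances of 10 and 20, as in the text.
for error_var in (10.0, 20.0):
    obs = observed_variance(100.0, error_var)
    print(
        f"error={error_var:.0f} observed_var={obs:.0f} "
        f"sd={math.sqrt(obs):.2f} "
        f"reliability={reliability(100.0, error_var):.2f}"
    )
```

Run as written, the loop reports observed variances of 110 and 120, standard deviations of 10.49 and 10.95, and reliabilities of .91 and .83.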
The z Test and the t Test
The basic form of the z test and the t test is a sample statistic minus a population parameter in the numerator, divided by a standard error in the denominator. In the case of the z or t test of a mean, the benign effect on the mean precludes changes in the numerator. The effect of reliability is found in the denominator. The standard error (or estimated standard error in the t test) is the standard deviation divided by the square root of n, the sample size. Consider the two-sample (or independent-samples) two-tailed z test using a .05 Type I error rate. With a difference between the means of 3.6, a sample size of 30, and a true standard deviation (i.e., measured without error, $r_{XX'} = 1.0$) of 10, the computed z value would be significant at 1.972. If the standard deviation were increased to 11 by unreliability ($r_{XX'} = .83$), the z test statistic would not be significant with a value of 1.793. If the reliability were reduced further, the z value also would be reduced. For example, with the same sample size and mean difference, but reliability reduced to $r_{XX'} = .625$ (observed standard deviation = 16), the z value is 1.232 and would not be significant.
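The z values quoted above can be reproduced with a minimal Python sketch. It assumes the standard error is the standard deviation divided by the square root of n, as the paragraph states, and takes the three standard deviations (10, 11, and 16) directly from the text; the function name and the 1.96 critical value for a two-tailed .05 test are supplied here for illustration.

```python
import math


def z_statistic(mean_difference: float, sd: float, n: int) -> float:
    """Mean difference divided by the standard error, SD / sqrt(n)."""
    return mean_difference / (sd / math.sqrt(n))


CRITICAL_Z = 1.96  # two-tailed critical value at a .05 Type I error rate

# Mean difference of 3.6 and n = 30, with the SDs used in the text.
for sd in (10.0, 11.0, 16.0):
    z = z_statistic(3.6, sd, 30)
    print(f"sd={sd:.0f} z={z:.3f} significant={abs(z) > CRITICAL_Z}")
```

Only the first case, with the error-free standard deviation of 10, exceeds the critical value.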
Confidence Intervals
Another way to evaluate the effects of unreliability is to look for differences in the width of confidence intervals. Addition of error variance to true variance causes the confidence intervals to increase. With a sample size of 30 and a true standard deviation of 10, when the reliability is 1.0, the true standard error is 1.826. If the reliability were reduced to .830 or to .625, the standard errors become 2.008 and 2.921, respectively. The confidence interval becomes wider.
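A brief Python sketch shows how the widening plays out. It assumes the observed standard deviations of 11 and 16 that the z test example pairs with the two reduced reliabilities, and a 95 percent interval built from a critical value of 1.96; both function names are illustrative.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)


def ci_half_width(sd: float, n: int, critical_z: float = 1.96) -> float:
    """Half-width of a z-based confidence interval for the mean."""
    return critical_z * standard_error(sd, n)


# n = 30 with the error-free SD (10) and the two inflated SDs (11 and 16).
for sd in (10.0, 11.0, 16.0):
    se = standard_error(sd, 30)
    print(f"sd={sd:.0f} se={se:.3f} half_width={ci_half_width(sd, 30):.2f}")
```

Because the half-width is a fixed multiple of the standard error, any inflation of the standard deviation by error variance widens the interval in the same proportion.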
Effect Size
Less than perfect reliability also will have an influence on effect size ($(\mu_1 - \mu_2)/\sigma_e$) (see Baugh, 2002, for an insightful discussion of the issue). Russell and Peterson (2002) reported the effect size for African American means versus White means on a series of tests in a research project. They discuss a spatial test called Reasoning, which had an effect size of .77. Russell and Peterson noted that many tests show an African American versus White effect size of 1.0, and their experimental tests were on average less than 1.0. The Reasoning test had a test-retest reliability of .65, and correcting the effect size for this unreliability, the true effect size becomes .96, very close to the size reported frequently for such differences. This change in effect size occurred because of the change in the estimate of the standard deviation when unreliability was accounted for. A different conclusion about the Reasoning test would have been reached had unreliability been taken into account. Clearly, unreliability causes a reduction in statistical power, an artifactual increase in confidence intervals, and a bias in estimating effect size. Ignoring the effects of unreliability will lead to inappropriate conclusions and inferences about the tests and the constructs being studied.
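The text does not print the correction it applied to the Reasoning test. A common form divides the observed effect size by the square root of the reliability, and that form reproduces the reported value, so it is assumed in the following Python sketch; the function name is illustrative.

```python
import math


def corrected_effect_size(observed_d: float, reliability: float) -> float:
    """Correct a standardized mean difference for unreliability.

    Assumes the correction d / sqrt(r_xx), which rescales the standard
    deviation in the denominator from observed to true score units.
    """
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    return observed_d / math.sqrt(reliability)


# Observed effect size .77 and test-retest reliability .65, from the text.
print(round(corrected_effect_size(0.77, 0.65), 2))  # 0.96
```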
Correlation
With the increased popularity of the meta-analytic technique of validity generalization (Hunter & Schmidt, 1990, 2004), the correction for attenuation has become well known again, at least by industrial/organizational psychologists (see Ree & Earles, 1993). Spearman (1904) demonstrated that the correlation between the observed scores of two variables was a function of the reliability of the two variables. The well-known formula that expresses this is

$$r_{XY} = r_c \sqrt{r_{XX'}\, r_{YY'}}, \tag{2}$$

where $r_c$ is the estimate of the true correlation (sometimes written $r_{\tau_X \tau_Y}$, where $\tau$ indicates true score), $r_{XY}$ is the observed attenuated correlation, and $r_{XX'}$ and $r_{YY'}$ are the reliabilities of X and Y, respectively. For example, if two measures of the same construct (true score correlation of 1.0) each have a reliability of .8, the maximum correlation between the two ($r_{XY}$) is .8. If one of the measures has a reliability of .6 and the other .8, the maximum observed² correlation would be .69. Ignoring the consequences of reliability of the measures, the conclusion would be that there is a moderate to strong correlation rather than the perfect correlation obtained at the true score level. A practical consequence of this might be the search for new predictors to close the (specious) gap between .69 and 1.0. Observed correlations can be corrected for the unreliability of the variables by using an algebraic manipulation of Equation 2 to yield

$$r_c = r_{XY} / \sqrt{r_{XX'}\, r_{YY'}}. \tag{3}$$

Consider an observed correlation of .72, where both variables X and Y have reliabilities of .8. Using the equation above for correcting the correlation, the true correlation between X and Y is .9. That is, $r_c = .72 / \sqrt{.8 \times .8} = .9$. Correlations between variables that change from low or moderate to moderate or high after correction for (less than perfect) reliability suggest that the variables’ utility could be improved if they were made more reliable. In addition, low to moderate correlations that do not increase in magnitude after correction for unreliability suggest that the variables contain other sources of variance.
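Equations 2 and 3 translate directly into a few lines of Python. The sketch below uses illustrative function names and only the reliabilities and correlations quoted in this section.

```python
import math


def attenuated_r(true_r: float, rel_x: float, rel_y: float) -> float:
    """Equation 2: observed correlation implied by a true correlation."""
    return true_r * math.sqrt(rel_x * rel_y)


def corrected_r(observed_r: float, rel_x: float, rel_y: float) -> float:
    """Equation 3: estimate of the true correlation from an observed one."""
    return observed_r / math.sqrt(rel_x * rel_y)


print(round(attenuated_r(1.0, 0.8, 0.8), 2))  # 0.8
print(round(attenuated_r(1.0, 0.6, 0.8), 2))  # 0.69
print(round(corrected_r(0.72, 0.8, 0.8), 2))  # 0.9
```

The two functions are inverses of each other, so correcting an attenuated value with the same reliabilities returns the true correlation that produced it.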
Partial Correlation
Partial correlation is the correlation between two variables, X and Y, while holding a third variable, Z, constant. Whether used for control or for selecting variables for stepwise regression, the role of reliability in partial correlation can be large. Consider the following example with three variables, X, Y, and Z, which measure the same construct with perfect reliability. The true correlation between X and Y, X and Z, and Y and Z would be 1.0. The partial correlation between any pair holding the other constant (i.e., partialing it out) would be .0. If the reliability of all measures were .8, the partial correlation can be given as .44:

$$r_p = \frac{r_{XY}(.8) - r_{XZ}(.8)\, r_{YZ}(.8)}{\sqrt{1 - (r_{XZ} \times .8)^2}\,\sqrt{1 - (r_{YZ} \times .8)^2}} = .44.$$
Note that the value goes from no partial relationship (.0) to a moderate (.44) partial relationship. This is a big difference and might have substantial implications for theory, application, and policy. Caution is urged in interpretation. If Z were a variable used for control by partialing it out, its reliability would be influential in the estimation. For example, researchers partialed out age (Z in this example) to estimate the true correlation between leg length (X ) and running speed (Y ). Suppose that age (Z ), the variable to be partialed out, had a reliability of .4 and the triad of correlations among X, Y, and Z was truly 1. The observed partial correlation between leg length (X ) and running speed (Y ) would be .29 rather than .0. The observed correlation of .29 is a poor estimate of the true correlation, and the researcher would make erroneous conclusions about the relationship between the variables.
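The first example in this section, in which all three reliabilities are .8, can be sketched in Python as follows. The sketch assumes the standard first-order partial correlation formula and attenuates each true correlation of 1.0 by the square root of the product of the two reliabilities; the function names are illustrative.

```python
import math


def attenuated_r(true_r: float, rel_a: float, rel_b: float) -> float:
    """Observed correlation implied by a true correlation and reliabilities."""
    return true_r * math.sqrt(rel_a * rel_b)


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of X and Y with Z held constant."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1.0 - r_xz**2) * (1.0 - r_yz**2))
    return numerator / denominator


# True correlations of 1.0 among X, Y, and Z; every reliability is .8,
# so each observed correlation is attenuated to .8.
r_obs = attenuated_r(1.0, 0.8, 0.8)
print(round(partial_r(r_obs, r_obs, r_obs), 2))  # 0.44
```

The observed partial correlation of .44 arises entirely from measurement error, since all three variables measure the same construct at the true score level.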
Linear Regression Coefficients
Simple Linear Regression with One Predictor
Consider a simple linear regression of Y on X. In the explanation of this regression, many statistics texts contain a single line such as “it is assumed that all predictors are fixed variables measured without error.” The role of measurement error in estimation of raw score regression weights, b, is given by

$$b = \beta\, r_{XX'}, \tag{4}$$

and for the regression constant (or intercept), we have

$$a = \bar{Y} - (b / r_{XX'})\, \bar{X}. \tag{5}$$
In the case of a one-predictor regression, the effect is direct and easy to understand. The b coefficient is biased toward zero, and the a coefficient is inflated. They are biased estimates of the population parameters. Unreliability in the criterion has no biasing effect on the regression coefficients; however, it does attenuate the correlation between the predictor and criterion. There is a simple method to correct these biased estimates. The b coefficient is divided by the reliability of the predictor variable, and this b coefficient is then placed in Equation 5 for the intercept. Suppose job performance criterion Y is regressed on test X, yielding the regression equation Y = 2.0 + 1.6X and that the reliability of test X is .80. Correcting the b coefficient gives (1.6/.8 = 2), and assuming means of 5 for the X variables and 10 for the Y variables, correcting the intercept gives 10 – 2(5) = 0. The corrected regression equation is Y = 0 + 2X.
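The two-step correction described above (divide the slope by the predictor reliability, then recompute the intercept from the means) can be sketched in Python. The function name is illustrative; the inputs are the values from the job performance example.

```python
def correct_simple_regression(
    b_observed: float, rel_x: float, mean_x: float, mean_y: float
) -> tuple[float, float]:
    """Return the corrected (intercept, slope) for a one-predictor regression.

    The slope is divided by the predictor reliability (Equation 4 solved
    for beta), and the intercept is recomputed from the means with the
    corrected slope (Equation 5).
    """
    b_corrected = b_observed / rel_x
    a_corrected = mean_y - b_corrected * mean_x
    return a_corrected, b_corrected


# Observed equation Y = 2.0 + 1.6X, reliability .80, means of 5 (X) and 10 (Y).
a, b = correct_simple_regression(1.6, 0.8, 5.0, 10.0)
print(f"Y = {a:.1f} + {b:.1f}X")  # Y = 0.0 + 2.0X
```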
Multiple Regression
When there are multiple predictors, the effects on the regression coefficients become complex and difficult to specify simply. The effect of reliability is a function of the reliability magnitudes and the true score correlations among the predictors. Unreliability in the criterion has no biasing effect on the regression coefficients; however, it does attenuate the multiple correlations between the predictors and criterion. Aiken and West (1991) provided an instructive example for the case of two independent variables, X and Z, used to predict the criterion Y. In this case, the standardized regression weight being estimated is the partial regression coefficient of Y on X holding out the effect of Z. The unreliability of the variable being partialed out has a substantial effect on the partial regression coefficient of the other variable. Even if one independent variable in a regression were measured with perfect reliability, the unreliability of the other independent variables will have a biasing effect on the regression coefficient associated with the independent variable measured without error. The standardized regression coefficient is given by

$$b_{YX \cdot Z} = (r_{YX} - r_{YZ}\, r_{XZ}) / (1 - r_{XZ}). \tag{6}$$

To correct this equation for unreliability of variable Z, it is necessary to write it as

$$cb_{YX \cdot Z} = (r_{YX}\, r_{ZZ'} - r_{YZ}\, r_{XZ}) / (1 - r_{XZ}). \tag{7}$$

For example, if X is measured without error, the reliability of Z is .64, and $r_{YX} = r_{YZ} = r_{XZ} = .5$, the corrected standardized coefficient is $cb_{YX \cdot Z} = ((.5 \times .64) - (.5 \times .5))/(1 - .5) = .07/.5 = .14$. The two-variable case can be extended to the case of many independent variables.
Interpretation of Regression Coefficients
The failure to include reliability in the interpretation of the regression equation causes problems in several ways depending on the use made of the regression equation and its coefficients. The first is in the interpretation of the relative importance of the constructs related to the predictors. Frequently, researchers compare weights and derive meaning of the relative importance of the constructs represented by the observed variables such as verbal or mathematical ability. The uncorrected regression weights are not dependable indicators of the importance of the independent variables; therefore, interpretation of them can lead to erroneous conclusions. Consider an aptitude test with three equally reliable measures representing reading skill, mathematics knowledge, and space perception. Furthermore, the source of validity is limited to the common first factor (i.e., g) underlying each test in the battery and no validity, in this example, is due to the specific measurement (i.e., s) of each test. Under these conditions, each test should have the same regression weight when used in a regression equation to predict the criterion. However, if there are differences in test reliabilities, the regression coefficients will vary differentially from their true population values. Suppose these three example tests have reliabilities of .65, .70, and .85, respectively. In estimation, the three regression coefficients will differ because of their reliability. For example, if the three uncorrected regression coefficients were .195, .210, and .255, some might interpret this to mean that space perception is 1.3 times (.255/.195) as important as reading skill. In reality, the only difference is in the reliability of the measures. In his computer programs called “Package,” John Hunter (personal communication, May 1, 1995) has a regression procedure that allows for explicit correction for unreliability and corrects the regression coefficients.
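The point of the three-test example can be made concrete with a small Python sketch. It applies the one-predictor correction (dividing each weight by its reliability) to each coefficient separately. That is a simplifying assumption made here for illustration only, since the full multiple-regression correction also depends on the correlations among the predictors. The dictionary names are hypothetical.

```python
reliabilities = {
    "reading skill": 0.65,
    "mathematics knowledge": 0.70,
    "space perception": 0.85,
}
observed_weights = {
    "reading skill": 0.195,
    "mathematics knowledge": 0.210,
    "space perception": 0.255,
}

# Apparent relative importance based on the uncorrected weights.
ratio = observed_weights["space perception"] / observed_weights["reading skill"]
print(f"uncorrected ratio: {ratio:.1f}")

# Dividing each weight by its own reliability removes the differences.
corrected = {
    name: observed_weights[name] / reliabilities[name] for name in reliabilities
}
for name, weight in corrected.items():
    print(f"{name}: corrected weight = {weight:.3f}")
```

Under this assumption each corrected weight equals .300, which is consistent with the statement that the tests differ only in reliability.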
Even if the reliability of the measures starts out the same, prior selection leads to reduction of reliability in the sample. Prior selection refers to the process of selecting a sample using some method, such as minimum qualification scores, that changes the variability of the scores in that sample. Gulliksen (1950, 1987, p. 124, Equation 5) provides the following equation to show the relationship between prior selection and reliability:

$$R_{XX} = 1 - (s^2_X / S^2_X)(1 - r_{XX}). \tag{8}$$
Consider the previous example with the three tests in which the sample has been selected on the basis of scores on the reading skill test, which has caused indirect selection (Thorndike, 1949) to occur on the mathematics and space-perception tests. This indirect selection is the result of the correlation of the variables. Given the same true regression coefficients and reduction in variance of 50%, 30%, and 20%, respectively, for reading skill, mathematics, and space perception, the reliabilities of the tests have changed differentially. The regression coefficients thus become differentially biased and poor estimates of the population values. Some would interpret these coefficients, and clearly, erroneous conclusions would be drawn. Another use of regression coefficients is in production of individual job-specific regression equations for personnel classification. Johnson and Zeidner (1991) have called for the use of linear programming to achieve optimal assignment of individuals to jobs by such systems. When the regression coefficients are computed in several range-restricted samples of job incumbents, the prior selection of the job incumbents causes the reliabilities of the tests to vary from sample to sample (Gulliksen 1950, 1987, p. 124, Equation 5). These varying reliabilities cause biases in the regression coefficients. In addition, the potential loss of homoscedasticity caused by range restriction from prior selection also biases the regression coefficients. When samples are preselected and homoscedasticity is maintained, the regression coefficient in the selected sample will not show bias due to heteroscedasticity (Cohen & Cohen, 1983). The benefits of the use of these biased coefficients in optimization (Johnson & Zeidner, 1991) may be illusory and due to nothing more than the reliability artifact.
Any technique that uses regression coefficients such as clustering, profile analysis (Nunnally & Bernstein, 1994) or policy capturing (Ward & Jennings, 1973) must take the unreliability of the variables into account or inappropriate inferences will be made.
Test Bias Detection Jensen (1980, pp. 383–386) and others (Cohen & Cohen, 1983; Crocker & Algina, 1986; Fuller, 1987) have shown that less than perfect reliability can influence the interpretation of models of test bias that rely on examination
of regression slope, intercept, and standard error of estimate. What may be mistakenly interpreted as test bias may in fact be due solely to unreliability. As Jensen noted, “Before concluding that a test is intrinsically biased, it should be determined how much of the apparent bias is attributable to the unreliability of the test” (p. 383). Test unreliability disadvantages high-scoring individuals, regardless of their group (e.g., ethnicity/race, gender, socioeconomic) membership. Therefore, any group with proportionally fewer high-scoring members will benefit (as a group) from a test’s unreliability. As noted by Jensen (1980), Hunter and Schmidt (1976, p. 1056) suggested that test unreliability by itself might account for half of the overprediction of grade point average for Blacks reported in the literature. In an unbiased test with perfect reliability, by definition, the slope, intercept, and standard error of estimate are the same for the groups being compared. Through several illustrative examples, Jensen (1980) showed that even in an unbiased test, unreliability reduces the regression slope, produces group differences in the Y intercept, and increases the standard error of estimate.
Regression Slope In a perfectly reliable test, the observed slope will be bYX. When reliability is less than 1, the slope becomes rXX bYX. If the reliability of the test were zero, the regression line would be horizontal (no slope). There is no group-difference effect of test unreliability on the slope, unless the reliabilities differ in the two groups.
Regression Intercept Interpretation of regression intercepts is hazardous when the predictor is not perfectly reliable. Jensen (1980) showed that if the test’s reliability is less than perfect and there are two groups and a single regression line, there must be two intercepts found solely because of the unreliability of the predictors. The difference in intercepts for the two groups will increase by an amount equal to
$$\Delta(k_A - k_B) = b_{YX}\,(1 - r_{XX})\,(\bar{X}_A - \bar{X}_B),$$
where $k_A$ and $k_B$ are the intercepts for groups A and B and $\bar{X}_A$ and $\bar{X}_B$ are the means. Furthermore, $b_{YX}$ is the raw score regression coefficient for the regression of Y on X, and $r_{XX}$ is the reliability of predictor X. The expected difference in intercepts is a function of group means, regression coefficient, and predictor reliability. For example, if the regression coefficient were 1 and $\bar{X}_A$ and $\bar{X}_B$ were 10 and 5 for a test (X) with reliability of .9, the expected difference in intercepts would be 0.5. If the reliability were decreased to .7, the expected difference in intercepts would increase to 1.5. If the reliability decreased further to .5, the expected intercept difference would increase to 2.5. The nature and magnitude of the artifact is made clear when we contrast this to the circumstance in which reliability is perfect and a zero difference in intercepts is found. The uncritical interpretation of different intercepts as bias is unwarranted.
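The slope and intercept expressions above are simple enough to check numerically. The following sketch reproduces the worked figures in the text (regression coefficient of 1, group means of 10 and 5); the function names are illustrative and not taken from Jensen (1980).

```python
def observed_slope(b_yx, r_xx):
    """Slope expected when the predictor has reliability r_xx."""
    return r_xx * b_yx


def intercept_gap(b_yx, r_xx, mean_a, mean_b):
    """Expected difference in group intercepts caused by unreliability alone."""
    return b_yx * (1.0 - r_xx) * (mean_a - mean_b)


for r_xx in (1.0, 0.9, 0.7, 0.5):
    slope = observed_slope(1.0, r_xx)
    gap = intercept_gap(1.0, r_xx, 10.0, 5.0)
    print(f"r_XX = {r_xx:.1f}: slope = {slope:.2f}, intercept gap = {gap:.2f}")
```

With perfect reliability the gap is zero, and it grows linearly as reliability falls, so an intercept difference by itself does not distinguish bias from measurement error.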
Standard Error of Estimate Test unreliability increases the standard error of estimate ($SE_{Y'}$) by an amount equal to
$$\Delta SE_{Y'} = \sigma_Y\left(\sqrt{1 - \left(r_{XY}\sqrt{r_{XX}}\right)^2} - \sqrt{1 - r_{XY}^2}\right).$$
Test unreliability increases the amount of overlap of the distributions of the predicted criterion scores for the two groups being compared. Finally, test unreliability decreases the standard deviation of the predicted criterion, $\sigma_{Y'} = \sigma_Y\sqrt{r_{XX}}$, by an amount equal to $\Delta\sigma_Y = (1 - \sqrt{r_{XX}})\,\sigma_Y$.
An Example of Corrected Test Bias Detection Analyses Carretta (1997) provided a practical example in a study of gender and ethnic group differences in the predictive utility of aptitude composites used to select U.S. Air Force pilot trainees. Uncorrected results showed group differences in predicted pilot training completion rates with overestimation for the minority group (women = .07 and Hispanics = .12) relative to the majority group (men and Whites). After correction for unreliability of the predictors, all differences were reduced to a trivial .0004 or less.
Validity Coefficient Test unreliability reduces the validity coefficient for both groups by an amount equal to $\Delta r_{XY} = (1 - \sqrt{r_{XX}})\,r_{XY}$. In addition, test unreliability increases the amount of overlap of the distributions of the predicted criterion scores for the two groups being compared. Finally, test unreliability decreases the standard deviation of the predicted criterion, $\sigma_{Y'} = \sigma_Y\sqrt{r_{XX}}$, by an amount equal to $\Delta\sigma_Y = (1 - \sqrt{r_{XX}})\,\sigma_Y$. A particularly interesting situation occurs in the tests of predictive bias (Cole, 1973) using regression models (Lautenschlager & Mendoza, 1986). Usually the first test of models in the detection of bias is a comparison of a four-parameter regression model against a two-parameter model. The two models tested are
$$\hat{Y} = a_1 + b_1 S + b_2 X + b_3 XS \tag{9}$$
and
$$\hat{Y} = a_4 + b_4 X, \tag{10}$$
where X is a test score, S is a categorical variable (often called a dummy variable) of 0 and 1 denoting group membership, and XS is the cross-product of X and S. Note that XS has a peculiar distribution, with zeros for the group coded 0 and test scores for the group coded 1. In the first model, a1 and b1 are intercepts and b2 and b3 are slopes. In the simpler model, a4 is the intercept and b4 is the slope. The first regression model can provide two lines; the second regression model can provide only one line. Frequently, the two groups considered have a mean score difference of 1σ. Consequently, the test has a different reliability for each group, and depending on the placement of the minimum cut score, the reliability may be made to differ further between the groups after selection. The effects of unreliability in the full model are more difficult to specify than in the reduced model, and comparison of the models in the presence of measurement error may lead to inappropriate inferences in the population.
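The comparison of the four-parameter and two-parameter models can be illustrated with a small simulation. The sketch below is an illustration under stated assumptions, not the procedure of Lautenschlager and Mendoza (1986): both groups share one true regression line, the group true-score means differ by one standard deviation, and the sample size, error standard deviations, and seed are hypothetical. It uses the fact that the full model, with S and XS included, fits a separate line in each group.

```python
import random


def sse_simple_regression(xs, ys):
    """Residual sum of squares from an ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))


def full_vs_reduced_f(x0, y0, x1, y1):
    """F statistic for the four-parameter model against the two-parameter model."""
    # Full model: one line per group, so its SSE is the sum of the group SSEs.
    sse_full = sse_simple_regression(x0, y0) + sse_simple_regression(x1, y1)
    # Reduced model: a single line through the pooled data.
    sse_reduced = sse_simple_regression(x0 + x1, y0 + y1)
    n = len(x0) + len(x1)
    return ((sse_reduced - sse_full) / 2) / (sse_full / (n - 4))


def simulate(error_sd, n_per_group=2000, seed=1):
    """Return the F statistic when X carries measurement error of SD error_sd."""
    rng = random.Random(seed)
    groups = []
    for true_mean in (0.0, 1.0):  # true-score means one SD apart
        true_x = [rng.gauss(true_mean, 1.0) for _ in range(n_per_group)]
        y = [t + rng.gauss(0.0, 0.5) for t in true_x]  # same true line in both groups
        observed_x = [t + rng.gauss(0.0, error_sd) for t in true_x]
        groups.append((observed_x, y))
    (x0, y0), (x1, y1) = groups
    return full_vs_reduced_f(x0, y0, x1, y1)


f_reliable = simulate(error_sd=0.0)    # X measured without error
f_unreliable = simulate(error_sd=1.0)  # within-group reliability of about .5
print(f"F with error-free X: {f_reliable:.2f}")
print(f"F with unreliable X: {f_unreliable:.2f}")
```

In this setup any large F for the unreliable predictor arises from measurement error alone, since the data were generated from a single regression line.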
Factor Analysis The role of reliability in factor analysis is well known and generally straightforward. The general model of factor analysis is that the variance of the observed variable is a linear combination of common factors and unique factors. The ratio of the variance associated with the common factor to the total variance of an observed variable is known as the communality of the observed variable (Fuller, 1987, pp. 60–61). Using Fuller’s notation, communality can be written as
$$k_{11}^2 \equiv \left[\beta_{11}^2\sigma_{xx} + \sigma_{ee11}\right]^{-1}\beta_{11}^2\sigma_{xx} \equiv 1 - \sigma_{YY11}^{-1}\sigma_{ee11}. \tag{11}$$
This quantity $k_{11}^2$, the communality, is an estimate of the reliability of the variable. The communality of a variable provides a lower bound estimate of its reliability (Baggaley, 1964). It is a lower bound estimate because it does not include the reliable variance measured by specific factors. The unique variance or uniqueness of a variable is (1 – communality). The unique factors are composed of specific variance and error variance. Symbolically, these relationships can be expressed as
$$X = h + u \tag{12}$$
or
$$X = h + s + e, \tag{13}$$
where X is an observed variable, h is the communality, u is the uniqueness, s is the specific, and e is the error. If the variable is associated with the factor, as the reliability of the observed variable increases and the error decreases, the loadings of the variable on the factors can be expected to increase. For example, if there are three variables
that have true loadings that are equal but are measured with differing reliability, the observed loadings will differ as a function of the reliability, with the more reliably measured variables receiving higher loadings.³ Interpretations of these observed loadings will lead to erroneous conclusions about the factorial causation of the variables because the differences are due to differing reliabilities and not differing relationships to the factor. To correct factor loadings for unreliability, the loadings for the observed variables can be divided by the reliability of the observed variables. These corrected loadings give more appropriate estimates of the true relationships of the factors to the observed variables. Ree and Carretta (1998) reported a study that showed the correlation between the unrotated first-factor loadings of multiple aptitude battery⁴ scores and average validity of those scores. The correlation was .76. The factor loadings were then corrected for unreliability, and the correlation became .98.
Analysis of Variance (ANOVA) and Analysis of Covariance (ANCOVA) ANOVA and ANCOVA are examples of the linear model as is regression analysis. The effects of measurement error are similar; however, the independent variables in ANOVA are usually uncorrelated owing to random assignment of participants. As stated above, the effects of unreliability on uncorrelated independent variables are simpler.
ANOVA Let us consider a one-way ANOVA with three levels of the independent variable with μ1, μ2, and μ3. Remembering that ANOVA is a linear model and that the parameter estimates can be found by means of regression, we note that μ1 = α + β1, μ2 = α + β2, and μ3 = α, where α is the regression additive constant (intercept) and β1 and β2 are the multiplicative partial regression coefficients for the two categorical variables needed to represent the three levels of the independent variable. Furthermore, note that α = μ3, β1 = μ1 − μ3, and β2 = μ2 − μ3. Suppose μ1 = +1, μ2 = 0, and μ3 = −1 and that the reliabilities rXX1 = rXX2 = rXX3 = .50. The true differences are 1 or 2 points, but there is a loss of statistical power. In addition, the effect size (e.g., (μ1 − μ2)/σ) may be substantially underestimated because σ is inflated by error variance. Much the same may be found in an N-way ANOVA. Consider a two-way ANOVA with the independent variables of gender and political party affiliation. There are two gender (male/female) and three political affiliation (Democrat, Independent, and Republican) levels, respectively.
Using the same logic as before, the group means may be represented as follows:

Gender    Political affiliation    Group mean
Female    Democrat                 μf1 = α + β1 + β3
Female    Independent              μf2 = α + β2 + β3
Female    Republican               μf3 = α + β3
Male      Democrat                 μm1 = α + β1
Male      Independent              μm2 = α + β2
Male      Republican               μm3 = α
Again, both statistical power and effect size may be reduced. If the two independent variables above are correlated, as they well may be given the impossibility of randomly assigning gender and party affiliation, the same biases will be found as in a multiple regression with correlated predictors. With less than perfectly reliable variables, the results can be very misleading. An instructive example is provided in the work of Guttman (2000) in a study of 16- to 40-year-old females. Independent variables for the analysis of variance were based on meeting the criteria in the Diagnostic and Statistical Manual of Mental Disorders (3rd ed., revised; American Psychiatric Association, 1987) for the clinical conditions of anorexia nervosa and borderline personality disorder. The control participants were admitted to the study on three less than perfectly reliable self-report clinical instruments. Of particular interest were the dependent variables assessed by a 28-item measure of an individual’s cognitive and emotional capacity for empathy. This instrument yielded four scales whose median reliability was reported by Guttman to be about .70. The DSM-III-R assessments have less than perfect reliability, and .6 is a reasonable approximation of the reliability of the categorization into the anorexic and borderline personality groups. Given the magnitudes of less than perfect reliability in both the independent and dependent variables, it is likely that all effect sizes were underestimated and many significant differences were undetected.
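Returning to the one-way example above (μ1 = +1, μ2 = 0, μ3 = −1, reliabilities of .50), the regression parameters and the shrinkage of the standardized effect size can be sketched as follows. The third level is taken as the reference group, so β1 = μ1 − μ3 follows from μ1 = α + β1 and μ3 = α. The true-score standard deviation of 1 is a hypothetical value, and the sketch assumes the classical test theory relation in which observed variance equals true variance divided by reliability.

```python
import math


def dummy_coded_parameters(mu1, mu2, mu3):
    """Regression parameters when the third level is the reference group."""
    alpha = mu3
    beta1 = mu1 - mu3
    beta2 = mu2 - mu3
    return alpha, beta1, beta2


def observed_effect_size(mean_difference, true_sd, reliability):
    """Standardized difference after error variance inflates the observed SD."""
    observed_sd = true_sd / math.sqrt(reliability)
    return mean_difference / observed_sd


alpha, beta1, beta2 = dummy_coded_parameters(1.0, 0.0, -1.0)
print(f"alpha = {alpha}, beta1 = {beta1}, beta2 = {beta2}")

TRUE_SD = 1.0  # hypothetical true-score standard deviation
d_true = observed_effect_size(1.0 - 0.0, TRUE_SD, reliability=1.0)
d_observed = observed_effect_size(1.0 - 0.0, TRUE_SD, reliability=0.50)
print(f"(mu1 - mu2)/sigma with perfect reliability: {d_true:.3f}")
print(f"(mu1 - mu2)/sigma with reliability .50:     {d_observed:.3f}")
```

The mean difference itself is unchanged; only the denominator grows, which is why the effect size is underestimated while the parameter estimates keep the same expected values.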
ANCOVA ANCOVA is a linear model with categorical variables and at least one continuous variable as a covariate. For example, suppose we were interested in examining the effects of political party affiliation, a three-level categorical variable (Democrat, Independent, and Republican) and annual income measured in dollars earned (a continuous variable) on amount of support for the president’s proposed budget (a continuous variable). The three levels of party affiliation are represented by two categorical independent variables (X1 and X2). Income level (X3) is the continuous covariate independent variable. The following linear model represents the relationship of party
affiliation and income to the dependent variable, support for the president’s budget (Y):
$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3. \tag{14}$$
Considering less than perfect reliability of the independent variables, the equation can be rewritten as
$$Y = \alpha' + r_{11}\beta_1 X_1 + r_{22}\beta_2 X_2 + r_{33}\beta_3 X_3, \tag{15}$$
where $r_{11}$, $r_{22}$, and $r_{33}$ are reliabilities and $\alpha'$ denotes the additive regression coefficient affected by the unreliability of the independent variables. The effects of less than perfect reliability will be found, and the higher the correlation between the covariate X3 and the independent variables X1 and X2, the more bias will be noted in the analysis and the greater the loss of statistical power. The same biases will be found as would be found in a multiple regression with correlated predictors.
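As a small numerical illustration of Equation 15, the sketch below scales each coefficient by the reliability of its own predictor. The coefficient and reliability values are hypothetical, and the sketch does only what Equation 15 states; it does not model the additional bias that the text attributes to correlation between the covariate and the categorical predictors.

```python
def attenuated_coefficients(betas, reliabilities):
    """Scale each coefficient by the reliability of its own predictor."""
    if len(betas) != len(reliabilities):
        raise ValueError("need one reliability per coefficient")
    return [r * b for r, b in zip(reliabilities, betas)]


# Hypothetical values: two party-affiliation codes and the income covariate.
true_betas = [2.0, 1.0, 0.5]
reliabilities = [0.9, 0.9, 0.7]

observed_betas = attenuated_coefficients(true_betas, reliabilities)
for name, beta, observed in zip(("X1", "X2", "X3"), true_betas, observed_betas):
    print(f"{name}: true coefficient {beta:.2f}, expected observed {observed:.2f}")
```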
Ameliorating or Correcting the Problem In each of the cases we reviewed, it has been shown that using less than perfectly reliable variables creates bias in the parameter estimates. This reduces statistical power and provides the opportunity for misinterpretation of findings and misstatement of fundamental relationships. There are several ways to ameliorate or correct this problem. The first approach is to use variables that yield highly reliable scores for your sample (see Ree, Carretta, & Steindl, 2001). Revising unreliable test items or observational techniques, adding test items or observations, revising vague or confusing instructions, and clarifying ambiguous scoring and coding procedures can accomplish this. This alleviates most of the problems but does not entirely remove the bias due to unreliability. A second approach is to correct the observed variables for the effects of unreliability and conduct the analyses on the corrected values. This can be accomplished with reliability estimates from the participant sample in the study. Finally, the use of latent variable analyses, such as confirmatory factor analyses or structural equation modeling, which eliminate or substantially reduce the unreliability of the variables, is a third worthwhile approach. Cohen and Cohen (1983, p. 411) reported that Dunivant (1981) conducted simulation studies to evaluate the last two approaches and concluded that both “have merit and yield reasonable results.” Unreliability poses a threat to our knowledge and practice, whether in theoretical studies or in practical application. Baugh (2002) expressed it well, stating, “As the winds of change continue to shape responsible research practice, it is hoped that researchers will give more thoughtful consideration to the influence that measurement error variance exerts” (p. 261).
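The second approach, correcting observed values for unreliability, is not spelled out in this section. One common form is the classical correction for attenuation associated with Spearman (1904), in which an observed correlation is divided by the square root of the product of the two reliabilities. The sketch below assumes that form; the observed correlation of .30 is hypothetical, and the reliabilities of .6 and .7 simply echo the approximate values mentioned in the ANOVA example.

```python
import math


def correct_for_attenuation(r_observed, reliability_x, reliability_y):
    """Classical disattenuation: r_observed / sqrt(r_xx * r_yy)."""
    if not (0.0 < reliability_x <= 1.0 and 0.0 < reliability_y <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_observed / math.sqrt(reliability_x * reliability_y)


r_observed = 0.30  # hypothetical observed correlation
r_corrected = correct_for_attenuation(r_observed, 0.6, 0.7)
print(f"observed r = {r_observed:.2f}, corrected r = {r_corrected:.3f}")
```

Because reliability estimates are themselves subject to sampling error, corrected values are estimates and can exceed 1.0 in small samples.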
Notes
1. We follow the convention of using Greek letters for population parameters and Roman letters for sample statistical estimates of population parameters. Equation 1 is the single exception to this rule as many are familiar with the equation when written as presented.
2. Due to sampling error, the observed correlation could take on numerous higher values. We present the maximum expected observed correlation.
3. We noted a similar finding for regression coefficients in the section on multiple regression. In factor analyses of test items or questionnaire items, it may be difficult to estimate the reliability of items.
4. The unrotated first factor of a multiple aptitude battery is a measure of general cognitive ability (g).
References
Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, CA: Sage.
American Psychiatric Association. (1987). Diagnostic and statistical manual of mental disorders (3rd ed., rev.). Washington, DC: Author.
Baggaley, A. R. (1964). Intermediate correlational techniques. New York: Wiley.
Baugh, F. (2002). Correcting effect sizes for score reliability: A reminder that measurement and substantive issues are linked inextricably. Educational and Psychological Measurement, 62, 254–263.
Carretta, T. R. (1997). Group differences on U.S. Air Force pilot selection tests. International Journal of Selection and Assessment, 5, 115–127.
Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (2nd ed.). Mahwah, NJ: Lawrence Erlbaum.
Cole, N. S. (1973). Bias in selection. Journal of Educational Measurement, 10, 237–255.
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York: Harcourt Brace Jovanovich.
Dunivant, N. (1981). The effects of measurement error on statistical models for analyzing change (Final Report, Grant NIE-G-78-0071). Washington, DC: National Institute of Education, U.S. Department of Education.
Fuller, W. A. (1987). Measurement error models. New York: Wiley.
Gulliksen, H. (1950). Theory of mental tests. New York: Wiley.
Gulliksen, H. (1987). Theory of mental tests. Mahwah, NJ: Lawrence Erlbaum.
Guttman, H. A. (2000). Empathy in families of women with borderline personality disorder, anorexia nervosa, and a control group. Family Process, 39, 345–358.
Hunter, J. E., & Schmidt, F. L. (1976). A critical analysis of the statistical and ethical implications of various definitions of “test bias.” Psychological Bulletin, 83, 1053–1071.
Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis. Newbury Park, CA: Sage.
Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis (2nd ed.). Thousand Oaks, CA: Sage.
Jensen, A. R. (1980). Bias in mental testing. New York: Free Press.
Johnson, C., & Zeidner, J. (1991). The economic benefits of predicting job performance: Volume 2. Classification efficiency. New York: Praeger.
Lautenschlager, G. J., & Mendoza, J. (1986). A step-down hierarchical multiple regression analysis for estimating hypotheses about test bias in prediction. Applied Psychological Measurement, 10, 133–159.
Nunnally, J. C., & Bernstein, I. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.
Ree, M. J., & Carretta, T. R. (1998). General cognitive ability and occupational performance. In C. L. Cooper & I. T. Robertson (Eds.), International review of industrial organizational psychology, 1998 (pp. 159–184). Chichester, UK: Wiley.
Ree, M. J., Carretta, T. R., & Steindl, J. R. (2001). Cognitive ability. In N. Anderson, D. S. Ones, H. K. Sinangil, & C. Viswesvaran (Vol. Eds.), International handbook of work and organizational psychology (Vol. 1, pp. 219–232). London: Sage.
Ree, M. J., & Earles, J. A. (1993). g is to psychology what carbon is to chemistry: A reply to Sternberg and Wagner, McClelland, and Calfee. Current Directions in Psychological Science, 2, 11–12.
Russell, T. L., & Peterson, N. G. (2002). The experimental battery: Basic attribute scores for predicting performance in a population of jobs. In J. P. Campbell & D. J. Knapp (Eds.), Exploring the limits in personnel selection and classification (pp. 269–306). Mahwah, NJ: Lawrence Erlbaum.
Spearman, C. (1904). “General intelligence,” objectively determined and measured. American Journal of Psychology, 15, 201–293.
Stanley, J. C. (1971). Reliability. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 356–442). Washington, DC: American Council on Education.
Thompson, B. (2003). Understanding reliability and coefficient alpha, really. In B. Thompson (Ed.), Score reliability (pp. 3–30). Thousand Oaks, CA: Sage.
Thorndike, R. L. (1949). Personnel selection. New York: Wiley.
Ward, J. H., Jr., & Jennings, E. (1973). Introduction to linear models. Englewood Cliffs, NJ: Prentice Hall.
95 Qualitative Methods and the Development of Clinical Assessment Tools Jane F. Gilgun
Source: Qualitative Health Research, 14(7) (2004): 1008–1019.
Qualitative methods can enhance the development and evaluation of clinical assessment tools. Their contributions to development include the identification of concepts that compose the tools and the generation of items. For the evaluation of tools, qualitative approaches can provide information on whether practitioners find the tools a help or a hindrance to practice, whether training on the use of the tools was adequate, and whether revisions of the tools are warranted. Psychometric testing of tools is important, but evaluations that end there are incomplete without a qualitative component. In this article, I elaborate on these points with case studies of clinical rating scales that I took the lead in developing as part of university-practitioner collaborations. These instruments include the CASPARS (Clinical Assessment Package for Risks and Strengths) for families and children where the children have adjustment issues (Gilgun, 1999a, 1999b, 2002b; Gilgun, Keskinen, Marti, & Rice, 1999) and the 4-D, for adolescents who have experienced adversities (Gilgun, 2002a, 2003).¹ Sources of the concepts and the items for the CASPARS were primarily my life history research with adults who had experienced serious adversities in childhood and adolescence (Gilgun, 1990, 1991, 1996a, 1996b, 1999a, 1999b, 1999c, 1999d, 2002b; Gilgun & Connor, 1990; Gilgun & McLeod, 1999), related research and theory, and consultations with practitioners in the field. For the 4-D, the
work of Brendtro, Brokenleg, and Van Bockern (1990) provides the main concepts and their definitions. Related research and theory, my life history research, and the combined experience of social workers, administrators, and a clinical psychologist further fleshed out the definitions of the concepts and were the sources of the items. This triangulation of sources provides a strong argument for content validity of both sets of instruments and appears to have led to high indices of reliability. These instruments have alpha coefficients that are .9 and higher, the gold standard for clinical instruments (cf. Nunnally, 1978). The multiple sources of concepts and items parallel the components of evidence-based practice as enunciated by the originators: clinical experience, best research evidence, and patient wishes and preferences (cf. Evidence-Based Medicine Working Group, 1992; Sackett, Straus, Richardson, Rosenberg, & Haynes, 2000; Straus & McAlister, 2000).
Significance Practitioners in a variety of health care fields are using standardized clinical assessment tools more frequently than ever before, and their use of these tools is likely to increase. Practitioners are under more pressure than ever before to demonstrate effectiveness and to do so in as short a time as possible to hold down costs. The evidence-based practice movement emphasizes the application of best research evidence to clinical care, as well as the importance of incorporating clinical experience and the preferences and wishes of patients. Clinical assessment tools can be responsive to these demands, in that they provide outcome scores and, at their best, incorporate available research, clinical wisdom, and patient perspectives. When assessment tools are constructed in this way, they also bring attention to significant areas of patient functioning and therefore can provide guidelines for practice. Such guidelines systematize practice but in a flexible way, so that interventions can be crafted to fit individual patients (Morse, Hutchinson, & Penrod, 1998). Clinical instruments that serve as practice guidelines also provide clinicians with a common language and sets of assumptions that will increase communication and, I hope, quality of care.
Procedures for Developing and Testing Assessment Tools In general, procedures for developing instruments are well established and involve identifying and defining both conceptually and operationally the concepts to be measured, making decisions about the level of specificity of the items of the instruments, generating an item pool, presenting the item pool
to experts for critique and possible revisions and additions, assembling the tool, piloting and field testing the tool, and conducting studies of reliability and validity (DeVellis, 1991; Gilgun, 1999b; Nunnally, 1978). Content validity is a primary concern at the beginning of test development and involves including in the instruments items that adequately represent the concepts the instrument is designed to measure. Ensuring adequate content validity is difficult, because there is no way to identify the universe of possible items (DeVellis, 1991; Nunnally, 1978; Pike, 2002), just as there is no way to write a definition that includes every aspect of a concept. Thus, “true” concepts and complete lists of items representing them are not possible. These ideas are useful, because they remind researchers that the tools we create are inevitably incomplete and subject to revision. Using qualitative research in the development of tools is not new, though there appears to be little published research on the use of qualitative approaches in the evaluation of tools. For example, Faul (2002) relied primarily on practitioners to identify the core concepts of the Corporate Behavior Wellness Inventory. Morse and colleagues used qualitative data in the development of two different tools. The Adolescent Menstrual Attitude Questionnaire (Morse & Kieren, 1993; Morse, Kieren, & Bottorff, 1993) was based on qualitative data derived from open-ended interviews (Morse & Doan, 1987). The Morse Fall Scale (Morse, Morse, & Tylko, 1989) was based on multiple sources, including patient records, observation of the patients’ environments, and examination of patients (p. 367). Typically, the developers of these tools also used quantitative methods, including factor analysis and item-total analysis, to complete the construction of the tools. They also evaluated the tools quantitatively, using such indices as alpha coefficients and tests of construct validity.
The present article adds to the literature on the use of qualitative methods in the development and evaluation of clinical instruments by demonstrating (a) the fit between qualitative research and the structure of clinical assessment tools, (b) how researchers’ immersion in qualitative data facilitates the identification of items and concepts of clinical tools, (c) the excellent psychometric properties of instruments based on qualitative approaches, and (d) the usefulness of qualitative methods in the evaluation of the tools.
Clinical Assessment Tools Clinical assessment tools, sometimes called rapid assessment instruments, are short, easy to use and score, and provide information that is useful to practitioners (Bloom, Fischer, & Orme, 1999; Levitt & Reid, 1981). When practitioners find that clinical tools have these qualities, they are much more likely to use them.
The standards for judging the reliability of clinical assessment tools are stringent. For example, in psychological and educational testing that is done on groups, alpha coefficients of .5 and .6, which are measures of internal consistency reliability, are satisfactory. Such alpha coefficients are inadequate for clinical assessment tools, however, which have a standard of .9 or higher (Nunnally, 1978). In like manner, an item-total coefficient, which is an indirect test of content validity, in clinical tools should be .5 or .6 and not at the lower standard of .4 for educational and psychological group testing (Nunnally, 1978). Clinical rating scales can be considered idiographic (focus on understanding individuals and analytic generalizations), whereas educational and psychological tools typically can be considered nomothetic (focus on groups and probabilistic generalization).² In general, the items of clinical assessment tools are likely to be more global than items on educational and psychological tests. Clinical practitioners, such as nurses, social workers, and occupational therapists, typically gather information about clients from a wide variety of sources. A major task is to synthesize this information so that it can be used first for a comprehensive assessment of clients’ situations, then for treatment planning, and, finally, to evaluate treatment effectiveness. Items that are relatively wide in scope can help organize information derived from multiple sources. The exceptions are clinical tools useful for a specific condition, such as the Morse Fall Scale (Morse, Morse, et al., 1989), which are at a concrete level of abstraction. The mathematics of clinical assessment tools differs from that of instruments based on nomothetic assumptions. Researchers can provide norms for nomothetic scores, whereas clinical tools might not be normed.
Reasons for this include variations among individuals when they are assessed using idiographic instruments and the likelihood of variations in scoring across practitioners. First, clients often respond differently to different individuals. Therefore, what some practitioners learn from clients might be different from what other practitioners learn. Second, training and experience can affect scoring. The more theoretically based an instrument is, the more clinicians require training on the theory. Practitioners unfamiliar with the theory will score the same individual differently from scorers who understand the theory. Thus, in using clinical tools to evaluate outcome, each client would be his or her own control; in other words, scores at beginning of treatment would be compared to scores at the end of treatment for the same individual. Furthermore, interpreting scores is challenging in other ways, because scores attained early in treatment might not reflect clients’ conditions accurately. Over the course of treatment, both clinicians and clients might discover additional information that was not on the original assessment. Clients sometimes look worse before they look better. Thus, even on the individual level, there are many opportunities for scores not to reflect clients’ situations accurately.
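The reliability standards discussed in this section (alpha of .9 or higher, item-total coefficients of .5 or .6) refer to quantities that are straightforward to compute. The sketch below uses the standard formulas for coefficient alpha and for corrected item-total correlations (each item correlated with the sum of the remaining items). The ratings are invented for illustration and do not come from the CASPARS or the 4-D.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(items):
    """Coefficient alpha; items is a list of per-item score lists."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def corrected_item_totals(items):
    """Correlate each item with the total of all other items."""
    totals = [sum(scores) for scores in zip(*items)]
    results = []
    for item in items:
        rest = [t - score for t, score in zip(totals, item)]
        results.append(pearson(item, rest))
    return results


# Hypothetical ratings: four items scored for six clients.
ratings = [
    [1, 2, 3, 4, 5, 3],
    [2, 2, 3, 5, 5, 3],
    [1, 3, 3, 4, 4, 2],
    [1, 2, 4, 4, 5, 3],
]

alpha = cronbach_alpha(ratings)
item_totals = corrected_item_totals(ratings)
print(f"alpha = {alpha:.3f}")
for index, r in enumerate(item_totals, start=1):
    print(f"item {index}: corrected item-total r = {r:.3f}")
```

With real instruments the same calculations would be run on the full set of completed rating scales, and an item falling below the chosen item-total standard would be a candidate for rewording or removal.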
The Fit between Qualitative Methods and Clinical Assessment Tools Qualitative methods and instrument development are a good fit. Concept-indicator models link the two. Many years ago, Lazarsfeld and colleagues (Lazarsfeld, 1958; Lazarsfeld & Thielens, 1958) showed that instruments are composed of concepts and their indicators. The development of tools requires a thorough understanding of the concepts that instruments are intended to measure and the identification of adequate item pools. Glaser and Strauss (1967) modeled their version of grounded theory on Lazarsfeld’s concept-indicator model and cited his work many times. They wrote that a grounded theorist’s main task is to extract concepts from their concrete indicators. Codes can be thought of as concepts and the coded bits of text as the indicators of the concepts or codes. Findings of qualitative research typically are presented in a form that is consistent with concept-indicator models. This includes research for which findings are presented as concepts, typologies, or theoretical statements, all supported by excerpts from texts. The parallels between concept-indicator models, qualitative research, and clinical instruments facilitate instrument development. Clinical assessment tools as defined here are similar to assessment guidelines, whose purpose also is to direct attention to important areas of client (patient) functioning (Morse, Hutchinson, et al., 1998). Assessment guidelines, however, typically do not yield scores. Furthermore, some guidelines might be explicitly connected to classes of interventions, whereas the clinical assessment tools in this article are not. Of course, clinical assessment tools could be linked to specific interventions.
Sources of Concepts and Items
The sources of the concepts for clinical tools can be preestablished conceptual frameworks, or the concepts can be identified and developed for the purpose of developing tools for particular purposes. The CASPARS, a set of five clinical assessment tools, and the 4-D, a set of four clinical assessment tools, are examples. I identified the five concepts that compose the CASPARS for the purpose of developing instruments to be used in treatment programs for children aged 5 to 12 and their families where the children had adversities, including histories of abuse and neglect, parental abandonments, and out-of-home care. These concepts are emotional expressiveness, family relationships, the family’s embeddedness in the community, peer relationships, and sexuality. I had no preestablished framework on which to draw. As mentioned earlier, I drew on my long-term qualitative research with adults who had experienced adversities during childhood and adolescence, and related research and theory.
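The concept-indicator structure described above can be sketched as a small data structure. This is only an illustration: the concept names are two of the five CASPARS concepts listed in the text, but the indicator wordings, the class names `Concept` and `Instrument`, and the method `item_pool` are all hypothetical and are not taken from the actual instruments.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Concept:
    """A concept together with the concrete indicators that point to it."""
    name: str
    indicators: List[str] = field(default_factory=list)


@dataclass
class Instrument:
    """A tool is a collection of concepts; its items are their indicators."""
    name: str
    concepts: List[Concept]

    def item_pool(self) -> List[Tuple[str, str]]:
        # Flatten the model into (concept, item) pairs, one per indicator.
        return [(c.name, ind) for c in self.concepts for ind in c.indicators]


# Hypothetical indicator wordings, invented for illustration only.
sketch = Instrument(
    name="illustrative tool",
    concepts=[
        Concept("emotional expressiveness",
                ["names own feelings", "seeks comfort when upset"]),
        Concept("peer relationships",
                ["has at least one close friend"]),
    ],
)
pool = sketch.item_pool()
```

In qualitative analysis the same shape appears with codes in place of concepts and coded excerpts in place of indicators, which is the parallel the text draws.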
Research Design, Measurement and Statistics and Evaluation
The 4-D are based on a preestablished conceptual framework taken from Reclaiming Youth at Risk, a book that draws on 15,000 years of Native wisdom on child socialization and that is widely used in youth-caring agencies (Brendtro et al., 1990). The authors represent Native wisdom as a medicine wheel they call the Circle of Courage, which is composed of four dimensions: belonging, mastery, independence, and generosity. Administrators of a national therapeutic foster care agency asked me to help them develop a set of assessment tools based on the Circle of Courage, which they had adopted as their guide to practice.
Developing the Instruments
The CASPARS
I wrote a first draft of the items of the CASPARS based on my life history study, backed up by my knowledge of the research and theory on risk and resilience (Gilgun, 1996b, 1999a, 1999b). I found the generation of an item pool to be an enjoyable, flowing task. I had been immersed in my life history findings for about 15 years and had become a repository of concrete indications of multiple concepts, including those that became the bases of the CASPARS. Then, I worked closely with two social work practitioners who had extensive experience working with children who had experienced severe adversities; together we reworded, eliminated, and added items based on their clinical experience. Two experienced clinical psychologists also reviewed the instruments and made further suggestions about items to keep and to eliminate and about wording. We piloted the instruments, made revisions until there seemed to be no further need, and then did a field test.
The 4-D
To supplement the Brendtro et al. (1990) discussion of the four components of the Circle of Courage, I read widely on theories of human development. I found the Circle to be complex. I wrote an article linking the ideas of the Circle with contemporary theories of human development to ensure that I understood the concepts of the Circle (e.g., Gilgun, 2002a). Once I had some confidence that I understood the Circle of Courage, I began working with a team of administrators, practitioners, and a clinical psychologist. To facilitate teamwork on the development process, I gave a workshop on procedures for developing clinical instruments and also on the four dimensions of the Circle. In Figure 1, I have summarized much of what I said about instrument development. Next, we generated about 700 items that we thought were indicators of the Circle of Courage. We had long discussions about which items to keep and to eliminate. We agreed on 50 items, and I wrote the instruments.
[Figure 1: The relationship between “True” definitions, conceptual definitions, operational definitions, and instruments. Figure labels: “True” Definition (The Definition that Covers Every Aspect of the Concept); Conceptual Definition; Operational Definition; Adequate Definitions: Best Possible; Concrete Indicators; The Tool; Items Composing the Instrument.]
The practitioners on the development team then piloted the first draft of the instrument and made suggestions for eliminating and rewording items. When they had nothing more to add to the instrument, we did a field test. The instrument as field tested was composed of 48 items.
Field Tests
The field test for the CASPARS took place in one state in the upper Midwest and involved children between the ages of 5 and 12 and their families. Professionals who worked with children in a large therapeutic foster care agency and the two social workers who participated in the development of the CASPARS assessed 146 children over a 1-year period. (See Gilgun, 1999b, for details.) The field test for the 4-D was a national sample. Professionals who worked with adolescents and their families assessed 118 adolescents between the ages of 12 and 19 over an 18-month period. (See Gilgun, 2003, for details.)
Psychometric Testing
For the CASPARS, I did an item-total analysis and studies of alpha coefficients, interrater reliabilities, and construct validities. Content validity, as mentioned earlier, is judged through logical argument. The alpha coefficients and the interrater reliabilities were in the .9-and-higher range. The item-total correlations of the five scales ranged from .67 to .80. Their correlations with instruments thought to measure the same thing (i.e., construct validity) ranged from .46 (Sexuality) to .82, with three of the instruments in the .80 range (Peer Relationships, Family Relationships, and Family Embeddedness in Community).
Emotional Expressiveness had a construct validity index of .56. These are excellent psychometric properties. The three sources of items (my life history research, related research and theory, and experience of two social workers and two clinical psychologists), but primarily my qualitative research, are responsible for the stellar properties of the CASPARS. The psychometric properties of the 4-D also are exemplary. The alpha coefficients of the long forms of the four instruments ranged from .89 to .91, with a .96 for the 47 items of the entire item pool. In the item-total analysis, every item but one had a correlation of .5 or higher. We also calculated the standard error of measurement (SEM), which was 11.75. This SEM is less than 5% of the total possible score for the instruments. Five percent or less indicates instruments with excellent reliabilities (Springer, Abell, & Nugent, 2002). Again, I ascribe these results to my qualitative life history research, the research and experience on which the Circle of Courage is based, and the clinical experience of practitioners. What I learned from my life history research was a primary source of the items and also provided me with guidelines for my appraisals not only of the items the team generated but also of the suggestions that the piloters of the 4-D offered. When I wrote the final version of the 4-D, I took into consideration the experiences and insights practitioners had shared; my knowledge of what Brendtro et al. (1990) had written about belonging, mastery, independence, and generosity; my understanding of related research and theory; and, finally, what I learned from many years of listening to individuals talk about their experiences with adversities through life history research.
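The two reliability statistics reported above, the alpha coefficient and the item-total correlations, can be illustrated with a short sketch. The article does not say which software or which variant of the item-total correlation was used, so the code below assumes the conventional Cronbach's alpha formula and the "corrected" item-total correlation (each item against the sum of the remaining items). The ratings are invented for illustration and are not data from the CASPARS or 4-D field tests.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha. rows: one list of k item scores per person assessed."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def corrected_item_total(rows):
    """Correlate each item with the total of all the OTHER items."""
    k = len(rows[0])
    result = []
    for j in range(k):
        item = [r[j] for r in rows]
        rest = [sum(r) - r[j] for r in rows]
        result.append(pearson(item, rest))
    return result


# Invented ratings: 5 people, 4 items, each item rated 1 to 5.
ratings = [
    [5, 4, 5, 4],
    [4, 4, 4, 3],
    [3, 3, 2, 3],
    [2, 2, 3, 2],
    [1, 2, 1, 1],
]
alpha = cronbach_alpha(ratings)
item_total = corrected_item_total(ratings)
```

An item whose corrected correlation falls well below the others is a candidate for rewording or elimination, which is the kind of screening the .5-or-higher result in the text refers to.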
Qualitative Evaluation of the Tools
Both sets of tools were meant to be clinical assessment tools, as discussed earlier. As such, they provide guidance in assessment and treatment planning, help monitor the ongoing process of treatment, and provide outcome scores. They were also meant to fulfill the requirements of rapid assessment tools, which are to be short; to be easy to use, score, and interpret; and, above all, to provide information that practitioners find useful.
CASPARS
I evaluated the CASPARS qualitatively through talking to the two social work clinicians who had participated in the development, piloting, and field test phases of the CASPARS. They gave high marks to the usefulness, ease of use, scoring, and ease of interpretation. I have kept in touch with these clinicians over the years since the CASPARS were developed. They no longer use them
as paper-and-pencil assessments. Because they are part of a private practice, there are no administrators to tell them they have to have outcome scores. From my ongoing research with them, I have observed that their clinical assessments and evaluations are consistent with the items of the CASPARS. The CASPARS gave them a language for many of the clinical issues that they had long recognized. Their current assessments and evaluations are informal. I have no other qualitative evaluations of the CASPARS.
4-D
Over an 18-month period, I talked to users of the 4-D for a total of 30 hours to evaluate the tools’ clinical usefulness, their ease of use, and any other issues that practitioners encountered. I communicated with users individually by phone, e-mail, and in person. I also had in-person conversations with users in groups in both Minnesota and South Carolina. In addition, I spent about 4 hours in informational interviews with South Carolina social workers and care providers to get a sense of the larger context in which these professionals did their work. I kept detailed field notes of these conversations.
My experience as a practitioner who had had evaluation tools imposed on me was helpful in this evaluation. While in practice, I had complied with the paperwork requirements but saw little purpose to these evaluations. I thought many of the questions were of questionable relevance. Furthermore, the evaluators did not share with practitioners the evaluation results. In addition, I and other practitioners had been excluded from the process of instrument development. I believed strongly that the persons who developed these evaluations had little idea about the kind of work that my colleagues and I did. I was horrified at the thought that I might be doing to social workers in the 4-D study what had been done to me when I was in practice. I was, therefore, eager to hear what the practitioners had to say about the 4-D and made every effort to see things from their points of view.
The social workers had positive things to say about the 4-D as well as suggestions for improvement. In addition, conversations showed that many did not have adequate training on the Circle of Courage and thus did not understand the implications of many of the items. Finally, a significant subgroup of social workers experienced the agency’s paperwork demands as onerous and saw the 4-D as more paperwork. Many praised the 4-D.
Several said the tools helped them know youth in new ways and led to some of the best conversations they had ever had. One social worker adapted the 4-D so that adolescents with whom he worked in group used the items of the tools to interview each other. He said he obtained excellent results from doing this. Others found the 4-D remote from practice and too abstract. One social worker called the 4-D “high-brained,” whereas others did not think the 4-D
fit their practice. Those who evaluated the tools this way saw their work as moving from crisis to crisis or as teaching youth simple skills, like brushing teeth and behaving in school. They said they would love to work with youth on such issues as belonging and generosity, but the youth simply were not ready for those levels of intervention. A few found the sexuality items intrusive and said it was not feasible to assess youths’ sexual adjustment. They thought others, such as the foster parents and therapists, should deal with youths’ sexuality. Many complained about the length of the 4-D. No one thought the wording of the items should be changed. Others experienced the 4-D as more paperwork. A group of social workers were distraught and close to tears when they talked to me about agency paperwork requirements. They led me to a room that had three walls covered with mailbox-like slots, each of which had forms they had to fill out. They wanted to spend face-to-face time with clients, and the demands of paperwork kept them from this. To have yet another set of forms to fill out – no matter how well constructed and well intentioned – was overwhelming.
Reflections on the Qualitative Interviewing of the Users of the 4-D
I learned a great deal through this interviewing. Had I relied only on the quantitative outcomes, I would have thought I had participated in the development of ideal tools. In their comments, the social workers identified some strong points of the tools but also serious issues that gave direction to future steps the agency could take to ensure that the social workers were providing the services that the agency said they were giving. For example, the qualitative evaluation of the 4-D gave impetus to the creation of an agency-wide paperwork audit and a serious effort to reduce paperwork by creating a computer program that allowed for a single point of entry for data that were required on several forms.
The evaluation also pointed out to agency administrators that they had a lot more training to do on the Circle of Courage. Although they had adopted the ideas of the Circle as the guiding principles for the services they offered, there was little buy-in from the practitioners. Therefore, the administrators committed themselves to renewed efforts to do in-depth training on the Circle of Courage. One social worker called the tool “high-brained,” but the administrators thought her opinion would likely change with more training. In addition, the administrators were astounded that some social workers were squeamish about sexual topics. They decided to develop more training on adolescent sexuality, particularly sexual issues characteristic of youth in out-of-home care.
Practitioner feedback led to modification of the tool. Many thought the number of points on the rating scale should be reduced from five to three,
and that was done, with the points becoming high, medium, and low risks and high, medium, and low strengths with a midpoint of mixed. In response to complaints that the tool was too long and took too much time to complete, the number of items of the tools was reduced. The long form of the final version of the 4-D had 47 items. A small group of administrators and practitioners eliminated 19 more items to create a short form composed of 28 items. The psychometric properties of the short form were not as good as those of the long form, but they were respectable (Gilgun, 2003). For example, with the elimination of items, the overall alpha was reduced from .96 to .94, with the individual alphas ranging from .83 to .91. The overall SEM at 6.9 remained less than 5% of the total score.
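The 5% criterion for the standard error of measurement can be made concrete with a short sketch. The article reports the SEM values but not how they were computed or what the maximum possible score was, so two things below are assumptions: the conventional classical-test-theory formula SEM = SD × sqrt(1 − reliability), and a hypothetical maximum of 5 points per item for the 28-item short form. The function names and the example standard deviation are likewise illustrative.

```python
from math import sqrt


def standard_error_of_measurement(score_sd, reliability):
    """Conventional formula (assumed here): SD times sqrt(1 - reliability)."""
    return score_sd * sqrt(1.0 - reliability)


def sem_as_percent_of_max(sem, max_possible_score):
    """Express an SEM as a percentage of the maximum possible total score."""
    return 100.0 * sem / max_possible_score


# Formula illustration with an invented SD of 20 and the reported alpha of .94.
example_sem = standard_error_of_measurement(20.0, 0.94)

# Criterion check using the reported short-form SEM of 6.9.
# ASSUMPTION: 5 points per item; the article does not state the maximum score.
HYPOTHETICAL_POINTS_PER_ITEM = 5
short_form_max = 28 * HYPOTHETICAL_POINTS_PER_ITEM
short_form_pct = sem_as_percent_of_max(6.9, short_form_max)
meets_criterion = short_form_pct <= 5.0  # "five percent or less"
```

Because the SEM shrinks as reliability rises and grows with the spread of scores, dropping items can change it in either direction, which is why the check is rerun for the short form rather than carried over from the long form.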
Discussion
Qualitative methods are invaluable in developing and evaluating clinical rating scales. Solid qualitative research in combination with reviews of the literature and consultation with practitioners provides a basis for confidence in the content and face validities of clinical tools and appears to lead to clinical tools that have excellent psychometric properties. Furthermore, qualitative interviewing of users provides valuable feedback about the usefulness of the tools, factors that block effective use of the tools such as training issues and competing paperwork, and effective ways to improve the instruments themselves. In addition, I found that my own experience of having evaluation tools thrust on me helped me stay connected to practitioners, who complained mightily about instruments that I had put my heart into. I also think I could hear them because I have learned from doing qualitative research that no matter what we come up with, our results are subject to revision. The practitioners with whom I worked made this point clear.
Practitioners are under more pressure than ever to incorporate best research evidence into their practice, to demonstrate effectiveness, and to do so in as short a time as possible. Clinical rating scales as I defined them in this article incorporated best research evidence and also incorporated the expertise of clinical practitioners, though I did not interview youths themselves. Thus, clinical rating scales as described in this article fulfill two of the three requirements of evidence-based practice (cf. Evidence-Based Medicine Working Group, 1992; Sackett et al., 2000; Straus & McAlister, 2000).
The contributions of qualitative approaches to the development of clinical assessment tools include the identification of the concepts that compose the tools, the development of definitions, and the generation of items.
Qualitative research provides information that is at the level of specificity that is required for effective clinical assessment tools. Qualitative data contain multiple concrete indicators of the concepts that are significant to informants. The natural products of qualitative analysis are concepts and the concrete instances from
which the concepts are derived. Thus, the products of qualitative research are analogous to concept-indicator models, as Glaser and Strauss (1967) pointed out many years ago. Concept-indicator models also underlie clinical assessment tools and other instrumentation. With such congruence, qualitative researchers are positioned to develop clinical assessment tools and other instruments that will be useful to practice and will not be add-ons. These tools not only will be responsive to the demands of evidence-based practice but might also contribute to effectiveness and efficiency.
Notes
1. The instruments, related articles, and a manual for the 4-D are available at ssw.che.umn.edu/faculty/jgilgun.htm.
2. See Gilgun (1994), Mezzich (2002), and Schafer (1999) for discussions of idiographic and nomothetic approaches.
References
Bloom, M., Fischer, J., & Orme, J. G. (1999). Evaluating practice: Guidelines for the accountable professional (3rd ed.). Boston: Allyn & Bacon.
Brendtro, L., Brokenleg, M., & Van Bockern, S. (1990). Reclaiming youth at risk: Our hope for the future. Bloomington, IN: National Educational Service.
DeVellis, R. F. (1991). Scale development: Theory and applications. Newbury Park, CA: Sage.
Evidence-Based Medicine Working Group (1992). Evidence-based medicine: A new approach to teaching the practice of medicine. Journal of the American Medical Association, 268, 2420–2425.
Faul, A. C. (2002). Comprehensive assessment in occupational social work: The development and validation of the corporate behavioral wellness inventory. Research on Social Work Practice, 12(1), 47–70.
Gilgun, J. F. (1990). Factors mediating the effects of childhood maltreatment. In M. Hunter (Ed.), The sexually abused male: Prevalence, impact, and treatment (pp. 177–190). Lexington, MA: Lexington Books.
Gilgun, J. F. (1991). Resilience and the intergenerational transmission of child sexual abuse. In M. Q. Patton (Ed.), Family sexual abuse: Frontline research and evaluation (pp. 93–105). Newbury Park, CA: Sage.
Gilgun, J. F. (1994). A case for case studies in social work research. Social Work, 39, 371–380.
Gilgun, J. F. (1996a). Human development and adversity in ecological perspective, Part 1: A conceptual framework. Families in Society, 77, 395–402.
Gilgun, J. F. (1996b). Human development and adversity in ecological perspective, Part 2: Three patterns. Families in Society, 77, 459–576.
Gilgun, J. F. (1999a). CASPARS: Clinical Assessment Instruments that measure strengths and risks in children and families. In M. C. Calder (Ed.), Working with young people who sexually abuse: New pieces of the jigsaw puzzle (pp. 49–58). Dorset, UK: Russell House.
Gilgun, J. F. (1999b). CASPARS: New tools for assessing client risks and strengths. Families in Society, 80, 450–459. Retrieved March 29, 2004, from http://ssw.che.umn.edu/faculty/jgilgun.htm
Gilgun, J. F. (1999c). Fingernails painted red: A feminist, semiotic analysis of “hot” text. Qualitative Inquiry, 5, 181–207.
Gilgun, J. F. (1999d). Mapping resilience as process among adults maltreated in childhood. In H. I. McCubbin, E. A. Thompson, A. I. Thompson, & J. A. Futrell (Eds.), The dynamics of resilient families (pp. 41–70). Thousand Oaks, CA: Sage.
Gilgun, J. F. (2002a). Completing the circle: American Indian medicine wheels and the promotion of resilience in children and youth in care. Journal of Human Behavior and the Social Environment, 6(2), 65–84.
Gilgun, J. F. (2002b). Social work and the assessment of the potential for violence. In T. N. Tiong & I. Dodds (Eds.), Social work around the world II (pp. 58–74). Berne, Switzerland: International Federation of Social Workers.
Gilgun, J. F. (2003). The 4-D: Strengths-based instruments for the assessment of youth who’ve experienced adversities. Unpublished manuscript.
Gilgun, J. F., & Connor, T. M. (1990). Isolation and the adult male perpetrator of child sexual abuse. In A. L. Horton, B. L. Johnson, L. M. Roundy, & D. Williams (Eds.), The incest perpetrator: The family member no one wants to treat (pp. 74–87). Newbury Park, CA: Sage.
Gilgun, J. F., Keskinen, S., Marti, D. J., & Rice, K. (1999). Clinical applications of the CASPARS instrument: Boys who act out sexually. Families in Society, 80, 629–641.
Gilgun, J. F., & McLeod, L. (1999). Gendering violence. Studies in Symbolic Interactionism, 22, 167–193.
Glaser, B., & Strauss, A. L. (1967). The discovery of grounded theory: Strategies for qualitative research. New York: Aldine.
Levitt, J. L., & Reid, W. J. (1981). Rapid-assessment instruments for practice. Social Work Research and Abstracts, 17, 13–19.
Lazarsfeld, P. F. (1958). Evidence and inference in social research. Daedalus, 97, 47–67.
Lazarsfeld, P. F., & Thielens, W. (1958). The academic mind. Glencoe, IL: Free Press.
Mezzich, J. E. (2002). Comprehensive diagnosis: A conceptual basis for future diagnostic systems. Psychopathology, 35(2/3), 162–165.
Morse, J. M., & Doan, H. M. (1987). Adolescents’ response to menarche. Journal of School Health, 57(9), 385–389.
Morse, J. M., Hutchinson, S. A., & Penrod, J. (1998). From theory to practice: The development of assessment guides from qualitatively derived theory. Qualitative Health Research, 8(3), 329–340.
Morse, J. M., & Kieren, D. (1993). The adolescent menstrual attitude questionnaire, Part II: Normative scores. Health Care for Women International, 14, 63–76.
Morse, J., Kieren, D., & Bottorff, J. (1993). The adolescent menstrual attitude questionnaire, Part I: Scale construction. Health Care for Women International, 14, 39–62.
Morse, J. M., Morse, R. M., & Tylko, S. J. (1989). Development of a scale to identify the fall-prone patient. Canadian Journal on Aging, 8(4), 366–377.
Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.
Pike, C. K. (2002). Measuring racial climate in schools of social work: Instrument development and validation. Research on Social Work Practice, 12(1), 29–46.
Sackett, D. L., Straus, S. E., Richardson, W. S., Rosenberg, W., & Haynes, R. B. (2000). Evidence-based medicine: How to practice and teach EBM (2nd ed.). Edinburgh, UK: Churchill Livingstone.
Schafer, M. (1999). Nomothetic and idiographic methodology in psychiatry: A historical-philosophical analysis. Medicine, Health Care & Philosophy, 2(3), 265–274.
Springer, D. W., Abell, N., & Nugent, W. R. (2002). Creating and validating rapid assessment instruments for practice and research: Part 2. Research on Social Work Practice, 12(6), 768–795.
Straus, S., & McAlister, F. A. (2000). Evidence-based medicine: A commentary on common criticisms. Canadian Medical Association Journal, 163(7), 837–841.