Applied Psychometry
Narender Kumar Chadha
Copyright © Narender Kumar Chadha, 2009

All rights reserved. No part of this book may be reproduced or utilised in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage or retrieval system, without permission in writing from the publisher.

First published in 2009 by

SAGE Publications India Pvt Ltd
B1/I-1, Mohan Cooperative Industrial Area
Mathura Road, New Delhi 110044, India
www.sagepub.in

SAGE Publications Inc
2455 Teller Road
Thousand Oaks, California 91320, USA

SAGE Publications Ltd
1 Oliver’s Yard, 55 City Road
London EC1Y 1SP, United Kingdom

SAGE Publications Asia-Pacific Pte Ltd
33 Pekin Street
#02-01 Far East Square
Singapore 048763

Published by Vivek Mehra for SAGE Publications India Pvt Ltd, typeset in 10/12 pt Palatino by Star Compugraphics Private Limited, Delhi and printed at Chaman Enterprises, New Delhi.

Library of Congress Cataloging-in-Publication Data
Chadha, N. K.
Applied psychometry/Narender Kumar Chadha.
p. cm.
Includes bibliographical references and index.
1. Psychometrics. I. Title.
BF39.C487 2009
150.1'5195—dc22 2009027295

ISBN: 978-81-321-0078-2 (PB)

The SAGE Team: Reema Singhal, Vikas Jain, Amrita Saha and Trinankur Banerjee
Dedication
This book is dedicated to my wife Neena for her patience and support
Contents

Foreword by Gregory S. Kolt
Preface

PART 1: MEASUREMENT IN MODERN PSYCHOLOGICAL RESEARCH
1. Basics of Measurement Theory
2. Errors in Measurement
3. Speed Test versus Power Test
4. Criterion for Parallel Tests

PART 2: THEORY AND PRACTICE OF PSYCHOLOGICAL TESTING
5. Introduction to Psychological Testing
6. Test Construction
7. Item Analysis
8. Scoring of Tests and Problems of Scoring
9. Reliability
10. Validity
11. Norms

PART 3: APPLICATIONS OF PSYCHOLOGICAL TESTING
12. Applications of Psychological Testing in Educational Setting
13. Applications of Psychological Testing in Counselling and Guidance
14. Applications of Psychological Testing in Clinical Settings
15. Applications of Psychological Testing in Organisational Setting

PART 4: ETHICAL ISSUES IN PSYCHOLOGICAL TESTING
16. Ethical Issues in Psychological Testing

PART 5: FACTOR ANALYSIS
17. Basics of Factor Analysis
18. Extraction of Factors by Centroid Method
19. Applications of Factor Analysis

Bibliography
Index
About the Author
Foreword

The field of psychometrics has been an important part of psychology and education for many years. Whilst, broadly speaking, psychometrics is concerned with the theoretical and technical aspects of measurement (especially construction of test instruments), the development of theoretical approaches to measurement has gained popularity recently. Psychometrics has come a long way since its earlier focus, mostly on measuring intelligence. The field has now been applied more rigorously to the measurement of a range of psychological and educational variables, with a recent applied focus on measurement in the fields of health and organisational management.

As we work in an increasingly accountable world, our ability to reliably measure a range of human attributes and responses is paramount. Whether it be assessing a person’s response to a psychological or other health intervention or measuring performance in an organisational setting, being able to rely on well-constructed, reliable and valid measures is vital. Having such measures allows us to use a platform of evidence in demonstrating change. We must continue to view psychometrics as an important science that assists progress in psychology, health and organisational settings. The moment we begin to lose sight of the rigorous science behind the way we assess and measure will be the point at which the evidence base for our actions rapidly dwindles.

Whilst many texts are available on psychometrics and its applications to a variety of disciplines, Applied Psychometry provides the reader with comprehensive coverage of many of the important aspects helpful to students, researchers and practitioners alike. Applied Psychometry is a fine example of a text that covers a range of important topics in gaining a working understanding of the field. It is not simply a text full of statistical formulae, but rather a text that systematically works through the main areas of measurement, psychological testing and its applications, ethical issues in psychological testing and factor analysis.

Professor Narender K. Chadha is a highly experienced author and psychometrician. His strong background in organisational psychology, management and mathematics allows him to integrate the theoretical and practical sides of psychometrics. In this text, Narender Chadha has brought together a wide range of topics relevant to psychometrics and integrated them in a way that will be helpful to students, researchers and professionals. The book is structured around sections covering the role of measurement in modern psychological research (for example, measurement error, power, parallel tests, and so on), the theoretical and practical aspects of testing (including test construction, scoring and item analysis, norms, reliability and validity), the applications of psychological testing in educational, counselling, clinical and organisational settings, the ethical issues around psychological testing and a detailed section on factor analysis.
I view Applied Psychometry as an important text in an increasingly important and accountable field. Its content is appropriate for, and has resonance with, both undergraduate and postgraduate students, as well as researchers and professional practitioners.

Gregory S. Kolt, PhD
Professor and Head, School of Biomedical and Health Sciences
University of Western Sydney, Sydney, Australia
Preface

Taking an idea and turning it into a book is not an easy task. This is especially true when the idea concerns an abstract and quantitative discipline like psychometry. In the course of my long association with various institutions, I have observed that students are generally not very interested in psychometry despite the emphasis given to this branch by all academic and research institutions. This lack of interest may be due to a number of reasons, but the one that I share with the famous management wizards Thomas Peters and Robert Waterman, Jr, is related to the gross numerification, and thus ossification, of this discipline without exploring its applied potential. The natural outcome of all this has been what Peters and Waterman (1982: 31), in their runaway bestseller In Search of Excellence: Lessons from America’s Best Run Companies, have called the ‘Paralysis through Analysis Syndrome’.

While writing this book, the purpose with which I charged myself was to come up with a reading that not only makes psychometry beautiful, fascinating and attractive, but also unlocks its applied potential. My psychometry is ‘Action Psychometry’. To fulfil the agenda at hand I have adopted a traveller’s perspective wherein I have listed all the three major milestones on the road to psychometry, that is, Measurement, Psychological Testing and Factor Analysis. Keeping in mind its natural requirements, I have given a separate space to the theoretical aspects of psychological testing, then its application, and finally the ethical issues involved in the entire exercise. So, all in all, the entire book is divided into five major parts.

Part I deals with measurement and its various aspects in psychological research. It begins with building the basics of measurement in the mind of readers and gradually takes them to its applied aspects in the form of Speed Tests, Power Tests and Rating Scales. It also discusses errors and problems involved in psychological measurement under a separate chapter titled ‘Errors in Measurement Theory’.

The second part of the book deals with the theory and practice of psychological testing and the important aspects of psychometry in this regard. This part of the book again starts with the basics of psychological testing and gradually introduces the problems related to test construction, item analysis, scoring, estimation of reliability and validity, and setting of norms or standards. Along with this, the chapters contain my personal reflections (‘Notes from the experiential blurb’), activity sheets, solved problems, critical thinking questions, and so on, at the appropriate places, which readers will definitely find interesting, insightful and useful.

Apart from this, I have discussed two related aspects of psychological testing, that is, its practical applications in various settings and the ethical issues involved, under two separate parts (Part III and Part IV, respectively) to address their special demand and relevance. Part III discusses the application of psychological testing in four major areas, that is, education, counselling and guidance, clinical setting and organisational setting. A novel feature of this part is the bibliographical directory of major psychological tests related to
these four major areas, which I am sure that students, academicians, planners as well as practitioners in the relevant areas will find equally useful.

The final part of the book (Part V) deals with Factor Analysis, which includes chapters like ‘Basics of Factor Analysis’, ‘Extraction of Factors by Centroid Method’ and ‘Applications of Factor Analysis’. These chapters have attempted to simplify the complexities involved in factor analysis by offering lots of illustrations, worked examples, figures and diagrams at appropriate places.

The book is a unique blend of psychometric theory and its application with the following main features:

1. A comprehensive reference, very well organised in five parts covering measurement, factor analysis, psychological testing, its application and the ethical issues involved.
2. Well-structured chapters, with chapter objective, activity sheet and experiential learning from the author’s own life.
3. Covers four major areas of application of psychological testing in educational, counselling, and clinical and organisational settings, with two psychological tests discussed in detail for each setting.
4. A comprehensive reference of various psychological tests meticulously organised under ‘Foreign Psychological Tests’ and ‘Indian Psychological Tests’ categories with all the relevant psychometric data.
5. APA guidelines regarding the ethical considerations in psychological testing.

Further, the book is different vis-à-vis competing texts on the following grounds:

1. The book has attempted to incorporate current issues and debates, like the rise of qualitative research methodology and implications of its misonumeric character for the theory and practice of Psychometry, and so on, among others. This feature, again, is missing from most of the earlier books (like Freeman) on this subject.
2. As the title of the book Applied Psychometry suggests, the book emphasises the application part of psychometry with a matching theoretical orientation.
3. The book provides an active teaching–learning interface with lots of live examples and activities. Apart from this, the book has been written keeping in mind the international format of the best books (exemplified by features like ‘Notes from the Experiential Blurb’, ‘Test Your Knowledge’, ‘Chapter Objective’/‘Chapter Summary’, and so on), which is missing from most of the commonly used books on this subject.
4. The book is a comprehensive reader on psychometry wherein all the major themes of the subject find an appropriate space. Apart from this, it emphasises numericals useful for students appearing for various courses with psychometry as one of their papers. Available books place more emphasis on theories, and this book fills the vacuum of practical and numerical examples.

I am thankful to the team at SAGE Publications, especially Ms Anjana Saproo, for their assistance and cooperation without which we would not have been able to finish this project in a mission
mode manner. A good book is one that is opened with expectations and closed with profit (Alcott), and the idea behind this book is to meet the expectations of its readers and ensure their academic, intellectual and practical gains. I shall be highly obliged to them if they will let me know to what extent I have been successful in my effort.

A large number of my students inspired me to write this book and I would like to thank them. I would fail in my responsibility if I did not acknowledge the help rendered to me by my very dear and affectionate student Mr Sanjay Singh, who helped and encouraged me, without any self-interest, as and when I approached him. Without his help, this book may not have been completed by the deadline fixed by the publisher. Last, but not the least, I would like to extend thanks to my wife Neena for her support in all walks of life in completing this mission.

Narender Kumar Chadha
PART 1 Measurement in Modern Psychological Research
1
Basics of Measurement Theory
CHAPTER OUTLINE

1. Measurement in science
2. Definition and meaning of measurement
3. Theories of measurement: Campbell’s theory, Stevens’s contribution
4. Types of measurement scale: Nominal, Ordinal, Interval, Ratio
5. Attributes of measuring instruments
6. Application of Measurement Theory to educational and psychological research
7. Problems in psychological measurement
LEARNING OBJECTIVES

At the end of this chapter, you will be able to:

1. Conceptualise measurement
2. Discuss the role and advantage of measurement in psychology
3. Understand various theories of measurement
4. Deal with the concept and various types of measurement scales
5. Deal with problems and limitations of psychological measurement
MEASUREMENT IN SCIENCE
Although measurement in science has a long history (Leahey 2006) compared with the century-and-a-quarter-old discipline of psychology, the science of measurement and the science of psychology have grown and matured together. This is because the science of measurement has become a prerequisite, not only for the scientist conducting research but also for practitioners trying to apply scientific theories and methods for social development and academic enrichment.

Scientists are concerned with the development of knowledge that is objective, exact and verifiable. As such, the aim of a scientist or a researcher in any scientific inquiry is to collect facts about an object, a phenomenon, a system or a problem under investigation, objectively and precisely. These facts are not observed or considered in isolation from one another. Rather, an attempt is usually made to identify the exact nature of the relationship among them in order to interpret and formulate a deductive or probable explanation of the object, the system or the events under study.

In order to ascertain the extent, dimension or magnitude of something, or to determine the attribute of something with precision, scientists often resort to measurement. This helps them to obtain quantitative data or information about objects, phenomena, systems or their attributes. The quantitative data or information which accrues through measurement is more precise and meaningful than non-quantitative data or information, which is generally vague and, quite often, misleading. Through measurement and quantification with the aid of mathematical models and statistics, scientists are able to distinguish objects or their properties, and predict or establish relationships among them with a greater degree of refinement and exactness. Measurement is, thus, one of the essential elements of scientific investigation and discovery.
DEFINITION AND MEANING OF MEASUREMENT

Stevens (1951) defined measurement as ‘the assignment of numbers to objects or events according to some rule.’ However, this definition contains some inherent assumptions: that things or attributes always exist in some amount, and that whatever exists in some amount can be quantified and measured. Also, the attributes are quantified according to a set of rules or criteria known as the rules of measurement. So, measurement simply consists of rules for assigning numbers to objects to represent quantities of an attribute.

However, we can have another, and different, notion of measurement. For example, we can conceive of measurement as a heuristic for social research and for the understanding of social behaviour. The meaning of this statement can be better understood with the help of Activity 1.1:
Activity 1.1 Measurement as a Heuristic to Social Understanding

Measurement is a heuristic to social understanding. For example, consider the following two persons, A and B, with the attributes given:

Person    Attributes
A         Very good student, high achiever, sharp, active, alert
B         Very high on learning graph, high performance, skilled, vigorous, effective

Now, if I ask you which person is more intelligent, A or B, do you find it difficult to answer, or take more than the appropriate time? Now suppose I give you a statistic, that A’s Intelligence Quotient (IQ) is 100 and B’s IQ is 105, and ask the same question again. How much time did you take to answer this time? Probably a fraction of a second (if you are familiar with the term IQ and its meaning). I would still refrain from claiming that measurement has made our life easy, but there is no doubt that it has made the process of learning and knowledge-acquisition easy, simple and fast. So let us see what this thing called measurement is.

Source: Author.
THEORIES OF MEASUREMENT

The development of the theory of measurement is of comparatively recent origin. The concept of measurement grew out of the evolution of the Theory of Numbers and its application in the physical sciences. Measurement Theory is primarily concerned with the development of a yardstick or an instrument with the help of which a system analyst or a researcher can measure the attributes of an entity, phenomenon or system under study. It is a process which involves assigning symbols, that is, numerals, to people, objects, events or their attributes according to predetermined rules. A rule explicates the way in which symbols are assigned to entities. Whereas physical scientists use the term ‘property’ to delineate the quality or quantity of any physical entity, social scientists prefer the term ‘attribute’ to designate the quality or quantity of any human or social phenomenon.

The essence of the procedure is the assignment of symbols in such a way as to reflect one-to-one correspondence between certain characteristics of symbols or the number system involved, and a corresponding relation between the instances of properties to be measured in a concrete or conceptual entity or a set of entities. In some instances, the symbols which are assigned to entities or their attributes have no quantitative meaning; for example, in the measurement of academic achievement in a class, we may assign numerals or labels such as I or A to high achievers, II or B to average achievers and III or C to low achievers on the basis of their academic performance. In other instances, the assigned symbols have quantitative meaning. For example, in a clerical aptitude test where speed plays an important role, we count the number of questions replied to
by each candidate and calculate the average speed of the candidates. Later, on the basis of this, we may assign the label or value ‘most efficient’ to the candidate whose speed is much above average, ‘efficient’ to those whose speed is average and ‘inefficient’ to those whose output is below average. It may be seen that in both cases, certain values, in the form of numerals and numbers, have been assigned according to a procedure or a rule.

The symbols which are used to represent non-quantitative aspects of entities or their attributes are called numerals. It means that numerals have no quantitative meaning. Symbols that are given quantitative meaning are called numbers, which enable us to use mathematical and statistical techniques for purposes of description, analysis and prediction. Thus, numbers are amenable to statistical manipulation and mathematical analysis, which, in turn, reveal new information or facts about the entities, phenomena or systems being measured.

In the context of Measurement Theory, the term ‘assignment’ means matching, that is, numerals or numbers are matched with, or mapped on to, entities or their attributes according to some procedure. In other words, measurement always involves a process through which items in one set are matched with items in another set. To understand the idea of mapping in measurement, examine Table 1.1, which shows a number of students on the left and their achievement ratings on the right, with each student matched with his corresponding achievement rating.

Table 1.1 Mapping Ideas in Measurement

Students    Achievement Rating/Scale Points
A           High Achievers
B           –
C           –
D           Average
E           –
F           Low Achievers
G           –

Source: Author.
Rules are the most significant component of the measurement process because the quality of measurement or the product of the measurement process largely depends upon them. Poor rules make the measurement meaningless and lead to misleading or irrelevant conclusions. Further, measurement is meaningful only when it is tied to reality and the function of the rule is to tie the measurement process to reality.
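The clerical aptitude example can be sketched in code to show how a rule ties labels to observations. This is only an illustration under stated assumptions: the function name `label_by_speed`, the `tolerance` band around the group average and the candidate counts are all hypothetical and do not come from the text, which does not say how far from the average a candidate must be to count as above or below it.

```python
from statistics import mean


def label_by_speed(answered, tolerance=2):
    """Map each candidate to an efficiency label using one explicit rule.

    `answered` maps a candidate to the number of questions answered.
    `tolerance` is an assumed band around the group average; the text
    does not specify one.
    """
    average = mean(answered.values())
    labels = {}
    for candidate, count in answered.items():
        if count > average + tolerance:
            labels[candidate] = "most efficient"
        elif count < average - tolerance:
            labels[candidate] = "inefficient"
        else:
            labels[candidate] = "efficient"
    return labels


# Hypothetical counts of questions answered in a fixed time.
answered = {"A": 48, "B": 40, "C": 31, "D": 41}
labels = label_by_speed(answered)
print(labels)
```

Changing `tolerance` changes which candidates receive which label, which is one way of seeing why the quality of a measurement depends so heavily on the rule behind it.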
Mathematics and Measurement

Mathematics is the formal, logical and systematic structure with the help of which scientists formulate formal models to represent and measure objects, phenomena, systems or their properties, as well as their relationship to the empirical world, according to set rules. With the help of mathematical or
statistical models, measurement is possible because there is a kind of isomorphism (that is, similarity or identity of structure) between (a) the empirical relations among the properties of objects and (b) the properties of the formal game in which numerals are the pawns and operators the moves. When this correspondence between the formal models and their empirical counterparts is close and tight, we discover matters of fact by examining the model itself. Thus, we calculate the flight of a bullet or the course of a comet without laying hands on either. And we are awed by the prodigious power of mathematics when we see what is beyond our vision.

In the beginning, the concept of measurement and the number system were so closely bound together that no one considered mathematics and measurement as two distinct disciplines. In recent times, however, it has become clear that the formal system of mathematics is an ‘empty play of symbols’ and constitutes a game of signs and rules having no reference to empirical objects or events. On the other hand, the process of measurement is concerned with the systematic association of numbers or other mathematical concepts, such as vectors, for scaling or representing empirical attributes or relations in such a way that the values of the attributes or relations are faithfully scaled or represented as numerical values. Measurement is primarily concerned with mapping empirical observations on to some numerical structure or formal mathematical system or model, and mathematics per se with abstract or non-empirical model-building. Mathematics and measurement are no longer considered tightly interwoven, although measurement borrows its models from mathematics. These disciplines can be looked upon as two-faced bipartite endeavours (of science) looking towards the formal, analytic, schematic feature of model-building and towards the concrete, empirical and experimental observation by which we test the usefulness of a particular representation.
Campbell’s Theory of Measurement

It was Norman Robert Campbell who, in his writings published between 1920 and 1938, laid the foundation of the modern theory of measurement. In his works, he discussed the problems of measurement as applicable to the domain of science. Explaining what measurement is, he says, ‘Measurement is the process of assigning numbers to represent qualities (properties), the object of measurement is to enable the powerful weapon of mathematical analysis to be applied to the subject matter of science’ (Campbell 1960).

In his masterly exposition, he visualised that when we study or measure objects, we, in fact, study or measure the properties or attributes of those objects. For example, when a person’s attitude is measured, it is often incorrectly assumed that the person has been measured. In actuality, what has been measured is an attribute of the person, that is, his attitude, personality, temperament, intelligence, and so on. In order to promote a person on a job, we try to measure or establish his technical proficiency. What we measure is one of the multitude of attributes of the person and not the person. The attributes and the person (that is, the possessor of the attributes) are not the same thing and should not be mixed up as such. Nevertheless, it is also, in a way, true to say that to know the exact or nearly exact trait of a person is to know him, or when we measure the traits of a person, we measure him, or a person is nothing but the sum total of his
traits. But, at the same time, the traits and persons are not the same thing.

Campbell also points out that physical objects or phenomena could possess two kinds of properties: (a) quality-like and (b) quantity-like, that is, one which represents the qualities of the objects and another which represents the quantities of the objects. To bring out the difference between the two types of properties, he states that no property can be regarded as a quantity of a substance unless it is fully measurable and a process of addition which fulfils all the laws can be found for it. For example, we can find a body whose weight is one unit and add to it another body with the same weight, to get a weight of two units. However, when we combine bodies of equal density, we always obtain a body of the same density. The distinction between these properties (namely weight and density) corresponds to the distinction between quantitative and qualitative properties of substances. This Campbellian idea is truly difficult to implement in psychology because we cannot say that two persons each possessing an IQ score of 60 jointly equal a person whose IQ score is 120, or that a person whose IQ score is 120 is twice as intelligent as a person whose IQ score is 60.

According to Campbell, measurement applies to both types of properties. But quantitative properties like length, weight, volume, height, and so on, can be measured in a ‘fundamental’ or direct way, that is, one does not require the measurement of other properties to measure them, and they admit a ‘stronger’ or higher level of measurement than qualitative properties. On the other hand, qualitative properties can be measured or determined only in the ‘derived’ or indirect way, which is based on the measurement of other magnitudes. For instance, the values of density (d = m/v) can only be determined on the basis of a prior measurement of mass (m) and volume (v), while hardness is measured on Mohs’ scale only in a limited or incomplete way. In psychological research, measurement is often qualitative and indirect, as psychological properties like intelligence, emotions, attitudes, and so on, cannot be accessed directly. They can only be indirectly assessed by observation, psychological tests or any other such technique.

Thus, the theory of measurement, as enunciated by Campbell, can be stated in terms of the following:

1. Strictly speaking, one does not measure physical objects; one measures their properties.
2. There are two kinds of properties possessed by substances: quality-like and quantity-like.
3. Measurement applies to both, but each can be measured by different ways or modes.
4. Quantity-like properties admit a more precise or a higher level of measurement than quality-like properties.
5. Quantity-like properties such as length, weight and volume satisfy the law of addition and are measurable by the fundamental or direct way.
6. Quality-like properties such as density and hardness are not additive and can only be measured by the derived or indirect way.
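Campbell's contrast between an additive property (weight or mass) and a derived, non-additive one (density) can be checked with a few lines of arithmetic. The sketch below is illustrative only: the `Body` class, the `combine` function and the numerical values are hypothetical, and it assumes that volumes simply add when two bodies are joined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Body:
    mass: float    # fundamental magnitude, e.g. in grams
    volume: float  # fundamental magnitude, e.g. in cubic centimetres

    @property
    def density(self) -> float:
        # Derived magnitude: d = m / v, computed from two prior measurements.
        return self.mass / self.volume


def combine(a: Body, b: Body) -> Body:
    # Assumption: joining two bodies adds their masses and their volumes.
    return Body(a.mass + b.mass, a.volume + b.volume)


x = Body(mass=10.0, volume=5.0)
y = Body(mass=10.0, volume=5.0)
z = combine(x, y)

print(z.mass)     # mass obeys the law of addition: 10 + 10
print(z.density)  # density does not add: it stays at the common value
```

Mass doubles on combination while density is unchanged, which is the sense in which only the former counts as quantity-like in Campbell's account.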
Measurement in the Domain of Social Science

The science of measurement was earlier a concern of physical scientists and it was thought that only empirical objects or their physical properties can be measured. As Peter Caws points out, ‘There seems
to be a general feeling that no physical property can really qualify as such, unless we know how to measure it, that is, unless we know how to describe situations in mathematical terms’ (Caws 2005). It was considered that sociological and psychological traits being abstract (not physical) and more qualitative in nature, are not amenable to measurement. However, in the recent past, efforts have been made by psychologists to measure psychological traits through statistical and mathematical manipulation. Further, by acceptance and application of scales and measurement methodology developed by psychologists in the other fields of social sciences, one of the major obstacles hindering the study of social systems or phenomena on scientific lines was removed.
Stevens’s Contribution

After Campbell, the major contribution to the theory of measurement was made by S.S. Stevens, Professor of Psychology at Harvard University. While agreeing with Campbell that measurement involves a linking of the number system to discriminable aspects of people, objects or events according to one or another rule or convention, Stevens, in his earlier writings, put forward the view that assignment of numbers or numerals under different rules or conventions leads us to different kinds of scales of measurement, namely, Nominal Scale, Ordinal Scale, Interval Scale and Ratio Scale. Later, he proposed another kind of scale called the Logarithmic Interval Scale, which, according to him, can possibly be derived. According to Stevens (1951):

Scales are possible in the first place only because there exists an isomorphism between properties of the numeral series and the empirical operations that we can perform with the certain aspects of the objects. This isomorphism, of course, is only partial. Not all the properties of number and not all the properties of objects can be paired off in a systematic correspondence. But some properties of objects can be related by semantic rules to some properties of the numeral series. In dealing with different aspects of objects, we can invoke empirical operations for determining equality (the basis for classifying things), for rank-ordering, and for determining when differences and ratios between the aspects of the objects are equal.
Stevens pointed out that with a Nominal Scale, we can determine equality or sameness among entities; with an Ordinal Scale, we can determine the rank-order of entities according to the degree to which a particular attribute is possessed by them; with an Interval Scale, we can determine not only the relative rank of the entities with respect to the degree to which a particular attribute or characteristic is possessed by them, but also how large the difference between the entities is, although the zero value of the variable attribute is not fixed in an absolute sense. With a Ratio Scale, it is possible to determine all four relations, namely, equality, rank-order, equality of intervals and equality of ratios among the entities or phenomena. Stevens further pointed out that the terms ‘fundamental’ and ‘derived’ were used by Campbell to describe two classes of measurement operations or conventions and not two different classes of scales.
Both fundamental measurement and derived measurement usually result in Ratio Scale. Further, the derived magnitudes are basically mathematical functions of certain fundamental magnitudes, as we find in the case of measurement of density.
TYPES OF MEASUREMENT SCALE
A scale reminds us of a measuring instrument (a ruler, thermometer, weighing machine, and so on), or a scale may be looked upon as a set of items, as in the Likert scale. It can also be a set of questions so constructed that the entities being measured can be systematically assigned scores or values on the chosen scale. Assignment of such values (or scores) is related to the amount of the attribute (or quality) that the entity possesses. However, by scales of measurement, Stevens meant different ways of measurement. Although a large number of scales exist and can be created for measuring attributes of people, objects, events, and so on, all scales belong to one of the four basic types:
1. Nominal,
2. Ordinal,
3. Interval and
4. Ratio.
These scales actually form a hierarchy of four measurement procedures, the lowest in the hierarchy being the Nominal Scale measurement and the highest the Ratio Scale measurement. That is why the expression ‘levels of measurement’ has been used by some scholars for scales of measurement. It may be pertinent to record here that the measurement of many physical quantities has progressed from scale to scale. Initially, men knew temperature only by sensation; when things could only be judged a shade warmer or colder than other things, temperature belonged to the ordinal class of scales. It became an Interval Scale with the development of thermometry and, thereafter, when thermodynamics used the expansion ratio of gases to extrapolate to absolute zero, it became a Ratio Scale. The major characteristics of these classes of scales and the empirical operations that are possible with each scale are shown in Table 1.2.
Table 1.2 Types of Measurement Scales with Major Attributes

| Scale Type/Level | Basic Empirical Operations Possible | Mathematical Relations | Permissible Transformation | Permissible Statistics | Typical Example |
|---|---|---|---|---|---|
| Nominal | Determination of Equivalence or Non-Equivalence | A ≠ B. For example, if categories are male and female, then a person A who falls in the male category will not be similar to another person B who falls in the female category in terms of chosen attributes. | Y = f(x), where f(x) means any one-to-one substitution; categories are mutually exclusive and a person can fall in only one category. | Number of cases in a given set, mode, contingency. | Male vs female classification; Literate vs illiterate classification. Various psychological scales/tests use nominal property; for example, Maudsley Personality Inventory (MPI) score categorises test-takers in terms of introverts and extroverts. |
| Ordinal | The above plus determination of rank order (that is, greater or less than). | A > B, B < A. For example, if the IQ score of A is greater than B, then (the only other possibility) the IQ score of B is less than A. Hence, the mathematical relation of symmetricity is applicable to Ordinal Scale. | Y = f(x), where f(x) means any increasing monotonic function. | Median, percentile, rank correlation, Sign test, Run test. | Arranging the students of a class based on their marks obtained in the psychology exam or their height or marathon results in terms of 1st, 2nd, 3rd position and so on. |
| Interval | The above plus determination of equality of intervals. | (A – B) = (C – D) | Y = ax + b, where a > 0 (two degrees of freedom). | Mean, standard deviation, average deviation, correlation ratio, t-test, Product Moment Correlation. | Celsius and Fahrenheit scales, calendar dates, IQ scales. |
| Ratio | The above plus determination of equality of ratios. | A/B = C/D, A/B ≠ C/D | Y = ax, where absolute zero exists, a > 0 (one degree of freedom). | Geometric mean, coefficient of variation, harmonic mean, per cent variation. | Numerosity, length, weight, density, Kelvin-scale, time interval, height, Cephalic Index. |

Source: Author.
Notes:
1. Scales are listed in an ascending order of power or strength. The stronger scales have the ability to perform the empirical and mathematical operations of the weaker ones. Thus, Ratio Scale is supposed to perform all the operations of the Interval, Ordinal and Nominal Scales as well as those which are unique to itself.
2. The scale type is defined by the manner in which numbers can be transformed without loss of empirical information.
3. The basic operations needed to create a scale are listed in the second column.
4. In column three, lower case letters indicate numerals and upper case letters indicate numbers.
5. The fourth column gives the mathematical transformations that leave the scale form invariant. Any numeral x on a scale can be replaced by another numeral y, which is a function of x as listed in the column.
6. The fifth column lists, cumulatively downwards, some of the statistical operations that show invariance under the transformations of the fourth column.

Nominal Scale
This is the most basic scale of measurement. It consists of formulating a purposeful and homogeneous (that is, having a logical relationship) set of classes or categories of entities, facts or data on the basis of some trait(s), and assigning them some symbols or numerals as a way of differentiating two or more classes of entities or data, and keeping track of them. It may be pointed out that each category and its members are assigned the same symbols or numerals. For example, we may assign individuals to such categories as males and females depending upon their sex, or as high intelligent and low intelligent based on intelligence scores, as externally controlled individuals and internally controlled individuals, as adjusted and maladjusted individuals, and so on. The following exercise will make these points clear:
Exercise
In a pre-poll survey done on 1,000 persons for the November 2008 Presidential Elections in the United States of America (USA), 500 persons voted for the Democratic Party (Barack Obama), 400 persons voted for the Republican Party (John McCain), and the rest were undecided/neither. Express this survey on the Nominal Scale.
Solution
The voters fall in three mutually exclusive categories, that is, those who voted for the Democratic Party (50 per cent), those who voted for the Republican Party (40 per cent) and those who were undecided or voted for neither (10 per cent). So, the categories of the Nominal Scale are (a) Democratic Voters, (b) Republican Voters and (c) Undecided/Neither.
As a rule, if a set of entities can be assigned to a set of classes that are (a) exhaustive, that is, they include all the entities or the data in the set, (b) mutually exclusive, that is, no entity or datum of the set belongs to more than one class and (c) each represented by a distinct symbol or label, then a nominal measurement is achieved. By nominal measurement, we are able to decide whether a given person, object, fact or datum belongs to a given category, and can determine equivalence or non-equivalence between things or data. Further, nominal measurement is only concerned with discriminating and comparing entities or data according to the type of characteristic possessed by them, rather than the degree to which a characteristic is possessed. Nominal measurement is the least sophisticated and is considered the lowest level of measurement. At the same time, it is basic to all the higher levels of measurement.
The only arithmetical operation applicable to Nominal Scales is counting, a mere enumeration of members in a category or a set. Further, mathematically, the basic characteristic of the nominal level measurement is that the entities or data placed in a given category are equal to each other with respect to the given characteristic. The relation of equality or identity between things, expressed by the sign ‘=’ (is equal to), has the logical properties of being reflexive, symmetrical and transitive. ‘Reflexivity’ means that every entity in a category is equal to itself, that is, a = a for all values of a in a given category.
In a ‘symmetrical’ relation, if a stands in a certain position with respect to b, then b stands in the same position with respect to a (that is, if a = b then b = a). In a ‘transitive’ relation, if a stands in a certain position with respect to b, and b stands in a similar position with respect to c, then a also stands in that position with respect to c, that is, if a = b and b = c, then a = c. These three logical properties are operative among objects or symbols in the Nominal Scale.
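The survey exercise above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list `responses` is a hypothetical reconstruction of the 1,000 answers from the counts given in the exercise, and the numeric codes in `recode` are arbitrary labels chosen for the example.

```python
from collections import Counter

# Hypothetical reconstruction of the survey: one label per respondent,
# so the categories are mutually exclusive by construction.
responses = (["Democratic"] * 500
             + ["Republican"] * 400
             + ["Undecided/Neither"] * 100)
categories = {"Democratic", "Republican", "Undecided/Neither"}

# Exhaustive: every response belongs to one of the declared categories.
is_exhaustive = all(r in categories for r in responses)

# Counting is the only arithmetic that is meaningful on nominal data.
counts = Counter(responses)
percentages = {label: 100 * n / len(responses) for label, n in counts.items()}
mode = counts.most_common(1)[0][0]

# A one-to-one substitution of labels (here, arbitrary numerals) leaves
# the category sizes unchanged; the numerals carry no order or magnitude.
recode = {"Democratic": 1, "Republican": 2, "Undecided/Neither": 3}
recoded_counts = Counter(recode[r] for r in responses)
```

The recoding step mirrors the permissible transformation for a Nominal Scale: any relabelling is acceptable as long as distinct categories keep distinct labels.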
Ordinal Scale In the Ordinal Scale, the entities or the data are ranked with respect to the degree to which a particular attribute is possessed by them. In other words, through Ordinal Scale of measurement, we seek to determine the rank order or inequality of elements to which numbers are assigned. A typical example is: A is greater than or better than or more useful than B, B is greater than or better than or more useful than C, and so on. Such relations are designated by the symbol ‘>’ which means ‘greater than’ in reference to particular attributes. We can continue with the earlier exercise to make things clearer.
Exercise Prepare an Ordinal Scale based on the polling survey exercise given above.
Solution
Since the political parties have got varying amount of votes that can be arranged in an ascending or descending order, the Ordinal Scale is possible and will be as follows:
Choice of American Voters: Democratic Party (50%) > Republican Party (40%) > Undecided (10%)
The ‘>’ relation (or relation of higher-level order) has the logical properties of being irreflexive, asymmetrical and transitive. Irreflexivity is a logical property wherein it is not true for any A that A > A. Asymmetry means that if A > B, then B ≯ A. Transitivity means that if A > B and B > C, then A > C. In other words, if the result of a marathon race is measured on the ordinal level scale, one can infer that if person A is faster than person B, and person B is faster than person C, then person A is faster than person C. This relation is maintained with regard to all the individuals in the group. Ordinal Scales allow slightly more refined measurement than do the Nominal Scales because they make it possible to determine, for each entity being measured, whether that entity has more (that is, ‘>’) of the attribute in question than any other entity, the same amount (that is, ‘=’) or less (that is, ‘<’). For instance, it is easier to identify and label willing workers in contrast to unwilling workers than to compare workers in terms of the degree to which the trait is possessed.
An Ordinal Scale, however, does not reflect or tell how much in the absolute sense is the attribute possessed by the entities or how far apart are the entities with respect to the attribute.
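A minimal Python sketch of the ordinal case follows, using the percentages from the polling exercise. The function name `greater` and the particular monotonic transformation are assumptions made for illustration only.

```python
# Vote shares from the polling exercise (per cent).
shares = {"Democratic": 50, "Republican": 40, "Undecided": 10}

# Rank order from highest to lowest share.
ranking = sorted(shares, key=shares.get, reverse=True)

# Any increasing monotonic transformation preserves the rank order.
# v**2 + 7 is an arbitrary example that is increasing for positive v.
transformed = {label: v ** 2 + 7 for label, v in shares.items()}
ranking_after = sorted(transformed, key=transformed.get, reverse=True)

def greater(a, b):
    """The '>' relation between two categories on the chosen attribute."""
    return shares[a] > shares[b]

labels = list(shares)
irreflexive = all(not greater(a, a) for a in labels)
asymmetric = all(not (greater(a, b) and greater(b, a))
                 for a in labels for b in labels)
transitive = all(greater(a, c)
                 for a in labels for b in labels for c in labels
                 if greater(a, b) and greater(b, c))
```

Note that the transformed values change the sizes of the gaps between categories but not their order, which is all an Ordinal Scale claims to represent.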
Interval Scale
In the case of the Interval Scale, entities are not only ordered and ranked with respect to some measured attribute but the distance or difference between neighbouring ranks or states is also reflected, and this distance is constant between each successive interval or rank. The examples of Interval Scales are the Fahrenheit and IQ scales. With the Interval Scale, we come to a form of measurement that is quantitative in the ordinary sense of the word. Ratios of scale values, however, are not meaningful. For instance, on the Interval Scale, heat at 40 degree Centigrade (104 degree Fahrenheit) is not twice as hot as 20 degree Centigrade, because the temperature equivalent to 20 degree Centigrade on the Fahrenheit scale is 68 degree Fahrenheit, and 104 degree Fahrenheit is not equal to 2 × 68 degree Fahrenheit. For clarification, see the calculation below.
The formula for converting degree Centigrade into Fahrenheit is
\[ \frac{F - 32}{9} = \frac{C}{5} \]
where F = Temperature in degree Fahrenheit and C = Temperature in degree Centigrade.
Now, to find the value of 40 degree Centigrade in terms of Fahrenheit:
\[ \frac{F - 32}{9} = \frac{40}{5} \]
Hence, F = 72 + 32 = 104 degree Fahrenheit (i)
Now, to find the value of 20 degree Centigrade in terms of Fahrenheit:
\[ \frac{F - 32}{9} = \frac{20}{5} \]
Hence, F = 36 + 32 = 68 degree Fahrenheit (ii)
From (i) and (ii), we can say that 104 degree Fahrenheit ≠ 2 × 68 degree Fahrenheit (2 × 68 degree F = 136 degree F, which is more than 104 degree F). Hence, 40 degree Centigrade cannot be said to be twice as hot as 20 degree Centigrade.
Also, when we reflect on the fact that 50 degrees Fahrenheit is equal to 10 degrees Centigrade and 100 degrees Fahrenheit is approximately equal to 38 degrees Centigrade, we can very clearly see that 38 degrees Centigrade is not twice as hot as 10 degrees Centigrade. In the same manner, according to Minium et al. (2001), a temperature of 100 degree Centigrade is not twice as hot as 50 degree Centigrade. In fact, according to them, 100 degree Centigrade is only 1.15 times as hot as 50 degree Centigrade (a figure that agrees with the ratio of the two temperatures on the Kelvin scale, 373/323 ≈ 1.15). This problem mainly arises due to the fact that Interval Scales lack an absolute zero, that is, a point where the property being measured is absent and which, thus, offers a reference point for comparing other observations on the scale.
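The temperature argument can be checked numerically. The sketch below is illustrative only; the helper names `c_to_f` and `c_to_k` are assumptions, and the Kelvin offset of 273.15 is the standard conversion constant rather than a value taken from the text.

```python
def c_to_f(c):
    """Centigrade to Fahrenheit: F = 9C/5 + 32 (a transformation Y = ax + b)."""
    return 9 * c / 5 + 32

def c_to_k(c):
    """Centigrade to Kelvin, assuming the standard offset of 273.15."""
    return c + 273.15

f40, f20, f10 = c_to_f(40), c_to_f(20), c_to_f(10)

# Ratios of scale values are NOT invariant under Y = ax + b.
ratio_centigrade = 40 / 20      # 2.0
ratio_fahrenheit = f40 / f20    # 104 / 68, about 1.53

# Ratios of differences (intervals) ARE invariant.
interval_ratio_c = (40 - 20) / (20 - 10)
interval_ratio_f = (f40 - f20) / (f20 - f10)

# On the Kelvin scale, which has an absolute zero, 100 degree Centigrade
# is about 1.15 times 50 degree Centigrade.
ratio_kelvin = c_to_k(100) / c_to_k(50)
```

The two interval ratios agree while the two value ratios do not, which is the sense in which an Interval Scale supports equality of intervals but not equality of ratios.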
Though very few Interval Scales have been developed in the areas of social sciences, we can achieve interval measurement of attributes such as (a) interest, attitude, personality, motivation, and so on, (b) status, reading interests, and so on and (c) recall, relevance or usefulness of information. For this purpose, we can design numerical rating scales beginning from an arbitrary zero point representing the total absence of the attribute or quality being measured, and increasing the value in successive equal units on the scale up to the desired limit. However, it may be pointed out that it is a rather arduous and monumental job to create a measure or scale whose successive units exactly and precisely represent equal increases in quantity. Let us consider the following example of an 11-point rating scale (Figure 1.1) for measuring attitude towards paranormal phenomena like telepathy and precognition.
[Figure 1.1: Attitude towards Paranormal Phenomenon. A numbered rating scale anchored at ‘Negative’ at the low end, ‘Neutral’ at the midpoint and ‘Positive’ at the high end.]
Source: Author.
In this scale, we do have a logical zero point indicating an absence of attitude, and the scale moves from the zero point (meaning the negative or zero attitude) to the 10 point (meaning the positive attitude) in apparently equal units of one, so it looks as if it is an interval measure. But it becomes an interval measure only if we can logically and scientifically defend the proposition that each of the intervals between the 11 points represents exactly the same increment in attitude. Of course, to do this, we have to define precisely the concept of attitude and describe meaningful stages or grades of attitude which are equidistant, and then attempt to develop a scale, starting from the stage where the value of attitude is nil to the stage of maximum or highest attitude. If this is not done, then this scale, despite the intervals appearing equal, remains only an Ordinal Scale.
Ratio Scale The Ratio Scales are the most sophisticated of all the four measurement scales. Weight, length, time interval, electric resistance and temperature measured on the Kelvin scale fall within Ratio Scale measurement. A Ratio Scale has all the characteristics of Nominal, Ordinal and Interval Scales and, in addition, an absolute or natural zero point representing the absence of magnitude of a variable attribute. So, the main characteristics of the four measurement scales can be summarised in the following manner (Table 1.3):
Table 1.3 Scales of Measurement and their Properties

| Type of Scale | Magnitude | Equal Intervals | Absolute Zero |
|---|---|---|---|
| Nominal | No | No | No |
| Ordinal | Yes | No | No |
| Interval | Yes | Yes | No |
| Ratio | Yes | Yes | Yes |

Source: Kaplan and Saccuzzo (2005).
As pointed out earlier, a given body at 40 degrees Centigrade cannot be said to possess four times the heat or kinetic energy of heat that it has at 10 degrees Centigrade. On the other hand, if the temperature of the body is measured on the Kelvin scale, which is the true measure of the kinetic energy of heat associated with the random motion of the molecules of the substance, we can say that the body possesses four times the kinetic energy of heat at 40 degrees Kelvin as at 10 degrees Kelvin. We know that heat is the internal/kinetic energy which an object possesses because its molecules are in motion. Zero degree on the absolute Kelvin scale is the point at which matter ceases to agitate. In other words, at zero degree Kelvin, the molecules of all bodies are completely at rest or motionless. It may be pointed out that zero degree Kelvin is approximately –273 degree Centigrade, that is, the lowest temperature (absolute zero), which can never be achieved.
Once the Ratio Scale is constructed, its numerical values can also be transformed into another unit of the Ratio Scale by multiplying each value by a constant. Expressed algebraically, if x is the measure or value of a given attribute in some specific units, then y = ax would be the measured value on another scale where ‘a’ units of the second kind make one unit of the first kind. Thus, the length of an entity in inches (inch) can be converted to the corresponding length in centimetres (cm) by the use of the following equations:
\[ 1\ \text{inch} = 2.54\ \text{cm} \quad \text{or} \quad 1\ \text{cm} = \frac{1}{2.54}\ \text{inch} \]
\[ 1\ \text{cm} = \frac{3937}{10000}\ \text{inch} \quad \text{or} \quad 1\ \text{inch} = \frac{10000}{3937}\ \text{cm} \]
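The transformation y = ax can be illustrated with the inch-to-centimetre conversion given above. This is a small sketch; the sample lengths are hypothetical values chosen for the example.

```python
import math

CM_PER_INCH = 2.54  # the constant 'a' in y = ax

def inch_to_cm(x):
    """Ratio-scale transformation y = ax with a = 2.54."""
    return CM_PER_INCH * x

# Hypothetical lengths in inches.
lengths_inch = [10.0, 20.0, 50.0]
lengths_cm = [inch_to_cm(x) for x in lengths_inch]

# Ratios between measurements survive the change of unit.
ratio_inch = lengths_inch[1] / lengths_inch[0]
ratio_cm = lengths_cm[1] / lengths_cm[0]

# Zero stays zero: the absolute zero point is preserved.
zero_cm = inch_to_cm(0.0)

# The reciprocal constant, 1 cm expressed in inches.
inch_per_cm = 1 / CM_PER_INCH
```

Because no additive constant is involved, both the zero point and all ratios are preserved, which is what distinguishes this transformation from the Y = ax + b transformation permitted on an Interval Scale.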
ATTRIBUTES OF MEASURING INSTRUMENTS
The quality of research largely depends upon a good or effective measure, and developing a good measure is an arduous task. Measurement is an exact formulation in research, and conceptualising or providing precise definitions of the concepts pertaining to the phenomena or attributes being measured is of basic importance. That is, one must know what it is that is being measured. For example, if one wants to measure the usefulness or effectiveness of teaching material, one must conceptualise what could be the indicator of its usefulness or effectiveness.
The other criterion which has been laid down by the measurement theorists to achieve meaningful measurement is to evaluate or test the instrument with respect to its (a) validity and (b) reliability. These two are extremely important properties of all sound measures. The validity of a measure is defined as the degree to which a measure actually measures what it claims or seeks to measure (Nunnally 1967). Further, it is necessary to gather some sort of evidence which confirms that the measuring device does, in fact, measure what it is expected to measure. Traditionally, three basic types of validity have been established, each of which relates to a different aspect of the measurement situation. These are (a) content validity, (b) empirical validity and (c) construct validity. To validate a certain measuring instrument, these aspects must be checked.
The reliability of a measure is the degree to which the results of measurement are free of error. A measuring instrument is considered reliable if its results remain constant when the measurement of a given attribute is taken by different observers or at different points in time. In other words, an instrument is known to be reliable if it gives consistent results from one set of measurements to another. It may be pointed out that some error is bound to occur in any type of measurement. To the extent to which measurement error is slight, a measure is said to be reliable. The other factors that can cause error in measurement, besides the defective measuring instrument, are (a) the lack of stability in the attribute being measured, (b) the lack of skill, absent-mindedness or carelessness of the investigator and (c) lies in the pool of the researcher (at the time of collecting data).
Reliability of a measuring instrument can be established by one or more of the following methods: test–retest, parallel forms, split-half and the method of rational equivalence. Because of the nature of measurement in behavioural and social sciences, the errors that occur when social attributes are measured are likely to be much greater than when physical properties are measured. This fact needs to be kept foremost in mind by a researcher while measuring social phenomena and human attributes.
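Two of the methods named above, test–retest and split-half, can be sketched as follows. All scores below are hypothetical, the helper names are assumptions, and the Spearman-Brown step-up used for the split-half estimate is a standard correction that this passage does not itself describe.

```python
from math import sqrt

def pearson(x, y):
    """Product moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_brown(r_half):
    """Step up a half-test correlation to a full-length estimate."""
    return 2 * r_half / (1 + r_half)

# Test-retest: the same (hypothetical) six persons tested on two occasions.
first_testing = [12, 15, 9, 20, 17, 11]
second_testing = [13, 14, 10, 19, 18, 10]
test_retest_r = pearson(first_testing, second_testing)

# Split-half: hypothetical right/wrong (1/0) answers, six persons by six items.
items = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 0, 0],
]
odd_half = [sum(row[0::2]) for row in items]   # items 1, 3, 5
even_half = [sum(row[1::2]) for row in items]  # items 2, 4, 6
half_r = pearson(odd_half, even_half)
split_half_r = spearman_brown(half_r)
```

The odd-even split is one common way of forming the two halves; other splits are possible and may give somewhat different estimates.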
APPLICATION OF MEASUREMENT THEORY TO EDUCATIONAL AND PSYCHOLOGICAL RESEARCH Scientists are more concerned with the development of knowledge which is objective, exact and verifiable. As such, the aim of a scientist or a researcher in any scientific inquiry is to collect facts about an object or a phenomenon or a system, or a problem under investigation, objectively and precisely. These facts are not observed or considered in isolation of one another. Rather, an attempt is usually made to identify the nature of the exact relationship among them to interpret and formulate deductive or probable explanation of the object, or the system or the event under study. In order to ascertain the extent, dimension or magnitude of something, or to determine the attribute of something with precision, scientists so often resort to measurement. This helps them to
obtain quantitative data or information about objects, phenomena, systems or their attributes. The quantitative data or information which accrues through measurement is more precise and meaningful than non-quantitative data or information, which is generally vague and quite often indefinite. Through measurement and quantification with the aid of mathematical models and statistics, scientists are able to distinguish objects or their properties, and predict or establish a relationship amongst them with a greater degree of refinement and exactness. Measurement is, thus, one of the essential elements of scientific investigation and discovery. It is also the sine qua non in systems study, decision-making or any other problem-solving activity. Application of measurement principles and scales enables a system-analyst or manager to weigh, evaluate and judge the relative utility or outcome of each of the alternative designs or courses of action available to them, and choose the best or the most suitable amongst them.
PROBLEMS IN PSYCHOLOGICAL MEASUREMENTS
Indirectness of Measurement
Various psychological attributes are accessible to research and measurement only indirectly. Generally, the attributes underlying psychological processes are presumed to manifest themselves only through overt behaviours, which are considered objective and quantifiable. For example, if a researcher is interested in measuring the personality dimensions of a subject, these are not directly available for measurement in the way that physical quantities, like length, are visible and concretely available for observation and assessment. The only way to measure them is to assess the person on a set of overt or covert responses (for example, by administering a psychological test) related to his personality or other psychological attributes of interest. This renders the meaning and scope of psychological measurement limited and inexact.
Lack of Absolute Zero Absolute zero, in case of psychological measurement, means a situation where the property being measured does not exist. The absolute zero is available in case of physical quantities, like length, but is very difficult to decide in the case of psychological attributes. For example, suppose an investigator wants to measure shyness, or say, attitude towards fashion, then it is very difficult to define and find a situation where there is absence of shyness or the attitude towards fashion.
We Measure a Sample of Behaviour not the Complete Behaviour In psychological measurements, a complete set of behavioural dimensions is not possible and we take only a carefully chosen sample of behavioural dimensions to assess the attributes in question.
For example, the Wechsler Intelligence Scale uses 35 carefully chosen words from the English dictionary to judge the vocabulary of the test taker. Although the sample is chosen only after fulfilling the various psychometric criteria, like randomness, representativeness, and so on, it is always questionable to reach a conclusion about an aspect of behaviour only by measuring a small, though representative, part of it.
Lack of Sufficient Stimulus/Responses Threshold Another problem encountered in psychological measurement is the creation of sufficient amount of variable strength or threshold, which is actually relevant while studying the particular psychological attribute. Experimental method, the main method behind the scientificity of psychology, heavily suffers from this problem. Apart from this, it is also very difficult to decide the relevant levels of the response threshold that is adequate for an accurate prediction of the behavioural dimension in question.
Uncertainty and Desirability Involved in Human Responses
Test subjects often give uncertain or socially desirable responses, which generally negate the entire purpose of the psychological measurement. Uncertainty may arise due to negligence on the part of the researcher, carelessness on the part of the subject(s), or uncontrolled extraneous variables. Although the researcher may correct the problems arising out of such uncertainties by adhering strictly to the tenets of scientific research and measurement, treating the problem of desirability is more challenging. On tests of intelligence, though test items on arithmetical ability, logical numerical ability, and so on, lessen the scope for desirable responses on the part of the subjects, the subject(s) may resort to guessing, which may again defeat the purpose of psychological research and measurement.
Variability of Human Attributes Over Time Various human attributes, like intelligence, personality, attitude, and so on, are likely to vary over a period of time, and sometimes even hours are sufficient to provide scope for such variations. Psychological attributes are highly dynamic and they continuously undergo organisation and reorganisation. To capture these fluctuating attributes in terms of exact numbers is really an uphill task for any researcher. Apart from this, this variability of attributes acts as a threat to the validity of psychological research and measurement.
Problem of Quantification It is questionable whether numbers are eligible and capable enough to denote all the psychological attributes. Quantitative measurement has its own limitations because everything that exists may not always exist in some amount, and even if it may, we may find it difficult to assign this amount a number that exactly captures its meaning and essence. For example, various emotions like disgust, jealousy, grief, surprise, and so on, are difficult to quantify. A researcher may find it difficult to decide various levels for such variables in quantifiable terms, and the choice of a number that is appropriate for a particular level of such attributes may become even more difficult.
2
Errors in Measurement
CHAPTER OUTLINE
1. What are Errors of Measurement?
2. Sources/Types of Error:
(i) Accidental/Chance Error
(ii) Systematic/Biased Error
(iii) Interpretative Error
(iv) Variable Error
(v) Personal Error
(vi) Constant Error
(vii) Statistical Error
• Errors of Descriptive Statistics
• Errors of Inferential Statistics
LEARNING OBJECTIVES
After reading this chapter, you will be able to understand:
1. What are errors of measurement?
2. Where do they come from?
3. What are the different types of measurement errors?
4. What are Type I & Type II errors?
5. How to derive equations for various ‘test errors’?
WHAT ARE ERRORS OF MEASUREMENT?
To err or make mistakes is quite natural. Even in our day-to-day life, we commit a number of errors; some of them because of our nature and others because of sudden situational changes or chance factors. This implies that errors are as necessary as purity, because nothing can be labelled as ‘complete’ or ‘perfect’ in itself. Though errors are unavoidable, they can certainly be reduced to a minimum by taking precautions. The same is true for measurement, be it physical, mental or psychological, though the errors of measurement in mental science and psychology are much more prevalent than in physics because the nature of the subject is comparatively more complex.
For example, four individuals A, B, C and D were asked to measure the length of a 50 inch long table. The length measured by A was 44 inches, and 47 inches and 48 inches by C and D, respectively. Although the table is a physical object, the four persons did make errors in measuring its length. The error in physical measurement can be calculated numerically. In contrast, for mental sciences and psychology, both the measurement and the errors of measurement are difficult to assess because, in addition to the personal characteristics of the subject, the situational and chance factors also influence the results. The errors of measurement are calculated by different statistical measures.
But what is meant by errors of measurement? Generally speaking, the difference between the actual score of a person on a certain job and the score obtained by him is called the error of measurement. For instance, on a certain test, the intellectual capacity of a child is calculated as 115 but his actual intelligence quotient (IQ) is 120. This error is due to the unreliable test used and is called the error of measurement. To make it clearer, let us consider another example. In a language test of English, a student scored 70 marks out of 100, but in another language test of English, he scored 50 marks out of 100.
This is attributed to error either in the test or in measurement. Errors of Measurement refer to the difference between an individual’s hypothetical true score and the actual score obtained by him. Since exact determination of an individual’s hypothetical true score is not possible, all measurements carry an element of error. The point to be remembered is that there exists an inverse relationship between the error of measurement and the confidence one can place in the measurement, that is, the reliability of the measurement. The lesser the error of measurement, the more will be the reliability of the test (or any other measurement technique used). Thus, the reliability of a test/instrument can be estimated from the error of measurement. It can, therefore, be concluded that the errors relating to measurement are those that are usually committed while collecting data.
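The definition can be made concrete with the figures from the table and IQ examples. The sketch below assumes a sign convention of obtained score minus true score, which the text does not fix, and the function name is hypothetical. B's reading is not reported in the example, so it is left out.

```python
def measurement_error(obtained, true):
    """Error of measurement, taken here as obtained minus true (assumed convention)."""
    return obtained - true

# Table example: true length 50 inches; readings reported for A, C and D.
TRUE_LENGTH = 50
readings = {"A": 44, "C": 47, "D": 48}
table_errors = {person: measurement_error(value, TRUE_LENGTH)
                for person, value in readings.items()}

# IQ example: obtained score 115 against an actual IQ of 120.
iq_error = measurement_error(115, 120)

# The smaller the absolute error, the more confidence the reading deserves.
most_accurate = min(table_errors, key=lambda p: abs(table_errors[p]))
```

In practice the true score is hypothetical and unknown, so errors of this kind are estimated statistically rather than computed directly as they are here.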
SOURCES/TYPES OF ERROR Psychologists have divided the errors of measurement in two types: 1. Accidental/Chance errors and 2. Systematic/Biased errors.
1. Accidental/Chance Errors
These errors, as the nomenclature suggests, can occur at any time; for example, sudden noise during test administration or physical inconvenience or pain experienced by the subject during the administration, and so on, are interfering factors that can produce errors in the results. As these factors are situational, the amount of chance error varies. The direction in which they work also varies, that is, such errors can sometimes increase the resultant value and sometimes decrease it. When the scores of students increase or decrease, it is the chance factors that are at work. An important quality of chance factors is that with repeated measurements, these errors diminish naturally.
Such errors are also inherent in the test/instrument being used, and these errors are called test-centred errors. It is not possible for any test to be totally culture-free because no test can contain all the items applicable to all types of cultures. Any test administered to two random samples from a population yields different results, and this difference in results is attributed to the test-centred errors. This type of error is test-centred because items related to different populations cannot be put together in one test.
The second type of chance errors are subject-centred errors. All the factors related to the subject, like health, motivation, will, attention factor, working ability, and so on, do matter during the administration of the test. Any disturbance in these factors can bring about a significant difference in the results.
The third type of chance error is assessment error. Generally, sufficient numbers of examiners are not available to assess the subject’s result, which causes inconvenience as well as errors of measurement. In material assessment, such errors are less, but in the assessment of detailed or essay type responses, these errors are more prominent because the ‘situation factor’ has to be considered.
Thus, the degree of such errors is more.
2. Systematic/Biased Errors
As these errors work in a specific direction, they can produce quite misleading results. They arise from prejudiced thinking, personal biases, improper morals, and so on, and they can never be removed entirely. Under this heading we first consider personal errors, which arise from mistakes made by the tester during test administration or from a lack of alertness on his part; for example, errors made in reading the thermometer, the stop watch or the test manual. Such errors also occur when, in haste, the tester starts or stops the stop watch early, or when starting or stopping the stop watch is delayed for some reason. This directly affects the experiment or its results and defeats the actual aim of the study (or the experiment). These errors can be reduced by comparing the results obtained by several testers/experimenters.

The second type of biased error arises from carelessness. These errors are evident when, for example, the experimenter writes 1 instead of 0.1, or 0.01 instead of 0.1, or –ve (negative) instead of +ve (positive). Such errors can actually alter the results. In order to avoid them, the experimenter and the evaluator should be very careful while recording and calculating the results.

The next type is unavoidable errors. Social scientists are generally interested in humans, and human behaviour is influenced by so many internal and external factors that it is perhaps impossible to count them. The most evident factors can be controlled by the scientist, but many factors are still left uncontrolled, and these leave their impact by altering the behaviour in some way or the other. There are many factors which, we might think, cannot influence a particular behaviour, yet they do leave some impact. These unchecked/uncontrolled factors produce errors called unavoidable errors. These errors can be reduced by controlling more factors.

In addition to the classification stated above, there is another classification of errors of measurement by Mursell (1947). According to him, there are four kinds of errors of measurement in behavioural studies.
3. Interpretative Errors
Interpretative errors occur due to the wrong or erroneous interpretation of (test/experiment) results or scores. Sometimes these errors occur because of a set of wrong assumptions on the part of the researcher, and sometimes they occur due to inherent problems in the numerical measurement of psychological properties. There are two shortcomings in the measurement of behavioural qualities. First, there is no absolute zero point. Second, it is not certain whether all the points on the scale are equidistant. For these two reasons, data and results have no value or meaning in measurement until they are interpreted. In this type of error, the examiner is usually unable to understand in which group's context an individual's score is to be interpreted. In such a situation, the examiner generally evaluates inappropriately and also interprets incorrectly. Such mistakes are known as 'interpretative errors'. They are committed due to a misunderstanding regarding:
1. Which kind of group an individual has been compared to and
2. In what way the comparison between the individual and the group has been expressed.

For example, if the scores obtained on a test by high school students are compared to the scores obtained by students of Intermediate or of class VIII, the difference in the interpretation of results will be significant. Similarly, the difference would be evident in the hypothesis. These types of errors are related to the standardisation of the test. By standardisation, we mean that a test should be administered to well-defined groups and the scores obtained by these groups should be recorded, because these records set the standard for that test. For any test, standards can be prescribed for many groups, and these standards can be expressed in different forms such as occupation, geographical area, sex, age, school grade, and so on. The main objective of this distribution is to compare an individual's score to appropriate group norms. In order to control such errors, the experimenter should particularly keep the time factor in mind during the standardisation process.

In order to interpret an individual's score, it is essential to understand two groups to which an individual can be appropriately compared. The first is the group of which the individual himself is a member, and his actions and abilities are compared to those of its other members. The other is the group to which the individual aspires. For example, if a college student is interested in becoming a lecturer, it will be important to compare his educational qualifications and test scores to those possessed by the existing lecturers of the college. The final problem to be considered regarding interpretative errors is that the individual's obtained scores are converted into derived scores like mental age, IQ, standard scores, and so on, so that they can be compared to a normative group.
4. Variable Errors
The second type of error in psychological measurement is known as variable error. These errors are consequences of impurities arising from different causes and situational factors. For example, if the IQ of a person is measured on three different occasions by different persons, the results will vary. These errors are evident in psychological as well as physical measurements. The snapping of a pencil lead during administration, any kind of noise, pain experienced by the subject during administration, and even incorrect instructions from the test administrator can alter the results. All these are situational errors, and they are called variable errors because the degree of error varies from person to person and, when one person is tested twice on different occasions, from one occasion to the other. The size of such errors in a test can be estimated through the amount of confidence one can place in the test (the test reliability).
5. Personal Errors The third type of errors in measurement is called personal errors. These errors are due to subjective element of an individual. For example, if four persons sitting in a car are asked to read the speedometer, the difference in readings will be quite evident. Similarly, two persons evaluating and interpreting the same response do so quite differently because each has a unique perspective. In the same way, two testers evaluating one subject express their results differently and if the same subject takes the same test twice on different occasions, it gives different results. Such errors are personal errors. Thus, personal errors should be checked out (kept in mind) during the measurement of behavioural qualities. Efforts are made to reduce personal errors during testing and measurement through objectivity.
6. Constant Errors These are based on the fact that most of the behavioural qualities are measured indirectly. It is not possible in psychological measurement, like in mental ability, to dissect the person’s brain and tell that this much is his mental ability or intelligence! Instead, here, the measurement is related to internal abilities and qualities. It is evident that the score of any individual on a mental ability test depends on his ability to read as well. The difference in these two qualities gives rise to constant error. Such errors are called constant errors because the amount of such errors remains constant irrespective of the tester. Such errors are related to the test validity. Thus, in order to reduce these errors, it is most important to find out whether the test actually measures what it is supposed to measure, that is, is the test valid? Also, the description of errors of measurement remains incomplete unless statistical errors are mentioned.
7. Statistical Errors
Generally, statistical errors are divided into two types:
1. Errors of Descriptive Statistics: Under these are included standard errors and probable errors.
2. Inferential Errors: Under these are Type I and Type II errors.
Errors of Descriptive Statistics There are two ways of calculating errors of descriptive statistics. The first is the standard error and the second is the probable error. Standard errors get quite affected by extremes because standard errors are calculated using standard deviations. Thus, the standard deviations in such cases are impure. In such conditions, probable error is calculated. Both standard error and probable error indicate the amount of error or expected error in the data obtained. After calculating these errors, we can also know whether these errors are due to chance factors or due to deviation in variables. If these are due to chance factors, the degree would be less. But if these are due to differences in variables, the degree of error would be high. The determination of error depends to a great extent on the nature of research and planning. Sometimes we agree to take 5 per cent chance errors and sometimes it becomes difficult for us to take even 1 per cent chance errors.
Standard Errors
Whenever the mean of a sample is calculated, the obvious question is whether this mean is appropriate, that is, how far the sample mean may differ from the mean of the population from which the sample is drawn. This difference can be expressed through the standard error. The standard deviation of the sampling distribution of a statistic is called the standard error of that statistic. The standard error therefore indicates how much the mean and the standard deviation of a sample may vary from the mean and the standard deviation of the population from which the sample is drawn. The standard error is important for studying and understanding the characteristics of the sample: it tells us how near the sample is to the population concerned. The smaller the standard error of a sample, the better the sample represents the population. A good sample will show a low standard error. Separate formulae are used to calculate the standard error of different measures, some of which are given below:

1. Standard error of the Mean of a small sample is given by:
$$SE_M = \frac{\sigma}{\sqrt{N-1}}$$
where,
$\sigma$ = standard deviation of the sample
$N$ = size of the sample
$SE_M$ = standard error of the mean (standard mean error)
Note: If the sample size is less than or equal to 30, it is considered a small sample. If the size of the sample is more than 30, the sample is large.

2. Standard error of the Mean of a large sample is given by:
$$SE_M = \frac{\sigma}{\sqrt{N}}$$
where,
$\sigma$ = standard deviation of the sample
$N$ = size of the sample
Note: In a large sample, the value 1 is very small relative to N, so the N − 1 correction is not applied in the denominator.
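3. Standard error of the Median (Md) of a sample is given by:
$$SE_{Md} = \frac{1.253\,\sigma}{\sqrt{N}} \quad \text{(in terms of } \sigma\text{)}$$
$$SE_{Md} = \frac{1.858\,Q}{\sqrt{N}} \quad \text{(in terms of } Q\text{)}$$
where $\sigma$ is the standard deviation and $Q$ the quartile deviation of the sample.

4. Standard error of the Quartile deviation of a sample is given by:
$$SE_Q = \frac{0.786\,\sigma}{\sqrt{N}} \quad \text{(in terms of } \sigma\text{)}$$
$$SE_Q = \frac{1.17\,Q}{\sqrt{N}} \quad \text{(in terms of } Q\text{)}$$

5. Standard error of the standard deviation of a sample is given by:
$$SE_\sigma = \frac{0.71\,\sigma}{\sqrt{N}}$$

6. Standard error of a percentage (%) of a sample is given by:
$$SE_\% = \sqrt{\frac{PQ}{N}}$$
Where
P = percentage of cases in which the behaviour is present
Q = percentage of cases in which the behaviour is absent (here Q = 100 − P, not the quartile deviation)
N = size of sample

7. Standard error of the Correlation of a sample is given by:
$$SE_r = \frac{1 - r^2}{\sqrt{N}}$$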
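The formulae above can be collected into a few small helper functions. The following is a minimal sketch, not part of the original text: the function names are hypothetical, the cut-off of 30 for a small sample follows the note above, and the standard deviation is assumed to be already computed.

```python
import math

SMALL_SAMPLE_LIMIT = 30  # per the note above: N <= 30 counts as a small sample


def se_mean(sd: float, n: int) -> float:
    """Standard error of the mean; uses N - 1 for small samples, N otherwise."""
    denominator = n - 1 if n <= SMALL_SAMPLE_LIMIT else n
    return sd / math.sqrt(denominator)


def se_median(sd: float, n: int) -> float:
    """Standard error of the median in terms of the standard deviation."""
    return 1.253 * sd / math.sqrt(n)


def se_sd(sd: float, n: int) -> float:
    """Standard error of the standard deviation."""
    return 0.71 * sd / math.sqrt(n)


def se_percentage(p: float, n: int) -> float:
    """Standard error of a percentage; p is the % present, 100 - p the % absent."""
    return math.sqrt(p * (100.0 - p) / n)


def se_correlation(r: float, n: int) -> float:
    """Standard error of a correlation coefficient."""
    return (1.0 - r ** 2) / math.sqrt(n)


# Illustrative (made-up) inputs.
print(se_mean(10.0, 26))         # small sample: 10 / sqrt(25)
print(se_mean(10.0, 100))        # large sample: 10 / sqrt(100)
print(se_percentage(50.0, 100))  # sqrt(50 * 50 / 100)
print(se_correlation(0.6, 64))   # (1 - 0.36) / 8
```

Keeping each formula in its own function makes it easy to see which constant belongs to which statistic.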
Probable Errors
If the results of two observations made on different occasions differ only slightly, or if the observations of two persons on the same problem are not very different, the measure is considered reliable; a large deviation between the two observations makes it unreliable. When we try to estimate the true value through measures of variability in this way, the result is the probable error. The probable error of a statistic is 0.6745 times its standard error. Using different formulae, we can calculate the probable error for different measures.

1. Probable error of the Mean of a sample is given by:
$$PE_M = \frac{0.6745\,\sigma}{\sqrt{N}}$$

2. Probable error of the Median of a sample (in terms of $\sigma$) is given by:
$$PE_{Md} = \frac{0.8454\,\sigma}{\sqrt{N}}$$

3. Probable error of the Median of a sample (in terms of the quartile deviation $Q$) is given by:
$$PE_{Md} = \frac{1.2523\,Q}{\sqrt{N}}$$

4. Probable error of the Correlation of a sample is given by:
$$PE_r = \frac{0.6745\,(1 - r^2)}{\sqrt{N}}$$
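Since a probable error is conventionally 0.6745 times the corresponding standard error, the probable-error formulae can be sketched as thin wrappers around the standard errors. This is an illustrative sketch only; the function names are hypothetical and the inputs are made up.

```python
import math

PE_FACTOR = 0.6745  # conventional multiplier: probable error = 0.6745 x standard error


def probable_error(standard_error: float) -> float:
    """Convert any standard error into the corresponding probable error."""
    return PE_FACTOR * standard_error


def pe_mean(sd: float, n: int) -> float:
    """Probable error of the mean (large-sample form, sd / sqrt(N))."""
    return probable_error(sd / math.sqrt(n))


def pe_correlation(r: float, n: int) -> float:
    """Probable error of a correlation coefficient."""
    return probable_error((1.0 - r ** 2) / math.sqrt(n))


# The coefficient for the median follows from 0.6745 x 1.2533 (median SE factor).
median_coefficient = PE_FACTOR * 1.2533

print(pe_mean(10.0, 100))
print(pe_correlation(0.6, 64))
print(median_coefficient)
```

The last line shows where the coefficient of roughly 0.845 in the median formula comes from.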
Inferential Errors
The following are the two types of inferential errors:
1. Type I error
2. Type II error

Type I Error
This type of error is committed when a null hypothesis (H0) is rejected although it is, in fact, true. For example, suppose H0 is that there is no significant difference between the IQ of boys and girls. If the calculated t value for these groups is greater than 1.96, H0 is rejected at the 0.05 level, indicating that there is some difference between the two groups. But if there is no actual difference in the population means and a difference is inferred on the basis of the t value, then a Type I error is committed (Figure 2.1). The probability of this error is represented by α (alpha), which is the significance level chosen for the test. The more stringent the significance level, the smaller α is. For example, if the significance level for a hypothesis is kept at 0.05, there are more chances of committing a Type I error, and if the significance level is kept at 0.01, the chances of committing a Type I error are comparatively less.

Type II Error
Such an error is committed when H0 is accepted although it is, in fact, false. For example, suppose H0 is that there is no significant difference between the IQs of boys and girls. If the t value calculated for these groups is less than 2.58, H0 is accepted at the 0.01 level; if there is actually a difference between the two groups, a Type II error is committed (Figure 2.1). The probability of this error is denoted by β (beta). For a given sample, β moves in the opposite direction to α: the more stringent the significance level, the greater the chance of a Type II error. For example, a Type II error is more likely when the significance level is 0.01 than when it is 0.05. This is important in psychological experimentation.
Figure 2.1   Type I and Type II Errors

                            Null Hypothesis True     Null Hypothesis False
Reject Null Hypothesis      Type I Error             Correct
Accept Null Hypothesis      Correct                  Type II Error

Source: Author.
Both types of error are related to each other. If we try to reduce Type I error, we increase the chance of a Type II error being committed and, similarly, if we wish to reduce Type II error, we are taking a chance with Type I error. The two types of errors can be written as:
- Type I error: rejecting the null hypothesis (H0) when it is true,
- Type II error: accepting the null hypothesis (H0) when it is false.
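The link between a critical value and the chance of a Type I error can be sketched numerically. The example below assumes a two-tailed test with a large sample, so that the t value is referred to the normal curve; the function names are hypothetical. It also encodes the four outcomes of Figure 2.1.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_tailed_alpha(critical_value: float) -> float:
    """Probability of a Type I error for a two-tailed test at this critical value."""
    return 2.0 * (1.0 - normal_cdf(critical_value))


def decision_outcome(h0_is_true: bool, h0_rejected: bool) -> str:
    """Classify a decision as in Figure 2.1."""
    if h0_is_true and h0_rejected:
        return "Type I error"
    if not h0_is_true and not h0_rejected:
        return "Type II error"
    return "Correct"


# 1.96 corresponds to roughly the 0.05 level, 2.58 to roughly the 0.01 level.
print(round(two_tailed_alpha(1.96), 2))
print(round(two_tailed_alpha(2.58), 2))
print(decision_outcome(h0_is_true=True, h0_rejected=True))
```

A larger critical value leaves less area in the tails, which is why moving from 1.96 to 2.58 lowers the chance of a Type I error.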
The most often used measures of 'test errors' are the four types of errors, namely:
1. Error of measurement,
2. Error of substitution,
3. Error of estimating observed scores and
4. Error of estimating true scores.
Error of Measurement
The error of measurement is the error committed while substituting the observed/obtained score for the corresponding true score. Suppose we are interested in assigning every person a true score but, instead of the true score, we assign the observed score. The difference between these two scores is the error of measurement. On this basis, the error of measurement is:
$$e = x - t \tag{1}$$
Or,
$$x = t + e \tag{2}$$
Where,
e is the error score
x is the observed score
t is the true score
(all expressed as deviations from their respective means).

Squaring both sides, we get
$$x^2 = (t + e)^2 = t^2 + e^2 + 2te \tag{3}$$
Summing both sides, we get
$$\sum x^2 = \sum t^2 + \sum e^2 + 2\sum te \tag{4}$$
Dividing both sides by N, we get
$$\frac{\sum x^2}{N} = \frac{\sum t^2}{N} + \frac{\sum e^2}{N} + 2\,\frac{\sum te}{N} \tag{5}$$
We know,
$$\frac{\sum x^2}{N} = \text{variance of observed score} = S_x^2$$
$$\frac{\sum t^2}{N} = \text{variance of true score} = S_t^2$$
$$\frac{\sum e^2}{N} = \text{variance of error score} = S_e^2$$
Also, the correlation between the t and e scores is given by
$$r_{te} = \frac{\sum te}{N S_t S_e}$$
Or,
$$\frac{\sum te}{N} = r_{te} S_t S_e$$
Substituting these in equation (5) we get
$$S_x^2 = S_t^2 + S_e^2 + 2 r_{te} S_t S_e \tag{6}$$
We know that the correlation between error score and true score is zero (a basic assumption of true score theory). Hence, (6) will become,
$$S_x^2 = S_t^2 + S_e^2$$
$$S_e^2 = S_x^2 - S_t^2 \tag{7}$$
Or, since we know that the true variance is $r_{xx} S_x^2$, (7) will become
$$S_e^2 = S_x^2 - r_{xx} S_x^2$$
Or,
$$S_e^2 = S_x^2 (1 - r_{xx})$$
Taking the square root of both sides, we get
$$S_e = S_x \sqrt{1 - r_{xx}} \tag{8}$$
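The decomposition behind equation (8) can be checked on a tiny constructed data set. The scores below are hypothetical deviation scores, chosen so that true and error scores are exactly uncorrelated; in practice true scores are never observed, so this is only a sketch of the algebra.

```python
import math


def pvariance(scores):
    """Population variance (divides by N, as in the derivation above)."""
    mean = sum(scores) / len(scores)
    return sum((s - mean) ** 2 for s in scores) / len(scores)


def standard_error_of_measurement(sd_x: float, r_xx: float) -> float:
    """Equation (8): S_e = S_x * sqrt(1 - r_xx)."""
    return sd_x * math.sqrt(1.0 - r_xx)


# Hypothetical deviation scores; the products t*e sum to zero, so r_te = 0.
true_scores = [-1.0, -1.0, 1.0, 1.0]
error_scores = [-1.0, 1.0, -1.0, 1.0]
observed = [t + e for t, e in zip(true_scores, error_scores)]

var_x = pvariance(observed)
var_t = pvariance(true_scores)
var_e = pvariance(error_scores)
reliability = var_t / var_x  # r_xx as the ratio of true to observed variance

sem = standard_error_of_measurement(math.sqrt(var_x), reliability)

print(var_x, var_t + var_e)   # observed variance equals true plus error variance
print(sem, math.sqrt(var_e))  # equation (8) recovers the error standard deviation
```

With correlated true and error scores the cross-product term of equation (6) would not vanish, and the two printed pairs would no longer agree.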
Error of Substitution
Error of substitution is the error made while substituting a score on one test for a score on a parallel test. In other words, error of substitution is the difference between two observed scores on parallel tests. Mathematically, error of substitution is:
$$d = x_1 - x_2$$
Where,
d is the difference
$x_1$ is the observed score on one test and
$x_2$ is the observed score on another test.

This definition of the error of substitution applies when we are interested in the differences between the results obtained by one researcher employing one psychological test and another researcher employing a parallel form of the same test. To find the standard error of substitution, we need to find the standard deviation of these difference scores.
$$d = x_1 - x_2$$
Squaring both sides, we get
$$d^2 = (x_1 - x_2)^2 = x_1^2 + x_2^2 - 2x_1x_2 \tag{9}$$
Taking the summation and dividing both sides by N,
$$\frac{\sum d^2}{N} = \frac{\sum x_1^2}{N} + \frac{\sum x_2^2}{N} - \frac{2\sum x_1x_2}{N} \tag{10}$$
We know that
$$\frac{\sum d^2}{N} = \text{variance of difference scores} = S_d^2$$
$$\frac{\sum x_1^2}{N} = \text{variance of obtained score of test 1} = S_1^2$$
$$\frac{\sum x_2^2}{N} = \text{variance of obtained score of test 2} = S_2^2$$
Also, the correlation between scores $x_1$ and $x_2$ is given by
$$r_{12} = \frac{\sum x_1x_2}{N S_1 S_2}$$
Substituting these in equation (10) we get
$$S_d^2 = S_1^2 + S_2^2 - 2 r_{12} S_1 S_2 \tag{11}$$
For parallel tests $S_1 = S_2$, hence (11) can be written as
$$S_d^2 = S_1^2 + S_1^2 - 2 r_{12} S_1^2$$
$$S_d^2 = 2S_1^2 - 2 r_{12} S_1^2$$
$$S_d^2 = 2S_1^2 (1 - r_{12})$$
$$S_d = S_1\sqrt{2(1 - r_{12})} \tag{12}$$
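Equation (11) can be verified directly: the variance of the difference scores should equal the expression built from the two variances and the correlation. The two score lists below are hypothetical, and the helper names are illustrative only.

```python
import math


def pvariance(scores):
    """Population variance (divides by N)."""
    mean = sum(scores) / len(scores)
    return sum((s - mean) ** 2 for s in scores) / len(scores)


def pcorrelation(a, b):
    """Pearson correlation using population (N) moments."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)
    return cov / math.sqrt(pvariance(a) * pvariance(b))


def standard_error_of_substitution(sd: float, r12: float) -> float:
    """Equation (12): S_d = S_1 * sqrt(2 * (1 - r12)), for parallel tests."""
    return sd * math.sqrt(2.0 * (1.0 - r12))


# Hypothetical scores of four persons on two forms of a test.
form_1 = [2.0, 4.0, 6.0, 8.0]
form_2 = [1.0, 5.0, 5.0, 9.0]
differences = [x1 - x2 for x1, x2 in zip(form_1, form_2)]

s1, s2 = math.sqrt(pvariance(form_1)), math.sqrt(pvariance(form_2))
r12 = pcorrelation(form_1, form_2)

var_d_direct = pvariance(differences)
var_d_formula = s1 ** 2 + s2 ** 2 - 2.0 * r12 * s1 * s2  # equation (11)

print(var_d_direct, var_d_formula)
print(standard_error_of_substitution(8.0, 0.5))
```

Equation (11) holds for any two sets of scores; equation (12) additionally assumes the two forms have equal standard deviations.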
Error of Estimating Observed Score
Error of estimating or predicting the observed score is the error made when a regression equation is used to estimate or predict the scores on one test from scores on a parallel test. In other words, it is the error made while predicting the score on one form of the test on the basis of the other form of the same test with the use of the least squares solution, that is, the regression equation. Mathematically, error of estimating observed score would be
$$d' = x_1 - r_{12}\left(\frac{S_1}{S_2}\right)x_2 \tag{13}$$
Where,
d′ is the difference in predicting an observed score on one form from the other form by using the least squares solution,
$x_1$ and $x_2$ are the observed scores,
$S_1$ and $S_2$ are the standard deviations of the two tests, and
$r_{12}$ is the correlation between scores $x_1$ and $x_2$.

For parallel tests, $S_1 = S_2$; hence, equation (13) would become,
$$d' = x_1 - r_{12}\,x_2 \tag{14}$$
Squaring both sides, we get,
$$d'^2 = x_1^2 + r_{12}^2 x_2^2 - 2 r_{12} x_1 x_2 \tag{15}$$
Summing and dividing both sides of equation (15) by N we get,
$$\frac{\sum d'^2}{N} = \frac{\sum x_1^2}{N} + r_{12}^2\,\frac{\sum x_2^2}{N} - 2 r_{12}\,\frac{\sum x_1x_2}{N} \tag{16}$$
Substituting,
$$S_{d'}^2 = \frac{\sum d'^2}{N}, \quad S_1^2 = \frac{\sum x_1^2}{N}, \quad S_2^2 = \frac{\sum x_2^2}{N} \quad \text{and} \quad r_{12} = \frac{\sum x_1x_2}{N S_1 S_2}$$
in equation (16), we get
$$S_{d'}^2 = S_1^2 + r_{12}^2 S_2^2 - 2 r_{12}\,(r_{12} S_1 S_2) \tag{17}$$
For parallel tests, $S_1 = S_2$; hence, equation (17) becomes,
$$S_{d'}^2 = S_1^2 + r_{12}^2 S_1^2 - 2 r_{12}^2 S_1^2$$
$$S_{d'}^2 = S_1^2 - r_{12}^2 S_1^2$$
$$S_{d'}^2 = S_1^2 (1 - r_{12}^2)$$
$$S_{d'} = S_1\sqrt{1 - r_{12}^2} \tag{18}$$
If regression equations are used to estimate the scores on parallel forms of a test and we are interested in the error involved, equation (18) should be used. But if scores on one test are assumed to be equal to the scores on a parallel test, without applying a regression equation, then equation (12) is to be used.
Error in Estimating True Score
Error in estimating true score is the error made in predicting the true score from the observed score while applying the best fitting regression equation or least squares solution. In other words, error of estimating the true score is the difference between the true score and the predicted value of the true score. The predicted value of the true score is given by the prediction equation,
$$t_p = r_{xt}\left(\frac{S_t}{S_x}\right)x \tag{19}$$
Where, $r_{xt}$ is the correlation between observed and true scores, $S_t$ and $S_x$ are the standard deviations of true and observed scores, respectively, and x is the observed score. Mathematically, error in estimating true score is written as,
$$e' = t - t_p \tag{20}$$
Substituting equation (19) in equation (20) we get,
$$e' = t - r_{xt}\left(\frac{S_t}{S_x}\right)x \tag{21}$$
Squaring both sides we get,
$$e'^2 = t^2 + r_{xt}^2\left(\frac{S_t}{S_x}\right)^2 x^2 - 2 r_{xt}\left(\frac{S_t}{S_x}\right)tx \tag{22}$$
Summing and dividing both sides by N, we get,
$$\frac{\sum e'^2}{N} = \frac{\sum t^2}{N} + r_{xt}^2\left(\frac{S_t}{S_x}\right)^2\frac{\sum x^2}{N} - 2 r_{xt}\left(\frac{S_t}{S_x}\right)\frac{\sum tx}{N} \tag{23}$$
Substituting
$$S_{e'}^2 = \frac{\sum e'^2}{N}, \quad S_t^2 = \frac{\sum t^2}{N}, \quad S_x^2 = \frac{\sum x^2}{N} \quad \text{and} \quad r_{xt} = \frac{\sum tx}{N S_t S_x}$$
in equation (23), we get,
$$S_{e'}^2 = S_t^2 + r_{xt}^2\left(\frac{S_t}{S_x}\right)^2 S_x^2 - 2 r_{xt}\left(\frac{S_t}{S_x}\right)(r_{xt} S_t S_x) \tag{24}$$
Or,
$$S_{e'}^2 = S_t^2 + r_{xt}^2 S_t^2 - 2 r_{xt}^2 S_t^2$$
$$S_{e'}^2 = S_t^2 - r_{xt}^2 S_t^2$$
$$S_{e'}^2 = S_t^2 (1 - r_{xt}^2) \tag{25}$$
Since we know that the standard deviation of the true score is given by,
$$S_t = S_x\sqrt{r_{xx}} \tag{26}$$
and the correlation between observed and true score is given by,
$$r_{xt} = \sqrt{r_{xx}} \tag{27}$$
Substituting (26) and (27) in (25) we get,
$$S_{e'}^2 = S_x^2\, r_{xx}\,(1 - r_{xx})$$
$$\therefore\ S_{et} = S_{e'} = S_x\sqrt{r_{xx}(1 - r_{xx})} \tag{28}$$

Relationship between the Four Errors
The four errors obtained are (writing $r_{xx}$ for the correlation between parallel forms, $S_x$ for their common standard deviation, $S_{dt}$ for the $S_{d'}$ of equation 18 and $S_{et}$ for the $S_{e'}$ of equation 28),
$$S_e = S_x\sqrt{1 - r_{xx}}$$
$$S_d = S_x\sqrt{2(1 - r_{xx})}$$
$$S_{dt} = S_x\sqrt{1 - r_{xx}^2}$$
$$S_{et} = S_x\sqrt{(1 - r_{xx})\,r_{xx}}$$
Taking the common factor $S_x\sqrt{1 - r_{xx}}$ out of the four expressions, we can write these four errors as,
$$S_e = S_x\sqrt{1 - r_{xx}}\cdot 1$$
$$S_d = S_x\sqrt{1 - r_{xx}}\cdot\sqrt{2}$$
$$S_{dt} = S_x\sqrt{1 - r_{xx}}\cdot\sqrt{1 + r_{xx}} \qquad (\because\ 1 - r_{xx}^2 = (1 - r_{xx})(1 + r_{xx}))$$
$$S_{et} = S_x\sqrt{1 - r_{xx}}\cdot\sqrt{r_{xx}}$$
As we know that the reliability coefficient of a test ranges from 0 to 1.0,
$$\sqrt{2} > \sqrt{1 + r_{xx}} > 1 > \sqrt{r_{xx}}$$
Hence, by looking at the above four errors, one can say,
$$S_x\sqrt{1 - r_{xx}}\,\sqrt{2} > S_x\sqrt{1 - r_{xx}}\,\sqrt{1 + r_{xx}} > S_x\sqrt{1 - r_{xx}}\cdot 1 > S_x\sqrt{1 - r_{xx}}\,\sqrt{r_{xx}}$$
or,
$$S_d > S_{dt} > S_e > S_{et}$$
or,
$$S_{et} < S_e < S_{dt} < S_d$$
Therefore, for any data, we should always get,
$$S_{et} < S_e < S_{dt} < S_d$$

Problem: Find each of the four error indices for the following three tests.

Test    Number of Items    Mean    S.D.    Reliability
X       60                 35      8.0     0.40
Y       50                 40      9.0     0.50
Z       90                 30      7.0     0.60

Solution:
For Test X:
$S_e = 8.0\sqrt{1 - 0.40} = 6.197$
$S_d = 8.0\sqrt{2(1 - 0.40)} = 8.764$
$S_{dt} = 8.0\sqrt{1 - (0.40)^2} = 7.332$
$S_{et} = 8.0\sqrt{(1 - 0.40)(0.40)} = 3.919$
As 8.764 > 7.332 > 6.197 > 3.919, therefore, for Test X: $S_d > S_{dt} > S_e > S_{et}$

For Test Y:
$S_e = 9.0\sqrt{1 - 0.50} = 6.364$
$S_d = 9.0\sqrt{2(1 - 0.50)} = 9.000$
$S_{dt} = 9.0\sqrt{1 - (0.50)^2} = 7.794$
$S_{et} = 9.0\sqrt{(1 - 0.50)(0.50)} = 4.500$
As 9.000 > 7.794 > 6.364 > 4.500, therefore, for Test Y: $S_d > S_{dt} > S_e > S_{et}$

For Test Z:
$S_e = 7.0\sqrt{1 - 0.60} = 4.427$
$S_d = 7.0\sqrt{2(1 - 0.60)} = 6.261$
$S_{dt} = 7.0\sqrt{1 - (0.60)^2} = 5.600$
$S_{et} = 7.0\sqrt{(1 - 0.60)(0.60)} = 3.429$
As 6.261 > 5.600 > 4.427 > 3.429, therefore, for Test Z: $S_d > S_{dt} > S_e > S_{et}$
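The worked problem can be reproduced with a short script. This is a sketch for checking the arithmetic; the function name `error_indices` and the dictionary keys are hypothetical, while the standard deviations and reliabilities are those given in the problem.

```python
import math


def error_indices(sd: float, r_xx: float) -> dict:
    """Return the four error indices for a test with given S.D. and reliability."""
    return {
        "S_e": sd * math.sqrt(1.0 - r_xx),            # error of measurement
        "S_d": sd * math.sqrt(2.0 * (1.0 - r_xx)),    # error of substitution
        "S_dt": sd * math.sqrt(1.0 - r_xx ** 2),      # error of estimating observed score
        "S_et": sd * math.sqrt(r_xx * (1.0 - r_xx)),  # error of estimating true score
    }


# (S.D., reliability) for the three tests in the problem.
tests = {"X": (8.0, 0.40), "Y": (9.0, 0.50), "Z": (7.0, 0.60)}

for name, (sd, r_xx) in tests.items():
    idx = error_indices(sd, r_xx)
    ordered = idx["S_d"] > idx["S_dt"] > idx["S_e"] > idx["S_et"]
    print(name, {key: round(value, 3) for key, value in idx.items()}, ordered)
```

For every reliability strictly between 0 and 1 the ordering flag comes out true, which is the relationship derived above.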
3
Speed Test versus Power Test
CHAPTER OUTLINE
1. Introduction
2. Speed Test
3. Power Test
4. Types of Errors, and Speed and Power Tests
5. Effect of unattempted items on errors of measurement

LEARNING OBJECTIVES
After reading this chapter, you will be able to understand:
1. What are Speed Tests?
2. What are Power Tests?
3. What are the differences between Speed and Power Tests?
4. What is the effect of unattempted items on the errors of measurement?
5. What are Speed and Power Tests in relation to various types of errors?
INTRODUCTION
Psychological tests can be classified on various dimensions. One classification, based on the rate of performance, distinguishes between Speed Tests and Power Tests.
SPEED TEST Speed Tests are the ones in which individual differences depend entirely on the speed of performance. Items are of uniform difficulty, all of which are well within the ability level of the persons for whom the test is designed; but the time limit is such that no examinee can attempt all the items. Under such conditions, each person’s score reflects only the speed with which he worked. According to Gulliksen, a pure Speed Test is a test composed of items so easy that the subjects never give the wrong answer to any of them, that is, there would be no attempted item that would be incorrect and, consequently, the score for each person would equal the number of items attempted.
POWER TEST Power Tests, on the other hand, have time limits long enough to permit everyone to attempt all the items. But the difficulty of items is steeply graded and the test includes certain items which are too difficult for anyone to solve, so that no one can get a perfect score. The level of difficulty that can be mastered in liberal time is determined. Achievement examinations fall in this category. In a pure Power Test, all the items are attempted so that the score on the test depends entirely upon the number of items that are answered and answered correctly. Therefore, in such tests, the items cannot be of trivial difficulty because that would not produce a desirable distribution of scores. Further, a Speed Test is a test whose primary emphasis is on the number of problems the subject can do in a given time period. A pure Speed Test is a test composed of items so easy that the subjects never give wrong answer to any of them. The answers are correct so far as the subject has appeared in the test. However, the test contains so many items that no one can finish it in the time allowed. The subjects’ score, therefore, depends entirely on how far he is able to go in the allotted time.
TYPES OF ERRORS, AND SPEED AND POWER TESTS
The distinction between two types of 'errors' is important for the discussion of the Speed–Power problem. Suppose,
W = the number of items for which the subject gives incorrect answers,
U = the number of items left unanswered by the subject, that is, the number of items that the subject does not reach,
X = the total error score.
Thus,
X = W + U

For a pure Speed Test, W will always be zero for each individual. Therefore, the mean (M) and standard deviation (S) of W will be zero. Mathematically,
$M_W = 0$
$S_W = 0$
Also X = U, that is, the individual's total error score is given by the number of items that he does not answer. Therefore, the mean and standard deviation of X would be equal to the mean and standard deviation of U. Mathematically,
$M_X = M_U$
$S_X = S_U$
These are the properties of a pure Speed Test. In other words, any test approaches a pure Speed Test to the extent that the mean ($M_W$) and the standard deviation ($S_W$) of wrong items approach zero, so that the mean and standard deviation of the total number of errors (that is, W + U) are the same as the mean ($M_U$) and standard deviation ($S_U$) of unattempted items.

In a Power Test, the level that an individual can attain is measured. In this category of tests, the items are arranged in order of difficulty and there is less stress on the time limit than in Speed Tests. In a pure Power Test, the testee is made to attempt all the items, and the error score on the test is determined by the number of items that are answered incorrectly. In a pure Power Test, the number of unattempted items (U) will be zero for each individual, because enough time is provided to attempt all items. Therefore, for a pure Power Test,
$M_U = 0$
$S_U = 0$
Therefore, for each individual,
X = W
Hence,
$M_X = M_W$
and
$S_X = S_W$
These properties hold only for pure Power Tests. To the extent that they are approximately satisfied, the test approaches the condition of a Power Test.
For a pure Power Test, the odd–even method of finding split-half reliability should be employed. The odd–even reliability becomes spuriously high if it is used for a Speed Test, because the Speed factor enters more and more into the determination of test scores. Whether a test is sufficiently close to being a pure Power Test can be indicated by a criterion which makes one sure that the odd–even reliability will not be spuriously high or low. Similarly, for a pure Speed Test, a criterion should indicate that the variability due to item difficulty or carelessness in answering the items is negligible. The test–retest reliability of a test that involves both Speed and Power elements is likely to be higher or lower than the reliability of a test that involves only one element, depending on whether Speed and Power are positively or negatively correlated. Thus, if one wishes to measure Speed in a given function, it is essential to make sure that the Power element is involved in the test only to a negligible extent.

Let us see the effect of the Speed or Power factor on the standard deviation of a test. We know that,
X = W + U
or,
$M_X = M_W + M_U$
If small letters denote the deviation scores, then,
x = w + u
On squaring both sides, we get,
$$x^2 = (w + u)^2$$
On summing and dividing both sides by N, we get,
$$\frac{\sum x^2}{N} = \frac{\sum (w + u)^2}{N}$$
or,
$$\frac{\sum x^2}{N} = \frac{\sum w^2}{N} + \frac{\sum u^2}{N} + 2\,\frac{\sum wu}{N}$$
or,
$$S_x^2 = S_w^2 + 2 r_{wu} S_w S_u + S_u^2$$
In a pure Power Test, the subject will finish all items, that is, the variance of u will be zero. Hence, $S_u = 0$. Therefore,
$$S_x^2 = S_w^2$$
or,
$$S_x = S_w$$
Thus, for a pure Power Test,
$S_u = 0$, $S_x = S_w$
Similarly, for a pure Speed Test,
$S_w = 0$, $S_x = S_u$

For instance, if $r_{wu}$ = –1 or +1 (as the correlation coefficient ranges from –1 to +1) then,
$$S_x^2 = S_w^2 + S_u^2 \pm 2 S_w S_u = (S_w + S_u)^2 \ \text{or} \ (S_u - S_w)^2$$
or,
$S_x = S_w + S_u$
or else,
$S_x = S_u - S_w$
It might be the case that both $S_w$ and $S_u$ are larger than $S_x$. On the other hand, if $S_u$ is zero or nearly zero, then $S_w$ would be equal to $S_x$. The extreme cases occur when
$r_{wu}$ = +1 or –1
If $r_{wu}$ = +1, then
$S_x = S_w + S_u$
and if $r_{wu}$ = –1, then
$|S_x| = |S_w - S_u|$
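The variance relation derived above can be checked numerically. In the sketch below the wrong-item and unattempted-item counts are hypothetical, and `sd_total` is an illustrative helper for the general formula and its limiting cases.

```python
import math


def pvariance(scores):
    """Population variance (divides by N)."""
    mean = sum(scores) / len(scores)
    return sum((s - mean) ** 2 for s in scores) / len(scores)


def pcovariance(a, b):
    """Population covariance, equal to r_wu * S_w * S_u."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def sd_total(s_w: float, s_u: float, r_wu: float) -> float:
    """S_x from S_w, S_u and their correlation."""
    return math.sqrt(s_w ** 2 + s_u ** 2 + 2.0 * r_wu * s_w * s_u)


# Hypothetical counts of wrong (W) and unattempted (U) items for four persons.
wrong = [1, 2, 5, 8]
unattempted = [3, 3, 0, 2]
total_errors = [w + u for w, u in zip(wrong, unattempted)]

lhs = pvariance(total_errors)
rhs = pvariance(wrong) + pvariance(unattempted) + 2.0 * pcovariance(wrong, unattempted)
print(lhs, rhs)  # the two sides of S_x^2 = S_w^2 + S_u^2 + 2 r_wu S_w S_u

print(sd_total(3.0, 1.0, 1.0))   # r_wu = +1 gives S_w + S_u
print(sd_total(3.0, 1.0, -1.0))  # r_wu = -1 gives |S_w - S_u|
print(sd_total(3.0, 0.0, 0.5))   # pure Power Test: S_u = 0, so S_x = S_w
```

Note that with a negative correlation the total standard deviation can be smaller than either component, which is the case mentioned in the text.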
If $S_u/S_x$ = 0.10, then $S_w/S_x$ will lie between 0.90 and 1.10; and if $S_u/S_x$ = 0.01, then $S_w/S_x$ will be neither less than 0.99 nor greater than 1.01. Hence in such a situation, the test will be a Power Test. From the standard deviation point of view, we can say that a test is a Speed Test if $S_w/S_x$ is very small, and a Power Test if $S_u/S_x$ is very small.

Hence, for a Speed Test $S_w/S_x$ is very small and,
$$1 + \frac{S_w}{S_x} > \frac{S_u}{S_x} > 1 - \frac{S_w}{S_x}$$
or,
$$1 - \frac{S_w}{S_x} < \frac{S_u}{S_x} < 1 + \frac{S_w}{S_x}$$
For a Power Test $S_u/S_x$ is small and,
$$1 + \frac{S_u}{S_x} > \frac{S_w}{S_x} > 1 - \frac{S_u}{S_x}$$
or,
$$1 - \frac{S_u}{S_x} < \frac{S_w}{S_x} < 1 + \frac{S_u}{S_x}$$
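The bounds above, and the comparison of the two ratios, can be sketched as follows. The function names are hypothetical; the rule of calling a test Speed or Power according to the larger ratio is a simplification of the criterion, and the standard deviations used are those reported for the two DAT subtests in the illustration later in this chapter.

```python
def ratio_bounds(small_ratio: float) -> tuple:
    """Bounds 1 - h and 1 + h on the other ratio when one ratio equals h."""
    return (1.0 - small_ratio, 1.0 + small_ratio)


def dominant_component(s_w: float, s_u: float, s_x: float) -> str:
    """Label a test by whichever ratio, S_u/S_x or S_w/S_x, is larger."""
    return "Speed" if s_u / s_x > s_w / s_x else "Power"


# If S_u/S_x is 0.10 or 0.01, S_w/S_x is confined to a narrow band around 1.
print(ratio_bounds(0.10))
print(ratio_bounds(0.01))

# Standard deviations (S_w, S_u, S_x) reported in the chapter's illustration.
print(dominant_component(4.33, 1.24, 4.09))    # Numerical Ability
print(dominant_component(2.73, 10.92, 11.94))  # Clerical Ability
```

The narrower the band returned by `ratio_bounds`, the closer the test is to the pure case.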
EFFECT OF UNATTEMPTED ITEMS ON THE ERRORS OF MEASUREMENT
We know that the error of measurement for the total score x is $S_x\sqrt{1 - r}$. The reliability of two parallel forms is given by the formula,
$$r_{x_1x_2} = \frac{\sum x_1x_2}{N S_{x_1} S_{x_2}} \tag{1}$$
The numerator can be written in terms of w and u scores as,
$$\sum x_1x_2 = \sum (w_1 + u_1)(w_2 + u_2)$$
or,
$$\sum x_1x_2 = \sum w_1w_2 + \sum u_1u_2 + \sum w_1u_2 + \sum w_2u_1$$
Using reliabilities and intercorrelations, the above equation can be written as,
$$\sum x_1x_2 = N r_{w_1w_2} S_w^2 + N r_{u_1u_2} S_u^2 + 2N r_{wu} S_w S_u$$
Substituting this in equation (1) above, we get,
$$r_{x_1x_2} = \frac{r_{w_1w_2} S_w^2 + r_{u_1u_2} S_u^2 + 2 r_{wu} S_w S_u}{S_x^2}$$
Since $S_x^2 = S_w^2 + S_u^2 + 2 r_{wu} S_w S_u$, this can be rewritten as,
$$1 - r_{x_1x_2} = 1 - \frac{r_{w_1w_2} S_w^2 + r_{u_1u_2} S_u^2 + 2 r_{wu} S_w S_u}{S_w^2 + S_u^2 + 2 r_{wu} S_w S_u}$$
or,
$$1 - r_{x_1x_2} = \frac{S_w^2 (1 - r_{w_1w_2}) + S_u^2 (1 - r_{u_1u_2})}{S_w^2 + S_u^2 + 2 r_{wu} S_w S_u}$$
or,
$$S_x^2 (1 - r_{x_1x_2}) = S_w^2 (1 - r_{w_1w_2}) + S_u^2 (1 - r_{u_1u_2})$$
Thus, if x is defined as equal to w + u, then the error variance for the x-score would be equal to the error variance of the w-score, plus the error variance of the u-score.

If a test is a Power Test, then it is possible to employ the split-half reliability to estimate the extent of error of measurement. For any split-half reliability,
$$S_x^2 (1 - r_{xx}) = S_w^2 (1 - r_{ww})$$
For a Power Test, error of measurement would be greater if the subject is allowed to attempt all the items in a test. If a test is a Speed Test, then a test–retest or alternate-form reliability method should be used to calculate the reliability of the test. If a Power Test is partially speeded and we use the split-half reliability method, then the reliability will be overestimated. The extent of the overestimation is given by the formula:
$$r_{xx} > R > \frac{r_{xx} - 2H}{1 - 2H}$$
where rxx is the stepped-up split-half correlation, H = Su/Sx and R is the reliability of the test. In case a test is a speeded test, the retest or alternate-form method is to be used to calculate the reliability of the test. Such reliability will correctly represent the functioning reliability of the test.

Illustration: Compare the Speed component with the Power component of any two aptitude tests/subjects. To attain this objective, two subtests of the Differential Aptitude Test (DAT), namely, the Clerical Speed and Accuracy Test (Table 3.1) and the Numerical Ability Test (Table 3.2), were administered to a group of 10 college graduates. The results are as follows:
Table 3.1  Clerical Speed and Accuracy (Clerical Ability) Test

S. No.   Unattempted Items (U)   Wrong Items (W)   Right Items (R)   Total Error (X) = U + W
1         5                      1                 94                 6
2         3                      1                 96                 4
3         7                      1                 92                 8
4         0                      1                 99                 1
5        39                      1                 60                40
6         0                      1                 99                 1
7         2                      0                 98                 2
8        14                      2                 84                16
9         5                      1                 94                 6
10       14                      3                 83                17
Mean      8.9                    1.2                                 10.1
S.D.     10.92                   2.73                                11.94

Source: Author.

Table 3.2  Numerical Ability Test

S. No.   Unattempted Items (U)   Wrong Items (W)   Right Items (R)   Total Error (X) = U + W
1         2                       1                37                 3
2         3                       1                36                 4
3         3                       2                35                 5
4         3                       5                32                 8
5         0                       8                32                 8
6         0                      12                28                12
7         5                       9                26                14
8         0                      10                30                10
9         1                       9                30                10
10        3                      12                25                15
Mean      2.0                     6.9                                 8.9
S.D.      1.24                    4.33                                4.09

Source: Author.
Standard Deviations

Subtests of DAT      Su      Sw     Sx
Numerical Ability    1.24    4.33    4.09
Clerical Ability    10.92    2.73   11.94
Numerical Ability Test

$$\frac{S_w}{S_x} = \frac{4.33}{4.09} = 1.05, \qquad \frac{S_u}{S_x} = \frac{1.24}{4.09} = 0.30$$

Since $S_w/S_x > S_u/S_x$, that is, 1.05 > 0.30, the Numerical Ability Test is a Power Test.

Clerical Ability Test

$$\frac{S_w}{S_x} = \frac{2.73}{11.94} = 0.22, \qquad \frac{S_u}{S_x} = \frac{10.92}{11.94} = 0.91$$

Since $S_u/S_x > S_w/S_x$, that is, 0.91 > 0.22, the Clerical Ability Test is a Speed Test.

First of all, the mean and standard deviation of the unattempted items, wrong items and total number of errors are calculated. These values are then substituted into the comparison

$$\frac{S_u}{S_x} > \frac{S_w}{S_x} \quad \text{or} \quad \frac{S_w}{S_x} > \frac{S_u}{S_x}$$

to see which of the two components is more heavily loaded in each subtest. For the Numerical Ability Test, the ratio of the standard deviation of wrong items to the standard deviation of total error (Sw/Sx = 1.05) is greater than the ratio of the standard deviation of unattempted items to the standard deviation of total error (Su/Sx = 0.30). In contrast, for the Clerical Speed and Accuracy Test, the ratio of the standard deviation of unattempted items to the standard deviation of total error (0.91) is greater than the ratio of the standard deviation of wrong items to the standard deviation of total error (0.22). These figures indicate that the Numerical Ability Test of DAT is a Power-loaded test, whereas the Clerical Speed and Accuracy Test of DAT is a Speed-loaded test. This means that the Clerical Speed and Accuracy Test emphasises the number of items that the subject can accomplish in a given time period. The mean number of wrong items in the Clerical Speed and Accuracy Test is lower than that in the Numerical Ability Test, which also supports the conclusion that the Clerical Speed and Accuracy Test is a Speed-loaded test. A Power Test, in contrast to a Speed Test, emphasises the level that the individual can attain, and its mean number of unattempted items is less than its mean number of wrong items. This also supports the obtained result that the Numerical Ability Test is a Power Test and the Clerical Speed and Accuracy Test is a Speed Test.
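The comparison of the two ratios can be sketched in a few lines of Python. This is only an illustration of the decision rule described above: the function name `classify_test` and the simple "larger ratio wins" rule are assumptions made for the sketch, and the standard deviations are the ones reported for the two DAT subtests.

```python
def classify_test(s_u: float, s_w: float, s_x: float) -> str:
    """Label a test as speed- or power-loaded from three standard deviations.

    s_u: SD of unattempted items, s_w: SD of wrong items,
    s_x: SD of total error (X = U + W).
    """
    speed_ratio = s_u / s_x  # large when unattempted items drive the error score
    power_ratio = s_w / s_x  # large when wrong items drive the error score
    return "speed" if speed_ratio > power_ratio else "power"


# Standard deviations reported for the two DAT subtests in the illustration.
numerical = classify_test(s_u=1.24, s_w=4.33, s_x=4.09)
clerical = classify_test(s_u=10.92, s_w=2.73, s_x=11.94)
print("Numerical Ability:", numerical)
print("Clerical Speed and Accuracy:", clerical)
```

A stricter rule (for example, requiring one ratio to be "very small", as in the inequalities earlier in the chapter) could replace the simple comparison if a threshold is agreed upon.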
4
Criterion for Parallel Tests
CHAPTER OUTLINE
1. What are parallel tests?
2. Criterion for parallel tests
3. Equality of means, variances and covariances
4. Calculation of Lmvc
5. Interpretation of Lmvc
6. Equality of variances and covariances
7. Calculation of Lvc
8. Interpretation of Lvc
9. Equality of means
10. Interpretation of Lm
11. Use of Parallel Tests to Calculate Test Reliability
LEARNING OBJECTIVES
At the end of this chapter, you will be able to understand:
1. What are parallel tests?
2. Why do we need parallel tests?
3. What are the criteria for parallel tests?
4. The use of parallel tests to calculate test reliability.
WHAT ARE PARALLEL TESTS?
‘Parallel tests are the tests that have equal means, equal variances and equal intercorrelations’ (Gulliksen 1961). The idea of parallel tests has been provided by S.S. Wilks (1946, cited in Gulliksen 1950).
CRITERION FOR PARALLEL TESTS
Whenever we administer any test to an individual, we are not sure whether the subject’s scores really represent him. At times, to be very sure about the test score, we need a parallel form of the test. These parallel tests are very important from technical as well as psychological points of view. The question arises whether the parallel test made by us is really parallel. This indicates that the test must fulfil some criteria before it is declared to be parallel (Table 4.1). Thus, for the tests to be parallel, they should normally fulfil the following criteria.

Figure 4.1  Criteria for Parallel Tests
Source: Author.
Psychological Criterion
The following are the psychological criteria to be fulfilled by the parallel tests:
1. Parallel tests should have approximately equal validities,
2. Parallel tests should have items concerning the same subject matter and
3. Parallel tests should have items of the same format.
Statistical Criterion
The following are the statistical criteria to be fulfilled by the parallel tests:
1. The parallel tests should have equal means,
2. The parallel tests should have equal variances and
3. The parallel tests should have equal covariances.

Whenever parallel forms of a test are given to a single group of individuals, there will be small sampling differences even under controlled experimental conditions. To be sure that the tests are parallel, they must therefore satisfy some statistical criteria. These criteria should indicate whether or not the means can be regarded as samples from a population in which the means are equal, the variances can be regarded as samples from a population in which the variances are equal, and the correlations can be regarded as samples from a population in which the correlations are equal. In the case of two parallel forms, there will be only one intercorrelation and, hence, for two parallel forms/tests, it is possible to test only the equality of means and the equality of variances.

Suppose there are two tests A and B. We will have the means and standard deviations of these two tests, say, MA and MB, and SA and SB. The task is to test statistically whether MA ≈ MB and also SA ≈ SB. To test the equality of the means and the standard deviations, the formulae used are:

For mean,

$$t = \frac{|M_A - M_B|}{\sqrt{\dfrac{S_A^2}{N_A} + \dfrac{S_B^2}{N_B}}}$$

For standard deviation,

$$t = \frac{|S_A - S_B|}{\sqrt{\dfrac{S_A^2}{N_A} + \dfrac{S_B^2}{N_B}}}$$

The degrees of freedom (df) are df = NA + NB – 2.
where NA and NB are the sizes of the samples used for tests A and B, and SA and SB are the standard deviations of the two forms. If the obtained value of t is greater than or equal to the table value of t for NA + NB – 2 degrees of freedom, then t is significant. If the obtained value of t is less than the table value of t for NA + NB – 2 degrees of freedom, then t is insignificant. The two forms of the test will be parallel if t for the mean as well as for the standard deviation is insignificant, and the two tests will not be parallel if t for the mean and/or the standard deviation is significant.
Illustration
Suppose two forms of an intelligence test are administered to a group of 30 students of law and the following are the results. The problem is to test whether the two forms are parallel or not.

M1 = 48.6, M2 = 46.5
S1 = 9.5, S2 = 10.5
N1 = 30, N2 = 30

For mean,

$$t = \frac{|M_1 - M_2|}{\sqrt{\dfrac{S_1^2}{N_1} + \dfrac{S_2^2}{N_2}}} = \frac{|48.6 - 46.5|}{\sqrt{\dfrac{(9.5)^2}{30} + \dfrac{(10.5)^2}{30}}} = \frac{2.1}{\sqrt{6.683}} = \frac{2.1}{2.585} = 0.812$$

df = N1 + N2 – 2 = 30 + 30 – 2 = 58
For df = 58, t0.05 = 2.00 and t0.01 = 2.66.
The above t value is insignificant and hence the two forms are parallel with respect to the mean.

For standard deviation,

$$t = \frac{|S_1 - S_2|}{\sqrt{\dfrac{S_1^2}{N_1} + \dfrac{S_2^2}{N_2}}} = \frac{|9.5 - 10.5|}{\sqrt{\dfrac{(9.5)^2}{30} + \dfrac{(10.5)^2}{30}}} = \frac{1}{2.585} = 0.387$$

df = N1 + N2 – 2 = 30 + 30 – 2 = 58
For df = 58, t0.05 = 2.00 and t0.01 = 2.66.
The above t value for standard deviation is insignificant and, hence, the two forms are parallel with respect to variances. The two forms have equal means and variances and, therefore, the two forms are parallel statistically.
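The two t ratios of this illustration can be checked with a short Python sketch. It follows the chapter's formulae as stated, dividing each variance by its sample size before taking the square root; the helper name `t_difference` is hypothetical, and the same standard error is used for the mean and the standard deviation only because the chapter defines both tests that way.

```python
import math


def t_difference(a: float, b: float, s_a: float, s_b: float,
                 n_a: int, n_b: int) -> float:
    """t ratio |a - b| / sqrt(s_a^2 / n_a + s_b^2 / n_b), as in the chapter."""
    standard_error = math.sqrt(s_a ** 2 / n_a + s_b ** 2 / n_b)
    return abs(a - b) / standard_error


# Data from the illustration: two forms given to 30 law students.
m1, m2 = 48.6, 46.5
s1, s2 = 9.5, 10.5
n1, n2 = 30, 30

t_mean = t_difference(m1, m2, s1, s2, n1, n2)
t_sd = t_difference(s1, s2, s1, s2, n1, n2)
df = n1 + n2 - 2

print(f"t (means) = {t_mean:.3f}, t (SDs) = {t_sd:.3f}, df = {df}")
# Both ratios are compared with the table value t0.05 = 2.00 quoted for df = 58.
print("parallel on means and SDs:", t_mean < 2.00 and t_sd < 2.00)
```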
Three Parallel Forms
For three forms to be parallel statistically, the criteria are:
1. Equality of means of the three forms,
2. Equality of variances of the three forms and
3. Equality of covariances of the three forms.

Apart from the above criteria, parallel tests should have equal validities and the tests should contain items dealing with the same subject matter in the same item format. To find whether three forms of a test are parallel or not, the significance of the differences between the means, variances and covariances (correlations) of the three forms, say, A, B and C, is to be calculated. For three forms:

Sample size     NA     NB     NC
Means           MA     MB     MC
Variances       SA²    SB²    SC²
Correlations    rAB    rAC    rBC
To check the equality of means, we need to find whether there is any significant difference between MA and MB with the formula,

$$t = \frac{|M_A - M_B|}{\sqrt{\dfrac{S_A^2}{N_A} + \dfrac{S_B^2}{N_B}}}$$

and also between MA and MC, and further between MB and MC. If all three t values come out to be statistically non-significant, then we can conclude that the means of the three forms are equal.

Next, we would test the equality of variances. This could be done with the F ratio between forms A and B, forms A and C, and forms B and C. First, we will do this between forms A and B as,

$$F_{AB} = \frac{N_A (SD_A)^2 / (N_A - 1)}{N_B (SD_B)^2 / (N_B - 1)}$$

The significance of FAB would be checked and, if it is found to be insignificant, then forms A and B are to be taken as equal in variance. The same procedure is followed between forms A and C, and then between forms B and C. Hence, forms A, B and C would be taken as equal with respect to variance if all three F ratios (FAB, FAC and FBC) turned out to be insignificant.

Last, we would test the equality of covariances (correlations). For this purpose, we would apply the Z-transformation as described below. First, we will test the difference between rAB and rAC, then between rAB and rBC, and finally between rAC and rBC. Only if all three Z’s turn out to be insignificant can one say that the three forms are equal with respect to covariances.
Difference between Correlations
The standard error of Z is given by the formula,

$$SE_z = \frac{1}{\sqrt{N - 3}} \qquad (1)$$

where N is the size of the sample. When we have two values of correlations and we are interested in finding whether these correlations differ significantly or not, the method used is the Fisher Z-transformation. The formula to test the difference between two correlations is,

$$Z = \frac{Z_1 - Z_2}{SE_{z_1 - z_2}} \qquad (2)$$

where Z1 and Z2 are the Z values corresponding to the correlations r1 and r2, and $SE_{z_1 - z_2}$ is the standard error of the difference between the two Z values. Hence,

$$SE_{z_1 - z_2} = \sqrt{SE_{z_1}^2 + SE_{z_2}^2} = \sqrt{\left(\frac{1}{\sqrt{N_1 - 3}}\right)^2 + \left(\frac{1}{\sqrt{N_2 - 3}}\right)^2} = \sqrt{\frac{1}{N_1 - 3} + \frac{1}{N_2 - 3}} \qquad (3)$$

Substituting the value of $SE_{z_1 - z_2}$ from formula (3) into formula (2), we get,

$$Z = \frac{Z_1 - Z_2}{\sqrt{1/(N_1 - 3) + 1/(N_2 - 3)}} \qquad (4)$$

The value of Z obtained from formula (4) is compared with the table Z-values at the 0.05 and 0.01 levels of significance. If the obtained value is greater than or equal to the table value of Z, that is, 1.96 (at 0.05 level) or 2.58 (at 0.01 level), the null hypothesis is rejected and the alternative hypothesis is accepted. In this case, the null hypothesis is that there is no difference between the two correlations and the alternative hypothesis is that there is a significant difference between the two correlations.
Problem
For the following data, test whether the two correlations differ significantly or not.

r1 = 0.80, r2 = 0.83
N1 = 150, N2 = 105

Solution
Convert the values of r1 and r2 to the corresponding Z1 and Z2 values.

For r1 = 0.80,

$$Z_1 = 1.1513 \log_{10}\frac{1 + r_1}{1 - r_1} = 1.1513 \log_{10}\frac{1 + 0.80}{1 - 0.80} = 1.1513 \log_{10} 9.0 = 1.10$$

For r2 = 0.83,

$$Z_2 = 1.1513 \log_{10}\frac{1 + r_2}{1 - r_2} = 1.1513 \log_{10}\frac{1 + 0.83}{1 - 0.83} = 1.1513 \log_{10} 10.765 = 1.19$$

There is another method to convert the value of r to the corresponding Z-value: the value of Z for a given r can be read directly from published tables.

$$Z = \frac{|Z_1 - Z_2|}{\sqrt{1/(N_1 - 3) + 1/(N_2 - 3)}} = \frac{|1.10 - 1.19|}{\sqrt{1/(150 - 3) + 1/(105 - 3)}} = \frac{0.09}{0.1289} = 0.698$$

This value of Z is insignificant because it is smaller than the table Z-values at the 5 per cent and 1 per cent levels, which are 1.96 and 2.58, respectively. This indicates that there is no significant difference between r1 and r2. In other words, we accept the null hypothesis H0 that r1 and r2 do not differ statistically.

Problem
In a test on intelligence, the test–retest and split-half reliabilities were calculated and the results obtained are r (test–retest) = 0.80 when N = 145, and r (split-half) = 0.60 when N = 120. Test whether the two reliabilities differ or not.

Solution
r1 = 0.80, r2 = 0.60
N1 = 145, N2 = 120

Converting r1 and r2 into Z1 and Z2, we get,

$$Z_1 = 1.1513 \log_{10}\frac{1 + r_1}{1 - r_1} = 1.1513 \log_{10}\frac{1 + 0.80}{1 - 0.80} = 1.1513 \log_{10} 9 = 1.10$$

$$Z_2 = 1.1513 \log_{10}\frac{1 + r_2}{1 - r_2} = 1.1513 \log_{10}\frac{1 + 0.60}{1 - 0.60} = 1.1513 \log_{10} 4 = 0.693$$

$$Z = \frac{|Z_1 - Z_2|}{\sqrt{1/(N_1 - 3) + 1/(N_2 - 3)}} = \frac{|1.10 - 0.693|}{\sqrt{\dfrac{1}{145 - 3} + \dfrac{1}{120 - 3}}} = \frac{0.407}{0.1249} = 3.259$$

The table values of Z at the 5 per cent and 1 per cent levels are 1.96 and 2.58, respectively. Our calculated value of Z is 3.259, which is greater than the table value at both levels, so we reject the null hypothesis H0 and accept the alternative hypothesis. In other words, there is a statistically significant difference between the test–retest and split-half reliabilities.
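The Z test for two independent correlations can be sketched in Python as follows. The constant 1.1513 times the common logarithm is the same transformation as `math.atanh(r)`, so the sketch uses that standard-library call; the function name `z_difference` is hypothetical. Because the code does not round Z1 and Z2 to two decimals, its results differ slightly in the second decimal place from the hand calculations above.

```python
import math


def z_difference(r1: float, n1: int, r2: float, n2: int) -> float:
    """Z ratio for the difference between two independent correlations."""
    z1 = math.atanh(r1)  # Fisher transformation, 0.5 * ln((1 + r) / (1 - r))
    z2 = math.atanh(r2)
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return abs(z1 - z2) / standard_error


# First problem: r1 = 0.80 (N = 150) versus r2 = 0.83 (N = 105).
z_first = z_difference(0.80, 150, 0.83, 105)
# Second problem: test-retest 0.80 (N = 145) versus split-half 0.60 (N = 120).
z_second = z_difference(0.80, 145, 0.60, 120)

for label, z in (("first", z_first), ("second", z_second)):
    verdict = "significant" if z >= 1.96 else "not significant"
    print(f"{label} problem: Z = {z:.3f} ({verdict} at the 0.05 level)")
```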
Problem
The following are the correlations obtained among three anthropometric variables, namely, calf circumference (1), hip breadth (2) and leg length (3), for a sample of 120 individuals.

       (1)     (2)     (3)
1      –       0.70    0.92
2      0.70    –       0.86
3      0.92    0.86    –

Is the correlation between 1 and 2 significantly different from the correlation between 1 and 3?

Solution
The correlation between 1 and 2 = r1 = 0.70
The correlation between 1 and 3 = r2 = 0.92
Convert the values of r1 and r2 to the corresponding values of Z1 and Z2.

r1 = 0.70, Z1 = 0.87
r2 = 0.92, Z2 = 1.59
N1 = 120, N2 = 120

$$Z = \frac{|Z_1 - Z_2|}{\sqrt{\dfrac{1}{N_1 - 3} + \dfrac{1}{N_2 - 3}}} = \frac{|0.87 - 1.59|}{\sqrt{\dfrac{1}{120 - 3} + \dfrac{1}{120 - 3}}} = \frac{0.72}{\sqrt{\dfrac{1}{117} + \dfrac{1}{117}}} = \frac{0.72}{0.131} = 5.496$$
The calculated value of Z is 5.496. The table values of Z at the 0.05 and 0.01 levels are 1.96 and 2.58, respectively. The calculated value of Z is greater than the table value of Z, which indicates that the obtained Z is significant. This implies that the null hypothesis is rejected and the alternative hypothesis is accepted. Thus, the correlation between 1 and 2 is significantly different from the correlation between 1 and 3 at the 0.01 level of significance.

Now, let us suppose that variable 1 is correlated with variable 2 and also with variable 3, and that the measurements are made on the same sample. In such a case, the above method is not applicable for testing the difference between the correlation of variables 1 and 2 and the correlation of variables 1 and 3, that is, the difference between r12 and r13. For such a case, the t test is applicable and the required formula is,

$$t = \frac{(r_{12} - r_{13})\sqrt{(N - 3)(1 + r_{23})}}{\sqrt{2(1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2r_{12}r_{13}r_{23})}}$$

The degrees of freedom (df) = N – 3.
Problem In a situation where intelligence (1), creativity (2) and academic achievement (3) were measured for a sample of 100 students, it was observed that intelligence is highly related with creativity and academic achievement. The obtained results are r12 = 0.80, r13 = 0.60 and r23 = 0.30. Test whether the correlation between intelligence and creativity is different from correlation between intelligence and academic achievement.
Solution
Here, r12 = 0.80, r13 = 0.60, r23 = 0.30 and N = 100.

$$t = \frac{(r_{12} - r_{13})\sqrt{(N - 3)(1 + r_{23})}}{\sqrt{2(1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2r_{12}r_{13}r_{23})}}$$

$$= \frac{(0.80 - 0.60)\sqrt{(100 - 3)(1 + 0.30)}}{\sqrt{2[1 - (0.80)^2 - (0.60)^2 - (0.30)^2 + 2(0.80)(0.60)(0.30)]}} = \frac{(0.20)\sqrt{(97)(1.30)}}{\sqrt{2 \times 0.198}} = \frac{2.2459}{0.6293} = 3.569$$

df = N – 3 = 100 – 3 = 97
Calculated t-value = 3.569
For df = 97, the table t-value at the 0.01 level = 2.58 and the table t-value at the 0.05 level = 1.96. The calculated value of t is greater than the table t-value at the 0.01 level. Hence, the null hypothesis is rejected and the alternative hypothesis is accepted. This indicates that the correlation between intelligence and creativity is significantly different from the correlation between intelligence and academic achievement at the 0.01 level of significance.
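The t ratio for two correlations that share a variable and come from the same sample can be computed with the following Python sketch. It implements the formula given above; the function name `dependent_correlation_t` is hypothetical, and the data are those of the intelligence, creativity and achievement problem.

```python
import math


def dependent_correlation_t(r12: float, r13: float, r23: float, n: int) -> float:
    """t ratio for r12 versus r13 measured on the same sample (df = n - 3)."""
    # Determinant of the 3 x 3 correlation matrix.
    determinant = 1 - r12 ** 2 - r13 ** 2 - r23 ** 2 + 2 * r12 * r13 * r23
    numerator = (r12 - r13) * math.sqrt((n - 3) * (1 + r23))
    return numerator / math.sqrt(2 * determinant)


t_value = dependent_correlation_t(r12=0.80, r13=0.60, r23=0.30, n=100)
df = 100 - 3
print(f"t = {t_value:.3f} with df = {df}")
# Compared with the table value 2.58 quoted in the text for the 0.01 level.
print("significant at the 0.01 level:", t_value > 2.58)
```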
EQUALITY OF MEANS, VARIANCES AND COVARIANCES
We have seen that the computation is a very lengthy procedure. If there are four or more forms to be tested for parallelism, then the procedure becomes more and more difficult and time consuming. To overcome this difficulty, Dr S.S. Wilks gave a technique in 1946 known as the Lmvc method. This method can be used to test the equality of means, variances and covariances of any number of tests/forms at a time. Following Wilks’s notation, we shall use Lmvc to designate the appropriate statistic for testing simultaneously the hypothesis that all means are equal, all variances are equal and all covariances are equal.
$$L_{mvc} = \frac{D}{S^2[1 + (k - 1)r][S^2(1 - r) + v]^{k - 1}}$$

where D is the determinant of the variance–covariance matrix of the tests, k is the number of parallel forms, S² is the average variance, M is the mean of the test means, v is the variance of the means and r is the average correlation. Suppose there are three parallel forms. Hence, for three parallel forms the Lmvc formula would be:

$$L_{mvc} = \frac{D}{S^2(1 + 2r)[S^2(1 - r) + v]^2}$$

and,

$$D = S_1^2 S_2^2 S_3^2 [1 + 2r_{12}r_{13}r_{23} - r_{12}^2 - r_{13}^2 - r_{23}^2]$$

In general,

$$S^2 = \frac{\sum_{g=1}^{k} S_g^2}{k} = \text{the average variance}$$

$$r = \frac{\sum_{g \neq h} V_{gh}}{k(k - 1)S^2} = \text{the average correlation, computed as the average covariance divided by the average variance}$$

where $V_{gh} = r_{gh}S_gS_h$ is the covariance between forms g and h, and the sum runs over all k(k – 1) ordered pairs with g ≠ h,

$$v = \frac{\sum_{g=1}^{k} (M_g - M)^2}{k - 1} = \text{the variance of the means}$$

For three parallel forms/tests,

$$S^2 = \frac{\sum_{g=1}^{3} S_g^2}{3} = \frac{S_1^2 + S_2^2 + S_3^2}{3}$$

$$r = \frac{\sum_{g \neq h} V_{gh}}{6S^2} = \frac{2(V_{12} + V_{13} + V_{23})}{6S^2} = \frac{V_{12} + V_{13} + V_{23}}{3S^2}$$

$$v = \frac{\sum_{g=1}^{3} (M_g - M)^2}{2} = \frac{(M_1 - M)^2 + (M_2 - M)^2 + (M_3 - M)^2}{2}$$

$$M = \frac{\sum_{g=1}^{3} M_g}{3} = \frac{M_1 + M_2 + M_3}{3}$$
The statistic Lmvc varies between zero and unity. If, in the sample, the means are identical, the variances are identical and the covariances are identical, then Lmvc is equal to one. If Lmvc is close to zero, then it means that the tests are not parallel, that is, they are not identical in means, variances and covariances. If Lmvc is sufficiently near unity to support the hypothesis that the means are identical, the variances are identical and the covariances are identical, the population is characterised by one common mean, one common variance and one common correlation (that is, reliability) coefficient.

When the hypothesis of equal means, equal variances and equal covariances is true, the quantity –N log10 Lmvc follows for large samples, up to the constant that converts common to natural logarithms, a chi-square distribution with degrees of freedom equal to (k/2)(k + 3) – 3.

For k tests, there would be k means, k variances and k(k – 1)/2 covariances. Hence, the degrees of freedom for the k means are equal to k – 1, for the k variances equal to k – 1, and for the covariances equal to k(k – 1)/2 – 1. Therefore, the degrees of freedom associated with the Lmvc statistic would be,

$$(k - 1) + (k - 1) + \frac{k(k - 1)}{2} - 1 = 2k + \frac{k(k - 1)}{2} - 3 = \frac{4k + k^2 - k}{2} - 3 = \frac{3k + k^2}{2} - 3 = \left(\frac{k}{2}\right)(k + 3) - 3$$

For three parallel tests, the degrees of freedom associated with Lmvc would be,

$$df = \left(\frac{k}{2}\right)(k + 3) - 3 = \frac{3}{2}(3 + 3) - 3 = 9 - 3 = 6$$

Lmvc varies between 0 and 1. As Lmvc approaches one, the quantity –N log10 Lmvc approaches zero. Further, as Lmvc approaches zero, the quantity –N log10 Lmvc increases without limit.
INTERPRETATION OF LMVC If the quantity –N log10 Lmvc obtained from the given data is less than the value in the 5 per cent level of the table for the appropriate number of tests (k), one can consider the tests as parallel. If the value of –N log10 Lmvc obtained from the data is greater than the value given in the table at 1 per cent level for appropriate k, then there is less than one chance in a hundred that such a sample would be drawn from a population in which means were equal, variances were equal and covariances were equal. In other words, under these circumstances, one can conclude that the tests were not parallel with regard to means, variances and covariances.
EQUALITY OF VARIANCES AND COVARIANCES
Suppose on testing the Lmvc statistic, we find that the tests are not parallel with regard to means, variances and covariances, that is, we find a significant difference due to a small value of Lmvc or a relatively large value of –N log10 Lmvc. Still, the researcher would want to know whether or not that difference is attributable only to the differences in means. Suppose we find that the variances are equal and the covariances are equal, while only the means are unequal. It is then easy to adjust the test scores by adding or subtracting a constant in such a way that the means of the adjusted scores will be identical. The statistic that tests the equality of variances and covariances alone is called Lvc. It is given by,

$$L_{vc} = \frac{D}{S^2[1 + (k - 1)r][S^2(1 - r)]^{k - 1}}$$

For three parallel tests,

$$L_{vc} = \frac{D}{S^2(1 + 2r)[S^2(1 - r)]^2} = \frac{S_1^2S_2^2S_3^2[1 + 2r_{12}r_{13}r_{23} - r_{12}^2 - r_{13}^2 - r_{23}^2]}{S^2(1 + 2r)[S^2(1 - r)]^2} = \frac{S_1^2S_2^2S_3^2[1 + 2r_{12}r_{13}r_{23} - r_{12}^2 - r_{13}^2 - r_{23}^2]}{S^6(1 + 2r)(1 - r)^2}$$
The value of Lvc ranges between zero and one. If the value of Lvc approaches one, it means that the variances are alike and the covariances are alike. If the value of Lvc approaches zero, then it means that the variances and covariances are not identical.
When the hypothesis of equal variances and equal covariances is true, the quantity –N log10 Lvc follows for large samples, up to the constant that converts common to natural logarithms, a chi-square distribution with degrees of freedom equal to (k/2)(k + 1) – 2. For k tests, there would be k variances and k(k – 1)/2 covariances. The degrees of freedom for the k variances will be equal to k – 1, and for the k(k – 1)/2 covariances will be equal to k(k – 1)/2 – 1.

Therefore, the degrees of freedom associated with the Lvc statistic would be,

$$(k - 1) + \frac{k(k - 1)}{2} - 1 = \frac{2k + k^2 - k}{2} - 2 = \frac{k^2 + k}{2} - 2 = \left(\frac{k}{2}\right)(k + 1) - 2$$

For three parallel tests, the degrees of freedom associated with Lvc would be,

$$df = \left(\frac{k}{2}\right)(k + 1) - 2 = \frac{3}{2}(3 + 1) - 2 = 6 - 2 = 4$$

Lvc varies between 0 and 1. As Lvc approaches one, the quantity –N log10 Lvc approaches zero. Further, as Lvc approaches zero, the quantity –N log10 Lvc increases without limit.
INTERPRETATION OF LVC
If the quantity –N log10 Lvc obtained from the given data is less than the value at the 5 per cent level of the table for the appropriate number of tests (k), one can consider that the tests are parallel with regard to variances and covariances. If the value of –N log10 Lvc obtained from the data is greater than the value given in the table at the 1 per cent level for the appropriate value of k, then there is less than one chance in a hundred that such a sample would be drawn from a population in which the variances were equal and the covariances were equal. In other words, under these circumstances, one can conclude that the tests were not parallel with regard to variances and covariances.
EQUALITY OF MEANS
Suppose after testing Lmvc, we find that it is significant, that is, the tests are not parallel with respect to mean, variance and covariance. Suppose further that, on testing Lvc, we find that it is not significant, that is, we get homogeneity when testing with Lvc. It means that the difference in means was responsible for the heterogeneity found with Lmvc. Thus, testing for Lm becomes very important. For testing the equality of means, we use the statistic Lm, which is given by,

$$L_m = \frac{S^2(1 - r)}{S^2(1 - r) + v}$$

Lm varies between 0 and 1. Lm equal to unity implies that the sample means are identical; as the sample means diverge, Lm approaches zero. The quantity –N(k – 1) log10 Lm is zero if the sample means are identical and grows larger as the sample means diverge. There are k means for k tests and, hence, the degrees of freedom associated with Lm would be (k – 1).
INTERPRETATION OF Lm
If the quantity –N(k – 1) log10 Lm obtained from the given data is less than the value at the 5 per cent level of the table for the appropriate number of tests (k), one can conclude that the tests are parallel with regard to means. If the value of –N(k – 1) log10 Lm obtained from the data is greater than the value given in the table at the 1 per cent level for the appropriate k, there is less than one chance in a hundred that such a sample would be drawn from a population in which the means were equal. In other words, under these circumstances, one can say that the tests were not parallel with regard to means.
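The degrees of freedom used to enter the table for the three statistics depend only on the number of forms k. As a small sketch, the three counts can be written as Python functions; the function names are hypothetical, and the formulas are (k/2)(k + 3) – 3 for Lmvc, (k/2)(k + 1) – 2 for Lvc and k – 1 for Lm.

```python
def df_lmvc(k: int) -> int:
    """(k - 1) means + (k - 1) variances + (k(k - 1)/2 - 1) covariances."""
    return k * (k + 3) // 2 - 3


def df_lvc(k: int) -> int:
    """(k - 1) variances + (k(k - 1)/2 - 1) covariances."""
    return k * (k + 1) // 2 - 2


def df_lm(k: int) -> int:
    """k means leave k - 1 degrees of freedom."""
    return k - 1


# Degrees of freedom for two to six forms.
for k in range(2, 7):
    print(f"k = {k}: Lmvc df = {df_lmvc(k)}, Lvc df = {df_lvc(k)}, Lm df = {df_lm(k)}")
```

Note that the degrees of freedom for Lvc and Lm always add up to those for Lmvc, which mirrors the way the overall test splits into a test of variances and covariances and a test of means.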
Problem
For the following data for 100 subjects, find the values of Lmvc, Lvc and Lm.

         A     B     C
Mean     60    65    63
S.D.      6     5     7

r12 = 0.30, r13 = 0.40, r23 = 0.35
Solution
Intermediate values below are shown rounded to two decimals; the arithmetic carries the unrounded values.

$$S^2 = \frac{S_1^2 + S_2^2 + S_3^2}{3} = \frac{(6)^2 + (5)^2 + (7)^2}{3} = \frac{36 + 25 + 49}{3} = \frac{110}{3} = 36.67$$

$$r = \frac{C_{12} + C_{13} + C_{23}}{3S^2}$$

Here,
C12 = r12S1S2 = (0.30)(6)(5) = 9.0
C13 = r13S1S3 = (0.40)(6)(7) = 16.8
C23 = r23S2S3 = (0.35)(5)(7) = 12.25

Therefore,

$$S^2 r = \frac{C_{12} + C_{13} + C_{23}}{3} = \frac{9.0 + 16.8 + 12.25}{3} = \frac{38.05}{3} = 12.68, \qquad r = \frac{12.68}{36.67} = 0.346$$

$$M = \frac{M_1 + M_2 + M_3}{3} = \frac{60 + 65 + 63}{3} = \frac{188}{3} = 62.67$$

$$v = \frac{(M_1 - M)^2 + (M_2 - M)^2 + (M_3 - M)^2}{2} = \frac{(-2.67)^2 + (2.33)^2 + (0.33)^2}{2} = \frac{12.6667}{2} = 6.33$$

$$D = S_1^2S_2^2S_3^2[1 + 2r_{12}r_{13}r_{23} - r_{12}^2 - r_{13}^2 - r_{23}^2]$$
$$= (6)^2(5)^2(7)^2[1 + 2(0.30)(0.40)(0.35) - (0.30)^2 - (0.40)^2 - (0.35)^2]$$
$$= 44100[1 + 0.084 - 0.09 - 0.16 - 0.1225] = 44100[0.7115] = 31377.15$$

$$L_{mvc} = \frac{D}{(S^2 + 2S^2r)(S^2 - S^2r + v)^2} = \frac{31377.15}{[36.67 + 2(12.68)][36.67 - 12.68 + 6.33]^2} = \frac{31377.15}{(62.03)(30.32)^2} = \frac{31377.15}{57014.9} = 0.550$$

$$L_{vc} = \frac{D}{S^6(1 + 2r)(1 - r)^2} = \frac{D}{(S^2 + 2S^2r)(S^2 - S^2r)^2} = \frac{31377.15}{[36.67 + 2(12.68)][36.67 - 12.68]^2} = \frac{31377.15}{(62.03)(23.98)^2} = \frac{31377.15}{35681.6} = 0.879$$

$$L_m = \frac{S^2(1 - r)}{S^2(1 - r) + v} = \frac{S^2 - S^2r}{S^2 - S^2r + v} = \frac{36.67 - 12.68}{36.67 - 12.68 + 6.33} = \frac{23.98}{30.32} = 0.791$$

Interpretation of Lmvc
Lmvc = 0.550
log10 Lmvc = –0.2594
–N log10 Lmvc = (–100)(–0.2594) = 25.94
For k = 3, df = 6, and the table values are 5.469 at the 0.05 level and 7.301 at the 0.01 level. The obtained value of –N log10 Lmvc is 25.94, which is greater than the table value at the 0.01 level. Therefore, Lmvc is significant, which means that the tests are not parallel with regard to mean, variance and covariance taken together.

Interpretation of Lvc
Lvc = 0.879
log10 Lvc = –0.0558
–N log10 Lvc = (–100)(–0.0558) = 5.58
The table values for df = 4 are 4.121 and 5.766 at the 5 per cent and 1 per cent levels, respectively. The obtained value of –N log10 Lvc is 5.58, which is greater than the table value at the 5 per cent level but less than the table value at the 1 per cent level. Therefore, Lvc is significant at the 5 per cent level only, which indicates that the evidence against equality of variances and covariances is much weaker than the evidence against equality of means found below.

Interpretation of Lm
Lm = 0.791
log10 Lm = –0.1018
–N(k – 1) log10 Lm = (–100)(3 – 1)(–0.1018) = 20.35
The table values for df = 2 are 2.602 and 4.000 at the 0.05 and 0.01 levels, respectively. The obtained value of –N(k – 1) log10 Lm is 20.35, which is greater than the table value at the 1 per cent level. Therefore, Lm is significant, which indicates that the tests are not parallel with regard to equality of means.

The table values of –N log10 Lmvc, –N log10 Lvc and –N(k – 1) log10 Lm at the 5 and 1 per cent levels are given below.
Table 4.1
Summary Table for Interpretation of Parallel Forms

       –N log10 Lmvc             –N log10 Lvc              –N (k–1) log10 Lm
k      df     5%       1%        df     5%       1%        df     5%      1%
2       2     2.602    4.000      1     1.668    2.882      1     1.668   2.882
3       6     5.469    7.301      4     4.121    5.766      2     2.602   4.000
4      11     8.545   10.737      8     6.735    8.725      3     3.394   4.927
5      17    11.981   14.509     13     9.712   12.025      4     4.121   5.766
6      24    15.815   18.666     19    13.091   15.718      5     4.808   6.552

Source: Author.
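The three statistics for the worked problem (three forms, N = 100) can be reproduced with the Python sketch below. It is a minimal illustration under stated assumptions: the function name `wilks_three_forms` is hypothetical, it is written for k = 3 only, and the average correlation is taken as the mean of the three pairwise covariances divided by the mean variance, as in the definition of r given earlier. The code carries full precision, so its output can differ in the last decimal from hand calculations with rounded intermediates.

```python
import math


def wilks_three_forms(means, sds, r12, r13, r23, n):
    """Return L_mvc, L_vc, L_m and their test quantities for k = 3 forms."""
    k = 3
    s1, s2, s3 = sds
    avg_var = (s1 ** 2 + s2 ** 2 + s3 ** 2) / k
    # Average covariance over the three distinct pairs (assumption, see lead-in).
    avg_cov = (r12 * s1 * s2 + r13 * s1 * s3 + r23 * s2 * s3) / 3
    grand_mean = sum(means) / k
    v = sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    # Determinant of the 3 x 3 variance-covariance matrix.
    det = (s1 * s2 * s3) ** 2 * (
        1 + 2 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2
    )
    common = avg_var + (k - 1) * avg_cov  # S^2 [1 + (k - 1) r]
    unique = avg_var - avg_cov            # S^2 (1 - r)
    l_mvc = det / (common * (unique + v) ** (k - 1))
    l_vc = det / (common * unique ** (k - 1))
    l_m = unique / (unique + v)
    return {
        "L_mvc": l_mvc,
        "L_vc": l_vc,
        "L_m": l_m,
        "stat_mvc": -n * math.log10(l_mvc),
        "stat_vc": -n * math.log10(l_vc),
        "stat_m": -n * (k - 1) * math.log10(l_m),
    }


result = wilks_three_forms(
    means=(60, 65, 63), sds=(6, 5, 7), r12=0.30, r13=0.40, r23=0.35, n=100
)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Since Lmvc equals Lvc multiplied by Lm raised to the power k – 1, the three test quantities are additive: the value for Lmvc is the sum of the values for Lvc and Lm, which offers a quick check on hand calculations.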
USE OF PARALLEL TESTS TO CALCULATE TEST RELIABILITY Parallel tests are important psychometric tools and according to Gulliksen (1961: 194), they offer ‘the best method to obtain test reliability’. Generally, when we talk about parallel forms, we mean two tests which are same in every respect except the items of the test. But when we want to calculate the test reliability, we construct three parallel forms and by definition—of parallel tests—we get tests of equal means, variances and intercorrelations.
However, while using this method to calculate test reliability, the following points should be taken into account: 1. When the ability being tested changes markedly in the interval between test administrations, the use of parallel forms is not advisable. For example, if we want to determine the reliability of a typewriting test (Gulliksen 1961) by administering one form to a group on Monday and another form on Friday, the method would not work if the group was practising (and, hence, improving their typewriting ability) during the intervening time. Likewise, the method is not good when the first test is given when subjects are in excellent ‘form’ and the second test is given when the subject is at his ‘low’ or not up to his abilities, or when the subject’s ability has decreased for lack of practice during the intervening period. The same consideration applies, for example, to any test of physical fitness or muscular skills. The two administrations of the test cannot be used to estimate the reliability of the test if there is a good reason for believing that the subjects have either improved or declined in the ability being tested. For most tests of scholastic achievement and mental ability, it is reasonably easy to be sure that the subjects have not actually changed markedly during the period between the two tests. For other types of performance, of which athletic skills of various types are a good example, it is very difficult to maintain a group of subjects at a uniform level of excellence. The skill is likely to improve with practice or deteriorate without it. In such cases, all the error of measurement cannot be attributed to the test. Much of what shows up in the statistical check as the error of measurement is actually the true variation in ability. However, from another point of view, we must keep in mind that measurement of certain skills is extremely unreliable (regardless of the causes of this unreliability).
Hence, while using any such measures we must, for many purposes, treat them just as we would treat very unreliable measures. However, if we are dealing with a time period during which the ability measured will not change systematically for different members of the group, and we are dealing with the group of subjects under conditions such that it is not likely that the ability will change, the use of parallel forms of the test is the most realistic method of indicating reliability (Gulliksen 1961). 2. Instability in either the test or the trait being assessed would result in a low correlation between parallel forms. Thus, a low correlation between two parallel forms of a test may at times indicate that the test is an unstable measure of a stable trait and, at other times, such a low correlation may arise from a stable measurement of an unstable trait. Methods for determining the instability of a trait as distinguished from the instability of the test have been suggested by Paulsen (1931), Thouless (1936, 1939), Prestion (1940), and Jackson and Ferguson (1941) (all cited in Gulliksen 1950).
PART 2 Theory and Practice of Psychological Testing
5
Introduction to Psychological Testing
CHAPTER OUTLINE
1. Psychological tests: What are they?
2. Nature and characteristics of psychological tests
3. History of psychological testing
4. Types of psychological tests
LEARNING OBJECTIVES
After reading this chapter, you will be able to understand:
1. What are psychological tests?
2. What are the main characteristics of psychological tests?
3. What are the major types of psychological tests?
4. What are situational and sociometric tests?
Notes from the Experiential Blurb
My association with psychology has interested not only me but also others around me. Once I went to a small informal student party given by one of my close friends on his 23rd birthday. At the party, my friend introduced me to one of his friends (not from a psychology background and until then unknown to me), saying that I had recently finished my post-graduation in Psychology. Rather than introducing himself, this man surprised me with his first utterance: ‘...Then, definitely you can tell something about my psychology.’ When I told him that I would not be able to do so unless he shared something about himself, such as his personal thinking, social life, and so on, I could read a judgmental insinuation on his face, arising partly from his failure to understand the limitations of a psychologist and partly from the premature death of his curiosity. The lesson is … You meet a layman and tell him that you are a psychologist or from a psychology background, and you are no less than a magician for him. Chances are that he will ask you to tell him something about himself, and if you don’t come up with something (revealing), you disappoint him. Don’t disappoint him, lest he unknowingly feel compelled to make an unfavourable assumption about you. Rather, leave him more curious and raise his hopes. The solution to this problem is provided by psychological testing. Generally people are more interested in knowing about personality (theirs and others’), but tell them that you can tell them more about themselves ... like their intelligence, motivations, attitudes, values, interests, social skills, and so on. Equipped with knowledge of psychological testing, you are no less than a magician.
PSYCHOLOGICAL TESTS: WHAT ARE THEY?
All of us have some preliminary idea of what a test means, based on the various tests that we have taken since our school days. Don’t you remember your graduate level test, consisting of a series of questions on the basis of which you were assigned certain ‘scores’ or a rank, or your psychology annual examination question paper, which ‘tested’ your understanding of the subject by making you answer a certain number of questions? Have you ever wondered what these tests actually did, or for what purpose they were administered to you? They measured a particular domain of your behaviour or individuality; here, that domain was intelligence or aptitude related to a particular subject, that is, psychology. And what did you see in the question paper? Did it ask you every question related to every area of psychology? Definitely not. You answered a definite number of carefully selected questions, which we call a ‘sample’ here. So, you already have some idea about what a test is. In our everyday life, the term ‘test’ is generally used either as a noun or as a verb. As a noun, a test denotes an instrument and, as a verb, it denotes something which is instrumental in giving information about an aspect of a person’s personality or behaviour. Many of these everyday notions of a ‘test’ find a parallel in the case of psychological tests. Thus, we can define a psychological test as a standardised instrument consisting of a series of questions (called ‘items’) which assess certain aspects of a person’s individuality and describe it in terms of scores and categories. However, psychological tests are varied and wide in their use and approach, and are not easy to define. According to Anne Anastasi (1988), a psychological test is essentially an objective and standardised measure of a sample of behaviour. Frank Freeman (1955)
has defined a psychological test as a standardised instrument designed to measure objectively one or more aspects of a total personality by means of samples of performance or behaviour. According to Elmer Lemke and William Wiersma (1976), a psychological test is a device for the quantitative assessment of educational and psychological attributes of an individual. According to them, operationally, a test secures a sample of behaviour. Anastasi and Urbina (1997) have defined a psychological test as an objective and standardised measure of a sample of behaviour. According to Bean (1953), a test is an organised succession of stimuli designed to measure quantitatively, or to evaluate qualitatively, some mental process, trait or characteristic. According to Singh (2006), a psychological or educational test is a standardised procedure to measure quantitatively or qualitatively one or more aspects of a trait by means of a sample of verbal or non-verbal behaviour. According to Sandra A. McIntire and Leslie A. Miller (2007), psychological tests are instruments that require the testee to perform some behaviour. The behaviour performed is used to measure some personal attribute, trait or characteristic, such as intelligence, that is thought to be important in describing and understanding behaviour. According to Robert J. Gregory (2004), a test refers to a standardised procedure for sampling behaviour and describing it with categories or scores. In addition, most tests have norms or standards by which the results can be used to predict other, more important behaviour. So, after going through the above definitions, it appears that a test consists of a series of items on the basis of which information is sought about one or more aspects of an individual’s or group’s traits, abilities, motives, attitudes, and so on.
This is done by making individuals or groups respond to a series of questions called ‘test items’, which elicit the desired behaviour related to the trait or ability being measured. As can be seen from some of the definitions mentioned above (for example, Freeman 1955; McIntire and Miller 2007), some behavioural dimensions of a person’s individuality (like personality and intelligence) have received more attention than others. McIntire and Miller (2007) have offered a continuum of some of the most and least commonly recognised types of psychological tests, as given in Figure 5.1.
Figure 5.1 A Continuum of Psychological Tests
[Figure: a continuum running from more typical to less typical psychological tests: personality tests and intelligence tests; vocational tests, interest inventories, achievement tests and ability tests; self-scored magazine tests, classroom quizzes and exams; road portion of driving test, structured employment interviews and assessment centres.]
Source: McIntire and Miller (2007).
NATURE AND CHARACTERISTICS OF PSYCHOLOGICAL TESTS
A simple look at the previously mentioned definitions of psychological tests is enough to bring out their essential characteristics (see Activity 5.1).
Activity 5.1 Apart from the previously mentioned eight definitions, collect some more definitions of psychological tests. Underline the keyword in each definition. These are the characteristics of psychological tests. Discuss with your group/teacher.
Let me take one definition, that given by Robert J. Gregory, and discuss the chief characteristics of psychological tests:
1. Standardised procedure: A test is considered to be ‘standardised’ if the procedure for administering it is uniform from one examiner and setting to another. This is ensured by outlining clear-cut instructions for the administration of the test in the test manual.
2. Behaviour sample: A test targets a well-defined and finite domain of behaviour, known as the ‘sample of behaviour’, because truly comprehensive testing is impractical. For example, the Wechsler Adult Intelligence Scale (WAIS) uses 35 carefully selected words to judge the vocabulary of the testee.
3. Scores or categories: This is based on the premise that ‘whatever exists at all exists in some amount’ (Thorndike 1918, cited in Gregory 2004), and ‘anything that exists in an amount can be measured’ (McCall 1939, cited in Gregory op. cit.). This essentially means that psychological tests assign a specific number or category to an abstract quality (for example, aptitude, memory, and so on) according to some rules (measurement).
4. Norms/standards: Norms are (statistical) average scores of a large and representative group of subjects (Petersen and Kolen 1989, cited in Gregory 2004). Norms help in determining the test taker’s relative standing in comparison to a representative population.
5. Prediction of non-test behaviour: The ultimate purpose of a test is to predict behaviour other than that directly sampled by the test. Suppose a person’s score on the introversion scale of the Maudsley Personality Inventory (MPI) is 12 (I = 12/40); this means that the person is highly introverted. Based on this, we can predict that this person is unlikely to be very successful as a customer care executive or a human resource officer.
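The idea of ‘relative standing’ against norms can be made concrete with a small sketch. The example below is only an illustration under stated assumptions: the norm group scores are invented, and the function names `z_score` and `percentile_rank` are hypothetical helpers rather than part of any published test manual. It expresses a raw score as a standard score and as a percentile rank within the norm group.

```python
from statistics import mean, pstdev

def z_score(raw, norm_scores):
    """Distance of a raw score from the norm group mean, in SD units."""
    return (raw - mean(norm_scores)) / pstdev(norm_scores)

def percentile_rank(raw, norm_scores):
    """Percentage of the norm group scoring below the raw score,
    counting half of those who obtained exactly the same score."""
    below = sum(1 for s in norm_scores if s < raw)
    equal = sum(1 for s in norm_scores if s == raw)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)

# Hypothetical raw scores of a (very small) norm group
norm_group = [8, 10, 12, 12, 14, 15, 16, 18, 20, 25]

raw = 20
print(round(z_score(raw, norm_group), 2))
print(percentile_rank(raw, norm_group))
```

A real norm group would of course be far larger and selected to be representative; the arithmetic, however, stays the same.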
HISTORY OF PSYCHOLOGICAL TESTING
Psychological testing, as it is today, is the outcome of many years of informal and formal effort to assess human abilities in certain terms and to make predictions on that basis. Most of the major developments in psychological testing occurred in the twentieth century, mainly in the United States of America. Historically, however, psychological testing has a much more varied geography,
and most scholars and textbooks trace the origin of psychological testing to the Chinese civil services around 4,000 years ago (DuBois 1970, 1972, cited in Kaplan and Saccuzzo 2005). Every third year in China, oral examinations were given to help evaluate the work of employees and make decisions about their promotions. However, according to Erford (2007) (Table 5.1), the Greeks may have been the first to use assessment for educational or military purposes. The testing movement gained its psychological character with the publication of Charles Darwin’s influential book The Origin of Species (1859). Darwin’s ideas gave major impetus to work on individual differences, and it was Sir Francis Galton, a relative of Darwin, who soon began applying Darwin’s ideas to the study of human abilities. In his book Hereditary Genius (1869), Galton contended that some persons possess certain characteristics that make them more fit than others, and he carried out a series of experimental studies to validate his observations.
Fathers of Mental Testing
[Photographs: Charles Darwin, Francis Galton and James McKeen Cattell]
Galton did important work on individual differences, which was later extended by the US psychologist James McKeen Cattell, who coined the term ‘mental test’. Cattell’s doctoral dissertation was based on Galton’s work on individual differences in reaction time. The works of Galton and Cattell set the stage for the development and growth of modern testing and led to its present status. In the previous century, three major areas of application, that is, clinical, educational and industrial, acted as the major forces behind the phenomenal growth in the development of psychological tests. The major events in the history of psychological testing are briefly given in Table 5.1.
TYPES OF PSYCHOLOGICAL TESTS
Typology of tests is a purely arbitrary determination (Gregory 2004). However, using different criteria, psychological tests can be classified in the following manner (Figure 5.2). A brief description of the types shown there is given here.
Table 5.1 History of Psychological Testing
500 BCE: Greeks may have used assessments for educational purposes.
220 BCE: Chinese set up civil service exams to select mandarins.
AD 1219: English university administers first oral examination.
ca. 1510: Fitzherbert proposes first measure of mental ability (identification of one’s age; counting 20 pence).
1540: Jesuit universities administer first written exams.
[year missing]: Spanish physician Huarte defines intelligence in Examen de ingenios (independent judgment; meek compliance when learning).
1599: Jesuits agree to rules for administering written exams.
1636: Oxford University requires oral exams for degree candidates.
1692: German philosopher Thomasius advocates for obtaining knowledge of the mind through objective, quantitative methods.
[year missing]: In working with the ‘Wild Boy of Aveyron’, Itard differentiates between normal and abnormal cognitive abilities.
1803: Oxford University introduces written exams.
1809: Gauss develops theory of observational error.
1834: Weber, pioneer in the study of individual differences, studies awareness thresholds.
1835: Quetelet develops and studies normal probability curves.
1837: Seguin develops the Seguin Form Board Test and opens school for mentally retarded children.
1838: Esquirol advocates differences between mental retardation and mental illness; proposes that mental retardation has several levels of severity.
1869: Galton, founder of individual psychology, authors Hereditary Genius, sparking study of individual differences and cognitive heritability.
1879: Wundt establishes world’s first psychological laboratory at the University of Leipzig in Germany.
1888: J.M. Cattell establishes assessment laboratory at the University of Pennsylvania, stimulating the study of mental measurements.
1890: Cattell coins the term ‘mental test’.
1897: Ebbinghaus develops and experiments with tests of sentence completion, short-term memory and arithmetic.
1904: Spearman espouses two-dimensional theory of intelligence (g = general factor; s = specific factors). Pearson develops theory of correlation.
ca. 1905: E.L. Thorndike writes about test development principles and laws of learning, and develops tests of handwriting, spelling, arithmetic and language. He later introduces one of the first textbooks on the use of measurement in education. First standardised group test of achievement published. Jung’s Word Association Test published.
1905: Binet and Simon introduce the first ‘intelligence test’ to screen French public school children for mental retardation.
1909: Goddard translates Binet-Simon Scale into English.
1912: Stern introduces the term ‘mental quotient’.
1916: Terman publishes the Stanford Revision and Extension of the Binet-Simon Intelligence Scale.
1917: Yerkes and colleagues from the American Psychological Association (APA) publish the Army Alpha and Army Beta tests, designed for the intellectual assessment and screening of US military recruits.
1918: Otis publishes the Absolute Point Scale, a group intelligence test.
1919: Monroe and Buckingham publish the Illinois Examination, a group achievement test.
1921: Rorschach publishes his inkblot technique.
1923: Kelley, Ruch and Terman publish the Stanford Achievement Test. Kohs Block Design Test measures non-verbal reasoning.
1924: Porteus publishes the Porteus Maze Test. Seashore Measures of Musical Talents published. Spearman publishes Factors in Intelligence.
1926: Goodenough publishes the Draw-a-Man Test.
1927: Spearman publishes The Abilities of Man: Their Nature and Measurement.
1928: Arthur publishes the Point Scale of Performance Tests.
1931: Stutsman publishes the Merrill-Palmer Scale of Mental Tests.
1933: Thurstone advocates that human abilities be approached using multiple-factor analysis. Tiegs and Clark publish the Progressive Achievement Tests, later called the California Achievement Test. Johnson develops a test scoring machine.
1935: Murray and Morgan develop the Thematic Apperception Test.
1936: Piaget publishes Origins of Intelligence. Lindquist publishes the Iowa Every Pupil Tests of Basic Skills, later renamed the Iowa Tests of Basic Skills. Doll publishes the Vineland Social Maturity Scale.
1937: Terman and Merrill revise their earlier work (Terman 1916, cited in Erford 2007) as the Stanford-Binet Intelligence Scale (SBIS).
1938: Buros publishes the first volume of the Mental Measurements Yearbook. Bender publishes the Bender Visual-Motor Gestalt Test. Gesell publishes the Gesell Maturity Scale.
1939: Wechsler introduces the Wechsler-Bellevue Intelligence Scale. Original Kuder Preference Scale Record published.
1940: Hathaway and McKinley publish the Minnesota Multiphasic Personality Inventory (MMPI). Psyche Cattell publishes the Cattell Infant Intelligence Scale.
1949: Wechsler publishes the Wechsler Intelligence Scale for Children (WISC). Graduate Record Exam (GRE) published.
1955: Wechsler revises the Wechsler-Bellevue Intelligence Scale as the Wechsler Adult Intelligence Scale (WAIS).
1956: Bloom publishes Taxonomy of Educational Objectives. Kuder Occupational Interest Survey published.
1957: Osgood designs the semantic differential scaling technique.
1959/1960: Guilford proposes the structure of intellect model in his The Nature of Human Intelligence. Dunns publish the Peabody Picture Vocabulary Test. National Defense Education Act provides funding for career assessment screening and high school counsellor positions. SBIS revised.
1961: Kirk and McCarthy publish the Illinois Test of Psycholinguistic Ability.
1963: R.B. Cattell introduces theory of crystallised and fluid intelligence.
1965: Strong Vocational Interest Blank published.
1966: American Educational Research Association (AERA), American Psychological Association (APA) and National Council on Measurement in Education (NCME) publish the Standards for Educational and Psychological Testing.
1967: Wechsler publishes the Wechsler Preschool and Primary Scale of Intelligence (WPPSI).
1969: Bayley publishes the Bayley Scales of Infant Development. National Assessment of Educational Progress program implemented. Jensen publishes How Much Can We Boost IQ and Scholastic Achievement? which is controversial.
1972: Form L-M (Third ed.) of SBIS released. McCarthy publishes McCarthy Scales of Children’s Abilities.
1973: Marino publishes Sociometric Techniques.
1974: Wechsler Intelligence Scale for Children–Revised (WISC–R) published. Congress passes the Family Educational Rights and Privacy Act (FERPA).
1975: Congress passes Public Law 94-142, the Education for All Handicapped Children Act. Kuder’s General Interest Survey, Form E, published.
1977: System of Multicultural Pluralistic Assessment (SOMPA) published.
1979: Federal judge Robert P. Peckham rules in Larry P. vs Wilson Riles that intelligence tests are culturally biased when used to determine African-American children’s eligibility for mental retardation services.
1979: Leiter International Performance Scale, a language-free test of non-verbal ability, published.
1980: In Parents in Action on Special Education vs Joseph P. Hannon, Illinois judge Grady concludes that intelligence tests do not discriminate against African-American children due to cultural or racial bias. New York state legislators pass Truth in Testing Act.
1980: Volumes 1–7 of Test Critiques published. High-speed computers begin to be used in large-scale testing programs. Computer-adaptive and computer-assisted testing developed.
1981: Wechsler publishes the Wechsler Adult Intelligence Scale–Revised (WAIS–R).
1983: Kaufman publishes the Kaufman Assessment Battery for Children (K–ABC).
1984: US Employment Service publishes the General Aptitude Test Battery.
1985: Sparrow, Balla and Cicchetti revise the Vineland Adaptive Behavior Scales, originally published by Doll (1936). AERA, APA and NCME revise the Standards for Educational and Psychological Testing.
1986: SBIS–Fourth Edition (SBIS–4) published, as revised by Thorndike, Hagen and Sattler.
1989: Minnesota Multiphasic Personality Inventory–Second Edition (MMPI–2) published. Wechsler Preschool and Primary Scales of Intelligence revised.
1990: Authentic (performance) assessment and high-stakes testing rise to prominence. Volumes 11–13 of Mental Measurements Yearbook published. Volumes 8–10 of Test Critiques published.
1991: Wechsler Intelligence Scale for Children–Third Edition (WISC–III) published. Kuder’s Occupational Interest Survey Form DD published.
1992: Wechsler Individual Achievement Test (WIAT) published.
1997: Wechsler Adult Intelligence Scale–Third Edition (WAIS–III) published.
1999: AERA, APA and NCME publish Standards for Educational and Psychological Testing–Third Edition. Volume 5 of Tests in Print published.
2000: Nader and Nairn publish The Reign of ETS.
2001: Mental Measurements Yearbook becomes available through an electronic retrieval system.
2002: Educational Testing Service revises its Scholastic Assessment Test (SAT). Wechsler Preschool and Primary Scales of Intelligence–Third Edition (WPPSI–III) published.
2003: Wechsler Intelligence Scale for Children–Fourth Edition (WISC–IV) published. SBIS–Fifth Edition (SB–5) published.
Source: Erford (2007).
Figure 5.2 Types of Psychological Tests
Source: Author.
According to Mode of Administration
According to the mode of administration, tests can be classified as individual tests or group tests. Individual tests are administered to one person at a time and are useful for collecting comprehensive information about the testee. They are often used in clinical evaluations. The problem with individual tests is that they are time, cost and labour intensive. An example of an individual test is the children’s individual test for creativity. Group tests are primarily designed for ‘mass testing’, that is, they can be administered to more than one individual at a time. They are economical and time saving. Examples are the Army Alpha and Army Beta tests.
According to Rate of Performance
According to the rate of performance, tests can be classified into Speed Tests and Power Tests. Speed Tests are timed tests, that is, they examine the subject’s speed of responding within a stipulated period of time. Test items in a Speed Test are of uniform difficulty, but the time limit is such that no examinee can attempt all the items (Chadha 1996). A pure Speed Test is composed of items so easy that the subject never gives a wrong answer, so his score is equal to the number of questions he attempts; an example is the Clinical Speed and Accuracy Test. Power Tests, on the other hand, offer enough time for the subject to attempt all the questions. In a Power Test, items are arranged in increasing order of difficulty, and certain items are too difficult for anyone to solve; an example is Raven’s Progressive Matrices (Raven and Court 1998).
According to Behavioural Attribute Measured
According to the behavioural attributes assessed, tests can be classified into personality tests, ability (intelligence, aptitude, achievement and creativity) tests and tests of attitudes, values and interests.
Personality Tests
These tests are designed to measure a person’s individuality in terms of his unique traits and behaviour. They help in predicting an individual’s future behaviour. They come in several varieties, such as checklists, inventories, subject evaluation techniques, and inkblot and sentence completion tests. Personality tests can be broadly classified into two categories: structured personality tests and unstructured personality tests. Structured personality tests are based on the premise that there are common dimensions across all personalities which can be measured with
the help of a psychological test in an objective manner. In such tests, the possible responses are defined in advance and the testee has only to choose one of the options as his response. Tests in this category include the 16 PF, MMPI, Maudsley Personality Inventory (MPI), and so on. Unstructured personality tests, on the other hand, assume idiosyncratic, individual-specific needs, which are discovered and measured by analysing the responses given by the testee to ambiguous stimuli. Examples of unstructured personality tests are projective tests like the Rorschach Test, the Thematic Apperception Test, and so on.
Abilities
These are the qualities that enable an individual to do specific tasks at a specified time and can be classified into intelligence, aptitude, achievement and creativity (Figure 5.3).
Figure 5.3 Ability
Source: Author.
Intelligence
Intelligence refers to the global mental capacities of an individual, and tests of intelligence essentially measure the rational and abstract thinking of an individual. They are designed to measure these capacities in terms of verbal comprehension, perceptual organisation, reasoning, and so on. The purpose is usually to determine the subject’s suitability for some occupation or scholastic work. An example of an intelligence test is the Wechsler Adult Intelligence Scale (WAIS).
Aptitude
Aptitude refers to an individual’s potential to learn a specified task when provided with training. Aptitude tests are designed to measure the subject’s capability of learning a specific task or acquiring a specific skill. Examples of aptitude tests are Seashore Measure of Musical Talent (Seashore et al. 1940),
Assessment and Scholastic Aptitude Tests used for college admissions, Guilford and Zimmerman Aptitude Survey, General Aptitude Test Battery (1948), and so on.
Achievement
Achievement refers to a person’s past learning, and achievement tests are designed to measure a person’s past learning or accomplishment in a task, for example, group tests of achievement and the Stanford Achievement Test by Gardner and Madden (1969). The distinction between aptitude and achievement tests is more a matter of use than content (Gregory 1994). In fact, any test can be an aptitude test to the extent that it helps in predicting future performance. Likewise, any test can be an achievement test to the extent that it measures past learning.
Creativity
Creativity refers to a person’s ability to think of new ideas, and creativity tests are designed to measure a person’s ability to produce new and original ideas, and the capacity to find unexpected solutions to vaguely defined problems. Examples of creativity tests are the Torrance Test of Creative Thinking by E. Paul Torrance (1966) and the Creativity Self Report by Feldhusen (1965).
Apart from this, based on the behavioural dimension measured, tests can also be classified into tests of attitudes, values and interests. Attitude refers to our evaluation of various aspects of the world, and tests of attitude measure a person’s tendency to evaluate, favourably or unfavourably, a class of events, objects or persons. Examples of attitude tests are the Criminal Attitude Scale (CATS) (Taylor 1968) and Attitude towards Retarded (Efron and Efron 1967). Values, on the other hand, are normative frameworks related to individual or group behaviour or expectations. An example of a value test is Allport, Vernon and Lindzey’s test of values. Value tests contain items like:
I would prefer a friend who...
a. is practical, efficient, and hard working.
b. is seriously interested in thinking out his or her philosophy of life.
c. has leadership and organisational skills.
d. shows artistic sensitivity.
If I had a lot of extra money, I would prefer to...
a. use it to promote industrial and commercial growth.
b. use it to help advance people’s spiritual needs.
c. contribute to the development of scientific research.
d. give it to family- and child-oriented charities.
Source: http://webspace.ship.edu/cgboer/valuestest.html
Finally, tests of interest show a person’s preference for, or interest in, a class of things or objects, for example, Campbell’s Interest and Skill Survey (CISS) (1995).
According to the Medium Used
According to the medium used, tests can be classified as paper-and-pencil tests and situational tests. Paper-and-pencil tests require students to read or write independently or to demonstrate their understanding of concepts at a symbolic level. Items of a paper-and-pencil test can be of the objective type, short answer type or extended answer type. The drawback of paper-and-pencil tests is that they suffer from artificiality, that is, they do not test a person by putting him/her in an actual situation. Situational tests, on the other hand, test a person by putting the candidate in actual or highly realistic simulated conditions.
According to the Nature of Test Items
Based on the nature of the test items, tests can be classified into verbal and non-verbal tests. Verbal tests are tests in which the responses of the testee are recorded in a verbal format and the emphasis is on reading, writing and oral expression. Examples of verbal tests are the Jalota Group General Intelligence Test and the Mehta Group Test of Intelligence. Non-verbal tests are those which de-emphasise, but do not altogether eliminate, the role of language by using symbolic materials like pictures, figures, and so on. Such tests use language in the instructions but not in the items. Raven’s Progressive Matrices is a good example of a non-verbal test. Under another classification based on the nature of test items, tests can be divided into objective type tests, in which the responses are of the multiple choice type, and essay type tests, in which the responses are of the long answer type.
Based on the Mode of Interpretation
Based on the mode of interpretation, tests can be classified into Norm Referenced Tests and Criterion Referenced Tests. Norms are statistical average scores of representative populations, with which the relative standing of the testee can be compared. A Norm Referenced Test compares an individual’s results on the test with those of a statistically representative sample. In practice, rather than testing a whole population, a representative sample or group is tested. This provides a group norm or a set of norms. One representation of norms is the Bell Curve (also called the ‘normal curve’). Norms are available for standardised psychological tests, allowing for an understanding of how an individual’s scores compare with the group norms. Examples of Norm Referenced Tests are the MMPI and GRE.
Criterion refers to the measure of performance that we expect to correlate with test scores. So, in case of Criterion Referenced Tests, the testee’s score is compared with an objectively stated standard of performance on that test.
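The difference between the two modes of interpretation can be sketched in a few lines. The example below is only an illustration: the norm sample, the cut score of 60 and the function names `norm_referenced` and `criterion_referenced` are all assumptions made for the sake of the example. The same raw score is interpreted once against a group and once against a fixed standard.

```python
def norm_referenced(raw, norm_scores):
    """Percentage of the norm sample scoring below the raw score."""
    below = sum(1 for s in norm_scores if s < raw)
    return 100.0 * below / len(norm_scores)

def criterion_referenced(raw, cut_score):
    """Compare the raw score with a fixed standard of performance,
    ignoring how other test takers performed."""
    return "meets standard" if raw >= cut_score else "below standard"

# Hypothetical norm sample and hypothetical standard (cut score)
norm_sample = [35, 40, 42, 45, 50, 52, 55, 58, 60, 70]
cut_score = 60

raw = 55
standing = norm_referenced(raw, norm_sample)
decision = criterion_referenced(raw, cut_score)
print(standing, decision)
```

With these invented figures, the testee scores higher than 60 per cent of the sample and yet falls below the stated standard, which shows that the two interpretations answer different questions.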
According to Mode of Scoring
According to the mode of scoring, tests can be classified into self-scored versus expert-scored, or hand-scored versus machine-scored tests. In self-scored tests, the testee can score his or her own responses with the help of a scoring key, while in expert-scored tests, the responses are scored by an expert (generally the test administrator). Hand-scored tests are scored manually, while machine-scored tests are scored with the help of a machine (computer aided); for example, the Optical Mark Recognition (OMR) sheets used for various educational and mass assessments.
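Whether the scoring is done by hand or by machine, the underlying operation with an objective test is the comparison of each response with a scoring key. A minimal sketch of that comparison is given below; the key, the responses and the function name `score_with_key` are hypothetical, and an actual OMR system would first have to read the marks from the sheet, which is not shown here.

```python
def score_with_key(responses, key):
    """Count the responses that match the scoring key.
    Unanswered items are recorded as None and earn no credit."""
    if len(responses) != len(key):
        raise ValueError("responses and key must cover the same items")
    return sum(1 for r, k in zip(responses, key) if r is not None and r == k)

# Hypothetical five-item multiple choice test
scoring_key = ["b", "d", "a", "c", "a"]
responses = ["b", "d", "c", None, "a"]  # item 4 left blank

score = score_with_key(responses, scoring_key)
print(score)
```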
According to the Scope
A very important but debatable classification, based on the scope of the tests, divides them into culture-specific and culture-free tests. All tests are, at least to some extent, culture specific, and it is very difficult to find a test which is totally culture-free.
WHAT ARE SITUATIONAL AND SOCIOMETRIC TESTS?
Another type of test discussed by Anastasi (1961) and Freeman (1955) is the situational and sociometric test (Freeman has kept sociometric tests within the category of situational tests). According to Anastasi (op. cit.), a situational test is one that places the testee in a situation closely resembling or simulating a ‘real life’ criterion situation. A situational test is therefore very useful for measuring an individual’s abilities in settings where knowledge of his or her actual performance under actual conditions is very important (for example, combat settings, organisations, and so on). Sociometric tests, on the other hand, are based on the work of J.L. Moreno on sociometry. A sociometric test could be conceived ‘as a technique for revealing and evaluating the social structure of a group through the measurement of frequency of acceptance or non-acceptance between the individuals who constitute the group’ (Freeman 1955: 577). Sociometric tests were used in an (American) state training school for girls to determine with whom each individual would prefer to live or work, and with whom each would not want to live or work (Jennings 1943, cited in Freeman 1955). So, you can use a sociometric test to find a room-mate or even a life partner!
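The ‘frequency of acceptance or non-acceptance’ in Freeman’s definition quoted above can be pictured as a simple tally of who chooses and who rejects whom. The sketch below is only an illustration under stated assumptions: the group members, their choices and the function name `tally_choices` are invented, and it does not reproduce the procedure of Moreno or Jennings.

```python
from collections import Counter

def tally_choices(choices, rejections):
    """Return, for each group member, how often she was chosen and
    how often she was rejected by the other members."""
    accepted = Counter(name for picks in choices.values() for name in picks)
    rejected = Counter(name for picks in rejections.values() for name in picks)
    members = sorted(set(choices) | set(rejections))
    return {m: (accepted[m], rejected[m]) for m in members}

# Hypothetical answers to 'With whom would you like to share a room?'
choices = {
    "Asha": ["Bina"],
    "Bina": ["Asha", "Chitra"],
    "Chitra": ["Bina"],
    "Devi": ["Bina"],
}
# Hypothetical answers to 'With whom would you not like to share a room?'
rejections = {
    "Asha": ["Devi"],
    "Bina": [],
    "Chitra": ["Devi"],
    "Devi": ["Asha"],
}

tally = tally_choices(choices, rejections)
print(tally)
```

In this made-up group, one member is chosen by all the others while another is chosen by no one and rejected twice; it is this kind of pattern in the social structure of the group that such a test is meant to reveal.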
CONCLUSION
Psychological testing is a big business today. It has emerged as an important tool of objective psychometric assessment of psychological and social attributes. The science of psychological testing has always responded to the needs of a changing society in which man is changing faster, and his mind is increasingly becoming erratic and unpredictable. However, in all such scenarios and the precipitating global economic crisis, the need to tap human potential is at an all-time high, though social tolerance is at an all-time low. Psychometricians have responded well to all these challenges, and now there are specialised tests for every conceivable psychometric attribute to the acceptable levels of reliability and validity. Apart from increasing emphasis on specialisation, psychometricians have also shifted their attention towards what can be called ‘Positive Testing’. Positive Testing can be described as an ally to Positive Psychology, which aims to enrich human lives by focussing on topics like the quality of life, hope, happiness, compassion, and so on. The current testing scenario is no longer dominated by a single construct or faculty, as happened earlier in the case of mental testing and personality testing. Though qualitative psychology is today a force to reckon with, it can never replace the necessities of quantification and standardisation related to psychological tests. Also, the current trend is dominated by the use of technology in testing, and the world wide web has opened the scope of ‘virtual testing’. However, it has simultaneously also created the problem of authenticity. The current period in psychological testing can be called a transition period, but it can be stated with a fair amount of confidence that the future of psychological testing is very bright.
6
Test Construction
CHAPTER OUTLINE
1. Test construction and standardisation
2. Steps involved in test construction:
Step 1: Planning for the test
Step 2: Preparing the preliminary draft of the test
Step 3: Trying out the preliminary draft of the test
Step 4: Evaluating the test
Step 5: Construction of the final draft of the test
LEARNING OBJECTIVES
At the end of this chapter, you will be able to understand:
1. Why are psychological tests constructed and what do they measure?
2. What are the necessary steps involved in test construction?
3. What are the main points that one should keep in mind while planning for a psychological test?
4. What should a test developer do to reduce the effect of guessing or situational factors while constructing a psychological test?
TEST CONSTRUCTION AND STANDARDISATION
According to their respective requirements, a school teacher, research scholar, psychologist, educationist, sociologist, army officer, and so on, prepare tests for assessing mental abilities and, for this, the knowledge of test construction and its standardisation is essential. Almost all psychological tests are constructed and standardised in a similar way. The only difference lies in the purpose of the test and in the content of items. Therefore, prior to test construction and its standardisation, some general rules must be considered. The construction of a test and its standardisation are two different but related concepts. In test construction, after item analysis, the items are finally chosen, whereas in standardisation, the chosen items are administered to large groups and then standard norms are prepared according to the results. In other words, test construction is one of the steps in standardisation. A test can be constructed and may or may not be standardised, but for standardisation, a test must be constructed. Usually, many tests are constructed for a special purpose and, beyond this purpose, they have no value, but the tests which are standardised can be used for a wide range of purposes.
STEPS INVOLVED IN TEST CONSTRUCTION
By test construction, we mean the final act of choosing the most appropriate items or questions that are to be included in a test. Generally, for all psychological and educational test construction, the following five steps are used:
1. Planning the test,
2. Preparing the preliminary draft of the test,
3. Trying out the preliminary draft of the test,
4. Evaluating the test and
5. Construction of the final draft of the test.
Planning the Test The first task of a test constructor is to produce the outline of the desired test, that is, the plan of the test. For this purpose, the subject, medium, administration, procedure, sample, population, and so on, are established and age, sex, educational qualification, mother tongue, rural/urban, socio-economic status and other environmental factors must be considered. The particular mental or behavioural characteristics should be clearly stated before test construction is undertaken. Thus, it is a fact that without any purpose, a test cannot be constructed. The test constructor himself sets the purpose of the test, which must be clear, relevant and in tune with the behaviour of the testees.
Practical Tips while Planning for the Test
1. (i) Specify the objective of your test clearly.
   (ii) Specify the method of meeting these objectives.
   (iii) Specify the theoretical background involved in the use and measurement of constructs.
2. Review of the literature:
   (i) What is the previous research on the construct?
   (ii) If previous research exists, in what way is your work an extension of the earlier work?
3. Give an operational definition of the construct: This means providing an objective and measurable definition of the construct. For example, if the construct that you want to measure is ‘leadership’, then you may operationally define it as: ‘leadership is a proactive relational process in which a person (called leader) proactively relates and guides a group of persons (called followers) for the realisation of some shared vision or mutually agreed upon goal(s).’
4. Decide the population and the sample: This means answering the question ‘what is the group on which you will develop the test?’ For example, college students may constitute the population, and 50 male students and 50 female students chosen from Delhi University graduate courses may constitute the sample.
Source: Author.
After the aim has been established, the subject matter is decided, keeping the aim of the test in mind. If the aim of the test is to measure the intelligence of students up to 16 years of age, the subject matter should be such that it appropriately measures the required abilities. Therefore, the content of the test may be verbal, arithmetical or pictorial matter. Likewise, if an aptitude test is to be constructed for a specific kind of work, then it has to be determined in which sphere such a test measures a person’s aptitude. An example of a question from a mechanical aptitude test:
Question: Which two of the following figures can you correlate?
1. a and b
2. b and c
3. c and d
4. d and a
Source: Author.
1. Characteristics of some good/well-written items:
(i) Items should be situational in nature,
(ii) Should be of moderate length,
(iii) Should be of moderate difficulty and
(iv) Should not use technical and culturally biased words and phrases.
2. Examples of some well-written items:
(i) I take an initiative and I am willing to take appropriate levels of risk to achieve my goal:
a) Yes b) No c) Undecided
(ii) I am proactive while dealing with people:
a) Yes b) No c) Undecided
3. Examples of some poorly written items:
(i) I think leadership involves nurturance and quid pro quo:
a) Agree b) Disagree c) Often d) Sometimes
(ii) ‘A leader is a dealer in hope . . . merchant in faith . . . manager of dreams . . . leader is a person who aligns people along the contours of common sympathies for the realisation of shared goals.’
a) Yes b) No c) Cannot say.
Therefore, while planning a test, the tester usually considers its objectives, including the subject matter of the items, and the capabilities, educational standard, age factor, and so on, of those persons for whom the test is to be constructed or undertaken. Besides, the format of the test (paper-and-pencil or performance, verbal or non-verbal), its medium (Hindi, English, Punjabi), the way it has to be administered (individual, group or both), the amount of money and time involved, and the characteristics of the testees, such as their age, sex, ability, experience, and so on, will also be highlighted. Thus, in the first step of test construction, the following four points are to be considered:
1. Arrangement of assessment of the test objectives,
2. The objective for which the test is being constructed,
3. Reflection of the objectives in the test items. The items for the test should be in accordance with the objectives of the test and
4. What will be the form, medium and language of the test, and under which conditions will it be administered to persons of a certain age, sex, and so on.
Preparation for the Preliminary Tryout of the Test
Only after definite and organised planning of the test does the tester prepare its preliminary tryout form. First of all, he assembles items that fit the subject matter: he may write them on the basis of his own experience, select them from available standardised or constructed tests, or construct them from other sources which could represent the subject matter of the test. The collected items are arranged in an organised way on the basis of the objectives and the subject matter. All types of items required for the test are included in the pre-tryout form. To make the test interesting for the testees, more than one type of item should invariably be included in it. Therefore, the testers are confronted with the problem of selecting the test items and presenting them with appropriate response criteria. For example, the type of items to be included in the test should be determined, such as true/false, yes/no, recall, multiple-choice type, supplementary type, comparative parallel type, and so on. The choice of item form affects the reliability of the resulting scores. Most testers emphasise the inclusion of a single type of item for the full test, because this makes administration easy: when different types of items are included, separate directions have to be given, whereas using only one type of item saves time and simplifies the process. Although using various types of items complicates the test, because different item types measure different types of abilities, certain test constructors prefer to sustain interest by using various types of items that elicit different kinds of responses. It will, therefore, not be unjustified to point out that one type of item is useful for small tests and different types of items for large tests.
After collecting the items of the test and determining its item types, it is essential for the tester to evaluate the items for the preliminary tryout of the test. First, the collected items should be sent to two or three specialists in the field to obtain their views. Each item should have the same response format. In the case of multiple-choice items, the wrong responses should also be considered. Besides the clarity of words, their usefulness, the sufficiency of the test material, the form of the test, the arrangement of items, and so on, should also be reviewed. The preliminary tryout form should be given to specialists for testing. They should be informed of the standards of age, education and other important points of the target group, so that improvements can be made on their suggestions. After the preliminary tryout form is amended, the tester should write down the instructions separately for the testee and the test administrators. The tester should divide the instructions into two parts: (a) ordinary instructions, covering the form of the test and the description of the objectives and (b) detailed special instructions relating to the test, which should be clear and understandable. The usefulness of the instructions can be checked by having testees solve exercise items of the kind they will have to solve in the final test. At the same stage, the tester should also devise the value-assessment (scoring) process. For this, he will have to determine what score (weighted score or standard score) should be given. Here, he will have to make a value-assessment guide or scoring stencil. Even if the responses are to be received as yes or no, the scoring stencil must make clear what, for example, a ‘yes’ to Item No. 1 means. Whatever the form of the items, this value-assessment process should be amended after the preliminary check in the light of the evidence obtained.
In the preliminary form of the test, there are usually about twice as many items as there are in its final form, and these are arranged gradually from simple to complicated ones. Therefore, at this stage, the tester should remember the following points:
1. To collect the test items from different sources.
2. To include the various forms of the items (which get responses from different sources).
3. To have the items reviewed and edited by specialists, avoiding the use of words such as always, seldom, completely, and so on.
4. To write down the instructions separately for the testee and the test administrator.
5. To determine the mode of value assessment in such a way that similar items are arranged or presented together for convenience of assessment and interpretation, along with providing convenience to the testee.
After the construction of the test and its pre-tryout form, an effort is made to evaluate the test for its quality, validity and reliability, and to delete the unnecessary items. Therefore, prior to the construction of the final form, it is essential to test the pre-tryout form. This is also known as a pilot study. This is done for the following purposes:
1. To delete from the test the weak and erroneous items, those with double meanings, uncertain items, inadequate items, those with incomplete meaning, and very difficult and very simple items.
2. To ensure that the test objectives are reflected in all the selected items, and thereby ensure the validity of every individual item.
3. To indicate the actual number of items to be included in the final form of the test.
4. To bring out the shortcomings of the responses of the testee and the tester.
5. To determine the inter-item correlations and, thus, prevent overlap in item content.
6. To arrange all the items of the test in sub-parts.
7. To determine the test instructions, related precautions and groups to be affected.
8. To know the actual limit of the final form of the test.
9. To determine the value assessment of the test.
Under the present steps, the evaluation of the test can be divided into two parts, which are as follows:
1. The first evaluation is known as the pre-tryout and its objective is to find out the main shortcomings of the test and remove them.
2. The second evaluation, known as the actual tryout, relates to a very important aspect of the test, that is, item analysis.
Preliminary Tryout
For the pre-tryout, the test is usually administered to 15 to 20 per cent of the total population with the objective of finding and removing its main shortcomings. Thus, the test is administered to a small group, by which many aspects of the test are estimated and its main shortcomings are removed. However, this test administration does not allow any individual item analysis. In this process of evaluation, the group for which the test is being constructed is first determined, and the test is then administered to a representative sample of that group. The time to be taken for the administration of the test is also determined. If the test has a parallel form, then the difference in the time taken to administer the two forms should be recorded. In the same way, the instructions of the test should be brief and uniform, so that they are easily and clearly understood by the testees. Besides the instructions given to the subjects, any insufficiency of the test and exercise items should also be noted. At the same time, a scoring process which is easy or familiar should be used. If, under some circumstances, the possibility of guessing or the effect of situational factors exists, then the following correction formula should be adopted:
$$S = R - \frac{W}{N - 1}$$
Where,
S = score corrected for guessing
R = number of correct responses
W = number of wrong responses
N = number of response alternatives available for each item
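The correction formula above can be sketched in a few lines of Python. This is only an illustration: the function name `corrected_score` and the example figures (40 right, 12 wrong, four alternatives per item) are assumptions and do not come from the text.

```python
def corrected_score(right: int, wrong: int, n_options: int) -> float:
    """Score corrected for guessing: S = R - W / (N - 1)."""
    if n_options < 2:
        raise ValueError("an item needs at least two response alternatives")
    return right - wrong / (n_options - 1)


# Hypothetical testee: 40 right, 12 wrong, four alternatives per item.
print(corrected_score(40, 12, 4))  # 36.0
```

Omitted items are neither right nor wrong in this sketch, so they do not change the corrected score.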
Actual Tryout The actual tryout is the stage at which item analysis is carried out for the screening of the test-items. According to Guilford (1954) the number of testees in actual tryout phase should be around 400.
Evaluating the Test
After the test has been tried out, it is evaluated. The test which is appropriate for measuring specific variables will provide the best results. The evaluation of the test is done on the following conditions:
1. As the purity of the test is an index of its difficulty level, it is important to determine it. Usually, items with 50 per cent difficulty level are considered appropriate. A test should not include items which are either so easy that they are correctly solved by all the group members or so difficult that they are solved by none. Hence, for the evaluation of the test, study of its difficulty level is the first step.
2. In the second step, item validity and discrimination value are studied. Usually, a test should include those items which can differentiate/discriminate between extreme groups, such as the upper scoring and lower scoring groups. Items with zero or negative index value are not included. The views and criticism of those who take the test can be considered. According to their suggestions, the language of an item can be changed or an item may be discarded.
3. For evaluation, the test can be compared with some other standardised test meant for the same purpose.
4. The reliability of the test is also to be determined. Low reliability indicates that the result of the test cannot be relied upon. Different methods are available to determine test reliability.
CONSTRUCTION OF THE FINAL DRAFT OF THE TEST The final test construction starts after initially testing and evaluating the test at different levels. Usually, the final test includes those items which are valid and have appropriate difficulty level. Instructions for the testees are given properly and clearly, so that the test can be used in a scientific manner. The limits (the range of scores) and the scoring method are also determined. At this stage, all the important aspects should be properly organised because the reliability and validity of the test depend on the final format of the test.
7
Item Analysis
CHAPTER OUTLINE
1. Introduction: Item Analysis
2. Item discrimination
3. Item difficulty
4. Item validity:
(i) Biserial Correlation method
(ii) Point-Biserial correlation
5. Role of item characteristics curve in predicting the test scores
LEARNING OBJECTIVES
After reading this chapter, you will be able to understand:
1. What is item analysis?
2. What are the steps involved in item analysis?
3. What is item discrimination and how to calculate it?
4. What is item difficulty and what are various methods to calculate item difficulty?
5. What is item validity?
6. How to use Biserial and Point-Biserial Method to calculate item validity?
INTRODUCTION: ITEM ANALYSIS
The effectiveness and usefulness of any test depend upon the qualities of the items that are included in it. The score of the test is obtained as a result of its validity, reliability and the intercorrelation between items. Hence, to make the test more effective, the test constructor should study, one by one, all the items which are to be included in it. This process is known as item analysis. In other words, in item analysis, all the items of the test are studied individually to see what number or percentage of persons in a group has actually tried to respond to or solve each item. Under this method, the usefulness of each item is analysed, because the quality and utility of the test depend on the items which finally constitute it. So, in the process of test construction, item analysis is necessary for the selection of items, as the final test form must be in accordance with the objectives and subject matter of the test. According to Freeman (1962), two valuable things that should be considered while analysing an item are item difficulty and item validity. According to Guilford (1954), item analysis is the best method for the final selection of items in a test. According to Harper and Stevens (1948), item analysis is the most appropriate way to bring improvement in measurement results. It has been found effective in improving teachers’ behaviour, understanding of students and the evaluation of the teaching–learning process. Under the item analysis method, two technical aspects, item validity and item difficulty (ID), are usually considered for the value assessment of the items.
ITEM DISCRIMINATION
There are two methods for this: (a) significance of the difference between proportions and (b) the correlational technique. In the case of significance of the difference between proportions, the percentage or proportion of individuals who answer the item correctly in the high group is tested against the proportion in the low group. If the difference is significant for a particular item, then that item is accepted as one which discriminates. On the other hand, if the difference is non-significant for a particular item, then that item is rejected as one which does not discriminate. The difficulty with this method is that it does not reveal how well each item discriminates. T.L. Kelley (1939) demonstrated empirically that when the responses of the individuals in the upper 27 per cent were compared with the responses of the individuals in the lower 27 per cent, the ratio of the difference between the means of the two groups to the probable error of that difference was maximum. Since that time, researchers have accepted Kelley’s findings and used this value of 27 per cent for selecting the high and low groups. In the second method, that is, the correlational approach to item analysis, a correlation coefficient is computed to indicate the relationship of the responses to the total test score. This shows how well the item is doing what the test itself is doing. Suppose we have the responses of 300 individuals to a 70-item test. First, we set up a distribution of scores on the Y-axis of a scatter plot and, because the item is generally scored on a right or wrong basis, we have only two categories on the X-axis (see Table 7.1).
Table 7.1 Responses of 300 Individuals to One Item of a 70-Items Test

Test Score   Right   Wrong
52–54         17      14
49–51         30      18
46–48         24      15
43–45         14      13
40–42         19      08
37–39         16      07
34–36         13      14
31–33         12      11
28–30         18      16
25–27         05      06
22–24         06      04
Total        174     126

Source: Author.
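As a rough sketch of the correlational approach, a point-biserial coefficient for this one item can be computed directly from the grouped frequencies of Table 7.1. Two assumptions are made here that the text does not state: each class interval is represented by its midpoint, and the first frequency column of the table is read as the 'Right' responses. The name `point_biserial` is illustrative only.

```python
from math import sqrt

# Grouped data read from Table 7.1 as (class midpoint, right, wrong).
# Representing each interval by its midpoint is an assumption of this sketch.
ROWS = [
    (53, 17, 14), (50, 30, 18), (47, 24, 15), (44, 14, 13),
    (41, 19, 8), (38, 16, 7), (35, 13, 14), (32, 12, 11),
    (29, 18, 16), (26, 5, 6), (23, 6, 4),
]


def point_biserial(rows):
    """r_pb = (M_right - M_wrong) / SD_total * sqrt(p * q)."""
    n_right = sum(r for _, r, _ in rows)
    n_wrong = sum(w for _, _, w in rows)
    n = n_right + n_wrong
    mean_right = sum(m * r for m, r, _ in rows) / n_right
    mean_wrong = sum(m * w for m, _, w in rows) / n_wrong
    mean_all = sum(m * (r + w) for m, r, w in rows) / n
    # Standard deviation of all 300 scores (divisor n, not n - 1).
    var_all = sum((r + w) * (m - mean_all) ** 2 for m, r, w in rows) / n
    p = n_right / n  # proportion answering the item correctly
    return (mean_right - mean_wrong) / sqrt(var_all) * sqrt(p * (1 - p))


r_pb = point_biserial(ROWS)
print(round(r_pb, 3))
```

Under these assumptions the coefficient comes out at roughly 0.05, which would suggest that this particular item is only weakly related to the total score.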
From such a table we compute either a Point-Biserial or a Biserial correlation. We may use the tetrachoric correlation or the phi-coefficient if the distribution of scores is dichotomised with regard to the response to the item. A similar procedure would be repeated for the remaining 69 items of the test. This method is time consuming because it requires a large number of computations and much data processing. For example, for a 100-item test, we need to find 100 correlations. To overcome this difficulty, Flanagan (1939) devised a shorthand method. It is a correlational chart from which correlation coefficients can be read directly by entering the percentage of subjects answering the item correctly in the upper 27 per cent on the Y-axis and in the lower 27 per cent on the X-axis of the graph. Suppose data are given as in Table 7.2 to understand the use of the Flanagan method. Take any test of, say, 80 items administered to 300 individuals. The group is divided into the upper 27 per cent and the lower 27 per cent, and the percentage of cases who correctly answered each item is found for each of the two groups. The first column in the table gives the item; the second and third columns give the per cent correct for the upper 27 per cent and lower 27 per cent, respectively. The fourth column indicates the difficulty (p) of the item, which is the estimate of difficulty obtained by averaging the proportions in the second and third columns. In the fifth column, the correlation coefficients are given as obtained from Figure 7.1. Enter for Item 1 with 0.80 on the Y-axis and 0.60 on the X-axis. The point where 0.80 and 0.60 intersect is the estimated value of the correlation coefficient (r). It is a point between two axes of the figure. Here, the value of the correlation coefficient is 0.24. Take, for instance, Item No. 4, in which we enter 0.90 on the Y-axis and 0.70 on the X-axis and find that they intersect on the arc of 0.30. Therefore, 0.30 is the correlation coefficient for Item 4. The procedure is repeated for all the items of the test.

Table 7.2 Datasheet for Item Analysis

Items   Per cent Correct     Per cent Correct     Difficulty (p)   Discrimination (r)
        Upper 27 Per cent    Lower 27 Per cent
1       0.80                 0.60                 0.70             0.24
2       0.60                 0.30                 0.45             0.31
3       0.50                 0.34                 0.42             0.16
4       0.90                 0.70                 0.80             0.30
5       0.60                 0.20                 0.40             0.42
6       0.80                 0.10                 0.45             0.70
7       0.30                 0.40                 0.35             0.09
–       –                    –                    –                –
–       –                    –                    –                –
80      0.60                 0.40                 0.50             0.20

Source: Author.

Figure 7.1: Estimating Biserial Correlation between Item and Total Test Score
Source: Guilford (1954: 428).
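The difficulty column of Table 7.2 is described as the average of the upper-group and lower-group proportions. A minimal sketch of that calculation, using only the rows listed in the table, is given below; the function name `difficulty` is illustrative, and the discrimination values are not computed because the text reads them from the Flanagan chart.

```python
# (item, proportion correct in upper 27%, proportion correct in lower 27%),
# copied from the rows that Table 7.2 lists explicitly.
ITEMS = [
    (1, 0.80, 0.60), (2, 0.60, 0.30), (3, 0.50, 0.34), (4, 0.90, 0.70),
    (5, 0.60, 0.20), (6, 0.80, 0.10), (7, 0.30, 0.40), (80, 0.60, 0.40),
]


def difficulty(upper: float, lower: float) -> float:
    """Estimate p as the mean of the upper- and lower-group proportions."""
    return (upper + lower) / 2


for item, upper, lower in ITEMS:
    print(item, round(difficulty(upper, lower), 2))
```

The printed values agree with the Difficulty (p) column of the table for every listed item.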
Another Flanagan chart is given in Figure 7.2 for estimating the phi-coefficient using the datasheet. On Figure 7.2, we estimate the phi-correlation for Item 1 in the same way as before. Enter for Item 1 with 0.80 on the Y-axis and 0.60 on the X-axis. The point where they intersect is the estimate of the phi-coefficient. It is a point estimated between two axes of Figure 7.2. In this case, we get the value of the phi-coefficient to be 0.45. The value is not the same as the value estimated from Figure 7.1. If we apply each method separately to the same data, almost the same items will be selected as discriminating.

Figure 7.2: Estimating of the Phi-coefficient When Variable has an even Division of Cases in Two Categories
Source: Guilford (1954: 429).

Similarly, Figure 7.3 is used for estimating the Point-Biserial correlation when one variable is divided at the median of the distribution, and Figure 7.4 is used to estimate the tetrachoric correlation, where one variable is divided at the median of the distribution. After computing the discriminating power with the help of the respective correlations, the next step is to test the significance of these estimated coefficients. Only if the coefficients are significant will the items be retained. In selecting items for further use, items are retained on the basis of both their difficulty values and their ability to show significant differences between the two criterion groups.
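Whether an item shows a significant difference between the two criterion groups can be checked with a test of the difference between two proportions. The sketch below uses a pooled two-proportion z statistic, which is a standard modern substitute and not necessarily the procedure the author has in mind (Kelley's work is phrased in terms of the probable error). The group size of 81 assumes 27 per cent of 300 testees, and 1.96 is the usual two-tailed 5 per cent critical value.

```python
from math import sqrt


def two_proportion_z(p_upper: float, p_lower: float,
                     n_upper: int, n_lower: int) -> float:
    """z statistic for the difference between two independent proportions,
    using the pooled proportion to estimate the standard error."""
    pooled = (p_upper * n_upper + p_lower * n_lower) / (n_upper + n_lower)
    se = sqrt(pooled * (1 - pooled) * (1 / n_upper + 1 / n_lower))
    return (p_upper - p_lower) / se


# Item 1 of Table 7.2: 0.80 correct in the upper group, 0.60 in the lower.
# 81 testees per group is an assumption (27 per cent of 300).
z = two_proportion_z(0.80, 0.60, 81, 81)
print(round(z, 2), abs(z) > 1.96)
```

For Item 1 the statistic is about 2.78, so under these assumptions the item would be retained as discriminating at the 5 per cent level.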
Figure 7.3: Estimating the Point-Biserial Correlation When One Variable is Divided at the Median of the Distribution
Source: Guilford (1954: 430).
ITEM DIFFICULTY (ID)
Another problem, that of determining difficulty level, is encountered after determining item validity. These two independent aspects of item analysis are interrelated because both item validity and difficulty levels are determined simultaneously. The number of difficulty levels equals the number of items in a test. Usually, the percentage of those who pass the test item and of those who fail it is determined to find out the difficulty level. Neither a very easy item, which every individual in a group can answer, nor a very difficult item, which no individual in the group can answer, is included in the test. Generally, an item is considered to be of appropriate difficulty level if 50 per cent of the group have responded to it correctly. This does not imply that only those items are included which can be answered by 50 per cent of the group; items which are answered by both the best (upper) and the lowest members of the group should also be selected.
Figure 7.4: Estimating the Tetrachoric Correlation When One Variable is Divided at the Median of the Distribution
Source: Guilford (1954: 431).
Difficulty level helps in arranging the items in order. This indicates which item will come first, in the middle, at the end or in any other position. Now the question arises as to how to determine difficulty level? Though there is no single appropriate method to determine the difficulty level, five methods are henceforth discussed. Prior to this, however, views of certain specialists on difficulty level are given. According to Bradfield and Moredock (1957) the degree of discrimination may be taken as an index of item difficulty. An item which 90 per cent of the group answered correctly would be considered an easy item. One which only 10 per cent answered would be termed very difficult. An item that half of the class answered correctly and half answered incorrectly is said to have 50 per cent difficulty. According to Tate (1955), item difficulty can be calculated by finding out the proportion of the subjects that have answered the item correctly. According to him, difficulty value of an item is inversely proportional to the proportion of students that have answered it correctly. Garrett (1958) says, ‘The number right or the proportion of the group which can solve an item correctly is the standard method for determining difficulty in test.’
The five important methods of estimating the difficulty level of items are:
1. Through correct responses of 27 per cent upper and 27 per cent lower group,
2. Harper’s facility index (FI),
3. From the item’s responded percentage,
4. By normal curve and
5. By the means of formula.
Through Correct Responses of 27 Per Cent Upper and 27 Per Cent Lower Group
By this method, for determining difficulty level, first of all, the lower 27 per cent and the upper 27 per cent of the group are selected and, then, the correct responses to the item in the two groups are combined by the following formula:
$$ID = 100 - \frac{U + L}{2}$$
Where,
U = persons in the upper 27 per cent who solve the item correctly,
L = persons in the lower 27 per cent who solve the item correctly.
For example, if an item has been solved by 30 persons from the upper group and 20 persons from the lower group, then,
$$ID = 100 - \frac{30 + 20}{2} = 75$$
According to this method, the difficulty level ranges from 0 to 100. No item should be so simple that it would be solved by all members of the group, nor should it be so difficult that no member of the group is able to solve it. Even then, an item with an ID of less than 19 or more than 90 is considered invalid or weak.
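The worked example can be reproduced with a short function. This is a sketch only; the name `item_difficulty` is an assumption. Note that the result stays within the stated 0 to 100 range only if U and L themselves lie between 0 and 100, that is, if they are treated as percentages of each group or if each group contains 100 persons; the text does not say which is intended.

```python
def item_difficulty(upper_correct: float, lower_correct: float) -> float:
    """ID = 100 - (U + L) / 2, as given in the text."""
    return 100 - (upper_correct + lower_correct) / 2


print(item_difficulty(30, 20))  # 75.0
```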
Harper’s Facility Index (FI)
Kelley and Harper (1967) also used the method of responses of 27 per cent upper and 27 per cent lower group. They ask why only 27 per cent? Though it is a technical argument, it is the percentage through which reliable conclusions can be drawn. In this method, the correct responses of the upper and lower groups are totalled and then divided by the maximum total correct responses.
$$FI = \frac{R(U) + R(L)}{2E} \times 100$$
Where,
R(U) = right responses given by upper 27 per cent group,
R(L) = right responses given by lower 27 per cent group,
E = maximum total correct responses (the number of persons in each group).
Suppose, out of a total of 300 persons, 45 persons of the upper group and 35 persons of the lower group respond correctly. Each 27 per cent group contains 0.27 × 300 = 81 persons, so E = 81. Then, its FI is equal to,
$$FI = \frac{45 + 35}{2 \times 81} \times 100 = \frac{80}{162} \times 100 = 49.38\%$$
This item’s difficulty level is 49.38 per cent. The items whose FI lies between 35 per cent and 85 per cent are chosen as the items with the highest difficulty level (Dedrich 1960, cited in Chadha 1996).
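The facility index can be reproduced with a short Python sketch. The function name facility_index and the variable names are assumptions made for illustration; E is taken to be the size of each 27 per cent group, as in the worked example.

```python
def facility_index(right_upper: int, right_lower: int, group_size: int) -> float:
    """FI = (R(U) + R(L)) / (2E) x 100, where E is the size of each group."""
    return (right_upper + right_lower) / (2 * group_size) * 100


n_total = 300
group_size = round(0.27 * n_total)  # 81 persons in each 27 per cent group
fi = facility_index(45, 35, group_size)
print(round(fi, 2))  # 49.38
```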
From the Item’s Responded to Percentage
Any item’s difficulty level can be estimated on the basis of the percentage of testees answering the particular test item correctly. If the test items are all selected at the 50 per cent difficulty level, then testees can be divided into two groups: those above this predetermined dividing point and those below it. Such items cannot differentiate among the individuals in the group above the 50 per cent level nor among those below it. Hence, for maximum differentiating efficiency, a test must contain items at various levels of difficulty as represented by the pass percentage. In terms of difficulty, the discriminative value of a test item is the degree to which individuals, who vary in regard to the character being measured, can be differentiated.
By Normal Curve
Difficulty levels can also be given in terms of the standard deviation of the normal curve. Thus, if 84 per cent of the testees pass an item, it has a rank of –1 σ (S.D.) (that is, one S.D. below the mean). If an item is passed by 16 per cent of the testees, its rank is +1 σ (that is, one S.D. above the mean). Similarly, if an item is passed by 69 per cent or 31 per cent, the ranks would be –0.5 σ (S.D.) or +0.5 σ (S.D.), respectively, corresponding to difficulty levels of 69 per cent and 31 per cent. Some psychologists prefer this index because the standard deviation is a property of the normal curve.
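The conversion from pass percentage to σ rank can be sketched with the standard normal distribution from Python's statistics module. The function name sigma_rank is an assumption introduced here; the sign convention (negative for easy items) follows the paragraph above.

```python
from statistics import NormalDist


def sigma_rank(pass_percentage: float) -> float:
    """Difficulty in S.D. units: negative for easy items, positive for hard ones."""
    # inv_cdf gives the z value below which the given proportion of cases falls
    return -NormalDist().inv_cdf(pass_percentage / 100)


for pct in (84, 69, 31, 16):
    print(pct, round(sigma_rank(pct), 2))
```

For pass percentages of 84 and 16 the sketch returns values close to –1 and +1, and for 69 and 31 values close to –0.5 and +0.5, in line with the ranks given in the text.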
By the Means of a Formula
To determine the difficulty level of an item, the following formula can be used:
$$ID = \frac{N_i}{N_t} \times 100$$
Where,
ID = item difficulty,
Ni = Number of testees who have responded to the item correctly,
Nt = Total number of testees.
For example, take a group of 85 students who are administered three items of a test. Item 1 is correctly answered by 55 students, Item 2 is correctly answered by 48 students and Item 3 is correctly answered by 66 students. Therefore,
$$ID \text{ of item 1} = \frac{55}{85} \times 100 = 64.71\%$$
$$ID \text{ of item 2} = \frac{48}{85} \times 100 = 56.47\%$$
$$ID \text{ of item 3} = \frac{66}{85} \times 100 = 77.65\%$$
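The three values above can be checked with a minimal Python sketch. The function name item_difficulty and the dictionary correct_counts are illustrative assumptions.

```python
def item_difficulty(n_correct: int, n_total: int) -> float:
    """ID = (Ni / Nt) x 100: percentage of testees answering the item correctly."""
    return n_correct / n_total * 100


# Number of correct answers per item in the group of 85 students
correct_counts = {1: 55, 2: 48, 3: 66}
for item, n_correct in correct_counts.items():
    print(f"ID of item {item} = {item_difficulty(n_correct, 85):.2f}%")
```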
It can, hence, be concluded that the value of ID is the percentage of the individuals of a group who have answered a given question of the test correctly. The determination of the difficulty level is not important for the discriminating value of the test. The degree of difficulty level is also determined in the situation when the test is to be divided into two equal forms. The difficulty level of the items is according to the test form. Whereas in a Speed Test all the items are of the same difficulty level, in a Power Test, the difficulty level of the items increases in serial order. Other than these, there are some tests whose objective is to divide the testees into two groups: pass–fail or satisfactory–unsatisfactory. Therein, it is necessary that a predetermined level of difficulty be selected.
ITEM VALIDITY
It is important from the validity point of view of the test to see the validity of each item because independent validity of items is the validity of the whole test. When each item of the test effectively discriminates on the basis of a particular quality or ability between high and low or strong and weak persons of a group, then only it can be called valid. Therefore, each item has to be discriminative besides being fully valid. The discriminative quality of the item is the indicator of its being valid too.

Biserial Correlation Method
This method is considered most appropriate to determine item validity or the item’s discrimination power in the process of item analysis. In this method, internal consistency is used to find item validity.
For this, the first step is to calculate from both upper and lower groups the number of those who respond correctly to the item and then Point-Biserial correlation is computed. In other words, through this method, Biserial correlation for every item is determined from the total test scores and different subtest scores. Not only the Biserial correlation between different subparts of the test, but also the subparts of the test are determined. The main objective of determining Biserial correlation is to find the consistency between a person’s total test outcome and that of the subparts. For this, an item is given marks as pass/fail or +/–. The assumption of this process is that for any psychological variable being measured, items are homogeneous. Other than this, success or failure on every item is correlated with the total score of the test and of its subtests, which expresses its validity. Since determining the Biserial correlation of every item of a test is a difficult, lengthy and tedious process, to find the internal consistency of the test, usually, the Product Moment Correlation between the total test and subtests, and between different subtests, is determined. This is also known as the internal validity of the test, which also indicates internal consistency. According to Garrett (1958), items that measure similar traits or go in the same direction are grouped together in an internally consistent test.
Method of Determining Item Validity from Biserial Correlation
The process of determining item validity and difficulty level using this method is as follows:
1. Arrange the testees in order of their test scores,
2. Top 27 per cent and bottom 27 per cent students are selected,
3. Forty-six per cent of the middle (or central) level students are omitted because they have no importance in determining item validity or discriminating value,
4. After combining the group’s top and bottom scores, these are converted into percentage scores,
5. For chance success, these percentages are improved,
6. Then, in J.C. Flanagan’s (1939) normalised Biserial coefficient series, the success percentage of the two groups (or levels) with Biserial correlation is read from intersecting column and
7. The average of top and bottom level is taken to find (or determine) the difficulty level of the item.
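Steps 1 to 3 of the procedure above can be sketched in Python as follows. The function name upper_lower_groups, the testee labels and the scores are hypothetical and serve only to illustrate the selection of the two extreme groups; the later steps (chance correction and Flanagan's table) are not covered by this sketch.

```python
def upper_lower_groups(scores: dict, fraction: float = 0.27):
    """Rank testees by total score and return the top and bottom groups."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # step 1: order by score
    k = max(1, round(fraction * len(ranked)))              # size of each extreme group
    return ranked[:k], ranked[-k:]                         # steps 2 and 3: middle omitted


# Hypothetical total scores of 11 testees
scores = {"s1": 25, "s2": 23, "s3": 18, "s4": 24, "s5": 20, "s6": 19,
          "s7": 22, "s8": 21, "s9": 17, "s10": 26, "s11": 15}
upper, lower = upper_lower_groups(scores)
print(upper)  # ['s10', 's1', 's4']
print(lower)  # ['s3', 's9', 's11']
```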
Biserial Correlation
In the above discussion, an attempt has been made to understand item validity or item discrimination. In the same context, we now discuss Biserial correlation. The Biserial correlation can be determined from the formula:
$$r_{bis} = \frac{M_p - M_q}{\sigma_T} \times \frac{pq}{\mu}$$
where,
rbis = Biserial correlation coefficient,
Mp = Mean of higher group,
Mq = Mean of lower group,
p = proportion of cases in the higher group,
q = proportion of cases in the lower group,
µ = the height of the ordinate of the normal curve at the point which divides p and q into two parts,
σT = standard deviation of the test.
When one variable is measured in a graduated fashion and the other is in the form of a dichotomy, we have the Biserial situation, for which there are two measures of correlation: Biserial correlation and Point-Biserial correlation. The difference between these two measures depends essentially on the type of assumption which is made concerning the nature of the dichotomised variable. We generally use the Biserial correlation when we have one continuous variable, and another which is actually continuous but which has been forced into dichotomy. The most typical example of a situation calling for one or the other of these measures is to be found in the test (ability and personality) field, that is, the correlation between an item scored as pass or fail (yes or no, like or dislike, agree or disagree, and so on), and a graduated criterion variable (or a total score on all of a set of items).
Passing or failing is an example of forced dichotomy. Achievement may be thought of as a continuum, ranging from those who pass with exceedingly high honours down to those who fail miserably. The passing group is made up of individuals from the honours students down to the borderline cases, and the failing group includes all those from the ones who just barely failed to the utter failures. We reduce this continuum to a pass–fail dichotomy and, as this is the normal procedure in test scoring, the Biserial correlation may be used as a measure of the discrimination index of an item.
Example
Compute the Biserial correlation from the following data (distribution of scores in general English examination).

Scoring for Item 5
Scores    Group I (Passing)    Group II (Failing)
55–59             1                    1
50–54             1                    0
45–49             3                    2
40–44             4                    2
35–39             6                    3
30–34             7                    6
25–29            12                    9
20–24             6                    4
15–19             8                    3
10–14             2                    2
Total            50                   32

Solution
Scores    Mid-point    Group I (Passing)    Group II (Failing)    Total
55–59        57                1                    1               2
50–54        52                1                    0               1
45–49        47                3                    2               5
40–44        42                4                    2               6
35–39        37                6                    3               9
30–34        32                7                    6              13
25–29        27               12                    9              21
20–24        22                6                    4              10
15–19        17                8                    3              11
10–14        12                2                    2               4
Total                         50                   32              82

Mp (Mean of group I) = [57 × 1 + 52 × 1 + 47 × 3 + 42 × 4 + 37 × 6 + 32 × 7 + 27 × 12 + 22 × 6 + 17 × 8 + 12 × 2]/50 = 1480/50 = 29.60
Mq (Mean of group II) = [57 × 1 + 52 × 0 + 47 × 2 + 42 × 2 + 37 × 3 + 32 × 6 + 27 × 9 + 22 × 4 + 17 × 3 + 12 × 2]/32 = 944/32 = 29.50
Calculation of σ (Standard Deviation)
Scores    Mid-point (X)    Total (f)     fX     x = X – M      x²         fx²
55–59          57              2         114      27.44      752.95     1505.90
50–54          52              1          52      22.44      503.55      503.55
45–49          47              5         235      17.44      304.15     1520.75
40–44          42              6         252      12.44      154.75      928.50
35–39          37              9         333       7.44       55.35      498.15
30–34          32             13         416       2.44        5.96       77.48
25–29          27             21         567      –2.56        6.55      137.55
20–24          22             10         220      –7.56       57.15      571.50
15–19          17             11         187     –12.56      157.75     1735.25
10–14          12              4          48     –17.56      308.35     1233.40
Σ                             82        2424                            8712.03

$$M = \frac{\sum fX}{N} = \frac{2424}{82} = 29.56$$
Where,
M = Mean,
X = Mid-point score,
f = Frequency,
N = Total number of observations.
$$\sigma = \sqrt{\frac{\sum f x^2}{N}} = \sqrt{\frac{8712.03}{82}} = 10.31$$
Here p = 50/82 = 0.61 and q = 32/82 = 0.39. See the value of µ from Garrett (1958: 382) for area from mean (0.61 – 0.50) = 0.11. We get µ = 0.384.
$$r_{bis} = \frac{M_p - M_q}{\sigma} \times \frac{pq}{\mu} = \frac{29.60 - 29.50}{10.31} \times \frac{(0.61)(0.39)}{0.384}$$
$$= \frac{0.10 \times 0.61 \times 0.39}{10.31 \times 0.384} = \frac{0.02379}{3.95904} = 0.006$$
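The whole calculation can be retraced with a short Python sketch. It is only an illustration under stated assumptions: the variable names are introduced here, and the ordinate µ is computed from the standard normal distribution instead of being read from Garrett's table, so the unrounded intermediate values differ slightly from the tabled ones.

```python
from math import sqrt
from statistics import NormalDist

midpoints = [57, 52, 47, 42, 37, 32, 27, 22, 17, 12]
passing = [1, 1, 3, 4, 6, 7, 12, 6, 8, 2]   # Group I frequencies
failing = [1, 0, 2, 2, 3, 6, 9, 4, 3, 2]    # Group II frequencies


def grouped_mean(xs, fs):
    """Mean of a grouped distribution from mid-points and frequencies."""
    return sum(x * f for x, f in zip(xs, fs)) / sum(fs)


total = [a + b for a, b in zip(passing, failing)]
n = sum(total)
m_p = grouped_mean(midpoints, passing)
m_q = grouped_mean(midpoints, failing)
m_t = grouped_mean(midpoints, total)
sigma = sqrt(sum(f * (x - m_t) ** 2 for x, f in zip(midpoints, total)) / n)

p = sum(passing) / n
q = 1 - p
std_normal = NormalDist()
ordinate = std_normal.pdf(std_normal.inv_cdf(p))  # height of the curve dividing p and q

r_bis = (m_p - m_q) / sigma * (p * q) / ordinate
print(round(m_p, 2), round(m_q, 2), round(sigma, 2), round(ordinate, 3), round(r_bis, 3))
```

The printed means, standard deviation and ordinate should agree with 29.60, 29.50, 10.31 and 0.384 from the worked solution, and the coefficient with 0.006.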
There is another formula to calculate the value of Biserial correlation coefficient. The formula for Biserial correlation is,
$$r_{bis} = \frac{X_P - X_T}{\sigma_T} \times \frac{P}{\mu}$$
Where,
XP = the mean of the first group (those answering the item correctly),
XT = mean of the total test scores,
σT = the standard deviation of the test,
P = the proportion of cases in the first group,
q = 1 – P,
µ = the height of the ordinate of the normal curve at the point which divides p and q into two parts.
Let us apply the above formula to the previous data.

Scores    fp (Group I, Passing)    fq (Group II, Failing)    fT    x’    fp x’    fq x’    fT x’    fT x’²
55–59              1                        1                 2     5      5        5       10       50
50–54              1                        0                 1     4      4        0        4       16
45–49              3                        2                 5     3      9        6       15       45
40–44              4                        2                 6     2      8        4       12       24
35–39              6                        3                 9     1      6        3        9        9
30–34              7                        6                13     0      0        0        0        0
25–29             12                        9                21    –1    –12       –9      –21       21
20–24              6                        4                10    –2    –12       –8      –20       40
15–19              8                        3                11    –3    –24       –9      –33       99
10–14              2                        2                 4    –4     –8       –8      –16       64
Σ                 50                       32                82          –24      –16      –40      368
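Taking the assumed mean (A.M.) as 32, the mid-point of the class 30–34, and the class interval (C.I.) as 5,
$$X_P = A.M. + \frac{\sum f_p x'}{\sum f_p}(C.I.) = 32 + \frac{-24}{50}(5) = 29.6$$
$$X_T = A.M. + \frac{\sum f_T x'}{\sum f_T}(C.I.) = 32 + \frac{-40}{82}(5) = 29.56$$
$$\sum x^2 = \left[\sum f_T x'^2 - \frac{(\sum f_T x')^2}{N}\right](C.I.)^2 = \left[368 - \frac{(-40)^2}{82}\right](5)^2 = [368 - 19.51](25) = 8712.25$$
$$\sigma_T = \sqrt{\frac{\sum x^2}{N}} = \sqrt{\frac{8712.25}{82}} = 10.31$$
$$P = \frac{\sum f_p}{\sum f_T} = \frac{50}{82} = 0.61$$
For area from mean (0.61 – 0.50 = 0.11), the value of the ordinate is 0.384; that is, for the area corresponding to the larger proportion (P) of 0.61, the height of the ordinate is 0.384.
$$\mu = 0.384$$
$$r_{bis} = \frac{X_P - X_T}{\sigma_T} \times \frac{P}{\mu} = \frac{29.6 - 29.56}{10.31} \times \frac{0.61}{0.384} = \frac{0.04 \times 1.59}{10.31} = 0.006$$
This agrees with the value obtained from the first formula.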
The Point-Biserial Correlation
There are circumstances, especially in the field of test construction and test validation, where one of the variables is continuous and the other is conceived of as a dichotomy. In the usual scoring of items, the procedure is to mark the item either right or wrong. This right/wrong scoring is regarded as being a true dichotomy. Examples of true dichotomies (when the criteria are exact) are delinquent–nondelinquent, psychotic–normal, colourblind–normal, and so on. Thus, in such cases, when one variable is continuous and the other variable is truly dichotomous, then, to find the correlation, we have to compute the Point-Biserial correlation (rpbis). The formula is,
$$r_{pbis} = \frac{M_p - M_q}{\sigma}\sqrt{pq}$$
Where,
Mp = the mean of the first group,
Mq = the mean of the second group,
p = the proportion of the first group,
q = the proportion of the second group,
σ = the standard deviation of the total score.
Example
Compute Point-Biserial correlation for the following data:

Students    Test-Criterion    Item No. 5
    1             25              1
    2             23              1
    3             18              0
    4             24              0
    5             23              1
    6             20              0
    7             19              0
    8             22              1
    9             21              1
   10             23              1
   11             21              0
   12             20              0
   13             21              1
   14             21              1
   15             22              1

Solution
Number of passing = 9
Number of failing = 6
Mp = [25 + 23 + 23 + 22 + 21 + 23 + 21 + 21 + 22]/9 = 22.33
Mq = [18 + 24 + 20 + 19 + 21 + 20]/6 = 20.33
$$p = \frac{9}{15} = 0.60$$
$$q = \frac{6}{15} = 0.40$$
$$\sigma = 1.82$$
$$r_{pbis} = \frac{M_p - M_q}{\sigma}\sqrt{pq} = \frac{22.33 - 20.33}{1.82}\sqrt{(0.60)(0.40)} = 0.54$$
The Point-Biserial correlation coefficient is a Pearson Product Moment Correlation. It is widely used in test construction and analysis. Let us illustrate the Point-Biserial correlation as a special case of Product Moment Correlation. Let us take the earlier data to calculate the Product Moment Correlation (Table 7.3).

Table 7.3 Calculation of Product Moment Correlation of 15 Students
Students    Test Criterion (X)    Item No. 5 (Y)     X²     Y²     XY
    1               25                  1            625     1     25
    2               23                  1            529     1     23
    3               18                  0            324     0      0
    4               24                  0            576     0      0
    5               23                  1            529     1     23
    6               20                  0            400     0      0
    7               19                  0            361     0      0
    8               22                  1            484     1     22
    9               21                  1            441     1     21
   10               23                  1            529     1     23
   11               21                  0            441     0      0
   12               20                  0            400     0      0
   13               21                  1            441     1     21
   14               21                  1            441     1     21
   15               22                  1            484     1     22
    Σ              323                  9           7005     9    201
Source: Author.
$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}}$$
$$= \frac{15(201) - (323)(9)}{\sqrt{\left[15(7005) - (323)^2\right]\left[15(9) - (9)^2\right]}}$$
$$= \frac{3015 - 2907}{\sqrt{[105075 - 104329][135 - 81]}} = \frac{108}{\sqrt{746 \times 54}} = \frac{108}{\sqrt{40284}}$$
$$r = \frac{108}{200.71} = 0.538$$
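That the Point-Biserial coefficient is a special case of the Product Moment Correlation can be verified numerically. The sketch below is an illustration only; the function names point_biserial and pearson are assumptions introduced here, and the data are the 15 students of Table 7.3.

```python
from math import sqrt
from statistics import mean, pstdev

scores = [25, 23, 18, 24, 23, 20, 19, 22, 21, 23, 21, 20, 21, 21, 22]
item = [1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1]


def point_biserial(x, y):
    """r_pbis = (Mp - Mq) / sigma * sqrt(pq), with sigma the S.D. of all scores."""
    passed = [xi for xi, yi in zip(x, y) if yi == 1]
    failed = [xi for xi, yi in zip(x, y) if yi == 0]
    p = len(passed) / len(x)
    return (mean(passed) - mean(failed)) / pstdev(x) * sqrt(p * (1 - p))


def pearson(x, y):
    """Raw-score Product Moment formula used in the text."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


print(round(point_biserial(scores, item), 3), round(pearson(scores, item), 3))
```

Both functions should return the same coefficient, about 0.538, for these data.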
This value of product–moment correlation is the same as the value of the Point-Biserial correlation coefficient. Another formula to calculate the Point-Biserial correlation is,
$$r = \frac{X_P - X_T}{\sigma_T} \times \sqrt{\frac{P}{q}}$$
Where,
XP is the mean score of those answering the item correctly,
XT is the mean of the total test scores,
σT is the standard deviation of the test,
P is the proportion of the total group answering the item correctly,
q = 1 – P.
Let us apply this formula to the earlier data (Table 7.4).
Table 7.4 Using the Formula q = 1 – p to Calculate Point-Biserial (rpbis) Correlation
(X) Test Scores    Item No. 5    x = X – XT      x²
      25               1            3.47        12.04
      23               1            1.47         2.16
      18               0           –3.53        12.46
      24               0            2.47         6.10
      23               1            1.47         2.16
      20               0           –1.53         2.34
      19               0           –2.53         6.40
      22               1            0.47         0.22
      21               1           –0.53         0.28
      23               1            1.47         2.16
      21               0           –0.53         0.28
      20               0           –1.53         2.34
      21               1           –0.53         0.28
      21               1           –0.53         0.28
      22               1            0.47         0.22
                                           Σx² = 49.72
Source: Author.

XP = Mean of those passing the item = [25 + 23 + 23 + 22 + 21 + 23 + 21 + 21 + 22]/9 = 22.33
XT = [25 + 23 + ... + 22]/15 = 323/15 = 21.53
$$\sigma_T = \sqrt{\frac{\sum x^2}{N}} = \sqrt{\frac{49.72}{15}} = \sqrt{3.315} = 1.82$$
$$P = \frac{9}{15} = 0.60$$
$$q = 1 - P = 1 - 0.60 = 0.40$$
$$r = \frac{X_P - X_T}{\sigma_T} \times \sqrt{\frac{P}{q}} = \frac{22.33 - 21.53}{1.82} \times \sqrt{\frac{0.60}{0.40}} = \frac{0.80 \times 1.225}{1.82} = 0.538$$
Problem
Find the Point-Biserial correlation coefficient for the data given below:

Scores    Yes    No
85–89      2      4
80–84      3      6
75–79      5      3
70–74      2      5
65–69      6      7
60–64      5      8
55–59      8      4
50–54      3      8
45–49      9      5
40–44      8      3

Solution
Scores    Yes (fp)    No (fq)    fT     Midpoint (X)    x’    fT x’    fT x’²    fp x’    fq x’
85–89        2           4        6         87           5     30       150       10       20
80–84        3           6        9         82           4     36       144       12       24
75–79        5           3        8         77           3     24        72       15        9
70–74        2           5        7         72           2     14        28        4       10
65–69        6           7       13         67           1     13        13        6        7
60–64        5           8       13         62           0      0         0        0        0
55–59        8           4       12         57          –1    –12        12       –8       –4
50–54        3           8       11         52          –2    –22        44       –6      –16
45–49        9           5       14         47          –3    –42       126      –27      –15
40–44        8           3       11         42          –4    –44       176      –32      –12
Σ           51          53      104                            –3       765      –26       23
$$X_p = \frac{\sum (X)(f_p)}{\sum f_p} = \frac{(2)(87) + (3)(82) + \dots + (9)(47) + (8)(42)}{51} = \frac{3032}{51} = 59.45$$
$$X_q = \frac{\sum (X)(f_q)}{\sum f_q} = \frac{(4)(87) + (6)(82) + \dots + (5)(47) + (3)(42)}{53} = \frac{3401}{53} = 64.17$$
$$X_T = \frac{\sum (X)(f_T)}{\sum f_T} = \frac{(6)(87) + (9)(82) + \dots + (14)(47) + (11)(42)}{104} = \frac{6433}{104} = 61.86$$
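Another method to calculate XP, Xq and XT, taking the assumed mean (AM) as 62 and the class interval (C.I.) as 5,
$$X_p = AM + \frac{\sum f_p x'}{\sum f_p}(C.I.) = 62 + \frac{-26}{51}(5) = 59.45$$
$$X_q = AM + \frac{\sum f_q x'}{\sum f_q}(C.I.) = 62 + \frac{23}{53}(5) = 64.17$$
$$X_T = AM + \frac{\sum f_T x'}{\sum f_T}(C.I.) = 62 + \frac{-3}{104}(5) = 61.86$$
$$\sigma_T = \sqrt{\frac{\sum x^2}{N}}$$
and
$$\sum x^2 = \left[\sum f_T x'^2 - \frac{(\sum f_T x')^2}{N}\right](C.I.)^2 = \left[765 - \frac{(-3)^2}{104}\right](5)^2 = 19122.84$$
Hence,
$$\sigma_T = \sqrt{\frac{19122.84}{104}} = \sqrt{183.87} = 13.56$$
$$p = \frac{51}{104} = 0.49$$
$$q = 1 - p = 1 - 0.49 = 0.51$$
$$r_{pbis} = \frac{X_p - X_T}{\sigma_T}\sqrt{\frac{p}{q}} = \frac{59.45 - 61.86}{13.56}\sqrt{\frac{0.49}{0.51}} = \frac{-2.41 \times 0.98}{13.56} = -0.174$$
Another method to find rpbis is,
$$r_{pbis} = \left[\frac{X_p - X_q}{\sigma_T}\right]\sqrt{pq} = \left[\frac{59.45 - 64.17}{13.56}\right]\sqrt{(0.49)(0.51)} = \frac{-4.72 \times 0.50}{13.56} = -0.174$$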
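As a check on the arithmetic, the problem can be recomputed from the grouped frequencies with a short Python sketch. The variable names are assumptions made for illustration, and the frequencies of the 55–59 class (8 Yes, 4 No) are the ones implied by the column totals of 51 and 53 and by the products in the solution table. The standard deviation is computed directly from the mid-points, which is equivalent to multiplying the coded sum of squares by the square of the class interval.

```python
from math import sqrt

midpoints = [87, 82, 77, 72, 67, 62, 57, 52, 47, 42]
yes = [2, 3, 5, 2, 6, 5, 8, 3, 9, 8]   # fp
no = [4, 6, 3, 5, 7, 8, 4, 8, 5, 3]    # fq


def grouped_mean(xs, fs):
    """Mean of a grouped distribution from mid-points and frequencies."""
    return sum(x * f for x, f in zip(xs, fs)) / sum(fs)


total = [a + b for a, b in zip(yes, no)]
n = sum(total)
x_p = grouped_mean(midpoints, yes)
x_q = grouped_mean(midpoints, no)
x_t = grouped_mean(midpoints, total)
sigma_t = sqrt(sum(f * (x - x_t) ** 2 for x, f in zip(midpoints, total)) / n)

p = sum(yes) / n
q = 1 - p
r_from_total = (x_p - x_t) / sigma_t * sqrt(p / q)    # first formula
r_from_groups = (x_p - x_q) / sigma_t * sqrt(p * q)   # second formula
print(round(sigma_t, 2), round(r_from_total, 3), round(r_from_groups, 3))
```

The two formulas are algebraically equivalent, so the two printed coefficients should coincide.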
ROLE OF ITEM CHARACTERISTICS CURVE (ICC) IN PREDICTING THE TEST SCORES
The ICC is an important tool of item analysis, especially in terms of item difficulty and item discrimination. It is a two-dimensional curve in which the total scores are plotted on the X-axis and the probability of a correct response on the Y-axis. The curve depicts the probability that a person with a given ability level will answer the item correctly. Persons with lower ability have lesser chances of answering a problem correctly, while persons with high ability are more likely to answer the item correctly; simply stated, a person possessing higher intelligence will score higher, and a person possessing lower intelligence (ability) will score lower. For example, see Figure 7.5.

Figure 7.5 Item Characteristic Curve and Item Discrimination
Source: Author.

The steepness or the slope of the ICC shows the discriminating power of an item. Item discriminating power is given by a relationship between the slope of the ICC and the inter-item correlation. The ICC can also be used for the calculation of the item difficulty level; for example, see Figure 7.6. In Figure 7.6, Item A is the most difficult item because the probability of answering it correctly is minimum, while Item C is the easiest one because it has the maximum probability of being answered correctly. Item B falls in between and, thus, is of moderate difficulty value. So, we can predict the score of a person with the help of the ICC, depending upon his probability of answering an item correctly or incorrectly.

Figure 7.6 Item Characteristic Curve and Item Difficulty
Source: Author.
8
Scoring of Tests and Problems of Scoring
CHAPTER OUTLINE
1. Scoring of tests
2. Problems of scoring:
   (i) Time scoring problems
   (ii) Response prejudice/bias
   (iii) Scoring of rank-ordered items
3. Importance of scoring in psychological testing
LEARNING OBJECTIVES
After reading this chapter, you will be able to understand:
1. What is scoring?
2. How is scoring done?
3. How to calculate the scores due to questioning by the subject?
4. What are different types of response biases and how to deal with them?
5. How to score the rank-order items?
SCORING OF TESTS
In this chapter, we will consider some of the basic problems which are associated with the scoring of tests. Usually, an examiner faces a lot of difficulty in checking questions which are based on multiple choices, and it becomes really difficult to evaluate whether the respondent has marked the answer by making use of his prior knowledge and understanding of the subject or merely by guessing it. In any multiple choice question there are four possible outcomes:
1. R (rights), the questions marked correctly,
2. W (wrong), the questions marked incorrectly,
3. O (omits), the questions that have not been marked but are followed by questions that have been answered either as right or wrong. It appears that the respondent tried to attempt the question and then decided to omit it and move on to the next question, and
4. U (unattempted), the number of questions at the end of the test which are not marked. It seems that the respondent did not have the opportunity to attempt these questions before the time was over.
For any exam which is based on multiple choice questions, the problem of guessing is increasing day by day. Generally, a student who knows the answer to a question will solve it correctly and mark it quickly; therefore, he will have more time to approach the other questions than the student who does not know the answer. The problem of guessing arises when the student realises that he has only a few minutes left and may feel that it is rather beneficial for him to mark quickly the rest of the questions, which he does not have the time to read, and get rewarded. If the final marks are taken as the number of questions marked correctly, this student is likely to enhance his marks in the last few minutes more than another equally good student who attempted only one question in those few minutes. Earlier, the tests were usually designed with a view that the number of omitted and unattempted questions will be approaching zero.
Under ordinary circumstances, if omitted and unattempted items are fairly large, the number of questions marked correctly will turn out to be a dependable score for the examination. This will be the case in which a student reads each question and honestly tries to solve the problem or he has the fear of negative marks (if any). According to Gulliksen (1961), it should be possible to detect such guessing by plotting the number of the last question attempted as the abscissa against Right responses on the ordinate. On such a plot, the line y = x would indicate a perfect score up to the last question marked, and the line y = (1/n) x, where n is the number of alternatives for each question, would indicate the locus of the average chance score (Figures 8.1 and 8.2).

Figure 8.1 Relationship between Number of Last Questions Attempted (X-Axis) and Correct Answers (Y-Axis)
Source: Gulliksen (1961).

Figure 8.2 Relationship between Number of Last Questions Attempted (X-Axis) and Correct Answers (Y-Axis)
Source: Gulliksen (1961).

In the above example, a question is composed of five alternatives. The average score from pure guessing would be one-fifth (1/5) of the questions correctly answered, and the line y = (1/5) x would be the locus of such scores. If some points with a relatively high number of right responses lie near this line, it means that a relatively good score has been made by students who are apparently guessing the answers to a large number of questions. A more accurate graph for detecting good scores made merely by guessing can be drawn by plotting the number correct (R) as the ordinate against the number attempted (R + W) as the abscissa. In the earlier example, the number of questions attempted is equal to (R + W + O). In the new graph, the points of chance factor will not come under consideration. The inevitable problem of guessing worsens when some respondents make high right responses on the basis of chance factor between right responses and the number of questions attempted. This situation is highly undesirable and some steps must be taken to alter it. Gulliksen (1959) suggested that if a test is a trial run, it may be possible to shorten the test by eliminating some of the
items, so that more respondents can finish the test. However, if the test scores must be used and if it is not possible for other reasons to shorten the test or lengthen the time (mainly due to the high competitiveness), then it is possible to consider more complicated scoring procedures to avoid the possible effects of guessing. These formulas should be considered only when the omitted and unattempted responses are fairly large for some respondents and fairly small for others. These formulas have no use if we are considering only right and wrong responses. Let A be the number of questions that have not been answered by the respondent; either they were skipped or unattempted at the end of the test, that is,
$$A = O + U$$
Using T to designate the total number of questions in a particular test:
$$T = R + W + A \qquad (A = O + U)$$
One method of dealing with the problem of variation in amount of guessing from one person to another is to assume that there are n alternative choices for each question, and that if each person had answered every question, he would have answered 1/nth of the unanswered questions correctly by chance. Let XA be the score that would have been made if all the questions had been tried or attempted, that is,
$$X_A = R + \frac{1}{n} A \qquad (1)$$
Example: In a test of 50 questions with each question having five alternatives, the subject’s responses have been categorised in four groups:
Right responses (R) = 32
Wrong responses (W) = 08
Omitted (O) = 06
Unattempted (U) = 04
In this case, XA will be
XA = 32 + [1/5] (6 + 4) = 32 + 2 = 34
It must be noted here that if some of the questions in the test are so difficult that fewer than 1/nth of the respondents attempting them answer correctly, equation (1) cannot be used. Another method of correcting for guessing attempts to estimate the number of questions for which the respondent knew the answers. In this method, it is assumed that the answers to the questions left blank (O + U) are not known, so that nothing needs to be added for them. The second assumption in this method is that out of the questions that the respondent guessed 1/nth were by
chance answered correctly and are included in the tally of right responses. The remaining fraction of the questions answered by guessing, that is, (1 − 1/n), represents wrong answers. Let XC represent the number of questions for which the answers were known. Then,
$$X_C = R - \frac{W}{n - 1} \qquad (2)$$
For the example above,
$$X_C = 32 - \frac{8}{4} = 30$$
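Equations (1) and (2) can be sketched as two small Python functions. The function and parameter names are assumptions introduced for illustration; the values are those of the worked example (R = 32, W = 8, O = 6, U = 4, five alternatives).

```python
def score_if_all_attempted(right: int, omitted: int, unattempted: int, n_choices: int) -> float:
    """Equation (1): X_A = R + A / n, with A = O + U."""
    return right + (omitted + unattempted) / n_choices


def score_corrected_for_guessing(right: int, wrong: int, n_choices: int) -> float:
    """Equation (2): X_C = R - W / (n - 1)."""
    return right - wrong / (n_choices - 1)


print(score_if_all_attempted(32, 6, 4, 5))     # 34.0
print(score_corrected_for_guessing(32, 8, 5))  # 30.0
```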
As shown by the example, equation (1) will always give a higher numerical value than equation (2), except for the respondent making a perfect score. Again, as mentioned earlier, like equation (1), equation (2) cannot be used when the questions are so difficult that less than a chance proportion of those attempting the item get it correct. However, from the viewpoint of ranking the students, the two scores will give exactly the same results. This is also useful in some correlational studies because these scores are perfectly correlated.

Functional Relationship between XA and XC
Since,
$$T = R + W + A$$
$$A = T - R - W \qquad (3)$$
substituting this value in equation (1),
$$X_A = \frac{n - 1}{n} R - \frac{1}{n} W + \frac{1}{n} T \qquad (4)$$
If we multiply both sides by n/(n − 1) and subtract the constant T/(n − 1), then after simplifying, we get,
$$\frac{n}{n - 1} X_A - \frac{T}{n - 1} = R - \frac{W}{n - 1} \qquad (5)$$
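The relationship in equation (5) can be verified with exact arithmetic. The sketch below uses Python's fractions module; the function names x_a and x_c and the extra (R, W) pairs are assumptions introduced for illustration, with T = 50 and n = 5 as in the worked example.

```python
from fractions import Fraction


def x_a(right: int, wrong: int, total: int, n: int) -> Fraction:
    """Equation (1) with A = T - R - W substituted, as in equations (3) and (4)."""
    return right + Fraction(total - right - wrong, n)


def x_c(right: int, wrong: int, n: int) -> Fraction:
    """Equation (2)."""
    return right - Fraction(wrong, n - 1)


total, n = 50, 5
for right, wrong in [(32, 8), (20, 20), (45, 5), (10, 0)]:
    left_side = Fraction(n, n - 1) * x_a(right, wrong, total, n) - Fraction(total, n - 1)
    # Equation (5): the left side equals X_C, so X_C is a linear function of X_A
    print(right, wrong, left_side == x_c(right, wrong, n))
```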
If we compare the right hand side of equation (5) and the right hand side of equation (2), we find both are the same. So, it can be inferred that XC is a linear function of XA. There is another method of dealing with the problem of correcting scores for the effects of possible guessing. This is one of the very effective methods against the practice of so-called lucky answering for all unfinished questions just before the allotted time ends.
$$X_U = R + \frac{U}{n} \qquad (6)$$
Where, R stands for right responses, to which is added 1/n of the number of questions at the end of the test that were unattempted by the subject. The only difference between equations (1) and (6) is that no credit is given for omitting a question. One of the advantages of equation (6) is that a student has nothing to lose but everything to gain in case he had time to study. Earlier, most writers suggested that the best policy is to ensure that practically all questions are attempted by practically all the students and, then, simply to score the right responses. In today’s competitive society, however, it must be stressed that only the deserving students should get marks, and for this, equations (1), (2) and (6) are suggested for evaluation according to the nature of the examination. It has been realised that there are a lot of problems of careless errors in the test. A student is able to answer each question if he genuinely knows the answer. Thus, there is no problem of estimating how many questions the student knew, as distinct from how many lucky guesses he made. Undoubtedly, the problem is how many questions can be solved in a given time. If, in a test, a student responds properly to the questions and the number of omitted responses is zero, then the test can be scored either in terms of right responses or the number of unattempted questions at the end.
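Equation (6) differs from equation (1) only in leaving out the omitted questions, which a brief Python sketch makes visible. The function name score_credit_unattempted is an assumption introduced here, and the values are those of the earlier worked example (R = 32, O = 6, U = 4, five alternatives).

```python
def score_credit_unattempted(right: int, unattempted: int, n_choices: int) -> float:
    """Equation (6): X_U = R + U / n; omitted questions earn no credit."""
    return right + unattempted / n_choices


x_u = score_credit_unattempted(32, 4, 5)
x_a = 32 + (6 + 4) / 5  # equation (1) for the same respondent
print(x_u, x_a)
```

For this respondent the score from equation (6) is lower than the score from equation (1) by O/n, the credit that equation (1) gives for the six omitted questions.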
PROBLEMS OF SCORING
Time Scoring Problems
In time scoring, there are three major components which cause the problems. The first deals with the weighting of responses to items, the second is associated with scoring procedures and their difficulties and the last is response prejudice.
Scoring Weights
As every test is composed of items and its validity may be improved by optimal weighting of its parts, the question of differential weighting of items and of responses naturally arises (Figure 8.3). Item analysis will reveal that items are not equally correlated with the criterion and that they have unequal variances and correlations with other items. Gulliksen (1950) has paid much attention to the weighting problem and concluded that the effectiveness of weights in changing the essential character of the common factor variances in scores depends upon several things. It depends first of all upon the range of weights assigned to the components. It must be emphasised that it depends upon the range of weights relative to their mean. If this ratio is greater, then the possibility is that one set of weights will give a composite score that does not correlate highly enough with that from another set of weights. If weights correlate perfectly, then composite scores will correlate perfectly. It should be noted that the effective weight of an item is determined by its variance and its covariance with other items.

Figure 8.3 A Graphic Determination of Scoring Weights for Responses to Items (Pu = Proportion Responding in the Prescribed Manner in the Upper Criterion Group, and PL = Proportion Responding in the Prescribed Manner in the Lower Criterion Group)
Source: Guilford (1954: 446).
Scoring Procedure and Their Difficulties
Usually, we give marks for a right response (R) and deduct marks for a wrong response (W), but the difficulty arises in a multiple choice test where there are four possible outcomes to a particular question. There are six formulas designed to score a test in different situations, as given in the earlier part on the problems of guessing.
Response Prejudice/Bias
When the response to a test item gets altered in such a way that it indicates something other than that which we intended it to measure, it is termed ‘Response Bias’. These biases are usually determined by the mental sets of the testee. L.J. Cronbach (1941) has done a systematic study of response sets and their effects. According to him, there are six types of response sets:
1. The set of gamble,
2. Semantics,
3. Impulsion,
4. Acquiescence,
5. Speed versus accuracy and
6. Falsification.
The following four response sets are worth mentioning at this point. For other response sets, refer to Gulliksen (1950) and Guilford (1954).
Semantics
When the response categories are ‘agree’, ‘dislike’, ‘strongly disagree’ and ‘sometimes’, there is space for subjective interpretations. What I call ‘often’, another person may call ‘sometimes’. This individual interpretation of the response category leads to constant errors, biasing the scores. Little has been done to establish how much difference in quantitative meaning contributes to true variance. The same difficulty is encountered in the method of constant stimuli when doubtful judgements are used, and also in the case of rating scales where verbal cues are used to guide the rater. A method that stabilises these subjective interpretations of words will go a long way to improve psychological measurements of different kinds.
Acquiescence
This type of error is associated with questions in which there are two alternatives (that is, either yes or no). The set of questions is dominated either by affirmative or by negative items, and the score will depend upon the individual’s style of marking. It is found that many individuals give an excess number of ‘true’ responses in a true–false test. For example, a student who guesses may score relatively high on the ‘true’ items but very poorly on the ‘false’ items; a score based only on the false items may then correlate extremely high with that on the total test. This may represent the actual situation with respect to this person’s status, but it may also represent distortions due to acquiescence.
Impulsion
This set operates when the testee has a choice whether to respond or not to respond, to mark or not to mark. It is present in multiple choice tests in which, given a model object, the testee is to mark all those objects in a list that resemble it or are identical with it. The more marks the testee makes, the higher the probability of marking the right responses. The same bias occurs in using a checklist, where each item is to be marked or not marked. It also occurs in essay examinations where the amount written weighs heavily in the mark given.
Falsification
This means faking responses, that is, choosing the responses that appear good and sophisticated. This set is motivated by the desire to make a good score or a good appearance, and to cover up defects and deficiencies. It operates when the testee is clear about what is being tested, for example, in interest inventories and temperament inventories. Faking of responses can increase or decrease the overall score. If the testee's ideas as to which response is more desirable correlate with a trait key, then he can increase his score. He might also be mistaken about the way in which items are actually keyed and might make a poorer score rather than a better one; yet, the score he makes does not describe him as it should and is, therefore, biased.
According to Cronbach, there are four principles of response sets:
1. Sets are consistent and persistent,
2. Response sets make scores more ambiguous,
3. Response sets operate most in ambiguous and unstructured situations and
4. Difficult tests open the way to response sets.
Scoring of Rank Order Items
There are certain instances in testing where the testee is required to rank a number of given alternatives on some basis. In testing a person's appreciation of a given viewpoint, for example, it is possible to present three to five arguments and require the student to mark '1' for the most liked, '2' for the next most liked, and so on, to '5' for the least liked. In another case, for testing very fine discrimination in any field, each question can have three to five answers, and the student is required to grade these answers according to his own judgement by ranking them in order from best to worst.
Scoring of such rank order items is a bit tricky. One way is to prepare a key containing the correct order and to give one point for each agreement between the key and the student's ranking. For example, if the correct order is 'a, b, c, d', the answer 'b, a, d, c' shows zero agreement; so does the answer 'd, c, a, b'; yet, the first response is better than the second. A variant of this method gives credit only when every item is correctly placed. In either case, all the errors of inversion (2, 4, 6) are treated at par.
A second way of scoring considers the number of errors made (that is, how many disagreements there are). This method uses the differences between the rank order given by the subject and the one given by the key. It requires an elaborate scoring procedure: squaring the differences, summing them and then computing the rank correlation by the following formula:
$$R = 1 - \frac{6\sum d^2}{n^3 - n}$$
Where,
R = Spearman's rank coefficient of correlation
d = difference in ranks
n = number of observations
One major drawback is that the computation of a correlation coefficient for each such item on each paper leads to considerable labour and, probably, error. Since both of the above methods have their disadvantages, a simple and good method of scoring is to use the sum of absolute differences. If the examination is scored in terms of the number of errors made, this sum of absolute differences can be added directly to the error score. But if the scoring is in terms of the number of correct responses, this sum can be subtracted from a constant, so that zero disagreement gives the highest score and great disagreement the lowest score. The following formula is used:
$$\text{Score} = C - \sum |d|$$
Where Σ|d| is the sum of the differences ignoring sign and C is a constant larger than the greatest Σ|d| we are likely to find (a negative score may be counted as zero).
For those cases where just three arguments are given, a simpler method is available. The testee is to mark '+' for the best of the three, '0' for the poorest and to leave the middle one blank. By comparing the responses with the key, the person gets two points for perfect agreement with the key, one point if either the best or the poorest alternative has been confused with the middle one and zero for more serious confusions. In order to secure more differentiated scores, it is possible to assign two points for agreement with the key on '+' and two points for agreement on '0'; one point for leaving blank an alternative keyed '+' or '0'; and no credit for marking with the wrong symbol. This scoring system gives four points for perfect agreement, three points if there is a confusion of either the best or the worst with the middle one, zero for complete reversal of the correct order and one point for an order that is only one inversion away from complete reversal. A score of two points is not possible. Such a scoring plan makes possible the rapid scoring of rank order items and has given scores that correlate highly with the total score in many instances.
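The scoring rules for rank order items can be tried out with a short Python sketch. It is illustrative only: the function names, the use of strings such as 'abcd' to hold a ranking, the option labels 'x', 'y', 'z' and the choice of C = 8 are assumptions made for this example and do not come from the text.

```python
from itertools import permutations

def agreement_points(key, answer):
    """One point for every position where the answer agrees with the key."""
    return sum(k == a for k, a in zip(key, answer))

def rank_differences(key, answer):
    """d for each alternative: rank given by the testee minus rank in the key."""
    return [answer.index(option) - key.index(option) for option in key]

def spearman_r(key, answer):
    """R = 1 - 6 * sum(d^2) / (n^3 - n)."""
    n = len(key)
    d = rank_differences(key, answer)
    return 1 - 6 * sum(x * x for x in d) / (n ** 3 - n)

def absolute_difference_score(key, answer, c):
    """Score = C - sum(|d|); a negative score is counted as zero."""
    return max(c - sum(abs(x) for x in rank_differences(key, answer)), 0)

def three_alternative_score(key, answer):
    """Four-point scheme; key and answer map each option to '+', '0' or '' (blank)."""
    points = 0
    for option, keyed in key.items():
        if keyed == "":
            continue          # the middle alternative earns no credit by itself
        if answer[option] == keyed:
            points += 2       # agreement with the key on '+' or on '0'
        elif answer[option] == "":
            points += 1       # keyed '+' or '0' but left blank
    return points

key = "abcd"
for answer in ("badc", "dcab"):
    print(answer,
          agreement_points(key, answer),
          round(spearman_r(key, answer), 2),
          absolute_difference_score(key, answer, c=8))

# Every possible way of marking three alternatives against the key.
three_key = {"x": "+", "y": "", "z": "0"}
possible = sorted({
    three_alternative_score(three_key, dict(zip("xyz", marks)))
    for marks in permutations(["+", "", "0"])
})
print(possible)
```

Both answers earn zero agreement points, yet the rank correlation and the absolute-difference score separate them, and the enumeration shows that a score of two cannot occur under the four-point scheme.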
IMPORTANCE OF SCORING IN PSYCHOLOGICAL TESTING
Psychological tests are not only important achievements in the psychometric sense but are also among the most prominent achievements of modern psychology. However, measurement and testing of psychological attributes have always been challenging for psychometricians because of some problems inherent in psychological measurement. (For a detailed discussion on this issue, readers may refer to Chapters 1 and 2.) The measurement of psychological attributes has to be based on some conceptual foundation regarding the quantifiability of the psychological attribute. One such assumption is inherent in the Positive–Objectivist idea that 'everything that exists, exists in some amount . . . and everything that exists in some amount can be measured'. Psychological tests are the tools with which we can describe psychological attributes in terms of scores and categories. So, we can say that without a proper scoring method, the measurement of psychological attributes is not possible. Consider a psychological test measuring Emotional Quotient (Chadha 2006), in which responses are scored in the following manner:
3. At the workplace, due to some misunderstanding, your colleagues stop talking to you. You are convinced that there was no fault of yours. How will you react?
a. Wait till they come and start talking to you again.
b. Take the initiative, go forward and start talking to them.
c. Let things take their own time to improve.
d. Ask someone to mediate.
18. While having an argument with someone, if you lose, you:
a. Feel totally beaten.
b. Wait for the next opportunity to beat your opponents.
c. Winning and losing are part of the game.
d. Analyse the reasons for the loss.
These items have been scored in the following manner in Emotional Quotient Test (Chadha 2006):
3. a. 15, b. 20, c. 5, d. 10
18. a. 5, b. 10, c. 15, d. 20
Now, to answer the question as to why only these particular numbers have been assigned to the four options (that is, ‘a’, ‘b’, ‘c’ and ‘d’), we have to focus on what essentially the test intends to measure and to what extent these responses capture this intention. In Items 3 and 18, the responses ‘b’ and ‘d’, respectively, capture best the concept of Emotional Quotient as operationally defined by the test. Hence, they have been assigned maximum scores and other options have been assigned numbers in order of their relevance. The interval of magnitude has been taken as five (‘20’ – ‘15’ – ‘10’ – ‘5’).
9
Reliability
CHAPTER OUTLINE
1. What is reliability?
2. Methods of calculating reliability:
(i) Test–retest
(ii) Parallel form
(iii) Split-half
(iv) Method of rational equivalence
(v) Cronbach Alpha
3. Factors affecting reliability:
(i) Variability of age
(ii) Variability of scores
(iii) Time interval between testing
(iv) Effect of practice and learning
(v) Consistency in scores
(vi) Effect of test length
4. Types of reliability used in some psychological tests
5. Importance of reliability in psychological testing
LEARNING OBJECTIVES
At the end of this chapter, you will be able to understand:
1. What is reliability and what is its significance?
2. What are the various methods of calculating reliability?
3. What are the various factors that can affect reliability?
4. How does variability in age and scores affect reliability?
5. What is the effect of the time interval between testing, and of practice or learning, on the reliability of a psychological test?
6. What will happen to reliability if a certain number of items are added to or removed from a test?
WHAT IS RELIABILITY?
Reliability of a test is a criterion of test quality relating to the accuracy of psychological measurements. The higher the reliability of a test, the freer it is of measurement errors. Some regard it as the stability of results in repeated testing: when the same individual or object is tested in the same way, the test yields the same value from moment to moment, provided that the thing measured has not itself changed in the meantime. The concept of reliability underlies the computation of the error of measurement of a single score, whereby we can predict the range of fluctuation likely to occur in a single individual's score as a result of irrelevant chance factors. Test reliability in its broadest sense indicates the extent to which individual differences in test scores are attributable to true differences in the characteristics under consideration and the extent to which they are attributable to chance errors. In technical terms, measures of test reliability make it possible to estimate what proportion of the total variance of test scores is error variance. The more the error, the lower the reliability. Practically speaking, this means that if we can estimate the error variance in any measure, then we can also estimate its reliability. This brings us to two equivalent definitions of reliability:
1. Reliability is the proportion of the 'true' variance to the total obtained variance of the data yielded by the measuring instrument and
2. It is the proportion of error variance to the total obtained variance of the data yielded by the measuring instrument, subtracted from 1.00.
An index of 1.00 indicates perfect reliability.
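The equivalence of the two definitions can be seen in a small Python sketch. The variance figures (80 and 20) and the function names are invented for illustration, and the sketch assumes, as is usual in classical test theory, that the true variance and the error variance add up to the total obtained variance.

```python
def reliability_from_true(true_variance, total_variance):
    """Definition 1: proportion of 'true' variance in the total obtained variance."""
    return true_variance / total_variance

def reliability_from_error(error_variance, total_variance):
    """Definition 2: 1.00 minus the proportion of error variance."""
    return 1.0 - error_variance / total_variance

# Hypothetical figures: true variance 80, error variance 20.
true_variance, error_variance = 80.0, 20.0
total_variance = true_variance + error_variance

r1 = reliability_from_true(true_variance, total_variance)
r2 = reliability_from_error(error_variance, total_variance)
print(round(r1, 2), round(r2, 2))
```

Under these assumed figures both definitions give the same index, and the index reaches 1.00 only when the error variance is zero.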
METHODS OF CALCULATING RELIABILITY
There are five methods of measuring the reliability of a test. These are: (a) test–retest method, (b) method of parallel form, (c) split-half reliability, (d) method of rational equivalence and (e) Cronbach Alpha. These five methods are discussed in detail here.
Test–Retest Method
The most frequently used method of finding the reliability of a test is to repeat the same test on a second occasion. The reliability coefficient (r) in this case is the correlation between the scores obtained by the same persons on the two administrations of the test. The error variance corresponds to the random fluctuations of performance from one test session to the other. The problem with this method is the controversy about the interval between the two administrations. If the interval between the tests is long (say, six months) and the subjects are young children, growth changes will affect the test scores. In general, growth increases the initial scores by various amounts and tends to lower the reliability coefficient. Owing to the difficulty of controlling the factors which influence scores on retest, the retest method is generally less useful than the other methods. The formula used to find the test–retest reliability is the Pearson product–moment formula:
$$\text{Reliability} = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}}$$
Where,
N = total number of observations,
X = scores in the first data set,
Y = scores in the second data set.
Take, for example, a test of numerical ability with 60 items that was administered twice to a group of 20 students with a gap of 15 days. The two sets of scores obtained were the test scores (X) and the retest scores (Y), as given in Table 9.1.

Table 9.1 Test–Retest Reliability

| S. No. | Test Score X | Retest Score Y | X² | Y² | XY |
|---|---|---|---|---|---|
| 1. | 54 | 55 | 2916 | 3025 | 2970 |
| 2. | 50 | 50 | 2500 | 2500 | 2500 |
| 3. | 53 | 50 | 2809 | 2500 | 2650 |
| 4. | 57 | 53 | 3249 | 2809 | 3021 |
| 5. | 55 | 53 | 3025 | 3136 | 3080 |
| 6. | 52 | 54 | 2704 | 2916 | 2808 |
| 7. | 52 | 54 | 2704 | 2916 | 2808 |
| 8. | 48 | 53 | 2304 | 2809 | 2958 |
| 9. | 52 | 55 | 2704 | 3025 | 2544 |
| 10. | 49 | 56 | 2401 | 3136 | 2860 |
| 11. | 52 | 58 | 2709 | 3364 | 2744 |
| 12. | 47 | 54 | 2209 | 2916 | 3016 |
| 13. | 53 | 48 | 2809 | 2304 | 2538 |
| 14. | 51 | 54 | 2601 | 2916 | 2544 |
| 15. | 56 | 57 | 3136 | 3249 | 2754 |
| 16. | 50 | 51 | 2500 | 2601 | 3192 |
| 17. | 47 | 53 | 2209 | 2809 | 2550 |
| 18. | 44 | 48 | 1936 | 2304 | 2491 |
| 19. | 52 | 53 | 2704 | 2809 | 2112 |
| 20. | 53 | 52 | 2809 | 2704 | 2756 |
| Total | 1026 | 1068 | 52830 | 57196 | 54844 |

Source: Author.
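$$
\begin{aligned}
r &= \frac{N\sum XY - \sum X \sum Y}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}} \\
&= \frac{20 \times 54844 - 1026 \times 1068}{\sqrt{[20 \times 52830 - (1026)^2][20 \times 57196 - (1068)^2]}} \\
&= \frac{1096880 - 1095768}{\sqrt{(1056600 - 1052676)(1143920 - 1140624)}} \\
&= \frac{1112}{\sqrt{3924 \times 3296}} = \frac{1112}{\sqrt{12933504}} \\
&= \frac{1112}{3596.32} \\
&= 0.31
\end{aligned}
$$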
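The arithmetic of the test–retest example can be reproduced with a minimal Python sketch. It works from the column totals used in the calculation (N = 20, ΣX = 1026, ΣY = 1068, ΣX² = 52830, ΣY² = 57196, ΣXY = 54844); the function name and its parameter names are illustrative assumptions.

```python
from math import sqrt

def pearson_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Pearson product-moment correlation computed from raw-score sums."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Totals used in the worked test-retest example.
r = pearson_from_sums(n=20, sum_x=1026, sum_y=1068,
                      sum_x2=52830, sum_y2=57196, sum_xy=54844)
print(round(r, 2))  # 0.31
```

Working from the sums mirrors the hand calculation step by step, which makes it easy to check each intermediate figure against the worked example.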
Method of Parallel Form
To overcome the difficulties of practice and time interval in the test–retest method, the method of parallel or alternate forms is used. Using equivalent or parallel forms has some advantages, like lessening the possible effect of practice and recall. But this method presents the additional problem of constructing and standardising the second form. According to Freeman, both forms should meet all of the test specifications as follows:
1. The number of items should be the same,
2. The kinds of items in both should be uniform with respect to content, operations or traits involved, levels and range of difficulty, and adequacy of sampling,
3. The items should be uniformly distributed as to difficulty,
4. Both test forms should have the same degree of item homogeneity in the operations or traits being measured. The degree of homogeneity may be shown by intercorrelations of items with subtest scores, or with total-test scores,
5. The means and the standard deviations of both the forms should correspond closely and
6. The mechanics of administering and scoring should be uniform.
Freeman (1962) states that the above are the ideal criteria of equivalent forms, but complete uniformity in all respects cannot be expected. However, it is necessary that uniformity be closely approximated. The parallel forms are administered to the same group of individuals and the correlation coefficient is calculated between one form and the other. For instance, the 1937 Stanford-Binet Scale has Form L and Form M (Terman and Merrill 1973). The content of the two forms was derived from one and the same process of standardisation. A correlation of 0.91 is obtained between these two forms for a chronological age of seven years. The formula and the method used to find the reliability with the parallel or alternate form technique are the same as those used in the test–retest method.
Split-half Reliability
The advantage that this method has over the test–retest method is that only one testing is needed. This technique is also better than the parallel form method because only one test is required. In this method, the test is administered once and scored as two halves, so that variation brought about by differences between two testing situations is eliminated. A marked disadvantage of the split-half technique lies in the fact that chance errors may affect scores on the two halves of the test in the same way, thus tending to make the reliability coefficient too high. This follows because the test is administered only once. The longer the test, the less is the probability that the effects of temporary and variable disturbances will be cumulative in one direction and the more accurate the estimate of score reliability.
The two halves of the test can be made by counting the number of odd-numbered items answered correctly as one half and the number of even-numbered items answered correctly as the other half. In other words, odd items and even items are scored separately and are considered as two separate halves. There are other methods also of splitting the test items into two halves: for example, Items '1' and '2' go to the first score, Items '3' and '4' to the second score, Items '5' and '6' to the first score, Items '7' and '8' to the second score, and so on. Another method is to consider the first 50 per cent of the items as one half and the second 50 per cent as the other half. Whenever the difficulty level of the test items is not the same, we apply the odd–even method, and if the difficulty level is the same, we apply the first half and second half method to divide the test into two halves. Once the two half scores have been obtained for each individual, these halves are correlated with the help of the Pearson product–moment formula, namely,
$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}}$$
A reliability coefficient of this type is called a coefficient of internal consistency. The reliability depends upon the test length. When we score the test as two halves, we, in fact, cut the length of the original test by half. Therefore, the reliability which we have calculated is that of a test half the size of our original test. Thus, we make a correction for test length to get the reliability of the total original test. For this, the Spearman–Brown formula for doubling the length of the test is used. The formula is,
$$\text{Reliability} = \frac{2r}{1 + r}$$
Where r is the correlation coefficient obtained by correlating the scores of one half with those of the other half.
For instance, a test of abstract reasoning was administered to a group of 20 students. The test is divided into two halves with the help of the odd–even method. The split-half reliability is computed as follows (Table 9.2):

Table 9.2 Calculation of Split-half Reliability of the Two Halves of a Test

| S. No. | (Even) X | (Odd) Y | X² | Y² | XY |
|---|---|---|---|---|---|
| 1. | 26 | 28 | 676 | 784 | 728 |
| 2. | 22 | 28 | 484 | 784 | 616 |
| 3. | 27 | 26 | 729 | 676 | 702 |
| 4. | 29 | 28 | 841 | 784 | 812 |
| 5. | 28 | 27 | 784 | 729 | 756 |
| 6. | 23 | 29 | 529 | 841 | 667 |
| 7. | 26 | 25 | 676 | 625 | 650 |
| 8. | 25 | 23 | 625 | 529 | 575 |
| 9. | 24 | 28 | 576 | 784 | 672 |
| 10. | 26 | 23 | 676 | 529 | 598 |
| 11. | 25 | 27 | 625 | 729 | 675 |
| 12. | 23 | 24 | 529 | 576 | 552 |
| 13. | 25 | 28 | 625 | 784 | 700 |
| 14. | 27 | 24 | 729 | 576 | 648 |
| 15. | 29 | 27 | 841 | 729 | 783 |
| 16. | 24 | 26 | 576 | 676 | 624 |
| 17. | 22 | 25 | 484 | 625 | 550 |
| 18. | 21 | 23 | 441 | 529 | 483 |
| 19. | 28 | 24 | 784 | 576 | 672 |
| 20. | 27 | 26 | 729 | 676 | 702 |
| Total | 507 | 519 | 12959 | 13541 | 13165 |

Source: Author.
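$$
\begin{aligned}
r &= \frac{N\sum XY - \sum X \sum Y}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}} \\
&= \frac{20 \times 13165 - 507 \times 519}{\sqrt{[20 \times 12959 - (507)^2][20 \times 13541 - (519)^2]}} \\
&= \frac{263300 - 263133}{\sqrt{(259180 - 257049)(270820 - 269361)}} \\
&= \frac{167}{\sqrt{2131 \times 1459}} = \frac{167}{\sqrt{3109129}} \\
&= \frac{167}{1763.27} \\
&= 0.0947 = 0.095
\end{aligned}
$$
Spearman–Brown formula:
$$
\begin{aligned}
\text{Reliability } (r_1) &= \frac{2r}{1 + r} \\
&= \frac{2 \times 0.095}{1 + 0.095} \\
&= \frac{0.19}{1.095} = 0.1735 \\
&= 0.174
\end{aligned}
$$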
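The split-half calculation can also be sketched in Python from the even-half (X) and odd-half (Y) scores of Table 9.2. The variable and function names are illustrative assumptions; the squares and cross-products are recomputed from the raw scores rather than read from the table.

```python
from math import sqrt

# Even-half (X) and odd-half (Y) scores of the 20 students in Table 9.2.
even_half = [26, 22, 27, 29, 28, 23, 26, 25, 24, 26,
             25, 23, 25, 27, 29, 24, 22, 21, 28, 27]
odd_half = [28, 28, 26, 28, 27, 29, 25, 23, 28, 23,
            27, 24, 28, 24, 27, 26, 25, 23, 24, 26]

def pearson_r(x, y):
    """Pearson product-moment correlation from two lists of scores."""
    n = len(x)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum(x) * sum(y)
    denominator = sqrt((n * sum(a * a for a in x) - sum(x) ** 2) *
                       (n * sum(b * b for b in y) - sum(y) ** 2))
    return numerator / denominator

def spearman_brown_double(r_half):
    """Step the half-test correlation up to the full test length."""
    return 2 * r_half / (1 + r_half)

r_half = pearson_r(even_half, odd_half)
r_full = spearman_brown_double(r_half)
print(round(r_half, 3), round(r_full, 3))
```

Carrying the unrounded half-test correlation into the Spearman–Brown step gives about 0.173; the hand calculation, which rounds r to 0.095 first, arrives at 0.174.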
Method of Rational Equivalence
The coefficient of internal consistency can also be obtained with the help of the Kuder–Richardson formula number 20. One of the techniques of item analysis is the item difficulty index. Item difficulty (ID) is the proportion or percentage of those answering an item correctly; the symbol 'p' is used to represent the difficulty index. Suppose an Item 'X' has p = 0.74. This means Item 'X' was answered correctly by 74 per cent of those who answered the item. To compute reliability with the help of the Kuder–Richardson formula number 20, the following procedure is used. The first column of a worksheet shows the item number. The second column gives the difficulty value (p) of each item obtained during item analysis. The third column gives q, where q = 1 – p. The fourth column gives (p)(q), the product of columns two and three. The Kuder–Richardson formula number 20 is,
$$\text{Reliability} = \frac{N}{N - 1}\left[1 - \frac{\sum pq}{\sigma_t^2}\right]$$
Where N is the number of items in the test, σt² is the variance of the test scores and σt is their standard deviation. For instance, examine the following data (Table 9.3):
Table 9.3 Calculation of Reliability by Method of Rational Equivalence

| Item (1) | p (2) | q (3) | pq (4) |
|---|---|---|---|
| 1 | 0.60 | 0.40 | 0.2400 |
| 2 | 0.40 | 0.60 | 0.2400 |
| 3 | 0.30 | 0.70 | 0.2100 |
| 4 | 0.72 | 0.28 | 0.2016 |
| 5 | 0.68 | 0.32 | 0.2176 |
| 6 | 0.46 | 0.54 | 0.2484 |
| 7 | 0.54 | 0.46 | 0.2484 |
| 8 | 0.82 | 0.18 | 0.1476 |
| 9 | 0.90 | 0.10 | 0.0900 |
| 10 | 0.63 | 0.37 | 0.2331 |
| | | | Σpq = 2.0767 |

Source: Author.
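Suppose the test has 10 items and the variance of the test is 6.4. Then,
$$
\begin{aligned}
\text{Reliability} &= \frac{N}{N - 1}\left[1 - \frac{\sum pq}{\sigma_t^2}\right] \\
&= \frac{10}{9}\left[1 - \frac{2.0767}{6.4}\right] \\
&= [1.111][1 - 0.3245] \\
&= [1.111][0.6755] \\
&= 0.751
\end{aligned}
$$
The above formula can also be written as,
$$\text{Reliability} = \left[\frac{N}{N - 1}\right]\left[\frac{\sigma_t^2 - \sum pq}{\sigma_t^2}\right]$$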
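As a check on the worksheet procedure, the following Python sketch recomputes the Kuder–Richardson formula number 20 from the difficulty values of Table 9.3 and the stated test variance of 6.4. The function and variable names are illustrative assumptions.

```python
# Difficulty values (p) of the ten items in Table 9.3.
p_values = [0.60, 0.40, 0.30, 0.72, 0.68, 0.46, 0.54, 0.82, 0.90, 0.63]

def kr20(p_values, test_variance):
    """Kuder-Richardson formula 20 from item difficulties and test variance."""
    n_items = len(p_values)
    sum_pq = sum(p * (1 - p) for p in p_values)   # column (4) of the worksheet
    return (n_items / (n_items - 1)) * (1 - sum_pq / test_variance)

reliability = kr20(p_values, test_variance=6.4)
print(round(reliability, 3))  # 0.751
```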
An approximation to the above formula is useful for teachers and practitioners who want to find quickly the reliability of a short objective test used for classroom examinations. This formula can be written as,
$$\text{Reliability} = \frac{N\sigma_t^2 - M(N - M)}{\sigma_t^2 (N - 1)}$$
where
N is the number of items in the test,
σt² is the variance of the test scores and
M is the mean of the test scores.
This is a simpler method than the Kuder–Richardson formula number 20 to find the reliability. For instance, one examiner gave 80 objective type questions to a group of students. For a right answer, a score of 1 was given, and for a wrong answer, a score of 0 was given. Suppose the mean score on the objective test is 60.5 and the standard deviation of the test scores is 7.00. The reliability of the test would be given by,
$$
\begin{aligned}
\text{Reliability} &= \frac{N\sigma_t^2 - M(N - M)}{\sigma_t^2 (N - 1)} \\
&= \frac{80(7.0)^2 - (60.5)(80 - 60.5)}{(7.00)^2 (80 - 1)} \\
&= \frac{3920 - 1179.75}{3871} = \frac{2740.25}{3871} \\
&= 0.708
\end{aligned}
$$
The above formula rests on one basic assumption: that the difficulty level is the same for all the test items. In other words, the same proportion of individuals (but not necessarily the same individuals) answer each item correctly. It has been observed that the formula holds true even when the assumption of equal ID is not satisfied. The method of rational equivalence cannot yield results strictly comparable with those obtained by other methods of finding reliability. However, the actual difference between the results of the method of rational equivalence and the split-half method is never large and is often negligible. Two forms of a test are equivalent when the corresponding items like a1, a2, b1, b2, and so on, are interchangeable and when the item–item correlations are the same for both the forms.
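The shortcut formula needs only the number of items, the mean and the standard deviation, so it is easily expressed in code. The Python sketch below reruns the 80-item classroom example; the function name and argument names are assumptions made for illustration.

```python
def kr_approximation(n_items, mean, sd):
    """Approximate reliability from the number of items, mean and standard deviation.

    Assumes, as the formula does, that all items are of equal difficulty.
    """
    variance = sd ** 2
    return (n_items * variance - mean * (n_items - mean)) / (variance * (n_items - 1))

reliability = kr_approximation(n_items=80, mean=60.5, sd=7.0)
print(round(reliability, 3))  # 0.708
```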
Cronbach Alpha
The Kuder–Richardson formula is applicable for finding the internal consistency of tests whose items are scored as right or wrong, or according to some other all-or-none system. Some tests, however, have items with more than two scored response categories, as on a personality inventory. For such tests, a generalised formula has been derived, known as coefficient alpha (Cronbach 1951). In this formula, the value of Σpq is replaced by Σσi², which is the sum of the variances of the item scores. The procedure is to find the variance of all individuals' scores for each item and then to add these variances across all items. The formula is,
$$r_{tt} = \left(\frac{n}{n - 1}\right)\left(1 - \frac{\sum \sigma_i^2}{\sigma_t^2}\right)$$
where
n = number of items in the test
σt² = variance of the test scores
σi² = variance of the ith item
Σσi² = sum of the item variances
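For example, take a test having seven items, each with the response categories 'agree', 'undecided' and 'disagree'. 'Agree' is scored as 2, 'undecided' as 1 and 'disagree' as 0. The following are the responses of ten subjects (Table 9.4).

Table 9.4 Calculation of Reliability by Using Cronbach Alpha

| Test Items | Subject 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Item Total | Item Variance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| I | 2 | 2 | 0 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 16 | 0.48 |
| II | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 17 | 0.23 |
| III | 0 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 16 | 0.49 |
| IV | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 1 | 15 | 0.28 |
| V | 0 | 2 | 2 | 0 | 2 | 2 | 1 | 2 | 2 | 2 | 15 | 0.72 |
| VI | 2 | 0 | 1 | 2 | 2 | 2 | 2 | 1 | 2 | 2 | 16 | 0.49 |
| VII | 2 | 2 | 0 | 0 | 1 | 0 | 1 | 2 | 2 | 2 | 12 | 0.84 |
| Total Test Score (X) | 8 | 9 | 8 | 8 | 12 | 12 | 12 | 13 | 14 | 11 | 107 | 3.53 |

Source: Author.

The total test score (X) of each subject is the sum of his or her seven item scores: 8, 9, 8, 8, 12, 12, 12, 13, 14 and 11, so that ΣX = 107. With N = 10 subjects,
$$\bar{X} = \frac{\sum X}{N} = \frac{107}{10} = 10.7$$
$$\sigma_t^2 = \frac{\sum (X - \bar{X})^2}{N - 1} = \frac{46.1}{9} = 5.12, \qquad \sigma_t = 2.26$$
$$\sum \sigma_i^2 = 3.53$$
$$\text{Cronbach Alpha} = \left(\frac{7}{7 - 1}\right)\left(1 - \frac{3.53}{5.12}\right) = 0.36$$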
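Coefficient alpha for the responses in Table 9.4 can be recomputed with a short Python sketch. It is a sketch under stated assumptions: each inner list holds one subject's scores on items I to VII, σt² is taken as the variance of the ten subjects' total scores (as the definition of the formula requires), and all variances use the N − 1 divisor, which is what `statistics.variance` computes and what the item variances in the table appear to use. The names `responses` and `cronbach_alpha` are illustrative.

```python
from statistics import variance

# One row per subject (1 to 10); columns are items I to VII of Table 9.4.
responses = [
    [2, 1, 0, 1, 0, 2, 2],
    [2, 1, 1, 1, 2, 0, 2],
    [0, 2, 2, 1, 2, 1, 0],
    [1, 2, 2, 1, 0, 2, 0],
    [1, 2, 2, 2, 2, 2, 1],
    [2, 2, 2, 2, 2, 2, 0],
    [2, 2, 2, 2, 1, 2, 1],
    [2, 2, 2, 2, 2, 1, 2],
    [2, 2, 2, 2, 2, 2, 2],
    [2, 1, 1, 1, 2, 2, 2],
]

def cronbach_alpha(rows):
    """Coefficient alpha from a subjects-by-items matrix of item scores."""
    n_items = len(rows[0])
    item_variances = [variance(item) for item in zip(*rows)]   # one per item
    total_variance = variance([sum(row) for row in rows])      # subjects' totals
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)

alpha = cronbach_alpha(responses)
print(round(alpha, 2))  # 0.36
```

Keeping the data as one row per subject makes both kinds of variance easy to obtain: transposing the matrix gives the item scores, and summing each row gives the total test scores.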
FACTORS AFFECTING RELIABILITY
The main factors affecting reliability are as follows:
1. Variability of age,
2. Variability of scores,
3. Time interval between testing,
4. Effect of practice and learning,
5. Consistency in scores and
6. Effect of test length.
Variability of Ages
The age variability of the group affects the reliability coefficient. The reliability will be higher for a group having a wider range, and lower for a group having small variation in the trait or ability assessed. This is illustrated in Figure 9.1.

Figure 9.1 Increase in Test Reliability with Increase in Variability of a Group
Source: Author.
For instance, if we have a group completely homogeneous with respect to chronological age (CA), the range of test scores will be from extremely low to extremely high. As there is no deviation in age, the correlation of CA with test scores would be zero. Further, if there is a small variation in a group with respect to CA, then the correlation of CA with test scores would be lower, and if there is a large variation in CA, then the correlation coefficient will be larger. Therefore, while interpreting the reliability coefficient of a test, it is necessary to know the range for which the test is standardised.
Variability of Scores
As discussed above with regard to variability of age, reliability is affected in the same way by variability in the measured scores. When the variation among the testees is small, the correlation between two sets of scores may be lowered by chance and by minor psychological factors. Because the testees in such a group are closely clustered, the changes in scores and relative position produced by extraneous factors are more significant than they would be in a widely divergent group.
Time Interval between Testing
When there is a time interval between test and retest, the retest results will be affected by differences in individual performance and also by changes in environmental conditions. If the time interval is quite long, especially in the case of young children, an individual's retest results may be influenced by his growth tempo or by enduring conditions like emotional experiences.
Effects of Practice and Learning
Practice on the test will help in learning and this, in turn, can affect the reliability of a test. For example, therapy or counselling may modify an individual's attitudes, values and behaviour sufficiently to produce significant differences in test–retest results in the case of personality tests.
Consistency in Scores
Lack of agreement among the scorers will drastically affect the reliability coefficient. This is generally true of tests for which entirely objective scoring is not available. For such tests, it is advisable to know the extent of agreement in scoring among competent psychologists who have scored the same set of responses.
Effect of Test Length
The reliability of a test depends directly on the test length, that is, the number of items in the test. Suppose a test of 40 items has a reliability of 0.60. Now, we increase the number of items to 120 by adding 80 more homogeneous items to the previous 40 items. The reliability of the new test of 120 items will be higher. Similarly, shortening the test decreases its reliability. How much the reliability of the test changes after increasing or decreasing the number of items is given by the Spearman–Brown formula:
$$r_n = \frac{Kr}{1 + (K - 1)r}$$
where K is the number of times by which the test is longer or shorter than the original test. Mathematically, K is the ratio of the number of items in the lengthened/shortened test to the number of items in the original test:
$$K = \frac{\text{Number of items in the lengthened/shortened test}}{\text{Number of items in the original test}}$$
and r is the reliability of the test which is being lengthened or shortened. In the above example,
$$K = \frac{120}{40} = 3, \qquad r = 0.60$$
Therefore, the new reliability would be given by,
$$
\begin{aligned}
\text{Reliability} &= \frac{Kr}{1 + (K - 1)r} \\
&= \frac{(3)(0.60)}{1 + (3 - 1)(0.60)} \\
&= \frac{1.80}{1 + 1.20} = \frac{1.80}{2.20} = 0.82
\end{aligned}
$$
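What we have seen is that increasing the length of the test from 40 items to 120 items increases the reliability. The same formula applies when a test is shortened. Suppose a test of 100 items with a reliability of 0.50 is reduced to 45 items. Here,
$$K = \frac{45}{100} = 0.45, \qquad r = 0.50$$
$$
\begin{aligned}
\text{Reliability} &= \frac{Kr}{1 + (K - 1)r} \\
&= \frac{(0.45)(0.50)}{1 + (0.45 - 1)(0.50)} \\
&= \frac{0.225}{1 - 0.275} = \frac{0.225}{0.725} \\
&= 0.31
\end{aligned}
$$
Thus, reducing the test length from 100 items to 45 items reduces the reliability of the test from 0.50 to 0.31.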
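Both worked examples follow from one small function. The Python sketch below is illustrative only; the function name and its argument names are assumptions, and K is computed inside the function as the ratio of new to original test length.

```python
def spearman_brown(reliability, original_items, new_items):
    """Predicted reliability after lengthening or shortening a test."""
    k = new_items / original_items          # K > 1 lengthens, K < 1 shortens
    return k * reliability / (1 + (k - 1) * reliability)

lengthened = spearman_brown(0.60, original_items=40, new_items=120)
shortened = spearman_brown(0.50, original_items=100, new_items=45)
print(round(lengthened, 2))  # 0.82
print(round(shortened, 2))   # 0.31
```

Setting the new length equal to twice the original reduces the formula to 2r/(1 + r), the form used in the split-half method.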
TYPES OF RELIABILITY USED IN SOME PSYCHOLOGICAL TESTS
Different types of measures of test reliability are used, depending upon the nature of the psychological test in question. Given below are the names of some tests with the types of reliability used by them (Table 9.5).

Table 9.5 Types of Reliability Used by Some Major Psychological Tests

| S. N. | Psychological Test | Type of Reliability Used |
|---|---|---|
| 1. | 16 Personality Factors (Cattell et al., in Russell and Karol 2002) | Equivalent Form |
| 2. | State Trait Anxiety Inventory for Children (Spielberger et al., cited in Chadha 1996) | Test–Retest |
| 3. | Eysenck Personality Inventory (Eysenck and Eysenck 1975) | Test–Retest |
| 4. | Kuder General Interest Survey (Kuder 1975) | Kuder–Richardson |
| 5. | Managerial Style Questionnaire (Kirchoff 1975) | Test–Retest |
| 6. | Draw-a-Man Test (Pramila Pathak, cited in Chadha op. cit.) | Test–Retest |
| 7. | Value Preference Scale (Udai Pareek, cited in Chadha op. cit.) | Split-half |
| 8. | Life Scale (Chadha and Willigen, cited in Chadha op. cit.) | Split-half (0.84) |
| 9. | Social Integration Attitude Scale (SIAS) (Roy 1967, cited in Chadha op. cit.) | Split-half (0.901) |
| 10. | Interpersonal Trust Scale (Chadha and Menon, cited in Chadha op. cit.) | Split-half (0.89 to 0.98) |

Source: Author.
For more details on this topic, please refer to Chapters 12, 13, 14 and 15.
IMPORTANCE OF RELIABILITY IN PSYCHOLOGICAL TESTING
Reliability refers to the consistency of a test; without established reliability estimates, a psychological test, however cherished, is an instrument of no use. To be viable, a psychological test must be reliable. Imagine the condition of a bright job candidate who could not be hired because the recruiting agency adopted an unreliable psychological test, like the heaps of them available on the internet in the form of freeware. This would be not only frustrating for the job applicant but also a great loss for the employer. A standardised psychological test fulfilling strict psychometric criteria, with appropriate levels of reliability and validity, should be used for the purpose of assessment and measurement.
However, the reverse of the above can also prove very useful. For example, if we have a test of established reliability that meets the other required psychometric criteria, we can use it to predict the consistency of the behaviour of the testee assessed by the measuring instrument. An employer might be disturbed by the inconsistent behaviour of one of his employees. If he uses standard tests with good reliability estimates (∼ 0.70), he can predict which of the candidates will show more (in)consistency on the measured attribute, other things being equal.
10
Validity
CHAPTER OUTLINE
1. What is validity?
2. Methods for calculating validity:
   (i) Determining validity by means of judgement
   (ii) Criterion based validity
   (iii) Construct validity
   (iv) Factorial validity
3. Factors affecting validity:
   (i) Group differences
   (ii) Correction for attenuation
   (iii) Criterion contamination
   (iv) Test length
4. Using validity information to make prediction:
   (i) Linear regression
   (ii) Multiple regressions
   (iii) Making predictions with a linear regression equation
5. Relationship between Reliability and Validity
LEARNING OBJECTIVES
After reading this chapter, you will be empowered to deal with:
1. What is validity and what is its role in psychological testing?
2. What are the various methods to calculate the validity of a test?
3. What are the factors that can affect the validity of a test?
4. How to calculate the new validity if a certain number of items are added to or removed from the original test?
5. How can we use validity information to make predictions?
WHAT IS VALIDITY?
The validity of any measuring instrument depends upon the accuracy with which it measures what it is intended to measure, when compared with a standard criterion. A test is valid when the performance which it measures corresponds to the same performance as otherwise independently measured or objectively defined. At this point, a distinction needs to be made between validity and reliability. Suppose a clock is set forward by 30 minutes. If the clock is a good timepiece, the time it ‘shows’ will be reliable, that is, consistent, but will not be valid as judged by ‘standard time’. The reliability of a test is determined by making repeated measurements of the same facts, whereas validity is found by comparing the data obtained from the test with standard (and sometimes arbitrary) measures. Since independent standards (that is, criteria) are hard to get in mental measurement, the validity of a mental test can never be estimated as accurately as can the validity of a physical instrument. Validity is a relative term. A test is valid for a particular purpose; it is not generally valid.
METHODS OF CALCULATING VALIDITY
Determining Validity by Means of Judgements
Content validity is more of an issue for tests of achievement or ability and less of a concern for tests of personality, where high content validity may limit the overall usefulness or applicability of the test. It is especially useful for tests of cognitive skills that require assessment of a broad range of skills in a given area. The concept of ‘content validity’ is employed in the selection of items for a test. Standard educational achievement examinations represent the consensus of many educators as to what a child of a given age or grade should know about arithmetic, reading, spelling, history and other subjects. A test of English history, for instance, would be valid if its content consists of questions covering this area. The validation of content through competent judgement is most satisfactory under two conditions: (a) when the sampling of items is wide and judicious and (b) when adequate standardisation groups are utilised.
Less defensible than content validity is the judgement process called ‘face validity’. A test is said to have face validity when it appears to measure whatever the author has in mind. Content validity is often confused with face validity, which refers only to what a scale appears to measure on a reading of its items. Rating scales for various hypothesised traits, neurotic inventories, attitude scales and even intelligence tests often claim face validity. Judgement of face validity is very useful in helping an author decide whether his test items are relevant to some specific situation (for example, an industry) or to specialised occupational experiences. For example, arithmetic problems dealing with bank operations are more relevant to bank jobs than are fictitious problems dealing with men rowing against a river, or the cost of preparing a wall. However, face validity should never be more than a first step in testing an item; it should not be the final word.
Criterion based Validity
This kind of validity deals with the ability of test scores to predict human behaviour, whether in the form of other test scores, observable behaviour or other accomplishments, such as grade point averages. Experimentally, the validity of a test is determined by finding the correlation between the test and some independent criterion. The criterion may be an objective measure of performance or a quantified judgement, such as a rating of character or of excellence in work done. Intelligence tests were first validated against school grades, teachers' ratings of aptitude and other indices of ability. Personality, attitude and interest inventories are validated in a variety of ways. A check of the test's predictions against a criterion is the best evidence of validity, provided that (a) the criterion was set up independently and (b) both the test and the criterion are reliable.
Criterion validity can be categorised into two types, that is, concurrent and predictive. Concurrent validity involves prediction of an alternative method of measuring the same characteristic of interest, while predictive validity attempts to show a relationship with future behaviour. Both predictive and concurrent validities are accepted by deciding the appropriate level of the validity coefficient, that is, the correlation between a test score and some criterion variable. The appropriate acceptance level depends upon the intended use of the test. For instance, if one is interested in predicting group membership, then a classification analysis or a similar technique that determines placement accuracy based on test scores would be appropriate. This is known as the non-correlational method of validation.
The index of reliability is sometimes taken as a measure of validity. It is the square root of the reliability coefficient and gives the correlation between the obtained scores and their theoretical true counterparts. Suppose the reliability coefficient of a test is 0.81; then the index of reliability is $r = \sqrt{0.81} = 0.90$. This would mean that the test measures true ability to the extent expressed by r equal to 0.90.
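The index of reliability can be checked numerically. The following is a minimal sketch, assuming only that the index is the square root of the reliability coefficient; the function name `index_of_reliability` is illustrative and not part of any library.

```python
import math


def index_of_reliability(reliability: float) -> float:
    """Return the square root of a reliability coefficient in [0, 1]."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return math.sqrt(reliability)


# Example from the text: a reliability coefficient of 0.81
print(round(index_of_reliability(0.81), 2))  # 0.9
```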
Construct Validity
The construct validity approach is much more complex than other forms of validity and is based on the accumulation of data over a long period of time. Construct validity requires the study of test scores in relation not only to variables that the test is intended to assess, but also to variables that have no relationship to the domain underlying the instrument. In this way, one builds a nomological net, or inferential definition, of the characteristics that a test is intended to assess. Another approach involves predictions about other tests, both those assumed to measure the same underlying trait and those that describe unrelated traits. Hence, we may predict that a specific intellectual skill should have a moderate correlation with a test of general Intelligence Quotient (IQ), little or no correlation with a measure of hypochondriasis, and a strong correlation with another test assessing the same intellectual skill. One should keep in mind the accuracy of the original hypothesis, which depends on the researcher's comprehension of the traits under study. One should be careful not to confuse a researcher's misunderstanding of either the intention of an instrument or the underlying theory with the inefficiency of the instrument itself.
Factorial Validity
Another method to study construct validity is with the help of factor analysis. One may postulate a factorial structure for a specific test, given one's assumptions about both the characteristics that are being assessed and the theory from which they are derived. A confirmatory factor analysis is then performed to test the hypothesis. For tests that generate only a single score or a limited number of scores, the test can be analysed together with other variables that serve as markers, and these marker variables can then be used to determine the meaning of the new test scores. In such an analysis, as in all factor analytic procedures, it is helpful to perform a series of factor analyses in order to determine whether the factor structure and the factorial relationships are stable across time and across groups.
Factor analysis, a specialised statistical technique, is widely used and is highly important in modern test construction. The intercorrelations of a large number of tests are examined and, if possible, accounted for in terms of a much smaller number of more general ‘factors’ or trait categories. The factors presumably run through the often complex abilities measured by the individual tests. It is sometimes found, for example, that three or four factors account for the intercorrelations obtained among 15 or more tests. The validity of a given test is, thus, defined by its factor loadings, which are given by the correlation of the test with each factor. For instance, a vocabulary test may correlate 0.85 with the verbal factor extracted from the entire test battery. This coefficient becomes the test's factorial validity.
Unlike reliability, which is influenced only by unsystematic errors of measurement, the validity of a test is affected by both unsystematic and systematic (constant) errors. This implies that a test may be reliable without being valid, but it cannot be valid without being reliable. Another way of stating the same point is that reliability is a necessary but not a sufficient condition for validity. Technically speaking, the criterion related validity of a test, as indicated by the correlation between the test and an external criterion measure, can never be greater than the square root of the parallel-forms reliability coefficient.
FACTORS AFFECTING VALIDITY
Validity of a psychological test may be influenced by a number of factors, some of which are listed here.
Group Differences
The characteristics of the group of people on whom the test is validated affect the criterion related validity. Differences within the group on variables like sex, age and personality traits may affect the correlation coefficient between the test and the selected criteria. Like the reliability coefficient, the magnitude of the validity coefficient depends on the degree of heterogeneity of the validation group on the test variable. In a group having a narrower range of test scores, that is, in a more homogeneous group, the validity coefficient tends to be smaller. Since the size of a correlation coefficient is a function of two variables, a narrowing of the range of either the predictor or the criterion variable will tend to lower the validity coefficient.
Correction for Attenuation
An unreliable test cannot be very valid: a test of low reliability also has low validity. There is a formula that can be employed to estimate what the validity coefficient would be if both the test and the criterion were perfectly reliable. This correction for attenuation formula is,
$$ r = \frac{v_{12}}{\sqrt{r_{11}\, r_{22}}} $$
where,
r = an estimate of what the validity coefficient would be if the predictor and criterion variables were perfectly reliable
v12 = the validity coefficient
r11 = the reliability of the predictor
r22 = the reliability of the criterion
For instance, let us assume that the validity coefficient is 0.40, the reliability of the test is 0.70 and the reliability of the criterion is 0.50, that is, v12 = 0.40, r11 = 0.70 and r22 = 0.50. If both the test and the criterion were perfectly reliable, the validity coefficient would be,
$$ r = \frac{v_{12}}{\sqrt{r_{11}\, r_{22}}} = \frac{0.40}{\sqrt{0.70 \times 0.50}} = \frac{0.40}{0.59} = 0.68 $$
This is a substantial increase in the validity coefficient. Researchers are, however, cautioned about employing the correction for attenuation: tests are never perfectly reliable, so a validity coefficient corrected for attenuation is a theoretical estimate that does not exist in practice.
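The correction for attenuation is easy to reproduce in code. The sketch below assumes the formula as defined above (validity coefficient divided by the square root of the product of the two reliabilities); the function and argument names are illustrative.

```python
import math


def correct_for_attenuation(v12: float, r11: float, r22: float) -> float:
    """Estimate validity if predictor and criterion were perfectly reliable.

    v12: observed validity coefficient
    r11: reliability of the predictor (test)
    r22: reliability of the criterion
    """
    if not (0.0 < r11 <= 1.0 and 0.0 < r22 <= 1.0):
        raise ValueError("reliabilities must be greater than 0 and at most 1")
    return v12 / math.sqrt(r11 * r22)


# Worked example from the text
print(round(correct_for_attenuation(0.40, 0.70, 0.50), 2))  # 0.68
```

Because the corrected value grows quickly as the reliabilities fall, a low-reliability criterion can produce an estimate that looks far better than anything observable, which is the reason for the caution expressed in the text.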
Criterion Contamination
The validity of a test is also dependent upon the validity of the criterion itself as a measure of the particular cognitive or affective characteristic of interest. Sometimes, the criterion is contaminated or rendered invalid by the method used to determine the criterion scores. Teachers have been known to consult students' scores on academic achievement tests (AATs) before deciding what course grades to assign. Since AAT scores are also used by the admission office to select students who are predicted to make satisfactory grades, this method of assigning grades contaminates the criterion and, hence, results in an inaccurate validity coefficient. Therefore, if AAT scores are to be used for predicting grades, then grades should be arrived at independently, without reference to AAT scores.
Test Length
Like reliability, the validity coefficient varies directly with test length: the longer a test, the greater its validity, and vice versa. The effect of increasing a test's length on the validity coefficient can be measured by the following formula,
$$ V_n = \frac{K V_0}{\sqrt{K + K(K-1)\,r}} \qquad (1) $$
where,
Vn = the validity of the lengthened test
V0 = the validity of the original test
r = the reliability coefficient of the test
K = the number of parallel forms of test X, or the number of times it is lengthened
Let us explain with the help of an example.
Problem 1
Test X has a reliability coefficient of 0.60 and a correlation of 0.30 with the criterion (c). What would be the correlation of Test X with the same criterion (validity coefficient), if the test were tripled in length?
Solution 1
Substituting r = 0.60, V0 = 0.30 and K = 3 in the above formula, we have,
$$ V_n = \frac{3 \times 0.30}{\sqrt{3 + 3 \times 2 \times 0.60}} = \frac{0.9}{\sqrt{6.6}} = \frac{0.9}{2.57} = 0.35 $$
Thus, tripling the length of the test, or averaging three administrations of the same test, increases the validity coefficient from 0.30 to 0.35. It can be noticed that tripling the test's length also increases the reliability coefficient from 0.60 to 0.82.
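The effect of lengthening can be sketched in a few lines of code. The validity function follows formula (1); the reliability function uses the Spearman-Brown formula, which is assumed here as the source of the 0.82 figure. Both function names are illustrative.

```python
import math


def lengthened_validity(v0: float, r: float, k: float) -> float:
    """Validity of a test lengthened k times, per formula (1)."""
    return k * v0 / math.sqrt(k + k * (k - 1) * r)


def lengthened_reliability(r: float, k: float) -> float:
    """Reliability of a test lengthened k times (Spearman-Brown formula)."""
    return k * r / (1 + (k - 1) * r)


# Problem 1: reliability 0.60, validity 0.30, test tripled in length
print(round(lengthened_validity(0.30, 0.60, 3), 2))   # 0.35
print(round(lengthened_reliability(0.60, 3), 2))      # 0.82
```

Setting k to 1 returns the original coefficients, which is a quick sanity check on both formulas.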
This shows that lengthening a test increases not only its reliability but also its validity, and that the two measures of test efficiency are closely related. This brings us to the question: how much should a test be lengthened in order to reach a given level of validity? To answer this question, formula (1) needs to be rearranged to solve for K,
$$ K = \frac{V_n^2\,(1 - r)}{V_0^2 - V_n^2\, r} $$
Problem 2
Suppose that a test has a reliability coefficient of 0.60 and a criterion correlation of 0.30. How much lengthening of the test is necessary so that this test will yield a validity coefficient of 0.35?
Solution 2
Substituting Vn = 0.35, r = 0.60 and V0 = 0.30 in the above formula, and solving for K, we get,
$$ K = \frac{(0.35)^2 (1 - 0.60)}{(0.30)^2 - (0.35)^2 (0.60)} = \frac{0.0490}{0.0165} = 2.97 \approx 3 $$
This test must, therefore, be lengthened to three times its present length for the validity coefficient to go from 0.30 to 0.35. Also, note that increasing the length of the above test by three times raises its reliability coefficient from 0.60 to 0.82.
There is another formula to find the new validity of a test whose length is increased or decreased. The formula is,
$$ V_n = \frac{V_0 \sqrt{K}}{\sqrt{1 + (K - 1)\, r}} $$
where,
V0 = the original validity of the test
r = the original reliability of the test
K = the ratio of the number of items in the new test to the number of items in the original test
Applying this to the previous problem, where V0 = 0.30, K = 3 and r = 0.60,
$$ V_n = \frac{(0.30)\sqrt{3}}{\sqrt{1 + (3 - 1)(0.60)}} = \frac{0.5196}{\sqrt{2.20}} = \frac{0.5196}{1.483} = 0.35 $$
This value is the same as that obtained by the previous formula.
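Both results in this passage can be verified with a short sketch. It assumes the rearranged formula for K and the square-root form of the alternative validity formula; the function names and the explicit check for an unattainable target are illustrative additions.

```python
import math


def required_length_factor(vn: float, v0: float, r: float) -> float:
    """Number of times a test must be lengthened to reach validity vn."""
    denominator = v0 ** 2 - vn ** 2 * r
    if denominator <= 0:
        # Lengthening can never push validity to v0 / sqrt(r) or beyond.
        raise ValueError("target validity cannot be reached by lengthening")
    return vn ** 2 * (1 - r) / denominator


def new_validity(v0: float, r: float, k: float) -> float:
    """Alternative formula: validity when the item count changes by ratio k."""
    return v0 * math.sqrt(k) / math.sqrt(1 + (k - 1) * r)


print(round(required_length_factor(0.35, 0.30, 0.60), 2))  # 2.97
print(round(new_validity(0.30, 0.60, 3), 2))               # 0.35
```

A ratio k below 1 in `new_validity` corresponds to shortening the test, in which case the predicted validity falls below the original value.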
USING VALIDITY INFORMATION TO MAKE PREDICTION
When a relationship can be established between a test and a criterion, we can use test scores from other individuals to predict how well those individuals will perform on the criterion measure. For example, some universities use students' scores on the Scholastic Assessment Test (SAT) to predict the students' success in college. Organisations use job candidates' scores on pre-employment tests that have criterion related validity to predict those candidates' scores on the criteria of job performance.
Linear Regression
We use the statistical process called linear regression when we use one set of test scores (X) to predict one set of criterion scores (Y). To do this, we construct the following linear regression equation,
$$ Y' = a + bX $$
where,
Y´ = the predicted score on the criterion
a = the intercept
b = the slope
X = the score the individual made on the predictor test
This equation provides a predicted score on the criterion (Y´) for each test score (X). When the Y´ values are plotted, they form the linear regression line associated with the correlation between the test and the criterion. We calculate the slope (b) of the regression line, that is, the expected change in Y for every one-unit change in X, using the following formula,
$$ b = r\,\frac{S_y}{S_x} $$
where,
r = the correlation coefficient
Sx = the standard deviation of the distribution of X
Sy = the standard deviation of the distribution of Y
The intercept is the place where the regression line crosses the Y-axis. The intercept (a) is calculated using the following formula,
$$ a = \bar{Y} - b\bar{X} $$
where,
Ȳ = the mean of the distribution of Y
b = the slope
X̄ = the mean of the distribution of X
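The slope and intercept can be computed directly from paired scores. The following is a minimal sketch, assuming population standard deviations (dividing by N) and the usual definition of the intercept as the mean of Y minus the slope times the mean of X; the function name `fit_line` and the five data points are hypothetical and not taken from the text.

```python
import math


def fit_line(xs, ys):
    """Return (a, b, r) for the regression of Y on X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population standard deviations (divide by N)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    # Pearson correlation between X and Y
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    r = covariance / (s_x * s_y)
    b = r * s_y / s_x          # slope
    a = mean_y - b * mean_x    # intercept
    return a, b, r


# Hypothetical scores, for illustration only
a, b, r = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(a, 2), round(b, 2), round(r, 2))  # 2.2 0.6 0.77
```

The fitted line always passes through the point formed by the two means, so predicting Y at the mean of X returns the mean of Y.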
Multiple Regressions
Complex criteria, such as job performance and success in graduate school, are often difficult to predict with a single test. In these situations, researchers use more than one test to make an accurate prediction. An expansion of the linear regression equation helps in this situation. We use the statistical process of multiple regression for predicting a criterion (Y´) using more than one set of test scores (X1, X2, …, Xn). The multiple regression equation that incorporates information from more than one predictor or test is as follows,
$$ Y' = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_n X_n $$
where,
Y´ = the predicted score on the criterion
a = the intercept
b = the slope of the regression line and the amount of variance the predictor contributes to the equation, also known as beta (β)
X = the predictor
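Once the intercept and the weights are known, applying the multiple regression equation is a matter of a weighted sum. The sketch below only evaluates the equation; it does not estimate the weights. The function name and all numeric values are hypothetical.

```python
def predict_criterion(intercept, slopes, scores):
    """Evaluate Y' = a + b1*X1 + b2*X2 + ... + bn*Xn."""
    if len(slopes) != len(scores):
        raise ValueError("one slope is needed for every predictor score")
    return intercept + sum(b * x for b, x in zip(slopes, scores))


# Hypothetical intercept, weights and test scores for two predictors
print(predict_criterion(1.0, [0.5, 0.25], [2, 4]))  # 3.0
```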
Making Predictions with a Linear Regression Equation
Research suggests that academic self-efficacy (ASE) and class grades are related. Consider the following data to show how we could use the scores on an ASE test to predict a student's grade. For instance, we can ask the question, ‘If a student scores 65 on the ASE test, what course grade would we expect the student to receive?’ We have assigned numbers to each grade to facilitate this analysis (Table 10.1). Therefore, 1 = D, 2 = C, 3 = B and 4 = A.
Table 10.1 Academic Self-efficacy (ASE) and Class Grades
Student    ASE (X)    Grade (Y)
1          80         2
2          62         2
3          90         3
4          40         2
5          55         3
6          85         2
7          70         4
8          75         3
9          25         2
10         50         3
Source: Kothari (2005).
Step 1: Calculate the means and standard deviations of X and Y,
X̄ = 63.2, Ȳ = 2.6, SX = 19.8, SY = 0.7
Step 2: Calculate the correlation coefficient (rxy) for X and Y,
rxy = .33
Step 3: Calculate the slope and the intercept,
$$ b = r\,\frac{S_Y}{S_X} = .33 \times \frac{.7}{19.8} = .012 $$
$$ a = \bar{Y} - b\bar{X} = 2.6 - (.012)(63.2) = 2.6 - .76 = 1.84 $$
Step 4: Calculate Y´ when X = 65,
$$ Y' = a + bX = 1.84 + (.012)(65) = 1.84 + .78 = 2.62 $$
Step 5: Translate the number calculated for Y´ back into a letter grade,
A value of 2.62 lies between C (2) and B (3) and is closer to 3, so it corresponds to a grade of B. The best prediction we can make is that a person who scored 65 on the ASE test would be expected to earn a course grade of B. Note that by substituting any test score for X, we obtain a corresponding predicted score on Y.
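Steps 3 to 5 can be traced in code. This is a sketch that takes the summary values of Steps 1 and 2 as given, and assumes the regression definitions of the chapter: slope b = r(Sy/Sx), intercept equal to the mean of Y minus b times the mean of X, and Y´ = a + bX. Variable names are illustrative. The slope is kept unrounded, so the intercept differs in the second decimal from a hand calculation that rounds b to .012 first.

```python
# Summary values as stated in Steps 1 and 2
mean_x, mean_y = 63.2, 2.6
s_x, s_y = 19.8, 0.7
r_xy = 0.33

# Step 3: slope and intercept
b = r_xy * s_y / s_x
a = mean_y - b * mean_x

# Step 4: predicted criterion score for an ASE score of 65
predicted = a + b * 65

# Step 5: map the prediction to the nearest coded grade (1 = D ... 4 = A)
grade_labels = {1: "D", 2: "C", 3: "B", 4: "A"}
nearest = min(grade_labels, key=lambda code: abs(code - predicted))

print(round(b, 3), round(a, 2), round(predicted, 2), grade_labels[nearest])
# 0.012 1.86 2.62 B
```

Since 65 is just above the mean ASE score of 63.2 and the slope is positive, the prediction has to fall just above the mean grade of 2.6, which gives a quick plausibility check on the result.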
THE RELATION BETWEEN RELIABILITY AND VALIDITY
Reliability and validity refer to different aspects of essentially the same thing, namely, test efficiency. Reliability is concerned with the stability of test scores. It does not go beyond the test itself. Validity, on the other hand, implies evaluation in terms of an outside and independent criterion. The purpose of a test is to find a measure which will be an adequate and time saving substitute for criterion measures that are obtainable only after long intervals of time, for example, school grades or performance records. In the case of psychological tests, both high reliability and high validity are desirable. In other words, it is the interaction between reliability and validity that determines the desirability of a psychological test (see Figure 10.1). The most desirable test is one that has high reliability as well as high validity.
Figure 10.1 Interaction between Reliability and Validity, and Desirability of Psychological Tests
Source: Author.
Daniel Goleman (1997), in his best-selling book Emotional Intelligence, argues that Emotional Quotient (EQ) is a much better predictor of success in life than IQ. In psychometric terminology, what Goleman essentially argues is that while IQ may demonstrate high reliability, it has only modest validity.
To be valid, a test must be reliable. A highly reliable test is always a valid measure of some function. To explain this further, if a test has a reliability coefficient of 0.81, its index of reliability is 0.90, which means that the test correlates 0.90 with the true measures that constitute the criterion. However, a test may be theoretically valid and yet show little or no correlation with anything else. For example, word cancellation test scores can be made highly reliable by lengthening or repeating the test, so that the index of reliability becomes high. But the correlations of these tests with such criteria as speed or accuracy are so low that they show little practical validity.
11
Norms
CHAPTER OUTLINE
1. The concept of norms: definition and nature
2. Types and methods of calculating norms:
   (i) Percentile norms
   (ii) Deciles
   (iii) Standard score
   (iv) T-score
   (v) Stanine
   (vi) Age norms
   (vii) Grade norms
3. What is the difference between norms and standards?
4. Types of norms used by some psychological tests
LEARNING OBJECTIVES
At the end of this chapter, you will be empowered to deal with:
1. What are norms and what is their relevance in psychological research?
2. Why do we need norms?
3. What are the various factors that can influence the norms?
4. What are the various types and methods for calculating norms?
5. How to choose an appropriate norm for your test?
6. What is the difference between norms and standards?
THE CONCEPT OF NORMS: DEFINITION AND NATURE
A norm is the average or typical score (mean or median) on a particular test made by a specified population; for example, the mean intelligence test score for a group of 10-year olds. The raw score, that is, the actual number of units or points obtained by an individual on a test, does not in itself have much, if any, significance. A score of 43 on one test cannot be directly compared with a score of 43 on another test. Furthermore, the average scores for the two tests will in all probability be different, as will the degree of variation of scores (called the deviation) above and below the average. Score-for-score comparisons would, moreover, be extremely cumbersome and would, in each instance, have to be interpreted in terms of some common, meaningful index. It is, thus, clear that if the scores obtained on each of several tests are to be compared, indices must be used which express the relative significance of any given score, or what is known as relative rank. Hence, to facilitate interpretation, sound psychological tests provide tables of age norms, grade norms, percentile ranks, decile ranks or standard scores, depending upon the instrument's purpose.
Reference to a test's table of norms enables us to rank an individual's performance relative to his own and other age or grade groups. For example, a child of 10 might have an intelligence test score that is average for his own age, or for a population of nine-year olds, or for those who are 11 years of age. Since it is desirable to locate an individual's score and relative rank not only with reference to an average, but also with reference to other levels in the scale, a table of norms should include the frequency distribution of the scores, from which percentile ranks and standard scores may be readily calculated. The characteristics of any table of norms depend on a number of factors affecting the individuals who make up the group:
1. In standardising a psychological test, the norm and the distribution of scores are influenced by the representativeness of the population sample, that is, by the proportion from each sex, their geographic distribution, their socio-economic status and their age distribution.
2. In devising a test of educational achievement, factors influencing the normative data, in addition to the above, are the quality of the schools and the kinds of curricula from which the standardisation population is drawn.
3. Norms of tests of aptitude, such as clerical or mechanical aptitude, are influenced by the standardisation population's degree of experience, the kind of work they have been doing and the representativeness of the group.
Therefore, norms derived for several tests classified under the same name and intended for the same purposes are not necessarily comparable. It is necessary to know the characteristics of the standardisation population before deciding on the selection and use of a test. This information is essential to decide whether the instrument is appropriate in a given situation or not.
TYPES AND METHODS OF CALCULATING NORMS
Percentile Rank
An individual's percentile rank on a test designates the percentage of cases or scores lying below it. In other words, this statistical device determines in which one-hundredth part of the distribution of scores or cases an individual is located. For example, a person having a percentile rank of 20 (P20) is situated above 20 per cent of the group of which he is a member or, otherwise stated, 20 per cent of the group fall below this person's rank. By this means, a person's relative status or position in the group can be established with respect to the traits or functions being tested. Psychological measurement, unlike physical measurement, derives its significance principally from the relative ranks ascribed to individuals rather than from quantitative units of measurement. A table of norms and frequency distribution often provides percentile ranks. If the percentile ranks are not given in a table, they can easily be calculated from the frequency distribution.
The percentile method is a technique whereby scores on two or more tests, given in units that are different, may be transformed into uniform and comparable values. This method has the advantage of being easily calculated, easily understood and of making no assumptions with regard to the characteristics of the total distribution. It answers the question, ‘Where does an individual's score rank him in another group whose members have taken the same test?’ The distribution might be normal, skewed or rectangular. Percentile points are based upon the number of scores (cases) falling within a certain range. Hence, the distance between any two percentiles represents a certain area under the curve, that is, a certain number of cases (N/100).
Figure 11.1 shows that if percentages of the total area (total number of cases) are equal, then the distances on the baseline (range of scores) must be unequal, unless the distribution is rectangular (Figure 11.2). Figure 11.1 also shows that differences in scores between any two percentile points become greater as we move from the median (P50) towards the extremes. Inspection of the curve shows, for instance, that the distance on the baseline (representing scores) between percentiles 50 and 60, and the distance between 50 and 40 (these being at the centre and equal), are smaller than that between any other intervals of 10 percentile points. What this means in the practical interpretation of test results is that at and close to the median, differences in scores between percentile ratings are smaller in the measured characteristic than they are between the same percentile differences elsewhere on the curve. Compare, for example, the spread on the baseline between 50 and 60 with that between 80 and 90, or between 90 and 100, each of which represents 10 percentile points.
Figure 11.1 Unequal Distances between Points of a Normal Curve
Source: Freeman (1962: 125).
Figure 11.2 Equal Distances between Points of a Rectangle
Source: Freeman (1962: 126).
Problem
Calculate P20 and P60 from the following distribution.
Class Interval    Frequency
129–133           5
124–128           4
119–123           6
114–118           7
109–113           5
104–108           8
99–103            8
94–98             2
89–93             5
Solution
Class Interval    f     cf
129–133           5     50
124–128           4     45
119–123           6     41
114–118           7     35
109–113           5     28
104–108           8     23
99–103            8     15
94–98             2     7
89–93             5     5
                  N = 50
Step 1: Compute the cumulative frequency (cf) for the given frequency distribution, as shown in column 3 of the above table.
Step 2: Calculate the fraction XN/100 and locate the class in which PX falls. It is the class whose cumulative frequency equals or just exceeds the value of XN/100. In the distribution of N = 50, P20 is the score below which 20 × 50/100 = 10 scores lie, so it falls in the class interval 99–103.
Step 3: l, the lower limit of the class interval 99–103, is 98.5. Thus, l = 98.5.
Step 4: cf, the cumulative frequency immediately below l, is 7. Hence, cf = 7.
Therefore, with f = 8 (the frequency of the class) and i = 5 (the width of the class interval),
$$ P_{20} = l + \frac{XN/100 - cf}{f} \times i = 98.5 + \frac{(20 \times 50/100) - 7}{8} \times 5 = 98.5 + \frac{10 - 7}{8} \times 5 = 98.5 + 1.875 = 100.375 $$
Therefore, P20 is 100.375.
Now to calculate P60: P60 is the score below which 60 × 50/100 = 30 scores lie, and it falls in the class interval 114–118. Here, l = 113.5, cf = 28, f = 7 and i = 5. Hence,
$$ P_{60} = l + \frac{XN/100 - cf}{f} \times i = 113.5 + \frac{(60 \times 50/100) - 28}{7} \times 5 = 113.5 + \frac{30 - 28}{7} \times 5 = 113.5 + 1.429 = 114.929 $$
Problem
Compute P30, P60, P80 and P90 from the following distribution.
Class Interval    Frequency
34–37             10
30–33             6
26–29             12
22–25             20
18–21             25
14–17             27
Solution
Class Interval    f      cf
34–37             10     100
30–33             6      90
26–29             12     84
22–25             20     72
18–21             25     52
14–17             27     27
                  N = 100
(i) P30 is the score below which 30 × 100/100 = 30 scores lie, and it falls in the class interval 18–21. Thus,
l = the lower limit = 17.5
cf = cumulative frequency immediately below l = 27
f = frequency of the class interval in which P30 falls = 25
i = 4, X = 30, N = 100
Hence,
$$ P_{30} = l + \frac{XN/100 - cf}{f} \times i = 17.5 + \frac{(30 \times 100/100) - 27}{25} \times 4 = 17.5 + \frac{30 - 27}{25} \times 4 = 17.5 + 0.48 = 17.98 $$
(ii) P60 is the score below which 60 × 100/100 = 60 scores lie, and it falls in the class interval 22–25. Thus,
l = 21.5
cf = cumulative frequency immediately below l = 52
f = frequency of the class interval in which P60 falls = 20
i = 4, X = 60, N = 100
$$ P_{60} = l + \frac{XN/100 - cf}{f} \times i = 21.5 + \frac{(60 \times 100/100) - 52}{20} \times 4 = 21.5 + \frac{60 - 52}{20} \times 4 = 21.5 + 1.6 = 23.10 $$
(iii) P80 is the score below which 80 × 100/100 = 80 scores lie, and it falls in the class interval 26–29. Thus,
l = 25.5
cf = cumulative frequency immediately below l = 72
f = frequency of the class interval in which P80 falls = 12
i = 4, X = 80, N = 100
$$ P_{80} = l + \frac{XN/100 - cf}{f} \times i = 25.5 + \frac{(80 \times 100/100) - 72}{12} \times 4 = 25.5 + \frac{80 - 72}{12} \times 4 = 25.5 + 2.67 = 28.17 $$
(iv) P90 is the score below which 90 × 100/100 = 90 scores lie, and it falls in the class interval 30–33. Thus,
l = the lower limit = 29.5
cf = 84
f = 6
i = 4, X = 90, N = 100
$$ P_{90} = l + \frac{XN/100 - cf}{f} \times i = 29.5 + \frac{(90 \times 100/100) - 84}{6} \times 4 = 29.5 + \frac{90 - 84}{6} \times 4 = 29.5 + 4 = 33.50 $$
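The percentile formula for grouped data can be expressed as a short function. The sketch below assumes integer scores (so the exact lower limit of a class is its lowest score minus 0.5) and classes listed from lowest to highest; the function name and the data layout are illustrative choices, not part of the text.

```python
def grouped_percentile(x, classes, width):
    """Return the percentile point P_x for a grouped frequency distribution.

    classes: list of (lowest_score_in_class, frequency), ascending order
    width:   width i of each class interval
    """
    n = sum(f for _, f in classes)
    target = x * n / 100           # the fraction XN/100
    cf = 0                         # cumulative frequency below the class
    for lowest_score, f in classes:
        if f > 0 and cf + f >= target:
            lower_limit = lowest_score - 0.5
            return lower_limit + (target - cf) / f * width
        cf += f
    raise ValueError("x must lie between 0 and 100")


# First distribution (N = 50, i = 5), lowest class first
first = [(89, 5), (94, 2), (99, 8), (104, 8), (109, 5),
         (114, 7), (119, 6), (124, 4), (129, 5)]
print(grouped_percentile(20, first, 5))             # 100.375
print(round(grouped_percentile(60, first, 5), 3))   # 114.929

# Second distribution (N = 100, i = 4)
second = [(14, 27), (18, 25), (22, 20), (26, 12), (30, 6), (34, 10)]
print(round(grouped_percentile(90, second, 4), 2))  # 33.5
```

When XN/100 coincides exactly with a cumulative frequency, as for P90 in the second distribution, the function picks the class that ends at that count, which matches the hand calculation.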
Calculation of Percentiles by Graphical Method
To calculate the percentiles or percentile points by the graphical method, a graph is drawn with the scores on the X-axis and the percentage cumulative frequency on the Y-axis. This means that we need to convert each cumulative frequency into the corresponding percentage cumulative frequency. To calculate the percentage cumulative frequency, the following formula is used:
$$ \text{Percentage cumulative frequency} = \frac{\text{cumulative frequency}}{\text{total frequency}} \times 100 $$
Ungrouped Data: The percentage cumulative frequency (column 4) is plotted on the Y-axis (ordinate) and the scores (column 1) on the X-axis (abscissa), as shown in Figure 11.3.
Figure 11.3 Cumulative Frequency Curve
Source: Author.
Take the following ungrouped data (Table 11.1) and find percentiles.

Table 11.1 Calculation of Percentile Scores of a Frequency Distribution

Score (1)    f (2)    cf (3)    Percentage cf (4)
10            2       80        100.0
 9            4       78         97.5
 8            6       74         92.5
 7            8       68         85.0
 6           10       60         75.0
 5           16       50         62.5
 4           14       34         42.5
 3            6       20         25.0
 2            6       14         17.5
 1            6        8         10.0
 0            2        2          2.5
             N = 80
Source: Author.
For instance, P70 is found by drawing a horizontal line from 70 per cent on the Y-axis to the curve and then dropping a vertical line from the point of intersection to the X-axis of scores. The vertical line cuts the X-axis at 5.6. Hence, P70 = 5.6.
Take the following grouped data (Table 11.2):

Table 11.2 Calculation of Percentile through a Given Frequency Distribution by Graphical Method

Class Interval (1)    f (2)    cf (3)    Percentage cf (4)
128–132                2       40        100.0
123–127                4       38         95.0
118–122                3       34         85.0
113–117                4       31         77.5
108–112               10       27         67.5
103–107                8       17         42.5
98–102                 5        9         22.5
93–97                  4        4         10.0
                      N = 40
Source: Author.
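Reading a percentile off the cumulative percentage curve amounts to linear interpolation between two plotted points. The sketch below is an illustration only; the function name `percentile_from_curve` is an assumption, and the curve is treated as straight-line segments between plotted points. For Table 11.1 the points are the scores themselves; for Table 11.2 they are the exact upper limits of the classes, as the text describes for grouped data.

```python
def percentile_from_curve(points, p):
    """Interpolate the score at which the cumulative percentage equals p.

    points: list of (x, cumulative_frequency), sorted by x ascending;
            the last cumulative frequency is taken as N.
    """
    n = points[-1][1]
    curve = [(x, 100.0 * cf / n) for x, cf in points]   # percentage cf
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if y0 <= p <= y1 and y1 > y0:
            # Straight-line interpolation between two plotted points.
            return x0 + (p - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("p lies outside the plotted range")


# Table 11.1: ungrouped scores with their cumulative frequencies.
TABLE_11_1 = [(0, 2), (1, 8), (2, 14), (3, 20), (4, 34), (5, 50),
              (6, 60), (7, 68), (8, 74), (9, 78), (10, 80)]

# Table 11.2: exact upper class limits with cumulative frequencies.
TABLE_11_2 = [(97.5, 4), (102.5, 9), (107.5, 17), (112.5, 27),
              (117.5, 31), (122.5, 34), (127.5, 38), (132.5, 40)]

print(round(percentile_from_curve(TABLE_11_1, 70), 1))   # P70, ungrouped
print(round(percentile_from_curve(TABLE_11_2, 60), 1))   # P60, grouped
```

A hand-drawn smooth curve may give slightly different readings from this straight-line approximation.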
When the frequency distribution is given for grouped data, the method of finding percentiles is similar to that for individual scores, except that the percentage cumulative frequency is plotted against the upper limits of the classes. The graph obtained for the above grouped data is shown in Figure 11.4. The percentiles are read in the usual manner. For instance, for P60 we first draw a horizontal line through the point 60 on the percentage axis (Y-axis). Then, through the intersection of this line and the curve, we draw a vertical line, which intersects the X-axis at the required value. It is found to be 111.0. Therefore, P60 = 111.0.
Figure 11.4 Percentage Cumulative Frequency
Source: Author.

Percentile Ranks (PRs)
If k per cent of the members of a sample have scores less than a particular value x, then x is the kth percentile and k is the PR of x. Suppose a psychologist administers an intelligence test and finds that 60 per cent of the sample score less than 72; then the score 72 is the 60th percentile. Similarly, if P20 = 38, then the PR of score 38 is 20. If P75 = 105 and P25 = 78, then the PR of 78 is 25 and that of 105 is 75.
The computation of PRs is the reverse of the computation of percentile points: we have to compute the ranks corresponding to particular scores. In the case of individual scores, we first find the rank of the score from the bottom, that is, its rank in ascending order. If R is this rank and N is the total number of cases, then the PR is given by
PR = 100R/N
In the case of grouped data, the following steps are involved in computing the PR:
Step 1: Find the exact lower limit l of the class containing the score x whose percentile rank is required.
Step 2: Find the difference between x and the lower limit l of the class containing it, that is, x – l.
Step 3: Divide (x – l) by the size of the class interval i and multiply by the frequency f of the class.
Step 4: Add this to the cumulative frequency F below the lower limit l of the class and express the sum as a percentage of N, that is, use the following formula:
\[ PR = \frac{100}{N}\left[F + \left(\frac{x - l}{i}\right) \times f\right] \]
where F is the cumulative frequency below l and f is the frequency of the class containing x.
Problem
Compute the percentile rank corresponding to 66 in the following distribution:

Class Interval    Frequency
93–97              4
88–92              7
83–87              5
78–82              8
73–77              3
68–72              6
63–67              7
58–62             10
53–57              5
48–52              4

Solution

Class Interval (1)    f (2)    cf (3)    Percentage cf (4)
93–97                  4       59        100.00
88–92                  7       55         93.22
83–87                  5       48         81.36
78–82                  8       43         72.88
73–77                  3       35         59.32
68–72                  6       32         54.24
63–67                  7       26         44.07
58–62                 10       19         32.20
53–57                  5        9         15.25
48–52                  4        4          6.78
                      N = 59
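The score 66 falls within the class interval 63–67. The exact lower limit of this class interval is 62.5, so the difference x – l = 66 – 62.5 = 3.5. The cumulative frequency below the lower limit is F = 19 and the frequency of the class is f = 7. Since 66 lies above the lower limit, the cases within the class that fall below 66 are added to F.
Percentile Rank of 66
\[ = \frac{100}{N}\left[F + \left(\frac{x - l}{i}\right) \times f\right] = \frac{100}{59}\left[19 + \left(\frac{66 - 62.5}{5}\right) \times 7\right] = \frac{100}{59}\left[19 + 4.9\right] = \frac{100}{59} \times 23.9 = 40.51 \]
As a check, this value lies between the percentage cumulative frequencies at the two limits of the class, 32.20 and 44.07.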
Problem
Determine the percentile ranks of 64, 48.5 and 32 from the following distribution:

Class Interval    Frequency
74.5–79.5          4
69.5–74.5          5
64.5–69.5          3
59.5–64.5          6
54.5–59.5         10
49.5–54.5          8
44.5–49.5          7
39.5–44.5          4
34.5–39.5          6
29.5–34.5         10
24.5–29.5          5

Solution

Class Interval (1)    f (2)    cf (3)    Percentage cf (4)
74.5–79.5              4       68        100.00
69.5–74.5              5       64         94.12
64.5–69.5              3       59         86.76
59.5–64.5              6       56         82.35
54.5–59.5             10       50         73.53
49.5–54.5              8       40         58.82
44.5–49.5              7       32         47.06
39.5–44.5              4       25         36.76
34.5–39.5              6       21         30.88
29.5–34.5             10       15         22.06
24.5–29.5              5        5          7.35
                      N = 68
The score of 64 falls within the class interval 59.5–64.5. The exact lower limit l of this class interval is 59.5, so the difference x – l = 64 – 59.5 = 4.5. Here F = 50 and f = 6.
Percentile Rank of 64
\[ = \frac{100}{N}\left[F + \frac{(x - l)}{i} \times f\right] = \frac{100}{68}\left[50 + \frac{(64 - 59.5)}{5} \times 6\right] = \frac{100}{68}\left[50 + 5.4\right] = \frac{100}{68} \times 55.4 = 81.47 \]
The score of 48.5 falls within the class interval 44.5–49.5. The exact lower limit l of this class interval is 44.5. Here F = 25 and f = 7.
Percentile Rank of 48.5
\[ = \frac{100}{68}\left[25 + \frac{(48.5 - 44.5)}{5} \times 7\right] = \frac{100}{68}\left[25 + 5.6\right] = \frac{100}{68} \times 30.6 = 45.00 \]
The score of 32 falls within the class interval 29.5–34.5. The exact lower limit l of this class interval is 29.5. Here F = 5 and f = 10.
Percentile Rank of 32
\[ = \frac{100}{68}\left[5 + \frac{(32 - 29.5)}{5} \times 10\right] = \frac{100}{68}\left[5 + 5\right] = \frac{100}{68} \times 10 = 14.71 \]
In each case the within-class portion is added to F, because the score lies above the lower limit of its class; the resulting PR therefore falls between the percentage cumulative frequencies at the two limits of that class.
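The steps for computing a percentile rank from grouped data can be sketched in code. This is an illustrative sketch, not a definitive implementation: the function name `grouped_percentile_rank` is an assumption, the classes are given by their exact limits, and the within-class portion is added to the cumulative frequency below the class, as Step 4 states, on the assumption that cases are spread evenly within a class.

```python
def grouped_percentile_rank(classes, x):
    """Percentile rank of score x in a grouped frequency distribution.

    classes: list of (exact_lower_limit, exact_upper_limit, frequency).
    Uses PR = (100 / N) * (F + (x - l) / i * f), where F is the
    cumulative frequency below the class containing x.
    """
    rows = sorted(classes)                      # lowest class first
    n = sum(f for _, _, f in rows)              # N
    cf_below = 0                                # F
    for lower, upper, f in rows:
        if lower <= x <= upper:
            width = upper - lower               # i
            return 100.0 / n * (cf_below + (x - lower) / width * f)
        cf_below += f
    raise ValueError("x lies outside the distribution")


# Second worked distribution (exact limits, N = 68).
DISTRIBUTION = [
    (74.5, 79.5, 4), (69.5, 74.5, 5), (64.5, 69.5, 3), (59.5, 64.5, 6),
    (54.5, 59.5, 10), (49.5, 54.5, 8), (44.5, 49.5, 7), (39.5, 44.5, 4),
    (34.5, 39.5, 6), (29.5, 34.5, 10), (24.5, 29.5, 5),
]

for score in (64, 48.5, 32):
    print(score, round(grouped_percentile_rank(DISTRIBUTION, score), 2))
```

A useful sanity check is that the result for any score always lies between the percentage cumulative frequencies at the lower and upper limits of its class.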
Deciles
Deciles are the points which divide the scale of measurement into 10 equal parts. Thus, there are nine deciles in all, namely, the first decile to the ninth decile. These deciles are denoted by D1, D2, D3, D4, D5, D6, D7, D8 and D9. The first decile may be defined as the point on the scale of measurement below which one-tenth (1/10) or 10 per cent of the total cases lie, the second decile as the point below which 20 per cent of the cases lie, and so on (Figure 11.5).
Figure 11.5 Cumulative Frequency Curve: Decile Points
Source: Author.
The term decile is used to mean a dividing point. Decile Rank signifies a range of scores between two dividing points. For example, a testee who has a Decile Rank of 10 (D10) is located in the highest 10 per cent of the group; one whose Decile Rank is 9 (D9) is in the second highest 10 per cent; one whose Decile Rank is 1 (D1) is in the lowest 10 per cent of the group. The Decile Rank is the same in principle as the percentile rank; but instead of designating the one-hundredth part of a distribution, it designates the one-tenth part of the group (N/10) in which any tested person is placed by his score. When the number of scores in a distribution is small, percentiles are not used, because there is little or no significance in making fine distinctions in rank. The decile ranking method may be used instead.
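The relation between a percentile rank and a Decile Rank can be illustrated with a short sketch. The function name `decile_rank` is hypothetical, and the treatment of scores falling exactly on a dividing point (assigned here to the higher decile) is an assumption that the text does not specify.

```python
def decile_rank(percentile_rank):
    """Map a percentile rank (0 to 100) to a Decile Rank (1 to 10).

    Decile Rank 1 covers the lowest 10 per cent of the group and
    Decile Rank 10 the highest 10 per cent. A percentile rank exactly
    on a dividing point is placed in the higher decile (an assumption).
    """
    if not 0 <= percentile_rank <= 100:
        raise ValueError("percentile rank must lie between 0 and 100")
    return min(10, int(percentile_rank // 10) + 1)


for pr in (5, 45.0, 85, 95):
    print(pr, decile_rank(pr))
```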
Standard Score
This index too designates an individual’s position with respect to the total range and distribution of scores, but its meaning is less obvious than that of Percentile and Decile Ranks. The standard score indicates, in terms of standard deviations, how far a particular score is removed from the mean of the distribution. The mean is taken as the zero point and standard scores are given as plus or minus. If the distributions of scores of two or more tests are approximately normal, then the standard scores derived from one distribution may be compared with those derived from the other. The formula is
\[ z = \frac{X - M}{SD} \]
where X is an individual score, M is the mean of the distribution and SD its standard deviation. Assume, for example, that the mean Intelligence Quotient (IQ) of a group is 100 and that the standard deviation is 14. In this distribution, an individual with an IQ of 114 has a z-score of +1.0. Another individual with an IQ of 79 has a z-score of –1.5.
Ultimately, standard scores must be given percentile values to express their full significance. Since the proportion of cases encompassed within a given number of standard deviations in a normal distribution is mathematically fixed, it is always possible to translate a z-score into a percentile value. Thus, a person having a z-score of +1.0 has a percentile rank of approximately 84, that is, his score surpasses 84 per cent of the scores in the group. A person having a z-score of –1.5 has a percentile rank of approximately 7, surpassing only 7 per cent of the scores. To give an illustration, Table 11.3 shows several standard scores and their percentile values.

Table 11.3 Area under the Curve Corresponding to Given Standard Deviation

z-Score    Percentage of Cases from the Mean
0.10        4
0.15        6
0.25       10
0.35       14
0.50       19
0.75       27
1.00       34
1.50       43
2.00       48
2.50       49.4
3.00       49.8
3.70       50.0
Source: Author.
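The conversion from a raw score to a z-score, and from a z-score to a percentile rank under the normal curve, can be sketched as follows. The function names are illustrative assumptions; the normal cumulative distribution is obtained from the error function in Python's standard `math` module, and the result holds only to the extent that the scores are approximately normally distributed.

```python
import math


def z_score(x, mean, sd):
    """Standard score: distance of x from the mean in SD units."""
    return (x - mean) / sd


def percentile_from_z(z):
    """Percentage of cases below z in a normal distribution."""
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))


def area_from_mean(z):
    """Percentage of cases between the mean and z (as in Table 11.3)."""
    return abs(percentile_from_z(z) - 50.0)


# The IQ example from the text: mean 100, SD 14.
for iq in (114, 79):
    z = z_score(iq, 100, 14)
    print(iq, z, round(percentile_from_z(z), 1))
```

The function `area_from_mean` corresponds to the second column of Table 11.3; adding 50 to it (for positive z) or subtracting it from 50 (for negative z) gives the percentile rank.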
Figure 11.6 shows the percentage of scores (or cases) occurring above and below standard scores, with their corresponding PR values. The standard score is preferred by some psychologists as an index of relative rank because it is a well-defined property of the normal curve, representing a fixed and uniform number of units throughout the scale. Percentile and decile scores, on the other hand, are ranks in a group and do not represent equal units of individual differences.
Figure 11.6 The Normal Curve and Derived Scores
Source: Freeman (1962: 129).
T-Score
This variant of the standard score was suggested by McCall (1922). In the T-score method, the mean is set at 50, unlike in the standard score, where the mean is zero. To obtain a T-score, the standard score is multiplied by 10 and the product is added to the mean T-score of 50, so that T = 50 + 10z. Thus, a standard score of +1.00 becomes a T-score of 60, while one of –1.00 becomes a T-score of 40. The assumption in this technique is that nearly all the scores will lie within five standard deviations on either side of the mean. Since each SD is divided into 10 units, the T-score is based upon a scale of 100 units, thus avoiding negative scores and fractions. It is important to understand that the T-score found for an individual is relevant only to the distribution of scores of the group from which the values have been derived and with which his score is being compared.
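The T-score transformation described above is a simple linear rescaling of the standard score, which can be sketched as follows. The function name `t_score` is an illustrative assumption.

```python
def t_score(z):
    """Convert a standard score z to a T-score (mean 50, SD 10)."""
    return 50.0 + 10.0 * z


for z in (1.0, -1.0, 0.0):
    print(z, t_score(z))
```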
Stanine
This term, coined by psychologists in the Army and Air Force during World War II, is yet another variant of the standard score technique. In this method, the standard population is divided into nine groups; this ‘standard nine’ is termed ‘stanine’. Except for stanines 1 (lowest) and 9 (highest), each unit is equal to one-half of a standard deviation. A stanine of 5 represents the median group, defined as those whose scores are within ±0.25 SD of the mean, that is, a range of half a sigma at the centre of the distribution. Similarly, a stanine of 6 represents the group whose scores fall between +0.25 sigma and +0.75 sigma (SD). The meanings of the other stanines can be determined likewise, except for 1 and 9: the former represents all scores below –1.75 sigma and the latter all scores above +1.75 sigma. This single-digit system of scores has certain advantages for machine computation and it eliminates plus and minus signs. Other than these considerations, the stanine method does not have much advantage over other methods.
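The stanine boundaries described above (half-sigma bands centred on the mean, with open-ended bands below –1.75 sigma and above +1.75 sigma) can be expressed compactly. The following is a sketch only; the function name `stanine` is hypothetical, and assigning a score that falls exactly on a boundary to the higher stanine is an assumption.

```python
import math


def stanine(z):
    """Convert a standard score z to a stanine (1 to 9).

    Stanine 5 covers -0.25 to +0.25 SD; each neighbouring stanine is a
    further half SD wide; stanines 1 and 9 are open-ended.
    """
    raw = math.floor(2.0 * z + 5.5)   # half-SD bands centred on the mean
    return max(1, min(9, raw))        # clamp the open-ended extremes


for z in (-2.4, -0.1, 0.5, 1.0, 2.0):
    print(z, stanine(z))
```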
Age Norms
Sometimes it is desirable to express norms in terms of children’s age. Suppose, for instance, that a reasoning test is administered to children of all ages from five to 15 years. The average score can be calculated for each age group separately and plotted as shown in Figure 11.7. From Figure 11.7, it can be seen that a six-year-old child has a higher score than a five-year-old child. Age norms are useful for judging the capability of a child of a particular age. Let us take a child of 10 years who obtained a reasoning score of 50. The mean score for the 10-year-olds in the normative sample is 55. We can find the age group which corresponds to a reasoning score of 50: from the curve, it can be seen that a reasoning score of 50 corresponds to a child of nine years. Therefore, it can be said that this 10-year-old child has the reasoning score of a nine-year-old child. Age norms fit very well with the way we generally think about children in our day-to-day life.
Figure 11.7 Age Norm Curve
Source: Based on Nunnally (1970).
Age norms are also available for some intelligence tests; for instance, the Stanford-Binet test provides age norms. By comparing a child’s score with age norms, one can say whether the child is more or less intelligent than the average child of his age. The score expressed in terms of age norms in this way is called mental age (MA). In other words, if a 10-year-old child performs at the level of a 12-year-old child, then that 10-year-old child is said to have a mental age of 12 years. Similarly, if a child of 15 years performs at the level of a 10-year-old child, then this child is said to have a mental age of 10 years. The IQ was developed on this concept. The IQ is 100 times the ratio of MA to the chronological age (CA) of the testee, that is,
\[ IQ = \frac{MA}{CA} \times 100 \]
If a 10-year-old child does as well as a 15-year-old child on an intelligence test, then his IQ is 150. It is calculated as
\[ IQ = \frac{15}{10} \times 100 = 150 \]
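The ratio IQ defined above is easily computed. The sketch below is illustrative; the function name `ratio_iq` is an assumption, and both ages must be expressed in the same unit (for example, years).

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ: 100 times mental age divided by chronological age."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age / chronological_age * 100.0


print(ratio_iq(15, 10))   # the 10-year-old performing like a 15-year-old
print(ratio_iq(12, 10))   # the 10-year-old performing like a 12-year-old
```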
Grade Norms
Grade norms are similar to age norms. In this case, grade levels are taken in place of age on the X-axis of the graph, and the scores are plotted against them on the Y-axis, as shown in Figure 11.8.
Figure 11.8 Grade Norm Curve
Source: Author.
These norms are very useful in helping teachers understand how well students are progressing at each grade level.
WHAT IS THE DIFFERENCE BETWEEN NORMS AND STANDARDS?
Norms, simply stated, are the statistical average scores of a representative sample of a population. They are the average scores or values determined by the actual measurement of a group of persons who are representative of a specified population (Freeman 1955: 56). On the other hand, standards are value averages. Norms denote the actual achievement of the representative group on the chosen psychometric criteria under the prevalent socio-economic conditions, while standards refer to the desired levels of achievement that could be reached if the input conditions were improved or enhanced.
Consider an example. The statistical average score of Grade VIII students on a vocabulary test is 60 (out of 100). This figure, 60, can be taken as the ‘norm’ against which to compare the vocabulary proficiency of other Grade VIII students on the same test and under similar conditions. But keeping in mind the importance of verbal ability, we may set a ‘standard’ score of 75 for this proficiency. To reach it, we would be expected to emphasise vocabulary learning and enrichment, invest more in training in verbal proficiency and vocabulary development, and so on. In this example, 60 is the norm and 75 is the standard.
The norms of performance with respect to some psychological processes, measured by means of tests of intelligence and of specific aptitudes, are likewise dependent upon the conditions and opportunities present during the course of development. Norms of height, weight and other body measurements will also reflect the past conditions of nutrition and health (Freeman 1955: 56).
In this manner, norms can be an important policy guide. This is because standards presume a shift in norms, and norms can essentially be regarded as a statistical representation of the status quo. This status quo is challenged by setting higher standards. Thus, governmental policy and developmental measures can be conceived as a process of setting standards and driving norms. For example, consider the following diagram (Figure 11.9):
Figure 11.9 The Relationship between Training and Verbal Fluency Scores
Source: Author.
The figure shows that if we double the level of training and investment for enhancing verbal proficiency, then there is a 15-point growth in verbal fluency [(I, 30), (II, 45) and (III, 60)]; in other words, when the input is doubled (a 100 per cent increase), the norm shifts by 15 points. In this manner, the government and policy makers can estimate the level of investment required, based on the desired goal or set standard. For example, in Figure 11.9, if you want to achieve the desired goal, you need to shift the norms by 15 units from the current situation, which will demand the quadrupling of the current level of effort (in the form of training and investment). Policy makers can use the services of psychometricians and statisticians to calculate the norms for different social groups and, on that basis, conceive a common norm to eradicate socio-economic disparities. This common norm will serve as a standard, and whenever a new standard is conceived, the present standard becomes a norm.
TYPES OF NORMS USED IN SOME PSYCHOLOGICAL TESTS

Table 11.4 Types of Norms Used in Some Psychological Tests

1. Name of Test: 16PF® Fifth Edition
   Authors: Raymond B. Cattell, A. Karen Cattell and Heather E.P. Cattell
   Norms Used: A stratified random sample reflecting the 2000 US Census was used to create the normative sample, which consisted of 10,261 adults.

2. Name of Test: MMPI®–2 (Minnesota Multiphasic Personality Inventory®–2)
   Authors: Revision of the original MMPI® by Starke R. Hathaway and J. Charnley McKinley
   Norms Used: The MMPI–2 normative samples consist of 1,138 males and 1,462 females from diverse geographic regions and communities across the US. Individuals between the ages of 18 and 80 were recruited for inclusion in the samples.

3. Name of Test: QOLI® (Quality-of-Life Inventory)
   Author: Michael B. Frisch
   Norms Used: Normative data are based on 798 non-clinical adults sampled from 12 states in the Northeast, the South, the Midwest and the West of America. An attempt was made to match the 1990 US Census data as closely as possible.

4. Name of Test: GZTS (Guilford-Zimmerman Temperament Survey)
   Authors: J.P. Guilford and Wayne S. Zimmerman
   Norms Used: Norms for the GZTS instrument are based on representative samples of high school students, college students and adults in various occupational settings.

5. Name of Test: BASI™ (Basic Achievement Skills Inventory)
   Author: Achilles N. Bardos
   Norms Used: The BASI Comprehensive and Survey versions were standardised on a sample of more than 4,000 students (grades 3–12 and college) matched to the 2000 US Census demographic information. The Survey version was also normed on a sample of 2,000 adults (ages 18–80), matched to the 2000 US Census demographic data. The samples were stratified by race and ethnicity, age, gender, geographical region and socio-economic status.

6. Name of Test: Career Assessment Inventory™: The Enhanced Version
   Author: Charles B. Johansson
   Norms Used: The reference sample consisted of 900 employed adults and students. The sample was stratified by selecting cases from a larger sample, so that 75 females and 75 males had their highest score on each of the six different theme scales used in the test.

7. Name of Test: The Children’s Depression Inventory
   Author: Maria Kovacs
   Norms Used: The normative sample used for scoring the CDI was divided into groups based on age (ages 6–11, 12–17) and gender. The normative sample includes 1,266 public school students (592 boys, 674 girls), 23 per cent of whom were African–American, American Indian or Hispanic in origin. Twenty per cent of the children came from single-parent homes.

8. Name of Test: MACI™ (Millon™ Adolescent Clinical Inventory)
   Authors: Theodore Millon, Carrie Millon, Roger Davis and Seth Grossman
   Norms Used: The normative population of the MACI test consists exclusively of clinical adolescent patients, offering relevant comparisons. The sample includes 1,017 adolescents from outpatient, inpatient and residential treatment programmes in 28 states and Canada. The delineation of four distinct norm groups, given below, further enhances the test’s usefulness:
   • Males: 13–15 years old
   • Females: 13–15 years old
   • Males: 16–19 years old
   • Females: 16–19 years old

9. Name of Test: PDS® (Posttraumatic Stress Diagnostic Scale)
   Author: Edna B. Foa
   Norms Used: The PDS instrument was normed on a group of 248 men and women between the ages of 18 and 65, who had experienced a traumatic event at least one month before they took the test. The diverse normative base includes clients of women’s shelters, PTSD treatment clinics and Veterans Administration hospitals, in addition to staff of fire stations and ambulance corps.

10. Name of Test: CAARS (Conners’ Adult ADHD Rating Scales)
    Authors: C. Keith Conners, Drew Erhardt and Elizabeth Sparrow
    Norms Used: The non-clinical self-report form was normed on 1,026 individuals and the observer form on 943. Separate norms are available by gender and age-group intervals.
PART 3 Applications of Psychological Testing
12
Applications of Psychological Testing in Educational Setting
CHAPTER OUTLINE
1. Psychological testing in the field of education
2. Two practical demonstrations with scores and interpretation:
   (i) Career interest inventory
   (ii) Standard progressive matrices (SPM)
3. Directory of major tests used in the educational field:
   (i) Foreign Tests
   (ii) Indian Tests

LEARNING OBJECTIVES
After reading this chapter, you will be able to:
1. Understand how psychological tests are used in the field of education for assessment, guidance and enhancing the effectiveness of the teaching–learning process.
2. Understand the use and interpretation of two major tests used for assessment in the educational field.
3. Find information about important psychological tests used for assessment and guidance in the field of education.
4. Find web information regarding educational psychology, educational assessment, career guidance professionals and test providers.
EDUCATION, PSYCHOLOGY AND PSYCHOLOGICAL TESTING
Psychological tests play an important role in educational settings and their role is likely to continue to increase in the future. Assessment has always been an integral part of education. However, the current educational environment demands a greater emphasis on the psychological aspects of the teaching–learning process. The scientific understanding of behaviour and mental processes, as derived from psychology, can be operationalised, through the use of appropriate psychological tests, for the effective and empowering education that is needed in the twenty-first century.
Educators as Assess-mentors
In psychology and education, assessment is generally taken as synonymous with ‘evaluation’, which has essentially become a passive concept. Modern requirements are that educators play a more active role, assessing their students and also offering them counselling, guidance or mentoring. This will make assessment a more fruitful enterprise, so that ‘assessment’ is ultimately replaced by ‘assess-mentoring’. Assess-mentoring is a science which involves careful and objective measurement of the behavioural, psychological and educational attributes of students, and mentoring the students by providing them with active feedback and support based on this assessment. Such assessment begins at the entrance test level, and includes the assessment of abilities (intelligence, aptitude and achievement), personality assessment and assessment of career, guidance and placements. Each of these levels of assessment is discussed here with suitable examples.
TWO PRACTICAL DEMONSTRATIONS WITH SCORES AND INTERPRETATION
Career Interest Inventory
Career Interest Inventory: A Brief Profile
The Career Interest Inventory (1995) is a 170-item inventory which provides information about a person’s interest in 15 different occupational areas. The inventory was prepared by Gillian Hyde and Geoff Tricpey (1995), and published by The Psychological Corporation®. The Career Interest Inventory can be used alone or in combination with the Differential Aptitude Tests (DAT), if a measure of aptitude is required for guidance and counselling purposes. Responses to the items are given on a five-point Likert-type scale,1 through which the subject shows his level of liking or disliking for the activity mentioned in each item. A typical item is ‘Help families choose a home to buy’, and the responses are:
5 = like very much
4 = like a little
3 = unsure or undecided
2 = dislike a little
1 = dislike a great deal
The inventory requires approximately 30 minutes to complete, plus around 10 minutes for reading and explaining the instructions. The inventory is quite reliable, with Cronbach’s (1951) coefficient alpha for the 15 occupational groups ranging from 0.87 to 0.94, and the standard errors of measurement ranging from 2.4 to 3.6. Since the results of the Career Interest Inventory are intended for immediate use in career education and guidance, it is important for it to have content validity. For this, the Career Interest Inventory uses the Dictionary of Occupational Titles published by the US Department of Labor (1977).
The test score profile of the Career Interest Inventory helps to focus career-related discussions between the person who takes the test and the career adviser. For any beneficial outcome, it is important that the person who has taken the test be fully involved in the interpretation of the test results. The interest scores need to be interpreted keeping in mind the person’s personal goals and aspirations. The test scores acquire greater significance if they are interpreted along with the results of other tests, such as aptitude, educational achievement and personality tests. Information about a person’s work experience and leisure pursuits makes the test scores more meaningful, as all this information helps in career planning for the person.
As we can see in Figure 12.1, a person ‘ABC’ has a well-defined profile, where his level of interest is high for some occupational groups, medium for others and low for the remaining groups. The areas of high interest are clearly differentiated from those with medium and low interest. The profile indicates a high interest in the areas of fine arts, mathematics and science, social science and management. The related educational subjects for these occupational groups show a great degree of commonality.
The occupational group of fine arts indicates that the related educational subjects are speech or drama, photography, journalism, creative writing, music or art, and English or foreign language. The related educational subjects for mathematics and science are computer programming, mathematics or science, and creative writing, along with other subjects like marketing or sales, business law or management, and bookkeeping. All these areas are related to handling of information and creativity. Both these areas relate very well to his interest in pursuing a course in computer animation, as it involves computer programming as well as creativity. The Career Interest Profile not only helps in gaining self-awareness, but also helps in constructing a list of areas for career exploration and identifying specific courses of action. Educational planning, career choices and their impact on the person’s lifestyle and roles, as well as strategies for career explorations are important areas of discussion during the process of career counselling and this test is an excellent tool for discussion and planning of these areas.
Figure 12.1 Career Interest Inventory Profile Sheet
Source: Hyde and Tricpey (1995).
Standard Progressive Matrices (SPM) (A Part of Raven’s Progressive Matrices)
The Raven’s Progressive Matrices test is an intelligence test widely used in research and applied settings (Raven et al. 2003). It consists of 60 items arranged in five sets (A, B, C, D and E) of 12 items each. Each item contains a figure with a missing piece (see Figure 12.2). Below the figure are either six (sets A and B) or eight (sets C through E) alternative pieces to complete the figure, only one of which is correct. Each set involves a different principle or ‘theme’ for obtaining the missing piece. Within each set, the items get progressively harder, requiring greater cognitive capacity to encode and analyse. The raw score is typically converted to a percentile rank by using the appropriate norms.
Figure 12.2 A Sample Item from the SPM
Source: Raven et al. (2003).
Raven’s Progressive Matrices was designed primarily as a measure of Spearman’s ‘g’ (1904).2 There are no time limits, although most subjects complete the test in 20–25 minutes. The instructions are simple and are given orally. There are three different tests for different ability levels, as follows:
1. Coloured Progressive Matrices (younger children and special groups),
2. Standard Progressive Matrices (average six-year-olds to 80-year-olds), and
3. Advanced Progressive Matrices (above average adolescents and adults).
Raven’s Progressive Matrices:
1. has good test–retest reliability, between .70 and .90 (however, for low score ranges, the test–retest reliability is lower),
2. has good internal consistency coefficients, mostly in the .80s and .90s,
3. has correlations with verbal and performance tests which range between .40 and .75,
4. has fair concurrent validity in studies with mentally retarded groups, and
5. has lower predictive validity than verbal intelligence tests for academic criteria.
As can be seen from Figure 12.3, Ms XYZ has a total score of 52. When this score is assessed for discrepancy, her discrepancy scores of 0, 0, +1, 0 and –1 indicate that her test scores can be accepted as genuine, as they fall within the acceptable range of +2 to –2. The 2000 Edition of the SPM has Indian norms for Pune and Mumbai, and these were the norms used for the assessment of the subject. For her age, the subject’s score falls at the 90th percentile, putting her in Grade II+, which shows that the subject is definitely above average in intellectual capacity.
Figure 12.3 Response Sheet of Standard Progressive Matrices
DIRECTORY OF MAJOR TESTS USED IN THE EDUCATIONAL FIELD
Foreign Tests
Tests of Ability

1
Name: Columbia Mental Maturity Scale (CMMS).
Authors: Bessie B. Burgemeister, Lucille Hollander Blum and Irving Lorge.
Applicability: 3–6 to 9–11 years.
Measures: General reasoning ability.
No. of items: Ninety-two pictorial and figural classification items arranged in a series of eight scales or levels. Each level consists of 51 to 65 items.
Reliability: Split-half reliability for each applicability group ranges from 0.85 to 0.91, with a median of 0.88. Test–retest reliability for three different groups was 0.85.
Validity: Correlations range from 0.31 to 0.61 between the CMMS and scores from a standardised achievement test.
Availability: Psychological Corporation, 304 East 45th St, New York–10017.
2
Name: Moving Embedded Figures Test (MEFT).
Author: Jacqueline Herkowitz.
Applicability: 5 to 12 years.
Measures: Figure-ground perceptual ability.
No. of items: 20 minute six mm individually administered filmed test; 12 figures (four heavy density, four medium density and four light density figures); in all 27 items.
Reliability: Test–retest correlation coefficients: –0.65, 0.62, 0.71 and 0.79. Reliability of the mean or total of 27 items was 0.94. Estimated reliability of a single item was 0.35.
Availability: Jacqueline Herkowitz, Department of Education, Ohio State University, 309 Pomerene Hall, Columbus, Ohio–43210.
3 Name: Perceptual Acuity Test (PAT). Author: Harrison G. Gough.
190
Applied Psychometry
Applicability: 5 to 75 years and up. Measures: 1. Field independence and analytic perceptual ability. 2. Judgements concerning relative size, length, area, contour and equivalence of geometric forms. No. of items: Thirty multiple choice items presented by 35 mm slides (25 are based on standard optical illusions, five are illusion free). Reliability: Odd–even reliability is 0.70. Validity: Correlations of 0.41 with accuracy of the Witkin Rod-and-Frame Test; of 0.30 with Crutchfield’s adaptation of the Gottschaldt embedded figures; and of 0.28 with the Case Ruch Survey of Space Relations. Availability: Harrison G. Gough, Department of Psychology, University of California, Berkeley, California–94720.
4 Name: Minnesota Child Development Inventory. Authors: Harold Ireton and Edward Thwing. Applicability: Mothers of children aged 12 months to 6 years. Measures:
1. Behaviour of children in the first six-and-a-half years of life,
2. Representation of developmental skills,
3. Observability by mothers in real-life situations,
4. Descriptive clarity, and
5. Age discriminating power.
No. of items: 320. Reliability: Split-half coefficients ranged from 0.68 to 0.90. Validity: 1. Systematic increase in mean score with increase in age. 2. Low incidence of children scoring appreciably below their age level. Availability: Behaviour Science Systems, 5701 Hawfees Terrace, Minneapolis, Minnesota–55436.
5 Name: Thurstone Test of Mental Alertness. Authors: Thelma Gwinn Thurstone and L.L. Thurstone.
Applicability: Grades 9 to 12 and adults. Measures: Verbal and mathematical abilities. No. of items: 126 items of four types, arranged spirally and in order of difficulty. Reliability: Test–retest reliability coefficients range from 0.84 to 0.96 and 0.92 to 0.95. Validity: Correlations of 0.40 to 0.77 with grade-point averages, 0.83 to 0.85 with Iowa Test of Educational Development, 0.71 to 0.73 with SRA High School Placement Test. Availability: Science Research Associates, Inc., 259, East Erie St, Chicago II, Illinois.
6 Name: Lanyon Stuttering Severity (SS) Scale. Author: Richard I. Lanyon. Applicability: 14 years and up. Measures: Severity of stuttering. No. of items: 64 true–false items. Reliability: Kuder-Richardson reliability 0.93. Validity: Cross-validation mean for stutterers is 39.2 (SD = 9.2), and mean for non-stutterers is 12 (SD = 7.8). Availability: Lanyon (1967: 836–43).
7 Name: Index of Perceptual-Motor Laterality. Author: Allan Berman. Applicability: 6 to 12 years. Measures: Laterality of hand, eye, ear and foot. No. of items: 54. Reliability and Validity: The index as a whole correlates 0.81 with measures of non-verbal intelligence. Availability: Berman (1973: 599–605).
8 Name: General Aptitude Test Battery (GATB). Author: David J. Weiss. Applicability: 16 years and over, grades 9 to 12 and adults. Measures: Used in vocational guidance; measures capacities to learn various jobs. Reliability and Validity: The GATB validity data indicates that individuals high on an ability today are also likely to be rated high in ability to perform related tasks tomorrow (or even the same day). Availability: David J. Weiss, Department of Psychology, University of Minnesota, Minneapolis, Minnesota.
9 Name: Bennett Mechanical Comprehension Test. Author: George K. Bennett. Applicability: Grades 9 to 12 and adults. Measures: The ability to perceive and understand the relationship of physical forces and mechanical elements in practical situations (Chadha 1996). Reliability: Internal consistency coefficients vary from 0.81 to 0.93. Validity: Face validity. 1. Correlations above 0.30 between test scores on Forms S and T and job ratings. 2. The predictive validities varied from 0.12 to 0.64, with a median of 0.36. Availability: Psychological Corporation, 304, East 45th St, New York–10017.
10 Name: Stanford Achievement Test: High English and Spelling Tests. Authors: Eric F. Gardner, Jack C. Merwin, Robert Callis and Richard Madden. Applicability: Grades 9 to 12. Measures: Mechanics, style and paragraph organisation. No. of items: 80 items spread over the above-mentioned three sections; 60 items in spelling tests, each consisting of four unrelated words, one misspelled. Reliability: Coefficients of 0.87 to 0.94. Validity: Face validity. Availability: Harcourt Brace Jovanovich, Inc., Tarrytown, N.Y.
11 Name: Fairview Social Skills Scale (FSSS). Authors: Robert T. Ross and James S. Giampiccolo, Jr. Applicability: 6 to 12.5 years. Measures:
1. Self-Help Skills (Locomotion, toilet training, dressing, eating and grooming),
2. Communication,
3. Social Interaction,
4. Occupation, and
5. Self-Direction.
Primarily used with the mildly and moderately retarded. No. of items: 36. Reliability: Interrater reliabilities were 0.81 for total score and ranged from 0.68 to 0.84 for the subscales. Test–retest reliabilities were 0.89 for the total scale and ranged from 0.77 to 0.90 for the subscales. Validity: Correlation of FSSS scores with Vineland Social Maturity Scale was 0.53. Availability: Research Department, Fairview State Hospital, 2501 Harbor Boulevard, Costa Mesa, California–92626.
12 Name: Verbal Maturity Scale. Authors: Albert G. Packard, Elaine E. Lee and M. Adele Mitzel. Applicability: 3, 4 and 5 years. Measures: 1. A child’s ability to understand and to respond verbally to questions about himself and his environment, and 2. A child’s general fund of knowledge and his ability to recall and express that knowledge without the use of visual cues. No. of questions: Sixty. Reliability and validity: Content and predictive validity. The correlations were in the high 0.70s and 0.80s. Availability: Baltimore City Public Schools, 3 East 25th St, Baltimore, Maryland–21218.
13 Name: Original Written Language Inventory (OWL). Author: Helen B. Craig. Applicability: 6 to 14 years. Measures: The development of vocabulary, syntax and functional linguistic categories of young deaf children, as demonstrated by their performance on a narrative composition task. No. of items: Four sets of stimuli, two sets geared to the vocabulary most familiar to young children (6 to 8 years) and two sets for older children (9 to 14 years). The stimuli are seven-picture sequences depicting simple stories. Reliability and validity: Validity of the test is indicated by the degree of increase from one age level to the next. Availability: Western Pennsylvania School for the Deaf, 300 East Swissvale Avenue, Pittsburgh, Pennsylvania–15218.
14 Name: Bowen Language Behaviour Inventory. Author: Mack L. Bowen. Applicability: Mental age from 3 to 6 years. Measures:
1. Language development/behaviour in developmentally retarded children,
2. Language based on criterion referenced testing procedures,
3. Language ability on a receptive-motoric-expressive basis, and
4. A number of language behaviours, based on the formulated theoretical constructs.
No. of items: Eight subtests. Reliability and validity: Reliability coefficients (Kuder-Richardson formula) and Point-Biserial correlations were established. Availability: Mack L. Bowen, 1502 West Hovey, Normal, Illinois–61761.
15 Name: Arithmetic Concept Individual Test (ACIT). Authors: Gerald Melnick, S. Freeland and Bary Lehrer. Applicability: Primary and intermediate level educable mentally retarded children. Measures: 1. The process by which students attack quantitative relations, 2. Why particular children do not progress in their arithmetic skills, 3. Tasks related to seriation, classification, class inclusion, differentiation of length and number concepts, one-to-one correspondence, conservation of number and spatial relations. Reliability: The highest correlation (0.84) was between seriation and conservation of number. Validity: High reported. Availability: Curriculum Research and Development Center in Mental Retardation, Department of Special Education, Ferkauf Graduate School of Humanities and Social Sciences, Yeshiva University, 55th Avenue, New York–10003
16 Name: Differential Aptitude Test. Authors: George K. Bennett, Harold G. Seashore and Alexander G. Wesman. Applicability: Grades 8 to 12 and adults.
Measures: Eight tests: verbal reasoning, numerical ability, abstract reasoning, clerical speed and accuracy, mechanical reasoning, space relations, Language Usage I: Spelling, and Language Usage II: Grammar. LU-I and LU-II are achievement tests. Reliability and validity: Correlations with other standardised aptitude (including Primary Mental Abilities [PMA] and General Aptitude Test Battery [GATB]), achievement and interest measures are reported. These are all (with one exception) zero-order coefficients representing the predictive value of DAT scores over a period of a few weeks to several years. Availability: The Psychological Corporation, 304 East 45th St., New York–10017.
17 Name: The Guilford-Zimmerman Aptitude Survey. Authors: J.P. Guilford and Wayne S. Zimmerman. Applicability: Grades 9 to 16 and adults. Measures: Aptitude. Reliability: Reliability coefficients are high, ranging from 0.74 to 0.94. Validity: Factorial validities range from 0.52 to 0.89. Availability: Sheridan Psychological Services, Inc., P.O. Box 837, Beverly Hills, California–90213.
18 Name: Peabody Individual Achievement Test (PIAT). Author: Joseph L. French. Applicability: Kindergarten to grade 12. Measures: Achievement in the areas of mathematics, reading, spelling and general information. No. of items: Five sub-scores with a total score. Reliability: Test–retest reliability; total test: 0.82–0.92; mathematics: 0.52–0.84; reading recognition: 0.81–0.94; reading comprehension: 0.61–0.78; spelling: 0.42–0.78 and general information: 0.70–0.88. Validity: Correlations between PIAT total scores and Peabody Picture Vocabulary test (PPVT) IQs range from 0.53 to 0.19 with median 0.68. Availability: Joseph L. French, Prof. of Special Education and Educational Psychology, The Pennsylvania State University, University Park, Pennsylvania.
19 Name: Infant Cognitive Development Scale. Author: Albert Mehrabian. Applicability: Birth to 2 years.
Measures: Cognitive development in relation to subsequent linguistic functions (based on Piaget’s concepts). No. of items: 28. Reliability and validity: Median, r = 0.93 (high level of consistency); test–retest reliability = 0.72; and coefficients of homogeneity = 0.93. Availability: Albert Mehrabian, University of California, Los Angeles, Department of Psychology, California–90024.
20 Name: Teacher’s Rating Questionnaire (TRQ). Author: E.N. Wright. Applicability: Kindergarten to grade 9. Measures: Diverse properties of achievement and character of the teacher–child interaction (on a Likert-type scale). Reliability: The estimated reliability coefficient is 0.73 for repeated administrations of the same version and 0.66 between two versions of the TRQ. Validity: Concurrent validation between the various items of the language and mental subscales of the TRQ and five subtests of the Metropolitan Achievement Test (MAT): word knowledge, word discrimination, reading, spelling and arithmetic. Availability: Research Department, Board of Education for the City of Toronto, 155 College St., Toronto, Ontario, Canada–M5T 1P6.
21 Name: Personal Values Inventory (PVI). Authors: George E. Schlesser, John A. Finger and Thomas Lynch. Applicability: Grades 12–13. Measures: College academic achievement. Reliability: Median coefficients ranged from 0.75 to 0.91. Validity: Multiple correlation (PVI scale and Scholastic Aptitude Test [SAT]) was approximately 0.60, ranging in various samples from 0.42 to 0.79. Availability: Colgate University Testing Service, Hamilton, NY–13346.
22 Name: Group Test for Assessing Handedness. Authors: Herbert F. Crivitz and Karl E. Zener. Applicability: Kindergarten and above.
Measures: A quantitative measure for hand dominance. No. of items: 14. Reliability and validity: The questionnaire has been used in a number of research studies. Availability: H.F. Crivitz, Veterans’ Administration Hospital, Durham, North Carolina–27705.
23 Name: Iowa Test of Pre-school Development (ITPD). Author: Ralph Scott. Applicability: 2 to 5 years. Measures: Pre-school achievement. No. of items: Four: Language, visual motor, memory, concepts. Reliability: Split-half reliability: 0.94–0.98. Validity: Concurrent validity = 0.64; predictive validity = 0.58. Availability: Go-Mo Products, Inc., 1906 Main, Cedar Falls, Iowa–50613.
24 Name: Bayley Scale of Infant Development. Author: Nancy Bayley. Applicability: 2 to 30 months. No. of items: 81. Measures: Infant development: mental scale, motor scale and infant behaviour (rating scale). Reliability: Split-half reliability coefficients: 0.81 to 0.93 (median 0.88) on the mental scale and from 0.68 to 0.92 on the motor scale. Validity: Correlations with Stanford-Binet: 0.57. Availability: The Psychological Corporation, 304 East 45th St., New York–10017.
25 Name: Minnesota Test for Differential Diagnosis of Aphasia. Author: Hildred Schuell. Applicability: Adults. Measures: The nature and extent of the language deficit in aphasic patients. No. of items: A battery of 47 tests in five sections, each designed to investigate a different aspect of functioning. Reliability: Test–retest reliability is high for diagnostic categories. Availability: University of Minnesota Press, 2037 University Ave., Minneapolis, Minnesota.
26 Name: Children’s Individual Test of Creativity (CITOC). Authors: N.S. Metfessel, Marilyn E. Burns and J.T. Foster. Applicability: Pre-school through elementary grades. Measures:
1. Sensitivity to problems,
2. Fluency of thinking,
3. Flexibility,
4. Originality,
5. Redefinition, and
6. Elaboration.
No. of items: Twelve: six verbal subtests and six performance subtests. Reliability: Reliability coefficients for the verbal, performance and total scores were 0.86, 0.78 and 0.84. Validity: Burns (1969, cited in Chadha 1996) found the relationship between total Children’s Individual Test of Creativity (CITOC) scores to be in the slight-to-low positive range often found in correlation studies between intelligence and creativity test scores, ranging from 0.11 at age four to 0.33 at age nine. Availability: Marilyn Burns, 3858 Buena Park Dr., Studio City, CA–91604.
27 Name: Torrance Tests of Creative Thinking (TTCT). Author: E. Paul Torrance. Applicability: Kindergarten through graduate school. Measures: 1. Four aspects of creative thinking: fluency, flexibility, originality and elaboration. 2. Scores for each aspect: verbal and figural. No. of items: Three figural and seven verbal tasks. Reliability: Retest reliabilities range from 0.50 to 0.93 over a one- to two-week period and from 0.35 to 0.73 over a three-year period. Validity: TTCT is related to academic intelligence and educational achievement. Concurrent and predictive validity are established. Availability: Personnel Press Inc., 20, Nassau St., Princeton, N.J.
28 Name: Behaviour Maturity Checklist (BMC). Author: Donald C. Soule. Applicability: All ages, mental age 6 to 54 months. Measures: Behaviour maturity of severely and profoundly retarded children and adults. No. of items: 15. No. of areas: Seven: grooming, eating, toileting, language, social interaction, total self-care and total interpersonal skills. Reliability: 0.93 (retest after six months). Availability: Psychology Research and Evaluation Section, O’Berry Center, Goldsboro, North Carolina–27530.
29 Name: Wechsler Preschool and Primary Scale of Intelligence (WPPSI). Author: David Wechsler. Applicability: 4 to 6.5 years. Measures: Intelligence. Items: Five, regular verbal subtests and three of the five are performance subtests. Reliability: Test–retest reliabilities: 0.86, 0.89 and 0.92 for verbal, performance and full scale IQs. Validity: Correlations with the WPPSI Full Scale IQ ranged from 0.58 (PPVT) to 0.75 (S-B). Correlations with Stanford-Binet are higher for the verbal than for the performance IQ of WPPSI. Availability: Australian Council for Educational Research, 369, Lonsdale St., Melbourne C.L. Victoria, Australia.
30 Name: Wechsler Intelligence Scale for children. Authors: J.A. Radcliffe and F.E. Trainer. Applicability: 5 to 15.5 years. Measures: Verbal and performance tests and individual subtests. Reliability and validity: Correlations as high as 0.70 between Wechsler scales and the visually evoked response. Availability: Australian Council for Educational Research, 369, Lonsdale St., Melbourne C.I., Victoria, Australia.
31 Name: Culture Fair Intelligence Test (a measure of ‘g’), Scale 2, Forms A and B. Authors: R.B. Cattell and A.K.S. Cattell. Applicability: Children of 2 to 13 years and above. Measures: Intelligence. Items: 46; Test 1–4. Reliability: 0.70, 0.86, 0.87 and 0.92. Availability: Institute for Personality and Ability Testing, USA.
32 Name: Non-verbal Test of Intelligence. Author: G.H. Napde. Applicability: School students. Measures: Intelligence. Items: 80 multiple-choice items. Reliability: 0.70–0.94. Validity: 0.35–0.67. Availability: Institute of Vocational Guidance, 3, Cruickshank Road, Bombay.
Tests of Interest
33 Name: California Occupational Preference Survey (COPS). Authors: Robert R. Knapp, Bruce Grant and George D. Demos. Applicability: Grades 9 to 16 and adults. Measures: Occupational information relevant to a subject’s measured interests. Reliability: Stability coefficients obtained over one- and two-year periods were 0.66 and 0.63, respectively. Validity: No correlations between the COPS and existing interest inventories, such as Kuder or Strong, were found. Availability: Educational and Industrial Testing Service, P.O. Box 7234, San Diego, Calif.
34 Name: Kuder General Interest Survey. Author: G. Frederic Kuder.
Applicability: Grades 6 to 12. Measures: It is a revision of Kuder Preference Record, Vocational Form C: outdoor, mechanical, computational, scientific, persuasive, literary, musical, social service, clerical and verification. No. of items: 10 occupational scales and verification. Reliability: K.R. reliabilities are above 0.70 for all the scales. Availability: Science Research Associates Inc., 259, East Erie St., Chicago, Illinois–60611.
35 Name: Strong-Campbell Interest Inventory. Authors: Edward K. Strong, Jr and David P. Campbell. Applicability: 16 years and over. Measures: Applied behaviour, career counselling and personnel selection. No. of items: Six general occupational themes, 23 basic interest scales, 124 occupational scales and two special scales (total items 325). Reliability: Test–retest reliability is 0.90 for 14 days and 0.8 for 30 days. Validity: High reported. Availability: Stanford University Press, Stanford, California–94305.
36 Name: College Interest Inventory. Author: Robert W. Henderson. Applicability: Grades 11–16. Measures: 15 areas: agriculture, home economics, literature and journalism, fine arts, social science, physical science, biological science, foreign language, business administration, accounting, teaching, civil engineering, electrical engineering, mechanical engineering and law. No. of items: 45 main items. Within each of these are 15 sub-items, consisting of occupations, areas of study or activities. Reliability: Internal consistency is excellent. Validity: Inter-scale correlations support the relative independence of the 15 curricular groupings. Availability: Henderson, Personal Growth Press, 20, Nassau St., Princeton, N.J.–08540.
Other Foreign Tests Useful for Educational Purpose
37 Name: Stanford University Evaluation Scales. Authors: Emily F. Garfield J. and Richard H. Blum.
Applicability: Grades 2 to 12. Measures: Reported individual drug use patterns in several age groups. Reliability and validity: Involved of student’s responses through contact with those he considers friends, models, and examples. Availability: Garfield and Blum (1973).
38 Name: Comprehensive Career Assessment Scale. Authors: Stephan L. Jackson and Peggy M. Goulding. Applicability: Grades 3 to 7, 8 to 12 teachers. Measures: Used for assessment of needs in career, education and assists in programme evaluation. No. of items: 75 occupational titles, five in each of 15 occupational clusters. Reliability: Alpha coefficients for the five point cluster scale range from 0 to 0.75. Availability: Learning concepts, California, USA.
39 Name: School Behaviour Profile. Authors: Bruce Balow and Rosalin A. Rubin. Applicability: 5 to 18 years. Measures: 1. A child’s behaviour as observed by the classroom teacher or other observers in routine school activities. 2. Problem behaviour or personal adjustment difficulties in the environment of the school.
Indian Psychological Tests
Ability Tests
40 Name: Divergent Production Abilities Test. Author: K.N. Sharma. Applicability: Children, adolescents and adults. Measures: Divergent thinking.
Reliability: Test–retest established: 0.67–0.85. Validity: Product-moment correlation with the Baqer Mehdi test of creative thinking. Availability: Ankur Psychology Agency, 22/481, Indira Nagar, Lucknow–16.
41 Name: Differential Aptitude Test (DAT). Author: J.M. Ojha. Applicability: Adolescents and adults. Measures: Adapted from the DAT published by Bennett, Seashore and Wesman. Items: Seven subtests. Reliability: Established. Validity: Established. Availability: Mansayan, New Delhi.
42 Name: Test of General Mental Ability. Author: M.C. Joshi. Applicability: 12–20 years. Measures: Mental ability. No. of items: 100. Reliability: Test–retest 0.88. Validity: Correlation with other tests: 0.8. Availability: Rupa Psychological Corporation, Sarakuan, Varanasi.
43 Name: Verbal Test of Scientific Creativity (VTSC). Authors: V.P. Sharma and J.P. Shukla. Applicability: Students. Measures: Scientific thinking abilities. Items: Twelve, belonging to four subtests. Reliability: Test–retest reliability. Validity: Inter-factor correlations reported 0.95 to 0.99. Availability: National Psychological Corporation, Agra.
44 Name: Draw-a-Man Test. Author: Dr Pramilla Pathak. Applicability: Small children. Measures: Growing mental status. Reliability: Test–retest: 0.62–0.96. Validity: External criterion 0.13 to 0.69; correlations on intelligence and reading ability: 0.13 to 0.24; correlation with Goodenough’s Draw-a-Man scale: 0.87. Availability: Pramilla Pathak, Dandia Bazar, Baroda.
45 Name: A Group Test of General Mental Ability (A five-point scale for adults). Authors: S. Jalota and R.K. Tandon. Applicability: 15 years and above belonging to English speaking areas. Measures: General mental ability. No. of items: Test booklets. Reliability: Split-half, K.R. formula, item reliability index and variance. Validity: Validity with exam marks: 0.35. Availability: Karuna Tandon, 2, K.G.K. Building, Rine Par, Muradabad–244001.
46 Name: Test of ‘g’ culture fair, scale 3, Form A and B. Author: S. Rao. Applicability: Adolescent and Adult. Measures: All round general ability. No. of items: Same as in the original test by Cattell. Reliability: Repeat-test, two form 0.45–0.94. Validity: Concurrent validity (0.10) with RPM; with Mohsin test: 0.39–0.46. Availability: Psycho-centre, T-22, Green Park, New Delhi.
47 Name: Passi Tests of creativity (verbal and non-verbal). Author: B.K. Passi. Applicability: Grade IX–XI (14 – 16 + 1 years). Measures: Creativity.
No. of items: Open-ended items, six tests (four verbal and two non-verbal). Reliability: Test–retest: 0.68 to 0.96; split-half of three tests: 0.88, 0.50, 0.51. Validity: Factorial validity, correlation with intelligence and achievement tests are significant for more than half of the tests. Availability: National Psychological Corporation, Agra.
48 Name: Non-verbal Test of Creative Thinking. Author: B. Mehdi. Applicability: Grades VII and VIII. Measures: Ability to deal with figural content in a creative manner. No. of items: Three types of activities. Reliability: Test–retest: 0.93 to 0.946; inter-score: 0.91–0.986. Validity: Factor validity, external validity, concurrent validity. Availability: Qamar Fatima, 3, Ghefooran Manzil, Dudhpur, Aligarh.
49 Name: Verbal Test of Creative Thinking. Author: B. Mehdi. Applicability: Subjects from middle school up to graduate level. Measures: Creative factors: fluency, flexibility, originality and elaboration. No. of items: Four activities have three items each. Reliability: Test–retest: 0.896–0.959. Validity: Criterion validity: 0.32–0.40. Availability: Qamar Fatima, 3, Ghefooran Manzil, Dudhpur, Aligarh.
50 Name: OTIS Self-Administering Test of Mental Ability, Form B. Authors: Hindi adaptation by N.S. Chauhan, G. Tiwari, Original by A.M. OTIS and T.N. Bures. Applicability: Students of intermediate class. Measures: Intelligence. No. of items: 11 types of items arranged in mixed form. Validity: Form B coefficient of correlation, PE: 0.002; Form C coefficients of correlation: 0.036. Reliability: Parallel test and self-correlation. Availability: Agra Psychological Research Cell, Agra.
51 Name: Social Intelligence Scale (SIS). Authors: N.K. Chadha and Usha Ganesan. Applicability: Adult. Measures: Patience, cooperativeness, confidence, sensitivity, recognition of social environment, tactfulness, sense of humour, memory. No. of items: 66. Reliability: Split-half: 0.89–0.96; retest: 0.84–0.97. Validity: Cross-validation: 0.75–0.95. Availability: National Psychological Corporation, 4/230, Kacheri Ghat, Agra.
Other Indian Tests Useful for Educational Purpose
52 Name: Vocational Interest Inventory. Author: V. George Mathew. Applicability: English knowing persons in India. Measures: Originally developed to measure the vocational interest pattern of college students in Kerala who have completed two years of college education. The seven scales are outdoor, mechanical, clerical, persuasive, aesthetic, social work and scientific. No. of items: 92 forced choice items. Reliability: Odd–even reliabilities range from 0.78 to 0.90. Validity: Correlations of 0.26 to 0.59 are reported with an adaptation of the Study of Values. Significant differences were obtained between college students specialising in nine fields of study. Availability: Mathew V. George, Department of Psychology, Kerala University, Kariavattom.
53 Name: Teacher Attitude Inventory. Author: S.P. Ahluwalia. Applicability: Adult males (student, teachers and practicing teachers). No. of items: 150. Measures: This is constructed in Hindi and English, with two equivalent forms. Both of these forms have six sub-scales, namely, attitudes towards (a) teaching as a profession, (b) classroom teaching, (c) child-centred practices, (d) educational process, (e) pupils and (f) teachers. Reliability: Internal consistency. Validity: Face and content validity.
Availability: Ahluwalia, S.P., Development of a teacher attitude inventory and a study of the change in professional attitudes of student–teachers: Project Reports; NCERT research project (Varanasi).
NOTES
1. A Likert-type scale is an assessment technique in which the responses of the test taker are measured on a continuum, for example:
Question: Do you think cigarette smoking causes lung cancer?
Responses:
• Strongly agree (5)
• Agree (4)
• Undecided (3)
• Disagree (2)
• Strongly disagree (1)
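As a rough illustration of how such responses become numbers, the sketch below maps the five response labels from the note to their point values and adds them up across items. The sample answers and the helper name score_responses are hypothetical, and summing across several items is an assumed (though common) way of forming a scale score; the note itself describes only a single question.

```python
# Point values taken from the response list in the note above.
LIKERT_POINTS = {
    "Strongly agree": 5,
    "Agree": 4,
    "Undecided": 3,
    "Disagree": 2,
    "Strongly disagree": 1,
}


def score_responses(responses):
    """Convert response labels to points (hypothetical helper)."""
    return [LIKERT_POINTS[label] for label in responses]


# Hypothetical answers of one test taker to three Likert-type items.
sample_responses = ["Agree", "Strongly agree", "Undecided"]
points = score_responses(sample_responses)
total_score = sum(points)
print(points, total_score)
```

A label outside the five listed options raises a KeyError, which is a deliberate choice here so that mistyped responses are not silently scored.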
2. According to Charles Spearman’s two-factor theory, intelligence has two components: a general factor (which he called ‘g’) and specific factors (‘s’).
13
Applications of Psychological Testing in Counselling and Guidance
CHAPTER OUTLINE
1. Psychological testing for better health, adjustment and counselling
2. Two practical demonstrations of tests with scores and interpretation:
(i) Dimensions of Temperament Scale (DTS)
(ii) Family Environment Scale (FES)
3. Directory of major tests used in health psychology and counselling
LEARNING OBJECTIVES
After reading this chapter, you will be able to:
1. Understand how psychological tests can be used for better health, adjustment and counselling.
2. Understand the use and interpretation of two major tests used to assess health, adjustment and related counselling.
3. Find the information about important psychological tests used to assess health, adjustment and doing counselling.
4. Find the web information regarding health psychology, adjustment, counselling professionals and test providers.
PSYCHOLOGICAL TESTING FOR BETTER HEALTH, ADJUSTMENT AND COUNSELLING
Health is a major concern today, and health psychology is an emerging force in the field of psychology. Health refers to a state of complete physical, mental and social wellbeing, and not merely the absence of some disease (WHO 1948, cited in Atwater and Duffy 1998). However, apart from physical, mental and social components, health also has emotional, moral, aesthetic and spiritual components. A complete sense of well-being in terms of these components is increasingly elusive in modern life. In such a scenario, psychological counselling for the promotion of better health, adjustment and integration becomes imperative. Life-threatening variables like stress, anxiety, and so on, often need a three-stage strategy to diagnose and deal with them effectively, and psychological tests find useful applications at all these stages. The three stages (diagnosis, treatment/coping, and termination and follow-up) are applicable to various types of counselling, like career counselling, family counselling, and so on. Assessment of quality-of-life, especially health-related quality-of-life, is an important consideration in this regard. The two major approaches to quality-of-life assessment are the psychometric approach and the decision theory approach. The psychometric approach attempts to provide separate measures for the different dimensions of quality-of-life. Perhaps the best-known example of the psychometric approach is the Sickness Impact Profile (SIP) given by Bergner et al. in 1981. The decision theory approach attempts to weigh the different dimensions of quality-of-life in order to provide a single expression of it. Other important tests for health and quality-of-life are the Quality-of-Well-being Scale (Kaplan and Anderson 1990), the McMaster Health Index Questionnaire (Chambers 1996), the SF-36 (Ware et al. 1995) and the Nottingham Health Profile (McEwen 1992).
The important psychological measurements for family counselling are the Marital Satisfaction Inventory (MSI) (Snyder 1997), Premarital Personal and Relationship (PREPARE) Issues, Communication and Happiness Scale by Olson (2004), Dyadic Adjustment Scale (DAS) by Spanier (1989), Pre-marriage Awareness Inventory (PAI) by Velander (1993), Relationship Evaluation (RELATE) by Loyer-Carlson et al. (2002), and so on.
TWO PRACTICAL DEMONSTRATIONS OF TESTS WITH SCORES AND INTERPRETATION
Dimensions of Temperament Scale (DTS)
The Dimensions of Temperament Scale (DTS) was devised by Professor N.K. Chadha and Ms Sunanda Chandra (1984) to measure the temperament of a person on 15 selected dimensions: sociability (12), ascendance (9), secretiveness (10), reflectiveness (10), impulsivity (7), placid (11), accepting (6), responsible (9), vigorous (14), cooperative (14), persistence (8), warmth (14), aggressiveness (10),
tolerance (11) and tough mindedness (7). (The number of items for each dimension is given in brackets.) There are in total 152 yes/no type responses, which were standardised on a sample of 250 persons. Some items of the test (along with their actual serial numbers) are as follows:
24. I don’t have strong political opinion?
42. I talk about myself only if it is very necessary?
65. Do you try hard to stand tearless?
94. Do you like to plan things in advance?
149. Do you like to walk fast?
The scale has a test–retest reliability of 0.94 and has been checked successfully for cross-validity and empirical validity. The overall cross-validity stands at 0.81. The test is available with National Psychological Corporation, 4/230, Kacheri Ghat, Agra–282004, UP, India.
Scores and Interpretation
Consider the scores of a person XYZ on the 15 dimensions of the DTS, as given in Table 13.1.
Table 13.1 Score of a Subject on DTS
S. No.   Dimensions           Scores   Descriptive Category
1.       Sociability (A)      6        Very low
2.       Ascendance (B)       8        High
3.       Secretiveness (C)    9        High
4.       Reflective (D)       9        High
5.       Impulsivity (E)      6        High
6.       Placid (F)           4        Very low
7.       Accepting (G)        3        Low
8.       Responsible (H)      9        Very high
9.       Vigorous (I)         12       High
10.      Cooperative (J)      11       High
11.      Persistence (K)      6        Average
12.      Warmth (L)           6        Very low
13.      Aggressiveness (M)   9        High
14.      Tolerance (N)        6        Very low
15.      Tough-minded (O)     6        High
Source: Author.
The total temperament score of XYZ is 111, which is very low. This indicates that the subject can be called unstable and individualistic by nature, and closer to Type-A characteristics.
Family Environment Scale (FES)
The FES was developed by Professor N.K. Chadha and Dr Harpreet Bhatia (1989) to measure the environment of the most important human institution, that is, the family, on three broad dimensions: relationship, personal growth and system maintenance. The relationship dimension is further divided into the sub-dimensions of cohesion, expressiveness, conflict, and acceptance and caring. The personal growth dimension is divided into independence and active recreational orientation. The system maintenance dimension is divided into the sub-dimensions of organisation and control. The scale of the same title by Moos (1989) is the conceptual basis for this scale. The FES has 69 items, with an overall test reliability coefficient of 0.95. The scale possesses both face and content validity, and uses qualitative norms for the age range of 17 to 50 years. The test is available with Ankur Psychological Agency, 22/481, Indira Nagar, Lucknow–16. Some sample items from the FES are given in Table 13.2.
Table 13.2 Sample Items with Response Options on FES
Item No.  Items
1.        We enjoy doing things together
2.        Having hobbies is encouraged in our family
3.        Family members do not get along with each other
4.        There is no sense of closeness in our family
5.        All of us participate together in family functions/programmes
Response options for each item: Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree
Source: Author.
Scores and Interpretation
Consider a person whose total score on the FES is 250, with the dimension-wise description given in Table 13.3. This indicates that the family environment is low on five dimensions, that is, cohesion, acceptance and caring, active recreational orientation, organisation and control. Of these five dimensions, two relate to relationship (cohesion, and acceptance and caring), two to system maintenance (organisation and control) and one to personal growth (active recreational orientation). The subject has scored high on only two dimensions, that is, expressiveness (relationship) and independence (personal growth). The only average score is on conflict (relationship).
Table 13.3 Interpretation of Score of a Person on FES
S. No.  Dimension               Score  Interpretation
1.      Cohesion                35     Low
2.      Expressiveness          40     High
3.      Conflict                47     Average conflict
4.      Acceptance and caring   38     Low
5.      Independence            52     High
6.      Active recreation       20     Low
7.      Organisation            6      Low
8.      Control                 12     Low
Source: Author.
So, the family has low system maintenance, as it has scored low on both of its dimensions, that is, organisation and control. High expressiveness and independence, along with low cohesion and low acceptance and caring, may be the causal factors behind the average conflicts, which may deteriorate into major ones in the future if counselling or intervention in some form does not take place. A high score on expressiveness can have a positive as well as a negative implication. If the expressiveness is used as a vent for negative thoughts and actions, it would increase the conflict within the family. On the other hand, if the expressiveness is used in a positive manner, it can bring about an increase in the cohesiveness of the family. Similarly, independence in thought and decision-making is positive, but it is not appropriate if it becomes a way for the members of the family to grow away from each other.
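The reading of the FES profile above can be retraced with a short sketch that totals the eight sub-dimension scores and groups their interpretations by broad dimension. The scores, interpretations and groupings are taken from the text and Table 13.3; the variable and function names are hypothetical, and the sketch does not reproduce the scale's norms (the low/average/high labels are copied, not computed).

```python
from collections import defaultdict

# (sub-dimension, broad dimension, score, interpretation) as given in the text.
# The labels are copied from Table 13.3; no norm table is applied here.
FES_PROFILE = [
    ("Cohesion", "relationship", 35, "low"),
    ("Expressiveness", "relationship", 40, "high"),
    ("Conflict", "relationship", 47, "average"),
    ("Acceptance and caring", "relationship", 38, "low"),
    ("Independence", "personal growth", 52, "high"),
    ("Active recreation", "personal growth", 20, "low"),
    ("Organisation", "system maintenance", 6, "low"),
    ("Control", "system maintenance", 12, "low"),
]


def total_score(profile):
    """Sum the sub-dimension scores to get the overall FES score."""
    return sum(score for _, _, score, _ in profile)


def levels_by_broad_dimension(profile):
    """Map each broad dimension to {sub-dimension: interpretation}."""
    grouped = defaultdict(dict)
    for sub, broad, _, level in profile:
        grouped[broad][sub] = level
    return dict(grouped)


def uniformly_low(profile):
    """Return the broad dimensions whose sub-dimensions are all 'low'."""
    grouped = levels_by_broad_dimension(profile)
    return [broad for broad, subs in grouped.items()
            if all(level == "low" for level in subs.values())]


if __name__ == "__main__":
    print("Total:", total_score(FES_PROFILE))
    print("Uniformly low:", uniformly_low(FES_PROFILE))
```

Grouping by broad dimension makes explicit why system maintenance is singled out: it is the only broad dimension on which every sub-dimension is low.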
DIRECTORY OF MAJOR TESTS USED IN HEALTH PSYCHOLOGY AND COUNSELLING
Child-Rearing Practices
1 Name: Child-Rearing Practices Questionnaire. Authors: T.E. Dielman and Keith Barton. Applicability: Potential or actual mothers and fathers. Measures: 16 factors of child-rearing practices. Reliability and validity: Congruence coefficients were computed among rotated factor pattern matrices, and all factor matches were significant; additional validation research completed on the questionnaire includes the relation of child-rearing practices to achievement in school (Barton et al. 1974, cited in Chadha 1996) and the prediction of behaviour problems in 6–8-year-old children. Availability: The Institute for Personality and Ability Testing, 1602 Coronado Drive, Champaign, Illinois–61870.
2 Name: Child-Rearing Practices Report (CRPR). Author: Jeanne H. Block. Applicability: Form I: Parents; Form II: 16+ years. Measures: 1. Socialisation attitudes and orientations (Form I) of mothers and fathers. 2. Perceptions of parent’s child-rearing behaviours and attitudes of adolescents. No. of items: 91, Q-sort, a seven-point rectangular distribution of responses. Reliability: Test–retest reliabilities (one-year interval), r = 0.71; Form I, r = 0.64 (mothers); and Form II, r = 0.65 (fathers). Availability: Jeanne H. Block, Institute of Human Development, Tolman Hall, University of California, Berkeley, California–94720.
3 Name: Adaptive Strategies Interview. Author: Daniel R. Scheinfeld. Applicability: Parents. Measures: Parent’s ideas regarding children, child-rearing and the child’s relationship to the world. No. of items: 8. Reliability and validity: Correlations with an achievement factor composed of grades, achievement test scores and IQ:
1. Purposeful agent: 0.48,
2. Social connected needs: 0.48,
3. Active exchange with environment: 0.48,
4. Dominance versus exchange in relationships with parents and other adults: –0.41,
5. Trust selectivity: 0.49,
6. Process of growth of competence: 0.66,
7. Meaning of being adapted: 0.51 and
8. Adaptation to threat: 0.45.
Availability: D. Scheinfeld, Institute for Juvenile Research, 1140 S. Pauline, Chicago, Illinois–60612.
Measures of Self-concept
4 Name: The Thomas Self-concept Values Test. Author: Walter L. Thomas. Applicability: 3 to 9 years. Measures: Self-concept of young children using the self-report format. No. of items: 14 bipolar adjectival items, to be responded to by the child four times: 1. Child’s own perception, and 2. His perceptions of others’ perceptions of him, namely his mother’s, his teacher’s and his peers’. Reliability: Test–retest: 0.78 for the total score; 0.82 for self-reliant scale; 0.68 for the teacher-referent scale; 0.61 for peer-referent scale; internal consistency: 0.73. Validity: Construct validity. Availability: Combined Motivation Education Systems, Inc., 6300, River Road, Rosemont, NY–10628.
5 Name: Tennessee Self-Concept Scale. Author: William H. Fitts. Applicability: 12 years and over. Measures: Self-concept (90 items) and self-criticism (10 items). No. of items: 100 (five-point scale from completely false to completely true). Reliability: Retest reliability, while it varies for different scores, is in the high 0.80s. Validity: Correlations with various MMPI scales are frequently in the 0.50s and 0.60s. Availability: Counsellor Recordings and Tests, Box 6184, Acklen Station, Nashville, Tennessee–37212.
6 Name: Self-concept Scale. Author: Joseph C. Bledsoe. Applicability: 7 to 13 years. Measures: Self-esteem and self-ideal. No. of items: 30 descriptive adjectives, a three-point scale. Reliability: Test–retest reliabilities ranged from 0.66 to 0.81.
Validity: Correlations with anxiety scales: –0.30 to –0.46; correlations with California Tests of Personality Self-Adjustment Scale: 0.39. Availability: Joseph C. Bledsoe, Box 325, Aderhold Hall, University of Georgia, Athens, Georgia– 30602.
7 Name: Coopersmith Self-Esteem Inventory (SEI). Author: Stanley Coopersmith. Applicability: 9 years to adults. No. of items: 58 items that make five subscales: General Self (26 items), Social Self-peers (eight items), Home-parents (eight items), Lie Scale (eight items) and School-academic (eight items). Reliability: Total scores of Forms A and B correlated by 0.86. Validity: Current validity data reported in the monthly self-esteem Institute Newsletter. Availability: Self-esteem Institute, 1736 Strockton Street, San Francisco, California–94133.
Test of Adjustment
8 Name: College Adjustment and Study Skills Inventory. Author: Frank A. Christensen. Applicability: College/youth. Measures: Characteristics important to a student’s success in college. No. of items: 57 question inventory, four-point scale from very often to very seldom (five subtests). Reliability: 0.51–0.83. Validity: The inventory compares very unfavourably with similar inventories such as the Brown-Holtzman Survey of Study Habits and Attitudes. Availability: Personal Growth Press, Inc., 653, Longfellow Drive, Berea, Ohio–44017.
9 Name: Bell Adjustment Inventory. Author: H.M. Bell. Applicability: High School and College Students. Measures: Individual life adjustments. No. of items: Six subscales, 200 items (Yes/No). Reliability: Odd–even: +0.80.
Validity: Cross validity; construct validity by correlating the scale with other tests. Availability: Consulting Psychological Press, California.
Test of Attitude
10 Name: Laureton Self-Attitude Scale. Authors: George M. Guthrie, Alfred Benneth and Leon Gorlow. Applicability: Developed on female subjects, 14 to 22 years. Measures: The individual’s attitudes towards his own physical appearance, physical health, interpersonal relationships with peers, interpersonal relationships with others, personal worth and mental health. No. of items: 150. Reliability: Test–retest reliabilities for the various subscales are all in the 0.80s. Availability: George M. Guthrie, Department of Psychology, Pennsylvania State University, University Park, Pennsylvania–16802.
11 Name: Survey of Study Habits and Attitudes (SSHA). Authors: William F. Brown and Wayne H. Holtzman. Applicability: Grades 7 to 12, 12 to 14. Measures: Study habits and study attitudes. No. of items: 100; five-point scale: rarely, sometimes, frequently, generally or almost always. Reliability: K-R 8 reliability coefficient: 0.87; Test–retest coefficient: 0.88. Validity: Low correlation coefficients between SSHA and aptitude tests, and higher coefficients between SSHA and grades. Availability: The Psychological Corporation, 304 East 45th St., New York, NY–10017.
12 Name: Brief Criminal Attitude Scale (BCATS). Author: A.J.W. Taylor. Applicability: 13 years to adult. Measures: Differences in criminal attitudes. No. of items: 15. Reliability: For males 0.6 and for females 0.65. Availability: Taylor (1968).
13 Name: Attitudes toward the Retarded. Authors: Rosalyn E. Efron and Herman Y. Efron. Applicability: 14 years and over. Measures: Attitudes towards the retarded (six factors). No. of items: 70 (six-point agree–disagree continuum). Reliability: Reliability coefficients for six factors were:
1. Segregation via institutionalisation: 0.79,
2. Cultural deprivation: 0.63,
3. Non-condemnatory etiology: 0.57,
4. Personal exclusion: 0.73,
5. Authoritarianism: 0.69 and
6. Hopelessness: 0.59.
Availability: Herman Y. Efron, Veterans Administration, 810 Vermont Avenue, N.W., Washington DC–20420.
14 Name: Attitude toward Underprivileged Children and Youth. Author: T. Bentley Edwards. Applicability: Adults. Measures: Attitude toward the underprivileged. No. of items: Likert-type scale consisting of 72 items (six-point scale from strongly agree to strongly disagree). Reliability and validity: Coefficients of reproducibility varying between 0.91 and 0.95 were obtained for the six scales. Availability: Edwards (1966).
15 Name: Attitude Behaviour Scale: Mentally Retarded (ABS-MR). Author: John E. Jordan. Applicability: Adults. Measures: Attitudes and behaviour in relation to the mentally retarded. No. of items: 140 (six subscales). Reliability: Coefficients ranged from 0.60 to 0.90 for the six subscales. Validity: Construct, content and face validity have been determined (Jordan 1971). Predictive validity was determined via the known group method (ibid.). Availability: John E. Jordan, Director, International Rehabilitation Special Education, Michigan State University, East Lansing, Michigan–48824.
16 Name: Questionnaire of School-Related Attitudes and Motivation. Authors: Sar B. Khan, Adapted from McGuire et al. (1961) and Child et al. (1956). Applicability: Junior High School. Measures: 1. Attitudes towards education and 2. Attitudes towards teacher, study habits, achievement motivation, need achievement, achievement anxiety. No. of items: 122. Reliability: Internal consistency coefficient ranged from 0.84 to 0.93 for females. Validity: Canonical correlation of 0.69 and 0.76 were obtained between the composites of affective and achievement variables. Availability: Khan (1966).
17 Name: Tobacco Use Questionnaire. Authors: Herbert S. Rabinowitz and William H. Zimmerli. Applicability: Grade seven to nine students, parents and teachers. Measures: Attitudes towards smoking (10 items), knowledge about health hazards in smoking (45 items) and smoking behaviour (three items). No. of items: 58. Reliability: Kuder-Richardson reliability 0.84. Availability: Herbert S. Rabinowitz, Community Services Group, United Way of Buffalo and Erie County, 742 Delaware Avenue, Buffalo, New York–10209.
18 Name: Test of Attitudes towards the Gifted. Author: Jon C. Jacobs.
Applicability: Adults working with gifted children. Measures: Adults’ attitudes towards gifted children, classifying those attitudes as positive or negative. No. of items: Form I is a sentence completion set (15 items); Form II is a test of information about gifted children (18 items). Reliability: Test–retest reliability correlation was 0.74. Validity: Mean positive responses: 9.43; standard deviation: 0.91 (for teachers). Availability: Jon C. Jacobs, Special Service Division, Plymouth Community School, Plymouth, Michigan–48170.
19 Name: Teenage Self-test: Cigarette Smoking. Authors: Ann M. Miline and Joseph G. Colemen. Applicability: 13 to 18 years. Measures: Psychosocial forces affecting decision to smoke in form of eight scales:
1. Health concern: cost,
2. Non-smokers rights,
3. Positive smoker’s attributes,
4. Direct effects: benefit,
5. Negative smoker attributes,
6. Parental control: authority,
7. Density control: independence and
8. Rationalisation.
No. of items: 29 (five-point scale: strongly agree to strongly disagree). Reliability: Kuder-Richardson reliabilities for the scale range from 0.83 to 0.50. Availability: Bureau of Health Education, Center for Disease Control, 1600 Clifton Road, Atlanta, Georgia–30333.
20 Name: Knowledge and Attitude Survey Concerning Epilepsy. Author: Jane W. Martin. Applicability: Adults. Measures: Knowledge of and attitude towards epilepsy. No. of items: 10. Availability: Jane Martin, 131 Mesilla NE, Albuquerque, New Mexico–87108.
21 Name: Parental Attitudes toward Mentally Retarded Children Scale. Author: Harold D. Love. Applicability: 18 to 65 years. Measures: Attitudes towards mentally retarded children. No. of items: 30 (four-point scale: strongly agree to strongly disagree). Reliability: Split-half reliability: 0.91 with 62 parents; 0.92 with 200 parents. Availability: Harold D. Love, Special Education Department, University of Central Arkansas, Conway, Arkansas–72032.
22 Name: Children’s Social Attitude and Value Scales. Authors: Daniel Soloman, Arthur J. Kendall and Mark I Ober Cander. Applicability: 8 to 15 years. Measures: Various social values and attitudes:
1. Task self-direction versus authority-reliance (6 items),
2. Democratic values (16 items) (four subscales),
3. Attitude towards group activities (12 items),
4. Cooperation versus competition (nine items),
5. Value on decision-making autonomy (10 items),
6. Value on heterogeneity (four items), and
7. Concern for others (nine items).
No. of items: Seven with four subscales of democratic values (six-point scale from strongly agree to strongly disagree). Reliability: Internal reliability (alpha) coefficients (in order of scales): 0.81, 0.49, 0.54, 0.07, 0.73, –0.24. Availability: Daniel Soloman, Psychological Service Section, Montgomery County Public Schools, 850 Hungerford Drive, Rockville, Maryland–20850.
Indian Tests on Counselling and Guidance Self-Concept 23 Name: Self-Concept Inventory. Author: Sagar Sharma. Applicability: High school students (X and XI students).
Measures: Self-Concept and self-ideal discrepancies. Available in both English and Hindi. No. of items: 68. Reliability: Test–retest reliability coefficients on self-concept scores was found to be 0.81 and test–retest reliability on self-ideal discrepancies was found to be 0.72. Validity: Content validity was established; convergent validity with Deo’s (cited in Chadha 1996) Personality test also worked out. Availability: Sharma (1970a).
24 Name: Self-Concept Scale for Children. Authors: Harmohan Singh and Saraswati Singh. Applicability: Teenagers and adolescents. Measures: To enlighten the subjects to know about themselves. No. of items: 22 trait descriptive adjectives. Reliability: Test–retest: 0.73–0.91. Availability: Agra Psychological Research Cell, Tiwari Kothi, Belanganj, Agra–282004.
25 Name: Chadha Self-Concept Scale. Author: N.K. Chadha. Applicability: All age levels. Measures: Perceived self-concept, ideal self-concept and self-concept discrepancy. No. of items: 120. Reliability: Split-half: 0.78 and 0.87; test–retest: 0.69 and 0.72. Validity: Face and content. Availability: Agra Psychological Research Cell, Tiwari Kothi, Belanganj, Agra–282004.
Test of Adjustment 26 Name: Adolescent Adjustment Inventory. Author: N.Y. Reddy. Applicability: Adolescents. Measures: Personal and social adjustment. Items on personal adjustment measure neurotic tendencies, feelings of inferiority, guilt, personal worth and attitude towards future. Social adjustment items measure the adjustment towards home and school as well as sex adjustment. No. of items: 88.
Reliability: Odd–even reliability coefficients were 0.84 and 0.95 for personal and social adjustment, respectively, after applying the Spearman-Brown correction. Validity: Validated against teachers’ ratings, Bell’s adjustment inventory, California personal and social adjustment inventory, parents’ ratings and by comparing the adjustment scores of delinquent and non-delinquent groups. Availability: Reddy (1964).
27 Name: Adjustment Inventory. Author: H.S. Asthana. Applicability: Hindi knowing school and college students. Measures: The adjustment among students. No. of items: 42 Yes/No type questions in Hindi. Reliability: Split-half reliability coefficient with the Spearman-Brown correction: 0.97. Validity: Item validities established. Availability: Asthana (1945, 1950, 1968).
28 Name: Adjustment Inventory for Older People. Author: P.V. Ramamurti. Applicability: Older people. Measures: Studies the general adjustment of middle aged and older people in the areas of health, home, social, emotional and self. Reliability: Test–retest reliability coefficient was found to be 0.88 with a gap of 10 days. Validity: Item analysis was done; discriminates well between the well adjusted and the maladjusted in each area. Availability: Ramamurti (1968).
29 Name: Adjustment Inventory for College Students. Authors: A.K.P. Sinha and R.P. Singh. Applicability: College students. Measures: Adjustment in five areas: home, health, social, emotional and educational. No. of items: 102. Reliability: Split-half reliability: 0.94; test–retest reliability: 0.93; by Hoyt’s method: 0.94 and K-R formula: 0.92. Validity: Item analysis validity coefficients were significant at the 0.001 level; the correlation between inventory scores and hostel superintendent ratings was 0.58. Availability: National Psychological Corporation, Labh Chand Market, Raja Mandi, Agra–282002.
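Several of the split-half and odd–even reliabilities listed above are reported after a Spearman-Brown adjustment. The general Spearman-Brown formula predicts the reliability of a test lengthened k times from the reliability r of the shorter form as k·r / (1 + (k − 1)·r); with k = 2 it steps a half-test correlation up to a full-length estimate. The sketch below is a minimal illustration under that standard formula; the function name and the example value of 0.5 are assumptions chosen for illustration and are not taken from any of the inventories in the directory.

```python
def spearman_brown(r_short, k=2.0):
    """Predicted reliability of a test lengthened k times.

    r_short is the reliability (or half-test correlation) of the shorter
    form, restricted here to 0..1 to keep the sketch simple.
    k = 2 is the usual split-half case.
    """
    if not 0.0 <= r_short <= 1.0:
        raise ValueError("r_short must lie between 0 and 1")
    if k <= 0:
        raise ValueError("k must be positive")
    return k * r_short / (1.0 + (k - 1.0) * r_short)


if __name__ == "__main__":
    # Hypothetical half-test correlation, for illustration only.
    print(round(spearman_brown(0.5), 4))
```

Because the corrected value is always at least as large as the half-test correlation for r between 0 and 1, a reported split-half figure should state whether the correction has been applied, as the entries above do.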
30 Name: Samoyojan Suchika. Author: R.P. Tripathi. Applicability: Adults and college students. Measures: Adjustment. No. of items: 82 (True–False type). Reliability: 0.89–0.79. Validity: 0.87. Availability: Department of Psychology, Banaras Hindu University, Varanasi.
31 Name: Adjustment Inventory. Author: M.N. Palsane. Applicability: High school and college students. Measures: Home, family, social, personal, emotional, educational and health. No. of items: 375. Reliability: 0.79–0.93. Validity: Content validity. Availability: Department of Experimental Psychology, University of Poona, Pune.
32 Name: Adjustment Inventory for School Children. Authors: A.K.P. Sinha and R.P. Singh. Applicability: Designed for Hindi speaking school students of 14–18 years. Measures: Used to assess their adjustment in emotional, social and educational areas. No. of items: 60 items; for each area of adjustment, there are 20 items. Reliability: Split-half, test–retest, and K-R formula range between 0.92 to 0.96. Availability: National Psychological Corporation, Agra.
33 Name: Social Adjustment Inventory. Author: N.M. Bhagia. Applicability: School students of class IX, X and XI.
Measures: Cumulative record of the student, divided in five categories: academic matters, schoolmates, teachers, school organisation and self. No. of items: 165. Reliability: Split-half (odd–even): 0.83; test–retest: 0.98. Validity: Intercorrelations among various categories from 0.25 to 0.70. Availability: Mansayan, S-524, School Block, Shakarpur, Main Vikas Marg, Delhi–110092.
34 Name: Emotional Maturity Scale. Authors: Yashvir Singh and Mahesh Bhargava. Applicability: College students. Measures: Emotional maturity in terms of emotional regression, emotional instability, social maladjustment, and so on. No. of items: 48; self-reporting five-point scale. Reliability: Test–retest: 0.75; internal consistency: 0.42 to 0.86. Validity: External criterion: 0.64; reasonable predictive validity. Availability: National Psychological Corporation, 4/230, Kacheri Ghat, Agra.
35 Name: Family Relation Inventory. Authors: G.P. Sherry and J.C. Sinha. Applicability: Maladjusted cases, both men and women. Measures: Areas of maladjustment vis-à-vis home. No. of items: 50 (true–false type). Reliability: Test–retest: 0.42–0.85. Validity: Correlation worked out with Saxena’s Personality Inventory: 0.40–0.78 Availability: Agra Psychological Corporation, Agra.
36 Name: Marital Adjustment Questionnaire. Author: Pramod Kumar. Applicability: Married adult group. Measures: Amount of satisfaction in marital life. No. of items: 27 items with alternative responses ‘Yes’ and ‘No’; final form consists of 25 items. Reliability: Split-half and test–retest reliabilities of 0.49 and 0.71 have been obtained, but are questionable as the sample size is only 60.
Validity: Content validity. Availability: Rupa Psychological Centre, B-19/60 Deoriabir, Bhelupura, Varanasi.
37 Name: The Battery of Pre-Adolescent Personality Test. Authors: U. Pareek, T.V. Rao, P. Ramalingaswamy and B.R. Sharma. Applicability: Pre-adolescent group of children. Measures: Patterns of adjustment, dependency, classroom trust, initiative, activity level and level of aspiration of the pre-adolescent. No. of items: The battery consists of six tests:
1. Pre-Adolescent Adjustment Scale (PAAS): It consists of 40 items covering the following adjustment areas: home, school, teachers, peers, general and total.
2. Pre-Adolescent Dependency Scale (PADS): Consists of two forms with 10 items in each form.
3. Pre-Adolescent Class Trust Scale (PACTS).
4. Pre-Adolescent Initiative Questionnaire (PAIQ): Consists of six situations with open-ended questions.
5. Pre-Adolescent Activity Level Scale (PAALS).
6. Pre-Adolescent Level of Aspiration Test (PALAT): It has four versions presenting different experimental conditions.
Reliability: Test–retest reliability coefficients of PAAS range from 0.22 to +0.60; PADS from 0.06 to +0.51; PACTS from +0.33 to +0.77 and PAIQ from +0.50 to +0.66; PAALS is highly reliable. Validity: Acceptable level of validity. Availability: Rupa Psychological Centre, B 19/60-B Deoriabir, Bhelupura, Varanasi.
Attitude 38 Name: Student’s Liking Scale. Authors: S.P. Malhotra and B.K. Passi. Applicability: Secondary school and college students up to graduation level. Measures: Liking of the students towards their teachers. No. of items: Likert-type scale with 30 items. Reliability: Test–retest: 0.92. Validity: Face (0.89) and concurrent validity (0.79) established. Availability: National Psychological Corporation, Kacheri Ghat, Agra.
39 Name: Religiosity Scale. Author: L.I. Bhushan. Applicability: Students and adults. Measures: Intensity of religious attitude. No. of items: 36 items, each with five alternative responses, of which the testee chooses one. Reliability: Split-half: 0.82; test–retest: 0.91. Validity: Content, predictive and concurrent validity. Availability: National Psychological Corporation, Agra.
40 Name: Attitude Scale for Physical Education. Authors: G.P. Thakur and Manju Thakur. Applicability: Males and females, but no age group is given. Measures: Attitude of people towards physical education. No. of items: 16 statements. Reliability: Split-half: 0.78; test–retest: 0.72. Validity: Assessed by testing the significance of the difference between two known extreme groups using a t-test. Availability: Agra Psychological Research Cell, Agra.
41 Name: Rao’s School Attitude Inventory. Author: D. Gopal Rao. Applicability: School students. Measures: School attitude. No. of items: 57 items (Likert-type scale). Reliability: Split-half: 0.81. Validity: No validity coefficient reported. Availability: Agra Psychological Research Cell, Tiwari Kothi, Belanganj, Agra.
14
Applications of Psychological Testing in Clinical Settings
CHAPTER OUTLINE
1. Psychological testing for clinical help
2. Two practical demonstrations with scores and interpretation:
   (i) 16-Personality Factors (16-PF)
   (ii) Thematic Apperception Test (TAT)
3. Directory of major tests used for clinical purpose
LEARNING OBJECTIVES
After reading this chapter, you will be able to:
1. Understand the legacy of psychological tests in clinical setting.
2. Understand the use and interpretation of two major tests used for clinical purpose.
3. Find the information about various tests used for the clinical purpose.
4. Find the web information about major clinical consultants and the providers of clinical tests.
PSYCHOLOGICAL TESTING FOR CLINICAL HELP
Clinical help is the identity of psychological help because, in its absence, psychological help would be reduced to common-sense social help. Clinical assessment has a long history in the field of mental health. The use of psychological tests in clinical settings is guided by basically three models of mental health: the information gathering model, the therapeutic model and the differential treatment model (Finn and Tonsager 1997). The information gathering model is used by practitioners of mental health when they use psychological tests to collect information for the diagnosis of mental problems. When a practitioner works with the therapeutic model, he uses psychological tests to provide new experiences and information to the client, which the client can use for self-discovery, personal growth and development. So, in the therapeutic model, tests basically act as tools of intervention to bring about positive changes in the patient. The therapeutic model grew out of the humanistic movement of the 1950s and the 1960s. Finally, the differential treatment model represents the use of tests for conducting research and evaluating the outcomes of intervention programmes. These three models provide a complete psychological intervention when they are combined in the same order as presented above. Hence, they should not be viewed in isolation.
Based on the above three models, we can say that there are basically three ordered stages of clinical intervention: beginning, treatment, and termination and follow-up. In the beginning phase, tests are used for diagnostic purposes, and they may chiefly include clinical interviews and projective tests. In the treatment phase, the levels of abnormality are measured with the use of various psychological tests and electrophysiological techniques. As clinical disorders possess a tendency to relapse, psychological tests are used even after the formal termination of the treatment, as a precautionary or follow-up measure.
TWO PRACTICAL DEMONSTRATIONS WITH SCORES AND INTERPRETATION
Cattell’s 16-Personality Factors (16-PF)
Cattell’s 16-PF: A Brief Profile
16-PF is a landmark contribution by Raymond B. Cattell (1982). It is a test which gives objective scores and which has condensed 4,504 real traits (Allport and Odbert 1936), beginning with all the adjectives in the unabridged English dictionary, into 36 surface traits and 16 source traits, through the use of the factor analytic method. The test consists of 185 items with three forced-choice alternatives for each item. The test is applicable to persons of 16 years and above, and measures anxiety, extraversion, independence, self-control and tough-mindedness. It gives the most complete coverage of personality possible in a brief time.
The 16 personality factors mentioned are: Warmth (A), Reasoning (B), Emotional stability (C), Dominance (E), Liveliness (F), Rule-consciousness (G), Social boldness (H), Sensitivity (I), Vigilance (L), Abstractedness (M), Privateness (N), Apprehension (O), Openness to change (Q1), Self-reliance (Q2), Perfectionism (Q3) and Tension (Q4). 16-PF has been standardised on 2,500 US persons, consisting of equal numbers of males and females. Reliability reports of scores on 16-PF are low, with only the social boldness scale consistently above r = .80 (Erford 2007). Clinicians should be cautious while using 16-PF with high school graduates and persons above 65 years of age because they were under-represented in the normative sample. 16-PF is widely used to measure normal personality characteristics, preference for various work activities and problem-solving abilities, and to identify problems in areas known to be problematic for adults (Erford 2007).
Scores and Interpretation
Factor  A  B  C  E  F  G  H  I  L  M   N  O  Q1  Q2  Q3  Q4
Score   4  9  4  8  4  3  4  9  5  10  6  6  10  7   6   7
This report is based on the 16-PF questionnaire administered to a normal person. The report is an interpretation of the 16-PF scales in light of the issues relevant for counsellors and clinicians. Normal personality traits can bring individuals to a clinician’s office for several reasons, including a mismatch between the trait and the person’s circumstances or a conflict between two normal traits within the individual. In any case, having an understanding of normal personality traits can facilitate treatment. It is important to note that the reports discussed here are samples used as examples of how a report needs to be written. For understanding 16-PF, the scores, sten scores and the profile of one Mr X have been given and discussed.
Mr X’s high score on Factor M indicates impracticality, along with which he exhibits below average ego strength, affecting his ability to defer his needs when appropriate and to modulate the expression of his personality as the occasion demands. He may, at times, lack the psychological resources to compensate for his impracticality. Self-esteem problems are suggested. He seems to find reality to be frustrating and disappointing. He does not always deal with it directly. Instead, he may seek ways to compensate for his low self-esteem by emotionally escaping from unpleasant prospects. No significant problems with tension are immediately apparent, although he does seem to be somewhat more driven and anxious than the average person.
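To see which factors drive an interpretation of this kind, the profile can be ranked by how far each score lies from the middle of the sten scale. Sten ("standard ten") scores run from 1 to 10 with a mean of 5.5 by construction. The sketch below assumes that the scores shown for Mr X are stens and that the factor-to-score pairing follows the order in which the profile is printed; the names used are hypothetical, and the ranking is only an aid to reading the profile, not the publisher's scoring procedure.

```python
STEN_MEAN = 5.5  # midpoint of the 1-10 sten scale

# Mr X's profile as read from the printed scores (pairing assumed from order).
MR_X_PROFILE = {
    "A": 4, "B": 9, "C": 4, "E": 8, "F": 4, "G": 3, "H": 4, "I": 9,
    "L": 5, "M": 10, "N": 6, "O": 6, "Q1": 10, "Q2": 7, "Q3": 6, "Q4": 7,
}


def rank_by_deviation(profile):
    """Sort factors by absolute distance from the sten mean, largest first.

    Ties keep the order in which the factors appear in the profile.
    """
    return sorted(profile.items(),
                  key=lambda item: abs(item[1] - STEN_MEAN),
                  reverse=True)


if __name__ == "__main__":
    for factor, score in rank_by_deviation(MR_X_PROFILE)[:6]:
        print(factor, score, score - STEN_MEAN)
```

On this reading, the most extreme scores are M and Q1, followed by B and I and then E and G, which are the factors that the narrative leans on most heavily (abstractedness, openness to change, reasoning, sensitivity, dominance and rule-consciousness).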
Interpersonal Issues

His assertiveness makes him a powerful figure in groups. He likes to stand up for his point of view in whatever circumstances he finds himself, especially when his conceptual skills give him a strong sense of what needs to be done. At times, his force of will can seem controlling and domineering, while at other times, he can seem confident and persuasive. His level of aggressiveness can sometimes make him seem stubborn or angry, especially when he resents anyone else trying to take control of the situation. He may have trouble making others feel listened to or feel that their points of view are respected. He is a somewhat cautious person who shies away from the spotlight and who finds it difficult to handle novel social situations. His scores also seem to suggest a self-consciousness that may come over him when he is required to initiate social contacts or to manage conflict. His desire to control his environment and the flow of events around him may be limited to his own immediate sphere, as he seems to lack the social boldness that usually accompanies the dominant role in a group. Self-sufficiency is indicated to some degree. This is typically an asset, especially as in this case, where it is not so pronounced as to signal problems with being a team player. Still, he may not always be at his best in groups because relating to others can seem to him an annoyance. He appears to be average on warmth and discretion. In his interpersonal communications, he is neither forthright and clumsy nor especially discreet and tactful.
Self-Control

He has little inclination to follow rules and, thus, may lack one of the major motivations for doing what others expect. Mr X is much less of a follower of rules than most people, indicating that he is likely to live up to the standards of society only if they happen to be similar to his own beliefs and interests. Some of the tasks of living, usually taken for granted, may become optional in his mind. His energy level seems to be about average, suggesting that this is not a problem for him in general. He does not seem to be lacking in restraint but neither is he overly sombre and serious. He is not a particularly organised or orderly person. He does not always bother with details and he may be too susceptible to the influence of recent events. He may have a tendency to make impulsive decisions. His lack of good habits to fall back on, when necessary, can leave him vulnerable to emotional distress.

Cognition and Communication

He demonstrates superior abstract reasoning ability with a capacity to understand complex ideas, process complicated verbal material and grasp difficult concepts. He is sensitive and oriented towards emotions and likes to express his feelings rather than keeping them held in check. It may be necessary to approach him tactfully, since he seems to be highly sensitive to unexpected criticism. He may prove more vulnerable to stress than his peers. He is an extremely imaginative, idea-oriented person who often generates solutions that go beyond the needs of the immediate situation. Too much of his potential may be wasted on daydreams, rather than being put to constructive use. Attending to routine details is probably not his strong point. He reports being open to change to an unusual degree, actively seeking out novel experiences and new approaches. This implies flexibility and does not necessarily mean that he is ready to reject proven solutions just because they are not new. However, he may need to learn not to change course in some spheres just because it might be interesting to do so.
Areas to Explore

This profile has shown only one area to be explored. One such score is not that unusual but should still be investigated carefully before treatment planning and clinical decisions are finalised.

Impracticality (Factor M score = 10)

Very high scores on Factor M indicate a detachment from the mundane issues of life that is likely to interfere with the individual’s competence and effectiveness.
Thematic Apperception Test (TAT)

American psychologist Henry Murray (1893–1988) developed a theory of personality that spoke about motives, presses and needs (Table 14.1). According to Murray (Murray and Bellak 1973), a need is a ‘potentiality or readiness to respond in a certain way under certain given circumstances.’

Murray states that the existence of a need can be inferred on the basis of: (1) the effect or end result of the behavior, (2) the particular pattern or mode of behavior involved, (3) the selective attention and response to a particular class of stimulus objects, (4) the expression of a particular emotion or affect and (5) the expression of satisfaction when a particular effect is achieved or disappointment when the effect is not achieved (1938, p. 124). Subjective reports regarding feelings, intentions, and goals provide additional criteria. (Hall and Lindzey 1970)

Theories of personality based upon needs and motives suggest that our personalities are a reflection of behaviours controlled by needs. While some needs are temporary and changing, other needs are more deeply seated in our nature. According to Murray, although these psychogenic needs function mostly at the unconscious level, they play a major role in our personality.

Murray identified two types of needs:

1. Primary Needs: Based on biological demands, such as the need for oxygen, food and water.
2. Secondary Needs: Generally psychological, such as the need for nurturing, independence and achievement.
Table 14.1 Illustrative List of Murray’s Needs

Need: Brief Definition

n Abasement: To submit passively to external force; to accept injury, blame, criticism, punishment; to surrender; to become resigned to fate; to admit inferiority, error, wrongdoing or defeat; to confess and atone; to blame, belittle or mutilate the self; to seek and enjoy pain, punishment, illness and misfortune.
n Achievement: To accomplish something difficult; to master, manipulate or organise physical objects, human beings or ideas; to do this as rapidly and as independently as possible; to overcome obstacles and attain a high standard; to excel oneself; to rival and surpass others; to increase self-regard by successful exercise of talent.
n Affiliation: To draw near and enjoyably cooperate or reciprocate with an allied other (an ‘other’ who resembles the subject or who likes the subject); to please and win affection of a cathected object; to adhere and remain loyal to a friend.
n Aggression: To overcome opposition forcefully; to fight; to avenge an injury; to attack, injure or kill another; to oppose forcefully or punish another.
n Autonomy: To get free, shake off restraint, break out of confinement; to resist coercion and restriction; to avoid or quit activities prescribed by domineering authorities; to be independent and free to act according to impulse; to be unattached, irresponsible; to defy convention.
n Counteraction: To master or make up for a failure by restriving; to obliterate a humiliation by resumed action; to overcome weaknesses, to repress fear; to efface a dishonour by action; to search for obstacles and difficulties to overcome; to maintain self-respect and pride on a high level.
n Defendence: To defend the self against assault, criticism and blame; to conceal or justify a misdeed, failure or humiliation; to vindicate the ego.
n Deference: To admire and support a superior; to praise, honour or eulogise; to yield eagerly to the influence of an allied other; to emulate an exemplar; to conform to custom.
n Dominance: To control one’s human environment; to influence or direct the behaviour of others by suggestion, seduction, persuasion or command; to dissuade, restrain or prohibit.
n Exhibition: To make an impression; to be seen and heard; to excite, amaze, fascinate, entertain, shock, intrigue, amuse or entice others.
n Harm avoidance: To avoid pain, physical injury, illness and death; to escape from a dangerous situation; to take precautionary measures.
n Infavoidance: To avoid humiliation; to quit embarrassing situations or to avoid conditions that may lead to belittlement: scorn, derision or indifference of others; to refrain from action because of the fear of failure.
n Nurturance: To give sympathy and gratify the needs of a helpless object: an infant or any object that is weak, disabled, tired, inexperienced, infirm, defeated, humiliated, lonely, dejected, sick, mentally confused; to assist an object in danger; to feed, help, support, console, protect, comfort, nurse, heal.
n Order: To put things in order; to achieve cleanliness, arrangement, organisation, balance, neatness, tidiness and precision.
n Play: To act for ‘fun’ without further purpose; to like to laugh and make jokes; to seek enjoyable relaxation of stress; to participate in games, sports, dancing, drinking parties, cards.
n Rejection: To separate oneself from a negatively cathected object; to exclude, abandon, expel or remain indifferent to an inferior object; to snub or jilt an object.
n Sentience: To seek and enjoy sensuous impressions.
n Sex: To form and further an erotic relationship; to have sexual intercourse.
n Succorance: To have one’s needs gratified by the sympathetic aid of an allied object; to be nursed, supported, sustained, surrounded, protected, loved, advised, guided, indulged, forgiven, consoled; to remain close to a devoted protector; to always have a supporter.
n Understanding: To ask or answer general questions; to be interested in theory; to speculate, formulate, analyse and generalise.
Source: http://www.ptypes.com/needs-as-personality.html
Stories and Interpretation

Source: Murray and Bellak (1973).

Story

Jeffery worked as a window cleaner at the Canadian National Tower (CNT). One of the tallest buildings in the world, the CNT was like a mammoth in front of an ant. One day Jeffery was working on the sixtieth floor of the building. Work was almost done when disaster struck. One of the ropes, that kept the platform that he worked on, safe and in place, snapped. In the flash of a second, Jeffery was plummeting 60 stories down to the hard concrete path below, with his life flashing before his eyes. Jeffery tried desperately to grab on to a rope. Finally, around the eighth floor, he grabs hold of a rope. The pressure of the body on the rope weakens it and breaks it and Jeffery starts falling down, and just when he is about to hit the hard ground down below, he snaps out of the ‘hypnosis’ he was in. And then . . . the rope snaps.

The interpretation:

1. Premonition about his accident, in which his death is the end result. However, to keep the suspense of the story, he leaves the end hanging.
2. Feels that premonition is a reality and many people do have them.
3. Likes the idea of premonition.
4. ‘Many people have saved themselves from disaster because of premonition.’ Gave the example of 9/11.
5. Usually occurs in dreams.
Story

After working outstation for a month, the husband returned home late in the night. He rang the bell the third time, then the fourth, the fifth, the sixth and then lost his patience and went to the backyard of his house. Broke open the door and entered the house. Fuming, he walked up the stairs to his bedroom. He opened the door and saw his wife lying on the bed scarcely breathing. Her skin had turned blue due to lack of air, her face, flushed of its entire colour. She takes her last breath after looking at her husband for the last time, looking like she was trying to keep herself alive just to look at him for the last time. He looks over her, her body and starts weeping. Screams out loud and suddenly wakes up from his sleep, panting and heaving. He drank a glass of water, walked to his window. Then turns around and walks to his wife’s side, kisses her on her head and goes back to sleep.

Interpretation:

1. Made up himself.
2. Likes stories where you manage to escape something bad that is happening. When it was pointed out that in the first story the person had not been able to escape, he said that since he had had the premonition, he could have saved himself.

Observation: Both the events have occurred at an unconscious level. From the stories given by the subject, it can be observed that the motivating needs in his case are: need for exhibition, need for abasement, need for affiliation and need for nurturance. The first two needs can be seen in both the stories and the latter two come out strongly in the second story. From his remarks, it can also be clearly seen that he has a tendency to submit passively to external forces and to resign himself to his fate. At the same time, a slight tendency to counteraction can also be observed when he gives himself a second chance to correct the wrong by indicating that the pain has occurred in ‘dreams’.
The need for nurturance and affiliation can be seen very clearly in the relationship indicated between the husband and wife in the second story. The need for exhibition can be observed in the way he has formulated his stories, giving both of them a dramatic end.
DIRECTORY OF MAJOR TESTS USED IN CLINICAL SETTINGS

Foreign Tests

Tests of Personality

1 Name: Teacher’s Ratings of ‘Masculine’ Behaviour. Author: Donald K. Freedheim. Applicability: 7 to 11 years. Measures: Masculinity or boyishness. No. of items: 16. Reliability: 0.90. Validity: Content validity appears high. Availability: Donald K. Freedheim, Dept of Psychology, Case Western Reserve University, Cleveland, Ohio–44106.
2 Name: Death Anxiety Scale. Author: Donald I. Templer. Applicability: 12-year old. Measures: Death anxiety that permeates a variety of life experiences. No. of items: 15. Reliability and Validity: The scale was found to have satisfactory reliability which is free of response sets. Availability: Templer (1970).
3 Name: Sixteen Personality Factor (16-PF) Questionnaire. Author: Raymond B. Cattell. Applicability: 16 years and over. Measures: 16 most important dimensions of personality. Reliability: Correlations between equivalent scales for Forms A and B and for Forms C and D of the 16-PF range from the 0.30s to the 0.70s. Validity: Predictive and concurrent validity high. Availability: Institute for Personality and Ability Testing, P.O. 188, Champaign, Illinois, USA.
4 Name: Psychiatric Evaluation Form. Authors: Robert L. Spitzer, Jean Endicott, Alvin Mesnikoff and George Cohen. Applicability: Psychiatric patients and non-patients. Reliability: Intra-class correlations ranged from 0.51 to 0.85, with a median of 0.74. Validity: Intercorrelations among the 19 rating scales are available on 433 patients. Availability: Biometrics Research, New York State Psychiatric Institute, 712, West 168th, New York, NY–10032.
5 Name: The Draw-a-Person (Projective). Author: William H. Urban. Applicability: 5 years and above. Measures: Five distinct categories of concepts: qualities to be inferred or concluded, drawing characteristics, diagnostic categories, behavioural characteristics and subject matter of drawings. No. of items: Two checklists: (a) 16-item Severe Mental/Emotional Disturbance Checklist and (b) 14-item Organic Brain Damage Checklist. Reliability: Inter-scorer reliability: 0.33. Validity: Face and content validity. Availability: Western Psychological Services, 12031 Wilshire Blvd, Los Angeles, California–90025.
6 Name: Thorndike Dimensions of Temperament. Author: Robert L. Thorndike. Applicability: Grades 11–16 and adults. Measures: Stable personality dimensions: Sociable, Ascendant, Active, Tough-minded, Reflective, Impulsive, Cheerful, Placid, Accepting, Responsible. Items: 200 forced-choice items. Reliability: 0.54 to 0.87. Validity: 0.10. Availability: Psychological Corporation, 304, East 45th Street, New York–10017.
7 Name: The Institute of Personality and Ability Testing (IPAT) Anxiety Scale Questionnaire. Authors: Raymond B. Cattell, Samuel K. Krug and Ivan H. Scheier. Applicability: 14 years and over. Measures: Free anxiety. No. of items: Two 20-item subscales (covert and overt anxiety) classified into five primary trait factor scales consisting of two or six items each. Reliability: Range from 0.77 to 0.86. Validity: Correlations with Taylor’s Anxiety Scale: 0.70; with Maudsley and Eysenck neuroticism scale: 0.73; construct validity. Availability: Institute for Personality and Ability Testing, P.O. 188, Champaign, Illinois, USA.
8 Name: State-Trait Anxiety Inventory. Authors: C.D. Spielberger, R.L. Gorsuch and R. Lushene. Applicability: Grades 9 to 16 and adults. Measures: State and trait anxiety. Reliability and validity: High reliability and validity indices available. Availability: Consulting Psychologists Press Inc., 577 College Ave., Palo Alto, California.
9 Name: State-Trait Anxiety Inventory for Children. Authors: Charles D. Spielberger, C. Drew Edwards, Robert E. Lushene, Joseph Montuori and Denna Platzek. Applicability: Grades 4 to 6. Measures: A-state: ‘how you feel right now?’ No. of items: 40 items (20 in A-state, 20 in A-trait); three-point scale (hardly-ever, sometimes or often). Reliability: Internal consistency coefficients for A-state are 0.82 for males and 0.87 for females; coefficients for A-trait are 0.78 and 0.81, respectively. Test–retest reliability coefficients for A-trait are 0.65 for males and 0.71 for females; coefficients for A-state are 0.31 and 0.47, respectively. Validity: Correlations between the A-trait scale and the Children Manifest Anxiety Scale (CMAS) and the Generalised Anxiety Scale for Children (GASC) are 0.75 and 0.63, respectively. Availability: Consulting Psychologists Press Inc., 577, College Ave., Palo Alto, California.
10 Name: Ego State Inventory (ESI). Author: David Gordon McCarley. Applicability: Adolescents and adults. Measures: Five ego states: punitive parent, nurturing parent, adult, rebellious child and adaptive child. No. of items: 52 items, each consisting of a cartoon drawing of two or more people in social situations. Reliability: Test–retest reliability ranges from 0.47 to 0.73 on the five scales. Validity: Correlations of various ESI scales with other tests, such as Rokeach’s Dogmatism Scale (1960) and scales from Gough’s California Personality Inventory Cluster (Gough and Bradley 1996), are around 0.06. Availability: McCarley, Stoelting Co., 424, North Horman Ave, Chicago, Illinois–60624.
11 Name: Eysenck Personality Inventory. Authors: H.J. Eysenck and Sybil B.G. Eysenck. Applicability: Grades 9 to 16 and adults. Measures: 1. E (Extraversion–Introversion), 2. N (Neuroticism–Stability) and 3. Lie scale (developed and adapted from the Minnesota Multiphasic Personality Inventory [MMPI] with some modifications). No. of items: 24-item E-scale, 24-item N-scale and nine-item Lie scale (Yes/No inventory, two parallel forms). Reliability: Test–retest reliabilities range between 0.80 and 0.97 and correlations between the two forms run from 0.75 to 0.91. Validity: Construct and concurrent validity vary in quality and completeness. Availability: Educational and Industrial Testing Service, P.O. Box 7234, San Diego, California.
12 Name: Edwards Personality Inventory (EPI). Author: Allen L. Edwards. Applicability: Grades 11–16 and adults. Measures: 53 personality variables. No. of items: 1,500 items (five booklets of 300 items each). Reliability: Kuder-Richardson Formula 20 reliability coefficients for the 53 scales vary from a low of around 0.65 for ‘logical’ to a high of around 0.95 for ‘sensitive to criticism’. Approximately 30 per cent of the scales have reliabilities in the 0.80s and another 30 per cent have reliabilities of 0.90 or more. Validity: Correlations between EPI scales and Edwards social desirability scale range up to the low 0.70s (self-critical), while nine others are above 0.40. Availability: Science Research Associates, Inc., 304, East 45th street, New York, USA.
13 Name: Edwards Personal Preference Schedule (EPPS). Author: Allen L. Edwards. Applicability: College youth and adults. Measures: 15 needs selected from Murray’s list of manifest needs. Reliability and validity: EPPS Scales are useful in the prediction of external, socially important criteria; face validity is adequate. However, the scales are relatively transparent and empirical correlations are reasonably low. Availability: Psychological Corporation, 304, East 45th street, New York, USA.
14 Name: Depression Adjective Check Lists (DACL). Author: Bernard Lubin. Applicability: Grades 9 to 16 and adults. Measures: Depression through seven different forms identified as forms A through G. No. of items: First four forms consist of 32 self-descriptive adjectives each; last three forms include 34 adjectives each. Reliability: Internal consistency ranges from 0.79 to 0.90. Split-half reliabilities range from 0.82 to 0.93 for normals and from 0.86 to 0.93 for patients. Validity: The highest correlation with the several clinical scales of the MMPI was 0.57, obtained between the depression scale and the DACL. Availability: Educational and Industrial Testing Service, P.O. Box 7234, San Diego, California.
15 Name: Children’s Embedded Figures Test (CEFT). Authors: Stephen A. Karp and Norman L. Konstadt. Applicability: 5 to 12 years. Measures: An individual difference dimension initially labelled by Witkin (cited in Chadha 1996) as field dependence–independence and more recently as psychological differentiation. No. of items: 11-item ‘tent’ series and 14-item ‘house’ series. Reliability: Internal reliability estimates range from 0.83 to 0.90; test–retest correlation: 0.87. Validity: Concurrent validity estimates, using the EFT as the criterion measure, give correlations between CEFT and EFT (cited in Chadha 1996) of 0.83 to 0.86 for 11-year-olds and 12-year-olds and of 0.70 to 0.73 for nine-year-olds and 10-year-olds. Availability: Consulting Psychologists Press, Inc., 577 College Ave., Palo Alto, California.
16 Name: California Psychological Inventory (CPI). Author: Harrison G. Gough. Applicability: 13 and over. Measures: Dominance, Capacity for Status, Sociability, Social Presence, Self-acceptance, Sense of well-being, Responsibility, Socialisation, Self-control, Tolerance, Good Impression, Communality, Achievement via Conformance, Achievement via Independence, Intellectual Efficiency, Psychological mindedness, Flexibility, Femininity. No. of items: 480 (True/False format). Reliability: For non-incarcerated groups, the retest correlations are generally between 0.55 and 0.75 over a one-year period. Validity: The correlations of CPI scales with scales from other inventories are quite substantial: 1. Great deal of overlap between the Guilford–Zimmerman Temperament Survey (cited in Chadha 1996) and CPI. 2. Sizable correlations between about one-third of CPI scales and MMPI scales. 3. Either the 16-PF or the EPPS serves as well as the CPI in most prediction situations. Availability: Consulting Psychologists Press, Inc., 577 College Ave., Palo Alto, California.
17 Name: Anger Self-Report (ASR). Authors: Martin L. Zelin, Gerald Adler and Paul Myerson. Applicability: 13 years to geriatrics. Measures: Awareness and expression of anger, guilt and mistrust. No. of items: 64. Reliability and validity: Highest correlation for physical expression scale (0.41) with ASR; highest correlation for verbal expression (0.36) with dependency and 0.31 with anger; awareness of anger correlated 0.24 with antisocial. Availability: Martin L. Zelin, 260 Tremont Street, Seventh Floor, Boston, Massachusetts–02166.
18 Name: Dean’s Alienation Scale. Author: Dwight G. Dean. Applicability: College youth and adults. Measures: Powerlessness, normlessness and social isolation. No. of items: 24 (Likert-type scale, five-point scale). Reliability: Spearman-Brown correlation: powerlessness (0.78), normlessness (0.73) and social isolation (0.84). Validity: Validity was determined by presenting 139 proposed items to seven instructors and assistants, who judged each statement as to its applicability or non-applicability to the components of powerlessness, normlessness and social isolation. Availability: Dwight G. Dean, Dept. of Sociology, Iowa State University, Ames, Iowa–50010.
19 Name: Myers-Briggs Type Indicator. Authors: Katharine C. Briggs and Isabel Briggs Myers. Applicability: Grades 9 to 16 and adults. Measures: Jungian personality typology on four scales: extraversion versus introversion, sensation versus intuition, thinking versus feeling, judgement versus perception. Availability: Consulting Psychologists Press, Inc., 577 College Ave., Palo Alto, California.
20 Name: Multifactorial Scale of Anxiety. Authors: Walter D. Fenz and Epstein. Applicability: Adolescent to adult. Measures: 1. Muscle tension (18 items), 2. Autonomic arousal (16 items) and 3. Feelings of insecurity (19 items). No. of items: 53. Reliability: Test–retest reliability coefficients were 0.63 for muscle tension, 0.70 for autonomic arousal and 0.62 for feelings of insecurity; odd–even reliability was 0.84, 0.83 and 0.85, respectively. Validity: Neurotic subjects manifested more specificity in factor loadings associated with the three scales than normal subjects. Availability: Walter D. Fenz, Psychology Department, University of Waterloo, Waterloo, Ontario, Canada.
21 Name: Children’s Locus-of-control Scale. Authors: Irv Bialer and Rue L. Cromwell. Applicability: 6 to 14 years. Measures: The extent to which a child characteristically construes event outcomes (positive and negative) as being due to his own actions (internally controlled) or to the whims and/or manipulations of fate, chance or others (externally controlled). No. of items: 23. Reliability: Split-half reliability coefficient: 0.87; test–retest reliability: 0.37. Availability: Bialer (1960 [1961]).
22 Name: Embedded Figures Test. Authors: Herman A. Witkin, Philip K. Oltman, Evelyn Raskin and Stephen A. Karp. Applicability: 10 years and over. Measures: Field dependence. Availability: Consulting Psychologists Press, Inc., 577, College Ave., Palo Alto, California.
23 Name: Rosenzweig Picture-Frustration (P-F) Study Form for Adolescents. Author: Saul Rosenzweig. Applicability: 12 to 18 years. Measures: 1. Direction of aggression (extraggression, intraggression and imaggression) and 2. Type of aggression (obstacle-dominance, need-persistence). No. of items: 24. Reliability: Interscorer reliability: 0.80–0.90; retest correlations: 0.40–0.70. Validity: Construct, concurrent and predictive validity have been demonstrated. Availability: Saul Rosenzweig, 8029 Washington Avenue, St. Louis, Missouri–63114.
24 Name: Adjective Inventory Diagnostic (AID). Authors: Robert E. Peck and J.E. Everson. Applicability: Children and adults. Measures: Personality traits such as type of personality and relative intelligence level. No. of items: 100 self-descriptive adjectives. Reliability: The reliability of the different diagnostic categories varies considerably, from a high of about 93 per cent at the present time to a low of 40 per cent. Availability: Computer Paramedics Inc., 175 Jericho Turnpike, Syosset, New York–11791.
25 Name: Symbolic Measure of Authoritarianism (SF-Test). Author: H. Wayne Hogan. Applicability: 5 years and over. Measures: Authoritarianism and tolerance for ambiguity. No. of drawings: 15 pairs of line drawings and number arrangements. Reliability: 0.84 to 0.93. Validity: Convergent validity is 0.64. Availability: H. Wayne Hogan, Department of Sociology, Tennessee Technological University, Cookeville, Tennessee–38501.
26 Name: Structured Clinical Interview (SCI). Authors: Eugene I. Burdock and Anne S. Hardesty. Age: Adult (18+) Mental Patients. Measures: Change in Psychopathology. No. of items: 179 dichotomous (Yes/No) behavioural items and a standard interview protocol (45 specific stimulus questions). Reliability: Inter-rater reliability ranges from 0.43 to 0.93 with median 0.73. Validity: Coefficients of 0.35 to 0.68 were found between the WBI and SCI in samples of 73 and 16, respectively. Availability: Springer Publishing Co., Inc., 200 Park Ave. South, New York, NY.
27 Name: The Group Personality Projective Test. Authors: Russell N. Cassel and Theodore C. Kahn. Age: 11 years and over. Measures: The respondent’s needs. No. of items: 90 drawings (five alternative explanations). Reliability: Split-half reliability ranges from 0.24 to 0.84 with median 0.50; test–retest reliability ranges from 0.41 to 0.85 with median 0.60. Validity: Face validity. Availability: Psychological Test Specialists, Box 1441, Missoula, Mont.–59801.
28 Name: Thematic Apperception Test. Author: Henry A. Murray. Age: 4 years and over. Measures: Fantasy in a manner that leads to conceptualisation of individual personality or group dynamics, and to theories of personality. Reliability: Retest reliability in terms of stability over 20 years ranged between 0.76 and 0.85. Validity: Concurrent validity in terms of differential results with contrasted groups of subjects remains satisfactory. Availability: Harvard University Press, 79 Garden St., Cambridge, Massachusetts–02138.
29 Name: Choice Dilemmas Questionnaire (CDQ). Authors: Nathan Kogan and Michael A. Wallach. Age: Approximately 17 years and over. Measures: Self-described risk-taking attitudes in such areas as business, sports, marriage, and so on. No. of items: 12. Availability: Wallach and Mabil (1970) and Wallach and Wing (1968).
30 Name: Children’s Humor Test. Authors: Priscilla V. King and James E. King. Age: 4 to 8 years. Measures: Hostile and nonsensical humour. No. of items: Five, each with a missing section. Reliability and validity: Four-year-olds preferred hostile-aggressive alternatives more often than five-year-olds (P < 0.001). Boys preferred hostile-aggressive alternatives more often than girls (P < 0.05). Availability: James E. King, Department of Psychology, University of Virginia, Virginia, USA.
31 Name: Thorndike Dimensions of Temperament. Author: Robert L. Thorndike. Applicability: Grades 11, 12 and college freshmen. Measures: Temperament. No. of items: 10 dimensions of temperament. Reliability: Split-half reliability ranges from 0.54 to 0.87. Validity: 0.43–0.73. Availability: The Psychological Corporation, NY–10017.
32 Name: The Rotter Incomplete Sentence Blank. Authors: Julian B. Rotter and Janet E. Rafferty. Applicability: High School and Adults. Measures: Personality. No. of items: 40 items. Reliability: Split-half reliability: 0.84 for male students and 0.83 for female students; inter-scorer reliability of 0.91 for males and 0.96 for females. Validity: Concurrent validity. Availability: The Psychological Corporation, 304 East 45th Street, NY–10017.
33 Name: Neuroticism Scale Questionnaire (NSQ). Authors: I.H. Scheier and R.B. Cattell. Applicability: College students. Measures: Degree of neuroticism or neurotic trend. No. of items: 40 items. Reliability and validity: Reliability and validity coefficients are being worked out. Availability: International copyright of the Institute of Personality and Ability Testing.
34 Name: The Orientation Inventory. Author: Bernard M. Bass. Applicability: Students and workers. Measures: Self-orientation, interaction orientation and task orientation. No. of items: 27 statements and examinee had to choose the most and least preferred of the three alternatives. Reliability: Test–retest 0.73 to 0.76. Validity: Concurrent validity. Availability: Consulting Psychologists Press, California.
35 Name: Color Pyramid Test. Authors: K. Warner Schaie and Robert Heiss. Applicability: Children, adolescents and adults. Measures: Aspects of personality which are relevant to affect expression and impulse control. No. of items: Squares of coloured paper in 24 different hues. Reliability: Test–retest: 0.33–0.81; internal consistency via ANOVA. Validity: Construct validity (separately for boys and girls), concurrent validity and predictive validity. Availability: Hans Huber Publishers, Berne.
Indian Personality Tests
Personality
1 Name: Kundu’s Neurotic Personality Inventory (KNPI). Author: Ramanath Kundu. Applicability: For adults. Measures: Diagnosis, selection and guidance (for neurotic personality). No. of items: 66. Reliability: Split-half reliability coefficients range from 0.72 to 0.89 for male (N = 692), female (N = 308) and neurotic (N = 50) groups. Validity: Criterion validity coefficients using the neurotic group were found to be 0.86 and 0.87 for males and females, respectively. Availability: Kundu (1962).
2 Name: Maudsley Personality Inventory. Author: T.E. Shanmugam. Applicability: English/Tamil speaking adults. Measures: Personality traits. No. of items: 48. Reliability: Split-half reliability using the Spearman-Brown prophecy formula is 0.64. Validity: It has been used on varied samples and was found to be valid for measuring the proposed traits. Availability: Shanmugam (1965).
3 Name: Rosenzweig Picture-Frustration Study for Children. Author: Udai Pareek. Applicability: Hindi speaking children. No. of items: Nine pictures. Measures: Frustration. Reliability: Scoring reliability: 98 per cent agreement; stability coefficients range from 0.51 to 0.78. Validity: Validated using teachers’ ratings, delinquent groups and artificially induced frustration. Availability: Manasayan, 32, Netaji Subhash Marg, Delhi–110006.
4 Name: Rosenzweig Picture-Frustration Study (Adult Form). Authors: Udai Pareek, R.S. Devi and Saul Rosenzweig. Applicability: Adults. Measures: Frustration. Reliability: Stability coefficients ranged from 0.27 to 0.82 and consistency values from 0.46 to 0.74. Validity: Cross-validity established. Availability: Rupa Psychological Corporation, 15/23, Sora Kuan, Varanasi; Pareek et al. (1968).
5 Name: Sharma Manifest Anxiety Scale (SMAS). Author: Sagar Sharma. Applicability: High school boys (11th grade) or adolescents of about 16 years. Measures: Manifest anxiety. No. of items: 46. Reliability: The reliability coefficient using KR-21 was 0.90. Validity: Content validity was worked out using 10 psychologists. Concurrent validity with TMAS was 0.72 (N = 100); correlation with Cattell’s IPAT Anxiety questionnaire was 0.91 (N = 100); correlation with Spielberger’s State-Trait Anxiety Inventory was 0.80 (N = 100). Availability: Sharma (1970a, 1970b, 1971) and Deo and Sharma (1971).
6 Name: Thematic Apperceptive Measure of Need Achievement. Author: Prayag Mehta.
Applicability: High school students. Measures: The level of need achievement. No. of items: Projective test containing six cards. Reliability: Coefficient of split-half: 0.73 (N = 22); test–retest reliability: 0.39 (N = 41) and 0.56 (N = 42). Validity: Construct and Predictive validity were calculated. Availability: Mehta (1969).
7 Name: Sinha Anxiety Scale. Author: D. Sinha. Applicability: College and university students of undergraduate and postgraduate levels. Measures: Manifest anxiety. No. of items: 100 items, with two response alternatives (Yes/No). Reliability: Reliability by the split-half and test–retest methods ranges between 0.85 and 0.92. Validity: Ranges between 0.69 and 0.72. Availability: Rupa Psychological Corporation, Varanasi.
8 Name: The Dependence Proneness Scale. Author: J.B.P. Sinha. Applicability: The scale is intended to measure dependence proneness in a person. Measures: The behavioural descriptions included in the scale cover inclinations such as seeking support, advice and/or orders from others, confiding in others uncritically, desiring to be encouraged, and so on. The negative inclinations include lacking initiative and independence. No. of items: 50 items, measuring dependence proneness on a five-point scale. Reliability: Split-half: 0.67. Validity: Construct validity: 0.55. Availability: Rupa Psychological Centre, B 19/60-B, Deoriabir, Bhelupura, Varanasi–221001.
9 Name: Test Anxiety Scale (TAS). Author: V.P. Sharma. Applicability: College students, both boys and girls.
Measures: Test anxiety. Reliability: Test–retest and split-half coefficients are 0.927 and 0.876, respectively. Validity: 0.768 and 0.743. Availability: National Psychological Corporation, 4/230, Kacheri Ghat, Agra–282004.
10 Name: Multivariable Personality Inventory (MPI). Author: B.C. Muthayya. Applicability: Administrative personnel at the block level. Measures: Personality attributes. No. of items: 50. Reliability: Obtained r = 0.52. Validity: Item validity mentioned. Availability: Agra Psychological Research Cell, Agra.
11 Name: Indian Adaptation of Rosenzweig’s Picture Frustration Study for Children. Author: Udai Pareek. Applicability: Children aged 5 to 12 years. Measures: Responses to frustrating situations. No. of items: 24 pictures. Reliability: Stability coefficients: 0.51–0.78; consistency: 0.56–0.91. Validity: Item homogeneity; highly significant correlations. Availability: Manasayan, 32, Netaji Subhash Marg, Daryaganj, Delhi–110006.
12 Name: Verbal Projection Test (VPT). Author: T.E. Shanmugam. Applicability: Children aged 12–18 years. Measures: Personality. No. of items: 20. Reliability: Split-half and parallel-form methods. Validity: Validated against comparable groups of delinquents and against teachers’ and parents’ estimates. Availability: Department of Psychology, University of Madras.
13 Name: Kundu Introversion Extraversion Inventory. Author: Ramanath Kundu. Applicability: Adult group. Measures: Introversion–extraversion dimension of adults. No. of items: 70 items with an uneven number of response choices divided into five blocks. Reliability: Split-half, odd–even, first half versus second half for each block and whole test. Validity: Concurrent validity and predictive validity. Availability: Ramanath Kundu, Department of Psychology, University College of Science and Technology, 92 Acharya Profulla Chandra Road, Calcutta–700009.
14 Name: Ego Strength Scale. Author: Q. Hasan. Applicability: Adult male and female groups. Measures: Improvement in psychoneurotic patients undergoing psychotherapy. No. of items: 32 dichotomous items requiring true or false responses. Reliability: Split-half: 0.78; test–retest: 0.86 and 0.82 after a gap of two and five weeks, respectively. Validity: Some evidence in support of construct validity, but more in support of concurrent and predictive validity. Availability: Rupa Psychological Centre, Varanasi–221001.
15 Name: Sinha W-A Self-Analysis Form (Anxiety Scale). Author: D. Sinha. Applicability: Undergraduate and postgraduate students. Measures: Manifest anxiety. No. of items: 100 items, each with two possible responses (Yes or No). Reliability: Split-half: 0.92; test–retest: 0.85. Validity: The inventory was validated against two well-established scales, Taylor’s MAS Scale and Cattell’s IPAT anxiety scale; the obtained values are 0.69 and 0.72. Availability: Rupa Psychological Corporation, Varanasi.
16 Name: The Insecurity Questionnaire. Author: Dr G.C. Pati.
Applicability: Used to find out the degree of insecurity feeling in any individual. Measures: Feelings of guilt, shame, rejection, isolation, perception of world and life. No. of items: 20. Reliability: Test–retest: 0.936; no attempt to determine the internal consistency of the questionnaire. Validity: Coefficient correlating the scores of 55 subjects with their self-ratings was 0.713. Availability: Rupa Psychological Centre, Varanasi–221001.
17 Name: Hindi Version of Eysenck’s Maudsley Personality Inventory. Authors: S.S. Jalota and S.D. Kapoor. Applicability: 16 years and over. Measures: Degree of neuroticism and extraversion. Reliability: Split-half reliabilities for neuroticism and extraversion were 0.71 and 0.42, respectively. Availability: Psycho-centre, Green Park, Delhi.
18 Name: Dutt Personality Scale. Author: N.K. Dutt. Applicability: 15 years and above. Measures: General anxiety and various components of anxiety such as insecurity complex, self-consciousness, guilt proneness, tension, paranoid suspiciousness, emotional instability, hypochondriac tendencies and somatic reactions. No. of items: 90. Reliability: Split-half: 0.84–0.96. Validity: Content and construct validity. Availability: Rupa Psychological Centre, B 19/60-B, Deoriabir, Bhelupura, Varanasi–221001.
19 Name: Indian Frustration Scale. Author: L.I. Bhushan. Applicability: College students and teachers. Measures: Authoritarianism, along nine dimensions: conventionalism, authoritarian submission, authoritarian aggression, anti-intraception, superstition and stereotypes, power and toughness, destructiveness and cynicism, projectivity, and sex.
No. of items: Nine items are negative and 25 are positive. Reliability: Split-half: 0.77; test–retest: 0.72. Validity: Content validity, concurrent validity and construct validity (0.54); predictive validity is statistically significant. Availability: National Psychological Corporation, 4/230, Kacheri Ghat, Agra–282004.
20 Name: Prolonged Deprivation Scale (PDS). Authors: Girishwar Misra and L.B. Tripathi. Measures: Degree of deprivation. No. of items: 96 items, five-point scale items. Reliability: Inter-rater reliability (0.55–0.93), test–retest (0.77), split-half (0.95). Validity: Content, intrinsic, predictive, construct. Availability: National Psychological Corporation, Agra.
21 Name: Measure of Altruism. Authors: V.K. Kool and Manisha Sen. Applicability: 17 years and above. Measures: Personality. No. of items: 23 (forced choice). Reliability: 0.60 (Guttman split-half); 0.65 (alpha). Validity: 0.42. Availability: Manisha Sen, Department of Applied Psychology, Bombay University, Bombay.
22 Name: Post-Graduate Institute (PGI) Locus of Control Scale. Authors: O.K. Menon, N.N. Wig and S.K. Varma. Applicability: Children, adults and aged persons. Measures: Locus of control. No. of items: 7. Reliability: Test–retest (N = 100): 0.76; inter-scorer reliability (N = 100): 0.995. Validity: Correlation of scale scores with psychiatrists’ ratings (N = 25): 0.792. Availability: PGI, Medical Education and Research, Chandigarh.
23 Name: Psychoanalytic Personality Inventory. Author: Bishwanath Roy. Applicability: 15 years and above. Measures: Personality Structure of the individual based upon the factors like Identification, Tolerance, Narcissism, Lasting cathexis, Object cathexis, Libidinal cathexis, Dependence, Sibling Rivalry and Total. No. of items: 44. Reliability: Split-half (Spearman-Brown method): 0.833; odd–even: 0.828. Validity: Factorial and face validity. Availability: Bishwanath Roy, Department of Psychology, NCERT, New Delhi.
24 Name: Swamulyankan Prashnavali: Ek Chinta Mapani. Authors: R.R. Tripathi and Ambar Rastogi. Applicability: Adolescents and adults. Measures: State, trait and free-floating anxieties. Reliability: Alpha coefficients (N = 200): A-State scale = 0.931, A-Trait scale = 0.892, Free Floating scale = 0.869; stability coefficients (N = 200): A-State scale = 0.798, A-Trait scale = 0.892, Free Floating scale = 0.901. Validity: Convergent validity coefficients against the IPAT scale (N = 100): A-State: 0.691, A-Trait: 0.799, Free Floating: 0.528. Discriminant validity coefficients vary between 0.003 and 0.110 against the 15 subscales of IPPS. Availability: Raghubir Sharan Publications, Varanasi.
25 Name: Dimensions of Friendship Scale. Authors: S. Chandra and N.K. Chadha. Applicability: College students.
Measures: Enjoyment, Acceptance, Trust, Respect, Mutual Assistance, Confiding, Understanding and Spontaneity. No. of items: 64 Yes/No type. Reliability: Split-half: 0.78; test–retest: 0.70–0.81. Validity: Cross-validity: 0.74–0.82. Availability: National Psychological Corporation, 4/230, Kacheri Ghat, Agra.
26 Name: Family Environment Scale (FES). Authors: Harpreet Bhatia and N.K. Chadha. Applicability: College students. Measures: Cohesion, Expressiveness, Conflict, Acceptance and Caring, Independence, Active-Recreational Orientation, Organisation and Control. No. of items: 69. Reliability: Split-half: 0.48–0.92. Validity: Face and content validity established. Availability: Ankur Psychological Agency, 22/481, Indira Nagar, Lucknow.
27 Name: Dimensions of Temperament Scale (DTS). Authors: N.K. Chadha and S. Chandna. Applicability: College students. Measures: Sociability, Ascendance, Secretiveness, Reflective, Impulsivity, Placid, Accepting, Responsible, Vigorous, Cooperative, Persistence, Warmth, Aggressiveness, Tolerance and Toughminded. No. of items: 152 Yes/No type. Reliability: Test–retest 0.84–0.95; split-half: 0.79. Validity: Empirical: 0.73; cross-validity: 0.68–0.92. Availability: National Psychological Corporation, 4/230, Kacheri Ghat, Agra.
28 Name: Dimensions of Rigidity Scale (DRS). Author: N.K. Chadha. Applicability: College students.
Measures: Intellectual rigidity, Emotional rigidity, Dispositional rigidity, Social rigidity, Behavioural rigidity, Perceptual rigidity, Creative rigidity. No. of items: 75 Yes/No type. Reliability: Split-half: 0.71; test–retest: 0.69–0.94. Validity: Cross-validity: 0.20–0.38; empirical validity: 0.20–0.59. Availability: National Psychological Corporation, 4/230, Kacheri Ghat, Agra.
NOTE 1. Sten scores is short for ‘standard ten’ scores. A sten score is a standard score on a ten-point scale, often used in the interpretation of psychological tests.
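As a rough illustration of this note, a sten is conventionally obtained by rescaling a z-score to a mean of 5.5 and a standard deviation of 2, and then assigning it to one of ten whole-number bands. The Python sketch below assumes that conventional definition; the function name `to_sten` and the norm values in the usage line are illustrative and are not taken from any test manual.

```python
import math


def to_sten(raw_score, norm_mean, norm_sd):
    """Convert a raw score to a sten, assuming the conventional scale
    (mean 5.5, SD 2, whole values limited to 1-10)."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw_score - norm_mean) / norm_sd
    # Each sten band is half an SD wide; a score exactly at the mean
    # falls on the boundary between stens 5 and 6 and is placed in 6 here.
    sten = math.floor(2 * z + 6)
    return max(1, min(10, sten))


# Hypothetical norms: mean 50, SD 10.
print(to_sten(60, 50, 10))
```

How a score lying exactly on a band boundary is treated is a convention; the sketch places it in the higher band.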
15
Applications of Psychological Testing in Organisational Setting
CHAPTER OUTLINE
1. Applications of psychological testing in an organisational setting
2. Two practical demonstrations with scores and interpretations:
(i) Myers–Briggs Type Indicator (MBTI)
(ii) Emotional Quotient (EQ) Test
3. Directory of major tests used in organisational settings
LEARNING OBJECTIVES
After reading this chapter, you will be able to:
1. Understand how psychological tests can be applied in business settings.
2. Understand the use and interpretation of two major tests used in business settings.
3. Find information about important tests.
APPLICATIONS OF PSYCHOLOGICAL TESTING IN AN ORGANISATIONAL SETTING
Psychological testing is big business today, and business organisations are one area where it finds tremendous application. In business organisations, psychological tests are used mainly at two stages: first, in the pre-employment phase and, second, as performance appraisal tools after employment. Formal testing can be said to begin with the interview, in which the interviewer assesses the suitability of a candidate. In a formal interview, the interviewer follows a standardised format of questions in a set order; the information obtained is systematic and its validity is high. Informal interviews, on the other hand, are unsystematic: there are no predetermined questions or procedures, and the interviewer may ask questions irrelevant to the job and may evaluate the candidate subjectively. Successful candidates can then be assessed with psychological tests for their personality traits, cognitive skills, ethical values, and social and interpersonal skills. Psychological tests and various rating scales are also very useful instruments for the performance appraisal of employees. Apart from this, psychological tests can be used in a more positive sense in management development programmes, leadership development training and other areas of training and development.
TWO PRACTICAL DEMONSTRATIONS WITH SCORES AND INTERPRETATIONS
Myers–Briggs Type Indicator (MBTI)
MBTI: A Brief Profile
The Myers–Briggs Type Indicator (MBTI) is a personality inventory developed by Isabel Briggs Myers and her mother Katharine C. Briggs (Briggs et al. 2003). It is based on Carl Jung’s theory of psychological type. The purpose of MBTI ‘is to make the theory of psychological types as described by C.G. Jung (1921/1971) understandable and useful in people’s lives’ (MBTI Manual, Briggs et al. 2003: 3). The test consists of 93 items and is applicable to persons aged 14 years and over. Based on Jung’s theory, these 93 items measure four bipolar continuums:
1. Extraversion–Introversion (E–I),
2. Sensing–Intuition (S–N),
3. Thinking–Feeling (T–F) and
4. Judging–Perceiving (J–P).
Items of the test are arranged in four parts, each with a different number of items. Part I asks ‘which answer comes closest to describing how you usually feel or act’ and has 26 items. The items are of the following kind:
Do you consider yourself to be
• More of a spontaneous person or
• More of an organised person.
As can be seen, the items are of the forced-choice type. The test was standardised on a normative sample of 3,009 US adults (18+), broadly representative of the 1990 US census in sex and ethnicity, although white women were overrepresented and black men were underrepresented. The test has a split-half reliability of 0.90; test–retest reliability ranges from 0.83 to 0.97; internal consistency (coefficient alpha) ranges from 0.83 to 0.97; and internal consistency (coefficient alpha) for males and females ranges from 0.90 to 0.93. Its validity is moderate to high when correlated with the five-factor model of personality as portrayed in the NEO-PI-R (Erford 2006). MBTI is a very popular test: more than three million people are administered the MBTI each year (Michael 2003). It has been used to study areas such as communication, conflict, leadership, change and customer relationships.
MBTI: Scores and Interpretation
The scores of the subject ABC indicate that on the Introversion–Extraversion dimension, the subject has a leaning towards Introversion. On the scale of Sensing–Intuition, the score is higher for Intuition. For Thinking–Feeling, Feeling has the higher score and for Judging–Perceiving, ABC shows a higher score for Perceiving. These results indicate that ABC has the combination INFP (Table 15.1).
At their best, people with INFP preferences have an inner core of values that guides their interactions and decisions. They want to be involved in work that contributes both to their own growth and inner development and to that of others, that is, to have a purpose beyond their paycheck. They make a priority of clarifying their values and living in congruence with them.
An INFP combination indicates that ABC prefers to focus on his inner world of ideas and experiences. He directs his energy and attention inward and receives energy from reflecting on his thoughts, memories and feelings. He is able to recognise and honour the emotional and psychological needs of others, even when others may not have recognised or expressed their own needs. As an INFP, ABC primarily uses his Feeling preference internally when he needs to make decisions. His decisions are primarily based on his values of self-understanding, individuality and growth. Living by moral commitments to what he believes in is crucial to a person with an INFP combination. He is likely to be:
• Sensitive, concerned and caring and
• Idealistic and loyal to his ideas.
Table 15.1

RAW POINTS
Extraversion   E   10     11   I   Introversion
Sensing        S    4     22   N   Intuition
Thinking       T    5     19   F   Feeling
Judging        J    2     20   P   Perceiving

REPORTED TYPE
E or I: I
S or N: N
T or F: F
J or P: P

TIE-BREAKING RULE
If E = I then write I
If S = N then write N
If T = F then write F
If J = P then write P
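The reported type follows mechanically from the raw points and the tie-breaking rule in Table 15.1. A minimal Python sketch of that step is given below, using ABC’s raw points; the function name `reported_type` and the dictionary layout are illustrative choices and are not part of the MBTI materials.

```python
# Each dichotomy lists its two poles; the second pole is written on a tie
# (E = I -> I, S = N -> N, T = F -> F, J = P -> P).
DICHOTOMIES = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]


def reported_type(raw_points):
    """Return the four-letter type for a dict of raw points per pole."""
    letters = []
    for first, second in DICHOTOMIES:
        if raw_points[first] > raw_points[second]:
            letters.append(first)
        else:
            # Covers both a higher score on the second pole and a tie.
            letters.append(second)
    return "".join(letters)


# Raw points of subject ABC as given in Table 15.1.
abc_points = {"E": 10, "I": 11, "S": 4, "N": 22, "T": 5, "F": 19, "J": 2, "P": 20}
print(reported_type(abc_points))  # INFP
```

Listing the tie-winning pole second lets a single comparison cover both the ordinary case and the tie-breaking rule.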
An INFP person enjoys reading, discussing and reflecting on possibilities for positive change in the future. He is curious about ideas and quick to see connections and meanings. As an INFP, the subject is likely to:
• Be curious and creative and
• Have long-range vision.
Another aspect of an INFP person is that he is usually fascinated by opportunities to explore the complexities of human personality—his own and others’. He is capable of putting himself in the other person’s place and can be empathetic. He may tend to work in bursts of energy and is capable of great concentration and output when fully engaged in a project. A person with an INFP personality is generally faithful in fulfilling obligations related to people, work or ideas to which he is committed, but can have difficulty performing routine work that has little meaning for him.
How Others May See Them
INFP persons find structures and rules confining and prefer to work autonomously. They are adaptable and flexible until something violates their inner values. Then they stop adapting. The resulting expression of value judgements can emerge with an intensity which is surprising to others. Individuals with INFP personality tend to be reserved and selective about sharing their most deeply held values and feelings. They value relationships based on depth, authenticity, true connection and mutual growth. INFP persons prize most those who take time to understand their values and goals. Others usually see them as:
• Sensitive, introspective and complex,
• Original and individual and
• Sometimes difficult to understand.
Potential Areas for Growth
• Sometimes they may fail to take in information and to notice the realities of situations. They may then make decisions based solely on personal values and find it difficult to translate their values into action.
• They may not take time for the inner valuing process by which they make their best decisions; instead, they may go from one exciting possibility to another and achieve little.
• They may have uncharacteristic difficulty expressing themselves verbally.
• They may withdraw from people and situations.
• They may not give enough information to others, especially about important values.
• They may become easily discouraged about the contrast between their ideals and accomplishments.
• They may reject logical reasoning even in situations that require it, asserting the supremacy of their internal viewpoint.
• They may be impractical and have difficulty estimating the resources required to reach a desired goal.
Under great stress, the INFP persons may begin seriously doubting their own competence and that of others, becoming overly critical and judgemental.
Emotional Quotient (EQ) Test
EQ Test: A Brief Profile
The EQ test given in this book was developed by Professor N.K. Chadha (2006). It is based on the operational definition proposed by the author that:
. . . emotional intelligence is the ability of an individual to appropriately and successfully respond to a vast variety of emotional stimuli being elicited from the inner self and immediate environment. Emotional intelligence constitutes three psychological dimensions such as emotional sensitivity, emotional maturity and emotional competency, which motivate an individual to recognise truthfully, interpret honestly and handle tactfully the dynamics of human behavior.
The test has been designed to measure all three emotional dimensions. It has been standardised for professional managers, businessmen, bureaucrats, artists and the graduate student population. The test has test–retest and split-half reliabilities of 0.94 and 0.89, respectively, and a validity of 0.89. It consists of 22 situational items with four alternatives for each item. Items 2, 8, 16, 17 and 22 measure emotional sensitivity; items 4, 6, 9, 11, 12, 18 and 21 measure emotional maturity; and items 1, 3, 5, 7, 10, 13, 14, 15, 19 and 20 measure emotional competency. Some sample items from the test are given here:
3. At the workplace, due to some misunderstanding, your colleagues stop talking to you. You are convinced that there was no fault of yours. How will you react?
a) Wait till they come and start talking to you again.
b) Take the initiative, go forward and start talking to them.
c) Let things take their own time to improve.
d) Ask someone to mediate.
18. While having an argument with someone, if you lose, you:
a) Feel totally beaten.
b) Wait for the next opportunity to beat your opponents.
c) Winning and losing are part of the game.
d) Analyze the reasons for the loss.
21. You have lived your life for so many years on this earth. How would you like to explain your life at the moment in one sentence?
a) Successful: Well, I am a contented person who got whatever could make me feel happy.
b) OK: Well, it’s a mixed experience for me. It’s 50:50.
c) Comfortable: Well, destiny is in the hand of God. Man is just a puppet.
d) Uncomfortable: Well, I feel I deserved better but could not get it.
Scores and Interpretation
Your Name: XYZ. Age: 25. Profession: Corp. employee. Gender: Male. Qualifications: Masters. Country: India. Date: 25-09-08.
Response sheet for EQ test: The subject’s responses are shown as bold and underlined options:

 1. a b c d     12. a b c d
 2. a b c d     13. a b c d
 3. a b c d     14. a b c d
 4. a b c d     15. a b c d
 5. a b c d     16. a b c d
 6. a b c d     17. a b c d
 7. a b c d     18. a b c d
 8. a b c d     19. a b c d
 9. a b c d     20. a b c d
10. a b c d     21. a b c d
11. a b c d     22. a b c d
Calculate your score by using this scoring key:

Question No.    a     b     c     d
 1.            15     5    10    20
 2.             5    10    15    20
 3.            15    20     5    10
 4.            20    15    10     5
 5.             5    20    15    10
 6.            10    20     5    15
 7.             5    15    20    10
 8.            10     5    20    15
 9.             5    10    20    15
10.             5    20    15    10
11.             5    10    15    20
12.            20    15    10     5
13.             5    15    20    10
14.            10    15     5    20
15.            10    10    20     5
16.             5    10    15    20
17.             5    10    15    20
18.             5    10    15    20
19.             5    20    15    10
20.            15    20    10     5
21.            20    15    10     5
22.            20    15    10     5
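Applying the key amounts to looking up the score of each chosen option and summing within each dimension. The Python sketch below shows one way to do this. The key and the item-to-dimension mapping are copied from the text; the names `SCORING_KEY`, `DIMENSIONS` and `score_eq` are hypothetical, and the sample response pattern is invented for illustration and is not that of subject XYZ.

```python
# Scores for options a, b, c, d of each item, as given in the scoring key.
SCORING_KEY = {
    1: (15, 5, 10, 20), 2: (5, 10, 15, 20), 3: (15, 20, 5, 10),
    4: (20, 15, 10, 5), 5: (5, 20, 15, 10), 6: (10, 20, 5, 15),
    7: (5, 15, 20, 10), 8: (10, 5, 20, 15), 9: (5, 10, 20, 15),
    10: (5, 20, 15, 10), 11: (5, 10, 15, 20), 12: (20, 15, 10, 5),
    13: (5, 15, 20, 10), 14: (10, 15, 5, 20), 15: (10, 10, 20, 5),
    16: (5, 10, 15, 20), 17: (5, 10, 15, 20), 18: (5, 10, 15, 20),
    19: (5, 20, 15, 10), 20: (15, 20, 10, 5), 21: (20, 15, 10, 5),
    22: (20, 15, 10, 5),
}

# Items belonging to each EQ dimension, as listed in the test profile.
DIMENSIONS = {
    "sensitivity": (2, 8, 16, 17, 22),
    "maturity": (4, 6, 9, 11, 12, 18, 21),
    "competency": (1, 3, 5, 7, 10, 13, 14, 15, 19, 20),
}


def score_eq(responses):
    """responses maps each item number (1-22) to the chosen option 'a'-'d'."""
    missing = set(SCORING_KEY) - set(responses)
    if missing:
        raise ValueError(f"unanswered items: {sorted(missing)}")
    item_scores = {
        item: SCORING_KEY[item]["abcd".index(option)]
        for item, option in responses.items()
    }
    totals = {
        name: sum(item_scores[i] for i in items)
        for name, items in DIMENSIONS.items()
    }
    totals["total"] = sum(totals.values())
    return totals


# Hypothetical respondent who chooses option 'b' on every item.
all_b = {item: "b" for item in SCORING_KEY}
print(score_eq(all_b))
```

Because every item scores between 5 and 20, the totals produced this way stay within the score ranges quoted for each dimension (25–100, 35–140, 50–200 and 110–440 overall).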
Quantitative analysis of your scores:

EQ DIMENSIONS     Situations                                   Your Score   Your P (percentile)
SENSITIVITY       2-8-16-17-22 (5 situations)                  75           P-50
MATURITY          4-6-9-11-12-18-21 (7 situations)             90           P-50
COMPETENCY        1-3-5-7-10-13-14-15-19-20 (10 situations)    160          P-75
TOTAL EQ SCORE    All Situations (22 situations)               325          P-75
The percentile table:

EQ DIMENSIONS                            P-90       P-75       P-50       P-40       P-20
SENSITIVITY (Range of score: 25–100)     93–100     86–92      66–85      36–65      < 35
MATURITY (Range of score: 35–140)        133–140    113–132    88–112     53–87      < 52
COMPETENCY (Range of score: 50–200)      168–200    141–168    97–140     71–96      < 70
TOTAL EQ (Range of score: 110–440)       379–440    308–379    261–307    159–260    < 158
Qualitative analysis of your scores:

Percentile    Interpretation
P-90          Extremely high EQ
P-75          High EQ
P-50          Moderate EQ
P-40          Low EQ
P-20          Try the test again some other day
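Reading a score against the percentile table and the interpretation table is a simple band lookup, sketched below in Python. The sketch assumes that each band is identified by its lower bound, so that a score on a shared boundary (for example 168 on Competency, which the table lists under both P-90 and P-75) is placed in the higher band, and that any score below the P-40 band falls in P-20. These boundary conventions and the names `BAND_FLOORS`, `INTERPRETATION` and `percentile_band` are assumptions made for illustration.

```python
# Lower bound of each band, highest band first, taken from the percentile table.
BAND_FLOORS = {
    "sensitivity": ((93, "P-90"), (86, "P-75"), (66, "P-50"), (36, "P-40")),
    "maturity": ((133, "P-90"), (113, "P-75"), (88, "P-50"), (53, "P-40")),
    "competency": ((168, "P-90"), (141, "P-75"), (97, "P-50"), (71, "P-40")),
    "total": ((379, "P-90"), (308, "P-75"), (261, "P-50"), (159, "P-40")),
}

INTERPRETATION = {
    "P-90": "Extremely high EQ",
    "P-75": "High EQ",
    "P-50": "Moderate EQ",
    "P-40": "Low EQ",
    "P-20": "Try the test again some other day",
}


def percentile_band(dimension, score):
    """Return the percentile band label for a score on the given dimension."""
    for floor, band in BAND_FLOORS[dimension]:
        if score >= floor:
            return band
    return "P-20"  # assumed catch-all for scores below the P-40 band


# Scores of subject XYZ from the quantitative analysis.
xyz = {"sensitivity": 75, "maturity": 90, "competency": 160, "total": 325}
for dimension, score in xyz.items():
    band = percentile_band(dimension, score)
    print(dimension, score, band, INTERPRETATION[band])
```

For XYZ this lookup gives P-50 for Sensitivity and Maturity and P-75 for Competency and Total EQ, in agreement with the quantitative analysis.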
DIRECTORY OF MAJOR TESTS USED IN ORGANISATIONAL SETTINGS
Major Indian Tests Useful in an Organisational Setting
1 Name: Supervisory Behaviour Description. Author: Edwin A. Fleishman. Applicability: Supervisors. Measures: Leader behaviour dimensions: Consideration (C) and Structure (S). Reliability: Split-half reliability for C ranges from 0.89 to 0.98, and for S, it ranges from 0.68 to 0.87; test–retest: 0.56, 0.58 and 0.87 for C, and 0.46, 0.53 and 0.75 for S; inter-rater agreement is somewhat marginal: 0.55–0.73 for C, and 0.47–0.90 for S. Validity: Construct validity, predictive and constructive validity. Availability: Management Research Associates, 185, North Wabash Ave, Chicago, Illinois.
2 Name: Managerial Style Questionnaire. Author: Bruce A. Kirchhoff. Applicability: Managers and subordinates. Measures: Extent of goal use. Reliability: Test–retest reliability is 0.72 for total raw score and 0.62 for standardised score. Validity: Construct validity. Availability: Personnel Psychol, 20, Nassau St., Princeton, N.J.
3 Name: Career Awareness Inventory. Author: La Verna M. Fadale.
Applicability: Grades 4 to 8. Measures: Used for career education and familiarity with and knowledge of a variety of careers. No. of items: 12 occupational clusters; 125-item inventory with seven parts: Part I (61 items), Part II (6 items), Part III (32 items), Part IV (4 items), Part V (5 items), Part VI (10 items), and Part VII (7 items). Reliability: Split-half reliability is 0.80. Validity: Relationship to reading achievement and intelligence and the intercorrelations of the subparts (r = 0.47 and 0.62, respectively). Availability: Scholastic Testing Service, Inc., 480, Meyer Road, Bensenville, Illinois.
4 Name: Junior High School Student’s Rating Scale of Teaching Effectiveness. Authors: L. Grant Somers and Mara L. Southern. Applicability: 11 to 14 years. Measures: Teacher’s behaviour as evaluated by the students. No. of items: Eight-item Likert-type scale. Reliability: Alpha coefficient of 0.84. Availability: Mara L. Southern, Department of Testing and Evaluation, California–95192.
5 Name: Minnesota Satisfaction Questionnaire. Authors: David J. Weiss, Rene V. Dawis, George W. England and Lloyd H. Lofquist. Applicability: Business and industry. Measures: Satisfaction with a number of different aspects of the work environment; also evaluates vocational rehabilitation outcome. No. of items: The long form consists of 100 items in Likert-type response format; the short form contains 20 items. Reliability: Internal consistency coefficients computed for 27 occupational groups: 0.80 or higher; stability (one-week period) coefficients ranged from 0.66 to 0.91; test–retest (one-year) correlations ranged from 0.35 to 0.71. Validity: Construct validity. Availability: Psychology Research, Elliott Hall, University of Minnesota, Minneapolis, Minnesota.
6 Name: College and University Environments Scales, Second Edition. Author: C. Robert Pace.
Applicability: College Students. Measures: The cultural, social and intellectual climate of the campus. The original five scales are:
1. Practicality (20 items),
2. Community (20 items),
3. Awareness (20 items),
4. Propriety (20 items) and
5. Scholarship (20 items).
Two additional scales in this edition are:
1. Campus morale (22 items) and
2. Quality of teaching and faculty–student relationships (11 items).
No. of items: 160. Reliability: Test–retest comparisons of groups within a single institution showed that 80 per cent differed by three points or less and 90 per cent differed by four points or less. Validity: Reported high. Availability: Educational Testing Service, Princeton, N.J.
7 Name: Kuldau Occupational Development Inventory (KODI). Authors: Yon D. Kuldau and Janice E. Kuldau. Applicability: Grades 4 to 6. Measures: Attitudes that children in grades four, five and six have acquired towards the world of work: money, working conditions, status and prestige, leadership, self-expression and independence on the job. No. of items: 40. Validity: Content validity. Availability: Yon D. Kuldau and Janice E. Kuldau, 1 White Birch Trail, Superior, Wisconsin–54880.
8 Name: Occupational Aspiration Scale (OAS). Authors: Archibald O. Haller and Irwin W. Miller. Applicability: Primarily male high school students. Measures: Level of occupational goal. No. of items: Six multiple choice questions each consisting of 10 different occupations.
Reliability: 19 estimates of the reliability of OAS range from 0.69 to 0.84 with a mean of 0.73. Validity: Coefficient of concurrent validity is a moderate value of 0.62 based on OAS’s correlation with the North–Hatt technique. Availability: Archibald O. Haller, Rural Sociological Research Laboratory, WARF Office Building Room–617, 610, Walnut Street, Madison, Wisconsin–53706.
9 Name: Minnesota Satisfactoriness Scale. Authors: Dennis L. Gibson, David J. Weiss, Rene V. Dawis and Lloyd H. Lofquist. Applicability: Employees. Measures: The satisfactoriness of an individual as an employee. No. of items: 28-item questionnaire. Reliability: Hoyt reliability coefficient: 0.69–0.95; test–retest (stability) correlation coefficients: 0.40–0.68. Availability: Vocational Psychology Research, Elliott Hall, University of Minnesota, Minneapolis, Minnesota.
10 Name: Overall Job Satisfaction. Authors: A.H. Brayfield and H.F. Rothe. Applicability: Applicable to a wide variety of jobs. Measures: Job satisfaction. No. of items: 18. Reliability: Spearman-Brown coefficient of internal reliability is 0.87. Validity: Correlation with Organisational Commitment (Cook and Wall 1980): 0.58. Availability: Brayfield and Rothe (1951).
11 Name: Job Descriptive Index. Authors: P.C. Smith, L.M. Kendall and C.L. Hulin. Applicability: Employees at any level. Measures: The principal features of satisfaction, with an emphasis on extrinsic satisfaction. No. of items: Five subscales and a total of 72 items: Type of work (18 items), Pay (9 items), Promotion Opportunities (9 items), Supervision (18 items), Coworkers (18 items). Reliability: Internal reliability: 0.93; Spearman-Brown coefficients range from 0.80 to 0.88.
Validity: Concurrent validity is 0.66 (Koch and Steers 1978) between total JDI Score and measure of job attachment. Availability: Smith et al. (1969).
12 Name: Organisational Commitment Questionnaire. Authors: L.W. Porter and F.J. Smith. Applicability: Employees at any level. Measures: Organisational commitment, defined as the strength of an individual’s identification with and involvement in a particular organisation. It is said to be characterised by three factors: a strong belief in and acceptance of the organisation’s goals and values, a readiness to exert considerable effort on behalf of the organisation and a strong desire to remain a member of the organisation. No. of items: 15. Reliability: Ranges from 0.82 to 0.93 (Dubin et al. 1975; Steers 1977); internal reliability (Kerr 1978): 0.86. Validity: Convergent validity comes from significant negative correlations with work-oriented interests using Dubin’s (1956) measures of central life interests. Availability: Porter and Smith (1970).
13 Name: Organisational Commitment. Authors: J. Cook and T.D. Wall. Applicability: Blue collar workers of modest educational attainment. Measures: The organisational commitment in terms of identification, involvement and loyalty. No. of items: Nine. Reliability: Test–retest reliability: 0.50 (N = 63). Validity: Correlation with overall job satisfaction: 0.62. Availability: Cook and Wall (1980).
14 Name: Job Related Tension. Authors: R.L. Kahn, D.M. Wolfe, R.P. Quinn and J.D. Snoek. Applicability: Employees at all organisational levels. Measures: The nature, causes and consequences of two kinds of organisational stress: role conflict and role ambiguity.
No. of items: 15. Reliability: Alpha = 0.84. Validity: The correlation with attitude towards job (Vroom 1960): 0.49. Availability: Kahn et al. (1964).
15 Name: Self-Esteem at Work. Authors: R.P. Quinn and L.J. Shephard. Applicability: Employees at all levels. Measures: Self-esteem in job related context. No. of items: Four items. Reliability: Spearman-Brown internal reliability coefficient: 0.68. Validity: Self-esteem at work was correlated 0.44, 0.48 and 0.50 with depressed mood at work, life satisfaction and overall job satisfaction, respectively. Availability: Quinn and Shephard (1974).
16 Name: Intrinsic Motivation. Authors: E.E. Lawler and D.J. Hall. Applicability: Blue collar employees, professional workers. Measures: Intrinsic motivation, defined as the extent to which an employee is motivated to perform because of subjective rewards or feelings he expects as a result of performing well. No. of items: Four items each with a seven-point scale. Reliability: 0.72 (internal consistency). Validity: Factorial validity established. Availability: Lawler and Hall (1970).
17 Name: Intrinsic Job Motivation. Authors: P.B. Warr, J. Cook and T.D. Wall. Applicability: Employees of modest educational attainment. Measures: Intrinsic job motivation which has been defined as the degree to which a person wants to work well in his job in order to achieve intrinsic satisfaction. No. of items: 6.
Reliability: Test–retest reliability: 0.65. Validity: Correlation with interpersonal trust at work: 0.30; correlation with organisational commitment: 0.45. Availability: Warr et al. (1979).
18 Name: Work Environment Preference Schedule. Author: L.V. Gordon. Applicability: Male and female college students. Measures: Individual bureaucratic orientation, that is, the respondent’s commitment to the set of attitudes, values and behaviours that are characteristic of bureaucratic organisations. No. of items: 24. Reliability: Internal reliability coefficients range from 0.83 to 0.91; test–retest correlation: 0.65. Validity: Construct validity provided: correlation with authoritarianism 0.50 and 0.66, respectively, in samples of 108 and 81 female and male college students. Availability: Gordon (1973).
19 Name: Role Ambiguity and Role Conflict. Authors: J. Rizzo, R.J. House and S.J. Lirtzman. Applicability: Managerial and occupational groups. Measures: The degree of ambiguity and of conflict regarding a role, which is viewed as a set of expectations about behaviour; the two central concepts are role ambiguity and role conflict. No. of items: 30. Reliability: Szilagyi et al. (1976) (cited in Fields 2002) report Spearman-Brown internal reliability coefficients of 0.76 and 0.90 for role ambiguity, and 0.90 and 0.94 for role conflict. Validity: Cross-validity established. Availability: Rizzo et al. (1970).
20 Name: Leadership Opinion Questionnaire. Author: E.A. Fleishman. Applicability: Supervisors. Measures: It is a measure of a leader’s opinions about desirable leadership behaviour.
No. of items: 40. Reliability: Kuehl et al. (1975) obtained a correlation of 0.35 between the two subscales. Validity: Concurrent validity established. Availability: Fleishman (1953).
21 Name: Conflict Resolution Strategies. Authors: G. Howat and M. London. Applicability: Superior–subordinate or other dyads. Measures: How conflicts are handled within named superior–subordinate or other dyads. Following the work of Blake and Mouton (1964), five strategies are identified, for which there are five subscales: Confrontation, Withdrawal, Forcing, Smoothing and Compromise. No. of items: 25. Reliability: Test–retest correlation for five scales: 0.68, 0.49, 0.55, 0.55 and 0.49. Validity: 0.17 and 0.43. Availability: Howat and London (1980).
22 Name: Achievement Related Affect Scale. Authors: Daniel Solomon and Judy Yaeger. Applicability: 8 to 15 years. Measures: Achievement motivation that contains elements of behavioural striving for success and of affect associated with both success and failure. No. of items: 20. Reliability and validity: Internal consistency reliability (alpha): 0.47; correlations with other measures: n; social desirability (Crandall): 0.02; Locus of Control (IRA, Crandall): 0.24; IQ scale: 0.11; Anxiety (CMAS): 0.06; Lie Scale: 0.02 and Grade Point Average: 0.02. Availability: Daniel Solomon, Psychological Services Section, Montgomery County Public Schools, 850 Hungerford Drive, Rockville, Maryland–20850.
Major Indian Tests Useful in an Organisational Setting
23 Name: Interpersonal Trust Scale. Author: K.J. Christopher.
Applicability: Small-scale industrial entrepreneurs; could also be used with other industrial managements. Measures: The interpersonal trust orientation. No. of items: 25. Reliability: Reliability has been established by Rational Equivalence Method. Validity: The difficulty and validity indices for all the 25 items have been established. Availability: K.J. Christopher. ‘Social-Psychological factors influencing the adoption of the innovation of starting a small industry unit’, Hyderabad: SIET Institute.
24 Name: Workers’ Attitude Questionnaire. Authors: A. Hafeez and S.V. Subbaraya. Applicability: Adult male and female workers, supervisors of middle management personnel. Measures: Attitude of industrial workers, office personnel, supervisors and others to work, working conditions, management (employers), and so on. No. of items: 21. Reliability: Split-half reliability established. Availability: A. Hafeez, Department of Industrial Management, Indian Institute of Science, Bangalore–560012.
25 Name: Organisational Atmosphere Questionnaire. Author: Jayalakshmi Indiresan. Applicability: Engineering colleges (for teachers); could also be used with other colleges. Measures: The organisational atmosphere. No. of items: 20. Reliability: Odd–even reliability coefficient was 0.934 after Spearman-Brown formula was used. The subtest scores were found to be highly intercorrelated. Validity: The subtests were derived by using factor analysis. The factor structure was very similar to the dimensions of Halpin and Croft’s Organisational Climate Description Questionnaire (OCDQ). Availability: Indiresan (1973).
26 Name: Trade Union Affiliation Questionnaire. Author: Babu Thomas. Applicability: Employees.
Measures: The affiliation of trade unions. No. of items: 30 questions covering five areas (five-point scale). Reliability: Test–retest reliability: 0.87. Validity: Boseroa; r = 0.614. Availability: Babu Thomas, Department of Psychology, M.S. University, Baroda.
27 Name: MAO(C). Author: Udai Pareek. Applicability: Students, entrepreneurs and institutions. Measures: Motivational climate of organisations. No. of items: 72 items measuring six motives, divided into eight dimensions of six items each (six-point scale). Reliability: Test–retest reliability established. Validity: Established by factor analysis. Availability: Pareek (1975).
28 Name: Organisational Effectiveness Scale. Author: C.N. Daftuar. Applicability: Supervisors and above in organisations. Measures: Nine dimensions/soft criteria of organisational effectiveness. No. of items: 46 (Likert-type). Reliability: Alpha coefficient ranged from 0.51 to 0.99. Validity: Validated on the basis of data from effective and non-effective organisations. Availability: C.N. Daftuar, Department of Psychology, M.S. University, Baroda.
29 Name: Work Satisfaction Questionnaire (for Workers). Authors: Prayag Mehta and Mahaveer Jain. Applicability: Industrial workers. Measures: Level of satisfaction with work. No. of items: 26 questions. Reliability: KR-20: 0.85; split-half: 0.67.
Validity: Factorial validity and face validity established. Availability: Mehta (1976).
30 Name: Role Conflict Inventory. Author: B.A. Parikh. Applicability: Married working women. Measures: The level of adjustment and degree of conflict. No. of items: 40 (Likert-type scale). Reliability: Test–retest: 0.84; split-half: 0.79. Validity: Validated against self-evaluation. Availability: Rupa Psychological Centre, Bhelupura, Varanasi.
31 Name: Occupational Stress Index (OSI). Authors: A.K. Srivastava and A.P. Singh. Applicability: Employees. Measures: Degree of perceived stress arising from various aspects of the job of the employees. No. of items: 56 items: 28 are true-keyed and 28 are false-keyed. Reliability: Split-half: 0.935; Cronbach’s alpha: 0.90. Validity: Coefficients of correlation between the scores on the occupational stress index and the measures of job involvement, applicability strength and employee’s motivation were found to be 0.80 (N = 120), 0.40 (N = 120) and 0.44 (N = 200), respectively. Availability: Srivastava and Singh (1981).
32 Name: Resistance to Change Scale. Author: D.M. Pestonjee. Applicability: Industrial workers and supervisors. Measures: Organisational change. No. of items: 50 items on social, economic and personal aspects. Reliability: Split-half reliability: 0.96. Validity: Item-test correlation method. Availability: Pestonjee (1972).
33 Name: Job Satisfaction Questionnaire (JSQ). Authors: P. Kumar and D.N. Mutha. Applicability: Teachers. Measures: Four different aspects of job satisfaction in teaching work: salary; security and promotion policies; institutional plans and policies; and authority, including school management. No. of items: 29 Yes/No type. Reliability: Split-half: 0.95 (N = 100); test–retest: 0.73 (N = 60). Validity: Face validity very high. Availability: Kumar and Mutha (1975).
34 Name: School Organisational Climate Description Questionnaire. Author: Motilal Sharma. Applicability: Students. Measures: Institutional climate of schools and other organisations. No. of items: 64 Likert-type distributed over eight dimensions. Reliability: Coefficient of internal consistency ranged from 0.34 to 0.81. Validity: Mamal correlation with Halpin and Croft’s Scale. Availability: M.L. Sharma, National Psychological Corporation, Kacheri Ghat, Agra.
35 Name: Organisational Role Stress (ORS) Scale. Author: Udai Pareek. Applicability: University students, organisations and training institutions. Measures: Role stress. No. of items: 50 items for 10 organisational role stresses. Reliability: Test–retest calculated. Validity: Highly significant coefficients. Availability: National Psychological Corporation, Kacheri Ghat, Agra.
36 Name: An Inventory to Measure Quality of Working Life. Authors: Prakash Sinha and O.B. Sayeed.
Applicability: Workers and supervisors. Measures: Different dimensions of quality of working life. Reliability and Validity: Established good. Availability: Sinha and Sayeed (1980).
37 Name: Motivation Scale. Author: M.C. Agarwal. Applicability: Middle-level supervisors. Measures: Motivational level of supervisors. No. of items: 20 (Likert-type, five point scale). Reliability and Validity: In item analysis, all items found significant at 1 per cent level. Availability: P.O. NITIE, Vihar Lake Road, Bombay.
38 Name: Comprehensive Scale of Entrepreneurship. Author: V.P. Sharma. Applicability: Identify the entrepreneur during his educational career. Measures: Self-perception, organisational ability and skill, personality maturity, executive reaction pattern, human relation and human engineering. Reliability: Coefficient of stability and coefficient of internal consistency are as high as (r = 0.7) required to render a scale reliable. High test–retest reliability could be due to social desirability. Availability: National Psychological Corporation, Agra.
39 Name: Teachers Rating Scale. Author: R.C. Deva. Applicability: Evaluate teaching effectiveness of teachers in profession. Measures: Observation of actual behaviour in teaching situations. No. of items: 17 dimensions divided into three divisions, viz. (a) personal qualities, (b) professional competence and (c) classroom performance; seven-point scale. Reliability: Inter-rater reliability coefficient is 0.97. Validity: Coefficient of correlation of 0.85 was obtained. Availability: National Psychological Corporation, Agra.
40 Name: Attitude Scale for Locating Pro- and Anti-Management. Author: K.D. Kapoor. Applicability: For employees of any organisation. Measures: Attitude towards management. No. of items: 20 statements (Likert-type). Reliability: Odd–even: 0.32. Validity: r between supervisor’s rating and obtained score was 0.79. Availability: Agra Psychological Research Cell, Agra.
41 Name: Interpersonal Trust Scale. Authors: N.K. Chadha and Prabha Menon. Applicability: Industrial workers. Measures: Confidence, reliance, faith, credibility, confidentiality, security and communicability. No. of items: 62 (Likert-type). Reliability: Split-half: 0.89–0.98. Validity: 0.88–0.96. Availability: N.K. Chadha, Department of Psychology, Delhi University, Delhi–110007.
42 Name: Job Satisfaction Scale. Authors: N.K. Chadha and S. Handu. Applicability: Employees and supervisors. Measures: Motivators (work, recognition, responsibility, achievement, advancement), hygiene (departmental policy, interpersonal relations with peers, superiors, working conditions, technical competence, pay status, job security). No. of items: 96 (Likert-type). Reliability: Split-half: 0.82; test–retest: 0.25–0.84. Validity: Empirical validity with job descriptive index: 0.22–0.90. Availability: N.K. Chadha, Department of Psychology, Delhi University, Delhi–110007.
43 Name: Scale to Measure Attitude towards Working Women. Author: S. Sultan Akhtar.
Applicability: Adults (men and women). Measures: The attitude towards working women. No. of items: 20. Reliability: Split-half reliability varies from 0.72 to 0.89 (N = 100). Validity: Tetrachoric coefficients of correlation were calculated between items and total scores, and only those items were included that yielded values of r significant at 1 per cent level. Also, discriminative value of each item was computed. Availability: Department of Psychology, Aligarh Muslim University, Aligarh.
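Several entries in the list above report odd–even (split-half) coefficients stepped up with the Spearman-Brown formula, as well as KR-20 and Cronbach’s alpha. As a rough illustration only, the sketch below shows how such coefficients might be computed from a matrix of item scores. The data set and the function names are hypothetical and are not taken from any of the tests listed.

```python
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_brown(r_part, factor=2):
    """Estimate the reliability of a test `factor` times as long as the part test."""
    return factor * r_part / (1 + (factor - 1) * r_part)


def odd_even_reliability(scores):
    """Correlate odd-item and even-item half scores, then apply Spearman-Brown."""
    odd = [sum(row[0::2]) for row in scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))


def cronbach_alpha(scores):
    """Cronbach's alpha; with 0/1 items the same formula gives KR-20."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 6 respondents x 4 dichotomous items (1 = keyed response).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
print(round(odd_even_reliability(scores), 3))
print(round(cronbach_alpha(scores), 3))
```

With so few respondents and items the resulting figures mean little in themselves; the sketch is meant only to show the arithmetic behind the kinds of coefficients quoted in the entries.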
PART 4 Ethical Issues in Psychological Testing
16
Ethical Issues in Psychological Testing
CHAPTER OUTLINE
1. Ethical considerations in psychological testing
2. Specific principles for psychological testing: American Psychological Association (APA) guidelines
3. Moral and legal standards
4. Inter-professional relations
5. Responsibility towards organisations
6. Promotional activities
LEARNING OBJECTIVES
After reading this chapter, you will be able to understand:
1. What are the main ethical considerations involved in psychological testing?
2. What are the specific norms and principles that a tester is expected to adhere to while testing?
Most psychological research involves two parties, the experimenter and the subject. While carrying out research, a contemporary behavioural scientist is required to keep the subject’s best interest in mind at all times. Many concerned people have spent considerable time and effort struggling with questions about ethical principles in the context of the subject being used as a guinea pig. It has been claimed that the tests and experiments conducted in psychological laboratories invade privacy, create unwarranted emotional stress and are potentially damaging to the well-being of the participants. Dyer (1964) remarked, ‘tests are a menace to education only in the sense that automobiles are menace to physical well-being. If you use them wrong, you get into trouble. If you use them right, you open up all parts of new possibilities for the betterment of mankind.’
No test can be appropriate in all circumstances or for all purposes. The purpose and conditions need to be specified to clarify the limits of each test. As it is, the testing movement dates back a mere 70 to 80 years and, hence, it is difficult to expect perfection of these testing tools. Various problems and issues arise at stages like construction, administration and measurement of tests, and interpretation of the results. Most of the issues concerning construction and measurement are related to the cultural specificity of tests. Time and again, the question of the reliability of tests is raised, all the more so since distressing findings appeared in the literature suggesting that psychiatric diagnosis was being done quite unreliably (Zubin 1967). Similarly, there have been many reports of the failure of objective tests to predict such matters as success in an occupation. Ability test items have been attacked by some observers as repressing individuality and undermining self-esteem. The content of certain items on personality tests has been objected to quite strongly at times.
Test scores, in a general sense, are quantitative descriptions of some aspect of human behaviour. Behaviour that is exhibited under prescribed conditions may be quantified in a variety of ways. The choice of dimensions is also an important problem: how far the choice is relevant to the purpose for which the measurements are taken has to be considered. The most common type of test is composed of a certain number of items, each item presenting its own task. The score is the number of ‘correct’ responses given by the subject in a limited time. The items may cover a range of difficulty or they may be concentrated at one level of difficulty. In the latter case, the score depends upon the probability of the examinee’s passing items at that particular level. Some examinees have a greater probability than others and, hence, they are assumed to have higher levels of ability. If each item is of a different level of difficulty, some examinees can pass more difficult items than others and, hence, obtain a higher score. The time limit that is ordinarily imposed for convenience in group testing provides another source of variance in the score, in that different examinees work at different rates of speed. Without proof, it is not appropriate to assume that rate of work is closely correlated with the ability to master items at different difficulty levels. The relative contributions of speed and power in time-limit tests are usually unknown.
Despite these criticisms levelled at the construction and measurement of tests, the merits of psychological tests have been recognised in India, and in future they may well have a revolutionary effect upon psychological measurement. Yet testing has not gained its appropriate place in the general life patterns. The reason cannot simply be attributed to its recent origin in this country. Various problems and issues need to be dealt with before psychological tests come into use as they do in the West.
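To make the earlier point about time limits concrete, the following minimal sketch scores a test by the number of correct responses among the items an examinee reaches before time runs out. The answer key, the pace figures and the function name are all hypothetical, and a constant working pace per item is assumed purely for simplicity.

```python
def number_right_score(responses, key, seconds_per_item, time_limit):
    """Count correct responses among the items reached within the time limit.

    Assumes (for illustration only) that the examinee works through the items
    in order at a constant pace of `seconds_per_item`.
    """
    reached = min(len(key), int(time_limit // seconds_per_item))
    return sum(
        1
        for given, correct in zip(responses[:reached], key[:reached])
        if given == correct
    )


# Hypothetical six-item answer key.
key = ["a", "c", "b", "d", "a", "b"]

# Two hypothetical examinees who would answer every item correctly if given
# unlimited time (equal power), but who work at different speeds.
fast = number_right_score(key, key, seconds_per_item=10, time_limit=60)
slow = number_right_score(key, key, seconds_per_item=20, time_limit=60)
print(fast, slow)
```

Both examinees have the same power in this sketch, yet the slower one obtains the lower score, which is why a score from a time-limit test cannot by itself separate speed from power.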
The most obvious issue is that of ignorance and misuse or rather disuse of the test.
After psychological evaluation, at least in the psychiatric arena, the usual isomorphic relationship between assessment and treatment does not hold. The isolated and extended psychological examination frequently proves to be an empty academic exercise. Treatment pattern is unrelated to the findings in the report. Consequently, it is not surprising that most of the Western tests are simply adapted for Indian population, even though the items may not be applicable. The most serious issue concerning testing is that the need for qualified and trained administrators is virtually ignored. The mere existence of high quality tests and a set of conditions does not guarantee that the tests will be administered properly. An untrained examiner may break standard administration and invalidate an item because he does not completely comprehend the intent of the item. Such a problem raises the possibility that the item itself may contain unintended ambiguity. Thus, it is imperative that the item procedure and intent are completely comprehended by the clinicians. Mere knowledge of the administration is not sufficient. The examiner must be able to evaluate the extent of information to be imparted and also know how to communicate it. For example, scores in themselves may be misinterpreted by the parents of a child and the child is in danger of being wrongly underestimated or overestimated. The disclosure of test information should be selective, keeping in mind the recipient’s ability to understand and the recipient’s need, and in keeping with the right to know and the welfare of the examinee. In India, tests are administered for research and evaluation purposes. But, instead of being administered by psychologists, the responsibility is generally handed over to clerks or untrained teachers or friends. Some tests, such as multiple choice achievement tests, can be administered, scored and interpreted by most users with little training.
But there are others, such as intelligence tests, which require a highly trained administrator. As it is, most of the intelligence tests have questionable validity on Indian population. An untrained clerk would further invalidate the scores. This fact needs further exploration. In fact, empirical research would help in finding the extent of variance an untrained administrator can cause.
Ethical issues in experimentation got their share of attention only during the 1990s, though the American Psychological Association discussed these issues separately and exclusively in the Biographical Directory of the American Psychological Association some three decades back. That discussion is reproduced here in its original form.
SPECIFIC PRINCIPLES
Principle 1: Responsibility
The psychologist committed to increasing man’s understanding of man places high value on objectivity and integrity, and maintains the highest standards in the services he offers.
1. As a scientist, the psychologist believes that society will be best served when he investigates where his judgement indicates investigation is needed. He plans research in such a way as
to minimise the possibility that his findings will be misleading and he publishes full reports of his work, never discarding without explanation the data which may modify the interpretation of his results.
2. As a teacher, the psychologist recognises his primary obligation to help others acquire knowledge and skill, and to maintain high standards of scholarship.
3. As a practitioner, the psychologist knows that he bears a heavy social responsibility because his work may touch intimately the lives of others.
Principle 2: Competence
The maintenance of high standards of professional competence is a responsibility shared by all psychologists in the interest of the public and of the profession as a whole.
1. Psychologists discourage the practice of psychology by unqualified persons and assist the public in identifying psychologists competent to give dependable professional service. Whenever a psychologist or a person identifying himself as a psychologist violates ethical standards, psychologists who know first hand of such activities attempt to rectify the situation. When such a situation cannot be dealt with informally, it is called to the attention of the appropriate local, state or national committee on professional ethical standards and practices.
2. Psychologists regarded as qualified for independent practice are those who (a) have been awarded a diploma by the American Board of Professional Psychology, (b) have been licenced or certified by state examining boards, or (c) have been certified by voluntary boards established by state psychological associations. Psychologists who do not meet the qualifications recognised for independent practice should gain experience under qualified supervision.
3. The psychologist recognises the boundaries of his competence and the limitations of his techniques and does not offer services or use techniques that fail to meet professional standards established in particular fields. The psychologist who engages in practice assists his client in obtaining professional help for all important aspects of his problem that fall outside the boundaries of his own competence. This principle requires, for example, that provision be made for the diagnosis and treatment of relevant medical problems and for referral to or consultation with other specialists.
4. The psychologist in clinical work recognises that his effectiveness depends in good part upon his ability to maintain interpersonal relations; that temporary or more enduring aberrations in his own personality may interfere with this ability or distort his appraisals of others. Therefore, he refrains from undertaking any activity in which his personal problems are likely to result in inferior professional services or harm to a client; or if he is already engaged in such an activity when he becomes aware of his personal problems, he seeks competent professional assistance to determine whether he should continue or terminate his services to his client.
Principle 3: Moral and Legal Standards
The psychologist in the practice of his profession shows sensible regard for the legal codes and moral expectations of the community in which he works, recognising that violations of accepted moral and legal standards on his part may involve his clients, students or colleagues, and impugn his own name and the reputation of his profession.
Principle 4: Misrepresentation
The psychologist avoids misrepresentation of his own professional qualifications, affiliation and purposes, and those of the institutions and organisations with which he is associated.
1. A psychologist does not claim either directly or by implication professional qualifications that differ from his actual qualifications, nor does he misrepresent his affiliation with any institution, organisation or individual, nor lead others to assume that he has affiliations that he does not have. The psychologist is responsible for correcting others who misrepresent his professional qualifications or affiliations.
2. The psychologist does not misrepresent an institution or organisation with which he is affiliated by ascribing to it characteristics that it does not have.
3. A psychologist does not use his affiliation with the American Psychological Association or its divisions for purposes that are not consonant with the stated purposes of the Association.
4. A psychologist does not associate himself with or permit his name to be used in connection with any services or products in such a way as to misrepresent them, the degree of his responsibility for them, or the nature of his affiliation.
Principle 5: Public Statements
Modesty, scientific caution and due regard for the limits of present knowledge characterise all statements of psychologists who supply information to the public, either directly or indirectly.
1. Psychologists who interpret the science of psychology or the services of psychologists to clients or to the general public have an obligation to report fairly and accurately. Exaggeration, sensationalism, superficiality and other kinds of misrepresentation are avoided.
2. When information about psychological procedures and techniques is given, care is taken to indicate that they should be used only by persons adequately trained in their use.
3. The psychologist who engages in radio or television activities does not participate in commercial announcements recommending purchase or use of a product.
Principle 6: Confidentiality
Safeguarding information about an individual that has been obtained by the psychologist in the course of his teaching, practice or investigation is a primary obligation of the psychologist. Such information is not communicated to others unless certain important conditions are met.
1. Information received in confidence is revealed only after most careful deliberation and when there is clear and imminent danger to an individual or to society, and thereafter only to appropriate professional workers or public authorities.
2. Information obtained in clinical or consulting relationships, or evaluative data concerning children, students, employees and others are discussed only for professional purposes and only with persons clearly concerned with the case. Written and oral reports should present only data germane to the purposes of the evaluation; every effort should be made to avoid undue invasion of privacy.
3. Clinical and other materials are used in classroom teaching and writing only when the identity of the person involved is adequately disguised.
4. The confidentiality of professional communications about individuals is maintained. Only when the originator and other persons involved give their express permission is a confidential professional communication shown to the individual concerned. The psychologist is responsible for informing the client of the limits of the confidentiality.
5. Only after explicit permission has been granted is the identity of research subjects published. When data have been published without permission for identification, the psychologist assumes responsibility for adequately disguising their sources.
6. The psychologist makes provisions for the maintenance of confidentiality in the preservation and ultimate disposition of confidential records.
Principle 7: Client Welfare The psychologist respects the integrity and protects the welfare of the person or group with whom he is working. 1. The psychologist in industry, education and other situations in which conflicts of interest may arise among various parties, as between management and labour, or between the client and employer of the psychologist, defines for himself the nature and direction of his loyalties and responsibilities and keeps all parties informed of these commitments. 2. When there is a conflict among professional workers, the psychologist is concerned primarily with the welfare of any client involved and only secondarily with the interest of his own professional group.
3. The psychologist attempts to terminate a clinical or consulting relationship when it is reasonably clear to the psychologist that the client is not benefiting from it. 4. The psychologist who asks that an individual reveal personal information in the course of interviewing, testing or evaluation, or who allows such information to be divulged to him, does so only after making certain that the responsible person is fully aware of the purposes of the interview, testing or evaluation, and of the ways in which the information may be used. 5. In cases involving referral, the responsibility of the psychologist for the welfare of the client continues until this responsibility is assumed by the professional person to whom the client is referred or until the relationship with the psychologist making the referral has been terminated by mutual agreement. In situations where referral, consultation or other changes in the conditions of the treatment are indicated and the client refuses referral, the psychologist carefully weighs the possible harm to the client, to himself and to his profession that might ensue from continuing the relationship. 6. The psychologist who requires the taking of psychological tests for didactic, classification or research purposes protects the examinees by ensuring that the tests and test results are used in a professional manner. 7. When potentially disturbing subject matter is presented to students, it is discussed objectively and efforts are made to handle constructively any difficulties that arise. 8. Care must be taken to ensure an appropriate setting for clinical work to protect both client and psychologist from actual or imputed harm and the profession from censure. 9. In the use of accepted drugs for therapeutic purposes, special care needs to be exercised by the psychologist to assure himself that the collaborating physician provides suitable safeguards for the client.
Principle 8: Client Relationship The psychologist informs his prospective client of the important aspects of the potential relationship that might affect the client’s decision to enter the relationship. 1. Aspects of the relationship likely to affect the client’s decision include the recording of an interview, the use of interview material for training purposes and observation of an interview by other persons. 2. When the client is not competent to evaluate the situation (as in the case of a child), the person responsible for the client is informed of the circumstances which may influence the relationship. 3. The psychologist does not normally enter into a professional relationship with members of his own family, intimate friends, close associates or others whose welfare might be jeopardised by such a relationship.
Principle 9: Impersonal Service Psychological services for the purpose of diagnosis, treatment or personalised advice are provided only in the context of a professional relationship and are not given by means of public lectures or demonstrations, newspaper or magazine articles, radio or television programmes, mail or similar media. 1. The preparation of personnel reports and recommendations based on test data secured solely by mail is unethical unless such appraisals are an integral part of a continuing client relationship with a company, as a result of which the consulting psychologist has intimate knowledge of the client’s personnel situation and can be assured thereby that his written appraisals will be adequate to the purpose and will be properly interpreted by the client. These reports must not be embellished with such detailed analyses of the subject’s personality traits as would be appropriate only after intensive interviews with the subject. The reports must not make specific recommendations as to employment or placement of the subject which go beyond the psychologist’s knowledge of the job requirements of the company. The reports must not purport to eliminate the company’s need to carry on such other regular employment or personnel practices as appraisal of the work history, checking of references, past performance in the company, and so on.
Principle 10: Announcement of Services A psychologist adheres to professional rather than commercial standards in making known his availability for professional services. 1. A psychologist does not directly solicit clients for individual diagnosis or therapy. 2. Individual listings in telephone directories are limited to name, highest relevant degree, certification status, address and telephone number. They may also include identification in a few words of the psychologist’s major areas of practice, for example, child therapy, personnel selection, industrial psychology, and so on. Agency listings are equally modest. 3. Announcements of individual private practice are limited to a simple statement of the name, highest relevant degree, certification or diplomate status, address, telephone number, office hours and a brief explanation of the types of services rendered. Announcements of agencies may list names of staff members with their qualifications. They conform in other particulars with the same standards as individual announcements, making certain that the true nature of the organisation is apparent. 4. A psychologist or an agency announcing non-clinical professional services may use brochures that are descriptive of the services rendered but not evaluative. They may be sent to professional persons, schools, business firms, government agencies and other similar organisations.
5. The use in a brochure of ‘testimonials from satisfied users’ is unacceptable. The offer of a free trial of services is unacceptable if it operates to misrepresent in any way the nature or the efficacy of the services rendered by the psychologist. Claims that a psychologist has unique skills or unique devices not available to others in the profession are made only if the special efficacy of these unique skills or devices has been demonstrated by scientifically acceptable evidence. 6. The psychologist must not encourage (nor, within his power, even allow) a client to have exaggerated ideas as to the efficacy of the services rendered. Claims made to clients about the efficacy of his services must not go beyond those which the psychologist would be willing to subject to professional scrutiny through publishing his results and his claims in a professional journal.
Principle 11: Interprofessional Relations A psychologist acts with integrity in regard to colleagues in psychology and in other professions. 1. Each member of the association cooperates with the duly constituted committee on scientific and professional ethics and conduct in the performance of its duties by responding to enquiries with reasonable promptness and completeness. A member taking longer than 30 days to respond to such enquiries shall have the burden of demonstrating that he acted with reasonable promptness. 2. A psychologist does not normally offer professional services to a person receiving psychological assistance from another professional worker except by agreement with the other worker or after the termination of the client’s relationship with the other professional worker. 3. The welfare of clients and colleagues requires that psychologists in joint practice or corporate activities make an orderly and explicit arrangement regarding the conditions of their association and its possible termination. Psychologists who serve as employers of other psychologists have an obligation to make similar appropriate arrangements.
Principle 12: Remuneration Financial arrangements in professional practice are in accord with professional standards that safeguard the best interest of the client and the profession. 1. In establishing rates for professional services, the psychologist considers carefully both the ability of the client to meet the financial burden and the charges made by other professional persons engaged in comparable work. He is willing to contribute a portion of his services to work for which he receives little or no financial return.
2. No commission or rebate or any other form of remuneration is given or received for referral of clients for professional services. 3. The psychologist in clinical or counselling practice does not use his relationships with clients to promote, for personal gain or the profit of an agency, commercial enterprises of any kind. 4. A psychologist does not accept a private fee or any other form of remuneration for professional work with a person who is entitled to his services through an institution or agency. The policies of a particular agency may make explicit provision for private work with its clients by members of its staff and, in such instances, the client must be fully apprised of all policies affecting him.
Principle 13: Test Security Psychological tests and other assessment devices, the value of which depends in part on the naiveté of the subject, are not reproduced or described in popular publications in ways that might invalidate the techniques. Access to such devices is limited to persons with professional interests who will safeguard their use. 1. Sample items made up to resemble those of tests being discussed may be reproduced in popular articles and elsewhere, but scoreable tests and actual test items are not reproduced except in professional publications. 2. The psychologist is responsible for the control of psychological tests and other devices and procedures used for instruction when their values might be damaged by revealing to the general public their specific contents or underlying principles.
Principle 14: Test Interpretation Test scores, like test materials, are released only to persons who are qualified interpreters and use them properly. 1. Materials for reporting test scores to parents—or which are designed for self-appraisal purposes in schools, social agencies or industry—are closely supervised by qualified psychologists or counsellors with provisions for referring and counselling individuals when needed. 2. Test results or other assessment data used for evaluation or classification are communicated to employers, relatives or other appropriate persons in such a manner as to guard against misinterpretation or misuse. Usually, an interpretation of the test result rather than the scores is communicated. 3. When test results are communicated directly to parents and students, they are accompanied by adequate interpretive aids or advice.
Principle 15: Test Publication Psychological tests are offered for commercial publication only to publishers who present their tests in a professional way and distribute them only to qualified users. 1. A test manual, technical handbook or other suitable report on the test is provided which describes the method of constructing and standardising the test and summarises the validation research. 2. The populations for which the test has been developed and the purposes for which it is recommended are stated in the manual. Limitations upon the test’s dependability, and aspects of its validity on which research is lacking or incomplete, are clearly stated. In particular, the manual contains a warning regarding the interpretations likely to be made that have not yet been substantiated by research. 3. The catalogue and the manual indicate the training and professional qualifications required for sound interpretation of the test. 4. The test manual and supporting documents take into account the principles enunciated in the Standards for Educational and Psychological Tests. 5. Test advertisements are factual and descriptive rather than emotional and persuasive.
Principle 16: Research Precautions The psychologist assumes obligations for the welfare of his research subjects, both animal and human. The decision to undertake research should rest upon a considered judgment by the individual psychologist about how best to contribute to psychological science and to human welfare. The responsible psychologist weighs alternative directions in which personal energies and resources might be invested. Having made the decision to conduct research, psychologists must carry out their investigations with respect for the people who participate, and with concern for their dignity and welfare. The principles that follow make explicit the investigator’s responsibilities towards participants over the course of research, from the initial decision to pursue a study to the steps necessary to protect the confidentiality of research data. The principles should be interpreted in terms of the contexts provided in the complete document offered as a supplement to these principles. 1. In planning a study, the investigator has the personal responsibility to make a careful evaluation of its ethical acceptability, taking into account these principles for research with human beings. To the extent that this appraisal, weighing scientific and humane values, suggests a deviation from any principle, the investigator incurs an increasingly serious obligation to seek ethical advice and to observe more stringent safeguards to protect the rights of the human research participants. 2. Responsibility for the establishment and maintenance of acceptable ethical practice in research always remains with the individual investigator. The investigator is also responsible for the ethical treatment of research participants by collaborators, assistants, students and employees, all of whom, however, incur parallel obligations. 3. Ethical practices require the investigator to inform the participants of all features of the research that reasonably might be expected to influence the subject’s willingness to participate, and to explain all other aspects of the research about which the participant may enquire. Failure to make full disclosure gives added emphasis to the investigator’s abiding responsibility to protect the welfare and dignity of the research participant. 4. Openness and honesty are essential characteristics of the relationship between the investigator and the research participant. When the methodological requirements of a study necessitate concealment or deception, the investigator is required to ensure the participant’s understanding of the reasons for this action and to restore the quality of the relationship with the investigator. 5. Ethical research practices require the investigator to respect the individual’s freedom to decline to participate in research or to discontinue participation at any time. The obligation to protect this freedom requires special vigilance when the investigator is in a position of power over the participant. The decision to limit this freedom gives added emphasis to the investigator’s abiding responsibility to protect the participant’s dignity and welfare. 6. Ethically acceptable research begins with the establishment of a clear and fair agreement between the investigator and the research participant that clarifies the responsibilities of each. The investigator has the obligation to honour all promises and commitments included in that agreement. 7. An ethical investigator protects participants from physical and mental discomfort, harm and danger. If the risk of such consequences exists, the investigator is required to inform the participant of that fact, secure consent before proceeding, and take all possible measures to minimise distress. A research procedure may not be used if it is likely to cause serious and lasting harm to participants. 8. After the data are collected, ethical practices require the investigator to provide the participant with full clarification of the nature of the study and to remove any misconceptions that may have arisen. Where scientific or humane values justify delaying or withholding information, the investigator acquires a special responsibility to assure that there are no damaging consequences for the participant. 9. Where research procedures may result in undesirable consequences for the participant, the investigator has the responsibility to detect and remove or correct these consequences, including, where relevant, long-term after-effects. 10. Information obtained about the research participants during the course of an investigation is confidential. When the possibility exists that others may access such information, ethical research practice requires that this possibility, together with the plans for protecting confidentiality, be explained to the participants as a part of the procedure for obtaining their informed consent.
11. A psychologist using animals in research adheres to the provisions of the Rules Regarding Animals, drawn up by the Committee on Precautions and Standards in Animal Experimentation and adopted by the American Psychological Association. 12. Investigations of human subjects using experimental drugs (for example, hallucinogenic, psychedelic or similar substances) should be conducted only in such settings as clinics, hospitals, or research facilities maintaining appropriate safeguards for the subjects.
Principle 17: Publication Credit Credit is assigned to those who have contributed to a publication, in proportion to their contribution and only to these. 1. Major contributions of a professional character, made by several persons to a common project, are recognised by joint authorship. The experimenter or author who has made the principal contribution to a publication is identified as the first listed. 2. Minor contributions of a professional character, extensive clerical or similar non-professional assistance, and other minor contributions are acknowledged in footnotes or in an introductory statement. 3. Acknowledgement through specific citations is made for unpublished as well as published material that has directly influenced the research or writing. 4. A psychologist who compiles and edits for publication the contributions of others publishes the symposium or report under the title of the committee or symposium, with his own name appearing as chairman or editor among those of the other contributors or committee members.
Principle 18: Responsibility towards Organisation A psychologist respects the rights and reputation of the institute or organisation with which he is associated. 1. Materials prepared by a psychologist as a part of his regular work under specific direction of his organisation are the property of that organisation. Such materials are released for use or publication by a psychologist in accordance with the policies of authorisation, assignment of credit and related matters, which have been established by his organisation. 2. Other material resulting incidentally from activity supported by any agency, and for which the psychologist rightly assumes individual responsibility, is published with a disclaimer for any responsibility on the part of the supporting agency.
Principle 19: Promotional Activities The psychologist associated with the development or promotion of psychological devices, books or other products offered for commercial sale is responsible for ensuring that such devices, books or products are presented in a professional and factual way. 1. Claims regarding performance, benefits or results are supported by scientifically acceptable evidence. 2. The psychologist does not use professional journals for commercial exploitation of psychological products and the psychologist–editor guards against such misuse. 3. The psychologist with a financial interest in the sale or use of a psychological product is sensitive to possible conflict of interest in his promotion of such products, and avoids compromise of his professional responsibilities and objectives.
Psychologists should keep these specific principles in mind and monitor their research protocols carefully. Failure to adhere to them can have severe consequences for the scientist. However, there exists a group of psychologists who believe that there is no need for special ethical rules to govern psychological research. They think that the ethical questions do not always have precise and clear-cut answers. As Puhan (1995) rightly argued, ‘Measurement as we all know, not only includes formal testing but also all other processes involved on coding a subjective response.’ In many experimental studies, measurement is often blended into the method of investigation. In a broad sense, therefore, measurement is a crucial component of the description of variables. Unless a variable can be measured or known, it would be difficult to speak about it in a meaningful way. This is the meaning and importance of testing. Psychology has an advantage over the rest of the social sciences. Today, arguing against measurement has become the fashion. Critics argue against testing without fully understanding the implications of their position. In fact, the psychologists who advocate against measurement have too narrow a concept of testing or measurement.
The two positions discussed above are extremes. Somewhere between them is a rational position recognising that much of value is to be gained from behavioural research and that all interests must be protected in the process of reaping these benefits. Like most rules, the ethical rules that govern psychological research are subject to different interpretations in different contexts. There will never be precise answers to the ethical questions that arise in the context of psychological research. There can only be rational and informed individuals who make decisions about ethical matters according to their best judgement.
PART 5 Factor Analysis
17
Basics of Factor Analysis
CHAPTER OUTLINE
1. Factor analysis: Introduction
2. Relationship between correlation coefficient and factor loadings
3. Communality
4. Specificity
5. Uniqueness
6. What are R-type and Q-type factor analyses?
7. Merits of factor analysis
8. Major limitations of factor analysis
9. When to factor analyse?
10. When not to factor analyse?
LEARNING OBJECTIVES
At the end of this chapter, you will be able to understand:
1. What is factor analysis and what are its major objectives?
2. What are the various methods of factor analysis?
3. What are communality, specificity and uniqueness, and how to compute them?
4. What are R-type and Q-type factor analyses?
5. What are the main merits and limitations of factor analysis?
6. What are the conditions under which we can factor analyse a given data set?
7. What are the conditions when we cannot factor analyse?
FACTOR ANALYSIS: INTRODUCTION
Factor analysis is a very broad term and does not really represent a unitary concept. It generally refers to a set of statistical procedures, all of which function so as to locate a small number of dimensions, clusters or factors in a larger set of independent variables or items. The primary distinctive element of factor analysis is data reduction. In factor analysis, we start with a large set of variables; variables that correlate highly with each other are identified as representing a single factor, and variables that do not correlate with each other are identified as representing orthogonal (or independent) factors. The ideal factor analysis would identify a small number of factors which are orthogonal to each other; that is, in spatial terms, they would lie at right angles to each other when graphed (Reber and Reber 2001). All procedures for factor analysis require the same basic kind of data, that is, a correlation matrix. There are a few procedures which can also use a matrix of covariances. The main methods of factor analysis are:
1. Principal components method (Hotelling 1933),
2. Principal axes method (Kelley 1935),
3. Summation method (Burt 1941) and
4. Centroid method (Thurstone 1947).
Principal components (Hotelling) and principal axes (Kelley) methods are mathematically more rigorous and can be applied with more objectivity. These methods can account for obtained scores and intercorrelations but are difficult to interpret psychologically. The summation method has arbitrary restrictions, such as the requirement of a g-factor.1 In this chapter, Thurstone’s centroid method will be discussed in detail. In statistical language, factor analysis is a technique to replace the correlational matrix by the factor matrix. The correlational matrix has the same number of columns and rows, and its cell entries are the correlational values, which range from +1.0 to –1.0. In the factor matrix, the columns are generally fewer than the rows: the columns are the common factors, the rows are the tests, and the cell entries are the factor loadings, which also range from +1.0 to –1.0. These cell values are correlation coefficients of the tests with the factors, if the factors are orthogonal. But if the factors are oblique (correlated), then the cell values in the factor matrix may or may not be the correlations between the tests and the factors.
RELATIONSHIP BETWEEN CORRELATION COEFFICIENTS AND FACTOR LOADINGS
In factor analysis, we proceed from the correlation matrix and reach the factor matrix. Can we start with the factor matrix and obtain the correlational matrix? Yes, one can start with the factor matrix (say F) and obtain the correlational matrix (say R). The mathematical formula for this purpose is R = F × F’, where F’ is the transpose of the factor matrix F. Before illustrating it with an example, let us first understand the multiplication of two matrices. Suppose there is a matrix,
\[
F = \begin{bmatrix} X_1 & Y_1 \\ X_2 & Y_2 \\ X_3 & Y_3 \end{bmatrix}
\]
Then, the transpose of this matrix is,
\[
F' = \begin{bmatrix} X_1 & X_2 & X_3 \\ Y_1 & Y_2 & Y_3 \end{bmatrix}
\]
Thus,
\[
F \times F' = \begin{bmatrix} X_1 & Y_1 \\ X_2 & Y_2 \\ X_3 & Y_3 \end{bmatrix} \times \begin{bmatrix} X_1 & X_2 & X_3 \\ Y_1 & Y_2 & Y_3 \end{bmatrix} = R \tag{1}
\]
\[
= \begin{bmatrix}
(X_1 X_1 + Y_1 Y_1) & (X_1 X_2 + Y_1 Y_2) & (X_1 X_3 + Y_1 Y_3) \\
(X_2 X_1 + Y_2 Y_1) & (X_2 X_2 + Y_2 Y_2) & (X_2 X_3 + Y_2 Y_3) \\
(X_3 X_1 + Y_3 Y_1) & (X_3 X_2 + Y_3 Y_2) & (X_3 X_3 + Y_3 Y_3)
\end{bmatrix}
\]
The reader should keep in mind that F × F’ is not the same as F’ × F. Mathematically,
\[
F \times F' \neq F' \times F
\]
Now let us take an example of the following factor matrix (F),

Tests    I      II
A        0.8    0.1
B        0.6    0.5
C        0.7    0.2
D        0.2    0.8
E        0.1    0.7

then
\[
F = \begin{bmatrix} 0.8 & 0.1 \\ 0.6 & 0.5 \\ 0.7 & 0.2 \\ 0.2 & 0.8 \\ 0.1 & 0.7 \end{bmatrix}
\]
To find the transpose of F, the columns and rows are interchanged as shown below.
\[
F' = \begin{bmatrix} 0.8 & 0.6 & 0.7 & 0.2 & 0.1 \\ 0.1 & 0.5 & 0.2 & 0.8 & 0.7 \end{bmatrix}
\]
Substituting these in equation (1), we get,
\[
R = F \times F' = \begin{bmatrix} 0.8 & 0.1 \\ 0.6 & 0.5 \\ 0.7 & 0.2 \\ 0.2 & 0.8 \\ 0.1 & 0.7 \end{bmatrix} \times \begin{bmatrix} 0.8 & 0.6 & 0.7 & 0.2 & 0.1 \\ 0.1 & 0.5 & 0.2 & 0.8 & 0.7 \end{bmatrix}
\]
or
\[
R = \begin{bmatrix}
0.8 \times 0.8 + 0.1 \times 0.1 & 0.8 \times 0.6 + 0.1 \times 0.5 & 0.8 \times 0.7 + 0.1 \times 0.2 & 0.8 \times 0.2 + 0.1 \times 0.8 & 0.8 \times 0.1 + 0.1 \times 0.7 \\
0.6 \times 0.8 + 0.5 \times 0.1 & 0.6 \times 0.6 + 0.5 \times 0.5 & 0.6 \times 0.7 + 0.5 \times 0.2 & 0.6 \times 0.2 + 0.5 \times 0.8 & 0.6 \times 0.1 + 0.5 \times 0.7 \\
0.7 \times 0.8 + 0.2 \times 0.1 & 0.7 \times 0.6 + 0.2 \times 0.5 & 0.7 \times 0.7 + 0.2 \times 0.2 & 0.7 \times 0.2 + 0.2 \times 0.8 & 0.7 \times 0.1 + 0.2 \times 0.7 \\
0.2 \times 0.8 + 0.8 \times 0.1 & 0.2 \times 0.6 + 0.8 \times 0.5 & 0.2 \times 0.7 + 0.8 \times 0.2 & 0.2 \times 0.2 + 0.8 \times 0.8 & 0.2 \times 0.1 + 0.8 \times 0.7 \\
0.1 \times 0.8 + 0.7 \times 0.1 & 0.1 \times 0.6 + 0.7 \times 0.5 & 0.1 \times 0.7 + 0.7 \times 0.2 & 0.1 \times 0.2 + 0.7 \times 0.8 & 0.1 \times 0.1 + 0.7 \times 0.7
\end{bmatrix}
\]
\[
= \begin{bmatrix}
(0.65) & 0.53 & 0.58 & 0.24 & 0.15 \\
0.53 & (0.61) & 0.52 & 0.52 & 0.41 \\
0.58 & 0.52 & (0.53) & 0.30 & 0.21 \\
0.24 & 0.52 & 0.30 & (0.68) & 0.58 \\
0.15 & 0.41 & 0.21 & 0.58 & (0.50)
\end{bmatrix}
\]
The values in parentheses (the diagonal entries) are the communalities. Take another example in which more than two factors have been extracted. For example,

         Factors
Tests    I      II     III
A        0.7    0.1    0.2
B        0.6    0.2    0.3
C        0.1    0.2    0.7
D        0.3    0.4    0.7
E        0.2    0.8    0.3

then
\[
F = \begin{bmatrix} 0.7 & 0.1 & 0.2 \\ 0.6 & 0.2 & 0.3 \\ 0.1 & 0.2 & 0.7 \\ 0.3 & 0.4 & 0.7 \\ 0.2 & 0.8 & 0.3 \end{bmatrix}
\]
The transpose of this factor matrix (F’) will be,
\[
F' = \begin{bmatrix} 0.7 & 0.6 & 0.1 & 0.3 & 0.2 \\ 0.1 & 0.2 & 0.2 & 0.4 & 0.8 \\ 0.2 & 0.3 & 0.7 & 0.7 & 0.3 \end{bmatrix}
\]
The correlation matrix (R) is given by R = F × F’. Therefore,
\[
R = \begin{bmatrix} 0.7 & 0.1 & 0.2 \\ 0.6 & 0.2 & 0.3 \\ 0.1 & 0.2 & 0.7 \\ 0.3 & 0.4 & 0.7 \\ 0.2 & 0.8 & 0.3 \end{bmatrix} \times \begin{bmatrix} 0.7 & 0.6 & 0.1 & 0.3 & 0.2 \\ 0.1 & 0.2 & 0.2 & 0.4 & 0.8 \\ 0.2 & 0.3 & 0.7 & 0.7 & 0.3 \end{bmatrix}
\]
\[
= \begin{bmatrix}
0.7 \times 0.7 + 0.1 \times 0.1 + 0.2 \times 0.2 & 0.7 \times 0.6 + 0.1 \times 0.2 + 0.2 \times 0.3 & 0.7 \times 0.1 + 0.1 \times 0.2 + 0.2 \times 0.7 & 0.7 \times 0.3 + 0.1 \times 0.4 + 0.2 \times 0.7 & 0.7 \times 0.2 + 0.1 \times 0.8 + 0.2 \times 0.3 \\
0.6 \times 0.7 + 0.2 \times 0.1 + 0.3 \times 0.2 & 0.6 \times 0.6 + 0.2 \times 0.2 + 0.3 \times 0.3 & 0.6 \times 0.1 + 0.2 \times 0.2 + 0.3 \times 0.7 & 0.6 \times 0.3 + 0.2 \times 0.4 + 0.3 \times 0.7 & 0.6 \times 0.2 + 0.2 \times 0.8 + 0.3 \times 0.3 \\
0.1 \times 0.7 + 0.2 \times 0.1 + 0.7 \times 0.2 & 0.1 \times 0.6 + 0.2 \times 0.2 + 0.7 \times 0.3 & 0.1 \times 0.1 + 0.2 \times 0.2 + 0.7 \times 0.7 & 0.1 \times 0.3 + 0.2 \times 0.4 + 0.7 \times 0.7 & 0.1 \times 0.2 + 0.2 \times 0.8 + 0.7 \times 0.3 \\
0.3 \times 0.7 + 0.4 \times 0.1 + 0.7 \times 0.2 & 0.3 \times 0.6 + 0.4 \times 0.2 + 0.7 \times 0.3 & 0.3 \times 0.1 + 0.4 \times 0.2 + 0.7 \times 0.7 & 0.3 \times 0.3 + 0.4 \times 0.4 + 0.7 \times 0.7 & 0.3 \times 0.2 + 0.4 \times 0.8 + 0.7 \times 0.3 \\
0.2 \times 0.7 + 0.8 \times 0.1 + 0.3 \times 0.2 & 0.2 \times 0.6 + 0.8 \times 0.2 + 0.3 \times 0.3 & 0.2 \times 0.1 + 0.8 \times 0.2 + 0.3 \times 0.7 & 0.2 \times 0.3 + 0.8 \times 0.4 + 0.3 \times 0.7 & 0.2 \times 0.2 + 0.8 \times 0.8 + 0.3 \times 0.3
\end{bmatrix}
\]
or,
\[
R = \begin{bmatrix}
(0.54) & 0.50 & 0.23 & 0.39 & 0.28 \\
0.50 & (0.49) & 0.31 & 0.47 & 0.37 \\
0.23 & 0.31 & (0.54) & 0.60 & 0.39 \\
0.39 & 0.47 & 0.60 & (0.74) & 0.59 \\
0.28 & 0.37 & 0.39 & 0.59 & (0.77)
\end{bmatrix}
\]
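The two worked examples above can be checked numerically. What follows is a minimal sketch in plain Python, not part of the original text; the helper names (`transpose`, `matmul`, `reproduce_correlations`) are illustrative assumptions. It rebuilds R from each factor matrix and also shows that F’ × F has a different order from F × F’.

```python
# Sketch: reproduce the reduced correlation matrix R = F x F' from a factor matrix.
# Helper names are illustrative, not taken from the book.

def transpose(m):
    """Interchange the rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def reproduce_correlations(f, ndigits=2):
    """Return F x F', rounded for display; the diagonal holds the communalities."""
    return [[round(v, ndigits) for v in row] for row in matmul(f, transpose(f))]

# Factor matrices from the two examples in the text (rows = tests A to E).
F_TWO = [[0.8, 0.1], [0.6, 0.5], [0.7, 0.2], [0.2, 0.8], [0.1, 0.7]]
F_THREE = [[0.7, 0.1, 0.2], [0.6, 0.2, 0.3], [0.1, 0.2, 0.7],
           [0.3, 0.4, 0.7], [0.2, 0.8, 0.3]]

R_TWO = reproduce_correlations(F_TWO)
R_THREE = reproduce_correlations(F_THREE)

# F' x F has one row and one column per factor (2 x 2 here),
# whereas F x F' has one row and one column per test (5 x 5).
FT_F = matmul(transpose(F_TWO), F_TWO)

for row in R_TWO:
    print(row)
print(len(FT_F), "x", len(FT_F[0]))
```

Rounding to two decimals is only for display; it matches the precision used in the worked matrices.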
In the same way, the correlation matrix can be obtained from a factor matrix having more than three factors, say four, five, six, and so on. The correlation matrix, R, thus obtained is a reduced correlation matrix because its diagonal cell values are the communalities of the tests, that is, the proportions of common factor variance. This leaves out the specificity and error variances of the tests. Some factor experts put the value 1.00 in every diagonal cell. This is possible if we introduce specific and error variances in the correlational matrix.
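The last remark can be written compactly. As a sketch, assuming that the specific and error parts of the tests are uncorrelated with the common factors and with one another, and writing $U^2$ (a symbol introduced here only for illustration) for the diagonal matrix holding each test's specific plus error variance:

\[
R_{\text{full}} = F \times F' + U^2
\]

Under these assumptions, each diagonal cell becomes communality + (specific variance + error variance) = 1.00, while the off-diagonal cells are unchanged. For Test A of the first example, the diagonal cell would be 0.65 + 0.35 = 1.00.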
COMMUNALITY, SPECIFICITY AND UNIQUENESS Let us first define the terms (a) common variance, (b) specific variance, (c) error variance and (d) uniqueness. (a) Common variance is that portion of the reliable variance which correlates with other variables. (b) Specific variance is that portion of the reliable variance which does not correlate with any other variable. (c) Error variance is the chance variance due to errors of sampling, measurement, unstandardised conditions of testing, physiological and other changes within the individual, and the host of other influences which may contribute to unreliability. It is assumed to be uncorrelated with the reliable variance. (d) Uniqueness of variable is that portion of the total variance which does not have anything in common with any other variable. According to factor theory, the total variance of any variable/test can be partitioned into three independent components, namely, common variance (popularly known as communality), specific variance (popularly known as specificity) and error variance. Therefore, Total variance = common variance + specific variance + error variance Total variance is 1.0 and error variance is nothing but the standard error of measurement squared as per the reliability theory. In other words, Reliability = 1 – error variance If a test has reliability of, say, 0.60, then, Error variance would be 0.40 (1 – reliability = 1 – 0.60 = 0.40)
Hence,
1 = communality + specificity + unreliability
or, 1 = communality + specificity + (1 – reliability)
or, Reliability = communality + specificity.
If communality is denoted by $C_j^2$ and specificity by $S_j^2$, then
Reliability = $C_j^2 + S_j^2$
In most analyses, specific variance is not separated from the error variance. In other words, specific variance and error variance are lumped together into what is called the unique variance (denoted by $U_j^2$). Hence,
Total variance = communality + uniqueness
or, $1 = C_j^2 + U_j^2$
or, $U_j^2 = 1 - C_j^2$
or, Uniqueness = 1 – communality
Communality is defined as the proportion of common factor variance in the scores. In simple terms,
Communality = 'square and sum' (sum of the squares) of the factor loadings of a particular test across the factors.
Another term which is frequently used in factor theory is Eigen value, which is given by the 'square and sum' of the loadings of all the tests on a particular factor. To sum up, communality, specificity and uniqueness are computed for the tests, whereas the Eigen value and the percentage of variance are computed for the factors. Let us make the meaning of these clear with the help of an example.
Problem
Find (a) the communality, specificity and uniqueness for each test and (b) the Eigen value and percentage of variance for each factor from the factor matrix given below:

| Tests | Factor I | Factor II | Factor III | Reliability |
|---|---|---|---|---|
| 1 | 0.14 | 0.48 | –0.78 | 0.92 |
| 2 | –0.26 | 0.87 | 0.01 | 0.82 |
| 3 | 0.92 | 0.02 | 0.02 | 0.91 |
| 4 | 0.92 | 0.03 | 0.04 | 0.94 |
| 5 | 0.22 | 0.49 | 0.74 | 0.85 |
| 6 | 0.62 | 0.17 | 0.27 | 0.80 |
Solution
Communality
We know that communality ($C_j^2$) = 'square and sum' of the factor loadings over each test.
Communality of Test 1 = $(0.14)^2 + (0.48)^2 + (-0.78)^2 = 0.0196 + 0.2304 + 0.6084 = 0.86$
Communality of Test 2 = $(-0.26)^2 + (0.87)^2 + (0.01)^2 = 0.0676 + 0.7569 + 0.0001 = 0.82$
Communality of Test 3 = $(0.92)^2 + (0.02)^2 + (0.02)^2 = 0.8464 + 0.0004 + 0.0004 = 0.85$
Communality of Test 4 = $(0.92)^2 + (0.03)^2 + (0.04)^2 = 0.8464 + 0.0009 + 0.0016 = 0.85$
Communality of Test 5 = $(0.22)^2 + (0.49)^2 + (0.74)^2 = 0.0484 + 0.2401 + 0.5476 = 0.84$
Communality of Test 6 = $(0.62)^2 + (0.17)^2 + (0.27)^2 = 0.3844 + 0.0289 + 0.0729 = 0.49$

Specificity
We know that Reliability = communality ($C_j^2$) + specificity ($S_j^2$).
Hence, Specificity = Reliability – Communality
Specificity of Test 1 = 0.92 – 0.86 = 0.06
Specificity of Test 2 = 0.82 – 0.82 = 0.00
Specificity of Test 3 = 0.91 – 0.85 = 0.06
Specificity of Test 4 = 0.94 – 0.85 = 0.09
Specificity of Test 5 = 0.85 – 0.84 = 0.01
Specificity of Test 6 = 0.80 – 0.49 = 0.31

Uniqueness
We know that uniqueness = 1 – communality. Hence,
Uniqueness of Test 1 = 1 – 0.86 = 0.14
Uniqueness of Test 2 = 1 – 0.82 = 0.18
Uniqueness of Test 3 = 1 – 0.85 = 0.15
Uniqueness of Test 4 = 1 – 0.85 = 0.15
Uniqueness of Test 5 = 1 – 0.84 = 0.16
Uniqueness of Test 6 = 1 – 0.49 = 0.51
Eigen Value
We know that the Eigen value of any factor is the 'square and sum' of the loadings of all the tests on that factor. Hence,
Eigen value of Factor I = $(0.14)^2 + (-0.26)^2 + (0.92)^2 + (0.92)^2 + (0.22)^2 + (0.62)^2 = 0.0196 + 0.0676 + 0.8464 + 0.8464 + 0.0484 + 0.3844 = 2.21$
Eigen value of Factor II = $(0.48)^2 + (0.87)^2 + (0.02)^2 + (0.03)^2 + (0.49)^2 + (0.17)^2 = 0.2304 + 0.7569 + 0.0004 + 0.0009 + 0.2401 + 0.0289 = 1.26$
Eigen value of Factor III = $(-0.78)^2 + (0.01)^2 + (0.02)^2 + (0.04)^2 + (0.74)^2 + (0.27)^2 = 0.6084 + 0.0001 + 0.0004 + 0.0016 + 0.5476 + 0.0729 = 1.23$

Percentage of Variance
The percentage of variance is given by the formula,
$\text{Percentage of variance} = \frac{\text{Eigen value of the factor}}{\text{Sum of Eigen values}} \times 100$
Here, the total of the Eigen values = 2.21 + 1.26 + 1.23 = 4.70
Percentage of variance of Factor I = $\frac{2.21}{4.70} \times 100 = 47.02\%$
Percentage of variance of Factor II = $\frac{1.26}{4.70} \times 100 = 26.81\%$
Percentage of variance of Factor III = $\frac{1.23}{4.70} \times 100 = 26.17\%$
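The 'square and sum' rules used in this solution can be checked with a short Python sketch. The loadings and reliabilities are those of the problem table; the variable names are chosen here for illustration, and the percentages are computed from unrounded Eigen values, so they may differ in the second decimal from figures worked out with rounded values.

```python
# Factor loadings (rows = tests, columns = Factors I-III) and reliabilities
# as given in the problem table.
loadings = [
    [0.14, 0.48, -0.78],
    [-0.26, 0.87, 0.01],
    [0.92, 0.02, 0.02],
    [0.92, 0.03, 0.04],
    [0.22, 0.49, 0.74],
    [0.62, 0.17, 0.27],
]
reliabilities = [0.92, 0.82, 0.91, 0.94, 0.85, 0.80]

# Row-wise 'square and sum' gives the communality of each test.
communalities = [sum(a * a for a in row) for row in loadings]
# Specificity = reliability - communality; uniqueness = 1 - communality.
specificities = [rel - com for rel, com in zip(reliabilities, communalities)]
uniquenesses = [1 - com for com in communalities]

# Column-wise 'square and sum' gives the Eigen value of each factor.
eigenvalues = [sum(a * a for a in column) for column in zip(*loadings)]
percentages = [100 * ev / sum(eigenvalues) for ev in eigenvalues]

print("communalities:", [round(c, 2) for c in communalities])
print("uniquenesses: ", [round(u, 2) for u in uniquenesses])
print("Eigen values: ", [round(e, 2) for e in eigenvalues])
print("% of variance:", [round(p, 2) for p in percentages])
```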
Problem
For the following problem involving seven variables, find (a) the communality, specificity and uniqueness for each of the seven variables and (b) the Eigen value and percentage of variance for each factor.

| S. No. | Variables | Factor I | Factor II | Factor III | Reliability Index |
|---|---|---|---|---|---|
| 1 | Intelligence | 0.64 | 0.11 | 0.01 | 0.81 |
| 2 | Creativity | 0.72 | 0.13 | 0.06 | 0.79 |
| 3 | Motivation | 0.69 | 0.04 | 0.11 | 0.92 |
| 4 | Anxiety | 0.11 | 0.64 | 0.31 | 0.81 |
| 5 | Acceptability | 0.23 | 0.39 | 0.49 | 0.67 |
| 6 | Expression | 0.57 | 0.05 | 0.12 | 0.72 |
| 7 | Control | 0.04 | 0.17 | 0.16 | 0.80 |
Solution
Communalities
Communality of Test 1 = $(0.64)^2 + (0.11)^2 + (0.01)^2 = 0.42$
Communality of Test 2 = $(0.72)^2 + (0.13)^2 + (0.06)^2 = 0.54$
Communality of Test 3 = $(0.69)^2 + (0.04)^2 + (0.11)^2 = 0.49$
Communality of Test 4 = $(0.11)^2 + (0.64)^2 + (0.31)^2 = 0.52$
Communality of Test 5 = $(0.23)^2 + (0.39)^2 + (0.49)^2 = 0.45$
Communality of Test 6 = $(0.57)^2 + (0.05)^2 + (0.12)^2 = 0.34$
Communality of Test 7 = $(0.04)^2 + (0.17)^2 + (0.16)^2 = 0.06$

Specificity
Specificity of Test 1 = 0.81 – 0.42 = 0.39
Specificity of Test 2 = 0.79 – 0.54 = 0.25
Specificity of Test 3 = 0.92 – 0.49 = 0.43
Specificity of Test 4 = 0.81 – 0.52 = 0.29
Specificity of Test 5 = 0.67 – 0.45 = 0.22
Specificity of Test 6 = 0.72 – 0.34 = 0.38
Specificity of Test 7 = 0.80 – 0.06 = 0.74

Uniqueness
Uniqueness of Test 1 = 1 – 0.42 = 0.58
Uniqueness of Test 2 = 1 – 0.54 = 0.46
Uniqueness of Test 3 = 1 – 0.49 = 0.51
Uniqueness of Test 4 = 1 – 0.52 = 0.48
Uniqueness of Test 5 = 1 – 0.45 = 0.55
Uniqueness of Test 6 = 1 – 0.34 = 0.66
Uniqueness of Test 7 = 1 – 0.06 = 0.94
Eigen Value
Eigen value for Factor I = $(0.64)^2 + (0.72)^2 + (0.69)^2 + (0.11)^2 + (0.23)^2 + (0.57)^2 + (0.04)^2 = 1.80$
Eigen value for Factor II = $(0.11)^2 + (0.13)^2 + (0.04)^2 + (0.64)^2 + (0.39)^2 + (0.05)^2 + (0.17)^2 = 0.62$
Eigen value for Factor III = $(0.01)^2 + (0.06)^2 + (0.11)^2 + (0.31)^2 + (0.49)^2 + (0.12)^2 + (0.16)^2 = 0.39$

Percentage of Variance
Total of the Eigen values (variance) = 1.80 + 0.62 + 0.39 = 2.81
Percentage of variance for Factor I = $\frac{1.80}{2.81} \times 100 = 64.06$
Percentage of variance for Factor II = $\frac{0.62}{2.81} \times 100 = 22.06$
Percentage of variance for Factor III = $\frac{0.39}{2.81} \times 100 = 13.88$
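A useful arithmetic check on a solution of this kind is that the communalities (row-wise sums of squares) and the Eigen values (column-wise sums of squares) must add up to the same total, because both totals are simply the sum of all squared loadings. A minimal Python sketch using the seven-variable factor matrix from this problem follows; the variable names are chosen here for illustration.

```python
# Loadings of the seven variables on Factors I, II and III (from the problem table).
loadings = [
    [0.64, 0.11, 0.01],  # Intelligence
    [0.72, 0.13, 0.06],  # Creativity
    [0.69, 0.04, 0.11],  # Motivation
    [0.11, 0.64, 0.31],  # Anxiety
    [0.23, 0.39, 0.49],  # Acceptability
    [0.57, 0.05, 0.12],  # Expression
    [0.04, 0.17, 0.16],  # Control
]

communalities = [sum(a * a for a in row) for row in loadings]
eigenvalues = [sum(a * a for a in column) for column in zip(*loadings)]

total_by_tests = sum(communalities)
total_by_factors = sum(eigenvalues)

print("Eigen values:", [round(e, 2) for e in eigenvalues])
print("sum of communalities:", round(total_by_tests, 2))
print("sum of Eigen values: ", round(total_by_factors, 2))
```

If the two totals disagree, at least one communality or Eigen value has been miscalculated.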
What are R-type and Q-type Factor Analyses?
Factor analysis may be R-type or Q-type. In R-type factor analysis, correlations are computed between pairs of variables: high correlations occur when respondents who score high on variable 1 also score high on variable 2, and respondents who score low on variable 1 also score low on variable 2. Factors emerge when there are high correlations within groups of variables. In Q-type factor analysis, correlations are computed between pairs of respondents rather than pairs of variables, so factors emerge when there are high correlations within groups of people. Q-type factor analysis is useful when the object is to sort people into groups based on their simultaneous responses to all the variables. Factor analysis has mainly been used in developing psychological tests (such as IQ tests, personality tests, and so on) in the realm of psychology. In marketing, this technique has been used to look at media readership profiles of people.
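The difference between the two types lies in what is correlated before factoring. The sketch below uses a small hypothetical score table (four respondents, three variables, values invented for illustration) and a hand-written Pearson correlation helper to build both starting matrices.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores: 4 respondents (rows) on 3 variables (columns).
scores = [
    [9, 8, 2],
    [7, 7, 3],
    [3, 2, 8],
    [2, 3, 9],
]

# R-type: correlate variables (columns) across respondents -> 3 x 3 matrix.
variables = list(zip(*scores))
r_type = [[pearson(a, b) for b in variables] for a in variables]

# Q-type: correlate respondents (rows) across variables -> 4 x 4 matrix.
q_type = [[pearson(a, b) for b in scores] for a in scores]

print("R-type matrix size:", len(r_type), "x", len(r_type[0]))
print("Q-type matrix size:", len(q_type), "x", len(q_type[0]))
```

Either matrix can then be factored in the usual way; only the interpretation of the resulting factors (groups of variables versus groups of people) changes.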
MERITS OF FACTOR ANALYSIS
The main merits of factor analysis can be stated thus:
1. The technique is helpful in pointing out important and interesting relationships among observed data that were there all the time, but were not easy to see from the data alone.
2. The technique can reveal the latent factors (that is, underlying factors not directly observed) that determine relationships among several variables concerning a research study. For example, if people are asked to rate different cold drinks (say, Limca, Coca-Cola, Gold Spot and so on) according to preference, a factor analysis may reveal some salient characteristics of cold drinks that underlie the relative preferences.
3. The technique may be used in the context of empirical clustering of products.
MAJOR LIMITATIONS OF FACTOR ANALYSIS
Apart from considering its merits, one should also be aware of several limitations of factor analysis. The important ones are as follows:
1. Factor analysis, like all multivariate techniques, involves laborious computations and a heavy cost burden. With the computer facilities available these days, factor analysis has no doubt become relatively faster and easier, but the cost factor remains, that is, large factor analyses are still bound to be quite expensive.
2. The results of a single factor analysis are generally considered less reliable and dependable because, very often, a factor analysis starts with a set of imperfect data. The factors are merely blurred averages that are difficult to identify. To overcome this difficulty, the analysis should be done at least twice. If we get more or less similar results from all rounds of analysis, our confidence in such results increases.
3. Factor analysis is a complicated decision tool that can be used only when one has thorough knowledge of, and enough experience in handling, this tool. Even then, at times, it may not work well and may disappoint the user.
To conclude, we can state that, in spite of all the said limitations, factor analysis not only helps the investigator make sense of intertwined data, but also points out some interesting relationships that may not be obvious from a more superficial examination.
WHEN TO FACTOR ANALYSE?
Professor B.N. Mukherjee (Computer and Science Unit of Indian Statistical Institute, Calcutta, 1975) has outlined seven conditions under which factor analysis should be used. These are:
1. Factor analysis should be used for data reduction when the researcher wants to combine different aspects of variance related to the original variables. If the investigator is interested in classifying a set of variables in relation to each other to identify sub-groups of observed variables that are similar, then cluster analysis and not factor analysis is the appropriate method.
2. Factor analysis should be used in a test development programme (Guilford 1948) for studying the factorial structure of a battery of tests and the extent to which certain tests are factorially 'pure'.
3. Principal component analysis or the principal axes solution may be used for deriving a set of orthogonal variables which could be used in applied prediction using the multiple regression technique. Thus, factor analysis may be quite useful in subsuming a large number of factors into a smaller group.
4. Factor analysis or principal component analysis can be used to combine and scale several measures of a unidimensional domain (such as extraversion or neuroticism or verbal fluency), so as to produce maximum discrimination among individuals along a single dimension (Takeuchi et al. 1982).
5. Factor analysis reveals the minimum number of independent dimensions that are required to define adequately the domain under investigation.
6. Factor analysis is preferred when the total number of underlying dimensions is less than the number of variables taken in the study. This is particularly required when the observed variables are infested with measurement error.
7. Only an oblique factor analysis can reveal to what extent two or more hypothetical variables (such as neuroticism and extraversion) are mutually orthogonal. If orthogonality is demonstrated by varimax rotation, as has been done in part by a few researchers, then such a result is merely due to mathematical artifice.
WHEN NOT TO FACTOR ANALYSE?
A number of misuses of factor analysis are noticed in published literature. In some cases, a poor selection has been made of experimental variables or populations or both. Guilford stated this more specifically when he observed:
Scores from such sources as the Strong Vocational Interest Blank, the Kuder Preference Record, Bernreuter's Personality Inventory, the Minnesota Multiphasic Personality Inventory and Guilford-Martin Personality Inventories are inappropriate variables to use under most conditions of analysis.
The use of a complicated statistical procedure like factor analysis does not permit one to forget the normal safeguards that should surround any scientific investigation. Guilford (1952) listed the following 10 common faults of factor analysis:
1. Too many factors are often extracted for the number of experimental variables.
2. Too many experimental variables are factorially complex.
3. Sometimes a common factor fails to come out because it is substantially represented in only one experimental variable.
4. Not enough factors are extracted.
5. Correlation coefficients used in the analysis are often spurious.
6. Correlations of scores are sometimes used in the analysis.
7. A pair of factors is very much confined to the same experimental variable.
8. The population on which the analysis is based is heterogeneous.
9. Not enough attention is given to requirements for correlation coefficients.
10. Difficulty levels of tests often vary substantially.
Jum C. Nunnally (1978) has very rightly highlighted, in his work Psychometric Theory, how one is likely to fool oneself with factor analysis. The main points to be kept in mind are:
1. One should not take recourse to factor analysis if the correlations among the variables used for factor analysis are zero or near zero. The main reason for this is that it can lead to misinterpretation of the factors extracted.
2. Interpretation of small factor loadings should be avoided because this would lead to overinterpretation of the factor.
3. If variables/tests are highly correlated because of experimental dependence, then factor analysis should not be carried out. The aim is to find natural correlations among variables and not correlations that are forced through experimental dependence.
4. Factor analysis should not be carried out when the selected subjects are relatively heterogeneous with respect to characteristics such as age, sex and education. For example, if the factors are to be interpreted with regard to differences among older people within a particular socio-economic status, then the sample of older people being studied in the factor analysis should be relatively homogeneous with regard to socio-economic status.
5. If both sexes are taken in the research study, then factor analysis should be carried out separately for males and females. If one is not interested in carrying out separate factor analyses for males and females, then sex should be taken as another variable in the analysis.
6. If the number of subjects used in the study is less than the number of variables taken, then also factor analysis should not be carried out.
Professor B.N. Mukherjee has also listed the conditions under which factor analysis should not be carried out. These are:
1. When intercorrelations fail to meet the assumption of linearity.
2. If the theoretical framework is based on the postulate that traits or learning disabilities combine in a non-additive (multiplicative) manner, as is often assumed in psychoanalytic theory or in the theory of achievement motivation, then factor analysis should not be applied.
3. If the variables have different amounts of variability in different populations, then one should not go for comparing factors from one population to another, since factor loadings are not invariant under change of scaling units. Unfortunately, this fact is still not widely known or realised despite numerous warnings.
4. Q-correlation matrices should not, in general, be factor analysed, since they rarely meet the specifications indicated by Guilford (1952).
5. Most factor analysis methods are not suitable for 'distance measures' or 'proportions of agreement' (among teachers in assessing individual students' performance) or for such correlation coefficients as biserial or tetrachoric correlations.
6. Factor analysis is also contraindicated where an intercorrelation matrix is composed of selections of correlation matrices derived from different groups.
7. If the observed intercorrelation matrix does not show significant deviation from an identity matrix (a matrix with unity in the diagonal and zeros in all the off-diagonal elements), then such a matrix should not be factor analysed (Tobias and Carlson 1969).
8. Conventional factor analysis, applied to certain patterned correlation matrices such as the simplex and circumplex which Guttman (1957) included under his radex structure, is not appropriate (Mukherjee 1975).
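Condition 7 in the last list can be checked numerically. One widely used check of this kind is Bartlett's test of sphericity; treating it as the check intended here is an assumption. The sketch below uses a hypothetical 3 × 3 correlation matrix and an assumed sample of 300 subjects, and computes the statistic $\chi^2 = -\left[(n - 1) - \frac{2p + 5}{6}\right] \ln |R|$ with $p(p - 1)/2$ degrees of freedom. The determinant helper is written out so that the example needs only the standard library.

```python
from math import log


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_subjects):
    """Return (chi-square, degrees of freedom) for Bartlett's test of sphericity."""
    p = len(corr)
    chi_square = -(n_subjects - 1 - (2 * p + 5) / 6) * log(determinant(corr))
    dof = p * (p - 1) // 2
    return chi_square, dof


# Hypothetical correlation matrix and sample size, for illustration only.
corr = [
    [1.00, 0.55, 0.43],
    [0.55, 1.00, 0.50],
    [0.43, 0.50, 1.00],
]
chi_square, dof = bartlett_sphericity(corr, n_subjects=300)

# Tabled chi-square critical value at the 0.05 level for 3 degrees of freedom.
CRITICAL_05_DF3 = 7.815
print("chi-square:", round(chi_square, 1), "df:", dof)
print("differs from identity:", chi_square > CRITICAL_05_DF3)
```

A statistic below the critical value would suggest that the matrix is not distinguishable from an identity matrix, in which case factoring it is not advisable.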
18
Extraction of Factors by Centroid Method
CHAPTER OUTLINE
1. Why we extract factors: The purpose of factor extraction
2. How we extract factors: The centroid method
3. How many factors can be extracted?
   (i) Fruckter formula
   (ii) Eigen value index
   (iii) Residual correlation matrix
4. Rotation of the reference axis
   (i) Oblique rotation
   (ii) Scree test
5. Interpretation of factors

LEARNING OBJECTIVES
After reading this chapter, you will be able to understand:
1. What is the main objective behind factor extraction?
2. How can we extract factors using the centroid method?
3. How can we determine the number of factors that can be extracted by using the Fruckter formula, the Eigen value index or the residual correlation matrix?
4. What is the meaning of factor rotation and how is it done?
5. What is oblique rotation and what is its role in factor extraction?
6. What is the Scree test and how do we calculate the Scree point?
7. How do we interpret the extracted factors?
WHY WE EXTRACT FACTORS: THE PURPOSE OF FACTOR EXTRACTION
The primary, distinctive feature of factor analysis is data reduction. The technique of factor analysis covers all those procedures that serve to locate a small number of dimensions, or clusters of factors, in a larger set of independent variables or items. In the terminology of factor analysis, the main aim is to find the minimal number of factors that capture the essence of a construct or idea. Thus, factor analysis reduces the number of variables in a set or group of measurements by taking into account the overlap/correlations among these measures. In the area of psychological testing, the purpose of factor analysis is to find a set of salient factors that will account for the major part of the variance of a set or group of scores on different tests.
HOW WE EXTRACT FACTORS: THE CENTROID METHOD
The basic data required for the extraction of factors is a correlation matrix. In any correlation matrix derived from test scores or values, the diagonal cells are empty. So, the question arises as to what values to place in the diagonal. These can be (a) reliabilities of the tests, (b) estimates of communalities or (c) all 1.00 values. In the Thurstone centroid method, communalities are used in the diagonal of the correlation matrix to calculate the factors. In the Hotelling principal axes method, the diagonal entries are all 1.00. In reality, the choice of diagonal entries affects (a) the number of factors extracted and (b) the loadings of each factor on each test. In carrying out the factor analysis, communalities are taken as the diagonal values while converting the factor matrix to the correlation matrix. The main steps involved in extracting factors by the centroid method are illustrated with the correlation matrix given below (communalities are shown in parentheses in the diagonal):

| Tests | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | (0.54) | 0.50 | 0.23 | 0.39 | 0.28 |
| 2 | 0.50 | (0.49) | 0.31 | 0.47 | 0.37 |
| 3 | 0.23 | 0.31 | (0.54) | 0.60 | 0.39 |
| 4 | 0.39 | 0.47 | 0.60 | (0.74) | 0.59 |
| 5 | 0.28 | 0.37 | 0.39 | 0.59 | (0.77) |
Step 1: Add the values column-wise and denote each column sum by the symbol C.
Step 2: Add the values row-wise (Σ).

| Tests | 1 | 2 | 3 | 4 | 5 | Σ |
|---|---|---|---|---|---|---|
| 1 | (0.54) | 0.50 | 0.23 | 0.39 | 0.28 | 1.94 |
| 2 | 0.50 | (0.49) | 0.31 | 0.47 | 0.37 | 2.14 |
| 3 | 0.23 | 0.31 | (0.54) | 0.60 | 0.39 | 2.07 |
| 4 | 0.39 | 0.47 | 0.60 | (0.74) | 0.59 | 2.79 |
| 5 | 0.28 | 0.37 | 0.39 | 0.59 | (0.77) | 2.40 |
| C | 1.94 | 2.14 | 2.07 | 2.79 | 2.40 | GT = 11.34 |

Step 3: Add all the column sums or row sums to get the grand total (GT).
Step 4: Find the value of N by the following formula,
$N = \frac{1}{\sqrt{GT}}$
Hence,
$N = \frac{1}{\sqrt{11.34}} = \frac{1}{3.37} = 0.297$
Step 5: Multiply each column sum by N to get the first factor loading $L_i$ of each test. Mathematically,
$L_i = C \times N$
First Factor Loadings:
For Test 1 = 1.94 × 0.297 = 0.58
For Test 2 = 2.14 × 0.297 = 0.64
For Test 3 = 2.07 × 0.297 = 0.62
For Test 4 = 2.79 × 0.297 = 0.83
For Test 5 = 2.40 × 0.297 = 0.71
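Steps 1 to 5 can be sketched in Python as follows. The correlation matrix is the one used in this example (communalities in the diagonal); the function name `centroid_loadings` is chosen here for illustration. Because the code carries full precision rather than the rounded value of N, individual loadings may differ from hand-calculated figures in the second decimal place.

```python
from math import sqrt

# Correlation matrix from the example, with communalities in the diagonal.
R = [
    [0.54, 0.50, 0.23, 0.39, 0.28],
    [0.50, 0.49, 0.31, 0.47, 0.37],
    [0.23, 0.31, 0.54, 0.60, 0.39],
    [0.39, 0.47, 0.60, 0.74, 0.59],
    [0.28, 0.37, 0.39, 0.59, 0.77],
]


def centroid_loadings(matrix):
    """Centroid loadings of one factor: column sums divided by sqrt(grand total)."""
    column_sums = [sum(column) for column in zip(*matrix)]  # Step 1 (C)
    grand_total = sum(column_sums)                          # Step 3 (GT)
    multiplier = 1 / sqrt(grand_total)                      # Step 4 (N)
    return [c * multiplier for c in column_sums]            # Step 5 (L = C x N)


first_loadings = centroid_loadings(R)
print([round(value, 3) for value in first_loadings])
```

The function assumes a positive grand total; when the total is negative, as happens with residual matrices, reflection is needed before it can be applied.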
After obtaining the first factor loadings, the next step is to find the second factor loadings as follows.
Step 6: Cross-product of the matrix. List all the first factor loadings along the horizontal and vertical sides of a table. Multiply the corresponding row and column values to get the cross-product matrix, denoted by $L_{ij}$.

Cross-Product of the Matrix ($L_{ij}$)

| $L_i$ | 0.58 | 0.64 | 0.62 | 0.83 | 0.71 |
|---|---|---|---|---|---|
| 0.58 | 0.34 | 0.37 | 0.36 | 0.48 | 0.41 |
| 0.64 | 0.37 | 0.41 | 0.40 | 0.53 | 0.45 |
| 0.62 | 0.36 | 0.40 | 0.38 | 0.52 | 0.44 |
| 0.83 | 0.48 | 0.53 | 0.52 | 0.69 | 0.60 |
| 0.71 | 0.41 | 0.45 | 0.44 | 0.60 | 0.50 |
Step 7: Find the first factor residuals with the formula:
First factor residual = $r_{ij} - L_{ij}$
where $r_{ij}$ is the original correlation matrix and $L_{ij}$ is the first factor cross-product matrix. The first factor residual matrix is:

| Test | 1 | 2 | 3 | 4 | 5 | Σ |
|---|---|---|---|---|---|---|
| 1 | 0.20 | 0.13 | –0.13 | –0.09 | –0.13 | –0.02 |
| 2 | 0.13 | 0.08 | –0.09 | –0.06 | –0.08 | –0.02 |
| 3 | –0.13 | –0.09 | 0.16 | 0.08 | –0.05 | –0.03 |
| 4 | –0.09 | –0.06 | 0.08 | 0.05 | –0.01 | –0.03 |
| 5 | –0.13 | –0.08 | –0.05 | –0.01 | 0.27 | 0.00 |
| C | –0.02 | –0.02 | –0.03 | –0.03 | 0.00 | GT = –0.10 |

Step 8: Reflection
As is indicated in the table above, the value of GT = –0.10, that is, we obtain a negative sign for the grand total. This indicates that reflection is required. By reflection, it is meant that each reflected test vector retains its length but extends in the opposite direction. The major purpose of reflection is to get a reflected correlation matrix which will have the highest possible sum of coefficients (GT). This step is taken because the sum of the factor loadings increases with the GT, and the purpose is to make each factor account for as much variance as possible. Therefore, reflection is done to maximise the GT. There is no hard and fast rule for this; only the trial and error method is used. The best method of reflection is by inspection of the signs of the coefficients in the residual matrix. For instance, in the above residual matrix, variables 3, 4 and 5 need reflection. This is done by changing the signs of all the entries in the columns of variables 3, 4 and 5 (from positive to negative and vice versa), and then doing the same row-wise. The outcome is the reflected residual matrix given below:

| Test | 1 | 2 | 3∗ | 4∗ | 5∗ | Σ |
|---|---|---|---|---|---|---|
| 1 | 0.20 | 0.13 | 0.13 | 0.09 | 0.13 | 0.68 |
| 2 | 0.13 | 0.08 | 0.09 | 0.06 | 0.08 | 0.44 |
| 3∗ | 0.13 | 0.09 | 0.16 | 0.08 | –0.05 | 0.41 |
| 4∗ | 0.09 | 0.06 | 0.08 | 0.05 | –0.01 | 0.27 |
| 5∗ | 0.13 | 0.08 | –0.05 | –0.01 | 0.27 | 0.42 |
| C | 0.68 | 0.44 | 0.41 | 0.27 | 0.42 | GT = 2.22 |

∗reflected variables.

Now the loadings on the second factor are obtained from this reflected residual matrix in the same way as was done with the original correlation matrix for the first factor loadings (Step 1 to Step 5). Here, GT = 2.22, so
$N = \frac{1}{\sqrt{GT}} = \frac{1}{\sqrt{2.22}} = \frac{1}{1.49} = 0.67$
Second Factor Loadings
For Test 1 = 0.68 × 0.67 = 0.46
For Test 2 = 0.44 × 0.67 = 0.30
For Test 3 = 0.41 × 0.67 = 0.28
For Test 4 = 0.27 × 0.67 = 0.18
For Test 5 = 0.42 × 0.67 = 0.28
All tests/variables that were reflected receive minus signs, and all unreflected ones receive positive signs. Thus, the second factor loadings are 0.46, 0.30, –0.28, –0.18 and –0.28. Further, to get the third factor, one would calculate from the above residual correlation matrix and not from the reflected residual correlation matrix. The method involved would be the same as that adopted for the extraction of the second factor loadings. A similar method would be employed for the extraction of the fourth, fifth and further factors. In the present example, two factors are sufficient to explain much of the common variance. The obtained factor matrix is given below:

| Tests | Factor I | Factor II |
|---|---|---|
| 1 | 0.58 | 0.46 |
| 2 | 0.64 | 0.30 |
| 3 | 0.62 | –0.28 |
| 4 | 0.83 | –0.18 |
| 5 | 0.71 | –0.28 |
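The residual, reflection and second-factor steps can be sketched in Python as below. The sketch starts from the rounded first factor loadings given in the text and takes the set of reflected tests (3, 4 and 5) as given, since the text chooses it by inspection. Cross-products are not rounded to two decimals in the code, so the results agree with the hand calculation only to within about 0.02.

```python
from math import sqrt

# Original correlation matrix (communalities in the diagonal).
R = [
    [0.54, 0.50, 0.23, 0.39, 0.28],
    [0.50, 0.49, 0.31, 0.47, 0.37],
    [0.23, 0.31, 0.54, 0.60, 0.39],
    [0.39, 0.47, 0.60, 0.74, 0.59],
    [0.28, 0.37, 0.39, 0.59, 0.77],
]
first_loadings = [0.58, 0.64, 0.62, 0.83, 0.71]  # rounded values from the text
reflected_tests = {2, 3, 4}  # zero-based indices of Tests 3, 4 and 5

n = len(R)

# Step 7: residual = r_ij - L_i * L_j.
residual = [
    [R[i][j] - first_loadings[i] * first_loadings[j] for j in range(n)]
    for i in range(n)
]
residual_total = sum(sum(row) for row in residual)  # negative, so reflect

# Step 8: reflecting a test flips the sign of its row and its column.
signs = [-1 if i in reflected_tests else 1 for i in range(n)]
reflected = [
    [signs[i] * signs[j] * residual[i][j] for j in range(n)] for i in range(n)
]

# Steps 1-5 applied to the reflected residual matrix.
column_sums = [sum(column) for column in zip(*reflected)]
grand_total = sum(column_sums)
multiplier = 1 / sqrt(grand_total)

# Reflected tests receive minus signs on the second factor.
second_loadings = [signs[j] * column_sums[j] * multiplier for j in range(n)]
print([round(value, 2) for value in second_loadings])
```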
HOW MANY FACTORS CAN BE EXTRACTED?
Theoretically speaking, the maximum number of factors that can be extracted in any one problem is equal to the number of variables/tests involved. For instance, if we have a correlation matrix of order 10 × 10, then the maximum number of factors that can be extracted is 10. But the basic problem in factor analysis is to extract the common variance, that is, the important factors which can explain maximum variance. Therefore, it is essential to decide how many factors should be extracted in a particular research problem. There are three methods to answer this question. These are (a) the Fruckter formula, (b) the Eigen value index and (c) the residual correlation matrix.
Fruckter Formula
The formula proposed by Fruckter to determine the number of factors to be extracted in a problem is,
$\text{Number of factors} = \frac{(2n + 1) - \sqrt{8n + 1}}{2}$
where n is the number of variables in a problem or correlation matrix. For instance, suppose we have a research problem involving 20 variables and we are interested in extracting the important factors. Before extracting the factors, as discussed earlier, the issue in front of the researcher is to decide about the number of factors to be extracted. Here, n = 20. Thus,
$\text{Number of factors} = \frac{(2 \times 20 + 1) - \sqrt{8 \times 20 + 1}}{2} = \frac{41 - 12.69}{2} = 14.16$
= 14 factors (after rounding off)
Hence, in a problem involving 20 variables, 14 factors are important to be extracted. Take another example having five tests,
$\text{Number of factors} = \frac{(2 \times 5 + 1) - \sqrt{8 \times 5 + 1}}{2} = \frac{11 - 6.40}{2} = \frac{4.6}{2} = 2.3$
= 2 factors
Therefore, in the correlation matrix of order 5 × 5, we have extracted only two centroid factors.
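The formula is easy to evaluate programmatically. The following Python sketch reproduces the two worked cases; the function name `fruckter_factor_count` is invented for this illustration, and rounding to the nearest whole number is an assumption that matches both examples.

```python
from math import sqrt


def fruckter_factor_count(n_variables):
    """Evaluate ((2n + 1) - sqrt(8n + 1)) / 2 without rounding."""
    return ((2 * n_variables + 1) - sqrt(8 * n_variables + 1)) / 2


for n in (20, 5):
    raw = fruckter_factor_count(n)
    print(n, "variables:", round(raw, 2), "->", round(raw), "factors")
```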
Eigen Value Index
Another method is to go on extracting factors until the Eigen value falls to 1.0. In other words, only those factors are extracted which have an Eigen value equal to or more than 1.0. Factors with an Eigen value of less than 1.0 are not taken into consideration. This method is generally employed when one extracts the factors with the help of a computer package (Statistical Package for the Social Sciences [SPSS], Statistical Analysis System [SAS], and so on).
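As a minimal sketch of this rule, the Python fragment below keeps only the factors whose Eigen value is at least 1.0. The Eigen values are hypothetical, and the function name and the `cutoff` parameter are chosen here for illustration.

```python
def retained_factors(eigenvalues, cutoff=1.0):
    """Return (factor number, Eigen value) pairs with Eigen value >= cutoff."""
    return [
        (number, value)
        for number, value in enumerate(eigenvalues, start=1)
        if value >= cutoff
    ]


# Hypothetical Eigen values of six extracted factors, for illustration only.
hypothetical_eigenvalues = [2.95, 1.40, 1.02, 0.71, 0.48, 0.30]
kept = retained_factors(hypothetical_eigenvalues)
print("factors retained:", [number for number, _ in kept])
```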
Residual Correlational Matrix The residual correlational matrix is the third crude method to determine the number of factors to be extracted or to decide when to stop extracting factors. In this method, the residual correlational matrix is observed and if it is seen that most of the correlational coefficients in the residual correlational matrix are zero or approximately zero, then further extraction of the factors is stopped. Consider the following problem to understand the process of extraction of factors.
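The inspection described here can be made a little more explicit with a small Python sketch that reports the share of off-diagonal residuals lying close to zero. The tolerance of 0.10 and the function name are assumptions made for this illustration, since the text leaves "approximately zero" to the analyst's judgement. The matrix used is the first factor residual matrix of the five-test example discussed earlier.

```python
def share_near_zero(residual, tolerance=0.10):
    """Proportion of off-diagonal residuals whose absolute value is below tolerance."""
    n = len(residual)
    off_diagonal = [abs(residual[i][j]) for i in range(n) for j in range(i + 1, n)]
    near_zero = sum(1 for value in off_diagonal if value < tolerance)
    return near_zero / len(off_diagonal)


# First factor residuals of the five-test example (only off-diagonal cells are used).
first_residual = [
    [0.20, 0.13, -0.13, -0.09, -0.13],
    [0.13, 0.08, -0.09, -0.06, -0.08],
    [-0.13, -0.09, 0.16, 0.08, -0.05],
    [-0.09, -0.06, 0.08, 0.05, -0.01],
    [-0.13, -0.08, -0.05, -0.01, 0.27],
]

proportion = share_near_zero(first_residual)
print("share of near-zero residuals:", proportion)
```

How large this share must be before extraction stops remains a judgement call; a higher share after each extracted factor indicates that less common variance is left to explain.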
Problem
Six different measurements were taken from a group of 300 individuals. The correlations between the measurements are given below:

| Measurements | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 1.0 | 0.55 | 0.43 | 0.12 | 0.16 | 0.20 |
| 2 | | 1.0 | 0.50 | 0.15 | 0.21 | 0.22 |
| 3 | | | 1.0 | 0.27 | 0.15 | 0.30 |
| 4 | | | | 1.0 | 0.24 | 0.18 |
| 5 | | | | | 1.0 | 0.14 |
| 6 | | | | | | 1.0 |

Extract the centroid factors.

Solution

| Measurements | 1 | 2 | 3 | 4 | 5 | 6 | Σ |
|---|---|---|---|---|---|---|---|
| 1 | 1.0 | 0.55 | 0.43 | 0.12 | 0.16 | 0.20 | 2.46 |
| 2 | 0.55 | 1.0 | 0.50 | 0.15 | 0.21 | 0.22 | 2.63 |
| 3 | 0.43 | 0.50 | 1.0 | 0.27 | 0.15 | 0.30 | 2.65 |
| 4 | 0.12 | 0.15 | 0.27 | 1.0 | 0.24 | 0.18 | 1.96 |
| 5 | 0.16 | 0.21 | 0.15 | 0.24 | 1.0 | 0.14 | 1.90 |
| 6 | 0.20 | 0.22 | 0.30 | 0.18 | 0.14 | 1.0 | 2.04 |
| C | 2.46 | 2.63 | 2.65 | 1.96 | 1.90 | 2.04 | GT = 13.64 |

$N = \frac{1}{\sqrt{GT}} = \frac{1}{\sqrt{13.64}} = \frac{1}{3.69} = 0.27$
First Factor Loadings (= NC)
For Measurement 1 = 0.27 × 2.46 = 0.66
For Measurement 2 = 0.27 × 2.63 = 0.71
For Measurement 3 = 0.27 × 2.65 = 0.72
For Measurement 4 = 0.27 × 1.96 = 0.53
For Measurement 5 = 0.27 × 1.90 = 0.51
For Measurement 6 = 0.27 × 2.04 = 0.55
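Cross-Product of the Matrix ($L_{ij}$)

| $L_i$ | 0.66 | 0.71 | 0.72 | 0.53 | 0.51 | 0.55 |
|---|---|---|---|---|---|---|
| 0.66 | 0.44 | 0.47 | 0.48 | 0.35 | 0.34 | 0.36 |
| 0.71 | 0.47 | 0.50 | 0.51 | 0.38 | 0.36 | 0.39 |
| 0.72 | 0.48 | 0.51 | 0.52 | 0.38 | 0.37 | 0.40 |
| 0.53 | 0.35 | 0.38 | 0.38 | 0.28 | 0.27 | 0.29 |
| 0.51 | 0.34 | 0.36 | 0.37 | 0.27 | 0.26 | 0.28 |
| 0.55 | 0.36 | 0.39 | 0.40 | 0.29 | 0.28 | 0.30 |

Residual Correlation Matrix = $r_{ij} - L_{ij}$

| Measurements | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 0.56 | 0.08 | –0.05 | –0.23 | –0.18 | –0.16 |
| 2 | 0.08 | 0.50 | –0.01 | –0.23 | –0.15 | –0.17 |
| 3 | –0.05 | –0.01 | 0.48 | –0.11 | –0.22 | –0.10 |
| 4 | –0.23 | –0.23 | –0.11 | 0.72 | –0.03 | –0.11 |
| 5 | –0.18 | –0.15 | –0.22 | –0.03 | 0.74 | –0.14 |
| 6 | –0.16 | –0.17 | –0.10 | –0.11 | –0.14 | 0.70 |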
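Reflection
Measurements 3, 4, 5 and 6 would be reflected to get the optimal value of GT.

Reflected Correlation Matrix

| Measurement | 1 | 2 | 3∗ | 4∗ | 5∗ | 6∗ | Σ |
|---|---|---|---|---|---|---|---|
| 1 | 0.56 | 0.08 | 0.05 | 0.23 | 0.18 | 0.16 | 1.26 |
| 2 | 0.08 | 0.50 | 0.01 | 0.23 | 0.15 | 0.17 | 1.14 |
| 3∗ | 0.05 | 0.01 | 0.48 | –0.11 | –0.22 | –0.10 | 0.11 |
| 4∗ | 0.23 | 0.23 | –0.11 | 0.72 | –0.03 | –0.11 | 0.93 |
| 5∗ | 0.18 | 0.15 | –0.22 | –0.03 | 0.74 | –0.14 | 0.68 |
| 6∗ | 0.16 | 0.17 | –0.10 | –0.11 | –0.14 | 0.70 | 0.68 |
| C | 1.26 | 1.14 | 0.11 | 0.93 | 0.68 | 0.68 | GT = 4.80 |

∗reflected variables.

$N = \frac{1}{\sqrt{GT}} = \frac{1}{\sqrt{4.8}} = \frac{1}{2.19} = 0.46$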
Second Factor Loadings (= NC)
For Measurement 1 = 0.46 × 1.26 = 0.58
For Measurement 2 = 0.46 × 1.14 = 0.52
For Measurement 3 = 0.46 × 0.11 = 0.05
For Measurement 4 = 0.46 × 0.93 = 0.43
For Measurement 5 = 0.46 × 0.68 = 0.31
For Measurement 6 = 0.46 × 0.68 = 0.31
As measurements 3, 4, 5 and 6 were reflected in the reflected residual matrix, the loadings for measurements 3, 4, 5 and 6 are given minus signs. The obtained factor matrix is:

| Measurement | Factor I | Factor II |
|---|---|---|
| 1 | 0.66 | 0.58 |
| 2 | 0.71 | 0.52 |
| 3 | 0.72 | –0.05 |
| 4 | 0.53 | –0.43 |
| 5 | 0.51 | –0.31 |
| 6 | 0.55 | –0.31 |
ROTATION OF THE REFERENCE AXIS
To simplify the interpretation of the obtained factors, and to increase the number of high and low positive loadings in the columns of the factor matrix, a procedure known as factor rotation is used. There are two basic methods: (a) orthogonal rotation and (b) oblique rotation. Orthogonal rotation is employed when we have factors that are not correlated with one another, while oblique rotation is employed when the obtained factors are related to one another. Factor analysts like Guilford prefer orthogonal rotation, while Thurstone and Cattell prefer oblique rotation. Suppose X and Y are the centroid factor vectors, and X1 and Y1 are the rotated vectors. The loadings of the five variables (tests) can be read off the rotated factors, as is done in the case of the unrotated factors.

| Test | Unrotated X | Unrotated Y | Rotated X1 | Rotated Y1 |
|---|---|---|---|---|
| 1 | 0.69 | 0.50 | | |
| 2 | 0.56 | 0.40 | | |
| 3 | 0.74 | 0.38 | | |
| 4 | 0.65 | –0.42 | | |
| 5 | 0.70 | –0.46 | | |
To derive the loadings on the rotated factors, the graphical method is used. The rotated factors X1 and Y1 can be converted into standard measurements by spacing deciles along the new axes from 0.00 to 1.00, in the same way as was done in the case of the original factors. Now, a perpendicular is drawn from the point corresponding to each test/variable to each rotated axis (X1 and Y1). The point where this perpendicular touches the rotated factor axis is the loading of the test on the rotated factor. The other method of computing the loadings of the rotated factor matrix is mathematical. We should keep in mind that a rotated factor is a linear combination of a set of linear combinations. For instance, the rotation of factor X in Figure 18.1 can be understood with the following linear equation.

Figure 18.1 Orthogonal Rotation (Source: Author.)

$R_{iX_1} = a_1 R_{ix} + b_1 R_{iy}$
where,
$R_{iX_1}$ = rotated loading on X1
$R_{ix}$ = unrotated loading on X
$R_{iy}$ = unrotated loading on Y
$a_1$ and $b_1$ are the weights for the rotation.
The weights are the cosines of the angles of rotation of X1 with X and Y, respectively. Hence, in this case,
$a_1$ = cosine of the angle of X1 with X = cos (47°) = 0.682
$b_1$ = cosine of the angle of X1 with Y = cos (43°) = 0.731
Similarly, one can also form a factor as a linear combination of a larger number of factors. Hence, a rotated factor loading on X1 could be computed from a linear combination of three factors as,
$R_{iX_1} = a_1 R_{ix} + b_1 R_{iy} + c_1 R_{iz}$
It is very difficult to find the weights by graphical comparison when there are more than two factors. This is because the rotation is simultaneous in more than two dimensions and, consequently, the weights cannot be obtained by the movement of factor vectors on the flat surface of a piece of graph paper. Likewise, one can find the loadings on the rotated factor Y1 as follows:
$R_{iY_1} = a_2 R_{ix} + b_2 R_{iy}$
where,
$R_{iY_1}$ = rotated loading on Y1
$R_{ix}$ = unrotated loading on X
$R_{iy}$ = unrotated loading on Y
$a_2$ is the cosine of the angle of Y1 with X
$b_2$ is the cosine of the angle of Y1 with Y
If the unrotated factors are orthogonal and the rotated factors are also orthogonal, then the sum of the products of the weights must be equal to zero, that is,
$a_1 a_2 + b_1 b_2 = 0$
If there are four factors and the unrotated as well as the rotated factors are orthogonal, then,
$a_1 a_2 + b_1 b_2 + c_1 c_2 + d_1 d_2 = 0$
If the above condition is not true (that is, $a_1 a_2 + b_1 b_2 \neq 0$), then X1 and Y1 are not orthogonal (that is, the correlation between the two factors is different from zero).
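The mathematical method can be sketched in Python using the unrotated loadings of the five tests and the 47° angle quoted above. The weights for Y1 are not given in the text; the sketch assumes that Y1 lies a further 90° round from X1, so that $a_2 = -\sin 47°$ and $b_2 = \cos 47°$. That choice of direction is an assumption made only for this illustration.

```python
from math import cos, sin, radians

# Unrotated centroid loadings (X, Y) of the five tests.
unrotated = [
    (0.69, 0.50),
    (0.56, 0.40),
    (0.74, 0.38),
    (0.65, -0.42),
    (0.70, -0.46),
]

theta = radians(47)
a1, b1 = cos(theta), sin(theta)   # sin 47 deg equals cos 43 deg
a2, b2 = -sin(theta), cos(theta)  # assumed weights for Y1 (90 deg beyond X1)

# Each rotated loading is a weighted sum of the unrotated loadings.
rotated = [(a1 * x + b1 * y, a2 * x + b2 * y) for x, y in unrotated]

# Orthogonality condition on the weights.
weight_cross_product = a1 * a2 + b1 * b2

for test, (x1, y1) in enumerate(rotated, start=1):
    print("Test", test, round(x1, 2), round(y1, 2))
print("a1*a2 + b1*b2 =", round(weight_cross_product, 6))
```

Under an orthogonal rotation the sum of squared loadings of each test (its communality) is unchanged, which gives a simple check on the arithmetic.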
Oblique Rotation
We have discussed orthogonal rotation with the assumption that the original factors as well as the rotated factors are orthogonal. In reality, there is no necessity to maintain right angles among the rotated factors. There is another rotation known as non-orthogonal or oblique rotation. It is the rotation in which the angles between the factors depart from 90°, as shown in Figure 18.2.
Figure 18.2 Oblique Rotation
Source: Author.
In this figure, the cosine of the angle between X and Y is 0.342, which is the correlation between the two factors. The loadings on the rotated oblique factors can be obtained in the same manner as was done in the case of orthogonal factors. In the case of oblique rotation, the sum of the cross-products of the weights for two rotated factors would not be equal to zero. Mathematically, $a_1 a_2 + b_1 b_2 \neq 0$ for oblique rotation. Further, if oblique rotated factors are rotated again, then the sum of the squared weights for rotation would be 1.0 only by chance. To explain the point further, the geometric representations of the unrotated and the rotated factor matrices for Factors I and II are presented in Figures 18.3 and 18.4, respectively. The geometric representation of the first factor shows that variables 1, 2, 3 and 6 are all positive and their vectors
lie within the first quadrant of the circle. Variables 4, 5 and 7 fall in the fourth quadrant of the circle, as displayed in Figure 18.3. When the factors were rotated orthogonally, all the loadings turned out to be positive and all the vectors fall in the first quadrant of the circle, as displayed in Figure 18.4.
Figure 18.3 Unrotated Factor Matrix: Geometric Representation
Source: Author.
The communality of each variable can be obtained from the figures with the help of the Pythagorean theorem, which states that the square of the hypotenuse is equal to the sum of the squares of the perpendicular and the base of a right-angled triangle. The squares of the perpendicular and the base give the factor variances (the squared loadings on the two factors), which, when added, give the communality.
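The Pythagorean relation can be written out as a short calculation. The sketch below is illustrative only; the two loadings are hypothetical and stand for the base and the perpendicular of the triangle formed by one variable's vector.

```python
import math


def communality(loadings):
    """Sum of squared loadings of one variable on orthogonal factors."""
    return sum(value * value for value in loadings)


# Hypothetical loadings of one variable on Factor I and Factor II.
loadings = [0.60, 0.50]

h2 = communality(loadings)      # communality (squared length of the vector)
vector_length = math.sqrt(h2)   # hypotenuse, i.e. length of the test vector

print("communality:", round(h2, 2))
print("vector length:", round(vector_length, 3))
```

The same function applies unchanged to any number of orthogonal factors, since the communality is simply the row sum of squared loadings.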
Figure 18.4 Rotated Factor Matrix: Geometric Representation
Source: Author.
Scree Test
The Scree test is a method for deciding which of the extracted factors should be retained and interpreted. It is conducted to determine which of the extracted factors actually contribute to the variance rather than measure random error. This test was first proposed by Cattell (1966) from the practical observation that factor variance levels off when the factors are largely measuring random error. Each factor, in order of extraction, is plotted against the proportion of the variance it extracts. The curve fitted to these points falls steeply at first, with a slope that becomes progressively flatter, until the random error factors or trivial factors are reached. Then, the curve levels off and the incremental difference between successive factors is about the same.
Cattell called the test the Scree test, since the random error factors in a plot like that shown in Figure 18.5 resemble a scree, that is, the debris that has fallen or been eroded off a mountain and that lies at its base.
Figure 18.5 Scree Point
Source: Author.
The factors on the right-hand side of the factor cut-off point, popularly known as the Scree point, are dropped, while the factors on the left-hand side of the Scree point are retained for final usage and interpretation. Presently, the Scree test is the most often used technique for deciding which factors are important enough to be retained in a factor analysis. In a problem involving 38 variables, factor analysis was used to extract factors with the help of a computer statistical package. The cutting point of the Eigen value was kept at 1.0 to extract the factors. On the basis of an Eigen value of 1.0, 15 factors were extracted. Interpretation of these factors was found to be a difficult proposition. To reduce the number further and to retain only the very important factors, the Scree test was used. The graph of the Scree test is shown in Figure 18.6. It is clear from the Scree point that the first seven factors are important for the purpose of interpretation, that is, the factors that lie above the Scree point on the plot. The remaining seven factors (Factor VIII to XIV) are to be eliminated from factor interpretation because they do not contribute much to the factor variances.
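The Scree test is a visual judgement, but the idea of a curve that "levels off" can be sketched as a simple rule. The Python sketch below is an assumption made for illustration, not Cattell's procedure: it treats the Scree point as the first factor after which every successive drop in variance is smaller than a tolerance `tol`, and retains the factors to its left. The variance values and the tolerance are hypothetical.

```python
def factors_before_scree(variances, tol):
    """Count the factors to the left of the point where the curve levels off.

    variances: variance extracted by each factor, in order of extraction.
    tol: a drop smaller than this is treated as 'about the same' (assumed).
    """
    drops = [variances[i] - variances[i + 1] for i in range(len(variances) - 1)]
    for k in range(len(drops)):
        # Level-off starts at factor k + 1 if all later drops are small.
        if all(drop < tol for drop in drops[k:]):
            return k
    return len(variances)


# Hypothetical variances for seven extracted factors.
variances = [5.0, 3.0, 1.8, 1.0, 0.9, 0.85, 0.8]
retained = factors_before_scree(variances, tol=0.3)

print("factors retained:", retained)
```

With these values the drops after the fourth factor are all small, so the fourth factor is treated as the Scree point and the three factors to its left are retained. In practice the plot itself should still be inspected, since the choice of tolerance is arbitrary.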
Figure 18.6 Scree Test
Source: Author.

INTERPRETING THE FACTOR
After the statistical computations of factoring and rotation have been completed, the next problem faced by the investigator would be of interpreting the factors. This is achieved by inspecting the
pattern of high and low loadings of each factor on the subtests/variables. It is to be remembered that the higher the loading, the more important the factor is on the given variable/test. Let us elaborate the interpretation of a factor with some examples. Take Factor I:

| S. No. | Name of the Variable | Loadings |
|---|---|---|
| 1 | One-upmanship lifestyle | 0.876 |
| 2 | Exploitative lifestyle | 0.869 |
| 3 | Domineering lifestyle | 0.755 |
| 4 | Conservatism | 0.656 |
| 5 | Political, social and economic gain | 0.494 |
| 6 | Marriage and interpersonal relations | 0.444 |
| 7 | Religiosity | 0.439 |
| 8 | Individualistic lifestyle | –0.657 |
| 9 | Defiant lifestyle | –0.658 |
It is observed that the above factor is a very complex factor, consisting of nine variables in all. Among these variables are five lifestyles, two dimensions of caste prejudice, conservatism and religiosity. The highest positive loading is that of the one-upmanship lifestyle. This positive loading indicates an individual or operator who uses his wits to get the better of people. The second highest positive loading, on the exploitative lifestyle, indicates an attempt to get ahead by exploiting others. The positive loading on the domineering authoritarian lifestyle reflects the attitude ‘I know best’ and ‘You should do what I say’. The positive loading on conservatism reflects the desire to preserve old beliefs, customs and traditions, and to vehemently oppose progress or change. The positive loadings on political, social and economic gain (PSEC) and marriage and interpersonal relations (MIR) reflect the existence of caste prejudice, more specifically in the form of expecting political, economic and social privileges and gains from members of one’s own caste, and encouraging marriage and interpersonal relations within one’s caste. The positive loading on religiosity is indicative of a religious orientation, with an emphasis on the concrete and literal qualities of religious belief. The negative loadings of the individualistic and defiant-resistive lifestyles on the factor indicate that such individuals do not have the desire to be distinctively different and individual, and neither do they possess the ‘to hell with you’ attitude. In the light of the above discussion, this factor may be named as the ‘Going Against People’ factor. Take another example of a factor having significant loadings on four variables, as given below.

Significant Loadings of Factor

| S. No. | Variables | Loadings |
|---|---|---|
| 1. | Obedient versus Assertive | –0.694 |
| 2. | Expedient versus Conscientious | –0.778 |
| 3. | Shy versus Venturesome | –0.489 |
| 4. | Group Dependent versus Self-sufficient | –0.506 |
The above factor has negative loadings on all four personality variables, which are taken from Cattell’s 16-PF Test. On Cattell’s factor E (obedient versus assertive), the negative loading indicates a dependent person and a follower, who goes along with the group when taking action, tends to lean on others in making decisions, and is soft-hearted, expressive and easily upset. On Cattell’s factor G (expedient versus conscientious), the negative loading indicates weaker superego strength, that is, one who evades rules and feels few obligations. On Cattell’s factor H (shy versus venturesome), the negative loading indicates a person who is restrained, diffident and timid. On Cattell’s factor Q2 (group dependent versus self-sufficient), the negative loading indicates a ‘joiner’ and sound follower. In the light of this interpretation, the above factor (having significant loadings on four variables) may be named as ‘Dependency and Super-Ego’. Take another example in which two variables have significant loadings on the extracted factor, as given ahead:
Significant Loadings of Factor

| S. No. | Variables | Loadings |
|---|---|---|
| 1. | Perceived Self | 0.92 |
| 2. | Ideal Self | 0.90 |
The above factor is found to have significant positive factor loadings on the perceived self and the ideal self. Both of these variables are components of ‘self concept’. The perceived self is the way one perceives or describes oneself or what one thinks about oneself, while the ‘ideal self’ reflects what one would like to be. As both of these are components of the self-concept, this factor may be named as the ‘self concept’ factor. In another study, 200 fresh college graduates were administered Standard Progressive Matrices (Raven), Need Achieve Inventory and Torrance Test of Creative Thinking. The loadings on the first factor are as follows:

Loadings on First Centroid Factor

| S. No. | Name of the Variables | Loadings |
|---|---|---|
| 1. | Scholastic achievement | 0.43 |
| 2. | Intellectual capacity | 0.54 |
| 3. | Need Achievement | 0.09 |
| 4. | Fluency | 0.78 |
| 5. | Flexibility | 0.71 |
| 6. | Originality | 0.68 |
| 7. | Elaboration | 0.71 |
| 8. | Composite creativity | 0.80 |
The significant loadings of this factor are on scholastic achievement, intellectual capacity and creativity, along with the components of creativity like fluency, flexibility, originality and elaboration. The variable having the highest factor loading is creativity, which includes fluency, flexibility, originality and elaboration. All these measure divergent thinking. The other variables to be taken into account for the first factor are intellectual capacity and scholastic achievement. Intellectual capacity is loaded on convergent thinking, while scholastic achievement is loaded on both convergent and divergent thinking. Keeping this in mind, this factor may be named as ‘Creative Potential’. This name is given because the variables with high factor loadings refer to creative thinking, with reasonable factor loadings on variables like intellectual capacity and scholastic achievement. The term ‘Creative Potential’ is chosen, as the common ability that runs through all the variables taken into account is the capacity for creative thinking and the capacity for action.
19
Applications of Factor Analysis
CHAPTER OUTLINE
1. Applications of factor analysis to various fields
2. Theory development
3. Test development
4. Vocational psychology
5. Personnel selection and job performance
6. Clinical psychology
7. Experimental psychology
8. Practical examples
9. A factor analytic study of the dimensions of temperament
10. A factor analytic study of socio-economic status, frustration and anxiety
LEARNING OBJECTIVES
After reading this chapter, you will be able to understand:
1. What are the areas of social life, where findings of factor analysis can be applied?
2. What is the future scope of factor analysis?
3. A demonstration of the factor analytic study of the dimensions of temperament.
4. A demonstration of the factor analytic study of socio-economic status, frustration and anxiety.
APPLICATION OF FACTOR ANALYSIS TO VARIOUS FIELDS
Factor analysis is a popular area in psychometry and many of its principles find application in theory development as well as in solving problems concerning common social life. Some of the areas wherein factor analysis finds important applications are discussed here.
Theory Development
Several theories of intelligence have been based on the application of factor analysis. Important among these theories are Spearman’s two-factor theory, Thurstone’s theory of primary mental abilities, Guilford’s structure of intellect model, Cattell’s theory of fluid and crystallised intelligence, and Vernon’s hierarchical theory. Other non-factor theorists, namely, Wesman, Hayes and Ferguson, have also emphasised the importance of learning in the determination of human abilities. Developmental theories of cognition, of the sort proposed by Piaget, stress the concept of stages and organism–environment interactions in the formulation of intellectual abilities.
Test Development
Another important area in which factor analysis has been used is the development of tests. In this aspect, factor analysis helps in deciding the factor structure of the items in a test, and also in giving the name to the concept to be measured by the test. Factor analysis has also been found to be useful in the search for primary interests, attitudes and temperament traits. By 1947, the US Army and Air Force had reported some four years of wartime research on test development in which factor analysis played a key role (Guilford and Lacey 1947). The phenomenal properties of sensory response have been verified by two or three studies.
Applications in Vocational Psychology
An important field directly susceptible to the application of factor analysis is that of vocational psychology. In the area of personnel selection, factor analysis has been used very successfully. In vocational guidance, much emphasis is placed on personality profiles. A personality profile is difficult to interpret unless the nature of the scores that go into it is known. The best way of knowing the nature of the score is to have it factor analysed (Guilford 1954).
Applications in Personnel Selection and Job Performance
In selection of personnel for certain assignments, we obtain the greatest efficiency of selection if the test or test battery we use measures all the significant common factors involved in success
on the job. There will come a time when job assignments will be specified in terms of weighted combinations of the factors. When these combinations are known, we can write the prescriptions for successful tests to use in selecting personnel. When tests prove valid for a certain selection purpose, the explanation is in terms of the common factors that are related both to the tests and to success on the job. A battery of factorial tests achieves great economy of testing effort. There needs to be a maximum of only one test per common factor involved. Many present batteries, composed of many tests, may be wasteful in that they measure over and over again the same factors, limited in number. In the classification of personnel, univocal factor tests definitely come into their own. In selection, we can tolerate two or more common factors per test, provided all are also related to the job criterion and all are weighted to advantage. In classification, we sort persons among jobs, so that differential prediction becomes very important. If we are to say that person A should be assigned to job X in preference to Y or Z, or if we are to say that of two persons A and B, the assignments should be to jobs X and Z, respectively, and not the reverse, we must be able to discriminate—as much as possible—the patterns of abilities of A and B. We are concerned with differences in job patterns and individuals’ patterns of traits rather than in amounts of ability in each factor as such. We cannot establish patterns and differences in relatively unique variables without having separate measures of the factors.
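The idea of specifying jobs as weighted combinations of factors, and of classifying persons by comparing their patterns, can be sketched in a few lines of Python. Everything in this sketch is hypothetical: the factor names, the job weights and the factor scores of persons A and B are invented for illustration, and a simple weighted sum is assumed as the composite.

```python
# Hypothetical weights of three common factors for two jobs.
job_weights = {
    "X": {"verbal": 0.7, "numerical": 0.2, "spatial": 0.1},
    "Z": {"verbal": 0.1, "numerical": 0.3, "spatial": 0.6},
}

# Hypothetical standardised factor scores of two persons.
persons = {
    "A": {"verbal": 1.2, "numerical": 0.1, "spatial": -0.5},
    "B": {"verbal": -0.4, "numerical": 0.3, "spatial": 1.1},
}


def composite(scores, weights):
    """Weighted combination of factor scores for one job."""
    return sum(weights[factor] * scores[factor] for factor in weights)


def best_job(scores):
    """Job whose weighted composite is highest for this person."""
    return max(job_weights, key=lambda job: composite(scores, job_weights[job]))


assignments = {name: best_job(scores) for name, scores in persons.items()}
print(assignments)
```

The comparison depends on the pattern of each person's factor scores relative to the pattern of job weights, which is why separate, univocal measures of the factors are needed for classification.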
Applications in Clinical Psychology
Clinical psychology, generally, would find the concepts developed by factor analysis very useful. Where they have been tried, and this is unfortunately on very rare occasions, they have proved to be understandable, communicable and dependable. There is little more that one could ask of a concept. What is more, the factor concept has referents, a fact which passes the hurdles of semantic and operational standards! As Cattell (1974) has pointed out, the most important step in mastering problems of personality is to have a good list of descriptive terms.
Applications in Experimental Psychology
Experimental psychology has more problems involving unknown variables than it realises. A superficial justification of this remark is the fact that most of its measurements are of the nature of test scores. Studies of learning, memory, motivation and thinking commonly utilise measurements that belong in the category of test scores. These scores are as likely to be factorially complex as those of vocational psychology. Their underlying psychological variables are usually assumed but, actually, they are usually unknown. There is much that one can do with experimental variables in the way of framing laws on that same level. In terms of more fundamental theory and systematic ramifications, however, it is likely that much is missed. Application of factor analysis is further illustrated with the help of some practical examples. For example, factor analysis has been carried out with the help of the Principal Component Method (Hotelling 1933) using a computer facility (Statistical Package for the Social Sciences [SPSS]) with
orthogonal rotation, using 36 variables and a sample size of 300. First, a correlation matrix of order 36 × 36 was computed. For extracting factors, the cut-off point of the Eigen value was kept at 1.0, that is, only those factors were extracted which have an Eigen value of 1.0 or more. Following this procedure, nine factors were extracted from the 36 variables analysed. These component factors were then rotated orthogonally. The unrotated and rotated factor matrices are presented in Tables 19.1 and 19.2, respectively.
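The Eigen value cut-off described above can be sketched as a simple filter. The Python sketch below is illustrative only: the Eigen values are hypothetical numbers for a six-variable problem, not the values from the 36-variable study, and the percentage calculation relies on the fact that the Eigen values of a correlation matrix sum to the number of variables.

```python
def retain_by_eigenvalue(eigenvalues, cutoff=1.0):
    """Keep the factors whose Eigen value is at or above the cut-off."""
    return [value for value in eigenvalues if value >= cutoff]


# Hypothetical Eigen values of a 6 x 6 correlation matrix (they sum to 6).
eigenvalues = [2.4, 1.5, 1.0, 0.6, 0.3, 0.2]
n_variables = len(eigenvalues)

retained = retain_by_eigenvalue(eigenvalues)

# Share of the total variance carried by each retained factor.
percent_of_total = [round(100 * value / n_variables, 2) for value in retained]

print("factors retained:", len(retained))
print("per cent of total variance:", percent_of_total)
```

A factor with an Eigen value below 1.0 accounts for less variance than a single standardised variable, which is the usual rationale given for this cut-off.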
Table 19.1 Unrotated Factor Matrix (Eigen Value 1.0)

| S. No. | Name of the Variable | F-I | F-II | F-III | F-IV | F-V | F-VI | F-VII | F-VIII | F-IX |
|---|---|---|---|---|---|---|---|---|---|---|
| 1. | Raven’s Score | 0.392 | –0.277 | 0.018 | 0.017 | 0.218 | –0.045 | 0.063 | –0.312 | 0.186 |
| 2. | Achievement Motivation | 0.129 | –0.110 | 0.284 | –0.034 | –0.090 | –0.200 | –0.637 | 0.079 | 0.226 |
| 3. | Factor A | 0.061 | 0.079 | 0.446 | 0.020 | 0.299 | –0.092 | 0.209 | 0.040 | –0.098 |
| 4. | Factor B | 0.172 | –0.077 | 0.266 | 0.126 | 0.135 | –0.087 | –0.302 | –0.570 | 0.080 |
| 5. | Factor C | 0.119 | –0.120 | 0.518 | –0.115 | 0.015 | 0.412 | 0.077 | 0.133 | –0.045 |
| 6. | Factor D | –0.082 | 0.100 | –0.593 | 0.066 | 0.174 | –0.364 | –0.053 | 0.020 | 0.041 |
| 7. | Factor E | 0.145 | –0.225 | –0.101 | 0.009 | 0.078 | 0.138 | –0.269 | 0.536 | 0.458 |
| 8. | Factor F | 0.056 | 0.070 | –0.131 | –0.166 | 0.506 | –0.066 | 0.071 | 0.072 | 0.141 |
| 9. | Factor G | 0.238 | 0.003 | 0.469 | 0.074 | –0.269 | 0.126 | 0.256 | 0.023 | 0.332 |
| 10. | Factor H | 0.127 | –0.196 | 0.392 | –0.128 | 0.269 | 0.126 | 0.256 | 0.023 | 0.332 |
| 11. | Factor I | –0.057 | 0.086 | –0.142 | 0.145 | –0.145 | –0.555 | 0.225 | –0.015 | 0.245 |
| 12. | Factor J | –0.041 | –0.036 | –0.088 | –0.146 | –0.371 | –0.141 | 0.270 | 0.303 | –0.208 |
| 13. | Factor O | –0.052 | 0.066 | –0.548 | 0.074 | –0.303 | –0.116 | –0.153 | 0.107 | 0.095 |
| 14. | Factor Q2 | 0.219 | –0.101 | 0.214 | 0.249 | –0.557 | 0.133 | –0.112 | –0.157 | –0.102 |
| 15. | Factor Q3 | 0.076 | –0.114 | 0.333 | 0.025 | –0.454 | 0.0006 | –0.087 | –0.169 | 0.071 |
| 16. | Factor Q4 | –0.110 | –0.051 | –0.441 | 0.213 | 0.149 | –0.199 | –0.020 | –0.190 | 0.324 |
| 17. | Emotional Adjustment | 0.106 | –0.109 | 0.605 | 0.046 | 0.0033 | 0.414 | 0.119 | –0.082 | 0.166 |
| 18. | Social Adjustment | –0.059 | –0.049 | –0.463 | 0.134 | –0.302 | 0.365 | –0.069 | –0.127 | 0.157 |
| 19. | Educational Adjustment | –0.136 | –0.063 | –0.605 | 0.048 | 0.122 | 0.385 | 0.007 | –0.254 | 0.061 |
| 20. | Verbal Fluency | 0.676 | 0.472 | –0.076 | –0.297 | 0.047 | 0.019 | –0.100 | –0.046 | 0.074 |
| 21. | Verbal Flexibility | 0.634 | 0.505 | 0.077 | 0.362 | 0.027 | –0.018 | –0.100 | –0.035 | 0.048 |
| 22. | Verbal Originality | 0.601 | 0.397 | 0.099 | –0.363 | 0.088 | 0.003 | 0.066 | 0.045 | 0.001 |
| 23. | Verbal Elaboration | 0.495 | 0.256 | 0.134 | –0.299 | –0.227 | 0.051 | 0.165 | –0.171 | –0.087 |
| 24. | Verbal Creativity | 0.740 | 0.498 | –0.117 | –0.405 | –0.063 | 0.026 | 0.005 | –0.063 | 0.003 |
| 25. | Figural Fluency | 0.571 | 0.439 | 0.067 | 0.515 | 0.143 | 0.128 | –0.097 | 0.174 | –0.187 |
| 26. | Figural Flexibility | 0.528 | 0.414 | 0.074 | 0.569 | 0.134 | 0.114 | –0.112 | 0.166 | –0.164 |
| 27. | Figural Originality | 0.657 | 0.180 | –0.026 | 0.299 | –0.013 | –0.087 | 0.110 | –0.015 | 0.086 |
| 28. | Figural Elaboration | 0.309 | 0.177 | –0.090 | 0.292 | 0.255 | –0.150 | 0.402 | –0.002 | 0.398 |
| 29. | Figural Creativity | 0.506 | 0.274 | 0.029 | 0.559 | 0.088 | 0.054 | 0.223 | 0.070 | 0.096 |
| 30. | Composite Creativity | 0.824 | 0.426 | –0.065 | –0.009 | –0.072 | 0.010 | –0.017 | 0.059 | 0.024 |
| 31. | Sch. Ach. English | 0.477 | –0.659 | –0.180 | –0.132 | –0.052 | –0.029 | 0.027 | 0.096 | –0.151 |
| 32. | Sch. Ach. Hindi | 0.632 | –0.530 | –0.045 | 0.083 | 0.089 | –0.049 | –0.071 | –0.001 | –0.039 |
| 33. | Sch. Ach. Maths | 0.629 | –0.629 | –0.048 | –0.041 | 0.039 | 0.001 | –0.026 | 0.038 | –0.122 |
| 34. | Sch. Ach. Soc. Studies | 0.695 | –0.483 | –0.116 | 0.107 | 0.047 | –0.012 | –0.068 | –0.006 | –0.020 |
| 35. | Sch. Ach. Science | 0.544 | –0.622 | –0.094 | –0.134 | 0.040 | –0.077 | 0.070 | 0.026 | –0.100 |
| 36. | Total Scholastic Ach. | 0.692 | –0.647 | –0.128 | –0.037 | 0.011 | –0.051 | 0.001 | 0.018 | –0.109 |

Source: Author.
Table 19.2 Rotated Factor Matrix (Eigen Value 1.0)

| S. No. | Name of the Variable | F-I | F-II | F-III | F-IV | F-V | F-VI | F-VII | F-VIII | F-IX | Communality |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1. | Raven’s Score | 0.100 | –0.424 | –0.033 | 0.029 | 0.102 | –0.201 | 0.065 | –0.391 | 0.131 | 0.416 |
| 2. | Achievement Motivation | 0.024 | –0.089 | 0.300 | –0.075 | –0.258 | 0.116 | –0.573 | –0.317 | –0.099 | 0.622 |
| 3. | Factor A | 0.040 | –0.036 | 0.422 | 0.151 | 0.161 | –0.009 | 0.154 | –0.055 | 0.324 | 0.362 |
| 4. | Factor B | –0.103 | 0.011 | 0.129 | 0.026 | –0.171 | 0.024 | 0.080 | –0.711 | 0.007 | 0.570 |
| 5. | Factor C | –0.142 | –0.051 | 0.098 | 0.020 | –0.158 | 0.229 | –0.054 | 0.051 | 0.647 | 0.535 |
| 6. | Factor D | 0.010 | 0.011 | –0.116 | –0.099 | 0.351 | –0.113 | –0.076 | –0.015 | –0.636 | 0.561 |
| 7. | Factor E | –0.206 | –0.051 | –0.155 | 0.071 | 0.151 | –0.004 | –0.743 | 0.127 | 0.105 | 0.677 |
| 8. | Factor F | –0.203 | 0.120 | 0.008 | –0.036 | 0.534 | –0.035 | 0.019 | –0.197 | 0.031 | 0.343 |
| 9. | Factor G | –0.030 | 0.116 | 0.210 | 0.066 | –0.370 | –0.228 | 0.050 | –0.236 | 0.302 | 0.401 |
| 10. | Factor H | –0.132 | –0.025 | 0.119 | –0.058 | 0.188 | –0.196 | –0.091 | –0.140 | 0.593 | 0.489 |
| 11. | Factor I | 0.092 | –0.023 | 0.164 | –0.039 | 0.026 | –0.590 | 0.018 | 0.054 | –0.320 | 0.492 |
| 12. | Factor J | –0.049 | 0.047 | 0.086 | –0.137 | –0.157 | 0.099 | 0.097 | 0.561 | –0.085 | 0.397 |
| 13. | Factor O | 0.032 | 0.066 | –0.321 | –0.034 | –0.081 | –0.126 | –0.182 | 0.192 | –0.510 | 0.462 |
| 14. | Factor Q2 | –0.131 | 0.018 | –0.049 | 0.157 | –0.701 | –0.006 | 0.040 | –0.039 | 0.045 | 0.541 |
| 15. | Factor Q3 | –0.032 | 0.016 | 0.090 | –0.101 | –0.562 | –0.095 | –0.038 | –0.107 | 0.142 | 0.378 |
| 16. | Factor Q4 | –0.101 | –0.176 | –0.115 | 0.077 | 0.176 | 0.069 | 0.317 | –0.043 | –0.509 | 0.458 |
| 17. | Emotional Adjustment | –0.047 | –0.058 | –0.743 | –0.033 | 0.204 | –0.025 | 0.034 | 0.066 | –0.071 | 0.612 |
| 18. | Social Adjustment | –0.009 | 0.016 | –0.666 | –0.001 | –0.192 | –0.024 | –0.056 | 0.026 | –0.151 | 0.508 |
| 19. | Educational Adjustment | –0.055 | –0.013 | –0.710 | –0.043 | 0.221 | 0.096 | 0.136 | –0.105 | –0.164 | 0.623 |
| 20. | Verbal Fluency | 0.847 | –0.065 | –0.040 | 0.203 | 0.082 | 0.032 | –0.087 | –0.116 | –0.000 | 0.798 |
| 21. | Verbal Flexibility | 0.871 | –0.023 | 0.077 | 0.143 | 0.087 | 0.047 | –0.076 | –0.081 | –0.026 | 0.808 |
| 22. | Verbal Originality | 0.800 | –0.090 | 0.038 | 0.100 | 0.032 | –0.022 | –0.006 | 0.108 | 0.025 | 0.674 |
| 23. | Verbal Elaboration | 0.664 | –0.129 | –0.087 | 0.007 | –0.126 | –0.056 | 0.217 | 0.069 | 0.010 | 0.536 |
| 24. | Verbal Creativity | 0.975 | –0.098 | 0.019 | 0.145 | 0.029 | 0.014 | 0.016 | –0.010 | 0.007 | 0.982 |
| 25. | Figural Fluency | 0.281 | –0.026 | 0.094 | 0.891 | –0.031 | 0.118 | –0.029 | –0.034 | –0.028 | 0.899 |
| 26. | Figural Flexibility | 0.215 | –0.012 | 0.089 | 0.898 | 0.051 | 0.094 | –0.044 | –0.055 | –0.044 | 0.878 |
| 27. | Figural Originality | 0.370 | –0.250 | 0.041 | 0.537 | –0.065 | –0.283 | 0.006 | –0.091 | –0.020 | 0.582 |
| 28. | Figural Elaboration | 0.009 | –0.249 | –0.145 | 0.171 | –0.146 | –0.690 | –0.024 | 0.071 | 0.071 | 0.621 |
| 29. | Figural Creativity | 0.170 | –0.068 | –0.036 | 0.753 | –0.009 | –0.362 | 0.007 | –0.086 | 0.084 | 0.714 |
| 30. | Composite Creativity | 0.772 | –0.186 | 0.029 | 0.478 | –0.059 | –0.052 | –0.075 | –0.004 | –0.031 | 0.877 |
| 31. | Sch. Ach. English | 0.027 | –0.851 | –0.055 | –0.106 | –0.045 | 0.018 | –0.027 | 0.132 | –0.005 | 0.760 |
| 32. | Sch. Ach. Hindi | 0.069 | –0.804 | 0.025 | 0.160 | –0.036 | –0.038 | –0.088 | 0.132 | 0.030 | 0.706 |
| 33. | Sch. Ach. Maths | 0.077 | –0.889 | 0.013 | 0.051 | –0.058 | 0.030 | –0.048 | –0.025 | 0.084 | 0.814 |
| 34. | Sch. Ach. Soc. Studies | 0.141 | –0.808 | –0.051 | 0.209 | –0.056 | –0.057 | –0.093 | –0.118 | 0.003 | 0.748 |
| 35. | Sch. Ach. Science | 0.081 | –0.845 | 0.026 | –0.072 | 0.010 | –0.045 | 0.007 | 0.027 | 0.059 | 0.732 |
| 36. | Total Scholastic Ach. | 0.121 | –0.952 | –0.014 | 0.051 | –0.056 | –0.036 | –0.031 | –0.017 | 0.019 | 0.930 |

Source: Author.
The proportions of the variance contributed by these factors for the unrotated and rotated factor matrices are presented in Table 19.3.

Table 19.3 Percentage of Variance Contributed by the Unrotated and Rotated Factors

| Factor | Unrotated | Rotated |
|---|---|---|
| 1. | 30.14 | 20.17 |
| 2. | 17.78 | 22.05 |
| 3. | 13.97 | 9.40 |
| 4. | 8.88 | 13.70 |
| 5. | 7.53 | 8.15 |
| 6. | 5.87 | 5.80 |
| 7. | 5.70 | 5.33 |
| 8. | 5.03 | 6.19 |
| 9. | 5.12 | 9.53 |
Source: Author.
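A quick check on Table 19.3 is to accumulate the percentages. The Python sketch below uses the figures exactly as printed in the table. Both columns come to roughly 100, which suggests, as a reading of the table rather than something stated in the text, that the percentages are shares of the variance extracted by the nine factors, and that rotation redistributes this variance among the factors without changing its total.

```python
# Percentages of variance as printed in Table 19.3.
unrotated = [30.14, 17.78, 13.97, 8.88, 7.53, 5.87, 5.70, 5.03, 5.12]
rotated = [20.17, 22.05, 9.40, 13.70, 8.15, 5.80, 5.33, 6.19, 9.53]


def cumulative(percentages):
    """Running total of the variance accounted for, factor by factor."""
    total = 0.0
    running = []
    for value in percentages:
        total += value
        running.append(round(total, 2))
    return running


cum_unrotated = cumulative(unrotated)
cum_rotated = cumulative(rotated)

print("unrotated:", cum_unrotated)
print("rotated:  ", cum_rotated)
```

The small departures of the final totals from 100 are presumably due to rounding in the printed table.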
The discussion of results is based on the orthogonally rotated factor matrix. It was thought appropriate to avoid interpreting the correlation matrix and the unrotated factor matrix, for the simple reason that both were only steps in the process of obtaining the rotated factor matrix in accordance with the criterion of Kaiser’s varimax solution. For the purpose of discussing the results of the factor matrix, factor loadings of 0.30 or above are considered to be significant. The reason for omitting the interpretation of the original factor matrix is that the rotation of the factors is done to get new factors, which can be interpreted comfortably, economically and more meaningfully. The first factor contributed a variance of 20.17 per cent of the total variance and has significant loadings on verbal fluency, verbal flexibility, verbal originality, verbal elaboration, verbal creativity, figural originality and composite creativity. The factor loadings are shown in Table 19.4.

Table 19.4 Loadings on Varimax Factor I

| Name of the Variable | Loading |
|---|---|
| Verbal Fluency | 0.847 |
| Verbal Flexibility | 0.871 |
| Verbal Originality | 0.800 |
| Verbal Elaboration | 0.664 |
| Verbal Creativity | 0.975 |
| Figural Originality | 0.370 |
| Composite Creativity | 0.772 |
Source: Author.
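The 0.30 criterion can be applied mechanically to a column of the rotated matrix. The Python sketch below uses the Factor I loadings of the eleven creativity variables as read from Table 19.2; the function name `salient` is hypothetical, and the absolute value is used so that negative loadings of the same size would also count as significant.

```python
THRESHOLD = 0.30  # loadings of 0.30 or above are treated as significant

# Loadings on rotated Factor I for the creativity variables (Table 19.2).
factor_1 = {
    "Verbal Fluency": 0.847,
    "Verbal Flexibility": 0.871,
    "Verbal Originality": 0.800,
    "Verbal Elaboration": 0.664,
    "Verbal Creativity": 0.975,
    "Figural Fluency": 0.281,
    "Figural Flexibility": 0.215,
    "Figural Originality": 0.370,
    "Figural Elaboration": 0.009,
    "Figural Creativity": 0.170,
    "Composite Creativity": 0.772,
}


def salient(loadings, threshold=THRESHOLD):
    """Variables with |loading| >= threshold, largest magnitude first."""
    kept = [(name, value) for name, value in loadings.items()
            if abs(value) >= threshold]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


significant = salient(factor_1)
for name, value in significant:
    print(f"{name:22s} {value:6.3f}")
```

The seven variables that pass the filter are the ones listed in Table 19.4, which is the pattern used later to name the factor.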
The second factor contributed a variance of 22.05 per cent of the total variance and has significant loadings on intelligence and scholastic achievement (in all subjects). The factor loadings of these variables are presented in Table 19.5.
Table 19.5 Loadings on Varimax Factor II

| Name of Variable | Loading |
|---|---|
| Raven’s score | –0.424 |
| Scholastic Achievement in Social Studies | –0.808 |
| Scholastic Achievement in Hindi | –0.804 |
| Scholastic Achievement in Science | –0.845 |
| Scholastic Achievement in English | –0.851 |
| Scholastic Achievement in Mathematics | –0.889 |
| Total Scholastic Achievement | –0.952 |
Source: Author.
The third factor contributed a variance of 9.40 per cent of the total variance and has significant factor loadings on Factor A (Reserved versus Outgoing), achievement motivation, Factor O (Self-assured versus Apprehensive), social adjustment, educational adjustment and emotional adjustment. The factor loadings of these variables are displayed in Table 19.6.

Table 19.6 Loadings on Varimax Factor III

| Name of Variable | Loading |
|---|---|
| Reserved versus outgoing | 0.422 |
| Achievement motivation | 0.300 |
| Self-assured versus apprehensive | –0.321 |
| Social adjustment | –0.666 |
| Educational adjustment | –0.710 |
| Emotional adjustment | –0.743 |
Source: Author.
The fourth factor contributed a variance of 13.70 per cent and has significant loadings on figural flexibility, figural fluency, figural creativity, figural originality and composite creativity. The factor loadings of these variables are presented in Table 19.7.

Table 19.7 Loadings on Varimax Factor IV

| Name of Variable | Loading |
|---|---|
| Figural flexibility | 0.898 |
| Figural fluency | 0.891 |
| Figural creativity | 0.753 |
| Figural originality | 0.537 |
| Composite creativity | 0.478 |
Source: Author.
The fifth factor contributed a variance of 8.15 per cent of the total variance and has got a significant loading on Factor F (Sober versus Happy-go-lucky), Factor D (Phlegmatic versus Excitable), Factor G
(Expedient versus Conscientious), Factor Q3 (Undisciplined Self-conflict versus Controlled) and Factor Q2 (Group Dependent versus Self-sufficient). The factor loadings of these variables are presented in Table 19.8.

Table 19.8 Loadings on Varimax Factor V

| Name of Variable | Loading |
|---|---|
| Sober versus Happy-go-lucky | 0.534 |
| Phlegmatic versus Excitable | 0.351 |
| Expedient versus Conscientious | –0.370 |
| Undisciplined Self-conflict versus Controlled | –0.562 |
| Group Dependent versus Self-sufficient | –0.701 |
Source: Author.
The sixth factor contributed a variance of 5.80 per cent of the total variance and has significant factor loadings on figural creativity, Factor I (Tough Minded versus Tender Minded) and figural elaboration. The factor loadings of these variables are presented in Table 19.9.

Table 19.9 Loadings on Varimax Factor VI

| Name of Variable | Loading |
|---|---|
| Figural Creativity | –0.302 |
| Tough Minded versus Tender Minded | –0.590 |
| Figural Elaboration | –0.690 |
Source: Author.
The seventh factor contributed a variance of 5.33 per cent of the total variance and has significant factor loadings on Factor Q4 (Relaxed versus Tense), achievement motivation and Factor E (Obedient versus Assertive). The factor loadings of these variables are presented in Table 19.10.

Table 19.10 Loadings on Varimax Factor VII

| Name of Variable | Loading |
|---|---|
| Relaxed versus Tense | 0.317 |
| Achievement Motivation | –0.573 |
| Obedient versus Assertive | –0.743 |
Source: Author.
The eighth factor contributed a variance of 6.19 per cent of the total variance and has got significant loadings of Factor J (Vigorous versus Doubting), achievement motivation, intelligence and Factor B (Less Intelligent versus More Intelligent). The factor loadings of these variables are presented in Table 19.11.
Table 19.11 Loadings on Varimax Factor VIII

| Name of Variable | Loading |
|---|---|
| Vigorous versus Doubting | 0.561 |
| Achievement Motivation | –0.317 |
| Raven’s Score | –0.391 |
| Less Intelligent versus More Intelligent | –0.478 |
Source: Author.
The ninth factor contributed a variance of 9.53 per cent of the total variance and has significant factor loadings on Factor C (Affected by feeling versus Emotionally stable), Factor H (Shy versus Venturesome), Factor A (Reserved versus Outgoing), Factor G (Expedient versus Conscientious), Factor I (Tough Minded versus Tender Minded), Factor Q4 (Relaxed versus Tense), Factor O (Self-assured versus Apprehensive) and Factor D (Phlegmatic versus Excitable). The factor loadings of these variables are presented in Table 19.12.

Table 19.12 Loadings on Varimax Factor IX

| Name of Variable | Loading |
|---|---|
| Affected by feeling versus Emotionally stable | 0.647 |
| Shy versus Venturesome | 0.593 |
| Reserved versus Outgoing | 0.324 |
| Expedient versus Conscientious | 0.302 |
| Tough Minded versus Tender Minded | –0.320 |
| Relaxed versus Tense | –0.509 |
| Self-assured versus Apprehensive | –0.510 |
| Phlegmatic versus Excitable | –0.636 |
Source: Author.
To sum up, the clear picture of the nine factors with the variables contained in each are presented in Table 19.13. It was observed that Factor I is characterised by significant loadings on seven variables namely, verbal fluency, verbal flexibility, verbal originality, verbal elaboration, verbal creativity, figural originality and composite creativity. All these variables are components of creativity and mainly represent verbal part of creativity. In view of this fact, this factor may be named as ‘Verbal Creativity’. Factor II was found to contain significant loading on intelligence, scholastic achievement in all subjects and total scholastic achievement. All these variables explain the convergent thinking nature of this factor and, hence, this factor may be named as ‘General Scholastic Achievement’ factor. The significant loadings of Factor III are due to Factor A, achievement motivation Factor O, and three factors of adjustment, namely, emotional, social and educational adjustment. The highest loading is that of Factor A (Reserved versus Outgoing). The positive loading is that on Factor III, which indicates that these persons are good-natured, easy, ready to cooperate, attentive to people, soft-hearted, kind, trustful and adaptable. Such a person likes occupations dealing with people and
Table 19.13
F-I: Verbal Fluency; Verbal Flexibility; Verbal Originality; Verbal Elaboration; Verbal Creativity; Figural Originality; Composite Creativity
F-II: Intelligence; Scholastic Achievement: Hindi; Scholastic Achievement: Science; Scholastic Achievement: English; Scholastic Achievement: Maths; Scholastic Achievement: Social Studies; Total Scholastic Achievement
F-III: Reserved versus Outgoing; Achievement Motivation; Self-assured versus Apprehensive; Emotional Adjustment; Social Adjustment; Educational Adjustment
F-IV: Figural Flexibility; Figural Fluency; Figural Creativity; Figural Originality; Composite Creativity
F-V: Sober versus Happy-go-lucky; Phlegmatic versus Excitable; Expedient versus Conscientious; Undisciplined Self-conflict versus Controlled; Group Dependent versus Self-sufficient
F-VI: Figural Creativity; Tough Minded versus Tender Minded; Figural Elaboration
F-VII: Relaxed versus Tense; Obedient versus Assertive; Achievement Motivation
F-VIII: Vigorous versus Doubting; Less Intelligence versus More Intelligence; Achievement Motivation; Intelligence
F-IX: Affected by Feelings versus Emotionally Stable; Shy versus Venturesome; Reserved versus Outgoing; Expedient versus Conscientious; Tough Minded versus Tender Minded; Relaxed versus Tense; Self-assured versus Apprehensive; Phlegmatic versus Excitable
Source: Author.
socially impressive situations. He is generous in personal relations, less afraid of criticism and better able to remember names of people, but is often less dependable in precision work and in obligations. The second highest loading is on achievement motivation, which shows that the person is motivated. There is a significant negative loading on Factor O, which indicates that the person is placid, calm, mature, unanxious and confident, with unshakable nerve and the capacity to deal with things. In addition, there are negative loadings on emotional adjustment, social adjustment and educational adjustment. This reveals the well-adjusted nature of the person in the emotional, social and educational spheres. In view of the above, this factor may be named the ‘Motivational Adjustment’ factor.
Factor IV contains figural flexibility, figural fluency, figural creativity, figural originality and composite creativity. All these variables are components of creativity. In view of this, the factor may be named ‘Figural Creativity’.
Factor V is very complex and consists of five factors of personality: F (Sober versus Happy-go-lucky), D (Phlegmatic versus Excitable), G (Expedient versus Conscientious), Q3 (Undisciplined Self-conflict versus Controlled) and Q2 (Group Dependent versus Self-sufficient). The highest loading is that of Factor F. The positive loading on F indicates a heedless, gay and enthusiastic personality. The significant positive loading on Factor D indicates an excitable, impatient, demanding and overactive personality. The negative loading on Factor G indicates weaker superego strength. The negative loading on Factor Q3 indicates a relaxed rather than tense nature. The other negative loading, on Factor Q2, indicates a group-dependent person and a sound follower. In the light of the above discussion, this factor may be named ‘Super-ego and Self-assured’.
Factor VI contains significant loadings on figural creativity, I (Tough Minded versus Tender Minded) and figural elaboration. The negative loading on I indicates a tough-minded, self-reliant and realistic personality. This factor may be named ‘Figural and Realistic Creativity’.
Factor VII consists of two factors of personality and, as a third variable, achievement motivation. The personality factors are Q4 (Relaxed versus Tense) and E (Obedient versus Assertive). The positive loading on Factor Q4 indicates a tense, excitable, restless, impatient and over-fatigued personality, unable to remain inactive. The negative loading on Factor E indicates a dependent personality: a follower who goes along with the group, tends to lean on others in making decisions, and is soft-hearted, expressive and easily upset. The negative loading on achievement motivation shows a less motivated personality. In the light of the above, this factor may be named ‘Restless Motivation and Dependency’.
Factor VIII consists of four variables. The positive loading on Factor J (Vigorous versus Doubting) indicates that an individual with high scores on J prefers to think in his own way. He is often involved in his own ego, is self-opinionated and is interested in internal mental life. He is usually deliberate in his actions, unconcerned about other people, and a poor team maker. The negative loading on B indicates a person who is slow to learn and grasp, dull and sluggish. He has little taste or capacity for the higher forms of knowledge and tends to be somewhat bookish. The negative loadings on achievement motivation and intelligence show a person to be less motivated
for achieving goals intelligently. In the light of the above, this factor may be named ‘Individualistic Motivation and Intelligence’.
Factor IX is a very complex factor and contains eight factors of personality. The positive loadings were on Factor C (Affected by Feeling versus Emotionally Stable), Factor H (Shy versus Venturesome), Factor A (Reserved versus Outgoing) and Factor G (Expedient versus Conscientious). The negative loadings were on Factor I (Tough Minded versus Tender Minded), Factor Q4 (Relaxed versus Tense), Factor O (Self-assured versus Apprehensive) and Factor D (Phlegmatic versus Excitable). The highest positive loading, on Factor C, indicates a person who is emotionally mature, stable, phlegmatic, realistic about life, possessing ego strength, having an integrated philosophy of life and one who maintains high group morale. The positive loading on Factor H shows a person who is sociable, participative, ready to try new things, spontaneous, able to face wear and tear in dealing with people, and inclined to ignore danger signals. The positive loadings on A and G indicate a person who is ready to cooperate, soft-hearted, kind, attentive to people, responsible, determined, energetic, well-organised and having high regard for moral standards. The negative loading on I indicates a person who is practical, realistic, responsible, phlegmatic and hard. The negative loading on Q4 indicates a person who is relaxed, composed and satisfied. The negative loadings on O and D indicate a person who is calm, mature, confident and deliberate. In the light of the above discussion, this factor may be named ‘Emotionally Stable, Adventurous and Phlegmatic Self-assured’.
To sum up, the results of the factor analysis are presented in Table 19.14.
Table 19.14 Naming of Factors at a Glance
Factor I       Verbal Creativity
Factor II      General Scholastic Achievement
Factor III     Motivational Adjustment
Factor IV      Figural Creativity
Factor V       Super-ego and Self-assured
Factor VI      Figural and Realistic Creativity
Factor VII     Restless Motivation and Dependency
Factor VIII    Individualistic Motivation and Intelligence
Factor IX      Emotionally Stable, Adventurous and Phlegmatic Self-assured
Source: Author.
A FACTOR ANALYTIC STUDY OF THE DIMENSIONS OF TEMPERAMENT
Allport (1937) defined personality as the dynamic organisation, within an individual, of those psychological systems which determine his unique adjustment to his environment. Temperament, one of the most important dimensions of personality, is a composite of several individual traits. Allport (ibid.: 34) defined temperament as:
Temperament refers to the characteristic phenomena of an individual’s nature, including his susceptibility to emotional situation, his customary strength and speed of response, the quality of his prevailing mood and all the peculiarities of fluctuation and intensity of mood; these being phenomena regarded as dependent on constitutional makeup and, therefore, largely hereditary in origin.
Hilgard and Atkinson (1952) defined temperament as the aspect of personality revealed in the tendency to experience mood changes in characteristic ways. To describe a person’s temperament is to describe such qualities abstracted from his behaviour as dullness or alertness, gentleness, sympathy, emotionality, and so on. Temperament is also used to denote the strength, vividness and other qualities attached to the senses, to basic drives like hunger and sex, to the activity level and to emotional reactivity, with the implication that these things are only partially inborn. Temperament is concerned more with ‘style’ than with ‘content’. Temperaments are broad dispositions that are expected to differentiate one person from another.
The main objective of this study was to find the possible determinant factors of ‘temperament’ and give them a psychological interpretation. The subjects of this study were students of Grades XI and XII, of both sexes. They were selected randomly from the arts, commerce and science streams of two schools in Delhi. The test administered to a sample of 100 students was the ‘Dimensions of Temperament Scale’ authored by Chadha and Chandra (1984). This tool measures 15 dimensions of temperament, namely, Sociability, Ascendance, Secretiveness, Reflective, Impulsivity, Placid, Accepting, Responsible, Vigorous, Cooperative, Persistence, Warmth, Aggressiveness, Tolerance and Tough-minded. The factor analysis was carried out by following the Principal Component Solution (Hotelling 1933) with a varimax rotation (Kaiser 1958).
Before the factor analysis was carried out, the Product Moment coefficients of correlation for all the 15 dimensions were computed and displayed in a 15 × 15 matrix. Following the procedure of factor analysis, seven factors were extracted from the 15 dimensions analysed. To decide how many factors to retain, an eigenvalue of 1.0 was taken as the cutting point. These component factors were then rotated to a varimax solution. The proportions of variance contributed by these factors in the unrotated and rotated factor matrices are presented in Table 19.15.
Table 19.15 Percentage of Variance Accounted for by Each Factor
Factors      I       II      III     IV      V       VI      VII
Unrotated    23.49   16.95   14.70   13.89   11.03   10.82   9.09
Rotated      17.96   17.05   15.53   13.70   12.08   11.87   11.72
Source: Author.
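A short arithmetic check can make the effect of rotation in Table 19.15 concrete. The sketch below only re-adds the published percentages; the variable names are illustrative. Because varimax is an orthogonal rotation, it should redistribute variance among the retained factors without changing their combined share. Both rows add up to roughly 100, which suggests (as a reading of the table, not a statement by the author) that the percentages are expressed relative to the variance captured by the seven retained factors.

```python
# Percentages of variance copied from Table 19.15.
unrotated = [23.49, 16.95, 14.70, 13.89, 11.03, 10.82, 9.09]
rotated = [17.96, 17.05, 15.53, 13.70, 12.08, 11.87, 11.72]

# An orthogonal rotation should leave the combined share unchanged
# (small differences here come from rounding in the published table).
total_unrotated = sum(unrotated)
total_rotated = sum(rotated)

# Gap between the largest and the smallest factor, before and after rotation.
spread_unrotated = max(unrotated) - min(unrotated)
spread_rotated = max(rotated) - min(rotated)

print(f"unrotated total: {total_unrotated:.2f}")
print(f"rotated total:   {total_rotated:.2f}")
print(f"spread before rotation: {spread_unrotated:.2f}")
print(f"spread after rotation:  {spread_rotated:.2f}")
```

The totals agree to within rounding, while the gap between the strongest and weakest factor shrinks after rotation, which is the usual pattern when varimax spreads variance more evenly across factors.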
The discussion of results is based on the varimax rotated factor matrix. The correlation matrix and the unrotated factor matrix are not interpreted separately, because both are intermediate steps in arriving at Kaiser’s varimax solution.
For the purpose of discussing the results of the factor matrix, factor loadings of 0.50 or more were considered to be significant. The original factor matrix is not interpreted because the factors are rotated precisely in order to obtain new factors that can be interpreted more comfortably, economically and meaningfully. The first factor contributed 17.96 per cent of the total variance and has significant loadings on Vigorous, Reflective, Responsible and Ascendance. The factor loadings of these variables are presented in Table 19.16.
Table 19.16 Loadings of Varimax Factor I
S. No.   Variables      Loadings
1.       Vigorous       0.68
2.       Reflective     0.67
3.       Responsible    0.66
4.       Ascendance     0.60
Source: Author.
The second factor contributed 17.05 per cent of the total variance and has significant loadings on Persistence and Sociability. These are shown in Table 19.17.
Table 19.17 Loadings of Varimax Factor II
S. No.   Variables      Loadings
1.       Persistence    0.77
2.       Sociability    0.74
Source: Author.
The third factor contributed 15.53 per cent of the total variance and has significant loadings on Placid and Tough-minded. These loadings are presented in Table 19.18.
Table 19.18 Loadings of Varimax Factor III
S. No.   Variables       Loadings
1.       Placid          0.86
2.       Tough-minded    0.65
Source: Author.
The fourth factor contributed 13.70 per cent of the total variance and has significant loadings on Tolerance and Aggressiveness. The factor loadings of these variables are shown in Table 19.19.
Table 19.19 Loadings of Varimax Factor IV
S. No.   Variables         Loadings
1.       Tolerance         0.72
2.       Aggressiveness    0.64
Source: Author.
The fifth factor contributed 12.08 per cent of the total variance and has significant loadings on Cooperative and Impulsivity. The factor loadings of these variables are presented in Table 19.20.
Table 19.20 Loadings of Varimax Factor V
S. No.   Variables      Loadings
1.       Cooperative    0.76
2.       Impulsivity    0.63
Source: Author.
The sixth factor contributed 11.87 per cent of the total variance and has a significant loading on Accepting. The factor loading is presented in Table 19.21.
Table 19.21 Loadings of Varimax Factor VI
S. No.   Variable     Loading
1.       Accepting    0.62
Source: Author.
The seventh factor contributed a variance of 11.72 per cent of the total variance and has significant loadings on Secretiveness and Warmth. The factor loadings of these variables are presented in Table 19.22.
Table 19.22 Loadings of Varimax Factor VII
S. No.   Variables        Loadings
1.       Secretiveness    –0.86
2.       Warmth           –0.50
Source: Author.
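The seven loading tables can be gathered into one structure to see how the 0.50 cut-off assigns dimensions to factors. The following is a minimal sketch: the numbers are copied from Tables 19.16 to 19.22, while the names `reported`, `salient` and `CUTOFF` are illustrative and not part of the original study. The cut-off is applied to the absolute value of a loading, since Factor VII is defined entirely by negative loadings.

```python
CUTOFF = 0.50  # salience threshold stated in the text

# Loadings copied from Tables 19.16 to 19.22.
reported = {
    "I": {"Vigorous": 0.68, "Reflective": 0.67, "Responsible": 0.66, "Ascendance": 0.60},
    "II": {"Persistence": 0.77, "Sociability": 0.74},
    "III": {"Placid": 0.86, "Tough-minded": 0.65},
    "IV": {"Tolerance": 0.72, "Aggressiveness": 0.64},
    "V": {"Cooperative": 0.76, "Impulsivity": 0.63},
    "VI": {"Accepting": 0.62},
    "VII": {"Secretiveness": -0.86, "Warmth": -0.50},
}


def salient(loadings, cutoff=CUTOFF):
    """Keep variables whose absolute loading reaches the cut-off, strongest first."""
    kept = {name: value for name, value in loadings.items() if abs(value) >= cutoff}
    return dict(sorted(kept.items(), key=lambda item: abs(item[1]), reverse=True))


summary = {factor: salient(loadings) for factor, loadings in reported.items()}
assigned = [name for variables in summary.values() for name in variables]

for factor, variables in summary.items():
    print(f"Factor {factor}: {variables}")
print("dimensions assigned:", len(assigned), "distinct:", len(set(assigned)))
```

In the reported solution each of the 15 dimensions appears on exactly one factor, which is the kind of simple structure a varimax rotation aims for.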
It was observed that Factor I is characterised by significant loadings on four variables, namely, Vigorous, Reflective, Responsible and Ascendance. The highest loading is that of the variable Vigorous. The positive loading on this variable indicates that these persons have strength of mind, are forceful,
energetic and very productive, and do not feel tired. These people engage in continuous activity and are able to get a lot of things done. The second highest loading is on Reflective, which shows that these persons are interested in ideas, abstractions, discussions and speculations. They are interested in knowing for its own sake, and are also very interested in applying their knowledge to practical work. The third highest loading is on the variable Responsible. It indicates that these people are dependable, reliable, compulsive and easily predictable, and are determined to complete tasks on time. The positive loading on the variable Ascendance indicates that the person easily starts conversations with strangers, likes to be in the centre of the stage, tends to stand up for his rights, takes the initiative in meeting people and is very much interested in meeting important people. In view of the above, this factor may be named ‘Individualistic Motivation and Responsibility’.
Factor II consists of significant loadings on two variables: Persistence and Sociability. The highest positive loading, on Persistence, refers to liking to see things through to the end, sticking to one task at a time and not giving it up easily. The second highest positive loading, on Sociability, indicates the desire to be with other persons, gregariousness, warm response, pleasantness in interaction and the ability to handle complex situations easily. In the light of the above discussion, the factor may be named ‘Social Persistence’.
Factor III comprises significant loadings on two variables: Placid and Tough-minded. The highest loading is on Placid. It reveals that the person is even-tempered, calm and easy-going, quiet, not easily ruffled or annoyed, and not inclined to blow his top. The next positive loading is on the variable Tough-minded.
It shows that persons with this temperament are tolerant of dirt, bugs and profanity, enjoy sports, do not care much about personal appearance and are rational. In view of this, the factor may be named ‘Placid and Tough-minded’.
Factor IV contains significant positive loadings on Tolerance and Aggressiveness. The highest positive loading, on Tolerance, refers to people who have no social prejudice, no special political opinion and no disapproval of a class of people, can easily stand the stress and misbehaviour of others, and show fundamental contentment. The second highest loading is on Aggressiveness, which indicates enjoyment of competitive strife, push, drive and enterprise in the struggle to get ahead of others. In the light of the above discussion, this factor may be named ‘Tolerance and Aggression’.
Factor V is characterised by significant loadings on Cooperative and Impulsivity. The highest positive loading is on Cooperative, indicating a person’s full faith in people, his belief that most people do their duties, his liking for superiors, business corporations and the opinions of others, and his trust in people in general as well as in successful people. The positive loading on Impulsivity shows a tendency to respond quickly, a happy-go-lucky, carefree nature and an unsystematic, disorderly and immature personality. In view of the above, this factor may be named ‘Cooperation and Impulsivity’.
The significant loading on Factor VI is due to the variable Accepting. It refers to people who tend to think the best of people, accept people at face value, expect altruism to prevail, have no self-interest, have a positive approach towards life and try to expect the right things at the cost of their own interest. Thus, this factor may be named ‘Accepting’.
Factor VII consists of two variables: Secretiveness and Warmth. Both are characterised by negative loadings. The highest negative loading is on Secretiveness, which indicates that the person does not keep his emotions under control, does not repress his wishes, does not check his emotions, and neither avoids exaggeration nor inhibits the expression of feelings. There is a significant negative loading on Warmth, which indicates that the person is not lavish in giving praise, does not smile much, is not genuine or humane and does not act as a friend. In view of the above, this factor may be named ‘Secretiveness and Warmth’.
A FACTOR ANALYTIC STUDY OF SOCIO-ECONOMIC STATUS, FRUSTRATION AND ANXIETY
Human beings have numerous psychological and social needs. Most of them are satisfied fairly promptly and completely. It is very unusual for a person to remain entirely hungry or without any shelter. Also, there are numerous opportunities to achieve a reasonable amount of satisfaction from having friends, from receiving the approval of people and from gaining self-realisation. It is equally normal for people to encounter some difficulties in their personal adjustments. Many a time it is even desirable to face some difficulties, because this keeps one trying and adds to the zest of life till success is achieved. However, if the motives are severely and continuously thwarted, the person is forced to accept substitute adjustive solutions, which are definitely less satisfying individually and less effective socially.
One view of frustration is that it is an internal motivational response, aroused when an organism is not rewarded in the presence of stimuli for which it was previously rewarded. In a general sense, frustration is the thwarting of a need or desire. Studies have shown that frustration exerts three effects on the behaviour of an organism: (a) inhibiting (avoidance responses aroused by conditioned frustration counteract approach responses), (b) excitatory (vigorous responses follow the arousal of frustration) and (c) reinforcing (responses allowing escape from frustration-arousing stimuli are learned). The circumstances and situations that are potentially capable of preventing immediate and direct satisfaction of our needs, and thus of leading to frustration, are numerous. Man’s physical and social environment, individual limitations and psychological complexities are the main blocks causing frustration. Keeping these in view, a factorial study was carried out on the socio-economic status, frustration and anxiety of the subjects.
Methodology
Sample
The sample consisted of 100 Grade X students of a government school situated in New Delhi. The school was selected at random, and the sample could be regarded as a systematic random sample. These 100 students were assessed on socio-economic status, frustration and anxiety.
Tools Employed
For measuring socio-economic status, frustration and anxiety, three tests were administered:
1. Kulshreshtha’s (1972) Socio-Economic Status Scale
2. Chauhan and Tiwari’s (1978) Frustration Test
3. Sinha and Sinha’s (1973) Comprehensive Anxiety Scale
The tests were selected on the basis of their high reliability and validity.
Procedure
Rapport was established with the students by briefly explaining to them the objectives of the study. The data were collected by administering the tests to groups of 20 to 25 students in different settings and on different days. Each scale was scored according to the instructions in its manual. The Kulshreshtha scale measures socio-economic status; the Chauhan and Tiwari scale measures regression, fixation, resignation, aggression and frustration; and the Sinha and Sinha test measures anxiety. Thus, altogether seven variables were scored and subjected to analysis.
Results, Interpretation and Discussion
Before the factor analysis was carried out, the Product Moment coefficients of correlation for all the seven variables were computed and displayed in a 7 × 7 matrix (Table 19.23).
Table 19.23
Variables         Socio-economic   Regression   Fixation   Resignation   Aggression   Frustration   Anxiety
Socio-economic    1.0              –0.63∗       –0.67∗     –0.62∗        –0.62∗       –0.66∗        –0.28
Regression                         1.0          0.88∗      0.88∗         0.77∗        0.82∗         0.48∗
Fixation                                        1.0        0.79∗         0.78∗        0.98∗         0.67∗
Resignation                                                1.0           0.83∗        0.93∗         0.44∗
Aggression                                                               1.0          0.92∗         0.45∗
Frustration                                                                           1.0           0.62
Anxiety                                                                                             1.0
Source: Author. Note: ∗Significant at 0.01 level.
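The correlation matrix of Table 19.23 is the input to the principal component step, so the eigenvalue-of-1.0 retention rule can be illustrated directly on it. The sketch below is not the author's programme: a small Jacobi rotation routine (a standard method for symmetric matrices, chosen here only to stay within the Python standard library) stands in for whatever eigenvalue routine was actually used, and the names `UPPER`, `full_matrix` and `jacobi_eigenvalues` are illustrative. Because the published correlations are rounded to two decimals, the smaller eigenvalues should be read as approximate.

```python
import math

NAMES = ["Socio-economic", "Regression", "Fixation", "Resignation",
         "Aggression", "Frustration", "Anxiety"]

# Upper triangle of Table 19.23, row by row (diagonal included).
UPPER = [
    [1.0, -0.63, -0.67, -0.62, -0.62, -0.66, -0.28],
    [1.0, 0.88, 0.88, 0.77, 0.82, 0.48],
    [1.0, 0.79, 0.78, 0.98, 0.67],
    [1.0, 0.83, 0.93, 0.44],
    [1.0, 0.92, 0.45],
    [1.0, 0.62],
    [1.0],
]


def full_matrix(upper):
    """Expand an upper triangle into a full symmetric matrix."""
    n = len(upper)
    r = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(upper):
        for offset, value in enumerate(row):
            j = i + offset
            r[i][j] = r[j][i] = value
    return r


def jacobi_eigenvalues(matrix, max_sweeps=100, tol=1e-18):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle that zeroes the (p, q) element.
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


eigenvalues = jacobi_eigenvalues(full_matrix(UPPER))
retained = [value for value in eigenvalues if value > 1.0]

print("eigenvalues:", [round(value, 3) for value in eigenvalues])
print("components with eigenvalue > 1.0:", len(retained))
print(f"share of first component: {100 * eigenvalues[0] / len(NAMES):.1f}%")
```

The eigenvalues of a correlation matrix always add up to the number of variables (here 7), so a single dominant eigenvalue leaves little room for further components to pass the 1.0 cut-off.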
The data was subjected to factor analysis to study the structure of the variables. The programme used for the factor analysis was based on Pearson Product Moment Correlation and subsequently
on Principal Component Solution, with an eigenvalue of 1.0 as the cutting point for retaining factors (Hotelling 1933). Following this procedure, only one factor was extracted from the seven variables analysed. As only one factor was extracted, varimax rotation (Kaiser 1958) was not carried out. The significance of the factor loadings was tested by using Table B of Herman (1976: 441), where the standard errors of the factor coefficients are given. In practice, the standard error of each factor loading was multiplied by the p = 0.01 value of t with the appropriate degrees of freedom (see Note 1). When this product was less than a particular factor loading, the loading was accepted as significant. The other procedure adopted was to accept loadings as significant if their absolute values were greater than 0.30. Both methods, in fact, yielded the same results. In this particular study, the second method was employed. The factor loadings for the unrotated factor matrix are given in Table 19.24.
Table 19.24 Factor Loadings Contributed by Each Variable
Variables                Factor I
Socio-economic status    –0.73
Regression               0.92
Fixation                 0.93
Resignation              0.91
Aggression               0.91
Frustration              0.99
Anxiety                  0.60
Source: Author.
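The loadings in Table 19.24 also show how much of the total variance the single factor carries. As a minimal sketch using only the published loadings (variable names are illustrative): with one extracted factor, the squared loading of a variable is its communality, the sum of the squared loadings is the factor's eigenvalue, and dividing by the number of variables gives the proportion of variance explained.

```python
# Loadings copied from Table 19.24.
loadings = {
    "Socio-economic status": -0.73,
    "Regression": 0.92,
    "Fixation": 0.93,
    "Resignation": 0.91,
    "Aggression": 0.91,
    "Frustration": 0.99,
    "Anxiety": 0.60,
}

# With a single factor, each communality is simply the squared loading.
communalities = {name: value ** 2 for name, value in loadings.items()}

# Sum of squared loadings = eigenvalue of the factor.
eigenvalue = sum(communalities.values())

# Each standardised variable contributes one unit of variance.
percent_variance = 100 * eigenvalue / len(loadings)

for name, h2 in communalities.items():
    print(f"{name:<24}{h2:.2f}")
print(f"eigenvalue: {eigenvalue:.4f}")
print(f"variance explained: {percent_variance:.1f}%")
```

On the published figures the factor has an eigenvalue of about 5.24, roughly 75 per cent of the total variance, while Anxiety has the lowest communality (0.36) and is therefore the variable least well represented by the factor.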
While interpreting and naming the factor, greater weightage was given to high-loading variables. The factor was found to contain significant loadings on all the seven variables, namely, socio-economic status, regression, fixation, resignation, aggression, frustration and anxiety. Except for socio-economic status, all the variables showed positive and significant loadings on the factor extracted. Of these six variables, five belong to Frustration and the sixth to Anxiety. This suggests that frustration and anxiety are highly correlated. On the other hand, socio-economic status has a negative and significant loading on the factor extracted. This implies that socio-economic status is negatively related to frustration and anxiety, that is, poor socio-economic status goes with high frustration and anxiety. Even the correlation matrix (Table 19.23) indicates that socio-economic status is negatively but significantly related to regression, fixation, resignation, aggression, frustration and anxiety. One can say that the reason for a high degree of frustration and anxiety is low socio-economic status. In other words, persons with low socio-economic status are more frustrated and have higher anxiety levels. Keeping in view the above loadings, this factor may be named ‘Socio-economic Status as Factor for Frustration and Anxiety’. This revealed that, for an individual to be frustrated, a variety of factors rather than a unitary factor are needed. The study also showed a negative relationship between socio-economic status and all the components of frustration, namely, regression, fixation, resignation, aggression and frustration.
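The significance claims for individual correlations can be checked with the usual t test for a Product Moment correlation, t = r√(n − 2)/√(1 − r²). The sketch below is an illustration, not the author's procedure. It assumes n = 100 as reported, and it takes 2.627 as the approximate two-tailed 0.01 critical value of t for 98 degrees of freedom from standard tables; that constant is an assumption and should be checked against a t table before being relied on.

```python
import math

N = 100             # sample size reported in the text
T_CRITICAL = 2.627  # assumed: approximate two-tailed 0.01 value of t for 98 df


def t_statistic(r, n=N):
    """t value for testing a Product Moment correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def smallest_significant_r(t_crit=T_CRITICAL, n=N):
    """Smallest absolute correlation that reaches the critical t value."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


pairs = [
    ("Socio-economic vs Anxiety", -0.28),
    ("Frustration vs Anxiety", 0.62),
]
for label, r in pairs:
    t = t_statistic(r)
    print(f"{label}: r = {r:+.2f}, t = {t:.2f}, significant = {abs(t) > T_CRITICAL}")
print(f"smallest |r| significant at 0.01: {smallest_significant_r():.3f}")
```

Under these assumptions any correlation larger than about 0.26 in absolute value is significant at the 0.01 level, so the two coefficients printed without an asterisk in Table 19.23 (–0.28 and 0.62) would also qualify, which agrees with the statement in the text that socio-economic status is significantly related to anxiety.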
There is a positive and significant relationship between anxiety and components of frustration (regression, fixation, resignation and aggression). Anxiety may be defined as a mental distress with respect to some anticipated frustration. Sometimes trivial matters and minor frustration cause a feeling of uneasiness. Growing out of many frustrating situations, anxiety serves as the driving force for a large number of subsequent adjustments. This anxiety has its origin in the birth trauma. It arises more out of loss of love rather than from lack of love. This could be one of the reasons for a positive and significant correlation between frustration and anxiety.
NOTE
1. p is the level of significance and t is Student’s t distribution.
Bibliography
Allport, Gordon W. 1937. Personality: A Psychological Interpretation. New York: Holt and Company.
Allport, Gordon W. and H.S. Odbert. 1936. ‘Trait Names: A Psycholexical Study’, Psychological Monograph, 47: 171.
Anastasi, Anne. 1988. Psychological Testing. 6th Edition. New York: Macmillan Publishing Company.
Anastasi, Anne and Susana Urbina. 1997. Psychological Testing. 7th Edition. New York: Prentice-Hall.
———. 2000. Psychological Testing. 7th Edition. Singapore: Pearson Education (Singapore) Pvt. Ltd.
Asthana, H.S. 1945. ‘Psychoneurotic Tendencies among University Students’, Indian Journal of Psychology, 20: 94–95.
———. 1950. ‘Hindustani Adjustment Inventory’, Education, 29: 17–20.
———. 1968. Manual of Direction and Norms for Adjustment Inventory in Hindi. Varanasi: Rupa Psychological Corporation.
Atwater, Eastwood and Karen Duffy. 1998. Psychology for Living: Growth, Adjustment and Behaviour Today. New Delhi: Pearson Education.
Bean, K.L. 1953. Construction of Educational and Personnel Tests. New York: McGraw-Hill Books Company.
Bergner, M., R.A. Bobbitt and W.B. Carter. 1981. ‘The Sickness Impact Profile: Development and Final Revision of a Health Status Measure’, Medical Care, 19: 787–805.
Berman, A. 1973. ‘Reliability of Perceptual-Motor Laterality Tasks’, Perceptual and Motor Skills, 36: 599–605.
Bialer, I. 1960. Conceptualisation of Success and Failure in Mentally Retarded and Normal Children. Ann Arbor, Michigan: University Microfilms (Also in brief in Journal of Personality, 1961, 29: 303–20).
Bradfield, James M. and H. Stewart Moredock. 1957. Measurement and Evaluation in Education. New York: The Macmillan Company.
Bridget, S. and L. Cathy (eds). 2008. Research Methods in the Social Sciences. New Delhi: Vistaar.
Blake, R. and J. Mouton. 1964. The Managerial Grid: The Key to Leadership Excellence. Houston: Gulf Publishing Co.
Brayfield, A.H. and H.F. Rothe. 1951. ‘An Index of Job Satisfaction’, Journal of Applied Psychology, 35: 307–31.
Briggs, Myers Isabel, Mary H. McCaulley, Naomi L. Quenk and Allen L. Hammer. 2003. MBTI Manual. 3rd Edition. California: Mountain View.
Burt, C. 1941. ‘Factor Analysis and Physical Types’, Psychometrika, 12: 171–88.
Campbell, D.T. 1960. ‘Recommendations for APA Test Standards Regarding Construct Traits and Discriminant Validity’, American Psychologist, 15: 546–53.
———. 1995. ‘The Campbell Interest and Skill Survey (CISS): A Product of Ninety Years of Psychometric Evolution’, Journal of Career Assessment, 3(4): 391–410.
Cattell, R.B. 1966. Factor Analysis. New York: Harper Collins.
———. 1974. ‘A Large Sample Cross Check on 16PF Primary Structure by Parcelled Factoring’, Multivariate Experimental Clinical Research, 1(2): 79–95.
———. 1982. ‘The Inheritance of Personality and Ability’, Proceedings of the Second International Conference on 16-PF Test. Champaign, IL: IPAT.
Caws, P. 2005. ‘First and Second Order Unification in the Social and Human Sciences’, Graduate Journal of Social Sciences, 20: 35–41.
Chadha, N.K. 1996. Theory and Practice of Psychometry. New Delhi: New Age International Publishers.
———. 1998. Statistical Methods in Behavioural and Social Sciences. New Delhi: Reliance Publishing House.
———. 2006. ‘Emotional Quotient Test’, in Dalip Singh (ed.), Emotional Intelligence at Work: A Professional Guide. New Delhi: Response Books.
Chadha, N.K. and H. Bhatia. 1989. Family Environment Scale. Agra: National Psychological Corporation.
Chadha, N.K. and S. Chandra. 1984. Manual for Dimensions of Temperament Scale. Agra: National Psychological Corporation.
Chauhan, N.S. and G. Tiwari. 1978. Manual of Frustration Scale. Agra: Psychological Research Cell.
Chambers, L.W. 1996. ‘McMaster Health Index Questionnaire’, in B. Spilker (ed.), Quality of Life Assessments in Clinical Trials, pp. 267–79. New York: Raven Press. Cook, J. and T.D. Wall. 1980. ‘New Work Attitude Measures of Trust, Organisational Commitment and Personal Need Nonfulfillment’, Journal of Occupational Psychology, 53: 39–52. Cronbach, L.J. 1941. ‘Test Reliability: Its Meaning and Determination’, Psychometrika, 12: 1–16. ———. 1947. ‘Test “Reliability”: Its Meaning and Determination’, Psychometrika, 12: 297–334. ———. 1951. ‘Coefficient Alpha and the Interval Structure of Tests’, Psychometrika, 16: 297–334. Deo, P. and Sagar Sharma. 1971. ‘Relationship of Self-acceptance and Anxiety’, Journal of Psychological Researches, 15: 63–65. DuBois, P.H. 1970. A History of Psychological Testing. Boston: Allyn & Bacon. ———. 1972. ‘Increase in Educational Opportunity through Measurement’, Proceedings from the 1971 Invitational Conference on Testing Problems. Princeton, NJ: Educational Testing Service. Dubin, R. 1956. ‘Industrial Workers’ Worlds: A Study of the “Central Life Interests” of Industrial Workers’, Social Problems, 3: 131–42. Dubin, R., J.E. Champoux and L.W. Porter. 1975. ‘Central Life Interests and Organizational Commitment of Blue Collar and Clerical Workers’, Administrative Science Quarterly, 20: 411–21. Dyer, G.W. 1964. ‘Educational Therapy: Objectives and Values’, Ment Hosp, 15: 23–26. Edwards, T.B. 1966. ‘Teacher Attitudes and Cultural Differentiation’, Journal of Experimental Education, 35(2): 80–86. Efron, R.E. and H.U. Efron. 1967. ‘Measurement of Attitude toward the Retarded and an Application with Educators’, American Journal of Mental Deficiency, 72: 100–07. Erford, B.T. (ed.). 2006. Counselor’s Guide to Clinical, Personality and Behavioral Assessment. Boston: Lahaska Press. ———. 2007. Assessment for Counselors. Boston and New York: Lahaska Press Houghton Mifflin Company. Eysenck, H.J. and S.B.G. Eysenck. 1975. 
Manual of the Eysenck Personality Questionnaire. London: Hodder and Stoughton. Feldhusen, J.F. 1965. ‘Teacher’s and Children’s Perception of Creativity in High and Low Anxious Children’, Journal of Educational Research, 58(11): 442–46. Fields, D.L. 2002. Taking the Measure of Work: A Guide to Validated Scales for Organizational Research and Diagnosis. London: Sage Publications. Finn, S.E. and M.E. Tonsager. 1997. ‘Information Gathering and Therapeutic Models of Assessment: Complementary Paradigms’, Psychological Assessment, 9(4): 374–85. Flanagan, J.C. 1939. ‘General Considerations in Selections of Test Items and a Short Method for Estimating the Product Moment Coefficient from Data at the Tails of Distribution’, Journal of Educational Psychology, 30: 674–80. Fleishman, E.A. 1953. ‘The Measurement of Leadership Attitudes in Industry’, Journal of Applied Psychology, 37: 153–58. Freeman, F.S. 1955. Theory and Practice of Psychological Testing. USA: Sir Issac Pitman & Sons Ltd. ———. 1962. Theory and Practice of Psychological Testing. London: IBH Publishing Co. Gardner, E. and R. Madden. 1969. Stanford Early School Achievement Test: Directions for Administering. New York: Harcourt, Brace & World. Garfield, E.F. and R.H. Blum. 1973. ‘Stanford University Evaluation Scales’, in L.A. Abranes, E.F. Garfield, J.D. Sisher and others (eds), Accountability in Drug Education. Washington, DC: The Drug Abuse Council, Inc. Gregory, R. 1994. ‘Seeing Intelligence’, in J. Khalfa (ed.) What Is Intelligence, pp. 13–26. Cambridge, England: Cambridge University Press. Garrett, H.E. 1958. Statistics in Psychology and Education. New York: Longmans, Green & Co. Goleman, Daniel. 1997. Emotional Intelligence: Why it can Matter more than IQ. New York: Bantam Books. Goodwin, C.J. 1998. Research in Psychology: Method and Design. New York: John Wiley and Sons, Inc. Gordon, L.V. 1973. Work Environment Preference Schedule. New York: The Psychological Corporation. Gough, H.G. and P. Bradley. 1996. 
California Psychological Inventory (CPI) Manual. 3rd Edition. Palo Alto, CA: Consulting Psychologists Press. Gregory, R.J. 2004. ‘The Nature and Uses of Psychological Tests’, in Psychological Testing: History, Principles & Applications, pp. 53–66. New Delhi: Pearson Education. Guilford, J.P. 1948. A Revised Structure of Intellect. Report of Psychological Lab. No. 19. Los Angeles: University of Southern California. ———. 1952. ‘When not to Factor Analyze?’, Psychological Bulletin, 49: 26–37.
Guilford, J.P. 1954. Psychometric Methods. New York: McGraw-Hill Book Company, Inc. Guilford, J.P. and J.I. Lacey (eds). 1947. ‘Printed Classification Tests’, Army Air Force Aviation Psychology Program Research Reports, No. 5. Washington, DC: US Government Printing Office. Guilford, J.P. and W.S. Zimmerman. 1948. ‘The Guilford-Zimmerman Aptitude Survey’, Journal of Applied Psychology, 32(1): 24–35. Gulliksen, H. 1950. Theory of Mental Tests. New York: John Wiley & Sons, Inc. ———. 1961. Theory of Mental Tests. New York and London: John Wiley and Sons, Inc. Guttman, L. 1957. ‘A General Nonmetric Technique for Finding the Smallest Coordinate Space for a Configuration of Points’, Psychometrika, 33: 469–506. Hall, Calvin S. and Gardner Lindzey. 1970. Theories of Personality. 2nd Edition. New Delhi: John Wiley & Sons. Harper, R.S. and S.S. Stevens. 1948. ‘A Psychological Scale of Weight and Formula for its Derivation’, American Journal of Psychology, 61: 343–51. Henerson, M.E., L.L. Morris and C.T. Fitz-Gibbon. 1987. How to Measure Attitude. London and New Delhi: SAGE Publications. Herman, H. 1976. Modern Factor Analysis. New Jersey: John Wiley and Sons. Hilgard, E.R. and R.C. Atkinson. 1952. ‘A Motivational Syndrome in Chronic Schizophrenics’, Journal of Personality, 20: 253–76. Hotelling, H. 1933. ‘Analysis of a Complex of Statistical Variables into Principal Components’, Journal of Educational Psychology, 24: 417–41, 498–530. Howat, G. and M. London. 1980. ‘Attributions of Conflict Management Strategies in Supervisor-Subordinate Dyads’, Journal of Applied Psychology, 65: 172–75. Hyde, Gillian and Geoff Trickey. 1995. Career Interest Inventory. Florida: The Psychological Corporation. Indiresan, J. 1973. ‘Multivariate Analysis of Factors Affecting Job Satisfaction of Engineering Teachers’, Doctoral Dissertation in Psychology. New Delhi: Indian Institute of Technology. Jackson, C. 2003. Understanding Psychological Testing. Mumbai: Jaico Publishing House. Jordan, B.T. 
1971. ‘Jordan Left-Right Reversal Test: A Study of Visual Reversals in Children’, Child Psychiatry and Human Development, 4(3): 178–87. Jung, Carl Gustav. 1971. ‘Psychological Types’, Collected Works of C.G. Jung. Volume 6. Princeton, New Jersey: Princeton University Press. Kahn, R.L., D.M. Wolfe, R.P. Quinn, J.D. Snoek and R. Rosenthal. 1964. Organisational Stress: Studies in Role Conflict and Ambiguity. New York: John Wiley and Sons. Kaiser, H.F. 1958. ‘The Varimax Criterion for Analytic Rotation in Factor Analysis’, Psychometrika, 23: 187–200. Kaplan, R.M. and D.P. Saccuzzo. 2005. Psychological Testing: Principles, Applications and Issues. California: Thomson Wadsworth. Kaplan, R.M. and J.P. Anderson. 1990. ‘An Integrated Approach to Quality of Life Assessment: The General Health Policy Model’, in B. Spilker (ed.), Quality of Life Assessments in Clinical Trials, pp. 131–49. New York: Raven Press. Kelly, T.L. 1935. ‘Analysis of a Complex of Statistical Variables into Principal Components’, Psychometrika, 26: 139–42. ———. 1939. ‘The Selection of Upper and Lower Groups for the Validation of Test Items’, Journal of Educational Psychology, 30: 17–24. Kelley, J. and H. Harper. 1967. Item Analysis and its Significance. Research monograph. New York: American Educational Research Association. Kerr, Isabel J. 1978. ‘El papel central del diálogo en la estructura del discurso del cuiba’, in Nancy L. Morse (ed.), Estudios guahibos, pp. 1–110. Serie Sintáctica, 11. Bogotá: Ministerio de Gobierno. Khan, S.B. 1966. ‘The Contribution of Attitudinal Factors to the Prediction of Academic Achievement’, Dissertation submitted at the Florida State University, Tallahassee. Kirchoff, B.A. 1975. ‘Managerial Style Questionnaire’, Personnel Psychology, 28(3): 351–64. Kline, T.J.B. 2005. Psychological Testing. New Delhi: Vistaar. Koch, J. and R. Steers. 1978. ‘Job Attachment, Satisfaction, and Turnover among Public Sector Employees’, Journal of Vocational Behavior, 12: 119–28. Kothari, C.R. 
2004. ‘Multivariate Analysis Technique’, in Research Methodology: Methods & Techniques, pp. 315–43. New Delhi: New Age International Publishers. Kuder, G.F. 1975. General Interest Survey. Chicago: Science Research Associate, Inc. Kulshreshtha, S.P. 1972. Socio-Economic Status Inventory. Agra: Psychological Research Cell.
Kumar, R. 2005. Research Methodology: A Step by Step Guide for Beginners. Australia: Pearson Education. Kumar, P. and D.N. Mutha. 1975. ‘Standardisation of Job Satisfaction Questionnaire’, Behaviourometric, 23(1): 85–89. Kundu, R. 1962. ‘Development of Personality Inventory’, Indian Journal of Psychology, 37: 171–74. Lanyon, R.I. 1967. ‘The Measurement of Stuttering Severity’, Journal of Speech and Hearing Research, 10: 836–43. Lawler, E.E. and D.T. Hall. 1970. ‘Relationship of Job Characteristics to Job Involvement, Satisfaction and Intrinsic Motivation’, Journal of Applied Psychology, 54(4): 305–12. Leahey, T.H. 2006. A History of Psychology: Main Currents in Psychological Thought. Pearson Education. Lemke, E. and W. Wiersma. 1976. Principles of Psychological Measurement. Chicago: Rand McNally College Publishing Company. Loyer-Carlson, V.L., D. Busby, T. Holman, D. Klein and J. Larson. 2002. RELATE: User’s Guide. Needham Heights, MA: Allyn & Bacon. McIntire, S.A. and L.A. Miller. 2007. Foundations of Psychological Testing: A Practical Approach. New Delhi: SAGE Publications. McCall, W.A. 1922. How to Measure in Education. New York: The Macmillan Company. McEwen, J. 1992. ‘The Nottingham Health Profile’, in S.R. Walker and R.M. Rosser (eds), Quality of Life Assessment: Key Issues for the 1990s. Dordrecht, Netherlands: Kluwer Academic Publishers. Mehta, P. 1969. The Achievement Motive in High School Boys. New Delhi: National Council of Educational Research and Training. Mehta, Prayag. 1976. ‘From Economism to Democratic Commitment: The Role of Worker Participation’, Vikalpa: The Journal of Decision Makers, 1(4): 39–46. Michael, J. 2003. ‘Using the Myers-Briggs Type Indicator as a Tool for Leadership Development? Apply with Caution’, Journal of Leadership and Organizational Studies, 10(1): 68–81. Minium, Edward W., Bruce M. King and Gordon Bear. 2001. Statistical Reasoning in Psychology and Education. New Delhi: John Wiley & Sons Asia Pte Ltd. Moos, R.H. 1989. 
Family Environment Scale. Palo Alto, California: Consulting Psychologists Press, Inc. Mukherjee, B.N. 1975. ‘When to Factor Analyse’, Unpublished manuscript. Calcutta: Indian Statistical Institute. Murray, H.A. and L. Bellak. 1973. Thematic Apperception Test. San Antonio, TX: Harcourt Assessment. Mursell, J.L. 1947. Psychological Testing. New York: Longmans, Green and Company. Nunnally, J.C. 1967. Psychometric Theory. New York: McGraw-Hill. ———. 1970. Introduction to Psychological Measurement. New York: McGraw-Hill Book Company. ———. 1978. Psychometric Theory. New York: McGraw-Hill Book Company. Olson, D.H. 2004. PREPARE/ENRICH: Counselor’s Manual. Minneapolis: Life Innovations. Pareek, U. 1975. ‘Motivational Climate Questionnaire’, Mimeographed report. Indian Institute of Management, Ahmedabad. Pareek, U., R.S. Devi and S. Rosenzweig. 1968. Manual of the Indian Adaptation of the Rosenzweig Picture-frustration Study: Adult Form. New Delhi: Rupa Psychological Corporation. Pestonjee, D.M. 1972. ‘An Investigation into Job Satisfaction and Resistance to Change’, Psychological Studies, 17(2): 8–12. Peters, T.J. and R.H. Waterman, Jr. 1982. In Search of Excellence: Lessons from America’s Best-Run Companies: The Rational Model. New York: Warner Books. Porter, L.W. and F.J. Smith. 1970. ‘The Etiology of Organisational Commitment’, Unpublished paper, University of California, Irvine. Puhan, B.N. 1995. ‘Projective-Inventory: An Indigenous Approach to Personality Assessment’, Psychology and Developing Societies, 7(2): 115–31. Quinn, R.P. and L.T. Shephard. 1974. ‘The 1972–73 Quality of Employment Survey’, Institute for Social Research, University of Michigan, Ann Arbor, Michigan. Ramamurti, P.V. 1968. ‘Adjustment Inventory for the Aged’, Indian Journal of Psychology, 43: 27–29. Raven, J.C. and J.H. Court. 1998. Raven’s Progressive Matrices. Oxford, England: Oxford University Press. Raven, J., J.C. Raven and J.H. Court. 2003. 
Manual for Raven’s Progressive Matrices and Vocabulary Scales. San Antonio, TX: Harcourt Assessment. Reber, A.S. and E.S. Reber. 2001. Dictionary of Psychology. 3rd Edition. London: Penguin. Reddy, N.Y. 1964. ‘Development of an Adjustment Inventory for Use with Adolescents’, Journal of Psychological Researches, 8(1): 68–76. Rizzo, J., R.J. House and S.I. Lirtzman. 1970. ‘Role Conflict and Ambiguity in Complex Organisations’, Administrative Science Quarterly, 15: 150–63. Rokeach, M. 1960. The Open and Closed Mind. New York: The Basic Books.
Roy, Bishwanath. 1967. ‘Social Integration Attitude Scale (SIAS)’, Psychological Studies, 12(1): 39–45. Russell, M. and D. Karol. 2002. Manual Authors: 16 PF Authors, Raymond B. Cattell, A. Karen Cattell, and Heather E.P. Cattell 16PF ®, 5th Edition. Illinois: Institute of Personality and Ability Testing. Seashore, C.E., D. Lewis and J.G. Saetveit. 1940. Manual of Instructions and Interpretations for the Seashore Measures of Musical Talents. Chicago: Stoelting. Shanmugam, T.E. 1965. ‘Personality Traits of Pupils Who had Their Education through the Medium of Their Mother Tongue and English’, Journal of Educational Research and Extension, II(2): 51–57. Sharma, Sagar. 1970a. ‘Manifest Anxiety & School Achievement of Adolescents’, Journal of Consulting and Clinical Psychology, 34(3): 46–49. ———. 1970b. ‘Standardisation of an Anxiety Scale’, Recent Trends in Education, I: 14–16. ———. 1971. ‘Parental Occupation & Anxiety’, New Trends in Education, 2: 21–24. Sharma, Sagar and Pratibha Deo. 1970. ‘Self-concept and School Achievement’, Indian Educational Review, 5: 101–05. Sharma, S.R. 2004. Evaluation in Education. New Delhi: Shri Sai Printographers. Singh, A.K. 2006. Tests, Measurements and Research Methods in Behavioural Sciences. New Delhi: Bharti Bhavan. Singh, K.P.S. and J. Sinha. 1973. Manual of Comprehensive Anxiety Scale. Agra: National Psychological Corporation. Sinha, P. and O.B. Sayeed. 1980. ‘Measuring Quality of Working Life: Development of an Inventory’, Indian Journal of Social Work, 1(3): 219–26. Smith, P.C., L.M. Kendall and C.L. Hulin. 1969. The Measurement of Satisfaction in Work and Retirement. Chicago: Rand McNally. Snyder, C.R. 1997. Marital Satisfaction Inventory: Revised Manual. Los Angeles: Western Psychological Services. Spanier, G.B. 1989. Dyadic Adjustment Scale. Tonawanda, NY: Multi Health Systems. Spearman, C. 1904. ‘General Intelligence—Objectively Determined and Measured’, American Journal of Psychology, 15: 201–93. Srivastava, A.K. and A.P. 
Singh. 1981. ‘Construction & Standardisation of an Occupational Stress Index: A Pilot Study’, Indian Journal of Clinical Psychology, 8: 133–36. Steers, R.M. 1977. ‘Antecedents and Outcomes of Organizational Commitment’, Administrative Science Quarterly, 22: 46–56. Stevens, S.S. (ed.). 1951. Handbook of Experimental Psychology. New York: John Wiley and Sons. Szilagyi, A.D. and R.T. Keller. 1976. ‘A Comparative Investigation of the Supervisory Behaviour Description Questionnaire’, The Academy of Management Journal, 19(4): 642–49. Takeuchi, K., H. Yanai and B.N. Mukherjee. 1982. ‘A Generalized Method of Image Analysis from an Intercorrelation Matrix’, Psychometrika, 44(1): 95–97. Tate, M.W. 1955. Statistics in Education. New York: Macmillan. Taylor, A.J.W. 1968. ‘A Brief Criminal Attitude Scale’, Journal of Criminal Law, Criminology, and Police Science, 51(1): 37–40. Templer, D.J. 1970. ‘The Construction and Validation of a Death Anxiety Scale’, Journal of General Psychology, 82: 165–77. Terman, L.M. and M.A. Merrill. 1973. Stanford-Binet Intelligence Scale: Manual for the Third Revision Form L-M. (1972 Norm Tables by R. L. Thorndike.) Boston: Houghton Mifflin. Thurstone, L.L. 1947. ‘Factor Analysis as a Scientific Method’, Psychometric Lab Report No. 65, University of Chicago. Tobias, S. and J.E. Carlson. 1969. ‘Brief Report: Bartlett’s Test of Sphericity and Chance Findings in Factor Analysis’, Multivariate Behavioural Research Monographs, 4(3): 375–77. Torrance, E.P. 1966. Torrance Test of Creative Thinking. Princeton, N.J.: Personnel Press. United States Department of Labor. 1977. Dictionary of Occupational Titles (DoT). USA: United States Department of Labor. Velander, P.L. 1993. Premarriage Awareness Inventory. Inver Grove Heights, MN: Logos Production. Vroom, V.H. 1960. ‘The Effects of Attitudes on the Perception of Organization Goals’, Human Relations, 13: 229–40. Wallach, M.A. and C.W. Wing, Jr. 1968. 
‘Is Risk a Value?’, Journal of Personality and Social Psychology, 9: 101–06. Wallach, M.A. and J. Mabil. 1970. ‘Information versus Conformity in the Effects of Group Discussion on Risk Taking’, Journal of Personality and Social Psychology, 14: 149–56. Ware, J.E., Jr., M. Kosinski, M.S. Bayliss, C.A. McHorney, W.H. Rogers and A. Raczek. 1995. ‘Comparison of Methods for the Scoring and Analysis of SF-36 Health Profile and Summary Measures: Summary of Results from the Medical Outcomes Study’, Medical Care, 33: AS 264–AS 279. Warr, P.B., J. Cook and T.D. Wall. 1979. ‘Scales for the Measurement of Some Work Attitudes and Aspects of Psychological Well-being’, Journal of Occupational and Organisational Psychology, 52(1): 129–48. Whitely, B.E., Jr. 1997. Principles of Research in Behavioural Science. London and Toronto: Mayfield. Zubin, J. 1934. ‘The Method of Internal Consistency for Selecting Items’, Journal of Educational Psychology, 25: 345–56.
Index Absolute zero, 18 Accidental error, 23 Aptitude, 82 Assessment, 184 Assess-mentors educators as (see Educators, as assess-mentors) Attributes measurement, for, 16–17 Biased error, 23–24 Biserial correlation, 105–10 Campbell’s Theory of Measurement, 7–8 Career interest inventory, 184–86 Cattell’s 16-personality factors (16-PF), 228–31 Chance error, 23 Clerical ability test, 46, 47–48 Communality meaning of, 307 Common variance meaning of, 306 Constant error, 26 Construct validity, 148 Content validity, 147 Counseling Indian test on, 220–26 Criterion validity, 148 Cronbach alpha, 139–40 Data reduction through factor analysis, 317 Decile(s), 170–71 Differential aptitude test (DAT), 184 Dimensions of Temperament Scale (DTS), 209–10 Educators as assess-mentors, 184 Eigen value, 307, 321 Emotional Quotient (EQ) test, 262–67 Errors measurement, of, 22 types of
accidental/chance errors (see Accidental error and Chance error) constant error (see Constant error) interpretative error (see Interpretative error) personal error (see Personal error) statistical error (see Statistical error) systematic/biased error (see Systematic error and Biased error) variable errors (see Variable error) unattempted items, effect, 44–47 Error variance meaning of, 306 Face validity, 147 Factor analysis, 313–14 applications of, 335 extraction of, 320–24 through Centroid method, 317–20 faults in, 314–15 feature, 317 interpretation of, 330 limitations of, 312–13 meaning of, 302 merits of, 312 methods of, 302 study of socio-economic status, 350 Factorial validity, 149 Factor loading(s) and correlation coefficient, 302–06 Factor rotation, 324–26 Factor theory, 306 Family Environment Scale (FES), 211 Flanagan method, 97 Fruchter formula for extraction, of number of factors, 320–21 Guessing problem of, 120–21 Guidance Indian test on, 220–26 Interpretative error, 24–25 Interval scale, 14–15
Item analysis meaning of, 96 methods for estimation difficult level, 102–05 technical aspects in, 96 Item characteristics curve (ICC) role in prediction, of test scores, 116–18 Item difficulty, 100–102 Item discrimination, 96–100 Item validity determination methods, from biserial correlation of, 105 Lm interpretation of, 64–66, 67 Lmvc method interpretation of, 62, 66 use of, 59 Lucky answering, 123 Lvc interpretation of, 63, 67 Marital satisfaction inventory (MSI), 209 Mathematics and measurement, 6–7 Maudsley personality inventory (MPI), 82 Measurement attributes for (see Attributes, for measurement) Campbell’s Theory of Measurement (see Campbell’s Theory of Measurement) Errors of (see Errors, of measurement) definition of, 4 mathematics and (see Mathematics and measurement) psychological (see Psychological measurement) social science domain, in, 8–9 Stevens’ contribution to, 9–10 theories of, 5–6 types of measurement scale interval scale (see Interval scale) nominal scale (see Nominal scale) ordinal scale (see Ordinal scale) ratio scale (see Ratio scale) Myers–Briggs type indicator (MBTI), 258–62 Nominal scale, 10–13 Norm(s) meaning and nature of, 158 types and methods for calculation of, 159 use in psychological testing, 178–79 Numerical ability test, 46–47
Oblique rotation, 327–29 Odd–even method, 42 Ordinal scale, 13–14 Parallel test calculation of test reliability, for, 67–68 criterion for psychological criterion, 50 statistical criterion, 51–53 equal validity, 53 meaning of, 50 Percentile method, 159 Percentile ranks (PRs), 166–70 Personal error, 25 Personality definition of, 345 Personality test classification of, 81 Point-biserial correlation, 110–16 Power test, 40, 41 Problem of correction method, 122 Psychological measurement problems in human attributes, variability in, 19 lacking of absolute zero, 18 lack of monetary support, 19 measurement, indirectness of, 18 measure of simple behavior, 18–19 quantification, problem of, 20 uncertainty and desirability, in human responses, 19 Psychological test(ing) clinical help, for, 228 definition of, 72–73 ethical issues, 286–87 health, adjustment and counseling, for, 209 history of, 74–79 nature and characteristics of, 73–74 organizational setting, in Emotional Quotient (EQ) Test (see Emotional Quotient (EQ) Test) major test in, 267 Myers–Briggs Type Indicator (MBTI) (see Myers–Briggs Type Indicator (MBTI)) principles of, 287–98 role of, 184 scoring importance in, 129–30 test use for adjustment, test of, 215–16 attitude, test of, 216–20
child rearing practice, 212–13 self-concept measure, 214–15 types of, 75, 80–85 Psychometric test meaning of, 85 Rank order items scoring of, 127–29 Ratio scale, 15–16 Reliability factors affecting age variability, 141 consistency, in score, 142 interval of time between testing, 142 practice and learning, effect of, 142 scores variability, 141–42 test length, effect of, 142–44 importance, in psychological testing, 144–45 meaning of, 132 methods for calculation parallel form method, 134 rational equivalence method, 137–39 split-half reliability, 135–37 test-retest method, 132–34 types, in psychological testing, 144 validity and, 156 Residual correlational matrix for determination, of number of factors, 321 Scoring importance, in psychological testing, 129–30 problem of time scoring problem (see Time scoring problem) Scree test, 329–30 Sickness impact profile (SIP), 209 Situational test meaning of, 98 Social persistence, 349 Specific variance
meaning of, 306 Speed test, 40, 41 Standard progressive matrices (SPM), 187–88 Statistical error, 26–27 Substitution error of, 32–33 Systematic error, 23–24 Temperament meaning of, 345–46 Test construction steps in evaluation of test, 93–94 final draft, construction of, 94 planning of test, 88–90 preparation for preliminary tryout test, 91–93 Test items, 73 Thematic apperception test (TAT), 231–34 Time scoring problem, 124 Uniqueness meaning of, 306 Validity factors affecting correction for attenuation, 150 criterion contamination, 150–51 differences among group, 149–50 length of test, 151–53 meaning of, 147 methods of calculation construct validity (see Construct validity) criterion validity (see Criterion validity) determination of validity by means of judgment, 147 factorial validity (see Factorial validity) reliability and (see Reliability and validity) use to make prediction, 153–55 Variable error, 25
About the Author Narender Kumar Chadha is Professor of Psychology at the University of Delhi, Delhi; Head, Department of Adult, Continuing Education and Extension, University of Delhi, Delhi; and Chairman, Board of Research Studies in Humanities and Social Sciences, University of Delhi, Delhi. Professor Chadha has been teaching in the Department of Psychology since 1982. He has taught courses in Research Methodology and Statistics, Psychometry, Organisational Behaviour, Social Gerontology, and Applied Social Psychology. He specialises in Gerontology, Psychometrics (Psychological Testing) and Organisational Behaviour. He is responsible for constructing and standardising many psychological tests, such as the Emotional Intelligence Test, Social Intelligence Scale, Job Satisfaction Inventory, Family Environment Scale and Achievement Motivation Test. He has also developed a scale for the selection of DRDO scientists. He has several publications to his credit, including Theory and Practice of Psychometry (New Age International Publications, 1996), Aging and the Aged—A Challenge to Indian Gerontology (Friends Publications, 1997), Human Resource Management—Issues, Case-studies and Experiential Exercises (Sai Printographers, 2002), Think Before Getting Grey (Sai Printographers, 2003) and Recruitment and Selection (Galgotia Publishing House, 2004). He has also co-authored two books with Harpreet Bhatia: Know Yourself (Friends Publications, 2006) and Be a Winner (Friends Publications, 2006). He has also co-edited Social Aging in a Delhi Neighbourhood with John van Willigen (Bergin and Garvey, 1999). He has been conducting many ongoing research projects with foreign universities. 
Presently, he is in collaboration with institutes like Sanders-Brown Center for Aging Research, University of Kentucky, USA; Healthy Aging, Institute of Gerontology, University of Heidelberg, Germany; Department of Social Gerontology, University of Dortmund, Germany; Department of Physiology and Applied Sciences, La Trobe University, Australia; British University, UK; and Department of Psychology and Educational Counselling, Penn State University, USA.