CLINICAL ASSESSMENT, COMPUTERIZED METHODS, AND INSTRUMENTATION
CLINICAL ASSESSMENT, COMPUTERIZED METHODS, AND INSTRUMENTATION
Editors
F.J.Maarse, Nijmegen Institute for Cognition and Information (NICI), Nijmegen University, The Netherlands
A.E.Akkerman, LTP BV, Amsterdam, The Netherlands
A.N.Brand, Department of Clinical Psychology, University of Utrecht, The Netherlands
L.J.M.Mulder, Department of Environmental and Work Psychology, University of Groningen, The Netherlands
LISSE ABINGDON EXTON (PA) TOKYO
Library of Congress Cataloging-in-Publication Data: Applied for

Copyright © 200 Swets & Zeitlinger B.V., Lisse, The Netherlands

All rights reserved. No part of this publication or the information contained herein may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, by photocopying, recording or otherwise, without prior written permission from the publishers.

Although all care is taken to ensure the integrity and quality of this publication and the information herein, no responsibility is assumed by the publishers nor the author for any damage to property or persons as a result of operation or use of this publication and/or the information contained herein.

Published by: Swets & Zeitlinger Publishers
http://www.szp.swets.nl/

This edition published in the Taylor & Francis e-Library, 2006.

“To purchase your own copy of this or any of Taylor & Francis or Routledge’s collection of thousands of eBooks please go to http://www.ebookstore.tandf.co.uk/.”

ISBN 0-203-97098-5 Master e-book ISBN
ISBN 90 265 (Print Edition)
Contents

Preface  vii

Section I. Clinical Assessment
1 Computerized Measurement of Implicit Short Term Memory Retention in a Digit Copying Task
  C.Lafosse, M.Moeremans, G.De Corte, I.Leenen, H.Maes, and E.Vandenbussche  2
2 Neuropsychological Assessment of Spatial Memory: Using Object Relocation in Clinical Subjects
  R.P.C.Kessels, A.Postma, and E.H.F.de Haan  15
3 Developing a Computer-Supported, Collaborative Learning Environment for Argumentative Writing
  H.J.M.(Tabachneck-)Schijf, G.Erkens, J.Jaspers, and G.Kanselaar  26
4 A Multi-Media Computer Program for Training in Basic Professional Counseling Skills
  J.Adema and K.I.van der Zee  41
5 The Conquered Giant: The Use of the Computer in Play Therapy
  M.F.Delfos  53
6 A Conceptualization of Alexithymia and Defense Mechanisms in Relation to Lateral Brain Dominance
  P.P.Moormann, N.Brand, E.Behrendt, and J.Massink  66
7 Quantification of Eye Blinks and Eye Tics in Gilles de la Tourette Syndrome by Means of Computer-Assisted Observational Analysis—Clinical Application
  J.H.M.Tulen, M.Azzolini, J.A.de Vries, W.H.Groeneveld, J.Passchier, and B.J.M.van de Wetering  85

Section II. Computerized Methods
8 Sequential Testing in Psychodiagnostics Using Minimax Decision Theory
  H.J.Vos  100
9 Visual Ordination Techniques in Grid Methodology: The Example of the Self-Confrontation Method
  R.van Geel, H.de Mey, and N.Bendermacher  113
10 From Theory to Research: Computer Applications in a Novel Educational Program for Introductory Psychology
  N.Brand, G.Panhuijsen, H.Kunst, J.Boom, and H.Lodewijkx  128
11 Psychotherapy Treatment Decisions Supported by SelectCare
  C.Witteman  136
12 A Method for the Assessment of Interpersonal Functioning in a Residential Group Therapy Program
  K.Linker  146

Section III. Instrumentation
13 Using Windows for Psychological Tests and Experiments with Real-time Requirements
  C.F.Bouwhuisen and F.J.Maarse  157
14 WinTask: Using Microsoft Windows™ for Real-time Psychological Experiments
  J.Bos, E.Hoekstra, L.J.M.Mulder, J.A.Ruiter, J.R.Smit, D.Wybenga, and J.B.P.Veldman  166
15 Mouse or Touch Screen
  J.van de Ven and A.de Haan  181
16 Automan: A Psychologically Based Model of a Human Driver
  L.Quispel, S.Warris, M.Heemskerk, L.J.M.Mulder, and P.C.van Wolffelaar  194
17 Workplace for Analysis of Task Performance
  J.Bos, L.J.M.Mulder, and R.J.van Ouwerkerk  208
18 Cognitive Analysis and Modeling of an Ambulance Dispatch Task
  R.J.van Ouwerkerk, R.Kramer, J.Bos, and L.J.M.Mulder  222

Author Index  234
Preface
This seventh edition of ‘Computers in Psychology’ contains the proceedings of the workshop of the same name, held at the University of Utrecht in September 1999. The workshop is intended for psychologists and researchers working in the field of experimental psychology who are engaged in dedicated professional use of computers for assessment or research. The organizing committee conducts these workshops in order to improve communication between technicians and researchers working at psychological laboratories, and psychologists working in the field of psycho-diagnostics, clinical assessment and neuropsychology. This book is divided into three sections: Clinical Assessment, Computerized Methods, and Instrumentation.
Clinical Assessment
This section contains contributions from seven different research areas. In Chapters 1 and 2, Lafosse and Kessels describe two clinical assessment tasks, for short-term memory and spatial memory, respectively. Chapter 3 ((Tabachneck-)Schijf et al.) investigates progress in a collaborative learning environment, using argumentative texts. In Chapter 4 (Adema) and Chapter 5 (Delfos), computer programs are used for training in counseling skills and for play therapy. In Chapter 6 (Moormann et al.), the issue of alexithymia in relation to defense mechanisms is addressed. In Chapter 7 (Tulen et al.), the last in the Clinical Assessment section, a computer-assisted observational analysis of eye blinks and eye tics in Gilles de la Tourette syndrome is presented.
Computerized Methods
This section focuses on methods realized with the use of computers, in clinical as well as in experimental settings. In the first contribution (Chapter 8), by Vos, optimal rules for sequential testing problems in psycho-diagnostics are derived. Chapter 9 (Van Geel et al.) presents visual ordination techniques in grid methodology; the computer program KUNGRID is applied to the self-confrontation method of Hermans (1976). Chapter 10 (Brand et al.) describes the application of a Testmanager program to a practical introductory course for psychology students. SelectCare is another computer program (Chapter 11, by Witteman), containing a decision-making model for supporting
psychotherapists who need to decide how to treat patients suffering from depression. A method for the assessment of interpersonal functioning in a residential group therapy program (Chapter 12, by Linker) is the last topic of this section.
Instrumentation
The central issue of this section is the hardware and software needed for controlling and implementing psychological and ergonomic experiments. Chapters 13 and 14 (Bouwhuisen; Bos) describe the problems that have to be solved for real-time experimentation using Microsoft Windows 95/98. Van de Ven and de Haan, in Chapter 15, present the differences between using a mouse and a touch screen as a computer input device. The last three contributions describe three experimental settings at the University of Groningen. The first contribution (Chapter 16), by Quispel et al., concerns a psychologically based model of a human driver. The second paper, by Bos et al. (Chapter 17), introduces a laboratory-based environment for a broad range of day-to-day tasks. In the last contribution to this section and to the entire book, by Van Ouwerkerk et al. (Chapter 18), the results of a field study of the workload of ambulance dispatchers are used to build a virtual spatial navigation and planning environment in the laboratory.
Organization and background to ‘Computers in Psychology’
The Contact Group for Instrumentation in Psychology in the Netherlands (KIP) and the Users Group Computer Aided Diagnostics in the Netherlands (GCOP) recognized the significance of the computer for psychological research and assessment purposes as early as 1984, when they took the initiative to organize the first workshop ‘Computers in Psychology’. Since that first occasion, the workshop has been organized every two or three years. The purpose of the workshop is to provide a platform for the exchange of knowledge, insights and ideas concerning the use of the computer in psychological research and applied psychological work. The workshops are held to stimulate technicians, test developers and scientists to present newly developed instrumentation and methods of tackling the problems associated with real-time data acquisition, data processing and information management.
Acknowledgements
Here we would like to acknowledge the numerous people who have contributed to the success of the last workshop in Utrecht and to the production of these proceedings. We
want to thank the members of the local organizing committee: Nico Brand, Sibe Doosje, Wim Sjouw, Barbara Wesselingh and Erna Beeker. Thanks to their enthusiasm, the workshop was a great success. We also thank the audio-visual department of the Faculty of Social Sciences for their support, and the members of the KIP and the GCOP for their help, advice and reviews. We thank the authors for their creative contributions and their cooperation in producing this book. Finally, we thank Karen Mansfield for her accurate corrections and improvements to the written English of all the chapters in this book.

Frans J.Maarse
Section I CLINICAL ASSESSMENT
Chapter 1
Computerized Measurement of Implicit Short Term Memory Retention in a Digit Copying Task
C.Lafosse1,2, M.Moeremans1, G.De Corte3, I.Leenen2, H.Maes2, and E.Vandenbussche2
1 Rehabilitation Center Hof ter Schelde, August Vermeylenlaan 6, B-2050 Antwerp, LO, Belgium
2 Laboratory of Neuropsychology, K.U.Leuven, Belgium
3 UIA, University of Antwerp, Belgium
Abstract
Timing characteristics of memory recall appear to be as informative as accuracy measures. We argue that a specific kind of response time, namely the Inter-Response Interval (IRI), can be of special significance in measuring implicit short-term memory retention in a digit copying task. In this study we investigated the congruent and empirical validity of the IRI as a behavioral correlate of short-term memory retention during digit copying. Normal subjects (n=107; ages varying between 12 and 70 years) were asked to copy strings of 5 digits by echoing them on a numerical keyboard. The computer recorded the time that elapsed between the correct entry of two successive digits of a digit string: the Inter-Response Interval (IRI). First, congruent validity was established by investigating the correlation between subjects’ IRI for the non-repeated digit strings and their Digit Span retention score as measured by the WAIS. Second, based on Hebb’s paradigm, empirical validity was established by manipulating the recurrence of a particular digit string. The results showed a significant correlation between subjects’ IRI and their Digit Span score, indicating that individual differences in the IRI of the non-repeated digits can be explained by differences in short-term memory retention capacity. The results also showed that, in accordance with Hebb’s paradigm, the IRI of the repeated digit strings decreased gradually as the test trials progressed (as the string was stored in LTM), while the IRI of the non-repeated strings (measuring STM retention capacity) remained stable, indicating that STM retention can be dissociated from digit learning. These results suggest that classical and computerized assessment should be considered complementary: each can reveal different aspects of cognitive functioning. In this study, we demonstrated that response times
are informative for measuring implicit memory functioning in a digit copying task.
Introduction
Memory assessment is one of the principal objectives of neuropsychological evaluation. One would therefore expect a considerable degree of sophistication on the part of the professionals performing these assessments, both in their approach to the phenomena of memory and in the assessment procedures they select (Loring & Papanicolaou, 1987). However, since the clinical and research functions of clinical neuropsychology have moved away from establishing the cerebral localization of lesions and brain functions, it has become more important to find practical methods of assessing implicit and explicit everyday memory performance in the laboratory (Loring & Papanicolaou, 1987), both for subjects with major and with minor brain dysfunctions. Memory plays a part in the performance of a wide variety of everyday tasks in a rather implicit or indirect manner. For instance, Conrad and Hull (1967) found evidence that short-term memory is important for a person’s accuracy in copying alpha-numeric material by hand. In these kinds of tasks, memory intervenes in a rather automated and less conscious way in comparison with classical laboratory memory tasks. However, what is done effortlessly by a normal person may require cognitive effort on the part of the brain-damaged subject. Even minor neurological impairments may cause subtle and transient cognitive changes, resulting in major “hidden” consequences for everyday functioning. For example, a transient but unnoticed “slowing” may be more traumatic, because of its “hidden” nature, than a continuous and perceivable side effect for which the subject can compensate (Thompson & Huppert, 1980). Consequently, whereas subjects with minor memory impairments will perform at ceiling level on everyday memory tests such as the Rivermead Behavioral Memory Test, they need more cognitive effort than normal persons on simple tasks such as copying digits, due to slower information processing. The question now is: how can we measure memory retention adequately with tasks that at first sight do not explicitly refer to memory processing? An important condition for the implicit measurement of this kind of memory is that the subject is not aware that his memory is being tested. Most parameters registered by paper-and-pencil tests do not fulfil this requirement because they assess memory explicitly, so that subjects are aware of the memory testing. Consequently, many subjects are confronted with their failing competence because their lost functions are being addressed. As a behavioral correlate of this kind of implicit memory function, the timing characteristics of recall appear to be more informative than the number of items correctly recalled. Indeed, response times can be registered implicitly, that is, without explicitly informing the subjects. By means of computerized assessment, response times can also be measured with millisecond precision. Consequently, subtle brain abnormalities and minor memory impairments affecting processing speed, which are difficult to detect with conventional pencil-and-paper tests, can be readily measured by a computer (Alpherts & Aldenkamp, 1990). Some studies illustrate the value of time registration, in addition to traditional accuracy measures, as a
sensitive and significant factor in performance. Harness et al. (1977) found that response latency was of special differential significance in comparing cognition and performance in subjects with organic brain damage and psychiatric patients. Glenn and Parsons (1990) specifically addressed the role of processing time in neuropsychological performance and combined it with accuracy measures to provide a more sensitive measure for detecting subtle alcohol impairment. We argue that a specific kind of response time, namely the Inter-Response Interval, can be of special significance in measuring implicit memory functioning in certain everyday tasks. There is some indication from experimental psychology that this may be the case. Based on Inter-Response times, Chase and Simon (1973) defined the memory chunks used during chess playing and how they are related to one another. In other words, they measured implicit or indirect memory retrieval in chess playing by means of Inter-Response times. We developed a computerized digit copying task based on the registration of subjects’ Inter-Response Interval times (IRIs). Digit strings, presented successively on the computer monitor, had to be echoed on an adapted numerical keyboard. Each digit string remained on the screen until the subject had completely entered it. The subjects worked at their own pace (they were not instructed to perform at a particular pace). The computer recorded the time that elapsed between the entry of two successive digits in the digit string: the Inter-Response Interval (IRI). We argue that the IRI can be an implicit measurement of short-term memory retrieval during digit copying for two reasons. First, there is evidence that short-term memory plays a role in copying digit strings presented on a computer monitor (Conrad, 1966). Second, an analysis of this task reveals that subjects have to shift their fixation point from the monitor to the keyboard. This process of ‘shifting’ takes time and can be accurately measured by the IRI. During this time subjects have to store some digits in short-term memory. Therefore, individual differences in the IRI may be explained by differences in short-term memory retention capacity. In this study we investigated the congruent and empirical validity of the IRI as a behavioral correlate of short-term memory retention during digit copying. Based on Hebb’s Recurring Digits paradigm (Hebb, 1961; Milner, 1970, 1971), which behaviorally distinguishes short-term from long-term memory, the trials alternated non-repeated digit strings of five digits with, unknown to the subjects, a repeated digit string. First, congruent validity was established by investigating the correlation between subjects’ IRI for the non-repeated digit strings and their Digit Span retention score as measured by the WAIS. If individual differences in the IRI of the non-repeated digits can be explained by differences in STM capacity, then we expect a significant correlation between subjects’ IRI and their Digit Span score. Second, empirical validity was established by manipulating the recurrence of a particular digit string. If short-term memory retention as measured by the IRI can be dissociated from digit learning, then we expect, in accordance with Hebb’s paradigm, that the IRI of the repeated digit strings will decrease gradually as the test trials progress (as the string will have been stored in long-term memory), while the IRI of the non-repeated strings (measuring STM capacity) will remain stable.
Figure 1. Example of one of the test trials. In total, thirty-two strings of 5 digits are presented successively on the computer screen placed in front of the subjects.

Method

Subjects
The subjects in this study were 107 normal persons (N=107; ages varying between 12 and 70). None of these subjects had motor or visual problems that would interfere with correct performance of the task. All subjects were recruited from one of two regular schools, from one university (psychology students and staff of the K.U.Leuven), or from one of three nursing homes, on the basis of their voluntary cooperation and availability.

Task
Thirty-two strings of 5 digits were presented successively on the computer screen placed in front of the subjects (see Figure 1). Fifteen non-repeated strings of 5 digits each were presented successively. Based on Hebb’s Recurring Digits paradigm (Hebb, 1961; Milner, 1970), we also manipulated the recurrence of a particular digit string: from trial four onward, a specific digit string (namely “2–5–1–9–6”) was alternated with the non-repeated digit strings. Subjects were asked to copy each string by echoing it on the numerical keyboard. While performing this task, subjects worked at their own pace. Each digit string remained on the screen until the subject had completely entered it. Subjects could also monitor which digits they were entering, and they were allowed to make corrections.
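As an illustration of this trial structure, the sketch below generates a comparable sequence in Python. This is not the authors’ original software: the function name, the random generation of the non-repeated strings, and the seeding are our own assumptions; only the recurring string “2–5–1–9–6” and the alternation from trial four onward are taken from the text.

```python
import random

REPEATED_STRING = [2, 5, 1, 9, 6]  # the recurring digit string described in the text

def make_trial_sequence(n_trials=32, seed=0):
    """Build a Hebb recurring-digits trial list: from trial 4 onward, every
    second trial presents the same repeated string, alternating with freshly
    generated non-repeated strings (an assumed randomization scheme)."""
    rng = random.Random(seed)
    trials = []
    for t in range(1, n_trials + 1):
        if t >= 4 and t % 2 == 0:   # trials 4, 6, ..., 32: the repeated string
            trials.append(("repeated", REPEATED_STRING))
        else:                        # all other trials: a new 5-digit string
            trials.append(("non-repeated", [rng.randrange(10) for _ in range(5)]))
    return trials
```

Under this scheme the non-repeated test trials are the odd trials 3, 5, …, 31 and the repeated trials are 4, 6, …, 32, matching the trial blocks described in the caption of Figure 4 below.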
Figure 2. The Inter-Response Intervals (IRIs) within one digit string, i.e. the time elapsed between the entry of two successive digits of a digit string. As there are 5 digits, each digit string consists of 4 IRIs.

The computer recorded the time that elapsed between the correct entry of two successive digits of a digit string: the Inter-Response Interval (IRI). As there are 5 digits, each digit string consists of 4 IRIs (Figure 2). The adapted numerical keyboard differs from the default computer keyboard in that the position of the digits is reversed (1 2 3 being at the top instead of the bottom). This arrangement has three advantages: (1) from an ergonomic point of view it is preferable (Conrad & Hull, 1968); (2) it fits in with everyday keyboard designs such as those of a telephone, a calculator, or a teller machine, and hence increases the ecological validity; (3) the difficulty level of this keyboard layout is roughly the same for subjects with and without computer experience. Experienced computer users cannot profit from their skills because they are unfamiliar with this layout; for the other group, it conforms more closely to their expectations of where numerals are to be found. To avoid any response bias that might be introduced by this particular keyboard, subjects were able to familiarize themselves extensively with the layout before the actual experiment. For this purpose a similar task was presented, this time with non-repeated strings ranging from 1 to 4 digits. Furthermore, it should be emphasized that we investigated and optimized the user interface of this task before the initial experiment.

Procedure
All subjects performed the digit copying task individually, followed by the classical pen-and-paper WAIS Digit Span subtest. A complete test session lasted 15 minutes. Subjects were asked to copy, at their own pace, each string by echoing it on the adapted numerical keyboard. The experimenter told them that he wanted to investigate the human user interface of a simple digit copying task. No suggestion was made with reference to a memory experiment.
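To make the measurement concrete, a minimal sketch of the IRI computation follows. The timestamps are assumed to be the moments (in ms) at which each of the five digits of one string was correctly entered; the function names are ours, not those of the original software.

```python
from statistics import median

def inter_response_intervals(entry_times_ms):
    """Return the 4 IRIs (ms) between the correct entries of the
    5 successive digits of one string."""
    return [t2 - t1 for t1, t2 in zip(entry_times_ms, entry_times_ms[1:])]

def median_iri(per_string_entry_times):
    """Median IRI for one subject, pooling the 4 IRIs of every
    correctly entered string."""
    all_iris = [iri for times in per_string_entry_times
                for iri in inter_response_intervals(times)]
    return median(all_iris)

# Example: one string whose digits were entered at these millisecond timestamps
print(inter_response_intervals([0, 620, 910, 1480, 2050]))  # -> [620, 290, 570, 570]
```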
Data Analysis
In this study we focused on the analysis of subjects’ response times for correctly entered digit strings.

Congruent Validity. Congruent validity was investigated by calculating the relation between subjects’ median IRI and short-term memory as measured afterwards by the Digit Span score (WAIS). The Digit Span is a classical test that reflects the number of digits that can be immediately attended to and stored in short-term memory. Because it combines a forward and a backward digit span, it encompasses a greater memory component and is thus a more sensitive index than the forward digit span alone (Kapur, 1994). We used a one-tailed non-parametric partial Spearman rank-order correlation coefficient to determine the degree of association between subjects’ median IRI (across digit strings) and their Digit Span score, corrected for age.

Empirical Validity. Based on Hebb’s Recurring Digits paradigm (Hebb, 1961; Milner, 1970), we manipulated the recurrence of a particular digit string. We expected that, as the test trials progressed, the median IRI within the repeated digit strings would decrease gradually, whereas the median IRI within the non-repeated strings would remain stable. In addition, we wondered whether the decrease could be attributed to an equal decrease of the four IRIs within a digit string or to a more pronounced decrease in one particular IRI. This is worth examining more closely because of the light it may shed on the strategies involved in performing the task. Therefore, we carried out an ANOVA by means of multiple regression with a three-factor block design, examining main effects and factor interactions. The factors involved were (1) the variable trial number (occurring at 30 levels, equal to the number of test trials); (2) the variable repeat, indicating whether or not the string was repeated (occurring at 2 levels, namely the recurring or non-recurring appearance of the digit string); and (3) the variable interval (occurring at 4 levels, namely the 4 individual IRIs within a digit string). Since the test comprises all these conditions and all subjects completed the test (and consequently, all levels of the independent variables), subject scores were treated as ‘blocks’ of observations. With this block design, variability in the dependent variable (i.e., the IRI) due to inter-individual differences can be eliminated from the error variance. In this way, possible effects of the independent variables on the dependent variable can be revealed more efficiently.

Results
Congruent Validity. The one-tailed non-parametric partial Spearman rank-order correlation (corrected for age) between subjects’ median IRI of the non-repeated digit strings and their Digit Span score was significant and negative: rs=−.56 (p<.0005). This means that a low median IRI corresponds with a high Digit Span, and a high median IRI corresponds with a low Digit Span.
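For readers who wish to reproduce this kind of analysis, a sketch of an age-corrected partial Spearman correlation is given below. It applies the standard first-order partial-correlation formula to Spearman coefficients, with a t-approximation for the one-tailed p-value; the exact procedure used by the authors may have differed.

```python
import numpy as np
from scipy import stats

def partial_spearman(x, y, covar):
    """First-order partial Spearman rank correlation between x and y,
    controlling for covar (here: median IRI, Digit Span score, and age)."""
    r_xy, _ = stats.spearmanr(x, y)
    r_xc, _ = stats.spearmanr(x, covar)
    r_yc, _ = stats.spearmanr(y, covar)
    r = (r_xy - r_xc * r_yc) / np.sqrt((1 - r_xc**2) * (1 - r_yc**2))
    df = len(x) - 3                       # n minus 2, minus 1 covariate
    t = r * np.sqrt(df / (1 - r**2))      # t-approximation for the partial r
    p_one_tailed = stats.t.sf(abs(t), df)
    return r, p_one_tailed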
Figure 3. Median IRI time (ms) per trial for the repeated and non-repeated test trials.

Empirical Validity. The ANOVA indicated a main effect of the variable ‘repeat’ (F(1,8848)=422.20; p<.05) and a main effect of the variable ‘trial number’ (F(1,8848)=109.05; p<.05). We also found a significant interaction between the variables ‘repeat’ and ‘trial number’ (F(1,8848)=43.59; p<.05), which can be explained by the IRI decrease for the repeated digit string compared with the non-repeated strings (Figures 3 and 4). The main effect of the variable ‘interval’ (F(3,8848)=577.50; p<.05) indicates that the registered IRI times across the four intervals are not equal, as can be seen in Figure 4. More specifically, the IRIs of the second and third interval positions are larger in comparison with the other two intervals. The results further indicate no significant interaction between the variables ‘interval’ and ‘trial number’ (F(3,8848)=1.06; n.s.). On the other hand, we can conclude for the normal subjects that this interaction (interval×trial number) differs according to the level of the variable ‘repeat’, given the presence of a significant triple interaction between the variables ‘repeat’, ‘interval’, and ‘trial number’ (F(3,8848)=2.69; p<.05). However, post hoc analysis showed that, within the recurring digit string, there was no significant interaction between the variables ‘interval’ and ‘trial number’ (F(3,4387)=0.23; n.s.). In other words, from the main effect of the variable ‘interval’ we can conclude that the IRI differs across the four interval positions, but we have no evidence that this difference becomes smaller as the trial number of the repeated digit string increases. This supports the hypothesis that the execution of the task, namely echoing digits, remains stable regardless of the repeated or non-repeated appearance of the digit string.
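The block-design ANOVA described in the Data Analysis section could be specified in Python roughly as follows, with subjects entered as a blocking factor so that inter-individual variability is removed from the error term. The column names and the data file are assumptions; this is a sketch of the design, not the authors’ original analysis code.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Assumed layout: one row per recorded IRI, with columns
#   subject, trial (1-32), repeat ('repeated'/'non-repeated'),
#   interval (1-4), iri (ms)
df = pd.read_csv("iri_data.csv")  # hypothetical data file

# C(subject) enters the subjects as blocks, absorbing inter-individual
# differences so the repeat x trial x interval effects are tested
# against a smaller error term.
model = smf.ols("iri ~ C(repeat) * C(trial) * C(interval) + C(subject)",
                data=df).fit()
print(anova_lm(model, typ=2))
```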
Figure 4. Median IRI time (ms) of the non-repeated and repeated digit strings for each IRI interval (see also Figure 2). For easy reference, the test trials were divided into three blocks (at the beginning, in the middle, and at the end of the task), each consisting of 5 digit strings. For each block of trials the median IRI (ms) corresponding to each IRI interval is shown (non-repeated trial blocks: block 1 consists of trials 3, 5, 7, 9, 11; block 2 of trials 13, 15, 17, 19, 21; and block 3 of trials 23, 25, 27, 29, 31; repeated trial blocks: block 1 consists of trials 4, 6, 8, 10, 12; block 2 of trials 14, 16, 18, 20, 22; and block 3 of trials 24, 26, 28, 30, 32).

Discussion
The IRI was measured on a computer keyboard whilst echoing digit strings, which were presented successively on a computer screen. The results provide an argument for the
validity of the IRI as an implicit measurement of short-term memory retention in a digit copying task. First, the results showed that the median IRI of the (non-repeated) digit strings correlated significantly with the Digit Span score of the WAIS. Since the Digit Span score is likely to be mediated by short-term memory, this factor can explain the observed congruence. Consequently, the IRI may be used to measure the storage capacity of short-term memory. Second, we found that the IRI of the recurring digit string gradually decreased, compared to the non-recurring digit strings. Combined with the results of the correlation study, this decrease can be explained by a transfer from short-term to long-term memory. The subjects gradually learned the repeating series, as it was evidently stored in long-term memory. In this sense the results are congruent with the findings of Hebb’s Recurring-Digits paradigm. In line with Hebb’s experimental findings, the visual task we constructed shows a distinction between short-term and long-term memory. However, our paradigm differed from Hebb’s in several respects. First, his subjects were asked to repeat auditorily presented strings of digits. Second, his task was based on the number of digits that the subjects could recall, and can therefore be considered an explicit memory task. We used a digit copying task that at first sight does not seem to be a memory task; only by registering subjects’ IRI times can we measure the implicit memory retrieval in our digit copying task. The results of this study thus provide further evidence, as Conrad (1966) demonstrated, that memory plays an important role, in an implicit way, in copying digit strings presented on a computer screen. The question that now arises is how the registration of subjects’ IRI times can serve as a behavioral correlate of memory function in this digit copying task. As argued in a previous section, an analysis of the task reveals that subjects have to shift their fixation point from the monitor to the keyboard. This process of ‘shifting’ takes time and can be accurately measured by the IRI. During this time, subjects have to store several digits in short-term memory. Although the relation between the shifting of the fixation point and the increase in IRI can only be confirmed by eye-movement registration, we tried to investigate this argument by looking at the individual IRIs within a digit string. We saw in the empirical-validity analysis that the IRIs of the second and third interval positions were larger in comparison with the other two intervals. If this were due to the subjects ‘shifting’, because their memory buffer was empty or because of delayed access to the stored digits, we would expect that as the subjects gradually learned the repeating series they would have to shift less as the test trials progressed. However, this was not confirmed by the results. We have evidence that the IRIs differ between the four interval positions, but this difference did not become smaller as the number of digit-string repetitions increased. In summary, we found that the median IRI of the repeated digit strings gradually decreases, and this decrease appears to be equal across the four IRIs within a digit string. This leads to the proposition that the execution of this digit copying task remains stable regardless of the repeated or non-repeated appearance of the digit strings. Analogous results were found by Tromp and Mulder (1991). In their task, head-injured subjects had to copy drawings of varying complexity and familiarity presented on a monitor in front of them. These subjects did not differ from a control group in how they solved the problem; in other words, they did not differ in the execution of the task.
To understand why there is no difference at the execution level of the task, and why the IRI of the repeated digit string gradually decreased in comparison with the non-repeated digit strings, it is important to consider that every human act is the result of the continuous interplay between stored knowledge and cognitive processes (Stillings et al., 1987). The way knowledge is stored can be characterized by the distinction between procedural and declarative knowledge, and by redundancy (Tromp & Mulder, 1991). Redundancy implies that knowledge is stored in multiple ways, and that multiple access routes can be taken to reach a certain item of knowledge; it makes the system less vulnerable to damage (Powell, 1981). The distinction between declarative and procedural knowledge clarifies how knowledge is used in the performance of a task (Anderson, 1976, 1982, 1987; Squire, 1981). Procedural knowledge specifies how to do something and is formed by experience. It is therefore necessarily represented in a highly redundant way, and consequently not very vulnerable to brain dysfunctions (Tromp & Mulder, 1991). This is a possible explanation for the absence of a difference at the execution level of the task. In contrast, the redundancy of declarative knowledge can be either high or low, according to its familiarity. In our task this corresponds to the repeated and the non-repeated presentation of a digit string, respectively. The repetition of a digit string could facilitate mental routes, creating memory traces; these become stronger each time they are used. Consequently, frequently activated memory traces are stronger than seldom activated traces. This could explain the observed interaction between the variables ‘repeat’ and ‘trial number’, indicating that, with advancing test trials, the IRI of the repeated digit string decreased in comparison with the IRI of the non-repeated strings.

The task we used is described as a measure of very short-term memory, but it can also be described as a measure of attentional capacity. The fact that attention and memory are intertwined becomes apparent when an attempt is made to assess them. Some investigators may describe a given test as a test of short-term memory, while others may describe it as a test of attention. Digit Span, for example, is usually considered a test of immediate or working memory, but has appeared as a test of attention in other studies (Kaufman, McLean, & Reynolds, 1991; Fowler, Richards, et al., 1987; Spitz, 1972). The close connection between attention and memory is prominent in the concept of working memory (Baddeley & Hitch, 1974). Baddeley (1993) raised the question of whether the term “working memory” might better be replaced by “working attention”. The reason for raising this question was his assumption that the most crucial component of working memory, i.e., the central executive, is concerned with attention and coordination rather than storage. Finally, however, Baddeley (1993) proposed to continue to use the term “working memory” for the system as a whole, as temporary storage is an absolutely essential feature of that system. Given the use of a repeated digit string in our task, we can also say that attention leaves ‘memory’ traces (van Zomeren & Brouwer, 1994). There is overwhelming evidence that the quality of memory traces is largely determined by the amount and type of processing given to the information to be remembered (Craik & Lockhart, 1972; Baddeley, 1990). Items that escape attention cannot be remembered. However, as soon as we pay attention, even with no intention to learn, some information will be retained in what has been called “incidental memory”. The strength of the trace appears to be directly proportional to the duration and the intensity of the attention
given to the material (Russell, 1981). Using the IRI as a measure of processing speed and short-term memory capacity reflects two important dimensions of attention: how fast the attentional system operates, and how much it can process at once. These two are related: the faster a system can process information, the more it can process within a given time. From a neuropsychological point of view, it can be hypothesized that subjects with brain dysfunctions, as Brouwer (1985) suggested for head injury victims, suffer from mental slowness because of a decreased trace strength of the connections between nodes of knowledge and a reduced redundancy of memory representations. Consequently, we expect memory-impaired subjects to have slower IRIs, and we expect their repeated digit string IRI to decrease more slowly in comparison to the normal sample. The results of ongoing studies point in this direction (Lafosse et al., 2000).

Inter-Response times can be applied for both experimental and diagnostic purposes. The experimental use of Inter-Response times was introduced by Chase and Simon (1973) in their work on chess, and replicated by Reitman (1976) in the game Go. In these studies, memory structures were deduced from the Inter-Response times of chess and Go players. The subjects were instructed to reproduce a chess (or Go) pattern and were allowed to look back at the original pattern as often as they liked. It was hypothesized that with each glance at the original pattern, the subject perceived and coded one chunk, then turned to the response board to place that chunk’s constituent elements (Reitman, 1976). When the elements to be learned were preorganized into chunks, subjects recalled the elements in bursts. Pauses between bursts can be used to discover unknown chunks (Bower & Springston, 1970; McLean & Gregg, 1967; Gelfond, 1971; Reitman, 1976). These pauses in recall, measured by Inter-Response times, were used by Chase and Simon (1973) as an indication of the boundaries of the chunks skilled chess players see in the material they perceive and recall. Thus, based on the Inter-Response times, they defined what the memory chunks are and how they are related during chess or Go play. In other words, they measured the implicit or indirect memory retrieval in these games by means of Inter-Response times.

For diagnostic purposes, three characteristics of our paradigm should be emphasized. First, the subject can work at his own pace. In this way, we have the subject rely as little as possible on the sensorimotor skills required for the execution of the task. In this respect our paradigm differs from studies (e.g., Conrad, 1966; Tromp & Mulder, 1991) in which subjects were instructed to perform a task as fast and as accurately as possible. Second, the subjects are not told that their response times are registered or that we are interested in the memory component of the digit copying task. In this sense, subjects’ memory function during the performance of this copying task is measured in a rather implicit way. This has two advantages in the context of clinical neuropsychological assessment. On the one hand, subjects with memory disturbances experience their failing competence less, because we are not explicitly addressing their lost functions. On the other hand, we can measure subjects’ disrupted or unaffected implicit memory capacities. This has important implications for setting up a therapy plan. Third, we demonstrated that the timing characteristics of recall are more applicable for measuring implicit memory retrieval in a digit copying task than reporting the number of digits correctly recalled. In connection with our second remark, we note that response times can be registered implicitly, without explicitly informing the subjects. By means of computerized
assessment, response times can be measured with millisecond precision. Consequently, subtle brain abnormalities affecting processing speed, and minor memory impairments that are difficult to detect using conventional pencil-and-paper tests, can be objectively and accurately measured. This is one of the major advantages of computerized assessment. This does not mean, however, that standard paper-and-pencil tests should be abandoned. Classical and computerized assessment should be considered complementary: each type of test can reveal different aspects of cognitive functioning. In this study, we demonstrated that response times are informative of implicit memory functioning in a digit copying task.

References
Alpherts, W.C.J., & Aldenkamp, A.P. (1990). Computerized neuropsychological assessment of cognitive functioning in children with epilepsy. Epilepsia, 31(suppl 4), 35–40.
Baddeley, A.D. (1993). Working memory or working attention? In A.D.Baddeley & L.Weiskrantz (Eds.), Attention, selection, awareness and control (pp. 153–170). Oxford: Clarendon Press.
Baddeley, A.D., & Hitch, G.J. (1974). Working memory. In G.H.Bower (Ed.), The psychology of learning and motivation (Vol. 8, pp. 47–90). New York: Academic Press.
Bower, G.H., & Springston, F. (1970). Pauses as recoding points in letter series. Journal of Experimental Psychology, 83(3), 421–430.
Brouwer, W.H., & Van Wolffelaar, P.C. (1985). Sustained attention and sustained effort after closed head injury: Detection and 0.10 Hz heart rate variability in a low event rate vigilance task. Cortex, 21(1), 111–119.
Chase, W.G., & Simon, H.A. (1973). Perception in chess. Cognitive Psychology, 4(1), 55–81.
Conrad, R. (1966). Short-term memory factor in the design of data-entry keyboards: An interface between short-term memory and S-R compatibility. Journal of Applied Psychology, 50(5), 353–356.
Conrad, R., & Hull, A.J. (1968). The preferred lay-out for numerical data-entry keysets. Ergonomics, 11(2), 165–173.
Craik, F.I.M., & Lockhart, R.S. (1972). Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior, 11, 671–684.
Fowler, P.C., Richards, H.C., Berent, S., & Boll, S. (1987). Epilepsy, neuropsychological deficits, and EEG lateralization. Archives of Clinical Neuropsychology, 2(1), 81–92.
Glenn, S.W., & Parsons, O.A. (1990). The role of time in neuropsychological performance: Investigation and application in an alcoholic population. Clinical Neuropsychologist, 4(4), 344–354.
Harness, B.Z., Bental, E., & Carman, A. (1977). Comparison of cognition and performance in subjects with organic brain damage and psychiatric subjects. Acta Psychiatrica Belgica, 77(3), 339–347.
Kaufman, A.S., McLean, J.E., & Reynolds, C.R. (1991). Analysis of WAIS-R factor patterns by sex and race. Journal of Clinical Psychology, 47(4), 548–557.
Lafosse, C., Leenen, I., Maes, H., & Vandenbussche, E. (2000). Detecting short term memory problems in a simple digit copying task: New perspectives for computer-based assessment and training. Poster presented at the International Conference ‘Geheugen’, University of Antwerp, 28–30 September, Antwerp, Belgium.
Loring, D.W., & Papanicolaou, A.C. (1987). Memory assessment in neuropsychology: Theoretical considerations and practical utility. Journal of Clinical and Experimental Neuropsychology, 9(4), 340–358.
McLean, R.S., & Gregg, L.W. (1967). Effects of induced chunking on temporal aspects of serial recitation. Journal of Experimental Psychology, 74, 455–459.
Milner, B. (1970). Memory and the medial temporal regions of the brain. In K.H.Pribram & D.E.Broadbent (Eds.), Biology of Memory (pp. 29–50). New York: Academic Press.
Reitman, J.S. (1976). Skilled perception in Go: Deducing memory structures from inter-response times. Cognitive Psychology, 8(3), 336–356.
Spitz, H.H. (1972). Note on immediate memory for digits: Invariance over the years. Psychological Bulletin, 78(3), 183–185.
Stillings, N.A., Feinstein, N.A., Garfield, J.L., Rissland, E.L., Rosenbaum, D.A., Weisler, S.E., & Baker-Ward, L. (1987). Cognitive science: An introduction. Cambridge, MA: MIT Press.
Tromp, E., & Mulder, T. (1991). Slowness of information processing after traumatic head injury. Journal of Clinical and Experimental Neuropsychology, 13(6), 821–830.
Chapter 2
Neuropsychological Assessment of Spatial Memory: Using Object Relocation in Clinical Subjects
R.P.C.Kessels, A.Postma, and E.H.F.de Haan
Psychological Laboratory, Utrecht University, Department of Psychonomics, Heidelberglaan 2, 3584 CS Utrecht, The Netherlands

Abstract
A number of traditional, standardized neuropsychological tests exist to measure impairments in spatial memory. There is, however, evidence that spatial memory fractionates into multiple components, and that dissociations between these components can be observed in patients with neurological deficits. The standard tests for the neuropsychological assessment of spatial memory that are currently available are not sensitive to these sub-processes. Therefore, a new computerized object-location memory task has been introduced. This task entails subtests aimed at three important processes: precise positional encoding, remembering the general layout of a scene in terms of which object was where, and the integration of these two. This task—Object Relocation—was used to study patients with cerebral damage as a result of Korsakoff’s syndrome, stroke or brain surgery, compared to healthy control subjects. The results show that Korsakoff patients performed worse in all conditions, and that right-hemisphere patients were impaired in precise positional encoding, while left-hemisphere patients performed at control level. These findings indicate that the task is sensitive to different patterns of impaired and spared functioning, and can be applied in a clinical population. Normative data will be collected that can be used for clinical-assessment purposes.
Introduction

Memory for Object Locations
Spatial memory enables us to find our way about in our environment and to recall the locations of objects in space (Postma & De Haan, 1996; Kessels, Postma, Wijnalda, & De Haan, 2000). Consequently, if this ability is impaired for some reason, everyday life can be seriously affected. For example, it has been demonstrated that patients suffering from Alzheimer’s disease have problems in spatial memory (Adelstein, Kesner, & Strassberg, 1992), which perhaps account for many behavioral difficulties that are seen in these
patients, such as getting lost and wandering around (Henderson, Mack, & Williams, 1989). Moreover, spatial memory is known to decline as a result of normal aging (Cooney & Arbuckle, 1997). Therefore, psychologists have been trying to develop standardized tests to measure this cognitive construct clinically. Typically, these tests assess memory for object locations, i.e. remembering what is where in our environment. In this paper, the object-location memory tests that are currently available will first be summarized briefly. Next, we will present clinical and control data for a new object-location memory test, Object Relocation, which has some important advantages over the standard spatial-memory tests.

In general, all object-location memory tests are adaptations of procedures that have been used previously in experimental research. Smith and Milner (1981), for example, developed a procedure which they used to study various groups of neurologically damaged patients. Basically, they presented 16 small toys on a table, and prompted the subject to remember the exact location of each object as accurately as possible. Hereafter, the table was cleared and the subject had to relocate the items, either immediately or after a delay of 3 minutes or 24 hours. This procedure proved to be effective in demonstrating deficits in patients with lesions of the hippocampus and the frontal lobe (Smith & Milner, 1981, 1984). However, this paradigm is not very convenient to use in a normal clinical setting, since the administration and scoring of the task are highly complex. This has led to the development of several other neuropsychological tests of memory for object locations, which will be described in brief.

Neuropsychological Spatial-memory Tests
The Misplaced Objects Test (Crook, Youngjohn, & Larrabee, 1990) is a computerized version of a three-dimensional task in which subjects have to learn the location of common objects within a seven-room house (Crook, Ferris, & McCarthy, 1979). The computerized, two-dimensional version displays a house which is divided into 12 rooms using a 3×4 grid. The subject then has to place 20 objects into the grid using a touch-sensitive screen (with a maximum of two objects in one cell), and the location of each object has to be recalled after a delay of 40 minutes. This test has been studied using various age groups and has been correlated with several other neuropsychological tests, such as subtests of the Wechsler Adult Intelligence Scale (WAIS) and the Benton Visual Retention Test (Crook et al., 1990), but no clinical data are available yet.

Another test for object-location memory is the Spatial Location Test. This test was first developed by Grober (1984) and adapted by Sanchez, Grober, and Birkett (1997) for use in clinical settings. Here, 4 or 8 pictures of common objects are presented in different locations on a large card, which has to be studied by the subject. Subsequently, the pictures have to be relocated to their correct positions, which are pre-marked in the relocation phase by empty rectangles (in a grid-like manner, i.e. top left, bottom right). This test correlated strongly with the Mental Status Questionnaire (MSQ) in a group of elderly patients with severe mental disorder (Sanchez et al., 1997). The authors suggest that this test may be useful for assessment in the absence of dementia. However, the patient group used in this study was fairly heterogeneous. Also, data from healthy control subjects are not available.
In the Spatial Array Memory Test (SAMT), two posters containing pictures of 10 abstract geometric figures that are difficult to verbalize are presented to the subject for 60 seconds. Hereafter, the figures have to be relocated, both immediately and after a 15-minute delay (Meador, Meador, Loring, Lee, & Martin, 1990). This test assesses free recall of locations, since the figures have to be relocated in ‘free space’, i.e. without the use of a grid or pre-marked positions. The test has been validated using healthy control subjects (Jue, Meador, Zamrini, Allen, & Loring, 1992; Meador et al., 1990; Meador, Moore, Nichols, Abney, Taylor, Zamrini, & Loring, 1993). Clinical data on patient groups, however, do not exist.

Malec, Ivnik, and Hinkeldey (1991) developed the Visual Spatial Learning Test (VSLT), which has been evaluated using a large sample covering various age groups. Also, data are available from demented patients (Malec, Ivnik, Smith, Tangalos, Petersen, Kokmen, & Kurland, 1992). A grid is used to present seven abstract geometric figures for 10 seconds. Subsequently, these figures have to be recognized among eight distracting items and have to be relocated to their previously occupied positions. A major advantage of this test procedure is that scores can be obtained for object recognition, for recall of correct positions only, and for recall of correctly placed items.

The Brief Visuospatial Memory Test—Revised (BVMT-R) was originally developed by Benedict and Groninger (1995). Six geometric figures are presented in a 2×3 grid for a brief period (10 s). Next, these figures have to be reproduced and placed in the correct position, both immediately and after a 25-minute delay. This pen-and-paper test comes with 6 alternate forms. Also, a yes-no delayed recognition test has been added recently (Benedict, Schretlen, Groninger, Dobraski, & Shpritz, 1996). The BVMT-R has been studied with various age groups (between 18 and 88), but data on clinical populations are not available.

More recently, the Location Learning Test (LLT) has been introduced. Here, ten everyday pictures are presented in a large grid for 30 seconds, and the subject has to relocate them to the correct positions immediately and after a delay of 30 minutes (Bucks & Willison, 1997). Detailed normative data are available for senior healthy subjects (aged between 50 and 96). Also, correlations with the Mini-Mental State Examination (MMSE) and the National Adult Reading Test (NART) have been determined (Bucks, Willison, & Byrne, 2000). Clinical data are available for Alzheimer patients and patients suffering from vascular dementia (Bucks & Willison, 1997).

There are, however, a number of problems with the aforementioned clinical tests for object-location memory. First, five of the six tests use a matrix or grid-like presentation method. This enables the subject to verbally code the positions of the objects: subjects can, for example, count the positions or simply remember that item X is in the bottom-left cell (Postma, 1996; Postma & De Haan, 1996). This can be a serious problem if one is interested in “pure” spatial memory processes. Another problem is that visual memory tests typically rely heavily on visuo-perceptual and visuo-constructional abilities (Heilbronner, 1992). These are factors that cannot be measured with the spatial-memory tests described above.
Moreover, only one test procedure (the VSLT) can be used to assess object recognition memory (regardless of the spatial information) and spatial memory per se (regardless of item information). Most importantly, only the VSLT and the LLT have been validated using both clinical patients and healthy control subjects.
Finally, most tests are aimed at the assessment of dementia, and may be less applicable for other neurological conditions.

Object Relocation
To overcome the aforementioned difficulties, a new test procedure for object-location memory has recently been introduced: Object Relocation (Kessels, Postma, & De Haan, 1999). Object Relocation is a 32-bit computer program, which runs under Windows 95/98 and is designed for the assessment of spatial memory, visuo-spatial construction and perception, and object-recognition memory. The procedure is an adaptation of a previous version developed for experimental purposes by Postma and De Haan (1996). What is important here is that spatial memory is not conceived of as a unitary concept, but as consisting of at least three important mechanisms. First, the processing of precise, metric positional information (i.e., the exact coordinates) is important. Second, object-to-position assignment is involved in the binding of item and location information and in the encoding of relative relations between objects (e.g., “the table is to the left of the window”). Finally, these two mechanisms have to be integrated, since we do not remember location and item information separately in everyday life. Object Relocation can be used to assess these three functions separately. There is converging evidence that these three mechanisms dissociate under experimental conditions (Postma, 1996; Postma & De Haan, 1996). Also, selective sex differences have been found (Postma, Izendoorn, & De Haan, 1998; Postma, Winkel, Tuiten, & Van Honk, 1999). In the present paper, groups of neuropsychological patients are compared to healthy control participants using this spatial-memory test in order to evaluate the clinical applicability of Object Relocation.

Methods

Participants
Three different groups were studied: a group of healthy adults (N=20), a group of alcoholic Korsakoff patients (N=20), and a group with focal cerebral lesions (N=37). In the focal-lesion group, 28 patients had suffered an ischaemic stroke, whereas 9 patients had undergone a tumor resection. The focal-lesion group was subdivided into a left-hemispheric (N=21) and a right-hemispheric group (N=16) in the further analyses. All patients participated in a larger study, the results of which will be published elsewhere. The educational levels of all participants were recorded using seven categories (1 being the lowest education, 7 the highest). Furthermore, each participant completed a Dutch translation of the revised Annett Handedness Inventory (described in Briggs & Nebes, 1975), with scores between −24 (left-handed) and +24 (right-handed).

Apparatus and Procedure
All stimulus displays were presented on a Philips 17A Brilliance 17″ computer monitor, fitted with an ELO AccuTouch 2201 touch-sensitive screen to register the responses.
Various conditions were used. First, in the object-recognition condition, ten everyday objects are presented within a 5×2 grid. Hereafter, the subject has to recognize these objects among ten distracters, regardless of their location. In the visuo-spatial construction condition, two frames were visible on the screen simultaneously. The left frame contained ten different objects at different locations (evenly distributed); the right frame was empty, with the ten objects placed in a row at the top of the screen. The subject’s task was to place the objects in the same positions in the right-hand frame as the corresponding objects in the left-hand frame. Next, the three spatial-memory conditions were presented. In the object-to-position assignment condition, a frame was visible containing 10 different objects at 10 locations. Subsequently, the frame was cleared and the objects reappeared at the top of the computer screen (see Figure 1 for an actual stimulus display). The subject now had to place these objects in their previously occupied positions, which were pre-marked with black dots. In the positional-memory condition, a frame was presented containing ten identical objects. Then the frame was cleared, and the objects had to be relocated as accurately as possible (no pre-marked dots were present; see Figure 2). Finally, in the combined task condition, reflecting the integration process, a frame was presented containing ten different objects. In the relocation phase, the frame was empty and the objects had to be relocated to their previously occupied positions as accurately as possible (without pre-marked dots; see Figure 3). The frame size was 19×19 cm and all displays were presented for a period of 30 s (in the visuo-spatial construction condition the frame size was 15×15 cm, since the two frames had to fit on the screen simultaneously). The objects were all easy-to-name everyday items (approximately 1×1 cm in size). Across all trials, different spatial layouts and object sets were used. Each trial began with immediate testing, followed by delayed testing after 3 minutes. There were no time limits in the relocation phases. Each condition consisted of two subsequent trials. Before the actual testing, practice stimulus displays were presented containing only 4 objects.
Figure 1. Example of a stimulus display (actual size 19×19 cm) of the object-to-position assignment condition in the presentation phase (left) and the relocation phase (right).
Figure 2. Actual stimulus display of the positional-memory condition in the presentation phase (left) and the relocation phase (right).

Figure 3. Presentation phase (left) and the relocation phase (right) for the combined task condition.
Table 1. Schematic overview of the sub-processes, task conditions, and dependent variables (i.e., error measures) of Object Relocation.

| Cognitive function | Task condition | Error measure |
|---|---|---|
| Object-identity memory | Recognition of 10 objects among 10 distracter items | Percentage of false positives |
| Visuo-spatial construction | Precise copying of a display containing 10 objects at 10 different positions | Absolute displacement in mm for the display as a whole |
| Object-to-position assignment | Relocation of 10 different objects at 10 different locations, pre-marked by dots | Percentage of incorrectly located objects |
| Positional memory | Precise relocation of 10 identical objects at 10 different positions | Best-fit score in mm for the stimulus as a whole¹ |
| Integration process | Precise relocation of 10 different objects at 10 different locations | Absolute displacement in mm for the display as a whole |

¹ Here, all objects are identical, which complicates the calculation of the absolute displacement because there is no unique pairing between original and relocated positions. Since the total number of possible pairings is rather large (10! = 3,628,800 for ten objects), the best-fit measure was computed, reflecting the pairing that produced the smallest absolute displacement (in mm) for the stimulus display.
Analyses

The mean score of the two trials within each condition was calculated for each subject. Different error scores were used because of the differences between the various task conditions. In the object-recognition and the object-to-position conditions, the percentage of incorrect objects was determined. In the visuo-spatial construction and the combined conditions, the absolute deviations in mm between the original and the relocated positions were computed for the stimulus display as a whole. Since this was not possible in the positional-memory condition, all objects being identical, a best-fit score in mm was calculated, based on the pairing of original and relocated positions that yielded the smallest error for the stimulus display as a whole (Kessels et al., 1999; Postma et al., 1999; see Table 1 for a schematic overview of the task conditions and dependent variables). Multivariate analyses of variance were performed for each task, and post-hoc Tukey tests were performed in case of overall significance.
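To make the two metric error scores concrete, the following is a minimal sketch (not the authors’ implementation), assuming positions are (x, y) coordinates in mm and that displacements are summed over the objects of one display. SciPy’s linear_sum_assignment solves the pairing problem exactly, without enumerating all 10! candidate pairings.

```python
# Minimal sketch of the two metric error scores (illustrative only).
# Positions are assumed to be (x, y) coordinates in mm; displacements
# are summed over the objects of one display.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def absolute_displacement(original, relocated):
    """Displacement when object identity fixes the pairing (visuo-spatial
    construction and combined conditions)."""
    return np.linalg.norm(original - relocated, axis=1).sum()

def best_fit_score(original, relocated):
    """Smallest summed displacement over all pairings of original and
    relocated positions (positional-memory condition: identical objects)."""
    cost = cdist(original, relocated)         # all pairwise distances in mm
    rows, cols = linear_sum_assignment(cost)  # optimal pairing, no 10! search
    return cost[rows, cols].sum()

# Toy display with three positions (mm)
orig = np.array([[10.0, 10.0], [50.0, 20.0], [30.0, 80.0]])
reloc = np.array([[52.0, 18.0], [12.0, 11.0], [29.0, 83.0]])
print(absolute_displacement(orig, reloc))  # identity pairing
print(best_fit_score(orig, reloc))         # optimal pairing
```

The assignment solver returns exactly the pairing that the footnote to Table 1 describes, in polynomial rather than factorial time.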
Table 2. Mean age, education level, handedness score (and standard deviations) and mean error scores (and standard deviations) on the Object Relocation conditions, plus gender distribution, for the different groups. Cell entries are Mean (SD).

| | Healthy Adults | Korsakoff Patients | Left-Hemispheric Lesions | Right-Hemispheric Lesions |
|---|---|---|---|---|
| Age | 51.70 (8.01) | 48.25 (4.59) | 47.76 (18.24) | 49.22 (12.28) |
| Education level | 4.15 (1.46) | 4.05 (1.23) | 4.24 (1.26) | 4.19 (1.11) |
| Annett Handedness Inv. | 12.50 (17.55) | 16.85 (14.17) | 16.20 (15.92) | 14.47 (15.92) |
| VPC (mm) | 73.76 (20.84) | 126.50 (46.69) | 89.46 (31.31) | 122.88 (53.76) |
| OR – immediate (%) | 3.00 (4.97) | 25.25 (11.29) | 5.71 (7.95) | 20.94 (22.89) |
| OR – delayed (%) | 4.25 (6.34) | 34.25 (13.01) | 6.84 (7.49) | 22.81 (26.77) |
| OTP – immediate (%) | 22.75 (17.20) | 74.00 (17.74) | 35.79 (26.17) | 39.37 (27.62) |
| OTP – delayed (%) | 27.25 (18.10) | 78.75 (14.50) | 37.63 (29.50) | 44.37 (29.83) |
| PON – immediate (mm) | 206.00 (50.58) | 303.98 (30.81) | 230.25 (52.21) | 257.65 (51.98) |
| PON – delayed (mm) | 218.44 (44.56) | 331.30 (44.91) | 263.16 (63.25) | 276.14 (68.23) |
| COM – immediate (mm) | 308.17 (116.22) | 658.13 (152.39) | 423.76 (148.52) | 431.98 (200.50) |
| COM – delayed (mm) | 354.99 (157.70) | 751.75 (148.76) | 434.14 (161.25) | 477.11 (202.40) |
| N | 20 | 20 | 21 | 16 |
| Male:female | 16:4 | 16:4 | 9:12 | 5:11 |

Note. VPC=visuo-spatial construction; OR=object-recognition memory; OTP=object-to-position assignment; PON=positional memory; COM=combined condition.
Results

Table 2 summarizes the results for the four groups. There were no significant differences between these groups with respect to age, handedness or educational level. In the object-recognition task, an overall group effect was found (F(3,71)=16.21, p<0.0005), as well as an effect of delay (F(1,71)=27.34, p<0.0005). Moreover, there was a group×delay interaction (F(3,71)=9.73, p<0.0005). Post-hoc tests showed that the Korsakoff and right-hemispheric groups performed worse than the healthy controls. An overall group effect was also found in the visuo-spatial construction condition (F(3,73)=8.24, p<0.0005); post-hoc tests showed that both the right-hemispheric and the Korsakoff group performed worse than the control group. There was a significant group effect in the object-to-position condition (F(3,71)=19.65, p<0.0005), as well as an effect of delay (F(1,71)=8.88, p<0.005). No delay×group interaction was found. Here, only the Korsakoff group scored lower than the control subjects. Analyses of the results of the positional-memory task showed a group difference (F(3,70)=17.30, p<0.0005) and a delay effect (F(1,70)=23.51, p<0.0005). The interaction was not significant. The Korsakoff and right-hemispheric patients had lower
scores than the healthy subjects. Finally, in the combined condition there was an effect of group (F(3,69)=20.89, p<0.0005) and delay (F(1,69)=31.96, p<0.0005). Also, a significant group×delay interaction was found (F(3,69)=2.84, p<0.05). In this condition, only the Korsakoff patients performed worse than the healthy controls.

Discussion

In this study, groups of patients with neurological damage were compared to healthy volunteers on a task measuring different aspects of memory for object locations. In addition, visuo-spatial construction and object-identity memory were assessed. The results show that patients suffering from Korsakoff’s amnesia perform poorly on all tasks compared to healthy adults. Furthermore, patients with right-hemisphere brain lesions perform worse on object-identity memory, visuo-spatial construction, and metric positional reconstruction compared to healthy subjects. No differences were found between controls and left-hemisphere patients. These results are in accordance with previous neuropsychological findings. For example, there is evidence that the right hemisphere in particular is involved in precise, metric encoding (Kosslyn, Koenig, Barrett, & Cave, 1989). This process is measured in the positional-memory condition of the Object Relocation task. Here, it was found that right-hemispheric patients performed worse than controls, whereas left-hemispheric subjects did not. Impairments were also found in the visuo-spatial construction and object-identity memory tasks in patients with lesions in the right hemisphere. In the object-to-position and combined tasks, however, the right-hemispheric patients performed at control level. The Korsakoff patients performed worse in all memory tasks compared to healthy adults. Moreover, a deficit in visuo-spatial construction was found in this group. These findings are in line with the severe cognitive dysfunctions that patients with Korsakoff’s syndrome display as a result of widespread brain damage (see also Butters, 1985). These data convincingly show that Object Relocation is a sensitive task for the assessment of spatial memory in clinical subjects. All patients were able to understand the test instructions and to perform the task, and they were comfortable using the touch-sensitive screen to relocate the objects after a short practice trial. No apparent floor or ceiling effects were present. An additional advantage of Object Relocation is that it does not require a verbal response from the subject; that is, subjects do not have to be able to name the objects, but only have to relocate them to their correct positions. This is particularly important when testing patients with speech or language disorders, such as aphasia. Furthermore, the aforementioned selective findings provide additional evidence that spatial memory is not a unitary construct, but consists of separate sub-processes. These results corroborate and extend previous findings (Postma & De Haan, 1996; Postma et al., 1998, 1999). The current findings also emphasize that the visuo-spatial construction and perception ability of subjects should definitely be taken into account. Since Korsakoff and right-hemispheric patients display impairments in this ability, this could potentially influence their performance on the spatial-memory conditions. In order to use this task as a clinical assessment tool, normative data need to be collected, that is, data from other neurological patients and from several age groups of
healthy subjects. With these data, it is possible to determine cut-off scores that could be helpful in neuropsychological practice for assessing impairments in spatial memory in individual patients, or for evaluating clinical treatments. Finally, in validating Object Relocation further, it is of interest to correlate this task with other measures of spatial cognition, such as maze learning or spatial working memory as assessed with the Corsi Block-Tapping Test. Also, since discrepancies might exist between performance on computer tasks and on tests measuring real-life aspects of spatial cognition (Stepánková & Ruzicka, 1998), it would be of interest to examine the relation between Object Relocation and real-life spatial tests, such as way-finding or route learning. The aim of the present study was not to examine the relations between precise lesion localization (e.g., parietal vs. temporal) and spatial-memory dysfunction, but rather to look at functional impairments in these groups as a whole. This was done because the cerebral damage in these groups cannot be directly compared. For example, the patients with focal lesions had either an ischaemic stroke or brain surgery, and there are differences between these two types of patients (Anderson, Damasio, & Tranel, 1990). Also, differences exist within these groups with respect to the size of the lesion and etiology. Therefore, it would be interesting for future research to look in detail at the nature of the cerebral dysfunction, for example with the help of brain-imaging techniques, and to correlate these findings with performance on the different conditions of the Object Relocation task.

Acknowledgements

The authors would like to thank Prof. Dr. L.J.Kappelle and Dr. M.J.B.Taphoorn (Department of Neurology, University Medical Center, Utrecht) and Drs. A.J.Wester (Korsakoff Clinic, Vincent van Gogh Institute, Venray) for their help in selecting the patients.

References

Adelstein, T.B., Kesner, R.P., & Strassberg, D.S. (1992). Spatial recognition and spatial order memory in patients with dementia of the Alzheimer’s type. Neuropsychologia, 30, 59–61.
Anderson, S.W., Damasio, H., & Tranel, D. (1990). Neuropsychological impairments associated with lesions caused by tumor or stroke. Archives of Neurology, 47, 397–405.
Benedict, R.H., & Groninger, L. (1995). Preliminary standardization of a new visuospatial memory test with six alternative forms. Clinical Neuropsychologist, 9, 11–16.
Benedict, R.H., Schretlen, D., Groninger, L., Dobraski, M., & Shpritz, B. (1996). Revision of the Brief Visuospatial Memory Test: Studies of normal performance, reliability, and validity. Psychological Assessment, 8, 145–153.
Briggs, G.G., & Nebes, R.D. (1975). Patterns of hand preference in a student population. Cortex, 11, 230–238.
Bucks, R.S., & Willison, J.R. (1997). Development and validation of the Location Learning Test (LLT): A test of visuo-spatial learning designed for use with older adults and dementia. Clinical Neuropsychologist, 11, 273–286.
Bucks, R.S., Willison, J.R., & Byrne, L.M.T. (2000). Location Learning Test—Manual. Suffolk, UK: Thames Valley Test Company.
Butters, N. (1985). Alcoholic Korsakoff’s syndrome: Some unresolved issues concerning etiology, neuropathology, and cognitive deficits. Journal of Clinical and Experimental Neuropsychology, 7, 181–210.
Cooney, R., & Arbuckle, T. (1997). Age, context, and spatial memory: A neuropsychological approach. Aging, Neuropsychology, and Cognition, 4, 249–265.
Crook, T., Ferris, S., & McCarthy, M. (1979). The Misplaced Objects Task: A brief test for memory dysfunctions in the aged. Journal of the American Geriatrics Society, 27, 284–287.
Crook, T.H., Youngjohn, J.R., & Larrabee, G.J. (1990). The Misplaced Objects Test: A measure of everyday visual memory. Journal of Clinical and Experimental Neuropsychology, 12, 819–833.
Grober, E. (1984). Non-linguistic memory in aphasia. Cortex, 20, 67–73.
Heilbronner, R.L. (1992). The search for a “pure” visual memory test: Pursuit of perfection? Clinical Neuropsychologist, 6, 105–112.
Henderson, V.W., Mack, W., & Williams, B.W. (1989). Spatial disorientation in Alzheimer’s disease. Archives of Neurology, 46, 391–394.
Jue, D., Meador, K.J., Zamrini, E.Y., Allan, M.E., & Loring, D.W. (1992). Differential effects of aging on directional and absolute errors in visuospatial memory. Neuropsychology, 6, 331–339.
Kessels, R.P.C., Postma, A., & De Haan, E.H.F. (1999). Object Relocation: A program for setting up, running and analyzing experiments on memory for object locations. Behavior Research Methods, Instruments, and Computers, 31, 423–428.
Kessels, R.P.C., Postma, A., Wijnalda, E.M., & De Haan, E.H.F. (2000). Frontal-lobe involvement in spatial memory: Evidence from PET, fMRI, and lesion studies. Neuropsychology Review, 10, 101–113.
Kosslyn, S.M., Koenig, O., Barrett, A., & Cave, C.B. (1989). Evidence for two types of spatial representations: Hemispheric specialization for categorical and coordinate relations. Journal of Experimental Psychology: Human Perception and Performance, 15, 723–735.
Malec, J.F., Ivnik, R.J., & Hinkeldey, N.S. (1991). Visual Spatial Learning Test. Psychological Assessment, 3, 82–88.
Malec, J.F., Ivnik, R.J., Smith, G.E., Tangalos, E.G., Petersen, R.C., Kokmen, E., & Kurland, L.T. (1992). Visual Spatial Learning Test: Normative data and further validation. Psychological Assessment, 4, 433–441.
Meador, K.J., Meador, A.S., Loring, D.W., Lee, G.P., & Martin, R.C. (1990). Anterograde memory for visuospatial arrays. International Journal of Neuroscience, 57, 1–8.
Meador, K.J., Moore, E.E., Nichols, M.E., Abney, O.L., Taylor, H.S., Zamrini, E.Y., & Loring, D.W. (1993). The role of the cholinergic system in visuospatial processing and memory. Journal of Clinical and Experimental Neuropsychology, 15, 832–842.
Postma, A. (1996). Reconstructing object locations in a 7×7 grid. Psychologische Beiträge, 38, 90–100.
Postma, A., & De Haan, E.H.F. (1996). What was where? Memory for object locations. Quarterly Journal of Experimental Psychology, 49A, 178–199.
Postma, A., Izendoorn, R., & De Haan, E.H.F. (1998). Sex differences in object location memory. Brain and Cognition, 36, 334–345.
Postma, A., Winkel, J., Tuiten, A., & Van Honk, J. (1999). Sex differences and menstrual cycle effects in human spatial memory. Psychoneuroendocrinology, 24, 175–192.
Sanchez, M., Grober, E., & Birkett, D.P. (1997). Dementia in left brain damage. Clinical Gerontologist, 17(4), 13–22.
Smith, M.L., & Milner, B. (1981). The role of the right hippocampus in the recall of spatial location. Neuropsychologia, 19, 781–793.
Smith, M.L., & Milner, B. (1984). Differential effects of frontal-lobe lesions on cognitive estimation and spatial memory. Neuropsychologia, 22, 697–705.
Stepánková, K., & Ruzicka, E. (1998). Object location learning and non-spatial working memory of patients with Parkinson’s disease may be preserved in “real life” situations. Physiological Research, 47, 377–384.
Chapter 3
Developing a Computer-Supported, Collaborative Learning Environment for Argumentative Writing

H.J.M.(Tabachneck-)Schijf, G.Erkens, J.Jaspers, and G.Kanselaar
Utrecht University, Institute of Information and Computing Sciences, Padualaan 14, 3584 CH Utrecht, The Netherlands.

Abstract

In the last ten years the focus on learning as an intra-individual cognitive process has shifted to learning as a social process whereby knowledge is co-constructed as a distributed cognition of the ‘world around us’. Computer-supported, collaborative shared writing environments appear to have much potential as media for the implementation of both technological and human mediation as a flexible source, support and deposit of knowledge during the writing process. Research is needed to examine which didactical, social and cognitive conditions should be implemented in these virtual, shared learning environments to realize the desired goal. We are developing a shared writing environment offering support in planning and organizing argumentative texts, in order to study collaborative writing. The basic environment includes an online discussion module, an information module, a shared writing module and a window for private notes. We intend to extend the basic environment with an idea organizer (the ‘Diagrammer’) and a hierarchical linearizer (the ‘Outliner’), plus an advisor component for each tool (the ‘Advisor’) that will explain and scaffold/structure the ongoing interaction of the students with the tools. These tools are intended to help with pre-planning as well as online planning and revising; students will be asked to update the ‘Diagram’ and ‘Outline’ of their text to reflect changes. We hypothesize that iterative use of the tools in collaborative writing will overcome the documented finding that such tools have a positive effect on pre-planning but little or no effect on the quality of the final paper. Additionally, we will examine relationships between the paired students’ discussion characteristics and the final text product. In this paper we present a description of the intended environment and some theoretical background, as well as a description of the first experiment, in which six experimental groups will be evaluated against a control group.
Introduction

A recent Dutch educational law has transformed the curriculum in the last three years of college preparatory high schools. Among the changes, schools are required to provide
support for students to do increasingly independent research, in order to prepare them better for college studies. Working and learning actively, constructively and collaboratively are seen as important parts of this program. The computer-supported, collaborative writing environment that we are developing is meant to fit within this new program, called “studiehuis” (study house), because the Information and Communication Technology (ICT) involved can emphasize both its constructivist and its collaborative aspects through ICT’s active and interactive nature. We conceptualize writing argumentative texts mainly as a knowledge construction task. In this task, several informational units from internal or external sources must be generated, selected, collected, related to each other, and organized into a consistent knowledge structure. This entails quite a few skills, among them social, cognitive, rhetorical, and cultural ones. Stein, Bernas, and Calicchia (1997) found that argumentation facilitated learning because it involves searching for relevant information and because authors then use each other as a source of knowledge.

Generating and Organizing Conceptually vs. Organizing Linearly

Planning an argumentative text is a type of task in which arguments need to be generated and ordered based on one’s position and on the audience’s needs. Unlike in storytelling, the order of the content of an argumentative text does not inherently follow the order in which events take place (McCutchen, 1987). During preplanning, ideas will probably be generated and organized in a very different manner. Most likely, the organization will be in concepts, for instance, in argument clusters. There is no pre-ordained order in which such argument clusters should be put down in a paper. This depends on the point one wishes to make, and on the audience’s needs. Hence, linearization of the contents is an important part of argumentative writing. This is needed before ideas can be expanded into text (this is also called ‘translating’), and again when a text is reorganized (Levelt, 1989). Research at our department showed that an explicit separation of conceptualization (generating and organizing ideas) and linearization during planning leads to an improvement in the quality of an argumentative text (Coirier, Andriessen, & Chanquoy, 1999). It became apparent that converting the conceptual representation into linear text was a crucial problem for a writer producing argumentative texts. The proposed environment will endeavor to support students in executing these two actions (organizing concepts and linearizing them) with an ICT environment that, besides a collaborative writing environment, offers a structured environment for these activities, as well as advice and help. Computer Supported Collaborative Learning (CSCL) systems are assumed to have the potential to enhance the effectiveness of peer learning interactions (Andriessen, Erkens, Overeem, & Jaspers, 1996). As for the role computers play with regard to education, the focus is on the construction of computer-based, multimedia environments: open learning environments which may give rise to multiple authentic learning experiences. The cooperative aspect is mainly realized by offering computerized tools which can help collaborating students in solving the task at hand (e.g., the CSILE program of Scardamalia, Bereiter, & Lamon, 1994; the Belvédère program of Suthers, Weiner, Connelly, & Paolucci, 1995).
These tools are generally of two kinds: task-related tools and communicative tools. Task-related tools support the performance of the task and the
problem-solving process. Communicative tools give access to collaborating partners, but also to other resources like external experts or other information sources via the Internet. In this respect the function of the program is that of a communication medium. Programs that integrate both functions are generally known as groupware: programs that are meant to support collaborative group work by sharing tools and resources between group members and by giving communication opportunities within the group and to the external world. Offering an ICT groupware environment with a shared workplace with tools for planning, writing and communication has several advantages. First, the students do not necessarily have to coincide in time and place (though in our first experiment they did coincide in time). Second, sources and notes can be presented in the same environment. Third, the effort of constructing and altering text, as well as producing a readable and attractive output, is considerably less in an ICT environment than by hand. Fourth, the collaboration dialogue can take place in a chat window, substantially reducing the amount of noise in the students’ workspace, which may often be a large classroom. Fifth, all of the students’ activities are visually explicit to each other, and by these means negotiable. Students become aware of, and can discuss, complex activities supported by the tools. Sixth, an ICT tool affords rich data capture of the process of writing, ready to be analyzed. Although one can obtain such data from non-ICT collaborative writing by videotaping, getting the data into an analyzable form is a daunting task. A drawback could be that a student who has problems typing is at a disadvantage. A further disadvantage could be impoverishment of social contact, especially if many school tasks were performed long-distance. Collaborating via the computer, however, is socially better than interacting with only the computer.

Preplanning vs. Online Planning

Much prior research has concerned itself with examining the extent and effects of preplanning, that is, the planning activities carried out prior to writing. Our previous research shows that preplanning can have a favorable effect on the quality of the text (Andriessen, Coirier, Roos, Passerault, & Bert-Erboul, 1996). However, it is known that inexperienced writers seldom do any preplanning (Bereiter & Scardamalia, 1987). Moreover, because of a lack of knowledge of the issues involved, when preplanning does occur in novices, it is more likely to be a superficial sort of brainstorming. Torrance, Thomas, and Robinson (1996) found that, for adult undergraduates (relative novices), very little idea generation was based on rhetorical demands during preplanning. Rather, idea generation was more comparable to a simple content-activation model: the terms produced were semantically related to terms in the assignment. Supporting this, the number and originality of ideas in the draft were not correlated with the amount of time spent preplanning. In addition, Bereiter and Scardamalia (1987) found this to be true for children. The ideas in the draft, therefore, must have been mostly planned during the writing itself. This type of planning is called online planning. By online planning we mean the monitoring activities that occur during writing, based on set goals, ideas, expectations and strategies (Van der Pool, 1995). These activities direct the process of knowledge construction during writing.
Online planning activities, unlike preplanning, are generally linked more strongly to the local organization of the text. Preplanning, at least in experts,
is more concerned with global issues like setting goals and determining overall organization and genre. As novices do not do much preplanning, supporting online planning becomes especially important. In the proposed environment, students will not be restricted to using the planning tools only for preplanning. Rather, they will be encouraged to use the tools throughout the writing effort, supporting both preplanning and online planning.

Planning, Writing and Collaboration Dialogues

In prior research, going from the plan constructed in the preplanning processes to writing the actual text was found to be a stumbling block. Kozma (1991) and Scardamalia and Bereiter (1985, 1987) both found positive effects of teaching preplanning on the amount and/or the quality of preplanning, but not on the quality of the written text. In other words, the improvement remained localized to the preplan. As previously mentioned, to get from the preplan to writing the text, two transitions need to take place: the preplan must be translated into proper sentences, and these sentences must be linearized. An advantage of collaborative writing is that reflecting on such transitions becomes a natural process. By writing a shared text, the partners will have to negotiate, in their collaboration dialogue, the generating and organizing of the plan, the linearizing of the planned elements, and the translating into the common text. Unlike when writers work alone, these usually internal processes will need to be verbalized and ideas will need to be made concrete. This negotiation will result in a shared knowledge construction. In research concerning the relation between the problem-solving collaboration dialogues of grade school children and the quality of their solution processes, we found that the better problem solvers were characterized by a higher frequency of three types of coordinating activities in the collaboration dialogue (Erkens, 1997). These types are (1) checking: finding out whether new information is consistent with knowledge constructed earlier, in order to arrive at a common ground (see also Clark & Brennan, 1991); (2) focusing: being aimed at the same sub-problems and subtasks; and (3) argumentation: activities concerned with convincing the other of the relevance and status of information. In other prior research, in which college undergraduates selected arguments and produced an argumentative text while collaborating in an electronic environment, differences in coordinating activities were found to correlate with the representation of the source material. In a task in which the arguments appeared as pictures, more inferences were needed to deduce the usefulness of the information, the students discussed more new arguments in the discussion window, and more new arguments were recorded in the text (Andriessen, Erkens, Overeem, & Jaspers, 1996). One conclusion of this research is that features of the process of knowledge construction in the collaboration dialogue can be related to features of the resulting text. The expectation that more mutual coordinating activities in the dialogue result in a more consistent, shared knowledge structure and in a better mutual problem solution needs further research (see also Baker, 1999). Our ICT tool captures a rich set of process data, enabling us to carry out in-depth analyses.
Research Questions

The issue addressed in the current research is whether supporting the “co-construction” of knowledge in the planning of argumentative texts will lead to more productive coordinating activities like checking, focusing and argumentation in the dialogue between the students, and, ultimately, to better texts. The experiment can also lead to suggestions concerning the ways in which this coordination between partners could be additionally supported. We expect that more support during collaborative planning, using various ICT tools, will lead to better coordination in co-constructive activities and, therefore, to qualitatively better texts. These three research questions will be addressed:

1. What sort of influence does supporting the organization of the plan and its linearization with various ICT tools have on the consistency and coherence of collaboratively written argumentative texts?
2. Is there a relationship between features of the planning process and the transitions to text using the ICT tools, on the one hand, and the coordination in the dialogue of the collaborating partners, measured by the occurrence of checking, focusing and argumentation, on the other?
3. How do activities of knowledge construction differ between the phases of planning, that is, between preplanning and online planning?
Writing Environment

Belvédère: An Early Example of a Networked Environment

An early networked environment developed for testing scientific hypotheses in grade school is Belvédère (see, e.g., Suthers, Weiner, Connelly, & Paolucci, 1995). Belvédère contains WYSIWIS (What You See Is What I See) knowledge-mapping software (comparable to our Diagrammer) and a chatting facility, as well as a ‘computer coach’, comparable to our Advisor. In Belvédère, small groups of children collaborate to support or refute a scientific hypothesis, getting information from an online database, discussing their ideas via an asynchronous chatting facility, and adding to the diagram in the knowledge-mapping window. Belvédère is being used in dozens of schools, reportedly with success. The authors claim as benefits that the software helps children express and reflect on their own knowledge, that the computer coach helps children apply principles of scientific reasoning correctly, that the shared visual workspace coordinates collaborative learning, and that the diagrams make abstract ideas concrete and keep track of work.

The TC3 Writing Environment

We have constructed an Internet-mediated shared writing environment in which students can discuss and collaborate in pairs while writing texts. Our environment, TC3 (Text Composer: Collaborative & Computer-supported), is based on an earlier tool named the CTP environment (Collaborative Text Production; Andriessen, Erkens, Overeem, &
Jaspers, 1996) and is specifically written for collaborative writing. Unlike Belvédère, TC3 supports synchronous work in pairs. The working screen of the program displays several private and shared windows. The basic environment comprises four main windows (see Figure 1):

1. INFORMATION (upper right): The task assignment, the sources and the TC3 operating instructions can be accessed within one private window in a tabbed format. The sources are divided evenly over the students; each has different sources. The content of the sources cannot be copied or pasted.
2. NOTES (upper left): A private notepad in which each partner can make personal, non-shared notes.
3. CHAT (lower left): The lower chat box is for the student’s current contribution, the other for the incoming messages of his or her partner. Every typed letter is immediately conveyed to the partner via the network, so that both boxes are WYSIWIS: What You See Is What I See. The scrollable window holds the discussion history.
4. SHARED TEXT (lower right): A word processor (also WYSIWIS) in which the shared text can be composed while taking turns, and a turn-taking device.

Information in the notes, chat, chat history and shared text can be exchanged via copy and paste functions.
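The paper does not describe TC3’s networking internals, but the WYSIWIS behavior (every typed letter conveyed to the partner as it is typed) can be illustrated with a minimal, hypothetical relay: a small server that accepts both students’ connections and forwards each keystroke byte to the other side as soon as it arrives. The host, port, and one-byte-per-keystroke framing are illustrative assumptions, not TC3’s actual protocol.

```python
# Hypothetical sketch of a WYSIWIS keystroke relay; TC3's actual
# protocol is not described in the paper. HOST, PORT, and the
# one-byte-per-keystroke framing are illustrative assumptions.
import socket
import threading

HOST, PORT = "localhost", 9999

def relay(src, dst):
    """Forward every keystroke from one student's client to the other's."""
    while True:
        key = src.recv(1)      # one byte per keystroke (assumed framing)
        if not key:            # connection closed
            break
        dst.sendall(key)       # the partner sees each letter immediately

def main():
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((HOST, PORT))
    server.listen(2)
    a, _ = server.accept()     # first student's chat client connects
    b, _ = server.accept()     # second student's chat client connects
    threading.Thread(target=relay, args=(a, b), daemon=True).start()
    relay(b, a)                # relay the other direction on the main thread

if __name__ == "__main__":
    main()
```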
Figure 1. The neutral layout of the interface of the basic TC3 collaborative shared composing environment. The middle layout button (on the bottom toolbar) gives a larger
work area for private work (reading sources and making notes); the rightmost button gives a larger area for collaborative work (chatting and writing). The left pane, from top to bottom, holds the private notes, the chat history, the other’s chat input (gray) and one’s own chat input. The right pane holds the information window, with the assignment, sources and directions for use of TC3, and the shared text window. Note that students have either the even-numbered or the odd-numbered sources (bron#; ‘bron’ is Dutch for source). This cues them to the fact that there are sources missing. It is currently not this partner’s turn to write, as his text window is grayed out. On the lower toolbar, of the three buttons to the right of the layout buttons, Markeer will highlight selected text in the information window, Wis will erase the highlight, and Zoek will bring up consecutive highlighted areas across sources. Clicking the Aantal woorden button will produce a word count of the text. Stoppen will stop the program. The stoplights regulate turn taking.

Students are expected to co-write an argumentative text in the shared text window, using the information in the sources and collaborating via the chat. The students can copy material from their notes or the chat to the shared text and vice versa. To discourage excessive quotation, teachers wanted copy and paste from the sources to be disabled. Hence, students are encouraged to use their own words to summarize information from the sources. Collaboration (and reading and summarizing the sources) is encouraged by giving each student different sources. In our practice runs with the basic TC3 environment, students needed very little instruction to get started. It helped that the vast majority of the students had computer experience. Collaboration via the chat appeared to be very natural; students used it for task-oriented as well as for social communication, just like regular face-to-face (f2f)
communication, and typing did not appear to be a problem. Just as in f2f communication, the students’ language is much more colloquial in the chats than in the argumentative text. In general, the windows appeared to be used for what they were intended: for instance, the notes for summarizing the sources and text when it was the other partner’s turn to write. The only exception was that occasionally the text window, being shared, was used for communication. All the sources were generally read, also because partners asked each other about the information they were missing: “Do your sources say anything about…?”. When asked afterwards, most students said they had enjoyed working with TC3. They saw real advantages in not having to be in the same place at the same time to collaborate and in having the resulting paper in a format ready to hand in rather than having to rewrite it ‘neatly’ by hand, and they really enjoyed the chatting. Turn-taking, for the most part, was orderly; most student pairs reported contributing equally to the writing effort and there was little ‘hogging’ of turns. Of course there were some complaints, predictably about the inability to copy from the sources, but also about not having control over their copy of the text while their partner was writing in it. Pairs would have liked to be able to write in the window at the same time (an updating nightmare and a near-impossibility for the programmer) or, lacking that, to at least be able to control their own text window. Students also would have liked more text-processing possibilities in order to lay out their text better. Overall, though, their feedback was very positive. The basic environment will be used in 2000 to get data from a control group. For the experimental groups, two planning tools and an advising module will be constructed. Both plan windows are, like the shared text window, shared and WYSIWIS, and will always display the part being worked on at that time.

Diagrammer (Dia): a tool for conceptualizing (see Figure 2). With this tool, one can generate, organize and relate information units in a graphical knowledge structure (comparable to Belvédère). As in Belvédère, the various labels, shapes and/or colors of the textboxes will encode their role in the argument: position, argument for, argument against, support, refutation, conclusion, and an argument-neutral box for information. Diagrams make abstract ideas concrete and keep track of work. The argumentative structure is made concrete in this tool through several constructs. First, the encoding of the argument role will enable students to see clearly the presence or absence of argument segments and types. For instance, they can then perceive whether they have supported or refuted arguments, whether arguments both for and against have been included and supported, whether their position is clearly written out, and whether the paper or subparts of the paper contain a conclusion. Furthermore, the organization of the textboxes in clusters, connected with arrows or lines, shows whether students have over- or under-emphasized elements of the argument. This also makes it clear which support goes with which argument, and makes it easier to see whether support and refutations have been used appropriately and which arguments remain unsupported and/or unrefuted.
For instance, one can see in the Diagrammer in Figure 2 that two positions have been outlined, that the leftmost position has received the majority of the argumentation, and that the rightmost position is supported by arguments that are themselves unsupported by evidence. We will show students how to use the tool to preplan and to plan online. The tool will be presented to the students as a graphical summary of the information in the paper. Students will be told that the information contained in the Diagrammer has to faithfully represent the information in the text when the text is handed in. We hope that
this requirement will help students to do more online planning, and thereby to notice inconsistencies, gaps, and other imperfections in their texts, and that this may also encourage them to start a review-and-revise session. Each textbox in the Diagrammer will be consecutively numbered for easy reference in the chats.

Outliner (Out): a tool for linearizing (see Figure 2). This tool will aid students in laying out their ideas in a linear, hierarchical format rather than in a semantic format. The textboxes will carry unique, hierarchical numbering (1, 1.1, 1.1.1, etc.) for easy reference and to double-encode hierarchy. They will not carry argument-type information as in the Diagrammer. Similar to the well-known outliner in Microsoft Word™ (MsW), students will be able to move textboxes both horizontally and vertically; subordinate boxes will move along with them. Unlike in the MsW outliner, the boxes are not meant to reflect headers, but rather contain small summaries of arguments. Therefore, students will not be able to ‘expand’ the boxes into text, though they will be able to copy and paste the text in the boxes to the shared text window, their notes window and the chat. The structure inherent in this tool reifies hierarchical depth and complexity. For instance, students will be able to see where their reasoning is too ‘deep’, such as in the bottom structure in the Outliner in Figure 2, and which parts remain to be worked out, such as the second and third structures. This tool will be presented to the student as producing a meaningful outline of the paper and, just as with the Diagrammer, the student will be required to make the information in the Outliner faithfully represent the information in the text when it is handed in. Both the Diagrammer and the Outliner are tools that are much easier to use on a computer. Although they can be implemented with paper and pencil (for instance, with little sticky notes on a large sheet of paper), changes are much more conveniently made on the computer, as is copying and pasting between the different windows. One drawback is the small space currently available on the display for the tools; though the students will be able to make the diagram larger than the window, using scroll boxes, the view of the whole structure may be less optimal than by hand. We consider this a temporary technology problem, however, because the schools we work with currently only have 800×600-resolution monitors, decidedly outdated but the affordable option. In a few years monitors with much higher resolution will be affordable, though schools will probably always run behind in terms of technology.

Advisor (Adv): a program that prompts, advises and asks questions as to the consistency of the knowledge structure and the coherence of the contents of the Diagrammer and Outliner, dependent on the phase in the writing process. For instance, as in Belvédère, the Advisor will be able to remark on ‘missing parts’ and the balance of the structures in the Diagrammer, and on the depth and balance of the structures in the Outliner, in essence teaching students how to ‘read’ the information in the tools, and about the advisability of using the tools throughout the task. The Advisor will also contain a tool-neutral part, giving advice on general writing problems. Both the tool-dependent and the tool-independent parts of the Advisor will react to the students’ individual input and the passage of time, and are therefore eminently suited to implementation in ICT technology.
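The internal representation of the Diagrammer and the Advisor’s rules are not specified in the paper; the following is a hypothetical sketch of how one such ‘missing parts’ check might work. It stores each textbox with its consecutive number, its argument role, and the boxes it points to, and flags arguments that no support or refutation targets. The role names, data structure, and toy diagram are all invented for illustration.

```python
# Hypothetical sketch of one Advisor check on a Diagrammer-style
# structure: flag arguments that no support or refutation box points
# to. Representation, role names, and the toy diagram are invented.
from dataclasses import dataclass, field

@dataclass
class Box:
    number: int    # consecutive number, for easy reference in the chats
    role: str      # 'position', 'for', 'against', 'support',
                   # 'refutation', 'conclusion' or 'information'
    text: str
    links: list = field(default_factory=list)  # numbers of boxes this box targets

def unsupported_arguments(boxes):
    """Return 'for'/'against' boxes with no incoming support or refutation."""
    backed = {n for b in boxes if b.role in ("support", "refutation")
              for n in b.links}
    return [b for b in boxes
            if b.role in ("for", "against") and b.number not in backed]

# Toy diagram: argument 4 lacks support, so the Advisor would remark on it.
diagram = [
    Box(1, "position", "School uniforms should be mandatory"),
    Box(2, "for", "Uniforms reduce peer pressure"),
    Box(3, "support", "Survey data from one school", links=[2]),
    Box(4, "against", "Uniforms limit self-expression"),
]
for box in unsupported_arguments(diagram):
    print(f"Advisor: box {box.number} ({box.text!r}) is not yet supported or refuted.")
```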
The partner whose turn it is will be able to work in any of the three shared windows; the inactive partner in none, though she will have control over the windows that are not actively being used. The basic environment will always be visible; the Diagrammer and Outliner windows may or may not be made visible.
Figure 2. Initial concepts of the interfaces of the TC3 tools Diagrammer (left) and Outliner (right).

It is expected that the effects of the Diagrammer will mainly concern the consistency and completeness of the argumentation of the text (Veerman & Andriessen, 1997). Using the Outliner may result in a better and therefore more persuasive argumentative structure and a more adequate use of linguistic structures like connectives and anaphors (Chanquoy, 1996). We hypothesize that these effects will be strengthened when both the graphical/semantic and the linear/hierarchical organization are supported, and when there is explicit help from the Advisors with pre- and online planning and with the translation between the organizational structures and the linear text.

Experimental Design

Subjects and Materials

In each condition, 12 randomly chosen pairs of students will write two argumentative texts about a given subject. The students will be at the start of their junior year in college preparatory high schools (approx. age 16–17). All the windows applicable to their experimental group will be available to the students at all times. The experiment will be run on the high schools’ PCs, which run Windows 95 and are connected to the Internet. We must unfortunately restrict the resolution of the program to 800×600 pixels because the high schools’ monitors are set to this resolution. The program will be locally installed; we plan to record the data via the Internet on a dedicated server at the University.
Independent Variables

The effects of the tools for collaborative writing will be investigated with the experimental conditions shown in Table 1. A total of seven groups (each of 12 pairs of students) will write argumentative texts with or without one or two of the planning tools and with or without the Advisor component. The produced texts and the chat discussions between students during writing will be analyzed. In a first study, only the control group will write texts, using the basic environment. The research questions for this study will focus on the relationship between planning activities in the chat dialogues and the quality of the resulting texts. The results of this study will be used as a baseline for a second study, in which the use of the planning tools, with and without the Advisor component, will be analyzed. In the second study the main question is how these tools support the planning activities in collaborative argumentative writing and the coordination between the students.
Table 1. Experimental conditions.

| Group (n=12) | Basics | Diagrammer | Outliner | Advisor |
|---|---|---|---|---|
| 1 Control | X | | | |
| 2 Dia | X | X | | |
| 3 Out | X | | X | |
| 4 Dia+Out | X | X | X | |
| 5 Dia+Adv | X | X | | X |
| 6 Out+Adv | X | | X | X |
| 7 Dia+Out+Adv | X | X | X | X |
Dependent Variables

The dependent variables will be:

1. Global measures: the length of the text in number of words and the time taken to finish the text.
2. Various measures of the content of the texts: the quality (consistency/coherence) of the text according to a schema that enables judging complexity in texts (Veerman, 1996). A comparable schema will be used to determine the complexity of the structures constructed in the Diagrammer and the Outliner.
3. Various measures of the content of the protocols: the collaboration protocols (chats) will be analyzed as to the type and frequency of coordinating episodes (checking, focusing and argumentation) using an analysis instrument developed for collaborative dialogues (the VOS system; Kanselaar & Erkens, 1996). They will also be analyzed in terms of
the existence of, and differences in, iterative process loops in the writing and planning phases.
4. Differences in the coordination process (checking, focusing, argumentation) and in process loops during the various planning and writing phases, which will be analyzed using protocol analysis tools.

For the statistical analyses we will use multivariate analyses to test the differences between the conditions and correlational analyses to investigate the relationships between the planning and text products.

Discussion

As we are still in the planning stage, there are no results to discuss. There are, however, a few known prior research outcomes, pitfalls if you will, that may affect this research. Below, we discuss three of them, and how we hope to avoid them or ameliorate the situation.

Pitfall 1: No Effect of Planning on the Quality of the Paper

Some previous research on planning shows an effect of prompts to preplan on the proportion of abstract planning done, but not on the quality of the paper (Kozma, 1991; Scardamalia & Bereiter, 1985, 1987). In fact, Scardamalia and Bereiter (1987) describe a whole series of studies in which children were taught or prompted to use separate skills thought to underlie good writing, many of which showed good local results (i.e., the use of the taught or prompted skills improved). However, there was little or no effect on the quality of the finished text. Several reasons may account for this, among them the following three:

1. Having learned how to plan may not mean that one knows how to translate that plan into text.
2. The plan is (largely) ignored when the students actually write the text. Torrance, Thomas, and Robinson (1996) found that only half the ideas generated in preplanning were used in the text.
3. Novice preplanning is not the effective, goal-setting behavior of experts, but rather a superficial sort of brainstorming, not much more than simple content activation. Torrance, Thomas, and Robinson (1996) found that the number and originality of ideas in novice drafts were not correlated with time spent preplanning. Furthermore, Bereiter and Scardamalia (1987) also showed a difference in planning quality between adults and children: namely, that while adults considered constraints when planning, 5th-grade students did not, and 10th-grade students did so only to a very small extent.

Avoiding Pitfall 1

We will address this pitfall in two ways. First, we will request that students update the contents of the planning tools during text writing, and require an update when they feel
they are done. Updating should focus students on attending to the differences between the text and the contents of the planning tool. The plan representations will help to make certain problems with the text noticeable while updating the plan, and may thus initiate a revision session. Second, by requiring two students to collaborate, reflection on the translation will be a naturally occurring process. In summary, they will have to translate their thoughts explicitly into words and justify their decisions to their partner.

Pitfall 2: Effects of Time-on-Task Explain Away Desired Effects

Hayes and Nash (1996) point out that some experimental studies on planning show a positive effect of preplanning on the quality of the paper, but that when time-on-task is removed as a source of variability, that correlation vanishes.

Avoiding Pitfall 2

Ideally, this would require an experimental design whereby time-on-task is kept equal for all students. This is difficult to do, as the task the students will be doing is a regular school task, and we cannot force an equal time-on-task. Probably, the students in the experimental groups that use the planning modules will require more time. As we will be tracing their work progress, we will at least be able to measure the time spent on different processes. Even without planning tools, any planning should become visible in the chats. This should make it possible to attribute any improvements in the text either to time (if the processes used are the same, but there are more of them) or to a difference in process (if different, additional processes are used, requiring extra time).

Pitfall 3: Writing Good Papers is Correlated with Good Planning, but a Third Variable Also Explains the Correlation

Hayes and Nash (1996) point out that although some studies show a favorable effect of preplanning on the quality of the text, these effects are correlational in nature (e.g., Spivey & King, 1987; Nelson, 1988). Correlational studies have the problem of assignment of cause: for instance, does good planning cause a good paper to be written, is it vice versa, or is there some third variable that causes both or that causes the effect instead? Hayes and Nash (1996, p. 47) cite a study by Ruth Lebovitz (no reference; personal communication to Hayes and Nash) in which one group (n=50) planned to write on topic A and another group (n=55) planned to write on topic B. After planning, half of each group was asked to write on A and half on B. It turned out that good planning was indeed correlated with good writing, but that it did not matter at all whether students had done the planning on the topic they wrote on or not.

Avoiding Pitfall 3

First, our study is an experimental design, and we are recording and measuring both the product and the process. Of course, it is never possible to exclude the existence of extra, non-measured variables completely, especially in a study conducted in a natural environment. However, by analyzing in detail the planning processes in several phases of
the writing assignment, we may be able to assign cause with a bit more certainty. By running process analyses we hope to get better insight into the direct effects of planning activities on writing itself. For our correlational measures, of course, cause cannot be assigned. Second, Hayes and Nash suggest a more holistic approach. They note that although it is true that expert writers plan more, they also spend more time drafting, revising, translating and doing library research, suggesting that it is something more high-level that they do better, and that focusing on just improving one process like planning may not do much good. We feel that our process study approaches a holistic design; although our experiment focuses on planning aids, these tools can just as well be used to aid conceptual revising and translating from concepts to text. It is in fact difficult to draw a hard line between abstract-content online planning and abstract-content revising, as the two go hand in hand and may well be parts of the same process, as Hayes and Nash suggest.

References

Andriessen, J., Coirier, P., Roos, L., Passerault, J.M., & Bert-Erboul, A. (1996). Thematic and structural planning in constrained argumentative text production. In H.van den Bergh, G.Rijlaarsdam, & M.Couzijn (Eds.), Theories, models and methodology in writing research (pp. 237–251). Amsterdam: University Press.
Andriessen, J.E.B., Erkens, G., Overeem, E., & Jaspers, J. (1996, September). Using complex information in argumentation for collaborative text production. Paper presented at the First Conference on Using Complex Information Systems (UCIS’96), Poitiers, France.
Baker, M. (1999). Argumentation and constructive interaction. In J.E.B.Andriessen & P.Coirier (Eds.), Foundations of argumentative text processing (pp. 179–203). Amsterdam: University Press.
Bereiter, C., & Scardamalia, M. (1987). The psychology of written composition. Hillsdale, NJ: Lawrence Erlbaum Associates.
Chanquoy, L. (1996, October). Connectives and argumentative text: A developmental study. Paper presented at the First International Workshop on Argumentative Text Processing, Barcelona, Spain.
Clark, H.H., & Brennan, S.E. (1991). Grounding in communication. In L.B.Resnick, J.M.Levine, & S.D.Teasley (Eds.), Perspectives on socially shared cognition (pp. 127–150). Washington, DC: American Psychological Association.
Coirier, P., Andriessen, J.E.B., & Chanquoy, L. (1999). From planning to translating: The specificity of argumentative writing. In J.E.B.Andriessen & P.Coirier (Eds.), Foundations of argumentative text processing (pp. 1–29). Amsterdam: University Press.
Erkens, G. (1997). Coöperatief probleemoplossen met computers in het onderwijs: Het modelleren van coöperatieve dialogen voor de ontwikkeling van intelligente onderwijssystemen [Cooperative problem solving with computers in education: Modeling of cooperative dialogues for the design of intelligent educational systems]. Unpublished doctoral dissertation, Utrecht University, Netherlands.
Hayes, J.R., & Nash, J.G. (1996). On the nature of planning in writing. In C.M.Levy & S.Ransdell (Eds.), The science of writing (pp. 29–55). Mahwah, NJ: Lawrence Erlbaum Associates.
Kanselaar, G., & Erkens, G. (1996). Interactivity in cooperative problem solving with computers. In S.Vosniadou, E.DeCorte, R.Glaser, & H.Mandl (Eds.), International perspectives on the design of technology-supported learning environments (pp. 185–202). Mahwah, NJ: Lawrence Erlbaum Associates.
Kozma, R.B. (1991). The impact of computer-based tools and embedded prompts on writing processes and products of novice and advanced college writers. Cognition and Instruction, 8, 1–27.
Levelt, W.J.M. (1989). Speaking: From intention to articulation. Boston, MA: Bradford Books, MIT Press.
McCutchen, D. (1987). Children’s discourse skill: Form and modality requirements of schooled writing. Discourse Processes, 10, 267–286.
Nelson, J. (1988). Examining the practices that shape student writing: Two studies of college freshmen writing across disciplines. Unpublished doctoral dissertation, Carnegie Mellon University, Pittsburgh, PA, USA.
Pool, E.van der (1995). Writing as a conceptual process: A text-analytical study of developmental aspects. Unpublished doctoral dissertation, Catholic University Brabant, Netherlands.
Scardamalia, M., & Bereiter, C. (1985). The development of dialectical processes in composition. In D.Olson, N.Torrance, & A.Hildyard (Eds.), Literacy, language and learning: The nature and consequences of reading and writing (pp. 307–329). New York: Cambridge University Press.
Spivey, N.N., & King, J.R. (1987). Readers as writers composing from sources. Reading Research Quarterly, 24, 7–26.
Stein, N.L., Bernas, R.S., & Calicchia, D. (1997). Conflict talk: Understanding and resolving arguments. In T.Givón (Ed.), Conversation: Cognitive, communicative and social perspectives (pp. 123–144) (Typological Studies in Language, Vol. 34). Amsterdam: John Benjamins.
Suthers, D., Weiner, A., Connelly, J., & Paolucci, M. (1995). Belvedere: Engaging students in critical discussion of science and public policy issues. In J.Greer (Ed.), Artificial intelligence in education (pp. 266–273). Charlottesville, VA: AACE.
Torrance, M., Thomas, G.V., & Robinson, E.J. (1996). Finding something to write about: Strategic and automatic processes in idea generation. In C.M.Levy & S.Ransdell (Eds.), The science of writing (pp. 189–205). Mahwah, NJ: Lawrence Erlbaum Associates.
Veerman, A.L. (1996, October). Argumentation during solving ill-structured problems. Paper presented at the First International Workshop on Argumentative Text Processing, Barcelona, Spain.
Veerman, A.L., & Andriessen, J.E.B. (1997, September). Academic learning and writing through the use of educational technology. Paper presented at the conference on Learning & Teaching Argumentation, Middlesex University, London, England.
Chapter 4
A Multi-Media Computer Program for Training in Basic Professional Counseling Skills
J.Adema and K.I.van der Zee
Sectie Persoonlijkheids- en Onderwijspsychologie, Vakgroep Psychologie, Grote Kruisstraat 2/1, 9712 TS Groningen

Abstract

This paper concerns the development of a self-instructional program for training in basic counseling skills. The product was a multimedia computer program named GEVAT. The training under consideration was based on a traditional training in which students enhance these skills under supervision. The theoretical foundations that underlie the training in counseling skills are Bandura’s social learning theory (1977) and Ivey’s (1971) microcounseling method. In this paper the potential advantages of self-instructed education are described, and empirical evidence suggesting that self-instruction may improve the quality of education is cited. After describing the contents of the training, we explain which elements of the training can be transformed into self-instruction, and why. In addition, the development of the computerized self-instructional program GEVAT is described. For each skill, theory, videotaped role models and exercises are provided. At the end of this paper, results are presented from a first evaluation study that focused on students’ reactions to the program and on the effects of the training on their skill levels. In general, the program was evaluated positively.
Introduction

Counseling constitutes a vital part of the work of most practitioners in the field of psychology, and education in the necessary skills is therefore considered very important. For this reason, the Psychology curriculum at the University of Groningen incorporates a number of courses aimed at teaching students basic and more advanced communication skills. For many years, psychology students have been intensively trained within the tradition of the ‘cumulative micro-counseling method’ (Lang & Van der Molen, 1992). With the recent expansion of information and communication technology, the question arose whether it would be possible to use modern techniques to enhance and modernize skill training. Therefore, in 1996 the Department of Psychology of the University of Groningen started a project aimed at developing computerized multimedia programs for training professional skills. These training programs were designed in self-instructional form, that
is, students could work through the materials independently, without supervision. In this way students were stimulated in their independence, and teacher input could be applied more efficiently in those parts of the training where it was indispensable. In this paper, the development and implementation of GEVAT (GEspreksVAardigheidsTraining), a computerized multimedia program that was developed for a basic course in counseling skills, will be discussed. The chapter starts with a description of the theoretical background of the basic training in counseling skills. Next, the possible advantages of implementing modern multimedia are discussed, followed by an overview of the possibilities and limits of self-instruction in basic skill training. Then the design of the traditional training in counseling skills is described, followed by a detailed overview of the development of the self-instructional computer program. The chapter ends with a short evaluation of the computer program.

Basic Training in Counseling Skills

The basic training in counseling skills for which a self-instructional program was developed was based on Bandura’s social learning theory (1977) and Ivey’s (1971) microcounseling method. Three principles are essential to Bandura’s social learning theory. First, Bandura states that learning occurs through modeling: in order to learn new behaviors, one has to observe others performing those behaviors in an effective way. Second, Bandura argues that vicarious learning is not enough: in order to perform the observed behaviors effectively, one has to practice them. Finally, in order to prevent extinction of the acquired behaviors, they have to be reinforced, that is, followed by positive feedback. The concept of self-efficacy is considered an essential feature of social learning. An important function of modeling is to enhance our feeling of self-efficacy, that is, our belief that we are able to perform specific types of behavior effectively, which in turn enhances subsequent skill acquisition. Essential in Ivey’s method, which is also partly based on social learning theory, is that complex social behaviors are learned through separate skills. A skill is defined as a meaningful, distinguishable unit of behavior. Examples of counseling skills are asking questions, reflecting feelings and summarizing what the other has said. These skills are trained in an isolated way. Although this seems a good method for teaching counseling skills, it has two important disadvantages. First, in the microcounseling method, students practice skills without knowledge of the purposes of the exercises and without having seen models. This is in conflict with social learning theory, which heavily emphasizes that vicarious learning should precede skill practicing. Second, the disadvantage of practicing skills in isolation is that students do not learn how to apply skills in complex counseling situations (Lang & Van der Molen, 1992). Therefore, Lang and Van der Molen (1992) developed the cumulative microcounseling training (CMT) method, which was the foundation of the basic training in counseling skills. The CMT consists of theory, videotaped models, exercises, role-playing and feedback. In addition to the fact that theory and models precede role-playing, an important difference with the microcounseling method is that CMT is characterized by a gradual increase in complexity during role-playing. Students begin by practicing one skill, and a new skill is added in each succeeding session.
Although the focus in a role-playing session lies on the new skill,
trainees are asked to integrate all skills treated up to that moment (Van der Molen, Smit, Hommes, & Lang, 1995). There is strong evidence for the effectiveness of CMT in enhancing communication skills (Smit, 1995; Smit & Van der Molen, 1996). Moreover, students seem to appreciate training in counseling skills according to this method very much.

Advantages of Multimedia

In the previous section, the importance of role models in skills training was stressed. For years, the observation and discussion of videotaped models have been an important element of training. Traditionally, these models were displayed through a video recorder in the classroom. The theory was presented separately in a textbook that was provided to the students. With the recent expansion of information and communication technology, the question arose whether it would be possible to improve and modernize this method of presentation. By integrating the videotaped models together with theory and exercises into a computer program, students can now work through the materials in an integrated way. They are able, for example, to watch a video fragment and, at the same time, to push a button to get the theory behind it on their computer screen. Moreover, it is possible to provide the students with interactive feedback in exercises. In addition, it seems that the use of modern techniques makes education more diverse and more attractive, which can motivate students and make them more dedicated to their task (e.g., Gastkemper, 1984). Interestingly, in educating practitioners in psychology, interactive computer programs may serve another important function. Computer simulations of counseling interviews with videotaped clients enable students to practice skills that, for ethical reasons, cannot be practiced on real clients. Working with realistic client problems will probably enhance the chances of transfer of what is learned to the job situation (Simons & Verschaffel, 1992; Travers, 1970). Practicing on fellow students, by contrast, will probably present them with limited problems in a less realistic setting.

Why Self-Instruction?

In the previous section, some advantages of implementing multimedia in training counseling skills were discussed. The present section focuses on the motives for developing a self-instruction program for this training course. A self-instruction program is a program that students can work through independently, without supervision. In the traditional training situation, students were trained by teachers and well-trained advanced students, who clarified the theory, discussed videotaped role models and exercises, and supervised during role-playing. Self-instruction, by contrast, means contact-independent instruction, in which instruction takes place via the study material. What reason is there to replace the teacher with self-instruction? It appears that self-instruction has several advantages. With the present expansion of knowledge, there is a tendency to move away from pure transfer of knowledge towards the cultivation of students’ self-learning abilities (e.g., Peng, 1989) and a more active learning attitude. A related advantage of self-instructional training is that self-instruction demands more self-
responsibility and independence from students, which in turn can influence motivation and achievement (e.g., Light, 1990). A second advantage of self-instruction is that, by reducing teacher-assisted instruction time, valuable in-class time can be used, for example, to provide more personal feedback during exercises. In this way, it is possible to make trainer input more efficient (Robinson & Kinnier, 1988; Van der Zee, Lang, & Adema, 1997). A third advantage of self-instructional training is that education can take place independently of place and time (Boonstra, 1997; Budd, 1987; Van Hout Wolters & Willems, 1991; Van der Perre, 1997). Consequently, learning paths can be adjusted to the level and pace of individual students (Boonstra, 1997; Van Hout Wolters & Willems, 1991). In this way, students who acquire the various skills easily experience less hindrance from weaker students than was the case in the traditional training situation, and they can work through the materials at a higher level. It is also possible to add extra exercises for those students who experience difficulties with the materials. This may enhance the quality of education for all students. Fourth, instruction becomes more consistent. Intra-personal variation in teacher performance, as well as variation between teachers, is reduced to a large extent by presenting a standard program (Budd, 1987; Kanselaar, 1986). Last but not least, cost-effectiveness is often a ground for implementing self-instruction (Antonides & Kokhuis, 1987; Budd, 1987). Since the late 1970s, there has been an increasing demand for self-instructional training modules in counselor education (Cormier & Cormier, 1976; Fuhrmann, 1978; Hector, Elson, & Yager, 1977; Robinson & Kinnier, 1988). Several studies have revealed that self-instructed education in general, and self-instructed skill training in particular, can be effective. With respect to the former, for example, a meta-analytic study by McNeil and Nelson (1991) showed that interactive video instruction can be an effective form of instruction. Budd (1987) argued that self-instructed education is at least as effective as, and in many cases more effective than, traditional instruction in the classroom. With respect to the effectiveness of self-instruction in teaching communication skills, research by Rosenthal (1977) showed that both a self-instruction approach and a standard implementation of the Structured Learning Technique were effective in teaching confrontation skills. The Structured Learning Technique includes systematic and sequential application of modeling, role-playing, reinforcement and transfer training (Goldstein, 1973). Robinson and Kinnier (1988) compared self-instruction training with traditional training in counseling skills and found the self-instruction training to be as effective as the traditional teacher-assisted training method. Mason, Barkley, Kappelman, Carter, and Beachy (1988) developed a self-instruction video aimed at improving medical students’ communication skills. These authors found a significant improvement in comparison to a no-treatment control condition. Although self-instructed training seems an effective method for skill training, one could argue against relying completely on self-instructed training materials. McNeil and Nelson (1991), for example, found that interactive video was significantly more effective when it was used in combination with traditional education methods than when it was the only method used.
Goldman, Wade, and Zegar (1974) state that self-instruction methods are more effective when they are integrated into a well-structured program with deadlines, compulsory attendance, and the obligation to work through all parts of the program. In a similar vein, Mueller (1974) found that students themselves rated additional guiding materials supplied in self-instructional packets, such as homework exercises, answers to these homework
exercises, mimeographed handouts and practice tests, as very helpful. Roosendaal and Vermunt (1996) likewise warn against relying completely on contact-independent instruction. To summarize, self-instructed education seems to have important advantages, and empirical evidence suggests that it may improve the quality of education. However, it seems unwise to rely completely on this mode of instruction. In developing GEVAT it was therefore decided not to integrate all elements of the training into the computerized program. Some parts of the original training kept their original form, as will be discussed in the next section, in which the computerized multimedia program GEVAT is described.

GEVAT

As was stated previously, the traditional training was centered on basic counseling skills. These skills were: paying attention, subtle stimulation, asking questions, paraphrasing, reflecting feelings, concretizing, summarizing, opening an interview, clarifying the situation, thinking aloud and ending an interview. Each skill is introduced and practiced in a standardized way. The training sessions contain four basic elements: a) theory; b) videotaped examples of effective and ineffective application of the focal skill; c) skill exercises; and d) role-playing. In designing GEVAT it was decided to integrate elements a, b and c into the computer program and to use the supervised group sessions for role-playing and feedback. First, with respect to theory, this part contains two components, namely preparing for the training by reading a textbook about counseling skills and refreshing parts of the theory previously studied during the course. In the program, an attempt was made to present and clarify the learning materials and to direct the learning process by providing guidance and structure in the materials. This was done, for example, by imposing a certain order and step size in the explanation of the learning materials and by providing summaries, exercises, revisions and questions in the text (for an extensive overview, see Adema, in preparation; see also Van Hout Wolters & Willems, 1991; Lorch & Lorch, 1995; Van Parreren & Peeck, 1974; Ploeger, 1987; Raaijmakers, 1984). Second, the videotaped models of ineffective and effective skill use (see Figure 1) are aimed at providing knowledge of, and insight into, the skill to be learned and at showing the effects of effective and ineffective skill use. Because video materials are more easily processed than written materials, students may pay less attention to information that is provided through video, which may affect their learning process negatively (Cennamo, 1993; Krendl, 1986; Salomon, 1984; Salomon & Leigh, 1984). The computer program GEVAT therefore contains a number of questions forcing the students to observe the videotaped models in a systematic way. An advantage of having the video examples accompanied by concrete exercises is that these guide students’ observational learning. There is evidence
that students who watch video materials under explicit learning instructions apply more mental effort and show higher performance than students who watch the same materials under an instruction to watch them for fun (Field & Anderson, 1985; Krendl & Watkins, 1983; Salomon & Leigh, 1984).

Figure 1. GEVAT: the element modeling.

The presence of explicit goals (Britton, Glynn, Muth, & Penland, 1982) and embedded questions (Britton, Piha, Davis, & Wehausen, 1978) affect the amount of mental effort positively. Therefore, in GEVAT, students are asked to evaluate the counselor’s actions in terms of (in)effective skill use. When they have finished this task they receive detailed feedback from the program, which is considered sufficient to make students self-confident and to prevent them from becoming insecure and in need of the trainer’s help (see also Adema, in preparation; Van Hout Wolters & Willems, 1991; Vermunt, 1992). The third element concerns practicing skills in an isolated way in well-structured exercises. The exercises are intended to build competence in the adequate application of the skill that is at the center of attention. Students are presented with video vignettes, in which a fragment is shown of a client presenting his problem to the student (see Figure 2). The student has to adopt the role of counselor and has to react to this client (for example, by reflecting the feelings of the videotaped client). A response box is presented on the screen, in which students can type what they would say
to the client. The program provides feedback to the student by presenting one or more right answers (suitable responses) on the screen, as soon as students have indicated that they have finished typing. The pre-programming of feedback, which seemed important for regulating the learning process and also for keeping students motivated, was difficult to realize: the reactions of students are not always predictable, which makes it impossible to anticipate all individual reactions to the exercises. This problem was handled by providing students not only with the right answers, but also with a list of possible false answers (unsuitable responses) that included feedback explaining in what way aspects of counseling skills had been wrongly applied. By explaining why certain answers are inadequate, the program attempts to make the student reconsider his or her own answer. The idea is that students will be more capable of judging the quality of their own answers if they comprehend the principle upon which the feedback is based.

Figure 2. GEVAT: video exercise for practicing skills in an isolated way.
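To make this mechanism more concrete, a minimal sketch follows, written in Python purely for illustration. It is emphatically not the actual GEVAT implementation, which was built as an interactive shell in Visual Basic (see below); the exercise data, the keyword-overlap heuristic and the answer database are illustrative assumptions only.

import sqlite3

# Sketch of the exercise/feedback idea described above: each exercise pairs
# a video vignette with suitable responses and with typical unsuitable
# responses, each of the latter carrying an explanation of what went wrong.
# All names, data and the matching heuristic are assumptions.
EXERCISE = {
    "skill": "reflecting feelings",
    "video": "client_vignette_03.mpg",  # hypothetical file name
    "suitable": [
        "You sound quite disappointed about how the meeting went.",
    ],
    "unsuitable": {
        "Why didn't you just tell your boss?":
            "This is a closed, leading question, not a reflection of feeling.",
        "You should prepare better next time.":
            "This gives advice instead of reflecting the client's feeling.",
    },
}

def give_feedback(student_answer: str) -> str:
    """Show the model answers; if the typed answer resembles a known
    unsuitable response, add the explanation of why it is inadequate."""
    lines = ["A suitable response would be:"]
    lines += [f"  - {s}" for s in EXERCISE["suitable"]]
    for wrong, why in EXERCISE["unsuitable"].items():
        # Crude word-overlap heuristic, purely for illustration.
        overlap = set(student_answer.lower().split()) & set(wrong.lower().split())
        if len(overlap) >= 3:
            lines.append(f"Note: your answer resembles '{wrong}'. {why}")
    return "\n".join(lines)

def log_answer(db: sqlite3.Connection, student: str, answer: str) -> None:
    """Store the answer so a teacher could review it later (the text below
    mentions making such a database available via the intranet)."""
    db.execute("CREATE TABLE IF NOT EXISTS answers (student TEXT, skill TEXT, answer TEXT)")
    db.execute("INSERT INTO answers VALUES (?, ?, ?)",
               (student, EXERCISE["skill"], answer))
    db.commit()

if __name__ == "__main__":
    db = sqlite3.connect("gevat_answers.db")
    answer = "You should prepare better next time."
    print(give_feedback(answer))
    log_answer(db, "student_042", answer)

A real implementation would of course need a far richer way of relating free-text answers to the pre-programmed responses; that is precisely the difficulty with anticipating students’ reactions described above.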
The goal of the final training element, role-playing, is attaining proficiency in the application of counseling skills and being able to choose the appropriate skill. It is difficult to simulate real interactions between a counselor and a client in computer programs. Not all reactions of the counselor can be pre-programmed, and techniques to capture the non-verbal behavior of the practicing counselor (the student) have not yet been fully developed. Moreover, the course of a role-play is unpredictable, and it is therefore impossible to develop pre-programmed feedback. Therefore, it was decided to perform role-playing in traditional supervised classroom sessions. By having the trainer supervise the role-plays, it is possible to reduce insecurity, to motivate students and to monitor the quality of education.
In summary, the elements of the computer program consist of theory, videotaped models and short skill exercises. The computer program starts with an overview of the skills that are trained in the program. Students can select a skill from the menu and, in the next screen, the training element they want to start with. The program recommends going through it in a specific order; this recommended order has been proven effective in traditional skill training (Bögels & Kreutzkamp, 1990; Smit, 1995; Smit & Van der Molen, 1996; Van der Molen, Smit, Hommes, & Lang, 1995). After carrying out an exercise, the student immediately receives feedback from the program. It is possible to make the program save students’ answers in a database. This database can be made available to the teacher through the intranet or Internet. In this way the teacher can monitor students’ learning processes, and extra feedback can be provided to the student. The video materials were developed in cooperation with a professional video company and with (semi-)professional actors. The video material has been digitized with MPEG encoder hardware and software and has been incorporated in an interactive shell that was developed in Visual Basic.

Evaluation of the Computer Program

In 1999 the computer program was implemented in the Psychology curriculum. In a large-scale evaluation study, both students’ reactions to the program and the effects of the training on their skill levels were evaluated. In total, 131 second-year students of Psychology completed an evaluation form, which contained questions concerning the instructiveness and usefulness of the program and their attitudes towards elements of the program. In general, the computer program was evaluated as instructive (Table 1). In addition, students had experienced few difficulties in completing the program independently, i.e., without supervision. Moreover, according to the students, the goals of the different elements of the training had been realized through the program. In particular, the short interactive video scenes that were followed by feedback, including feedback on typical inadequate reactions, were evaluated as instructive, useful and pleasant to do. This is an interesting finding because, in developing the program, anticipating the students’ answers to these exercises had proven difficult. Students also had the opportunity to write down clarifying remarks on the evaluation form. Some students, although they considered the feedback in the program in general very useful, wrote that they desired more, better or more personal feedback from a professional supervisor. Interestingly, other students remarked that the feedback was not always sufficient, but that this stimulated mutual discussion, which was considered very useful. According to the students, the role of the trainer ought to be discussing exercises, clarifying uncertainties, answering questions, giving feedback and providing instructions on the use of the computer program. The combination of the computer program with role-playing sessions was evaluated positively. Congruent with our own point of view, some of the students explicitly remarked that role-playing was an essential part of the training that could not be replaced by the computer.
Table 1. Summary of the results of the evaluation form*.

Item                                                                          M     Sd
Program instructiveness                                                      3.44   .81
Program pleasantness                                                         3.10   .97
Program easiness                                                             3.52   .71
Program pace                                                                 2.65   .70
Program provided insight in counseling                                       3.51   .85
Feedback in program useful                                                   3.84   .98
Program independently executable (without trainer)                           4.27  1.04
Evaluation of theory                                                         3.70   .67
Evaluation of inadequate model                                               3.93   .92
Evaluation of adequate model                                                 4.06   .66
Evaluation of video exercises for practicing skills in an isolated way       3.71   .76
Evaluation of written exercises for practicing skills in an isolated way     3.57   .88
Pleasantness of video exercises for practicing skills in an isolated way     3.73   .79
Usefulness of video exercises for practicing skills in an isolated way       3.90   .86
Instructiveness of video exercises for practicing skills in an isolated way  3.84   .81

Note: for all items, 1 is ‘low’ and 5 is ‘high’.
* It must be noted that initially some technical problems were encountered, which made the program sub-optimal. This may have influenced the results negatively.
Moreover, the effects of the training on students’ actual skill levels were examined. Pre- and post-training performance was measured both through video tests and role-play exercises. Preliminary analyses of the effects of the training on students’ actual skill levels suggest that the training is effective in skill enhancement and at least as effective as the traditional training (see Adema, in preparation).

Conclusions

To conclude, in general the program was evaluated positively. At this moment, the effects of the training program on students’ actual skill levels are being examined in the context of a larger research program. Preliminary analyses suggest that, when compared to the traditional training, the self-instruction training is equally effective. Considering this finding and the fact that the program is still in the phase of development, it seems justified to conclude that the computer program GEVAT is a promising method for teaching counseling skills. Moreover, supervision of training in counseling skills can be reduced substantially.
It is important to realize that, although students indicate that it is possible to run the program independently, some students still feel a need for personal contact with a trainer. Therefore, it seems unwise to rely completely on self-instruction. Besides, the training may not work for everyone. Future research may examine whether self-instruction training is effective for all kinds of students. The present authors are examining the influence of personality characteristics and learning styles on the effect of self-instructional multimedia training. Finally, recent technical developments enable further development of the program. At this moment, students have to type their answers in reaction to the video scenes, but in the future it may be possible to ask students to give their reactions (non-)verbally and to record them with a web camera. Furthermore, in the present version students react to short videotaped scenes and immediately receive feedback from the program. Simulating entire counseling interviews in the computer program, in which students have to interact with a videotaped client, may enhance the interactivity of the program. Modern techniques enable the computer to analyze and categorize students’ written reactions to a client and to generate the appropriate response of the videotaped client. In this way, the effectiveness of the program can be further enhanced.

References

Adema, J. (in preparation). Dissertation.
Antonides, E., & Kokhuis, A. (1987). Interactief videogebruik bij COO: cursus ‘tweegesprekken’. In D.de Bie (Ed.), Studiedag: de taak van de docent in het nieuwe HBO. Utrecht: De Som.
Bandura, A. (1977). Social learning theory. Englewood Cliffs, NJ: Prentice-Hall.
Bögels, S.M., & Kreutzkamp, R. (1990). Effecten van een training in basale gespreksvaardigheden. Tijdschrift voor Onderwijsresearch, 15, 201–214.
Boonstra, A. (1997). Investeren in ICT met visie. In M.Mirande, J.Riemersma, & W.Veen (Eds.), Hoger Onderwijs Reeks. De digitale leeromgeving (pp. 63–77). Groningen: Wolters-Noordhoff.
Britton, B.K., Glynn, S.M., Muth, K.D., & Penland, M.J. (1982). Instructional objectives in text: Managing the reader’s attention. Journal of Reading Behavior, 17, 101–113.
Britton, B.K., Piha, A., Davis, J., & Wehausen, E. (1978). Reading and cognitive capacity usage: Adjunct question effects. Memory and Cognition, 6, 266–273.
Budd, M.L. (1987). Self-instruction. In R.L.Craig (Ed.), Training and development handbook: A guide to human resource development (3rd ed., pp. 488–499). New York: McGraw-Hill.
Cennamo, K.S. (1993). Learning from video: Factors influencing learners’ preconceptions and invested mental effort. Educational Technology Research and Development, 41(1), 33–45.
Cormier, L.S., & Cormier, W.H. (1976). Developing and implementing self-instructional modules for counselor training. Counselor Education and Supervision, 16, 37–45.
Field, D.E., & Anderson, D.R. (1985). Instruction and modality effects on children’s television attention and comprehension. Journal of Educational Psychology, 77, 91–100.
Fuhrmann, B.S. (1978). Self-evaluation: An approach for training counselors. Counselor Education and Supervision, 17, 315–317.
Gastkemper, F.H.D. (1984). The integrated use of computer assisted instruction (CAI) and videodisk (VLP) for observation training. Computers and Education, 8, 219–224.
Goldman, R.M., Wade, S., & Zegar, D. (1974). Students without harness: The ‘SUM’ experiment in self-paced learning. Journal of Higher Education, 45, 197–210.
Goldstein, A. (1973). Structured learning therapy: Towards a psychotherapy for the poor. Beverly Hills, CA: Harper & Row.
Hector, M.A., Elson, S.E., & Yager, G.G. (1977). Teaching counseling skills through self-management procedures. Counselor Education and Supervision, 17, 315–317.
Hout-Wolters, B.H.A.M.van, & Willems, J.M.H.M. (1991). Zelfinstructie: mogelijkheden en beperkingen. Pedagogische Studiën, 68, 289–294.
Ivey, A.E. (1971). Microcounseling. Springfield, IL: Charles C.Thomas.
Kanselaar, G. (1986). De beoordeling van de kwaliteit van computerondersteund onderwijs. In J.S.ten Brinke, H.P.Hooymayers, & G.Kanselaar (Eds.), Bijdragen aan Onderwijsresearch. Vakdidactiek en informatietechnologie in curriculumontwikkeling (pp. 113–125). Lisse: Swets & Zeitlinger.
Krendl, K.A. (1986). Media influences on learning: Examining the role of preconceptions. Educational Communication and Technology Journal, 34, 223–234.
Krendl, K.A., & Watkins, B. (1983). Understanding television: An exploratory inquiry into the reconstruction of narrative content. Educational Communication and Technology Journal, 34, 223–234.
Lang, G., & Molen, H.T.van der (1992). Methodiek van gesprekstraining. Baarn: H.Nelissen.
Light, R.J. (1990). The Harvard assessment seminars: Explorations with students and faculty about teaching, learning and student life. Cambridge, MA: Harvard University.
Lorch, R.F. Jr., & Lorch, E.P. (1995). Effects of organizational signals on text-processing strategies. Journal of Educational Psychology, 87, 537–544.
Mason, J.L., Barkley, S.E., Kappelman, M.M., Carter, D.E., & Beachy, W.V. (1988). Evaluation of a self-instructional method for improving doctor-patient communication. Journal of Medical Education, 63, 629–635.
McNeil, B.J., & Nelson, K.R. (1991). Meta-analysis of interactive video instruction: A 10 year review of achievement effects. Journal of Computer-Based Instruction, 18(1), 1–6.
Molen, H.T.van der, Smit, G.N., Hommes, M.A., & Lang, G. (1995). Two decades of micro training in the Netherlands: A meta-analysis. Educational Research and Evaluation, 1(4), 347–378.
Mueller, D.J. (1974). Evaluation of instructional materials and prediction of student success in a self-instructional section of an educational measurement course. The Journal of Experimental Education, 42(3), 53–56.
Parreren, C.F.van, & Peeck, J. (1974). Informatie over leren en onderwijzen. Groningen: Tjeenk Willink.
Peng, W.-W. (1989). Self-directed learning: A matched control trial. Teaching and Learning in Medicine, 1(2), 78–81.
Perre, G.van der (1997). Brengt de virtuele universiteit een antwoord op de vragen van de kennismaatschappij? In M.Mirande, J.Riemersma, & W.Veen (Eds.), Hoger Onderwijs Reeks. De digitale leeromgeving (pp. 49–62). Groningen: Wolters-Noordhoff.
Ploeger, A. (1987). Bewerken van schriftelijk studiemateriaal. In D.de Bie (Ed.), Studiedag: de taak van de docent in het nieuwe HBO. Utrecht: De Som.
Raaijmakers, J.G.W. (1984). Psychologie van het geheugen. Deventer: Van Loghum Slaterus.
Robinson, S.E., & Kinnier, R.T. (1988). Self-instructional versus traditional training for teaching basic counseling skills. Counselor Education and Supervision, 28, 140–145.
Roosendaal, A., & Vermunt, J. (1996). Leerstijlen en zelfstandig leren in het voorportaal van het studiehuis. Tijdschrift voor Onderwijsresearch, 21(4), 336–347.
Rosenthal, N.R. (1977). A prescriptive approach for counselor training. Journal of Counseling Psychology, 24(3), 231–237.
Salomon, G. (1984). Television is “easy” and print is “tough”: The differential investment of mental effort in learning as a function of perceptions and attributions. Journal of Educational Psychology, 76, 647–658.
Salomon, G., & Leigh, T. (1984). Predispositions about learning from print and television. Journal of Communication, 34, 119–135.
Simons, P.R.J., & Verschaffel, L. (1992). Transfer: Onderzoek en onderwijs. Tijdschrift voor Onderwijsresearch, 1, 3–16.
Smit, G.N. (1995). De beoordeling van professionele gespreksvaardigheden. Constructie en evaluatie van rollenspel-, video- en schriftelijke toetsen. Baarn: Nelissen.
Smit, G.N., & Molen, H.T.van der (1996). Three methods for the assessment of communication skills. British Journal of Educational Psychology, 66(4), 543–555.
Travers, R.M.W. (1970). Man’s information system: A primer for media specialists and educational technologists. Scranton, PA: Chandler Publishing Company.
Vermunt, J.D.H.M. (1992). Leerstijlen en sturen van leerprocessen in het hoger onderwijs. Naar procesgerichte instructie in zelfstandig denken. Amsterdam/Lisse: Swets & Zeitlinger.
Zee, K.I.van der, Lang, G., & Adema, J. (1997). Het gebruik van multimediale computersystemen binnen trainingen in professionele gespreksvaardigheden. In M.Mirande, J.Riemersma, & W.Veen (Eds.), Hoger Onderwijs Reeks. De digitale leeromgeving (pp. 273–282). Groningen: Wolters-Noordhoff.
Chapter 5
The Conquered Giant: The Use of the Computer in Play Therapy
M.F.Delfos
PICOWO, Goeree 18, 3524 ZZ, Utrecht, The Netherlands

Abstract

The computer plays an important role in everyday life. It could represent a powerful tool in play therapy, but play therapists still lag behind in their use of computers in general. Children handle the computer with much more ease than adults. Boys are more attracted to computers than girls, and they are especially fascinated by aggressive computer games. The learning potential of the computer is seemingly endless, but several risks exist, such as its addictive power and the fact that the adult moral world of the Internet is not adapted to the young child. The pace of the computer fits the modern child well. In play therapy the computer can be very useful in the treatment of problems such as anxiety, poor aggression management and ADHD, but also as a general tool. It can further ‘automatic writing’ and help the child express inner turmoil.
The Digibeth World

We can no longer conceive of a world without the computer. At the end of the century, the millennium problem was not a psychological fin-de-siècle feeling, but a practical one: how to program computers to enable them to handle dates beyond 1999. The millennium problem shows that the importance of the computer has reached far beyond the comprehension of the long line of its inventors, beginning with the young Frenchman Blaise Pascal, who invented a calculating machine for his father in the seventeenth century. Now, all over the world, we are being ‘processed’ by computers, and daily life has become truly digitalized. Still, there are many areas where the computer is not used but could represent a powerful instrument, as it could in play therapy (Delfos, 1992). Play therapists, however, do not belong to the generation that grew up with the computer, and they often see it at best as a useful instrument for writing up reports on play therapy. That is one of the reasons why there is seldom a computer to be found in the play therapy room. There has been some experimental work on systematic use of the computer in social work with children (The Bridge, 1996), but there is still much resistance. Matsuda (1999) views opponents of small children playing on computers as ‘emotional opponents’, who object because of their own preconceptions. Many of them have never touched a computer.
To children, however, the computer is an instrument for learning while playing, and playing while learning; it is ‘éducation permanente’ par excellence. The computer plays an important role in children’s daily life, at school, at home and at their friends’ homes. Roberts (2000) notes that American youth devotes most of its waking activity to media, especially television. About one-half of the youngsters aged 8 through 18 years use a computer daily at home. Much of the time spent on the computer goes to playing computer games. Boys are much more attracted to computer games than girls, and producers of computer games are assiduously looking for games that will attract girls. Girls, however, have a totally different nature from boys, and these differences are encountered in play therapy as everywhere else. Before considering for which mental problems the computer could be useful, and how, I want to examine certain aspects concerning the use of the computer: the differences between adults and children, those between boys and girls, and some of the advantages and dangers of computer use.

Differences between Adults and Children

For the adult, the computer is mainly an instrument for performing several tasks. Its main significance lies in its capacity to organize the world in a digital way. After having been made literate in the nineteenth century, man is becoming ‘digitalized’ during the twentieth century. As in so many other learning fields, children prove to be superior to adults in mastering the new instrument (Delfos, 2000a). Whereas adults possess more knowledge, children seem to have a more lively intelligence. Give the child one language and it will learn it in a way linguists all over the world are still trying to comprehend. Listening carefully to the speakers around him, the infant will discover grammatical rules for himself and make mistakes because children assume a perfect system, which grammar itself is not. He will say ‘I runned’ in what could be seen as a too perfect conjugation. Nobody taught the child to say this. When corrected, he readily understands the exception to the rule, applies it and then generalizes his discovery to the conjugation of the whole verb and to other verbs if necessary. Linguists all over the world are still trying to discover the universal grammar underlying that of different languages, and they are still failing. Today the computer is yet one more example that shows us the enormous capacity of children to learn. Children are very intelligent information processors. When you put an adult at a computer, especially with a new program, and you compare his learning speed to that of a child under the same conditions, the difference is staggering. The child finds its way easily, is not hampered by fear and enjoys discovering the new medium. Its eye-hand coordination is clearly superior to that of the adult, which we see when we look at the way adults and children handle the mouse. The child finds its way through trial and error and learns quickly. Most of the time children enjoy the medium and are not easily frustrated by setbacks. So the computer could represent an important tool in therapy with children. Still, there are differences between boys and girls.
Gender Differences in Computer Use

Gender has a basic constitutional influence on the development of the child. We are reluctant to emphasize the differences because they can give rise to a battle of the sexes, as the concepts ‘difference’ and ‘inequality’ tend to get confused. Whether it should be attributed to genetics or socialization, girls and boys differ significantly in how they express problems (Delfos, 2000b). Boys show more externalizing behavior (externally oriented, aggressive behavior), while girls show more internalizing behavior (inwardly directed, anxious behavior) (Achenbach & Edelbrock, 1978; Verhulst, 1985; American Psychiatric Association, DSM-IV, 1994). As a result, boys in general show more aggression problems, girls more anxiety problems. It can be said that boys in trouble are generally more annoying to those around them, and that girls with problems are more annoying to themselves. As the environment of a child is more disturbed by externalizing behavior than by internalizing behavior, more boys are registered for help than girls. Boys, moreover, show more disorders than girls (Gomez, 1991; Delfos, 1996). So, clients for play therapy are more often boys than girls. Here the idea of the computer becomes even more interesting because, as I said above, boys are more attracted to computer games than girls. Kubey and Larson (1990) found that 80% of children between 9 and 15 years who play computer games are boys. Funk, Germann, and Buchman (1997) observed that boys tend to spend three times as much time at the computer as girls. This difference is reflected in the character of the games, because most of the games are action games of an aggressive nature, and aggression generally attracts boys more than girls. This effect can be observed in all kinds of play material, not only computer games. Berenbaum and Hines (1992) discovered that a preference for male playthings is associated with the postnatal level of the male hormone testosterone, just as Meyer-Bahlburg and others (1988) found that a decrease in playing with rough playthings is associated with an increase in the level of the female hormone progesterone. In line with this research, during the fifteen years I have had a computer in the play room, boys have used it much more often than girls.

Dangers and Advantages of Computer Games

The computer is a controversial instrument. It has important advantages, but there are also dangers. One of the important advantages of the computer is that it offers the child a seemingly endless universe of knowledge. The encyclopedia in book form has its counterpart in the computer encyclopedia (for instance, Encarta, 1997–2000), where the child can not only read about Martin Luther King and see his picture, but can also view original video material, hear Martin Luther King pronounce his famous visionary speech and download material for a school project. On the other hand, the Internet can be explored by children without any restraint, while anyone can place virtually anything on it, with no concern about moral standards or censorship of any kind. Some favor censoring solutions such as a ‘child lock’ on the Internet, but these are still solutions within an adult virtual world.
There should be a ‘childnet’, where opportunities are available for the child without the dangers that the adult world of the Internet presents. The range of opportunities in the field of computer and video games (computer games that are played through the television set) seems unlimited: educational games, creative games, adventure games, but also old-fashioned party games. The learning potential reaches far beyond that of the school. As early as 1984, Greenfield showed that the complexity of a ‘simple’ Pacman game easily exceeds that of any ordinary party game. It demands good eye-hand coordination, sharp discrimination, quick reactions and flexibility in the face of change. The graphic quality of computer and video games is improving day by day, so that their realism is continually approaching that of television. Moreover, computer play and learning material is very attractive, and its speed better fits the acceleration that this generation has undergone in the move from television (Coupland, 1991) to computer. If you watch video clips on television, you can see the speed at which young people live, and understand how they are able to process many more images every minute than the past generation can. So the computer fits the standards of youngsters better than much of the orthodox play material. The computer has become an important educational tool. In the Netherlands nearly every classroom has its computer nowadays. There exists an overwhelming quantity of educational programs, from elementary school through to university. One of the great advantages is the way learning material can be adapted to suit the level of a specific child. Moreover, the computer, just like the television, is a very attractive medium. Nihei, Shirakawa, Isshiki, Hirose, Iwata and Kobayashi (1999) report the first systematic use of virtual technology in Japan to improve the quality of life and amenity of in-patients in a children’s hospital. The quality of life of the children, who suffered from psychological and physiological stress, greatly improved. Children love to spend time on computers, and not only to play games. De Leeuw and colleagues (De Leeuw & Otter, 1995; De Leeuw, Hox, Kef, & Hattum, 1997; Borgers, De Leeuw, & Hox, 1999) used the computer as a means to carry out research with children and adolescents by using computer questionnaires. The precision of children when filling these in increased enormously, and they enjoyed the task very much. The response quality and quantity in children were significantly better with computer questionnaires than with written questionnaires. As children like spending time at the computer so much, here lies one of the dangers of the computer: the risk of addiction. However, that risk seems to be limited. At first most children are fascinated by the computer, but after a time their interest diminishes to a level like that for all other activities, with a short revival with each new game. However, for a small number of children, probably those who are also prone to gambling, the computer can become addictive. In addition there are the children who have problems playing with their peer group, especially children suffering from an autistic disorder. Many children with Asperger’s syndrome prefer the computer to playing with peers (Wing, 1996). The computer offers them a world where problems with peers do not arise. As a consequence, their experience of being with other children lags more and more behind, and forming relations with their peers becomes even more difficult.
There are several addictive factors inherent in computer games. There is the principle of variable reinforcement (that is, behavior is sometimes reinforced through reward and sometimes not). A variable schedule of reinforcement can be very powerful in strengthening behavior (Bandura, 1986). This is an important element in Action Games
where the child has to ‘shoot’ an adversary, or in Platform Games where the child has to perform activities in order to get to a higher level of the game. There are several aspects that make computer games very attractive and fascinating for children (Greenfield, 1984; Kubey, 1996). The games have very attractive graphics and sound effects that follow youth trends. The child’s curiosity is continuously being aroused, and a good game is certainly constructed so as to offer continuous challenges. Moreover, the child can actively control the medium. Mastering a problem is a very rewarding experience. There are ‘help desks’ and ‘walk-throughs’ to overcome phases that are too frustrating during the game, so that motivation for continuing remains intact. The computer is a truly interactive instrument and offers opportunities for active control and continuous feedback. Playing on a computer can be a very rewarding experience; the player is a hero in the game and at the end is rewarded by having his name entered in the ‘hall of fame’, by becoming a ‘supreme warrior’, or the like. Another danger of computer games is that of aggression. Worldwide research (by 1966 already close to two thousand studies since the invention of television; Federman, 1966) shows that aggression on television tends to stimulate aggression in children, especially boys with an aggressive nature. Bandura (1965) was the first to show the influence of seeing aggression on the later behavior of children. He demonstrated that even if children are not in a position to immediately re-enact the aggressive behavior they have seen, they internalize it and display it at some later time. This new behavior becomes part of what might be called a passive aggressive behavior repertoire. International UNESCO research by Groebel (1998) showed again that aggression in the media encourages aggression. Huesmann and colleagues (Huesmann, Eron, Lefkowitz, & Walder, 1984; Huesmann & Eron, 1986) found that aggressive behavior in adults was connected with having watched aggressive television programs during childhood. The same results are being found now that computer games are being studied (Silvern & Williamson, 1987; Provenzo, 1991). With the more realistic computer and video games, the effect tends to be even stronger. Aggressive games further aggressive behavior (Sherry, 1997). Finally, it is important to point out the danger of computer games for the moral development of young children. By means of the computer the child is confronted with moral material that someone so young cannot always understand. Young children take the world very literally, and they run the risk of taking moral judgments in the games too literally. As the scenes are very realistic, children sometimes try to apply the game to the real world, as certain actions seem so rewarding in the virtual world. There are examples of children who were addicted to the computer game Doom (2000), a very aggressive game in which you have to shoot your adversaries in order to reach a higher level, behaving like serial killers. The generalization of its effect is demonstrated by the fact that an adapted version of this game is used by the American army to numb soldiers’ feelings about killing. We have no idea how much young children are already exposed to aggression before they have developed a moral sense that enables them to situate that behavior within a specific context. The moral learning of the child mainly takes place through watching the behavior of others (Bandura, 1986).
The model does not necessarily need to be an adult from the child’s immediate surroundings. Learning can also take place through symbolic model behavior: a figure from a fairy tale or a television hero. Research on the influence of television programs has demonstrated that television heroes have a model function. This also holds for the symbolic figures of computer games.
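The variable reinforcement principle described above can be made concrete with a minimal sketch, written in Python purely for illustration. It is not code from any actual game; the reward probability and point value are arbitrary assumptions.

import random

# Illustrative sketch of a variable-ratio reinforcement schedule: the same
# action is sometimes rewarded and sometimes not. The probability (0.3) and
# the point value (100) are arbitrary assumptions, not values from any game.
REWARD_PROBABILITY = 0.3

def shoot_adversary() -> int:
    """One 'shoot' action; points are awarded only on a random subset of
    actions, which makes the reward schedule variable rather than fixed."""
    if random.random() < REWARD_PROBABILITY:
        return 100  # rewarded this time: points, sound effect, fanfare
    return 0        # not rewarded this time

if __name__ == "__main__":
    score = sum(shoot_adversary() for _ in range(20))
    print(f"Score after 20 actions: {score}")

On such a schedule the player cannot predict which action will pay off, which, as noted above (Bandura, 1986), strengthens the behavior more powerfully than a fixed reward would.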
The influence of computer games may be even larger, because in computer games the child plays an active role, producing events on the screen. The interactive opportunities of the computer to ‘communicate’ with the child facilitate identification with the figures in the game. Kohlberg (1987) refined a model of moral development created by Piaget (1964). The phases of moral development during the primary school age and the beginning of secondary school are: orientation towards obedience and punishment (phase 1), orientation towards personal whims (phase 2), and orientation towards approval by others (phase 3). A young child presumes that authority must be obeyed. The child forms judgments on the basis of the consequences that behavior has: punishment or reward. And this is exactly what is essential in the computer game. Consequently the child is highly susceptible to the standards and values that are offered through the game. With aggressive games the standard is that aggressive behavior (beat, shoot and kill) is rewarded with points, and non-aggressive behavior is punished by being ‘killed’, called ‘losing a life’. The ‘shoot and kill’ rewards are on a variable schedule and thus reinforce the behavior of the player. Besides this aspect, the computer game also relates to the fantasy world of the child. Because of its magical way of thinking, the young child has difficulty in discriminating between reality and fantasy (Fraiberg, 1968) and is not able to estimate the degree of reality of the game. From the perspective of the child, the computer is mighty, a giant, and therefore games should be developed by adults aware of their responsibility towards children. Responsibility, however, doesn’t sell; irresponsible behavior does. So most computer games are more harmful than not. The young child (until approximately eight years) can take computer games too literally, but as the child grows older, being good becomes the most important motivation for moral growth. Computer games that offer a spectrum of bad guys and bad behavior being rewarded could influence the moral judgment the child is developing during this period. The moral development of boys and girls is different: boys are more oriented towards competition and girls more towards cooperation (Damon, 1988). Because computer games, with their accent on competition, match the moral orientation of boys, they are a threat especially to the moral development of boys (Delfos, 1994). It is important to realize these effects when using the computer in play therapy.

Various Computer Applications

The computer can be used in various ways in play therapy. It can be used to stimulate a child’s self-expression through writing or drawing. As early as 1984, Johnson mentioned the use of the computer in play therapy for making drawings. There are countless numbers of games, and every day the number grows. Many of these are dexterity games and aggression games. They need not always be bought, as children exchange them among themselves. Moreover, computer games are easy to copy, so they spread quickly. Different games can be arranged into some distinct categories. Platform Games are those in which the player has to carry out various tasks, and when he succeeds he moves to a higher and more complex
level. One of the first, and still very successful, computer games, Mario, is a good example of a Platform Game. Role Playing Games (RPG) are games in which the player (or players) takes on a specific role and the computer takes on those that remain. The outcome depends on the role taken and the moves made. Adventure Games are those in which the principal character walks through a virtual world, striving to attain a certain goal, such as rescuing a princess. He is constantly being confronted by problems that he has to solve before he can go any further. Arcade Games cover a wide spectrum in which there is a lot of action, often very aggressive. These games are also called ‘beat ’em ups’ and ‘shoot ’em ups’. Simulation Games simulate activities like car racing or flying. Mental Mind Games include the famous Tetris, in which descending geometric forms have to be arranged so that they fit together. Games sometimes fall into more than one category. In many games, dexterity plays an important role, as it does in a mental mind game like Tetris.

The Computer as a Therapeutic Instrument

Because the computer fits in with the world of the contemporary child, it can play an important role in the treatment of several disorders, and be used as a general instrument in play therapy. As well as being a general tool for establishing a good relation with the child, it can be used in the play room in two other ways: to stimulate expression through writing and drawing, and to treat specific problems through computer games. In my experience girls use the computer more often as an instrument for expressing themselves, whereas boys prefer computer games. Research on play therapy is very rare. Sometimes a case study has been presented in which the computer was used in play therapy, but not in any systematic way. Therefore, what follows is mainly my own experience with the use of a computer in the play room. Possible applications of the computer in play therapy are countless. It is an excellent instrument for associative and creative expression, for the treatment of anxiety, for the management of aggression, for moral development and for concentration. Computer games enhance eye-hand coordination (Greenfield, Brannon, & Lohr, 1994; Smit, 1992). A computer game like Tetris can enhance concentration (Haier, Siegel, MacLachlan, Soderling, Lottenberg, & Buchsbaum, 1992; Trimmel & Huber, 1998) and develop spatial orientation (Okagaki & Frensch, 1994). Attention-Deficit Hyperactivity Disorder (ADHD) can be made comprehensible to a child by using a break-out game. Training left-right coordination can be done with a computer, and it can be useful in the treatment of addiction. A computer game can be adjusted to fit a particular child’s needs, much more so than is possible with ordinary play material. Using examples, the therapist can observe, participate, interpret and explain the problems the child is dealing with. That does not mean that the computer can replace the play material; it is nothing more than an extra
tool. In play therapy, it is especially children between eight and fourteen who use the computer during certain periods.

Creative Expression

The great advantage of computer writing in play therapy is that it stimulates the child to write without worrying about spelling and grammar: the text can always be corrected easily afterwards. As play therapy is not the place to teach grammar, the spelling checker offers the opportunity to write without mistakes (and children want that) without having to know all the rules. This means that 'automatic writing' is made possible. Young children often have difficulties with the motor function of writing, as this function is not fully automatized, and this is certainly often true of children with disorders that are genetic in nature. They feel restrained in the writing they otherwise like to do. On the computer every single letter is readable and the spelling checker corrects the errors. The child can confide his or her inner turmoil to the computer without reservation. Girls in particular like to express themselves in this way.

The therapist and the child can write together, alternately, for example. In this way, the therapist can make his therapeutic interventions and the child can respond. This kind of writing orders thoughts in a different way than talking does. It is a powerful tool in treating trauma (Anonymous [Bowen], 1972; Lange, Schrieken, Van de Ven, Bredeweg, Emmelkamp, Van der Kolk, Lydsdottir, Massaro, & Reuvers, 2000a). Writing can change cognitions about inner turmoil, grief and trauma, and even influences physical reactions in a positive way (Lange et al., 2000a). As a result of his research in this field, Lange (2000b) developed a writing therapy on the Internet, Interapy. The fact that therapist and child are sitting side by side writing at the computer can prove very helpful in some instances: children often confide more easily in situations where there is no direct eye contact, for example during a ride in a car or while washing the dishes (Delfos, 2000a). Therapist and child can write a story about a painful subject in the child's life. Afterwards it can be printed, and at the next session it can be extended. Very often children use the printed text to comfort themselves and to explain their problems to others. For example, a seven-year-old child wanted to explain to her mother how afraid she was that her mother, an alcoholic, would die. She also wanted her mother to know that, even though her foster parents were very good to her, she still missed her mother and wanted to live with her. The story was read to the mother by the child and the therapist, and the metaphoric language enabled the mother to take a step back and really listen to the child (Delfos, 1997).

Young children especially enjoy the 'perfect' production the computer can give. Moreover, it is very simple to erase something and rewrite it. Children, no less than adults, want to perform well. It annoys them that they are not able to write or draw perfectly. When drawing, children often greatly enjoy the chance to erase parts of the drawing by clicking the mouse and having an 'eraser' under the cursor; some children enjoy this so much that they do nothing else. Children love to draw, but they often see that their drawing is not right. Felt-tips and color crayons cannot be erased. With the computer this is possible, and children can control their own drawing. Very often children want to draw a perfectly straight line, and the computer brings this within reach.
Computer drawing does not replace real drawing, but it offers other
opportunities. Children alternate the material they choose to fit their needs. Moreover, the computer offers an inexhaustible number of ready-made illustrations, and a scanner makes it possible to insert any image the child wants.

Treatment of Anxiety

Most of the children in play therapy suffer from anxieties. The origin of a particular anxiety can be treated, but this does not mean the treatment generalizes to all the subject areas that have become associated with the original fear. One of the problems of anxiety is that it can put the coping mechanisms out of action: the anxiety 'freezes' the person into inactivity, and it is this inactivity and the avoidance that increase the anxiety. Computer games can be extremely helpful in the treatment of anxiety. They offer a virtual exposure and stimulate coping mechanisms. Guided by the therapist, the child can overcome difficult situations, and the effect seems to generalize to daily situations. As a ten-year-old boy said: 'I was not at all afraid at home after I beat that spider in the game.' At first I worked with the computer game Castle, which used very simple graphics. In the game the player had to overcome several obstacles, one of them killing a little spider. The graphics were nothing more than symbols. They did not even resemble the real thing, but the effect was sometimes greater than with the realistic graphics of the computer games now available. Identification with the cursor was no problem; the adversary to be defeated was filled with the anxieties the child had. Children are absorbed in the game they play, and the difference between reality and fantasy temporarily vanishes. This same game with its simple graphics made the child who was afraid of spiders cry out while playing: 'It can't bite me, can it?', taking his fingers off the keyboard in terror. It was his absorption in the game and the anxiety being laid bare that made this intelligent boy feel that way.

Treatment of Aggression

The psychological and physiological aspects of aggression can be made comprehensible while playing computer games. All over the world, children like to play aggressive games (Groebel, 1998). Aggression is accompanied by an increase in aggressive feeling and has its physical counterpart in the production of adrenergic hormones like adrenaline. This production increases during thrilling computer games and can lead to aggression afterwards. Once the level of these hormones in the body is already high, it is hard to change the course of events; at the beginning of the game, however, before the hormone level rises too much, it is possible to stop the aggression from rising. A therapist can show the child how an aggressive feeling increases when playing a computer game and how this can be controlled. The child learns to recognize that his feeling of joy changes into one of frustration and anger when there is adversity in the game. The child is often amazed to feel how easily an innocent situation can turn into a very unpleasant one. The great advantage is that this is made comprehensible in the innocent situation of a computer game, where the child's self-image is not affected. The child's motivation to discover the mechanism is greater than when the child is in conflict with his/her surroundings, and the opportunities for learning to change the self-expression of aggression increase. The platform games are especially suitable for helping the child overcome frustration and increase the tolerance of that frustration.
Treatment of Attention-Deficit Hyperactivity Disorder

Children with ADHD have problems with impulsiveness, hyperactivity and concentration. They switch from one stimulus to the next. It is important for them to understand themselves and learn to optimize the control they can exercise over themselves. A computer game like the 'break-out' games helps them understand the effects of their behavior on their surroundings. In this game the player has to direct a ball towards bricks in a wall in order to break them and pull the wall down. Meanwhile, objects fall from the bricks that are opened; these objects can be helpful or can make things more difficult, and the player has to decide quickly whether to catch an object or avoid it. The ball jumps in every direction and its behavior is rather unpredictable if the player moves too fast; problems arise if it is moved too fast or too slow. That is exactly what a child with ADHD experiences, so the child identifies with the ball. This game, like Tetris, trains left-right coordination, one of the problems that comes with ADHD. Impulsiveness is a big problem when playing a computer game, and children with ADHD see how this works out in a game. Concentration and dexterity games have an enormous advantage for children with ADHD: Tetris, for example, enhances concentration so much that it is sustained for some time after playing (Trimmel & Huber, 1998).

Treatment of Gambling Compulsion

The computer can be used to treat gambling compulsion and to make the addiction visible and felt. There are lots of computer games simulating a gambling machine, for example a fruit machine. While playing such a game the physical reaction can be made clear, just as in the case of aggression; the addiction to gambling is also a physical one. The reinforcement schedule and the player's reactions to it can be made visible in graphs, without the risk the player runs in real life (a simple simulation of such a schedule is sketched at the end of this section).

General Aspects

Moral development can be registered and stimulated with the help of computer games. Lots of games stimulate bad moral behavior. Children, especially boys, love to play these games and cannot be stopped from doing so. The benefit of play therapy is that it can stimulate moral development while the game is being played. There is, for example, a game, Smurf hunt, in which you shoot 'Smurfs' to kill them. The more Smurfs the player kills, the higher the score he is awarded, and the score is even higher if the Smurf is female. This runs contrary to the idea that you should not kill good people and that you should protect the weak. In discussing this, the therapist can stimulate moral development. Children's problems can be treated with play material, and the wide range of computer games offers new opportunities. For example, a girl living in a children's home wondered whether her mother still thought about her and was actively trying to get her child home again. For some time this girl loved playing the adventure game King's Quest 7 (Williams, 2000), because the principal figure is a queen searching desperately for her daughter. There are many more areas in play therapy where the computer can be used, and further research on possible applications in play therapy is needed.
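To make the point about reinforcement schedules concrete, the following is a minimal sketch of how a fruit-machine game's variable-ratio schedule could be simulated and its running balance recorded for discussion with the child. It is purely illustrative: the win probability and payout are invented parameters, not taken from any actual game mentioned in this chapter.

```python
import random

def simulate_fruit_machine(n_plays=200, win_prob=0.15, payout=4, stake=1,
                           seed=42):
    """Simulate a variable-ratio schedule: each play costs `stake` and
    pays out `payout` with probability `win_prob`. Returns the running
    balance after each play, which could be plotted for the child."""
    rng = random.Random(seed)
    balance, history = 0, []
    for _ in range(n_plays):
        balance -= stake                 # every play costs the stake
        if rng.random() < win_prob:      # intermittent, unpredictable win
            balance += payout
        history.append(balance)
    return history

history = simulate_fruit_machine()
print(f"Balance after {len(history)} plays: {history[-1]}")
# The expected loss per play is stake - win_prob * payout (0.4 here),
# so the balance drifts downward despite the occasional rewarding win.
```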
References

Achenbach, T.M., & Edelbrock, C.S. (1978). The classification of child psychopathology: A review and analysis of empirical efforts. Psychological Bulletin, 85, 1275–1301.
American Psychiatric Association. (1994). DSM-IV. Diagnostic and Statistical Manual of Mental Disorders. Washington, DC: American Psychiatric Association.
Anonymous [Bowen, M.]. (1972). Toward the differentiation of the self in one's own family. In J.L. Framo (Ed.), Family interactions. New York: Springer.
Bandura, A. (1965). Influence of model's reinforcement contingencies on the acquisition of imitative responses. Journal of Personality and Social Psychology, 1, 589–595.
Bandura, A. (1986). Social foundations of thought and action: A social cognitive theory. Englewood Cliffs, NJ: Prentice Hall.
Berenbaum, S.A., & Hines, M. (1992). Early androgens are related to childhood sex-typed toy preferences. Psychological Science, 3, 203–206.
Borgers, N., Leeuw, E. de, & Hox, J. (1999). Surveying children. Cognitive development and response quality in questionnaire research. Methodological Issues in Official Statistics. Stockholm, Sweden: SCB.
Bridge, The. (1996). My life in words and pictures. London: The Bridge, Child Care Consultancy Service London.
Coupland, D. (1991). Generation X. Tales for an accelerated culture. New York: St Martin's Press.
Damon, W. (1988). The moral child. Nurturing children's natural moral growth. New York/London: The Free Press.
De Leeuw, E.D., Hox, J., Kef, S., & Hattum, M. van (1997). Overcoming the problems of special interviews on sensitive topics: Computer assisted self-interviewing tailored for young children and adolescents. In Sawtooth Software Conference Proceedings. Sequim, WA: Sawtooth Software Inc.
De Leeuw, E.D., & Otter, M.E. (1995). The reliability of children's responses to questionnaire items. Question effects in children's questionnaire data. In J.J. Hox, B.F. van der Meulen, J.M.A.M. Janssens, J.J.F. ter Laak, & L.W.C. Tavecchio (Eds.), Advances in Family Research. Amsterdam: Thesis Publishers.
Delfos, M.F. (1992). De computer als hulpmiddel in de spelkamer [The computer as a tool in the playroom]. Nederlands Tijdschrift voor Opvoeding, Vorming en Onderwijs, NTOVO, 8(6), 388–394.
Delfos, M.F. (1994). Jij bent dood, daar krijg ik mooi 150 punten voor! [You are dead, that earns me a nice 150 points!]. TJJ, Tijdschrift voor Jeugdhulpverlening en Jeugdwerk, 6(9), 9–13.
Delfos, M.F. (1996). Jongens, de zorgenkindjes van de toekomst [Boys, the problem children of the future]. Psychologie, 15(6), 16–17.
Delfos, M. (1997). Oline, het olifantje. Over opgroeien bij verslaafde ouders [Oline, the little elephant. On growing up with addicted parents]. Bussum: Trude van Waarden Produkties. Series of therapeutic children's books.
Delfos, M.F. (2000a, in preparation). Are you listening to me? Conversing with children from four to twelve years old. Translation of: Delfos, M.F. (2000a). Luister je wel naar míj? Gespreksvoering met kinderen tussen vier en twaalf jaar. Amsterdam: SWP.
Delfos, M.F. (2000b, in preparation). Children and behaviour problems. Anxiety, aggression, depression and ADHD. A biopsychological model with guidelines for diagnostics and treatment. Translation of: Delfos, M.F. (2000b). Kinderen en gedragsproblemen. Angst, agressie, depressie en ADHD. Een biopsychologisch model met richtlijnen voor diagnostiek en behandeling. Lisse: Swets & Zeitlinger.
Doom (2000). http://hotfiles.lycos.com/cgi-bin/texis/swlib/lycos/info.html?fcode-000048
Encarta. (1997–2000). Microsoft Corporation: http://encarta.msn.com/.
Federman, J. (1996). Media ratings: Design, use and consequences. Studio City, CA: Mediascope.
Fraiberg, S. (1968). The magic years: Understanding and handling the problems of early childhood. London.
Funk, J.B., Germann, J.N., & Buchman, D.D. (1997). Children and electronic games in the United States. Trends in Communication, 1, 111–126.
Gomez, J. (1991). Psychological & psychiatric problems in men. London/New York: Routledge.
Greenfield, P. (1984). Mind and media: The effects of television, video games and computers. Cambridge, MA: Harvard University Press.
Greenfield, P.M., Brannon, C., & Lohr, D. (1994). Two-dimensional representation of movement through three-dimensional space: The role of video game expertise. Journal of Applied Developmental Psychology, 15, 87–103.
Groebel, J. (1998). Summary of the UNESCO global study on media violence. Paris: UNESCO.
Haier, R.J., Siegel, B.V. Jr., MacLachlan, A., Soderling, E., Lottenberg, S., & Buchsbaum, M.S. (1992). Regional glucose metabolic changes after learning a complex visuospatial/motor task: A positron emission tomographic study. Brain Research, 570(1–2), 134–143.
Huesmann, L.R., & Eron, L.D. (Eds.). (1986). Television and the aggressive child: A cross-national comparison. Hillsdale, NJ: Erlbaum.
Huesmann, L.R., Eron, L.D., Lefkowitz, M.M., & Walder, L.O. (1984). Stability of aggression over time and generations. Developmental Psychology, 20, 1120–1134.
Johnson, R.G. (1984). High tech play therapy. Techniques, 1(2), 128–133.
Kohlberg, L. (1987). Child psychology and childhood education. New York: Longman.
Kubey, R.W. (1996). Television dependence, diagnosis, and prevention. In T. MacBeth (Ed.), Tuning into young viewers. Thousand Oaks, CA: Sage.
Kubey, R.L., & Larson, R. (1990). The use and experience of the new video media among children and young adolescents. Communication Research, 17, 107–130.
Lange, A. (2000b). Internet Interapy: http://145.18.113.242/interapy2/nl/public/splash.html.
Lange, A., Schrieken, B., Ven, J.-P. van de, Bredeweg, B., Emmelkamp, P.M.G., Kolk, J. van der, Lydsdottir, L., Massaro, M., & Reuvers, A. (2000a). Interapy: The effects of a short protocolled treatment of post-traumatic stress and pathological grief through the Internet. Behavioral and Cognitive Psychotherapy, 28, 103–120.
Matsuda, S. (1999). Digital of the Heisei era: Experiment at Toyonaka Bunka Kindergarten. Turkish Journal of Pediatrics, 41 (suppl.), 91–97.
Meyer-Bahlburg, H.F.L., Feldman, J.F., Cohen, P., & Ehrhardt, A.A. (1988). Perinatal factors in the development of gender-related play behavior: Sex hormones versus pregnancy complications. Psychiatry, 51, 260–271.
Nihei, K., Shirakawa, K., Isshiki, N., Hirose, M., Iwata, H., & Kobayashi, N. (1999). Virtual reality in a children's hospital. Turkish Journal of Pediatrics, 41 (suppl.), 73–82.
Okagaki, L., & Frensch, P.A. (1994). Effects of video game playing on measures of spatial performance: Gender effects in late adolescents. Journal of Applied Developmental Psychology, 15, 33–58.
Piaget, J. (1964). The moral judgment of the child. New York: Free Press.
Provenzo, E.F. (1991). Video kids: Making sense of Nintendo. Cambridge: Harvard University Press.
Roberts, D.F. (2000). Media and youth: Access, exposure, and privatization. Journal of Adolescent Health, 27(2, suppl.), 8–14.
Sherry, J. (1997). Do violent video games cause aggression? A meta-analytic review. Paper presented at the International Communication Association, Montreal, Canada.
Silvern, S.B., & Williamson, P.A. (1987). The effects of video game play on young children's aggression, fantasy, and prosocial behavior. Journal of Applied Developmental Psychology, 8, 453–462.
Smit, B.D. (1992). Is your child overdosing on video games? Contemporary Pediatrics, 105–107.
Trimmel, M., & Huber, R. (1998). After-effects of human-computer interaction indicated by P300 of the event-related brain potential. Ergonomics, 41, 649–655.
Verhulst, F.C. (1985). Mental health in Dutch children (I): A cross-cultural comparison. Acta Psychiatrica Scandinavica, 72.
Williams, R. (2000). King's Quest 7. http://www.sierra.com/.
Wing, L. (1996). The autistic spectrum. A guide for parents and professionals. London: Constable & Company Limited.
Chapter 6

A Conceptualization of Alexithymia and Defense Mechanisms in relation to Lateral Brain Dominance

P.P.Moormann¹, N.Brand², E.Behrendt¹, and J.Massink¹

¹ University of Leiden, Department of Clinical & Health Psychology, Wassenaarseweg 52, 2333 AK Leiden
² Utrecht University, Department of Social Sciences, Heidelberglaan 2, 3508 TC Utrecht

Abstract

139 second-year psychology students of the University of Leiden participated in a study in which the relation between alexithymia, defense mechanisms, and lateral dominance was investigated. Lateral dominance and defense were measured with the Perceptual Defense Test (PDT), a visual half field task in which threatening and neutral pictures were presented very briefly, in semi-random order, in both visual fields. The Defense Mechanism Inventory (DMI) was administered to allow for differentiation between defensive styles. Both the PDT and the DMI were computerized using the MINDS Test Manager. A paper & pencil version of the Bermond-Vorst Alexithymia Questionnaire (BVAQ) was given to assess alexithymia and its sub-scales. Our results indicate that repression as measured by a visual half field study (PDT) could not predict repression as measured by a questionnaire (DMI). The only exception was that the perceptual defense score based on reaction times was a significant predictor of Principalization, one of the five defensive styles in the DMI. Although affect regulation is a common feature of both alexithymia and defense mechanisms, it would be an oversimplification to call them similar. From a theoretical point of view it was therefore hypothesized that the only sub-scale of the BVAQ that could be predicted from the DMI defensive styles would be Emotional Arousability. Substantial support was found for this hypothesis. The only exception was that, in contrast to the hypotheses, the DMI sub-scale Turning Against Self proved to be a significant predictor of the alexithymia sub-scale Reduced Ability to Identify Emotions. The results on defense mechanisms and lateral dominance were ambiguous. The same holds for the relation between alexithymia and lateral dominance. These inconsistent results are attributed to the application of an unsuitable lateral dominance index. The DMI and the BVAQ are verbal tests; the RL-index is based on pictures. The processing of words and the processing of pictures rely on a different hemispheric
specialization. The use of pictures instead of words would therefore be an invalid method when assessing lateral dominance in the verbal domain.
Introduction

The present study is intended to explore:

1. The relation between measures of defense obtained with a questionnaire (Defense Mechanism Inventory or DMI) versus measures of defense obtained with a visual half field study (Perceptual Defense Test or PDT). Although the methods of data collection differ greatly (single trait within multiple methods), both instruments are supposed to measure the same construct, i.e. defense. According to the multi-trait/multi-method matrix design (see Campbell & Fiske, 1959), a high correlation between repression on the DMI and repression on the PDT is hypothesized.

2. The relation between alexithymia and defense mechanisms:
• Both obtained with a questionnaire (Bermond-Vorst Alexithymia Questionnaire or BVAQ and Defense Mechanism Inventory or DMI). Here we are dealing with multiple traits within a single method. If alexithymia and defense really are different constructs, then they should not be correlated.
• Where alexithymia measures are obtained with a questionnaire (BVAQ) and defense measures are obtained with a visual half field study (PDT). Now we are dealing with multiple traits and multiple methods. Again, if alexithymia and defense are different constructs, then they should not be correlated.

3. The relation between alexithymia and lateral dominance on the one hand, and between defense mechanisms and lateral dominance on the other.

Under the next subheadings, a more detailed clarification of the constructs of alexithymia and defense and of their interrelationship will be given. Furthermore, the relation between alexithymia and hemispheric specialization will be discussed, and the relationship between defense mechanisms and lateral brain dominance will be elucidated as well. The theoretical part will be concluded with several hypotheses on: a) the association between defense mechanism measures obtained with different instruments, b) the association between alexithymia and defense mechanisms, c) alexithymia and lateral brain dominance, and d) defense mechanisms and lateral dominance.

Alexithymia

As early as the late forties, MacLean (1949) described how, in a large proportion of patients with psychosomatic complaints, emotional experience does not reach the stage of full conscious symbolic and verbal elaboration, resulting in problems during psychotherapy based on psychoanalysis (Ruesch, 1948; Groen, Van der Horst, & Bastiaans, 1951). Sifneos (1973) introduced the word alexithymia to describe this phenomenon. Alexithymic patients are known to have difficulties with naming and recognizing their emotions, and to behave in an action-oriented and stoical manner. Furthermore, they
appear to be unable to tolerate intense emotions (Krystal, 1988), and exhibit hardly any self-regulation (Taylor, Bagby, & Parker, 1997). The above conceptualization of alexithymia is rooted in a psychodynamic framework, i.e. in a conflict model. In this approach, it is assumed that emotional representations parallel developmental phases, characterized by increasing differentiation of and reflection upon feelings. Here, failures in parental bonding (particularly in the mother-child relationship) are held responsible for the development of alexithymia (Krystal, 1988; Taylor et al., 1997). Furthermore, it has been suggested that psychotraumata or post-traumatic stress can induce alexithymia as well (Krystal, 1988; Yehuda, Steiner, Kahana, Binder-Brynes, Southwick, Zemelman, & Giller, 1997; Taylor et al., 1997). Hence, these findings indicate that situational factors (in particular those that are extremely stressful) do have an impact on the development of alexithymia. However, some authors are more inclined to see alexithymia as an inherent personality aberration (this topic is also treated by Taylor et al., 1997). As with many other personality traits, the corresponding behavior is generally linked to a certain biological make-up. Well-known examples of other biologically based personality traits are introversion-extraversion and sensation seeking. Therefore, it is far from surprising that several authors claim a relationship between alexithymia and the brain.

Hemispheric Specialization and Alexithymia

Nemiah (1975) tried to explain the alexithymic features by postulating a blocking or malfunction of the neural connections between the limbic system and the neocortex (see Bermond & Moormann, 1999). Although this is an appealing hypothesis, there is little or no experimental confirmation for it. Others have tried to explain alexithymia by stressing the hemispheric specializations and the information exchange between the hemispheres. Various studies have indicated that conscious and serial information processing primarily takes place in the left hemisphere, whereas unconscious, non-verbal, parallel, holistic and emotional information processing mainly takes place in the right hemisphere. This has led to two different hypotheses explaining alexithymia. Some authors assume that alexithymia is related to a hyperactive left hemisphere or a reduced functioning of the right hemisphere, resulting in preferential processing in the left hemisphere. Others, however, consider alexithymia the result of corpus callosum malfunction. Recent laboratory studies by Bermond and associates have provided results supporting both hypotheses (i.e. the left hemispheric hyperactivity hypothesis and the functional commissurotomy hypothesis). Bermond (1995, 1997) distinguishes two types of alexithymia based on these hypotheses. Full-blown alexithymia (Type 1), in which emotion excitability is reduced, is caused by a decreased functioning of the orbito-prefrontal cortex, reduced neural dopaminergic innervation of this area, and reduced functioning of the right hemisphere, or possibly reduced functioning of the commissura anterior (the left hemispheric hyperactivity hypothesis). These regions are also known to affect imagination. In Type 2 alexithymia, the emotional excitability is present, but the accompanying cognitions are absent. A reduced functioning of the corpus callosum (the functional commissurotomy hypothesis) can cause Type 2 alexithymia.
Moormann, Bermond, Albach and Van Dorp (1997) have reported empirical evidence for Type 2 alexithymia.
The point to be made is that Type 2 alexithymia seems to be caused by situational factors, whereas Type 1 alexithymia seems to originate from an inherent personality aberration.

Defense Mechanisms

Mechanisms of defense are ego-protective strategies that hide threats from the self and thereby reduce anxiety (Carver & Scheier, 2000). The conceptualization of defenses originates from psychodynamic theories. Projective techniques have been used to measure defense mechanisms. However, because most projective techniques suffer from unsatisfactory reliability and validity coefficients, alternative measuring devices have been developed. One approach makes use of the reaction time paradigm. According to psychodynamic thinking, threatening stimuli will elicit relatively long reaction times, which are supposed to denote defense (for instance in the Rorschach test). It should be noted that, according to original psychodynamic thinking, extremely short reaction times are not seen as a defensive reaction. Nevertheless, adherents of the reaction time paradigm are still inclined to treat very short reaction times as a defensive reaction (i.e. vigilance, a term from experimental psychology). From a theoretical point of view, this is incorrect: it leads to unnecessary confusion of terminology, to a conceptual contamination. Another way to capture defense mechanisms is to let the subject choose between a neutral and a threatening stimulus. Preference for neutral stimuli is considered a defensive reaction, while preference for threatening pictures is considered an expressive reaction. Both techniques are used in the PDT. Nowadays, even self-reports are used to measure defenses, such as in the DMI. However, it is highly questionable whether defensive reactions can be measured by self-reports: filling in a questionnaire implies conscious awareness of what one is doing, while the very essence of a defensive reaction is that the subject is not aware of it.

Lateral Brain Dominance and Defense Mechanisms

As hypothesized by Ihilevich and Gleser (1986), most of the research findings on the cognitive functions of the left hemisphere suggest that this part of the brain is specialized in verbal, linear and analytic functions. The right hemisphere, on the other hand, is presumably more specialized in intuitive, spatial and holistic tasks. These findings imply that holistic or global defenses (Reversal and Turning Against Self), as defined by the differentiation hypothesis of Witkin (1962), would more likely be associated with right hemispheric dominance. The converse would be true for defenses characteristic of an analytic cognitive style (Turning Against Object and Projection), which would more likely be associated with left hemispheric dominance. The relationship with intellectualizing defenses (Principalization) is, however, more ambiguous (Ihilevich & Gleser, 1986). Several researchers suggest that the communication between the two hemispheres is "functionally severed" during the operation of intellectualizing defenses, because these defenses presumably separate painful affects from cognition, permitting only the "cold facts" into consciousness. According to this hypothesis, both the
cognitive/analytic (left) and the repressive/global (right) hemispheres are active, albeit independently of each other, during the development of these defenses (Ihilevich & Gleser, 1986).

Alexithymia and Defense Mechanisms

It is interesting to note that the same mechanism (i.e. 'functional commissurotomy') is brought forward for Type 2 alexithymia as for intellectualization (the communication between the two hemispheres is "functionally severed"). In the case of Type 2 alexithymia, the individual has difficulties in describing his/her feelings, despite being emotionally excited. In the case of Principalization, the affect "splits off" from content, and the former becomes repressed: people can describe what happened to them or what they felt, but they don't show any emotion. The latter can also be called "affective anesthesia" (Minkowski, 1946). From the foregoing description, one might wonder whether there is a real difference between alexithymia Type 2 and intellectualization. In both cases, individuals become emotionally aroused, but they either don't show the emotion, or the emotion is repressed, or the emotional experience does not reach the level of full cognitive awareness. In fact, it is a matter of terminology. Perhaps that is the reason that some authors proclaim that alexithymia is a defense mechanism. Results from studies in which the Defense Style Questionnaire (DSQ) and the Toronto Alexithymia Scale (TAS) were administered (Taylor et al., 1997) indeed reveal substantial positive correlations between alexithymia and immature defenses (projection, passive-aggression, acting out, autistic fantasy, denial, dissociation, splitting, and somatization), weak positive correlations between alexithymia and neurotic defenses (idealization, reaction formation, undoing, and repression), and weak negative correlations between alexithymia and mature defenses (sublimation, humor, anticipation, and suppression). Although the results of the empirical studies indicate that alexithymia is associated most strongly with immature defenses, this does not mean (according to Taylor) that alexithymia itself should be conceptualized merely as a primitive defense, as some authors have mistakenly suggested. In Taylor's vision, alexithymia should be viewed as a more complex construct, in which one should consider the developmental and psychic structural elements that prevent an alexithymic individual from employing more neurotic or mature defenses to manage affects.

Hypotheses

1. A significant correlation between repression on the Defense Mechanism Inventory (questionnaire) and repression on the Perceptual Defense Test (visual half field study) is expected.

2. Although alexithymia and defense mechanisms are different constructs, it should be noted that both rely on the processing of emotion. In the case of alexithymia, there is a disorder of affect regulation. In the case of defense mechanisms, ego-protective strategies are employed to hide threats from the self and thereby reduce neurotic anxiety. In some defense mechanisms, such as Turning Against Object (TAO), Projection (PRO), and Turning Against Self (TAS), this is done by acting out
aggressive tendencies, either towards others, as in TAO and PRO, or towards the self, as in TAS. In other defenses, such as Principalization (PRN) and Reversal (REV), aggression is denied or repressed by giving it a neutral or positive connotation. Hence in TAO, PRO and TAS, emotional arousability is present (as in Type 2 alexithymia), while in PRN, REV and REP, emotional arousability is reduced (as in Type 1 alexithymia). Therefore it is hypothesized that TAO, PRO and TAS predict an increase in emotional arousability (a sub-scale of alexithymia), whereas PRN, REV, and REP predict a decrease in emotional arousability.

3. Furthermore, in the DMI concept of defense, we are only dealing with the reduction of neurotic anxiety, either by expressing, denying or repressing aggressive tendencies. In alexithymia, we are dealing with the whole emotion spectrum: either emotions cannot reach the stage of full cognitive awareness while the subject is still emotionally aroused (Type 2 alexithymia), or emotions cannot reach the stage of cognitive awareness simply because the subject is not emotionally aroused at all (Type 1 alexithymia). Hence, from a theoretical point of view, mechanisms of defense and alexithymic features (except emotional arousability) are far from similar. Therefore it is hypothesized that the various defense mechanisms in the DMI should not be related to the sub-scales of alexithymia other than reduced emotional arousability (i.e. reduced ability to verbalize, identify, and analyze emotions, in addition to a lack of creative imagination or fantasy).

4. Because Type 1 alexithymia (with reduced emotional arousability as a crucial feature) is explained by a reduced functioning of the right hemisphere, it is expected that reduced emotional arousability will be related to left hemispheric dominance. In Type 2 alexithymia (with normal emotional arousability), where the communication between the two hemispheres is hampered, as if one hemisphere does not know what the other is doing, but where both hemispheres may be active, it is hypothesized that manifestations of lateral brain dominance will be absent.

5. According to Ihilevich and Gleser (1986), holistic or global defenses (REV and TAS), as defined by the differentiation hypothesis of Witkin (1962), would more likely be associated with right hemispheric dominance. The converse would be true for defenses characteristic of an analytic cognitive style (TAO and PRO), which would more likely be associated with left hemispheric dominance. In PRN, it is assumed that both the cognitive/analytic (left) and the repressive/global (right) hemispheres are active, albeit independently of each other, during the development of these defenses (Ihilevich & Gleser, 1986). Therefore, just as in the case of Type 2 alexithymia, statements on lateral brain dominance in PRN do not seem to be to the point.
Method

Subjects

A total of 139 second-year psychology students of the University of Leiden (96 women, 29 men; 14 students had not indicated their sex; 19 to 55 years of age; M=24.0, s.d.=6.8) participated in our research. None of the students was familiar with the instruments used in our research when tested, nor with the aim of our study.
Procedure

A computerized version (Brand, 1994) of the Dutch adaptation of the Defense Mechanism Inventory (Passchier & Verhage, 1986) was administered on the Mental Information Processing and Neuropsychological Diagnostic System (MINDS'96 Test Manager; see Brand & Houx, 1992; Brand, 1996). Furthermore, the students were asked to participate in a visual half field study (Perceptual Defense Test; Brand, Olff, Hulsman, & Slagman, 1991) on MINDS. A paper and pencil form of the Bermond-Vorst Alexithymia Questionnaire (Bermond & Vorst, 1993) was administered as well. All three tests were part of a much larger test battery, in which questionnaires and neuropsychological tests were included.

Instruments

Alexithymia was measured with the aid of the Bermond-Vorst Alexithymia Questionnaire (Bermond & Vorst, 1993). The BVAQ consists of five approximately independent sub-scales of 8 items each (4 positively and 4 negatively formulated items). Together, these five sub-scales more or less cover the alexithymia features as described by Taylor, Ryan, and Bagby (1985), and Hendryx, Haviland, and Shaw (1991). From the BVAQ, separate scores for the following alexithymia features can be obtained: [1] reduced ability to differentiate between various emotional feelings (for instance knowing whether one is anxious or angry), [2] reduced ability to fantasize, [3] reduced ability to verbalize emotional experiences, [4] reduced ability to experience emotional feelings (emotion arousability), and [5] reduced tendency to reflect upon and analyze these feelings (pensée opératoire). The score for alexithymia itself is obtained by summing all 40 item scores. Factor analysis, on the scores of 465 first-year psychology students, produced five factors explaining 49% of the variance, with an almost perfect factor structure: eight items loading on one particular factor, four loading positively and four negatively (mean factor loading .67, range .38–.81). The main factor loading of all items was on one factor only; none had a factor loading >.30 on a second factor, with the exception of a single item loading .32 on a second factor. The internal reliability of the scale is .89, and those of the sub-scales are, in the order presented above, .78, .88, .89, .77, and .81. The correlations between the sub-scales are moderately low and vary between .11 and .41. The validity of the scale has been demonstrated in earlier research (Näring & Van der Staak, 1995; Houtveen, Bermond, & Elton, 1996). In a recent study, carried out by Vorst and Bermond (submitted), the BVAQ and the Dutch version of the Toronto Alexithymia Scale (TAS-20) were administered to 430 Dutch students. Correlations between (sub-)scales of the Dutch BVAQ and (sub-)scales of the TAS-20 support the validity of the BVAQ. The validity of the BVAQ is further supported by correlations between BVAQ scores and measurements of psychological problems.

Mechanisms of defense were measured with a Dutch adaptation (Passchier & Verhage, 1986) of the Defense Mechanism Inventory (Gleser & Ihilevich, 1969). A computerized version of the DMI was realized by Brand (1994). The DMI is meant to measure 5 clusters of defenses:
1. Turning Against Object (TAO). This class of defenses deals with conflict by attacking a real or presumed external frustrating object. Classical defenses such as identification-with-the-aggressor and displacement can be placed in this category.

2. Projection (PRO). Included here are defenses that justify the expression of aggression toward an external object by first attributing to it, without unequivocal evidence, negative intent or characteristics.

3. Principalization (PRN). This class of defenses deals with conflict by invoking a general principle that "splits off" the affect from its content and represses the former. Defenses such as intellectualization, isolation, and rationalization fall into this category.

4. Turning Against Self (TAS). In this class are those defenses that handle conflict by directing aggressive behavior toward the subject himself. Masochism and auto-sadism are examples of defensive solutions in this category.

5. Reversal (REV). This class includes defenses that deal with conflict by responding in a positive or neutral fashion to a frustrating object that might normally be expected to evoke a negative reaction. Defenses such as negation, denial, reaction formation, and repression are subsumed under this category.

In addition to these five sub-scales, Juni (1982) introduced a composite measure, based on the underlying correlational structure: REP=(REV+PRN)−(TAO+PRO). A positive score on REP denotes repression; a negative score on REP denotes expression. The test consists of 200 items. The shortened version (5 situations) was used, with a Likert presentation and scoring method (5 alternatives per situation, scored 0–1–2–3–4). The alphas of the Dutch adaptation of the paper & pencil DMI vary from .61 to .80 (Passchier & Verhage, 1986). The internal consistencies of the computerized DMI sub-scales vary between alpha=.66 and alpha=.83 (N=274). The REP score correlated .37 (p<.01) with the score on the MCR Social Desirability Questionnaire, a generally accepted measure of defensiveness.

Lateral dominance was measured with the Perceptual Defense Test (Brand et al., 1991), a visual half-field task in which the stimuli are presented by computer (MINDS Test Manager). In the PDT, 10 pairs of pictures are presented, with one picture in the left visual field (LVF) and the other in the right visual field (RVF) (presentation time: 20 msecs). In half of the presentations the position of the pictures in each pair (left vs. right) is reversed. Each configuration is presented twice, so that each condition consists of 40 presentation trials. Presentation order is semi-random, i.e. threat is not presented in the LVF or RVF on more than 3 successive trials. The subjects were asked to choose the 'most salient picture'. For pictures in the LVF they pressed the left cursor key, and for pictures in the RVF they pressed the right cursor key. Reaction times in msecs were recorded. Several measures can be obtained from the PDT:

1. The sum total of threatening pictures that were judged the most salient, and the number of threatening pictures for the LVF and the RVF separately.

2. The sum total of neutral pictures that were judged the most salient, and the number of neutral pictures for the LVF and the RVF separately.

3. The mean reaction time of the chosen threatening pictures, and the mean reaction time of threatening pictures for the LVF and the RVF separately.
4. The mean reaction time of the chosen neutral pictures, and the mean reaction time of neutral pictures for the LVF and the RVF separately.

5. A Perceptual Defense score (PD-score), consisting of the sum total of neutral pictures minus the sum total of threatening pictures. A positive score is assumed to measure perceptual defense (repression), while a negative score is assumed to be an indication of heightened vigilance.

6. A Right-Left Index: (R−L)/(R+L)×100. A positive index denotes a preference for pictures in the RVF and a negative index denotes a preference for pictures in the LVF. This index was used as a measure of lateral dominance. Preliminary norms for the PD-score and the Right-Left Index have been developed.

7. A Perceptual Defense score based on reaction times (PD-scoreRT), consisting of the sum of the mean reaction times of threatening pictures in both visual fields, minus the sum of the mean reaction times of neutral pictures in both visual fields. Longer reaction times are supposed to indicate more defensive reactions. A positive score is assumed to measure perceptual defense (repression), while a negative score is assumed to be an indication of heightened vigilance.
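The scoring rules above can be summarized in a few lines of code. The sketch below is a minimal illustration, not the actual MINDS implementation; the record layout (visual_field, picture_type, rt_ms) and the function names are assumptions made for the example. Juni's composite REP score from the DMI is included for completeness.

```python
def score_pdt(trials):
    """Compute the PD-score, Right-Left Index, and PD-scoreRT from a
    list of trial records. Each record is a dict describing the chosen
    picture: 'visual_field' ('L' or 'R'), 'picture_type' ('threat' or
    'neutral'), and 'rt_ms' (reaction time in milliseconds)."""
    n_threat = sum(t['picture_type'] == 'threat' for t in trials)
    n_neutral = sum(t['picture_type'] == 'neutral' for t in trials)
    n_left = sum(t['visual_field'] == 'L' for t in trials)
    n_right = sum(t['visual_field'] == 'R' for t in trials)

    def mean_rt(ptype, field):
        rts = [t['rt_ms'] for t in trials
               if t['picture_type'] == ptype and t['visual_field'] == field]
        return sum(rts) / len(rts) if rts else 0.0

    # Measure 5: positive = perceptual defense, negative = vigilance.
    pd_score = n_neutral - n_threat
    # Measure 6: positive = RVF preference, negative = LVF preference.
    rl_index = (n_right - n_left) / (n_right + n_left) * 100
    # Measure 7: summed per-field mean RTs, threat minus neutral.
    pd_score_rt = (mean_rt('threat', 'L') + mean_rt('threat', 'R')) \
                  - (mean_rt('neutral', 'L') + mean_rt('neutral', 'R'))
    return pd_score, rl_index, pd_score_rt

def rep_score(rev, prn, tao, pro):
    """Juni's (1982) DMI composite: REP = (REV + PRN) - (TAO + PRO).
    Positive = repression, negative = expression."""
    return (rev + prn) - (tao + pro)
```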
Results

From Table 1 it can be seen that no significant correlation was found between repression measured by a questionnaire (REP on the DMI) and repression measured by a visual half field study (the two PD-scores of the PDT), neither for the perceptual defense score based on the choice of pictures (r=−.07, n.s.), nor for the perceptual defense score based on reaction times (r=.21, p=.08). The result of the stepwise multiple regression analysis, in which the two PD-scores were used as predictors for the REP score of the DMI, was not significant either. The same holds for the results of the stepwise multiple regression analyses in which the two PD-scores of the PDT were used as predictors for each separate defensive style score of the DMI. However, there was one exception: the PD-score based on RTs turned out to be a significant predictor of PRN (beta=.26, p=.02; r=.26; R=.26; Adj. R square=.05). Hence, on the whole, no convincing support was found for the notion that the same construct (i.e. repression) can be measured with different methods. An interesting result concerns the relation between the two defensive measures within the PDT (r=.32**). A regression analysis was used to test whether the PD-score based on pictures could be predicted from the PD-score based on reaction times. It turned out that the more time subjects spent processing threatening pictures, compared with the time spent processing neutral pictures, the more they were inclined to prefer neutral pictures over threatening pictures (beta=.36, p=.001; r=.32; R=.32; Adj. R square=.12). This is exactly what is expected from psychodynamic thinking on defense mechanisms.
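For readers wishing to reproduce this kind of analysis, the relation between the two PDT defense measures amounts to a simple bivariate correlation and regression. The sketch below uses SciPy on synthetic stand-in data (the real per-subject scores are of course not reproduced here); the array names and the simulated effect size are illustrative only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 139  # sample size reported in the Method section

# Synthetic per-subject scores standing in for the real data.
pd_score_rt = rng.normal(0.0, 50.0, n)  # RT-based PD-score
pd_score_p = 0.3 * (pd_score_rt / 50.0) + rng.normal(0.0, 1.0, n)  # picture-based

r, p = stats.pearsonr(pd_score_rt, pd_score_p)
reg = stats.linregress(pd_score_rt, pd_score_p)
print(f"r = {r:.2f} (p = {p:.3f}); slope = {reg.slope:.4f}")
```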
Table 1. Pearson correlation coefficients between a) defense mechanisms obtained with the DMI and hemispheric dominance (RL-Index of the PDT), and b) defense mechanisms obtained with the DMI and the two Perceptual Defense scores of the PDT.

Defense Mechanism Inventory             Right-Left Index (PDT)   PD-scoreP (PDT)   PD-scoreRT (PDT)
Turning Against Object                          .11                   .04               −.09
Projection                                      .02                   .05               −.03
Principalization                               −.28*                 −.01                .26*
Turning Against Self                           −.20                  −.19                .01
Reversal                                        .28*                 −.10                .18
Repression                                     −.25*                 −.07                .21
Perc. Def. score Reaction Times (PDT)          −.09                   .32**
Perc. Def. score Pictures (PDT)                −.06

*: p<.05; **: p<.01
In Table 1, the correlations between lateral dominance and defense mechanisms are also given. These results partly confirm the ideas of Ihilevich and Gleser (1986) that holistic or global defenses (REV and TAS), as defined by the differentiation hypothesis of Witkin (1962), would more likely be associated with right hemispheric dominance. However, only REV was significant (a negative outcome on the Right-Left Index means a preference for the left visual field on the PDT, which would imply right hemispheric dominance). The converse would be true for defenses characteristic of an analytic cognitive style (TAO and PRO), which would more likely be associated with left hemispheric dominance. Our findings did not support this notion. Moreover, PRN (r=−.28; p<.05) turned out to be associated with right hemispheric dominance instead of relying on the processing of information in both hemispheres. After this preliminary analysis, a stepwise multiple regression analysis was carried out to test which defenses could predict lateral brain dominance. The result with the two PD-scores as predictor variables for the RL-index was not significant. All the DMI sub-scales (except REP) were used as predictor variables and the RL-Index was used as dependent variable. Only PRN (beta=−.28, p=.02; r=.28; R=.28; Adjusted R square=.07) was entered; all the other predictor variables were excluded (some DMI sub-scales had very high inter-correlations: for instance r PRN/REV=.74**). Because REP is a composite score, a single regression analysis was done with REP as predictor and the RL-index as dependent variable. REP turned out to be a significant predictor of right hemispheric dominance (beta=−.25, p=.03; r=.25; R=.25; Adjusted R square=.05). Hence, the results obtained with the RL-index do not support the ideas of Ihilevich and Gleser on the relationship between defense mechanisms and lateral brain dominance, based on the differentiation hypothesis of Witkin. The next analyses are intended to investigate the relationship between defense mechanisms and alexithymia. Most sub-scales of alexithymia were not related to the DMI dimensions, with the exception of the significant correlation between TAS and having difficulties with identifying emotions. The significant correlations between all the
sub-scales of the DMI and emotion arousability seem to confirm the notion that a disturbance in affect regulation is a common characteristic of both defense mechanisms and alexithymia. As expected, subjects scoring high on Turning Against Object (TAO), Projection (PRO) and Turning Against Self (TAS) turned out to be emotionally arousable (Type 2 alexithymia), whereas subjects high on Principalization (PRN), Reversal (REV), and Repression (REP) were not emotionally arousable (Type 1 alexithymia). A positive index on the Repression score of the DMI implies repression, a negative one expression. A low but significant correlation was found between the sum total of alexithymia and the TAS score of the DMI. However, a more robust analysis was needed to test whether the alexithymia sub-scale Emotional Arousability could be predicted from the DMI sub-scales. Therefore, again a stepwise multiple regression analysis was carried out, with all the DMI sub-scales entered as predictor variables and Emotional Arousability as dependent variable. The results are given in Table 3. The outcome of the stepwise multiple regression analysis indicates that the alexithymia sub-scale Emotional Arousability was predicted from the DMI sub-scales PRN, TAS and REV, in the expected direction. The three predictor variables together explained 29% of the variance. PRN was the strongest predictor. TAO and PRO were excluded.
Table 2. Pearson correlation coefficients between a) sub-scales & sum total of alexithymia and the DMI sub-scales, b) sub-scales & sum total of alexithymia and the two PD-scores of the PDT, and c) sub-scales & sum total of alexithymia and the RL-Index of the PDT.

                        Reduced ability to
                   verbalize   fantasize   identify   become        analyze    Sum total
                   emotions                emotions   emotionally   emotions   alexithymia
                                                      aroused       (PO)
TAO                  −.17        −.10        .05       −.29**        −.001      −.20*
PRO                  −.14        −.14        .17       −.23**         .01       −.14
PRN                  −.02        −.06       −.04        .38**         .03        .07
TAS                  −.09        −.14        .30**     −.25**         .03       −.04
REV                   .04        −.03       −.004       .33**         .05        .12
REP                   .14         .06       −.09        .41**         .02        .19
PD-Sc. Pictures      −.10        −.09        .18       −.002         −.06       −.07
PD-Sc. RTs           −.10        −.19       −.05        .04           .03       −.13
R-L Index            −.06        −.10       −.11        .04          −.01       −.06

*: p<.05; **: p<.01
Table 3. Stepwise multiple regression analysis predicting the alexithymia sub-scale Emotional Arousability from the various defense mechanisms in the DMI.

Scale     beta        r       R      Adj. R square
PRN        .37***     .38    .38         .13
TAS       −.36***    −.38    .51         .25
REV        .28**      .25    .56         .29

**: p<.01; ***: p<.001
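Stepwise multiple regression of the kind reported in Table 3 is not built into most modern statistics libraries, but a forward-selection analogue is easy to sketch. The code below is a hedged illustration using statsmodels on synthetic stand-in data; it adds predictors one at a time by smallest p-value and omits the removal step that full SPSS-style stepwise selection also performs.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_stepwise(X, y, alpha_enter=0.05):
    """Repeatedly add the candidate predictor with the smallest
    p-value, as long as that p-value is below alpha_enter."""
    selected, remaining = [], list(X.columns)
    while remaining:
        pvals = {}
        for cand in remaining:
            fit = sm.OLS(y, sm.add_constant(X[selected + [cand]])).fit()
            pvals[cand] = fit.pvalues[cand]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha_enter:
            break
        selected.append(best)
        remaining.remove(best)
    return sm.OLS(y, sm.add_constant(X[selected])).fit()

# Synthetic stand-in data with the five DMI sub-scales as predictors.
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(139, 5)),
                 columns=['TAO', 'PRO', 'PRN', 'TAS', 'REV'])
y = 0.4 * X['PRN'] - 0.35 * X['TAS'] + 0.3 * X['REV'] \
    + rng.normal(size=139)
model = forward_stepwise(X, y)
print(model.summary())
```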
Then a single regression analysis was done with REP as predictor and Emotional Arousability as dependent variable. REP turned out to be a significant predictor of Emotional Arousability (beta=.47, p<.001; r=.41; R=.41; Adj. R square=.21), explaining 21% of the variance. This outcome is in accordance with the notion (Gleser & Ihilevich, 1969) that in repression the expression of emotional experience is inhibited. In contrast to what was hypothesized, the DMI sub-scale TAS turned out to be a significant predictor of the alexithymia sub-scale Reduced Ability to Identify Emotions (beta=.30, p=.002; r=.30; R=.30; Adj. R square=.08). Subjects scoring high on TAS experience difficulties in differentiating between emotions or identifying emotions. However, TAS explained only 8% of the variance. But are there any DMI sub-scales that can predict the sum total of alexithymia? This was tested by running a stepwise multiple regression analysis with all the DMI sub-scales as predictor variables and the sum total of alexithymia as dependent variable. Only TAO was significant (beta=−.20, p=.04; r=.20; R=.20; Adj. R square=.03), but it explained only 3% of the variance. PRO, PRN, TAS, and REV were all excluded. Hence subjects scoring high on acting-out tendencies, such as TAO, are not alexithymic. Furthermore, it was tested whether REP could predict the sum total of alexithymia. The outcome of the single regression analysis just failed to reach the required significance level (beta=.19, p=.06; r=.19; R=.19; Adj. R square=.03). As can be seen from the correlation matrix in Table 2, no significant correlations were found between the alexithymia sub-scales and the two PD-scores (repression) from the Perceptual Defense Test. The same result was found for the sum total of alexithymia. The results from the regression analyses were not significant either. The relation between alexithymia and lateral brain dominance is also presented in Table 2. None of the correlations was significant, and neither were the results from the regression analyses. Hence no support was found for the assertion that reduced emotional arousability (Type 1 alexithymia) would be related to left hemispheric dominance. The results are more in line with what could be expected from Type 2 alexithymia, where
both hemispheres are thought to be active, albeit independently of each other (functional commissurotomy).

Finally, t-tests were run to see whether sex differences and differences in handedness could be detected. It should be noted that both the number of males (N=29) and the number of sinistrals (N=10) is much smaller than the number of females (N=96) and dextrals (N=107). Only significant results are reported. On the measure of lateral dominance, i.e. the RL-index, neither a sex difference nor a difference in handedness was found. On the DMI, males (M=40.38; s.d.=11.6) scored significantly higher [t(1, 108)=2.89, p=.005] than females (M=34.65; s.d.=7.6) on PRN. Males (M=27.17; s.d.=13.3) also had higher scores than females (M=22.02; s.d.=7.2) on REV, but the t-value was not significant (p=.079). Although the mean of both males and females had a negative value on REP, which means that both sexes were expressive rather than repressive, males (M=−3.04; s.d.=23.4) were significantly less expressive [t(1, 108)=3.06, p=.003] than females (M=−21.23; s.d.=26.4). A comparable result was found for alexithymia, where males (M=21.65; s.d.=5.9) had significantly higher [t(1, 32.10)=4.45, p=.001] scores on the reduced ability to become emotionally aroused than females (M=16.18; s.d.=4.0). As far as handedness is concerned, no significant differences were found on the DMI and PDT variables between sinistrals and dextrals.

Discussion

Our results indicate that repression measured by a visual half field study (the PD-scores on the PDT) and repression measured by a questionnaire (REP on the DMI) do not have much in common. The same holds for the PD-scores on the PDT and all five defensive styles of the DMI. Obviously, we are dealing with different aspects of the construct of defense, although both tests, given their underlying theoretical assumptions, purport to measure the same form of defense (i.e. repression). These results cast doubt on the validity of both the defensive reactions measured by the PD-scores and the defensive reactions measured with the DMI. Possible explanations for the lack of similarity between the different defense measures mainly relate to different operationalizations of the construct of defense:

1. A critique of the DMI, regarding the measurement of defense, concerns the very nature of the test, i.e. a self-report questionnaire. According to Maddi (1973) and Pervin (1980), questionnaires are more likely to measure the surface structure of personality and are therefore not suited to measuring deeper layers of personality such as the unconscious. The fundamental mechanism of defense is repression; Freud often used the terms defense and repression interchangeably. Despite the fact that repression is sometimes undertaken consciously (being thereby equivalent to suppression), when the person tries to manipulate an idea entering consciousness, most discussions of repression assume that its operation is usually unconscious (Carver & Scheier, 2000). Hence the essence of repression is its unconscious nature, and because questionnaires are known to be less suited to measuring the unconscious, it is more likely that the DMI measures the more conscious defenses. In fact, the DMI is better suited to measuring coping styles than defense mechanisms in the strict psychodynamic sense. Therefore, it is highly questionable to what extent the composite measure of repression
[REP=(REV+PRN)−(TAO+PRO)], introduced by Juni (1982), really reflects the original conceptualization of repression as defined by Freud (i.e., as a process of keeping an idea or impulse in the unconscious).

2. Regarding the measurement of unconscious defenses, the PDT is a more promising instrument. The pictures are exposed very briefly, so the subject lacks the time needed to process them fully. This ambiguity invites the subject to project the content of less conscious parts of the mind into the pictures (much as in the Thematic Apperception Test). However, the pictures themselves are open to criticism. There are too many confounding variables, such as differences in color, clarity, contrast, and complexity. It might well be that people choose neutral pictures not because the threatening pictures are threatening to them, but because the neutral ones are more colorful, brighter, more complex, or more pleasing from an esthetic point of view. In that case we would be measuring something other than perceptual defense. However, the result on the relation between the two defensive measures within the PDT does not support this hypothesis: the more time subjects spent processing threatening pictures relative to neutral pictures, the more they were inclined to prefer neutral pictures over threatening ones. This is exactly what psychodynamic thinking on defense mechanisms predicts, and it supports the construct validity of the perceptual defense measures of the PDT.

3. When comparing the modalities in which the defense takes place, it should be noted that the DMI is verbal while the PDT is visual. The processing of verbal material relies on different mechanisms in the brain than the processing of visual-spatial material. There may well be a difference in defensive reactions between modalities as well, which might have contributed to the lack of association between questionnaires and visual half-field studies when measuring defense mechanisms. Quite possibly there would have been more association between the two methods if words had been used instead of pictures in the visual half-field study.

Regarding the relation between alexithymia and defense mechanisms, two points were emphasized:

1. Defense mechanisms and alexithymia share a common feature, namely affect regulation. In defense mechanisms, ego-protective strategies are employed to hide threats from the self and thereby reduce neurotic anxiety. In some defense mechanisms, such as Turning Against Object (TAO), Projection (PRO), and Turning Against Self (TAS), the reduction of neurotic anxiety is achieved by acting out aggressive tendencies, either towards others (TAO and PRO) or towards the self (TAS). Acting out aggressive tendencies implies being emotionally excited, which is also a feature of Type 2 alexithymia. Therefore, it was hypothesized that TAO, PRO, and TAS predict increased levels of emotional arousability. In other defenses, such as Principalization (PRN) and Reversal (REV), aggression is denied or repressed by giving it a neutral or positive connotation. This implies a reduced emotional excitability, which is also an important feature of Type 1 alexithymia. Therefore, it was hypothesized that PRN, REV, and REP predict reduced levels of emotional arousability. Our results seem to confirm these hypotheses.
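The composite measure and the single-regression step described above are straightforward to reproduce. A minimal sketch (Python with NumPy and statsmodels; the simulated sub-scale scores and the effect size are hypothetical stand-ins for real DMI/BVAQ data, not values from the study):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 125  # sample size comparable to the study

# Hypothetical DMI sub-scale scores (real scores follow the DMI manual).
tao, pro, prn, tas, rev = (rng.normal(30, 8, n) for _ in range(5))

# Juni's (1982) composite measure of repression.
rep = (rev + prn) - (tao + pro)

# Hypothetical Emotional Arousability scores, weakly driven by REP.
arousability = 0.1 * rep + rng.normal(0, 5, n)

# Single regression of Emotional Arousability on REP.
model = sm.OLS(arousability, sm.add_constant(rep)).fit()
print(model.params)        # intercept and slope
print(model.rsquared_adj)  # adjusted R-squared, cf. the values reported above
```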
The correlation configuration in Table 2 is an almost perfect reflection of this quite complex relationship. However, some of the subscales of the DMI turned out
to have intercorrelations greater than .70. To deal with these methodological problems, a multivariate approach to the data analysis seemed more justified. In accordance with expectation, the outcome of the stepwise multiple regression analysis revealed that the alexithymia sub-scale Emotional Arousability was predicted from the DMI sub-scales PRN, TAS, and REV in the expected direction. The three predictor variables together explained 29% of the variance; TAO and PRO were excluded. PRN was the strongest predictor. The latter result makes sense when one realizes that this class of defense deals with conflict by invoking a general principle that "splits off" affect from content and represses the former. Defenses such as intellectualization, isolation, and rationalization fall into this category. This is exactly the kind of behavior shown by Type 1 alexithymics: highly rational, without any affective tones. According to Gleser and Ihilevich (1969), the expression of emotional experience is inhibited in repression as well. REP could not be analyzed together with the other predictor variables, because REP is a composite of the other DMI sub-scale variables; therefore, a single regression analysis was done. In accordance with expectation, REP turned out to be a significant predictor of emotional arousability and explained 21% of the variance.

2. Although affect regulation is a common feature of alexithymia and defense mechanisms, the two constructs differ when other features are taken into consideration. In the concept of defense we are dealing only with the reduction of neurotic anxiety, either by expressing, or by denying or repressing, aggressive tendencies. In alexithymia, we are dealing with the whole emotion spectrum: either emotions cannot reach the stage of full cognitive awareness while the subject is still emotionally aroused (Type 2 alexithymia), or emotions cannot reach the stage of cognitive awareness simply because the subject is not emotionally aroused at all (Type 1 alexithymia). Substantial support was found for this hypothesis. The alexithymia sub-scales other than Emotional Arousability could not be predicted from the DMI sub-scales, with one exception: in contrast to what was hypothesized, the DMI sub-scale TAS turned out to be a significant predictor of the alexithymia sub-scale Reduced Ability to Identify Emotions. Subjects with a strong tendency to handle conflicts by directing aggressive behavior toward the self reported difficulties in differentiating between or identifying emotions. Why?

• It seems plausible that feelings of guilt and shame predominate in Turning Against Self ("It's my fault, I am to blame"). The array of emotions might therefore be more restricted, but this is more speculation than fact.

But what constitutes the Sum Total of alexithymia? This was tested by running a stepwise multiple regression analysis with all the DMI sub-scales as predictor variables and the Sum Total of alexithymia as dependent variable. Only TAO was significant, but it explained only 3% of the variance. Hence subjects who deal with conflict by attacking a real or presumed external frustrating object are not alexithymic. Classical defenses such as identification-with-the-aggressor and displacement can be placed in this category and are treated as immature defenses. Taylor et al. (1997) reported substantial positive correlations between TAO and alexithymia; we found the opposite. How can this be explained?
• The tests used might be a factor. The Defense Style Questionnaire and the Toronto Alexithymia Scale were used in the studies quoted by Taylor and associates. In the Toronto Alexithymia Scale, the sub-scales Fantasy and Emotional Arousability are omitted. This has an effect on the sum total of alexithymia, particularly when populations of psychiatric patients are involved in the research. Borderline patients, for instance, score high on TAO, but are nevertheless still emotionally excitable and may have fantasy as well, because they often dissociate. These last two sub-scales lower the sum total of alexithymia on the Bermond-Vorst Alexithymia Questionnaire (5 sub-scales), but will not affect the sum total of the Toronto Alexithymia Scale (3 sub-scales). This may influence the direction of the correlation.

The results on the relation between defense mechanisms and lateral brain dominance were disappointing. According to Ihilevich and Gleser (1986), holistic or global defenses (REV and TAS), as defined by the differentiation hypothesis of Witkin (1962), would more likely be associated with right hemispheric dominance. The converse would be true for defenses characteristic of an analytical cognitive style (TAO and PRO), which would more likely be associated with left hemispheric dominance. For PRN it is assumed that both the cognitive/analytical (left) and the repressive/global (right) hemispheres are active, albeit independently of each other, during the development of these defenses (Ihilevich & Gleser, 1986). This would imply that no statements about lateral brain dominance can be made concerning PRN.

A stepwise multiple regression analysis was carried out to test which defenses could predict lateral brain dominance, with all the DMI sub-scales (except REP) as predictor variables and the RL-index as dependent variable. All predictor variables except PRN were excluded; PRN turned out to be a significant predictor of right hemispheric dominance, instead of showing no pronounced lateral dominance as predicted. Hence, no support was found for the differentiation hypothesis of lateral dominance. Because REP is a composite score, a single regression analysis was done with REP as predictor and the RL-index as dependent variable. REP turned out to be a significant predictor of right hemispheric dominance. This finding is in agreement with the view held by Gur and Gur (1975), who found that more primitive defenses, such as denial, reaction formation (reversal), and repression, were more related to right hemispheric dominance. However, the result with the two PD-scores (repression) as predictor variables for the RL-index was not significant.

The relation between alexithymia and lateral brain dominance was also investigated. Because Type 1 alexithymia (with reduced emotional arousability as a crucial feature) is explained by a reduced functioning of the right hemisphere, it was hypothesized that reduced emotional arousability would be related to left hemispheric dominance. Although recent laboratory studies by Bermond and associates did provide support for the left hemispheric activity hypothesis, no support for this hypothesis was found in our data.
For Type 2 alexithymia (with normal emotional arousability), in which communication between the two hemispheres is hampered, as if one hemisphere does not know what the other is doing even though both may be active, it was hypothesized that manifestations of lateral brain dominance would be absent. This hypothesis was confirmed.
On the whole, the results on lateral brain dominance were quite confusing. This conclusion concerns both defense mechanisms and alexithymia. The disappointing results might be explained as follows. The DMI and BVAQ are verbal tests, whereas the RL-index used as an index of lateral brain dominance is based on the processing of pictures. Subjects showing a preference for pictures presented in the left visual field are assumed to exhibit right hemispheric dominance in the processing of this visual-spatial task, and vice versa for subjects showing a preference for pictures presented in the right visual field. However, the processing of words and pictures is known to rely on different hemispheric specializations in right-handed subjects: words in the left and pictures in the right hemisphere. Furthermore, this asymmetry in function is less clear in women than in men (despite this critique, it should be noted that neither a sex difference nor a difference in handedness was found for the RL-index in our study). From the foregoing it is concluded that the use of pictures instead of words is highly inappropriate for a reliable assessment of lateral dominance in the verbal domain; it amounts to comparing apples with oranges. Regarding the DMI, it is therefore recommended to use neutral and threatening words instead of pictures in the PDT when testing the relation between defensive styles and lateral brain dominance.

The PDT does not seem suitable at all for testing the relation between alexithymia and lateral brain dominance. Apart from being pictorial, another confounding variable is the neutral versus threatening content of the stimuli. Therefore, it is suggested that future studies use a tachistoscope, in which the subject has to identify neutral words presented either in the left or the right visual field, to test both the left hemispheric activity hypothesis and the functional commissurotomy hypothesis. Subjects with Type 1 alexithymia should show shorter reaction times for words presented in the left visual field, whereas subjects with Type 2 alexithymia should show no difference in reaction time between words presented in the left and right visual fields. Furthermore, because of the interference between the hemispheres, Type 2 alexithymics should have relatively longer reaction times than non-alexithymics.
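As an illustration of the proposed tachistoscope analysis, the following sketch (Python with SciPy; all reaction times are simulated, and the decision rule simply mirrors the predictions stated above rather than a validated diagnostic procedure) compares left- and right-visual-field reaction times:

```python
import numpy as np
from scipy import stats

def field_pattern(rt_lvf, rt_rvf, alpha=0.05):
    """Classify the visual-field pattern from paired reaction times (ms).

    Mirrors the predictions above: a significant LVF advantage fits the
    Type 1 pattern; no field difference fits the Type 2 pattern.
    """
    t, p = stats.ttest_rel(rt_lvf, rt_rvf)
    if p < alpha and t < 0:
        return "LVF advantage (Type 1 pattern)"
    if p >= alpha:
        return "no field difference (Type 2 pattern)"
    return "RVF advantage"

rng = np.random.default_rng(2)
lvf = rng.normal(520, 30, 40)  # hypothetical RTs for words in the left field
rvf = rng.normal(560, 30, 40)  # hypothetical RTs for words in the right field
print(field_pattern(lvf, rvf))
```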
References

Bermond, B. (1995). Alexithymia, een neuropsychologische benadering [Alexithymia, a neuropsychological approach]. Tijdschrift voor Psychiatrie, 37, 717–727.
Bermond, B. (1997). Brain and alexithymia. In A.Vingerhoets, F.van Bussel, & J.Boelhouwer (Eds.), The (non)expression of emotions in health and disease (pp. 115–129). Tilburg: Tilburg University Press.
Bermond, B., & Moormann, P.P. (1999). Brain and alexithymia: Empirical evidence for both the hypothesis of left hemispheric hyperactivity and the hypothesis of functional commissurotomy. In A.J.J.M.Vingerhoets & I.Nyklícek (Eds.), Abstracts of the Second International Conference on The (Non)Expression of Emotions in Health and Disease (p. 43). Tilburg, The Netherlands.
Bermond, B., & Vorst, H. (1993). The Bermond Vorst Alexithymia Scale. Unpublished internal report, Department of Psychology, University of Amsterdam.
Brand, A.N. (1994). Defensiemechanismen [Defense mechanisms]. Psychologie & Computers, 11, 139–145.
Brand, A.N. (1996). MINDS: Een testmanager voor gezondheidspsychologisch en neuropsychologisch onderzoek [A test manager for health-psychological and neuropsychological research]. In B.P.L.M.den Brinker, P.J.Beek, A.P.Hollander, & R.T.Nieuwboer (Eds.), Zesde workshop computers in de psychologie (pp. 37–39). Amsterdam: IFKB.
Brand, N., & Houx, P.J. (1992). MINDS: Toward a computerized test battery for health psychological and neuropsychological assessment. Behavior Research Methods, Instruments, & Computers, 24, 385–389.
Brand, A.N., Olff, M., Hulsman, R., & Slagman, C. (1991). Perceptual defense: The use of digitized pictures. In M.Olff, G.Godaert, & H.Ursin (Eds.), Quantification of human defense (pp. 293–301). Heidelberg: Springer-Verlag.
Campbell, D.T., & Fiske, D.W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81–105.
Carver, C.S., & Scheier, M.F. (2000). Perspectives on personality (4th ed.). Needham Heights, MA: Allyn & Bacon.
Gleser, G.C., & Ihilevich, D. (1969). An objective instrument for measuring defense mechanisms. Journal of Consulting and Clinical Psychology, 33, 51–60.
Groen, J.J., Horst, L.van der, & Bastiaans, J. (1951). Grondslagen der klinische psychosomatiek [Fundamentals of clinical psychosomatics]. Haarlem: De Erven F.Bohn.
Gur, R.E., & Gur, R.C. (1975). Defense mechanisms, psychosomatic symptomatology, and conjugate lateral eye movements. Journal of Consulting and Clinical Psychology, 43, 416–420.
Hendryx, M.S., Haviland, M.G., & Shaw, D.G. (1991). Dimensions of alexithymia and their relationships to anxiety and depression. Journal of Personality Assessment, 56, 227–237.
Houtveen, J.M., Bermond, B., & Elton, H.R. (1996). Alexithymia: A disruption in a cortical network? An EEG power and coherence analysis. Journal of Psychophysiology, 11, 147–157.
Ihilevich, D., & Gleser, G.C. (1986). Defense mechanisms: Their classification, correlates, and measurement with the Defense Mechanism Inventory. Owosso: DMI Associates.
Juni, S. (1982). The composite measure of the Defense Mechanism Inventory. Journal of Research in Personality, 16, 193–200.
Krystal, H. (1988). Integration and self-healing: Affect, trauma and alexithymia. Hillsdale, NJ: Analytic Press.
MacLean, P.D. (1949). Psychosomatic disease and the "visceral brain". Psychosomatic Medicine, 11, 338–353.
Maddi, S.R. (1973). Personality theories: A comparative analysis (3rd ed.). Homewood, IL: The Dorsey Press.
Minkowski, E. (1946). L'anesthésie affective [Affective anesthesia]. Annales Médico-psychologiques, 104, 8–13.
Moormann, P.P., Bermond, B., Albach, F., & Dorp, I.van (1997). The etiology of alexithymia from the perspective of childhood sexual abuse. In A.Vingerhoets, F.van Bussel, & J.Boelhouwer (Eds.), The (non)expression of emotions in health and disease (pp. 139–153). Tilburg: Tilburg University Press.
Nähring, G.W.B., & Staak, C.P.F.van der (1995). Perception of heart rate and blood pressure: The role of alexithymia and anxiety. Psychotherapy and Psychosomatics, 63, 193–200.
Nemiah, J.C. (1975). Denial revisited: Reflection on psychosomatic theory. Psychotherapy and Psychosomatics, 26, 140–147.
Passchier, J., & Verhage, F. (1986). The Defense Mechanism Inventory.
Preliminary findings on reliability and validity of the Dutch translation. Gedrag & Gezondheid, 14, 119–124. Pervin, L.A. (1980). Personality: Theory, assessment, and research (3rd ed.). New York: John Wiley & Sons.
Ruesch, J.E. (1948). The infantile personality. Psychosomatic Medicine, 10, 134.
Sifneos, P.E. (1973). The prevalence of "alexithymic" characteristics in psychosomatic patients. Psychotherapy and Psychosomatics, 22, 255–262.
Taylor, G.J., Bagby, R.M., & Parker, J.D.A. (1997). Disorders of affect regulation: Alexithymia in medical and psychiatric illness. Cambridge: Cambridge University Press.
Taylor, G.J., Ryan, D., & Bagby, R.M. (1985). Toward the development of a new self-report alexithymia scale. Psychotherapy and Psychosomatics, 44, 191–199.
Vorst, H.C.M., & Bermond, B. (submitted). Validity and reliability of the Bermond-Vorst Alexithymia Questionnaire. University of Amsterdam, The Netherlands.
Witkin, H.A., Dyk, R.B., Faterson, H.F., Goodenough, D.R., & Karp, S.A. (1962). Psychological differentiation: Studies of development. New York: Wiley.
Yehuda, R., Steiner, A., Kahana, B., Binder-Brynes, K., Southwick, S.M., Zemelman, S., & Giller, E.L. (1997). Alexithymia in Holocaust survivors with and without PTSD. Journal of Traumatic Stress, 10(1), 93–100.
Chapter 7 Quantification of Eye Blinks and Eye Tics in Gilles de la Tourette Syndrome by Means of Computer-Assisted Observational Analysis—Clinical Application J.H.M.Tulen1, M.Azzolini1,2, J.A.de Vries1, W.H.Groeneveld3, J.Passchier2, and B.J.M.van de Wetering1 Departments of Psychiatry1, Medical Psychology & Psychotherapy2, and Biomedical Physics & Technology3, University Hospital Rotterdam—Dijkzigt and Erasmus University Rotterdam, Dr. Molewaterplein 40, 3015 GD, Rotterdam, The Netherlands

Abstract

Inter- and intra-individual differences in the frequency of spontaneous eye blinks and eye tics were studied in patients with the Gilles de la Tourette syndrome (GTS), in order to increase our understanding of (eye)tic-related behavior in GTS. Spontaneous eye blinks and eye tics of 9 patients with GTS and 10 healthy controls were recorded on videotape during periods of rest, during conversation, and while watching an entertaining video program. Frequencies of blinks and eye tics were assessed from videotape by means of a computer-aided observational analysis (the Observer program). In comparison with the healthy control group, the Tourette patients showed a significantly higher blink rate during the periods of rest and video watching. Conversation induced a significant increase in blink rate in the control group but not in the Tourette patients, whereas video watching significantly increased blink rate in both groups. In the Tourette patients, the frequency of eye tics decreased significantly during conversation and increased significantly during video watching. For 5 of the 9 patients a significant positive correlation between blink rate and eye tic frequency was found, whereas 1 patient showed a significant negative correlation. These are the first quantitative data to illustrate task-specific effects on eye tic frequency and the complexity of their relationship with spontaneous eye blinks. The Observer program proved to be an adequate tool for quantifying these relationships in patients with complex behaviors.
Introduction

Gilles de la Tourette syndrome (GTS) is a chronic neuropsychiatric disorder characterized by recurrent and involuntary motor tics (e.g., eye blinking, facial twitches, neck jerking) and vocal tics (e.g., throat clearing, squealing, grunting, yelling), as well as complex behavioral symptoms (Shapiro et al., 1988). The tics usually wax and wane over time and are often exacerbated by stress (Leckman et al., 1993), but they can also be temporarily suppressed. Eye blink tics are among the most frequently reported initial symptoms of GTS (Comings & Comings, 1985). Quantification of blink frequency in Tourette patients can be of relevance because spontaneous blink frequency is considered a useful noninvasive measure of central dopaminergic activity (Karson, 1983); elevated central dopaminergic activity is assumed to play a role in the etiology of GTS (Shapiro et al., 1988). So far, quantitative studies on blink rate in Tourette patients are scarce and contradictory (Bonnet, 1982; Karson et al., 1985).

In order to evaluate spontaneous blink frequency in Tourette patients accurately, blinks should be studied in relation to eye tics, including both eyelid tics (e.g., blepharospasm; unilateral or bilateral squeezing of the eyelids; opening the eyes wide) and eyeball tics (e.g., rolling of the eyeballs, staring), since these may interfere with spontaneous eye blink rate. A positive correlation between blink rate and severity of tics was reported by Karson et al. (1985), but they included all kinds of tics, thereby ignoring details of the possible interaction between eye blinks and eye tics.

Spontaneous eye blink rate in healthy subjects under controlled environmental conditions can be affected by a variety of activities: blink rate has been reported to increase during conversation, emotional states, and arousing stimuli (Hall, 1945; Ponder & Kennedy, 1927; Weiner & Concepcion, 1975; Karson et al., 1981; Bentivoglio et al., 1997), whereas activities such as reading, visually demanding tasks, and increased concentration appear to reduce it (Hall, 1945; Karson et al., 1981; Bentivoglio et al., 1997; Stern & Skelly, 1984). In their group of GTS patients, Karson et al. (1985) reported a significant reduction in blink frequency during reading, similar to the reduction observed in their group of healthy controls. However, at present no quantitative data are available on the state dependency of the occurrence of eye tics and their relationship to spontaneous eye blinks.

In this study, we quantified spontaneous eye blink rate and eye tics in a group of GTS patients during standardized periods of quiet rest, a brief conversation with the researcher, and the watching of an amusing educational video program. The blink data of the patients were compared with data from a healthy control group. Quantification of spontaneous eye blink rate is usually performed by means of EOG (electro-oculogram) recordings. However, due to the complexity and diversity of eye tics (consisting of both complex eyelid and eyeball movements) and the current lack of validated signal features of eye tics, this measurement method was not considered adequate for the quantification of eye tics. Therefore, the time- and task-dependent changes in eye blinks and eye tics were quantified by means of an observational analysis from videotape. In this exploratory study our goals were a) to evaluate details of the relationships between eye blinks and eye
tics by focusing on inter- and intra-individual differences during rest and task periods, in order to increase our understanding of the complexities involved in tic-related behavior in GTS, and b) to evaluate the practical use of a computer-assisted observational analysis program (The Observer) for the quantification of the behavioral aspects of task responsiveness in neuropsychiatric disorders.

Methods

Subjects

Patients: Nine patients (7 males, 2 females; mean age: 37.6 years, SD: 13, age range: 20–56 years) with current symptoms of Tourette syndrome participated in a study aimed at the objective quantification of eye blinks, eye tics, and head movements. The patients were recruited in random order within a period of two months from the psychiatric outpatient service of the University Hospital Rotterdam—Dijkzigt and were diagnosed by a senior psychiatrist according to DSM-IV criteria (American Psychiatric Association, 1994). All patients suffered from multiple motor and vocal tics, as well as obsessive-compulsive behaviors. The duration of illness varied between patients, and all scored mild to moderate on severity of the syndrome according to the Yale Global Tic Severity Scale (Chappell et al., 1994). No specific selection of patients was made regarding the presence of increased blink rate or eye tics during the intake interview. Six of the 9 patients received medical treatment at the time of the study (5 patients were on neuroleptics; 1 patient was treated with an antidepressant).

Controls: The control group consisted of 10 healthy volunteers (5 males, 5 females; mean age: 33.8 years, SD: 16, age range: 18–61 years), who were unrelated to the patients. The subjects had no personal history of movement disorders or ocular diseases and were free of psychoactive drugs at the time of the study. During the intake, the subjects were informed about the procedures, measurements, and video recordings, and gave written informed consent to participate in the study. None of the subjects wore contact lenses.

Procedure

Each subject participated in a recording session of about one hour at the Psychophysiological Laboratory of the University Hospital Rotterdam—Dijkzigt. During the entire procedure, the subject was seated in a comfortable chair and was not allowed to smoke or drink coffee. For the measurement of head movements, 4 accelerometers were placed on an elastic headband; these signals were recorded during the whole session on a small portable digital recorder attached to a waist belt. This study focuses only on the quantification of eye blinks and eye tics, which was done by means of an observational analysis of video recordings. A camcorder (Panasonic NV-S7E), placed at a distance of about 2 meters from the subject, recorded in detail the movements of the face, head, and shoulders of the subject during the following sequence of tasks: a) REST1: a 10-minute period of rest (quiet sitting, without talking), b) CONVERSATION: an interview/conversation of 5 minutes during which the subject talked freely with the
researcher about his/her favorite hobbies, c) REST2: a 5-minute period of rest (quiet sitting, without talking), d) VIDEO: watching a video (of about 15 minutes) on a television monitor; the tape showed an entertaining television program on a specific topic (in this case, how neon light is made and used), with sketches and singing combined with technical and educational information; no task was required of the subject, who was merely instructed to relax and watch the video, and e) REST3: a 5-minute period of rest (quiet sitting, without talking).
Figure 1. Hardware configuration required for analysis of videotapes by means of the Observer program.

Quantification of Eye Blinks and Eye Tics

Spontaneous eye blinks and eye tics during the periods of rest and the two tasks (conversation, video watching) were quantified by means of computer-aided visual analyses of the videotapes. The following definitions were used for the scoring of blinks and tics:
1. Spontaneous eye blink: a bilateral paroxysmal closure of the eyelids (duration <1 second) in the absence of a clear provoking external stimulus.

2. (Eye) tic: a) a normal eye blink occurring at the same instant as a sudden head movement, b) squeezing of the eyelids, either unilateral or bilateral: the eyelids are tightly closed for a variable duration of time, involving the orbicular ocular muscles, c) rolling of the eyes: a brief bilateral movement of the eyeballs, whereby rolling up, down, left, or right was quantified separately, d) opening the eyes wide: the eyelids are widely opened (bilaterally) for a variable period of time, usually involving a period of staring, e) a combination of opening the eyelids wide and rolling of the eyeballs, and f) uncertain: this category was used when a tic occurred that did not fall into one of the above categories.

Analysis Tools and Equipment

Prior to the analysis phase, a copy of the original videotape was made to which a Vertical Interval Time Code (VITC) was added in order to have an exact time-base for the analysis (AEC-18 time code generator). The configuration of the analysis system is given in Figure 1. It consisted of a videocassette recorder (VCR; Panasonic AG-5700) linked to an IBM-compatible personal computer and a standard video monitor. An additional PC plug-in board was used to read the time code information required by the software. The analysis program (The Observer, MS Windows version 4.0; Noldus, Wageningen, The Netherlands), which functions as an event recorder, had full control (from frame-by-frame to fast speed) over the VCR via an RS-232 serial link.

The Observer program allows one to define the independent and dependent variables in a configuration file, which is then used for event recording within a specific application (in this case, the quantification of blinks and tics). The collected events were saved in observational data files (ASCII format), which contain information on the duration as well as the frequency of the different types of behavior, and an accurate indication of the occurrence of each event in time (0.02-second accuracy). In addition, graphical output was generated (for an example, see Figure 2). Blink rate and frequency of tics were summarized per 10 seconds, separately for the rest and task periods, in order to study details of trends and relationships between blink rate and eye tic frequency. In addition, blink rate was summarized for the first and fourth minute of each rest and task period for the statistical evaluation of initial and sustained effects during the tasks. The incidences of the different eye tics were summed to obtain a total number of eye tics per minute per period.

Reliability of Scoring

The first 4 minutes of each rest and task period were scored by a trained observer. A second observer evaluated one rest and one task period (REST1, CONVERSATION) completely, and the first and fourth minute of REST2, VIDEO, and REST3. For these periods, the overall percentage of agreement between the two observers was 95.5% for the number of blinks of the controls, 96.6% for the blinks of the Tourette patients, and 87.8% for the eye tics of the Tourette patients.
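The per-period summaries described above amount to simple binning of time-stamped events. A minimal sketch (Python with NumPy; the function name and the sample onsets are hypothetical, and real input would come from the Observer's ASCII data files):

```python
import numpy as np

def counts_per_bin(event_times, period_start, period_end, bin_s=10.0):
    """Count events (blinks or tics) in consecutive 10-second bins."""
    edges = np.arange(period_start, period_end + bin_s, bin_s)
    counts, _ = np.histogram(event_times, bins=edges)
    return counts

# Hypothetical blink onsets (seconds) during the first minute of REST1.
blinks = np.array([1.4, 5.2, 11.8, 19.0, 27.6, 33.1, 41.9, 50.3, 55.7])
print(counts_per_bin(blinks, 0.0, 60.0))  # counts per 10-second bin
print(len(blinks))                        # blinks in this minute
```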
Figure 2. Example of a graphical display of observational events in time. Data from individual subjects during the first minute of REST1 are shown: A) occurrence of spontaneous blinks, in time, for the 10 control subjects; B) occurrence of spontaneous blinks, in time, for the 9 Tourette patients; C) occurrence of spontaneous eye tics, in time, for the 9 Tourette patients.

Statistical Analysis

Data are presented as means and standard deviations (SD). Statistical analyses were performed with the SPSS for Windows (release 8.0) package. In order to analyze the initial effects of the tasks (CONVERSATION, VIDEO) on blink rate in patients and controls, the data of the first minute of each task were compared with the data of the preceding rest period by means of two-tailed Wilcoxon tests, for each group separately. Sustained task effects were assessed by comparing the first- and fourth-minute data of each task, also for each group separately (two-tailed Wilcoxon tests). Task effects on eye
tic frequency in the Tourette patients were studied in a similar manner. Differences in blink rate between patients and controls were evaluated by means of two-tailed Mann-Whitney U tests, both for the rest and the task periods. Per patient, the relationships between blinks and eye tics were studied by computing Spearman rank correlation coefficients between the number of blinks and the number of eye tics per 10-second period, per condition. A p-value of <.05 was used to indicate a significant difference or correlation.

Results

Spontaneous Eye Blink Rate in Controls and Tourette Patients

Spontaneous eye blink data during rest and task periods of controls and patients are presented in Figure 3.

Controls: During the fourth minute of the 3 rest periods (baseline values), mean blink rate varied between 10.5 (standard deviation, SD: 7.7) and 13.9 (11.7) per minute. Conversation significantly increased blink rate (from 13.1 to 26.5 blinks/minute; p<.05), as did video watching (from 10.5 to 17.0 blinks/minute; p=.001). Blink rate remained elevated during the first minute of rest after the tasks had finished. During the tasks, there was no significant difference between the blink rates of the first and fourth minute, indicating a lack of habituation to the tasks.

Patients: Blink rate during minute 4 of the 3 rest periods of the Tourette patients varied between 32.9 (11.4) and 34.4 (18.7) blinks/minute. Although the mean blink rate appeared, on average, to increase during the conversation (from 33.6 to 39.2 blinks/minute), this increase was not significant, owing to the large inter-subject variability. Video watching significantly increased mean blink rate from 32.9 (11.4) to 43.2 (11.6) blinks/minute (p<.05). As in the controls, no habituation to the tasks was observed, nor a significant decrease during the first minute of rest after the tasks had finished.

Patients versus controls: Tourette patients showed a significantly higher blink rate than control subjects during all periods (for all comparisons: p<.01), with the exception of the conversation period, during which no significant differences from the controls were observed (Figure 3).

Eye Tics in Tourette Patients

Based on the individual data per patient (Table 1), mean eye tic frequency per minute during rest varied from 0 to 15.8 tics/minute. During the 3 rest periods, the overall mean eye tic frequency varied between 3.6 (3) and 4.9 (5) eye tics/minute. During conversation, the mean total number of eye tics decreased significantly versus the preceding rest period (from 4.8 to 2.1 tics/minute; p<.05), whereas mean eye tic frequency during video watching increased significantly versus the preceding rest period (from 3.6 to 8.0 tics/minute; p=.05). Mean total eye tic frequency during video watching was significantly higher than during conversation (p=.015). After video watching, mean
total eye tic frequency per min decreased again to baseline levels (from 8.0 to 4.9 tics/minute; p=.05).
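The nonparametric comparisons used here are available in standard statistical packages. A minimal sketch (Python with SciPy; all counts are hypothetical and serve only to show the form of the tests):

```python
from scipy import stats

# Hypothetical blink counts (blinks/minute) for the ten controls:
rest1_min4 = [10, 14, 9, 12, 16, 8, 13, 11, 15, 12]   # REST1, minute 4
conv_min1 = [22, 30, 25, 27, 31, 19, 28, 24, 29, 26]  # CONVERSATION, minute 1

# Within-group initial task effect (two-tailed, paired Wilcoxon test).
print(stats.wilcoxon(rest1_min4, conv_min1))

# Between-group difference (two-tailed Mann-Whitney U test), using
# hypothetical resting blink counts for the nine patients.
patients_rest = [38, 41, 35, 20, 33, 41, 37, 26, 21]
print(stats.mannwhitneyu(patients_rest, rest1_min4, alternative="two-sided"))
```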
Figure 3. Mean (+ standard deviation) of the number of spontaneous eye blinks during the first and fourth minute of each rest and task period, for the control and patient group separately.

Table 1. For each Tourette patient, the mean number of blinks per minute (bl/m) and the mean total number of eye tics per minute (tt/m), per rest and task period.

Patient    Rest1          Conversation    Rest2          Video          Rest3
           bl/m   tt/m    bl/m   tt/m     bl/m   tt/m    bl/m   tt/m    bl/m   tt/m
1#         38.0    8.8    36.8    1.5     28.0    5.3    29.5    1.8    30.0    3.5
2          41.3    0.5    77.3    0.5     40.3    0.0    58.5    0.0    67.0    0.8
3#         34.5   12.3    42.8    3.5     32.8    3.8    47.8   15.3    34.0    9.0
4#         20.3    1.8    11.3    0.3     14.5    0.8    19.8    2.5    11.3    0.0
5$         32.5    1.3    31.5    0.0     31.3    2.0    40.5    5.3    24.3    2.3
6#         40.5    7.8    37.8    2.0     46.8    6.0    44.0    9.8    29.5    3.0
7          37.3    8.0    39.3    8.0     32.0    9.0    43.8   24.3    40.5   15.8
8#         26.3    2.3    80.8    3.0     41.3    5.3    55.0    6.8    39.8    8.8
9          21.3    0.0     8.3    0.0     28.8    0.0    33.3    6.0    36.5    0.5
mean       32.4    4.8    40.7    2.1*    32.9    3.6    41.4*   8.0*   34.8    4.9*
(SD)       (8)     (5)    (25)    (3)     (9)     (3)    (12)    (8)    (15)    (5)

bl/m: mean number of blinks per minute; tt/m: mean total number of eye tics per minute; #: patient treated with neuroleptics; $: patient treated with an antidepressant; Wilcoxon tests, within-group change versus preceding period: *: p<0.05.
Table 2. For each Tourette patient, the Spearman rank correlation coefficients between the total number of eye tics and the number of eye blinks per 10-second period, per rest and task period and for the overall procedure.

Patient    Rest1     Conversation    Rest2     Video     Rest3     Overall
           (n=24)    (n=24)          (n=24)    (n=24)    (n=24)    (n=120)
1            .39       .01             .12       .37      −.07       .15
2            .08       .42*            –         –        −.15       .10
3           −.61**     .13            −.04      −.43*     −.43*     −.18*
4            .37       .22            −.04       .54**     –         .31***
5            .37       –               .16       .52**     .44*      .31***
6            .27       .10             .13       .19       .22       .28**
7            .38       .24             .29       .38       .63***    .43***
8            .34      −.02            −.19      −.04       .00      −.02
9            –         –               –         .46*     −.14       .23**

*: p<.05; **: p<.01; ***: p<.001; –: no correlation coefficient could be computed because of the absence of eye tics.
Correlations between Blink Rate and Eye Tics in Tourette Patients

Table 2 shows that the strength of the relationships between blink rate and eye tic frequency varied per patient and per condition. Only one patient (patient 3) showed significant negative correlations between blink rate and eye tic frequency (during 3 of the 5 conditions: REST1, video watching, and REST3). For most of the patients and conditions, we did not observe significant correlations between blink rate and eye tics, although significant positive correlations were found during video watching for 3 patients. Correlation coefficients based on the overall data per subject indicated a significant positive correlation between blink rate and eye tic frequency for 5 patients, and a significant negative correlation for 1 patient.
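A minimal sketch of the per-condition correlation analysis (Python with SciPy; the Poisson-simulated 10-second counts are hypothetical stand-ins for the Observer output):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical counts per 10-second period for one patient in one condition
# (n=24 periods, i.e., 4 analyzed minutes).
blinks = rng.poisson(6, 24)
tics = rng.poisson(1, 24)

rho, p = stats.spearmanr(blinks, tics)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```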
Discussion

In this study, inter- and intra-individual aspects of spontaneous eye blink rate and frequency of eye tics in Tourette patients were evaluated during periods of rest, conversation, and video watching, by means of a computer-assisted observational analysis from videotape. We found significant and task-dependent differences in blink rate versus a control group, and significant task-dependent effects on eye tic frequency in the Tourette patients. Large inter- and intra-individual differences were observed in the relationships between blink rate and eye tic frequency in the GTS patients.

Spontaneous Blink Rate in Tourette Patients and Controls

On average, blink rate during quiet rest in our control group varied between 10 and 14 blinks/minute (SD between 8 and 12). These rates appear somewhat lower than those reported by Zametkin et al. (1979; 14–18 blinks/minute) and Bentivoglio et al. (1997; 17 blinks/minute), but it should be noted that our analysis periods were selected to exclude residual effects of previous tasks. The increase in blink rate during a casual conversation is in agreement with the findings of other studies (e.g., Hall, 1945; Bentivoglio et al., 1997). We included watching a video with an amusing program as a second task because it required a minimum of attention or concentration, was pleasant for the subjects, and required (in contrast to conversation) no speaking or interaction with other persons. As with conversation, we observed an increase in blink rate during video watching in our control group. Apparently, the simple task of watching a video without speaking is enough to increase blink rate.

Overall, the Tourette patients showed a significantly higher blink rate during the periods of rest and video watching than the control group. During these periods, an almost threefold increase versus the controls was observed, similar to the findings of Bonnet (1982) but contrary to the data of Karson et al. (1985). Whereas video watching significantly increased blink rate in the Tourette patients, we observed no significant increase during conversation, although the patients were actively involved in this task. If the increased blink rate during rest were responsible for a reduced task responsiveness during conversation, it remains unclear why the Tourette patients showed a stronger reaction to video watching. Perhaps the continuous change of visual stimuli during video watching, due to the lively character of the program, contributed to the observed findings. From clinical experience, we have the impression that increased environmental stimulation may lead to increased tic frequency in Tourette patients.

Eye Tics and Eye Blinks in Tourette Patients

The frequency of eye tics decreased significantly during conversation and increased significantly during video watching in the Tourette patients. During conversation, eyelid tics were also significantly less frequent than during video watching. Our findings illustrate that task responsiveness of spontaneous eye tics exists and that it differs from that of spontaneous eye blinks. Evidently, passively watching an amusing video stimulates both blink rate and eye tic rate, whereas active involvement in a conversation reduces eye tic frequency without significantly affecting blink rate. We did
not find evidence of specific rebound effects on eye tic frequency after the tasks had been stopped.

When analyzing the relationships between blinks and eye tics over the whole recording period, we found a significant positive correlation between blink rate and eye tic frequency in 5 of the 9 patients, whereas one patient showed a significant negative correlation. When we looked at the relationships per rest or task period, an even more complex pattern emerged. For most of the patients no significant correlations were found; video watching was the task during which the largest number of significant correlations (4) was observed. The correlations seem to indicate that: 1) usually there is no significant correlation between the frequencies of eye blinks and eye tics, 2) in some subjects a significant positive or negative correlation may occur, but 3) this depends on certain situational, cognitive, or perceptual requirements.

Dopaminergic Function and Eye Blinks

Spontaneous blink frequency is considered a useful non-invasive measure of central dopaminergic activity, based on both clinical and experimental data (Karson, 1983; Kleven & Koek, 1996). Elevated central dopaminergic activity is assumed to play a role in the etiology of GTS (Shapiro et al., 1988). This implies that alterations in blink rate in GTS may be attributed to a central dopaminergic dysfunction. In this study, we observed an increased blink rate in Tourette patients versus a healthy control group. Five of the nine patients were treated satisfactorily with dopamine antagonists (neuroleptics) at the time of the measurements. Bonnet (1982) observed a normalization of blink frequency after neuroleptic treatment in GTS patients, but Karson et al. (1985) did not find a significant effect of pimozide on blink rate in their patient group. Our results indicate that, even though five of the nine patients were on neuroleptic treatment, blink rate was increased about threefold versus healthy controls. Overall, these findings tend to support the hypothesis of a central dopaminergic dysfunction in GTS. Whether the frequency of eye tics was influenced by the neuroleptic medication could not be answered within the context of this study.

Computer-assisted Observational Analysis in Neuropsychiatric Disorders

Systematic quantitative knowledge about the psychological aspects of "involuntary" behaviors, such as tics or other hyperkinesias (e.g., tremor, chorea, myoclonus, and dystonia), is limited to a few reports. Yet objective quantitative data are essential in order to increase our understanding of the state-dependency of overt movement dysfunction in relation to psychological processes, as well as for the evaluation of therapy efficacy. At present this area of research is dominated too much by case reports and subjective clinical assessments by means of questionnaires. Quantification of motor activity solely by means of electromyography (EMG) or accelerometry is at present not an option, because of the subtle and complex characteristics of these movement disorders. The Observer program, however, proved to be an adequate tool for quantifying these details of tic-related behavior in time. The resolution of the program (0.02 s) was sufficient to analyze the very fast (eye) tics and the sometimes very high frequency of eye blinks per minute. An additional advantage of the program is that time-dependent relationships between different behavioral events can be displayed graphically, which can be very informative
when looking for behavioral patterns. The major drawbacks of this method are that it is time-consuming and that it can be used only in controlled situations (where the position of the camera is optimal). Future developments in this area of research should therefore focus on movement recordings by means of EOG, EMG, or accelerometer sensors, whereby feature extraction by means of signal processing techniques is combined with a visual analysis of behavior using an observational analysis program. With this approach, the signal processing criteria can be validated. In addition, the advantages of signal processing can be explored (the analyses will be less time-consuming; measurements can also be performed under ambulatory conditions), as well as its shortcomings (some behaviors are too subtle and complex to be captured reliably by means of feature-extraction techniques). At present, the Observer program proved to be an adequate tool for quantifying details of movement dysfunction in patients with complex behaviors.

Conclusion

In this study, the Observer program was used to quantify details of inter- and intra-individual aspects of task-related (eye)tic behaviors in patients with the Gilles de la Tourette syndrome. Although the method proved to be time-consuming, as a result of the nature of the chosen behaviors, relevant relationships between eye blinks and eye tics could be explored for the first time, thanks to the accuracy of the program in monitoring events in time. In a clinical research field in which much can still be gained from the objective quantification of overt movement disturbances in neuropsychiatric disorders, the Observer program can be a useful tool for quantifying task-specific and/or therapeutic effects under controlled conditions.

References

American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: American Psychiatric Press.
Bentivoglio, A.R., Bressman, S.B., Cassetta, E., Carretta, D., Tonali, P., & Albanese, A. (1997). Analysis of blink rate patterns in normal subjects. Movement Disorders, 12, 1028–1034.
Bonnet, K.A. (1982). Neurobiological dissection of Tourette syndrome: A neurochemical focus on a human neuroanatomical model. In A.J.Friedhoff & T.N.Chase (Eds.), Gilles de la Tourette syndrome (pp. 77–82). New York: Raven Press.
Chappell, P.B., McSwiggan-Hardin, M.T., Scahill, L., Rubenstein, M., Walker, D.E., Cohen, D.J., & Leckman, J.E. (1994). Videotape tic counts in the assessment of Tourette's syndrome: Stability, reliability, and validity. Journal of the American Academy of Child & Adolescent Psychiatry, 33, 386–393.
Comings, D.E., & Comings, B.G. (1985). Tourette syndrome: Clinical and psychological aspects of 250 cases. American Journal of Human Genetics, 37, 435–450.
Hall, A. (1945). The origin and purposes of blinking. British Journal of Ophthalmology, 29, 445–467.
Karson, C.N. (1983). Spontaneous eye-blink rates and dopaminergic systems. Brain, 106, 643–653.
Karson, C.N., Freed, W.J., Kleinman, J.E., Bigelow, L.B., & Wyatt, R.J. (1981). Neuroleptics decrease blinking in schizophrenic subjects. Biological Psychiatry, 16, 679–682.
Karson, C.N., Kaufmann, C.A., Shapiro, A.K., & Shapiro, E. (1985). Eye-blink rate in Tourette's syndrome. Journal of Nervous and Mental Disease, 173, 566–569.
Kleven, M.S., & Koek, W. (1996). Differential effects of direct and indirect dopamine agonists on eye blink rate in cynomolgus monkeys. Journal of Pharmacology and Experimental Therapeutics, 279, 1211–1219.
Leckman, J.F., Walker, D.E., Goodman, W.K., Pauls, D.L., & Cohen, D.J. (1993). Premonitory urges in Tourette's syndrome. American Journal of Psychiatry, 150, 98–102.
Ponder, E., & Kennedy, W.P. (1927). On the act of blinking. Quarterly Journal of Experimental Physiology, 18, 89–110.
Shapiro, A.K., Shapiro, E.S., Young, J.G., & Feinberg, T.E. (1988). Gilles de la Tourette syndrome. New York: Raven Press.
Stern, J.A., & Skelly, J.J. (1984). The eye blink and workload considerations. Proceedings of the Human Factors Society, 28, 942–944.
Weiner, E.A., & Concepcion, P. (1975). Effects of affective stimuli mode on eye-blink rate and anxiety. Journal of Clinical Psychology, 31, 256–259.
Zametkin, A.J., Stevens, J.R., & Pittman, R. (1979). Ontogeny of spontaneous blinking and of habituation of the blink reflex. Annals of Neurology, 5, 453–457.
Section II COMPUTERIZED METHODS
Chapter 8 Sequential Testing in Psychodiagnostics Using Minimax Decision Theory H.J.Vos University of Twente, Faculty of Educational Science and Technology, Department of Educational Measurement and Data Analysis, P.O. Box 217, 7500 AE Enschede, The Netherlands

Abstract

The purpose of this paper is to derive optimal rules for sequential testing problems in psychodiagnostics. In sequential psychodiagnostic testing, each time a patient has been exposed to a new form of treatment, a decision must be made whether to declare the new treatment effective or ineffective, or to continue testing by exposing another patient suffering from the same mental health problem to the treatment. The framework of minimax decision theory is proposed for solving such sequential testing problems; that is, optimal rules are obtained by minimizing the maximum expected losses associated with all possible decision rules at each stage of testing. The main advantage of this approach is that the costs of testing can be taken into account explicitly. The paper concludes with a simulation study in which the minimax strategy is compared with other procedures in the literature for similar classification decision problems.
Introduction

Many different treatments have been developed in psychodiagnostics for patients with all kinds of mental health problems. For instance, patients suffering from anorexia nervosa, with symptoms such as severe weight loss, binge eating, dysfunctional thoughts related to eating, suicidal tendencies, and low self-esteem, might be given treatments such as psychodynamic therapy, nutrition counseling, pharmacotherapy, and cognitive-analytic therapy. The success of such treatments can be measured by different types of tests, such as questionnaires, interviews, observational schemes, attitude measurements, and so on. Suppose it is decided that a new treatment is considered effective if it turns out to be effective for a certain pre-specified proportion of the total population suffering from some mental health problem. The effectiveness of the new treatment, however, is generally tested on only a small sample of the total population suffering from that
specific mental health problem. Hence, the question arises how many patients from the sample must be tested before we can make a confident decision as to whether or not the new treatment is effective. This kind of testing procedure is known as sequential testing. Sequential tests are designed with the goal of maximizing the probability of making correct classification decisions (i.e., declaring the new treatment either effective or ineffective) while at the same time minimizing the number of patients to be tested.

For a similar problem in education, Ferguson (1969) applied Wald's well-known sequential probability ratio test (SPRT) procedure to sequential mastery testing, where the decision to be made is to classify a student as a master or a non-master, or to continue testing by administering another random item. The main advantage of sequential mastery tests is that they offer the possibility of providing shorter tests for those students who have clearly attained a certain level of mastery (or clearly not attained it) and longer tests for those for whom the mastery decision is not as clear-cut. For instance, Ferguson (1969) showed that average test lengths could be reduced by half without sacrificing classification accuracy. Similarly, the main advantage of sequential testing in psychodiagnostics is that fewer patients are needed for making decisions concerning the effectiveness of new treatments, especially if the new treatment has clearly demonstrated its effectiveness or ineffectiveness.

To demonstrate the SPRT framework, suppose that we need to determine whether a student knows more or less than some proportion p of the items in an item pool, that is, whether the student is a 'true' master or a 'true' non-master. In order to use the SPRT procedure to make this decision, an indifference region, within which the decision made does not matter, must first be selected by the decision-maker around p, say with lower and upper limits p0 and p1. A drawback of the SPRT approach, however, is
that it does not take costs of testing explicitly into account. In testing students for mastery/non-mastery this was not a real problem, since these costs can be assumed to be very low. In psychodiagnostics, however, the costs of testing a patient may be quite large. The purpose of this paper, therefore, is to derive optimal rules for sequential testing in psychodiagnostics that take costs of testing explicitly into account. Decision rules are here prescriptions specifying, for each possible observed pattern of reactions, what decision (i.e., declaring the new treatment effective or ineffective, or choosing to continue testing with another patient) has to be made. In the present paper, the minimax principle from statistical decision theory (e.g., De Groot, 1970; Lehmann, 1959) is proposed for sequential testing in psychodiagnostics.

Minimax Sequential Decision Theory

In addition to the costs of testing one additional patient (i.e., the 'cost per observation'), two other basic elements are distinguished in the framework of minimax sequential decision theory. Before these two elements can be described, however, some necessary notation must first be introduced. In the following, as in Ferguson's (1969) approach, a sequential test is supposed to be administered to a maximum of n patients (n≥1), in order to make decisions concerning the effectiveness of the new treatment within a reasonable period of time. Let the observed reaction of the k-th patient (1≤k≤n) to the new treatment be denoted by xk, which takes the value 0 or 1 for reacting negatively or positively, respectively, to the new treatment. Furthermore, let sk=x1+…+xk (0≤sk≤k) denote the observed number of patients who react positively to the new treatment after k patients have been tested. Finally, let p (0≤p≤1) denote the (unknown) proportion of the total population suffering from the mental health problem for whom the new treatment is effective. In what follows, this proportion will be denoted as the success rate of the treatment.

Assuming an observed pattern of reactions (x1,…,xk), the two other basic elements of the minimax sequential principle can now be formulated as follows: a probability model Prob(sk|p) relating sk to p at each stage of testing k (i.e., a measurement model), and a loss function describing the loss L(+, p) or L(−, p) incurred when a new treatment is declared effective or ineffective, respectively, for given p. Generally, a loss structure evaluates the total costs and benefits of each possible combination of classification outcome and success rate of the treatment.

Having introduced these two other basic elements of minimax sequential decision theory, the maximum expected losses associated with the two classification decisions can now be calculated straightforwardly at each stage of testing, by first multiplying the probability of each classification outcome by its associated loss and then summing these values; next, the maximum of these values is taken. As far as the maximum expected loss associated with the continue-testing decision is concerned, this quantity is determined by averaging the maximum expected losses associated with each of the possible future classification outcomes relative to the probabilities of observing those outcomes. These probabilities are conditional on the observed pattern of all previous reactions and are denoted as posterior predictive distributions. To compute these conditional probabilities, a prior probability must be specified in advance by the decision-maker, representing our
best prior beliefs concerning the success rate of the treatment, that is, before any patients have been tested. Optimal rules (i.e., minimax sequential rules) are now obtained by choosing the decision that minimizes the maximum expected loss at each stage of testing, using the technique of backward induction. This technique starts by considering the final stage of testing (i.e., after the last patient of the sample has been exposed to the new treatment) and then works backward to the first stage of testing (i.e., after the first patient of the sample has been exposed to the new treatment). In fact, the minimax principle assumes that it is best to prepare for the worst and to establish the maximum expected loss for each possible decision rule (e.g., van der Linden, 1981). In other words, the minimax decision rule is rather conservative and pessimistic (Coombs, Dawes, & Tversky, 1970).

Threshold Loss and Costs of Testing

In this section, the loss function and the costs of testing one random patient will be presented. Here, the well-known threshold loss function is adopted as the loss structure. The choice of this loss function implies that the "seriousness" of all possible consequences of the decisions can be summarized by possibly different constants, one for each of the possible classification outcomes. For our sequential testing problem in psychodiagnostics, a threshold loss function can be formulated at each stage of testing k as a natural extension of the one for the fixed-length mastery problem in education (e.g., van der Linden, 1981), as follows:
Table 1. Threshold loss function at stage k (1≤k≤n) of testing.

Decision                     Success rate p≤p0    Success rate p>p0
Declaring ineffectiveness    ke                   l01+ke
Declaring effectiveness      l10+ke               ke
The value p0(0≤p0≤1) denotes the minimum success rate for considering the new treatment as effective, which must be specified in advance by the decision-maker. Furthermore, the value e represents the costs of testing one additional patient. For the sake of simplicity, these costs are assumed to be equal for each classification outcome as well as for each testing occasion. Of course, these two assumptions can be relaxed in specific sequential testing problems in psychodiagnostics. Assuming the losses l00 and l11 associated with the correct classification outcomes are equal and take the smallest values, the threshold loss function in Table 1 was rescaled in such a way that l00 and l11 were equal to zero. Hence, the losses l01 (i.e., false negative) and l10 (i.e., false positive) must take positive values. Note that no losses need to be specified in Table 1 for the continue testing option. This is because the maximum expected loss associated with the continue testing option is computed at each stage of testing as a weighted average of the maximum expected losses
associated with the possible classification outcomes of future patients exposed to the new treatment, with weights equal to the probabilities of observing those outcomes (i.e., the posterior predictive distributions). For assessing the loss parameters lij (i,j=0,1; i≠j) associated with the incorrect classification outcomes, most texts on decision theory propose lottery methods (e.g., Luce & Raiffa, 1957) using the notions of desirability of outcomes to scale the consequences of each combination of classification outcome and p. However, in principle, any psychological scaling method can be used.

Binomial Probability Model

Following Wald (1947) and Ferguson (1969), in the present paper the well-known binomial model will be adopted for the probability that, after k patients have been tested, sk of them react positively to the new treatment. Its distribution at stage k of testing for given p, Prob(sk|p), can be written as follows:

Prob(sk|p) = (k choose sk) p^sk (1−p)^(k−sk)    (1)
The binomial model assumes that each patient suffering from some mental health problem can be considered as randomly drawn with replacement (i.e., independent draws) from the population of patients suffering from the same mental health problem.

Optimizing Rules for the Sequential Testing Problem

In this section, it will be shown how optimal rules for sequential testing in psychodiagnostics can be derived using the framework of minimax decision theory. In doing so, given an observed pattern of reactions (x1,…,xk), the minimax principle will first be applied to the fixed-length testing problem by determining which of the maximum expected losses associated with the two classification decisions is the smallest. Next, applying the minimax principle again, optimal rules for the sequential testing problem are derived at each stage of testing k by comparing this quantity with the maximum expected loss associated with the option of testing another random patient.

Applying the Minimax Principle to the Fixed-Length Testing Problem

As noted before, the minimax decision rule for the fixed-length testing problem can be found by minimizing the maximum expected losses associated with the two classification decisions, that is, declaring that the new treatment is either effective or ineffective. Let max[E(L(+, p)|sk)] and max[E(L(−, p)|sk)] denote, respectively, the maximum expected losses associated with these two classification decisions, given that sk patients reacted positively to the new treatment. It then follows that effectiveness of the new treatment is declared when the number of positive reactions sk is such that
max[E(L(+, p)|sk)] ≤ max[E(L(−, p)|sk)]    (2)

and ineffectiveness of the new treatment is declared otherwise. Letting y=0,1,…,k represent all possible values that the number of positive reactions sk can take after k patients have been exposed to the new treatment, and using Table 1, it can then easily be verified from (1) and (2) that effectiveness of the new treatment is declared when the number of positive reactions sk is such that (3)
and that ineffectiveness of the new treatment is declared otherwise. Since the cumulative binomial distribution function is decreasing in p, rearranging terms shows that effectiveness of the new treatment is declared when the number of positive reactions sk is such that (4) and that ineffectiveness of the new treatment is declared otherwise.

Derivation of Minimax Sequential Rules

Let d(x1,…,xk) denote the minimax sequential rule at stage k of testing; then, at each stage of testing, d(x1,…,xk) can be found by using the following backward induction computational scheme: First, the minimax sequential rule at the final stage of testing n is computed. Since the continue testing option is not available at that stage of testing, it follows immediately that the minimax sequential rule (i.e., d(x1,…,xn)) coincides with the minimax rule for the fixed-length testing problem; that is, declare effectiveness of the new treatment if the inequality in (4) holds for sk=sn and k=n; otherwise, declare ineffectiveness of the new treatment. Next, the minimax sequential rule at stage (n−1) of testing is computed by comparing the minimum of the two classification decisions, that is, min{max[E(L(+, p)|sn−1)], max[E(L(−, p)|sn−1)]}, with the maximum expected loss associated with the continue testing option. As noted before, the maximum expected loss associated with testing one more patient at stage (n−1) of testing, given an observed pattern of reactions (x1,…,xn−1), is computed by averaging the maximum expected losses associated with each of the possible future classification outcomes at the final stage of testing n, relative to the probability of observing those outcomes (i.e., backward induction). Let Prob(xn|sn−1) denote the probability of observing reaction xn (xn=0 or 1) at the final stage of testing n, given an observed number of positive reactions sn−1 on the (n−1)
previous stages of testing. Then the maximum expected loss associated with testing one more patient after (n−1) patients have been tested, max[E(L(c, p)|sn−1)], is computed as follows:

max[E(L(c, p)|sn−1)] = Σ(xn=0,1) Prob(xn|sn−1) min{max[E(L(+, p)|sn−1+xn)], max[E(L(−, p)|sn−1+xn)]}    (5)

Note that (5) averages the maximum expected losses associated with each of the possible future classification outcomes at stage n of testing, relative to the probability of observing those outcomes. Generally, Prob(xk|sk−1) is called the posterior predictive probability of observing reaction xk (xk=0 or 1) at stage k of testing, on the condition that an observed number of positive reactions sk−1 has been obtained in the (k−1) previous stages of testing. It will be indicated later on how this conditional probability can be computed. Given an observed pattern of reactions (x1,…,xn−1), the minimax sequential rule at stage (n−1) of testing (i.e., d(x1,…,xn−1)) is now given by:

d(x1,…,xn−1) = the decision (declaring effectiveness, declaring ineffectiveness, or continuing testing) whose associated maximum expected loss, max[E(L(+, p)|sn−1)], max[E(L(−, p)|sn−1)], or max[E(L(c, p)|sn−1)], is the smallest    (6)
To compute the maximum expected loss associated with the continue testing option at stage (n−2), the so-called risk at stage (n−1) of testing is needed. The risk at stage (n−1) of testing, Risk(x1,…,xn−1), is defined as the minimum of the maximum expected losses associated with all available decisions; that is, declaring the new treatment effective, declaring it ineffective, or testing one more patient. In other words:

Risk(x1,…,xn−1) = min{max[E(L(+, p)|sn−1)], max[E(L(−, p)|sn−1)], max[E(L(c, p)|sn−1)]}    (7)

The maximum expected loss associated with testing one more patient after (n−2) patients have been exposed to the new treatment with sn−2 positive reactions, max[E(L(c, p)|sn−2)], can then be computed by using the following recurrent relation (i.e., computing the expected risk):

max[E(L(c, p)|sn−2)] = Σ(xn−1=0,1) Prob(xn−1|sn−2) Risk(x1,…,xn−2,xn−1)    (8)
Given an observed pattern of reactions (x1,…,xn−2), the minimax sequential rule at stage (n−2) of testing (i.e., d(x1,…,xn−2)) can now be computed analogously to the computation of d(x1,…,xn−1) under (6). Following the same backward computational scheme as used to determine the minimax sequential rules at stages (n−1) and (n−2), the minimax sequential rules at stages (n−3),…,1 are computed.
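The backward induction scheme of equations (5)-(8) is compact enough to sketch in code. The sketch below is an illustration only, not the MINIMAX program described later: it uses the uniform-prior posterior predictive probabilities derived in the next section, and, because the minimax classification losses of (3) and (4) are not reproduced here, it substitutes Bayes expected losses under the same uniform prior as stand-ins for max[E(L(+, p)|sk)] and max[E(L(−, p)|sk)]. The constants follow the simulation study reported below (e=1, l01=l10=100, p0=0.6).

    from functools import lru_cache
    from scipy.special import betainc

    N = 10        # maximum number of patients (MNP level)
    E = 1.0       # cost of testing one patient
    L01 = 100.0   # loss for a false negative
    L10 = 100.0   # loss for a false positive
    P0 = 0.6      # minimum success rate for declaring effectiveness

    def classification_losses(k, s):
        # Stand-in for max[E(L(+,p)|sk)] and max[E(L(-,p)|sk)]: Bayes expected
        # threshold losses, with Prob(p <= P0 | s positives out of k) computed
        # from the Beta(1+s, 1+k-s) posterior under a uniform prior.
        p_ineff = betainc(1 + s, 1 + k - s, P0)
        effective = L10 * p_ineff + k * E          # false-positive risk
        ineffective = L01 * (1 - p_ineff) + k * E  # false-negative risk
        return effective, ineffective

    @lru_cache(maxsize=None)
    def risk(k, s):
        # Equations (7)-(8): minimum expected loss over all available
        # decisions after k patients with s positive reactions.
        effective, ineffective = classification_losses(k, s)
        if k == N:  # final stage: the continue option is not available
            return min(effective, ineffective)
        q = (1.0 + s) / (k + 2.0)  # posterior predictive Prob(next reaction = 1)
        cont = q * risk(k + 1, s + 1) + (1.0 - q) * risk(k + 1, s)
        return min(effective, ineffective, cont)

    def decision(k, s):
        # The sequential rule d(x1,...,xk) of equation (6).
        effective, ineffective = classification_losses(k, s)
        options = {'effective': effective, 'ineffective': ineffective}
        if k < N:
            q = (1.0 + s) / (k + 2.0)
            options['continue'] = (q * risk(k + 1, s + 1)
                                   + (1.0 - q) * risk(k + 1, s))
        return min(options, key=options.get)

    print(decision(0, 0))   # decision before any patient has been tested

Because risk() is memoized, the complete decision table over all (k, sk) pairs is obtained in O(n^2) evaluations, which is what makes the backward scheme practical even for large n.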
Computation of Posterior Predictive Distributions

For computing the maximum expected loss associated with testing one more patient after (k−1) patients have been exposed to the new treatment with sk−1 of them reacting positively to it (i.e., max[E(L(c, p)|sk−1)]), as can be seen from (5) and (8), the posterior predictive distribution Prob(xk|sk−1) is needed. To compute this conditional probability, as previously noted, a prior probability for the success rate p of the treatment must be specified. Since the minimax principle is most attractive (e.g., Coombs et al., 1970) when the only information available is the number of patients who reacted positively to the new treatment, the uniform prior on the standard interval [0,1] is taken in this paper as a special form of the beta prior B(α, β). That is, a priori the success rate of the treatment can take on all values between 0 and 1 with equal probability. The beta prior B(α, β) reduces to the uniform prior on [0,1] for α=β=1. If sk−1 patients reacted positively to the new treatment after (k−1) patients have been exposed to it, then, assuming a uniform prior in combination with the binomial distribution for the measurement model, it is known (e.g., De Groot, 1970) that the probability of a positive reaction to the new treatment by the k-th patient (i.e., Prob(xk=1|sk−1)) is equal to (1+sk−1)/(k+1). Since the probabilities of a positive and a negative reaction must sum to 1, it follows immediately that the probability of a negative reaction to the new treatment by the k-th patient (i.e., Prob(xk=0|sk−1)) is equal to [1−(1+sk−1)/(k+1)]=(k−sk−1)/(k+1).

Simulation of Different Testing Strategies

In a Monte Carlo simulation, the minimax sequential strategy will be compared with other existing approaches to testing in psychodiagnostics. As an example, the simulation study is concerned with the effectiveness of a cognitive-analytic therapy for patients suffering from anorexia nervosa.

Description of Different Testing Strategies

The first comparison will be made with a conventional fixed-length test (CT). The cognitive-analytic therapy was declared effective if 60% or more of the maximum number of patients to be tested reacted positively to the new therapy, whereas ineffectiveness was declared otherwise. The second comparison will be made with Wald's SPRT procedure. The limits of the indifference region around p, in which the decision made does not matter, were set symmetrically around 0.6, with a band width of 0.2; that is, pn and pm were set at 0.5 and 0.7, respectively. Furthermore, the Type I and Type II error rates (i.e., α and β) were each set equal to 0.1. According to the SPRT procedure, after k patients have been exposed to the new treatment with sk of them reacting positively, effectiveness of the new treatment is declared if the likelihood ratio
Prob(sk|pn)/Prob(sk|pm) = [pn^sk (1−pn)^(k−sk)] / [pm^sk (1−pm)^(k−sk)]

was smaller than α/(1−β). Ineffectiveness was declared if this likelihood ratio was larger than (1−α)/β, and otherwise another random patient was tested. If no classification decision concerning effectiveness or ineffectiveness of the cognitive-analytic therapy could be made before the maximum number of available patients was tested, a classification decision was made in the same way as in the CT procedure, using a proportion-success value of 0.6. In order to make a fair comparison of the minimax sequential strategy with the two strategies described above, the minimum success rate p0 of the treatment was set equal to 0.6. Furthermore, the losses l01 and l10 associated with the incorrect classification outcomes were assumed to be equal, corresponding to the assumption of equal error rates in Wald's SPRT procedure. On a scale in which one unit corresponded to the cost of testing one random patient (i.e., e=1), l01 and l10 were each set equal to 100, in order to reflect the fact that the cost of testing another patient was assumed to be small relative to the costs associated with incorrect classification outcomes. Using the backward induction computational scheme discussed earlier, for a given maximum number of n patients, a computer program called MINIMAX was developed to determine the appropriate decision (i.e., declaring effectiveness, declaring ineffectiveness, or continuing testing) for the minimax strategy at each stage of testing k for a different number of positive reactions sk. A copy of the program MINIMAX is available from the author upon request.

Type of Test and Maximum Number of Patients to be Tested

The simulation study will be conducted using a test modeled by the one-parameter logistic model (Rasch, 1960). Three levels of the maximum number of patients to be tested (MNP) will be examined: 10, 25, and 50 patients.

Patient Reactions Generation

Reactions for simulated patients with the same parameter p were generated for the test. Values for the success rate p were obtained by first randomly drawing z values from a N(0,1) distribution and next applying a logistic transformation to these z values such that the median of the resulting p values was equal to the success rate p0. For a known value of p and given test parameters (i.e., the difficulty of the test), first the probability of a positive reaction was calculated using the one-parameter logistic model. Next, this probability was compared with a random number drawn from the uniform distribution on the range from 0 to 1. The test administered to the simulated patient was scored as a positive or a negative reaction when this randomly selected number was less than or greater than the probability of a positive reaction, respectively. For each of the three testing strategies, reactions of patients with the same value of p were generated in this way for the three MNP levels of 10, 25, and 50. After a classification decision was made, the simulation experiment was repeated with patients having another value for the parameter p. The simulation experiment described above was repeated a total of 500 times, each time with a different value for p. Furthermore, the cognitive-analytic therapy was considered "truly" effective in a simulation run if
the parameter p used to generate the reactions of the patients in that particular run was higher than p0.

Results of the Monte Carlo Simulation

In this section, the results of the Monte Carlo simulations will be compared for the three different testing strategies in terms of the average number of patients to be tested (i.e., the number of patients tested before an effective/ineffective decision is made), classification accuracy, and classification accuracy as a function of the average number of patients to be tested.

Average Number of Patients to be Tested

Table 2 shows the average number of patients required by each of the three testing strategies before an effective/ineffective decision can be made. The minimax strategy is denoted as MINI.
Table 2. Average number of patients to be tested.

            Maximum Number of Patients to be Tested
Strategy    10        25        50
CT          10        25        50
SPRT        7.08      9.30      9.86
MINI        4.61      7.48      8.21
As can be seen from Table 2, the MINI strategy resulted in considerable reductions of the average number of patients to be tested at each MNP level. Table 2 also shows that the MINI procedure resulted in a greater reduction of the average number of patients to be tested than the SPRT strategy at all MNP levels. Finally, just as with the SPRT strategy, it can be inferred from Table 2 that the reduction of the average number of patients to be tested increased under the MINI strategy with increasing MNP level. More specifically, the average number of patients to be tested was reduced by 53.90%, 70.08%, and 83.58% at the 10, 25, and 50 MNP levels, respectively.

Classification Accuracy

Table 3 shows phi correlations between true classification status (i.e., true effectiveness or true ineffectiveness) and estimated classification status (i.e., declaring effectiveness or ineffectiveness) for the three testing procedures at each MNP level. These phi correlations can be considered an indicator of the accuracy of the classification decisions. As can be seen from Table 3, the phi correlations increased under the MINI strategy with increasing MNP level.
Table 3. Phi correlations between true classification status and estimated classification status.

            Maximum Number of Patients to be Tested
Strategy    10        25        50
CT          0.856     0.838     0.827
SPRT        0.854     0.829     0.847
MINI        0.821     0.829     0.830
Most Efficient Testing Strategy

Kingsbury and Weiss (1983) graphically depicted the phi correlation as a function of the average number of patients to be tested for each testing strategy. From these graphs, conclusions were derived concerning which testing strategy was most efficient. A testing strategy was said to be most efficient if it resulted in the combination of the highest phi correlation and the smallest average number of patients to be tested. As can be seen from Tables 2 and 3, the MINI strategy required a smaller average number of patients to be tested than the two other strategies at each MNP level, whereas its phi correlations were generally lower at each MNP level. Therefore, in order to examine which strategy is most efficient, we compute for the other two strategies the average number of patients to be tested at each MNP level for achieving the same phi correlation as under the MINI strategy. In other words, we match the average number of patients to be tested on the classification accuracy. As indicated in Tables 2 and 3, the MINI strategy resulted in a phi correlation of 0.830 for 8.21 patients to be tested on average at the 50 MNP level. Interpolating the data from Tables 2 and 3, it can easily be verified that the SPRT procedure would need to test 9.33 patients on average to achieve this same phi correlation of 0.830, whereas the CT procedure would need to test 43.18 patients on average. Similarly, as shown by Tables 2 and 3, the MINI strategy resulted in a phi correlation of 0.829 for 7.48 patients to be tested on average at the 25 MNP level. The SPRT and CT procedures would need to test 9.30 and 45.46 patients on average, respectively, to achieve this same phi correlation of 0.829. Finally, as indicated by Tables 2 and 3, the MINI procedure resulted in a phi correlation of 0.821 for 4.61 patients to be tested on average at the 10 MNP level. The SPRT and CT procedures would need to test 10.01 and 39.17 patients on average, respectively, to achieve this same phi correlation of 0.821. Hence, it can be concluded that the MINI procedure was the most efficient of the three testing procedures at each MNP level for the specific values of the parameters chosen in the simulation study.
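The matched comparisons above follow from linear interpolation between adjacent MNP levels in Tables 2 and 3. The 50 MNP case, for example, can be reproduced with a few lines of Python:

    def n_at_phi(point1, point2, target_phi):
        # Linear interpolation of the average number of patients needed
        # to reach target_phi between two (average n, phi) observations.
        (n1, f1), (n2, f2) = point1, point2
        return n1 + (target_phi - f1) / (f2 - f1) * (n2 - n1)

    # SPRT and CT at the 25 and 50 MNP levels (from Tables 2 and 3).
    print(round(n_at_phi((9.30, 0.829), (9.86, 0.847), 0.830), 2))  # 9.33
    print(round(n_at_phi((25.0, 0.838), (50.0, 0.827), 0.830), 2))  # 43.18

The two printed values agree with the 9.33 and 43.18 patients reported in the text, which suggests that this is indeed the interpolation method used.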
Discussion

Optimal rules for the sequential testing problem in psychodiagnostics (declaring effectiveness, declaring ineffectiveness, or testing another random patient) were derived using the framework of minimax decision theory. The binomial probability function was assumed for modeling the number of tested patients reacting positively to the new treatment, given its success rate. Furthermore, threshold loss was adopted for the loss function involved. In a Monte Carlo simulation, the minimax procedure was compared to a conventional fixed-length test and to the SPRT procedure. The maximum number of patients to be tested varied from 10 to 50. The results of the simulation study indicated that the minimax strategy was most efficient (i.e., the combination of the highest classification accuracy and the smallest average number of patients to be tested) for test pools reflecting the one-parameter logistic model at each level of the maximum number of patients to be tested. Several procedures have been proposed which are simple variants of the minimax strategy (e.g., Coombs et al., 1970), and these may also be applied to the sequential testing problem in psychodiagnostics. The first is the minimin (complete optimism) strategy, where optimal rules are obtained by minimizing the minimum expected losses associated with all possible decision rules. This strategy is optimal if the best that could happen always happens. The second is the pessimism-optimism (Hurwicz) strategy. This strategy is a combination of the minimax and minimin strategies, and decisions are made on the basis of the smallest and largest expected losses associated with each possible decision rule. The third is the minimax-regret (Savage) strategy. This strategy is similar to the minimax strategy in that the focus is on the worst possible decision outcome, but 'worst' is here defined by maximal regret, that is, the difference between the maximal expected loss that was actually obtained and the smallest maximal expected loss among all possible decision outcomes. A final note is appropriate. Following the same line of reasoning as in the present paper, the optimal rules derived here can easily be generalized to the situation where three or more mutually exclusive classification categories can be distinguished; for instance, for classification of a new treatment as significantly effective, moderately effective, or ineffective.

References

Coombs, C.H., Dawes, R.M., & Tversky, A. (1970). Mathematical psychology: An elementary introduction. Englewood Cliffs, New Jersey: Prentice-Hall Inc.
De Groot, M.H. (1970). Optimal statistical decisions. New York: McGraw-Hill.
Ferguson, R.L. (1969). The development, implementation, and evaluation of a computer-assisted branched test for a program of individually prescribed instruction. Unpublished doctoral dissertation, University of Pittsburgh, Pittsburgh, PA.
Kingsbury, G.G., & Weiss, D.J. (1983). A comparison of IRT-based adaptive mastery testing and a sequential mastery testing procedure. In D.J. Weiss (Ed.), New horizons in testing: Latent trait test theory and computerized adaptive testing (pp. 257–283). New York: Academic Press.
Lehmann, E.L. (1959). Testing statistical hypotheses (3rd ed.). New York: Macmillan.
Linden, W.J. van der (1981). Decision models for use with criterion-referenced tests. Applied Psychological Measurement, 4, 469–492.
Luce, R.D., & Raiffa, H. (1957). Games and decisions. New York: John Wiley and Sons.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research.
Wald, A. (1947). Sequential analysis. New York: John Wiley and Sons.
Chapter 9 Visual Ordination Techniques in Grid Methodology: The Example of the Self-Confrontation Method R.van Geel, H.de Mey, and N.Bendermacher Rutten Institute for Research in Psychology, University of Nijmegen, Montessorilaan 3, 6525 HR Nijmegen, The Netherlands Abstract The self-confrontation method (SCM) is a form of psychotherapy, in which the personal experiences of a client (i.e., valuations) are ordered thematically. In this study, a numerical method is proposed for depicting these valuations within a two-dimensional hexagonal frame. Such a geometric depiction is intended to facilitate and improve the communication between psychologist and client. Typically, a client connects affect-denoting terms from a standard list to every valuation. This list contains indicators of self-enhancement (e.g., strength, self-confidence), of contact and union (e.g., love, tenderness), of positive affect (e.g., happiness, joy), and of negative affect (e.g., despondency and disappointment). A typology of valuations, which is based on these four scales, lends itself well to the purpose of mapping within a (flexible or fixed) hexagonal frame. The angular points of the hexagon are formed by the (theoretically defined) extreme elements of each of the six types of valuations most frequently encountered in valuation research (+O, +HH, +S, −S, −LL and −O). Extreme elements arrange themselves on the boundaries of a valuation system, thus generating a hexagonal frame. With the aid of a recent version of the computer program KUNGRID, a (fixed or flexible) hexagon analysis can easily be executed. An empirical study (n=210) on the quality of our hexagon analysis shows that both the flexible and fixed methods generate high quality depictions. The fixed method, in particular, is very promising because it allows valuation systems to be projected against a theoretically meaningful background.
Introduction

Two-dimensional representations of grids are commonly used in idiographic personality research. In Repertory Grid studies (Kelly, 1955), Principal Component Analysis (PCA) has been applied to that end (Rathod, 1981a; Slater, 1964, 1977), and is contained in programs such as INGRID. Applications of INGRID can be found in a variety of contexts
(e.g., Beail, 1985; Bonarius, Holland & Rosenberg, 1981; Fransella & Bannister, 1977; Slater, 1976). Programs similar to INGRID have been developed by Rathod (1981b), e.g., PAGAN, and by Tschudi et al. (1993), e.g., FLEXIGRID, the latter as part of a compilation of programs for grid methodology. Although, from a technical point of view, the self-confrontation method (Hermans, 1976; Hermans & Hermans-Jansen, 1995) has strong ties with repertory-grid methodology (Rathod, 1982; Takens, 1994; Tschudi, 1994), only recently has a PCA program been specifically designed for making two-dimensional representations of SCM grid data; this program is called KUNGRID (Thissen-Pennings, Bendermacher, Van Geel & De Mey, 1996). Hermans' self-confrontation method (SCM) is a genuinely idiographic instrument, rooted in valuation theory (Hermans, 1976). In the SCM, two persons are engaged in a profound dialogue with each other. One person, usually a psychologist, invites the other, who is considered a co-investigator (Hermans, 1991; Hermans & Bonarius, 1991), to construct so-called "valuations," i.e., narrative elements that reflect important experiences about the past, present and future. A valuation is anything people identify as a relevant meaning unit when telling their life narrative: any unit of meaning that has a positive (pleasant), negative (unpleasant), or ambivalent (both pleasant and unpleasant) value in the eyes of the self-reflecting individual. It can include a broad range of phenomena: a precious memory, a difficult problem, a beloved person, an unreachable goal, the anticipated death of a significant other, and so forth (Hermans & Hermans-Jansen, 1995, p. 15). After the dialogue, the individual fills in a matrix, of which the rows consist of valuations (ca. 20–40) and the columns each represent an affect term, from a list of 16, 24, or 30 affects. Each affect is rated per valuation on a 0–5 intensity scale. In this way, each valuation yields a characteristic profile. Using a measure of proximity, similar valuations (i.e., valuations with similar affective profiles) and contrasting valuations can be detected.
Table 1. Basic types of valuations and their underlying themes.

Type    S      O      P      N      Theme
+S      high   low    high   low    success, autonomy, perseverance
−S      high   low    low    high   aggression, anger, opposition
+O      low    high   high   low    love and unity
−O      low    high   low    high   unfulfilled longing, loss
−LL     low    low    low    high   powerlessness and isolation
+HH     high   high   high   low    strength and unity

Note. The levels of the S, O, P and N scales (the SOPN profile) determine the classification of each valuation within the typology. For valuations of the types +S, −S, +O and −O, the differences between the levels of S and O, and of P and N, respectively, must be ≥6. The negative −LL type valuations have a low score on both S and O (i.e., S+O≤7) and the difference between S and O is small (|O−S|≤4). The positive +HH type valuations have high scores on both S and O (S+O≥20), and |O−S|≤5. Adapted from Hermans et al. (1985).
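For illustration, the classification rules in the note can be written out directly. The cut-off values and comparison signs below follow the reconstruction given above (the printed symbols are partly garbled), so they are assumptions that should be checked against Hermans et al. (1985) before serious use:

    def classify_valuation(S, O, P, N):
        # Classify a valuation from its S, O, P, N sum scores into one of
        # the six types of Table 1; returns None for profiles that are not
        # a "pure" type. Thresholds are reconstructed assumptions.
        if S + O >= 20 and abs(O - S) <= 5:
            return '+HH'
        if S + O <= 7 and abs(O - S) <= 4:
            return '-LL'
        if S - O >= 6 and P - N >= 6:
            return '+S'
        if S - O >= 6 and N - P >= 6:
            return '-S'
        if O - S >= 6 and P - N >= 6:
            return '+O'
        if O - S >= 6 and N - P >= 6:
            return '-O'
        return None

    # The -O example quoted below (S=6, O=16, P=5, N=11):
    print(classify_valuation(6, 16, 5, 11))   # '-O'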
On the basis of these profiles, valuations can also be classified into a typology. Such a typology gradually evolved in the course of the development of valuation theory (Hermans, 1992; Hermans & Hermans-Jansen, 1995; Hermans & Van Gilst, 1991; Hermans, Hermans-Jansen, & Van Gilst, 1985, 1987), and distinguishes six major types of experiences, each representing a different theme. As can be gleaned from Table 1, these types are generated by dual combinations of four indices (see below), each consisting of the sum of the scores of the affect terms that make up the scale or index (the affect terms mentioned under the scales constitute the 16-affect list).

1. S-scale: feelings that represent the striving for self-enhancement, like self-esteem, strength, self-confidence and pride;
2. O-scale: feelings that reflect the striving for contact with the other: caring, love, tenderness and intimacy;
3. P-scale: positive feelings like joy, happiness, enjoyment and inner calm;
4. N-scale: negative feelings like worry, unhappiness, despondency and disappointment.

For example, high levels on the N and O indices are often associated with valuations in which the striving for contact with someone is unfulfilled, as in "Due to our preoccupation with our sick daughter, our son was rather neglected (S=6, O=16, P=5, N=11)" (Hermans & Hermans-Jansen, 1995, p. 89). Its pattern of affect marks it as a valuation of the −O type (see Table 1). Examples of this and other types of valuations can be found in the book by Hermans and Hermans-Jansen (1995), and in Van Geel (2000).

Proximity Measures: Euclidean Distance versus Correlation

Within SCM research, Pearson's r has always been the standard proximity measure used to compare valuations with respect to their affective profiles. With this measure, similar and dissimilar valuations can be detected, with high positive values of r indicating high similarity, and high negative values indicating affective contrast between two valuations. One of the main disadvantages of r, however, is its high sensitivity to shape (not level). Thus, it is possible that valuations very different in level are nevertheless highly related to each other via a high value of r, which can be misleading in some cases. Elsewhere, we elaborated on the disadvantages of r as a proximity measure in SCM research, and proposed Euclidean distance as an alternative (Van Geel & De Mey, 1996). There are studies, albeit in somewhat different contexts, in which a comparable line of reasoning has been followed (Everitt, 1974; Mackay, 1992; Rathod, 1981a).

Evaluating the Quality of Ordinations

One of the methods for assessing the quality of ordinations is calculating the "cophenetic correlation coefficient," as proposed by Sokal and Rohlf (1962). This product-moment correlation coefficient, explained below, is "a measure of agreement between the similarity values implied by the phenogram and those of the original similarity matrix" (Sneath & Sokal, 1973, p. 278).
We can investigate the performance of an ordination or mapping technique in the SCM by computing the cophenetic coefficient between the distances among valuations in the original space, on the one hand, and the distances in the alternative two-dimensional space, on the other. If we designate the original distances as d and the corresponding distances in the graphical representations as d*, the cophenetic correlation coefficient rdd* (i.e., the Pearson correlation between the distance pairs (d,d*)) tells us to what extent the distances are preserved in the alternative space.

Representation of a Single Valuation System in the SCM: A Short History

In the early stages of valuation theory and of the self-confrontation method (Hermans, 1986), valuations were projected, according to their type, on the inside of a vase, of which the boundaries were named after the six types distinguished by the theory. This depiction could then be used during an SCM dialogue for summarizing a person's valuation system. However, in this way much information was lost. In particular, not all valuations could be situated, because a certain number of them were not "pure" types according to the criteria. In addition, the distances among valuations, which inform us about the affective similarity of valuations, could not be gleaned from that depiction. Another way to picture single valuations in space is to make use of mathematically derived algorithms in order to position the valuations in such a manner that their locations and the distances between them, as contained in the matrix of affects by valuations, are maximally preserved. The program KUNGRID (Thissen-Pennings et al., 1996), as mentioned previously, was designed precisely with the aim of preserving maximal information in two dimensions. Van Geel, De Mey, Thissen-Pennings and Bendermacher (2000) compared KUNGRID with the multi-dimensional scaling algorithm ALSCAL, and found the depictions in two dimensions to be of high quality for both algorithms. Although the preservation of information is important, theoretical as well as practical considerations may bend the depiction towards a less perfect but more sensible and useful representation of the actual data. With respect to SCM research, this translates into an ordination that depends not merely on the specific characteristics of a particular SCM grid, but also on the standard, theoretical orientation of scales and types. Such standardizations have been achieved, for example, in fields of personality research with the aid of circular models (e.g., Wiggins & Broughton, 1991; Plutchik & Conte, 1997). This preference for a standardized organization of types in two dimensions shows up in recent work on valuation theory. In Hermans (1991, 1996) and Hermans and Hermans-Jansen (1995), we see valuation types arranged in a circular pattern, in the form of a hexagon (see Figure 1a). In this orientation, +O types are situated in the first (upper right) sextant (see Figure 1b). Then, moving anti-clockwise, +HH valuations are positioned in the second sextant, +S in the third, −S in the fourth, −LL in the fifth, and −O valuations in the sixth sextant.
Figure 1a. Hexagonal model of Self-Narratives (adapted from Hermans & Hermans-Jansen, 1995).
Figure 1b. Theoretical Segments in Depictions of Valuation-Systems.
In the hexagon, two main axes or dimensions can be recognized: a horizontal axis differentiating the S elements (+S and −S) from the O elements (+O and −O), and a vertical axis separating the negative (−S, −LL, −O) from the positive elements (+S, +HH, +O). The location of the types in the hexagon is strongly tied to the theoretical orientation of the axes. Our aim is to investigate which method is most suitable for obtaining such an "ideal" hexagonal arrangement of types for a single valuation system. To that end, two aspects must be considered. Firstly, the information in the original Euclidean distances among affect profiles must be preserved in the two-dimensional depiction (see the earlier paragraph about the quality of ordinations). Secondly, the orientation of scales and the location of types must be adequate. A method is adequate in this respect if it depicts types in accordance with the hexagonal model. Although a wide variety of methods is eligible for analysis, four methods in particular will be elaborated (see Table 2).
Figure 2a. Depiction generated according to Method I (rdd*=.85).

Figure 2a shows an example of an ordination generated by Method I. Although the goal of positioning the elements in the desired orientation is achieved, the original distances are rather distorted (rdd*=.85).
Table 2. Overview of ordination methods aiming at an ideal orientation of scales and types.

Method I (information from: difference scales). Straightforward projection in the plane, with the difference scales O−S and P−N as coordinates on the horizontal and vertical axes, respectively.
+: Types visualized in the desired orientation.
−: Original distances among elements of similar type can be highly distorted due to strong assumptions.
++: Simple method that can be done by hand (transparent method).

Method II (information from: affect terms and scales). PCA of affect terms (2-factor solution), followed by an approximation to a desired factor structure of the S, O, P and N scales (RØTA03: Roskam & Borgers, 1969), with target loadings S=(−1, 0), O=(1, 0), P=(0, 1), N=(0, −1) on the two dimensions.
+/−: Approximation to the desired orientation is not guaranteed in all cases; in some cases, due to little variability in types, the ideal orientation of axes is not attainable.
++: Original distances among elements very well preserved.

Method III (information from: affect terms and marker profiles). Points are projected onto the plane containing six markers constituting a fixed hexagon. Factor score coefficients derived from a PCA on the affect terms of the six markers are used for the projection.
+++: Desired orientation is perfect by construction.
+: Original distances among elements well preserved.
+++: Easily accessible depiction due to the markers and the fixed shape and orientation of the hexagon.

Method IV (information from: affect terms, marker profiles and scales). PCA of the affect terms of the matrix augmented with six rows representing six extreme profiles (2-factor solution), followed by an approximation to a desired factor structure in the S, O, P and N scales (RØTA03: Roskam & Borgers, 1969), with the same target loadings as in Method II.
++: Approximation to the desired orientation is very good due to the inclusion of marker profiles.
++: Original distances among elements well preserved.
++: Easily accessible depiction due to markers.

Note. + = positive, ++ = very positive, +++ = extremely positive, +/− = doubtful, − = weakness.
Figure 2b. Depiction generated according to Method II (rdd*=.95)
Method II aims at an axis orientation similar to that of Method I: a first bipolar dimension differentiating S from O elements, and a second dimension separating negative from positive elements. Firstly, a two-dimensional PCA ordination (as in KUNGRID) is performed on the affect terms, followed by a rotation of the scales to the ideal configuration (see Table 2). Secondly, Method II tries to approximate the desired configuration via the separate scales. It integrates theoretical considerations with empirical data, in contrast to Method I, which is more theory guided. Although the model of Method II also assumes orthogonality between the two bipolar axes, the S and O scales will not necessarily be perpendicular to the P and N scales, since the model only functions as a target. For the same reason, the S scale will not necessarily be in line with the O scale, nor will the P scale with the N scale. In other words, Method II approximates an ideal configuration without neglecting the empirical relationships among the S, O, P and N scales of a specific SCM matrix. With regard to the axes, Method I uses fixed axes, whereas the axes of Method II are flexible. In Method I, the orthogonal orientation of the axes is guaranteed at the expense of a distortion of the distances among the elements. In Method II, distances are well preserved, yet the orthogonal orientation of the axes is not guaranteed (only approximated). In other words, in Method I the theoretical axes dominate the solution, whereas in Method II they only guide it. Figure 2b shows an example of a depiction made via Method II. It is of high quality (rdd*=.95), and the positioning of the types agrees well with that of the model (Figure 1b).
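In outline, Method II can be sketched as follows. RØTA03 itself is not available to us, so the rotation toward the target configuration of Table 2 is approximated here by an orthogonal Procrustes rotation; this is an illustrative stand-in, not the procedure used in KUNGRID:

    import numpy as np

    def method_two(X, scale_columns):
        # Two-dimensional PCA of the affect matrix X (valuations x affect
        # terms), followed by an orthogonal rotation of the S, O, P, N
        # scale directions toward the target S=(-1,0), O=(1,0), P=(0,1),
        # N=(0,-1).
        Xc = X - X.mean(axis=0)
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        coords = U[:, :2] * s[:2]          # valuation coordinates (n x 2)
        loadings = Vt[:2, :]               # component loadings (2 x terms)
        # Direction of each sum scale in component space (4 x 2 matrix).
        A = np.stack([loadings[:, scale_columns[k]].sum(axis=1)
                      for k in 'SOPN'])
        T = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
        # Orthogonal Procrustes rotation R minimizing ||A @ R - T||_F.
        M, _, Wt = np.linalg.svd(A.T @ T)
        R = M @ Wt
        return coords @ R

    # Example with the 16-affect list, assuming (hypothetically) that
    # columns 0-3 are S terms, 4-7 O terms, 8-11 P terms, 12-15 N terms:
    # coords = method_two(X, {'S': [0, 1, 2, 3], 'O': [4, 5, 6, 7],
    #                         'P': [8, 9, 10, 11], 'N': [12, 13, 14, 15]})

Because the rotation is orthogonal, the inter-point distances of the PCA solution, and hence rdd*, are unchanged; only the orientation of the axes is adapted, which is exactly the "flexible" behavior described above.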
Figure 2c. Depiction generated according to Method III (rdd*=.92).
Methods for Obtaining a Hexagonal Configuration of Types

The four methods employ information derived from scales, from affect terms, from so-called markers (extreme profiles), or from combinations of the three. In Method I, the locations of elements in the plane are defined by two perpendicular axes (O−S and P−N). Here, we calculate the coordinates O−S and P−N for the first and second axes, respectively. Hence, +O valuations will be positioned in the first (upper right) sextant, since they have positive scores on O−S (O>S) and positive scores on P−N (P>N). Similarly, +S, −S and −O valuations are located in the third, fourth, and sixth sextants, respectively, according to the hexagonal configuration of Figure 1b. The valuations +HH and −LL, which both have approximately equal S and O scores, are positioned in the second (between +S and +O) and fifth (between −S and −O) sextants, respectively. Although this method is fairly easy to execute, and approximates the ideal configuration rather well, one disadvantage is that the underlying model makes a strong assumption with respect to the correlation between the O−S and P−N difference scales, which is assumed to be zero. As a result, the model may greatly distort the original distances among elements.
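Method I is simple enough to state in two lines of code; the following sketch uses the −O example valuation from the typology section:

    def method_one_coordinates(S, O, P, N):
        # Method I: the difference scales serve directly as coordinates.
        return (O - S, P - N)

    # The -O example (S=6, O=16, P=5, N=11): positive O-S, negative P-N,
    # placing it in the lower-right (sixth) sextant of Figure 1b.
    print(method_one_coordinates(6, 16, 5, 11))   # (10, -6)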
Figure 2d. Depiction generated according to Method IV (rdd*=.94).
Method III uses so-called “marker” profiles, projected as marker points in a hexagonal plane. Basically, marker profiles are extreme profiles of each of the six valuation types.
Table 3. Six marker profiles used in hexagon analysis: affect-term scores (s1–s4, o1–o4, p1–p4, n1–n4) and coordinates in the fixed hexagon (Method III).

Type   s1 s2 s3 s4   o1 o2 o3 o4   p1 p2 p3 p4   n1 n2 n3 n4    Dim1     Dim2
+HH     5  5  5  5    5  5  5  5    5  5  5  5    0  0  0  0    0.00     6.53
+S      5  5  5  5    0  0  0  0    5  5  5  5    0  0  0  0   −5.00     4.62
−S      5  5  5  5    0  0  0  0    0  0  0  0    5  5  5  5   −5.00    −4.62
−LL     0  0  0  0    0  0  0  0    0  0  0  0    5  5  5  5    0.00    −6.53
−O      0  0  0  0    5  5  5  5    0  0  0  0    5  5  5  5    5.00    −4.62
+O      0  0  0  0    5  5  5  5    5  5  5  5    0  0  0  0    5.00     4.62
Mean   2.5 (all affect terms)                                   0.00     0.00
SD     2.5 (all affect terms)                                   4.08     5.33

Note. Dim1=(O−S)/4; Dim2=0.38269 * (S+O)/4 + 0.92388 * (P−N)/4 − 1.91343; S=s1+s2+s3+s4, O=o1+o2+o3+o4, P=p1+p2+p3+p4, N=n1+n2+n3+n4.
As shown in Table 3, the +HH marker profile contains the highest possible scores on the S-, O-, and P-feelings, and the lowest possible scores on the N-feelings. The opposite holds for the −LL marker profile. If we project these marker profiles as marker points on the boundaries of a valuation system, a useful hexagonal frame emerges in which a person's valuation types can then be easily situated. The techniques used to obtain a fixed hexagon in which other "outside" elements can be projected are based on principal component analysis, wherein elements are treated either actively or passively (Van de Geer, 1988). In Method III, the six markers are treated actively, i.e., they are used in a PCA to generate a hexagon. Due to their typical features, the extreme markers will, in KUNGRID, arrange themselves in the form of a hexagon, at the same time maximally preserving their original mutual distances. After this initial step, factor score coefficients derived from this PCA (with six elements) are employed to project "outside" elements into the hexagon. Because they do not affect the location of the other elements, these "outside" elements are said to be treated passively. It can be proven that—with S, O, P and N as sum scales, each composed of an equal number of affect terms i—this procedure comes down to the calculation of coordinates according to the following formulae:
Dim1 = (O−S)/4
Dim2 = 0.38269 * (S+O)/4 + 0.92388 * (P−N)/4 − 1.91343
The weights in the formulae are derived from a principal component analysis of the six markers, with each "sum scale" containing one affect term (i=1). Set up in this way, hexagon analysis yields standardized results, with coordinates ranging over [−5.00, 5.00] on the first dimension and over [−6.53, 6.53] on the second.1 An example of an ordination produced by Method III is shown in Figure 2c. The reader can observe that the marker profiles are positioned in their fixed locations, as indicated in Table 3. By fixing the locations of the marker profiles, a highly standardized depiction of a set of affect profiles is obtained, with a fixed (hexagonal) configuration as background. Method IV is another way of approximating a hexagon, by adding "marker" profiles to the existing matrix of affects by valuations. Here, both the marker profiles and the other profiles are treated actively. As a result, the points of the hexagon will not be positioned in a fixed way. Nevertheless, due to their extremity, they will strongly affect the solution. In order to obtain the theoretical orientation of types (and scales), a rotation is performed. An example of an ordination according to Method IV is given in Figure 2d. Both quality and orientation are adequate, thereby combining the features of Methods II and III.
1 A similar configuration can be obtained by including six markers in a KUNGRID ordination. By treating the markers as "active" elements and the remaining affect profiles as "passive," a (fixed) hexagon analysis is performed. In addition, the program output displays the concomitant cophenetic correlation coefficient of the passive elements, expressing the quality of the ordination.
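The closed-form projection of Method III follows directly from the formulae above; a minimal sketch:

    def hexagon_coordinates(S, O, P, N):
        # Fixed-hexagon (Method III) coordinates of an affect profile from
        # its S, O, P, N sum scores, per the note to Table 3.
        dim1 = (O - S) / 4.0
        dim2 = 0.38269 * (S + O) / 4.0 + 0.92388 * (P - N) / 4.0 - 1.91343
        return dim1, dim2

    # The marker profiles reproduce the fixed corners of Table 3, e.g. the
    # +HH marker (S=O=P=20, N=0) maps to (0.00, 6.53):
    print(hexagon_coordinates(20, 20, 20, 0))

Because each coordinate is a fixed linear function of a profile's own scale scores, the location of one valuation is independent of all other valuations in the grid, which is precisely the "fixed" property discussed next.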
Looking back at these four methods, we may notice that they fall nicely along two classification principles, one consisting of a fixed versus a flexible approach, the other consisting of the use of markers versus no use of markers (see Table 4).
Table 4. Methods for obtaining a hexagonal configuration.

                    Fixed approach:    Flexible approach: PCA, followed
                    no PCA             by congruence rotation
passive markers     I                  II
active markers      III                IV
In a fixed approach, for each SCM matrix, the location of a single valuation is independent of that of the other valuations. Methods I and III are examples of this approach, as is evident from the employed formulas. In a flexible approach, using PCA, the location is dependent on the locations of the other elements (Methods II and IV).
Methods III and IV include so-called marker profiles for each of the six types. The reason for including these marker profiles is the notion that extremes greatly determine the features of a space. In the fixed variant, the markers are positioned in hexagonal form (Method III), in contrast to the rectangular form seen in Method I. In the flexible variant, the inclusion of markers in a principal component analysis compensates for a possible lack of diversity in types. For SCM grids lacking diversity, an ideal orientation of types is difficult to attain by Method II. The inclusion of extreme markers undoubtedly yields a better orientation (Method IV).
Table 5. Mean quality of the two-dimensional representations generated by the four methods of ordination.

Method    Mean    Standard deviation
I         .888    .053
II        .955    .029
III       .929    .046
IV        .947    .035

Note. n=210.
A prerequisite for any ordination method, whether "fixed" or "flexible," is the preservation of information. Without a sufficiently high quality, a depiction cannot be considered a reliable summary of a valuation system. Therefore, we finish this study by presenting the average quality of the depictions produced by each of the four methods, as ascertained in a study of 210 SCM matrices. (For details about the data gathering, see Van Geel, 2000, Ch. 6.) Table 5 shows that, as expected, the flexible methods (Methods II and IV) yield the better preservation of distances. Of the fixed methods, Method I yields pictures of doubtful quality, whereas Method III can compete with the two flexible methods. Method III thus appears to combine the advantage of the standard orientation of types with that of the preservation of distance information.

Concluding Remarks

The aim of this study was to let theoretical considerations enter into the picture by looking for a method that could combine the requirement of preserving distances with the need for surveyability. The theoretical emphasis on the six types of valuations suggested the idea of using the prototypical elements as boundaries of a hexagonal space. Although the "empirically best fitting plane" is generated by Method II (and to a lesser extent by Method IV), the most interesting solution from the standpoint of valuation theory is undoubtedly that constructed by Method III, which allows valuation systems to be compared against a fixed background.
References Beail, N. (1985). An introduction to repertory grid technique. In N.Beail (Ed.), Repertory grid technique and personal constructs: applications in clinical and educational settings (pp. 1–24). London: Croom Helm. Bonarius, H., Holland, R., & Rosenberg, S. (Eds.). (1981). Personal construct psychology: Recent advances in theory and practice. London: MacMillan. Everitt, B. (1974). Cluster analysis. London: Heinemann Educational Books. Fransella, F., & Bannister, D. (1977). A Manual for the Repertory Grid Technique. London: Academic Press. Geel, R.van (2000). Agency and communion in self-narratives: A psychometric study of the selfconfrontation method. Nijmegen: Nijmegen University Press. Geel, R.van, & De Mey, H. (1996). Kwaliteit en nut van geometrische representaties van grid data verzameld met de Zelfkonfrontatiemethode [Quality and utility of geometrical representations of grid data gathered with the Self-Confrontation Method](Intern Rapport 96 KP 01). Nijmegen: Katholieke Universiteit Nijmegen. Geel, R.van, De Mey, H.R.A., Thissen-Pennings, M., & Bendermacher, N. (2000). Picturing valuations in affect space: Comparison of two methods of ordination. Journal of Constructivist Psychology, 13, 27–45. Geer, J.P.van de (1988). Analyse van categorische gegevens [Analysis of categorical data]. Deventer: Van Lochum Slaterus. Hermans, H.J.M. (1976). Value areas and their development: Theory and method of selfconfrontation. Amsterdam: Swets & Zeitlinger. Hermans, H.J.M. (1986). Het verdeelde gemoed [The divided heart]. (1st ed.). Baarn: Nelissen. Hermans, H.J.M. (1991). The person as co-investigator in self-research: Valuation theory. European Journal of Personality, 5, 217–234. Hermans, H.J.M. (1992). Unhappy self-esteem: A meaningful exception to the rule. The Journal of Psychology, 126, 555–570. Hermans, H.J.M. (1996). Het verdeelde gemoed [The divided heart]. (4th ed.). Baarn: Nelissen. Hermans, H.J.M., & Bonarius, H. (1991). The person as co-investigator in personality research. European Journal of Personality, 5, 199–216. Hermans, H.J.M., & Van Gilst, W. (1991). Self-narrative and collective myth: An analysis of the Narcissus story. Canadian Journal of Behavioral Science, 23, 423–440. Hermans, H.J.M., & Hermans-Jansen, E. (1995). Self-narratives: The construction of meaning in psychotherapy. New York: Guilford. Hermans, H.J.M., Hermans-Jansen, E., & van Gilst, W. (1985). De grondmotieven van het menselijk bestaan: Hun expressie in het persoonlijk waarderingsleven [The basic motives of human experience: Their expression in personal valuation]. Lisse: Swets & Zeitlinger. Hermans, H.J.M., Hermans-Jansen, E., & Van Gilst, W. (1987). The fugit amor experience in the process of valuation: a self-confrontation with the unreachable other. British Journal of Psychology, 78, 465–481. Kelly, G. (1955). The psychology of personal constructs. New York: Norton. Mackay, N. (1992). Identification, reflection, and correlation: Problems in the bases of repertory grid measures. International Journal of Personal Construct Psychology, 5, 57–75. Plutchik, R., & Conte, H.R. (Eds.). (1997). Circumplex models of personality and emotions. Washington, DC: American Psychological Association. Rathod, P. (1981a). Methods for the analysis of rep grid data. In H.Bonarius, R.Holland, & S.Rosenberg (Eds.), Personal construct psychology: Recent advances in theory and practice (pp. 117–146). London: MacMillan. Rathod, P. (1981b). 
PAGAN: Package for grid analysis, user's manual, Version 1. Utrecht: University of Utrecht, subfaculty of psychology.
Rathod, P. (1982). The grid method: Methodology and applications. Unpublished Doctoral Dissertation, Rijksuniversiteit Leiden.
Roskam, E., & Borgers, H. (1969). RØTA03: toetsende en exploratieve rotatie van factorstructuren [Testing and explorative rotation of factor structures]. Nijmegen: Psychologisch Laboratorium der Katholieke Universiteit, afd. mathematische psychologie.
Slater, P. (1964). The principal components of a repertory grid. London: Vincent Andrews.
Slater, P. (Ed.). (1976). The measurement of intrapersonal space by grid technique. (Vol. I. Explorations of intrapersonal space). London: Wiley.
Slater, P. (Ed.). (1977). The measurement of intrapersonal space by grid technique. (Vol. II. Dimensions of intrapersonal space). London: Wiley.
Sneath, P.H., & Sokal, R.R. (1973). Numerical taxonomy. San Francisco: Freeman.
Sokal, R.R., & Rohlf, F.J. (1962). The comparison of dendrograms by objective methods. Taxon, 11, 33–40.
Takens, R. (1994). Hermans' valuation theory and the Self-Confrontation Method. EPCA Newsletter, 3, 13–16.
Thissen-Pennings, M.C.E., Bendermacher, N., Van Geel, R., & De Mey, H.R.A. (1996). KUNGRID: A program for the analysis of ideographic grid data. Groep Rekentechnische Dienst: Katholieke Universiteit Nijmegen.
Tschudi, F. (1994). Flexigrid, Hermans' self-confrontational method (SCM) and Tomkins script theory. EPCA Newsletter, 3, 8–10.
Tschudi, F., Bannister, D., Higginbotham, P., Keen, T., Shaw, M., Thomas, L., & Tschudi, P. (1993). FLEXIGRID 5.21. Oslo: Tschudi System Sales.
Wiggins, J.S., & Broughton, R. (1991). A geometric taxonomy of personality scales. European Journal of Personality, 5, 343–365.
Chapter 10 From Theory to Research: Computer Applications in a Novel Educational Program for Introductory Psychology N.Brand, G.Panhuijsen, H.Kunst, J.Boom, and H.Lodewijkx Faculty of Social Sciences, University of Utrecht, the Netherlands Abstract A description is presented of a new practical course for first-year psychology students, in which the computer plays an important role. The aims of the course are to enliven and illustrate theoretical issues, to increase understanding of the relations between theory and research, and to provide training in writing research reports. In the course, a number of classical psychological research paradigms are presented by the computer program MINDS. The applications vary across the various psychological subdisciplines at the University of Utrecht, and include self-administered experiments and questionnaires, social psychological dilemma games, and a sorting task for moral judgement problems. A short description is given of each application. Finally, an account of the students' evaluations, which are generally favorable, is presented.
Introduction
As part of a renewal of a first-year educational course in psychology, the question arose of what role the computer could play in teaching introductory psychology to first-year college students. Until recently, first-year students at our university made their first acquaintance with psychology through DIPS: Disciplinary Introduction of Psychology. Presently, this curriculum consists of three parts, each encompassing 13 weeks. The first part is a theoretical introduction giving an overview of the most important theories, research methods, empirical findings, and applications in the broad field of psychology. Its topics include psychological functions, personality theories, and social and developmental psychology, and are presently introduced through the textbook by Peter Gray (1999). The second part is concerned with the biological foundation of behavior, centering on the physiological, anatomical, and genetic aspects of behavior. The third part, 'From Theory to Research', is most closely linked to the first part and is meant to acquaint the student with a number of classical research methods, as
discussed in the description of the first part.[1] This practical part is the focus of the present report. The aim of using the computer in this part is to enliven and illustrate the theoretical models and research paradigms, to clarify the relationships between them, and to make it possible for students to undertake and undergo a number of classical psychological experiments and test administrations. The principle of 'learning by doing' is adhered to while carrying out a number of concrete assignments.

Method
The educational program in its present form was realized in the educational year 1996–1997. About 400 freshmen in the social sciences, both full-time and part-time (evening) students, participate each year. The final DIPS practical period encompasses 9 assignments, 8 of which rely crucially on the computer. The remaining assignment aims to improve students' analytical skills, with reference to a regular research article. The 8 computer-reliant assignments are equally divided across the four psychology departments of the Utrecht Social Sciences Faculty: the Psychonomic department, Clinical and Health Psychology, Social Psychology, and Developmental Psychology. One of the 8 computer assignments used until recently consists of working with a simulation of a growth model of stages of moral development, built in the spreadsheet program Microsoft Excel. The other seven assignments are based on applications in the software package MINDS. MINDS is a so-called test-manager program with which several performance tests and questionnaires (or instruments combined into a test battery) may be administered and scored, and from which the results can be reported (Brand & Houx, 1992; Brand, 1999). 'MINDS' is an acronym for 'Mental Information processing and Neuropsychological Diagnostic System'. Test administration and outcome reporting (or scoring) occur separately. The outcome report of a test consists of a display of the relevant scores or mean reaction times (RT), with references to the appropriate norms, if available for the particular test and if requested. It is also possible to create a summary file per test for use in the statistical package SPSS, including test-specific variables and complete data declaration syntax, which is generated automatically. One can also specify the contents of the statistical file, e.g. including scores at item level or scale level, or item response times. For use of MINDS in the present educational program, the software package was adapted for a network environment (Windows NT), and several new applications and test modules were developed.
[1] The first-year curriculum was set up in this way (three parts) until very recently. From the academic year 2000–2001 onward, the first-year theoretical courses will be organised into six sequential parts throughout the year, with the present practical course held parallel to parts 5 and 6.
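To make the aggregation step mentioned above concrete, the sketch below shows in outline how individual result files might be merged and how an SPSS data declaration file could be emitted. This is a hypothetical sketch, not MINDS's actual code: the file layout, paths, and variable names (subject, mean_rt, pct_error) are all assumptions.

```python
# Hypothetical sketch: collect per-student result files from a shared network
# directory, combine them into one data file, and emit minimal SPSS syntax
# that declares the variables. Paths and fields are illustrative only.
import csv
import glob

VARIABLES = ["subject", "mean_rt", "pct_error"]  # assumed per-test summary fields

def aggregate(pattern: str, data_out: str, syntax_out: str) -> None:
    rows = []
    for path in sorted(glob.glob(pattern)):          # one file per student
        with open(path, newline="") as f:
            rows.extend(csv.DictReader(f))           # assumes identical columns
    with open(data_out, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=VARIABLES)
        writer.writeheader()
        writer.writerows(rows)
    # Emit minimal SPSS declaration syntax pointing at the combined file.
    with open(syntax_out, "w") as f:
        f.write(f"GET DATA /TYPE=TXT /FILE='{data_out}' /DELIMITERS=','\n")
        f.write("  /FIRSTCASE=2 /VARIABLES=subject A8 mean_rt F8.1 pct_error F5.1.\n")
        f.write("VARIABLE LABELS mean_rt 'Mean reaction time (ms)'\n")
        f.write("  /pct_error 'Error percentage'.\nEXECUTE.\n")

aggregate("network_dir/*.csv", "group_data.csv", "group_data.sps")
```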
Presently, each assignment includes a weekly meeting for the students, who are assigned to study groups of about 20 students per group, and a computer session, which takes place in one of the faculty’s computer rooms. Each assignment is completed with
the eventual writing of a report by each student. During the study group meeting (taking about two hours), the current assignment is briefly introduced, the previous assignment is discussed thoroughly, and feedback is given on reports of earlier assignments. The computer session (taking less than 45 minutes) consists of undergoing the experiment or questionnaire proper, after which a report with the individual outcomes and a graphical chart can be requested and displayed on the computer screen. This report (in plain text format) and the graphics can be saved on floppy disk and may be printed. On exiting the MINDS program, the individual data are copied to a central network directory.

Following every weekly computer session (always on a particular day for all students), the individual data files are collected from the central directory. Using MINDS, the data are processed and aggregated into an SPSS syntax file. The individual error percentages and mean RT (in the case of a performance test), or response times and scores on questionnaire items, are then screened for extreme values, and records from respondents with unrealistically extreme values are excluded from further analysis. On the basis of the remaining data, depending on the current assignment, descriptive statistics (such as means and standard deviations) or simple tests for differences or correlations are computed. The results of these analyses are then presented in a brief summary report (in table format). Presently, on the same day, this report is placed on an intranet page (the faculty's electronic learning environment), which is accessible to all students (in the first years of the course, the reports on the group data were sent to each individual student's email address and also posted on a paper board in the faculty building). For most assignments, the group data have to be used as the central data in the individual student reports; they may also serve as a reference for the individual data. Having the group data available to the student within 24 hours is one of the major advantages of the present procedure.

Assignments
What follows is a short description of each of the 7 assignments that were carried out by means of the program MINDS. The computer programs in most of the assignments were developed in such a way that a number of task parameters can easily be adjusted. The set differs per application, but generally includes presentation time, letter fonts and sizes, number of trials and practice trials, and some others. Access to these parameters is secured by a password facility.

1. Signal Detection. This classic psychonomic research paradigm (Tanner & Swets, 1954) is set up as a perception experiment. The aim is to illustrate how the signal sensitivity and the decision criterion of the perceiver vary as a function of the signal/noise proportion and the signal likelihood, respectively. One half of the subjects performs a 'noise' experiment; the other half is engaged in a 'likelihood' experiment. The subject's task is to detect, on each of a large number of trials, the presence of a predefined target symbol (signal) among a number of non-target symbols (noise). Such an experiment (manipulation of the signal/noise proportion, or of the proportion of trials with signal/noise presentation) is accomplished in two blocks of trials. In the individual results, d′ (d-prime) is calculated and presented as a measure of the perceiver's signal sensitivity, and β (beta) is presented as a measure of response bias. In the noise experiment, d′ increases and β is constant when the number of non-targets in a display decreases. In the likelihood experiment, it can be shown that d′ remains relatively constant and that β changes when the proportion of signal/noise trials is manipulated. This is the case in most of the individual assessments. A sketch of the underlying computations is given below.
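The computations behind these individual results are standard signal detection theory: d′ = z(H) − z(FA) and β = exp((z(FA)² − z(H)²)/2), with H and FA the hit and false-alarm rates and z the inverse of the standard normal distribution function. A minimal sketch follows; the correction keeping the rates away from 0 and 1 is a common convention, and whether MINDS uses exactly this variant is an assumption.

```python
from math import exp
from statistics import NormalDist

def d_prime_and_beta(hits, misses, false_alarms, correct_rejections):
    """Textbook signal detection measures from raw trial counts."""
    z = NormalDist().inv_cdf
    n_signal = hits + misses
    n_noise = false_alarms + correct_rejections
    # Clamp rates away from 0 and 1 so z() stays finite (a common correction).
    h = min(max(hits / n_signal, 0.5 / n_signal), 1 - 0.5 / n_signal)
    fa = min(max(false_alarms / n_noise, 0.5 / n_noise), 1 - 0.5 / n_noise)
    d_prime = z(h) - z(fa)                    # sensitivity
    beta = exp((z(fa) ** 2 - z(h) ** 2) / 2)  # response bias
    return d_prime, beta

# Example: 40 hits, 10 misses, 5 false alarms, 45 correct rejections.
print(d_prime_and_beta(40, 10, 5, 45))
```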
2. Lexical Decision and Priming (Meyer & Schvaneveldt, 1971). In this task, the subject has to decide whether two simultaneously and horizontally presented letter strings are both regular Dutch words or not. A pair of letter strings falls into one of five categories: two semantically related words, two unrelated words, a pair with a word in the upper position and a non-word in the lower position, a similar combination with the positions reversed, and a pair of non-words. The aim of the assignment is to illustrate the effect of priming: the recognition of a pair of words is faster and more accurate if the words in a pair are semantically related than if they are unrelated. In most of the individual cases, this effect can be shown. See Figure 1 for the aggregated results from 354 subjects in this task.
Figure 1. Mean RT (left) and percentage correct (right) for pairs of related words, unrelated words, word/non-word pairs, non-word/word pairs, and pairs of two non-words, in the assignment 'Lexical Decision and Priming'; n=354 subjects.

3. Defense Mechanisms and Social Desirability. In this study, two operationalisations of the construct of repression are explored. The subject fills in three questionnaires that are presented on the screen: the Defense Mechanisms Inventory (DMI; Ihilevich & Gleser, 1986), the Marlowe-Crowne Social Desirability Scale (Crowne & Marlowe, 1964), and the State-Trait Anxiety Inventory (Spielberger, Gorsuch, Lushene, Vagg, & Jacobs, 1983). The first operationalisation of repression comes from the DMI. This questionnaire consists of descriptions of a number of problem situations. Following each situation, a number of questions is presented, each with five possible answers corresponding to the five underlying defense styles. A combination of the scores on these styles results in a score for repression (see also Moormann, Brand, Behrendt & Massink, this volume). The second operationalisation comes from a combination of scores on trait anxiety and social desirability (Weinberger, Schwartz & Davidson, 1979). In the report of the aggregated group data, several psychometric properties of the three instruments are presented, including Cronbach's alpha (a sketch of which follows below) and the correlation between the two measurements of repression. The student's task is to deal adequately with a number of questions about the aggregated results.
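For reference, Cronbach's alpha for a k-item scale is alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The sketch below is illustrative only; in the course these psychometric properties are produced with SPSS.

```python
def cronbach_alpha(items):
    """items: list of k lists, each holding one item's scores across subjects."""
    k = len(items)
    n = len(items[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(item[i] for item in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))

# Example: three items answered by four subjects (invented data).
print(cronbach_alpha([[2, 4, 3, 5], [3, 5, 4, 5], [2, 5, 3, 4]]))
```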
4. Stress and Information Processing. In this assignment, the classical paradigm by Sternberg is used (Sternberg, 1975). In a letter recognition task, the RT is unraveled into a cognitive and a perceptual-motor component (the additive factor method). A daily-hassles questionnaire is also filled in, yielding a measure of chronic stress. The aim of this assignment is, besides illustrating the application of a classical laboratory paradigm, to explore the possible negative effect of chronic stress on information processing, especially on the cognitive component. An underlying explanation is the 'worrying' hypothesis (Eysenck & Calvo, 1992), which assumes that stress and anxiety result in more rumination and worrying, and that this takes up too much short-term memory capacity (Brand, Hanson & Godaert, 2000). The students have to write a research report for this assignment.

5. Prisoners Dilemma Game (PDG). This classic social psychological game (Rapoport & Chammah, 1965) shows the consequential gains and losses of cooperation as opposed to competition. The game is played 4 times, with the competitor (the computer) adopting a different predefined strategy each time. The strategies are: Tit for Tat, Suspicious Tit for Tat, Hardball, and Change of Heart (see Axelrod, 1984). The aggregated results show that the largest gains (and the most cooperative responses) are achieved when the computer adopts a Tit for Tat strategy; a sketch of such a game is given below.

6. Social Values Game (Liebrand, 1983). In this interactive simulation game, the subject has to choose between two possible responses, reckoning with the possible moves of the competitor. As in the PDG, the options involve gaining or losing. On the basis of the underlying theory and his or her own choices, the subject is eventually placed on a continuum running from altruism to co-operation to individualism to competition. As such, this task can be regarded, and in fact has been used, as a personality measure.

7. Stages of Moral Development. This assignment illustrates a developmental psychological assessment instrument. On the basis of several classic 'moral' dilemmas, moral development is assessed using two different methods. One is a computer version of the well-known Defining Issues Test (DIT) by Rest (1979), in which subjects rate items reflecting different stages of moral development. The other is a computer version of a sorting task, in which nine styles of reasoning with respect to a moral dilemma have to be sorted from 'simplistic' to 'wise' (Boom, Brugman, & van der Heijden, 2001). Both instruments assume the model of moral stages by Kohlberg (cf. Colby & Kohlberg, 1987). In the aggregated data, the outcomes of the two instruments are related to theoretical expectations, to each other, and to the outcomes of the previous assignment 'Social Values Game'.
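As an illustration of the game logic in assignment 5, here is a minimal iterated prisoner's dilemma played against two of the computer strategies named there. The payoff matrix and the number of rounds are illustrative assumptions, not the program's actual parameters.

```python
# Payoffs (subject, computer) for cooperate (C) and defect (D).
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(own_history, opp_history):
    return "C" if not opp_history else opp_history[-1]  # copy opponent's last move

def hardball(own_history, opp_history):
    return "D"                                          # always defect

def play(subject_moves, computer_strategy):
    subj_hist, comp_hist, total = [], [], 0
    for move in subject_moves:
        comp = computer_strategy(comp_hist, subj_hist)
        total += PAYOFF[(move, comp)][0]                # subject's payoff
        subj_hist.append(move)
        comp_hist.append(comp)
    return total

moves = ["C", "C", "D", "C", "C"]
print(play(moves, tit_for_tat), play(moves, hardball))  # Tit for Tat pays off more
```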
Each year, one or two assignments have been and will be replaced by new ones 'to keep the staff involved alert'.[2] Minor changes to the remaining assignments are made to prevent students from using reports from the previous cohort. With MINDS, new instruments can be implemented easily.
Evaluation
At the end of this course, the students fill in an evaluation form. The questions on this form relate to a number of aspects of the course. Six general questions (the most informative about the course and the use of the computer) are listed in Table 1.
Table 1. Mean ratings (and standard deviations) of the practical course by students in four consecutive years.

Evaluative item                                              96/97      97/98      98/99      99/00
                                                             (n=230)    (n=228)    (n=207)    (n=367)
1. Better understanding of how research in psychology
   is set up                                                 4.0 (0.7)  4.1 (0.8)  4.2 (0.7)  4.3 (0.8)
2. Increase of insight in the different topics               3.8 (0.8)  3.9 (0.8)  4.0 (0.8)  3.9 (0.8)
3. Better skilled in writing research reports                4.2 (0.7)  4.4 (0.7)  4.4 (0.6)  4.3 (0.7)
4. Better impression of the role of the computer in
   psychological research                                    3.9 (0.8)  3.8 (0.9)  3.8 (0.8)  3.6 (0.9)
5. More familiarity using the computer                       2.9 (1.1)  3.1 (1.0)  2.7 (1.0)  2.8 (1.2)
6. Benefit of the course                                     3.8 (1.0)  4.1 (0.9)  4.1 (0.9)  4.2 (0.7)
The questions are answered on a 5-point Likert scale, with higher ratings being more favorable. Table 1 lists the mean evaluations for four consecutive years, from the college year 1996–1997 onward. As can be seen, in the year of the program's first application the students rated the course quite favorably with respect to reaching the aims of the educational program. In the second year, and also in the third, the ratings generally became even more favorable, except for the rating of familiarity in using the computer. The students gain a better understanding
[2] At present (academic year 2000–2001), the assignment concerning the growth model for moral development in MS Excel has been replaced by an assignment focusing on developmental stages in solving balance problems and the development of working memory capacity. Likewise, the Defence Mechanisms assignment and the assignment on stress and information processing have been replaced by two assignments in which the Big Five model of personality traits plays a central role.
of the role of research in psychology, and gain more insight into and knowledge of the topics discussed in the course. The increase in skills specific to writing research reports was rated quite high, as were the perceived benefit of the course and the procedure used. The increase in ratings may reflect several minor improvements to aspects of the educational program, made on the basis of earlier evaluations. That is, each year several improvements have been applied to the accompanying textbook and to the study groups' program and procedure, and improvements have been implemented in the computer
programs. As for the latter, the display of the individual results on screen has been improved (addition of graphic charts for the individual outcomes), and each year the tables of norms are adjusted as more data become available. As mentioned, assignments 3 and 4 have been replaced, partly because of difficulties in interpreting the aggregated results of these assignments. The mean rating of gaining more familiarity in using the computer seems to decrease compared with earlier years. This may be explained by assuming that the influence of information technology and computers in daily life is rapidly increasing, so that an educational program in which the computer plays an obvious part does not by itself add to this familiarity. Generally, the course has been rated very favorably by most of the students. Most appreciation seems to be associated with the feeling of being better skilled in writing research reports, and with a better understanding of how research is set up to follow from psychological research questions. The former has frequently been acknowledged by teachers in the second and later years of the curriculum, who note the improved quality of research reports compared with earlier cohorts.

Conclusions
From these and other evaluations, we arrive at the following conclusions:
• There is a high proportion of practical activity, but it is guided by theoretical concerns as well.
• Aggregation of data and feedback to the students can be attained quickly and efficiently.
• Much use is made of electronic facilities (the test manager MINDS, the electronic learning environment).
• Several different research designs and statistical procedures may be applied.
• Valid research data may be obtained.
• The course has been set up as an integrative learning trajectory. That is, care is taken to keep up with the issues currently being discussed in a parallel methodological course.
• The only minor point, raised by the teachers, concerns the large amount of time spent correcting the research reports and supplying adequate individual feedback.
References
Axelrod, R. (1984). The evolution of co-operation. New York: Basic Books.
Boom, J., Brugman, D., & van der Heijden, P.G.M. (2001). Hierarchical structure of moral stages assessed by a sorting task. Child Development, 72, 535–548.
Brand, N. (1999). MINDS: Tool for research in health psychology and neuropsychology. In B.P.L.M. den Brinker, P.J.Beek, A.N.Brand, F.J.Maarse, & L.J.M.Mulder (Eds.), Cognitive ergonomics, clinical assessment and computer-assisted learning (pp. 155–168). Lisse: Swets & Zeitlinger.
Brand, N., Hanson, E., & Godaert, G. (2000). Chronic stress affects blood pressure and speed of short term memory. Perceptual and Motor Skills, 91, 291–298.
Brand, N., & Houx, P.J. (1992). MINDS: Toward a computerized test battery for health psychological and neuropsychological assessment. Behavior Research Methods, Instruments, & Computers, 24, 385–389.
Colby, A., & Kohlberg, L. (1987). The measurement of moral judgment (Vol. 2). Cambridge, MA: Cambridge University Press.
Crowne, D.P., & Marlowe, D. (1964). The approval motive: Studies in evaluative dependence. New York: Wiley.
Eysenck, M.W., & Calvo, M.G. (1992). Anxiety and performance: The processing efficiency theory. Cognition and Emotion, 6, 409–434.
Gleitman, H. (1996). Basic psychology (4th ed.). New York: Norton.
Gray, P. (1999). Psychology (3rd ed.). New York: Worth.
Ihilevich, D., & Gleser, G.C. (1986). Defense mechanisms. Owosso, MI: DMI Associates.
Liebrand, W.B.G. (1983). Interpersonal differences in social dilemmas: A game approach. Dissertation Abstracts, 43 (7-B), 2373.
Meyer, D.E., & Schvaneveldt, R.W. (1971). Facilitation in recognizing pairs of words: Evidence of a dependence between retrieval operations. Journal of Experimental Psychology, 90, 227–234.
Rapoport, A., & Chammah, A.M. (1965). Prisoner's dilemma. Ann Arbor: The University of Michigan Press.
Rest, J.R. (1979). Development in judging moral issues. Minneapolis: University of Minnesota Press.
Spielberger, C.D., Gorsuch, R.L., Lushene, R., Vagg, P.R., & Jacobs, G.A. (1983). Manual for the State-Trait Anxiety Inventory. Palo Alto, CA: Consulting Psychologists Press.
Sternberg, S. (1975). Memory scanning: New findings and current controversies. Quarterly Journal of Experimental Psychology, 27, 1–32.
Tanner, W.P., & Swets, J.A. (1954). A decision-making theory of visual detection. Psychological Review, 61, 401–409.
Weinberger, D.A., Schwartz, G.E., & Davidson, J.R. (1979). Low-anxious, high-anxious, and repressive coping styles: Psychometric patterns and behavioural and physiological responses to stress. Journal of Abnormal Psychology, 88, 369–380.
Chapter 11 Psychotherapy Treatment Decisions Supported by SelectCare
C.Witteman
University of Utrecht, Institute of Information and Computing Sciences, PO Box 80.089, 3508 TB Utrecht, The Netherlands
Abstract SelectCare is a computerized decision support system for psychotherapists who have to decide how to treat their depressed patients. This paper describes the decision making model implemented in SelectCare and the decision elements on which it bases its advice to its users. The system itself is then presented, followed by data from an evaluation of its use.
Introduction
SelectCare is a computer program that supports psychotherapists in their difficult task of deciding which treatment method is most suitable for their depressed patients. The program addresses treatment planning, not the preceding process of diagnosis: it is assumed that a 'depressive disorder' has been diagnosed. SelectCare supports the task of deciding what type of treatment will be proposed to a particular depressed patient, with her or his specific complaints, taking into account the patient's, as well as the therapist's own, circumstances (compare Paul, 1967). SelectCare is not an expert system that gives the correct treatment plan. It is not, because there is no consensus among therapists as to what the correct treatment for depression is, and practicable handbooks with decision guidelines or criteria are absent (Snyder & Thomsen, 1988). SelectCare therefore cannot support a correct decision outcome. Its more modest goal is to support a correct decision process. Such support seems welcome, because psychotherapists' unaided decision performance has been found to be imperfect (Ayton, 1992; Shanteau, 1992; Witteman, 1992; Witteman & Koele, 1999). This imperfection may be explained by the inherent uncertainty of knowledge in the psychotherapy domain, combined with the general human proneness to error in reasoning and decision making. Imperfect decision making seems unsatisfactory, particularly where patients are concerned. We therefore decided to construct a decision aid for the treatment selection task. In the next sections, we first specify the model of correct decision making incorporated in the system. We then describe our design choices and the decision elements in the system and their acquisition, followed by a short description of the resulting system and evaluation data.
The Decision Model
Normative decision models give a standard procedure for how the best decision outcome is to be reached. Analogously to the empirical cycle of generate and test, unbiased and unprejudiced decision making means thinking up all possible alternatives or courses of action, considering each alternative carefully in the light of both the positive and negative values of its attributes, and deciding upon the alternative that comparative evaluation shows to be the best (compare for example Simon, 1961; Janis & Mann, 1977; Baron, 1994). In such a comprehensive process, all information (attribute values) is to be considered for each alternative, and the decision maker should keep an open mind to the possibility that an alternative is not acceptable. The normative principle aims at maximizing the outcome of the decision process by summing the attribute values of each of the alternatives, in contrast to satisficing (Simon, 1957), which means taking the first alternative that is satisfactory on all important aspects (Crozier & Ranyard, 1997). SelectCare supports a maximizing strategy because, in our view, therapists ought not to be too easily satisfied with an alternative, but should look extensively into the specifics of their patients' cases. When a decision maker follows the maximizing normative strategy, the resulting decisions are defensible to others by pointing out their internal consistency and the correct use of logical rules (compare Edwards, Kiss, Majone, & Toda, 1984). The strategy does not guarantee that the correct decisions will be made, that is, the decisions with the best possible outcome, but it does guarantee that the decisions will be made correctly. Translated to a decision about the treatment of a depressed patient, this means that all possible alternative therapy options need to be considered, that the reasons for and against each, not forgetting the contra-indications, need to be checked and summed, and that the treatment method with the most positive score is proposed. It is this process of remaining unprejudiced and impartial towards the different possible options, until the patient data have been carefully looked at, that is supported by SelectCare.

Design Choices
People, professional or otherwise, do not commonly use a normative strategy when making decisions. In earlier studies, we established at what points psychotherapists' performance deviated from the norm. We found psychotherapists to use conservative decision strategies (Witteman, 1992; 1995; Witteman & Kunst, 1997; Witteman & Koele, 1999). They framed their decision; that is, they adopted a frame of reference, usually their own treatment approach, through which they looked at the patient data. They represented the problem as a yes/no question about the suitability of their own method for the treatment of that patient. They looked for, and found, data that supported their method. Framing and a confirmation bias are well-documented phenomena in decision making. Obviously, they impede a correct decision making process. We designed SelectCare to help therapists improve their strategies by implementing debiasing techniques (see also Witteman & Kunst, 1995; 1999) that counteract framing and confirmation bias. From the debiasing literature, we borrowed two techniques with
proven effectiveness. The first is structuring the decision problem by clarifying it and ordering the decision elements in some appropriate way. This provides judges with an overview and thereby enables them to focus on the major issues without overlooking possibly relevant aspects (see among others Keren, 1992; Westenberg & Koele, 1993). The second technique is instilling a more critical attitude in decision makers, both towards their own decision processes and towards their favorite decision options, by asking them to pay attention to decisive reasons (in our system: contra-indications) why their decisions might be wrong (compare among others Arkes, 1981; 1991; Keren, 1990; Williams, 1992). Both techniques counteract the psychotherapists' prejudiced strategies. They are enforced through the interface. In our design, we also took human limitations in cognitive processing capacity into account, to improve SelectCare's chances of being accepted in practice (compare Keren, 1992; Rasmussen, 1993). More specifically, therapists are not required to think up all possible arguments for or against treatment decisions themselves, but are offered structured checklists of possibly relevant items to choose from. SelectCare, taking advantage of the fact that computers outperform humans in complex calculations, combines the selected items into a measure of suitability of the different treatment options for the therapists.

The Decision Elements
SelectCare is not an expert system, yet it has to contain domain knowledge in order for its users to be able to work with it. These domain elements are presented to the user-psychotherapists as the consensus beliefs of their colleagues, whom they may consult to get a second opinion about the treatment of their patients. A database contains the domain knowledge, which consists of one hundred and thirty-one elements. There are six treatment modalities, such as 'psychodynamic therapy' or 'client-centered therapy'. Each of these six is represented twice, once in an individual setting and once in a group setting. There are twenty-nine symptom descriptions, ordered in four categories, for example 'loss of interest' and 'sleeping problems' in the category 'vital symptoms'. Then there are fifty-one descriptions of factors that may influence or have influenced the symptoms, ordered in ten categories; for example, 'disrupted reality testing' and 'external locus of control' in the category 'weak personality factors'. Finally, there are thirty-nine contra-indications, in five categories, e.g. 'alcohol addiction' in the category 'behavioral aspects'. The symptoms, factors and contra-indications in the database are those that were found to be relevant, to different degrees, to the choice of a treatment plan. This was established by a questionnaire, filled in by one hundred and forty-two practicing therapists. In this questionnaire, respondents were presented with longer lists of possibly relevant decision elements. They marked for each symptom and factor whether it was relevant to a treatment decision and, if so, to what degree. They gave their answers on a scale from −2 to 2, with negative values indicating that the element was a relevant reason against the treatment decision, zero indicating that it was irrelevant to the decision, and positive values indicating that it was a relevant reason for the treatment decision. Factor analyses carried out on the answers resulted in the exclusion of elements that explained none of the
variance, and in the inclusion of the eighty elements, ordered in fourteen categories, as described above. The fourteen symptom and factor categories have differential relevance against or in favor of each of the treatment modalities. Their positive or negative decision weights were determined using analysis of variance. This revealed, for example, that the category of 'anxious symptoms' constituted a reason in favor of behavior therapy with a decision weight of .70, and a reason against psychodynamic treatment with a weight of −.60. The list of thirty-nine contra-indications contains those elements that were said by eighty percent or more of the participating psychotherapists to be decisive reasons against one or more specific treatment plans.
Figure 1. SelectCare's main menu.

SelectCare: The System
SelectCare was first programmed in LISP, for the Apple Macintosh (Witteman & Kunst, 1999). When its functionality had been established, a less cumbersome programming environment, Microsoft Access, was adopted for a more widely used platform. The system now runs on a Windows PC and presents itself as shown in Figure 1.

The Interface
The interface is more than just the window to the program: it enforces the order in which the decision steps are performed. First, the psychotherapist-user fills in information about the patient, such as name, gender, address, year of birth, etc. Then a 'session' with the
program is started. The therapist is shown the names of the categories of symptoms, and invited to select those symptoms that describe the patient’s complaint (see Figure 2).
Figure 2. Session screen for the selection of symptoms.

A category 'all' is added to choose from when a therapist is uncertain in which category a symptom may be found. When a symptom is selected, a small overlay window is presented in which the therapist marks to what extent the symptom applies to the patient. A patient may be very or only slightly aggressive, which should and does make a difference for the suitability of the different treatments (as described below). In this small window the therapist may also enter any information that is relevant for the treatment, such as how long the patient has shown the symptom, whether it impedes her or him at work or at home, etc. Adding factors to the patient description proceeds in the same way as adding symptoms. A therapist who might then be tempted to end the session and ask for advice is prevented from doing so, because the 'conclusion' button appears on screen only after the therapist has checked all possible contra-indications. Therapists may also indicate persons who are relevant to the onset or maintenance of a patient's complaints; there may, for example, be intense rivalry with a sibling. Such information is added in a 'persons' window. These entries do not constitute decision elements as such, but they may be very informative for the treatment; information provided here is taken up in the report that the therapist may have printed at the end of a session. Checking the contra-indications may be done by passing over the whole list in alphabetical order and marking each contra-indication as either present or not present. It may also be done category-wise, marking a whole category as present or absent (see Figure 3).
Figure 3. Session screen for checking the contra-indications.

The possibility of marking a whole category of contra-indications as present or absent, instead of consciously considering each in turn, is a compromise: we wished to minimize the possibility of overlooking a possible contra-indication, but therapists became weary of the long series of clicking actions they had to perform. In the conclusion window, all twelve treatment possibilities (the six therapies, each in an individual and in a group setting) are displayed, with a percentage expressing their suitability for the described patient. The most suitable treatment is displayed at the top; the treatment(s) that are unsuitable because of a contra-indication are presented in shaded characters in a separate box (see Figure 4).

The Database
The database contains the decision elements described above. They are represented in frames (compare Lucas & van der Gaag, 1991), which allows for easy change or addition of elements (see below: Expert mode). There is one super-frame 'therapies', with the twelve modalities as sub-frames. There is one super-frame 'symptoms', with the four categories of symptoms as sub-frames and each symptom as a sub-frame in the appropriate category frame. Likewise, there is a super-frame 'factors' with its sub- and sub-sub-frames, and a super-frame 'contra-indications'. The decision weights of each category of symptoms and factors relative to each treatment option are represented in the category frames. When a user selects a symptom, the decision weight of that symptom relative to each treatment option is activated. With each contra-indication, a link to a treatment option indicates that that option should be excluded from consideration when the contra-indication is present. In the final advice, the decision weights of the symptoms and factors are combined, as described below, into measures of suitability for each treatment option. The suitability of
an option that is excluded by the presence of a contra-indication is also presented, but in shaded characters. That way a therapist may see how suitable that option would be in the absence of the contra-indication(s).
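A hypothetical rendering of the frame organization described in this section, as nested mappings, is sketched below. The element names and most of the weights are invented for illustration; only the two weights for 'anxious symptoms' were quoted earlier.

```python
# Category frames carry per-treatment decision weights; element frames sit
# inside their category; contra-indications link to the options they exclude.
DATABASE = {
    "symptoms": {
        "anxious symptoms": {
            "elements": ["panic attacks", "phobic avoidance"],   # assumed items
            "weights": {"behavior therapy": 0.70,                # quoted above
                        "psychodynamic therapy": -0.60},
        },
    },
    "factors": {
        "weak personality factors": {
            "elements": ["disrupted reality testing", "external locus of control"],
            "weights": {"behavior therapy": 0.20,                # assumed weights
                        "psychodynamic therapy": -0.40},
        },
    },
    "contra-indications": {
        "behavioral aspects": {
            "alcohol addiction": ["psychodynamic therapy"],      # options excluded
        },
    },
}
```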
Figure 4. Session screen presenting the conclusions.

The Calculations
SelectCare carries out three successive calculations. First, the decision weight of each selected symptom and factor, as represented in the database, is combined with the measure of confidence expressed by the therapist, representing her or his confidence in the applicability of that symptom or factor to the current patient. This combination is done by multiplication, such that, for example, a depressive complaint with a weight factor of −0.95 and a confidence measure of 0.9 carries a total weight of −0.855. The second calculation adds together all negatively rated elements, that is, the reasons against a treatment option, and all positively rated elements, or reasons for an option, for each option separately. The third calculation combines the reasons against with the reasons for an option, again for each option separately, by subtracting the value of the reasons against from the value of the reasons for.
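A minimal sketch of these three steps follows, using per-option weights of the kind sketched in the previous block. How SelectCare scales the final scores to the percentages shown in the conclusion window is not described here and is left out.

```python
def suitability(selections, options):
    """selections: list of (weights_per_option, therapist_confidence) pairs."""
    scores = {}
    for option in options:
        pro = con = 0.0
        for weights, confidence in selections:
            w = weights.get(option, 0.0) * confidence  # step 1: weight x confidence
            if w >= 0:
                pro += w                               # step 2: sum reasons for...
            else:
                con += -w                              # ...and reasons against
        scores[option] = pro - con                     # step 3: subtract
    return scores

# The quoted example: weight -0.95 with confidence 0.9 contributes -0.855.
selections = [({"behavior therapy": 0.70, "psychodynamic therapy": -0.60}, 0.9),
              ({"behavior therapy": -0.95}, 0.9)]
print(suitability(selections, ["behavior therapy", "psychodynamic therapy"]))
```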
The Expert Mode
SelectCare runs in normal mode or in expert mode. When one identifies oneself as an expert, one may change the data in the database. A behavior therapist may, for example, wish to add 'I feel incompetent to use psychodynamic treatment' as a contra-indication to such treatment. Another therapist may believe that psychodynamic treatment works well with anxious patients, and change the decision weights accordingly. Symptoms, factors and contra-indications may be changed, removed or added, always with decision weights supplied. We stress again that SelectCare cannot pretend to support decisions for the correct treatment plan for a particular patient, but that it supports a correct decision making process. Our goal is to guide therapists in first describing their patients and checking contra-indications, and only then considering specific treatment options. A therapist who wishes to make changes in the 'default' database, which contains the consensus opinion of many colleagues, will have to argue explicitly why, at least to her or himself. This counteracts random argumentation and, what is more, it may foster a focused debate about the decision elements in the selection of a treatment option.

Evaluation
We conducted two evaluation studies with SelectCare, one addressing its ease of use, and one to find out whether psychotherapists could and would use it. The ease of use of the system was judged quite favorably by sixteen subjects. These subjects were not practicing psychotherapists but advanced psychology students. They were set a task in which they had to use all the functionality offered through the interface, but for which they did not have to think about the precise contents of the selections they made. They gave their judgements on a scale of 1–10, where 1 means extremely poor and 10 means perfect. The general judgement about the usability of SelectCare, expressed with statements such as 'the system is frustrating (satisfying, difficult, great, etc.) to work with', had an average of 7.2 (SD=1.6). The screen layout was assigned a mean of 8.6 (SD=1.4), the terminology used an 8.1 (SD=0.9), and the learnability a 7.7 (SD=1.4). Most convincingly, all subjects found SelectCare to be an attractive program. The psychotherapists who tested SelectCare answered the question concerning the subjective utility of SelectCare by expressing their satisfaction, and the question concerning its more objective utility by whether they gave more complete patient descriptions. The therapists read a case history of a depressed patient and used SelectCare to arrive at a proposal for a treatment plan. The twenty subjects, eleven final-year clinical psychology students and nine practicing therapists, all read the same case history. They described the case by selecting symptoms and factors from SelectCare's database. Their descriptions were compared to those of a group of colleagues who had described the same patient on paper, without using the system. The psychotherapists using SelectCare were significantly more comprehensive, with a mean of 23.8 versus 2.8 descriptions. The psychotherapists were satisfied with the system. They expressed their opinion by placing a mark on a scale of 1–10, 1 again meaning extremely poor and 10 meaning perfect. They judged the terminology 'adequate' with an average score of 7.5 and the categorization of the decision elements 'satisfactory' with an average of 7.6; they judged the completeness of the lists of decision elements and treatment options with a 7, the order in which the decision process had to be performed was given an 8.6, and the advice given by SelectCare a 6.6. The therapists did not always completely agree with the
advice, but they did find the advice at least plausible, and they were enthusiastic about the thoroughness of the decision process. The psychotherapists described their experience with SelectCare as pleasant (7 times), useful (6×), stimulating (4×), easy (4×), comfortable (3×), difficult (2×) and tiring (2×). The most rewarding remark was that SelectCare forces a closer look at the patient.

Conclusion
The majority of our participants described their experience with SelectCare as pleasant and useful. It is thus ready for professional use and will actually be marketed this year. We trust that both individual therapists and teams will profit from the support SelectCare offers. Individual therapists may consult SelectCare when they are uncertain about their decisions and would like a second opinion. In teams that discuss the planning of the treatment of patients, different therapists may individually describe the same patient with the system, and then discuss differences in their judgments. This would help to avoid a discussion in terms of beliefs, and stimulate more focused exchanges. And, of course, SelectCare may be a very useful training tool for psychotherapists-in-training, who may get to know the vocabulary and the possible links between symptoms, factors and treatment plans by describing fictitious patients.

References
Arkes, H.R. (1981). Impediments to accurate clinical judgment and possible ways to minimize their impact. Journal of Consulting and Clinical Psychology, 49, 323–330.
Arkes, H.R. (1991). Costs and benefits of judgment errors: Implications for debiasing. Psychological Bulletin, 110, 486–498.
Ayton, P. (1992). On the competence and incompetence of experts. In G.Wright & F.Bolger (Eds.), Expertise and decision support (pp. 77–105). New York: Plenum Press.
Baron, J. (1994). Thinking and deciding (2nd ed.). Cambridge: Cambridge University Press.
Crozier, W.R., & Ranyard, R. (1997). Cognitive process models and explanations of decision making. In R.Ranyard, W.R.Crozier, & O.Svenson (Eds.), Decision making: Cognitive models and explanations (pp. 5–20). London, UK: Routledge.
Edwards, W., Kiss, I., Majone, G., & Toda, M. (1984). What constitutes 'a good decision'? Acta Psychologica, 56, 5–27.
Janis, I.L., & Mann, L. (1977). Decision making. New York: Free Press.
Keren, G. (1990). Cognitive aids and debiasing methods: Can cognitive pills cure cognitive ills? In J.P.Caverni, J.M.Fabre, & M.Gonzalez (Eds.), Cognitive biases (pp. 523–552). Amsterdam: North-Holland.
Keren, G. (1992). Improving decisions and judgments: The desirable versus the feasible. In G.Wright & F.Bolger (Eds.), Expertise and decision support (pp. 25–46). New York: Plenum Press.
Lucas, P., & Gaag, L.C.van der (1991). Principles of expert systems. Wokingham, UK: Addison-Wesley.
Paul, G.L. (1967). Strategy of outcome research in psychotherapy. Journal of Consulting Psychology, 31, 109–118.
Rasmussen, J. (1993). Deciding and doing: Decision making in natural contexts. In G.Klein, J.Orasanu, R.Calderwood, & C.E.Zsambok (Eds.), Decision making in action: Models and methods (pp. 158–171). Norwood, NJ: Ablex.
Shanteau, J. (1992). The psychology of experts: An alternative view. In G.Wright & F.Bolger (Eds.), Expertise and decision support (pp. 11–23). New York: Plenum Press.
Simon, H.A. (1957). Models of man. New York: Wiley.
Simon, H.A. (1961). Administrative behavior: A study of decision-making processes in administrative organization (2nd ed.). New York: Macmillan.
Snyder, M., & Thomsen, C.J. (1988). Interactions between therapists and clients: Hypothesis testing and behavioral confirmation. In D.C.Turk & P.Salovey (Eds.), Reasoning, inference and judgment in clinical psychology (pp. 125–152). New York: Free Press.
Westenberg, M.R.M., & Koele, P. (1993). Klinische besliskunde [Clinical decision science]. In P.Koele & J.van der Pligt (Eds.), Beslissen en beoordelen [Deciding and judging] (pp. 319–345). Amsterdam: Boom.
Williams, A.S. (1992). Bias and debiasing techniques in forensic psychology. American Journal of Forensic Psychology, 10, 19–26.
Witteman, C.L.M. (1995). Psychotherapists' decision making. Paper presented at the 15th Conference on Subjective Probability, Utility and Decision Making, Jerusalem.
Witteman, C.L.M., & Koele, P. (1999). Explaining treatment decisions. Psychotherapy Research, 9, 100–114.
Witteman, C.L.M., & Kunst, H. (1995). A technology for debiasing psychotherapists' decision processes. Paper presented at the First International Cognitive Technology Conference, Hong Kong.
Witteman, C.L.M., & Kunst, H. (1997). Planning the treatment of a depressed patient. Clinical Psychology and Psychotherapy, 4, 157–171.
Witteman, C.L.M., & Kunst, H. (1999). SelectCare—in aid of psychotherapists' treatment decisions. Computers in Human Behavior, 15, 143–159.
Chapter 12 A Method for the Assessment of Interpersonal Functioning in a Residential Group Therapy Program
K.Linker
Psychotherapeutic Center "Noordvliet", Leeuwarden, The Netherlands
Abstract To investigate the interpersonal functioning (IPF) of participants in a residential group therapy program, in terms of their own perspectives and those of others, a computer-aided procedure based on the theory of the discursive cycle was developed. The discursive cycle is considered to consist of an interpersonal cognition, an affective appraisal, and a tendency towards action. A participant starts the investigation by selecting an interpersonal cognition, a focal event: a short description of an event that is seen as an important issue for the social life of the group and that receives much attention from its members. In relation to a selected focal event, the affective appraisal and the tendency towards action are elicited with respect to a selected member. Subsequently, each discursive cycle is rated in terms of the extent to which it is experienced as enhancing, or not enhancing, the participant's self-feelings. The rated self-experiences are presented graphically from both an individual and a group perspective. Results are discussed in the group context.
Introduction
The IPF-procedure is a computer-aided procedure for investigating the interpersonal relationships between the participants in a residential group therapy program, but it can be employed in any setting where people function in some social relationship. In the following presentation of the procedure, some basic assumptions are discussed first, followed by a description of how the data regarding the significant theoretical aspects are elicited. How these data are processed, and how the outcome can be portrayed in a visual presentation, is shown by means of an example of an investigation that was carried out.

Basic Assumptions
The social structure of a community is shaped by the ongoing process of social discourse between the people of that community. By means of their interactions, the participants
adjust their mutual positioning and the division of roles derived from these positions (Gergen, 1989; Hermans & Bonarius, 1991; Harré & Gillett, 1994; Sampson, 1993; Shotter, 1989). Through their everyday interactions, the participants express their approval or disapproval of the contributions of their fellow participants. These interactions can be described by the concept of the discursive cycle. A discursive cycle starts with the perception of a social situation that leads to an interpersonal cognition. It is likely that the persons concerned have some emotional experience, serving as an affective valuation of what is happening. In correspondence with the cognition and the affective valuation, an action readiness will arise, leading to an action tendency (Frijda, 1986, 1987; Frijda, Kuipers, & ter Schure, 1989). In terms of the action undertaken or not undertaken, a repositioning of the group members with respect to each other occurs. To a certain extent, the completed discursive cycle will result in an experience of acknowledgement or of disqualification, because of the feedback received. This will particularly be the case if the discursive cycle represents a situation that is experienced as a focal event. An event is considered focal if it represents a socially and generally shared concern. A focal event draws attention and is an important subject of daily discourse (Frijda & Mesquita, 1995).

IPF-Procedure
Each discursive cycle starts with an interpersonal cognition (IC). The interpersonal cognitions used in this procedure are elicited in a community meeting by asking for statements describing interpersonal situations that occur in the daily activities. These IC-statements, along with the names of all the members of the community, are entered into a computer program (1). In individual sessions with this computer program, each participant investigates his or her relationships with the other members. During these individual sessions, a respondent selects a statement from the pool of IC-statements and is asked to what extent that statement applies to the first person on the community's membership list; this is the attribution score or IC-score (2). If the respondent does not apply the statement to this person, the program skips to the next person on the list. The respondent is also asked to enter a statement about his or her feelings about the association of that person with the situation the IC-statement refers to; in this way the respondent generates an AV-statement (3). The respondent is then asked to enter a statement about what he or she will do, or would like to do, in this condition; this step constitutes an AT-statement (4). The completed discursive cycle is then rated with respect to the impact it has on the Self-experience of the respondent; this is the discursive Self-experience score (DSE-score) (5). The discursive Self-experience consists of two dimensions: the experience of acknowledgement (A-dimension) if the effect is felt as facilitative, or the experience of disqualification (D-dimension) if the effect is felt as negative. After the rating of the first discursive cycle is completed, the second name on the membership list appears in combination with the selected IC-statement, and the described steps are repeated. When an AV-statement or an AT-statement is asked for, the respondent can add a new statement or choose a statement he or she has already entered.
"The Staff" is added to the membership list to give the participants the opportunity to express how the staff is experienced, and "The Other" is added in order to obtain an
indication of what is expected from people in general. As soon as the membership list has been completely worked through, a new IC-statement is selected and the whole procedure starts again.
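Schematically, steps (1) to (5) above could be rendered as the following loop. The record fields, the prompts, and the ask callback are illustrative assumptions, not the actual program.

```python
def ipf_session(ic_statements, members, ask):
    """Walk each IC-statement past the full roster, collecting one record
    per completed discursive cycle; ask(question) returns the answer."""
    records = []
    roster = members + ["The Staff", "The Other"]
    for ic in ic_statements:                                   # (1) IC pool
        for person in roster:
            ic_score = ask(f"To what extent does '{ic}' apply to {person}?")
            if ic_score == 0:                                  # (2) not attributed
                continue
            av = ask("How do you feel about this?")            # (3) AV-statement
            at = ask("What do (or would) you do?")             # (4) AT-statement
            dse = ask("Does this enhance your self-feelings?") # (5) DSE rating
            records.append({"ic": ic, "person": person, "ic_score": ic_score,
                            "av": av, "at": at, "dse": dse})
    return records

# Demo with canned answers instead of an interactive prompt.
answers = iter([2, "irritated", "I avoid him", -1] * 3)
print(ipf_session(["He/she withdraws"], ["Ann"], lambda q: next(answers)))
```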
Table 1. Clusters of IC-statements and their correlation with the cluster as a whole. The label reflects the theme of the cluster (r ≥ .36: p < .05; r ≥ .46: p < .01).

Cluster 1:
- He/she withdraws                                        .73
- He/she does not take me seriously                       .87
- He/she behaves indirectly                               .72
- He/she stands out                                       .78
- I am not able to get through to him/her                 .74
- He/she always has to have the last word                 .71
- He/she readily enters into a discussion with me         .71
- He/she attracts attention                               .78
- He/she reacts impulsively                               .56
Label: Self-oriented, directed at autonomy

Cluster 2:
- He/she supports me                                      .68
- He/she takes feedback to heart                          .84
- He/she is frank about his/her trouble                   .72
- He/she behaves directly                                 .55
- He/she feels sympathy for others                        .61
- He/she compliments me on something                      .55
- He/she gives appropriate feedback to me                 .75
- He/she stands up for him/herself in an appropriate way  .49
- He/she listens with interest                            .72
- He/she reacts critically if I bring something up        .61
- He/she is attractive                                    .68
- He/she complains to me                                  .49
Label: Other-oriented, directed at alliance with others

Cluster 3:
- He/she talks about leaving                              .91
- He/she is involved in maintaining relations with me     .75
- He/she is playful                                       .72
- He/she is inclined to laugh                             .85
Label: Impulsive, avoiding

Cluster 4:
- He/she behaves self-damagingly                          .90
- His/her self-care is inadequate                         .91
Label: Against oneself
Data Analysis

Themes in the Functioning of the Community
Themes that are focal in the functioning of the community can be discovered by looking at the participants' IC-scores. With an IC-score, a respondent has indicated to what extent an IC-statement, in his or her view, can be attributed to the fellow members of the community. By adding the IC-scores a member has received from each of the other participants on the IC-statements attributed to him or her, we obtain a matrix with collective attribution scores. These sum scores reflect to what extent, in the view of the community, the IC-statements apply to that member. The matrix with the attribution scores of all members can be cluster analyzed. An example of the outcome of a cluster analysis of such a matrix is shown in Table 1. The cluster analysis produced four clusters; the theme of each cluster is reflected by a label. The first two clusters are clearly composed of more IC-statements than the last two clusters, which we can take as a sign of their significance. The themes of the first two clusters are typical: each time we have performed the IPF-procedure until now, we have found clusters that could easily be labeled as trying to act autonomously and trying to act in solidarity with others. It seems obvious that these themes are basic to interpersonal functioning. Whereas the first two clusters characterize interpersonal behavior, the next two clusters seem to describe the view of the community concerning the way a member controls him- or herself. By adding all the IC-scores attributed to each member of the community on the statements that form a cluster, we arrive at totals that indicate to what extent the theme is associated with the various members. By converting these totals to standard scores (z scores), we get a picture of the extent to which, in the view of the community, these themes are applicable to each member. In Fig. 1, the standard scores of the participants on the first cluster are portrayed. Standard scores of −1 and below and of +1 and above can be regarded as outstanding with respect to the theme of that cluster. According to the opinion of the community, persons with a z score of −1 or below are minimally Self-oriented; they will display little activity towards creating a personal position in the social structure of the community. Persons with a z score of +1 or above show the opposite: they try hard to push their own personal view upon interpersonal affairs.
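A minimal sketch of this standardization, with invented member totals:

```python
from statistics import mean, pstdev

def cluster_z_scores(totals):
    """totals: {member: summed IC-scores on the cluster's statements}."""
    m, sd = mean(totals.values()), pstdev(totals.values())
    return {member: (t - m) / sd for member, t in totals.items()}

totals = {"Ann": 34, "Bob": 12, "Carol": 20, "Dirk": 26, "Eva": 8}  # invented
z = cluster_z_scores(totals)
print({p: round(v, 2) for p, v in z.items() if abs(v) >= 1})  # outstanding members
```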
Figure 1. The view of the community concerning the extent to which the theme of the first cluster is attributed to the members. The scores are converted to standard scores (z scores).

By following the specific case of Ann within the group, the interpretation of this kind of data is exemplified. In Fig. 2, Ann's z scores on the themes of the four clusters are shown.
Figure 2. The view of the community concerning the extent to which the themes of the four clusters are attributed to Ann. The shown scores are standard scores (z scores).

It turns out that, to a considerable extent, the second cluster has been associated with Ann. Her z score on this theme outweighs her z scores on the other themes. This outcome suggests that she highly values the alliance with her fellow members and probably is
inclined to pay more attention to the interests of her fellow members than to her self-interest. She possibly has difficulties in maintaining the balance between her own personal interests and the interests of the persons she lives with.

The Quality of the Relationship between the Members of the Community

A DSE-score can reflect an experience of acknowledgement or an experience of disqualification. By multiplying the attribution score (IC-score) and the respective DSE-score, we get an A(cknowledgement)-score or a D(isqualification)-score. The A-score or the D-score is an indication of the quality of the experience resulting from the combination of a fellow member with an IC-statement. As a rule, a fellow member can get an A-score in combination with one IC-statement and a D-score in combination with another. By converting the A- and D-scores to standard scores (z scores), the scores of the participants become relative to what, in the community, is experienced as usual. A score that differs one standard deviation or more from the mean can be considered as deviating from what the community feels to be common.

Acknowledgement and disqualification are assumed to be important principles by which a community is controlled and by which the roles of the members are adjusted to each other. By portraying the social functioning in a community on a plane, we can use these two principles as the dimensions that structure this plane. In a co-ordinate system, the standardized A-scores are placed on the x-axis and the standardized D-scores on the y-axis. Scores on the A-dimension that are smaller than the mean are placed on the left side of the x-axis, and those larger than the mean on the right side. As for the scores on the D-dimension, those smaller than the mean are placed on the upper part of the y-axis, whereas those larger than the mean are placed on the lower part. The upper-left quadrant is characterized as neutral: little acknowledgement and little disqualification. The upper-right quadrant is characterized by acknowledgement and little disqualification. The lower-right quadrant is characterized by ambivalence: acknowledgement combined with disqualification. The lower-left quadrant is characterized by disqualification and little acknowledgement. Scores that are positioned in the four squares around the center are considered to be common to what the community experiences as a unit.

These A- and D-scores can now be analyzed from different points of view. The analysis from the point of view of the various participants gives an impression of how they experience their stay in the community: what they think of the discursive climate in the community. The analysis can also be done from the point of view of the community, the joint perspective of the participants: how the joint participants experience the functioning of the individual members.

The Discursive Climate of the Community

By adding up all the A-scores and all the D-scores a participant has assigned to the various members, we get the summarized experiences this participant has reported across all the discursive cycles they generated.
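The A/D scoring and the quadrant placement described above can be condensed into a short sketch. The chapter does not fix a sign convention for DSE-scores, so the fragment below assumes, purely for illustration, that acknowledging experiences carry a positive DSE-score and disqualifying ones a negative score; all names are hypothetical:

    #include <string>

    // Illustrative sketch of the A/D scoring: an IC-score (attribution)
    // multiplied by the DSE-score yields an A-score for an acknowledging
    // experience or a D-score for a disqualifying one (assumed sign
    // convention: positive DSE = acknowledgement, negative = disqualification).
    struct ADScore { double a; double d; };

    ADScore combine(double icScore, double dseScore)
    {
        ADScore s{0.0, 0.0};
        if (dseScore >= 0.0) s.a = icScore * dseScore;    // acknowledgement
        else                 s.d = icScore * -dseScore;   // disqualification
        return s;
    }

    // Quadrant labels for standardized scores: zA on the x-axis, zD on the
    // y-axis, with larger-than-mean D plotted downward as in the chapter.
    std::string quadrant(double zA, double zD)
    {
        if (zA <  0 && zD <  0) return "neutral";          // little A, little D
        if (zA >= 0 && zD <  0) return "acknowledgement";  // A, little D
        if (zA >= 0 && zD >= 0) return "ambivalence";      // A combined with D
        return "disqualification";                         // D, little A
    }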
These summed totals, converted to standard scores, are used to portray a picture of how the community is experienced by its members (Fig. 3).
Figure 3. The overall view of the participants upon the community, as expressed by their total scores on the A- and the D-dimension. The shown scores are standard scores (z scores).

In this example, it turns out that, in the case of Ann, little acknowledgement and moderate disqualification is experienced from her fellow members in the community. Earlier in the group therapy program, it turned out that she was spending much effort on the relationships with her fellow members, but it seems now that she often feels frustrated. Apparently, she has a problem speaking her mind and standing up for her opinion and her personal interests.

The Opinion of the Community

By summing all the A-scores and all the D-scores the various participants have given to a member, we get a generalized view of the community on this member. These sum scores show the general consensus of the community on the contribution of the respective members. Again, these totals for all the members are converted into standard scores (Fig. 4).
Figure 4. The collective opinion concerning the contribution of the various members to the general functioning of the community, expressed by the scores they obtained on the A- and the D-dimension. The shown scores are standard scores (z scores).

In the example of Ann, we see that her fellow members experience more than average acknowledgement in relation to her. This again is an indication that she tries hard to avoid conflicts, so as not to get into trouble with her fellow members. The outcome suggests that an important therapeutic goal will be to overcome her fears of standing up for herself.

Discussion

In keeping with the concept of the discursive cycle, it is the dialogue between the members of a social structure concerning everyday events, the social dialogue, that constitutes the social reality. It is along this line of thought that the outcome of an IPF-investigation should be brought up for discussion, both in the community and amongst the staff.
Discussing the Outcome in the Community
Discussing the outcome with the members of the community offers an opportunity to discuss and reflect upon their experiences in the daily routine of the community. Fairly often, the confrontation concerns aspects of functioning that are somehow known, but rather not attended to. These aspects of functioning acquire special significance if the participants concerned are willing to look at them as a matter of joint interest. As for the therapeutic process, being confronted with one's social functioning, and making use of the opportunity to talk it over, will be essential to gaining insight into one's role and position in what is happening.

Discussing the Outcome amongst the Staff
Supporting the functioning of the community by stimulating reflection by means of feedback and confrontation is a basic task of the staff. The staff can only hope to perform this task successfully if they are well informed about what is going on and about the dynamics behind it. Discussing the presented outcome offers various leads that can clarify matters. Feedback and confrontation will only have a positive effect if these interventions are not experienced as disqualifying. It appears (Fig. 4) that the staff is experienced as moderate, both on the A-dimension and on the D-dimension. For bringing about an active and close cooperation with the community, this positioning of the staff seems satisfactory. Nevertheless, the staff could see in this positioning a reason to talk over its whys and wherefores.

Evaluation of the Method
The concept of the discursive cycle can offer a helpful structure in thinking and talking about interpersonal affairs. A decided advantage of the described method is the possibility to connect personal experiences and community life. Feedback of the outcome is facilitated by the fact that the discussed issues are expressed in statements originating from the members of the community themselves: the reported experiences are processed into meaningful inferences in a way close to their own frame of reference. Another possibility is to use the data from the recurrent investigations to keep track of the development of the members and of the community.
Section III INSTRUMENTATION
Chapter 13
Using Windows for Psychological Tests and Experiments with Real-time Requirements
C.F.Bouwhuisen and F.J.Maarse
Nijmegen Institute for Cognition and Information, University of Nijmegen, Montessorilaan 3, 6525 HR Nijmegen, The Netherlands

Abstract
In this article, problems are discussed that arise when implementing psychological tests and experiments with real-time requirements under Windows. An overview is given of the different sources of errors, and of the techniques that are available to deal with these errors. Some simple experiments are described that were conducted to determine the magnitude of the errors. Finally, it is shown that satisfactory results can be obtained when certain restrictions are observed and/or specific techniques are applied. This article does not provide any detailed information about programming.
Introduction

For many years, MS-DOS was the operating system most frequently used for the implementation of psychological tests and experiments. These applications often include time-critical (or real-time) elements such as, for example, reaction-time measurements. Although MS-DOS was not especially designed as a real-time system, it performs very well in this kind of application. This is mainly due to the very simple architecture of this operating system, which allows the programmer to establish complete control over the computer and all its peripherals (Tibosch, 1986). Over recent years, people have become used to the user-friendly interface of Windows 95, Windows 98 and Windows NT. Compared to MS-DOS, these systems also offer many other advantages that can be useful in implementing psychological tests and experiments, among them multitasking, a 32-bit address space, virtual memory and multimedia facilities. In this situation, the question arises as to whether Windows systems are also suitable for implementing psychological tests and experiments with real-time requirements, in spite of the fact that Windows systems are far more complicated than DOS. Unfortunately, very little information is available about the performance that can be expected from Windows systems in such an environment. In this article, we will investigate which inaccuracies should be expected when executing psychological tests and experiments with real-time requirements under
Windows 95 and Windows NT, and which techniques are available to keep these inaccuracies as small as possible. We will emphasize techniques that are supported by the software manufacturer and that are likely to be supported in future versions of Windows. Since the various Windows systems, such as Windows 95 and Windows NT, have much in common, we will simply speak of Windows and will only distinguish between the separate systems when significant differences between them have been found. Since Windows 2000 is in fact the newest version of Windows NT, we may assume that many properties of Windows NT will also apply to Windows 2000.

Real-time Response under Windows

Definition of Real-time Response
In this article we use the term real-time response to indicate the time that a program needs to react to an external event. In the case of a reaction time experiment, for instance, this is the time that a program needs to read the current time after the subject has pressed a button. This time introduces an inaccuracy in the measurements if it cannot be compensated for.

Single Tasking and Multitasking
Unlike MS-DOS, Windows is a multitasking system. This means that Windows is designed to execute several programs simultaneously. In addition to programs that are explicitly activated by the user, Windows itself also activates programs for internal tasks that are not controlled by the user. As a computer has (in general) only one CPU, a multitasking system has to switch quickly between all active programs to divide the available CPU time among them. In such an environment a real-time program can only give reliable real-time responses under the following circumstances:
1. The real-time program can be given absolute priority over all other active programs.
2. The operating system is able to switch quickly to this real-time program when needed.
In a multitasking system, programs should be prevented from disturbing each other's functioning. For this reason, access to commonly used peripherals like the screen and disks must be coordinated by the operating system, and programs must be prevented from handling their own communication with peripherals without the intervention of the system. This is particularly important in systems that claim some degree of security. The coordination of the access to peripherals by the operating system causes time delay and affects the real-time response of the programs.

Virtual Memory
Windows is a virtual memory system. In such a system, a program may (seemingly) use more working memory than is physically available. In fact, part of the contents of this virtual memory is stored on disk, and is only copied into physical working memory when needed. To make room for this operation, the least recently used part of the contents of
the physical working memory is copied back to disk. The operation of copying information between working memory and disk is called paging. The advantage of the use of virtual memory is that it enables the user to run programs that would otherwise not fit into working memory; the disadvantage, however, is that the paging process slows down the execution of a program. The latter is especially undesirable in real-time applications. The simplest way to avoid paging is to stop all unneeded programs and to provide an ample amount of physical working memory for the real-time program.

Windows as a Real-time System
Every good operating system is designed to respond as quickly as possible to user actions. The difference between general purpose systems and specific real-time systems is that general purpose systems are designed to give an optimal average response time, while real-time systems are optimized for guaranteed response time. Since Windows is a general purpose system, we cannot expect a guaranteed real-time response. Windows can, however, still be used for time-critical tests and experiments when the following conditions apply:
1. The average real-time response is known and is within an acceptable range.
2. The variations in real-time response are small.
We will discuss these conditions in the following sections.

Optimizing Real-time Response
Windows offers several possibilities for optimizing the real-time response (a code sketch illustrating points 1 and 5 follows this list):
1. Using Priorities: In Windows a user can set the priority of a program with respect to other programs in the system. Windows distinguishes between the priority levels Idle, Normal, High and RealTime. When using High or RealTime priority the following points must be considered:
• Not all of the internal processes of Windows can be interrupted immediately, so delay in real-time response can never be completely avoided, even when using the highest priority.
• A program having RealTime priority may block the system when it gets into a program loop.
• When using Windows NT, specific user privileges are needed to use elevated priorities.
2. Using Multithreading: More complex real-time programs can often be divided into a time-critical and a less time-critical part. Multithreading gives the possibility of implementing both parts as threads. Threads are separate parts of a program that are executed independently and may have different priorities. By assigning a higher priority to the time-critical thread, the real-time response of a program may be improved.
3. Using DirectX: DirectX is a software package that was designed by Microsoft to be used in computer games. Since computer games, like psychological tests and experiments, often have strict real-time requirements, DirectX was designed primarily to improve real-time response. For this purpose, it gives the possibility of reserving a specific peripheral (the screen, for instance) exclusively for the use of one program, so that the time delay that results from coordinating the access to the peripheral between several programs can be reduced. DirectX also gives a program more direct control over the peripheral. When using the screen, for instance, DirectX offers the possibility to synchronize a program to the video frames and to switch between several separate pages of video memory (page flipping). In this way DirectX offers a substitute for the direct access to the peripherals that was commonly used in MS-DOS for achieving optimal real-time response.[1] Since DirectX was developed for use in computer games, only those peripherals that are commonly used in games (like the screen, joystick, keyboard and sound card) are supported. It is important to note that programming in DirectX requires considerable experience in low-level programming techniques. The current version of Windows NT has only a restricted implementation of DirectX.
4. Using Real-time Software from other Manufacturers: Besides Microsoft, other manufacturers provide software packages for improving the real-time response of Windows. When using such software, it is important to be sure that it will also be supported in future versions of Windows.
5. Increasing the System Clock Frequency: When implementing real-time tests and experiments, we often need to measure time intervals. The functions that are available in Windows for reading the current time make use of the system clock. This clock has a standard setting of 100 Hz, so we can expect these functions to have a resolution of 10 ms. This limited resolution is an additional source of measurement error besides the errors that are caused by the limited real-time response of the system. Windows offers the possibility of increasing the system clock frequency above the standard frequency of 100 Hz. The maximum possible value depends on the system configuration[2] (Jones & Regehr, 1999). Another way of improving the accuracy of time measurements is by using the performance counter. This is a clock that is implemented as a piece of hardware on the system board and can be read by standard Windows functions (see the Microsoft documentation). It has a very high resolution (better than 1 µs). When available, it can be used instead of the Windows time functions for accurate measurement of time intervals.

[1] Since Windows 95 is not a secure system, direct access from application programs to peripherals (although not good practice in multitasking systems) is still possible in some cases. In Windows NT, however, which claims a high degree of security, this kind of access is (or at least should be) impossible. Since future versions of Windows will be based on Windows NT, programs using these techniques will probably no longer work in the future.
[2] The Windows Application Programmers Interface function timeBeginPeriod can be used for this purpose.
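The chapter deliberately gives no program code, but techniques 1 and 5 map onto a handful of standard Win32 calls. The following fragment is only a minimal, hypothetical sketch of that mapping (console program, error handling omitted):

    #include <windows.h>
    #include <mmsystem.h>   // timeBeginPeriod/timeEndPeriod; link with winmm.lib
    #include <stdio.h>

    int main(void)
    {
        // Technique 1: elevated priority (under Windows NT this requires
        // the appropriate user privileges).
        SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS);

        // Technique 5: raise the system clock resolution from the standard
        // 10 ms towards 1 ms for the duration of the session.
        timeBeginPeriod(1);

        // High-resolution interval measurement with the performance counter.
        LARGE_INTEGER freq, t0, t1;
        QueryPerformanceFrequency(&freq);          // ticks per second
        QueryPerformanceCounter(&t0);
        Sleep(50);                                 // stands in for one trial
        QueryPerformanceCounter(&t1);
        double ms = 1000.0 * (double)(t1.QuadPart - t0.QuadPart)
                           / (double)freq.QuadPart;
        printf("elapsed: %.3f ms\n", ms);

        timeEndPeriod(1);
        return 0;
    }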
Some Experiments

The available documentation about the Windows systems does not provide (as far as we could find) much information about the real-time response that can be expected from programs running on these systems. To get more information about this subject, we decided to perform some experiments. We do not claim that the results provide all the necessary information, but they seem sufficient to get an impression of the errors that are to be expected. Delphi 3 was used as the programming environment.

Simulation of Reaction Time Measurements
This experiment was designed to determine the accuracy that can be achieved when executing a reaction-time experiment on a Windows system. In practice, this accuracy depends not only on the real-time properties of Windows, but also on the properties of the equipment that is used for stimulus presentation and response registration. Here we concentrate on the effects that are specific to the Windows system and that can be expected in every program running on such a system. Important are the influences of elevated priorities and of loading the computer, during the experiment, with other programs running (on the same computer) at Normal priority. For this experiment, the human subject was simulated by an electronic pulse generator connected to the serial port of the computer. Since the actual reaction time of this 'artificial subject' is accurately known, the measurement error can easily be determined by comparing the measured time to the actual time. By connecting the pulse generator directly to the serial port, errors specific to a video screen or a standard keyboard, which have to be considered in many practical experiments, are avoided (these errors will be discussed separately). The modem control lines of the serial port were used to avoid the delay that is inherent to serial transmission. During the measurements, the computer was loaded at three different levels:
None: Except for the experimental program, no programs were explicitly activated. However, no measures were taken to eliminate internal Windows programs or to stop the network.
Low: In addition to the experimental program, a second program was activated that constantly performed mathematical calculations without doing much disk I/O and without interaction with the user. This program was running at Normal priority.
High: Concurrently with the measurements, the system was loaded by an interactive user who was using the Windows Explorer for finding and copying files (including network transfers) as extensively as practical.

Measurement Conditions
Systems: Windows 95 and Windows NT
Priorities: Normal and RealTime
System load: None, Low and High
Trials/condition: 2000
Hardware: Pentium 1, 100 MHz, 32 Mb
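The chapter does not show how the pulses on the modem control lines were registered. As a purely illustrative sketch, a Win32 program might block on a line transition and timestamp it as follows (the port name and the choice of the CTS line are assumptions, not the authors' setup):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        // Hypothetical: the pulse generator drives the CTS line of COM1.
        HANDLE h = CreateFileA("COM1", GENERIC_READ | GENERIC_WRITE, 0, NULL,
                               OPEN_EXISTING, 0, NULL);
        if (h == INVALID_HANDLE_VALUE) return 1;

        SetCommMask(h, EV_CTS);                  // notify on CTS line changes

        LARGE_INTEGER freq, t;
        QueryPerformanceFrequency(&freq);

        DWORD evt = 0;
        if (WaitCommEvent(h, &evt, NULL)) {      // blocks until CTS toggles
            QueryPerformanceCounter(&t);         // 'response' timestamp
            DWORD lines = 0;
            GetCommModemStatus(h, &lines);       // current state of the lines
            printf("CTS %s at tick %lld\n",
                   (lines & MS_CTS_ON) ? "asserted" : "released",
                   (long long)t.QuadPart);
        }
        CloseHandle(h);
        return 0;
    }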
Results

The results of the measurements are given in Table 1 and Table 2. On an unloaded system, the average error is just above 1 ms, with a low standard deviation. The maximum value over all trials is 1.6 ms. The measured difference between Windows 95 and Windows NT can be ignored for applications like psychological tests and experiments. As might be expected, the priority level has little influence in this condition. On a loaded system, we see that elevated priorities must be used to limit the measurement error. When using RealTime priority, there is only a small increase of average values and standard deviation as compared to an unloaded system, but there is a more significant increase of maximum values, especially at higher loads. The occurrence frequency of these maximum values, however, is very low, as can be concluded from the ratio between maximum value and standard deviation. Windows NT seems to be more sensitive to system load than Windows 95.[3] Of course, better results can be expected on faster computers and/or computers with more working memory.
Table 1. Reaction time measurement error in ms for Windows 95, with different priorities and loads.

Load    Normal priority                     Real-time priority
        Average   Std. dev.   Maximum       Average   Std. dev.   Maximum
None    1.14      0.12        1.4           1.12      0.12        1.6
Low     1.13      0.12        5.7           1.13      0.068       1.5
High    1.7       7.4         162.4         1.14      0.15        3.8
Again it must be emphasized that the results of these measurements only apply to the errors that are specific to the Windows system. Errors that result from the properties of specific peripherals such as, for instance, the screen, keyboard or sound card are discussed separately.

[3] Differences between Windows 95 and Windows NT (especially those that occur at higher loads) may result from the fact that Windows NT needs more memory than Windows 95, so more paging might occur under Windows NT in comparable situations.
Table 2. Reaction time measurement error in ms for Windows NT, with different priorities and loads.

Load    Normal priority                     Real-time priority
        Average   Std. dev.   Maximum       Average   Std. dev.   Maximum
None    1.11      0.048       1.2           1.12      0.011       1.3
Low     2.05      0.5         5.1           1.26      0.08        1.5
High    2.48      4.2         983           1.3       0.33        6.8
Reaction Time Measurements using the Keyboard
In many cases, the standard keyboard is used for response registration in reaction time experiments. Since the keyboard and its driver software were never designed for accurate time measurements, and the Windows documentation does not provide any information on this subject, it is important to verify the accuracy of these registrations. For this purpose, we made recordings of the movements of the finger while pressing a key repeatedly, using an optical movement recording system (Optotrak). At the same time, the moment of the key activation was detected by a computer program in the same way as would be done in a reaction time experiment. By comparing the time points of the key activations, as detected by the computer program, to the recorded movement data, the registration error was determined. In our experiment, we found delays between 2 and 12 ms, which seem to be uniformly distributed between these two boundaries.

Verification of Time and Timer Functions
In tests and experiments, we may need two different kinds of functions that have to do with time:
1. Functions for requesting the current time (time functions).
2. Functions for scheduling a specific action at a given point of time (timer functions).
In a simple program, we tested the accuracy of time and timer functions by referencing the results to the performance counter. As could be expected from the fact that the system clock in Windows normally runs in steps of 10 ms, we found a resolution of 10 ms for most functions. With the exception of the timer function of Windows 95, we found that the functions are accurate within the limitations of the 10 ms resolution and the delay times found in the first experiment. The time resolution can be improved by increasing the system clock frequency (Jones & Regehr). For Windows 95, however, the most commonly used timer function (SetTimer) frequently shows errors of more than 90 ms, so it cannot be used for real-time applications.
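The verification program itself is not printed in the chapter; a minimal sketch of the idea, timing SetTimer callbacks against the performance counter, could look as follows (all names illustrative):

    #include <windows.h>
    #include <stdio.h>

    static LARGE_INTEGER g_freq, g_prev;

    // Called by Windows for each timer tick; measures the real interval.
    static VOID CALLBACK TimerProc(HWND, UINT, UINT_PTR, DWORD)
    {
        LARGE_INTEGER now;
        QueryPerformanceCounter(&now);
        double ms = 1000.0 * (double)(now.QuadPart - g_prev.QuadPart)
                           / (double)g_freq.QuadPart;
        printf("interval: %.2f ms (10 ms requested)\n", ms);
        g_prev = now;
    }

    int main(void)
    {
        QueryPerformanceFrequency(&g_freq);
        QueryPerformanceCounter(&g_prev);
        SetTimer(NULL, 0, 10, TimerProc);        // ask for a 10 ms timer

        MSG msg;                                 // timers need a message loop
        while (GetMessage(&msg, NULL, 0, 0) > 0)
            DispatchMessage(&msg);
        return 0;
    }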
Other Sources of Errors

Time Accuracy of Auditory Stimuli
Most functions available in Windows for playing sounds read the required sound data from files on disk. Since reading data from disk is a relatively slow process that may introduce unpredictable delays due to required head movements, these functions are not suitable for stimulus presentation in psychological tests and experiments with real-time requirements. To avoid this delay, sound data must be read directly from memory. For this purpose, DirectX can be used. However, according to the specifications, a delay of up to 20 ms must still be expected in this case. Probably this delay is caused by the mixing algorithm that is built into DirectX and that enables the mixing of sounds from several sources. We suggest two possibilities for handling this delay (see the DirectX documentation from Microsoft for more information):
1. Avoiding the delay by bypassing the mixing algorithm by means of primary buffer access.
2. Determining the actual delay by requesting the current play position and comparing it with the expected play position, based on the current time.

Time Accuracy of Visual Stimuli on the Screen
Just as in the MS-DOS environment, the main source of timing errors during the presentation of visual stimuli on the screen results from the fact that the picture is generated in discrete refresh periods. Since a change in the picture only becomes visible after the next refresh period, and the process of refreshing cannot be controlled by the program, a random delay occurs with a maximum length of one refresh period. In Windows, the refresh frequency can be set by the user within the limitations of the hardware. Although it is not possible to synchronize the refresh period to the program, in many cases the error can be avoided by synchronizing the program to the refresh period. The only reliable way to achieve this synchronization seems to be the use of DirectX. DirectX also offers many other possibilities that can be useful in stimulus presentation, like switching between separate pages of video memory (known as page flipping).

Conclusions
When we assume that an accuracy of about 1 ms is sufficient for real-time experiments and tests, we can conclude that Windows 95 and Windows NT can be used for these applications, provided that the following points are taken into consideration:
1. Loading the computer, during a test or experiment, with unnecessary tasks must be avoided for best results.
2. At least a Pentium 1 processor running at 100 MHz is required, with an ample amount of working memory for the given application (usually 32 Mb or more for Windows 95). Future versions of Windows will require a more powerful computer.
3. If an accuracy of exactly 1 ms is needed, a compensation for the average delay in real-time response might be necessary. The magnitude of this delay depends, among other things, on the speed of the computer, and must be experimentally determined.
4. If a timing resolution better than 10 ms is required, the system clock frequency must be increased (timeBeginPeriod). In Windows 95, the standard timer function (SetTimer) is not suitable for real-time applications. The Windows Application Programming Interface provides other timers with better accuracy.
5. The use of the screen for presentation of real-time stimuli requires DirectX (or a comparable product) in combination with adequate programming techniques in order to achieve about 1 ms accuracy.
6. The keyboard shows a random delay between 2 ms and 12 ms on our test system. This delay could possibly be reduced by using DirectX.
7. The sound card requires the use of DirectX to produce auditory stimuli with some degree of accuracy, but even in this case a delay of up to 20 ms must be expected. Additional techniques may be required for handling this delay.
The errors caused by the screen and the keyboard have random values with a uniform distribution. Provided that enough measurements (or trials) are available, these errors can be reduced by statistical methods, and the complexity inherent in the use of DirectX can be avoided. The mean value of these errors is dependent on the properties and settings of the computer. Caution must be exercised when using DOS-like techniques to overcome the limitations of Windows. In fact these techniques work due to shortcomings in the current Windows version, and may no longer work in future versions of Windows. DirectX will probably provide a safer solution in most cases, in particular for the future. Additional real-time software is available from other manufacturers.

References
Jones, M.B., & Regehr, J. (1999). The problems you're having may not be the problems you think you're having: Results from a latency study of Windows NT. http://www.cs.virginia.edu/~jdr8d/papers/hotos7/hotos7.html
Microsoft (1999). DirectX documentation files included in the DirectX Software Development Kit.
Microsoft (1999). Various documents from Microsoft's web site (http://www.microsoft.com/).
Tibosch, H. (1986). Timer: het meten van reaktietijden op een PC. Psychologie & Computers, 3, 40–46.
Chapter 14
WinTask: Using Microsoft Windows™ for Real-time Psychological Experiments
J.Bos, E.Hoekstra, L.J.M.Mulder, J.A.Ruiter, J.R.Smit, D.Wybenga, and J.B.P.Veldman
Department of Psychology, University of Groningen, Grote Kruisstraat 2/1, 9712 TS Groningen, The Netherlands

Abstract
In the Department of Psychology at the University of Groningen, a development toolkit for building real-time experiments, called Taskit, has been used for many years. This development toolkit offers the possibility to create psychological experiments in an MsDos environment using Turbo Pascal. Recently, however, the question was raised of how to construct a toolkit with at least comparable functionality, but using the advantages of a Windows-based operating system. A thorough investigation was made into the possibilities for translating the existing Taskit toolkit to a Windows-based environment. This approach resulted in a new toolkit that can be used for developing real-time experiments within Microsoft Windows: WinTask. At present this environment supports all the necessary visual and auditory modes of information presentation with adequate, guaranteed accuracy of timing. The WinTask toolkit has already been used as part of a complex dynamic environment, simulating an ambulance dispatch work domain.
Introduction

First, the question arises as to what the benefits would be of using Windows 95/98 and Windows NT/2000 for psychological experiments. For many years, MsDos-based applications have been used, and these have proved to be reliable experimental systems, despite the fact that MsDos is not a specific real-time operating system. Many development toolkits, such as Mel (Schneider, 1988) and Taskit (Gerritsma & Van der Meulen, 1991), are used for building experimental tasks with adequate real-time characteristics, and they have gained considerable popularity. In particular, their benefit-cost ratio is good. One great advantage of using Windows could be the higher functionality this operating system offers, such as working with a graphical interface and a mouse. Another argument is that the MsDos environment is becoming less commonly used nowadays, and this trend will continue in the years to come. Much new equipment with greater functionality works only partially, or not at all, within an MsDos environment. Building a new system for Windows would give access to a wide range of new, useful equipment specially suited to this operating system. Also, concepts such as Multimedia
(video & music), multi-user task environments and the Internet can be introduced within the setting of experimental tasks. Such functions can be combined with the use of many existing Windows applications. These characteristics of Windows can be put to good use in building complex task environments. For these kinds of tasks, a new experimental laboratory has been developed in which the above-mentioned Windows characteristics play an essential role: the Digital Workplace for Analysis of Task performance (Bos, Mulder, & Van Ouwerkerk, 1999). Another benefit of using Windows is the ease of developing applications that meet criteria for usability and time-efficiency. Due to graphical user interface options, it is possible to create a visually oriented development toolkit in which users can build their own real-time experiments without requiring the knowledge and skills of programming or of overcoming timing problems. Although the previously mentioned MsDos toolkits all provide a certain user interface, the usability of these interfaces is restricted due to the limitations of the MsDos operating system.

These arguments play an important role in the further development of the Taskit toolkit. This development toolkit is based upon the assumption that the user is well able to program in Pascal, starting with a set of examples and a library of effective functions. Using this language, the function library of Taskit can be used to create real-time experimental tasks in a simple and effective way. One reason that Taskit is still popular with its users is that many example applications are available that can easily be converted into a new experimental set-up. A second reason is that Taskit creates a very stable experimental environment that guarantees real-time experimenting with a resolution of 1 millisecond. A good step forward would be the combination of the Taskit methodology with a Windows-based development interface. In the following sections of this paper, a survey will be given of the known problems in using Windows within a real-time experimental environment, the solutions found for these problems, and the steps taken to develop a Windows-based version of Taskit.

Problems with Microsoft Windows

Although Microsoft Windows may provide a good user interface for developing real-time psychological experiments, it has some major drawbacks. First, it is very difficult for the designer of the experimental environment and its user to obtain stable control over the operating system during an experiment. This is caused not only by the open structure provided for the user, but also by the restrictions in controlling low-level hardware devices. Also, it is very difficult to create a stable timing device that can provide a real-time clock with a resolution of at least one millisecond. The last problem, closely related to the previous one, is the difficulty of presenting stimuli on the video monitor with a stable and fixed delay. Because direct access to the video card is limited, and display functions are shielded by the Graphic Device Interface (GDI) of Windows, no real-time timing can be guaranteed for presenting stimuli. Some techniques have been suggested to overcome this problem. One solution is to use averaging techniques (Maarse, Ghisaidoobé, & Bouwhuizen, 1997), by which the error in the data is reduced by averaging over a large number of stimuli. A disadvantage of this technique is the large number of stimuli and subjects necessary to minimize the
resulting error. Most of the subject groups used within our department do not have the required size for this method. Additionally, the technique is badly suited to Evoked Potential research, because spreading activation patterns reduce the obtainable resolution in such data. So a solution is needed that provides the possibility to create real-time experiments in Windows 95/98 and Windows NT without using averaging techniques. Before a system can be designed for creating real-time psychological experiments in Windows, the following four questions have to be answered:
1. Is it possible to create an accurate timing device in Windows that is stable for a long period of time, and that has a resolution of at least 1 millisecond?
2. Is it possible to directly access hardware elements in Windows such as memory, I/O ports and interrupts?
3. Is it possible to shield most of the disturbing elements of Windows in order to create a stable and controllable environment for conducting real-time psychological experiments?
4. Is it possible to send stimuli directly to the video card and audio speakers with a controllable delay, and are possibilities available to measure such delays?
Each of these questions had to be answered before further development could be started. In answering the questions, a distinction must be made between Windows 95/98 and Windows NT. Windows NT uses a closed model in which direct access to hardware is not permitted at all, though it is possible to use a low-level kernel device that gives 'almost' direct access to hardware elements. Windows 95/98 uses another approach to controlling hardware elements, which enables easier access to the hardware. A solution should provide an answer for both operating systems, in order to fulfill future requirements.

Solving the Windows Constraints

Most of the commonly known toolkits used in experimental psychology for MsDos, such as Mel (Schneider, 1988) and Taskit (Gerritsma, 1991), use a real-time clock with a resolution of one millisecond to provide the necessary timing resolution for registering events in combination with physiological data. To create a timing device in Windows with a stable resolution of one millisecond, standard programming techniques cannot be used. The standard Windows timers do not have a stable timing mechanism: they are based upon the main Windows processing cycle, with a mean cycle time of 55 milliseconds. To create a stable timer with a higher resolution, it is necessary to create a thread-based timing device that can have its own processing cycle and priority setting within the system. One of the options for accomplishing this goal is to use Windows Multimedia Timers. Although in recent years Multimedia Timers have become more common, there has still been limited information available about the correctness of these timers for real-time experiments. To partially solve this problem, an external toolkit (ExacTicks, Ryledesign Corp, 1998) was acquired, from which the C++ source could be obtained. ExacTicks is a Windows runtime library that offers the possibility to create a stable multimedia timer with a preferred resolution of at least 1 millisecond. This toolkit was adapted to function
as the central timing module. The code was partially rewritten to fit into the conceptual structure used by the Digital Workplace.

A powerful feature of the Taskit toolkit is the possibility to directly access low-level devices such as memory, I/O ports and interrupts. With these functions, experiments can be created which have a very high-resolution stimulus/response cycle: the stimulus is presented on the screen by directly accessing the video card, and responses are registered using the parallel port in combination with interrupt handling. At a high level, this kind of access is not possible in Windows: hardware cannot be accessed directly, and the use of interrupts is normally not allowed. To solve this problem an external toolkit was acquired (TvickHW32, Ishekeev, 1998). TvickHW32 is a Windows runtime library which makes it possible to almost directly access hardware elements in Windows 95/98 and Windows NT. It gives the possibility to access I/O ports, memory addresses and one interrupt line, making use of a special device driver to access the hardware elements. The C++ code of this package was obtained and integrated with the routines provided by the ExacTicks package.

To find an answer to the third question, that of shielding disturbing elements, a distinction must be made between the disturbances caused by the operating system and those caused by the user. Disturbances caused by the operating system are a result of the background processing structure of Microsoft Windows. Due to the open structure of the Windows environment, users have access to most parts of the Windows interface. The Windows environment should be altered to shield some of these access possibilities from the user. Elements such as the Desktop and the Taskbar should not be controlled by the user (subject) during an experiment, but by the experimenter. Also, control should be maintained over other running applications in situations such as starting and quitting an application: it must be possible to prevent the subject from closing a running application or starting a new one. Routines must also be developed to intercept some of the keyboard strokes that the user could use to manipulate the Windows environment (CTRL-ALT-DEL, ALT-TAB, etc.). To solve this problem, a technique is used as shown in Figure 1, called hooking. With this hooking mechanism, it is possible to filter the internal message queue of the Windows operating system. By filtering this queue, certain messages such as mouse movements, keystrokes and system messages from the user can be filtered out without interfering with the experimental timing.
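As a purely illustrative variant of this mechanism (the WinTask implementation itself is not shown in the chapter), a low-level keyboard hook can swallow task-switching keystrokes before they reach the system; note that CTRL-ALT-DEL is handled by the operating system itself and cannot be filtered at this level:

    #include <windows.h>

    // Hypothetical sketch: swallow ALT-TAB and the Windows keys so a
    // subject cannot leave the experimental task.
    static LRESULT CALLBACK KeyHook(int code, WPARAM wParam, LPARAM lParam)
    {
        if (code == HC_ACTION) {
            const KBDLLHOOKSTRUCT* k = (const KBDLLHOOKSTRUCT*)lParam;
            bool altTab = (k->vkCode == VK_TAB) && (k->flags & LLKHF_ALTDOWN);
            bool winKey = (k->vkCode == VK_LWIN) || (k->vkCode == VK_RWIN);
            if (altTab || winKey)
                return 1;                     // eat the keystroke
        }
        return CallNextHookEx(NULL, code, wParam, lParam);
    }

    // Installed for the lifetime of the experiment, e.g.:
    //   HHOOK h = SetWindowsHookEx(WH_KEYBOARD_LL, KeyHook,
    //                              GetModuleHandle(NULL), 0);
    //   ... run the task and pump messages ...
    //   UnhookWindowsHookEx(h);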
Figure 1. Basic configuration of the hooking mechanism. Messages are removed/saved as events if necessary, or inserted to simulate a certain message.

To control running applications in Windows, it is necessary to make use of the internal handle codes of the running processes and windows. A handle code is a unique code that represents a window (or process) within the operating system. From each window (including the dormant, invisible ones) the handle code is retrieved, offering the possibility to control each window. Using the handle codes in combination with the hooking procedure, it becomes possible to filter messages that are specifically meant for a certain window. Using this method it becomes possible to control the entire interface, preventing the subject from causing unwanted disturbances. The Taskbar, the Desktop and other open structures within Windows can be shielded, and the possibility to start and stop applications can be controlled from one application. The next step would be the control of the lower processes within the system, such as disk caching and the access of floppy drives and CD-ROM drives. For this, a solution has not yet been found. Disk caching is a fundamental concept within Windows that cannot easily be shut down. Minimizing disk caching is possible by preventing swapping between applications, but the system must then be provided with a large amount of memory (preferably at least 256 MB).
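Collecting those handle codes is a standard Win32 operation. A minimal sketch (illustrative only, not the WinTask code) that walks all top-level windows, including the invisible ones:

    #include <windows.h>
    #include <stdio.h>

    // Callback invoked once per top-level window; prints its handle code.
    static BOOL CALLBACK Collect(HWND hwnd, LPARAM)
    {
        char title[256] = "";
        GetWindowTextA(hwnd, title, sizeof(title));
        printf("handle %p  visible=%d  \"%s\"\n",
               (void*)hwnd, IsWindowVisible(hwnd), title);
        return TRUE;                         // continue enumeration
    }

    int main(void)
    {
        EnumWindows(Collect, 0);             // walk all top-level windows
        return 0;
    }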
Figure 2. Basic configuration of the Windows display system. DirectX uses the DirectDraw module to bypass the GDI and DDI interfaces of Windows to display images. Almost direct access to the hardware is possible through the hardware emulation layer (HEL).

Another solution is to make a distinction within an experiment between critical and non-critical sections. Within a non-critical section, all the applications to be used are loaded, so disk caching is of no concern. After this non-critical section, the real-time part of the experiment is executed, during which disk caching is limited as much as possible. After the critical section, a non-critical section can be used to shut down the applications. Within all these sections, the internal clock must retain its resolution of one millisecond.

A powerful feature of the Taskit toolkit is the possibility to present stimuli to a monitor with a standard delay equal to the vertical refresh period of the monitor. The possibility of building stimuli on several virtual pages in the background, to prevent delays caused by the construction of the stimuli, is a feature that should also be implemented in the Windows version. To implement this feature in Windows it is necessary to go beyond the standard graphical user interface of Windows.

Figure 2 shows the basic configuration of the Windows display system. Normally, everything that is presented on the screen is built up by the Display Device Interface (DDI) and the Graphics Device Interface (GDI). These interfaces function as a layer for
drawing the windows and other objects on the screen. A problem with using the DDI and GDI is that no precise time delay can be measured between the moment a drawing command is given and the moment that the action is actually executed on the monitor. For presenting stimuli with high-resolution timing it is essential that the delay between the drawing command and the actual presentation on the monitor is known and constant. To solve this problem, an alternative system is proposed for presenting stimuli with a high time resolution: DirectX. DirectX is a driver interface specially developed by Microsoft for highly demanding graphical programs such as Multimedia applications and games (Bargen & Donnely, 1998). A great advantage is that, when using this system, the GDI of Windows is bypassed, resulting in direct handling of all the drawing methods. It is almost possible to directly access the video card, providing a fast method for drawing stimuli. Some features used by the Taskit toolkit are also available, such as creating the stimuli on background pages and using page flipping for better performance. A restriction of using DirectX is that no other application that uses the GDI interface of Windows can be used at the same time. This implies that only one DirectX task can be active and that all the other GDI Windows applications become dormant background processes. To partially solve this problem, an alternative implementation was developed which uses the GDI to present the stimuli instead of DirectX. If presenting stimuli with high-resolution timing is not necessary and an experiment only needs a correct timing mechanism, the stimulus presentation functions are routed to a normal GDI window instead of DirectX. In this case other Windows applications can also be used during the experimental task. The option to switch very quickly between the accurate but restricted DirectX mode and the less accurate but flexible windowed mode offers an advanced stimulus presentation facility. Implementation of the auditory module has not yet been completed. For this purpose a part of DirectX, called DirectSound, can be used to enable auditory stimuli to be sent to a subject with high-resolution timing, in combination with visual stimuli with high-resolution timing.

Creating a Windows Task Environment: WinTask

By at least partially answering the four questions, a new toolkit can be developed for creating real-time experiments in Windows. For this purpose the approach used by the Taskit toolkit was adapted to the Windows environment. All functions provided by the Taskit toolkit should also be provided by the new toolkit, without any limitations. For each of the questions mentioned, a separate test module was developed to see whether the demands for a real-time experimental environment could be met. The final step is the combination of the four modules into one single module, the WinTask toolkit.
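The page-flipping approach referred to above follows a standard DirectDraw pattern of the period. The chapter contains no source code, so the fragment below is only a compressed, hypothetical sketch (error handling and the actual drawing omitted; surface and mode parameters are arbitrary):

    #include <windows.h>
    #include <ddraw.h>                 // DirectDraw; link with ddraw.lib

    // Sketch: take exclusive full-screen control, draw a stimulus on a
    // background page, then flip it to the screen at the vertical retrace.
    bool PresentStimulus(HWND hwnd)
    {
        LPDIRECTDRAW dd = NULL;
        if (FAILED(DirectDrawCreate(NULL, &dd, NULL))) return false;

        // Exclusive mode bypasses GDI coordination for this application.
        dd->SetCooperativeLevel(hwnd, DDSCL_EXCLUSIVE | DDSCL_FULLSCREEN);
        dd->SetDisplayMode(640, 480, 8);

        // One primary surface with one attached background page.
        DDSURFACEDESC ddsd = { sizeof(ddsd) };
        ddsd.dwFlags = DDSD_CAPS | DDSD_BACKBUFFERCOUNT;
        ddsd.ddsCaps.dwCaps = DDSCAPS_PRIMARYSURFACE | DDSCAPS_FLIP
                            | DDSCAPS_COMPLEX;
        ddsd.dwBackBufferCount = 1;

        LPDIRECTDRAWSURFACE primary = NULL, back = NULL;
        if (FAILED(dd->CreateSurface(&ddsd, &primary, NULL))) return false;
        DDSCAPS caps = { DDSCAPS_BACKBUFFER };
        primary->GetAttachedSurface(&caps, &back);

        // ... draw the stimulus on 'back' (Lock/Unlock or Blt) ...

        // The flip takes effect at the next vertical retrace, so the moment
        // the stimulus appears is known to within one refresh period.
        primary->Flip(NULL, DDFLIP_WAIT);

        back->Release();
        primary->Release();
        dd->Release();
        return true;
    }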
Figure 3. Basic structure layout of the WinTask DLL toolkit. The four modules have their own distinct functions. Interaction between the modules is shown by the arrows between the circles. The remaining arrows indicate the input and output possibilities of the WinTask toolkit.

Although the four modules are integrated into one single toolkit, each module has its own separate functionality, as can be seen in Figure 3. The four modules interact with each other but all perform their own basic functions. The central module is the clock module, which contains a one-millisecond real-time clock that uses a Multimedia timer. The second module, called the hardware module, is for controlling the hardware, registering events (through interrupts) and sending events (I/O ports). A small VxD driver (Windows device driver) is used to get access to the hardware.
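The clock module's code is not printed in the chapter; its core idea, a periodic multimedia timer with a 1 ms period, reduces to a few winmm calls, sketched here with illustrative names:

    #include <windows.h>
    #include <mmsystem.h>              // multimedia timers; link with winmm.lib

    static volatile LONG g_ms = 0;     // milliseconds since the timer started

    // Called by the multimedia timer once per millisecond.
    static void CALLBACK Tick(UINT, UINT, DWORD_PTR, DWORD_PTR, DWORD_PTR)
    {
        InterlockedIncrement(&g_ms);   // one tick = one millisecond
    }

    int main(void)
    {
        timeBeginPeriod(1);            // request 1 ms timer resolution
        MMRESULT id = timeSetEvent(1, 0, Tick, 0,
                                   TIME_PERIODIC | TIME_CALLBACK_FUNCTION);
        Sleep(1000);                   // run the 'experiment' for one second
        timeKillEvent(id);
        timeEndPeriod(1);
        return (int)g_ms;              // should be close to 1000
    }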
Figure 4. Main test configuration for testing the WinTask toolkit. The EDM-PC is used as an interface between the WinTask stimulus computer and the block pulse generator.

The Windows control module is used to obtain more control over the Windows environment. This module filters the message queue of Windows and contains a list of all available windows that can be controlled. It enables control of all the user processes within the operating system. The stimulus presentation module contains the functions for presenting visual and auditory stimuli to a subject. For this purpose, a distinction is made between presenting stimuli using DirectX and presenting stimuli using the standard Graphic Device Interface. From the user's point of view, the functions for both kinds of presentation are the same; however, timing accuracy and flexibility differ, as pointed out earlier. The WinTask toolkit will choose the right configuration, as indicated by the developers. The four modules are integrated into one DLL library that contains a list of functions that can be used by a task developer. Various programming development environments, ranging from Microsoft C++ and Borland C++ to Microsoft Visual Basic and Borland Delphi, may use the DLL. Experiment developers can decide for themselves which language or development environment fits best for integrating the WinTask toolkit.

Experimental Evaluation

Although the specifications of the separate modules indicate that a reliable system can be developed, extended research was conducted to see how the WinTask toolkit would operate under varying conditions. Several configurations were chosen to test the stability and reliability of the toolkit. First, a stand-alone configuration was built using an EDM-PC, as shown in Figure 4. An EDM-PC, used for years in the Taskit environment, is a small box that acts as an interface device, connecting the parallel port of a computer with other devices that send and receive digital signals. These signals can represent stimulus events, subjects' responses or other external events. The EDM-PC is connected to a pulse generator that generates block pulses with a frequency of one thousand Hz. The EDM-PC is then connected through the parallel port
to the WinTask computer. To start the measurement, the WinTask computer sends a start command through the parallel port to the EDM-PC. From that moment, one thousand pulses per second are sent through the EDM-PC to the parallel port of the WinTask computer. Every time a pulse arrives, an interrupt is generated and captured by the WinTask toolkit. Results of this test (Table 1, 1.a) show that the software clock of the WinTask toolkit performed stably over a long period of time (tested for eight hours). Occasionally, a deviation of one or at most two milliseconds was found, but this was directly readjusted during the next cycle. The next step (Table 1, 1.b) was to add deliberate disturbances during a test cycle, such as starting other applications. The deliberate disturbances did not affect the clock seriously, as can be seen in Table 1.
Table 1. Results of the different test configurations: the third column gives the resolution of the clock used, the fourth column gives the range of deviation found during the test cycles, and the fifth column indicates whether a deviation was corrected during the next time cycle.

Test  Type of test                                     Resolution  Range of deviation  Corrected
1.a   EDM-PC, using interrupts, no distortions         1 ms        0 ms                Yes
1.b   EDM-PC, using interrupts, with distortions       1 ms        1–5 ms              Yes
2.a   No EDM-PC, using interrupts, no distortions      1 ms        0 ms                Yes
2.b   No EDM-PC, using interrupts, with distortions    1 ms        1–5 ms              Yes
3.a   No EDM-PC, no interrupts, no distortions         1 ms        1–5 ms              No
3.b   No EDM-PC, no interrupts, with distortions       1 ms        1–10 ms             No
4     DirectX, using EDM-PC, no distortions            1 ms        1 ms (constant)     No (constant)
5     Media Player, using EDM-PC, no distortions       100 ms      1–3 ms              Yes
The second configuration (Table 1, 2.a & 2.b) was fairly similar to the first, but no EDM-PC device was used. In this case, pulses from the pulse generator were sent directly to the WinTask computer, resulting only in interrupts from the LPT port, without event values generated by the EDM-PC. Here too, stable timing without any distortions was recorded, under the same test settings as used for the first test configuration: results were not different from those obtained with the EDM-PC.
A third configuration (Table 1, 3.a & 3.b) used the extended possibilities of the new LPT port configuration. In this case, no interrupt was used; instead, a callback routine continuously polled the printer port to see if there was incoming data. This configuration was less accurate than the interrupt-based configuration, but works fine within a resolution of about ten milliseconds. An advantage of this configuration is that no EDM-PC is necessary to send event codes to the WinTask computer.

To see how the WinTask toolkit would perform when taking over several functions of Windows, a special module was developed for controlling the interface. This module acts as a digital recorder and registers all actions of the user during an experimental session. Also, some of the possibilities of the user can be limited, such as access to the Taskbar and the Desktop. After a session, all these actions can be replayed. This module makes use of grabbing the message queue in combination with controlling the active processes and windows (Stowers, 1998). During testing, this module performed without any problems. All actions of the user could be monitored and controlled, and after a session it was possible to replay the entire session on the same computer. Testing of this module proved that it is possible to shield a great part of the open structure of Windows without disturbing the operating system itself.

For testing the possibilities of DirectX in combination with the WinTask toolkit, another test configuration was constructed, as shown in Figure 5. This configuration consisted of two computers and two EDM-PC modules connected to each other. One computer presented the stimuli, using the WinTask toolkit in combination with DirectX. To the monitor of this computer, a light-sensitive diode was attached that would generate a pulse when a small part of the screen was illuminated white. Using this method, it is possible to measure the delay time between sending the command for drawing a specific screen pattern to the video card and the actual drawing on the screen.
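In simplified, single-machine form, and ignoring the EDM-PC and interrupt route actually used, this delay measurement amounts to timestamping the draw command and then waiting for the photodiode pulse. The following fragment is a hypothetical sketch only, combining that idea with the polling technique of test configuration 3 (the port address and wiring are assumptions; direct port reads work in user mode under Windows 95/98, while under Windows NT a helper driver such as the TvickHW32 package mentioned earlier is needed):

    #include <windows.h>
    #include <conio.h>     // _inp(): direct port I/O (x86, Windows 95/98)
    #include <stdio.h>

    int main(void)
    {
        const unsigned short LPT_STATUS = 0x379;  // typical LPT1 status port

        LARGE_INTEGER freq, tDraw, tSeen;
        QueryPerformanceFrequency(&freq);

        // ... issue the DirectX drawing command for the white spot here ...
        QueryPerformanceCounter(&tDraw);

        int idle = _inp(LPT_STATUS);              // diode not yet triggered
        while (_inp(LPT_STATUS) == idle)          // busy-wait for the pulse
            ;
        QueryPerformanceCounter(&tSeen);

        double ms = 1000.0 * (double)(tSeen.QuadPart - tDraw.QuadPart)
                           / (double)freq.QuadPart;
        printf("screen delay: %.2f ms\n", ms);
        return 0;
    }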
Figure 5. Test configuration for testing WinTask in combination with DirectX. The WinTask computer generates and registers the stimulus events and screen events. The Taskit computer only registers the stimulus events and screen events.

To see how the WinTask toolkit would perform when taking over several functions of Windows, a special module was developed for controlling the interface. This module acts as a digital recorder and registers all actions of the user during an experimental session. It can also limit some of the user's possibilities, such as access to the Taskbar and the Desktop. After a session, all these actions can be replayed. The module works by grabbing the message queue in combination with controlling the active processes and windows (Stowers, 1998). During testing, this module performed without any problems: all actions of the user could be monitored and controlled, and after a session it was possible to replay the entire session on the same computer. Testing of this module proved that a great part of the open structure of Windows can be shielded without disturbing the operating system itself.

For testing the possibilities of DirectX in combination with the WinTask toolkit, another test configuration was constructed, as shown in Figure 5. This configuration consisted of two computers and two EDM-PC modules connected to each other. One computer presented the stimuli, using the WinTask toolkit in combination with DirectX. A light-sensitive diode was attached to the monitor of this computer; it generated a pulse whenever a small part of the screen was illuminated white. Using this method, it is possible to measure the delay between sending the command for drawing a specific screen pattern to the video card and the actual drawing on the screen.
The second EDM-PC and the Taskit computer were used to compare the event registration cycle with that of the WinTask computer. The following procedure was used: a white spot was generated on the black surface of the WinTask monitor using DirectX. Directly before executing this task, a pulse was sent to the first EDM-PC via the parallel port. The white spot was detected by the light-sensitive diode, generating a second pulse that was sent to both EDM-PCs. The first EDM-PC passed this pulse on to the LPT port of the WinTask computer, resulting in an interrupt. The time between the first (generation) pulse and the second (white spot) pulse was recorded with the WinTask toolkit. During this sequence, both pulses were also sent to the second EDM-PC, connected to the Taskit computer, on which the time between the two pulses was measured as well.

If the time between the generation pulse and the white spot pulse had been variable, presentation of the stimulus would not have been performed correctly. If the time difference between the Taskit computer and the WinTask computer had varied, the test configuration itself would not have been working properly. If the time between the generation pulse and the diode pulse was the same on both computers, it could be concluded that the WinTask toolkit needs no additional time for registering events via the parallel port.

Results of this test were positive (Table 1, 4.). The delay between the first pulse, generated by the WinTask computer, and the second pulse, generated by the light-sensitive diode, was equal to the delay caused by the vertical refresh of the WinTask monitor plus a stable 1-millisecond delay. The time difference measured with the Taskit computer was equal to the time difference measured with the WinTask computer. The entire system was tested by sending a sequence of pulses over a longer period of time (more than eight hours); neither the nature of the pulse sequence nor the length of the period affected the delay. From these results it can be concluded that stimuli can be sent to the monitor using WinTask and DirectX with a known delay equal to the vertical refresh delay of the monitor plus an additional 1 millisecond, and that collecting events via the parallel port adds no extra delay.

To see how the WinTask toolkit would perform when presenting stimuli without using DirectX, a special task was developed for research with children: two small video fragments were presented simultaneously on one monitor using the Windows Media Player. The task was built in Borland Delphi in combination with the WinTask toolkit, with the toolkit serving as the timing reference for presenting the video fragments; the Media Player was used as an integrated object within the Delphi program. The task was tested in the same setting as the first three tests (see Figure 4), except that a pulse generator of 10 Hz was now used, yielding a time resolution of 100 milliseconds. An output file was generated to check whether the clock performed correctly within intervals of 100 milliseconds (Table 1, 5.); these intervals were recorded with a timestamp. Results show that the clock was stable during the entire experiment, even when two video fragments were shown simultaneously.
Occasionally, a short deviation of one, two, or three milliseconds was measured, but this was promptly corrected in the next interval.
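A quick arithmetic check of the reported delay: the time from draw command to photodiode pulse is bounded by one vertical refresh period plus the constant 1 millisecond measured above. The refresh rates below are illustrative; the refresh rate of the test monitor is not specified in the text.

    # Expected worst-case display delay: one refresh period plus 1 ms.
    for refresh_hz in (60, 75, 100):
        frame_ms = 1000.0 / refresh_hz        # one vertical refresh period
        print(f"{refresh_hz} Hz refresh: worst-case delay ~{frame_ms + 1.0:.1f} ms")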
Although the experimental tests described above gave positive results, more such tests are necessary to prove the stability and reliability of the WinTask toolkit. Moreover, at this moment the WinTask toolkit still consists of several distinct modules that have been tested separately. The modules must next be integrated, and the integrated whole must then be tested to see how the modules work together.

Future Developments

Although the basic functionality of the WinTask toolkit is available, further research and development is necessary. The first goal is to create a module that is fully backward compatible with the existing MS-DOS Taskit toolkit, making it easy to translate many existing MS-DOS tasks into the WinTask toolkit. This can be achieved by using the WinTask toolkit in combination with the Borland Delphi development environment, which is Pascal based. The next step, after completion of all these functions, is to extend the WinTask toolkit with new functions that exploit the additional functionality of Microsoft Windows, such as closing the operating system to the user, controlling the message queue, and multimedia and multi-user network functions. The WinTask toolkit will then be integrated within the structure of the Workplace for Analysis of Task Performance (Bos, 1999), to provide a mechanism for using Windows within complex dynamic task environments.

Finally, a visual development interface will be created for the WinTask toolkit. This will give the user a visual guiding interface that shields the user from programming and timing problems. It provides a timeline on which task elements can be placed according to a predefined task scenario or script. A frame-based methodology will be used, in which the user builds the experiment by defining and filling in frames; the interface then translates this frame information into program code that can be executed. Because of the DLL structure of the WinTask library, other development environments can also be used: users can develop an experiment directly in one of the available Windows development toolkits, such as Visual Basic, Delphi, C++ Builder, or Visual C++. In that case, the user combines the functions of the WinTask toolkit with the functions provided by the development toolkit and compiles this into an application.

Discussion

The current findings imply that it is possible to create psychological experiments within Microsoft Windows using a clock with a resolution of one millisecond. Access to low-level devices, such as memory, I/O ports, and interrupts, is also unproblematic, even in combination with high-resolution timing. By combining this with DirectX, stimulus/response experiments with high-resolution timing can be created in an unchanged Windows environment.
These findings imply that WinTask will have the same functionality as the existing Taskit environment, without additional timing problems or inaccuracies. Moreover, the standard Windows systems (Windows 95, Windows 98, Windows NT, and Windows 2000) can be used for (a) task development, (b) task start-up and control, and (c) data analysis.

Of course, the present approach still has restrictions; some can probably be overcome in the near future, while others are not important enough to solve. One of the main restrictions is that a clear distinction has to be made between applications that require real-time behavior and applications that tolerate much lower timing accuracy. Switching between these modes will be possible within a time range of about 100 milliseconds or less. In particular, it will not be possible to use normal Windows applications in combination with the real-time measuring techniques, because of the delay caused by the Windows graphical user interface (GUI) and the distortions caused by applications during execution (disk swapping). Further research is necessary to determine the exact delay times caused by the GDI and by disk swapping. Controlling hardware elements at a low level is also still limited: at this moment, only one interrupt can be used for retrieving external events. A device driver will be developed next that allows retrieval of multiple interrupts.

Within the next year, the problems described above can be solved, ultimately creating a toolkit in which real-time psychological experiments can be generated using all the advantages of the Microsoft Windows environment while removing the current restrictions. The final step will be the integration of the WinTask toolkit with the Workplace for Analysis of Task Performance environment (Bos, 1999). Using the extended possibilities of the WinTask toolkit, it will be possible to use the Microsoft Windows environment within the workplace to conduct real-time experiments.

References

Bargen, B., & Donnelly, P. (1998). Inside DirectX: In-depth techniques for developing high-performance multimedia applications. Microsoft Programming Series.

Bos, J., Mulder, L.J.M., & Ouwerkerk, R.J.van (1998). Digitale werkplaats (internal report). Institute of Experimental and Work Psychology, University of Groningen.

Bos, J., Mulder, L.J.M., & Ouwerkerk, R.J.van. Workplace for Analysis of Task Performance. This volume.

Gerritsma, F., & Meulen, P.van der (1991). The TASKIT development environment. In L.J.M.Mulder, F.J.Maarse, W.P.B.Sjouw, & A.E.Akkerman (Eds.), Computers in Psychology: Applications in Education, Research, and Psychodiagnostics (pp. 81–88). Amsterdam/Lisse: Swets & Zeitlinger.

Ishekeev, V. (1998). TvickHW32, a toolkit for accessing hardware in Windows 95/98 and Windows NT. User manual and function library.

Maarse, F., Ghisaidoobé, H., & Bouwhuizen, C. (1998). Real-time aspecten van Windows 95/NT voor psychologische tests (internal report). Institute for Cognition and Information, Nijmegen.

Ryledesign Corp (1998). ExacTicks, a toolkit for high resolution timing in Windows 95/98 and Windows NT. User manual and function library.

Schneider, W. (1988). Microcomputer Experimental Laboratory (MEL). Behavior Research Methods, Instruments, and Computers.
Simon, R.J. (1996). Windows NT Win32 API SuperBible. Corte Madera, CA: Waite Group Press.

Solomon, D.A. (1998). Inside Windows NT (2nd ed.). Microsoft Programming Series. Redmond, WA: Microsoft Press.

Stowers, B. (1998). Winhook, unit for grabbing the Windows message queue. Online function library. Available: http://www.pobox.com/~bstowers/delphi/.

Ticher, M., & Jennrich, B. (1996). PC Intern 5.0 Systeemprogrammeren. Brussels: Easy Computing.
Chapter 15
Mouse or Touch Screen
J.van de Ven¹ and A.de Haan²
¹ Dutch IT Group/E&K Multimedia, Van Kinsbergenstraat 5, 8081 CL Elburg
² NICI/University of Nijmegen, Postbus 9104, 6500 HE Nijmegen
Abstract

In research on the costs and benefits of different input devices for graphical user interfaces (GUIs), we looked at task performance measures in both a simple pointing task and a more mentally loaded pointing task. In the pointing task, as expected, touch input is faster than mouse input. With touch input we found an interaction between errors and target size, whereas target size has no influence on errors made when working with the mouse. The differences between the devices disappear, however, during performance of a mental task: updating mental counters while pointing. Analysis shows that device properties are then no longer a significant determining factor in task performance.
Introduction

Different Input Devices

Human beings interact with computers through interfaces, using a keyboard and other input devices. Many users will use a mouse as an input device, as this is the device most likely to be attached to the computer. However, when interacting with a computer in public places, a touch screen is very popular: it does not require an additional flat surface, as the mouse does, and it is less vulnerable to crime.

Motor Skills

The two devices are very different with respect to the cognitive and perceptual-motor aspects of navigation. Touch screens are considered easy-to-use input devices because of their directness (Mayhew, 1992; Sears & Shneiderman, 1991). Pointing at a (touch) screen is achieved simply by moving a finger or a pen through three-dimensional space to the corresponding target on the screen. Like everyday pointing, this kind of aiming consists of two phases: a distance-covering phase and a homing-in phase (Woodworth, 1899). The first phase covers most of the distance between the starting position of the finger and the target, and is typically fast and relatively imprecise. This distance-covering phase can be treated as a ballistic
movement, in which no corrections can be made during execution. This is because ballistic movement is primarily based on central proprioceptive feed-forward coordination of the movement with respect to the target. Instead of corrections being made during the movement, the path is estimated so that the distance-covering phase ends in a position near the target. The final movement towards the target, the homing-in phase, is performed much more slowly, so that corrections can be made and the target is eventually reached.

Contrary to the touch screen, the mouse is indirect and takes longer to learn. Users have to learn to coordinate hand movement with the position of the cursor on the screen. Besides this indirectness, there is also the problem of how hand movements relate to movements of the cursor. While the mouse moves in a horizontal plane on the desk, the cursor moves in a vertical plane on the screen. Proprioceptive feedback from the limb provides information about the position of the limb, and thus of the mouse on the table; it does not, however, provide information about the cursor on the screen. That information is provided by perceptual feedback. Thus, to direct the cursor to the desired place, one has to depend more on perceptual than on proprioceptive feedback. One consequence of this more or less indirect relation between proprioceptive and perceptual information is that mouse movements are unlikely to be ballistic in nature, because the relevant proprioceptive feed-forward information is lacking.

Previous Experiments

Many experiments have been done to compare mouse and touch screen. Touch screens are supposed to be fast and easy to use, but not very useful in situations where one has to position a cursor in a text or perform dragging operations (Mayhew, 1992). Touch screens have been found to be less accurate than a mouse (Albert, 1982), probably owing to their ballistic phase, which makes them less useful in task environments where correcting an error is not easy. As a result, these are the environments where a mouse is the more popular input device. However, a more recent study by Sears and Shneiderman (1991) shows that touch screens can, in most situations, be as accurate as the mouse; only for small targets (0.4×0.6 mm and 1.7×2.2 mm) is the touch screen less accurate.

Currently, touch screens are increasingly used in public information systems and in small electronic devices. Typically, these small devices combine a touch screen with a pen-like pointing device, because a finger is too big for the small screens. Public information systems, like city-location systems¹, on the other hand, are comparable to mouse-operated systems. As they are placed in public places, they should be easily accessible and operable for everyone. Touch screens are easier to operate than a mouse, especially for novice users, as soon as users know it is a touch screen. Besides being easy to operate, touch screens need no extra space (for mouse and keyboard). This makes them good alternatives to mouse-like input devices.
¹ Systems where city maps are provided.
Despite these positive characteristics, the trade-off between speed and accuracy for touch screens is notorious and creates a negative image. Many experiments have shown
that users make more errors using a touch screen when they speed up. This trade-off could prevent touch screens from becoming widely used input devices, because correcting errors takes time and frustrates users. The new technologies now used to make touch screens could influence accuracy in a positive way. The qualitative perceptual and motor differences between the two devices give them distinctive performance profiles that are optimal for different situations. Below, we report on an experiment designed to clarify these distinctive performance profiles.

Experiment

Target Size

From past experiments, it is clear that a touch screen is a very different device from a mouse, and that a touch screen's major drawback is its inaccuracy. This is easy to understand when targets are very small: the finger, when used as a stylus, is not as small as the pointer on the screen or a crosshair cursor. However, Albert (1982) reported the same trade-off for larger targets, which are expected to be easier to point at. Albert used a 1¼-inch (3.1 cm) square target and still found low accuracy for touch screens. This might be because accuracy was defined as the number of pixels from the center of the selected target, not as the number of errors in hitting the target. Six years later, in guidelines for touch screens, Brown (1988) advised targets of ¾ inch (1.9 cm) square, at least 1/8 inch (0.3 cm) apart. Another three years later, in their 1991 study, Sears and Shneiderman showed that changing the selection strategy to 'take-off' instead of 'land-on'² has a positive effect on accuracy. Only for small targets (0.4×0.6 mm and 1.7×2.2 mm) is the touch screen less accurate than the mouse; for bigger targets (6.9×9.0 mm and 13.8×17.9 mm), no difference is found between mouse and touch screen.

In the current experiment, we look at the influence of target size on the use of both devices. If accuracy does increase with larger touch objects, this could make a difference for touch screen use. Our first task is a simple pointing task, modeled to simulate a Fitts' law-like continuous pointing task. A target appears on the screen and subjects have to hit it as quickly as possible. Three different target sizes were used (3.1 cm large, 1.9 cm medium, and 1.27 cm small, all square) to investigate the influence of target size. To position the targets on the screen, the Fitts' law 'Index of Difficulty' formula (ID, where ID = log2((distance + width)/width)) was used (MacKenzie, 1989; Murata, 1996). In this formula, 'distance' is the distance from the starting point to the target, and 'width' is the width of the target.
² Take-off strategy means that the object is selected when the finger is taken off the touch screen. Land-on is what most touch screen users will recognize: an object is selected by landing on that object.
Five IDs (0.5, 1.0, 1.5, 2.0, 2.5) were used to compute the related distance for every target size. For example, at ID=1.5 the distance for small targets is 2.23 cm, while the distance for large targets is 5.43 cm (a computational sketch of these distances is given at the end of this section).

Task Difficulty

The task just described is a very simple one, much like the tasks used in many past experiments: a target appears and, after being hit, it disappears. This is a good abstract task that enables us to say something about pointing in general. Nowadays, however, computer tasks are more complex, and pointing is embedded in them. If a user is typing letters or analyzing data, pointing is part of the task, but not the main part. One question we asked ourselves is whether pointing behavior is different in those situations, and no longer dependent on the pointing device used. Our second goal is therefore to add a more complex task that enables us to find out how different these devices really are.

An abstract mental task was used to check the influence of task difficulty on pointing behavior. A character appears on the surface of the target, and subjects have to mentally count the occurrences of these characters. This mental task is based on the experiment of van Dellen, Aasman, Mulder, and Mulder (1985), where the QRST task was used to manipulate workload. In the QRST task, subjects start by memorizing a set of characters, the memory set. This set contains a variable number of characters (1–5), depending on the condition. Then a series of stimulus frames is shown, each consisting of one character or digit. The subjects carry out three different tasks. In the simple task, subjects push a 'YES' button if the character is a member of the memory set, and a 'NO' button if it is not. In the counting task, subjects only count the occurrences of the memory-set characters during stimulus presentation. In the dual task, both the simple task and the counting task have to be performed.

This experiment (van Dellen et al., 1985) shows that memory load has a slowing effect on reaction times. In the dual-task condition, reaction times, especially for Yes responses, are much slower than in the simple-task condition. According to Aasman, Mulder, and Mulder (1987), this shows that the time-sharing aspects of these tasks (memorizing counters, and especially updating counters when responding Yes) interfere heavily with the process of memory search. Reaction-time errors and counting errors also continue to increase as a function of memory load. Heart rate variability, as a measure of mental workload, decreases strongly with increasing memory load, showing that time-sharing requires increased mental effort. A memory set of 3 counters is shown to be still tolerable for most subjects, but 4 and 5 counters are (too) difficult for most subjects, who then have problems coping with the task.

In our experiment, we manipulate mental load by presenting subjects with a memory set of 3 characters: 'A', 'B' and 'C'. They are instructed to count the occurrences of these characters while still performing the pointing task. The information is presented on the target and contains only A's, B's and C's. After the task, subjects are asked to state the totals for each character. So that the visual input remains equal, a character, 'A', is displayed on the targets during the simple pointing task, but no meaning is attached to it.
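As a check on the distances quoted above, the ID formula can be inverted to recover the distance for every size and ID, as the sketch below shows. The computed values come out close to, but not exactly on, the quoted 2.23 and 5.43 cm, which suggests the published values were rounded or computed with a slightly different convention.

    # Distances implied by ID = log2((distance + width) / width).
    widths = {"small": 1.27, "medium": 1.9, "large": 3.1}   # target sides, cm
    ids = (0.5, 1.0, 1.5, 2.0, 2.5)

    for name, w in widths.items():
        # Inverting the formula: distance = width * (2**ID - 1)
        distances = [w * (2 ** i - 1) for i in ids]
        print(name, [f"{d:.2f}" for d in distances])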
Method

Subjects

Sixteen students (9 female, 7 male) between 20 and 30 years of age, from the University of Nijmegen, participated in this experiment. All were experienced computer and mouse users (working with a computer and mouse almost every day). Some subjects had no experience with touch screens. This was not considered a problem, since the touch screen is an easy-to-use device and subjects were offered a short familiarization exercise with it. None of the subjects were paid to participate.

Apparatus

All tasks were performed on an 80386 PC with a 20-inch 'IntelliTouch' touch screen using the land-on strategy; land-on means that the target is selected as soon as the finger touches the object on the screen. The screen was placed in front of the subjects in the normal vertical position. If a session required a mouse, the mouse was placed in front of the subjects, with the mouse speed adjusted to the slowest rate.

Procedure

During a session of 45 minutes, subjects performed both tasks described above, the simple pointing task and the mental task. Practice sessions showed that the simple pointing task was considered fun, but that the mental task was difficult and time-consuming, so that subjects could easily lose enthusiasm and concentration. To keep subjects enthusiastic and focused on all tasks, it was decided to test the influence of target size in the simple pointing task only, and to place all simple pointing tasks first and the mental task last.

Before the session started, a heart belt was put on to record heartbeat, and a 3-minute rest measurement was taken; heartbeat was registered as a mental workload measure. After the rest measurement, the first instruction was given to the subject and a practice block was carried out. The first tasks were the simple pointing tasks, which show the difference between the devices and test the influence of target size (see Table 1). Two independent variables, device and size, were balanced over six blocks. In one block, 100 targets of the same size appeared on the screen, and every target had to be hit with the same device; device and target size were thus constant within a block. Devices alternated between blocks and target sizes were counterbalanced: if subjects started with large targets and the mouse, they ended with large targets and the touch screen. Starting device and target size were randomized over subjects.
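The block structure can be illustrated with a small sketch that generates one subject's block order: devices alternate, and the size order is mirrored so that a subject who starts with large targets and the mouse ends with large targets and the touch screen. The exact counterbalancing scheme is an assumption; only the constraints stated above are taken from the text.

    def block_order(first_device="mouse", sizes=("L", "M", "S")):
        """Six pointing blocks: alternating devices, mirrored size order."""
        other = "touch" if first_device == "mouse" else "mouse"
        devices = [first_device, other] * 3
        # First three blocks run the sizes in order, last three mirrored.
        size_seq = list(sizes) + list(reversed(sizes))
        return list(zip(devices, size_seq))

    print(block_order())
    # [('mouse', 'L'), ('touch', 'M'), ('mouse', 'S'),
    #  ('touch', 'S'), ('mouse', 'M'), ('touch', 'L')]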
Table 1. Task overview; S = small, M = medium, L = large target size. The six pointing tasks are counterbalanced within subjects, with 100 targets presented per block; the two mental-task blocks are always carried out last, with 50 targets each.

              Pointing task                     Mental task
              Mouse           Touch screen     Mouse   Touch screen
Target size   S    M    L     S    M    L      M       M
Targets       100  100  100   100  100  100    50      50
After completing all six blocks (the first six in Table 1), subjects were given the instruction for the mental task, together with a practice set. Subjects were then asked whether they were comfortable with the task or needed any help performing it; if they needed help, they were offered another practice set. In the mental task, subjects had to hit the targets as in the simple pointing task, but also had to mentally count the occurrences of the three characters that could appear on the target. Characters (A, B or C) were randomly generated. In a block, 50 targets appeared before subjects had to give the totals. The mental task was carried out twice (see Table 1), once using the mouse and once using the touch screen. The device subjects started with was the same device they had first used in the simple pointing task, so half of the subjects started the mental task using the touch screen and the other half using the mouse. The medium target size was used during this task; other target sizes were not used, for the reasons explained earlier. After these last two tasks, subjects were asked to complete a questionnaire and to discuss their experience with all tasks.

Results

Task completion time, number of errors, and heart rate were measured to compare touch screen and mouse. The heart rate data are not discussed in this article (for more information see van de Ven and de Haan, 1999). Results using Fitts' law will also be presented, mainly to show some of the effects of task difficulty.

Task Completion Times

Task completion time is the time between the appearance of the target on the screen and its disappearance. The target did not disappear until it had been hit; the repair time for missing a target is therefore included in the task completion time.
Figure 1. Task completion times for the pointing task, using mouse and touch screen.

The results show large differences between the devices. As expected, subjects work faster with a touch screen, F(1,15)=70.8, p<.001. However, target size has a different effect on task completion time for each device, F(1,15)=11.3, p<.05. Small target sizes slow down task completion for touch screens. In the mouse condition, the large and medium targets show, especially at larger distances, an unexpected slowing effect on task completion time compared to the small target, as can be seen in Figure 1.

Another interaction effect found, F(1,15)=266.0, p<.001, is that of device and distance. As expected, larger distances take more time to reach. This holds for both devices, but the effect is larger in the mouse condition. Larger distances are, in this experiment, related to larger IDs; the increase in task completion time related to distance can therefore be seen in Figure 2.

For the mental task, we found no difference in task completion time between the two devices, F(1,15)=0.1, p>.50. Nor was an influence of index of difficulty found between the two devices, F(4,12)=1.8, p>.10. The pointing task is completed much faster than the mental task, as can be seen in Figure 2. This was expected because, on the basis of past experiments, more difficult tasks are expected to have longer task completion times.
Figure 2. Task completion times for mouse and touch screen in the pointing task (small, medium, large) and in the mental task.

Errors

The task completion times show the expected effects: faster completion times for the touch screen. As for the errors, the expectation would be that the touch screen is also less accurate. A pointing error is made when a subject touches the screen (when using the touch screen) or clicks the mouse (when using the mouse) but does not hit the target. Indeed, error analysis shows that the touch screen is less accurate in the pointing task, F(1,15)=19.8, p<.001: the overall percentage of errors made with the touch screen is larger than for the mouse. The analysis also shows a significant effect of target size, F(2,14)=21.6, p<.001. Contrast analysis shows that this effect is linear: pointing errors decrease as target size increases. There is also an interaction effect between device and target size, F(2,14)=18.0, p<.001; the effect of target size is larger for touch screens. Figure 3 shows that the difference in the percentage of pointing errors between target sizes is much larger for the touch screen than for the mouse. Figure 3 also shows that the percentage of pointing errors for large targets is lower for the touch screen than for the mouse.
Figure 3. Percentage of errors for mouse and touch screen in the pointing task.

Figure 4 shows the pointing errors for the pointing task compared to the mental task. Statistical analysis shows no overall difference in the percentage of pointing errors between the tasks. However, there is a statistical difference between the devices, F(1,15)=6.8, p<.05: fewer errors are made with the mouse, regardless of the task. The interaction effect between task and device is not significant, indicating that, given the task and the device, the percentage of pointing errors is equal. During both tasks pointing errors could be made, but during the counting task counting errors could also be made: subjects had to count the occurrences and could miscount them. Results show no significant differences between the devices; about 3.5% counting errors were made using the mouse, and 5.2% using the touch screen.

Fitts' Law

The regression between the Fitts' law ID and task completion time says something about how well the ID predicts task completion times. If the regression is high and the relation is linear, the data are homogeneous and the ID can be used as an estimate of task completion time. If, on the other hand, the regression is low, the data points are scattered and not useful for predicting task completion time. We found that the data for the pointing task were nearly linear, but the data for the mental task were not.
Figure 4. Percentage of pointing errors for mouse and touch screen, in the pointing and the counting tasks.

In the case of the pointing task, the data for mouse and touch screen had to be treated separately to get better results, indicating that mouse and touch screen differ. When all data were analyzed together, r²=.5; but when the devices were analyzed separately, removing between-device variance, r²=.9 for the mouse and r²=.8 for the touch screen. Figure 5 shows a scatter plot of all data together; here the difference between mouse and touch screen is very clear for the larger IDs. For the mouse, separating the data according to target size gave similar results: r²=.9 for all targets. For the touch screen, however, larger targets clearly give more homogeneous results: small targets have r²=.7, while medium and large targets have r²=.8.

As for the mental task, both mouse and touch screen give low r² values: r²=.07 for the mouse and r²=.005 for the touch screen. Both data sets are scattered, and movement time is clearly not dependent on the distance to, and size of, the target, or on the device used. Figure 6 shows a scatter plot for the touch screen with medium targets during the mental task; the scatter plot for the mouse during this task shows a similar picture. (A sketch of the regression computation follows the figures below.)
Figure 5. Scatter plot of data points from mouse and touch screen in the pointing task, for all target sizes.
Figure 6. Scatter plot of data points for the touch screen in the mental task.
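The r² values reported above come from fitting a straight line of movement time against ID. The following is a minimal sketch of that computation; the data points are made-up placeholders, not the experimental values.

    def r_squared(xs, ys):
        """r^2 of the simple linear fit of ys on xs (squared Pearson r)."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx = sum((x - mx) ** 2 for x in xs)
        syy = sum((y - my) ** 2 for y in ys)
        return sxy * sxy / (sxx * syy)

    ids   = [0.5, 1.0, 1.5, 2.0, 2.5]
    times = [420, 510, 600, 680, 790]   # ms, illustrative placeholders
    print(f"r^2 = {r_squared(ids, times):.2f}")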
Discussion

The main implication of the reported research is that the expected qualitative differences between the mouse and the touch screen exist, but that they may be obscured in certain task contexts. When the pointing task is accompanied by a more difficult task, no differences are found between the two devices: not in operation speed, nor in accuracy, nor in overall performance.

We observed that, in a simple pointing task, a touch screen is always faster than a mouse. However, the touch screen is also less accurate, except in the situation with large targets, where a touch screen is both fast and accurate. While target size is important when operating the touch screen, it has no influence when working with a mouse. A mouse is therefore a good input device when working with the little icons in, for example, text editors or spreadsheet programs. The touch screen is of little use in these situations; it would probably cause a lot of frustration in correcting the errors made, because when icons are so small, putting a finger on one icon is difficult, increasing the chance of error.

The correction of errors was relatively simple during this experiment and therefore had little impact on task completion times: nothing happened when an error was made, and subjects could retry hitting the target without first having to correct the situation. In everyday computer programs, missing one target usually means hitting another, so, before a user can continue, the previous situation has to be recovered. In real life, therefore, the impact on both task completion time and user experience will decrease the usability of the touch screen in situations where targets are small. Drawing programs, where accuracy is important, are also better operated with a mouse, because working at pixel level is easier with a cursor (or crosshair cursor) than with a touch screen. Generalizing, it can even be stated that most applications in office-like environments are built for mouse-like interfaces and are thus better operated with a mouse.

In contrast to the results that Albert (1982) found in his study, we conclude that touch screens can be accurate devices. Besides public places, where touch screens are convenient for their ease of use, there are more situations where touch screens could be useful: situations where users have to respond quickly to one or more options, e.g. control rooms and emergency call centers. All places where people use a computer to control processes, but do not use it continuously in their job, can be situations in which touch screens are suitable devices. With a touch screen, no time is lost in locating the mouse on the desk and then the cursor on the screen; by pointing directly at the wanted object, precious seconds can be saved. Young children and elderly persons can also benefit from the ease of use of pointing when small motor movements are difficult. Changing the target size in an educational game could even help young children develop their motor system, the touch screen becoming a sort of exercise tool at an age when the mouse might still be too difficult.

The actual design constraints depend on both situational and cognitive factors, such as the task environment and the limitations of working memory. For example, it is much easier to keep a good overview of a screen if all possible choices fit in one screen layout.
However, when targets are relatively large, as we recommend for touch screens, this can become cumbersome in situations where many choices can be made. To show all touch objects, either a bigger screen has to be used or the interface will have to
include more hierarchies. Both solutions generate a higher load on memory and on attention capacities. Larger touch objects thus involve a trade-off when the number of touch objects is high. One thing that could prevent this is using a stylus or pen-like device for pointing. Using such a device to point at small objects could remove some of these drawbacks, but might introduce others, like finding your stylus on a desk full of papers.

A problem with our conclusion regarding speed could be the fact that the mouse was adjusted to the slowest possible speed. The results for the mouse might have been different with other speed settings. This might change not only the task completion time but also the number of errors made, and it could of course influence the Fitts' law results in the pointing situation. On the basis of earlier research in this field (MacKenzie, 1989), we would expect the fit to stay at this level or to improve. We do not expect Fitts' law to fit better in the case of the mental task: analysis shows that movement difficulty, as expressed by Fitts' index of difficulty, is no longer of dominant influence on completing the task, as it was in the simple pointing task.

Finally, we would like to affirm that mouse and touch screen are devices that operate on qualitatively different principles and are best suited to different situations. Pointing tasks can be carried out much faster with a touch screen than with a mouse. If users have to work fast and (almost) error-free, the targets must be large. For pointing tasks that include small targets, a mouse is the more useful device, especially when the distance between targets is small.

References

Aasman, J., Mulder, G., & Mulder, L.J.M. (1987). Operator effort and the measurement of heart rate variability. Human Factors, 29(2), 161–170.

Albert, A.E. (1982). The effect of graphic input devices on performance in a cursor positioning task. Proceedings of the Human Factors Society 26th Annual Meeting, 54–58.

Brown, C.M. (1988). Human-Computer Interface Design Guidelines. Norwood, NJ: Ablex.

Dellen, H.J.van, Aasman, J., Mulder, L.J.M., & Mulder, G. (1985). Time domain versus frequency domain measures of heart-rate variability. In J.F.Orlebeke, G.Mulder, & L.J.P.van Doornen (Eds.), The psychophysiology of cardiovascular control (pp. 353–374). New York: Plenum.

MacKenzie, I.S. (1989). A note on the information-theoretic basis for Fitts' law. Journal of Motor Behavior, 21(3), 323–330.

Mayhew, D.J. (1992). Principles and Guidelines in Software User Interface Design. Englewood Cliffs, NJ: Prentice Hall.

Murata, A. (1996). Empirical evaluation of performance models of pointing accuracy and speed with a PC mouse. International Journal of Human-Computer Interaction, 8(4), 457–469.

Sears, A., & Shneiderman, B. (1991). High precision touchscreens: Design strategies and comparisons with a mouse. International Journal of Man-Machine Studies, 34, 593–613.

Ven, J.van de, & de Haan, A. (1999). Differences and similarities in the usability of mouse and touch screen [CD-ROM]. Proceedings of the CAES'99 conference, Barcelona.

Woodworth, R.S. (1899). The accuracy of voluntary movement. Psychological Review, 3(2, Whole No. 13).
Chapter 16
Automan: A Psychologically Based Model of a Human Driver
L.Quispel, S.Warris, M.Heemskerk, L.J.M.Mulder, and P.C.van Wolffelaar
Experimental and Work Psychology, Department of Psychology, University of Groningen, Grote Kruisstraat 2/1, 9751 MN Groningen

Abstract

This paper describes the design of an autonomous agent for controlling vehicles in a traffic simulator. The agent is based on recent developments in artificial intelligence, autonomous robotics, and cognitive psychology; its goal is to simulate realistic driving behavior. The agent is composed of four control systems: the Perception system controls visual attention and gaze direction; the Behavior system controls high-level driving behavior; the Action system controls the actions required for low-level control of the car; and the Emotion system implements the influence emotions have on human driving behavior. Furthermore, the agent contains three types of memory: a declarative memory holds the agent's knowledge about the world, a procedural memory holds all rules and procedures required for driving, and a working memory stores representations of the current situation. These systems and memories are realized using a behavior-based approach, in which the overall behavior of the agent results from the interaction between small and simple behavioral patterns. Fuzzy logic is used to ensure a natural flow of information and to make humanlike reasoning possible.
Introduction

Research on human task behavior has traditionally been performed either in laboratory settings or in real-world environments. While a laboratory setting provides explicit control over experimental conditions, it often lacks sufficient realism. This realism is present in real-world situations, but these are prone to uncontrollability and unexpected disturbances. Therefore, simulations are required that provide more realism and ecological validity while still offering excellent control (Brookhuis, Bos, Mulder, & Veltman, 2000; DiFonzo, Hantula, & Bordia, 1998; Sauer, Wastell, & Hockey, 2000; Brehmer & Dorner, 1993). For this purpose, various simulators have been developed at the University of Groningen (Bos, Mulder, & Ouwerkerk, 1999; Wolffelaar, 1996; Wolffelaar & Van Winsum, 1993), the first one being a driving simulator. When building such a simulator, it is not only important to have a good model of the physical world: to provide a realistic environment for studying human behavior, the behavior of virtual traffic participants also needs to be simulated. The current
simulator uses simple agents for simulating other traffic (Wolffelaar, 1996; Van Winsum, 1996). These agents are controlled by a small set of rigid rules, which enable them to function in the simulated traffic environment. However, because of these rigid rules, their behavior is very straightforward and predictable: people driving in the simulator have the impression that the other cars are simple robots. Different kinds of traffic behavior and emotional influences, for example aggressive driving styles or elderly people's driving styles, are very hard to model. New traffic situations also require the agents to be partly rewritten. For realistic agents in simulators, simple rules are not enough. Tambe et al. (1995) show that the requirements for intelligent and autonomous agents in interactive simulations should be addressed by using a sophisticated cognitive model. They used the SOAR cognitive architecture (Laird, Newell, & Rosenbloom, 1987) to successfully create realistic automated pilots for use in simulated combat exercises (Hill, Chen, Gratch, Rosenbloom, & Tambe, 1997; Jones et al., 1999).

The present paper describes a new cognitive model that can be used for simulating traffic participants. The goal of this model is to generate more realistic driving behavior, to incorporate emotional aspects into this behavior, and to make the agents more interactive. In the next section, the approach followed in designing this model is explained, and the theoretical and technical background needed for this approach is discussed. Thereafter, the model and its subsystems are described. Finally, the paper closes with a look into the future.

Simulating Behavior

Human Driving Behavior

Over the years, much research has been done on the question of how humans drive. McKnight and Adams (1970) performed a complete task analysis of a driving task. However, this analysis was aimed at behavioral requirements (specifying how a driver should drive) rather than at how a driver actually drives. Research indicates that it is useful to distinguish three task levels: a strategic, a tactical, and a control level (Knippenberg, Rothengather, & Michon, 1989; Michon, Smiley, & Aasman, 1990; Rothengather & Vaya, 1997; Cnossen, 2000). At the strategic level, plans are made about which route to take, what kind of transportation to use, and so on; decisions at this level are influenced by personal opinions, attitudes, and circumstances. The tactical level describes how specific situations are handled, for example how to cope with an intersection. At the control level, all low-level control tasks are handled, such as steering and gear shifting. Normally, tasks at this level are handled automatically; a driver is mostly not conscious of how he shifts gear.

The three levels influence each other. Take, for instance, the situation depicted in Figure 1. Driver A is confronted with a T-shaped intersection. He has to decide whether to take the left or the right road; this decision is made at the strategic level. Then he has to make a turn while watching other traffic. This is a tactical task: how it is performed depends on the traffic situation and, of course, on the direction of the turn. If traffic permits, he has to perform the actual turn, which consists of braking, shifting gear, steering, and accelerating. These tasks are control-level tasks, and are initiated from the tactical level; a schematic sketch of this cascade follows below.
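The cascade just described can be sketched schematically: a strategic choice triggers a tactical maneuver, which issues control-level actions. This is purely illustrative and is not the Automan implementation.

    # Schematic sketch of the three task levels in the T-intersection example.
    def strategic(destination):
        # Strategic level: which road to take, given the destination.
        return "right" if destination == "Utrecht" else "left"

    def tactical(direction, car_from_left_near):
        # Tactical level: how to handle the intersection right now.
        if car_from_left_near:
            return ["brake", "wait"]
        return ["brake", f"steer_{direction}", "accelerate"]

    def control(actions):
        # Control level: low-level actions, normally executed automatically.
        for action in actions:
            print("executing:", action)

    control(tactical(strategic("Utrecht"), car_from_left_near=False))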
Although they influence each other, these three levels can be studied more or less independently. For instance, when studying human route planning, it is not necessary to consider human manual skills. Conversely, a model of human vehicle steering should be able to describe human behavior at an intersection without considering why the driver wants to turn.
Figure 1. Approaching a T-shaped intersection. This situation involves tasks at the strategic (which road to take?), tactical (wait for the car coming from the left?) and control (how much force to use on the brake pedal?) levels.

Modeling Human Behavior

Human behavior modeling is a diverse field, in which researchers engage with different objectives. Without going into an extensive review of the work being done, one can state that much research is based on so-called cognitive architectures (Van Lehn, 1991). These are based on the assumption that humans are capable of performing a wide variety of cognitive functions (from understanding language to playing chess) using the same basic underlying setup (the brain). If such a basic setup could be specified, one would have a single system in which various human cognitive processes could be modeled. Cognitive architectures are attempts at specifying and implementing such structures. Well-known examples are ACT-R (Anderson & Lebiere, 1998), SOAR (Laird et al., 1987) and EPIC (Kieras & Meyer, 1995). However, especially in complex and dynamic tasks such as driving a car, these architectures have some drawbacks (Banks & Stytz, 2000; Meyer & Kieras, 2000). Situation assessment, decision making, and planning are not part of the architecture and must be modeled manually in each model. The architectures use production systems for their reasoning;
these are sets of elaborate rules, together with the logic for applying them. However, advanced reasoning techniques like fuzzy logic and Bayesian or uncertainty reasoning are not incorporated in these production systems, which limits their applicability. Intentions and emotions are also hard to model in these architectures. The goal and subgoal structure normally used for control would be a candidate, but when intentions and emotions are seen as behavior modifiers, this would not be the right way. Other problems are the granularity of the models and the absence of perceptual/motor processes. It is mostly very hard to model at different levels; for instance, the strategic, tactical and operational levels from the previous section would be very hard to incorporate. Progress has been made with the incorporation of perceptual/motor processes in architectures, but the mechanisms used are far from ideal. Some other shortcomings exist, but they are not relevant to the subject of human driver behavior modeling (Banks & Stytz, 2000). Previous attempts at driver modeling in cognitive architectures (for example, Aasman, 1995) have suffered from these drawbacks. Alternatively, probabilistic models have been used for behavior modeling, for example by Oza (1999). However, this is a very difficult approach if behavior is to be simulated for a wide variety of situations.

Autonomous Agents

An autonomous agent is an artificial 'creature' in a (virtual or real) world that can act and react without direct interference from an outside controller; it can, so to speak, make decisions for itself. In autonomous agent design, one useful approach is the behavior-based approach (Brooks, 1991; Braitenberg, 1984; Steels, 1994; Steinhage & Bergener, 1998). The idea is that, to model intelligent behavior in a complex, dynamic environment, one can specify relatively independent modules that each perform a small part of the behavior. Traditionally, one would build a central controller that has access to all sensory information and incorporates a decision mechanism to decide on the correct action. There are several problems with this approach. A central representation of the environment will probably be needed, which has to be updated whenever an action is performed or the environment changes. This representation also has to contain all the information the control structure needs to decide on an action, which is inherently complex (Dennett, 1987; McFarland, 1996). Also, the more complex the agent's task and environment get, the more structures are needed and the more complex the control structure becomes. For every possible situation, a process has to be designed to cope with that situation, and the central controller needs a mechanism that decides to start that process.

By contrast, in the behavior-based approach, the task of the agent is split up into small elementary behaviors. These behaviors are directly coupled only to their relevant sensory inputs, and can be activated by those inputs. Behaviors can also activate or inhibit one another. The overall behavior of the agent is determined by the interaction between the elementary behaviors. By specifying relatively simple behaviors and their interactions, complex behavior emerges from the interaction of those simpler elements.
Fuzzy Logic

To model human reasoning processes, one needs logic in order to specify rules. However, classical logic is not very well suited to modeling human reasoning in driving a vehicle. Classical logic needs precisely defined terms to function, and human reasoning does not work with such terms. When driving, another car is perceived as driving fast, not as driving 150.7 km/h. Rules used by humans are more like: if the car in front of me is close and I am driving fast, then brake. Exact definitions of close and fast cannot be given, so classical logic will fail in a driving task. Fuzzy logic, however, is designed to work with 'vague' definitions (Ross, 1995; Kosko, 1992). With fuzzy logic, it becomes possible to use terms like fast and very close without having to specify them exactly. The values of fuzzy variables are given by membership functions. These functions determine which values belong to the fuzzy terms; a function may specify, for example, that speeds between 10 and 50 km/h are to be considered slow. A value can belong more or less to several terms: 45 km/h is only a little slow, and almost normal (see Figure 2). In this way, human reasoning processes can be modeled much better than with classical logic.
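As a minimal sketch of this idea, the following code defines overlapping membership functions and evaluates the braking rule quoted above with a fuzzy AND (minimum). The triangular breakpoints are illustrative assumptions, not the values used in Automan.

    def triangle(x, a, b, c):
        """Triangular membership: 0 at a and c, 1 at b, linear in between."""
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    def fast(speed_kmh):
        # Overlapping term, cf. Figure 2; breakpoints are assumptions.
        return triangle(speed_kmh, 80, 130, 180)

    def close(distance_m):
        return triangle(distance_m, -1, 0, 40)   # fully "close" at 0 m

    # Rule: IF the car in front is close AND I am driving fast THEN brake.
    # Fuzzy AND via min gives the rule's firing strength.
    speed, gap = 120, 25
    brake = min(fast(speed), close(gap))
    print(f"brake activation: {brake:.2f}")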
Figure 2. Fuzzy membership function for speed. As can be seen, the different values overlap, which makes it possible to work with terms that are not precisely defined. Membership functions can have all sorts of shapes; the simple function shown has proven to work very well in most cases.

Vagueness

Of course, the fuzziness introduced by the membership functions can be manipulated. Is normal something between 50 and 110 km/h, or something between 60 and 140 km/h? The smaller the range used in the membership functions, the less vague or uncertain the
property value is. This vagueness is used in the Automan model. Compare, for instance, what is perceived when someone takes a quick glance at the road to their left: the approaching car will be seen, but the driver will not be fully aware of its speed or distance, and other cars on the road may not be seen at all. With a long look, however, the speed, distance, even brand and color of a car can be observed very accurately. The property values of the object representing the car will then have a lower level of vagueness. For instance, suppose the speed of the car is normal: with a quick glance, the range of normal can be between 30 and 130 km/h; with a long look, this range can be decreased, and normal can be defined more precisely as being between 40 and 60 km/h. Every fuzzy value has an intrinsic vagueness and, additionally, an actual vagueness. This actual vagueness is adapted according to the situation.

The Automan Model

Although a driving task can be described by three almost independent levels, the processes at these levels can be described in a similar way. The amount of force required on a brake pedal to attain a certain speed (a control task) can be determined by a set of rules, fuzzy or not. The decision to attain that particular speed (a tactical decision) is also governed by rules; for instance, if one wants to make a turn, one has to be moving at a certain speed. In fact, one could say that the application of the control-level rules is triggered by the outcome of the tactical-level rules. The same argument goes, of course, for the strategic and tactical levels. At the strategic level, the driver has to decide whether he should turn left or right at this intersection. This is, of course, dependent on his final destination. Suppose he is traveling from Groningen to Utrecht, and a traffic sign indicates that Utrecht is to the right. This process can be described by a rule set:

IF the final destination is to the right THEN turn right;
IF the final destination is to the left THEN turn left.

Because the driver has to turn right, a whole new set of rules is triggered. First, the driver has to watch his speed:

IF I approach an intersection AND my speed is not slow THEN brake.

Then, the driver has to determine whether he can safely initiate his turn:

IF I am turning right AND a car from the left is near THEN stop.
IF I am turning right AND I can't see whether a car from the left is near THEN slow down and keep looking.
IF I am turning right AND there is no car near on the left THEN look to the right AND start turning.

At the control level, Automan is initially braking to reduce its speed. Again this can be described as a rule set; it is not difficult to imagine rules that describe these braking patterns. When, somewhat later, the driver initiates his turn, another rule set is needed:

IF my speed is slow AND I am starting to turn right THEN steer right.

In this pattern, some more rules would be needed to control the turn, but the general idea will be clear. By describing the three levels in the same way, using rule sets, it becomes possible to model human driving behavior at all three levels. In fact, when one examines these rule sets closely, one can identify very specific sets of rules that are applicable in specific situations and that need to be triggered by certain outcomes of other, equally specific rule sets. Such rule sets are very well suited for use in a behavior-based design
of a driver model. A rule set can be seen as a behavior, as explained in section 2.3. In the context of cognitive modeling, the word behavior is used for many things; therefore, we will refer to the behaviors we define for our model as behavioral patterns. In our example, there would be a strategic behavioral pattern, let's call it Drive_To_Utrecht. Furthermore, there would be two tactical patterns. The first one, Approach_Intersection, will be triggered by the perception of the intersection. The second one, Turn_Right_T_Intersection, will be triggered by the outcome of the Drive_To_Utrecht strategic pattern. Finally, operational behavioral patterns will be needed for slowing down and turning. Our driver model, called Automan (Automated Human), is based on this concept. All tasks and subtasks involved in driving a car are described as behavioral patterns. Automan's behavior is controlled by active behavioral patterns. A behavioral pattern can be activated either by some perception or by another behavioral pattern, and it can occur at the strategic, tactical or control level. All active patterns are stored in working memory, together with all perceived objects (we will describe Automan's perception later on). Working memory is continuously evaluated against rules in the procedural memory. In this memory, all rules necessary for the patterns are stored; these rules specify the patterns themselves and when they have to be activated. Automan also has a declarative memory, in which its knowledge of the world is stored. In this memory, object definitions are stored, which tell, for example, what a car looks like. Declarative memory is queried for this knowledge by the perceptual processes and reasoning systems when it is needed. Apart from the three memory systems, Automan consists of four executive systems: the Behavior, Perception, Action, and Emotion systems. All of these systems are constructed using behavioral patterns, and all interact with Automan's memory systems in the same way. However, the functions performed and the rules needed for these systems are so different that it is conceptually convenient to distinguish them. In the following sections, we will describe these systems in more detail.

The Behavior System

The behavior system takes care of activating the behavioral patterns associated with the tactical and strategic levels. A behavioral pattern can be activated by a perception, which appears as an item in working memory, or by another behavioral pattern. Behavioral patterns have a kind of hierarchical structure, representing the influence that the strategic, tactical and operational levels have on each other. This hierarchical structure manifests itself in the rules governing the activation of the patterns. For example, take the situation depicted in Figure 1: Automan is approaching a T-shaped intersection. The intersection will be perceived; that is, the perception system will create a new item in working memory, called a perceived object. This object will trigger the activation of a behavioral pattern (we will call the pattern Approach_Intersection). This is a tactical pattern and it will contain rules that specify right of way, when to brake, etc. This pattern, in its turn, can activate another pattern, which we call Check_Route. This is a strategic pattern that will determine whether Automan will have to turn left or right. In our previous discussion of the example, we have introduced a pattern
Drive_To_Utrecht; Check_Route and Drive_To_Utrecht together will activate another tactical behavioral pattern, Turn_Right_T_Intersection, that handles the actual turning.
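To make this triggering mechanism concrete, the following sketch (in Python, with all class and pattern names hypothetical; the chapter does not specify an implementation language) shows how patterns whose trigger conditions match the contents of working memory could be activated, and how an activated pattern itself becomes a working-memory item that can trigger further patterns:

    # Minimal sketch of behavioral-pattern activation; names are illustrative only.
    class BehavioralPattern:
        def __init__(self, name, level, trigger):
            self.name = name          # e.g. 'Approach_Intersection'
            self.level = level        # 'strategic', 'tactical' or 'operational'
            self.trigger = trigger    # predicate over working memory
            self.active = False

    def update_behavior_system(patterns, working_memory):
        # Activate every inactive pattern whose trigger matches working memory;
        # active patterns are stored in working memory like perceived objects.
        for p in patterns:
            if not p.active and p.trigger(working_memory):
                p.active = True
                working_memory.add(p.name)

    approach = BehavioralPattern('Approach_Intersection', 'tactical',
                                 lambda wm: 'perceived:intersection' in wm)
    check_route = BehavioralPattern('Check_Route', 'strategic',
                                    lambda wm: 'Approach_Intersection' in wm)

    wm = {'perceived:intersection'}
    update_behavior_system([approach, check_route], wm)
    print(wm)   # both patterns are now active items in working memory

In this toy version, the perception of the intersection activates the tactical pattern, whose presence in working memory in turn satisfies the trigger of the strategic pattern, mirroring the hierarchical cascade described above.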
Figure 3. Automan's subsystems. The procedural and declarative memories have been drawn as one system, because they have the same communication channels with the other systems.

The Perception System

Human drivers rely almost exclusively on visual perceptual information; indeed, it has been argued that a large part of the driving task must be considered a visual task. We think this is somewhat exaggerated: not only are other sources of information used (e.g. sound, g-forces due to acceleration), but a large part of the driving task also consists of making decisions and controlling a vehicle. Perception nevertheless remains an important aspect. Just like humans, Automan does not have a complete field of view. It has a foveal field of view, in which objects are easily recognized, and a peripheral field of view, in which objects are perceived only if they are very conspicuous. Eye and head movements in traffic situations are closely related; therefore, in determining the foveal and peripheral fields of view, we use Automan's gaze direction, not the direction of his eyes or head. Land & Horwood (1991) show that gaze direction is a good estimate of the visual field a driver pays attention to. It is generally accepted that humans perceive one object at a time in traffic situations. Also, the more time spent looking at an object, the better it will be perceived (up to a certain optimal level, of course). To model the low level perceptual process in Automan, a Perceptual Filter is included. This filter determines which object is currently being
perceived. Mostly, this object will be in the foveal field of view. However, if a very conspicuous object is present in the peripheral field of view (e.g. because it is moving fast, is fairly big, or has salient colors), the perceptual filter will output that object. If a person looks at an object, he not only perceives the object, but also object properties like speed, heading, color, etc. However, several factors affect the quality of perception. Weather conditions, time spent looking, and individual differences all influence what a person perceives and how accurately he perceives it. For example, at first glance, a car seems to be driving relatively fast. How fast? The viewer could say 'between 20 and 80 km/h.' After a longer look, the viewer may narrow it down to 'about 50 km/h.' Of course, there is a limit to how accurately a person can perceive properties of objects; 'about 50 km/h' could be 45, but also 55 km/h. We call this the intrinsic vagueness of perception. We use the vagueness adaptations, explained in section 2.5, to model this perceptual feature. The perceptual filter modifies vagueness based on the time spent looking at a certain object: the more time is spent looking, the lower the vagueness. There is a minimum perceptual time that is needed to perceive objects and their properties at intrinsic vagueness. There is also a minimum perceptual time that is needed to perceive an object with its properties at maximum vagueness (which would amount to not perceiving the properties at all, just the object). By varying the time necessary to perceive an object properly, the perceptual function can reflect the efficiency of a human driver's perception. It is known that experienced drivers can perceive a situation much faster than inexperienced or elderly drivers; the latter can easily be simulated in the model by increasing the perceptual time. As explained, the output of the perceptual filter depends on the time spent looking at an object. This time, in turn, is governed by the Visual Schemes. A visual scheme is a behavioral pattern; its activation is based on the other active behavioral patterns. Drivers exhibit specific looking patterns in different situations; when overtaking a car, a person looks differently than when taking a right turn. Visual schemes are used to model these patterns. They are activated by corresponding tactical or strategic behavioral patterns from the behavior system. Visual schemes inhibit each other's activation, so only one can be active at a time. A visual scheme contains priorities for various gaze directions, and rules to update these priorities. The gaze direction of Automan will be set to the direction with the highest priority in the active visual scheme. The rules consult working memory for perceived objects in the various directions. If, in a certain direction, enough objects have been perceived sufficiently recently, the priority of that direction will be decreased. Another direction will then have the highest priority, and the gaze direction will be shifted. How many objects are required to change priority depends on the visual scheme; sometimes one object is enough (for example, a traffic sign), but sometimes a complete picture of the situation is needed.
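As a rough illustration of this vagueness adaptation, the sketch below (Python; the linear decay and all constants are our own assumptions, not taken from the chapter) narrows the perceived range of a car's speed as looking time increases, down to the intrinsic vagueness:

    # Sketch of the perceptual filter's vagueness adaptation (assumed linear decay;
    # all constants are hypothetical).
    def perceived_range(true_value, look_time,
                        max_spread=30.0,       # spread after only a quick glance
                        intrinsic_spread=5.0,  # intrinsic vagueness: the floor
                        optimal_time=2.0):     # seconds needed to reach the floor
        # Return a (low, high) interval for a property such as speed in km/h;
        # the longer the look, the narrower the interval.
        t = min(look_time / optimal_time, 1.0)
        spread = max_spread - t * (max_spread - intrinsic_spread)
        return (true_value - spread, true_value + spread)

    print(perceived_range(50.0, 0.2))   # quick glance: roughly 20-80 km/h
    print(perceived_range(50.0, 2.0))   # long look: about 45-55 km/h

Making the optimal_time parameter larger would be one simple way to simulate the slower perception of inexperienced or elderly drivers mentioned above.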
Figure 4. Automan's perception system. The world projection is the interface between the world and Automan, and determines what can be physically seen. The perceptual filter determines what is perceived from the things that can be seen.

Take our T-shaped intersection example: the Navigate_T_Intersection behavioral pattern has just become active. This behavioral pattern will have a visual scheme associated with it, let's call it Check_T_Intersection, that specifies where the driver has to look. First, of course, it is important whether the road from the left has right of way, or whether a stop sign is present. The priority for looking to the right side of the road will thus be highest. As soon as a traffic sign is perceived (or no traffic sign is perceived, of course), the priority of this direction will be decreased, and the priorities of the directions of the two roads are increased.

The Action System

The action system of Automan consists of operational behavioral patterns. These patterns consist of simple rules for controlling the vehicle. For example, the Steer_Right behavior would be part of the action system, and contains rules to control the turn: how much to turn the wheel, how much to compensate, etc. These rules will be somewhat like a control system; fuzzy logic is very suitable for making control systems. This operational level Steer_Right behavioral pattern will be activated by the tactical Turn_Right_T_Intersection behavioral pattern, when the road is clear to make the turn.
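As a crude illustration of such an operational rule set, the sketch below implements Steer_Right as a simple proportional controller (Python; a real Automan implementation would use fuzzy membership functions, and the gain and all names here are hypothetical stand-ins):

    # Non-fuzzy stand-in for an operational Steer_Right rule set.
    def steer_right(heading_error_deg, gain=0.04, max_wheel=1.0):
        # Map the remaining heading error of the turn to a wheel position
        # in [-1, 1]; compensation falls out as the error shrinks.
        wheel = gain * heading_error_deg
        return max(-max_wheel, min(max_wheel, wheel))

    # Early in the turn the error is large, so the wheel is turned far right;
    # near the end the same rule automatically compensates back to center.
    for error in (90.0, 45.0, 10.0, 0.0):
        print(error, steer_right(error))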
The Emotion System

One major influence on human decision making is emotion. A good example of an emotion influencing behavior is aggression. Aggressive drivers tend to keep short distances to other vehicles, tend to overtake more, tend to forcefully maneuver themselves amongst other traffic, and so on. Some people are predisposed to aggressive behavior; for them, aggression is a kind of personality trait, and their driving behavior will be aggressive regardless of the traffic situation. Other people might be aggressive because of events that occurred just prior to their participation in traffic. However, a large part of aggression in traffic comes from the traffic itself. People get frustrated about traffic jams, slow drivers, traffic lights, and so on. In Automan, emotions are controlled by the success of behavioral patterns. They are also implemented as behavioral patterns, whose activation is determined by the success of specific other behavioral patterns. If Automan is driving behind a slower car, but cannot overtake this car, the overtaking behavior will be active for a relatively long time. This will steadily increase the activation of the frustration pattern. In turn, emotions influence the activation of other behavioral patterns. Suppose, in the intersection example, Automan is very frustrated; he might be late for his presentation in Utrecht. The Turn behavior will be more easily activated, and the Slow_Down behavior will be harder to activate. Therefore, Automan will likely try to make the turn just before the approaching car, instead of slowing down. It is also possible to model people with a predisposition for aggression, or with aggression induced by recent events, in Automan: some emotional behavioral patterns can be given a very high initial activation, to reflect the emotional state of the driver.

Evaluation and Updating Objects

Properties of perceived objects (which are items in working memory), like distance from the intersection, speed, etc., are initially set by the perception system. These objects will stay in working memory and may still be needed by the behavior system. However, the world is a dynamic system. Especially when driving, the environment is continuously changing, either because of the actions of the driver himself, or because of the presence and actions of other traffic participants. Thus, the reliability of the perceptual objects in working memory will degrade over time. A car that was seen a couple of seconds ago can very quickly have a totally different speed and heading. Therefore, properties of objects in working memory are continuously updated. Depending on the property, this update consists of either estimating new values, deleting the object from working memory, or activating a certain visual scheme to perceive the object and its properties again. In the example where Automan is approaching the intersection, suppose he is looking to the right. However, the active tactical behavioral pattern (suppose it is Turn_Right_T_Intersection) needs information about traffic from the left to decide whether to start the turn or stop and wait. Working memory will update the properties of the object representing the car from the left, using its last known speed and distance and the time the car was last seen. Of course, the uncertainty of these values (their 'vagueness') increases when they are calculated this way. If this vagueness becomes too large, the perceptual system will be engaged to guide Automan's gaze direction to the left.
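A minimal sketch of this updating step (Python; the dead-reckoning update and the vagueness growth rate are our own assumptions about one plausible realization, not the chapter's specification):

    # Sketch of working-memory updating for a perceived car.
    def update_car_estimate(last_distance_m, last_speed_mps, seconds_since_seen,
                            vagueness, growth=0.5, vagueness_limit=3.0):
        # Dead-reckon the car's position from its last known speed, grow the
        # estimate's vagueness, and signal re-perception when it gets too large.
        est_distance = last_distance_m - last_speed_mps * seconds_since_seen
        vagueness += growth * seconds_since_seen
        need_look = vagueness > vagueness_limit   # engage a visual scheme
        return est_distance, vagueness, need_look

    # Three seconds after the car was last seen at 80 m, closing at 15 m/s:
    print(update_car_estimate(80.0, 15.0, 3.0, vagueness=1.0))
    # -> (35.0, 2.5, False): estimate still usable, no re-look needed yet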
Activation of Behavioral Patterns

Behavioral patterns are activated by other behavioral patterns or by perceived objects. Also, an active behavioral pattern can influence its own activation. This has an effect which is widely known in psychology: humans will stick to their current decision, even when a more appropriate alternative has arisen. Automan has, for example, decided to start turning right. Automan then notices the car from the left. If Automan had seen the car in time, he might not have started the turn. But he did start it, so the Steer_Right operational behavioral pattern is still active. It will remain active, even when the Negotiate_Intersection behavioral pattern starts activating the Brake pattern. Depending on the situation, this can result in a collision. Another example to clarify this: if a person is driving behind a slow vehicle, the person will determine whether he can overtake. If the lane is free, he may decide to start overtaking. While he is in the process of overtaking, a car suddenly approaches from the other side. Had this situation occurred before he started overtaking, he might not have started the maneuver. But he did, so he may try to 'go for it' and keep overtaking. Of course, this process is also influenced by the emotion system.

Conclusions and Future Directions

This paper shows how to use sophisticated techniques in modeling the task of driving a vehicle. Until now, the Automan project has been just a design. To actually implement it, some additional parts would be needed; a design has already been made of a cognitive architecture in which to implement the model (Heemskerk, Quispel, & Warris, 1999). Although initially designed for simulating traffic participants in a traffic simulator, Automan's use can be much broader. When the model is well developed and validated, it can also be used for testing new traffic situations and road configurations. This kind of research cannot be done in real life, of course, and using a simulator with test subjects is a time-consuming process; it would be convenient to use cognitive models for this purpose. Research into modeling traffic jams can also be done, by using large sets of Automen. This would make it possible to investigate whether certain styles of driving, or certain legal measures, are really effective in combating traffic jams. Research into the use of new equipment in cars, for example navigation systems or mobile phones, can be greatly facilitated using an Automan model. Furthermore, parts of Automan can be used in research into specific aspects of driving. All three levels in Automan are modeled at a level that is fine-grained enough for its overall behavior to be realistic, but this modeling could of course be done in more detail if a specific aspect of driving is under investigation. When research is to be carried out on these aspects, Automan can be used as a background architecture; the processes can then be modeled while taking the whole driving task into account. Alternatively, it would be very interesting to see whether the concepts developed and the approach followed in designing Automan could be used for modeling other tasks. Nowadays, a growing need exists for good user models. The approach followed here would be very useful in many domains, especially in dynamic, demanding task environments. Take, for example, an ambulance dispatcher simulation. Such a simulation models the task of an ambulance dispatcher, who has to deploy and coordinate all
ambulances in a certain region. This simulation could greatly benefit from realistic ambulance agents that can react autonomously to obstacles or unexpected situations, and that can communicate with the dispatcher. This enables the use of sophisticated scenarios, in which multiple complicated accidents are involved. Alternatively, one could use the approach in this paper to model the ambulance dispatcher itself.

References

Aasman, J. (1995). Modeling Driver Behavior in Soar. PhD Thesis, Groningen: University of Groningen.
Anderson, J., & Lebiere, C. (1998). The Atomic Components of Thought. London: Lawrence Erlbaum.
Banks, S.E., & Stytz, M.R. (2000). Advancing the state of human behavior representation for modeling and simulation: Technologies and techniques. 9th Conference on Computer Generated Forces and Behavior Representation, http://www.sisostds.org/cgf-br/9th/.
Bos, J., Mulder, L.J.M., & Ouwerkerk, R.J.van (2001). Workplace for Analysis of Task Performance. (In this volume).
Braitenberg, V. (1984). Vehicles: Experiments in synthetic psychology. Cambridge, MA: MIT Press.
Brehmer, B., & Dorner, D. (1993). Experiments with computer simulated micro worlds: Escaping both the narrow straits of the laboratory and the deep blue sea of the field study. Computers in Human Behavior, 9, 171–184.
Brookhuis, K., Bos, J., Mulder, L.J.M., & Veltman, H.A. (2000). Simulation in psychology: A sketch of a future digital workplace environment. In D.de Waard, C.Weikert, J.Hoonhout, & J.Ramaekers (Eds.), Human System Interaction: Education, Research and Application in the 21st Century (pp. 79–86). Maastricht, The Netherlands: Shaker Publishing.
Brooks, R.A. (1991). Intelligence without representation. Artificial Intelligence, 47, 139–160.
Cnossen, F. (2000). Adaptive strategies and goal management in car drivers. PhD Thesis, Groningen: University of Groningen.
Dennett, D.C. (1987). Cognitive wheels: The frame problem of AI. In Z.W.Pylyshyn (Ed.), The robot's dilemma: The frame problem in Artificial Intelligence. Norwood, NJ: Ablex Publishing Corp.
DiFonzo, N., Hantula, D.A., & Bordia, P. (1998). Microworlds for experimental research: Having your (control and collection) cake, and realism too. Behavior Research Methods, Instruments and Computers, 30, 278–286.
Heemskerk, M., Quispel, L., & Warris, S. (1999). Automan. Project Report, Groningen: University of Groningen. http://www.ai.rug.nl/~sim/automannewlayout.pdf.
Hill, R.W., Chen, J., Gratch, J., Rosenbloom, P., & Tambe, M. (1997). Intelligent agents for the synthetic battlefield: A company of rotary wing aircraft. Proceedings of Innovative Applications of Artificial Intelligence (IAAI-97). Providence, Rhode Island: AAAI Press.
Jones, R.M., Laird, J.E., Nielsen, P.E., Coulter, K.J., Kenny, P., & Koss, F.V. (1999). Automated intelligent pilots for combat flight simulation. AI Magazine, 20, 1.
Kieras, D.E., & Meyer, D.E. (1995). An overview of the EPIC architecture for cognition and performance with application to human computer interaction. EPIC Tech. Rep. No. 5 (TR95/ONR-EPIC-4). Ann Arbor: University of Michigan, Electrical Engineering and Computer Science Department.
Knippenberg, C.W.F.van, Rothengatter, J.A., & Michon, J.A. (1989). Handboek Sociale Verkeerskunde. Groningen, The Netherlands: Van Gorcum & Comp. BV.
Kosko, B. (1992). Neural Networks and Fuzzy Systems. Englewood Cliffs, NJ: Prentice Hall.
Laird, J.E., Newell, A., & Rosenbloom, P.S. (1987). Soar: An architecture for general intelligence. Artificial Intelligence, 33, 1–64.
Land, M., & Horwood, J. (1991). The relations between head and eye movement during driving. In A.G.Gale (Ed.), Vision in Vehicles III. Amsterdam, The Netherlands: Elsevier Science.
McFarland, D.J. (1996). Animals as cost-based robots. In A.Boden (Ed.), The philosophy of artificial life. Oxford University Press.
McKnight, A.J., & Adams, B.B. (1970). Driver education task analysis. Volume 1: Task Descriptions. Alexandria, VA: Human Resources Research Organization.
Meyer, D.E., & Kieras, D.E. (2000). Precis to a practical unified theory of cognition and action: Some lessons from EPIC computational models of human multiple-task performance. Attention and Performance XVII. Cognitive regulation of performance: Interaction of theory and application, 17–88. Boston, MA: MIT Press.
Michon, J.A., Smiley, A., & Aasman, J. (1990). Errors and driver support systems. Ergonomics, 33, 1215–1229.
Oza, N.C. (1999). Probabilistic models of driver behavior. Spatial Cognition Conference, http://ist-socrates.berkeley.edu:4247/conference.html.
Ross, T.J. (1995). Fuzzy Logic with Engineering Applications. New York: McGraw-Hill, Inc.
Rothengatter, T., & Vaya, E.G. (1997). Traffic & Transport Psychology. Amsterdam, The Netherlands: Elsevier Science Ltd.
Sauer, J., Wastell, D.G., & Hockey, G.R.J. (2000). A conceptual framework for designing micro worlds for complex work domains: A case study of the Cabin Air Management System. Computers in Human Behavior, 16, 45–58.
Steels, L. (1994). The artificial life roots of artificial intelligence. Artificial Life Journal, 1.
Steinhage, A., & Bergener, T. (1998). Dynamical systems for the behavioral organization of an autonomous mobile robot. From Animals to Animats 5: Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior. Zurich, Switzerland: Bradford.
Tambe, M., Johnson, L.W., Jones, R.M., Koss, F.V., Laird, J.E., Rosenbloom, P.S., & Schwamb, K. (1995). Intelligent agents for interactive simulation environments. AI Magazine, 16, 1.
VanLehn, K. (Ed.) (1991). Architectures for intelligence: The twenty-second Carnegie Mellon Symposium on Cognition. Hillsdale, NJ: Lawrence Erlbaum.
Winsum, W.van (1996). From adaptive control to adaptive behavior. PhD Thesis, Haren: Traffic Research Centre.
Wolffelaar, P.van (1996). Functional aspects of the driving simulator at the University of Groningen. Report, Groningen: Centre for Environmental and Traffic Psychology.
Wolffelaar, P.van, & Winsum, W.van (1993). GIDS small world simulation. In J.A.Michon (Ed.), Generic Intelligent Driver Support: A comprehensive report on GIDS (pp. 175–192). London: Taylor & Francis.
Wolffelaar, P.van, & Winsum, W.van (1996). Driving simulation and traffic scenario control in the TRC driving simulator. Symposium on the Design and Validation of Driving Simulators (ICTTP 1996).
Chapter 17
Workplace for Analysis of Task Performance
J.Bos, L.J.M.Mulder, and R.J.van Ouwerkerk
Department of Psychology, University of Groningen, Grote Kruisstraat 2/1, 9712 TS Groningen, The Netherlands

Abstract

In current research on mental workload and task performance, a large gap exists between laboratory based studies and research projects in real-life working practice. Tasks conducted within a laboratory environment often lack a strong resemblance to real-life working situations. This paper presents an experimental approach to minimizing this gap by designing a very flexible experimentation system with adequate hardware and software components. The first goal of the system is to provide a laboratory based environment in which a broad range of computer supported daily-life work can be simulated, including co-operative working situations. Moreover, several behavioral and physiological measurement and analysis techniques are supported, such as video based behavioral analysis, task related event registration, cardiovascular state analysis, determination of mental workload indices, as well as EEG background and ERP analysis. An important requirement for relating the different measured variables to task performance is synchronization of data sources and task parameters at varying time scales; the highest time accuracy should be at least 10 milliseconds. The present system fulfils this requirement by using software system components and libraries that allow real-time experiment control and measurement. Additionally, the new system should work within a Microsoft Windows based environment, providing the possibility to use standard office software that is well known to subjects having to work in the new environment. The option to use such standard software, in combination with new (simulation) techniques for presenting more realistic tasks, results in a powerful laboratory environment in which task elements in semi-realistic tasks can be manipulated experimentally, by defining adequate scenarios that can be simulated. At present, both a simple, less realistic task (Synwork) has been realized with high time accuracy (1 ms), and a more realistic simulation of an ambulance dispatcher task with lower time accuracy (10–100 ms). Both types of task can be seen as examples of the range of tasks to be implemented in the near future.
Introduction

Often, when psychological research is conducted in relation to tasks performed in real-life office work, a choice has to be made between conducting this research within a controllable laboratory environment and conducting it at a normal work location. Both kinds of research have their advantages and restrictions. Research at a work location can give a very good analogue of the task conducted in real life, but has the disadvantage that unexpected disturbances can occur, and it can be very difficult to set up an adequate research environment. Conducting an experiment within a laboratory provides a stable and controllable research environment, but the tasks to be performed often lack a strong resemblance to the situation in the outside world. One reason is that the laboratory environment itself does not resemble the office location well, while, additionally, the tasks conducted by the subject do not completely match the real-life task performed at the office. Often, the level of abstraction in these laboratory tasks is too high as well. Moreover, in most laboratory studies, emotional and motivational factors are not included, while these are very important in real life. Another aspect is related to the work domain as a whole and the place of separate tasks in that domain (Vicente, 1999): in a laboratory experiment, there is often almost no connection between (sub-)tasks that have to be performed in a certain sequence, while in real-life situations the result of a preceding sub-task is decisive for the choice of subsequent task elements or task strategies. The aim of the present paper is to describe a laboratory based simulation system that has been designed to overcome some of the described problems of the standard laboratory facilities usually available in Experimental and Work Psychology. The first task was to develop the necessary concepts for a system that has the flexibility to build different types of working environments for experimental studies. It was decided to use object technology in a hierarchical way, for both the software and the hardware components. The work domains to be simulated can be characterized as computer-based and dynamically changing, and subjects can work together if necessary (co-operatively). The simulation is implemented by defining specific scenarios that specify the events that occur in the work domain and that determine the task components to be performed and the actions to be taken. Registration and analysis of physiological signals, as well as behavioral measurements, will basically not differ from the well known laboratory methods. An important aspect concerns synchronization and time accuracy: all available data sources have to be synchronized with the events generated in the simulation. Additionally, standard software products such as Microsoft Office have to be used in the working environment, in order to bring simulator newcomers into an environment which is partly well known. This facilitates progress in such a workplace and shortens learning time. In 1997, an NWO investment grant was provided to the department of Experimental and Work Psychology of Groningen University for starting a project with the main goal of setting up a laboratory facility for analysis of task performance (DWAT). The goal was to build a general purpose simulation environment for complex, dynamic tasks.
Hardware components had to be specified and bought, while specific simulation software had to be developed.
The main aim of the present paper is to explain the approach taken and to report on the chosen concepts.

System Requirements

In order to facilitate daily computer based work, standard office applications have to be used in combination with task specific, self-built simulation software; additionally, facilities for high-resolution time measurements are required. Currently, most widespread office applications run on the Microsoft Windows operating system. Despite its better interface capabilities in comparison to the MS-DOS operating system, Windows has two main disadvantages.

1. Due to the complexity of the system, it is very difficult to control the operating system during an experiment (i.e., to prevent unexpected events from occurring).
2. An even greater problem is the lack of a good and stable timing device in combination with a method to directly access hardware devices. This is one of the main reasons that Microsoft Windows is still not commonly used for psychological research.

Both problems must be solved if a system is to be developed to run in this environment. An important reason to choose Microsoft Windows, despite the above mentioned disadvantages, is the widespread use of this system for home and business applications. This means that most subjects entering the laboratory facility are already acquainted with part of the software that will be used. Therefore, a system is needed that supports this operating system as well as the use of external applications within this environment, such as Microsoft Word or the Windows task scheduling programs. A rather different aspect of daily office work is the interaction with colleagues. Often this interaction is not taken into account, which weakens the resemblance to daily-life office work. For this reason, a multi-user environment is required, where different computers are connected via a local network. Also, all kinds of physiological measuring techniques should be available without compromise. This aspect gives rise to some problems due to the above mentioned restrictions of Microsoft Windows. At this point, the real-time requirements come into play. By using a separate physiological measuring system of high quality, real-time demands will not differ from those of other physiological measuring systems. However, the simulation software has to generate task related events at high accuracy in order to measure, for instance, evoked responses (ERPs) and other task related physiological changes, as well as to sustain synchronization facilities. The compromise that will suffice in this situation is that (a) synchronization will be arranged in the whole simulation system by one external clock, and (b) exact timing of internal and external events is only required in well controlled, user defined situations and not within the standard office applications. Due to the use of a multi-user environment, the complexity of the system increases. Not only are there more hardware and software elements to be taken into account, but the communication between the subjects must also be controlled and registered. For this reason, a general mechanism has to be developed which enables researchers to easily control the system. In order to accomplish this goal, scenario based experimental control
will be introduced (Van Winsum, …), in which the entire experiment is described through a scenario file (script). Aspects that should be supported by the scenario system are the possibility to use feedback from the user, and the possibility to describe the desired user interaction patterns during an experiment. To summarize, a system is needed that:

1. Provides a well controlled, multi-subject environment in which real-time measurements can be accomplished.
2. Uses scenario-based experimental control, with interaction and feedback facilities.
3. Can work with most of the commonly used commercial office applications.
4. Supports all kinds of standard methods for physiological measurement and analysis.
5. Supports dynamic addition and removal of hardware and software elements (computers, software and registration devices).
6. Has a large degree of scalability: the same concepts, software and hardware components will be used for both complex and simple experiment structures.

At present, a system has been designed which meets these criteria. A new conceptual structure has been realized, within which the newly developed system will be built. This system, called the Workplace for Analysis of Task performance (DWAT), will not only provide a new way of building and conducting psychological experiments, but will also extend the range of possible experiments using new techniques, such as multimedia and digital video, integrated within a multi-user environment.

General Concepts

Conducting a psychological experiment generally consists of combining hardware (computers and registration devices) and software (task) elements, in which execution of the software is synchronized with the registration of events (software and hardware). For simple, stand-alone experiments it may very well be possible to control all these elements by hand, but the more subjects that simultaneously work together during an experiment, the more devices are needed for controlling the experiment, resulting in a growing complexity. In order to avoid problems related to this complexity, a simple mechanism must be developed that controls all the hardware and software elements within the system. For this reason, a new concept, called the task-object, is introduced in the workplace, which gives a simple but straightforward method for controlling the elements within the system. A task-object is a virtual representation of a software or hardware element used within the workplace. This representation is used to give the researcher access to all the properties and methods of an element that are used during an experiment. The concept is completely compatible with the idea of object oriented programming, as used in several visual software development environments (e.g. Delphi). The main idea is that the experiment designer can specify the main features of a specific 'task-component' in his own terms, while simple, well-known manipulation techniques (drag and drop) help him to position such an element at the correct place in the experimentation environment. This can be done for software as well as hardware components. One of the nice
features of such an approach is that the experiment designer can use the terminology of his research field, while the underlying features, in terms of software properties and real-time characteristics, are directly connected to the object. Additionally, easy extendibility is an important aspect of this approach, which is also used in other experiment controlling environments (E-Prime, …). A simple example is camera control. The experiment designer specifies that task-object 'camera 1' should, at a specified time, take a specified position and direction; the underlying object software translates these specifications into adequate camera commands. Other examples could be: 'start blood pressure measurement', 'send message X to subject 1', etc. The advantage of this methodology is that it gives a transparent and simple way of controlling and integrating all the different elements. It also implies that all the elements can (and have to be) controlled at the software level, and that the control is carried out with a standard set of basic methods that is essentially the same for all elements. When a hardware or software element is added to the environment, a description file is generated that contains a translation list. This translation list is used to transform specific commands of an element into the more general properties and methods used within the workplace. Specific commands and properties are also added to this list to control the specific characteristics of an element. When a task-object is selected to be used within an experiment, a virtual representation is loaded within the system that also contains the translation list of the task-object. During the experiment, this list is used by the system to control the task-object. All task-objects used within the system are placed in a virtual environment called the task-world. This task-world represents the entire experiment with all its hardware and software elements. The system uses a tree representation, in which all the task-objects are placed in relation to each other, as shown in Figure 1. Each task-object is represented as a node; the relations between the task-objects are represented by the leaf structure of the tree. Using this task-world, a clear method is introduced for handling all the elements within the system. It is easy to add or remove task-objects from the task-world by just adding or removing a node (or a more complex part of the tree) during an experiment. Using this general concept, the system reaches a level of modularity that is very important, both for extendibility and for controllability by the researcher. With this object modularity, the system can easily be adapted, making it possible to create a great variety of different experimentation settings. For instance, a system can be built with one single computer, while other systems can consist of a network of multiple computers in combination with other hardware and software elements. This results in an experiment and simulation system that can easily be optimized to meet the requirements of the external world.
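The following sketch (Python; class names, command names and the translation entries are all hypothetical, as the actual system is built with visual development tools such as Delphi) illustrates the two ideas together: a task-object whose translation list maps generic workplace commands onto device-specific ones, and a task-world built as a tree of such objects:

    # Sketch of the task-object / task-world concepts (names illustrative only).
    class TaskObject:
        def __init__(self, name, translation=None):
            self.name = name
            self.translation = translation or {}   # generic command -> device command
            self.children = []                     # branch structure of the task-world

        def add(self, child):
            # Adding or removing a node extends or reduces the task-world.
            self.children.append(child)
            return child

        def send(self, command, *args):
            # Translate a generic workplace command into a device-specific one.
            device_cmd = self.translation.get(command, command)
            print(f'{self.name}: {device_cmd}{args}')

    task_world = TaskObject('task-world')
    pc1 = task_world.add(TaskObject('computer 1'))
    cam = pc1.add(TaskObject('camera 1', translation={'point_at': 'PTZ_MOVE'}))
    cam.send('point_at', 120, 35)   # -> camera 1: PTZ_MOVE(120, 35)

The researcher thus manipulates only the generic command ('point_at'); the translation list hides the device-specific protocol, which is the essence of the transparency claimed above.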
Figure 1. The task-world representation of an experiment within the workplace. Every task-object is represented as a node within the tree; the relations between the task-objects are represented as the branches of the tree.

General Configuration

The central core of the workplace is a software application that fulfills the previously mentioned requirements and includes the described conceptual structure. This application, called the control program, functions as a background controller on each computer used within the workplace; it acts as a director and communicates with the other control programs and task-objects within the system. One specific instance of these control programs becomes the central control program (master); the user is able to specify the computer on which the master program will run. All other control programs are subsequently controlled by this computer and conduct only local functions. Every control program has the possibility to add new task-objects to the system and remove them, thereby extending or reducing the task-world. The master control program also has the ability to add or remove control programs (computer systems) from the task-world. A control program is built upon four integrated modules that each have their distinct task within the control program. The modules work as independent sub-processes that
can interact with each other. One module, the Task World data module, is the central data object and contains a virtual representation of the task-world with all task-objects, including their properties and methods. Figure 2 shows how the modules are organized.
Figure 2. The structure of the control program. The four modules are independent sub-processes within the control program, which interact with each other. Communication with the outside world is provided through the communication module and the WinTask stimulus/response module.

The Communication Module

The communication module is used for communicating with other task-objects within the system. Communication is currently achieved using three different communication protocols. The TCP/IP protocol is used for communication between the different computers (control programs) and the outside world; Win32 messaging, a Windows communication protocol, is used for communication between the control program and other Windows based applications on the same computer. The third communication protocol uses the standard computer communication ports (COM and LPT) for accessing external devices such as physiological registration devices and
response devices on the same computer. Timing within this module, and in relation to other task applications, does not meet high accuracy requirements; in other words, this module is not meant for real-time measurement and control. The method for internal processing of the communication inside a control program is as follows. When a message is presented to a control program, it is placed in one of three data-buffers, depending on the kind of communication protocol used by the message. Subsequently, the message is converted to a standard data type and placed inside the message queue of the control program. This queue is an independent sub-process within the control program that processes the message independently of the three other modules. A message always contains a header, which gives information about the target locations of the message and about the type of message. Within the queue process, a distinction is made between messages for the control program itself, messages for local task-objects, and messages for task-objects connected to the other computers within the workplace. Finally, the message is sent to the target location. Communication between task-objects is, as a general rule, routed through the control program. Sometimes, however, it is desirable to have communication between two task-objects without the intervention of a control program. This can be achieved by setting up an external communication line; in this case, the control program only activates and deactivates the communication but does not actually control the communication itself. This kind of communication can be used to achieve rapid data transport between computers, and it reduces the processing load of the master control program.

The WinTask Stimulus Response Module

The WinTask module is the successor of the MS-DOS Taskit toolkit (Gerritsma & Van der Meulen, 1988), which is used to create psychological experiments in Turbo Pascal under MS-DOS. WinTask is specially developed to run in a Microsoft Windows environment using the same structure and methodology as Taskit. The module provides the system with the possibility to register events with a resolution of one millisecond, using Windows Multimedia Timers in combination with the parallel port. Additional functions are also provided, such as a clock with a resolution of one millisecond, functions to control and stabilize the operating system, functions to present stimuli with a high time resolution, and functions to register mouse, keyboard and system events of a subject (Bos, Hoekstra, Mulder, Ruiter, Smit, Wybenga, & Veldman, this volume). An important difference with the stand-alone version of the WinTask module is that the workplace is a system that can operate with more than one computer, which means that these computers (and other hardware devices) need to be aligned to a common time base with a preferred resolution of one millisecond. To solve this problem, a special device was developed to which all computers and time dependent hardware devices are linked. This device provides the workplace with overall timing at a resolution of one millisecond. The WinTask module has the possibility to use an external timing input instead of the Windows Multimedia Timers, thus enabling a central timing device within the workplace and providing the different computers with the same time base.
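A toy version of the communication module's two processing steps (Python; the field names and the 'local:'/'remote:' target notation are our own illustration, not the actual protocol):

    # Step 1: buffer per protocol and convert to a standard message type;
    # step 2: route via the queue to a local task-object or a remote computer.
    from collections import deque
    from dataclasses import dataclass

    @dataclass
    class Message:
        protocol: str    # 'tcpip', 'win32' or 'port': the three entry buffers
        target: str      # e.g. 'local:camera 1' or 'remote:computer 2'
        payload: object

    queue = deque()

    def receive(raw, protocol, target):
        # Whatever the protocol, the message ends up in the standard queue.
        queue.append(Message(protocol, target, raw))

    def process_one(local_handlers, forward):
        # Route to a local task-object, or forward to the control program
        # on another computer within the workplace.
        msg = queue.popleft()
        kind, _, name = msg.target.partition(':')
        if kind == 'local':
            local_handlers[name](msg.payload)
        else:
            forward(msg)    # hand off, e.g. over TCP/IP, to the remote side

    receive('start_recording', 'win32', 'local:camera 1')
    process_one({'camera 1': print}, forward=lambda m: None)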
Another important aspect of the WinTask module is the screen object, which offers the possibility of presenting stimuli with high resolution timing, by using special graphic output drivers provided by Microsoft for game development (DirectX). This option can be applied for presenting stimulus/response frames using high-resolution timing, and can be combined with response registration using response boxes connected to the parallel port. Using this feature introduces the restriction that no other standard Windows applications can be actively used at the same time: only this screen object is active, and it controls the whole screen presenting the stimuli. Also, communication with other task-objects is limited, to prevent interruptions during such a presentation. Although this mechanism contradicts the open multi-user structure of the workplace, the following remarks must be made.

1. Switching between the open and closed model during an experiment can be done as often as necessary.
2. The duration of use of this mechanism can be very short. Switching to the closed high resolution timing model, presenting the stimuli, capturing the response and switching back to the open model can be accomplished within a frame of a few hundred milliseconds.
3. The closed model acts within one computer, leaving the possibility for the other computers within the system to communicate with each other.
4. The use of this mechanism is not necessary throughout the entire experiment, but only within certain distinct time periods.

An important remark to be made is that these restrictions only hold for the DirectX screen control, not for the timing mechanisms.

The Script Parsing Module

The third module is a script interpreter that is used to parse a sequence of commands in a scenario during an experiment. Within the workplace, an experiment can be conducted without using the script interpreter, but then all actions must be performed by the controller. A more convenient method is to describe the experiment in a scenario using a script language. Executing the scenario starts the experiment, leaving the controller only the task of monitoring the experiment. All other operations are automated in the scenario. A scenario consists of two distinct parts. One part, called the static description, is always generated within the workplace, even if the script interpreter is not used during an experiment. This static description contains a list of all the task-objects used within the workplace during an experiment. The second part, called the dynamic representation, describes the actions that occur during an experiment for the task-objects described in the static description. These actions can be described on a global scale (for all or some of the task-objects) or on a local scale (for only one property of a task-object). Also, interaction with the user and feedback provided by the registration devices can be used within a scenario. The script interpreter chosen (Ackermann, 1999) is based on the Pascal language and has all the possibilities provided by this language. In the script language, task-objects are seen as individual object variables. Scenarios are executed in a two step sequence: first, the scenario is pre-compiled into pseudo code; this pseudo code is then executed. This enables the system to check for inconsistencies in a scenario before running the scenario itself. Another advantage is that execution of the pseudo code is much faster than executing the scenario directly.
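The actual interpreter is Pascal-based; the toy Python sketch below (with an invented scenario syntax) only illustrates the two-step sequence of checking a scenario against the static description and then executing the resulting pseudo code:

    # Toy two-step scenario execution: precompile with consistency checks,
    # then run the pseudo code (scenario syntax is hypothetical).
    def precompile(scenario_text, known_objects):
        code = []
        for lineno, raw in enumerate(scenario_text.splitlines(), 1):
            line = raw.strip()
            if not line:
                continue
            target, _, rest = line.partition('.')
            if target not in known_objects:   # inconsistency caught before running
                raise ValueError(f'line {lineno}: unknown task-object {target!r}')
            command, *args = rest.split()
            code.append((target, command, args))
        return code

    def run(code, handlers):
        for target, command, args in code:
            handlers[target](command, args)

    scenario = """camera 1.point_at 120 35
    physio.start_bloodpressure"""
    pseudo = precompile(scenario, {'camera 1', 'physio'})
    run(pseudo, {'camera 1': print, 'physio': print})

A misspelled task-object name would raise an error during precompilation, before the experiment starts, which is the advantage claimed for the two-step sequence.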
Because all control programs have their own script interpreter, it is possible to build a system that uses more than one script at the same time. In this case, the script used by the central control program takes precedence over the local scripts used within the system. The advantage of using more than one script interpreter at the same time lies in the division of 'work' during an experiment, and in the possibility of creating pseudo-autonomous sub-systems within the workplace.

The Task-World Data Module

The fourth module is a data structure object containing all the information about the task-world. The virtual representations of all the task-objects are placed in this module within a tree representation. A differentiation is made between the master control program and the other control programs within the workplace. The master control program contains a representation of all the task-objects used within the workplace, whereas the other control programs only have a local representation of the task-objects used by the computer on which that control program is active. If a property of a task-object changes or needs to be changed, this is realized through this data structure. For this reason, the data representation always contains the actual state of the active task-objects within the workplace. All four modules within the control program can be seen as individual task-objects directly related to the control program itself. These four task-objects have their own properties and methods, which can be used during an experiment. This makes it possible to deactivate some of the modules during an experiment (e.g. the script module). Also, the control program itself is an individual task-object with its own properties and methods.

Experimentation Environment: Hardware Configuration

Although the software, in particular the control program, plays an important role in the system, hardware devices and a suitable laboratory room are also needed to build a reliable, working experimentation laboratory. For the current workplace implementation, the choice was made to build a laboratory where a maximum of four subjects can work simultaneously. For this reason, two office/observation rooms were made available, each containing two workplaces (a desk with a computer). A third room, situated between the two observation rooms, is used as a control center, where the researcher can monitor the subjects and control the experiment. The configuration can be seen in Figure 3. Five different sub-systems can be distinguished at the hardware level:

1. workstations;
2. physiological registration and analysis;
3. behavioral registration and analysis;
4. eye movement registration and analysis;
5. experimental control.

Each workplace consists of a computer and two screens for information presentation and registration of user interaction.
Figure 3. Current hardware configuration of the Workplace. Four subjects can be monitored at the same time during an experiment in two office rooms (only one presented in the picture).

Each workplace can be configured with the same available hardware and software elements. One system is used as a sub-system for controlling the physiological registration devices and collecting the physiological data. Each workplace has available a general purpose measuring device that can measure 8 channels of EEG and 8 additional channels, including ECG and respiration. The sampled data can be combined with the registration of digital events, using the system as a digital recorder for registering EEG, ERP, blood pressure (arm cuff and finger cuff) and heart rate (Clotz, Mulder, & Hoekstra, 1993). Finally, a central timing device is added to the system, to which all the hardware devices are linked, providing a one millisecond time base. Another system is used as a sub-system for controlling the video cameras and collecting the video data. Finally, one system is used for controlling all six other computers. This system is called the master control computer and contains a version of the control program that controls all the control programs on the other computers. For behavioral registration of the subjects, four cameras are available. The cameras are controlled from the observation room as individual objects. Computers within the system are directly controlled by the master control computer. Cameras and physiological registration devices are indirectly controlled through the two specific computers used for registration and control of these devices. Data collected from the video cameras and the physiological registration devices are entered into a central data storage unit, where both kinds of data can be combined on the same time base. The system described above will be used as a basic test configuration. The goal is to provide a scalable environment that can be used at multiple locations in a wide variety of settings. A final step will be the linking of this system to other networks, providing a method for developing tasks and experiments at a different location using the provided development tools, downloading them to the workplace, conducting the
experiment, collecting data using the workplace environment, and finally sending the data back to the researcher for further analysis. This analysis of combined behavioral and video data can then be carried out using external programs such as Camera+ (Geuze, Mulder, & Van Ouwerkerk, 1996) or Observer (Noldus, 1991).

Research Projects

During the last year, three experimental prototypes were constructed to test the fundamental principles and concepts of the workplace. The first prototype was a generalized simulation of a disaster control center. In this simulation, three subjects had to work together to minimize the damage caused by a simulated disaster (a fire in a university building). The three subjects each had a distinct task: one person had to gather the information, one person had to decide what to do according to this information, and one person had to carry out the commands given by the second subject. Using electronic means of communication, an interaction process was generated. The goal of this experimental task was to see how a multi-user system could be set up using Microsoft Windows. The second prototype is a system derived from an existing switch task, the Synwork task (Elsmore, 1994). This task, developed for the MS-DOS operating system, has four distinct tasks presented on one screen. A subject needs to divide his/her attention among the different tasks, where each task requires a different attention mechanism. A Windows version of this task was developed, controlled by another computer through a scenario interpreter. The aim of this prototype was to test how multiple applications can be controlled using a small scenario. A third prototype, with a more complex simulation system, has already been built. It simulates an ambulance dispatcher task, in which two ambulance dispatchers must work together to control a pool of ambulances in the province of Groningen. This simulation has been built in combination with the results of an extended field study of the dispatcher task (Van Ouwerkerk, Kramer, Bos, & Mulder, 1999). The simulation is based on the use of three computers. One computer is used for controlling the experiment and simulation, and provides the necessary database information (a map of Groningen, the positions of the ambulances on this map, and a list of simulated ambulance rides). The other two computers provide the subjects with an interface for conducting the task. Special aspects of this simulation are the use of intelligent autonomous agents for guiding the ambulances, and the use of an adaptive interface for supporting the dispatcher task. A variety of interfaces will be used to test the presumed advantages of this adaptive interface. After completion, a series of experiments will be conducted in the workplace to see how this simulation relates to the actual dispatchers' task.

Future Developments

Conducting an experiment in the workplace is one aspect of research. Another, equally important aspect is the development stage of an experiment. The workplace should also
Future Developments

Conducting an experiment in the workplace is one aspect of research. Another, equally important aspect is the development stage of an experiment. The workplace should also provide support for this part of an experimental project. In future developments, this will be accomplished by adding two supporting elements.

First, a development environment will be constructed that provides the researcher with a visual interface for developing an experiment. In combination with a task library and a set of standard task-objects, it is possible to set up an experiment without using a programming language or other difficult software development techniques. The method combines the drag-and-drop principle with task-objects and the setting of task-object properties within a visual timeline. The researcher decides which task-objects are necessary at a certain time and projects these task-objects onto the timeline; a sketch of this idea is given below. If more complex settings are needed that cannot be handled with this method, the internal script language can be used. This methodology is based on the same principles used within PsyScope, an experiment development tool for the Macintosh (Cohen, MacWhinney, & Flatt, 1993), but has a wider range due to the introduction of multi-user aspects and the possibility to work beyond the scope of frame levels. The new methodology closely resembles the RAD principles (Rapid Application Development) commonly used for developing Windows programs. In this context, the methodology used in the workplace is called the RED principle (Rapid Experiment Development).

Second, a well-known problem within the field of psychological research is the reuse of experimentation devices and programs. Within many laboratories such programs and devices are available, but often there is no common software library interface where these tasks can be found for reuse. For the workplace, a library is constructed in which all known task-objects that can be used within that workplace are listed. A researcher can use this library to find the task-objects that are suitable for a certain kind of experiment, or to find a task-object that can easily be adapted to the experiment's requirements. This task library will start with only the tasks used within the department of Experimental and Work Psychology, but will be able to include the experimental set-ups of other laboratories.
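As a hypothetical illustration of the RED principle, the following sketch shows how an experiment might be assembled from reusable task-objects placed on a shared timeline, without a conventional programming step. All class and property names here are invented for the example; they are not the actual workplace API.

```python
from dataclasses import dataclass, field

@dataclass
class TaskObject:
    """A reusable experimental task with adjustable properties."""
    name: str
    properties: dict = field(default_factory=dict)

@dataclass
class TimelineEntry:
    task: TaskObject
    start_ms: int
    duration_ms: int

class ExperimentTimeline:
    """Collects task-objects projected onto a shared timeline."""
    def __init__(self):
        self.entries = []

    def place(self, task, start_ms, duration_ms):
        self.entries.append(TimelineEntry(task, start_ms, duration_ms))

    def schedule(self):
        """Return the entries in order of onset, as a runner would execute them."""
        return sorted(self.entries, key=lambda e: e.start_ms)

# A researcher picks task-objects from the task library and drops them onto
# the timeline, setting properties instead of writing code.
synwork = TaskObject("synwork", {"subtasks": 4, "difficulty": "medium"})
questionnaire = TaskObject("rating_scale", {"items": 6})

timeline = ExperimentTimeline()
timeline.place(synwork, start_ms=0, duration_ms=600_000)            # 10-minute block
timeline.place(questionnaire, start_ms=600_000, duration_ms=120_000)

for entry in timeline.schedule():
    print(entry.start_ms, entry.task.name, entry.task.properties)
```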
Conclusion

At this moment the system is still in a developmental stage. The current findings do, however, indicate that the methodology used within the workplace offers an easy method for developing experiments and a stable environment for conducting psychological experiments. Aspects such as a multi-user environment, multimedia and Windows-based tasks can be used within psychological experiments. The current findings are promising: using real-time aspects within Microsoft Windows is possible if certain restrictions are observed. Because of the modularity and the object structure used within the system, a simple, orderly and powerful mechanism is provided for building and working with the complex structures involved in running psychological experiments.

References

Ackermann, M. (1999). TScript, a Pascal script component. Internet location: http://n.ethz.ch/student/ackermma/delph_en.html.
Bos, J., Hoekstra, E., Mulder, L.J.M., Ruiter, J.A., Smit, J.R., Veldman, J.B.P., & Wybenga, D. (1999). Using Microsoft Windows™ for real-time psychological experiments. (In this volume.)
Bos, J., Mulder, L.J.M., & Ouwerkerk, R.J. van (1998). Werkplaats voor Arbeid en Technologie [Workplace for Labor and Technology]. Internal publication: Department of Experimental and Work Psychology, University of Groningen.
Clotz, J., Mulder, L.J.M., & Hoekstra, E. (1993). A PC-based psychological measurement and data acquisition system. In F.J. Maarse, A.E. Akkermans, A.N. Brand, L.J.M. Mulder, & M.J. van der Stelt (Eds.), Computers in Psychology 4: Tools for Experimental and Applied Psychology (pp. 20–32). Amsterdam/Lisse: Swets & Zeitlinger.
Cohen, J., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: An interactive graphical system for designing and controlling experiments in the psychology laboratory using Macintosh computers. Behavior Research Methods, Instruments, & Computers, 25, 257–271.
Elsmore, T.F. (1994). A synthetic work environment for the PC. Washington, DC: Walter Reed Army Institute of Research, Division of Neuropsychiatry.
Gerritsma, F., & Meulen, P. van der (1988). The Taskit development environment. In L.J.M. Mulder, F.J. Maarse, W.P.B. Sjouw, & A.E. Akkerman (Eds.), Computers in Psychology: Applications in Education, Research and Psychodiagnostics (pp. 81–88). Amsterdam/Lisse: Swets & Zeitlinger.
Geuze, R.H., Mulder, L.J.M., & Ouwerkerk, R.J. van (1996). CAMERA+: A system for collection and correction of behavioural data. Groningen: IEC-programma.
Lamain, W., Sikken, J.A., Mulder, L.J.M., Ouwerkerk, R.J. van, & Veldman, J.B.P. (1994). Het Arbeidspsychologisch en Cognitief Ergonomisch Laboratorium [The Work-Psychological and Cognitive-Ergonomic Laboratory]. Heymans Bulletin, Psychological Institute, University of Groningen.
Maarse, F., Ghisaidoobé, H., & Bouwhuisen, C. (1998). Real-time aspecten van Windows 95/NT voor psychologische tests [Real-time aspects of Windows 95/NT for psychological tests]. Pre-publication: Nijmegen Institute for Cognition and Information.
Noldus, L.P.J.J. (1991). The Observer: A software system for collection and analysis of observational data. Behavior Research Methods, Instruments, & Computers, 23, 415–429.
Ouwerkerk, R.J. van, Kramer, R., Bos, J., & Mulder, L.J.M. (1999). Cognitive analysis and modeling of an ambulance dispatch task. (In this volume.)
Vicente, K.J. (1999). Cognitive work analysis: Toward safe, productive, and healthy computer-based work. Mahwah, NJ: Lawrence Erlbaum Associates.
Voort, A.G. van der, Ouwerkerk, R.J. van, & Mulder, L.J.M. (1996). Laboratorium voor Arbeid en Technologie: Aanbevelingen voor een eerste opzet [Laboratory for Labor and Technology: Recommendations for a first set-up]. Internal publication: Department of Experimental and Work Psychology, University of Groningen.
Chapter 18
Cognitive Analysis and Modeling of an Ambulance Dispatch Task
R.J. van Ouwerkerk, R. Kramer, J. Bos, and L.J.M. Mulder
Institute of Experimental and Occupational Psychology, University of Groningen, Grote Kruisstraat 2/1, 9751 MN Groningen, The Netherlands

Abstract

Simulating complex natural tasks in the laboratory can be facilitated by a thorough analysis and modeling of the environment and tasks in the field. A field study is conducted to determine the workload of ambulance dispatchers in the province of Groningen. The results of the study are used to build a spatial navigation and planning task that is simulated in the laboratory. A major advantage of the virtual environment is that it allows for controlled experimentation. The analysis approach and the development of the laboratory task environment are described in this article.
Introduction

The work of ambulance dispatchers can be characterized as a highly cognitively demanding task involving dynamic conditions, time pressure, high stakes, uncertain information and multiple actors. An ambulance dispatcher has to screen incoming calls, diagnose the nature and severity of the problem, determine the availability of ambulances and allocate them to the problem, while maintaining adequate resources for future problems. This type of work is often referred to as naturalistic decision making (O'Hare, Wiggins, Williams, & Wong, 1998). The aim of the present study is to identify the cognitively demanding characteristics of the dispatch work, determine its temporal and sequential structure, and create a model of the work that can be used to predict the consequences of the rearrangement or reallocation of tasks. The results of the analysis are used to develop a simulation of the task in a more experimental and controllable environment (Workplace for Analysis of Task Performance; Bos, Mulder, & van Ouwerkerk, 1999). The general outline and the preliminary results of the study are described in this paper.
Method: Stepwise Analysis

The work analysis techniques used in this study are based on earlier methods of stepwise analysis of human task performance (de Vries-Griever, 1989; Matern, 1984). In these methods the analysis process is structured in different steps and starts with a global orientation (a) on the characteristics of the work. Further steps are the analysis of (b) the formal assignment of tasks, (c) the actual and individual performance of the tasks, and (d) the consequences for workload and health. In the analysis presented here, only the orientation (system analysis) and the formal assignment and characterization of tasks (cognitive task analysis) are described. In the entire study all four steps were conducted (for the results of the last two steps, see Kramer, 1999; Messer & Duyndam, 1999). The results of the last two steps are used to construct task network models in which the time characteristics, the sequential dependencies and the workload of tasks are integrated. In the next sections, the method of analysis is discussed in more detail.

System Description

The system description not only serves as a context for interpreting the results of the cognitive task analysis, but also guides its main focus. Four aspects of the man-machine system are distinguished. First, the organization and structure of the system are described: the global structure of man-man and man-machine communication is determined, and the elements and boundaries of the system are defined. Second, the process of input, transformation and output is described; the purpose and functions of the elements are also characterized in this stage. The third aspect concerns the environment of the system, which constitutes the physical, cognitive, social and organizational layout of the system. Finally, the main characteristics of the employees are described, e.g. age, education, health and experience. The necessary information was gathered in interviews with ambulance dispatchers and their superiors, and from internal reports and documents.

Cognitive Task Analysis

The main characteristics of the work are determined in this step in order to identify and evaluate demanding aspects of the dispatch work. The work is decomposed into smaller task elements, comparable to the approach of a hierarchical task analysis (HTA; Annett & Duncan, 1967). The subtasks and their interrelations are arranged in a hierarchical-sequential table that describes the goals and structure of the work; this table constitutes a formal description of the structure of the ambulance dispatch work. Each element in the table is further described and evaluated in terms of cognitive and emotional demands. The evaluation is based on theories of information processing and decision making in complex environments (e.g. Card, Moran, & Newell, 1983; Anderson, 1983).
Figure 1. Typical network model for making a phone call.

Several subtasks are interrelated, and preconditions are defined for the relations between subtasks.

Modeling Task Performance and Workload

Modeling human task performance allows the prediction of the duration or time profiles of individual performance as a function of system design, task allocation and individual capacity (Laughery & Corker, 1997). In addition, the workload demands resulting from a reconfiguration of tasks and environment can be predicted. A prominent approach to modeling human performance is the construction of task network models. One major advantage of task network models is that they can be applied to a broad range of research questions. A tool that is specially designed to construct task network models is Micro Saint (Micro Analysis & Design, 1998). In this program, the sequence of tasks or task elements is represented as nodes and connections in networks. The time distribution of each element in the model can be defined, and by sharing variables (multiple paths) between tasks, the components of the network can be interrelated. This makes the program highly suitable for calculating the temporal characteristics of human behavior, e.g. the duration of individual tasks and the time profiles of rest time, single task and multiple task performance. In Figure 1, a typical example of a task network model of dialing a telephone number is displayed (adapted from Laughery & Corker, 1997).

It is assumed that the information processing capacity of the subject is limited and that workload or cognitive demands should remain within acceptable limits. A combined score of the cognitive and emotional demands of each subtask can be added to the model to calculate a total workload index. The construction and simulation of different models provides information on the amount and time distribution of workload under different conditions. A minimal sketch of such a task network simulation is given below.
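The following is a minimal sketch, in Python rather than Micro Saint, of the task-network idea: nodes with stochastic durations, probabilistic transitions between them, and Monte Carlo runs that yield a distribution of total task time. The network and its parameters are invented for illustration and do not reproduce the models built in this study.

```python
import random

# A toy task network: each node has a mean duration (seconds) and a list of
# possible successors with transition probabilities. 'end' terminates a run.
NETWORK = {
    "answer_call":      {"mean": 10.0, "next": [("clarify_request", 1.0)]},
    "clarify_request":  {"mean": 45.0, "next": [("select_ambulance", 0.9),
                                                ("answer_call", 0.1)]},
    "select_ambulance": {"mean": 20.0, "next": [("notify_station", 1.0)]},
    "notify_station":   {"mean": 15.0, "next": [("end", 1.0)]},
}

def sample_duration(mean):
    """Draw a task duration; an exponential is a common simple choice."""
    return random.expovariate(1.0 / mean)

def run_network(network, start="answer_call"):
    """Walk the network once, accumulating total time."""
    node, total = start, 0.0
    while node != "end":
        total += sample_duration(network[node]["mean"])
        successors, probs = zip(*network[node]["next"])
        node = random.choices(successors, weights=probs)[0]
    return total

# Monte Carlo: repeat the run many times to obtain a time profile.
times = sorted(run_network(NETWORK) for _ in range(1000))
print("median total time: %.1f s" % times[len(times) // 2])
print("95th percentile:   %.1f s" % times[int(len(times) * 0.95)])
```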
Results

Although the work of ambulance dispatchers is comparable to that of fire brigade dispatchers, the results are restricted to the former population. The global description of the work and the environment is given in the next section. The results of the cognitive task analysis are presented in the second section. In the last section a model of the work is presented, and the time profile and a total workload index for different distributions of emergency and non-emergency calls are calculated; the consequences for workload are described there as well.

System Analysis

The ambulance dispatchers are located in the city of Groningen. They are seated together in a control room (Dutch: Meldkamer Ambulancezorg en Brandweer). The main task of the ambulance dispatchers is the processing of emergency and non-emergency requests for ambulance transportation. Two out of every ten requests are emergency requests, so the larger part of the work (80%) consists of processing requests for non-emergency patient transfers. The dispatchers are responsible for the complete area of the province of Groningen. Once a request has been processed by the dispatcher, ambulances from different stations can be allocated to the accident or patient site. Twelve ambulance stations are present in the area, with a total of 38 available ambulances. A total of approximately 40,000 requests are processed per year. The work is a 24-hour service in which the employees rotate over three different shifts (day, evening and night), during the week as well as in the weekends.

The dispatcher is faced with a list of non-urgent patient transfers at the beginning of a day shift. In this list, the times, locations and types of the transfers are displayed. The normal routine of the work consists of communicating information and sending ambulances to the patients awaiting them. In between these daily activities, spontaneously incoming urgent 112-requests and non-urgent calls have to be processed. With both types of new request, the employee has to clarify the urgency and severity of the call and consecutively has to select the nearest ambulance(s). In the case of a calamity, which happens about twice a year, the unit serves as a command center with its own power supply and communication network.

Besides these active tasks, the dispatcher is also responsible for the preservation of the ambulance resources necessary to cover the area. A legal time constraint of 15 minutes between the call from the accident site and the presence of the ambulance must be guaranteed by the dispatcher. This may imply that other ambulances need to be re-deployed to cover gaps in the area. This task is highly relevant because the covered area is large but in some locations thinly populated. The employee cannot leave the workplace because emergency calls have to be processed immediately. As a consequence, all breaks and rests are spent at the workplace and very few opportunities for inherent recovery are present in the work.
Figure 2. Overview of the visual display unit at which the dispatch work takes place.

The employees work at one of six identical visual display units (VDUs) that are centered in the middle of the control room. Each VDU is equipped with several screens, keyboards and special input devices that are connected to separate computer and communication systems. Although some interconnections between these systems exist, they are not yet completely integrated. A distinction can be made between systems primarily for communication, for administration, and for automatic alarms and signals. Every call handled by the dispatcher is recorded both on the short and on the long term. The short-term recording provides the employee with the possibility to replay any request whenever the information was unclear or inaudible. The long-term recording is primarily for backup purposes and for monitoring deviations from the standard procedure afterwards. The general layout of the workplace of an ambulance dispatcher is displayed in Figure 2.

A total of nine ambulance dispatchers (eight males), varying in age between 38 and 58 (mean 50), are working at the unit. Most dispatchers are trained (male) nurses with several years of experience as ambulance personnel. During the week and in the day shift two dispatchers are working; in the weekend and in the evening and night shifts only one ambulance dispatcher is present. The small population of dispatchers leads to considerable problems in making up healthy rosters for the employees, especially during absence, sick leave and vacation periods. The highly specialized nature of the work makes the recruitment of temporary agency workers very unlikely.

Cognitive Task Analysis

The requests for ambulances can be divided into three different types: 112-requests (urgent), regularly ordered patient transfers (A3-requests) and incidental non-urgent patient transfers (A2/A3-requests). Before the dispatcher starts working, the most important
information about incidents during the previous shift is communicated, e.g. critical incidents, weather constraints and traffic information. The employee uses this information whenever it affects the standards of task performance. This task element is not very demanding cognitively because the information is only occasionally used.

After the information has been acquired, the dispatcher starts to process the requests for patient transfers from the list. The A3-requests on the list can be planned in advance, although in practice the dispatcher only plans half an hour to one hour ahead. Several steps are followed to process an A3-request. First the list is checked to find the location of the patient and the transfer address, and a preliminary ambulance station is chosen. The dispatcher has several production rules to accomplish this task, for example:

Production rule 1: If the patient has to be collected from area X, then choose station Y.

Or a more specific one:

Production rule 2: If the patient has to be collected from area X, and ambulance A is located further away than B but A is returning from a ride in the neighborhood, then select ambulance A.

The second production rule contains more conditions than the first one, indicating a difference in the complexity of the applied rules. After the selection of an ambulance, the station is notified and the necessary information is communicated to the ambulance personnel. The cognitive demands of the selection task depend on the number of conditions in the production rule that have to be matched in working memory.

The number of conditions in a production rule is further extended by a second main task in the work, namely maintaining coverage of the area. In this task the dispatcher has to mentally (re)construct a cognitive map of the geographical area and determine whether the area would be covered in the case of a 112-request. This is a continuous task, repeated with every request that is processed. Whenever an area is too sparsely covered or too vulnerable to guarantee the 15-minute arrival time of an ambulance, the dispatcher either has to select another ambulance station or send an ambulance to a strategic location to ensure coverage of that area. During the interviews the dispatchers indicated that this maintain-and-control task is one of the most cognitively demanding but also one of the most motivating aspects of their work. The task requires detailed geographical knowledge of the region, integrated with dynamic task knowledge concerning the availability of ambulances. Contrary to other comparable ambulance dispatch control units, they have no geographical information system (GIS) to support this task. Due to the amount and complexity of the information, the dispatcher is only partially aware of the situation, but he can look up additional information in the computer systems to update the cognitive map or model. If an ambulance is deployed to cover an area, the same subtasks have to be performed as with a normal ambulance request.

In between the first two tasks (processing the A3-list and maintaining coverage) a 112-request may arise. As soon as a 112-call arrives, the dispatcher puts the current task aside and focuses on the new incoming call. The first subtask is to clarify the request of the caller. Most 112-calls concern minor or major accidents, and the main goal of this task is to extract relevant and crucial information from the caller.
In severe situations the caller is often in a state of panic, which means that the collection of information is a cognitively and emotionally demanding job. The next fragment illustrates the often chaotic information transfer:
After a caller had given a detailed description of a victim of a car accident, the dispatcher asked: “But where are you standing?” “I am in front of the HEMA”, he replied. “Yes, but in front of which HEMA?” “You know, the HEMA across the main street.” The caller obviously had no idea that he was talking to someone who was not located in his village, and probably assumed he was talking to someone at the nearest police office.

The dispatcher has to filter all relevant information and also to check whether the call is a fake one. Although formal procedures for clarifying requests exist, most dispatchers have adopted their own interviewing strategy. The relevant data are submitted to and administrated in the computer system. Relevant data include the location of the accident, the number of persons involved, the type of accident, the urgency of the accident, specific details and the name of the caller. In the next task the nearest ambulance station is selected. Contrary to processing a request from the list, this task has one simple production rule:

Production rule 3: If the accident is at place X, then search for the nearest ambulance station and select an appropriate ambulance.

This production rule places demands especially on the person's geographical knowledge of the region. After the selection of the nearest ambulance, the maintain-and-control task has to be performed again. The processing of a spontaneous A3-request is comparable to that of a 112-request. The main differences are that in the former case the information is communicated by a professional, and that the time pressure on the allocation of an ambulance is lower.

In Figure 3, the main tasks and subtasks are arranged in a table in HTA format. The plans or procedures to accomplish the tasks can be described as follows (a sketch of the production-rule formalism follows after the plans):

Plan 0: Throughout, do 1 and 2; if a new request arrives, do 3 or 4; else wait.
Plan 1: Throughout, do 1.1 to 1.3; if finished, do 2; or, if a new request arrives, do 3 or 4.
Plan 2: Throughout, do 2.1; if the area is not covered, do 2.2 to 2.4; or, if a new request arrives, do 3 or 4.
Plan 3: Throughout, do 3.1 to 3.4; if finished, do 2.
Plan 4: If an immediate ambulance is necessary, do 4.1 to 4.4 throughout; if the request is for a delayed ambulance, place it on the list; if a 112-request arrives, do 3; or, if finished, do 2.
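As an illustration of how such production rules could be represented computationally, here is a minimal sketch of a condition-action rule matcher. The rule contents follow the examples above, but the representation itself is an assumption made for this example, not part of the field study.

```python
# Each production rule pairs a condition on the current situation with an
# action; the number of condition clauses reflects the rule's complexity,
# and hence its working-memory demand as discussed in the text.

def rule_1(situation):
    """If the patient has to be collected from area X, choose station Y."""
    if situation["pickup_area"] == "X":
        return "choose station Y"

def rule_2(situation):
    """More specific: prefer a returning ambulance in the neighborhood."""
    if (situation["pickup_area"] == "X"
            and situation["dist_A"] > situation["dist_B"]
            and situation["A_returning_nearby"]):
        return "select ambulance A"

# More specific rules are tried first, as a dispatcher would.
RULES = [rule_2, rule_1]

def dispatch(situation):
    for rule in RULES:
        action = rule(situation)
        if action is not None:
            return action
    return "no rule matched: handle manually"

print(dispatch({"pickup_area": "X", "dist_A": 12.0, "dist_B": 8.0,
                "A_returning_nearby": True}))   # -> select ambulance A
print(dispatch({"pickup_area": "X", "dist_A": 12.0, "dist_B": 8.0,
                "A_returning_nearby": False}))  # -> choose station Y
```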
Figure 3. Simplified HTA-table with the four main tasks and several subtasks (see the text for an explanation of the (sub)tasks and the plans).

Modeling Task Performance and Workload

As part of the field study, the (sub)tasks of 12 dispatchers were observed during an 8-hour working day (Messer & Duyndam, 1999). The observed tasks are comparable to the tasks depicted in the HTA-table and comprise the categories 112-request, A3-request, submit information to system, communicate, contact ambulance station, waiting for a call, and others. The mean frequency, the duration and the standard deviation of each task, averaged across all subjects, were computed, as well as the transition probability and transition time between pairs of observation categories. These variables are necessary to construct the task network model of the dispatch work and to predict 8-hour time profiles.

To predict a workload index over a working day, the emotional and cognitive load of each task was estimated on a five-point scale by the researcher (Kramer, 1999) in co-operation with the dispatchers. The ratings in Table 1 are assigned to each task in the model. The cognitive and emotional estimates are assigned at the start of a task and reset at the end of the task; whenever two tasks coincide, the loads of the two tasks are summed.

Several models are constructed to determine the consequences of different distributions of 112-requests and A3-requests, both on the time profiles of rest time, single task and multiple task performance and on workload demands. Four different models are compared, with 112:A3 request ratios of 20:80 (the present situation), 30:70, 40:60 and 50:50. The time profiles and the workload index are predicted throughout an 8-hour working day. Each model is run 100 times with the same random seed. The results of running the models are used to predict the consequences of incidental and structural increases of
112-requests, and to determine the optimal balance between the number and severity of calls and the minimal number of ambulance dispatchers.
Table 1. Estimates by the researcher (five-point scale) of the cognitive and emotional load of different subtasks in the dispatcher task. The estimation is based on information provided by the dispatchers.

Subtask                      Cognitive load   Emotional load
112-request                        4                5
A3-request                         4                2
Submit information                 2                1
Communicate                        2                2
Contact ambulance station          2                2
Wait                               1                1
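To make the workload computation concrete, here is a minimal sketch of how a momentary workload index could be derived from the load estimates in Table 1 above under the stated rules (load assigned at task onset, reset at task end, summed when tasks coincide). The episode timings are invented for the example; only the load values come from Table 1.

```python
# Load estimates from Table 1: (cognitive, emotional) per subtask.
LOAD = {
    "112-request": (4, 5),
    "A3-request": (4, 2),
    "submit information": (2, 1),
    "communicate": (2, 2),
    "contact ambulance station": (2, 2),
    "wait": (1, 1),
}

def workload_profile(episodes, horizon_s, step_s=60):
    """Sample the summed cognitive+emotional load at fixed intervals.

    episodes: list of (task, start_s, end_s); overlapping tasks are
    summed, as in the model described in the text.
    """
    profile = []
    for t in range(0, horizon_s, step_s):
        active = [task for task, s, e in episodes if s <= t < e]
        profile.append(sum(sum(LOAD[task]) for task in active))
    return profile

# A fabricated 10-minute fragment of a shift: an A3-request is interrupted
# by a 112-call, briefly doubling the momentary load.
episodes = [
    ("wait", 0, 120),
    ("A3-request", 120, 420),
    ("112-request", 300, 480),   # coincides with the A3-request for 2 min
    ("contact ambulance station", 480, 600),
]
print(workload_profile(episodes, horizon_s=600))
# -> [2, 2, 6, 6, 6, 15, 15, 9, 4, 4]
```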
Simulating the Dispatch Task in a Virtual Environment

As was shown in the previous sections, the dispatch work can be considered highly cognitively demanding. Especially the continuous control and maintenance of adequate resources without the support of visual aids seems to load working memory capacity considerably. Dispatching ambulances could probably be facilitated by using a graphical interface with information on locations and on the presence of ambulances. However, the actual work environment of the dispatchers does not allow for controlled experimentation during daily tasks, because of the high risks involved and the variability of the requests. This is the main reason to create a simulation of the dispatch work in a virtual environment that can be used in the laboratory.

Bos et al. (1999) describe the general design of a laboratory environment for controlled experiments with complex tasks. The dispatch task is implemented in this environment. The basic subtasks of the dispatch task are included in the simulation task, but new and different forms of computer support are added to the task. The efficiency, user-friendliness and cognitive load of different forms of decision support can be evaluated in this way. More detailed analysis of human task performance is also possible in this virtual environment. The task environment is structured in such a way that both individual work and group work can be studied. This implies that an open environment is created in which all the subjects in the work environment can access information on the availability and location of ambulances, and in which information can be transferred between dispatchers.

The structure of the virtual environment of the dispatch task (Figure 4) is based on the general layout of the task environment as described by Bos et al. (1999). Two dispatchers
can work separately to process incoming and planned requests for ambulances. However, they draw on the same pool of ambulances. Each VDU work unit is equipped with a computer system with the necessary local programs, e.g. GIS, communication and administration tools, to accomplish the tasks. The local programs communicate with a central master system that coordinates the messages and events from and to the units.

Three different databases are located at this master system. The first database contains a list of possible 112-requests, based on actual 112-requests over the last years. The task program on the central system can at any time randomly select a 112-request from this database; the selected 112-request is then assigned and transferred to one of the work units. At the start of the task a random list of planned A3-rides is selected from a second database with A3-requests, also based on actual A3-rides. Processing this list is the basic task to be accomplished by the subjects. From the same list, incidental A3-requests are selected at random and transferred to a work unit. The third database contains geographical information on the province of Groningen. This database is divided into two parts, a static and a dynamic part. All roads and streets, crossings, cities and villages, zip codes and other such information are present as static information in the database. The dynamic part of the database contains information on the locations of the ambulance stations and the current positions of the ambulances on the road. The mapping between ambulance requests and geographical information is also represented in the dynamic part of the database. Both the static data and the dynamic information about the locations of ambulances are communicated to the work units and used to build a (graphical) user interface. This user interface is used to support the subject with specific information on the nearest ambulance station and on the coverage of the area. The graphical information is also projected on a large screen by a beamer, to provide the subject with general informational cues about the region, the positions of ambulances, and the like.

One of the important aspects of a realistic simulation environment is the natural behavior of the task elements. In the real task, the ambulance personnel determine the shortest or fastest way to an accident spot; only occasionally does the dispatcher provide information on traffic or weather constraints. In the virtual environment, driving the ambulances is accomplished by autonomous agents. A request for an ambulance is sent from the work unit to the central system, which transfers the request to the autonomous agent program. This program can accept (and occasionally refuse) the request, after which the agent will search for the most appropriate route to the place of the accident. The data for the route are found in the static part of the geographical database; the agent frequently reports its current location within the region, and this information is stored in the dynamic part of the database, where it is available to the dispatcher at work.
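As a hypothetical illustration of the agent behavior described above, the sketch below shows an autonomous agent that accepts a request and searches a static road graph for a route, using Dijkstra's algorithm as one plausible choice of route search. The graph, the names and the acceptance rule are invented for the example.

```python
import heapq
import random

# Static part: a toy road graph with travel times in minutes between nodes.
ROADS = {
    "station_A": {"village_1": 7, "city_center": 5},
    "village_1": {"station_A": 7, "accident": 6},
    "city_center": {"station_A": 5, "accident": 9},
    "accident": {"village_1": 6, "city_center": 9},
}

def shortest_route(graph, start, goal):
    """Dijkstra's algorithm over the static road graph."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph[node].items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return None

class AmbulanceAgent:
    def __init__(self, name, location):
        self.name, self.location = name, location

    def handle_request(self, accident_site):
        """Accept (or occasionally refuse) a request and plan a route."""
        if random.random() < 0.05:          # occasional refusal, as in the text
            return None
        cost, path = shortest_route(ROADS, self.location, accident_site)
        return {"agent": self.name, "eta_min": cost, "route": path}

agent = AmbulanceAgent("ambulance_12", "station_A")
print(agent.handle_request("accident"))
# e.g. {'agent': 'ambulance_12', 'eta_min': 13,
#       'route': ['station_A', 'village_1', 'accident']}
```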
Figure 4. Structure of the virtual dispatch environment. Two work units are connected to a central server that implements the routines for the task and controls the communication between three different database systems. If a request is submitted to the system, an autonomous agent simulates the ambulance behavior.

It will be clear that several software components are necessary to simulate the dispatch task. The components that have to be developed are: 1) the implementation of the dispatch task world with its task objects, e.g. 112-generation and the interfaces; 2) the module for the communication between the central and local programs; 3) the programs for the autonomous agents; and 4) a script parsing module that allows for manipulating and controlling experiments. In this last module the necessary specifications of an experiment, such as the number, location, timing and type of 112- and A3-requests, are provided and compiled into an experimental session; a sketch of what such a specification might look like is given below.
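The format of such an experiment specification is not described further in the text; the following is a minimal, hypothetical sketch of a script parsing module that turns a plain-text specification of requests into session events. The field names and syntax are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class RequestEvent:
    """One scheduled request in an experimental session."""
    time_min: int      # minutes from session start
    kind: str          # "112" or "A3"
    location: str      # place in the simulated province

def parse_session_script(text):
    """Parse lines of the form 'time kind location' into session events.

    Example line:  '5  112  Winsum'
    """
    events = []
    for line in text.strip().splitlines():
        time_min, kind, location = line.split()
        events.append(RequestEvent(int(time_min), kind, location))
    return sorted(events, key=lambda e: e.time_min)

script = """
5   112  Winsum
12  A3   Delfzijl
12  A3   Groningen
47  112  Appingedam
"""
for event in parse_session_script(script):
    print(event)
```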
Conclusions and Discussion

The approach adopted in this study indicates that a thorough analysis of real complex tasks can provide valuable information for the simulation of those tasks in an experimental environment. The system description is useful in determining the context and the rules of the ‘game’ being played. In the cognitive task analysis, the structure of and the interrelationships between the tasks are further defined and used to make up the specific goals and actual actions to be performed in the game. Finally, the modeling of the workload and time profiles in the dispatch task is used to predict possible limitations in human information processing capacity. Without adequate support, these limitations will affect task performance or increase mental effort. Together, the three analysis steps put the final play together.

The question remains as to what should be studied in the virtual dispatch environment. The dispatch task is clearly a complex task in which naturalistic decision making and spatial planning figure prominently. The virtual environment allows for several research questions, of which a few are listed below:

1. What type of interface is optimal in supporting the dispatch task and maintaining awareness of the availability of ambulances?
2. How can the user interface adapt to differences in expertise between subjects and to the consequences of fatigue and workload upon subjects?
3. How do interruptions affect task performance in the dispatch task?
4. What strategies do subjects adopt to cope with excessive cognitive demands, and how can we support these?
5. How do dispatchers co-operate when faced with restricted ambulance resources?

The research questions described are only global indications of the research topics that can be studied in the virtual dispatch environment. The next few years will be necessary to answer at least some of the questions addressed above.

References

Anderson, J.R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.
Annett, J., & Duncan, K.D. (1967). Task analysis and training design. Occupational Psychology, 41, 211–221.
Bos, J., Mulder, L.J.M., & van Ouwerkerk, R.J. (1999). Workplace for analysis of task performance. (In this volume.)
Card, S.K., Moran, T.P., & Newell, A. (1983). The psychology of human-computer interaction. Hillsdale, NJ: Erlbaum.
Kramer, R. (1999). Analyse van het werk van centralisten van de Meldkamer Groningen [Analysis of the work of dispatchers of the Groningen control room]. Internal graduation report: Rijksuniversiteit Groningen.
Laughery, K.R., & Corker, K. (1997). Computer modeling and simulation. In G. Salvendy (Ed.), Handbook of human factors and ergonomics (2nd ed.). New York: John Wiley.
Matern, B. (1984). Psychologische Arbeitsanalyse [Psychological work analysis]. Berlin: Springer Verlag.
Messer, N., & Duyndam, K. (1999). Een veldonderzoek naar de mentale belasting van meldkamerwerk [A field study of the mental load of control-room work]. Internal project report: Rijksuniversiteit Groningen.
Micro Analysis and Design (1998). Micro Saint user's manual, Windows version 3.0. Boulder, CO: Micro Analysis and Design.
O'Hare, D., Wiggins, M., Williams, A., & Wong, W. (1998). Cognitive task analyses for decision centred design and training. Ergonomics, 41(11), 1698–1718.
Vries-Griever, A.H.G. de (1989). De methode van psychologische arbeidsanalyse [The method of psychological work analysis]. In T.F. Meijman (Ed.), Mentale belasting en werkstress [Mental load and work stress]. Assen: Van Gorcum.
Author Index

Adema, J.
Azzolini, M. 91
Behrendt, E. 71
Bendermacher, N. 120
Boom, J. 136
Bos, J. 176, 220, 235
Bouwhuisen, C.F. 167
Brand, N. 71, 136
Corte, G. de 3
Delfos, M.F. 57
Erkens, G. 28
Geel, R. van 120
Groeneveld, W.H. 91
Haan, A. de 191
Haan, E.H.F. de 16
Heemskerk, M. 205
Hoekstra, E. 176
Jaspers, J. 28
Kanselaar, G. 28
Kessels, R.P.C. 16
Kramer, R. 235
Kunst, H. 136
Lafosse, C. 3
Leenen, I. 3
Linker, K. 154
Lodewijkx, H. 135
Maarse, F.J. 167
Maes, H. 3
Massink, J. 71
Mey, H. de 120
Moeremans, M. 3
Moormann, P.P.
Mulder, L.J.M. 176, 205, 220, 235
Ouwerkerk, R.J. van 220, 235
Panhuijsen, G. 136
Passchier, J. 91
Postma, A. 16
Quispel, L. 205
Ruiter, J.A. 176
(Tabachneck-)Schijf, H.J.M. 28
Smit, J.R. 176
Tulen, J.H.M. 91
Vandenbussche, E. 3
Veldman, J.B.P. 176
Ven, J. van de 191
Vos, H.J. 107
Vries, J.A. de 91
Warris, S. 205
Wetering, B.J.M. van de 91
Witteman, C. 144
Wolffelaar, P.C. van 205
Wybenga, D. 176
COMPUTERS in PSYCHOLOGY

1. Computers in de psychologie (in Dutch). F.J. Maarse, P. Wittenburg, E.A. Zuiderveen et al. 1985. ISBN 90 265 0618 X
2. Methods, Instrumentation and Psychodiagnostics. F.J. Maarse, L.J.M. Mulder, W.P.B. Sjouw & A.E. Akkerman. 1988. ISBN 90 265 0896 4
3. Applications in Education, Research and Psychodiagnostics. L.J.M. Mulder, F.J. Maarse, W.P.B. Sjouw & A.E. Akkerman. 1991. ISBN 90 265 1170 1
4. Tools for Experimental and Applied Psychology. F.J. Maarse, A.E. Akkerman, A.N. Brand, L.J.M. Mulder & M.J. van der Stelt. 1993. ISBN 90 265 1268 6
5. Applications, Methods, and Instrumentation. A.E. Akkerman, A.N. Brand & M.J. van der Stelt. 1993. ISBN 90 265 1415 8
6. Cognitive Ergonomics, Clinical Assessment and Computer-Assisted Learning. B.P.L.M. den Brinker, P.J. Beek, A.N. Brand, F.J. Maarse & L.J.M. Mulder. 1999. ISBN 90 265 1553 7
7. Clinical Assessment, Computerized Methods, and Instrumentation. F.J. Maarse, A.E. Akkerman, A.N. Brand & L.J.M. Mulder. 2003. ISBN 90 265 1553 7