Lecture Notes in Artificial Intelligence Edited by R. Goebel, J. Siekmann, and W. Wahlster
Subseries of Lecture Notes in Computer Science
6334
Yiyu Yao Ron Sun Tomaso Poggio Jiming Liu Ning Zhong Jimmy Huang (Eds.)
Brain Informatics
International Conference, BI 2010
Toronto, ON, Canada, August 28-30, 2010
Proceedings
Series Editors
Randy Goebel, University of Alberta, Edmonton, Canada
Jörg Siekmann, University of Saarland, Saarbrücken, Germany
Wolfgang Wahlster, DFKI and University of Saarland, Saarbrücken, Germany

Volume Editors
Yiyu Yao, University of Regina, Regina, SK, Canada, E-mail: [email protected]
Ron Sun, Rensselaer Polytechnic Institute, Troy, NY, USA, E-mail: [email protected]
Tomaso Poggio, Massachusetts Institute of Technology, Cambridge, MA, USA, E-mail: [email protected]
Jiming Liu, Hong Kong Baptist University, Kowloon Tong, Hong Kong, E-mail: [email protected]
Ning Zhong, Maebashi Institute of Technology, Maebashi-City, Japan, E-mail: [email protected]
Jimmy Huang, York University, Toronto, ON, Canada, E-mail: [email protected]

Library of Congress Control Number: 2010932525
CR Subject Classification (1998): I.2, I.4, I.5, H.3, H.5, H.4
LNCS Sublibrary: SL 7 – Artificial Intelligence
ISSN: 0302-9743
ISBN-10: 3-642-15313-5 Springer Berlin Heidelberg New York
ISBN-13: 978-3-642-15313-6 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. springer.com © Springer-Verlag Berlin Heidelberg 2010 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper 06/3180
Preface
This volume contains the papers selected for presentation at the 2010 International Conference on Brain Informatics (BI 2010), held at York University, Toronto, Canada, during August 28–30, 2010. It was organized by the Web Intelligence Consortium (WIC), the IEEE Computational Intelligence Society Task Force on Brain Informatics (IEEE-CIS TF-BI), and York University. The conference was held jointly with the 2010 International Conference on Active Media Technology (AMT 2010).

Brain informatics (BI) has emerged as an interdisciplinary research field that focuses on studying the mechanisms underlying the human information processing system (HIPS). It investigates the essential functions of the brain, ranging from perception to thinking, and encompassing such areas as multi-perception, attention, memory, language, computation, heuristic search, reasoning, planning, decision-making, problem-solving, learning, discovery, and creativity. The goal of BI is to develop and demonstrate a systematic approach to achieving an integrated understanding of both the macroscopic- and microscopic-level working principles of the brain, by means of experimental, computational, and cognitive neuroscience studies, as well as by utilizing advanced Web intelligence (WI)-centric information technologies.

BI represents a potentially revolutionary shift in the way that research is undertaken. It attempts to capture new forms of collaborative and interdisciplinary work. In this vision, new kinds of BI methods and global research communities will emerge, through infrastructure on the wisdom Web and knowledge grids that enables high-speed, distributed, large-scale analysis and computation, and radically new ways of sharing data and knowledge.

The Brain Informatics conferences started with the First WICI International Workshop on Web Intelligence Meets Brain Informatics (WImBI 2006), held in Beijing, China, December 15–16, 2006. The second conference, Brain Informatics 2009, was held again in Beijing, China, October 22–24, 2009. This is the first conference series specifically dedicated to interdisciplinary research in BI; it provides an international forum to bring together researchers and practitioners from diverse fields, such as computer science, information technology, artificial intelligence, Web intelligence, cognitive science, neuroscience, medical science, life science, economics, data mining, data science and knowledge science, intelligent agent technology, human–computer interaction, complex systems, and systems science, to present the state of the art in the development of BI, and to explore the main research problems of BI that lie in the interplay between the studies of the human brain and the research of informatics.

All papers submitted to BI 2010 were rigorously reviewed by three committee members and external reviewers. The selected papers offer new insights into the research challenges and development of BI.
BI research has two mutually supporting directions. In one direction, one models and characterizes the functions of the human brain based on the notions of information processing systems, and WI-centric information technologies are applied to support brain science studies. For instance, the wisdom Web, knowledge grids, and cloud computing enable high-speed, large-scale analysis, simulation, and computation, as well as new ways of sharing research data and scientific discoveries. In the other direction, informatics-enabled brain studies, e.g., based on fMRI, EEG, and MEG, significantly broaden the spectrum of theories and models of brain sciences and offer new insights into the development of human-level intelligence towards brain-inspired wisdom Web computing.

BI 2010 had a very exciting program with many features, including keynote talks, regular technical sessions, WIC featured sessions, and social programs. All of these would not have been possible without the great support of the authors in submitting and presenting their best and latest research results; the distinguished contributions of the keynote speakers, Vinod Goel (York University, Canada), Jianhua Ma (Hosei University, Japan), Ben Shneiderman (University of Maryland, USA), and Yingxu Wang (University of Calgary, Canada), in preparing and delivering their very stimulating talks; and the generous dedication of the Program Committee members and the external reviewers in reviewing the submitted papers.

We wish to express our gratitude to all authors, the keynote speakers, and the members of the conference committees for their instrumental and unfailing support. BI 2010 could not have taken place without the great team effort of the Local Organizing Committee, and the support of the International WIC Institute, Beijing University of Technology, China, and York University, Canada. Our special thanks go to Aijun An, Juzhen Dong, Jian Yang, and Daniel Tao for organizing and promoting BI 2010 and coordinating with AMT 2010. We are grateful to the Springer Lecture Notes in Computer Science (LNCS/LNAI) team for their generous support. We thank Alfred Hofmann and Anna Kramer of Springer for their help in coordinating the publication of this special volume in an emerging and interdisciplinary research field.

August 2010
Yiyu Yao Ron Sun Tomaso Poggio Jiming Liu Ning Zhong Jimmy Huang
Conference Organization

Conference General Chairs
Tomaso Poggio, Massachusetts Institute of Technology, USA
Jiming Liu, International WIC Institute, Beijing University of Technology, China; Hong Kong Baptist University, Hong Kong

Program Chairs
Yiyu Yao, International WIC Institute, Beijing University of Technology, China; University of Regina, Canada
Ron Sun, Rensselaer Polytechnic Institute, USA

Organizing Chair
Jimmy Huang, York University, Toronto, Canada

Publicity Chairs
Jian Yang, International WIC Institute, Beijing University of Technology, China
Daniel Tao, Queensland University of Technology, Australia

IEEE-CIS TF-BI Chair
Ning Zhong, Maebashi Institute of Technology, Japan; International WIC Institute, Beijing University of Technology, China

WIC Co-chairs/Directors
Ning Zhong, Maebashi Institute of Technology, Japan
Jiming Liu, Hong Kong Baptist University, Hong Kong

WIC Advisory Board
Edward A. Feigenbaum, Stanford University, USA
Setsuo Ohsuga, University of Tokyo, Japan
Benjamin Wah, University of Illinois, Urbana-Champaign, USA
Philip Yu, University of Illinois, Chicago, USA
L.A. Zadeh, University of California, Berkeley, USA

WIC Technical Committee
Jeffrey Bradshaw, UWF/Institute for Human and Machine Cognition, USA
Nick Cercone, York University, Canada
Dieter Fensel, University of Innsbruck, Austria
Georg Gottlob, Oxford University, UK
Lakhmi Jain, University of South Australia, Australia
Jianchang Mao, Yahoo! Inc., USA
Pierre Morizet-Mahoudeaux, Compiègne University of Technology, France
Hiroshi Motoda, Osaka University, Japan
Toyoaki Nishida, Kyoto University, Japan
Andrzej Skowron, Warsaw University, Poland
Jinglong Wu, Okayama University, Japan
Xindong Wu, University of Vermont, USA
Yiyu Yao, University of Regina, Canada

Program Committee
John R. Anderson, Carnegie Mellon University, USA
Chang Cai, National Rehabilitation Center for Persons with Disabilities, Japan
Xiaocong Fan, The Pennsylvania State University, USA
Mohand-Said Hacid, Université Claude Bernard Lyon 1, France
D. Frank Hsu, Fordham University, USA
Kazuyuki Imamura, Maebashi Institute of Technology, Japan
Kuncheng Li, Xuanwu Hospital, China
Peipeng Liang, Beijing University of Technology, China
Pawan Lingras, Saint Mary's University, Canada
Duoqian Miao, Tongji University, China
Mariofanna Milanova, University of Arkansas at Little Rock, USA
Sankar Kumar Pal, Indian Statistical Institute, India
Frank Ritter, Penn State University, USA
Hideyuki Sawada, Kagawa University, Japan
Lael Schooler, Max Planck Institute for Human Development, Germany
Tomoaki Shirao, Gunma University Graduate School of Medicine, Japan
Andrzej Skowron, Warsaw University, Poland
Dominik Slezak, University of Warsaw and Infobright Inc., Poland
Diego Sona, Fondazione Bruno Kessler, Italy
Piotr S. Szczepaniak, Technical University of Lodz, Poland
Shusaku Tsumoto, Shimane University, Japan
Frank van der Velde, Leiden University, The Netherlands
Guoyin Wang, Chongqing University of Posts and Telecommunications, China
Jinglong Wu, Okayama University, Japan
Jian Yang, International WIC Institute, Beijing University of Technology, China
Fabio Massimo Zanzotto, University of Rome "Tor Vergata", Italy
Bo Zhang, Tsinghua University, China
Yanqing Zhang, Georgia State University, USA
Ning Zhong, Maebashi Institute of Technology, Japan
Haiyan Zhou, International WIC Institute, Beijing University of Technology, China
Yangyong Zhu, Fudan University, China

Additional Reviewers
Paolo Avesani, Emanuele Olivetti, Yang Mei, Linchang Qin, Andrea Mognon, Shujuan Zhang
Table of Contents

Keynote Talks

Fractionating the Rational Brain
   Vinod Goel

Cognitive Informatics and Denotational Mathematical Means for Brain Informatics
   Yingxu Wang

Cognitive Computing

An Adaptive Model for Dynamics of Desiring and Feeling Based on Hebbian Learning
   Tibor Bosse, Mark Hoogendoorn, Zulfiqar A. Memon, Jan Treur, and Muhammad Umair

Modelling the Emergence of Group Decisions Based on Mirroring and Somatic Marking
   Mark Hoogendoorn, Jan Treur, C. Natalie van der Wal, and Arlette van Wissen

Rank-Score Characteristics (RSC) Function and Cognitive Diversity
   D. Frank Hsu, Bruce S. Kristal, and Christina Schweikert

Cognitive Effort for Multi-agent Systems
   Luca Longo and Stephen Barrett

Behavioural Abstraction of Agent Models Addressing Mutual Interaction of Cognitive and Affective Processes
   Alexei Sharpanskykh and Jan Treur

Data Brain and Analysis

The Effect of the Normalization Strategy on Voxel-Based Analysis of DTI Images: A Pattern Recognition Based Assessment
   Gloria Díaz, Gonzalo Pajares, Eduardo Romero, Juan Alvarez-Linera, Eva López, Juan Antonio Hernández-Tamames, and Norberto Malpica

Single Trial Classification of EEG and Peripheral Physiological Signals for Recognition of Emotions Induced by Music Videos
   Sander Koelstra, Ashkan Yazdani, Mohammad Soleymani, Christian Mühl, Jong-Seok Lee, Anton Nijholt, Thierry Pun, Touradj Ebrahimi, and Ioannis Patras

Brain Signal Recognition and Conversion towards Symbiosis with Ambulatory Humanoids
   Yasuo Matsuyama, Keita Noguchi, Takashi Hatakeyama, Nimiko Ochiai, and Tatsuro Hori

Feature Rating by Random Subspaces for Functional Brain Mapping
   Diego Sona and Paolo Avesani

Recurrence Plots for Identifying Memory Components in Single-Trial EEGs
   Nasibeh Talebi and Ali Motie Nasrabadi

Comparing EEG/ERP-Like and fMRI-Like Techniques for Reading Machine Thoughts
   Fabio Massimo Zanzotto and Danilo Croce

Improving Individual Identification in Security Check with an EEG Based Biometric Solution
   Qinglin Zhao, Hong Peng, Bin Hu, Quanying Liu, Li Liu, YanBing Qi, and Lanlan Li

Neuronal Modeling and Brain Modeling

Segmentation of 3D Brain Structures Using the Bayesian Generalized Fast Marching Method
   Mohamed Baghdadi, Nacéra Benamrane, and Lakhdar Sais

Domain-Specific Modeling as a Pragmatic Approach to Neuronal Model Descriptions
   Ralf Ansorg and Lars Schwabe

Guessing What's on Your Mind: Using the N400 in Brain Computer Interfaces
   Marijn van Vliet, Christian Mühl, Boris Reuderink, and Mannes Poel

A Brain Data Integration Model Based on Multiple Ontology and Semantic Similarity
   Li Xue, Yun Xiong, and Yangyong Zhu

Perception and Information Processing

How Does Repetition of Signals Increase Precision of Numerical Judgment?
   Eike B. Kroll, Jörg Rieger, and Bodo Vogt

Sparse Regression Models of Pain Perception
   Irina Rish, Guillermo A. Cecchi, Marwan N. Baliki, and A. Vania Apkarian

A Study of Mozart Effect on Arousal, Mood, and Attentional Blink
   Chen Xie, Lun Zhao, Duoqian Miao, Deng Wang, Zhihua Wei, and Hongyun Zhang

Learning

Attentional Disengage from Test-Related Pictures in Test-Anxious Students: Evidence from Event-Related Potentials
   Rui Chen and Renlai Zhou

Concept Learning in Text Comprehension
   Manas Hardas and Javed Khan

A Qualitative Approach of Learning in Parkinson's Disease
   Delphine Penny-Leguy and Josiane Caron-Pargue

Cognition-Inspired Applications

Modelling Caregiving Interactions during Stress
   Azizi Ab Aziz, Jan Treur, and C. Natalie van der Wal

Computational Modeling and Analysis of Therapeutical Interventions for Depression
   Fiemke Both, Mark Hoogendoorn, Michel C.A. Klein, and Jan Treur

A Time Series Based Method for Analyzing and Predicting Personalized Medical Data
   Qinwin Vivian Hu, Xiangji Jimmy Huang, William Melek, and C. Joseph Kurian

Language Analytics for Assessing Brain Health: Cognitive Impairment, Depression and Pre-symptomatic Alzheimer's Disease
   William L. Jarrold, Bart Peintner, Eric Yeh, Ruth Krasnow, Harold S. Javitz, and Gary E. Swan

The Effect of Sequence Complexity on the Construction of Protein-Protein Interaction Networks
   Mehdi Kargar and Aijun An

Data Fusion and Feature Selection for Alzheimer's Diagnosis
   Blake Lemoine, Sara Rayburn, and Ryan Benton

A Cognitive Architecture Based on Neuroscience for the Control of Virtual 3D Human Creatures
   Felipe Rodríguez, Francisco Galvan, Félix Ramos, Erick Castellanos, Gregorio García, and Pablo Covarrubias

Towards Inexpensive BCI Control for Wheelchair Navigation in the Enabled Environment – A Hardware Survey
   Kenyon Stamps and Yskandar Hamam

Expression Recognition Methods Based on Feature Fusion
   Chang Su, Jiefang Deng, Yong Yang, and Guoyin Wang

Investigation on Human Characteristics of Japanese Katakana Recognition by Active Touch
   Suguru Yokotani, Jiajia Yang, and Jinglong Wu

WICI Perspectives on Brain Informatics

Towards Systematic Human Brain Data Management Using a Data-Brain Based GLS-BI System
   Jianhui Chen, Ning Zhong, and Runhe Huang

The Role of the Parahippocampal Cortex in Memory Encoding and Retrieval: An fMRI Study
   Mi Li, Shengfu Lu, Jiaojiao Li, and Ning Zhong

Brain Activation and Deactivation in Human Inductive Reasoning: An fMRI Study
   Peipeng Liang, Yang Mei, Xiuqin Jia, Yanhui Yang, Shengfu Lu, Ning Zhong, and Kuncheng Li

Clustering of fMRI Data Using Affinity Propagation
   Dazhong Liu, Wanxuan Lu, and Ning Zhong

Interaction between Visual Attention and Goal Control for Speeding Up Human Heuristic Search
   Rifeng Wang, Jie Xiang, and Ning Zhong

The Role of Posterior Parietal Cortex in Problem Representation
   Jie Xiang, Yulin Qin, Junjie Chen, Haiyan Zhou, Kuncheng Li, and Ning Zhong

Basic Level Advantage and Its Switching during Information Retrieval: An fMRI Study
   Haiyan Zhou, Jieyu Liu, Wei Jing, Yulin Qin, Shengfu Lu, Yiyu Yao, and Ning Zhong

Author Index
Fractionating the Rational Brain

Vinod Goel
York University, Canada
http://www.yorku.ca/vgoel
Considerable progress has been made over the past decade in our understanding of the neural basis of logical reasoning. Unsurprisingly, these data are telling us that the brain is organized in ways not anticipated by cognitive theory. In particular, they are forcing us to confront the possibility that there may be no unitary reasoning system in the brain (be it mental models or mental logic). Rather, the evidence points to a fractionated system that is dynamically configured in response to certain task and environmental cues. I will review three lines of demarcation, including (a) systems for heuristic and formal processes (with evidence for some degree of content specificity in the heuristic system), (b) conflict detection/resolution systems, and (c) systems for dealing with certain and uncertain inferences; and then offer a tentative account of how these systems might interact to facilitate logical reasoning. Sensitivity to data generated by neuroimaging and patient methodologies will move us beyond the sterility of the mental models vs. mental logic debate and further the development of cognitive theories of reasoning.
Cognitive Informatics and Denotational Mathematical Means for Brain Informatics

Yingxu Wang
Director, International Institute of Cognitive Informatics and Cognitive Computing (IICICC)
Director, Theoretical and Empirical Software Engineering Research Centre (TESERC)
Dept. of Electrical and Computer Engineering, Schulich School of Engineering
University of Calgary
2500 University Drive NW, Calgary, Alberta, Canada T2N 1N4
Tel.: (403) 220 6141, Fax: (403) 282 6855
[email protected]
http://enel.ucalgary.ca/People/wangyx
Abstract. Cognitive informatics studies natural intelligence and the brain through theoretical and computational approaches, rigorously explaining the mechanisms of the brain by a fundamental theory known as abstract intelligence, and formally modeling the brain by contemporary denotational mathematics. This paper, as an extended summary of the invited keynote presented at AMT-BI 2010, describes the interplay of cognitive informatics, abstract intelligence, denotational mathematics, brain informatics, and computational intelligence. Some of the theoretical foundations for brain informatics developed in cognitive informatics are elaborated. A key notion recognized in recent studies in cognitive informatics is that the root and profound objective in natural, abstract, and artificial intelligence in general, and in cognitive informatics and brain informatics in particular, is to seek suitable mathematical means for their special needs, which were missing in the last six decades. A layered reference model of the brain and a set of cognitive processes of the mind are systematically developed towards the exploration of the theoretical framework of brain informatics. The current methodologies for brain studies are reviewed and their strengths and weaknesses are analyzed. A wide range of applications of cognitive informatics and denotational mathematics is recognized in brain informatics toward the implementation of highly intelligent systems such as world-wide wisdom (WWW+), cognitive knowledge search engines, autonomous learning machines, and cognitive robots.

Keywords: Cognitive informatics, abstract intelligence, brain informatics, cognitive computing, cognitive computers, natural intelligence, artificial intelligence, machinable intelligence, computational intelligence, denotational mathematics, concept algebra, system algebra, RTPA, visual semantic algebra, granular algebra, eBrain, engineering applications.
1 Introduction

The contemporary wonder of science and engineering has recently refocused on their starting point: how the brain processes internal and external information
autonomously and cognitively, rather than imperatively as conventional computers do. The latest advances and engineering applications of CI have led to the emergence of cognitive computing and the development of cognitive computers that perceive, learn, and reason [9, 18, 20, 23, 24, 32]. CI has also fundamentally contributed to autonomous agent systems and cognitive robots. A wide range of applications of CI have been identified, such as the development of cognitive computers, cognitive robots, cognitive agent systems, cognitive search engines, cognitive learning systems, and artificial brains. The work in CI may also lead to a fundamental solution to computational linguistics, Computing with Natural Language (CNL), and Computing with Words (CWW) [34, 35].

Cognitive Informatics is a term coined by Wang at the first IEEE International Conference on Cognitive Informatics (ICCI 2002) [6]. Cognitive informatics [6, 8, 11, 12, 26, 27, 28, 29, 31] studies natural intelligence and the brain through theoretical and computational approaches, rigorously explaining the mechanisms of the brain by a fundamental theory known as abstract intelligence, and formally modeling the brain by contemporary denotational mathematics such as concept algebra [14], real-time process algebra (RTPA) [7, 16], system algebra [15, 30], and visual semantic algebra (VSA) [19]. The latest advances in CI have led to a systematic solution for explaining brain informatics and the future generation of intelligent computers.

A key notion recognized in recent studies in cognitive informatics is that the root and profound objective in natural, abstract, and artificial intelligence in general, and in cognitive informatics and brain informatics in particular, is to seek suitable mathematical means for their special needs, which were missing in the last six decades. This is a general need and requirement for searching the metamethodology of any discipline, particularly in emerging fields where no suitable mathematics has yet been developed, and in traditional fields where persistent hard problems have not been solved efficiently or completely [1, 2, 4, 13].

This paper is an extended summary of the invited keynote lecture presented at the 2010 joint International Conferences on Active Media Technology and Brain Informatics (AMT-BI 2010), and covers some of the theoretical foundations of brain informatics (BI) developed in cognitive informatics and denotational mathematics. Cognitive informatics as the science of abstract intelligence and cognitive computing is briefly described in Section 2. The fundamental theories and expressive tools for cognitive informatics, brain informatics, and computational intelligence, collectively known as denotational mathematics, are introduced in Section 3. Applications of cognitive informatics and denotational mathematics in BI and cognitive computing are elaborated in Section 4, where a layered reference model of the brain and a set of cognitive processes of the mind are systematically modeled towards the exploration of the theoretical framework of brain informatics.
2 Cognitive Informatics: The Science of Abstract Intelligence and Computational Intelligence

Information is the third essence of the world, supplementing energy and matter. A key discovery in information science is the basic unit of information, the bit, abbreviated from "binary digit", which forms a shared foundation of computer science and informatics.
The science of information, informatics, has gone through three generations of evolution, known as classic, modern, and cognitive informatics, since Shannon proposed the classic notion of information [5]. The classical information theory founded by Shannon (1948) defined information as a probabilistic measure of the variability of messages that can be obtained from a message source. Along with the development of computer science and the IT industry, the domain of informatics has been dramatically extended in the last few decades. This led to modern informatics, which treats information as entities of messages rather than a probabilistic measure of the variability of messages as in the classic information theory. The new perception of information has been found to better explain the theories of computer science and the practices of the IT industry.

However, both the classic and modern views on information focus only on external information. The real sources and destinations of information, human brains, are often overlooked. This leads to the third generation of informatics, cognitive informatics, which focuses on the nature of information in the brain, covering information acquisition, memory, categorization, retrieval, generation, representation, and communication. Information in cognitive informatics is defined as the abstract artifacts, and their relations, that can be modeled, processed, and stored by human brains.

Cognitive informatics [6, 8, 11, 12, 26, 27, 28, 29, 31] emerged and has developed on the basis of multidisciplinary research in cognitive science, computing science, information science, abstract intelligence, and denotational mathematics since the inauguration of the 1st IEEE ICCI'02 [6].

Definition 1. Cognitive informatics (CI) is a transdisciplinary enquiry of computer science, information science, cognitive science, and intelligence science that investigates the internal information processing mechanisms and processes of the brain and natural intelligence, as well as their engineering applications in cognitive computing.

CI is a cutting-edge and multidisciplinary research area that tackles fundamental problems shared by modern informatics, computation, software engineering, AI, cybernetics, cognitive science, neuropsychology, medical science, philosophy, linguistics, brain sciences, and many others. The development of, and cross-fertilization among, the aforementioned science and engineering disciplines have led to a whole range of extremely interesting new research areas. The theoretical framework of CI encompasses four main areas of basic and applied research [11]: a) fundamental theories of natural intelligence; b) abstract intelligence; c) denotational mathematics; and d) cognitive computing. These areas of CI are elaborated below.

Fundamental theories developed in CI cover the Information-Matter-Energy (IME) model [8], the Layered Reference Model of the Brain (LRMB) [28], the Object-Attribute-Relation (OAR) model of information/knowledge representation in the brain [12], the cognitive informatics model of the brain [23, 26], Natural Intelligence (NI) [8], and neuroinformatics [12].
Recent studies on LRMB in cognitive informatics reveal an entire set of cognitive functions of the brain and their cognitive process models, which explain the functional mechanisms of natural intelligence with 43 cognitive processes at seven layers, known as the sensation, memory, perception, action, meta-cognitive, meta-inference, and higher cognitive layers [28].
Definition 2. Abstract intelligence (αI) is a universal mathematical form of intelligence that transfers information into actions and behaviors.

The studies on αI form a field of enquiry for both natural and artificial intelligence at the reductive levels of the neural, cognitive, functional, and logical, from the bottom up [17]. The paradigms of αI include natural, artificial, machinable, and computational intelligence. The studies in CI and αI lay a theoretical foundation toward revealing the basic mechanisms of different forms of intelligence [25]. As a result, cognitive computers may be developed, which are characterized as knowledge processors, beyond the data processors of conventional computing.

Definition 3. Cognitive Computing (CC) is an emerging paradigm of intelligent computing methodologies and systems that implements computational intelligence by autonomous inferences and perceptions mimicking the mechanisms of the brain.

CC emerged and has developed on the basis of transdisciplinary research in cognitive informatics and abstract intelligence. The term computing in a narrow sense is an application of computers to solve a given problem by imperative instructions; in a broad sense, it is a process to implement instructive intelligence by a system that transfers a set of given information or instructions into expected intelligent behaviors. The essences of computing are both its data objects and their predefined computational operations. From these facets, different computing paradigms may be comparatively analyzed as follows:

a) Conventional computing
   - Data objects: abstract bits and structured data
   - Operations: logic, arithmetic, and functional operations        (1a)

b) Cognitive computing (CC)
   - Data objects: words, concepts, syntax, and semantics
   - Basic operations: syntactic analyses and semantic analyses
   - Advanced operations: concept formulation, knowledge representation, comprehension, learning, inferences, and causal analyses        (1b)

The latest advances in cognitive informatics, abstract intelligence, and denotational mathematics have led to a systematic solution for the future generation of intelligent computers known as cognitive computers [9, 18].

Definition 4. A cognitive computer (cC) is an intelligent computer for knowledge processing that perceives, learns, and reasons.

Like conventional von Neumann computers for data processing, cCs are designed to embody machinable intelligence such as computational inferences, causal analyses, knowledge manipulations, learning, and problem solving. According to the above analyses, a cC is driven by a cognitive CPU with a cognitive learning engine and a formal inference engine for intelligent operations on abstract concepts as the basic unit of human knowledge. cCs are designed on the basis of contemporary denotational mathematics [13, 21], particularly concept algebra, as conventional von Neumann architecture computers are designed on the basis of Boolean algebra. cC is an important extension of conventional computing in both its data object modeling capabilities and their advanced operations at the abstract level of concepts beyond bits. Therefore, a cC is
an intelligent knowledge processor that is much closer to the capability of human brains, thinking at the level of concepts rather than bits. It is recognized that the basic unit of human knowledge in natural language representation is a concept rather than a word [14], because the former conveys the structured semantics of the latter with its intension (attributes), extension (objects), and relations to other concepts in the context of a knowledge network.

Main applications of the fundamental theories and technologies of CI can be divided into two categories. The first category of applications uses informatics and computing techniques to investigate problems in intelligence science, cognitive science, and knowledge science, such as abstract intelligence, memory, learning, and reasoning. The second category includes the areas that use cognitive informatics theories to investigate problems in informatics, computing, software engineering, knowledge engineering, and computational intelligence. CI focuses on the nature of information processing in the brain, such as information acquisition, representation, memory, retrieval, creation, and communication. Via this interdisciplinary approach, and with the support of modern information and neuroscience technologies, the intelligent mechanisms of the brain and the cognitive processes of the mind may be systematically explored [33] within the framework of CI.
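As a loose, purely illustrative contrast between the bit-level data objects of (1a) and the concept-level objects of (1b), consider the sketch below; the names and the crude attribute-inclusion test are my own, not an actual cC instruction set or a concept algebra operator.

```python
# (1a) Conventional computing: operations on abstract bits.
x, y = 0b1010, 0b0110
print(bin(x & y))  # bitwise logic on bit patterns -> '0b10'

# (1b) Cognitive computing: operations on concepts. A concept carries
# structured semantics via its attributes (intension) and objects
# (extension), so operations can act at the semantic level.
pet = {"animal", "domesticated"}            # attributes of 'pet'
dog = {"animal", "domesticated", "barks"}   # attributes of 'dog'

# Crude stand-in for a semantic relation: 'dog' specializes 'pet'
# if it carries all of pet's attributes (attribute-set inclusion).
print(pet <= dog)  # -> True
```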
3 Denotational Mathematics: A Metamethodology for Cognitive Informatics, Brain Informatics, Cognitive Computing, and Computational Intelligence

It is recognized that the maturity of a scientific discipline is characterized by the maturity of its mathematical (meta-methodological) means. A key notion recognized in recent studies in cognitive informatics and computational intelligence is that the root and profound problem in natural, abstract, and artificial intelligence in general, and in cognitive informatics and brain informatics in particular, is to seek suitable mathematical means for their special needs. This is a general need and requirement for searching the metamethodology of any discipline, particularly in emerging fields where no suitable mathematics has been developed, and in traditional fields where persistent hard problems have not been solved efficiently or completely [1, 2, 3, 4, 10, 13].

Definition 5. Denotational mathematics (DM) is a category of expressive mathematical structures that deals with high-level mathematical entities beyond numbers and sets, such as abstract objects, complex relations, perceptual information, abstract concepts, knowledge, intelligent behaviors, behavioral processes, inferences, and systems.

A number of DMs have been created and developed [13, 21], such as concept algebra [14], system algebra [15, 30], real-time process algebra (RTPA) [7, 16], granular algebra [22], visual semantic algebra (VSA) [19], and formal causal inference methodologies. As summarized in Table 1 with their structures, mathematical entities, algebraic operations, and usages, this set of DMs provides a coherent collection of contemporary mathematical means with explicit expressive power for CI, αI, CC, AI, and computational intelligence.
Table 1. Paradigms of Denotational Mathematics

Concept algebra (CA):              CA ≜ (C, OP, Θ) = ({O, A, Rc, Ri, Ro}, {•r, •c}, ΘC)
                                   Usage: algebraic manipulations on abstract concepts
System algebra (SA):               SA ≜ (S, OP, Θ) = ({C, Rc, Ri, Ro, B, Ω}, {•r, •c}, Θ)
                                   Usage: algebraic manipulations on abstract systems
Real-time process algebra (RTPA):  RTPA ≜ (T, P, N)
                                   Usage: algebraic manipulations on abstract processes
Visual semantic algebra (VSA):     VSA ≜ (O, •VSA)
                                   Usage: algebraic manipulations on abstract visual objects/patterns
Granular algebra (GrA):            GrA ≜ (G, •r, •p, •c)
                                   Usage: algebraic manipulations on abstract granules

(The full sets of mathematical entities and of relational and compositional operators of each paradigm are defined in the respective references [14, 15, 30, 7, 16, 19, 22].)
Among the above collection of denotational mathematics, concept algebra is an abstract mathematical structure for the formal treatment of concepts as the basic unit of human reasoning, covering their algebraic relations, operations, and associative rules for composing complex concepts. It is noteworthy that, according to concept algebra, although the semantics of a word may be ambiguous, the semantics of a concept is always unique and precise in CC.

Example 1. The word "bank" is ambiguous because it may denote a financial institution, the raised ground along a river or lake, and/or a storage of something. However, the three distinct concepts related to "bank", i.e., bo = bank(organization), br = bank(river), and bs = bank(storage), are precisely unique; they can be formally described in concept algebra [14] for CC as shown in Fig. 1, where K represents all concepts existing in the analyser's knowledge.

All given concrete concepts share a generic framework, known as the universal abstract concept, as modeled in concept algebra below.

Definition 6. An abstract concept, c, is a 5-tuple, i.e.:
c ≜ (O, A, Rc, Ri, Ro)    (2)

where

• O is a nonempty set of objects of the concept, O = {o1, o2, …, om} ⊆ ÞO, where ÞO denotes a power set of abstract objects in the universal discourse U.
• A is a nonempty set of attributes, A = {a1, a2, …, an} ⊆ ÞA, where ÞA denotes a power set of attributes in U.
• Rc = O × A is a set of internal relations.
• Ri ⊆ C′ × c is a set of input relations, where C′ is a set of external concepts in U.
• Ro ⊆ c × C′ is a set of output relations.

boST ≜ (A, O, Rc, Ri, Ro)    // bank(organization)
  = ( boST.A = {organization, company, financial business, money, deposit, withdraw, invest, exchange},
      boST.O = {international bank, national bank, local bank, investment bank, ATM},
      boST.Rc = O × A, boST.Ri = K × boST, boST.Ro = boST × K )

brST ≜ (A, O, Rc, Ri, Ro)    // bank(river)
  = ( brST.A = {sides of a river, raised ground, a pile of earth, location},
      brST.O = {river bank, lake bank, canal bank},
      brST.Rc = O × A, brST.Ri = K × brST, brST.Ro = brST × K )

bsST ≜ (A, O, Rc, Ri, Ro)    // bank(storage)
  = ( bsST.A = {storage, container, place, organization},
      bsST.O = {information bank, human resource bank, blood bank},
      bsST.Rc = O × A, bsST.Ri = K × bsST, bsST.Ro = bsST × K )

Fig. 1. Formal and distinguished concepts derived from the word "bank"
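As an informal illustration of Definition 6 and of the concepts in Fig. 1, the following sketch renders a concept as the 5-tuple (O, A, Rc, Ri, Ro) in Python; the class and field names are my own, and the input/output relations Ri and Ro to external concepts are omitted for brevity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Concept:
    """An abstract concept c = (O, A, Rc, Ri, Ro) per Definition 6.

    Only O, A, and the derived internal relation Rc = O x A are
    modeled here; Ri and Ro (relations to external concepts in the
    knowledge K) are left out of this sketch.
    """
    name: str
    objects: frozenset     # O: the objects the concept denotes
    attributes: frozenset  # A: the attributes the concept carries

    @property
    def internal_relations(self):
        # Rc = O x A: each object is related to each attribute
        return {(o, a) for o in self.objects for a in self.attributes}

# bank(river) as specified in Fig. 1
b_r = Concept(
    name="bank(river)",
    objects=frozenset({"river bank", "lake bank", "canal bank"}),
    attributes=frozenset({"sides of a river", "raised ground",
                          "a pile of earth", "location"}),
)
print(len(b_r.internal_relations))  # 3 objects x 4 attributes = 12
```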
Concept algebra provides a set of 8 relational and 9 compositional operations on abstract concepts, as summarized in Table 1. Detailed definitions of the operations of concept algebra may be found in [14]. A Cognitive Learning Engine (CLE), known as the "CPU" of cCs, is under development in my lab on the basis of concept algebra; it implements the basic and advanced cognitive computational operations of concepts and knowledge for cCs as outlined in Eq. 1b. Additional concept operations may be introduced in order to reveal the underpinning mechanisms of learning and natural language comprehension. One of
the advanced operations in concept algebra for knowledge processing is known as the knowledge differential, which can be formalized in concept algebra as follows.

Definition 7. The knowledge differential, dK/dt, is an eliciting operation on a set of knowledge K, represented by a set of concepts over time, that recalls the new concepts learnt during a given period t1 through t2, i.e.:
d (OAR) dt = OAR(t2 ) − OAR(t1 )
(3)
= OAR.C (t2 ) \ OAR.C (t1 )
where the set of concepts, OAR.C(t1), are existing concepts that have already been known at time point t1. Example 2. As given in Example 1, assume the following concepts, OAR.C(t1) = {Co}, are known at t1, and the system’s learning result at t2 is OAR.C(t2) = {Co, Cr, Cs}. Then, a knowledge differential can be carried out using Eq. 3 as follows: dK dt
d (OAR ) dt = OAR.C (t2 ) \ OAR.C (t1 ) = {Co , Cr , Cs } \ {Co } = {Cr , Cs }
Concept algebra provides a powerful denotational mathematical means for algebraic manipulations of abstract concepts. Concept algebra can be used to model, specify, and manipulate generic “to be” type problems, particularly system architectures, knowledge bases, and detail-level system designs, in cognitive informatics, intelligence science, computational intelligence, computing science, software science, and knowledge science. The work in this area may also lead to a fundamental solution to computational linguistics, Computing with Natural Language (CNL), and Computing with Words (CWW) [34, 35].
4 Applications of Cognitive Informatics and Denotational Mathematics in Brain Informatics

This section introduces the notion of brain informatics as developed by Zhong and his colleagues [36]. A functional and logical reference model of the brain and a set of cognitive processes of the mind are systematically developed towards the exploration of the theoretical framework of brain informatics. The current methodologies for brain studies are reviewed and their strengths and weaknesses are analyzed.

Definition 8. Brain informatics (BI) is a joint field of brain and information sciences that studies the information processing mechanisms of the brain by computing and medical imaging technologies.

A variety of life functions and their cognitive processes have been identified in cognitive informatics, neuropsychology, cognitive science, and neurophilosophy.
Based on the advances of research in cognitive informatics and related fields, a Layered Reference Model of the Brain (LRMB) has been developed by Wang and his colleagues [28]. The LRMB model explains the functional mechanisms and cognitive processes of natural and artificial brains with 43 cognitive processes at seven layers. LRMB elicits the core and highly recurrent cognitive processes from a huge variety of life functions, which may shed light on the study of the fundamental mechanisms and interactions of complicated mental processes and cognitive systems, particularly the relationships and interactions between the inherited and the acquired life functions, as well as those of the subconscious and conscious cognitive processes. According to LRMB, any everyday life function or behavior, such as reading or driving, is a concurrent combination of part or all of the 43 fundamental cognitive processes.

The basic methodologies in CI and BI are: a) logical (formal and mathematical) modeling and reasoning; b) empirical introspection; c) experiments (particularly abductive observations on brain patients); and d) the use of high technologies, particularly brain imaging. The central role of formal logical and functional modeling for BI has been demonstrated in Sections 2 and 3 by CI, αI, and denotational mathematics. The advantages and disadvantages of the latest brain imaging methodologies are analyzed below.

Modern brain imaging technologies such as EEG, fMRI, MEG, and PET are illustrated in Fig. 2. Although many promising results on cognitive functions of the brain have been derived from brain imaging studies in cognitive tests and neurobiology, they are limited to simple cognitive functions compared with the entire framework of the brain as revealed in LRMB. Moreover, there is a lack of systematic knowledge about what roles particular types of neurons may play in complex cognitive functions such as learning and memorization, because neuroimages cannot pinpoint detailed relationships between structures and functions in the brain.
Fig. 2. Major imaging technologies in brain studies
The limitations of current brain imaging technologies such as PET and fMRI for understanding the functions of the brain may be equivalent to the problem of examining the functions of a computer by looking at its layout and the locations where it is
active, using imaging technologies. It is well recognized that, without understanding the logical and functional models and mechanisms of the CPU as shown in Fig. 3, nobody can explain its functions from fine pictures of the intricate interconnections of its millions of transistors (gates). Further, such pictures would be even more confusing because the control unit (CU) and arithmetic and logic unit (ALU) of the CPU and its buses are always active for almost all kinds of operations. So, unfortunately, are brain science and neurobiology. Without a rational guide to the high-level life functions and cognitive processes, as provided by the LRMB reference model, nobody can pinpoint a rational functional relationship between a brain image and a specific behaviour, such as an action of learning and its effect in memory, a recall of particular knowledge retained in long-term memory, or a mapping of the same mental object from short-term memory to long-term memory.
Fig. 3. The layout of a CPU
The above case study indicates that neuroscience theories and artificial intelligence technologies for the brain have so far been studied at almost separate levels in biophysics, neurology, cognitive science, and computational/artificial intelligence. However, a synergic model such as LRMB, which maps the architectures and functions of the brain across individual disciplines, is necessary to explain the complexity and underpinning mechanisms of the brain. This coherent approach will lead to the development of novel engineering applications of CI, αI, DM, CC, and BI, such as cognitive computers, artificial brains, cognitive robots, and cognitive software agents, which mimic the natural intelligence of the brain based on the theories and denotational mathematical means developed in cognitive informatics and abstract intelligence.
5 Conclusions

Cognitive informatics (CI) has been described as a transdisciplinary enquiry of computer science, information science, cognitive science, and intelligence science that investigates the internal information processing mechanisms and processes of the brain and natural intelligence, as well as their engineering applications in
cognitive computing. Brain informatics (BI) has been introduced as a joint field of brain and information sciences that studies the information processing mechanisms of the brain by computing and medical imaging technologies. This paper has presented some of the theoretical foundations of brain informatics developed in cognitive informatics, abstract intelligence, and denotational mathematics. Cognitive informatics, as the science of abstract intelligence and cognitive computing, has been briefly introduced. A set of denotational mathematics, particularly concept algebra, has been elaborated in order to enhance the fundamental theories and mathematical means for cognitive informatics, brain informatics, and computational intelligence. Applications of cognitive informatics and denotational mathematics in brain informatics and cognitive computing have been demonstrated based on the Layered Reference Model of the Brain (LRMB) and a set of cognitive processes of the mind, towards the exploration of the theoretical framework of brain informatics.
References

1. Bender, E.A.: Mathematical Methods in Artificial Intelligence. IEEE CS Press, Los Alamitos (1996)
2. Boole, G.: The Laws of Thought, 1854. Prometheus Books, NY (2003)
3. Kline, M.: Mathematical Thought: From Ancient to Modern Times. Oxford University Press, Oxford, UK (1972)
4. Russell, B.: The Principles of Mathematics, 1903. W.W. Norton & Co., NY (1996)
5. Shannon, C.E.: A Mathematical Theory of Communication. Bell System Technical Journal 27, 379–423, 623–656 (1948)
6. Wang, Y.: Keynote: On Cognitive Informatics. In: Proc. 1st IEEE International Conference on Cognitive Informatics (ICCI 2002), Calgary, Canada, pp. 34–42. IEEE CS Press, Los Alamitos (August 2002a)
7. Wang, Y.: The Real-Time Process Algebra (RTPA). Annals of Software Engineering 14, 235–274 (2002b)
8. Wang, Y.: On Cognitive Informatics. Brain and Mind: A Transdisciplinary Journal of Neuroscience and Neurophilosophy 4(3), 151–167 (2003)
9. Wang, Y.: Keynote: Cognitive Informatics - Towards the Future Generation Computers that Think and Feel. In: Proc. 5th IEEE International Conference on Cognitive Informatics (ICCI 2006), Beijing, China, pp. 3–7. IEEE CS Press, Los Alamitos (July 2006)
10. Wang, Y.: Software Engineering Foundations: A Software Science Perspective. CRC Series in Software Engineering, vol. II. Auerbach Publications, NY (July 2007a)
11. Wang, Y.: The Theoretical Framework of Cognitive Informatics. International Journal of Cognitive Informatics and Natural Intelligence 1(1), 1–27 (2007b)
12. Wang, Y.: The OAR Model of Neural Informatics for Internal Knowledge Representation in the Brain. International Journal of Cognitive Informatics and Natural Intelligence 1(3), 64–75 (2007c)
13. Wang, Y.: On Contemporary Denotational Mathematics for Computational Intelligence. Transactions of Computational Science 2, 6–29 (2008a)
14. Wang, Y.: On Concept Algebra: A Denotational Mathematical Structure for Knowledge and Software Modeling. International Journal of Cognitive Informatics and Natural Intelligence 2(2), 1–19 (2008b)
15. Wang, Y.: On System Algebra: A Denotational Mathematical Structure for Abstract System Modeling. International Journal of Cognitive Informatics and Natural Intelligence 2(2), 20–42 (2008c)
16. Wang, Y.: RTPA: A Denotational Mathematics for Manipulating Intelligent and Computational Behaviors. International Journal of Cognitive Informatics and Natural Intelligence 2(2), 44–62 (2008d)
17. Wang, Y.: On Abstract Intelligence: Toward a Unified Theory of Natural, Artificial, Machinable, and Computational Intelligence. International Journal of Software Science and Computational Intelligence 1(1), 1–18 (2009a)
18. Wang, Y.: On Cognitive Computing. International Journal of Software Science and Computational Intelligence 1(3), 1–15 (2009b)
19. Wang, Y.: On Visual Semantic Algebra (VSA): A Denotational Mathematical Structure for Modeling and Manipulating Visual Objects and Patterns. International Journal of Software Science and Computational Intelligence 1(4), 1–15 (2009c)
20. Wang, Y. (ed.): Special Issue on Cognitive Computing. International Journal of Software Science and Computational Intelligence 1(3) (July 2009d)
21. Wang, Y.: Paradigms of Denotational Mathematics for Cognitive Informatics and Cognitive Computing. Fundamenta Informaticae 90(3), 282–303 (2009e)
22. Wang, Y.: Granular Algebra for Modeling Granular Systems and Granular Computing. In: Proc. 8th IEEE International Conference on Cognitive Informatics (ICCI 2009), Hong Kong, pp. 145–154. IEEE CS Press, Los Alamitos (2009f)
23. Wang, Y.: Toward a Cognitive Behavioral Reference Model of Artificial Brains. Journal of Computational and Theoretical Nanoscience (2010a) (to appear)
24. Wang, Y.: Abstract Intelligence and Cognitive Robots. Journal of Behavioral Robotics 1(1), 66–72 (2010b)
25. Wang, Y.: A Sociopsychological Perspective on Collective Intelligence in Metaheuristic Computing. International Journal of Applied Metaheuristic Computing 1(1), 110–128 (2010c)
26. Wang, Y., Wang, Y.: Cognitive Informatics Models of the Brain. IEEE Trans. on Systems, Man, and Cybernetics (C) 36(2), 203–207 (2006)
27. Wang, Y., Kinsner, W.: Recent Advances in Cognitive Informatics. IEEE Transactions on Systems, Man, and Cybernetics (C) 36(2), 121–123 (2006a)
28. Wang, Y., Wang, Y., Patel, S., Patel, D.: A Layered Reference Model of the Brain (LRMB). IEEE Trans. on Systems, Man, and Cybernetics (C) 36(2), 124–133 (2006b)
29. Wang, Y., Kinsner, W., Zhang, D.: Contemporary Cybernetics and its Faces of Cognitive Informatics and Computational Intelligence. IEEE Trans. on System, Man, and Cybernetics (B) 39(4), 1–11 (2009a)
30. Wang, Y., Zadeh, L.A., Yao, Y.: On the System Algebra Foundations for Granular Computing. International Journal of Software Science and Computational Intelligence (1), 1–17 (2009b)
31. Wang, Y., Kinsner, W., Anderson, J.A., Zhang, D., Yao, Y., Sheu, P., Tsai, J., Pedrycz, W., Latombe, J.-C., Zadeh, L.A., Patel, D., Chan, C.: A Doctrine of Cognitive Informatics. Fundamenta Informaticae 90(3), 203–228 (2009c)
32. Wang, Y., Zhang, D., Tsumoto, S.: Cognitive Informatics, Cognitive Computing, and Their Denotational Mathematical Foundations (I). Fundamenta Informaticae 90(3), 1–7 (2009d)
33. Wang, Y., Chiew, V.: On the Cognitive Process of Human Problem Solving. Cognitive Systems Research: An International Journal 11(1), 81–92 (2010)
34. Zadeh, L.A.: Fuzzy Logic and Approximate Reasoning. Syntheses 30, 407–428 (1975)
35. Zadeh, L.A.: From Computing with Numbers to Computing with Words – from Manipulation of Measurements to Manipulation of Perception. IEEE Trans. on Circuits and Systems I 45(1), 105–119 (1999)
36. Zhong, N.: A Unified Study on Human and Web Granular Reasoning. In: Proc. 8th Int'l. Conf. Cognitive Informatics (ICCI 2009), Hong Kong, pp. 3–4. IEEE CS Press, Los Alamitos (July 2009)
An Adaptive Model for Dynamics of Desiring and Feeling Based on Hebbian Learning

Tibor Bosse1, Mark Hoogendoorn1, Zulfiqar A. Memon1,2, Jan Treur1, and Muhammad Umair1,3

1 VU University Amsterdam, Department of AI, De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands
2 Sukkur Institute of Business Administration (Sukkur IBA), Air Port Road, Sukkur, Sindh, Pakistan
3 COMSATS Institute of Information Technology, Dept. of Computer Science, Lahore, Pakistan
{tbosse,mhoogen,zamemon,treur,mumair}@few.vu.nl
http://www.few.vu.nl/~{tbosse,mhoogen,zamemon,treur,mumair}
Abstract. Within cognitive models, desires are often considered as functional concepts that play a role in efficient focusing of behaviour. In practice a desire often goes hand in hand with having certain feelings. In this paper, by adopting neurological theories, a model is introduced incorporating both cognitive and affective aspects in the dynamics of desiring and feeling. Example simulations are presented, and both a mathematical and a logical analysis are included.
1 Introduction Desires play an important role in human functioning. To provide automated support for human functioning in various domains [2], it may be important to also monitor the human's states of desiring. Desires [13] are often considered cognitive states with the function of focusing the behaviour by constraining or indicating the options for actions to be chosen. Yet, there is much more to the concept of desire, especially concerning associated affective aspects. Cognitive functioning is often strongly related to affective processes, as has been shown more in general in empirical work as described in, for example, [9, 19]. In this paper a model is introduced that addresses both cognitive and affective aspects related to desires, adopting neurological theories as described in, for example, [3, 6, 7, 8, 19]. The aim of developing such a model is both to analyse adaptive dynamics of interacting cognitive and affective processes, and to provide a basis for an ambient agent that supports a person; cf. [14, 16, 2]. Evaluation criteria include the extent to which the model shows emerging patterns that are considered plausible, and the possibility to use the model in model-based reasoning within an ambient agent; cf. [2]. Within the presented model an activated desire induces a set of responses in the form of preparations for actions to fulfil the desire, and involving changing body states. By a recursive as-if body loop each of these preparations generates a level of feeling [18] that in turn can strengthen the level of the related preparation. These loops result in equilibria for both the strength of the preparation and of the feeling,
and when these are strong enough, the action is actually activated. The specific strengths of the connections from the desire to the preparations, and within the recursive as-if body loops can be innate, or are acquired during lifetime. The computational model is based on neurological notions such as somatic marking, body loop and as-if body loop. The adaptivity in the model is based on Hebbian learning. Any mental state in a person induces emotions felt by this person, as described in [7, 8]; e.g., [8], p. 93: ‘… few if any exceptions of any object or event, actually present or recalled from memory, are ever neutral in emotional terms. Through either innate design or by learning, we react to most, perhaps all, objects with emotions, however weak, and subsequent feelings, however feeble.’ More specifically, in this paper it is assumed that responses in relation to a mental state of desiring roughly proceed according to the following causal chain for a body loop, based on elements from [3, 7, 8]: desire → preparation for bodily response → body state modification → sensing body state → sensory representation of body state → induced feeling
In addition, an as-if body loop uses a direct causal relation preparation for bodily response → sensory representation of body state
as a shortcut in the causal chain; cf. [7]. The body loop (or as-if body loop) is extended to a recursive (as-if) body loop by assuming that the preparation of the bodily response is also affected by the state of feeling the emotion: feeling → preparation for the bodily response
Such recursion is suggested in [8], pp. 91-92, noticing that what is felt is a body state under the person’s control: ‘The brain has a direct means to respond to the object as feelings unfold because the object at the origin is inside the body, rather than external to it. The brain can act directly on the very object it is perceiving. (…) The object at the origin on the one hand, and the brain map of that object on the other, can influence each other in a sort of reverberative process that is not to be found, for example, in the perception of an external object.’ Within the model presented in this paper, both the bodily response and the feeling are assigned a level (or gradation), expressed by a number. The causal cycle is triggered by an activation of the desire and converges to certain activation levels of feeling and preparation for a body state. The activation of a specific action preparation is based on both the activation level of the desire and of the feeling associated to this action. This illustrates Damasio’s theory on decision making by somatic marking, called the Somatic Marker Hypothesis; cf. [1, 6, 8]. The strengths of the connections from feeling to preparation may be subject to learning. Especially when a specific action is performed and it leads to a strong effect in feeling, by Hebbian learning [10, 12] this may give a positive effect on the strength of this connection and consequently on future activations of the preparation of this specific action. Through such a mechanism experiences in the past may have their effect on behavioural choices made in the future, as also described as part of Damasio’s Somatic Marker Hypothesis [6]. In the computational model described below, this is applied in the form of a Hebbian learning rule realising that actions induced by a certain desire which result in stronger experiences of satisfaction felt will be chosen more often to fulfil this desire.
In Section 2 the computational model for the dynamics of desiring and feeling is described. Section 3 presents some simulation results. In Section 4, formal analysis of the model is addressed, both by mathematical analysis of equilibria and automated logical verification of properties. Finally, Section 5 is a discussion.
2 Modelling Desiring and Feeling In this section the computational model for desiring and feeling is presented; for an overview see Fig. 1. This picture also shows representations from the detailed specifications explained below. The precise numerical relations between the indicated variables V shown are not expressed in this picture, but in the detailed specifications of properties below, which are labelled by LP0 to LP9 (where LP stands for Local Property), as also shown in the picture. The detailed specification (both informally and formally) of the computational model is presented below. Here capitals are used for (assumed universally quantified) variables. The model was specified in LEADSTO [4], where the temporal relation a →→ b denotes that when a state property a occurs, then after a certain time delay (which can be specified as any positive real number), state property b will occur. In LEADSTO both logical and numerical relations can be specified. Generating a desire by sensing a bodily unbalance The desire considered in the example scenario is assumed to be generated by sensing an unbalance in a body state b, according to the principle that organisms aim at maintaining homeostasis of their internal milieu. The first dynamic property addresses how body states are sensed. LP0 Sensing a body state If body state property B has level V, then the sensor state for B will have level V. body_state(B, V) → sensor_state(B, V)
For the example scenario this dynamic property is used by the person to sense the body state b from which the desire originates (e.g., a state of being hungry), and the body states bi involved in feeling satisfaction with specific ways in which the desire is being fulfilled. From sensor states, sensory representations are generated as follows. LP1 Generating a sensory representation for a sensed body state If a sensor state for B has level V, then the sensory representation for B will have level V. sensor_state(B, V) → srs(B, V)
Next the dynamic property for the process for desire generation is described, from the sensory representation of the body state unbalance. LP2 Generating a desire based on a sensory representation If a sensory representation for B has level V, then the desire to address B will have level V. srs(B, V) → desire(B, V)
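As an illustration of how these identity-propagation properties chain together, the following minimal Python sketch (illustrative only; the paper's model is specified in LEADSTO, and all function names here are hypothetical) traces a bodily unbalance through LP0–LP2:

```python
# Minimal sketch (not the paper's LEADSTO specification; names are
# illustrative) of how LP0-LP2 chain a bodily unbalance into a desire.

def sensor_state(body_level: float) -> float:
    """LP0: the sensor state takes over the level of the body state."""
    return body_level

def srs(sensor_level: float) -> float:
    """LP1: the sensory representation takes over the sensor level."""
    return sensor_level

def desire(srs_level: float) -> float:
    """LP2: the desire to address the unbalance takes over the representation level."""
    return srs_level

# An unbalance of level 0.3 in body state b yields a desire of level 0.3.
print(desire(srs(sensor_state(0.3))))  # -> 0.3
```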
Inducing preparations It is assumed that activation of a desire, together with a feeling, induces preparations for a number of action options: those actions considered relevant to satisfy the desire, for example based on earlier experiences. Dynamic property LP3 describes such responses in the form of the preparation for specific actions. It combines the activation levels V and Vi of two states (desire and feeling) through connection
strengths ω1i and ω2i respectively. This specifies part of the recursive as-if loop between feeling and body state. This dynamic property uses a combination model based on a function g(σ, τ, V, Vi, ω1i, ω2i) which includes a sigmoid threshold function
th(σ, τ, V) = 1 / (1 + e^(-σ(V - τ)))
with steepness σ and threshold τ. For this model g(σ, τ, V, Vi, ω1i, ω2i) is defined as g(σ, τ, V, Vi, ω1i, ω2i) = th(σ, τ, ω1iV + ω2iVi)
with V, Vi activation levels and ω1i, ω2i weights of the connections to the preparation state. Note that alternative combination functions g could be used as well, for example quadratic functions such as used in [15]. Property LP3 is formalised in LEADSTO as: LP3 From desire and feeling to preparation If the desire for b has level V and feeling the associated body state bi has level Vi and the preparation state for bi has level Ui and ω1i is the strength of the connection from desire for b to preparation for bi and ω2i is the strength of the connection from feeling of bi to preparation for bi and σi is the steepness value for the preparation for bi and τi is the threshold value for the preparation for bi and γ1 is the person's flexibility for bodily responses then the preparation state for bi will have level Ui + γ1(g(σi, τi, V, Vi, ω1i, ω2i) - Ui) Δt. desire(b, V) & feeling(bi, Vi) & prep_state(bi, Ui) & has_steepness(prep_state(bi), σi) & has_threshold(prep_state(bi), τi) → prep_state(bi, Ui + γ1 (g(σi, τi, V, Vi, ω1i, ω2i) - Ui) Δt)
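A sketch in Python of the combination model and the LP3 update may clarify the numerical behaviour; this is illustrative only (the original was specified in LEADSTO and simulated in its environment), it assumes the standard logistic form of th given above, and the function names are hypothetical:

```python
import math

# Illustrative sketch of the combination model used in LP3 (not the authors'
# LEADSTO implementation; the standard logistic form of th is assumed).

def th(sigma: float, tau: float, v: float) -> float:
    return 1.0 / (1.0 + math.exp(-sigma * (v - tau)))

def g(sigma, tau, v, vi, w1, w2):
    return th(sigma, tau, w1 * v + w2 * vi)

def lp3_step(u_i, v_desire, v_feeling, w1, w2, sigma, tau, gamma1, dt):
    """One LP3 update of the preparation level U_i for action option i."""
    return u_i + gamma1 * (g(sigma, tau, v_desire, v_feeling, w1, w2) - u_i) * dt

# With the settings of Simulation Trace 1 (sigma=10, tau=0.5, gamma1=0.05):
u = 0.0
for _ in range(100):
    u = lp3_step(u, v_desire=0.3, v_feeling=0.4, w1=1.0, w2=0.5,
                 sigma=10, tau=0.5, gamma1=0.05, dt=1.0)
print(round(u, 3))  # -> 0.497, approaching the equilibrium value 0.5
```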
From preparation to feeling Dynamic properties LP4 and LP5 describe how the as-if body loop together with the body loop affects the feeling. LP4 From preparation and sensor state to sensory representation of body state If the preparation state for body state B has level V1 and the sensor state for B has level V2 and the sensory representation for B has level U and σ is the steepness value for the sensory representation of B and τ is the threshold value for the sensory representation of B and γ2 is the person's flexibility for bodily responses then the sensory representation for body state B will have level U + γ2 (g(σ, τ, V1, V2, 1, 1) - U) Δt. prep_state(B, V1) & sensor_state(B, V2) & srs(B, U) & has_steepness(srs(B), σ) & has_threshold(srs(B), τ) → srs(B, U + γ2 (g(σ, τ, V1, V2, 1, 1) - U) Δt)
Dynamic property LP5 describes the remaining part of the as-if body loop. LP5 From sensory representation of body state to feeling If a sensory representation for body state B has level V, then B will be felt with level V. srs(B, V) → feeling(B, V)
Action performance and effects on body states Temporal relationships LP6, LP7 and LP8 below describe the preparations of body states bi and their effects on body states b and bi. The idea is that the actions performed by body states bi are different means to satisfy the desire related to b, by having an impact on the body state that decreases the activation level V (indicating the extent of
unbalance) of body state b. In addition, when performed, each of them involves an effect on a specific body state bi which can be interpreted as a basis for a form of satisfaction felt for the specific way in which b was satisfied. So, an action performance involving bi has an effect on both body state b, by decreasing the level of unbalance entailed by b, and on body state bi by increasing the specific level of satisfaction. This specific level of satisfaction may or may not be proportional to the extent to which the unbalance is reduced.
Fig. 1. Overview of the computational model for desiring and feeling (the diagram connects desire(b, V), prep_state(bi, Vi), effector_state(bi, Vi), body_state(b, V) and body_state(bi, Vi), sensor_state(b/bi), srs(b/bi) and feeling(bi, Vi) via the local properties LP0–LP8)
As the possible actions to fulfil a desire are considered different, they differ in the extents of their effects on these two types of body states, according to an effectiveness rate αi between 0 and 1 for b, and an effectiveness rate βi between 0 and 1 for bi. The effectiveness rates αi and βi can be considered a kind of connection strengths from the effector state to the body states b and bi, respectively. In common situations for each action these two rates may be equal (i.e., αi = βi), but especially in more pathological
cases they may also have different values where the satisfaction felt based on rate βi for bi may be disproportionally higher or lower in comparison to the effect on b based on rate αi (i.e., βi > αi or βi < αi). An example of this situation would be a case of addiction for one of the actions. To express the extent of disproportionality between βi and αi, a parameter λi, called the satisfaction disproportion rate, between -1 and 1 is used; here: λi = (βi - αi) / (1 - αi) if βi ≥ αi; λi = (βi - αi) / αi if βi ≤ αi. This parameter can also be used to relate βi to αi using a function: βi = f(λi, αi). Here f(λ, α) satisfies f(0, α) = α, f(-1, α) = 0, and f(1, α) = 1. The piecewise linear function f(λ, α) can be defined in a continuous manner as: f(λ, α) = α + λ(1-α) if λ ≥ 0; f(λ, α) = (1+λ)α if λ ≤ 0. Using this, for normal cases λi = 0 is taken, for cases where satisfaction is higher 0 < λi ≤ 1, and for cases where satisfaction is lower -1 ≤ λi < 0.
LP6 From preparation to effector state
If preparation state for B has level V, then the effector state for body state B will have level V.
prep_state(B, V) → effector_state(B, V)
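The following is a direct Python transcription (illustrative, not the authors' code) of the piecewise linear disproportion function f defined above:

```python
# Direct Python transcription (illustrative) of the disproportion function f.

def f(lam: float, alpha: float) -> float:
    """Piecewise linear with f(0, a) = a, f(-1, a) = 0, f(1, a) = 1."""
    if lam >= 0:
        return alpha + lam * (1.0 - alpha)
    return (1.0 + lam) * alpha

print(f(0.0, 0.25))  # -> 0.25: normal case, beta_i = alpha_i
print(f(1.0, 0.05))  # -> 1.0: addiction-like case, satisfaction far above effect
```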
LP7 From effector state to modified body state bi If the effector state for bi has level Vi, and for each i the effectivity of bi for b is αi and the satisfaction disproportion rate for bi for b is λi then body state bi will have level f(λi, αi)Vi. effector_state(bi, Vi) & is_effectivity_for(αi, bi, b) & is_disproportion_rate_for(λi, bi) → body_state(bi, f(λi, αi)Vi)
LP8 From effector state to modified body state b
If the effector states for bi have levels Vi, and body state b has level V, and for each i the effectivity of bi for b is αi, then body state b will have level V + (ϑ * (1-V) – ρ * (1 – ((1 - α1 * V1) * (1 - α2 * V2) * (1 - α3 * V3))) * V) Δt.
effector_state(bi, Vi) & body_state(b, V) & is_effectivity_for(αi, bi, b) → body_state(b, V + (ϑ * (1-V) – ρ * (1 – ((1 - α1*V1) * (1 - α2*V2) * (1 - α3*V3))) * V) Δt)
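To illustrate LP7 and LP8 numerically, here is a hedged Python sketch (function names are hypothetical; the product term follows the LP8 formula above):

```python
# Hedged Python sketch of LP7 and LP8 (not the authors' code): effects of the
# effector states on the satisfaction body states b_i and the unbalance b.

def lp7(v_effector, alpha, lam):
    """LP7: body state b_i gets level f(lambda_i, alpha_i) * V_i."""
    beta = alpha + lam * (1 - alpha) if lam >= 0 else (1 + lam) * alpha
    return beta * v_effector

def lp8(v_b, v_eff, alphas, theta, rho, dt):
    """LP8: update of the unbalance level V of body state b."""
    compensation = 1.0
    for a, v in zip(alphas, v_eff):
        compensation *= (1.0 - a * v)
    return v_b + (theta * (1.0 - v_b) - rho * (1.0 - compensation) * v_b) * dt

# One step with the Trace 1 settings (theta=0.1, rho=0.8), only option 3 active:
print(round(lp8(0.3, [0.0, 0.0, 0.5], alphas=[0.05, 0.25, 1.0],
                theta=0.1, rho=0.8, dt=1.0), 3))  # -> 0.25
```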
Note that in case only one action is performed (i.e., Vj = 0 for all j ≠ i), the formula in LP8 reduces to V + (ϑ * (1-V) – ρ*αi*Vi * V) Δt. In the formula ϑ is a rate of developing unbalance over time (for example, getting hungry), and ρ a general rate of compensating for this unbalance. Note that the specific formula used here to adapt the level of b is meant as just an example. As no assumptions on body state b are made, this formula is meant as a stand-in for more realistic formulae that could be used for specific body states b. Learning of the connections from feeling to preparation The strengths ω2i of the connections from feeling bi to preparation of bi are considered to be subject to learning. When an action involving bi is performed and leads to a strong effect on bi, by Hebbian learning [10, 12] this increases the strength of this connection. This is an adaptive mechanism that models how experiences in the past may have their effect on behavioural choices made in the future, as also described in Damasio's Somatic Marker Hypothesis [6]. Within the model the strength ω2i of the connection from feeling to preparation is adapted using the following Hebbian learning rule. It takes into account a maximal connection strength 1, a learning rate η, and an extinction rate ζ.
LP9 Hebbian learning for the connection from feeling to preparation
If the connection from feeling bi to preparation of bi has strength ω2i
and the feeling bi has level V1i
and the preparation of bi has level V2i
and the learning rate from feeling bi to preparation of bi is η
and the extinction rate from feeling bi to preparation of bi is ζ
then after Δt the connection strength from feeling bi to preparation of bi will be ω2i + (ηV1iV2i (1 - ω2i) - ζω2i) Δt.
has_connection_strength(feeling(bi), preparation(bi), ω2i) & feeling(bi, V1i) & preparation(bi, V2i) & has_learning_rate(feeling(bi), preparation(bi), η) & has_extinction_rate(feeling(bi), preparation(bi), ζ) → has_connection_strength(feeling(bi), preparation(bi), ω2i + (ηV1iV2i (1 - ω2i) - ζω2i) Δt)
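A minimal Python rendering of the LP9 rule (illustrative only, not the authors' code) also shows the convergence behaviour analysed in Section 4:

```python
# Minimal sketch of the LP9 Hebbian learning rule (not the authors' code).

def lp9(w, v_feeling, v_prep, eta, zeta, dt):
    """Strengthen w towards 1 when feeling and preparation co-occur; decay with zeta."""
    return w + (eta * v_feeling * v_prep * (1.0 - w) - zeta * w) * dt

# With eta=0.04 and zeta=0.01 (Trace 1) and both levels at 1, the strength
# converges to eta/(eta+zeta) = 0.8, the bound derived in Section 4:
w = 0.0
for _ in range(2000):
    w = lp9(w, 1.0, 1.0, eta=0.04, zeta=0.01, dt=1.0)
print(round(w, 3))  # -> 0.8
```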
3 Example Simulation Results Based on the model described in the previous section, a number of simulations have been performed. A first example simulation trace included in this section as an illustration is shown in Fig. 2; in all traces, the time delays within the temporal LEADSTO relations were taken 1 time unit. Note that only a selection of the relevant nodes (represented as state properties) is shown. In all of the figures time is on the horizontal axis, and the activation levels of state properties on the vertical axis.
Fig. 2. Simulation Trace 1 – Normal behavior (σ1=σ2=10, τ1=τ2=0.5, γ1=γ2=0.05, α1=β1=0.05, α2=β2=0.25, α3=β3=1, ρ=0.8, ϑ=0.1, η=0.04, ζ=0.01); panel (a) shows the effector states effector1–3, panel (b) body state b and body1–3, panel (c) feeling1–3, and panel (d) the connection strengths 1–3
For the example shown in Fig. 2, for each i it was taken λi = 0, so satisfaction felt is in proportion with fulfilment of the desire. Action option 3 has the highest effectiveness rate, i.e., α3 = 1, higher than that of the other two action options. This effect has been propagated to the respective body states as shown in
Fig. 2(b). All these body states have a positive effect on body state b, decreasing the level of unbalance, as shown in Fig. 2(b), where the value of body state b (which was set initially to 0.3) decreases over time until it reaches an equilibrium state. Each of these body states generates feelings by a recursive as-if body loop, as shown in Fig. 2(c). Furthermore it gives a strong effect on the strength of the connection from feeling to preparation. The connection strength keeps on increasing over time until it reaches an equilibrium state, as shown in Fig. 2(d). As the extinction rate (ζ=0.01) is small compared to the learning rate (η=0.04), the connection strength becomes 0.8, which is close to 1, as confirmed by the mathematical analysis in Section 4. Fig. 3 shows the simulation of an example scenario where the person is addicted to a particular action, in this case to action option 1 (λ1 = 1). But because the effectiveness rate α1 for this option is very low (0.05), the addiction makes the person not very effective in fulfilling the desire: the level of unbalance remains around 0.3; the person mainly selects action option 1 because of its higher satisfaction.
Fig. 3. Simulation Trace 2 – Addiction-like behaviour (σ1=σ2=10, τ1=τ2=0.5, γ1=γ2=0.05, α1=0.05, α2=β2=0.1, α3=β3=0.7, ρ =0.8, ϑ=0.1, η=0.02, ζ=0.01)
In the next trace (see Fig. 4), the effectiveness rates for the different action options have been given a distinct pattern: after some time α1 has been gradually increased in steps of 0.009, starting from an initial value of 0.05 until it reaches the value of 1, and thereafter it has been kept constant at 1. In the same period the effectiveness rate α3 has been gradually decreased in steps of 0.009, starting from an initial value of 1, until it reaches the value of 0.05, and thereafter it has been kept constant at 0.05, showing a pattern exactly opposite to that of α1. Effectiveness rate α2 is kept constant at 0.15 for all
the time points. As can be seen in Fig. 4, first the person selects action option 3 as the most effective one, but after a change in circumstances the person shows adaptation by selecting action option 1, which has now a higher effectiveness rate.
Fig. 4. Simulation Trace 3 – Adapting to changing circumstances (σ1=σ2=6, τ1=τ2=0.5, γ1=γ2=0.1, α1=β1 increasing from 0.05 to 1, α2=β2=0.15, α3=β3 decreasing from 1 to 0.05, ρ =0.8, ϑ=0.1, η=0.04, ζ=0.02)
4 Formal Analysis of the Model This section addresses formal analysis of the model and the simulation results as presented above. First a mathematical analysis of the equilibria is made. Next, a number of more globally emerging dynamic properties are verified for a set of simulation traces. Mathematical analysis of equilibria For an equilibrium of the strength of the connection from feeling bi to preparation of bi, by LP9 it holds ηV1iV2i (1 - ω2i) - ζω2i = 0 with values V1i for feeling level and V2i for preparation level for bi. This can be rewritten into
ω2i = ηV1iV2i / (ηV1iV2i + ζ) = 1 / (1 + ζ/(ηV1iV2i))
Using V1i, V2i ≤ 1 from this it follows that
ω2i ≤ 1 / (1 + ζ/η) = η / (η + ζ)
which gives a maximal connection strength that can be obtained. This shows that given the extinction, the maximal connection strength will be lower than 1, but may be close to
1 when the extinction rate is small compared to the learning rate. For example, for the trace shown in Fig. 2 with ζ = 0.01 and η = 0.04, this bound is 0.8, which indeed is reached for option 3. For the traces in Fig. 3 and 4 with ζ/η = ½ this maximum is 2/3, which is indeed reached for option 1 in Fig. 3 and option 3, resp. 1 in Fig. 4. Whether or not this maximally possible value for ω2i is approximated for a certain option, also depends on the equilibrium values for feeling level V1i and preparation level V2i for bi. For values of V1i and V2i that are 1 or close to 1, the maximal possible value of ω2i is approximated. When in contrast these values are very low, also the equilibrium value for ω2i will be low, since:
ω2i = ηV1iV2i / (ηV1iV2i + ζ) ≤ ηV1iV2i / ζ
So, when one of V1i and V2i is 0 then also ω2i = 0 (and conversely). This is illustrated by the options 1 and 2 in Fig. 2, and option 2 in Fig. 3. Given the sigmoid combination functions it is not possible to analytically solve the equilibrium equations in general. Therefore the patterns emerging in the simulations cannot be derived mathematically in a precise manner. However, as the combination functions are monotonic, some relationships between inequalities can be found:
(1) V1jV2j ≤ V1kV2k ⇒ ω2j ≤ ω2k
(2) ω2j < ω2k ⇒ V1jV2j < V1kV2k
(3) ω2j ≤ ω2k & V1j ≤ V1k ⇒ ω2j V1j ≤ ω2k V1k ⇒ V2j ≤ V2k
(4) V2j < V2k ⇒ ω2j V1j < ω2k V1k
(5) βj ≤ βk & V2j ≤ V2k ⇒ (1+βj) V2j ≤ (1+βk) V2k ⇒ V1j ≤ V1k
(6) V1j < V1k ⇒ (1+βj) V2j < (1+βk) V2k
Here (1) and (2) follow from the above expressions based on LP9. Moreover, (3) and (4) follow from LP3, and (5) and (6) from the properties LP4, LP5, LP6, LP7, LP0 and LP1 describing the body loop and as-if body loop. For the case that one action dominates exclusively, i.e., V2k = 0 and ω2k = 0 for all k ≠ i, and V2i > 0, by LP8 it holds ϑ * (1-V) – ρ * αi * V2i * V = 0 where V is the level of body state b. Therefore for ϑ > 0 it holds
V = 1 / (1 + ραiV2i/ϑ) ≥ 1 / (1 + (ρ/ϑ)αi)
As V2i > 0 is assumed, this shows that if ϑ is close to 0 (almost no development of unbalance), and ρ > 0 and αi > 0, the value V can be close to 0 as well. If, in contrast, the value of ϑ is high (strong development of unbalance) compared to ρ and αi, then the equilibrium value V will be close to 1. For the example traces in Fig. 2, 3 and 4, ρ =0.8 and ϑ=0.1, so ρ /ϑ = 8. Therefore for a dominating option with αi = 1, it holds V ≥ 0.11, which can be seen in Fig. 2 and 4. In Fig. 3 the effectiveness of option 1 is very low (α1 = 0.05), and therefore the potential of this option to decrease V is low: V ≥ 0.7. However, as in Fig. 3 also option 3 is partially active, V reaches values around 0.35. Note that for the special case ϑ = 0 (no development of unbalance) it follows that ρ * αi * V2i * V = 0 which shows that V = 0. Values for V at or close to 0 confirm that in such an equilibrium state the desire is fulfilled or is close to being fulfilled (via LP0, LP1 and LP2 which show that the same value V occurs for the desire).
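The bounds derived above can be checked numerically; the following snippet (illustrative only) plugs in the parameter values of the example traces:

```python
# Quick numerical check (illustrative) of the two equilibrium bounds above,
# using the parameter values of the example traces.

eta, zeta = 0.04, 0.01
print(round(eta / (eta + zeta), 3))             # -> 0.8, the bound reached in Fig. 2

rho, theta = 0.8, 0.1
print(round(1 / (1 + (rho / theta) * 1.0), 3))  # alpha_i = 1:    V >= 0.111 (Figs. 2, 4)
print(round(1 / (1 + (rho / theta) * 0.05), 3)) # alpha_1 = 0.05: V >= 0.714 (Fig. 3)
```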
Logical verification of properties on simulation traces In order to investigate particular patterns in the processes shown in the simulation runs, a number of properties have been formulated. Formal specification of the properties enabled their automatic verification against simulation traces, using the logical language and verification tool TTL (cf. [5]). The purpose of this type of verification is to check whether the simulation model behaves as it should. A typical example of a property that may be checked is whether certain equilibria occur, or whether the appropriate actions are selected. The temporal predicate logical language TTL supports formal specification and analysis of dynamic properties, covering both qualitative and quantitative aspects. TTL is built on atoms referring to states of the world, time points and traces, i.e. trajectories of states over time. Dynamic properties are temporal statements formulated with respect to traces based on the state ontology Ont in the following manner. Given a trace γ over state ontology Ont, the state in γ at time point t is denoted by state(γ, t). These states are related to state properties via the infix predicate |=, where state(γ, t) |= p denotes that state property p holds in trace γ at time t. Based on these statements, dynamic properties are formulated in a sorted predicate logic, using quantifiers over time and traces and the usual logical connectives such as ¬, ∧, ∨, ⇒, ∀, ∃. For more details on TTL, see [5]. A number of properties have been identified for the processes modelled. Note that not all properties are expected to always hold for all traces. The first property, GP1 (short for Global Property 1), expresses that eventually the preparation state with respect to an action will stabilise. GP1(d): Equilibrium of preparation state Eventually, the preparation state for each bi will stabilise at a certain value (i.e., not deviate more than a value d). ∀γ:TRACE, B:BODY_STATE [ ∃t1:TIME [ ∀t2:TIME > t1, V1, V2 :VALUE [ state(γ, t1) |= prep_state(B, V1) & state(γ, t2) |= prep_state(B, V2) ⇒ V2 ≥ (1 – d) * V1 & V2 ≤ (1 + d) * V1 ] ] ]
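For readers without access to the TTL checker, the following Python sketch mimics the content of GP1 on a trace represented as a list of preparation levels; it is a plain re-statement of the property, not the TTL tool itself:

```python
# Sketch (not the TTL checker) of the content of GP1 on a trace represented
# as a list of preparation levels indexed by time.

def gp1_holds(trace, t1, d):
    """After t1, levels stay within a relative band d of the level at t1."""
    v1 = trace[t1]
    return all((1 - d) * v1 <= v2 <= (1 + d) * v1 for v2 in trace[t1 + 1:])

toy_trace = [0.2, 0.5, 0.75, 0.79, 0.8, 0.8, 0.8]
print(gp1_holds(toy_trace, t1=4, d=0.0005))  # -> True
```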
Next, in property GP2 it is expressed that eventually the action which has the most positive feeling associated with it will have the highest preparation state value. GP2: Action with best feeling is eventually selected For all traces there exists a time point such that the bi with the highest value for feeling eventually also has the highest activation level. ∀γ:TRACE, B:BODY_STATE, t1:TIME<end_time, V:VALUE [ [ state(γ, t1) |= feeling(B, V) & ∀B2:BODY_STATE, V2:VALUE [ state(γ, t1) |= feeling(B2, V2) ⇒ V2 ≤ V] ⇒ [ ∃t2:TIME > t1, V1:VALUE [ state(γ, t2) |= prep_state(B, V1) & ∀B3:BODY_STATE, V3:VALUE [ state(γ, t2) |= prep_state(B3, V3) ⇒ V3 ≤ V1 ] ] ] ]
Property GP3 expresses that if the accumulated positive feelings experienced in the past are higher compared to another time point, and the number of negative experiences is lower or equal, then the weight obtained through Hebbian learning will be higher. GP3: Accumulation of positive experiences If at time point t1 the accumulated feeling for bi is higher than the accumulated feeling at time point t2, then the weight of the connection from feeling bi at t1 is at least as high as at t2.
∀γ:TRACE, B:BODY_STATE, a:ACTION, t1, t2:TIME<end_time, V1, V2:VALUE [ [state(γ, t1) |= accumulated_feeling(B, V1) & state(γ, t2) |= accumulated_feeling(B, V2) & V1>V2 ] ⇒ ∃W1, W2:VALUE [state(γ, t1) |= has_connection_strength(feeling(B), preparation(B), W1) & state(γ, t2) |= has_connection_strength(feeling(B), preparation(B), W2) & W1 ≥ W2 ] ]
Next, property GP4 specifies a monotonicity property where two traces are compared. It expresses that strictly higher feeling levels result in a higher weight of the connection between the feeling and the preparation state. GP4: High feelings lead to high connection strength If up to time point t1 the feeling levels in a trace γ1 have been strictly higher than in another trace γ2, then the weight of the connection between the feeling and the preparation state will also be at least as high. ∀γ1, γ2:TRACE, B:BODY_STATE, t1:TIME<end_time, W1, W2:VALUE [∀t’ < t1:TIME, V1, V2:VALUE [ [ state(γ1, t’) |= feeling(B, V1) & state(γ2, t’) |= feeling(B, V2) ] ⇒ V1 > V2 ] & state(γ1, t1) |= has_connection_strength(feeling(B), preparation(B), W1) & state(γ2, t1) |= has_connection_strength(feeling(B), preparation(B), W2) ⇒ W1 ≥ W2 ]
Finally, property GP5 analyses traces that address cases of addiction. In particular, it checks whether it is the case that if a person is addicted to a certain action (i.e., has a high value for the satisfaction disproportion rate λ for this action), this results in a situation of unbalance (i.e., a situation in which the feeling caused by this action stays higher than the overall body state). An example of such a situation is found in simulation trace 2 (in Fig. 3). GP5: Addiction leads to unbalance between feeling and body state For all traces, if a certain action has λ > 0, then there will be a time point t1 after which the feeling caused by this action stays higher than the overall body state. ∀γ:TRACE, B1:BODY_STATE, L1:VALUE [ state(γ, 0) |= has_lambda(B1,L1) & L1 > 0 ⇒ [ ∃t1:TIME < last_time ∀t2:TIME>t1 X,X1:VALUE [ state(γ, t2) |= body_state(b, X) & body_state(B1, X1) ⇒ X < X1 ] ] ]
An overview of the results of the verification process is shown in Table 1 for the three traces that have been considered in Section 4. The results show that several expected global properties of the model were confirmed. For example, the first row indicates that for all traces, eventually an equilibrium occurs in which the values of the preparation states never deviate more than 0.0005 (this number can still be decreased by running the simulation for a longer time period). Also, the checks indicate that some properties do not hold. In such cases, the TTL checker software provides a counter example, i.e., a situation in which the property does not hold. This way, it could be concluded, for example, that property GP1 only holds for the generated traces if d is not chosen too small.
Table 1. Results of verification
property   trace 1      trace 2      trace 3
GP1(X)     X ≥ 0.0001   X ≥ 0.0005   X ≥ 0.0001
GP2        satisfied    satisfied    satisfied
GP3        satisfied    satisfied    satisfied
GP4        satisfied for all pairs of traces
GP5        satisfied    satisfied    satisfied
5 Discussion In this paper an adaptive computational model was introduced for the dynamics of cognitive and affective aspects of desiring, based on neurological theories involving (as-if) body loops, somatic marking, and Hebbian learning. The introduced model describes more specifically how a desire induces (as a response) a set of preparations for a number of possible actions, involving certain body states, which each affect sensory representations of the body states involved and thus provide associated feelings. In turn these feelings affect the preparations, for example, by amplifying them. In this way a model is obtained for desiring which integrates both cognitive and affective aspects of mental functioning.
For the interaction between feeling and preparation of responses, a converging recursive body loop is included in the model, based on elements taken from [3, 7, 8]. Both the strength of the preparation and of the feeling emerge as a result of the dynamic pattern generated by this loop. The model is adaptive in the sense that within these loops the connection strengths from feelings to preparations are adapted over time by Hebbian learning. By this adaptation mechanism, in principle the person achieves that the most effective action to fulfil a desire is chosen. However, the model can also be used to cover persons for whom satisfaction for an action is not in proportion with the fulfilment of the desire, as occurs, e.g., in certain cases of temptation and addiction, such as illustrated in [14].
Despite growing interest in integrating cognitive and affective aspects of mental functioning in recent years, both in informally described approaches [9, 19] and in formal and computational approaches [11, 15], the relation of affective and cognitive aspects of desires has received less than adequate attention. Moreover, most existing formal models that integrate cognitive and affective aspects in mental functioning adopt the BDI (belief-desire-intention) paradigm and/or are based on appraisal theory (e.g., [11]). The proposed model is the first to show the effect of desire on feeling in a formalised computational manner and is based on neurological theories given in the literature as opposed to the BDI paradigm or appraisal-based theories. An interesting contrasting proposal of representing feelings as resistance to variance is put forward by [17]; this model is however not computational.
The computational model was specified in the hybrid dynamic modelling language LEADSTO, and simulations were performed in its software environment; cf. [4]. The computational model was analysed through a number of simulations for a variety of different settings and scenarios, and by formal analyses both by mathematical methods and by automated logical verification of dynamic properties on a set of simulation traces. Several expected global properties, such as the occurrence of equilibria and the selection of appropriate actions, were confirmed for the generated traces. Although this is not an exhaustive proof, it is an important indication that the model behaves as expected. Currently the model is generic in the sense that it does not address any specific desire or feeling. It would be interesting future work to parameterise the model to analyse desires relating to different types of feeling. Future work will also focus on a more extensive validation of the model.
It was shown that under normal circumstances indeed over time the behaviour of the person is more and more focusing on actions that provide higher levels of desire fulfilment and stronger feelings of satisfaction, thus improving effectiveness of desire fulfilment. Also less standard circumstances have been analysed: particular cases in
which the fulfilment of the desire and the feeling of satisfaction are out of proportion, as, for example, shown in some types of addictive behaviour. Indeed also such cases are covered well by the model as it shows over time a stronger focus on the action for which the satisfaction is unreasonably high, thereby reducing the effectiveness to fulfil the desire. In [14] it is reported how this model can be used as a basis for an ambient agent performing model-based reasoning and supporting addictive persons in order to avoid temptations.
References
[1] Bechara, A., Damasio, A.: The Somatic Marker Hypothesis: a neural theory of economic decision. Games and Economic Behavior 52, 336–372 (2004)
[2] Bosse, T., Both, F., Gerritsen, C., Hoogendoorn, M., Treur, J.: Model-Based Reasoning Methods within an Ambient Intelligent Agent Model. In: Mühlhäuser, M., Ferscha, A., Aitenbichler, E. (eds.) Proceedings of the First International Workshop on Human Aspects in Ambient Intelligence, Constructing Ambient Intelligence: AmI-2007 Workshops Proceedings. Communications in Computer and Information Science (CCIS), vol. 11, pp. 352–370. Springer, Heidelberg (2008)
[3] Bosse, T., Jonker, C.M., Treur, J.: Formalisation of Damasio’s Theory of Emotion, Feeling and Core Consciousness. Consciousness and Cognition 17, 94–113 (2008)
[4] Bosse, T., Jonker, C.M., van der Meij, L., Treur, J.: A Language and Environment for Analysis of Dynamics by Simulation. International Journal of Artificial Intelligence Tools 16, 435–464 (2007)
[5] Bosse, T., Jonker, C.M., van der Meij, L., Sharpanskykh, A., Treur, J.: Specification and Verification of Dynamics in Agent Models. International Journal of Cooperative Information Systems 18, 167–193 (2009)
[6] Damasio, A.: Descartes’ Error: Emotion, Reason and the Human Brain. Papermac (1994)
[7] Damasio, A.: The Feeling of What Happens: Body and Emotion in the Making of Consciousness. Harcourt Brace, New York (1999)
[8] Damasio, A.: Looking for Spinoza. Vintage Books, London (2004)
[9] Eich, E., Kihlstrom, J.F., Bower, G.H., Forgas, J.P., Niedenthal, P.M.: Cognition and Emotion. Oxford University Press, New York (2000)
[10] Gerstner, W., Kistler, W.M.: Mathematical formulations of Hebbian learning. Biol. Cybern. 87, 404–415 (2002)
[11] Gratch, J., Marsella, S.: A domain independent framework for modeling emotion. Journal of Cognitive Systems Research 5, 269–306 (2004)
[12] Hebb, D.: The Organisation of Behavior. Wiley, New York (1949)
[13] Marks, J.: The Ways of Desire: New Essays in Philosophical Psychology on the Concept of Wanting. Transaction Publishers, New Brunswick (1986)
[14] Memon, Z.A., Treur, J.: An Adaptive Integrative Ambient Agent Model to Intervene in the Dynamics of Beliefs and Emotions. In: Catrambone, R., Ohlsson, S. (eds.) Proc. of the 32nd Annual Conference of the Cognitive Science Society, CogSci 2010. Cognitive Science Society, Austin (2010, to appear)
[15] Memon, Z.A., Treur, J.: Modelling the Reciprocal Interaction between Believing and Feeling from a Neurological Perspective. In: Zhong, N., Li, K., Lu, S., Chen, L. (eds.) BI 2009. Lecture Notes in Computer Science (LNAI), vol. 5819, pp. 13–24. Springer, Heidelberg (2009)
[16] Riva, G., Vatalaro, F., Davide, F., Alcañiz, M. (eds.): Ambient Intelligence. IOS Press, Amsterdam (2005)
[17] Rudrauf, D., Damasio, A.: A conjecture regarding the biological mechanism of subjectivity and feeling. Journal of Consciousness Studies 12(8-10), 26–42 (2005)
[18] Solomon, R.C. (ed.): Thinking About Feeling: Contemporary Philosophers on Emotions. Oxford University Press, Oxford (2004)
[19] Winkielman, P., Niedenthal, P.M., Oberman, L.M.: Embodied Perspective on Emotion-Cognition Interactions. In: Pineda, J.A. (ed.) Mirror Neuron Systems: the Role of Mirroring Processes in Social Cognition, pp. 235–257. Humana Press/Springer Science (2009)
Modelling the Emergence of Group Decisions Based on Mirroring and Somatic Marking Mark Hoogendoorn, Jan Treur, C. Natalie van der Wal, and Arlette van Wissen Vrije Universiteit Amsterdam, Department of Artificial Intelligence De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands {mhoogen,treur,cn.van.der.wal,wissen}@few.vu.nl http://www.few.vu.nl/~{mhoogen,treur,cn.van.der.wal,wissen}
Abstract. This paper introduces a neurologically inspired computational model for the emergence of group decisions. The model combines an individual decision making model based on Damasio’s Somatic Marker Hypothesis with mutual effects of group members on each other via mirroring of emotions and intentions. The obtained model shows how this combination of assumed neural mechanisms can form an adequate basis for the emergence of common group decisions, while, in addition, there is a feeling of wellness with these common decisions amongst the group members.
1 Introduction To express the impossibility of a task, sometimes the expression ‘like managing a herd of cats’ is used, for example, in relation to managing a group of researchers. This is meant to indicate that no single direction or decision will come out of such a group, no matter how hard it is tried. As an alternative, sometimes a reference is made to ‘riding a garden-cart with frogs’. It seems that such a lack of coherence-directed tendency in a group is considered as something exceptional, kind of surprising, and in a way unfair. However, as each group member is an autonomous agent with his or her own neurological structures, patterns and states, carrying, for example, their own emotions, desires, preferences, and intentions, it would be more reasonable to expect that the surprise concerns the opposite side: how is it possible that so often, groups – even those of researchers – develop coherent directions and decisions, and, moreover, why do the group members in some miraculous manner even seem to feel good with these? This paper presents a neurologically inspired computational modelling approach for the emergence of group decisions. It incorporates the ideas of somatic marking as a basis for individual decision making, see [1], [3], [5], [6], and mirroring of emotions and intentions as a basis for mutual influences between group members, see [7], [11], [12], [14], [15], [16], [18]. The model shows how for many cases indeed, the combination of these two neural mechanisms is sufficient to obtain the emergence of common group decisions on the one hand, and, on the other hand, to achieve that the group members have a feeling of wellness with these decisions.
The paper is organised as follows. In Section 2 a brief introduction of the neurological ideas underlying the approach is presented: mirroring and somatic marking. Next, in Section 3 the computational model is described in detail. Section 4 presents a number of simulation results. Section 5 addresses verification of the model against formally specified properties describing expected emerging patterns. Finally, Section 6 is a discussion.
2 Somatic Marking and Mirroring Cognitive states of a person, such as sensory or other representations often induce emotions felt within this person, as described by neurologist Damasio, [4], [5]; for example: ‘Even when we somewhat misuse the notion of feeling – as in “I feel I am right about this” or “I feel I cannot agree with you” – we are referring, at least vaguely, to the feeling that accompanies the idea of believing a certain fact or endorsing a certain view. This is because believing and endorsing cause a certain emotion to happen.’ ([5], p. 93)
Damasio’s Somatic Marker Hypothesis; cf. [1], [3], [5], [6], is a theory on decision making which provides a central role to emotions felt. Within a given context, each represented decision option induces (via an emotional response) a feeling which is used to mark the option. For example, a strongly negative somatic marker linked to a particular option occurs as a strongly negative feeling for that option. Similarly, a positive somatic marker occurs as a positive feeling for that option. Damasio describes the use of somatic markers in the following way: ‘the somatic marker (..) forces attention on the negative outcome to which a given action may lead, and functions as an automated alarm signal which says: beware of danger ahead if you choose the option which leads to this outcome. The signal may lead you to reject, immediately, the negative course of action and thus make you choose among other alternatives. (…) When a positive somatic marker is juxtaposed instead, it becomes a beacon of incentive.’ ([3], pp. 173-174)
Usually the Somatic Marker Hypothesis is applied to provide endorsements or valuations for options for a person’s actions, thus shaping a decision process. Somatic markers may be innate, but may also be adaptive, related to experiences: ‘Somatic markers are thus acquired through experience, under the control of an internal preference system and under the influence of an external set of circumstances which include not only entities and events with which the organism must interact, but also social conventions and ethical rules.’ ([3], p. 179)
In a social context, the idea of somatic marking can be combined with recent neurological findings on the mirroring function of certain neurons (e.g., [7], [11], [12], [14], [15], [16], [17], [18]). Mirror neurons are neurons which, in the context of the neural circuits in which they are embedded, show both a function to prepare for certain actions or bodily changes and a function to mirror states of other persons. They are active not only when a person intends to perform a specific action or body change, but also when the person observes somebody else intending or performing this action or body change. This includes expressing emotions in body states, such as facial expressions. For example, there is strong evidence that (already from an age of just 1 hour) sensing somebody else’s face expression leads (within about 300 milliseconds) to preparing for and showing the same face expression ([10], pp. 129-130). The idea is
that these neurons and the neural circuits in which they are embedded play an important role in social functioning and in (empathic) understanding of others (e.g., [7], [11], [17], [18]). The discovery of mirror neurons is often considered a crucial step for the further development of the discipline of social cognition, comparable to the role the discovery of DNA has played for biology, as it provides a biological basis for many social phenomena; cf. [11]. Indeed, when states of other persons are mirrored by some of the person’s own states that at the same time are connected via neural circuits to states that are crucial for the own feelings and actions, then this provides an effective basic mechanism for how in a social context persons fundamentally affect each other’s actions and feelings. Given the general principles described above, the mirroring function relates to decision making in two different ways. In the first place mirroring of emotions indicates how emotions felt in different individuals about a certain considered decision option mutually affect each other, and, assuming a context of somatic marking, in this way affect how decision options are valuated by individuals in relation to how they feel about them. A second way in which a mirroring function relates to decision making is by applying it to the mirroring of intentions or action tendencies of individuals for the respective decision options. This may work when, by verbal and/or nonverbal behaviour, individuals show to what extent they tend to choose a certain option. For example, in ([9], p.70) action tendencies are described as ‘states of readiness to execute a given kind of action, [which] is defined by its end result aimed at or achieved’. Both of these (emotion and intention) mirroring effects are incorporated in the computational model introduced below.
3 The Computational Model for Group Decision Making In this section, based on the neurological principles of somatic marking and mirroring discussed in the previous section, the computational model for group decision making is introduced. To design such a model a choice has to be made for the grain-size: for example, it has to be decided in which level of detail the internal neurological processes of individuals are described. Such a choice depends on the aim of the model. In this case the aim was more to be able to simulate emerging patterns in groups of individuals, than to obtain a more detailed account of the intermediate neurological patterns and states involved. Therefore the choice was made to abstract to a certain extent from the latter types of intermediate processes. For example, the process of mirroring is described in an abstract manner by a direct causal relation from the emotional state shown by an individual to the emotional state shown by another individual, and the process of somatic marking is described by a direct causal relation from the emotional state shown for a certain option to the intention shown for this option (see Figure 1). These choices provide a model that is easier to handle for larger numbers of individuals. However, the model can easily be refined into a model that also incorporates more detailed intermediate internal processes, for example, based on recursive as-if body loops involving preparation and sensory neuron activations and the states of feeling the emotion, as shown in [13].
Fig. 1. Abstract causal relations induced by mirroring and somatic marking by person A (the emotion states of the other group members for an option O affect A’s emotion state for O via A’s mirroring of emotion; A’s emotion state affects A’s intention state for O via A’s somatic marking; and the intention states of the other group members for O affect A’s intention state via A’s mirroring of intention)
First for a given state S of a person (for example, an emotion or an intention) the impact due to the person’s mirroring function is described. This is done by a basic building block called the contagion strength for any particular state S between two individuals within a group. This contagion strength from person B to person A for state S is defined as follows:
γSBA = εSB ⋅ αSBA ⋅ δSA (1)
Here εSB is the personal characteristic expressiveness of the sender (person B) for S, δSA the personal characteristic openness of the receiver (person A) for S, and αSBA the interaction characteristic channel strength for S from sender B to receiver A. The expressiveness describes the strength of expression of given internal states by verbal and/or nonverbal behaviour (e.g., body states). The openness describes how strongly stimuli from outside are propagated internally. The channel strength depends on the type of connection between the two persons, for example their closeness. To determine the level qSA(t) of an agent A for a specific state S the following model is used. First, the overall contagion strength γSA from the group towards agent A is calculated:
γSA = ∑B≠A γSBA (2)
This value is used to determine the weighted impact qSA*(t) of all the other agents upon state S of agent A:
qSA*(t) = ∑B≠A γSBA ⋅ qSB(t) / γSA (3)
How much this external influence actually changes state S of the agent A is determined by two additional personal characteristics of the agent, namely the tendency ηSA to absorb or to amplify the level of a state and the bias βSA towards positive or negative impact for the value of the state. The model to update the value of qSA(t) over time is then expressed as follows:
qSA(t + Δt) = qSA(t) + γSA ⋅ [ηSA ⋅ (βSA ⋅ (1 - (1-qSA*(t)) ⋅ (1-qSA(t))) + (1 - βSA) ⋅ qSA*(t) ⋅ qSA(t)) + (1 - ηSA) ⋅ qSA*(t) - qSA(t)] Δt (4)
Here the new value of the state is the old value, plus the change of the value based on the contagion. This change is defined as the multiplication of the contagion strength times a factor for the amplification of information plus a factor for the absorption of information. The absorption part (after 1 - ηSA) simply considers the difference between the incoming contagion and the current level for S. The amplification part (after ηSA) depends on the tendency or bias of the agent towards a more positive (part of the equation multiplied by βSA) or negative (part multiplied by 1 - βSA) level for S. Table 1 summarizes the most important parameters and state variables within the model (note that the last two parameters will be explained below).
Table 1. Parameters and state variables
qSA(t)   level for state S of agent A at time t
εSA      extent to which agent A expresses state S
δSA      extent to which agent A is open to state S
ηSA      tendency of agent A to absorb or amplify state S
βSA      positive or negative bias of agent A on state S
αSBA     channel strength for state S from sender B to receiver A
γSBA     contagion strength for S from sender B to receiver A
ωOIA     weight for group intention impact on agent A’s intention for O
ωOEA     weight for own emotion impact on agent A’s intention for O
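A compact Python sketch of one update according to equations (1)–(4) is given below; it is illustrative only (the authors implemented the model in Matlab), and the function and parameter names are hypothetical:

```python
# Illustrative sketch of equations (1)-(4) for one state S (not the authors'
# Matlab code; all names are hypothetical). Agents are indexed 0..n-1.

def update_state(q, eps, delta, alpha, eta, beta, a, dt):
    """One update of q_SA for agent a, given all agents' current levels q."""
    n = len(q)
    # (1): contagion strengths gamma_SBA = eps_SB * alpha_SBA * delta_SA
    gammas = [eps[b] * alpha[b][a] * delta[a] if b != a else 0.0 for b in range(n)]
    gamma_a = sum(gammas)                                            # (2)
    q_star = sum(g * q[b] for b, g in enumerate(gammas)) / gamma_a   # (3)
    # (4): amplification (weighted by beta) plus absorption
    amplified = beta[a] * (1 - (1 - q_star) * (1 - q[a])) + (1 - beta[a]) * q_star * q[a]
    return q[a] + gamma_a * (eta[a] * amplified + (1 - eta[a]) * q_star - q[a]) * dt

# Toy example: a timid agent 1 is pulled towards expressive agent 0's high level.
q = [0.9, 0.1]
for _ in range(200):
    q[1] = update_state(q, eps=[0.9, 0.1], delta=[0.2, 0.8],
                        alpha=[[0.0, 0.9], [0.1, 0.0]], eta=[0.5, 0.5],
                        beta=[0.5, 0.5], a=1, dt=0.1)
print(round(q[1], 2))  # -> 0.9: agent 1 converges to the leader's level
```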
The abstract model for mirroring described above applies to both emotion and intention states S for an option O, but does not describe any interplay between them yet. Taking the Somatic Marker Hypothesis on decision making as a point of departure, not only intentions of others, but also one’s own emotions affect one’s own intentions. To incorporate such an interaction, the basic model is extended as follows: to update qSA(t) for an intention state S relating to an option O, both the intention states of others for O and the qS'A(t) values for the emotion state S' for O are taken into account. These intention and emotion states S and S' for option O are denoted by OI and OE, respectively:
Level of emotion for option O of person A: qOEA(t)
Level of intention indication for O of person A: qOIA(t)
(5)
where ωOIA and ωOEA are the weights for the contributions of the group intention impact (by mirroring) and the own emotion impact (by somatic marking) on the intention of A for O, respectively, and ωOA = ωOIA + ωOEA. Then the model for the intention and emotion contagion based on mirroring and somatic marking becomes: qOEA(t + Δt) = qOEA(t) + γOEA[ηOEA(βOEA (1 - (1-qOEA*(t))(1-qOEA(t))) + (1-βOEA) qOEA*(t) qOEA(t)) + (1 - ηOEA) qOEA*(t) - qOEA(t) ] ⋅ Δt qOIA(t + Δt) = qOIA(t) + γOIA* [ηOIA (βOIA (1 - (1-qOIA**(t))(1-qOIA(t))) + (1-βOIA) qOIA**(t) qOIA(t)) + (1 - ηOIA) qOIA**(t) - qOIA (t)] ⋅ Δt
(6) (7)
34
M. Hoogendoorn et al.
4 Simulation Results The model has been studied in several scenarios in order to examine whether the proposed approach indeed exhibits the patterns that can be expected from literature. The investigated domain consists of a group of four agents who have to make a choice between four different options: A, B, C or D. The model has been implemented in Matlab by constructing three different scenarios which are characterized by different relationships (i.e., channel strength) between the agents. The scenarios used, involve two more specific types of agents: leaders and followers. Some agents have strong leadership abilities while others play a more timid role within the group. The general characteristics of leaders and followers as they were used in the experiments, which can be manifested differently within all agents, can be found in Table 2. Table 2. Parameters and state variables for leaders and followers emotion level intention level expressivity channel strength
scenario 1
Leader A qOEA high for particular O qOIA high for particular O εSA high αSAB high αSBA low scenario 2
Follower B εSB low αSAB high αSBA low scenario 3
Fig. 2. Scenarios for the presented simulation experiments
The different scenarios are depicted in Figure 2. Scenario 1 consists of a group of agents in which agent1 has strong leadership abilities and high channel strengths with all other agents. His initial levels of emotion and intention for option A, are very high. Scenario 2 depicts a situation where there are two agents with leadership abilities in the group, agent1 and agent4. Agent1 has strong channel strength to agent2, while agent4 has a strong connection to agent3. Agent1 has an initial state of high (positive) emotion and intention for option A, while agent4 has strong emotion and intention states for option D. Agent2 and agent3 have show no strong intentions and emotions for any of the options in their initial emotion and intention states. In Scenario 3 there are no evident leaders. Instead, all agents have moderate channel strengths with each other. A majority of the agents (agent3 and agent4) prefers option C, i.e., initially they have high intention and emotions states for option C. For both scenarios two variants have been created, one with similar agent characteristics within the group (besides the
Modelling the Emergence of Group Decisions
35
difference between leader and follower characteristics), and the second with a greater variety of agent personalities. In this section, only the main results using the greater variety in agent characteristics are shown for the sake of brevity. For the formal verification (Section 6) both have been used. The results of scenario 1 clearly show how one influential leader can influence the emotions and intention in a group. This is shown in the left graph of Figure 3, here the z-axis shows the value for the respective states, and the x-and y-axes represent time and the various agents. The emotion and intention of the leader (in this case agent1) spread through the network of agents, while the emotions and intentions of other agents hardly spread. Consequently, the emotions and intentions for option A, which is the preferred option of the leader, develop to be high in all agents. As can be seen in the figure, there are small differences between the developments of emotions and intentions of the agents. This is because they have different personality characteristics, which are reflected in the settings for the scenario . Depending on their openness, agents are more or less influenced by the states of others. Those agents with low openness (such as agent4) are hardly influenced by intentions and emotions of others. 1
Fig. 3. Simulation results for scenario 1 (left) and scenario 2 (right)
In scenario 2 (as shown in the right graph of Figure 3), the leader has somewhat positive emotions about option C as well, which explains the small but increasing spread of emotions (and after a while also intentions) concerning option C through the social network. Even though agent3 and agent2 both have a moderate intention for option B, their only strong channel strength is with each other, causing only some contagion between the two of them. Their intention does not spread because of a low
1 A full description of the characteristics and the different parameter settings of the agents can be found in Appendix A: http://www.cs.vu.nl/~wai/Papers/group_decisions_appendix1.pdf
expressive nature and low amplification rate of both agents. The patterns found in the simulation of scenario 2 are similar to those of scenario 1, with the addition that both leaders highly dominate the spread of the emotions and intentions. The figure shows that the emotions and intentions of agent2 turn out to depend highly on the emotions and intentions of agent1, whereas the emotions and intentions of agent3 depend highly on those of agent4. As can be seen in the figure, any preferences for option D and C by agent2 and agent3 quickly fade.
Fig. 4. Simulation results for scenario 3
Scenario 3 shows how a group converges to the same high emotions and intentions for an option when there is no authority. In general, the graphs show that when there is no clear leadership, the majority determines the option with the highest emotion and intentions in all agents. Option C, initially preferred by agent4 and agent3, eventually becomes the preferred option for all. However, the emotions and intentions for option A also spread and increase, though to a lesser extent. This is due to the fact that agent1 has strong feelings and intentions for option A and a high amplification level for these states. Furthermore, he has a significant channel strength with agent3, explaining why agent3 shows the strongest increase in emotions and intentions for option A. However, the majority has the most important vote in this scenario. Furthermore, some general statements can be made about the behaviour of the model. In case a leader has high emotions but low intentions for a particular option, both the intentions and emotions of all followers will increase for that option. On the other hand, if a leader has high intentions for a particular option, but not high emotions for that option, this intention will not spread to other agents.
5 Mathematical Analysis of Equilibria

During simulations it turns out that eventually equilibria are reached: all variables approximate values for which no change occurs anymore. Such equilibrium values can also be determined by mathematical analysis of the differential equations for the model:

dqOEA(t)/dt = γOEA [ ηOEA (βOEA (1 − (1 − qOEA*(t))(1 − qOEA(t))) + (1 − βOEA) qOEA*(t) qOEA(t)) + (1 − ηOEA) qOEA*(t) − qOEA(t) ]   (8)

dqOIA(t)/dt = γOIA* [ ηOIA (βOIA (1 − (1 − qOIA**(t))(1 − qOIA(t))) + (1 − βOIA) qOIA**(t) qOIA(t)) + (1 − ηOIA) qOIA**(t) − qOIA(t) ]   (9)
Putting dqOEA(t)/dt = 0 and dqOIA(t)/dt = 0, and assuming γOEA and γOIA* nonzero, provides the following equilibrium equations for each agent A:

ηOEA (βOEA (1 − (1 − qOEA*)(1 − qOEA)) + (1 − βOEA) qOEA* qOEA) + (1 − ηOEA) qOEA* − qOEA = 0   (10)

ηOIA (βOIA (1 − (1 − qOIA**)(1 − qOIA)) + (1 − βOIA) qOIA** qOIA) + (1 − ηOIA) qOIA** − qOIA = 0   (11)
For given values of the parameters ηOEA, βOEA, ηOIA, and βOIA, these equations may be solved analytically or by standard numerical approximation procedures. Moreover, by considering when dqOEA(t)/dt > 0 or dqOEA(t)/dt < 0, one can find out when qOEA(t) is strictly increasing and when it is strictly decreasing, and similarly for qOIA(t). For example, for equation (11), one of the cases considered is the following.

Case ηOIA = 1 and βOIA = 1

For this case, equation (11) reduces to (1 − (1 − qOIA**)(1 − qOIA)) − qOIA = 0. This can easily be rewritten via (1 − qOIA) − (1 − qOIA**)(1 − qOIA) = 0 into qOIA**(1 − qOIA) = 0. From this it can be concluded that equilibrium values satisfy qOIA** = 0 or qOIA = 1; moreover, qOIA is never strictly decreasing, and is strictly increasing when qOIA** > 0 and qOIA < 1. Now the condition qOIA** = 0 is equivalent to

(ωOIA/ωOA) qOIA* + (ωOEA/ωOA) qOEA = 0 ⇔ qOIA* = 0 (if ωOIA > 0) and qOEA = 0 (if ωOEA > 0)
where qOIA* = 0 is equivalent to ∑B≠A γOIBA ⋅ qOIB / γOIA = 0 ⇔ qOIB = 0 for all B≠A with γOIBA > 0. Assuming both ωOIA and ωOEA nonzero, this results in the following:

equilibrium: qOIA = 1, or qOIA < 1 and qOEA = 0 and qOIB = 0 for all B≠A with γOIBA > 0
strictly increasing: qOIA < 1 and (qOEA > 0 or qOIB > 0 for some B≠A with γOIBA > 0)

For a number of cases such results have been found, as summarised in Table 3. This table considers any agent A in the group. Suppose A is the agent in the group with the highest qOEA, i.e., qOEB ≤ qOEA for all B≠A. This implies that qOEA* = ∑B≠A γOEBA ⋅ qOEB / γOEA ≤ ∑B≠A γOEBA ⋅ qOEA / γOEA = qOEA ∑B≠A γOEBA / γOEA = qOEA. So in this case always qOEA* ≤ qOEA. Note that when qOEB < qOEA for some B≠A with γOEBA > 0, then qOEA* = ∑B≠A γOEBA ⋅ qOEB / γOEA < ∑B≠A γOEBA ⋅ qOEA / γOEA = qOEA. Therefore qOEA* = qOEA implies qOEB = qOEA for all B≠A with γOEBA > 0. Similarly, when A has the lowest qOEA of the group, then always qOEA* ≥ qOEA and again qOEA* = qOEA implies qOEB = qOEA for all B≠A with γOEBA > 0. This implies, for example, for ηOEA = 1 and βOEA = 0.5, assuming nonzero γOEBA, that for each option the members' emotion levels for option O will always converge to one value in the group (everybody will feel the same about option O).
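To illustrate this convergence behaviour, the following minimal Python sketch numerically integrates equations (8) and (9) for a four-agent group. The parameter values are illustrative assumptions (η = 1 and β = 0.5 for all agents, the same channel strengths γ used for emotion and intention contagion, and hypothetical initial states); they are not the settings of Appendix A.

```python
import numpy as np

N = 4
rng = np.random.default_rng(1)
gamma = rng.uniform(0.2, 1.0, (N, N))      # channel strength from agent B to agent A
np.fill_diagonal(gamma, 0.0)
g = gamma.sum(axis=0)                      # gamma_OEA = sum over B of gamma_OEBA

eta, beta = 1.0, 0.5                       # amplification parameters (all agents alike)
wI, wE = 0.6, 0.4                          # weights omega_OIA, omega_OEA (wI + wE = 1)

qE = np.array([0.9, 0.2, 0.3, 0.1])        # initial emotion levels for option O
qI = np.array([0.8, 0.2, 0.3, 0.1])        # initial intention levels for option O

def step(qE, qI, dt=0.1):
    qE_star = gamma.T @ qE / g             # weighted average of the others' emotions
    qI_star = gamma.T @ qI / g             # weighted average of the others' intentions
    qI_dstar = wI * qI_star + wE * qE      # q_OIA**: intentions modulated by own emotion
    dE = eta * (beta * (1 - (1 - qE_star) * (1 - qE)) + (1 - beta) * qE_star * qE) \
         + (1 - eta) * qE_star - qE
    dI = eta * (beta * (1 - (1 - qI_dstar) * (1 - qI)) + (1 - beta) * qI_dstar * qI) \
         + (1 - eta) * qI_dstar - qI
    return qE + g * dE * dt, qI + g * dI * dt

for _ in range(3000):
    qE, qI = step(qE, qI)
print(qE, qI)   # for eta = 1, beta = 0.5 all levels approach one common group value
```

With these settings the printed emotion and intention levels approach one common value, in line with the analysis above.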
Table 3. Equilibria cases for an agent A with both ωOEA > 0, ωOIA > 0, and γOEBA > 0 for all B. For each combination of parameter settings, the alternative equilibria are listed (combinations whose conditions contradict each other, marked "none" in the original layout, are omitted).

ηOEA = 1, βOEA = 1 with ηOIA = 1, βOIA = 1:
  qOEA = 1, qOIA = 1;  or
  qOEA < 1, qOIA = 1, qOEB = 0 for all B ≠ A;  or
  qOIA < 1, qOIB = 0 for all B ≠ A, qOEC = 0 for all C

ηOEA = 1, βOEA = 1 with ηOIA = 1, βOIA = 0.5:
  qOEA = 1, qOIA** = qOIA;  or
  qOEA < 1, qOIA** = qOIA, qOEB = 0 for all B ≠ A

ηOEA = 1, βOEA = 1 with ηOIA = 1, βOIA = 0:
  qOEA = 1, qOIA = 0;  or
  qOEA = 1, qOIA > 0, qOIB = 1 for all B ≠ A;  or
  qOEA < 1, qOIA = 0, qOEB = 0 for all B ≠ A

ηOEA = 1, βOEA = 0.5 with ηOIA = 1, βOIA = 1:
  qOEA* = qOEA, qOIA = 1;  or
  qOIA < 1, qOIB = 0 for all B ≠ A, qOEC = 0 for all C

ηOEA = 1, βOEA = 0.5 with ηOIA = 1, βOIA = 0.5:
  qOEA* = qOEA, qOIA** = qOIA

ηOEA = 1, βOEA = 0.5 with ηOIA = 1, βOIA = 0:
  qOEA* = qOEA, qOIA = 0;  or
  qOIA > 0, qOIB = 1 for all B ≠ A, qOEC = 1 for all C

ηOEA = 1, βOEA = 0 with ηOIA = 1, βOIA = 1:
  qOEA = 0, qOIA = 1;  or
  qOEA = 0, qOIA < 1, qOIB = 0 for all B ≠ A;  or
  qOEA > 0, qOIA = 1, qOEB = 1 for all B ≠ A

ηOEA = 1, βOEA = 0 with ηOIA = 1, βOIA = 0.5:
  qOEA = 0, qOIA** = qOIA;  or
  qOEA > 0, qOIA** = qOIA, qOEB = 1 for all B ≠ A

ηOEA = 1, βOEA = 0 with ηOIA = 1, βOIA = 0:
  qOEA = 0, qOIA = 0;  or
  qOEA > 0, qOIA = 0, qOEB = 1 for all B ≠ A;  or
  qOIA > 0, qOIB = 1 for all B ≠ A, qOEC = 1 for all C
6 Verifying Properties Specifying Emerging Patterns

This section addresses the analysis of the group decision making model by specification and verification of properties expressing dynamic patterns that emerge. The purpose of this type of verification is to check whether the model behaves as it should, by automatically verifying such properties against the simulation traces for the various scenarios. In this way the modeller can easily detect inappropriate behaviours and locate sources of errors in the model. A typical example of a property that may be checked is whether no unexpected situations occur, such as a variable running out of its bounds (e.g., qA(t) > 1, for some time point t and agent A), or whether eventually an equilibrium value is reached; but also more detailed expected properties of the model can be checked, such as compliance with the theories found in the literature. A number of dynamic properties have been identified, formalized in the Temporal Trace Language (TTL), cf. [2], and automatically checked. The TTL software environment includes a dedicated editor supporting the specification of dynamic properties, yielding formally represented formulae in TTL, a temporal predicate logic language. In addition, an automated checker is included that takes such a formula and a set of traces as input, and verifies automatically whether the formula holds for the traces. The language TTL is built on atoms referring to states of the world, time points and traces, i.e.
trajectories of states over time. In addition, dynamic properties are temporal predicate logic statements that can be formulated with respect to traces based on a state ontology. Below, a number of the dynamic properties that were identified for the group decision making model are introduced, both in semi-formal and in informal notation (where state(γ, t) |= p denotes that p holds in trace γ at time t). The first property counts the number of subgroups that are present. Here, a subgroup is defined as a group of agents having the same highest intention. Each agent has 4 intention values (one for each of the four options), therefore the number of subgroups that can emerge is always 1, 2, 3, or 4.

P1 – number of subgroups
The number of subgroups in a trace γ is the number of options for which there exists at least one agent that has an intention for this option as its highest valued intention.
P1_number_of_subgroups(γ:TRACE) ≡ sum(I:INTENTION, case(highest_intention(γ, I), 1, 0))
where highest_intention(γ:TRACE, I:INTENTION) ≡ ∃A:AGENT [∀R1:REAL state(γ, te) |= has_value(A, I, R1) ⇒ ∀I2:INTENTION≠I, ∀R2:REAL [state(γ, te) |= has_value(A, I2, R2) ⇒ R2 < R1]]
In this property, the expression case(p, 1, 0) in TTL functions as follows: if property p holds, it evaluates to the second argument (1 in this example), and to the third argument (0 in this example) otherwise. The sum operator simply adds these over the number of elements in the sort over which the sum is calculated (the intentions in this case). Furthermore, when tb or te are used in a property, they denote the begin or end time of the simulation, whereby at te an equilibrium is often reached. Property P1 can be used to count the number of subgroups that emerge. A subgroup is defined as a group of agents that each have the same intention as their intention with highest value. This property was checked on multiple traces that each belong to one of the three scenarios discussed in the simulation results section. For the traces of both variants of scenario 1 a single subgroup was found, for scenario 2 two subgroups were found, and for scenario 3 a single subgroup was found, which is precisely according to the expectations. The second property counts the number of agents in each of the subgroups, using a similar construct.

P2 – subgroup size
The number of agents in a subgroup for intention I is the number of agents that have this intention as their highest intention.
P2_subgroup_size(γ:TRACE, I:INTENTION) ≡ sum(A:AGENT, case(highest_intention_for(γ, I, A), 1, 0))
where highest_intention_for(γ:TRACE, I:INTENTION, A:AGENT) ≡ ∀R1:REAL [state(γ, te) |= has_level(A, I, R1) ⇒ ∀I2:OPTION≠I, ∀R2:REAL [state(γ, te) |= has_level(A, I2, R2) ⇒ R2 < R1]]
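As an illustration, P1 and P2 can be rendered in ordinary Python when a trace is summarised by its final (te) state: a mapping from each agent to its intention values per option. The agent names and values below are hypothetical, chosen to mimic the single-subgroup outcome of scenario 1; they are not the paper's actual trace data.

```python
final_state = {
    "agent1": {"A": 0.90, "B": 0.20, "C": 0.30, "D": 0.10},
    "agent2": {"A": 0.80, "B": 0.30, "C": 0.40, "D": 0.20},
    "agent3": {"A": 0.70, "B": 0.20, "C": 0.50, "D": 0.30},
    "agent4": {"A": 0.90, "B": 0.10, "C": 0.20, "D": 0.20},
}

def highest_intention(values):
    return max(values, key=values.get)

# P1: a subgroup exists for each option that is some agent's highest intention
subgroup_options = {highest_intention(v) for v in final_state.values()}
print("P1, number of subgroups:", len(subgroup_options))          # -> 1

# P2: the size of the subgroup for option I
for option in sorted(subgroup_options):
    size = sum(highest_intention(v) == option for v in final_state.values())
    print("P2, subgroup size for option", option, ":", size)      # -> 4
```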
In the traces for scenario 1 the size of the single subgroup that occurred was 4 agents. For scenario 2, two subgroups of 2 agents each were found. Finally, in scenario 3, a single subgroup of 4 agents was found. These findings are correct; they indeed correspond to the simulation results.
The final property, P3, expresses that an agent is a leader in case its intention values have changed the least over the whole simulation trace, as seen from its initial intention values and compared to the other agents (thereby assuming that these agents moved towards the intention of the leader that managed to convince them of this intention).

P3 – leader
An agent is considered a leader in a trace if the number of intentions for which it has the lowest change is at least as high as that of all other agents.
P3_leader(γ:TRACE, A:AGENT) ≡ ∀A2:AGENT ≠A sum(I:INTENTION, case(leader_for_intention(γ, A, I),1,0)) ≥ sum(I:INTENTION, case(leader_for_intention(γ, A2, I),1,0))
where leader_for_intention(γ:TRACE, A:AGENT, I:INTENTION) ≡ ∀R1, R2: REAL [ [state(γ, tb) |= has_value(A, I, R1) & state(γ, te) |= has_value(A, I, R2) ] ⇒ ∀R3, R4: REAL, ∀A2:AGENT ≠A [state(γ, tb) |= has_value(A2, I, R4) & state(γ, te) |= has_value(A2, I, R3) ⇒ |R2−R1| < |R3−R4| ]]
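A Python rendering of P3 along the same lines follows, again on hypothetical begin/end states (value_at_tb, value_at_te) rather than the paper's traces:

```python
trace = {
    "agent1": {"A": (0.90, 0.92), "B": (0.10, 0.12), "C": (0.20, 0.21), "D": (0.10, 0.11)},
    "agent2": {"A": (0.30, 0.85), "B": (0.40, 0.20), "C": (0.30, 0.25), "D": (0.20, 0.15)},
    "agent3": {"A": (0.20, 0.80), "B": (0.40, 0.22), "C": (0.50, 0.30), "D": (0.30, 0.18)},
    "agent4": {"A": (0.40, 0.88), "B": (0.20, 0.15), "C": (0.20, 0.18), "D": (0.20, 0.14)},
}
options = ["A", "B", "C", "D"]

def change(agent, option):
    tb_value, te_value = trace[agent][option]
    return abs(te_value - tb_value)

def leads_for(agent, option):
    # strictly smallest change on this option among all agents
    return all(change(agent, option) < change(other, option)
               for other in trace if other != agent)

counts = {a: sum(leads_for(a, o) for o in options) for a in trace}
leaders = [a for a, c in counts.items() if c == max(counts.values())]
print(counts, "->", leaders)    # here agent1 changes least on every option
```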
Using this definition, only agent 1 qualifies as a leader in scenario 1. For scenario 2, only agent 4 is a leader. Finally, in scenario 3, both agent 1 and agent 3 are found to be leaders, as they have an equal number of intentions for which they change the least.
7 Discussion

In this paper, an approach has been presented to model the emergence of group decisions. The current model has been based on the neurological concept of mirroring (see e.g. [12], [18]) in combination with the Somatic Marker Hypothesis of Damasio (cf. [1], [3], [5], [6]). An existing model of emotion contagion (cf. [8]) was taken as inspiration, and has been generalised to contagion of both emotions and intentions, and extended with interaction between the two, in the form of influences of emotions upon intentions. Several scenarios have been simulated by the model to investigate the emerging patterns, and also to look at leadership of agents within groups. The results of these simulation experiments show patterns as desired and expected. To make this claim more solid, both a mathematical analysis and a formal verification of the simulation traces have been performed, showing that the model indeed behaves properly. For future work, an interesting element would be to scale up the simulations and investigate the behaviour of agents in larger-scale settings. Furthermore, developing a more detailed neurological model is also part of future work, thereby defining an abstraction relation mapping between this detailed-level model and the current model.

Acknowledgements. This research has partly been conducted as part of the FP7 ICT Future Enabling Technologies program of the European Commission under grant agreement No 231288 (SOCIONICAL).
References
1. Bechara, A., Damasio, A.: The Somatic Marker Hypothesis: A neural theory of economic decision. Games and Economic Behavior 52, 336–372 (2004)
2. Bosse, T., Jonker, C.M., van der Meij, L., Sharpanskykh, A., Treur, J.: Specification and Verification of Dynamics in Agent Models. International Journal of Cooperative Information Systems 18, 167–193 (2009)
3. Damasio, A.: Descartes' Error: Emotion, Reason and the Human Brain. Papermac, London (1994)
4. Damasio, A.: The Feeling of What Happens: Body and Emotion in the Making of Consciousness. Harcourt Brace, New York (1999)
5. Damasio, A.: Looking for Spinoza. Vintage Books, London (2003)
6. Damasio, A.: The Somatic Marker Hypothesis and the Possible Functions of the Prefrontal Cortex. Philosophical Transactions of the Royal Society: Biological Sciences 351, 1413–1420 (1996)
7. Damasio, A., Meyer, K.: Behind the looking-glass. Nature 454, 167–168 (2008)
8. Duell, R., Memon, Z.A., Treur, J., van der Wal, C.N.: An Ambient Agent Model for Group Emotion Support. In: Cohn, J., Nijholt, A., Pantic, M. (eds.) Proceedings of the Third International Conference on Affective Computing and Intelligent Interaction, ACII 2009, pp. 550–557. IEEE Computer Society Press, Los Alamitos (2009)
9. Frijda, N.H.: The Emotions. Studies in Emotion and Social Interaction. Cambridge University Press, Cambridge (1987)
10. Goldman, A.I.: Simulating Minds: The Philosophy, Psychology, and Neuroscience of Mindreading. Oxford University Press, New York (2006)
11. Iacoboni, M.: Mirroring People. Farrar, Straus and Giroux, New York (2008)
12. Iacoboni, M.: Understanding others: imitation, language, empathy. In: Hurley, S., Chater, N. (eds.) Perspectives on Imitation: From Cognitive Neuroscience to Social Science, vol. 1, pp. 77–100. MIT Press, Cambridge (2005)
13. Memon, Z.A., Treur, J.: Modelling the Reciprocal Interaction between Believing and Feeling from a Neurological Perspective. In: Zhong, N., Li, K., Lu, S., Chen, L. (eds.) BI 2009. LNCS (LNAI), vol. 5819, pp. 13–24. Springer, Heidelberg (2009)
14. Rizzolatti, G.: The mirror-neuron system and imitation. In: Hurley, S., Chater, N. (eds.) Perspectives on Imitation: From Cognitive Neuroscience to Social Science, vol. 1, pp. 55–76. MIT Press, Cambridge (2005)
15. Rizzolatti, G., Craighero, L.: The mirror-neuron system. Annual Review of Neuroscience 27, 169–192 (2004)
16. Rizzolatti, G., Fogassi, L., Gallese, V.: Neuro-physiological mechanisms underlying the understanding and imitation of action. Nature Reviews Neuroscience 2, 661–670 (2001)
17. Rizzolatti, G., Sinigaglia, C.: Mirrors in the Brain: How Our Minds Share Actions and Emotions. Oxford University Press, Oxford (2008)
18. Pineda, J.A. (ed.): Mirror Neuron Systems: The Role of Mirroring Processes in Social Cognition. Humana Press, Totowa (2009)
Rank-Score Characteristics (RSC) Function and Cognitive Diversity

D. Frank Hsu1, Bruce S. Kristal2, and Christina Schweikert1

1 Department of Computer and Information Science, Fordham University, New York, NY 10023, USA
2 Department of Neurosurgery, Brigham and Women's Hospital, Boston, MA 02115, USA, and Department of Surgery, Harvard Medical School, Boston, MA 02115, USA
Abstract. In Combinatorial Fusion Analysis (CFA), a set of multiple scoring systems is used to facilitate integration and fusion of data, features, and/or decisions so as to improve the quality of resultant decisions and actions. Specifically, in a recently developed information fusion method, each system consists of a score function, a rank function, and a Rank-Score Characteristic (RSC) function. The RSC function illustrates the scoring (or ranking) behavior of the system. In this report, we show that RSC functions can be computed easily and can be used to measure cognitive diversity for two or more scoring systems. In addition, we show that measuring diversity using the RSC function is inherently distinct from the concept of correlation in statistics and can be used to improve fusion results in classification and decision making. Among a set of domain applications, we discuss information retrieval, virtual screening, and target tracking.
1 Introduction

In the second half of the last century, and now as we enter the second decade of the twenty-first century, information and scientific revolutions have taken place and new progress is being made. The emerging digital and genomic landscapes have shaped our life, community, culture, society, and the world.

1.1 The Digital Landscape and the Genomic Landscape

The number of information providers and users has increased tremendously over the last two to three decades, and now includes a large percentage of the population of the developed world. The nature of information content has also changed drastically, from text to a mix of text, speech, still and video images, to histories of interactions with colleagues, friends, information sources, and their automated proxies. Raw data sources of interest also now include tracks of sensor readings from GPS devices, medical devices, and possibly other embedded sensors and robots in our environment [23]. Communication conduits have included twisted pairs, coaxial cables, optical fibers, wireline, wireless, satellite, and the Internet. More recently, the list extends to include radio, iPod, iPhone, Blackberry, laptop, notebook, desktop, and iPad. As such, a pipeline has been formed [7]:
Data ---> Information ---> Knowledge

Medicine is beginning to make strides using genomic information and biomarkers to study, diagnose, and treat diseases and disorders, heralding the beginning of the era of personalized medicine. Moreover, a renewed emphasis on translational science (from bench to bedside) has similarly begun to enhance the diagnostics and screening of diseases and disorders and to improve the process of (and way for) treatment (and therapy). More recently, molecular networks, which connect molecular biology to clinical medicine, have become a major focus for translational science [24].

1.2 The Fourth Paradigm and the "CompXInfor" Evolution

Jim Gray, in his presentation to the Computer Science and Telecommunications Board, proposed what he considered a new paradigm for scientific discovery, which he called the Fourth Paradigm [10]. He argued that, as early as a thousand years ago, science could be described as "empirical," describing natural phenomena. Then, in the last few hundred years, "theoretical" branches of science used models, methods, and generalizations. In the last few decades, scientists have increasingly used "computational" models and simulations as an adjunct to study complex phenomena. Indeed, one branch of contemporary scientific approaches utilizes "data exploration" (what he called e-science) to attempt to synergistically probe and/or unify experiment, theory, and simulation. Similarly, experiments today increasingly involve megavariate datasets captured by instruments or generated by simulators and processed by software. Information and knowledge are stored in computers or data centers as databases. This information and these databases are analyzed using statistical and computational tools and techniques. A point raised by Jim Gray in the above exposition [10] is that one of the central problems in scientific discovery is how to codify and represent knowledge in a given discipline X. Several generic problems include: data ingest, managing large datasets, identifying and enforcing common schema, how to organize and reorganize these data and their associated analyses, building and executing models, documenting experiments, curation, long-term preservation, interpretation of information, and transformation of information to knowledge. All these issues require computational and informatics tools and techniques. Hence "CompXinfor" is born, which means computational-X and X-informatics for a given discipline X. One example is computational biology and bioinformatics. Another is computational neuroscience and neuroinformatics. The name of this conference is related to computational brain and brain informatics.

1.3 Informatics and Information Fusion

The word "informatics" has been used very often in several different contexts and disciplines. Webster's Dictionary (10th Edition) describes it as "information science," which is stated as "the collection, classification, storage, retrieval, and dissemination of recorded knowledge treated both as a pure and as an applied science." In an attempt to place the framework and issues in proper perspective, we suggest the following: "Informatics is the science that studies and investigates the acquisition, representation, processing, interpretation, and transformation of information in, for, and by living organisms, neuronal systems, interconnection networks, and other complex systems."
Informatics, as an emerging scientific discipline consisting of methods, processes, and applications, is the crucial link between domain data and domain knowledge (see Figure 1 and Figure 2).
Fig. 1. Scope and Scale of Informatics
Fig. 2. The Autopoiesis of Informatics
Information fusion is the "integration" or "combination" of information (or data) from multiple sensors, sources, features, classifiers, and decisions in order to improve the quality of situation analysis, ensembled decisions, and action outcomes (see [1, 6, 14, 16, 26]). Information fusion is a crucial and integral part of the process and function of informatics. Combinatorial Fusion Analysis (CFA), a recently developed information fusion method, uses multiple scoring systems to facilitate fusion of data, features, and decisions [14]. Figure 3 depicts the CFA architecture and its informatics flow.
Fig. 3. The CFA Architecture
2 Combinatorial Fusion Analysis (CFA): An Emerging Fusion Method

As stated in Section 1.2 and depicted in Figure 1 and Figure 3, the representation of data in the process of informatics takes a sequence of formats: Data ---> Features ---> Decision. The following remark might help clarify the complexity of the informatic process and justify the need for information fusion [11, 12, 14].

Remark 2.1: (a) Real-world applications today typically involve data sets collected from different devices/sources or generated from different information sources/experiments. Different features/attributes/indicators/cues frequently use different, often non-interconvertible kinds of measurements or parameters, and different decisions/methods may be appropriate for different feature sets, different data sets, and different temporal traces. (b) Different methods/systems for decision and action may be ensembled to address potential solutions to the same problem with the same (or different) data and feature sets.

2.1 Multiple Scoring Systems (MSS)

Let D be a set of, for example, documents, genes, molecules, or classes with |D| = n. Let N = [1, n] be the set of integers from 1 to n and R be the set of real numbers. We have the following:

Remark 2.2: In a set of p scoring systems A1, A2, …, Ap on D, each scoring system A consists of a score function sA, a rank function rA derived by sorting the score function sA, and a Rank-Score Characteristic (RSC) function fA defined as fA: N→R in Figure 4:
Fig. 4. Rank-Score Characteristic (RSC) Function
In a set of p scoring systems A1, A2, …, Ap, there are many, indeed essentially an infinite number of different ways to combine these scoring systems into a single system A* (e.g. see [14] and [29]). Hsu and Taksa studied comparisons between score combination and rank combination [13], which are defined as follows:
Remark 2.3: Let A1, A2, …, Ap be p scoring systems. Let Cs(∑Ai) = E and Cr(∑Ai) = F be the score combination and rank combination defined by sE(d) = (1/p) ∑ sAi(d) and sF(d) = (1/p) ∑ rAi(d), where rE and rF are derived by sorting sE and sF in decreasing order and increasing order, respectively.

Depending on the application domain, performance can be evaluated using different measurements such as true/false positives and true/false negatives, precision and recall, goodness of hit, specificity and sensitivity, etc. Once one or more performance measurements have been agreed upon, the following remark states two of the most fundamental problems in information fusion. For simplicity, we only describe the case where p = 2 and a single performance metric is used.

Remark 2.4: Let A and B be two scoring systems on the domain set D. Let E = Cs(A,B) and F = Cr(A,B) be a score combination and a rank combination of A and B. Let P be a performance measurement. (a) When is P(E) or P(F) greater than or equal to max{P(A), P(B)}? (b) When is P(F) greater than or equal to P(E)?

2.2 How to Compute the RSC Function fA?

For a scoring system A with score function sA, as stated in Remark 2.2 and shown in Figure 4, its rank function rA can be derived by sorting the score values in decreasing order and assigning a rank value to replace the score value. The diagram in Figure 4 shows that, mathematically, fA(i) = (sA ∘ rA−1)(i) = sA(rA−1(i)) for i in N = [1, n]. Computationally, we can derive fA simply by sorting the score values using the rank values as the keys. The example in Figure 5 illustrates an RSC function on D = {d1, d2, …, d12} using the computational approach, which is short and easy. However, Figure 6 (a), (b), (c), (d), and (e) depict a statistical approach to derive the same RSC
D:                      d1   d2   d3   d4   d5   d6   d7   d8   d9   d10  d11  d12
Score function s: D→R:  3    8.5  8    4.5  4    10   9.5  3.5  1    2    5    5.5
Rank function r: D→N:   10   3    4    7    8    1    2    9    12   11   6    5

RSC function f: N→R:
i:     1    2    3    4    5    6    7    8    9    10   11   12
f(i):  10   9.5  8.5  8    5.5  5    4.5  4    3.5  3    2    1
Fig. 5. Computational Derivation of RSC Function
Variable:             d1   d2   d3   d4   d5   d6   d7   d8   d9   d10  d11  d12
Score function s(di): 3    8.5  8    4.5  4    10   9.5  3.5  1    2    5    5.5
Rank function r(di):  10   3    4    7    8    1    2    9    12   11   6    5

(a) Score Function and Rank Function

(b) epdf = empirical probability distribution function [plot omitted]

(c) cumulative epdf = cepdf ---> epcdf ---> cdf = cumulative distribution function [plot omitted]

(d) i-cdf = inverse-cdf [plot omitted]

(e) reversed inverse-cdf [plot omitted]

Figure 6(c) → Figure 6(d): interchange the x and y coordinates, x ↔ y
Figure 6(d) → Figure 6(e): re-label the x-coordinates 12x → x and transform f(i) → f(13−i)

Rank-Score Characteristic (RSC) Function:
i:     1    2    3    4    5    6    7    8    9    10   11   12   (N)
f(i):  10   9.5  8.5  8    5.5  5    4.5  4    3.5  3    2    1    (R)

(f) Resulting RSC function f = s ∘ r−1

Fig. 6. Statistical Derivation of RSC Function
function, which is much longer and more complicated. For the sake of contrast and comparison, we use the same data set D = {d1, d2, …, d12}. The function in Figure 6(d) is the inverse of that in Figure 6(c). The function in Figure 6(e) is derived from Figure 6(d) by re-labeling the x-coordinates 12x → x and applying the transformation f(i) → f(13−i). According to this statistical approach, the function in Figure 6(e), as derived from Figure 6(a), would have been called the "reversed inverse empirical probability cumulative distribution function."
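The computational derivation of Remark 2.2 and the combinations of Remark 2.3 are simple enough to state directly in code. The following Python sketch reproduces the Figure 5 example; ties and score normalisation are ignored, and the combine helper and the second score set scoresB are illustrative renderings for p = 2 rather than anything defined in the paper.

```python
scores = {"d1": 3, "d2": 8.5, "d3": 8, "d4": 4.5, "d5": 4, "d6": 10,
          "d7": 9.5, "d8": 3.5, "d9": 1, "d10": 2, "d11": 5, "d12": 5.5}

def rank_function(s):
    """Rank 1 goes to the highest score (sorting in decreasing order)."""
    ordered = sorted(s, key=s.get, reverse=True)
    return {d: i + 1 for i, d in enumerate(ordered)}

def rsc_function(s, r):
    """f(i) = s(r^-1(i)): sort the score values using the rank values as keys."""
    inv = {i: d for d, i in r.items()}
    return {i: s[inv[i]] for i in sorted(inv)}

r = rank_function(scores)
f = rsc_function(scores, r)
print(f)    # {1: 10, 2: 9.5, 3: 8.5, ..., 12: 1}, as in Figure 5

# Remark 2.3 for p = 2: score combination E and rank combination F of A and B
def combine(sA, sB, rA, rB):
    sE = {d: (sA[d] + sB[d]) / 2 for d in sA}       # average of scores
    sF = {d: (rA[d] + rB[d]) / 2 for d in sA}       # average of ranks
    rE = rank_function(sE)                          # sort sE in decreasing order
    rF = {d: i + 1 for i, d in enumerate(sorted(sF, key=sF.get))}  # increasing order
    return rE, rF

scoresB = {"d1": 6, "d2": 7, "d3": 5, "d4": 6.5, "d5": 5.5, "d6": 8,
           "d7": 7.5, "d8": 4, "d9": 2, "d10": 3, "d11": 6.2, "d12": 6.8}
rE, rF = combine(scores, scoresB, r, rank_function(scoresB))
```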
3 Diversity vs. Correlation

3.1 RSC Function for Computing Cognitive Diversity

Let D be a set of twenty students, and consider the example of three professors A, B, and C assigning scores to this class at the end of a semester. Figure 7 illustrates three potential RSC functions fA, fB, and fC, respectively. In this case, each RSC function illustrates the scoring (or ranking) behavior of the scoring system, which is one of the professors. The example shows that Professor A has a very evenly distributed scoring practice, while Professor B gives fewer students high scores and Professor C gives more students high scores.
Fig. 7. Three RSC functions fA, fB, and fC
This example thus highlights a use of multiple scoring systems, as we could use a multiple scoring system to assess how good a given student is in the combined views of the three professors. Specifically, in a multiple scoring system, suppose we have two systems A and B. The concept of diversity d(A,B) can be defined as follows (see [14]).

Remark 3.1: For scoring systems A and B, the diversity d(A,B) between A and B can be defined as:
(a) d(A,B) = 1 − d(sA, sB), where d(sA, sB) is the correlation (e.g., the Pearson correlation) between score functions sA and sB,
(b) d(A,B) = 1 − d(rA, rB), where d(rA, rB) is the rank correlation (e.g., Kendall's τ or Spearman's ρ) between rank functions rA and rB, and
(c) d(A,B) = d(fA, fB), the diversity between RSC functions fA and fB.
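The three notions of diversity in Remark 3.1 can be computed directly; a small Python sketch follows, using scipy for the correlations. For (c) we take the (unnormalised) sum of absolute differences between the two RSC functions, i.e., the area between the RSC curves mentioned in Section 4.1; this is one possible instantiation of d(fA, fB), not the paper's fixed definition, and system B's scores are invented for illustration.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

sA = np.array([3, 8.5, 8, 4.5, 4, 10, 9.5, 3.5, 1, 2, 5, 5.5])   # Figure 5 data
sB = np.array([5, 7.2, 6, 5.1, 4.9, 8, 7.5, 5.5, 4.2, 4.1, 6.2, 6.4])

rank = lambda s: (-s).argsort().argsort() + 1   # rank 1 = highest score (no ties here)
rsc = lambda s: np.sort(s)[::-1]                # f(i) = the i-th highest score

d_score = 1 - pearsonr(sA, sB)[0]               # (a) 1 - score correlation
d_rank = 1 - spearmanr(rank(sA), rank(sB))[0]   # (b) 1 - rank correlation
d_rsc = np.abs(rsc(sA) - rsc(sB)).sum()         # (c) one choice of d(fA, fB):
print(d_score, d_rank, d_rsc)                   #     area between the RSC curves
```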
Correlation is a central concept in statistics. Correlation has been shown to be very useful in many application domains which use statistical methods and tools. However, it is always a challenge to process, predict, and interpret correlations in a complex system or dynamic environment. More recently, for example, Engle discussed the challenge of forecasting dynamic correlations which play an essential role in risk management, portfolio management, and other financial activities [8]. Diversity, on the other hand, is a crucial concept in informatics. In machine learning, data mining, and information fusion, it has been shown that when combining multiple classifier systems, multiple neural nets, and multiple scoring systems, higher diversity is a necessary condition for improvement [2, 14, 16, 26, 28]. Figure 8 shows some characteristic differences between correlation and diversity.
                        Correlation / Similarity    Diversity / Heterogeneity
Likely Target           Object                      Subject
Domain Rules            Syntactic                   Semantic
Reasoning / Method      Statistics                  Informatics
Opposite Concept        Difference                  Homogeneity
Measurement / Judgment  Data                        Decision
Fusion Level            Data                        Feature / Decision

Fig. 8. Correlation/Similarity vs. Diversity/Heterogeneity
3.2 The Role of Diversity in Information Fusion

In regression, multiple classifier systems, multiple artificial neural nets, and multiple scoring systems, it has been shown that the combination, fusion, or ensemble of these systems can outperform single systems. A variety of diversity measures and fusion methods have been proposed and studied [2, 14, 16, 26, 28]. However, it remains a challenging problem to predict and assess the fused system performance in terms of the performance of and the diversity among the individual systems. In regression, Krogh and Vedelsby [15] established the following elegant and practical relationship: E = Ē − Ā, where E is the quadratic error of the combined estimate, Ē is the average quadratic error of the individual estimates, and Ā is the variance among the individual estimates. In classification, a relationship of this kind is not as clear. In classifier ensembles, using majority voting and entropy diversity, Chung, Hsu and Tang [4] showed that:
max{P̄ − D̄, p(P̄ + D̄) + 1 − p} ≤ Pm ≤ min{P̄ + D̄, p(P̄ − D̄)},

where Pm is the performance of the ensemble of p classifiers using majority voting, P̄ is the average performance of the p classifiers, and D̄ is the average entropy diversity among the p individual classifiers. These upper and lower bounds were shown to be tight using the concept of a performance distribution pattern (PDP) for the input set. More recently, tight bounds of Pm in terms of P̄ and Dis (the pairwise disagreement measure), and similar results in terms of P̄ and D̄ for tight bounds of Ppl using plurality voting, have been established [3, 5].
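The role of diversity can be seen in a toy experiment: three synthetic classifiers of equal individual accuracy, with independently placed (i.e., diverse) errors, give a majority-voting ensemble whose accuracy exceeds the individual average. This Python sketch is only an illustration of the phenomenon behind the bounds above, not a derivation of them; all numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
truth = np.ones(10_000, dtype=int)                 # ground-truth labels (all 1s for simplicity)
# three classifiers, each 70% accurate, with independent (diverse) errors
preds = (rng.random((3, 10_000)) < 0.7).astype(int)

P_bar = (preds == truth).mean()                    # average individual accuracy, ~0.70
majority = (preds.sum(axis=0) >= 2).astype(int)    # majority vote of the three classifiers
P_m = (majority == truth).mean()                   # ensemble accuracy, ~0.78
print(P_bar, P_m)
```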
In multiple scoring systems, several results have been obtained that demonstrate that combining multiple scoring systems can improve the result only if (a) each of the individual scoring systems has relatively high performance, and (b) the individual scoring systems are diverse [13, 14, 19, 22, 29]. A closed formula or bounds for the combined system have yet to be found. In the next section, we discuss several examples in different application domains where the two conditions (a) and (b) are necessary for the combination to be positive, i.e., for its performance to be better than that of each individual system.
4 Examples of Domain Applications

In this section, we show examples of domain applications in information retrieval, virtual screening, and target tracking where the RSC function is used to define cognitive diversity [13, 19, 29]. Applications in other domains include bioinformatics, text mining, protein structure prediction, portfolio management, and online learning [17, 18, 20, 21, 25, 27].

4.1 Comparing Rank and Score Combination Methods

Let A and B be two scoring systems on a set of five hundred documents in a retrieval system. Let fA and fB be the RSC functions of A and B, respectively, as defined in Remark 2.2. Let E and F be two new scoring systems related to score combination and rank combination, respectively, as defined in Remark 2.3. Using the symmetric group S500 as the sample space for rank functions with respect to five hundred documents, Hsu and Taksa [13] showed the following:

Remark 4.1: Under certain conditions, such as the greatest value of the diversity d(fA, fB), the performance of rank combination is better than that of score combination, P(F) ≥ P(E), under performance evaluation by both precision and average precision.

Figure 9 gives an illustration of two sets of RSC functions: one with ten documents and scores ranging from 1 to 10, and the other with five hundred documents and scores ranging from 1 to 100. These two examples indicate that as long as the diversity
Fig. 9. Two RSC functions with (a) n = 10 and s = 10, and (b) n = 500 and s = 100
between two RSC functions, d(fA, fB), is large (one measure is the area between the two functions fA and fB), the rank combination of A and B, Cr(A, B) = F, has better precision than the score combination of A and B, Cs(A, B) = E [13].

4.2 Improving Enrichment in Virtual Screening

Virtual screening of molecular compound libraries has, in the past decades, been shown to be a distinct, useful, and potentially faster and less expensive method for novel lead compound discovery in drug design and discovery. However, a major weakness of virtual screening, the inability to consistently and accurately identify true positives, is probably due to our insufficient understanding of the chemistry involved in ligand binding and of the scoring systems used to screen these ligands. Although it has been demonstrated that combining multiple scoring systems (consensus scoring) can improve the enrichment of true positives, it has been a challenge to provide a theoretical foundation which explains when and how combination for virtual screening should be done. Using five scoring systems with two genetic docking algorithms on four target proteins, thymidine kinase (TK), human dihydrofolate reductase (DHFR), and estrogen receptors of antagonists and agonists (ER antagonist and ER agonist), Yang et al. [29] demonstrated that a high performance ratio and high diversity are two conditions necessary for the fusion to be positive, i.e., for the combination to perform better than each of the individual systems. Figure 10 illustrates the two necessary conditions (high performance ratio and high diversity using RSC functions) for positive enrichment among eighty rank or score combinations of two individual systems. These examples suggest that applications of consensus scoring could increase the hit rate and reduce the false positive rate. Moreover, the improvement depends heavily on both factors: (a) the performance of each of the individual systems, and (b) the diversity among the individual scoring systems, as stated in Section 3.2.
Fig. 10. Positive vs. Negative cases
4.3 Target Tracking under Occlusion

Target tracking is the process of predicting the future state of a target by examining the current state from a sequence of information and data collected by a group of sensors, sources, and databases. Multi-target tracking can be very complicated because targets can occlude one another, affecting feature or cue measurements. Lyons and Hsu [19] applied a multisensory fusion approach, based on Combinatorial Fusion Analysis and the RSC function as a measure of cognitive diversity, to study the problem of multisensory video tracking with occlusion. Each sensory cue is considered as a scoring system. An RSC function is used to characterize the scoring (or ranking) behavior of each system (or sensor). A diversity measure, computed using the variation in the RSC function, is used to dynamically choose the best scoring systems to combine and the best operations to fuse. The relationship between the diversity measure and the tracking accuracy of two fusion operations (rank combination vs. score combination) is evaluated using a set of twelve video sequences. In this study, Lyons and Hsu [19] demonstrated that using the RSC function as a diversity measure is an effective method to study target tracking in video with occlusions. The experiments by Lyons and Hsu [19] confirm what Hsu and Taksa [13] and Hsu, Chung, and Kristal [14] proposed: that the RSC function is a feasible and useful characteristic to define cognitive diversity and to guide us in the process of fusing multiple scoring systems.
5 Conclusion and Remarks

In this paper, we show that the Rank-Score Characteristic (RSC) function as defined in Combinatorial Fusion Analysis can be computed easily and can be used to measure cognitive diversity among two or more scoring systems. We also show that the diversity measure using the RSC function is different from the concept of correlation in statistics. Moreover, the notion of diversity using the RSC function plays an important role in the fusion of data, features, cues, and decisions in classification and other decision making. Three domain applications, in information retrieval and search algorithms, virtual screening and drug discovery, and target tracking and recognition, were discussed [13, 19, 29]. We wish to include other domain applications in bioinformatics, text mining, protein structure prediction, online learning, and portfolio management in a future report [17, 18, 20, 21, 25, 27]. In the future, we wish to study diversity using the RSC function in application domains such as sports team ranking, figure skating judgment, ecology, and biodiversity (e.g., [9]). Living organisms, neuronal systems, interconnection networks, and other complex systems are in great need of efficient and effective methods and techniques such as computational-X and X-informatics in the emerging field of informatics. Information fusion plays an important role in the CompXinfor e-science approach in modern-day scientific discovery.
References
[1] Bleiholder, J., Naumann, F.: Data fusion. ACM Computing Surveys 41(1), 1–41 (2008)
[2] Brown, G., Wyatt, J.L., Harris, R., Yao, X.: Diversity creation methods: A survey and categorisation. Journal of Information Fusion 6(1), 5–20 (2005)
[3] Chung, Y.S., Hsu, D.F., Tang, C.Y.: On the relationships among various diversity measures in multiple classifier systems. In: 2008 International Symposium on Parallel Architectures, Algorithms, and Networks (ISPAN 2008), pp. 184–190 (2008)
[4] Chung, Y.S., Hsu, D.F., Tang, C.Y.: On the Diversity-Performance Relationship for Majority Voting in Classifier Ensembles. In: MCS 2007, pp. 407–420 (2007)
[5] Chung, Y.S., Hsu, D.F., Liu, C.Y., Tang, C.Y.: Performance Evaluation of Classifier Ensembles in Terms of Diversity and Performance of Individual Systems (submitted)
[6] Dasarathy, B.V.: Elucidative fusion systems - an exposition. Information Fusion 1, 5–15 (2000)
[7] Denning, P.J.: The profession of IT: The IT schools movement. Communications of the ACM 44(8), 19–22 (2001)
[8] Engle, R.: Anticipating Correlations: A New Paradigm for Risk Management. Princeton University Press, Princeton (2009)
[9] Gewin, V.: Rack and Field. Nature 460, 944–946 (2009)
[10] Hey, T., et al. (eds.): Jim Gray on eScience: A Transformed Scientific Method. In: The Fourth Paradigm, pp. 17–31. Microsoft Research (2009)
[11] Ho, T.K.: Multiple classifier combination: Lessons and next steps. In: Bunke, H., Kandel, A. (eds.) Hybrid Methods in Pattern Recognition, pp. 171–198. World Scientific, Singapore (2002)
[12] Ho, T.K., Hull, J.J., Srihari, S.N.: Decision combination in multiple classifier systems. IEEE Transactions on Pattern Analysis and Machine Intelligence 16(1), 66–75 (1994)
[13] Hsu, D.F., Taksa, I.: Comparing rank and score combination methods for data fusion in information retrieval. Information Retrieval 8(3), 449–480 (2005)
[14] Hsu, D.F., Chung, Y.S., Kristal, B.S.: Combinatorial fusion analysis: methods and practice of combining multiple scoring systems. In: Hsu, H.H. (ed.) Advanced Data Mining Technologies in Bioinformatics. Idea Group Inc., USA (2006)
[15] Krogh, A., Vedelsby, J.: Neural Network Ensembles, Cross Validation, and Active Learning. In: Advances in Neural Information Processing Systems, vol. 7, pp. 231–238. MIT Press, Cambridge (1995)
[16] Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience, Hoboken (2004)
[17] Li, Y., Hsu, D.F., Chung, S.M.: Combining Multiple Feature Selection Methods for Text Categorization by Using Rank-Score Characteristics. In: 21st IEEE International Conference on Tools with Artificial Intelligence, pp. 508–517 (2009)
[18] Lin, K.-L., et al.: Feature Selection and Combination Criteria for Improving Accuracy in Protein Structure Prediction. IEEE Transactions on Nanobioscience 6(2), 186–196 (2007)
[19] Lyons, D.M., Hsu, D.F.: Combining multiple scoring systems for target tracking using rank-score characteristics. Information Fusion 10(2), 124–136 (2009)
[20] McMunn-Coffran, C., Schweikert, C., Hsu, D.F.: Microarray Gene Expression Analysis Using Combinatorial Fusion. In: BIBE 2009, pp. 410–414 (2009)
[21] Mesterharm, C., Hsu, D.F.: Combinatorial Fusion with On-line Learning Algorithms. In: The 11th International Conference on Information Fusion, pp. 1117–1124 (2008)
[22] Ng, K.B., Kantor, P.B.: Predicting the effectiveness of naive data fusion on the basis of system characteristics. Journal of the American Society for Information Science 51(12), 1177–1189 (2000)
[23] Norvig, P.: Search. In: "2020 Visions". Nature 463, 26 (2010)
[24] Schadt, E.: Molecular networks as sensors and drivers of common human diseases. Nature 461, 218–223 (2009)
[25] Schweikert, C., Li, Y., Dayya, D., Yens, D., Torrents, M., Hsu, D.F.: Analysis of Autism Prevalence and Neurotoxins Using Combinatorial Fusion and Association Rule Mining. In: BIBE 2009, pp. 400–404 (2009)
[26] Sharkey, A.J.C. (ed.): Combining Artificial Neural Nets: Ensemble and Modular Multi-Net Systems. Perspectives in Neural Computing. Springer, London (1999)
[27] Vinod, H.D., Hsu, D.F., Tian, Y.: Combinatorial Fusion for Improving Portfolio Performance. In: Advances in Social Science Research Using R, pp. 95–105. Springer, Heidelberg (2010)
[28] Whittle, M., Gillet, V.J., Willett, P.: Analysis of data fusion methods in virtual screening: Theoretical model. Journal of Chemical Information and Modeling 46, 2193–2205 (2006)
[29] Yang, J.M., Chen, Y.F., Shen, T.W., Kristal, B.S., Hsu, D.F.: Consensus scoring for improving enrichment in virtual screening. Journal of Chemical Information and Modeling 45, 1134–1146 (2005)
Cognitive Effort for Multi-agent Systems Luca Longo and Stephen Barrett Department of Computer Science and Statistics - Trinity College Dublin {llongo,stephen.barrett}@cs.tcd.ie
Abstract. Cognitive effort is a multi-faceted phenomenon that has suffered from an imperfect understanding, informal use in everyday life, and numerous definitions. This paper attempts to clarify the concept, along with some of the main influencing factors, by presenting a possible heuristic formalism intended to be implemented as a computational concept and thereby embedded in an artificial agent capable of cognitive effort-based decision support. Its applicability in the domain of Artificial Intelligence and Multi-Agent Systems is discussed. The technical challenge of this contribution is to start an active discussion towards the formalisation of cognitive effort and its application in AI.
1 Introduction
Theoretical constructs of attention and cognitive effort have a long history in psychology [11]. Cognitive effort is often understood as a multi-faceted phenomenon and a subjective concept, influenced by attention, that changes within individuals in response to individual and environmental factors [18]. Such a view, sustained by motivation theories, contrasts with empirical studies that have tended to treat attention as a static concept [6]. Theories of information processing consider cognitive effort as a hypothetical construct, regarded as a limited-capacity resource that affects the speed of information processing [11]. Studies suggest that, even though cognitive effort may be a hypothetical construct, it is manifest as a subjective state to which people have introspective access [10]. Attention can be related to physiological states of stress and effort, to subjective experiences of stress, mental effort, and time pressure, and to objective measures of performance, ranging from normal performance levels to breakdowns in performance. These various aspects of attention have led to distinct means for assessing cognitive effort, including physiological criteria such as heart rate, performance criteria such as quantity and quality of performance, and subjective criteria such as ratings of level of effort. Despite the interest in the topic over the past 40 years, there is no universally accepted and clear definition of cognitive effort, often referred to as mental workload [9]. There appears to be little work to link the measurement of workload by any one paradigm to others, and the lack of a formal theory of cognitive effort has led to a proliferation of several methods with little chance of reconciliation [7]. Formalising cognitive effort as a computational concept would appear to be an interesting step towards a common definition and an opportunity to provide a usable structure for investigating behaviours. The goal of this paper is to facilitate such a development through the presentation of a formalisation of cognitive
effort in an organised fashion using formal tools. The principal reason for measuring cognitive effort is to quantify the mental cost of performing tasks in order to predict operator and system performance. It is studied from the point of view of artificial agents: our formalism does not aim to be the de-facto standard, but it provides the tools necessary for its own revision. We are concerned with two key issues: How can we formalise cognitive effort as a usable computational concept? How can we provide cognitive effort-based decision-supporting capabilities to an artificial agent? The methodology adopted to model cognitive effort is presented in Section 2. The subjective nature of the concept is underlined in Section 3, where a literature review identifies some of the main factors, amenable to computational treatment, that influence cognitive effort, along with related work. We present our heuristic formalism in Section 4. In Section 5, an optimisation problem in multi-agent systems is presented that aims to clarify a possible application of our heuristic formalism. We address open issues and future challenges in Section 6.
2 Attacking the Phenomenon: Our Approach
Cognitive effort is a subjective, elusive concept and its precise definition is far from trivial. Indeed, the contextual aspect of the phenomenon may render attempts at a definition that is both precise and generally applicable impossible in practice. Our approach tries to study the essential behaviour of cognitive effort and seeks to capture some of its aspects in a formalism structured as an open and extensible framework. Our method is based on a generalist assessment of the available literature, seeking to merge different observations, intuitions, and definitions towards a tool for the assessment of cognitive effort in practical scenarios. The multi-agent paradigm is a powerful tool for investigating the problem. Although an agent's cognitive model of its human peer is not necessarily precise, having at least a realistic model can be beneficial in offering unintrusive help and bias reduction, as well as trustable and self-adjustable autonomy. It is feasible to develop agents as cognitive aids to alleviate human bias, as long as an agent can be trained to obtain a model of a human's cognitive inclination. Furthermore, with a realistic human cognitive model, an agent can also better adjust its automation level [19].
3 Cognitive Effort and Related Work
The assessment of the cognitive effort expended in the completion of a task depends on several factors such as individual skill, background, and status, that is, the individual's subjective experience and cognitive ability. Self-regulation theories [4] suggest that individuals with different levels of cognitive ability may react to changes in task difficulty in different ways because their perception of the task may differ. High-ability individuals have a larger pool of cognitive resources than their counterparts, who need to make larger resource adjustments to achieve the same outcome. People of low ability who perceive a high degree
of difficulty in a task will expend greater cognitive effort [20]. Similarly, intentions play a role in attention, and individuals with strong intentions allocate more cognitive effort to a task: highly conscientious individuals choose to work harder and persevere longer than their counterparts [1]. In the literature, curiosity, motivation, psychological stress, and anxiety are often referred to as arousals [11] and have a strong impact on attention and therefore on cognitive effort. Similarly, time plays a central role in attention as well: time-pressure may increase the amount of attention an individual needs to allocate to a task. Furthermore, performing a task requires an interval of time in which an individual has to elicit an amount of cognitive effort. Finally, contextual biases may influence attention over time: these may be unpredictable external distractions, or contextual or task-related constraints. All these factors represent a sub-portion of all the possible factors used by several existing models of workload and mental effort. The popular NASA Task Load Index, for instance, consists of six clusters of variables such as mental, physical, and temporal demands, frustration, effort, and performance [8]. The Subjective Workload Assessment Technique is a subjective rating technique that considers time load, mental effort, and psychological stress load to assess workload [14]. Multi-agent systems are often used to model social structures where artificial agents collaborate with each other towards a common goal [17], seeking to find the best solutions to their problems autonomously, without human intervention. Most of the work in agent-based systems has assumed highly simplified agent models, and the artificial agents developed so far incorporate a wide range of cognitive functionalities such as memory, representation, learning, and sensory-motor capabilities. However, at present, there is weak consideration of cognitive effort in multi-agent systems [16]. Integrating cognitive effort in an artificial agent may increase its robustness in terms of interdependence with other agents and its ability in the decision-making process, without losing any of the freedom of choice such agents will be expected to possess.
4 A Presumption-Based Heuristic Formalism
As discussed briefly in Section 3, models of cognitive effort involve a highly contextual and individual-dependent set of factors. Our approach begins by focusing on a set of context-dependent influencing factors, each representing a presumption or interpretation of facts in the literature useful for inferring cognitive effort. Each presumption needs to be formally conceptualised in order to be computable, and in the following paragraphs we present six factors with different difficulties of formalisation. The set of factors considered here can be expanded, refined, criticised, and reduced: we provide these as illustrative of our approach. The aim of our framework is to be open, extensible, and applicable in different contexts where only some influencing factors can be monitored, captured, and conceptualised formally.

Cognitive Ability. Some people obviously and consistently understand new concepts quicker, solve new problems faster, see relationships, and are more
knowledgeable about a wider range of topics than others. Modern psychological theory views cognitive ability as a multidimensional concept, and several studies, today known as IQ tests, have tried to measure this trait [5]. Carroll suggested in his work [3] that there is a tendency for people who perform well in a specific range of activities to perform well in all others as well. Prof. T. Salthouse suggested in his recent work [15] that some aspects of people's cognitive ability peak around the age of 22 and begin a slow decline starting around 27. However, he pointed out that there is a great deal of variance among people, and most cognitive functions remain at a highly effective level into people's final years, even when living a long life. Some types of mental flexibility decrease relatively early in adulthood, but how much knowledge one has, and the effectiveness of integrating it with one's abilities, may increase throughout all of adulthood if there are no pathological diseases. This research provides suitable evidence to model cognitive ability with a long-term growing function such as the flexible sigmoid function proposed by Yin [22]:

CA : [1..Gth]³ ⊆ ℕ³ → [0..1] ⊆ ℝ
CA(Gth, Gr, t) = CAmax (1 + (Gth − t)/(Gth − Gr)) (t/Gth)^(Gth/(Gth − Gr))

where CA is cognitive ability, whose maximum level is defined by CAmax (in this case equal to 1), and t is the age in years of an individual. Gth is the growing threshold, set to an average mortality age of 85 years, and Gr is the growing rate, set to 22 years, which identifies where the curve reaches its maximum growth rate, after which it increases moderately. The parameters Gth and Gr are flexible because they may be set by considering environmental factors.

Arousal. The concept of arousal plays an important role in assessing cognitive effort. It is sometimes treated in the literature as a unitary dimension, as if a subject's arousal state could be completely specified by a single measurement such as pupil size. However, this is an oversimplification, since arousal is a multidimensional concept that may vary in different situations [11]. Its intrinsic degree of uncertainty and subjectiveness is hard to model, and we propose a simple subjective arousal taxonomy where different types of arousal, such as curiosity, motivation, anxiety, and psychological stress, are organised in a multi-level tree. A subjective arousal taxonomy is a 3-tuple <A, W, R>, composed of vertices A connected as a tree by unidirectional weighted edges, defined in R, using the weights in W. Each vertex has at most one parent, except the root node Aroot, which has no parent and represents the final level of arousal that influences cognitive effort.

A : {a | a ∈ [0..1] ⊆ ℝ}
W : {w|w ∈ {[0..1] ∈ }}
R : {∀ ai ∈ A ∃! r | r : A × A → W, r : (ai , ap ) = w} internal Aleaf explicit ∪ Aaggregated = A; ∀ ai ∈ A ∃! path(ai , aroot )
All the nodes have a path towards the root node: this property guarantees the absence of cycles. Leaf nodes (nodes without children) are values explicitly provided by an agent: they indicate the related degree of a given type of arousal
(e.g., 0 is not motivated at all, 1 is highly motivated). Internal nodes represent aggregation nodes and, like the root node, their values are inferred from the relationship with their children defined in R, along with the related strengths in W. In particular, each internal node's value is the weighted sum of its c children's values:

a_explicit^leaf : [0..1] ∈ ℝ,   a_aggregated^internal = Σ_{z=0..c} (a_z · w_z) ≤ 1
Finally, the root node is a special internal node with weight wroot = 1 and, as it has no parent, its relation rroot = ∅. The weights w in the arousal taxonomy may be derived from the literature or learnt, while the explicit values a_explicit^leaf represent an individual's subjective status before starting a task. An example of a possible subjective arousal taxonomy is depicted in Figure 1. Based on the level of arousal, we may adopt the descriptive Yerkes-Dodson law [21], derived from an empirical study of the relationship between performance and arousal. For example, the authors discovered that increasing the intensity of a shock administered to mice facilitated the learning of brightness discrimination, but only up to a point: further increases of shock intensity caused learning to deteriorate. These conclusions appear to be valid in an extraordinarily wide range of situations. The law is usually modelled with an inverted U-shaped curve in which performance increases at low levels of arousal and then decreases at higher levels of arousal. The law is task-dependent: different tasks require different levels of arousal for optimal performance, thus the shape of the curve can be highly variable. The first, increasing part of the curve reflects the positive effect of arousal, while the second, decreasing part reflects the negative effect of arousal on cognitive effort. The law is useful to study the maximum performance an agent can achieve given its subjective status before performing a task. As each task may have a different complexity and as the law is task-dependent, we propose to introduce a task dictionary, formally described as a tuple <TS, A, P, TP, D, YD, δyd, δd, δtp>:

· TS ⊆ ℕ is the set of possible tasks;
· A, P, TP, D ⊆ [0..1] ∈ ℝ are the possible sets of values for arousal, performance, time-pressure and difficulty;
· YD ⊆ {fyd : A → P} is the set of possible functions that model the Yerkes-Dodson law; each of them takes an arousal level and returns a performance value;
· δyd : TS → YD assigns a Y.D. law function to a task;
· δtp : TS → TP assigns a degree of time-pressure to a task;
· δd : TS → D assigns a level of difficulty to a task.

A task dictionary example is depicted in Table 1. Here the YD laws associated with each task are for descriptive purposes, but in practice they may be approximated experimentally via numerical analysis. Once we have the subjective arousal taxonomy for a subject and the task dictionary, we are able to study the effect of arousal on cognitive effort. The derived performance is the maximum level of attention that a subject can elicit on a certain task.
Table 1. Tasks dictionary with descriptive YD equations

Description        D    TP   YD law (a ∈ A)
math equation      0.8  0.9  fyd(a) = e^(−a²)
reading/summary    0.6  0.7  fyd(a) = −a² + a
reading            0.3  0.2  fyd(a) = −2a² + 2a
dictate            0.4  0.8  fyd(a) = e^(−(a−2)²)
memorising poetry  0.7  0.6  fyd(a) = −3a² + a
Fig. 1. A possible Subjective Arousal Taxonomy
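To make the interplay between the taxonomy and the task dictionary concrete, here is a minimal Python sketch. The tree layout, weights and leaf values are hypothetical illustrations rather than the contents of Figure 1, and the two task entries mirror Table 1.

import math

# Subjective arousal taxonomy: node -> list of (child, weight); leaves carry
# the explicit values provided by an agent before starting a task.
weights = {
    "root":     [("positive", 0.6), ("negative", 0.4)],
    "positive": [("curiosity", 0.5), ("motivation", 0.5)],
    "negative": [("anxiety", 0.7), ("stress", 0.3)],
}
leaf_values = {"curiosity": 0.7, "motivation": 0.6, "anxiety": 0.5, "stress": 0.2}

def arousal(node):
    # internal nodes are the weighted sum of their children's values
    if node in leaf_values:
        return leaf_values[node]
    return sum(w * arousal(child) for child, w in weights[node])

# Task dictionary: task -> (difficulty, time pressure, Yerkes-Dodson curve)
tasks = {
    "math equation": (0.8, 0.9, lambda a: math.exp(-a ** 2)),
    "reading":       (0.3, 0.2, lambda a: -2 * a ** 2 + 2 * a),
}

a_root = arousal("root")
difficulty, time_pressure, yd = tasks["math equation"]
p_max = yd(a_root)   # maximum attainable performance for this arousal level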
Formally, given a task ts ∈ TS, the maximum performance p on ts is derived from the associated Yerkes-Dodson law with an input level of arousal a, that is, p = (δyd(ts))(a).
Intentions. A subject's intentions play an important role in determining the amount of cognitive effort while performing a task. As with arousal, this is an individual, subjective concept; it may be split into short-term and long-term intentions, and it may be modelled with real values. We refer to short-term (or momentary) intentions as Ist and to long-term intentions as Ilt. These are subjective judgments in the range [−1..1] ∈ ℝ (−1: no intention at all; 1: highly intentioned). The overall degree of intentions I is a weighted average of the two values and may have a negative, positive or null influence on cognitive effort:

I : ([−1..1] ∈ ℝ)² → [−1..1] ∈ ℝ,   I(Ist, Ilt) = (2/3) Ist + (1/3) Ilt

This model deals with intentional shades: an individual may be momentarily intent on succeeding in an IQ test without any future intention.
Involuntary Context Bias. Several external factors may influence cognitive effort as pseudo-static and unpredictable biases. The former refers to biases that are almost static and depend on environmental aspects. For instance, there are large differences across ethnic groups and geographic areas in the available knowledge: people living in poor African countries have reduced access to knowledge compared to their counterparts in Western countries, so they may find a question dependent on access to information more difficult to answer. Another pseudo-static bias is the task's difficulty.
Even though it is hard to estimate the complexity of different tasks exactly, it is perhaps not unreasonable to claim that reading a newspaper demands less cognitive effort than solving a math equation. Unpredictable context biases represent involuntary distractions such as a phone ringing, questions from colleagues, or e-mail deliveries in a working context. These involuntary distractions and environmental aspects, in comparison to arousal and intentions, are easier to embed in a formalism as they are not individual-dependent. We propose real fuzzy values to model contextual available knowledge and unpredictable bias, while the level of task difficulty is obtained from the task dictionary. Knowledge availability is a positive factor, meaning that it elicits less cognitive effort, while task difficulty and unpredictable bias are negative, as they require more cognitive effort as their values increase. The higher the value of contextual bias, the more a subject has to concentrate, allocating more cognitive effort to the task. To model how context bias negatively affects attention, we take the complement of knowledge availability:

CB : ([0..1] ∈ ℝ)³ → [0..1] ∈ ℝ,   Cknow, Tdiff, Ubias : [0..1] ∈ ℝ

CB(Cknow, Tdiff, Ubias) = ([1 − Cknow] + Tdiff + Ubias) / 3
where CB is the total context bias, Cknow is the contextual knowledge availability, Tdiff is the task difficulty and Ubias is the unpredictable bias.
Perception. The same task may be perceived differently by two subjects. In the literature there is evidence suggesting that perceived difficulty is higher when individuals are presented with a new task: they may not know what the optimal amount of effort is for a particular difficulty level [20]. We propose to model this concept as a simple real fuzzy value Pdiff : [0..1] ∈ ℝ, where values close to 0 indicate a task perceived as highly complex. Perception is connected to cognitive ability and skill acquisition. Intermediate students may perceive the resolution of math equations as difficult compared to university students, due to their limited experience, preparation and background. Perception has a negative effect, as a subject who perceives a task to be difficult needs to allocate more resources, eliciting higher cognitive effort.
Time. Time is a crucial factor that must be considered in modelling cognitive effort. Temporal properties are essential because performing a task is not a single-instant action but an action over time; therefore cognitive effort's influencing factors need to be considered over time. Our environment is dynamic and, consequently, time-related: the temporal dimension is an important aspect of perception, necessary to guide effective action in the real world [12]. Several temporal theories are available in the computer science literature, but less effort has been spent on the temporal aspects of cognitive effort. Firstly, we take into consideration time as a single stimulus that influences attention. We refer to this as time-pressure, which is sometimes imposed by explicit instruction to hurry and sometimes by intrinsic characteristics of the task. The former may be modelled as a fuzzy value Tpress^explicit : [0..1] ∈ ℝ. For instance, a student may have to resolve a task within an interval of 10 minutes. In this case we need to estimate or learn the maximum time to perform the task
(mapped to 1) and transform the 10 minutes into the scale [0..1]. The latter may be modelled as a fuzzy value Tpress^implicit : [0..1] ∈ ℝ, and we propose to adopt the task-related time-pressure value from the task dictionary proposed above, which underlines the intrinsic pressure imposed by a certain task. For instance, a student may resolve an integral equation, which requires self-discipline and rigour in performing the task. He must keep track of the initial problem, partial results and the next step, requiring greater cognitive effort: slowing down or even stopping for just an instant may force the student to start again. The more storage difficult arithmetic problems require, the higher the time-pressure they impose, eliciting greater cognitive effort [11]. The final degree of time-pressure is modelled as the average of the above values:

Tpress : [0..1] ∈ ℝ,   Tpress = (1/2) Tpress^explicit + (1/2) Tpress^implicit
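The two aggregations just defined are simple enough to state directly; the following minimal sketch uses the values of agent a1 from Table 2 (Section 5), for which it reproduces CB ≈ 0.47:

def context_bias(c_know, t_diff, u_bias):
    # complement of knowledge availability plus the two negative factors, averaged
    return ((1 - c_know) + t_diff + u_bias) / 3

def time_pressure(t_explicit, t_implicit):
    # average of externally imposed and task-intrinsic pressure
    return 0.5 * t_explicit + 0.5 * t_implicit

print(context_bias(1, 0.8, 0.6))   # 0.466... ~ 0.47, the CB of agent a1
print(time_pressure(0.5, 0.5))     # 0.5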
Everyday experience suggests that time intervals also play an important role in directing our attention to the external world. Cognitive effort may vary while performing a task due to variations in the degree of focused attention and sustained attention. The former refers to the ability to respond discretely to specific visual, auditory or tactile stimuli, while the latter refers to the ability to maintain a consistent behavioural response while performing a certain task [11]. Modelling focused and sustained attention is not easy at all, and these properties are individual-dependent. However, for a given task, a trajectory that describes how the degree of sustained attention would likely behave for most people on that task would be useful. To deal with this we propose an extension of our task dictionary, adding an estimation of the time needed to complete a certain task. This value may be learnt through experimentation using unsupervised techniques, and it is needed to estimate the end of a certain task in order to model the focused attention function. This function likely has an S-shape that increases quickly at the beginning, reaching the maximum peak of attention, then decreases very moderately during the sustained-attention time interval, and decreases more quickly until the estimated time for the completion of the task. This function may be approximated by applying numerical analysis and should model the fact that, at the beginning, people elicit almost the highest degree of attention, which is the maximum performance level obtained from the Yerkes-Dodson law of a task ts with a given arousal a as defined before (p = (δyd(ts))(a)); from here follows an interval of time in which individuals perform well, maintaining a high level of sustained attention. Then the curve starts to decrease towards the estimated end-point of the task, from which the function persists but at very low levels, underlining that only a small amount of cognitive effort is dedicated to the task. Formally, we add to the task dictionary:

· T ⊂ ℝ is the domain of time;
· AT ⊆ P is the set of possible degrees of attention;
· SA ⊆ {ffa : T → AT} is the set of possible functions that model the concept of focused attention for tasks;
· δfa : TS → SA is the function that maps an S-shaped function from the domain SA to a given task;
· δTE : TS → T is the function that assigns to a task an estimated completion time.

The completion time is useful for understanding whether an agent performed similarly to others, required further time to complete a task, or even gave up. Taking into account the explanations so far, we are now able to provide a general formula to compute the cognitive effort of an agent on a given task, summarised in the formalism depicted in Figure 2:

CE : ([0..1] ∈ ℝ)⁵ × ([−1..1] ∈ ℝ) × TS × T² → ℝ

with CA = CA(Gth, Gr, t), A = (δyd(α))(Aroot), PD = Pdiff, t = Tpress, I = I(Ist, Ilt), CB = CB(Cknow, Tdiff, Ubias):

CE(CA, A, I, CB, PD, t, α, t0, t1) = [(CA + A + I + CB + PD + t) / 6] · ∫_{t0}^{t1} (δfa(α))(x) dx

where CA is cognitive ability, A represents arousal, I is intentions, CB is contextual bias, PD is perceived difficulty, t is the time pressure, t0 is the start time and t1 the time spent on the task α.
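A minimal numerical sketch of this formula follows. The factor values are those of agent a1 in Table 2 (Section 5); the logistic attention curve standing in for δfa(α) and its parameters a and b are illustrative assumptions, so the result is not expected to reproduce the C.E. column of Table 2 exactly.

import math

def focused_attention(t, a=3.0, b=0.15):
    # S-shaped decay: near 1 early on, dropping once b*t exceeds a
    return 1.0 / (1.0 + math.exp(b * t - a))

def cognitive_effort(ca, ar, i, cb, pd, tp, t0, t1, f_att, steps=1000):
    factor_avg = (ca + ar + i + cb + pd + tp) / 6.0
    dt = (t1 - t0) / steps
    integral = sum(f_att(t0 + (k + 0.5) * dt) for k in range(steps)) * dt  # midpoint rule
    return factor_avg * integral

print(cognitive_effort(0.25, 0.64, 0.47, 0.47, 0.7, 0.5, 0, 55, focused_attention))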
Fig. 2. The Cognitive Effort’s formalism
5 A Multi-agent Application
In this section we take the viewpoint of an agent α situated in an open environment, trying to choose the best interaction partners from a pool of potential agents A and deciding on the strategy to adopt with them to resolve an effortful task T in an optimal way. Our heuristic, based on cognitive effort, represents a possible strategy for selecting reliable partners. Each agent in the system has certain cognitive properties, such as experience, motivation, intentions and cognitive ability, and it is realistic to assume that the agents operate in environments with different constraints and biases. Furthermore, we assume each agent acts honestly and provides real information about its cognitive status. An agent α may split the
Table 2. Agents, influencing factors and Cognitive Effort

Factor                         a1     a2     a3     a4     a5
Gth (Growing Threshold)        85     85     85     85     85
GR (Growing Rate)              22     22     22     22     22
Age (Years)                    18     25     28     40     55
CA (Cognitive Ability)         0.25   0.37   0.43   0.62   0.82
IST (Short-term Intentions)    0.4    0.6    -0.5   -1     0.5
ILT (Long-term Intentions)     0.6    -0.3   -1     0      0.2
I (Intentions)                 0.47   0.3    -0.67  -0.67  0.4
Pdiff (Perceived Difficulty)   0.7    0.7    0.6    0.5    0.7
TP (Time Pressure)             0.5    0.5    0.5    0.5    0.5
Cknow (Context Knowledge)      1      1      0.9    0.8    1
Tdiff (Task Difficulty)        0.8    0.8    0.8    0.8    0.8
Ubias (Unpredictable Bias)     0.6    0.3    0.4    0.6    0.7
CB (Contextual Bias)           0.47   0.37   0.43   0.53   0.5
ar1 (Anxiety)                  0.5    0.7    0.5    0.4    0.3
ar2 (Curiosity)                0.7    0.3    0.5    0.8    0.4
ar3 (Sleepiness)               0.3    0.3    0.5    0.7    0.4
ar4 (Tiredness)                0.4    0.3    0.6    0.2    0.5
ar5 (Motivation)               0.6    1      0.8    0.6    0.3
Aroot (Arousal Root)           0.67   0.81   0.72   0.67   0.37
fyd(A) = e^(−Aroot²)           0.64   0.52   0.59   0.64   0.87
t0/t1 (secs)                   0/55   0/40   0/45   0/40   0/50
C.E. (Cognitive Effort)        18.30  14.32  11.00  11.87  22.63
task T into partial sub-tasks t1...tn with the same estimation of required effort. We suppose α has direct connections with 5 agents, a1..a5 ∈ A, and forwards to each of them one of the 5 sub-tasks t1..t5. Each agent then starts to resolve its assigned sub-task using its own resources, skills and experience. Once the sub-task is completed, the agents send back to α their subjective arousal status, their intentions, cognitive ability, perception, involuntary context bias and the start/stop times needed to complete the assigned sub-task. Let us assume the agent α adopts the first task (math equation) of the task dictionary depicted in Table 1 and uses the subjective arousal taxonomy depicted in Figure 1, with the explicit values (ari) provided by each agent as shown in Table 2. The Yerkes-Dodson law associated with the task is δyd(α) = fyd(a) = e^(−a²), while the task difficulty is δd(α) = 0.8. The time pressure is δtp(α) = 0.9 and the focused attention trajectory is δfa(α) = ffa(t) = [1 + e^(bt−a)]^(−1). The parameter b shrinks the S-shaped curve while a shifts the function to the right. We set b = 15/100 to model sustained attention at the beginning of the task for around 20 seconds, and a to effectively start from the 0 of the time line (x-axis) with attention at a high level (1). The function decreases quickly after 20 seconds, reaching low levels around 50 seconds (δTE(α) = 50), which is the estimated time we set for the completion of T. α uses our heuristic formalism as a potential decision-support tool to generate an index of cognitive effort for each partner: the results obtained are presented in Table 2. It may forward the remaining sub-tasks in proportion to the agents' elicited cognitive effort, delivering more sub-tasks to agents that showed less cognitive effort (e.g., a3, a4) in completing their assigned work. Furthermore, α has knowledge of its partners' skills and their subjective status, and over time it can infer something about their
behaviour. For instance, information about the learning rate may be learnt, as α might assume that its partners, over time, acquire experience and become more skilled, therefore manifesting less cognitive effort in performing similar tasks.
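One possible reading of this allocation heuristic, as a minimal sketch: distribute the remaining sub-tasks inversely proportionally to the cognitive-effort indices of Table 2, so that a3 and a4 receive the largest shares (the number of remaining sub-tasks is arbitrary here).

effort = {"a1": 18.30, "a2": 14.32, "a3": 11.00, "a4": 11.87, "a5": 22.63}

inv = {agent: 1.0 / ce for agent, ce in effort.items()}   # less effort -> larger share
total = sum(inv.values())
shares = {agent: v / total for agent, v in inv.items()}

remaining = 20
allocation = {agent: round(remaining * s) for agent, s in shares.items()}
# -> a3 and a4 get the most sub-tasks, a5 the fewest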
6 Open Issues and Future Challenges
Cognitive effort is a subjective phenomenon, and its formalisation for a virtual agent is not a trivial problem. In this paper we tackled the problem by analysing the current state of the art in psychology, cognitive science and neuroscience to build a formalism that is extensible and open to further refinements. The heuristic proposed here can be embedded in an artificial agent, providing it with a cognitive-effort-based decision-support system. The computational model is an aggregation of a subset of the possible presumptions or factors influencing cognitive effort, such as cognitive ability, arousal, intentions, contextual bias, perception and time. We intend this to be the starting point of an active discussion among researchers in the social and computer science fields. In this work we have considered each factor's influence to be the same, but a simple aggregation is not subtle enough to provide good estimates of cognitive effort. Argumentation theory provides a framework for systematically studying how the factors influencing cognitive effort may be combined, sustained or discarded in a computable formalism towards a robust approximation of the concept. In our opinion, cognitive effort shares some of the properties of a non-monotonic concept, by which we mean that adding a factor to the overall formalism may revise, rather than merely extend, its set of consequences [2]. Adding a new argument and reasoning on its plausibility and combination with previous ones increases the robustness of the overall formalism. A new factor may attack or support an existing one, thereby amplifying or diminishing its strength. The consideration of mutual relationships among arguments is fundamental in assessing an index of cognitive effort; therefore a future challenge might be the investigation of the strength of each argument and their mutual influence by using non-monotonic logics such as the defeasible reasoning semantics proposed by Pollock [13]. It remains to demonstrate this aspect of the computation of cognitive effort. In terms of evaluation, popular frameworks such as the NASA-TLX [8] and SWAT [14] may be useful for comparisons. Furthermore, our framework, conceived to be open and adaptable to different contexts, may be applied in operational environments and, for instance, populated with physiology-based arguments derived from neuroscientific equipment such as fMRI, EEG and other types of physiological scanners.
References

1. Barrick, M.R., Mount, M.K., Strauss, J.P.: Conscientiousness and performance of sales representatives: Test of the mediating effects of goal setting. Journal of Applied Psychology 78(5) (1993)
2. Brewka, G., Niemelä, I., Truszczynski, M.: Nonmonotonic reasoning. In: Handbook of Knowledge Representation, pp. 239–284 (2007)
3. Carroll, J.B.: Human Cognitive Abilities: A Survey of Factor-Analytic Studies. Cambridge University Press, Cambridge (1993)
4. Carver, C.S., Scheier, M.F.: On the Self-Regulation of Behavior. Cambridge University Press, UK (1998)
5. Dickens, T.W.: Cognitive ability. In: New Palgrave Dictionary of Economics (forthcoming)
6. Fried, Y., Slowik, L.H.: Enriching goal-setting theory with time: An integrated approach. Academy of Management Review 29(3), 404–422 (2004)
7. Gopher, D., Braune, R.: On the psychophysics of workload: Why bother with subjective measures? Human Factors 26(5) (1984)
8. Hart, S.G., Staveland, L.E.: Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In: Human Mental Workload, pp. 139–183 (1988)
9. Huey, F.M., Wickens, C.D.: Workload Transition: Implications for Individual and Team Performance. National Academy Press, Washington (1993)
10. Humphreys, M.S., Revelle, W.: Personality, motivation and performance: A theory of the relationship between individual differences and information processing. Psychological Review 91(2) (1984)
11. Kahneman, D.: Attention and Effort. Prentice Hall, NJ (1973)
12. Miniussi, C., Wilding, E.L., Coull, J.T., Nobre, A.C.: Orienting attention in time: Modulation of brain potentials. Brain 122(8)
13. Pollock, J.L.: Cognitive Carpentry: A Blueprint for How to Build a Person. MIT Press, Cambridge (1995)
14. Reid, G.B., Nygren, T.E.: The subjective workload assessment technique: A scaling procedure for measuring mental workload. In: Human Mental Workload, pp. 185–218 (1988)
15. Salthouse, T.: When does age-related cognitive decline begin? Neurobiology of Aging 30(4), 507–515 (2009)
16. Sun, R.: Duality of the Mind. Lawrence Erlbaum Associates, NJ (2002)
17. Sun, R., Naveh, I.: Simulating organizational decision-making using a cognitively realistic agent model. Journal of Artificial Societies and Social Simulation 7(3) (2004)
18. Vroom, V.H.: Work and Motivation. Wiley, NY (1964)
19. Fan, X., Yen, J.: Realistic cognitive load modeling for enhancing shared mental models in human-agent collaboration. In: AAMAS 2007 – Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems, p. 60 (2007)
20. Yeo, G., Neal, A.: Subjective cognitive effort: A model of states, traits, and time. Journal of Applied Psychology 93(3) (2008)
21. Yerkes, R.M., Dodson, J.D.: The relation of strength of stimulus to rapidity of habit-formation. Journal of Comparative Neurology and Psychology 18, 459–482 (1908)
22. Yin, X., Goudriaan, J., Lantinga, E.A., Vos, J., Spiertz, H.J.: A flexible sigmoid function of determinate growth. Annals of Botany 91, 361–371 (2003)
Behavioural Abstraction of Agent Models Addressing Mutual Interaction of Cognitive and Affective Processes

Alexei Sharpanskykh and Jan Treur

VU University Amsterdam, Department of Artificial Intelligence
De Boelelaan 1081, 1081 HV Amsterdam, The Netherlands
{sharp,treur}@few.vu.nl
http://www.few.vu.nl/~{sharp,treur}

Abstract. In this paper the issue of relating a specification of the internal processes within an agent to a specification of the behaviour of the agent is addressed. A previously proposed approach for automated generation of behavioural specifications from an internal specification was limited to stratified specifications of internal processes. Therefore, it cannot be applied to mutually interacting cognitive and affective processes described by interacting loops. However, such processes are not rare in agent models addressing integration of cognitive and affective processes and agent learning. In this paper a novel approach is proposed which addresses this issue. The proposed technique for loop abstraction is based on identifying dependencies of equilibrium states for interacting loops. The technique is illustrated by an example of an internal agent model with interdependent processes of believing, feeling, and trusting.
1 Introduction

Dynamics of an agent are usually modelled by an internal agent model specifying relations between the mental states of the agent. Often such agent models are specified in an executable format following a noncyclic causal graph (e.g., [12]). However, for more complex and adaptive types of agents, such models may take the form of dynamical systems including internal loops. Such cyclic interactions are well known from the neurological and brain research areas; for example, there are agents in which as-if body loops [5] are used to model the interaction between feelings and other mental states (e.g., [9]). Thus, although the noncyclic graph assumption behind most existing agent models (as, for example, in [12]) may be useful for the design of software agents, it seriously limits applicability for modelling more realistic, neurologically founded processes in natural or human-like agents. To perform simulations with agents it is often only the behaviour of the agents that matters, and the internal states can be kept out of the simulation model. Other work shows that automated transformations are possible (1) to obtain an executable internal model for a given behavioural specification (e.g., [13]), and (2) to obtain a behavioural specification from an executable internal model. The approach available for the second type of transformation (cf. [12]) has a severe limitation, as an executable internal model is assumed which has a noncyclic, stratified form. This limitation excludes the approach from being applied to agent models addressing more complex internal processes in which internal loops play a crucial role.
In this paper a more generally applicable automated transformation is introduced from an internal agent model to a behavioural model, abstracting from the internal states. Within this transformation, techniques for loop abstraction are applied by identifying how equilibrium states depend on the inputs to these loops. It is also shown how interaction between loops is addressed. The proposed approach is useful in particular for agent models in which the interaction between cognitive and affective processes plays an important role. Empirical work such as described in, for example, [8, 10] reports such effects of emotions on beliefs. From the area of neuroscience, informal theories and models have been proposed (e.g., [5, 6]) involving a causal relation from feeling to belief, which is in line, for example, with the Somatic Marker Hypothesis described in [2], and may also be justified by a Hebbian learning principle (cf. [4]). These informal theories have been formalised in an abstracted computational form to obtain internal agent models (e.g., [16]). The transformation is illustrated for two agent models that include interaction between cognitive and affective processes. A single-loop case is illustrated for an existing agent model for emotion-affected beliefs, described in [9]. In addition, a novel agent model with interdependent processes of believing, feeling, and trusting is introduced in this paper, illustrating a case with two interacting loops. The paper is organised as follows. First, in Section 2 the modelling approach is briefly introduced. Section 3 presents the transformation procedure. Applications of the procedure are described in Section 4. Finally, Section 5 is a discussion.
2 Specifying Internal Agent Models

As in [12], both behavioural specifications and internal agent models are specified using the reified temporal predicate language RTPL, a many-sorted temporal predicate logic language that allows specification of and reasoning about the dynamics of a system. To express state properties, ontologies are used. An ontology is a signature specified by a tuple <S1,…, Sn,…, C, f, P, arity>, where Si is a sort for i = 1,…, n, C is a finite set of constant symbols, f is a finite set of function symbols, P is a finite set of predicate symbols, and arity is a mapping of function or predicate symbols to a natural number. An interaction ontology InteractOnt is used to describe the (externally observable) behaviour of an agent. It is the union of input (for observations and incoming communications) and output (for actions and outgoing communications) ontologies: InteractOnt = InputOnt ∪ OutputOnt. For example, observed(a, t) means that an agent has an observation of state property a at time point t, communicated(a1, a2, m, v, t) means that message m with confidence v is communicated from agent a1 to agent a2 at time point t, and performing_action(b) represents action b. The internal ontology InternalOnt is used to describe the agent's internal cognitive state properties. Within the state ontology also numbers are included, with the usual relations and functions. In RTPL, state properties as represented by formulae within the state language are used as terms (denoting objects). The set of function symbols of RTPL includes ∧, ∨, →, ↔: STATPROP × STATPROP → STATPROP; not: STATPROP → STATPROP; and ∀, ∃: SVARS × STATPROP → STATPROP, of which the counterparts in the state language are Boolean propositional connectives and quantifiers. To represent the dynamics of a system, the sort TIME (a set of time points) and the ordering relation > : TIME × TIME are introduced in RTPL. To indicate that some state property holds at some time point, the relation at: STATPROP × TIME is introduced. The terms of
RTPL are constructed by induction in a standard way from variables, constants and function symbols typed with all before-mentioned sorts. The set of well-formed RTPL formulae is defined inductively in a standard way using Boolean connectives and quantifiers over variables of RTPL sorts. More details can be found in [12]. Agent models are specified within RTPL in the following format: at(a, t) ⇒ at(b, t+d) where d is the time delay of the effect of state property a on state property b, which for dynamical systems is often indicated by Δt. These state properties may involve variables, for example for real numbers. This format subsumes both causal modelling languages (e.g., GARP [8]) and dynamical system modelling languages based on difference or differential equations (e.g., [10]), as well as hybrid languages combining the two, such as LEADSTO [3].
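Such executable specifications admit a straightforward discrete-time interpretation; the following is a small illustrative sketch (the atoms and rules are invented for the example, not taken from any model in this paper):

def run(rules, initial, horizon):
    # rules: list of (antecedent atom, consequent atom, delay d);
    # a rule fires at t when its antecedent held at t - d
    trace = {0: set(initial)}
    for t in range(1, horizon + 1):
        trace[t] = set()
        for a, b, d in rules:
            if t - d >= 0 and a in trace[t - d]:
                trace[t].add(b)
    return trace

trace = run([("observed(rain)", "belief(rain)", 1),
             ("belief(rain)", "performing_action(take_umbrella)", 2)],
            ["observed(rain)"], 4)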
3 Abstraction of an Internal Agent Model: Eliminating Loops

In this section first the general transformation procedure, as adopted from [12], is described. Next the contributed loop elimination procedure is addressed, starting with the assumptions underlying the procedure and then showing in more detail how both single loops and interactions between loops can be handled.

The general transformation procedure. The format at(a, t) ⇒ at(b, t+d) is equivalent to at(a, t−d) ⇒ at(b, t), where t is a variable of sort TIME. When a number of such specifications are available for one atom at(b, t), by taking the disjunction of the antecedents one specification in past-to-present format can be obtained: ∨i at(ai, t−di) ⇒ at(b, t). When in addition a form of closed world assumption is made, the format ∨i at(ai, t−di) ⇔ at(b, t) is also obtained, which specifies the equivalence of the state formula b at t with a past formula. This type of format, called pp-format, is used in the abstraction procedure introduced in [12]. The rough idea behind the overall procedure is as follows. Suppose a pp-specification B ⇔ at(p, t) is available. Moreover, suppose that in B only two atoms of the form at(p1, t1) and at(p2, t2) occur, whereas as part of the agent model specifications B1 ⇔ at(p1, t1) and B2 ⇔ at(p2, t2) are also available. Then, within B the atoms can be replaced (by substitution) by the formulae B1 and B2. This results in B[B1/at(p1, t1), B2/at(p2, t2)] ⇔ at(p, t), which again is a pp-specification. Here, for any formula C, the expression C[x/y] denotes the formula C transformed by substituting x for y. Such a substitution corresponds to an abstraction step. For the general case the procedure includes a sequence of abstraction steps; the last step produces a behavioural specification that corresponds to the given agent model.

Assumptions underlying the loop elimination approach
1. Internal dynamics develop an order of magnitude faster than the dynamics of the world external to the agent.
2. Loops are internal in the sense that they do not involve the agent's output states.
3. Different loops have limited mutual interaction; in particular, loops may contain internal loops; loops may interact in couples; interacting couples of loops may interact with each other by forming noncyclic interaction chains.
4. For static input information any internal loop reaches an equilibrium state for this input information.
5. It can be specified how the value of this equilibrium state of a given loop depends on the input values for the loop.
6. In the agent model the loop can be replaced by the equilibrium specification of 4.

The idea is that when these assumptions are fulfilled, for each received input, before new input information arrives, the agent computes its internal equilibrium states and, based on that, determines its behaviour.

Loop elimination setup. To address the loop elimination process, the following representation of a loop is assumed:

at(has_value(u, V1) ∧ has_value(p, V2), t) ⇒ at(has_value(p, V2 + f(V1, V2)·d), t+d)    (1)

Here u is the name of an input variable, p of the loop variable, t is a variable of sort TIME, and f(V1, V2) is a function combining the input value with the current value for p. Note that an equilibrium state for a given input value V1 in (1) is a value V2 for p such that f(V1, V2) = 0. A specification of how V2 depends on V1 is a function g such that f(V1, g(V1)) = 0. Note that the latter expression is an implicit function definition, and under mild conditions (e.g., ∂f(V1, V2)/∂V2 ≠ 0, or strict monotonicity of the function V2 → f(V1, V2)) the Implicit Function Theorem from calculus guarantees the existence (mathematically) of such a function g. However, knowing such an existence in the mathematical sense is not sufficient to obtain a procedure to calculate the value of g for any given input value V1. When such a specification of g is obtained, the loop representation shown above can be transformed into:

at(has_value(u, V1), t) ⇒ at(has_value(p, g(V1)), t+D)
where D is chosen as a timing parameter for the process of approximating the equilibrium value up to some accuracy level. To obtain a procedure to compute g based on a given function f, two options are available. The first option is, for a given input V1, numerical approximation of the solution V2 of the equation f(V1, V2) = 0. This method can always be applied and is not difficult to implement using very efficient standard procedures from numerical analysis, taking only a few steps to reach high precision. The second option, elaborated further below, is symbolically solving the equation f(V1, V2) = 0 depending on V1, in order to obtain an explicit algebraic expression for the function g. This option can be used successfully when the symbolic expression for the function f is not too complex; it may, however, still be nonlinear. In various agent models involving such loops a threshold function is used to keep the combined values within a certain interval, for example [0, 1]. A threshold function can be defined, for example, in three ways: (1) as a piecewise constant function, jumping from 0 to 1 at some threshold value; (2) by a logistic function of the form 1/(1+exp(−σ(V1+V2−τ))); or (3) by the function β(1−(1−V1)(1−V2)) + (1−β)V1V2. The first option provides a discontinuous function, which is not desirable for analysis. The third format is used here, since it provides a continuous function, can be used for explicit symbolic manipulation, and is effective as a way of keeping the values between bounds. Note that this function can be written as a linear function of V2 with coefficients in V1 as follows:
f(V1, V2) = β(1−(1−V1)(1−V2)) + (1−β)V1V2 − V2 = −[(1−β)(1−V1) + βV1]V2 + βV1

From this form it follows that

∂f(V1, V2)/∂V2 = −[(1−β)(1−V1) + βV1] ≤ 0

This is only 0 in the extreme cases β = 0 and V1 = 1, or β = 1 and V1 = 0. So, for the general case V2 → f(V1, V2) is strictly monotonically decreasing, which shows that it fulfils the conditions of the Implicit Function Theorem, thus guaranteeing the existence of a function g as desired.

Obtaining the equilibrium specification: single loop case. Using the above expression, the equation f(V1, V2) = 0 can easily be solved symbolically: V2 = βV1 / [(1−β)(1−V1) + βV1]. This provides an explicit symbolic definition of the function g: g(V1) = βV1 / [(1−β)(1−V1) + βV1]. For each β with 0 < β < 1 this g is a strictly monotonically increasing function with g(0) = 0 and g(1) = 1. A few cases for specific values of the parameter β are: (i) β = 0, g(V1) = 0; (ii) β = 0.5, g(V1) = V1; (iii) β = 1, g(V1) = 1.
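Both options for computing g can be made concrete in a few lines. This sketch (with an arbitrary β) implements the closed form just derived and, as a fallback for combination functions that cannot be solved symbolically, bisection on the strictly decreasing map V2 → f(V1, V2):

def g_closed(v1, beta):
    return beta * v1 / ((1 - beta) * (1 - v1) + beta * v1)

def g_numeric(v1, f, lo=0.0, hi=1.0, eps=1e-9):
    # f(v1, 0) = beta*v1 >= 0 and f(v1, 1) <= 0, so bisection converges
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if f(v1, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

beta = 0.3
f = lambda v1, v2: beta * (1 - (1 - v1) * (1 - v2)) + (1 - beta) * v1 * v2 - v2
assert abs(g_numeric(0.6, f) - g_closed(0.6, beta)) < 1e-6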
Obtaining the equilibrium specification: interacting loops case. Interaction between two loops occurs when the outcome of one loop is used as (part of the) input to another loop; it may occur in two forms: monodirectional or bidirectional. In the monodirectional case the previously described method can be used in a straightforward manner, one by one for each of the loops, starting with the loop providing input for the other loop. The bidirectional case requires more elaboration. First it is assumed that the input from the other loop is combined with the externally provided input as follows:

v1 = λ1(u1)p2 + μ1(u1)   and   v2 = λ2(u2)p1 + μ2(u2)

where ui denotes the external input for loop i (what was indicated above by V1), pi the state of the loop (what was indicated above by V2), and λi and μi are functions of the external input ui. Special cases are: (1) λ1(u1) = w1 and μ1(u1) = w2u1, in which case the inputs are combined according to a weighted sum; (2) λ1(u1) = u1 and μ1(u1) = 0, in which case p2 acts as a modifier of the external input u1, e.g., an estimated degree of reliability of the incoming information; (3) λ1(u1) = −[(1−β)(1−u1) + βu1] and μ1(u1) = βu1, which provides the combination function used in f(V1, V2) above. To solve the two coupled equations for this case a simplified notation is used: v1 = λ1p2 + μ1 and v2 = λ2p1 + μ2.

[(1−β1)(1−(λ1p2 + μ1)) + β1(λ1p2 + μ1)] p1 = β1(λ1p2 + μ1)
[(1−β2)(1−(λ2p1 + μ2)) + β2(λ2p1 + μ2)] p2 = β2(λ2p1 + μ2)

These equations can be rewritten as follows:

(2β1−1)λ1 p1p2 + [(1−β1)(1−μ1) + β1μ1] p1 = β1(λ1p2 + μ1)
(2β2−1)λ2 p1p2 + [(1−β2)(1−μ2) + β2μ2] p2 = β2(λ2p1 + μ2)
Multiplying the first equation by (2β2−1)λ2 and the second by (2β1−1)λ1 and subtracting them from each other provides one equation that can be rewritten into a form that gives an explicit expression of p2 in terms of p1:

p2 = [(2β2−1)λ2 [(1−β1)(1−μ1) + β1μ1 + (2β1−1)λ1β2λ2] p1 + (2β1−1)λ1β2μ2 − (2β2−1)λ2β1μ1] / [(2β1−1)λ1 [(1−β2)(1−μ2) + β2μ2 + (2β2−1)λ2β1λ1]]

Filling this expression for p2 into the second equation provides one equation in p1:

[(1−β2)(1−(λ2p1 + μ2)) + β2(λ2p1 + μ2)] [(2β2−1)λ2 [(1−β1)(1−μ1) + β1μ1 + (2β1−1)λ1β2λ2] p1 + (2β1−1)λ1β2μ2 − (2β2−1)λ2β1μ1] / [(2β1−1)λ1 [(1−β2)(1−μ2) + β2μ2 + (2β2−1)λ2β1λ1]] = β2(λ2p1 + μ2)

By solving this equation an explicit symbolic expression is obtained for p1, and from it for p2.
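When the symbolic route becomes unwieldy, the equilibrium of the coupled system can also be approximated by fixed-point iteration over the two single-loop solutions. A minimal sketch, assuming constant λi and μi chosen so that the combined inputs stay in [0, 1] (all parameter values illustrative):

def g(v, beta):
    # closed-form single-loop equilibrium derived earlier
    return beta * v / ((1 - beta) * (1 - v) + beta * v)

def coupled_equilibrium(beta1, beta2, lam1, mu1, lam2, mu2, iters=200):
    p1 = p2 = 0.5
    for _ in range(iters):
        p1 = g(lam1 * p2 + mu1, beta1)   # v1 = lambda1*p2 + mu1
        p2 = g(lam2 * p1 + mu2, beta2)   # v2 = lambda2*p1 + mu2
    return p1, p2

p1, p2 = coupled_equilibrium(0.7, 0.4, 0.5, 0.3, 0.5, 0.2)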
4 Feeling, Trusting and Believing

In this section two applications of the proposed procedure are described. First, the single-loop case is illustrated for an agent model involving emotion-affected beliefs. Then a novel agent model with interdependent processes of believing, feeling, and trusting is presented, illustrating a case with two interacting loops.

4.1 A Single Loop Case for Emotion-Affected Beliefs

Beliefs of an agent are time-labelled internal representations created on the basis of communication and observation results received by the agent. In [9] beliefs are specified using the function belief(p:STATPROP, v:VALUE), where p is the content of the belief and v is the degree of confidence of the agent, from the interval [0, 1], that the belief content is true. According to the literature [7, 8], beliefs are only rarely emotionally unbiased. Previously, a model for emotion-affected beliefs was proposed in [9], based on a body loop for a cognitive state as described by Damasio [5, 6]:

input → cognitive state → preparation for the induced bodily response → induced bodily response → sensing the bodily response → sensory representation of the bodily response → feeling the emotion
As a variation, an as-if body loop uses a direct causal relation preparation for the induced bodily response → sensory representation of the induced bodily response as a shortcut in the causal chain. The body loop and as-if body loop are extended to a recursive body loop or as-if body loop by assuming that the preparation of the bodily response is (also) affected by the state of feeling the emotion. An as-if body loop for a cognitive state w is formalised in RTPL as follows:

at(input(w, V1) ∧ feeling(b, V2) ∧ cog_state(w, V3), t−Δt) ⇒ at(cog_state(w, V3 + γ(g(β1, V1, V2) − V3)Δt), t)
at(cog_state(w, V) ∧ body_state_for(b, w), t−Δt) ⇒ at(preparation_state(b, V), t)
at(preparation_state(B, V), t−Δt) ⇒ at(srs(B, V), t)
at(srs(B, V), t−Δt) ⇒ at(feeling(B, V), t)
Here g(β1, V1,V2) is a threshold function and γ determines the speed of change.
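As an illustration, the recursive loop can be run as a simple discrete-time simulation. The sketch below collapses the srs and feeling states into one-step relays of the preparation state and, anticipating the composite-loop reading of the next subsection, lets the preparation state itself combine with the cognitive state through a second threshold function with parameter β2; g is the third threshold format from Section 3. With β1 = 0.8, β2 = 0.4 and input V = 0.8 it settles at approximately (0.9255, 0.8923), the equilibrium reported below:

def g(beta, v1, v2):
    # threshold format (3) from Section 3
    return beta * (1 - (1 - v1) * (1 - v2)) + (1 - beta) * v1 * v2

def run_as_if_loop(V, beta1=0.8, beta2=0.4, gamma=0.5, dt=0.1, steps=4000):
    cog = prep = 0.0
    for _ in range(steps):
        feeling = prep                                     # srs/feeling relay prep
        cog = cog + gamma * (g(beta1, V, feeling) - cog) * dt
        prep = prep + gamma * (g(beta2, cog, prep) - prep) * dt
    return cog, prep

print(run_as_if_loop(0.8))   # -> approximately (0.9255, 0.8923)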
The model from [9] contains a composite body loop, which comprises two simple loops. To eliminate the composite loop using the mechanisms from Section 3, first an isomorphic model is identified in which a redefined simple body loop has a reciprocal relation with the belief state. Using the procedure from Section 3, two coupled equations are obtained for this model:

p1 = β1(1 − (1−p2)(1−V)) + (1−β1)p2V
[(1−β2)(1−p1) + β2p1] p2 = β2p1

Here p1 represents the confidence of the belief state, p2 is the variable for the preparation state and V is the input provided by the world. From this system a quadratic equation in p1 is obtained:

(1−2β2) p1² + (β2(1+β1+V) + β1V − 1) p1 + β1V(1−β2) = 0

The solution to this equation agrees with the simulation results for particular values of the parameters and the input reported in [9]. For example, for the case β1 = 0.8, β2 = 0.4, V = 0.8, it is calculated that p1 = 0.9255 and p2 = 0.8923, which is also what the simulation yields.

4.2 Interacting Loops for Belief, Feeling and Trust

Previously, several models combining beliefs and trust have been proposed [1, 14]. The authors are not aware of any computational model that combines the cognitive processes of believing and trusting with affective processes (feelings and emotions). In the following a first attempt at a model of believing, feeling, and trusting is described. In this model two types of beliefs are distinguished: a factual belief of an agent that some information was observed in the environment or communicated by some source, and a belief representing the agent's own valuation of some property. An agent creates beliefs not only about world states, but also about the world dynamics, specified by the function dyn_prop(o, f), where f is the name of a dynamic property describing the dynamics of the world object o, which may be composite. In the absence of recent experience the agent may reason about the present world state using such beliefs and old experience stored in factual beliefs. To enable such reasoning, the auxiliary predicate belief_project(s:AGENT, ag:AGENT, w:STATPROP, V:VALUE) is introduced, which specifies a temporal projection by agent ag of the most recent factual belief about w based on the information received from source s. Here V is the confidence value obtained by projection; it is updated at each time point based on the agent's beliefs about the world dynamics with the highest confidence:

at(belief(communicated(s, ag, w, v, t1), q) ∧ belief(dyn_prop(w, f), v2) ∧ name_for(f, expr(x, y, z)) ∧ ∀f1:STATPROP [f1 ≠ f ∧ belief(dyn_prop(w, f1), v3, t1) → v3 < v2], t−Δt) ⇒ at(belief_project(s, ag, w, expr[x/v, y/t1, z/t]), t)
Here expr[x/v] denotes the substitution of x by v in expr(x, y, z). It is assumed that the emotional influence on factual beliefs is insignificant, that belief projections are influenced by emotions indirectly through beliefs about the world dynamics, and that all beliefs of the second type are influenced by emotions directly via an as-if body loop (see Fig. 1). A belief prospect is provided to this loop as input, mediated by the agent's trust in the information source of the belief prospect.
In the model, trust is a (cognitive and affective) attitude of an agent towards an information source that determines the extent to which information received by the agent from the source influences the agent's beliefs. It is often argued that trust should be distinguished per information type [7]. In the model, trust in a source with respect to an information type is represented by the preparation state to accept information of this type from the source. This preparation state accumulates all experience with the source. The amount of trust is a number in the range [0, 1]. Formally, the trust-mediated input to the as-if body loop for a belief about w is specified by v = η·p·u. Here η is the strength of the communication through the channel from the source (η = 1 if the source provided information about w, η = 0 if no information about w was received from the source); for an agent's observations the source is the environment; u is the confidence value for the belief prospect for w based on the information received from the source; and p is the amount of trust of the agent in the source. According to the formula, the higher the agent's trust in a source, the greater the source's influence on the input value. A high confidence value provided by a trustworthy source moves the input value further away from the minimal knowledge state (v = 0). In the case when more than one source provides information of a type w to the agent, the overall confidence value of the agent's belief representing its valuation of w is calculated by aggregating the agent's emotional beliefs about w for each source:

b = Σ_{i=1..n} ηi bi / Σ_{i=1..n} ηi,   where Σ_{i=1..n} ηi > 0

Here n is the number of information sources, bi is the confidence value of the emotional belief created on the basis of information from the i-th source, and ηi is the strength of the communication channel from the i-th source.
Fig. 1. A schematic representation of the model for believing, feeling, and trusting for an information source s; the bold arrows represent interaction between two loops
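A minimal sketch of the trust-mediated input and the multi-source aggregation above, with illustrative numbers:

def loop_input(eta, trust, confidence):
    # v = eta * p * u: channel strength, trust in the source, belief prospect
    return eta * trust * confidence

def aggregate(etas, beliefs):
    # b = sum(eta_i * b_i) / sum(eta_i); requires at least one active channel
    total = sum(etas)
    assert total > 0
    return sum(e * b for e, b in zip(etas, beliefs)) / total

v = loop_input(1, 0.7, 0.9)                  # trusted source, high confidence
b = aggregate([1, 1, 0], [0.8, 0.4, 0.9])    # third source sent nothing about w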
Thus, trust and beliefs are interdependent: on the one hand, the trust in a source builds up based on information received from the source evaluated using the agent's beliefs; on the other hand, the trust in a source determines the degree of influence of
information from the source on the agent's beliefs. Furthermore, both trust and beliefs are influenced by emotions. Similarly to beliefs, the emotional influence on trust is modelled by an as-if body loop. The input for this loop is provided by the evaluation of experiences with the source. The parameters of the model allow specifying diverse individual characteristics, similar to the Big Five traits: the γ's in the as-if body loops reflect the agent's flexibility in adopting new experiences, and α reflects the agent's openness. As reported in [7], positive emotions, such as happiness, increase the agent's openness, whereas negative emotions, such as anger, have the opposite effect. Based on the valuation of its beliefs the agent decides how to act. In the model shown in Fig. 1, if the agent has a high confidence (> 0.8) in a property, and observes that another agent is close, then it communicates this property to that agent. Formally:

at(belief(observed(agent_close(a), t−Δt), 1) ∧ belief(w, v) ∧ v > 0.8, t−Δt) ⇒ at(preparation_state(to_be_communicated(ag, a, w, v)), t)
at(preparation_state(to_be_communicated(ag, a, w, v)), t−Δt) ⇒ at(communicated(ag, a, w, v, t), t)
In the following it is demonstrated how the procedure from Section 3 is applied to eliminate the loops from the model in Fig. 1. The loop for the belief about the world dynamics is eliminated as shown in Section 4.1. To eliminate the two interacting loops in Fig. 1, following the procedure, two coupled equations are obtained:

[(1−β1)(1−p2u) + β1p2u] p1 = β1p2u
[(1−β2)(1−α|u−p1|) + β2α|u−p1|] p2 = β2α|u−p1|

Here p1 represents the confidence of the agent's belief about w; p2 is the degree of the agent's trust in the source for w; u is the confidence value for the belief prospect based on the information about w provided by the source (i.e., the experience). The parameters β1 and β2 account for temporal discounting of old experiences in the calculation of confidence values of beliefs and trust values. Furthermore, β1 and β2 reflect the agent's positive versus negative bias. From this system, for the case u ≥ p1, a quadratic equation in p1 is obtained:

(h2h3 − (β1−h3)β2α) p1² + (h1h3 + (β1−h3)β2αu + β1β2αu) p1 − αβ1β2u² = 0    (2)

where h1 = (1−β2)(1−αu), h2 = α(1−β2), h3 = 1−β1. The case u < p1 is treated similarly. In cases with more than one source, each couple of loops for each source is eliminated as described above, and the obtained expressions for emotion-affected beliefs are used to calculate the overall confidence value of the agent's belief by aggregation. Now, after all loops have been eliminated from the model, an executable behavioural specification containing a direct relation between the input and output states of the model can be automatically generated using the procedure from [12]:

at(observed(agent_close(a), t−Dt) ∧ communicated(s, ag, w, v, t1) ∧ t−Dt ≥ t1 ∧ f(expr(v, t1, t)) > 0.8, t−Dt) ⇒ at(communicated(ag, a, w, v, t), t)
Here f(expr(v, t1, t)) is the solution to equation (2) with u = expr(v, t1, t), and expr is the function used to calculate the belief projection; Dt >> Δt.
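For concreteness, equation (2) can be solved with the standard quadratic formula. A minimal sketch for the case u ≥ p1 (illustrative parameter values; the degenerate case of a vanishing leading coefficient is not handled):

import math

def solve_p1(u, alpha, beta1, beta2):
    h1 = (1 - beta2) * (1 - alpha * u)
    h2 = alpha * (1 - beta2)
    h3 = 1 - beta1
    A = h2 * h3 - (beta1 - h3) * beta2 * alpha
    B = h1 * h3 + (beta1 - h3) * beta2 * alpha * u + beta1 * beta2 * alpha * u
    C = -alpha * beta1 * beta2 * u ** 2
    disc = B * B - 4 * A * C
    roots = [(-B + s * math.sqrt(disc)) / (2 * A) for s in (1, -1)]
    return [r for r in roots if 0 <= r <= min(1, u)]   # keep the admissible root

print(solve_p1(0.9, 0.5, 0.7, 0.5))   # one root in [0, u]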
5 Discussion

Existing models of an agent's internal functioning have often been designed from an artificial (software) agent perspective, without taking into account underlying neurological principles. In particular, they are usually based on a noncyclic causal graph assumption for the mental states involved. From the literature in the neurological and brain research area it is known that realistic processes often have a highly cyclic character. For example, affective processes may be triggered by cognitive processes, but in turn affect the very same cognitive processes. To obtain more realistic and neurologically founded agent models, such mutual interactions cannot be ignored. To obtain such agent models, as argued for example in [11], techniques from the dynamical (complex) systems area are in principle a useful option, as opposed to the logical methods usually advocated. In general, the complexity of such dynamical systems may pose some computational difficulties. However, for a substantial class of applications of such models their complexity can be analysed by identifying a number of loops that during processing lead to equilibria, and transforming the model into one in which these loops are replaced by the equilibria they reach. This paper contributes such a transformation procedure to relate a specification of an agent's internal processes to its behavioural specification, in particular for more complex and neurologically founded agent models. Owing to this contribution, agent models come within reach whose internal processing and adaptation involve the valuation of cognitive states based on the emotional responses they trigger. It has been shown that, when an approximation perspective is adopted, loops can be eliminated by replacing them with direct functional association specifications that require only limited time for their processing. Noncyclic specifications obtained using the proposed procedure can be handled by more common analysis methods. The resulting agent models also become suitable for other analysis methods, for example model checking.
References

[1] Barber, K.S., Kim, J.: Belief Revision Process Based on Trust: Agents Evaluating Reputation of Information Sources. In: Falcone, R., Singh, M., Tan, Y.-H. (eds.) AA-WS 2000. LNCS (LNAI), vol. 2246, pp. 73–82. Springer, Heidelberg (2001)
[2] Bechara, A., Damasio, A.: The Somatic Marker Hypothesis: a neural theory of economic decision. Games and Economic Behavior 52, 336–372 (2004)
[3] Bosse, T., Jonker, C.M., van der Meij, L., Treur, J.: A Language and Environment for Analysis of Dynamics by Simulation. Int. J. of AI Tools 16, 435–464 (2007)
[4] Bi, G.Q., Poo, M.M.: Synaptic Modifications by Correlated Activity: Hebb's Postulate Revisited. Ann. Rev. Neurosci. 24, 139–166 (2001)
[5] Damasio, A.: The Feeling of What Happens. Body and Emotion in the Making of Consciousness. Harcourt Brace, New York (1999)
[6] Damasio, A.: Looking for Spinoza. Vintage books, London (2004)
[7] Dunn, J.R., Schweitzer, M.E.: Feeling and Believing: The Influence of Emotion on Trust. Journal of Personality and Social Psychology 88(5), 736–748 (2005)
[8] Eich, E., Kihlstrom, J.F., Bower, G.H., Forgas, J.P., Niedenthal, P.M.: Cognition and Emotion. Oxford University Press, New York (2000)
[9] Memon, Z.A., Treur, J.: Modelling the Reciprocal Interaction between Believing and Feeling from a Neurological Perspective. In: Zhong, N., Li, K., Lu, S., Chen, L., et al. (eds.) BI 2009. LNCS (LNAI), vol. 5819, pp. 13–24. Springer, Heidelberg (2009)
[10] Niedenthal, P.M.: Embodying Emotion. Science 316, 1002–1005 (2007)
[11] Port, R.F., van Gelder, T. (eds.): Mind as Motion: Explorations in the Dynamics of Cognition. MIT Press, Cambridge (1995)
[12] Sharpanskykh, A., Treur, J.: Relating Cognitive Process Models to Behavioural Models of Agents. In: Jain, L., Gini, M., Faltings, B.B., Terano, T., Zhang, C., Cercone, N., Cao, L. (eds.) Proceedings of the 8th IEEE/WIC/ACM International Conference on Intelligent Agent Technology, IAT 2008, pp. 330–335. IEEE Computer Society Press, Los Alamitos (2008)
[13] Sharpanskykh, A., Treur, J.: Verifying Interlevel Relations within Multi-Agent Systems. Int. J. of Agent-Oriented Software Engineering 4(2), 174–221 (2010)
[14] Wang, Y., Singh, M.P.: Formal Trust Model for Multiagent Systems. In: Proc. of the 20th Int. Joint Conference on Artificial Intelligence (IJCAI 2007), pp. 1551–1556 (2007)
The Effect of the Normalization Strategy on Voxel-Based Analysis of DTI Images: A Pattern Recognition Based Assessment

Gloria Díaz1,*, Gonzalo Pajares2, Eduardo Romero1, Juan Alvarez-Linera3, Eva López4, Juan Antonio Hernández-Tamames2,5, and Norberto Malpica2,5

1 Universidad Nacional de Colombia, Colombia
2 Fundación C.I.E.N, Spain
3 Hospital Ruber Internacional, Spain
4 Hospital Severo Ochoa, Spain
5 Universidad Rey Juan Carlos, Spain
Abstract. Quantitative analysis of diffusion tensor imaging (DTI) has been shown to be useful in the study of disease-related degeneration. More and more studies perform voxel-by-voxel comparisons of fractional anisotropy (FA) values, aiming at detecting white matter alterations. Overall, there is no agreement about how the normalization stage should be performed. The purpose of this study was to evaluate the effect of the normalization strategy on voxel-based analysis of DTI images, using the performance of a classification approach as an objective measure of normalization quality. This is achieved by using a Support Vector Machine (SVM) which constructs a decision surface that allows binary classification between two types of regions, generated after a statistical evaluation of the grey-level values of regions detected as statistically significant in an FA analysis.
1 Introduction
Statistical comparison between brains of different groups of subjects is a common procedure in brain research. The standard framework for statistical group analysis is Statistical Parametric Mapping (SPM) [1]. Initially designed for functional image analysis, it can be used to compare any group of images with scalar values. Voxel-Based Morphometry [2], for example, makes it possible to study morphological changes in the complete brain by encoding the deformation of every brain to a standard template. Diffusion Tensor Imaging provides information about water diffusion in several directions, obtaining a complete tensor that describes the direction of water diffusion at a specific voxel [3]. From the tensor image, several scalar values (e.g. mean diffusivity, fractional anisotropy) that characterize diffusion in brain regions can be computed. Although in recent years a high number of studies comparing Fractional Anisotropy (FA) images have been published [4–7], the image processing protocol used by the different
Gloria Díaz is supported by a grant from the Colombian Department of Science, Technology and Innovation (COLCIENCIAS), Grant no. 109-2005.
groups is not standard. The SPM package does not include an FA template, so the existing T1, EPI and T2 templates have to be used. A non-weighted image, similar to T2, which is always acquired in the DTI protocol, is used by some authors. This image is normalized to the template, and the resulting deformation is then applied to the FA images [7]. In other cases, an anatomical T1 image is acquired to be used in the normalization step [4]. Some authors create an intermediate FA template from the normalized FA images, to which the original FA images are then renormalized [5, 6]. Although several different normalization protocols have been used, no specific reason for choosing one or the other is provided or even suggested. An evaluation of several normalization protocols was carried out by Pell et al. [8]; however, the reported results were limited to qualitative differences. On the other hand, pattern recognition techniques have recently been used as classification tools for discriminating subjects affected by a brain pathology, based on the analysis of Diffusion Tensor Images [9–14]. The main challenge for these approaches is to identify signatures of disease in the images, named feature vectors, which make it possible to discriminate pathological from healthy patients. These features can be computed on regions extracted from a voxel-based analysis of anatomical images. In this work, we propose to use classification performance as an objective measure of the effect of normalization protocols on an FA analysis, assuming that accurately identified discriminative regions produce better classification performance. To do so, a support vector machine classifier is trained to learn the set of boundaries that optimally separates pathological from healthy patients, according to Fractional Anisotropy measures of regions detected as statistically significant in a voxel-based analysis. Then, this learned model is used to classify unobserved brain volumes, and the average classification performance is reported. We analyze the performance of the classifier when the FA images are normalized with the different protocols proposed in the literature.
2 Methods

2.1 Image Acquisition
Data were acquired on a 3-T scanner (GE Signa II), equipped with 40 mT/m gradients, using an eight-channel phased-array coil. Imaging parameters were b = 1000 s/mm² and TE = 73 ms. Sixty 2.4-mm-thick contiguous slices were acquired with a 96x96 matrix on a 24 cm field of view, reconstructed to 128x128, with an in-plane resolution of 1.875x1.875 mm. The sequence acquired unweighted (b=0) images and 15 diffusion-weighted images. The Fractional Anisotropy maps were calculated using Functool (GE 4.3, Advantage Windows WS).
2.2 Statistical Parametric Mapping
Statistical Parametric Mapping is a framework that allows statistical comparison of brain images of different groups. All brains are normalized to a standard anatomical space and are then compared voxel-wise to find statistically significant differences in gray values. The technique was initially designed for
the analysis of functional images (PET and fMRI), but it can be applied to any set of images in which the gray value of the voxels has a meaning. Voxel-based morphometry (VBM) is the application of the SPM methodology to detect inter-subject morphological brain differences. This method provides a statistical estimation of inter-group brain density or volume differences on a voxel-by-voxel basis in a standardized space, in which the deformation needed for each brain to fit the template has been encoded as a gray value on the normalized image (a process known as modulation). Overall, this method computes statistical parametric maps (SPM) for localizing significant differences between two or more experimental groups using a general linear model (GLM) [2]. Diffusion Tensor Imaging (DTI) characterizes brain tissue structure based on underlying water diffusivity, providing a proper characterization of the white matter. It has been used extensively for studying pathologies such as schizophrenia, where the white matter is known to be affected [3]. DTI acquisition provides full tensorial information that describes the direction of water diffusion. The diffusion tensor at each voxel is a 3x3 positive-definite symmetric matrix D, which can be represented by its eigen-decomposition as in eq. (1):

D = λ1 g1 g1^T + λ2 g2 g2^T + λ3 g3 g3^T    (1)

where λ1 ≥ λ2 ≥ λ3 are the eigenvalues and g1, g2, g3 the corresponding eigenvectors of D. Multiple scalar features can be extracted from the tensor data. The most commonly used is Fractional Anisotropy, which characterizes the anisotropy of the diffusion tensor, providing a measure of 'directionality'; it is computed by eq. (2):

FA = √[ ((λ1 − λ2)² + (λ2 − λ3)² + (λ3 − λ1)²) / (2 (λ1² + λ2² + λ3²)) ]    (2)
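As a quick illustration of eq. (2), the following minimal numpy sketch (ours, not the authors' implementation) computes FA from the tensor of a single voxel:

    import numpy as np

    def fractional_anisotropy(D):
        """FA of a 3x3 positive-definite diffusion tensor D, following eq. (2)."""
        lam = np.linalg.eigvalsh(D)  # eigenvalues; the formula is symmetric in them
        num = ((lam[0] - lam[1]) ** 2 +
               (lam[1] - lam[2]) ** 2 +
               (lam[2] - lam[0]) ** 2)
        den = 2.0 * np.sum(lam ** 2)
        return np.sqrt(num / den) if den > 0.0 else 0.0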
Once a Fractional Anisotropy image is obtained for each subject, the standard SPM statistical analysis can be applied over these scalar images to evaluate morphological differences between groups. The problem arises with the normalization step: there is no Fractional Anisotropy template in SPM. The complete DTI acquisition includes one unweighted image (b=0), which we will name b0, and 15 weighted (b=1000) images. Also, a T1 anatomical image of the subject is always acquired. The b0 image is similar to a T2 acquisition, so it can be used to normalize the study to the T2 template. The anatomical T1 image can also be used as a normalization image. Once all FA images are normalized, we can create a study-specific FA template, to which the FA images can be directly normalized. Thus, depending on the image used for normalization and on the creation of an FA template, different normalization pipelines can be implemented. We have tested six of them in this work (a structural sketch follows the list):

1. Normalizing to an EPI template (N1): all b=0 images are registered and normalized to the Montreal Neurological Institute (MNI) EPI template, included in the SPM8 package. Then, the spatial transformation is applied to the FA maps, which are smoothed with Gaussian kernels with a Full Width at Half Maximum (FWHM) of 4 x 4 x 4 millimeters.
2. Normalizing to a T2 template (N2): all b=0 images are registered and normalized to the MNI T2 template. Then, the spatial transformation is applied to the FA maps, which are smoothed with an FWHM of 4 x 4 x 4 millimeters.
3. Normalizing to a T1 template (N3): each b=0 image and FA map are registered to the corresponding T1-weighted image. After that, all T1-weighted images are registered and normalized to the MNI T1 template, supplied by the SPM8 package, applying the transformation to the FA maps. Finally, the FA maps are smoothed with an FWHM of 4 x 4 x 4 millimeters.
4. Normalizing to an FA template: in this method, an FA template is created from the FA maps of the control group. This is done by registering and normalizing the b=0 images and the FA maps using one of the three previous methods, i.e., normalization to the EPI template (named in this work N4), normalization to the MNI T2 template (N5) or normalization to the MNI T1 template (N6). Then, the FA maps are smoothed with a 4 x 4 x 4 millimeter FWHM Gaussian kernel and averaged to create the FA template. Finally, all registered FA maps are normalized to this template and smoothed with an FWHM of 4 x 4 x 4 millimeters.

In all methods, a binary mask is created in order to improve the normalization of the b=0 images (and the corresponding FA maps) to the selected template.
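The structural sketch below shows how two of these pipelines (N2 and N5) relate. The helpers normalize_to, apply_warp, smooth_gaussian and mean_volume are hypothetical placeholders standing in for the corresponding SPM8 operations, not real SPM calls:

    FWHM = 4  # mm; isotropic Gaussian smoothing used in every pipeline

    def pipeline_n2(b0, fa, t2_template):
        """N2: estimate the warp on the b=0 image, apply it to the FA map, smooth."""
        warp = normalize_to(b0, t2_template)  # hypothetical registration + normalization
        return smooth_gaussian(apply_warp(fa, warp), FWHM)

    def pipeline_n5(control_pairs, all_pairs, t2_template):
        """N5: average the T2-normalized control FA maps into a study-specific
        template, then renormalize every registered FA map to that template."""
        fa_template = mean_volume([pipeline_n2(b0, fa, t2_template)
                                   for b0, fa in control_pairs])
        out = []
        for b0, fa in all_pairs:
            fa_t2 = pipeline_n2(b0, fa, t2_template)  # first pass: MNI T2 space
            warp = normalize_to(fa_t2, fa_template)   # second pass: FA template
            out.append(smooth_gaussian(apply_warp(fa_t2, warp), FWHM))
        return out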
2.3 Brain Volume Classification
A statistical machine learning classifier was used to evaluate the discriminative capacity of the regions identified as relevant by the different normalization approaches in a t-test. This classification model is a function, constructed from a set of training instances, which is able to predict the class of an unclassified instance based on the information provided by a set of features computed from those regions. The most discriminative regions will thus be those that yield the best classification performance on a set of test samples. Figure 1 illustrates the main stages of the approach used in this evaluation. Images were first processed off-line in order to automatically extract regions with significant morphological differences between the groups that we wanted to classify. To do so, the statistical parametric maps obtained from each normalization scheme were thresholded to select statistically significant regions with a t-test, under the restriction that this significance should be spatially coherent within a neighborhood of 5 voxels. Then, a feature extraction process was applied to generate the features used for training a learning model able to separate the feature space into the two groups. Each region was modeled as a random variable, described by its corresponding probability density function, with its mean computed as a regional descriptor. Thus, a feature vector composed of the mean values of all selected regions was built up to describe each brain volume.
Based on these features, a support vector machine (SVM) learning model with a radial basis kernel was trained to find the optimal separating hyperplane between the two classes in the feature space in which each training instance was represented [15]. When a new MRI volume had to be classified, the relevant regions were located and characterized with the same descriptors used in the training stage, and the feature vector describing the volume was built up. This feature vector was mapped to the same training space, and the distance to the SVM hyperplane was used to decide whether the new instance falls into one category or the other.
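A minimal scikit-learn sketch of this region-descriptor/SVM step is given below. Here significant_masks, training_volumes, labels and new_volume are assumed inputs (boolean region masks from the thresholded SPM, normalized FA volumes, and control/patient labels), and the gamma value is taken from the best one reported in Sect. 3.3:

    import numpy as np
    from sklearn.svm import SVC

    def region_features(fa_volume, significant_masks):
        """Mean FA inside each statistically significant region."""
        return np.array([fa_volume[mask].mean() for mask in significant_masks])

    X = np.vstack([region_features(v, significant_masks) for v in training_volumes])
    clf = SVC(kernel="rbf", gamma=0.06).fit(X, labels)

    # Classifying a new, unobserved volume:
    x_new = region_features(new_volume, significant_masks).reshape(1, -1)
    predicted_group = clf.predict(x_new)  # control vs. pathological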
Fig. 1. Classification model used for assessing the discriminative capacity of regions detected by each normalization strategy
2.4 Classification Model Evaluation
SVM learning models were trained through an exhaustive search over their learning parameters. The regularization parameter C was varied from 1 to 10 with increment steps of 1, while the parameter γ, which defines the nonlinear mapping from the input space to a high-dimensional feature space, was varied from 0.01 to 1 with increment steps of 0.02. Evaluation of the parameters was carried out through leave-one-out cross-validation. In each test case, one instance was selected as the testing set while the remaining subset was used for extracting the relevant regions and training the learning model. Each subsample was therefore used exactly once as the testing data. Using a Dell PowerEdge 2950 with 24 GB memory, it takes around 9 minutes to finish a leave-one-out cross-validation for the control-vs-PSP evaluation, and around 10 minutes for the control-vs-EPK evaluation, so a total of 8.5 hours was required for running all evaluation experiments.
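With scikit-learn, the same exhaustive search can be sketched as follows (the per-fold re-extraction of the relevant regions is omitted for brevity; X and y are as in the previous sketch):

    import numpy as np
    from sklearn.model_selection import GridSearchCV, LeaveOneOut
    from sklearn.svm import SVC

    param_grid = {
        "C": np.arange(1, 11, 1),             # 1..10, step 1
        "gamma": np.arange(0.01, 1.0, 0.02),  # 0.01..0.99, step 0.02
    }
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=LeaveOneOut())
    search.fit(X, y)
    print(search.best_params_, search.best_score_)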
On the other hand, the performance of the classification tasks was quantified in terms of average predictive precision, sensitivity and effectiveness, as shown in eq. (3):

Precision = TP / (TP + FP)
Sensitivity = TP / (TP + FN)
Fβ = 2 · Precision · Sensitivity / (Precision + Sensitivity)    (3)

where TP stands for the true positives, FN for the false negatives, and FP for the false positives. The Fβ measure combines the precision and sensitivity rates.
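In code, with counts taken from the cross-validation confusion matrix, eq. (3) reduces to the following (note that this Fβ is the harmonic mean of precision and sensitivity, i.e. the F1 measure):

    def effectiveness(tp, fp, fn):
        """Precision, sensitivity and F-measure of eq. (3)."""
        precision = tp / (tp + fp)
        sensitivity = tp / (tp + fn)
        f_beta = 2.0 * precision * sensitivity / (precision + sensitivity)
        return precision, sensitivity, f_beta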
3 Experimental Results

3.1 Subjects
The performance of the normalization approaches was evaluated on two datasets of Diffusion Tensor MRI images. The former was composed of 14 patients diagnosed with Parkinson's disease (EPK) and 15 control subjects; the latter was composed of 11 patients diagnosed with Progressive Supranuclear Palsy (PSP) and the same control subjects.
3.2 Morphometrical Differences
Figure 2 shows the volume differences observed when healthy controls and PSP patients were compared using SPM of the fractional anisotropy with p < 0.001, for each normalization strategy evaluated. The upper row shows regions from the VBM analysis when MNI maps were used as the template for the normalization process (N1, N2 and N3); the bottom row shows regions involved in the VBM analysis when we used customized FA maps, computed from the control subjects, as the template (N4, N5 and N6). The pons and corpus callosum appear clearly affected; however, how well they are identified depends on the normalization method applied. Figure 3 shows the volume differences observed when healthy controls and EPK patients were compared using VBM with p < 0.01, for each normalization approach evaluated here. We tried to find regions with stronger statistical significance (p < 0.001), but they were very small (less than 4 voxels). Again, the upper row corresponds to the MNI templates (N1, N2 and N3) and the bottom row to the customized FA templates (N4, N5 and N6). In most cases, parts of the internal capsule turn out to be affected. It would appear that the creation of customized templates worsens the detection of specific regions such as the pons, which appears affected when the analysis is performed on FA images normalized to MNI templates.
Fig. 2. Morphometrical differences between PSP and Control groups for the six normalization approaches evaluated
Fig. 3. Morphometrical differences between EPK and Control groups for the six normalization approaches evaluated
Fig. 4. Effectiveness performance for the control-vs-EPK subject classification task. The x-axis corresponds to the complexity values of the SVM learning model and the y-axis to the Fβ measure reported in each case.
3.3 Classification Based Assessment
Two learning models were trained for classifying control-vs-EPK and control-vs-PSP subjects. The effectiveness of the learning models was assessed using the Fβ measure while varying the algorithm parameters, as explained in Section 2.4. We found that the γ parameter of the SVM learning model did not have a relevant effect on the performance (for values smaller than 0.1). Figures 4 and 5 show the Fβ measure reported by each normalization strategy for the control-vs-EPK and control-vs-PSP classification tasks, respectively. In both cases, the γ parameter was fixed to the best average performance (results not shown), i.e., γ = 0.06 for control-vs-EPK and γ = 0.08 for control-vs-PSP. Each Fβ value corresponds to the average over all experiments from the leave-one-out cross-validation process. These results show that the effectiveness was mainly affected by the pathology, as expected, due to the larger variability found in the PSP-vs-control subjects (see Figures 2 and 3). Thus, deciding whether a subject belongs to the control or PSP group is much easier than deciding between the EPK and control groups. Although there are apparently no large visible differences between the regions resulting from the different normalization strategies, the performance of classifiers trained with these regions showed important variability. For the control-vs-EPK classification task, the use of customized templates outperforms strategies
Fig. 5. Effectiveness performance for the control-vs-PSP subject classification task. The x-axis corresponds to the complexity values of the SVM learning model and the y-axis to the Fβ measure reported in each case.
based on MNI templates by more than 22%. The best overall performance was accomplished when FA images were normalized to an FA template created from the FA maps of the control group after normalization to the MNI T2 template. On the other hand, the control-vs-PSP classification task does not show a clear improvement when customized templates were used. This can be explained by the large regions selected in the SPM analysis, because small changes cannot affect the mean value of FA computed from each region. It is likely, then, that a finer selection of these regions, i.e., a more restrictive threshold, would allow finding more explicit bias. Regarding strategies using MNI templates, normalization to the T2 template produced the best results in both cases, whilst normalization to the T1 template produced poorer results. The reason could be that the T2 images (b=0 images) are acquired simultaneously with the DTI volume, whilst the T1 images are acquired prior to the DTI, so there can be a bigger misalignment between them.
4 Conclusions
In this paper, classification performance is proposed as an objective measure of normalization quality in a voxel-by-voxel analysis of DTI images. The capacity of normalization strategies to generate the most discriminative regions for a classification task was used as a quality measure. Here we used an SVM learning
model to find the boundaries that optimally separate controls from pathological subjects according to the mean FA values of regions extracted from an SPM analysis. Six normalization strategies for the analysis of fractional anisotropy maps, proposed in the literature, were evaluated. We found that the choice of normalization procedure affects the accurate detection of discriminative regions related to a specific disease. The effect was more marked when the differences between groups were smaller, as in the control-vs-EPK study. From the results, there is no evidence that one strategy is definitively better than the others; however, strategies that used customized templates reported good performance in general, whilst strategies that used MNI templates reported the most unstable results.
References
1. Friston, K.J., Ashburner, J.T., Kiebel, S., Nichols, T.E., Penny, W.D. (eds.): Statistical Parametric Mapping: The Analysis of Functional Brain Images. Academic Press, London (2007)
2. Ashburner, J.: Computational anatomy with the SPM software. Magnetic Resonance Imaging 27, 1163–1174 (2009)
3. Kubicki, M., McCarley, R., Westin, C., Park, H., Maier, S., Kikinis, R., Shenton, M., Jolesz, F.: A review of diffusion tensor imaging in schizophrenia. Journal of Psychiatric Research 41, 15–30 (2007)
4. Chao, T., Chou, M., Yang, P., Chung, H., Wu, M.T.: Effects of interpolation methods in spatial normalization of diffusion tensor imaging data on group comparison of fractional anisotropy. Magnetic Resonance Imaging 27, 681–690 (2008)
5. Focke, N.: Voxel-based diffusion tensor imaging in patients with mesial temporal lobe epilepsy and hippocampal sclerosis. Neuroimage 40, 728–737 (2008)
6. Kunimatsu, A.: Utilization of diffusion tensor tractography in combination with spatial normalization to assess involvement of the corticospinal tract in capsular/pericapsular stroke: feasibility and clinical implications. Magnetic Resonance Imaging 26, 1399–1404 (2007)
7. Snook, L.: Voxel based versus region of interest analysis in diffusion tensor imaging of neurodevelopment. Neuroimage 34, 243–252 (2007)
8. Pell, G., Pardoe, H., Briellmann, R., Abbott, D., Jackson, G.: Sensitivity of voxel-based analysis of DTI images to the warping strategy. In: Proceedings of the International Society for Magnetic Resonance in Medicine (2006)
9. Kloppel, S., Draganski, B.V., Golding, C., Chu, C., Nagy, Z., Cook, P.A., Hicks, S.L., Kennard, C., Alexander, D.C., Parker, G.J.M., Tabrizi, S.J., Frackowiak, R.S.J.: White matter connections reflect changes in voluntary-guided saccades in pre-symptomatic Huntington's disease. Brain, Advance Access (2007)
10. Caprihan, A., Pearlson, G., Calhoun, V.: Application of principal component analysis to distinguish patients with schizophrenia from healthy controls based on fractional anisotropy measurements. Neuroimage 42, 675–682 (2008)
11. Freidlin, R.Z., Ozarslan, E., Assaf, Y., Komlosh, M.E., Basser, P.J.: A multivariate hypothesis testing framework for tissue clustering and classification of DTI data. NMR in Biomedicine 22, 716–729 (2009)
12. Fan, Y., Shen, D.: Integrated feature extraction and selection for neuroimage classification. In: Proceedings of SPIE (2009)
13. Kloppel, S., Stonnington, C.M., Chu, C., Draganski, B., Scahill, R.I., Rohrer, J.D., Fox, N.C., Jack Jr., C.R., Ashburner, J., Frackowiak, R.S.J.: Automatic classification of MR scans in Alzheimer's disease. Brain, Advance Access (2008)
14. Plant, C., Teipel, S.J., Oswald, A., Bohm, C., Meindl, T., Mourao-Miranda, J., Bokde, A.W., Hampel, H., Ewers, M.: Automated detection of brain atrophy patterns based on MRI for the prediction of Alzheimer's disease. NeuroImage (2009)
15. Platt, J.C.: Fast training of support vector machines using sequential minimal optimization. In: Advances in Kernel Methods: Support Vector Learning, pp. 185–208. MIT Press, Cambridge (1999)
Single Trial Classification of EEG and Peripheral Physiological Signals for Recognition of Emotions Induced by Music Videos

Sander Koelstra1, Ashkan Yazdani2, Mohammad Soleymani3, Christian Mühl4, Jong-Seok Lee2, Anton Nijholt4, Thierry Pun3, Touradj Ebrahimi2, and Ioannis Patras1

1 Department of Electronic Engineering, Queen Mary University of London
2 Multimedia Signal Processing Group, Ecole Polytechnique Fédérale de Lausanne
3 Computer Vision and Multimedia Laboratory, University of Geneva
4 Human Media Interaction Group, University of Twente
{sander.koelstra,ioannis.patras}@elec.qmul.ac.uk, {ashkan.yazdani,jong-seok.lee,touradj.ebrahimi}@epfl.ch, {mohammad.soleymani,thierry.pun}@unige.ch, {muehlc,a.nijholt}@ewi.utwente.nl
Abstract. Recently, the field of automatic recognition of users’ affective states has gained a great deal of attention. Automatic, implicit recognition of affective states has many applications, ranging from personalized content recommendation to automatic tutoring systems. In this work, we present some promising results of our research in classification of emotions induced by watching music videos. We show robust correlations between users’ self-assessments of arousal and valence and the frequency powers of their EEG activity. We present methods for single trial classification using both EEG and peripheral physiological signals. For EEG, an average (maximum) classification rate of 55.7% (67.0%) for arousal and 58.8% (76.0%) for valence was obtained. For peripheral physiological signals, the results were 58.9% (85.5%) for arousal and 54.2% (78.5%) for valence.
1 Introduction
Given the enormous amount of untagged video data available on the web nowadays, the need for automatic categorization and tagging of video content to enable efficient indexing and retrieval is evident. To date, the most widespread method for tagging video data is manual explicit annotation. This is a slow and cumbersome procedure that cannot keep up with the growing amount of created data. An alternative is to automate the tagging procedure. Recently, considerable progress has been made towards automatic content-based tagging with acceptable accuracy under restrictive conditions and for specific domains. However, research in this area has shown that it is not feasible to fully automate the process for general video tagging in the foreseeable future due to the existence of the semantic gap.
Emotional tags associated with video content can play a significant role in indexing and retrieval. For instance, they can be used for efficient retrieval of video content that is in consonance with the affective mood and state of the users. Therefore, extracting emotional tags implicitly by studying the affective states of the users, and assigning these as metadata to video content, allows the personalization of content delivery. One approach to the analysis and recognition of emotions is to directly assess the activity of the central nervous system, specifically brain electrical activity, and study the changes in this activity as the user experiences different emotional states. Several works exist that are related to emotion recognition from the electroencephalogram (EEG) [7,14,10,2]. Furthermore, there are a number of experiments pointing to the fact that physiological activity is not an independent variable in autonomic nervous system patterns but reflects experienced emotional states with consistent correlates [1,17]. To the best of our knowledge, this is the first work using music videos as stimulus material. The possibility of contradictory information received from the visual and auditory modalities makes this particularly challenging. A large number of works have been published in the domain of emotion recognition from physiological signals [11,2,18]. Amongst these studies, few studied EEG signals and achieved notable results using video stimuli. Lisetti and Nasoz used peripheral physiological responses to recognize emotion in response to movie scenes [11]. The movie scenes elicited six emotions, namely sadness, amusement, fear, anger, frustration and surprise. They achieved a high recognition rate of 84% for the recognition of these six emotions. However, the classification was based on the analysis of the signals in response to pre-selected segments in the shown video known to be related to highly emotional events. Kierkels et al. [5] proposed a method for personalized affective tagging of multimedia using physiological signals. Valence and arousal levels of participants' emotions while watching videos were computed from physiological responses using linear regression. Quantized arousal and valence levels for a video clip were then mapped to emotion labels. This mapping made it possible to retrieve video clips based on keyword queries. So far, this novel method has achieved low precision. Yazdani et al. [19] proposed a brain computer interface (BCI) based on P300 evoked potentials to emotionally tag videos with one of the six basic emotions proposed by Ekman [4]. Their system was trained on eight subjects and then tested on four other subjects. They achieved a high accuracy in selecting tags. However, in their proposed system, a BCI only replaces the interface for explicit expression of emotional tags; the method does not implicitly tag a multimedia item using the subject's behavioural and psycho-physiological responses. In this paper, the EEG and biological signals are acquired from six subjects as they watch different music videos, and methods for automatic recognition of the user's affective states are presented. The rest of the paper is organized as follows. Section 2 introduces the methodology used in this study, including test material selection, data acquisition and data processing. Experimental results are presented and discussed in Section 3, and Section 4 concludes the paper.
2 Methodology
We use the valence-arousal scale proposed by Russell [15], which has been widely used in research on affect, in order to quantitatively describe emotion. In this scale, each emotional state can be placed on a two-dimensional plane with arousal and valence as the horizontal and vertical axes. Arousal can range from inactive (e.g. uninterested, bored) to active (e.g. alert, excited), whereas valence ranges from unpleasant (e.g. sad, stressed) to pleasant (e.g. happy, elated). In the following sections, the procedures of test material selection and physiological data acquisition and processing will be explained.
2.1 Subjective Test and Data Selection
The first step of our work is to compile a test set of music videos for physiological data acquisition. The objective of the selection procedure is to ensure that clips inducing various levels of valence and arousal are included in the final data set. We first manually collected 70 candidate music videos spanning diverse genres, ages, and styles. From this collection, the final set of 20 test videos was chosen using a web-based subjective emotion assessment interface. Participants watched video clips one by one and rated them on a discrete 9-point scale for each of valence and arousal, as shown in Fig. 1. Each subject watched 17 clips and, on average, each video clip was rated by 11 subjects.
Fig. 1. Screenshot of the web interface for subjective emotion assessment
Fig. 2(a) shows the ratings, averaged over the test subjects, plotted on the valence-arousal plane. The plane is divided into five regions: positive valence and positive arousal (V+A+), positive valence and negative arousal (V+A−), negative valence and positive arousal (V−A+), negative valence and negative
arousal (V−A−), and neutral (N). For each of the five regions, the four video clips showing the largest discriminability from the other four regions were chosen for inclusion in the test set. In the V−A+ region, only two video clips were available, and thus each of them was split into two parts that were used separately in the experiments. The first two-minute portions of the selected 20 videos were used for the data acquisition.
2.2 Data Acquisition
The experiments were performed in a laboratory environment with controlled temperature and illumination. EEG and peripheral physiological signals were recorded using a Biosemi ActiveTwo system1 on a dedicated recording laptop (Pentium M, 1.8 GHz). Stimuli were presented on a dedicated stimulus laptop (P4, 3.2 GHz) that sent synchronization markers directly to the recording PC. For displaying the stimuli and recording the users' ratings, the software "Presentation" by Neurobehavioral Systems2 was used. In order to minimize eye movements, all video stimuli were shown with a width of 640 pixels, filling approximately a quarter of the screen. 32 active AgCl electrodes were used (placed according to the international 10-20 system) and the EEG data was recorded at 512 Hz. At the same time, 13 peripheral physiological signals (which will be introduced in Section 2.4) were also recorded. Six participants were asked to view the 20 selected music videos, displayed in a random order. Before the experiment a 2-minute baseline recording was made, and before each trial (video) a 5-second baseline was recorded. After each video finished, the participant was asked to perform a self-assessment of their levels of valence, arousal, and like/dislike, which was later used as the ground truth in the single trial classification. Fig. 2(b) shows a participant shortly before the start of the experiment.
2.3 Data Processing for EEG Signals
Correlation analysis. For the investigation of the correlates of the subjective ratings with the EEG signals, the EEG data was referenced to the common average, re-sampled to 256 Hz, and high-pass filtered with a 0.5 Hz cutoff frequency using EEGlab3. Eye movement and blinking artefacts were removed with a blind source separation technique from the AAR toolbox4 for EEGlab. Then the signals from the last 30 seconds of each trial (video) were extracted for further analysis. To correct for stimulus-unrelated variations in power over time, the EEG signal from the five seconds before each video was used as a baseline. The frequency power of trials and baselines between 2 and 40 Hz was extracted with Welch's method with windows of 256 samples.
1 http://www.biosemi.com
2 http://www.neurobs.com
3 http://sccn.ucsd.edu/eeglab/
4 http://www.cs.tut.fi/~gomezher/projects/eeg/aar.htm
Fig. 2. (a) Division of the valence-arousal plane into 5 different regions. The chosen clips from each region are shown in light red. (b) A participant shortly before the experiment.
The baseline power was then subtracted from the trial power, yielding the change of power relative to the pre-stimulus period. These changes of power were then Spearman-correlated with the valence ratings. This was done for each subject separately, and the six p-values per frequency and electrode were then combined into one p-value via Fisher's method [12].
Classification analysis. For the single trial classification of the EEG data, a five-second baseline before each trial was subtracted from the data, and it was referenced to the common average (CAR). The data was down-sampled to 100 Hz and bandpass-filtered between 0.5 and 35 Hz to remove DC drifts and suppress the 50 Hz power line interference. Two different feature extraction methods were compared: power spectral density (PSD) and common spatial patterns (CSP). A PSD analysis concerns the spectral domain and investigates the rhythmic activity of brainwaves. The power in each of the frequency bands was calculated using the Fourier transform of the signal. Often the delta, theta, alpha, beta and gamma wave bands are used, but here we tried different fixed bandwidths (from 1 to 10 Hz) with 50% band overlap. We also included the difference in band power between every pair of electrodes as features. CSP was originally proposed by Koles [8]. It is a technique that decomposes the EEG signal into a number of components based on the variance of the signal, taking the class labels into account. In brief, it attempts to extract components for which the variance is maximal for one class and minimal for the other. Then, for a new, unclassified signal, one uses the variance of the components as features to classify the signal as belonging to one of the classes; a minimal sketch is given below. For details, the reader is referred to [8].
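The sketch below is an assumption-laden re-implementation of the classic two-class CSP computation via a generalized eigenvalue problem, together with the log-variance features typically fed to the classifier (not the authors' code):

    import numpy as np
    from scipy.linalg import eigh

    def csp_filters(trials_a, trials_b, n_components=2):
        """trials_*: arrays of shape (n_trials, n_channels, n_samples)."""
        def mean_cov(trials):
            return np.mean([np.cov(t) for t in trials], axis=0)
        Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
        # Generalized eigenvalue problem Ca w = lambda (Ca + Cb) w;
        # eigh returns eigenvalues (and matching columns) in ascending order.
        _, eigvecs = eigh(Ca, Ca + Cb)
        half = n_components // 2
        picked = (list(range(half)) +
                  list(range(eigvecs.shape[1] - (n_components - half),
                             eigvecs.shape[1])))
        # Filters whose variance is maximal for one class, minimal for the other
        return eigvecs[:, picked]

    def csp_features(trial, W):
        """Log-variance of the spatially filtered trial, used as features."""
        z = W.T @ trial               # (n_components, n_samples)
        var = z.var(axis=1)
        return np.log(var / var.sum())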
2.4 Data Processing for Peripheral Physiological Signals
The following peripheral nervous system signals were recorded: galvanic skin response (GSR), respiration amplitude, skin temperature, electrocardiogram,
blood volume by plethysmograph, electromyograms of the Zygomaticus and Trapezius muscles, and electrooculogram (EOG). GSR provides a measure of the resistance of the skin, obtained by positioning two electrodes on the distal phalanges of the middle and index fingers. This resistance decreases due to an increase in perspiration, which usually occurs when one is experiencing emotions such as stress or surprise. Moreover, Lang et al. discovered that the mean value of the GSR is related to the level of arousal [9]. A plethysmograph measures blood volume in the participant's thumb. This measurement can also be used to compute heart rate (HR) by identification of local maxima (i.e. heart beats), inter-beat periods, and heart rate variability (HRV). Blood pressure and heart rate variability correlate with emotions, since stress can increase blood pressure. Pleasantness of stimuli can increase the peak heart rate response [9]. In addition to the HR and HRV features, spectral features derived from HRV were shown to be useful in emotion assessment [13]. Skin temperature was also recorded, since it changes in different emotional states. The respiration amplitude was measured by tying a respiration belt around the abdomen of the participant. Slow respiration is linked to relaxation, while irregular rhythm, quick variations, and cessation of respiration correspond to more aroused emotions like anger or fear. Regarding the EMG signals, the activity of the Trapezius muscles (neck) was recorded to investigate possible head movements during music listening. The activity of the Zygomaticus major was also monitored, since this muscle is active when the user is laughing or smiling. Most of the power in the spectrum of an EMG during muscle contraction is in the frequency range between 4 and 40 Hz. Thus, the muscle activity features were obtained from the energy of the EMG signals in this frequency range for the different muscles. The rate of eye blinking is another feature, which is correlated with anxiety. Eye blinking affects the EOG signal and results in easily detectable peaks in that signal. In total, 53 features were extracted from the peripheral physiological responses, based on features proposed in the literature [2,18]. A summary of the features is given below.

GSR: mean and standard deviation of skin resistance, mean of the derivative, mean of the absolute value of the derivative, mean of the derivative for negative values only (mean decrease rate during decay time), proportion of negative samples in the derivative vs. all samples, spectral power in the bands 0-0.1 Hz, 0.1-0.2 Hz, 0.2-0.3 Hz, 0.3-0.4 Hz.

Blood volume pressure: mean and standard deviation of HR and its derivative, HRV, mean and standard deviation of inter-beat intervals, energy ratio between the frequency bands 0.04-0.15 Hz and 0.15-0.5 Hz, spectral power in the bands 0.1-0.2 Hz, 0.2-0.3 Hz, 0.3-0.4 Hz, and the low (0.01-0.08 Hz), medium (0.08-0.15 Hz) and high (0.15-0.5 Hz) frequency components of the HRV power spectrum.

Respiration: mean respiration signal, mean of the derivative (variation of the respiration signal), standard deviation, range or greatest breath, breathing rate, spectral power in the bands 0-0.1 Hz, 0.1-0.2 Hz, 0.2-0.3 Hz, 0.3-0.4 Hz.
Skin temperature: range, mean, standard deviation, mean of its derivative, spectral power in the bands 0-0.1 Hz, 0.1-0.2 Hz, 0.2-0.3 Hz, 0.3-0.4 Hz.

EMG and EOG: eye blinking rate, energy, mean and variance of the signal.

Normalization was applied to each feature separately by subtracting the minimum and dividing by the difference between the maximum and the minimum value of the feature. The normalization parameters, the maximum and minimum values, were obtained from the training set.
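A sketch of this feature pipeline for one signal (GSR) plus the training-set min-max normalization is shown below; the exact windowing and spectral settings are assumptions, not the authors' parameters:

    import numpy as np
    from scipy.signal import welch

    def gsr_features(x, fs):
        """A subset of the GSR features listed above."""
        dx = np.diff(x) * fs                        # first derivative
        neg = dx[dx < 0]
        feats = [x.mean(), x.std(), dx.mean(), np.abs(dx).mean(),
                 neg.mean() if neg.size else 0.0,   # mean decrease rate
                 neg.size / dx.size]                # proportion of negative samples
        f, p = welch(x, fs=fs, nperseg=min(len(x), 1024))
        for lo, hi in [(0.0, 0.1), (0.1, 0.2), (0.2, 0.3), (0.3, 0.4)]:
            feats.append(p[(f >= lo) & (f < hi)].sum())
        return np.array(feats)

    def minmax_normalize(train, test):
        """Scale each feature by min/max estimated on the training set only."""
        lo, hi = train.min(axis=0), train.max(axis=0)
        return (train - lo) / (hi - lo), (test - lo) / (hi - lo)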
3 Results
In this section, we present the results of the methods introduced earlier. First, an analysis of the validity of the participants' self-assessments is presented. Next, we investigate the average correlations between these ratings and the observed EEG frequency power. Finally, the results of single trial classification using EEG and peripheral physiological signals are presented.
3.1 Analysis of Subjective Ratings
To validate the affect induction approach and identify possible threats to reliability (e.g. due to extreme habituation or fatigue), we computed the (Spearman) correlations between the rating scales and the stimulus order.

Table 1. The correlations between the rating scales of valence, arousal, like/dislike and the order of the presentation of stimuli. Significant correlations are indicated by stars.
              Valence   Arousal   Like/Dislike   Order
Valence       1         0.46*     0.66*          -0.24
Arousal       -         1         0.56*          -0.17
Like/Dislike  -         -         1              -0.18
Order         -         -         -              1
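Such a rank-correlation table can be computed directly with SciPy; the column layout below is an assumption:

    import numpy as np
    from scipy.stats import spearmanr

    # ratings: shape (n_ratings, 4), columns = [valence, arousal, like_dislike, order]
    rho, p = spearmanr(ratings)        # pairwise correlation and p-value matrices
    print(np.round(rho, 2), p < 0.05)  # starred entries have significant p-values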
The correlation analysis revealed a medium correlation between the ratings on the valence, arousal, and like/dislike scales (Table 1). This could be due to the fact that people liked clips that evoke positive and arousing emotions more. Despite the correlations between valence, arousal, and like/dislike, the results suggest that subjects did differentiate between these concepts. Furthermore, no significant correlation between stimulus order and the ratings was observed. This indicates that any effects of habituation and fatigue were kept to an acceptable minimum.
3.2 Correlations between EEG Frequencies and Ratings
[Fig. 3 panels: Valence: Theta band (7Hz), Alpha band (10Hz), Beta band (21Hz), Beta band (29Hz), Gamma band (38Hz); Arousal: Alpha band (10Hz); Liking: Theta band (7Hz), Alpha band (11Hz), Beta band (20Hz), Beta band (25Hz). Correlation coefficient scale: -0.30 to 0.30.]
Fig. 3. The plots show the mean correlation coefficients over all 6 subjects for specific narrow frequency bands. Electrodes showing highly significant (p < 0.01) differences are highlighted.
The results of the correlation analysis between participant ratings and EEG frequency power suggest that brain activity from different regions of the scalp can be related to the subjective emotional states of the participants along the axes of arousal and valence, and to their preference for the clips (Fig. 3). The large number of tests computed may lead to an increase in false positives. To attenuate this risk, only highly significant (p < 0.01) correlations are discussed. For valence, a strong positive correlation with left parietal-occipital power in the theta band and a negative correlation with right posterior alpha power are observed. This pattern of increasing low-frequency-band and decreasing alpha-band power can be understood in the context of emotion regulation and increased sensory processing [6]. Furthermore, a left central increase and a right frontal decrease in high beta band power are visible with higher valence. In particular, the frontal response might indicate a relative deactivation of cortical regions related to negative mental states [3]. Additionally, a positive correlation with right posterior gamma is observed, possibly hinting again at a role of right posterior cortices in emotion-related sensory processes. For states of higher arousal, a robust decrease of right posterior alpha power can be observed. This is consistent with the role of (posterior) alpha in sensory processes and the role of the right hemisphere in affective processing [3]. Like/dislike shows a similar positive correlation in the theta range and negative correlation in the alpha range as observed for valence. This is presumably due to the correlations seen between the valence and like/dislike ratings. Interestingly, a decrease of beta power with higher liking is observed over the central cortical region, known to be involved in imaginary and real (foot) movement [16].
3.3 EEG Single Trial Classification
In both the EEG and peripheral physiological signal single trial classification, the same ground truth labels and classification methods were used. Three different
targets were classified: the like/dislike, arousal and valence ratings. We posed the problem as a two-class classification problem: the given ratings were thresholded (at the centre of the 9-point rating scale) into two classes for each classification target. For the arousal and like/dislike targets, we had to exclude participant 1, as this participant assigned 17/20 videos a high arousal rating and 19/20 videos a high like/dislike rating. As a result, we did not have enough samples to train the classifier for low arousal and low like/dislike ratings for this participant. All other participants rated the videos in a more balanced manner. To improve the statistical accuracy of the classification, each trial (video) was split into ten 12-second segments. Testing was done using leave-one-trial-out cross-validation. A linear support vector machine (SVM) was used for classification. As mentioned before, for the EEG single trial classification we compared two feature extraction methods, PSD and CSP. With the PSD method, we tested several options for the width of the frequency bands (1, 2, 3, 4, 5 and 10 Hz). Only the results of the best scoring bandwidth are reported. The results of each algorithm and each classification target are given in Table 2 and discussed below (a sketch of the evaluation protocol follows the table).

Table 2. Single trial two-class classification rates for the valence, arousal and like/dislike targets
Target         Method             P1     P2     P3     P4     P5     P6     Avg.
Valence        PSD (3Hz bands)    59.0   45.0   63.0   51.5   58.5   76.0   58.8
               CSP (2 comp.)      60.0   38.5   54.0   60.0   65.0   75.0   58.8
Arousal        PSD (10Hz bands)   —      19.5   67.0   63.5   56.5   53.5   51.9
               CSP (2 comp.)      —      44.5   55.0   65.0   59.5   54.5   55.7
Like/Dislike   PSD (4Hz bands)    —      54.0   50.5   57.5   32.5   52.5   49.4
               CSP (4 comp.)      —      53.0   54.0   63.5   17.0   56.5   48.8
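A minimal sketch of the evaluation protocol just described (centre-thresholding, segmentation into ten 12-second windows, leave-one-trial-out with a linear SVM) follows; features, eeg_trials, ratings, fs and the tie-handling at the scale centre are assumptions:

    import numpy as np
    from sklearn.svm import LinearSVC

    def segments(trial, fs, seg_sec=12, n_seg=10):
        """Split one trial into ten 12-second segments along the sample axis."""
        step = seg_sec * fs
        return [trial[..., i * step:(i + 1) * step] for i in range(n_seg)]

    y = (ratings > 5).astype(int)  # threshold at the centre of the 9-point scale

    accuracies = []
    for held_out in range(len(eeg_trials)):  # leave-one-trial-out
        train = [t for t in range(len(eeg_trials)) if t != held_out]
        X_tr = np.vstack([features(s) for t in train
                          for s in segments(eeg_trials[t], fs)])
        y_tr = np.repeat(y[train], 10)
        clf = LinearSVC().fit(X_tr, y_tr)
        X_te = np.vstack([features(s) for s in segments(eeg_trials[held_out], fs)])
        accuracies.append((clf.predict(X_te) == y[held_out]).mean())
    print(np.mean(accuracies))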
Valence. The performance for valence prediction is better than for arousal prediction. For CSP, using two components gave the best result. The best result for the PSD method was obtained using 3 Hz frequency bands (with 50% overlap). Overall, both algorithms lead to the same classification accuracy (58.8%). Participant 2 scores badly; when excluding this participant, CSP scores 62.8% vs. 61.6% for the PSD method.
Arousal. For CSP, two components were used, and for PSD, 10 Hz frequency bands. Overall, CSP outperforms the PSD method (55.7% vs. 51.9%). For participant 2, the result is very low for both methods. When excluding this participant, CSP has a mean classification rate of 58.5% vs. 60.0% for PSD.
Like/dislike. For CSP, four components were used, and for PSD, 4 Hz frequency bands. The PSD method obtains the highest classification accuracy, though the difference is minimal (49.4% vs. 48.8%). Participant 5 had a very low accuracy for both methods. When excluding this participant, CSP outperforms the PSD
method (56.8% vs. 53.6%). We are currently investigating the possible causes of, and remedies for, the surprisingly low scores of some of the participants.
3.4 Peripheral Physiological Signals Single Trial Classification
As mentioned earlier, 53 features were extracted from each physiological signal sample. The classification scheme remains the same as for the EEG-based classification. The fast correlation-based filter (FCBF) feature selection method was used to select the most discriminating features at each iteration of cross-validation [20].

Table 3. Classification rates using an SVM classifier and FCBF feature selection
Target         P1     P2     P3     P4     P5     P6     Avg.
Valence        40.5   37.0   78.5   40.5   65.5   63.0   54.2
Arousal        —      44.5   85.5   55.0   49.0   60.5   58.9
Like/Dislike   —      73.0   69.0   55.5   32.0   60.0   57.9
The classification rates for valence, arousal and like/dislike are given in Table 3. On average, valence results using peripheral physiological signals are worse than those for like/dislike and arousal. Arousal relates most to peripheral nervous system activity; therefore, the best classification results were obtained for arousal. All results for participant 3 are amongst the best obtained, which may be due to better self-assessment by this participant.
4 Conclusion
In this paper an experiment was conducted to automatically recognize emotions induced by watching music video clips. Six subjects were asked to watch 20 music videos each and rate them according to perceived levels of valence, arousal and general like/dislike. As they watched the videos, their EEG and peripheral physiological signals were recorded. On average, frequency power over several cortical regions correlated with the subjective state and preferences of the participants, especially in the lower frequencies (i.e. in the theta and alpha bands). Similar findings have been reported in the literature on neurophysiological affective responses. For single trial classification, we posed the affect recognition problem as a two-class classification problem, classifying the videos as having low or high arousal, valence and like/dislike. We presented results for classification of both EEG and peripheral physiological signals. For EEG classification, the average (maximum) classification rates are 55.7% (67%) for arousal, 58.8% (76%) for valence and 49.4% (63.5%) for like/dislike ratings. Using peripheral physiological responses, the average (maximum) classification rates are 58.9% (85.5%) for arousal, 54.2% (78.5%) for valence, and 57.9% (73%) for the like/dislike rating. Due to the low
number of samples, we could not validate the significance of our results. We are currently repeating the experiment with 20-30 subjects and 40 samples each in order to obtain more statistically valid results. The classification based on arousal and valence values and binary thresholding proved to be rather challenging. The use of music videos may lead to mixed emotional messages from the video and audio modalities. Furthermore, the affect-related responses could be specific to the modality by which the affect was induced. These effects may complicate any classification. We intend to investigate the influence of the different modalities in our next study. The results of the single trial classification show that there is a relatively large amount of information in both EEG and peripheral physiological signals regarding users' emotional states. In future work, we aim to create a more extensive video clip database that can elicit stronger and more diverse emotions in participants and thus increase the accuracy of the emotion recognition. Furthermore, we plan to fuse the peripheral physiological and EEG modalities in order to better exploit the relative strengths of each modality.
Acknowledgment
The research leading to these results has been performed in the framework of the European Community's Seventh Framework Programme (FP7/2007-2011) under grant agreement no. 216444 (PetaMedia). Furthermore, the authors gratefully acknowledge the support of the BrainGain Smart Mix Programme of the Netherlands Ministry of Economic Affairs, the Netherlands Ministry of Education, Culture and Science, and the Swiss National Foundation for Scientific Research in the framework of the NCCR Interactive Multimodal Information Management (IM2).
References
1. Cacioppo, J., Berntson, G., Larsen, J., Poehlmann, K., Ito, T.: The psychophysiology of emotion. In: Handbook of Emotions, pp. 119–142 (1993)
2. Chanel, G., Kierkels, J., Soleymani, M., Pun, T.: Short-term emotion assessment in a recall paradigm. Int'l. Journal of Human-Computer Studies 67(8), 607–627 (2009)
3. Demaree, H.A., Everhart, E.D., Youngstrom, E.A., Harrison, D.W.: Brain lateralization of emotional processing: Historical roots and a future incorporating "dominance". Behavioral and Cognitive Neuroscience Reviews 4(1), 3–20 (2005)
4. Ekman, P., Friesen, W., O'Sullivan, M., Chan, A., Diacoyanni-Tarlatzis, I., Heider, K., Krause, R., LeCompte, W., Pitcairn, T., Ricci-Bitti, P., Scherer, K., Tomita, M., Tzavaras, A.: Universals and cultural differences in the judgments of facial expressions of emotion. Journal of Personality and Social Psychology 53(4), 712–717 (1987)
5. Kierkels, J., Soleymani, M., Pun, T.: Queries and tags in affect-based multimedia retrieval. In: Int'l. Conf. Multimedia and Expo, Special Session on Implicit Tagging (ICME 2009), New York, United States (2009)
6. Knyazev, G.: Motivation, emotion, and their inhibitory control mirrored in brain oscillations. Neuroscience & Biobehavioral Reviews 31(3), 377–395 (2007)
7. Ko, K., Yang, H., Sim, K.: Emotion recognition using EEG signals with relative power values and Bayesian network. Int'l. Journal of Control, Automation and Systems 7(5), 865–870 (2009)
8. Koles, Z.: The quantitative extraction and topographic mapping of the abnormal components in the clinical EEG. Electroencephalography and Clinical Neurophysiology 79(6), 440–447 (1991)
9. Lang, P., Greenwald, M., Bradley, M., Hamm, A.: Looking at pictures - affective, facial, visceral, and behavioral reactions. Psychophysiology 30(3), 261–273 (1993)
10. Li, M., Chai, Q., Kaixiang, T., Wahab, A., Abut, H.: EEG Emotion Recognition System. In: Vehicle Corpus and Signal Processing for Driver Behavior, p. 125 (2008)
11. Lisetti, C.L., Nasoz, F.: Using noninvasive wearable computers to recognize human emotions from physiological signals. EURASIP J. Appl. Signal Process. 2004, 1672–1687 (2004)
12. Loughin, T.M.: A systematic comparison of methods for combining p-values from independent tests. Computational Statistics & Data Analysis 47, 467–485 (2004)
13. McCraty, R., Atkinson, M., Tiller, W., Rein, G., Watkins, A.: The effects of emotions on short-term power spectrum analysis of heart rate variability. The American Journal of Cardiology 76(14), 1089–1093 (1995)
14. Murugappan, M., Juhari, M., Nagarajan, R., Yaacob, S.: An investigation on visual and audiovisual stimulus based emotion recognition using EEG. Int'l. Journal of Medical Engineering and Informatics 1(3), 342–356 (2009)
15. Russell, J.: A circumplex model of affect. Journal of Personality and Social Psychology 39(6), 1161–1178 (1980)
16. Solis-Escalante, T., Müller-Putz, G., Pfurtscheller, G.: Overt foot movement detection in one single Laplacian EEG derivation. Journal of Neuroscience Methods 175(1), 148–153 (2008)
17. Stemmler, G., Heldmann, M., Pauls, C., Scherer, T.: Constraints for emotion specificity in fear and anger: The context counts. Psychophysiology 38(2), 275–291 (2001)
18. Wang, J., Gong, Y.: Recognition of multiple drivers' emotional state. In: Int'l. Conf. Pattern Recognition, pp. 1–4 (December 2008)
19. Yazdani, A., Lee, J.-S., Ebrahimi, T.: Implicit emotional tagging of multimedia using EEG signals and brain computer interface. In: Proc. SIGMM Workshop on Social Media, pp. 81–88. ACM, New York (2009)
20. Yu, L., Liu, H.: Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research 5, 1205–1224 (2004)
Brain Signal Recognition and Conversion towards Symbiosis with Ambulatory Humanoids

Yasuo Matsuyama, Keita Noguchi, Takashi Hatakeyama, Nimiko Ochiai, and Tatsuro Hori

Waseda University, Department of Computer Science and Engineering, Tokyo, 169-8555, Japan
{yasuo,n kei,rdsg,n.ochoai,tatsuro}@wiz.cs.waseda.ac.jp
http://www.wiz.cs.waseda.ac.jp
Abstract. Human-humanoid symbiosis using brain signals is presented. Humans issue two types of brain signals. One is non-invasive NIRS, giving the oxygenated hemoglobin concentration change and the tissue oxygenation index. The other is a set of neural spike trains (measured on macaques for safety compliance). In addition to such brain signals, human motions are combined so that rich in carbo information is provided for the operation of a humanoid, which is a representative of in silico information processing appliances. The total system contains a recognition engine, an HMM/SVM-embedded Bayesian network, so that the in carbo signals are integrated, recognized and converted to operate the humanoid. This well-folded system has made it possible to operate the humanoid by thinking alone, using a conventional PC. The designed system's ability to transduce sensory information is expected to lead to amusement systems, rehabilitation and prostheses.
1 Introduction
Analyzing and utilizing the information contained in brain signals has been studied for a long time from the viewpoints of physiology [1], engineering utilization [2], stochastic processes [3], [4], and many other disciplines. Thorough comprehension of the brain still has a long way to go. However, the contemporary network society is starting to reveal the germ of novel applications in ICT. This paper tries to compile additional evidence by presenting methods for integrating and recognizing human-generated information, including brain signals and human motions. This multimodal signal set is referred to as an in carbo one1. In the aforementioned application, the interaction with networked ambulant humanoids (walking PCs) is a good example of in silico technology. In this
1 This study was supported by the Grant-in-Aid for Scientific Research #22656088, the Ambient SoC Global COE Program of Waseda University, and the Waseda University Grant for Special Research Projects #2010B. In later experiments, neural spike trains are those of monkeys due to the current compliance rules on measurement for humans. Biologically, humans are classified as monkeys.
paper, the human-humanoid interaction is non-verbal. Reflecting this, the rest of this paper is organized as follows. In Section 2, three types of in carbo signals including brain signals are explained. Section 3 explains how these signals are recognized as patterns. In Section 4, an HMM/SVM-embedded Bayesian network is designed for transducing recognized information to different sensory roles. Section 5 shows experiments on human-humanoid interaction including thinking alone control of a humanoid. Section 6 is the conclusion.
2 Multimodal Signals Carrying Information from Brains
2.1 Brain Activity Measured by NIRS
For the measurement of brain activities, non-invasive methods are popular. It should be noted in advance, however, that this class of methods has limitations in resolution and machine size. Another essential limitation of non-invasive measurement is response speed: signal measurements of this class capture brain activities only indirectly. Accepting the status of current technologies, we use the NIRO-200 NIRS (near-infrared spectroscopy) system [5]. We measured the following data at a rate of 6 samples per second.
(a) Oxy-hemoglobin change: the difference of oxy-hemoglobin levels between the start and the present, expressed as ΔO2Hb (μM).
(b) Tissue Oxygenation Index (TOI): the ratio of the oxy-hemoglobin level to the total hemoglobin level,

$$TOI = 100 \times O_2Hb(t)/cHb(t) \qquad (1)$$
We will feed such non-invasive brain signals to a set of SVMs (Support Vector Machines).
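As a minimal numeric illustration of Eq. (1), TOI is a simple ratio; the sketch below uses our own variable names, which are not part of the NIRO-200 interface:

```python
import numpy as np

def tissue_oxygenation_index(o2hb, chb):
    """Eq. (1): TOI = 100 * O2Hb(t) / cHb(t), i.e. the oxy-hemoglobin
    level as a percentage of the total hemoglobin level."""
    return 100.0 * np.asarray(o2hb) / np.asarray(chb)
```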
2.2 Neural Spike Trains and Artificial EPSP
The main units of brain information processing are neurons. Each neuron provides a spike train to connected neurons so that their post-synaptic potentials are modified. Recording of spike trains requires invasive measurement by electrodes, which limits spike train measurement on humans due to subject safety. Therefore, we used spike trains from an open database [6]². What patterns of spike trains constitute "thinking" remains scientifically unexplored. However, it is possible to classify their patterns using machine learning algorithms. In a later section, artificial EPSPs (Excitatory Post-Synaptic Potentials) will be generated for recognition by HMMs (Hidden Markov Models).
2.3 Body Motions
Body motions or gestures result from brain activities; therefore, they are a class of in carbo information. The measurement system used in this paper was a set of 11
² Task-related and labeled spike trains are still precious, even on primates. We are grateful for this database.
magnetic sensors [7]. Readers are recommended to preview Figure 9 (a), where an operator wears such magnetic sensors. Body motions were recognized as patterns from vector time series. Such motions contain semi-periodic patterns, such as walking, and single-shot ones, such as raising a hand. For semi-periodic patterns, HMMs are appropriate. For single-shot movements, BNs (Bayesian Networks) were found to be appropriate by preliminary experiments.
3 Signal Types, Recognizers and Target Machine

3.1 NIRS Signals and Recognition
Brain signals measured by non-invasive methods usually contain artifact information caused by blood flow. Therefore, pulsation signals need to be filtered out by FFT and IFFT as pre-processing. Figure 1 illustrates left and right forehead ΔO2Hb from a certain subject, without filtering and with 0.9 Hz cut-off filtering.
Fig. 1. Oxy-hemoglobin changes with (bottom) and without (top) filtering
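The pulsation-removal pre-processing can be sketched as follows, assuming the 6 samples/s NIRS rate stated in Section 2.1 and a hard spectral cutoff (the paper does not specify the filter shape):

```python
import numpy as np

def remove_pulsation(signal, fs=6.0, cutoff=0.9):
    """Low-pass a NIRS trace by zeroing FFT bins above the cutoff
    frequency and inverse-transforming (the FFT/IFFT pre-processing)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[freqs > cutoff] = 0.0   # hard 0.9 Hz cut-off
    return np.fft.irfft(spectrum, n=len(signal))
```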
For the generation of brain signals, we prepared three tasks: "left hand tapping (LHT)", "right hand tapping (RHT)", and "knitting eyebrows (KEB)". There is also a quiescent state without any task (QST). Figure 2 illustrates the total system for recognizing {LHT, RHT, KEB, QST} using SVM combinations. Figure 3 (left) is a subrecognizer for each signal in the case of N = 2 in Figure 2. Figure 3 (right) is the structure which combines subrecognizers, where ΔO2Hb and TOI were used.
3.2 Spike Trains and Recognition
Spike trains from neuron populations are considered the main entities of brain signals. However, task-related measurements from subjects are very expensive because of their invasive nature. Therefore, on-line utilization of such brain signals is very limited, despite the prospective view of "electronic chips in the brain" [8]. The spike train utilization planned in this paper is not general-purpose; rather, we design a system which recognizes and transduces sensory information to other
104
Y. Matsuyama et al.
Fig. 2. Total structure of non-invasive brain signal recognition
Fig. 3. Sub-recognition units by SVM combination
types. In the experimental stages, a subject cannot be a human. Therefore, we use data from the neural signal archive [6]. The data set is a set of spike trains from a macaque watching coherent semi-random dots [9], [10]. Spike trains are point processes [3], [4]. For the purpose of their classification, we converted them to artificial EPSPs. Figure 4 illustrates a spike train and its EPSP generated by superposing the α-function [11]:

$$\alpha_j(t) = w_{ji}\,(q t/\tau^2)\,\exp\{-(t/\tau)\}\,u(t) \qquad (2)$$
Here, u(t) is the unit step function, w_{ji} is the connection weight from neuron j to neuron i, q is the total injected charge, and τ is a time constant. The artificial EPSP in this
Fig. 4. Spike train and its artificial EPSP
illustration was computed with w_{ji}q = 1 and τ = 2 ms. In this figure, the signal was shifted so that its mean value was zero. The continuity in time and amplitude of artificial EPSPs matches classification by HMMs. In a later section of experiments, a parallel structure of left-to-right models will be used.
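A sketch of the artificial-EPSP generation of Eq. (2) with the parameter choices stated above (w_{ji}q = 1, τ = 2 ms); the sampling step is our assumption:

```python
import numpy as np

def artificial_epsp(spike_times_ms, duration_ms, dt=0.1, tau=2.0, wq=1.0):
    """Superpose one alpha-function (Eq. 2) per spike and shift the
    result to zero mean, as done for Fig. 4."""
    t = np.arange(0.0, duration_ms, dt)
    epsp = np.zeros_like(t)
    for s in spike_times_ms:
        ts = t - s
        on = ts >= 0.0                               # unit step u(t)
        epsp[on] += wq * (ts[on] / tau**2) * np.exp(-ts[on] / tau)
    return epsp - epsp.mean()                        # zero-mean shift
```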
3.3 Body Motions and Recognition
Body motions measured by the 11 magnetic sensors were transformed to BVH (BioVision Hierarchy) format. Since each sensor gave 6-dimensional data of {x, y, z, roll, pitch, yaw}, the body-motion input formed a time series of supervectors. From our preliminary experiments, the following results were found. (a) The HMM is good at identifying semi-periodic motions which occur while a humanoid walks; however, switching between motions is sometimes misrecognized. (b) The Bayesian Network (BN) is reliable in deciding motion classes by placing undecided nodes; however, it needs information from a module for the periodic motion decision. Therefore, a combination of parallel HMMs and a Bayesian network is considered an appropriate recognizer for body motions.
3.4 Target Machine: Bipedal Humanoids
The bipedal humanoid, or walking PC, we used was an HOAP-2 [12] whose motions can be designed by users in a network environment. Readers can preview its appearance in Figure 5 in the next section.
4 System Integration

4.1 System Integration in the Network Environment
We designed an environment where the human and the humanoid were each a node in the network. There, a human operates a humanoid³ using in carbo signals, so that his/her intentions are recognized, then transduced and forwarded to the humanoid. Figure 5 illustrates this humanoid operation system. A human operator generates brain signals and body motions. These signals are recognized as patterns, so that the operator's intentions are identified and transduced. This enables sensory conversion and ensures the machine independence of the total system. An important engine in this system is the total recognizer, whose specifications will be given separately in the next section. Across the network, a humanoid acts according to the commands reflecting the in carbo signals. It is important to note that the information flow from the human to the humanoid is multimodal and non-verbal.
³ Multiple humanoids are allowed.
Fig. 5. Human-humanoid interaction using in carbo signals
On the humanoid side, transmitted commands are expanded. However, simple expansion of a given series of commands is likely to cause the humanoid to fall down. Therefore, at the cost of operation speed, we need to make a path through intermediate states (see the sketch after the list below). For instance, "walking → squat" is expanded to "walking → stop → upright → squat." Such a transition path expansion ensures the safety of the humanoid's actions. This target machine, the walking PC, has the following significance: (a) This humanoid approximates an android that ambulates. The recognized commands can also be used to control wheeled humanoids and other multi-peds by preparing a different but corresponding motion database. (b) The information flow between human and humanoid is bidirectional. A path for sending stereo views from the humanoid to the human operator is provided. This information is useful to the operator when the humanoid is far away. (c) The humanoid is not a perfect servant to the operator. If the humanoid finds a cue, it acts to complete a task automatically, without human instructions (see Figure 9 (b)).
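The transition path expansion can be read as a shortest-path search over a graph of safe motion states. The sketch below is purely illustrative: the state names and the safe-transition table are assumptions, not part of the HOAP-2 interface.

```python
from collections import deque

# Hypothetical motion graph: edges are transitions assumed safe
# for the humanoid.
SAFE_TRANSITIONS = {
    "walking": ["stop"],
    "stop":    ["walking", "upright"],
    "upright": ["stop", "squat"],
    "squat":   ["upright"],
}

def expand_path(current, target):
    """Breadth-first search for the shortest safe path of
    intermediate states, e.g. walking -> stop -> upright -> squat."""
    queue = deque([[current]])
    seen = {current}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in SAFE_TRANSITIONS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(expand_path("walking", "squat"))
# ['walking', 'stop', 'upright', 'squat']
```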
4.2 Recognizer Integration by HMM/SVM-Embedded Bayesian Network
Since the addressed objective is the integration and utilization of multimodal, non-verbal signals, we need to design a total recognizer. The important checkpoints are the following: (a) The types in the multimodal signal set are very different, and each has its own appropriate recognition method. (b) A monolithic system is not appropriate in view of its size, since it would be too heavy to operate at the PC level.
(c) A Bayesian network as a combining system for sub-recognition units is at the level of PC operation, although a genuinely large one may easily become intractable. By considering the above items and carrying out extensive pre-experiments, a Bayesian network (BN) comprising HMMs and SVMs was found appropriate to fold the total system into a compact one. This is called the HMM/SVM-embedded BN. Figure 6 illustrates it for bipedal humanoid operation by non-verbal information including brain signals. Placing undecided nodes, whose states are computed after evidence is received at other nodes according to Bayes' rule, is a key point for a successful BN.
Fig. 6. HMM/SVM-embedded Bayesian network integrating sub-recognizers
5 Individual Experiments and Total Operations

5.1 Brain NIRS by Tapping
The designed recognition systems shown in Figures 2 and 3 were applied to a subject's tapping tasks by right and left fingers (see Figure 1 for right-finger tapping). We carried out three types of experiments. [Exp-1: Tapping classification for one subject]: Brain activities vary across individuals; signal positions, timing, and strength differ from subject to subject. A useful system therefore needs to be tailor-made rather than one-size-fits-all. Experiments contained training and test phases. Training data collection: Ten sets of eighty tapping patterns were collected as the NIRS brain signals of a subject, for left and right fingers respectively. Data types were ΔO2Hb and TOI (cf. Figure 1). Since the data depended on the subject's fatigue, typical sets were chosen for left and right fingers, respectively. Quiescent state signals were also measured. Training of the recognizer: This was the phase to train the machine recognizer of Figures 2 and 3. Gaussian kernels were used for the SVMs. Threshold values were adjusted manually, since the system is of a tailor-made type.
Test of the designed recognizer: We prepared eighty test samples which were not used for training. Using this data set, we obtained the following performance: For right-finger tapping, the recognition was {right, left, quiescent} = {61.0, 22.5, 16.5}%. For left-finger tapping, the recognition was {right, left, quiescent} = {8.7, 83.8, 7.5}%. For the quiescent state, the recognition was {right, left, quiescent} = {3.8, 11.2, 85.0}%. For this right-handed subject, left-hand tapping was best recognized among {right, left, quiescent}. However, in the thinking-alone experiment of Exp-2, the results were reversed. [Exp-2: Brain NIRS signal generation by thinking alone]: Considering the results of Exp-1 as a set of preliminary experiments, we carried out the thinking-alone experiment. Recognition system setup: A subject gave a set of training data, and a recognizer was designed in the same way as in Exp-1. Test of thinking alone: The subject made efforts to generate brain signals by executing the tapping tasks mentally, without finger actions. The recognition results of these thinking-alone experiments are as follows: For right-finger tapping, the recognition was {right, left, quiescent} = {75.0, 1.3, 23.7}%. For left-finger tapping, the recognition was {right, left, quiescent} = {18.7, 73.8, 7.5}%. For the quiescent state, the recognition was {right, left, quiescent} = {1.2, 7.5, 91.3}%. Thus, three trials of thinking alone out of four were successful. Figure 7 (a) illustrates humanoid operation veering to the right, switched from walking straight, when right-finger tapping was recognized (see the operator's right hand). Figure 7 (b) shows the same motion generated by thinking alone.
(a) Tapping
(b) Thinking-alone
Fig. 7. Humanoid operation by brain NIRS signals
[Exp-3: Subject mismatch]: Although we understand that systems making use of brain signals are highly user-dependent, it is interesting to see performances on mismatched subjects. This is to measure the system performance
using a different person's training data. This might be the case where a handicapped person receives training. Recognition with another person's training data, with threshold adjustment for each user, gave around a 60% recognition rate.
5.2 Spike Train Recognition
The spike trains we apply to the humanoid operation are a macaque's V5/MT responses to moving dot patterns [10]. Dot coherence ranges from 0% (perfectly random) to 100% (deterministic; upward or downward). Thus, each spike train is associated with a label for {dot coherence, up/down direction, fail/success}. We performed extensive preliminary experiments to decide the design parameters in the following way: (a) Spike trains were drawn from the database for training and testing the recognizer without any foresight. Dot direction was either 0 (upward) or 1 (downward). Coherence was larger than or equal to 25.6%. Each record length was 2,000 ms. (b) Parallel HMM recognition systems using left-to-right models were designed and tested so that the start timing and the best window length could be decided. (c) By repeating the above procedure and pruning inappropriate data, eligible spike trains were selected. Important findings were obtained by this procedure. (1) The best spike duration is 90 ms taken from the start of the task, from a machine recognition viewpoint, i.e., the best recognition rate. This was an analysis-by-synthesis result compatible with physiologists' default value of 100 ms. (2) There are one to seven spikes in this 90 ms window. (3) The smallest number of necessary training data was 20 for the two-class-plus-quiescence recognition, and 40 for the three-class-plus-quiescence recognition. The latter can be understood from Table 1, which describes the three classes to be recognized, and Table 2, which gives the recognition rates.

Table 1. The three generated classes

class   direction   correlation     response   # of data
C0      0, 1        0%              0, 1       45
D0      0           51.2%, 99.9%    1          49
D1      1           51.2%, 99.9%    1          41
Figure 8 illustrates how the spike patterns in the visual recognition area were converted into the humanoid's head motions. The significance of this head swing is that the V5/MT spike trains, which are not motor signals, were transduced into humanoid motions, unlike in the trial of [13].
Table 2. Recognition rate of the three classes

# of data (used rate)   recognition rate %
68 (1/2)                95.4
45 (1/3)                98.3
34 (1/4)                93.0
17 (1/8)                79.3
Fig. 8. Humanoid's head swing motion converted from V5/MT spike trains
(a) Operation by motions and NIRS
(b) Button pushing
Fig. 9. Total operation of humanoid via in carbo signals
5.3 Total Operation
The designed system of Figure 5 has the universal ability to accept a variety of multimodal signals, owing to the general recognition ability of the HMM/SVM-embedded BN. Here, we give an example of the total operation of the humanoid using motion and brain signals. In Figure 9 (a), the human operator sent motion signals for walking. When the humanoid was walking toward an obstacle, the operator sent a strong brain signal of KEB so that the walking would be stopped. Figure 9 (b) is the button pushing in the autonomous mode, where the humanoid acted independently of the operator.
6 Conclusion
An integration, recognition and conversion system of spike trains, NIRS signals and body motions, all of which originate in carbo, was designed. The total
system is an HMM/SVM-embedded Bayesian network. The recognition results were utilized for the humanoid operation. This provides promising evidence toward understanding brain and motion information. The authors anticipate that the set of algorithms and systems in this paper will lead to future advancements in gaming, rehabilitation, and prostheses utilizing the transduction of sensory information.
References
1. Lettvin, J.Y., Maturana, H.R., McCulloch, W.S., Pitts, W.H.: What the frog's eye tells the frog's brain. Proc. IRE 47, 1940–1951 (1959)
2. Martin, T.B., Talavage, J.J.: Application of neural logic to speech analysis and recognition. IEEE Trans. Military Electronics 7, 189–196 (1963)
3. Matsuyama, Y., Shirai, K., Akizuki, K.: On some properties of stochastic information processes in neurons and neuron populations. Kybernetik (Biological Cybernetics) 15, 127–145 (1974)
4. Matsuyama, Y.: A note on stochastic modeling of shunting inhibition. Biological Cybernetics 24, 139–145 (1976)
5. Hamamatsu Photonics: NIRO 200 document, Shizuoka, Japan (2003)
6. Bair, W.: Neural signal archive, http://www.neuralsignal.org/
7. Ascension Technologies: Motion Star instruction manual (2000)
8. Minsky, M.: Kyoto 1200th anniversary lecture, Kyoto, June 25 (1994)
9. Britten, K.H., et al.: The analysis of visual motion: A comparison of neuronal and psychophysical performance. J. Neuroscience 12, 4745–4765 (1992)
10. Zohary, E., Newsome, W.T.: emu035P, Neural Signal Archive, nsa2004.2 (2004)
11. Gerstner, W., Kistler, W.: Spiking Neuron Models, pp. 100–102. Cambridge University Press, Cambridge (2002)
12. Fujitsu Automation and Michiya System: HOAP-2 instruction manual (2004)
13. Duke University and ATR: Monkey's thoughts propel robot, a step that may help humans. New York Times (January 15, 2008)
Feature Rating by Random Subspaces for Functional Brain Mapping

Diego Sona¹,² and Paolo Avesani¹,²

¹ NILab, Fondazione Bruno Kessler
² CIMeC, University of Trento
Abstract. Functional magnetic resonance imaging is a technology allowing for a non-invasive measurement of brain activity. Data are encoded as sequences of 3D images, usually a few hundred samples, each made of tens of thousands of voxels, namely volumetric pixels. The main question in neuroimaging is the identification of the voxels affected by a specific brain activity. This task, referred to as brain mapping, can be conceived as a problem of feature rating. The challenge is twofold: the first is to deal with the high dimensionality of the feature space; the second is the need to preserve redundant features. Most common techniques of feature selection do not cover both requirements. In this work we propose the adoption of a random subspace method, arguing, by theoretical arguments and empirical evidence on synthetic data, that it might be a viable solution for a multi-variate approach to brain mapping. In addition, we provide some results on a neuroscientific case study investigating a visual perception task.
1 Introduction
Neuroimaging is a relatively new discipline exploiting a set of technologies (magnetic resonance, positron emission tomography, magnetoencephalography, etc.) to study the brain and its functions. In particular, magnetic resonance supports the acquisition of different types of data, including structural and functional images. Structural neuroimaging deals with the anatomy of the brain, and it is regularly used by clinicians to diagnose brain damage, while functional neuroimaging supports the investigation of human brain functions. A potential impact of functional imaging on neurosurgery is the opportunity to circumscribe relevant functional brain areas to minimize surgical intervention damage. Functional Magnetic Resonance Imaging (fMRI) allows for a non-invasive measurement of human brain activity with a relatively high spatial resolution. This technology records the variation in time of the oxygenated blood flow, called Blood Oxygenation Level Dependent (BOLD). An increase in the local neural activity results in a greater consumption of oxygen. As a reaction, the vascular system provides more oxygenated blood to the interested regions. The measurement is taken as a sequence of volumetric brain images, usually a few hundred. A single brain image is made of several thousand voxels, namely volumetric pixels, that record the BOLD signal in a portion of the brain approximately 3 millimeters in diameter.
Cognitive neuroscientists analyze this kind of data to locate the portion of the brain, namely the set of voxels, maximally related to some specific cognitive or perceptual task. This analysis, known as brain mapping, produces a three-dimensional image where each voxel is coloured according to its relevance. A recent trend, based on multi-variate pattern analysis, suggests an alternative approach. The idea is to use a classifier to predict the cognitive state given the brain activity [11]. This approach, however, does not directly answer the brain mapping issue. In the following, we propose a bridging solution where a classification-based approach is used to address the brain mapping issue. The brain mapping task can be conceived as a problem of feature rating. In this domain, two main issues affect the data: its dimensionality and feature redundancy. The amount of data recorded during a functional neuroimaging experiment is massively high-dimensional. On average, a dataset includes 10² examples, with an input space of 10⁴ features. In this situation, where the features significantly outnumber the examples and the data have an extremely low signal-to-noise ratio, the assessment of feature relevance may become very unstable [13]. In addition, the feature rating cannot be driven by the performance of a classification task, where features are usually selected by removing redundancies. Brain mapping requires detecting all the relevant voxels for a subsequent analysis by the domain experts. This means that redundant features cannot be discarded. Feature selection methods can be grouped into three main approaches: wrappers, filters, and embedded models [10]. Apparently, the filters are the most appropriate approaches to accomplish the above requirements. These models rank the features using uni-variate analysis based on either information-based measures and scores (e.g., correlation, mutual information, etc.) or statistical indices and tests (e.g., χ², Mann-Whitney, etc.) [1]. They allow for finding highly descriptive features while preserving redundant ones. Moreover, the curse of dimensionality does not affect these methods. On the contrary, wrapper and embedded methods very often fail to fulfill both the high-dimensionality and the redundancy requirements. Most solutions explicitly aim at finding a minimum set of discriminant features while avoiding redundant ones. In addition, the scalability issue prevents an exhaustive exploration of all possible feature combinations. The Elastic-net [18] is an exception among embedded methods. This regularization method represents a state-of-the-art linear regression model particularly suited to address the sparsity problem in high-dimensional spaces [4], while preserving the redundancy requirement as well. Elastic-net is a regularized model made of two components: the lasso regularization [16] (L1 norm), enabling the selection of relevant features only, zeroing all others, and the Tikhonov regularization (L2 norm, also known as ridge regression), allowing for a grouping effect, where for similar features the model computes similar relevance. This embedded model has been exploited in the analysis of fMRI data [3] as well. Elastic-net suffers from two main limitations. Due to the computational cost of the model, it cannot rank the entire set of features in a huge space. Early stopping criteria are adopted to limit the number of selected features in the greedy learning process. This means that possibly some relevant brain areas may not enter the
set of selected features just as a matter of numbers. It is worthwhile to notice that in [3] the voxel sets selected for two different fMRI datasets, concerned with a replication of the same experiment, overlap by just 15% of the voxels. To address all the above issues, we propose the adoption of a model belonging to the broader category of the Random Subspace Methods (RSM), similar to those introduced in [9] and [2]. This approach, based on the bagging principle, has been designed to be effective in high-dimensional domains with small sample sizes. The main contribution of this work is the investigation of this feature rating method. We argue that the proposed RSM-based solution effectively preserves feature redundancy. To support this claim we provide a theoretical analysis of the redundancy issue and a further empirical analysis with synthetic data. The paper also includes a case study of brain mapping for a neuroscientific experiment on face perception.
2 The Random Subspace Method
Robustness, or stability, of feature selection methods is a topic that has received little attention in the past. In biological domains, however, it is an important issue when feature selection is the main quest of domain experts. Robustness is strongly correlated with the high-dimensionality and redundancy issues outlined in the previous section. Actually, high-dimensional spaces and redundant feature rejection are the two main sources of feature selection instability [17]. This issue has been addressed in [13] with an ensemble feature selection technique: the feature selectors were constructed on samples of the data, and the ratings of selected features for each selector were summarized to obtain a unique ranking. A similar idea proposed in [8] produces many different rankings, training an algorithm on different data samples; the rankings are then combined by analysing the features in pairs. These approaches, however, are inadequate for high-dimensional spaces. An alternative approach, initially introduced in [6], is the Random Subspace Method (RSM), an ensemble method based on the combination of many classifiers trained on different random views of the same dataset. The idea is to train multiple classifiers on randomly selected subspaces, achieving a generalization improvement thanks to the ensemble approach. It was demonstrated that RSM may outperform both boosting and bagging [14]. This algorithm can achieve both low bias and low variance, which, in terms of feature selection, means higher stability than a single feature selector [5]. Although the main concern of RSM was classification performance, as for the models presented in [2, 5, 7, 15], some derivations produced methods characterized by stable feature selection [12], avoiding feature under-training [15]. In particular, aiming at improving classification performance, both [2] and [9] addressed the feature selection problem; the two works mainly differ by the chosen embedded feature-selection classifiers. The global relevance of each feature is determined independently, as a combination of all feature ratings computed in the selected subspaces. The two works, however, did not recognize the redundancy-preservation property characterizing the approach.
Algorithm 1. The RSM algorithm
Input: the dataset X described by the feature set F, and the learning model M
- Initialize the ratings vector R = [0, …, 0]
repeat
  - Randomly select a subset of k features F^k ⊂ F
  - Project X onto the new space F^k, producing X^k
  - Train the model M on the projected dataset X^k
  - ∀ f_i ∈ F^k, update the rating: R_{f_i} = R_{f_i} + Φ_{f_i}(M)
until convergence
- Normalize R
return R
Here, we suggest a generalization, determining the feature relevance as an ensemble of feature sensitivities, one determined for each trained model. Many learners are trained with the dataset projected on random feature subspaces. A feature rating is determined for each feature in the subspace using sensitivity analysis. Then, a global rating is determined as the average of all "partial" ratings. Computing the relevance as a combination of sensitivities allows for the adoption of any learning model. The process is detailed in Algorithm 1. The function Φ_{f_i} computes the rating of the i-th feature by performing a sensitivity analysis on the trained model M. A commonly used approach is a local method that computes the partial derivative of the model output Y with respect to the input factor f_i; this local computation can then be estimated with some approximation over the entire input domain:

$$\Phi_{f_i}(\mathcal{M}) = E_X\!\left[\frac{\partial Y}{\partial f_i}\right] \qquad (1)$$

The function Φ may also compute other ratings, e.g., the model precision. The outlined algorithm is general enough to allow for various instantiations, according to the task at hand. The choice of the learning model M may clearly influence the computational power and complexity of the algorithm. Many models can be chosen; there are, however, some limitations. First and foremost, training should be fast; hence, a very good solution is to consider simple learners. Notice that in the case of a linear regression Y = XW, the sensitivities are the model weights, Φ_{f_i} = W_{f_i}. Alternatively, weak learning of complex models is still suitable.
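A minimal sketch of Algorithm 1 with ridge regression as the model M, so that the sensitivities reduce to the regression weights (Φ_{f_i} = W_{f_i}); the subspace size, iteration count, and regularization value below are illustrative assumptions, and the final L2 normalization follows Section 4:

```python
import numpy as np

def rsm_feature_rating(X, y, k=50, n_iter=5000, alpha=1.0, seed=0):
    """Train ridge regressions on random k-feature subspaces and
    accumulate the weights as per-feature ratings (Algorithm 1)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    ratings = np.zeros(m)
    for _ in range(n_iter):
        idx = rng.choice(m, size=k, replace=False)   # random subspace F^k
        Xk = X[:, idx]
        # ridge (Tikhonov-regularized least squares) solution
        A = Xk.T @ Xk + alpha * np.eye(k)
        w = np.linalg.solve(A, Xk.T @ y)
        ratings[idx] += w                            # R_fi += Phi_fi(M)
    return ratings / np.linalg.norm(ratings)         # normalized ratings
```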
3 Grouping Effect
Even though RSM is similar to a wrapper feature selection method, its behaviour is much closer to a filter approach. Together with its great scalability to high-dimensional spaces, one of its main advantages is the ability to retrieve all features, preserving redundant ones as well. In statistical terms, this task may be referred to as the grouped-variables issue: similar features should have similar ratings. In all our experiments on synthetic and real data we have seen that the model benefits from this property with various classifiers M. Under certain conditions it is possible to provide a theoretical proof of this ability as well.
Let X = {x_1, …, x_N} denote a dataset composed of N examples, described by M attributes x_i = (x_{i1}, …, x_{iM}) ∈ X, and let F = (f_1, …, f_M) be the set of attributes. Let F^k ⊂ F denote a subset of k attributes of the feature space F, and X^k the projection of the dataset X onto the new space F^k. Finally, assume Φ : M_{F^k} → ℝ^{|F^k|} is a function that, given a model M trained using the dataset projected on the feature subspace F^k, returns a relevance rating for all the used features. To begin, let us assume the RSM-based algorithm explores all possible subsets of k features F^k ⊂ F. Adopting a very simple scheme to summarize the relevances, such as the sum of relevances across all subsets (see Alg. 1), the relevance computed by RSM for a feature f_i is

$$R_{f_i} = \sum_{F^k \subset F} \Phi_{f_i}(\mathcal{M}_{F^k}) \qquad (2)$$
where Φ_{f_i}(M_{F^k}) is the rating of feature f_i determined by the function Φ(M_{F^k}) if f_i ∈ F^k; for all subsets F^k such that f_i ∉ F^k, the corresponding relevance is Φ_{f_i}(M_{F^k}) = 0. Qualitatively speaking, a learning model M exhibits a grouping effect when, for highly correlated features f_i, f_j ∈ F, the relevance coefficients assigned to the features tend to be equal (up to a change of sign if negatively correlated) [18]. This property can be easily proved in the extreme situation where the two variables are exactly identical, f_i ≡ f_j. In this case the model should assign equal coefficients, Φ_{f_i}(M_F) = Φ_{f_j}(M_F).

Theorem 1. If M is a deterministic learning model exhibiting a grouping effect, then ∀ f_i, f_j ∈ F such that f_i ≡ f_j it holds that R_{f_i} = R_{f_j}.

Proof. We may partition all possible subsets F^k into four categories, depending on whether the two features f_i and f_j belong to the feature set or not:

$$G_{ij} = \{F^k \mid f_i \in F^k \wedge f_j \in F^k\} \qquad G_{\bar{i}j} = \{F^k \mid f_i \notin F^k \wedge f_j \in F^k\}$$
$$G_{i\bar{j}} = \{F^k \mid f_i \in F^k \wedge f_j \notin F^k\} \qquad G_{\bar{i}\bar{j}} = \{F^k \mid f_i \notin F^k \wedge f_j \notin F^k\}$$
Equation (2) can be rewritten for f_i as

$$R_{f_i} = \sum_{F^k \in G_{ij}} \Phi_{f_i}(\mathcal{M}_{F^k}) + \sum_{F^k \in G_{i\bar{j}}} \Phi_{f_i}(\mathcal{M}_{F^k}) + \sum_{F^k \in G_{\bar{i}j} \cup G_{\bar{i}\bar{j}}} \Phi_{f_i}(\mathcal{M}_{F^k})$$
We know, however, that for any F^k ∈ G_{īj} ∪ G_{īj̄} with f_i ∉ F^k the corresponding feature relevance is null, Φ_{f_i}(M_{F^k}) = 0. Hence the above equation becomes

$$R_{f_i} = \sum_{F^k \in G_{ij}} \Phi_{f_i}(\mathcal{M}_{F^k}) + \sum_{F^k \in G_{i\bar{j}}} \Phi_{f_i}(\mathcal{M}_{F^k}). \qquad (3)$$

Since M is a model exhibiting a grouping effect, ∀ F^k ∈ G_{ij} we have that Φ_{f_i}(M_{F^k}) = Φ_{f_j}(M_{F^k}), hence

$$R_{f_i} = \sum_{F^k \in G_{ij}} \Phi_{f_j}(\mathcal{M}_{F^k}) + \sum_{F^k \in G_{i\bar{j}}} \Phi_{f_i}(\mathcal{M}_{F^k}). \qquad (4)$$
Moreover, for any subset F^k_p ∈ G_{ij̄} there exists a subset F^k_q ∈ G_{īj} such that, excluding f_i from F^k_p and f_j from F^k_q respectively, the two resulting subsets are equal: F^k_p \ f_i = F^k_q \ f_j. Since, however, the two features are equivalent, f_i ≡ f_j, then F^k_p \ f_i ∪ f_j cannot be distinguished from F^k_q. Hence, since M is a deterministic model, it holds that Φ_{f_i}(M_{F^k_p}) = Φ_{f_j}(M_{F^k_q}). As a result,

$$R_{f_i} = \sum_{F^k \in G_{ij}} \Phi_{f_j}(\mathcal{M}_{F^k}) + \sum_{F^k \in G_{\bar{i}j}} \Phi_{f_j}(\mathcal{M}_{F^k}) = R_{f_j}. \qquad (5)$$
This theorem says that, by adopting a model that exhibits a grouping effect, RSM inherits the same property as well.
4 Experimental Settings
Feature selection methods are usually analyzed by comparing both the classification accuracy obtained by some learners and the size of the selected feature set. Feature redundancy is considered a source of suboptimal solutions, and therefore it has to be reduced as much as possible. On the contrary, in the neuroscientific framework, finding all relevant features becomes a major goal. Unfortunately, for the brain mapping task it is not straightforward to get datasets with available ground truth. Empirical studies are limited to a qualitative and subjective analysis of the results. For this reason we organize a twofold experimental evaluation: a quantitative and controlled analysis using synthetic data, and a qualitative and practical assessment using a real neuroimaging dataset. Without loss of generality, let us assume the values x_{·j} of feature f_j over all patterns are normalized to unit variance and zero mean. In this situation, a very simple model exhibiting a grouping effect is linear ridge regression. The property is guaranteed by the strictly convex L2 penalty, i.e., the ridge regularization. For this reason, we used this simple model within RSM for all the following empirical tests. Training was done with ordinary least-squares approximation. To understand the extent of the results we compared the proposed model to the state-of-the-art Elastic-Net [18], devised to address the same issues (scalability and variable grouping). Elastic-net was executed in all experiments with an L2 regularization coefficient equal to 2, the best average value found after some experiments. Exploring the whole space of possible sets of k features, as required by the above theoretical assumptions (see Sec. 3), is computationally unfeasible in the presence of huge dimensional spaces. Under the bootstrapping principle, however, with a "good" number of samples, the features' relevance asymptotically converges to fixed points. The task here is to decide how to explore the space of samples and when to stop the algorithm, determining a "stable" solution in a reasonable amount of time. A random strategy was adopted to create the feature samples, using a uniform distribution without replacement. Another important issue is the dimension of the sample sets. The algorithm is clearly faster when using small subsets.
Nonetheless, if the subset is too small the multivariate relationships may be lost. Moreover, when using fewer features than examples, the model is less prone to ill-conditioning. In the following synthetic experiments, the sample dimensionality has been a matter of study in terms of both performance and stability. In all our experiments with synthetic data, the stopping criterion was based on a measure of convergence of the normalized ratings

$$\hat{R}(t) = \frac{R(t)}{\|R(t)\|_2}$$

The algorithm was stopped when the normalized ratings vector was not changing during the last few iterations, i.e., when the maximum feature change over an interval of iterations fell below a threshold ε:

$$\|\hat{R}(t) - \hat{R}(t-k)\|_\infty < \varepsilon$$

For stability reasons we used k = 10 and ε = 0.01. In the neuroscientific case study illustrated in Section 6 we adopted a different policy: we defined in advance a number of iterations per feature such that the stability of the results was guaranteed.
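The stopping test can be coded directly from the two formulas above; a sketch with our own variable names:

```python
import numpy as np

def has_converged(R_t, R_t_minus_k, eps=0.01):
    """True when the max per-feature change of the L2-normalized
    ratings over the last k iterations falls below eps
    (k = 10 and eps = 0.01 in the experiments above)."""
    r1 = R_t / np.linalg.norm(R_t)
    r0 = R_t_minus_k / np.linalg.norm(R_t_minus_k)
    return np.max(np.abs(r1 - r0)) < eps
```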
5 Empirical Evaluation
The analysis of the grouping effect of RSM was carried out with three different experiments on synthetic data, which we refer to as the redundancy, ranking, and retrieval analyses. All the experiments were performed many times, always presenting similar results.
5.1 Redundancy
The analysis of redundancy preservation helps to understand the grouping effect inherited by RSM when the adopted classifier M exhibits a grouping effect. The experiment was designed to have two groups of features controlling the target, while all other features were completely unrelated. Specifically, all features were generated with i.i.d. Gaussian distributions G(0, 1). Then the target was generated as a linear combination of two randomly selected features f_i and f_j,

$$y = x_{\cdot i} - \tfrac{1}{2}\, x_{\cdot j} \qquad (6)$$

and four other features were made equal to the two above features (x_{·k} = x_{·l} = x_{·i} and x_{·m} = x_{·n} = x_{·j}). The task here was regression. Figure 1 shows the ratings convergence during an RSM execution. As expected, there are two groups of three relevances each that are significantly different from all other relevances. The two groups also preserve the relative magnitude and sign determined in equation (6): one group presents ratings that are approximately one half the ratings of the other group, with reversed sign. Finally, all other ratings are near zero. Similar results were observed using different numbers of examples, features, and sample dimensions.
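The generating process of this redundancy experiment can be sketched as follows; the total number of features m is not stated in the paper, so its value here is an assumption:

```python
import numpy as np

def make_redundancy_data(n=100, m=1000, seed=0):
    """All features i.i.d. N(0,1); target y = x_i - 0.5 * x_j (Eq. 6);
    two extra copies of f_i and two of f_j planted as redundant
    features. Regression task."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, m))
    i, j, k, l, p, q = rng.choice(m, size=6, replace=False)
    y = X[:, i] - 0.5 * X[:, j]
    for dup in (k, l):
        X[:, dup] = X[:, i]          # duplicates of f_i
    for dup in (p, q):
        X[:, dup] = X[:, j]          # duplicates of f_j
    return X, y, (i, k, l), (j, p, q)
```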
[Plot: normalized weights vs. number of iterations per feature]
Fig. 1. Convergence of the normalized ratings during RSM execution
5.2 Ranking
For a deeper analysis of the relative ratings, we designed an experiment where all features were decreasingly related to the target. In particular, the target y was first generated with a continuous uniform distribution U(−1, 1); then the features were generated by adding incremental Gaussian noise ε_i ∼ G(0, σ_i) to the target,

$$x_i = y + \varepsilon_i \qquad (7)$$

where σ_i ranges in the interval [0.01, 3]. As a result, the features have a decreasing level of correlation with the target. To satisfy the grouping effect property of ridge regression, the features were independently z-scored. The quality of the generated rankings was measured using the Spearman correlation against the ground truth. Table 1 shows the mean correlations over 10 trials for the rankings obtained with Elastic-Net and RSM. The dataset was made of 100 examples. Various feature space dimensions were tested, ranging from 100 to 1600. The correlation was determined both over the whole feature space and over a selection of the 10% most relevant features. The ranking obtained with RSM always outperforms the corresponding Elastic-Net ranking. Figure 2 depicts the average percentage (over 10 trials) of correct top-ranked features when performing feature selection.

Table 1. Mean correlation over 10 trials between real and computed rankings for all features (5 left columns) and for the 10% most relevant features (5 right columns).

              All features                 10% most relevant features
              100   200   400   800  1600  100   200   400   800  1600
Elastic-Net   .24   .34   .50   .59  .67   .57   .61   .58   .67  .72
RSM           .75   .77   .78   .79  .79   .93   .90   .90   .90  .89
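A sketch of the generating process of Eq. (7) and of the Spearman evaluation; the use of scipy here is our implementation choice:

```python
import numpy as np
from scipy.stats import spearmanr

def make_ranking_data(n=100, m=400, seed=0):
    """Target y ~ U(-1,1); features x_i = y + eps_i with
    eps_i ~ N(0, sigma_i), sigma_i spanning [0.01, 3] (Eq. 7).
    Features are then independently z-scored."""
    rng = np.random.default_rng(seed)
    y = rng.uniform(-1.0, 1.0, size=n)
    sigmas = np.linspace(0.01, 3.0, m)
    X = y[:, None] + rng.standard_normal((n, m)) * sigmas
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    return X, y, sigmas

# Evaluation: the ground-truth ranking follows -sigma (less noise
# means more relevant), e.g.  rho, _ = spearmanr(-sigmas, ratings)
```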
[Plot: percentage of correct features vs. number of selected features (5–50), for RSM and Elastic-Net (EN) with 200, 400, 800, and 1600 features]
Fig. 2. Percentage of correct top ranked features with increasing dimension of selected feature set
In particular, anytime a ranking is used for feature selection, the last step in the process is to decide the number of features to select. The first-ranked features are usually selected in the hope of finding the most relevant ones. Figure 2 shows how correctness varies with the number of selected features. An encouraging result is that RSM is quite stable considering both the dimension of the input space (good scalability) and the size of the selected feature sets (due to the overall good ranking).
5.3 Retrieval
Another important issue to consider is the retrieval capability. Assuming that only a limited number of features are related to the target, the task is to select the minimum number of features in order to find all relevant ones. This analysis was performed by generating a dataset similar to the one in Section 5.2, where only 20 features were related to the target, with increasing Gaussian noise (σ ∈ [0.01, 3]). All other features were generated with i.i.d. continuous uniform distributions U(−1, 1), again adding Gaussian white noise G(0, σ) with constant variance (σ = 0.1). Each feature was then independently z-scored. Table 2 shows the average size of the selected feature sets required by the two models for the retrieval of all 20 relevant features. These values are averaged over 10 experiments. The experiments were conducted with a constant number of training examples (100) and increasing dimensionality of the feature space. In spite of the increasing number of required features as the feature space grows, the set sizes are smaller than 10% for high-dimensional spaces. This result is quite promising, showing the ability of the two models to scale with the feature space dimensionality. Notice that RSM still outperforms Elastic-Net.
6 Case Study
As a real neuroimaging case study we considered a testbed dataset we acquired in our laboratories. It was specifically designed to evaluate multi-variate pattern
analysis techniques on well-understood cognitive tasks. During the fMRI acquisition, a volunteer was exposed to visual stimuli of two classes: faces, either plain or scrambled, and fixation, a kind of null or baseline stimulus. According to the best practice of cognitive neuroscience, recording was done with two different protocol designs. A first dataset was acquired using a block design protocol: the sequence of stimuli is organized into blocks of 20 seconds, where each block is made of stimuli of the same class, presented every 2 seconds. A second dataset was acquired using an event-related protocol: each face stimulus, lasting 2 seconds, is followed by 9 seconds of fixation. Each design protocol was organized into 5 chunks of stimuli, spaced out by a few minutes of rest. In the following we present results determined for the visual cognitive task, where the goal was to rank the voxels according to their relevance in predicting whether the volunteer was exposed to face or fixation stimuli. According to neuroscientific insights, the posterior visual cortex is expected to be most active. Data acquired with a block design provide a better signal-to-noise ratio, while data acquired with an event-related protocol are more challenging for brain mapping. The data were pre-processed following domain-specific standards: slice-time and movement correction, linear detrending, and z-scoring. The stimulation protocol was delayed by 4 seconds in order to align the stimuli, namely the class labels, with the expected brain hemodynamics. The brain was segmented from the skull, resulting in a dataset of 604 examples described by 62,190 features. To guarantee a stable solution, RSM was iterated much more than needed (an average of 400 times for each feature). Thresholding was performed manually by filtering the values according to the feature rates computed with RSM. The main purpose of the case study was to observe the cross-protocol reproducibility of the RSM results. Figure 3 depicts the brain maps determined for the two datasets, acquired with the block and the event-related protocols. The voxels are coloured according to their relevance: bright colours denote high correlation between a brain region and the cognitive task, while dark colours denote low correlation. A warm colour palette is used for positive correlations, and a cold colour palette for negative correlations. Despite the known variance of functional neuroimages acquired at different times, the two brain maps show a good degree of overlap. It is worth remarking that using Elastic-Net the degree of overlap in [3] was only 15%, a high variance probably exacerbated by the bias of the stopping criteria. Concluding, the brain maps were generated with sample sizes of variable dimension. Since we did not notice any great difference between the different maps, the results seem to confirm the stability of the brain maps with respect to the sample size parameter.

Table 2. Average minimum dimension allowed to retrieve all 20 relevant features

              100    200    400    800    1600
Elastic-Net   94.5   126.7  83.5   81.5   219.9
RSM           30.6   31.9   42.9   61.1   139.4
Fig. 3. The first row presents the lateral, top, and posterior section views of the sensitivity map determined with RSM on the data collected with the block design protocol. The second row shows the same views of the map generated on the data collected with the event-related design protocol.
7 Conclusions
We addressed the challenge of computing brain maps from functional neuroimages using a multivariate approach, casting this problem as a feature rating task. The redundancy preservation issue is a major concern for successfully deploying a computational method in the neuroimaging domain. While Elastic-Net represents a state-of-the-art solution for preserving redundant features, it suffers from some computational issues: scalability and parameter optimization. We argue that RSM may represent a more suitable solution for brain map computation. A theoretical contribution allows us to understand when the RSM method can be considered an effective solution for dealing with the redundancy issue. Empirical studies on synthetic data provide evidence for some properties of RSM, such as rank stability, retrieval capability, and feature redundancy preservation. A comparison with Elastic-Net shows the improved robustness of RSM with respect to scalability and the choice of parameters. The quantitative analysis has been extended by a qualitative analysis, including an application of RSM to a functional neuroimaging testbed dataset. Brain maps generated with RSM were stable across different runs and less sensitive to the dimensionality of the projected feature space.
References
[1] Biesiada, J., Duch, W., Kachel, A., Maczka, K., Pa, S., Palucha, S.: Feature ranking methods based on information entropy with Parzen windows. In: International Conference on Research in Electrotechnology and Applied Informatics (REI), pp. 109–119 (2005)
[2] Cai, R., Hao, Z., Wen, W.: A novel gene ranking algorithm based on random subspace method. In: 2007 International Joint Conference on Neural Networks, pp. 219–223. IEEE, Los Alamitos (2007)
[3] Carroll, M.K., Cecchi, G.A., Rish, I., Garg, R., Ravishankar Rao, A.: Prediction and interpretation of distributed neural activity with sparse models. NeuroImage 44(1), 112–122 (2009)
[4] Destrero, A., Mosci, S., Mol, C., Verri, A., Odone, F.: Feature selection for high-dimensional data. Computational Management Science 6(1), 25–40 (2009)
[5] Díaz-Uriarte, R., De Andrés, S.A.: Gene selection and classification of microarray data using random forest. BMC Bioinformatics 7(3) (2006)
[6] Ho, T.K.: The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(8), 832–844 (1998)
[7] Horton, M., Cameron-Jones, M., Williams, R.: Virtual attribute subsetting. In: Australian Joint Conference on Artificial Intelligence, pp. 214–223. Springer, Heidelberg (2006)
[8] Jong, K., Mary, J., Cornuéjols, A., Cornu, A., Marchiori, E., Sebag, M.: Ensemble feature ranking. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) PKDD 2004. LNCS (LNAI), vol. 3202, pp. 267–278. Springer, Heidelberg (2004)
[9] Lai, C., Reinders, M.J.T., Wessels, L.: Random subspace method for multivariate feature selection. Pattern Recognition Letters 27(10), 1067–1076 (July 2006)
[10] Liu, H., Motoda, H. (eds.): Computational Methods of Feature Selection, 1st edn. Data Mining and Knowledge Discovery. Chapman & Hall/CRC (October 2007)
[11] Norman, K., Polyn, S., Detre, G., Haxby, J.: Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences 10(9), 424–430 (2006)
[12] O'Sullivan, J., Langford, J., Caruana, R., Blum, A.: FeatureBoost: a meta learning algorithm that improves model robustness. In: International Conference on Machine Learning (ICML), pp. 703–710. Morgan Kaufmann Publishers Inc., San Francisco (2000)
[13] Saeys, Y., Abeel, T., Peer, Y.: Robust feature selection using ensemble feature selection techniques. In: European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD), Antwerp, Belgium, pp. 313–325. Springer, Heidelberg (2008)
[14] Skurichina, M., Duin, R.P.W.: Bagging, boosting and the random subspace method for linear classifiers. Pattern Analysis & Applications 5(2), 121–135 (June 2002)
[15] Sutton, C., Sindelar, M., McCallum, A.: Feature bagging: preventing weight undertraining in structured discriminative learning (2005)
[16] Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58(1), 267–288 (1996)
[17] Yu, L., Ding, C., Loscalzo, S.: Stable feature selection via dense feature groups. In: International Conference on Knowledge Discovery and Data Mining (KDD), pp. 803–811. ACM Press, New York (2008)
[18] Zou, H., Hastie, T.: Regularization and variable selection via the Elastic Net. Journal of the Royal Statistical Society B 67, 301–320 (2005)
Recurrence Plots for Identifying Memory Components in Single-Trial EEGs

Nasibeh Talebi¹,* and Ali Motie Nasrabadi²

¹ Department of Biomedical Engineering, Faculty of Engineering, Shahed University, Tehran, Iran
Tel.: +98-21-51212075, Fax: +98-21-51213167
[email protected]
² Department of Biomedical Engineering, Faculty of Engineering, Shahed University, Tehran, Iran
[email protected]
Abstract. The purpose of this study was to apply recurrence plots and recurrence quantification analysis (RQA) to event-related potentials (ERPs) recorded during memory recognition tests. Data recorded during memory retrieval in four scalp regions were used. The two most important ERP components corresponding to memory retrieval, FN400 and LPC, were detected in recurrence plots computed for single-trial EEGs. In addition, RQA was used to quantify changes in the signal's dynamical structure during memory retrieval, and measures of complexity were computed as RQA variables. Given the stimulus, the amplitude of the RQA variables increases around 400 ms, corresponding to a dimension reduction of the system. Furthermore, after 800 ms these amplitudes decreased, which may reflect an increase in system dimension as it returns to its basic state. The mean amplitude for Old items was greater than for New items. Using this method, we demonstrated its ability to detect memory components in EEG signals and to distinguish between Old and New items. In contrast with linear techniques, recurrence plots and RQA do not need a large number of recorded trials, and they can indicate changes even in single-trial EEGs. RQA can also show differences between old and new events in a memory process.
1 Introduction

Electroencephalography (EEG) is a valuable noninvasive tool for measuring small-scale changes in the brain's electric field via electrodes placed on the scalp. EEG's high temporal resolution is its main advantage, which allows studying changes in the brain's electric field over time. Specific patterns of electrical change are known to correspond to cognitive processes. These typical patterns are called event-related potentials (ERPs) and can provide good insights into information processing in the brain [1]. For example, many studies on recognition memory using ERP signals have shown that ERPs related to "Old" items (items that have already been studied), when correctly judged,
* Corresponding author.
tend to be more positive in amplitude than the responses to "New" items (items that have not previously been studied), commonly referred to as old/new effects [2, 3]. Two of these recognition memory components that are often observed are the early frontal old/new effect (FN400) and a later parietal old/new effect (LPC). Previous research has suggested that these potentials can be an index of separate familiarity and recollection processes. Usually, familiarity is defined as a sense of prior exposure lacking contextual details, while recollection is thought to retrieve details about a previously studied event. According to dual-process theories of recognition memory (which claim that familiarity and recollection are separate processes), it is suggested that an early (∼300–500 ms) frontal component, FN400, is correlated with familiarity, while a later (∼500–800 ms) parietal component, LPC, appears to be modulated by recollection [4, 5, 6, 7]. Spontaneous EEG acts as noise for ERPs, and since the amplitude of an ERP component is very small compared to the background noise, a common approach is the use of grand averages, i.e., a large number of measurements is averaged in order to improve the signal-to-noise ratio (SNR). While this procedure in general highlights the component of interest, it may not be an appropriate means of analysis. Averaging gives good results if there are synchronous samples to average, but in biological signals the latency of the components of interest, especially those related to higher cognitive processes, may vary in time, and averaging smooths the signal [8]. Furthermore, ERPs are defined relative to a baseline. The baseline is taken to be a period of inactivity, most commonly calculated from the pre-stimulus interval and subtracted from the original signal, so pre-stimulus activity affects ERPs. Another disadvantage of the averaging method is the high number of trials needed to improve the signal-to-noise ratio [9]. This is crucial, for example, for clinical studies, for studies with children, and for studies where repeating a task would influence performance. Additionally, there are several high-frequency structures in EEG reflecting other mental activities, which are filtered out by averaging. So it is desirable to find new ways of analyzing event-related activity on a single-trial basis. In addition, neurons are known to be nonlinear devices because they become activated when their somatic membrane potential crosses a certain threshold [8]. This nonlinearity is one of the essentials in neural modeling, which leads to the sigmoidal activation functions of neural networks [10]. The activity of large formations of neurons is macroscopically measurable as the electroencephalogram at the human scalp, which results from a spatial integration of postsynaptic potentials [11]. However, in data processing it is an unsolved problem whether the EEG should be treated as a time series stemming from a linear system or a nonlinear dynamical system, and whether linear or nonlinear analysis methods are more suitable [12]. Applying nonlinear techniques of data analysis to EEG measurements has a long tradition. Most of these efforts have computed the correlation dimension of spontaneous EEG (e.g. [13, 14, 15, 16, 17]). While correlation dimensions are only well defined for stationary time series generated by a low-dimensional dynamical system moving around an attractor, these measures fail for event-related brain potentials (ERPs) [18], since they are non-stationary
by definition (time-dependent changes of EEG power in different frequency bands [19], which is modeled by allowing the variances of the driving noises to change with time [20]). A newer nonlinear analysis method is based on recurrence plots (RPs). The main advantage of this method is that it can be used for non-stationary signals. Thomasson used RPs on EEGs to predict seizures [21]. In 2004, Marwan et al. applied extended recurrence quantification analysis (RQA) to physiological ERPs and detected transients in EEG signals around 300 ms, corresponding to the P300 component [22]; order pattern recurrence plots by Schinkel et al. [23] are another example of data processing by this method. It seems that recurrence plots can be used as a nonlinear approach to analyze ERPs recorded during memory recognition tests. In addition to visualizing transients in the signal caused by an external stimulus, the RQA measures quantify changes in brain dynamical structures during memory phenomena. In this article, the first section gives a short introduction to RPs and RQA measures. In the next section, this method is applied to single-trial EEGs recorded during a memory recognition test. The RQA measures are computed for these signals, and finally these measures are compared between the two conditions, Old and New items.
2 Methods and Materials

2.1 Recurrence Plots for Analyzing Data

Suppose that there is a trajectory $\{\vec{x}_i\}_{i=1}^{N}$ of a system in its phase space [24]. The corresponding RP is based on the following recurrence matrix:

$$R_{i,j} = \begin{cases} 1: & \vec{x}_i \approx \vec{x}_j \\ 0: & \vec{x}_i \not\approx \vec{x}_j \end{cases}, \qquad i, j = 1, \dots, N \quad (1)$$
where N is the number of considered states and $\approx$ means equality up to an error (or distance) ε. Usually the phase space has to be reconstructed from the original one-dimensional time series [25, 26]. A frequently used method for the reconstruction is the time-delay method:

$$\vec{x}_i = \sum_{j=1}^{m} u_{i+(j-1)\tau}\, \vec{e}_j \quad (2)$$
where m is the embedding dimension and τ is the time delay. For the analysis of a time series, both embedding parameters, the dimension m and the delay τ, have to be chosen appropriately. In a recurrence plot there are three small-scale structures: single points, which can occur if states are rare; a diagonal line of length l, which occurs when a segment of the trajectory runs almost in parallel to another segment (i.e., through an ε-tube around the other segment); and a vertical (horizontal) line of length v, which marks a time interval in which a state does not change or changes very slowly. In order to go beyond the visual impression yielded by RPs, several measures of complexity which quantify the small-scale structures in RPs have been proposed in [12, 27, 28] and are known as recurrence quantification analysis (RQA). These measures are based on the recurrence point density (recurrence rate (RR)), the diagonal line structures (determinism (DET), average diagonal line length (L), longest diagonal line (Lmax), and entropy (ENTR)), and the vertical line structures (laminarity (LAM), trapping time (TT), and maximal length of the vertical lines (Vmax)) of the RP. A computation of these measures in small windows (sub-matrices) of the RP moving along the LOI¹ yields the time-dependent behavior of these variables.
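As a concrete illustration of these constructions, the following minimal Python/NumPy sketch (not the authors' code; the function names and the inclusion of the LOI in DET are our simplifications) builds the time-delay embedding of Eq. (2), the recurrence matrix of Eq. (1), and two basic RQA measures, RR and DET:

```python
import numpy as np

def embed(u, m, tau):
    """Time-delay embedding (Eq. 2): each phase-space vector collects
    m samples of the scalar series u, spaced tau samples apart."""
    n = len(u) - (m - 1) * tau
    return np.column_stack([u[j * tau : j * tau + n] for j in range(m)])

def recurrence_matrix(x, eps):
    """Recurrence matrix (Eq. 1): R[i, j] = 1 if states i and j are
    closer than the threshold eps, else 0."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return (d <= eps).astype(int)

def rqa_measures(R, l_min=4):
    """Recurrence rate (density of recurrence points) and determinism
    (fraction of recurrence points on diagonals of length >= l_min).
    A full implementation would exclude the LOI via a Theiler window."""
    N = len(R)
    rr = R.sum() / N**2
    diag_points = 0
    for k in range(-(N - 1), N):                    # scan every diagonal of R
        run = 0
        for v in np.append(np.diagonal(R, k), 0):   # sentinel 0 ends last run
            if v:
                run += 1
            else:
                if run >= l_min:
                    diag_points += run
                run = 0
    det = diag_points / R.sum() if R.sum() else 0.0
    return rr, det
```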
2.2 Database
The data used in this article have previously been examined and were kindly provided by Tim Curran et al. (University of Colorado, 2006 [7]). Subjects studied 120 words. About 70 min later, ERPs were recorded during a memory recognition test. The data came from 15 students of the University of Colorado (age: mean 21 years; range 18–21 years). All subjects were right-handed, native English speakers. Stimuli were 480 low-frequency (mean 1.26; range 1–2 counts per million), four- to seven-letter English words. The words were divided randomly into four 120-word sets. During the recognition memory task, scalp voltages were collected with a 128-channel EEG system. Amplified analog voltages (0.1–100 Hz band pass) were digitized at 250 Hz. The EEG was digitally low-pass filtered at 40 Hz. Trials were discarded from the analyses if they contained incorrect responses, eye movements (electrooculogram over 70 μV), or more than 20% bad channels (average amplitude over 100 μV or transit amplitude over 50 μV). EEG was measured with respect to a vertex reference (Cz), but an average-reference transformation was used to minimize the effects of reference-site activity and accurately estimate the scalp topography of the measured electrical fields. According to previous studies [13-17], the FN400 component has a frontocentral distribution, whereas the LPC appears stronger over parietal sites. Four special regions of scalp electrodes were therefore selected (Fig. 1).
Fig. 1. Region locations are shown by arrows
¹ Line of Identity (LOI): the main diagonal of the RP, $R_{i,i} = 1$, $i = 1, \dots, N$.
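The trial-rejection rule of Sec. 2.2 can be made concrete with a small sketch (our illustration, not the original preprocessing code; the per-channel badness test is a simplifying assumption):

```python
import numpy as np

def reject_trial(eeg_uv, eog_uv, correct, v_eog=70.0, v_avg=100.0, bad_frac=0.2):
    """Discard a trial if the response was wrong, the EOG shows an eye
    movement (> 70 uV excursion), or more than 20% of channels are bad
    (average amplitude > 100 uV). eeg_uv: (channels, samples) in microvolts."""
    if not correct:
        return True
    if np.ptp(eog_uv) > v_eog:                   # eye-movement check
        return True
    bad = np.abs(eeg_uv).mean(axis=1) > v_avg    # per-channel average amplitude
    return bad.mean() > bad_frac
```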
Fig. 2. Correlation dimension as a function of embedding dimension, computed for a 1-second EEG segment
2.3 Data Analysis

To perform recurrence quantification analysis, the signal reconstructed in phase space is required. A frequently used method for the reconstruction is the time-delay method: the initial signal is sampled and embedded in a new phase space. Both embedding parameters, the dimension m and the delay τ, have to be chosen appropriately. According to previous work, the time delay τ is proportional to the signal's autocorrelation time. The correlation dimension and false nearest neighbors are two common approaches to estimate the smallest sufficient embedding dimension. In this study the correlation dimension was computed for the data; approximate saturation appears after m = 4 (Fig. 2). The false nearest neighbor method also returns m = 4. Since the data used in this study are short (only 275 points long) and the time-delay method is appropriate for long signals, we preferred to reconstruct the trajectories in phase space directly from the signals. To do this and build a four-dimensional vector, we took one electrode from each region (LAS, RAS, LPS and RPS), respectively. Measures of complexity were computed using a moving window along the LOI, which yielded the time-dependent behavior of these variables. For every trial this procedure was repeated seven times, equal to the number of electrodes located in each region. As mentioned before, computing these measures depends on several parameters, the most important of which is the threshold ε. A common method, used in this article, is to choose ε as 10% of the maximum phase-space diameter. In addition, the parameters lmin, vmin, the size of the moving window w, and the Theiler window size [29] should be selected properly. In this study we used vmin = lmin = 4, moving window size w = 50 (200 ms), and Theiler window size 2.
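Reusing the helpers from the sketch in Sec. 2.1, the windowed analysis with the parameter choices above might look as follows (the trajectory x is a placeholder; in practice it holds one electrode per region):

```python
import numpy as np

x = np.random.randn(275, 4)          # placeholder trajectory: 275 samples,
                                     # one electrode per region (LAS, RAS, LPS, RPS)

d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
eps = 0.1 * d.max()                  # threshold: 10% of max phase-space diameter
R = (d <= eps).astype(int)

w = 50                               # window of 50 samples = 200 ms at 250 Hz
rr_t, det_t = [], []
for t0 in range(len(R) - w):         # slide the window along the LOI
    rr, det = rqa_measures(R[t0:t0 + w, t0:t0 + w], l_min=4)
    rr_t.append(rr)
    det_t.append(det)
```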
3 Results

Our aim is to study single trials in order to find transitions in the brain processes as a consequence of a memory-related stimulus. Recurrence plots were computed for single-trial EEGs. Due to the FN400 and the LPC components in the data, the RPs show varying structures changing in time (Figs. 3, 4). Diagonal structures and clusters of black points occur. The non-stationarity of the data around the FN400 and LPC causes
Fig. 3. Single-trial EEG and corresponding RPs: (a) old/new single-trial EEG in the frontal region (the old EEG is drawn as a black solid line and the new as a blue dotted line); (b) old item, (c) new item. White bands resulting from the FN400, and clustered black points around 400 ms, are noticeable in the RPs.
extended white bands at these times in the RPs. Moreover, the clustered black points around 400 ms (Fig. 3(a), 3(b)) and 800 ms (Fig. 4(a), 4(b)) occur in almost all RPs, prominent in the frontal and parietal regions, respectively. The measures of complexity of these single-trial ERPs indicate FN400 and LPC components resulting from a memory-related event, whereas these components are not so obvious in the raw EEGs, where averaging is needed to detect changes. As mentioned before, recurrence quantification analysis can show and quantify changes in dynamical structure, which makes it an appropriate method to analyze EEG data. The RQA was computed from the RPs of Old and New ERPs for single trials, in sliding windows over the RPs (which have the dimension m = 4) with a length of 200 ms and a shifting step of 4 ms. The means of the seven electrodes' RQA variables of the ERPs reveal typical structures in the data (Fig. 5). They indicate the transitions corresponding to the FN400 and LPC components in the frontal and parietal regions. These transients are visible in the RQA variables for both Old and New ERPs (Fig. 5). The onset of the increase of the parameters is about 200 ms before the event. This is due to the windowed analysis of the RPs (200 ms windows); we have chosen the beginning of the RP window as the time reference. In fact, 400 ms after stimulation the RQA variables start to increase, reflecting a reduction in the system's dimension and complexity. These amplitudes lessen after 800 ms, corresponding to an increase in the system's dimension as it returns to its basic state [30]. Additionally, the variables belonging to Old items are greater than those from New ones. This means that when
Fig. 4. Single-trial EEG and corresponding RPs: (a) old/new single-trial EEG in the parietal region (the old EEG is drawn as a black solid line and the new as a blue dotted line); (b) old item, (c) new item. White bands resulting from the LPC, and clustered black points around 800 ms, are noticeable in the RPs.
Fig. 5. The RQA measures computed for Old/New single-trial EEGs. Old variables (black solid lines) are greater than New ones (blue dashed lines).
a subject correctly remembers an old item studied before, the brain's dimension decreases more than when the item was not studied. These differences reveal the RQA's capability to discriminate between Old and New items, a distinction that is not obvious in raw EEGs without averaging.
4 Discussion

We have applied recurrence plots and recurrence quantification analysis (RQA) to physiological event-related potential (ERP) data. Our aim was to study single trials in order to find transitions in the brain processes as a consequence of a memory-related stimulus. Due to the FN400 and the LPC components in the data, the RPs show varying structures changing in time. Diagonal structures and clusters of black points occur. The non-stationarity of the data around the FN400 and LPC causes extended white bands at these times in the RPs. Moreover, the clustered black points around 400 ms and 800 ms occur in almost all RPs, prominent in the frontal and parietal regions, respectively. The RQA measures that are mainly based on diagonal structures in the recurrence plots (the recurrence rate (RR), a measure of the density of recurrence points; determinism (DET), the ratio of recurrence points located on connected diagonal structures in the RP; the average diagonal line length (L); the maximum diagonal line length (Lmax); and the entropy (ENTR), which reflects the complexity of the RP with respect to the diagonal lines) enable the identification of period-chaos transitions. Furthermore, the RQA measures based on vertical structures (the laminarity (LAM), the trapping time (TT), and the maximum vertical line length (Vmax)), which are defined analogously to the measures based on diagonal structures, make the identification of chaos-chaos transitions and laminar states possible. The variables belonging to Old items are greater than those from New ones, meaning that when a subject correctly remembers an old item studied before, the brain's dimension decreases more than when the item was not studied. These differences reveal the RQA's capability.
Acknowledgment
The authors are grateful to Tim Curran at the University of Colorado for providing the EEG data of the memory recognition test.
References
[1] Donchin, E., Ritter, W., McCallum, C.: Cognitive psychophysiology: the endogenous components of the ERP. In: Callaway, E., Tueting, P., Koslow, S. (eds.) Event-related potentials in man, pp. 349–441. Academic Press, New York (1978)
[2] Friedman, D., Johnson Jr., R.: Event-related potential (ERP) studies of memory encoding and retrieval: a selective review. Microsc. Res. Tech. 51, 6–28 (2000)
[3] Rugg, M.D., Allan, K.: Memory retrieval: an electrophysiological perspective. In: The new cognitive neurosciences, 2nd edn., pp. 805–816. MIT Press, Cambridge (2000)
[4] Jacoby, L.L.: A process dissociation framework: separating automatic from intentional uses of memory. J. Mem. Lang. 30, 513–541 (1991)
[5] Reder, L.M., Nhouyvanisvong, A., Schunn, C.D., Ayers, M.S., Angstadt, P., Hiraki, K.: A mechanistic account of the mirror effect for word frequency: a computational model of remember-know judgments in a continuous recognition paradigm. Exp. Psychol. Learn. Mem. Cogn. 26, 294–320 (2000)
[6] Yonelinas, A.P.: The nature of recollection and familiarity: a review of 30 years of research. J. Mem. Lang. 46, 441–517 (2002)
[7] Curran, T., DeBuse, C., Woroch, B., Hirshman, E.: Combined pharmacological and electrophysiological dissociation of familiarity and recollection. The Journal of Neuroscience 26(7) (2006)
[8] Kandel, E.R., Schwartz, J.H., Jessel, T.M.: Essentials of Neural Science and Behavior. Appleton & Lange, East Norwalk, Connecticut (1995)
[9] Kutas, M., van Petten, C.: Psycholinguistics electrified: event related potential investigations. In: Gernsbacher, M.A. (ed.) Handbook of psycholinguistics, pp. 83–143. Academic Press, San Diego (1994)
[10] Amit, D.J.: Modeling Brain Function. The World of Attractor Neural Networks. Cambridge University Press, Cambridge (1989)
[11] Nunez, P.L.: Electric Fields of the Brain. Oxford University Press, NY (1981)
[12] Theiler, J., Eubank, S., Longtin, A., Galdrikian, B., Farmer, J.D.: Testing for nonlinearity in time series: the method of surrogate data. Physica D 58, 77–94 (1992)
[13] Babloyantz, A., Salazar, J.M., Nicolis, C.: Evidence of chaotic dynamics of brain activity during the sleep cycle. Phys. Lett. A 111, 152–156 (1985)
[14] Gallez, D., Babloyantz, A.: Predictability of human EEG: A dynamical approach. Biol. Cybern. 64, 381–391 (1991)
[15] Rapp, P.E., Zimmerman, I.D., Albano, A.M., de Guzman, G.C., Greenbaun, N.N., Bashore, T.R.: Experimental studies of chaotic neural behavior: Cellular activity and electroencephalographic signals. In: Othmer, H.G. (ed.) Nonlinear Oscillations in Biology and Chemistry. Lecture Notes in Biomathematics, vol. 66, pp. 175–205. Springer, Berlin (1986)
[16] Lutzenberger, W., Elbert, T., Birbaumer, N., Ray, W.J., Schupp, H.: The scalp distribution of the fractal dimension of the EEG and its variation with mental tasks. Brain Topogr. 5, 27–33 (1992)
[17] Pritchard, W.S., Duke, D.W.: Dimensional analysis of no-task human EEG using the Grassberger-Procaccia method. Psychophysiol. 29, 182–191 (1992)
[18] Sutton, S., Braren, M., Zubin, J., John, E.R.: Evoked potential correlates of stimulus uncertainty. Science 150, 1187–1188 (1965)
[19] Wong, K.F.K., Galka, A., Yamashita, O., Ozaki, T.: Modelling non-stationary variance in EEG time series by state space GARCH model. Computers in Biology and Medicine (2005)
[20] Wong, K.F.K.: Modelling non-stationary variance in EEG time series by state space GARCH model
[21] Thomasson, N., Hoeppner, T.J., Webber Jr., C.L., Zbilut, J.P.: Recurrence quantification in epileptic EEGs. Phys. Lett. A 279(1-2), 94–101 (2001)
[22] Marwan, N., Meinke, A.: Extended recurrence plot analysis and its application to ERP data. Int. J. Bifur. Chaos (Cognition and Complex Brain Dynamics) 14, 761–771 (2004)
[23] Schinkel, S., Marwan, N., Kurths, J.: Order patterns recurrence plots in the analysis of ERP data. Cogn. Neurodyn. 1, 317–325 (2007)
[24] Eckmann, J.-P., Kamphorst, S.O., Ruelle, D.: Recurrence plots of dynamical systems. Europhys. Lett. 4, 973–977 (1987)
[25] Takens, F.: Detecting strange attractors in turbulence. In: Rand, D., Young, L.-S. (eds.) Dynamical Systems and Turbulence. Lecture Notes in Mathematics, vol. 898, pp. 366–381. Springer, Berlin (1981)
[26] Packard, N.H., Crutchfield, J.P., Farmer, J.D., Shaw, R.S.: Geometry from a time series. Phys. Rev. Lett. 45, 712–716 (1980)
[27] Webber Jr., C.L., Zbilut, J.P.: Dynamical assessment of physiological systems and states using recurrence plot strategies. J. Appl. Physiol. 76, 956–973 (1994)
[28] Marwan, N., Wessel, N., Meyerfeldt, U., Schirdewan, A., Kurths, J.: Recurrence plot based measures of complexity and its application to heart rate variability data. Phys. Rev. E 66(2) (2002)
[29] Theiler, J.: Spurious dimension from correlation algorithms applied to limited time-series data. Phys. Rev. A 34, 2427–2432 (1986)
[30] Kozma, R., Freeman, W.J., Erdi, P.: The KIV model – nonlinear spatio-temporal dynamics of the primordial vertebrate forebrain. Neurocomputing 52-54, 819–826 (2003)
Comparing EEG/ERP-Like and fMRI-Like Techniques for Reading Machine Thoughts

Fabio Massimo Zanzotto and Danilo Croce
University of Rome Tor Vergata, 00133 Roma, Italy
{zanzotto,croce}@info.uniroma2.it

Abstract. fMRI and ERP/EEG are two different sources of brain scans for building mind state decoders. fMRI produces accurate images, but it is expensive and cumbersome. ERP/EEG is cheaper and potentially wearable, but it gives more coarse-grained data. Recently, the metaphor between machines and brains has been introduced in the context of mind state decoders: the "readers for machines' thoughts". This metaphor makes it possible to compare mind state decoding methods in a more controlled setting. In this paper, we compare fMRI and ERP/EEG in the context of building "readers for machines' thoughts". We want to assess whether the cheaper ERP/EEG can be competitive with fMRI models for building decoders for mind states. Experiments show that the accuracy of "readers" based on ERP/EEG-like data is considerably lower than that of readers based on fMRI-like images.
1 Introduction
Mining brain-related data for decoding mind states is a fascinating and active area of research. Two major and different sources of brain-related data are considered in these studies: functional magnetic resonance imaging (fMRI) and electroencephalography for detecting event-related potentials (ERP/EEG). Brain images obtained with fMRI have been largely used in combination with machine learning for building decoders for mind states. fMRI and support vector machines [1] have been used to build decoders that determine whether participants are looking at: (1) shoes or bottles [2]; (2) human faces or objects [3]. In [4], fMRI-based mind state decoders have been successfully learnt and applied for determining whether subjects were (1) looking at pictures or sentences, (2) reading an ambiguous or non-ambiguous sentence, and (3) looking at words describing food, people, buildings, etc. fMRI-related information has also been used for building classifiers that can predict the perceived complexity of a simplified Sudoku schema [5]. Finally, corpus linguistics has been used to induce novel brain activation images for unobserved semantic categories from observed brain activation images [6]. ERP/EEG brain data have also been used for the same purposes: ERP/EEG data are used to explore subjects performing super-ordinate categorization between natural and artifact objects [7,8]. Similarly to [6], corpus linguistics
has been used to determine and classify ERP/EEG activation data for unseen semantic categories [9]. fMRI and ERP/EEG are thus two alternative models for investigating and collecting data on the activities of human brains. Both models have been used to induce mind state decoders from data. They are different models with different strengths and limits. In general, fMRI is used to extract detailed brain activation images representing snapshots of activation states [4,6]. In some cases, time-related series are extracted for specific voxels of the brain [2,3,5], giving the possibility of including state changes in the investigation. Yet, fMRI is still an expensive and cumbersome technique. ERP/EEG is instead relatively cheap and possibly wearable, but it gives more coarse-grained data. These data represent the electrical activation of specific spots in the brain (e.g., fewer than 100 spots in the 64-channel Geodesic Sensor Net [10]). This technique naturally generates time series for the electrical activation of specific voxels. In this paper, we want to compare fMRI and ERP/EEG in order to assess whether the cheaper ERP/EEG can be competitive with fMRI for building decoders for mind states. Using the metaphor and the method introduced in [11], we compare these two models in a more controlled setting: the context of building a "reader for machines' thoughts". We propose an fMRI-like method and an ERP/EEG-like method to extract data from "machine brains" performing three different "cognitive tasks". We then compare the resulting "readers for machines' thoughts" based on fMRI-like and ERP/EEG-like data. This more controlled setting gives a better possibility to compare the two methods. Experiments show that there is a consistent drop in the accuracy of detecting the "cognitive tasks" when using "readers for machines' thoughts" based on ERP/EEG-like data. The rest of the paper is organized as follows. First, we describe the aim of the experiments in Section 2. We introduce the investigation method in Section 3. Then, we report the results of the experiments in Section 4. Finally, we draw some conclusions and plan future work in Section 5.
2 Aim of the Experiments
The aim of the experiments is to compare fMRI and EEG/ERP for investigating human brains, using the metaphor of "reading machines' thoughts". fMRI and EEG/ERP methods investigate the human brain in extremely different ways. Generally, the first takes snapshots of brain activations, producing brain images. The second, instead, records the evolution of the electrical activity in specific spots of the brain. Thus, fMRI methods have high resolution but no time evolution, while EEG/ERP methods report the time evolution of brain activation at low resolution, i.e., the set of electric sensors is small. We want to compare these two alternative methods. EEG/ERP methods are less expensive and more practical than fMRI methods. We therefore want to investigate whether the resolution reduction in EEG/ERP is balanced by the addition of the time dimension.
3 Method
For comparing the EEG/ERP and fMRI investigation methods within the metaphor of "reading machines' thoughts", we need to build two classifiers of machine activation images that detect the performed "cognitive task": the first based on fMRI-like images and the second based on EEG/ERP-like data. In the rest of this section, we first describe the "subjects" of our study, i.e., the machines (Sec. 3.1). Second, we introduce the three different "cognitive tasks" that the "readers of machines' thoughts" have to recognize (Sec. 3.2); this produces a corpus of different observations that can be used for the later experiments. We describe how we can extract fMRI-like (Sec. 3.3) and EEG/ERP-like (Sec. 3.4) activation images for the three different machines, and we introduce the features that will be extracted from these images (Sec. 3.5). Finally, we shortly describe the machine learning methods used to build the classifiers implementing the "readers of machines' thoughts" (Sec. 3.6).
3.1 Machines
We selected three different kinds of machines for our comparative experiments. These machines are indeed different "individuals", as they represent different levels of abstraction. We selected a physical machine, an interpreted machine, and a virtual machine; the second and the third run on the first. We hereafter describe these machines as if they were the "subjects" of our experiments. Physical machine. The physical machine we consider is a Von Neumann machine organized with a central processing unit and a memory. This machine runs a Linux operating system. "Cognitive tasks" are represented as programs of this machine working on particular input data. For our experiments, we use programs originally written in the C language and compiled for the hosting Linux machine. We observe the cognitive task by looking at the process executing the program with a predefined input. Observing this process, we can see the compiled program, the ancillary libraries, the process control block, the heap area, the stack area, and the data area. This area does not include any other additional information. Virtual machine. A virtual machine is an abstract machine that runs over a physical machine. This machine is a software program that runs on a real hardware platform with an operating system. Yet, this machine has all the features of a real machine. First, it runs programs written in its particular language, which is generally a bytecode. Programs written in upper-level languages are then compiled (i.e., translated) into the particular machine language. As for any physical machine, the stream of bytecodes is a sequence of instructions: each instruction has one operation code and a possibly empty sequence of operands. The operation code tells the machine which actions to undertake on the possible operands. A virtual machine has all the parts of a real machine: a computing part and a
memory part. The memory part is organized according to the specific virtual machine. For our experiments, we use a Java virtual machine (JVM). This machine runs compiled Java programs. The memory of this machine is basically divided into four parts: the registers, the stack, the garbage-collected heap, and the method area. All these parts are virtual but implemented somewhere on top of the real machine. We observe Java virtual machines running only one compiled program; this is the observed "cognitive activity". We monitor this virtual machine by observing the memory of the process that hosts the Java virtual machine running a compiled program. This memory is larger and more complex than that of the process representing the "cognitive activity" of the physical machine (cf. Sec. 3.1). It contains: the Java virtual machine compiled in the language of the hosting physical machine, the compiled Java code of the program, the state of the JVM registers, and the state of the memory of the Java virtual machine (the stack, the heap, and the method area). All this information is not so clearly related to the performed "cognitive process", i.e., the algorithm. Interpreted machine. An interpreter is a machine that executes programs written in a high-level language. Like a virtual machine, this interpreted machine runs on a physical machine. The major difference with respect to virtual machines is that the program is not compiled in advance into a machine language but is used in its original form. At running time, the interpreter translates high-level instructions into an intermediate form, which it then executes. For our experiments, we use a PHP interpreter. This interpreter runs programs written in PHP, which are represented in the memory of the process; there is no clear distinction between different types of memory. As for the above virtual machine, we monitor the activity of the interpreter performing a "cognitive task" by observing its process while executing a specific program. We then observe: the interpreter compiled in the language of the hosting physical machine, the PHP program, and the state of the working memory.
3.2 Materials
We designed three different "cognitive tasks", i.e., three programs, for the three different machines. We selected three different classes of programs working on different data structures: vectors of numbers, strings, and binary trees. The three algorithms are:
– a sorting algorithm, performed by means of quick sort
– an edit distance algorithm, performed by means of the classical edit distance computation
– a tree visit algorithm, performed as a depth-first search
We assume that these three are the classes of cognitive tasks we want to recognize. We want to perform the classification of the activation images at this level. Different instances of these classes of algorithms are given by different associated data. An instance of the class of sorting algorithms is a sorting program executed with a specific input vector.
For each of the three classes, we randomly prepared 40 input data: 40 vectors of 105 integers, 40 pairs of strings, and 40 binary trees. Each pair (algorithm, data) is a different "cognitive task". For the three different machines (the physical machine, the virtual machine, and the interpreted machine), we then have 120 different "cognitive tasks" grouped in three classes. We ran the 120 different "cognitive tasks" on the three machines, obtaining our samples. We then captured the activation images of the different "cognitive tasks" performed on the three different machines.
3.3 fMRI-Like Activation Capturing
To simulate fMRI activation capturing, we snapshot the machines performing the cognitive task at a given stage of execution. This is a more controlled setting with respect to what happens in real brain imaging by fMRI: we know exactly what we are taking snapshots of and which stage of the program is being executed. Given a cognitive activity (i.e., a pair (algorithm, data)), the procedure for extracting images from this activity is the following:
– run the process p representing the activity, i.e., the program and the related input data
– stop the process p at given states or at given time intervals τ
– dump the memory associated with the process, M(p)
– given a fixed image height and the memory dump, read the bytes of the memory dump incrementally and fill the associated RGB pixels with the read values, obtaining the image I(p)
This simple procedure can produce several images for each process related to a cognitive activity. The process memory is transformed into an image as follows. Let M(p) be the memory dump of the process p. The memory dump is a sequence of bytes, i.e., $M(p) = [b_0, \dots, b_m]$. Using this sequence of bytes we can produce an image I(p) in Red-Green-Blue (RGB) coding. The image I(p) is a bi-dimensional array of pixels $p_{i,j}$; each pixel consists of three contiguous bytes of the memory image. Given the height h of the image, each RGB pixel has the following values:

$$p_{i,j} = [\, b_{3(i+h\cdot j)} \;\; b_{3(i+h\cdot j)+1} \;\; b_{3(i+h\cdot j)+2} \,] \quad (1)$$
where the first byte, $b_{3(i+h\cdot j)}$, is used for the red component, the second, $b_{3(i+h\cdot j)+1}$, for the green one, and the third, $b_{3(i+h\cdot j)+2}$, for the blue one. The activation images we obtain are reported in Figure 1. The original images have a fixed height of 768 pixels. For display purposes, the width of the images has also been resized to a fixed value. Yet, this does not reflect reality, as the virtual machine's and the interpreter's pictures are much bigger than the physical machine's. These images represent how the memory of the three different machines reacts to the "cognitive task" of sorting a vector with the quick sort algorithm. The snapshots are taken in the middle of the computation. Looking at the data area of the "cognitive process" performed by the physical
Fig. 1. Sorting process in the three different machines (fMRI-like snapshots): physical machine (C), virtual machine (Java), interpreted machine (PHP)
machine, we can observe that half of the vector forms a recurrent pattern while the other half has a completely random pattern. The images in Figure 1 clearly represent the commonalities and the differences of the three machines. A commonality is that the sizes of the program code area and the data area are extremely different: the data area is bigger than the code area. The differences are also clear. The virtual machine allocates a big chunk of memory in advance even if this memory is not used; this is the black part of the related image. This behavior is explained by the fact that the virtual machine is completely represented in memory at the beginning of the process: its virtual memory is completely allocated. The second difference is that the physical machine is the only one whose process contains only the code area; in this case, the process representing the cognitive activity does not include a representation of the machine.
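A minimal Python sketch of the dump-to-image packing of Eq. (1) might look as follows (this is our illustration, not the authors' implementation; the function name is hypothetical):

```python
import numpy as np

def dump_to_image(dump: bytes, h: int = 768) -> np.ndarray:
    """Pack a raw memory dump into an RGB image of fixed height h:
    every three consecutive bytes become one pixel's R, G, B values,
    filled column by column as in Eq. (1)."""
    w = len(dump) // (3 * h)                   # width grows with the dump size
    buf = np.frombuffer(dump[: 3 * h * w], dtype=np.uint8)
    # index 3*(i + h*j) means: for a fixed column j, the row index i runs
    # fastest, so reshape to (w, h, 3) and swap the first two axes
    return buf.reshape(w, h, 3).transpose(1, 0, 2)
```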
3.4 EEG/ERP-Like Activation Capturing
We describe here a way to simulate EEG activation capturing. The major difference between fMRI and EEG/ERP methods for brain investigation is that the first produces a snapshot of the activation state of the brain, while the second represents the electrical activation of the brain with respect to time. Moreover, fMRI images are more accurate and detailed than EEG/ERP activation plots. The resolution is extremely different: EEG/ERP tools observe a fixed number of spots of the brain, whereas fMRI captures virtually a picture of the activation of the whole brain. The simulation of EEG/ERP that we propose here puts together the qualities of fMRI (i.e., high resolution) and EEG/ERP (i.e., capturing a time span). The idea is simple: we can build activation images by concatenating a sequence of
Fig. 2. Sorting process in the three different machines (EEG/ERP-like time series): physical machine (C), virtual machine (Java), interpreted machine (PHP)
snapshots taken for each "cognitive activity". This is possible because we can easily capture several activation images for a single cognitive activity. The resulting images represent the evolution of the zones of the memory with respect to time. Even though they are time-related like EEG/ERP, these images have a finer resolution than that method. Common EEG/ERP tools capture around 100 channels of electric signals, e.g., the 64-channel Geodesic Sensor Net [10]. If we imagine building images from the electrical activation of these 64 channels, we can produce images with very scarce resolution, e.g., 1024 × 64: each channel can be used to build a column of the image, the observed time can be partitioned into 1024 slots, and the color of a pixel is given by the mean electrical activation of the channel in the specific time slot. Thus, real EEG/ERP tools can produce some sort of "activation images". The simulated EEG/ERP images that we propose for "machine thoughts' readers" are a valuable alternative to the images that could be produced by real EEG/ERP investigation tools. We can therefore use these images to compare the performance of "machine thoughts' readers" based on snapshots and on time series. The process for capturing the EEG/ERP-like activation images is simple and based on the previous method:
– run the process p representing the activity, i.e., the program and the related input data
– stop the process p at each given time interval τ, obtaining n memory dumps M0(p), ..., Mn(p)
– for each memory dump Mi(p), produce the activation image Ii(p)
– produce an image TI(p) that is the concatenation of the images I0(p), ..., In(p)
– resize the image TI(p) to a fixed size
Figure 2 shows these EEG-like images captured for the three different machines performing the sorting "cognitive process" based on quick sort. The time series evolve from the top to the bottom of each image. For each machine, the first snapshot is taken before loading the target vector. For the physical machine, the space for the vector is allocated at the beginning of the process; we can then clearly distinguish the code/machine area and the data area. The behavior of the three machines is extremely different, and the resulting EEG/ERP-like activation images appear dissimilar. The memory occupation of the physical machine does not grow over time. This does not hold for the other two machines, where the space occupied by the vector grows as the computation goes forward. In the physical machine, the progress of the sorting of the vector is clearly represented by the image: visual patterns representing ordered chunks of the vector grow during the computation, and the data space of the resulting image is organized as two triangles, the upper unsorted triangle and the lower sorted triangle. In the virtual machine, the major part of the picture consists of black pixels, as it represents empty memory. In the interpreted machine, instead, we can clearly see the growing occupation of the memory. This depends on the recursive nature of the quick sort algorithm, which produces a stack of procedure calls in which the vector is replicated.
In our experiments we considered time series of 21 snapshots for each pair (algorithm, data). The final images have been resized to 1024 × 768.
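Under the same assumptions as the earlier sketch (our code, not the authors'; the crop to a common width and the nearest-neighbour resize are simplifications), the EEG/ERP-like capture can be sketched as:

```python
import numpy as np

def eeg_like_image(dumps, h=768, out_hw=(768, 1024)):
    """Concatenate the fMRI-like images of successive memory dumps into one
    time-evolution image TI(p), then resize it to a fixed size.
    Time runs from the top to the bottom of the image, as in Fig. 2."""
    frames = [dump_to_image(d, h) for d in dumps]    # one image per snapshot
    w = min(f.shape[1] for f in frames)              # crop to a common width
    ti = np.concatenate([f[:, :w] for f in frames], axis=0)
    # crude nearest-neighbour resize keeps the sketch dependency-free
    rows = np.linspace(0, ti.shape[0] - 1, out_hw[0]).astype(int)
    cols = np.linspace(0, ti.shape[1] - 1, out_hw[1]).astype(int)
    return ti[np.ix_(rows, cols)]
```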
3.5 Feature Extraction from Activation Images
To use a classifier learner, we need to extract specific features from the images. Since in both cases, fMRI-like and EEG/ERP-like activation states, we have images, we can rely on the same method to extract features. This is a first step for comparing the fMRI-like and EEG/ERP-like activation states on common ground. We used three major classes of features: chromatic, texture (OP-OGD) and transformation features (OGD), as described in [12]. Chromatic features express the color properties of the image; in particular, they determine an n-dimensional vector representation of the 2D chromaticity histograms. Texture features emphasize the background properties and their composition. Texture feature extraction in LTI-Lib [13] uses the steerability property of the oriented Gaussian derivatives (OGD) to also generate rotation-invariant feature vectors. Transformations are thus modeled by the OGD features. A more detailed discussion of the theoretical and methodological aspects behind each feature set is presented in [12].
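As an illustration of the chromatic features only (LTI-Lib's actual implementation differs; this sketch and its parameters are ours), a 2D chromaticity histogram can be computed as:

```python
import numpy as np

def chromaticity_histogram(img, bins=16):
    """Flattened 2D histogram over the (r, g) chromaticity plane, where
    r = R/(R+G+B) and g = G/(R+G+B). img: (H, W, 3) uint8 array."""
    rgb = img.reshape(-1, 3).astype(float)
    s = rgb.sum(axis=1) + 1e-9                  # avoid division by zero
    r, g = rgb[:, 0] / s, rgb[:, 1] / s
    hist, _, _ = np.histogram2d(r, g, bins=bins, range=[[0, 1], [0, 1]])
    return (hist / hist.sum()).ravel()          # normalized feature vector
```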
3.6 Classifier Learner for Machine Thoughts' Readers
To finally build the classifiers of the "cognitive task" that a machine is performing, we used three alternative machine learning models. This is useful to see whether the results are confirmed across different kinds of classification methods. We used: a decision tree learner [14], a simple Naive Bayes classifier (for more information see [15]), and, finally, an instance-based learner (IBk) [16]. These machine learning methods were used in the context of Weka [17]. The three models are different. Decision tree learners capture and select the best features for the classification. Naive Bayes learners instead use a simple probabilistic model that considers all the features to be independent. The instance-based learner defines a distance in the feature space, does not make any abstraction of the samples, and classifies new instances according to their distance from the training samples. While the first model performs a sort of feature selection, the second and the third use all the features for taking the final decision.
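The paper's learners come from Weka; an equivalent scikit-learn sketch (our analogue, with hypothetical variable names X and y for the image features and task labels, and k = 1 assumed for IBk) would be:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

learners = {
    "decision tree": DecisionTreeClassifier(),
    "naive Bayes": GaussianNB(),
    "IBk (instance-based)": KNeighborsClassifier(n_neighbors=1),
}
for name, clf in learners.items():
    # X: feature vectors from the activation images; y: the three task classes
    acc = cross_val_score(clf, X, y, cv=10).mean()   # 10-fold CV, cf. Sec. 4
    print(f"{name}: {acc:.4f}")
```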
4 Results
We are now ready to compare the fMRI-like and ERP/EEG-like "machine thoughts' readers", i.e., the classifiers. We experimented with the set of feature vectors extracted from the images and the three different machine learning algorithms described in the previous section. We want to determine whether ERP/EEG-like methods can reach performance similar to fMRI-like methods. We performed two sets of experiments: a generic task and a one-machine-out
task. In the generic task, we took all the images and concealed the type of machine (i.e., physical, virtual, or interpreted). In the one-machine-out task, we learnt the classifiers on one kind of machine and applied them to another kind; for example, we learnt the classifier on the physical machine and applied it to the virtual machine. The second setting is considerably more complex than the first. Table 1 reports the results for the generic experiments. The first row reports the accuracy of the classifiers when using 50% of the instance set for training and 50% for testing. The accuracy of a classifier is the number of correctly classified instances with respect to the total number of decisions. The second row reports the accuracy of a 10-fold cross validation, where the complete set of instances is used.

Table 1. Accuracy of different learning algorithms for the fMRI-like and the ERP/EEG-like settings: Generic task

                              fMRI-like                          ERP/EEG-like
                          Decision Tree  NaiveBayes   IBk    Decision Tree  NaiveBayes   IBk
50% Train-50% Test            96.67         93.33    89.44       87.78         78.33    87.78
10-fold cross validation      96.67         92.78    90.83       88.06         79.44    88.61
We can derive some facts. We can learn good classifiers for the generic task, and decision tree learners yield the best classifiers. This implies that some features are far more important than others for taking the final decision. Yet, these experiments suggest that there is a consistent drop in performance when using ERP/EEG-like methods. This drop in performance is most important for decision tree learners and Naive Bayes methods (8-9% and 13-14%, respectively). Table 2 reports the results of the one-machine-out experiments. This is a more complex task, as we want to learn a classifier on some kinds of machines and apply it to other machines; we want to simulate the fact that brains may be similar but not exactly equal. In the third row (1 vs. 1 machine), we report the accuracy of classifiers learnt on one kind of machine and applied to another, e.g., a classifier learnt on the physical machine and applied to the virtual machine. We thus have 6 different pairs of experiments, and we report the average accuracy. The fourth row instead reports the experiments where we used two kinds of machines for training and tested on the third, e.g., the physical and virtual machines used for training and the interpreted machine used for testing. As the average accuracies show, this task is extremely more complex than the generic task. As the classifiers have to decide among three evenly distributed classes, the random baseline classifier obtains an accuracy of 33.33%. In two cases of the 1 vs. 1 setting with the Naive Bayes learning method, the learnt classifiers cannot decide which cognitive task the machines are performing, as these classifiers basically always predict one class. In one case for the 2 vs. 1 machine setting (IBk) and in one for the 1 vs. 1 machine setting (Decision Tree), we have a classifier that performs worse than a random classifier. We have two important observations. First, the 2 vs. 1 machine cases are generally better than the 1 vs. 1
Table 2. Accuracy of different learning algorithms for the fMRI-like and the ERP/EEG-like settings: One-machine-out task

                      fMRI-like                          ERP/EEG-like
                  Decision Tree  NaiveBayes   IBk    Decision Tree  NaiveBayes   IBk
1 vs. 1 machine       46.81         33.33    35.83       29.31         33.33    35.42
2 vs. 1 machine       63.06         48.61    29.17       46.11         34.44    48.61
machine cases. Second, EEG/ERP-like methods are nearly always substantially worse than fMRI-like methods. There is only one case where this does not happen: the IBk classifiers in the 2 vs. 1 machine case. Yet, the accuracy obtained by the ERP/EEG-like IBk classifier is still much lower than the best 2 vs. 1 machine accuracy obtained on the fMRI-like side, i.e., that of the fMRI-like decision tree classifier.
5 Discussion and Conclusions
In this paper, we presented a comparison of two different methods for acquiring data from the brain for building mind state decoders: fMRI and ERP/EEG. This comparison was done using the metaphor between brains and machines. Experiments show that there is a consistent drop in accuracy when using ERP/EEG-like models with respect to fMRI-like methods. These results are confirmed both in the simple case, i.e., the generic task, and in the more complex case, i.e., the one-machine-out task. The drop in performance suggests that building mind state decoders using ERP/EEG will be considerably more complex than building these decoders using fMRI.
Acknowledgments
We would like to thank Marco Cesati for his precious advice on the organization of memory in the Linux kernel and Paul Kantor for his insights on fMRI and EEG applied to the study of brain activation.
References
1. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, Heidelberg (1995)
2. Norman, K., Polyn, S., Detre, G., Haxby, J.: Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences 10(9), 424–430 (2006)
3. Haxby, J.V., Gobbini, M.I., Furey, M.L., Ishai, A., Schouten, J.L., Pietrini, P.: Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science 293(5539), 2425–2430 (2001)
4. Mitchell, T.M., Hutchinson, R., Niculescu, R.S., Pereira, F., Wang, X., Just, M., Newman, S.: Learning to decode cognitive states from brain images. Mach. Learn. 57(1-2), 145–175 (2004)
5. Xiang, J., Chen, J., Zhou, H., Qin, Y., Li, K., Zhong, N.: Using SVM to predict high-level cognition from fMRI data: A case study of 4*4 Sudoku solving. In: Zhong, N., Li, K., Lu, S., Chen, L. (eds.) BI 2009. LNCS, vol. 5819, pp. 171–181. Springer, Heidelberg (2009)
6. Mitchell, T.M., Shinkareva, S.V., Carlson, A., Chang, K.M., Malave, V.L., Mason, R.A., Just, M.A.: Predicting human brain activity associated with the meanings of nouns. Science 320(5880), 1191–1195 (2008)
7. Kiefer, M.: Perceptual and semantic sources of category-specific effects: Event-related potentials during picture and word categorization. Memory & Cognition 29(1), 100–116 (2001)
8. Paz-Caballero, D., Cuetos, F., Dobarro, A.: Electrophysiological evidence for a natural/artifactual dissociation. Brain Research 1067(1), 189–200 (2006)
9. Murphy, B., Baroni, M., Poesio, M.: EEG responds to conceptual stimuli and corpus semantics. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Singapore, pp. 619–627. Association for Computational Linguistics (August 2009)
10. Tucker, D.M.: Spatial sampling of head electrical fields: the geodesic sensor net. Electroencephalography and Clinical Neurophysiology 87(3), 154–163 (1993)
11. Zanzotto, F.M., Croce, D.: Reading what machines "think". In: Zhong, N., Li, K., Lu, S., Chen, L. (eds.) BI 2009. LNCS, vol. 5819, pp. 159–170. Springer, Heidelberg (2009)
12. Alvarado, P., Doerfler, P., Wickel, J.: Axon2 - a visual object recognition system for non-rigid objects. In: IASTED International Conference on Signal Processing, Pattern Recognition and Applications (SPPRA), Rhodes, pp. 235–240. IASTED (July 2001)
13. Alvarado, P., Doerfler, P.: LTI-Lib - A C++ Open Source Computer Vision Library. In: Kraiss, K.F. (ed.) Advanced Man-Machine Interaction. Fundamentals and Implementation, pp. 399–421. Springer, Dordrecht (2006)
14. Quinlan, J.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo (1993)
15. John, G.H., Langley, P.: Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pp. 338–345 (1995)
16. Aha, D.W., Kibler, D., Albert, M.K.: Instance-based learning algorithms. Mach. Learn. 6(1), 37–66 (1991)
17. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, Chicago (1999)
Improving Individual Identification in Security Check with an EEG Based Biometric Solution

Qinglin Zhao, Hong Peng, Bin Hu, Quanying Liu, Li Liu, YanBing Qi, and Lanlan Li
School of Information Science and Engineering, Lanzhou University, Lanzhou, China
{qlzhao,pengh,bh}@lzu.edu.cn, [email protected]
Abstract. Security issues are always challenging for real-world applications. Many biometric approaches, such as fingerprint, iris and retina recognition, have been proposed to improve recognition accuracy or practical usability in individual identification for security. However, there has been little research on individual identification using EEG methodology, mainly because of the complexity of EEG signal collection and analysis in practice. In this paper, we present an EEG-based, unobtrusive and non-replicable solution to achieve more practical and accurate individual identification, and an experiment involving 10 subjects was conducted to verify this method. The accuracy for 10 subjects reaches 96.77%. This high accuracy validates the utility of our solution in the real world. Besides, subject combinations were randomly selected, and the recognition performance from 3 subjects to 10 subjects remains equivalent, which proves the extendibility of the solution. Keywords: Security check, EEG, individual identification.
1 Introduction
The objective of our work is to explore an EEG-based solution for improving security checks. As is well known, securing entrances so that only authorized persons can access them is an effective measure of security management. Security checks are widely utilized in armories, banks, government offices and other high-security departments [1][2]. For example, in an armory, a security check can keep out irrelevant people and prevent confidential military information from being stolen; in government offices, a security check can block terrorists, which is essential to governmental operation. In fact, in order to assure national security, the American government has intensified security checks on visitors and immigrants since September 11, 2001. It is therefore vital and urgent to improve security checks. In the age of digitization and networking, a security check is not as simple as a mechanical lock, because on the one hand, no matter how solid and firm a lock is, it can be opened in many ways, even by breaking in; on the other hand, locks also
Corresponding author.
bother users by requiring them always to keep a key at hand. Thus, digital locks appeared, freeing people from keys. However, since cryptographic keys for digital locks are long and random (e.g., 128 bits for the Advanced Encryption Standard (AES) [3],[4]), they are difficult to memorize and easy to expose, so digital locks are hard to promote widely. Because of the limitations of traditional locks, researchers have developed biometric techniques [5][11] for security checks: recognizing persons biometrically. Biometric authentication, or simply biometrics, refers to establishing identity based on the physical and behavioral characteristics (also known as traits or identifiers) of an individual, such as face [6], fingerprint [7], signature [8], iris [9], retina [5], voice [10], etc. However, each biometric method has its own limitations when utilized alone in security checks. For instance, fingerprints are unavailable for people without hands because of the physical requirement of the method. Uncertainty about its harmfulness limits the practicality of retina-based security checks. As for voice, it is easily forged and not so distinctive between persons; what is worse, a person's voice sometimes changes, which makes the false rejection rate (FRR) too high for security checks. In this paper, we propose an EEG approach. Compared with other biometric methods, EEG signals have many strengths for security checks: 1) they cannot be forged; 2) they are harmless; 3) they are easy to apply automatically. So, based on prior research, this paper is devoted to biometric identification based on EEG, involving EEG signal collection, model establishment and classification algorithms. The significance of our paper is as follows: 1. The experimental results verify the implicit relationship between EEG and individual characteristics, which can greatly promote individual identification technology based on EEG. 2. Instead of imposing complex external stimulation or motor imagery, the subjects in our experiments just need to relax with their eyes closed. 3. The high average accuracy (97.63%) of the experiment indicates that the alpha power features of EEG are appropriate for security checks. 4. The accuracy changes little from 3 subjects to 10 subjects, which proves the extendibility of this solution.
2 Relevant Research
The EEG signal is the electrical signal generated by the brain and recorded on the scalp of the subject. The relationship between EEG signals and individual characteristics has interested scientists for years. Earlier researchers, such as M. Poulos and R. Palaniappan, have substantiated the implicit relations between EEG and personality, which can be used to recognize individuals with satisfactory accuracy, although there is no evidence so far to validate a one-to-one correspondence. In [12][13], M. Poulos conducted many experiments to classify subjects by parametric and non-parametric features of EEG, and the results were case-dependent, in the range of 80% to 100% [12]. The weakness of his
conclusions is the variability of the results. For example, the accuracy for 4 subjects is between 76% and 88% in [13], but the paper did not investigate the trend of classification accuracy as the number of subjects increases. That is to say, the method proposed by M. Poulos is questionable when applied to many subjects. A visual evoked potential (VEP) method was proposed by R. Palaniappan in [14], where EEG signals are recorded from 64 channels while the subjects perceive a single picture. The overall average classification performance is 99.06%, which validates the ability of the proposed method to identify individuals. However, the VEP method is hardly practical for security checks. Firstly, eye-blink artifacts are abundant during picture-evoked recording and can hardly be removed automatically; without automation, a security check cannot be promoted in the real world. Secondly, 64-channel signal collection gravely distracts subjects' activities and emotions and complicates operation, which makes this method obtrusive. Finally, the requirement of visual evoking limits wide application, because it is inappropriate for blind or distracted persons. The American Wadsworth Research Center indicated that people can control Mu rhythms (8-12Hz) and Beta rhythms (18-26Hz), which offered a reference for the choice of EEG frequency bands in motor imagery analysis. The Graz group in Austria indicated that all-left or all-right limb movement and imagination can evoke the major motor sensory cortex, so they used 3 electrodes, respectively collecting EEG signals at C3, Cz and C4, then extracted features and performed classification with 80%-92% accuracy [15]. However, motor imagery takes a long time to train and is difficult to manipulate; as a result, this method is unsuitable for security checks. We now propose an unobtrusive EEG-based solution for security checks, which requires only a single electrode, closed eyes and a quiet environment. In order to achieve satisfactory performance, the alpha rhythms (8-13Hz), theta rhythms (4-7Hz) [12][13] and SMR rhythms (13-15Hz) of the EEG were respectively extracted by an ICA algorithm. The high average classification performance of 97.63% validates the feasibility of the proposed method for security checks.
3 Methodology of the EEG-Based Solution
The purpose of this paper is to present an EEG-based solution to improve security checks. There are many factors to take into consideration, such as how to acquire EEG recordings, which features are suitable for identifying individuals, how to obtain these features, and which classifier is appropriate for this practical application. We will explain these in detail in the following subsections. The steps of this solution are illustrated in Fig. 1.
3.1 Data Collection
To reduce distraction to the subjects, only a single electrode is placed on the subject's head. Prior research supports that the alpha rhythm is strongest at position Cz according to the international 10-20 system [16] standard, and the alpha rhythm contains plenty of characteristic
Fig. 1. Methodology of the whole solution
information. So, we site the electrode at Cz. Furthermore, each data collection lasts just 30 seconds, to spare the subjects. In order to broaden the application environment, the only requirement of this solution is quietness, and subjects should keep their heads still and their eyes closed during data collection. Compared with other individual identification techniques, our solution is more convenient and less environmentally limited.
3.2 Data Conditioning
In order to remove artifacts and obtain better frequency characteristics, the raw EEG signals are first low-pass filtered with a 40 Hz cutoff frequency, which eliminates line-frequency interference and part of the artifacts. After that, the alpha, theta and SMR rhythms are extracted from the conditioned signals by the FastICA [17] algorithm, which also plays a significant role in removing artifacts such as ECG, EMG and eye-blink signals; moreover, it lays the foundation for the next steps, model building and spectral analysis. FastICA [18][19] can extract the associated signals from eyes-closed EEG without knowing the actual waveforms of these signals, while preserving other information; it is fast and effective, as proved in this experiment. A series of sine and cosine signals is constructed as references for the waves to be extracted; these reference signals and the EEG are taken together as the input of the FastICA mixing matrix, and the FastICA algorithm separates the signals, thereby extracting the corresponding rhythms from the EEG signal.
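A rough sketch of this conditioning step (our illustration, not the authors' code; the sampling rate fs, the 10 Hz reference frequency and the synthetic input are assumptions for demonstration only):

```python
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.decomposition import FastICA

fs = 256                                   # assumed sampling rate (Hz)
eeg = np.random.randn(30 * fs)             # placeholder for one 30-s recording
t = np.arange(len(eeg)) / fs

# 40 Hz low-pass to remove line-frequency interference and some artifacts
b, a = butter(4, 40 / (fs / 2), btype="low")
eeg_f = filtfilt(b, a, eeg)

# sine/cosine references near the target rhythm (here alpha, ~10 Hz)
refs = [np.sin(2 * np.pi * 10 * t), np.cos(2 * np.pi * 10 * t)]
X = np.column_stack([eeg_f] + refs)        # EEG plus references as ICA input

# FastICA separates the mixture; keep the component most correlated
# with the sine reference as the extracted rhythm
S = FastICA(n_components=X.shape[1], random_state=0).fit_transform(X)
alpha = S[:, np.argmax([abs(np.corrcoef(s, refs[0])[0, 1]) for s in S.T])]
```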
3.3 Feature Extraction
To classify each person, the implicit features of the EEG signals have to be extracted. Because of muscle artifacts and the EOG, the primary work of feature extraction is signal conditioning to remove line-frequency interference and artifacts. Then, the six coefficients of an AR model, the power spectrum and some other amplitude information of the signals are calculated as features. After the removal of artifacts, EEG signals can be considered stationary random signals over short time spans, and parametric models are the main method for analyzing such random signals. The AR model applies to signals whose power spectrum has a peak, which EEG signals fit well. EEG signals can usually
be seen as being generated by a system driven by white noise, so we can analyze an EEG signal through the relationship between the input and output of this system, as long as the power of the white noise and the parameters of the system are known, as shown in Fig. 2. In other words, the characteristics of the EEG are reflected by the coefficients of the model system, here the 6 coefficients of the AR model. The AR model [20] is

x(n) = -\sum_{k=1}^{p} a_k \, x(n-k) + u(n)
in which p is the order of the AR model, u(n) is white Gaussian noise (the uncorrelated driving input), and x(n) is the modeled signal.
Fig. 2. AR model
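As a hedged illustration of the AR model just defined, the coefficients a_k can be estimated by least squares over lagged samples (plain numpy; the sampling rate and the synthetic test signal are assumptions, not the paper’s data):

import numpy as np

def fit_ar(sig, p=6):
    """Least-squares estimate of the AR(p) coefficients a_k in
    x(n) = -sum_k a_k x(n-k) + u(n)."""
    X = np.column_stack([sig[p - k:len(sig) - k] for k in range(1, p + 1)])  # x(n-k) columns
    y = sig[p:]
    a, *_ = np.linalg.lstsq(X, -y, rcond=None)   # sign follows the model equation
    return a

fs = 256                                          # assumed sampling rate
t = np.arange(30 * fs) / fs                       # a 30-s recording, as in the paper
sig = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.randn(t.size)  # stand-in alpha-band signal
a = fit_ar(sig)                                   # the six coefficients used as features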
In building the AR model [21], the order of the AR model should be considered first. If the order is too high, the power spectrum splits into spurious peaks; if it is too low, the precision cannot be guaranteed. Worse, a higher AR order yields more coefficients, which makes it difficult to choose the best feature combination to classify individuals. Prior experiments indicated that an AR model of order 6 to 8 represents the EEG signal well, and we choose order 6 in this paper.

Power Spectrum and Center Frequency. There are many methods to calculate the power spectrum, such as the FFT method, the Welch method and the AR-model method. Fig. 3 compares the FFT method, the Welch method and a 40th-order AR model. The power spectrum calculated by the AR model is smoother than those of the FFT and Welch methods, which means the AR model has a higher frequency resolution. We therefore choose the AR-model method to obtain the power spectrum. Once the coefficients of the AR model have been calculated as above, the power spectrum is obtained from:

\Gamma_{xx}(f) = \frac{\delta_w^2}{\left| 1 + \sum_{k=1}^{p} a_k e^{-j 2\pi f k} \right|^2}
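Continuing the previous sketch, this spectrum and the center frequency discussed next can be evaluated directly from the formula (reusing sig and a from above; estimating δ_w² from the fitting residuals is our choice, as the paper does not say how it is obtained):

import numpy as np

def ar_spectrum(sig, a, fs=256.0, nf=512):
    p = len(a)
    X = np.column_stack([sig[p - k:len(sig) - k] for k in range(1, p + 1)])
    var_w = (sig[p:] + X @ a).var()                 # residual variance as delta_w^2
    f = np.linspace(0.0, fs / 2.0, nf)              # frequencies up to Nyquist
    k = np.arange(1, p + 1)
    denom = np.abs(1.0 + np.exp(-2j * np.pi * np.outer(f / fs, k)) @ a) ** 2
    return f, var_w / denom

f, pxx = ar_spectrum(sig, a)
center_frequency = f[np.argmax(pxx)]                # frequency of maximum power
max_power = pxx.max()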
In addition, the frequency at which the power is maximal is the center frequency. Both the center frequency and the maximum power are useful features reflecting the characteristics of the EEG.
Fig. 3. The comparison of FFT method, Welch method and AR model method
3.4 Classification
In our work, the k-Nearest-Neighbor (KNN) classifier [22] is applied to classify the obtained features. All samples are randomly split into three parts, two of which serve as training data and the remaining one as testing data. The nearest-neighbor classifier is instance-based: instead of building a model in advance, it directly classifies samples of unknown category using the training data. In Bayes classification [23], the posterior probabilities of a sample set, i.e., the probabilities of it belonging to each class, are calculated from the prior probabilities, and the class with the highest posterior probability is assigned. However, because of the requirement of conditional independence between attributes, which is not guaranteed in this experiment, the Bayes classifier performs worse here than the KNN classifier. To obtain better classification performance, we therefore select the KNN classifier.
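A minimal 1-NN sketch matching the described setup (plain numpy; the 2/3-1/3 split and k = 1 follow the paper, while the Euclidean metric and the stand-in feature data are our assumptions):

import numpy as np

def knn_predict(train_x, train_y, test_x, k=1):
    # Euclidean distances between every test and every training sample
    d = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    # majority vote among the k nearest labels (for k=1: the nearest label)
    return np.array([np.bincount(train_y[idx]).argmax() for idx in nearest])

rng = np.random.default_rng(0)
features = rng.normal(size=(150, 9))          # stand-in feature matrix (10 subjects x 15 samples)
labels = np.repeat(np.arange(10), 15)

perm = rng.permutation(len(features))
split = 2 * len(features) // 3                # two thirds training, one third testing
tr, te = perm[:split], perm[split:]
pred = knn_predict(features[tr], labels[tr], features[te], k=1)
accuracy = (pred == labels[te]).mean()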
4 Experiments
The whole procedure of the experiments is illustrated in Fig. 4. 30-s EEG data are recorded with a Nexus-4 device. The artifacts mixed into the raw data are removed in pre-processing. In the next step, EEG features are extracted and used to find the implicit association between certain feature combinations and individual characteristics. The classification result yields the recognition of each subject and can be applied to security check.
4.1 Subjects and Experimental Conditions
We recruited ten subjects, 4 females and 6 males, in the same age range (20-22 years old). All experiments are conducted in a quiet room, and
Fig. 4. Procedures of experiment
the experimental environment is illustrated by the signal-collection stage in Fig. 4. The subject keeps the eyes closed and the head immobile while his or her EEG signals are collected. So as not to disturb the subject’s normal activities, each collection is shortened to 30s and the number of electrodes is reduced to one, at position Cz. In total we collected EEG signals 5 times on different days for every subject.
4.2 Experimental Process
First, the user’s raw EEG signals are collected; after pre-processing, EEG features are extracted, fed into the classifier, and used to identify individuals (Fig. 4). Finally, the classification result can be transmitted over the Internet or via Bluetooth and applied in many settings, such as intelligent security doors and e-commerce. Artifacts are removed by the ICA algorithm during EEG conditioning. After conditioning, the alpha, theta and SMR rhythms of the EEG signals are extracted by the ICA algorithm. As features, the 6 AR-model parameters, the center frequency, the maximum power and the power ratio of each rhythm are fed into the classifier to identify individuals. The menu of the classifier is shown in Fig. 5.
4.3 Results
Table 1 shows the identification result when classifying with the center frequency, maximum power and average peak-to-peak value of the alpha rhythm in a kNN (k=1) classifier. The first row and the first column both list the ten subjects, and each row shows the probabilities that the subject heading the row is predicted as each of the subjects in the first row. For example, if subject A is classified as A with probability 80%, as B with 15% and as C with 5%, then the accuracy for A is 80% and the false reject rate of A is 20%. The identification accuracies therefore appear in the diagonal cells.
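Continuing the KNN sketch above, the row-normalized confusion matrix from which Table 1 is read can be computed as follows (the diagonal gives the per-subject accuracy, and 1 minus the diagonal the false reject rate):

import numpy as np   # reuses labels[te] and pred from the previous sketch

def confusion(y_true, y_pred, n):
    m = np.zeros((n, n))
    for t, p in zip(y_true, y_pred):
        m[t, p] += 1
    return m / m.sum(axis=1, keepdims=True)   # each row sums to 1

cm = confusion(labels[te], pred, n=10)
accuracy_per_subject = np.diag(cm)            # the diagonal cells of Table 1
false_reject_rate = 1.0 - accuracy_per_subject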
Fig. 5. The menu of the classifier
Table 1. One classification accuracy with one sort of feature combination
        DQX     FDP     LLL     LQY     ZGQ     LYC     QYB     MHY     ZW      ZXW
DQX     100%    0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%
FDP     0.00%   100%    0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%
LLL     0.00%   0.00%   100%    0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%
LQY     0.00%   0.00%   0.00%   100%    0.00%   0.00%   0.00%   0.00%   0.00%   0.00%
ZGQ     0.00%   0.00%   0.00%   0.00%   100%    0.00%   0.00%   0.00%   0.00%   0.00%
LYC     0.00%   17.65%  0.00%   0.00%   0.00%   82.35%  0.00%   0.00%   0.00%   0.00%
QYB     0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   100%    0.00%   0.00%   0.00%
MHY     0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   100%    0.00%   0.00%
ZW      0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   100%    0.00%
ZXW     0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   100%
(rows: actual subject; columns: predicted subject)
The comparison across classifiers and numbers of subjects is illustrated in Fig. 6. The average accuracy is 97.63%, and this considerable accuracy proves the feasibility of the solution. Furthermore, the classification performance lies between 96.77% and 99.23%, indicating that the accuracy does not change systematically as the number of subjects increases. We therefore conjecture that this solution can be extended to more subjects.
Fig. 6. Comparison of classifier and subjects’ number
5 Discussion and Conclusions
This paper describes an EEG-based biometric identification solution for improving security check, using a single electrode at Cz. To keep the procedure unobtrusive, the tested subject only needs to sit, close the eyes, and relax during the 30 seconds of EEG recording. The only inputs to the system are the 30-s EEG recording and the claimed identity of the subject; the output is a classification decision: the identity of the tested subject. In this solution we used neither the traditional right/left motor imagery or leg movement, nor visual evoking to obtain the EEG signal, but a single electrode placed at Cz on the subject’s scalp with the eyes closed in a quiet room, which relaxes both the physical requirements on users and the conditions on the application environment. We then extracted the alpha, theta and SMR rhythms with an ICA algorithm and extracted their features via AR modeling. After classifying each feature combination with the KNN classifier, the average accuracy reached 97.63%, indicating that the method can indeed identify individuals and thus be applied to security check. Compared with prior work, our method is easy to implement and unobtrusive to the subject. Moreover, we compared the accuracy from three to ten subjects with the KNN classifier, and the results lie between 96.77% and 99.23%. Evidently, the accuracy does not change systematically with an increasing number of subjects, which implies that the solution can be applied to security check with more users. In the future we will, on the one hand, work on ubiquitous improvements to security check, such as widening the applicable conditions and improving the identification performance; on the other hand, we will fuse EEG signals with other biometrics for security check. To achieve these prospects, we intend to collect more samples and to bring in more classifiers as well as data fusion methods.
Acknowledgments This work was supported by National Natural Science Foundation of China (grant no. 60973138), the EU’s Seventh Framework Programme OPTIMI (grant no. 248544), the Fundamental Research Funds for the Central Universities (grant no. lzujbky-2009-62), the Interdisciplinary Innovation Research Fund for Young Scholars of Lanzhou University (grant no. LZUJC200910).
References
1. Ratha, N.K., Connell, J.H., Bolle, R.M.: Enhancing security and privacy in biometrics-based authentication systems. IBM Systems Journal 40(3) (2001)
2. Moss, B.: Getting personal. Biometric security devices gain access to health care facilities. Health Facil. Manage. 15(9), 20–24 (2002)
3. Stallings, W.: Cryptography and Network Security: Principles and Practices, 3rd edn. Prentice-Hall, Upper Saddle River (2003); Pankanti, S., Bolle, R., Jain, A.K. (Guest eds.): Special Issue of IEEE Computer on Biometrics (February 2000)
4. Klein, D.V.: Foiling the cracker: a survey of, and improvements to, password security. In: Proc. 2nd USENIX Workshop Security, pp. 5–14 (1990)
5. Jain, A.K., Ross, A., Prabhakar, S.: An introduction to biometric recognition. IEEE Trans. Circuits Syst. Video Technology, Special Issue on Image- and Video-Based Biometrics 14(1), 4–20 (2004)
6. Li, S.Z., Jain, A.K. (eds.): Handbook of Face Recognition. Springer, New York (2004)
7. Maltoni, D., Maio, D., Jain, A.K., Prabhakar, S.: Handbook of Fingerprint Recognition. Springer, New York (June 2003)
8. Lam, C.F., Kamins, D.: Signature recognition through spectral analysis. Pattern Recognition 22, 39–44 (1989)
9. Roizenblatt, R., Schor, P., et al.: Iris recognition as a biometric method after cataract surgery. Biomed. Eng. Online 3, 2 (2004)
10. Markowitz, J.A.: Voice Biometrics. Communications of the ACM 43(9) (September 2000)
11. Jain, A.K., Pankanti, S.: Biometrics: A Tool for Information Security. IEEE Transactions on Information Forensics and Security 1(2) (June 2006)
12. Poulos, M., Rangoussi, M., et al.: On the use of EEG features towards person identification via neural networks. Med. Inform. Internet Med. 26(1), 35–48 (2001)
13. Poulos, M., Rangoussi, M., et al.: Person identification from the EEG using nonlinear signal classification. Methods Inf. Med. 41(1), 64–75 (2002)
14. Palaniappan, R.: Method of identifying individuals using VEP signals and neural network. IEE Proc.-Sci. Meas. Technol. 151(1) (January 2004)
15. Birbaumer, N., Hinterberger, T., Kubler, A.: The Thought Translation Device (TTD): neurobehavioral mechanisms and clinical outcome. IEEE Transactions on Neural Systems and Rehabilitation Engineering 11(2), 120–122 (2003)
16. Homan, R.W., Herman, J., et al.: Cerebral location of international 10-20 system electrode placement. Electroencephalography and Clinical Neurophysiology 66(4), 376–382 (1987)
17. Vorobyov, S., Cichocki, A.: Blind noise reduction for multisensory signals using ICA and subspace filtering, with application to EEG analysis. Biol. Cybern. 86, 293–303 (2002), doi:10.1007/s00422-001-0298-6
18. Eichele, T., Calhoun, V.D., Debener, S.: Mining EEG-fMRI using independent component analysis. International Journal of Psychophysiology (2009)
19. Singh, J., Sapatnekar, P.: Statistical timing analysis with correlated non-Gaussian parameters using independent component analysis. In: Proceedings of the 43rd Annual Conference on Design Automation (July 2006)
20. Riera, A., Soria-Frisch, A., Caparrini, M., Grau, C., Ruffini, G.: Unobtrusive Biometric System Based on Electroencephalogram Analysis. EURASIP Journal on Advances in Signal Processing (2008)
21. Pardey, J., Roberts, S., et al.: A review of parametric modelling techniques for EEG analysis. Med. Eng. Phys. 18(1), 2–11 (1996)
22. Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edn. Elsevier Inc., Amsterdam (2006)
23. Amor, N.B., Benferhat, S., Elouedi, Z.: Naive Bayes vs Decision Trees in Intrusion Detection Systems. In: ACM Symposium on Applied Computing, pp. 420–424 (2004)
Segmentation of 3D Brain Structures Using the Bayesian Generalized Fast Marching Method

Mohamed Baghdadi¹, Nacéra Benamrane¹, and Lakhdar Sais²

¹ Department of Informatics, USTOMB, PB 1505, EL’Mnaour 31000, Oran, Algeria
[email protected], [email protected]
² Université Lille Nord de France, CRIL - CNRS, Rue Jean Souvraz, SP-18, F-62307, Lens Cedex 3
[email protected]
Abstract. In this paper a modular segmentation approach combining the Bayesian model with the deformable model is proposed. It is based on the level set method and breaks down into two main parts. First, a preliminary stage constructs the information map. Then a deformable model, implemented with the Generalized Fast Marching Method (GFMM), evolves towards the structure to be segmented, under the action of a force defined from the information map; the map is constructed from the posterior probability information. The major contribution of this work is the use and improvement of the GFMM for the segmentation of 3D images, together with the design of a robust evolution model based on adaptive, image-dependent parameters. Experimental evaluation of our segmentation approach on several MRI volumes shows satisfactory results. Keywords: Segmentation, Brain MR Imagery, Deformable model, GFMM, Bayesian model.
1 Introduction

Segmentation of brain structures in Magnetic Resonance Images (MRI) is the first step in many medical image analysis applications and is useful for the diagnosis and evaluation of neurological diseases (Alzheimer's, Parkinson's...). Accurate, reliable, and automatic segmentation of brain structures can therefore improve the diagnosis and treatment of the related neurological diseases. Manual segmentation by an expert is usually accurate but impractical for large datasets, because it is a tedious and time-consuming process. Several methods have been developed for tissue segmentation; among these, deformable models have gained great importance, as they provide a powerful tool to accurately recover a structure using very few assumptions about its shape. Two great families of deformable models exist: parametric models and nonparametric models. The former are the older and require a parametric or discrete representation. The latter, based on curve evolution theory, use an implicit representation of the
model and allow topological changes. The principle of parametric active contours (snakes) [6] is to evolve a contour or a surface towards the borders of the object to be segmented while minimizing a certain energy. Two types of energy drive the snake. The external energy guides the active contour over the image and attracts it towards positions exhibiting a required characteristic. The internal energy, associated with the geometry of the contour, aims to avoid the traps of local minima of the external energy and expresses the regularization of the active contour. This type of active contour has been widely used in medical imagery [4][11]. Parametric active contours are indeed powerful segmentation tools, but they have proven inefficient at handling complex, convoluted shapes and at managing the topology changes that are frequent in 3D medical images. The second family comprises the nonparametric deformable models, including geometric active contours [8] and geodesic active contours [3], which have been the subject of numerous developments and applications in medical imagery [7]. Nonparametric deformable models adapt better to medical applications. Moreover, the level set formalism [9] allows these methods to be implemented efficiently and has the following advantages: (1) adapted numerical schemes are available, (2) topological changes are allowed, and (3) the extension of the method to higher dimensions is easy. The contour or surface (in the 3D case) deforms under the action of a normal evolution force, so the choice of the evolution force is crucial: its role is to guide the contour towards the target zones, which depend on the application. On medical images in particular, the structures of interest often have complex shape and variable topology, which makes it difficult to define a force that is at the same time general and adaptable to specific structures and pathologies. Several evolution models have been proposed, involving various parameters (step size, weighting parameters...), and the appropriate setting of these parameters strongly influences the performance of the methods. In this paper we propose a novel approach based on a robust and adaptive evolution model, which enables a volume to be segmented reliably. The versatility of our segmentation method is demonstrated by reporting results on brain structures in MR images. The remainder of this paper is organized as follows. Section 2 presents the GFMM segmentation method, which constitutes the kernel of our approach. The proposed approach is detailed in Section 3. Finally, the results of our experiments are presented and discussed in Section 4.
2 The Generalized Fast Marching Method (GFMM)

2.1 Introduction to the GFMM

The Generalized Fast Marching Method [2][5] is a generalization of the Fast Marching Method (FMM) [10], a very fast algorithm for advancing a front evolving with a normal velocity. The generalization comes from the fact that the propagation velocity may change sign in space-time, so the general velocity case can be treated without any restriction on its sign. In this method the concept of a discontinuous solution is used to represent the front: quite simply, the front is represented by the discontinuity of a function θ taking the values 1 and -1. The function is then the solution of the following equation (in the viscosity sense):
\theta_t = c(x, t)\,|\nabla\theta| \qquad (1)
where c is the propagation velocity. The main idea of this generalized algorithm is as follows: at each stage we consider two zones, one where the velocity is negative and one where it is positive; it then suffices to evolve the contour using two different Fast Marching sweeps, one in each zone.

2.2 The GFMM Algorithm

Given the speed c_I^n of the front at point I at time t_n, the normalized speed \hat{c}_I^n is defined by

\hat{c}_I^n \equiv \begin{cases} 0 & \text{if there exists } J \in V(I) \text{ such that } c_I^n c_J^n < 0 \text{ and } |c_I^n| \le |c_J^n|, \\ c_I^n & \text{otherwise,} \end{cases} \qquad (2)

where V is the vicinity system defined by V(I) = \{ J \in \mathbb{Z}^N : |J - I| \le 1 \}.
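A small sketch of this normalization on a grid (Python/numpy; using the axis neighbors for V(I) and wrap-around rolls at the boundary are simplifications of ours):

import numpy as np

def normalized_speed(c):
    """Zero the speed wherever a neighbor moves the opposite way at least as fast (eq. 2)."""
    c_hat = c.copy()
    for axis in range(c.ndim):
        for shift in (1, -1):
            cj = np.roll(c, shift, axis=axis)   # neighboring speed c_J (roll wraps at the boundary)
            c_hat[(c * cj < 0) & (np.abs(c) <= np.abs(cj))] = 0.0
    return c_hat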
As in the classical FMM, we define the Narrow Band (NB), which consists of the points I that can be reached by the front. More precisely, the Narrow Band is defined by

NB^n = \{ I \in \mathbb{Z}^N : \exists J \in V(I),\ \theta_I^n = -\theta_J^n \text{ and } \theta_I^n \hat{c}_I^n < 0 \}, \qquad (3)

NB_+^n = NB^n \cap \{ I : \theta_I^n = +1 \} \quad \text{and} \quad NB_-^n = NB^n \cap \{ I : \theta_I^n = -1 \}. \qquad (4)
As in the FMM, for every point I ∈ NB^n we have to compute a tentative value \tilde{u}_I^n representing the arrival time of the front at point I. To compute this tentative value, we must define the set of points which are useful for I, namely

U^n(I) = \{ J \in V(I) \cap NB^n : \theta_I^n = -\theta_J^n \}, \qquad U^n = \bigcup_{I \in NB^n} U^n(I). \qquad (5)

For every point J which is useful for a point I ∈ NB^n (J ∈ U^n(I)), we introduce a time u^n(J, I), which can be interpreted as the time when the front F^n starts to move from point J to point I; it is used to compute \tilde{u}_I^n.
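The Narrow Band of eqs. (3)-(4) can be sketched the same way (again with wrap-around rolls as a boundary simplification):

import numpy as np

def narrow_band(theta, c_hat):
    """Points with an opposite-sign neighbor that the front can reach (eqs. 3-4)."""
    has_opposite = np.zeros(theta.shape, dtype=bool)
    for axis in range(theta.ndim):
        for shift in (1, -1):
            has_opposite |= np.roll(theta, shift, axis=axis) == -theta
    nb = has_opposite & (theta * c_hat < 0)
    return nb & (theta == 1), nb & (theta == -1)   # NB+ and NB-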
Once we have computed the tentative values for all points of the Narrow Band, we denote by \tilde{t}_n the minimum of all these values. Below is a pseudo code showing the main steps of the method:

Initialization () {
    n = 1;
    // initialize the field θ as (Ω_0 is the initial region):
    θ_I^0 = 1 if x_I ∈ Ω_0;  θ_I^0 = -1 elsewhere;   (6)
    // initialize the times:
    u^0(I, K) = t_0 if I ∈ U^0(K) and K ∈ NB^0;  u^0(I, K) = +∞ otherwise;   (7)
} // end of Initialization
Loop () {
    U_Computation: {
        // Compute ũ^{n-1} on NB^{n-1} as follows:
        /* For I ∈ NB^{n-1}, compute ũ_I^{n-1} as the solution of the
           following second order equation: */
        (∂ũ^{n-1}/∂x)² + (∂ũ^{n-1}/∂y)² + (∂ũ^{n-1}/∂z)² = (Δx)² / (ĉ_I^{n-1})²   (8)
        // where (similarly for y and z):
        ∂ũ^{n-1}/∂x = max[0, ũ_I^{n-1}(x,y,z) − ũ_I^{n-1}(x+1,y,z), ũ_I^{n-1}(x,y,z) − ũ_I^{n-1}(x−1,y,z)]
    } // end of U_Computation
    t̃_n = min{ ũ_I^{n-1} : I ∈ NB^{n-1} };
    // Truncate t̃_n
    t_n = max(t_{n-1}, min(t̃_n, t_{n-1} + Δt));   (9)
    if (t_n == (t_{n-1} + Δt) && t_n < t̃_n) {
        n = n + 1;  θ^n = θ^{n-1};  u^n = u^{n-1};
        goto U_Computation;
    }
    // Initialize the newly accepted points
    NA_+^n = { I ∈ NB_+^{n-1} : ũ_I^{n-1} = t̃_n } and NA_-^n = { I ∈ NB_-^{n-1} : ũ_I^{n-1} = t̃_n },
    NA^n = NA_+^n ∪ NA_-^n   (10)
    // Reinitialize θ
    θ_I^n = +1 if I ∈ NA_-^n;  θ_I^n = −1 if I ∈ NA_+^n;  θ_I^n = θ_I^{n-1} otherwise   (11)
    // Reinitialize u^n(I, K)
    u^n(I, K) = min(u^{n-1}(I, K), t_n) if I ∈ U^n(K) and K ∈ NB^n;  +∞ otherwise   (12)
    n = n + 1;
    goto U_Computation;
} // end of Loop
3 The Proposed Method: Bayesian Generalized Fast Marching Method (BGFMM)

The proposed approach breaks down into two main parts. First, a preliminary stage constructs the information map. Then a deformable model, implemented with the Generalized Fast Marching Method, evolves towards the structure to be segmented. Figure 1 presents the various stages of our approach.
Fig. 1. Introductory scheme of the segmentation approach using the BGFMM: FCMA → posterior probability map → velocity map → evolution of the level sets → final segmentation
3.1 The Choice of the Kernel Method of Segmentation

The Fast Marching Method (FMM) is a powerful segmentation method characterized by its rapidity, in particular for 3D volumes; however, it suffers from two major defects: (1) the evolution function must be of constant sign (positive or negative), and (2) it does not take the curvature term into account. To address the first problem we use the Generalized Fast Marching Method (GFMM) proposed in [5], in which the evolution function can change sign in space-time while preserving the rapidity of the FMM. This method is very recent; indeed, no previous work has used it for 3D image segmentation. However, like the FMM, it still suffers from the curvature problem. To overcome this pitfall we improved the GFMM (see the following section) so that it can handle complex shapes of high curvature.
3.2 Improvement of the Classical GFMM
The GFMM is a powerful segmentation method, but its performance degrades when treating shapes of strong curvature, which can be explained by the fact that the GFMM, like the FMM, is unaware of the curvature of the front. To correct this problem, some modifications were made to the classical GFMM. We recall that the classical GFMM is based on one-sided derivatives, computed using forward and backward differences:
D_1^{-x} = \tilde{u}^{n-1}[x, y, z] - \tilde{u}^{n-1}[x-1, y, z], \qquad D_1^{+x} = \tilde{u}^{n-1}[x+1, y, z] - \tilde{u}^{n-1}[x, y, z]. \qquad (13)
To integrate the curvature information, we replace the first order approximations (D_1^{-x} and D_1^{+x}) of the partial derivatives used in the GFMM by the following second order approximations:

D_2^{-x} = \frac{3\tilde{u}^{n-1}[x,y,z] - 4\tilde{u}^{n-1}[x-1,y,z] + \tilde{u}^{n-1}[x-2,y,z]}{2}, \qquad D_2^{+x} = -\frac{3\tilde{u}^{n-1}[x,y,z] - 4\tilde{u}^{n-1}[x+1,y,z] + \tilde{u}^{n-1}[x+2,y,z]}{2}. \qquad (14)
When these second order approximations are used, the scheme still works in exactly the same way, except that we get different polynomial coefficients. The equation to be solved thus becomes

\max(D_2^{-x}, -D_2^{+x}, 0)^2 + \max(D_2^{-y}, -D_2^{+y}, 0)^2 + \max(D_2^{-z}, -D_2^{+z}, 0)^2 = \frac{(\Delta x)^2}{(\hat{c}_I^{n-1})^2}. \qquad (15)

We then define the vicinity system V_2(I) = \{ J \in \mathbb{Z}^N : |J - I| \le 2 \}. In this case the useful points are described by

U^n = \{ I : \exists J \in V_2(I) \cap NB^n,\ \theta_I^n = -\theta_J^n \}. \qquad (16)
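A one-dimensional sketch contrasting the two one-sided approximations of eqs. (13) and (14) (numpy; u holds tentative arrival times along one grid line, and boundary handling is omitted):

import numpy as np

def one_sided_differences(u, i):
    d1_minus = u[i] - u[i - 1]                         # first order, eq. (13)
    d1_plus  = u[i + 1] - u[i]
    d2_minus = (3*u[i] - 4*u[i - 1] + u[i - 2]) / 2    # second order, eq. (14)
    d2_plus  = -(3*u[i] - 4*u[i + 1] + u[i + 2]) / 2
    return d1_minus, d1_plus, d2_minus, d2_plus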
To use the new scheme, the voxels at 2 vu (voxel unit) distance must be useful and have smaller distance values than those at 1 vu distance. If these two conditions are not met, the first order approximations to the derivative can be used instead.

3.3 The Adaptive FCM (FCMA)

The FCM [1] algorithm can be adapted to the fuzzy classification of the N gray levels of an image. In this case it is not necessary to classify all the voxels of an image, but simply to classify the different gray level values occurring in it; the gain is then very important, particularly for 3D images. The main stages of the FCMA method are:

1. Initialization
   - Fix the parameters: the number of classes c (2 ≤ c ≤ M), where M is the number of gray levels of the image, and the fuzzy degree m ∈ [1, ∞[, generally m = 2.
   - Initialize the vector v of the class centres.
   - Compute the frequency f_i of each gray level in the image.
2. Repeat:
   - Compute the distances d_ik between all the gray levels i (i = 1, 2, ..., N-1) and the class centres v_k (k = 1, 2, ..., c), and the new centres v_k of the c classes:
\mu_{ik} = \frac{1}{\sum_{l=1}^{c} \left( \frac{d_{ik}}{d_{il}} \right)^{\frac{2}{m-1}}}, \qquad \nu_k = \frac{\sum_{i=0}^{N-1} (\mu_{ik})^m \cdot i \cdot f_i}{\sum_{i=0}^{N-1} (\mu_{ik})^m f_i}. \qquad (17)
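A compact histogram-based FCM along the lines of eq. (17) (numpy; the initialization of the centres and the convergence test are implementation choices of ours):

import numpy as np

def fcma(image, c=5, m=2.0, iters=100, tol=1e-5):
    """Fuzzy C-means on the gray-level histogram instead of on every voxel."""
    levels, freq = np.unique(image.ravel(), return_counts=True)
    centres = np.linspace(levels.min(), levels.max(), c)
    for _ in range(iters):
        d = np.abs(levels[:, None] - centres[None, :]) + 1e-12          # distances d_ik
        mu = 1.0 / (d ** (2.0/(m-1.0)) * np.sum(d ** (-2.0/(m-1.0)), axis=1, keepdims=True))
        w = (mu ** m) * freq[:, None]                                   # frequency-weighted memberships
        new_centres = (w * levels[:, None]).sum(axis=0) / w.sum(axis=0) # eq. (17)
        if np.abs(new_centres - centres).max() < tol:
            centres = new_centres
            break
        centres = new_centres
    return levels, mu, centres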
3.4 The Preliminary Stage

First, we construct the fuzzy map using the result of the FCMA, which gives, for each gray level I, its membership degree to the various classes. The membership degree of gray level I to class k can be interpreted as the probability of being in the presence of class ω_k given I. We thus have the following equality:

P(\omega_k \mid I) = U_k(I). \qquad (18)
For brain MR images, Gaussian distributions are generally used to represent the various tissue classes. Each distribution is associated to a class labeled ω_k and described by a parameter vector Γ_k made of three components: the first two are the parameters of the distribution (mean and standard deviation), and the third is the prior probability of class ω_k, noted π_k = P(ω_k). The mixture of N distributions is thus described by the vector Γ = {Γ_k, 1 ≤ k ≤ n}, whose parameters are calculated as follows. After applying the FCMA algorithm, we assign each voxel x of gray level I to its class using the posterior probability:

x \in \omega_k \quad \text{if} \quad p(\omega_k \mid I) > p(\omega_q \mid I), \ \forall q,\ 1 \le q \le n. \qquad (19)
Then we calculate the prior probability π_k of each class from the proportion of voxels belonging to class k, as well as the other parameters of the various classes (variance, mean). We denote by T the set of classes corresponding to the segmentation target. The complementary set, containing the classes of the background or of the outside of the structure, is denoted B. The classes mainly represented inside the initial segmentation volume are assigned to T. More precisely, the intensity distribution over the whole image can be written explicitly as

P(I) = \sum_{k=1}^{n} \pi_k \cdot P(I \mid \omega_k) = \sum_{k=1}^{n} (\pi_k^i + \pi_k^e) \cdot P(I \mid \omega_k), \qquad (20)
where π_k^i and π_k^e are respectively the proportions (or prior probabilities) of class ω_k inside and outside the initial surface segmenting the object of interest. Once the sets T and B have been determined, the estimated intensity distributions inside and outside the structure to be segmented can be defined as

p_i(I) = \sum_{k:\, \omega_k \in T} \pi_k \cdot P(I \mid \omega_k), \qquad p_e(I) = \sum_{k:\, \omega_k \in B} \pi_k \cdot P(I \mid \omega_k), \qquad (21)
and the prior probability of a voxel being inside the structure of interest is given by

\eta_T = \sum_{k:\, \omega_k \in T} \pi_k. \qquad (22)
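Evaluating eqs. (21)-(22) from the estimated Gaussian mixture can be sketched as follows (Python; the function names are ours, and means, stds, priors and the target index set T are assumed to come from the steps above):

import numpy as np

def gauss(I, mean, std):
    return np.exp(-0.5 * ((I - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def inside_outside(I, means, stds, priors, T):
    """p_i, p_e and eta_T as in eqs. (21)-(22); T lists the target-class indices."""
    B = [k for k in range(len(priors)) if k not in T]
    p_i = sum(priors[k] * gauss(I, means[k], stds[k]) for k in T)
    p_e = sum(priors[k] * gauss(I, means[k], stds[k]) for k in B)
    eta_T = sum(priors[k] for k in T)
    return p_i, p_e, eta_T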
3.5 The Evolution Model

The propagation equation that we employ uses the region-based information of the posterior probability map and is described by the following formula:

F = S \cdot SF, \qquad (23)
where S is the propagation sign, whose goal is to guide the contour towards the borders of the structure to be segmented (the segmentation target), and SF is the stopping factor, whose role is to stop the evolution of the contour at the desired location.

The computation of the propagation sign S. This term favours a local contraction or expansion of the whole surface throughout the process. The problem can be expressed as the classification of each point of the current interface: if a point belongs to the object of interest, the surface should locally expand; if it does not, the surface should contract. We perform this classification by maximizing the posterior segmentation probability p(w|I), where w denotes the membership class of the considered point. According to Bayes' rule, maximizing the posterior distribution p(w|I) is equivalent to maximizing p(w) · p(I|w), where p(w) is the prior probability of class w and p(I|w) is the conditional likelihood of the intensity. S is computed as follows:

S = \begin{cases} +1 & \text{if } P(w \in T \mid I) - P(w \in B \mid I) > 0, \\ -1 & \text{if } P(w \in T \mid I) - P(w \in B \mid I) < 0. \end{cases} \qquad (24)
Using Bayes' rule, we finally obtain

S = \operatorname{sign}\big( \eta_T \, p_i(I) - (1 - \eta_T) \, p_e(I) \big). \qquad (25)
The computation of the stopping factor SF. The stopping factor takes its values in the interval [0, 1]: if its value is close to 1, the propagation velocity of the contour is high; conversely, if its value is close to 0, the propagation velocity is low (the evolution stops). We picture the zones of the true borders of the object of interest, i.e., the zones where the contour must stop, as a transition bridge between the interior and the exterior of the segmentation target. We thus need to calculate, for each voxel, its probability of being on this bridge, denoted P_bridge. Let x be a voxel of the current interface and ω the estimated class of x. The posterior probability of x being on the transition bridge, given I and ω, is

P_{bridge}(x \mid I, \omega) = \begin{cases} P_{bridge}(y \in B \mid I) & \text{if } \omega \in T, \\ P_{bridge}(y \in T \mid I) & \text{if } \omega \in B. \end{cases} \qquad (26)
The voxel y is a neighbouring voxel of x in the normal direction, located outside the volume defined by θ (θ in reference to the GFMM). Thus, if x is more likely to be inside the object to be segmented (i.e., if ω ∈ T), then the posterior probability of x being on the transition bridge is the probability of y being located outside the object to be segmented. Under the assumption that the intensity values are independent variables, if we denote by I' the intensity of the voxel y, we have

P(y \in B \mid I) = P(y \in B \mid I'). \qquad (27)
Bayes' rule then leads to

P(y \in B \mid I') = \frac{P(y \in B) \cdot P(I' \mid y \in B)}{P(I')}, \qquad (28)

and equation (26) can then be written as

P_{bridge}(x \mid I, \omega) = \begin{cases} \dfrac{(1 - \eta_T)\, p_e(I')}{\eta_T\, p_i(I') + (1 - \eta_T)\, p_e(I')} & \text{if } \omega \in T, \\[2ex] \dfrac{\eta_T\, p_i(I')}{\eta_T\, p_i(I') + (1 - \eta_T)\, p_e(I')} & \text{if } \omega \in B. \end{cases} \qquad (29)
We now return to the GFMM. To calculate P_bridge, for each point x belonging to the front F (x ∈ F_+ or x ∈ F_-), we detect the set of points EP = { x' : x' ∈ V(x) and θ(x) = -θ(x') }. If EP contains more than one element, we choose the x' ∈ EP verifying

\begin{cases} P_e(I') > P_e(I''), \ \forall q \in EP, & \text{if } P_i(I) > P_e(I), \\ P_i(I') > P_i(I''), \ \forall q \in EP, & \text{if } P_i(I) < P_e(I), \end{cases} \qquad (30)
where I, I' and I'' are the gray levels of the voxels x, x' and q respectively. We then calculate P_bridge(x) using formula (29). The data consistency term SF at a point x of the interface is defined as a decreasing function of P_bridge(x):

SF = g_b\big( P_{bridge}(x \mid I, \omega) \big). \qquad (31)
The decreasing function g_b is given by

\forall x \in [0, 1], \quad g_b(x) = \begin{cases} 1 - 3x^2 & \text{if } x < 0.5, \\ 3(1 - x)^2 & \text{else.} \end{cases} \qquad (32)
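Putting eqs. (23), (25), (31) and (32) together, the speed at an interface voxel can be sketched as (numpy; p_bridge is assumed to be computed via eq. (29)):

import numpy as np

def g_b(x):
    return np.where(x < 0.5, 1.0 - 3.0 * x**2, 3.0 * (1.0 - x)**2)   # eq. (32)

def front_speed(p_i_I, p_e_I, eta_T, p_bridge):
    S = np.sign(eta_T * p_i_I - (1.0 - eta_T) * p_e_I)   # propagation sign, eq. (25)
    SF = g_b(p_bridge)                                   # stopping factor, eq. (31)
    return S * SF                                        # F = S * SF, eq. (23)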
4 Experiments and Results

In order to test the BGFMM for the segmentation of brain structures, several series of experiments were carried out on varied databases, and quantitative measurements were performed on a database comprising a manual segmentation used as reference (IBSR). For all series of experiments we initialized the number of classes for the FCMA method to 5, to detect the five regions: gray matter, white matter, cerebro-spinal fluid, and the two classes of voxels corresponding to the partial volumes (WM-GM) and (GM-CSF).
Fig. 2. Results of the segmentation of WM (first row), GM (second row) and CSF (third row). The left images show the 3D view of the results.
Fig. 3. Segmentation of (a) the whole brain and (b) the ventricles using the BGFMM
First, the method was applied to the segmentation of a particular tissue class (white matter, grey matter, cerebro-spinal fluid). For this purpose, we initialized the surface with a small cube located inside the tissue. An example of the results is visible in Fig. 2 (to improve the visibility of the results, a white mask was superimposed on the surface of the tissues). We also evaluated the method by computing the overlap index DSC (Dice Similarity Coefficient) between the manual segmentation of the three tissues provided with the IBSR base, considered as ground truth, and our results. The similarity values obtained for the 18 volumes of this base, for each tissue, are presented in the curves of Fig. 4(A). In another series of experiments, we segmented the whole brain and the ventricles. For the whole brain, the segmentation was initialized with a 100×80×80 cube, and the classes
Fig. 4. Values of DSC for the segmentation of: (A) white matter, grey matter, and CSF. (B) whole brain, and the ventricles.
of interest (grey matter + white matter) were automatically determined as described in Section 3.4. An example of the results is visible in Fig. 3; the quantitative results are presented in Fig. 4(B). The method gives satisfying results for the segmentation of the three brain tissues (WM, GM, CSF) over the whole base, although more or less markedly depending on the subject. The brain and the ventricles, in particular the sulci of the brain and the contours of the ventricles, are also well segmented. Still, the segmentation of the ventricles is not always perfect, as expressed by a low DSC value. This can be explained by the fact that the ventricles are structures of small size, which is reflected in the DSC values. The characteristics of the base also influence the results, because the intensity dynamics are reduced, even strongly reduced, for certain volumes of the IBSR base.
5 Conclusion

We have proposed in this paper a new approach based on the deformable model and on the Bayesian model for the segmentation of 3D medical images. This new model was applied specifically to brain MRI volumes for the segmentation of brain structures. The approach is divided into two parts. First, a preliminary stage constructs the information map. Then a deformable model, implemented with the Generalized Fast Marching Method (GFMM), evolves towards the structure to be segmented. Our contribution consists of the use and improvement of the GFMM for the segmentation of 3D images and the design of a robust evolution model based on adaptive parameters. In future work we expect to enrich this evolution model with a priori knowledge (expert knowledge, anatomical atlases, shape models, spatial relations...) to improve the performance of the method, and also to extend it to more difficult and complicated applications.
References
1. Bezdek, J.C.: A Convergence Theorem for the Fuzzy ISODATA Clustering Algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence 2(1), 1–8 (1980)
2. Carlini, E., Falcone, M., Forcadel, N., Monneau, R.: Convergence of a Generalized Fast Marching Method for a non-convex eikonal equation (2007)
3. Caselles, V., Kimmel, R., Sapiro, G.: Geodesic active contours. International Journal of Computer Vision 22(1), 61–79 (1997)
4. Chen, X., Teoh, E.K.: 3D object segmentation using B-Surface. Image and Vision Computing 23(14), 1237–1249 (2005)
5. Forcadel, N.: Comparison principle for the generalized fast marching method. ENSTA, March 20 (2008)
6. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active contour models. International Journal of Computer Vision 1(4), 321–331 (1987)
7. Lynch, M., Ghita, O., Whelan, P.F.: Left-ventricle myocardium segmentation using a coupled level-set with a priori knowledge. Computerized Medical Imaging and Graphics 30, 255–262 (2006)
8. Malladi, R., Sethian, J.A., Vemuri, B.C.: Shape modelling with front propagation: a level set approach. IEEE Transactions on Pattern Analysis and Machine Intelligence 17(2), 158–175 (1995)
9. Osher, S., Sethian, J.A.: Fronts propagating with curvature dependent speed: Algorithms based on Hamilton-Jacobi formulations. Journal of Computational Physics 79, 12–49 (1988)
10. Sethian, J.: Level set methods and fast marching methods: Evolving interfaces in computational geometry, fluid mechanics, computer vision and material science. Cambridge University Press, Cambridge (1999)
11. Xiao, D., Sing, W., Charles, N., Tsang, B., Abeyratne, U.R.: A region and gradient based active contour model and its application in boundary tracking on anal canal ultrasound images. Pattern Recognition 40(12), 3522–3539 (2007)
Domain-Specific Modeling as a Pragmatic Approach to Neuronal Model Descriptions

Ralf Ansorg¹ and Lars Schwabe²

¹ Technische Universität Berlin, Dept. of Electrical Engineering and Computer Science, 10623 Berlin, Germany
² Universität Rostock, Dept. of Computer Science and Electrical Engineering, Adaptive and Regenerative Software Systems, 18051, Germany
Abstract. Biologically realistic modeling has been greatly facilitated by the development of neuro-simulators, and the development of simulatorindependent formats for model exchange is the subject of multiple initiatives. Neuronal systems need to be described at multiple levels of granularity, and compared to other such multi-level systems they also exhibit emergent properties, which are best described with computational and psychological terminology. The links between these levels are often neither clear in terms of concepts nor of the underlying mathematics. Given that modeling and simulation depends on explicit formal descriptions, we argue that rapid prototyping of model descriptions and their mutual relations will be a key to making progress here. Here we propose to adapt the paradigm of domain-specific modeling from software engineering. Using the popular Eclipse platform, we develop the modular and extensible NeuroBench1 model and showcase a toolchain for code generation, which can also support, mediate between, and complement ongoing initiatives. This may kick-start the development of a multiplicity of model descriptions, which eventually may lead to ontologically sound multi-level descriptions of neuronal systems capturing neuronal, computational, and even psychological and social phenomena.
1 Introduction
Publications of simulation studies of neuronal systems need to contain enough details about the model and its parameters in order to allow for re-implementing the model, but due to the high level of detail such re-implementations are often tedious and time-consuming. Sharing models in terms of scripts for established simulators like NEURON, GENESIS or NEST has been a major step towards model exchange. Recently, these efforts have been extended by the development of simulator-independent model descriptions like generating models via Python scripts [8], renewed interest in NeuroML [3], and efforts of the International Neuroinformatics Coordinating Facility (INCF) to develop an open standard for neuronal model descriptions. Compared to systems biology, however, corresponding efforts in computational neuroscience are less developed in terms of model
¹ See http://www.neurobench.org
exchange [9], because a standard as widely accepted as the Systems Biology Markup Language (SBML) [10] is still missing. This advantage of systems biology could be partly traced back to the success of bioinformatics, which has always been a heavily data-driven enterprise depending on common standards and formats. Nowadays, most major publications of models in systems biology provide a model description in SBML, but most publications of models in computational neuroscience do not provide any machine readable description or executable code. Is this solely due to the fact that neuroinformatics matured after computational neuroscience, as compared to systems biology becoming more popular after bioinformatics? Here we argue that computational neuroscience is facing a few challenges, which are seldom considered in systems biology. For example, the notion of the “biological function” of a certain “neuronal computation” is omnipresent in most computational neuroscience publications, whereas the focus on quantitative models in systems biology so far masked the possible computational nature of many subcellular processes (but see [14]). Here we identify and focus on four desirable properties of neuronal model descriptions: First, they should be usable by domain experts like mainly experimentally working neuroscientists, not only neuroinformatics professionals. Second, they should allow for formulating models at different levels of granularity. Examples include quantitative models of synaptic plasticity [12] at a very fine-grained spatial and temporal scale, which may include receptor trafficking, binding mechanisms, or the movements of vesicles. Third, they should allow for formulating models at different levels of abstraction. Examples include phenomenological models of synaptic plasticity [13] without an explicit link to the underlying molecular mechanisms, or computational models aiming at teleological explanations based on first principles derived within domains outside biophysics like information theory, statistical inference, or decision and game theory. Finally, they should allow for formulating explicit links between different ontological spheres. Examples include the link between population activity (“electrical discharges”) and computations hypothesized to take place in sensory systems (“algorithms and computations”), or ultimately the link between the electrical neuronal activations and cognitive phenomena like perception and imagery at the phenomenal level. How to obtain and agree upon a single model description standard with these properties? On the one hand, proper tool support is an essential factor determining the “success” of any new standard. On the other hand, properties of a standard like modularity and extensibility determine the nature of the software ecosystem developing around it. For example, a large and still growing ecosystem has been developed around SBML, and the availability of toolboxes and libraries further facilitates its adoption as a standard for model exchange. Thus, for neuronal model descriptions both the tool support as well as openness and extensibility of the standard need to be considered. Here we argue that a single model description with all four properties is neither achievable nor desirable,
but aiming at many different model descriptions with explicit relations between each other is a promising and feasible approach. More specifically, we suggest to apply the software engineering methodology of domain-specific modeling (DSM) to the modeling of neuronal systems. We suggest to make use of model-to-model transformations to interface between intentionally very specific domain models and more general purpose model descriptions, which are suitable for driving popular neuro-simulators via automatic code generation. Here we develop a modular and extensible model for a general purpose neuronal model description, and describe a domain-specific “visual cortex modeling language” as a toy example to showcase model-to-model transformation and code generation using the popular Eclipse platform [7,11]. Our main contribution in this work is to demonstrate that industry-proven methods and tools can be applied in a neuroinformatics setting for the benefit of fostering a community-based development of a multitude of model descriptions, which acknowledges the multiplicity of “world views” in the neuroscience community, but due to the model-to-model transformations (and finally the code generation) are explicitly related to each other.
2 Background

2.1 Domain-Specific Modeling
DSM is a software engineering methodology, where highly specific modeling languages are used in order to generate software via fully automatic code generation directly from the models. This shall be contrasted with the use of general purpose modeling languages like the Unified Modeling Language (UML). The UML defines graphical notations for concepts, which are already present in objectoriented programming languages. While UML certainly eases communication between developers, or software manufacturers and their clients, it does not raise the level of abstraction much above the abstractions already present in the programming languages. In contrast, the so-called domain-specific languages (DSLs) of DSM make use of concepts from the particular application domain. Together with the fully automatic code generation from models expressed in such DSLs, the productivity gain of DSM compared to more classical methods using general purpose languages like UML can be dramatic [17]. Applying DSM calls for redefining the roles of the people involved in software development. A major change in DSM compared to more classical approaches is the separation between a domain expert and the author of the code generators. While the latter is usually a well-experienced programmer with much knowledge about the target platform, the former only needs to be an expert in the application domain. Most important, however, is the active involvement of the domain expert in the development process: Using a DSL, the domain expert is doing the actual development and the “programming” via the code generators as tools. For that reason, the author of the code generators is sometimes referred to as the “toolsmith,” which is the terminology we adopt here.
Since a DSL has to be developed for each particular application domain, DSM can only be successfully applied with sufficient experience in developing, continuously improving, refining, and adapting the DSLs, where proper tool support is crucial. The former methodological aspect is subject of ongoing research and calls for communicating “best practices” between practitioners. In terms of tool support, multiple commercial (see, e. g., [2,16]) and open source options are available. While individual projects and approaches may differ in terms of technology and in the specifics of the proposed methodology, a common property is their productive use of models beyond plain documentation purposes. Here we use the popular Eclipse platform in order to develop a toolchain, which builds upon well-established object-oriented paradigms, methodologies, and formats like MOF and the OMG’s MDA.
2.2 OMG’s MOF and MDA with QVT
The Object Management Group (OMG) is an international consortium, which now focuses on modeling and the corresponding standards. Currently, the main application is in model-driven architecture (MDA), which is a model-based approach for developing software systems, but the methods and standards are applicable in the broader field of engineering in general (see [4] for a prominent example). One standard proposed by the OMG is the Meta-Object Facility (MOF), which is a DSL to define metamodels, such as the metamodel for the UML. More specifically, MOF proposes a multi-layered architecture: Objects in the real world have a one-to-one correspondence to objects in the runtime environment (M0-model), which are instances of classes defined in a domain-specific model (M1-model). For example, individual book objects are entities in M0-models, whereas the class book is defined in an M1-model. The domain-specific M1-models are expressed using structures defined in the layer above (M2-model or metamodel). The most frequently used metamodel is the UML, which specifies how classes and relations between classes in M1-models are defined. For example, the UML metamodel states that classes have methods and attributes. Other metamodels than UML could specify how, for example, other M1-models like Petri-nets or tables in relational databases are defined. In a similar manner, the MOF specifies how M2-models are defined. For that reason, the MOF is an M3-model and can be viewed as a meta-metamodel or a DSL for describing metamodels. A major technological contribution of the OMG is the specification of the XML Metadata Interchange (XMI) format. It was initially developed in order to exchange metadata and hence became a de facto standard for exchanging UML models, but it can also be used in order to exchange instances of models. Note that, for example, a particular UML model is nothing but an instance of the UML metamodel. In order to use models in a productive way within MDA, the OMG specified the Query View Transformation (QVT). Using QVT, a transformation from a source to a target model can be defined and executed without user intervention. Hence, QVT is a candidate for a key tool in a DSM toolchain. However, the extent to which DSM shall make use of such model-to-model transformations
compared to a one-shot transformation from a high-level description into code is the subject of an ongoing debate [17].
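As a loose illustration, not part of the OMG stack, Python’s own object system exhibits the same M0-M3 stratification described above, which may make the layering concrete:

# M1: a domain model; the class Book defines what book objects look like
class Book:
    def __init__(self, title):
        self.title = title

# M0: a runtime object, corresponding one-to-one to a real-world entity
b = Book("Brain Informatics 2010 Proceedings")

# M2: the metamodel; Book itself is an instance of the metaclass `type`,
# which plays the role UML plays for class diagrams
assert isinstance(Book, type)

# M3: the meta-metamodel; `type` is its own instance, just as MOF is
# defined in terms of itself
assert isinstance(type, type)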
3 Proposal for the Extendable NeuroBench Model

3.1 The Core of the NeuroBench Model
Here we propose the NeuroBench model as a modular and extensible model for describing neuronal models. To be specific in terms of terminology: We propose to follow the OMG’s approach to MDA and develop an M1-model with instances of this model corresponding to descriptions of neuronal models. The core of this model is shown in Fig. 1. Root objects of model descriptions are of class ModelDesc, which contain general information such as name, version, or author. The actual model elements are contained within the root object and are of class ModelElement. Subclasses of ModelElement may define specific models like, for example, a single compartment Hodgkin-Huxley neuron model. Most important, however, is the explicit distinction between ModelElement and Factory, which itself is a ModelElement. Model-to-model transformations and code generators need to operationalize this distinction as follows: When processing a NeuroBench model description, only objects of class Factory shall trigger the instantiation of Factory.copies objects in the destination model or the generation of code, based upon the ModelElement referenced by the Factory as template. While such a constructive approach is part of other neuronal model descriptions [3] in order to allow for constructing, for example, large populations of model neurons without enumerating them, a key advantage of our model is that every Factory is also a ModelElement. This allows a very compact compositional description of neuronal models of even very large neuronal structures like populations of populations of neurons, which in a certain context may correspond to a functionally defined neocortical brain area (see Sec. 5 for examples). Besides referencing a ModelElement as a template, a Factory may or may not have associated labelings (class Labeling) for the to-be-created objects. Such labelings can be used to attach spatial positions or other non-physiological attributes to the to-be-created objects, which can be used when establishing connections between them. As to whether such a labeling shall be created for each processing of a Factory (like randomly assigning spatial positions to each created model neuron) or shared between multiple invocations for object creation is represented in the isGlobal attribute. A Factory as well as a ModelElement can be compositional (CompositeFactory and CompositeModelElement, the latter not shown in Fig. 1), which serves mainly to structure larger model descriptions. Another key property of the core model is the specification of state variables of model elements. Attributes of subclasses of ModelElement shall be viewed as constants like the membrane time constant in a possible subclass IaFNeuron, which could represent an integrate-and-fire model neuron. However, the membrane potential of a model neuron is certainly a state variable, which needs to be allocated for each created model instance. Our core model separates the definition of such state variables into the specification and the declaration. For
example, a variable specification named “membrane_potential” (VarSpec.name) together with the type float (or an SI unit in a future revision of the core model) would be contained as a VarSpec in the root object, which is then available for the whole model description. Each ModelElement contains declarations (ModelElement.vars) for the state variables, which contain an Initializer. Initializers shall drive code generators such that state variables are set to initial values. Note that code generators can also make the initialization be dependent on the labelings attached to a Factory. We make use of this in the examples in Sec. 5. Connections between ModelElement objects are represented by a DirectedLink. While such a connection could correspond to a synaptic connection between model neurons, a DirectedLink could also represent a link between two compartments of a multicompartment neuron model, or a whole projection pattern between two populations of neurons. In the latter case, the source and destination objects would be of class Factory having a model neuron, i. e. a subclass of ModelElement, as the template.
Fig. 1. Core of the NeuroBench model for neuronal model descriptions. The root object of any model description is of class ModelDesc, and the ModelElements describe the individual model elements in greater detail. Particular neuron or synapse models are subclasses of ModelElement and DirectedLink, respectively. A key property of the core model is to consider a Factory of model elements as a ModelElement itself, which allows for compact compositional descriptions of even very large neuronal structures as a Factory’s template can also be a Factory. See text for a detailed explanation.
3.2 Extensions for Neurons, Synapses and Connectivity
Extending the core model with particular models for neurons and synapses is straightforward: For specialized neuron models and synapses, the ModelElement or DirectedLink class needs to be extended, respectively. The details of such subclasses are not of interest here, as they could be defined on demand using the definition of a particular target simulation platform, referring to external definitions in other standards like SBML, or using the “lowest common denominator approach” of PyNN [8]. Note that if a particular model does not call for any constants, there is also no need to subclass ModelElement or DirectedLink. While we have subclassed ModelElement into a LinearRateNeuron with threshold and gain as constants in order to describe firing rate models, we attached a state variable called Weight to a DirectedLink in order to describe the coupling between such rate-based neuron models, i. e. no subclassing of DirectedLink is necessary in such cases; for kinetic synapse models, however, one may want to store the transition rates between states of synaptic channels as constants in a proper subclass of DirectedLink. So far, the support of the NeuroBench model for connectivity is kept intentionally minimal, but it still allows for DSM with full code generation. The model defines a FullDirectedLinkFactory subclass of Factory and demands that the corresponding template of this factory is a DirectedLink. In other words, such a special factory shall trigger the creation of DirectedLink objects, which connect model elements. The core model also defines a subclass ToolsmithDefinedInitializer of Initializer with funname (of type String), as well as arg1, arg2, etc. (of type Double) as attributes. If an initializer for the state variables of such a DirectedLink (the template of a FullDirectedLinkFactory) is a ToolsmithDefinedInitializer, then the code generator shall delegate the initialization to a function funname of the desired target platform with optional arguments arg1, arg2, etc. Here, the code generator may or may not pass the values of the labelings as additional arguments. We defined all extensions in separate models, which import the NeuroBench core model (Fig. 1). In the same manner, other users can extend the core model or our extensions and develop their own code generators.
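How a code generator might operationalize a ToolsmithDefinedInitializer can be sketched as follows (Python; funname and arg1 are the attributes defined above, while the dispatch table and the example weight function are hypothetical):

import math

# Hypothetical target-platform function prepared by the toolsmith
def gaussian_weight(x_src, x_dst, sigma):
    return math.exp(-(x_src - x_dst) ** 2 / (2 * sigma ** 2))

PLATFORM_FUNCTIONS = {"gaussian_weight": gaussian_weight}

def init_link_state(initializer, labels_src, labels_dst):
    """Evaluate a ToolsmithDefinedInitializer: look up funname and pass arg1, ...
    plus the labelings of the connected elements (here: their positions)."""
    fun = PLATFORM_FUNCTIONS[initializer["funname"]]
    return fun(labels_src["position"], labels_dst["position"], initializer["arg1"])

weight = init_link_state({"funname": "gaussian_weight", "arg1": 2.0},
                         {"position": -1.0}, {"position": 0.5})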
4 An Eclipse-Based Toolchain

4.1 The Eclipse Platform
DSM claims to raise the level of productivity in software development similarly to the move from assembler to higher-level programming languages [17]. While such productivity increases depend on a few tool-independent abstractions, proper tool support is essential for DSM. While the OMG's approach to MDA is only one among other approaches (see, e.g., [6,16]), building upon OMG standards such as MOF and XMI prevents re-inventing some wheels and ensures interoperability among tools and libraries. The popular Eclipse platform makes intensive use of OMG standards, and we selected it as the ecosystem for the NeuroBench model
and toolchain. In short, Eclipse was originally developed by IBM in order to unify all its development environments on a single code base. While it is widely known as an Integrated Development Environment (IDE), it provides a whole ecosystem for software and system development purposes, now based upon the Equinox OSGi runtime. As an IDE it supports multiple target programming languages, and as a DSM infrastructure it supports any target platform.
4.2 M2M and M2T in the Eclipse Modeling Project
The Eclipse Modeling Project [1] brings together multiple Eclipse-based technologies for MDA. The NeuroBench toolchain has been set up using the M2M and M2T subprojects. The M2M subproject realizes so-called model-to-model transformations. For example, a model of decision making (defined with computational concepts) would be translated into a neuronal network model (defined with biophysical concepts) using M2M's technologies. We make use of QVT for such transformations. The M2T subproject is used for model-to-text transformations, i.e., for code generation. We make use of Xpand for these tasks. All NeuroBench software artifacts, together with a short tutorial-like introduction for applying them to neuronal modeling, can be downloaded from www.neurobench.org/publications/braininf2010.tgz as supplementary online material (SOM).
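As a flavor of what a model-to-text transformation does, the following Python sketch walks a toy factory description and emits Matlab declarations. It is not one of the actual Xpand templates (those are in the SOM); the dict-based model format is invented for this illustration only.

    # Toy stand-in for a NeuroBench Factory with one state variable.
    factory = {
        "name": "ExcNeurons",
        "copies": 191,
        "template": {"name": "ExcNeuron", "vars": [("Activity", 0.0)]},
    }

    def generate_matlab(fac):
        """Emit Matlab code declaring and initializing one vector per
        state variable of the factory's template."""
        lines = []
        n = fac["copies"]
        for var, default in fac["template"]["vars"]:
            vec = f"m{fac['name']}_{var}"
            lines += [
                f"{vec} = zeros(1, {n});",
                f"for i{fac['name']} = 1:{n}",
                f"    {vec}(i{fac['name']}) = {default};",
                "end",
            ]
        return "\n".join(lines)

    print(generate_matlab(factory))

The output resembles the first lines of Listing 1 below; the real templates additionally handle labelings, DirectedLink weight matrices, and nested factories.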
5 Examples

5.1 Recurrently Connected Rate-Based Neurons
Let us consider a first example in order to demonstrate how neuronal models can be expressed using the NeuroBench model. Here we consider the structural description of a simple recurrent network model with, say, N = 191 excitatory neurons (Fig. 2a). Each neuron has an activity state variable, and with each neuron i a one-dimensional position xi is associated. The N = 191 neurons shall be equally spaced between the positions −10 and 10. The neurons shall be recurrently connected, where the weight of the synaptic connection between any two neurons is computed by a user-defined function, which may depend on the positions of the neurons. Fig. 2b shows an instance of the corresponding NeuroBench model as an object diagram. We defined a LinearRateNeuron as a subclass of ModelElement, and a LinspaceLabeling as a subclass of Labeling. The former represents a rate-based neuron with a threshold-linear activation function, the latter the particular labeling strategy of equal spacing. The corresponding XMI file storing these objects has been created and edited with an editor generated from the NeuroBench model using the Eclipse Modeling Framework (see SOM). First, note the strict usage of Factory objects, each having their own template. While the CompositeFactory referenced by the root object serves to structure the model description and does not have a template itself, the two contained Factory objects reference a ModelElement, i.e., a LinearRateNeuron and a SynapticConnection (subclassed from DirectedLink).
(Fig. 2 is not reproduced here. Panel a) sketches the network of N = 191 excitatory neurons; panel b) is an object diagram with Ex01:ModelDesc (name = “ex01”), MainFac:CompositeFactory (name = “mainfac”), ExcNeurons:Factory (name = “ExcNeurons”, copies = 191) with template ExcNeuron:LinearRateNeuron (name = “ExcNeuron”, gain = 1.0, th = 1.0), ExcToExcPrj:FullDirectedLinkFactory (name = “ExcToExcPrj”) with template ExcToExc:SynapticConnection (src = dst = ExcNeuron), the VarSpecs Weight and Activity (type = float), the labeling Position:LinspaceLabeling (name = “Position”, min = -10, max = 10, type = float, isGlobal = true), two VarDecls, a default Init:Initializer (defaultValue = 0.0), and MyWeightInitFun:ToolsmithDefinedInitializer (name = “myWeightInitFun”, arg1 = 3.0, arg2 = 1.0, arg3 = 19.0, defaultValue = 0.0).)

Fig. 2. Example of a network with N = 191 recurrently connected neurons. a) Illustration of the network model with neurons as circles and recurrent synaptic connections. b) Instance of the corresponding NeuroBench model as an object diagram. Not shown is the containment of the two VarSpecs in the ModelDesc.
Second, note the separation of the variable specifications and declarations. Each of the ModelElement objects declares a state variable (“Activity” and “Weight”). Third, note that the initialization of the state variables is delegated to Initializer objects. In order to showcase code generation within the proposed DSM approach, we set up so-called Xpand templates in order to transform such a NeuroBench model into proper Matlab code. Listing 1 shows the definition and initialization of the state variables, and a different set of templates was used to generate C code (see SOM). Variables of ModelElement and DirectedLink became vectors and matrices, respectively. Code generation for neuro-simulators would be even more straightforward, as they already provide many domain-specific abstractions for general purpose neuronal modeling. In a second example we extended the model shown in Fig. 2b in order to showcase how small changes to a NeuroBench model description, together with the very same code generation templates, can be used in order to model large neuronal structures such as a sheet of recurrently connected orientation hypercolumns in a model of orientation tuning and contextual effects in primary visual cortex (V1) (see, e.g., [15]).
In addition to another input state variable, each neuron has two associated labels: an orientation bias corresponding to a “preferred orientation” of an orientation-selective V1 neuron, and a position. Most important, however, is the creation of neurons via a factory of factories. Here, the Factory “ExcSheet” (with ExcSheet.copies=25) has another Factory “ExcNeurons” (with ExcNeurons.copies=80) as a template. The FullDirectedLinkFactory “ExcToExcHCs” has ExcToExcHCs.src = ExcToExcHCs.dst = ExcSheet. It connects a factory of factories with another factory of factories, which is possible due to Factory being a ModelElement itself. As toolsmiths we exploited this in the code generation templates and generated the code in Listing 2. Note that the values of the labels are passed to the user-defined function initializing the weights.

Listing 1. Matlab code generated from the first example

    mExcNeurons_Activity = zeros(191);
    for iExcNeurons = 1:191
        mExcNeurons_Activity(iExcNeurons) = 0.0;
    end
    mExcToExcPrj_Weight = zeros(191, 191);
    for iDst_ExcNeurons = 1:191
        vLabelsDst = [];
        vLabelsDst = [vLabelsDst vExcNeurons_Position(iDst_ExcNeurons)];
        for iSrc_ExcNeurons = 1:191
            vLabelsSrc = [];
            vLabelsSrc = [vLabelsSrc vExcNeurons_Position(iSrc_ExcNeurons)];
            mExcToExcPrj_Weight(iDst_ExcNeurons, iSrc_ExcNeurons) = ...
                myWeightInitFun(vLabelsDst, vLabelsSrc, 3.0, 1.0, 19.0);
        end
    end
Listing 2. Matlab code generated from the second example

    N_EXCNEURONS = 80;
    % Labelings for 'ExcNeurons'
    vExcNeurons_OrientationBias = ...
        linspace(-90.0, 90.0, N_EXCNEURONS);
    vExcNeurons_Position = ...
        linspace(23.0, -42.0, N_EXCNEURONS);
    % State variables for 'ExcNeurons'
    mExcSheet_ExcNeurons_Activity = zeros(25, 80);
    mExcSheet_ExcNeurons_Input = zeros(25, 80);
    for iExcSheet = 1:25
        for iExcNeurons = 1:80
            mExcSheet_ExcNeurons_Activity(iExcSheet, iExcNeurons) = 0.0;
            mExcSheet_ExcNeurons_Input(iExcSheet, iExcNeurons) = 0.0;
        end
    end
    % State variables for 'mExcToExcHCs'
    mExcToExcHCs_Weight = zeros(25, 80, 25, 80);
    for iDst_ExcSheet = 1:25
        for iDst_ExcNeurons = 1:80
            vLabelsDst = [];
            vLabelsDst = [vLabelsDst]; % for 'ExcSheet'
            vLabelsDst = [vLabelsDst ...
                vExcNeurons_OrientationBias(iDst_ExcNeurons) ...
                vExcNeurons_Position(iDst_ExcNeurons)]; % for 'ExcNeurons'
            for iSrc_ExcSheet = 1:25
                for iSrc_ExcNeurons = 1:80
                    vLabelsSrc = [];
                    vLabelsSrc = [vLabelsSrc]; % labels for 'ExcSheet'
                    vLabelsSrc = [vLabelsSrc ...
                        vExcNeurons_OrientationBias(iSrc_ExcNeurons) ...
                        vExcNeurons_Position(iSrc_ExcNeurons)]; % for 'ExcNeurons'
                    mExcToExcHCs_Weight( ...
                        iDst_ExcSheet, iDst_ExcNeurons, ...
                        iSrc_ExcSheet, iSrc_ExcNeurons) = ...
                        myOriDiffInit(vLabelsDst, vLabelsSrc, 10.0, 0.0, 0.0);
                end
            end
        end
    end
5.2 Model-to-Model Transformation
The second example suggests that the use of a Factory object as a template of another Factory, together with CompositeFactory objects for structuring larger models, may be sufficient for building and exchanging models of large neuronal structures. However, while modeling and code generation using the NeuroBench model is already powerful and, due to customizable Xpand templates, flexible, true DSM shall raise the level of abstraction by making use of highly specific DSLs. In order to highlight this distinction, we set up a toy DSL for modeling in the visual system (called “VDSL”), which contains only a single construct: a class OrientationHypercolumn with the single attribute cols to define the number of orientation columns within an orientation hypercolumn. This class is not related to any class in the NeuroBench model, because the latter is intended for general purpose modeling, whereas the VDSL shall be used by a visual system domain expert. More specifically, QVT scripts (see SOM) translate each instance of an OrientationHypercolumn object into a Factory with a LinearRateNeuron as a template, proper recurrent synaptic connections, etc.
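The following Python sketch conveys the idea of this model-to-model step; the real transformation is written in QVT (see SOM), and the dict-based target representation and the labeling range are invented for this illustration.

    def hypercolumn_to_factory(cols):
        """Map a VDSL OrientationHypercolumn (single attribute: cols)
        onto NeuroBench-style concepts: a Factory of rate neurons with
        an orientation labeling plus a recurrent link factory."""
        neurons = {
            "class": "Factory",
            "name": "OrientationColumns",
            "copies": cols,
            "template": {"class": "LinearRateNeuron", "gain": 1.0, "th": 1.0},
            "labelings": [{"class": "LinspaceLabeling",
                           "name": "PreferredOrientation",
                           "min": -90.0, "max": 90.0}],
        }
        recurrent = {
            "class": "FullDirectedLinkFactory",
            "name": "RecurrentConnections",
            "src": neurons, "dst": neurons,
            "template": {"class": "SynapticConnection"},
        }
        return {"class": "CompositeFactory",
                "factories": [neurons, recurrent]}

    print(hypercolumn_to_factory(cols=16)["factories"][0]["copies"])  # 16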
6 Discussion
Here we argued for adopting DSM for developing neuronal model descriptions. We developed the NeuroBench model for general purpose neuronal modeling, and we presented a toolchain based upon the popular Eclipse platform. However, in which ways could the promise of DSM for increased productivity carry over to neuronal modeling, and in which ways is the DSM approach different from ongoing initiatives? Current initiatives are facing the major challenge of converging on common model descriptions and formats, which involves proper abstractions for general purpose modeling. Our NeuroBench model is yet another general purpose model description. We argue that this multiplicity of model descriptions is beneficial for neuroinformatics at this point, because only such a grass-roots approach to model descriptions will ultimately yield widely accepted and usable standards. In particular, we selected the OMG's approach to MDA as a conceptual basis, and the popular Eclipse platform as the infrastructure. In other words, our approach is inherently open, because mediating between different model descriptions becomes just another model-to-model or model-to-text transformation. For example, the NeuroML initiative provides XML Schema definitions, which can be readily imported as a model into the Eclipse Modeling Project, and hence be source or target of model-to-model transformations. Another example is PyNN, which can be considered a prime target for model-to-text transformations. Thus, the promise of DSM could be fulfilled in neuronal modeling as long as neuroscience modelers finally address a currently underappreciated but intellectually stimulating enterprise: meta-modeling and the explicit formulation of links between different levels of abstraction. We also need to point out that the NeuroBench model has been constructed as a minimal model located at a strategic position within the hierarchy of
abstractions. It covers only structural aspects, but so far no explicit description of the dynamics of a model. We argue that such aspects could also be delegated to the toolsmith rather than being stated explicitly in a model description (although the latter would be desirable). In other words, we intentionally ignore the definition of the operational semantics of a model, and in this sense our approach is pragmatic. Future extensions of the NeuroBench model and transformations to other more fine- and coarse-grained descriptions, as well as connections to modeling approaches rooted in psychology such as, for example, ACT-R [5], will show if the power of DSM can be unleashed in neuronal modeling. In other words, probably the best way to evaluate and compare our approach is to measure increases in productivity for formulating and simulating models spanning multiple levels of granularity and abstraction.
References

1. The Eclipse Modeling Project, http://www.eclipse.org/modeling/
2. MetaCase, http://www.metacase.com/
3. NeuroML, http://www.neuroml.org/
4. Open System Engineering Environment (OSEE), http://www.eclipse.org/osee/
5. Anderson, J.R., Byrne, M.D., Douglass, S., Lebiere, C., Qin, Y.: An integrated theory of the mind. Psychological Review 111(4), 1036–1050 (2004)
6. Czarnecki, K., Eisenecker, U.W.: Generative Programming: Methods, Tools, and Applications. Addison-Wesley, Reading (2000)
7. Paternostro, M., Merks, E., Steinberg, D., Budinsky, F.: EMF: Eclipse Modeling Framework, 2nd edn. Addison-Wesley, Reading (2009)
8. Eppler, J.M., Kremkow, J., Muller, E., Pecevski, D.A., Perrinet, L., Yger, P., Davison, A.P., Bruederle, D.: PyNN: a common interface for neuronal network simulators. Front. Neuroinform. 2 (2008)
9. De Schutter, E.: Why are computational neuroscience and systems biology so separate? PLoS Comput. Biol. 4(5), e1000078 (2008)
10. Hucka, M., et al.: The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 19(4), 524–531 (2003)
11. Gronback, R.C.: Eclipse Modeling Project: A Domain-Specific Language (DSL) Toolkit. Addison-Wesley, Reading (2009)
12. Kotaleski, J.H., Blackwell, K.T.: Modelling the molecular mechanisms of synaptic plasticity using systems biology approaches. Nat. Rev. Neurosci. 11(4), 239–251 (2010)
13. Morrison, A., Diesmann, M., Gerstner, W.: Phenomenological models of synaptic plasticity based on spike timing. Biol. Cybern. 98(6), 459–478 (2008)
14. Regev, A., Shapiro, E.: Cellular abstractions: Cells as computation. Nature 419(6905), 343 (2002)
15. Schwabe, L., Obermayer, K., Angelucci, A., Bressloff, P.C.: The role of feedback in shaping the extra-classical receptive field of cortical neurons: a recurrent network model. J. Neurosci. 26(36), 9117–9129 (2006)
16. Simonyi, C.: Intentional software, http://intentsoft.com/
17. Tolvanen, J.-P., Kelly, S.: Domain-Specific Modeling: Enabling Full Code Generation. Wiley, Chichester (2008)
Guessing What’s on Your Mind: Using the N400 in Brain Computer Interfaces

Marijn van Vliet, Christian Mühl, Boris Reuderink, and Mannes Poel

University of Twente, Human Media Interaction, Enschede 7522 NB, NL
[email protected]

Abstract. In this paper, a method is proposed for using a simple neurophysiological brain response, the N400 potential, to determine a deeper underlying brain state. The goal is to construct a BCI that can determine what the user is ‘thinking about’, where ‘thinking about’ is defined as being primed on. The results indicate that a subject can prime himself on a physical object by actively thinking about it during the experiment, as opposed to being shown explicit priming stimuli. Probe words are presented that elicit an N400 response whose amplitude is modulated by the associative relatedness of the probe word to the object the user has primed himself on.
1 Introduction
Brain Computer Interfaces (BCI) are devices that let a user control a computer program without any physical movement. A BCI measures the activity within the brain directly, interprets it and sends a control signal to a computer. By actively or passively changing his brain activity, the user can send different control signals and, by doing so, operate the system. The effectiveness of a BCI depends highly on the ability to measure relevant processes within the brain and on the performance of the classification of the signal.
1.1 Low Level Brain Responses versus High Level Cognitive Processes
Without significant progress in recording and signal processing technology, brain-computer interfaces (BCI) that rely on electroencephalography (EEG) recordings only have access to basic, neurophysiological responses. Examples include the P300 response, event related (de)synchronization (ERD/ERS) and steady state visually evoked potentials (SSVEP). These directly measurable phenomena can be regarded as low level responses. They are manifestations of higher level cognitive processes that are more complex and cannot be directly measured, such as object recognition, intention of movement and visual processing. When measuring the low level responses, most information about the specifics of the higher level processing is lost. However, the low level responses can give insight into the higher level brain processes by using probes in a search scheme. By using probes, different possibilities for the high level brain state can be tested until the correct one is found.
One example of existing BCIs that try to determine a deeper underlying brain state are those that try to determine the memories of the user, usually by exploiting the P300 effect [9]. The P300 potential is linked to an oddball task: when multiple stimuli are presented to the subject, the task-related interesting stimulus will elicit a bigger P300 than an uninteresting one, due to a response triggered by increased attention upon recognition. This method allows the user both to consciously control which stimulus to select by focusing on a task and to be unconsciously probed by stimuli that trigger a recognition response. The unconscious probes could for instance be used to determine whether or not a subject looked into a box containing some objects. He would be instructed to look for images of birds in a collection of various photographs. Photographs of birds and those of objects in the box both elicit an enlarged P300 potential in relation to irrelevant photographs [1]. In this paper, the N400 potential is used as the low level brain response in a search scheme that uses probes to determine the high level brain state. While the P300 is related to attention, the N400 has been associated with semantic processing [6].
1.2 N400
The N400 potential was first discovered by Kutas et al. [5], who were analyzing the Event Related Potential (ERP) of subjects who were reading sentences. They studied the effect of adding words that did not make sense given the preceding ones in a sentence, in order to get an insight into brain activity during semantic parsing. To this end, a set of sentences was created of which half the sentences ended in a semantically congruent word (e.g. I drink coffee with milk and sugar) and half the sentences ended incongruently (e.g. I drink coffee with milk and socks).
(Figure not reproduced: grand-average ERPs at channel PO3 for the correct and incorrect conditions, with amplitude from −6 to 6 on the vertical axis and time in ms on the horizontal axis, one word per second over 0–7000 ms: I, drink, coffee, with, milk, and, sugar/socks.)

Fig. 1. Plot of an ERP recorded from a subject who was shown seven-word sentences. When the last word is shown, there is a distinctive difference between a word that lies in the line of expectation (labeled as correct) and a word that does not (labeled as incorrect).
The grand average ERPs of both classes, recorded at position PO3, are shown in fig. 1 for one subject. Each second, a word is flashed on the screen and a recognizable series of ERP components reappears after the onset of each word. One of these components, which appears around 400 ms after the word is shown, changes amplitude when the word is unrelated to the rest of the sentence and is called the N400 potential. The N400 has been linked to the concept of priming. Priming is an improvement in performance in a perceptual or cognitive task, relative to an appropriate baseline, which is caused by previous, related experience. In semantic priming, the ability to decode a stimulus carrying meaning is improved by previous exposure to a stimulus with a related meaning. In the experiment described above, the subject is primed on the first six words of a sentence, and the seventh word is either semantically related to the prime or not. It was later discovered that the N400 effect not only occurs in sentences: a whole range of stimuli can be used. The underlying strategy is to first show a prime stimulus and, a short time after, show a probe stimulus, where the prime and probe can for example be word–word, image–word, image–image or even sound–word pairs [2][10]. There are two competing theories as to the cause of the N400 potential [12]. The integration view states that the N400 is caused by a difference in the difficulty of integrating the symbol in a context. This theory is obviously in line with the results of the experiments of Kutas et al., where the last word had to be integrated with the rest of the sentence. The results with word pairs can be explained by regarding the first word as a context that the second word has to be integrated into. The lexical view states that the N400 is caused by a difference in the difficulty of long term memory access. According to the spreading activation model, the activation of a symbol from memory causes nearby symbols to pre-activate, making subsequent access of these symbols easier. Findings in [8], in which fMRI is used, and [7], in which MEG is used to localize the N400 effect, suggest that the effect is primarily due to facilitated memory access, which makes a case for the lexical view.
1.3 Goals of the Present Research
Like the P300, the N400 potential can give insight into high level brain processes by using probes. In this paper, the possibility is explored of using the N400 potential to determine which one out of several possible objects the user is thinking of, for example to differentiate between the user thinking about a coffee mug or a tomato. The advantage of using the N400 over the P300 is that the N400 effect will not only occur on stimuli that correspond completely with the object the subject is thinking of, but also on stimuli that are merely closely related. This could in the future allow a system to deploy a binary search algorithm in order to find the target object, allowing for a much larger choice for the subject.
For instance, the system can first try to determine whether the object is a living organism or not, before descending down the search tree, playing a BCI version of “20 questions” (a game in which the player is allowed 20 yes-or-no questions to determine what the opponent is thinking of; see also http://www.20q.net) with the user. When showing a probe word, the N400 effect can be used to detect whether the subject was primed on a stimulus related to the probe word or not. Current N400 research does not leave any choice to the subject as to which stimulus he will be primed on, so the prime is always known in advance. If, however, the subject is allowed to choose his prime and this choice can be detected, the N400 potential will be a useful feature to use in a BCI. This choice presents a problem when showing the priming stimulus: how can a system know which stimulus the subject wants to be primed on? Showing the stimuli corresponding to all the choices will most likely prime the subject for all of them, disallowing any choice. In this paper, a method is investigated in which the subject is not presented with a priming stimulus, but must achieve the proper priming by actively thinking about a physical object. When a user has primed himself on an object, showing probe words corresponding with all the possibilities and examining the N400 potentials elicited by them may enable an automated system to determine which object the user was thinking about. The ambiguous term ‘thinking about’ is now defined as ‘being primed on’. Since the priming effect occurs for many different types of stimuli, such as words, images and sounds, the hypothesis that a subject can prime himself by being told to think about an object is considered plausible and is evaluated with an experiment. The goal of the experiment is to determine whether a subject can prime himself on an object without being shown a priming stimulus.
2 Method
Probe words have been prepared that correspond to one of two possible primes (e.g. a book or a bottle). The problem is how to convey the choice of prime to the subject: telling him the choices may cause him to be primed on all of them. In the experiment, the subject was therefore not given a choice. He was given a physical object, such as a book, mug or tomato, to hold. Physical objects were chosen so as not to limit the subject's mind to a visual, auditory or lingual stimulus, but to allow him to choose for himself how to go about thinking about the object. In order to promote the latter, two auditory beeps are played before showing the probe word. The subject is instructed to close his eyes on the first (low pitched) beep and concentrate on the object, and to open his eyes on the second (high pitched) beep and look at the screen, where the probe word appears.
Participants
The experiment was performed on three participants aged between 23 and 28, all of which were males and native speakers of Dutch. All of them were right handed. They were placed in a comfortable chair in front of a desk with a computer screen and did not leave the chair for the duration of the experiment. 1
A game in which the player is allowed 20 yes-or-no questions to determine what the opponent is thinking of. See also http://www.20q.net
2.2 Design
Each participant completed one session which consisted of three blocks. The procedure during a block is as follows:

1. An object is given to the subject, allowed to be held, and placed before him on the table.
2. The instructor leaves the room.
3. 100 trials are performed: 50 trials matching the object shown to the subject, 50 matching a different object, not shown to the subject.
4. The instructor enters the room.
5. 5–10 minute break.

Fig. 2 summarizes the way each trial was presented to the subject. Prior to each word, a low beep was heard. The subject was instructed to close his eyes when hearing this beep and think about the shown object. Two seconds later, a high beep followed. The subject was instructed to open his eyes when hearing this beep. A fixation cross appears and the subject's eyes are drawn to the center of the screen. This closing and opening of the eyes produces a large EOG (electro-oculography: the electric current produced by eye movements that shows up in the EEG recording) artifact, which is dealt with in the signal processing step later on. After 2 seconds, to prevent overlap with the artifact, a probe word replaces the fixation cross for 200 ms. The next beep would sound 1800 ms after that, prompting the subject to close his eyes again for the next trial.
(Fig. 2 is a timeline diagram, not reproduced here: a low beep at t = 0 s starts the eyes-closed period, a high beep at t = 2 s is followed by a fixation cross, the probe word appears at t = 4 s, and the fixation cross returns until t = 6 s.)

Fig. 2. Timeline of a single trial. Two beeps sound, one indicating the subject to close his eyes and think about the object, one indicating him to open his eyes.
2.3 Procedure

1. The subject is seated in a comfortable chair in front of a computer screen.
2. The subject is told about the goal of the experiment and given instructions on the procedure.
3. The subject is fitted with electrodes during the explanation of the experiment.
4. 10 test trials are presented to acquaint the subject with the procedure.
5. Three blocks are performed.
6. End of experiment.
2.4 Stimuli
The experiment consists of 3 blocks. In each block, the subject is given a physical object; 10 words are shown that match the given object and 10 words are shown that match a different object. Each word was included in the randomized sequence 5 times, in order to average the recordings of the 5 repetitions later on. The words that are shown to the subject have to be closely related to one object, but not at all related to the other objects. To achieve this, the Dutch Wordnet [13] was used. This wordnet is a graph with Dutch words as nodes and semantic relations drawn between them as edges, such as synonyms, hyponyms, opposites, etc. The 6 physical objects are chosen to be exemplars which can be described with one word (e.g. a book, a mug, a tomato, etc.), hereafter called the object name. The goal is to associate 10 Dutch words with each object name. Candidate words are generated by traversing the Dutch Wordnet, starting at the object name $o_1$ (for instance ‘book’) and spreading outwards in a breadth-first fashion. For each word that is encountered in this fashion, the distance to each of the 6 object names is calculated and a score is computed:

$$s = \sum_{i=2}^{6} d(w, o_i) - 5\,d(w, o_1) \qquad (1)$$
where $s$ is the score, $w$ is the word under consideration, $o_i$ is one of the object names, with $o_1$ being the object name from which the search was started, and $d(x, y)$ is the distance function between two words: the number of edges in the shortest path between them, which is used as a measure of the relatedness of the words. Words with a high score will be close to the object we are searching from, but distant from any of the other objects. A search like this will generate many words that are either uncommon or not necessarily associatively related to the object. For each object name, a list of 30 words is created by sorting all the generated words by score, highest to lowest, and manually taking the first 30 words that were judged by the instructor to be strongly associatively related to the object. This subjective selection improves the effectiveness of the dataset, because the distance function used takes only purely semantic relatedness into account, whereas the N400 effect is also attributed to associative relatedness between words. The total list of 180 words (30 words times 2 objects times 3 blocks) was presented to each subject at least a week before the experiment. The subject was asked to score each word in relation to a photograph of the corresponding object. A 5 point scale was used, 1 being not related at all, 5 being practically the same thing. For each object, the 10 words that the subject scored highest were chosen to be used in the experiment. When choosing between words with the same score, the scores given by the other subjects were taken into account and the words with a high score assigned by all subjects were favored. For example, fig. 3 lists the words that the first subject scored highest in relation to an object. The full list of words used during the experiment is included in appendix A.
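A hedged sketch of this candidate generation follows, assuming the wordnet is available as an undirected networkx graph with words as nodes; the graph construction, the function name, and the distance cap are not part of the original study.

    import networkx as nx

    def score_candidates(G, object_names, o1, max_dist=5):
        """Implement Eq. (1): s = sum_{i=2..6} d(w, o_i) - 5 * d(w, o_1),
        where d is the shortest-path edge count in the wordnet graph G.
        Returns (word, score) pairs sorted from the highest score down."""
        # Breadth-first distances from every object name, capped at max_dist.
        dist = {o: nx.single_source_shortest_path_length(G, o, cutoff=max_dist)
                for o in object_names}
        others = [o for o in object_names if o != o1]
        scores = {}
        for w in dist[o1]:                      # words reachable from o1
            if w in object_names:
                continue
            # Distances beyond the cap are floored at max_dist + 1; this is a
            # practical shortcut of the sketch, since far-away words are good.
            scores[w] = (sum(dist[o].get(w, max_dist + 1) for o in others)
                         - 5 * dist[o1][w])
        return sorted(scores.items(), key=lambda kv: -kv[1])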
Dutch original: woordenboek, bibliotheek, bijbel, bladzijde, hoofdstuk, paragraaf, uitgever, verhaal, auteur, kaft.
English translation: dictionary, library, bible, page, chapter, section, publisher, story, author, cover.

Fig. 3. left: a photograph of a sample object that was shown to the subjects (not reproduced here). right: the 10 words marked as most related to the object in the photograph by the first subject.
2.5 Method of Analysis
A schematic overview of the data analysis method is presented in fig. 4. Numbers in the text below correspond to the blocks in the diagram. The recordings were made with a 32 channel EEG cap and 3 external electrodes placed in the middle of the forehead and below each eye (1). All data was recorded with a sample rate of 256 Hz, average referenced (2) and bandpass filtered between 0.3 Hz and 30 Hz (3).
(Fig. 4 is a block diagram, not reproduced here. Its recoverable blocks: (1) 32 EEG and 3 EOG channels at 256 Hz, (2) common average referencing (CAR), (3) a 0.3–30 Hz bandpass, (4) trial extraction from −3 s to 2 s, (5) averaging, (6) EOG correction, (7) baselining on −1 s to 0 s, (8) resampling to 100 Hz, a 50/50 split into correct and incorrect trials, ERP computation (10), and t-tests (11).)

Fig. 4. Diagram summarizing the data analysis process. Each block has a number which corresponds to the numbers in the text.
Trials were extracted on the interval t = −3 s to t = 2 s, relative to the onset of the probe word at t = 0 s (4). This includes the moment the subject opens his eyes until the moment any N400 effects should no longer be visible. Each probe word occurred 5 times in the presentation sequence. The corresponding trials were averaged (5) to form the final trials. These averaged trials were filtered with an automated method for reducing EOG effects [11], which involves calculating a mixing matrix between the recorded EOG signal and the recorded EEG (6). Using this matrix, the EOG can be subtracted from the EEG, reducing the effect of eye movements in the data, which are severe but predictable in this experiment, since the subject was instructed to close and open his eyes before being shown a probe word. Application to the average of 5 trials instead of unaveraged data increases the effectiveness of this filtering [3]. Each trial was baselined on the interval t = −1 s to t = 0 s (7) and resampled to 100 Hz (8) to reduce the number of data points. For each class, an Event Related Potential (ERP) plot was created (10). Student's t-tests were performed on each 10 ms segment (11) between both classes to determine the statistical significance of any differences between the two classes.
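For concreteness, a minimal NumPy/SciPy sketch of steps (3), (4), (7), (8) and (11) follows. It is an assumption-laden re-expression of the pipeline, not the authors' code: the common average reference, the trial averaging and the EOG correction of [11] are omitted, and the array shapes are invented.

    import numpy as np
    from scipy.signal import butter, filtfilt, resample
    from scipy.stats import ttest_ind

    FS = 256  # recording sample rate in Hz

    def bandpass(eeg, lo=0.3, hi=30.0, fs=FS):
        """Step (3): 0.3-30 Hz bandpass; eeg is (channels, samples)."""
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        return filtfilt(b, a, eeg, axis=-1)

    def extract_trial(eeg, onset, fs=FS):
        """Step (4): cut t = -3 s .. 2 s around a word onset (sample index)."""
        return eeg[:, onset - 3 * fs: onset + 2 * fs]

    def baseline_and_resample(trials, fs=FS):
        """Steps (7) and (8): baseline on t = -1 s .. 0 s, resample to 100 Hz.
        trials is (n_trials, channels, samples) with onset at sample 3 * fs."""
        t0 = 3 * fs
        base = trials[:, :, t0 - fs: t0].mean(axis=-1, keepdims=True)
        trials = trials - base
        return resample(trials, trials.shape[-1] * 100 // fs, axis=-1)

    def segment_pvalues(correct, incorrect):
        """Step (11): t-test per channel per 10 ms segment (one sample
        at 100 Hz) between the two classes of trials."""
        return ttest_ind(correct, incorrect, axis=0).pvalue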
3 Results
The resulting ERP plots are presented in fig. 5. It can be seen that, starting around 400 ms, the waveforms diverge between the classes for all subjects. From the topo plots it can be seen that the location of the N400 effect differs for each subject. This could be explained by the fact that the subjects employed different strategies for concentrating on an object, ranging from visualizing it to thinking about related symbols. The ERP plots show that there is a dipole effect: the N400 is a positive deflection in relation to the baseline at frontal/right positions and a negative deflection at occipital/left positions. Variation of the N400 amplitude and timing is to be expected between subjects, as is also the case in other studies (see for an example [4], figure 16). There are also differences in the duration of the effect: in the recordings for subject 3, the effect is measurable for more than a second, up until the subject closes his eyes again, while subject 1 only shows the effect for a few hundred milliseconds. All data preprocessing steps are performed on the dataset as a whole, except for the baselining. It is possible that the separate calculation of the baseline for each class creates an artificial difference between the ERPs. This could for instance be the case when the baseline is calculated on an unstable portion of the signal containing lots of artifacts. Such periods exist in the recordings, where the subject opens and closes his eyes, causing an EOG artifact. Calculating the baseline on the wrong portion of the signal will effectively generate a random baseline value for each class. In order to rule this possibility out, the exact same data analysis was performed again, but the trials were assigned random class labels by shuffling them and assigning the first half the label correct and the last half the label incorrect. The result was that any differences between the classes that could be seen were not statistically significant and were randomly distributed over channels and time. This bestows confidence in the method of analysis.
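This shuffled-label control can be sketched in a few lines; the function below is illustrative only and would be combined with the per-segment t-test from the earlier sketch.

    import numpy as np

    def shuffled_label_control(trials, n_correct, seed=0):
        """Randomly permute the trials, label the first half 'correct'
        and the second half 'incorrect'. Re-running the per-segment
        t-tests on the result should show no systematic differences."""
        idx = np.random.default_rng(seed).permutation(len(trials))
        shuffled = trials[idx]
        return shuffled[:n_correct], shuffled[n_correct:]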
(Fig. 5 is not reproduced here. It contains one topo plot per subject and ERP traces for the correct and incorrect conditions, plotted from 0 to 1000 ms at roughly −5 to 5 on the vertical axis, for channels including P3, Pz, PO4, CP1, PO3, F4, FC6, FC1, C3 and Cz.)

Fig. 5. Top: For each subject, the topo plot shows the mean significance for each channel during the time interval 500–1000 ms. The values are given as 1/p-value, so a higher value means the difference between the classes is more significant. Bottom: ERP plots for all three subjects. Each row corresponds to a subject and the 4 most significant channels. Shaded areas indicate a statistically significant (p ≤ 0.01) difference between the waveforms.
4 Conclusions
The purpose of the experiment was to explore whether a subject can be primed without being shown an explicit stimulus, such as a word, image or sound. The subject was instead asked to prime himself by actively thinking about an object when hearing a beep. Single probe words were used to trigger an N400 response when the word matched the previously shown object.
The recordings indicate that the N400 effect is indeed elicited using this strategy, and so the experiment supports the hypothesis that a subject can prime himself by thinking about an object in such a way that the N400 effect occurs when shown probe words. Evidence is given that priming can be achieved without using explicit stimuli, leaving the choice of prime to the subject. Using this pilot experiment as a basis for further research, a BCI could be constructed which can guess the object that the user is primed on, and by extension what it is the user is ‘thinking about’.
5 Future Work
Many questions remain to be answered before the N400 signal can be reliably used in a BCI context as envisioned in this paper. The decision to ask the subject to close his eyes made signal processing considerably harder, because of the generated EOG artifacts. It was included to make it easier for the subject to concentrate and not be distracted by outside stimuli. In retrospect, this might have done more harm than good, so a future experiment can be performed to compare results without the closing of the eyes. A first attempt has been made at automatically classifying the trials. A linear support vector machine was trained on the data segment corresponding to the 4 channels with the lowest average t-test scores (i.e., the most significant ones) and the time interval 400–600 ms, resulting in 80 data points per trial. However, naive classification of this kind proved to be insufficient, as the performance was around chance level. After the experiment, the subjects were familiar with the words used, which can have an influence on their ability to trigger an N400 effect if the same dataset were used again. Research can be done to determine the impact of these repetition effects. When constructing a BCI, care must perhaps be taken to present different probe words every time it is used by the same user. Improvements due to user training can also be explored.
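The classification attempt described above can be outlined as follows; this is a reconstruction under assumptions (scikit-learn as the toolkit, epochs starting at word onset, 5-fold cross-validation), not the authors' code.

    import numpy as np
    from sklearn.svm import LinearSVC
    from sklearn.model_selection import cross_val_score

    def classify_trials(trials, labels, channels, fs=100):
        """trials: (n_trials, n_channels, n_samples) at 100 Hz with the
        probe word onset at sample 0; channels: indices of the 4 most
        significant channels. The 400-600 ms window gives 4 channels x
        20 samples = 80 features per trial, as in the text."""
        window = trials[:, channels, int(0.4 * fs): int(0.6 * fs)]
        X = window.reshape(len(trials), -1)
        clf = LinearSVC(C=1.0)
        return cross_val_score(clf, X, labels, cv=5).mean()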
Acknowledgements The authors gratefully acknowledge the support of the BrainGain Smart Mix Programme of the Netherlands Ministry of Economic Affairs and the Netherlands Ministry of Education, Culture and Science. This work is also made possible by the academic-assistant project of the Royal Dutch Academy of Science, financed by the Dutch Ministry of Education, Culture and Science. We would also like to thank the test subjects for their efforts.
References

1. Abootalebi, V., Moradi, M.H., Khalilzadeh, M.A.: A new approach for EEG feature extraction in P300-based lie detection. Computer Methods and Programs in Biomedicine 94(1), 48–57 (2009)
2. Bajo, M.T.: Semantic facilitation with pictures and words. Journal of Experimental Psychology: Learning, Memory, and Cognition 14(4), 579–589 (1988)
3. Croft, R.J., Barry, R.J.: EOG correction: a new aligned-artifact average solution. Electroencephalography and Clinical Neurophysiology 107(6), 395–401 (1998)
4. Hagoort, P., Brown, C.M., Swaab, T.Y.: Lexical-semantic event-related potential effects in patients with left hemisphere lesions and aphasia, and patients with right hemisphere lesions without aphasia. Brain 119, 627–649 (1996)
5. Kutas, M., Hillyard, S.A.: Reading Senseless Sentences: Brain Potentials Reflect Semantic Incongruity. Science 207(4427), 203–205 (1980)
6. Kutas, M., Hillyard, S.A.: Brain potentials during reading reflect word expectancy and semantic association. Nature 307, 161–163 (1984)
7. Lau, E., et al.: A lexical basis for N400 context effects: evidence from MEG. Brain and Language 111(3), 161–172 (2009)
8. Lau, E.F., Phillips, C., Poeppel, D.: A cortical network for semantics: (de)constructing the N400. Nature Reviews Neuroscience 9(12), 920–933 (2008)
9. Meegan, D.V.: Neuroimaging techniques for memory detection: scientific, ethical, and legal issues. The American Journal of Bioethics 8(1), 9–20 (2008)
10. Orgs, G., Lange, K., Dombrowski, J.H., Heil, M.: Conceptual priming for environmental sounds and words: an ERP study. Brain and Cognition 62(3), 267–272 (2006)
11. Schlögl, A., et al.: A fully automated correction method of EOG artifacts in EEG recordings. Clinical Neurophysiology 118(1), 98–104 (2007)
12. Thompson-Schill, S.L., Kurtz, K.J., Gabrieli, J.D.E.: Effects of Semantic and Associative Relatedness on Automatic Priming. Journal of Memory and Language 38(4), 440–458 (1998)
13. Vossen, P., Bloksma, L., Boersma, P.: The Dutch Wordnet (1999)
A Words Used in Experiment

zin (sentence), bibliotheek (library), trilogie (trilogy), letter (letter), paragraaf (section), hoofdstuk (chapter), inleiding (introduction), voorwoord (foreword), nawoord (epilogue), uitgever (publisher), kaft (cover), woordenboek (dictionary), bladzijde (page), verhaal (story), auteur (author)

pint (pint), glas (glass), hals (neck), kroonkurk (bottle cap), alcohol (alcohol), biertap (beer), pul (pint), doorzichtig (transparent), gezellig (merry), fust (cask), bier (beer), wijn (wine), statiegeld (deposit), krat (crate), flessenopener (bottle opener)

snuiten (blow), servet (napkin), snuiven (sniff), wegdoen (put away), hoesten (cough), doekje (handkerchief), niezen (sneeze), afvegen (wipe), broekzak (pocket), neus (nose), snotteren (snivel), weggooien (throw away), niesbui (sneezing fit), opvouwen (fold)

karton (cardboard), verhuizen (move), bewaren (keep), verpakken (package), etiket (label), inpakken (pack), tillen (lift), magazijn (storehouse), plakband (duct tape), stapelen (stack), opbergen (store), schoenendoos (shoe box), opslag (storage), zolder (attic), dragen (carry)

fruit (fruit), groente (vegetable), tortilla (tortilla), paprika (paprika), peterselie (parsley), voeden (feed), ovenschotel (oven dish), tros (bunch), plant (plant), voedsel (food), lekker (delicious), ketchup (ketchup), saus (sauce), pizza (pizza), groentenboer (greengrocer), plukken (pick), gerecht (dish), maaltijd (meal), salade (salad)

beker (cup), thee (tea), chocolademelk (chocolate), theelepel (teaspoon), slurpen (slurp), schenken (pour out), gieten (pour), bord (plate), oortje (ear), onderzetter (coaster), koffie (coffee), breken (break), keukenkast (cupboard), drinken (drink), drank (beverage)
A Brain Data Integration Model Based on Multiple Ontology and Semantic Similarity

Li Xue, Yun Xiong, and Yangyong Zhu

School of Computer Science, Fudan University, Shanghai 200433, P.R. China
[email protected], [email protected], [email protected]
Abstract. In this paper, a brain data integration model (BDIM) is proposed by building up the Brain Science Ontology (BSO), which integrates the existing literature ontologies used in brain informatics research. Considering the features of current brain data sources, which are usually large scale, heterogeneous and distributed, our model offers brain scientists an effective way to share brain data, and helps them optimize the systematic management of those data. Besides, a brain data integration framework (BDIF) is presented in accordance with this model. Finally, several key issues about brain data integration are also discussed, including semantic similarity computation, new data source insertion and brain data extraction.
1 Introduction
In recent years, the research of brain science has offered BI plenty of useful brain data. However, it is difficult and ineffective to extract data from diverse brain databases, as well as error-prone, due to inherent features such as heterogeneity and decentralization. Moreover, the volume of such data still increases rapidly with the development of research related to brain science. Due to these problems, BI researchers are facing some common pain points, which mainly lie in three aspects:

– The researchers have to master the usage of several specific query languages and interfaces, because different database systems do not support one common query language and application interface.
– Due to the heterogeneities of the currently existing brain databases, contradictions usually occur; thus most of the brain data need to be reorganized and cleaned manually.
– For obtaining the latest and most complete brain data, the problems mentioned in 1) and 2) usually occur repeatedly, because the source databases are updated in real time together with new developments in BI research.

Due to these problems, we propose a brain data integration model based on integrating some existing ontologies used in brain science. Furthermore, to deal with the inconsistencies according to a unified semantics, a measure of the
semantic similarity between concepts in the ontology is defined, which represents the degree of similarity of brain concepts from a semantic point of view. Several other related methods for handling brain data are given too, which are used for new data source insertion and data extraction. The rest of this paper is organized as follows. Section 2 introduces related works. Section 3 describes our brain data integration model, including the model structure, the function of each component, and the model development process. Section 4 presents the integration framework built up in accordance with this model, as well as several critical issues, in particular a novel semantic similarity computation method. Finally, Section 5 gives concluding remarks.
2 Related Works
Currently, there are several research branches related to brain science, such as Neuroscience, Cognitive Informatics, Cognitive Neuroscience, and Brain Informatics, within which many effective integration platforms have been developed by using information technologies. However, most of the integration models have been applied in a limited area, due to researchers' specific motivations. Neuroscience researchers explore brain functions by combining mathematical theory and computer simulation methods; for example, Gardner et al. [1] design and implement a Neuroscience Information Framework on the Web. However, the research attention there is limited to brain structure data and function data. Cognitive informatics is an interdisciplinary research field of cognitive science and machine learning, which investigates the brain from the view of informatics, but the researchers of this domain have not systematically studied large scale data integration. As to cognitive neuroscience, scientists focus on the mechanisms of the cognitive activities of the brain. That is, they study how the brain regulates its components on different levels, i.e., the molecule level, cell level, brain region level, and the whole-brain level. Finally, Brain Informatics researchers take advantage of both neuroscience methods and web intelligence technologies to perform systematic investigations of human cognitive mechanisms. Chen et al. [2] bring out the Data-Brain concept model for explicitly representing various relationships among multiple human brain data sources, with respect to all major aspects and capabilities of human information processing systems (HIPS), which can be utilized for data sharing and integration. Besides, although most of the aforementioned integration systems take ontology as a critical tool for building up a conceptual model, a universal ontology of brain science is still missing, which greatly limits further investigation in brain science. Inspired by the existing literature, we propose a brain data integration model by building up a universal brain science ontology, which integrates many existing brain science related ontologies. According to this model, a brain data integration framework is built up too, which offers a more effective way to manipulate the brain data in many heterogeneous data sources.
3 The Brain Data Integration Model

3.1 The Structure of BDIM
The BDIM is comprised of two main parts: the core of BDIM and selected existing data sources. As shown in Fig. 1, the core of BDIM is the fundamental part of this model, which is made up of three basic modules: the Brain Data Access Interface (BDAI), the Brain Data Management Agent (BDMA), and the Brain Science Ontology (BSO).
Fig. 1. The Brain Data Integration Model
The BDAI module offers users a unified access method to the distributed brain data sources. The BDMA module mainly implements basic management of the brain data, such as querying, modifying and deleting. And the BSO acts as the global ontology of brain science, covering comprehensive concepts and relationships in this field. The role of the BSO is to link and integrate the existing ontologies, such as the ERP ontology, fMRI ontology, etc. Within the core part of BDIM, building up the BSO is the most important task. With respect to state-of-the-art methods of integrating multiple ontologies, our method can be categorized as a Hybrid Approach [3]. Following this approach, the semantics of each data source is described by its own ontology. In order to make the source ontologies comparable to each other, all the ontologies are built upon a global shared vocabulary. The shared vocabulary contains the basic terms (the primitives) of a domain. Thus, for creating compound terms of a source ontology, the primitives are combined by some operators. In this case, the terms of all source ontologies are based on common primitives, so it is easier to compare them than in multiple-ontology approaches. Sometimes the shared vocabulary is
also an ontology [4]; the BSO is such a case, which extracts primitives from the Unified Medical Language System (UMLS) and some influential ontologies, e.g., the Gene Ontology (GO), fMRI Ontology, ERP Ontology, etc.
3.2 The Development of BSO
According to ontology development methodology, there are many successful methods (e.g. METHONTOLOGY or TOVE). The development process of the BSO is in accordance with the method proposed by Uschold and Gruninger [5], which divides the development process into four phases:

Phase one. Identifying development purpose and scope: specialization, intended use, scenarios, set of terms including characteristics and granularity. Obviously, the BSO is created for brain science research, but it covers many different research levels and aspects, as mentioned above. Hence, the concepts abstracted from UMLS and other source ontologies also vary a lot, reflecting diverse levels and various aspects. In fact, building up appropriate concept and relationship sets is an extremely critical and tough task, which has to deal with many existing inconsistencies among source ontologies. Generally, the inconsistencies between ontologies are considered on three levels [6]: inconsistency on the instance level, inconsistency on the concept level, and inconsistency on the relation level.

Phase two. Building the ontology: (a) Ontology capture: interacts with the requirements of phase one for knowledge acquisition. (b) Ontology coding: building up the conceptual model for domain knowledge. (c) Integrating existing ontologies: the reuse of existing ontologies speeds up the development process, but some new problems arise at the same time, in particular the inconsistency problem mentioned in phase one. During the development process of the BSO, we apply three different strategies, proposed by Ngoc Thanh Nguyen [6], to these three types of inconsistency problems accordingly.

Phase three. Evaluation: verification and validation.

Phase four. Guidelines for each phase.
4 The Integration Framework Based on BDIM

4.1 The Structure of the Integration Framework
In this section, we bring out an integration framework based on BDIM. This framework is composed of four layers, shown in Fig. 2.

Data Source Layer: contains many brain data sources; this is the lowest layer of the integration framework. At the beginning, we select some influential and authoritative brain databases as data sources of the integration platform, e.g. FlyBase (Drosophila), the Saccharomyces Genome Database (SGD), and the Mouse Genome Database (MGD), etc.

Data Integration Layer: implements the data integration task by using Extraction, Transformation and Load (ETL) tools.
Fig. 2. The Brain Data Integration Framework
Supervised by the BSO, the metadata mappings between the data sources and the objective data warehouse are defined. According to these mappings, the ETL tools automatically extract data from the source databases, transform the source data into the objective format, and finally load them into the objective data warehouse.

User Data View Layer: the platform applies both physical and logical integration strategies. The physical integration strategy is performed for frequently used brain data. In other cases, the platform uses the logical strategy, which does not load the integrated brain data into a real physical database. This compromise integration strategy takes both time and space factors into consideration.

Application Service Layer: offers the users various data application services as follows: a) offering the researchers standard brain data, which can be used as a criterion for analyzing brain data; b) brain data query, including general brain data query and bibliographic data query; c) brain data mining; d) brain data display and demonstration; e) online analysis via Web services.
4.2 Several Critical Techniques
New Data Source Insertion. To insert a new data source into the BDIF, the researchers need to perform the following steps: a) wrap and publish the data source as a service node before inserting it into the integration platform; b) authenticate the database register tool before using it; c) seek and download the related virtual table for the data source from the information center via the register tool; d) submit the data source information to the information center via the register tool. Then, this service node can be discovered by the information center.
Semantic Similarity Computation. In BDIF, a novel similarity computation method based on semantic path coverage (SPC) is adopted, which was first proposed by us [7]. The basic idea of this method is presented after the following definitions.

Definition 1 (Ontological Link-structure Graph). An ontological link-structure graph is an acyclic directed graph, denoted by $G = \langle V, E, W, r \rangle$, where $V$ ($V \neq \emptyset$) is the node set, $r$ is the root node, and $E$ ($E \subset V \times V$) denotes the directed arc set. $W$ is a weighting function that defines a mapping from $V$ to the positive real numbers.

Definition 2 (Semantic Path). For a given ontological link-structure graph $G = \langle V, E, W, r \rangle$, each path from $r$ to $v$ ($v \in V$) is called a semantic path of $v$. In $G$, each node $v$ has at least one semantic path, and its semantic path set is denoted by $\Phi(v)$.

Definition 3 (Intersection of Semantic Paths). Given two semantic paths $P = (v_0, v_1, \ldots, v_n)$ and $Q = (v_0, v'_1, \ldots, v'_m)$ in the ontological link-structure graph $G = \langle V, E, W, r \rangle$, and supposing $n \geq m > 0$, the intersection of $P$ and $Q$ is a semantic path, denoted by $P \cap Q$.

Definition 4 (Union of Semantic Paths). Suppose $P = (v_0, v_1, \ldots, v_n)$ and $Q = (v_0, v'_1, \ldots, v'_m)$ are two semantic paths in the ontological link-structure graph $G = \langle V, E, W, r \rangle$, where $n > 0$ and $m > 0$. The union of $P$ and $Q$ is the node set containing all the nodes in $P$ and $Q$, denoted by $P \cup Q$.

Based on the above definitions, our proposed method computes the semantic similarity of nodes $v_1$ and $v_2$ in the BSO by the following steps:

Step one: Compute the total number $N$ of nodes in the BSO.
Step two: Compute the number of descendant nodes of each node $\varphi$, denoted by $\varphi'$.
Step three: Compute the occurrence probability of the descendants of $\varphi$: $p(\varphi) = \varphi'/N$.
Step four: Compute the information content of $\varphi$ according to information theory: $IC(\varphi) = -\log(p(\varphi))$.
Step five: Compute the intersection of the semantic paths of $v_1$ and $v_2$, denoted by $\alpha$:
$$\alpha = \Big(\bigcap_{P_i \in \Phi(v_1)} P_i\Big) \cap \Big(\bigcap_{P_j \in \Phi(v_2)} P_j\Big),$$
where $\Phi(v_1)$ and $\Phi(v_2)$ denote the semantic path sets of $v_1$ and $v_2$, respectively.
Step six: Compute the union of the semantic paths of $v_1$ and $v_2$, denoted by $\beta$:
$$\beta = \Big(\bigcup_{P_i \in \Phi(v_1)} P_i\Big) \cup \Big(\bigcup_{P_j \in \Phi(v_2)} P_j\Big).$$
Step seven: Compute the sum of the information content of the nodes in $\alpha$:
$$IC(\alpha) = \sum_{\omega_i \in \alpha} IC(\omega_i).$$
Step eight: Compute the sum of the information content of the nodes in $\beta$:
$$IC(\beta) = \sum_{\omega_i \in \beta} IC(\omega_i).$$
Step nine: Define the semantic similarity of $v_1$ and $v_2$ as the ratio of $IC(\alpha)$ to $IC(\beta)$:
$$Sim(v_1, v_2) = \frac{IC(\alpha)}{IC(\beta)}.$$
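To make the nine steps concrete, here is a minimal Python sketch of SPC similarity on a toy ontology graph. It approximates the path intersection and union of Definitions 3 and 4 by node-set operations, and it counts a node among its own descendants so that every information-content value stays finite; both are illustrative implementation choices, not details taken from [7].

```python
import math

# toy acyclic link-structure graph rooted at "r"
edges = {"r": ["a", "b"], "a": ["c"], "b": ["c", "d"], "c": ["e"], "d": [], "e": []}
nodes = list(edges)
N = len(nodes)                                  # step one: total node count

def n_descendants(v):
    seen, stack = {v}, [v]                      # count v itself (assumption)
    while stack:
        for child in edges[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return len(seen)

# steps two to four: occurrence probability and information content
IC = {v: -math.log(n_descendants(v) / N) for v in nodes}

def semantic_paths(v, node="r", prefix=()):
    """All semantic paths (root-to-v paths) of v, i.e. Phi(v)."""
    prefix = prefix + (node,)
    if node == v:
        return [prefix]
    return [p for c in edges[node] for p in semantic_paths(v, c, prefix)]

def spc_similarity(v1, v2):
    paths = [set(p) for p in semantic_paths(v1) + semantic_paths(v2)]
    alpha = set.intersection(*paths)            # step five (set approximation)
    beta = set.union(*paths)                    # step six (set approximation)
    ic_a = sum(IC[w] for w in alpha)            # step seven
    ic_b = sum(IC[w] for w in beta)             # step eight
    return ic_a / ic_b                          # step nine

print(spc_similarity("e", "c"))   # nodes sharing a deep common ancestor
print(spc_similarity("e", "d"))   # nodes sharing only the root (IC(r) = 0)
```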
Brain Data Extraction
– For general brain databases: The BDIF extracts brain data from general brain databases by building wrappers for complex data objects, mainly adopting the instance segmentation method. This approach combines top-down and bottom-up methods. Following the top-down idea, a locator (the set of segmentation tags) on sibling nodes is developed to narrow the search to a probable scope. Then, whether an instance is an extraction object is checked bottom-up according to the constraints of the node.
– For bibliographic databases: Compared with the general brain databases, the bibliographic databases of brain science are a special group of data sources whose contents are mainly unstructured textual data. The BDIF applies to the bibliographic data a method named the Question Net (QNet) extraction method, which was proposed by us [8]. The QNet method transforms each sentence into a directed graph, taking both the weight and the order of words into consideration. It is a two-phase extraction method: it first builds up the question net in a training process, and then finds sentences similar to the question net as candidate objectives. The final objectives are picked out from these candidates; a rough sketch of this idea is given below.
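The sketch below is only a loose reading of this description — a directed word graph whose edges follow word order and carry co-occurrence weights, used to score candidate sentences — and not the algorithm actually published in [8]; the helper names and training sentences are made up.

```python
# Rough sketch of the stated QNet idea: order-preserving word-pair edges
# with frequency weights, matched against candidate sentences.
from collections import Counter

def sentence_edges(sentence):
    words = sentence.lower().split()
    return list(zip(words, words[1:]))          # edges follow word order

def build_question_net(training_sentences):
    net = Counter()
    for s in training_sentences:
        net.update(sentence_edges(s))           # edge weight = co-occurrence count
    return net

def score(net, sentence):
    edges = sentence_edges(sentence)
    return sum(net[e] for e in edges) / max(len(edges), 1)

net = build_question_net([
    "the factor binds the promoter region",
    "the factor binds upstream of the gene",
])
print(score(net, "the factor binds the target site"))   # candidate ranking
```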
5 Conclusion
Brain data integration is an active topic in BI research and is becoming increasingly important for brain science. To realize systematic data management for brain investigation, a novel brain data integration model is proposed, which integrates multiple brain databases by linking existing ontologies through a global Brain Science Ontology. The fundamental idea of our model extends our earlier Data-Brain Model [9]. Furthermore, a concrete framework has been built according to this model, which offers brain scientists an effective way to obtain data from various heterogeneous data sources and is well suited to performing multi-dimensional studies in brain science.
Acknowledgement. This research was supported by the National Science Foundation Project of China under Grant No. 60903075 and the Shanghai Leading Academic Discipline Project under Grant No. B114.
References
1. Gardner, D., Akil, H., Ascoli, G.A., Bowden, D.M., Bug, W., Donohue, D.E., et al.: The Neuroscience Information Framework: A Data and Knowledge Environment for Neuroscience. Neuroinformatics (2008), doi:10.1007/s12021-008-9024-z
2. Chen, J.H., Zhong, N.: Data-Brain Modeling Based on Brain Informatics Methodology. In: IEEE/WIC/ACM International Conference on Web Intelligence, pp. 41–47. IEEE Computer Society Press, Los Alamitos (2008)
3. Wache, H., Voegele, T., Visser, U., Stuckenschmidt, H., Schuster, G., Neumann, H., Huebner, S.: Ontology-Based Integration of Information – A Survey of Existing Approaches. In: Proceedings of the IJCAI 2001 Workshop on Ontologies and Information Sharing, pp. 108–118 (2001)
4. Stuckenschmidt, H., Wache, H., Vogele, T., Visser, U.: Enabling Technologies for Interoperability. In: Workshop on the 14th International Symposium of Computer Science for Environmental Protection, pp. 35–46 (2000)
5. Uschold, M., Gruninger, M.: Ontologies: Principles, Methods and Applications. Knowledge Engineering Review 11(2), 93–155 (1996)
6. Nguyen, N.T.: Advanced Methods for Inconsistent Knowledge Management, pp. 242–262. Springer, Heidelberg (2008)
7. Li, R., Chao, S., Li, Y., Tan, H., Zhu, Y., Zhou, Y., Li, Y.: Ontological Similarity Computation Method Based on Semantic Path Coverage. Progress in Natural Science 16(7), 916–919 (2006)
8. Yang, Q., Zheng, G., Xiong, Y., Zhu, Y.: Qnet-BSTM: An Algorithm for Mining Transcription Factor Binding Sites from Literature. Journal of Computer Research and Development 45(suppl.), 323–329 (2009) (in Chinese)
9. Zhu, Y., Zhong, N., Xiong, Y.: Data Explosion, Data Nature and Dataology. In: IEEE/WIC International Conference on Brain Informatics, pp. 147–158. Springer, Heidelberg (2009)
How Does Repetition of Signals Increase Precision of Numerical Judgment?

Eike B. Kroll¹, Jörg Rieger², and Bodo Vogt²

¹ Karlsruhe Institute of Technology (KIT), Institute of Economic Theory and Statistics, Karlsruhe, Germany
² Otto-von-Guericke University Magdeburg, Chair of Empirical Economics, Magdeburg, Germany
Abstract. This paper investigates the processing of repeated complex information. The focus of this study is how the precision of stated numerical estimates is influenced by repetition of the signal and by information about the estimates of others. The key question is whether individuals use the law of large numbers in their estimates. In an experiment, participants are asked to estimate the number of points in a scatter plot that is visible for a short time. The setting of the experiment allows for stating intervals and/or point estimates. Our analysis shows that the estimated interval gets smaller with each repetition of the signal, but the pattern does not follow the prediction of statistical models. Information about the estimates of others does not lead to higher stated precision of one's own estimate, but it does improve its average quality, i.e., the difference between answer and signal gets smaller.
1 Introduction

In advanced societies, people constantly face complex decision tasks in their jobs and private lives. Although standard economic models of decision making assume full information and an unlimited capacity for information processing in human brains, experimental work suggests that the human capacity for information processing is limited. In consequence, complex information can be processed only to a certain degree of abstraction. The result of this information processing is diffuse information, which in turn leads to imprecision of judgment [1]. In the scientific literature it is well established that humans cannot grasp a large number of objects without counting them [2]. The perception of visual information and its numerical transformation is also investigated in the literature under the term numerosity [3,4]. Economic agents face similar decisions and are required to make judgments based on numerical information. This paper deals with this human information process. In other contexts of economic research, departures from theoretical predictions are considered to be caused by the decision-makers' lack of experience. This paper addresses this issue by analyzing the effect of task repetition on the accuracy of subjects' statements. More specifically, this paper analyzes whether the estimation of decision-makers facing complex information follows statistical estimation methods. Therefore, an experimental setting is derived to show the degree of precision which subjects can provide when facing a complex estimation task and how the precision of subjects' estimations changes when a task
is repeated. Specifically, the proposed experiment allows analyzing whether subjects follow a statistical method, the law of large numbers, which has been shown to be used intuitively in various respects [5,6]. The applicability of the law of large numbers is discussed in the following section, where the research hypotheses are defined. The issue of information processing in human brains is not only considered in economic research, but is also the subject of psychological studies identifying how humans process signals. Following the literature on human signal processing, one can argue that imprecision is caused by the complexity of the tasks faced by individuals [7]. The processing capacity of the human brain necessary for solving a decision problem is the determinant of complexity. The fact that the information processing of the human brain is limited is the main aspect discussed with respect to the phenomenon of imprecision of judgment caused by the processing of complex information [8]. This limitation in itself forces the human brain to simplify complex information. However, this process of simplifying the input causes a certain degree of imprecision when interpreting subjects' decisions. Following this argument, there seems to be a trade-off between problem simplification and the quality of the decision. How the degree of vagueness or imprecision in judgment affects the outcome of a decision-making process remains an open question. It seems to be an established fact that humans have a limited capacity for processing information coming from the outside world. Early work on psychophysical perception shows that stimuli from the outside world need to reach a certain threshold of signal intensity in order to be perceived by the human brain at all [9], which was later termed the Weber-Fechner law. The implication of this finding is that a part of the stimulus will not be perceived and therefore cannot be processed. As argued above, this can be regarded as a simplification due to the limitations of human brains and as one cause of the imprecision of judgment revealed when answering complex questions. These fundamental findings about the way humans perceive their environment are adopted in theoretical models. One example is the theory of the "window of attention" [10], describing how visual perception is limited in human brains. This model is also based on the assumption that human perception is associated with a certain degree of imprecision. This "lack of precision" can be explained by a restriction of the conscious perception of visual information by the visual field [10]. That means that the model of how visual information is processed and conceived by humans includes a process of simplifying the signal and creating imprecision before the perception of visual information becomes conscious. Furthermore, psychological experiments have shown a correlation between short-term memory and judgment [11]. These findings inspired a modeling approach for human information processing based on the assumption that the structure of the human brain only allows the processing of a limited number of information chunks [11]. While the initial finding suggested that humans have a memory span of seven information chunks, the exact number is debatable. However, this work focuses on limitations of information processing in human brains as the cause of imprecision in judgment.
In summary, one can argue that decisions in economic contexts contain some degree of imprecision caused by mental processes developed to cope with the complexity of the world surrounding us. Following the argumentation above, it seems established that decision processes are limited by the capacity of the human brain to
process incoming signals. The question arises as to what happens when individuals gain more experience with the task at hand or are provided with additional or repeated information. While psychologists and economists seem to agree that more information leads to better performance in judgment, more recent research suggests that increasing information does not necessarily lead to better performance of the individuals [12]. It may be possible that the degree of imprecision inherent in judgment decreases with the repetition of signals when individuals face similar decisions over and over again. This would follow an argumentation similar to that in discussions about anomalies in expected utility theory [13]. That is, the frequency of departures from expected utility theory decreases when subjects gain more experience with otherwise unfamiliar tasks [14]. The question arises as to how the performance of signal processing changes and how models can describe this process. One possibility for predicting the processing of repeated complex information by the human brain could be the use of statistical models. Following the argument that approximate numerical attributes play a central role in human mathematical thinking [15], one can argue that statistical models are used in human decision processes. This is based on the idea that judgment is indeed provided with a degree of imprecision, but when people are faced with the same task repeatedly, precision increases as the individual gains experience with each repetition of the task at hand [16]. In order to address the question of whether the repetition of a task leads to a decrease in imprecision, we design an experimental task which is explained in further detail in the next section of the paper. Furthermore, the analysis of this experiment provides insights into how an increase in the information available to the subjects affects the precision of their statements. In particular, we check in an experiment whether the increase in precision follows statistical models. While there are a variety of statistical models that can be applied to the task provided in the experiment, this paper focuses on the law of large numbers. The justification for using the law of large numbers for the theoretical prediction of the observed effects is as follows. First and foremost, it is based on the assumption that using the mean of different independent estimations is the simplest procedure available to human beings. This can be confirmed by different kinds of experiments and studies based on the identification of the phenomenon of "the wisdom of the crowds" [17]. The law of large numbers indicates that the relative frequency of random results approximates the probability of those results if the underlying random experiment is performed repeatedly. That means it provides a projection of the probability distribution of the result of a repeated random event. In combination with using the mean of independent estimates, this means that the variance of the estimate decreases with each repetition of the signal. We further investigate the difference between receiving information oneself and additionally receiving information about how others interpret that information. Therefore, the focus is on whether there is a difference between receiving a signal repeatedly and observing the interpretation of the signal by a number of other players who received the same signal.
In a last step, we analyze the general effect that receiving information about the observations of others has on the precision and quality of statements about the signal.
2 The Task

We designed a simple task in which the manner in which numerical information is aggregated in the brain can be analyzed and compared with standard technical procedures for this aggregation. Experimental subjects faced the following task: they were shown, for ten seconds, a scatter plot with a fixed number of points representing the true value (in this time, subjects were not able to count all the points). In the next stage, subjects were asked to estimate the number of points. Specifically, they were asked to state an interval framing the actual number of points shown to them before. Subjects were asked to state this interval as accurately as possible but as widely as necessary. This task was repeated for ten rounds, with each scatter plot showing the same number of points but with a varying distribution on the screen. This fact was known to the subjects.
Fig. 1. Example of scatter plot as shown to the subjects
The problem analyzed by means of this task is not how human beings try to count the points, but how they deal with the different counts they get per round. We are interested in how they aggregate the imprecise information. A statistical description of this task is that in every round the subject observes a random variable, which is characterized by its mean and variance. We assume that the mean and variance are constant in every round. An effective strategy for dealing with this task is to determine an estimated number of points every round and then use the law of large numbers; the estimations should be independent of each other. If in round n a subject calculates the mean of these estimates, this mean will approach the true value and the variance (measured as an interval) should decrease by a factor of n; a small simulation of this strategy follows below. In a second task, a subject is informed about the estimates of other subjects. In this task the observations are independent, since the individuals did not interact.
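As a minimal illustration of this strategy (not part of the experiment itself), the following simulation assumes i.i.d. Gaussian counting noise around the true value of 143; the noise level σ = 20 is an arbitrary choice. Averaging the first n rounds shrinks the standard deviation of the running mean by a factor of √n:

```python
import numpy as np

rng = np.random.default_rng(0)
TRUE_VALUE, SIGMA, ROUNDS, RUNS = 143, 20.0, 10, 10_000  # sigma is arbitrary

# each run is one simulated subject; each round yields one noisy count
counts = rng.normal(TRUE_VALUE, SIGMA, size=(RUNS, ROUNDS))
running_mean = counts.cumsum(axis=1) / np.arange(1, ROUNDS + 1)

for n in (1, 5, 10):
    sd = running_mean[:, n - 1].std()
    print(f"round {n:2d}: sd of running mean = {sd:5.2f}"
          f" (theory: {SIGMA / np.sqrt(n):5.2f})")
```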
3 Hypotheses

Information is one of the most important factors in economics. Therefore, a lot of research focuses on how new information is processed by individuals as well as
groups. Although economics assumes perfect sensitivity of decision makers [18], more recent research on decision-making has brought attention to imprecise judgments as a factor when choices are observed in experimental laboratories [19]. Therefore, the question arises as to how precise the perception of new information can be when the information received is not perfect and requires the recipient to process the information and calculate an approximation. The estimate moves from an extreme to the mean of a distribution, because subjects use the law of large numbers for high-frequency observations [20]. The law of large numbers states that, with an increasing number of trials of the same random experiment, the average observation will be close to the expected value of the experiment. Furthermore, the accuracy increases with an increasing number of trials; that is, the variance of the estimator decreases with each additional trial. Thus, if decision makers were perfect statisticians, or in this case followed the law of large numbers, the precision of their estimates would increase with the number of observations. We use the stated standard deviation as a measure of the precision of a response. The quality of a response is measured as the difference between the response and the true value. Following the law of large numbers, this difference should tend to zero. In the following we state and derive the three hypotheses we will test.

Hypothesis 1a: Precision of an estimate increases with the observation of repetitions of the signal, following the law of large numbers.

Following the law of large numbers, the precision increases by a defined factor. The level of precision in this case is reflected by the variance of the stated estimates. One can further assume that, for an increasing number of repetitions of the random event, the distribution of the estimated mean converges to a normal distribution. For this case, assuming that the experimental trials are independent, the variance can be calculated. Applying the law of large numbers to a signal being repeated n times, one can conclude that the standard deviation of the stated estimate decreases by the factor √n. However, there is a discussion in the economic and psychological literature on whether humans use the law of large numbers when faced with imprecise information. For example, experimental subjects are known to misconceive the fairness of a random draw, leading to a distorted perception of the probability of success in random outcomes [21]. Following this argument, subjects are overly confident in their prognosis of future outcomes when a series of signals is observed. Therefore, the perceived precision is even higher than that calculated by the law of large numbers. The application of the law of large numbers in human judgment is a controversial subject. There are arguments favoring the law of large numbers as a good approximation of human behavior [5], and even children are found to have an intuitive understanding of it [6]. Because experiments reveal human behavior that is close to statistical models [22], it seems that people act on the basis of a rule system similar to the law of large numbers [23]. In contrast, other researchers found that people tend to disregard sample sizes when facing decisions [24] and do not realize the effect of sample size on sample heuristics applied to decision tasks [25].
Following their argument, the number of repetitions of an incoming signal is neglected by the subjects and does not change the precision of the estimate following the law of large numbers.
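For reference, the benchmark used in the results below follows from a one-line variance computation under the independence assumption above:

$$\operatorname{Var}(\bar{X}_n) = \operatorname{Var}\Bigl(\frac{1}{n}\sum_{i=1}^{n} X_i\Bigr) = \frac{\sigma^2}{n} \;\Longrightarrow\; \operatorname{sd}(\bar{X}_n) = \frac{\sigma}{\sqrt{n}},$$

so between round 1 and round 10 the stated interval width should shrink by a factor of $\sigma / (\sigma/\sqrt{10}) = \sqrt{10} \approx 3.16$.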
While there are different arguments for and against statistical models, and the law of large numbers in particular, the comparison of different tasks in the economic literature suggests that the law of large numbers is applied with relatively high frequency when questions concern the mean rather than the tail of a distribution [26]. Furthermore, the law of large numbers holds for decisions in frequency-distribution tasks [16]. Therefore, the experiment reported in this paper uses a question about the mean of a distribution in a frequency-distribution task.

Hypothesis 1b: Precision increases with the knowledge of the estimates of others, following the law of large numbers.

Generally speaking, two possibilities exist for receiving additional information about a signal. The first is receiving the signal multiple times. The second is receiving information about the estimates of other subjects who received the same signal. We expect the law of large numbers to also hold if the observations of others are included.

Hypothesis 2: The increase of precision is lower when the estimates of other participants are observed than for one's own information.

Experimental results show that subjects tend to copy the behavior of others. For example, analysts adjust their own forecasts if information about the forecasts of others is available, neglecting their own private information [27], and even in simple games with two agents, subjects copy the decision of the first mover significantly more often when it is observable than when it is not [28]. In games where the information of other subjects is observable for the individual, cascade behavior is initiated [29]. That means people make the same choices as others without using their own private information. Furthermore, participants do not recognize cascade behavior of others [30], which is sometimes described as the persuasion bias [31]. Additionally, one can find differences in mental activity between processing private information and processing information provided by others, which can explain why people tend to follow the behavior of others [32]. Following these arguments, it seems that humans prefer to stick with the behavior of the group where that behavior is observable, and their own decisions can be changed accordingly after observing such behavior. Therefore, one can conclude that subjects should feel more certain about their own answers when following a group. Thus, the precision of answers should be even higher when the estimates of other participants are observable. Hence, subjects would place a higher value on signals from other participants than on receiving more information themselves. On the other hand, overconfidence [33] predicts that participants rely more on their own information than on the information of others. In our setting, no strategic implications have to be considered. We also do not compare subjects' estimates in front of the group, so they should not feel pressure to stick to the group opinion. We simply test which information gets the heavier weight in the estimates: their own private information or the information of others. We think that for individual estimates their own information gets a higher weight, and have stated the hypothesis above.

Hypothesis 3: The individual estimate is closer to the true value when the estimates of others can be taken into account, i.e., the quality gets higher.
One of the most famous examples of dealing with one's own estimates compared to the estimates of others is the winner's curse [34]. This effect describes the change in a person's perception of her own signal after being made aware of the estimates of other persons made public through a price mechanism. Furthermore, it stresses the importance of receiving signals from others in order to make better estimates of an imprecise signal. In auctions exhibiting the winner's curse, bidders are found to play a best response to the distribution of other bids [35], depending on their beliefs about the rivals' uncertainty [33]. Since, according to the theory, the winner's curse disappears in market settings [36] and players correct their behavior with respect to the received information [35], one can expect group estimates to have a higher quality than a series of individual estimates. In our setting, the observation of the estimates of others should also lead to a higher quality of one's own estimate, since the estimates of the others are independent observations.
4 Experiment

The experiment was conducted in a laboratory environment at the MaXLab of the Otto-von-Guericke University in Magdeburg. The group of participants consisted of 48 students of the Otto-von-Guericke University Magdeburg enrolled in different fields of study, matched randomly into two different groups and several sessions using the ORSEE system [37]. The subjects faced the task described in Section 2: they had to give an estimate of the numerical value representing the number of points in a cloud. The true value was 143. The experiment consists of two different treatments. In the first treatment, subjects could just look at the plot for ten seconds before they gave their answer. In the second treatment, additional information was provided: in each round, after stating the interval, the subjects were shown a table with the estimated intervals of ten other participants before seeing the next scatter plot. In the second treatment, the number of observations is therefore ten times as high as in the first treatment. All groups played both treatments, with one half of the groups starting with the first treatment and the other half starting with the second; therefore, we were able to test for sequence effects. The software implementation of the experiment in the laboratory was designed with z-Tree [38].

Table 1. Sequence of the experiment and comparison of treatments

Step | Treatment | Screen                                            | Time Frame
1    | 1 & 2     | Scatter plot                                      | 10 sec.
2    | 1 & 2     | Provide upper and lower bound of interval         | Press OK
3    | 2         | Table with answers of all the other participants  | Press OK
5 Results

In the experiment, the participants state an interval or a point estimate framing the true number of points in the scatter plot. Therefore, the width of this
interval represents how confident the subjects are about their stated estimate. We use this interval as a proxy for the standard deviation the participants attribute to the estimate.

5.1 Hypotheses 1a and 1b

The analysis of the estimates without revealing the answers of other participants (Treatment 1) shows a decreasing width of these intervals. While the mean interval width in the first round is 40, it shrinks to a mean width of 27.5 in the tenth round. Thus, the interval decreases with the number of observations (Wilcoxon test, 5% level). When comparing the width of the interval in rounds one and ten, the law of large numbers would predict a decrease of the interval by the factor √10 ≈ 3.16. For the analysis we calculate this factor for each individual by dividing the interval width of round one by the interval width of round ten. A factor larger than 3.16 can be interpreted as the subject being in line with the law of large numbers, a smaller factor as the subject not being in line with it. It has to be noted that one session consisted of only eight subjects, because not all recruited subjects showed up; for this session the factor was corrected for the decreased number of observations. Individuals do not provide estimates that can be explained by the law of large numbers (binomial test, 5% level). That means that while the precision of the stated estimates does increase with the number of observations (Hypothesis 1a), the participants of this experiment do not follow the law of large numbers (Hypothesis 1b); the decrease of the interval width is by far less than predicted. A sketch of this per-subject factor analysis follows below.

The analysis stated above focused on the repeated reception of a signal and the precision of subjects' perception of this signal. As stated previously, there is a second possibility of acquiring information about a signal: the observation of others' estimates. Analyzing the data of the second treatment, we see that the width of the estimated interval gets smaller as well. This in turn confirms that the precision of the estimated interval is increasing.
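The following sketch illustrates the mechanics of this per-subject factor analysis; the interval widths are illustrative placeholders, not the experimental data, and the exact form of the binomial test used in the paper is not reported, so a one-sided test is assumed here:

```python
import numpy as np
from scipy import stats

# illustrative per-subject interval widths (NOT the experimental data)
width_round1 = np.array([40, 38, 45, 36, 42, 39, 44, 41, 37, 43, 40, 38, 46, 35.0])
width_round10 = np.array([28, 26, 30, 25, 29, 27, 31, 28, 24, 30, 27, 26, 32, 23.0])

factors = width_round1 / width_round10          # per-subject shrink factor
benchmark = np.sqrt(10)                         # LLN prediction: about 3.16
n_in_line = int((factors >= benchmark).sum())

# binomial test: are significantly fewer subjects in line with the LLN
# benchmark than expected by chance (p = 0.5)?
res = stats.binomtest(n_in_line, n=len(factors), p=0.5, alternative="less")
print(f"mean factor {factors.mean():.2f}; {n_in_line}/{len(factors)} reach "
      f"{benchmark:.2f}; p = {res.pvalue:.2e}")
```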
5.2 Hypothesis 2

Following the arguments derived from the literature in developing Hypothesis 2, the expectation is that the same number of observations taken from the estimates of others leads to higher precision than the same number of own observations. Therefore, the width of the intervals in round two of Treatment 2 (with information) was compared with the width of the intervals in round ten of Treatment 1 (without information). At these stages, participants had the same number of observations, the only difference being that in Treatment 1 their own observations were considered, while in Treatment 2 the observations were the estimates of other participants. Following the law of large numbers, one would not expect a difference between these data points. Considering the arguments on information cascade behavior, one would expect the intervals in round two with information to be even smaller than the intervals in round ten without information. However, the contrary is true: the interval is significantly smaller for the same number of observations in Treatment 1 (Wilcoxon test, 1% level). Furthermore, the changes in the width of the intervals over increasing numbers of rounds follow very similar patterns in both treatments. While the mean precision in round ten with information is slightly smaller than without information, the difference in the data is not significant at any common level of significance (Wilcoxon test). This is in line with Hypothesis 2.

5.3 Hypothesis 3

Until now, the analysis has focused on the precision of the estimates provided by the subjects; the data can also be interpreted in terms of how confident the subjects are in their point estimation. Further analysis is required to check the quality of the estimates. Therefore, the question is whether a group average or a group decision yields an estimate of higher quality. This is checked by comparing the distance between the estimates provided by the participants and the true number of points in the scatter plot. For the analysis of the quality of the estimate, we calculate the midpoints of the intervals provided by the participants for all rounds in both treatments. Then the distance between these midpoints and the true value of 143 is calculated; a sketch of this procedure is given below. Comparison of the treatments shows that without information about others (Treatment 1), the mean of the calculated midpoints remains at 153.75, while with information about others (Treatment 2), the mean shifts to 143.75 and is significantly closer to the true value (Wilcoxon test, 10% level). The basis of this test is the difference between the midpoints of the estimated intervals and the true value; using the Wilcoxon test, we tested whether these differences differ significantly between treatments. Therefore, one can conclude that the quality of an estimate is higher for group decisions (Hypothesis 3); however, the confidence of the participants, as reflected in the precision of the stated estimates, does not differ between the groups. This favors our Hypothesis 3.
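The midpoint-based quality comparison can be sketched as follows; the midpoints are randomly generated placeholders (not the experimental data), and a paired two-sided Wilcoxon signed-rank test is assumed:

```python
import numpy as np
from scipy import stats

TRUE_VALUE = 143
rng = np.random.default_rng(1)

# illustrative interval midpoints per subject (NOT the experimental data);
# treatment 1 is biased high, treatment 2 is centered near the true value
mid_t1 = rng.normal(153.75, 12.0, size=48)      # without others' estimates
mid_t2 = rng.normal(143.75, 12.0, size=48)      # with others' estimates

err_t1 = np.abs(mid_t1 - TRUE_VALUE)            # distance to the true value
err_t2 = np.abs(mid_t2 - TRUE_VALUE)

# paired Wilcoxon signed-rank test on the per-subject error differences
stat, p = stats.wilcoxon(err_t1, err_t2)
print(f"median error: t1 = {np.median(err_t1):.1f}, "
      f"t2 = {np.median(err_t2):.1f}, p = {p:.3f}")
```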
Fig. 2. Results compared by treatment
6 Conclusion

This paper deals with the question of how the perception of imprecise signals changes in an experimental setting when the signal is repeated. We show that the processing of complex information by the human brain in most cases leads to imprecise numerical judgments. Furthermore, we investigated whether the repeated consideration of complex information leads to more accurate judgments. It was shown that over the rounds of the experiment the responses get more precise, but not by the theoretically expected factor. The question of whether humans follow statistical models when dealing with repeated inputs, with the law of large numbers as the benchmark, can therefore be answered in the negative: while precision does increase with repetition of the signal, it does so to a significantly lower extent than the law of large numbers predicts. Furthermore, this paper underlines the importance of considering imprecise judgment in economic decision making even for repeated situations. In addition, the imprecision of judgments of repeated information does not improve over the rounds as much as theoretical methods would suggest. The second issue analyzed in this study is the value of observing the estimates of others in a group. The theoretical argument states that it does not matter whether additional information is received by observing the information of others or by an additional private observation; that is, the source of the information is not considered to be a factor. In our experiment, we show that information about the decisions of the other experiment participants does influence the numerical response: responses made with information about the responses of the other participants were closer to the true value than responses made without this information. Although the literature on information cascades shows that people tend to be more comfortable with copying the actions of others, our data show that observing the estimates of others has little to no effect on the precision of estimates. However, the quality of the estimates increases when the information is available. This finding is partly in line with overconfidence, since the stated precision does not increase, and partly with other literature describing the avoidance of errors by considering the opinion of others.
References

[1] Kahneman, D., Knetsch, J.L.: The Endowment Effect, Loss Aversion, and Status Quo Bias. Journal of Economic Perspectives 5, 193–206 (1991)
[2] Jevons, W.: The Power of Numerical Discrimination. Nature 3, 281–282 (1871)
[3] Braunstein, M.L.: Depth Perception in Rotating Dot Patterns: Effects of Numerosity and Perspective. Journal of Experimental Psychology 64, 415–420 (1962)
[4] Piazza, M., Mechelli, A., Price, C.J., Butterworth, B.: Exact and Approximate Judgements of Visual and Auditory Numerosity: An fMRI Study. Brain Research 1106, 177–188 (2006)
[5] Peterson, C.R., Beach, L.R.: Man as an Intuitive Statistician. Psychological Bulletin 68, 29–46 (1967)
[6] Piaget, J., Inhelder, B.: The Origin of the Idea of Chance in Children. Norton, New York (1975)
[7] Akin, O., Chase, W.: Quantification of Three-Dimensional Structures. Journal of Experimental Psychology: Human Perception and Performance 4, 397–410 (1978)
[8] Miller, J.: Discrete and Continuous Models of Human Information Processing: Theoretical Distinctions and Empirical Results. Acta Psychologica 67, 191–257 (1988)
[9] Fechner, G.T.: In Sachen der Psychophysik. Kessinger, Leipzig (1877)
[10] Olshausen, B.A., Anderson, C.H., Van Essen, D.C.: A Neurobiological Model of Visual Attention and Invariant Pattern Recognition Based on Dynamic Routing of Information. The Journal of Neuroscience 13, 4700–4719 (1993)
[11] Miller, G.A.: The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information. Psychological Review 63, 81–97 (1956)
[12] Fiedler, K., Kareev, Y.: Does Decision Quality (Always) Increase with the Size of Information Samples? Some Vicissitudes in Applying the Law of Large Numbers. Journal of Experimental Psychology: Learning, Memory & Cognition 32, 883–903 (2006)
[13] Kahneman, D., Tversky, A.: Prospect Theory: An Analysis of Decision under Risk. Econometrica 47, 263–292 (1979)
[14] Cox, J.C., Grether, D.M.: The Preference Reversal Phenomenon: Response Mode, Markets and Incentives. Economic Theory 7, 381–405 (1996)
[15] Borst, A., Theunissen, F.E.: Information Theory and Neural Coding. Nature Neuroscience 2, 947–957 (1999)
[16] Sedlmeier, P., Gigerenzer, G.: Intuitions about Sample Size: The Empirical Law of Large Numbers. Journal of Behavioral Decision Making 10, 33–51 (1997)
[17] Surowiecki, J.: The Wisdom of Crowds: Why the Many Are Smarter Than the Few. Abacus, Great Britain (2004)
[18] Edwards, W.: The Theory of Decision Making. Psychological Bulletin 51, 380–417 (1954)
[19] Loomes, G., Butler, D.J.: Imprecision as an Account of the Preference Reversal Phenomenon. American Economic Review 97, 277–297 (2007)
[20] Rabin, M.: Inference by Believers in the Law of Small Numbers. Quarterly Journal of Economics 117, 775–816 (2002)
[21] Tversky, A., Kahneman, D.: Belief in the Law of Small Numbers. Psychological Bulletin 76, 105–110 (1971)
[22] Evans, J., Pollard, P.: Intuitive Statistical Inferences about Normally Distributed Data. Acta Psychologica 60, 57–71 (1985)
[23] Nisbett, R.E.: Rules for Reasoning. Erlbaum, Hillsdale (1992)
[24] Kahneman, D., Tversky, A.: Subjective Probability: A Judgment of Representativeness. Cognitive Psychology 3, 430–454 (1972)
[25] Reagan, R.T.: Variations on a Seminal Demonstration of People's Insensitivity to Sample Size. Organizational Behavior and Human Decision Processes 43, 52–57 (1989)
[26] Well, A.D., Pollatzek, A., Boyce, S.J.: Understanding the Effects of Sample Size on the Variability of the Mean. Organizational Behavior and Human Decision Processes 47, 289–312 (1990)
[27] Bloomfield, R., Hales, J.: An Experimental Investigation of the Positive and Negative Effects of Mutual Observation. The Accounting Review 84, 331–354 (2009)
[28] González, M., Modernell, R., París, E.: Herding Behaviour inside the Board: An Experimental Approach. Corporate Governance: An International Review 14, 388–405 (2005)
[29] Anderson, L.R., Holt, C.A.: Information Cascades in the Laboratory. American Economic Review 87, 847–862 (1997)
[30] Grebe, T., Schmidt, J., Stiehler, A.: Do Individuals Recognize Cascade Behavior of Others? An Experimental Study. Journal of Economic Psychology 29, 197–209 (2008)
[31] DeMarzo, P.M., Vayanos, D., Zwiebel, J.: Persuasion Bias, Social Influence, and Unidimensional Opinions. The Quarterly Journal of Economics 118, 909–968 (2003)
[32] Prechter, R.R.: Unconscious Herding Behavior as the Psychological Basis of Financial Market Trends and Patterns. Journal of Psychology 2, 120–125 (2003)
[33] Charness, G., Levin, D.: The Origin of the Winner's Curse: A Laboratory Study. American Economic Journal: Microeconomics 1, 207–236 (2009)
[34] Capen, E.C., Clapp, R.V., Campbell, W.M.: Competitive Bidding in High-Risk Situations. Journal of Petroleum Technology 23, 641–653 (1971)
[35] Eyster, E., Rabin, M.: Cursed Equilibrium. Econometrica 73, 1623–1672 (2005)
[36] Cox, J.C., Isaac, R.M.: In Search of the Winner's Curse. Economic Inquiry 22, 579–592 (1984)
[37] Greiner, B.: The Online Recruitment System ORSEE 2.0 – A Guide for the Organization of Experiments in Economics. University of Cologne, Cologne (2004)
[38] Fischbacher, U.: z-Tree: Zurich Toolbox for Ready-made Economic Experiments. Experimental Economics 10, 171–178 (2007)
Sparse Regression Models of Pain Perception

Irina Rish¹, Guillermo A. Cecchi¹, Marwan N. Baliki², and A. Vania Apkarian²

¹ IBM T.J. Watson Research Center, Yorktown Heights, NY
² Northwestern University, Chicago, IL
Abstract. Discovering brain mechanisms underlying pain perception remains a challenging neuroscientific problem with important practical applications, such as developing better treatments for chronic pain. Herein, we focus on statistical analysis of functional MRI (fMRI) data associated with pain stimuli. While the traditional mass-univariate GLM [8] analysis of pain-related brain activation can miss potentially informative voxel interaction patterns, our approach relies instead on multivariate predictive modeling methods such as sparse regression (LASSO [17] and, more generally, the Elastic Net (EN) [18]) that can learn accurate predictive models of pain and simultaneously discover brain activity patterns (relatively small subsets of voxels) allowing for such predictions. Moreover, we investigate the effect of temporal (time-lagged) information, often ignored in traditional fMRI studies, on the predictive accuracy and on the selection of brain areas relevant to pain perception. We demonstrate that (1) Elastic Net regression can be highly predictive of pain perception, by far outperforming ordinary least-squares (OLS) linear regression; (2) temporal information is very important for pain perception modeling and can significantly increase the prediction accuracy; (3) moreover, regression models that incorporate temporal information discover brain activation patterns undetected by non-temporal models.
1 Introduction

Brain imaging studies of pain perception are a rapidly growing area of neuroscience, motivated both by the scientific goal of improving our understanding of pain mechanisms in the human brain and by practical medical applications [2,3,1,5,15]. Localizing pain-specific brain areas remains a challenging problem due to the complex nature of pain perception, which involves the activation of multiple brain processes [2]. In this work, we focus on pain perception analysis based on fMRI studies, and explore the advantages of statistical predictive modeling techniques known as sparse (l1-regularized) regression. To our knowledge, this is the first attempt to analyze pain perception using the sparse regression methodology. Functional Magnetic Resonance Imaging (fMRI) uses an MR scanner to measure the blood-oxygenation-level dependent (BOLD) signal, known to be correlated with neural activity in response to input stimuli. Such scans produce a sequence of 3D images, where each image typically has on the order of 10,000–100,000 subvolumes, or voxels, and the sequence typically contains a few hundred time points, or TRs (time repetitions). Standard fMRI analysis approaches, such as the General Linear Model (GLM) [8], examine mass-univariate relationships between each voxel and the
stimulus in order to build statistical parametric maps that associate each voxel with a statistic reflecting its relationship to the stimulus. Commonly used activation maps depict the "activity" level of each voxel, determined by the linear correlation of its time course with the stimulus. However, the GLM approach models each voxel separately, missing potentially important information contained in the interactions among voxels. Indeed, as shown in [12], highly predictive models of mental states can be built from voxels with submaximal activation. Recently, applying multivariate predictive methods to fMRI has become an active area of research, focused on predicting "mental states" from fMRI data (see, for example, [14,9,4,6]). In this paper, we focus on sparse regression modeling, a fast-growing statistical field that aims at learning predictive models from data while simultaneously discovering sparse predictive patterns. Two main advantages of sparse modeling are (1) effective regularization via an l1-norm constraint that helps avoid overfitting on small-sample, high-dimensional data (typical for fMRI) and (2) variable selection, naturally embedded into the model estimation due to the sparsity-enforcing properties of l1-regularization. Such embedded variable selection leads to predictive and interpretable statistical models that pinpoint informative variables (e.g., groups of voxels) most relevant for prediction. (From the variable-selection perspective, the GLM approach can be viewed as a more simplistic filter-based variable selection, where each variable/voxel is ranked separately by its relevance to the response/stimulus using a univariate criterion such as correlation; besides, GLM does not provide a predictive model of the response.) Specifically, we experiment with the Elastic Net (EN) [18] regression, a recent extension of the original l1-regularized linear regression method called Lasso [17]. Besides sparsity, EN also enforces a grouping property that Lasso lacks: it tends to assign similar coefficients to highly correlated variables, thus including (or excluding) them as groups. This EN property yields more interpretable solutions that show whole groups of relevant predictors (e.g., spatially coherent groups of voxels) rather than just single representatives of such groups [18,6]. We observe that the Elastic Net is capable of learning highly predictive models of subjective pain perception from fMRI data, often achieving 0.7–0.8 correlation between the predicted and actual pain ratings. In practically all cases, EN outperforms Ordinary Least Squares (OLS) regression, often by far, indeed making use of regularization to prevent the overfitting that usually hurts OLS. (Similar results are also observed when predicting ratings of a visual stimulus, which were included in our fMRI experiment together with the pain ratings.) Another key aspect of this work is exploring the effects of temporal information on the predictive modeling. As we demonstrate, incorporating functional dynamics by using the information from past time slices (up to 8 in this study) provides consistent and often significant improvement in predictive accuracy. Moreover, using such temporal information may provide new insights into the brain mechanisms of pain perception, since sparse temporal models discover highly predictive, and thus functionally relevant, brain activity patterns that are left undetected by more traditional, non-temporal models.
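As a small, self-contained illustration of this grouping property (not part of the paper's experiments), the sketch below contrasts scikit-learn's Lasso and ElasticNet on a pair of nearly duplicated synthetic predictors; note that scikit-learn's (alpha, l1_ratio) penalty parametrization differs from the formulation given later in Section 2.

```python
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)          # highly correlated "voxel pair"
x3 = rng.normal(size=n)                      # irrelevant predictor
X = np.column_stack([x1, x2, x3])
y = x1 + x2 + 0.1 * rng.normal(size=n)

lasso = Lasso(alpha=0.1).fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print("lasso coefs:", np.round(lasso.coef_, 2))   # typically one of x1/x2 near 0
print("enet  coefs:", np.round(enet.coef_, 2))    # x1 and x2 get similar weights
```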
2 Materials and Methods

2.1 Experimental Setup

Our analysis was performed on the fMRI dataset originally presented in [2]. A group of 14 healthy subjects (7 women and 7 men, aged 35.21 ± 11.48 years) participated in this study. All gave informed consent to procedures approved by the Northwestern University Institutional Review Board committee. The experiment consisted of two sessions focusing on two different rating tasks: pain rating and visual rating. The visual task was included in order to compare the activation patterns that relate specifically to pain perception with the patterns that relate, in general, to rating the magnitude of different types of stimuli; as observed in [2], "brain activations segregate into two groups, one preferentially activated for pain and another one equally activated for both visual and pain magnitude ratings". During the first session (pain rating), the subjects in the scanner were asked to rate their pain level (using a finger-span device) in response to painful stimuli applied to their back. An fMRI-compatible device was used to deliver fast-ramping (20°C/s) painful thermal stimuli (baseline 38°C; peak temperatures 47, 49, and 51°C) via a contact probe. During each session, nine such stimuli were generated sequentially, ranging in duration from 10 s to 40 s, with similar-length rest intervals in between. During the second session (visual stimulus rating), subjects had to rate the magnitude of a bar length, which was actually following their ratings of the thermal stimulus (although the subjects were unaware of this). The data were acquired on a 3T Siemens Trio scanner with echo-planar imaging (EPI) capability using the standard radio-frequency head coil. An average of 240 volumes were acquired for each subject and each task, with a repetition time (TR) of 2.5 s. Each volume consists of 36 slices (slice thickness 3 mm), each of size 64 × 64, covering the whole brain from the cerebellum to the vertex. Standard fMRI data preprocessing was performed using the Oxford Centre for Functional MRI of the Brain (FMRIB) Expert Analysis Tool (FEAT; Smith et al. 2004, http://www.fmrib.ox.ac.uk/fsl), including, for each subject: skull extraction using a brain extraction tool (BET), slice time correction, motion correction, spatial smoothing using a Gaussian kernel of full-width half-maximum 5 mm, nonlinear high-pass temporal filtering (120 s), and subtraction of the mean of each voxel time course from that time course. Pain and visual ratings were convolved with a generalized hemodynamic response function (a gamma function with 6 s lag and 3 s SD).

2.2 Methods

Let X1, ..., XN be a set of N predictors, such as voxel intensities (BOLD signals), and let Y be the response variable, such as the pain perception rating, or the visual stimulus. Let X = (x1 | ... | xN) denote the M × N data matrix, where each xi is an M-dimensional vector consisting of the values of predictor Xi for M data samples, while the M-dimensional vector y denotes the corresponding values of the response variable Y. When using regularized regression, such as Lasso and Elastic Net, the data are usually preprocessed, ensuring that the response variable is centered to have zero mean and
all predictors have been standardized to have zero mean and unit length. Herein, we consider the problem of estimating the coefficients βi in the following linear regression model:

$$\hat{\mathbf{y}} = \mathbf{x}_1 \beta_1 + \cdots + \mathbf{x}_N \beta_N = \mathbf{X}\boldsymbol{\beta}, \qquad (1)$$

where ŷ is an approximation of y. As a baseline, we use Ordinary Least Squares (OLS) regression, which finds a set of βi that minimize the sum-squared approximation error $\|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|_2^2$ of the above linear model. When X has full column rank (which also implies that the number of samples M is larger than the number of variables N), OLS finds the unique closed-form solution $\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$. However, when N > M, as is often the case in fMRI data with (dozens of) thousands of predictors (voxels) and only a few hundred samples (TRs), there is no unique solution, and some additional constraints are required. (Herein, we used the pseudoinverse, based on the Matlab pinv function, in order to solve OLS when N > M; a short illustration follows below.) In general, OLS solutions are often unsatisfactory, since (1) their predictive accuracy can be low due to overfitting, especially in the presence of a large number of variables and a relatively small number of samples, and (2) no variable selection occurs with OLS (i.e., all coefficients tend to be nonzero), so it is hard to pinpoint which predictors (e.g., voxels) are most relevant to the response. Various regularization approaches have been proposed in order to handle large-N, small-M datasets and to avoid overfitting [13,10,11,17]. Moreover, recently proposed sparse regularization methods such as Lasso [17] and Elastic Net [18] address both of the OLS shortcomings, since variable selection is embedded into their model-fitting process. Sparse regularization methods include the l1-norm regularization on the coefficients¹, which is known to produce sparse solutions, i.e., solutions with many zeros, thus eliminating predictors that are not essential. In this paper, we use the Elastic Net (EN) regression [18], which finds an optimal solution to the least-squares (OLS) objective augmented with additional regularization terms: a sparsity-enforcing l1-norm constraint on the regression coefficients that "shrinks" some coefficients to zero, and a "grouping" l2-norm constraint that enforces similar coefficients on predictors that are highly correlated with each other, thus allowing the selection of relevant groups of voxels, which the l1-constraint alone does not provide. This can improve the interpretability of the model, for example, by including a group of similarly relevant voxels rather than one representative voxel from the group. Formally, EN regression optimizes the following function²:

$$L_{\lambda_1, \lambda_2}(\boldsymbol{\beta}) = \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|_2^2 + \lambda_1 \|\boldsymbol{\beta}\|_1 + \lambda_2 \|\boldsymbol{\beta}\|_2^2. \qquad (2)$$
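To make the N > M failure mode of the OLS baseline concrete, here is a minimal NumPy illustration (with numpy.linalg.pinv standing in for Matlab's pinv); the data are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 100, 500                         # more voxels than TRs, as in fMRI
X = rng.normal(size=(M, N))
beta_true = np.zeros(N)
beta_true[:5] = 1.0                     # only 5 truly relevant predictors
y = X @ beta_true + 0.1 * rng.normal(size=M)

beta_ols = np.linalg.pinv(X) @ y        # minimum-norm least-squares solution
print("train error:", np.linalg.norm(X @ beta_ols - y))   # ~0: interpolates,
# i.e. overfits, and nearly all 500 coefficients are nonzero
```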
In order to solve the EN problem, we use the publicly available Matlab code [16] that implements the LARS-EN algorithm of [18]. It takes as input the grouping parameter λ2 and a sparsity parameter that specifies the desired number of selected predictors. Since this number corresponds to a unique value of λ1 in Eq. 2, as shown in [7], we
2
q 1/q Given some q > 0, anlq -norm is defined as lq (β) = ( N . E.g., ||β||1 = i=1 |βi | ) N N 2 i=1 |βi |, and ||β||2 = i=1 βi . Note that EN becomes equivalent to Lasso when λ2 = 0 and λ1 > 0, while for λ1 = 0 and λ2 > 0 it is equivalent to ridge regression.
will slightly abuse the notation and, following [6], denote the sparsity parameter as λ1, while always interpreting it as the number of selected predictors.

Selecting Predictor Sets: Temporal and Non-temporal. When predicting a stimulus or behavior from fMRI data, it is typical to use as predictors the voxel intensities at the current TR, and to treat TRs as independent and identically distributed (i.i.d.) samples [6]. However, temporal information from past TRs may sometimes improve the predictive model, as we demonstrate in this paper. We considered as a set of predictors all voxels from the past 8 time lags (previous TRs), including the current TR. However, due to the very high dimensionality of this set, we selected only the subset of those voxels that were correlated with the response variable above a given threshold (herein, we used a threshold of 0.2). (Note that each time-lagged voxel time series was shifted forward by the appropriate lag in order to properly align it with the response time series.) Overall, we experimented with the following sets of predictors: Set1 – all brain voxels at the current TR; Set2 – a subset of Set1 that included only (current-TR) voxels correlated with the response variable above the same threshold of 0.2, for a fairer comparison with the time-lagged voxel subset described above, which we denote Set3. For the response variable, we first used the pain perception rating, and then the visual stimulus (recomputing Set2 and Set3 according to the correlation with the different response; we denote the corresponding sets for the visual stimulus as Set2v and Set3v). Moreover, we experimented with two more sets of predictors, which we refer to as pain-only voxels, obtained by removing the "visual" voxels from the "pain" voxels in the corresponding time-lag and no-lag settings. Specifically, the "pain-only" time-lagged voxel Set4 was obtained by removing the time-lagged visual voxels Set3v from the time-lagged pain voxels Set3, while the "pain-only" no-lag (current-TR) voxel Set5 was obtained by removing the no-lag visual voxels Set2v from the no-lag pain voxels Set2. The objective of the experiments with the "pain-only" voxels, inspired by the similar work of [2] (which was performed in a GLM rather than a predictive setting, and without considering temporal information), was to test the hypothesis that excluding the voxels common to both pain- and visual-stimulus rating (and thus possibly relevant just to magnitude rating) leaves a set of only pain-relevant voxels that contains enough information for good predictive modeling of pain. We experimented with EN, varying the sparsity (number of voxels selected into the EN solution) and grouping (weight on the l2-norm) parameters, and compared the results with OLS as a baseline. We used the first 120 TRs for training the model and the remaining 120 TRs for testing its predictive accuracy, measured by Pearson's
3 Results

EN parameter selection. We explored a range of grouping parameters for EN, from λ2 = 0.1 to 20, and observed, similarly to [18,6], that higher values of the grouping parameter yielded similar
[Fig. 1 plot: pain prediction accuracy (correlation with response) versus number of voxels (sparsity), with curves for OLS and for EN with λ2 = 0.1, 1, 5, 10.]
Fig. 1. Effects of sparsity and grouping parameters on the performance of EN
(and often better) predictive accuracy while allowing EN to select larger and more spatially coherent clusters of correlated voxels, thus improving the interpretability of the corresponding brain maps. A typical behavior of EN as a function of its sparsity and grouping parameters is shown in Fig. 1 for one of the subjects: as λ2 increases, the peak performance is achieved at a larger number of selected voxels. As we can also see, higher λ2 values achieve better peak performance than lower ones, which appears to be a common trend across the other subjects as well, although there are a few exceptions. We also noticed that increasing λ2 beyond 20 did not produce any further significant improvement in performance, and we therefore fixed the grouping parameter to λ2 = 20 in our experiments. In the following, we present the results for EN with a fixed sparsity parameter of 1000 voxels, since EN's predictive accuracy often reached a plateau around this number. Later in this section we also present the full set of experiments with varying sparsity, for all subjects and for all subsets of predictors discussed above.

Elastic Net versus OLS. First, we observe a significant improvement in predictive accuracy when comparing EN against OLS on the same subsets of voxels. Fig. 2 shows the results for OLS versus EN with a fixed sparsity parameter of λ1 = 1000 voxels and grouping parameter λ2 = 20. Specifically, Fig. 2a shows the results for both methods on the same Set3 of temporal (time-lagged) voxels. For all 14 subjects, EN's prediction was better than the OLS prediction, measured by the correlation between the predicted and actual pain rating of a particular subject, and the improvement was often quite substantial (e.g., from 0.2 to about 0.65, or from 0.3-0.4 to 0.55). (Note that the straight line in Fig. 2a corresponds to equally predictive values, and EN is always above it.) Similar results were observed for visual stimulus prediction, where EN was compared to OLS on the Set3v of the corresponding temporal (time-lagged) voxels for the visual stimulus (Fig. 2c). On 12 out of 14 subjects, EN made more accurate predictions than OLS, often improving the correlation of the prediction with the response from about 0.3-0.5 to about 0.6-0.7. Finally, EN also clearly outperformed OLS on the set of time-lagged "pain-only" voxels (Set4), as shown in Fig. 3d.
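For readers without the Matlab package [16], LARS-EN can be approximated via the standard reduction of the elastic net to a lasso on augmented data [18], solved by LARS and stopped once the desired number of predictors has entered the model. The sketch below is a minimal Python version of that reduction, not the authors' implementation; note that max_iter only approximately enforces the target sparsity, since LARS-lasso steps can also drop variables.

```python
import numpy as np
from sklearn.linear_model import lars_path

def lars_en(X, y, lam2, n_voxels):
    """Approximate LARS-EN: elastic net with grouping weight lam2,
    stopped after roughly n_voxels predictors have been selected."""
    n, p = X.shape
    # Zou & Hastie's augmentation: appending sqrt(lam2) * I to X turns
    # the elastic-net problem into an ordinary lasso problem.
    X_aug = np.vstack([X, np.sqrt(lam2) * np.eye(p)]) / np.sqrt(1.0 + lam2)
    y_aug = np.concatenate([y, np.zeros(p)])
    alphas, active, coefs = lars_path(X_aug, y_aug, method="lasso",
                                      max_iter=n_voxels)
    return coefs[:, -1] / np.sqrt(1.0 + lam2)   # "naive" EN coefficients
```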
Fig. 2. Prediction results for Elastic Net (fixed sparsity λ1 = 1000 voxels, grouping λ2 = 20) versus OLS, for predicting pain perception and visual stimulus. First column: Elastic Net outperforms OLS for (a) pain perception prediction on the voxel Set3 (time-lagged pain voxels) and (c) visual stimulus prediction on the Set3V (time-lagged visual voxels). Second column: effects of temporal information - EN w/ time-lag outperforms EN w/ no lag for (b) pain prediction on the time-lagged Set3 voxels vs Set1 voxels (no-lag, full-brain) and (d) visual prediction on the time-lagged visual voxels Set3V vs Set1 voxels (no-lag, full-brain).
Temporal information: EN with time-lag outperforms EN without time-lag. Next, we compare the prediction results for EN on the time-lagged voxels versus EN on the current-TR (no-lag) voxels. Fig. 2b shows the results for pain perception prediction when using EN on time-lagged voxels (Set3) versus the current-TR, full-brain voxel set (Set1). (Note that the results for EN on Set1 and Set2 were almost identical, perhaps because EN would not select voxels below the 0.2 correlation threshold with the response variable anyway; similarly, for visual prediction, we did not see much difference between Set1 and Set2V - for more detail, see Fig. 5a,b.) We can see that using the temporal information in the time-lagged voxels very often improves the predictive performance, sometimes quite dramatically, e.g., from about 0.47 to about 0.6. Again, similar results are observed for visual stimulus prediction (Fig. 2d, using time-lagged
Fig. 3. Information preserved in "pain-only" voxels: EN pain perception prediction on (a) time-lagged, "pain-only" voxels (Set4) versus all time-lagged voxels (Set3), (b) time-lagged, "pain-only" voxels (Set4) versus full-brain, but no-lag voxels (Set1), (c) time-lagged, "pain-only" voxels (Set4) versus the "pain-only" voxels selected without the lag (Set5), and (d) EN versus OLS for pain perception prediction using "pain-only" time-lagged voxels Set4.
Set3V vs no-lag Set1), and for pain perception prediction when using pain-only voxels (Set4 vs Set5) (Fig. 3c), although the advantage of time-lagged over no-lag voxels was most clear for pain perception, where the time lag practically always improved the performance, unlike the cases of visual and pain-only voxels. This suggests that brain states immediately preceding the pain rating contain a significant amount of information related to the pain perception, and thus should be taken into account in any pain perception model (note that current GLM approaches do not incorporate time lag).

Predictive information preserved in pain-only voxels. Finally, we investigate how much information about the pain perception is preserved in the pain-only time-lagged voxels (Set4), i.e., the voxels remaining after eliminating from the "pain" time-lagged voxels (Set3) all the voxels that also appear in the "visual" time-lagged voxels (Set3V). The hypothesis suggested by [2], as we mentioned before, is that the pain-rating task also activates brain areas generally related to rating the
magnitude of a stimulus (e.g., a visual stimulus) besides activating pain-specific areas; removing voxels common to pain and visual ratings would allow a better localization of the pain-specific areas. While [2] investigate such areas in a GLM rather than a predictive setting, and do not exploit temporal information, we explore the "pain-only" voxel sets chosen by a sparse predictive model such as EN, both with and without the time lag. We observe that (a) as expected, just reducing the set of all time-lagged voxels (Set3) to its pain-only subset may somewhat lower the predictive accuracy (Fig. 3a); (b) however, the time-lagged pain-only voxels still preserve enough information to frequently (6 out of 14 subjects) outperform even the full-brain set of voxels that ignores such temporal information (Set1), as shown in Fig. 3b; (c) moreover, the time-lagged pain-only voxels outperform, or are comparable with, the no-lag pain-only voxels (Set5) even more frequently, on 9 out of 14 subjects (Fig. 3c); and (d) finally, EN on the pain-only, time-lagged voxels (Set4) clearly outperforms OLS on the same set of voxels, just as observed earlier for the other voxel subsets (Fig. 3d).

Varying EN sparsity level. While the above results were obtained for EN with a fixed sparsity (1000 voxels), Fig. 5 shows a more comprehensive set of results, where the sparsity, i.e., the number of voxels selected by EN, was varied from 30 to 2000. The first row shows the results for pain perception prediction (using the time-lagged Set3 and the no-lag Set1 and Set2), the second row shows the results for visual stimulus prediction (using the corresponding time-lagged Set3V and the no-lag Set1 and Set2V), and the third row shows the results for pain perception prediction when using the pain-only, time-lagged voxels (Set4) and the no-lag Set5, also compared with the no-lag, full-brain set of voxels Set1. Each subplot shows the results for one subject, and in each row subjects are sorted by the predictive accuracy of OLS on the corresponding set of voxels. We can see that the accuracy of EN, especially on Set1 and Set2, typically increases with the number of voxels selected and stabilizes around 1000 voxels (which is why we selected this sparsity level for the comparison presented above). Interestingly, however, the time-lagged
Fig. 4. Brain maps visualizing sparse EN solutions over no-lag (Set1, red and blue) versus time-lag (Set3, green and pink) subsets of voxels
voxels in Set3 and Set4 sometimes reach their best performance at a much lower number of voxels (from 30 to 500), after which the performance may actually decline. This suggests that a relatively small number of time-lagged voxels may contain better predictive information than a similar or larger number of non-temporal voxels. Clearly, using cross-validation to select the best sparsity parameter value for each voxel subset, rather than fixing the sparsity level to 1000 voxels as we did in Fig. 2 and Fig. 3, would show an even more dramatic improvement in predictive performance due to the inclusion of time-lagged voxels. (We hope to include these cross-validation results in the final version if the paper is accepted for publication.)
[Fig. 5 panels (one subplot per subject; accuracy versus number of selected voxels, 30-2000). Row 1 - predicting pain perception: OLS and EN on time-lagged voxels (Set3), and EN on the voxels without the lag (Set1 and Set2). Row 2 - predicting visual stimulus: OLS and EN on time-lagged voxels (Set3V), and EN on the voxels without the lag (Set1 and Set2V). Row 3 - predicting pain perception with pain-only voxels: OLS and EN on time-lagged, pain-only voxels (Set4), EN on no-lag, pain-only voxels (Set5), and EN on Set1.]
Fig. 5. Prediction results for Elastic Net with varying sparsity from 30 to 2000 predictors/voxels, and same grouping parameter λ2 = 20 as in Figures 2 and 3
Brain Maps: Visualizing Sparse Solutions. Finally, Fig. 4 displays the brain maps corresponding to the sparse models found by EN when applied to the time-lagged (Set3) versus the no-lag, full-brain (Set1) sets of predictors. For each of the 14 subjects, we produce two EN solution maps (for lag versus no-lag voxels), where the value at each voxel corresponds to its coefficient in the regression model. Next, we aggregate the maps of each type (lag versus no-lag) over the 14 subjects by selecting only statistically significant voxels, using a binomial test of statistical significance. We show the superposition of the two resulting spatial maps corresponding to the EN solutions on the no-lag Set1 (red and blue for positive and negative values, respectively) and on the time-lagged Set3 (green and pink). Given that the lag model also includes zero lags, the two models are expected to overlap. Indeed, this is what we observe in most of the areas identified by the full model as relevant for prediction; we highlight three of them with dashed circles. Less obviously, although intuitively expected, we also observe that the lag model selects a significant number of voxels that are disregarded by the no-lag model; some of them are indicated by the arrows, and include both positive (green) and negative (pink) values. Given that our time-lagged models are highly predictive, these areas must contain functionally relevant information about pain perception that is ignored by the non-temporal models.
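The text does not spell out the exact form of the binomial test used to aggregate the per-subject maps; one plausible reading, sketched below in Python, treats each subject's map as selecting every voxel independently with that subject's overall selection rate, and keeps voxels chosen by improbably many of the 14 subjects. The null probability and the significance level here are assumptions.

```python
import numpy as np
from scipy.stats import binomtest

def group_map(selected, alpha=0.05):
    """selected: (n_subjects, n_voxels) boolean array; True where EN gave
    the voxel a nonzero coefficient for that subject. Returns a boolean
    mask of voxels selected by significantly many subjects under a
    binomial null with the mean per-subject selection rate."""
    n_subj, n_vox = selected.shape
    p0 = selected.mean()                   # assumed null selection rate
    counts = selected.sum(axis=0)
    pvals = np.array([binomtest(int(k), n_subj, p0,
                                alternative="greater").pvalue
                      for k in counts])
    return pvals < alpha
```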
4 Conclusions

Based on our results, we conclude that: (a) the use of sparse predictive modeling for pain perception analysis reveals that functional MRI signals carry considerably more information than a simple linear regression approach would suggest; (b) functional dynamics can considerably increase the amount of information about the subject's performance, as opposed to single-TR, or "instantaneous," approaches; and (c) the sparse, temporal models reveal functional areas that are not identified by non-temporal approaches, and yet can be highly predictive and thus functionally relevant.
Acknowledgements Marwan N. Baliki was supported by an anonymous donor; A. Vania Apkarian and experimental work were supported by NIH/NINDS grant NS35115.
References
1. Apkarian, A.V., Bushnell, M.C., Treede, R.D., Zubieta, J.K.: Human brain mechanisms of pain perception and regulation in health and disease. Eur. J. Pain 9, 463–484 (2005)
2. Baliki, M.N., Geha, P.Y., Apkarian, A.V.: Parsing pain perception between nociceptive representation and magnitude estimation. Journal of Neurophysiology 101, 875–887 (2009)
3. Baliki, M.N., Geha, P.Y., Apkarian, A.V., Chialvo, D.R.: Beyond feeling: chronic pain hurts the brain, disrupting the default-mode network dynamics. J. Neurosci. 28, 1398–1403 (2008)
4. Battle, A., Chechik, G., Koller, D.: Temporal and Cross-Subject Probabilistic Models for fMRI Prediction Tasks. In: Schölkopf, B., Platt, J., Hoffman, T. (eds.) Advances in Neural Information Processing Systems, vol. 19, pp. 121–128. MIT Press, Cambridge (2007)
5. Buchel, C., Bornhovd, K., Quante, M., Glauche, V., Bromm, B., Weiller, C.: Dissociable neural responses related to pain intensity, stimulus intensity, and stimulus awareness within the anterior cingulate cortex: a parametric single-trial laser functional magnetic resonance imaging study. J. Neurosci. 22, 970–976 (2002)
6. Carroll, M.K., Cecchi, G.A., Rish, I., Garg, R., Rao, A.R.: Prediction and Interpretation of Distributed Neural Activity with Sparse Models. NeuroImage 44(1), 112–122 (2009)
7. Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Statist. 32(1), 407–499 (2004)
8. Friston, K.J., et al.: Statistical parametric maps in functional imaging - a general linear approach. Human Brain Mapping 2, 189–210 (1995)
9. Pereira, F., Gordon, G.: The Support Vector Decomposition Machine. In: ICML 2006, pp. 689–696 (2006)
10. Frank, I., Friedman, J.: A statistical view of some chemometrics regression tools. Technometrics 35(2), 109–148 (1993)
11. Fu, W.: Penalized regression: the bridge versus the lasso. J. Comput. Graph. Statist. 7(2), 397–416 (1998)
12. Haxby, J.V., Gobbini, M.I., Furey, M.L., Ishai, A., Schouten, J.L., Pietrini, P.: Distributed and Overlapping Representations of Faces and Objects in Ventral Temporal Cortex. Science 293(5539), 2425–2430 (2001)
13. Hoerl, A., Kennard, R.: Ridge regression. Encyclopedia of Statistical Sciences 8(2), 129–136 (1988)
14. Mitchell, T.M., Hutchinson, R., Niculescu, R.S., Pereira, F., Wang, X., Just, M., Newman, S.: Learning to Decode Cognitive States from Brain Images. Machine Learning 57, 145–175 (2004)
15. Price, D.D.: Psychological and neural mechanisms of the affective dimension of pain. Science 288, 1769–1772 (2000)
16. Sjöstrand, K.: Matlab implementation of LASSO, LARS, the elastic net and SPCA, Version 2.0 (June 2005)
17. Tibshirani, R.: Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B 58(1), 267–288 (1996)
18. Zou, H., Hastie, T.: Regularization and variable selection via the Elastic Net. Journal of the Royal Statistical Society, Series B 67(2), 301–320 (2005)
A Study of Mozart Effect on Arousal, Mood, and Attentional Blink
Chen Xie1,2,3, Lun Zhao4,5, Duoqian Miao1,2,3,*, Deng Wang1,2,3, Zhihua Wei1,2,3, and Hongyun Zhang1,2,3
1 Department of Computer Science and Technology, Tongji University, Shanghai 201804, P.R. China
2 Key Laboratory of Embedded System & Service Computing, Ministry of Education of China, Tongji University, Shanghai 201804, P.R. China
3 Tongji Branch, National Engineering & Technology Center of High Performance Computer, Shanghai 201804, P.R. China
Tel.: +86-21-69589375
[email protected]
4 Visual Art & Brain Cognition Lab., Beijing Shengkun Yanlun Technology Co. Ltd., Beijing 100192, P.R. China
5 Institute of Public Opinion, Renmin University of China, Beijing, P.R. China
* Corresponding author.
Abstract. In this study, we investigated the existence of a temporal component of the Mozart effect and analyzed the influence of arousal or mood changes on attentional blink task performance while listening to a Mozart sonata. The results of the experiment showed that the performance of subjects in the attentional blink task did not significantly improve when they listened to the Mozart sonata played at either normal or fast speed. This indicates that the temporal component of the Mozart effect does not exist in general. We propose that the Mozart sonata may induce shifts in the listener's arousal or mood, but does not significantly influence temporal attention.
1 Introduction
A body of research results indicates that listening to Mozart's music may induce a short-term improvement in the performance of certain kinds of mental tasks. The Mozart effect was first reported by Rauscher, Shaw, and Ky (1993) [11], who investigated the effect of listening to music by Mozart on spatial reasoning. In their study, subjects improved by 8 to 9 points on spatial-temporal tasks after listening to 10 minutes of Mozart's Sonata for Two Pianos in D Major (K.448). However, among the large number of attempts to replicate the findings, some have indeed reproduced them, while others failed to show a significant effect of listening to Mozart's music. Nonetheless, despite critical discussions, the more widely accepted account of those failures of replication is that Mozart's music may change the listener's arousal or mood rather than their spatial-reasoning ability, and that this change may influence
spatial reasoning processing. It is well known that arousal and mood influence cognition. According to the arousal-mood hypothesis, listening to music affects arousal and mood, which then influence performance on various cognitive skills [3][5][10][12][13][14]. Several studies support similar conclusions: participants who listened to quick, major-mode music performed better on tests than those who listened to slow, minor-mode music. For example, in one study researchers examined the effects of musical tempo and mode on arousal, mood, and spatial abilities. Participants were asked to do the paper-folding-and-cutting (PF&C) task while listening to one of four versions of the Mozart sonata (K.448), produced by adjusting specific properties of the music: tempo (fast or slow) and mode (major or minor). According to their results, exposure to the fast-major K.448 significantly improved participants' performance [2]. Furthermore, another report claimed to find a temporal component of the 'Mozart effect' in non-spatial visual attentional blink experiments [1]. They compared participants' temporal attention in an attentional blink task under three conditions (Mozart sonata played normally, in reverse, and in silence) and concluded that the 'Mozart effect' influenced temporal attention. They discussed that this temporal influence may depend on the change in arousal or mood induced by Mozart's music. It would be an exciting finding if such a temporal component does exist in the Mozart effect. To assess the validity and determine the explanation of the Mozart effect's temporal influence, more evidence and analysis are needed. The attentional blink will be introduced in detail later, under 'Prior Knowledge'. The purpose of the present study was to validate whether the Mozart effect can influence temporal attention in a general way; in other words, whether the Mozart effect's temporal influence is a robust phenomenon. Following this, we further investigate a reliable explanation of the Mozart effect's temporal influence, if it indeed exists. Toward this end, we also use an attentional blink experiment, as the attentional blink can be viewed as a method to probe the limits of humans' ability to consciously perceive stimuli distributed across time. We manipulated the audio background conditions in the experiment as follows: silence (baseline), Mozart sonata (K.448, D Major) played normally, and Mozart sonata (K.448, D Major) played at fast speed. We predicted that, if the Mozart effect's temporal influence exists, participants should do better in the attentional blink task when they listen to the Mozart sonata played normally than in silence. As enjoyment ratings are much higher when listening to faster major music, if the Mozart effect's temporal influence depends on the arousal or enjoyment it induces, participants should do best in the attentional blink task under the fast Mozart sonata condition among the three audio background conditions. In the following, we refer to these three experimental conditions as: silence, Mozart normal, and Mozart fast.
2 Prior Knowledge
Visual attention plays a vital role in visual cognition. The mechanism of visual attention has been studied for over 50 years as one of the major goals of both
cognitive science and neuroscience [6]. In the last 15 years, the intense interest among researchers has shifted from the mechanisms and processes involved in deploying attention across the space dimension to the time dimension [9]. The attentional blink is a robust phenomenon that reflects a human attentional constraint. In a typical attentional blink experiment, participants are required to observe a rapid serial visual presentation (RSVP) of items. Two targets (T1 and T2) are embedded in the stream of nontargets (i.e., distracters). Participants are instructed to report the two targets after the stimulus stream has ended. The attentional blink is defined as having occurred when T1 is reported correctly but report of T2 is inaccurate at short T1-T2 lags, typically between 100 and 500 ms, while recovering to the baseline level of accuracy at longer intervals. Fig. 1 shows a standard attentional blink task and results.
Fig. 1. Standard attentional blink task and results
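Because the blink is defined on trials where T1 was reported correctly, the standard dependent measure is the conditional accuracy P(T2 correct | T1 correct) at each lag. A minimal pandas sketch, with hypothetical file and column names:

```python
import pandas as pd

# One row per trial; the columns are assumptions, not from the paper.
trials = pd.read_csv("trials.csv")  # subject, condition, lag, t1_ok, t2_ok

# T2|T1 accuracy: restrict to T1-correct trials, then average T2 accuracy
# per subject, audio condition, and T1-T2 lag.
acc = (trials[trials["t1_ok"] == 1]
       .groupby(["subject", "condition", "lag"])["t2_ok"]
       .mean()
       .reset_index(name="acc"))
```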
Theoretical accounts of the attentional blink indicate that the attentional demands of selecting T1 prevent attentional resources from being applied to T2, transiently impairing the redeployment of these resources to subsequent targets at short T1-T2 lags. Research on the attentional blink helps us investigate human reactions in real-life situations in which multiple events may rapidly succeed each other (e.g., in traffic).
3 Methods

3.1 Subjects
Twenty-six participants between 21 and 27 years old (mean = 23.9) were recruited from the local university of applied sciences; twelve were female, and all were right-handed. Subjects were paid for participation, and oral consent was obtained prior to the start of the experiment. All participants had normal or corrected-to-normal visual acuity and normal hearing by self-report. The experiment lasted approximately 40 min. No participant had any specific training in music or instruments.
3.2 Apparatus and Materials
The software program E-Prime (Psychology Software Tools, Inc., Pittsburgh, PA), installed on a desktop computer with a CRT monitor (screen refresh rate of 85 Hz), was used to display the visual stimuli and record the data. The distance between participant and monitor screen was approximately 65 cm. The participant sat directly in front of the monitor in a quiet experimental room and had a comfortable view of the screen. Visual stimuli consisted of letters from the alphabet (omitting the letters I, O, Q, and S) and the digits 2 to 9, displayed in black in the center of a gray background in Courier New font, size 22. The auditory stimulus was Mozart's Sonata for Two Pianos in D Major, K.448, played at normal speed (tempo of 65 bpm) or fast speed (tempo of 120 bpm), over headphones. In the silence condition, no music was played over the headphones.
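The paper does not state how the fast version was produced; a common way to speed up a recording without altering pitch is a phase-vocoder time stretch. The Python sketch below is one such approach, with a hypothetical file name and the tempo ratio computed from the bpm values given above (an assumption about how "fast" was produced):

```python
import librosa
import soundfile as sf

# Hypothetical input file; rate > 1 shortens the audio, i.e. a faster tempo.
y, sr = librosa.load("mozart_k448.wav", sr=None)
rate = 120.0 / 65.0   # assumed fast/normal tempo ratio from the text
y_fast = librosa.effects.time_stretch(y, rate=rate)
sf.write("mozart_k448_fast.wav", y_fast, sr)
```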
3.3 Design and Procedure
The present study employed a dual-target task in four blocks. The first block was a practice block with 10 trials under the silence condition and was not included in the statistics; the remaining three blocks were experimental blocks with 100 trials each, under the silence, Mozart normal, and Mozart fast conditions, respectively. Each trial began with the presentation of a fixation cross '+' for 1000 ms, followed by 13-21 distracter letters (drawn randomly without replacement from the 22 letters, excluding the letter 'X'), one of which was replaced by a digit (the first target, T1, drawn randomly from the 8 digits). The letters and digit were each presented for 65 ms, followed by a 15 ms blank interval. The second target, T2, in each trial was the letter 'X', presented on 80% of the trials at a position 3-6 items from the end of the stimulus stream. The first target digit (T1) was presented
Fig. 2. Sequence of screen presentation of a typical trial
randomly 1, 3, 5, or 8 stream positions (80 ms, 240 ms, 400 ms, 640 ms) before T2. After the presentation of the RSVP in each trial, two questions about T1 and T2 ('Was the digit an even number or an odd number?', 'Was there a letter X in the stream?') were presented in order. Participants were instructed to answer these two questions by pressing specific letter keys on the computer keyboard at the end of each trial. The second question was presented 250 ms after the response to the first question. The next trial began 500 ms after the participant had responded to the second question (see Fig. 2). Participants were asked to concentrate on the RSVP on the screen and answer the two questions as accurately as possible. All responses were recorded. The experiment was a within-participants manipulation with a balanced block design of conditions (i.e., Mozart Normal-Silence-Mozart Fast, Silence-Mozart Fast-Mozart Normal, Mozart Fast-Mozart Normal-Silence).
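The trial structure above is easy to express programmatically; the following Python sketch generates one RSVP stream under the stated constraints. It is illustrative only: the treatment of "positions from the end" as 0-indexed offsets is our interpretation, not a detail from the paper.

```python
import random
import string

DIGITS = "23456789"
LETTERS = sorted(set(string.ascii_uppercase) - set("IOQSX"))  # distracters

def make_trial(t2_present=True):
    """Generate one RSVP stream: 13-21 distracter letters, a digit T1,
    and (on T2-present trials) the letter 'X' placed 3-6 items from the
    end, with T1 placed 1, 3, 5, or 8 positions before T2."""
    while True:
        n = random.randint(13, 21)
        t2_pos = n - random.randint(3, 6)
        lag = random.choice([1, 3, 5, 8])
        if not t2_present or t2_pos - lag >= 0:
            break
    stream = random.sample(LETTERS, n)            # without replacement
    if t2_present:
        stream[t2_pos] = "X"                      # second target, T2
        t1_pos = t2_pos - lag
    else:
        t1_pos = random.randrange(n)              # lag is unused here
    stream[t1_pos] = random.choice(DIGITS)        # first target, T1
    return stream, t1_pos, lag
```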
4 Results
The data of all twenty-six participants were included in the analysis. We focused on T2 report accuracy on trials in which the first target was reported correctly. Fig. 3 shows mean T2 detection accuracy, given correct T1 detection, as a function of condition and lag. Lag 0 represents the trials that contained no letter 'X'; lags 1, 3, 5, and 8 represent the T1-T2 lags. As Fig. 3 shows, at lags 0, 5, and 8 the accuracy of T2 is almost the same across conditions; at lags 1 and 3, the accuracy of T2 is slightly better under the Mozart normal condition than under the silence condition. Nevertheless, under the Mozart fast condition, the accuracy
Fig. 3. Mean T2 detection accuracy while T1 detect correctly as a function of Condition and Lags
of T2 is worse than under the Mozart normal condition and almost the same as under the silence condition. This is not consistent with the previous hypothesis. This result indicates that even if some influence of the Mozart effect exists, it is not induced by changes in arousal and enjoyment. To determine whether the differences in T2 accuracy among the three experimental conditions were significant, we performed a two-way analysis of variance (ANOVA) on the T2 accuracy data with the within-participants factors of condition (Mozart normal, Mozart fast, or silence) and T1-T2 lag (0, 1, 3, 5, 8). According to the statistics computed in SPSS, there was no main effect of condition, F(2, 50) = 1.045, p > .05, nor any interaction between condition and lag, F(8, 200) = 0.731, p > .05. The main effect of lag was significant, F(4, 100) = 45.089, p < .001. Subsequent pair-wise comparisons revealed significant differences among all 4 lags (omitting lag 0, which is not related to the attentional blink phenomenon), p < .05, except the difference between lag 1 and lag 3 (p > .05).
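The same repeated-measures ANOVA can be reproduced outside SPSS, for example with statsmodels, assuming a long-format table of per-cell accuracies (file and column names are hypothetical):

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# One row per subject x condition x lag, with the mean T2|T1 accuracy.
acc = pd.read_csv("t2_acc_by_cell.csv")  # subject, condition, lag, acc

res = AnovaRM(acc, depvar="acc", subject="subject",
              within=["condition", "lag"]).fit()
print(res)  # F and p for condition, lag, and their interaction
```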
5 Discussion
In the present study, we conducted attentional blink experiments under three conditions (Mozart normal, Mozart fast, and silence). The results revealed that although there seemed to be a slight trend toward improved accuracy in detecting the second target T2 at lags 1 and 3 under the Mozart normal condition compared with the silence condition, the difference between these two conditions was not significant. In other words, we did not observe a temporal component of the Mozart effect in the present study. In a different report [1], which claimed a significant temporal influence of the Mozart sonata, the ANOVA result for the difference between the Mozart sonata and silence was slightly below the significance level, and slightly above it when non-blink participants were excluded. One explanation for this inconsistency could be that the temporal influence of the Mozart effect over attention may not exist, or may not be strong, in general. Even if this influence does exist, the factor inducing it cannot be the change of arousal caused by the Mozart effect, since in the present study the detection accuracy for T2 under the Mozart fast condition is almost the same as under the silence condition, and worse than under the Mozart normal condition, whereas the arousal theory would predict it to be better. Our findings contrast with those of Olivers and Nieuwenhuis (2005) [7], who also reported an improvement in T2 accuracy under a music condition relative to a silence condition. However, the music they used in their experiment was a tune with continuous beats, which does not have the same musical meaning as a work like the Mozart sonata. A possible explanation is that a rhythmic beat can induce arousal change more easily and can attract human attention, so that hearing the rhythmic beat became a task-irrelevant activity for the participant. That irrelevant activity caused a redeployment of the attentional resources the participant applied to the first target T1, and eventually improved the detection of the second target T2. It has been validated in the laboratory, and is experienced in real life, that music including the Mozart sonata does change the listener's arousal or mood [4]. It
might bring a change in T2 detection accuracy if the participants' arousal or mood shifted, which would be consistent with the resource theory of the attentional blink. Why did this not appear in the present study? One explanation is that the arousal change caused by the Mozart effect is not strong enough to influence attention, and the mood change induced by music often does not occur immediately [8]. Further investigation is needed to examine whether the Mozart sonata has a post-exposure effect on the attentional blink. Another possible explanation of the present result is a culture gap. All the participants were Chinese with no special musical education. They self-reported that they seldom listened to classical music, and none of them had even heard of the Mozart sonata. Their cognitive activity while listening to the Mozart sonata might differ from that of people who are familiar with classical music or who grew up in a Western cultural environment.
6 Conclusion
The present study suggests that the temporal attention influence of the Mozart effect does not exist in general. Although the Mozart sonata changed listeners' arousal or mood in many studies, it failed to induce any temporal influence in the present experiment.

Acknowledgments. This research was supported by the National Natural Science Foundation of China (No. 60775036, No. 60970061) and the Research Fund for the Doctoral Program of Higher Education (No. 20060247039).
References
1. Ho, C., Mason, O., Spence, C.: An investigation into the temporal dimension of the Mozart effect: Evidence from the attentional blink task. Acta Psychologica 125, 117–128 (2007)
2. Husain, G., Thompson, W.F., Schellenberg, E.G.: Effects of musical tempo and mode on arousal, mood, and spatial abilities. Music Perception 20(2), 151–171 (2002)
3. Gabrielsson, A.: Emotions in strong experiences with music. In: Juslin, P.N., Sloboda, J.A. (eds.) Music and Emotion: Theory and Research, pp. 431–449. Oxford University Press, New York (2001)
4. Gabrielsson, A., Lindström, E.: The influence of musical structure on emotional expression. In: Juslin, P.N., Sloboda, J.A. (eds.) Music and Emotion: Theory and Research, pp. 223–248. Oxford University Press, New York (2001)
5. Krumhansl, C.L.: An exploratory study of musical emotions and psychophysiology. Canadian Journal of Experimental Psychology 51, 336–352 (1997)
6. Miller, G.A.: The cognitive revolution: A historical perspective. Trends in Cognitive Sciences 7, 141–144 (2003)
7. Olivers, C.N.L., Nieuwenhuis, S.: The beneficial effect of concurrent task-irrelevant mental activity on temporal attention. Psychological Science 16, 265–269 (2005)
8. Panksepp, J., Bernatzky, G.: Emotional sounds and the brain: the neuro-affective foundations of musical appreciation. Behavioural Processes 60, 133–155 (2002)
9. Dux, P.E., Marois, R.: The attentional blink: A review of data and theory. Attention, Perception, & Psychophysics 71(8), 1683–1700 (2009)
10. Peretz, I.: Listen to the brain: A biological perspective on musical emotions. In: Juslin, P.N., Sloboda, J.A. (eds.) Music and Emotion: Theory and Research, pp. 105–134. Oxford University Press, Oxford (2001)
11. Rauscher, F., Shaw, G., Ky, K.: Music and spatial task performance. Nature 365, 611 (1993)
12. Schmidt, L.A., Trainor, L.J.: Frontal brain electrical activity (EEG) distinguishes valence and intensity of musical emotions. Cognition and Emotion 15, 487–500 (2001)
13. Sloboda, J.A., Juslin, P.N.: Psychological perspectives on music and emotion. In: Juslin, P.N., Sloboda, J.A. (eds.) Music and Emotion: Theory and Research, pp. 71–104. Oxford University Press, New York (2001)
14. Thayer, J.F., Levenson, R.W.: Effects of music on psychophysiological responses to a stressful film. Psychomusicology 3, 44–54 (1983)
Attentional Disengage from Test-Related Pictures in Test-Anxious Students: Evidence from Event-Related Potentials
Rui Chen1,2 and Renlai Zhou1,2,3,4
1 Key Laboratory of Child Development and Learning Science (Southeast University), Nanjing, 210096
2 Research Center of Learning Science, Southeast University, Nanjing, 210096
3 State Key Laboratory of Cognitive Neuroscience and Learning (Beijing Normal University), Beijing, 100875
4 Beijing Key Lab of Applied Experimental Psychology (Beijing Normal University), Beijing, 100875
Abstract. The present study aims to investigate the neural correlates of attentional disengagement in test-anxious students. Event-related potentials were recorded from 28 undergraduates, grouped according to their scores on the Sarason Test Anxiety Scale (TAS). All students performed the same central cue task. The response time (RT) results show that a slowing effect of test-related stimuli appeared in high test-anxious students only. The ERP results show that targets following test-related cues captured more attentional processing in the early period and more attentional resource allocation (enhanced N100 and P300 amplitudes) in both high and low test-anxious students. These findings indicate that behavioral performance is consistent with cognitive processing in high test-anxious students only: the test-related cue captured more of the attentional resources of high test-anxious students and made it difficult for them to shift attention away from targets following test-related cues. For the low test-anxious students, however, there was no slowing effect on test-related trials. Keywords: test anxiety, undergraduates, attentional disengagement, ERPs.
1 Introduction

Test anxiety is a situation-specific trait anxiety [1]. Almost all students suffer from test anxiety when faced with an examination. Based on a questionnaire survey, Wang (2001) found that the rate of high test anxiety among Chinese undergraduates was 21.8% [2]. Generally, test anxiety is described as an emotional state in which a person experiences distress before, during, or after an examination or other assessment, to such an extent that this anxiety causes poor performance or interferes with normal learning. A number of previous studies have shown that the negative mood of highly anxious individuals can be elicited and maintained by their attentional bias toward threat stimuli. This finally leads to a response-delay effect on threat stimuli, meaning that highly anxious individuals have more difficulty disengaging attention from threat stimuli
than low-anxious individuals [3-5]. Moreover, research with the cue-target paradigm indicated that attentional bias to threatening locations might often arise from slow reactions to neutral locations due to delays in disengaging from the threatening locations, with no evidence for facilitated detection of threatening information [6,7]. For high test-anxious students, the index reflecting attentional disengagement from test-related threatening words was higher than for low test-anxious students in the cue-target paradigm [8]. However, visuospatial factors might influence the allocation of attention; that is, the difficulty of disengaging attention from a threat stimulus might be elicited by its location rather than by its valence or any other feature. In the current experiment, we used a central-cue paradigm to investigate pure attentional disengagement without visuospatial factors. In this paradigm, all stimuli are presented in the centre of the screen, which ensures that no shifts of visuospatial attention are required [5]. In order to investigate the neural mechanism of attentional processing, event-related potentials (ERPs) were used in this experiment. Previous studies suggested that the N100 and P300 components, which are related to attentional modulation, are more enhanced following threat-stimulus cues than following non-threat-stimulus cues [9,10]. These two components are mostly detected at the prefrontal cortex, frontal cortex, and parietal cortex [11,12]. The enhancement of amplitude in both components is interpreted as a sign of facilitated attentional processing and of more attentional resources being occupied. In detail, the enhancement of the N100 component is a sign of active attentional orienting to a task-relevant location, while the P300 indexes a complex late positive component (LPC) most closely related to the allocation of attentional resources [13,14]. Summarizing the previous studies, we note that the attentional resources of highly anxious individuals can be captured by threat stimuli more easily. Meanwhile, both the N100 and P300 ERP components are important for attentional processing, reflecting the allocation of attentional resources during a cognitive task. Thus, as a subset of anxiety, is a test-related stimulus often treated as a threat stimulus by test-anxious individuals? Are there any differences in behavioral response and ERPs between high and low test-anxious students? Based on this hypothesis, the present study aims to investigate the differences in cognitive and neural mechanisms between high and low test-anxious students in a central-cue paradigm. The results from the EEG data, combined with response times (RTs), should reveal whether early attentional selection and response preparation are influenced by the type of cueing stimulus (test-related vs. test-unrelated) within the central-cue paradigm.
2 Methods

2.1 Participants

Twenty-eight right-handed students from Southeast University in China volunteered for this experiment. Thirteen of them were high test-anxious students (TAS ≥ 20) and the rest were low test-anxious (TAS ≤ 12), according to their scores on the TAS (Sarason, 1978). The 28 subjects, 12 females and 16 males, were aged between 19 and 27 years (mean = 22.25, S.D. = 1.79). All participants had normal or corrected-to-normal vision.
2.2 Stimuli

Cue stimuli in this experiment consisted of two types of pictures (test-related and test-unrelated). In order to avoid any effect of color on participants' emotion, the 30 pictures of each type were converted to grayscale with Adobe Photoshop. Target stimuli also consisted of two types: an arrow pointing left and an arrow pointing right. Both cue and target stimuli were displayed in the center of a light grey box on the screen, sized 9 cm × 9 cm. The cue stimuli were assessed on the basis of the three-factor theory of emotions [15,16]. For the test-related pictures, the average pleasure-displeasure index was 4.96, and for the test-unrelated pictures it was 5.02. This ensured that all pictures were neutral, without interference from positive or negative emotion. The other two dimensions of emotion were not involved in this experiment.

2.3 EEG Recording

The electroencephalogram was recorded continuously (band pass 0.05-100 Hz, sampling rate 1000 Hz) with a Neuroscan Synamp2 amplifier (Scan 4.3.1, Neurosoft Labs, Inc.), using an electrode cap with 64 Ag/AgCl electrodes mounted according to the extended international 10-20 system and referenced to the linked left mastoid. Vertical and horizontal electrooculograms were recorded with two pairs of electrodes, one pair placed above and below the right eye, and another 10 mm from the lateral canthi. Electrode impedance was maintained below 5 kOhm throughout the experiment. All 64 sites were chosen for statistical analysis. The early attentional orienting component and the late positive component were measured separately in the 110-170 ms and 290-340 ms time windows, respectively. A repeated-measures analysis of variance (ANOVA) was conducted on each ERP component with three factors: test-anxiety group (high/low), picture type (test-related/-unrelated), and electrode site.

2.4 Procedure

Participants were placed in an isolated room for ERP recording and sat in a comfortable chair, 60 cm from the screen. As already indicated, they were told to look continuously at a light grey box located in the center of the black background screen. All stimuli were presented in the central box using E-Prime version 1.1 software. This experiment consisted of 12 practice trials and 270 experimental trials split equally into 3 blocks. In order to avoid anticipated responses to the target, 10% of all experimental trials were filled with blank targets. Each trial began with a central light grey box for 1000 ms, which remained on the screen throughout the experiment. A cue stimulus then appeared within the grey box for 200 ms, and after a blank mask lasting 200 to 600 ms, a target (left or right arrow) was presented until response, or until 3000 ms had elapsed without a response. Participants were instructed to press one of two horizontally positioned buttons on a keyboard, using the index finger of each hand, responding as quickly and accurately as possible to the type of target (i.e., they were instructed to press the left key if the arrow pointed left). There was a variable inter-trial interval (ITI) ranging from 500 to 1050 ms. An equal number of trials with each type of cue valence (test-related and test-unrelated) and target direction
(left and right) were required in this experiment. Trials were presented in a new random order for each participant.

2.5 Preparation of EEG and RT Data

Both RTs and EEG from incorrect trials were eliminated. Following inspection of the RT data with SPSS, RTs beyond 3 SDs of each participant's mean were excluded in order to reduce the influence of outliers.
3 Results

3.1 Task Performance

To estimate response delay (slowing) effects in the central cue task, a response-time difference score, also called the attentional disengagement index, was calculated for each participant by subtracting the mean RT on test-unrelated cue trials from the mean RT on test-related cue trials, so that positive values indicate a slowing effect and negative values indicate a speeding effect.

Table 1. Mean RTs (ms) and SD for low and high test anxiety groups in each condition of the central cue paradigm

Subjects group      Number of subjects   Stimuli type     Mean     Std. Deviation
High test-anxiety   13                   Test-related     425.30   104.61
                                         Test-unrelated   418.28   101.32
Low test-anxiety    15                   Test-related     431.15   108.37
                                         Test-unrelated   432.85   103.02
A 2 × 2 mixed-design ANOVA on RT was carried out with test-anxiety group (high and low) and cue type (test-related and test-unrelated) as factors (see Table 1). The results show a significant anxiety group × cue type interaction, F(1, 26) = 5.45, p < .05, ηp² = .17. A subsequent simple-effects analysis found that RTs to the test-related cues were significantly longer than to the test-unrelated ones in high test-anxious students only, F(1, 12) = 6.58, p < .05, ηp² = .20. No other significant results were found in the ANOVA. Furthermore, an independent-samples t-test was used to compare the attentional disengagement index between the two anxiety groups (see Table 2). This indicates a significant difference between the high and low test-anxious groups, t(26) = 2.33, p < 0.05, d = 0.88; the high test-anxious students show a slowing effect following the test-related cues.

Table 2. Mean and SD of attentional disengagement index for low and high test anxiety groups in the central cue paradigm

Subjects group      Number of subjects   Mean    Std. Deviation
High test-anxiety   13                   7.02    10.20
Low test-anxiety    15                   -1.70   9.57
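The preprocessing and group comparison just described are straightforward to express in code; a minimal Python sketch with hypothetical array names:

```python
import numpy as np
from scipy import stats

def disengagement_index(rt, cue, correct):
    """rt: RTs for one participant; cue: array of 'related'/'unrelated'
    labels; correct: boolean array. Drops error trials and RTs beyond
    3 SD of the participant's mean, then returns mean(test-related)
    minus mean(test-unrelated); positive values indicate slowing."""
    rt, cue = rt[correct], cue[correct]
    keep = np.abs(rt - rt.mean()) <= 3 * rt.std()
    rt, cue = rt[keep], cue[keep]
    return rt[cue == "related"].mean() - rt[cue == "unrelated"].mean()

# Group comparison as in Table 2: independent-samples t-test on the
# per-participant indices of the high vs. low test-anxiety groups.
# t, p = stats.ttest_ind(idx_high, idx_low)
```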
3.2 Event-Related Potentials Data

Peak amplitudes were calculated in the N100 (110-170 ms), P100 (150-200 ms), and N200 (180-230 ms) time windows, and average amplitudes were calculated in the P300 time window (290-340 ms).
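Extracting these measures from an averaged waveform amounts to taking the mean or the extremum within each window. A Python sketch, where the channels-by-time array layout is an assumption:

```python
import numpy as np

def window_amplitude(erp, times, lo, hi, mode="mean"):
    """erp: (n_channels, n_times) averaged waveform in microvolts;
    times: (n_times,) in ms. Returns the per-channel mean amplitude in
    [lo, hi] (used here for P300, 290-340 ms) or the peak amplitude
    (used for N100/P100/N200); the peak is taken as the largest
    absolute deflection in the window, keeping its sign."""
    sel = (times >= lo) & (times <= hi)
    win = erp[:, sel]
    if mode == "mean":
        return win.mean(axis=1)
    idx = np.abs(win).argmax(axis=1)
    return win[np.arange(win.shape[0]), idx]
```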
Fig. 1. Grand averages separating high and low groups of subjects according to their TAS scores in each condition. An average of the six recording channels is represented.
For the N100 component, the amplitude of the most prominent peak was computed for each individual ERP. The peak amplitude for test-unrelated trials was more enhanced than for test-related trials in both high and low test-anxious students (see Fig. 1). These differences were significant in the prefrontal region (F(1, 26) = 27.53, p < .01, ηp² = .51), the frontal region (F(1, 26) = 32.38, p < .01, ηp² = .56), and the centro-parietal region (F(1, 26) = 11.44, p < .01, ηp² = .31). Furthermore, the P100 and N200 components were inspected in the high test-anxious group only. A one-way ANOVA for P100 indicated a significant difference at the left frontal lobe (F(1, 12) = 4.95, p < .05, ηp² = .29); the peak amplitude for test-related trials was more positive than for test-unrelated ones. For the N200 component, significant differences were detected at the left frontal lobe (F(1, 12) = 6.82, p < .05, ηp² = .36), the centro-frontal area (F(1, 12) = 5.84, p < .05, ηp² = .31), and the centro-parietal region (F(1, 12) = 5.73, p < .05, ηp² = .32). The peak amplitudes in all of these brain areas were more negative for test-unrelated trials than for test-related ones. In addition, Fig. 2 shows that the average amplitude of the P300 component for test-related trials was more enhanced than for test-unrelated trials in both high and low test-anxious students. These differences were significant at the centro-frontal and superior parietal region (F(1, 26) = 9.15, p < .01, ηp² = .26) and the posterior parietal region (F(1, 26) = 4.92, p < .05, ηp² = .16).
Fig. 2. Grand averages separating high and low groups of subjects according to their TAS scores in each condition. An average of the four recording channels is represented.
4 Discussion

The primary aim of this study was to characterize the behavioral response, combined with ERPs, to test-related and test-unrelated cues in both high and low test-anxious students. We hypothesized that a slowing effect in task performance appears only in high test-anxious students when the target stimulus is presented following a test-related cue, but not in low test-anxious students. The RT results are consistent with this hypothesis. An explanation for this result is the theory of the behavioral inhibition system (BIS), which asserts that risk assessment and risk aversion increase in conflict situations [17]. This means the test-related cues are detected as threat stimuli by high test-anxious students only, and can capture more attentional resources. The BIS is activated to increase assessment of the valence of the threat stimulus, and interferes with the response to targets. Finally, the slowing effect on high test-anxious students in the test-related trials is generated. According to the ERP results, the most prominent N100 amplitudes are mainly detected at the prefrontal region and the centro-parietal region. This component is an early ERP component, related to attentional processing and the active orienting of attention to a task-related location [13]. In the present study, N100 peak amplitudes were more enhanced in the test-unrelated trials for both high and low test-anxious students. This means that test-related cues capture more attentional resources before the target
stimuli appear, which leads to fewer attentional resources being allocated to the following target stimuli. Another significant component in this experiment is the P300, which reflects a readjustment of cognitive strategies in preparation for future stimulus processing [9]. P300 average amplitudes are typically measured most strongly at the electrodes covering the parietal region, though the component is generated by various parts of the brain. In the current study, the P300 amplitude in test-related trials was more enhanced than in test-unrelated trials. This implies that the test-related trials occupy more attentional resources in both high and low test-anxious individuals. Moreover, the P100 and N200 components appeared in high test-anxious students only, meaning that more feature processing (enhanced P100) and stronger behavioral inhibition (enhanced N200) were found in test-related trials. We infer that both the P100 and N200 components are related to the slowing effect observed in the behavioral performance of high test-anxious students. According to the above analyses, we find no difference between high and low test-anxious students during the stages of attentional orienting and readjustment of cognitive strategies. The main differences between the two groups appeared at the stages of feature processing and response inhibition. Therefore, the P100 and N200 should be the components specific to high test-anxious students, reflecting the difficulty of attentional disengagement from test-related stimuli seen in the behavioral performance.
References
1. Keogh, E., French, C.C.: Test anxiety, evaluative stress, and susceptibility to distraction from threat. European Journal of Personality 15, 123–141 (2001)
2. Wang, C.K.: Reliability and Validity of Test Anxiety Scale (Chinese Version). Chinese Mental Health Journal 2, 96–97 (2001)
3. Fox, E., Russo, R., Dutton, K.: Attentional bias for threat: Evidence for delayed disengagement from emotional faces. Cognition and Emotion 16, 355–379 (2002)
4. Jongen, E.M.M., Smulders, F.T.Y., et al.: Attentional bias and general orienting processes in bipolar disorder. Journal of Behavior Therapy and Experimental Psychiatry 38, 168–183 (2007)
5. Mogg, K., Holmes, A., et al.: Effects of threat cues on attentional shifting, disengagement and response slowing in anxious individuals. Behaviour Research and Therapy 46, 656–667 (2008)
6. Derryberry, D., Reed, M.A.: Temperament and attention: orienting toward and away from positive and negative signals. Journal of Personality and Social Psychology 66, 1128–1139 (1994)
7. Koster, E.H.W., Crombez, G., et al.: Selective attention to threat in the dot probe paradigm: differentiating vigilance and difficulty to disengage. Behaviour Research and Therapy 42, 1183–1192 (2004)
8. Liu, Y., Zhou, R.L.: The Cognitive Mechanism of Attentional Bias in Test-anxious Students. Annual Report of Southeast University, Nanjing (2008)
9. Wright, M.J., Geffen, G.M., et al.: Event related potentials during covert orientation of visual attention: effects of cue validity and directionality. Biological Psychology 41, 183–202 (1995)
10. Bar-Haim, Y., Lamy, D., et al.: Threat-related attentional bias in anxious and nonanxious individuals: a meta-analytic study. Psychological Bulletin 133, 1–24 (2007)
11. Woods, D.L., Knight, R.T.: Electrophysiologic evidence of increased distractibility after dorsolateral prefrontal lesions. Neurology 36, 212–216 (1986)
12. Eimer, M.: Effects of attention and stimulus probability on ERPs in a Go/Nogo task. Biological Psychology 35, 123–138 (1993)
13. Heinze, H.J., Luck, S.J., et al.: Visual event-related potentials index focused attention within bilateral stimulus arrays. I. Evidence for early selection. Electroencephalography and Clinical Neurophysiology 75, 511–527 (1990)
14. Gray, H.M., Ambady, N., et al.: P300 as an index of attention to self-relevant stimuli. Journal of Experimental Social Psychology 40, 216–224 (2004)
15. Tucker, D.M., Hartry-Speiser, A., et al.: Mood and spatial memory: emotion and right hemisphere contribution to spatial cognition. Biological Psychology 50, 103–125 (1999)
16. Kemp, A.H., Gray, M.A., et al.: Steady-state visually evoked potential topography during processing of emotional valence in healthy subjects. NeuroImage 17, 1684–1692 (2002)
17. Putman, P., Hermans, E., et al.: Emotional stroop performance for masked angry faces: it's BAS, not BIS. Emotion 4, 305–311 (2004)
Concept Learning in Text Comprehension
Manas Hardas and Javed Khan
Computer Science Department, Kent State University, Kent, Ohio 44240, USA
{mhardas,javed}@cs.kent.edu
Abstract. This paper presents a mechanism to reverse engineer a cognitive concept association graph (CAG) which is formed by a reader while reading a piece of text. During text comprehension a human reader recognizes some concepts and skips some. The recognized concepts are retained to construct the meaning of the read text, while the other concepts are discarded. The concepts which are recognized and discarded vary for every reader because of differences in the prior knowledge possessed by each reader. We propose a theoretical forward calculation model to predict which concepts are recognized based on the prior knowledge. To demonstrate the validity of this model, we employ a reverse-engineering approach to calculate a concept association graph as per the rules defined by the model. An empirical study is conducted of how six readers from an undergraduate class of Computer Networks form a concept association graph given a paragraph of text to read. The model computes a resultant graph which is flexible and can give quantitative insights into the more complex processes involved in human concept learning.
1 Introduction
Text comprehension is a high-level cognitive process performed remarkably well by humans. From a computational viewpoint, it is remarkable because humans can understand and comprehend any piece of text they have never seen before and learn completely new concepts from it. Human concept learning as defined by Bruner et al. (1967) is the correct classification of examples into categories. Previous computational models of human concept learning (Tenenbaum 1999, Dietterich et al. 1997) give very good approximations of this kind of concept learning. However, learning concepts from text is unlike learning from examples. Completely new concepts are learnt not by hypothesis induction but by making new associations with prior knowledge. Therefore a cognitive theory of concept construction is needed to explain this process, rather than a theory of generalization. Constructivism (Piaget, 1937) is a cognitive theory of learning which explains how concepts are internalized based on previously acquired concepts by assimilation and accommodation. It gives a systematic cognitive model for acquiring new concepts in the context of prior knowledge. Hence we propose that the process of concept learning from text be examined in the light of the cognitive processes involved in constructivism.
Text comprehension research in the cognitive sciences considers comprehension either from a process-model point of view (Kintsch 1988; van den Broek, Risden, Fletcher and Thurlow 1999; Tzeng, van den Broek, Kendeou and Lee 2005; Gerrig and McKoon 1998; Myers and O'Brien 1998) or from a knowledge point of view (Landauer and Dumais 1988, 1997). Both approaches to text comprehension necessitate a mathematical model for learning and the involvement of prior knowledge. There is ample evidence for the importance of prior knowledge in learning from texts (Kintsch, E. and Kintsch, W. 1995; McKeown, M.G., et al. 1992; Means, M., et al. 1985; Schneider, W., et al. 1990). As observed by Verhoeven, L. and Perfetti, C. (2008), over the past decade research on text comprehension has moved towards models in which connectionist, memory-based, and constructivist aspects of comprehension are more integrated. There are two main contributions of this paper. The first is a mechanistic model of the cognitive processes involved in concept learning during text comprehension. The process of text comprehension is defined as concept recognition and concept association. During human text comprehension the CAG goes through a series of incremental changes as specific concepts are recognized or discarded depending upon the reader's prior knowledge. This paper proposes a computational model for these two processes. The second main contribution is a reverse-engineering approach to the model for obtaining the concept association graph. When a person reads text, the CAG which is formed cannot be known beforehand. Hence we need a reverse-engineering approach to find the CAG from the data generated by the subjects. An empirical study is conducted in which reader-drawn CAGs are fed into a constraint satisfaction system which computes the CAG for a reader or a group of readers. This novel method to compute the comprehensive graph can be efficiently used to comment on the state of learning for a reader or a group of readers.
2 Computational Model
2.1 Knowledge Representation
The knowledge representation is a concept association graph (CAG). The graph consists of nodes which represent concepts, and the edges between the nodes signify the association between concepts. The strength of an association is given by an association strength. Association strengths are positive, though they can be negative in some special circumstances. Any CAG, for example T, has a set of concepts and a set of associations represented by the tuples CT = [c1, c2, c3, ...] and AT = [lc1,c2, lc1,c3, lc1,c4, ...] respectively. Figure 1 shows an example of a simple concept association graph. From the graph it can be seen that the concept "ethernet" is most strongly associated with "CSMA" because of its high association strength. Similarly "LAN" and "CSMA" are the two most weakly associated concepts. The time line, represented by t=1, t=2 and so on, is the order in which the concepts were acquired. A lower time value means that the concepts were acquired relatively earlier in learning. The semantics of the association strengths is provided by the theory of constructivism.
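As an illustration (not part of the original paper), a CAG of this kind can be held in two plain Python mappings, one for concepts with their acquisition times and one for weighted associations; the strengths below are the ones shown in Figure 1:

```python
# Illustrative sketch of the CAG representation: a concept set (with the
# acquisition time t of each concept) and a set of weighted associations.
# Values reproduce the Fig. 1 example: ethernet-CSMA has strength 10,
# LAN-CSMA has strength 2.

cag_concepts = {          # concept -> acquisition time t
    "ethernet": 1,
    "LAN": 1,
    "CSMA": 2,
}

cag_associations = {      # (concept_a, concept_b) -> association strength l_{a,b}
    ("ethernet", "CSMA"): 10,
    ("LAN", "CSMA"): 2,
}
```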
Fig. 1. Example of a simple concept association graph (CAG)
For the concepts which are acquired at t=2, the concepts acquired at t=1 are considered as previous knowledge. Since according to the theory all new concepts can only be acquired in the context of previous concepts, the strength of the association signifies the importance of the existence of a prior concept for learning a particular new concept. For example, to learn the concept "CSMA", it is more important to have learned "ethernet" than "LAN", because the association strength between ethernet-CSMA (10) is much greater than that between LAN-CSMA (2).
2.2 Terminology
1. L(t) is defined as the current learned graph at time "t". It is the graph formed by the reader from episodic memory of the text. The graph goes through a series of changes during text comprehension and is continuously evolving as L(t → ∞).
2. Z is the graph which represents the global prior knowledge a reader possesses, i.e., the non-text/domain-specific knowledge. It is assumed, for computational purposes, that Z has the association information for all new concepts. So whenever a reader is confronted by a totally new concept not present in L(t), the new concept is acquired in L(t + 1) by getting the association information from Z.
3. S(t) is a series of graphs, each representing the newly introduced concepts in a comprehension episode. Based on the prior knowledge, some concepts from this graph are recognized while some are discarded. The concepts which are recognized are then associated with concepts from L(t) to get L(t + 1); see Figure 2.
4. Learning: a series of "comprehension episodes", each a two-step process. Latent concept recognition: from the set of presented concepts, some are recognized while some are not. Latent concept association: the recognized concepts are associated with previously known concepts.
3 CAG Transition Process
In the given model we assume two instances of the CAG. The first one is the initially learned CAG, namely L(t = 1), represented by concept set CL(t=1) and association set AL(t=1). Learning progresses through a set of learning episodes. It starts with L(t = 1) and then incrementally constructs L(t = 2, 3, 4, ...). In each learning episode a small graph S(t) is presented. S(t) is the graph for every new sentence at time t. It too has a concept set CS(t=1) and an association set AS(t=1). Generally a learning episode can range from reading a sentence to a part of a sentence or simply a word. By the immediacy assumption we consider the smallest unit of a learning episode to be a sentence. S(t = 1) includes novel concepts and associations which are not in L(t = 1) as well as known elements. The second instance of the CAG is Z. Z provides a learner with the connections between the current L(t = 1) and the newly acquired concepts from S(t = 1). By definition L(t = 1) cannot have association information that can connect the new concepts in S(t = 1) to those of L(t = 1). Thus the model requires an imaginary CAG, namely Z, to provide the learner with some basis of computation to discover new concepts and attach them to the current L(t). Whenever a new concept is presented to the learner, the association information to connect the concept to L(t = 1) is acquired from Z. It is assumed that Z has all the concepts and association information. From the example it can be seen that in the first learning episode, when concepts "d" and "e" are presented, the connection between "a" and "d" is obtained from Z. Concept "e" is not present in Z and therefore has no connectivity information. Hence it is discarded while forming the learned graph L(t = 2). In the second episode, when concepts "f" and "g" are presented, the connection information is again obtained from Z and L(t = 3) is formed. Fig. 2 shows only the relevant part of Z w.r.t. this example.
Fig. 2. CAG transition
4 Processing in a Learning Episode
As mentioned in the previous section, a learning episode is made up of two distinct processes, as detailed below.
4.1 Latent Concept Recognition
The first process is called latent concept recognition. A new concept graph denoted by S(t) is presented, out of which some concepts are recognized based on the prior knowledge and some are not. Let L(t) be the learned CAG denoted by its concept set CL(t) and association set AL(t). A new sentence S(t), presented at step t, has a finite set of discrete concepts CS(t) and an association set AS(t). The set may contain already learned concepts (i.e., those already in L(t)) or new concepts (i.e., those not already in L(t)). The new concepts from S(t) which are recognized by the learner are called "latent concepts" and denoted by the set Clat(t), where Clat(t) ⊂ CS(t). The latent concept set Clat(t) is formed by evaluating a comprehension function which returns a comprehension strength (hi) for each concept "i" in S(t). The concepts for which hi exceeds a certain threshold hT are added to Clat(t), i.e., are recognized.

Clat(t) = [i | i ∈ CS(t); hi > hT]    (1)
The comprehension strength for a node "i" is a function of the association strengths of the links between concept "i" and the prior concepts in L(t). It is computed as hi = f(ls,i), ∀s ∈ CL(t). We assume a linear relationship between the comprehension strength and the threshold coefficient, analogous to the linear weighted-sum activation function in a simple artificial neural network. It is possible that a nonlinear function holds true, but that is a part of another discussion. Therefore,

hi = Σs∈CL(t) ls,i    (2)
An example calculation of the comprehension strength for each concept in CS(t) is shown in Fig. 3. Assume that the association strengths between the concepts in L(t) and S(t) are known. Let the threshold coefficient for this particular example be hT = 10. Calculating the individual comprehension strengths for the concepts in CS(t=1) we have:
1. hCSMA = lethernet,CSMA + lLAN,CSMA = 12 > hT (10)
2. hcarrier sense = lethernet,carrier sense + lLAN,carrier sense = 9 < hT (10)
3. hcollision detect = lethernet,collision detect + lLAN,collision detect = 13 > hT (10)
Since the comprehension strength for "carrier sense" is below the threshold, it is not recognized and not included in Clat(t=1). After the step of latent concept recognition the set consists of Clat(t=1) = (CSMA, collision detect).
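A minimal sketch of this recognition step in Python follows. The function name is ours, and the individual link strengths are hypothetical values chosen so that the sums reproduce the totals above (12, 9 and 13), since only the sums are given in the text:

```python
# Sketch of latent concept recognition (Eqs. 1-2): h_i is the sum of
# association strengths from the prior concepts in L(t) to candidate i,
# and i is recognized when h_i exceeds the threshold h_T.

def recognize(prior_concepts, candidates, strength, h_T):
    """Return the latent concept set C_lat(t) from the candidate set C_S(t).

    strength: dict mapping (prior, candidate) -> l_{s,i}; missing pairs
    contribute 0 (no association available in Z).
    """
    latent = []
    for i in candidates:
        h_i = sum(strength.get((s, i), 0) for s in prior_concepts)  # Eq. (2)
        if h_i > h_T:                                               # Eq. (1)
            latent.append(i)
    return latent

# Worked example from Section 4.1 (h_T = 10); the split of each total into
# two links is invented, only the sums (12, 9, 13) are from the paper.
strength = {
    ("ethernet", "CSMA"): 8, ("LAN", "CSMA"): 4,                        # 12
    ("ethernet", "carrier sense"): 5, ("LAN", "carrier sense"): 4,      # 9
    ("ethernet", "collision detect"): 7, ("LAN", "collision detect"): 6,  # 13
}
print(recognize(["ethernet", "LAN"],
                ["CSMA", "carrier sense", "collision detect"],
                strength, h_T=10))
# -> ['CSMA', 'collision detect']
```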
Fig. 3. Latent concept recognition
4.2 Latent Concept Association
The second process associates the latent concepts in Clat(t) with the concept(s) in the learned set L(t) to form L(t + 1). The set of latent associations is denoted by Alat(t), where Alat(t) ⊂ AZ. The latent association set is formed by evaluating an association function which gives the association strength ai,j for concepts i and j. The association strength ai,j is simply the scalar link strength of li,j. All the associations with strengths greater than a certain threshold aT are included.

Alat(t) = [li,j; ∀i ∈ CL(t), ∀j ∈ Clat(t) | ai,j ≥ aT]    (3)
Fig. 4. Latent concept association
If we assume the association threshold aT = 5, then from the figure it can be seen that the association between "LAN" and "CSMA" is dropped because it is less than the association threshold: aT (5) > aLAN,CSMA (2). After the process of recognition and association the concept map evolves from L(t) to L(t + 1), represented by a concept set and an association set computed as follows:

CL(t+1) = CL(t) ∪ Clat(t)  and  AL(t+1) = AL(t) ∪ Alat(t)    (4)
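The association step and the update of Eq. (4) can be sketched in the same style; again the function name is ours and the link values are only illustrative:

```python
# Sketch of latent concept association (Eq. 3) and the update (Eq. 4).
# Associations from prior concepts to recognized latent concepts are kept
# only when a_{i,j} >= a_T; recognized concepts and surviving links are
# then merged into L(t+1).

def associate_and_update(L_concepts, L_assocs, latent, strength, a_T):
    new_assocs = {(i, j): strength[(i, j)]
                  for i in L_concepts for j in latent
                  if (i, j) in strength and strength[(i, j)] >= a_T}  # Eq. (3)
    return L_concepts | set(latent), {**L_assocs, **new_assocs}      # Eq. (4)

# With a_T = 5, the LAN->CSMA link (strength 2) is dropped, as in Fig. 4:
concepts, assocs = associate_and_update(
    {"ethernet", "LAN"}, {}, ["CSMA"],
    {("ethernet", "CSMA"): 10, ("LAN", "CSMA"): 2}, a_T=5)
print(sorted(assocs))   # -> [('ethernet', 'CSMA')]
```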
5 Reverse Engineering the Association Strengths
We present a constraint satisfaction model to calculate the association strengths for the example CAG as shown in Figure 5. Since the concepts (c, d and g) are
recognized, the comprehension strength of each of these nodes must be greater than the comprehension threshold. This can be represented by a set of linear inequalities called the recognition threshold (hT) equations:

hT equations:
lac + lbc ≥ hT
lad + lbd ≥ hT
lae + lbe ≤ hT
laf + lbf + lcf + ldf ≤ hT
lag + lbg + lcg + ldg ≥ hT
aT equations:
lac ≥ aT, lac > 0
lbc ≤ aT, lbc > 0
lad ≥ aT, lad > 0
lbd ≥ aT, lbd > 0
lag ≤ aT, lag > 0
lbg ≤ aT, lbg > 0
lcg ≥ aT, lcg > 0
ldg ≥ aT, ldg > 0
It is seen from the graph that concepts "e" and "f" are not recognized; therefore no associations exist for them. Also, the links lbc, lag and lbg are less than the threshold, so these links are not present in the learned graph. In this discussion we do not consider associations between concepts recognized at the same time. From these observations we form the association threshold (aT) equations. The hT equations are the ones that constrain the recognition of concepts, while the aT equations are the ones that constrain the association of concepts. It may happen that the association strengths between concepts in L(t) and CS(t) − Clat(t) are greater than the association threshold; for example, lae > aT may hold. But since the sum of lae and lbe is not greater than hT, concept "e" is not recognized. And since concept "e" is not recognized, we do not put aT constraints on its associations. All links are constrained to be greater than 0. The matrix representation of the equations as a linear programming problem is as follows: min f*x subject to the constraints A*x ≤ b, where x is the vector of variables and f, A and b are the coefficient matrices for the objective function, the equation set, and the result. Since we are not trying to optimize an objective function, all the members of f are set to 0. An important observation here is that the recognition and association thresholds (hT and aT) are variable and
Fig. 5. Example of a learned CAG at time t=3
Fig. 6. Matrix representation
factored into the coefficient matrix for the equation set. Fig. 6 shows an example matrix representation. Solving this gives the association strengths for all the associations of a given CAG.
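As a hedged sketch of how such a feasibility problem can be solved in practice (the paper does not specify a solver), the Fig. 5 system can be passed to SciPy's linprog with a zero objective vector. The strict positivity constraints are approximated here by a small margin EPS, which is our assumption:

```python
# Sketch: the Fig. 5 constraint system as an LP feasibility problem.
# Variable order: the 14 link strengths, then h_T and a_T, which the paper
# treats as variables folded into the coefficient matrix.

import numpy as np
from scipy.optimize import linprog

links = ["lac", "lbc", "lad", "lbd", "lae", "lbe", "laf", "lbf",
         "lcf", "ldf", "lag", "lbg", "lcg", "ldg"]
names = links + ["hT", "aT"]
idx = {n: i for i, n in enumerate(names)}
EPS = 1e-3  # margin approximating the strict inequalities (> 0)

def row(terms):
    """Build one row of A for the inequality sum(coef * var) <= 0."""
    r = np.zeros(len(names))
    for coef, var in terms:
        r[idx[var]] = coef
    return r

A = [
    # h_T equations, each rewritten as <= 0
    row([(-1, "lac"), (-1, "lbc"), (1, "hT")]),   # lac + lbc >= hT
    row([(-1, "lad"), (-1, "lbd"), (1, "hT")]),   # lad + lbd >= hT
    row([(1, "lae"), (1, "lbe"), (-1, "hT")]),    # lae + lbe <= hT
    row([(1, "laf"), (1, "lbf"), (1, "lcf"), (1, "ldf"), (-1, "hT")]),
    row([(-1, "lag"), (-1, "lbg"), (-1, "lcg"), (-1, "ldg"), (1, "hT")]),
    # a_T equations
    row([(-1, "lac"), (1, "aT")]),                # lac >= aT
    row([(1, "lbc"), (-1, "aT")]),                # lbc <= aT
    row([(-1, "lad"), (1, "aT")]),
    row([(-1, "lbd"), (1, "aT")]),
    row([(1, "lag"), (-1, "aT")]),
    row([(1, "lbg"), (-1, "aT")]),
    row([(-1, "lcg"), (1, "aT")]),
    row([(-1, "ldg"), (1, "aT")]),
]
b = np.zeros(len(A))
c = np.zeros(len(names))             # zero objective: pure feasibility
bounds = [(EPS, None)] * len(names)  # all strengths and thresholds > 0

res = linprog(c, A_ub=np.vstack(A), b_ub=b, bounds=bounds, method="highs")
if res.success:
    for n, v in zip(names, res.x):
        print(f"{n} = {v:.3f}")
```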
6 Finding and Analyzing the Solution CAG
6.1 Experiment Setup
To find the comprehensive complex graph that can explain the concept learning for a particular example text, an experiment was conducted in a classroom setting with a group of six students in an undergraduate "Computer Networks" class. Subjects were given a paragraph of text about the concept "Ethernet" from the standard textbook prescribed for that class, and were asked to go through each sentence in the paragraph, identify each concept in the sentence, and progressively draw CAGs. The paragraph contained 8 sentences, so the concept learning activity was divided into 8 learning episodes. By the end of the eighth episode the students had drawn CAGs from t=1-8 using the concepts from the text.
6.2 Observations
Figure 7(a) shows the concept graph drawn by one of the students. The student-drawn CAGs are used to reconstruct CAGs for all students, indicating the concepts which were recognized and those which were not. To construct these graphs we first have to find CS(t) for t=1-8. This is done by pooling the concepts at each t=1 to 8 over all students. For example, at t=3 the possible set of recognized concepts which covers all students is CS(t=3) = (PARC, Network, Shared link). Out of these, the student recognized only the concept "Shared link". Therefore, Clat(t=3) = (Shared link) and CS(t=3) − Clat(t=3) = (PARC, Network). Figure 7(b) shows the reconstruction of CS(t=3) and Clat(t=3) for a student. After determining CS(t) and Clat(t) for every sentence and every student, the concept maps are reconstructed to include the recognized as well as the unrecognized concepts.
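A compact sketch of this reconstruction as set operations follows; the concepts attributed to the second student are invented for illustration, chosen only so that the pooled set matches CS(t=3) above:

```python
# Sketch of the reconstruction step at episode t=3: C_S(t) is pooled over
# all students' recognized concepts, and each student's unrecognized set
# is the difference C_S(t) - C_lat(t).

per_student_recognized = {
    "std1": {"Shared link"},       # the student from the example
    "std2": {"PARC", "Network"},   # hypothetical second student
}

C_S = set().union(*per_student_recognized.values())   # pooled set C_S(t=3)
for student, C_lat in per_student_recognized.items():
    print(student, "recognized:", C_lat, "unrecognized:", C_S - C_lat)
```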
(a) Student drawn CAG
(b) Reconstructed CAG
Fig. 7. Reconstructing student drawn CAG
Once the CAGs for all students are reconstructed, they are converted into a set of linear equations and solved to obtain the values of the association strengths which satisfy all the constraints. The result is a fully connected CAG, called the solution CAG, which can mathematically explain the concept learning for all students according to the laws of concept recognition and concept association specified before.
6.3 Analysis of Solution CAG
The solution CAG is a fully connected graph between concepts from the text and special "hidden" nodes. These nodes are introduced at the stage of reconstructing a particular student's CAG. A single hidden node is inserted at time t=1 for every student. This hidden node signifies the background knowledge of that student. For an experiment like this, it is impossible to actually construct a graph of a student's entire background knowledge; there exists no method which can accurately hypothesize a person's concept knowledge graph. Hence we assume that all the background concept knowledge possessed by a student is encompassed in a single hidden node, namely "std1" for student 1, and so on. Thus the resultant CAG contains six such hidden nodes, one for each student. The hidden nodes have connections to all the other concept nodes in the CAG. The associations from a hidden node can have positive as well as negative strengths. This can be explained intuitively: sometimes the student's background knowledge helps in learning new concepts, whereas sometimes it is found to be an obstacle. If the association from the hidden node to a concept node is positive, it implies that the hidden node is beneficial in learning the
new concept, whereas a negative association strength implies that the hidden node is actually detrimental to learning the new concept. A zero-strength association from a hidden node implies it is neither beneficial nor detrimental to learning. The existence of hidden nodes also helps in solving another known problem, learning the XOR function with this model, since the hidden node associations are allowed to take negative values. The solution CAG is actually the imaginary CAG Z, which we assumed to contain all the connectivity and association strength information. Z can thus be calculated by reverse engineering the observed student-drawn CAGs. Association strength distribution. In this section we analyze the distribution of association strengths. Figure 8(a) shows the sorted association strengths between hidden nodes and concept nodes, and Figure 8(b) shows those between concepts only. Some of the links between hidden nodes and concepts have negative strength, but most have positive strength, indicating that more often than not background knowledge helps in learning new concepts. This plot can be used to determine which of the associations are most important and need reinforcement.
(a) Between hidden and concept nodes
(b) Between concept nodes
Fig. 8. Association strength distribution
Node strength distribution. The node strength of a node is calculated by summing up all the association strengths to that particular node. Figure 9(a) shows the sorted node strengths for the hidden nodes. It is seen that the std5 hidden node has the highest node strength. Figure 9(b) shows the sorted node strengths for the actual concepts. "CSMA/CD" has the highest node strength, signifying its importance in the comprehension of this particular paragraph of text. This graph gives us an idea about which concepts are central in comprehending this paragraph. As seen from the figure, the concepts "CSMA/CD", "Collision detect", "Aloha", "frames", etc. are much more central in comprehending the concept of "ethernet" than, say, "bus", "coax cable" or "shared medium".
(a) For hidden nodes
(b) For concept nodes
Fig. 9. Nodes strength distribution
Correlation with hT. Each student is assumed to have a variable hT and aT. These variables are factored into the problem while constructing the equations and coefficient matrices. The variable hT for each student signifies the difficulty or ease with which a student comprehends the particular paragraph of text. A lower value of hT implies that the student possibly has a lower threshold for learning new concepts, meaning the student learns new concepts more easily than one with a high threshold. To observe this we simply plot the correlation between the threshold hT for each of the students and the number of concepts recognized (n) by the student. The table in Figure 10 shows the exact values of hT against n, together with the plot. As expected, there is a negative correlation between the two variables, equal to -0.172.
student   hT        n
1         554.027   17
2         582.764   16
3         622.879   11
4         631.583   10
5         775.794   13
6         574.077    9
Fig. 10. Correlation between hT and n for six students is -0.172
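As a quick sanity check, the reported correlation can be reproduced from the values in the table with NumPy:

```python
# Pearson correlation between h_T and the number of recognized concepts n
# for the six students; reproduces the reported value of -0.172.

import numpy as np

h_T = np.array([554.027, 582.764, 622.879, 631.583, 775.794, 574.077])
n = np.array([17, 16, 11, 10, 13, 9])

print(round(float(np.corrcoef(h_T, n)[0, 1]), 3))   # -> -0.172
```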
7 Conclusion and Potential Directions
In this paper we proposed a computational model for computing the concept association graph which is formed during human text comprehension. A study
is conducted to explain concept learning for a group of six readers, and the approach can be extrapolated to any number of subjects. We perform simple graph analysis on the obtained CAG to find peculiar characteristics a cognitive concept graph might have. Some of the questions we are able to answer are: which associations are more important than others, what distribution do association strengths have, which concept is central in comprehending a particular topic, which student has the maximum chance of learning new concepts, and what is the significance of the threshold coefficient in learning new concepts. The CAG can be subjected to rigorous complex network analysis to derive other interesting inferences. From the theory it is clear that prior concepts play an important role in learning new concepts. As a potential direction we plan to study how the sequence of concepts affects concept learning.
References
1. Dietterich, T., Lathrop, R., Lozano-Perez, T.: Solving the multiple-instance problem with axis-parallel rectangles. Artificial Intelligence 89(1-2), 31–71 (1997)
2. Tenenbaum, J.B.: Bayesian modeling of human concept learning. In: Proceedings of the 1998 Conference on Advances in Neural Information Processing Systems, vol. II, pp. 59–65 (1999)
3. Kintsch, W.: Predication. Cognitive Science: A Multidisciplinary Journal 25(2), 173–202 (2001)
4. Kintsch, W.: The Role of Knowledge in Discourse Comprehension: A Construction-Integration Model. Psychological Review 95(2), 163–182 (1988)
5. Kintsch, W.: Text Comprehension, Memory, and Learning. American Psychologist 49(4), 294–303 (1994)
6. Kintsch, W., Van Dijk, T.A.: Toward a Model of Text Comprehension and Production. Psychological Review 85(5), 363–394 (1978)
7. Chater, N., Manning, C.D.: Probabilistic Models of Language Processing and Acquisition. Trends in Cognitive Sciences, Special Issue: Probabilistic Models of Cognition 10(7), 335–344 (2006)
8. Landauer, T.K., Laham, D., Foltz, P.: Learning human-like knowledge by singular value decomposition: a progress report. In: Proceedings of the 1997 Conference on Advances in Neural Information Processing Systems, Denver, Colorado, United States, vol. 10, pp. 45–51 (1998)
A Qualitative Approach of Learning in Parkinson's Disease
Delphine Penny-Leguy¹,² and Josiane Caron-Pargue¹
¹ Department of Psychology, 97 Avenue du Recteur Pineau, F-86022 Poitiers cedex
² CHU-Poitiers, 2 rue de la Milétrie, F-86000 Poitiers
[email protected], [email protected]
Abstract. Verbal reports of PD patients and of two control groups (their matched Elderly and Young), obtained during the solving of the 4-disk Tower of Hanoi, were analyzed in terms of enunciative operations, cognitively interpreted. The analysis focuses on the processes involved in the reconstruction of implicit knowledge in a new context. Results show processes of deconstruction of implicit knowledge and several impairments in the processes involved in its reconstruction and stabilization. Notably, instead of locating objects relative to one another at the declarative level, PD patients locate objects at the procedural level, by locating the places where the objects are or where they go relative to one another.
1 Introduction
Many studies attest to the emergence and progression of cognitive impairments in Parkinson's disease (PD). These impairments affect controlled processing and executive functions. More specifically, they are mainly due to working memory, notably when the task requires manipulation of information, but they remain secondary in visual recognition [3], [6], [10]. Furthermore, several difficulties in working memory have been specified and linked to PD. A lack of flexibility appears when patients are confronted with new situations. A process of de-automatization arises when automaticity would have to be applied again, or when its construction has to be completed. Difficulties affect parallel processing, mainly at the declarative level, while the procedural level seems independent of executive impairments [8], [9], [11]. Furthermore, impairments in language mainly affect the production of verbs, sentence comprehension, and pragmatic communication abilities (cf. [7] for a review). In fact, all these impairments appear to be the result of deeper dysfunctions. The key point remains to characterize the deficient cognitive processes underlying them. Our assumption is that new insights could be brought to a cognitive approach of language by recent contributions in cognitive linguistics, notably by Culioli's enunciative theory [5]. Indeed, some cognitive processes, marked by enunciative operations, intervene in the reconstruction of knowledge within the task [1], [2]. The aim of this paper is to compare verbal reports produced during the solving of the Tower of Hanoi by PD patients and control groups, their matched Elderly (CE) and Young (CY). Our intent is to focus on three kinds of enunciative operations: external locations, in relation or not with an internal access to abstraction, and modal
expressions and connectives which mark the planning. All of them intervene in the construction of cognitive units. Our general hypothesis is that impairments in PD can be specified at these points, in order to show impairments in the construction of implicit knowledge and in the planning. The task. All participants solved the 4-disk Tower of Hanoi four consecutive times. Half of each group verbalized during the solving process, and the other half did not.¹ Only the verbalizing groups are considered in this paper. The Tower of Hanoi was a wooden one, with three pegs (A, B, C) from left to right and four colored disks of decreasing size (pink disk 1, green disk 2, yellow disk 3, black disk 4, from the smallest to the largest). In the initial state, all disks stood on peg A, each on a larger one. The goal was to move all disks to peg C in the same configuration, with two constraints: not to move a disk onto a smaller one, and not to move a disk on which another one was lying.
2 Theoretical Background
Our assumption is that conceptual knowledge is reconstructed in the task as contextual and local knowledge, restricted to the current situation in time and space, before being more or less generalized and de-contextualized [1], [2], [4]. Processes marked by linguistic forms, such as links between pieces of information or detachments from the situation, play a role in these de-contextualizations. They operate by means of interactions between the different parts of distributed representations, notably internal-external interactions. Our hypothesis is that the PD patients' impairments which lead to de-automatization and to a lack of generalization concern these cognitive processes. A theoretical and methodological way of characterizing cognitive operations from linguistic markers is to consider two semiotic levels, one identifying enunciative operations from linguistic markers, the other giving a cognitive interpretation of those operations [1], [2]. Indeed, enunciative operations constitute a formal construct accounting for various steps in the construction of utterances from propositional contents. A cognitive interpretation of those operations, close to their formal definition, allows the characterization of cognitive re-organizations, escaping from both literal and subjective interpretations. In order to show PD patients' impairments, we will consider the links involved in the structure of implicit knowledge and in the planning. These links can be marked by linguistic forms as follows. Propositional contents. A succession of similar propositional contents may be interpreted as a step toward the construction of implicit knowledge. According to our assumption, the cognitive processes involved in the actualization of propositional contents do not generally retain the organization of this knowledge, but re-organize it.
¹ A preliminary experiment, not reported here, examined possible effects of verbalization. The results showed only a significant increase of total time in the three verbalizing groups (Young, Elderly, PD patients). No significant difference arose for the number of moves between verbalizing vs. non-verbalizing groups.
Then, we can expect impairments for the PD group in the internal-external interaction which requires several manipulations of information in working memory. These impairments must be found notably in the operations which intervene in the processes of internalization and externalization, and which categorize external occurrences. They could lock previous implicit knowledge to a very local situation by lack of internalization, or to a completely detached level unusable in the current situation by lack of externalization. Then, a new explanation could be given for the already mentioned PD’s de-automatization. Locations. Every oriented relation from a toward b is formalized in Culioli’s theory by an operation of location (fr. repérage), and cognitively interpreted as an attentional focusing on a, and as b coming with a [1]. Indeed, locations intervene in the re-organization of propositional contents as cognitive aggregates, in order to construct decontextualized and stabilized knowledge by means of internalization and externalization. The basic re-organization of a propositional content stands in its contextualization by means of locating it either relative to the situation or to the internal subjective space. The criterion used in order to recognize an internal location is the presence of a starting term. A starting term is defined by the detachment of an argument from the propositional content, this detachment being marked by an anaphora. A starting term marks an access to abstraction and to conceptual knowledge in order to internally reorganize and partly re-construct it within the situation. However, there is a constraint for that. At least one external location has to be categorized, or else the categorization is reduced to the prior local external occurrence [1]. The categorization of external occurrences plays a double role, first in internalization, re-inscribing the categorized locations at the internal level, second in externalization, re-introducing them at the external level. The absence of a starting term is the criterion used to recognize a cognitive processing in the external space. External locations may occur either at first as a very local processing, or later when decontextualized. Besides, procedural vs declarative aggregates, both with or without starting terms, may be constructed at different levels of control [2]. The criterion used in order to recognize them is the repetition of lexical choices associated either to the objects or to the moves. In the case of PD, one can expect impairments in the construction of declarative aggregates and in their matching with procedural aggregates. Modal expressions and planning. Two kinds of planning may be distinguished [2], one being automatic without difficulties, marked mainly by connectives, without modal expressions. The other, marked by modal expressions, is the critical, strategic planning. It involves a detachment from the situation aiming at recovering information, considering other possibilities, and re-organizing the planning. It arises in case of difficulties in the current processing of the situation. The critical planning may concern a strategic positioning of goals (marked by modal verbs can, want, have to), a strategic initialization of sequence (marked by well), or a strategic access to memory in terms of storage or retrieval (marked by interjections and verbs such as I think, I believe). It marks consecutive steps for identifying the constraints linked to the task and to the situation [2]. 
Impairments in PD can be expected in the planning as a deterioration of automatic planning and an increase of critical planning, which will notably concern local reorganizations.
3 Method
3.1 Participants
Three groups of French subjects participated in the experiment: PD, a group of 20 non-demented patients at the beginning of Parkinson's disease according to Hoehn and Yahr's score (stages 1 to 3); CE, a control group of 20 elderly matched to the Parkinson group; and CY, a control group of 20 young students. The mean age of each group was respectively 62, 61, and 18. All participants were healthy, right-handed, and novices at solving the ToH puzzle. Each group comprised 10 women and 10 men. PD patients were medicated with L-Dopa. Both PD and CE were selected on the basis of their scores on the following tests: a score higher than or equal to 25 on the Mini Mental State Examination (MMSE); a score lower than or equal to 20 on Montgomery and Asberg's scale of depression (MADRS); and a score between 120 and 136 on MATTIS's scale of impairments in executive functions.
3.2 Linguistic Criteria
The blocks. The succession of similar propositional contents verbalizing consecutive moves characterizes implicit units of knowledge, called 'blocks'. Different kinds of blocks can be defined and differentiated by criteria defining the predicative relations as follows, independently of final lexical choices (PD1 means PD patient, trial 1; CE3 Elderly, trial 3; CY3 Young, trial 3):
TGT (Take, Go, To), e.g., (PD1) I take the pink disk I place it on the C; (CE3) I remove the pink one it goes on the A; (CY3) I take the pink one I put it to A.
GT (Go, To), e.g., (PD1) I put the B on the C; (CE2) then I pass the pink one to B.
T (To), e.g., (CY4) pink one on B; (PD3) the C on the B.
0 when the predicative relation is not verbalized, e.g., (PD4) and then ABC no?; (CE4) and the pink one; (PD4) oh right!; (PD3) no verbalization at all.
Then four kinds of blocks may be defined, each associated with one of these predicative relations: BTGT, BGT, BT, B0. The succession of these blocks may imply either a simplification or a complexification of the verbalization, according to the following criteria:
Simplified blocks: two blocks follow each other in the order BTGT, BGT, BT, B0.
Complexified blocks: the reverse order arises between two blocks.
Furthermore, blocks can be reorganized by the following linguistic markers, which can appear inside a block or between blocks:
Connectives. Inside a block, connectives partition the implicit knowledge; between blocks, they link them.
Modal expressions. Inside a block, modal expressions mark a local modification of the planning; outside, they mark a substantial reorganization of the planning.
Internal-external organization. Our intent is to study this organization from the repartition of locations between two consecutive moves, constructing various aggregates of moves, with or without starting terms.
Starting terms, e.g., (PD1) I take the green disk I put it on the B; (CE1) the pink one I put it on the C. The terms the green disk and the pink one both have the status of starting terms, marked by the anaphora it.
External aggregates, e.g., (CY3) the green one on the A – the pink one on the A; (CE1) the green one on the yellow one – the pink one on the green one. The criterion for recognizing an aggregate is the repetition of the lexical choice of arguments: either the repetition of the naming of pegs, e.g., the A in (CY3), or the repetition of the naming of disks, e.g., the green one in (CE1). Furthermore, no starting term appears in the verbalization of an external aggregate.
Categorized aggregates, e.g., (PD1) I take the green disk I place it on the C – I take the pink disk I place it on the green one on the C; (CE2) then the green one the A I put it on the C – the pink one the B I put it on the C. The criterion is that there is at least one starting term in the verbalization of an aggregate: e.g., in (PD1) the green disk and the pink disk are both starting terms, marked by the anaphora it, with two aggregates, respectively marked by the repetitions the green and the C; in (CE2) the green one and the pink one are both starting terms, marked by the anaphora it, with an aggregate marked by the repetition the C.
Declarative-procedural organization. This category refers to aggregates with or without starting terms.
Declarative aggregates. The criterion for recognizing them is the repetition of the naming of disks, e.g., the repetition of the green one in (CE1) then the green one I put it on the B on the yellow disk – the pink one on the green one; the repetition of the black one in (CY2) the black one on the C – the pink one on the black one.
Procedural aggregates. The criterion is the repetition of the naming of pegs. We distinguish the specific case where the repeated naming is the naming of a peg referring to a disk:
Peg-aggregates, e.g., with the repetition the B in (CY1) the yellow one on the B – er the pink one on the B; the repetition B in (CE2) right! the C to B – the A to B.
Disk-peg-aggregates, e.g., with the repetition B in (CE2) the yellow one B returns on the C – the pink one on the B; the repetition A in (PD2) I take the A I put it on the C – I take the pink one I put it on the A; the repetition the A in (PD4) the A on the B – the C on the A; the repetition A in (PD4) the pink disk on the A – the disk A on the C; the repetition AB in (PD4) the AB on the C – and the pink one AB.
3.3 Dependent Variables
Dependent variables were constructed in relation to the different linguistic criteria used for studying the structure of propositional contents and the re-organization of this structure. Blocks of identical propositional contents, each verbalizing a move, were identified and characterized by their respective number and length.
The ratio to moves was computed for each linguistic criterion as 100 × the ratio to the total number of moves, except for the ratio of blocks, computed in ‰. A ratio to words (1000 × the ratio to the total number of words) was also computed, but is not presented here. The length of blocks and the ratios of simplified and complexified blocks were computed as follows:
Length: the length of a block is the number of moves inside the block.
Ratio of simplified blocks: the ratio of the number of blocks followed by a simpler one to the total number of blocks in the trial.
Ratio of complexified blocks: the ratio of the number of blocks followed by a more complex one to the total number of blocks in the trial.
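A small sketch of how these transition ratios could be computed from a trial's block sequence follows (our reading of the definitions above; the example sequence is invented):

```python
# Sketch: ratios of simplified and complexified block transitions, given
# the block types of a trial ordered from most to least explicit
# verbalization (BTGT > BGT > BT > B0).

ORDER = {"BTGT": 3, "BGT": 2, "BT": 1, "B0": 0}

def block_transition_ratios(blocks):
    """blocks: e.g. ["BTGT", "BGT", "BGT", "BT", "BGT", "B0"]."""
    total = len(blocks)
    simplified = sum(1 for a, b in zip(blocks, blocks[1:])
                     if ORDER[b] < ORDER[a])   # followed by a simpler block
    complexified = sum(1 for a, b in zip(blocks, blocks[1:])
                       if ORDER[b] > ORDER[a])  # followed by a more complex one
    return simplified / total, complexified / total

print(block_transition_ratios(["BTGT", "BGT", "BGT", "BT", "BGT", "B0"]))
# -> (0.5, 0.16666666666666666)
```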
4 Results
Specificities of PD patients. Both the ratio and the length of B0 blocks are higher for PD than for CE and CY, F(2, 27) = 4.62 (3.60), p < .05 (.05) respectively (cf. Table 1), both non-significantly decreasing with trials. Furthermore, in BTGT blocks, the length decreases with trials for PD, F(3, 27) = 2.68, p = .05, with no significant effect for the others. In BGT blocks, the length is higher in trial 1, decreasing faster with trials than for CE, with an interaction group × trials, F(3, 54) = 4.51, p < .01. The simplification of blocks shows a higher ratio for PD than for CE, itself higher than for CY, F(2, 27) = 9.21, p < .05 (means PD: 4.8, CE: 2.8, CY: 1.3). Furthermore, the ratios of both simplification and complexification decrease for CE (p < .05) and for CY (p < .01), while decreasing non-significantly for PD. Inside blocks, the ratio of Modal Expressions is higher in trial 1, mainly in BGT blocks, for PD than for CE, but without significant effect on other trials, with an interaction groups × trials, F(6, 54) = 3.30, p < .05. It persists for PD in BT and B0 blocks, and almost disappears for CE. The ratio of Connectives decreases with trials for CE and CY, but not significantly for PD. In the whole strategy, independently of blocks: at the procedural level, the ratio of Disk-peg-aggregates increases with trials for PD, while the effect is non-significant for CE, with a significant interaction groups × trials, F(3, 54) = 3.53, p < .05 (cf. Fig. 1).
Table 1. Means of ratio and length for the different blocks and groups
            Ratio (‰)          Length
            PD    CE    CY     PD    CE    CY
Block BTGT  1.5   2.3   1.5    0.8   1.6   1.8
Block BGT   3.0   6.8   1.0    1.7   1.2   0.2
Block BT    8.5   9.0   6.0    9.8   9.0   13.8
Block B0    4.9   1.2   0.7    2.8   0.3   0.1
Fig. 1. Ratio to moves of linguistic markers with trials for the three groups PD, CE, and CY. Cnn: Connectives, Mod: Modal expressions, Loc: Aggregates, LocST: Categorized aggregates, PegLoc: Peg-Aggregates, D-Peg-Loc: Disk-Peg-Aggregates.
Similarities between PD patients and Young. Other specificities of PD patients are marked by a non-significant difference with CY but a significant one with CE, as follows: Between blocks, the ratio of Connectives is lower for PD and CY than for CE, respectively F(1, 18) = 5.05 (4.62), p < .05 (.05). In the whole strategy, independently of blocks (cf. Fig. 1): Categorized aggregates show a non-significant effect of trials for PD and CY, but a significant decrease for CE, F(3, 27) = 3.68, p < .05. Declarative aggregates have their ratio decreasing with trials for PD and CY, respectively F(3, 27) = 2.7 (3.15), p = .05 (< .05), while increasing for CE, F(3, 27) = 3.67, p < .05. Peg aggregates have an increasing ratio with trials for PD and CY, respectively F(3, 27) = 2.86 (4.01), p = .05 (< .05), but a non-significant one for CE.
Similarities between PD patients and Elderly. No significant difference appears between PD and CE, both distinct from CY: In BGT blocks, the ratio and the decreasing length are both higher for PD and CE than for CY, respectively F(2, 27) = 3.68 (4.06), p < .05 (.05). The complexification of blocks shows a higher ratio for PD and CE than for CY (means PD: 2.5, CE: 2.1, CY: 0.4), F(2, 27) = 4.70, p < .05. Inside blocks, the ratio of Connectives does not decrease significantly for PD, but does for CY, F(3, 27) = 2.65, p = .05. Between blocks, for Modal expressions, the ratio decreases with trials for both PD and CE, F(3, 54) = 4.32, p < .01. In the whole strategy, independently of blocks: Modal expressions show no group effect and no effect of trials between PD and CE; the ratio in CY is very low (cf. Fig. 1). Starting terms show a higher ratio for PD and CE than for CY (means PD: 3.1, CE: 4.8, CY: 1.1), with a significant difference between CE and CY, F(1, 18) = 4.95, p < .05. Their ratio decreases with trials for PD and CE, F(3, 81) = 11.03, p < .001. External aggregates show an effect of trials, with an increase of their ratio, significant for PD and CE, F(3, 81) = 5.25, p < .01, but not for CY. Peg aggregates: a significant effect of group shows a lower ratio for PD and CE than for CY, F(2, 27) = 3.30, p = .05. Disk-peg aggregates: only PD and CE produce them (means PD: 1.28, CE: 0.79, CY: 0.06); there is no significant group effect between PD and CE.
Similarities among the three groups. In BTGT and BT blocks, there is no effect of group on ratio or length. In BT, the ratio increases with trials in the three groups. Inside blocks there is no effect of group but a decreasing effect of trials for Connectives. There are very few Modal expressions in BTGT blocks for the three groups. In the whole strategy, independently of blocks: for Connectives, there is no effect of group, but a decrease with trials; for Declarative aggregates, there is no main effect nor interaction (means PD: 0.22, CE: 0.44, CY: 0.24); for External and Categorized aggregates, there is no group effect.
5 Discussion
Our results show two complementary kinds of processes underlying PD patients' impairments, both explaining disorganization and a lack of stabilization in knowledge. The first kind of impairment concerns implicit knowledge, which is disorganized as it is constructed. The construction of implicit knowledge by PD patients shows two specificities. One is due to the highest ratio and the highest length of B0 blocks, defined by the absence of a verb, and the other to a high ratio, persisting across trials, in the simplification of the propositional contents defining blocks. This simplification is progressive and begins with a decrease in the length of BTGT and BGT blocks, and with a decrease of re-organizations inside blocks by Connectives and Modal expressions. But at the same time implicit knowledge is de-constructed, and gives rise to an apparent de-automatization [8], [9], [11]. That is marked by the highest ratio of complexification of propositional contents, persisting across trials, and by the amount of Modal expressions inside BGT blocks, persisting in BT and B0 blocks. In order to explain this, one cannot resort to impairments in the use of verbs [7], because verbs appear without difference in BTGT and BT blocks among the three groups of subjects, and in BGT between PD patients and Elderly. The explanation rather lies in impairments in the interaction between two kinds of detachments from the situation, one marked by Modal expressions, the other by starting terms. Both kinds of detachments aim at the re-organization of the solving process and at the identification of constraints. They should give rise to an automatic planning, marked mainly by connectives instead of modal expressions. In fact, that is not the case, as shown by the absence of a significant difference in the ratio of Connectives between PD patients and Young, while this ratio is significantly higher for Elderly. This discrepancy leads one to think that the apparent similarity between PD patients and Young does not rely on the same kind of processes, while the discrepancy between PD patients and Elderly marks a real impairment for PD patients. A second kind of impairment observed for PD patients involves qualitative differences in procedural processing, and constitutes the most original part of our data. It relies on a specific increase of the ratio of Disk-peg aggregates for PD patients, while it decreases for Elderly and is almost nil for Young. These differences show an odd location of disks relative to one another, which finds its explanation in the light of Bégoin-Augereau and Caron-Pargue's data [2]. Indeed, the location of disks relative to one another should have been marked by a repetition of the naming of disks, so constructing declarative aggregates. That is not the case: the naming of disks refers to the place where the disks are. Then, instead of the disks being located relative to one another, it is the places where the disks are which are located relative to one another, so constructing procedural aggregates. This arises to the detriment of the further constitution of disks into larger units, notably pyramid 12, and therefore to the detriment of the chunking of the conditions of procedures. Here again, the apparent absence of impairment of PD patients relative to Young, in the decrease with trials of Declarative aggregates, probably relies on different kinds of processes, while there is a real impairment relative to their matched Elderly.
One might think that such impairments result from the motor training of PD patients undertaken to compensate for them. That might entail a shift of attention toward the places of objects to the detriment of the objects themselves. But the convergence of our
data convinces us that the problem lies elsewhere. In fact, an apparent absence of impairment of PD patients relative to Young for Categorized aggregates is, in fact, a real impairment relative to Elderly. Furthermore, complementary analyses, not presented here, show that the non-significant decrease with trials of Categorized aggregates observed for PD patients concerns mainly the Categorized-Disk-peg-aggregates, while a significant decrease arises for both Categorized-Peg-aggregates and Categorized-Disk-aggregates. That is a specificity of PD patients because, on the contrary, the ratio of Categorized-Disk-peg-aggregates decreases significantly for Elderly, and the ratios of Categorized-Disk-aggregates and Categorized-Peg-aggregates both remain non-significantly decreasing for Elderly and for Young. In fact, the categorized aggregates play an essential role in the processes of generalization by internalization and externalization, and in adjustments between conditions and actions, giving rise to the matching between the declarative and procedural levels. The amount of Disk-peg-aggregates hampers these processes. In sum, our data show that PD patients might begin internal-external interactions, essential for stabilization and for the manipulation of information in working memory. But they do not succeed in carrying on this interaction, because some slight deviations occur during the usual processing. That hampers their whole coordination, which necessarily intervenes in the generalization of knowledge. Some of these odd processes were characterized in our analysis, but further investigations are needed.
6 Conclusion
A main assumption underlying this research is that implicit knowledge does not have the same nature as stabilized representational knowledge. Between them, there is a gap where several re-organizations take place and result in the PD patients' impairments already observed. Our data provide new explanations relying on the characterization of refined cognitive processes underlying these impairments. These processes, marked by linguistic forms, may give rise to testable predictions able to characterize early PD. Nevertheless, this research, which takes place within a larger study aiming at formalizing problem solving processes from linguistic markers, remains exploratory. Furthermore, the case of the 4-disk Tower of Hanoi, too simple for the control groups, does not allow the grasp of all the steps involved in the processes we have characterized. Further analyses are needed. By and large, our approach involves a semiotic approach to general cognitive processes, relying on an enunciative approach to relations in the context of the current situation. Such an approach could be extended to the study of communication [2], and suggests very refined and interacting neurological processes underlying those processes.
References
1. Bégoin-Augereau, S., Caron-Pargue, J.: Linguistic Markers of Decision Processes in a Problem Solving Task. Cognitive Systems Research 10, 102–123 (2009)
2. Bégoin-Augereau, S., Caron-Pargue, J.: Modified decision processes marked by linguistic forms in a problem solving task. Cognitive Systems Research 11, 260–286 (2010)
3. Blanchet, S., Marié, R.M., Dauvillier, F., Landeau, B., Benali, K., Eustache, F., Chavoix, C.: Cognitive processes involved in delayed non-matching-to-sample performance in Parkinson's disease. European Journal of Neurology 7, 473–483 (2000)
4. Clancey, W.J.: Is abstraction a kind of idea or how conceptualization works? Cognitive Science Quarterly 1, 389–421 (2001)
5. Culioli, A.: Cognition and representation in linguistic theory. J. Benjamins, Amsterdam (1995)
6. Goel, V., Pullara, D., Grafman, J.: A computational model of frontal lobe dysfunction: working memory and the Tower of Hanoi task. Cognitive Science 25, 287–313 (2001)
7. Holtgraves, T., McNamara, P., Cappaert, K., Durso, R.: Linguistic correlates of asymmetric motor symptom severity in Parkinson's disease. Brain and Cognition 72, 189–196 (2010)
8. Koerts, J., Leenders, K.L., Brouwer, W.H.: Cognitive dysfunction in non-demented Parkinson's disease patients: controlled and automatic behavior. Cortex 45, 922–929 (2009)
9. Muslimovic, D., Post, B., Speelman, D., Schmand, B.: Motor procedural learning in Parkinson's disease. Brain 130, 2887–2897 (2007)
10. Owen, A.M., Iddon, J.L., Hodges, J.R., Summers, B.A.: Spatial and non-spatial working memory at different stages of Parkinson's disease. Neuropsychologia 35, 519–532 (1997)
11. Taylor, A.E., Saint-Cyr, J.A.: The neuropsychology of Parkinson's disease. Brain and Cognition 28, 281–296 (1995)
Modelling Caregiving Interactions during Stress
Azizi Ab Aziz, Jan Treur, and C. Natalie van der Wal
Department of Artificial Intelligence, VU University Amsterdam
De Boelelaan 1081, 1081HV Amsterdam, The Netherlands
{mraaziz,treur,cn.van.der.wal}@few.vu.nl
http://www.few.vu.nl/~{mraaziz,treur,cn.van.der.wal}
Abstract. Few studies describing caregiver stress and coping have focused on the effects of informal caregiving for depressed care recipients. The major purpose of this paper is to investigate the dynamics of informal care support and receipt interactions among caregivers and care recipients using a computational modelling approach. Important concepts from studies of coping skills, strong-tie support networks, and stress buffering were used as a basis for the model design and verification. Simulation experiments for several cases pointed out that the model is able to reproduce interactions among strong-tie network members during stress. In addition, the possible equilibria of the model have been determined, and the model has been automatically verified against expected overall properties.
1 Introduction
Caring for a family member, spouse or friend (informal caregiving) who is diagnosed with a severe illness (e.g., a unipolar disorder) can be a stressful experience. While most caregivers adapt well to the situation of caring for a person with a unipolar depression, some do not. A number of studies investigate the negative consequences for the informal caregiver, such as the development of depression, burden, burnout, or (chronic) stress, when caring for elderly patients or patients with illnesses like dementia or Parkinson's [5], [6], [7], [9], [10]. The current paper addresses the development of stress in informal caregivers of patients with unipolar depression and the effect of this stress on the interactions between the caregiver and care recipient. To understand the caregiver's adaptations to the cognitive disabilities of his or her close acquaintance, the complex nature of stress processes must be accounted for, and the constructs and factors that play a role in the caregiving must be considered. For each individual, a number of cognitive and physiological mechanisms regulate the impact of stress on health and well-being. Individuals typically occupy multiple roles in life; becoming a caregiver of a person with depression introduces an additional role, and therefore requires some rearrangement of priorities and redirection of energy [10]. Not only is this likely to produce strain at a personal level, but it is also likely to spur reactions (potentially negative) from the diverse people who are connected to a person through his or her roles outside the realm of caregiving. Although much work has been dedicated to understanding the caregiving mechanism, little attention has been paid from a computational modelling angle to how caregivers work together to support their close acquaintances under stress. The caregiving process
is highly dynamic in nature, and it requires demanding resources to monitor such a process in the real world [6]. The aim of this paper is to present a computational model that can be used to simulate the dynamics of the caregiver and care recipient under the influence of external events. The current work is an addition to our previous model of social support selection; in the current model, individuals in a depressive state receive help from close acquaintances [1]. The paper is organized as follows. Section 2 describes several theoretical concepts of social support networks and their relation to stress. From this point of view, a formal model is designed (Section 3). In Section 4, a number of simulation traces are presented to illustrate how the proposed model satisfies the expected outcomes. In Section 5, a mathematical analysis is performed in order to identify possible equilibria in the model, followed by verification of the model against formally specified expected overall patterns, using an automated verification tool (Section 6). Finally, Section 7 concludes the paper.
2 Underlying Principles in Informal Caregiving Interactions

Researchers from several domains have become increasingly interested in social support, caregiving, and mental health. For instance, researchers in the nursing and healthcare domain have contributed several theories to explain these relationships, presenting foundations on coping behaviours, mediating attributes, caregiving adaptation, and stress. One of the theories that has been used to explain these interactions is the Theory of Caregiver Stress and Coping, which combines important principles from Lazarus' Stress-Coping Theory, the Interpersonal Framework of Stress-Coping, and Pearlin's Stress Process Theory [3], [4], [11]. Within the model introduced, three aspects play important roles in regulating support and maintaining the caregiver's personal health: 1) externally generated stressors (negative events), 2) mediating conditions, and 3) caregiver outcomes [4], [6], [10]. For the first aspect, stressors are related to specific internal or external demands (primary stressors) that the caregiver has to manage. For example, several studies show that sufficient caregiver personal resources (e.g., financial income, social resources) reduce the perception of caregiving burden, while a loss of emotional resources (long term emotional exhaustion) amplifies the perceived burden [9]. The second aspect represents how the caregiver reacts (coping strategies) when facing adversity in caregiving. In the proposed model, caregivers who face a primary stressful situation generally use a combination of problem-focused coping and emotion-focused coping. Problem-focused coping is associated with positive interpersonal efforts to get the problem solved [3]. In contrast, emotion-focused coping strategies (thinking rather than acting to change the person-environment relationship) entail efforts to regulate the emotional consequences (e.g., avoidance) of stressful or potentially stressful events [4]. The choice of coping is related to the caregiver's personality; for example, a caregiver with a positive personality (e.g., low in neuroticism) tends to choose a problem-focused approach [5]. Another important concept that can be derived from these coping strategies is relationship-focused coping (positive or negative). The combination of high caregiver empathy (perceiving the inner feelings of the care recipient) and problem-focused coping leads to positive relationship-focused coping, and vice versa [4], [7], [8]. The third aspect is related to the caregiver's outcome. Mainly, this component ranges on a continuum from bonadaptation
(meeting the needs to support the care recipient) to maladaptation (a continued negative situation and a need for referral and assistance) [4], [11]. In addition, bonadaptation is related to high personal accomplishment (expected personal gain) and provided support (social support), while maladaptation is linked to emotional exhaustion [9]. A high expected personal gain reduces the short term and long term stress levels in caregivers, which improves interaction during the caregiving process [7]. When care recipients receive support, it reduces their stress because the received resource serves as an insulating factor, or stress buffer, so that people who have more social support resources are less affected by negative events [5], [6].
3 Modeling Approach

Based on the analysis of the dynamics in coping behaviours, mediating attributes, caregiving adaptation, and stress, as given in the previous section, it is possible to specify computational properties for the multi-agent model. The interactions between these variables form several relationships, both in instantaneous and in temporal form. To represent these relationships in agent terms, each variable is coupled with an agent name (A or B) and a time variable t. The agent variable A refers to the caregiver agent and B to the care recipient agent. This convention is used throughout the development of the model in this paper. The details of the model are shown in Fig. 1.
Fig. 1. Global Relationships for Caregiving Interactions During Stress
3.1 The Caregiver Model

This component of the overall model aims to formalise the important concepts within the caregiver. The instantaneous relationships are expressed as follows. The problem-focused coping PfC is calculated from the combination of the caregiver personality GpP and the burden Bd. Note that a burden level close to 1 has the effect that the choice of using problem-focused coping becomes smaller. In emotion-focused coping EfC, these factors have the opposite effect.

PfCA(t) = GpPA(t)·(1 − BdA(t))   (1)
EfCA(t) = (1 − GpPA(t))·BdA(t)   (2)

Positive relationship-focused coping (RfC+) depends on the relation between problem-focused coping and the caregiver's empathy GE. A high empathy increases this function, while reducing its counterpart, negative relationship-focused coping (RfC−).

RfCA+(t) = PfCA(t)·GEA(t)   (3)
RfCA−(t) = EfCA(t)·(1 − GEA(t))   (4)

Burden (Bd) is determined by regulating the proportional contribution β between the caregiver's primary stressors (GpS), long term emotional exhaustion (ExH), and caregiver resources (GpR). Expected personal gain (PgN) is measured using the proportional contribution (determined by σ) of bonadaptation (Bn) and experienced personal satisfaction EpN. Short term emotional exhaustion EsH is measured by combining maladaptation Md with the complement of the expected personal gain.

BdA(t) = [β·GpSA(t) + (1 − β)·ExHA(t)]·(1 − GpRA(t))   (5)
PgNA(t) = σ·BnA(t) + (1 − σ)·EpNA(t)   (6)
EsHA(t) = MdA(t)·(1 − PgNA(t))   (7)

Caregiver short term stress GsS is related to the presence of caregiver negative events GnE and the burden Bd. Note that a high expected personal gain reduces the short term stress level. The maladaptation Md is calculated from the combination of negative relationship-focused coping (RfC−), positive relationship-focused coping, and emotion-focused coping. Bonadaptation is determined by measuring the levels of positive and negative relationship-focused coping and problem-focused coping. The parameters φ, ϒ, and ρ provide proportional contribution factors in the respective relationships.

GsSA(t) = [φ·GnEA(t) + (1 − φ)·BdA(t)]·(1 − PgNA(t))   (8)
MdA(t) = [ϒ·RfCA−(t) + (1 − ϒ)·EfCA(t)]·(1 − RfCA+(t))   (9)
BnA(t) = [ρ·RfCA+(t) + (1 − ρ)·PfCA(t)]·(1 − RfCA−(t))   (10)

In addition to the instantaneous relations, there are four temporal relationships involved, namely experienced personal satisfaction EpN, long term emotional exhaustion ExH, caregiver long term stress GlS, and social support provision ScP. The rates of change of these temporal relationships are determined by the flexibility rates γ, ϑ, ϕ, and ψ, respectively. The current value of each of these temporal relations depends on the previous value of the respective attribute; the change process is measured over a time interval between t and t + Δt. The operator Pos for the positive part is defined by Pos(x) = (x + |x|)/2, or, alternatively, Pos(x) = x if x ≥ 0 and 0 otherwise.

ExHA(t+Δt) = ExHA(t) + γ·[Pos(EsHA(t) − ExHA(t))·(1 − ExHA(t)) − Pos(−(EsHA(t) − ExHA(t)))·ExHA(t)]·Δt   (11)
EpNA(t+Δt) = EpNA(t) + ϑ·[Pos((ScPA(t) − GpSA(t)) − EpNA(t))·(1 − EpNA(t)) − Pos(−((ScPA(t) − GpSA(t)) − EpNA(t)))·EpNA(t)]·Δt   (12)
GlSA(t+Δt) = GlSA(t) + ϕ·(GsSA(t) − GlSA(t))·(1 − GlSA(t))·GlSA(t)·Δt   (13)
ScPA(t+Δt) = ScPA(t) + ψ·[Pos(PgNA(t) − ScPA(t))·(1 − ScPA(t)) − Pos(−(PgNA(t) − ScPA(t)))·ScPA(t)]·Δt   (14)
3.2 The Care Recipient Model

The care recipient model is the other interacting component in the overall model. It has five instantaneous relations (care recipient perceived stress RpS, stress buffer SbF, care recipient short term stress RsS, care recipient functional status RfS, and behavioural status RbS) and one temporal relation (care recipient long term stress RlS).

RpSB(t) = τ·RnIB(t) + (1 − τ)·RnEB(t)   (15)
SbFB(t) = ω·RsGB(t)   (16)
RsSB(t) = [λ·RpB(t) + (1 − λ)·(1 − RcSB(t))]·RpSB(t)·(1 − SbFB(t))   (17)
RfSB(t) = RhSB(t)·RlSB(t)   (18)
RbSB(t) = RpB(t)·RlSB(t)   (19)
RlSB(t+Δt) = RlSB(t) + η·(RsSB(t) − RlSB(t))·(1 − RlSB(t))·RlSB(t)·Δt   (20)
Care recipient perceived stress is modelled as an instantaneous relation (regulated by a proportional factor τ) between the care recipient's negative interactions RnI and negative events RnE. The stress buffer is determined as ω times the received support RsG. Care recipient short term stress depends on the stress buffer SbF and on the proportional contribution λ of the care recipient's negative personality Rp and coping skills RcS, applied to the perceived stress RpS. The care recipient's functional and behavioural status levels are calculated by multiplying the care recipient's health problem status RhS and negative personality Rp, respectively, with the care recipient's long term stress RlS. Finally, the temporal relation for the care recipient's long term stress results from accumulated exposure to short term stress, with flexibility rate η.
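To make these update equations concrete, the following minimal Python sketch (not part of the original paper) simulates one caregiver-care recipient pair by iterating the instantaneous relations (1)-(10) and (15)-(17) and the Euler-style temporal updates (11)-(14) and (20); the status outputs (18)-(19) follow directly and are omitted. Variable names mirror the text; the coupling of received support RsG to the caregiver's provision ScP, the use of a single flexibility rate, and the initial values are illustrative assumptions.

def pos(x):
    # positive part operator: Pos(x) = (x + |x|)/2
    return (x + abs(x)) / 2.0

def step(s, p, dt=0.3):
    """One update of the caregiver/care recipient pair (agent subscripts dropped)."""
    # caregiver, instantaneous: equations (1)-(10)
    Bd = (p['beta'] * s['GpS'] + (1 - p['beta']) * s['ExH']) * (1 - s['GpR'])       # (5)
    PfC = s['GpP'] * (1 - Bd)                                                       # (1)
    EfC = (1 - s['GpP']) * Bd                                                       # (2)
    RfCp, RfCm = PfC * s['GE'], EfC * (1 - s['GE'])                                 # (3),(4)
    Md = (p['ups'] * RfCm + (1 - p['ups']) * EfC) * (1 - RfCp)                      # (9)
    Bn = (p['rho'] * RfCp + (1 - p['rho']) * PfC) * (1 - RfCm)                      # (10)
    PgN = p['sig'] * Bn + (1 - p['sig']) * s['EpN']                                 # (6)
    EsH = Md * (1 - PgN)                                                            # (7)
    GsS = (p['phi'] * s['GnE'] + (1 - p['phi']) * Bd) * (1 - PgN)                   # (8)
    # care recipient, instantaneous: (15)-(17); received support RsG is
    # assumed here to equal the caregiver's current provision ScP
    RpS = p['tau'] * s['RnI'] + (1 - p['tau']) * s['RnE']                           # (15)
    SbF = p['omega'] * s['ScP']                                                     # (16)
    RsS = (p['lam'] * s['Rp'] + (1 - p['lam']) * (1 - s['RcS'])) * RpS * (1 - SbF)  # (17)
    # temporal updates (11)-(14) and (20), all with one flexibility rate
    n, fr = dict(s), p['fr']
    for key, target in (('ExH', EsH), ('ScP', PgN), ('EpN', s['ScP'] - s['GpS'])):
        d = target - s[key]
        n[key] += fr * (pos(d) * (1 - s[key]) - pos(-d) * s[key]) * dt              # (11),(14),(12)
    n['GlS'] += fr * (GsS - s['GlS']) * (1 - s['GlS']) * s['GlS'] * dt              # (13)
    n['RlS'] += fr * (RsS - s['RlS']) * (1 - s['RlS']) * s['RlS'] * dt              # (20)
    return n

# illustrative run: 'good' caregiver cg1 with 'bad'-coping recipient cr2
# under prolonged stressors (all stressor inputs held at 0.9)
state = dict(GpR=0.8, GE=0.7, GpP=0.7, GpS=0.9, GnE=0.9, RnI=0.9, RnE=0.9,
             Rp=0.9, RcS=0.1, ExH=0.5, EpN=0.5, GlS=0.5, ScP=0.5, RlS=0.5)
params = dict(beta=0.5, ups=0.5, rho=0.5, sig=0.5, phi=0.5, tau=0.5,
              lam=0.5, omega=0.8, fr=0.3)
for _ in range(1000):
    state = step(state, params)
print(round(state['GlS'], 2), round(state['RlS'], 2))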
4 Simulation Results

In this section, a number of simulated scenarios with a variety of different conditions of individuals are discussed. Only three conditions are considered: prolonged stressors, fluctuating stressors, and non-stressful events, each with a different personality profile. For clarity, cg and cr denote caregiver and care recipient agent profiles, respectively. The labels 'good' and 'bad' in Table 1 can also be read as 'effective' and 'ineffective', or 'bonadaptive' and 'maladaptive'.

Table 1. Individual Profiles

Caregiver                     GpR   GE    GpP
cg1 ('good' caregiver)        0.8   0.7   0.7
cg2 ('bad' caregiver)         0.1   0.2   0.2

Care recipient                RhS   Rp    RcS
cr1 ('good' coping skills)    0.9   0.9   0.8
cr2 ('bad' coping skills)     0.9   0.9   0.1
Corresponding to these settings, the level of severity (or potential onset) is measured as follows: if an individual scores higher than 0.5 on its long term stress level for more than 336 time steps (about two weeks), then the caregiver or care recipient agent is considered to be experiencing stress. There are several parameters that can be varied to simulate different characteristics. However, the current simulations used the following parameter settings: tmax = 1000 (representing a monitoring activity of up to 42 days), Δt = 0.3, flexibility rates γ = ϕ = η = ψ = ϑ = 0.3, regulatory rates α = β = ϒ = ρ = σ = φ = τ = λ = 0.5,
ω=ξ=0.8. These settings were obtained from previous systematic experiments to determine the most suitable parameter values in the model.
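To make the onset criterion operational, a helper like the one below can be applied to a simulated long term stress series; reading the 336-step window as consecutive steps is our assumption, since the text does not specify it.

def experiencing_stress(long_term_stress, threshold=0.5, window=336):
    """True if the long term stress level exceeds `threshold` for more than
    `window` consecutive time steps (336 steps is roughly two weeks of the
    42-day monitoring period)."""
    run = 0
    for v in long_term_stress:
        run = run + 1 if v > threshold else 0
        if run > window:
            return True
    return False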
Result # 1: Caregiver and receiver experience negative events. During this simulation, all agents have been exposed to an extreme case of stressor events. This kind of pattern is comparable to prolonged stressors throughout a lifetime. In the first simulation trace (Fig. 2(a)), a good caregiver tends to provide good social support to its care recipient, even when facing persistently heightened stressors. This pattern is in line with the findings reported in [5]. One of the factors that can be used to explain this condition is the increasing level of the caregiver's personal gain. It suggests that caregivers do not unequivocally view caregiving as an overwhelmingly negative experience, but can appraise the demands of caregiving as rewarding [4], [9]. Previous research has also suggested that caregiving satisfaction is an important aspect of the caregiving experience and seems to share parallel relationships with other variables (e.g., personality and empathy) [4], [11]. Moreover, a good caregiver normally uses problem-focused coping to solve the perceived problem, and subsequently increases positive relationship-focused coping. By the same token, research has consistently established a significant relationship between personal gains, problem-focused coping, and positive social support. For example, several studies reported that caregivers who were satisfied with caregiving used more problem-focused coping [3]. Set in motion, this provides a positive view of social support, which is later translated into support received by the care recipient.

Fig. 2. Simulations during prolonged stressors for (a, upper graph) a good caregiver and bad care recipient, (b, lower graph) a bad caregiver and bad recipient (plotted: long term stress and expected personal gain of the caregiver, long term stress and stress buffer of cr2)

In the second simulation trace (shown in Fig. 2(b)), both agents (caregiver and care recipient) face high long term stress levels in the long run. The precursors of these conditions are the perception of caregiving as a burden and the inability of the caregiver to provide positive coping during stressful events [11]. These factors lead to a decreasing level of the caregiver's positive relationship-focused coping and experienced personal gain, and later reduce the ability to provide support. In the real world, this can be perceived as feeling overwhelmed and out of control of the situation; this condition occurs in the majority of caregivers when they feel burdened by the demands of caregiving [6].
Result # 2: Caregiver and receiver experience different types of negative events. In this simulation, a new kind of stressor was introduced. This stressor comprises two parts: the first with very high constant prolonged stressors, followed by a second with very low stressor events. During the simulation, the caregiver agents (cg1 and cg2) were exposed to these stressors, while the care recipient agents only experienced prolonged stressors. As can be seen from Fig. 3(a), the graph indicates that both agents (cg1 and cr2) experience gradual drops in their long term stress. A comparison between Fig. 2(a) and Fig. 3(a) shows that the scenarios have almost similar patterns, but Fig. 3(a) shows a substantial decrease in the caregiver's long term stress level after the first half of the simulation. This is consistent with the findings that caregivers with a positive personality, empathy, and high personal resources tend to help more if they experience fewer negative events [3], [8].

Fig. 3. Simulation traces during different stressors for (a, upper graph) a good caregiver and bad care recipient, (b, lower graph) a bad caregiver and bad recipient (plotted: long term stress and expected personal gain of the caregiver, long term stress and stress buffer of cr2)

Meanwhile, Fig. 3(b) provides a different scenario. The simulation results show that caregivers with a negative personality, less empathy, and low personal resources are incapable of providing support during the caregiving process. Note that although the caregivers experience non-stressor events after the first half of the simulation, their care recipient still experiences a high long term stress level. Similar findings can be found in [5], [10].

Result # 3: Managing a good care recipient. In this part, a simulation was carried out to investigate the caregiving behaviours of caregiver agents with different profiles towards good care recipients, during prolonged negative stressors. The interaction between a good caregiver and a good recipient shows that both agents have low long term stress levels, while the recipient's stress buffer and the caregiver's expected personal gain increase [5], [7]. On the contrary, the interaction between a bad caregiver and a good care recipient indicates that both agents experience high long term stress levels. However, the care recipient experiences less long term stress than the caregiver.
5 Mathematical Analysis

In this section it is discussed which equilibrium values are possible for the model, i.e., values for the variables of the model for which no change will occur. As a first step,
the temporal relations for both the caregiver and the care recipient are inspected (refer to equations (11), (12), (13), (14), and (20)). An equilibrium state is characterised by: ExHA(t+Δt) = ExHA(t), ScPA(t+Δt) = ScPA(t), GlSA(t+Δt) = GlSA(t), EpNA(t+Δt) = EpNA(t), and RlSB(t+Δt) = RlSB(t). Assuming γ, ψ, ϕ, ϑ, and η nonzero, and leaving out t, this is equivalent to:

Pos(EsHA − ExHA)·(1 − ExHA) − Pos(−(EsHA − ExHA))·ExHA = 0
Pos(PgNA − ScPA)·(1 − ScPA) − Pos(−(PgNA − ScPA))·ScPA = 0
(GsSA − GlSA)·(1 − GlSA)·GlSA = 0
Pos((ScPA − GpSA) − EpNA)·(1 − EpNA) − Pos(−((ScPA − GpSA) − EpNA))·EpNA = 0
(RsSB − RlSB)·(1 − RlSB)·RlSB = 0

These equations are equivalent to:

(EsHA − ExHA)·(1 − ExHA) = 0 and (EsHA − ExHA)·ExHA = 0
(PgNA − ScPA)·(1 − ScPA) = 0 and (PgNA − ScPA)·ScPA = 0
(GsSA − GlSA)·(1 − GlSA)·GlSA = 0
((ScPA − GpSA) − EpNA)·(1 − EpNA) = 0 and ((ScPA − GpSA) − EpNA)·EpNA = 0
(RsSB − RlSB)·(1 − RlSB)·RlSB = 0

These have the following solutions:

EsHA = ExHA   (21)
PgNA = ScPA   (22)
GlSA = GsSA or GlSA = 0 or GlSA = 1   (23)
ScPA − GpSA = EpNA   (24)
RlSB = RsSB or RlSB = 0 or RlSB = 1   (25)
This means that for the caregiver, short term and long term emotional exhaustion are equal (21). Also, for both the caregiver and the care recipient, short term and long term stress are the same whenever the long term stress is not 0 or 1 ((23) and (25)). Moreover, for the caregiver, social support provision is equal to the expected personal gain (22), and, on the other hand, social support provision is equal to the sum of the experienced personal satisfaction and the caregiver's primary stressors (24).
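As a numerical illustration (not part of the original analysis), the equilibrium characterisation can be checked against the simulation sketch from Section 3: iterating until the state stops changing yields a fixed point, and one further step must then leave every temporal state unchanged.

# assumes pos(), step(), and the state/params dicts from the Section 3 sketch
for _ in range(50000):
    nxt = step(state, params)
    if max(abs(nxt[k] - state[k]) for k in state) < 1e-12:
        break
    state = nxt
# at the fixed point, interior values satisfy e.g. ExH = EsH (21) and
# ScP = PgN (22); some states may instead settle at a boundary value,
# as in the 0/1 cases of (23) and (25)
after = step(state, params)
assert max(abs(after[k] - state[k]) for k in state) < 1e-9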
6 Formal Verification of the Model

This section addresses the analysis of the informal caregiving interactions model by specification and verification of properties expressing dynamic patterns that are expected to emerge. The purpose of this type of verification is to check whether the model behaves as it should, by running a large number of simulations and automatically verifying such properties against the simulation traces. A number of dynamic properties have been identified, formalized in the language TTL, and automatically checked [2]. The language TTL is built on atoms state(γ, t) |= p, denoting that p holds in trace γ (a trajectory of states over time) at time t. Dynamic properties are temporal predicate logic statements that can be formulated using such state atoms. Below, some of the dynamic properties that were identified for the informal caregiving interactions model are introduced, both in semi-formal and in formal notation. Note that the properties are all defined for a particular trace γ or a pair of traces γ1, γ2.

P1 – Stress level of cg
For all time points t1 and t2 in traces γ1 and γ2,
if in trace γ1 at t1 the level of negative life events of agent cg is x1 and in trace γ2 at t1 the level of negative life events of agent cg is x2,
and in trace γ1 at t1 the level of personal resources of agent cg is y1 and in trace γ2 at t1 the level of personal resources of agent cg is y2,
and in trace γ1 at t2 the level of long term stress of agent cg is z1 and in trace γ2 at t2 the level of long term stress of agent cg is z2,
and x1 ≥ x2, and y1 ≤ y2, and t1 < t2,
then z1 ≥ z2.
∀γ1, γ2:TRACE, ∀t1, t2:TIME ∀x1,x2, y1, y2, z1, z2:REAL state(γ1, t1) |= negative_life_events(ag(cg), x1) & state(γ2, t1) |= negative_life_events(ag(cg), x2) & state(γ1, t1) |= personal_resources(ag(cg), y1) & state(γ2, t1) |= personal_resources (ag(cg), y2) & state(γ1, t2) |= long_term_stress(ag(cg), z1) & state(γ2, t2) |= long_term_stress (ag(cg), z2) & x1 ≥ x2 & y1 ≤y2 & t1 < t2 ⇒ z1 ≥ z2
Property P1 can be used to check whether caregivers with more stressful life events and a lack of resources experience a higher level of caregiver (long term) stress. The property succeeded when two traces were compared where in one trace the caregiver had more (or equally many) negative life events and fewer personal resources than the caregiver in the other trace. In this situation the first caregiver experienced more long term stress than the caregiver with more personal resources and fewer negative life events. Notice that this property is checked for all time points in the traces; when in some simulation traces the values for negative life events or personal resources change halfway through the simulation trace, the property succeeds for only a part of the trace, which can be expressed by an additional condition stating that t1 is at time point 500 (halfway through our traces of 1000 time steps).

P2 – Stress buffering of cr
For all time points t1 and t2 in trace γ, If at t1 the level of received social support of agent cr is m1 and m1 ≥ 0.5 (high) and at time point t2 the level of the stress buffer of agent cr is m2 and t2≥ t1+d, then m2 ≥ 0.5 (high). ∀γ:TRACE, ∀t1, t2:TIME ∀m1, m2, d:REAL state(γ, t1) |= received_social_support(ag(cr), m1) & state(γ, t2) |= stress_buffer(ag(cr), m2) & m1 ≥ 0.5 & t2= t1+d ⇒ m2 ≥ 0.5
Property P2 can be used to check whether social support buffers the care recipient's stress. It is checked whether, if the received social support of agent cr is high (a value greater than or equal to 0.5), the stress buffer of agent cr also has a high value after some time (a value greater than or equal to 0.5). The property succeeded on the traces where the received social support was greater than or equal to 0.5.
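As an informal illustration of this kind of automated checking (the actual verification used the TTL checker [2]; this is not that tool), a property such as P2 can be tested against a stored trace in a few lines of Python; the trace layout, the delay d, and the threshold are assumptions.

def check_p2(trace, d=10, threshold=0.5):
    """P2: if the received social support of cr is high (>= threshold) at t1,
    then the stress buffer of cr is high at t2 = t1 + d."""
    support = trace['received_social_support_cr']
    buf = trace['stress_buffer_cr']
    return all(buf[t1 + d] >= threshold
               for t1 in range(len(support) - d)
               if support[t1] >= threshold)

# toy trace: support is high throughout; the buffer becomes high after t = 10
trace = {'received_social_support_cr': [0.6] * 100,
         'stress_buffer_cr': [0.4] * 10 + [0.55] * 90}
print(check_p2(trace))  # True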
Relating positive recovery of the care recipient and social support from the caregiver. Property P3 can be used to check whether positive recovery shown by the care recipient makes the caregiver provide more social support at a later time point. This property P3 can be logically related to milestone properties P3a and P3b that together imply it: P3a & P3b ⇒ P3. Given this, the checker can be used to find out why a hierarchically higher-level property does not succeed. For example, when property P3 does not succeed on a trace, it can be concluded from the above implication that at least one of P3a and P3b is not satisfied. The model checker can then discover whether it is property P3a and/or P3b that does not succeed. Properties P3a and P3b are introduced after property P3 below.

P3 – Positive recovery of cr leads to more social support from cg

For all time points t1 and t2 in trace γ,
if at time point t1 the level of primary stressors of agent cg is d1
and at time point t2 the level of primary stressors of agent cg is d2
and at time point t1 the level of received support of agent cr is f1
and at time point t2 the level of received support of agent cr is f2
and d2 ≤ d1, and t1 < t2,
then f2 ≥ f1.
∀γ:TRACE, ∀t1, t2:TIME ∀d1, d2, f1, f2:REAL state(γ, t1) |= primary_stressors(ag(cg), d1) & state(γ, t2) |= primary_stressors (ag(cg), d2) & state(γ, t1) |= received_social_support(ag(cr), f1) & state(γ, t2) |= received_social_support(ag(cr), f2) & d2 < d1 & t1< t2 ⇒ f2 ≥ f1
Property P3 succeeded in all generated simulation traces: when the primary stressors of the caregiver decreased, then at a later time point the received social support of the care recipient increased. In some simulation traces the property only succeeded on the first or second half of the trace. In these traces the primary stressors of the caregiver increased in the first part of the trace and then decreased in the second part of the trace. For this, a condition was added to the antecedent of the formal property, namely t1 = 500 or t2 = 500, so that the property is only checked on the second part or first part of the trace respectively. P3a – Positive recovery of cr leads to more personal gain in cg
For all time points t1 and t2 in trace γ, If at t1 the level of primary stressors of agent cg is d1 and at time point t2 the level of primary stressors of agent cg is d2 and at time point t1 the level of personal gain of agent cg is e1 and at time point t2 the level of personal gain of agent cg is e2 and d2 ≤ d1, and t1< t2 then e2 ≥ e1
∀γ:TRACE, ∀t1, t2:TIME ∀d1, d2, e1, e2:REAL state(γ, t1) |= primary_stressors(ag(cg), d1) & state(γ, t2) |= primary_stressors (ag(cg), d2) & state(γ, t1) |= expected_personal_gain(ag(cg), e1) & state(γ, t2) |= expected_personal_ gain (ag(cg), e2) & d2 < d1 & t1< t2 ⇒ e2 ≥ e1
Property P3a can be used to check whether the caregiver's expected personal gain increases if the primary stressors of the caregiver decrease. This property succeeded on the simulation traces where the primary stressors of the caregiver indeed decreased.

P3b – Personal gain in cg motivates cg to provide more social support to cr
For all time points t1 and t2 in trace γ, If at time point t1 the level of personal gain of agent cg is e1 and at time point t2 the level of personal gain of agent cg is e2 and at t1 the level of received support of agent cr is f1 and at time point t2 the level of received support of agent cr is f2, and e2 ≥ e1, and t1< t2, then f2 ≥ f1
∀γ:TRACE, ∀t1, t2:TIME ∀e1, e2, f1, f2:REAL state(γ, t1) |= expected_personal_gain(ag(cg), e1) & state(γ, t2) |= expected_personal_gain(ag(cg), e2) & state(γ, t1) |= received_social_support(ag(cr), f1) & state(γ, t2) |= received_social_support(ag(cr), f2) & e2 > e1 & t1< t2 ⇒ f2 ≥ f1
Property P3b can be used to check whether the care recipient receives more social support if the expected personal gain of the caregiver increases. This property succeeded on the simulation traces where the expected personal gain indeed increased.
7 Conclusion

The challenge addressed in this paper is to provide a computational model that is capable of simulating the behaviour of an informal caregiver and care recipient in a caregiving process when dealing with negative events. The proposed model is based on several insights from psychology, specifically stress-coping theory and informal caregiving interactions; see [3], [4]. Simulation traces show interesting patterns that illustrate the relationship between personality attributes, support provision, and support receipt, and their effect on long term stress. A mathematical analysis indicates which types of equilibria occur for the model. Furthermore, using generated simulation traces, the model has been verified against a number of properties describing emerging patterns put forward in the literature. The resulting model can be useful to understand how certain concepts at a societal level (for example, personality attributes) may influence caregivers and recipients while coping with incoming stress. In addition, it could be used as a mechanism to develop assistive agents that are capable of supporting informal caregivers when they face stress during a caregiving process. As part of future work, it would be interesting to expand the proposed model to a social network of multiple caregivers and care recipients.
References

1. Aziz, A.A., Treur, J.: Modeling Dynamics of Social Support Networks for Mutual Support in Coping with Stress. In: Nguyen, N.T., Katarzyniak, R., Janiak, A. (eds.) Proc. of the First Int. Conference on Computational Collective Intelligence, ICCCI 2009, Part B. SCI, vol. 244, pp. 167–179. Springer, Heidelberg (2009)
2. Bosse, T., Jonker, C.M., van der Meij, L., Sharpanskykh, A., Treur, J.: Specification and Verification of Dynamics in Agent Models. Int. Journal of Cooperative Information Systems 18, 167–193 (2009)
3. Folkman, S.: Personal Control, Stress and Coping Processes: A Theoretical Analysis. Journal of Personality and Social Psychology 46, 839–852 (1984)
4. Kramer, B.J.: Expanding the Conceptualization of Caregiver Coping: The Importance of Relationship-Focused Coping Strategies. J. of Family Relations 42(4), 383–391 (1993)
5. Musil, M.C., Morris, D.L., Warner, C., Saeid, H.: Issues in Caregivers' Stress and Providers' Support. Research on Aging 25(5), 505–526 (2003)
6. Sisk, R.J.: Caregiver Burden and Health Promotion. International Journal of Nursing Studies 37, 37–43 (2000)
7. Sherwood, P., Given, C., Given, B., Von Eye, A.: Caregiver Burden and Depressive Symptoms: Analysis of Common Outcomes in Caregivers of Elderly Patients. Journal of Aging and Health 17(2), 125–147 (2005)
8. Skaff, M.M., Pearlin, L.I.: Caregiving: Role Engulfment and the Loss of Self. Gerontologist 32(5), 656–664 (1992)
9. Ostwald, S.K.: Caregiver Exhaustion: Caring for the Hidden Patients. Adv. Practical Nursing 3, 29–35 (1997)
10. Whitlach, C.J., Feinberg, L.F., Sebesta, D.F.: Depression and Health in Family Caregivers: Adaptation over Time. Journal of Aging and Health 9, 22–43 (1997)
11. Yates, M.E., Tennstedt, S., Chang, B.H.: Contributors to and Mediators of Psychological Well-being for Informal Caregivers. J. of Gerontology 54, 12–22 (1999)
Computational Modeling and Analysis of Therapeutical Interventions for Depression

Fiemke Both, Mark Hoogendoorn, Michel C.A. Klein, and Jan Treur

VU University Amsterdam, Department of Artificial Intelligence, De Boelelaan 1081, 1081HV Amsterdam, The Netherlands
{fboth,mhoogen,mcaklein,treur}@cs.vu.nl
http://www.few.vu.nl/~{fboth,mhoogen,mcaklein,treur}
Abstract. Depressions impose a huge burden on both the patient suffering from a depression as well as society in general. In order to make interventions for a depressed patient during a therapy more personalized and effective, a supporting personal software agent can be useful. Such an agent should then have a good idea of the current state of the person. A computational model for human mood regulation and depression has been developed in previous work, but in order for the agent to give optimal support during an intervention, it should also have knowledge on the precise functioning of the intervention in relation with the mood regulation and depression. This paper therefore presents computational models for these interventions for different types of therapy. Simulation results are presented showing that the mood regulation and depression indeed follow the expected patterns when applying these therapies. The intervention models have been evaluated for a variety of patient types by simulation experiments and formal verification.
1 Introduction

Major depression is currently the fourth disorder worldwide in terms of disease burden, and is expected to be the disorder with the highest disease burden in high-income countries by the year 2030 (cf. [14]). Effective interventions for treating depressions are of utmost importance, both for the patients suffering from a depression and for society in general. Supporting software agents can be very helpful in effectively treating a depression by providing personalized support for patients. The agent can, for example, provide feedback on the current situation, give tips, and give certain tasks or assignments. For such a personal assistant agent to function effectively, it requires a detailed computational model of the relevant human states and their interrelationships regarding the regulation of mood and depression. Such a model can also help to better understand and analyze the basics behind a depression. In [5] an example was shown of a computational model for mood regulation and depression based on literature on emotion and mood regulation. This model, however, does not explicitly address the functioning of interventions, such as activity scheduling [13] and cognitive restructuring [3]. Particularly for the domain of a personal assistant agent that supports patients during a major depression, knowledge about the functioning of these therapies is crucial to give effective support. In [6] a first attempt has
been made to create such a model, combining the concepts of mood, depression, and a single type of intervention, namely activity scheduling. This paper presents a computational model of the effect of interventions on mood regulation and depression for a number of frequently used interventions, such as activity scheduling, cognitive behavioral therapy, and other types of interventions aimed at enhancing coping skills. The main principles of the interventions from the psychological literature have been incorporated into the model. This computational model is an extension of the mood regulation and depression model presented in [5]. The model was used to simulate various patient types, and the correctness of the behavior was analyzed using formal verification. The obtained model is suitable for integration within a personal assistant agent in order to provide effective support for the patient. In recent literature many contributions can be found on relations between mood regulation or depression and brain functioning; e.g., [1, 2, 7, 8, 9, 10, 11, 12, 15]. Much neurological support has been found for the processes of emotion and mood regulation, and in particular for modulation (down-regulation) of a negative mood in order to avoid or recover from a depression; e.g., [1, 2, 7, 12]. Capturing this process of down-regulation of negative moods has been a basic point of departure for the designed model. More specifically, the model presented in this paper addresses how this down-regulation process can be stimulated and improved by therapeutic interventions. This paper is organized as follows. In Section 2 the model for mood regulation and depression taken from [5] is explained in more detail. The various interventions are integrated into the model in Section 3. Section 4 presents simulation results, and Section 5 verifies that these results indeed comply with existing theories within clinical psychology. Finally, Section 6 is a discussion.
2 A Model for Mood Regulation and Depression

In order to model mood regulation and depression, an existing model has been adopted which is based on psychological and neurological literature on mood regulation (cf. [5]). In this section, this model is explained in more detail. The model as described here already incorporates the main influences of interventions upon the states in the model (as an extension of the existing model of [5]). The learning effects for each of the specific therapies are described in Section 3. Figure 1 shows an overview of the relevant states within the model and the relations between them. In the figure, the states depicted in grey represent states that have been added to model the points of impact of interventions; the same holds for the dashed lines.

States. In the model, a number of states are defined, whereby each state is represented by a number in the interval [0,1]. First, the states of the previous model are explained. The state objective emotional value of situation represents the value of the situation a human is in (without any influence of the human's current state of mind). The state appraisal represents the current judgment of the situation given the current state of mind (e.g., when you are feeling down, a pleasant situation might no longer be considered pleasant). The mood level represents the current mood of the human, whereas the thoughts level represents the current level of thoughts (i.e., the positivity of the thoughts).
Fig. 1. Model for mood and depression (dashed lines and gray states indicate the extensions compared to [5])

The long term prospected mood level expresses what mood level the human strives for in the long term, whereas the short term prospected mood level represents the mood goal on the shorter term (when feeling very bad, the short term goal is not to feel excellent immediately, but to feel somewhat better). The sensitivity indicates the ability to select situations in order to bring the mood level to the short term prospected mood level. Coping expresses the ability of a human to deal with negative moods and situations, whereas vulnerability expresses how vulnerable the human is to getting depressed. Finally, world event indicates an external situation which is imposed on the human (e.g., losing your job).

In addition to the states mentioned above, a number of states have been added to the model. First, there is a state (intervention) expressing that an intervention is taking place. The state reflection on negative thoughts expresses the therapeutic effect that the human is made aware of negative thinking about situations, whereas the appraisal effect models the immediate effect on the appraisal of the situation. The world influences state is used to represent the impact of a therapy aiming to improve the objective emotional value of the situation. The openness for intervention state indicates how open the human is for therapy in general, which is made more specific for each specific influence of the therapy in the state openness for X. Finally, reflection represents the ability to reflect on the relationships between various states, and as a result learn something for the future.

Dynamics. The states explained above are causally related, as indicated by the arrows in Figure 1. These influences have been modeled mathematically. The first state to be discussed is the objective emotional value of situation (oevs). This represents the human's situation selection mechanism. First, the change in situation as would be selected by the human is determined (referred to as action) as an intermediate step:

action(t) = oevs(t) + sensitivity(t)·(Neg(oevs(t)·(st_prosp_mood(t) − mood(t))) + Pos((1 − oevs(t))·(st_prosp_mood(t) − mood(t))))
In the equation, the Neg(X) evaluates to 0 in case X is positive, and X in case X is negative, and Pos(X) evaluates to X in case X is positive, and 0 in case X is negative. The formula expresses that the selected situation is more negative compared to the previous oevs in case the short term prospected mood is lower than the current mood and more positive in the opposite case. Note that the whole result is multiplied with the sensitivity. The action in combination with the external influences now determines the new value for oevs: oevs(t+Δt) = oevs(t) + (world_event(t)·(action(t) + openness(t)·world_influence(t)·(1 – action(t))) - oevs(t))·Δt
The above equations basically take the value of actions as derived before in combination with the external influences (i.e. world influence and world event). The second step is that the human starts to judge the situation (i.e. appraisal) based upon his/her own state of mind: appraisal(t+Δt) = appraisal(t) + α (γ + openness_intervention(t)·reflect_neg_th(t) - appraisal(t)) Δt
where γ = (vulnerability·oevs(t)·thoughts(t) + coping·(1 - (1-oevs(t))·(1-thoughts(t))))
The value of appraisal is determined by the thoughts of the human in combination with the coping skills and vulnerability. In addition, the intervention related state reflection on negative thoughts plays a role (i.e. being aware that you are judging the situation as more negative than a person without a depression) in combination with the openness to this type of intervention. The state reflection on negative thoughts is calculated as follows: reflect_neg_th(t) = (basic_reflection(t) + appraisal_effect(t)·openness_X(t))·(1-appraisal(t))
Hence, the value increases based upon the appraisal effect of the intervention in combination with the openness to this specific part of the intervention. Furthermore, a basic reflection is expressed, which is the reflection already present in the beginning. Therapy can also dynamically change this basic reflection which can be seen as one of the permanent effects of therapy: basic_reflection(t+Δt) = basic_reflection(t) + α intervention(t)·learning_factor·(1-asic_reflection(t))Δt
The value for mood depends on a combination of the current appraisal with the thoughts, whereby a positive influence (i.e. thoughts and appraisal are higher than mood) is determined by the coping and the negative influence by the vulnerability. mood(t+Δt) = mood(t) + α (Pos(coping·(ε - mood(t))) - Neg(vulnerability·(ε - mood(t)))) Δt
where ε = appraisal(t)·wappraisal_mood + thoughts(t)·wthoughts_mood
Thoughts is a bit more complex, and is expressed as follows: thoughts(t+Δt) = thoughts(t) + α (ζ + (1 - (thoughts(t) + ζ)) · intervention(t)·wintervention(t))Δt
278
F. Both et al.
where: ζ = Pos(coping·(appraisal(t)·wappraisal_thoughts + mood(t)·wmood_thoughts - thoughts(t))) – Neg(vulnerability·(appraisal(t)·wappraisal_thoughts + mood(t)·wmood_thoughts - thoughts(t))) wintervention(t+Δt) = wintervention(t) + α (openness_X(t) - wintervention(t))Δt
This indicates that thoughts are positively influenced by the fact that you participate in an intervention (you start thinking a bit more positive about the situation, you are in therapy). The weight of this contribution depends on the openness for the intervention at that time point. In addition, the thoughts can either be positively influenced due to the higher combination of the levels of mood and appraisal (again multiplied with the coping), or negatively influenced by the same state (whereby the vulnerability plays a role). The sensitivity is calculated in a similar manner (without the influence of therapy of course): sensitivity(t+Δt) = sensitivity(t) + α (Pos(coping·(η - sensitivity(t))) - Neg(vulnerability·(η - sensitivity(t))))Δt
where η = mood(t)·wmood_sens + thoughts(t)·wthoughts_sens
Finally, the short term prospected mood is calculated as follows: st_prospmood(t+Δt) = st_prospmood(t) + α (vulnerability·(mood(t) - lt_prospmood) + coping·(lt_prospmood - st_prospmood(t)))Δt
3 Modeling Interventions for Mood Regulation and Depression

In this section it is shown how the influences of three types of therapy are modeled in the extended model presented in Section 2. First, activity scheduling (cf. [13]) is discussed, followed by cognitive behavior therapy (cf. [3]). The third model shows how an intervention that addresses coping skills and vulnerability directly can work.

3.1 Activity Scheduling Therapy

Activity scheduling, also called behavioral activation therapy, works according to two principles: the patient learns the relationship between the selection of a relatively positive activity and the level of mood (i.e., when you do fun things, you start to feel better), and, in order to learn this relationship again, the therapy imposes the selection of positive situations. In Figure 2 the main influences of this therapy are shown by means of the black arrows. Note that most of the influences have already been explained in the general overview in Section 2. One element of the therapy is that learning the relationship between mood and the objective emotional value of the situation results in better coping (as the human can now better cope with a lower mood, knowing that an option is to select better situations). This is expressed as follows:

coping(t+Δt) = coping(t) + α·reflection(t)·wreflection(t)·(1 − |oevs(t) − mood(t)|)·(1 − coping(t))·Δt
where wreflection(t+Δt) = wreflection(t) + α (openness_as(t) - wreflection(t)) Δt
This states that coping will increase when the difference between the mood and oevs is perceived to be small (which makes it easy to see the relationship and improve coping). Furthermore, an effect is that the openness for this specific therapy increases as the coping skills go up (since the human notices that the therapy works):

openness_as(t+Δt) = openness_as(t) + θ·α·((coping(t) − coping(t−Δt))/Δt)·Δt
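As a sketch, these two learning effects can be bolted onto the update() function from Section 2; the function below is an assumed add-on (the state dict is assumed to be extended with reflection, w_refl, and openness_as entries), and the CBT variant of Section 3.2 is obtained by replacing oevs with appraisal in the distance term.

def as_learning(s, alpha=0.1, theta=2.0, dt=1.0):
    """Assumed add-on for activity scheduling: coping grows when oevs and
    mood are observed to be close; openness_as follows the improvement in
    coping; the reflection weight drifts toward the current openness."""
    old = s['coping']
    s['coping'] += alpha * s['reflection'] * s['w_refl'] * \
        (1 - abs(s['oevs'] - s['mood'])) * (1 - s['coping']) * dt
    s['openness_as'] += theta * alpha * ((s['coping'] - old) / dt) * dt
    s['w_refl'] += alpha * (s['openness_as'] - s['w_refl']) * dt
    return s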
Fig. 2. Computational model for activity scheduling therapy
3.2 Cognitive Behavioral Therapy

Most negative situations occur without one being able to control them. Since it is impossible to avoid all bad situations, it is wise to be able to deal with bad circumstances. The theory behind CBT assumes that emotions are determined by thoughts about a situation and not by the situation itself. In the mood regulation model, it is not the concept 'thoughts level' but the concept 'appraisal' that corresponds to thoughts in the CBT theory, because the thoughts in CBT are about a specific situation, like the state 'appraisal' in the mood regulation model, and do not represent thoughts in general. The CBT intervention consists of understanding (reflection) that thoughts about a situation determine your mood, and of detecting and transforming negative thoughts into positive thinking. The fact that you are doing something about your depression improves the thoughts level, which is an effect CBT shares with the other therapies. Figure 3 shows the relevant part of the model for CBT by means of the black arrows. In this case, the reflection is modeled by learning the relationship between appraisal and mood:

coping(t+Δt) = coping(t) + α·reflection(t)·wreflection(t)·(1 − |appraisal(t) − mood(t)|)·(1 − coping(t))·Δt
In addition, the openness for CBT is increased by reflection in the same manner as the openness for AS.
Fig. 3. Computational Model for Cognitive Behavior Therapy
3.3 Intervention Directly Addressing Coping Skills and Vulnerability

The last type of intervention investigated is one which is assumed to affect coping skills and vulnerability directly. Such a type of intervention might be based, for example, on a belief that coping skills and vulnerability may be affected negatively by traumatic experiences in the past, and that these effects could be taken away or diminished by some form of therapy addressing them. Ignoring for the moment questions such as whether existing therapies with such claims are effective, or would be possible at all, it can still be explored how such a type of therapy could work according to the computational model. This is shown in Figure 4. Here the impact of the therapy is modeled as a direct causal connection to coping skills and vulnerability.
Fig. 4. Computational Model for an Intervention Directly Addressing Coping Skills and Vulnerability
4 Simulation Results

In this section, simulation results are presented. Three different fictional persons are studied, with divergent values for coping and vulnerability. Furthermore, the value for openness is varied for each of these persons as well (0.2 and 0.3 for less and more openness, respectively). These values are chosen to show the different influences of the therapies on different types of people, and are in accordance with real persons who will follow the therapies in the future. Table 1 shows the initial values for the most important variables of the model for each person.

Table 1. Initial values for the simulation experiments

                                           person 1   person 2   person 3
coping                                     0.1        0.15       0.3
vulnerability                              0.9        0.85       0.7
oevs                                       0.925      0.907      0.84
appraisal, mood, thoughts, sensitivity,
short term prospected mood,
long term prospected mood                  0.6        0.65       0.7
For the sake of brevity, this section will only discuss the results for person 1. First, the simulation without any form of therapy is shown. The person experiences very negative events during a substantial period (with value 0.1 during 80 hours). Since the person is highly vulnerable, a depression follows. Note that time is represented in hours.

Fig. 5. Person type 1 without therapy (plotted: oevs, appraisal, mood, and st prospected mood over 3000 hours)
The figure shows that a negative event of 0.1 is imposed on the person; this has a dramatic effect on all of the internal states of the patient: mood drops to a very low level, and so do appraisal and the short term prospected mood. Eventually all states start to increase again due to the relatively good situations selected, but this goes very slowly. Figure 6 shows an example where the patient is receiving cognitive behavioral therapy. The patient does, however, have a relatively low openness of 0.2 for this type of therapy.
Fig. 6. Person type 1 following CBT with a lower openness
Fig. 7. Person type 1 following CBT with a higher openness
Fig. 8. Person type 1 following AS with a lower openness
Fig. 9. Person type 1 following AS with a higher openness
(Each figure plots oevs, appraisal, mood, and st prospected mood over 3000 hours.)
For this case, it can be seen that the appraisal increases via reflection on negative thoughts, pulling the other states up as well. It does, however, still take quite some time to get the mood level sufficiently up. The dip after the intervention stops (after 6 weeks) results from the person no longer being reminded about the correctness and importance of appraisal, leading to a slight search for a new equilibrium. If the openness is increased, the person recovers more quickly, because reflection on negative thoughts increases faster (see Figure 7). For activity scheduling, the same types of experiments have been conducted. Figure 8 shows an example of a person with a lower openness for this type of therapy. In this case, the world influence changes due to the therapy (since the therapy results in better situations being selected). This results in an increase of the objective emotional value of the situation, pulling the rest of the states up as well. In case the person is more sensitive to the therapy, the oevs increases more quickly and it therefore takes less time for the person to recover (Figure 9). Finally, Figures 10 and 11 show the results for the direct intervention for a person with a low and a high openness, respectively. The figures show a more rapid recovery in case of a higher openness.

Fig. 10. Person type 1 following a direct intervention with lower openness
Fig. 11. Person type 1 following a direct intervention with higher openness
5 Analysis of the Computational Model In this section, an analysis of the model described above is presented. Two different types of analysis have been performed, with partly different purposes. First, in order to verify the patterns produced by the model, a number of temporal patterns have been
specified that reflect a number of general characteristics of the process of depression and its treatment; for example, the characteristic that the length of a depression should be shorter for persons that follow a therapy than for persons that do not. These properties have been automatically verified for different simulation traces of the model (Section 5.1). Second, the effect of specific therapies on the change of the values of the different variables in the model has been analyzed. This analysis is also useful for verification of the intended effect of a therapy, but can be used for a different purpose as well. Based on the order in which different model variables start changing in reaction to a specific therapy, it is possible to derive which type of therapy is given. Thus, this analysis forms a basis for a diagnostic process that can detect that a person follows some specific type of therapy, based on observations of the values of variables present in the model (e.g., reports about the mood or an analysis of the objective emotional value of the situation). This part of the analysis is described in Section 5.2.

5.1 Verification

The following temporal properties, reflecting a number of general patterns and characteristics of the process of depression and its treatment, have been formulated. The properties were specified in the TTL language [4]. This predicate logical temporal language supports formal specification and analysis of dynamic properties, covering both qualitative and quantitative aspects. TTL is built on atoms referring to states of the world, time points, and traces, i.e., trajectories of states over time. Dynamic properties are temporal statements that can be formulated with respect to traces based on a state ontology Ont in the following manner. Given a trace γ over state ontology Ont, the state in γ at time point t is denoted by state(γ, t). These states can be related to state properties via the infix predicate |=, where state(γ, t) |= p denotes that state property p holds in trace γ at time t. Based on these statements, dynamic properties can be formulated in a sorted first-order predicate logic, using quantifiers over time and traces and the usual first-order logical connectives such as ¬, ∧, ∨, ⇒, ∀, ∃. For more details, see [4]. Automated tool support is available for verifying whether the properties hold in a set of simulation traces. A number of simulations (considering all the different types of persons mentioned in Section 4, in combination with different openness to therapy) have been used as the basis for the verification, and the properties were confirmed.

P1: Effectiveness of Therapy
Persons that follow a therapy are depressed for a shorter period than persons who do not.
∀γ1, γ2:TRACE, ∀t:TIME [ [ [ state(γ1, t) |= intervention_CBT | state(γ1, t) |= intervention_AS ] & state(γ2, t) |= not intervention_AS & state(γ2, t) |= not intervention_CBT ] ⇒ ∃t2:TIME > t, R1, R2:REAL [ R1 < MIN_LEVEL & R2 > MIN_LEVEL & state(γ2, t2) |= has_value(mood, R1) & state(γ1, t2) |= has_value(mood, R2) ] ]
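In practice, such a property is checked automatically against stored simulation traces. The following is a minimal Python sketch of that idea, not the actual TTL checker of [4]; the trace representation (a dict from integer time points to state dicts), the MIN_LEVEL threshold and all names are illustrative assumptions.

import itertools

MIN_LEVEL = 0.5  # assumed mood threshold separating depressed from non-depressed

def p1_holds(trace_therapy, trace_control, horizon):
    # Find a time point where the first trace receives CBT or AS and the second receives nothing.
    starts = [t for t in range(horizon)
              if trace_therapy[t]["intervention"] in ("CBT", "AS")
              and trace_control[t]["intervention"] is None]
    if not starts:
        return True  # vacuously true: the precondition of P1 never holds
    t = min(starts)
    # P1 holds if at some later t2 the treated person is above MIN_LEVEL
    # while the untreated person is still below it.
    return any(trace_therapy[t2]["mood"] > MIN_LEVEL and
               trace_control[t2]["mood"] < MIN_LEVEL
               for t2 in range(t + 1, horizon))

A full verification run would apply such a check to every pair of traces, e.g. via itertools.combinations over the stored trace set.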
P2: Openness to therapy helps
Persons more open to therapy remain depressed for a shorter period than those less open.

∀γ1, γ2:TRACE, ∀R1, R2:REAL, t:TIME [ [ state(γ1, t) |= has_value(openness, R1) & state(γ2, t) |= has_value(openness, R2) & R2 < R1 ] ⇒ ∃t2:TIME, R3, R4:REAL [ R3 < MIN_LEVEL & R4 > MIN_LEVEL & state(γ2, t2) |= has_value(mood, R3) & state(γ1, t2) |= has_value(mood, R4) ] ]
P3: Effect on coping skills
After a person has followed therapy for some time, the coping skills have improved.
∀γ:TRACE, t:TIME, R1:REAL [ [ [ state(γ, t) |= intervention_CBT | state(γ, t) |= intervention_AS ] & state(γ, t) |= has_value(coping, R1) ] ⇒ ∃t2:TIME > t + MIN_DURATION, R2:REAL [ R2 > R1 + MIN_INCREASE & state(γ, t2) |= has_value(coping, R2) ] ]
P4: CBT results in higher appraisal than AS
After a person has followed CBT, appraisal is higher than after following AS.
∀γ1, γ2:TRACE, ∀A1, A2:REAL, t1, t2:TIME [ [ state(γ1, t1) |= intervention_CBT & state(γ2, t1) |= intervention_AS & state(γ1, t2) |= has_value(appraisal, A1) & state(γ2, t2) |= has_value(appraisal, A2) & t2 > t1 + MIN_DUR ] ⇒ A1 > A2 ]
This latter property was confirmed for persons with the same openness to therapy; those following AS with a high openness may end up with a higher appraisal than those following CBT with a low openness.

5.2 Effects of Therapy Types

In order to analyze the effect of the different types of therapies on the model variables, it is useful to see when a specific model variable starts changing as a result of the therapy, and in particular which variable changes first. The order in which the different concepts start being influenced by the treatment is a characteristic of the therapy. For example, when following behavioral activation, it is assumed that the objective emotional value of the situation will be affected before the mood itself changes. In contrast, cognitive behavioral therapy will first affect the reflection on negative thoughts. To detect the moment when an intervention affects a variable, we look at a sudden change in the increase or decrease of the value of a concept over time: a form of acceleration. Formally, this can be determined from the relative second-order derivative of a variable over time, i.e., the second-order derivative divided by the first-order derivative. This can be calculated more easily by dividing the change of the value of a variable in the current time step by the change in the previous time step, as this is mathematically almost equivalent:

(y(t+Δt) − y(t)) / (y(t) − y(t−Δt)) − 1
  = [(y(t+Δt) − y(t))/Δt] / [(y(t) − y(t−Δt))/Δt] − 1
  = y′(t)/y′(t−Δt) − 1
  = [y′(t) − y′(t−Δt)] / y′(t−Δt)
  = [([y′(t) − y′(t−Δt)]/Δt) / y′(t−Δt)] Δt
  = [y′′(t−Δt)/y′(t−Δt)] Δt
So, to be precise, for mood this relative acceleration y′′(t−Δt)/y′(t−Δt) can be measured by:

mood_acceleration(t) = [(mood(t+Δt) − mood(t)) / (mood(t) − mood(t−Δt)) − 1] / Δt
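As an illustration, this relative acceleration can be computed directly from a sampled simulation trace. The following is a small sketch using numpy; the function name and trace format are assumptions for illustration, not part of the original model implementation.

import numpy as np

def relative_acceleration(y, dt=1.0):
    # Approximates y''(t-dt)/y'(t-dt) at interior time points, per the formula
    # [(y(t+dt) - y(t)) / (y(t) - y(t-dt)) - 1] / dt.
    y = np.asarray(y, dtype=float)
    forward = y[2:] - y[1:-1]    # y(t+dt) - y(t)
    backward = y[1:-1] - y[:-2]  # y(t) - y(t-dt)
    with np.errstate(divide="ignore", invalid="ignore"):
        acc = (forward / backward - 1.0) / dt
    return acc  # NaN or inf where the signal is locally flat

The same function applies unchanged to the mood, oevs and reflection-on-negative-thoughts series.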
The acceleration values for the concepts mood, objective emotional value of the situation, and reflection on negative thoughts can be calculated similarly. All acceleration values have been determined from 5 time steps before the start of the intervention until 15 time steps after the start. Figures 12 and 13 illustrate the order of change of the different variables for the different types of therapy. It can be seen that all therapies start having an effect at time point t = 0. Moreover, Figure 12 shows that AS indeed first affects the situation before the mood is affected. Similarly, CBT first affects the reflection on negative thoughts (Figure 13); however, this is a bit more difficult to see. At t = 0, the acceleration of reflection on negative thoughts is very
low (far below the bottom of the graph), because of the large increase of this concept at the start of the intervention. At t = 1 this value is almost zero (and therefore visible again in the graph), after which another dip follows at t = 2. This is because the concept stays at the high level for one time step and then starts dropping again, which can be seen in the left panel of Figure 13. However, the conclusion is that the reflection is influenced before the mood is affected.
Fig. 12. Original (left) and acceleration (right) of values for a patient following AS
Fig. 13. Original (left) and acceleration (right) of values for a patient following CBT
6 Discussion

In this paper, a computational model has been presented for the effect of three different types of therapies for depression. It extends a computational model for human mood regulation and depression developed in previous work [5]. The simulation results presented show that mood regulation and depression indeed follow the expected patterns when these therapies are applied. The intervention models have been analyzed for a variety of patient types by simulation experiments and formal verification. This work is one of the first steps in the development of a software agent that supports patients, in a personal manner, during the therapy followed for a depression. In future work these computational models will be integrated as a domain model within an agent model, in such a way that the agent is able to reason over the domain model by causal deductive and abductive forms of reasoning. The aim is that the agent can thereby both analyze the state of the patient and generate appropriate (inter)actions toward the patient in order to improve the patient's state.
Acknowledgments. This research has been conducted as part of the FP7 ICT program of the European Commission under grant agreement No. 248778 (ICT4Depression). Furthermore, the authors wish to thank Pim Cuijpers of the Department of Clinical Psychology at the VU University Amsterdam for the fruitful discussions.
References

1. Anand, A., Li, Y., Wang, Y., Wu, J., Gao, S., Bukhari, L., Mathews, V.P., Kalnin, A., Lowe, M.J.: Activity and connectivity of brain mood regulating circuit in depression: A functional magnetic resonance study. Biological Psychiatry 57, 1079–1088 (2005)
2. Beauregard, M., Paquette, V., Levesque, J.: Dysfunction in the neural circuitry of emotional self-regulation in major depressive disorder. Learning and Memory 17, 843–846 (2006)
3. Beck, A.T.: Depression: Causes and Treatment. University of Pennsylvania Press, Philadelphia (1972)
4. Bosse, T., Jonker, C.M., van der Meij, L., Sharpanskykh, A., Treur, J.: Specification and Verification of Dynamics in Agent Models. International Journal of Cooperative Information Systems 18, 167–193 (2009)
5. Both, F., Hoogendoorn, M., Klein, M.C.A., Treur, J.: Formalizing Dynamics of Mood and Depression. In: Ghallab, M., Spyropoulos, C.D., Fakotakis, N., Avouris, N. (eds.) Proc. of the 18th European Conference on Artificial Intelligence, ECAI 2008, pp. 266–270. IOS Press, Amsterdam (2008)
6. Both, F., Hoogendoorn, M., Klein, M.C.A., Treur, J.: Design and Analysis of an Ambient Intelligent System Supporting Depression Therapy. In: Azevedo, L., Londral, A.R. (eds.) Proc. of the Second International Conference on Health Informatics, HEALTHINF 2009, pp. 142–148. INSTICC Press (2009)
7. Davidson, R.J., Lewis, D.A., Alloy, L.B., Amaral, D.G., Bush, G., Cohen, J.D., Drevets, W.C., Farah, M.J., Kagan, J., McClelland, J.L., Nolen-Hoeksema, S., Peterson, B.S.: Neural and behavioral substrates of mood and mood regulation. Biological Psychiatry 52, 478–502 (2002)
8. Drevets, W.C.: Orbitofrontal Cortex Function and Structure in Depression. Annals of the New York Academy of Sciences 1121, 499–527 (2007)
9. Drevets, W.C.: Neuroimaging abnormalities in the amygdala in mood disorders. Annals of the New York Academy of Sciences 985, 420–444 (2003)
10. Harrison, P.J.: The neuropathology of primary mood disorder. Brain 125, 1428–1449 (2002)
11. Konarski, J.Z., McIntyre, R.S., Kennedy, S.H., Rafi-Tari, S., Soczynska, J.K., Ketter, T.A.: Volumetric neuroimaging investigations in mood disorders: bipolar disorder versus major depressive disorder. Bipolar Disorders 10, 1–37 (2008)
12. Lévesque, J., Eugene, F., Joanette, Y., Paquette, V., Mensour, B., Beaudoin, G., Leroux, J.M., Bourgouin, P., Beauregard, M.: Neural circuitry underlying voluntary suppression of sadness. Biological Psychiatry 53, 502–510 (2003)
13. Lewinsohn, P.M., Youngren, M.A., Grosscup, S.J.: Reinforcement and depression. In: Dupue, R.A. (ed.) The psychobiology of depressive disorders: Implications for the effects of stress, pp. 291–316. Academic Press, New York (1979)
14. Mathers, C.D., Loncar, D.: Projections of global mortality and burden of disease from 2002 to 2030. PLoS Medicine 3, e442 (2006)
15. Mayberg, H.S.: Modulating dysfunctional limbic-cortical circuits in depression: towards development of brain-based algorithms for diagnosis and optimized treatment. British Medical Bulletin 65, 193–207 (2003)
A Time Series Based Method for Analyzing and Predicting Personalized Medical Data

Qinwin Vivian Hu¹, Xiangji Jimmy Huang¹, William Melek², and C. Joseph Kurian²

¹ Information Retrieval and Knowledge Management Research Lab, York University, Toronto, Canada
² Alpha Global IT, Toronto, Canada
[email protected], [email protected], {william,cjk}@alpha-it.com
Abstract. In this paper, we propose a time series based method for analyzing and predicting personal medical data. First, we introduce the auto-regressive integrated moving average model, which fits all time series processes. Second, we describe how to identify a personalized time series model based on the patient's history information, followed by estimating the parameters in the model. Furthermore, a case study is presented to show how the proposed method works. In addition, we forecast the laboratory tests for the next twelve months and give the corresponding prediction limits. Finally, we summarize our contributions in the conclusions.
1 Introduction and Motivation
Like many areas in medicine, medical tests are conducted on small samples collected from the human body and provide the information a doctor needs to evaluate a person's health or to understand what is causing an illness. Sometimes, doctors need to order tests to find out more. With the development of health care theories, techniques and methods, all kinds of clinical laboratory tests are available. How to make good use of this large amount of data and how to predict future laboratory tests are therefore important questions for health care systems [7, 12]. In this paper, we are motivated to analyze the personalized time series process of a patient in order to predict her/his future laboratory tests. The data are from a real research project, which will be introduced in Section 2. We have 79 monthly laboratory test records for each patient. Therefore, for each patient, we build up a time series process to predict the laboratory tests in the next Nth month or the next Nth year. First, we employ a general auto-regressive integrated moving average (ARIMA) model [3, 8, 9], which fits any time series process. Then, according to the history data, we identify a personalized time series model for each patient by conducting transformations and calculating the sample auto-correlation function (ACF) [6, 10] and the sample partial auto-correlation function (PACF) [6, 10]. Third, we estimate the parameters in the personalized model, based on the modified stationary model for the patient. Later, we
present a case study to show how the proposed method works, forecasting the laboratory tests in the future and giving the 95% prediction interval. The remainder of this paper is organized as follows. First, we describe the data set in Section 2. Then, in Section 3, we introduce a personalized model identification for each patient: an ARIMA model, the most general model fitting every time series process, is shown, and the steps for setting up a model for a patient according to his/her unique time series are presented, followed by estimating the parameters in the model. After that, we present a case study with experimental results, and we discuss and analyze the implications of our work in Section 4. Finally, we briefly summarize the contributions of this paper in Section 5.
2 Data Set Description
The datasets in our experiment are obtained from Alpha Global IT [1]. The Alpha Corporate Group is an organization that provides Medical Laboratory, Industrial/Pharmaceutical Laboratory, Diagnostic Imaging services and Managed Care Medical Clinics, in addition to commercial Electronic Medical Record and Practice Management Software. The medical test dataset contains 79 monthly blood testing records for each patient. We first extract the data for each patient and order them by time. Then we apply a general time series model and identify a personalized stationary model for predictions. In order to understand the data set better, we present some sample data in Table 1. Five attributes are employed in this paper: SDTE stands for service date, PNUM for patient health card number, PSEX for patient gender, BDTE for patient date of birth and TSEQ for test sequence number. For the sake of privacy, the information in Table 1 is fabricated; it only shows the format of this class of datasets.

Table 1. Sample records from the medical test dataset (values fabricated)

SDTE      PNUM            PSEX    BDTE        TSEQ
20020101  patient number  female  mm/dd/yyyy  test1
...       ...             ...     ...         ...
20030201  patient number  female  mm/dd/yyyy  test9
...       ...             ...     ...         ...
20040101  patient number  female  mm/dd/yyyy  test5
...       ...             ...     ...         ...
20080601  patient number  female  mm/dd/yyyy  test1

3 Personalized Model Identification
In this section, we introduce a personalized model identification for each patient. First, we introduce the ARIMA model, the most general model fitting every time series process. Then, we describe the steps for setting up a model for a patient according to his/her unique time series.
3.1 The General ARIMA Model
In statistics, and in particular in time series analysis, not all time series are stationary. A homogeneous non-stationary time series can be reduced to a stationary time series by taking a proper degree of differencing. The auto-regressive model, the moving average model and the auto-regressive moving average model are useful in describing stationary time series. The auto-regressive integrated moving average (ARIMA) model extends these into a large class of time series models by using differencing, and is useful in describing various homogeneous non-stationary time series. The general ARIMA model is presented in Equation 1:

ARIMA(p, d, q):  φ_p(B)(1 − B)^d Z_t = θ_0 + θ_q(B) a_t    (1)

where B is a backshift operator [3]; φ_p(B) is the stationary AR operator [8, 9] with φ_p(B) = (1 − φ_1 B − ... − φ_p B^p); θ_q(B) is the invertible MA operator [11, 13] with θ_q(B) = (1 − θ_1 B − ... − θ_q B^q); φ_p(B) and θ_q(B) share no common factors; θ_0 is a parameter related to the mean of the process; and a_t is a white noise process [3, 9]. The parameter θ_0 plays very different roles for d = 0 and d > 0. When d = 0, the original process is stationary, and we get θ_0 = μ(1 − φ_1 − ... − φ_p). When d > 0, however, θ_0 is called the deterministic trend term and is often omitted from the model unless it is really needed.
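As a concrete illustration of Equation 1, an ARIMA(p, d, q) model can be fitted to a univariate series with standard tools. The sketch below uses the statsmodels library on synthetic data; this library choice and the data are assumptions for illustration, not necessarily what was used in this work.

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
z = np.cumsum(rng.normal(size=79))  # synthetic non-stationary series of 79 points

model = ARIMA(z, order=(1, 1, 0))   # p = 1 AR term, d = 1 difference, q = 0 MA terms
fit = model.fit()
print(fit.params)                   # estimated AR coefficient and noise variance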
3.2 Steps for Model Identification
To illustrate the model identification, we consider the general ARIMA(p, d, q) model introduced in Section 3.1. Model identification refers to the methodology for identifying the required transformations, the decision whether to include the deterministic parameter θ_0, and the proper orders p of the AR operator and q of the MA operator. Given the time series of a patient, we use the following steps to identify a tentative model for predicting future lab tests.

Step 1. Plot the time series data and choose proper transformations. In a time series analysis, plotting the data is always the first step. Through careful examination of the plot, we usually get a good idea about whether the series contains a trend, seasonality, outliers, non-constant means, non-constant variances, or other abnormal and non-stationary phenomena. This understanding often provides a basis for postulating a possible data transformation. Since we prefer to examine the plot automatically, there is more than one way to characterize the series, such as fitting a distribution to the data. Differencing and variance-stabilizing transformations are the two most commonly used transformations in time series analysis. Because variance-stabilizing transformations such as the power transformation require non-negative values and differencing may create negative values, we should always apply variance-stabilizing transformations before differencing. A series with non-constant variance
often needs a logarithmic transformation. In the following discussion, we refer to the transformed data as the original series unless mentioned otherwise.

Step 2. Compute and examine the sample ACF and the sample PACF of the original series to confirm the degree of differencing needed to make the differenced series stationary. We employ two rules. First, if the sample ACF decays very slowly (the individual ACF values need not be large) and the sample PACF cuts off after lag 1, then differencing is indicated; in general, we try taking the first difference (1 − B)Z_t. One can also use the unit root test proposed by Dickey and Fuller (1979) [4]; in a borderline case, differencing is recommended by Dickey, Bell and Miller (1986) [5]. Second, in order to remove non-stationarity, we may need to consider a higher-order differencing (1 − B)^d Z_t for d > 1. In most cases, d is either 0, 1, or 2. Note that if (1 − B)^d Z_t is stationary, then (1 − B)^(d+i) Z_t for i = 1, 2, ... is also stationary.

Step 3. Compute and examine the sample ACF and the sample PACF of the properly transformed and differenced series to identify the orders p and q, where we recall that p is the highest order in the AR polynomial (1 − φ_1 B − ... − φ_p B^p), and q is the highest order in the MA polynomial (1 − θ_1 B − ... − θ_q B^q). Usually, the needed orders of p and q are less than or equal to 3. Table 2 summarizes the important criteria for selecting p and q.

Table 2. Criteria of theoretical ACF and PACF for stationary processes

Process     ACF                                                 PACF
AR(p)       Tails off as exponential decay or damped sine wave  Cuts off after lag p
MA(q)       Cuts off after lag q                                Tails off as exponential decay or damped sine wave
ARMA(p, q)  Tails off after lag (q − p)                         Tails off after lag (p − q)
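The sample ACF and PACF used in Steps 2 and 3 can be computed with standard routines. A hedged sketch (again using statsmodels, an assumption for illustration) is given below, including the usual ±1.96/√n significance band for judging cut-offs against Table 2.

import numpy as np
from statsmodels.tsa.stattools import acf, pacf

def significant_lags(z, nlags=20):
    # Return the lags whose sample ACF/PACF exceed the approximate 95% band.
    band = 1.96 / np.sqrt(len(z))
    sample_acf = acf(z, nlags=nlags)
    sample_pacf = pacf(z, nlags=nlags)
    acf_lags = np.where(np.abs(sample_acf[1:]) > band)[0] + 1
    pacf_lags = np.where(np.abs(sample_pacf[1:]) > band)[0] + 1
    return acf_lags, pacf_lags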
Step 4. Test the deterministic trend term θ_0 when d > 0. For a non-stationary model, φ_p(B)(1 − B)^d Z_t = θ_0 + θ_q(B)a_t, the parameter θ_0 is usually omitted so that the model is capable of representing series with random changes in level, slope or trend. If there is reason to believe that the differenced series contains a deterministic trend mean, however, we can test for its inclusion by comparing the sample mean W̄ of the differenced series W_t = (1 − B)^d Z_t with its approximate standard error S_W̄. To derive S_W̄, we note that

lim_{n→∞} n Var(W̄) = Σ_{j=−∞}^{∞} γ_j    (2)

Hence, we get

σ²_W̄ = (1/n) Σ_{j=−∞}^{∞} γ_j = (γ_0/n) Σ_{j=−∞}^{∞} ρ_j = γ(1)/n    (3)

where γ(B) is the auto-covariance generating function and γ(1) is its value at B = 1. Thus, the variance and the standard error for W̄ are model dependent. For example, for the ARIMA(1, d, 0) model, (1 − φB)W_t = a_t, we have

γ(B) = σ²_a / [(1 − φB)(1 − φB⁻¹)]    (4)

so that

σ²_W̄ = (1/n) σ²_a/(1 − φ)² = (σ²_W/n)(1 − φ²)/(1 − φ)² = (σ²_W/n)(1 + φ)/(1 − φ) = (σ²_W/n)(1 + ρ_1)/(1 − ρ_1)    (5)

where we note that σ²_W = σ²_a/(1 − φ²). The required standard error is

S_W̄ = [(γ̂_0/n)(1 + ρ̂_1)/(1 − ρ̂_1)]^(1/2)    (6)

Expressions of S_W̄ for other models can be derived similarly. At the model identification phase, however, because the underlying model is unknown, most available software use the approximation

S_W̄ = [(γ̂_0/n)(1 + 2ρ̂_1 + 2ρ̂_2 + ... + 2ρ̂_k)]^(1/2)    (7)

where γ̂_0 is the sample variance and ρ̂_1, ..., ρ̂_k are the first k significant sample ACFs of {W_t}. Under the null hypothesis ρ_k = 0 for k ≥ 1, Equation 7 reduces to

S_W̄ = (γ̂_0/n)^(1/2)    (8)

Alternatively, one can include θ_0 initially and discard it in the final model estimation if the preliminary estimation result is not significant.
3.3 Parameter Estimation
After we identify a personalized model in Section 3.2, we have to estimate the parameters in the model. In this section, we apply the method of moments for parameter estimation. The method of moments consists of substituting sample moments such as the sample mean Z̄, the sample variance γ̂_0 and the sample ACF ρ̂_i for their theoretical counterparts and solving the resultant equations to obtain estimates of the unknown parameters. For better understanding, we take an auto-regressive process AR(p) as an example; a moving average process MA(q) and an auto-regressive moving average process ARMA(p, q) can be handled in the same way. In an AR(p) process, we have

Ż_t = φ_1 Ż_{t−1} + φ_2 Ż_{t−2} + ... + φ_p Ż_{t−p} + a_t    (9)

where Ż_t = Z_t − μ, and the mean μ = E(Z_t) is estimated by Z̄. To estimate φ, we first use ρ_k = φ_1 ρ_{k−1} + φ_2 ρ_{k−2} + ... + φ_p ρ_{k−p} for k ≥ 1 to obtain the following system of Yule–Walker equations [13]:

ρ_1 = φ_1 + φ_2 ρ_1 + φ_3 ρ_2 + ... + φ_p ρ_{p−1}
ρ_2 = φ_1 ρ_1 + φ_2 + φ_3 ρ_1 + ... + φ_p ρ_{p−2}
...
ρ_p = φ_1 ρ_{p−1} + φ_2 ρ_{p−2} + φ_3 ρ_{p−3} + ... + φ_p    (10)

Then, replacing ρ_k by ρ̂_k, we obtain the moment estimators φ̂_1, φ̂_2, ..., φ̂_p by solving the above linear system of equations:

[φ̂_1, φ̂_2, ..., φ̂_p]ᵀ = R⁻¹ [ρ̂_1, ρ̂_2, ..., ρ̂_p]ᵀ    (11)

where R is the p × p matrix whose (i, j) entry is ρ̂_{|i−j|} (with ρ̂_0 = 1). These estimators are usually called Yule–Walker estimators [13]. Having obtained φ̂_1, φ̂_2, ..., φ̂_p, we use the result

γ_0 = E(Ż_t Ż_t) = E[Ż_t(φ_1 Ż_{t−1} + φ_2 Ż_{t−2} + ... + φ_p Ż_{t−p} + a_t)] = φ_1 γ_1 + φ_2 γ_2 + ... + φ_p γ_p + σ²_a    (12)

and obtain the moment estimator for σ²_a as

σ̂²_a = γ̂_0 (1 − φ̂_1 ρ̂_1 − φ̂_2 ρ̂_2 − ... − φ̂_p ρ̂_p)    (13)
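The Yule–Walker system (11) and the noise-variance estimate (13) are straightforward to solve numerically. The following is a minimal numpy/scipy sketch; the function name is an assumption for illustration.

import numpy as np
from scipy.linalg import toeplitz

def yule_walker_ar(z, p):
    # Moment (Yule-Walker) estimates for an AR(p) model, per Equations (10)-(13).
    z = np.asarray(z, dtype=float) - np.mean(z)
    n = len(z)
    gamma = np.array([np.dot(z[:n - k], z[k:]) / n for k in range(p + 1)])  # autocovariances
    rho = gamma / gamma[0]                                                  # sample ACF
    R = toeplitz(rho[:p])                          # the matrix R in Equation (11)
    phi_hat = np.linalg.solve(R, rho[1:p + 1])     # Yule-Walker estimators
    sigma2_hat = gamma[0] * (1.0 - np.dot(phi_hat, rho[1:p + 1]))  # Equation (13)
    return phi_hat, sigma2_hat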
4 Case Study
In this section, we present an example to show our proposed personalized time series model. 4.1
Model Identification
The number of laboratory tests can be attractive for many health care systems. Figure 1 shows a time series for a female patient who had done her blooding tests from January, 2002 to July, 2008. In total, there are 79 records as the observations of consecutive months. From this figure, it indicates that the series is not stationary in the mean and variance. We compute the sample ACF and sample PACF of the time series Z are shown in Figure 2 and 3 for choosing transformations or differencing. In Figure 2 and 3, we can see that the sample ACF doesn’t decays very slowly, and the sample PACF does not cut off after lag 1. Therefore, we do not need to consider a degree of differencing to make the time series stationary. At the same time, to investigate the required transformation for variance stabilization, we apply the power transformation analysis [2] to suggest
294
Q.V. Hu et al.
15 0
5
10
Z_t
20
25
30
Time Series Z Process: Monthly Lab Tests
0
20
40
60
80
t
Fig. 1. 79 Monthly Blood Testing Records: Z
Fig. 2. Sample ACF for Time Series Z (rho_k plotted against lag k)
an optimal parameter λ = 0.25. The power transformation is presented in Equation 14:

W_t = T(Z_t) = (Z_t^λ − 1)/λ    (14)
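The optimal λ can be obtained with an off-the-shelf Box–Cox routine. The sketch below uses scipy on a synthetic positive-valued series; the library choice and the data are assumptions for illustration.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
z = rng.gamma(shape=2.0, scale=3.0, size=79)  # synthetic positive series of monthly counts

w, lam = stats.boxcox(z)   # transformed series and maximum-likelihood lambda
print(round(lam, 2))       # the case study reports an optimal lambda of 0.25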
The transformed time series process W is plotted in Figure 4, in which we can see that W is stationary in the mean but may not be stationary in the
Fig. 3. Sample PACF for Time Series Z (phi_kk plotted against lag k)
Fig. 4. Transformed Time Series: W (W_t plotted against t)
variance. Hence, we further compute the sample ACF and sample PACF for the transformed series W, which are shown in Figures 5 and 6. The sample ACF shows a damped sine-cosine wave and the sample PACF has relatively large spikes at lags 1, 8 and 13, suggesting that a tentative model may be the AR(1) model in Equation 15:

(1 − φB)(W_t − μ) = a_t    (15)

where W_t = T(Z_t) = (Z_t^λ − 1)/λ as in Equation 14, with λ = 0.25.
Fig. 5. Sample ACF for Transformed Time Series W
Fig. 6. Sample PACF for Transformed Time Series W
4.2 Forecasting
We identified an AR(1) model in Section 4.1 for the transformed series. In this section, we use this transformed series to forecast the laboratory tests N months ahead [3]. For this AR(1) model, we have

(1 − φB)(W_t − μ) = a_t    (16)
where φ = −0.1, μ = 1.2 and σ²_a = 0.1. In this case, we have 79 observations and want to forecast the next twelve months with their associated 95% forecast limits. First of all, we write the AR(1) model as

W_t − μ = φ(W_{t−1} − μ) + a_t    (17)

and the general form of the forecast equation is

Ŵ_t(l) = μ + φ(Ŵ_t(l − 1) − μ) = μ + φ^l (W_t − μ)    (18)

Thus, the predictions for the following twelve months are computed as in Equation 19, with W_79 = (2^0.25 − 1)/0.25 the transformed value of the last observation, and the results are shown in Table 3:

Ŵ_79(1) = 1.2 + (−0.1)¹ (W_79 − 1.2)
Ŵ_79(2) = 1.2 + (−0.1)² (W_79 − 1.2)
...
Ŵ_79(12) = 1.2 + (−0.1)¹² (W_79 − 1.2)    (19)

Table 3. Forecasting for the next twelve months

Predicted month  1    2    3    4    5    6    7    8    9    10   11   12
Numbers          2.4  2.0  2.1  2.1  2.1  2.1  2.1  2.1  2.1  2.1  2.1  2.1

Table 4. 95% forecasting limits for the next twelve months

Predicted month  1            2            3            4            5            6
Intervals        [0.2, 12.1]  [0.1, 10.9]  [0.1, 11.0]  [0.1, 11.0]  [0.1, 11.0]  [0.1, 11.0]
Predicted month  7            8            9            10           11           12
Intervals        [0.1, 11.0]  [0.1, 11.0]  [0.1, 11.0]  [0.1, 11.0]  [0.1, 11.0]  [0.1, 11.0]
Second, in order to obtain the forecast limits, we calculate the ψ weights from the relationship

(1 − φB)(1 + ψ_1 B + ψ_2 B² + ...) = 1    (20)

That is,

ψ_j = φ^j, ∀ j ≥ 0    (21)

Therefore, the 95% forecast limits for the forecasting results in Table 3, shown in Table 4, are computed as

Ŵ_79(l) ± 1.96 [σ²_a (1 + ψ_1² + ... + ψ_{l−1}²)]^(1/2)    (22)
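The forecasts and limits of Equations (18)-(22) are mechanical to compute. The sketch below applies them with the parameter values given in the text and back-transforms through Equation (14); the value of the last observation and the exact back-transformation are assumptions, so the output will not exactly reproduce Tables 3 and 4.

import numpy as np

phi, mu, sigma2_a, lam = -0.1, 1.2, 0.1, 0.25
w_last = (2 ** lam - 1) / lam          # last transformed observation, assuming Z_79 = 2

def back_transform(w):
    return (lam * w + 1) ** (1 / lam)  # inverse of Equation (14)

for l in range(1, 13):
    w_hat = mu + phi ** l * (w_last - mu)                     # Equation (18)
    var_l = sigma2_a * sum(phi ** (2 * j) for j in range(l))  # Equations (20)-(22)
    se = np.sqrt(var_l)
    lo, hi = w_hat - 1.96 * se, w_hat + 1.96 * se
    print(l, round(back_transform(w_hat), 1),
          [round(back_transform(lo), 1), round(back_transform(hi), 1)])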
5 Conclusions
In this paper, we propose a time series method for identifying a personalized model based on a patient's laboratory test records. After successfully building the personalized model, we predict the laboratory tests in the future. In addition, we give prediction limits for the forecasts, which is useful for many health care systems. The case study shows that the proposed method provides a good way to perform personalization analysis. In the future, we will continue working on personalization tools, such as turning this time series method into a GUI tool. Furthermore, we plan to work on group information for predictions.
Acknowledgements. This research is supported in part by an NSERC CRD project. We would also like to thank the three anonymous reviewers for their useful comments on this paper.
References

[1] Alpha Global IT, http://www.alpha-it.com/
[2] Box, G.E.P., Cox, D.R.: An Analysis of Transformations. Journal of the Royal Statistical Society, Series B 26(2), 211–252 (1964)
[3] Box, G.E.P., Jenkins, G.M.: Time Series Analysis: Forecasting and Control, 2nd edn. Holden-Day, San Francisco (1976)
[4] Dickey, D.A., Fuller, W.A.: Distribution of the Estimators for Autoregressive Time Series With a Unit Root. Journal of the American Statistical Association 74, 427–431 (1979)
[5] Dickey, D.A., Bell, B., Miller, R.: Unit Roots in Time Series Models: Tests and Implications. The American Statistician 40(1), 12–26 (1986)
[6] Dunn, P.F.: Measurement and Data Analysis for Engineering and Science. McGraw-Hill, New York (2005) ISBN 0-07-282538-3
[7] Garg, A., Adhikari, N., McDonald, H., Rosas-Arellano, M., Devereaux, P., Beyene, J., Sam, J., Haynes, R.: Effects of Computerized Clinical Decision Support Systems on Practitioner Performance and Patient Outcomes: A Systematic Review. JAMA 293(10), 1223 (2005)
[8] Mills, T.C.: Time Series Techniques for Economists. Cambridge University Press, Cambridge (1990)
[9] Pandit, S.M., Wu, S.-M.: Time Series and System Analysis with Applications. John Wiley & Sons, Inc., Chichester (1983)
[10] Percival, D.B., Walden, A.T.: Spectral Analysis for Physical Applications: Multitaper and Conventional Univariate Techniques, pp. 190–195. Cambridge University Press, Cambridge (1993) ISBN 0-521-43541-2
[11] Slutzky, E.: The Summation of Random Causes as the Source of Cyclic Processes. Econometrica 5, 105–146 (1937); translated from the earlier paper of the same title in Problems of Economic Conditions
[12] Stead, W.W., Garrett Jr., L.E., Hammond, W.E.: Practicing nephrology with a computerized medical record. Kidney International 24(4), 446–454 (1983)
[13] Yule, G.U.: On a Method of Investigating Periodicities in Disturbed Series with Special Reference to Wolfer's Sunspot Numbers. Philosophical Transactions of the Royal Society of London, Series A 226, 267–298 (1927)
Language Analytics for Assessing Brain Health: Cognitive Impairment, Depression and Pre-symptomatic Alzheimer's Disease

William L. Jarrold¹, Bart Peintner², Eric Yeh², Ruth Krasnow², Harold S. Javitz², and Gary E. Swan²

¹ UC-Davis, Davis, CA 95616, [email protected]
² SRI International, Menlo Park, CA 94025, [email protected]
Abstract. We present data demonstrating how brain health may be assessed by applying data mining and text analytics to patient language. Three brain-based disorders are investigated: Alzheimer's Disease, cognitive impairment and clinical depression. Prior studies identify particular language characteristics associated with these disorders. Our data show that computer-based pattern recognition can distinguish language samples from individuals with and without these conditions. Binary classification accuracies range from 73% to 97% depending on the details of the classification task. Text classification accuracy is known to improve substantially as training data approaches web scale, and such a web-scale dataset seems inevitable given the ubiquity of social computing and its language-intensive nature. Given this context, we claim that the classification accuracy levels obtained in our experiments are significant findings for the fields of web intelligence and applied brain informatics.
1 Motivation

Computational analysis of language shows promise as a diagnostic. Word choice and other linguistic markers are heavily affected by many brain-based disorders. This increasingly substantiated phenomenon implies that language analysis can contribute to early diagnosis, and therefore to more timely and effective treatment. As individuals continue to increase the quantity and richness of their language-based interaction through the web, and as we increase the sophistication of automatic language analysis, there is a corresponding increase in the viability of near-continuous monitoring for early signs of brain-based disorders, an important capability for an intelligent web. Therefore, it is of substantial importance to investigate and identify the particular language measures associated with each disorder and how these associations vary with context. Once reasonably accurate models have been proven, continuous language analysis can provide doctors with an objective, unobtrusive, and ecologically valid measure of cognitive status. We present a machine learning-based methodology and architecture for identifying and testing language (and other) measures that serve as markers for brain-based
disorders. We evaluate its application to three disorders: pre-symptomatic Alzheimer’s Disease (Pre-AD), cognitive impairment, and depression. We show that the methodology independently discovers relationships previously reported in the literature and produces accurate diagnostic models. Finally, we demonstrate the importance of context when processing unstructured speech or language, and discuss this problem as a critical area for future work.
2 Methodology and Architecture

We describe a method that allows researchers to classify patients according to atypicalities in speech and language production. Using only speech samples labeled with the speakers' clinical classification, plus a set of controls, the system determines the associations between the disorder and a large set of lexical measures, and produces a model that maps multiple measures to a prediction of the disorder. Here, we focus on the language elements of the process, but it works for acoustical features as well.
Fig. 1. Data flow for the speech and language data analysis architecture used to determine key measures and develop predictive models for a particular disorder
Figure 1 shows the process graphically. First, the raw audio is transcribed into text. In these studies, we used human transcribers for maximum accuracy, but automatic transcription has been used similarly ([1], [2]). Next, during lexical feature extraction, the transcript for each subject is fed into lexical analysis tools which extract a number of linguistic features, each of which may vary based on which disorder is present. The present work uses three lexical analyzers:

Part of Speech Tagger (POST). Part-of-speech frequency features are extracted from a text sample using this tool [3]. The result is a vector consisting of the percent frequency of nouns, adjectives, verbs, etc. present in the sample.

Linguistic Inquiry and Word Count (LIWC). This tool [4] computes the frequency of words from pre-defined lists based on categories such as positive emotion words, socially related words, first-person words, etc. There are approximately 80 such features in the output vector.

CPIDR. CPIDR [5] accurately measures propositional idea density, which intuitively is the density of distinct facts or notions contained in a text. Low idea density has been shown to presage AD decades before any overt signs of the disorder [6].

During feature selection (see Figure 1), we remove features that do not vary significantly based on the presence or absence of a particular disorder. The measures simply
need to be partitioned based on the label of the subject, e.g., 'depressed' or 'control'. The last step in the process feeds the measures and the associated label for a group of subjects into machine learning tools, which induce models that classify arbitrary text samples via their lexical features. Diagnostic models are induced from a subset of the patients (i.e., the training set) and then evaluated on a test set. During training, each patient's transcript is fed to lexical feature extraction, resulting in a vector of lexical feature values that characterizes the patient's speech. Each vector in the training set is paired with the corresponding patient's diagnosis and input to the machine learner, which outputs an executable classifier that should predict the diagnosis associated with a given text. In the test phase (not shown in Figure 1), a classifier is evaluated by presenting it with lexical feature vectors from patients in the held-out test set. The classifier outputs a diagnosis which is compared with the known diagnosis, resulting in a correct or incorrect score for that patient.
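The train/test flow just described can be expressed compactly with common tooling. The sketch below uses scikit-learn on synthetic data purely for illustration; the authors used Weka, and the feature matrix here stands in for the POST/LIWC/CPIDR output vectors.

import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(45, 80))    # 45 patients x ~80 lexical features (synthetic)
y = rng.integers(0, 2, size=45)  # diagnosis labels, e.g. 1 = disorder, 0 = control

clf = Pipeline([
    ("anova_select", SelectKBest(f_classif, k=10)),  # keep features that vary with the label
    ("learner", LogisticRegression(max_iter=1000)),
])
scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validated accuracy
print(scores.mean(), scores.std())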
3 AD and Cognitive Impairment (CI) Experiments

This section contains our results showing that induced models of linguistic features can detect current cognitive impairment and predict future onset of AD.

3.1 Background on Language Connections with AD and CI

The scientific basis of language-based markers of conditions such as AD and cognitive impairment is found across a variety of studies. Cognitive impairment (CI) has been shown to be measurable via computer-based analysis of speech ([2], [7]). Condition-specific language features (e.g. [8]) can be detected and exploited using automatic speech recognition, text analytics and machine learning (ML) to classify different types of fronto-temporal lobar degeneration, a gerontological neurodegenerative condition distinct from AD [1]. Regarding AD, [6] has shown that a language characteristic known as low idea density in the autobiographical writings of nuns in their 20s was a strong predictor of Alzheimer's disease (AD) at time of death more than 50 years later. This finding provides the basis for our aim to detect preclinical AD: although there are currently no disease-modifying treatments for AD, the consensus in the field is that when treatments become available it will be very important to start treatment long before clinically significant damage to the brain has occurred. These findings provide the basis for two main questions: (1) can measures and models be developed to distinguish Pre-AD subjects from age-matched healthy controls; and (2) can measures and models be developed to distinguish currently cognitively impaired subjects from healthy controls? Regarding (1), we aimed to evaluate language markers that predict which WCGS 1980s interview participants eventually died with AD. This had two aspects. First, we aimed to replicate findings from the Nun Study [6], in which a language measure known as idea density was found to be associated with the development of AD decades later. Secondly, we aimed to evaluate the ability of machine learning to identify patterns across the entire feature set that are predictive of AD acquisition.
Regarding (2), we aimed to replicate the findings of another study in which a computer tool called PCAD was shown to assess current cognitive impairment in spontaneous speech [2]. We evaluated the ability of machine learning to classify patients via features other than PCAD-based ones.

3.2 Data and Subject Selection

Patient speech and other clinical data were obtained from the Western Collaborative Group Study (WCGS), a 40+ year longitudinal study involving a wide-ranging array of demographic, personality, neuropsychological and cause-of-death data collected for the purpose of studying behavioral and neuropsychological variables associated with cardiovascular outcomes. Although cardiovascular outcome was not of interest in the present study, speech data were obtained from audio recordings of the 15-minute structured interview [9] administered to every WCGS participant circa 1988. We transcribed interviews from subsamples of the WCGS population. We applied the three lexical analyzers described in Section 2. The outputs of these analyses (e.g. measures of frequencies of various types of words and phrases) were compared to expectations from prior literature. The output vectors were fed to ML algorithms. The diagnostic accuracies of the resulting classifiers were evaluated. The following describes our method for selecting the sub-samples:

Pre-symptomatic AD vs Controls. A pre-symptomatic AD group was identifiable because of the 40+ year WCGS duration. We were able to select a subsample of 22 men who were cognitively normal at the time of the 1988 interview but would eventually die with the cause of death listed as clinically verified AD (all ICD-9 = 331.0). Controls were an age-matched, cognitively normal sub-sample of 23 men never diagnosed with dementia. Mean age at time of interview was 73.13 (SD 4.9; range 65–80). Cognitively normal was defined as scoring less than 0 on the IOWA Screening Battery for Mental Decline [10] (a neuropsychological measure of cognitive impairment) at time of interview.

CI vs Controls. Groups were formed via random selection of subjects with IOWA > 8 (cognitively impaired) and IOWA < 0 (controls).

3.3 Procedure

As outlined in Section 2, we first transcribed the interview audio recordings. Then, we used the three linguistic analysis tools (LIWC, CPIDR, POST) to extract a large number of linguistic measures from each sample. For feature selection, we analyzed all features independently using a one-way ANOVA. In addition, we fed all vectors into implementations of three different ML algorithms: logistic regression, J48, and a multi-layered perceptron (implemented in the open-source Weka machine learning toolkit [11]). To assess model accuracy, we performed this 5-fold cross-validation procedure: (1) Randomly split the data into 5 parts (folds), maintaining an (approximately) equal distribution over categories in each fold; (2) Choose the first fold as the test set; (3) Run an ANOVA on the remaining folds (i.e. the training set) and select features with p-values <= 0.05 (if none, choose all features); (4) Induce a diagnostic model by running a learner on the training set using the selected features; (5) Calculate accuracy
by applying the model to test-set transcripts and comparing the model's predictions against the actual diagnostic category; (6) Repeat steps 2–5 using each remaining fold as the test set and report average accuracy. Because of the low number of subjects, variation in accuracy was expected to be high. Therefore, we performed the procedure above 100 times and report the mean accuracy and its standard deviation.

3.4 Results

As hypothesized, idea density as measured by the program CPIDR [5] was significantly lower in the Pre-AD group than in the matched controls group (see Table 1). Further, this difference was significant even after statistically controlling for education, age, age squared and cognitive impairment measures (i.e. IOWA as well as its subcomponents, the COWA, a measure of verbal fluency, and the BVRT, a measure of visual perception and visual memory). This provides evidence to support the claim that the 'Nun Study' finding [6] involving idea density generalizes to (a) spontaneous speech in addition to written language, (b) men in addition to women, and (c) individuals in later life as well as early life. After applying machine learning to the computer-based lexical analysis of the transcripts, we were able to predict which individuals went on to develop AD with an accuracy of 73% (compared to a naive-learner baseline accuracy of 58%).

Table 1. Prediction accuracy and idea density in Pre-AD versus controls

                         Mean (sd)     Number of cases
Pre-Alzheimer's Disease  0.499 (0.02)  20
Age-matched controls     0.528 (0.02)  23
Prediction accuracy using best learner: 73%
Regarding the detection of cognitive impairment, we were simply interested in how lexical feature profiles, when fed to machine learning, may be used to predict diagnosis. As can be seen in Table 2, our accuracy at predicting cognitive impairment was reasonably good: 82.6% (baseline accuracy: 50%).

Table 2. Accuracy of classification into high versus low cognitive impairment

High cognitive impairment (IOWA > 8)   N = 15
Controls (IOWA < 0)                    N = 15
Best classification accuracy obtained: 82.6%
3.5 Discussion

We believe our findings regarding the association between low idea density and individuals who were cognitively intact (e.g. not suffering from MCI) and who later developed AD are among our most significant results. They are potentially a replication of [6] and support utility as an early diagnostic. However, it should be noted that our diagnostic labels were based on the clinical judgment of the physician signing the death certificate rather than on the gold-standard method of analysis of brain tissue at autopsy.
4 Importance of Conversational Context

Our data involving depression demonstrate how diagnostic accuracy can vary as a function of the conversational context. Like AD and CI, clinical depression involves measurable brain-level phenomena [12] and is therefore also relevant to brain informatics. Depression has been shown to exert the largest disease burden worldwide as measured in terms of years lost due to disability [13]. For this reason, findings that may aid diagnosis are important. We aimed to replicate findings of prior studies demonstrating that depression is associated with a higher frequency of self-focused words (e.g. I, me, my, I'd, I've) [14], [15], [16]. Second, we wanted to evaluate diagnostic potential by measuring how accurately machine learning algorithms classify individuals on the basis of self-focus and other lexical features. Third, we had a hypothesis of context specificity, i.e. that diagnostic accuracy would vary substantially as a function of the conversational context. Given that our data were derived from a structured interview, they were ideal for this question. Cognitive depression theory [12] led us to conjecture that the greatest accuracy would be obtained if we focused the analysis on responses to a question that directed the interviewee toward broad self-evaluations. One question (see immediately below) from the interview stood out by far as the one that fit this description and had sufficient mean response length (greater than 100 words).

Question 24-b: In your work or career, have you accomplished most of the things that you wanted to accomplish? (If No) Why not? What's gotten in the way? Are you doing anything about this?

WCGS patients (see Section 3.2 above) were administered the Center for Epidemiologic Studies Depression Scale (CESD), a self-report measure of depression, at the time of their 1988 interview. We created a group of depressed subjects by randomly selecting cases that had a CESD score greater than 25. The non-depressed group was created by selecting a sample of individuals with CESD scores less than 20, such that half of them had a CESD score above the median value and half below.

4.1 Results

In an analysis of the entire interview, there was no association between depression and self-focused language, and the diagnostic accuracy of models based on the lexical feature set (including self-focused word frequency) was only slightly better than chance. We hypothesized that the effect may be present only in certain questions that activate depressive schemas, whereas other questions may not; many of the questions were more relevant to Type A behavior characteristics than to depressive status, e.g. "How do you show your anger?". However, in the analysis restricted to the selected target question, we did find the expected difference: depressed subjects had a mean first-person word frequency of 12.9% (sd 4.2), while non-depressed subjects had a mean of 6.3% (sd 2.03). Furthermore, there was nearly complete separation between these two groups in the frequency of self-focused words. As a result, an ML algorithm based simply on this one feature was very accurate (mean (sd) = 97.6 (4.91)%). Interestingly, if we gave all measures to our machine learning
algorithms, accuracy was lower (78 (4.7)%). This is due to the small sample size: in an attempt not to overgeneralize, the learners gave some weight to other measures that hurt accuracy.

4.2 Discussion

The results demonstrate how analyses restricted to certain contexts, in this case responses to a particular question, can improve accuracy. A noteworthy aspect of the depression work was that when we did not get good results, we were able to leverage existing theory in a top-down way, focusing on a specific conversational context to improve accuracy. It also illustrates a challenge: the whole web is not a structured interview. In applying linguistic analysis to unstructured settings, while the amount of data will undoubtedly be larger, not all of it will be relevant. Identifying the proper conversational contexts, preferably automatically, is a key challenge.
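For concreteness, the self-focus measure discussed above can be approximated by a simple token count. The word list below is an illustrative assumption (LIWC's actual dictionaries are proprietary), so this sketch only mimics the first-person frequency feature.

import re

FIRST_PERSON = {"i", "me", "my", "mine", "i'd", "i've", "i'm", "i'll"}  # assumed list

def self_focus_frequency(transcript):
    # Percent of tokens that are first-person singular words.
    tokens = re.findall(r"[a-z']+", transcript.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in FIRST_PERSON)
    return 100.0 * hits / len(tokens)

print(self_focus_frequency("I think I've accomplished most of what I wanted."))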
5 Conclusions, Future Directions

The results motivate and inform additional research and development into systems that automatically analyze language, thus providing information of interest in a non-invasive, low-cost way to, e.g., health-care providers. These findings are significant in the context of web intelligence [17]. The language streams from large numbers of individual "patients" interacting with the intelligent web can comprise a large training corpus. Machine learning performance tends to increase dramatically when one's training set becomes larger by several orders of magnitude [18]. Thus, the approach should achieve useful accuracy to the extent that we can reliably associate individuals' language data with their health status without violating privacy laws and ethics. Regarding future work, one important project involves applying a multi-agent framework to the problem. For example, one may partition the problem such that there is one learner for each of the 24 questions in the structured interview, whose outputs are combined in a meta-learner. [19] has found that for certain task partitionings, multi-agent learners can outperform single-agent learners. This method is not inherently limited to identifying clinical-level lexical patterns; the results augur successful application of the method scientifically, to identify language patterns associated with phenomena at the level of brain science and informatics (e.g. fMRI, ERP). Thus the results are significant to brain informatics. Across three disorders (four counting [1]), our results provide compelling evidence of the promise of language analytics based on machine learning as an aid to clinicians, especially in the context of web intelligence. Of these, the finding regarding pre-symptomatic AD diagnosis seems most noteworthy. The results justify performing more work using larger samples, samples obtained from a greater variety of language tasks, and more sophisticated lexical analysis and machine learning algorithms. If these conditions are met, web intelligence may someday assist with the diagnostic assessment of brain health using methods akin to the above.
Acknowledgements. WCGS was supported by NIA Grant AG09341, and the present paper's analysis was supported by the Center for Research on Independent Aging (CRIA) at SRI International. Data analyses were performed while Dr. Jarrold was at SRI International. Address correspondence to him at [email protected].
References

[1] Peintner, B., Jarrold, W., Vergyri, D., Richey, C., Gorno-Tempini, M., Ogar, J.: Learning Diagnostic Models Using Speech and Language Measures. In: 30th Annual International IEEE EMBS Conference, Vancouver, British Columbia, Canada, August 20–24 (2008)
[2] Gottschalk, L.A., Bechtel, R.J., Maguire, G.A., Katz, M.L., Levinson, D.M., Harrington, D.E., et al.: Computer detection of cognitive impairment and associated neuropsychiatric dimensions from the content analysis of verbal samples. American Journal of Drug and Alcohol Abuse 28, 653–670 (2002)
[3] Toutanova, K., Klein, D., Manning, C., Singer, Y.: Feature-rich part-of-speech tagging with a cyclic dependency network. In: Proceedings of HLT-NAACL, pp. 252–259 (2003)
[4] Pennebaker, J.W., Francis, M.E., Booth, R.J.: Linguistic Inquiry and Word Count: LIWC 2001. Erlbaum Publishers, Mahwah (2001)
[5] Brown, C., Snodgrass, T., Kemper, S.J., Herman, R., Covington, M.A.: Automatic measurement of propositional idea density from part-of-speech tagging. Behavior Research Methods 40(2), 540–545 (2008)
[6] Snowdon, D.A., Kemper, S.J., Mortimer, J.A., Greiner, L.H., Wekstein, D.R., Markesbery, W.R.: Linguistic ability in early life and cognitive function and Alzheimer's disease in late life: Findings from the Nun Study. Journal of the American Medical Association 275, 528–532 (1996)
[7] Thomas, C., Keselj, V., Cercone, N., Rockwood, K., Asp, E.: Automatic detection and rating of dementia of Alzheimer type through lexical analysis of spontaneous speech. In: 2005 IEEE International Conference on Mechatronics and Automation, vol. 3, pp. 1569–1574 (2005)
[8] Wilson, S.M., Henry, M.L., Besbris, M., Ogar, J.M., Dronkers, N.F., Jarrold, W., Miller, B.L., Gorno-Tempini, M.L.: Connected speech production in three variants of primary progressive aphasia. Brain, Advance Access published June 11 (2010), doi:10.1093/brain/awq129
[9] Rosenman, R.H., et al.: A Predictive Study of Coronary Heart Disease: The Western Collaborative Group Study. Journal of the American Medical Association 189, 15–22 (1964)
[10] Eslinger, P.J., Damasio, A.R., Benton, A.L.: The Iowa Screening Battery for Mental Decline. Department of Neurology (Division of Behavioral Neurology), University of Iowa (1984)
[11] Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)
[12] Beck, A.T.: The Evolution of the Cognitive Model of Depression and Its Neurobiological Correlates. American Journal of Psychiatry 165, 969–977 (2008)
[13] World Health Organization: The Global Burden of Disease: 2004 Update, Part 3, Disease Incidence, Prevalence and Disability (2004)
[14] Stirman, S.W., Pennebaker, J.W.: Word use in poetry of suicidal and non-suicidal poets. Psychosomatic Medicine 63, 517–522 (2001)
[15] Rude, S.S., Gortner, E.M., Pennebaker, J.W.: Language use of depressed and depression-vulnerable college students. Cognition and Emotion 18, 112–133 (2004)
[16] Mehl, M.R.: The lay assessment of sub-clinical depression in daily life. Psychological Assessment 18, 340–345 (2006)
[17] Zhong, N., Liu, J., Yao, Y.: In Search of the Wisdom Web. Computer, 27–31 (November 2002)
[18] Banko, M., Brill, E.: Scaling to Very Very Large Corpora for Natural Language Disambiguation. In: ACL 2001, pp. 26–33 (2001)
[19] Sun, R., Peterson, T.: Multi-agent reinforcement learning: weighting and partitioning. Neural Networks 12, 727–753 (1999)
The Effect of Sequence Complexity on the Construction of Protein-Protein Interaction Networks

Mehdi Kargar and Aijun An

Department of Computer Science and Engineering, York University, 4700 Keele Street, Toronto, Ontario, Canada, M3J 1P3

Abstract. In this paper, the role of sequence complexity in the construction of important nodes in protein-protein interaction (PPI) networks is investigated. We use two complexity measures, linguistic complexity and Shannon entropy, to measure the complexity of protein sequences. Three different datasets of yeast PPI networks are used to draw the conclusions. It has been shown that there are two important types of nodes in PPI networks: hub and bottleneck nodes. It has been shown recently that hubs and bottlenecks tend to be essential in the process of evolution. A better understanding of the properties of these two types of nodes will shed light on why proteins interact with each other in the observed manner. We show that the sequence complexity of hubs is lower than that of non-hubs, but the difference is not significant in most cases. On the other hand, the sequence complexity of bottlenecks is lower than that of non-bottlenecks, and the difference is significant in most cases. Modularity has an effective role in the construction of PPI networks. We find that there is no significant difference in node complexity among different modules in a PPI network.
1 Introduction
Biological networks are formed by the interactions of cellular components such as DNA, RNA and proteins, and are fundamental to most biological processes. Protein-protein interaction (PPI) networks can be generated using new technologies. A PPI network is a graph that can consist of several thousand nodes; nodes represent proteins and edges represent physical interactions between two proteins. Since the function of each protein is related to its interactions with other proteins, analyzing PPI networks can reveal the unknown functions of proteins. Since PPI networks are scale-free networks, they have a power-law distribution of node connectivity: scale-free networks contain a small number of nodes that are highly connected, while most nodes in the network are only slightly connected. Nodes with high connectivity are called hubs and nodes with low connectivity are called non-hubs [20]. Another topological property of a network is betweenness. Betweenness specifies the total number of non-redundant shortest paths going through a specific node [4]. Nodes with high values of betweenness lie on most of the
shortest paths in the network. Thus, these nodes represent the central nodes of the network. Yu et al. called these high-betweenness proteins bottlenecks, and hypothesized that bottlenecks, like hubs, are important nodes in biological networks [20]. Modularity also plays an important role in the construction of networks. PPI networks can be divided into functional modules, which are chemically and spatially isolated, where links within modules are much denser than those across modules. Modern molecular biology is concerned with analyzing large amounts of data. Due to the complexity of biological systems and networks, this is not a trivial task. To acquire complete and comprehensive knowledge, different methods and approaches should be used for analyzing the input data. One of the most important properties of biological sequences is their high repetitiveness. Different combinations of frequencies and repetitiveness in specific parts of biological sequences indicate the presence and density of biological messages. In this work we evaluate the role of sequence complexity in the construction of PPI networks. The complexity of each protein node is calculated using linguistic complexity and Shannon entropy. We assume that different types of nodes in the network might have different sequence complexities. We use the protein-protein interaction network of yeast to investigate whether this assumption is true. Three PPI network datasets are used and analyzed. The rest of this paper is organized as follows. Section 2 reviews related work. Definitions of the concepts, methods and measures used in our study are presented in Section 3. The PPI datasets are introduced and compared to each other in Section 4. The effect of sequence complexity on hubs, bottlenecks and modules is evaluated in Section 5. Section 6 concludes the paper.
2 Related Work
In [3], the authors describe a computational methodology capable of predicting key regulatory genes and proteins in disease- and condition-specific biological networks. Their method builds a shortest-path network connecting condition-specific genes, using a global database of protein interactions from MetaCore. They evaluate the number of all paths traversing each node in the shortest-path network in relation to the total number of paths going via the same node in the global network. They determine the statistical significance of the network connectivity based on these numbers and the relative size of the initial data set. They also construct disease pathways based on the predicted regulatory nodes. In [9], the authors develop information flow analysis, a new computational method that identifies proteins central to the transmission of information in a biological network. In the information flow analysis, they model the network as an electrical circuit, where proteins are modeled as interconnecting junctions and interactions as resistors. The method calculates an information flow score for every protein. Their approach
incorporates confidence scores of protein-protein interactions and evaluates all possible paths in a network. They suggest that the likelihood of observing lethality and pleiotropy when a protein is eliminated is positively correlated with the protein's information flow score. The structures and dynamics of living organisms are affected by the evolution, properties and complexity of genome sequences [8]. The complexity of sequences is useful in reproducing phylogenetic trees, compressing biological sequences, identifying genomic structures and studying genomic evolution [10]. Text complexity can be measured using the Kolmogorov complexity test, the entropy of a higher-order Markov model (CM) or Shannon entropy [19]. Since there is no unique definition of sequence complexity, various complexity measures have been defined for analyzing biological sequences. In contrast to complexity measures, low-complexity regions are well defined: low-complexity zones are produced in the presence of dispersed or tandem repeats, palindromic structures, biased nucleotide composition, or a combination of these properties [14,17]. In addition to complexity- and entropy-based measures for analyzing genome sequences, the statistical properties of coding and non-coding regions are evaluated in [1]. The application of statistical methods such as Pearson's chi-square test to detect signals in the whole genome of Escherichia coli is presented in [15]. The efficiency of the method is evaluated by comparing Pearson's chi-square test with linguistic, CE and CWF complexity on the complete genome of E. coli. They showed that Pearson's chi-square test distinguishes genes (coding regions) from pseudogenes (non-coding regions). They also demonstrated which parts of an ORF have a significant effect on discriminating genes from pseudogenes: 100 nucleotides before the start codon, around the start codon, the middle of the ORF, and around the stop codon. They concluded that the region around the start codon has the best performance in discriminating genes from pseudogenes.
3 Methods and Measures
In the first three parts of this section, the definitions of hubs, bottlenecks and modules are presented. Then, the process of computing the complexity measures is illustrated. The linguistic complexity and Shannon entropy measures are described at the end.

3.1 Hubs and Non-Hubs
The connectivity of a protein in the PPI network is defined as the number of proteins it is connected to, excluding self-interactions (Figure 1). Hubs are defined as proteins with five or more interactions [7]. Using different threshold values, we may obtain different numbers of hubs. In this work, we consider nodes with degree greater than five to be hub nodes (a small code sketch follows Figure 1).
Fig. 1. Schematic illustration of hubs and bottlenecks in a network. A is a hub-bottleneck node, B is a non-hub-non-bottleneck node, C is a non-hub-bottleneck node, and D is a hub-non-bottleneck node. Red (dark) and green (light) nodes belong to two different modules.
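As an illustration (not the authors' code; the function names and the edge-list input format are our own assumptions), hub identification from an undirected interaction list can be sketched as follows, with each interaction listed once:

    from collections import defaultdict

    def node_degrees(edges):
        """Degree of every node from an edge list, excluding
        self-interactions as described in Section 3.1."""
        degree = defaultdict(int)
        for u, v in edges:
            if u == v:
                continue  # exclude self-interactions
            degree[u] += 1
            degree[v] += 1
        return degree

    def find_hubs(edges, threshold=5):
        """Hub nodes: proteins whose degree exceeds the threshold."""
        return {n for n, d in node_degrees(edges).items() if d > threshold}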
3.2 Bottlenecks and Non-Bottlenecks
The betweenness of a node i is defined as the total number of nonredundant shortest paths between pairs of other nodes that go through node i [4]. Nodes with a high value of betweenness usually lie in the center of the network (e.g., nodes A and C in Figure 1). Betweenness measures how influential a node is over the flow of information between other nodes. Bottlenecks are defined as the proteins that are in the top 20% in terms of betweenness values [20]. To compute node betweenness within networks, we used an improved version of the algorithm developed by Newman and Girvan [11,4]. The algorithm, as improved by Yu et al. [20], consists of the following eight steps (a sketch follows the list):

1. Initialize the betweenness of every node n in the network: $b_n = 0$.
2. Start from a node p. A breadth-first tree is built with p at the top and the nodes farthest from p at the bottom of the tree. Each node is placed at a level of the tree based on its shortest distance from p.
3. A weight $w_p = 1$ is assigned to p. In the process of building the tree, every node q receives the weight $w_q = \sum_{r \in L} w_r$, where L is the set of neighbors of q at the immediately preceding level.
4. Another variable $c_q$, with an initial value of 1, is assigned to every node q in the tree.
5. Starting from a bottom node q, the value of $c_q$ is added to the corresponding variable of the predecessor of q. If q has more than one predecessor, each predecessor r receives $c_q \times (w_r / w_q)$, so that $c_r = c_r + c_q \times (w_r / w_q)$.
6. Perform step 5 for every node in the tree.
7. For every node q in the tree, $b_q = b_q + c_q$.
8. Repeat steps 2-7 for every node in the network.
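The following sketch is one possible reading of these eight steps (the names, such as adj and betweenness, are ours; this is an illustration, not the authors' implementation). It follows the steps literally, so the source node itself also accumulates credit in step 7; production implementations often exclude endpoints or normalize.

    from collections import defaultdict, deque

    def betweenness(adj):
        """Betweenness per the eight steps above.
        `adj` maps each node to the set of its neighbors."""
        b = {n: 0.0 for n in adj}                      # step 1
        for p in adj:                                  # step 8: every source
            # step 2: BFS tree rooted at p, recording levels and predecessors
            dist, order, preds = {p: 0}, [p], defaultdict(list)
            w = {p: 1.0}                               # step 3
            queue = deque([p])
            while queue:
                q = queue.popleft()
                for r in adj[q]:
                    if r not in dist:                  # first visit: new level
                        dist[r] = dist[q] + 1
                        w[r] = 0.0
                        queue.append(r)
                        order.append(r)
                    if dist[r] == dist[q] + 1:         # q precedes r
                        preds[r].append(q)
                        w[r] += w[q]                   # w_r = sum of predecessor weights
            c = {q: 1.0 for q in order}                # step 4
            for q in reversed(order):                  # steps 5-6: bottom-up credit
                for r in preds[q]:
                    c[r] += c[q] * (w[r] / w[q])
            for q in order:                            # step 7
                b[q] += c[q]
        return b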
3.3 Modularity
Many complex networks are divided into modules or communities, where edges within modules are much denser than the edges across modules [12]. The modularity of a network provides structural information about the network and determines the underlying mechanisms that specify the network structure (Figure 1). The modularity of a network is usually measured by [5,13]:

    M = \sum_{m=1}^{N_M} \left[ \frac{e_m}{E} - \left( \frac{d_m}{2E} \right)^2 \right]

In this objective function, $N_M$ is the number of modules, E is the total number of edges (interactions) in the network, $e_m$ is the number of edges (interactions) in module m, and $d_m$ is the sum of the degrees of the nodes (proteins) in module m. A modular separation that maximizes the objective function M is referred to as the optimal modular separation, and the corresponding value of M is taken as the modularity of the network. This objective function produces accurate modules because a good partition of a network into modules must include as many within-module edges and as few between-module edges as possible, and M satisfies this property well. Several methods have been proposed to separate a network into modules so as to maximize M. Simulation results suggest that the method proposed by Guimera and Amaral [5] has the best accuracy because it produces the highest M [2].
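Given a fixed partition, M can be evaluated directly from the edge list. A minimal sketch follows (function and variable names are ours, not from the paper):

    from collections import defaultdict

    def modularity(edges, module_of):
        """Compute M = sum_m [(e_m / E) - (d_m / 2E)^2] for a given
        partition. `module_of` maps each node to its module label."""
        E = len(edges)
        e = defaultdict(int)  # within-module edge counts
        d = defaultdict(int)  # sums of node degrees per module
        for u, v in edges:
            d[module_of[u]] += 1   # each edge adds 1 to each endpoint's degree
            d[module_of[v]] += 1
            if module_of[u] == module_of[v]:
                e[module_of[u]] += 1
        return sum(e[m] / E - (d[m] / (2 * E)) ** 2 for m in d)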
3.4 Process of Computing the Complexity Measures
In this section, the process of computing the complexity value for each protein sequence (i.e., a node of the PPI network) is described. For each protein, we consider a sliding window of size N, where N is the number of amino acids in the window. A window W of size N is represented as $W = (s_i, s_{i+1}, s_{i+2}, \ldots, s_{i+N-1})$, where $s_{i+j}$ is an amino acid in the window. For each window, a complexity value is calculated according to the measure of interest (described in the next sections). Then, the average complexity over all windows is assigned to the protein as its complexity. Assume a protein consists of K amino acids $(s_1, s_2, \ldots, s_K)$. Then, its first sliding window is $(s_1, s_2, \ldots, s_N)$, the second is $(s_2, s_3, \ldots, s_{N+1})$, and so on. Thus, the number of sliding windows for this protein is K - N + 1. For example, if the window size is set to 100 and the length of the protein sequence is 400, the complexity measure of interest is calculated over 301 sliding windows, and the average complexity over these 301 windows is assigned to the protein as its complexity.
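This averaging step can be written generically, taking any per-window measure as an argument (a minimal sketch under our own naming; the two measure functions are sketched in the following subsections):

    def average_window_complexity(sequence, N, complexity_fn):
        """Slide a window of N amino acids across `sequence`, apply
        `complexity_fn` to each window, and return the mean value.
        A protein of length K yields K - N + 1 windows."""
        K = len(sequence)
        if K < N:
            raise ValueError("sequence shorter than the window size")
        values = [complexity_fn(sequence[i:i + N]) for i in range(K - N + 1)]
        return sum(values) / len(values)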
3.5 Linguistic Complexity
One of the complexity-based measures is linguistic complexity. Linguistic complexity is defined as the ratio of the number of subsequences that occur in the sequence of interest to the maximum number of subsequences for a sequence of the same length over the same alphabet [18]. Since the sequences are proteins, the alphabet consists of the 20 amino acids. The maximum number of subsequences (also called the maximum vocabulary) can be computed as follows:

    \text{maximum vocabulary} = \sum_{L=1}^{N} \min(A^L, N - L + 1)    (1)

where A is the alphabet size, L is the subsequence length and N is the size of the window. The complexity measure is defined as

    \text{Linguistic Complexity} = \frac{n}{\text{maximum vocabulary}}    (2)

where n is the number of distinct subsequences that occur in the sequence of interest.
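A direct (naive) evaluation of Eqs. (1) and (2) for one window can be sketched as below. The cited work [18] describes a fast suffix-tree-based algorithm; this enumeration is for clarity, not speed, and the function name is ours:

    def linguistic_complexity(window, alphabet_size=20):
        """Linguistic complexity of one window: the number of distinct
        substrings divided by the maximum vocabulary (Eqs. 1 and 2)."""
        N = len(window)
        # count distinct substrings of every length L = 1..N
        n = len({window[i:i + L]
                 for L in range(1, N + 1)
                 for i in range(N - L + 1)})
        max_vocab = sum(min(alphabet_size ** L, N - L + 1)
                        for L in range(1, N + 1))
        return n / max_vocab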
3.6 Shannon Entropy
Shannon entropy is defined based on the probability of occurrence of symbols [10]. The entropy of a subsequence (window) is calculated by the following relation:

    \text{Shannon entropy} = -\sum_{i=1}^{M} \frac{m_i}{N - m + 1} \log_2 \frac{m_i}{N - m + 1}    (3)

where N is the window size, m is the length of the word (in this work, we only use words of length one), $M = A^m$ is the total number of words of length m, A is the alphabet size (here equal to 20) and $m_i$ is the number of occurrences of the i-th word in the window.
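A minimal sketch of Eq. (3) follows (our own naming; words with zero count contribute nothing to the sum and are simply skipped):

    import math

    def shannon_entropy(window, m=1):
        """Shannon entropy of one window (Eq. 3) over words of length m
        (m = 1 in this paper), with p_i = m_i / (N - m + 1)."""
        N = len(window)
        total = N - m + 1                    # number of word positions
        counts = {}
        for i in range(total):
            word = window[i:i + m]
            counts[word] = counts.get(word, 0) + 1
        return -sum((c / total) * math.log2(c / total)
                    for c in counts.values())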
4 The Protein-Protein Interaction Datasets
In this paper, three datasets of yeast protein-protein interaction networks are used: the MIPS, DIP and FYI datasets. The MIPS dataset is from the MPact dataset of [6]. It contains small-scale binary and human-curated high-throughput interactions obtained directly from experiments, and it also includes binary interactions inferred from high-confidence protein complex data. MIPS contains only non-self physical interactions. The DIP dataset was built using the core dataset from [16]. The FYI dataset is from [7]. Since all three datasets are yeast PPI networks, there is some overlap among them. Figure 2 illustrates the intersections among the sets of nodes in the three PPI networks, and Figure 3 shows how the edges in the three networks overlap. In addition, the overlaps in hubs and bottlenecks of the three datasets are illustrated in Figures 4 and 5, respectively.
Fig. 2. The overlaps in nodes of the three yeast protein-protein interaction datasets

Fig. 3. The overlaps in edges of the three yeast protein-protein interaction datasets
Fig. 4. The overlaps in hubs of the three yeast protein-protein interaction datasets

Fig. 5. The overlaps in bottlenecks of the three yeast protein-protein interaction datasets
5 The Effect of Sequence Complexity on Hubs, Bottlenecks and Modules
The significance levels of the difference in linguistic complexity and Shannon entropy between hubs and non-hubs are presented in Table 1; those between bottlenecks and non-bottlenecks are presented in Table 2. Using the Kolmogorov-Smirnov test, we found that the obtained linguistic complexity and Shannon entropy values do not follow normal distributions. Thus, we used the Mann-Whitney test to calculate the significance level (i.e., p-value) of the difference
Table 1. The significance (p-value) of the difference in complexity between hubs and non-hubs in PPI networks. The Mann-Whitney test is used to compare hubs and non-hubs.

Complexity measure  Dataset  Window 25  Window 50  Window 100
Linguistic          DIP      0.343      0.087      0.082
Linguistic          MIPS     0.000      0.000      0.000
Linguistic          FYI      0.752      0.763      0.585
Shannon             DIP      0.719      0.377      0.121
Shannon             MIPS     0.000      0.000      0.000
Shannon             FYI      0.441      0.994      0.864

Table 2. The significance (p-value) of the difference in complexity between bottlenecks and non-bottlenecks in PPI networks. The Mann-Whitney test is used to compare bottlenecks and non-bottlenecks.

Complexity measure  Dataset  Window 25  Window 50  Window 100
Linguistic          DIP      0.120      0.012      0.019
Linguistic          MIPS     0.000      0.000      0.000
Linguistic          FYI      0.010      0.002      0.020
Shannon             DIP      0.389      0.061      0.038
Shannon             MIPS     0.000      0.000      0.000
Shannon             FYI      0.036      0.019      0.174
between the different classes. The results are computed for windows of size 25, 50 and 100. The significance threshold is set to 0.05; results whose p-value is below 0.05 are shown in bold face. The results suggest that there is no significant difference in node complexity between hubs and non-hubs, except for the MIPS dataset. On the other hand, the complexity values of bottlenecks are significantly different from those of non-bottlenecks in most cases. We also tested whether the complexities of the nodes belonging to one module are significantly different from the complexities of nodes belonging to other modules. Since we have more than two modules in each dataset and the complexity values do not follow a normal distribution, the Kruskal-Wallis test is applied to find the significance level of the difference in node complexity among different modules. The results are summarized in Table 3. In most cases, there is no significant difference in node complexity among different modules, and most of the nodes in different modules have similar complexity values, again with the exception of the MIPS dataset. The average complexity values of hubs and non-hubs are presented in Table 4. The results show that the complexity of hubs is lower than the complexity of non-hubs. This suggests that the complexities of hubs decrease during evolution (although not significantly in most cases, according to the results in Table 1). A similar pattern is observed for bottleneck nodes. The average complexity values of bottlenecks are summarized in Table 5. Similar to hub nodes,
Table 3. The significance (p-value) of the difference in complexity of nodes among different modules in PPI networks. The Kruskal-Wallis test is used to compare different modules.

Complexity measure  Dataset  Window 25  Window 50  Window 100
Linguistic          DIP      0.078      0.028      0.120
Linguistic          MIPS     0.000      0.000      0.000
Linguistic          FYI      0.567      0.487      0.200
Shannon             DIP      0.300      0.184      0.341
Shannon             MIPS     0.000      0.000      0.000
Shannon             FYI      0.505      0.294      0.209
Table 4. The average complexity of hubs and non-hubs in the PPI networks

Complexity measure  Dataset  Node type  Window 25  Window 50  Window 100
Linguistic          DIP      hub        0.9717     0.9915     0.9950
Linguistic          DIP      non-hub    0.9722     0.9920     0.9953
Linguistic          MIPS     hub        0.9708     0.9912     0.9947
Linguistic          MIPS     non-hub    0.9721     0.9918     0.9951
Linguistic          FYI      hub        0.9724     0.9920     0.9952
Linguistic          FYI      non-hub    0.9721     0.9920     0.9952
Shannon             DIP      hub        3.4494     3.7547     3.9216
Shannon             DIP      non-hub    3.4526     3.7610     3.9295
Shannon             MIPS     hub        3.4299     3.7344     3.9048
Shannon             MIPS     non-hub    3.4524     3.7634     3.9331
Shannon             FYI      hub        3.4606     3.7643     3.9298
Shannon             FYI      non-hub    3.4523     3.7611     3.9280

Table 5. The average complexity of bottlenecks and non-bottlenecks in the PPI networks

Complexity measure  Dataset  Node type        Window 25  Window 50  Window 100
Linguistic          DIP      bottleneck       0.9715     0.9914     0.9948
Linguistic          DIP      non-bottleneck   0.9722     0.9920     0.9953
Linguistic          MIPS     bottleneck       0.9715     0.9915     0.9948
Linguistic          MIPS     non-bottleneck   0.9720     0.9918     0.9950
Linguistic          FYI      bottleneck       0.9716     0.9915     0.9950
Linguistic          FYI      non-bottleneck   0.9723     0.9921     0.9952
Shannon             DIP      bottleneck       3.4445     3.7473     3.9157
Shannon             DIP      non-bottleneck   3.4536     3.7622     3.9302
Shannon             MIPS     bottleneck       3.4361     3.7424     3.9140
Shannon             MIPS     non-bottleneck   3.4525     3.7636     3.9329
Shannon             FYI      bottleneck       3.4422     3.7454     3.9161
Shannon             FYI      non-bottleneck   3.4569     3.7660     3.9316
the complexity of bottlenecks is lower than the complexity of non-bottlenecks, which suggests that the complexities of bottlenecks decrease during evolution, and they decrease significantly in most cases (according to the results in Table 2).
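As an illustration of the statistical testing pipeline described in this section (not the authors' code; SciPy is assumed available, and the function names are ours), the normality check and the two group comparisons could be run as:

    import numpy as np
    from scipy import stats

    def compare_node_classes(group_a, group_b, alpha=0.05):
        """Kolmogorov-Smirnov normality check, then a two-sided
        Mann-Whitney test between two classes of nodes."""
        for vals in (group_a, group_b):
            vals = np.asarray(vals, dtype=float)
            _, p_norm = stats.kstest(vals, 'norm',
                                     args=(vals.mean(), vals.std()))
            if p_norm < alpha:
                print("sample deviates from normality (p = %.3f)" % p_norm)
        _, p = stats.mannwhitneyu(group_a, group_b, alternative='two-sided')
        return p  # significant difference if p < alpha

    def compare_modules(*module_values):
        """Kruskal-Wallis test across more than two modules."""
        _, p = stats.kruskal(*module_values)
        return p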
6 Conclusions
In this paper, the role of sequence complexity in the construction of important nodes in yeast PPI networks is studied. The complexity measures used are linguistic complexity and Shannon entropy. Three different datasets of yeast PPI networks are used in the study. We focused on two types of essential nodes in PPI networks, namely hub and bottleneck nodes. We show that the complexities of hubs are lower than the complexities of non-hubs, although this difference is not significant in most cases. On the other hand, the complexities of bottlenecks are lower than the complexities of non-bottlenecks, and this difference is significant in most cases. We also conclude that there is no significant difference in the complexity of nodes among different modules. We showed that proteins which are hubs and/or bottlenecks have lower complexity. Hubs and bottlenecks are essential proteins in the process of evolution, and it is probable that these types of proteins tend to have lower complexity in order to survive better under different conditions.
References

1. Abnizova, I., Gilks, W.R.: Studying statistical properties of regulatory DNA sequences, and their use in predicting regulatory regions in the eukaryotic genomes. Briefings in Bioinformatics 7(1), 48-54 (2006)
2. Danon, L., Diaz-Guilera, A., Duch, J., Arenas, A.: Comparing community structure identification. J. Stat. Mech. P09008, 1-10 (2005)
3. Dezso, Z., Nikolsky, Y., Nikolskaya, T., Miller, J., Cherba, D., Webb, C., Bugrim, A.: Identifying disease-specific genes based on their topological significance in protein networks. BMC Systems Biology 3(36) (March 2009)
4. Girvan, M., Newman, M.E.: Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 99, 7821-7826 (2002)
5. Guimera, R., Amaral, L.A.N.: Functional cartography of complex metabolic networks. Nature 433, 895-900 (2005)
6. Guldener, U., Munsterkotter, M., Oesterheld, M., Pagel, P., Ruepp, A., Mewes, H.W., Stumpflen, V.: MPact: the MIPS protein interaction resource on yeast. Nucleic Acids Res. 34, D436-D441 (2006)
7. Han, J.D., Bertin, N., Hao, T., Goldberg, D.S., Berriz, G.F., Zhang, L.V., Dupuy, D., Walhout, A.J., Cusick, M.E., Roth, F.P., Vidal, M.: Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature 430, 88-93 (2004)
8. Menconi, G., Benci, V., Buiatti, M.: Data compression and genomes: A two-dimensional life domain map. J. Theor. Biol. 253(2), 281-288 (2008)
9. Missiuro, P.V., Liu, K., Zou, L., Ross, B.C., Zhao, G., Liu, J.S., Ge, H.: Information flow analysis of interactome networks. PLoS Computational Biology 5(4) (April 2009)
10. Nan, F., Adjeroh, D.: On complexity measures for biological sequences. In: Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference, pp. 522-526 (2004)
11. Newman, M.E.: Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Phys. Rev. E 64, 016132 (2001)
12. Newman, M.E.: The structure and function of complex networks. SIAM Rev. 45, 167-256 (2003)
13. Newman, M.E., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113 (2004)
14. Orlov, Y., Boekhorst, R., Abnizova, I.: Statistical measures of the structure of genomic sequences: entropy, complexity, and position information. J. Bioinform. Comput. Biol. 4(2), 523-536 (2006)
15. Pirhaji, L., Kargar, M., Sheari, A., Poormohammadi, H., Sadeghi, M., Pezeshk, H., Eslahchi, C.: The performances of the chi-square test and complexity measures for signal recognition in biological sequences. J. Theor. Biol. 251(2), 380-387 (2008)
16. Salwinski, L., Miller, C.S., Smith, A.J., Pettit, F.K., Bowie, J.U., Eisenberg, D.: The database of interacting proteins: 2004 update. Nucleic Acids Res. 32, D449-D451 (2004)
17. Sheari, A., Kargar, M., Katanforoush, A., Arab, S., Sadeghi, M., Pezeshk, H., Eslahchi, C., Marashi, S.-A.: A tale of two symmetrical tails: Structural and functional characteristics of palindromes in proteins. BMC Bioinformatics 9(274) (2008)
18. Troyanskaya, O.G., Arbell, O., Koren, Y., Landau, G.M., Bolshoy, A.: Sequence complexity profiles of prokaryotic genomic sequences: A fast algorithm for calculating linguistic complexity. Bioinformatics 18(5), 679-688 (2002)
19. Vinga, S., Almeida, J.S.: Renyi continuous entropy of DNA sequences. J. Theor. Biol. 231(3), 377-388 (2004)
20. Yu, H., Kim, P.M., Sprecher, E., Trifonov, V., Gerstein, M.: The importance of bottlenecks in protein networks: Correlation with gene essentiality and expression dynamics. PLoS Computational Biology 3, 713-720 (2007)
Data Fusion and Feature Selection for Alzheimer's Diagnosis

Blake Lemoine, Sara Rayburn, and Ryan Benton
Alzheimer's Disease Neuroimaging Initiative*
Abstract. The exact cause of Alzheimer's disease is unknown; thus, ascertaining what information is vital for the purpose of diagnosis, whether human or automated, is difficult. When conducting a diagnosis, one approach is to collect as much potentially relevant information as possible in the hope of capturing the important information; this is the approach adopted by the Alzheimer's Disease Neuroimaging Initiative (ADNI). ADNI collects different clinical, image-based and genetic information related to Alzheimer's disease. This study proposes a methodology for using ADNI's data. First, a series of support vector machines (SVMs) is constructed upon nine data sets: five are the results of clinical tests and the other four are features derived from positron emission tomography (PET) imagery. Next, the SVMs are fused together to determine the final clinical dementia rating of a patient: normal or abnormal. In addition, the utility of applying feature selection methods to the generated PET feature data is demonstrated.
1 Introduction

The Alzheimer's Disease Neuroimaging Initiative (ADNI) has made available a large quantity of data pertaining to Alzheimer's Disease (AD) [1]. This paper addresses the potential of this data to be used to train support vector machines (SVMs) for the purpose of automatic AD diagnosis. Support vector machines work by mapping data that may not be linearly separable into a high-dimensional space where it is separable. This mechanism explicitly requires that data be represented as a feature vector. With the data sets utilized in this study, there are two distinct challenges that must be overcome. The first difficulty is that some data sets contain an enormous number of features; the primary examples are data sets containing several thousand features and only a few hundred patients. SVMs trained on these data sets are highly likely to overfit the data. Fortunately, it is likely that some features are more relevant to the diagnosis of AD than others. In this case, a classifier should use only those features that are relevant and disregard the remaining features.
* Data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (www.loni.ucla.edu/ADNI). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in the analysis or writing of this report. A complete listing of ADNI investigators is available at http://www.loni.ucla.edu/ADNI/Collaboration/ADNI_Manuscript_Citations.pdf.
With respect to the second difficulty, there are several dozen different tests, activities, and evaluations for any given patient, and each of these data sets is very different in nature. A medical doctor uses all of these variant sources of data to form a holistic picture of a patient's condition and renders an opinion based on that picture. Ideally, an automatic classifier would also take multiple sources of information about a patient into consideration simultaneously. Considering these two challenges, it is clear that the problem of using the ADNI data for automatic classification of AD is one of information management: the task is to use as many relevant data sources as possible while ignoring as much irrelevant information within those sources as possible. Feature selection using the genetic algorithm and data fusion are the methods proposed here for addressing these challenges. This paper first presents the background of the problem and gives details of the data sets used. Next, the method for performing feature selection using the genetic algorithm is presented, followed by the method for performing data fusion. Finally, the empirical results of classification experiments using this methodology are presented.
2 Background of Problem

In [2], the investigators assumed that each patient was represented by one or two features, each of which described the metabolic activity within a region, along with a clinical dementia rating (CDR) value as described below. The CDR values were mapped into one of two categories, normal or abnormal. A threshold was assigned a priori, and each feature was then compared to the threshold. If the feature's value was greater than the threshold, the feature was considered to be evidence for dementia; otherwise, it was evidence against. If all the features supported dementia, the patient was classified as abnormal; otherwise, the patient was classified as normal. In [3], researchers report on efforts to build a classifier to distinguish between subjects with AD and frontotemporal dementia (FTD) using features extracted from PET images. Calculated z-scores associated with locations within the cortical region were partitioned into groups that best distinguished AD and FTD. For each resulting region, the z-scores were used to generate a representative "z-score" value for the region. The representative values were then used to build a decision tree to distinguish between subjects with AD and FTD. A total of 48 subjects were used, 34 diagnosed with AD and 14 diagnosed with FTD. An accuracy of 94% was reported. Results reported in [4] applied voxel-based morphometry to a set of magnetic resonance imaging (MRI) scans in order to extract 20 features from each scan. The extracted features measured the amount of grey matter found within a region. An artificial neural network was constructed to distinguish subjects diagnosed with AD from subjects diagnosed as normal condition (NC). The evaluated data consisted of 10 subjects diagnosed as AD and 12 subjects diagnosed as NC. The reported average accuracy was 100%. Finally, results related to the classification of subjects with respect to mild cognitive impairment (MCI) have been reported. Specifically, in [5] researchers applied a nonlinear multivariate analysis technique to MRI images, which resulted in classification accuracies
of 81% and 74% with respect to MCI vs. NC and AD vs. MCI, respectively. The study included 66 subjects diagnosed as normal, 88 subjects diagnosed as MCI, and 56 diagnosed as AD. In an alternative study, based on z-scores extracted from PET images, it was determined that subjects diagnosed as normal could be distinguished from subjects diagnosed as MCI with an accuracy of 92% [6]. The latter study included 110 subjects diagnosed as NC and 114 subjects diagnosed as MCI.
3 Data

In this section, we briefly describe the source of the data used in this study, explain what features were extracted from the PET imagery, provide an overview of the clinical data, and, finally, address some data cleaning issues.

3.1 Data Source

Data used in the preparation of this paper were obtained from the ADNI database (www.loni.ucla.edu/ADNI). The ADNI was launched in 2003 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies and non-profit organizations, as a $60 million, 5-year public-private partnership. The primary goal of ADNI has been to test whether serial MRI, PET, and other biological markers could be used to detect the progression of MCI and early AD. Determination of sensitive and specific markers of very early AD progression is intended to aid researchers and clinicians in developing new treatments and monitoring their effectiveness, as well as to lessen the time and cost of clinical trials. The Principal Investigator of this initiative is Michael W. Weiner, MD, VA Medical Center and University of California - San Francisco. ADNI is the result of the efforts of many co-investigators from a broad range of academic institutions and private corporations, and subjects have been recruited from over 50 sites across the U.S. and Canada. The initial goal of ADNI was to recruit 800 adults, ages 55 to 90, to participate in the research: approximately 200 cognitively normal older individuals to be followed for 3 years, 400 people with MCI to be followed for 3 years, and 200 people with early AD to be followed for 2 years. For up-to-date information, see www.adni-info.org.

3.2 PET Data

As noted, we acquired the PET imagery from ADNI. However, the learning algorithms utilized assume that the data is represented as feature vectors. Hence, we needed to convert each PET image into a usable representation. First, we converted each PET image into a set of 15,964 data points (voxels); each data point is a Z-score representing the metabolic activity at that point. The data points were computed using a GE proprietary application known as Cortex ID. The regions are based on the Talairach-Tournoux atlas. Next, as noted in [2], it is common practice to normalize each PET image set to a reference region. In this study, two different normalizations were considered. One reference region used was the Pons (PNS); in this case, each pixel value within the
PET scan was divided by the average activity in the Pons. The other normalization considered was the average Global (GLB) activity of the brain. The resulting two versions of the 15,964 points are referred to as the Full Feature (FF) data sets. In addition, 31 regional Z-score averages were also computed. Values for the left and right sides of the brain were calculated for the following regions: Parietal, Temporal, Frontal, Occipital, Posterior Cingulate, Anterior Cingulate, Medial Frontal, Medial Parietal, Sensorimotor, Visual, Caudate, Cerebellum, Vermis, and the combined Parietal, Temporal and Frontal. In addition, a single value was computed for each of the following: Pons, Global (total brain) and Cortex. The averages are treated as an additional dataset, referred to in this paper as PET 31. Thus, four data sets were created: PET FF GLB, PET 31 GLB, PET FF PNS, and PET 31 PNS.

3.3 Clinical Data

There are several dozen clinical data sets provided by ADNI. Five were chosen as representative data sets spanning lab tests, questionnaires, and doctor interviews: the ADAS-Cognitive behavior test (ADAS), the functional assessment questionnaire (FAQ), the family history questionnaire (FHQ), a homocysteine test (HCRES), and the baseline symptoms checklist (BLSCHECK). The result of each of these is a small number of "yes/no" values. These were attractive because they are easily representable as feature vectors and because feature selection is not required for these data sets, due to the small number of features in each set.

3.4 Pre-processing the Data

For the described data sets to be useful for classification, a certain amount of preprocessing was performed. The ADNI data has a number of mismatches between CDR records, which provide the correct classification of a patient, and the PET imaging data and the clinical evaluation data. In order to generate usable data, one or the other of these must be chosen as constant; CDR records serve that purpose here. Each CDR was paired with exactly one PET scan, the one nearest to it in time. It was also paired with exactly one of each clinical measure to form a single problem instance. Any problem instance that did not include one or more of the diagnostic measures was removed from the data set. Furthermore, many of the PET scans have measurements that are inconsistent with other PET scans. The feature values are Z-scores in comparison to a normal population, which sets an upper and lower bound for plausibility: a Z-score of +100 is so unlikely that it is more likely to be bad data. Any problem instance containing a PET scan with values outside a predefined acceptable range was removed from the data set. Finally, the Z-scores of PET scans were normalized to values between 0 and 1.
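A minimal sketch of this cleaning and rescaling step follows. The function name and the plausibility bounds are our own assumptions; the paper does not state its acceptable range:

    import numpy as np

    def clean_and_normalize(instances, z_min=-10.0, z_max=10.0):
        """Drop problem instances with implausible PET Z-scores, then
        rescale the surviving Z-scores to [0, 1]. The (-10, 10) range
        is an illustrative choice, not the paper's."""
        kept = [x for x in instances
                if np.all((x >= z_min) & (x <= z_max))]
        data = np.vstack(kept)
        lo, hi = data.min(), data.max()
        return (data - lo) / (hi - lo)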
4 Feature Selection

Finding an optimal set of features to use for classification purposes requires exponential time in the worst case. Evolutionary algorithms have been found to solve such problems efficiently in the past [7]. Furthermore, there is some historical basis for the
genetic algorithm being used for feature selection [8] and, as will be shown, it is useful in obtaining good results for the PET FF GLB and PET FF PNS sets. The problem of feature selection in classification systems is, informally, a matter of deciding, with respect to the classification task, what information is relevant and what information is not. In order to translate this into an algorithm, however, the problem must be formalized. The task will be to find a set of features of size n, where n is some fixed constant, that maximizes some performance function when used to perform a classification task. Alternate definitions of the feature selection problem exist [9], but the one given above is the one adopted in this paper. For the procedure advocated here, n is chosen to be 30 and the evaluation function is chosen to be the area under the classifier's receiver operating characteristic (ROC) curve (AUC) [10].

1) Generate a random gene pool.
2) Create an SVM for each chromosome; the SVM created will only use the features referenced by that chromosome.
3) Rank the SVMs by the area under the ROC curve.
4) Create a set of chromosomes that correspond to the top 10% of the SVMs as well as the chromosome that corresponds to the worst.
5) Use the resultant set of chromosomes as the parents of a new gene pool:
   a) Randomly select two chromosomes from the resultant set as parents.
   b) Perform crossover; this results in two new chromosomes.
   c) For each gene in each new chromosome, randomly change the value m% of the time.
   d) Insert the two new chromosomes into the new gene pool.
6) If the best SVM is better than the evaluation function's stopping metric then end; otherwise go to step 2.

Fig. 1. Outline of the feature selection algorithm
In adapting the genetic algorithm for this purpose, each feature corresponds to a single gene, and a chromosome is a set of n genes. The evaluation function is the AUC of the chromosome's SVM after 5-fold stratified cross validation. Crossover consists of using a portion of one parent's features and a complementary portion of the other parent's features. Mutation is incorporated by allowing features to be randomly changed to other features; the mutation rate is varied such that mutations are more common in early generations than in later generations. An outline of this algorithm is given in Figure 1. Once the gene pool contains a member that meets the given stopping criterion, the best member of that generation is returned as the best feature set. Hence, by the end of this processing, we have two new data sets: PET FS GLB and PET FS PNS.
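One way to realize the outline in Figure 1 is sketched below, assuming scikit-learn is available. This is not the authors' implementation: the pool size, the generation cap, the 0.95 decay factor and all names are our assumptions, and a fuller version would repair duplicate features produced by crossover or mutation.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    def fitness(X, y, chromosome):
        """AUC of an SVM restricted to the chromosome's features,
        estimated with 5-fold stratified cross validation."""
        cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
        return cross_val_score(SVC(kernel="rbf"), X[:, chromosome], y,
                               scoring="roc_auc", cv=cv).mean()

    def select_features(X, y, n=30, pool_size=50, target_auc=0.98,
                        max_gens=100, mutation_rate=0.10, seed=0):
        rng = np.random.default_rng(seed)
        n_features = X.shape[1]
        pool = [rng.choice(n_features, size=n, replace=False)
                for _ in range(pool_size)]
        for _ in range(max_gens):
            scored = sorted(((fitness(X, y, c), c) for c in pool),
                            key=lambda t: t[0], reverse=True)
            if scored[0][0] >= target_auc:          # stopping criterion
                return scored[0][1]
            # parents: top 10% of SVMs plus the single worst one
            parents = [c for _, c in scored[:max(2, pool_size // 10)]]
            parents.append(scored[-1][1])
            pool = []
            while len(pool) < pool_size:
                i, j = rng.choice(len(parents), size=2, replace=False)
                cut = int(rng.integers(1, n))       # one-point crossover
                for child in (np.concatenate([parents[i][:cut], parents[j][cut:]]),
                              np.concatenate([parents[j][:cut], parents[i][cut:]])):
                    mask = rng.random(n) < mutation_rate
                    child[mask] = rng.integers(0, n_features, size=int(mask.sum()))
                    pool.append(child)
            mutation_rate *= 0.95  # mutations rarer in later generations
        return scored[0][1]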
5 Data Integration

The purpose of data integration is to combine the five clinical sets (ADAS, FAQ, FHQ, HCRES, and BLSCHECK) with the four PET data sets (31 GLB, FS GLB, 31 PNS
and FS PNS). Each of the sets is individually used to train and test SVMs. In order to create RBF SVMs, a Gaussian kernel was computed for each of the above data sets. Each training set consists of one feature set associated with a patient diagnosis (CDR value). These training sets are each used to perform 10-fold stratified cross validation of a soft-margin SVM. The results for each data set are averaged across all 10 folds, yielding a single summary statistic, the AUC. This AUC is then used to apportion weights in the data integration step. The combined kernel is created by a weighted linear summation of the individual kernel matrices, with weights based on the individual ROC scores of the included data sources. A linear combination of kernels is itself a valid kernel matrix because it maintains the positive semi-definite nature of the kernel [11]. Let K be the set of n individual kernel matrices and W be the set of normalized relative performances of each kernel on the classification problem:
    K_{\mathrm{summed}} = \sum_{i=1}^{n} w_i k_i    (1)
Formula (1) is the linear combination of the individual kernel matrices. This combined kernel is then used as input to perform 10-fold stratified cross validation with an SVM classifier on the same problem.
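As an illustration of Formula (1) with scikit-learn's precomputed-kernel interface (the helper name, the weight normalization by the sum of AUCs, and the kernel-matrix names are our assumptions, not from the paper):

    import numpy as np
    from sklearn.svm import SVC

    def fuse_kernels(kernels, aucs):
        """Weighted linear combination of precomputed kernel matrices,
        Formula (1); here the AUCs are normalized to sum to one."""
        w = np.asarray(aucs, dtype=float)
        w = w / w.sum()
        return sum(wi * Ki for wi, Ki in zip(w, kernels))

    # Hypothetical usage:
    # K = fuse_kernels([K_adas, K_faq, K_pet31], [0.93, 0.92, 0.92])
    # clf = SVC(kernel="precomputed").fit(K, y)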
6 Results

After data preprocessing there are 495 clinical dementia ratings (CDRs) in the ADNI data which have related records in each of the data sets considered. A CDR can have a value of 0, 0.5, or 1+, where 0 is no dementia, 0.5 is questionable, and 1+ is dementia. These possible CDR values can be used to create a binary classification problem in which the task is to distinguish normal from abnormal patients, where a CDR of 0 is normal and a CDR of 0.5 or 1+ is considered abnormal [2]. The set of training samples contains 362 abnormal CDRs and 133 normal ones. For the purposes of evaluation, an abnormal CDR is taken as the positive class. As stated earlier, all feature selection experiments run so far with the ADNI Alzheimer's data generate chromosomes that contain 30 genes. This number was chosen somewhat arbitrarily, but fits the requirement that the number of features be much smaller than the number of examples. The stopping criterion is an SVM derived from the 30 features with a minimum AUC of 0.98. To determine how the combined data set performed, it was compared to its constituent parts: each individual data set was evaluated on the same training set and classification task, where the evaluation metric is the area under the ROC curve. After feature selection was performed on the full PET feature set, the selected feature set was evaluated for the subset of CDRs chosen for the data integration experiments. The AUC for both the global and pons PET scans was 0.94, gaining 3% AUC over the PET feature set of 31 regional averages. The other, non-imaging clinical data had AUCs ranging from 0.58 to 0.93, as shown in Table 1.
Table 1. AUC and weights for the data sets evaluated

Data set    AUC   Normalized Relative Performance (weight)
ADAS        0.93  0.16
BLSCHECK    0.62  0.11
FAQ         0.92  0.16
FHQ         0.72  0.13
HCRES       0.58  0.10
PET31 GLB   0.92  0.16
PET31 PNS   0.92  0.16
PET FS GLB  0.92  0.16
PET FS PNS  0.94  0.17
COMBINED    0.97  n/a
By combining the data sets with the weights shown in Table 1, we were able to improve the AUC of our classifier by another 3%, from 0.94 with the feature-selected PET data to 0.97 with all of the data included.
7 Conclusions and Future Work

The results of this experiment demonstrate that both feature selection by the genetic algorithm and data integration can successfully increase the accuracy of SVM classification of AD using the available data from ADNI. Future work in this area will incorporate other clinical data to potentially increase accuracy even further. Additionally, the feature selection process could be used to identify which area(s) of the brain contain the most relevant information for the purposes of PET-scan-based Alzheimer's diagnosis. Determining whether or not any such areas exist is another area of future research.
Acknowledgements

First, we would like to thank Dr. Suresh Choubey at GE Healthcare for providing the 31- and 15,964-feature PET data sets. Second, data collection and sharing for this project was funded by the ADNI (National Institutes of Health Grant U01 AG024904). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Abbott, AstraZeneca AB, Bayer Schering Pharma AG, Bristol-Myers Squibb, Eisai Global Clinical Development, Elan Corporation, Genentech, GE Healthcare, GlaxoSmithKline, Innogenetics, Johnson and Johnson, Eli Lilly and Co., Medpace, Inc., Merck and Co., Inc., Novartis AG, Pfizer Inc, F. Hoffman-La Roche, Schering-Plough, Synarc, Inc., as well as nonprofit partners the Alzheimer's Association and Alzheimer's Drug Discovery Foundation, with participation from the U.S. Food and Drug Administration. Private sector
contributions to ADNI are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of California, Los Angeles. This research was also supported by NIH grants P30 AG010129, K01 AG030514, and the Dana Foundation.
References

1. ADNI: Alzheimer's Disease Neuroimaging Initiative, http://www.adni-info.org/ (retrieved March 1, 2010)
2. Minoshima, S., Frey, K., Koeppe, R., Foster, N., Kuhl, D.: A Diagnostic Approach in Alzheimer's Disease Using Three-Dimensional Stereotactic Surface Projections of Fluorine-18-FDG PET. Journal of Nuclear Medicine 36(7), 1238-1248 (1995)
3. Sadeghi, N., Foster, N., Wang, A., Minoshima, S., Lieberman, A., Tasdizen, T.: Automatic Classification of Alzheimer's Disease vs. Frontotemporal Dementia: A Spatial Decision Tree Approach with FDG-PET. In: Proceedings of the IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pp. 408-411 (2008)
4. Huang, C., Yan, B., Jiang, H., Wang, D.: Combining voxel-based morphometry with Artificial Neural Network theory in the application research of diagnosing Alzheimer's disease. In: International Conference on BioMedical Engineering and Informatics, pp. 250-254 (2008)
5. Fan, Y., Batmanghelich, N., Clark, C., Davatzikos, C.: Spatial patterns of brain atrophy in MCI patients, identified via high-dimensional pattern classification, predict subsequent cognitive decline. NeuroImage 39(4), 1731-1743 (2008)
6. Mosconi, L., Tsui, W., Herholz, K., Pupi, A., Drzezga, A., Lucignani, G., Reiman, E., Holthoff, V., Kalbe, E., Sorbi, S., Diehl-Schmid, J., Perneczky, R., Clerici, F., Caselli, R., Beuthien-Baumann, B., Kurz, A., Minoshima, S., de Leon, M.: Multicenter Standardized 18F-FDG PET Diagnosis of Mild Cognitive Impairment, Alzheimer's Disease, and Other Dementias. The Journal of Nuclear Medicine 49(3), 390-398 (2008)
7. DeJong, K., Spears, W.: Using Genetic Algorithms to Solve NP-Complete Problems. In: Proceedings of the 3rd International Conference on Genetic Algorithms, pp. 124-132 (1989)
8. Fröhlich, H., Chapelle, O., Schölkopf, B.: Feature selection for support vector machines by means of genetic algorithms. In: 15th IEEE International Conference on Tools with AI (ICTAI 2003), pp. 142-148 (2003)
9. Weston, J., Mukherjee, S., Chapelle, O., Pontil, M., Poggio, T., Vapnik, V.: Feature Selection for SVMs. In: Advances in Neural Information Processing Systems, vol. 13 (2000)
10. Fawcett, T.: An introduction to ROC analysis. Pattern Recognition Letters 27, 861-874 (2006)
11. Lanckriet, G., De Bie, T., Cristianini, N., Jordan, M., Noble, W.: A statistical framework for genomic data fusion. Bioinformatics 20 (2004)
A Cognitive Architecture Based on Neuroscience for the Control of Virtual 3D Human Creatures

Felipe Rodríguez (1), Francisco Galvan (1), Félix Ramos (1), Erick Castellanos (1), Gregorio García (2), and Pablo Covarrubias (3)

(1) Cinvestav Guadalajara, Av. Científica 1145, Col. El Bajío, Zapopan 45010, Jalisco, México
{lrodrigue,fgalvan,framos,ecastella}@gdl.cinvestav.mx
(2) Instituto de Neurociencias, Francisco de Quevedo 180, Col. Arcos Vallarta, Guadalajara 44130, Jalisco, México
[email protected]
(3) Universidad del Valle de México, campus Zapopan, Periférico Poniente 7900, Col. Jardines del Colli, Zapopan 45010, Jalisco, México
[email protected]
Abstract. The creation of a virtual 3D creature requires an underlying structure that provides it with some desired capabilities. One of our main research objectives is to create a virtual 3D creature that resembles human behavior in a real environment. In this paper, we propose a cognitive architecture inspired by recent findings in neuroscience, which will serve as the underlying structure for implementing virtual 3D creatures with human-like capabilities. Such virtual creatures will be useful for studying human behavior in real environments by means of simulations.
1 Introduction
Many virtual environments try to simulate our world; accordingly, the virtual creatures that inhabit them should possess a set of capabilities similar to those of humans. We propose a cognitive architecture that constitutes the underlying structure of a virtual 3D creature simulating human-like behavior. The paper is structured as follows: Section 2 describes the motivations that led us to choose the neuroscience approach; Section 3 states the set of desirable abilities; Section 4 presents the proposed architecture as a whole; Section 5 shows how each desirable ability works in the architecture; Section 6 explains how the results of a neuroscientific experiment relate to our proposal; Section 7 gives conclusions.
2 The Cognitive Architecture and the Neuroscience
Although there is a diversity of implemented cognitive process models and simulations of brain structures (which are focused on specialized functions) [1], our main intention is to establish a fully designed cognitive architecture and to build specialized processes based on this design. This paradigm avoids difficulties in
the integration phase presented in the unified theory of cognition [2]. The development of the architecture proposed in this paper is in accordance with the idea of a unified theory of cognition that Newell argued for in [3]. We argue that since each of our components works according to a specific neuroscience theory, the resulting behavior should resemble that of humans. This idea follows Newell's description of how to build a unified theory of cognition grounded in our existing understanding of cognition. We claim that our approach is feasible, since contemporary neuroscience research has explained some of the cognitive processes, as well as their correlated brain structures, more accurately [4], [5]. Stressing the benefits of using neuroscience for the construction of a cognitive architecture, we identify at least two important advantages. First, the computational cognitive model may help to integrate separate theories about cognitive processes, leading to unified explanations of cognition and covering the gap between isolated findings in neuroscience by unifying them in wider theories [6]. In addition, by using integrated computational models of cognitive processes, neuroscientists may achieve clearer explanations of those processes. On the other hand, related to artificial intelligence and to our main goal, with this approach we are searching for more realistic human behavior in virtual creatures (by means of a set of abilities). In fact, this approach allows the architecture to consider most of the capabilities and properties proposed in [7] and [8]. While our approach is based on neuroscience, there are other cognitive architectures grounded on different theories. Two of the most important architectures are ACT-R [9] and Soar [10]. Although ACT-R is a psychologically grounded architecture, its developers have lately mapped some of its modules to parts of the human brain from a functional perspective [11]. However, ACT-R was not conceived from the start with a neuroscience-based design, so that mapping has not been entirely transparent. Soar has lately been extended with capabilities its developers see granted in humans, such as reinforcement learning, an appraisal detector, semantic memory, and episodic memory, among others. Although, as they state, all of the new components have been built, integrated, and run individually with the traditional Soar components, there is not a single unified system that has all the components running at once [12]. Accordingly, in order to prevent limitations that have arisen during the development of similar projects, our proposal is fully conceived on neuroscience findings. Thus, the main objective of this paper is to present a cognitive software architecture based on the description of the brain provided by neuroscience and complemented with knowledge from computer science, useful for providing virtual creatures with human-like behavior. If the behavior is based on abilities, an important question is: what are the sufficient or desirable abilities to achieve similar human behavior?
3 Abilities for a Virtual 3D Human Creature
Currently, an area of neuroscience has focused on the study of the executive functions. Those functions can be divided into "metacognitive" and
"emotional-motivational" executive functions; the first includes key abilities to achieve and pursue goals, while the second is responsible for coordinating cognition and emotion [13]. In order to approximate human behavior in virtual creatures, the architecture provides some abilities that are based on this perspective. In addition, the perception process and motor response are mediated by the interaction of those executive functions (described below), as well as learning and memory processes. The abilities are the following:

Perceptual function: (1) Perception: the process of receiving and interpreting the different external and internal stimuli. This gives the creature an internal representation of the world and of itself. Subsequently, the creature can evaluate the situation and try to shape the environment and itself in order to achieve its goals.

Cognitive abilities: (2) Learning: stable changes in the mechanisms of behavior. (3) Memory: the ability to store, retain and recall knowledge.

Emotional function: (4) Emotions: the ability to encode emotional stimuli and to influence a set of cognitive processes with the emotional nuance extracted from the perceived stimuli.

Metacognitive functions: (5) Planning: the ability to create a sequence of possible actions that will lead to an expected result. Within this ability lies the ability to "imagine" and predict the results of applying actions. (6) Deliberation: the process of selecting one among a set of possible actions. (7) Cognitive flexibility: the ability to spontaneously restructure one's own knowledge in an adaptive way in response to changing environmental demands.

Motor function: (8) Motor action: the ability to successfully control the movements of each movable body part. As a result of the modification of its body, the environment changes.

The architecture should support this set of processes. The next section depicts the design of the proposed architecture and describes its constituent parts.
4 The Cognitive Architecture Design
The components of the architecture and their proposed functions are directly related to brain components and processes [5]. We now describe the characteristics of each module and its role in the architecture (see Figure 1).

1. Set of Sensory Systems: this module captures the environment status and sends the information to the Thalamus. Due to the primitive nature of the olfactory system, this sense sends its information directly to the Olfactory Cortex, after a filter is applied by the Olfactory Bulb.
2. (α) Olfactory Bulb: this module is a first filter for olfactory information, which is then sent to the Association Cortex through the Hippocampus.
3. Thalamus: this is the first processing phase for the data received from the sensors. It consists of four modules:
   (a) (β) Lateral Geniculate Nucleus: receives information from the vision sensor and sends the selected information to the Visual Cortex.
   (b) (γ) Medial Geniculate Nucleus: sends the selected auditory information to the Auditory Cortex.
   (c) (δ) Ventrobasal: filters tactile sensory signals before sending them to the Somatosensory Cortex.
   (d) (ε) Ventral Posterior Medial Nucleus: taste information is put together here; only a selected amount of it is sent to the Gustatory Cortex.
   Submodules of the Thalamus are all interconnected and share information.
4. Sensory Cortex: this set of modules is in charge of giving an interpretation to the data received from the Set of Sensory Systems.
   (a) (ζ) Visual Cortex: incoming data is visually interpreted with knowledge provided by the Hippocampus and sent to the Association Cortex.
   (b) (η) Gustatory Cortex: taste information is interpreted using the information provided by the Hippocampus and sent to the Association Cortex.
   (c) (θ) Somatosensory Cortex: somatic data is transformed using data provided by the Hippocampus and sent to the Association Cortex.
   (d) (ι) Auditory Cortex: interpretation of auditory data is done using Hippocampus information, and the result is sent to the Association Cortex.
   (e) (κ) Olfactory Cortex: olfactory data is interpreted using the information provided by the Hippocampus and sent to the Association Cortex.
   (f) (λ) Association Cortex: this module puts together current and past sensory interpretations and associations of the objects in the environment. It is connected with the Hippocampus to get past information and to return the deduced information; it also has connections with the Olfactory, Gustatory, Visual, Somatosensory and Auditory Cortices.
5. Limbic System: two of its important functions are related to emotions and long-term memories.
   (a) (μ) Hippocampus: this module creates a context from all the information gathered. It manages the storage and recall of memories from the cortex. At the signal of the Amygdala, it stores all the information received in the recent past, present and near future, and creates a temporal relationship among those pieces of information.
   (b) (ν) Amygdala: here the information related to context and current state is received from the Sensory Cortex, through the Hippocampus, and from the Thalamus. This information is used to organize a set of emotional reactions. Those reactions take effect on the Thalamus, affecting perception; on the Hippocampus, instructing when to keep knowledge in long-term memory and affecting context creation; and on the Orbitofrontal Cortex, modifying the appraisal level of the information gathered.
6. Prefrontal Cortex: coordinates the temporal organization of actions.
   (a) (ξ) Orbitofrontal Cortex: this module evaluates the affective information of perceived stimuli. It mainly receives information from the Amygdala and projects to the Ventromedial Prefrontal Cortex.
   (b) (ο) Dorsolateral Prefrontal Cortex: it is related to motor planning behavior, stores the current goal and integrates the information of long-term memory and sensory input; the main objective is to create
332
F. Rodr´ıguez et al.
plans to achieve the current goal. It communicates with the Ventromedial Prefrontal Cortex when a plan is created and decisions must be made. When the next action is decided, the order is sent to the Basal Ganglia which in turn regulate (via feedback) the action previously decided. (c) (π) Ventromedial Prefrontal Cortex : receives perceived information from the Hippocampus, emotional appraisal information from Orbitofrontal Cortex and objective information from the Dorsolateral Prefrontal Cortex. With this information, chooses between possible actions to achieve the goal. When one action does not lead to the goal, the information is redirected to Dorsolateral Prefrontal Cortex to form a plan. 7. Motor System: here, the instructions given by the Prefrontal Cortex are translated into body movement attempts. (a) (ρ) Basal Ganglia: this module selects the possible muscles of the body to achieve the action sent by the Dorsolateral Prefrontal Cortex. (b) (σ) Motor Cortex : once an action is received from Basal Ganglia, this module makes the needed calculations to control body and, therefore, complete the action. Olfactory Bulb Lateral Geniculate Nucleus
Fig. 1. The Cognitive architecture design
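To summarize the wiring just described, the information routes can be read as a directed graph. The following Python sketch is purely an illustrative reading aid; the module names and edge list are distilled from the descriptions above, not taken from an implementation by the authors:

```python
# Directed information flow between the modules described above.
# Hypothetical reading aid only; names are ours, edges follow the text.
CONNECTIONS = {
    "SensorySystem":       ["Thalamus", "OlfactoryBulb"],
    "OlfactoryBulb":       ["OlfactoryCortex"],
    "Thalamus":            ["VisualCortex", "AuditoryCortex",
                            "SomatosensoryCortex", "GustatoryCortex", "Amygdala"],
    "VisualCortex":        ["AssociationCortex"],
    "AuditoryCortex":      ["AssociationCortex"],
    "SomatosensoryCortex": ["AssociationCortex"],
    "GustatoryCortex":     ["AssociationCortex"],
    "OlfactoryCortex":     ["AssociationCortex"],
    "AssociationCortex":   ["Hippocampus"],
    "Hippocampus":         ["Amygdala", "DorsolateralPFC", "VentromedialPFC"],
    "Amygdala":            ["Thalamus", "Hippocampus", "OrbitofrontalCortex"],
    "OrbitofrontalCortex": ["VentromedialPFC"],
    "DorsolateralPFC":     ["VentromedialPFC", "BasalGanglia"],
    "VentromedialPFC":     ["DorsolateralPFC"],
    "BasalGanglia":        ["MotorCortex"],
}

def downstream(module: str) -> list:
    """Modules that directly receive information from `module`."""
    return CONNECTIONS.get(module, [])
```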
Although we have explained the architecture design, it remains to describe how each ability arises from the interaction of some of those modules.
5 Abilities for Virtual 3D Creatures and the Architecture
Here we explain how each of those modules interacts to allow the virtual 3D creature to show the desired abilities [5]. See the diagrams in figure 2.
1. Perception: this ability is granted by the following information cycle:
(a) Environment information gathering is done by the Sensory System (1). The olfactory sensor, after applying a filter to the data with the Olfactory Bulb, sends its data directly to the Olfactory Cortex (2). The rest of the sensors send their data to the Thalamus (3).
(b) After the Thalamus finishes filtering the data, the data is sent to the Sensory Cortex and to the Amygdala to affect the emotional status (4).
(c) Data interpretation and association are done in the Sensory Cortex. When done, it sends this information to the Hippocampus (5).
(d) The Hippocampus helps to recall knowledge stored in memory to interpret the data received at the Sensory Cortex (6). It creates a context using memory and the emotional information sent by the Amygdala (7). Also, all context and deduced information is stored in memory and sent to the Amygdala to update the emotional state (8).
2. Learning: for learning to exist in this architecture, there must be a discrepancy between actual and predicted rewards in the environment. That means that learning will occur if a stimulus is paired with an unexpected reward.
3. Memory: when the Amygdala senses a high emotional level, a temporal window is opened (2). While this window is open, all information previously passed (1), currently at, and passed from that moment on to the Hippocampus (3) will be temporally related and stored in long-term memory, located at the Association Cortex, for other modules to use (4).
4. Emotions: emotions occur in the architecture as follows:
(a) The Amygdala receives fast (1) and highly processed sensory information from the Thalamus and Hippocampus (1.5), respectively.
(b) The Amygdala sends processed information to the Thalamus, Orbitofrontal Cortex and Hippocampus to produce an emotional reaction (2).
5. Planning and Decision Making:
(a) At the Dorsolateral Prefrontal Cortex, a plan is built using the current state sent by the Sensory Cortex (1) and the goal and data provided by the Hippocampus (2).
(b) The Orbitofrontal Cortex receives emotional information (1.5), and an emotional appraisal level is assigned to the knowledge stored at the Hippocampus (2).
(c) The Ventromedial Prefrontal Cortex receives the plan and the appraisal level (3, 3.5). Using the raw knowledge and its appraisal level (4), the plan is trimmed and sent back to the Dorsolateral Prefrontal Cortex (5).
(d) The plan can be refined, extended or returned to the Ventromedial Prefrontal Cortex. When the immediate next action is decided, the action is sent to the Basal Ganglia to be executed (6).
6. Motor action:
(a) The Dorsolateral Prefrontal Cortex gives the next action to execute to the Basal Ganglia (1).
(b) The Motor Cortex receives the information of the current state together with the set of muscles to move and the intended action (2). The action is passed to the muscles and executed.

Fig. 2. Diagram of the different processes in the architecture

6 Neurofunctional Data Over the Architecture
To theoretically validate the system's functionality, we make use of the delayed match-to-sample task: we follow the brain activations that emerge during this task and compare them with the information flow in the proposed architecture (see figure 3). During the training phase, a sample stimulus is presented to the subject (e.g., a green light), and then a pair of stimuli to be compared is presented (e.g., a green and a red light together). The subject's choice is rewarded if it correctly matches the sample stimulus. Once the subject identifies the relation between the sample and target stimuli, a delay is introduced between the stimuli presentation and the subject's choice. Next, we describe the brain activations together with the architectural workflow. In a matching task there is activation of subcortical areas, including the sensory system along with the Thalamus (1); Amygdala functioning is then activated, projecting to the Orbitofrontal Cortex and Hippocampus (2); at the same time, activation extends to the Visual Cortex and follows a ventral path through associative extrastriate areas of the occipital lobe and limbic temporal sites near the medial temporal convolution (3). At this point, activation spreads in a corticolimbic interaction involving the activation of the Ventromedial Prefrontal Cortex (4). Then there is a feedback interaction from the prefrontal cortices to the limbic areas. Finally, the interaction between the Ventromedial and Dorsolateral Prefrontal Cortex produces a direct pass of information to the motor system (5).

Fig. 3. Functioning and module interconnection based on the neuroscience paradigm
7 Conclusions
We use the neuroscience approach and take the executive brain paradigm to orchestrate a functioning view of the brain. According to this perspective, we depict the design of the cognitive architecture; subsequently, we show how each ability arises from the functional activity of the various modules and their coupled interactions. In order to explain the expected activations of the architecture, we use a neuroscience paradigm, which allows us to compare data from a real experiment in humans with the mentioned activations. This is a work in progress whose next stage is the implementation of each process. The final objective is to integrate them following the architecture design. Acknowledgments. This research is partially supported by CoECyT-Jal Project No. 2008-05-97094, whilst authors Felipe Rodríguez, Francisco Galvan and Erick Castellanos are supported by CONACYT grants No. 229386, 219078 and 219074, respectively.
References
1. Morton, J.B., Munakata, Y.: Active vs. latent representations: a neural network model of perseveration, dissociation, and decalage. Developmental Psychobiology 40, 255–265 (2002)
2. Langley, P.: Cognitive architectures and general intelligent systems. AI Mag. 27(2), 33–44 (2006)
3. Newell, A.: Unified Theories of Cognition. Harvard University Press, Boston (1990)
4. Carter, R.: Mapping the Mind. University of California Press, California (2000)
5. Kandel, E.: Principles of Neural Science, 4th edn. McGraw-Hill, New York (2000)
6. Thelen, E., Schöner, G., Scheier, C., Smith, L.B.: The dynamics of embodiment: a field theory of infant perseverative reaching. Behavioral and Brain Sciences 24, 1–86 (2001)
7. Langley, P., Laird, J., Rogers, S.: Cognitive architectures: research issues and challenges. Cognitive Systems Research 10(2), 141–160 (2009)
8. Sun, R.: Desiderata for cognitive architectures. Philosophical Psychology 17(3), 341–373 (2004)
9. Anderson, J.R.: Rules of the Mind. Lawrence Erlbaum Associates, Mahwah (1993)
10. Laird, J.E., Newell, A., Rosenbloom, P.S.: SOAR: an architecture for general intelligence. Artif. Intell. 33(1) (1987)
11. Anderson, J.R., Bothell, D., Byrne, M.D., Douglass, S., Lebiere, C., Qin, Y.: An integrated theory of the mind. Psychological Review 111(4), 1036–1060 (2004)
12. Laird, J.E.: Extending the Soar cognitive architecture. In: Proceedings of the 2008 Conference on Artificial General Intelligence, vol. 171, pp. 224–235. IOS Press, Amsterdam (2008)
13. Ardila, A.: On the evolutionary origins of executive functions. Brain and Cognition 68(1), 92–99 (2008)
Towards Inexpensive BCI Control for Wheelchair Navigation in the Enabled Environment – A Hardware Survey
Kenyon Stamps¹ and Yskandar Hamam¹,²
¹ French-South African Institute of Technology, Tshwane University of Technology, P/Bag X680, Pretoria 0001, South Africa
² ESIEE-Paris, Paris-Est University, Cité Descartes, BP99, Noisy-le-Grand, 93162 France
[email protected], [email protected]
Abstract. This study aims to support further research into the development of practical and inexpensive non-invasive brain-computer interface systems for the control of prosthetic devices, especially electric wheelchairs. With motivation from the literature, the steady state visual evoked potential is argued to be the appropriate neurological mechanism for a proposed modular BCI system. Selected papers on surveys of BCI research and on BCI designs are mentioned. Available acquisition hardware for BCI interfaces, with particular attention to non-invasive electroencephalogram (EEG) acquisition, is presented together with a selection of articles reporting its use. In conclusion, some suggestions for further study towards practical BCI systems are made.
1 Introduction
Brain-computer interface (BCI) research, particularly non-invasive electroencephalogram-based BCI research, has increased immensely in the past ten years, and knowledge in this multi-disciplinary field is vast yet far from completely explored. Most of the research related to control-centered BCIs is done on assistive technologies, enabling severely handicapped patients to interact with their environment. Ironically, the only BCI systems commercially available that are affordable to the average-income person target the media and PC gaming industry. The cost, power and processing capabilities of embedded processors and digital signal processors, and the capabilities of brain-computer interfaces, are ever improving, to the point that some feature extraction and classification algorithms for BCIs can be well supported on relatively low-cost embedded systems. This has been proved by NeuroSky and Emotiv (two examples among others; another is the NIA from OCZ Technologies [3]) with their now commercially available BCI headsets, called Mindset [1] and Epoc [2], which detect and classify mental activity for media and entertainment applications. Despite all this, there does not yet seem to exist a complete and affordable BCI system suitable for wheelchair control. The steady state visual evoked potential (SSVEP) based BCI is one approach which, until the past two years, seems to have been a relatively neglected area in BCI research, as suggested by the results in Bashashati et al. [4]. A possible reason could be the sometimes annoying and tiring visual stimulus of the SSVEP approach, as mentioned in Garcia [5] and Wang et al. [6], and possibly also its less expensive alternatives, such as the eye-movement tracker. Until recently, the cost of implementing BCI systems (including the reduced-price professional BCI systems for control commonly used in research) made them useful mainly to those patients with a severe handicap, such as locked-in syndrome, who were involved in the research. In this paper, a proposed modular BCI system for prosthetic control is presented, with an introduction of papers reviewing different aspects of a BCI system, a survey of available low-cost EEG acquisition systems, and selected examples of their use in BCI-related research. This paper looks at the possibility of developing a BCI system which can be implemented on a wheelchair under standard day-to-day constraints such as limited electrical energy, space, weight capacity and cost, and the need for fast decisions. These constraints almost completely rule out sensor equipment such as invasive EEG, MEG and fMRI, which are either expensive or bulky, leaving two options to investigate: non-invasive EEG and near-infrared spectroscopy. For the purpose of this paper we will not go into detail about near-infrared spectroscopy. Working on the financial cost of the BCI system, as it also influences the size and computational demands, we determined that one of the major cost areas of a conventional BCI system is the computation platform, which is usually either a personal computer (PC) or an FPGA-based embedded system. In the next section we give reasons why the steady state visual evoked potential is the best mechanism to use for a low-cost BCI system.
2 Methodology
A BCI system can generally be illustrated by three or four subsystems. Based on the general framework for a BCI system by Mason and Birch [7], figure 1 describes how we have divided our BCI system into four main modular hardware subsystems. As the functionality of the different hardware sections may differ from application to application, the modular nature of our system enables us to reuse parts of the system for other BCIs without having to redesign the entire BCI system; this is in line with the proposal of Quitadamo et al. [8] for a UML description of BCI systems. The modular BCI system has an 'acquisition' subsystem, which comprises the EEG amplifiers and analogue-to-digital converters, and a 'BCI transducer', which contains the artifact processor (beyond those included in the acquisition stage), the feature extractor, and the feature classifier or translator. Following the transducer, the control interface drives the visual feedback and stimulus and communicates commands, based on information from the transducer, to the device controller. The device controller operates the environment-manipulation device, which in our case is the wheelchair motors.

Fig. 1. Functional model of a modular BCI hardware system, based on Mason and Birch's functional model of a BCI system [9]
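As a sketch of how the four modular subsystems compose, the following minimal Python interfaces illustrate the pipeline of figure 1. All names and signatures are hypothetical illustrations of the division of responsibilities, not APIs from this paper:

```python
from typing import Protocol, Sequence

class Acquisition(Protocol):
    def read(self) -> Sequence[float]: ...        # EEG amplifier + ADC samples

class Transducer(Protocol):
    def translate(self, eeg: Sequence[float]) -> str: ...  # artifact handling,
                                                           # feature extraction,
                                                           # classification

class ControlInterface(Protocol):
    def map_symbol(self, symbol: str) -> str: ...  # also drives feedback/stimulus

class DeviceController(Protocol):
    def execute(self, command: str) -> None: ...   # e.g. wheelchair motor driver

def bci_step(acq: Acquisition, tr: Transducer,
             ci: ControlInterface, dc: DeviceController) -> None:
    """One pass through the four-stage pipeline of figure 1."""
    dc.execute(ci.map_symbol(tr.translate(acq.read())))
```

Because each stage hides behind an interface, any one subsystem (say, the acquisition hardware) can be swapped without touching the others, which is the point of the modular design.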
2.1 Selection of Neurological Mechanism
The neurological mechanism of a BCI refers to the underlying brain functions or characteristics which are used in that BCI system; examples are mu-rhythms, slow cortical potentials (SCP), event-related potentials (ERPs), the P300, and the steady state visual evoked potential (SSVEP). With regard to neurological mechanisms, the SSVEP-based BCI is the one of choice, for the reasons illustrated hereafter. P300 and SSVEP-based BCI systems differ from the others in that they do not read the user's thoughts directly, but rather detect the brain's reaction to a changing external visual stimulus on which the user focuses attention [10]. The SSVEP is characterised by a neurological synchrony, or resonance, in the occipital region of the brain, elicited by focused attention on a continuous flicker at a sinusoidal frequency between 3.5 Hz and 75 Hz. Therefore, unlike with other neurological mechanisms, the power spectrum at the attended flicker frequency is easily extracted from the background EEG noise [10], so complex feature extraction algorithms are not needed. Suitable for wheelchair control, the SSVEP-based BCI has the potential for a high information transfer rate (ITR) of over 60 bits per minute (43 bits/min in Wang et al. [6]), compared with other BCIs having ITRs of less than 30 bits per minute [5].
2.2 Selection of Hardware for the BCI Subsystems
Wu et al. [11] showed that LED-flicker-evoked SSVEP has a larger frequency amplitude than CRT- and LCD-evoked SSVEPs, especially at the lower stimulus frequencies. This suggests that, in place of the feedback subsystem of a BCI (figure 1), the visual stimulus in an SSVEP-based BCI system can be implemented using an embedded LED stimulus system, adding to its PC-independence.
For the BCI transducer (the feature extraction and classification section), a low-cost embedded system can be used. A basic SSVEP-based system needs very little pre-processing, as the features of interest usually stand above the noise. An FFT of the EEG signal, together with a simple statistical analysis of the FFT, can be implemented on a microprocessor or a low-end DSP; a minimal sketch of this follows below. To keep the modular flexibility of the proposed BCI system, the control interface can be implemented using a low-end microprocessor. What remains is the acquisition section of the BCI system for the electric wheelchair. Good-quality instrumentation amplifiers, and thus the EEG amplifier unit of the BCI signal acquisition section, are necessary to reliably amplify the very weak electrical signals from the brain. These signals are measured from the surface of the BCI user's scalp, so care must be taken to avoid compromising the user's safety in terms of electrical isolation whilst maintaining signal integrity. In most non-invasive EEG systems this is done by using instrumentation amplifiers, which are expensive; a typical example is found in the modularEEG amplifier [13]. Opto-isolators are used in a wired BCI system (one in which the acquisition section is wire-connected to the transducer, control interface and device controller sections, as opposed to wireless BCI systems) for added electrical isolation between sections of the BCI system. A number of papers, including Beverina et al. [10] and Wang et al. [6], indicate that an SSVEP-based BCI system needs only two EEG acquisition channels, and possibly only one. This further reduces the cost of our BCI system, as less hardware is needed in the amplifier and signal acquisition stage. In the remainder of this paper we describe a search and short study of the available acquisition hardware, with examples of research literature using that hardware in BCIs.
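To illustrate how light such a transducer's processing can be, here is a minimal sketch of SSVEP detection by comparing FFT power at the candidate flicker frequencies. The function, its parameters and the example frequencies are our own assumptions, not taken from the paper:

```python
import numpy as np

def classify_ssvep(eeg, fs, stim_freqs, band=0.5):
    """Pick the attended flicker frequency from one occipital EEG channel.

    eeg        : 1-D array of raw samples
    fs         : sampling rate in Hz
    stim_freqs : candidate stimulus frequencies (one per wheelchair command)
    band       : half-width (Hz) of the window summed around each frequency
    """
    spectrum = np.abs(np.fft.rfft(eeg - eeg.mean())) ** 2   # power spectrum
    freqs = np.fft.rfftfreq(len(eeg), d=1.0 / fs)
    powers = []
    for f in stim_freqs:
        mask = (freqs >= f - band) & (freqs <= f + band)
        powers.append(spectrum[mask].sum())
    return stim_freqs[int(np.argmax(powers))], powers

# e.g. four flicker frequencies mapped to forward/back/left/right:
# freq, p = classify_ssvep(samples, fs=256, stim_freqs=[8.0, 10.0, 12.0, 15.0])
```

An argmax over a handful of summed spectral bands is exactly the kind of simple statistical analysis that fits on a low-end DSP or microprocessor.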
2.3 Gathering and Organisation of Information
To find out what acquisition devices for BCI systems are readily available and advertised on the world wide web, we carried out an extensive search for non-invasive electroencephalogram (EEG) acquisition hardware. From the results of various searches, we gave the most attention to the devices of a manufacturer's range (e.g., the g.tec range) with the fewest EEG channels, and selected only results showing EEG acquisition systems which support on-line use. We also selectively eliminated results for devices not directly usable in a BCI as a machine interface for control or command (such as those for psychological research, brain dynamics, etc.). The devices found were sorted according to cost, and those above US$1000 were filtered out (US$1000 is an arbitrary figure chosen to ensure the inclusion of several systems while restricting cost to a minimum). The top results from the search were then ordered into price ranges and tabulated; table 2 and table 3 are abridged versions of these. A search was also done for survey and review articles relating to BCI system and hardware design; the results of this search are briefly presented in table 1. The information gathered was organized into tables in three groups, namely (1) survey and review articles, (2) information on available EEG-based acquisition hardware and acquisition hardware designs for BCI systems, and (3) selected papers describing work implementing some of the acquisition hardware.
3 Results

3.1 Review Papers
The literature review and survey articles relating to BCI development listed in table 1 are organised into three groups, which give general descriptions of the articles' contents.

Table 1. Current reviews and surveys

Group                                  Review Content                                               References
Software related reviews and surveys   Survey of classification algorithms and comparative          [4] [14] [12]
                                       analysis of classification algorithms.
Hardware related reviews and surveys   Survey of hardware systems and illustration of hardware      [7] [9] [15]
                                       systems used in BCI research.
General BCI reviews                    Review and assessments of BCI research, BCI design and       [16] [17] [18]
                                       research meetings.

3.2 BCI Hardware
Our on-line search for non-invasive EEG hardware and open-source schematics turned up 47 easily obtainable devices. Of the open-source devices we discovered 8, most of which are based upon or inspired by the OpenEEG project's modularEEG [13]. Table 2 lists all the non-invasive BCI-related EEG acquisition hardware under US$1000, sorted into three groups: those under US$200, those between $200 and $500, and those between $500 and $1000. As illustrated in table 2, there exist very few low-cost EEG acquisition devices applicable to a broad range of implementable BCI systems.

Table 2. The cheapest EEG acquisition hardware for on-line BCI systems

Price Range (US$)   Device                         Data contents from device     Interface
under $200          ModularEEG                     Raw EEG                       RS232
                    Neurosky Mindset               Raw EEG/Classification Data   Bluetooth
$200 - $500         OCZ Tech. NIA                  Classification Data           USB
                    Emotiv Epoc                    Raw EEG/Classification Data   Wireless
$500 - $1000        Neurobit Lite                  Classification Data           Wireless IrDA
                    Pocket Neurobics Pendant EEG   Raw EEG                       Wireless
                    Teunis van Beelen's openEEG    Raw EEG                       RS232
                    Psychlab EEG1                  Raw EEG                       USB

To give a tentative idea of the usefulness of each device for research and development, we devised a feature score sheet and used specifications and available information to give each device a total score. The characteristics mapped onto a 0, 1 or 2 score (meaning low or none, limited or adaptable, and high feature richness) include the number of EEG channels available, the ADC resolution, and support for multiple neurological mechanisms. Characteristics mapped onto a true-or-false score (1 or 0, present or absent) include ADC and CMRR hardware design flexibility, sample-rate flexibility above 128 sps, embedded-system integrability, commercial producibility, ease of electrode placement by the user, portability, and support for dry or hydrated electrodes. This gives a total possible score of 13 for each device. Figure 2 shows the calculated scores for each device normalised to 10; the Emotiv Epoc showed the highest score under this particular rating system. The authors would like to point out that although these scores are intended to give some idea of each device's usefulness to research, the scores have little meaning in themselves, as different applications would require different device specifications.

Fig. 2. A tentative scale for the usability score of 8 EEG acquisition devices

Sectioned into three categories, the following information is a summary of a table of extensive advantages and disadvantages of each device. (The authors may be contacted for a more detailed copy of the feature score sheet and the advantages and disadvantages table.)

Commercial, Specialised Hardware covers devices which provide little or no target-application flexibility. We placed the OCZ NIA and the Neurobit Lite in this category. The NIA is designed for PC use, specifically as a user input device for computer gaming; it uses combinations of classifications of EMG, EOG and EEG signals to give input symbols to a PC application. The Neurobit Lite is designed specifically for biofeedback, but it may be useful in some BCI devices for prosthetic control, based on product specifications indicating that raw EEG data, or other BCI-useful data possibly comparable to that used in BCI research, is available.

Open Source Prototype Hardware covers open-design BCI systems which may have to be assembled in-lab. The modularEEG was developed to provide a low-level, low-cost open-source platform. It has been used in a number of research activities around the world and is a very good platform for introducing and prototyping BCIs. Teunis van Beelen [19] [20] developed a system, based on the modularEEG, with an attractive 12 EEG input channels and a high 22-bit ADC resolution. Table 3 presents other open-source EEG systems. The MonolithEEG is simply a design improvement on the modularEEG. The OpenEXG-2 is a 2-channel system with a better resolution and CMRR than the modularEEG. The SoundcardEEG (scEEG) has various designs which utilize a PC sound card to receive and digitise a multi-channel analogue input; the ADC resolution, CMRR and LSB resolution depend on the quality of the sound card used [21].

Table 3. Other, open source EEG hardware systems

Device                  Hardware Description       Data contents from device   Interface
MonolithEEG             Improved ModularEEG        raw EEG                     USB
Soundcard EEG (ScEEG)   AM/FM to PC Soundcard      raw EEG                     PC Soundcard
(various designs)
OpenEXG-2               ModularEEG alternative     raw EEG                     USB

Commercial, Flexible Hardware covers devices which are wholly or partly adaptable for use in various BCI systems. These devices are of the most interest for the development of a practical, inexpensive BCI system, specifically for the control of prosthetics such as an electric wheelchair: they are commercially available, and their target application is fully or partially adaptable. The Psychlab EEG and Psychlab EEG2 are the most bulky yet most application-flexible in this category, and provide only one or two channels, with wired electrodes and raw EEG data at up to 4.7k samples per second (sps). The Pendant EEG is portable and uses wireless communication to transmit raw EEG data at 256 sps to a target PC; it is designed for, but not limited to, biofeedback and uses wired electrodes. The remaining devices are the Emotiv Epoc and the NeuroSky Mindset. Both are headsets with the EEG acquisition and transducer sections (see figure 1) integrated into the headset, which send preprocessed raw EEG and classified EEG data wirelessly to a target device (PC or embedded), and both provide PC SDK software and support third-party API development, unrestricted in programming language. The Mindset is designed for media applications and includes audio and voice support as extra functionality. Unlike the Mindset, the Epoc's 14 channels give it an unmatched versatility in terms of neurological mechanism and target application; it does, however, provide a maximum of 128 sps per channel over its wireless data link. Emotiv and NeuroSky participate in and encourage collaboration with research institutions and application development companies to add to and improve their products. BCInet, the owner of OCZ Technologies' NIA, also encourages, to a lesser extent, development input and research into assistive devices.

3.3 Related Papers in Which Hardware Is Used

Table 4 lists some research papers which specifically stated the use of hardware mentioned in table 2 and table 3. The table states the target application in which the BCI system was used and the main feature extraction and classification computation platform. The table also includes two papers specifically centered on the development of low-cost BCI systems.

Table 4. Selected research papers incorporating EEG-based devices in BCI systems

Device                     BCI Application                                    Mechanism   Platform
Pendant EEG                EEG signals for music composition [22]             Mu-rhythm   PC
Neurosky Mindset           Design of new gameplay elements in PC games [23]   Mu-rhythm   Embedded
Lab-developed wireless     High-temporal-resolution monitoring of brain       SCPs        Embedded
wearable BCI               dynamics [24]
Based on modularEEG        Low-cost real-time wheelchair steering [25]        SSVEP       PC/dsPIC
Other devices              Inexpensive BCI system for off-line research       ERP         PC
(Deymed Truscan32)         studies [26]

4 Conclusion

In the Methods section we reasoned as to why SSVEP-based BCI systems are the most practical and least resource-expensive approach for a BCI used as a user interface to control an electric-powered wheelchair. In table 1 we presented several review papers on signal processing algorithms and BCI techniques, and articles reviewing the BCI hardware systems used in research. We established that there do exist commercially available EEG acquisition devices for BCI costing less than US$1000, and listed them in table 2 together with the non-commercial devices under US$1000. Few research papers on BCI directly mention the use of the available EEG acquisition systems priced below US$1000 for prosthetic devices. Papers we found that use the same low-cost EEG acquisition models listed in table 2 include work on music composition devices and on studies of computer game play. No papers were found stating the use of commercially available BCI devices in prosthetic devices for assistive technologies to date. There is, however, evidence of working development of low-cost BCI for prosthesis control in the research done by Ribeiro et al., which supports the use of an SSVEP-based system as the simplest mechanism for low-cost BCI devices [25]. In conclusion, given the availability of the affordable EEG acquisition systems mentioned in this paper, further research into the development of reliable and flexible yet affordable BCI systems for prosthesis-centered applications will be welcomed. This will benefit the further study and improvement of BCIs as assistive technologies for severely disabled people.

Suggested Future Work. A study of the hardware and computational resource requirements of classification algorithms for BCI systems used in research. A more comprehensive comparative study of the available BCI acquisition devices in the US$100 - US$1000 range across various target applications and neurological mechanisms. A practical SSVEP system that increases the transparency of the SSVEP stimulus whilst maintaining a high ITR and classification success rate.

Acknowledgments. Tshwane University of Technology, South Africa, for financial support.

References
1. NeuroSky: BCI technology grounded in laboratory research (2010), http://company.neurosky.com/university/
2. Emotiv: Become an Emotiv Researcher (2010), http://www.emotiv.com/researchers/
3. nia Game Controller OCZ Technology (2010), http://www.ocztechnology.com/products/ocz_peripherals/nia-neural_impulse_actuator
4. Bashashati, A., Fatourechi, M., Ward, R.K., Birch, G.E.: A survey of signal processing algorithms in brain-computer interfaces based on electrical brain signals. Journal of Neural Engineering 4, R32–R57 (2007)
5. Garcia, G.: High frequency SSVEPs for BCI applications (2008)
6. Wang, Y., Wang, R., Gao, X., Hong, B., Gao, S.: A practical VEP-based brain-computer interface. IEEE Transactions on Neural Systems and Rehabilitation Engineering 14, 234–240 (2006)
7. Mason, S.G., Birch, G.E.: A general framework for brain-computer interface design. IEEE Transactions on Neural Systems and Rehabilitation Engineering 11, 70–85 (2003)
8. Quitadamo, L., Abbafati, M., Saggio, G., Marciani, M., Cardarilli, G., Bianchi, L.: A UML model for the description of different brain-computer interface systems. In: 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS 2008, pp. 1363–1366 (2008)
9. Mason, S.G., Bashashati, A., Fatourechi, M., Navarro, K.F., Birch, G.E.: A comprehensive survey of brain interface technology designs. Annals of Biomedical Engineering 35, 137–169 (2007)
10. Beverina, F., Palmas, G., Silvoni, S., Piccione, F., Giove, S.: User adaptive BCIs: SSVEP and P300 based interfaces. PsychNology Journal 1, 331–354 (2003)
11. Wu, Z., Lai, Y., Xia, Y., Wu, D., Yao, D.: Stimulator selection in SSVEP-based BCI. Medical Engineering & Physics 30, 1079–1088 (2008)
12. Yang, R., Gray, D.A., Ng, B.W., He, M.: Comparative analysis of signal processing in brain computer interface. In: Proceedings of the 4th IEEE Conference on Industrial Electronics and Applications (ICIEA 2009), pp. 580–585 (2009)
13. The modularEEG (2008), http://openeeg.sourceforge.net/doc/modeeg/modeeg.html
14. Lotte, F., Congedo, M., Lécuyer, A., Lamarche, F., Arnaldi, B.: A review of classification algorithms for EEG-based brain-computer interfaces. Journal of Neural Engineering 4, R1–R13 (2007)
15. Lin, C., Ko, L., Chang, M., Duann, J., Chen, J., Su, T., Jung, T.: Review of wireless and wearable electroencephalogram systems and brain-computer interfaces – a mini-review. Gerontology, 112–119 (2010)
16. Wang, Y., Gao, X., Hong, B., Jia, C., Gao, S.: Brain-computer interfaces based on visual evoked potentials. IEEE Engineering in Medicine and Biology Magazine 65 (2008)
17. Wolpaw, J.R., Birbaumer, N., Heetderks, W.J., McFarland, D.J., Peckham, P.H., Schalk, G., Donchin, E., Quatrano, L.A., Robinson, C.J., Vaughan, T.M.: Brain-computer interface technology: a review of the first international meeting. IEEE Transactions on Rehabilitation Engineering 8, 164–173 (2000)
18. Wolpaw, J.R., Birbaumer, N., McFarland, D., Pfurtscheller, G., Vaughan, T.M.: Brain-computer interfaces for communication and control. Clinical Neurophysiology 113, 767–791 (2002)
19. van Beelen, T.: 12 channel EEG amplifier (2010), http://www.teuniz.net/12-ch_EEG_amplifier/index.html
20. van Beelen, T.: 12 channel ADC-box (2010), http://www.teuniz.net/12-ch_ADC-board/index.html
21. SoundcardEEG (scEEG) prototype (2010), http://openeeg.sourceforge.net/doc/hw/sceeg/
22. Trevisan, A.A., Jones, L.: A low-end device to convert EEG waves to music. Journal of the Audio Engineering Society (2010)
23. Ko, M., Bae, K., Oh, G., Ryu, T.: A study on new gameplay based on brain-computer interface. In: Proceedings of DiGRA 2009 (2009)
24. Lin, C., Ko, L., Chiou, J., Duann, J., Huang, R., Liang, S., Chiu, T., Jung, T.: Noninvasive neural prostheses using mobile and wireless EEG. Proceedings of the IEEE 96, 1167–1183 (2008)
25. Ribeiro, A., Sirgado, A., Aperta, J., Lopes, A.: A low-cost EEG stand-alone device for brain computer interface. In: BIODEVICES 2009 International Conference on Biomedical Electronics and Devices, pp. 430–433. INSTICC (2009)
26. Portelli, A.J., Nasuto, S.J.: Toward construction of an inexpensive brain computer interface for goal oriented applications. In: AISB (2008)
Expression Recognition Methods Based on Feature Fusion*
Chang Su, Jiefang Deng, Yong Yang, and Guoyin Wang**
Institute of Computer Science & Technology, Chongqing University of Posts and Telecommunications, Chongqing, 400065, P.R. China
Tel.: 86-23-62460066
{changsu,jiefangdeng,yongyang,wanggy}@cqupt.edu.cn
[email protected], http://cs.cqupt.edu.cn/wanggy
* This paper is partially supported by the National Natural Science Foundation of China under Grant No. 60773113, and the Natural Science Foundation of Chongqing under Grants No. 2007BB2445, No. 2008BA2017 and No. 2008BA2041.
** Corresponding author.
Abstract. Expression recognition is a popular research focus in artificial intelligence and pattern recognition, and feature fusion is one of its most important technical methods. To study how feature information extracted from different parts of the face contributes to facial expression recognition, experiments were conducted showing that the Gabor wavelet features and geometric characteristics of the mouth are the most important. In the first experiment, the Gabor wavelet features of the mouth alone are used for expression recognition; the result is second only to that of the whole face, and is even better for Occidental emotion expression recognition. In the second experiment, we show that fusing the Gabor wavelet features and the geometric characteristics of the mouth achieves better recognition results than either feature alone, with better real-time performance than using the whole face image. Keywords: Expression recognition, Feature fusion, Gabor wavelet, Geometric feature.
1 Introduction

Along with the continuous development of computer science and technology, the computer is expected to interact more naturally with humans. Not only must the computer be faster, it must also achieve human-computer intelligent interaction (HCII): computers should have emotions and express them naturally, similarly to the way humans do. Thus, expression recognition has become one of the most important research subjects in the artificial intelligence field, and in recent years research scholars worldwide have paid increasing attention to this area [1]. Meanwhile, data fusion (comprising three levels: data-level, feature-level and decision-level) has developed rapidly in the multi-modal expression recognition field; feature-level fusion in particular has received attention because of its advantage of preserving the classification criteria of the various features participating in the fusion [2-4]. Currently, expression recognition methods based on feature fusion can be divided into two categories: one based on global facial features, the other based on local facial features. Usually, a facial-feature method integrates local and global facial information, or fuses one part of the facial information with another, rather than combining different features within one part of the face. According to the psychological study of Xue Sui [5], the features of the mouth and eyes play an important role in expression recognition. Meanwhile, by extracting the geometric characteristics of the mouth and eyes, Yong Yang showed that the geometric characteristics of the mouth play a significant role in expression recognition. Based on these results, we first study the respective contributions of specific features of the eyes and the mouth to expression recognition. We then fuse different features of the same part, or different characteristics of different parts, to obtain better expression recognition results and to address the real-time problem of the global facial feature method.
2 Feature Extraction

2.1 Gabor Wavelet Feature

During this study we noticed that the Gabor wavelet is useful for extracting local facial information and lowering dimensionality, among other functions, in facial expression image processing. Like a "digital microscope", Gabor wavelet features can magnify the gray-level variation of the eyes, nose, mouth and other local features of the facial expression image. Because of the importance and good properties of the Gabor wavelet in the local analysis of facial expression images, and given the experimental results of Xue Sui [5], this paper argues that a detailed study of the importance of Gabor wavelets for local facial expression features (eyes, mouth) is needed, before using them to characterize expression features. The Gabor filter [6] can be expressed as:
$$\Psi_{u,v}(z) = \frac{\|k_{u,v}\|^2}{\sigma^2}\, e^{-\frac{\|k_{u,v}\|^2 \|z\|^2}{2\sigma^2}} \left[ e^{i\, k_{u,v} \cdot z} - e^{-\frac{\sigma^2}{2}} \right] \quad (1)$$

where u and v define the direction and scale of the Gabor kernel, z = (x, y) is the spatial location, v determines the scale (spatial frequency) of the Gabor filter, u determines its direction, i is the imaginary unit, and σ defines the bandwidth of the wavelet filter, determining the ratio of the width to the wavelength of the Gaussian function; here we set σ = 2π. The wave vector $k_{u,v}$ of the plane wave is defined as:

$$k_{u,v} = k_v\, e^{i \varphi_u} \quad (2)$$

where $k_v = k_{\max} / f^v$ expresses the different kernel frequencies of the wavelet, $k_{\max}$ is the maximum frequency, f is the spacing factor between kernel frequencies, and $\varphi_u = \pi u / 6$ expresses the different directions of the wavelet. In this paper, expression recognition applies three scales of Gabor wavelets, v ∈ {0, 1, 2}, and six directions, u ∈ {0, 1, 2, 3, 4, 5}; hence we let $k_{\max} = \pi/2$ and $f = \sqrt{2}$. According to formula (1), we obtain 3 × 6 = 18 filters.
The Gabor feature description of a facial expression image can be implemented through its Gabor wavelet transform. Assuming that P(x, y) is a single facial expression image, the two-dimensional Gabor wavelet transform is defined as the convolution of the image with the Gabor wavelet function:

$$O_{u,v}(x, y) = P(x, y) * \Psi_{u,v}(x, y) \quad (3)$$

where * is the convolution operation and $O_{u,v}(x, y)$ is the convolution result corresponding to the Gabor kernel of direction u and scale v. Therefore, the Gabor feature set obtained through the Gabor wavelet transform of the image P(x, y) can be expressed as:

$$S = \{\, O_{u,v}(x, y) : u \in \{0, \dots, 5\},\ v \in \{0, 1, 2\} \,\} \quad (4)$$
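For concreteness, equations (1)-(4) translate into a small filter bank. The following Python sketch is our own illustration; the kernel window size and the use of response magnitudes are assumptions not fixed by the paper:

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(u, v, size=31, sigma=2*np.pi, k_max=np.pi/2, f=np.sqrt(2)):
    """Complex Gabor kernel for direction u and scale v (equations 1-2)."""
    k = (k_max / f**v) * np.exp(1j * np.pi * u / 6)   # wave vector k_{u,v}
    kx, ky = k.real, k.imag
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    sq = kx**2 + ky**2                                 # ||k_{u,v}||^2
    envelope = (sq / sigma**2) * np.exp(-sq * (x**2 + y**2) / (2 * sigma**2))
    carrier = np.exp(1j * (kx * x + ky * y)) - np.exp(-sigma**2 / 2)
    return envelope * carrier

def gabor_features(image):
    """Magnitudes of the 18 convolutions O_{u,v} = P * Psi_{u,v} (eqs. 3-4)."""
    return [np.abs(fftconvolve(image, gabor_kernel(u, v), mode='same'))
            for u in range(6) for v in range(3)]
```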
2.2 Geometric Features
Expression recognition based on geometric features is consistent with the mechanism of human facial recognition: the features are convenient to extract and easy to understand, but a single geometric feature gives unsatisfactory recognition results. Similarly, since the traditional Gabor wavelet transform uses multi-scale and multi-directional information, its feature vectors are high-dimensional, so the computing time and memory overheads are considerable. To compensate for the respective shortcomings of geometric features and Gabor wavelet features in facial expression recognition, this paper combines local Gabor wavelet features with local geometric characteristics and uses this feature fusion to study facial expression recognition. With reference to the Facial Animation Parameters (FAP) [7] of the MPEG-4 standard, we define the facial feature points. In this paper we employ only 52 of the 68 FAP parameters, since some parameters have little effect on facial expression (e.g., raise_l_ear); the retained parameters all relate to the eyebrows, eyes, nose and lips (see Fig. 1). We do not consider the remaining 14 FAP parameters [8].
Fig. 1. Facial feature points defined in this paper
In facial expression recognition, faces differ in geometric scale, so the geometric features of different faces also differ. Although in this section we only extract the geometric features of the mouth, we still need to apply scale normalization to the extracted geometric feature distances. Since the distance between the two inner canthi (the distance between feature points 23 and 27) remains essentially unchanged across expressions, in this paper we use it as the feature normalization factor.

2.3 The Methods of Feature Fusion
According to data fusion theory, data fusion is divided into three levels: data-level fusion, feature-level fusion and decision-level fusion. This paper employs feature-level fusion, specifically combining two features of the mouth while ignoring features of the remaining parts of the face. For the Gabor wavelet features of the mouth, although the extracted feature dimension is greatly reduced compared to traditional methods, it still remains high; thus we utilize PCA to reduce the dimension. The means of the dimension-reduced Gabor wavelet features of the mouth are μ₁, μ₂, ..., μ_c, where c is the number of expression categories. As parallel fusion algorithms are relatively complex, a basic serial feature fusion approach is used to control the impact of other additional conditions: the Gabor wavelet feature μ_c of the mouth and its geometric feature x_i are normalized into the same expression sample space and then concatenated into a single feature vector (a sketch of this procedure follows below).
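A minimal sketch of the serial fusion pipeline is given below. The normalization choices (inter-canthi scaling for the geometric distances, standardization before concatenation) and the number of PCA components are our own assumptions for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

def normalize_geometry(mouth_dists, canthi_dist):
    """Scale-normalize raw mouth distances by the inner-canthi distance
    (feature points 23 and 27); one canthi distance per sample."""
    return mouth_dists / canthi_dist[:, None]

def fuse_and_classify(gabor_tr, geom_tr, y_tr, gabor_te, geom_te, n_pc=40):
    """Serial fusion: PCA-reduced mouth Gabor features concatenated with
    geometric features, classified with a nearest-neighbor classifier."""
    pca = PCA(n_components=n_pc).fit(gabor_tr)       # n_pc is an assumption
    g_tr, g_te = pca.transform(gabor_tr), pca.transform(gabor_te)
    s1 = StandardScaler().fit(g_tr)                  # bring both feature groups
    s2 = StandardScaler().fit(geom_tr)               # into a comparable range
    fused_tr = np.hstack([s1.transform(g_tr), s2.transform(geom_tr)])
    fused_te = np.hstack([s1.transform(g_te), s2.transform(geom_te)])
    clf = KNeighborsClassifier(n_neighbors=1).fit(fused_tr, y_tr)
    return clf.predict(fused_te)
```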
3 Experiment and Analysis

3.1 Experimental Sample Library
In this section we use the Cohn-Kanade facial expression database [9], the Japanese female facial expression database (JAFFE) [10], and a self-built expression database (CQUPTE) to test the methods above. The Cohn-Kanade database [9] contains 2000 images of nearly 200 men and women; with this library, the method proposed in this paper can be verified on a large number of data samples. Its inadequacy is that the faces are mostly Western, lacking Asian facial characteristics. The Japanese women's face database (JAFFE) [10] is the one most frequently used in facial expression recognition studies. It consists of 213 images of 10 women, with some differences in expression intensity within a single subject. Unfortunately, this library has several shortcomings, such as too few subjects (only 10 persons), a single sex (all women) and ethnic homogeneity (all Japanese). Thus, in this paper we use the self-built expression library (CQUPTE) as a supplement for Asian facial expression recognition. The CQUPTE library contains eight men and women; for each basic facial expression, each person has 3 to 6 images. We gathered the expression data under the guidance of the structural expression features table [11] proposed by Jin Hui and Gao Wen, taking the experimental subjects' natural feeling as the standard to limit the strength of the expression. Some of the expression images are shown in Fig. 2:
Fig. 2. Part of the self-built expression library images (happy, depression, fear, disgust, surprise, anger)
3.2 Experiment 1: The Gabor Wavelet Features of Eyes and Mouth

3.2.1 Methods and Procedures

The purpose of this experiment is to test the importance of the Gabor wavelet features of the eyes and the mouth in facial expression recognition, and to determine which is more important. This research builds both on Yong Yang's work on important features in expression recognition and on Xue Sui's psychological study [5]. It contributes to the literature and lays the foundation for future research aiming to further improve facial expression recognition under the feature fusion approach.
To test effectively, we fully compare the effects of the Gabor wavelet features of the whole face, of the mouth, and of the eyes on expression recognition, applying the same methods to the three expression libraries described above. The experiment uses the six basic facial expressions (angry, disgust, fear, happy, sad and surprised) identified by Ekman [12] for classification. The specific experimental steps are as follows (a cropping sketch follows the list):
(1) For each face image, we perform face detection, pupil localization, cropping, gray-level equalization and other pretreatment, obtaining a preprocessed 80×80 image of the whole face.
(2) The preprocessed image is divided into 64 cells. Using pupil localization and the "three sections, five eyes" rule of facial proportions, we locate the eyes and the mouth, then crop 20 contiguous cells for each, i.e., a 20×80 strip for the mouth and one for the eyes.
(3) We apply the Gabor wavelet transform in three scales and six directions, and extract features, on the preprocessed 80×80 whole-face image and on the 20×80 mouth and eye strips.
(4) The images in the three databases are divided into training and test sets in a 3:1 ratio, and we use four-fold cross-validation to compare the results of the Gabor+PCA, Gabor+2DPCA and Gabor+LDA methods on each sample database.
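The region cropping in step (2) amounts to slicing cell-rows out of the 80×80 face. The row offsets in the following Python sketch are hypothetical; the paper fixes only the 20×80 strip size:

```python
import numpy as np

def crop_regions(face: np.ndarray):
    """Cut 20x80 eye and mouth strips from a preprocessed 80x80 face.

    The face is viewed as an 8x8 grid of 10x10 cells; each strip is two
    contiguous cell-rows.  The exact row offsets below are assumptions.
    """
    assert face.shape == (80, 80)
    eyes  = face[20:40, :]   # two cell-rows around the eye line
    mouth = face[50:70, :]   # two cell-rows around the mouth
    return eyes, mouth
```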
3.2.2 The Analysis of Experimental Results

The experimental results are shown in Tables 1, 2 and 3:

Table 1. The results of the CQUPTE expression library

Method        The whole face   Mouth    Eyes
Gabor+PCA     95.5%            93.33%   83.33%
Gabor+2DPCA   96.67%           94.66%   84.17%
Gabor+LDA     77.3%            72.77%   63.1%
Table 2. The results of the JAFFE expression library

Method        The whole face   Mouth    Eyes
Gabor+PCA     91.67%           80%      78.67%
Gabor+2DPCA   93.3%            89%      85%
Gabor+LDA     70.58%           65.34%   54.17%
Table 3. The results of the Cohn-Kanade expression library

Method        The whole face   Mouth    Eyes
Gabor+PCA     93.1%            91.67%   89.04%
Gabor+2DPCA   95.17%           94.3%    91.33%
Gabor+LDA     74.6%            71.77%   58.11%
From Tables 1 to 3 we obtain the average recognition results of the whole face, the mouth and the eyes across the three expression libraries with the three methods (Table 4), and the differences between the whole-face and mouth results (Table 5).

Table 4. The average experimental results in the three expression libraries

Method        The whole face   Mouth    Eyes
Gabor+PCA     93.42%           88.33%   83.68%
Gabor+2DPCA   95.05%           92.65%   86.83%
Gabor+LDA     74.16%           69.96%   58.46%
Table 5. The differences in recognition rate between the whole face and the mouth in the three expression libraries

Method        CQUPTE   JAFFE    Cohn-Kanade
Gabor+PCA     2.17%    11.67%   1.43%
Gabor+2DPCA   2.01%    4.3%     0.87%
Gabor+LDA     4.53%    5.24%    2.83%
Analyzing the experimental results, we can draw the following conclusions:
(1) From Tables 1 to 3 we can see that, in all three libraries, the Gabor wavelet features of the whole face give the highest expression recognition rate, but the recognition rates obtained from the mouth features with all three methods are also significantly high, second only to whole-face recognition.
(2) Combining Tables 1 to 3 with Table 4, it can be seen that the Gabor wavelet features of the mouth have a clearly stronger effect on facial expression recognition than the features of the eyes. Gabor wavelets can thus extract the local information well, and the Gabor wavelet features of the mouth play a very important role in facial expression recognition.
(3) According to Table 5, the Gabor wavelet features of the mouth extracted from the Cohn-Kanade database come closest to the whole-face recognition results, with the CQUPTE library second and JAFFE third. Therefore, we can conclude that the Gabor wavelet features of the mouth play a greater role in facial expression recognition for Westerners; in other words, the mouth is more important in Western emotional expression.

3.3 Experiment 2: Fusion of the Geometric Features and the Gabor Wavelets
The results of Experiment 1 show that the Gabor wavelet features of the mouth play an important role in expression recognition. In addition, Yong Yang's experimental results on the geometric features of the mouth also reveal that they have a significant effect on expression recognition. Geometric features have many merits: they are consistent with humans' natural mechanism of recognizing facial expressions, convenient to extract, and easy to understand; moreover, like Gabor wavelets, they are insensitive to lighting. In Experiment 2, we therefore combine the geometric features and the Gabor wavelets and examine the effect on expression recognition.

3.3.1 Methods and Procedures

(1) We obtain the Gabor wavelet features of the whole face and of the mouth using the method of Experiment 1.
(2) For the extraction of the geometric features of the mouth, we first employ the method in [8] to track the feature points of the facial image automatically; then, borrowing the methods of Xue Sui's psychological experiment [5], we cover parts of the facial organs and use only the relevant features for expression recognition. Finally, following Yong Yang's geometric-feature experimental method, we extract the 12 most important features d8, d11, d12, d13, d14, d15, d22, d23, d24 of the mouth, as shown in Fig. 3 and Table 6.
Fig. 3. The important geometric features of the mouth
Table 6. The description of the important geometric features of the mouth

Feature   Description    Feature   Description
d8        dis(39,46)     d15       dis(39,3)
d11       dis(39,44)     d22       dis(44,48)/2
d12       dis(39,48)     d23       dis(45,51)
d13       dis(44,48)     d24       dis(47,49)
d14       dis(46,50)
(3) We combine the Gabor wavelet features of the mouth image, in three scales and six directions, with the 12 geometric features of the mouth using the serial fusion method, form the combined features into eigenvectors, reduce the feature dimension using PCA by columns, and obtain recognition results with the nearest-neighbor classifier.
To fully verify that fusing the Gabor wavelet features and the geometric characteristics of the mouth achieves better expression recognition than either single feature, and to obtain precise results, we divide the images of the three databases into training and test sets in a 3:1 ratio and evaluate with four-fold cross-validation (a sketch follows below). The Cohn-Kanade and CQUPTE databases contain more facial expression images, so we group each into four sets of 120 images, i.e., 480 images per library. The JAFFE database has fewer images, so we group it into four sets of 60 images, i.e., 240 images in total. The second experiment contains two sub-tests that compare the same sets of images in terms of recognition rate and run time. Each experiment is repeated three times, and the average recognition rate and recognition time are taken as the results.
The first group: for the same sets of images, we apply PCA dimension reduction to the serially fused mouth Gabor-wavelet-plus-geometric features, to the Gabor wavelet features alone, and to the geometric features alone, then classify with the nearest-neighbor classifier to obtain the final comparative recognition results (Table 7).
The second group: for the same sets of images, we apply PCA dimension reduction to the serially fused mouth features and to the whole-face Gabor wavelet features, again classifying with the nearest-neighbor classifier to obtain the final comparative recognition results and recognition times (Table 8).
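The evaluation protocol of this section (four folds, each held out once as the test set, giving a 3:1 split) can be sketched as follows; `classify_fn` is our own placeholder for any of the pipelines above:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def four_fold_accuracy(features, labels, classify_fn):
    """4-fold cross-validation: each fold serves once as the test set."""
    folds = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
    accs = []
    for tr, te in folds.split(features, labels):
        pred = classify_fn(features[tr], labels[tr], features[te])
        accs.append(np.mean(pred == labels[te]))
    return float(np.mean(accs))
```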
Table 7. Comparison of expression recognition rates using mouth features

Expression library   Mouth (Gabor & geometric)   Gabor wavelet feature   Geometric feature
CQUPTE               94.20%                      93.10%                  89.73%
JAFFE                86.67%                      80.00%                  58.28%
Cohn-Kanade          92.30%                      90.90%                  75.15%
Average              91.06%                      88.00%                  74.39%
Table 8. Comparison between the whole-face Gabor feature and the mouth (Gabor & geometric) feature

Expression library   Whole-face Gabor feature           Mouth (Gabor & geometric) feature
                     Recognition rate   Time            Recognition rate   Time
CQUPTE               95.50%             21.047 s        94.20%             2.609 s
JAFFE                91.67%             9.829 s         86.67%             0.841 s
Cohn-Kanade          93.10%             20.969 s        92.30%             2.781 s
Average              93.42%             17.282 s        91.06%             2.077 s
Analyzing the experimental results, we can see that: (1) As Table 7 shows, after fusing the Gabor wavelet features and the geometric characteristics of the mouth, the average expression recognition rate is 91.06%, clearly higher than the recognition rates of either individual feature. (2) As Table 8 shows, the recognition rate obtained by feature fusion is close to that of the whole-face Gabor wavelet features; the average difference is only 2.36%. (3) The recognition time when using only the fused mouth features (Gabor wavelet & geometric) is far shorter than when using the whole-face Gabor features.
4 Conclusions In this paper, we investigate the importance of the Gabor wavelet features of facial parts (mouth, eyes) in expression recognition. The experimental results show that the Gabor wavelet features of the mouth play an important role in expression recognition, especially in Western emotional expression. We then further explore the effectiveness of local features, namely the fusion of the Gabor wavelet features
and the geometric features of the mouth, for recognizing facial expressions, using samples from the CQUPTE, JAFFE, and Cohn-Kanade expression databases. The results show that this fusion approach achieves a significant recognition effect with better real-time performance than the whole-face approach, thus addressing the problem that the Gabor wavelet approach offers high recognition accuracy but poor real-time performance. The research also suggests that future studies on expression recognition can start from the fusion of different local features of the human face, so that a computer can recognize facial expressions from local features alone, enabling better human–computer interaction and facilitating further exploration of cognitive science.
References
1. Zhao, W., Chellappa, R., Phillips, P.J., Rosenfeld, A.: Face recognition: a literature survey. ACM Comput. Surv. 35(4), 399–458 (2003)
2. Loh, M.P., Wong, Y.P., Wong, C.: Facial expression recognition for e-learning systems using Gabor wavelet & neural network. In: Proceedings of the Sixth International Conference on Advanced Learning Technologies, pp. 523–525. IEEE Computer Society, The Netherlands (2006)
3. Kotsia, I., Nikolaidis, N., Pitas, I.: Fusion of geometrical and texture information for facial expression recognition, pp. 2649–2652. IEEE, Los Alamitos (2006)
4. Aleksic, P.S., Katsaggelos, A.K.: Automatic facial expression recognition using facial animation parameters and multistream HMMs. IEEE Transactions on Information Forensics and Security 1(1), 3–11 (2006)
5. Sui, X., Ren, Y.T.: Online processing of facial expression recognition. Acta Psychologica Sinica 39(1), 64–70 (2007)
6. Liu, C., Wechsler, H.: Gabor feature based classification using the enhanced Fisher linear discriminant model for face recognition. IEEE Transactions on Image Processing 11, 467–476 (2002)
7. Pandzic, I.S., Forchheimer, R.: MPEG-4 Facial Animation: The Standard, Implementation and Applications. John Wiley & Sons, Inc., New York (2003)
8. Yang, Y., Wang, G., Chen, P.: Feature selection in audiovisual emotional recognition based on rough set theory. Transactions on Rough Sets VII, 283–294 (2007)
9. Kanade, T., Cohn, J., Tian, Y.: Comprehensive database for facial expression analysis. In: Proceedings of the International Conference on Face and Gesture Recognition, pp. 46–53 (2000)
10. The Japanese Female Facial Expression (JAFFE) Database (1998), http://www.mis.atr.co.jp/~mlyons/jaffe.html
11. Jin, H., Gao, W.: The human facial combined expression recognition system. Chinese Journal of Computers 23(6), 602–608 (2000)
12. Ekman, P.: Facial expression and emotion. American Psychologist 48(1), 384–392 (1993)
Investigation on Human Characteristics of Japanese Katakana Recognition by Active Touch
Suguru Yokotani1, Jiajia Yang1, and Jinglong Wu1,2
1 Biomedical Engineering Laboratory, Graduate School of Natural Science and Technology, Okayama University, Tsushima-Naka 3-1-1, Okayama 700-8530, Japan
[email protected]
2 The International WIC Institute, Beijing University of Technology, China
Abstract. Braille is one of the few reading systems based on tactile perception. However, one important problem with Braille is that it is difficult to learn, especially for the elderly. Thus, there is a need to develop a new reading system that presents letters directly to blind people. The aim of the present study is to investigate the human characteristics of katakana recognition by active touch, toward the development of such a tactile reading system. In the present experiment, ten healthy young subjects were asked to recognize 46 Japanese katakana by active touch. The raised Japanese katakana characters were made of duralumin, and the height of the stimuli was 10 mm. Subjects were instructed to touch the katakana stimuli with their right index finger without large submovements. The mean accuracy of all young subjects was over 80%, and the mean reaction time was about 27.3 s. Our results indicate that the mean accuracy decreased as similarity increased. However, several cases of high accuracy under high-similarity pair conditions need to be considered.
1 Introduction Humans excel at recognizing shapes, objects, and materials by tactile means. Touch is a very important sense for blind people. The Braille system is a method widely used by blind people to read and write. It consists of patterns of raised dots arranged in cells of up to six dots in a 3 × 2 configuration, each cell representing a letter, numeral, or punctuation mark. Braille has been adapted to many different languages, including Japanese, and is also used for musical and mathematical notation [1]. However, one important problem with Braille is that it is difficult to learn, especially for the elderly. Therefore, an easy tactile reading system for blind people is highly needed. Many researchers have considered presenting letters directly instead of using Braille. For Japanese, a great deal of previous work has focused on the recognition of Japanese katakana, hiragana, and kanji characters. For instance, Shimizu et al. used tactile Japanese katakana stimuli to investigate the ability to recognize katakana [2]. In their study, a vibrational tactile information display unit was used to present 46 katakana on the subjects' palms, in stroke order. The results indicated that subjects could recognize the letters correctly within only a few seconds.
Moreover, even when only the starting points, ending points, and folding points of the letters were presented, subjects still maintained high accuracy. In this way, letter information could be conveyed to subjects correctly and rapidly. However, the recognition characteristics of each katakana letter remained unclear. On the other hand, Sagawa et al. [2] used hiragana as stimuli to examine tactile cognitive ability under different tactile strategies. The hiragana patterns in that study were produced with a dot plotter, and the size of the hiragana ranged from 6 × 6 mm to 34 × 34 mm. Subjects were asked to touch the stimuli freely. Both sighted and blind subjects participated, and the results showed that the blind subjects achieved higher cognitive accuracy than the sighted subjects. In addition, accuracy improved and reaction time shortened as the size of the hiragana increased. They also investigated cognitive ability using Japanese kanji characters [3]. The kanji characters in that study came in four different heights and were made from the same dot patterns as in the previous study. The heights of the kanji were 16 mm, 24 mm, 32 mm, and 40 mm, and the stroke counts of the letters were between 1 and 14. Subjects were asked to perceive the characters by active touch. The results indicated that subjects could recognize characters correctly despite small letter sizes when the stroke counts were small. In addition, subjects' performance improved as letter size increased. For other languages, many previous studies have used alphabet letters to investigate tactile discriminability [4]. In those experiments, subjects were asked to perceive letter shapes by passive and active touch, and the results suggested that accuracy improved when subjects actively scanned the letter pattern. Moreover, Manning et al. [5] used alphabet letters to compare letter recognition ability between young and older subjects. They observed a relationship between tactile spatial sensitivity and letter recognition accuracy, attributed to the declining function of skin receptors in older subjects. As described above, previous studies used different methodologies to present tactile Japanese and alphabet letter patterns. However, whether tactile recognition depends on the basic shape of Japanese letters was still unknown. Thus, the aim of the present study was to investigate the tactile katakana recognition characteristics of each letter under active touch.
2 Materials and Method 2.1 Subjects Ten volunteers (21 to 25 years; mean 23.0 years; all male) from Okayama University participated in the experiment. All subjects gave written informed consent prior to participation in this study. Subjects were right-handed, as confirmed by the Edinburgh Handedness Inventory [6]. All subjects reported no loss of tactile sensation or any unusual experiences with haptic input. Subjects were blindfolded and seated at a table, and the katakana patterns were placed in a device that secured their bases and made their location predictable.
2.2 Stimuli The tactile letter patterns were based on the simple "MS UI Gothic" font. We used 46 letters consisting of simple raised lines, such as ア, カ, and サ. Fig. 1 shows the shapes of all tactile letter patterns used in the present experiment.
Fig. 1. The shape of 46 katakana characters
As shown in Fig. 2, the height of each letter pattern was 10 mm, the width of the raised line was 0.5 mm, and the thickness was 5 mm. The width of each letter differed from letter to letter, because we varied the width according to each letter's normal font proportions: about 11 mm for the largest letter and about 5 mm for the smallest. Although the widths differed across letters, subjects could not use this cue to recognize all the letters.
Fig. 2. The configuration of the katakana stimuli: (a) top view, (b) side view, (c) real stimuli (letter height 10 mm, raised-line width 0.5 mm, thickness 5 mm)
2.3 Similarity Measure In order to conduct a quantitative analysis of the human characteristics of letter recognition, each pair of letter patterns was characterized by a "similarity measure" defined as follows. First, each letter pattern was transformed into an image, and a weighting mask was applied to emphasize the circumference of each tactile pattern. Each image was then spatially smoothed to take into account the spatial sensitivity of tactile perception. Second, the correlations between the two images were calculated for all pairs and used as the similarity measure; the similarity corresponds to the rate of overlap. Fig. 3 shows an example using "ア" and "フ".
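A minimal sketch of this similarity measure, assuming each letter pattern has already been rendered (and circumference-weighted) as a grayscale array of the same shape; the smoothing width sigma stands in for the paper's unspecified tactile-sensitivity parameter:

import numpy as np
from scipy.ndimage import gaussian_filter

def similarity(img_a, img_b, sigma=2.0):
    """Correlation-based similarity between two letter images.

    img_a, img_b: 2-D arrays of the same shape (rendered, weighted
    letter patterns). sigma: Gaussian smoothing width approximating
    the limited spatial sensitivity of tactile perception.
    """
    a = gaussian_filter(img_a.astype(float), sigma)
    b = gaussian_filter(img_b.astype(float), sigma)
    # Pearson correlation of the flattened, smoothed images.
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])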
Fig. 3. Overlapped part of two katakana letters
2.4 Experimental Device The primary device was composed of three parts, as shown in Fig. 4(a): a disk for tactile pattern delivery, a finger gap with a cover, and an ultrasonic motor and force sensor unit. A motor controller (Canon Precision Inc. URC200) was used to control the ultrasonic motors and optical sensors. To control and operate the system, we developed a program that achieves precise position control of stimulus delivery, precise control of subjects' finger movement, reliable recording of reaction times, and monitoring of system operation and integrity. Fig. 4(b) shows the subject's finger position. The program was written in "Presentation Version 0.61", which emphasizes flexibility and ease of use through displayed user instructions while maintaining control and accuracy of the experimental parameters.
Fig. 4. Photo of experimental device and finger position
2.5 Task and Procedure Subjects were asked to place their fingers at the initial position, as shown in Fig. 4(b). First, the experimenter randomly presented a katakana. Next, the subjects were asked to touch the katakana pattern and verbally identify which katakana it was. In the experiment, we presented 46 letters per day, and each subject completed five sessions over five days. The task and data acquisition were controlled by a personal computer. For each trial, the finger movement was controlled at approximately the same level.
The subject's response was stored along with information about the trial, including the name of the katakana pattern and the order of presentation. We calculated the accuracy rates and recorded the reaction times in the present experiment.
3 Results 3.1 Responses The left column of Table 1 lists the katakana that were actually presented, and the top row shows the responses from all subjects. Thus, the diagonal gives the numbers of correct responses, while the other cells give the numbers of incorrect answers. The bottom row of Table 1 shows the totals of the subjects' answers, and the rightmost column the total number of presentations. As Table 1 shows, the largest confusion count is 28, which occurred when "チ" was presented and the response was "テ".
Table 1. Presented–response confusion matrix for the 46 katakana (rows: presented characters, 50 presentations each; columns: responses; diagonal entries: correct responses)
3.2 Relationship between Similarity and Incorrect Responses The relationship between similarity and incorrect responses is shown in Fig. 5, with similarity on the horizontal axis and the total number of incorrect responses on the vertical axis. Fig. 5 indicates that the number of errors increased as the similarity of a katakana pair increased. However, some pairs with high similarity also yielded very low error counts (a sketch for deriving this relationship from the confusion matrix follows the figure).
Fig. 5. Relationship between similarity and total of incorrect responses
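The quantities plotted in Fig. 5 can be derived from the confusion matrix of Table 1 and the similarity measure of Section 2.3. A minimal sketch, assuming both are available as NumPy arrays:

import numpy as np

def pair_error_counts(confusion):
    """Sum of off-diagonal confusions for each unordered letter pair.

    confusion: 46 x 46 array where confusion[i, j] counts trials on
    which letter i was presented and letter j was answered.
    """
    n = confusion.shape[0]
    pairs, errors = [], []
    for i in range(n):
        for j in range(i + 1, n):
            pairs.append((i, j))
            errors.append(confusion[i, j] + confusion[j, i])
    return pairs, np.array(errors)

def similarity_error_correlation(confusion, sim):
    """Correlate pairwise similarity with total incorrect responses."""
    pairs, errors = pair_error_counts(confusion)
    sims = np.array([sim[i, j] for i, j in pairs])
    return float(np.corrcoef(sims, errors)[0, 1])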
4 Discussion The current experiment investigated the characteristics of tactile katakana recognition by active touch. We calculated accuracy rates and recorded reaction times. We found that blindfolded subjects manually identified the 46 katakana characters with almost 80% accuracy, typically within 30 s per character. The results indicate that the mean accuracy decreased as similarity increased. However, several cases of high accuracy and short reaction times under high-similarity pair conditions need to be considered. All subjects were asked to touch each katakana actively and to recognize which katakana it was. The current katakana recognition task is a typical procedure of tactile active shape recognition. The sensory feedback, which is critical for katakana recognition, is generated by the four mechanoreceptive afferent systems [7] located in the skin and by the proprioceptive system. Johnson and Phillips [8] suggested that the threshold for gap discrimination is close to 0 mm when a moving hand is used. Therefore, all subjects in the current experiment were clearly able to perceive the fine spatial details of the katakana stimuli, and recognition accuracy was related to the similarity of each katakana pair. However, as shown in Fig. 5, some pairs with high similarity also received high accuracy. We list these katakana pairs in Table 2.
Table 2. The high-similarity pairs with low incorrect rates

Pair of letters   Similarity   Sum of incorrect responses
エ / ニ           0.789        1
ク / タ           0.775        2
フ / ヲ           0.762        0
コ / ロ           0.746        0
サ / リ           0.730        0
As shown in Table 2, the similarities of these pairs were larger than 0.73, but the number of incorrect responses was only 0 to 2. This result is consistent with a previous study [9] that investigated the discriminability of English letters. There are two main possible reasons for this phenomenon. First, even though the similarities of these pairs were high, one katakana in each pair seems very easy to recognize. For example, "ニ" consists of two simple horizontal lines, so subjects could easily discriminate it from "エ" by the vertical stroke of "エ". Second, the numbers of endpoints in the members of each pair were markedly different. A previous study [10] indicated that the mechanoreceptive afferent population conveys more intensive information about the endpoints of a shape than about its central part. Therefore, we suggest that subjects may obtain more tactile information from the katakana with more endpoints. This process seems to contribute to the low error rates of these high-similarity pairs.
5 Conclusion The present study is the first report to investigate the human characteristics of katakana recognition for each letter. The results show that the katakana recognition performance of healthy young subjects was higher than 78%. The mean accuracy decreased as similarity increased, except for some katakana pairs. In order to develop a new letter presentation system, we also need to consider how to improve the recognition performance for high-similarity pairs with low accuracy.
Acknowledgment This study was supported in part by a Grant-in-Aid for Scientific Research (B) 21404002, Japan, and by the AA Science Platform Program of the Japan Society for the Promotion of Science. We also thank the subjects who participated in this study and the staff of the Biomedical Engineering Laboratory for their assistance with data collection.
References
1. Foulke, E.: Reading Braille. In: Schiff, W., Foulke, E. (eds.) Tactual Perception: A Sourcebook, p. 170. Cambridge University Press, New York (1982)
2. Oyama, T., Imai, S., Wake, T.: Handbook of Sensation and Perception, Revised edn., p. 1285. Seishinshobo, Tokyo (1994) (in Japanese)
3. Oyama, T., Imai, S., Wake, T.: Handbook of Sensation and Perception, Revised edn., p. 1286. Seishinshobo, Tokyo (1994) (in Japanese)
4. Loomis, J.M.: On the tangibility of letters and braille. Perception & Psychophysics 29, 37–46 (1981)
5. Manning, H., Tremblay, F.: Age differences in tactile pattern recognition at the fingertip. Somatosensory and Motor Research 23, 147–155 (2006)
6. Oldfield, R.C.: The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia 9, 97–113 (1971)
7. Johnson, K.O.: The roles and functions of cutaneous mechanoreceptors. Current Opinion in Neurobiology 11, 455–461 (2001)
8. Johnson, K.O., Phillips, J.R.: Tactile spatial resolution I: Two-point discrimination, gap detection, grating resolution, and letter recognition. Journal of Neurophysiology 46, 1177–1191 (1981)
9. Loomis, J.M.: Tactile letter recognition under different modes of stimulus presentation. Perception & Psychophysics 16, 401–408 (1974)
10. Blake, D.T., Johnson, K.O., Hsiao, S.S.: Monkey cutaneous SAI and RA responses to raised and depressed scanned patterns: effects of width, height, orientation, and a raised surround. Journal of Neurophysiology 78, 2503–2517 (1997)
Towards Systematic Human Brain Data Management Using a Data-Brain Based GLS-BI System
Jianhui Chen1, Ning Zhong1,2, and Runhe Huang3
1 International WIC Institute, Beijing University of Technology, Beijing, 100124, China
2 Department of Life Science and Informatics, Maebashi Institute of Technology, Maebashi-City, 371-0816, Japan
3 Faculty of Computer and Information Sciences, Hosei University, Tokyo, 184-8584, Japan
[email protected], [email protected], [email protected]
Abstract. Aiming at the characteristics of thinking-centric studies, Brain Informatics (BI) emphasizes a systematic approach to investigating human information processing mechanisms. Systematic human brain data management is the basis of the BI methodology. It needs to realize not only storage- and data-publishing-oriented management, but also systematic-analysis-oriented management. However, traditional brain databases cannot effectively support such systematic human brain data management. In this paper, we propose a Data-Brain based framework, the Global Learning Scheme for BI (GLS-BI), to dynamically integrate BI data and analytical resources for realizing systematic-analysis-oriented brain data management. GLS-BI offsets the disadvantages of existing brain databases and provides a practical approach towards systematic human brain data management.
1 Introduction
Brain Informatics (BI) [19] represents a new viewpoint for studying the human brain. It focuses on thinking-centric studies of human cognitive functions, which are complex and involve multiple inter-related functions with respect to activated brain areas and their neurobiological processes of spatio-temporal features for a given task. Hence, BI emphasizes a systematic methodology comprising four core issues: systematic investigation of human thinking-centric mechanisms, systematic design of cognitive experiments, systematic human brain data management, and systematic human brain data analysis and simulation [9]. Systematic human brain data management is the basis of the BI methodology. For effectively supporting systematic human brain data analysis, it needs to meet the management requirements stated in [20]. These requirements can be generalized as realizing the following three levels of data management:
– Storage-oriented management effectively stores heterogeneous brain data and related information for local or small-scale data query and utilization. – Data-publishing-oriented management describes the origins of raw data, intermediate data, and analytical results in detail using various metadata, especially semantic metadata, for Web-based large-scale data reuse. – Systematic-analysis-oriented management dynamically integrates various BI data and analytical resources to effectively support systematic human brain data analysis. Over the last decade, a large number of brain databases have been constructed to manage brain data effectively. However, most of them only realize storage- and data-publishing-oriented management. For realizing systematic-analysis-oriented management, BI needs to develop a new approach to brain data management. This paper presents a case study on systematic human brain data management. Based on the various brain data coming from systematic BI studies, a Data-Brain [3,4] based integration framework, the Global Learning Scheme for BI (GLS-BI), is proposed to realize systematic-analysis-oriented brain data management. It supports formal description and dynamic integration of BI data and analytical resources for automatically generating all possible mining workflows. This reduces the dependence on individual capabilities and effectively supports comprehensive multi-aspect data analysis. GLS-BI provides a practical approach towards systematic human brain data management. The remainder of this paper is organized as follows. Section 2 discusses background and related work. Section 3 proposes the Data-Brain based framework, GLS-BI, for realizing systematic-analysis-oriented brain data management. Two core components of GLS-BI, the knowledge base for BI data and the multi-aspect mining process planning engine, are described in Sections 4 and 5, respectively. A prototype system is introduced in Section 6. Finally, Section 7 gives concluding remarks.
2 Background and Related Work
The increasing ability to obtain digital information in human brain studies has led to a vast increase in brain data across a variety of spatial and temporal scales. Various brain databases, such as the Olfactory Receptor Database [25] and the Neocortical Microcircuit Database [24], have been constructed to store and share these brain data effectively. At present, BI focuses on EEG (electroencephalogram) and fMRI (functional magnetic resonance imaging) data, whose databases are important topics in brain database studies. At the early stage, EEG and fMRI databases were oriented to data storage [5,14], so related studies focused on the conceptual schemas of the databases [14]. With the increase of data, how to reuse these data effectively became an important problem, and brain database
studies began to focus on descriptions of brain data. Various metadata have been designed to describe data origins, including experimental proposals [15] and data processing [8]. However, because most of the data in existing brain databases come from studies isolated from each other, their metadata neglect the relationships among data obtained from different experimental or analytical tasks. These brain databases are thus oriented to data publishing, and based on them, it is difficult to synthetically understand and reuse the data coming from different experimental or analytical tasks. Systematic brain data analysis is an important issue of the BI methodology. A POM (peculiarity oriented mining) centric multi-aspect data analysis has been proposed [10,21] to integrate multiple data sources and analytical methods for realizing systematic brain data analysis. However, with existing brain databases, this POM centric multi-aspect data analysis can only be implemented in an expert-driven way: researchers need to choose suitable data and analytical methods based on their own knowledge, which is very difficult for most researchers. Thus, BI needs to develop a new approach for realizing systematic-analysis-oriented data management, one that can effectively assist researchers in integrating various data and analytical resources for multi-aspect data analysis. In recent years, some research has focused on analysis-oriented brain data storage and description, in order not only to share data across multiple sites, but also to make data available and useful to the scientific community at large. The LONI pipeline workflow environment [23] is a typical example. It defines data provenance and processing provenance to describe experiments and data processing, so that various software packages can be combined to achieve more sensitive and accurate results. However, LONI only provides a workflow platform on which the selection of data and analytical methods still depends on researchers' knowledge. Thus, it also cannot realize systematic-analysis-oriented data management.
3 Global Learning Scheme for BI
The Global Learning Scheme (GLS) [1,18] is a multi-agent system that models the KDD (Knowledge Discovery and Data Mining) process as an organized society of autonomous knowledge discovery agents, dynamically organizing and managing the KDD process based on an ontology of KDD agents. We extend the GLS and propose the Global Learning Scheme for BI (GLS-BI). It formally describes and dynamically integrates existing BI data and analytical resources for realizing systematic-analysis-oriented data management. In GLS-BI, all possible mining workflows can be generated automatically to guide multi-aspect data analysis and to reduce the dependence on individual capabilities during the analytical process. Figure 1 illustrates the system architecture of GLS-BI, which includes the following three levels:
Fig. 1. A multi-level system architecture of GLS-BI: a BI portal at the portal level; a controlling center at the planning and controlling level; and data and analysis agents at the implementing level
– The portal level provides a series of graphical user interfaces (GUIs) through which users can carry out multi-aspect brain data analysis guided by formal domain knowledge. – The planning and controlling level includes a controlling center that realizes the planning and controlling of mining processes. Its core is a multi-aspect mining process planning engine, which can dynamically integrate data and analysis agents to generate all possible mining workflows based on formal domain knowledge. – The implementing level includes various data and analysis agents. Each data agent corresponds to one data set of an experimental group¹ in BI experimental studies. It can be a physical data agent, which responds to data requests and returns data, or a virtual data agent, which only denotes an existing data set. Each analysis agent corresponds to an analytical process in BI data analysis studies. It can be a physical analysis agent, which responds to analytical requests and returns analytical results, or a virtual analysis agent, which only denotes an existing analytical process. To serve the global BI community, GLS-BI must be an open system. Thus, Semantic Web Service technologies are chosen for agent description, agent discovery, agent organization, etc. This method, called the agent-based Web service workflow model, combines agents and Web services and has been successfully applied in many previous studies [2,16]. Figure 2 illustrates the software architecture of GLS-BI. WSDL (Web Services Description Language) and UDDI (Universal Description, Discovery and Integration) are used for service description and registration, and OWL-S (Web Ontology Language for Services) is used for semantic service description. All of these are mature information technologies supported by various tools, such as Apache Axis2 and Jena.
¹ An experimental group is a group of experiments that share the same experimental proposal.
Fig. 2. A multi-level software architecture of GLS-BI: a presentation layer (graphical user interfaces); a middleware layer (multi-aspect mining process planning engine, service register center, knowledge base for BI data, and management and controlling tools); a service layer (semantic service descriptions and services acting as physical or virtual data and analysis agents); and a physical layer (existing data sets, available data and computing resources, and existing analytical methods)
In this architecture, the knowledge base for BI data and the multi-aspect mining process planning engine are the core components. They will be introduced in the next sections.
4 The Knowledge Base for BI Data
The knowledge base for BI data is a formal domain knowledge base containing knowledge about BI data for annotating semantic service descriptions and guiding multi-aspect mining process planning. It includes three sub-components: the Data-Brain, Data-Brain based BI provenances, and domain ontologies. The Data-Brain is a conceptual brain data model that represents functional relationships among multiple human brain data sources, with respect to all major aspects and capabilities of human information processing systems (HIPS), for systematic investigation and understanding of human intelligence [3,4]. Based on the BI methodology, an ontological modeling approach has been proposed for Data-Brain construction [4], and we have realized a Data-Brain prototype using OWL (Web Ontology Language). The ontological Data-Brain is the core of the knowledge base for BI data. On one hand, it provides an ontological framework for integrating various BI-data-related domain ontologies. On the other hand, it provides a general conceptual schema for describing BI data: various conceptual schemas of BI provenances (metadata) can be obtained by extracting related concepts and relations from the Data-Brain.
BI provenances are the metadata describing the various data resources obtained in BI experimental and data analysis studies. Metadata describing the origin and subsequent processing of biological images is often referred to as "provenance" [13]. Similarly, we speak of "BI provenances", including data provenances and analysis provenances, which are the metadata describing the origin and subsequent processing of various human brain data in systematic BI studies. A data provenance is a metadata set that describes related information about an experimental group, including the subject information, how the experimental data were collected, what instruments were used, etc. An analysis provenance is a metadata set that describes what processing a brain data set has undergone, including what analytic tasks were performed, what experimental data were used, what data features were extracted, etc. Data obtained from different experiments and data analyses need different data and analysis provenances, all of whose schemas can be extracted from the Data-Brain. For example, after completing a series of fMRI experiments with reversed-triangle inductive reasoning tasks [7], the following data resources are obtained: – a group of experimental materials corresponding to two tasks, the reversed-triangle inductive task and the reversed-triangle computing task; – a group of equipment parameters, such as TR = 2 s, 25 axial slices, etc.; – thirty experimental records describing the experimental processes, such as start time, operator, etc.; – thirty subject records describing the basic information about the subjects, such as name, age, handedness, etc.; – thirty fMRI data sets (twenty valid and ten invalid), each of which includes a series of whole-brain BOLD (blood oxygenation level dependent) images acquired with an EPI (echo planar imaging) sequence. From these data resources, we can identify key concepts such as "MRI" and "College-Student". Extracting these and other related concepts, as well as the relations among them, by a traversal of the ontological Data-Brain prototype yields an ontological view [11] of the Data-Brain. After some necessary revisions, including merging subclasses, removing duplicate relations, etc., the conceptual schema of the data provenance corresponding to the above data resources is obtained, as shown in Fig. 3. Furthermore, as shown in Fig. 4, the data provenance itself can be constructed by extracting related information from the above data resources and creating instances of the concepts and relations in the obtained schema (a small RDF sketch follows Fig. 4). The domain ontologies are BI-data-related ontologies. For describing BI data comprehensively, the knowledge base for BI data needs to include various kinds of formal domain knowledge, and integrating existing domain ontologies is a practical way to enrich it. For example, a brain cortex anatomy ontology [6] can be integrated into the knowledge base to describe the locations of found activations.
Fig. 3. An example of a data provenance schema: an Experimental-Group is linked by has-experimental-task to the Reversed-Triangle-Inductive-Task and Reversed-Triangle-Computing-Task (both subclasses of Reversed-Triangle-Task, with experimental purposes Numerical-Induction and Computation), by has-experimental-means to MRI, by has-experimental-material to Experimental-Materials, and by performs-experiment to fMRI-Experiment, which has-subject College-Student and has-result-data EPI-Sequence
Fig. 4. Data provenance construction based on the Data-Brain: concepts of the Data-Brain (e.g., Experimental-Group, Measuring-Instrument, Experimental-Task, Cognitive-Function, Unstructured-Original-Data) are instantiated as RDF resources, e.g., one Experimental-Group instance with an MRI instrument (f02), the tasks T12 and T13, thirty fMRI-Experiment instances such as fE95, thirty College-Student subjects such as S79, and 120 EPI-Sequence data sets
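As a concrete illustration of this construction step, the following sketch builds a small fragment of the Fig. 4 provenance with rdflib; the namespace URI and the instance identifiers follow the figure but should be read as illustrative rather than as the project's authoritative vocabulary:

from rdflib import Graph, Literal, Namespace, RDF

# Illustrative namespace; the real Data-Brain URIs may differ.
ED = Namespace("http://www.wici.org/Data-Brain/MetaData/")

g = Graph()
g.bind("ed", ED)

group = ED["Experimental-Group/eg01"]
experiment = ED["fMRI-Experiment/fE95"]

# An experimental group that ran the reversed-triangle fMRI experiments.
g.add((group, RDF.type, ED["Experimental-Group"]))
g.add((group, ED["has-experimental-means"], ED["MRI/f02"]))
g.add((group, ED["has-experimental-task"],
       ED["Reversed-Triangle-Inductive-Task/T12"]))
g.add((group, ED["performs-experiment"], experiment))

# One of the thirty experiments, linked to its subject record.
g.add((experiment, RDF.type, ED["fMRI-Experiment"]))
g.add((experiment, ED["has-subject"], ED["College-Student/S79"]))
g.add((experiment, ED["ID"], Literal("fE95")))

print(g.serialize(format="turtle"))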
5 The Multi-aspect Mining Process Planning Engine
The multi-aspect mining process planning engine is the core component of GLS-BI for implementing multi-aspect mining process planning (MMPP). Based on the semantic service descriptions in the service register center, it reasons over the knowledge base for BI data to dynamically generate all possible mining workflows for the different mining purposes of multi-aspect brain data analysis. According to our previous studies [10,21], a mining purpose of multi-aspect brain data analysis can be generalized as "aiming at an objective cognitive function, find some special data features". Given such a mining purpose, the planning engine implements Data-Brain driven multi-aspect mining process planning by the following steps:
– Step 1: Based on the objective cognitive function defined in the mining purpose, find suitable data agents as the data sources of multi-aspect brain data analysis. – Step 2: Based on the found data agents and the data features defined in the mining purpose, find suitable analysis agents as the atomic processes of mining workflows and organize them into a topology graph. – Step 3: Extract the multi-aspect mining workflows from the topology graph. During this process, the core issue is agent discovery, including data agent discovery and analysis agent discovery. Because Semantic Web Service technologies are used to construct the agents, agent discovery can be realized by semantic service matching. Related studies, such as the OWLS-UDDI matchmaker [12], mainly focus on I/O-concept based matchmaking, which requires a strict concept hierarchy. In our planning engine, concept based matchmaking is used for analysis agent discovery. The data dimension of the Data-Brain includes various BI data concepts, classified according to data formats, and can therefore provide the concept hierarchy needed for semantic analysis agent discovery. We use data concepts of the data dimension to annotate the properties "hasInput" and "hasOutput" of the OWL-S based service descriptions of analysis agents, and then compute the semantic matching degrees between service requests (i.e., the data of data agents, or the output data of analysis agents in the previous workflow step) and the input data of analysis agents, based on the concept hierarchy of the data dimension. The function "degreeOfMatch" stated in [12] is used to compute the semantic matching degrees; an analysis agent matches if the degree is "exact" or "plugIn" (a sketch is given below). However, this kind of concept based matchmaking cannot be used for data agent discovery, for two reasons. On one hand, recent research on human cognitive functions has obtained only preliminary results, and it is difficult to construct a strict concept hierarchy of cognitive functions based on a functional taxonomy for computing semantic matching degrees between the objective cognitive function and the cognitive functions corresponding to data agents. On the other hand, as stated above, thinking-centric studies of human cognitive functions are complex and involve multiple inter-related cognitive functions for a given task. Thus, for a cognitive function such as reasoning, not only the data agents holding data obtained from inductive reasoning tasks, deductive reasoning tasks, etc. are suitable data sources for multi-aspect brain data analysis; data agents holding data obtained from computing tasks, attention tasks, etc. can also be suitable sources. Obviously, concept based matchmaking cannot find such varied data agents, because it uses only the hierarchical relations among concepts to compute semantic matching degrees. In our studies, instance based matchmaking is adopted for data agent discovery. BI provenances include many instances of experimental task concepts. We use these instances to annotate the semantic service descriptions of data agents, and then find suitable data agents using query and rule based reasoning.
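The degree-of-match computation referenced above can be sketched as follows; the concept hierarchy is a toy fragment of the data dimension, and the exact/plugIn/subsumes labels are a simplified reading of the matchmaker cited as [12], not its exact published semantics:

# Concept hierarchy as child -> parent (toy data-dimension fragment).
PARENT = {
    "EPI-Sequence": "Original-fMRI-Data",
    "Original-fMRI-Data": "Unstructured-Original-Data",
    "Original-ERP-Data": "Unstructured-Original-Data",
}

def ancestors(concept):
    """Walk up the hierarchy from a concept to the root."""
    while concept in PARENT:
        concept = PARENT[concept]
        yield concept

def degree_of_match(requested, provided):
    """Simplified I/O concept matchmaking."""
    if requested == provided:
        return "exact"
    if requested in ancestors(provided):
        return "plugIn"    # provided concept is more specific
    if provided in ancestors(requested):
        return "subsumes"  # provided concept is more general
    return "fail"

# Only "exact" and "plugIn" qualify an analysis agent.
assert degree_of_match("Original-fMRI-Data", "EPI-Sequence") == "plugIn"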
For example, suppose we define "Reasoning" as the objective cognitive function. According to the experimental-task-instance based data agent descriptions, the data agents holding data of reversed-triangle inductive tasks can be found as data sources by querying an inference model that combines the Data-Brain and the BI provenances. However, to support comparative analysis, a further data agent discovery step is needed to find the data agents whose experimental tasks are "functionally-related-to" reversed-triangle inductive tasks, i.e., whose data share some spatio-temporal features with the data of reversed-triangle inductive tasks according to the results of previous data analyses. Because experimental task instances are used in the data agent descriptions, this further discovery can be realized by querying, in the inference model, the experimental task instances connected to one or several instances of the experimental task concept "Reversed-Triangle-Inductive-Task" by the relationship "functionally-related-to". The relationship "functionally-related-to" is implicit and can be generated by reasoning with the following custom Jena rules [22]:

– (?x rdf:type Experimental-Task), (?y produced-by ?x), (?y rdf:type Structured-Data-Feature) → (?x has-feature ?y)
– (?x has-feature ?z1), (?y has-feature ?z2), (?z1 similar-to ?z2), notEqual(?x, ?y) → (?x functionally-related-to ?y)

Obviously, this kind of instance based matchmaking is better suited to finding such varied data agents. The remaining work of planning engine construction is straightforward: we use dataflow driven service composition [17] to organize analysis agents into a topology graph, and then extract mining workflows from the topology graph using a depth-first traversal of the directed graph.
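Jena itself is a Java framework, but the logic of these two rules can be mirrored by a toy forward-chaining pass over an in-memory triple set; the instance names below are invented for illustration:

# Triples are (subject, predicate, object) tuples with invented names.
triples = {
    ("T12", "rdf:type", "Experimental-Task"),
    ("T30", "rdf:type", "Experimental-Task"),
    ("F1", "rdf:type", "Structured-Data-Feature"),
    ("F2", "rdf:type", "Structured-Data-Feature"),
    ("F1", "produced-by", "T12"),
    ("F2", "produced-by", "T30"),
    ("F1", "similar-to", "F2"),
}

def typed(t):
    return {s for (s, p, o) in triples if p == "rdf:type" and o == t}

# Rule 1: a task has-feature every structured feature produced by it.
for (f, p, x) in list(triples):
    if p == "produced-by" and x in typed("Experimental-Task") \
            and f in typed("Structured-Data-Feature"):
        triples.add((x, "has-feature", f))

# Rule 2: tasks with similar features are functionally-related-to.
feats = {}
for (x, p, f) in triples:
    if p == "has-feature":
        feats.setdefault(x, set()).add(f)
for (f1, p, f2) in list(triples):
    if p == "similar-to":
        for x, fx in feats.items():
            for y, fy in feats.items():
                if x != y and f1 in fx and f2 in fy:
                    triples.add((x, "functionally-related-to", y))

assert ("T12", "functionally-related-to", "T30") in triples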
6 Prototype System
A prototype system has been developed. It includes three main components: a provenance creating module, the multi-aspect mining process planning engine, and a management and controlling module. The provenance creating module provides GUIs for creating Data-Brain based BI provenances in RDF (Resource Description Framework) format. Users can complete the descriptions of experiments and data analyses guided by a series of GUIs, which simplifies and standardizes the gathering of multi-aspect data information. The multi-aspect mining process planning engine is the core component of the prototype system. It assists users in carrying out MMPP through the GUIs shown in Fig. 5. All possible mining workflows are generated to guide the multi-aspect brain data analysis; some workflows reflect the analytical approaches used in previous studies, and others reveal neglected analytical approaches.
Fig. 5. The GUIs for multi-aspect mining process planning: step 1, purpose definition; step 2, Data-Brain driven data selection; step 3, Data-Brain driven analysis agent discovery; step 4, workflow extraction
The management and controlling module is used to manage the service register center and to control the execution of workflows. In fact, the prototype system is not intended to provide a distributed analytical environment, but rather to demonstrate a Data-Brain driven approach for integrating existing BI data and analytical resources. Thus, this module only provides basic information about the agents; users need to call each agent manually according to the workflow descriptions.
7 Conclusions and Future Work
Because of the complexity of human thinking-centric studies, BI adopts a systematic research methodology. As an important issue of this methodology, systematic human brain data management needs to realize not only storage- and data-publishing-oriented management but also systematic-analysis-oriented management. However, traditional brain databases cannot support such systematic human brain data management. This paper proposed a Data-Brain based integration framework, GLS-BI, to realize systematic-analysis-oriented brain data management. GLS-BI offsets the disadvantages of existing brain databases and provides a practical approach towards systematic human brain data management. Our study has obtained only preliminary results, and more work needs to be done: – The multi-aspect mining process planning engine provides all possible mining workflows. As the data and analytical resources in GLS-BI grow, it becomes difficult to find valuable workflows among the large number
of possible workflows. Hence, workflow evaluation needs to be studied further to simplify the discovery of valuable workflows. – Our recent study focuses only on the functions of agents: the description, discovery, and organization of agents are based on their functions. This is only suitable for a small-scale prototype system. For constructing a global integration framework, further study needs to address more agent properties, such as QoS (Quality of Service), security, etc.
Acknowledgment The work is partially supported by the China Scholarship Council (CSC) (File No. 2009654018), National Natural Science Foundation of China (Number: 60905027), Beijing Natural Science Foundation (4102007), Open Foundation of Key Laboratory of Multimedia and Intelligent Software (Beijing University of Technology), Beijing, and Support Center for Advanced Telecommunications Technology Research, Foundation (SCAT), Japan.
References
1. Brazdil, P., Soares, C.: Metalearning: Applications to Data Mining. Springer, Heidelberg (2009)
2. Buhler, P.A., Vidal, J.M.: Towards Adaptive Workflow Enactment Using Multiagent Systems. Information Technology and Management Journal 6(1), 61–87 (2005)
3. Chen, J.H., Zhong, N.: Data-Brain Modeling Based on Brain Informatics Methodology. In: Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2008), pp. 41–47. IEEE Computer Society Press, Los Alamitos (2008)
4. Chen, J.H., Zhong, N.: Data-Brain Modeling Based on Brain Informatics Methodology. In: Zhong, N., Li, K., Lu, S., Chen, L. (eds.) BI 2009. LNCS (LNAI), vol. 5819, pp. 182–193. Springer, Heidelberg (2009)
5. Finnerup, N.B., Fuglsang-Frederiksen, A., Rossel, P., Jennum, P.: A Computer-based Information System for Epilepsy and Electroencephalography. International Journal of Medical Informatics 55, 127–134 (1999)
6. Golbreich, C., Dameron, O., Gibaud, B., Burgun, A.: Web Ontology Language Requirements w.r.t Expressiveness of Taxonomy and Axioms in Medicine. In: Fensel, D., Sycara, K., Mylopoulos, J. (eds.) ISWC 2003. LNCS, vol. 2870, pp. 180–194. Springer, Heidelberg (2003)
7. Liang, P.P., Zhong, N., Lu, S.F., Liu, J.M., Yao, Y.Y., Li, K.C., Yang, Y.H.: The Neural Mechanism of Human Numerical Inductive Reasoning Process: A Combined ERP and fMRI Study. In: Zhong, N., Liu, J., Yao, Y., Wu, J., Lu, S., Li, K. (eds.) WImBI 2006. LNCS (LNAI), vol. 4845, pp. 223–243. Springer, Heidelberg (2007)
8. MacKenzie-Graham, A.J., Van Horn, J.D., Woods, R.P., Crawford, K.L., Toga, A.W.: Provenance in Neuroimaging. NeuroImage 42, 178–195 (2008)
9. Motomura, S., Hara, A., Zhong, N., Lu, S.F.: POM Centric Multi-aspect Data Analysis for Investigating Human Problem Solving Function. In: Raś, Z.W., Tsumoto, S., Zighed, D.A. (eds.) MCD 2007. LNCS (LNAI), vol. 4944, pp. 252–264. Springer, Heidelberg (2008)
10. Motomura, S., Zhong, N.: Multi-aspect Data Analysis for Investigating Human Computation Mechanism. Cognitive Systems Research 11(1), 3–15 (2010)
11. Noy, N.F., Musen, M.A.: Specifying Ontology Views by Traversal. In: McIlraith, S.A., Plexousakis, D., van Harmelen, F. (eds.) ISWC 2004. LNCS, vol. 3298, pp. 713–725. Springer, Heidelberg (2004)
12. Paolucci, M., Kawamura, T., Payne, T.R., Sycara, K.: Semantic Matching of Web Services Capabilities. In: Horrocks, I., Hendler, J. (eds.) ISWC 2002. LNCS, vol. 2342, pp. 333–347. Springer, Heidelberg (2002)
13. Simmhan, Y.L., Plale, B., Gannon, D.: A Survey of Data Provenance in e-Science. SIGMOD Record 34, 31–36 (2005)
14. Van Horn, J.D., Grethe, J.S., Kostelec, P., Woodward, J.B., Aslam, J.A., Rus, D., Rockmore, D., Gazzaniga, M.S.: The Functional Magnetic Resonance Imaging Data Center (fMRIDC): The Challenges and Rewards of Large-scale Databasing of Neuroimaging Studies. Philosophical Transactions of the Royal Society B: Biological Sciences 356(1412), 1323–1339 (2001)
15. Van Horn, J.D., Gazzaniga, M.S.: Maximizing Information Content in Shared and Archived Neuroimaging Studies of Human Cognition. In: Koslow, S.H., Subramaniam, S. (eds.) Databasing the Brain, pp. 449–458. Wiley, Chichester (2005)
16. Wang, S., Shen, W., Hao, Q.: An Agent-based Web Service Workflow Model for Inter-enterprise Collaboration. Expert Systems with Applications 31, 787–799 (2006)
17. Wang, M., Du, Z.H., Chen, Y.N., Zhu, S.H., Zhu, W.H.: Dynamic Dataflow Driven Service Composition Mechanism for Astronomy Data Processing. In: Proceedings of the 2007 IEEE International Conference on e-Business Engineering (ICEBE 2007), pp. 596–599 (2007)
18. Zhong, N., Liu, C., Ohsuga, S.: Dynamically Organizing KDD Processes. International Journal of Pattern Recognition and Artificial Intelligence 15(3), 451–473 (2001)
19. Zhong, N.: Impending Brain Informatics (BI) Research from Web Intelligence (WI) Perspective. International Journal of Information Technology and Decision Making 5(4), 713–727 (2006)
20. Zhong, N., Liu, J.M., Yao, Y.Y., Wu, J.L., Lu, S.F., Qin, Y.L., Li, K.C., Wah, B.: Web Intelligence Meets Brain Informatics. In: Zhong, N., Liu, J., Yao, Y., Wu, J., Lu, S., Li, K. (eds.) WImBI 2006. LNCS (LNAI), vol. 4845, pp. 1–31. Springer, Heidelberg (2007)
21. Zhong, N., Motomura, S.: Agent-enriched Data Mining: A Case Study in Brain Informatics. IEEE Intelligent Systems 24(3), 38–45 (2009)
22. Jena 2 Inference Support, http://jena.sourceforge.net/inference/
23. LONI Pipeline, http://pipeline.loni.ucla.edu
24. Neocortical Microcircuit Database (NMDB), http://microcircuit.epfl.ch
25. Olfactory Receptor DataBase (ORDB), http://senselab.med.yale.edu/ORDB/
The Role of the Parahippocampal Cortex in Memory Encoding and Retrieval: An fMRI Study
Mi Li1,2, Shengfu Lu1, Jiaojiao Li1, and Ning Zhong1,3
1 International WIC Institute, Beijing University of Technology, Beijing 100022, P.R. China
[email protected]
2 Liaoning ShiHua University, Liaoning, 113001, P.R. China
[email protected]
3 Dept. of Life Science and Informatics, Maebashi Institute of Technology, Maebashi-City 371-0816, Japan
[email protected]
Abstract. A large number of studies have provided evidence of parahippocampal cortex (PHC) activation during episodic memory, which involves two major processes: encoding and retrieval. To date, the differential role of the PHC in memory encoding and retrieval remains unclear. We examined this issue in an fMRI study by measuring the brain activity of 36 normal subjects. The tasks consisted of encoding and retrieving information in two presentation forms (text and figure). At encoding, subjects were required to read and comprehend the meaning of the text or figure; at retrieval, they were asked to make judgments about the content of their reading comprehension. A direct comparison between encoding and retrieval, irrespective of text and figure, showed that encoding activated the left PHC (BA36) significantly more than retrieval; in contrast, retrieval produced no significantly greater PHC activation than encoding. These results suggest that the PHC is more involved in memory encoding. In addition, more regions in the PHC were significantly activated for figures than for text, irrespective of encoding and retrieval, indicating that the PHC contributes more to figures than to text.
1 Introduction
The parahippocampal cortex (PHC), located in the medial temporal lobe (MTL), is believed to play an important role in episodic memory, spatial analysis, and contextual association processing, according to many previous neuroimaging studies. The specific role of the PHC is still a hot topic in cognitive neuroscience. Memory encoding and retrieval are the two major processes of episodic memory. Numerous previous studies of memory encoding have reported PHC activation using visuo-spatial materials [1]-[5], verbal materials [6,7], or both [8,9]. In one of the first such studies, Stern et al. [1] reported activation of the posterior aspects of the hippocampus and the PHC while participants viewed colored magazine pictures. Subsequent studies [4,5,8] compared
the encoding of scene pictures with a control condition, and the results also suggest that the encoding condition produced bilateral PHC activation. PHC responses during encoding extend beyond visuo-spatial materials and may involve other processes [6]. Wagner et al. [7] designed two experiments, one using a blocked design and one using event-related fMRI, to examine verbal encoding. In the block-design experiment, participants made semantic encoding judgments, and the fMRI data showed significantly greater activation of the left PHC. In the event-related experiment, to determine whether left PHC responses relate to memory performance, they further analyzed the fMRI data according to whether a word was subsequently remembered or forgotten, and found significantly different activation in the left PHC between remembered and forgotten words. The combined findings of nonverbal and verbal tasks indicate some lateralization of PHC responses during encoding: verbal tasks particularly activate the left PHC, whereas nonverbal tasks recruit the PHC bilaterally. Consistent with these observations, Kelly et al. [9] reported a similar pattern of lateralization when subjects took part in three encoding tasks involving words, objects, and unfamiliar faces. In contrast to memory encoding, only a handful of fMRI studies have reported PHC activation during memory retrieval. Aguirre et al. [2] examined retrieval of a recently learned complex environment. The results showed bilateral PHC activation during retrieval, similar to their findings for encoding of the environment. Aguirre and D'Esposito [3] scanned subjects while they attempted to recall different aspects of a "virtual town"; there was also significant bilateral PHC activation during memory retrieval. Another study by Schacter [10,11], using recognition of previously studied words, provided evidence of left PHC activation during retrieval. In addition, a review by Lepage et al. [12] suggests that the anterior MTL is strongly associated with episodic encoding, whereas the posterior MTL is strongly associated with episodic retrieval. To date, however, few studies have directly compared memory encoding and retrieval in the PHC. To address this issue, the present event-related fMRI study focused on the specific role of the PHC in encoding and retrieving two presentation forms: text and figure.
2 Methods

2.1 Participants
Thirty-six native Chinese-speaking, right-handed normal participants took part in the experiment (18 males and 18 females), with a mean age of 22.5 (SD 1.7) years. None of the participants reported any history of neurological or psychiatric diseases. All participants gave informed consent, and the study was approved by the Ethical Committee of Xuanwu Hospital of Capital Medical University and the Institutional Review Board of the Beijing University of Technology.
2.2 Materials and Procedure
In our study, the stimulus materials were selected from commonly discussed information in daily life, such as: "In 2007, the five top countries in global transgenic crop area were: first, USA, 57.7 million hectares; second, Argentina, 19.1; third, Brazil, 11.5; then Canada, 6.1 and India, 3.8." The same content was described in two presentation forms: text and figure. We presented four kinds of stimuli to subjects: text encoding, text retrieval, figure encoding and figure retrieval. At the encoding stage, each text was presented for 14 seconds and each figure for 16 seconds, and subjects were required to read and comprehend the textual or graphic information. At the retrieval stage, subjects were required to make judgments about the content of the previous reading, and to press the corresponding button (right or left) as quickly as possible. The experiment consisted of four sessions, and four sessions were collected per participant. The order of the different stimulus types within each session was randomized in an event-related design. The participants were instructed to read the text and figure information attentively. The images from the initial 10 s of each session were discarded because of unsteady magnetization, and the remaining images per session were analyzed.

2.3 Image Acquisition
Blood oxygenation level-dependent fMRI signal data were collected from each participant using a Siemens 3-T Trio scanner (Trio system; Siemens Magnetom scanner, Erlangen, Germany). Functional data were acquired using a gradient-echo echo-planar pulse sequence (TR = 2000 ms, TE = 31 ms, FA = 90°, matrix size = 64 × 64, 30 axial slices parallel to the AC-PC plane, voxel = 4 × 4 × 4 mm, 0.8 mm inter-slice gap, FOV = 240 × 240 mm). High-resolution T1-weighted anatomical images were collected in the same plane as the functional images using a spin echo sequence with the following parameters: TR = 130 ms, TE = 2.89 ms, FA = 70°, matrix size = 320 × 320, 30 axial slices parallel to the AC-PC plane, voxel = 0.8 × 0.8 × 4 mm, FOV = 240 × 240 mm.

2.4 Data Analysis
Data analysis was performed with SPM2 from the Wellcome Department of Cognitive Neurology, London, UK, implemented in Matlab 7.0 from The MathWorks, Sherborne, MA, USA. MNI coordinates were transformed into Talairach coordinates (Talairach and Tournoux, 1988). The first two scans were discarded from the analysis to eliminate nonequilibrium effects of magnetization. The functional images of each participant were corrected for slice timing, and all volumes were spatially realigned to the first volume (head movement was < 2 mm in all cases). A mean image created from the realigned volumes was coregistered
with the structural T1 volume, and the structural volumes were spatially normalized to the Montreal Neurological Institute (MNI) EPI template using nonlinear basis functions. Images were resampled into 2-mm cubic voxels and then spatially smoothed with a Gaussian kernel of 8 mm full-width at half-maximum (FWHM). The stimulus onsets of the trials for each condition were convolved with the canonical form of the hemodynamic response function (HRF) as defined in SPM2. Statistical inferences were drawn on the basis of the general linear model as implemented in SPM2. Linear contrasts were calculated for the comparisons between conditions. The contrast images were then entered into a second-level analysis (random effects model) to extend statistical inference about activity differences to the population from which the participants were drawn. Activations are reported for clusters of 50 contiguous voxels (400 mm³) that surpassed a corrected threshold of p < .05 at the cluster level.
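The second-level step can be illustrated with a minimal sketch (not the authors' SPM2 pipeline): a voxel-wise one-sample t-test over the per-subject contrast images. The array shape below is a placeholder, not the study's actual image dimensions.

```python
# Hedged sketch of a random-effects (second-level) analysis: test whether the
# mean contrast value differs from zero at each voxel across subjects.
import numpy as np
from scipy import stats

n_subjects = 36                                      # as in the study
contrasts = np.random.randn(n_subjects, 64, 64, 30)  # placeholder contrast images
t_map, p_map = stats.ttest_1samp(contrasts, popmean=0.0, axis=0)
# t_map / p_map can then be thresholded (e.g., corrected p < .05, cluster extent).
```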
3 Results

3.1 Behavioral Results of the fMRI Study
Behavioral accuracy was greater than 0.75 in each of the memory retrieval tasks under scanning, indicating that the brain activity being measured was associated with successful memory encoding and retrieval in all tasks (Table 1). A two-way ANOVA showed no significant difference between text and figure in behavioral accuracy [F(1, 70) = 0.03, P = 0.87 > 0.05], which indicates that the text and figure tasks were homogeneous.

Table 1. Behavioral results during the fMRI experiment

         Accuracy (% correct)   Reaction time (s)
Text     78.62 ± 8.76           3.88 ± 0.57
Figure   78.96 ± 9.29           4.17 ± 0.61
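For illustration, the text-vs.-figure accuracy comparison can be sketched as follows (not the authors' code; the synthetic values merely stand in for the 36 subjects' accuracies per condition, reproducing the F(1, 70) degrees-of-freedom structure):

```python
# Hedged sketch of testing the text-vs.-figure accuracy difference.
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "accuracy": np.r_[rng.normal(78.62, 8.76, 36),   # synthetic text-condition values
                      rng.normal(78.96, 9.29, 36)],  # synthetic figure-condition values
    "form": np.repeat(["text", "figure"], 36),
})
print(anova_lm(ols("accuracy ~ C(form)", data=df).fit()))  # form factor: F(1, 70)
```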
3.2 fMRI Results
Direct comparison between memory encoding and retrieval. As shown in Fig. 1a and Table 2, the main effect of encoding vs. retrieval showed significantly greater activation in the left PHC (Talairach: -24, -32, -17, BA36, L). We then performed two additional comparisons: figure encoding vs. figure retrieval and text encoding vs. text retrieval. Figure encoding produced significantly greater activation in the anterior PHC (Talairach: -22, -13, -21, BA28, L) than figure retrieval, and text encoding produced significantly greater activation in the posterior PHC (Talairach: -24, -32, -17, BA36, L) than text retrieval. In contrast, retrieval did not activate the PHC significantly more than encoding in any condition (see Fig. 1b). These results suggest that the left PHC is more involved in encoding than in retrieval.
Fig. 1. Statistical parametric map (SPM) through the subjects' normalized averaged brains of the PHC for comparisons of encoding and retrieval: (a) encoding > retrieval; (b) retrieval > encoding. All statistical parametric t maps of the contrasts were thresholded at t > 5.63 (p < 0.05, corrected) with an extent threshold of 80 mm³. FR, figure encoding; TR, text encoding; FA, figure retrieval; TA, text retrieval

Fig. 2. Statistical parametric map (SPM) through the subjects' normalized averaged brains of the PHC for comparisons of figure and text: (a) figure > text; (b) text > figure. All statistical parametric t maps of the contrasts were thresholded at t > 5.63 (p < 0.05, corrected) with an extent threshold of 80 mm³. FR, figure encoding; TR, text encoding; FA, figure retrieval; TA, text retrieval
Direct comparison between figure and text. To further investigate presentation-form specificity in the PHC, we directly compared text and figure regardless of encoding and retrieval. As shown in Fig. 2a and Table 2, the main effect of figure vs. text showed significantly greater activation in the bilateral PHC (Talairach: -30, -45, -10, BA37, L and Talairach: 30, -43, -10, BA37, R), with the left greater than the right. We then performed the figure-vs.-text subtraction in the two processes separately. During encoding, figure produced significantly greater activation in the bilateral PHC (Talairach: -32, -44, -8, BA37, L and Talairach: 30, -55, -9, BA19, R), with the left greater than the right; during retrieval, figure produced significantly greater activation only in the right PHC (Talairach: 32, -28, -17, BA36, R). In contrast, text did not activate the PHC significantly more than figure in any condition (see Fig. 2b). These results indicate that the PHC contributes more to figure than to text.
4 Discussion
A major question addressed by the present investigation was what differential role the PHC plays in memory encoding and retrieval. The direct comparison between memory encoding and retrieval showed that the left PHC was more strongly activated by encoding than by retrieval, whereas there was no significant PHC activation for retrieval vs. encoding (see Fig. 1a). These results suggest that the PHC is more related to memory encoding than to retrieval, which is supported by previous studies showing that episodic encoding elicited a higher response in the left PHC than episodic retrieval of the same type of information [13,14]. Some reviews [15,16] also concluded that a variety of fMRI studies have found that the PHC, particularly the posterior PHC, plays an important role in encoding, whereas only a few studies have reported similar activation during retrieval. During encoding, subjects read and comprehended the text or figure information, which requires the continuous integration of a variety of information and the formation of novel associations among items. Evidence from a study on novel word recognition showed that encoding of novel words elicited greater activation in the medial part of the left PHC than word recognition [17]. Other studies also reported similar results, namely that the bilateral PHC was more activated during encoding of novel pictures [1,4,8]. In contrast, during retrieval, subjects were only required to retrieve and make judgments about the previously encoded content, which does not involve such information integration and association. Therefore, the PHC responds more strongly to memory encoding than to retrieval. In addition, the main effect of encoding vs. retrieval activated the left PHC (Talairach: -24, -32, -17, BA36), and the same region was also engaged in text encoding vs. text retrieval, indicating that this region may be more associated with semantic encoding. Most previous neuroimaging studies demonstrated that the left PHC participates in the semantic encoding of pictures and words [13,14,18]. Furthermore, figure encoding activated the left PHC at a more anterior location (Talairach: -22, -13, -21, BA28) than the activation (Talairach: -24, -32, -17, BA36) found in text encoding vs. text retrieval. The anterior PHC has been reported in a number of
Table 2. Brain activations within the parahippocampal cortex (PHC) related to the different contrasts

                          Coordinates^a
Contrast / Region          x     y     z      t     Cluster size (mm³)
Encoding > Retrieval
(FR + TR) > (FA + TA)
  Lt. PHC (BA36)          -24   -32   -17    7.10    1816
FR > FA
  Lt. PHC (BA28)          -22   -13   -21    6.33     200
TR > TA
  Lt. PHC (BA36)          -24   -32   -17    6.45     240
Figure > Text
(FR + FA) > (TR + TA)
  Lt. PHC (BA37)          -30   -45   -10   17.78   10392
  Rt. PHC (BA37)           30   -43   -10    9.23    9552
FR > TR
  Lt. PHC (BA37)          -32   -44    -8   20.10    9768
  Rt. PHC (BA19)           30   -55    -9   13.34    5216
FA > TA
  Rt. PHC (BA36)           32   -28   -17   11.34    3680

^a The Talairach coordinates of the centroid and associated maximum t within contiguous regions are reported. BA, Brodmann area; FR, figure reading; TR, text reading; FA, figure-related retrieval; TA, text-related retrieval; Lt, left hemisphere; Rt, right hemisphere; PHC, parahippocampal cortex.
studies to be specifically activated in relational processing of nonspatial stimuli, such as associating abstract nouns [19] or face-name pairs [20]. Moreover, recent studies on contextual associations also found that a similar region was recruited by nonspatial associations [21]-[23]. As the semantic associative information to be encoded from a figure is less explicit than that from text, the more anterior left PHC serves figure encoding. Thus, it seems likely that in the present study the activation of the left PHC in encoding vs. retrieval reflects that the left PHC is more important for semantic encoding of text and figure information than for retrieval. To further investigate the differences between figure and text in the PHC, we directly compared text and figure regardless of encoding and retrieval. The results showed that the bilateral PHC (Talairach: -30, -45, -10, BA37, L and Talairach: 30, -43, -10, BA37, R) was more activated in figure vs. text, whereas there was no significant PHC activation in text vs. figure (see Fig. 2). The combined findings of previous studies on nonverbal and verbal tasks in the PHC indicate some lateralization of PHC responses during encoding: verbal tasks particularly activated the left PHC, whereas nonverbal tasks recruited the PHC bilaterally [1]-[8]. In our study, whether during encoding or retrieval,
the right PHC was recruited by figure information. This activation is near the fusiform gyrus, which is known to be associated with visuospatial processing during spatial memory [24]-[28] and spatial navigation [2,29,30]. Recent studies using various stimuli with contextual associations also found that the posterior PHC plays a key role in processing spatial contextual associations [21]-[23]. Thus, the present results suggest that the PHC is more involved in figure encoding and retrieval.
5 Conclusion
This study investigated the differential role of the PHC in memory encoding and retrieval. The results showed that encoding activated the left PHC significantly more than retrieval, irrespective of text and figure; in contrast, retrieval did not activate the PHC significantly more than encoding, which suggests that encoding is more dependent on the PHC. In addition, the finding of more significantly activated PHC regions for figure than for text, irrespective of encoding and retrieval, indicates that figure processing is more dependent on the PHC.
Acknowledgements

This work is partially supported by the National Science Foundation of China (No. 60775039 and No. 60905027), the 8th Graduate Science and Technology Foundation of Beijing University of Technology (No. ykj-2010-3409), the grant-in-aid for scientific research (No. 18300053) from the Japanese Ministry of Education, Culture, Sports, Science and Technology, and the Open Foundation of the Key Laboratory of Multimedia and Intelligent Software Technology (Beijing University of Technology), Beijing.
References

1. Stern, C.E., Corkin, S., Gonzalez, R.G., Guimaraes, A.R., Baker, J.R., Jennings, P.J., Carr, C.A., Sugiura, R.M., Vedantham, V., Rosen, B.R.: The hippocampal formation participates in novel picture encoding: Evidence from functional magnetic resonance imaging. Proceedings of the National Academy of Sciences of the United States of America 93, 8660–8665 (1996)
2. Aguirre, G.K., Detre, J.A., Alsop, D.C., D'Esposito, M.: The parahippocampus subserves topographical learning in man. Cereb Cortex 6, 823–829 (1996)
3. Aguirre, G.K., D'Esposito, M.: Environmental knowledge is subserved by separable dorsal/ventral neural areas. J. Neurosci. 17, 2512–2518 (1997)
4. Gabrieli, J., Brewer, J.B., Desmond, J.E., Glover, G.H.: Separate neural bases of two fundamental memory processes in the human medial temporal lobe. Science 276, 264–266 (1997)
5. Brewer, J.B., Zhao, Z., Desmond, J.E., Glover, G.H., Gabrieli, J.: Making memories: Brain activity that predicts how well visual experience will be remembered. Science 281, 1185–1187 (1998)
6. Fernandez, G., Weyerts, H., Schrader-Bolsche, M., Tendolkar, I., Smid, H., Tempelmann, C., Hinrichs, H., Scheich, H., Elger, C.E., Mangun, G.R., Heinze, H.J.: Successful verbal encoding into episodic memory engages the posterior hippocampus: A parametrically analyzed functional magnetic resonance imaging study. J. Neurosci. 18, 1841–1847 (1998)
7. Wagner, A.D., Schacter, D.L., Rotte, M., Koutstaal, W., Maril, A., Dale, A.M., Rosen, B.R., Buckner, R.L.: Building memories: Remembering and forgetting of verbal experiences as predicted by brain activity. Science 281, 1188–1191 (1998)
8. Rombouts, S., Machielsen, W., Witter, M.P., Barkhof, F., Lindeboom, J., Scheltens, P.: Visual association encoding activates the medial temporal lobe: A functional magnetic resonance imaging study. Hippocampus 7, 594–601 (1997)
9. Kelley, W.M., Miezin, F.M., McDermott, K.B., Buckner, R.L., Raichle, M.E., Cohen, N.J., Ollinger, J.M., Akbudak, E., Conturo, T.E., Snyder, A.Z., Petersen, S.E.: Hemispheric specialization in human dorsal frontal cortex and medial temporal lobe for verbal and nonverbal memory encoding. Neuron 20, 927–936 (1998)
10. Schacter, D.L., Uecker, A., Reiman, E., Yun, L.S., Bandy, D., Chen, K.W., Cooper, L.A., Curran, T.: Effects of size and orientation change on hippocampal activation during episodic recognition: A PET study. Neuroreport 8, 3993–3998 (1997)
11. Schacter, D.L., Buckner, R.L., Koutstaal, W., Dale, A.M., Rosen, B.R.: Late onset of anterior prefrontal activity during true and false recognition: An event-related fMRI study. Neuroimage 6, 259–269 (1997)
12. Lepage, M., Habib, R., Tulving, E.: Hippocampal PET activations of memory encoding and retrieval: The HIPER model. Hippocampus 8, 313–322 (1998)
13. Wiggs, C.L., Weisberg, J., Martin, A.: Neural correlates of semantic and episodic memory retrieval. Neuropsychologia 37, 103–118 (1999)
14. Kohler, S., Moscovitch, M., Winocur, G., McIntosh, A.R.: Episodic encoding and recognition of pictures and words: role of the human medial temporal lobes. Acta Psychologica 105, 159–179 (2000)
15. Schacter, D.L., Wagner, A.D.: Medial temporal lobe activations in fMRI and PET studies of episodic encoding and retrieval. Hippocampus 9, 7–24 (1999)
16. Greicius, M.D., Krasnow, B., Boyett-Anderson, J.M., Eliez, S., Schatzberg, A.F., Reiss, A.L., Menon, V.: Regional analysis of hippocampal activation during memory encoding and retrieval: fMRI study. Hippocampus 13, 164–174 (2003)
17. Jessen, F., Flacke, S., Granath, D.O., Manka, C., Scheef, L., Papassotiropoulos, A., Schild, H.H., Heun, R.: Encoding and retrieval related cerebral activation in continuous verbal recognition. Cognitive Brain Res. 12, 199–206 (2001)
18. Vandenberghe, R., Price, C., Wise, R., Josephs, O., Frackowiak, R.: Functional anatomy of a common semantic system for words and pictures. Nature 383, 254–256 (1996)
19. Henke, K., Buck, A., Weber, B., Wieser, H.G.: Human hippocampus establishes associations in memory. Hippocampus 7, 249–256 (1997)
20. Sperling, R., Chua, E., Cocchiarella, A., Rand-Giovannetti, E., Poldrack, R., Schacter, D.L., Albert, M.: Putting names to faces: Successful encoding of associative memories activates the anterior hippocampal formation. Neuroimage 20, 1400–1410 (2003)
21. Bar, M., Aminoff, E.: Cortical analysis of visual context. Neuron 38, 347–358 (2003)
22. Aminoff, E., Gronau, N., Bar, M.: The parahippocampal cortex mediates spatial and nonspatial associations. Cereb Cortex 17, 1493–1503 (2007)
23. Bar, M., Aminoff, E., Schacter, D.L.: Scenes unseen: The parahippocampal cortex intrinsically subserves contextual associations, not scenes or places per se. J. Neurosci. 28, 8539–8544 (2008)
24. Bird, C.M., Burgess, N.: The hippocampus and memory: insights from spatial processing. Nat. Rev. Neurosci. 9, 182–194 (2008)
25. Epstein, R.A., Kanwisher, N.: A cortical representation of the local visual environment. Nature 392, 598–601 (1998)
26. Epstein, R.A., Harris, A., Stanley, D., Kanwisher, N.: The parahippocampal place area: Recognition, navigation, or encoding? Neuron 23, 115–125 (1999)
27. Epstein, R.A.: Parahippocampal and retrosplenial contributions to human spatial navigation. Trends Cogn. Sci. 12, 388–396 (2008)
28. Epstein, R.A., Ward, E.J.: How Reliable Are Visual Context Effects in the Parahippocampal Place Area? Cereb. Cortex 20, 294–303 (2010)
29. Maguire, E.A., Frackowiak, R., Frith, C.D.: Recalling routes around London: Activation of the right hippocampus in taxi drivers. J. Neurosci. 17, 7103–7110 (1997)
30. Mellet, E., Bricogne, S., Tzourio-Mazoyer, N., Ghaem, O., Petit, L., Zago, L., Etard, O., Berthoz, A., Mazoyer, B., Denis, M.: Neural correlates of topographic mental exploration: The impact of route versus survey perspective learning. Neuroimage 12, 588–600 (2000)
Brain Activation and Deactivation in Human Inductive Reasoning: An fMRI Study Peipeng Liang1,2,3 , Yang Mei1,3 , Xiuqin Jia2,3 , Yanhui Yang2,3 , Shengfu Lu1,3 , Ning Zhong1,3,4, and Kuncheng Li2,3 1
International WIC Institute, Beijing University of Technology Beijing 100124, China [email protected], [email protected], [email protected] 2 Xuanwu Hospital, Capital Medical University Beijing 100053, China {xiuqin.jia,yanhui826,lkc1955}@gmail.com 3 Beijing Municipal Lab of Brain Informatics Beijing 100124, China 4 Dept. of Life Science and Informatics, Maebashi Institute of Technology Maebashi-City 371-0816, Japan [email protected]
Abstract. In order to study the cognitive neural mechanism of human inductive reasoning, both positive and negative activation should be examined together. However, most studies focus only on positive activation, and negative activation in inductive reasoning has not been reported. The present study examines the two aspects simultaneously. Two experimental tasks were designed according to the magnitude of shared attributes: sharing two common attributes (2T) and sharing one common attribute (1T), with rest acting as the control task. The 2T and 1T tasks are both inductive reasoning tasks; the 2T task contains a component of perceptual feature integration, while 1T does not. Fourteen college students participated in this study. It was shown that, as compared to the rest condition, induction activated a distributed set of regions including the prefrontal cortex (BA 6, 9, 11, 46, 47), caudate, putamen, thalamus, etc., and these regions were related to task difficulty. This may reflect the important role of the prefrontal-striatal-thalamus loop in inductive reasoning. The fMRI results also showed significant negative activation of the right superior temporal gyrus (BA 22), the left angular gyrus (BA 39), the bilateral middle frontal gyrus (BA 8, 9, 10), and the posterior cingulate cortex (BA 31) in inductive reasoning as compared to the rest condition. These results are consistent with previous studies of the default mode network. Future work is required to examine whether there exist an induction-specific positive activation network and a negative activation network, and what the relationship between the two networks is.
1 Introduction
Inductive reasoning, a process of identifying a general rule from specific cases, is considered one of the most important higher cognitive functions of the human brain.
It is a generalization process from premises to conclusion. Previous studies using PET (Positron Emission Tomography) or fMRI (Functional Magnetic Resonance Imaging) have preliminarily examined its neural mechanism [1,2,3,4,5,7,8]. All these studies identified active brain regions by contrasting an inductive task with a baseline task. As the baseline, some studies employed a rest task, while others used a perceptual task involving fewer cognitive processes. When subjects perform an inductive reasoning task, two kinds of activation exist (as compared to baseline): increased activation and decreased activation. The two have distinct physiological mechanisms. In order to reveal the cognitive neural mechanism of human inductive reasoning, the two should be considered together. As demonstrated before, the active regions sensitive to a specific task may interact with negatively activated regions, and the interaction effect may relate to the cognitive effort required [9,10]. However, previous work explored only the increased brain activities, by cognitively subtracting baseline from induction. Different from previous studies, the present study explores the positive (i.e., increased) and negative (i.e., decreased) activations simultaneously. As a higher cognitive activity of the human brain, inductive reasoning recruits a large number of regions, including frontal, parietal, occipital and subcortical regions, among which the role of the left prefrontal region has been a particular focus of discussion. So far, no study has focused on the negative activation in inductive reasoning, although there have been studies of other tasks, including analogical reasoning. It has been revealed that several brain regions show consistent deactivations across some different experimental tasks. These regions include the posterior cingulate cortex (PCC, BA 29, 31), dorsomedial prefrontal cortex (BA 8, 9), angular gyrus (BA 39), and anterior cingulate cortex (ACC, BA 24, 32) [11,12]. Based on previous studies, we hypothesized that 1) the fronto-striatal-thalamic circuit may participate in inductive reasoning, and 2) significant deactivations may be observed in default mode network regions when the inductive reasoning task is performed.
2 Material and Methods

2.1 Subjects
Fourteen paid undergraduate or graduate students (7 males and 7 females, aged 23.4±1.3 years) participated in the experiment. All participants were right-handed and had normal or corrected-to-normal vision. None of the participants reported any history of neurological or psychiatric diseases. All participants gave their written informed consent for the study.

2.2 Tasks
The stimuli consisted of regular geometrical figures. Each figure had two features (shape and stripe orientation), and each feature had five values: square, circle, triangle, crisscross and pentacle for shape, and horizontal stripe, 45-degree stripe, vertical stripe, 135-degree stripe and gridding for stripe orientation. All figures were made with Adobe Photoshop CS software and saved in JPEG format.
Fig. 1. Experimental tasks
The side length for the square, triangle and crisscross, the diameter for the circle, and the side length and height for the pentacle were all 3.8 centimeters. The screen resolution was 1280×1024. Figures were presented in white on a black background. Three kinds of tasks were designed in this study (see Fig. 1): 2T, 1T and Rest. The two figures involved in a 2T task had the same values for both features (shape and stripe orientation), while the two figures involved in a 1T task had the same value for only one of the two features (shape or stripe orientation). For Rest tasks, figures were replaced by "*". The equivalent verbal representation of the experimental tasks was also given in Table 1. Additionally, a kind of interference task (0T) was included, in which the two figures had no values in common. The present study focuses on the two inductive reasoning tasks, 1T and 2T, with Rest taken as the control task. The instructions were as follows: each figure represents one kind of object; given that two objects have the property X, objects with what characteristic have the property X? To help participants understand the experimental tasks, an example was given: each figure represents a newly cultivated watermelon variety; the two simultaneously presented watermelons taste sweet (/have the gene Y/have no seeds); thus, what watermelons taste sweet? The third watermelon was to be judged for sweetness based on the generalization from the previous two.

2.3 Stimuli Presentation
Trials began with a "+" mark in the center of the screen. Two figures followed, and participants were instructed to respond within 4 seconds. A screen with 4 options was then presented for another 4 seconds, and participants were required to choose the correct answer. Then, a "<>" mark was shown and the third figure followed. Participants were asked to press a button to judge the third figure within 4 seconds. The inter-stimulus interval (ISI) was set to 6 s. A button-press response ended the current screen and initiated the presentation of the next screen; otherwise, each screen disappeared after its corresponding presentation time. For Rest tasks, participants were asked to relax and respond randomly when needed. 150 trials (50 2T trials, 50 1T trials, 25 0T trials and 25 Rest trials) were designed in this study. All trials were pseudo-randomly distributed into 5 sessions, and each session contained 30 trials.
Shape and stripe orientation of the figures were counterbalanced among sessions. Button-press responses were counterbalanced among participants.

2.4 Pre-test Training
Previous studies have demonstrated that the human brain prefers to react first to overall features (e.g., shape) and then to local features (e.g., stripe orientation). Our pilot study also confirmed that the processing of shape in the human brain was significantly faster than that of stripe orientation. Therefore, participants were instructed to visually detect the shape of figures first, and then the stripe orientation. Subjects reviewed example stimuli of each condition prior to being scanned to ensure that they understood the task. The exercise involved all 4 kinds of tasks, and 20 trials were partitioned into 2 sessions. In the first session, participants were asked to orally report their solving procedure without time pressure. In the second session, the instructions were the same as in the subsequent formal test.

2.5 Imaging Recording
Scanning was performed on a 3.0 T Siemens system using a standard whole-head coil. Functional data were acquired using a gradient echo planar pulse sequence (TR = 2 s, TE = 31 ms, 30 axial slices, 3.75×3.75×4 mm3 voxels, 0.8 mm inter-slice gap, 90° flip angle, 64×64 matrix size in 240×240 mm2 field of view). The imaging sequence was optimized for detection of the BOLD effect, including local shimming and 10 s of scanning prior to data collection to allow the MR signal to reach equilibrium. The scanner was synchronized with the presentation of every trial in each run using Presentation software (http://nbs.neuro-bs.com).

2.6 fMRI Data Analysis
Data were analyzed using SPM2 software (Wellcome Department of Cognitive Neurology, London, UK, http://www.fil.ion.ucl.ac.uk). The first two volumes in each run were discarded. The remaining images were corrected for differences in the timing of slice acquisition, followed by rigid-body motion correction. The data were then realigned and normalized to the standard EPI template. The registration of the EPI data to the template was checked for each individual subject. Head movement was < 2 mm in all cases. The fMRI data were then smoothed with an 8 mm FWHM isotropic Gaussian kernel. The hemodynamic response to the presentation of the first two figures was modeled with the canonical hemodynamic response function employed in SPM2. No scaling was implemented for global effects. The resulting time series at each voxel was high-pass filtered with a cut-off of 128 s to remove session-specific low-frequency drifts in the BOLD signal. An auto-regressive AR(1) model was used to exclude the variance explained by the previous scan. The contrast images for each subject were then used in a random effects analysis to determine which regions were the most consistently activated
across subjects, using a one-sample t test. The co-activation of two kinds of contrast images was obtained using the mask method. The activations reported survived an FDR-corrected voxel-level intensity threshold of p < 0.05/0.01 with a minimum cluster size of 20 contiguous voxels.
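The modeling step can be illustrated with a minimal sketch (not the authors' SPM2 code) of building one task regressor: stimulus onsets are convolved with a canonical double-gamma HRF. The onset vector and session length below are hypothetical.

```python
# Hedged sketch of an SPM-like task regressor via HRF convolution.
import numpy as np
from scipy.stats import gamma

def canonical_hrf(tr, duration=32.0):
    """SPM-like double-gamma HRF sampled at the repetition time."""
    t = np.arange(0.0, duration, tr)
    hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0  # peak ~6 s, late undershoot
    return hrf / hrf.sum()

tr = 2.0                          # TR = 2 s, as in the experiment
n_scans = 180                     # hypothetical session length
onsets = np.zeros(n_scans)
onsets[[5, 30, 55, 80]] = 1.0     # hypothetical trial onsets (in scans)
regressor = np.convolve(onsets, canonical_hrf(tr))[:n_scans]
```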
3 Results

3.1 Behavioral Results
Accuracy was not statistically analyzed, as it was relatively high for all three tasks after training. For the reaction times (2T = 1372.2 ms, 1T = 1470.9 ms, Rest = 885.7 ms), a repeated-measures ANOVA showed that the main effect of task was significant (F(2, 26) = 45.320, p < 0.001). Pair-wise comparison indicated that 2T was significantly faster than 1T (F(1, 13) = 7.536, p = 0.017). These results are consistent with Chen et al. [13]. In the present study, the component of perceptual integration was involved in 2T but not in 1T, while the other components were identical for the two tasks. Therefore, the current reaction-time results were somewhat different from our expectation. The reason may be the fast-"same" effect [14].

3.2 fMRI Results
As compared to Rest, induction activated a large number of regions, including bilateral frontal cortex (BA 6, 9, 11, 45, 46, 47), right anterior cingulate gyrus (BA 24), bilateral occipital cortex (BA 19, 18), and bilateral putamen and thalamus (the co-activations of 2T vs. Rest and 1T vs. Rest; as shown in Table 1). In contrast to Rest, many regions showed negative activation in induction, including the left medial frontal gyrus (BA 10), right superior frontal gyrus (BA 6, 9, 10), right middle frontal gyrus (BA 8), right superior temporal gyrus (STG)/inferior parietal lobule (IPL)/supramarginal gyrus (SG) (BA 22, 40), left superior temporal gyrus (STG)/angular gyrus (AG)/middle temporal gyrus (MTG) (BA 39), and right inferior temporal gyrus/fusiform gyrus (BA 20) (the co-activations of Rest vs. 1T and Rest vs. 2T; as shown in Table 2). Additionally, when the threshold was loosened (not corrected for multiple comparisons), two bilateral clusters in PCC/precuneus emerged (12, -51, 30, T = 3.86, cluster size = 46 voxels; -3, -42, 30, T = 2.64, cluster size = 20 voxels; BA 31). The activation map of induction was thresholded at p < 0.01, cluster size >= 20 voxels, with FDR correction for multiple comparisons. Deactivations in PCC/precuneus and the dorsomedial prefrontal cortex have been widely discussed; this study focuses on the deactivation of the temporal cortex, based on previous studies of analogical reasoning [28,29]. Based on the co-activation of Rest vs. 1T and Rest vs. 2T, two temporal clusters (right STG/IPL/SG, left STG/AG/MTG) were defined as functional regions of interest (ROIs). Figs. 2 and 3 show the BOLD responses of the two ROIs. Repeated-measures ANOVAs were then performed on the BOLD values. For the left STG/AG/MTG ROI, the main effect of task was significant
Table 1. The co-activation regions of 2T vs. Rest and 1T vs. Rest (p < 0.01, cluster size >= 20 voxels, multiple comparison correction with FDR). Loci of maxima are in MNI coordinates.

Region                                  x    y    z     Z      T    Cluster
Rt. Cingulate Gyrus (BA 24)            18   -4   42   5.00   9.03    446
Rt. Superior Frontal Gyrus (BA 6)       3   11   52   4.79   8.22
Rt. Middle Frontal Gyrus               30   -3   50   4.82   8.35
Lt. Middle Frontal Gyrus              -24   -4   42   4.83   8.37    616
                                      -50    5   44   4.03   8.59
                                       -3    0   55   4.71   7.94
Lt. Paracentral Lobule (BA 5)          -3  -32   54   3.79   5.27     27
  (BA 6)                              -12  -24   51   3.41   4.45
  (BA 6)                               -6  -23   56   3.58   4.81
Lt. Medial Frontal Gyrus (BA 11)      -24   43   -7   4.14   6.15     22
Lt. Middle Frontal Gyrus              -33   40  -12   3.49   4.62
Lt. Inferior Frontal Gyrus (BA 47)    -24   32   -7   3.78   5.25
Rt. Inferior Frontal Gyrus (BA 9)      45   10   27   4.03   5.85     28
Rt. Middle Frontal Gyrus               56   22   29   3.18   4.01
Lt. Cuneus (BA 18)                    -15  -99    8   5.42  10.98   1449
Lt. Inferior Occipital Gyrus (BA 18)  -33  -84   -2   5.24  10.12
                                      -24  -88   -8   5.11   9.52
Rt. Middle Occipital Gyrus (BA 19)     30  -81   20   5.59  11.91   1313
Rt. Fusiform Gyrus (BA 19)             24  -62   -7   5.38  10.81
Rt. Middle Occipital Gyrus (BA 18)     30  -87    2   5.38  10.81
Rt. Middle Frontal Gyrus (BA 46)       42   33   15   4.18   6.25     21
Lt. Inferior Frontal Gyrus (BA 45)    -30   24    4   4.20   6.32     24
Rt. Putamen                            18   12    0   5.03   9.17    710
Rt. Thalamus                           12   -6    6   5.03   9.16
Rt. Inferior Frontal Gyrus (BA 47)     36   20   -1   5.02   9.11
Lt. Thalamus                           -9  -11    6   5.04   9.20    636
Lt. Putamen                           -15    6   -3   4.73   7.99
Lt. Putamen                           -21    4   14   4.70   7.88
Rt. Hippocampus                        33  -27   -9   4.64   7.70     21
Table 2. Deactivations in induction (the co-activation regions of Rest vs. 2T and Rest vs. 1T; p < 0.05, cluster size >= 20 voxels, multiple comparison correction with FDR). Loci of maxima are in MNI coordinates.

Region                                  x    y    z     Z     T    Cluster
Lt. Medial Frontal Gyrus (BA 10)       -3   50    9   3.43   4.50    45
Rt. Superior Frontal Gyrus (BA 6)      12   26   54   4.66   7.77   159
  (BA 10)                              12   62   22   4.44   7.01
  (BA 9)                               18   48   34   4.43   6.98
Rt. Middle Frontal Gyrus (BA 8)        39   28   40   3.40   4.44    44
Rt. Superior Temporal Gyrus (BA 22)    59  -60   14   4.50   7.20    26
Rt. Inferior Parietal Lobule (BA 40)   65  -36   32   4.41   6.92
Rt. Supramarginal Gyrus (BA 40)        62  -45   35   4.16   6.22
Lt. Superior Temporal Gyrus (BA 39)   -56  -60   22   3.85   5.41   251
Lt. Angular Gyrus (BA 39)             -50  -65   34   3.73   5.13
Lt. Middle Temporal Gyrus (BA 39)     -48  -74   26   3.65   4.95
Rt. Inferior Temporal Gyrus (BA 20)    62  -21  -17   3.70   5.08    43
Rt. Fusiform Gyrus (BA 20)             56  -19  -22   3.57   4.79
Rt. Inferior Temporal Gyrus (BA 20)    59  -10  -20   3.32   4.27
Fig. 2. The BOLD response of the left STG/AG/MTG (as shown in Table 2)
Fig. 3. The BOLD response of the right STG/IPL/SG (as shown in Table 2)
(F(2, 26) = 18.435, p = 0.001), and further pair-wise comparison showed that there was no difference between the BOLD responses of 2T and 1T (p = 0.56), while the BOLD responses of 2T (p = 0.001) and 1T (p = 0.000) were significantly lower than that of Rest. For the right STG/IPL/SG ROI, the main effect of task was significant (F(2, 26) = 27.061, p = 0.000), and pair-wise comparison showed that the BOLD response of 2T was marginally lower than that of 1T (p = 0.081), while the BOLD responses of 2T (p = 0.000) and 1T (p = 0.000) were significantly lower than that of Rest.
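The ROI analysis just described can be sketched as follows (a hedged illustration, not the authors' code); the BOLD values are synthetic placeholders for one mean value per subject and task.

```python
# Hedged sketch of a repeated-measures ANOVA over ROI BOLD values.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "subject": np.repeat(np.arange(14), 3),   # 14 subjects, as in the study
    "task": np.tile(["2T", "1T", "Rest"], 14),
    "bold": rng.normal(0.0, 1.0, 42),         # placeholder ROI BOLD values
})
print(AnovaRM(df, depvar="bold", subject="subject", within=["task"]).fit())
```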
4 Discussion
This study designed a kind of figural inductive reasoning task which is composed of simple regular geometrical figures and is homogeneous to classical verbal inductive reasoning tasks. An fMRI experiment was then conducted to explore the neural correlates of human inductive reasoning. To our knowledge, this may be the first study to simultaneously examine the activation and deactivation involved in human inductive reasoning.

4.1 Induction and the Fronto-striatal-thalamus Circuit
In the current study, the prefrontal cortex, putamen and thalamus were all positively activated in inductive reasoning. This is consistent with previous studies, including studies of human subjects [15,16,17,18,19,20] and studies in primates [21,22,23]. Some of these studies adopted inductive reasoning tasks, while the cognitive tasks used in other studies were considered to contain inductive inference components. It has been reported that Parkinson's disease (PD) and Huntington's disease (HD) patients with lesions in the basal ganglia were impaired in tasks involving reasoning, problem solving and learning (inductive reasoning is thought of as an important component of these tasks) [24,25]. These impairments were attributed to a deficit in the circuit between the dorsolateral prefrontal cortex (DLPFC) and the basal ganglia. These studies preliminarily demonstrated the role of the fronto-striatal-thalamus information-processing loop in inductive reasoning: first, information is pre-processed in the frontal cortex; second, the pre-processed information is transferred to the basal ganglia, and then enters the thalamus through the fibers between the basal ganglia and thalamus; finally, information is returned to the frontal cortex, and the whole inductive reasoning process is finished. Recently, fiber tracking and physiological studies have validated the existence of the fronto-striatal-thalamus loop, and the functional role of this loop has been linked to motivation, emotion, planning, cognition and action [26]. This circuit has been considered as a layered model, in which each part may play a specific role in the processing of input and the organization of output. This is congruent with the present study. Different from some previous studies [15,2], this study did not find activation of the parietal lobe. This may be explained by differences in experimental tasks and procedures. On the one hand, the experimental task
in the current study was relatively purer than Raven's Progressive Matrices (RPM) in Christoff et al. (2001) [15] and the odd-animal task in Goel et al. (2000) [2]. On the other hand, the detection order of features was not restricted in previous studies [15,2], while it was specifically controlled and unified among participants in this study. Additionally, activation of the parietal cortex was also not observed in our previous study [5], although we adopted the same experimental design as Goel et al. (1997) [1]. Therefore, this discrepancy may also relate to cultural differences.

4.2 Deactivations in Induction
The regions showing deactivations in this study are on the whole consistent with previous studies [27,11,12]. In a meta-analysis of nine positron emission tomography (PET) studies of different cognitive tasks, including language, mental calculation, visual movement, mental imagery, perceptual matching, finger movement, etc. [27], it was found that regions including the PCC, DMPFC and angular gyrus showed significant negative activation as compared to the resting state. These results have been further validated in subsequent studies [11,12]. Additionally, the significant deactivation of the temporal cortex in this study is also congruent with previous studies of analogical reasoning [28,29]. ROI analysis showed that the difference between the BOLD responses of 2T and 1T was not significant in the left STG/AG/MTG but approached significance in the right STG/IPL/SG. The deactivation pattern in the right STG/IPL/SG is consistent with the argument that deactivation is positively correlated with task difficulty [30]. As mentioned above, the 2T task in the present study contains a component of perceptual integration while the 1T task does not. The different deactivation patterns in the left STG/AG/MTG and the right STG/IPL/SG suggest different roles for the two regions. However, the relative intensity of PCC deactivation in this study differs from previous studies of the default mode network (DMN), in which PCC deactivation was the maximum and the PCC was considered to play a critical role [9]. Some studies reported that some regions in PCC/precuneus may be recruited in inductive reasoning [5]. This may be the reason why PCC deactivation was weakened in this study. This effect should be examined in detail in future studies.

4.3 Theoretical Explanation of Activation and Deactivation
Many efforts have been made to explain the mechanism of activation and deactivation co-existing in cognitive experiments. Drevets et al. (1995) [31] proposed that activation regions and deactivation regions compose a whole network. Deactivation regions may play a "gating or inhibition" role, i.e., filtering out useless information, which may be helpful for the processing of useful information. In the current study, deactivation in the right STG may reflect the inhibition of some superficial visual spatial information, as the key to the current task is to detect whether two figures share some features. This is consistent with some previous studies in which the right STG was concluded to be recruited in visual
search and spatial perceptual processing [32,33]. For task-independent deactivations, McKiernan et al. (2003) [30] proposed a mechanism of reallocation of processing resources. They considered that there is ongoing, internal information processing during the resting state. This ongoing processing is suspended when performing some cognitive tasks, and the related processing resources are transferred to task-induced activation regions. The regions left with fewer processing resources may then show negative activation. According to this view, the bilateral STG regions in this study may be involved in this internal resting-state activity, and therefore showed deactivation.

4.4 Limitations and Future Work
This study still has some limitations. First, the present experimental design prevents us from further inferring which regions were task-dependent deactivations and which were task-independent deactivations. Task-dependent deactivations are closely related to specific cognitive tasks, and different tasks may induce different deactivations, whereas task-independent deactivations remain mostly the same across different cognitive tasks [27,34]. The results of this study showed that the pattern of induction-induced deactivations differs from that of lower-level cognitive functions (e.g., some basic visual, auditory and movement tasks); however, it is still not clear whether a given deactivation region is task-dependent or not. Second, multiple methods, including functional connectivity, independent component analysis (ICA), etc., and multimodal data, including resting-state fMRI, diffusion tensor imaging (DTI), etc., should be combined to further study the information-processing mechanism of inductive reasoning in the human brain.
5 Conclusion
In summary, activation and deactivation should be examined simultaneously, and the interactions among activation and deactivation regions should be analyzed, in order to understand in depth the neural mechanism of human inductive reasoning. This study preliminarily explored the activation and deactivation in inductive reasoning; however, some insufficiencies remain. Future work will include studies of other kinds of inductive reasoning, such as numerical and sentential tasks, in order to systematically examine the neural mechanism of inductive reasoning.
Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 60775039), and partially supported by the grant-in-aid for scientific research (No. 18300053) from the Japanese Ministry of Education, Culture, Sports, Science and Technology.
References

1. Goel, V., Gold, B., Kapur, S., Houle, S.: The seats of reason? An imaging study of deductive and inductive reasoning. NeuroReport 8, 1305–1310 (1997)
2. Goel, V., Dolan, R.J.: Anatomical segregation of component processes in an inductive inference task. Journal of Cognitive Neuroscience 12, 1–10 (2000)
3. Goel, V., Dolan, R.J.: Differential involvement of left prefrontal cortex in inductive and deductive reasoning. Cognition 93, B109–B121 (2004)
4. Liang, P.P., Zhong, N., Lu, S.F., Liu, J.M., Yao, Y.Y., Li, K.C., Yang, Y.H.: The neural mechanism of human numerical inductive reasoning process: A combined ERP and fMRI study. In: Zhong, N., Liu, J., Yao, Y., Wu, J., Lu, S., Li, K. (eds.) WImBI 2007. LNCS (LNAI), vol. 4845, pp. 223–243. Springer, Heidelberg (2007)
5. Liang, P.P.: Study of the cognitive neural mechanism of inductive reasoning. Doctoral dissertation of Beijing University of Technology (2010)
6. Prabhakaran, V., Smith, J.A.L., Desmond, J.E., Glover, G.H., Gabrieli, J.D.E.: Neural substrates of fluid reasoning: an fMRI study of neocortical activation during performance of the Raven's Progressive Matrices Test. Cognitive Psychology 33, 43–63 (1997)
7. Zhong, N., Liang, P.P., Qin, Y., Lu, S.F., Yang, Y.H., Li, K.C.: Neural substrates of data-driven scientific discovery: An fMRI study during performance of number series completion task. Science in China Series C: Life Science (in press)
8. Yang, Y.H., Liang, P.P., Lu, S.F., Li, K.C., Zhong, N.: The role of the DLPFC in inductive reasoning of MCI patients and normal aging: An fMRI study. Science in China Series C: Life Science 52(8), 789–795 (2009)
9. Greicius, M.D., Krasnow, B., Reiss, A.L., Menon, V.: Functional connectivity in the resting brain: A network analysis of the default mode hypothesis. PNAS 100(1), 253–258 (2003)
10. Esposito, F., Bertolino, A., Scarabino, T., Latorre, V., Blasi, G., Popolizio, T., Tedeschi, G., Cirillo, S., Goebel, R., Salle, F.D.: Independent component model of the default-mode brain function: assessing the impact of active thinking. Brain Research Bulletin 70, 263–269 (2006)
11. Fox, M.D., Raichle, M.E.: Spontaneous fluctuations in brain activity observed with functional magnetic resonance imaging. Nature Reviews Neuroscience 8, 700–711 (2007)
12. Raichle, M.E., Snyder, A.Z.: A default mode of brain function: a brief history of an evolving idea. NeuroImage 37, 1083–1090 (2007)
13. Chen, A., Luo, Y., Wang, Q., et al.: Electrophysiological correlates of category induction: PSW amplitude as an index of identifying shared attributes. Biological Psychology 76, 230–238 (2007)
14. Zhou, G.M., Fu, X.L.: Effect of holistic and analytical features on same-different judgment. Acta Psychologica Sinica 6, 687–689 (2004)
15. Christoff, K., Prabhakaran, V., Dorfman, J.: Rostrolateral prefrontal cortex involvement in relational integration during reasoning. NeuroImage 14, 1136–1149 (2001)
16. Merkl, A., Pochon, J., Lehericy, S., Deweer, B., Pillon, B., Levy, R., Dubois, B.: Differential caudate-prefrontal activation during the initial and late stages of mirror reading learning: A fMRI study. NeuroImage 13, S711 (2001)
17. Poldrack, R.A., Sabb, F.W., Foerde, K., Tom, S.M., Asarnow, R.F., Bookheimer, S.Y., et al.: The neural correlates of motor skill automaticity. The Journal of Neuroscience 25, 5356–5364 (2005)
18. Seger, C.A., Cincotta, C.M.: Striatal activation in concept learning. Cogn. Affect Behav. Neurosci. 2, 149–161 (2002)
19. Seger, C.A., Cincotta, C.M.: The roles of the caudate nucleus in human classification learning. Journal of Neuroscience 25, 2941–2951 (2005)
20. Seger, C.A., Cincotta, C.M.: Dynamics of frontal, striatal, and hippocampal systems during rule learning. Cerebral Cortex 16, 1546–1555 (2006)
21. Levy, R., Friedman, H.R., Davachi, L., Goldman-Rakic, P.S.: Differential activation of the caudate nucleus in primates performing spatial and nonspatial working memory tasks. The Journal of Neuroscience 17, 3870–3882 (1997)
22. Pasupathy, A., Miller, E.K.: Different time courses of learning related activity in the prefrontal cortex and striatum. Nature 433, 873–876 (2005)
23. Yuta, S., Munekazu, Y., Toshio, I., Ken-Ichiro, T.: Involvement of the dorsolateral prefrontal cortex in inductive reasoning. Neuroscience Research 58, S228 (2007)
24. Grahn, J.A., Parkinson, J.A., Owen, A.M.: The role of the basal ganglia in learning and memory: Neuropsychological studies. Behavioural Brain Research 199, 53–60 (2009)
25. Poldrack, R.A., Prabhakaran, V., Seger, C.A., Gabrieli, J.D.E.: Striatal Activation During Acquisition of a Cognitive Skill. Neuropsychology 13, 564–574 (1999)
26. Heyder, K., Suchan, B., Daum, I.: Cortico-subcortical contributions to executive control. Acta Psychologica 115, 271–289 (2004)
27. Mazoyer, B., Zago, L., Mellet, E., Bricogne, S., Etard, O., Houde, O., Crivello, F., Joliot, M., Petit, L., Tzourio-Mazoyer, N.: Cortical networks for working memory and executive functions sustain the conscious resting state in man. Brain Research Bulletin 54(3), 287–298 (2001)
28. Mu, Y., Mu, Q.: Neural circuits of verbal analogical mapping using functional MRI and principal component analysis. Medical Journal of West China 20(1), 171–176 (2008)
29. Wharton, C.M., Grafman, J., Flitman, S.S., Hansen, E.K., Brauner, J., Marks, A., Honda, M.: Toward neuroanatomical models of analogy: A positron emission tomography study of analogical mapping. Cognitive Psychology 40, 173–197 (2000)
30. McKiernan, K.A., Kaufman, J.N., Kucera-Thompson, J., Binder, J.R.: A parametric manipulation of factors affecting task-induced deactivation in functional neuroimaging. Journal of Cognitive Neuroscience 15, 394–408 (2003)
31. Drevets, W.C., Burton, H., Videen, T.O.: Blood flow changes in human somatosensory cortex during anticipated stimulation. Nature 373, 249–252 (1995)
32. Ellison, A., Schindler, I., Pattison, L.L., Milner, A.D.: An exploration of the role of the superior temporal gyrus in visual search and spatial perception using TMS. Brain 127, 2307–2315 (2004)
33. Gharabaghi, A., Berger, M.F., Tatagiba, M., Karnath, H.: The role of the right superior temporal gyrus in visual search: Insights from intraoperative electrical stimulation. Neuropsychologia 44(12), 2578–2581 (2006)
34. Shulman, G., Fiez, J.A., Corbetta, M., Buckner, R.L., Miezin, F.M., Raichle, M.E., Petersen, S.E.: Common blood flow changes across visual tasks: II. Decreases in cerebral cortex. Journal of Cognitive Neuroscience 9, 648–663 (1997)
Clustering of fMRI Data Using Affinity Propagation Dazhong Liu1,2 , Wanxuan Lu1 , and Ning Zhong1,3 1
International WIC Institute, Beijing University of Technology Beijing 100124, China 2 School of Mathematics and Computer Science, Hebei University Baoding 071002, China 3 Department of Life Science and Informatics, Maebashi Institute of Technology Maebashi 371-0816, Japan [email protected], [email protected], [email protected]
Abstract. Clustering methods are commonly used for fMRI (functional Magnetic Resonance Imaging) data analysis. Based on an effective clustering algorithm called Affinity Propagation (AP) and a newly defined similarity measure, we present a method for detecting activated brain regions. In the proposed method, autocovariance function values and the Euclidean distance of time series are first calculated and combined into a new similarity measure; the AP algorithm with this measure is then applied to all time series of the data; finally, clusters whose cross-correlation coefficients with the experimental paradigm are greater than a threshold are taken as activations. Since it does not require the number of clusters to be set in advance, our method is especially appropriate for the analysis of fMRI data collected with a periodic experimental paradigm. The validity of the proposed method is illustrated by experiments on a simulated dataset and a benchmark dataset. It detects all activated regions in the simulated dataset accurately, and its error rate is smaller than that of K-means. On the benchmark dataset, the result is very similar to that of SPM.
1 Introduction
Functional Magnetic Resonance Imaging (fMRI) is a noninvasive technique based on the Blood Oxygenation Level Dependent (BOLD) contrast that accompanies neural activity in the brain [1]. In most applications, fMRI is used to determine the relationship between cognitive states and specific functional areas of the brain. fMRI images are always polluted by strong noise, such as subject movement, respiratory and cardiac artifacts, and machine noise. Hence BOLD signals in fMRI data have low signal-to-noise ratios (SNRs), which makes the detection and analysis of BOLD effects a challenging problem. There are many methods developed for fMRI data analysis, which can be roughly divided into two main categories: model-driven methods and data-driven methods. The general linear model (GLM) method is a conventional model-driven method, which uses a canonical hemodynamic response function (HRF) as
the a priori hypothesis. The GLM has been implemented in one of the most popular fMRI data analysis software packages, Statistical Parametric Mapping (SPM) [2]. Complementary to the hypotheses of model-driven methods, data-driven approaches make no assumptions about the profile of the HRF. Many data-driven methods have been developed, such as clustering [3,4,5], principal component analysis (PCA) [6,7], and independent component analysis (ICA) [8]. Recently, many clustering algorithms have been applied to fMRI data analysis. These techniques attempt to classify the fMRI time courses in the data into several patterns according to a temporal proximity measure among them [9,10,11,12,13,14]. For clustering problems, the proximity measure is so important that it may determine how successful the resulting clusters are. Bandettini et al. considered the cross-correlation coefficient between the data time series and a reference function [15]. Goutte et al. suggested clustering on the cross-correlation function between the fMRI activation and the experimental protocol signal [3]. Ye et al. proposed using the empirical autocorrelation function as a new feature for clustering [16]. In this paper we present a novel explorative, data-driven approach for fMRI data analysis. It is based on a newly combined similarity measure and the Affinity Propagation clustering algorithm. The method was evaluated on simulated fMRI data and real fMRI data. The remainder of this paper is organized as follows. In Section 2, we introduce the datasets used in our experiments, define the combined similarity measure, and describe our method. In Section 3, experimental results of the proposed approach are given and compared with those of K-means and SPM. Finally, Section 4 gives concluding remarks.
2 Materials and Methods

2.1 Datasets
Simulated fMRI Data. We used a simulated dataset similar to the one described by Yang et al. [17] to evaluate the performance of activation detection methods. The dataset simulated one slice of fMRI data containing N = 1963 brain voxels, with series length T = 200. The simulated paradigm comprised two alternating conditions in a block design. Three small activation foci of 21 voxels were created, and the activation time courses were obtained by convolving the experimental condition time courses with the HRF of SPM sampled at TR = 2 s [18]. Furthermore, Gaussian noise of zero mean and standard deviation varying from 0.2 to 3.5 was added, corresponding to SNRs (the standard deviation of the simulated signal over the standard deviation of the added noise) from 0.93 to 0.22, which is in the range of the SNR of real fMRI data. The data were smoothed spatially, as is commonly done for fMRI (FWHM = 4.5 mm = 1.5 voxels).
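Generating one such activation time course can be sketched as follows (a hedged illustration, not the authors' code): a block-design condition vector is convolved with a double-gamma HRF and Gaussian noise is added. The block length and noise level below are illustrative assumptions.

```python
# Hedged sketch of a simulated block-design activation time course.
import numpy as np
from scipy.stats import gamma

T, tr = 200, 2.0                                        # series length and sampling, as in the text
t = np.arange(0.0, 32.0, tr)
hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0          # SPM-like double-gamma HRF
blocks = np.tile(np.r_[np.zeros(10), np.ones(10)], 10)  # alternating conditions (assumed block length)
signal = np.convolve(blocks, hrf)[:T]
noisy = signal + np.random.normal(0.0, 1.0, T)          # noise sd chosen from the 0.2-3.5 range
```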
Clustering of fMRI Data Using Affinity Propagation
401
University College London. This dataset was the first ever collected and analyzed in the Functional Imaging Laboratory (http://www.fil.ion.ucl.ac.uk/spm/data/). The data were acquired using a modified 2T Siemens MAGNETOM Vision system. Each acquisition consisted of 64 contiguous slices (64 x 64 x 64, 3mm x 3mm x 3mm voxels). Acquisition took 6.05s, it consists of 96 acquisitions made (TR = 7s) from a single subject in blocks of 6 (42 seconds), giving 16 42s blocks. The paradigm consisted of eight rest and auditory stimulation cycles, starting with rest. Auditory stimulation was bi-syllabic words presented binaurally at a rate of 60 per minute. The functional data starts at acquisition 4. A T1-weighted high-resolution (1mm x 1mm x 1mm) scan was also required for anatomical reference and segmentation purpose. To avoid T1 effects in the initial scans of an fMRI time series, just as the Statistical Parametric Mapping SPM manual (http://www.fil.ion.ucl.ac.uk/spm/data/) suggested, we discarded the first complete cycle (12 scans), leaving 84 scans to analysis. 2.2
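As a rough illustration of how an activated voxel of the simulated dataset above can be generated (a minimal sketch, not the authors' code; the paradigm and HRF vectors are assumed to be given):

import numpy as np

def simulate_voxel(paradigm, hrf, snr, seed=0):
    """paradigm: 0/1 block design over T time points; hrf: sampled HRF."""
    rng = np.random.default_rng(seed)
    signal = np.convolve(paradigm, hrf)[:len(paradigm)]   # activation time course
    noise_sd = signal.std() / snr                         # SNR = std(signal) / std(noise)
    return signal + rng.normal(0.0, noise_sd, size=len(paradigm))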
2.2 Similarity Measures
Recently, a new clustering approach named affinity propagation (AP) was proposed [19]. Affinity propagation has several advantages, among them the ability to avoid many unlucky initializations and premature hard decisions. An appropriate similarity measure plays a critical role in clustering: one should try to quantify the similarity between two fMRI voxels meaningfully. Given one slice of an fMRI data volume, we use {xn(k)} to represent the fMRI time series of the k-th voxel in the slice, where n is the time point. Given two fMRI voxels i and k, their similarity s(i, k) is defined as follows:

d(i, k) = s1(i, k) + α · s2(i, k),    s(i, k) = −exp(d(i, k))    (1)

s1(i, k) = ||Rm(i) − Rm(k)||^2,    s2(i, k) = ||xn(i) − xn(k)||^2    (2)

Rl(k) = E{(xn+l(k) − μ(k)) (xn(k) − μ(k))}    (3)
where μ(k) denotes the mean value of the time series of voxel k, Rl(k) is the autocovariance of voxel k's time series at lag l, s1(i, k) is the squared distance between the autocovariance values of voxels i and k, s2(i, k) is the squared distance between the raw time series of voxels i and k, and α is a constant, set to 0.5 in our experiments. The similarity s(i, k) thus consists of two components: the squared Euclidean distance between the two voxels' standardized fMRI time series, and the squared Euclidean distance between their autocovariance function values. The reason for using the autocovariance function is that its value for a periodic signal is quite different from that for white noise. The merit of using the Euclidean distance between two voxels' standardized fMRI time series is that it can serve as a measure of consecutiveness for neighboring voxels.
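A minimal sketch of this combined measure (our illustration, not the authors' code; it assumes the time series have already been standardized, so μ(k) = 0, and uses a hypothetical set of lags for the autocovariance features):

import numpy as np

def combined_similarity(X, alpha=0.5, n_lags=10):
    """X: (N, T) array, one standardized fMRI time series per voxel.
    Returns the (N, N) similarity matrix s(i, k) = -exp(d(i, k))."""
    def sq_dists(F):
        # ||f_i - f_k||^2 via the Gram trick, avoiding an (N, N, T) tensor.
        sq = (F ** 2).sum(axis=1)
        return sq[:, None] + sq[None, :] - 2.0 * (F @ F.T)

    # Autocovariance features R_l(k) for lags l = 1..n_lags (Eq. 3, zero mean).
    R = np.stack([(X[:, l:] * X[:, :-l]).mean(axis=1)
                  for l in range(1, n_lags + 1)], axis=1)
    d = sq_dists(R) + alpha * sq_dists(X)   # Eqs. (1)-(2)
    d = d / d.max()                          # rescaled to keep exp() finite (our choice)
    return -np.exp(d)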
2.3 Analysis Methods
Our method is based on the combined similarity measure and is called combined-similarity-measure-based AP (abbreviated CAP).
Fig. 1. Flowchart of the CAP clustering analysis method
For a real fMRI dataset, we standardize each time series by subtracting its mean and dividing by its standard deviation. The similarity between two voxels is set according to Equations (1)-(3). Once the similarities are obtained, we adopt the affinity propagation (AP) clustering method to perform the grouping. Initially, the preferences are set to the minimum of the similarity matrix; small preferences generate a small number of clusters. For the real dataset, a pre-processing step was undertaken to enhance signals while suppressing noise: low-frequency components and polynomial drifts were removed by applying a regression model to each voxel. The entire program was implemented using the SPM5 toolbox (http://www.fil.ion.ucl.ac.uk/spm/ext/) and Matlab. In this work, we used SPM5 to pre-process (realignment, normalization/registration, and smoothing) the fMRI sequences. After that, the similarity calculation and the affinity propagation (AP) clustering were carried out as stated above. In the post-processing, a cross-correlation coefficient between the averaged cluster time series and the experimental paradigm was calculated for each cluster; clusters whose cross-correlation coefficients were greater than a threshold were chosen as activated regions. Here, the threshold was set to 0.4. The flowchart of the CAP clustering analysis method is shown in Fig. 1.
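The clustering and post-processing steps can be sketched as follows (a minimal illustration using scikit-learn's AffinityPropagation rather than the authors' Matlab implementation; `paradigm` is the experimental reference time course):

import numpy as np
from sklearn.cluster import AffinityPropagation

def cap_cluster(S, X, paradigm, threshold=0.4):
    """S: (N, N) combined similarity matrix; X: (N, T) standardized series."""
    ap = AffinityPropagation(affinity="precomputed",
                             preference=S.min(),   # minimum of the similarity matrix
                             random_state=0)
    labels = ap.fit(S).labels_
    # Keep clusters whose mean time course correlates with the paradigm.
    activated = [c for c in np.unique(labels)
                 if np.corrcoef(X[labels == c].mean(axis=0), paradigm)[0, 1] > threshold]
    return labels, activated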
3 Results
We first tested our analysis method on the synthetic fMRI data, generating data with different SNRs. To demonstrate the effectiveness of our method, the widely used K-means clustering was applied to the same dataset, with the number of clusters set to 4 and the cross-correlation coefficient (Bandettini et al. [15]) as the similarity measure. Since the locations of all activated voxels are known, the error rate and standard deviation can be used to evaluate the performance of an activation detection algorithm. The results are shown in Fig. 2: on the synthetic dataset, CAP has lower error rates than K-means under all conditions.
Fig. 2. The error rates and standard deviations of CAP and K-means clustering on synthetic data under different SNRs
Fig. 3. The activated cortex overlay maps and clustering results. The z coordinate in MNI space of each column's slice equals 6, 9, 12, and 15 mm, respectively. The first row shows the SPM t-map; the color bar on the right shows the corresponding t-value for a particular color in this row only. The second row gives the activated cortex map using CAP. The third row illustrates the CAP clustering map.
For the real auditory fMRI data, we selected four axial slices (MNI-space coordinates z = 6, 9, 12, 15 mm) to illustrate the results. The real datasets were first pre-processed as shown in Fig. 1, with SPM5 used as the pre-processing tool. After the pre-processing steps, statistical analysis or clustering analysis was carried out to determine which voxels are activated by the stimulation. We first used SPM5 to analyse the pre-processed dataset, and the t-map (t > 3) was generated; the overlay map results are shown in the first row of Fig. 3. After that, we used CAP to cluster the same pre-processed dataset. The clustering results corresponding to each slice are shown in the third row of Fig. 3.
Fig. 4. The raw time series of activated voxels in the slice (first column of the second row in Fig. 3), with the average of the raw time series drawn as a thick red line. The yellow dashed line denotes the box-car stimulus waveform.
The activated clusters extracted from each clustering result are shown in the second row of Fig. 3. From Fig. 3, we can see that the results of CAP clustering are very similar to those of SPM. We selected the slice with z = 6 to show the time series of the activated auditory cortex voxels; the average of those time series was also calculated. The results are plotted in Fig. 4.
4 Discussion and Conclusions
A newly defined similarity measure and the use of AP are the key components of our exploratory fMRI analysis method. The selection of a similarity measure depends on the features of the fMRI dataset and on the clustering algorithm itself. Considering the properties of fMRI datasets, the autocovariance function value is added to the similarity measure to discriminate periodic signals from white noise. Meanwhile, the similarities used in AP need not satisfy the triangle inequality, unlike what K-means requires [19]; hence an appropriately combined similarity measure can be used. The results on the synthetic fMRI dataset show that the proposed similarity measure performs well for noisy signals. From the results of the real fMRI data analysis, one can see that the major activated auditory cortex regions are grouped into one cluster, which accords well with the results of SPM5. In short, CAP is an appropriate method for identifying brain activation regions under a periodic experimental paradigm; moreover, it requires neither the number of clusters nor the HRF profile to be specified in advance. Besides using the cross-correlation coefficient as the criterion for determining the activated regions, one can first calculate the Fourier transform of each cluster center and of the stimulus time course, and then choose the cluster with the most similar dominant frequencies as the activated region. Any method requires setting some parameters, which are usually decided according to experience or theory; our parameters were obtained from experiments. Several tasks remain to improve our method, including data reduction and the calculation strategy. Future work
includes finding asymptotically optimal conditions for similarity measure selection and parameter adjustment.
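As a sketch of the frequency-domain alternative mentioned above (our illustration; the sampling interval and the representation of the stimulus course are assumptions):

import numpy as np

def dominant_freq(x, tr=7.0):
    """Dominant frequency (Hz) of a time course sampled every `tr` seconds."""
    x = x - x.mean()
    spectrum = np.abs(np.fft.rfft(x))
    return np.fft.rfftfreq(len(x), d=tr)[spectrum.argmax()]

def frequency_match(cluster_mean, stimulus, tr=7.0):
    # The cluster whose dominant frequency is closest to the stimulus
    # frequency would be chosen as the activated region.
    return abs(dominant_freq(cluster_mean, tr) - dominant_freq(stimulus, tr))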
Acknowledgments

The work is partially supported by the Support Center for Advanced Telecommunications Technology Research, Foundation, the National Natural Science Foundation of China under Grant No. 60875075, and the Beijing Natural Science Foundation under Grant No. 4102007.
References

1. Ogawa, S., Lee, T.M., Kay, A.R., Tank, D.W.: Brain Magnetic Resonance Imaging with Contrast Dependent on Blood Oxygenation. Proceedings of the National Academy of Sciences 87, 9868–9872 (1990)
2. Worsley, K., Friston, K.: Analysis of fMRI Time-series Revisited Again. NeuroImage 2, 173–181 (1995)
3. Goutte, C., Toft, P., Rostrup, E., Nielsen, F.A., Hansen, L.K.: On Clustering fMRI Time Series. NeuroImage 9, 298–310 (1999)
4. Chuang, K., Chiu, M., Lin, C.C., Chen, J.: Model-free Functional MRI Analysis Using Kohonen Clustering Neural Network and Fuzzy C-means. IEEE Transactions on Medical Imaging 18, 1117–1128 (1999)
5. Baumgartner, R., Scarth, G., Teichtmeister, C., Somorjai, R., Moser, E.: Fuzzy Clustering of Gradient-echo Functional MRI in the Human Visual Cortex. Part I: Reproducibility. Journal of Magnetic Resonance Imaging 7, 1094–1101 (1997)
6. Baumgartner, R., Ryner, L., Richter, W., Summers, R., Jarmasz, M., Somorjai, R.: Comparison of Two Exploratory Data Analysis Methods for fMRI: Fuzzy Clustering vs. Principal Component Analysis. Magnetic Resonance in Medicine 18, 89–94 (2000)
7. Hansen, L.K., Larsen, F.A., Nielsen, S.C., Strother, E., Rostrup, E., Savoy, R., Svarer, C., Paulson, O.B.: Generalizeable Patterns in NeuroImaging: How Many Principal Components. NeuroImage 9, 534–544 (1999)
8. McKeown, M.J., Makeig, S., Brown, G.G., Jung, T.P., Kindermann, S.S., Bell, A.J., Sejnowski, T.J.: Analysis of fMRI Data by Blind Separation into Independent Spatial Components. Human Brain Mapping 6, 160–188 (1998)
9. Mohamed, L.S., Friston, K.J., Price, C.J.: Detecting Subject-specific Activations Using Fuzzy Clustering. NeuroImage 36, 594–605 (2007)
10. Filzmoser, P., Baumgartner, R., Moser, E.: A Hierarchical Clustering Method for Analyzing Functional MR Images. Magnetic Resonance Imaging 17(6), 817–826 (1999)
11. Dimitriadou, E., Barth, M., Windischberger, C., Hornik, K., Moser, E.: A Quantitative Comparison of Functional MRI Cluster Analysis. Artificial Intelligence in Medicine 31, 57–71 (2004)
12. Yeo, B.T., Ou, W.: Clustering fMRI Time Series, http://people.csail.mit.edu/ythomas/6867fMRI.pdf
13. Fadili, M.J., Ruan, S., Bloyet, D., Mazoyer, B.: A Multistep Unsupervised Fuzzy Clustering Analysis of fMRI Time Series. Human Brain Mapping 10, 160–178 (2000)
14. Wang, D., Shi, L., Yeung, D.S., Heng, P., Wong, T., Tsang, E.C.C.: Support Vector Clustering for Brain Activation Detection. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3749, pp. 572–579. Springer, Heidelberg (2005)
15. Bandettini, P.A., Jesmanowicz, A., Wong, E.C., Hyde, J.S.: Processing Strategies for Time-course Data Sets in Functional MRI of the Human Brain. Magnetic Resonance in Medicine 30(2), 161–173 (1993)
16. Ye, J., Lazar, N.A., Li, Y.: Geostatistical Analysis in Clustering fMRI Time Series. Statistics in Medicine 28(19), 2490–2508 (2009)
17. Yang, J., Zhong, N., Liang, P.P., Wang, J., Yao, Y.Y., Lu, S.F.: Brain Activation Detection by Neighborhood One-class SVM. In: Proceedings of the 2007 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology – Workshops, pp. 47–51 (2007)
18. Friston, K.J.: Statistical Parametric Mapping: The Analysis of Functional Brain Images. Academic Press, London (2006)
19. Frey, B.J., Dueck, D.: Clustering by Passing Messages between Data Points. Science 315(5814), 972–976 (2007)
20. Meyer-Baese, A., Wismueller, A., Lange, O.: Comparison of Two Exploratory Data Analysis Methods for fMRI: Unsupervised Clustering versus Independent Component Analysis. IEEE Transactions on Information Technology in Biomedicine 8(3), 387–398 (2004)
21. Maas, L.C., Frederick, B.D., Yurgelun-Todd, D.A., Renshaw, P.F.: Autocovariance Based Analysis of Functional MRI Data. Biological Psychiatry 39, 640–641 (1996)
22. Thirion, B., Faugeras, O.: Feature Characterization in fMRI Data: The Information Bottleneck Approach. Medical Image Analysis 8, 403–419 (2004)
23. Chen, H., Yuan, H., Yao, D., Chen, L., Chen, W.: An Integrated Neighborhood Correlation and Hierarchical Clustering Approach of Functional MRI. IEEE Transactions on Biomedical Engineering 53(3), 452–458 (2006)
24. Fadili, M.J., Ruan, S., Bloyet, D., Mazoyer, B.: On the Number of Clusters and the Fuzziness Index for Unsupervised FCA Application to BOLD fMRI Time Series. Medical Image Analysis 5, 55–67 (2001)
25. Zhong, N., Yao, Y.Y., Ohshima, M.: Peculiarity Oriented Multidatabase Mining. IEEE Transactions on Knowledge and Data Engineering 15(4), 952–960 (2003)
26. Golland, Y., Golland, P., Bentin, S., Malach, R.: Data-driven Clustering Reveals a Fundamental Subdivision of the Human Cortex into Two Global Systems. Neuropsychologia 46, 540–553 (2008)
27. Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis, pp. 189–225. John Wiley & Sons, New York (1973)
Interaction between Visual Attention and Goal Control for Speeding Up Human Heuristic Search

Rifeng Wang 1,2, Jie Xiang 1,3, and Ning Zhong 1,4

1 The International WIC Institute, Beijing University of Technology, China
2 Dept. of Computer Science, Guangxi University of Technology, China
3 College of Computer and Software, Taiyuan University of Technology, China
4 Dept. of Life Science and Informatics, Maebashi Institute of Technology, Japan
[email protected]
Abstract. Heuristic search is the area with the largest qualitative gap between human performance and computer performance. Most studies of heuristic search are empirical studies in computer science, especially in artificial intelligence. In this paper, we examine the factors that affect heuristic search from the perspective of the human brain. Subjects performed a set of heuristic problems in a functional Magnetic Resonance Imaging (fMRI) environment, and a computational cognitive model was set up to simulate the information processing of heuristic problem solving. During the modeling, we found that two main factors, visual attention and goal control, are responsible for speeding up heuristic search in problem solving: visual attention captures a target selectively under goal-state-directed control. The interaction between these two cognitive systems speeds up heuristic search, an area in which human intelligence is superior to machine intelligence. We support this conclusion with modeling results, including an analysis of time costs, the information processing operations, and the fit between fMRI results and model predictions.
1 Introduction
The largest qualitative gap between human performance and computer performance is in the area of heuristics [12]. Heuristic search is a main strategy of human problem solving, as in chess playing and cryptarithmetic problems [14,21]. Heuristic search is also a popular method in Artificial Intelligence (AI) problem solving, as in the traveling salesman problem [20] and flow shop scheduling [11]. Most existing studies on heuristic search are empirical studies in computer science, and a large gap remains between human intelligence and artificial intelligence [15]. In this paper, we focus on how heuristic search is sped up by the interaction between two cognitive systems: the selective attention of the visual perceptual system and the goal state control of thought. We suggest that this interaction is the main reason for the high performance of human heuristic search. There are many studies on these two systems of
the human brain [10,13,22,25]. What these studies have in common is that they focus on the mechanisms from the perspective of data-driven analysis; they also show that the interaction between visual attention and goal state control is not a new topic. In this paper, however, we show some evidence from a model-driven perspective. A method of ACT-R (Adaptive Control of Thought-Rational) simulation combined with an fMRI experiment is used in this study: an information processing model was set up to simulate heuristic problem solving on a simplified 4*4 Sudoku paradigm, and an fMRI experiment [19] was performed with the same puzzle paradigm to collect real data against which to test the model. Through simulation, we analyze the time cost of selective attention and show the sequences of information processing in the two cognitive systems. The fit between the fMRI results and the ACT-R predictions also supports our conclusion.
2 Human Heuristic Problem Solving
2.1 Paradigm: Simplified 4*4 Sudoku
We developed a problem-solving paradigm, the simplified 4*4 Sudoku shown in Fig. 1, to examine the heuristic search of human beings. A 4*4 Sudoku is a 4*4 matrix with two mid-lines dividing the matrix into 4 boxes, and some of the grids are filled with digits. The constraint is that each row, each column, and each 2*2 box contains each digit from 1 to 4 exactly once. To control the heuristic strategies that participants use, we simplified the problem so that only one grid, marked with '?', needs to be filled. For example, the '?' in Fig. 1(a) should be '3' according to the constraint.
2.2 Heuristics in Simplified 4*4 Sudoku
Heuristics are effective rules, carrying empirical knowledge, that help to solve a problem efficiently. When solving a problem, people always try to find heuristics that let them answer the question quickly. We have suggested that goal-oriented search is an important factor in heuristics retrieval [25]; in this study we are more concerned with the factors that speed up heuristic search. The fundamental knowledge needed to solve a simplified 4*4 Sudoku involves searching for the grid '?' and the three digits among 1, 2, 3, 4 related to '?', while checking the constraint in at most three dimensions of '?': row, column, and box. As a well-structured problem, the heuristics in the simplified 4*4 Sudoku were predefined before the experiments. The following four heuristics are used in this study (a small constraint-checking sketch follows the list).
– Row-Column heuristics: three digits are set in the row and the column of '?', and no digit in the box of '?', as shown in Fig. 1(a).
– Row-Box heuristics: three digits are set in the row and the box of '?', as shown in Fig. 1(b).
– Column-Box heuristics: three digits are set in the column and the box of '?', as shown in Fig. 1(c).
– Row-Column-Box heuristics: three digits are set in the row, the column, and the box of '?', only one digit in each dimension, as shown in Fig. 1(d).
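All four heuristics reduce to the same constraint check: collect the digits visible in the three dimensions of '?' and take the one digit missing. A minimal sketch (our illustration; the example grid is hypothetical):

def solve_cell(grid, row, col):
    """Answer for the '?' cell of a simplified 4*4 Sudoku.
    grid: 4x4 lists with digits 1-4 or None; (row, col) is the '?' cell."""
    seen = set(grid[row])                               # row constraint
    seen |= {grid[r][col] for r in range(4)}            # column constraint
    br, bc = 2 * (row // 2), 2 * (col // 2)             # top-left of the 2*2 box
    seen |= {grid[r][c] for r in (br, br + 1) for c in (bc, bc + 1)}
    candidates = {1, 2, 3, 4} - seen                    # the missing digit
    assert len(candidates) == 1, "no single-step heuristic applies"
    return candidates.pop()

# Hypothetical Row-Column instance: 1 and 4 in the row of '?', 2 in its
# column, and no digit in its box, so the answer must be 3.
g = [[None, None, 1, 4],
     [None, None, None, None],
     [2, None, None, None],
     [None, None, None, None]]
print(solve_cell(g, 0, 0))  # -> 3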
Fig. 1. Heuristics in the simplified 4*4 Sudoku
3 fMRI Experiment
3.1 Target of fMRI Experiment
As an advanced brain imaging technique, fMRI has advantages over competing techniques that include whole-brain coverage, non-interference of spatially separate activation sites, and good spatial resolution [24]. The target of our fMRI experiment is to obtain real data, including brain activity in predefined regions while subjects perform the tasks, along with behavioral data, to fit against the predictions of ACT-R, as further described in Section 4.4.
3.2 Procedure and Stimulus
The visual stimuli include 4 types of tasks using heuristics for simplified 4*4 Sudoku problem solving, as shown in Fig. 1. The experiment took two steps. First, before fMRI scanning, all participants were trained until they knew the heuristics well. Second, the participants performed the tasks in the MRI scanner. The protocol of a scan trial is shown in Fig. 2: a trial began with an alerting stimulus presented for two seconds, followed by a 4*4 Sudoku stimulus that stayed on the screen until the participant indicated they knew the answer or 20 seconds had elapsed. When participants found the answer to the problem, they pressed a key under the thumb of their right hand and then spoke the answer aloud; the Sudoku stimulus was then cleared. Event-related fMRI data were collected with a gradient echo planar pulse acquisition on a Siemens 3T Trio Tim scanner. Behavioral data and fMRI brain imaging data of 16 participants were collected at the same time.
Fig. 2. The protocol of a scan trial in fMRI
3.3 Results of fMRI Experiment
The mean response time (RT) from the behavioral data collected simultaneously with fMRI over 16 subjects is about 2.9 seconds, which shows that subjects used about 3 seconds to answer the simplified 4*4 Sudoku puzzles. fMRI data preprocessing (e.g., motion correction) and statistical analysis were performed with the NeuroImaging Software package (NIS, kraepelin.wpic.pitt.edu/nis/). The percent change of the BOLD (Blood Oxygen Level-Dependent) signal in cortex is shown as the dotted lines in Fig. 4, reflecting brain activity in the predefined regions associated with the different cognitive modules for problem solving. In this study, we mainly focus on the visual module (associated with fusiform gyrus (FG) cortex) and the goal module (associated with anterior cingulate cortex (ACC)).
4 Simulation on ACT-R Architecture
4.1 Target of ACT-R Model
The behavioral data (RT) and functional data (BOLD% in the predefined FG and ACC regions) from fMRI tell us how long subjects took to solve the problem and which brain regions were activated. What we are more concerned with, however, is how subjects solve these complex tasks: we want the processes of information processing in heuristic search in detail. The fMRI experiment alone cannot provide these answers, but ACT-R can. As one of the most popular theories and computational models of human cognitive architecture [2,3,5], ACT-R proposes systematic hypotheses about the basic structure of the human cognitive system and the functions of these structures in information processing, generating human cognitive behavior and the associated BOLD responses of brain regions (in which it is superior to other cognitive architectures, e.g., SOAR [16]). It is also a software platform for developing computational models that quantitatively simulate and predict human behavior for a wide range of cognitive tasks [4,17,18,19]. Hence we used ACT-R as a tool to investigate the information processing.
4.2 ACT-R Architecture
In the ACT-R architecture, there are eight modules that correspond to human capacities, as shown in Fig. 3. For example, the visual module records visual attention, and the goal module records the goal and control state. These eight modules are integrated to produce coherent cognition in problem solving. Furthermore, these eight modules are associated with distinct brain regions, as demonstrated by a series of psychology experiments: operations of the visual module are reflected in the activity of FG, and those of the goal module in ACC. Accordingly, the BOLD responses of these regions can be predicted from the operations of the related modules [1-6,17,18,19].
Fig. 3. Eight modules in ACT-R architecture
4.3 Tasks Analysis
The goal of task analysis is to obtain cognitive assumptions, at a very fine granularity, about how subjects solve the problem [14,22]. In heuristic search studies, a typical view is that searching always begins at the most constrained item of the problem so as to gain maximal information. Integrating this view with the constraints of the heuristics of a simplified 4*4 Sudoku puzzle, we obtain the cognitive processes of puzzle solving. Subjects search for '?' first, and then shift attention to the dimensions related to '?' to search for the three digits, ignoring irrelevant information. Once all related information is gathered and integrated, a heuristic is fired to retrieve the answer of '?'. Finally, a key-press response is made and the answer is spoken aloud. Throughout this period, a series of goal and control states in the goal module is set to guide the progress of problem solving. In ACT-R, several distinct functional modules cooperate to accomplish these processes, including visual attention, goal and state control, problem representation, memory retrieval, and production rules. In this paper, we focus only on the visual module, the goal module, and their interaction. Operations in visual selective attention range from taking a glance to search for '?', based on the 'global-first' theory [8], to finding the answer of '?'; operations in the goal module include finding out all the items required by the heuristics and shifting state control step by step until the question is answered.
4.4 Predictions of ACT-R Model
Based on the task analysis, an ACT-R model was set up. As a prediction of the behavioral results, the mean latency (also called response time, RT) over 16 participants from the presentation of the task to the thumb press is about 2.84 seconds, which deviates by 0.06 seconds from the observed 2.9 seconds. Visual selective attention costs some time in each operation, while goal state control costs less time than
Fig. 4. ACT-R predictions on visual and goal modules
visual selective attention, because the goal operations run in parallel with visual attention according to ACT-R theory [3]. The solid lines in parts (a) and (b) of Fig. 4 display the predicted percent change of the BOLD response in the two predefined regions (left cortex only). The baseline for the contrast is the two scans (2 seconds per scan) before stimulus onset.
– Part (a) of Fig. 4 shows the results for the left fusiform region (FG), associated with the visual module. The predictions match the real data well; the correlation with the left fusiform was about 0.97.
– Part (b) of Fig. 4 shows the results for the left anterior cingulate cortex (ACC), associated with the goal module. The correlation with the left ACC region was about 0.91.
This fit shows that our hypotheses on the visual module and the goal module are reasonable and that the model is valid.
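ACT-R derives such curves by convolving each module's demand function (1 while the module is busy, 0 otherwise) with a gamma-shaped hemodynamic response [1,17]. A minimal sketch (the parameter values here are illustrative, not the fitted ones):

import numpy as np

def predict_bold(demand, dt=0.1, a=6.0, s=0.75):
    """demand: 0/1 samples of module activity every dt seconds.
    Returns the predicted BOLD curve, normalized to its peak."""
    t = np.arange(0.0, 30.0, dt)
    h = (t / s) ** a * np.exp(-t / s)            # gamma-shaped response
    bold = np.convolve(demand, h)[:len(demand)]
    return bold / bold.max()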
5 Interaction of Visual Attention and Goal State Control
5.1 Function of Visual Attention
Visual attention is one of the human capacities that is superior to machine intelligence: human vision can selectively capture relevant information in a scene and ignore distractions. Many current studies have demonstrated the function of selective attention models [9,26]. In our study, the selective attention of the visual module is one of the main factors behind the high performance of human heuristic search; it shows the power of the human perceptual system in controlling the information taken in from the outside world. One interpretation of selective attention is that the input capacity of the human visual system is limited [26]. In this study, we suggest that it is also part of human intelligence, in that it reduces the cost of cognition.
The main hypotheses for the visual operations in the simplified 4*4 Sudoku are that subjects take a global glance to capture '?' first, and then attend only the digits related to '?' selectively. The search for '?' is based on the 'global-first' theory of vision proposed by Lin Chen in 1982 [8]. In the ACT-R model, a visual operation takes three steps: first, attend the location of the target via a 'visual-location' chunk; second, shift attention to that location via a 'move-attention' chunk; and last, encode the character at that location via a 'text' chunk. The total time consumed by the three steps is about 185 milliseconds. To validate the hypothesis, we tested it in the ACT-R model against a casual attention process following the local-first theory, in which three digits are attended before '?' is found. Table 1 shows the results of the two models and demonstrates that attending more digits increases the time cost: for example, it costs about 740 milliseconds to attend three digits and '?', and about 1110 milliseconds to attend six digits. The second model, with the 'attend all' hypothesis, thus costs more time and is not a reasonable model (the short calculation after Table 1 reproduces these figures).

Table 1. Time cost in two visual hypotheses

             ACT-R Model                          Visualizing Items    Time Cost (ms)
Attend '?'   Selective Attention (Global-first)   '?'                  185
             Casual Attention (Local-first)       Three digits, '?'    740
Attend all   Selective Attention (Global-first)   '?', three digits    740
             Casual Attention (Local-first)       '?', six digits      1295
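Since each visual operation costs roughly 185 ms in the model, the table's predictions are simply 185 ms per attended item:

COST_MS = 185  # locate + shift attention + encode, per item
strategies = {
    "attend '?', global-first": 1,    # '?' only
    "attend '?', local-first": 4,     # three digits, then '?'
    "attend all, global-first": 4,    # '?', then three digits
    "attend all, local-first": 7,     # '?', then six digits
}
for name, n_items in strategies.items():
    print(f"{name}: {COST_MS * n_items} ms")  # 185, 740, 740, 1295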
5.2 Function of Goal State Control
Besides the capacity of the visual perceptual system, another factor guiding selective attention is goal state control inside the human mind. Current studies hold that a goal state can guide or control which item of the scene should be attended and encoded during visual search [10]. A goal refers to an intent of a mental process. In heuristic search, the goal sequences carry the heuristic information that helps to find the answer; in the simplified 4*4 Sudoku, the goals involve the information in the heuristics, namely '?' and the three digits related to '?'. In ACT-R, the goal module is responsible for maintaining goal states during the search. When the current goal is achieved, it is erased and the next goal is loaded, and this loop repeats until the final goal is achieved. Goal state sequences can guide attention to capture only related information and ignore distractions during the search. Attention operations can thus reduce the cost of cognition and improve the performance of human vision: attending every item would not only cost much time, as shown in Table 1, but also cause more BOLD activity according to ACT-R theory [1,17,18].
5.3 Interaction between Two Cognitive Modules
Goal state control in the human mind decides the intention of the search: what information is to be selected. In heuristic search, a goal state itself becomes a heuristic item that helps to speed up the search. Without goal state control in mind, heuristic searching would be a casual operation, meaning that any item in the scene might be captured; in that case, the time cost is unbounded until a goal is selected. Current studies hold that a goal state can guide or control which item in the scene should be attended and encoded during visual search [13,23]. The interaction between selective attention and goal state control may speed up heuristic search in problem solving. It resembles a process of pattern recognition: when a goal state is active in mind, the visual perception system is fired to search the scene and find only something matching the goal. When that item has been encoded, the current goal is erased and a new goal is loaded, and this process loops until the answer is finally achieved. Table 2 (taking Fig. 1(a) as the example) shows the interaction between the visual attention module and the goal module in solving the puzzle; a toy version of this loop follows the table.

Table 2. Information processing of goal-oriented selective attention

Lists   Visual Module                     Goal Module
        Location        Text              Goal State      Operation
1       (250,250)       Null              '?'             Searching
2       (280,260)       '?'                               Encoding
3       Row of '?'                        First digit     Searching
4       (205,260)       '3'                               Encoding
5                                         Second digit    Searching
6       (255,260)       '1'                               Encoding
7       Column of '?'                     Third digit     Searching
8       (280,210)       '4'                               Encoding
9       Done                              Answer          Integrating
10                                        Answer          Retrieving
11                                                        Pressing
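A toy rendering of this loop (our illustration, not ACT-R code; the scene uses the locations and items from Table 2):

def heuristic_search(scene):
    """Match-act loop: each active goal fires one selective visual search;
    encoding the found item erases the goal and loads the next one."""
    goals = ['?', 'digit', 'digit', 'digit']     # goal sequence from Table 2
    attended, found = set(), []
    for goal in goals:
        for loc, item in scene.items():          # visual module scans the scene
            wanted = item == '?' if goal == '?' else item.isdigit()
            if loc not in attended and wanted:   # attend only goal-matching items
                attended.add(loc)
                found.append(item)
                break                            # goal achieved; load the next
    digits = {int(x) for x in found if x.isdigit()}
    return ({1, 2, 3, 4} - digits).pop()         # integrate and retrieve the answer

scene = {(280, 260): '?', (205, 260): '3', (255, 260): '1', (280, 210): '4'}
print(heuristic_search(scene))                   # -> 2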
6 Discussion
The information processing sequences in Table 2, extracted from the ACT-R output traces of the visual module and the goal module, suggest that the cooperation of these two functional cognitive systems plays an important role in speeding up heuristic search. Visual attention takes in perceptual stimuli from the external world, while the goal state controller sends out requests from the internal mind. This close interaction speeds up heuristic search and looks like a rational circuit of human intelligence. The broader issue of cooperation between the human perceptual system (including the aural module, etc.) and the thought system (including memory, reasoning, etc.) may be a long-term topic for promoting a harmonious way for the human brain to work with its environment. Although we do not address other functional cognitive systems here (e.g., memory retrieval, problem representation, and
production rules), this does not mean that they are unimportant in heuristic search, only that they have less influence on heuristic search relative to the visual system and the goal control system. In future work, we will consider other functional modules (e.g., vision and memory [7,9]) that cooperate to accomplish heuristic search.
Acknowledgments

We would like to thank Prof. Yulin Qin for giving many good suggestions in this study. This work is supported by the National Natural Science Foundation of China under Grant No. 60875075 and partly by the Beijing Natural Science Foundation under Grant No. 4102007.
References

1. Anderson, J.R., Qin, Y.L., Sohn, M.H., et al.: An information-processing model of the BOLD response in symbol manipulation tasks. Psychonomic Bulletin and Review 10(2), 241–261 (2003)
2. Anderson, J.R., Bothell, D., Byrne, M.D., et al.: An integrated theory of the mind. Psychological Review 111, 1036–1060 (2004)
3. Anderson, J.R.: Human symbol manipulation within an integrated cognitive architecture. Cognitive Science 29(3), 313–341 (2005)
4. Anderson, J.R., Albert, M.V., Fincham, J.M.: Tracing problem solving in real time: fMRI analysis of the subject-paced Tower of Hanoi. Journal of Cognitive Neuroscience 17(8), 1261–1274 (2005)
5. Anderson, J.R.: How Can the Human Mind Occur in the Physical Universe? Oxford University Press, USA (2007)
6. Anderson, J.R., Qin, Y.L., Jung, K.J., et al.: Information-processing modules and their relative modality specificity. Cognitive Psychology 54(3), 185–217 (2007)
7. Bays, P.M., Husain, M.: Dynamic Shifts of Limited Working Memory Resources in Human Vision. Science 321(5890), 851–854 (2008)
8. Chen, L.: Topological structure in visual perception. Science 218, 699–700 (1982)
9. Fockert, J.W., Rees, G., Frith, C.D., et al.: The role of working memory in visual selective attention. Science 291, 1803 (2001)
10. Luszczynska, A., Diehl, M., Gutiérrez-Doña, B., et al.: Measuring one component of dispositional self-regulation: attention control in goal pursuit. Personality and Individual Differences 37(3), 555–566 (2004)
11. Laha, D., Subhash, C.S.: A heuristic to minimize total flow time in permutation flow shop. Omega 37, 734–739 (2009)
12. McCarthy, J.: From here to human-level AI. Artificial Intelligence 171(18), 1174–1182 (2007)
13. Michel, W., Rik, P., John, L.: Attention Switching During Scene Perception: How Goals Influence the Time Course of Eye Movements Across Advertisements. Journal of Experimental Psychology: Applied 14(2), 129–138 (2008)
14. Newell, A., Simon, H.A.: Human Problem Solving. Prentice-Hall, Englewood Cliffs (1972)
15. Newell, A., Simon, H.A.: Computer science as empirical inquiry: Symbols and search. Communications of the Association for Computing Machinery 19(3), 113–126 (1976) [1975 ACM Turing Award lecture]
16. Newell, A.: Unified theories of cognition. Harvard University Press, MA (1990)
17. Qin, Y.L., Sohn, M.H., Anderson, J.R., et al.: Predicting the practice effects on the blood oxygenation level-dependent (BOLD) function of fMRI in a symbolic manipulation task. PNAS (Proceedings of the National Academy of Sciences of the United States of America) 100(8), 4951–4956 (2003)
18. Qin, Y.L., Carter, C.S., Silk, E., et al.: The change of the brain activation patterns as children learn algebra equation solving. PNAS (Proceedings of the National Academy of Sciences of the United States of America) 101(15), 5686–5691 (2004)
19. Qin, Y.L., Bothell, D., Anderson, J.R.: ACT-R meets fMRI. In: Zhong, N., Liu, J., Yao, Y., Wu, J., Lu, S., Li, K., et al. (eds.) WImBI 2006. LNCS (LNAI), vol. 4845, pp. 205–222. Springer, Heidelberg (2007)
20. Renaud, J., Boctor, F.F., Ouenniche, J.: A heuristic for the pickup and delivery traveling salesman problem. Computers and Operations Research 27(9), 905–916 (2000)
21. Simon, H.A., Schaeffer, J.: The game of chess. In: Aumann, R.J., Hart, S. (eds.) Handbook of Game Theory, vol. 1, pp. 2–17. Elsevier, Holland (1992)
22. Simon, H.A.: The information-processing theory of mind. American Psychologist 50(7), 507–508 (1995)
23. Song, S.H., Chen, A.J., Nycum, T.J., et al.: Attention Reduces Variability of Goal-relevant Perceptual Representations within Visual Association Cortex. NeuroImage 47, S65 (2009)
24. Weiskopf, N., Sitaram, R., Josephs, O., et al.: Real-time functional magnetic resonance imaging: methods and applications. Magnetic Resonance Imaging 25(6), 989–1003 (2007)
25. Wang, R.F., Xiang, J., Zhou, H.Y., et al.: Simulating Human Heuristic Problem Solving: A Study by Combining ACT-R and fMRI Brain Image. In: Proc. Brain Informatics, pp. 53–62 (2009)
26. Yantis, S.: To See Is to Attend. Science 299(5603), 54–56 (2003)
The Role of Posterior Parietal Cortex in Problem Representation

Jie Xiang 1,2, Yulin Qin 1,3, Junjie Chen 2, Haiyan Zhou 1, Kuncheng Li 4, and Ning Zhong 1,5

1 The International WIC Institute, Beijing University of Technology, China
2 College of Computer and Software, Taiyuan University of Technology, China
3 Dept. of Psychology, Carnegie Mellon University, USA
4 Dept. of Radiology, Xuanwu Hospital, Capital University of Medical Sciences, China
5 Dept. of Life Science and Informatics, Maebashi Institute of Technology, Japan
[email protected], [email protected]
Abstract. Problem representation is one of the key factors in problem solving. According to previous studies, the PPC (Posterior Parietal Cortex) is critical for problem representation. Does the expression form of a problem affect its representation? What is the cognitive role of PPC in representation? To answer these questions, an fMRI experiment was performed in this study to examine the role of PPC in problem solving. It was a 2 × 2 designed experiment with two 2-level factors: task complexity (one-step and two-steps) and expression form (digits and symbols). In a digital task, 4 digits are provided in the initial grids, while 4 poker symbols are provided in a symbolic task. In a one-step task, participants need only one rule retrieval to get the target answer, while in a two-steps task, participants need two rule retrievals, obtaining the answer of a bridging location before the target. The fMRI results show that PPC was significantly activated. Further analysis shows a positive correlation between the activation intensity of PPC and task complexity, but no significant correlation between the activation intensity of PPC and task expression. From these results, we infer that PPC plays an important role in problem representation, and that this representation may be a high-level abstraction.
1 Introduction
Problem solving is one of the most general forms of human intelligence, and cognitive psychology has done a great deal of research on it. Newell and Simon [1,2] proposed the information processing theory of problem solving. In their theory, problem solving is considered a series of goal-directed cognitive operations. A problem has an initial state, a target state, and operations on problem states; problem solving is the process of applying a series of operations that change the problem state, and once the target state is reached, the problem has been solved. The core components of problem solving are the identification and representation of problem states, as well as the selection and application of
operations. Based on the information processing view, Newell [3] and Anderson [4] each developed cognitive architecture systems to simulate the human brain's problem-solving process, and their work further deepened the understanding of the psychological processes and mechanisms of problem solving. However, these studies were mainly based on behavioral experiments, such as verbal reports and computer simulation, and such methods can hardly reveal the cognitive neural mechanisms of problem solving. Focusing on the cognitive neuroscience of problem solving, many studies have produced preliminary but meaningful results using brain imaging technologies such as PET (positron emission tomography) and fMRI (functional magnetic resonance imaging). The tasks of these studies include the Tower of Hanoi (TOH) and the Tower of London (TOL). Fincham [5] modified TOH and used event-related fMRI to study the information processing of problem solving; participants were trained to use the same strategy before MRI scanning, and the activated brain regions included the right dorsolateral prefrontal cortex (BA 9), bilateral parietal cortex (BA 40, 7), and bilateral primary motor areas (BA 6), among others. Anderson and his team [6] used ACT-R (Adaptive Control of Thought-Rational) to explore and simulate the problem-solving process of TOH, focusing on the BOLD (Blood Oxygen Level Dependent) effects in PPC (BA 39, 40), PFC (prefrontal cortex, BA 45, 46), and the motor region (BA 3, 4); in their view, problem representation induces PPC activation, and PPC plays a crucial role in problem representation. There are also many studies of TOL. Baker [7] used PET to study the brain regions activated while participants solved the TOL problem; by comparing TOL tasks with a control task, and more difficult tasks with easier ones, he inferred that PPC is involved in representing intermediate problem states in visuospatial working memory. Dagher [8] designed a TOL task with five difficulty levels and a resting level as baseline; the PET results show that PPC activity is related to the number of moves. Odile [9] designed an improved event-related fMRI experiment using the TOL task, whose results show that the BOLD response of the inferior parietal cortex during the task is higher than baseline. An fMRI study by Wagner [10] shows that the bilateral dorsolateral prefrontal, right ventrolateral prefrontal, left rostral prefrontal, thalamus, and bilateral parietal regions are all related to task complexity. Newman [11] used fMRI, computer simulation, and functional connectivity to study the solving process of TOL; the results show that both the left and right prefrontal cortex are activated, and that the functional connection pattern of the left prefrontal cortex differs from that of the right. To explain the underlying neural mechanisms of problem solving, some researchers have used the TOL task to explore patients' problem-solving ability. Shallice [12] used TOL to study the planning and problem-solving ability of patients with brain injury; patients with prefrontal damage performed poorly, mainly because they lacked effective advance-planning capacity. Owen [13] also found that problem-solving ability declines with prefrontal damage, and that the hippocampus, temporal lobe, and parietal regions also affect problem-solving ability. The results of Dagher's [14] PET study of Parkinson's patients also show that
the brain regions whose activation is related to task complexity include PFC, ACC (anterior cingulate cortex), PPC, and the caudate nucleus. Based on these previous studies, the present study focuses on the role of PPC in problem solving. Although most existing studies have shown activation of this brain region, exploration of its function is still relatively limited. This study uses a new kind of problem-solving task, 4 × 4 Sudoku. The task requires little background knowledge, and only seven basic operation rules may be used to solve it. The process of solving the problem is simple and clear, which facilitates repeated trials to increase the signal-to-noise ratio; moreover, participants' behavior is easy to control, making it easier to infer the relationship between cognitive components and activated brain regions. We hypothesize that PPC will be significantly activated in 4 × 4 Sudoku, that there will be a positive correlation between its BOLD response and task complexity, and that its activation intensity will not correlate with expression form.
2 Experiment
2.1 Participants
19 college students (9 males and 10 females) from Beijing University of Technology were scanned after giving informed consent. The average age of the participants was 22.8, and all were right-handed.
2.2 Tasks
Event-related fMRI data were recorded while participants solved simplified 4 × 4 Sudoku tasks. Sudoku is a combinatorial number-placement puzzle; the goal is to fill a 4 × 4 grid so that each column, each row, and each of the four 2 × 2 boxes contains each digit from 1 to 4 exactly once. As shown in Figure 1, in this study we simplify the puzzle and ask participants to give the answer of the cell marked with '?'. It was a 2 × 2 designed experiment with two 2-level factors: task complexity (one-step and two-steps) and expression form (digits and symbols). There were four types of tasks in total. In a digital task, 4 digits are provided in the initial grid (e.g., Figure 1(a) and 1(c)), while 4 poker symbols are provided in a symbolic task (e.g., Figure 1(b) and 1(d)). In a one-step task, participants only needed to find the answer of the cell marked '?' (e.g., Figure 1(a) and 1(b)), while in a two-steps task, participants had to find the answer of the cell marked '*' before they could find the answer of the cell marked '?' (e.g., Figure 1(c) and 1(d)).
2.3 The Protocol of Stimuli Presentation
As shown in Figure 2, a trial of the experiment starts with a red star shown for 2 seconds as a warning (stimuli were shown visually on a black screen); the participants then solved the puzzle within a period of at most 20 seconds.
Fig. 1. Four experiment tasks
When participants found the answer of '?', they were asked to press a button immediately and speak the answer aloud within a 2-second period. Participants were encouraged to finish the problem as correctly and quickly as possible. After that, the correct answer was shown on the screen for 2 seconds as feedback. Then there was a 10-second inter-trial interval (ITI; a white cross shown on the screen) during which the participants were asked to rest. There were 5 sessions, each with 48 or more trials, and each session involved the 4 types of tasks randomly selected with equal probability.
Fig. 2. The process of stimuli presentation
2.4 Tasks Training
To ensure uniform behavior, we trained participants to become acquainted with the seven heuristic rules shown in Figure 3 before the fMRI scan. They were then asked to practice the four kinds of formal experimental tasks, five for each rule, solving each problem step by step and reporting the problem-solving process aloud. The training ensured that participants used the same cognitive process while being scanned: first checking the column and/or row and/or box, then integrating this information to represent the initial state, and finally retrieving the rules once or twice to get the target answer.
2.5 fMRI Scan
Images were acquired on a 3.0T MR scanner (Siemens Trio Tim), and an SS-EPI (single-shot echo planar imaging) sequence sensitive to BOLD signals was used to acquire the fMRI data. The functional images were acquired with the following parameters: TR = 2000 ms, TE = 30 ms, flip angle = 90°, FOV = 200 mm × 200 mm, matrix size = 64 × 64, slice thickness = 3.2 mm, slice gap = 0 mm, and 32 axial slices with AC-PC on the 10th slice from the bottom of the brain.
Fig. 3. Training tasks of the seven rules: (a) Row, (b) Column, (c) Box, (d) Row & Column, (e) Row & Box, (f) Column & Box, and (g) Row & Column & Box heuristics
2.6 Data Processing
Data preprocessing (e.g., motion correction) and statistical analysis were performed with the NeuroImaging Software package (NIS, kraepelin.wpic.pitt.edu/nis/). Nine participants were excluded due to head movement exceeding 5 mm. After deleting the first 4 images of each session, all remaining images were coregistered to a common reference structural MRI image and smoothed with a 6 mm full-width half-maximum three-dimensional Gaussian filter. The activation threshold for the group analysis was P < 0.01 with activated regions of 8 or more contiguous voxels. After the exploratory group fMRI analysis, ROI (region of interest) analysis was performed; bilateral PPC (Talairach coordinates: ±23, −63, 40) was defined as an ROI based on the group analysis results to obtain the BOLD effect of PPC.
3 Results
3.1 Behavioral Data
Behavioral results are shown in Figure 4. The RT (reaction time) of the two-steps tasks is significantly longer than that of the one-step tasks [F(1,9)=41.263, P<0.0001], and that of the symbolic tasks is significantly longer than that of the digital tasks [F(1,9)=47.179, P<0.0001]. The ACC (accuracy) of the digital tasks is higher than that of the symbolic tasks [F(1,9)=10.785, P<0.01], and that of the one-step tasks is higher than that of the two-steps tasks, though not significantly [F(1,9)=1.236, P=0.295].
3.2 fMRI Results
As shown in Figure 5, PPC, the middle frontal gyrus, ACC, and FEF (frontal eye field) are all activated in this experiment, with the activation maximum localized in PPC. In accordance with our research purpose, we focus on the role of PPC in problem solving. As shown in Figure 6, bilateral PPC regions are defined as ROIs; each ROI
Fig. 4. The reaction time and accuracy of four experiment tasks
Fig. 5. Activated brain regions from exploratory analysis (left is right brain, right is left brain)
Fig. 6. The regions of interest of PPC defined based on the activated brain regions (the pink region is the group analysis result, the yellow is the PPC ROI)
contains 100 voxels. The BOLD effects of these two ROIs are shown in Figure 7; the accumulated change of the BOLD effect from scans 5 to 10 was taken as the BOLD intensity for further ANOVA analysis. For the left PPC, the ANOVA results show that the main effect of task expression is not significant [F(1,9)=0.393, P=0.546], while the main effect of step is significant [F=32.055, P<0.0001]. For the right PPC, the main effect of task expression is not significant [F(1,9)=0.613, P=0.454], while the main effect of step is significant [F(1,9)=21.022, P<0.001].
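A sketch of this ROI summary (our illustration; the baseline of two pre-stimulus scans is an assumption, not stated by the authors):

import numpy as np

def bold_intensity(roi_ts, n_baseline=2, window=(5, 10)):
    """roi_ts: (n_scans,) mean ROI signal for one condition and subject.
    Percent signal change against the pre-stimulus baseline, accumulated
    over scans 5-10, as entered into the ANOVA."""
    base = roi_ts[:n_baseline].mean()
    pct = 100.0 * (roi_ts - base) / base
    lo, hi = window
    return pct[lo:hi + 1].sum()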
Fig. 7. BOLD signal of bilateral PPC in the four experimental tasks (top: left PPC; bottom: right PPC)
3.3 The Correlation between Behavioral Data and fMRI Results
Pearson correlation analysis between the behavioral data (RT, ACC) and the BOLD intensity of PPC shows that the correlation coefficient between RT and the BOLD signal of the left PPC is 0.78 (p<0.05), and of the right PPC 0.75 (p<0.05); the correlation coefficient between ACC and the BOLD signal of the left PPC is −0.38 (p<0.05), and of the right PPC −0.35 (p<0.05).
4 Discussion
This paper is the first to use 4×4 Sudoku tasks to explore the neural mechanisms of human problem solving, focusing on the role of PPC in the process. Consistent with our hypotheses, we observed significant activation of bilateral PPC, left PFC, FEF, ACC, SMA, and other regions, which is consistent with previous studies [6,8,17]. This study shows that the activation of PPC is positively correlated with task complexity, which agrees with many previous studies [8-9, 15-16]. In
this study, the more difficult tasks require two rule retrievals, while in the TOH and TOL tasks, the more difficult tasks require more moves; in essence these are the same. The representation of problem states is essential for problem solving. Bustini [17] compared 17 schizophrenia patients with 17 normal controls on the TOH task; the performance of the schizophrenia patients was significantly worse, and a further regression analysis showed that the critical factor behind their lower problem-solving ability may be confusion in the representation of the problem context. Based on this cognitive analysis, we infer that PPC, the most strongly activated region, is related to the representation and manipulation of problem spaces. This inference is supported by the significant correlation between the behavioral data and PPC activity, and it is consistent with a large number of previous studies on PPC. Many studies have reported that PPC is related to the representation of information space; for example, studies of both monkeys and humans have revealed that PPC is a critical component of spatial representation. Anatomically, PPC lies in the dorsal pathway of the visual system, and many studies have found it to be related to perceiving the locations of objects in space; many studies have also shown that PPC is involved in spatial information processing. Summarizing previous studies on mammals and humans, Jonathan [18] proposed that PPC plays an important role in converting coordinate information in the environment into internal representations; Todd [19] argued that PPC is related to the storage of perceptual information; Anderson [20] proposed that PPC is related to encoding and updating problem states, and that PPC is the neural substrate of the buffer of the imaginal module in ACT-R. Based on TOL, Wagner [10] inferred that PPC is related to visuospatial information processing and to the imagination and manipulation of the stimulus materials. In this study, the activation of PPC is not related to the expression form of the task: whether digits or the four kinds of poker symbols are used, the abstract representation of the 4×4 Sudoku is the same. This indicates that PPC may reflect the abstract representation of problem states rather than the specific expression form of the problem, a conclusion consistent with previous findings that PPC is involved in multisensory information integration [21-22]. In a sense, this indicates that PPC may represent problem states at a higher level of abstraction. As the first study using 4×4 Sudoku, a new problem paradigm, this study also has some shortcomings; for example, the 4×4 Sudoku task cannot separate spatial information integration from problem representation. Future work will improve the experimental design to explore the cognitive role of PPC, and multi-view analysis methods, such as dynamic causal modelling, will be used to explain the information processing of problem solving.
Acknowledgement

The authors would like to thank Shengfu Lv and Peipeng Liang for their comments on this paper. The authors also thank Lijuan Wang, Yinyin
Hu and Fenfen Wang for their help in experiments. This work was supported by the Natural Science Foundation of China (No. 60875075 and 60975032).
References

1. Newell, A., Simon, H.A.: Human Problem Solving. Prentice-Hall, New Jersey (1972)
2. Simon, H.A.: Search and reasoning in problem solving. Artificial Intelligence 21, 7–29 (1983)
3. Newell, A.: Unified Theories of Cognition. Harvard University Press, Cambridge (1990)
4. Anderson, J.R.: ACT: A simple theory of complex cognition. American Psychologist 51, 355–365 (1996)
5. Fincham, J.M., Carter, C.S., van Veen, V., et al.: Neural mechanisms of planning: A computational analysis using event-related fMRI. PNAS 99(5), 3346–3351 (2002)
6. Anderson, J.R., Albert, M.V., Fincham, J.M.: Tracing problem solving in real time: fMRI analysis of the subject-paced Tower of Hanoi. Journal of Cognitive Neuroscience 17(8), 1261–1274 (2005)
7. Baker, S.C., Rogers, R.D., Owen, A.M., et al.: Neural systems engaged by planning: a PET study of the Tower of London task. Neuropsychologia 34, 515–526 (1996)
8. Dagher, A., Owen, A.M., Boecker, H., et al.: Mapping the network for planning: A correlational PET activation study with the Tower of London task. Brain 122(10), 1973–1987 (1999)
9. van den Heuvel, O.A., Groenewegen, H.J., Barkhof, F., et al.: Frontostriatal system in planning complexity: a parametric functional magnetic resonance version of the Tower of London task. NeuroImage 18, 367–374 (2003)
10. Wagner, G., Koch, K., Reichenbach, J.R., et al.: The special involvement of the rostrolateral prefrontal cortex in planning abilities: An event-related fMRI study with the Tower of London paradigm. Neuropsychologia 44, 2337–2347 (2006)
11. Newman, S.D., Carpenter, P.A., Varma, S., Just, M.A.: Frontal and parietal participation in problem solving in the Tower of London: fMRI and computational modeling of planning and high-level perception. Neuropsychologia 41, 1668–1682 (2003)
12. Shallice, T.: Specific impairments of planning. Philosophical Transactions of the Royal Society of London B 298, 199–209 (1982)
13. Owen, A.M., Downes, J.D., Sahakian, B.J., et al.: Planning and spatial working memory following frontal lobe lesions in man. Neuropsychologia 28, 1021–1034 (1990)
14. Rasser, P.E., Johnston, P., Lagopoulos, J., et al.: Functional MRI BOLD response to Tower of London performance of first-episode schizophrenia patients using cortical pattern matching. NeuroImage 26, 941–951 (2005)
15. Lazeron, R.H., Rombouts, S.A., Machielsen, W.C., et al.: Visualizing brain activation during planning: the tower of London test adapted for functional MR imaging. American Journal of Neuroradiology 21, 1407–1414 (2000)
16. Schall, U., Johnston, P., Lagopoulos, J., et al.: Functional brain maps of Tower of London performance: a PET and fMRI study. NeuroImage 20, 1154–1161 (2003)
17. Bustini, M., Stratta, P., Daneluzzo, E., et al.: Tower of Hanoi and WCST performance in schizophrenia: problem-solving capacity and clinical correlates. Journal of Psychiatric Research 33, 285–290 (1999)
18. Whitlock, J.R., Sutherland, R.J., Witter, M.P., et al.: Navigating from hippocampus to parietal cortex. PNAS 105(39), 14755–14762 (2008)
19. Todd, J.J., Marois, R.: Posterior parietal cortex activity predicts individual differences in visual short-term memory capacity. Cognitive, Affective, & Behavioral Neuroscience 5(2), 144–155 (2005)
20. Anderson, J.R.: How Can the Human Mind Occur in the Physical Universe? Oxford University Press, USA (2007)
21. Andersen, R.A., Snyder, L.H., Bradley, D.C., Xing, J.: Multimodal representation of space in the posterior parietal cortex and its use in planning movements. Annual Review of Neuroscience 20, 303–320 (1997)
22. Medendorp, W.P., Goltz, H.C., Crawford, J.D., Vilis, T.: Integration of target and effector information in human posterior parietal cortex for the planning of action. Journal of Neurophysiology 93(2), 188–199 (2005)
Basic Level Advantage and Its Switching during Information Retrieval: An fMRI Study

Haiyan Zhou1, Jieyu Liu1, Wei Jing1, Yulin Qin1,2, Shengfu Lu1, Yiyu Yao1,3, and Ning Zhong1,4

1 International WIC Institute, Beijing University of Technology, China
2 Dept. of Psychology, Carnegie Mellon University, USA
3 Dept. of Computer Science, University of Regina, Canada
4 Dept. of Life Science and Informatics, Maebashi Institute of Technology, Japan
[email protected], [email protected]
Abstract. The basic level advantage effect in human cognition suggests that, during information retrieval, one level of abstraction is more readily accessible to the human mind than others; however, the phenomenon of the advantage effect switching to the superordinate level shows that the basic level is not always the most efficient route. Differences in the processing demands of tasks may explain this discrepancy between basic-level and superordinate-level precedence. In this study, we used a word-picture matching task and a picture-word matching task with the same materials to investigate the neural systems underlying the basic level advantage and its switching. The results showed more activation in the fusiform and lingual gyri in the word-picture matching task, suggesting that visual perceptual processing was loaded more heavily in this task, and more activation in the left inferior frontal gyrus in the picture-word matching task, suggesting that this task relied more on semantic memory. These contrast analyses revealed different processing strategies across the two tasks, which led to different advantage effects. Furthermore, the inferior parietal lobe, where the weakest deactivation appeared at the superordinate level in the word-picture matching task and at the intermediate level in the picture-word matching task, played an important role in the advantage effect during information retrieval; it might serve as a control or centralized system that integrates various cognitive resources.
1 Introduction
The scale requirement is a huge problem on the Web and in other IT areas, caused by the explosion of information and knowledge (e.g., reasoning over and retrieving from about 10 billion RDF triples in less than 100 ms) [1]. The real world faced by human beings likewise consists of distributed, incomplete, inconsistent, dynamic, massive-scale heterogeneous information sources. Through evolution, human beings have developed sophisticated heuristic search skills for reasoning, retrieval, problem solving, and decision making. We developed a system based on the basic level advantage in human cognition to speed up information retrieval in huge semantic repositories [2]. Understanding the principles and mechanisms of
information organization, retrieval, and selection in human memory is helpful for finding more cognition-inspired methods of information organization, problem solving, and reasoning at Web scale. To further explore the neural mechanisms of human information organization and retrieval, we carried out this fMRI study to investigate how human brain areas cooperate to achieve the basic level advantage and its switching during information retrieval. The basic level was first identified by Rosch [3]. She found that children tended to select a concept at the middle level (such as "dog") to name a picture, rather than a concept at the superordinate (such as "animal") or subordinate (such as "beagle") level. Many other researchers have found the same phenomenon in adults in picture naming and other picture-word matching tasks [4,5], with responses at this level being more accurate and faster than at others. It seems that the basic level is a preferred cognitive status at an intermediate level of abstractness [3], more readily accessible to the human mind than others [5], and that information at this level is used more frequently than at others [6]. However, other studies have found that the basic level advantage disappears under some constraints, with the processing advantage switching to other levels. VanRullen and Thorpe [7] used a rapid visual categorization task and found that categorization at the superordinate level took place before categorization at the basic level. Large et al. [8] manipulated the similarity between target and nontarget and found that response times were shortest when matching word and picture at the superordinate level. Others have found a reversal of the basic level advantage in semantic dementia patients [4,5] and in healthy people [5] performing rapid matching tasks. These results suggest that although information at different levels contributes to problem solving, the basic level is not always the most efficient route; selecting more general information may sometimes be a shortcut. A possible explanation for this discrepancy between basic-level and superordinate-level precedence relates to differences in the processing demands of the tasks used to measure performance [8]. For instance, in rapid object detection tasks, stimuli are presented for brief periods, which loads heavily on the visual perceptual processing of objects, whereas tasks such as object naming involve relatively longer stimulus presentation and may depend more on semantic memory and knowledge organization. Because of its good spatial resolution and low invasiveness, in this study we used fMRI (functional magnetic resonance imaging) to investigate when the basic level advantage and its switching occur in two different tasks, and how the neural system adapts as processing strategies change.
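As a toy illustration of the retrieval-priority idea (our sketch, not the system of [2]; the taxonomy and function are hypothetical), concepts in a small hierarchy can be ranked so that intermediate-level nodes are tried before superordinate- and subordinate-level ones:

# Hypothetical three-level taxonomy: superordinate -> intermediate -> subordinate.
taxonomy = {
    "animal": {"dog": ["beagle", "poodle"], "cat": ["siamese", "persian"]},
    "vehicle": {"car": ["sedan", "coupe"], "boat": ["sailboat", "steamboat"]},
}

def concepts_by_priority(taxonomy):
    # Collect concepts level by level, returning the basic (intermediate) level first.
    superordinate, basic, subordinate = [], [], []
    for top, mids in taxonomy.items():
        superordinate.append(top)
        for mid, subs in mids.items():
            basic.append(mid)
            subordinate.extend(subs)
    return ([("basic", c) for c in basic]
            + [("superordinate", c) for c in superordinate]
            + [("subordinate", c) for c in subordinate])

for level, concept in concepts_by_priority(taxonomy):
    print(level, concept)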
2 Method

2.1 Participants
Thirty-two Beijing University of Technology students (16 males and 16 females) participated in the fMRI study and signed written informed consent. Fifteen of them completed the word-picture matching task and 17 completed the picture-word matching
task. Participants were between 24 and 26 years old. All were right-handed and reported no neurological or psychiatric disorders.

2.2 Tasks and Materials
Two tasks were used in this study. One was word-picture matching (WP), in which a concept word was presented first, followed by a picture, and participants judged whether the picture matched the word. The other was picture-word matching (PW), in which a picture was presented first, followed by a concept word, and participants judged whether the word matched the picture. In the WP task, the word presented first provides a semantic cue to guide picture processing, and visual demands dominate the matching; the response to the superordinate level should therefore be fastest, since its visual recognition requires the least detail and visual comparison. In the PW task, the picture presented first can correspond to concepts at different levels, so the concept that is easiest to access should be retrieved before the others, and a basic level advantage should appear. We used the same stimuli in both tasks. Pictures were 32 color photographs of animals and vehicles. They included a range of animals and vehicles for classification at the superordinate level; a set of cows, sheep, cars, and boats for classification at the intermediate level; and 4 different photographs for each of eight specific categories: water buffalo, milk cow, goat, jumbuck, bus, truck, sailboat, and steamboat. There were 3 sessions with 18 trials in each session. In the subordinate condition, distracters were always items from the same intermediate category as the target; for instance, if the word stimulus was "water buffalo", the distracter was a photograph of a different kind of cow. In the intermediate condition, distracters were always from the same superordinate category as the target; for the word stimulus "sheep", the distracter was always another kind of animal. In the superordinate condition, distracters were selected from a different superordinate category than the target; for example, for "vehicle", the distracter could be any kind of animal.
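The distracter rule just described can be summarized in a short sketch (our illustration; the item lists are abridged from the materials above and the function name is hypothetical):

import random

# Abridged stimulus hierarchy: superordinate -> intermediate -> subordinate items.
items = {
    "animal": {"cow": ["water buffalo", "milk cow"], "sheep": ["goat", "jumbuck"]},
    "vehicle": {"car": ["bus", "truck"], "boat": ["sailboat", "steamboat"]},
}

def pick_distracter(level, superordinate, intermediate=None, target=None):
    if level == "subordinate":
        # Same intermediate category as the target, but a different item.
        pool = [s for s in items[superordinate][intermediate] if s != target]
    elif level == "intermediate":
        # Same superordinate category, but a different intermediate category.
        pool = [s for mid, subs in items[superordinate].items()
                if mid != intermediate for s in subs]
    else:
        # Superordinate condition: any item from the other superordinate category.
        other = next(sup for sup in items if sup != superordinate)
        pool = [s for subs in items[other].values() for s in subs]
    return random.choice(pool)

print(pick_distracter("subordinate", "animal", "cow", "water buffalo"))  # -> milk cow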
2.3 Procedure
We used a fast event-related design to present the materials. Figure 1 shows the procedure of the word-picture matching task (WP). A 500 ms fixation star was presented first; then a word appeared on the screen for 1000 ms, followed by a 500 ms cross; finally a picture appeared for 2000 ms. Participants were required to respond within 2000 ms, pressing with the index finger for matching pictures and with the middle finger for non-matching pictures. The intervals between trials varied randomly from 0 ms to 14000 ms. As shown in Figure 2, the procedure of the picture-word matching task (PW) was quite similar, except that the picture appeared first for 1000 ms, followed by the word for 2000 ms, and participants were asked to respond when the word appeared.
Fig. 1. Procedure of word-picture matching task
Fig. 2. Procedure of picture-word matching task
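The WP trial timeline above can be sketched as follows (a minimal illustration; the timings are those quoted in the text, while the jitter step size and the helper function are our assumptions):

import random

FIXATION_MS, WORD_MS, CROSS_MS, PICTURE_MS = 500, 1000, 500, 2000

def build_wp_session(n_trials=18, seed=0):
    rng = random.Random(seed)
    onset, events = 0, []
    for _ in range(n_trials):
        for label, duration in (("fixation", FIXATION_MS), ("word", WORD_MS),
                                ("cross", CROSS_MS), ("picture", PICTURE_MS)):
            events.append((onset, label))
            onset += duration
        # Jittered ITI between 0 and 14000 ms; a 2000 ms (one-TR) step is assumed.
        onset += rng.randrange(0, 14001, 2000)
    return events

for onset, label in build_wp_session()[:8]:
    print(onset, label)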
2.4 fMRI Imaging
Each participant performed 3 functional sessions; each session took 8 min 14 s and produced 244 images. All images were acquired on a 3.0 Tesla MRI system (Siemens Trio Tim; Siemens Medical System, Erlangen, Germany). Functional images were acquired from bottom to top in a whole-brain EPI acquisition with the following scan parameters: TE = 31 ms, flip angle = 90°, matrix size = 64 × 64, field of view = 200 mm × 200 mm, slice thickness = 0 mm, number of slices = 32, TR = 2000 ms. In addition, a high-resolution, T1-weighted 3D image was acquired (SPGR, TR = 1600 ms, TE = 3.28 ms, flip angle = 9°, matrix size = 256 × 256, field of view = 256 mm × 256 mm, slice thickness = 1 mm, number of slices = 192). The orientation of the 3D image was identical to that of the functional slices.

2.5 fMRI Data Processing
fMRI data were analyzed using SPM2 (Statistical Parametric Mapping, Institute of Neurology, University College London, UK; http://www.fil.ion.ucl.ac.uk/spm). The first 14 s of each session (4 images) were discarded to minimize transient effects of the hemodynamic response. The functional images were corrected for differences in slice-acquisition time to the middle volume. Images were resampled to 2 ×
2 × 2 mm³ voxels. Co-registered images were normalized to the EPI template and spatially smoothed with an 8 mm FWHM Gaussian kernel. Data from each participant were entered into a general linear model using an event-related analysis procedure [9]. Parameter estimates from contrasts of the canonical HRF in single-subject models were entered into random-effects analyses using one-sample t tests across all participants to determine whether activation during a contrast was significant (i.e., whether parameter estimates were reliably greater than 0). All reported areas of activation were significant at P < 0.005, uncorrected, at the voxel level and contained a cluster size greater than 0 voxels. We concentrated on reporting the results of the contrast between the WP and PW tasks. We also performed ROI (region of interest) analyses. Each ROI was defined using the definition function in MarsBaR, and the percentage of signal change was calculated. The center of each ROI was selected based on the result of the contrast analysis, with a radius of 6 mm.
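As an informal sketch of the ROI step (not the authors' MarsBaR pipeline; the file name, the center coordinate, and the 2 mm isotropic, axis-aligned voxel grid are assumptions), a 6 mm spherical ROI and its percent signal change could be computed as:

import numpy as np
import nibabel as nib

img = nib.load("preprocessed_4d.nii")        # hypothetical preprocessed 4D image
data = img.get_fdata()                       # shape (x, y, z, t)

center_mm = np.array([-40.0, -48.0, 44.0])   # placeholder ROI center in mm space
center_vox = (np.linalg.inv(img.affine) @ np.append(center_mm, 1.0))[:3]

# Distance of every voxel from the center, assuming 2 mm isotropic voxels.
ijk = np.stack(np.meshgrid(*map(np.arange, data.shape[:3]), indexing="ij"), axis=-1)
mask = np.linalg.norm((ijk - center_vox) * 2.0, axis=-1) <= 6.0

roi = data[mask].mean(axis=0)                # mean time course over the sphere
pct_change = 100.0 * (roi - roi.mean()) / roi.mean()
print(pct_change[:5])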
3 Results

3.1 Behavioral Performance
A repeated-measures ANOVA on response times (RTs) revealed a main effect of concept level (subordinate, intermediate, superordinate) in both tasks (F(2, 28) = 50.669, P < 0.001 for the WP task and F(2, 32) = 16.683, P < 0.001 for the PW task). In the WP task, responses to the superordinate level were fastest (mean = 762 ms), followed by the intermediate (mean = 802 ms) and subordinate (mean = 882 ms) levels. Pairwise comparisons indicated that responses to the superordinate level were significantly faster than those to the intermediate and subordinate levels (both P < 0.01), indicating a reversal of the basic level advantage effect. In the PW task, however, responses to the intermediate level were fastest (mean = 832 ms), followed by the superordinate level (mean = 869 ms), with the subordinate level still the slowest (mean = 897 ms). Pairwise comparisons indicated that responses to the intermediate level were significantly faster than those to the superordinate and subordinate levels (both P < 0.001), indicating a typical basic level advantage effect. The behavioral data thus show that task demands affect the advantage effect during information retrieval even though the stimuli were the same in the two tasks.

3.2 fMRI Results

3.2.1 Contrast between Two Tasks
The contrast of WP vs. PW at each concept level showed more activation widely distributed over the occipital, parietal, and frontal lobes. More importantly, and consistent with our expectation, we found more activation in the fusiform gyrus (BA37) and lingual gyrus (BA17, 18) in the WP task (top of Figure 3), suggesting that more visual processing was involved in the WP task. In the contrast of PW vs. WP, more activation was found mainly in the left inferior frontal gyrus (BA45) in the PW task (bottom of Figure 3), suggesting greater effort in retrieving information from the memory system in the PW task.
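The repeated-measures ANOVA on RTs reported in Sect. 3.1 can be reproduced in outline as follows (a sketch with placeholder data; statsmodels' AnovaRM is our choice of tool, not necessarily the authors'):

import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Long-format RTs (ms): one row per subject x concept level (placeholder values).
df = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "level": ["subordinate", "intermediate", "superordinate"] * 3,
    "rt": [882, 802, 762, 900, 820, 770, 860, 790, 755],
})

result = AnovaRM(df, depvar="rt", subject="subject", within=["level"]).fit()
print(result)  # F and p for the main effect of concept level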
Fig. 3. Contrast results between two tasks
3.2.2 Advantage Effect
We measured the advantage effect by contrasting the advantage level against the other levels: [superordinate vs. intermediate] and [superordinate vs. subordinate] in the WP task, and [intermediate vs. superordinate] and [intermediate vs. subordinate] in the PW task. The results suggested that the inferior parietal lobe (IPL) was the only area consistently present across all the advantage-effect contrasts. Further ROI analysis results are shown in Figure 4. There were typical deactivations in the bilateral inferior parietal lobe, areas belonging to the default mode network [10,11]; deactivation in the default mode network is weaker when less cognitive effort is required. In both the left and right inferior parietal lobe, the deactivation was weakest for the superordinate level in the WP task and weakest for the intermediate level in the PW task. For the left inferior parietal lobe, a repeated-measures ANOVA revealed a main effect of concept level in the WP task, F(2, 28) = 6.756, P = 0.004, and pairwise analysis showed that the intensity was weaker at the superordinate level than at both the intermediate and subordinate levels (both P < 0.05). For the right inferior parietal lobe, the main effects of concept level were significant in both tasks (F(2, 28) = 5.830, P = 0.008 in the WP task, and F(2, 32) = 3.791, P = 0.03 in the PW task).
Fig. 4. BOLD effects in bilateral inferior parietal lobe across two tasks
Pairwise analysis showed that the intensity was weaker at the superordinate level than at the subordinate level (P < 0.05) in the WP task, and weaker at the intermediate level than at the subordinate level (P < 0.05) in the PW task. The contrast and ROI analyses together suggest that the bilateral inferior parietal lobe plays an important role in the advantage effect during information retrieval. The BOLD effect in these areas changed as the advantage effect switched: when the advantage effect was located at the intermediate level, the weakest deactivation appeared at the intermediate level (as in the PW task), and when the advantage effect switched to the superordinate level, the weakest deactivation moved to that level (as in the WP task), suggesting that the lowest cognitive effort was required in these two conditions.
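The four advantage-effect comparisons of Sect. 3.2.2 amount to weight vectors over the three condition regressors; a minimal sketch (the regressor order and beta values are hypothetical):

import numpy as np

# Assumed regressor order: [subordinate, intermediate, superordinate].
contrasts = {
    "WP: superordinate vs. intermediate": np.array([0, -1, 1]),
    "WP: superordinate vs. subordinate": np.array([-1, 0, 1]),
    "PW: intermediate vs. superordinate": np.array([0, 1, -1]),
    "PW: intermediate vs. subordinate": np.array([-1, 1, 0]),
}

beta = np.array([0.30, 0.22, 0.10])  # placeholder parameter estimates for one voxel
for name, c in contrasts.items():
    print(name, "->", float(c @ beta))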
4 Discussion

4.1 Advantage Effect during Information Retrieval
In this study, we used a word-picture matching task and a picture-word matching task with the same materials to investigate the neural systems underlying the advantage effect during information retrieval. The behavioral performance showed a typical basic level advantage effect in the picture-word matching task (PW) and a switch of the advantage effect to the superordinate level in the word-picture matching task (WP). These results were consistent with our expectations. In the picture-word matching task, which is similar to a naming task [3], a picture corresponds to concepts at different levels, and retrieving a concept from any of these levels yields a correct answer. Because information at the middle level is used more frequently [6] and is more readily accessible [5] than that at other levels, retrieving concept names from this basic level is easier and faster than retrieving names at other levels, and a basic level advantage effect appears in this kind of task. In the word-picture matching task (WP), by contrast, the previously presented word activates the corresponding meaning or images stored in the memory system, and the participant develops a tendency to respond. When the picture then appears on the screen, the superordinate-level condition can be recognized most rapidly because there is no need for careful visual processing of
fine details, just as in the rapid response task [5] and the rapid visual categorization task [7], and as in the constrained condition of Large et al. [8] with its enlarged visual comparison; the advantage effect therefore switched to the superordinate level in this task. The processing strategies thus differed between the PW and WP tasks. Although both tasks required judging whether picture and word matched, the order in which picture and word appeared determined the modulation and integration of the cognitive components: in the PW task, participants retrieved the concept directly from the memory system, whereas in the WP task, participants relied more on visual processing. The fMRI results support this view of different processing strategies across the two tasks. Contrast analysis between the WP and PW tasks showed more activation in visual processing regions, such as the fusiform and lingual gyri, when comparing WP to PW, and more activation in the left inferior frontal gyrus when comparing PW to WP. The fusiform and lingual gyri are typical areas for visual processing and recognition: fMRI research has found that the bilateral occipital lobe is activated in picture naming tasks [12], and that the left fusiform gyrus is also involved in semantic categorization tasks [13]. A recent study found no difference in the bilateral inferior occipital gyrus or fusiform between the superordinate and intermediate levels in a picture naming task, but activation in the inferior temporal cortex did differ [14]. We will conduct further analyses in the occipito-temporal junction areas, comparing concept levels, to investigate the role of these areas in information retrieval. The frontal lobe is related to task control. It has been widely argued that the left inferior frontal gyrus (LIFG) is involved in the controlled retrieval of information from long-term memory. Studies have shown that the inferior frontal gyrus is related to retrieving and selecting information, but not to memory storage [15,16]; recent work claims that the LIFG is involved in selecting among semantic alternatives even in the absence of controlled retrieval processes [17]. We will also analyze the LIFG further, comparing concept levels, to learn more about its role during information retrieval.

4.2 The Role of the Inferior Parietal Lobe during Information Retrieval
In this study, we found that the inferior parietal lobe (IPL) plays an important role in the basic-level advantage effect and its switching during information retrieval. First, the contrast analysis showed that the inferior parietal lobe was the only area consistent across all the advantage-effect contrasts; that is, in the WP task more activation appeared in this area when comparing the superordinate level to the other levels, and in the PW task more activation appeared when comparing the intermediate level to the other levels. Second, the ROI analysis supported the contrast results: since this area deactivates during cognitive tasks, the weakest deactivations occurred in the superordinate-level condition in the WP task and in the intermediate-level condition in the PW task. The IPL is located at the junction of the visual, auditory, and spatial processing regions, so it may play a key role in integrating all
kinds of information and resources from other regions. Research has also found that the IPL is involved in integrating features and semantic organization to form a clear and definite concept [18]. In the field of language processing, the IPL has been shown to be related to reading and writing ability; patients with damage to this area also have difficulty with semantic tasks [19], suggesting that the IPL is important not only for integrating orthographic and phonological information, but also for combining semantic information with orthographic or phonological information. The IPL is therefore a key area for concept integration and retrieval. In our study, in both the WP and PW tasks, participants needed to compare the visually presented words and pictures against semantic knowledge in the memory system and then judge whether the word and picture matched; information from visual processing and from semantic memory had to be integrated in both tasks. In the WP task, visual processing was more important, and the superordinate-level condition required the least visual load, so the weakest deactivation for integrating visual and semantic information occurred in the superordinate condition. In the PW task, participants relied more on semantic memory, and information at the intermediate level was more familiar and more readily accessible than that at other levels, so the least effort was required to integrate visual and semantic information at the intermediate level, and the weakest deactivation appeared in that condition.
4.3 Summary
In summary, at least two points should be noted when considering the basic-level advantage effect and its switching during information retrieval. First, many cognitive components are involved in information retrieval; even when the cognitive components are the same, the processing strategies may change according to task demands or constraints, and humans can deploy cognitive resources very flexibly as a result of experience accumulated over evolution and development. Second, there appears to be a control or centralized system specialized to integrate and combine all kinds of information and resources to meet task demands, and this system helps processing strategies change flexibly.
Acknowledgments. This study was supported by the National Science Foundation of China (No. 60875075), the Beijing University of Technology Doctoral Foundation Project (No. 52007999200701), and the Beijing Natural Science Foundation (No. 4102007).
References

1. Fensel, D., Harmelen, F.v.: Unifying reasoning and search to web scale. IEEE Internet Computing 11(2), 94–96 (2007)
2. Zeng, Y., Zhong, N., Wang, Y., Qin, Y., Huang, Z., Zhou, H., Yao, Y., Harmelen, F.v.: User-centric query refinement and processing using granularity based strategies. Knowledge and Information Systems (in press)
3. Rosch, E., Mervis, C.B., Gray, W., Johnson, D., Boyes-Braem, P.: Basic objects in natural categories. Cognitive Psychology 8, 382–439 (1976)
4. Rogers, T., Hocking, J., Noppeney, U., Mechelli, A., Gorno-Tempini, M., Patterson, K., Price, C.: Anterior temporal cortex and semantic memory: Reconciling findings from neuropsychology and functional imaging. Cognitive, Affective, & Behavioral Neuroscience 6(3), 201–213 (2006)
5. Rogers, T., Patterson, K.: Object categorization: Reversals and explanations of the basic-level advantage. Journal of Experimental Psychology: General 136(3), 451–469 (2007)
6. Wisniewski, E.J., Murphy, G.L.: Superordinate and basic category names in discourse: A textual analysis. Discourse Processes 12, 245–261 (1989)
7. VanRullen, R., Thorpe, S.J.: Is it a bird? Is it a plane? Ultra-rapid visual categorization of natural and artifactual objects. Perception 30, 655–688 (2001)
8. Large, M., Kiss, I., McMullen, P.: Electrophysiological correlates of object categorization: Back to basics. Cognitive Brain Research 20, 415–426 (2004)
9. Josephs, O., Henson, R.N.: Event-related functional magnetic resonance imaging: Modelling, inference and optimization. Philosophical Transactions of the Royal Society B: Biological Sciences 354, 1215–1228 (1999)
10. Binder, J.R., Frost, J.A., Hammeke, T.K., Bellgowan, P.S.F., Rao, S.W., Cox, R.W.: Conceptual processing during the conscious resting state: A functional MRI study. Journal of Cognitive Neuroscience 11, 80–93 (1999)
11. Raichle, M.E., McLeod, A.M., Snyder, A.Z., Powers, W.J., Gusnard, D.A., Shulman, G.L.: A default mode of brain function. PNAS 98, 676–682 (2001)
12. Chouinard, P.A., Goodale, M.A.: Category-specific neural processing for naming pictures of animals and naming pictures of tools: An ALE meta-analysis. Neuropsychologia 48, 409–418 (2010)
13. Bright, P., Moss, H., Tyler, L.K.: Unitary vs. multiple semantics: PET studies of word and picture processing. Brain and Language 89, 417–432 (2004)
14. Tyler, L.K., Stamatakis, E.A., Bright, P., Acres, K., Abdallah, S., Rodd, J.M., Moss, H.E.: Processing objects at different levels of specificity. Journal of Cognitive Neuroscience 16, 351–362 (2004)
15. Bookheimer, S.: Functional MRI of language: New approaches to understanding the cortical organization of semantic processing. Annual Review of Neuroscience 25, 151–188 (2002)
16. Badre, D., Wagner, A.D.: Left ventrolateral prefrontal cortex and the cognitive control of memory. Neuropsychologia 45, 2883–2901 (2007)
17. Moss, H.E., Abdallah, S., Fletcher, P., Bright, P., Pilgrim, L., Acres, K., Tyler, L.K.: Selecting among competing alternatives: Selection and retrieval in the left inferior frontal gyrus. Cerebral Cortex 15, 1723–1735 (2005)
18. Assmus, A., Marshall, J.C., Noth, J., Zilles, K., Fink, G.R.: Difficulty of perceptual spatiotemporal integration modulates the neural activity of left inferior parietal cortex. Neuroscience 132, 923–927 (2005)
19. Hofmann, M.J., Herrmann, M.J., Dan, I., Obrig, H., Conrad, M., Kuchinke, L., Jacobs, A.M., Fallgatter, A.J.: Differential activation of frontal and parietal regions during visual word recognition: An optical topography study. NeuroImage 40, 1340–1349 (2008)
Author Index

Ab Aziz, Azizi 263
Alvarez-Linera, Juan 78
An, Aijun 308
Ansorg, Ralf 168
Apkarian, A. Vania 212
Avesani, Paolo 112
Baghdadi, Mohamed 156
Baliki, Marwan N. 212
Barrett, Stephen 55
Benamrane, Nacéra 156
Benton, Ryan 320
Bosse, Tibor 14
Both, Fiemke 274
Caron-Pargue, Josiane 252
Castellanos, Erick 328
Cecchi, Guillermo A. 212
Chen, Jianhui 365
Chen, Junjie 417
Chen, Rui 232
Covarrubias, Pablo 328
Croce, Danilo 133
Deng, Jiefang 346
Díaz, Gloria 78
Ebrahimi, Touradj 89
Galvan, Francisco 328
García, Gregorio 328
Goel, Vinod 1
Hamam, Yskandar 336
Hardas, Manas 240
Hatakeyama, Takashi 101
Hernández-Tamames, Juan Antonio 78
Hoogendoorn, Mark 14, 29, 274
Hori, Tatsuro 101
Hsu, D. Frank 42
Hu, Bin 145
Hu, Vivian Qinwin 288
Huang, Jimmy Xiangji 288
Huang, Runhe 365
Jarrold, William L. 299
Javitz, Harold S. 299
Jia, Xiuqin 387
Jing, Wei 427
Kargar, Mehdi 308
Khan, Javed 240
Klein, Michel C.A. 274
Koelstra, Sander 89
Krasnow, Ruth 299
Kristal, Bruce S. 42
Kroll, Eike B. 200
Kurian, C. Joseph 288
Lee, Jong-Seok 89
Lemoine, Blake 320
Li, Jiaojiao 377
Li, Kuncheng 387, 417
Li, Lanlan 145
Li, Mi 377
Liang, Peipeng 387
Liu, Dazhong 399
Liu, Jieyu 427
Liu, Li 145
Liu, Quanying 145
Longo, Luca 55
López, Eva 78
Lu, Shengfu 377, 387, 427
Lu, Wanxuan 399
Malpica, Norberto 78
Matsuyama, Yasuo 101
Mei, Yang 387
Melek, William 288
Memon, Zulfiqar A. 14
Miao, Duoqian 224
Mühl, Christian 89, 180
Nasrabadi, Ali Motie 124
Nijholt, Anton 89
Noguchi, Keita 101
Ochiai, Nimiko 101
Pajares, Gonzalo 78
Patras, Ioannis 89
Peintner, Bart 299
Peng, Hong 145
Penny-Leguy, Delphine 252
Poel, Mannes 180
Pun, Thierry 89
Qi, YanBing 145
Qin, Yulin 417, 427
Ramos, Félix 328
Rayburn, Sara 320
Reuderink, Boris 180
Rieger, Jörg 200
Rish, Irina 212
Rodríguez, Felipe 328
Romero, Eduardo 78
Sais, Lakhdar 156
Schwabe, Lars 168
Schweikert, Christina 42
Sharpanskykh, Alexei 67
Soleymani, Mohammad 89
Sona, Diego 112
Stamps, Kenyon 336
Su, Chang 346
Swan, Gary E. 299
Talebi, Nasibeh 124
Treur, Jan 14, 29, 67, 263, 274
Umair, Muhammad 14
van der Wal, C. Natalie 29
van Vliet, Marijn 180
van Wissen, Arlette 29, 263
Vogt, Bodo 200
Wang, Deng 224
Wang, Guoyin 346
Wang, Rifeng 407
Wang, Yingxu 2
Wei, Zhihua 224
Wu, Jinglong 357
Xiang, Jie 407, 417
Xie, Chen 224
Xiong, Yun 192
Xue, Li 192
Yang, Jiajia 357
Yang, Yanhui 387
Yang, Yong 346
Yao, Yiyu 427
Yazdani, Ashkan 89
Yeh, Eric 299
Yokotani, Suguru 357
Zanzotto, Fabio Massimo 133
Zhang, Hongyun 224
Zhao, Lun 224
Zhao, Qinglin 145
Zhong, Ning 365, 377, 387, 399, 407, 417, 427
Zhou, Haiyan 417, 427
Zhou, Renlai 232
Zhu, Yangyong 192