Lecture Notes in Artificial Intelligence Edited by R. Goebel, J. Siekmann, and W. Wahlster
Subseries of Lecture Notes in Computer Science
6437
Grigori Sidorov Arturo Hernández Aguirre Carlos Alberto Reyes García (Eds.)
Advances in Artificial Intelligence 9th Mexican International Conference on Artificial Intelligence, MICAI 2010 Pachuca, Mexico, November 8-13, 2010 Proceedings, Part I
Series Editors Randy Goebel, University of Alberta, Edmonton, Canada Jörg Siekmann, University of Saarland, Saarbrücken, Germany Wolfgang Wahlster, DFKI and University of Saarland, Saarbrücken, Germany Volume Editors Grigori Sidorov Instituto Politécnico Nacional Centro de Investigación en Computación Av. Juan Dios Batiz, s/n, Zacatenco, 07738 Mexico City, México E-mail:
[email protected] Arturo Hernández Aguirre Centro de Investigación en Matemáticas (CIMAT) Departamento de Ciencias de la Computación, Callejón de Jalisco s/n Mineral de Valenciana, Guanajuato, 36240, Guanajuato, México E-mail:
[email protected] Carlos Alberto Reyes García Instituto Nacional de Astrofísica, Optica y Electrónica (INAOE) Coordinación de Ciencias Computacionales, Luis Enrique Erro No. 1 Santa María Tonantzintla, 72840, Puebla, México E-mail:
[email protected]
Library of Congress Control Number: 2010937860
CR Subject Classification (1998): I.2, I.2.9, I.4, F.1, I.5.4
LNCS Sublibrary: SL 7 – Artificial Intelligence
ISSN 0302-9743
ISBN-10 3-642-16760-8 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-16760-7 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. springer.com © Springer-Verlag Berlin Heidelberg 2010 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper 06/3180
Preface
Artificial intelligence (AI) is a branch of computer science that models human intellectual abilities: reasoning, the use of human language, the organization of knowledge, problem solving, and practically all other human cognitive skills. It is usually characterized by the application of heuristic methods, because in the majority of cases there is no exact solution to this kind of problem.
The Mexican International Conference on Artificial Intelligence (MICAI), a yearly international conference series organized by the Mexican Society for Artificial Intelligence (SMIA), is a major international AI forum and the main event in the academic life of the country’s growing AI community. In 2010, SMIA celebrated 10 years of activity related to the organization of MICAI, as reflected in its slogan: “Ten years on the road with AI”.
MICAI conferences traditionally publish high-quality papers in all areas of artificial intelligence and its applications. The proceedings of the previous MICAI events were also published by Springer in its Lecture Notes in Artificial Intelligence (LNAI) series, vols. 1793, 2313, 2972, 3789, 4293, 4827, 5317, and 5845. Since its foundation in 2000, the conference has been growing in popularity and improving in quality.
This book contains 38 papers that were peer-reviewed by reviewers from the independent international Program Committee. The book is structured into five thematic areas, representative of the main current topics of interest for the AI community (excluding the areas related to soft computing, which are presented in the other volume corresponding to MICAI 2010):
− Natural language processing
− Robotics, planning and scheduling
− Computer vision and image processing
− Logic and distributed systems
− AI-based medical applications
The other volume that corresponds to MICAI 2010 contains the papers related to several areas of soft computing:
− Machine learning and pattern recognition
− Automatic learning for natural language processing
− Evolutionary algorithms and other naturally inspired algorithms
− Hybrid intelligent systems and neural networks
− Fuzzy logic
We are sure that the book will be of interest to researchers in all AI fields, to students specializing in these topics, and to the general public who follows recent developments in AI.
MICAI is an international conference both due to the extended geography of its submissions and to the composition of its Program Committee. Below we present the statistics of the papers submitted to and accepted at MICAI 2010. We received 301 submissions from 34 countries, of which 82 papers were accepted, so the overall acceptance rate was 27.2%. Since MICAI is held in Mexico, we received many submissions from this country, but the acceptance rate for these papers was even lower: 24%. In the table below, the papers are counted by authors, e.g., for a paper by two authors from country X and one author from country Y, we added two-thirds to X and one-third to Y.

Table 1. Statistics of MICAI 2010 papers by country
Country                              Authors   Submitted   Accepted
Argentina                                  7        4.00       2.00
Benin                                      1        0.50       0.50
Brazil                                    33       13.50       3.00
Canada                                     3        1.50       1.50
Chile                                      4        2.00       0.00
China                                      7        2.50       0.00
Colombia                                  25       16.67       2.67
Cuba                                      10        6.78       1.95
Czech Republic                             2        2.00       2.00
Finland                                    1        1.00       0.00
France                                    11        3.23       0.73
Germany                                    5        3.25       1.00
Greece                                     2        0.50       0.00
Hungary                                    1        0.20       0.20
India                                      3        1.67       0.00
Iran, Islamic Republic of                  9        5.00       1.00
Israel                                     7        3.33       2.67
Italy                                      3        0.60       0.60
Japan                                      4        3.50       2.00
Korea, Republic of                        11        4.00       2.00
Lithuania                                  2        1.00       0.00
Mexico                                   384      186.78      45.53
New Zealand                                2        0.67       0.00
Pakistan                                   9        4.75       2.67
Poland                                     6        4.00       1.00
Russian Federation                         3        2.00       1.00
Singapore                                  2        2.33       0.33
Spain                                     22        7.08       2.25
Sweden                                     1        1.00       0.00
Taiwan                                     1        1.00       0.00
Turkey                                     2        1.00       1.00
UK                                         8        2.67       1.00
USA                                       19        9.98       4.40
Venezuela, Bolivarian Republic of          2        1.00       0.00
We want to thank all the people involved in the organization of this conference. In the first place, these are the authors of the papers published in this book: it is the value of their research work that constitutes the essence of the book. We thank the Track Chairs for their hard work, and the Program Committee members and additional reviewers for their great effort in reviewing the papers and allowing their selection for the conference.

We would like to express our sincere gratitude to the Universidad Autónoma del Estado de Hidalgo (UAEH), ITESM Campus Pachuca, Universidad Politécnica de Pachuca (UPP), the Rectoría of the UAEH headed by Humberto Veras Godoy, Gerardo Sosa Castelán, General Secretary of the UAEH, and Octavio Castillo Acosta, head of the Basic Sciences and Engineering Institute of the UAEH, for their warm hospitality during MICAI 2010, for providing the infrastructure for the keynote talks, tutorials and workshops, and for their valuable participation and support in the organization of this conference. Their commitment made it possible for the opening ceremony, technical contributory conferences, workshops and tutorials to be held in the main historic building of the UAEH. We also want to thank the Consejo de Ciencia y Tecnología del Estado de Hidalgo for their partial financial support (project FOMIX 2008/97071), and the Oficina de Convenciones y Visitantes of the State of Hidalgo, represented by Lizeth Islas, for their valuable effort in organizing the cultural program as well as entertainment activities. We are deeply grateful to the conference staff and to all members of the Local Committee headed by Félix A. Castro Espinoza and Joel Suárez Cansino.

The entire submission, reviewing, and selection process, as well as the preparation of the proceedings, was supported for free by the EasyChair system (www.easychair.org). We are also grateful to Springer’s staff for their help in the preparation of this volume.

Grigori Sidorov
Arturo Hernández-Aguirre
Carlos Alberto Reyes-García
Conference Organization
MICAI 2010 was organized by the Mexican Society for Artificial Intelligence (SMIA, Sociedad Mexicana de Inteligencia Artificial) in collaboration with Universidad Autónoma del Estado de Hidalgo (UAEH), Centro de Investigación en Computación del Instituto Politécnico Nacional (CIC-IPN), Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Universidad Nacional Autónoma de México (UNAM), Universidad Autónoma de México (UAM), Instituto Tecnológico de Estudios Superiores de Monterrey (ITESM), and Centro de Investigación en Matemáticas (CIMAT), Mexico. The MICAI series webpage is www.MICAI.org. The webpage of the Mexican Society for Artificial Intelligence (SMIA) is www.SMIA.org.mx. Contact options and additional information can be found on those webpages.
Conference Committee

General Chair: Carlos Alberto Reyes García
Program Chairs: Grigori Sidorov, Arturo Hernández Aguirre
Workshop Chair: Gustavo Arroyo
Tutorials Chair: Rafael Murrieta
Plenary Talks and Grants Chair: Jesus A. Gonzalez
Financial Chair: Grigori Sidorov
Best Thesis Awards Chair: Miguel Gonzalez
Doctoral Consortium Chairs: Oscar Herrera, Miguel Gonzalez
Promotion Chair: Alejandro Peña
Local Chairs: Felix A. Castro Espinoza, Joel Suárez Cansino

Track Chairs

Natural Language Processing: Sofia N. Galicia-Haro
Machine Learning and Pattern Recognition: Mario Koeppen
Hybrid Intelligent Systems and Neural Networks: Carlos A. Reyes-García
Logic, Reasoning, Ontologies, Knowledge Management, Knowledge-Based Systems, Multi-agent Systems and Distributed AI: Raul Monroy
Data Mining: Jesus A. González
Intelligent Tutoring Systems: Alexander Gelbukh
Evolutionary Algorithms and Other Naturally-Inspired Algorithms: Efrén Mezura-Montes, Guillermo Leguizamón
Computer Vision and Image Processing: Alonso Ramírez-Manzanares
Fuzzy Logic, Uncertainty and Probabilistic Reasoning: Oscar Castillo
Bioinformatics and Medical Applications: Olac Fuentes
Robotics, Planning and Scheduling: Gildardo Sánchez
Program Committee Luis Aguilar Ruth Aguilar Teresa Alarcon Alfonso Alba Adel Alimi Annalisa Appice Edgar Arce-Santana
Miguel Arias Estrada Gustavo Arroyo Serge Autexier Victor Ayala-Ramirez Andrew Bagdanov Sivaji Bandyopadhyay Maria Lucia Barrón-Estrada
Ildar Batyrshin Bettina Berendt Igor Bolshakov Ramon Brena Peter Brusilovsky Phillip Burrell Pedro Cabalar Leticia Cagnina Felix Calderon Hiram Calvo Nicoletta Calzolari Sergio Daniel Cano Ortiz Gustavo Carneiro Juan Martín Carpio Valadez Jesus Ariel Carrasco-Ochoa Oscar Castillo Juan Castro Mario Chacon Aravindan Chandrabose Chuan-Yu Chang Edgar Chavez ZheChen Yueh-Hong Chen Simon Colton Quim Comas Diane Cook Oscar Cordon Juan-Francisco Corona Ulises Cortes Nareli Cruz-Cortés Nicandro Cruz-Ramirez Vicente Cubells Nonell Alfredo Cuzzocrea Oscar Dalmau Justin Dauwels Jorge de la Calleja Marina De Vos Louise Dennis Juergen Dix Lucas Dixon Bernabe Dorronsoro Beatrice Duval Susana Esquivel Marc Esteva Claudia Esteves Julio Estrada Gibran Etcheverry
Eugene Ezin Luis Falcón Francesc J. Ferri Juan J. Flores Andrea Formisano Olac Fuentes Sofia N. Galicia-Haro Jean Gao René Arnulfo García-Hernández Eduardo Garea Alexander Gelbukh Fernando Gomez Pilar Gómez-Gil Eduardo Gomez-Ramirez Jesus A. Gonzalez Arturo Gonzalez Miguel Gonzalez-Mendoza Alejandro Guerra-Hernández Steven Gutstein Hartmut Haehnel Hyoil Han Jin-Kao Hao Yasunari Harada Pitoyo Hartono Rogelio Hasimoto Jean-Bernard Hayet Sergio Hernandez Arturo Hernández Hugo Hidalgo Larry Holder Joel Huegel Marc-Philippe Huget Seth Hutchinson Dieter Hutter Pablo H. Ibarguengoytia Héctor Jiménez Salazar Moa Johansson Young Hoon Joo Chia-Feng Juang Vicente Julian Hiroharu Kawanaka Mario Koeppen Mark Kon Vladik Kreinovich Ricardo Landa-Becerra Reinhard Langmann Yulia Ledeneva
Yoel Ledo Mezquita Chang-Yong Lee Guillermo Leguizamón Eugene Levner Tao Li James Little Giovanni Lizárraga Lizárraga Aurelio Lopez Edgar Lopez Francisco Luna Gabriel Luque Rene Mac Kinney Tanja Magoc Jacek Malec Luis Ernesto Mancilla Espinosa Claudia Manfredi Jose Marroquin Ricardo Martinez José Fco. Martínez-Trinidad Alfonso Medina Urrea Patricia Melin Efrén Mezura-Montes Mikhail Mikhailov Gabriela Minetti Dunja Mladenic Francois Modave Raul Monroy Manuel Montes-y-Gómez Oscar Montiel Eduardo Morales Rafael Morales Guillermo Morales-Luna Jaime Mora-Vargas Angel E. Munoz Zavala Masaki Murata Tomoharu Nakashima Juan Antonio Navarro Perez Antonio Nebro Atul Negi Juan Carlos Nieves Juan Arturo Nolazco Flores Alberto Ochoa Zezzatti Ivan Olmos Constantin Orasan Magdalena Ortiz Mauricio Osorio
Daniel Pandolfi Ted Pedersen Alejandro Peña Ayala Arturo Perez David Pinto Michele Piunti Silvia Poles Eunice E. Ponce-de-Leon Edgar Alfredo Portilla-Flores Pilar Pozos Jorge Adolfo Ramirez Uresti Alonso Ramirez-Manzanares Zbigniew Ras Fuji Ren Orion Fausto Reyes-Galaviz Carlos A Reyes-Garcia María Cristina Riff Mariano Rivera Eduardo Rodriguez Leandro Fermín Rojas Peña Paolo Rosso Jianhua Ruan Salvador Ruiz Correa Carolina Salto Gildardo Sanchez Frank-Michael Schleif Roberto Sepulveda Leonid Sheremetov Grigori Sidorov Gerardo Sierra Thamar Solorio Humberto Sossa Azuela Graham Steel Luis Enrique Sucar Javier Tejada Cárcamo Hugo Terashima Sulema Torres Ramos Gregorio Toscano-Pulido Fevrier Valdez Aida Valls Berend Jan van der Zwaag Maria Vargas-Vera Karin Verspoor Francois Vialatte Javier Vigueras Eliseo Vilalta
Manuel Vilares Ferro Andrea Villagra Miguel Villarreal Thomas Villmann Toby Walsh
Julio Zamora Carlos Mario Zapata Jaramillo Ramon Zatarain Claudia Zepeda Cortes Qiangfu Zhao
Additional Reviewers Rita M. Acéves-Pérez Esteve Almirall Tristan Behrens Federico Bergenti Janez Brank Nils Bulling Noe Alejandro Castro Sánchez Elva Díaz Gibran Etcheverry Ivan Figueroa
Jon Ander Gómez Maria Auxilio Medina Sabino Miranda J. Arturo Olvera-López Santiago Ontañón John Quarles Daniel Ramirez-Cano Jorge Alberto Soria-Alcaraz Ivan Varzinczak Esaú Villatoro-Tello Victor Manuel Zamudio Rodriguez
Table of Contents – Part I
Invited Paper Some Encounters on the Productive Use of a Failed Proof Attempt or a Counterexample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ra´ ul Monroy
1
Natural Language Processing Discourse Segmentation for Spanish Based on Shallow Parsing . . . . . . . . . Iria da Cunha, Eric SanJuan, Juan-Manuel Torres-Moreno, Marina Lloberes, and Irene Castell´ on Towards Document Plagiarism Detection Based on the Relevance and Fragmentation of the Reused Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fernando S´ anchez-Vega, Luis Villase˜ nor-Pineda, Manuel Montes-y-G´ omez, and Paolo Rosso Lexicon Based Sentiment Analysis of Urdu Text Using SentiUnits . . . . . . Afraz Z. Syed, Muhammad Aslam, and Ana Maria Martinez-Enriquez
13
24
32
A Semantic Oriented Approach to Textual Entailment Using WordNet-Based Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Julio J. Castillo
44
On Managing Collaborative Dialogue Using an Agent-Based Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tom´ aˇs Nestoroviˇc
56
Dialog Structure Automatic Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D´ebora Hisgen and Daniela L´ opez De Luise A Probabilistic Model Based on n-Grams for Bilingual Word Sense Disambiguation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Darnes Vilari˜ no, David Pinto, Mireya Tovar, Carlos Balderas, and Beatriz Beltr´ an Information Retrieval with a Simplified Conceptual Graph-Like Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sonia Ordo˜ nez-Salinas and Alexander Gelbukh Teaching a Robot to Perform Tasks with Voice Commands . . . . . . . . . . . . Ana C. Tenorio-Gonzalez, Eduardo F. Morales, and Luis Villase˜ nor-Pineda
69
82
92 105
Music Composition Based on Linguistic Approach . . . . . . . . . . . . . . . . . . . . Horacio Alberto Garc´ıa Salas, Alexander Gelbukh, and Hiram Calvo
117
Robotics, Planning and Scheduling A Practical Robot Coverage Algorithm for Unknown Environments . . . . Heung Seok Jeon, Myeong-Cheol Ko, Ryumduck Oh, and Hyun Kyu Kang An Algorithm for the Automatic Generation of Human-Like Motions Based on Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Juan Carlos Arenas Mena, Jean-Bernard Hayet, and Claudia Esteves
129
141
Line Maps in Cluttered Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Leonardo Romero and Carlos Lara
154
Fuzzy Cognitive Maps for Modeling Complex Systems . . . . . . . . . . . . . . . . Maikel Le´ on, Ciro Rodriguez, Mar´ıa M. Garc´ıa, Rafael Bello, and Koen Vanhoof
166
Semantic Representation and Management of Student Models: An Approach to Adapt Lecture Sequencing to Enhance Learning . . . . . . . Alejandro Pe˜ na Ayala and Humberto Sossa
175
An Effective Heuristic for the No-Wait Flowshop with Sequence-Dependent Setup Times Problem . . . . . . . . . . . . . . . . . . . . . . . . . . Daniella Castro Ara´ ujo and Marcelo Seido Nagano
187
Optimizing Alternatives in Precedence Networks . . . . . . . . . . . . . . . . . . . . . Roman Bart´ ak AI-Based Integrated Scheduling of Production and Transportation Operations within Military Supply Chains . . . . . . . . . . . . . . . . . . . . . . . . . . Dmitry Tsadikovich, Eugene Levner, and Hanan Tell Turbo Codification Techniques for Error Control in a Communication Channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Pablo Manrique Ram´ırez, Rafael Antonio M´ arquez Ram´ırez, Oleksiy Pogrebnyak, and Luis Pastor S´ anchez Fernandez
197
209
221
A New Graphical Recursive Pruning Method for the Incremental Pruning Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mahdi Naser-Moghadasi
232
A New Pruning Method for Incremental Pruning Algorithm Using a Sweeping Scan-Line through the Belief Space . . . . . . . . . . . . . . . . . . . . . . . . Mahdi Naser-Moghadasi
243
POMDP Filter: Pruning POMDP Value Functions with the Kaczmarz Iterative Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Eddy C. Borera, Larry D. Pyeatt, Arisoa S. Randrianasolo, and Mahdi Naser-Moghadasi
254
Computer Vision and Image Processing Testing Image Segmentation for Topological SLAM with Omnidirectional Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Anna Romero and Miguel Cazorla Automatic Image Annotation Using Multiple Grid Segmentation . . . . . . . Gerardo Arellano, Luis Enrique Sucar, and Eduardo F. Morales Spatio-temporal Image Tracking Based on Optical Flow and Clustering: An Endoneurosonographic Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Andr´es F. Serna-Morales, Flavio Prieto, and Eduardo Bayro-Corrochano
266
278
290
One Trilateral Filter Based on Surface Normal . . . . . . . . . . . . . . . . . . . . . . . Felix Calderon and Mariano Rivera
301
Beta-Measure for Probabilistic Segmentation . . . . . . . . . . . . . . . . . . . . . . . . Oscar Dalmau and Mariano Rivera
312
Robust Spatial Regularization and Velocity Layer Separation for Optical Flow Computation on Transparent Sequences . . . . . . . . . . . . . . . . Alonso Ramirez-Manzanares, Abel Palafox-Gonzalez, and Mariano Rivera SAR Image Denoising Using the Non-Subsampled Contourlet Transform and Morphological Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jos´e Manuel Mej´ıa Mu˜ noz, Humberto de Jes´ us Ochoa Dom´ınguez, Leticia Ortega M´ aynez, Osslan Osiris Vergara Villegas, Vianey Guadalupe Cruz S´ anchez, Nelly Gordillo Castillo, and Efr´en David Guti´errez Casas
325
337
Logic and Distributed Systems Scheme-Based Synthesis of Inductive Theories . . . . . . . . . . . . . . . . . . . . . . . Omar Montano-Rivas, Roy McCasland, Lucas Dixon, and Alan Bundy
348
A Possibilistic Intuitionistic Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Oscar Estrada, Jos´e Arrazola, and Mauricio Osorio
362
Jason Induction of Logical Decision Trees: A Learning Library and Its Application to Commitment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Alejandro Guerra-Hern´ andez, Carlos Alberto Gonz´ alez-Alarc´ on, and Amal El Fallah Seghrouchni Extending Soft Arc Consistency Algorithms to Non-invertible Semirings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Stefano Bistarelli, Fabio Gadducci, Javier Larrosa, Emma Rollon, and Francesco Santini Frequency Transition Based upon Dynamic Consensus for a Distributed System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Oscar A. Esquivel Flores and H´ector Ben´ıtez P´erez
374
386
399
AI-Based Medical Application Towards Ubiquitous Acquisition and Processing of Gait Parameters . . . . Irvin Hussein L´ opez-Nava and Ang´elica Mu˜ noz-Mel´endez
410
Intelligent Wheelchair and Virtual Training by LabVIEW . . . . . . . . . . . . . Pedro Ponce, Arturo Molina, Rafael Mendoza, Marco Antonio Ruiz, David Gregory Monnard, and Luis David Fern´ andez del Campo
422
Environmental Pattern Recognition for Assessment of Air Quality Data with the Gamma Classifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jos´e Juan Carbajal Hern´ andez, Luis Pastor S´ anchez Fern´ andez, and Pablo Manrique Ram´ırez Massive Particles for Brain Tractography . . . . . . . . . . . . . . . . . . . . . . . . . . . Ram´ on Aranda, Mariano Rivera, Alonso Ram´ırez-Manzanares, Manzar Ashtari, and James C. Gee Emotional Conversational Agents in Clinical Psychology and Psychiatry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mar´ıa Lucila Morales-Rodr´ıguez, Juan Javier Gonz´ alez B., Rogelio Florencia Ju´ arez, Hector J. Fraire Huacuja, and Jos´e A. Mart´ınez Flores Knowledge-Based System for Diagnosis of Metabolic Alterations in Undergraduate Students . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Miguel Murgu´ıa-Romero, Ren´e M´endez-Cruz, Rafael Villalobos-Molina, Norma Yolanda Rodr´ıguez-Soriano, Estrella Gonz´ alez-Dalhaus, and Rafael Jim´enez-Flores Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
436
446
458
467
477
Table of Contents – Part II
Invited Paper Discovering Role of Linguistic Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . Boris Stilman, Vladimir Yakhnis, and Oleg Umanskiy
1
Machine Learning and Pattern Recognition Elkan’s k-Means Algorithm for Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Brijnesh J. Jain and Klaus Obermayer
22
A Simple Approach to Incorporate Label Dependency in Multi-label Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Everton Alvares Cherman, Jean Metz, and Maria Carolina Monard
33
Methods and Algorithms of Information Generalization in Noisy Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Vadim Vagin and Marina Fomina
44
Object Class Recognition Using SIFT and Bayesian Networks . . . . . . . . . Leonardo Chang, Miriam Monica Duarte, Luis Enrique Sucar, and Eduardo F. Morales
56
Boosting Based Conditional Quantile Estimation for Regression and Binary Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Songfeng Zheng
67
A Fast Fuzzy Cocke-Younger-Kasami Algorithm for DNA and RNA Strings Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Her´ on Molina-Lozano
80
A Fast Implementation of the CT EXT Algorithm for the Testor Property Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Guillermo Sanchez-Diaz, Ivan Piza-Davila, Manuel Lazo-Cortes, Miguel Mora-Gonzalez, and Javier Salinas-Luna Supervised Probabilistic Classification Based on Gaussian Copulas . . . . . Rogelio Salinas-Guti´errez, Arturo Hern´ andez-Aguirre, Mariano J.J. Rivera-Meraz, and Enrique R. Villa-Diharce Text-Independent Speaker Identification Using VQ-HMM Model Based Multiple Classifier System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ali Zulfiqar, Aslam Muhammad, A.M. Martinez-Enriquez, and G. Escalada-Imaz
92
104
116
Towards One-Class Pattern Recognition in Brain Activity via Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Omer Boehm, David R. Hardoon, and Larry M. Manevitz Real Time Tracking of Musical Performances . . . . . . . . . . . . . . . . . . . . . . . . Antonio Camarena-Ibarrola and Edgar Ch´ avez Recognizing Dactylogical Symbols with Image Segmentation and a New Differentiated Weighting Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Laura Jeanine Razo Gil, Salvador Godoy-Calder´ on, and Ricardo Barr´ on Fern´ andez
126 138
149
Automatic Learning for Natural Language Processing Selecting Candidate Labels For Hierarchical Document Clusters Using Association Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fabiano Fernandes dos Santos, Veronica Oliveira de Carvalho, and Solange Oliveira Rezende Recognizing Textual Entailment Using a Machine Learning Approach . . . Miguel Angel R´ıos Gaona, Alexander Gelbukh, and Sivaji Bandyopadhyay
163
177
Detection of Different Authorship of Text Sequences through Self-organizing Maps and Mutual Information Function . . . . . . . . . . . . . . . Antonio Neme, Blanca Lugo, and Alejandra Cervera
186
Supervised Machine Learning for Predicting the Meaning of Verb-Noun Combinations in Spanish . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Olga Kolesnikova and Alexander Gelbukh
196
Hybrid Intelligent Systems and Neural Networks On the Structure of Elimination Trees for Bayesian Network Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kevin Grant and Keilan Scholten
208
CBR and Neural Networks Based Technique for Predictive Prefetching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sohail Sarwar, Zia Ul-Qayyum, and Owais Ahmed Malik
221
Combining Neural Networks Based on Dempster-Shafer Theory for Classifying Data with Imperfect Labels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mahdi Tabassian, Reza Ghaderi, and Reza Ebrahimpour
233
Stability and Topology in Reservoir Computing . . . . . . . . . . . . . . . . . . . . . . Larry Manevitz and Hananel Hazan
245
A Radial Basis Function Redesigned for Predicting a Welding Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rolando J. Praga-Alejo, Luis M. Torres-Trevi˜ no, David S. Gonz´ alez, Jorge Acevedo-D´ avila, and Francisco Cepeda Dynamic Neural Networks Applied to Melody Retrieval . . . . . . . . . . . . . . . Laura E. Gomez, Humberto Sossa, Ricardo Barron, and Julio F. Jimenez Recognition of Huffman Codewords with a Genetic-Neural Hybrid System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Eug`ene C. Ezin, Orion Fausto Reyes-Galaviz, and Carlos A. Reyes-Garc´ıa
257
269
280
Fraud Detection Model Based on the Discovery Symbolic Classification Rules Extracted from a Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . Wilfredy Santamar´ıa Ruiz and Elizabeth Le´ on Guzman
290
Hardware Implementation of Artificial Neural Networks for Arbitrary Boolean Functions with Generalised Threshold Gate Circuits . . . . . . . . . . Maciej Nikodem
303
Evolutionary Algorithms and Other Naturally-Inspired Algorithms Solving No-Wait Flow Shop Scheduling Problems by a Hybrid Quantum-Inspired Evolutionary Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . Tianmin Zheng and Mitsuo Yamashiro
315
Reducing the Search Space in Evolutive Design of ARIMA and ANN Models for Time Series Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Juan J. Flores, Hector Rodriguez, and Mario Graff
325
Routing Algorithms for Wireless Sensor Networks Using Ant Colony Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Christian Dom´ınguez-Medina and Nareli Cruz-Cort´es
337
Approximating Multi-Objective Hyper-Heuristics for Solving 2D Irregular Cutting Stock Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Juan Carlos Gomez and Hugo Terashima-Mar´ın
349
Particle Swarm Optimization with Gravitational Interactions for Multimodal and Unimodal Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Juan J. Flores, Rodrigo L´ opez, and Julio Barrera
361
Particle Swarm Optimization with Resets – Improving the Balance between Exploration and Exploitation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yenny Noa Vargas and Stephen Chen
371
MiTS: A New Approach of Tabu Search for Constructing Mixed Covering Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Loreto Gonzalez-Hernandez and Jose Torres-Jimenez
382
On the Best Evolutionary Wavelet Based Filter to Compress a Specific Signal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Oscar Herrera Alc´ antara
394
Fuzzy Logic Financial Time Series Prediction in Cooperating with Event Knowledge: A Fuzzy Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Do-Thanh Sang, Dong-Min Woo, Dong-Chul Park, and Thi Nguyen The Fuzzy Syllogistic System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ˙ Kumova and H¨ Bora I. useyin C ¸ akir Big Five Patterns for Software Engineering Roles Using an ANFIS Learning Approach with RAMSET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Luis G. Mart´ınez, Antonio Rodr´ıguez-D´ıaz, Guillermo Licea, and Juan R. Castro New Proposal for Eliminating Interferences in a Radar System . . . . . . . . . Carlos Campa, Antonio Acevedo, and Elena Acevedo
406
418
428
440
Type-2 Fuzzy Inference System Optimization Based on the Uncertainty of Membership Functions Applied to Benchmark Problems . . . . . . . . . . . . Denisse Hidalgo, Patricia Melin, and Oscar Castillo
454
Fuzzy Logic for Parameter Tuning in Evolutionary Computation and Bio-inspired Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fevrier Valdez, Patricia Melin, and Oscar Castillo
465
Fuzzy Logic Controllers Optimization Using Genetic Algorithms and Particle Swarm Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ricardo Martinez-Soto, Oscar Castillo, Luis T. Aguilar, and Patricia Melin FPGA Implementation of Fuzzy System with Parametric Membership Functions and Parametric Conjunctions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Prometeo Cort´es Antonio, Ildar Batyrshin, Heron Molina Lozano, Luis A. Villa Vargas, and Imre Rudas
475
487
Fuzzy Logic Hardware Implementation for Pneumatic Control of One DOF Pneumatic Robot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Juan-Manuel Ramos-Arreguin, Emmanuel Guillen-Garcia, Sandra Canchola-Magdaleno, Jesus-Carlos Pedraza-Ortega, Efren Gorrostieta-Hurtado, Marco-Antonio Aceves-Fern´ andez, and Carlos-Alberto Ramos-Arreguin Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
500
513
Some Encounters on the Productive Use of a Failed Proof Attempt or a Counterexample

Raúl Monroy

Tecnológico de Monterrey, Campus Estado de México
Carr. Lago de Guadalupe Km. 3-5, Atizapán, 52926, México
[email protected]
Abstract. In the formal methods approach to software verification, we use logical formulae to model both the program and its intended specification, and, then, we apply (automated) reasoning techniques to demonstrate that the formulae satisfy a verification conjecture. One may either apply proving techniques, to provide a formal verification argument, or disproving techniques, to falsify the verification conjecture. However, programs often contain bugs or are flawed, and, so, the verification process breaks down. Interpreting the failed proof attempt or the counterexample, if any, is very valuable, since it potentially helps identify the program bug or flaw. Lakatos, in his book Proofs and Refutations, argues that the analysis of a failed proof often holds the key for the development of a theory. Proof analysis enables the strengthening of naïve conjectures and concepts, without severely weakening their content. In this paper, we survey our encounters on the productive use of failure in the context of a few theories, natural numbers and (higher-order) lists, and in the context of security protocols.

Keywords: Formal methods, proofs and refutations, Lakatos’s style of reasoning.

I am grateful to Dieter Hutter for his valuable comments on an earlier version of this paper.
1 Introduction
In the formal methods approach to software development, both the program and its intended specification are first modelled using logical formulae. Then, (automated) theorem proving techniques are used in order to demonstrate that the program and its specification are related in a way given by the verification conjecture. Alternatively, disproving techniques could be used in order to falsify the verification conjecture and, in that case, provide either a counter-model or a counterexample (see, for example, [12,21,1]). Of course, proving and disproving techniques can be used simultaneously; a number of techniques combine them (see, for example, [2,20]).

Programs, however, very often contain bugs or are flawed; accordingly, any verification attempt involving a faulty program is bound to break down. Interpreting
a failed proof attempt or a counterexample, if any, is very valuable, since it will potentially lead to identifying the root of the problem.

Popper [19], in his critical rationalism, held that scientific theories are abstract and, hence, are not directly testable. He also held that scientific theories are conjectural and, hence, cannot be confirmed by any number of positive outcomes in experimental testing; yet a single counterexample is conclusive. Popper introduced the logical asymmetry between verification and falsification. He also described different possibilities for correcting false theories, for example by excluding counterexamples.

In his book Proofs and Refutations [8], Lakatos (a student of Popper) made the point that the analysis of a failed proof, driven by the interplay between proofs and counterexamples, plays a key role in the development of a theory that is still subject to revision. This form of proof analysis enables the strengthening of recently created conjectures without severely weakening their content. At the same time, it helps the discovery of concepts, the refinement of definitions, the identification of hidden assumptions, and the strengthening of proofs.

Failure interpretation is thus at the heart of scientific discovery, and so it is at the heart of artificial intelligence (AI). It is crucial, because it often holds the key for the correction of proofs or objects, the discovery of lemmata, the generalisation of conjectures, or the invention of new concepts.

In this paper, we survey our encounters on the productive use of failure to correct (refine): i) a naïve formulation of a mathematical conjecture, while possibly identifying new concepts; and ii) a specific type of flawed program, namely security protocols. We shall argue that this repair process requires the analysis of any counterexample at hand, as well as the analysis of any failure or partial success in an attempt to prove the given conjecture. We shall structure our survey by characterising our techniques as example applications, albeit modest, of the prescriptions developed by Lakatos. This will serve two purposes: first, it will help illustrate the type of reasoning performed in modifying a conjecture or a security protocol; and, second, it will help identify further research in the area.

Overview of Paper. The rest of the paper is organised as follows. First, we overview Lakatos’s philosophy of mathematics (§2). Then, we survey our results on failed proof analysis to refine a conjecture (§3) and on counterexample analysis to repair a security protocol (§4). Finally, we conclude the paper.
2 Lakatos’s Perspective of the Evolution of Mathematical Methodology
In [8], Lakatos describes an approach to mathematics where axioms and definitions are dynamic; so, proofs are not permanent, and are subject to revision. This is in contrast with formal, mathematical logic, which Lakatos criticised for being dogmatic and creativity-repressing.
For Lakatos, mathematics progresses by a series of concepts, initial conjectures and proofs. The proofs for these conjectures are schematic: they do not abide by the usual logical form, and can be rejected by a counterexample. Counterexamples are used to analyse the failed proof. This analysis may suggest a refinement of the schematic proof or a reformulation of the conjecture. Noteworthy is that a counterexample may suggest new counterexamples, or even the refinement or the generation of new concepts. Concept development is possible because, in these kinds of schematic proofs, we usually appeal to intuitive definitions, which are not yet agreed upon or cannot even be deterministically decided.

Lakatos introduces his perspective of the evolution of mathematical methodology through various case studies. In particular, he provides a rational reconstruction of Cauchy’s proof of Euler’s theorem about polyhedra. The theorem states that, for any polyhedron, V − E + F = 2, where V is the number of vertices, E is the number of edges, and F is the number of faces. Once given, both the conjecture and the proof are inspected, and the methods below follow.

Reactions against the conjecture. First, some skepticism arises against the theorem. The objections make apparent that the theorem depends on a hidden assumption; in particular, on a usual, but not universal, understanding of what counts as a polyhedron. So, a hollow cube and a few joined tetrahedra are put on the table as counterexamples. The three following methods are then proposed.

Surrender: Simply give up the theorem.

Monster barring: Refine the concept of polyhedron, so that the ‘pathological’ cases, namely the hollow cube and the joined tetrahedra, are ruled out as polyhedra. It is suggested to offer several definitions, which should all be tested out to see which captures the negative examples to disbar.

Monster adjustment: Redefine the properties of the monsters, so that they now become positive examples.

Reactions against the Proof. Complementarily, one may analyse, in the light of the given counterexamples, the ‘proof’ of the conjecture, in order to pinpoint where it went wrong. Then, one may introduce a condition into the theorem to prevent the failure. The two methods below follow.

Lemma incorporation: Given a counterexample to the proof (a local counterexample), but not to the conjecture (a global counterexample), identify the proof step (in Lakatos’s terms, a lemma) that is falsified by the counterexample. Take this lemma and incorporate it as a condition to the theorem, ensuring that the proof step cannot be carried out for the local counterexamples. It is noteworthy that this technique allows the modification or the discovery of a new definition. For the conjecture at hand, Lakatos modifies it so that it is taken to apply only to polyhedra that are topologically equivalent to a sphere, called simple polyhedra. Notice how lemma incorporation led to the discovery of this new concept.
Proofs and refutations: This method sums up all the methods mentioned above. Basically, given an initial conjecture and a schematic proof of it, proofs and refutations consists of the three heuristic rules below:

1. Be ready to refute the conjecture: carefully analyse the two objects and, as a result, yield a list of non-trivial lemmas; find (local and global) counterexamples.
2. If there is a global counterexample, dismiss the conjecture; from the analysis of the proof, identify a ‘convenient’ lemma that is refuted by the counterexample; then, substitute the dismissed conjecture by an improved (modified) one that incorporates that lemma as a condition (that is, forward the falsehood of the consequent to the antecedent).
3. If there is a local counterexample, check whether it is also a global one. If it is, apply rule 2.

These techniques (as well as others from [8], which we refrain from discussing) are, of course, all applicable in the context of formal methods. For example, formal proofs link specifications and properties; thus, they can help in understanding how changing a specification might affect the properties it has enjoyed. The revision of specifications amounts to monster barring, since, by redefining the model of objects, we may exclude unwanted items. This strategy can be used to rule out errors introduced in the modelling process.

We now survey our results on the use of failure analysis to refine initial conjectures. We shall structure our analysis by characterising our techniques as example applications of the prescriptive rules developed by Lakatos. This will help illustrate the type of reasoning performed by our repair mechanisms, and identify further research in the area.
3 On Refining Mathematical Conjectures
Let ∀X. G(X) be a recently formulated conjecture. Consider that this conjecture is not provable within a theory Γ, written Γ ⊬ ∀X. G(X), but that it would be if enough conditions were assumed to hold. Our method, introduced in [13,14,17,15,16], automatically identifies, or builds a definition for, a corrective condition P(X) such that Γ ⊢ ∀X. P(X) → G(X).

The corrective condition is computed on the fly, during an attempt to prove the corresponding conjecture. The synthesis of a corrective condition, if necessary, is guided by the principle known as proofs-as-programs [6], relating inference with computation. So, a recursive condition corresponds to the use of an inductive rule of inference. Similarly, a conditional condition corresponds to the use of case analysis.

Interestingly, for conjecture repair, we do not depend on the ability to find a counterexample, but trust that counterexamples will show up in an attempt to prove the conjecture. Thus, our ability to find a counterexample is shifted to our ability to automatically find a formal proof. To that aim, we have built our repair method within inductive proof planning [5]. Proof planning provides a
significant level of automation in inductive theorem proving. It is also a means of exploiting any failure or partial success attained in a search for a proof. This, together with the careful search guidance induced by proof planning, has made it possible to achieve the experimental results summarised in this section. In what follows, we first survey our results on conjecture repair in the context of first-order logic (natural numbers and polymorphic lists), and then move on to a higher-order theory of lists.

3.1 First-Order Naïve Conjectures
Treatment of Exceptions

Excluding Exceptions Motivated by Contradictory Goals. Consider the following naïve conjecture:

∀A, B : τ list. length(app(A, B)) > length(A)    (1)

where length stands for the number of elements in a list, app for list concatenation, and > has its natural interpretation; these relations are given by:

length(nil) = 0                      s(X) > 0 ↔ true
length(H :: T) = s(length(T))        0 > s(X) ↔ false
app(nil, L) = L                      X > 0 ↔ X ≠ 0
app(H :: T, L) = H :: app(T, L)      s(X) > s(Y) ↔ X > Y
where s stands for the successor constructor function on natural numbers, :: for the list constructor function, and nil for the empty list.

An attempt to prove (1) using one-step list induction on A would yield one base case and one step case. The base case, (A = nil), ends up with a new goal, namely: ∀B : τ list. length(B) ≠ 0. This new goal suggests a further application of one-step list induction. This time, the base case, (B = nil), is problematic, as it yields a contradictory goal: 0 ≠ 0. These observations are all summarised in Table 1.

Table 1. Outline of an attempt to prove (1) using induction

Case                                 Deduction Result              Corrective condition
length(app(nil, B)) > length(nil)    New goal: length(B) ≠ 0       B ≠ nil
length(B) ≠ 0                        Contradictory goal: 0 ≠ 0

Motivated by the appearance of a contradiction, exception exclusion turns (1) into a theorem:

∀A, B : τ list. B ≠ nil → length(app(A, B)) > length(A)
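The repaired conjecture can be exercised directly on random inputs. The following Haskell sketch is not part of the paper: the property name, the use of Haskell's built-in lists for τ list, and (++) standing in for app are our own illustrative assumptions.

```haskell
-- Sketch: testing the repaired conjecture (1) under the abducted condition B /= nil.
import Test.QuickCheck ((==>), Property, quickCheck)

prop_repaired1 :: [Int] -> [Int] -> Property
prop_repaired1 a b = b /= [] ==> length (a ++ b) > length a

main :: IO ()
main = quickCheck prop_repaired1
-- Dropping the guard b /= [] makes the property fail with any b = [].
```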
Excluding Exceptions Motivated by Primitive Conditions. Let addition, +, be defined by 0 + Y = Y and s(X) + Y = s(X + Y). Then, consider the following goal:

∀X, Y : nat. X + Y > X    (2)
A proof attempt of (2) proceeds by one-step induction. Again, the base case, (X = 0), is problematic, yielding the following sub-goal: ∀Y : nat. Y ≠ 0, for which no further axiom is applicable. We call these kinds of dead end goals primitive conditions. To refine the input conjecture, primitive conditions are abducted to be used as corrective conditions. Abduction is a form of logical inference that allows us to find a cause that explains an observed phenomenon [18]. Motivated by the appearance of a primitive condition, see Table 2, exception exclusion turns (2) into a theorem:

∀X, Y : nat. Y ≠ 0 → X + Y > X

Table 2. Outline of an attempt to prove (2) using induction

Case         Deduction Result                             Corrective condition
0 + Y > 0    New goal: Y ≠ 0                              Y ≠ 0
Y ≠ 0        Dead end, primitive condition goal: Y ≠ 0
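The exception barred by the primitive condition can also be found by random testing of the unrepaired goal. This is a sketch of ours, not the paper's mechanism; the property name is an assumption.

```haskell
-- Sketch: QuickCheck exposes the exception that the abducted condition Y /= 0 excludes.
import Test.QuickCheck (quickCheck, NonNegative(..))

prop_naive2 :: NonNegative Int -> NonNegative Int -> Bool
prop_naive2 (NonNegative x) (NonNegative y) = x + y > x

main :: IO ()
main = quickCheck prop_naive2
-- Fails; every reported counterexample has y = 0, the case barred by Y /= 0.
```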
Lemma Incorporation. Clearly, there could be a large, possibly infinite, number of counterexamples. The corrective condition should therefore be a concept that withdraws as many elements from the conjecture domain as needed for the conjecture to become a theorem. We apply Lakatos’s lemma incorporation by looking at a whole attempt to prove a given conjecture by means of mathematical induction. To illustrate our implementation of lemma incorporation, consider:

∀N : nat. double(half(N)) = N    (3)
where the functions double and half have their natural interpretation, returning twice and half the integer part of their inputs:

double(0) = 0                        half(0) = 0
double(s(N)) = s(s(double(N)))       half(s(0)) = 0
                                     half(s(s(N))) = s(half(N))

An attempt to prove (3) using two-step structural induction would yield one step case and two base cases, one where N = 0 and the other where N = s(0). While the first base case, N = 0, is readily provable, the second base case, N = s(0), is not. The latter case gives rise to a goal contradicting an axiom of the natural numbers theory, namely: 0 = s(0). This suggests that (3) holds if N = 0 but does not if N = s(0). Also the step case is readily provable, via symbolic evaluation and the application of the induction hypothesis. Similarly, this result suggests that (3) holds when N = s(s(x)), as long as it does when N = x. These observations are all summarised in Table 3.

Table 3. Outline of an attempt to prove (3) using induction

Case                               Deduction Result                        Corrective condition
double(half(0)) = 0                Established using symbolic reasoning    P(0) ↔ true
double(half(s(0))) = s(0)          Unsolved                                P(s(0)) ↔ false
double(half(x)) = x                Established using symbolic reasoning    P(s(s(N))) ↔ P(N)
double(half(s(s(x)))) = s(s(x))      and the induction hypothesis

The corrective condition, P(N), can be verified to be the condition even. The method therefore suggests that (3) holds only if N is divisible by two:

∀N : nat. even(N) → double(half(N)) = N

The corrective condition found using this form of lemma incorporation characterises a “safe domain” [8]. However, this heuristic may be overprotective, as the
“safe domain” may leave out a number of elements that make the conjecture provable. For example, our technique would turn ∀L : τ list. reverse(L) = L into a theorem by suggesting the following corrective condition:

P(nil) ↔ true
P(H :: L) ↔ P(L) ∧ Q(H, L)
Q(H, nil) ↔ true
H = H′ → (Q(H, H′ :: L) ↔ Q(H, L))

This condition captures only a subset of all palindromes. Also, our heuristic for lemma incorporation may suggest corrective conditions that are trivial; that is, the modified conjecture is provable even without appealing to the definitions of the symbols involved. For example, our heuristic suggests refining ∀N : nat. even(N) to:

∀N : nat. even(N) → even(N)

That is, the conjecture is itself a primitive condition, but this is not readily identifiable without a proof attempt. Some example conjectures that have been turned into theorems, using the strategies described above, appear in Table 4.

3.2 Higher-Order Naïve Conjectures
Lemma Incorporation by Abducting Primitive Conditions. A HOL theory of lists, e.g. as reported in [3], largely relies on the higher-order functions “fold left” (foldl) and “fold right” (foldr). It involves proof methods that can prove theorems about many inductively defined constants, without using induction. The deduction power of these methods originates in the use of conditional, higherorder rewrite-rules, extracted from theorems about foldl and foldr. Schematically, foldl and foldr are specified as follows: foldl F E X0 :: X1 :: · · · :: Xn = F (. . . (F (F E X0 ) X1 ) . . .) Xn foldr F E X0 :: X1 :: · · · :: Xn = F X0 (F X1 (. . . (F Xn E) . . .)) where E and F respectively denote the base element and the step function.
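The conditions that such refined conjectures carry can be stated concretely over Haskell's own foldl and foldr. The sketch below is ours, not from the paper; the property name and the choice of (+) and 0 as a witness of an associative, commutative operator with an identity are assumptions.

```haskell
-- Sketch: the kind of conditional fact the refined fold conjectures express.
import Test.QuickCheck (quickCheck)

-- For an associative, commutative F with identity E, left and right folds agree.
prop_foldAgree :: [Int] -> Bool
prop_foldAgree xs = foldr (+) 0 xs == foldl (+) 0 xs

main :: IO ()
main = quickCheck prop_foldAgree
-- With a non-commutative operator such as (-), the same property fails,
-- which is why the refined conjecture needs assoc/comm as conditions.
```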
Table 4. First-order faulty conjectures

Conjecture                            Condition                 Definition
odd(half(X))                          P(X)                      P(0) ↔ false
                                                                P(s(0)) ↔ false
                                                                P(s(s(X))) ↔ Q(X)
                                                                Q(0) ↔ true
                                                                Q(s(0)) ↔ true
                                                                Q(s(s(X))) ↔ P(X)
even(X + Y)                           even(X) ∧ even(Y)
odd(X + Y)                            even(X) ∧ odd(Y)
s(length(delete(X, L))) = length(L)   member(X, L)
sort(L) = L                           ordered(L)
ordered(app(L, M))                    ordered(L) ∧ ordered(M)
app(A, B) = app(B, A)                 P(A, B)                   P(nil, B) ↔ true
                                                                P(H :: A, B) ↔ P(A, B) ∧ Q(B, H)
                                                                Q(nil, H) ↔ true
                                                                H = H′ → (Q(H′ :: B, H) ↔ Q(B, H))
                                                                H ≠ H′ → (Q(H′ :: B, H) ↔ false)
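The recursive conditions synthesised in Tables 3 and 4 are directly executable. The following Haskell sketch (ours, not the paper's; all function and property names are illustrative) renders conjecture (3) together with its synthesised corrective predicate, which coincides with even on the naturals.

```haskell
-- Sketch: conjecture (3) and its synthesised corrective predicate.
import Test.QuickCheck ((==>), Property, quickCheck, NonNegative(..))

double :: Int -> Int
double 0 = 0
double n = 2 + double (n - 1)     -- double(s(N)) = s(s(double(N)))

half :: Int -> Int
half 0 = 0
half 1 = 0
half n = 1 + half (n - 2)         -- half(s(s(N))) = s(half(N))

-- The corrective condition of Table 3: P(0), not P(s(0)), P(s(s(N))) <-> P(N)
p :: Int -> Bool
p 0 = True
p 1 = False
p n = p (n - 2)                   -- p behaves as even on natural numbers

prop_repaired3 :: NonNegative Int -> Property
prop_repaired3 (NonNegative n) = p n ==> double (half n) == n

main :: IO ()
main = quickCheck prop_repaired3
```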
A mechanism to refine conjectures, like the one outlined in this paper, could be used in the development of any such HOL theory of lists. It would help identify conditions that turn initial conjectures into theorems, furthering the repertoire of rewrite-rules and thereby increasing the power of the reasoning system. Interestingly, these conditions show up in an attempt to prove conjectures of this kind using mathematical induction, and, so, are abducted as primitive conditions by our mechanism.

Table 5 shows some of the higher-order faulty conjectures against which we tested our mechanism. There, flat returns the result of concatenating the elements of a list of lists, and map applies a function to every element of a list. The corrective predicate is a conjunct, involving the definitions below:

left id F E.   F E X = X
assoc F.       F X (F Y Z) = F (F X Y) Z
comm F.        F X Y = F Y X
right id F E.  F X E = X
eqc F G E.     G E X = F X E
4 On Fixing Security Protocols
People are reluctant to send confidential information over a hostile network such as the Internet. To guarantee security, engineers offer security protocols. A security protocol is a protocol whereby two or more agents agree about each other's identity and the sharing of secrets. Security protocols consist of only a few messages, but, interestingly, it is difficult to get them right. For example, a flaw in the 3-message Needham-Schroeder public key (NSPK) protocol went unnoticed for roughly 17 years [11]; machine support was needed to spot it.
Table 5. Higher-order faulty conjectures

Conjecture                                                Condition
foldr F E (app L M) = F (foldr F E L) (foldr F E M)       left id F E, right id F E, assoc F
foldr F E (flat L) = foldr F E (map L (λZ. foldr F E Z))  right id F E, assoc F
foldr F E (rev L) = foldr F E L                           assoc F, comm F
foldr F E L = foldl G E L                                 eqc F G E, assoc F, comm F
Not surprisingly, the verification of security protocols has attracted a lot of interest in the formal methods community. Model checking tools are capable of determining whether or not a (finite abstraction of a) security protocol is valid. The verification process usually takes a few seconds and, in the case of unsatisfiability, a counterexample, called an attack, is output (see, for example, [2,4]).

In [9,10,7], we introduced a method for fixing flawed security protocols subject to replay attacks. A replay attack is a form of network attack in which a valid data transmission is repeated or delayed (see http://en.wikipedia.org/wiki/Replay_attack). Our method makes use of a state-of-the-art protocol verification tool. We apply that tool to the faulty protocol to identify an attack on it. We then compare this attack against a copy of an intended protocol run. This will hopefully spot differences, which indicate where the verification went wrong. Depending on our findings, we offer various heuristics, each of which suggests a specific way to modify the protocol. Some of our heuristics rearrange parts of an individual message or enrich it with additional information, while others change the flow of messages in the protocol. Our approach is iterative: the mended protocol is sent to the verification tool for reinspection. If the tool still detects a bug, either because this other bug was already present in the original protocol (as is often the case) or because it was introduced by our method (as has never happened so far), we iterate, applying our repair heuristics on the mended protocol.
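The iterative loop described above can be pictured schematically. The rendering below is our own sketch, not code from the authors' tool; all type and function names are hypothetical placeholders.

```haskell
-- Sketch of the verify-and-patch repair loop (assumed interfaces).
type Trace    = [String]            -- an attack trace reported by the verifier
type Protocol = [String]            -- a protocol as a list of message templates
data Verdict  = Valid | Attack Trace

repairLoop :: (Protocol -> Verdict)            -- external verification tool
           -> (Protocol -> Trace -> Protocol)  -- repair heuristics (patch selection)
           -> Protocol -> Protocol
repairLoop verify patch prot =
  case verify prot of
    Valid        -> prot                                          -- no attack: done
    Attack trace -> repairLoop verify patch (patch prot trace)    -- mend, re-verify
-- Note: termination depends on the heuristics eventually removing all attacks.
```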
4.1 Overview of Approach to Protocol Repair
To explain the rationale behind our patching mechanism, we shall show an example correction of the NSPK protocol:

1. A → B : {|Na; A|}KB
2. B → A : {|Na; Nb|}KA
3. A → B : {|Nb|}KB
where q. A → B : M means that, at step q, agent A sends message M to agent B; Na and Nb denote Numbers used ONCE only (nonces); KA and KB represent the public keys of the associated agents; {|M|}K stands for the encryption of M under key K; and ; stands for message concatenation.

NSPK seems right at first glance: messages are all encrypted and, if the keys are not compromised, the participants should be able to authenticate one another. Yet, Lowe found that an intruder, the agent C below, could elaborate the following man-in-the-middle attack [11]:

s1 : 1. A → C    : {|Na; A|}KC
s2 : 1. C(A) → B : {|Na; A|}KB
s2 : 2. B → C(A) : {|Na; Nb|}KA
s1 : 2. C → A    : {|Na; Nb|}KA
s1 : 3. A → C    : {|Nb|}KC
s2 : 3. C(A) → B : {|Nb|}KB

Flaw Detection. In this attack, one message, namely {|Na; Nb|}KA, is used in two independent runs. Message 2 from session 1 has been passed to play message 2, but from session 2. The deceived agent, A, is the intended recipient, but she cannot distinguish who built the message. Thus, while B knows that A has recently participated in a run of the protocol, she cannot tell whether A is apparently running it with her.

Flaw Repair. From this attack, our repair mechanism generalises that these kinds of attacks can be elaborated only when the deceived participants, A and B, run the protocol with complementary roles (initiator and responder) but with different parameters: the responder B is running the protocol with initiator A, but A initiated a run with C. Using the strand-space theory [22], we can formalise these partial perspectives of protocol runs and, hence, identify that the name of the responder needs to be included in message 2, to avoid this confusion: {|B; Na; Nb|}KA. This repair enables one to prove the lemma that both the initiator and the responder must agree upon their individual view of the protocol run, which could not be proved earlier. It follows, then, that this is an application of monster barring; only that we do not exclude the monster by denying the ability of an intruder, but by modifying the (model of the) program so that the monster is no longer admissible.

In our validation experiments, we considered 36 faulty security protocols, of which 20 were borrowed from standard libraries, 4 were variants of other protocols generated by us, and 12 were improved versions of these protocols, as output by our repair mechanism.
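The effect of the repair on message 2 can be made concrete with a small message model. This is our own illustrative sketch, not the authors' formalisation; the data constructors and names are assumptions.

```haskell
-- Sketch: the original versus the repaired second message of NSPK.
data Agent = A | B | C deriving (Eq, Show)
data Msg   = Nonce String          -- Na, Nb
           | Name Agent
           | Conc [Msg]            -- message concatenation M1; M2; ...
           | Enc Agent Msg         -- {| M |}_K, keyed by the agent owning K
           deriving (Eq, Show)

-- Original message 2: {| Na; Nb |}_KA -- A cannot tell which session it belongs to.
msg2 :: Msg
msg2 = Enc A (Conc [Nonce "Na", Nonce "Nb"])

-- Repaired message 2: {| B; Na; Nb |}_KA -- the responder's name is inside the
-- encryption, so replaying it into another run no longer deceives A.
msg2' :: Msg
msg2' = Enc A (Conc [Name B, Nonce "Na", Nonce "Nb"])
```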
5 Conclusions
Our uses of failure analysis for conjecture/program repair are example applications, albeit modest, of Lakatos's types of reasoning in his approach
to mathematics. Our method to improve mathematical conjectures is, though limited, a reification of lemma incorporation, while our method for fixing security protocols is a reification of monster barring. [8] is a rich source of inspiration, suggesting new research topics, and, even now, exposes limitations of our approach. For example, to refine an initial conjecture, ∀X. G(X), we postulate instead ∀X. P(X) → G(X). But there are alternative solutions; e.g., we could have weakened G(X), or even tried a combination thereof. As another example, rather than correcting a faulty security protocol, we could have tried to disbar those capabilities of the intruder which make the verification conjecture disprovable. This is of interest, since to elaborate some attacks the intruder is required to be extremely fast. In some environments, these attacks are not achievable and, thus, can be dismissed. At the bottom of this research area is the question: does a system have the freedom to learn from its failures? Unfortunately, the general answer is no. To engineer conjecture/program refinement, we must anticipate the ways in which proofs go wrong and equip the system with a means to cope with them when doing the repair. This determinism is rather disappointing. By way of comparison, AI-based theorem provers may find proofs that were not anticipated by their designers. In the context of theorem repair, can we perform similarly? Others have also investigated the use of failure for flaw correction, but space constraints prevent a thorough discussion here. Existing repair systems are not ‘aware’ of the objects (i.e., proofs, conjectures, programs, specifications) they manipulate: how they relate to one another, how changes in one affect the others or their relations, etc. Such knowledge would enable incremental theory development, by transforming consistent theories into more complex ones. This is certainly a research topic worth pursuing, pointed out to me by D. Hutter.
References
1. Ahrendt, W.: Deductive search for errors in free data type specifications using model generation. In: Voronkov, A. (ed.) CADE 2002. LNCS (LNAI), vol. 2392, pp. 211–225. Springer, Heidelberg (2002)
2. Basin, D.A., Mödersheim, S., Viganò, L.: OFMC: A symbolic model checker for security protocols. International Journal of Information Security 4(3), 181–208 (2005)
3. Bird, R.: Introduction to Functional Programming Using Haskell, 2nd edn. Prentice Hall Europe, Englewood Cliffs (1998)
4. Blanchet, B.: An efficient cryptographic protocol verifier based on Prolog rules. In: Computer Security Foundations Workshop, pp. 82–96. IEEE Computer Science Press, Los Alamitos (2001)
5. Bundy, A.: The Use of Explicit Plans to Guide Inductive Proofs. In: Lusk, R., Overbeek, R. (eds.) CADE 1988. LNCS, vol. 310, pp. 111–120. Springer, Heidelberg (1988); also available from Edinburgh as DAI Research Paper No. 349
6. Howard, W.A.: The formulae-as-types notion of construction. In: Seldin, J.P., Hindley, J.R. (eds.) To H. B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism, pp. 479–490. Academic Press, London (1980)
7. Hutter, D., Monroy, R.: On the automated correction of protocols with improper message encoding. In: Degano, P. (ed.) ARSPA-WITS 2009. LNCS, vol. 5511, pp. 138–154. Springer, Heidelberg (2009)
8. Lakatos, I.: Proofs and Refutations: The Logic of Mathematical Discovery. Cambridge University Press, Cambridge (1976)
9. López-Pimentel, J.C., Monroy, R., Hutter, D.: A method for patching interleaving-replay attacks in faulty security protocols. Electronic Notes in Theoretical Computer Science 174, 117–130 (2007); Proceedings of the 2006 FLoC Verification and Debugging Workshop
10. López-Pimentel, J.C., Monroy, R., Hutter, D.: On the automated correction of security protocols susceptible to a replay attack. In: Biskup, J., López, J. (eds.) ESORICS 2007. LNCS, vol. 4734, pp. 594–609. Springer, Heidelberg (2007)
11. Lowe, G.: An attack on the Needham-Schroeder public-key authentication protocol. Information Processing Letters 56(3), 131–133 (1995)
12. McCune, W.: Mace4 reference manual and guide. Computing Research Repository cs.SC/0310055 (2003)
13. Monroy, R., Bundy, A., Ireland, A.: Proof plans for the correction of false conjectures. In: Pfenning, F. (ed.) LPAR 1994. LNCS (LNAI), vol. 822, pp. 54–68. Springer, Heidelberg (1994); also available from Edinburgh as DAI Research Paper No. 681
14. Monroy, R.: The use of abduction and recursion-editor techniques for the correction of faulty conjectures. In: Flener, P., Alexander, P. (eds.) Proceedings of the 15th Conference on Automated Software Engineering, September 11-15, pp. 91–99. IEEE Computer Society Press, Grenoble (2000)
15. Monroy, R.: Concept formation via proof planning failure. In: Nieuwenhuis, R., Voronkov, A. (eds.) LPAR 2001. LNCS (LNAI), vol. 2250, pp. 718–731. Springer, Heidelberg (2001)
16. Monroy, R.: Predicate synthesis for correcting faulty conjectures: the proof planning paradigm. Automated Software Engineering 10(3), 247–269 (2003)
17. Monroy, R., Bundy, A.: On the correction of faulty formulae. Computación y Sistemas 5(1), 25–37 (2001)
18. Peirce, C.S.: Collected Papers of Charles Sanders Peirce. In: Hartshorne, C., Weiss, P. (eds.), vol. 2. Harvard University Press, Cambridge (1959)
19. Popper, K.: The Logic of Scientific Discovery, vol. 2. Routledge, New York (2002)
20. Steel, G., Bundy, A.: Attacking group protocols by refuting incorrect inductive conjectures. Automated Reasoning 36(2), 149–176 (2006); Special Issue on Automated Reasoning for Security Protocol Analysis
21. Steel, G.: The importance of non-theorems and counterexamples in program verification. In: Meyer, B., Woodcock, J. (eds.) VSTTE 2005. LNCS, vol. 4171, pp. 491–495. Springer, Heidelberg (2008)
22. Thayer, F.J., Herzog, J.C., Guttman, J.D.: Strand spaces: Proving security protocols correct. Journal of Computer Security 7(2-3), 191–230 (1999)
Discourse Segmentation for Spanish Based on Shallow Parsing

Iria da Cunha 1,2,3, Eric SanJuan 2, Juan-Manuel Torres-Moreno 2,4, Marina Lloberes 5, and Irene Castellón 5

1 Institute for Applied Linguistics (UPF), C/Roc Boronat 138, Barcelona, Spain
2 Laboratoire Informatique d'Avignon, BP1228, 84911, Avignon Cedex 9, France
3 Instituto de Ingeniería (UNAM), Ciudad Universitaria, 04510, Mexico
4 École Polytechnique de Montréal/DGI, Montréal (Québec), Canada
5 GRIAL – Universitat de Barcelona, C/Gran Via de les Corts 585, Barcelona, Spain
[email protected], {eric.sanjuan,juan-manuel.torres}@univ-avignon.fr, {marina.lloberes,icastellon}@ub.edu
Abstract. Nowadays discourse parsing is a very prominent research topic. However, there is no discourse parser for Spanish texts. The first stage in developing such a tool is discourse segmentation. In this work, we present DiSeg, the first discourse segmenter for Spanish, which uses the framework of Rhetorical Structure Theory and is based on lexical and syntactic rules. We describe the system and evaluate its performance against a gold standard corpus, obtaining promising results.

Keywords: Discourse Parsing, Discourse Segmentation, Rhetorical Structure Theory.
1 Introduction

Nowadays discourse parsing is a very prominent research topic, since it is useful for text generation, automatic summarization, automatic translation, information extraction, etc. There are several discourse parsers for English [1,2], Japanese [3] and Brazilian Portuguese [4,5]. Most of them use the framework of Rhetorical Structure Theory (RST) [6]. However, there is no discourse parser for Spanish. The first stage in developing this tool for this language is to carry out discourse segmentation automatically. As stated in [7]: "Discourse segmentation is the process of decomposing discourse into elementary discourse units (EDUs), which may be simple sentences or clauses in a complex sentence, and from which discourse trees are constructed". There are discourse segmenters for English [7,8]1, Brazilian Portuguese [9]2 and French [22], but not for Spanish. All of them require some syntactic analysis of the sentences. [7,8,9] rely on a set of linguistic rules, while [22] relies on machine learning techniques: it learns rules automatically from thoroughly annotated texts.
1 http://www.sfu.ca/~mtaboada/research/SLSeg.html
2 http://www.nilc.icmc.usp.br/~erick/segmenter/
In this paper we present DiSeg, the first discourse segmenter for Spanish. It produces state-of-the-art results while requiring no full syntactic analysis, only shallow parsing with a reduced set of linguistic rules. Therefore it can be easily included in applications that require fast text analysis on the fly. In particular, it will be part of the discourse parser for Spanish that we are developing. It will also be used in tasks involving human discourse annotation, since it will allow annotators to perform their analysis starting from a single automatic segmentation. We describe the system, based on shallow parsing and syntactic rules that insert segment boundaries into the sentences. As in [7,22], we evaluate system performance over a corpus of manually annotated texts. We obtain promising results, although the system should be improved in some respects. The rest of the paper is structured as follows. Section 2 explains our methodology. Section 3 describes the implementation. The gold standard corpus and the experimental setup are detailed in Section 4, while results are reported in Section 5. Finally, Section 6 is devoted to conclusions and future work.
2 Methodology

The theoretical framework of our research is based on Rhetorical Structure Theory (RST), as defined in [6]. As mentioned in [10], this theory is used in a wide range of NLP applications, such as automatic text generation [11,12,13], summarization [1,14,15], translation [16,17], etc. In all these applications, RST is used to obtain a deeper linguistic analysis. Figure 1 shows an example of an RST discourse tree (with three relations: Background, Contrast and Concession).
Fig. 1. Example of an RST discourse tree
As can be seen in this example, RST can work at two levels: at sentence level, to analyse sentences internally, or at an upper level, to relate them to one another. We shall not consider the transversal case where sub-units inside a sentence can be individually related to units inside other sentences. In [23] we showed how RST at the upper level can improve automatic summarization methods based on full-sentence selection from the source text, by scoring sentences according to their discursive role. In this paper we focus on RST applied at sentence level.
RST sentence segmentation tools are necessary for further discursive analysis, but they are also useful on their own in many NLP applications. For example, by segmenting complex sentences they can be used in sentence compression. Therefore, in automatic summarization, RST-based strategies would make it possible to eliminate some passages of these sentences, obtaining more suitable summaries. With regard to automatic translation, the most usual strategies rely on statistical sentence alignment. Again, for complex sentences, the results of these statistical systems could be improved by aligning sub-discourse units. Moreover, fast segmentation tools based on shallow parsing, such as the one we propose here, can be applied in focused Information Retrieval systems that have to return short text passages instead of complete documents. Let us now make precise the notion of EDU in our work. We consider EDUs as in [18], but with some differences, similar to those included in [7] and [19]. The aim of these differences is to be able to clearly differentiate the syntactic and discursive levels. In this work, we consider that EDUs must include at least one verb (that is, they have to constitute a sentence or a clause) and must show, strictly speaking, a rhetorical relation (many times marked with a discourse connector). For example, sentence 1a would be separated into two EDUs, while sentence 1b would constitute a single EDU:
1a. [The hospital is adequate to adults,]EDU1 [but children can use it as well.]EDU2
1b. [The hospital is adequate to adults, as well as to children.]EDU1
Furthermore, subject and object clauses are not necessarily considered as EDUs. For example, sentence 2 would be a single EDU: 2. [She indicated that the emergency services of this hospital were very efficient.]EDU1
We have then developed a segmentation tool based on a set of discourse segmentation rules using lexical and syntactic features. These rules are based on:
− discourse markers, such as "while" (mientras que), "although" (aunque) or "that is" (es decir), which usually mark relations of Contrast, Concession and Reformulation, respectively (we use the set of discourse markers listed in [20]);
− conjunctions, such as "and" (y) or "but" (pero);
− adverbs, such as "anyway" (de todas maneras);
− verbal forms, such as gerunds, finite verbs, etc.;
− punctuation marks, such as parentheses or dashes.
Finally, we have also manually annotated a corpus of texts to be used as a gold standard for evaluation. The elaboration of a gold standard was necessary due to the current lack of discourse segmenters for Spanish. We thus evaluate DiSeg performance, measuring precision, recall and F-Score over this annotated corpus. We also consider three different baseline systems and a simplified system named DiSeg-base.
3 Implementation

The DiSeg implementation relies on the open-source software FreeLing [21] for Part-of-Speech (PoS) tagging and shallow parsing. This open-source, highly scalable resource for NLP
applications is based on simple Hidden Markov Model (HMM) classifiers and on readable optimal Context Free Grammars (CFG), which can be easily adapted to specific needs. Therefore, we made some modifications to the default grammar of the shallow parser. These were mainly recategorizations of some elements (such as prepositions, prepositional phrases, adverbs or adverbial phrases) into discourse markers (disc_mk). The FreeLing output is then encoded into an XML structure to be processed by Perl programs that apply the discourse segmentation rules in a two-step process. First (DiSeg-base), candidate segment boundaries are detected using two simple automata based on the following tags: ger, forma_ger, ger_pas (that is, all possible present participles or gerunds), verb (that is, finite verbal forms), coord (coordinating conjunctions), conj_subord (subordinating conjunctions), disc_mk (recategorized elements) and grup_sp_inf (infinitives). The only text markers that are used apart from these tags are the period and two words: "that" (que) and "for" (para). Second (DiSeg), EDUs are defined using a reverse parsing from right to left, where a boundary is kept only if there is a verb in the resulting segments before and after this boundary. Indeed, if all previously inserted boundaries were kept, EDUs without verbs could be generated. Figure 2 shows the DiSeg architecture.
(Figure 2 depicts the DiSeg pipeline: sentence segmentation and POS tagging with FreeLing; shallow parsing with FreeLing and the recategorized grammar; transformation to XML with Perl; and application of the segmentation rules with Perl and Twig, covering the detection of segment boundaries and the definition of EDUs.)
Fig. 2. DiSeg architecture
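As a rough, illustrative reading of this two-step process (it is not the actual Perl/FreeLing implementation), the following Python sketch applies simplified boundary rules to a pre-tagged sentence and then performs the right-to-left verb check; the tag names and the toy rules are assumptions loosely based on the description above.

```python
# Illustrative sketch only: a sentence is a list of (word, tag) pairs, where the
# tags loosely follow the ones mentioned in the text (verb, ger, coord, ...).

BOUNDARY_TAGS = {"ger", "forma_ger", "ger_pas", "coord", "conj_subord", "disc_mk", "grup_sp_inf"}
VERB_TAGS = {"verb", "ger", "forma_ger", "ger_pas"}

def candidate_boundaries(tagged_sentence):
    """Step 1 (DiSeg-base): a candidate boundary before each trigger tag."""
    return [i for i, (_, tag) in enumerate(tagged_sentence) if tag in BOUNDARY_TAGS and i > 0]

def has_verb(tagged_span):
    return any(tag in VERB_TAGS for _, tag in tagged_span)

def edus(tagged_sentence):
    """Step 2 (DiSeg): scan boundaries right to left, keeping only those with
    at least one verb on each side of the resulting split."""
    kept, end = [], len(tagged_sentence)
    for b in sorted(candidate_boundaries(tagged_sentence), reverse=True):
        if has_verb(tagged_sentence[b:end]) and has_verb(tagged_sentence[:b]):
            kept.append(b)
            end = b
    bounds = [0] + sorted(kept) + [len(tagged_sentence)]
    return [[w for w, _ in tagged_sentence[s:e]] for s, e in zip(bounds, bounds[1:])]

sent = [("The", "det"), ("hospital", "noun"), ("is", "verb"), ("adequate", "adj"),
        ("but", "coord"), ("children", "noun"), ("use", "verb"), ("it", "pron")]
print(edus(sent))  # [['The', 'hospital', 'is', 'adequate'], ['but', 'children', 'use', 'it']]
```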
DiSeg can be used on-line at http://diseg.termwatch.es. The system is also available under the General Public License (GPL). It requires FreeLing and is made of three elements:
1) a grammar for FreeLing,
2) a small Perl program to transform the FreeLing output into XML,
3) a second Perl program that applies the discourse segmentation rules and requires the Twig library for XML.
Appendix A includes a passage of the gold standard corpus, its translation and its DiSeg segmentation. Appendix B shows the resulting XML source with:
− the tags inserted by FreeLing, including the disc_mk tags resulting from the rules added to the default FreeLing CFG for Spanish,
− the segD tags corresponding to possible EDU boundaries (DiSeg-base),
− the subseg tags corresponding to selected EDUs (DiSeg).
The XML elements "..." that also appear in this output indicate the passages that DiSeg had to analyze. The analysis is carried out inside these elements. In the current case, these elements are full sentences, but, whenever punctuation is ambiguous, extended elements could be considered. Since we use very few text marks, our approach should be easily adaptable to the other Latin languages defined in FreeLing. Moreover, DiSeg-base could be implemented in a CFG, but it would be less computationally efficient. It is only the final reverse parsing that is not CFG-definable. In our experiments we have tested to what extent the non-CFG module is necessary.
4 Experiments and Evaluation

The gold standard test corpus consists of 20 human-annotated abstracts of medical research articles. These abstracts were extracted from the on-line Gaceta Médica de Bilbao (Medical Journal of Bilbao).3 The corpus includes 169 sentences; the average text length is 8.45 sentences, the longest text contains 21 sentences and the shortest 3 sentences. The corpus contains 3981 words; the average text length is 199.05 words, the longest text contains 474 words and the shortest 59 words. This corpus includes 203 EDUs; the average is 10.15 EDUs per text (the maximum is 28 EDUs and the minimum 3 EDUs). These statistics are similar to those of the gold standard corpus used in [7] to develop a discourse segmenter for English, which obtained very good results. This corpus was segmented by one of the authors of this paper (following the guidelines of our project). Another linguist, external to the project, segmented the corpus following the same guidelines. We calculated the precision and recall of this second annotation. Both measures were very high: precision was 98.05 and recall 99.03. Moreover, after short discussions between the annotators, a consensus was reached. We use the consensual segmentation as the gold standard. This gold standard is also available at http://diseg.termwatch.es. We ran DiSeg over this corpus for evaluation and computed precision, recall and F-Score measures over detected and correct boundaries. Precision is the number of correct boundaries detected by the system over the total number of detected boundaries. Recall is the same number of correct boundaries detected by the system but divided
3 http://www.gacetamedicabilbao.org/web/es/
this time by the total number of real boundaries existing in the gold standard corpus. As in [7], we do not count sentence boundaries, in order not to inflate the results. For this evaluation, we used three baseline segmenters:
1. Baseline_0 considers only sentences as EDUs. This is not a trivial baseline, since its precision is 100% by definition and four texts in the gold standard contain no other type of EDUs.
2. Baseline_1 inserts discourse boundaries before each coor tag introduced by the FreeLing shallow parsing.
3. Baseline_2 considers both the coor and conj_subord tags, but only the last segment at the right of the sentence containing a verb is considered as an EDU.
We also consider a simplified system named DiSeg-base, where all candidate boundaries are taken as real EDU boundaries, even though some generated segments may contain no verbs. For Baseline_1, Baseline_2 and DiSeg-base we do not count sentence boundaries.
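For concreteness, the following generic Python sketch (ours, not the authors' evaluation code) computes boundary precision, recall and F-score while excluding sentence boundaries, as described above; the offsets used in the example are invented.

```python
def boundary_scores(predicted, gold, sentence_ends):
    """Precision, recall and F-score over discourse boundaries.

    Boundaries are token offsets; sentence-final boundaries are excluded so
    that trivial sentence splits do not inflate the results.
    """
    pred = set(predicted) - set(sentence_ends)
    ref = set(gold) - set(sentence_ends)
    correct = len(pred & ref)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Example: two sentences ending at offsets 10 and 22, one correct internal boundary.
print(boundary_scores(predicted=[4, 10, 22], gold=[4, 7, 10, 22], sentence_ends=[10, 22]))
# (1.0, 0.5, 0.666...)
```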
5 Results

Table 1 contains the results of the evaluation. Results show that the full DiSeg system outperforms DiSeg-base and all the baselines. F-Score differences are statistically significant according to the pairwise Student test, at 0.05 between the two versions of DiSeg and at 0.01 between DiSeg and the three baselines (Baseline_2, the most sophisticated baseline, appears to give the best results among them). These results are similar to those obtained by the discourse segmenter for English introduced in [7]: 93% precision, 74% recall and 83% F-Score. Thus, we consider that the DiSeg results are promising.

Table 1. Results of the evaluation

System        Precision   Recall   F-Score
DiSeg            71%        98%      80%
DiSeg-base       70%        88%      74%
Baseline_2       68%        82%      72%
Baseline_1       33%        70%      39%
Baseline_0      100%        49%      62%
After a quantitative analysis of the results, we carried out a qualitative analysis in order to detect the main performance problems. We found problems concerning the segmentation rules and problems concerning FreeLing. The main problem of the segmentation rules concerns situations where the element que ("that") is involved at the same time as the conjunction y ("and"). Example 3a shows the DiSeg segmentation and example 3b shows the correct segmentation.
3a. [El perfil del usuario sería el de un varón (51,4%) de mediana edad (43,2 años) que consulta por patología traumática (50,5%)]EDU1 [y procede de la comarca sanitaria cercana al hospital.]EDU2
ENGLISH TRANSLATION OF 3A. [The general profile of users would be a man (51.4%) of middle age (43.2 years) who consults because of traumatologic pathologies (50.5%)]EDU1 [and comes from the sanitary area near the hospital.]EDU2
3b. [El perfil del usuario sería el de un varón (51,4%) de mediana edad (43,2 años) que consulta por patología traumática (50,5%) y procede de la comarca sanitaria cercana al hospital.]EDU1
ENGLISH TRANSLATION OF 3B. [The general profile of users would be a man (51.4%) of middle age (43.2 years) who consults because of traumatologic pathologies (50.5%) and comes from the sanitary area near the hospital.]EDU1
One of the DiSeg segmentation rules indicates that the relative que ("that") is not considered as a segmentation boundary. Nevertheless, another of these rules indicates that, if there is a coordinating conjunction (like y ["and"]) followed by a verb, that conjunction constitutes a possible segmentation boundary. Thus, DiSeg does not segment before que ("that"), but it segments just before y ("and"), because it finds the verb procede ("comes from") before the end of the sentence. We have detected several cases with a similar problem. Moreover, we detected two errors due to a wrong sentence segmentation by FreeLing. Example 4 shows one of them (example 4a shows the DiSeg segmentation and example 4b shows the correct segmentation).
4a. [No encontramos cambios en la medición del ángulo astrágalo-calcáneo en AP. Realizamos una descripción de nuestra serie y una discusión acerca de la técnica y de la indicación actual de la cirugía en esta patología.]EDU1
ENGLISH TRANSLATION OF 4A. [In the measurement of the talar-calcaneal angle in AP there were no changes. We carry out a description of our series and a discussion about the surgical technique and the present indication in this pathology.]EDU1
4b. [No encontramos cambios en la medición del ángulo astrágalo-calcáneo en AP.]EDU1 [Realizamos una descripción de nuestra serie y una discusión acerca de la técnica y de la indicación actual de la cirugía en esta patología.]EDU2
ENGLISH TRANSLATION OF 4B. [In the measurement of the talar-calcaneal angle in AP there were no changes.]EDU1 [We carry out a description of our series and a discussion about the surgical technique and the present indication in this pathology.]EDU2
The sentence segmentation module does not segment these two sentences correctly, probably because it considers "AP." to be an abbreviation and does not detect the beginning of the second sentence. This problem causes an error in the discourse segmentation of DiSeg.
6 Conclusions

We have developed DiSeg, the first discourse segmenter for Spanish, based on lexical and syntactic rules. We consider that this research constitutes an important step in the research on automatic discourse parsing in Spanish, because there are not many
works on this topic for this language. We have evaluated DiSeg performance, measuring precision, recall and F-Score, comparing it against a gold standard that we have built. Performance is good if we compare it with the baseline segmenters. Moreover, the results are similar to the ones obtained in [7]. Additionally, we think that the gold standard we have built is a good contribution that may encourage other researchers to keep investigating this field. As future work, we plan to solve the detected errors, using more symbolic rules and/or machine learning approaches as in [22]. Moreover, we will apply DiSeg to another Spanish corpus including general texts from Wikipedia. The final goal of the project is to develop the first discourse parser for Spanish on an open platform, easily adaptable to the other Latin languages implemented in FreeLing.

Acknowledgments. This work is partially supported by: a postdoctoral grant (National Program for Mobility of Research Human Resources; National Plan of Scientific Research, Development and Innovation 2008-2011) given to Iria da Cunha by the Ministerio de Ciencia e Innovación, Spain; the research project CONACyT, number 82050; the research project PAPIIT-DGAPA, number IN403108; and the research project "Representación del Conocimiento Semántico" (SKR) KNOW2 (TIN2009-14715-C0403).
References
1. Marcu, D.: The Theory and Practice of Discourse Parsing and Summarization. The MIT Press, Cambridge, Massachusetts (2000a)
2. Marcu, D.: The Rhetorical Parsing of Unrestricted Texts: A Surface-based Approach. Computational Linguistics 26(3), 395–448 (2000b)
3. Sumita, K., Ono, K., Chino, T., Ukita, T., Amano, S.: A discourse structure analyzer for Japanese text. In: International Conference on Fifth Generation Computer Systems, pp. 1133–1140 (1992)
4. Pardo, T.A.S., Nunes, M.G.V., Rino, L.M.F.: DiZer: An Automatic Discourse Analyzer for Brazilian Portuguese. In: Bazzan, A.L.C., Labidi, S. (eds.) SBIA 2004. LNCS (LNAI), vol. 3171, pp. 224–234. Springer, Heidelberg (2004)
5. Pardo, T.A.S., Nunes, M.G.V.: On the Development and Evaluation of a Brazilian Portuguese Discourse Parser. Journal of Theoretical and Applied Computing 15(2), 43–64 (2008)
6. Mann, W.C., Thompson, S.A.: Rhetorical structure theory: Toward a functional theory of text organization. Text 8(3), 243–281 (1988)
7. Tofiloski, M., Brooke, J., Taboada, M.: A Syntactic and Lexical-Based Discourse Segmenter. In: 47th Annual Meeting of the Association for Computational Linguistics, Singapore (2009)
8. Soricut, R., Marcu, D.: Sentence Level Discourse Parsing Using Syntactic and Lexical Information. In: 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, Edmonton, Canada, pp. 149–156 (2003)
9. Maziero, E., Pardo, T.A.S., Nunes, M.G.V.: Identificação automática de segmentos discursivos: o uso do parser PALAVRAS. Série de Relatórios do Núcleo Interinstitucional de Lingüística Computacional (NILC). São Carlos, São Paulo (2007)
10. Taboada, M., Mann, W.C.: Applications of rhetorical structure theory. Discourse Studies 8(4), 567–588 (2005)
11. Hovy, E.: Automated discourse generation using discourse structure relations. Artificial Intelligence 63, 341–385 (1993)
12. Dale, R., Hovy, E., Rösner, D., Stock, O.: Aspects of Automated Natural Language Generation. Springer, Berlin (1992)
13. O'Donnell, M., Mellish, C., Oberlander, J., Knott, A.: ILEX: An architecture for a dynamic Hypertext generation system. Natural Language Engineering 7, 225–250 (2001)
14. Radev, D.: A common theory of information fusion from multiple text sources. Step one: Cross document structure. In: Dybkjær, L., Hasida, K., Traum, D. (eds.) 1st SIGdial Workshop on Discourse and Dialogue, Hong-Kong, pp. 74–83 (2000)
15. Pardo, T.A.S., Rino, L.H.M.: DMSumm: Review and assessment. In: Ranchhod, E., Mamede, N.J. (eds.) PorTAL 2002. LNCS (LNAI), vol. 2389, pp. 263–274. Springer, Heidelberg (2002)
16. Ghorbel, H., Ballim, A., Coray, G.: ROSETTA: Rhetorical and Semantic Environment for Text Alignment. In: Rayson, P., Wilson, A., McEnery, A.M., Hardie, A., Khoja, S. (eds.) Proceedings of Corpus Linguistics, Lancaster, pp. 224–233 (2001)
17. Marcu, D., Carlson, L., Watanabe, M.: The automatic translation of discourse structures. In: 1st Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL 2000), Seattle, vol. 1, pp. 9–17 (2000)
18. Carlson, L., Marcu, D.: Discourse Tagging Reference Manual. ISI Technical Report ISI-TR-545. University of Southern California, Los Angeles (2001)
19. da Cunha, I., Iruskieta, M.: La influencia del anotador y las técnicas de traducción en el desarrollo de árboles retóricos. Un estudio en español y euskera. In: 7th Brazilian Symposium in Information and Human Language Technology (STIL). Universidade de São Paulo, São Carlos (2009)
20. Alonso, L.: Representing discourse for automatic text summarization via shallow NLP techniques. PhD thesis. Universitat de Barcelona, Barcelona (2005)
21. Atserias, J., Casas, B., Comelles, E., González, M., Padró, Ll., Padró, M.: FreeLing 1.3: Syntactic and semantic services in an open-source NLP library. In: 5th International Conference on Language Resources and Evaluation. ELRA (2006)
22. Afantenos, S., Denis, P., Muller, P., Danlos, L.: Learning Recursive Segments for Discourse Parsing. In: Proceedings of the Seventh Conference on International Language Resources and Evaluation (2010)
23. da Cunha, I., Fernández, S., Velázquez-Morales, P., Vivaldi, J., SanJuan, E., Torres-Moreno, J.-M.: A New Hybrid Summarizer Based on Vector Space Model, Statistical Physics and Linguistics. In: Gelbukh, A., Kuri Morales, Á.F. (eds.) MICAI 2007. LNCS (LNAI), vol. 4827, pp. 872–882. Springer, Heidelberg (2007)
APPENDIX A. Example of DiSeg segmentation Original Spanish passage Con el fin de predecir la tasa esperable de ganglios centinela en nuestra población, hemos analizado la tasa de invasión axilar en los últimos 400 casos de cáncer de mama pT1 operados por nosotros, utilizando la técnica clásica de linfadenectomía axilar completa. De los 400 tumores 336 (84.0%) fueron carcinomas ductales infiltrantes NOS, 32 (8.0%) carcinomas lobulillares, 22 carcinomas tubulares puros (5.5%), y los 10 restantes correspondieron a otras variedades histológicas menos frecuentes. A la hora de realizar el estudio del ganglio centinela en cánceres de mama T1 en nuestra población, cabe esperar globalmente la detección de un ganglio positivo en al menos una de cada cuatro pacientes. English translation given by the text author In order to predict the expectable positive sentinel node rate in our population, we analyzed the rate of axillary invasion in the last 400 pT1 breast cancer cases operated by us, using the classical technique of complete axillary dissection. Of the 400 tumors, 336 (84.0%) were ductal NOS infiltrating carcinomas, 32 (8.0%) lobular carcinomas, the remaining 10 belonging to other, less frequent histological varieties. When studying the sentinel node in T1 breast cancers in our population, the detection of a positive node may globally be expected in one out of four patients. Segmented text Con el fin de predecir la tasa esperable de ganglios centinela en nuestra población, hemos analizado la tasa de invasión axilar en los últimos 400 casos de cáncer de mama pT1 operados por nosotros, utilizando la técnica clásica de linfadenectomía axilar completa. De los 400 tumores 336 (84.0%) fueron carcinomas ductales infiltrantes NOS, 32 (8.0%) carcinomas lobulillares, 22 carcinomas tubulares puros (5.5%), y los 10 restantes correspondieron a otras variedades histológicas menos frecuentes. A la hora de realizar el estudio del ganglio centinela en cánceres de mama T1 en nuestra población, cabe esperar globalmente la detección de un ganglio positivo en al menos una de cada cuatro pacientes.
APPENDIX B. Screenshot of the complete DiSeg XML output
Towards Document Plagiarism Detection Based on the Relevance and Fragmentation of the Reused Text

Fernando Sánchez-Vega 1, Luis Villaseñor-Pineda 1, Manuel Montes-y-Gómez 1,3, and Paolo Rosso 2

1 Laboratory of Language Technologies, Department of Computational Sciences, National Institute of Astrophysics, Optics and Electronics (INAOE), Mexico
{fer.callotl,mmontesg,villasen}@inaoep.mx
2 Natural Language Engineering Lab, ELiRF, DSIC, Universidad Politécnica de Valencia, Spain
[email protected]
3 Department of Computer and Information Sciences, University of Alabama at Birmingham
Abstract. Traditionally, external plagiarism detection has been carried out by determining and measuring the similar sections between a given pair of documents, known as the source and suspicious documents. One of the main difficulties of this task resides in the fact that not all similar text sections are examples of plagiarism, since thematic coincidences also tend to produce portions of common text. In order to face this problem, in this paper we propose to represent the common (possibly reused) text by means of a set of features that denote its relevance and fragmentation. This new representation, used in conjunction with supervised learning algorithms, provides more elements for the automatic detection of document plagiarism; in particular, our experimental results show that it clearly outperformed the accuracy results achieved by traditional n-gram based approaches.

Keywords: Plagiarism detection, text reuse, supervised classification.
1 Introduction

Plagiarism is regarded as intellectual theft; it consists in using the words (and ideas) of others and presenting them as your own. Nowadays, due to current technologies for creating and disseminating electronic information, it is very simple to compose a new document by copying sections from different sources extracted from the Web. This situation has caused the growth of the plagiarism phenomenon and, at the same time, it has motivated the development of tools for its automatic detection. From a general point of view, document plagiarism detection divides into two major problems, intrinsic and external plagiarism detection [8]. The former aims to determine plagiarized sections by analyzing style changes within the document of interest, whereas the latter tries to discriminate plagiarized from non-plagiarized documents by determining the reused text sections with respect to a reference collection.
Regarding external plagiarism detection, its main concern involves finding similarities between any two documents which are more than just coincidence and more likely to be the result of copying [3]. This is a very complex task, since reused text is commonly modified with the aim of hiding or camouflaging the plagiarism. To date, most approaches have only partially addressed this issue by measuring the lexical and structural similarity of documents by means of different kinds of features such as single words [4, 11], fixed-length substrings (known as n-grams) [1, 4, 6, 7], and variable-length substrings [4, 2]. The main drawback of these approaches is that they carry out their decision/classification based on one single value/feature, namely, the degree of overlap between the suspicious and source documents. Due to this strategy, they are affected by the thematic correspondence of the documents, which implies the existence of common domain-specific word sequences and, therefore, causes an overestimation of their overlap [3]. In order to face the above problem we propose to bring more information into the classification process of the documents. Our idea is to characterize the common (possibly reused) text by its relevance and fragmentation. In particular, we consider a set of features that denote the frequency of occurrence of common sequences as well as their length distribution. Our assumption is that the larger and the less frequent the common sequences, the greater the evidence of plagiarism. In other words, we consider that frequent common sequences tend to correspond to domain-specific terminology, and that small common sequences may be coincidental, and, therefore, they are not a clear signal of plagiarism. The experimental evaluation of the proposed approach was carried out on a subset of the METER corpus [5]. In particular, we model document plagiarism detection as a classification problem, and, therefore, our goal was to show that using the proposed set of features, which better describe the particularities of the common sequences, it is possible to achieve a greater discrimination performance between plagiarized and non-plagiarized documents than by only considering the general degree of overlap. The rest of the paper is organized as follows. Section 2 describes the proposed representation of the common text. First, it formally defines the set of common sequences between two given documents. Then, it introduces the set of relevance and fragmentation features used to characterize the common text. Section 3 presents the experiments. It describes the experimental configuration and shows the results from the classification of 253 pairs of suspicious and source documents. The achieved results are encouraging; they indicate that the proposed approach outperformed by more than 7% the accuracy of the current approaches mentioned above. Finally, Section 4 presents our conclusions and formulates some ideas for future work.
2 A New Representation of the Common Text

Generally, as stated above, common word sequences between the suspicious and source documents are considered the primary evidence of plagiarism. Nevertheless, using their presence as the only indicator of plagiarism is too risky, since thematic coincidences also tend to produce portions of common text (i.e., false positives). In
addition, even a minor modification to hide the plagiarism will prevent the identification of the corresponding sequences, generating false negatives. In order to handle these problems we propose using a set of features that describe diverse characteristics of the common sequences and, therefore, facilitate the recognition of sequences denoting the reused (plagiarized) text. Before introducing the proposed set of features we present the definition of a common sequence. Assume that $D_S$ and $D_R$ are two documents, the suspicious and source (reference) documents respectively, and that each document is a sequence of words, where $a_i$ and $b_i$ are the $i$th words of $D_S$ and $D_R$ respectively. Then:

Definition 1. The word sequence $\langle a_i, a_{i+1}, \ldots, a_{i+n} \rangle$ contained in $D_S$ is a common sequence between $D_S$ and $D_R$ if and only if there exists at least one sequence $\langle b_j, b_{j+1}, \ldots, b_{j+n} \rangle$ in $D_R$ such that $a_{i+k} = b_{j+k}$ for all $k \in \{0, \ldots, n\}$.
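As an informal illustration of Definition 1 (a sketch of ours, not code from this paper), the following Python function enumerates common word sequences between two small tokenized documents; restricting the output to maximal matches is our own simplification.

```python
def common_sequences(suspicious, source, min_len=1):
    """Return maximal common word sequences between two tokenized documents.

    A common sequence is a run of words that appears contiguously in both
    documents (Definition 1); only maximal runs are kept to avoid listing
    every sub-sequence of a longer match.
    """
    matches = set()
    for i in range(len(suspicious)):
        for j in range(len(source)):
            if suspicious[i] != source[j]:
                continue
            # skip positions that merely continue a previously found match
            if i > 0 and j > 0 and suspicious[i - 1] == source[j - 1]:
                continue
            k = 0
            while (i + k < len(suspicious) and j + k < len(source)
                   and suspicious[i + k] == source[j + k]):
                k += 1
            if k >= min_len:
                matches.add(tuple(suspicious[i:i + k]))
    return matches

ds = "the hospital is adequate to adults and children".split()
dr = "this hospital is adequate only to adults".split()
print(common_sequences(ds, dr, min_len=2))
# {('hospital', 'is', 'adequate'), ('to', 'adults')}
```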
In order to learn to discriminate between plagiarized and non-plagiarized documents, we propose to characterize the set of common sequences (denoted by Ψ) by two main kinds of features, namely, relevance and fragmentation features. The next formula shows the proposed representation of Ψ:

$$\Psi \rightarrow \langle r_1, r_2, \ldots, r_m, f_1, f_2, \ldots, f_m \rangle$$
As noticed, we represent the set of common sequences by 2m features, where each feature $r_i$ and $f_i$ indicates the relevance and the fragmentation of the sequences of length i, respectively. Cases of particular interest are the $r_m$ and $f_m$ features, which aggregate the values of all sequences with length equal to or greater than m (a user-defined value). Their purpose is to deal with data sparseness and to allow taking advantage of the occurrence of discriminative but very rare longer sequences. In the following we define both kinds of features. For the sake of simplicity we first describe fragmentation features and afterwards relevance features.

Fragmentation features. By means of these features we aim to find a relation between the length and quantity of common sequences and plagiarism. These features are based on two basic assumptions. On the one hand, we consider that the longer the sequences, the greater the evidence of plagiarism; on the other hand, based on the fact that long sequences are very rare, we consider that the more the common sequences, the greater the evidence of plagiarism. According to these basic assumptions, we compute the value of the feature $f_i$ by adding the lengths of all common sequences of length equal to i, as described in the following formula:

$$f_i = \sum_{s \in \Psi,\ |s| = i} |s|$$

The definition of the agglomerative feature $f_m$ is as stated below:

$$f_m = \sum_{s \in \Psi,\ |s| \geq m} |s|$$
Relevance features. This second group of features aims to qualify the sequences by their words. That is, they aim to determine the relevance of the sequences with respect to the thematic content of both documents. The idea behind these features is that frequent words/sequences are related to the topic of the documents and are not necessarily a clear signal of plagiarism. On the contrary, they are supported by the intuition that plagiarism is a planned action, and, therefore, that plagiarized sections (sequences) are not exhaustively used. In particular, we measure the relevance of a given common sequence $s \in \Psi$ by the following formula:

$$\mathrm{relevance}(s) = \frac{1}{2}\left(\frac{1}{\#(s, D_S)} + \frac{1}{|s|}\sum_{w_k \in s}\frac{2}{\#(w_k, D_S) + \#(w_k, D_R)}\right)$$

where $\#(s, D)$ indicates the number of occurrences of the common sequence $s$ in document D, and $\#(w_k, D)$ indicates the number of times word $w_k$ occurs in D. This measure of relevance has two components: the first one evaluates how frequent the given sequence is in the suspicious document, strongly penalizing frequent sequences because they have a higher probability of being idiomatic or domain-specific expressions. The second component penalizes the sequences formed by words that are frequent in both documents. As noticed, this formula reaches its greatest value (relevance = 1) when the common sequence (and all of its inner words) appears exactly once in both documents, indicating that it has a great chance of being a deliberate copy. Based on the definition of the relevance of a sequence, the relevance features are computed as follows:

$$r_i = \sum_{s \in \Psi,\ |s| = i} \mathrm{relevance}(s)$$

The definition of the agglomerative feature $r_m$ is as follows:

$$r_m = \sum_{s \in \Psi,\ |s| \geq m} \mathrm{relevance}(s)$$
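To make the feature construction concrete, here is a small illustrative Python sketch that follows our reading of the definitions above (it is not the authors' implementation, and the exact aggregation may differ): it turns a set of common sequences into the 2m-dimensional vector of relevance and fragmentation features.

```python
from collections import Counter

def count_seq(seq, doc):
    """Number of (possibly overlapping) occurrences of a word sequence in a document."""
    n = len(seq)
    return sum(1 for i in range(len(doc) - n + 1) if tuple(doc[i:i + n]) == tuple(seq))

def relevance(seq, ds, dr):
    """Relevance of a common sequence: penalize sequences and words that are
    frequent, so that a sequence occurring exactly once everywhere scores 1."""
    cs, cr = Counter(ds), Counter(dr)
    first = 1.0 / count_seq(seq, ds)
    second = sum(2.0 / (cs[w] + cr[w]) for w in seq) / len(seq)
    return 0.5 * (first + second)

def feature_vector(common_seqs, ds, dr, m=4):
    """Build <r_1..r_m, f_1..f_m>; sequences of length >= m fall into the m-th slot."""
    r = [0.0] * m
    f = [0.0] * m
    for seq in common_seqs:
        slot = min(len(seq), m) - 1
        r[slot] += relevance(seq, ds, dr)
        f[slot] += len(seq)
    return r + f

ds = "the hospital is adequate to adults and children".split()
dr = "this hospital is adequate only to adults".split()
seqs = [("hospital", "is", "adequate"), ("to", "adults")]
print(feature_vector(seqs, ds, dr, m=4))
```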
3 Experimental Evaluation

3.1 The Corpus

For the experiments we used a subset of the METER corpus1 [5], a corpus specially designed to evaluate text reuse in the journalism domain. It consists of annotated examples of related newspaper texts collected from the British Press Association (PA) and nine British newspapers that subscribe to the PA newswire service.
1 www.dcs.shef.ac.uk/nlp/funded/meter.html
In the METER corpus, news items from the PA are considered as the source documents and the corresponding notes from the newspapers are regarded as the suspicious documents. In particular, we only used the subset of newspaper notes (suspicious documents) that have one single related PA news item (source document). That is, we considered a subset of 253 pairs of source-suspicious documents. In this corpus each suspicious document (note from a newspaper) is annotated with one of three general classes indicating its degree of derivation with respect to the corresponding PA news item: wholly-derived, partially-derived and non-derived. For our experiments we considered wholly and partially derived documents as examples of plagiarism and non-derived documents as examples of non-plagiarism, modeling in this way the plagiarism detection task as a two-class classification problem. In particular, the resulting evaluation corpus consists of 181 instances of plagiarism and 72 of non-plagiarism.

3.2 Evaluation

For the evaluation of the proposed approach, as well as for the evaluation of the baseline methods, we employed the Naïve Bayes classification algorithm as implemented in Weka [10], and applied a 10-fold cross-validation strategy. In all cases we preprocessed the documents by substituting punctuation marks with a generic label, but we did not eliminate stopwords nor apply any stemming procedure. The evaluation of results was carried out mainly by means of the classification accuracy, which indicates the overall percentage of documents correctly classified as plagiarized and non-plagiarized. Additionally, due to the class imbalance, we also present the averaged F1 measure, as used in [4], which indicates the average of the F1 scores across the two classes.

3.3 Selection of the m Value

As we explained in Section 2, we propose representing the set of common sequences between the suspicious and source documents by a vector of 2m features. In this vector, each feature indicates the relevance or fragmentation of the sequences of a particular length, except for the m-features, which integrate information from all sequences with length equal to or greater than m. In order to determine an appropriate value of m for our experiments we evaluated the information gain (IG) [9] of each obtained feature. Given that we extracted common sequences of lengths varying from 1 to 61, we initially constructed a representation of 122 features. Then, for each one of the 10 folds, we evaluated the information gain of these features, and, finally, we decided to preserve those having an averaged IG greater than 0.1. Following this procedure we established m = 4 for the experiments reported in this paper. As a reference, Table 1 shows the obtained averaged IG values as well as their standard deviation (over the 10 different folds) for the first five features, which correspond to sequences of lengths from 1 to 5.
Table 1. IG of the first five features of the proposed representation

Length of sequences   Average IG   Standard Deviation
1                       0.382          0.024
2                       0.288          0.025
3                       0.125          0.026
m = 4                   0.037          0.025
5                       0.006          0.017
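As a rough stand-in for the evaluation setup described above (the paper uses Weka; the scikit-learn classes below are an assumption of ours, and mutual information is only an analogue of information gain), a sketch of 10-fold cross-validation with Naive Bayes and feature ranking could look as follows.

```python
# Illustrative only: X is an (n_documents x 2m) array of relevance/fragmentation
# features, y holds the plagiarized / non-plagiarized labels.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.random((253, 8))          # placeholder feature vectors (m = 4 -> 8 features)
y = rng.integers(0, 2, size=253)  # placeholder labels

scores = cross_val_score(GaussianNB(), X, y, cv=10, scoring="accuracy")
print("10-fold accuracy: %.3f" % scores.mean())

mi = mutual_info_classif(X, y, random_state=0)
print("feature ranking:", np.argsort(mi)[::-1])
```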
3.4 Baseline Results

Table 2 shows some baseline results corresponding to current approaches for document plagiarism detection. For these results, the classification was carried out using different features denoting the percentage of overlap between the suspicious and source documents. In particular, for the first experiment we measured this overlap by means of the common words (unigrams); for the second experiment we represented the overlap by three features corresponding to the percentage of common unigrams, bigrams and trigrams respectively; and for the third experiment we considered as a single feature the percentage of common words extracted from the common sequences. As noticed, all results are very similar, with the one based on the percentage of common unigrams being the best. This result indicates that the corpus used has a high level of modification (rewriting), and, therefore, that the insertion of words that cut long sequences may be high. On the other hand, this result was worrying (for us), since it indicates that structural information (not captured by unigrams) is not needed, and, in contrast to this conjecture, our approach aims to take advantage of this kind of information.

Table 2. Baseline results based on the proportion of common n-grams and sequences

Kind of features                 Number of features   Accuracy   F1 measure
Unigrams                                  1            73.12%      0.655
{1,2,3}-grams                             3            70.75%      0.6885
Common sequences (length ≥ 2)             1            72.72%      0.677
3.5 Results of the Proposed Approach

Table 3 shows the results of the proposed approach. The first two rows indicate the results achieved by the fragmentation and relevance features respectively, whereas the last row presents the results obtained by their combination. Results from this table indicate that:
• Relevance and fragmentation features are both relevant for the task of plagiarism detection. In particular, fragmentation features showed to be very appropriate, outperforming the classification accuracy of current methods, whereas relevance features only obtained comparable results.
• Relevance and fragmentation features are complementary; their combined usage allowed obtaining a better result than their individual applications.
• Results of the proposed approach, based on the combination of relevance and fragmentation features, improved by more than 7% the accuracy of the reference methods, and by more than 2% their averaged F1 measure.
Table 3. Results of the proposed approach based on relevance and fragmentation features from the common sequences

Kind of features                 Number of features   Accuracy   F1 measure
Fragmentation features (m = 4)           4            77.07%      0.6755
Relevance features (m = 4)               4            73.91%      0.606
All features (m = 4)                     8            78.26%      0.7045
4 Conclusions and Future Work

This paper describes the first ideas of a new approach for external plagiarism detection. This approach is based on the characterization of the common (possibly reused) text between the source and suspicious documents by its relevance and fragmentation. In particular, it considers a set of features that denote the frequency of occurrence of the common sequences as well as their length distribution. The main assumption is that the larger and the less frequent the common sequences, the greater the evidence of plagiarism. Experimental results on a subset of 253 pairs of source-suspicious documents from the METER corpus are encouraging; they indicate that the proposed features are appropriate for the plagiarism detection task and that they provide relevant elements for a classifier to discriminate between plagiarized and non-plagiarized documents. In particular, the achieved accuracy results outperformed by more than 7% the results of other current methods based on the use of one single feature describing the degree of overlap between the documents. As future work we plan to investigate more features describing the common text between the source and suspicious documents. For instance, we consider incorporating features that describe the density of the common sequences in the suspicious document, as well as features that capture their relative order in both documents. In addition, we plan to improve the evaluation of the relevance of single words by computing statistics from the Web or from other external but thematically related corpora.

Acknowledgments. This work was done under partial support of CONACYT (project grants 83459, 82050, 106013 and scholarship 258345), and the research work of the last author is partially funded by CONACYT-Mexico and the MICINN project TEXT-ENTERPRISE 2.0 TIN2009-13391-C04-03 (Plan I+D+i). In addition, we thank Paul Clough for his help in providing us with the METER corpus.
References
1. Barrón-Cedeño, A., Rosso, P.: On Automatic Plagiarism Detection Based on n-grams Comparison. In: Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval (ECIR), Berlin, Heidelberg (2009)
2. Basile, C., Benedetto, D., Caglioti, E., Cristadoro, G., Degli Esposti, M.: A Plagiarism Detection Procedure in Three Steps: Selection, Matches and "Squares". In: Proceedings of the SEPLN 2009 Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse (PAN 2009), Donostia-San Sebastian, Spain, pp. 1–9 (September 2009)
3. Clough, P.: Old and new challenges in automatic plagiarism detection. National Plagiarism Advisory Service 76 (2003)
4. Clough, P., Gaizauskas, R., Piao, S., Wilks, Y.: METER: Measuring Text Reuse. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia (2002)
5. Gaizauskas, R., Foster, J., Wilks, Y., Arundel, J., Clough, P., Piao, S.: The meter corpus: A corpus for analysing journalistic text reuse. In: Proceedings of the Corpus Linguistics 2001 Conference (2001)
6. Grozea, C., Gehl, C., Popescu, M.: ENCOPLOT: Pairwise Sequence Matching in Linear Time Applied to Plagiarism Detection. In: Proceedings of the SEPLN 2009 Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse (PAN 2009), Donostia-San Sebastian, Spain, pp. 1–9 (September 2009)
7. Kasprzak, J., Brandejs, M., Křipač, M.: Finding Plagiarism by Evaluating Document Similarities. In: Proceedings of the SEPLN 2009 Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse (PAN 2009), Donostia-San Sebastian, Spain, pp. 1–9 (September 2009)
8. Potthast, M., Stein, B., Eiselt, A., Barrón-Cedeño, A., Rosso, P.: Overview of the 1st International Competition on Plagiarism Detection. In: Proceedings of the SEPLN 2009 Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse (PAN 2009), Donostia-San Sebastian, Spain, pp. 1–9 (September 2009)
9. Sebastiani, F.: Machine learning in automated text categorization. ACM Comp. Surv. 34(1) (2002)
10. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques. Elsevier, Amsterdam (2005)
11. Zechner, M., Muhr, M., Kern, R., Granitzer, M.: External and Intrinsic Plagiarism Detection using Vector Space Models. In: Proceedings of the SEPLN 2009 Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse (PAN 2009), Donostia-San Sebastian, Spain, pp. 1–9 (September 2009)
Lexicon Based Sentiment Analysis of Urdu Text Using SentiUnits

Afraz Z. Syed 1, Muhammad Aslam 2, and Ana Maria Martinez-Enriquez 3

1,2 Department of CS & E, U.E.T., Lahore, Pakistan
[email protected], [email protected]
3 Department of CS, CINVESTAV-IPN, D.F. Mexico
[email protected]
Abstract. Like websites in other languages, Urdu websites are becoming more popular, because people prefer to share opinions and express sentiments in their own language. Sentiment analyzers developed for other well-studied languages, like English, are not workable for Urdu, due to differences in script, morphology, and grammar. As a result, this language should be studied as an independent problem domain. Our approach towards sentiment analysis is based on the identification and extraction of SentiUnits from the given text, using shallow parsing. SentiUnits are the expressions which contain the sentiment information in a sentence. We use a sentiment-annotated lexicon based approach. Unfortunately, no such lexicon exists for the Urdu language, so a major part of this research consists in developing such a lexicon. Hence, this paper is presented as a baseline for this colossal and complex task. Our goal is to highlight the linguistic (grammar and morphology) as well as technical aspects of this multidimensional research problem. The performance of the system is evaluated on multiple texts and the achieved results are quite satisfactory.

Keywords: Natural language processing, computational linguistics, sentiment analysis, opinion mining, shallow parsing, Urdu text processing.
1 Introduction

Natural Language Processing (NLP) or Computational Linguistics (CL) is a challenging field in Artificial Intelligence. The philosophical, psychological and conceptual nature of natural language makes it complex to process. NLP applications can be seen as bi-dimensional problems. For some applications, the computational aspect is more important; for instance, spell checkers, machine translators, grammar checkers, and human-computer interaction based applications. These solutions conceal the original issues in modeling aspects of language processing and, therefore, are of no real conceptual interest. On the other hand, there are some applications for which linguistics is a major concern. In this case, the goal is not only to simulate human language processing, but also to understand and manipulate the conceptual and psychological knowledge. Poetry generation, story generation, intelligent information retrieval, and sentiment analysis lie in this category.
For both types of computational linguistic applications mentioned above, English is a very well studied language. Among other areas of natural language processing, an emerging field is sentiment analysis (SA). Over the last decade, this area has been a center of focus for NLP researchers. The need for sentiment analysis is the outcome of the sudden increase in opinionated or sentimental text, which comes in the form of blogs, reviews, and discussions [1]. The main feature of such text is its variability in structure, style, and vocabulary. Researchers have made great attempts to cope with this variability issue, and some of the contributions are quite successful, like [2] [3] [4]. In these contributions SA is presented as a classification problem. A text can be categorized as objective or subjective. Objective text is fact-based, neutral text, whereas opinions, reviews and discussions are all in the category of subjective text, which exhibits some feeling or sentiment. The main goal of a sentiment analyzer is to classify the subjectivity orientation as positive or negative. That is why sentiment analysis is sometimes referred to as subjectivity analysis. The approaches mentioned above mostly address English text, and they are not able to handle Arabic and Arabic-orthography-based languages like Urdu, Persian, and some other local dialects of South Asia. These languages have altogether different scripts, morphology, and grammatical rules [5]. With the increasing popularity of the Internet, like websites in other languages, Urdu websites are also becoming more popular, as people prefer to share opinions and express sentiments in their own language. Keeping this fact in mind, in this paper we propose a sentiment analyzer for the Urdu language, which is spoken and understood in a major part of Asia. As already mentioned, Urdu is a quite different language from English in both script and morphology. Despite script similarities with Arabic (spoken in Saudi Arabia, the United Arab Emirates, and many other Arab and African countries) and Persian (spoken in Iran, Afghanistan, and many states of the former Soviet Union), and morphological similarities with Hindi (spoken in India), Urdu has its own requirements as far as CL is concerned. A literature survey [5,6] shows that the approaches used for other languages are not applicable for the proper handling of Urdu text. This fact entails the need for an updated or even altogether new sentiment analysis model. The rest of the paper proceeds as follows: Section 2 depicts SA in general with some related work. Section 3 gives a brief overview of the Urdu language; our focus in this section is on highlighting the major concerns (linguistic as well as technical) in sentiment analysis of Urdu text. We also present a comparison of Urdu with English, Arabic and Persian. Section 4 describes SentiUnits (words and phrases carrying the sentiment information of a sentence) and their attributes in detail. Section 5 describes our methodology through a system model and its validation process. Finally, in Section 6 we conclude our effort and discuss some future directions.
2 Sentiment Analysis
Aristotle observed that "man is by nature a social animal". Man always seeks suggestions, opinions, and views from other people in society for survival and proper decision making. In this modern era of computers and technology we are living in virtual communities and societies. Now, Internet forums, blogs, consumer reports, product reviews, and other types of discussion groups have opened new
horizons for the human mind. That is why, from casting a vote to buying the latest gadget, we search for opinions and reviews from other people on the Internet. This explosion of opinionated text has fashioned an exciting area in text analysis, which is referred to by many names, like sentiment analysis, opinion mining, subjectivity analysis, and appraisal extraction [1]. In this paper we use the term sentiment analysis.
2.1 Sentiment Analysis Related Work
Sentiment analysis is considered a classification problem [7]. The opinionated text is classified as positive or negative, e.g., thumbs up or thumbs down. Some classifiers use a multi-point range, e.g., a five-star scale for movie reviews. The classification usually starts at the term or phrase level and moves towards the sentence and then the document level. Usually, the output of one level becomes the input to the next [1]. There are a number of features on the basis of which the text is classified, for example, the frequency and position of a term, or part-of-speech information such as adjectives and adverbs [3] [4] [8]. However, an important aspect is to define the semantic orientation of words and phrases. In this regard, the following approaches are used [9]:
a) Lexicon based approaches. These approaches are based on lexicons of words/phrases or expressions, which are pre-tagged with sentiment, subjectivity or polarity information for each entry. This lexicon can be created manually as well as generated automatically, using machine learning methods, for example [7] and [10]. Some researchers have used WordNet for sentiment mining, like [11]. Each word in the text (to be analyzed) is compared with the lexicon entries. As a result, a positive or negative orientation score is attached to it. The total score of a sentence is then computed as the sum of the word scores. Consider the sentence “This is a high quality camera”, in which the phrase “high quality” is the only semantic unit containing sentiment information (positive). All other words are neutral and hence the sentence is classified as a positive comment.
b) Machine learning approaches. The main focus of these approaches is on feature vectors, which are selected according to the domain. A classifier is trained on a tagged corpus. Feature selection is a crucial issue, which can highly affect the results [9].
Domain-Specific Contributions. It is observed that sentiment analysis of text is highly domain-specific in nature [1]. Thus, the development of a generalized solution for analyzing all domains is still an open research direction. In the literature this is referred to as domain adaptation [12] [13]. Consequently, most of the contributions focus on a particular domain, for example, the analysis of reviews related to products and movies. Such texts are relatively easier to handle due to their specified targets and attributes [9]. On the contrary, political speeches and discussions are perhaps the most complex to handle. In [14], this issue is pinpointed and it is evaluated whether a speech was in favor of or in opposition to a topic. Other challenging domains of research are news texts, in which organizations, personalities, and events are the center of focus [2].
Handling Negations. Words like “no”, “not”, “do not”, “don’t”, and “can’t” are called negations. These words can altogether alter the sense of a sentence, and so are very important
to tackle. Different approaches are used to handle negations; e.g., [15] processes negations as a part of post-processing, associating negating words with the subjective components of the sentence using co-location. This technique works in sentences like “I don’t like” and “This is not good”, but is not effective in “No doubt it is amazing” [1]. In [16], negations are considered part of appraisal expressions annotated with attitudes. In this way, negation becomes independent of location. Another approach is the use of a POS-tagged corpus [17].
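To make the lexicon-based scoring and the co-location treatment of negations concrete, the following minimal Python sketch scores a tokenized sentence against a tiny hand-made polarity lexicon, flipping the polarity of a subjective term when a negation word immediately precedes it. The lexicon, the negation list, and the one-token window are illustrative assumptions, not resources from the works cited above.

```python
# Minimal sketch of lexicon-based scoring with co-location negation handling.
# The tiny lexicon and the one-token negation window are illustrative assumptions.
POLARITY_LEXICON = {"good": 1.0, "amazing": 1.0, "bad": -1.0, "boring": -1.0}
NEGATIONS = {"no", "not", "don't", "can't", "never"}

def classify(tokens):
    """Sum the polarities of subjective tokens; a negation word immediately
    before a subjective token flips its polarity (co-location)."""
    score = 0.0
    for i, token in enumerate(tokens):
        polarity = POLARITY_LEXICON.get(token.lower())
        if polarity is None:
            continue  # neutral word, contributes nothing
        if i > 0 and tokens[i - 1].lower() in NEGATIONS:
            polarity = -polarity
        score += polarity
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(classify("This is not good".split()))        # negative
print(classify("No doubt it is amazing".split()))  # positive (negation not adjacent)
```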
3 Urdu Language
With about 60.5 million speakers, mainly in the Indian subcontinent [19], Urdu is a widely spoken Indo-Aryan language. A large number of speakers exist in Pakistan, India, Afghanistan, Iran, and Bangladesh. Moreover, Urdu is the official language of Pakistan and a scheduled language of India. Urdu orthography resembles Arabic, Persian, and Turkish; the cursive Arabic script and the Nastalique writing style are used [20].
3.1 Urdu: In Comparison with Other Languages
Urdu vs. English. For the English language, sentiment analysis is well explored; very extensive and comprehensive surveys are presented in [1] and [9]. Although the core approaches used to handle English text (lexicon based and machine learning, described in detail in the related work) can be used for Urdu text, modifications and adaptations are compulsory due to the vast orthographic, morphological, and grammatical differences between the two languages, described in this section.
Urdu vs. Arabic. A major language comparable with Urdu is Arabic. In the computational linguistics realm Arabic is much more mature than Urdu, and a number of approaches have been proposed for Arabic text processing. Orthographically and morphologically both languages are very similar, but Urdu grammar is more inclined towards Sanskrit and Persian [20]. So, for appropriate processing of Urdu text we need to revise these approaches or develop entirely new solutions.
Urdu vs. Hindi. Hindi is a major dialect of Urdu and there are minimal differences in the grammar of the two languages, but their orthography and vocabulary are dissimilar. Urdu uses the right-to-left Nastalique calligraphic style of the Persio-Arabic script and draws vocabulary from Arabic and Persian, whereas Hindi uses the left-to-right Devanagari script and draws vocabulary from Sanskrit.
3.2 Major Concerns in Urdu Language Processing
As already mentioned, Urdu NLP is not a very established area. There are a number of hurdles with which we have to cope before applying sentiment analysis. Here, we highlight some major concerns in Urdu processing for accomplishing this complex task. We present both linguistic and technical difficulties with a literature review.
3.2.1 Technical Aspects
a) Corpus. Urdu websites are becoming popular day by day; even so, these websites cannot be used for corpus construction, because such a task needs a large amount of electronic text. It is an unfortunate fact that most Urdu websites use graphic formats, i.e., gif or other image formats, to display Urdu text [21].
b) Lexicon. For lexicon-based sentiment analysis we need a sentiment-annotated lexicon of Urdu words. Unfortunately, no such lexicon is available or has even been developed to date. So, from conception to modeling and then implementation, we have to cope with this challenging task.
c) Word Segmentation. Urdu orthography is context sensitive. The “ﺣﺮوف” (haroof, alphabets)1 have multiple glyphs and shapes and are categorized as joiners and non-joiners. Moreover, word boundaries are not indicated by spaces. A single word can have a space in it, e.g., “ﺧﻮب ﺻﻮرت” (khoob surat, beautiful). On the contrary, two different words can be written without a space, e.g., “دﺳﺘﮕﻴﺮ” (dastgeer, benefactor). This segmentation issue is divided into two sub-problems [19]: a) space-insertion, b) space-deletion (a toy illustration of a longest-match baseline for the space-deletion case is sketched at the end of this section). This work emphasizes the orthographic word “OW” instead of the “word”; as an example consider again the word “ﺧﻮب ﺻﻮرت” (khoob surat, beautiful): orthographically it is a single word based on two lexical words.
3.2.2 Linguistic Aspects
The following aspects are given, particularly, in comparison with the English language:
a) Variability in Morphology. In English there are mostly hard and fast inflectional and derivational rules applied to morphemes. For example, the suffixes “s” or “es” are mostly used to make plurals, as in “chair + s” and “dish + es”. Exceptions exist but are very rare and can easily be handled. On the other hand, Urdu morphology is very complex. Inflection, derivation, compounding, and duplication are very common phenomena. The plural can be indicated in a number of ways. For example, in the sentence “ﺑﮩﺖ ﺳﺎرے ﭘﻬﻮل-” (bohat sare phool, A lot of flowers.) no plural suffix is used, and in “ﭘﻬﻮﻟﻮں ﮐﮯ رﻧﮓ-” (pholoon kay rang, Colors of flowers.) the plural suffix “وں” (on) is used without any replacement. But in the sentence “ﭘﻮدے ﺳﺒﺰﮨﻴﮟ-” (poday sabz hain, Plants are green.) the plural suffix is “ے” (ay) and it replaces “ا” (aa).
b) Flexibility in vocabulary. Urdu has absorbed material from several languages. It has words from languages like Arabic, Persian, Hindi, English, Turkish and even more. The absorption power of Urdu is exceptional and it enhances the beauty of the language but, unfortunately, it also makes our work more challenging. Code switching, i.e., using multiple languages concurrently, is also very common in Urdu writing. For example, “ﮐﺮ دو mobile off” (Mobile off kar do, Turn off the mobile) means “switch off the mobile”.
c) Case markers. [21] identifies eleven categories of POS tags in the Urdu language: noun, verb, adjective, adverb, numeral, postposition, conjunction, pronoun, auxiliaries, case markers, and “ﺣﺮف” (harf). Among all of them, the case markers are very dissimilar in nature for Urdu in comparison with other languages because they are
1 We first write the Urdu word/sentence in Persio-Arabic script based Nastalique writing style. Pronunciation is enclosed in parentheses, followed by the English translation.
written with a space. Therefore, each is considered a distinct POS-tagged word. This distinction adds to the ambiguity of the words with which they are semantically associated. There are four case markers: ergative “ﻧﮯ” (ne), instrumentive “ﺳﮯ” (se), genitive “ﮐﺎ” (ka) and dative/accusative “ﮐﻮ” (ko).
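Returning to the word segmentation concern of Section 3.2.1, the sketch below shows a toy longest-match merger for the space-deletion sub-problem, in which space-separated pieces that together form one orthographic word are merged. The (romanized) word list and the maximum span are hypothetical stand-ins for a real Urdu lexicon; this is only a baseline illustration, not the segmentation approach of [19].

```python
# Toy longest-match merger for the space-deletion case: space-separated pieces
# that together form one orthographic word (OW) are merged. The romanized word
# list and the span limit are hypothetical; this is not the method of [19].
KNOWN_OWS = {"khoob surat", "dast geer"}  # hypothetical OW entries
MAX_PIECES = 3                            # longest OW considered (assumption)

def merge_orthographic_words(tokens):
    merged, i = [], 0
    while i < len(tokens):
        # try the longest candidate first, fall back to the single token
        for span in range(min(MAX_PIECES, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + span])
            if span == 1 or candidate.lower() in KNOWN_OWS:
                merged.append(candidate)
                i += span
                break
    return merged

print(merge_orthographic_words("yeh larki khoob surat hai".split()))
# ['yeh', 'larki', 'khoob surat', 'hai']
```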
4 SentiUnits
In an opinionated sentence, not all terms are subjective. Indeed, the sentimentality of a sentence depends only on some specific words or phrases. Consider the examples “Fatima is an adorable child.” and “Irtaza is such a nice boy.”; the underlined words are expressions made of one or more words which carry the sentiment information of the whole sentence. We recognize them as SentiUnits. We can judge only these units as the representatives of the whole sentence’s sentiment. They are identified by shallow-parsing-based chunking. We consider two types of SentiUnits:
• Single adjective phrases are made of an adjective head and possible modifiers, e.g., “ﺑﮩﺖ ﺧﻮش” (bohat khush, very happy), “زﻳﺎدﮦ ﺑﮩﺎدر” (zyada bhadur, more brave).
• Multiple adjective phrases comprise more than one adjective with a delimiter or a conjunction in between, e.g., “ﺑﮩﺖ ﭼﺎﻻﮎ اور ﻃﺎﻗﺘﻮر” (bohat chalak aur taqatwar, very clever and strong).
4.1 Attributes of SentiUnits
A SentiUnit can be described by the following attributes:
A) Adjectives (as head words). Conceptually, adjectives in Urdu can be divided into two types. The first type describes quantity and quality, e.g., “ﮐﻢ” (kam, less), “ﺑﺪﺗﺮﻳﻦ” (budtareen, worst), “زﻳﺎدﮦ” (ziyada, more). The second distinguishes one person from another, e.g., “ﺣﺴﻴﻦ” (haseen, pretty), “ﻓﻄﻴﻦ” (fateen, intelligent). Further, adjectives are categorized as marked, which can be inflected for number and gender, and unmarked, which are usually Persian loan words. Also, adjectives inflected from nouns remain unmarked [22]. For examples, see Table 1.

Table 1. Types of Urdu adjectives as marked and unmarked
Marked
  Male:    اﭼﻬﺎ ﮐﺎم (acha kaam, good work)
  Female:  اﭼﻬﺎ ﻗﻠﻢ (acha qalam, good pen)
  Number:  اﭼﻬﮯ ﺁم (achay aam, good mangoes)
Unmarked
  Persian loan:         ﺗﺎزﮦ (tazah, fresh)
  Inflected from noun:  دﻓﺘﺮﯼ دﻓﺘﺮ (daftary daftar, official office)
Attributive adjectives precede the noun they qualify. Arabic and Persian loan adjectives are used predicatively and appear in the form of phrases. See Table 2.
Table 2. Types of Urdu adjectives as attributive and predicative
Attributive (precede the noun they qualify): adjective ﻣﺰﻳﺪار (mazedar, tasty) + noun ﻣﺰﮦ (maza, taste)
Predicative (Persian and Arabic based): ﻣﻌﻠﻮم ﮨﻮﻧﺎ (maloom hona, to be known)
The postpositions “ﺳﮯ” (say), “ﺳﯽ” (si), “ﺳﺎ” (sa) and “واﻻ” (wala), “واﻟﯽ” (wali), “واﻳﮯ” (walay) are very frequently used with nouns to make adjectives. Examples are listed in Table 3, whereas Table 4 shows the derivation of adjectives from nouns.

Table 3. Use of postpositions with adjectives
Noun: ﭘﻬﻮل (phool, flower); with postposition si/sa/say: ﭘﻬﻮل ﺳﯽ (phool si, like a flower); with postposition wala/wali/walay: اوﭘﺮ واﻟﯽ (oopar wali, the upper one)
Noun: ﭼﺎﻧﺪ (chand, moon); with postposition si/sa/say: ﭼﺎﻧﺪ ﺳﺎ (chand sa, like the moon); with postposition wala/wali/walay: اﭼﻬﮯ واﻟﮯ (achay walay, the good ones)
Table 4. Derivation of adjectives from nouns
Noun ﺑﺮف (barf, ice) → adjective ﺑﺮﻓﻴﻼ (barf-eela, icy)
Noun درد (dard, pain) → adjective دردﻧﺎﮎ (dard-nak, painful)
Noun ﺑﻬﻮﮎ (bhook, hunger) → adjective ﺑﻬﻮﮐﺎ (bhooka, hungry)
B) Modifiers. These are classified as absolute, comparative and superlative:
a) Absolute. Simple adjectives without modifiers make absolute expressions, e.g., “ﻳہ ﻟﺒﺎس ﻣﮩﻨﮕﺎ ﮨﮯ-” (Yeh libaas mehnga hay, This dress is expensive.)
b) Comparative. There are two comparative modifiers: “ﺳﮯ” (say) and “ﺳﮯ ذﻳﺎدﮦ” (say zyadha), e.g., “ﻳہ ﻟﺒﺎس اس ﺳﮯ ﻣﮩﻨﮕﺎ ﮨﮯ-” (Yeh libaas us say mehnga hay, This dress is more expensive than that.) or “ﻳہ ﻟﺒﺎس اس ﺳﮯ زﻳﺎدﮦ ﻣﮩﻨﮕﺎ ﮨﮯ-” (Yeh libaas us say zyadah mehnga hay, This dress is more expensive than that.)
c) Superlative. For superlatives “ﺳﺐ ﺳﮯ” (sab say) and “ﺳﺐ ﻣﻴﮟ” (sab main) are used, e.g., “ﻳہ ﻟﺒﺎس ﺳﺐ ﺳﮯ ﻣﮩﻨﮕﺎ ﮨﮯ-” (Yeh libaas sab say mehnga hay, This dress is the most expensive.) and “ﻳہ ﻟﺒﺎس ﺳﺐ ﻣﻴﮟ ﻣﮩﻨﮕﺎ ﮨﮯ-” (Yeh libaas sab main mehnga hay, This dress is the most expensive.)
C) Orientation. Orientation describes the positivity or negativity of an expression, e.g., “اﭼﻬﺎ” (acha, good) has a positive orientation.
D) Intensity. This is the intensity of the orientation, e.g., “ﺑﮩﺘﺮ” (behtar, better).
E) Polarity. A polarity mark is attached to each lexicon entry to show its orientation.
F) Negations. Negations are polarity shifters, e.g., “ارﺗﻀﯽ اﭼﻬﺎ ﮨﮯ-” (Irtaza acha hay, Irtaza is nice.) is a positive sentence, but with the use of a negation its polarity shifts to negative, i.e., “ارﺗﻀﯽ اﭼﻬﺎ ﻧﮩﻴﮟ ﮨﮯ-” (Irtaza acha naheen hay, Irtaza is not nice.)
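As an illustration of how the attributes above might be carried by an implementation, the sketch below models a SentiUnit with its adjective head, modifiers, orientation, intensity and negation flag, and derives an effective polarity from them. The field names and numeric scales are our own illustrative choices, not the authors' data structures.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SentiUnit:
    """Illustrative container for the attributes of Section 4.1; field names
    and numeric scales are assumptions, not the authors' representation."""
    head: str                                           # adjective head, e.g. "acha" (good)
    modifiers: List[str] = field(default_factory=list)  # e.g. ["bohat"] (very)
    orientation: int = +1                               # +1 positive, -1 negative
    intensity: float = 1.0                              # strength contributed by modifiers
    negated: bool = False                               # a negation particle is attached

    def polarity(self) -> float:
        value = self.orientation * self.intensity
        return -value if self.negated else value

unit = SentiUnit(head="acha", modifiers=["bohat"], intensity=1.5, negated=True)
print(unit.polarity())  # -1.5: "bohat acha" under negation contributes negatively
```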
5 Methodology
The research work is divided into two main tasks:
• To create a sentiment-annotated lexicon for the inclusion of information about the subjectivity of a word/phrase in addition to its orthographic, phonological, syntactic, and morphological aspects.
• To build an appropriate classification model for the processing and classification of text in accordance with the inherent sentiments.
From the approaches discussed in Section 2, we use a lexicon-based approach. Fig. 1 shows the context model, in which the classification system represents the process, whereas the sentiment-annotated lexicon holds the sentiment orientation of each entry.
[Figure: block diagram with the elements Website, Urdu text (review), Classification System, Analysis Results, POS-tagged words/phrases, polarity-tagged words/phrases, and Sentiment-annotated Lexicon.]
Fig. 1. Context model of the system
5.1 Sentiment-Annotated Lexicon
The fundamental part of our research is the construction of the sentiment-annotated lexicon. A SentiUnit is classified on the basis of orientation and intensity. Orientation is predicted by the marked polarity and intensity is calculated by analyzing the modifiers. For example, the intensity of “ﺑﮩﺖ ذﻳﺎدﮦ” (bohat zyadah, much more) is greater than that of “ذﻳﺎدﮦ” (zyadah, more). The lexicon construction tasks are divided as follows:
• Identify the sentiment-oriented words/phrases in the Urdu language.
• Identify morphological rules, e.g., inflection or derivation.
• Identify grammatical rules, e.g., the use of modifiers.
• Identify semantics between different entries, e.g., synonyms, antonyms, and cross-references.
• Identify and annotate polarities to the entries.
• Identify modifiers and annotate intensities.
• Differentiate between multiple POS tags for the same entries.
• Construct the lexicon.
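One possible serialization of such a lexicon is a tab-separated file in which each entry carries its part of speech, polarity and intensity, keyed by (surface form, POS) so that entries with multiple POS tags stay distinct. The file layout, field names and the example row are assumptions made for illustration only.

```python
import csv

def load_senti_lexicon(path):
    """Load a hypothetical tab-separated sentiment lexicon with the columns:
    surface form, POS tag, polarity (+1/-1), intensity (float)."""
    lexicon = {}
    with open(path, encoding="utf-8") as handle:
        for surface, pos, polarity, intensity in csv.reader(handle, delimiter="\t"):
            lexicon[(surface, pos)] = {"polarity": int(polarity),
                                       "intensity": float(intensity)}
    return lexicon

# Example row of such a file (romanized for readability):
#   acha<TAB>ADJ<TAB>+1<TAB>1.0
```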
5.2 Sentiment Classification
A high-level model of the classification system is presented in Fig. 2.
[Figure: preprocessing (normalization, segmentation) maps the sentence, a sequence of words and other symbols W = w1, w2, …, wn, to a sequence of orthographic words OW = ow1, ow2, …, own; shallow parsing (POS tagging, phrase chunking, role tagging) yields a sequence of tags T = t1, t2, …, tn, a sequence of phrases, and finally SentiUnits; classification compares the SentiUnits with the lexicon and the classifier assigns polarities and outputs the sentiment of the sentence.]
Fig. 2. Classification process of the system
The process of sentiment analysis is composed of three phases (see Fig. 2):
Preprocessing. This phase prepares the text for sentiment analysis. It is usually based on the removal of punctuation and the stripping of HTML tags [9]. However, due to Urdu’s orthographic characteristics (i.e., optional use of diacritics and ambiguity in word boundaries) [19], we add two more tasks:
Diacritic omission. In Urdu, diacritics are optional and their use is highly author dependent. So, as a regular practice, they are removed during text normalization [19].
Word segmentation. As mentioned in Section 3.2.1, Urdu orthography is context sensitive and word boundaries are not always identified by a space, as in English. So the outputs of the preprocessing phase are orthographic words which may or may not contain a space.
SentiUnit extraction. After preprocessing, shallow parsing is applied to identify SentiUnits. At the same time, negation is considered, because negation can altogether change the polarity of the sentence.
Classification. The extracted SentiUnits are compared with the lexicon and their polarities are calculated for classification as positive or negative. After calculating each sentence polarity s, a total post polarity p is calculated by adding all sentence polarities, i.e., p = s1 + s2 + s3 + … + sn.
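A compact sketch of the normalization and aggregation steps is given below: combining marks are stripped during normalization (Urdu diacritics are combining characters in Unicode), and the post polarity p is the sum of the sentence polarities, classified by its sign. The helper names are hypothetical, and sentence_polarity stands for the SentiUnit extraction and lexicon comparison described above.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Drop combining marks (e.g. Urdu diacritics) during text normalization;
    treating every combining mark as optional is a simplification."""
    return "".join(ch for ch in unicodedata.normalize("NFD", text)
                   if not unicodedata.combining(ch))

def classify_post(sentences, sentence_polarity):
    """sentence_polarity(s) is assumed to return the summed SentiUnit polarity
    s_i of one sentence; the post polarity is p = s_1 + s_2 + ... + s_n."""
    p = sum(sentence_polarity(strip_diacritics(s)) for s in sentences)
    return "positive" if p > 0 else "negative"
```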
Fig. 3 shows the execution of the analyzer on a single sentence: “ﮔﺎڑﯼ ﮐﺎ ﻳہ ﻣﺎڈل ﺧﻮب ﺻﻮرت ﻧﮩﻴﮟ-” (Gari ka yeh model khoobsurat naheen hay, This model of the car is not beautiful).
[Figure: the sentence passes through preprocessing, shallow parsing, and classification; result: this is a negative comment.]
Fig. 3. Example execution of the analyzer
5.3 Validation
Due to the lack of a publicly accessible corpus of Urdu text for sentiment analysis, a sentiment corpus was collected and constructed from domains like movie and product reviews (electronic appliances from the main brands in Pakistan). We processed 753 reviews, of which 435 are movie and 318 are product reviews. Positive (361) and negative (392) documents are included across both categories. The experimental results are shown in Table 5. The identification and extraction of SentiUnits is quite practical. Furthermore, the sentiment-annotated lexicon can extend the same model. Despite the morphological complexity of the adjectives and postpositions used, we are hopeful to proceed along the same lines by including other parts of speech. Further, we made the following observations during the evaluation:
• The classification accuracy for SentiUnits with unmarked adjectives is about 75%, and for marked adjectives it is 71%.
• SentiUnits with adjectives made of postpositions combined with nouns cause errors and hence an improved algorithm is required.
• On the other hand, adjectives made from inflected nouns entail the best results, with an accuracy of 80-85%.
• The most frequent modifiers are “ذﻳﺎدﮦ” (zyadah, more) and “ﮐﻢ” (kam, less).
• Negations are less problematic in Urdu sentiment analysis, because they appear only in specific patterns.
Table 5. Experiment results from the sentiment corpora
Domain        Total number   Orientation   Number   Overall accuracy (%)
Movies C1     435            Positive      215      72%
                             Negative      220
Product C2    318            Positive      146      78%
                             Negative      172
6 Conclusions and Future Work
Despite the developments in sentiment analysis of English text, it is a fact that for the Urdu language this domain is still an open challenge. In this paper, we effectively identify the major concerns and explore possible solutions for the Urdu language. We also present a comprehensive overview of adjectives and their modifiers with respect to the task of SentiUnit extraction. In consequence, our approach can serve as a baseline for this issue. Among a number of possible future works, the most important is the extension of the lexicon. Adding more adjectives and modifiers, as well as other parts of speech, can drive this task. Moreover, for achieving better results the SentiUnits should be annotated with their respective targets.
References
1. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval 2(1-2), 1–135 (2008)
2. Bautin, M., Vijayarenu, L., Skiena, S.: International sentiment analysis for news and blogs. In: International Conference on Weblogs and Social Media, ICWSM (2008)
3. Hatzivassiloglou, V., Wiebe, J.: Effects of Adjective Orientation and Gradability on Sentence Subjectivity. In: 18th International Conference on Computational Linguistics, New Brunswick, NJ (2000)
4. Turney, P.: Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews. In: ACL, Philadelphia, PA, pp. 417–424 (July 2002)
5. Riaz, K.: Challenges in Urdu Stemming. Future Directions in Information Access, Glasgow (August 2007)
6. Akram, Q., Naseer, A., Hussain, S.: Assas-band, an Affix-Exception-List Based Urdu Stemmer. In: 7th Workshop on Asian Language Resources, IJCNLP 2009, Singapore (2009)
7. Melville, P., Gryc, W., Lawrence, R.D.: Sentiment analysis of blogs by combining lexical knowledge with text classification. In: Conference on Knowledge Discovery and Data Mining (2009)
8. Bloom, K., Argamon, S.: Unsupervised Extraction of Appraisal Expressions. In: Farzindar, A., Kešelj, V. (eds.) Canadian AI 2010. LNCS (LNAI), vol. 6085, pp. 290–294. Springer, Heidelberg (2010)
9. Annett, M., Kondrak, G.: A comparison of sentiment analysis techniques: Polarizing movie blogs. In: Bergler, S. (ed.) Canadian AI 2008. LNCS (LNAI), vol. 5032, pp. 25–35. Springer, Heidelberg (2008)
10. Bloom, K., Argamon, S.: Automated learning of appraisal extraction patterns. In: Gries, S.T., Wulff, S., Davies, M. (eds.) Corpus Linguistic Applications: Current Studies, New Directions. Rodopi, Amsterdam (2009)
11. Andreevskaia, A., Bergler, S.: Mining WordNet for fuzzy sentiment: Sentiment tag extraction from WordNet glosses. In: EACL 2006, Trento, Italy (2006)
12. Mansour, Y., Mohri, M., Rostamizadeh, A.: Multiple source adaptation and the Renyi divergence. In: Uncertainty in Artificial Intelligence, UAI (2009)
13. Tan, S., Cheng, Z., Wang, Y., Xu, H.: Adapting Naive Bayes to Domain Adaptation for Sentiment Analysis. In: Advances in Information Retrieval, vol. 5478, pp. 337–349 (2009)
14. Bansal, M., Cardie, C., Lee, L.: The power of negative thinking: Exploring label disagreement in the min-cut classification framework. In: International Conference on Computational Linguistics, COLING (2008)
15. Hu, M., Liu, B.: Mining and summarizing customer reviews. In: Conference on Human Language Technology and Empirical Methods in Natural Language Processing (2005)
16. Whitelaw, C., Garg, N., Argamon, S.: Using appraisal taxonomies for sentiment analysis. In: SIGIR (2005)
17. Na, J.-C., Sui, H., Khoo, C., Chan, S., Zhou, Y.: Effectiveness of simple linguistic processing in automatic sentiment classification of product reviews. In: Conference of the International Society of Knowledge Organization (ISKO), pp. 49–54 (2004)
18. Muaz, A., Khan, A.: The morphosyntactic behavior of ‘Wala’ in the Urdu Language. In: 28th Annual Meeting of the South Asian Language Analysis Roundtable, SALA 2009, University of North Texas, US (2009)
19. Durrani, N., Hussain, S.: Urdu Word Segmentation. In: 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT 2010), Los Angeles, US (2010)
20. Riaz, K.: Stop Word Identification in Urdu. In: Conference of Language and Technology, Bara Gali, Pakistan (August 2007)
21. Ijaz, M., Hussain, S.: Corpus based Urdu Lexicon Development. In: Conference on Language Technology (CLT 2007), University of Peshawar, Pakistan (2007)
22. Schmidt, R.: Urdu: An Essential Grammar. Routledge Publishing, New York (2000)
A Semantic Oriented Approach to Textual Entailment Using WordNet-Based Measures
Julio J. Castillo1,2
1 National University of Cordoba-FaMAF, Cordoba, Argentina
2 National Technological University-FRC, Cordoba, Argentina
[email protected]
Abstract. In this paper, we present a Recognizing Textual Entailment system which uses semantic similarity metrics at the sentence level, using only WordNet as the source of knowledge. We show how the widely used WordNet-based semantic measures can be generalized to build sentence-level semantic metrics for use in RTE. We also provide an analysis of the efficiency of these metrics and draw some conclusions about their practical utility in recognizing textual entailment. We also show that the proposed method to extend word semantic measures can be used to build an average-score system that only uses semantic information from WordNet. Keywords: Recognizing Textual Entailment, WordNet, Semantic Similarity.
1 Introduction
The objective of the Recognizing Textual Entailment (RTE) task [1], [2] is determining whether or not the meaning of the “hypothesis” (H) can be inferred from a “text” (T). Thus, we say that “T entails H” if a human reading T would infer that H is most likely true. The 2-way RTE task consists of deciding whether T entails H, in which case the pair is marked as “Entailment”; otherwise the pair is marked as “No Entailment”. This definition of entailment is based on (and assumes) common human understanding of language as well as common background knowledge, as in the following example (pair id=33, RTE3 dataset). T=“As leaders gather in Argentina ahead of this weekends regional talks, Hugo Chávez, Venezuela's populist president, is using an energy windfall to win friends and promote his vision of 21st-century socialism.” H=“Chávez is a follower of socialism.” Recently the RTE4 Challenge changed to a 3-way task that consists in distinguishing among “Entailment”, “Contradiction” and “Unknown”, the latter when there is no information to accept or reject the hypothesis. In this paper, we address the RTE problem by using a machine learning approach. All features are WordNet-based, with the aim of measuring the benefit of WordNet as a knowledge resource for the RTE task. Thus, we tested the classifiers most widely used by other researchers, and showed how the training set can impact their performance.
Several authors [3, 4, 5], among others, have used WordNet in the textual entailment task. In [6] the author showed that some basic WordNet-based measures seem to be enough to build an average-score RTE system. We extend these results with an analysis of the WordNet-based measures most widely used to compute the semantic similarity between two concepts. The WordNet::Similarity package written by Ted Pedersen et al. [7], [8] provides a very useful tool for computing WordNet-based semantic similarity metrics. We also provide some efficiency considerations for using these semantic measures in practice in the RTE task. Additionally, we use the Pirro & Seco similarity measure, which is not contained in the WordNet::Similarity package, and a measure to compute sentence similarity used in [6]. We selected these metrics for two main reasons: first, because of their well-known performance in computing semantic similarity using WordNet, and second, because they are the most widely used measures over WordNet in natural language applications. The sentence-level similarity measures are time consuming, and therefore we must take into account which measures to implement, looking for a balance between performance and computational cost. Thus, in this paper we provide an analysis of efficiency considerations in order to deal with this issue. This paper continues with a system overview in Section 2. Section 3 shows how word-to-word WordNet-based semantic measures are extended to sentence-level measures. Section 4 presents efficiency considerations when using these sentence metrics in a real RTE system. Section 5 provides an experimental evaluation and a discussion of the results. Finally, in Section 6 some conclusions are provided.
2 System Overview
The system is based on a machine learning approach for RTE. Using a machine learning approach, we experimented with different classifiers in order to classify the RTE4 and RTE5 test pairs (provided by NIST in the TAC Challenges). To deal with RTE4-5 in a 2-way task, we needed to convert this corpus into only two classes: Yes (Entailment) and No (No Entailment). For this purpose, both the Contradiction and Unknown classes were taken as class No. The system produces feature vectors for the datasets RTE3, RTE4 and RTE5. Weka [18] is used to train classifiers on these feature vectors. We experimented with the following four machine learning algorithms: Support Vector Machine (SVM), AdaBoost (AB), Multilayer Perceptron (MLP), and Decision Trees (DT). Thus, we make use of classifiers used by other researchers in order to provide a common framework of evaluation. Additionally, we generated the following training sets: RTE3-4C and RTE4-4C, as proposed by Castillo [19], in order to extend the RTE data sets by using machine translation engines. Thus, we analyze ten WordNet-based measures with the aim of obtaining the maximum similarity between two concepts. The measures used are: Resnik [9], Lin [10], Jiang & Conrath [11], Pirro & Seco [12], Wu & Palmer [13], Path Metric, Leacock & Chodorow [14], Hirst & St-Onge [15], Adapted Lesk [16], and a sentence-level semantic similarity measure [17], which we name SenSim in this paper.
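For readers who wish to reproduce the word-to-word scores, the sketch below computes a few of the listed measures (path, Wu & Palmer, Resnik, Lin) with NLTK's WordNet interface, taking the maximum over the noun and verb synset pairs of two words. NLTK is used here as an illustrative stand-in; the cited experiments rely on the WordNet::Similarity package [7], [8] and may differ in details.

```python
# Illustrative recomputation of some word-to-word measures with NLTK,
# a stand-in for the WordNet::Similarity package referenced in the text.
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

BROWN_IC = wordnet_ic.ic("ic-brown.dat")  # information content from the Brown corpus

def max_word_similarity(word1, word2, metric="path"):
    """Maximum similarity over all noun/verb synset pairs of the two words."""
    best = 0.0
    for s in wn.synsets(word1):
        for t in wn.synsets(word2):
            if s.pos() != t.pos() or s.pos() not in ("n", "v"):
                continue
            try:
                if metric == "path":
                    value = s.path_similarity(t)
                elif metric == "wup":
                    value = s.wup_similarity(t)
                elif metric == "res":
                    value = s.res_similarity(t, BROWN_IC)
                elif metric == "lin":
                    value = s.lin_similarity(t, BROWN_IC)
                else:
                    raise ValueError(metric)
            except Exception:  # some pairs lack a common subsumer or IC value
                value = None
            if value is not None:
                best = max(best, value)
    return best

print(max_word_similarity("car", "wheel", metric="lin"))
```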
3 Semantic Metrics to Sentence Level WordNet-Based
In selecting measures to analyze and compare, we focused on those that use WordNet as their knowledge source and that allow an efficient implementation, in a given programming language, to address the textual entailment problem. Several measures have been previously developed with the aim of quantifying the strength of the semantic relationship. Some of them are distance-oriented measures computed over semantic networks, whereas others are based on probability models learned from large text collections; some are based on information-theoretic approaches which exploit the notion of Information Content [20], and there are also hybrid approaches [21]. In this context, we focus only on WordNet semantic measures (ranging over semantic networks and information theory), because regarding the RTE task we observed that WordNet is one of the main semantic resources used. On the other hand, Li et al. [20] recently proposed a procedure to compute semantic similarity between short sentences. Here, we propose a computationally efficient procedure to compute sentence similarity applied to the pairs of the RTE data sets. First, we model the semantic similarity of two texts (T, H) as a function of the semantic similarity of the constituent words of both phrases. In order to reach this objective, we propose a text-to-text similarity measure which is based on word-to-word similarity. We expect that combining word-to-word similarity metrics is a good indicator of text-to-text similarity. The final value of this measure depends on the function which is chosen to compute partial similarities. Consequently, we are interested in assessing whether using WordNet semantic similarity measures can help in the textual entailment recognition task or not. We experimented with the ten WordNet-based measures stated in Section 2 in order to build a semantic sentence measure.
3.1 Extension to Sentence Similarity WordNet-Based
WordNet is used to calculate the semantic similarity between a T (Text) and an H (Hypothesis). The following procedure is applied:
Step 1. Perform WSD (Word Sense Disambiguation) using the Lesk algorithm [22], based on WordNet definitions (glosses).
Step 2. A semantic similarity matrix between words in T and H is defined.
Step 3. A function Fsim is applied to T and H, where the function Fsim can be one of the following nine functions over concepts s and t:
Function 1. The Resnik [9] similarity metric measures the information content (IC) of the two WordNet concepts s and t by using the LCS:
RES(s,t) = IC(LCS(s,t)), and IC is defined as:
IC(w) = − log P(w)
Where: P(w) is the probability of finding the word w in a large corpus in English, and LCS(s,t) is the least common subsumer of s and t. Function 2. The Lin [10] similarity metric is based on Resnik’s measure of similarity, and adds a normalization factor consisting of the information content of the two input concepts s and t:
LIN(s,t) = 2 ∗ IC(LCS(s,t)) / (IC(s) + IC(t))
Function 3. Another metric considered is Jiang & Conrath [11] which is defined as:
JICO(s,t) = 1 / (IC(s) + IC(t) − 2 ∗ IC(LCS(s,t)))
The word similarity measures are normalized to [0–1]. The normalization is done by dividing the similarity score by the maximum score of that measure. Function 4. The Pirro & Seco [12] similarity metric is also based on Resnik’s measure of similarity, but it is defined by using information theory and solving a problem with Resnik’s measure when computing the similarity between identical concepts, yielding the information content value of their most specific common abstraction that subsumes two different concepts (msca). In practice, msca gives the most specific common abstraction value for the two given synsets, where the synsets are represented as Lucene documents. So, the Pirro & Seco (PISE) similarity metric is the following:
PISE(s,t) = 3 ∗ IC(msca(s,t)) − IC(s) − IC(t),  if s ≠ t
PISE(s,t) = 1,  if s = t
Function 5. The Wu & Palmer [13] measure is based on path length between concepts:
WUPA(C1(s), C2(t)) = 2 ∗ N3 / (N1 + N2 + 2 ∗ N3)
Where: C1 and C2 are the synsets to which s and t belong, respectively. C3 is the least common superconcept of C1 and C2. N1 is the number of nodes on the path from C1 to C3. N2 is the number of nodes on the path from C2 to C3. N3 is the number of nodes on the path from C3 to the root. Function 6. The Path metric is the reciprocal of the length of the shortest path between two synsets. Note that we count the 'nodes' (synsets) in the path, not the links. The allowed POS types are nouns and verbs. It is an easy and fast method of getting similarity by applying a notion of 'semantic relatedness' via node counting, and is defined as:
PA(s,t) = 1 / Min_i(PathLength_i(s,t)). Where: PathLength_i(s,t) gives the length of the i-th path between s and t. Function 7. The Leacock & Chodorow [14] metric finds the path length between s and t in the “is-a” hierarchy of WordNet. In order to obtain the relatedness of the two concepts, the path length is divided by the depth of the hierarchy (D) in which they reside. Our implementation applies the basic version of this measure by using “fake roots”.
LECH(C1(s), C2(t)) = − log( Min_i(PathLength_i(s,t)) / (2 ∗ D) )
Where: D is the maximum depth of the taxonomy (considering only nouns and verbs). Function 8. The Hirst & St-Onge [15] metric is a similarity measure based on the weight of the path connecting two concepts. A path is defined as a sequence of links between two synsets. Thus, two concepts are semantically close if their WordNet synsets are connected by a path that is not too long and that “does not change direction too often” [7]. The weight (strength of the relationship) of a path between two concepts s and t is given by:
HIST(s,t) = C − PathLength(s,t) − k ∗ (number of changes of direction). Where: C and k are constants; if no such path exists, HIST(s,t) is zero and the synsets are deemed unrelated. Function 9. The Adapted Lesk [16] metric uses the text of the gloss (short written definition) as a unique representation of the underlying concept. This measure assigns relatedness by finding and scoring overlaps between the glosses of the two concepts, and also by using the concepts that are directly linked to them according to WordNet. First, the gloss overlap lengths are squared. Second, a gloss overlap must not consist of all function words. Finally, the measure is defined by: AD(s,t) = “sum of the squared gloss overlap lengths for all relations in WordNet for word s, sense X, and word t, sense Y”. Step 4. Finally, the string similarity between two lists of words is reduced to the problem of bipartite graph matching, which is solved by using the Hungarian algorithm over this bipartite graph. Then, we find the assignment that maximizes the sum of the ratings of each token. Note that each graph node is a token/word of the list. At the end, the final score is calculated by: finalscore =
( Σ_{s∈T, t∈H} opt(Fsim(s,t)) ) / Max(Length(T), Length(H))
Where: opt(F) is the optimal assignment in the graph, Length(T) is the number of tokens in T, Length(H) is the number of tokens in H, and Fsim ∈ {RES, LIN, JICO, PISE, WUPA, PA, LECH, HIST, AD}. Finally, note that the partial influence of each individual similarity is reflected in the overall similarity.
3.2 Using SemSim: Sentence Metric WordNet-Based
To build our feature vector (Section 5) we use an additional metric, following [17], to compute sentence-level similarity. This metric, which we call “SemSim”, is used to calculate the semantic similarity between a T and an H. The following procedure is applied:
Step 1. Word sense disambiguation using the Lesk algorithm, based on WordNet definitions. Step 2. A semantic similarity matrix between words in T and H is defined. Words are used only in synonym and hypernym relationships. The breadth-first search algorithm is used over these tokens; similarity is calculated using two factors: the length of the path and the orientation of the path. The semantic similarity between two words/concepts s and t is computed as: Sim(s,t) = 2 × Depth(LCS(s,t)) / (Depth(s) + Depth(t)), where Depth(s) is the shortest distance from the root node to the current node. Step 3. To obtain the final score, the matching average between two sentences T and H is calculated as follows:
SemSim(T,H) = MatchingAverage(T,H) = 2 × Match(T,H) / (Length(T) + Length(H))
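The sentence-level extension of Section 3.1 (a word-similarity matrix followed by an optimal bipartite assignment, normalized by the longer sentence) can be sketched as follows; SciPy's linear_sum_assignment plays the role of the Hungarian algorithm, word_similarity stands for any of the word-to-word functions Fsim above, and the word sense disambiguation of Step 1 is omitted for brevity.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def sentence_similarity(t_tokens, h_tokens, word_similarity):
    """finalscore = sum of the optimally assigned word-pair similarities,
    divided by max(Length(T), Length(H)); WSD (Step 1) is omitted here."""
    matrix = np.array([[word_similarity(s, t) for t in h_tokens]
                       for s in t_tokens])
    rows, cols = linear_sum_assignment(-matrix)  # maximize the total similarity
    return matrix[rows, cols].sum() / max(len(t_tokens), len(h_tokens))
```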
4 Efficiency Considerations
In this section, we analyze the performance of the different semantic similarity measures with the aim of determining those that best fit the recognizing textual entailment task. In order to quantify the processing time of each of these measures, we carried out five experiments measuring the semantic similarity of two texts following the method of Section 3. We took five randomly selected pairs from the RTE3, RTE4 and RTE5 data sets, and in all cases the results were the same, which allowed us to build Table 1. All measures analyzed in the previous section are computationally costly, but some are faster than others. In the following table we show a list sorted in ascending order by time (i.e., the first is the fastest). The measures that do not show a significant difference in performance are placed in the same row. The table shows a ranking of the speed of the measures when computing the semantic similarity between a “T” and “H” pair with the proposed method of Section 3.

Table 1. Ranking of speed of the measures
Ranking   Measure
1         Path
2         Lin, Resnik, Jiang & Conrath, Leacock & Chodorow, Wu & Palmer, SenSim
3         Pirro & Seco
4         Hirst & St-Onge
5         Adapted Lesk Tanimoto No Hyponyms
6         Adapted Lesk Tanimoto
7         Adapted Lesk
In order to compute the last four metrics of the table, we adapted a Java implementation of WordNet::Similarity1.
http://www.cogs.susx.ac.uk/users/drh21/
To compute the Adapted Lesk measure, we use the same scoring mechanism as described in Banerjee & Pedersen [16], but using all the related glosses available and not just the direct relations. Therefore, this measure is slow compared to all other measures, mainly because matching is performed against all possible overlaps in glosses, which is a computationally expensive process. The Adapted Lesk Tanimoto is a modification of the Adapted Lesk measure which gets the glosses of all synsets that have a direct WordNet relation with each of the input senses, and then gets the hyponyms of a hypernym of a sense. Then, the Jaccard-Tanimoto coefficient is calculated over two vectors containing the count of each word found among all the words in the gloss. The Adapted Lesk Tanimoto No Hyponyms is similar to the previous measure, but in this case we do not get the hyponyms of the hypernym relations of a given synset. As a conclusion, our experiments suggest that the Adapted Lesk-based measures are not convenient for the recognizing textual entailment task in terms of efficiency, because they are very slow, their contribution is not significant, and in practice they hurt more often than they helped. The Hirst & St-Onge measure is very greedy, as is the Adapted Lesk measure, because it also needs to look at all senses. For example, suppose that we compute only the word pair “car” and “wheel” of a given pair. The result is obtained after 2.32 minutes using a web service2 (provided by Ted Pedersen), and after 2.51 minutes using our system. Despite the fact that, in general, web services are slower in response time than a personal computer, we obtained a better response time from the web service than from our implementation. This may be due mainly to two factors: one, we are using a managed language on a Windows platform, which in general is slower than the equivalent on Linux systems; and two, our machine is a standard notebook (2 GHz processor and 4 GB RAM), which is not too fast. With other sentence pairs, we obtained similar times and results. The proposed method in this paper requires O(n²) comparisons, with n = max(length(T), length(H)). So, for example, to process the pair given in the introduction, the method requires approximately 7.36 hours on our standard notebook to finish. As a result, several hours are needed in order to process a typical pair (independently of the CPU speed) and therefore it cannot be useful in a real textual entailment application, following the method that we proposed in Section 3. Additionally, this process could be enhanced by multithreaded programming, or parallelized by using high-performance computing, but this is not the common situation for the other existing RTE systems. Thus, the proposed method is efficient with the measures Path, Lin, Resnik, Jiang & Conrath, Leacock & Chodorow, Wu & Palmer, SenSim, and Pirro & Seco, but not with Hirst & St-Onge and the Lesk-based measures. This seems to suggest that every application that uses the Hirst & St-Onge measure or Lesk-based measures at the sentence level would be highly inefficient, independently of the architecture of the RTE system, even more so because, in general, we are required to compute several similarity measures for a given pair in order to determine whether the entailment holds or not.
http://marimba.d.umn.edu/cgi-bin/similarity/similarity.cgi
The other measures, which are placed in rows 1, 2 and 3 of the table, are fast to compute, and their response time when comparing a pair of two words gives an average time of 17 seconds on our notebook, which is several orders of magnitude less than the Lesk-based measures. Finally, we note that the Pirro & Seco measure was the slowest of the measures chosen to build the feature vectors of our RTE system.
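As one of the speed-ups mentioned above, the quadratic comparison step can be parallelized; the minimal sketch below distributes the |T| × |H| word-pair comparisons of a single (T, H) pair over worker processes with Python's multiprocessing. The placeholder word_similarity is hypothetical and would be replaced by one of the Fsim measures.

```python
from itertools import product
from multiprocessing import Pool

def word_similarity(s, t):
    # Placeholder: substitute one of the word-to-word measures Fsim here.
    return 1.0 if s == t else 0.0

def pair_score(pair):
    return word_similarity(*pair)

def similarity_scores(t_tokens, h_tokens, workers=4):
    """Distribute the |T| x |H| word-pair comparisons over worker processes."""
    pairs = list(product(t_tokens, h_tokens))
    with Pool(workers) as pool:
        return pool.map(pair_score, pairs)

if __name__ == "__main__":
    print(similarity_scores("a b c".split(), "a c d".split()))
```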
5 Experimental Evaluation
First, we assess the system on predicting RTE4 in the two-way and three-way tasks, showing the results in Tables 2 and 3. Then RTE5 is taken as the test set in the two-way and three-way classification tasks, and the results are shown in Tables 4 and 5. Afterwards, a feature analysis is performed by switching off some sets of features and assessing their impact on the overall accuracy. We generated a feature vector for every pair of both the training and test sets. The feature vector is composed of the following eight components (as defined in Section 3): FRES, FLIN, FJICO, FPISE, FWUPA, FPA, FLECH, and SemSim. These components are chosen taking into account the conclusions of Section 4 about the efficiency of computing WordNet-based semantic metrics. These are eight computationally efficient sentence-level WordNet-based semantic similarity measures. In this section, we used the algorithm proposed by Castillo [19] to generate additional training data, in other words to “expand a data set”, starting from RTE3 and RTE4 and following a double translation process (dtp). A double translation process can be defined as the process of starting with an S (string in English), translating it to a foreign language F(S), for example Spanish, and finally back again to the English source language F-1(S). We chose Spanish as the intermediate language and Microsoft Bing Translator as the only MT (Machine Translation) system in this process. It was built on the idea of providing a tool to increase the corpus size with the aim of acquiring more semantic variability. The augmented corpus is denoted RTE3-4C and has the following composition in the 3-way task: 340 Contradiction pairs, 1520 Yes pairs, and 1114 Unknown pairs. In the case of the RTE4-4C data set, it has the following composition: 546 Contradiction pairs, 1812 Entailment pairs, and 1272 Unknown pairs. We performed experiments with the following combinations of datasets: RTE3, RTE3-4C, RTE4, RTE4-4C, RTE3-4C+RTE4-4C and RTE5, to deal with the 2-way and 3-way classification tasks. The sign “+” represents the union operation on sets, and “4C” means “four combinations” and denotes that the dataset was generated by using the algorithm to expand datasets [19], using only one translator engine. The RTE4 and RTE5 test sets were converted to 2-way, taking Contradiction and Unknown pairs as No (No Entailment), in order to assess the system in the 2-way task. Tables 2 and 3 show the accuracy when predicting the RTE4 test set, in the 2-way and 3-way tasks respectively. The training sets used to predict the RTE4 test set in both tasks were RTE3, RTE3-4C, RTE3-4C+RTE4-4C and RTE5. The data sets RTE3, RTE3-4C, RTE4, RTE4-4C, and RTE3-4C+RTE4-4C are taken as training sets when predicting RTE5 in both classification tasks, as shown in Tables 4 and 5.
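The experimental pipeline (an eight-dimensional feature vector per T–H pair fed to several classifiers) can be illustrated as follows; scikit-learn is used here merely as an analogue of the Weka setup described above, and the measures argument stands for the eight sentence-level functions of Section 3.

```python
# Illustrative analogue of the Weka-based setup, using scikit-learn instead.
import numpy as np
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

def build_features(text, hypothesis, measures):
    """measures: the eight sentence-level functions FRES, ..., SemSim."""
    return [m(text, hypothesis) for m in measures]

def train_and_evaluate(train_pairs, y_train, test_pairs, y_test, measures):
    X_train = np.array([build_features(t, h, measures) for t, h in train_pairs])
    X_test = np.array([build_features(t, h, measures) for t, h in test_pairs])
    for name, clf in [("SVM", SVC()),
                      ("MLP", MLPClassifier(max_iter=500)),
                      ("DT (J48-like)", DecisionTreeClassifier())]:
        clf.fit(X_train, y_train)
        print(name, "accuracy:", clf.score(X_test, y_test))
```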
Table 2. Results obtained taking RTE4 as test set in the two-way task
Training set          MLP Classifier   SVM Classifier   ADTree Classifier   Tree Classifier (J48)
RTE3                  56.72%           57.24%           55.39%              55.61%
RTE3-4C               58.3%            57.7%            56.7%               56.8%
RTE5                  57%              57.2%            53.9%               51%
Baseline              50%              50%              50%                 50%

Table 3. Results obtained taking RTE4 as test set in the three-way task
Training set          MLP Classifier   SVM Classifier   ADTree Classifier   Tree Classifier (J48)
RTE3                  53.6%            50.8%            50%                 52.3%
RTE3-4C               54.2%            52.7%            50%                 44.8%
RTE5                  52.1%            50.7%            50%                 51.7%
Baseline              50%              50%              50%                 50%

Table 4. Results obtained taking RTE5 as test set in the two-way task
Training set          MLP Classifier   SVM Classifier   ADTree Classifier   Tree Classifier (J48)
RTE3                  55.83%           53.83%           50.33%              50.16%
RTE3-4C               56.83%           54.17%           51.67%              50.33%
RTE4                  54.17%           53.83%           52.5%               52.5%
RTE4-4C               56.34%           54.5%            53.17%              54.5%
RTE3-4C + RTE4-4C     57.33%           58.5%            54.5%               54.5%
Baseline              50%              50%              50%                 50%

Table 5. Results obtained taking RTE5 as test set in the three-way task
Training set          MLP Classifier   SVM Classifier   ADTree Classifier   Tree Classifier (J48)
RTE3                  53.5%            50%              50%                 51%
RTE3-4C               53.67%           51.6%            50%                 51%
RTE4                  52.83%           50%              50%                 48.33%
RTE4-4C               52.5%            50%              50%                 47.83%
RTE3-4C + RTE4-4C     53.67%           51.6%            50%                 41.5%
Baseline              50%              50%              50%                 50%
Tables 2 and 3 show that in the 2-way task the expanded training set RTE3-4C always yielded better results than the original training sets RTE3 and RTE4. We note that the highest accuracy was obtained with the Multilayer Perceptron (Tables 2, 3, and 5) as the learning algorithm, using RTE3-4C as the training set in the cases of Tables 2 and 3, and using RTE3-4C + RTE4-4C in the case of Table 5. In the TAC RTE4 Challenge the best score for the 2-way task was 74.6% and the worst was 49.7%, with an average score of 58%, which is slightly below our accuracy of 58.3%, obtained with RTE3-4C as the training set and MLP as the classifier, although the difference is not statistically significant. In the 3-way task the best accuracy
was 68.5% and the lowest was 30.7%. The 3-way task is quite challenging, as the average score was 50.65%. Table 3 shows that using MLP as the learning algorithm and RTE3-4C as training data outperformed the average score by 3.55% accuracy. Indeed, we outperformed the average score with the learning algorithms MLP and SVM, regardless of the training set employed. This seems to suggest that the semantic features used are enough to build an RTE system with average accuracy. This claim is also supported by the experiments reported in Tables 4 and 5. On the other hand, in the TAC RTE5 Challenge the best and worst scores were 73.5% and 50%, with a median score of 61% for the 2-way task. The scores for the 3-way task were 68% (highest), 43% (lowest) and 52% (average). On predicting RTE5, our best accuracy in the 2-way task was 58.5%, which is slightly below the average score. In the 3-way task, we obtained 53.67% accuracy, which is slightly above the average RTE system, although the difference is not statistically significant. Tables 4 and 5 also show the accuracy when predicting RTE5 in the 2-way and 3-way tasks. For the 2-way task, we note in Table 4 that the expanded datasets RTE3-4C and RTE4-4C always yielded better results than the original datasets RTE3 and RTE4. Furthermore, using the dataset RTE3-4C+RTE4-4C, a statistically significant difference is found with the SVM classifier compared to the accuracy yielded by these training sets in isolation; this seems to suggest that by adding more data we gain real value. In Table 5, we see that in the 3-way task, using the RTE3-4C training set yields better results than the RTE3 training set. However, using the RTE4-4C training set, a slight decrease in accuracy is observed, likely due to the unbalanced proportion of the pairs (50% Entailment, 35% Unknown, and 15% Contradiction) of the RTE4 dataset and because of the difficulty of predicting the Contradiction class. These experiments suggest that the algorithm for expanding data sets [19] works better in the two-way task than in the three-way task. To our surprise, using the RTE3-4C+RTE4-4C training set, the best accuracy among the datasets is found, in both classification tasks, over the RTE5 test set. Regarding Tables 3 and 5, we see that the tree classifier J48 seems to be very sensitive to the noise introduced by the new examples. Thus, it could be suggested that more robust classifiers such as SVM or neural networks are preferable when training over the extended datasets. Despite the fact that the accuracy results are far from the best RTE systems, we believe that the results obtained are very promising. This is because we are not using any knowledge resource other than WordNet, and we are using neither lexical nor syntactic information. Surely, to build a more competitive system it is necessary to incorporate lexical, syntactic and semantic information. Also, we observed that the use of expanded data sets [19] improves the accuracy of our system in any desired configuration, with a gain of 4.67 accuracy points using the best classifier with RTE3-4C + RTE4-4C, predicting the RTE4. Table 6 shows the results obtained with ten-fold cross validation in the two-way task using RTE3. We named the features as: R (Resnik), L (Lin), J (Jiang), Pi (Pirro & Seco), Le (Leacock & Chodorow), W (Wu & Palmer), P (Path), and C (SemSim). As a notation, All-C represents all features turned on except SemSim. We are interested in assessing the feature impact on the overall accuracy.
We defined several sets based on the kind of information that the measure needs. So, we
create subsets of measures that are based on a similar computation strategy; we describe the most important below:
- “RJLPi”: chosen to show the contribution of the “information content” measures.
- “WLeC”: measures that require information about relative depth. With this subset we are interested in measuring the contribution of the information about relative depth.
- “WLeCP”: with this subset we add only the information of node counting.

Table 6. Results obtained with ten-fold cross validation in the 2-way task using RTE3
Features   MLP Classifier   SVM Classifier   AB Classifier   DT (J48) Classifier
All        59.63%           60.12%           57.5%           54.5%
All-RJ     57.62%           59.25%           57.12%          57.25%
RJLPi      54.75%           51.75%           57.5%           57.5%
WLeCP      58.5%            56.87%           54.62%          57%
WLeC       57.25%           56%              54.37%          57%
WLe        54.5%            53.75%           49.25%          51.5%
RJ         54.14%           52.75%           56.87%          57.5%
R          49.87%           51.5%            52.5%           51.5%
Regarding the tables, we clearly see that adding more features increases the classification performance on the employed datasets. Table 6 also shows that the best results are obtained by turning on all features. Indeed, a progressive increase is gained as more features are added. Thus, by using all features, a statistically significant difference is found compared with using individual features in isolation; moreover, using all features always outperforms the other subsets of features, with the exception of the decision tree classifier J48. Interestingly, the experiment shows that for robust machine learning algorithms such as SVM and MLP, measures based on “relative depth”, such as WLeC, are better than measures based on information content, such as RJLPi, but combining these two kinds of measures increases the overall accuracy. As a conclusion, it seems that by using a feature vector with the eight WordNet-based semantic measures our RTE system can increase its accuracy.
6 Conclusions
From our experiments we conclude that an RTE system with an average score can be built by using our method to extend word-to-word semantic measures to the sentence level. We also showed that the WordNet-based Hirst & St-Onge measure and the Adapted Lesk-based metrics cannot be used in a practical RTE system because of efficiency considerations. At the same time, we present eight computationally efficient WordNet-based semantic similarity measures that are useful in the recognizing textual entailment task. We also used in our experiments a promising algorithm to expand an RTE corpus, which yielded statistically significant differences when predicting the RTE test sets. Finally, we show that the proposed WordNet features alone are not enough to build a competitive RTE system, although an average score can be reached.
Future work is oriented towards incorporating additional lexical similarity features and semantic resources, and testing the improvements they may yield.
References 1. Giampiccolo, D., Dang, H., Magnini, B., Dagan, I., Cabrio, E.: The Fourth PASCAL RTE Challenge. In: TAC 2008 (2008) 2. Bentivogli, L., Dagan, I., Dang, H., Giampiccolo, D., Magnini, B.: The Fifth PASCAL RTE Challenge (2009) 3. Herrera, J., Penas, A., Verdejo, F.: Textual Entailment Recognition Based on Dependency Analysis and WordNet. PASCAL. In: First Challenge Workshop (2005) 4. Ofoghi, B., Yearwood, J.: From Lexical Entailment to Recognizing Textual Entailment Using Linguistic Resources. In: ALTA Workshop (2009) 5. Castillo, J.: A Machine Learning Approach for Recognizing Textual Entailment in Spanish. In: NAACL, Los Angeles, USA (2010) 6. Castillo, J., Cardenas, M.: Using Sentence Semantic Similarity Based on WordNet in Recognizing Textual Entailment. In: Iberamia 2010 (2010) (in press) 7. Pedersen, T., Patwardhan, S., Michelizzi, J.: WordNet: Similarity - Measuring the Relatedness of Concepts. In: AAAI 2004, San Jose, CA, pp. 1024–1025 (2004) 8. Patwardhan, S., Pedersen, T.: Using WordNet Based Context Vectors to Estimate the Semantic Relatedness of Concepts. In: EACL 2006, Trento, Italy (2006) 9. Resnik, P.: Information Content to Evaluate Semantic Similarity in a Taxonomy. In: Proc. of IJCAI 1995, pp. 448–453 (1995) 10. Lin, D.: An Information-Theoretic Definition of Similarity. In: Proc. of Conf. on Machine Learning, pp. 296–304 (1998) 11. Jiang, J., Conrath, D.: Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy. In: Proc. ROCLING X (1997) 12. Pirrò, G., Seco, N.: Design, Implementation and Evaluation of a New Similarity Metric Combining Feature and Intrinsic Information Content. In: ODBASE 2008. LNCS (2008) 13. Wu, Z., Palmer, M.: Verb semantics and lexical selection. In: 32nd ACL (1994) 14. Leacock, C., Chodorow, M.: Combining local context and WordNet similarity for word sense identification, pp. 265–283. MIT Press, Cambridge (1998) 15. Hirst, G., St-Onge, D.: Lexical chains as representations of context for the detection and correction of malapropisms, pp. 305–332. MIT Press, Cambridge (1998) 16. Banerjee, S., Pedersen, T.: An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet. In: Gelbukh, A. (ed.) CICLing 2002. LNCS, vol. 2276, p. 136. Springer, Heidelberg (2002) 17. Castillo, J.: Recognizing Textual Entailment: Experiments with Machine Learning Algorithms and RTE Corpora. In: CICLing 2010, Iaşi, Romania (2010) 18. Witten, I., Frank, E.: Data Mining: Practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005) 19. Castillo, J.: Using Machine Translation Systems to Expand a Corpus in Textual Entailment. In: ICETAL 2010, Reykjavik, Iceland. LNCS(LNAI) (2010) 20. Li, Y., McLean, D., Bandar, Z., O’Shea, J., Crockett, K.: Sentence Similarity based on Semantic Nets and Corpus Statistics. IEEE TKDE 18(8), 1138–1150 (2006) 21. Li, Y., Bandar, A., McLean, D.: An Approach for Measuring Semantic Similarity between Words Using Multiple Information Sources. IEEE TKDE 15(4), 871–882 (2003) 22. Lesk, M.: Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from a ice cream cone. In: Proceedings of SIGDOC 1986 (1986)
On Managing Collaborative Dialogue Using an Agent-Based Architecture Tomáš Nestorovič University of West Bohemia in Pilsen, Univerzitní 8, 30614 Pilsen, Czech Republic
[email protected]
Abstract. In this paper, we focus on our agent-based approach to task-oriented dialogue management. We present our deliberation process based on optimizing the length of a dialogue. As innovative features, the manager accommodates the “lounge mode” for passing all initiative back to the user within a single dialogue turn, and a two-layered structure for representing the dialogue context. At the end of the paper, we suggest future extensions to the architecture. Keywords: Agent-based dialogue management; system adaptability; dialogue systems; artificial intelligence.
1 Introduction Dialogue management focuses on finding the machine's best response given a user's interaction history. During the past decades, many approaches have emerged. What they have in common is the aim to manage and elicit knowledge from the user during a dialogue; however, their theoretical backgrounds differ. Ranging from simple finite-state machines to Markov decision networks, there is a wide collection of methods for implementing a dialogue manager. We decided to follow an agent-based approach to manage a spoken dialogue. In our case, the agent accepts a domain data description and intention satisfaction plans (instructing how to get a given task solved). The agent follows the scheme of the BDI architecture (Beliefs, Desires, Intentions) [1, 2]. The rest of the paper is organized as follows. First, we outline the manager's overall structure (Section 2). Next, we move to our approaches to the modules storing and coping with Beliefs, Desires, and Intentions (Section 3). The paper is concluded with scheduled future work and a brief summation (Sections 4 and 5).
2 Dialogue Manager Our previous work was concerned with accommodating flexibility by making use of a hierarchical frame structure. This structure was extended with a decentralized system of journals and was capable of handling OnFilled events [3]. However, we dropped this course of development for two reasons: 1) frames turned out to be a medium not
strong enough to support features like dialogue optimization, and 2) frames cannot easily deal with logical negation. These two facts were the main reasons for us to turn to the BDI architecture. The manager plays the role of a collaborative conversational agent. It consists of five modules (Fig. 1). The Context module maintains information about the current dialogue (having some Beliefs1 and defining some Desires). The History module serves as a source of historical data enabling user's utterances to be disambiguated ("the previous train"). The Strategy Selection module recognizes familiar situations within the dialogue and determines a corresponding initiative mode for the agent's next response production. The Core module controls all the previous modules – on the basis of given Desires and current Beliefs it sets new Intentions. The manager produces Concept-to-Speech (CTS) utterance descriptions and feeds them into the Prompt Planner module (its aim is to transform CTS descriptions into naturally sounding utterances). However, this module is currently not fully designed and we substitute its function by merely passing the input to the output.
3 General Dialogue Management Architecture This section discusses each of the modules in detail, providing a comprehensive description of the architecture. 3.1 Context Module The Context module follows a two-layered approach (Fig. 2), in which the upper layer serves for the detection of user's intentions, while the lower layer maintains dialogue objects and relations among them (thus the "data" obtained during the recent dialogue). Preprocessing of the input semantics results in a block structure representation (Fig. 4) that passes through both of the layers, leaving a specific imprint behind. In Fig. 4, boxes are objects we will refer to as concepts throughout this paper, and arrows are relations among them. In both layers, the imprints take the form of a fact, as defined in Equation (1) below.
Fig. 1. Manager modules topology and interaction
1 We will refer to the conceptual components of the BDI architecture with the corresponding terms written with the first letter capitalized.
FACT(concept, instance1, instance2, cs, salience, belief)                      (1)
Each fact provides a belief about a concept (the concept parameter in the fact definition) and/or a relation (the instance{1,2} parameters). The initial value of the belief parameter (whose range spans from absolute disbelief to absolute confidence) is the confidence score (cs) obtained from the automatic speech recognition module. The salience is an integer indicating how recent the fact is. Let us start with the lower layer, which stores Beliefs about a dialogue. Considering a time-table domain data model (Fig. 3), the system may, for example, "believe" that a train is the desired transportation means and Prague is the particular city of arrival. The lower layer stores all data-like information (solid white or hatched concepts in Fig. 4).
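A minimal sketch of how a fact of the form (1) could be held in code; the field names follow the definition above, while the salience bookkeeping shown is an assumption rather than the authors' specification.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Fact:
    concept: str                       # the concept the belief is about
    instance1: Optional[str] = None    # first endpoint of a relation, if any
    instance2: Optional[str] = None    # second endpoint of a relation, if any
    cs: float = 0.0                    # ASR confidence score
    salience: int = 0                  # integer recency indicator
    belief: float = field(init=False)  # current belief value

    def __post_init__(self):
        self.belief = self.cs          # initial belief equals the confidence score

    def touch(self, turn: int):
        # assumed bookkeeping: a new mention makes the fact more salient
        self.salience = turn

# Two lower-layer beliefs loosely modelled on the time-table example:
lower_layer = [Fact(concept="Train", cs=0.8, salience=1),
               Fact(concept="City", instance1="Arrival", instance2="Prague",
                    cs=0.9, salience=2)]
```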
Fig. 2. Two-layered Context module approach; the input semantics is imprinted in both layers; the upper layer controls the intentions stack by pushing (+) new intentions onto it

Fig. 3. Time-table domain data model

Fig. 4. Block structure of the preprocessed semantics of the complete request "When does a next train leave from Rotterdam to Prague?". Graphics represent concept value cardinalities: solid black = infinite, hatched = non-zero, solid white = zero.
Table 1. Upper layer semantics imprinting resolution

User's sentence     Cardinality: Zero    Non-zero     Infinite
Declarative         –                    –            –
Imperative          –                    Imprinted    Imprinted
Interrogative       –                    Imprinted    Imprinted

Table 2. Lower layer semantics imprinting resolution

Cardinality: Zero    Non-zero     Infinite
Imprinted            Imprinted    –
Knowledge represented by the lower layer can be arranged into a nested structure, as Fig. 5 shows. The upper layer serves for the recognition of user's intentions. In a task-oriented dialogue, intentions are expressed in some form of actions to perform. Thus, instead of determining whether a particular concept of the semantics relates to an action, we detect cases in which it definitely does not. These are all concepts except those that contain data only. More formally, we can describe such concepts by introducing the cardinality of the information they carry. The following list discusses all possibilities the cardinality may take on.
− Let a leaf concept contain an atomic piece of information (a single time point, e.g. "2p.m."); atomic information has zero cardinality, i.e., zero uncertainty, and therefore cannot carry any intention, as there is nothing to discuss about it.
− Let a leaf concept contain a non-atomic piece of information (a time interval, e.g. "2p.m.-3p.m."); the information has non-zero cardinality as it has a certain level of uncertainty and as such may be the subject of a query.
Fig. 5. The lower layer contains only "data" mentioned in the dialogue
Fig. 6. The upper layer content with salience (numbers) and the DepartureTimeQuestion intention detection pattern (dashed around)
− Let a leaf concept contain no information (an undefined time value); we define such a concept to have infinite cardinality.
We can recursively determine the cardinality of parent concepts in the block structure by applying the following rule:
− Let the concept C contain at least one sub-concept with non-zero or infinite cardinality. Then C has non-zero cardinality.
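A small sketch of the cardinality rules just listed; the concrete concept representation (nested dictionaries) is an assumption made only for illustration.

```python
ZERO, NONZERO, INFINITE = 0, 1, 2  # cardinality levels

def leaf_cardinality(value):
    # atomic value -> zero; non-atomic (e.g. an interval) -> non-zero; missing -> infinite
    if value is None:
        return INFINITE
    if isinstance(value, tuple):       # e.g. ("2p.m.", "3p.m.")
        return NONZERO
    return ZERO

def cardinality(concept):
    """concept: dict with either a 'value' (leaf) or 'children' (inner node)."""
    if "children" not in concept:
        return leaf_cardinality(concept.get("value"))
    child_cards = [cardinality(c) for c in concept["children"]]
    # rule above: any non-zero or infinite sub-concept makes the parent non-zero
    return NONZERO if any(c in (NONZERO, INFINITE) for c in child_cards) else ZERO

# Departure(Time = undefined) therefore has non-zero cardinality:
departure = {"children": [{"value": None}]}
assert cardinality(departure) == NONZERO
```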
Knowing just the cardinalities, we still cannot determine whether a semantics segment should be imprinted into the upper layer. Since we are detecting actions to perform, we also need to involve dialogue acts, more specifically, to detect imperative or interrogative sentences [4]. Hence, the approach we follow is a combination of both – cardinalities and dialogue acts, as Table 1 shows. For completeness' sake, Table 2 shows the resolution of the lower layer imprinting. As can be seen, to make an imprint, only the cardinality information is necessary. Once the semantics has been imprinted, the most recent intention is recognized. We use a simple template-matching approach, where each intention has its own pattern (Fig. 6). If it matches, the sum of concept saliences is computed. With more than one match, the pattern with the highest total salience is considered the actual user's intention. To manage intentions, we have partially adopted Grosz and Sidner's work on intentions in a dialogue [5] – a stack of intentions managed by dominance among intentions. We currently omit satisfaction-precedence (as we rely on having an exhaustive plan description) and user's interruptions (i.e., temporary changes of the dialogue course). Thus, the stack is managed by a single rule

(∀ Is ∈ Stack: Is DOM I) → push(I)                                             (2)
dictating that the user may introduce a new intention I without losing any of the current intentions already on the stack only if each stacked intention Is dominates I. If the rule does not apply, introducing I is considered a permanent change of the dialogue course, causing intentions that do not dominate I to be popped off the stack immediately. 3.2 Core Module Once the intention stack is updated, the agent starts to process a plan on how to satisfy the top-positioned intention. The plan resembles a tree [6] where nodes are holders of the agent's activity (i.e., they dictate utterances to say or back-end interactions to perform). An example of a plan for a DepartureTimeQuery intention may be seen in Fig. 7. First, the user is asked to say a transportation means to find time-table information for (the result is unified with the M variable for further processing). Next, the train parameters are constrained by posing (some of) the disambiguation questions. Finally, the database is queried and the results are presented to the user. After presenting them, the agent considers the user's intention to be satisfied. However, we postpone the popping of the agent's corresponding desire off the stack with respect to the user's next utterance – if s/he reopens the intention (by changing the underlying data in the lower layer), the desire remains on the stack, otherwise it is popped off [6].
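A minimal sketch of the intention-stack handling described in Section 3.1 and above — rule (2) plus the postponed popping of a satisfied desire; the dominance relation dom is assumed to be supplied by the plan library, and the helper names are illustrative only.

```python
def push_intention(stack, new_i, dom):
    """Rule (2): the new intention is stacked without loss only if every stacked
    intention dominates it; otherwise non-dominating intentions are popped first."""
    if not all(dom(existing, new_i) for existing in stack):
        # permanent change of dialogue course: drop non-dominating intentions
        while stack and not dom(stack[-1], new_i):
            stack.pop()
    stack.append(new_i)
    return stack

def maybe_pop_satisfied(stack, top_satisfied, user_reopened):
    # Popping is postponed one turn: pop only if the user did not reopen the intention.
    if top_satisfied and not user_reopened and stack:
        stack.pop()
    return stack
```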
The plan in Fig. 7 is a rather simple one; however, it is sufficient to demonstrate the first level of the agent's adaptability. This kind of adaptability simply swaps plan tree branches according to the lower layer data salience. We are motivated by adopting the results of the user's initiative – if s/he prefers to discuss a certain part of a task prior to discussing the rest, the agent adopts the decision. As an example, consider that the user's elliptical utterance "by train to Rotterdam" is misrecognized by not understanding the transportation means. The city of arrival (Rotterdam) now has the highest salience in the lower layer. From the agent's point of view, the user first wants to discuss the city of arrival and then move to the rest. As a result, the agent adjusts the plan tree structure by positioning the corresponding branch at the beginning of the plan (Fig. 8). However, this time the mandatory M variable is unbound due to the misrecognition error.
Fig. 7. Initial plan for satisfying the DepartureTimeQuery intention

Fig. 8. The DepartureTimeQuery plan with swapped branches

Fig. 9. Feasible DepartureTimeQuery plan
Table 3. Event types (ordered descending by importance); applicability (in parentheses): L = Lower layer, U = Upper layer, P = Plan, S = Stack

Event type                  Event description
Desire satisfaction (P)     An event of the most importance, processed as soon as the manager is able to satisfy a desire on top of the stack.
Generalization (U, L)       A given concept needs to be a part of a more general concept (e.g., City can be either of Departure or Arrival).
Disambiguation (P)          A concept queries a database and the number of results returned exceeds the number of results allowed.
Validation (U, L, S)        The ASR module recognized the given concept with a low confidence score and it needs to be further validated by the user.
Missing information (L)     A concept is missing information (i.e., a Time concept exists but carries no value).
Concept specification (P)   More detailed information is needed (the counter-event to the Generalization event).
As a result, the agent starts to search the tree structure to find how to reach a value. After having found a solution, it puts the corresponding branch at the beginning of the plan again (Fig. 9). The characteristic feature of agent-based dialogue management is the ability to optimize the dialogue flow in some way. So far, we have mentioned several supporting means the dialogue agent consists of – the intentions stack, the data lower layer, the action upper layer, and plans. We were searching for a way to bridge all of these to enable optimization of the agent's behaviour, and as a response, we devised an events environment. Events of different types define and represent elemental operations the agent is able to do (a similar approach can be found in [7]; however, it is applied to dialogue data only). As the events always relate to a specific entity (concept, relation, plan node, or intention), we can preliminarily define them (the full definition follows) as

EVENT(entity, operation)                                                       (3)
For example, we might want to validate an emerged intention on top of the stack by a validation event, or generalize a concept in the lower layer by a generalization event (e.g., the Departure concept is a "generalization" of the Time concept). Table 3 gives an overview of all events we distinguish (in order of importance). Each event is considered one option the manager is offered to take at a specific point in time. If the existence purpose of the event has been met (e.g. the user validated a concept that the validation event relates to), we say that the event has been satisfied. As there may be more pending events in the context, the manager is given more choices regarding which one to start with at each of its turns. Finding out the best order in which the events should be satisfied is the subject of the deliberation mechanism (see below). We split events into phases. For the majority of events, the agent must 1) utter to the user (initial phase), 2) wait for the user's response (respond phase), and finally 3) check the event satisfaction (satisfaction phase). All events follow this cascade model. Additionally, for each event we track the state of its recovery. An event is said to be recovered if it reached the satisfaction phase but the user's interaction has turned it back to the initial phase (by making changes in the context).
Table 4. Event penalization criteria (each penalty criterion is followed by its explanation)

− Event does not support a desire on top of the stack. The more dominant the desire it supports, the higher the penalty, i.e., we want to support narrowed topics in the dialogue.
− Event cannot be reacted to (is unavailable) due to a missing piece of information. The system cannot query a time-table database if the transportation means is unknown; this results in an infinite penalty, i.e., stopping exploration in this direction.
− Concept salience. The longer a given concept has not appeared in the dialogue, the higher the penalty – we approach sticking to the current course of the dialogue.
− The number of pending events covered by the event. For example, at least two events are covered in an implicit confirmation utterance: at least one validation event plus some of the information elicitation events; the lower the number, the higher the penalty.
− Event is not recovered. Recovered events signal that the user has made corrections in the past dialogue – serving the corrections is given higher priority, i.e., non-recovered events are penalized.
− Event phase. Events in their last processing phase are preferred – having the user's answer, we want to check whether it satisfies the system's question; events which do not meet this condition are penalized.
− Event type. Events of most importance are preferred – see the previous section for the ordered list of event types.
Therefore, the final definition of an event is

EVENT(entity, operation, phase, recovered)                                     (4)
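A compact sketch of definition (4) together with the three-phase cascade; the phase names follow the text, while the method names and the simple advancement logic are illustrative assumptions.

```python
from dataclasses import dataclass

PHASES = ("initial", "respond", "satisfaction")   # cascade described in the text

@dataclass
class Event:
    entity: object         # concept, relation, plan node, or intention
    operation: str         # e.g. "validation", "generalization", "disambiguation"
    phase: str = "initial"
    recovered: bool = False

    def advance(self):
        # move one step along the initial -> respond -> satisfaction cascade
        i = PHASES.index(self.phase)
        if i < len(PHASES) - 1:
            self.phase = PHASES[i + 1]

    def recover(self):
        # the user changed the context after the satisfaction phase was reached
        self.phase = "initial"
        self.recovered = True
```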
As we noted above, events are the means used by the agent to make decisions, i.e., to plan its behaviour. For example, given a Time concept with the Validation and Generalization events, the agent may either 1) first attempt to satisfy the Validation ("Did you say …?") followed by the Generalization ("Does the time refer to departure or arrival?"), or 2) attempt to satisfy the Generalization directly, as the Validation is involved implicitly (implicit validation). Which of these two options will be preferred depends on the deliberation mechanism. In our implementation, we currently consider only one optimization criterion for the agent's behaviour (although there may be more): the length of the dialogue in terms of dialogue turns. From this point of view, the second choice ("Does the time refer to departure or arrival?") would be less penalized. Our future work also considers implementation of an optimization that regards the dialogue length in terms of elapsed real time. However, the missing piece is still how to combine both criteria into a single decision pattern.
Table 5. Spoken dialogue moves

Dialogue move   Example
Query           "When does a next train leave?" (user); "What time would you like to depart?" (system)
YN-Query        "Do you want to buy a ticket?" (system)
Validation      "Did you say…?" (system); "Tomorrow morning" (user)
Statement       "The next train departs at 7 a.m." (system)
Grounding       "I want", "Yes", "No" (user); "Ok" (system)
The current context (upper and lower layers, intention stack and the active plan) with all its events is considered one of the possible worlds from which we can move to another one by satisfying one or more pending events. We search the space of possible worlds until the one in which the desire on top of the stack is satisfied has been found. Fig. 10 shows the underlying algorithm. To reach the optimization of the agent's behaviour we use the events penalization scheme (Table 4) for plan construction. Recall that in our case the plan is a sequence of events that is optimal with respect to the "minimal dialogue length" criterion. Once the plan has been determined, finding out what the system should say next amounts to following the plan up to the nearest point where an interaction with the user is necessary. This is usually the point where the agent is missing some information, and hence utters a query. Additionally, the agent may also utter a final answer that satisfies its desire by making a statement. The agent's queries and statements are two of the dialogue moves (Table 5) fed into the dialogue moves stack. The stack gathers moves performed by either of the participants. The purpose of the stack is to keep track of unfinished dialogue games [8] currently played.
1. Duplicate the current world.
2. Repeat until you have to query the user, or you satisfy a desire on top of the stack.
   2.1 Choose one of the unsatisfied events.
   2.2 Emulate a response to satisfy the selected event (e.g., validate a given concept). Here, we consider the domain of the event. E.g., for the Missing-information event we omit any emulation as the algorithm does not work with it at all, whereas for the Concept-specification and Generalization events we consider all of the user's immediate possible responses (we assume their number is always low in this case).
3. Recurrently repeat from step 1.

Fig. 10. Agent's deliberation algorithm
Table 6. Dialogue strategies selection criteria

System-initiative strategy
− dialogue quality estimation
− correction of information elicited using higher initiative strategy
− information to get has a large range
− low recognition score

Mixed-initiative strategy
− dialogue quality estimation
− correction of information elicited using higher initiative strategy
− acceptable recognition score

User-initiative strategy
− dialogue quality estimation
− correction of information elicited using higher initiative strategy
− high recognition score
− user's intention is unknown

Lounge strategy
− dialogue quality estimation
− user's intention is unknown
− user's intention is ambiguous
− user's response contributed to the intention being solved
3.3 Strategy Selection Module The purpose of the dialogue moves stack is to keep track of the recent spoken interaction history. For example, although we currently do not have the Prompt Planner module (Fig. 1) implemented, we are still able to avoid the agent's repetitive utterances by comparing the agent's current move with moves already done. If there is a match, the agent may keep silent during its turn, handing the initiative back to the user. The example situation depicted above is a very special one; more specifically, it requires high user speech recognition scores and smooth dialogue progress. However, as both factors may vary during the dialogue, this interaction style is not guaranteed to work each time, and the agent is forced to adapt what it says to the current situation observed. As a solution, we introduce four dialogue strategies: system-initiative, mixed-initiative, user-initiative, and the lounge strategy. All strategies are named with respect to the level of initiative the agent exhibits in each of them.
1. Let U denote the user's response received while discussing intention I.
2. Deliberate and produce the next move M.
3. If U contributed to satisfying I, then
   3.1 If M matches a move on the stack, keep silent during this turn (i.e., with U, the user did not respond to the agent's query).
   3.2 Else, generate a context-free sentence (i.e., U responded to what the agent asked for).
4. If U did not contribute to satisfying I, then drop the lounge strategy.
5. Perform M, generating an utterance in accordance with the strategy selected.
Fig. 11. Lounge strategy rules (boxed) within the context of the agent's utterance production algorithm. The rules express the "user's response contributed to the intention being solved" feature listed in Table 6.
Table 6 lists the dialogue features for determining the strategy the agent should use to generate its response [9]. The decision process is similar to the Jaspis architecture [10]: having four strategies to decide among, the one whose features reach the highest score is selected. While the first three strategies are in accordance with common adaptability modeling habits [9, 11], the lounge strategy found its inspiration in [12] and [13]. The main difference between the user-initiative and lounge strategies is that in the user-initiative strategy the agent utters at least some (open-ended) question, whereas in the lounge strategy it keeps silent (i.e., hands the initiative back to the user) or encourages the user to say more by uttering explicit context-free sentences. There are currently three such sentences: "Please say me more.", "Please be more specific.", and "Uhu." We put one restriction on the design of these sentences – they should not contain any cue phrases that could indicate a possible change in the dialogue course [5], i.e., they should not make the user think the system wants to take the initiative. Fig. 11 shows the algorithm that governs the agent's behaviour using the lounge strategy, and Fig. 12 shows a dialogue snippet of a successful and an unsuccessful application of the strategy.
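A sketch of the Jaspis-like selection step: each strategy scores how many of its Table 6 features currently hold, and the highest-scoring one is chosen. The feature names and the equal weighting are illustrative assumptions, not the exact scoring used by the manager.

```python
# Hypothetical feature sets per strategy (abridged from Table 6).
STRATEGY_FEATURES = {
    "system-initiative": {"quality_estimation", "correcting_elicited_info",
                          "large_value_range", "low_recognition_score"},
    "mixed-initiative":  {"quality_estimation", "correcting_elicited_info",
                          "acceptable_recognition_score"},
    "user-initiative":   {"quality_estimation", "correcting_elicited_info",
                          "high_recognition_score", "intention_unknown"},
    "lounge":            {"quality_estimation", "intention_unknown",
                          "intention_ambiguous", "response_contributed"},
}

def select_strategy(observed):
    """observed: set of feature names that hold in the current dialogue situation."""
    scores = {name: len(feats & observed) for name, feats in STRATEGY_FEATURES.items()}
    return max(scores, key=scores.get)

# Smooth progress, ambiguous intention, user just contributed -> the lounge strategy wins.
print(select_strategy({"high_recognition_score", "intention_ambiguous",
                       "response_contributed"}))
```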
S Welcome to Simple Time-table System. How can I help you? U Train departures. S [ User's last utterance carries information about train departures but does not contribute to agent's HowCanIHelpYou desire. ] Please say me more. U I need to get to Rotterdam by train. S [ “To get somewhere” is perceived as a DepartureTimeQuery intention. The city contributes to the adopted desire satisfaction, hence, a context-free sentence is produced. ] Uhu. U [ Timeout. ] S [ User's last utterance did not contribute to satisfying agent's desire, hence, the lounge strategy is dropped and more restrictive utterance generated. ] What time approximately would you like to depart? U Five thirty. S [ We currently are unable to care about agent's utterances naturalness as the Prompt Planner module is unimplemented. In the following statement, making use of ellipsis would be appropriate. ] The next train from Delft to Rotterdam leaves at 7 a.m. The next train from Delft to Rotterdam leaves at 9 a.m. The next train from Delft to Rotterdam leaves at 11 a.m. Fig. 12. Dialogue snippet presenting a successful and unsuccessful interaction using the lounge strategy
On Managing Collaborative Dialogue Using an Agent-Based Architecture
67
3.4 History Module The History module (Fig. 1) provides a means for coping with information spoken in the past. The structure of the module derives from [14], and consists of objects we refer to as entities, denoting once-sealed fragments of the context (sealed = there is nothing left to discuss about them and all information they carry is valid). The dialogue history is built automatically during the semantics anchoring process. The inverse operation, history reading, is initiated implicitly, i.e., every incoming semantics is treated as a reference to some historical data. The basic process of unreferencing takes as big a fragment of the input semantics as possible and matches it against the most general historical entity found closest to the "present" point. If there is a match, the entity is transformed into a semantics replacing the portion of the original input semantics. The resulting semantics goes through the processing described in the above sections. Due to space reasons we are unable to provide any further description, and the interested reader is referred to our related work, e.g. [3].
4 Future Work Our experience with the lounge strategy shows that the beginning of a dialogue is a reasonable place for the strategy to be applied – considerable confusion arises when an untrained user is presented with a context-free utterance later in a dialogue (exhibited mostly by silence and sparingly by repetitions). We believe this problem would exist even if there were skilled users of the domain. Therefore, we want to (partially) substitute the context-free utterances with more specific, general context-aware sentences – e.g., our intended aim is an agent producing sentences similar to "Please be more specific about the Train", thus narrowing the absolute initiative the user currently possesses. Obviously, this approach brings the lounge strategy closer to the user-initiative strategy. However, what should keep them separated are the context-free sentences for cases when the user's intention is not clear. Last but not least, apart from the time-table domain, we also would like to apply the above presented dialogue manager in a personal assistance domain, considering e-mail and appointment management. The reason for our choice is that the user may become an expert (skilled user) in using these domains, which gives us the opportunity to prove our thesis.
5 Conclusion This paper presented our general dialogue management architecture, designed as a deliberative agent. We have presented here a broad range of algorithms that provide the manager's particular capabilities, some of which derive from well-known approaches. One of the features the manager accommodates is the two-layered approach for user's intention detection and belief maintenance. The other feature is the lounge strategy, which the manager makes use of when encountering ambiguity – regarding both user's intentions and beliefs. The strategy needs further tuning with respect to the utterances produced by the agent. Hopefully, our idea can find its place in the field of dialogue management with skilled users.
68
T. Nestorovič
This paper presented foremost the theoretical background of our research and mentioned preliminary results of its application. The necessary changes to the architecture mentioned above are the subject of our current work. Acknowledgment. This work was supported by grant No. 2C06009 Cot-Sewing.
References 1. Rao, A.S., Georgeff, M.P.: BDI agents: From theory to practice. In: 1st International Conference on Multi-Agent Systems (ICMAS), San Francisco, pp. 312–319 (1995) 2. Wooldridge, M.: Intelligent Agents. Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence, pp. 27–77. MIT Press, London (2000) 3. Nestorovič, T.: Towards Flexible Dialogue Management Using Frames. In: Matoušek, V., Mautner, P. (eds.) TSD 2009. LNCS, vol. 5729, pp. 419–426. Springer, Heidelberg (2009) 4. Nguyen, A., Wobcke, W.: An Agent-Based Approach to Dialogue Management in Personal Assistants. In: IEEE/WIC/ACM, San Diego, pp. 367–371 (2006) 5. Grosz, B., Sidner, C.L.: Attention, Intention and the Structure of Discourse. Computational Linguistics 12, 175–204 (1986) 6. Rich, C., Sidner, C.L., Lesh, N.: COLLAGEN: Applying Collaborative Discourse Theory to Human-computer Interaction. AI Magazine 22, 15–25 (2001) 7. McGlashan, S.: Towards Multimodal Dialogue Management. In: 11th Twente Workshop on Language Technology, Twente, pp. 1–10 (1996) 8. Kowtko, J.C., Isard, S.D., Doherty, G.M.: Conversational Games Within Dialogue. Research Paper HCRC/RP-31, Human Communication Research Centre, University of Edinburgh (1993) 9. Chu, S.-W., O’Neill, I., Hanna, P., McTear, M.: An Approach to Multi-strategy Dialogue Management. In: INTERSPEECH, pp. 865–868 (2005) 10. Turunen, M., Hakulinen, J.: Agent-based Adaptive Interaction and Dialogue Management Architecture for Speech Applications. In: Matousek, V., Mautner, P., Moucek, R., Tauser, R. (eds.) TSD 2001. LNCS (LNAI), vol. 2166, pp. 357–364. Springer, Heidelberg (2001) 11. Bui, T.H.: Multimodal Dialogue Management. Technical Report, TR-CTIT-06-01, Centre for Telematics and Information Technology, University of Twente, Enschede (2006) 12. Wallis, P., Mitchard, H., Das, J., O’Dea, D.: Dialogue Modelling for a Conversational Agent. In: Stumptner, M., Corbett, D.R., Brooks, M. (eds.) Canadian AI 2001. LNCS (LNAI), vol. 2256, pp. 532–544. Springer, Heidelberg (2001) 13. Weizenbaum, J.: ELIZA – A Computer Program for the Study of Natural Language Communication between Man and Machine. Communications of the Association for Computing Machinery 9, 36–45 (1966) 14. Zahradil, J., Müller, L., Jurcicek, F.: Model sveta hlasoveho dialogoveho systemu. In: Znalosti, Ostrava, pp. 404–409 (2003)
Dialog Structure Automatic Modeling Débora Hisgen and Daniela López De Luise AIGroup, Universidad de Palermo, Mario Bravo 1050, P.O. Box: 1175, Buenos Aires, Argentina
[email protected],
[email protected] http://www.aigroup.com.ar
Abstract. This paper presents the approach implemented as part of a conversational bot named WIH (Word Intelligent Handler). It has a complex architecture with several components. Some of them are the ER-memory, the EP-memory and other minor modules that provide the prototype with good modeling of Spanish sentences. They constitute the knowledge-representation mechanism that is used by WIH to build automatic answers during dialogs with humans. In this paper a brief description of these components and some of their interactions is given, along with test cases and a statistical analysis of the results obtained. It is shown here that the WIH prototype can adapt its behavior and the learning rate of its internal working memories according to dialog contents. Keywords: Natural language processing, Man-machine interfaces, Chatterbot.
1 Introduction In the early fifties, Alan Turing proposed the Turing Test, which constitutes one of the first contests in the Artificial Intelligence field. This test aims to demonstrate the existence of intelligence in computers and to verify that machines can think as humans do [1]. Turing's work aroused great curiosity in J. Weizenbaum, who returned to and restated this idea in what later would be called the ELIZA project [2]. Eliza is a computer program and an example of a primitive Natural Language Processing (NLP) approach. It implements simple pattern matching techniques and was one of the first chatterbots. Some years later, Dr. Colby created Parry [3], a bot that simulates paranoid behaviors. It was used to interact with three patients diagnosed as paranoid. Tests showed that even a group of psychiatrists could not distinguish the computer from the human patients [4]. Inspired by ELIZA, Richard Wallace began to develop ALICE in 1995. As part of his project, he also designed the Artificial Intelligence Mark-up Language (AIML). AIML is an XML-compliant language, composed of a general tag named Category, the elementary unit of knowledge. Every Category has two components: Pattern and Template. A Pattern is a string that represents the conversation and a Template represents the response to a matched pattern [6] [7]. The aim of this paper is to present part of the architecture of a chatterbot prototype named Word Intelligent Handler (WIH) and to show an alternative language modeling approach that does not use traditional pattern matching nor semantics or ontologies.
It is important to consider that one of the main tasks in this area is to determine the context in which the language is applied and use it to track inflectional variants of word usage. Several algorithms can be used to solve the various problems that a wrong context produces, such as ambiguity1, polysemy2, anaphora3, etc. Most of the well-known computational linguistics community solutions use Natural Language Processing (NLP) techniques [4] [8]. Most NLP solutions work on the basis of at least one of the following five types of analysis: phonological, morphological, syntactic, semantic and pragmatic. This kind of layering is widely used to partition the problem and simplify the solution. Current solutions often result in long dictionaries with heavy and/or complex processing. They usually require some degree of permanent human participation for updating and tuning their knowledge. The WIH prototype was designed to mostly avoid humans as part of learning and modeling, but a few external indications are allowed, which can optionally be provided to improve the system's behavior [9]. In the field of Semantic Frameworks (SFw) there are several proposals, such as ODESeW, a workbench built inside the WebODE ontology engineering platform [10][11][12], which allows developing portals that manage knowledge automatically. Another SFw is ContentWeb, an ontology platform integrated with WebODE which allows users to interact by means of a predefined topic [13]. This framework interacts with: OntoTag (Ontological Semantics, implemented with RDF/S and XML), OntoConsult (a natural language interface based on ontologies) and OntoAdvice (an information retrieval system based on ontologies). It was implemented in the XML and RDF languages. Each word receives a URI (Uniform Resource Identifier). It also assigns a new URI to each new morphosyntactic element. There are also frameworks that have been developed to provide administration of morphosyntactic corpora. For instance, XTAG is a project that develops a wide-coverage English grammar [15]. It uses a lexicalized Tree Adjoining Grammar (TAG) formalism to allow the development of TAGs. It mainly consists of a parser, an X-windows grammar development interface and a morphological analyzer. MorphoLogic Recognition Assistant [16] presents a morphological and syntactic analysis applied to text automatically extracted from speech. The system admits three services: proper disambiguated segmentation (splitting into fragments according to content coherence), disambiguation for underspecified symbols (from unrecognized word sounds) and error correction of misinterpreted words. In this paper a new approach to traditional NLP is presented and used as part of the WIH (Word Intelligent Handler) bot prototype. This chatterbot has a three-layered architecture (see Figure 1) organized as follows: -Internal Structure: takes every Spanish sentence from dialogs and transforms it into sets of Homogenized Basic Elements (HBE). Every HBE is represented in an internal language for WIH and is a kind of building block extracted from words that are not filtered out.
1 More than one interpretation of a sentence, changing its meaning and causing confusion or uncertainty.
2 For the same context, more than one meaning of a word could apply.
3 The use of certain words to refer to information declared in previous sentences.
Fig. 1. Three layered architecture of WIH
-Virtual Structure: selects a subset of HBE that are related according to certain criteria and use it to build an Eci (in Spanish, Estructura de Composición Interna; Inner Structure Arrangement in English). An Eci is a logical structure modeling the relationship between words, an oriented graph representing a whole statement in the original text. -External Structure: is made up of a set of Eci from the same text. It is an interface between users and the Virtual Structure. It provides an external view of the content that is independent of the original text organization. To define the way a set of HBE is transformed into an Eci, the system uses a weighting coefficient named po. According to [17] , po is an Eci metric that asses the relevance of the phrase being represented. It is a value based on weights that correspond to each HBE belonging to the Eci. It is calculated using the formula given in Equation 1. As can be seen, po takes the p values from previous HBE values, being i the ordinal position (between 1 and N, the number of structures in Eci). po=Σ1N(pi+ pi-1)/2
(1)
The pi values assigned to each HBE are taken from a predefined table T. It defines the individual weight based on the type of word represented. This typification is automatically determined using morphosyntactic considerations and induction trees [18] [19]. It was also statistically shown that po is invariant to document size and type of text content. Furthermore, it could be used to discriminate the writer's profile (in fact it can distinguish between users writing a document, a forum, a web index, or a blog) and to assess the relevance of a sentence within a text [18]. This paper is focused on the main structures related to a novel knowledge modeling (see Figure 2): the ER-Memory (in Spanish Estructura de Respuesta, Structure of the
answer in English), which models the knowledge stored in the Virtual Structure and the original way it processes language sentences. It also describes the interaction between the ER-Memory and the MCT-Memory (Memoria de Composición Transitoria, transient structure-arrangement memory).
Fig. 2. Detailed Diagram of WIH Architecture
The rest of this paper presents the particular components of WIH, starting with the EP-Memory (Section 2), the ER learning approach (Section 3), the MCT-Memory (Section 4), test case metrics and statistical analysis (Section 5), and finally conclusions and future work (Section 6).
2 EP Memory The EP-Memory records the sequence of linguistic categories extracted from each sentence processed and then builds a chain of tokens named Parsed Structure (PS). Every PS is weighted according to its usage in real previously processed dialogs. As an example, for the sentence "Hola" the related EP is: "ER= {< 0.0152009> [SUSTANTIVO] # {{Hola}}". As the memory learns from experience, it reinforces parts of its knowledge from the initial status, changing the relative relevance of the involved structures. After a certain number of occurrences, the weighting stabilizes and changes are minor. In [20], EP tests show the WIH ability to process dialogs, derive weightings and generate category sequences.
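A minimal sketch of the kind of bookkeeping described for the EP-Memory — mapping each parsed category sequence to a usage weight that is reinforced on every occurrence; the concrete reinforcement step is an assumption, not the prototype's actual formula.

```python
from collections import defaultdict

class EPMemory:
    def __init__(self, reinforcement=0.05):
        self.weights = defaultdict(float)   # parsed structure -> usage weight
        self.reinforcement = reinforcement

    def observe(self, category_sequence):
        """category_sequence: tuple of linguistic categories, e.g. ('SUSTANTIVO',)."""
        ps = tuple(category_sequence)
        # assumed reinforcement: move the weight a small step towards 1
        self.weights[ps] += self.reinforcement * (1.0 - self.weights[ps])
        return self.weights[ps]

ep = EPMemory()
ep.observe(("SUSTANTIVO",))                      # e.g. the EP of "Hola"
ep.observe(("PRONOMBRE", "VERBO", "SUSTANTIVO"))
```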
The first EP training was performed with random structures to show that no special category sequence is preferred by the prototype, since the dataset was constructed with no special use of any parsed structure. As a consequence, weighting values should be mostly evenly distributed in the domain. The obtained results, shown in Figure 3, verify this hypothesis.
Fig. 3. First Training with random EP dataset
Fig. 4. Second training with real EP datasets
The second test made in that paper dealt with real test cases. Figure 4 shows the 3D histogram with a strong change in the general bar distribution. There is a significantly biased downward slope from the highest values. As the weighting distribution roughly follows the real usage of linguistic categories, it follows that the EP-Memory is able to model it. But, in order to make the prototype able to handle language, it must be complemented with the modeling of the contexts where each type of linguistic EP structure is used. That task is distributed mostly between the ER-Memory and the MCT-Memory.
3 Getting an Automatic Answer: The ER Learning The ER-Memory is focused on learning human answering behavior. To do so, it records the EP relationship between a sentence and the human reply in past dialogs. In fact, it keeps track of the main syntactic structure of every sentence (the EP) and the one used as a reaction. If WIH receives a sentence, the ER-Memory will try to find the corresponding EP. If such an EP does not exist, it is new and has to be learned. Otherwise, it was learned in past dialogs. Let EPs be the matching EP. It has one or more EPa used as answers in those dialogs. They are organized chronologically in the memory and have a weighting value that represents the frequency of the EPs - EPa usage. Any time the EP-memory has previously memorized EPs, the whole set of EPa is loaded and associated to the EP of the actual sentence. This EPs - EPa relation is known as an ER. Its weighting is updated according to the algorithm explained afterwards and is known as Yi. The structure of the ER-Memory is shown in Equation 2:

ERi = {<Yj, EPj> # {case-1, case-2, ...}}                                      (2)
where Yj represents the usage value of a particular ERi, EPj is the related EPa, and case-1, case-2 are sentences in Spanish with the EPj structure (we call them 'Use Cases'). Note that the ER-Memory collects a set of associations between EP cases, whereas the EP-Memory records sets of syntactic structures. When the ER-Memory is searched for a specific sentence, it actually looks for identical or similar EPj structures. The following algorithm describes the steps performed:

  derive EPk for the actual case (sentence) to be answered
  get R = ERi from the ER-memory with EPj = EPk
  if R = ɸ (empty result):
      insert a new ER(EPk, Y0, case) into the ER-memory
  else:
      get a random number S ∈ (0;1)
      set RR = the first ERj such that S >= Yj
      update Yj in ERj
      return RR
When RR is received by WIH, one of the set of cases (described as {case-1, case-2, ...}) previously recorded by the memory is used to build an answer to the actual sentence. The procedure for the answer construction is not within the scope of this paper. Regarding Yt, it is important to note that its value depends on a predefined learning curve determined by Equation 3:
Yt = 1 / (1 + exp(-a*b*t*(100*b - 1)))                                         (3)
In Equation 3, the parameter a represents the learning rate, b typifies the speed of training, and t is the time of the event. In particular, the value of t is imposed by a discrete virtual clock that ticks on every new EP matching case. The virtual clock ticks in the range from 0 to MAX-TICKS; when the clock reaches MAX-TICKS, it resets itself and starts all over again. The t value is a kind of aging value to control structure deprecation.
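The learning curve of Equation 3 can be computed directly; only the functional form comes from the text — the parameter values and the clock period below are placeholders.

```python
import math

def y_t(t, a, b):
    # Equation 3: sigmoid-shaped usage value driven by the virtual clock t
    return 1.0 / (1.0 + math.exp(-a * b * t * (100 * b - 1)))

MAX_TICKS = 1000                       # assumed clock period

def tick(t):
    return (t + 1) % MAX_TICKS         # the clock resets after reaching MAX-TICKS

# Illustrative values only: a = learning rate, b = speed of training
for t in (0, 10, 50):
    print(t, round(y_t(t, a=0.01, b=0.1), 4))
```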
4 MCT-Memory The MCT-Memory is a transient storage used to record every input sentence and its related knowledge (EP, ER and its weighting value). The main role of this memory is to make a quick coherent historical reconstruction of every dialog, keeping pointers to the original sources of data. As structures and vocabulary are assimilated, its content is cleaned.
5 Test Cases In this section, a set of sentences from 5 dialogs is processed. The ER, EP and MCT memories are statistically analyzed to assess and evaluate language usage and the ability to model the morphosyntactic content of sentences. The prototype was fed with 5 dialogs, each one with 8 to 11 sentences. The language used is Spanish. The total number of words is 224, with a mean of 1056 characters. Table 1 shows the number of linguistic category cases in the test database.

Table 1. Linguistic Categories Distribution

Verb                     317
Adjective                338
Adverb denoting place    320
Adverb denoting manner   314
Adverb denoting time     301
Article                  320
Demonstrative pronoun    328
Personal pronoun         321
Noun                     294
It is important to note that the linguistic categories processed by the prototype are just the ones in the table. Any other category is not processed. One of the original dialogs is shown in Figure 5, and Figure 6 shows its corresponding EP content.
Fig. 5. Dialog number three from the dataset
Fig. 6. EP of dialog number three
The dialogs were fed to the prototype, parsed and processed by the ER, EP and MCT memories. The resulting byte size changes are shown in Figure 7.
Dialog Structure Automatic Modeling
77
Fig. 7. Comparison between EP and ER. Size in bytes.
As can be seen, there is a similar growth in byte size for dialogs 1 to 4, but dialog 5 increases the memory size more than the rest of the cases. This is better visualized in Figure 8, which sketches rate changes in size.
Fig. 8. Comparison between accumulated EP and ER. Size in bytes.
It is interesting to observe in Table 2 that the number of sentences in the fifth dialog is the same as in the second one, but in this latter case the memory size growth is almost the minimum rate. This happens because the memory knowledge is related to the sentence content and not to the number of sentences. A second test was performed with the same dataset but fed in reversed order. The resulting size changes in the EP and ER memories are shown in Figure 9. Changes in rates are shown in Figure 10. This time the sizes are bigger than in the first case (see also Tables 4 and 5), even though the dialogs and contents are the same. This is because the knowledge acquired through experience at the beginning is greater in earlier steps, and there are few opportunities to reuse previous knowledge (in the first test the knowledge is acquired gradually, so it can be reused and the memory organization optimized).
Table 2. Increment in bytes with every dialog

dialog ID   number of    accumulated   growing (bytes)      size (bytes)
            sentences    sentences     EP      ER           EP      ER
1           10           10            755     899          755     899
2           11           21            505     698          1260    1597
3           9            30            798     584          2058    2181
4           10           40            502     778          2560    2959
5           11           51            1290    1239         3850    4198
Note that even though the dialogs are used in reverse sequence, the graphics are not symmetrical with those of the first test. That is because the relationship between knowledge acquisition and dialog number is not linear. In addition, the graphics show a curve representing a slower learning speed.
Fig. 9. Comparison between EP and ER. Size in bytes using dialogs in reverse order.
Fig. 10. Comparison between accumulated EP and ER. Size in bytes using dialogs in reverse order.
As a second interesting observation (see Tables 3 and 5), the percentages of size changes are distributed similarly to the first test: the second and fourth dialogs make less change than the rest. This is due to the resemblance between their structures and contents (both of them use several sentences that are identical and a few that make the dialog topic very different).

Table 3. Percentage of increment with every dialog

dialog ID   growing rate EP   growing rate ER
1           0                 0
2           0.66887417        0.77641824
3           1.58019802        0.83667622
4           0.62907268        1.33219178
5           2.56972112        1.59254499
Table 4. Increase in bytes upon every dialog fed in reverse order

dialog ID   number of    accumulated   growing (bytes)      size (bytes)
            sentences    sentences     EP      ER           EP      ER
5           11           51            1587    2027         1587    2027
4           10           40            512     604          2099    2631
3           9            30            788     1372         2887    4003
2           11           21            840     328          3727    4331
1           10           10            1188    1024         4915    5355
Table 5. Percentage of increment with every dialog fed in reverse order

dialog ID   growing rate EP   growing rate ER
5           0                 0
4           0.3226213         0.29797731
3           1.5390625         2.27152318
2           1.06598985        0.23906706
1           1.41428571        3.12195122
6 Conclusions The overall structure and approach of several WIH modules have been presented. The described modules take part in the prototype learning process and are complemented with other modules. The ER-Memory, EP-Memory and MCT are part of the automatic knowledge modeling. These modules do not require human interaction and just use information gathered from real dialogs processed by the system. The architecture
provides self-adapting modeling, since it does not involve predefined dictionaries nor other types of structures. It builds all the internal structures using general morphological and syntactical information contained in the processed sentences. From the tests performed, it was shown that the memory size indicates a certain relationship with the complexity of the sentences (which could be measured by the number of syntactic categories in each one) and with the topic (similar sentences but different topics make similar changes in memory size), but not with the number of sentences.
7 Future Work It remains to evaluate and tune several modules. One of the main pending changes is to make the Yt coefficients self-adaptive upon long-term EP usage. Even though deletion modules are implemented, it also remains to implement triggers for purge activities. Finally, multithreading is one of the main pending tasks.
References [1] Turing, A.: Computing Machinery and Intelligence. Mind 59, 433–460 (1950) [2] Weizenbaum, J.: ELIZA - A Computer Program for the Study of Natural Language Communication between Man and Machine. Communications of the Association for Computing Machinery 9, 36–45 (1966) [3] Weizenbaum, J.: Computer power and human reason. W.H. Freeman, San Francisco (1976) [4] Winograd, T.: Procedures as a Representation for Data in a Computer Program for Understanding Natural Language. Cognitive Psychology 3(1) (1972) [5] ALICEBOT, http://alicebot.blogspot.com/ [6] Colby, K.M., Hilf, F.D., Weber, S., Kraemer, J.: Turing-Like Indistinguishability Tests for the Validation of a Computer Simulation of Paranoid Processes. A.I. 3, 199–222 (1972) [7] Wallace, R.: The Elements of AIML Style. Alice AI Foundation (2003) [8] Manning, C., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999) [9] Mauldin, M.: Chatterbots, TinyMuds and The Turing Test: Entering The Loebner Prize Competition. In: AAAI 1994 (1994) [10] Corcho, O., López Cima, A., Gómez Pérez, A.: A Platform for the Development of Semantic Web Portals. In: ICWE 2006, Palo Alto, California, USA, ACM. New York (2006) [11] Corcho, O., Fernández-López, M., Gómez-Pérez, A., Vicente, O.: WebODE: an integrated workbench for ontology representation, reasoning and exchange. In: Gómez-Pérez, A., Benjamins, V.R. (eds.) EKAW 2002. LNCS (LNAI), vol. 2473, pp. 138–153. Springer, Heidelberg (2002) [12] http://webode.dia.fi.upm.es/WebODEWeb/index.html [13] Aguado de Cea, G., Álvarez de Mon y Rego, I., Pareja Lora, A.: Primeras aproximaciones a la anotación lingüístico-ontológica de documentos de la Web Semántica: Onto Tag. Inteligencia Artificial, Revista Iberoamericana de Inteligencia Artificial (17), 37–49 (2002)
[14] Aguado de Cea, G., Álvarez de Mon y Rego, I., Pareja Lora, A., Plaza Arteche, R.: RFD(S)/XML Linguistic Annotation of Semantic Web Pages. In: International Conference on Computational Linguistics. Proceedings of the 2nd workshop on NLP and XML, vol. 17, pp. 1–8 (2002) [15] Paroubek, P., Schabes, Y., Joshi, A.K.: XTAG - A Graphical Workbench for Developing Tree-Adjoining Grammars. In: Third Conference on Applied Natural Language Processing, Trento, Italy (1992) [16] Prószéky, G., Naszódi, M., Kis, B.: Recognition Assistance: Treating Errors in Texts Acquired from Various Recognition Processes. In: Proceedings of the 19th International Conference on Computational Linguistics, vol. 2, pp. 1–5 (2002) [17] López De Luise, M.D., Agüero, M.J.: Aplicabilidad de métricas categóricas en sistemas difusos. In: Jardini, J.J.A. (ed.) IEEE Latin America Magazine, vol. 5(1) (2007) [18] López De Luise, M.D.: A Morphosyntactical Complementary Structure for Searching and Browsing. In: Advances in Systems, Computing Sciences and Software Engineering, Proceedings of SCSS 2005. Springer, Netherlands (2005) [19] López De Luise, M.D.: A Metric for Automatic Word categorization. In: Advances in Systems, Computing Sciences and Software Engineering, Proc. Of SCSS 2007. Springer, Netherlands (2007) [20] López De Luise, M.D., Hisgen, D., Soffer, M.: Automatically Modeling Linguistic Categories in Spanish. In: CISSE 2009 (2009) [21] López De Luise, M.D., Soffer, M.: Automatic Text processing for Spanish Texts. In: ANDESCON 2008, Peru (2008) [22] López De Luise, M.D.: Ambiguity and Contradiction from a Morpho-Syntactic Prototype Perspective. In: Sobh, T., Elleithy, K. (eds.) Advances in Systems, Computing Sciences and Software Engineering, Proc. of SCSS 2007. Springer, Netherlands (2007) (aceptado para publicación)
A Probabilistic Model Based on n-Grams for Bilingual Word Sense Disambiguation Darnes Vilariño, David Pinto, Mireya Tovar, Carlos Balderas, and Beatriz Beltrán Faculty of Computer Science Benemérita Universidad Autónoma de Puebla, Mexico {darnes,dpinto,mtovar,bbeltran}@cs.buap.mx
Abstract. Word Sense Disambiguation (WSD) is considered one of the most important problems in Natural Language Processing. Even if the problem of WSD is difficult, when we consider its bilingual version, this problem becomes much more complex. In this case, it is necessary not only to find the correct translation, but this translation must consider the contextual senses of the original sentence (in a source language), in order to find the correct sense (in the target language) of the source word. In this paper we propose a model based on n-grams (3-grams and 5-grams) that significantly outperforms the last results that we presented at the cross-lingual word sense disambiguation task of the SemEval-2 forum. We use a naïve Bayes classifier for determining the probability of a target sense (in a target language) given a sentence which contains the ambiguous word (in a source language). For this purpose, we use a bilingual statistical dictionary, which is calculated with Giza++ by using the EUROPARL parallel corpus, in order to determine the probability of a source word being translated to a target word (which is assumed to be the correct sense of the source word but in a different language). As we mentioned, the results were compared with those of an international competition, obtaining a good performance. Keywords: Bilingual word sense disambiguation, Naïve Bayes classifier, Parallel corpus.
1 Introduction
Word Sense Disambiguation (WSD) is a task that has been studied for a long time. The aim of WSD is to select the correct sense of a given ambiguous word in some context. The fact that automatic WSD is still an open problem has motivated great interest in the computational linguistics community; therefore, many approaches have been introduced in recent years [1]. Different studies have demonstrated that many real applications may benefit from WSD (see, for instance, [2,3]).
This work has been partially supported by the CONACYT project #106625, as well as by the PROMEP/103.5/09/4213 grant.
instance, [2,3]). The selection of the appropriate sense for a given ambiguous word is commonly carried out by considering the words surrounding the ambiguous word. A very complete survey of several approaches may be found in [1]. As may be seen, a lot of work has been done on finding the best supervised learning approach for WSD (see, for instance, [4,5,6,7]); despite the wide range of learning algorithms, it has been noted that some classifiers, such as Naïve Bayes, are very competitive, and their performance basically relies on the representation schema and the feature selection process. Monolingual word sense disambiguation is known to be a difficult task; however, its cross-lingual version is much more complex. In this case, we need not only to find the correct translation, but this translation must also respect the contextual senses of the original sentence (in the source language) in order to find the correct sense (in the target language) of the source word. For the experiments carried out in this paper, we considered English as the source language and Spanish as the target language; thus, we attempted the bilingual version of WSD. We do not use an inventory of senses, as most WSD systems do. Instead, we attempt to find those senses automatically by means of a bilingual statistical dictionary, which is calculated on the basis of the IBM-1 translation model1, using the EUROPARL parallel corpus2. This bilingual statistical dictionary feeds a Naïve Bayes classifier in order to determine the probability of a sense given a source sentence which contains the ambiguous word. The manner in which we filter the content words of each sentence leads us to present three different approaches based on n-grams, whose performance is shown in this paper. Given a sentence S, we first consider its representation as a single |S|-gram. The second approach splits the sentence into 3-grams, each constrained to contain the ambiguous word. The third approach considers all the 5-grams extracted from the original sentence that again contain the ambiguous word. For each proposed approach, we obtain a candidate set of translations for the source ambiguous word by applying the probabilistic model on the basis of the selected n-grams. Notice that there are other works in the literature that have used parallel corpora (bilingual or multilingual) for dealing with the problem of WSD (see, for instance, [8,9]). However, unlike these approaches, in which the best sense is expected to be found in the same language (despite using other languages for training the learning model), in this case we are interested in finding the best translated word, i.e., the one with the correct sense in a different language. The rest of this paper is structured as follows. Section 2 presents the problem of bilingual word sense disambiguation. In Section 3 we define the probabilistic model used as classifier for the bilingual WSD task. The experimental results obtained with the two datasets used are shown in Section 4. Finally, the conclusions and further work are given in Section 5.
1 We used Giza++ (http://fjoch.com/GIZA++.html).
2 http://www.statmt.org/europarl/
2
Bilingual Word Sense Disambiguation
Word sense disambiguation is an important task in multilingual scenarios due to the fact that the meanings represented by an ambiguous word in one source language may be represented by multiple words in another language. Consider the word "bank", which may have up to 42 different meanings3. If we select one of these meanings, let us say put into a bank account (to bank), the corresponding meaning in other languages would be to make a deposit. In Spanish, for instance, you would never say She banks her paycheck every month (Ella bankea su cheque cada mes), but She deposits her paycheck every month (Ella deposita su cheque cada mes). Therefore, the ability to disambiguate a polysemous word from one language to another is essential to the task of machine translation and to those Natural Language Processing (NLP) tasks related to it, such as cross-lingual lexical substitution [10]. In the task of bilingual word sense disambiguation we are required to obtain those translations of a given ambiguous word that match the original word sense. In the following example, the input is a sentence with one polysemous word to be disambiguated; the expected results are also given.
Input sentence: ... equivalent to giving fish to people living on the bank of the river ... [English]
Output sense labels:
Sense Label = {oever/dijk} [Dutch]
Sense Label = {rives/rivage/bord/bords} [French]
Sense Label = {Ufer} [German]
Sense Label = {riva} [Italian]
Sense Label = {orilla} [Spanish]
A bilingual WSD system should be able to find the corresponding translation of "bank" in the target language with the same sense. In order to approach this problem we propose the use of a probabilistic model based on n-grams. This proposal is discussed in the following section.
3
A Naïve Bayes Approach to Bilingual WSD
We have approached the bilingual word sense disambiguation task by means of a probabilistic system based on Naïve Bayes, which considers the probability of a word sense (in the target language) given a sentence (in the source language) containing the ambiguous word. We calculated the probability of each word in the source language being associated with, i.e., translated into, the corresponding word in the target language. The probabilities were estimated by means of a bilingual statistical dictionary calculated using the Giza++ system over the
3 http://ardictionary.com/Bank/742
EUROPARL parallel corpus. We filtered this corpus by selecting only those sentences which included some of the senses of the ambiguous word; these senses were obtained by translating the ambiguous word with the Google search engine. We start this section by explaining the manner in which we represent the source sentences (n-grams) in order to approach the bilingual word sense disambiguation problem. We then discuss the particularities of the general approach for each task evaluated. 3.1
The n-Grams Model
In order to represent the input sentence we considered a model based on n-grams. In the experiments presented in this paper, we considered three different approaches, described as follows. Given a sentence S, the first approach uses its representation as a single |S|-gram. The second approach splits the sentence into 3-grams, each constrained to contain the ambiguous word. The third approach considers all the 5-grams extracted from the original sentence that, again, contain the ambiguous word. Consider the following example for the ambiguous word execution and its pre-processed version, which was obtained just by eliminating punctuation symbols and stop words (no other pre-processing step was performed):
Input sentence: Allegations of Iraqi army brutality, including summary executions and the robbing of civilians at gun-point for food, were also reported frequently during February.
Pre-processed input sentence: Allegations Iraqi army brutality including summary executions robbing civilians gun-point food reported frequently during February
n-gram model:
|S|-gram: Allegations Iraqi army brutality including summary executions robbing civilians gun-point food reported frequently during February
3-grams: {including, summary, executions}, {summary, executions, robbing}, {executions, robbing, civilians}
5-grams: {army, brutality, including, summary, executions}, {brutality, including, summary, executions, robbing}, {including, summary, executions, robbing, civilians}, {summary, executions, robbing, civilians, gun-point}, {executions, robbing, civilians, gun-point, food}
For each of the proposed n-gram sentence representations, we obtain a candidate set of translations for the source ambiguous word by applying one probabilistic model on the basis of the selected n-grams; a small sketch of the window extraction is shown below. See the following section for further details.
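As an illustration of this windowing (our own sketch in Python, not the authors' implementation; the function and variable names are ours), the following fragment extracts the 3-gram and 5-gram windows that contain the ambiguous word from the pre-processed sentence above:

def ngram_windows(tokens, target, n):
    # Return every contiguous window of n tokens that contains the target word.
    windows = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if target in window:
            windows.append(window)
    return windows

sentence = ("Allegations Iraqi army brutality including summary executions "
            "robbing civilians gun-point food reported frequently during February").split()
print(ngram_windows(sentence, "executions", 3))  # the three 3-grams listed above
print(ngram_windows(sentence, "executions", 5))  # the five 5-grams listed above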
3.2
The Probabilistic Model
Given an English sentence S_E, we consider its representation based on n-grams as discussed in the previous section. Let S = {w_1, w_2, ..., w_k, ..., w_{k+1}, ...} be the n-gram representation of S_E obtained by bringing together all the n-grams, where w_k is the ambiguous word. Let us consider N candidate translations {t_1^k, t_2^k, ..., t_N^k} of w_k, obtained somehow (we discuss this issue further below). We are interested in finding the most probable candidate translations for the polysemous word w_k. Therefore, we may use a Naïve Bayes classifier which considers the probability of t_i^k given w_k. A formal description of the classifier is given as follows:

p(t_i^k \mid S) = p(t_i^k \mid w_1, w_2, \ldots, w_k, \ldots)   (1)

p(t_i^k \mid w_1, w_2, \ldots, w_k, \ldots) = \frac{p(t_i^k)\, p(w_1, w_2, \ldots, w_k, \ldots \mid t_i^k)}{p(w_1, w_2, \ldots, w_k, \ldots)}   (2)

We are interested in finding the argument that maximizes p(t_i^k | S); therefore, we may avoid calculating the denominator. Moreover, if we assume that all the different translations are equally distributed, then Eq. (2) can be approximated by Eq. (3):

p(t_i^k \mid w_1, w_2, \ldots, w_k, \ldots) \approx p(w_1, w_2, \ldots, w_k, \ldots \mid t_i^k)   (3)

The complete calculation of Eq. (3) would require applying the chain rule. However, if we assume that the words of the sentence are independent, then we may rewrite Eq. (3) as Eq. (4):

p(t_i^k \mid w_1, w_2, \ldots, w_k, \ldots) \approx \prod_{j=1}^{|S|} p(w_j \mid t_i^k)   (4)

The best translation is obtained as shown in Eq. (5). Regardless of the position of the ambiguous word, we only consider a product of the translation probabilities; Algorithm 1 provides implementation details.

\mathrm{BestSense}(S) = \arg\max_{t_i^k} \prod_{j=1}^{|S|} p(w_j \mid t_i^k), \quad i = 1, \ldots, N   (5)

With respect to the N candidate translations {t_1^k, t_2^k, ..., t_N^k} of the polysemous word w_k, we used the Google translator4. Google provides all the possible translations for w_k with the corresponding grammatical category; therefore, we are able to use those translations that match the grammatical category of the ambiguous word. Even though we attempted other approaches, such as selecting the most probable translations from the statistical dictionary, we confirmed that the Google online translator gives the best results. We consider that this is because Google has a better language model than ours, since our bilingual statistical dictionary was trained only with the EUROPARL parallel corpus.
4 http://translate.google.com.mx/
Algorithm 1. A Naïve Bayes approach to bilingual WSD
Input: A set Q of sentences, Q = {S_1, S_2, ...}; Dictionary = p(w|t): a bilingual statistical dictionary
Output: The best word/sense for each ambiguous word w_j ∈ S_l
1  for l = 1 to |Q| do
2    for i = 1 to N do
3      P_{l,i} = 1
4      for j = 1 to |S_l| do
5        foreach w_j ∈ S_l do
6          if w_j ∈ Dictionary then
7            P_{l,i} = P_{l,i} * p(w_j | t_i^k)
8          else
9            P_{l,i} = P_{l,i} * ε   (a small smoothing value for out-of-dictionary words)
10         end
11       end
12     end
13   end
14 end
15 return arg max_{t_i^k} ∏_{j=1}^{|S|} p(w_j | t_i^k)
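A compact Python rendering of Algorithm 1 is given below. This is our own illustrative sketch, not the authors' code: the dictionary of translation probabilities and the smoothing constant EPS stand in for the Giza++ bilingual lexicon and the unspecified smoothing value of line 9.

EPS = 1e-6  # assumed smoothing value for words missing from the dictionary

def best_translation(sentence_words, candidates, dictionary):
    # dictionary maps (source_word, target_word) -> p(source_word | target_word),
    # e.g., as estimated from a Giza++ bilingual lexicon.
    best, best_score = None, 0.0
    for t in candidates:
        score = 1.0
        for w in sentence_words:
            score *= dictionary.get((w, t), EPS)  # product of translation probabilities, Eq. (5)
        if score > best_score:
            best, best_score = t, score
    return best

# Toy usage with made-up probabilities for the ambiguous word "bank":
toy_dict = {("bank", "orilla"): 0.4, ("river", "orilla"): 0.3,
            ("bank", "banco"): 0.5, ("river", "banco"): 0.01}
print(best_translation(["bank", "river"], ["orilla", "banco"], toy_dict))  # -> orilla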
In Figure 1 we may see the complete process by which we approach the problem of bilingual WSD. The experimental results of the different sentence representations based on n-grams for bilingual word sense disambiguation are given in the following section.
4
Experimental Results
In this section we present the obtained results for the bilingual word sense disambiguation task. We first describe the corpus used in the experiments and, thereafter, we present the evaluation of the three different sentence representations based on n-grams. 4.1
Datasets
For the experiments we used 25 polysemous English nouns. We selected five nouns (movement, plant, occupation, bank, and passage), each with 20 example instances, to form a development corpus. The remaining twenty polysemous nouns were used as the test corpus; in this case, we used 50 instances per noun. A list of the ambiguous nouns of the test corpus is given in Table 1. Notice that this corpus does not contain a sense repository, since the task requires finding the most probable translation (with the correct sense) of a given ambiguous word.
Fig. 1. An overview of the presented approach for bilingual word sense disambiguation

Table 1. Test set for the bilingual WSD task (noun names)
coach, education, execution, figure, post, pot, range, job, rest, ring, mood, soil, strain, match, scene, test, letter, paper, side, mission
4.2
Evaluation of the n-Gram Based Sentence Representation
In Table 2 we may see the results obtained with the different versions of the n-gram sentence representation when we evaluated the model with the corpus presented in Table 1. The runs are labeled as follows:
3-gram: a representation of the sentence based on trigrams.
5-gram: a representation of the sentence based on 5-grams.
|S|-gram: a sentence representation based on a single n-gram of length |S|.
In order to observe the performance of the proposed approaches, we show in the same table the results obtained by other systems at the SemEval-2 competition. A simple comparison shows that two of the proposed sentence representations outperform the rest of the approaches.
Table 2. Evaluation of the bilingual word sense disambiguation task - five best translations (oof)
System name   Precision (%)   Recall (%)
3-gram        70.36           70.36
5-gram        54.81           54.81
UvT-WSD1      42.17           42.17
UvT-WSD2      43.12           43.12
|S|-gram      40.76           40.76
UHD-1         38.78           31.81
UHD-2         37.74           31.30
ColEur2       35.84           35.46
Fig. 2. Precision obtained for each ambiguous word of the test set with the different n-gram representations
By observing the behaviour of precision over the different ambiguous words (see Figure 2), we can get a picture of the significant improvement that may be reached when representing the sentence with 3-grams. We consider that, again, the hypothesis of Harris5 [11] is confirmed: the closer the words are to the polysemous one, the more useful they are for disambiguating it. In Figure 2 we may also see that some words are easier to disambiguate (e.g., soil) than others (e.g., mood). For research purposes, we also consider it important to focus further investigation on those words that are hard to disambiguate.
5 Words with similar syntactic usage have similar meaning.
5
Conclusions and Further Work
Bilingual word sense disambiguation is the task of obtaining those translations of a given ambiguous word that match the original word sense. Different approaches have been presented at evaluation forums for dealing with this particular problem. In this paper we proposed a model based on n-grams (3-grams and 5-grams) that significantly outperforms the results we previously presented at the cross-lingual word sense disambiguation task of the SemEval-2 forum. We use a Naïve Bayes classifier to determine the probability of a target sense (in the target language) given a sentence which contains the ambiguous word (in the source language). For this purpose, we use a bilingual statistical dictionary, calculated with Giza++ over the EUROPARL parallel corpus, to determine the probability of a source word being translated into a target word (which is assumed to be the correct sense of the source word, but in a different language). In order to represent the input sentence we considered a model based on n-grams. For each of the proposed n-gram sentence representations, we obtain a candidate set of translations for the source ambiguous word by applying one probabilistic model on the basis of the selected n-grams. As mentioned, the results were compared with those of an international competition, obtaining a very good performance.
References
1. Aguirre, E., Edmonds, P.: Word Sense Disambiguation. Text, Speech and Language Technology. Springer, Heidelberg (2006)
2. Chan, Y., Ng, H., Chiang, D.: Word sense disambiguation improves statistical machine translation. In: Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pp. 33–40 (2007)
3. Carpuat, M., Wu, D.: Improving statistical machine translation using word sense disambiguation. In: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 61–72 (2007)
4. Florian, R., Yarowsky, D.: Modeling consensus: Classifier combination for word sense disambiguation. In: Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, pp. 25–32 (2002)
5. Lee, Y.K., Ng, H.T.: An empirical evaluation of knowledge sources and learning algorithms for word sense disambiguation. In: Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, pp. 41–48 (2002)
6. Mihalcea, R.F., Moldovan, D.I.: Pattern learning and active feature selection for word sense disambiguation. In: Proceedings of the Second International Workshop on Evaluating Word Sense Disambiguation Systems (SENSEVAL-2), pp. 127–130 (2001)
7. Yarowsky, D., Cucerzan, S., Florian, R., Schafer, C., Wicentowski, R.: The Johns Hopkins SENSEVAL-2 system descriptions. In: Proceedings of the Second International Workshop on Evaluating Word Sense Disambiguation Systems (SENSEVAL-2), pp. 163–166 (2001)
8. Ng, H.T., Wang, B., Chan, Y.S.: Exploiting parallel texts for word sense disambiguation: An empirical study. In: Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pp. 455–462 (2003)
9. Alonso, G.B.: Spanish word sense disambiguation with parallel texts (in Spanish: Desambiguación de los sentidos de las palabras en español usando textos paralelos). PhD thesis, Instituto Politécnico Nacional, Centro de Investigación en Computación (2010)
10. Sinha, R., McCarthy, D., Mihalcea, R.: SemEval-2010 task 2: Cross-lingual lexical substitution. In: Proceedings of the NAACL HLT Workshop on Semantic Evaluations: Recent Achievements and Future Directions, Association for Computational Linguistics, pp. 76–81 (2009)
11. Harris, Z.: Distributional structure. Word 10(23), 146–162 (1954)
Information Retrieval with a Simplified Conceptual Graph-Like Representation
Sonia Ordoñez-Salinas1 and Alexander Gelbukh2
1 Universidad Distrital F.J.C. and Universidad Nacional, Colombia
2 Center for Computing Research (CIC), National Polytechnic Institute (IPN), Mexico
[email protected],
[email protected]
Abstract. We argue that taking into account semantic relations between words in the text can improve information retrieval performance. We implemented the information retrieval process with simplified Conceptual Graph-like structures and compared the results with those of the vector space model. Our semantic representation, combined with a small simplification of the vector space model, gives better results. In order to build the Conceptual Graph-like representation, we developed a grammar based on the dependency formalism and on the standard defined for Conceptual Graphs (CG). We used noun pre-modifiers and post-modifiers, as well as verb frames extracted from VerbNet, as a source of definitions of semantic roles. VerbNet was chosen since its definitions of semantic roles have much in common with the CG standard. We experimented on a subset of the ImageClef 2008 collection of titles and annotations of medical images. Keywords: Information Retrieval, Conceptual Graph, Dependency Grammar.
1 Introduction
The language used in the medical literature, as well as in other domains, has its own grammatical peculiarities concerning the usage of noun phrases and terminology. For processing this type of language, it is preferable to use structures that allow representing semantic relations between words. Conceptual Graph structures retain the relations between words; however, it is difficult to transform natural language text into Conceptual Graph structures. We present a method for transforming text into a simplified conceptual graph-like structure, close to a syntactic dependency structure. As a case study, we used these structures in the information retrieval process and found that they improve the retrieval performance compared with the standard vector space model as a baseline. Our procedure for transforming text into simplified Conceptual Graphs (CG) is based on an adapted grammar, which we manually built for this purpose. This grammar is based on two elements: the construction of concept nodes, usually noun phrases, and the assignment to them of specific roles defined by the Conceptual Graph standards. We tested our method on a collection of annotations of medical images, ImageClef 2008 [22].
The paper is organized as follows. In Section 2, we explain our motivation. In Section 3, we discuss the importance of CGs and their advantages as a computational structure, and briefly present the state of the art both in knowledge representation structures and in automatic parsing of natural language into Conceptual Graphs. In Section 4, we describe how we build our simplified conceptual graph-like structure, and in Section 5 how it is used for information retrieval. Section 6 describes the experimental methodology and the experimental results. Finally, Section 7 concludes the paper and presents future work.
2 The Problem
In medical science, information systems are very important for managing information on human health conditions. However, medical information is presented in natural language. People working in medical institutions choose different words when filling in forms; it is difficult to standardize all the vocabulary related to health. In order to be able to analyze medical information, retrieve necessary details, or answer specific queries, natural language processing methods have to be applied. It is desirable that these methods represent natural language information in computational structures. There are several computational structures used in natural language processing. Simple structures like the bag of words, which work only on words without preserving their relations, make processing easier; however, they lose the semantics of the text. Other structures, like Conceptual Graphs, can preserve many semantic details, but they are complex to manage and process, as well as difficult to obtain. In this paper we use a simpler structure than full standard Conceptual Graphs and show that it is still useful in an applied task, namely, the information retrieval task.
3 Related Work
In this section, we give a brief overview of computational structures, Conceptual Graphs (CGs) in particular, and summarize the state of the art in the automatic transformation of natural language texts into CGs.
3.1 Computational Structures for NLP
There are many computational structures used in natural language processing (NLP).
Statistical Computational Structures. These range from the simplest structures, such as the bag of words and the vector space model, to more complex structures, such as graphs or trees. The ones most frequently mentioned in the literature are vector space structures and graphs. The vector space model [32] is simple and the most common in practice. This technique consists in extracting words from texts (tokenization), removing stop words, and reducing the dimensionality (e.g., by stemming). The documents are represented as vectors, where each word is a feature (coordinate) and its value is either the frequency of the given word in the text or a binary indicator of the presence or absence of this word in the text.
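As a minimal illustration of this baseline representation (our own Python sketch, not code from the cited works), the following fragment builds term-frequency vectors and compares two texts with the cosine measure:

import math
from collections import Counter

def tf_vector(text, stopwords=frozenset()):
    # Tokenize on whitespace, drop stop words, and count term frequencies.
    return Counter(w.lower() for w in text.split() if w.lower() not in stopwords)

def cosine(v1, v2):
    num = sum(v1[w] * v2[w] for w in set(v1) & set(v2))
    den = math.sqrt(sum(f * f for f in v1.values())) * math.sqrt(sum(f * f for f in v2.values()))
    return num / den if den else 0.0

doc = tf_vector("dynamic CT scans of the abdomen", stopwords={"of", "the"})
query = tf_vector("abdomen CT")
print(cosine(doc, query))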
The documents can be represented as graphs in many ways [34]. The methods used for this are classified into standard, single, n-distance, n-single distance, absolute distance, and relative frequency. Each method determines the terms and adjacencies used as the vertices and arcs of the graph, respectively. Rege et al. [33] describe many forms of representing documents as graphs, and Badia and Kantardzic [2] propose a methodology for the construction of graphs via statistical learning. Graphs are widely used in natural language processing, for example, in question answering [26], text classification [4, 10], named entity recognition [5], or information representation [3, 31] (in combination with vector space techniques). In other cases the documents are represented by probabilistic functions [25] or by composite probabilistic functions on words [12, 35].
Linguistic Computational Structures. To represent linguistic knowledge, grammar structures are most commonly used. Grammar structures include the syntactic structure, morphology, and other linguistic details. Morphology provides the part of speech of each word in the text, as well as its dictionary form. Syntax describes the relations among words. In general, these structures are based on a grammar or a set of structural rules which are language-specific and depend on the syntactic theory: for example, dependency grammar, link grammar, or constituent-based (phrase structure) grammar. A structure based on dependency grammar is determined by the relation between a word which functions as the head and the other words that depend on it. In this structure, the order of the words or their position in the sentence does not matter [39]. Link grammar builds undirected relations between pairs of words [36]. In constituent-based grammar, runs of adjacent words that express a text unit are determined and labeled with a constituent category, such as Noun Phrase (NP), Verb Phrase (VP), Noun, etc. This technique builds a tree by an iterative process of segmentation and grammatical classification of runs of words in the sentence.
Knowledge Representation Languages. These languages are designed to describe semantics and concepts. Examples of this type of language are the Frame Representation Language (FRL) and Description Logics (DL) [7]. The former (FRL) is defined with a meta-language and is based on frames. Frames are oriented to the recognition of objects and classes; they have names and features, numeric or otherwise. FRL relies heavily on inheritance mechanisms.
Conceptual Structures. The structures of this category, unlike the statistically oriented structures, are intended for knowledge representation. As examples we can mention Semantic Networks, Conceptual Graphs, the Knowledge Interchange Format (KIF), the Resource Description Framework (RDF) of the World Wide Web Consortium (W3C), the ontology language OWL (Web Ontology Language) of the W3C [9], and Common Logic (CL). RDF is a language for representing information about Web resources; it represents documents' metadata such as the title, author, and date. OWL is a markup language for publishing and sharing ontologies on the Web. A semantic network [13, 37] is a mathematical model that shares many features with Conceptual Graphs.
3.2 Conceptual Graphs and Their Representation
Conceptual Graphs (CGs) for representing text were introduced by Sowa [37]. They are bipartite digraphs with two types of nodes: concepts and relations. A relation node indicates the semantic role of the incident concepts. Since CGs are semantically very rich, they are suitable for knowledge representation, including knowledge bases and ontologies. There are relatively few works, however, aimed at the construction of CGs. Three trends can be mentioned: (1) methodologies for manual development of CGs; (2) automatic transformation of natural language text into CGs using deterministic approaches; and (3) automatic transformation using statistical approaches.
Deterministic Automatic Transformation. In one of his pioneering works, Sowa [38] proposed a procedure to build Conceptual Graphs based on four elements: (a) type labels (concepts and relations); (b) canonical graphs, which connect relation and concept nodes with their restrictions; (c) type definitions: some concepts and relations are defined with primitives, while others can be defined by lambda abstractions; (d) schemas: a concept type may have one or more schemas that specify the corresponding knowledge. Other works present step-by-step construction of each element of the graphs. Hernández Cruz [18] presents a converter of Spanish text into Conceptual Graphs, based on a previous syntactic analysis. Amghar et al. [1] describe how to convert French texts into Conceptual Graphs using cognitive patterns. In the medical context, Rassinoux et al. [28, 29] generate annotations for the text, which they use to construct CGs. Reddy et al. [30] present an implementation of a CG-like data structure. Conceptual graphs serve as knowledge representation in the systems LEAD (Learning Expert system for Agricultural Domain) and XLAR (Universal Learning Architecture); there, CGs are constructed via frames, which represent features of objects. Castro-Sanchez and Sidorov [8] extract semantic role and valency information from human-oriented dictionaries. Hensman et al. [15, 16, 17] use WordNet and VerbNet for identifying the semantic roles. All documents are converted into XML format and then parsed with Charniak's probabilistic parser, which produces trees in the Penn Treebank-style formalism based on constituent grammar. Then the roles are identified using VerbNet: for each clause in the sentence, the main verb is identified and a sentence pattern is built using the parse tree; for each verb in the sentence, they extract all possible semantic frames from VerbNet, taking into account the constraints of the roles in VerbNet.
Statistical Automatic Transformation. Hensman [14] first transforms documents into Extensible Markup Language (XML) and then identifies semantic roles using VerbNet, WordNet, and a parser. Barrière and Barrière [6] describe the construction of CGs using tagged words and a parser, and then disambiguate the CGs; for transforming the grammatical rules into CGs they use heuristic methods. Other researchers use link grammar [19, 40] or dependency grammar [21]; in the latter work the authors use supervised learning to classify concepts, relations, and structures.
4 Building the Structure
We used a simplified structure, which is basically a syntactic structure minimally adapted to the semantics represented in conceptual graphs. For the assignment of semantic roles, we used the verb lexicon VerbNet [20]. It is organized into verb classes; each class contains a set of syntactic descriptions that include a verb and the elements that depend on it, along with their semantic roles. Based on this information, we built a dependency grammar, which included verb classifications, their syntactic descriptions, and frame descriptions. Table 1 shows a sample of the rules; a small sketch of how such rules can be read is given after the table. The first rule has the roles agent and theme, the class of the verb is V_ACCOMPANY-51-7, and LIS_NP corresponds to a list of noun phrases. The @ in a rule marks the head; this allows producing a dependency tree using a context-free parser. The elements within square brackets are optional.

Table 1. Example of alternative rules for the non-terminal SENTENCE
agent:LIS_NP @:V_ACCOMPANY-51-7 theme:LIS_NP
agent:LIS_NP @:V_ACCOMPANY-51-7 theme:LIS_NP [spatial] destination:LIS_NP
actor1:LIS_NP @:V_ACQUIESCE-95 [DEST_DIR] actor2:LIS_NP
agent:LIS_NP @:V_ADDICT-96 patient:LIS_NP
agent:LIS_NP @:V_ADDICT-96 patient:LIS_NP [DEST_DIR] stimulus:LIS_NP
agent:LIS_NP @:V_ADDICT-96 patient:LIS_NP [DEST_DIR] stimulus:LIS_NP
agent:LIS_NP @:V_ADJUST-26-9 patient:LIS_NP
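To make the rule format concrete, here is a small Python sketch (ours, not part of the parsing tool used in the paper; the rule strings follow Table 1) that splits a rule into its head verb class and its role-labeled dependents:

def parse_rule(rule):
    # Split a grammar rule into (head_class, [(role, category, optional), ...]).
    head, dependents = None, []
    for token in rule.split():
        optional = token.startswith("[") and token.endswith("]")
        token = token.strip("[]")
        if ":" in token:
            left, right = token.split(":", 1)
            if left == "@":
                head = right  # @ marks the head of the rule
            else:
                dependents.append((left, right, optional))
        else:
            dependents.append((token, None, optional))  # bare marker such as [spatial]
    return head, dependents

print(parse_rule("agent:LIS_NP @:V_ACCOMPANY-51-7 theme:LIS_NP [spatial] destination:LIS_NP"))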
For parsing with the obtained grammar, we used the parsing tool [11] developed in the Natural Language Processing Laboratory, CIC-IPN, available from nlp.cic.ipn.mx/tools/parser. The tool produces a dependency tree labeled with the semantic roles indicated in the grammar. Note that in our case the grammar was specially designed so that the obtained trees resemble CGs, with semantic roles as the labels on their arcs. In the future we expect to add a more elaborate post-processing of the dependency trees to better approximate semantic graph structures.
5 Information Retrieval with Simplified Conceptual Graphs
Since we represent the documents and the queries as graphs, the main issue for an information retrieval application is the similarity measure between two graphs. The system produces a ranking of documents for a given query according to this similarity measure between the query and each document. We measure the similarity between two graphs G1 and G2 as the relative size of their maximum overlap, i.e., of their maximum common sub-graph. To find the maximum common sub-graph, we build all maximal common sub-graphs and then choose the largest one. To find the maximal common sub-graphs, we use the following procedure. A vertex mapping between two labeled graphs G1 and G2 is a one-to-one correspondence ϕ : S1 ↔ S2, where Si is a subset of the vertices of Gi, such that the labels on the corresponding vertices (which in our case are the stems of the corresponding words)
coincide. We require the corresponding subsets S1 and S2 to be maximal in the sense that no supersets of them can be mapped. For example, between a fat cat sat on a mat and a fat dog slept and a fat cat slept and a fat dog sat on a mat, the first fat of the first sentence can be mapped to either the first or the second occurrence of fat in the second sentence, and then the second fat is mapped to the other occurrence; similarly, there are six possible mappings of the a's, which gives 12 possible mappings in total. Either one of the isomorphic sets S1 ≅ S2 is the vertex set of a maximal common sub-graph. The arcs of this common sub-graph are those arcs that are present, with the same labels, in both graphs between the corresponding vertices of S1 and S2, i.e., arcs u --x--> v with u, v ∈ S1 in G1 such that ϕ(u) --x--> ϕ(v), with ϕ(u), ϕ(v) ∈ S2, is an arc in G2. This completes the construction of a maximal common sub-graph G12. We score a maximal common sub-graph very similarly to the standard vector similarity score, but combining the counts for words and relations separately:

\mathrm{sim}(d_1, d_2) = \frac{\alpha \sum_{w} \mathrm{idf}_w + \beta \sum_{r} \mathrm{idf}_r}{\exp\left( \alpha \log \sqrt{\sum_w f_{w,1}^2 \sum_w f_{w,2}^2} + \beta \log \sqrt{\sum_r f_{r,1}^2 \sum_r f_{r,2}^2} \right)}   (1)
where w runs over the mapped vertices, that is, the words in common between the two documents (the nodes of G12); r runs over the arcs (relations) present in both graphs between the corresponding vertices (the arcs of G12); and idf is the standard inverse document frequency measure, calculated both for vertices and for arcs. The frequency for an arc is measured over triples consisting of the label of the source vertex, the label of the relation, and the label of the target vertex; for example, love --agent--> John is a unit for counting the idf. The denominator is the standard vector space model normalizing factor, modified to reflect both vertices (words) and arcs (relations). In fact, we found that for this dataset it is better not to include this denominator; see below. Finally, α and β are importance weights given to the intersection of the words and of the arcs, respectively; see below. In fact, only the ratio α/β matters, so only one of the two parameters can be chosen independently. We consider all possible vertex mappings (maximal common sub-graphs G12) between G1 and G2; the best score over all mappings is taken as the similarity measure between the two documents.
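The following Python sketch (our own simplified illustration, not the authors' implementation) scores one fixed vertex mapping with the un-normalized variant of Eq. (1); the enumeration of all maximal mappings is omitted for brevity, and the idf tables and the weights alpha, beta are assumed inputs:

def overlap_score(words1, words2, arcs1, arcs2, idf_w, idf_r, alpha=1.0, beta=0.1):
    # Numerator of Eq. (1): weighted idf mass of the shared words and shared arcs.
    # words*: sets of stemmed words; arcs*: sets of (source, relation, target) triples.
    shared_words = words1 & words2
    shared_arcs = arcs1 & arcs2
    word_part = sum(idf_w.get(w, 0.0) for w in shared_words)
    arc_part = sum(idf_r.get(a, 0.0) for a in shared_arcs)
    return alpha * word_part + beta * arc_part

# Toy example with made-up idf values:
d1_words, d1_arcs = {"love", "john"}, {("love", "agent", "john")}
d2_words, d2_arcs = {"love", "john", "mary"}, {("love", "agent", "john")}
idf_w = {"love": 1.2, "john": 2.0, "mary": 2.3}
idf_r = {("love", "agent", "john"): 3.1}
print(overlap_score(d1_words, d2_words, d1_arcs, d2_arcs, idf_w, idf_r))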
6 Experimental Results We experimented with both the proposed representation and with the usual vector space model as a baseline. Dataset. As the test collection we used a subset of annotated collection of medical images of ImageClef 2008, only using the title of the image and its annotation, but not the image itself. By joining the title and annotation of each image, we obtained a collection of 67115 records. We only considered the documents that contained any text, and ignored the documents that only contained an image.
98
S. Ordoñez-Salinas and A. Gelbukh
Of these documents, due to time limitations, we only experimented with a subset of the first 1,000 documents (from 0000003 to 0001684) and of the first 15,603 documents (from 0000003 to 0026687). We used 9 queries, namely queries 22 to 30, because these queries are intended to be answered not only by analyzing the image but also the textual part of the collection. The sample of 1,000 documents contained 160 relevant answers to all queries (counting twice the same answer to two different queries), and the sample of 15,603 documents contained 1,187 relevant answers. The collection consists of very short documents and even shorter queries. For example, query 25 reads "Merkel cell carcinoma," and the first document marked in the collection as relevant for this query is document 79: "Eight single-level dynamic CT scans (A H) of the abdomen of a 32-year-old woman with abdominal pain. Scans were obtained during injection of 150 mL of nonionic contrast medium (iohexol) at 5.0 mL/sec. Scans show that the pancreas reaches peak enhancement before the liver. Effect of injection rate of contrast medium on pancreatic and hepatic helical CT". Our initial hypothesis was that for such short texts the usual vector model may prove to be inaccurate and additional semantic information would be useful.
Building Conceptual Graphs. We developed an English grammar with the peculiarities described above. In addition to the usual syntactic structure, the grammar includes thematic roles, such as agent or attribute. These roles were taken from FrameNet and were selected on a lexical basis, for each verb individually. This gave us a more semantically oriented analysis than a general-purpose grammar that only uses morphosyntactic information. The grammar recognizes all the words that occur in this collection. To include the words for which we did not find morphosyntactic information in WordNet, we used the UMLS tool [24] to determine their part of speech. The labels of the nodes (words) in the graphs were obtained with the Porter stemmer [27]; the labels of the arcs were specified in the grammar.
Performance Measure. To evaluate our system against the gold standard, we used the Mean Average Precision [23] measure. This is a one-number measure defined as
\mathrm{MAP} = \frac{1}{Q} \sum_{q=1}^{Q} \frac{1}{m_q} \sum_{d=1}^{m_q} P_{qd},   (2)
where Q is the total number of queries (9 in our case), m_q is the number of relevant documents for query q, and P_{qd} is the precision on the set of documents ranked by the system, for query q, higher than or equal to document d. The summation is over all relevant documents for the given query. The ranking (ordering) was defined by the scores calculated according to (1). However, many documents were scored equally, so the ranking was ambiguous. For the purpose of calculating the precision in (2), we used the following formula:

P_{qd} = \frac{P(R^{>}_{qd}) + P(R^{\geq}_{qd})}{2},   (3)
where P denotes precision, R^{>}_{qd} is the set of documents scored higher than d, and R^{≥}_{qd} is the set of documents scored higher than or equal to d.
Baseline: the Vector Space Model. To build the vector representation of the documents, we used wordInd of the Lexical Tools of UMLS [24] for tokenization and the Porter stemmer [27] for stemming. The vector coordinates were the frequencies (not binary values), and the similarity measure was the cosine. To illustrate the behavior of the collection, we show in Figure 1 the precision and recall on each query (query numbers from 21 to 30) for binary retrieval with the thresholds cosine > 0.5 and cosine > 0.
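Before turning to the baseline figures, the following Python sketch gives a concrete reading of the evaluation measure in Eqs. (2)–(3). It is our own illustration with assumed data structures (a score per document and a set of relevant documents per query); the handling of empty sets is our convention, not specified in the paper.

def precision(docs, relevant):
    return len(docs & relevant) / len(docs) if docs else 0.0

def tie_aware_map(scores_per_query, relevant_per_query):
    # scores_per_query[q]: dict doc -> score; relevant_per_query[q]: set of relevant docs.
    ap_values = []
    for q, scores in scores_per_query.items():
        relevant = relevant_per_query[q]
        per_doc = []
        for d in relevant:
            s = scores.get(d, 0.0)
            higher = {x for x, v in scores.items() if v > s}            # R>_{qd}
            higher_or_equal = {x for x, v in scores.items() if v >= s}  # R>=_{qd}
            per_doc.append((precision(higher, relevant) + precision(higher_or_equal, relevant)) / 2)
        ap_values.append(sum(per_doc) / len(relevant))
    return sum(ap_values) / len(ap_values)

scores = {"q22": {"doc1": 0.9, "doc2": 0.9, "doc3": 0.1}}
relevant = {"q22": {"doc1", "doc3"}}
print(tie_aware_map(scores, relevant))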
Fig. 1. Precision and recall of the vector space model for the different queries of the collection
The rather poor performance can be explained by the very small size of the queries and documents, as well as by the semantically hard nature of the queries, which would require a good medical domain ontology. We did not use any synonym dictionary or ontology, because the purpose of this work was not to achieve good absolute performance but to compare the options of using or not using semantic relations in the text. However, we do not use the precision and recall figures for comparison with our method, since these are set-based measures, while both the vector space model and our method produce rankings. The Mean Average Precision for the baseline vector model, which does not take the relations into account, can be observed in the figures below at the zero value of the parameter.
Information Retrieval with Simplified Conceptual Graphs. For each document and each query, we built its semantic representation and varied the weights α, β present in (1). We also considered the possibility of not including the normalizing denominator in (1), i.e., a similarity measure that consists only of the numerator. Figure 2 shows the value obtained for the performance evaluation measure described above on the sample of 15,603 documents, for the parameter α = 1 (coincidence of words) and varying β (coincidence of arcs). The left plot shows the experiments without the normalizing denominator in (1), and the right one shows the results both for the complete formula (1) and without the denominator (the same as on the left, but at a larger scale).
Fig. 2. Left: Mean average precision for the large sample, without the normalizing weighting in (1), as a function of the relative weight of the graph arcs. Right: The same, together with the results with normalization.
One can observe that, contrary to the assumptions of the vector space model, the formula without the normalizing denominator performs considerably better; we attribute this to the small size of the documents. The standard vector space model similarity is the one with the parameter β = 0. With a small nonzero β the results improve, mainly for the non-normalized variant of (1) (the normalized variant also improves very slightly for β around 0.1). The improvement is not impressive but is clearly observable for β anywhere between 0 and 0.7. As the parameter β grows further, the results decline. With an infinite β, that is, with α = 0, β = 1, the result was 45%. This is still better than the random baseline, which gives 50%, so relations alone, without taking words into account at all, can still be used for retrieval, but the performance is much poorer than that of words without relations. We believe that this may be due to low recall: for many documents there were no relations in common with the queries, because of the too small size of both queries and documents. Figure 3 presents the same data but separately for each query. As expected [23], the results vary greatly from query to query.
20 18
23 25 27 29 avg
22 24 26 28 30
20 18
16
16
14
14
12
12
10
10
8
8
6
6
23 25 27 29 avg
4
4 0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1.1
1.2
1.3
1.4
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1.1
1.2
1.3
1.4
Fig. 3. Mean average precision for the large sample, for individual queries as well as averaged over queries (same as Figure 2). Left: with normalization, right: without.
Figure 4 shows the results for a smaller sample (1,000 documents). The sample also shows improvement of the formula with β > 0 over the baseline β = 0; in this case the improvement is observable for the variant of the formula with the normalizing denominator.
Fig. 4. Plots as in Figures 2 and 3, over the small sample
7 Conclusions and Future Work
We have shown that taking into account semantic relations between words improves the results of the information retrieval task. We also briefly presented a methodology for transforming short phrases expressed in natural language into Conceptual Graphs via automatic semantic analysis using lexical resources such as VerbNet. In the future, we plan to work on better post-processing of the dependency tree into a conceptual graph-like structure and on improvements to our grammar that produces the semantic roles. We will also experiment with other text collections to see whether the method gives a greater improvement on collections with larger documents.
Acknowledgements. The work was done during the first author's research stay at the Laboratorio de Lenguaje Natural y Procesamiento de Texto of the Centro de Investigación en Computación of the Instituto Politécnico Nacional, Mexico, partially funded by the Universidad Nacional de Colombia and Universidad Distrital F.J.C., Bogota, Colombia, and with partial support of the Mexican Government (SNI, CONACYT grant 50206-H, CONACYT scholarship for a Sabbatical stay at Waseda U., COFAA-IPN, and SIP-IPN grant 20100773) to the second author.
References 1. Amghar, T., Battistelli, D., Charnois, T.: Reasoning on aspectual-temporal information in French within conceptual graphs. In: 14th IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2002, pp. 315–322 (2002)
2. Badia, A., Kantardzic, M.: Graph building as a mining activity: finding links in the small. In: Proceedings of the 3rd International Workshop on Link Discovery LinkKDD 2005, pp. 17–24. ACM, New York (2005) 3. Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. ACM Press, Pearson Addison Wesley (1999) 4. Barbu, E., Heroux, P., Adam, S., Trupin, E.: Clustering document images using a bag of symbols representation. In: Proceedings, Eighth International Conference on Document Analysis and Recognition, vol. 2, pp. 1216–1220 (2005) 5. Barceló, G., Cendejas, E., Bolshakov, I., Sidorov, G.: Ambigüedad en nombres hispanos. Revista Signos. Estudios de Lingüística 42(70), 153–169 (2009) 6. Barrière, C., Barrière, N.C.: From a Children’s First Dictionary to a Lexical Knowledge Base of Conceptual Graphs. St. Leonards (NSW): Macquarie Library (1997) 7. Barski, C.: The enigmatic art of knowledge representation, http://www.lisperati.com/tellstuff/ind-ex.html (accessed March 2010) 8. Castro-Sánchez, N.A., Sidorov, G.: Analysis of Definitions of Verbs in an Explanatory Dictionary for Automatic Extraction of Actants based on Detection of Patterns. LNCS, vol. 6177, pp. 233–239. Springer, Heidelberg (2010) 9. Delugach, H.S.: Towards. Conceptual Structures Interoperability Using Common Logic Computer. In: Third Conceptual Structures Tool Interoperability Workshop. Science Department Univ. of Alabama in Huntsville (2008) 10. Figuerola, G.C., Zazo, F.A., Berrocal, J.L.A.: Categorización automática de documentos en español: algunos resultados experimentales. Universidad de Salamanca, Facultad de Documentación, Salamanca España, 6–16 (2000) 11. Gelbukh, A., Sidorov, G., Galicia, S., Bolshakov, I.: Environment for Development of a Natural Language Syntactic Analyzer. In: Acta Academia, Moldova, pp. 206–213 (2002) 12. Griffiths, T.L., Steyvers, M.: Finding scientific topics. Proceedings of the National Academy of Sciences 101(suppl. 1), 5228–5235 (2004) 13. Helbig, H.: Knowledge Representation and the Semantics of Natural Language. Springer, Heidelberg (2006) 14. Hensman, S.: Construction of Conceptual Graph representation of texts. In: Proceedings of Student Research Workshop at HLT-NAACL, Department of Computer Science, University College Dublin, Belfield, Dublin 4 (2004) 15. Hensman, S., Dunnion, J.: Automatically building conceptual graphs using VerbNet and WordNet. In: 2004 International Symposium on Information and Communication Technologies, Las Vegas, Nevada, June 16-18. ACM International Conference Proceeding Series, vol. 90, pp. 115–120. Trinity College, Dublin (2004) 16. Hensman, S., Dunnion, J.: Constructing conceptual graphs using linguistic resources. In: Husak, M., Mastorakis, N. (eds.) Proceedings of the 4th WSEAS International Conference on Telecommunications and Informatics, World Scientific and Engineering Academy and Society (WSEAS), Stevens Point, Wisconsin, Prague, Czech Republic, March 13-15, pp. 1–6 (2005) 17. Hensman, S.: Construction of conceptual graph representation of texts. In: Proceedings of the Student Research Workshop at HLT-NAACL 2004, Boston, Massachusetts, May 02– 07. Human Language Technology Conference, pp. 49–54. Association for Computational Linguistics, Morristown (2004) 18. Hernández Cruz, M.: Generador de los grafos conceptuales a partir del texto en español. MSc thesis. Instituto Politécnico Nacional, Mexico (2007)
19. Kamaruddin, S., Bakar, A., Hamdan, A., Nor, F.: Conceptual graph formalism for financial text representation. In: International Symposium on Information Technology (2008) 20. Kipper, K., Korhonen, A., Ryant, N., Palmer, M.: Extending VerbNet with Novel Verb Classes. In: 5th International Conf. on Language Resources and Evaluation, LREC 2006, Genoa, Italy (June 2006), http://verbs.colorado.edu/~mpalmer/projects/verbnet.html 21. Kovacs, L., Baksa-Varga, E.: Dependency-based mapping between symbolic language and Extended Conceptual Graph. In: 6th International Symposium on Intelligent Systems and Informatics (2008) 22. Medical Image Retrieval Challenge Evaluation P., http://ir.ohsu.edu/image 23. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008), http://www-nlp.stanford.edu/IR-book 24. National Library of Medicine, National of Institute of Health. United States Unified Medical Language System (UMLS), http://www.nlm.nih.gov/research/umls/about_umls.html (accessed April 2010) 25. Peltonen, J., Sinkkonen, J., Kaski, S.: Discriminative clustering of text documents. In: 9th International Conference on Neural Information Processing, ICONIP 2002, pp. 1956–1960 (2002) 26. Pérez-Coutiño, M., Montes-y-Gómez, M., López-López, A.: Applying dependency trees and term density for answer selection reinforcement. In: Peters, C., Clough, P., Gey, F.C., Karlgren, J., Magnini, B., Oard, D.W., de Rijke, M., Stempfhuber, M. (eds.) CLEF 2006. LNCS, vol. 4730, pp. 424–431. Springer, Heidelberg (2007) 27. Porter, M.: An algorithm for suffix stripping. Program 14(3), 130–137 (1980), http://tartaus.org/~martin/PorterStemmer/ 28. Rassinoux, A.M., Baud, R.H., Scherrer, J.R.: A Multilingual Analyser of Medical Texts Conceptual Structures. In: Proceedings of 2nd International Conference on Conceptual Structures, ICCS 1994, College Park, Maryland, USA, August 16-20 (1994) 29. Rassinoux, A.M., Baud, R.H., Lovis, C., Wagner, J.C., Scherrer, J.R.: Tuning Up Conceptual Graph Representation for Multilingual Natural Language Processing in Medicine Conceptual Structures: Theory, Tools, and Applications. In: Proceedings of 6th International Conference on Conceptual Structures, ICCS 1998, Montpellier, France (August 1998) 30. Reddy, K.C., Reddy, C.S.K., Reddy, P.G.: Implementation of conceptual graphs using frames in lead. In: Ramani, S., Anjaneyulu, K.S.R., Chandrasekar, R. (eds.) KBCS 1989. LNCS, vol. 444, pp. 213–229. Springer, Heidelberg (1990) 31. Rege, M., Dong, M., Fotouhi, F.: Co-clustering Documents and Words Using Bipartite Isoperimetric Graph Partitioning. In: Proceedings Sixth International Conference Data Mining ICDM 2006, pp. 532–541 (2006) 32. Salton, G.: Relevance assessments and Retrieval system evaluation. Information Storage and Retrieval (1969) 33. Schenker, A., Bunke, H., Last, M., Kandel, A.: A Graph-Based Framework for Web Document Mining. In: Marinai, S., Dengel, A.R. (eds.) DAS 2004. LNCS, vol. 3163, pp. 401–412. Springer, Heidelberg (2004) 34. Schenker, A., Bunke, H., Last, M., Kandel, A.: Graph-Theoretic Techniques for Web Content Mining. World Scientific Publishing, Singapore (2005) 35. Shafiei, M., Milios, E.: Latent Dirichlet Co-Clustering. In: Sixth International Conference on, Data Mining (CDM 2006), pp. 542–551 (2006)
36. Sleator, D., Temperley, D.: Parsing English with a link grammar. In: Third International Workshop on Parsing Technologies (1993) 37. Sowa, J.F.: Conceptual Graphs. Handbook of Knowledge Representation (2008) 38. Sowa, J.F., Way, E.C.: Implementing a semantic interpreter using conceptual graphs. IBM Journal of Research and Development 30(1), 57–69 (1986) 39. Tesnière, L.: Éléments de syntaxe structurale, Klincksieck, Paris (1959) 40. Williams, R.A.: Computational Effective Document Semantic Representation. In: Digital EcoSystems and Technologies Conference, DEST 2007. IEEE-IES (2007)
Teaching a Robot to Perform Tasks with Voice Commands
Ana C. Tenorio-Gonzalez, Eduardo F. Morales, and Luis Villaseñor-Pineda
National Institute of Astrophysics, Optics and Electronics, Computer Science Department, Luis Enrique Erro #1, 72840 Tonantzintla, México
{catanace17,emorales,villasen}@inaoep.mx
http://www.inaoep.mx
Abstract. The full deployment of service robots in daily activities will require the robot to adapt to the needs of non-expert users, particularly, to learn how to perform new tasks from "natural" interactions. Reinforcement learning has been widely used in robotics; however, traditional algorithms require long training times and may have problems with continuous spaces. Programming by demonstration has been used to instruct a robot, but it is limited by the quality of the trace provided by the user. In this paper, we introduce a novel approach that can handle continuous spaces, can produce continuous actions, and incorporates the user's intervention to quickly learn optimal policies for tasks defined by the user. It is shown how the continuous actions produce smooth trajectories and how the user's intervention allows the robot to learn optimal policies significantly faster. The proposed approach is tested on a simulated robot with very promising results. Keywords: Reinforcement learning, voice commands, service robotics, continuous spaces.
1
Introduction
Service robots are rapidly expanding and will soon become part of everyday life. Their complete acceptance, however, will come when the robots are able to learn new tasks from natural interactions with their users. Many approaches have been developed to allow a robot to learn a new task, but particular emphasis has recently been given to programming by demonstration and to reinforcement learning. In reinforcement learning, an agent uses its own experience to learn how to perform a task, receiving rewards and punishments for good and bad actions, scored according to their contribution to reaching a goal. However, traditional reinforcement learning algorithms have very long convergence times and may have problems with continuous state and action spaces. A tabular representation is only applicable to simple domains, and some value function approximations, such as neural networks or Gaussian processes, are so computationally expensive that they become infeasible for on-line learning of tasks.
In programming by demonstration, a teacher shows the robot how to perform a task; however, this requires specialized hardware (e.g., gloves, special markers, etc.) and a controlled environment (i.e., controlled illumination conditions, special camera settings, etc.). Such systems also tend to learn the task exactly as the user showed it; yet human demonstrations tend to be noisy and suboptimal, and current systems are unable to improve on them. In this paper we introduce an algorithm to instruct a robot on-line using speech and reinforcement learning with continuous states and actions. A simple representation based on kernels is used to deal with continuous spaces, and a novel combination of discrete actions is used to produce continuous actions. We use speech to teach the robot a task without the need for any special equipment, providing initial traces and on-line feedback to the robot while it is learning the task. With these elements, the convergence times of the reinforcement learning algorithm are significantly reduced. The rest of the paper is organized as follows. Section 2 gives an overview of related work using reinforcement learning with continuous states and actions, and with feedback provided in different ways. Section 3 describes the speech recognition process and the vocabulary used. Section 4 describes our proposed reinforcement learning algorithm with continuous states and actions and feedback from the user. The experiments and results using a simulated autonomous mobile robot are presented in Section 5. Finally, conclusions and some ideas for future research work are presented in Section 6.
2
Related Work
There have been many approaches suggested in the literature to deal with continuous state and action spaces in reinforcement learning. For instance, some of them use artificial neural networks [1–4, 18, 20], Gaussian processes [13, 14], tile coding, regressions [5, 11, 16], trees, kernels, receptive fields [19] and approximations based on Gaussian functions [7], among others. The most widely used approaches are based on artificial neural networks and Gaussian processes; however, they have a high computational cost and require substantial training data, which makes them inadequate for the fast on-line learning approach proposed in this paper. Methods based on regressions, tile coding and receptive fields have similarities with tabular representations; they can work on-line and have the potential to be used in an incremental way. Our proposed method is similar to receptive fields; however, we incorporate a novel approach for continuous actions and a simple state representation that can work on-line with low computational cost. Furthermore, we incorporate voice commands as additional reinforcements into the learning process. Several authors have considered the use of feedback in reinforcement learning [6, 8–10, 21]. However, these approaches use computer peripherals such as joysticks, keyboards, mice, and cameras, among others, to provide feedback to the system. The reported work assumes discrete spaces, and the feedback is provided only at certain stages of the learning process. Unlike these approaches, we use natural
language and allow the user to provide feedback at any stage of the learning process. There are a few research works that have used spoken feedback [15, 22]. In [15] a learning-by-demonstration system is introduced for an autonomous mobile robot (CMAssist). The robot learns how to do a task by watching a human and receiving verbal instructions. Behaviors of the robot and tasks are represented with a directed acyclic graph. Verbal instructions are structured in a similar way to the control structures of a programming language. Additionally, information about the environment is given to the robot as a map, and the learning process is supervised. By contrast, the method that we propose does not require any knowledge about the environment and uses a more natural verbal interaction. In [22] the authors present an Actor-Critic model based on an algorithm called IHDR (Hierarchical Discriminant Regression). They use Q-learning with an ’amnesia’ parameter. A teacher sets the robot’s arm in different positions; each one is named with a particular word (command) that the teacher says. Then, the complete sequence of positions is named too, so a position or a sequence of positions can be invoked with its identifier. A teacher uses a bumper to mark the positions and to provide some reinforcements, so interaction with hardware is necessary. In this paper, we propose a method without any special hardware on the robot and with a more natural interaction approach.
3
Spoken Feedback
In this paper speech is used to instruct a robot how to perform a new task. The use of speech gives flexibility and a more natural interaction to the user during training, and can be used to qualify or modify in different ways the behavior of the robot. The voice interaction can be provided at any time and with different intensities and intentions, and does not require any special equipment or a deep knowledge about robot learning. We chose Spanish as our interaction language and created two vocabulary corpora (with around 250 words). Our initial corpus was composed of isolated words, which we later expanded in the second corpus to deal with short phrases. The vocabulary of the speech recognizer is composed of qualifiers and action commands. Some words used in the first corpus and short phrases used in the second are presented in Table 1. The transcriptions of speech are analyzed by an interpreter designed to look for words of interest considering the stage of the learning process. After the interpreter has identified the words of interest, depending on their meaning, it changes the behavior of the learning algorithm. If the words are qualifiers, the reward function is modified by adding a value corresponding to the meaning of the word. If an action command is understood, then the respective action is sent to the robot, also modifying the learning process. This is explained in the next section.
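To make the interpreter’s role concrete, the following minimal sketch shows one possible mapping from recognized words to reward adjustments and robot actions. The dictionaries and function names are illustrative assumptions; only the keywords and reward values (taken from Table 1 below and from the experiments in Section 5) come from the paper.

    # Qualifier words and the extra reward each one contributes (values from Section 5)
    QUALIFIER_REWARDS = {"excelente": 50, "bien": 10, "mal": -10,
                         "terrible": -50, "objetivo": 100}
    # Action command words and the discrete robot action they trigger ("fin" ends the trace)
    ACTION_COMMANDS = {"avanzar": "forward", "regresar": "backward",
                       "izquierda": "left", "derecha": "right", "fin": "stop"}

    def interpret(transcription):
        """Scan a recognized utterance for words of interest and return
        (extra_reward, commanded_action); either may be None."""
        reward, action = None, None
        for word in transcription.lower().split():
            if word in QUALIFIER_REWARDS:
                reward = QUALIFIER_REWARDS[word]      # modifies the reward function
            elif word in ACTION_COMMANDS:
                action = ACTION_COMMANDS[word]        # overrides the action chosen by the learner
        return reward, action

    # Example: interpret("muy bien, ve a tu derecha") -> (10, "right")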
Table 1. Examples of vocabulary with a rough translation into English

WORDS                      SHORT PHRASES
Avanzar (forward)          Hacia adelante (move forward)
Regresar (backward)        Hacia atrás (move backwards)
Izquierda (left)           Gira a la izquierda (turn to your left)
Derecha (right)            Ve a tu derecha (go to your right)
Fin, Final (end)           Para ahí (stop there)
Bien (good)                Sigue así (keep like this)
Mal (bad)                  Por ahí no (not that way)
Excelente (excellent)      Muy bien (very good)
Terrible (terrible)        Así no (not like that)
Objetivo (goal)            Hasta ahí (until there)
4
Reinforcement Learning Algorithm
In reinforcement learning (RL) [17] an agent explores the environment to reach a goal. The agent receives rewards or punishments for its actions, and it tries to maximize the total accumulated expected reward. Reinforcement learning can be modeled using Markov Decision Processes (MDPs). An MDP consists of:
1. A set of states S,
2. A set of actions A,
3. A transition probability function P(s′|s, a) of going to state s′ given that action a was taken in state s,
4. A reward function R : S × A × S → ℝ that evaluates how good it is to be in a state.
Reinforcement learning algorithms try to approximate a value function that estimates the expected accumulated reward for each state, V(s), or for each state-action pair, Q(s, a). Several reinforcement learning algorithms have been proposed to find optimal policies and value functions. One of the most popular is SARSA (State-Action-Reward-State-Action), an algorithm that solves MDPs based on temporal differences (differences between successive predictions). In this paper, we propose a modified SARSA algorithm with eligibility traces. The proposed algorithm consists of three main components: (i) initial traces of how to perform a new task are provided by a teacher using voice commands, (ii) a reinforcement learning algorithm with exploration and continuous state and action spaces is used to learn an optimal policy, and (iii) the user provides voice feedback to the system during the learning process to improve the current policy and help the reinforcement learning algorithm converge faster. The process has three stages: first, initial traces are provided; then, training with feedback and reinforcement learning are combined; finally, the learned policy is improved.
4.1
Initial Traces
In order to teach the robot a new task, the user provides spoken instructions to the robot to complete the task. The robot executes the interpreted actions until it receives a finishing instruction. The user can provide several of these traces by voice as a first stage of the learning process. These traces, however, may have some erroneous actions due to misinterpretations of the commands by the speech recognition system or mistakes made by the user during the instruction phase. However, they can be corrected using reinforcement learning and additional feedback from the user to quickly converge to an adequate policy during the rest of the learning process.
4.2
States, Actions and Value Functions
States are incrementally created as the robot traverses the environment. The robot receives data from its sensors, creates a new state representation and compares it with its stored states. If its current state is similar enough (as explained below) to a known state, the robot continues with its trace; otherwise, it stores the new state. There are different ways in which we can represent and compare states. In this work we used two representations and similarity functions: one using the mean and standard deviation of the sensor readings as state representation and a Gaussian function to evaluate distance, and another one using the sensor information directly together with Pearson's correlation coefficient. With the Gaussian function, we have:

f(x) = (1 / (σ √(2π))) · e^(−(x − μ)² / (2σ²)) .   (1)
where μ is the mean array of the sensor data representing a state, σ is its standard deviation and x is the new array of sensor data. Using Pearson's correlation coefficient we have:

r = (N Σxy − Σx Σy) / ( √(N Σx² − (Σx)²) · √(N Σy² − (Σy)²) ) .   (2)
where N is the sample size (number of x, y pairs) and x, y are arrays of sensor data: the first one holds the values of a stored state and the second a new set of data obtained from the sensors at the current robot position. Each new state is associated with a set of basic discrete actions (forward, backward, left, and right) and each state-action pair is associated with a Q-value. After the initial traces, the learning process has two phases; which one is used depends on whether or not user feedback is provided. During the training phase, or during the initial traces, the robot follows the actions provided by the user and incrementally builds the representation of states.
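As an illustration of this incremental state creation, the sketch below matches a new sensor reading against the stored states using Pearson's correlation coefficient of Eq. (2) and adds a new state when no stored state is similar enough. It is a minimal sketch: the list-based storage and the similarity threshold are assumptions made for the example, since the paper does not report the threshold used.

    import math

    SIMILARITY_THRESHOLD = 0.9   # assumed value; not reported in the paper

    def pearson(x, y):
        """Pearson's correlation coefficient between two sensor arrays, Eq. (2)."""
        n = len(x)
        sx, sy = sum(x), sum(y)
        sxy = sum(a * b for a, b in zip(x, y))
        sxx = sum(a * a for a in x)
        syy = sum(b * b for b in y)
        den = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
        return (n * sxy - sx * sy) / den if den else 0.0

    def match_state(states, reading):
        """Return the index of a sufficiently similar stored state, adding a new one if none matches."""
        for i, stored in enumerate(states):
            if pearson(stored, reading) >= SIMILARITY_THRESHOLD:
                return i                      # similar enough: continue with the known state
        states.append(list(reading))          # unknown situation: store a new state incrementally
        return len(states) - 1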
During the learning phase, if the robot enters a new state it chooses a random action; otherwise, it selects an action based on the combination of the actions with the greatest Q-values. The combined action is proportional to the Q-values of the selected discrete actions. For example, if in a particular state the actions with the greatest Q-values are ’forward’ and ’right’, and the ’right’ action has the largest Q-value, then the robot will go more to the right than forward, producing an intermediate action ar. More formally, if Q(s, a1) < Q(s, a2):

var(s) = (Q(s, a1) / Q(s, a2)) · va1 + (1 − Q(s, a1) / Q(s, a2)) · va2 .   (3)

where va is the value of an action according to the domain of the task (e.g., ’right’, ’left’). If the actions with the largest Q-values are opposite (i.e., right and left, or forward and backward), one of them is randomly chosen. If all the actions have similar Q-values, one is also randomly chosen. In our implementation we use a modified SARSA learning algorithm with eligibility traces, so the Q-values are updated considering the combined action using the following modified SARSA(λ) update rule:

Qt+1(s, a) = Qt(s, a) + α δt et(s, a) ,   (4)

for all s, a, where

δt = rt+1 + γ Qt(st+1, at+1) − Qt(st, at) ,   (5)

and if s = st and a = a1t or a = a2t,

et(s, a) = γ λ et−1(s, a) + 1 ,   (6)

otherwise,

et(s, a) = γ λ et−1(s, a) ,   (7)

where s is a state, a is an action, a1t and a2t are the two actions used with the largest Q-values, α is a positive step-size parameter, γ is a discount-rate parameter and λ is a decay parameter. Since a continuous action is a combination of two basic actions (a1, a2), two Q-values are updated at each step instead of only one. This combined action produces continuous actions and keeps a simple updating procedure. At the end of the whole process, the policy constructed during the training phase is tested by the robot and further improved with the feedback provided by the user.

Reward Function. During the learning process, two kinds of rewards are given. In addition to the traditional rewards given in reinforcement learning, more
rewards can be given by the user at any time. The reinforcement learning algorithm keeps updating its Q-values as described before, but now the reward function is given by:

R = RRL + RU ,   (8)

where RRL is the traditional reward function used in reinforcement learning and RU is an additional reward given when the user specifies an action or qualifies the behavior of the robot. This is similar to reward shaping; however, instead of being predefined in advance and given all the time during the learning process, the rewards are given by the user occasionally, can change their values over time and can also be wrong.
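The following sketch illustrates how Equations (3)–(8) fit together: the two discrete actions with the greatest Q-values are blended into one continuous action, and both are updated with the SARSA(λ) rule using the shaped reward R = RRL + RU. It is a minimal sketch under several assumptions (tabular dictionaries, optimistic positive initialization, and the choice of which blended action enters the TD error), not the authors' implementation.

    import random
    from collections import defaultdict

    ACTIONS = ["forward", "backward", "left", "right"]          # basic discrete actions
    OPPOSITE = {"forward": "backward", "backward": "forward",
                "left": "right", "right": "left"}
    ALPHA, GAMMA, LAM = 0.1, 0.9, 0.9                           # assumed parameter values

    Q = defaultdict(lambda: 1.0)    # Q-values, optimistic positive initialization (an assumption)
    e = defaultdict(float)          # eligibility traces

    def combined_action(s, action_values):
        """Blend the two discrete actions with the greatest Q-values into one continuous action (Eq. 3)."""
        ranked = sorted(ACTIONS, key=lambda a: Q[(s, a)], reverse=True)
        a2, a1 = ranked[0], ranked[1]                           # a2 holds the largest Q-value
        if OPPOSITE[a1] == a2 or abs(Q[(s, a1)] - Q[(s, a2)]) < 1e-9:
            a1 = a2 = random.choice([a1, a2])                   # opposite or tied actions: pick one at random
            return action_values[a2], (a1, a2)
        w = Q[(s, a1)] / Q[(s, a2)]
        return w * action_values[a1] + (1 - w) * action_values[a2], (a1, a2)

    def sarsa_step(s, acts, r_env, r_user, s_next, acts_next):
        """Modified SARSA(lambda) update over the two blended actions (Eqs. 4-8)."""
        r = r_env + r_user                                      # shaped reward, Eq. 8
        # TD error, Eq. 5; using the dominant action of each pair here is an assumption
        delta = r + GAMMA * Q[(s_next, acts_next[1])] - Q[(s, acts[1])]
        for key in list(e):
            e[key] *= GAMMA * LAM                               # decay every trace, Eq. 7
        e[(s, acts[0])] += 1.0                                  # reinforce the two actions just used, Eq. 6
        e[(s, acts[1])] += 1.0
        for key in list(e):
            Q[key] += ALPHA * delta * e[key]                    # Eq. 4

In this sketch, r_user would be the value produced by the spoken qualifiers described in Section 5 (e.g., +10 for "bien", −50 for "terrible"), or zero when the user says nothing.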
5
Experiments
The experiments were focused on navigation tasks using a Pioneer 2 robot with a frontal laser and a rear sonar. The robot and the environment were simulated on the Player/Stage platform running on Linux Debian. The speech recognizer used was Sphinx3 with the acoustic models of level T22 based on the DIMEx100 corpus [12]. The language models were created over a specific vocabulary for the tasks. In order to test the capabilities of the algorithm, we taught the robot navigation tasks with different levels of complexity (increasing distance, number of doorways and corridors), shown in Figure 1. In the first task the robot only had to go inside a room from a point in a hall. The second task involved leaving a room, going through a corridor with four doorways and then entering a room. In the third task, the robot had to leave a room, go through a hall with three doorways, go through one doorway, go through another hall with two doorways and go through one final doorway. In the last task, the robot learned to leave a room, go through a corridor with five doorways, go through one doorway and then enter a final room (see Figure 1). Contrary to traditional navigation tasks, the robot in this case has no knowledge about the environment, using the laser as its main sensor to identify states (using its 180 distance measures) and its sonars to detect collisions (in the rear part). A microphone was used by the user to give the feedback and the initial traces by voice during the training of the robot. The teacher could provide feedback at any moment during the learning process (qualifiers and action commands) and could also provide wrong feedback by mistake. For these experiments, the internal rewards were +100 when the robot reached the goal, −100 when the robot got close to walls and −1 for other state-action pairs. Two types of external feedback were given by the user: qualifier commands, which were translated into additional rewards, and action commands, which were translated into robot actions. In the experiments we used the following values for the qualification commands: +100 for reaching the goal (objetivo), +50 for “excellent” (excelente), +10 for “good” (bien), −50 for “terrible” (terrible), and −10 for “bad” (mal). Similar rewards were associated with other words and with short phrases. The qualifiers were given depending on the observable states produced by the actions. If the user gives an action command, the action is performed by the robot; otherwise, the robot follows the action chosen by its
Fig. 1. Tasks 1-4 (left to right, top to bottom) taught to the robot
normal reinforcement learning algorithm. An action command given by the user can also be accompanied by a qualifier that also modifies the reward function. In our initial experiments we used a Gaussian function but it created a larger number of states (200-500) when compared to the Pearson’s coefficient (20-140) and it also produced more errors in the identification of states. In the following section we present results only with the Pearson’s coefficient. 5.1
Results
We did four types of experiments to show the different aspects of our proposed algorithm and the advantages of the approach:
1. Reinforcement learning (RL) with continuous states and discrete actions
2. RL with continuous states and actions
3. RL with continuous states and actions with oral feedback from the user
4. RL with continuous states and actions, initial traces given by the user with voice commands and with oral feedback from the user
We first compared the behavior of policies learned using discrete actions with policies learned using our approach for generating continuous actions. Figure 2 shows a comparison of two traces where it is clear that the continuous action policy produces smoother and shorter paths with a reduction of about 1 meter in these experiments. Larger reductions are obtained in longer paths. Figure 3 shows the duration and number of episodes required to learn each task in all different experiments without any knowledge of the environment. Each experiment was repeated three times and the averages are shown in the figures.
Fig. 2. Comparison of traces obtained with the algorithm using discrete actions (figures to the left) and continuous actions (figures to the right; Exp. 1). For the first task, the difference in distance between the discrete and continuous action policies was about 1 m; for the second task, the difference was about 0.9 m.
As can be seen, the best results are obtained using traces and on-line feedback from the user, with an important reduction in the time needed to obtain an acceptable policy. The policies learned with the different learning strategies are very similar, except for the traditional RL strategy with discrete state and action spaces. Table 2 shows the number of episodes and the learning time for the different tasks. As can be seen in the table, there is a substantial reduction in the number of episodes and total training times required to learn a new task with the proposed approach. In these experiments the user needs to spend an average of 17 minutes to train a robot to perform a completely new navigation task in an unknown environment. The speech recognition system is not perfect and it is common for the robot to understand a different command and act accordingly. Even with such errors our approach is able to converge faster to a reasonable policy in a few iterations.
Fig. 3. Results of experiments. Learning time using different variations of the algorithm: without traces and without feedback (square), with feedback (diamond), with traces and feedback (triangle). The x-axis has the number of episodes, the y-axis the duration (minutes) of each episode.

Table 2. Total number of episodes (top) and learning time in minutes (bottom) per task for the different experiments (RL = reinforcement learning with continuous actions, RL + F = RL with feedback from the user, and RL + T + F = RL + F with the initial traces provided by the user)

       RL     RL + F   RL + T + F
T1     13     9        6
T2     6      7        7
T3     12     12       7
T4     7      12       11
Avg    9.5    10       7.75

       RL      RL + F   RL + T + F
T1     103.93  19.59    12.85
T2     66.4    15.1     13
T3     100.65  38.2     18.09
T4     99.1    23.43    24.61
Avg    92.54   24.08    17.13

6
Conclusions and Future Work
This paper introduces a novel approach to teach a robot how to perform a new task using reinforcement learning and feedback provided by a teacher. The algorithm uses an incremental creation of states able to deal with continuous states
and a simple, yet effective, combination of discrete actions to produce continuous actions. The algorithm works on-line and was used successfully on a simulated autonomous mobile robot for different navigation tasks. Our experiments show that by including the user into the learning process a substantial reduction in the convergence times for obtaining an adequate policy can be obtained, even with errors performed by the user or by the speech recognition system. We think that teaching a robot with voice commands and providing oral feedback during learning is a natural way to instruct robots and opens a new possibility for non-experts to train robots how to perform new tasks. As future work, we are planning to include a mechanism to quickly forget mistakes during the learning process. We would also like to increase the vocabulary used by the user and test our approach with external users. We would also like to test our algorithm with other robots and with different tasks. Acknowledgments. This work was done under partial support of CONACYT (Project grant 106013 and scholarship 214262).
References 1. Coulom, R.: Feedforward Neural Networks in Reinforcement Learning Applied to High-dimensional Motor Control. In: Cesa-Bianchi, N., Numao, M., Reischuk, R. (eds.) ALT 2002. LNCS (LNAI), vol. 2533, pp. 403–413. Springer, Heidelberg (2002) 2. Chohra, A., Madani, K., Kanzari, D.: Reinforcement Q-Learning and Neural Networks to Acquire Negotiation Behaviors. In: Studies in Computational Intelligence. Springer Series. The Special Session New Challenges in Applied Intelligence Technologies (2008) 3. Dongli, W., Yang, G., Pei, Y.: Applying Neural Network to Reinforcement Learning in Continuous Space. In: International Symposium on Neural Networks (2005) 4. Gromann, A., Poli, R.: Continual robot learning with constructive neural networks. In: Birk, A., Demiris, J. (eds.) EWLR 1997. LNCS (LNAI), vol. 1545, p. 95. Springer, Heidelberg (1998) 5. Guenter, F., Hersch, M., Calinon, S., Aude, B.: Reinforcement learning for imitating constrained reaching movements. Advanced Robotics 21(13), 152–154 (2007) 6. Iida, F., Tabata, M., Hara, F.: Generating Personality Character in a Face Robot through Interaction with Human. In: 7th IEEE International Workshop on Robot and Human Communication, pp. 481–486 (1998) 7. Kimura, H., Yamashita, T., Kobayashi, S.: Reinforcement Learning of Walking Behavior for a Four-Legged Robot. In: 40th IEEE Conference on Decisions and Control (2001) 8. Lockerd, T.A., Breazeal, C.: Reinforcement Learning with Human Teachers: Evidence of Feedback and Guidance with Implications for Learning Performance. In: 21st National Conference on Artificial Intelligence, vol. 1 (2006) 9. Lockerd, T.A., Hoffman, G., Breazeal, C.: Real-Time Interactive Reinforcement Learning for Robots. In: AAAI Workshop on Human Comprehensible Machine Learning (2005)
10. Lockerd, T.A., Hoffman, G., Breazeal, C.: Reinforcement Learning with Human Teachers: Understanding How People Want to Teach Robots. In: 15th IEEE International Symposium on Robot and Human Interactive Communication, pp. 352– 357 (2006) 11. Melo, F.S., Lopes, M.: Fitted natural actor-critic: A new algorithm for continuous state-action MDPs. In: Daelemans, W., Goethals, B., Morik, K. (eds.) ECML PKDD 2008, Part II. LNCS (LNAI), vol. 5212, pp. 66–81. Springer, Heidelberg (2008) 12. Pineda, L.: Corpus DIMEx100 (Level T22). DIME Project. Computer Sciences Department. ISBN:970-32-3395-3. IIMAS, UNAM 13. Rasmussen, C.E.: Gaussian Processes in Machine Learning. In: Bousquet, O., von Luxburg, U., R¨ atsch, G. (eds.) ML 2003. LNCS (LNAI), vol. 3176, pp. 63–71. Springer, Heidelberg (2004) 14. Rottmann, A., Plagemann, C., Hilgers, P., Burgard, W.: Autonomous Blimp Control using Model-free Reinforcement Learning in a Continuous State and Action Space. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (2007) 15. Rybski, P.E., Yoon, K., Stolarz, J., Veloso Manuela, M.: Interactive Robot Task Training through Dialog and Demonstration. In: ACM/IEEE International Conference on Human Robot Interaction, pp. 255–262 (2007) 16. Smart, W.D., Pack Kaelbling, L.: Effective Reinforcement Learning for Mobile Robots. In: IEEE International Conference on Robotics and Automation (2002) 17. Sutton, R., Barto, A.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1999) 18. Tadashi, T., Koichi, S., Yasuhiro, W.: Robot Task Learning based on Reinforcement Learning in Virtual Space. Neural Information Processing, Letters and Reviews, 165–174 (2007) 19. Tamosiunaite, M., Asfour, T., Wrgtter, F.: Learning to reach by reinforcement learning using a receptive field based function approximation approach with continuous actions. Biological Cybernetics 100(3) (2009) 20. Van Hasselt, H., Wiering, M.: Using Continuous Action Spaces to Solve Discrete Problems. In: International Joint Conference on Neural Networks (2009) 21. Wang, Y., Huber, M., Papudesi, V.N., Cook, D.J.: User-Guided Reinforcement Learning of Robot Assistive Tasks for an Intelligent Environment. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (2003) 22. Yilu, Z., Juyang, W.: Action Chaining by a Developmental Robot with a Value System. In: 2nd International Conference on Development and Learning (2002)
Music Composition Based on Linguistic Approach Horacio Alberto García Salas1, Alexander Gelbukh1, and Hiram Calvo1,2 1
Natural Language Laboratory, Computing Research Center, National Polytechnic Institute, 07738, DF, Mexico 2 Computational Linguistics Laboratory, Nara Institute of Science and Technology, Takayama, Ikoma, Nara 630-0192, Japan
[email protected],
[email protected],
[email protected],
[email protected]
Abstract. Music is a form of expression. Since machines have limited capabilities in this sense, our main goal is to model the musical composition process, to allow machines to express themselves musically. Our model is based on a linguistic approach. It describes music as a language composed of sequences of symbols that form melodies, with the lexical symbols being sounds and silences together with their duration in time. We determine functions that describe the probability distribution of these sequences of musical notes and use them for automatic music generation. Keywords: Affective computing, evolutionary systems, evolutionary matrix, generative music, generative grammars.
1 Introduction
Machine emotional intelligence is part of the objectives of affective computing research [20]. Music is one of the fine arts and represents a form of expression. A desirable feature for machines would be the ability to express themselves musically, since they do not yet have this ability [2]. The problem is how to teach machines to compose music. Computers represent a musical instrument capable of generating a great number of sounds. The development of computational models applied to humanistic branches of the fine arts, especially music, has resulted in generative music, that is, music generated from algorithms. Different models have been applied in the development of automatic music composers, for example, those based on neural networks [11], on genetic algorithms [21], on swarms [4], etc., resulting in a wide range of applications. Our work is to characterize music and find its patterns, so that it can be explained in terms of algorithms in order to model the process of musical composition. A sequence of notes has a certain probability of appearing in a melody. Certain sequences occur more regularly and form characteristic patterns of each musical composition. The likelihood of these patterns appearing is used by our algorithm to generate a musical composition automatically.
It is possible to develop computational tools to automate the composition process using our model. The following are possible applications of such systems:
– Have a personal music composer.
– Create new music styles by finding the patterns of different styles and mixing them.
– Help people without musical knowledge to compose music, providing tools that allow users to edit the generated compositions, resulting in the user’s own composition.
– Enable computers to carry out a process until now reserved for humans; in this way machines acquire human characteristics, creating another way of human-machine communication.
– Offer another alternative for the creation of music and, as a consequence, other alternatives of music to listen to.
– Have machinery for the generation of live music for restaurants, offices, shops, etc., with compositions created in real time by indefatigable musicians.
– Provide tools that allow children from a very young age to have direct contact with the musical composition process, which stimulates their minds for better performance in human activities.
This paper is organized as follows. In Section 2 we describe different algorithms developed for the same task as ours. In Section 3 we explain our system. In Section 4 we present some results and a discussion of how we can improve our model. Section 5 describes the future work we endeavor to accomplish; then we present some conclusions.
2 Related Work
The works [19] and [12] provide a comprehensive study of the different methods that have been used to develop music composition systems, based on: noise [5], knowledge, cellular automata, grammars [18], evolutionary methods, fractals, genetic algorithms [1], case-based reasoning [14], agents [16] and neural networks [6, 11]. Some systems are called hybrid since they combine several of these techniques. For example, Harmonet [11] is a system based on connectionist networks which has been trained to produce the chorale style of J. S. Bach. It focuses on the essence of the musical information rather than on restrictions on the music structure. The authors of [6] believe that music composed by recurrent neural networks lacks structure, as they do not maintain memory of distant events, and developed a model based on LSTM (Long Short-Term Memory) to represent the global and local music structure, generating blues compositions. The work [13] describes a system for automatic music genre recognition based on the audio content of the signal, focusing only on melodies of three music genres: classical, metal and dance. The work [3] presents a system for content-based retrieval from a music database that includes audio files (MIDI), with the idea of searching based on music contours, i.e., a representation of the relative changes in the frequencies of a melody, regardless of tone or time. There are a number of works based on evolutionary ideas for music composition. For example, [18] used generative context-free grammars for modeling the melody, making the grammar evolve through genetic algorithms to improve the melody and produce a composition. GenJam [1] is a system based on a genetic algorithm that
models a novice jazz musician learning to improvise. Musical phrases are generated at random, the user gives feedback to the system, and new compositions are generated, improving over several generations. In [21] a genetic algorithm with coevolution, learning and rules is used in a music composer system. In it, male individuals produce music and female critics evaluate it to mate with suitable males, creating new generations of melodies.
3 Music Composer
A melody is a structure made up of other structures built over time. These structures are sequences of notes. How many times a musical note is used after another reflects the patterns of note sequences in a melody. A personal characteristic of each author is the use of certain note patterns with more regularity. We focus on finding these patterns in monophonic music in order to characterize it probabilistically. Our model is built on a linguistic approach [8]. It describes music as a language composed of sequences of symbols, whose lexical items are sounds and silences throughout time. Each melody is made of phrases of this language. Notes of a melody represent sounds or silences. Sequences of notes form phrases of sounds.
Fig. 1. Model of the music process: the composer works at the mental level (grammar), the interpreter at the interpretative level (expressiveness in time and frequency), and the listener at the auditory level
The music process involves three main levels: mental, interpretative and auditory [15]. The process of musical composition is a mental process that involves the conception of an idea to be expressed in sounds and silences. The result of the composition process is a musical composition, which can be shaped into a score or a sound file. In our model, the language that represents the score is represented by a grammar. The performer turns the musical work into sound, adding his personal traits of expressiveness. The sound reaches the audience, who give meaning to the music according to how it is perceived. Our model focuses on the mental level, see Fig. 1. To model the process of musical composition we rely on the concept of evolutionary systems [7], which states that systems evolve as a result of constant change caused by the flow of matter, energy and information [9]. Evolutionary systems interrelate with their environment, finding rules to describe phenomena; they use functions that allow them to learn and adapt to the changes that come before them. These rules can be expressed in the form of grammars. A generative grammar is G(Vn, Vt, S, R), where Vn is the set of non-terminal symbols, Vt is the set of terminal
symbols or alphabet, which are the musical notes, S is the initial symbol and R is a set of rules. Each genre, style and musical author has its own rules of composition. Not all rules are described in music theory, so to make automatic music composition we use an evolutionary system that finds the rules that determine the form of each melody in an unsupervised manner. The scheme of our model is shown in Fig. 2.
Fig. 2. Scheme of our model: the Recognizer extracts music rules from the examples and the Composer uses them to generate new music
A characteristic of our model is the ability to learn from music examples mi. From each example, probabilistic grammars Gi are generated to describe the patterns that characterize the musical expressiveness of each melody. These learned rules are used to generate a melody mi+1 automatically. The function R, called recognizer, generates the production rules of a grammar G from each melody, thereby creating an image of reality in musical terms:

R(mi) = G

It is possible to construct a function C(G). C is called a musical composer and uses G, a generative grammar, to produce a novel melody m:

C(G) = m

In this paper we deal with the Composer and Recognizer functions. To hear the music composition there must exist a function I, called musical interpreter or performer, that generates the sound of melody m. I recognizes the lexical-semantic symbols of G that make up the expressiveness of melody m:

I(m) = sound

3.1 R Function Recognizer: Music Learning Module
Our model is modified according to each new melody. For every melody a musical language is generated that represents it. This is equivalent to generating a different automaton or a new compiler. Each example makes the model restructure itself and adapt to changes, acquiring more musical knowledge. We are working with melodies (monophonic music), modeling the frequencies and durations of notes, the two most important variables of expressivity in music. Each of these variables forms a sequence along the melody. We construct a probability function for each sequence using a matrix. This matrix can be transformed into a
probabilistic grammar. In Section 3.3, Matrix and Grammar, we explain an algorithm to make this transformation. We now explain how the frequency matrix works; the time matrix works in the same way. For example, Fig. 3 shows the frequency sequence of a melody, where Vt = {b, d#, e, f#, g, a, b2, d2, e2, g2} are the terminal symbols or alphabet of this melody. Each of these symbols of the alphabet corresponds to a note in a chromatic scale: A, A#, B, C, C#, D, D#, E, F, F#, G, G#.
El cóndor pasa (Peruvian song) b e d# e f# g f# g a b2 d2 b2 e2 d2 b2 a g e g e b e d# e f# g f# g a b2 d2 b2 e2 d2 b2 a g e g e b2 e2 d2 e2 d2 e2 g2 e2 d2 e2 d2 b2 g e2 d2 e2 d2 e2 g2 e2 d2 e2 d2 b2 a g e g e
Fig. 3. Example of a monophonic melody
Let Notes[n] be an array in which the numbers corresponding to the melody notes are stored, where n is the index that refers to each array element. Let Mi,j be a matrix with i rows and j columns. Fig. 4 shows the learning algorithm we use to generate the frequency distribution matrix of Fig. 5.
for each i ∈ Notes[n], j ∈ Notes[n+1] do
    Mi,j = Mi,j + 1

Fig. 4. Learning algorithm
We use a matrix of 60 rows and 60 columns, representing 5 musical octaves, to store the frequency sequences; 5 musical octaves are 60 chromatic notes. The row and column tags of the frequency matrix are the notes (a, a#, b, c, c#, d, d#, e, f, f#, g, g#, a2, a2#, …, g5, g5#). A matrix of 7 rows and 7 columns is used to store the time sequences, corresponding to the whole note (semibreve), half note (minim), quarter note (crotchet), eighth note (quaver), sixteenth note (semiquaver), thirty-second note (demisemiquaver) and sixty-fourth note (hemidemisemiquaver).
Fig. 5. Frequency distribution matrix
Each number stored in the frequency matrix represents how many times a row note was followed by a column note. An S row is added to store the first note of each melody; S represents the axiom or initial symbol. Fig. 5 shows the frequency distribution matrix after applying the learning algorithm to the melody of Fig. 3. The matrix of Fig. 5 contains only the nonzero rows and columns.
Fig. 6. Frequency distribution of e note
The rows of the matrix of Fig. 5 represent the frequency distribution of each note. In Fig. 6 we show, as an example, the frequency distribution of the e row. How many times a note is followed by another note can be used to calculate its probability distribution.

3.2 C Function Composer: Music Generator Module
The C (Composer) function generates a note sequence based on the probabilities determined from the frequency distribution matrix. From each note it is possible to go only to certain notes, according to the frequency distribution of that note. The most probable notes form characteristic musical patterns. To determine the probability that a note follows another note, we need the cumulative sum of each row of the matrix of Fig. 5. Let Mi,j be a matrix with i rows and j columns. We calculate the cumulative sum over each row i for the cells such that Mi,j ≠ 0; the partial sum of row i is stored in each non-zero cell. We add a column T where the total cumulative sum of each row i is stored.
for each i ∈ M do
    for each j ∈ M do
        if Mi,j ≠ 0 then
            Ti = Ti + Mi,j
            Mi,j = Ti

Fig. 7. Cumulative frequency distribution algorithm
With each new melody mi the matrix Mi,j is modified. This means that our model's representation of the world has changed: it has more musical knowledge. Fig. 8 shows the cumulative frequency distribution matrix obtained after applying the cumulative frequency distribution algorithm of Fig. 7 to the frequency distribution matrix of Fig. 5. For music generation it is necessary to decide the next note of the melody. To take this decision a human composer relies on his musical knowledge. In our model this decision is made based on the cumulative frequency distribution matrix, using the note generator algorithm of Fig. 9.
Fig. 8. Cumulative frequency distribution matrix
For example, let us generate a melody based on the matrix of Fig. 8. Music generation begins by choosing the first note of the composition. We choose one of the possible beginning notes, that is, the notes that are first notes of the example melodies. The S row of the matrix of Fig. 8 contains all beginning notes. In our example, applying the note generator algorithm of Fig. 9, there is only one possible note to choose. This first note identifies a row i of Mi,j which we use to determine the next note. The same happens with the second note: only the e note can be chosen after the first note b.
while (not end)
    p = random(Ti)
    while (Mi,j < p)
        j++
    new_note = j
    i = j

Fig. 9. Note generator algorithm
The first two notes of this new melody are mi+1 = {b, e}. We apply the note generator algorithm to determine the third note. We take the value of column Te = 9. A random number p between zero and 9 is generated, say p = 6. To find the next note, we compare the random number p with each non-zero value of the e row until one greater than or equal to this number is found. Column g then gives the next note, since Me,g = 8 is greater than p = 6. Column j = g stores the number that indicates the following composition note and the next row i to be processed. The third note of the new melody mi+1 is therefore g, so mi+1 = {b, e, g, …}. To determine the fourth note we apply the note generator algorithm to row i = g. Since each non-zero value of row i represents notes that were used to follow note i, they can be used to generate the patterns found in the example melodies; the generated music reflects these learned patterns. While the system generates a musical composition, with each note it modifies itself, increasing the likelihood for that note to be generated again. This is another way in which the system evolves. In addition, we added a forgetting mechanism to ensure that the values do not overflow, which gives the notes played the least a lower probability of being played again, even though they are not forgotten.
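The pseudocode of Figs. 4, 7 and 9 can be read as building and then sampling a first-order transition model over notes. The sketch below is an illustrative Python version of that pipeline under the assumption that notes are plain strings and that an "S" pseudo-note marks the beginning of a melody; names such as learn and generate are not the authors'.

    import random
    from collections import defaultdict

    def learn(freq, melody):
        """Fig. 4: count how many times each note follows another ('S' marks the start)."""
        prev = "S"
        for note in melody:
            freq[prev][note] += 1
            prev = note

    def generate(freq, length):
        """Figs. 7 and 9: sample a new melody by walking the cumulative sums of the counts."""
        melody, current = [], "S"
        for _ in range(length):
            row = freq[current]
            if not row:                        # dead end: this note was never followed by anything
                break
            total = sum(row.values())          # the Ti column of Fig. 8
            p = random.uniform(0, total)       # p = random(Ti)
            cumulative = 0
            for note, count in row.items():    # advance until the cumulative sum reaches p
                cumulative += count
                if p <= cumulative:
                    melody.append(note)
                    current = note
                    break
        return melody

    freq = defaultdict(lambda: defaultdict(int))
    condor = ("b e d# e f# g f# g a b2 d2 b2 e2 d2 b2 a g e g e "
              "b e d# e f# g f# g a b2 d2 b2 e2 d2 b2 a g e g e").split()
    learn(freq, condor)
    print(generate(freq, 20))                  # e.g. ['b', 'e', 'd#', 'e', 'f#', ...]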
3.3 Matrix and Grammar
There are different ways to obtain a generative grammar G. A particular unsupervised case is an evolutionary matrix [10]. The algorithms described in Figs. 4, 7 and 9 for the functions R and C represent an evolutionary matrix. An evolutionary matrix is a way of representing knowledge. From the frequency distribution matrix and the total column T of Fig. 8 it is possible to generate a probabilistic generative grammar.
for each i ∈ M do
    for each j ∈ M do
        if Mi,j ≠ 0 then
            Mi,j = Mi,j / Ti

Fig. 10. Probability algorithm
To apply the algorithm, we need to determine the probability matrix from the frequency distribution matrix of Fig. 8. The probability matrix is calculated with the probability algorithm of Fig. 10.
Fig. 11. Probability matrix
There exists a grammar G(Vn, Vt, S, R) such that G can be generated from M, where M is the probability matrix of Fig. 11. Vn is the set of non-terminal symbols, Vt is the set of terminal symbols or alphabet; in this particular case the alphabet represents the melody's notes. S is the axiom or initial symbol and R is the set of generated rules. To transform the matrix of Fig. 11 into the grammar of Fig. 13 we use the following algorithm:
– We substitute each row tag of M with a non-terminal symbol of grammar G (Fig. 12).
– Each column tag is substituted by its note and its non-terminal symbol (Fig. 12).
– For each row i and each column j such that Mi,j ≠ 0, column j represents a terminal symbol and a non-terminal symbol Xn with probability p = Mi,j / Ti of occurring. The rules are therefore of the form Vn → Vt Vn (p).
In this way the grammar is G(Vn, Vt, S, R). Vn = {S, X1, X2, X3, X4, X5, X6, X7, X8, X9, X10} is the set of non-terminal symbols. Vt = {b, d#, e, f#, g, a, b2, d2, e2, g2} is the set of terminal symbols or alphabet. S is the axiom or initial symbol. The rules R are listed in Fig. 13.
Fig. 12. Transition matrix

S → b X1(1)
X1 → e X2(1)
X2 → e X3(1)
X3 → b X1(1/9) | d# X2(2/9) | f# X4(2/9) | g X5(3/9) | b2 X7(1/9)
X4 → g X5(1)
X5 → e X3(6/11) | f# X4(2/11) | a X6(2/11) | e2 X9(1/11)
X6 → g X5(3/5) | b2 X7(2/5)
X7 → g X5(1/9) | a X6(3/9) | d2 X8(2/9) | e2 X9(3/9)
X8 → b2 X7(6/12) | g2 X10(6/12)
X9 → d2 X8(10/12) | g2 X10(2/12)
X10 → e2 X9(1)
Fig. 13. Probabilistic generative grammar
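Following the three transformation steps above, the same nested-dictionary count representation used earlier can be turned into production rules of the form Vn → Vt Vn (p). The sketch below is only an illustration of that mapping; the helper name rules_from_matrix and the fraction-style probabilities are assumptions chosen to mirror Fig. 13, not the authors' code.

    def rules_from_matrix(freq):
        """Turn the learned counts into probabilistic rules Vn -> Vt Vn (p), as in Section 3.3."""
        notes = sorted({n for n in freq if n != "S"} |
                       {n for row in freq.values() for n in row})
        nonterminal = {"S": "S"}                         # row/column tags -> non-terminal symbols
        for k, note in enumerate(notes):
            nonterminal[note] = "X%d" % (k + 1)
        rules = []
        for tag, row in freq.items():
            total = sum(row.values())                    # the T column of Fig. 8
            if total == 0:
                continue
            alternatives = ["%s %s(%d/%d)" % (nxt, nonterminal[nxt], cnt, total)
                            for nxt, cnt in row.items() if cnt]
            rules.append("%s -> %s" % (nonterminal[tag], " | ".join(alternatives)))
        return rules

    # Example with a toy count matrix:
    # rules_from_matrix({"S": {"b": 1}, "b": {"e": 1}, "e": {"d#": 1, "g": 2}})
    # yields ['S -> b X1(1/1)', 'X1 -> e X3(1/1)', 'X3 -> d# X2(1/3) | g X4(2/3)']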
4 Results and Discussion
Examples of music generated by our system can be found at www.olincuicatl.com. To evaluate whether or not our algorithm is generating music, we decided to conduct a Turing-like test. The 26 participants of the test had to tell us whether they liked the music, knowing only that some of it had been generated automatically. We compiled 10 melodies, 5 of them generated by our model and another 5 by human composers, and asked the 26 human subjects to rank the melodies according to whether they liked them or not, with numbers between 1 and 10, number 1 being the one they liked most. None of the subjects knew about the order of the music compositions. These 10 melodies were presented as in Table 1.

Table 1. Order of melodies as they were presented to subjects

ID   Melody               Author
A    Zanya                (generated)
B    Fell                 Nathan Fake
C    Alucín               (generated)
D    Idiot                James Holden
E    Ciclos               (generated)
F    Dali                 Astrix
G    Ritual Cibernético   (generated)
H    Feelin' Electro      Rob Mooney
I    Infinito             (generated)
J    Lost Town            Kraftwerk
The test results were encouraging: the automatically generated melodies were ranked in 3rd and 4th place, above human compositions by very famous bands. Table 2 shows the ranking of the melodies resulting from the Turing-like test we conducted.

Table 2. Order of melodies obtained after the Turing-like test

ID   Ranking   Melody               Author
B    1         Fell                 Nathan Fake
D    2         Idiot                James Holden
C    3         Alucín               (generated)
A    4         Zanya                (generated)
F    5         Dali                 Astrix
H    6         Feelin' Electro      Rob Mooney
J    7         Lost Town            Kraftwerk
E    8         Ciclos               (generated)
G    9         Ritual Cibernético   (generated)
I    10        Infinito             (generated)
We have obtained novel results, comparable with those obtained by other developments [21, 1], modeling the frequency and time of a melody with simple algorithms. To the ears of musicians, the compositions generated by our system sound similar to the examples used. However, we are developing other algorithms in order to shape the musical structure [17]. We believe that if a larger corpus is used the results will improve considerably. It is also necessary to develop more sophisticated forgetting functions to improve the method.
5 Conclusions and Future Work
We have developed a model of the music composition process and a way to represent music based on an evolutionary matrix [10], a paradigm for knowledge representation. We developed an algorithm to transform a matrix in which we represent music into a grammar, which is a linguistic representation of music. Generative music presents new forms that do not always match the traditional rules of music. This feature is perhaps one of the attractions of these new forms of music, which break with preset patterns. Transition patterns are measured statistically to determine the probability of moving from one musical note to another. This process can be modeled with a grammar, an automaton, a matrix, etc. We propose a model, regardless of the modeling tool, to characterize the music composition process. In our future work we will characterize different types of music, from sad to happy, from classic to electronic, in order to determine functions for generating any kind of music. We are currently developing improved systems using matrices of 3, 4 or n dimensions, which may reflect the many variables involved in a musical work. Modeling more musical variables will be reflected in the musical expression.
We plan to make matrices evolve into other matrices to produce music morphing. Also, we are interested in developing a polyphonic model. Finally, it is necessary to develop better forgetting functions. Acknowledgements. The work was done under partial support of the Mexican Government (CONACYT 50206-H, IPN-SIP 20100773, IPN-COFAA, IPN-PIFI, SNI, CONACYT scholarship for a Sabbatical stay to the second author) and the Japanese Government (JSPS). The third author is a JSPS fellow.
References 1. Biles, J.A.: GenJam: Evolution of a jazz improviser. In: Source. Creative evolutionary systems. Section: Evolutionary music, pp. 165–187. Morgan Kaufmann Publishers Inc., San Francisco (2001) 2. Birchfield, D.: Generative model for the creation of musical emotion, meaning and form. In: Source International Multimedia Conference. Proceedings of the 2003 ACM SIGMM Workshop on Experiential Telepresence, pp. 99–104. ACM, New York (2003) 3. Blackburm, S., DeRoure, D.: A tool for content based navigation of music. In: Source International Multimedia Conference. Proceedings of the Sixth ACM International Conference on Multimedia, Bristol, United Kingdom, pp. 361–368 (1998) 4. Blackwell, T.: Swarming and Music. Evolutionary Computer Music. In: Subject Collection: Informática. In SpringerLink since, pp. 194–217. Springer, London ( October 12, 2007) 5. Bulmer, M.: Music From Fractal Noise. In: Proceedings of the Mathematics 2000 Festival, Melbourne, University of Queensland January 10–13 (2000) 6. Eck, D., Schmidhuber, J.: A First Look at Music Composition using LSTM Recurrent Neural Networks. Source Technical Report: IDSIA-07-02. Istituto Dalle Molle Di Studi Sull Intelligenza Artificiale (2002) 7. Galindo Soria, F.: Sistemas Evolutivos: Nuevo Paradigma de la Informática. In: Memorias XVII Conferencia Latinoamericana de Informática, Caracas Venezuela (July 1991) 8. Galindo Soria, F.: Enfoque Lingüístico. Instituto Politécnico Nacional UPIICSA ESCOM (1994) 9. Galindo Soria, F.: Teoría y Práctica de los Sistemas Evolutivos. Mexico. Editor Jesús Manuel Olivares Ceja (1997) 10. Galindo Soria, F.: Matrices Evolutivas. La Revista Científica, ESIME del IPN, #8 de 1998. In: Cuarta Conferencia de Ingeniería Eléctrica CIE/98, CINVESTAV-IPN, Cd. de México, pp. 17–22 (September 1998) 11. Hild, H., Feulner, J., Menzel, W.: Harmonet: A Neural Net for Harmonizing Chorales in the Style of J.S.Bach. In: Lippmann, R.P., Moody, J.E., Touretzky, D.S. (eds.) Neural Information Processing 4 (NIPS 4), pp. 267–274. Morgan Kaufmann Universität Karlsruhe, Germany 12. Järveläinen, H.: Algorithmic Musical Composition. April 7, Tik-111.080 Seminar on content creation Art@Science. Helsinki University of Technology Laboratory of Acoustics and Audio Signal Processing (2000) 13. Kosina, K.: Music Genre Recognition. Diplomarbeit. Eingereicht am FachhochschulStudiengang. Mediente Chnik und Design in Hagenberg (June 2002) 14. Maarten, G.J.A., López, M.R.: A Case Based Approach to Expressivity-Aware Tempo Transformation. Source Machine Learning 65(2-3), 411–437 (2006)
15. Miranda, E.R., Jesus, L.A., Barros, B.: Music Knowledge Analysis: Towards an Efficient Representation for Composition. In: Marín, R., Onaindía, E., Bugarín, A., Santos, J. (eds.) CAEPIA 2005. LNCS (LNAI), vol. 4177, pp. 331–341. Springer, Heidelberg (2006) 16. Minsky, M.: Music, Mind, and Meaning. Computer Music Journal 5(3) (Fall 1981) 17. Namunu, M., Changsheng, X., Mohan, S.K., Shao, X.: Content-based music structure analysis with applications to music semantics understanding. In: Source International Multimedia Conference. Proceedings of the 12th Annual ACM International Conference on Multimedia. Technical session 3: Audio Processing, pp. 112–119. ACM, New York (2004) 18. Ortega, A.P., Sánchez, A.R., Alfonseca, M.M.: Automatic composition of music by means of Grammatical Evolution. In: ACM SIGAPL APL, vol. 32(4), pp. 148–155. ACM, New York (June 2002) 19. Papadopoulos, G., Wiggins, G.: AI Methods for Algorithmic Composition: A Survey, a Critical View and Future Prospects. In: AISB Symposium on Musical Creativity. School of Artificial Intelligence, Division of Informatics, pp. 110–117. University of Edinburgh, Edinburgh (1999) 20. Picard, R.W., Vyzas, E., Healey, J.: Toward Machine Emotional Intelligence: Analysis of Affective Physiological State. IEEE Transactions on Pattern Analysis and Machine Intelligence 23(10) (October 2001) 21. Todd, P.M., Werner, G.M.: Frankensteinian Methods for Evolutionary Music Composition. In: Griffith, N., Todd, P.M. (eds.) Musical Networks, p. 385. MIT Press, Cambridge (1999)
A Practical Robot Coverage Algorithm for Unknown Environments* Heung Seok Jeon1, Myeong-Cheol Ko1, Ryumduck Oh2, and Hyun Kyu Kang1,** 1
Department of Computer Science, Konkuk University, Chungju City, Korea {hsjeon,cheol,hkkang}@kku.ac.kr 2 Department of Computer Science, Chungju National University, Chungju City, Korea
[email protected]
Abstract. While there has been substantial research on coverage and SLAM algorithms, to our knowledge no previous research has considered using each of these algorithms with a service robot. As a result, the performance of these robots is less than adequate for most consumers, especially when the robots only rely on SLAM algorithms for their coverage services in an unknown environment. To address this problem, we propose a new coverage algorithm, AmaxCoverage, an area-maximizing coverage algorithm that efficiently integrates the SLAM solution. Our experimental results show that the AmaxCoverage algorithm outperforms previous representative coverage algorithms in unknown environments and therefore will increase consumers’ confidence toward service robots. Keywords: Practical, robot, coverage, SLAM, real-time.
1 Introduction
Coverage algorithms are one of the core technologies required for intelligent robots in domains such as cleaning, harvesting, painting, and lawn mowing. These algorithms determine the path a robot will take to ensure complete coverage of an unknown target area, i.e., the robot will cover every part of this area at least once. For example, a cleaning robot will follow a path planned by its coverage algorithm to clean the target area and guarantee that it does not miss a spot. Coverage algorithms based on randomness, e.g., turn right whenever an obstacle is encountered, are widely used in commercial cleaning robots since they require very little overhead to implement. While these algorithms are simple to implement, they have one major flaw: they are not efficient. For example, as the area to be cleaned increases, a robot utilizing a random algorithm requires more time to finish the task and there is no guarantee that the area will be completely covered. For the consumer, the result of this poor performance is dissatisfaction with the product and possibly with the idea of intelligent robots.
* This work was supported by Konkuk University in 2010.
** Corresponding author.
To overcome the problems associated with random algorithms, several smart coverage algorithms have been proposed [1]-[8]. The seminal work of Zelinsky et al. [1] is regarded as the cornerstone of these approaches. They proposed a complete coverage method that uses a distance transform to plan the coverage path. While this approach is more robust than relying on a random algorithm, it has limitations since it requires a complete gridded map of the target environment before the start of the robot’s task. Choset and Pignon extended the work of Zelinsky et al. so that a non-gridded map could be used [2]. The Boustrophedon Path algorithm they proposed is one of the most popular and basic coverage algorithms for recently commercialized cleaning robots. Using this algorithm, the robot crosses the full length of the target area in a straight line, turns around at the boundary, and then traces a new straight-line path adjacent to the previous one, similar to how an ox would plow a field, hence the phrase Boustrophedon Path. By repeating this procedure, the robot will cover the target area when given enough time [2]. However, this approach has limitations. While a robot using the Boustrophedon Path algorithm shows very good performance for an area that contains no obstacles, if the area has obstacles, the performance is dramatically reduced. For example, in a real environment, i.e., a non-simulated environment, obtaining a map for a particular area of interest prior to the actual task is not trivial because the layout of each area may be unique and include obstacles that are static, e.g., a chair, or dynamic, e.g., a person. To overcome this difficulty and increase its performance, a robot using the Boustrophedon Path algorithm requires a map and the ability to localize itself within the map to reduce duplicated visits to an area and to plan the most efficient path. The construction of a map for the target area and the ability for the robot to know its location at any point in time necessitate that the coverage algorithm be integrated with one of the solutions for the Simultaneous Localization and Mapping (SLAM) problem. This problem asks whether it is possible for a robot to incrementally build a complete map of an unknown environment while simultaneously determining its location within this map [13]. The SLAM problem has been a major issue for the mobile robotics community because its solution would make a robot truly autonomous [13]. While EKF-SLAM [14] and FastSLAM [15][16] are the two most influential solutions, newer potential alternatives have been proposed, and the SLAM problem is considered solved in the robotics literature [15]. As a result, coverage algorithms should enable integration with a SLAM algorithm to efficiently cover unknown environments. If coverage and SLAM algorithms are not integrated appropriately, however, the overall performance of the robot could decrease (Figure 1). The maps in Figure 1 were constructed from an unknown environment after 30 minutes of coverage using the Random and Boustrophedon algorithms integrated with the FastSLAM algorithm. The results are interesting since the Boustrophedon algorithm, known as one of the best-performing algorithms for unknown environments, shows a worse performance than the “simple-minded” random algorithm. This is because, in a complex environment, the Boustrophedon algorithm has an increased number of visits near obstacles. To address this issue, we propose a new coverage algorithm, AmaxCoverage, which we describe in the next section.
In Section 3, we present the implementation details of the AmaxCoverage algorithm, followed by the performance evaluation results in Section 4. In Section 5, we present our conclusions and mention areas for future work.
Fig. 1. Maps constructed after 30 minutes of covering: (a) Random algorithm; (b) Boustrophedon Path algorithm
2 The AmaxCoverage Algorithm
The AmaxCoverage algorithm is based on the assumption that a robot does not have a priori knowledge regarding the map of the environment it has to cover. Without a map, a robot cannot plan the path it needs to take to guarantee complete coverage. Our approach for this kind of situation is to have the robot optimize its coverage within a given time. Hence, the goal of the AmaxCoverage algorithm is not to minimize the total coverage time but to maximize the coverage efficiency in real time, i.e., to cover the most area in the shortest time. To do this, the algorithm finds and covers the largest and most obstacle-free area first. This means that the AmaxCoverage algorithm places a higher priority on large, mostly empty areas with few obstacles than on small and complex areas.
Compared with the algorithms mentioned above, the AmaxCoverage algorithm provides better results when the total time available for the task is not long enough to allow the robot to cover the entire target area. In this situation, the robot assigns a higher priority to covering large obstacle-free areas. We believe that this is a plausible assumption. Consider the following possible scenarios: 1) you are expecting a visitor to arrive sooner than the total time required for the cleaning task to be completed, or 2) the remaining battery charge of the robot is not sufficient to cover the target area and recharging requires more time (therefore adding to the total time for the task). In each of these cases, having the robot maximize the covered area results in increased efficiency.
Another benefit of the AmaxCoverage algorithm is that a robot that uses it can efficiently cooperate with a human counterpart. In real life, we cannot always expect a robot to perfectly complete its cleaning task all by itself without any assistance from a person. The interior layouts of most houses have atypical and dynamic characteristics. The structures of residences vary in size and shape. Moreover, the arrangement of tables, sofas, and other obstacles is not fixed, which increases the complexity of the target area. Usually, the corners of rooms, areas beneath low obstacles, and the tops of obstacles need manual cleaning to achieve satisfactory quality [12].
For these reasons, it is more practical for people to cooperate with the robot rather than to rely fully on its cleaning performance. For example, a robot can initially clean a large and mostly empty portion of a room that has few obstacles. People then complement the task by focusing on the complex parts of the room. The robot may spend the same amount of time finishing its part of the cleaning job, but with this assistance the actual time required to clean the whole area is reduced. Therefore, to increase the efficiency of cleaning robots with the help of people, robots can start the cleaning process in the less complex areas first. This is another reason why the AmaxCoverage algorithm puts priority on large empty areas with few obstacles rather than on complex areas.
3 Implementation
As stated, the AmaxCoverage algorithm works under the assumption that the robot has no a priori information about the area to be covered and that new information and a map will be provided while the robot covers that area. If the robot encounters a new obstacle in the environment during the coverage action, the map is updated in real time to include the new information. This process is performed by the AmaxCoverage algorithm as follows. First, it computes the Minimum Bounding Rectangle (MBR) in the scope of the currently available map. The MBR is the minimal, i.e., smallest, rectangle that includes all the areas that are free of obstacles (free areas), the areas that contain obstacles (non-free areas), and the walls of the map (boundaries). The MBR is a current snapshot of the map that can dynamically change over time. A robot using the AmaxCoverage algorithm will always want to perform its task on free areas first; therefore, these areas become the target areas for coverage. Second, if the robot finds new information that is outside the MBR while it is covering the free area, the MBR will be updated. The strategy used for changing the MBR is discussed at the end of this section.
As described in Section 1, the Boustrophedon Path algorithm is one of the most efficient solutions when there are no obstacles in the target area. However, when obstacles are present, it is more difficult to plan a Boustrophedon Path that achieves complete coverage. In non-simulated environments, an indoor area can contain various obstacles such as tables, chairs, sofas, people, etc. To maximize the performance of the Boustrophedon Path algorithm, we decompose the MBR into free sub-areas, i.e., areas within the MBR without obstacles. This allows the AmaxCoverage algorithm to efficiently use the Boustrophedon Path algorithm. The decomposition utilizes the Rectangle Tiling scheme [17], a mathematical technique that finds all possible sub-rectangles within a rectangle. Given a rectangle of size m x n, there are N(m, n) sub-rectangles in it, where N(m, n) can be computed following Equation (1): for each bottom-left corner with coordinates (i, j), there are (m - i)(n - j) possible top-right corners.
N(m, n) = ∑_{i=0}^{m−1} ∑_{j=0}^{n−1} (m − i)(n − j) = (1/4) m(m + 1) n(n + 1)        (1)
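As an illustration (ours, not part of the original paper), the closed form in Equation (1) can be checked against a direct enumeration of sub-rectangles; the grid sizes used below are arbitrary.

# Brute-force check of Equation (1): count all axis-aligned sub-rectangles
# of an m x n grid and compare with the closed form m(m+1)n(n+1)/4.
def count_subrectangles(m, n):
    return sum((m - i) * (n - j) for i in range(m) for j in range(n))

def closed_form(m, n):
    return m * (m + 1) * n * (n + 1) // 4

for m, n in [(3, 4), (10, 7), (25, 25)]:
    assert count_subrectangles(m, n) == closed_form(m, n)
    print(m, n, closed_form(m, n))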
To ensure the efficiency of the Boustrophedon Path algorithm, we need to exclude all the sub-rectangles that contain obstacles. However, finding all possible sub-rectangles
within the MBR may result in Rectangle Tiling taking an unacceptable amount of time to process. To limit this time and increase performance, the AmaxCoverage algorithm discards sub-rectangles smaller than a threshold value by using the RectangleTilingwithoutObstacle algorithm, which generates the rectangles that are good candidates for an efficient sub-rectangle decomposition.

Algorithm: RectangleTilingwithoutObstacle(mbr)
 1: let mbr be a rectangle of m x n cells
 2: for i = 0 to m-1 do
 3:   for j = 0 to n-1 do
 4:     for p = i+1 to m do
 5:       for q = j+1 to n do
 6:         if (rectangle(i, j, p, q) does not contain any occupied cell and sizeof(rectangle(i, j, p, q)) > robot size) then
 7:           add rectangle(i, j, p, q) to rectangles
 8:         endif
 9:       endfor
10:     endfor
11:   endfor
12: endfor
13: return rectangles
End RectangleTilingwithoutObstacle
Fig. 2. Rectangle Tiling Algorithm
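For concreteness, a minimal Python sketch of the tiling step described above is given below. It is our own illustration, not the authors' code; it assumes the map is given as a boolean occupancy grid (True = occupied) and uses min_cells as a stand-in for the robot-size threshold.

import numpy as np

def rectangle_tiling_without_obstacle(grid, min_cells=4):
    """Enumerate obstacle-free sub-rectangles of an occupancy grid.

    grid: 2D boolean array, True where a cell is occupied.
    Returns a list of (i, j, p, q) with top-left cell (i, j) and
    exclusive bottom-right corner (p, q).
    """
    m, n = grid.shape
    # 2D prefix sums let us test "no occupied cell" in O(1) per rectangle.
    occ = np.cumsum(np.cumsum(grid.astype(int), axis=0), axis=1)
    occ = np.pad(occ, ((1, 0), (1, 0)))

    def occupied(i, j, p, q):
        return occ[p, q] - occ[i, q] - occ[p, j] + occ[i, j]

    rects = []
    for i in range(m):
        for j in range(n):
            for p in range(i + 1, m + 1):
                for q in range(j + 1, n + 1):
                    if occupied(i, j, p, q) == 0 and (p - i) * (q - j) >= min_cells:
                        rects.append((i, j, p, q))
    return rects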
The next step is to find a minimal set of sub-rectangles from rectangles, that is, from the set of candidates generated by the RectangleTilingwithoutObstacle algorithm, such that the union of all the sub-rectangles in the resulting set covers the target area. This problem is as hard as the well-known minimum Set Cover Problem, which is NP-complete [18]. For example, consider each candidate rectangle of size 1x1 as an element and every larger candidate rectangle as a set of such unit rectangles; if you can find a minimum set cover for this instance, then you can find a minimum set of candidate rectangles. A greedy algorithm is known to be an efficient approximation for the Set Cover Problem, so we use it to choose the set of rectangles from the candidates. The SetCovering algorithm selects the sets as follows:

Algorithm: SetCovering(rectangles)
1: while (an uncovered cell exists)
2:   choose the rectangle that contains the largest number of uncovered cells from the rectangle list. If there are rectangles with the same area size, give a higher priority to the rectangle that requires the fewest turns when covered with the Boustrophedon Path algorithm.
3:   remove the rectangle from the rectangle list and add it to the setofrectangles list.
4: endofwhile
5: return setofrectangles
End SetCovering
Fig. 3. Set Covering Algorithm
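A compact sketch of this greedy selection follows (our illustration). It assumes the candidate rectangles come from the tiling step above and approximates the number of turns by the number of boustrophedon lanes needed to sweep a rectangle, which is our own simplification of the "fewest turns" tie-break.

def cells_of(rect):
    i, j, p, q = rect
    return {(r, c) for r in range(i, p) for c in range(j, q)}

def estimated_turns(rect, lane_width=1):
    i, j, p, q = rect
    # One lane per sweep along the longer side; fewer lanes means fewer turns.
    return int(min(p - i, q - j) / lane_width)

def set_covering(rectangles, free_cells):
    uncovered = set(free_cells)
    candidates = list(rectangles)
    chosen = []
    while uncovered and candidates:
        # Largest number of newly covered cells; ties broken by fewer turns.
        best = max(candidates,
                   key=lambda r: (len(cells_of(r) & uncovered), -estimated_turns(r)))
        if not cells_of(best) & uncovered:
            break  # remaining candidates cannot cover anything new
        chosen.append(best)
        uncovered -= cells_of(best)
        candidates.remove(best)
    return chosen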
During each iteration, the set that contains the largest number of uncovered elements is chosen. If there are rectangles that have the same area size, the algorithm assigns a higher priority to the one that requires fewer turns when covered with the Boustrophedon Path algorithm. The rectangles selected by the SetCovering algorithm (Figure 3), i.e., setofrectangles, become the target area for coverage.
The next step is to determine the order of visits among the selected rectangles, i.e., the rectangles in setofrectangles. To do this, we measure the coverage efficiency of each rectangle. We define the coverage efficiency as the ratio of the rectangle size to the estimated coverage time required for it to be completely covered. The estimated coverage time for a rectangle is defined as the complete coverage time of the rectangle plus the time it takes the robot to travel from its current location to the entrance of the rectangle. The coverage time of a rectangle depends on several factors: the size of the rectangle, the speed of the robot, and the number of turns the robot must execute within the rectangle. Note that, given these factors, the coverage time for rectangles of the same size can differ because of the number of turns the robot must make to cover them. Even for rectangles that do not contain any obstacles (such as the decomposed rectangles), the number of turns can differ since their layouts differ. For example, consider two rectangles with the same area, where one is a square and the other is not. If the robot's path is along the length of the rectangle, then the square will require more turns than the non-square rectangle, and the total coverage time for the square will be longer. Therefore, the number of turns is an important factor when trying to optimize coverage time.
Once the coverage efficiency of all uncovered rectangles has been evaluated, the rectangle with the greatest coverage efficiency is selected (denoted as R1 in the getR1 algorithm in Figure 4). The coverage process can then begin for the R1 rectangle. The Boustrophedon Path algorithm is, in our opinion, the best choice for covering the R1 rectangle, although other algorithms could also be applied.

Algorithm: getR1(setofrectangles, deadline)
 1: set CoverageArea as the number of cells in the rectangle
 2: RecCoverageTime = MoveTimeForOneCell*NumOfCells + TurnOverhead*NumOfTurns
 3: EstimatedCoverageTime = TimeOfTravelToRec + RecCoverageTime
 4: if (EstimatedCoverageTime > deadline) then
 5:   EstimatedCoverageTime := deadline
 6:   LeftTime = deadline - TimeOfTravelToRec
 7:   CoverageArea = LeftTime / MoveTimeForOneCell
 8: endif
 9: CoverageEfficiency = CoverageArea / EstimatedCoverageTime
10: for all the rectangles left in setofrectangles, find the R1 that has the maximum CoverageEfficiency
11: remove R1 from setofrectangles
12: return R1
End getR1
Fig. 4. The getR1 Algorithm
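A hedged Python sketch of this rectangle-selection step is given below. The names (move_time_per_cell, turn_overhead) mirror the quantities in the pseudocode above, but the travel-time estimate (straight-line distance divided by the robot speed) and the one-turn-per-lane approximation are our own assumptions.

import math

def coverage_efficiency(rect, robot_pos, deadline,
                        move_time_per_cell=0.5, turn_overhead=0.2, speed=0.4):
    i, j, p, q = rect
    num_cells = (p - i) * (q - j)
    num_turns = min(p - i, q - j)              # one turn per boustrophedon lane
    travel_time = math.hypot(i - robot_pos[0], j - robot_pos[1]) / speed
    rec_time = move_time_per_cell * num_cells + turn_overhead * num_turns
    est_time = travel_time + rec_time
    area = num_cells
    if est_time > deadline:                    # only part of the rectangle fits
        est_time = deadline
        area = max(0.0, (deadline - travel_time) / move_time_per_cell)
    return area / est_time if est_time > 0 else 0.0

def get_r1(setofrectangles, robot_pos, deadline):
    if not setofrectangles:
        return None
    r1 = max(setofrectangles, key=lambda r: coverage_efficiency(r, robot_pos, deadline))
    setofrectangles.remove(r1)
    return r1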
Algorithm: coverR1(rectangle r1)
 1: cover the r1 area with the Boustrophedon Path algorithm
 2: the rest of this module is executed while the robot is covering the r1 area
 3: if (new obstacle within r1) then
 4:   avoid the obstacle and update the Map
 5: endif
 6: if (new obstacle or new free area within the mbr) then
 7:   update the Map
 8:   RECchanged := True
 9: endif
10: if (the new obstacle or free area is in an unknown area) then
11:   MBRchanged := True
12:   update the Map
13: endif
End coverR1
Fig. 5. The coverR1 Algorithm

Algorithm: AmaxCoverage(deadline)
 1: Global Map
 2: Bool MBRchanged, RECchanged
 3: MBRchanged := False, RECchanged := False
 4: mbr := call MinimumBoundingRectangle()
 5: rectangles := call RectangleTilingwithoutObstacle(mbr)
 6: setofrectangles := call SetCovering(rectangles)
 7: r1 := call getR1(setofrectangles, deadline)
 8: while (r1 is not null and deadline is not exhausted)
 9:   move to the initial position of r1
10:   call coverR1(r1)
11:   if (MBRchanged == True) then
12:     MBRchanged := False
13:     goto line 4
14:   endif
15:   if (RECchanged == True) then
16:     RECchanged := False
17:     goto line 5
18:   endif
19:   r1 := call getR1(setofrectangles, deadline)
20: endofwhile
End AmaxCoverage
Fig. 6. The AmaxCoverage Algorithm
After the robot covers the current rectangle, R1, the algorithm calculates the next R1 and covers it; this process continues until R1 is null. However, as we previously stated, the map of the environment is dynamic; it can change if the robot senses new information. For example, if the robot is covering the R1 rectangle and encounters a
new obstacle or senses a new free area in what was previously an unknown (i.e., unmapped) area, then the robot will update the map with the new information. The strategy for doing this is as follows. If a new obstacle is found in the R1 area, the robot avoids it and updates the map. If a new obstacle or free area is found within the MBR, the rectangle tiling process is recalculated. If either of them is found outside the MBR, that is, in an unknown area, the MBR needs to be recalculated, since the current MBR is no longer valid for the current map. Figure 5 shows how the robot updates and handles the new information while it is covering the R1 area, and Figure 6 summarizes the AmaxCoverage algorithm.
4 Experimental Results and Analysis
In this section, we evaluate the performance of the AmaxCoverage algorithm. Through the experiments, we show how the AmaxCoverage algorithm was implemented and its performance within an unknown environment. The AmaxCoverage algorithm was implemented using Player/Stage, versions 2.1.1 and 2.10. We used the Pioneer 2 DX robot with the SICK laser (LMS200) sensor for the experiments. To build a real-time map of the simulated indoor environment, we utilized the FastSLAM algorithm. Several maps with various indoor layouts were used for the evaluation; Figure 7(a) is a representative example. The dark-black-colored areas represent walls and obstacles, and the three red circles, denoted P1, P2, and P3, represent the starting points of the robot in our experiments; these were used to verify the stability of the AmaxCoverage algorithm's performance. The white area represents the free area, i.e., the area to be cleaned. To evaluate the performance of the algorithm, the robot was set with an average velocity of 0.4 m/s and a delay time of 0.2 s per turn.
Fig. 7. (a) Sample area (left); (b) map constructed after 30 minutes by the AmaxCoverage algorithm (right)
Since the robot does not have any a priori information about the environment, it needs to build a map using the FastSLAM algorithm starting from one of the initial positions. Figure 8 shows the performance evaluation results of the AmaxCoverage algorithm compared with the Boustrophedon Path and Random algorithms. The graphs show the percentage of the completed area over time. At the start of the experiments, the AmaxCoverage algorithm is comparable to the Random algorithm and worse than the Boustrophedon algorithm, due to its planning process, which includes rectangle tiling and set covering. However, in the later phases of the experiments, the AmaxCoverage algorithm reduces the cleaning time by almost 50% compared with the other algorithms, regardless of the initial starting point.
Fig. 8. Cleaning completion rate: (a) results from initial position 'P1'; (b) results from initial position 'P2'; (c) results from initial position 'P3'
Another advantage of the AmaxCoverage algorithm is shown in Figure 7(b). This map was constructed after 30 minutes of covering using the AmaxCoverage algorithm. As shown, the map produced by the AmaxCoverage algorithm is the most similar to the original map in Figure 7(a), compared with the results of the Random and Boustrophedon algorithms discussed in Section 1 and shown in Figure 1. Our experiments suggest that the AmaxCoverage algorithm can complete the coverage of an unknown environment much faster if a human cooperates with the robot on the task. Figure 8 shows that the AmaxCoverage algorithm achieved an 80%–90% completion rate in the same amount of time in which the Boustrophedon algorithm reached only a 40%–50% completion rate. Therefore, if a human working in cooperation with the AmaxCoverage-based robot covered the remaining complex areas, i.e., the non-free areas, the total covering time could be decreased even more. This is possible because the AmaxCoverage algorithm maximizes the coverage of free areas, which can be covered more quickly than non-free areas, in other words, the areas that a human can, at this time, cover much more efficiently.
5 Conclusion
In this paper, we proposed a new coverage algorithm, AmaxCoverage. The AmaxCoverage algorithm integrates efficiently with a SLAM algorithm for unknown environments in real time. When covering the target area, the algorithm gives priority to large, mostly empty spaces with few obstacles over complex areas. This approach allows the algorithm to achieve high coverage efficiency and, when the robot works cooperatively with a human, leads to more efficient total coverage of an unknown environment than the Random and Boustrophedon algorithms. Our future research includes how to detect and account for dynamic obstacles, such as people, pets, etc. Additionally, we are working on extending the types of map the AmaxCoverage algorithm can use, e.g., topological maps.
References 1. Smith, T.F., Waterman, M.S.: Identification of Common Molecular Subsequences. J. Mol. Biol. 147, 195–197 (1981) 2. May, P., Ehrlich, H.C., Steinke, T.: ZIB Structure Prediction Pipeline: Composing a Complex Biological Workflow through Web Services. In: Nagel, W.E., Walter, W.V., Lehner, W. (eds.) Euro-Par 2006. LNCS, vol. 4128, pp. 1148–1158. Springer, Heidelberg (2006) 3. Foster, I., Kesselman, C.: The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, San Francisco (1999) 4. Czajkowski, K., Fitzgerald, S., Foster, I., Kesselman, C.: Grid Information Services for Distributed Resource Sharing. In: 10th IEEE International Symposium on High Performance Distributed Computing, pp. 181–184. IEEE Press, New York (2001)
5. Foster, I., Kesselman, C., Nick, J., Tuecke, S.: The Physiology of the Grid: an Open Grid Services Architecture for Distributed Systems Integration. Technical report, Global Grid Forum (2002) 6. Zelinsky, A., Jarvis, R.A., Byrne, J.C., Yuta, S.: Planning Paths of Complete Coverage of an Unstructured Environment by a Mobile Robot. In: Proceedings of International Conference on Advanced Robotics, Tokyo, Japan, pp. 533–538 (November 1993) 7. Choset, H., Pignon, P.: Coverage Path Planning: the boustrophedon cellular decomposition. In: Proceedings of the International Conference on Field and Service Robotics, Canberra, Australia (December 1997) 8. Neumann de Carvalho, R., Vidal, H.A., Vieria, P., Riberio, M.I.: Complete Coverage Path Planning Guidance for Cleaning Robots. In: Proceedings of the IEEE Int. Symposium on Industrial Electronics, vol. 2 (1997) 9. Wong, S.C., MacDonald, B.A.: A topological coverage algorithm for mobile robots. In: Proceedings of the 2003 IEEE/RSJ Intl. Conference on Intelligent Robots and Systems, Las Vegas, Nevada (October 2003) 10. Yoon, S.H., Park, S.H., Choi, B.J., Lee, Y.J.: Path Planning for Cleaning Robots: A Graph Model Approach. In: Proceedings of the International Conference on Control, Automation and Systems, Cheju National University, Jeju, Korea, October 17-21, pp. 2199–2202 (2001) 11. Agmon, N., Hozon, M., Kaminka, G.A.: Constructing Spanning Trees for Efficient MultiRobot Coverage. In: Proceedings of the 2006 IEEE International Conference on Robotics and Automation, Orlando, Florida (May 2006) 12. Hazon, N., Mieli, F., Kaminka, G.A.: Towards Robust On-line Multi-Robot Coverage. In: Proceedings of the 2006 IEEE International Conference on Robotics and Automation, Orlando, Florida (May 2006) 13. Doh, N.L., Kim, C., Chung, W.K.: A Practical Path Planner for the Robotic Vacuum Cleaner in Rectilinear Environments. IEEE Transactions on Consumer Electronics 53(2), 519–527 (2007) 14. Williams, K., Burdick, J.: Multi-robot Boundary Coverage with Plan Revision. In: Proceedings of the 2006 IEEE International Conference on Robotics and Automation, Orlando, Florida (May 2006) 15. Batalin, M.A., Sukhatme, G.S.: Coverage, Exploration and Deployment by a Mobile Robot and Communication Network. In: Zhao, F., Guibas, L.J. (eds.) IPSN 2003. LNCS, vol. 2634, pp. 376–391. Springer, Heidelberg (2003) 16. Schwager, M., Slotine, J.-J., Rus, D.: Decentralized, Adaptive Control for Coverage with Networked Robots. In: Proceedings of the 2007 IEEE International Conference on Robotics and Automation, Roma, Italy, April 10-14, pp. 10–14 (2007) 17. Taik, O.Y.: Survey on Cleaning Robot Consumer. In: Proceedings of 2nd Annual Workshop of Korea Robotics Society (2005) 18. Bailey, T., Durrant-Whyte, H.: Simultaneous Localization and Mapping (SLAM): Part I. IEEE Robotics & Automation Magazine 13(2), 99–108 (2006) 19. Dissanayake, G., Newman, P., Durrant-Whyte, H.F., Clark, S., Csobra, M.: A Solution to the Simultaneous Localisation and Mapping (SLAM) Problem. IEEE Transactions on Robotics and Automation 17(3), 229–241 (2001) 20. Montemerlo, M., Thrun, S., Koller, D., Wegbreit, B.: Fast-SLAM: A Factored Solution to the Simultaneous Localization and Mapping Problem. In: Proceedings of AAAI National Conference on Artificial Intelligence, pp. 593–598 (2002)
21. Montemerlo, M., Thrun, S., Koller, D., Wegbreit, B.: Fast-SLAM 2.0: An Improved Particle Filtering Algorithm for Simultaneous Localization and Mapping that Probably Converges. In: Proceedings of International Joint Conference on Artificial Intelligence, pp. 1151–1156 (2003) 22. Stewart, I.: Squaring the Square. Scientific American, 74–76 (July 1997) 23. Feige, U.: A Threshold of ln n for Approximating Set Cover. Journal of the ACM (JACM) 45(4), 634–652 (1998) 24. Fox, D.: Adapting the Sample Size in Particle Filters through KLD-Sampling. International Journal of Robotics Research 22 (2003)
An Algorithm for the Automatic Generation of Human-Like Motions Based on Examples

Juan Carlos Arenas Mena 1, Jean-Bernard Hayet 1, and Claudia Esteves 2

1 Centro de Investigación en Matemáticas, CIMAT, Guanajuato, México
2 Departamento de Matemáticas de la Universidad de Guanajuato, Guanajuato, México
{jcarenas,jbhayet,cesteves}@cimat.mx
Abstract. In this work we present an algorithm to automatically generate eye-believable animations for virtual human-like characters evolving in cluttered environments. The algorithm has two components: (1) a motion planner for a simplified model of the character that produces admissible paths for this model and (2) a motion generator based on captured sequences to transform the planned path into a trajectory that can be followed by the complete model of the character. This second component uses a motion capture database and a classifier to choose which motions in the database are the most appropriate to follow the computed path. We give examples of successful plans generated with our method. Keywords: Motion planning, virtual characters.
1 Introduction
During the past decade, the problem of developing automatic methods for synthesizing human motion has gained attention from various research communities. This interest is mainly due to the increasing number of its applications, such as entertainment (video games, movies), product or building design (e.g., planning escape routes from a building), training simulators, education, etc. Human motion synthesis is particularly challenging for two main reasons: (1) our familiarity with human motions makes even small artifacts on the synthesized ones look unnatural, and (2) the high dimensionality of the representation of anthropomorphic figures makes the specification of every configuration highly redundant relative to almost every task and makes the planning task extremely complex. A now standard solution for dealing with the first issue is to use a database of previously recorded motions from human actors and use them on the synthetic characters. In this work, we use a database of motion capture clips to obtain natural-looking motions on our character. The second issue is tackled here by using a simplified model of the character to reduce the problem dimensionality. Overall Approach. In this work, we propose a motion planner that, given the initial and final positions and orientations for the virtual character and a set of motion capture clips, computes automatically a collision-free, eye-believable
trajectory for the character in a cluttered environment. To do this, we propose a reduced model of the character, which allows us to reduce the problem dimensionality in the two separate parts of our approach: Motion Analysis and Motion Generation. The Motion Analysis part is needed to build a suitable motion database from the input motion capture sequences. These clips are not previously annotated with the type of action they represent (e.g., walking, running, jumping, etc.), which reduces manual user input and makes our whole approach automatic. The dimensionality of the motion data is reduced by using a simplified model of the character and Principal Component Analysis (PCA). In the Motion Generation part, a collision-free path is first obtained using probabilistic motion planning techniques. This path is then segmented into homogeneous segments, each of which is compared with the database motions to find the type of action it represents as well as the best clip(s) to follow the path. Contributions. The contribution of this work is twofold: (1) we provide an original simplified model that reduces the dimensionality of the virtual character skeleton, and that we use both for planning and for motion classification; (2) we propose an approach that does not need a manual annotation of the motion database. This makes it possible to consider several behaviors within the generated motions. Paper Organization. In Section 2, we review the most relevant work related to our problem. Section 3 describes our simplified models of the virtual character. Section 4 describes the method for building the motion database from the captured clips. In Section 5, the algorithm for generating the motion, from the path to the trajectory, is detailed. Section 6 presents some results obtained using our method and, finally, in Section 7, conclusions and future work are discussed.
2 Related Work
Among all the methods that have been proposed to synthesize human motion from examples, the most popular is perhaps the Motion Graph (e.g., [9,1]), which stores a set of captured clips and automatically constructs transitions between them when they are pertinent. Clips and transitions are stored in a directed graph, the Motion Graph, which is searched when new animation sequences are needed. The graph representation has the advantage that it preserves the realism of the original clips, and new animation sequences can be synthesized only by performing graph searches. However, as the clips are only stitched together, it is not easy to obtain the fine control and variation necessary to follow a desired path. Hence, other works have proposed controllers based on the combination of captured clips. In [13], Pettré et al. propose a locomotion controller to follow a path of the character pelvis. Independently of how this path is obtained, the controller takes as input the linear and angular velocities of the target path, searches, among the database examples, the three clips with the closest velocities, and interpolates them to get the desired motion. Here, we use at some point Pettré's approach to follow the paths computed by the motion planner.
In the context of motion planners for virtual characters, several examples-based motion synthesizers among obstacles have been proposed (e.g., [14,4,12,11,6]). These methods can be divided into single-query and multiple-query methods. Among the single-query methods for virtual characters, [11] proposed a method using a finite-state machine, where the states are motion capture clips of the same type (running, walking, jumping, etc.) and the connections are transitions between them. The environment is represented as a 2D height map annotated with the type of motion that should be used near certain obstacles. A Rapidly-exploring Random Tree (RRT) [10] is used, with the finite-state machine as a control, to compute a feasible path in the given environment. Among the multiple-query methods, in which our method is inscribed, most methods are two-step. Generally, a feasible path for a reduced model is found first and, in a second step, this path is followed using a motion capture database. In [14], the authors propose a multi-layered grid, where each layer of the grid consists of a single posture of the character. These postures represent a characteristic configuration of a type of movement such as walking, jumping, crawling, etc. A collision-free path is found in this grid by giving some postures throughout the path, which are interpolated and dynamically validated to obtain a feasible path for the character. In [4], the authors plan a feasible, collision-free path for the footprints of the character. These footprints are the nodes of a graph and are linked with motion capture clips. Retargeting methods are used to satisfy the constraints imposed by each footprint position and orientation. In [12], the authors propose a two-stage method to synthesize new motions among obstacles from existing examples, by first computing a collision-free path feasible for a rigid box that has the size of the character. The resulting path is considered as a target path for the character pelvis and is converted into a trajectory by using the locomotion controller of [13]. In [6], the authors take a similar approach but use a functional decomposition of the character and compute collision-free paths only for the lower part of the character. In a second stage, the path is converted into a trajectory using the same locomotion controller as above while at the same time enforcing manipulation constraints using inverse kinematics algorithms. In a third stage, residual collisions are eliminated with a local motion planner. Our work is also a two-stage multiple-query method, like [13] and [6]. Our contributions relative to these works are the addition of more behaviors into the planner and a more complex reduced model to plan the initial collision-free path.
3 Model of the Virtual Character
A virtual character is usually represented as a group of linkages and joints rooted at the pelvis (see Fig. 1(a)). Every joint in this whole-body model is spherical, i.e., its configuration can be completely defined by three angles. The root is a free-floating object in 3D space, whose configuration is specified by three parameters of translation and three of rotation. This model normally has around 53 degrees of freedom for a humanoid, which is not adequate for path planning. In our work, we use this complex model only after having computed a feasible path and only
Fig. 1. (a) Whole-body model of the character. (b) Reduced model for motion analysis. (c) Reduced model for motion synthesis.
for the purpose of executing the motions that have been selected to follow it. Instead of using this complete model, we rely on two reduced models, one for motion analysis (Fig. 1(b)) and the other for motion synthesis (Fig. 1(c)). To reduce the dimensionality of the input data (motion capture clips) in the motion analysis procedure, a polygonal mesh that bounds the character is used. Polygonal meshes are frequently used when adding skin or clothes to a virtual character. In such a mesh, the displacement of any control point pk (i.e., a vertex in Fig. 1(b)) is associated with the motion of each joint ji of the skeleton through a weight wi,k that empirically represents the influence of the joint ji on the motion of the vertex; e.g., a weight wi,k = 0.5 means that the vertex pk moves half the amount and in the same direction as the joint ji. The displacement of a given joint ji is measured relative to an initial posture, usually called the bind pose. Hence, the displacement of a control point pk to a new position p'k is given by

p'k = pk + ∑_{i=1}^{J} wi,k ai ,        0 ≤ ∑_{i=1}^{J} wi,k ≤ 1,
where J is the total number of joints in the skeleton, ai is the 3D displacement vector of joint ji after being transformed by its hierarchy and wi,k is the influence of a given joint on a vertex pk of the mesh. The idea is that, when constructing our motion database, we do not use the values of all the joints ji but only the values of some of the vertices on the polygonal mesh (typically 5 control points) which (1) greatly reduces the problem dimensionality and (2) can be related to the model used in planning. The third model, depicted on Fig. 1(c), is used for planning and synthesizing motions. It consists of a series of boxes linked with a vertical and limited translation joint. The group of boxes has a size such that when the translating joints are at their default value, they bound the character in a neutral position, whereas, at their lower limits, they bound a crawling character and when at their upper limits they bound a jumping character. The way to generate valid configurations for this model is by using the 3D Chainmail algorithm, used for
volume deformation [7]. When an element of the chain is moved, its displacement affects only the position of its neighboring elements, allowing fast propagation of the deformation through the system. When a box moves up or down, the chain linking the boxes absorbs the motion and is stretched up to some limit; when this limit is reached, the motion is transmitted to the neighbors. Hence, small displacements have only a local effect, while large displacements affect the whole system. For example, if two points a and b, with initial vertical positions ya and yb, respectively, are linked by a translational joint with a bound ya+, a displacement yc imposed on a (e.g., because of a collision at height yc) induces new coordinates y'a and y'b that satisfy

y'a = ya + yc   if ya + yc < ya+ ,        y'a = ya+                         if ya + yc ≥ ya+ ,
y'b = yb        if ya + yc < ya+ ,        y'b = yb + (ya + yc − ya+)        if ya + yc ≥ ya+ .

After applying the motion, the system returns to its neutral position. This model is used to find a collision-free, smooth and feasible path for the character, as we describe in Section 5.
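To make the propagation rule concrete, here is a small Python sketch of a one-dimensional vertical chainmail over the box heights; the per-link stretch bound is an assumed parameter, and the code is our illustration rather than the implementation used by the authors.

def chainmail_push(heights, idx, displacement, stretch_limit=0.3):
    """Displace box `idx` vertically and propagate the excess to its neighbors.

    heights: list of current box heights (the chain, ordered bottom to top).
    A link absorbs up to `stretch_limit` of relative displacement; anything
    beyond that bound is passed on to the next element, as in 3D Chainmail.
    """
    h = list(heights)
    h[idx] += displacement
    # Propagate upwards and downwards from the moved element.
    for step in (+1, -1):
        j = idx
        while 0 <= j + step < len(h):
            gap = h[j + step] - h[j]          # current relative offset of the link
            excess = 0.0
            if gap > stretch_limit:
                excess = gap - stretch_limit
            elif gap < -stretch_limit:
                excess = gap + stretch_limit
            if excess == 0.0:
                break                          # the link can absorb it: stop here
            h[j + step] -= excess              # pull the neighbor along
            j += step
    return h

For example, chainmail_push([0.0, 0.0, 0.0], 1, 0.5) raises the middle box by 0.5 and drags its two neighbors up to 0.2, since each link can lag by at most 0.3.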
4 Database Construction: Motion Analysis
Along with the initial and final positions and orientations of the characters, the input data to our algorithm consists of a set of motion capture clips. These clips are processed and arranged into a database so that they can later be compared with the output of a motion planner and the pertinent clip(s) can be chosen to follow the computed path. The first step to construct the database is to reduce
Fig. 2. (a) Polygonal mesh used to reduce the dimensionality of the input motion capture clips; the front part of the mesh along with its numbered vertices pk is shown. (b) One frame of a bending motion, bounded by the polygonal mesh. (c) Example of a reduced input vector for a bending motion, showing the heights of the selected mesh vertices.
the dimensionality of the input data: a typical motion clip consists of more than 500 frames, and each frame is specified by one value for every degree of freedom (DOF) of each joint ji, making the dimension of each clip around 500 × 53. For this purpose, we use the polygonal mesh model of Fig. 2(a), described in Section 3, which bounds the character and divides it vertically into five sections, each giving relevant information on the motion being analyzed. The weights wi,k of this skinning process are given in Table 1. Each input motion capture clip is transformed in this way, and a smaller subset of the mesh vertices is effectively extracted and stored. From observation, we have found that the heights of only five points are enough to characterize the motions considered in this work: yp2, yp5, yp8, yp11 and yp14, where ypi is the y coordinate of point pi (see Fig. 2(a)).

Table 1. Weights wi,k assigned to the mesh vertices of the polygonal mesh during the database construction

Joint ji \ Vertex pk   1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
Head j16               0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0  1.0  1.0
Spine j12              0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0  1.0  1.0  0.0  0.0  0.0
Pelvis j0              0.0  0.0  0.0  0.0  0.0  0.0  1.0  1.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0
LCalf j8               0.0  0.0  0.0  1.0  0.5  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
RCalf j3               0.0  0.0  0.0  0.0  0.5  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
LFoot j10              1.0  0.5  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
RFoot j5               0.0  0.5  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
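As an illustrative sketch (ours, not the authors' code), the five-dimensional feature per frame can be computed by skinning the control vertices with the weights of Table 1 and keeping the heights of vertices 2, 5, 8, 11 and 14; the bind-pose vertex positions and joint displacements are assumed to be given.

import numpy as np

SELECTED_VERTICES = [2, 5, 8, 11, 14]   # feet, knees, pelvis, shoulders, head heights

def frame_features(bind_vertices, joint_displacements, weights):
    """Return the five characteristic heights for one frame.

    bind_vertices: (15, 3) array of mesh control points in the bind pose.
    joint_displacements: (J, 3) array of joint displacements a_i for this frame.
    weights: (15, J) array of skinning weights w_{i,k} (Table 1 transposed).
    """
    displaced = bind_vertices + weights @ joint_displacements   # p'_k = p_k + sum_i w_ik a_i
    heights = displaced[:, 1]                                   # y coordinate of each vertex
    return heights[[k - 1 for k in SELECTED_VERTICES]]

def clip_features(bind_vertices, clip_joint_displacements, weights):
    """Stack per-frame features into the (num_frames x 5) vector used later for Kernel-PCA."""
    feats = [frame_features(bind_vertices, a, weights) for a in clip_joint_displacements]
    return np.concatenate(feats)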
At this stage, each input capture clip of dimension 500 × 53 has been reduced to a vector of dimension 500 × 5, by keeping only the heights of vertices 2, 5, 8, 11 and 14. Fig. 2(b) shows one configuration of a bending motion with the polygonal mesh, and Fig. 2(c) shows an example of the time evolution of the five heights for the same bending motion. It can clearly be seen that the head and shoulders lower while the feet, knees and waist remain at the same height. At this step, a motion is still represented by a vector of dimension 2500. This is why many techniques to segment or classify motion capture data use methods to reduce the dimensionality of the data, such as Principal Component Analysis (PCA) [2]. Another advantage of using dimensionality reduction is that noisy data components are eliminated and only the most relevant ones are kept for classification. Since our input data has many more variables (heights of the mesh vertices in all frames) than observations (number of clips), standard PCA cannot capture the relationships between the variables. Instead, we use Kernel-PCA with a linear kernel [3]. To apply Kernel-PCA with a data matrix M and a linear kernel φ(M) = M, we compute the Gram matrix K = φ(M)φ(M)^T through

mj = (1/nc) ∑_{i=1}^{nc} Xij ,        for j = 1 to nvar,
Mij = Xij − mj ,        for i = 1 to nc and j = 1 to nvar,
K = M M^T,

where X is the nc × nvar observation matrix, nc is the number of motion capture clips and nvar is the number of variables in an observation (i.e., the number of frames × 5).
Fig. 3. Example of 16 input motion capture clips converted into the new coordinate system extracted from performing a Kernel-PCA method. Here, motions are labeled only for the purpose of clarity as our method does not label motions explicitly.
The row vector m contains the mean of each column of matrix X and is subtracted from each row of X to get M, of size nc × 2500. Then, the nc × nc Gram matrix K can be computed. The next step is to obtain the eigenvectors of K, which is done by using Singular Value Decomposition (SVD). Applying SVD gives K = U S V^T, where S is a diagonal matrix containing the singular values and V is the matrix containing the singular vectors. To move our input data into the new coordinate system defined by the eigenvectors we apply

Z = K V,        Zij ← Zij / √(Sii).
At this point, Z is nc × nc and contains the input motion capture data expressed in the new coordinate system. Each vector can be further reduced by taking the first d principal components. With the motion examples taken as input in this work, 90% of the original input information could be recovered by using d = 3 principal components. One can see in Fig. 3 the whole motion database projected on these components.
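The following NumPy sketch reproduces this linear-kernel Kernel-PCA step as we understand it; variable names follow the text, the small epsilon is our own numerical safeguard, and d = 3 is the number of components reported by the authors.

import numpy as np

def kernel_pca_linear(X, d=3, eps=1e-12):
    """Project motion clips onto the first d principal directions.

    X: (n_c, n_var) matrix, one row per clip (n_var = num_frames * 5).
    Returns the d-dimensional training projections and the data needed
    to project new path segments into the same space.
    """
    m = X.mean(axis=0)                 # column means
    M = X - m                          # centered data
    K = M @ M.T                        # Gram matrix (n_c x n_c), linear kernel
    U, s, Vt = np.linalg.svd(K)        # K = U S V^T, s holds the singular values
    V = Vt.T
    Z = (K @ V) / np.sqrt(s + eps)     # normalized projection of the training clips
    return Z[:, :d], m, V[:, :d], s[:d]

def project_segment(x, X, m, Vd, sd, eps=1e-12):
    """Project a new (flattened, resampled) path segment x into the same d-dim space."""
    k = (x - m) @ (X - m).T            # kernel values against the training clips
    return (k @ Vd) / np.sqrt(sd + eps)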
5 Motion Synthesis
Once a motion database has been created, the second stage of our method is to synthesize collision-free whole-body motions for a virtual character navigating in a cluttered environment. Three main steps are followed: (1) a collision-free path is computed using the reduced model of the system shown in Fig. 1(c); (2) the computed path is segmented into homogeneous parts by detecting large changes in the path coordinates and (3) each segment is compared with the database motions to find the type of motion and to determine the adequate controller and input examples to use to generate the whole body motion to follow the computed path. In the following paragraphs these three steps are described.
5.1 Path Planner for the Reduced Model
In order to obtain a feasible collision-free path, any multi-query probabilistic motion planner can be used. Here, we use a variant of the Probabilistic Roadmap (PRM) algorithm [8], which captures the topology of Cfree, i.e., the set of all the collision-free configurations the system can attain, in a roadmap, without computing it explicitly. The roadmap is then used to find a collision-free path that connects the initial and final configurations. The main idea of the PRM is to draw random configurations in Cfree and to connect each of them with edges to its k nearest samples. Edges, or local paths, should also be collision-free, and their form depends on the kinematic constraints of the system. This is the learning phase, where random configurations are drawn within the range allowed for each degree of freedom of the mechanism to build the roadmap. In the query phase, the initial and final configurations are added as new nodes and connected with the existing graph. Then, a graph search is performed to find a collision-free path between the start and the goal configurations. If such a path is found, it is smoothed to remove useless detours. The difference between our method and the classic PRM algorithm is that, when a colliding configuration is detected, the Chainmail algorithm mentioned in Section 3 looks for a configuration that avoids the collision by displacing the boxes of the reduced model up or down. Bézier curves are used to obtain smooth paths.
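A condensed sketch of the learning and query phases of such a planner is shown below; it abstracts collision checking (and hence the Chainmail height adjustment) behind callbacks, so it only illustrates the structure of a PRM rather than the authors' actual implementation.

import math
import networkx as nx

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_prm(sample_free, is_edge_free, n_samples=500, k=10):
    """Learning phase: sample free configurations and connect k nearest neighbors."""
    nodes = [sample_free() for _ in range(n_samples)]
    g = nx.Graph()
    g.add_nodes_from(range(len(nodes)))
    for i, q in enumerate(nodes):
        nearest = sorted(range(len(nodes)), key=lambda j: dist(q, nodes[j]))[1:k + 1]
        for j in nearest:
            if is_edge_free(q, nodes[j]):
                g.add_edge(i, j, weight=dist(q, nodes[j]))
    return nodes, g

def query(nodes, g, start, goal, is_edge_free, k=10):
    """Query phase: connect start/goal to the roadmap and search the graph."""
    nodes = nodes + [start, goal]
    s, t = len(nodes) - 2, len(nodes) - 1
    g.add_nodes_from([s, t])
    for idx in (s, t):
        nearest = sorted(range(len(nodes) - 2), key=lambda j: dist(nodes[idx], nodes[j]))[:k]
        for j in nearest:
            if is_edge_free(nodes[idx], nodes[j]):
                g.add_edge(idx, j, weight=dist(nodes[idx], nodes[j]))
    try:
        path = nx.shortest_path(g, s, t, weight="weight")
    except nx.NetworkXNoPath:
        return None
    return [nodes[i] for i in path]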
Fig. 4. Two different planning scenarios. (a) The system avoids the obstacle by passing on its side. (b) The system avoids the obstacle by passing over it, thanks to the inclusion of the Chainmail technique inside the planner. (c) When jumping, the height of the jump is an important parameter for whole-body motion generation. (d) Height of each box extracted from the computed path.
In Fig. 4, two scenarios are shown. In Fig. 4(a), the obstacle is so tall that the reduced model cannot avoid it except by passing on its side. Three paths are shown: the path extracted from the PRM, which makes a detour (thin dotted line), the optimized path avoiding detours (dotted line) and the path smoothed with Bézier curves (continuous curve). In Fig. 4(b) the obstacle is avoided using the Chainmail, i.e., the reduced model can go over the obstacle. The path computed by the planner is a sequence of configurations specified with 8 DOFs: two for the system position (x, y), one for the orientation θ, and
5 for the heights of the boxes. In Fig. 4(c), an example path for the 5 heights is shown. This path is smoothed to allow a better comparison with the input motion captures.

5.2 Path Segmentation and Classification
Path segmentation consists in obtaining ns motion segments Si from the path computed by the planner. Ideally, each segment Si contains a single type of motion, and its size is not important as long as the type of motion remains the same. The segments are the input to a nearest-neighbor classifier over the motion capture database and can be labeled with the type of motion they correspond to. Our segmentation method is simple: we detect strong changes in the heights of the boxes (the y-coordinates of the linkages of the reduced model). The initial and final points of a segment Si are set a few frames before and after the strong change occurs. Fig. 5(a) shows an example of a path segmented into five pieces. To be able to compare each segment with the input data, the segment is resampled to a length of 500 frames. Each segment therefore has a dimension of 500 × 5, the same as the input motion data.
Fig. 5. (a) Segmentation of a path; segments are divided when a strong change in height is detected. (b) A segment is classified by projecting it into the coordinate system defined by the principal components of the training data and then finding its nearest neighbor; in this example the segment is classified as a jump.
Then, the classifier projects the segment into the coordinate system provided by the principal components of the training data. There, the nearest neighbor is found (see Fig. 5(b)), which gives the adequate type of movement to follow the path.
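A short sketch of how such a segmenter and nearest-neighbor classifier could look in practice (our own illustration; the change-detection threshold, the margin of a few frames and the resampling helper are assumptions):

import numpy as np

def segment_path(heights, threshold=0.15, margin=5):
    """Split a (T x 5) matrix of box heights at strong height changes."""
    change = np.abs(np.diff(heights, axis=0)).max(axis=1)
    cuts = np.where(change > threshold)[0]
    segments, start = [], 0
    for c in cuts:
        if c - margin > start:                     # end a few frames before the change
            segments.append(heights[start:c - margin])
        start = c + margin                         # resume a few frames after it
    if start < len(heights):
        segments.append(heights[start:])
    return segments

def resample(segment, n_frames=500):
    """Resample a (T x 5) segment to n_frames rows by linear interpolation."""
    t_old = np.linspace(0.0, 1.0, len(segment))
    t_new = np.linspace(0.0, 1.0, n_frames)
    return np.column_stack([np.interp(t_new, t_old, segment[:, c])
                            for c in range(segment.shape[1])])

def nearest_clip(segment, Z_train, project):
    """Index of the nearest training clip in the Kernel-PCA space.

    `project` maps a flattened 500 x 5 segment to the d-dimensional space
    (e.g., the project_segment helper sketched in Section 4)."""
    z = project(resample(segment).ravel())
    return int(np.argmin(np.linalg.norm(Z_train - z, axis=1)))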
5.3 Whole-Body Motion Synthesis
Once the type of motion is identified by the classification process, a specific local controller for each type of motion (walking, jumping, bending, etc.) is used to generate the whole-body motions needed to follow the path. Each controller needs a different set of input parameters, among which is the linear velocity provided by the user. In this work, we use only three types of motion and therefore three local controllers. The first one is the locomotion generator presented in
[13], which uses as input the linear and angular velocities of the desired walking pattern (extracted from the path) and produces a locomotion sequence by interpolating the 3 closest captures in the example input data. This controller is used when the segment is classified as a walking/running motion. The second controller chooses an appropriate jumping motion among the motion capture data by using the linear velocity and the jumping height extracted from the path segment. The third controller, for bending motions, is very similar to the jumping controller except that it takes the amount of displacement of the highest point of the reduced model to obtain a bending or crawling motion capture that avoids the obstacle. After the whole-body motions are obtained for each segment, they are interpolated with the previous segments to get the complete trajectory. Algorithm 1 sums up the complete motion generation method.

Algorithm 1. Whole-body motion generation algorithm
Require: Initialize database {see Section 4}
 1: Compute path p {see Section 5.1}
 2: Segment path p = ∪ Segmenti {see Section 5.2}
 3: for i = 1 to ns do {ns is the number of segments in the path}
 4:   Labeli = nearest motion clip to Segmenti
 5:   Sequencei = subcontroller(Labeli, Segmenti)
 6: end for
 7: for i = 1 to ns do {ns is the number of segments in the path}
 8:   Trajectory = Trajectory + Sequencei
 9: end for
10: return Trajectory
6 Results
In this section, some results using different environments are presented. We use 16 input motion captures from the CMU database [5], which are of three different behaviors or types: jumping, walking/running and bending/crawling. To construct the database, the "skinning" process (see Section 4) is applied to all 16 input motions, and the resulting 2500-dimensional vectors are projected into the coordinate system specified by the first 3 principal components obtained after applying the Kernel-PCA method, as they contain at least 90% of the information from the original motion data. The constructed database, seen in Figure 3, has three well-identified clusters of motions: jumps, walks and bends. Walking Motions. The first example of synthesized motions is a walking motion. A path is computed using the PRM-Chainmail method, avoiding the obstacle by going on its side. A graph of the y-coordinates of the joints of the boxes is shown in Fig. 6(a). As there are no strong height changes, only one segment is extracted and classified, as shown in Fig. 6(b). Figure 6(c) shows the motion generated by the controller of Pettré et al. [13]. The computational time for this example on a standard PC was 7.02 s (planning with PRM construction) plus 0.0026 s (path segmentation) plus 0.018 s (path classification).
Fig. 6. Walking trajectory: (a) planning/segmentation, (b) classification, (c) whole-motion synthesis
Fig. 7. A walk-jump-walk trajectory: (a) planning/segmentation, (b) classification, (c) whole-motion synthesis. (d) A trajectory with 7 segments and 3 types of motions.
Jumping and Bending Motions. The second example is a walk-jump-walk motion. The planner computed a path that goes over the obstacle. A graph of the heights of the joints of the reduced model is shown in Fig. 7(a). Here, 3 segments are extracted and classified as walking and jumping motions (Figure 7(b)). From the path data the 2 controllers generate 3 trajectory segments which are interpolated to form the complete trajectory seen in Figure 7(c). Computational time for this example was 3.02s (planning with PRM construction) plus 0.001s (path segmentation) plus 0.017*3 s (path classification, as there are 3 segments). A third example (Fig. 7 (d)) shows a trajectory with all three types of motions and 7 segments. The computational time was 3.56 s (planning with PRM construction) plus 6.26 s (path segmentation) plus 0.018*7 s (path classification).
7 Conclusions and Future Work
We have presented an automatic motion planner and synthesizer for virtual characters in cluttered environments. By using two reduced models of the system, our method is able to produce human-like motions chosen among the examples stored in a motion database. We have shown examples of the trajectories generated with our method in challenging environments. As future work, we intend to propose new motion controllers for different types of motion, such as jumping or bending, in order to produce new motions from existing ones and thus better satisfy the constraints imposed on the character by the computed path. Among the limitations of our method, the motion needed to avoid obstacles (e.g., bending) may have an amplitude which is not available in the motion database. To solve this problem, more motions could be included in the database, or the local controllers could use a generalized inverse-kinematics method to locally avoid the obstacle. Also, it may happen that the planner provides a path with two consecutive segments and that, when synthesizing the motions, the first motion ends further than the point where the second motion has to start. This can be solved by interpolating the consecutive segments according to the obstacles or by adapting the motion length.
References
1. Arikan, O., Forsyth, D.A.: Interactive motion generation from examples. ACM Transactions on Graphics 21(3), 483–490 (2002)
2. Barbič, J., Safonova, A., Pan, J., Faloutsos, C., Hodgins, J., Pollard, N.: Segmenting motion capture data into distinct behaviors. In: Proc. of Graphics Interface, GI 2004, pp. 185–194 (2004)
3. Bishop, C.: Pattern Recognition and Machine Learning. Springer, Heidelberg (2007)
4. Choi, M., Lee, J., Shin, S.Y.: Planning biped locomotion using motion capture data and probabilistic roadmaps. ACM Trans. Graph. 22(2), 182–203 (2003)
5. CMU: Motion capture database (2003), http://mocap.cs.cmu.edu
6. Esteves, C., Arechavaleta, G., Pettré, J., Laumond, J.-P.: Animation planning for virtual characters cooperation. ACM Trans. Graph. 25(2), 319–339 (2006)
7. Gibson, S.: 3D chainmail: a fast algorithm for deforming volumetric objects. In: I3D 1997: Proc. of the Symp. on Interactive 3D Graphics, pp. 149–154. ACM, New York (1997)
8. Kavraki, L., Svestka, P., Latombe, J.-C., Overmars, M.: Probabilistic roadmaps for path planning in high dimensional configuration spaces. IEEE Trans. on Robotics and Automation 12(4), 566–580 (1996)
9. Kovar, L., Gleicher, M., Pighin, F.: Motion graphs. ACM Transactions on Graphics 21(3), 473–482 (2002)
10. Kuffner, J., Lavalle, S.: RRT-connect: An efficient approach to single-query path planning. In: IEEE Int. Conf. on Robotics and Automation, pp. 995–1001 (2000)
11. Lau, M., Kuffner, J.J.: Behavior planning for character animation. In: SCA 2005: Proceedings of the 2005 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp. 271–280. ACM, New York (2005)
12. Pettré, J., Laumond, J.-P., Siméon, T.: A 2-stages locomotion planner for digital actors. In: SCA 2003: Proc. ACM SIGGRAPH/Eurographics Symp. on Computer Animation, pp. 258–264 (2003)
13. Pettré, J., Laumond, J.-P.: A motion capture-based control-space approach for walking mannequins. Comp. Anim. Virtual Worlds 17, 109–126 (2006)
14. Shiller, Z., Yamane, K., Nakamura, Y.: Planning motion patterns of human figures using a multi-layered grid and the dynamics filter. In: ICRA, pp. 1–8 (2001)
Line Maps in Cluttered Environments

Leonardo Romero and Carlos Lara

Division de Estudios de Posgrado, Facultad de Ingenieria Electrica, Universidad Michoacana, Ciudad Universitaria, 58060, Morelia, Mexico
{lromero,larac}@umich.mx
Abstract. This paper uses the smoothing and mapping framework to solve the SLAM problem in indoor environments, focusing on how some key issues such as feature extraction and data association can be handled by applying probabilistic techniques. For feature extraction, an odds-ratio approach to find multiple lines from laser scans is proposed; this criterion makes it possible to decide which models must be merged and to output the best number of models. In addition, to solve the data association problem, a method based on the segments of each line is proposed. Experimental results show that high-quality indoor maps can be obtained from noisy data. Keywords: Robotics, Probabilistic Reasoning, Line Maps.
1 Introduction
Simultaneous Localization And Mapping (SLAM) is a classical problem in robotics. To solve it, a robot must use the measurements provided by its sensors to estimate a map of the environment and, at the same time, to localize itself within the map. While localization with a given map or mapping with known positions is relatively easy, the combined problem is hard to solve. Several approaches to SLAM have been proposed; the most successful are based on probabilistic techniques [1]. The principal differences between them are the map representation and the uncertainty description. The feature-based SLAM approach uses a collection of primitives to represent the map. Features are recognizable structures of elements in the environment. The feature extraction process reduces the complexity by capturing only the essential information of the raw data; this data reduction allows the sensor information to be represented and managed efficiently, but the resulting maps are usually less expressive and precise than when using the raw data. Lines and segments are commonly used to represent indoor environments, and the preferred sensor is the laser scanner. Many algorithms are available in the literature for extracting multiple lines from range scans; unfortunately, some of these approaches are ad hoc methods that use a distance threshold. One of the contributions of this paper is a new method to find multiple lines from laser scans;
the formulation presented focuses on estimating a likelihood ratio on the number of lines that are present in a laser scan. The proposed approach follows Ockham's razor; this principle states that the simplest explanation for some phenomenon is more likely to be accurate than more complicated explanations. Other approaches that follow this principle are the Minimum Description Length (MDL) [2,3] and the Akaike Information Criterion (AIC) [4]. Another problem faced by the feature-based SLAM approach is known as the association problem; it consists of associating the newest features with those already stored in the map. This is also a crucial problem, and a good option to solve it is the Joint Compatibility Test (JCT) [5]. In this paper, a validation gate is used within the JCT; the validation gate uses the segments associated with each line to improve the association. To solve the SLAM problem, the Smoothing and Mapping (SAM) framework [6] is implemented; SAM is a smoothing approach rather than the commonly used filtering one. The essential computational advantage arises because the smoothing information matrix remains sparse without the need for any approximations; SAM is based on QR and Cholesky matrix factorizations that greatly speed up the optimization procedure, leading to a very efficient algorithm. The rest of the paper is organized as follows: Section 2 introduces the new approach to determine the best number of lines and their parameters; Section 3 overviews the Smoothing and Mapping approach; Section 4 focuses on the association problem; experimental results are presented in Sect. 5; finally, Sect. 6 concludes the paper.
2 Multiple Line Extraction
The principal problems in obtaining multiple features are: 1) finding the best number of features, 2) determining which points belong to each feature, and 3) estimating the feature parameters given its points. There are many techniques for multiple geometric feature extraction: bottom-up approaches [7,8,9], probabilistic techniques [10,11], voting schemes such as RANSAC [12,13], the Hough Transform [14], etc. The bottom-up approach has been used in many pattern recognition tasks. First, local features are extracted from the data. Normally, a distance between each pair of current clusters is computed and the closest pair is merged iteratively until a stopping criterion is met; the simplest criterion involves a threshold: when the distance of the closest features is greater than the predefined threshold, the process is stopped [9]. Here, the result depends strongly on the threshold and the distance used (usually a Euclidean or Mahalanobis distance). Robust regression techniques can manage data with a large proportion of data points that do not belong to the main model. The most widely used robust algorithm is Random Sampling and Consensus (RANSAC) [12]. Due to its greedy nature, the sequential approach does not consider the relationship among different models; consequently, the result is usually imprecise for non-trivial cases.
To estimate the number of features and their parameters, Thrun et al. [10] use a real-time variant of the expectation maximization (EM) algorithm; their method penalizes models with many features by using an exponential prior probability; the search for the best number of features is interleaved with the running EM algorithm. The search involves a step for creating new features, and another one for removing features, both executed at regular intervals. Han et al. [11] formulate the problem of finding features from range images in the Bayesian framework, where prior probabilities penalize the number of features. The algorithm simulates Markov chains with both reversible jumps and stochastic diffusions to traverse the solution space; reversible jumps are used to move among subspaces of different dimensions, such as switching feature models and changing the number of features. This paper formulates the problem of finding the number of features from a Bayesian viewpoint; specifically, we study the problem of finding a set of lines from a laser scan. Indoor environments are usually rich in planar surfaces, and lines are the natural way to represent them. Nguyen et al. [15] compare some algorithms to extract lines from laser scans.
2.1 Problem Statement and Notation
The multiple line extraction problem is stated as: given a set of points from a laser scan, find the best number of line segments and their parameters. The challenge for any line extraction algorithm consists of finding a realistic representation. Let Z = {z_1, ..., z_N} be a set of measurements obtained from a two-dimensional laser scan; the line extraction problem consists of finding the set of lines

Θ = {θ_1, ..., θ_M}    (1)

that best represents Z, where θ_j = (r_j, α_j) are the parameters of the j-th line.
2.2 Initial Segmentation
The proposed algorithm to find lines from a laser scan is similar to traditional agglomerative clustering. Initial clusters can be obtained by any conventional bottom-up technique (e.g., sliding window [9] or Iterative End Point Fit (IEPF) [7]). The segmentation step finds a set of M linear clusters Z = {Z_1, ..., Z_M} from Z; each linear cluster Z_i is a group of adjacent points that follow a linear model. These clusters are pairwise disjoint, that is, Z_i ∩ Z_j = {} for all i ≠ j. To find the resulting line map, similar clusters must be merged; the merging phase is usually based on a Euclidean or Mahalanobis distance. The following section formulates the merging phase from a probabilistic viewpoint.
2.3 Odds Ratio Test (ORT)
Let Z_a ⊆ Z be a point cluster that follows a linear model with parameters θ_a; it is interesting to find the probability that Z_a is generated by θ_a. Assuming independence among measurements,

Pr(Z_a | θ_a) ∝ ∏_{z_i ∈ Z_a} exp( −d_⊥²(z_i, θ_a) / (2σ²) ) = exp( −Σ_{z_i ∈ Z_a} d_⊥²(z_i, θ_a) / (2σ²) ) ∝ exp( −χ_a² / 2 ),    (2)

where

χ_a² = Σ_{z_i ∈ Z_a} d_⊥²(z_i, θ_a) / σ².    (3)
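For illustration (this helper is not part of the paper), the error in (3) can be evaluated directly from the (r, α) parameters, assuming the usual normal form x·cos α + y·sin α = r for a line; the function names are illustrative.

import numpy as np

def perpendicular_distances(points, r, alpha):
    """Perpendicular distances of 2-D points to the line x*cos(a) + y*sin(a) = r.

    `points` is an (N, 2) array; the (r, alpha) normal form is assumed here,
    matching the theta_j = (r_j, alpha_j) parameters used in the text.
    """
    x, y = points[:, 0], points[:, 1]
    return np.abs(x * np.cos(alpha) + y * np.sin(alpha) - r)

def chi_square(points, r, alpha, sigma):
    """Cluster error chi^2 as in Eq. (3): sum of squared perpendicular
    distances divided by the sensor noise variance sigma^2."""
    d = perpendicular_distances(points, r, alpha)
    return np.sum(d ** 2) / sigma ** 2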
Using (2) and considering independent Gaussian measurements, the likelihood for the M clusters is

Pr(Z | M, I) = 1/(2π r_max)^M ∫···∫ exp( −(1/2) Σ_{j=1}^{M} χ_j² ) d^M r_j d^M α_j
            = 1/(2π r_max)^M ∏_{j=1}^{M} ∫∫ e^{−χ_j²/2} dr_j dα_j.    (4)
The (M−1)-lines model is obtained by merging two clusters Z_a, Z_b ∈ Z into a new cluster Z_c = Z_a ∪ Z_b. The likelihood in that case is

Pr(Z | M−1, I) = 1/(2π r_max)^{M−1} × ( ∏_{j=1}^{M} ∫∫ e^{−χ_j²/2} dr_j dα_j ) × ( ∫∫ e^{−χ_c²/2} dr_c dα_c ) / ( ∫∫ e^{−χ_a²/2} dr_a dα_a · ∫∫ e^{−χ_b²/2} dr_b dα_b ).    (5)

To choose between the M-lines model and the (M−1)-lines model we compare their likelihoods by using the ratio:

Pr(Z | M−1, I) / Pr(Z | M, I) = 2π r_max ( ∫∫ e^{−χ_c²/2} dr_c dα_c ) / ( ∫∫ e^{−χ_a²/2} dr_a dα_a · ∫∫ e^{−χ_b²/2} dr_b dα_b ).    (6)
The integrals in (6) can be solved by using a Taylor series expansion about the parameters of the least-squares line for each cluster. Let θ_j* = (r_j*, α_j*) be the parameters of the least-squares line for the j-th cluster and χ_j*² its corresponding error. A Taylor expansion about χ_j*² gives

χ_j² ≈ χ_j*² + (1/2) (θ − θ_j*)ᵀ ∇∇χ_j² (θ − θ_j*) + ...,    (7)
where ∇∇χ² is the Hessian matrix, evaluated at θ_j*. Finally, the odds ratio is found by solving (6) with (7):

R = Pr(Z | M−1, I) / Pr(Z | M, I) = (r_max / 2) · √( det(∇∇χ_a²) det(∇∇χ_b²) / det(∇∇χ_c²) ) × exp( (1/2) (χ_a*² + χ_b*² − χ_c*²) ).    (8)

When the value of the odds ratio (8) is equal to one, both models are equally likely; lower values mean that the (M−1)-lines model obtained by merging Z_a and Z_b is less likely to occur than the M-lines model, that is, clusters Z_a and Z_b should not be merged. On the other hand, values greater than one favor the (M−1)-lines model. The odds ratio can therefore be used to decide greedily which two clusters to merge. Equation (8) follows Ockham's razor: the factors that multiply the exponential term penalize models with more features.
2.4 Proposed Algorithm
As shown in the previous section, pairs of clusters can be merged iteratively based on the ratio given by (8). A tree provides a picture of agglomerative clustering techniques; in this sense, the value obtained from (8) also helps to decide the cutting height of the tree. Given a laser scan Z, Algorithm 1 obtains a set of lines Θ. At step 1, a set of local clusters is extracted from the laser scan; the clusters with the highest
Algorithm 1. Proposed Algorithm
Input: A laser scan Z = {z_1, ..., z_N}
Output: A set of lines Θ = {θ_1, ..., θ_M} and their corresponding clusters Z = {Z_1, ..., Z_M}
1:  Find a set of local clusters Z = {Z_1, ..., Z_M} where ∪_{i=1}^{M} Z_i ⊆ Z, and Z_i ∩ Z_j = {} for i ≠ j
2:  repeat
3:      Find the pair Z_a, Z_b ∈ Z with the highest probability of the merged hypothesis
4:      r ← Pr(Z | M−1, I) / Pr(Z | M, I)
5:      if r > 1 then
6:          Z_a ← Z_a ∪ Z_b
7:          Z ← Z \ {Z_b}
8:          M ← M − 1
9:      end
10: until r ≤ 1
11: Θ = {θ_j | j = 1, ..., M}, where θ_j is the best line for the cluster Z_j in the least-squares sense
12: return Θ, Z
Fig. 1. Bayesian belief network for a SLAM problem. The objective of SLAM is to localize the robot while simultaneously building a map of the environment. The Full SLAM problem requires that the entire robot trajectory is also determined.
Odds Ratio are merged until the value of r is less than or equal to one (lines 2-10). This approach is valid because the densities involved are MLR (i.e. have a monotone likelihood ratio), and since the Gaussian model is MLR, the Odds Ratio Test is applicable.
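To illustrate how the merge criterion can be used in practice, the following Python sketch implements the greedy loop of Algorithm 1 together with a version of the odds ratio (8). It is not the authors' implementation: the noise value SIGMA, the total-least-squares line fit, and the analytic Hessian of χ² used below are assumptions introduced for the example, and every cluster is assumed to contain several points.

import numpy as np

SIGMA = 0.01  # assumed sensor noise standard deviation [m]; illustrative value

def fit_line(points):
    """Total-least-squares fit of (r, alpha) for the line x*cos(a) + y*sin(a) = r."""
    c = points.mean(axis=0)
    d = points - c
    S = d.T @ d                      # 2x2 scatter matrix
    w, v = np.linalg.eigh(S)         # ascending eigenvalues
    n = v[:, 0]                      # normal = eigenvector of the smallest eigenvalue
    alpha = np.arctan2(n[1], n[0])
    r = c @ n
    if r < 0:                        # keep r >= 0 by flipping the normal direction
        r, alpha = -r, alpha + np.pi
    return r, alpha

def chi2_and_hessian(points, r, alpha, sigma=SIGMA):
    """chi^2 of Eq. (3) and its Hessian with respect to (r, alpha)."""
    x, y = points[:, 0], points[:, 1]
    e = x * np.cos(alpha) + y * np.sin(alpha) - r      # signed residuals
    t = -x * np.sin(alpha) + y * np.cos(alpha)         # d e / d alpha
    chi2 = np.sum(e ** 2) / sigma ** 2
    h_rr = 2.0 * len(points) / sigma ** 2
    h_ra = -2.0 * np.sum(t) / sigma ** 2
    h_aa = 2.0 * np.sum(t ** 2 - e * (e + r)) / sigma ** 2
    return chi2, np.array([[h_rr, h_ra], [h_ra, h_aa]])

def odds_ratio(cluster_a, cluster_b, r_max, sigma=SIGMA):
    """Odds ratio R of Eq. (8) for merging clusters a and b into c = a U b.
    r_max is the maximum sensor range appearing in the prior volume of (4)."""
    merged = np.vstack([cluster_a, cluster_b])
    terms = [chi2_and_hessian(p, *fit_line(p), sigma) for p in (cluster_a, cluster_b, merged)]
    (chi_a, H_a), (chi_b, H_b), (chi_c, H_c) = terms
    log_R = (np.log(r_max / 2.0)
             + 0.5 * (np.log(np.linalg.det(H_a)) + np.log(np.linalg.det(H_b))
                      - np.log(np.linalg.det(H_c)))
             + 0.5 * (chi_a + chi_b - chi_c))
    return np.exp(log_R)

def merge_clusters(clusters, r_max, sigma=SIGMA):
    """Greedy merging loop of Algorithm 1 (steps 2-10), driven by the odds ratio."""
    clusters = list(clusters)
    while len(clusters) > 1:
        i, j, r = max(((i, j, odds_ratio(clusters[i], clusters[j], r_max, sigma))
                       for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
                      key=lambda ijr: ijr[2])
        if r <= 1.0:
            break
        clusters[i] = np.vstack([clusters[i], clusters[j]])
        del clusters[j]
    return clusters

Working in log space inside odds_ratio avoids overflow when the χ² terms become large.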
3 SLAM Framework
The Full SLAM problem can be formulated by a belief net representation as shown in Fig. 1; here, the iSAM approach [6] is used to solve it. The trajectory of the robot is denoted by X = [x_0, ..., x_M], where x_i is the i-th pose of the robot. The map of the environment is denoted by the set of landmarks L = {l_j | j = 1, ..., N}, and the measurement set by Z = {z_k | k = 1, ..., K}. For line maps, l_j is a line – we use the algorithm introduced in the previous section to obtain line measurements from laser scans. The joint probability model corresponding to this network is

P(X, L, Z) = Pr(x_0) ∏_{i=1}^{M} Pr(x_i | x_{i−1}, u_i) ∏_{k=1}^{K} Pr(z_k | x_{i_k}, l_{j_k}).    (9)
Let us denote by τ = (X, L) the variables we are looking for; the maximum a posteriori (MAP) estimate is obtained by

τ* = arg max_τ Pr(X, L | Z) = arg min_τ ( −log P(X, L, Z) ).    (10)
Assuming Gaussian process and measurement models, defined by

Pr(x_i | x_{i−1}, u_i) ∝ exp( −(1/2) ‖f_i(x_{i−1}, u_i) − x_i‖²_{Λ_i} ),    (11)
Pr(z_k | x_{i_k}, l_{j_k}) ∝ exp( −(1/2) ‖h_k(x_{i_k}, l_{j_k}) − z_k‖²_{Γ_k} );    (12)
where ‖e‖²_Σ ≡ eᵀ Σ⁻¹ e is the squared Mahalanobis distance. A standard nonlinear least-squares formulation is obtained by combining (11), (12) and (9), and considering the initial distribution Pr(x_0) as uniform:

τ* = arg min_τ { Σ_{i=1}^{M} ‖f_i(x_{i−1}, u_i) − x_i‖²_{Λ_i} + Σ_{k=1}^{K} ‖h_k(x_{i_k}, l_{j_k}) − z_k‖²_{Γ_k} }.    (13)
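To make the optimization in (13) concrete, here is a sketch (not taken from the paper) of a single Gauss–Newton step in which the whitened residuals are stacked and the linearized system is solved through a QR factorization; the callback interface is an assumption of the example.

import numpy as np

def gauss_newton_step(tau, residual_fn, jacobian_fn):
    """One Gauss-Newton step for the nonlinear least-squares problem (13).

    `tau` stacks all pose and landmark variables into one vector;
    `residual_fn(tau)` returns the stacked residuals of (13), already whitened
    by the square roots of the inverse covariances Lambda_i and Gamma_k;
    `jacobian_fn(tau)` returns the corresponding (sparse in practice) Jacobian.
    Both callbacks are assumptions of this sketch, not an API of the paper.
    """
    b = -residual_fn(tau)            # right-hand side of the linearized system
    A = jacobian_fn(tau)             # measurement Jacobian at the linearization point
    # Solve min ||A * delta - b|| via QR, as SAM does; a dense QR is used here
    # for clarity, whereas iSAM keeps a sparse factorization and updates it
    # incrementally with variable reordering to avoid fill-in.
    Q, R = np.linalg.qr(A)
    delta = np.linalg.solve(R, Q.T @ b)
    return tau + delta

Iterating this step until the correction is negligible gives the batch SAM solution; iSAM, discussed next, avoids recomputing the factorization from scratch when new measurements arrive.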
Kaess et al. [6] propose to solve (13) incrementally; their method, known as iSAM, provides an efficient and exact solution by updating a QR factorization of the naturally sparse smoothing information matrix, therefore recalculating only the matrix entries that actually change. iSAM is efficient even for robot trajectories with many loops, as it avoids unnecessary fill-in in the factor matrix by periodic variable reordering. Also, to enable data association in real time, Kaess et al. suggest efficient algorithms to access the estimation uncertainties of interest based on the factored information matrix. We have adopted a full SLAM scheme such as iSAM because the information of each line can be updated when necessary. This can be accomplished because the robot position where a given measurement was taken is known. The following section describes how to use geometric information to improve data association.
4 Association Problem
The association problem consists of determining whether two measurements acquired at different times originated from the same physical object. In the context of feature-based SLAM, the system must determine the correct correspondences between a recently measured feature and the map landmarks. This is a challenging task because features are often indistinguishable. The solution of this problem is essential for consistent map construction, since any single false matching may invalidate the entire process [16]. The direct solution consists of associating a measurement with the closest predicted observation – a predicted observation is a function of the best robot pose and a map feature. This solution, known as the Single Compatibility Test (SCT), commonly uses the Mahalanobis distance or Normalized Innovation Squared (NIS) [17,18]. Single compatibility ignores that measurement prediction errors are correlated; hence, it is susceptible to accepting incorrect matchings. Neira and Tardós [5] propose the Joint Compatibility Test (JCT); an implementation of the JCT, known as the Joint Compatibility Branch and Bound (JCBB) algorithm, generates tentative sets of associations and searches for the largest set that satisfies joint compatibility. For a given set of association pairs, joint compatibility is determined by calculating a single joint NIS gate. The benefit of joint compatibility is that it preserves the correlation information within the set of observations and predicted observations. Note that validation gates based on the NIS distance (such as SCT and JCT) provide no statistical measure for the rejection of false associations [18]. To
Fig. 2. An illustration of the segments Si = {si,1, si,2} and free-segments S̄i = {s̄i,1} of a line li
overcome false associations, the following section introduces a technique that uses both the compatibility test and geometric constraints to find correct matchings between lines.
4.1 Validation Gate Based on Segments
Let us define Si as the set of segments of the i-th line li and S̄i as the set of free-segments (the intervals of the line where there is a high probability of free space). The segments Si are calculated with the inliers of a line, while the free-segments S̄i are calculated with the points whose measurement rays cross the line li, see Fig. 2. The set of segments from a set of points can be obtained straightforwardly, although general techniques can be used [19,20]. To perform geometric association based on segments, the i-th line li and the new line lj are transformed into the same coordinate frame; this transformation allows treating the segments of a line as a set of intervals. Then the intersection of two segment sets A ∧ B can be easily calculated by finding those segments that are present both in A and B. The probability that two lines geometrically represent the same line is

Pr(G | Si, S̄i, Sj) = ‖Sj ∧ Si‖ / ( ‖Sj ∧ Si‖ + ‖Sj ∧ S̄i‖ )   if ( ‖Sj ∧ Si‖ + ‖Sj ∧ S̄i‖ ) ≠ 0,
                     1/2   otherwise,    (14)

where ‖·‖ is the sum of the lengths of the segments. Here, a value of 0.5 is assigned when the local segments have never been seen before. The Segment Validation (SV) gate can be used to improve the Joint Compatibility Test: a pair of lines is used in the Joint Compatibility Test only if Pr(G | Si, S̄i, Sj) ≥ 0.5.
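A hedged sketch of the gate in (14): once the segments and free-segments of line i and the segments of a candidate line j are expressed as intervals in a common frame (the projection itself is not shown), the overlap lengths and the probability follow directly. Function names are illustrative.

def overlap(intervals_a, intervals_b):
    """Total overlapping length between two sets of 1-D intervals.

    Each interval is a (start, end) pair along the common line direction;
    this plays the role of ||A ^ B|| in Eq. (14).
    """
    total = 0.0
    for a0, a1 in intervals_a:
        for b0, b1 in intervals_b:
            total += max(0.0, min(a1, b1) - max(a0, b0))
    return total

def segment_validation_gate(segments_i, free_segments_i, segments_j):
    """Probability of Eq. (14) that line j is geometrically the same line as line i."""
    hit = overlap(segments_j, segments_i)        # ||Sj ^ Si||
    miss = overlap(segments_j, free_segments_i)  # ||Sj ^ S-bar_i||
    if hit + miss == 0.0:
        return 0.5          # the local segments were never seen before
    return hit / (hit + miss)

# A pair of lines enters the Joint Compatibility Test only if the gate >= 0.5.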
5 Experimental Results
To test the ideas presented in this paper, two environments were used: a simulated environment and a real environment shown in Figs. 3 and 4, respectively.
Fig. 3. Synthetic environment used for the tests

Table 1. Comparison of line extraction algorithms from synthetic laser scans

Algorithm   TP %    ND %    μerr_r [mm]   μerr_α [rad]   speed [Hz]
SR          91.29   21.32   6.56          0.0116        196.54
SM          75.86   24.13   6.68          0.0143        327.76
SM + ORT    95.38   12.70   4.37          0.0062        31.56
LT          89.97   18.22   4.92          0.0077        59.63
LT + ORT    96.82   13.20   3.95          0.0055        17.08

5.1 Simulated Environment
The simulated environment shown in Fig. 3 was used to test the ORT performance in finding lines from laser scans; this environment has 42 lines, some of them parallel and very close to one another (300 mm between parallel lines representing a door and its corresponding wall); the robot takes 1000 laser scans from random poses. Laser measurements are corrupted with Gaussian noise with σ = 10 mm. To find lines from laser scans, three algorithms were selected: sequential RANSAC (SR), Split and Merge (SM) and Line Tracking (LT). Table 1 shows the experimental results; the first two columns show the True Positives (TP) and the Not Detected lines (ND). In general, the algorithms that use the Odds Ratio Test (SM+ORT and LT+ORT) increase the True Positives and reduce the proportion of lines not detected. The following two columns show the precision of the algorithms; this indicator shows the advantage of the ORT. Finally, the last column shows the speed¹ of the algorithms – one drawback is the increase in time complexity when using the ORT.
5.2 Real Environment
To test the smoothing and mapping framework – including the proposed techniques for line extraction and data association – we use a real robot to take 215 laser scans of our laboratory using a Sick LMS-200. This sensor is configured
¹ Tests were performed on an EliteBook 6930p (2.40 GHz).
Fig. 4. Raw data registered with ICP + Lorentzian [21] and the route (dotted line); the scale is in cm
(a) SR + JCT    (b) SR + (JCT + SV)    (c) (SM + ORT) + JCT    (d) (SM + ORT) + (JCT + SV)
Fig. 5. Map of our laboratory using different techniques. Left column shows the results obtained by using the Joint Compatibility Test (JCT) and the right column maps are obtained with the Joint Compatibility Test with Segment Validation (JCT + SV). The dashed polygon represents the robot trajectory, and doors correctly detected are marked by bullets.
to read 30 m over a 180° arc, with an angular resolution of 0.5°. Figure 4 shows the resulting point map from the same information; as can be seen, the environment is highly cluttered. The line maps resulting from the SAM framework are shown in Fig. 5; here, only the results from two algorithms are presented: sequential RANSAC (SR) and Split and Merge with the Odds Ratio Test (SM + ORT). Figures 5a and 5c show the results from the Joint Compatibility Test, while Figs. 5b and 5d show the results from the Joint Compatibility Test with Segment Validation (SV). As expected, the line maps are less expressive than the map of raw points; the SR approach gives poor-quality line maps (Figs. 5a and 5b) – see for example the inaccurate estimation of angles – while the Split and Merge algorithm gives better results (Figs. 5c and 5d). Wrong associations may become more evident in environments with parallel lines that are very close to one another. The validation based on the segments of each line prevents wrong associations, improving the final map – note the correct detection of doors and walls in Fig. 5d.
6 Conclusions
We have presented an implementation of the Smoothing and Mapping approach for indoor environments. The contributions of this paper are a probabilistic algorithm to find line clusters from laser scan data, and a validation gate based on segments that improves the data association for line maps. The proposed algorithm to extract lines from laser scans uses a probabilistic criterion to merge clusters rather than an ad hoc distance metric; the criterion is stated as a ratio of marginal likelihoods. This criterion decides which models must be merged and outputs the best number of models. The proposed algorithm only uses the noise model parameters and avoids unnecessary thresholds. Experimental results show that the proposed approach works well to find the correct number of lines, increasing the proportion of True Positives and improving their precision. On the other hand, the Segment Validation scheme helps to improve the data association when parallel lines are present in the environment. As shown in the real test, the complete SAM framework finds high-quality indoor line maps of cluttered environments.
References
1. Thrun, S., Burgard, W., Fox, D.: Probabilistic Robotics (Intelligent Robotics and Autonomous Agents). MIT Press, Cambridge (2005)
2. Grünwald, P.D.: The Minimum Description Length Principle (Adaptive Computation and Machine Learning). The MIT Press, Cambridge (2007)
3. Yang, M.Y., Förstner, W.: Plane detection in point cloud data. Technical Report TR-IGG-P-2010-01, Department of Photogrammetry, Institute of Geodesy and Geoinformation, University of Bonn (2010)
4. Akaike, H.: A new look at the statistical model identification. IEEE Transactions on Automatic Control 19, 716–723 (1974)
5. Neira, J., Tardós, J.: Data association in stochastic mapping using the joint compatibility test. IEEE Transactions on Robotics and Automation (2001)
6. Kaess, M., Ranganathan, A., Dellaert, F.: iSAM: Incremental smoothing and mapping. IEEE Transactions on Robotics 24, 1365–1378 (2008)
7. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern classification and scene analysis. Wiley, New York (1973)
8. Borges, G.A.: A split-and-merge segmentation algorithm for line extraction in 2d range images. In: ICPR 2000: Proceedings of the International Conference on Pattern Recognition, Washington, DC, USA, p. 1441. IEEE Computer Society, Los Alamitos (2000)
9. Siegwart, R., Nourbakhsh, I.R.: Introduction to Autonomous Mobile Robots. Bradford Book (2004)
10. Thrun, S., Martin, C., Liu, Y., Hähnel, D., Emery Montemerlo, R., Deepayan, C., Burgard, W.: A real-time expectation maximization algorithm for acquiring multiplanar maps of indoor environments with mobile robots. IEEE Transactions on Robotics and Automation 20, 433–442 (2003)
11. Han, F., Tu, Z., Zhu, S.C.: Range image segmentation by an effective jump-diffusion method. IEEE Transactions on Pattern Analysis and Machine Intelligence 26, 1138–1153 (2004)
12. Bolles, R.C., Fischler, M.A.: A RANSAC-based approach to model fitting and its application to finding cylinders in range data. In: IJCAI, pp. 637–643 (1981)
13. Schnabel, R., Wahl, R., Klein, R.: Efficient RANSAC for point-cloud shape detection. Computer Graphics Forum 26, 214–226 (2007)
14. Leavers, V.F.: Which Hough transform? CVGIP: Image Understanding 58, 250–264 (1993)
15. Nguyen, V., Gächter, S., Martinelli, A., Tomatis, N., Siegwart, R.: A Comparison of Line Extraction Algorithms using 2D Range Data for Indoor Mobile Robotics. Autonomous Robots 23, 97–111 (2007)
16. Durrant-Whyte, H.F., Majumder, S., Thrun, S., Battista, M.D., Scheding, S.: A Bayesian algorithm for simultaneous localisation and map building. In: ISRR, pp. 49–60 (2001)
17. Zhang, Z., Faugeras, O.: A 3-d world model builder with a mobile robot. International Journal of Robotics Research 11, 269–285 (1992)
18. Bailey, T.: Mobile Robot Localisation and Mapping in Extensive Outdoor Environments. PhD thesis, Australian Centre for Field Robotics, University of Sydney (2002)
19. Castellanos, J.A., Tardós, J.D.: Laser-based segmentation and localization for a mobile robot, pp. 101–108. ASME Press, New York (1996)
20. Castro, D., Nunes, U., Ruano, A.: Feature extraction for moving objects tracking system in indoor environments. In: Proc. 5th IFAC/Euron Symposium on Intelligent Autonomous Vehicles, pp. 5–7 (2004)
21. Romero, L., Arellano, J.J.: Robust local localization of a mobile robot using a 2-D laser range finder. In: ENC 2005: Proceedings of the Sixth Mexican International Conference on Computer Science, Puebla, Mexico, pp. 248–255. IEEE Computer Society, Los Alamitos (2005)
Fuzzy Cognitive Maps for Modeling Complex Systems
Maikel León1, Ciro Rodriguez1, María M. García1, Rafael Bello1, and Koen Vanhoof2
1 Central University of Las Villas, Santa Clara, Cuba
{mle,crleon,mmgarcia,rbellop}@uclv.edu.cu
2 Hasselt University, Diepenbeek, Belgium
[email protected]
Abstract. This paper presents Fuzzy Cognitive Maps as an approach to modeling the behavior and operation of complex systems. This technique is the fusion of the advances of fuzzy logic and cognitive map theories: Fuzzy Cognitive Maps are fuzzy weighted directed graphs with feedback that create models emulating the behavior of complex decision processes using fuzzy causal relations. There are applications in diverse domains (management, multiagent systems, etc.) and novel works (dynamical characteristics, learning procedures, etc.) to improve the performance of these systems. First the description and the methodology that this theory suggests are examined, along with some ideas for using this approach in the control process area, and then the implementation of a tool based on Fuzzy Cognitive Maps is described. The application of this theory in the field of control and systems might contribute to the progress of more intelligent and independent control systems. Fuzzy Cognitive Maps have been fruitfully used in decision making and in the simulation and analysis of complex situations. Keywords: Fuzzy Cognitive Maps, Complex Systems, Causal Relations, Decision Making, Simulation.
1 Introduction
Modeling dynamic systems can be hard in a computational sense, and many quantitative techniques exist. Well-understood systems may be open to any of the mathematical programming techniques of operations research. First, developing the model usually requires a great deal of effort and specialized knowledge outside the area of interest. Secondly, systems involving important feedback may be nonlinear, in which case a quantitative model may not be possible [1]. In past years, conventional methods were used to model and control systems, but their contribution is limited in the representation, analysis and solution of complex systems. In such systems, the inspection of their operation, especially from the upper level, depends on human leadership. There is a great demand for the development of autonomous complex systems, which can be achieved by taking advantage of human-like reasoning and description of systems. The human way of thinking for any method includes vague descriptions and can have slight variations in relation to time and space; Fuzzy Cognitive Maps (FCM) seem appropriate to deal with such situations.
FCM are a combination of Fuzzy Logic and Neural Networks, combining the heuristic and common sense rules of Fuzzy Logic with the learning heuristics of Neural Networks. They were introduced by Kosko [2], who enhanced cognitive maps with fuzzy reasoning; cognitive maps had previously been used in the field of socio-economic and political sciences to analyze social decision-making problems. The use of FCM for many applications in different scientific fields has been proposed. FCM have been applied to analyze extended graph theoretic behavior, for decision analysis and the cooperation of distributed agents, as structures for automating human problem solving skills, and as behavioral models of virtual worlds. With the elaboration of a tool that allows the design and execution of FCM, specialists of diverse knowledge areas are provided with a means for the study and simulation of situations that characterize diverse problems. This work proposes a computational tool for the study, design and execution of FCM, a tool to represent knowledge in a graphic and comprehensible way. The maps are based on causal relationships, trying to study the systems as a whole, establishing how the entities that conform the system affect each other, and offering to users, not necessarily specialists in Computer Science, a tool that allows the creation and execution of FCM, including experimentation facilities.
2 Overview about Fuzzy Cognitive Maps
In a graphical illustration, an FCM appears as a signed directed graph with feedback, consisting of nodes and weighted arcs (see Figure 1). The nodes of the graph stand for the concepts that are used to express the behavior of the system, and they are connected by signed and weighted arcs representing the causal relationships that exist between the concepts.
Fig. 1. Simple Fuzzy Cognitive Map
It must be mentioned that the values in the graph are fuzzy, so concepts take values in the range [0,1] and the weights of the arcs are in the interval [-1,1]. The weight of the arc between concept Ci and concept Cj could be positive (Wij > 0), which means that an increase in the value of concept Ci leads to an increase of the value of concept Cj, and a decrease in the value of concept Ci leads to a reduction of the value of concept Cj. Or there is negative causality (Wij < 0), which means that an
increase in the value of concept Ci leads to a decrease of the value of concept Cj, and vice versa. Observing this graphical representation, it becomes clear which concept influences other concepts, showing the interconnections between concepts, and it permits updating the construction of the graph. Each concept represents a characteristic of the system; in general it stands for events, actions, goals, values, or trends of the system that is modeled as an FCM. Each concept is characterized by a number that represents its value, and it results from the transformation of the real value of the system's variable [3]. Beyond the graphical representation of the FCM there is its mathematical model. It consists of a 1×n state vector A, which includes the values of the n concepts, and a weight matrix W, which gathers the weights Wij of the interconnections between the n concepts. The value of each concept is influenced by the values of the connected concepts with the appropriate weights and by its previous value. So the value Ai for each concept Ci is calculated by the following rule, expressed in (1).
Ai = f( Σ_{j=1, j≠i}^{n} Aj Wij )    (1)
Here Ai is the activation level of concept Ci, Aj is the activation level of concept Cj, Wij is the weight of the interconnection between Cj and Ci, and f is a threshold function. So the new state vector Anew is computed by multiplying the previous state vector Aold by the weight matrix W, see equation (2). The new vector shows the effect of the change in the value of one concept on the whole FCM [4]:

Anew = f( Aold · W )    (2)

In order to build an FCM, the knowledge and experience of an expert on the system's operation must be used. The expert determines the concepts that best illustrate the system; a concept can be a feature of the system, a state, a variable, an input or an output of the system; the expert identifies which factors are central for the modeling of the system and represents a concept for each one. Moreover, the expert has observed which elements of the system influence other elements; for the corresponding concepts the expert determines the negative or positive effect of one concept on the others, with a fuzzy value for each interconnection, since it has been considered that there is a fuzzy degree of causation between concepts. It is possible to have better results in the drawing of the FCM if more than one expert is used. In that case, all experts are polled together and they determine the relevant factors and thus the concepts that should be present in the map. Then, experts are individually asked to express the relationships among concepts; during the assignment of weights three parameters must be considered: how strongly concepts influence each other, what the sign of the weight is, and whether one concept causes another. This is one advantage over other approaches like Bayesian Networks (BN) or Petri Nets (PN). PN are another graphical and mathematical modeling tool, consisting of places, transitions, and arcs that connect them, that can be used as a visual-communication
aid similar to flow charts, block diagrams, and networks. As a mathematical instrument, it is possible to set up state equations, algebraic equations, and other mathematical models governing the performance of systems. It is well known that the use of PN has as a disadvantage the drawing process by a non-expert in this technique; that is why there is a limited number of tools usable for this purpose, and it is not well established how to combine different PN that describe the same system [5]. If there is a collection of individual FCM that must be combined into a collective map (see figure 2), and if there are experts of different credibility, then their proposed maps must be multiplied by a nonnegative "credibility" weight.
Fig. 2. Combining some FCM into a collective map
As with PN, this is an advantage over BN [6]. BN are a powerful tool for graphically representing the relationships among a set of variables and for dealing with uncertainties in expert systems, but they demand effort caused by the specification of the net (structure and parameters) and present difficulty in implementing the algorithms of propagation of probabilities, which besides being more or less complex are computationally very expensive. Also, it is not evident for a non-expert in this field how to construct a BN, and it is even more difficult to combine different BN that describe the same system. So the combination of these different FCM will produce an augmented FCM. When an FCM has been constructed, it can be used to model and simulate the behavior of the system. Firstly, the FCM should be initialized: the activation level of each of the nodes of the map takes a value based on the expert's opinion for the current state, and then the concepts are free to interact. This interaction between concepts continues until:
170
M. León et al.
FCM are a powerful tool that can be used for modeling systems exploiting the knowledge on the operation of the system. It can avoid many of the knowledge extraction problems which are usually present in by rule based systems and moreover it must be mentioned that cycles are allowed in the graph [7]. The threshold function serves to decrease unbounded inputs to a severe range. This destroys the possibility of quantitative results, but it gives us a basis for comparing nodes (on or off, active or inactive, etc.). This mapping is a variation of the “fuzzification” process in fuzzy logic, giving us a qualitative model and frees us from strict quantification of edge weights [8].
3 Use of FCM in Control Process After the presentation of FCM, their illustration and their methodology with which they are constructed; their application is examined in control aspects. There are two distinct uses of a knowledgeable based model like the FCM in the upper level of a process [9]. When an FCM is used for direct control and FCM influences directly the process. FCM can replace the conventional control element and it performs every function that a conventional controller could implement. It is similar to the closed loop control approach because FCM is dependent directly on the real behavior of the process. Another important use of FCM is for supervisory control of a conventional controller, so complementing rather than replacing a conventional controller. The role of FCM is to extend the range of application of a conventional controller by using more abstract representation of process, general control knowledge and adaptation heuristics and enhance the performance of the overall system. Thus, FCM may replicate some of the knowledge and skills of the control engineer and it is built by using a combination of the knowledge representation techniques as causal models, production rules and object hierarchies. At the conventional controller level or at the process it may exist more than one controller for different parts of the process and only local information is available to each controller who communicates with the supervisor at the higher level [10]. The role of the supervisor is to elaborate information of the controllers and to allocate actions to controllers taking into account their effect on the global system. The supervisor indicates undesired or unpermitted process states and takes actions such as fail safe or reconfiguration schemes. Supervisory FCM is used to perform more demanding procedure as failure detection, diagnose abnormalities, decision making; also planning tasks and to intervene when a certain task or state is reached and take control in abnormal or unsafe situations [11]. A human supervisor of the controlled process usually performs these tasks. If the nature of the process under control is such that appropriate analytic models do not exist or are inadequate, but human operation at the process can manually control the process to a satisfactory degree, then the need to use an abstract methodology as FCMs is motivated. The meaning of the whole model of the system can be described beginning in the lower level to the upper one. In the lower level sensors measure some defined variables of the process and these measurements must pass to the higher level where
Fuzzy Cognitive Maps for Modeling Complex Systems
171
information of the processs is organized and categorized [12]. After that, availaable information on process is clustering c and grouping, because some measured variabbles could cause changes in the value v of one or more concepts of the FCM.
4 Tool Based on Fuzzzy Cognitive Maps The scientific literature sho ows some software developed with the intention of draw wing FCM by non-expert in com mputer science, as FCM Modeler [13] and FCM Desiggner [14]. The first one is a very y rustic and superficial incursion, while the second one is a better implementation, but still hard to interact with and with insufficient experim mental facilities. Figure 3 show ws the general architecture of our proposing tool to moodel and simulate FCM, the orgaanization and structuring of the components are presenteed.
Fig. 3. General architecture of the tool
Brief description of the components c of the tool: • • • •
Interface: Allows the t user-tool interaction through the options to create FC CM and the definition of o parameters. Makes the input data for a formalizationn of the information into o a knowledge base. Controllers: Makess a link between the Interface and the algorithms and datta, it is a connectivity lay yer that guarantees a right manipulation of the informationn. Knowledge: Generates the computational representation of the created FC CM from an Artificial Intelligence I point of view. Processes the input and outtput data of algorithms in n the variables modeling. Inference Mechaniism: Makes the inference process through the mathem matical calculus for the prediction p of the variable values.
In Fig. 4 it is possible to observe the main window of the tool and a modeled example; in the interface appear some facilities to create concepts, make relations, define parameters, etc., as well as the option to initialize the execution of the inference process and the visualization options for a better understanding of the simulation process. Some facilities and options were defined in the tool to create, open or save an FCM, together with options for other properties of nodes and arrows. Through these amenities a non-expert in computer science is able to elaborate his own FCM describing a system; we have paid attention to these facilities, guaranteeing a usable tool, specifically for simulation purposes.
Fig. 4. Main view of the FCM Tool
In Figure 5 we can appreciate some important options, where it is possible to define the assignment of a delay time in the execution for a better understanding of the running of the FCM in the inference process; it is also possible to define the normalization function that the FCM will use in the run [15]. This is a very important option because in simulation experiments the user can compare results using these different functions, or can simply select the appropriate function depending on the problem to model (a small sketch of these functions follows the list):
• Binary FCM are suitable for highly qualitative problems where only representation of increase or stability of a concept is required.
• Trivalent FCM are suitable for qualitative problems where representation of increase, decrease or stability of a concept is required.
• Sigmoid FCM are suitable for qualitative and quantitative problems where representation of a degree of increase, a degree of decrease or stability of a concept is required and strategic planning scenarios are going to be introduced.
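For concreteness, the three normalization choices can be written as simple functions; the steepness parameter lam and the use of a strict zero threshold in the binary case are assumptions of this sketch, since the exact definitions used by the tool are not given here.

import numpy as np

def binary(x):
    """Binary FCM: a concept is either inactive (0) or active (1)."""
    return np.where(x > 0, 1.0, 0.0)

def trivalent(x):
    """Trivalent FCM: decrease (-1), stability (0) or increase (+1)."""
    return np.sign(x)

def sigmoid(x, lam=1.0):
    """Sigmoid FCM: a degree of activation in (0, 1), suitable for
    quantitative scenarios; lam controls the steepness."""
    return 1.0 / (1.0 + np.exp(-lam * x))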
Fig. 5. Run options of the FCM Tool
5 Conclusions
Fuzzy Cognitive Maps have been examined as a theory used to model the behavior of complex systems, where it is extremely difficult to describe the entire system by a precise mathematical model. Consequently, it is more attractive and practical to represent it in a graphical way showing the causal relationships between concepts. Since this symbolic method of modeling and control of a system is easily adaptable and relies on human expert experience and knowledge, it can be considered intelligent. FCM appear to be a positive method for the modeling and control of complex systems, which will help the designer of a system in choice analysis and tactical planning. FCM appear to be an appealing tool for the description of the supervisor of complex control systems, which can be complemented with other techniques and will lead to more sophisticated control systems. The development of a tool based on FCM for the modeling of complex systems was presented, showing the facilities for the creation of FCM, the definition of parameters, and options to make the inference process more comprehensible, understandable and usable for simulation experiments.
References
1. Kosko, B.: Neural Networks and Fuzzy Systems: a dynamic system approach to machine intelligence, p. 244. Prentice-Hall, Englewood Cliffs (1992)
2. Kosko, B.: Fuzzy Cognitive Maps. International Journal of Man-Machine Studies 24, 65–75 (1986)
3. Koulouritios, D.: Efficiently Modeling and Controlling Complex Dynamic Systems using Evolutionary Fuzzy Cognitive Maps. International Journal of Computational Cognition 1, 41–65 (2003)
4. Carlsson, C.: Adaptive Fuzzy Cognitive Maps for Hyperknowledge Representation in Strategy Formation Process. IAMSR, Abo Akademi University (2005)
5. Li, X.: Dynamic Knowledge Inference and Learning under Adaptive Fuzzy Petri Net Framework. IEEE Transactions on Systems, Man, and Cybernetics – Part C: Applications and Reviews (2000)
6. Castillo, E.: Expert Systems and Probabilistic Network Models. Springer, Heidelberg (2003)
7. Stylios, C.: Modeling Complex Systems Using Fuzzy Cognitive Maps. IEEE Transactions on Systems, Man and Cybernetics 34, 155–162 (2004)
8. Aguilar, J.: A Dynamic Fuzzy-Cognitive-Map Approach Based on Random Neural Networks. Journal of Computational Cognition 1, 91–107 (2003)
9. Drianko, D.: An Introduction to Fuzzy Control. Springer, Heidelberg (1996)
10. Czogała, E.: Fuzzy and Neuro-Fuzzy Intelligent Systems. Springer, Heidelberg (2000)
11. Fuller, R.: Introduction to Neuro-Fuzzy Systems. Advances in Soft Computing Series. Springer, Heidelberg (2000)
12. Nise, N.: Control Systems Engineering, 3rd edn. John Wiley & Sons, New York (2000)
13. Mohr, S.: Software Design for a Fuzzy Cognitive Map Modeling Tool. Rensselaer Polytechnic Institute (1997)
14. Contreras, J.: Aplicación de Mapas Cognitivos Difusos Dinámicos a tareas de supervisión y control. Trabajo Final de Grado. Universidad de los Andes, Mérida, Venezuela (2005)
15. Tsadiras, A.: A New Balance Degree for Fuzzy Cognitive Maps. Technical Report, Department of Applied Informatics, University of Macedonia (2007)
Semantic Representation and Management of Student Models: An Approach to Adapt Lecture Sequencing to Enhance Learning
Alejandro Peña Ayala1,2,3,4 and Humberto Sossa3
1 WOLNM, 2 ESIME-Z-IPN, 3 CIC-IPN, 4 Osaka University
31 Julio 1859 # 1099B, Leyes Reforma, DF 09310, Mexico
[email protected],
[email protected]
Abstract. In this paper an approach oriented to acquire, depict, and administer knowledge about the student is proposed. Moreover, content is also characterized to describe lectures. In addition, the work focuses on the semantics of the attributes that reveal a profile of the student and the teaching experiences. The meaning of such properties is stated as an ontology. Thus, inheritance and causal inferences are made. According to the semantics of the attributes and the conclusions induced, the sequencing module of a Web-based educational system (WBES) delivers the appropriate option of lecture to students. The underlying hypothesis is: the apprenticeship of students is enhanced when a WBES understands the nature of the content and the student's characteristics. Based on the empirical evidence produced by a trial, it is concluded that successful WBES account for the knowledge that describes their students and lectures. Keywords: Student model, Web-based educational systems, ontology, semantics, knowledge representation, sequencing.
1 Introduction
Student modeling is a process devoted to representing several cognitive issues such as: analyzing the student's performance, isolating the underlying misconceptions, representing the student's goals and plans, identifying prior and acquired knowledge, maintaining an episodic memory, and describing personality characteristics [1]. All these tasks are complex because the object of study is an intangible entity: the human mind. The outcome, a student model (SM), is set by descriptions of what is believed relevant about the student's knowledge and aptitudes [2]. Based on an SM, a WBES adapts its behavior to meet the student's needs in order to deliver suitable content. Most of the SM research and applications are devoted to collecting, representing, and exploiting knowledge about the student. Nevertheless, meta-knowledge about the SM is also needed. It is not enough to account for a student profile when it is not comprehensible. Hence, approaches oriented to outline semantics for SM are emerging in the arena.
This approach takes into account the trends of the semantic Web [3]. Hence, it represents SM attributes in repositories and schemas edited in the Extensible Markup Language (XML) [4], whereas the meaning of the SM attributes is characterized as an ontology encoded through the Web Ontology Language (OWL) [5]. The logical design of the ontology applies object-oriented criteria. Thereby, classes are outlined through a set of data properties. Classes are organized hierarchically to generalize definitions. Properties represent specific attributes about the student or the lecture. Concepts are a kind of class object that instantiates data properties. The sequencing module holds an inference engine. It queries the ontology to induce inheritance and cause-effect relationships. The process is fulfilled by dynamic simulation. At the end, a qualitative decision is made to select the best available option. In consequence, the chosen option is delivered to the student.
2 Related Work
The development of WBES accounts for a common framework that allows educational services and content to be shared and reused across applications, organizations, and community boundaries. Indeed, joint research organizations have proposed standards and models such as: Learning Technology Systems Architecture (LTSA) [6], Learning Object Metadata (LOM) [7], Sharable Content Object Reference Model (SCORM) – it embraces the Content Aggregation Model [8] and the Simple Sequencing [9]. When the semantic Web guidelines are also considered, WBES and SM are able to understand what is meant by the knowledge of the content and the student that they hold, respectively. Moreover, meta-data and ontologies facilitate building a large-scale Web of machine-readable and machine-understandable knowledge [10]. Therefore, meta-data and meta-knowledge facilitate the reuse and the integration of resources and services, so that WBES and SM can provide better applications. Semantic approaches are used in research lines such as: adaptive WBES ([11], [12]), management of Web resources through semantic tagging [13], understanding user intentions [14], knowledge sharing in virtual communities ([15], [16], [17]), and heterogeneity of interoperable distributed user models [18]. Ontologies are the underlying repository to state vocabulary, relationships, axioms, constraints, and instances of a knowledge domain. They are used in applications to describe: the semantics of user models [19], pervasive personalizing ([20], [21]), personalizing resources delivery [22], user modeling ([23], [24]), SM ([25], [26]), and design content ([27], [28]).
3 Formal Model
In this section the formal model of the proposed approach is set. It owns two underlying elements: a formal notation of the assertions given in a SM and the axioms to define an ontology. Most of the student's attributes are subjective; thus, their representation is imprecise and uncertain. Hence, a SM just states beliefs about the student. As regards the ontology, the philosophical and the artificial intelligence senses are outlined next.
3.1 Student Modeling
It is said that a SM is aware of the student according to the knowledge that it holds. Beliefs are described by means of propositions of the propositional calculus [29]. They are assessed as true or false. So the set of propositions (p) that a SM (SM) asserts (A) about the student (U) is represented by: ASM(U) = {p | ASM p(U)}. However, the SM organizes knowledge of the student into several domains. A domain reveals a target of study. It contains several concepts to identify and measure homogeneous attributes. Measures are qualitative terms that represent degrees of levels and/or variations of the state of a concept. Such a state reveals the intensity or variation of the presence of a given concept (e.g. the "visual reasoning" concept is a cognitive skill whose level and variation could respectively be normal and increasingly higher). Hence, any domain, such as the cognitive one (C), holds a set of propositions (ASMC) that asserts what is true about the student through: CSM(U) = {p | ASM p(U) ∩ p ∈ C}. Based on such underlying statements, the SM proposed in this work holds four more domains: personality (P), learning preferences (L), knowledge (K), and content (T). Thus, the whole SM corresponds to the union of the sets of propositions asserted for each domain, given by: SM(U) = CSM(U) ∪ PSM(U) ∪ LSM(U) ∪ KSM(U) ∪ TSM(U).
3.2 Ontology Representation
The ontology term is coined from the philosophy and artificial intelligence fields. The former calls it conceptualization. It concerns the structure of reality as perceived by an agent independently of the occurrence of a situation. The latter considers an ontology as a set of logical axioms to account for the meaning of a vocabulary [30]. Conceptualization aims at the semantics of relations in a domain independently of a state of affairs. Conceptual relations are set as the structure ⟨D, W⟩, where D is a domain and W is the set of all states of the domain, named possible worlds. A conceptual relation p^n of arity n on ⟨D, W⟩ is stated as a total function p^n: W → (2^D)^n from W into the set of all n-ary (ordinary) relations on D. For p, the set Ep = {p(w) | w ∈ W}¹ owns the admittable extensions of p. So a conceptualization for D is stated as C = ⟨D, W, R⟩, where R is a set of conceptual relations on ⟨D, W⟩. So a conceptualization C is a set of conceptual relations declared on a domain space. Based on C, for each world w ∈ W, the corresponding world structure according to C is SwC = ⟨D, RwC⟩, where RwC = {p(w) | p ∈ R} is the set of extensions of the elements of R. So all the intended world structures of C yield the set SC = {SwC | w ∈ W}.
In section 3.2, symbols denoting structures and sets of sets appear in boldface.
Likewise, given a logical language L with vocabulary V, a model for L is stated as the structure ⟨S, I⟩, where S = ⟨D, R⟩ is a world structure and I: V → D ∪ R is an interpretation function. I assigns elements of D to constant symbols of V, and elements of R to predicate symbols of V. In addition, an intentional interpretation is set by the structure ⟨C, ζ⟩, where C = ⟨D, W, R⟩ is a conceptualization and ζ: V → D ∪ R is a function that assigns elements of D to constant symbols of V, and elements of R to predicate symbols of V. It is called an ontological commitment for L, such as K = ⟨C, ζ⟩ (i.e. L commits to C by means of K, while C is the underlying conceptualization of K). Moreover, given a language L with vocabulary V and an ontological commitment K, a model M with a world structure S is compatible with K if it meets three constraints: 1) S ∈ SC; 2) for each constant c, I(c) = ζ(c); 3) for each predicate symbol p, I maps p into an admittable extension of ζ(p), which means that there exist a conceptual relation p and a world w such that ζ(p) = p ∧ p(w) = I(p). So the set IK(L) of all models of L that are compatible with K is the set of intended models of L based on K. As regards an ontology O for a language L and an ontological commitment K, it is outlined as a set of axioms defined in such a way that the set of its models approximates as closely as possible the set of intended models of L according to K. Thus, an ontology O specifies a conceptualization C in an indirect way, because it only approximates a set of intended models, and such models are a weak characterization of a conceptualization. Thereby, an ontology O for a language L approximates a conceptualization C if there is an ontological commitment K such that the intended models of L according to K are included in the models of O. In short, an ontology O commits to C if O has been designed with the purpose of characterizing C, and O approximates C. Moreover, a language L commits to an ontology O if L commits to some conceptualization C such that O agrees on C.
4 Student Knowledge Acquisition
As a result of the analysis of psychological and pedagogical works, five well-founded models are chosen: 1) the Wechsler Adult Intelligence Scale (WAIS) to measure cognitive skills [31]; 2) the Minnesota Multiphasic Personality Inventory (MMPI), designed to diagnose personality traits [32]; 3) Gardner's Multiple Intelligence Model (GMIM), devoted to identifying learning preferences [33]; 4) the Taxonomy of Educational Objectives (TEO) to estimate how well the student masters an educational concept [34]; 5) guidelines for using learning technologies to characterize content [35]. These models hold: guidelines to define the measurement procedures, tests that use questionnaires and instruments, criteria to control the application of tests, verification and evaluation methods to compute measures, and rules to diagnose some of the individual's traits. The tools were adapted to be used on the Web. Thus, fill-in forms and items were deployed into Web pages. The acquired knowledge is described as follows. The cognitive model embraces eleven quizzes to test the student's skills regarding: information and vocabulary recall, arithmetic and understanding, sonorous and visual retention, observation and sequencing of pictures, design and arrangement of objects,
similarity reasoning. As a result, eleven integer values are estimated. They are normalized according to the student's age. Six of these values produce the verbal scale and the remaining ones compose the performance scale. Both scales are used to set the intelligence quotient (IQ). At the end, the fourteen normalized values are shifted to linguistic terms. Such terms make up a universe of discourse (UOD). This set embraces five levels to depict the state of a concept (e.g. {quite low, low, normal, high, quite high}). Hence, a five-level UOD is attached to each concept. In consequence, the cognitive domain holds fourteen concepts about the cognitive skills of the student. As regards the personality model, it analyzes more than fifty traits. They are measured through 567 questions. Each one is attached to one or several concepts. The inquiries concern customs, issues, likes, and fears. If the student agrees, she/he answers true; otherwise, she/he responds false. A frequency count is made of the affirmative responses. These raw values are normalized to a common scale. They are transformed to linguistic terms. These qualitative values are members of the earlier stated five-level UOD. Thus, the personality domain holds fifty concepts about the student's personality traits. Their state is measured by a linguistic term to depict a first diagnosis of a given issue (e.g. depression, paranoia, repression, anger, etc.). The learning preferences model depicts eight styles. A test of eighty questions is applied to the student. They explore how attractive a particular stimulus or habit is. When it is preferred by the student, she/he chooses 1, otherwise she/he picks 0. An inquiry is attached to one learning preference. The frequency of affirmative answers reveals a qualitative level of interest. Such a value belongs to the prior five-level UOD. So the learning preferences domain contains eight concepts (e.g. logical, verbal, visual, etc.). The educational knowledge model estimates the student's proficiency for a set of key concepts. It assigns a level of mastery for each concept. Such a value corresponds to a seven-level UOD. It has linguistic terms that are ordered in ascending level of proficiency. If the student does not know a key concept, the ignorance level is assigned. When the student is able to list, define, identify, etc., the knowledge tier is given. If she/he also describes, interprets, associates, etc., the comprehension layer is attached. The application value is stated when the student also applies, illustrates, classifies, etc. If the student also separates, explains, compares, etc., the analysis level is fulfilled. Besides, the synthesis tier is met when the student also modifies, substitutes, generalizes, etc. The maximum level, evaluation, is met when the student decides, tests, and judges as well. At the end, the value assigned to a key concept is the highest level that the student rightly answered in a row. So the knowledge domain owns a level for each of ten key concepts. They depict the subjects of the study domain. The content model depicts the lectures devoted to teaching the ten key concepts. The lecture's content is authored according to several options. Each option privileges a learning theory and media. Hence, five concepts are attached to represent a lecture (e.g. technical, abstract, practical, deep, and complex). In addition, eight concepts are devoted to characterizing a given option (e.g. constructivism, objectivism, linguistic, nonlinguistic, dynamic, and static).
All of these attributes are instantiated by a six-level UOD. The null linguistic term is placed before the five-level UOD stated earlier.
5 Student Knowledge Semantic Representation
The SM embraces five domains to characterize the student and lectures. A domain depicts the beliefs that the SM holds about a specific object of representation. Such assertions are split into concepts, attributes, values, and relationships. These kinds of SM items are stored in XML repositories and semantically described in an ontology. The structure, elements, and samples of the ontology and repositories are given next. The ontology statements are defined through OWL. The items to be semantically stated are considered classes. They are declared by the owl:Class tag. Classes are hierarchically organized by the rdf:about attribute, whose value reveals the name of the immediate ancestor class with the prefix "#" (e.g. #_id). The description of the meaning is encoded by a sentence written in natural language as the value of the rdfs:comment tag. Lines 01 to 04 of the listing set the class concept. A class is characterized by attributes, called properties. Hence, the owl:DatatypeProperty tag is used to describe them. A property is joined to a class by the rdf:about attribute, whose value also identifies the name of the class with the prefix "#". The property's meaning is stated by a natural language description, which is the value of the rdfs:comment tag. The type of the property (e.g. integer, string) is the value of the attribute rdf:datatype that is attached to the rdfs:range tag. The cardinality (e.g. single, multiple) is the value of the rdf:Cardinality attribute of the owl:DatatypeProperty tag. Lines 11 to 15 of the listing set the property description. Once an SM item is characterized by an OWL class, it is instantiated as an object. The object is declared as an XML element whose tag name corresponds to the class name. Its attribute rdf:ID reveals the name of the specific instance. Every sub-element also instantiates a specific property attached to the class. The values of its attributes rdf:class and rdf:level depict the name of the joined class and identify the level of hierarchical ascendency, respectively. The value of the sub-element is the value of the attribute that describes the class. Visual is an instance of the concept class that is edited through lines 21 to 25 of the listing.
OWL code to set a class, property and instance of the SM ontology taken from [36] (listing omitted; surviving fragments of its comments): "It is the way of thinking...", "It defines the nature...", "It is the learning preference that depicts the...", level_variation.
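Since the OWL listing itself is not reproduced above, the following sketch illustrates how such a class, property, and instance could be declared programmatically. It is an illustration only, not the authors' code: it uses Python with the rdflib library, the namespace URI is invented, and it simplifies the original markup (for example, the rdf:Cardinality attribute mentioned above is omitted). The names concept, description, and Visual are taken from the prose.

from rdflib import Graph, Literal, Namespace, OWL, RDF, RDFS, XSD

SM = Namespace("http://example.org/sm#")   # hypothetical namespace for the SM ontology
g = Graph()
g.bind("sm", SM)

# Class 'concept' with its natural-language meaning (cf. lines 01-04 of the listing)
g.add((SM.concept, RDF.type, OWL.Class))
g.add((SM.concept, RDFS.comment, Literal("It is the way of thinking...")))

# Datatype property 'description' attached to the class (cf. lines 11-15)
g.add((SM.description, RDF.type, OWL.DatatypeProperty))
g.add((SM.description, RDFS.domain, SM.concept))
g.add((SM.description, RDFS.range, XSD.string))
g.add((SM.description, RDFS.comment, Literal("It defines the nature...")))

# Instance 'Visual' of the class 'concept' (cf. lines 21-25)
g.add((SM.Visual, RDF.type, SM.concept))
g.add((SM.Visual, SM.description,
       Literal("It is the learning preference that depicts the...")))

print(g.serialize(format="xml"))           # emits the corresponding RDF/XML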
As regards the SM domains, they are arranged into three XML repositories: 1) the student profile holds concepts of the cognitive, personality, and learning preferences domains; 2) the lecture profile owns concepts that characterize each option authored to deliver a teaching experience; 3) the knowledge profile reveals the former and the acquired knowledge gained by the student for the ten key concepts. The SM items are encoded as XML elements. They are hierarchically embedded. The names and the values of the SM items are stated as values of XML elements and attributes. A description and sample of the repositories is presented next. The student profile devotes an element to each domain. An element contains a set of instance sub-elements. An instance is set as a class in the ontology and is devoted to depicting a specific concept. Thus, 72 instances are declared to describe a student. The sub-element uses the attribute id_concept to identify an ontological object. Such an object is an instance of the concept class. Moreover, an instance has four sub-sub-elements to characterize the concept. Hence, the values of the sub-sub-elements meet the ontological constraints established for the class. Some of them, such as the values of the linguistic terms, are also instances of an ontological class. Regarding the lecture profile, a judging element is used to describe an option. Its attribute id_version identifies the name of the key concept. It also owns instance sub-elements that identify concepts by the attribute id_concept. Thus, 13 instances depict the option. However, an instance holds two kinds of sub-sub-elements to describe the concept. Their values also meet the ontological constraints and are objects of a class. The knowledge profile contains a judging element to identify a key concept by the attribute id_version. It owns an instance sub-element to reveal the former or acquired knowledge. Its id_concept attribute corresponds to an ontological object too. The instance holds four sub-sub-elements to set the knowledge level and variation, as does the student profile. A sample of the three XML repositories is outlined next.
XML code to set SM repositories of three domains taken from [36] (listing omitted; surviving sample values): low null 1.0 1.0; low normal 0.4 0.6; high null 1.0 1.0.
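As an illustration of the repository layout just described (not a reproduction of the original XML), the short Python sketch below assembles one instance element of a student profile with four sub-sub-elements carrying a level, a variation and their memberships. The attribute name id_concept comes from the prose; the child tag names and the concept name are assumptions, and the sample values mirror those shown above.

import xml.etree.ElementTree as ET

profile = ET.Element("student_profile")
domain = ET.SubElement(profile, "learning_preferences")          # one element per domain

# One of the 72 instances describing the student; id_concept points to the ontology object
inst = ET.SubElement(domain, "instance", attrib={"id_concept": "Visual"})
ET.SubElement(inst, "level").text = "low"                        # linguistic term of the UOD
ET.SubElement(inst, "variation").text = "null"
ET.SubElement(inst, "level_membership").text = "1.0"
ET.SubElement(inst, "variation_membership").text = "1.0"

print(ET.tostring(profile, encoding="unicode"))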
6 Lecture Sequencing
In this approach, it is believed that delivering the best available option of a lecture to the student enhances her/his learning. This goal is achieved by the sequencing module of the WBES. This module draws on the semantics and the content of the SM ontology and repositories. Based on such knowledge, the sequencing module tailors a cognitive map (CM) to organize concepts and set causal relationships. It also uses a qualitative decision model to evaluate the lecture's options. The model represents and simulates causal relationships. The results reveal how the concepts' states are modified by the causal bias. At the end, the option with the highest level for the concept of the knowledge profile is chosen. The management of the student's knowledge is stated next [36]. The ontology holds several key classes used to build the CM and evaluate the options. One of them is the relationship class, oriented to set a relation between pairs of concepts. Its attributes reveal the antecedent and the consequent concept, which respectively play a cause and effect role. Another attribute identifies a fuzzy-causal rules base (FCRB) to define the bias. The FCRB class defines the rules that describe each instance of a causal relationship, so its attributes set a correspondence between the linguistic term held by an antecedent and the one that is assigned to the consequent. The linguistic term class states the shape of the fuzzy set that is the outcome of a membership function. Thus, an instance of this class represents a value of a UOD. The CM organizes the concepts of the five domains into a topology of three tiers: the first holds the 13 concepts of the lecture profile; the second owns a sample of four concepts of each domain stored in the student profile; the third has just one concept to depict the knowledge domain. Concepts of the first tier exert influence on concepts of the second, while concepts of the second tier bias each other. Moreover, they influence the concept of the third tier, which in turn feeds back to concepts of the student profile. The qualitative model applies an algorithm to select the best lecture's option as follows:
Algorithm devoted to choose the best option of lecture taken from [36]:
01: Retrieve the student and knowledge profiles of the student e
02: Retrieve the content profile of the current lecture l
03: For every combination of criteria do:
04:   Tailor an instance of CM called: CMc
05:   Carry out the simulation of CMc as follows:
06:     Set time = 1 and stability = false
07:     While (time < 100 and not (stability)) do:
08:       For each concept c of the CMc:
09:         For each relation that points to c do:
10:           Estimate the effect on c through fuzzy inference
11:         Estimate the new state of c by fuzzy-causal inference
12:         Track behavior and final states' values of c
13:       If (the concepts' states at current time are already stored)
14:         Set stability = true
15:       Set time++
16:   Track the causal behavior and outcomes of CMc
17: Set b = 1, the first version of CM as the current best option
18: For every option CMi, since the second to the last one, do:
19:   Compare the outcomes of CMb against CMi
20:   If (CMi means a better learning achievement than CMb) set b = i
21: Deliver the option b of the lecture l to student e
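To make the flow of the algorithm concrete, the runnable Python sketch below mirrors its structure: one cognitive map per lecture option is simulated until the concept states repeat or 100 steps elapse, and the option yielding the highest state for the knowledge concept is chosen. It is only an illustration: the real system reasons over linguistic terms with fuzzy and fuzzy-causal inference, whereas here a clamped weighted-sum update and the concept names are stand-ins chosen for brevity.

def simulate(states, relations, max_time=100):
    """Iterate a cognitive map until its states repeat or time runs out."""
    seen = set()
    for _ in range(1, max_time):
        snapshot = tuple(sorted(states.items()))
        if snapshot in seen:                 # states already stored -> stability
            break
        seen.add(snapshot)
        new_states = dict(states)
        for concept in states:
            incoming = [(src, w) for src, dst, w in relations if dst == concept]
            if incoming:                     # stand-in for fuzzy-causal inference
                effect = sum(states[src] * w for src, w in incoming) / len(incoming)
                new_states[concept] = max(0.0, min(1.0, states[concept] + effect))
        states = new_states
    return states

def choose_best_option(options, knowledge_concept="acquired_knowledge"):
    """Simulate one cognitive map per lecture option and keep the best outcome."""
    best_name, best_level = None, float("-inf")
    for name, (states, relations) in options.items():
        outcome = simulate(dict(states), relations)[knowledge_concept]
        if outcome > best_level:
            best_name, best_level = name, outcome
    return best_name

# Tiny example: option_A reinforces the knowledge concept, option_B inhibits it.
base = {"visual": 0.8, "logical": 0.4, "acquired_knowledge": 0.2}
options = {
    "option_A": (base, [("visual", "acquired_knowledge", 0.6)]),
    "option_B": (base, [("logical", "acquired_knowledge", -0.2)]),
}
print(choose_best_option(options))           # -> option_A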
7 Experiment
The approach was tested through a field experiment [36]. The trial was organized into seven stages: development, student modeling, training, pre-measure, stimulus provision, post-measure, and statistical processing. The results are stated next. During the development stage a WBES prototype was built. Four tools to acquire knowledge about the student were made. Content regarding the "scientific research method" was authored. The course focused on ten key concepts, such as hypothesis, law, and theory. Thus, one lecture was designed to teach each key concept. However, four options were tailored to provide the same lecture according to particular guidelines (i.e. learning theory, media, etc.). Also, 200 volunteers were recruited. They are students, professors, or researchers from academic and governmental institutions. The student modeling stage was achieved by the application of four tests. They were sequentially provided to measure: learning preferences, personality traits, cognitive skills, and educational knowledge. As a result, a SM was produced for each participant. However, only 113 subjects completed the first test. Later on, 102 volunteers successfully took the personality exam too. Next, 71 of the remaining subjects achieved the cognitive test. Finally, just 50 volunteers carried out the last exam. Hence, they composed the population of the trial, because their complete SM was set. The WBES delivered introductory lectures about science to the population during the training stage. Afterwards, 18 participants were randomly chosen to define a sample. This size yields a standard error of 0.05 for the population. A pre-measure was applied to the sample in the fourth stage. The participants were questioned about the ten key concepts. The evaluation of their former knowledge followed the TEO criteria. Hence, a volunteer got a measure between 0 and 60 (i.e. for any key concept: ignorance gives 0 points, knowledge grants 1, and so on, so the qualification per concept ranges from 0 to 6). Once a subject finished the test, she/he was attached to one comparative group. Participants who fulfilled the exam in an odd position were assigned to the experimental group, otherwise to the control group. So the pre-measure for each group corresponded to the range [0, 540] (i.e. nine members times the maximum value 60). The stimulus provision stage was devoted to delivering one lecture for each key concept. Prior to providing a lecture, the sequencing module identified the subject's group. When her/his membership was control, a lecture's option was randomly chosen; otherwise, the qualitative decision model was triggered to select the best option. During the sixth stage, a post-measure was applied; thereby, the former test was given to the volunteers once more. The aim was to estimate the acquired knowledge. Thus, the learning of a key concept was the difference between the post- and pre-measures. Hence, a volunteer could get an apprenticeship between -60 and 60. In consequence, the measure for a comparative group corresponded to the range [-540, 540]. At the end, statistical data were processed. A sample of the outcomes is set in Table 1. In rows 1 to 3 appear the pre, post, and difference measures of the two groups. It is evident that the experimental group began with lower former knowledge than the control group. But at the end, they overcame the disadvantage and achieved a higher learning. Likewise, the IQ of the experimental group is lower than the IQ of the control group. Moreover, an analysis of variance was made, as appears in Fig. 1.
The control group got a probability of 0.1216, and the experimental group achieved 0.006.
Table 1. A sample of measures obtained by the experimental and control groups during the trial
Pre-measure 10 topics: Experimental group Total 38, Mean 4.22; Control group Total 42, Mean 4.67.
Post-measure 10 topics: Experimental group Total 198, Mean 22; Control group Total 174, Mean 19.33.
Learning achievement: Experimental group Total 160, Mean 17.78; Control group Total 132, Mean 14.67.
Logical learning style: Experimental group 44% quite high, 44% high, 11% medium; Control group 44.4% quite high, 55% high.
Maturity: Experimental group 11% high, 22% medium, 66% low; Control group 22% high, 22% medium, 55% low.
Intelligence quotient: Experimental group 11% high, 22% medium, 66% low; Control group 44% high, 22% medium, 33% low.
Figure 1 is a dispersion diagram with a linear regression of post-measures (Y) against pre-measures (X) for the control and experimental groups; the fitted lines are Post = 13.66 + 1.22 Pre for the control group and Post = 7.72 + 3.38 Pre for the experimental group.
Fig. 1. Regression diagram of the tendency for control and experimental groups
8 Conclusions
The semantic representation of the SM is an open research line. It provides the meaning of the knowledge items that characterize the student. When a WBES understands what the SM means, it is able to behave more appropriately. In consequence, the adaptive performance of the WBES enhances the apprenticeship of the students. The results obtained by this approach provide empirical evidence for such an assertion. It is found that knowledge about the student, when it is well understood, contributes to positively stimulating her/his particular attributes. This kind of encouragement facilitates the learning of concepts. It also provides guidelines to author suitable content. As regards the linear regression pictured in Fig. 1, a significance level (α) of 0.05 was used. In consequence, the equation obtained for the control group tendency is Post = 13.7 + (1.22 x Pre-measure), whilst the one achieved for the experimental group is Post = 7.72 + (3.28 x Pre-measure). So although the intercept of the control group equation is nearly twice as high as the one of the experimental group equation, its slope is nearly a third of the experimental group slope. As future work, research is needed to design psychological models to be implemented on the Web. Knowledge acquisition based on machine learning is considered too. In addition, the exploration of the SM through data mining techniques is a new trend to develop. What is more, the generation of an ontology as a result of the natural language interaction between student and system is desirable.
Acknowledgments First author states the strength given by his Father, Brother Jesus and Helper, as part of the research of World Outreach Light to the Nations Ministries (WOLNM). This research is supported by: CONACYT-SNI-36453, CONACYT 118962, CONACYT 118862, SIP-EDI: DOPI/3189/08, SIP-20101294 & 20100468, COFAA-SIBE, and European Union, European Commission, CONACYT-FONCICYT project 93829.
References
1. Self, J.: Bypassing the Intractable Problem of Student Modelling. Technical report No. 41, Applied Artificial Intelligence/Artificial Intelligence in Education (1990)
2. Parcus, N.: Software Engineering for Adaptive Hypermedia Systems. PhD thesis, Ludwig-Maximilians Universität (2000)
3. Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Scientific American Magazine (May 2001) 28–37
4. W3C Consortium XML Technology, http://www.w3.org/standards/xml/
5. W3C Consortium OWL Web Ontology Language, http://www.w3.org/TR/owl-features/
6. Institute of Electrical and Electronics Engineers (IEEE): Learning Technology Standard Architecture (LTSA). Draft Standard for Learning Technology IEEE P1848.1/D9. IEEE, Los Alamitos (2001)
7. Institute of Electrical and Electronics Engineers (IEEE): Learning Object Metadata (LOM). Draft Standard for Learning Object Metadata IEEE P1884.12.1. IEEE, Los Alamitos (2002)
8. Advanced Distributed Learning (ADL): Sharable Content Object Reference Model (SCORM) Content Aggregation Model. Draft Version 1.3.2, ADL (2006)
9. IMS Global Learning Consortium: IMS Simple Sequencing Information and Behavior Model. Final Specification Version 1.0, IMS (2003)
10. W3C Semantic Web, http://www.w3.org/2001/sw/
11. Sosnovsky, S., Brusilovsky, P., Yudelson, M., Mitrovic, A., Mathews, M., Kumar, A.: Semantic Integration of Adaptive Educational Systems. In: Kuflik, T., Berkovsky, S., Carmagnola, F., Heckmann, D., Krüger, A. (eds.) Advances in Ubiquitous User Modelling. LNCS, vol. 5830, pp. 134–158. Springer, Heidelberg (2009)
12. Torre, I.: Adaptive Systems in the Era of the Semantic and Social Web, a Survey. J. User Modeling and User-Adapted Interaction 19(5), 433–486 (2009)
13. Böhnstedt, D., Scholl, P., Rensing, C., Steinmetz, R.: Collaborative Semantic Tagging of Web Resources on the Basis of Individual Knowledge Networks. In: Houben, G.J., McCalla, G., Pianesi, F., Zancanaro, M. (eds.) UMAP 2009. LNCS, vol. 5535, pp. 379–384. Springer, Heidelberg (2009)
14. Plumbaum, T., Stelter, T., Korth, A.: Semantic Web Usage Mining: Using Semantics to Understand User Intentions. In: Houben, G.J., McCalla, G., Pianesi, F., Zancanaro, M. (eds.) UMAP 2009. LNCS, vol. 5535, pp. 391–396. Springer, Heidelberg (2009)
15. Kleanthous, S.: Semantic-Enhanced Personalized Support for Knowledge Sharing in Virtual Communities. In: Conati, C., McCoy, K., Paliouras, G. (eds.) UM 2007. LNCS (LNAI), vol. 4511, pp. 465–469. Springer, Heidelberg (2007)
16. Bajanki, S., Kaufhold, K., Le Bek, A., Dimitrova, V., Lau, L., O'Rourke, R., Walker, A.: Use of Semantics to Build an Academic Writing Community Environment. In: Dimitrova, V., Mizoguchi, R., du Bolay, B., Graesser, A. (eds.) AIED 2009. Frontiers in Artificial Intelligence and Applications, vol. 200, pp. 357–364. IOS Press, Amsterdam (2009)
17. Kleanthous, S., Dimitrova, V.: Modelling Semantic Relationships and Centrality to Facilitate Community Knowledge Sharing. In: Nejdl, W., Kay, J., Pu, P., Herder, E. (eds.) AH 2008. LNCS, vol. 5149, pp. 123–132. Springer, Heidelberg (2008)
18. Carmagnola, F., Dimitrova, V.: An Evidence-Based Approach to Handle Semantic Heterogeneity in Interoperable Distributed User Models. In: Nejdl, W., Kay, J., Pu, P., Herder, E. (eds.) AH 2008. LNCS, vol. 5149, pp. 73–82. Springer, Heidelberg (2008)
19. Sosnovsky, S., Mitrovic, A., Lee, D.H., Brusilovsky, P., Yudelson, M.: Ontology-based Integration of Adaptive Educational Systems. In: 16th International Conference on Computers in Education, pp. 11–18. Asia-Pacific Society for Computers in Education, Jhongli (2008)
20. Niu, W.T., Kay, J.: Pervasive Personalization of Location Information: Personalized Context Ontology. In: Nejdl, W., Kay, J., Pu, P., Herder, E. (eds.) AH 2008. LNCS, vol. 5149, pp. 143–152. Springer, Heidelberg (2008)
21. Niu, W.T., Kay, J.: PERSONAF: Framework for Personalized Ontological Reasoning in Pervasive Computing. J. User Modeling and User-Adapted Interaction, 1–49 (in press)
22. Tran, T., Wang, H., Lamparter, S., Cimiano, P.: Personalization Using Ontologies and Rules. In: Nejdl, W., Kay, J., Pu, P., Herder, E. (eds.) AH 2008. LNCS, vol. 5149, pp. 349–352. Springer, Heidelberg (2008)
23. Hatala, M., Wakkary, R.: Ontology-Based User Modeling in an Augmented Audio Reality System for Museums. J. User Modeling and User-Adapted Interaction 15(3-4), 339–380 (2005)
24. Zhang, H., Song, Y., Song, H.: Construction of Ontology-Based User Model for Web Personalization. In: Conati, C., McCoy, K., Paliouras, G. (eds.) UM 2007. LNCS (LNAI), vol. 4511, pp. 67–76. Springer, Heidelberg (2007)
25. Tu, L.Y., Hsu, W.L., Wu, S.H.: A Cognitive Student Model - an Ontological Approach. In: 12th International Conference on Computers in Education, pp. 3–6. Asia-Pacific Society for Computers in Education, Jhongli (2002)
26. Faulhaber, A., Melis, E.: An Efficient Student Model Based on Student Performance and Metadata. In: 18th European Conference on Artificial Intelligence, pp. 276–280. IOS Press, Amsterdam (2008)
27. Kasai, T., Nagano, K., Mizoguchi, R.: An Ontological Approach to Support Teachers in Designing Instruction Using ICT. In: 17th International Conference on Computers in Education, pp. 11–18. Asia-Pacific Society for Computers in Education, Taiwan (2009)
28. Kontopoulos, E., Vrakas, D., Kokkoras, F., Basiliades, N., Vlahavas, I.: An Ontology-based Planning System for e-Course Generation. J. Expert Systems with Applications 35(1-2), 398–406 (2008)
29. Self, J.: Formal Approaches to Student Modeling. Technical report AI-59, Lancaster University (1991)
30. Foundation for Intelligent Physical Agents (FIPA): FIPA Device Ontology Specification. Standard SI00091, FIPA (2002)
31. Wechsler, D.: WAIS III Test de Inteligencia para Adultos. Paidos, Argentina (2002)
32. Hathaway, S.R., McKinley, J.C.: Inventario Multifásico de la Personalidad Minnesota-2. El Manual Moderno, Mexico City (2000)
33. Gardner, H.: Frames of Mind. Basic Book Inc., New York (1983)
34. Anderson, L.W., Krathwohl, D.R.: A Taxonomy for Learning, Teaching, and Assessing: A Revision of Bloom's Taxonomy. Longman, New York (2001)
35. Guttormsen, S., Krueger, H.: Using New Learning Technologies with Multimedia. J. IEEE Multimedia 7(3), 40–51 (2000)
36. Peña, A.: A Student Model Based on Cognitive Maps. PhD Thesis, National Polytechnic Institute (2008)
An Effective Heuristic for the No-Wait Flowshop with Sequence-Dependent Setup Times Problem Daniella Castro Araújo and Marcelo Seido Nagano Industrial Engineering, School of Engineering of São Carlos, University of São Paulo Av. Trabalhador São-Carlense 400, 13566-590 São Carlos, São Paulo, Brazil {daraujo,drnagano}@sc.usp.br
Abstract. This paper presents a new constructive heuristic named GAPH, based on a structural property, for the m-machine no-wait flowshop with sequence-dependent setup times with makespan as the criterion. Experimental results demonstrate the superiority of the proposed approach over three of the best-known methods in the literature. Experimental and statistical analyses show that the new heuristic provides better solutions regarding both solution quality and computational effort. Keywords: Scheduling, Constructive Heuristic, No-wait flowshop, Sequence-dependent setup, Makespan.
1 Introduction
The first systematic approach to scheduling problems was undertaken in the mid-1950s. Since then, thousands of papers on different scheduling problems have appeared in the literature. The majority of these papers assumed that the setup time is negligible or part of the job processing time. While this assumption simplifies the analysis and reflects certain applications, it adversely affects the solution quality of many applications of scheduling that require an explicit treatment of setup times [1]. There are two types of setup time: sequence-independent and sequence-dependent. If the setup time depends solely on the task to be processed, regardless of its preceding task, it is called sequence-independent. On the other hand, in the sequence-dependent type, the setup time depends on both the task and its preceding task [2]. In today's scheduling problems, in both manufacturing and service environments, it is important to utilize various resources efficiently. Treating setup times separately from processing times allows operations to be performed simultaneously and hence improves resource utilization. This is particularly important in modern production management systems such as just-in-time (JIT), optimized production technology (OPT), group technology (GT), cellular manufacturing (CM), and time-based competition [1]. Another important area in scheduling arises in no-wait flowshop problems, where jobs have to be processed without interruption between consecutive machines. There are several industries where the no-wait flowshop problem applies, including the metal, plastic, and chemical industries. For instance, in the case of steel production,
the heated metal must continuously go through a sequence of operations before it is cooled in order to prevent defects in the composition of the material [3]. As noted by Hall and Sriskandarajah [4], the first of two main reasons for the occurrence of a no-wait or blocking production environment lies in the production technology itself. In some processes, for example, the temperature or other characteristics (such as viscosity) of the material require that each operation follow the previous one immediately. According to Bianco et al. [5], flowshop no-wait scheduling problems are also motivated by concepts such as JIT and zero inventory in modern manufacturing systems. The main feature of the no-wait flowshop is that the task operation i+1 must be processed soon after the end of operation i, where 1 ≤ i ≤ m - 1. Thus, there can be no waiting time in processing a job from one machine to the next. An example of the no-wait flowshop with sequence-dependent setup times problem is shown in Fig. 1.
Fig. 1. No-wait flowshop with m machines and n jobs
Fig. 1 shows the scheduling problem of n jobs on m machines, considering that all tasks are available at the same time on the factory floor. A solution can be represented by σ = ([1], [2], …, [i], …, [n]), where [i] is the task that appears in the i-th position of the sequence. The no-wait flowshop scheduling problem consists of a set J = {J1, J2, …, Jn} of n jobs which are to be processed on a set M = {M1, M2, …, Mm} of m dedicated machines, each one being able to process only one job at a time. Job Jj consists of m operations O_{j,1}, …, O_{j,m}, to be executed in this order, where operation O_{j,k} must be executed on machine k, with processing time p_{j,k}. Furthermore, operation O_{j,k+1} must start immediately after operation O_{j,k} is completed. Moreover, to execute operation O_{j,k}, the machine k requires a sequence-dependent setup s_{i,j,k} if operation O_{i,k} is processed immediately before O_{j,k}. Fig. 2 shows an example of the scheduling problem. In particular, an instance with 2 machines and 3 jobs is represented. For that instance, a feasible solution and an optimal solution are represented.
Fig. 2. An example of the scheduling problem
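To illustrate the timing logic behind Fig. 1 and Fig. 2, the following Python sketch (an illustration, not the authors' code) evaluates the makespan of a fixed job sequence under the no-wait constraint with sequence-dependent setup times. It assumes that p[j][k] is the processing time of job j on machine k, that s[i][j][k] is the setup on machine k when job j immediately follows job i, and that the first job of the sequence needs no initial setup; the numerical instance at the end is invented.

def makespan(sequence, p, s):
    """Makespan of a no-wait schedule given by a job sequence."""
    m = len(p[sequence[0]])
    start = 0.0                                  # start of the current job on machine 1
    for prev, nxt in zip(sequence, sequence[1:]):
        # Smallest start-to-start delay so that, on every machine, job nxt
        # begins only after job prev and the corresponding setup are finished.
        delay, done_prev, before_next = 0.0, 0.0, 0.0
        for k in range(m):
            done_prev += p[prev][k]              # prev completed on machine k
            delay = max(delay, done_prev + s[prev][nxt][k] - before_next)
            before_next += p[nxt][k]             # nxt reaches machine k+1 this much later
        start += delay
    return start + sum(p[sequence[-1]])

# Invented instance: 2 machines, 3 jobs, every setup equal to 1.
p = [[2, 3], [4, 1], [3, 2]]
s = [[[1, 1]] * 3 for _ in range(3)]
print(makespan([0, 1, 2], p, s))                 # -> 13.0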
This paper addresses the m-machine no-wait flowshop problem to minimize makespan where setup times are separated and sequence-dependent (F_m | no-wait, ST_sd | C_max). As this problem is proved to be NP-hard [5], we propose a heuristic solution algorithm and compare it to the Bianco et al. [5] algorithms, BAH and BIH, and to the Brown et al. [6] algorithm, TRIPS, adapted to this problem. This paper is organized as follows. In Sections 2 and 3, we describe the set of constructive heuristics available for the problem and the new heuristic proposed, respectively. In Section 4, we test the effectiveness of the new heuristic. Finally, conclusions and final considerations are given in Section 5.
2 Existing Constructive Heuristics for the Problem
In this section, we review the main contributions to the problem regarding constructive methods. More specifically, we explain in detail the constructive heuristics BAH and BIH, from Bianco et al. [5], and TRIPS, from Brown et al. [6].
2.1 BAH
The BAH algorithm finds a feasible sequence in n iterations. At each iteration, given a partial sequence of the scheduled jobs computed in the previous iteration, the algorithm examines a set of candidates from the unscheduled jobs, and appends a candidate job to the partial sequence minimizing the time at which the shop is ready to process an unscheduled job. The pseudo-code of the heuristic is as follows: Given a set J = {J1, J2, …, Jn} of n jobs, let σ be the set of programmed jobs and U be the set of non-programmed jobs.
Step 1: U ← J; σ ← ∅;
Step 2: While U ≠ ∅, do:
Step 2.1: Choose the job in U to be added at the end of the sequence σ such that the makespan is minimum;
Step 2.2: Add the chosen job to the end of the sequence σ;
Step 2.3: Remove the chosen job from U.
2.2 BIH
The BIH algorithm also finds a sequence of the n jobs in n iterations. But in this algorithm, at each iteration it considers a sequence of a subset of jobs, and finds the best sequence obtained by inserting an unscheduled job in any position of the given sequence. A more detailed description of the heuristic is as follows: Given a set J = {J1, J2, …, Jn} of n jobs, let σ be the set of programmed jobs, U be the set of non-programmed jobs and h the relative insertion position.
Step 1: U ← J; σ ← ∅;
Step 2: While U ≠ ∅, do:
Step 2.1: Choose the job in U which can be inserted in the sequence σ such that the makespan is minimum. Let h be the relative insertion position;
Step 2.2: Insert the chosen job at position h in the sequence σ;
Step 2.3: Remove the chosen job from U.
2.3 TRIPS
The TRIPS heuristic was developed for the no-wait flowshop with sequence-independent setup times, for minimizing total flowtime or makespan. In this paper, because there are only the BIH and BAH constructive heuristics for the problem under study, we adapt TRIPS to this problem. TRIPS examines all possible three-job combinations from the set of unscheduled jobs U and chooses the sequence {jw, jx, jy} that minimizes the three-job objective. Then, it assigns that job to the last empty position in the sequence σ and removes it from U. The heuristic repeats the process, assigning one more job to σ for each set of triplets examined until only three jobs are left. Then, it selects the optimal sequence for these jobs and places them in the final positions of the heuristic sequence σ. The pseudo-code of the heuristic is as follows: Given a set J = {J1, J2, …, Jn} of n jobs, let σ be the set of programmed jobs and U be the set of non-programmed jobs.
Step 1: U ← J; σ ← ∅; h ← 0;
Step 2: While h
0) be nodes such that no zi is in V. If G is a nested graph (and I = {1,…,k}) then graph G' = ( V ∪ {zi | i∈I}, E ∪ {(x,zi), (zi,y) | i∈I} – {(x,y)}) is also a nested graph.
z
y
x
x
y k=1
z
x
z y k=2
z
z
z
y k=3
Fig. 2. Arc decompositions in nested graphs
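A small runnable sketch of the decomposition rule of Definition 3 follows (an illustration only; the graph representation as two Python sets and the node names are chosen here, not taken from the paper).

def decompose(nodes, arcs, x, y, new_nodes):
    """Replace arc (x, y) by the sub-graph x -> zi -> y for every new node zi."""
    assert (x, y) in arcs and not set(new_nodes) & nodes
    nodes = nodes | set(new_nodes)
    arcs = (arcs - {(x, y)}) \
           | {(x, z) for z in new_nodes} | {(z, y) for z in new_nodes}
    return nodes, arcs

# Base nested graph ({s, e}, {(s, e)}) followed by two decomposition steps.
nodes, arcs = {"s", "e"}, {("s", "e")}
nodes, arcs = decompose(nodes, arcs, "s", "e", ["z1", "z2"])   # k = 2 (parallel or alternative)
nodes, arcs = decompose(nodes, arcs, "s", "z1", ["z3"])        # k = 1 (serial)
print(sorted(arcs))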
The directed nested graph defines topology of the nested P/A graph but it is also necessary to annotate all fan-in and fan-out sub-graphs as either alternative (ALT) or parallel (PAR) sub-graphs. The idea is to annotate each node by input and output label which defines the type of branching. Recall that a fan-out sub-graph with principal node x and branching nodes zi is a sub-graph consisting of nodes x, z1,…, zk (for some k) such that each (x, zi), 1 ≤ i ≤ k, is an arc in G. Fan-in sub-graph is defined similarly. Definition 4: Labeled nested graph is a nested graph where each node has (possibly empty) input and output labels defined in the following way. Nodes s and e in the base nested graph and nodes zi introduced during decomposition have empty initial labels. Let k be the parameter of decomposition when decomposing arc (x,y). If k > 1 then the output label of x and the input label of y are unified and set either to PAR or to ALT (if one of the labels is non-empty then this label is used for both nodes). Figure 3 shows how the labeled nested graph is constructed for the example from Figure 1. Notice how the labels are introduced (a semicircle for PAR label and A for ALT label) or unified in case that one of the labels already exists (see the third step). When a label is introduced for a node, it never changes in the decomposition process.
Fig. 3. Construction of a labelled nested graph
Definition 5: A nested P/A graph is obtained from a labeled nested graph by removing the labels and defining fan-in and fan-out sub-graphs in the following way. If the input label of node x is non-empty then all arcs (y, x) form a fan-in sub-graph which is parallel for label PAR or alternative for label ALT. Similarly, nodes with a non-empty output label define fan-out sub-graphs. Each arc (x, y) such that both output label of x and input label of y are empty forms a parallel fan-in sub-graph. Note that requesting a single arc to form a parallel fan-in sub-graph is a bit artificial; this requirement is used to formally ensure that each arc is a part of some sub-graph. As showed in [3] a nested P/A graph is a P/A graph and the assignment problem for nested P/A graph is easy to solve. Proposition 2: The assignment problem for a nested P/A graph is tractable (can be solved in a polynomial time). Note finally that Temporal Network with Alternatives is obtained from the P/A graph by annotating each arc by a simple temporal constraint (defines minimal and maximal distances between the nodes allocated to time) and similarly for Nested TNA. In this paper we assume that simple temporal constraints are only in the form of precedence relations which are always satisfied as the graphs are acyclic. However, as showed in [4] if unrestricted simple temporal constraints are assumed then the assignment problem becomes NP-hard even for Nested TNA.
3 Equivalence Tree For solving the P/A graph assignment problem, it is useful to know which nodes are logically equivalent, that is, they either all appear in a feasible assignment or none of them appears there. If such a relation is known then the P/A graph assignment problem can be solved by backtrack-free search as described in [3]. Briefly speaking, the validity of a node is decided based on its equivalence with any already decided node. Notice that Nested TNAs do not explicitly represent information about the equivalent nodes. Let us first formally introduce the notion of equivalent nodes: Definition 6: Let G be a P/A graph and u and v be arbitrary two nodes of G. Then u and v are called equivalent if and only if there is no feasible assignment of nodes in G in which u and v are assigned different values. As showed in [3] equivalent nodes can be detected during the process of constructing the nested network and hence logical reasoning is tractable for nested P/A graphs. We propose now a novel data structure called an equivalence tree that explicitly keeps information about equivalent nodes. This data structure can be used both for logical reasoning (what happens if some node becomes valid or invalid) and for preference reasoning as we shall present later. This tree is similar to tree representation of process from [1] but it is smaller because of aggregating logically equivalent nodes. Notice that the nodes participating in parallel branching are all equivalent in the sense of being all or none present in the solution graph. Non-equivalence relation between nodes is introduced only via alternative branching. The idea of equivalence tree is to keep all equivalent nodes from a Nested TNA together in a so called E-node.
The alternative branching in Nested TNA is represented using a so-called A-node in the equivalence tree that introduces new subsets of equivalent nodes (Figure 4). We build the equivalence tree along the process of building the Nested TNA in the following way. The base nested graph G = ( {s,e}, {(s,e)} ) is represented using a root E-node annotated by the set {s,e}. Now, if we decompose arc (s,e) and the decomposition is either parallel or a single node is introduced then we add the new nodes to the annotation of the root E-node. If the decomposition is alternative with new nodes z1,…, zk (k > 1) then we introduce a new A-node connected to the root and k E-nodes connected to this A-node such that the i-th E-node is annotated by set { zi }. Hence there are as many A-nodes in the equivalence tree as the number of decomposition steps with alternative branching. In general, each time arc (x,y) is decomposed we find E-nodes in the current equivalence tree containing x and y in their annotation. It may happen that both x and y are in the same E-node and then we continue like above (in case of alternative branching the new A-node is connected to this E-node, otherwise the new nodes are added to the annotation of that E-node). If x and y are in different E-nodes then we do the above extension of the equivalence tree with the E-node that is farther from the root (we shall show later that both E-nodes must be in the same branch of the equivalence tree). Figure 4 shows the equivalence tree for the Nested TNA from Figure 1: a root E-node annotated with {collectMaterial, shipPiston, assemblePiston, weldTube, weldRod, sawRod, assembleKit, clearRod, aux}, one A-node child, and two leaf E-nodes annotated with {buyTube} and {sawTube, clearTube}.
Fig. 4. Example of an equivalence tree (box represents E-node, circle represents A-node)
Before going into formal definitions, it should be noted that the equivalence tree is unique for given Nested TNA independently of the order of decompositions steps, but a single equivalence tree may correspond to several Nested TNAs. Definition 7: Let G = (V, E) be a nested P/A graph. Then the equivalence tree for graph G is a tree with two types of nodes: E-nodes and A-nodes. The root of the tree is E-node, leaves of the tree are E-nodes, the parent of A-node is E-node, and the parent of E-node is A-node (hence E-nodes and A-nodes alternate in any branch of the equivalence tree). Each E-node is annotated by a non-empty subset of nodes from V such that all these nodes are equivalent with respect to G. Each node from V is present in exactly one E-node. Finally, let X be E-node, Y be its child node (must be an A-node), and Z1,…, Zk be all children of Y (must be E-nodes) and let x be from the annotation of X and each zi be from the annotation of Zi (1 ≤ i ≤ k). Then in any feasible solution to the assignment problem for graph G, either all x,z1,…, zk are invalid or x is valid and exactly one of nodes z1,…, zk is valid.
As already mentioned, the motivation behind the equivalence tree is keeping all equivalent nodes together and representing alternative branching relations. We sketched the process of obtaining an equivalence tree for the nested P/A graph at the beginning of this section. The following Algorithm 1 formalizes the process.
1. The base nested graph G = ( {s,e}, {(s,e)} ) is represented using an equivalence tree T with a single E-node annotated by the set {s,e}.
2. Let G = (V, E) be a nested P/A graph, (x,y) ∈ E be its arc that will be decomposed to new nodes z1,…, zk (k > 0), and G' be the P/A graph after the decomposition (see Definition 3). Let T = (N, C) be an equivalence tree for G and X denote the E-node from N that is the farthest node from the root among the nodes whose annotation contains either x or y.
a. If the decomposition is parallel or k = 1 then T' = T, where the annotation of X is extended by nodes z1,…, zk, is an equivalence tree for G'.
b. If the decomposition is alternative and k > 1 then let Y be a new A-node (not in N) and Z1,…, Zk be new E-nodes (not in N) such that Zi is annotated by {zi}. The graph T' = ( N ∪ {Y} ∪ {Zi | i∈I}, C ∪ {(X,Y)} ∪ {(Y,Zi) | i∈I}), where I = {1,…,k}, is an equivalence tree for G'.
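The following Python sketch mirrors Algorithm 1 (an illustration, not the author's implementation). An E-node is represented as a dictionary with an annotation set and a list of child A-nodes, and an A-node simply as a list of child E-nodes; the helper enodes and the boolean flag alternative are conveniences introduced here.

def make_tree():
    """Equivalence tree of the base nested graph ({s, e}, {(s, e)}) (step 1)."""
    return {"ann": {"s", "e"}, "children": []}

def enodes(tree, depth=0):
    """Yield (E-node, depth) pairs of the equivalence tree."""
    yield tree, depth
    for a_node in tree["children"]:
        for child in a_node:
            yield from enodes(child, depth + 1)

def decompose(tree, x, y, new_nodes, alternative):
    """Update the equivalence tree after arc (x, y) is decomposed (step 2)."""
    candidates = [(e, d) for e, d in enodes(tree) if {x, y} & e["ann"]]
    target = max(candidates, key=lambda pair: pair[1])[0]    # E-node farthest from the root
    if not alternative or len(new_nodes) == 1:               # step 2a
        target["ann"] |= set(new_nodes)
    else:                                                    # step 2b: new A-node + k E-nodes
        target["children"].append(
            [{"ann": {z}, "children": []} for z in new_nodes])

tree = make_tree()
decompose(tree, "s", "e", ["z1", "z2"], alternative=True)    # alternative branching
decompose(tree, "s", "z1", ["z3"], alternative=False)        # serial step: z3 joins z1's E-node
print(tree)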
Figure 5 demonstrates the above process of building an equivalence tree: a sequence of nested P/A graphs (top) and the corresponding equivalence trees (bottom).
Fig. 5. Incremental construction of the equivalence tree for a nested P/A graph
Lemma 1: Let G = (V, E) be a nested P/A graph, (x,y) ∈ E be its arc, T = (N, C) be an equivalence tree for G constructed using Algorithm 1, and X and Y be E-nodes in T containing x and y respectively in their annotations. Then either X = Y or the path between nodes X and Y is a part of branch in T. Proof: The equivalence tree for base nested graph trivially satisfies the lemma. All arcs added in step 2 of the algorithm also satisfy the lemma.
Proposition 3: The graph T obtained by Algorithm 1 for a nested P/A graph G = (V, E) is an equivalence tree for G. Proof: First, it should be clear that the graph T obtained by Algorithm 1 is a tree with alternating E-nodes and A-nodes on each branch, the root of T is an E-node and the leaves of T are E-nodes. Also each node from V appears in annotation of exactly one E-node from the equivalence tree T. Let us prove now that the nodes in the annotation of given E-node are equivalent with respect to G. Nodes s and e from the base graph are equivalent. New nodes can be added to existing annotation of E-node only by step (2a) of Algorithm 1. Either these nodes are part of parallel sub-graph (decomposition) and all such nodes are equivalent (definition of feasible assignment for parallel sub-graphs) or the new node is part of decomposition of size 1. In this second case, assume that arc (x, y) was decomposed and new node z was added. Either x and y are equivalent and hence part of annotation of a single E-node and z is added to this E-node because it is equivalent with x and y or x is a principal node and y is a branching node in some alternative subgraph (a symmetrical situation is similar) and then z is equivalent with y. Then y was added later to graph G and hence E-node Y with y is farther from the root of T than the node with x (due to step 2b of Algorithm 1) and z is added to the annotation of Y. It remains to prove the last condition of Definition 7 which is a direct consequence of step (2b) of Algorithm 1, definition of feasible assignment for alternative subgraphs, and having only equivalent nodes in the annotation of each E-node. Equivalence trees represent a compact way for describing branching constraints in nested P/A graphs. Instead of specifying the branching constraints for the nodes in a P/A graph, we can specify the branching constraints for the E-nodes in the corresponding equivalence tree. In particular, let ValX be a 0/1 variable for E-node X. Then we define the branching constraints in the following way. Let X be E-node, Y be its child node (must be an A-node), and Z1,…, Zk are all children of Y (must be E-nodes) then the branching constraint is ValX = Σi=1,…,k ValZi. The soundness of this model follows directly from Definition 7. Moreover, this model is clearly Berge acyclic where arc consistency guarantees global consistency [6]. Hence this is an alternative proof of proposition 2. Actually, the proposed model is equivalent to the model proposed in [3]; the only difference is that instead of a set of validity variables for equivalent nodes in the P/A graph that are connected via equality constraints we use a single validity variable for the corresponding E-node in the equivalence tree.
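As a concrete, if tiny, illustration of this tree-based model (not code from the paper), the enumeration below uses one 0/1 variable per E-node of a tree shaped like the one in Fig. 4: a valid root and one A-node whose two alternative children must contribute exactly one valid E-node. The variable names are invented for readability.

from itertools import product

def feasible_assignments():
    solutions = []
    for root, buy, saw in product((0, 1), repeat=3):
        # Branching constraint Val_root = Val_buyTube + Val_sawTube, root pre-assigned to 1.
        if root == buy + saw and root == 1:
            solutions.append({"root": root, "buyTube": buy, "sawTube_clearTube": saw})
    return solutions

print(feasible_assignments())
# [{'root': 1, 'buyTube': 0, 'sawTube_clearTube': 1},
#  {'root': 1, 'buyTube': 1, 'sawTube_clearTube': 0}]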
4 Preference Model for Alternatives Traditional scheduling problems are optimization problems where the utility function is frequently dependent on time such as makespan, tardiness, and earliness. In this paper we suggest a different type of utility function that depends on selection of activities among the alternatives. Such utility function is more typical for AI planning where actions have cost and the task is to find a plan (a sequence of actions) where the sum of action costs is minimized. Even the earliest formulations of planning problems where the task was to find the shortest plan (a plan with the smallest number of actions) can be formulated as a cost optimization problem – each action has identical
cost 1. According to our experience with real-life scheduling problems, the alternative processes are not of the same cost and users want to specify that one alternative is preferred to another one. One of the possible ways how to model such preferences is using the cost model from AI planning. Briefly speaking, each action has a non-negative cost and the task is to select a subset of actions where the sum of action costs is minimized and all other constraints are satisfied. In our formal model of P/A graphs we assume only the logical branching constraints (in practice, the temporal and resource constraints must also be assumed) so the problem can be formalized as follows:
Definition 8: Given a P/A graph G such that each node has a non-negative cost and a subset of nodes in G which are assigned to 1, the P/A graph optimization problem is the problem of finding a feasible assignment of 0/1 values to all nodes of G which extends the prescribed partial assignment and minimizes the sum of costs of nodes assigned to 1.
The P/A graph optimization problem for G = (V, E) can be naturally modeled as a constrained optimization problem. Let X be the Boolean (0/1) validity variable for each node x ∈ V, costx be the cost of node x, and S ⊆ V be the set of nodes assigned to 1. Then the following constraint model describes the problem:
minimize ∑x∈V X * costx
under the constraints
X = Y, for each (x, y) ∈ E s.t. x and y are part of a parallel sub-graph
X = Σi=1,…,k Zi, for each complete alternative sub-graph with the principal node x and branching nodes z1,…, zk
X = 1, for each x ∈ S
X ∈ {0, 1}, for each x ∈ V
As shown in [3], the above model does not propagate a lot; in particular, arc consistency does not remove all infeasible values. Even if we modify the logical part of the model as described in [3] or earlier in this paper, we still do not achieve global consistency regarding the utility function. The following example demonstrates that:
minimize A+B+C+D
under the constraints
A = B+C
D = B+C
A = 1
A, B, C, D ∈ {0, 1}
In the example we assume the cost of each action to be 1, hence the utility is A+B+C+D. Clearly, the value of the utility function equals 3 if the logical (branching) constraints are satisfied. However, the lower bound of the utility function as computed via arc consistency is 1 while the upper bound is 4. If we add the implied constraint A = D as suggested in [3], we obtain global consistency for the logical constraints (D = 1), the lower bound of the utility function increases to 2, but the upper bound does not change (it is still 4). In the following section, we shall show how to achieve global consistency for the utility function by integrating logical and preference reasoning by means of the equivalence tree.
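The small example above can be checked by brute force; the Python fragment below (an illustration added here) enumerates all 0/1 assignments satisfying the three constraints and confirms that every feasible assignment has utility exactly 3, which is the gap that plain bound reasoning fails to close.

from itertools import product

feasible = [(a, b, c, d)
            for a, b, c, d in product((0, 1), repeat=4)
            if a == b + c and d == b + c and a == 1]
print(feasible)                              # [(1, 0, 1, 1), (1, 1, 0, 1)]
print(min(sum(v) for v in feasible),         # lower bound of the utility: 3
      max(sum(v) for v in feasible))         # upper bound of the utility: 3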
5 Reasoning with Preferences on Alternatives
As we already presented, the equivalence tree can be used to define an alternative constraint model for the P/A graph assignment problem and this model achieves global consistency. The main idea was that the logically equivalent nodes arising from the parallel sub-graphs in the P/A graph are kept together in a single E-node of the equivalence tree while the alternative sub-graphs are modeled via A-nodes. We used a 0/1 variable ValX to describe whether all nodes from the annotation of E-node X (or none of them) belong to the solution sub-graph defined by the nodes of the P/A graph that are assigned to 1. Hence, a feasible assignment of nodes in the P/A graph corresponds to a feasible assignment of nodes in the corresponding equivalence tree. In a similar way, we can accumulate the costs of all nodes in the annotation of E-node X: fCostX = ∑x∈annotation(X) costx. Then the utility function can be reformulated in terms of the equivalence tree T = (N,C): ∑X∈N ValX * fCostX. The above reformulation of the utility function strengthens domain filtering (estimate of bounds) as described in the previous section but it still does not achieve global consistency. We shall now present how to compute the lower and upper bounds of the utility function precisely for any consistent (possibly partial) assignment of variables ValX. Let dom(A) be the domain of variable A (possible values to be assigned to A). For each node X (both E-nodes and A-nodes) in the equivalence tree we compute values minCostX and maxCostX using the following Algorithm 2:
Let the nodes X be visited in the order from leaves to the root
if X is a leaf E-node then
  case 1 ∈ dom(ValX): minCostX = maxCostX = fCostX
  case ValX = 0: minCostX = ∞, maxCostX = -∞
if X is an A-node with Z1,…, Zk being all its child E-nodes then
  minCostX = min{ minCostZi | i = 1,…,k }
  maxCostX = max{ maxCostZi | i = 1,…,k }
if X is a non-leaf E-node with Y1,…, Yk being all its child A-nodes then
  case 1 ∈ dom(ValX):
    minCostX = fCostX + ∑i=1,…,k minCostYi
    maxCostX = fCostX + ∑i=1,…,k maxCostYi
  case ValX = 0: minCostX = ∞, maxCostX = -∞
In the algorithm we assume that min {} = ∞ and max {} = -∞. After running Algorithm 2, the values minCost and maxCost of the root node of the equivalence tree represent the lower and upper bounds of the utility function provided that at least one node is valid. We do not assume the trivial situation where all nodes are invalid, for which the utility function equals zero. Notice also that if at least one node is pre-assigned to 1 (it is valid) then ValR = 1 for the root node R of the equivalence tree.
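An illustrative Python rendering of Algorithm 2 follows (not the author's code). It reuses the dictionary representation of the earlier equivalence-tree sketch, extended with two fields assumed here: fCost, the accumulated cost of the E-node's annotation, and dom, the current domain of ValX.

INF = float("inf")

def bottom_up(e_node):
    """Return (minCost, maxCost) of an E-node, visiting its subtree leaves-first."""
    if 1 not in e_node["dom"]:                     # ValX = 0: the node cannot be valid
        return INF, -INF
    lo = hi = e_node["fCost"]
    for a_node in e_node["children"]:              # each child A-node picks one alternative
        bounds = [bottom_up(z) for z in a_node]
        lo += min(b[0] for b in bounds)            # cheapest alternative
        hi += max(b[1] for b in bounds)            # most expensive alternative
    return lo, hi

# Root of cost 2 with one alternative branching into options of cost 1 and 4.
tree = {"fCost": 2, "dom": {1},
        "children": [[{"fCost": 1, "dom": {0, 1}, "children": []},
                      {"fCost": 4, "dom": {0, 1}, "children": []}]]}
print(bottom_up(tree))                             # -> (3, 6): exact utility bounds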
Proposition 4: Let R be the root node of equivalence tree T and minCostR ≠ ∞. Then there exists a feasible assignment of nodes in T such that the utility function equals minCostR (and similarly for maxCostR).
Proof: The feasible assignment will contain the root node, that is, ValR = 1. From Definition 7 we know that for each A-node Y that is a child of R exactly one child Zi of Y must be valid. Let j = argmin{ minCostZi | i = 1,…,k }, where Z1,…, Zk are all children of Y, then put ValZj = 1 and ValZi = 0 for i ≠ j (if there are more such j then select one). Note that this is possible because there must be some minCostZj different from ∞ and hence 1 ∈ dom(ValZj). Moreover, 0 ∈ dom(ValZi) for i ≠ j because otherwise ValZj = 0 due to the global consistency of variables ValX. This way we continue until we reach the leaf nodes. ValZi = 0 means that all descendants of Zi are invalid; ValZj = 1 means that we select a valid child as above. Clearly, the selected nodes form a feasible assignment and the value of the utility function equals minCostR. A similar approach can be applied to obtain a feasible assignment for maxCostR.
Proposition 5: The P/A graph optimization problem for nested P/A graphs is tractable.
Proof: Algorithm 2 runs in linear time in the number of nodes of the equivalence tree and Proposition 4 shows how to find the optimal assignment.
Algorithm 2 is used to compute the bounds of the utility function. It is based on the idea of propagating cost information from the leaves to the root. It represents one-way propagation from variables ValX to the utility function. We can also propagate in the opposite direction, from the bounds of the utility function to variables ValX. This way, we can invalidate nodes (ValX = 0) that cannot be part of a feasible assignment whose value of the utility function is within given bounds. Again, we can utilize the values minCost and maxCost but now they are recomputed in the direction from the root R to the leaves, as the proof of Proposition 4 showed. Algorithm 3 describes a general approach when minCostR is increased and/or maxCostR is decreased:
Let the nodes X be visited in the order from the root to the leaves
case X is an E-node (with possible children Y1,…, Yk):
  if minCostX > maxCostX then
    ValX = 0
    for j = 1 to k do minCostYj = ∞, maxCostYj = -∞
  else
    for j = 1 to k do
      minCostYj = max(minCostYj, minCostX - fCostX - ∑i=1,…,k, i≠j maxCostYi)
      maxCostYj = min(maxCostYj, maxCostX - fCostX - ∑i=1,…,k, i≠j minCostYi)
case X is an A-node (with possible children Z1,…, Zk such that 1 ∈ dom(ValZj)):
  for j = 1 to k do
    minCostZj = max(minCostZj, minCostX)
    maxCostZj = min(maxCostZj, maxCostX)
Algorithm 3 invalidates only nodes that cannot be part of a feasible assignment, but not all of them. We are currently working on its stronger version.
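The sketch below gives the flavour of this top-down pass in Python (again an illustration, not the author's code). It reuses the tree dictionaries of the previous sketch, now also storing minCost and maxCost per node, and it is simplified to the case of a single A-node child per E-node, so the subtraction of sibling A-node bounds prescribed by Algorithm 3 disappears.

INF = float("inf")

def top_down(e_node, lo, hi):
    """Push tightened utility bounds [lo, hi] from an E-node towards its leaves."""
    e_node["minCost"] = max(e_node["minCost"], lo)
    e_node["maxCost"] = min(e_node["maxCost"], hi)
    if e_node["minCost"] > e_node["maxCost"]:
        e_node["dom"] = {0}                         # the node cannot be valid
        lo, hi = INF, -INF
    else:
        lo = e_node["minCost"] - e_node["fCost"]    # bounds left for the alternatives
        hi = e_node["maxCost"] - e_node["fCost"]
    for a_node in e_node["children"]:
        for child in a_node:
            top_down(child, lo, hi)

# Same two-option tree as before; an upper bound of 4 on the utility rules out
# the expensive alternative (2 + 4 = 6 > 4), which is therefore invalidated.
cheap = {"fCost": 1, "dom": {0, 1}, "children": [], "minCost": 1, "maxCost": 1}
dear = {"fCost": 4, "dom": {0, 1}, "children": [], "minCost": 4, "maxCost": 4}
root = {"fCost": 2, "dom": {1}, "children": [[cheap, dear]], "minCost": 3, "maxCost": 6}
top_down(root, 3, 4)
print(dear["dom"])                                  # -> {0}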
6 Discussion and Conclusions The paper describes equivalence trees as a compact representation of Nested Temporal Networks with Alternatives appropriate for efficient propagation of branching (logical) constraints. We proposed a novel utility function for scheduling problems where alternative activities are assumed and we showed that equivalence trees simplify propagation of cost information for that utility function too. In particular, we can compute the exact bounds of the utility function and we can also filter-out some activities based on the restricted bounds of the utility function. We focused on efficient logical and cost reasoning that is relatively novel for scheduling problems. To model real-life problems one needs to assume the temporal and resource constraints as well. They can be easily added when using the constraint satisfaction framework so it is interesting to find out how to integrate logical, temporal, resource, and cost reasoning to achieve stronger domain filtering. An interesting open question is also how to combine utility functions based on time (tardiness, earliness) with the utility function based on selection of alternatives. Acknowledgments. The research is supported by the Czech Science Foundation under the contract P202/10/1188.
References
1. Bae, J., Bae, H., Kang, S.-H., Kim, Z.: Automatic Control of Workflow Processes Using ECA Rules. IEEE Transactions on Knowledge and Data Engineering 16(8), 1010–1023 (2004)
2. Barták, R., Čepek, O.: Temporal Networks with Alternatives: Complexity and Model. In: Proceedings of the Twentieth International Florida AI Research Society Conference (FLAIRS), pp. 641–646. AAAI Press, Menlo Park (2007)
3. Barták, R., Čepek, O.: Nested Precedence Networks with Alternatives: Recognition, Tractability, and Models. In: Dochev, D., Pistore, M., Traverso, P. (eds.) AIMSA 2008. LNCS (LNAI), vol. 5253, pp. 235–246. Springer, Heidelberg (2008)
4. Barták, R., Čepek, O., Hejna, M.: Temporal Reasoning in Nested Temporal Networks with Alternatives. In: Fages, F., Rossi, F., Soliman, S. (eds.) CSCLP 2007. LNCS (LNAI), vol. 5129, pp. 17–31. Springer, Heidelberg (2008)
5. Beck, J.C., Fox, M.S.: Constraint-directed techniques for scheduling alternative activities. Artificial Intelligence 121, 211–250 (2000)
6. Beeri, C., Fagin, R., Maier, D., Yannakakis, M.: On the desirability of acyclic database schemes. Journal of the ACM 30, 479–513 (1983)
7. Kim, P., Williams, B., Abramson, M.: Executing Reactive, Model-based Programs through Graph-based Temporal Planning. In: Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), pp. 487–493 (2001)
8. Kuster, J., Jannach, D., Friedrich, G.: Handling Alternative Activities in Resource-Constrained Project Scheduling Problems. In: Proceedings of Twentieth International Joint Conference on Artificial Intelligence (IJCAI 2007), pp. 1960–1965 (2007)
9. Nuijten, W., Bousonville, T., Focacci, F., Godard, D., Le Pape, C.: MaScLib: Problem description and test bed design, http://www2.ilog.com/masclib/
10. Sormaz, D.N., Khoshnevis, B.: Generation of alternative process plans in integrated manufacturing systems. Journal of Intelligent Manufacturing 14, 509–526 (2003)
AI-Based Integrated Scheduling of Production and Transportation Operations within Military Supply Chains Dmitry Tsadikovich*, Eugene Levner, and Hanan Tell Bar Ilan University, Department of Management, Ramat Gan 52900 Israel
[email protected],{levnere,tellha}@mail.biu.ac.il
Abstract. Maintaining military weapon systems requires that spare parts of military equipment are available where and when they are needed. We focus on integrated demand-responsive scheduling of operations within two-echelon military supply chains consisting of production and transportation operations. Integration across these operations is achieved by introducing an additional intermediary module into the mathematical model, which is solved by a new two-level artificial-intelligence (AI)-based algorithm. In order to carry out this task we introduce and analyze two performance measures: time of response and military effectiveness. A new solution method based on integer programming techniques in combination with AI-based heuristic search is derived. A discrete-event simulation tool, ARENA 11.0, is used to implement the integrated scheduling method. Keywords: Planning and scheduling, military supply chain, supply chain management, AI-based heuristic search, simulation.
1 Introduction
Military supply chain management is a multi-functional approach to procuring, producing, repairing and delivering required military products, parts and services in a time-saving and cost-saving manner. The primary goal of a military supply chain system is to support offensive and defensive weapon systems. The chain includes sub-suppliers, suppliers, repair depots, distribution centers, retailers and clients/consumers (the military bases). The logistics support of future war operations must offer a higher responsiveness of the military force, which means delivering the right product to the right location at the right time with the right service in order to satisfy military force requirements in dynamic environments. We focus on integrated demand-responsive scheduling of operations within military supply chains consisting of production and transportation operations. Coordination between these operations should lead to improving the customer performance of the whole chain. We introduce an additional module into the chain. Termed the "intermediary" module, its main role is to reduce the discrepancies in time and in the common resources that are shared between different components.
Corresponding author.
The demand-responsiveness and effectiveness of the military supply chain will be formally analyzed with the help of two performance measures, namely, the time of response and the military effectiveness. In order to find the optimal solution of the integrated mathematical model we introduce a new two-level algorithm belonging to the family of artificial-intelligence (AI) based heuristic search algorithms. The discrete-event simulation tool ARENA 11.0 is used to implement this algorithm. The remainder of the paper is organized as follows. Section 2 discusses previous work. Section 3 describes the integrated approach. In Section 4 the mathematical models are formulated. The algorithm is described in Section 5. Computational results are given in Section 6. Section 7 concludes the paper.
2 Review of Previous Work In this section, the papers cited are grouped according to three research directions: (1) analysis and modeling of military supply chains; (2) analysis of performance measures in supply chains; and (3) integrated models of production and transportation operations within supply chains. In the literature on military supply chain modeling we can highlight the paper of Bhagavatula et al., 2008, in which the modeling of material flow (arms and ammunition) along a three-echelon military supply chain is analyzed for two main scenarios, ordinary (peacetime) and emergency (wartime). In order to reduce the delay time a new mathematical model is created. The paper by Barahona et al., 2007 focuses on creating an effective military supply chain for failed combat vehicles by using a new dynamic multi-point approach. Its effectiveness is expressed by determining the operational availability and customer waiting time performance measures. The paper of McGee et al., 2005 employs a simulation model of the Air Force multi-echelon inventory system across a system of bases to examine local inventory and local repair strategies. Four performance measures are used: operational availability, abort rate, customer wait time and total transportation cost. Beamon, 1999 presents an overview of the performance measures used in commercial supply chains and identifies three main groups: resource, output (customer responsiveness) and flexibility. Leiphart, 2001 emphasizes the important role of the customer waiting time performance measure in creating effective military supply chains. Gibson, 2004 introduces the average customer wait time performance measure for identifying distribution challenges and supply chain bottlenecks. In order to improve the military logistic chains for spare parts, Johnson and Levite (2003) developed the Define-Measure-Improve methodology, which is based on two main performance measures: customer wait time (CWT) and requisition wait time. Most of the integrated supply chain studies focus on the integrated analysis of production-inventory-distribution systems (Sarmiento and Nagi, 1999; 2006; Xiuli and Cheng, 2008). Other works consider a special type of integrated production and distribution scheduling problem in which finished goods inventory is not allowed: the make-to-order approach (Chen, 2000; Chen and Vairaktarakis, 2005; Li et al., 2008). Stecke and Zhao, 2007 introduce an integrated production-transportation problem with a commit-to-delivery mode. Yan et al., 2006 develop a network flow model that integrates material production scheduling and truck dispatching.
Our work suggests a new logistic and mathematical approach to improve the effectiveness and demand-responsiveness of the military supply chain. The contribution of our approach is based on the inclusion of an optimization module (the intermediary module) between the production and transportation components with the aim of coordinating (integrating) the operations across these components. In order to measure the effectiveness and demand-responsiveness of the integrated military supply chains, two specific performance measures, the time of response and the military effectiveness, are suggested. These measures are mathematically described and represented in the objective function of the mixed integer programming model. The better these performance measures are, the more demand-responsive and effective the whole military supply chain will be. In the current work we use two basic mathematical models: production (job shop problem) and transportation (vehicle routing problem). These basic models were introduced by Pinedo (2002) and Dantzig and Ramser (1959), respectively. In order to make them demand-responsive, we develop and extend them below.
3 Integrated Approach to the Military Supply Chain The standard modular approach in the military supply chain planning and scheduling literature considers the production and transportation modules (PM and TM) independently and functions as follows: the recovery plant gives priorities to existing batches and tries to schedule them as soon as possible (with the batches of higher priority being served first), whereas the transportation unit gives priorities to the shipments and tries to perform all the prioritized transportation tasks as fast as possible. Two basic concepts are widely exploited in modern management science and practice: the "make-to-order" principle in production management, and the "demand-responsive transport" principle in transport management. However, the modular approach may sometimes be ineffective because it provides only two local optima and may not provide a global optimum for the whole supply chain. To properly analyze what constitutes an effective military supply chain, we need to shift from the modular approach to an integrated approach. In order to enhance integration between logistic units we use the intermediary module (IM) to coordinate schedules and resource sharing. This integration should satisfy two globally integrated performance measures that we introduce and analyze: the time of response and the military effectiveness. The time of response defines the ability of the system to react speedily to military requirements. We determine the time of response as the total waiting time between maintenance and transportation operations. The smaller the total waiting time, the faster the time of response. The military effectiveness is a measure of the ability of the military supply chain to deliver the right product at the right time. In our case we determine military effectiveness to be the total number of orders that are delivered to the end-customers/military bases on time.
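As a rough illustration of how these two measures could be computed from a candidate schedule, the sketch below assumes hypothetical records holding, for each batch, the time its repair finishes, the time its transportation starts, its delivery time and its due date; none of these names come from the paper.

import dataclasses

@dataclasses.dataclass
class BatchRecord:            # hypothetical record, for illustration only
    repair_finish: float      # TP_j: finish time of the repair operation
    transport_start: float    # ST_vj: start time of the transportation operation
    delivery_time: float      # time the batch reaches its military base
    due_date: float           # dd_j: target delivery instant

def time_of_response(batches):
    """Total waiting time between maintenance and transportation operations."""
    return sum(b.transport_start - b.repair_finish for b in batches)

def military_effectiveness(batches):
    """Number of batches delivered to the end customers on time."""
    return sum(1 for b in batches if b.delivery_time <= b.due_date)

A smaller time_of_response value and a larger military_effectiveness value correspond to a more demand-responsive and effective chain, in the sense used above.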
The individual production and transportation components may not guarantee the demand-responsiveness of the whole supply chain. The reason is that locally optimal solutions of each module may not be coordinated with each other, both in time and with respect to the common resources that are to be shared between the production and transportation modules. In order to formally describe the discrepancies that appear, we introduce an additional module that we call the intermediary module. Its main purpose is to coordinate the operations between the former two modules in time and in terms of the use of common resources. This coordination makes it possible to achieve a more demand-responsive and effective military supply chain, i.e., to deliver the right product to the right location at the right time. We consider the case of a two-echelon military supply chain that includes production and transportation modules and focus on the flow of the repaired parts from the repair plant to the end-customers. We develop earlier known scheduling models in two directions. First, we develop and extend each of the independent modules by introducing additional variables and constraints in order to improve their demand-responsiveness. Second, in order to find the optimal solution of the integrated mathematical problem we introduce a new two-level algorithm belonging to the family of artificial-intelligence (AI) based heuristic search algorithms. The intermediary module identifies a number of parameters that were not taken into account during the modular planning process. Among these parameters we identify two main types: time and capacity parameters. The former (delay times) serve to analyze and optimize the time of response, whereas the latter characterize and limit the effective usage of all military resources involved in the IM that are not taken into account in the production module (PM) and the transportation module (TM). The objective function is the total penalty cost for early/late delivery plus the total penalty cost of waiting times between the production and transportation operations. The first term in the objective function characterizes the military effectiveness. This variable measures the ability of the military supply chain to deliver the right product at the right time. This ability is achieved by increasing the number of repaired parts that are delivered to the end-customers on time. The second term characterizes the time of response, which depends on better coordination of timing and results in reducing the total waiting time between maintenance and transportation operations.
4 Mathematical Models As a part of the military maintenance process, all failed units are moved from their military deployment to the repair plant. In the repair plant, all failed units of the same type are grouped together in batches. Each batch has its own customized repair route, i.e., it is processed according to a pre-determined technological repair sequence. The total length of time for repairing all batches (the makespan $C_{\max}$) is to be minimized. Such a model is known in scheduling theory as the classical job shop problem with a makespan objective and no recirculation (see Pinedo, 2002). On the one hand, the concept we present is based on this basic model; on the other hand, it includes some new extensions: a) each operation of a batch in the recovery plant can be processed by a number of alternative, non-identical resources; b) each batch has a priority and its own due date.
The batches and resources in the repair plant are the jobs and machines, respectively, and are described mathematically as follows. The number of jobs is denoted by $n$, $j = 1, 2, \ldots, n$, and the number of machines by $m$, $i = 1, 2, \ldots, m$. The pair $(i, j)$ refers to the operation of job $j$ on machine $i$. In order to describe the mathematical maintenance model we introduce two variables. The first one is $y_{ij}$, which denotes the starting time of operation $(i, j)$. The other one is the operation-machine variable $B_{ij}$; $B_{ij} = 1$ if machine $i$ is selected for operation $(i, j)$, and 0 otherwise (this variable describes the ability of operation $(i, j)$ to be processed on a number of alternative, non-identical machines). Set $N$ denotes the set of all operations $(i, j)$, and set $A$ the set of all routing constraints $(i, j) \rightarrow (k, j)$ that require job $j$ to be processed on machine $i$ before it is processed on machine $k$. Let $H_{(i,j)}$ denote the set of all alternative, non-identical machines by which operation $(i, j)$ can be processed. The processing time of job $j$ on machine $i$ is denoted by $p_{ij}$ and the priority of job $j$ by $r_j$. $D_j$ is the due date of job $j$ and $F_j$ is the finish processing time of job $j$, $F_j = \max_i (y_{ij} + p_{ij})$, for all $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$. The objective of the scheduling problem is to minimize the total early/late maintenance time subject to the defined due dates. After maintenance is finished, the repaired batches must be transferred to their specific customers (military bases) using the number of vehicles that are waiting at the processing facility. The total length of time for delivering all batches has to be minimized. The mathematical formulation of this transportation problem (the vehicle routing problem, VRP) was introduced by Dantzig and Ramser in 1959. We extend this basic problem in several directions: a) each customer (in our case, a military base) has to be served according to given time intervals (this problem with time windows is a generalization of the VRP and is called the VRPTW); b) the transportation of the repaired batches is performed by different vehicle types (with different capacities and transportation times); and c) various types of batches are delivered. In the transportation problem, a quantity $d_{aq}$ of batch $q$, $q = 1, 2, \ldots, Q$, is to be delivered to each customer $a \in G = \{1, \ldots, g\}$ from a central depot $\{0\}$ using $k$ independent delivery vehicles, $v = 1, 2, \ldots, k$, with non-identical capacities $C_v$. Delivery is to be accomplished according to the given time intervals, $u_a \le s_{av} \le r_a$, where $s_{av}$ is the decision variable which denotes the time vehicle $v$ starts serving customer $a$ ($u_a$ and $r_a$ are the lowest and highest time boundaries, respectively). Denote by $t_{abv}$ the transit time (delivery time) of vehicle $v$ from customer $a$ to customer $b$, for $0 \le a, b \le g$. The time structure is assumed to be symmetric, i.e., $t_{ab} = t_{ba}$ and $t_{aa} = 0$. The solution to this problem consists of a partition of $G$ into $k$ routes $\{W_1, \ldots, W_k\}$, each satisfying

$$\sum_{b \in W_v} \sum_{q=1}^{Q} d_{bq} \le C_v, \qquad v = 1, 2, \ldots, k.$$
This problem is naturally associated with the complete undirected graph consisting of nodes $G \cup \{0\}$, edges $E$, and edge-traversal times $t_{abv}$, $\{a, b\} \in E$. In this graph, a solution is the union of $k$ cycles whose only intersection is the depot node. Each cycle corresponds to the route serviced by one of the $k$ vehicles. By associating a binary decision variable $x_{ev}$ with each edge in the graph, we obtain the following integer programming formulation. The objective is to serve all the given customer demands at minimum total time.

4.1 Intermediary Module

Every unit which is to be repaired and then transported is called a batch and denoted by $j$. Each batch consists of a number of identical repaired units (this number varies across different batches).
$n$ - number of batches, $j = 1, \ldots, n$.
$k$ - number of vehicles, $v, l = 1, \ldots, k$.
$TP_j$ - finish time of the repair operation for batch $j$ (this parameter is received as the output of the production module).
$ST_{vj}$ - starting time of the transportation operation for batch $j$ by vehicle $v$.
$tt_{vj}$ - total transportation time of batch $j$ from the repair plant (depot) to the corresponding customer when it is transported by vehicle $v$.
$C_v$ - available capacity of vehicle $v$.
$dd_j$ - due date for delivery of batch $j$ (this is the target instant of job $j$).
$ED_{vj}$ - early delivery time of batch $j$ when it is transported by vehicle $v$: $ED_{vj} = \max[0,\; dd_j - (tt_{vj} + ST_{vj})]$.
$LD_{vj}$ - late delivery time of batch $j$ when it is transported by vehicle $v$: $LD_{vj} = \max[0,\; (tt_{vj} + ST_{vj}) - dd_j]$.
$\gamma_j$ - delay time allowed to elapse between the moment when repair of batch $j$ is finished and the instant when its transportation to the customer starts.
$TD$ - total allowed delay time between all production and transportation operations.
$TQ_j$ - total quantity of batch $j$ which has to be delivered to the end customers.
$\alpha_{vj}$ - per-unit penalty cost for early delivery of batch $j$ when it is transported by vehicle $v$ (earlier than the due date).
$\beta_{vj}$ - per-unit penalty cost for late delivery of batch $j$ when it is transported by vehicle $v$ (later than the due date).
$\eta_j$ - per-unit cost for batch $j$ when its finish repair time is increased by one unit.
$\psi_j$, $\tau_j$ - minimum and maximum quantity of batch $j$, respectively, which can be delivered by external suppliers to the end customers.
Decision variables:
$Z_{vj}$ - the batch-vehicle allocation variable; $Z_{vj} = 1$ if batch $j$ is allocated to vehicle $v$, and 0 otherwise.
$U_j$ - the increment variable for the finish time of the repair operation for batch $j$ (this variable represents the number of time units by which we increase the finish time of the repair operation of batch $j$ for the purpose of reducing the waiting time between the moment when repair of this batch is finished and the instant when its transportation to the customer by the corresponding vehicle starts).
$Q_{vj}$ - the quantity of batch $j$ which will be delivered by vehicle $v$.
$\tilde{Q}_j$ - the quantity of batch $j$ which will be delivered by external suppliers.

Auxiliary values to be computed and used in the model:
$CED_{vj}$ - individual penalty cost for early delivery of batch $j$ when it is transported by vehicle $v$: $CED_{vj} = CED_{vj}(Z_{vj}) = Z_{vj}\, Q_{vj}\, ED_{vj}\, \alpha_{vj}$.
$CLD_{vj}$ - individual penalty cost for late delivery of batch $j$ when it is transported by vehicle $v$: $CLD_{vj} = CLD_{vj}(Z_{vj}) = Z_{vj}\, Q_{vj}\, LD_{vj}\, \beta_{vj}$.
Objective function: the objective function is the total penalty cost for early/late delivery plus the total penalty cost for increasing the finish repair times. The first term characterizes military effectiveness while the second term characterizes the time of response. This objective function has to be minimized:

$$\min \sum_{v=1}^{k} \sum_{j=1}^{n} \left[ CED_{vj}(Z_{vj}) + CLD_{vj}(Z_{vj}) \right] + \sum_{j=1}^{n} U_j \eta_j \;=\; \min \sum_{v=1}^{k} \sum_{j=1}^{n} Z_{vj}\, Q_{vj} \left( ED_{vj}\,\alpha_{vj} + LD_{vj}\,\beta_{vj} \right) + \sum_{j=1}^{n} U_j \eta_j .$$
Set of constraints:
1. To ensure that the capacity $C_v$ of vehicle $v$ is not exceeded.
2. To allow some quantity of batch $j$ to be delivered by external suppliers.
3. To ensure that the total quantity $TQ_j$ of batch $j$ will be delivered to the end customers.
4. The delay time between the moment when repair of batch $j$ is finished ($TP_j$) and the moment when its transportation to the corresponding customer by vehicle $v$ starts ($ST_{vj}$) is limited.
5. The total waiting time between all production and transportation operations is limited.
6. The vehicle allocation variables are binary.
7. The variables $U_j$ belong to the set $R$ of real numbers.
8. The variables $Q_{vj}$ and $\tilde{Q}_j$ are non-negative.
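To make the structure of this model concrete, the sketch below evaluates the objective function for a given batch-vehicle allocation. It is only an illustration of the formulas above, not the GAMS model used by the authors; the list-based data layout and the function names are assumptions introduced here.

def early_late(dd, tt, st):
    """ED and LD for one batch-vehicle pair, as defined above."""
    ed = max(0.0, dd - (tt + st))
    ld = max(0.0, (tt + st) - dd)
    return ed, ld

def im_objective(Z, Q, dd, tt, st, alpha, beta, U, eta):
    """Total early/late penalty plus penalty for increasing finish repair times.

    Z[v][j]  - 0/1 allocation of batch j to vehicle v
    Q[v][j]  - quantity of batch j carried by vehicle v
    dd[j], U[j], eta[j]     - due date, repair-finish increment and its per-unit cost
    tt[v][j], st[v][j]      - transportation time and transportation start time
    alpha[v][j], beta[v][j] - per-unit early/late penalty costs
    """
    total = 0.0
    for v in range(len(Z)):
        for j in range(len(Z[v])):
            ed, ld = early_late(dd[j], tt[v][j], st[v][j])
            total += Z[v][j] * Q[v][j] * (ed * alpha[v][j] + ld * beta[v][j])
    total += sum(U[j] * eta[j] for j in range(len(U)))
    return total

A solver (or the simulation-guided search described in the next section) would minimize this value subject to the eight constraints listed above.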
5 Algorithm Description In order to find the optimal solution of the integrated mathematical model we introduce an algorithm from the family of artificial-intelligence based heuristic search algorithms. The basic idea behind this algorithm is to provide a model with the human knowledge, heuristics and logical procedures which are needed to solve the given problem. Using such methods we try to arrive at workable and economical, even if non-optimal, solutions in a reasonable amount of time. There is evidence that these methods are not optimizers but rather satisficers, in that they do not necessarily search for the optimal solution to the problem, but rather for a sub-optimal solution that satisfies the majority of the problem constraints and conditions. Like many other AI-based search algorithms (see, e.g., Russell and Norvig (2003)), the suggested algorithm first searches for a schedule segment that appears most likely to lead towards the best schedule, keeping a sorted set of alternative segments. The algorithm is a combination of mixed integer programming techniques and different heuristics in which a candidate solution is iteratively improved with regard to a selected measure of quality. In our model we implement and combine, in both the production and the transportation models, different heuristic methods such as earliest due date (EDD), Coefficient Weighted Time Distance Heuristics (CWTDH), condition-action rules and mixed integer programming (MIP). The algorithm analyses the partial results obtained by the heuristic search and directs this search towards achieving a goal by using if-then rules (such as if_condition_then_action) and human interference. At the first level, the algorithm solves the corresponding scheduling problems separately for the production and transportation modules. For this aim, the corresponding heuristics are used and locally optimal solutions are obtained. Then, at the second level, the optimization problem appearing in the intermediary module is solved; this permits us to optimize the two global performance measures, rather than the local objective functions of the first level. The outputs of the scheduling problems obtained at the first level are used as the input for the second level. The optimal solution obtained at the second level is either accepted as a final solution for the two-level system under consideration or, if the obtained results are insufficient from the point of view of the time of response and military effectiveness, the problems of the first level are re-constructed, that is, their input data are changed and modified. These problems are iteratively re-solved with the modified data. The process is repeated until an acceptable solution of the integrated problem at the second level is obtained. We use the discrete-event simulation tool ARENA 11.0 to implement the AI-based heuristic search algorithm. Entities in the Arena model represent batches and vehicles in the production and transportation modules, respectively. Each production batch belongs to one of three possible types, which are defined randomly using a Uniform[1, 3] distribution; the batches are processed along their predetermined routes according to the defined due dates. Each production route consists of a number of production machines with normally distributed processing times.
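A compressed view of this two-level loop is sketched below. The four callables stand for the production heuristic (e.g., EDD), the transportation heuristic (e.g., CWTDH), the intermediary-module optimization, and the condition-action rules that modify the input data; they are placeholders introduced here for illustration, not routines from the paper or from ARENA.

def two_level_search(solve_pm, solve_tm, solve_im, adjust_inputs,
                     pm_data, tm_data, max_iters=25, acceptable=None):
    """Hedged sketch of the two-level AI-based heuristic search loop."""
    best_obj, best = float("inf"), None
    for _ in range(max_iters):
        pm_schedule = solve_pm(pm_data)                      # level 1: production module
        tm_schedule = solve_tm(tm_data, pm_schedule)         # level 1: transportation module
        obj, solution = solve_im(pm_schedule, tm_schedule)   # level 2: intermediary module
        if obj < best_obj:
            best_obj, best = obj, solution
        if acceptable is not None and obj <= acceptable:
            break                                            # acceptable integrated solution found
        pm_data, tm_data = adjust_inputs(pm_data, tm_data, solution)
    return best_obj, best

Here adjust_inputs plays the role of the if-then rules (and human interference) that re-construct the first-level problems between iterations.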
In order to find the optimal processing rule, different heuristics are checked: EDD, service in random order (SIRO), allowance-based priorities, slack-based priorities and ratio-based
priorities (see Baker, 1984). As a result, a new heuristic technique based on a combination of EDD and allowance-based priorities is created. After leaving the production module, the repaired batches are transferred to their specific customers (military bases) using a number of vehicles with different capacities. The transportation times between different military bases are given in hours and are normally distributed. In each iteration, the current capacity of each vehicle is defined randomly as Uniform[0.5, 1] times the initial capacity. In the transportation module the CWTDH algorithm suggested by Galić et al. (2006) is used, which at each iteration searches simultaneously for the customer with the earliest serving request and the minimum distance from the current vehicle position. In the present work this basic heuristic is extended: in addition to the existing parameters, such as the earliest serving request (time windows) and the distance, the number of required batches at each site at any given time is taken into account. In order to find the optimal combination of these parameters, each of them receives a weight, which is updated randomly in each iteration of the simulation according to the determined uniform distribution. The integration across the production and the transportation modules in the simulation model is performed by the intermediary module. This module is represented in the Arena simulation model by the following features: external suppliers (a number of additional vehicles of the third-party logistics company in the transportation module is defined); due dates (new due dates for the jobs in the process module are defined using the developed heuristic method); starting time of the transportation operation (a new starting delivery time for each vehicle is defined); and vehicle capacity (the capacity of each vehicle is defined). At the end of each iteration, the AI algorithm analyses the partial results obtained by the heuristic search in the production, transportation and intermediary modules in terms of the military effectiveness and the time of response performance measures, and directs this search towards achieving a goal by using if-then rules and human interference. For example, if the results of the current simulation happen to be worse than the previous ones, the AI algorithm changes, automatically or through human interference, the parameters of the heuristic methods used in the PM, TM and IM modules (or replaces the heuristic method itself); besides, the algorithm defines new priorities for the production batches and offers a new composition of the delivered batches in each vehicle. As a result of the simulation, the ranges of the variables are dramatically reduced, and many of them become fixed. Then many variables are removed and the problem size is decreased. In order to improve the solution that emerges, we use the results of the simulation model as the input data for the integer programming model, which is solved with a regular LP/MIP solver (GAMS). The flow chart below displays the detailed logical structure of the AI-based heuristic search algorithm in the Arena simulation environment. Each block represents a process or a sequence of heuristic search operations in the PM, TM and IM modules. The connector lines are used to join these modules together and specify the flow of entities, in our case the production batches and vehicles.
The integration across the PM and TM modules is done by the IM module. The output data, such as the total waiting time and the total number of jobs delivered on time, are recorded and analyzed by the AI block.
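As a rough sketch of the extended CWTDH selection rule described above, the next customer for a vehicle can be chosen by a weighted score that combines time-window urgency, distance from the current position, and the quantity of batches required at each site. The weights w_time, w_dist and w_qty, the field names and the exact form of the score are illustrative assumptions, not values from the paper.

def pick_next_customer(candidates, position, now, w_time=1.0, w_dist=1.0, w_qty=1.0):
    """Choose the customer minimizing a weighted time/distance/quantity score.

    candidates: list of dicts with keys 'location', 'earliest', 'required_qty'
    position:   current vehicle location as an (x, y) tuple
    now:        current simulation time
    """
    def score(c):
        dx = c["location"][0] - position[0]
        dy = c["location"][1] - position[1]
        dist = (dx * dx + dy * dy) ** 0.5
        urgency = max(0.0, c["earliest"] - now)   # how soon the time window opens
        return w_time * urgency + w_dist * dist - w_qty * c["required_qty"]
    return min(candidates, key=score)

In the simulation described above, the weights would be re-drawn from a uniform distribution at each iteration and the best-performing combination retained.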
Fig. 1. The flow chart of the algorithm (blocks of the Production Module, Transportation Module, Intermediary Module and the AI algorithm in the Arena simulation environment)
6 Computational Results In order to compare the modular and integrated approaches we experimented in two main directions: (1) the number of jobs in the simulation was varied from 25 to 110, and (2) the number of vehicles was varied from 2 to 8. The output results, namely the time of response and the military effectiveness, are obtained and analyzed. The simulation is set to run 25 instances, each representing a different combination of factors. At the beginning of each of these instances, each of the factors is included in the simulation model. Each of these 25 design points is replicated 15 times using a different stream of random numbers for each of the 15 replications, yielding a total of 375 independent observations.
Fig. 2. Comparison of the time of response and military effectiveness performance measures for the modular and integrated approaches (on average). Left panel: military effectiveness (number of jobs delivered on time versus the number of jobs, 25-110); right panel: time of response (total waiting time in thousands of hours versus the number of vehicles, 2-8).
The integrated approach steadily achieves a better time-of-response performance than the modular approach. As the number of vehicles increases, the total waiting time between production and transportation operations in the integrated approach goes down more quickly. At the same time, as the number of jobs increases, the integrated approach improves the system effectiveness: it delivers more jobs on time to the end customers than the modular one. These results are graphically displayed in Fig. 2.
7 Conclusion The added value of the proposed approach as compared with the existing ones is threefold:
1. The production and transportation demand-responsive modules are integrated into a uniform concept of demand-responsive scheduling of production and transportation within a military supply chain. Integration of these modules is achieved by introducing an additional intermediary module into the mathematical model and proposing a new two-level artificial-intelligence based heuristic search algorithm.
2. The demand-responsiveness and effectiveness of the military supply chains are formally described and analyzed with the help of two performance measures: the time of response and the military effectiveness. The better these performance measures are, the more demand-responsive and effective the whole military supply chain will be.
3. New mathematical models and solution methods based on mixed integer programming techniques in combination with heuristic algorithms are derived.
References 1. Baker, R.K.: Sequencing Rules and Due-Date Assignments in a Job Shop. Management Science 30(9), 1093 (1984) 2. Barahona, F., Chowdhary, P., Ettl, M., Huang, P., Kimbrel, T., Landanyi, L., Lee, Y.M., Schieber, B., Sourirajan, K., Sviridenko, M.I., Swirszcz, G.M.: Inventory Allocation and Transportation Scheduling for Logistics of Network-Centric Military Operations. IBM Journal of Research & Development 51(3/4), 391 (2007)
3. Beamon, B.M.: Measuring Supply Chain Performance. International Journal of Operations & Production Management 19(3), 275–292 (1999) 4. Bhagavatula, S.K., Chedjou, J.S., Anne, K.R., Kyamakya, K.: Evaluation of Nonlinear System Behaviors in Military Supply Chain. In: First International Workshop on Nonlinear Dynamics and Synchronization (INDS 2008), vol. 18, pp. 45–51 (2008) 5. Chen, P.: Integrating Production and Transportation Scheduling in Make-to-Order Environment. Ph.D. Thesis, University of Cornell (2000) 6. Chen, Z.-L., Vairaktarakis, G.L.: Integrated Scheduling of Production and Distribution Operations. Management Science 51(4), 614–628 (2005) 7. Dantzig, G.B., Ramser, R.H.: The Truck Dispatching Problem. Management Science 6, 80–91 (1959) 8. Galić, A., Carić, T., Fosin, J., Cavar, I., Gold, H.: Distributed Solving of the VRPTW with Coefficient Weighted Time Distance and Lambda Local Search Heuristics. In: Biljanović, P., Skala, K. (eds.) Proceedings of the 29th International Convention MIPRO, Rijeka, pp. 247–252 (2006) 9. Gibson, D.R.: Average Customer Wait Time: A Supply Chain Performance Indicator. Army Logistician 36(6), 30–32 (2004) 10. Johnson, D., Levite, A.E.: CWT and RWT Metrics Measure the Performance of the Army’s Logistics Chain for Spare Parts, RB - 3035 - A, RAND Report (2003) 11. Leiphart, K.L.: Creating a military supply chain management model. Army Logistician 33(4), 36 (2001) 12. Li, K., Sivakumar, A.I., Ganesan, V.K.: Analysis and Algorithms for Coordinated Scheduling of Parallel Machine Manufacturing and 3PL Transportation. International Journal Production Economics 115, 482–491 (2008) 13. McGee, J.B., Rossetti, M.D., Mason, S.J.: Quantifying the Effect of Transportation Practices in Military Supply Chains. Journal of Defense Modeling and Simulation 2, 87–100 (2005) 14. Pinedo, M.: Scheduling: Theory, Algorithms, and Systems, 2nd edn., p. 159. Prentice-Hall, Englewood Cliffs (2002) 15. Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach. Prentice Hall, Upper Saddle River (2003) 16. Sarmiento, A.M., Nagi, R.: A Review of Integrated Analysis of Production-Distribution Systems. IIE Transactions 31, 1061–1074 (1999) 17. Stecke, K.E., Zhao, X.: Production and Transportation Integration for a Make-to-Order Manufacturing Company With a Commit-to-Delivery Business Mode. Manufacturing & Service Operations Management 9(2), 206–224 (2007) 18. Xiuli, W., Cheng, T.C.E.: Production Scheduling With Supply and Delivery Considerations to Minimize the Makespan. European Journal of Operational Research 194, 743–752 (2008) 19. Yan, S., Lai, W., Chen, M.: Production Scheduling and Truck Dispatching of Ready Mixed Concrete. Transportation Research Part E 44, 164–179 (2006)
Turbo Codification Techniques for Error Control in a Communication Channel Pablo Manrique Ramírez, Rafael Antonio Márquez Ramírez, Oleksiy Pogrebnyak, and Luis Pastor Sánchez Fernandez Centro de Investigación en Computación del Instituto Politecnico Nacional, Av.Juan de Dios Batiz s/n, Colonia Nueva Industrial Vallejo, C.P. 07738, México D.F. {pmanriq,olek,lsanchez}@cic.ipn.mx,
[email protected]
Abstract. An implementation of the turbo coding technique for data error detection and correction in data transmission is presented. The turbo coding technique is known to be efficient in data transmission: it adds redundant parity information that provides a high error correction capacity, decreasing the number of erroneous bits at low signal-to-noise ratios as the number of iterations increases. The turbo encoder and turbo decoder were implemented in an FPGA development system. The design is oriented towards reaching a transmission speed close to the theoretical Shannon capacity of the communication channel with the minimum possible energy consumption, using only the FPGA resources without external memories. Keywords: Error correction coding, turbo coding, FPGA.
1 Introduction In coding and information theory, an error correction code (ECC) is a code in which each of the data signals obeys specific "construction" rules. Depending on this construction, errors in the received signal can be detected and corrected automatically. This property is commonly used in computer data storage, dynamic RAM and data transmission. Examples of ECC are algebraic (block) codes, the Hamming code, Reed-Solomon codes, Reed-Muller codes, the binary Golay code, convolutional codes, turbo codes, low density parity check codes (LDPCC), etc. The simplest ECCs can correct single-bit errors and detect double-bit errors; more complicated codes can detect or correct multiple-bit errors. The two main classes of ECC are block codes and convolutional codes. Prior to turbo codes, the best known schemes were serially concatenated codes based on an outer Reed-Solomon error correction code combined with an inner Viterbi-decoded short constraint length convolutional code, also known as RSV codes. In 1993 Claude Berrou, Alain Glavieux and Punya Thitimajshima from the Superior National School of Telecommunications of Bretagne, France, developed turbo codes, the most powerful ECC at the moment [8]. They are a class of high-performance convolutional codes for forward error correction whose performance in terms of binary error rate (BER) approaches the Shannon limit.
The turbo codification techniques are based on convolutional algorithms that strengthen the data to be sent by adding redundant information at the transmitter side, and on iterative probability estimation algorithms at the receiver side. Along with LDPC codes, they are the codes that come closest to the theoretical limit of the maximum rate of information transfer over a noisy channel (within 0.5 dB of the limit). Turbo codes make it possible to increase the data rate without the need to increase the transmission power, which is why they can also be used to reduce the amount of energy used to transmit at a given data rate. From an artificial intelligence viewpoint, turbo codes can be considered similar to iterative belief propagation in Bayesian networks. Turbo codes can be used in various applications where one seeks to obtain maximum information transfer over a bandwidth-limited channel in the presence of noise. Such applications include, for example, third-generation cellular telephony standards, satellite communications, and future standards for satellite and mobile television such as DVB-S (digital video broadcasting - satellite) and DVB-H (digital video broadcasting - handheld). Turbo codes now replace the Reed-Solomon codes used before in telecommunications. In some future NASA space reconnaissance missions, turbo codes will be used as a standard, replacing RSV concatenated codes. Currently, the refinement and implementation of turbo codes is an active area of research in many universities. This paper presents an attempt at an efficient FPGA implementation of a turbo coder and decoder using only the FPGA resources.
2 Turbo Coding Theory

Shannon's theorem for the capacity of a noisy communication channel is important in error correction encoder design. It describes the maximum attainable efficiency of an error correction scheme against the expected noise interference levels in terms of the channel capacity as [1], [7]:

$$C = \lim_{T \to \infty} \frac{\log N(T)}{T}, \qquad (1)$$

where $N(T)$ is the number of permitted signals of duration $T$. If the number of errors is smaller than or equal to the correctable maximum threshold of the code, all errors will be corrected. Thus, error correction codes require more signal elements than the minimum needed to transport the basic information [6], [7].

2.1 Encoder

The turbo encoder may be designed using a parallel concatenation of two recursive systematic convolutional (RSC) encoders, and the decoder uses decoding rules based on iterative estimation of probabilities using identical decoding blocks [8]. Fig. 1 shows an example of two RSC encoders that use a parallel concatenation scheme. Both elementary encoders (C1 and C2) use the same input signal dk but in a different order due to the presence of an interleaver. Thus the turbo encoder generates the original data Xk followed by the Y1k sequence and then by the Y2k sequence. In this way, redundant parity information is added to the transmitted data, making it more robust to the noise effects that will be added in the channel.
Fig. 1. Recursive systematic coding with parallel concatenation
Although in Fig. 1 the output sequence is shown bit by bit, commonly the block lengths are of hundreds or thousands of data bits. In this case the rate of the turbo encoder is 1/3, that is, one data bit produces a code word of 3 bits, although it is obviously possible to work with other rates.

2.2 Decoder

The original turbo decoder is an array of decoders in a serial concatenation scheme united by an interleaver [8].
Fig. 2. Principle of the decoder according to a serial concatenation scheme
In Fig. 2 the decoder DEC1 is associated with the sequence Y1 and generates a soft decision [1]. The decoding is made simultaneously over the full length of the code word sequence. Such decoding provides good results, but it is not always viable. The output sequence of DEC1 is interleaved before entering the second decoder DEC2, which is associated with the Y2 sequence. The logarithm of the likelihood ratio, $\Lambda_1(d_k)$, is associated with each bit decoded by DEC1 and is a relevant piece of information for DEC2:

$$\Lambda_1(d_k) = \log \frac{P(\text{observation} \mid d_k = 1)}{P(\text{observation} \mid d_k = 0)}, \qquad (2)$$

where $P(\text{observation} \mid d_k = d)$, $d = 0, 1$, is the a posteriori probability of the data bit $d_k$, and the observed data set is received from the transmission channel ($y_1^N$). Thus, the final decision on the decoded data is based on the sign of $\Lambda_1$. As the decoder is convolutional, it has a certain amount of memory; this is the reason why the decoder may be considered a "state machine". Because of this, at the time of deciding whether the data bit $d_k$ is one or zero, the decoder will be in one of those states, $S_k = s$. Considering this fact, the conditional probabilities of the possible values of $d_k$ can be defined because the data set ($y_1^N$) has already been received. In addition, the received data set can be separated into the observations before time $k$, the present observation at time $k$, and the future observations after time $k$ [1]:

$$y_1^N = \left\{ y_1^{k-1},\; y_k,\; y_{k+1}^N \right\}, \qquad (3)$$

so it is possible to substitute (3) into (2) and define

$$\Lambda(d_k) = \log \frac{P\!\left(y_1^N \mid d_k = 1\right) P\!\left(y_1^N\right)}{P\!\left(y_1^N \mid d_k = 0\right) P\!\left(y_1^N\right)} = \log \frac{\displaystyle\sum_{s'} P\!\left(y_1^N \mid s_{k-1} = s',\, d_k = 1\right) P\!\left(y_1^N\right)}{\displaystyle\sum_{s'} P\!\left(y_1^N \mid s_{k-1} = s',\, d_k = 0\right) P\!\left(y_1^N\right)}.$$

With a little manipulation, and using some of the concepts of the BAHL et al. algorithm [4], the LLR becomes

$$\Lambda(d_k) = \log \frac{\displaystyle\sum_{s',\, d_k = 1} P\!\left(y_1^{k-1} \mid s_{k-1} = s'\right) P\!\left(y_k^N \mid s_{k-1} = s'\right) P\!\left(y_k \mid s_{k-1} = s',\, d_k = d\right) \big/ P\!\left(y_1^N\right)}{\displaystyle\sum_{s',\, d_k = 0} P\!\left(y_1^{k-1} \mid s_{k-1} = s'\right) P\!\left(y_k^N \mid s_{k-1} = s'\right) P\!\left(y_k \mid s_{k-1} = s',\, d_k = d\right) \big/ P\!\left(y_1^N\right)}. \qquad (4)$$

Denoting

$$\alpha_k(s) = P\!\left(y_1^k \mid s_k = s\right), \quad \beta_{k-1}(s') = P\!\left(y_k^N \mid s_{k-1} = s'\right), \quad \gamma_k(s', s) = P\!\left(y_k \mid s_{k-1} = s',\, d_k = d\right),$$

one can finally formulate the LLR as

$$\Lambda(d_k) = \log \frac{\displaystyle\sum_{d_k = 1} \alpha_{k-1}(s')\, \beta_k(s)\, \gamma_k(s', s) \big/ P\!\left(y_1^N\right)}{\displaystyle\sum_{d_k = 0} \alpha_{k-1}(s')\, \beta_k(s)\, \gamma_k(s', s) \big/ P\!\left(y_1^N\right)}, \qquad (5)$$

where $\alpha_k(s)$ is the joint probability of state $s$ at time $k$ given the past observations (forward metric) [2], $\beta_{k-1}(s')$ is the conditional probability of the future observations given state $s'$ at time $k-1$ (backward metric) [2], and $\gamma_k(s', s)$ is the transition probability that state $s_{k-1}$ transits to $s_k$ at time $k$ when the input is $d_k$. After some simplification and manipulation, recursive versions of $\alpha_k(s)$ and $\beta_{k-1}(s')$ can be obtained as

$$\alpha_k(s) = \sum_{s'/s_k = s} P\!\left(y_1^{k-1} \mid s_{k-1} = s'\right) P\!\left(y_k \mid d_k = d,\, s_{k-1} = s'\right) = \sum_{s'/s_k = s} \alpha_{k-1}(s')\, \gamma_k(s', s),$$

$$\beta_{k-1}(s') = \sum_{s/s_{k-1} = s'} P\!\left(y_{k+1}^N \mid s_k = s\right) P\!\left(y_k \mid d_k = d,\, s_{k-1} = s'\right) = \sum_{s/s_{k-1} = s'} \beta_k(s)\, \gamma_k(s', s), \qquad (6)$$

with the initial conditions

$$\alpha_0(s) = \begin{cases} 1 & s = 1 \\ 0 & s \neq 1 \end{cases}, \qquad \beta_N(s) = \begin{cases} 1 & s = 1 \\ 0 & s \neq 1 \end{cases}.$$
Having the forward and backward probabilities in recursive form, it is possible to think of a recursive decoding scheme that can take two forms: with a feedback loop, or concatenated. Fig. 3 [8] shows the feedback decoder, where W2k represents the forward and backward probabilities that are now fed back to DEC1 as a third input parameter (zk).
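For readers who prefer code to recursions, the sketch below implements the metric computations of equations (5)-(6) in the probability domain for a generic trellis. It is only an illustration of the MAP forward/backward idea, not the authors' FPGA architecture; the dictionary representation of the trellis and transition metrics is an assumption introduced here.

import math

def map_llr(gamma, transitions, num_states, block_len):
    """Forward/backward (MAP) computation of the LLRs, following (5)-(6).

    gamma[k][(s_prev, s)]    - transition metric for step k = 0..block_len-1
    transitions[(s_prev, s)] - data bit d (0 or 1) associated with the branch s' -> s
    """
    # forward metrics alpha, started in the known initial state 0
    alpha = [[1.0 if s == 0 else 0.0 for s in range(num_states)]]
    for k in range(block_len):
        alpha.append([sum(alpha[k][sp] * gamma[k].get((sp, s), 0.0)
                          for sp in range(num_states)) for s in range(num_states)])
    # backward metrics beta, started from the final state 0
    beta = [[0.0] * num_states for _ in range(block_len + 1)]
    beta[block_len] = [1.0 if s == 0 else 0.0 for s in range(num_states)]
    for k in range(block_len - 1, -1, -1):
        beta[k] = [sum(beta[k + 1][s] * gamma[k].get((sp, s), 0.0)
                       for s in range(num_states)) for sp in range(num_states)]
    # LLR of each data bit from alpha(s') * gamma(s', s) * beta(s)
    llrs = []
    for k in range(block_len):
        num = den = 1e-300  # tiny floor so the logarithm is always defined
        for (sp, s), d in transitions.items():
            p = alpha[k][sp] * gamma[k].get((sp, s), 0.0) * beta[k + 1][s]
            num, den = (num + p, den) if d == 1 else (num, den + p)
        llrs.append(math.log(num / den))
    return llrs

The sign of each returned LLR gives the hard decision on the corresponding data bit, as stated above for the sign of the likelihood ratio.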
Fig. 3. Feedback decoder assuming zero internal delay
Fig. 4. a) Decoder module at level p ; b) modular decoder corresponding to an iterative process of feedback decoding
Because the first decoder, DEC1, receives additional redundant information (zk), its performance can be improved significantly. The term turbo code arises from this iterative decoding scheme, recalling the turbo principle used in engines. Note that this additional (extrinsic [2], [8]) information comes from a previous iteration step. The decoder introduces a delay caused by DEC1 and DEC2. The interleaver and deinterleaver imply that the zk information must be used through an iterative process, as shown in Fig. 4 [8], where the global decoder circuit is composed of P serially concatenated identical elementary decoders. The p-th DEC decoder input is formed by the demodulator output sequence (y)p passed through a delay line and by the extrinsic information (z)p generated by the (p-1)-th DEC decoder.
3 Implementation and Results One of the implementation objectives was to minimize the energy consumption while at the same time obtaining a data transmission rate close to the theoretical rate for a communication channel given by the Shannon limit. The turbo coder and decoder were implemented in an FPGA, integrating all functional blocks in a single device, in contrast with commercial solutions that use additional hardware.
We initially tried to implement the turbo encoder and decoder in a Xilinx Spartan-3E FPGA development system, but due to resource limitations the final design was implemented on an Altera DE2 FPGA board, which has more capacity. The turbo encoder was built from the two RSC coders shown in Fig. 5, concatenated in parallel and separated by the interleaver, as shown in Fig. 1.
Fig. 5. RSC coder with generators of [1, 05/07]
The RSC coder in Fig. 5 was used as the coding component for the implementation. The interleaving process consists of generating a square matrix, ordering all bits by rows and reading them by columns in bit-reversed order. In general, any interleaver pattern could be used; different patterns provide different results with significant BER differences, which is why the design of the interleaver contributes significantly to the overall performance of a turbo code system.
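The following sketch shows, in software form, the kind of building blocks just described: an RSC encoder with generators [1, 5/7] (octal) and a square block interleaver written by rows and read by columns. It is offered only as an illustration of the structure; the exact bit-reversal pattern of the hardware interleaver is an assumption simplified away here, and this Python code is not the HDL used on the FPGA.

def rsc_parity(bits):
    """Parity stream of an RSC encoder with feedback 7 (1+D+D^2) and feedforward 5 (1+D^2)."""
    s1 = s2 = 0                     # two-register state (constraint length 3)
    parity = []
    for d in bits:
        a = d ^ s1 ^ s2             # recursive feedback, generator 7 (octal)
        parity.append(a ^ s2)       # feedforward output, generator 5 (octal)
        s1, s2 = a, s1
    return parity                   # the systematic output is the input sequence itself

def block_interleave(bits, rows, cols):
    """Write the block by rows, read it by columns (simplified block interleaver sketch)."""
    assert len(bits) == rows * cols
    matrix = [bits[r * cols:(r + 1) * cols] for r in range(rows)]
    return [matrix[r][c] for c in range(cols) for r in range(rows)]

def turbo_encode(bits, rows, cols):
    """Rate-1/3 output: systematic X, parity Y1, parity Y2 of the interleaved data."""
    y1 = rsc_parity(bits)
    y2 = rsc_parity(block_interleave(bits, rows, cols))
    return bits, y1, y2

For example, turbo_encode([1, 0, 1, 1], 2, 2) returns the systematic bits together with the two parity sequences that correspond to Xk, Y1k and Y2k in the text.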
Fig. 6. Block diagram of designed turbo encoder-decoder system
At the output of the encoder, the original data Xk are followed by the first parity sequence Y1k and then by the second parity sequence Y2k. The original information and the redundant parity data are passed through the line encoder to the transmission channel and are transmitted in a manner more robust to the noise effects that will be added by the channel (see Fig. 6). At the channel output, the signal is mixed with additive channel noise. This signal passes through the line decoder to determine, in the normal way, whether a logical one or zero was received. Next, the bit is submitted to the statistical evaluator of the turbo decoders. Fig. 7 shows the block diagram of the implemented turbo decoder. A scheme that concatenates a series of maximum a posteriori probability (MAP) feedback decoders was used for decoding. Each MAP decoder generates partial estimates that represent a
priori information for conditional probability calculations in the next decoder; the feedback produces an iterative system that refines the estimates at each iteration. In the decoder, hard decisions are then taken. The process starts by acquiring the quantized signal. The quantization step size is used to generate the transition probabilities that permit calculating the conditional probabilities for all states and both trellises, which are generated by the convolutional coding. Once all transition and conditional probabilities are calculated, the partial estimates of the decoded signal are generated using the log-likelihood ratio criterion (5), in order to obtain extrinsic information that can be used as a priori information in the next step.
Fig. 7. Turbo decoder with iterative serial concatenation implemented in FPGA
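A highly simplified software view of this iterative structure is given below: two soft-input soft-output (SISO) decoders exchange extrinsic information through the interleaver for a fixed number of iterations. The function names and the way extrinsic information is extracted are assumptions for illustration only; they do not describe the FPGA data path of Fig. 7.

def turbo_decode(siso1, siso2, sys_llr, par1_llr, par2_llr, perm, iterations=6):
    """Iterative exchange of extrinsic information between two SISO decoders.

    siso1, siso2: callables (systematic LLRs, parity LLRs, a priori LLRs) -> a posteriori LLRs
    perm:         interleaver permutation (position i of the interleaved block holds bit perm[i])
    """
    n = len(sys_llr)
    inv = [0] * n
    for i, p in enumerate(perm):
        inv[p] = i                                    # de-interleaver indices
    apriori = [0.0] * n
    post2 = list(sys_llr)                             # fallback if iterations == 0
    for _ in range(iterations):
        post1 = siso1(sys_llr, par1_llr, apriori)
        ext1 = [post1[i] - sys_llr[i] - apriori[i] for i in range(n)]     # extrinsic of DEC1
        post2 = siso2([sys_llr[p] for p in perm], par2_llr,
                      [ext1[p] for p in perm])
        ext2 = [post2[i] - sys_llr[perm[i]] - ext1[perm[i]] for i in range(n)]
        apriori = [ext2[inv[i]] for i in range(n)]    # fed back, de-interleaved, to DEC1
    return [1 if post2[inv[i]] > 0 else 0 for i in range(n)]              # hard decisions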
The most complex components are the maximum a posteriori probability (MAP) decoders, where the most complex calculations are performed. Such calculations involve sums, multiplications, divisions, logarithms and exponentials. The logarithms and exponentials are calculated using a modified version of Mitchell's algorithm [10] with the Combet, Van Zonneveld and Verbeek adjustment [11], in less than 1% of the hardware resources. On the other hand, because of the nature of MAP processing, the forward, backward and transition metrics can be estimated in parallel, within the limitations of the hardware resources. Fortunately, current technologies permit the design of computationally effective algorithms that substitute the multiplications and divisions by sums and subtractions in the logarithmic domain, which results in a significant reduction of the calculation latency, the energy consumption and the use of hardware resources, by almost 45%. The turbo decoder evaluates the data subset together with the rest of the preceding and subsequent (soft) bits. Whether the bit from the line decoder is correct or erroneous, statistical metrics are produced and input to the following decoder with the same characteristics, which generates the same type of metrics. Thus, it is possible to decide how the iterative (turbo) process can be realized: 1) with a feedback loop of these statistics to the first decoder, with a maximum number of iterations or a difference threshold between these metrics as a stopping criterion; or 2) by implementing a fixed number of decoding stages that determines the number of iterations.
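One common way to carry out such log-domain processing, shown here only as a generic illustration (it is not taken from the authors' FPGA design), is the Jacobian logarithm, which turns the sum of two probabilities into a maximum plus a small correction term:

import math

def max_star(a, b):
    """log(exp(a) + exp(b)) computed as max(a, b) plus a correction term."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def max_star_approx(a, b):
    """Max-Log-MAP style approximation: drop the correction term entirely."""
    return max(a, b)

With metrics kept as logarithms, the products in the forward/backward recursions become additions and the sums become repeated max_star operations; dropping the correction term gives the cheaper but less exact Max-Log-MAP variant mentioned in the conclusions.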
The results of Berrou et al. [8] (see Fig. 8) show that for any signal-to-noise ratio greater than 0 dB, the BER decreases as a function of the number of iterations p. The coding gain is fairly high for the first values of p (p = 1, 2, 3). Thus for p = 18, for example, the BER is smaller than $10^{-5}$ at a signal-to-noise ratio Eb/N0 = 0.7 dB. Shannon's theorem establishes that, for binary modulation with a transmission rate of 0.5, Pe = 0 for Eb/N0 = 0 dB (several authors take Pe = $10^{-5}$ as a reference). With a parallel concatenation of RSC encoders and feedback decoding, the performance is within 0.7 dB of the Shannon limit. Modifying the amount of memory of the encoders and decoders, one can obtain some degradation and inefficiencies in the BER and in the correction capacity added by the encoders C1 and C2 [8]. In contrast, in the presented implementation of the turbo coding system, the reference is the number of erroneous bits (BER) as a function of the signal-to-noise ratio. Therefore, for both low and high signal-to-noise ratios the system has to maintain a stable number of erroneous bits. Logically, this has repercussions on the energy necessary for the transmission. The results obtained for the implemented turbo coding are better than the results of the concatenated or simple convolutional methods. Fig. 9 shows that when the amount of errors is greater than 50%, the percentage of corrected errors begins to destabilize the system.
Fig. 8. Binary error rate. Our results, obtained for a 64-bit block in 6 iterations, are shown as a bold segmented line.
Fig. 9. Percentage of corrected errors against percentage of erroneous data in the reception
4 Conclusions The development of turbo coding techniques is an intense activity in many academic research centers. Recently, many improvements, such as inventions, algorithms for encoders and decoders, energy savings, reduction of components, etc., have been presented in the literature. Turbo coding is implemented in many high-technology products that involve large amounts of data manipulation, not only in telecommunications. The presented turbo coding approaches the theoretical limit, although further efforts are needed to improve the energy consumption and the optimization of computing resources. The primary objective of the presented design was to improve data transmission over a transmission channel. Obviously, in this case there are a number of factors that degrade the signal quality, but with the considered coding system it is possible to reduce the energy consumed by the line coder and reduce the number of required components. The line coder transceiver was designed using low-cost discrete components: operational amplifiers, line transformers, capacitors, fast switching transistors, etc. Among three different versions of the MAP algorithm, the one with the best numerical performance was chosen, in spite of its greater demand for resources and time, because of its greater potential. Nevertheless, it was possible to implement this processing in a single FPGA, unlike many other implementations [12]. The MAX-LOG-MAP version is the best suited for implementation in hardware but is the least exact of all. For this reason this type of processing can be used only for scientific applications, unlike the MAP or LOG-MAP versions.
References 1. MacKay, D.J.C.: Information Theory, Inference and Learning Algorithms. Cambridge University Press, Cambridge (2003) 2. Sklar, B.: Digital Communications: Fundamentals and Applications, 2nd edn. PrenticeHall, Upper Saddle River (2001) 3. Nguyen, Q.: High speed turbo codes decoder for 3G using pipelined SISO“ Log-map decoders architecture” United States ICOMM Technologies, Inc. (Wilmington, DE), Patent 6813742 (2004) 4. McGraw-Hill Dictionary of Scientific and Technical Terms. Sci-Tech-Dictionary Copyright © 2003, 1994, 1989, 1984, 1978, 1976, 1974 by Mc-Graw Hill Companies Inc. 5. Graell i Amat, A., Brännström, F., Rasmussen, L.K.: Design of rate-compatible serially concatenated convolutional codes. In: Int. Symp. on Turbo Codes and Related Topics, Munich, Germany, paper 13 (April 2006) 6. Shannon, C.E.: A Mathematical Theory of Communication. Bell. System Technical Journal 27, 379–423 (1948) 7. Shannon, C.E.: Communication in the presence of noise. In: Proc. of the IRE, vol. 37, p. 1021 (January 1949) 8. Berrou, C., Glavieux, A., Thitimajshima, P.: Near Shannon Limit Error-Correcting Coding and Decoding: Turbo-Codes. In: Proc. 1993 International Conference on Communications (ICC 1993), Geneva, Switzerland (1993) 9. Bahl, L.R., Cocke, J., Jelinek, F., Racic, J.: Optimal decoding of linear codes for minimizing symbol error rate. IEEE Trans, Inform. Theory IT-20, 284–287 (1974) 10. Mitchell Jr., J.N.: Computer multiplication and division using binary logarithm. IRE Trans. Electron. Comput., 512–517 (August 1962) 11. Combet, M., Van Zonneveld, H., Verbeek, L.: Computation of the Base Two Logarithm of Binary Numbers. IEEE Trans. Electronic Computers 14(6), 863–867 (1965) 12. Xilinx, Inc. 802.16e CTC Encoder v2.1 Product Specification, LogiCORE (April 2, 2007)
A New Graphical Recursive Pruning Method for the Incremental Pruning Algorithm Mahdi Naser-Moghadasi Texas Tech University 302 Pine Street, Abilene, TX 79601, USA
Abstract. Decision making is one of the central problems in artificial intelligence and specifically in robotics. In most cases this problem comes with uncertainty both in the data received by the decision maker/agent and in the actions performed in the environment. One effective method to solve this problem is to model the environment and the agent as a Partially Observable Markov Decision Process (POMDP). A POMDP has a wide range of applications such as machine vision, marketing, network troubleshooting, medical diagnosis, etc. We consider a new technique, called Recursive Point Filter (RPF), based on the Incremental Pruning (IP) POMDP solver, to introduce an alternative to the Linear Programming (LP) filter. It identifies the vectors with maximum value in each witness region, known as dominant vectors; the dominant vectors at each of these points would then be part of the upper surface. RPF takes its origin from computer graphics. In this paper, we tested this new technique against the popular Incremental Pruning (IP) exact solution method in order to measure the relative speed and quality of our new method. We show that a high-quality POMDP policy can be found in less time in some cases. Furthermore, RPF has solutions for several POMDP problems that LP could not converge to in 24 hours.
1 Introduction
One of the most challenging tasks of an intelligent decision maker or agent is planning, or choosing how to act in its interactions with the environment. Such agent/environment interactions can often be effectively modelled as a Partially Observable Markov Decision Process (POMDP). Operations research [1,2] and stochastic control [3] are two domains where this model can be applied for balancing competing objectives, action costs, uncertainty of action effects, and observations that provide incomplete knowledge about the world. Planning, in the context of a POMDP, corresponds to finding an optimal policy for the agent to follow. The process of finding a policy is often referred to as solving the POMDP. In the general case, finding an exact solution for this type of problem is known to be computationally intractable [4], [5]. However, there have been some recent advances in both approximate and exact solution methods.
The value iteration algorithm for a POMDP was first introduced by [2]. The value function V for the belief-space MDP can be represented as a finite collection of |S|-dimensional vectors known as α-vectors. Thus, V is both piecewise-linear and convex [2]. Despite its initial success in solving hard POMDP problems, there are two distinct reasons for the limited scalability of POMDP value iteration algorithms. The more widely cited reason is dimensionality [6]: in a problem with n physical states, POMDP planners must reason about belief states in an (n-1)-dimensional continuous space. The other reason is that the number of distinct action-observation histories grows exponentially with the planning horizon. Pruning is one proposed solution [7] to whittle down the set of histories considered. In some cases, an agent does not need to know the exact solution of a POMDP problem to perform its tasks. Over the years, many techniques have been developed to compute approximate solutions to POMDP problems. The goal of finding approximate solutions is to find a solution quickly, under the condition that it does not deviate too far from the exact solution. Point-based algorithms [8],[9],[10] choose a subset B of the belief points that are reachable from the initial belief state through different methods and compute a value function only over the belief points in B. After the value function has converged, the belief-point set is expanded with the most distant immediate successors of the previous set. PBVI and Perseus use two opposing methods for gathering the belief point sets B. In larger or more complex domains, however, it is unlikely that a random walk would visit every location where a reward can be obtained. PBVI attempts to cover the reachable belief space with uniform density by always selecting immediate successors that are as far as possible from B. Perseus, on the other hand, simply explores the belief space by performing random trajectories. While the points gathered by PBVI generate a good set B, the time it takes to compute these points makes other algorithms more attractive. However, approximate methods have the drawback that we cannot precisely evaluate them without knowing the exact solutions of the problems that we are solving. Furthermore, there are crucial domains that need an exact solution for accurate control, for example when dealing with human lives or when controlling an expensive land rover. Our objective in this paper is to present an alternative solver to evaluate approximate solutions on POMDP problems with a small number of states. Among current methods for finding exact solutions, Incremental Pruning (IP) [7] is the most computationally efficient. As with most exact and many approximate methods, a set of linear action-value functions is stored as vectors representing the policy. In each iteration of the algorithm, the current policy is transformed into a new set of vectors, which are then filtered. The cycle is repeated for some fixed number of iterations, or until the value function converges to a stable set of vectors. The IP filter algorithm relies on solving multiple linear programs (LP) at each iteration.
2 Introduction to POMDP
A POMDP models the interaction between an agent and a stochastic, partially observable environment [4]. It consists of a septuple (S, A, R, P, γ, Z, O), where S represents the set of all possible states and A the set of all possible actions. R(s, a) is the reward an agent obtains after performing action a while in state s. P(s′ | s, a) is the probability of transitioning to state s′ immediately after taking action a in state s. γ is the discount factor that controls the importance of future rewards relative to the current time. Z is the set of all possible observations and O(z, s) is the probability of observing z given that the current state is s. In addition, we define an ordered set of time steps T = (t0, t1, t2, …), or decision epochs. At each epoch t ∈ T, the environment is in some state st ∈ S and the agent must choose an action at ∈ A which maximizes its long-term reward. However, it is assumed that the agent cannot determine precisely what the current state st is. Instead, the agent must maintain a probability distribution bt over S, the set of all possible states. The probability distribution bt is commonly referred to as the belief state, or simply the belief, at time t. The agent's goal is to choose actions that will maximize its long-term reward. It does this by following a policy π : b → a, which maps belief states to the appropriate action to take. Solving a POMDP means finding π such that the expected sum of discounted rewards is maximized. At each epoch t, after an action at has been taken and an observation zt has been made, the agent calculates a new belief state bt+1 using an application of Bayes' rule. The process for maintaining the belief state is equivalent to a Bayes filter. The well-known Kalman filter is a Bayes filter where the variables are restricted to be linear and normally distributed. The initial belief is set to some arbitrary b0. A policy can be characterized by a value function. A value function at a belief bt is the expected future discounted reward the agent can gather by following the policy starting from the current belief bt [4]. The value function for a POMDP is represented by a set V of N-dimensional vectors v ∈ V, where N = |S| is the number of possible states. These vectors represent linear action-value functions, i.e., each vector has an associated action and represents the total reward obtainable for following a specific sequence of actions from each possible initial belief state. For a complete introduction to the POMDP problem, see [11]. The set of action-value functions, when plotted, forms a piecewise linear and convex (PLC) surface over the belief space. Finding the solution to a POMDP problem involves finding the vectors that form the PLC upper surface at every epoch. That process is commonly referred to as filtering the value function or policy.
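As an illustration of the belief update just described, the following is a minimal sketch (not the authors' implementation; the array layout of the transition and observation models is an assumption made here):

```python
import numpy as np

def belief_update(b, a, z, T, O):
    """One Bayes-filter step for a discrete POMDP.

    b : current belief over states, shape (|S|,)
    a : action index
    z : observation index
    T : transition model, T[a, s, s2] = P(s2 | s, a)   (hypothetical layout)
    O : observation model, O[s2, z] = O(z, s2)         (hypothetical layout)
    """
    predicted = b @ T[a]                 # predict: b'(s2) = sum_s b(s) P(s2 | s, a)
    unnormalized = predicted * O[:, z]   # correct: weight by the observation likelihood
    norm = unnormalized.sum()
    if norm == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return unnormalized / norm           # normalized belief b_{t+1}
```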
3 POMDP Algorithm
Algorithm 1 and Algorithm 2 are the general algorithms used to find POMDP policies [11]. Algorithm 1 includes an optional pruning step on line 21, which can be used to filter the value functions in Υ to speed up calculation. This filter operation is performed by eliminating vectors that do not form the upper surface, and therefore do not take part in the solution at epoch t or any future epoch.
Algorithm 1. POMDP Solver
1:  Υ = (0; 0, …, 0)
2:  for τ = 1 to T do
3:    Υ′ = ∅
4:    for all (a′; v_1^k, …, v_N^k) ∈ Υ do
5:      for all control actions a do
6:        for all measurements z do
7:          for j = 1 to N do
8:            v_{a,z,j}^k = Σ_{i=1}^{N} v_i^k p(z | s_i) p(s_i | a, s_j)
9:          end for
10:       end for
11:     end for
12:   end for
13:   for all control actions a do
14:     for all k(1), …, k(M) = (1, …, 1) to (|Υ|, …, |Υ|) do
15:       for i = 1 to N do
16:         v_i′ = γ [ r(s_i, a) + Σ_z v_{a,z,i}^{k(z)} ]
17:       end for
18:       add (a; v_1′, …, v_N′) to Υ′
19:     end for
20:   end for
21:   optional: filtering method to prune Υ′
22:   Υ = Υ′
23: end for
24: return Υ

Algorithm 2. policyPOMDP(Υ, b = (p_1, …, p_N)):
1: û = argmax_{(a; v_1^k, …, v_N^k) ∈ Υ} Σ_{i=1}^{N} v_i^k p_i
2: return û
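Algorithm 2 amounts to a single argmax of dot products between the belief and the stored α-vectors. A minimal sketch of that lookup follows (storing the vectors and their actions as parallel arrays is an assumption made for the illustration):

```python
import numpy as np

def policy_pomdp(vectors, actions, b):
    """Return the action attached to the alpha-vector maximizing v . b.

    vectors : array of shape (K, |S|), one alpha-vector per row
    actions : length-K sequence, actions[k] is the action of vectors[k]
    b       : belief state, shape (|S|,)
    """
    values = np.asarray(vectors) @ np.asarray(b)   # expected value of each vector at b
    return actions[int(np.argmax(values))]
```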
The most used filtering technique for finding an exact solution uses linear programming. The disadvantages of linear programming are that it is slow, computationally expensive, and subject to numerical instability. This is why we introduce this new exact filtering technique.
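For comparison, LP-based filters typically decide whether a vector can be pruned with a small linear program of the following form. This is a generic sketch of that standard dominance test (using scipy), not the specific formulation used in the implementation evaluated here:

```python
import numpy as np
from scipy.optimize import linprog

def is_dominated(w, U, eps=1e-9):
    """Standard LP dominance test: can alpha-vector w be pruned given the set U?

    Solves  max d  s.t.  b . (w - u) >= d for all u in U, b in the belief simplex.
    If the optimal margin d is <= eps, w never rises above U and can be discarded.
    """
    if len(U) == 0:
        return False
    n = len(w)
    c = np.zeros(n + 1)
    c[-1] = -1.0                                   # linprog minimizes, so minimize -d
    A_ub = [np.append(np.asarray(u) - np.asarray(w), 1.0) for u in U]  # (u - w).b + d <= 0
    b_ub = [0.0] * len(U)
    A_eq = [np.append(np.ones(n), 0.0)]            # belief entries sum to one
    b_eq = [1.0]
    bounds = [(0.0, 1.0)] * n + [(None, None)]     # b >= 0, d free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return bool(res.success and -res.fun <= eps)   # -res.fun is the optimal margin d
```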
4 RPF

4.1 RPF in a 2-States Belief Space
Since filtering eliminates the vectors that are not part of the final solution, it is an important step in making most POMDP solvers faster.
Algorithm 3. RPF(InterStart, InterEnd):
1:  if (InterEnd − InterStart) < δ then
2:    return
3:  end if
4:  Vsrt ← Dominant vector at InterStart
5:  Vend ← Dominant vector at InterEnd
6:  Vmid ← Dominant vector at (InterStart + InterEnd)/2
7:  if Vsrt = Vend = Vmid then
8:    PrunedList.Append(Vsrt)
9:    return
10: end if
11: if Vsrt ≠ Vmid then
12:   PrunedList.Append(RPF(InterStart, (InterStart + InterEnd)/2))
13: else
14:   PrunedList.Append(Vsrt)
15: end if
16: if Vend ≠ Vmid then
17:   PrunedList.Append(RPF((InterStart + InterEnd)/2, InterEnd))
18: else
19:   PrunedList.Append(Vend)
20: end if
21: return PrunedList
In each recursion, RPF receives two real numbers {InterStart, InterEnd} as inputs and returns the list of dominated vectors {PrunedList}. InterStart and InterEnd are the starting and ending points of an interval of the one-dimensional belief space. In the next step, it calculates MidPnt as the middle point between InterStart and InterEnd. Initially, InterStart and InterEnd are set to 0 and 1, respectively. If |InterStart − InterEnd| < δ, where δ is a positive parameter set before the recursion begins, it exits before entering the main body of the algorithm, as shown in Algorithm 3. Otherwise, it identifies the dominated vectors within the given belief intervals {InterStart to MidPnt} and {MidPnt to InterEnd}. We call the dominated vector at belief value MidPnt Vmid, at InterStart Vsrt, and at InterEnd Vend, respectively (lines 4-6). In the next part of the algorithm, Vmid is compared with Vsrt; if they are the same vector, Vsrt is added to PrunedList (line 14) and the witness region is extended by adding the boundary of the dominated vectors. Otherwise, the next recursion receives the new interval [InterStart, MidPnt] as arguments for the RPF algorithm (line 12); in this way, it recognizes the upper-surface vectors from InterStart to MidPnt. The same approach is applied
recursively to [MidPnt, InterEnd] (lines 16-20) to cover the remaining part of the belief-space interval.
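A compact Python sketch of this two-state recursion follows. It is only an illustration of the control flow of Algorithm 3 under stated assumptions (vectors are stored as (v(s0), v(s1)) pairs, dominance is resolved by a direct argmax, and the witness-region bookkeeping is omitted):

```python
def dominant(vectors, p):
    """Index of the vector maximizing v0*p + v1*(1-p) at belief p = P(s0)."""
    return max(range(len(vectors)),
               key=lambda i: vectors[i][0] * p + vectors[i][1] * (1.0 - p))

def rpf(vectors, start, end, delta=0.02, pruned=None):
    """Recursive Point Filter over the one-dimensional belief interval [start, end]."""
    if pruned is None:
        pruned = set()
    if end - start < delta:
        return pruned
    mid = (start + end) / 2.0
    v_srt, v_mid, v_end = (dominant(vectors, start),
                           dominant(vectors, mid),
                           dominant(vectors, end))
    if v_srt == v_mid == v_end:
        pruned.add(v_srt)                        # one vector covers the whole interval
        return pruned
    if v_srt != v_mid:
        rpf(vectors, start, mid, delta, pruned)  # refine the left half
    else:
        pruned.add(v_srt)
    if v_end != v_mid:
        rpf(vectors, mid, end, delta, pruned)    # refine the right half
    else:
        pruned.add(v_end)
    return pruned

# Example: indices of the vectors on the upper surface over [0, 1].
# rpf([(1.0, 0.0), (0.2, 0.8), (0.6, 0.6)], 0.0, 1.0)
```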
4.2 RPF in Higher Dimensions
A POMDP problem with |S| states is represented by (|S| − 1)-dimensional hyperplanes over the belief space. A POMDP policy is then a set of labelled vectors that are the coefficients of the linear segments that make up the value function. The dominated vector at each corner of the belief space is obviously dominant over some region [12]. As mentioned, since RPF receives 2D vectors as its input arguments, the filter algorithm projects every hyperplane onto 2D planes and then passes the resulting set of 2D vectors to the RPF algorithm. It sets to zero every component of the hyperplane equations except the ones that appear in the 2D plane equations. Hence, the projections of the hyperplanes are 2D vectors, and the set of vectors in each plane can then be passed to RPF for filtering. There are (|S| choose 2) possible 2D planes, where |S| is the number of states. As shown in Figure 1, each 3D vector represents the value function of one action. As indicated, after each projection, RPF receives the set of 2D vector equations and starts filtering. In the filtering process, in each plane, if any of the 2D vectors is part of a 2D upper surface, its corresponding hyperplane is labelled as a dominated vector and its index is added to the final pruned-vector list.
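One plausible reading of this projection step, expressed as a sketch that reuses the two-state rpf function from the previous listing (the pairwise projection and the way surviving indices are merged are assumptions, not the authors' code):

```python
from itertools import combinations

def rpf_higher_dim(vectors, delta=0.02):
    """Project |S|-dimensional alpha-vectors onto every 2-D state plane and keep
    any vector that appears on some 2-D upper surface."""
    n_states = len(vectors[0])
    kept = set()
    # One plane per unordered pair of states: C(|S|, 2) planes in total.
    for i, j in combinations(range(n_states), 2):
        plane = [(v[i], v[j]) for v in vectors]   # all other components set to zero
        kept |= rpf(plane, 0.0, 1.0, delta)       # 2-state RPF on the projected vectors
    return [vectors[k] for k in sorted(kept)]
```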
Fig. 1. The value function is shown as a plane in a 3-state POMDP problem
5 Experimental Results

5.1 Empirical Results
Asymptotic analysis provides useful information about the complexity of an algorithm, but we also need to provide an intuition about how well an algorithm works in practice on a set of problems. Another disadvantage of asymptotic analysis is that it does not consider constant factors and operations required outside the programs. To address these issues, we ran LP and RPF on the same machine, which had a 1.6 GHz AMD Athlon processor with 2 GB RAM, on a set of benchmark problems from the POMDP literature. These problems were obtained from Cassandra's online repository [13]. For each problem and method we report the following: 1. the size of the final value function (|V|); 2. the CPU time until convergence; 3. the resulting ADR. The numbers of states, actions, and observations are represented by S, A, and Z. As with most exact methods, a set of linear action-value functions is stored as vectors representing the policy. In each iteration of the algorithm, the current policy is transformed into a new set of vectors, which are then filtered in three stages. Each stage produces a unique minimum-size set of vectors. This cycle is repeated until the value function converges to a stable set of vectors. The difference between two successive vector sets gives an error bound. The algorithms are considered to have converged when the error bound is less than a threshold of convergence (ε). We set the ε and δ values to 0.02. A final set of vectors after convergence does not guarantee an optimal solution until the performance of the policy is considered under a simulator of the POMDP model. Changing ε to a higher value would lead to a non-optimal solution; on the other hand, if it is set to a lower value, the solver may loop between two sets of vectors because of numerical instability. Although 0.02 may not be the absolute minimum value for ε, we believe that it is small enough to provide the precision needed for evaluating policies in the simulator. We evaluated the solvers based on their average CPU time spent to solve each problem. All POMDP solvers were allowed to run until they converged to a solution or exceeded a 24-hour maximum running time. Our hypothesis is that if the solvers do not find a solution within 24 hours, then because of numerical instability they may oscillate between two successive iterations and may not converge. Table 1 summarizes the experiments for the two POMDP solvers; in this table, RPF is compared to linear programming filtering (LP). An x in the table means that the problem did not converge within the maximum time limit set for the experiment; therefore we are unable to indicate how many vectors (i.e., value functions) form the final solution. The Vector columns indicate how many vectors form the final solution and the Time columns show the average CPU time in seconds over 32 executions. Having a higher number
of vectors in the final solution and a shorter convergence time are the two major positive factors considered in our evaluation. We define the term better in our evaluation as a POMDP solver finding the solution of a problem in less time and with more final value functions (|V|) than the others. From Table 1 we can see that RPF found solutions for the POMDP problems Network, Saci, and 4x3, while LP was not able to converge to a solution within the 24-hour limit. In terms of the size of the final value function (|V|), RPF had more vectors than LP on the Shuttle, Hanks, and Tiger problems. RPF is better than LP on the POMDP problems Network, Saci, and 4x3.
Table 1. Experiment I: Descriptions and results presented as the arithmetic mean of 32 run-times

Problem   |S| |A| |Z|  Vector (LP)  Vector (RPF)  Time/s (LP)  Time/s (RPF)
Tiger       2   3   2       7            9            4.68         8.625
Network     7   4   2       x           83            x          203.031
Hanks       4   4   2       5            9            1.843         3.875
Shuttle     8   3   5      22           35           44.68        101.031
Saci       12   6   5       x           43            x          600.938
4x3        16   4   2       x          436            x        72006.625
Example     2   2   4       3            x            x            1.9062
5.2 Simulation
One way to evaluate the quality of a policy is to run it in a simulator and observe the accumulated average of the discounted reward that the agent receives over several trials. A POMDP policy is evaluated by its expected discounted reward over all possible policy rollouts. Since the exact computation of this expectation is typically intractable, we take a sampling approach, where we simulate interactions of an agent following the policy with the environment. A sequence of interactions, starting from b0, is called a trial. To calculate the ADR, successive policies are tested and rewards are discounted, added, and averaged accordingly. Each test starts from an arbitrary belief state with a given policy; the discounted reward is added for each step until the maximum step limit is reached. The test is repeated for the given number of trials, and steps are accumulated over all the trials. The ADR is represented
in the form of (mean ± confidence interval) over all tested policies. In our implementation of ADR, the confidence interval is 95%.

ADR = ( Σ_{i=0}^{#trials} Σ_{j=0}^{#steps} γ^j r_j ) / #trials        (1)
We computed the expected reward of each such trial and averaged it over a set of trials to obtain an estimate of the expected discounted reward. In this experiment, we computed the ADR for the subset of POMDP problems for which both the RPF and LP techniques have solutions. Since ADR values are noisy for small numbers of trials, we tested different numbers of trials starting with 500. After several tries, we saw that the difference between the ADR means with 2000 and 2500 trials was small enough to choose 2000 as the final number of trials in our experiment. We tested the policies in the simulator with 500 steps for each POMDP problem over 2000 trials, as shown in Table 2. In general, RPF has ADR values close to those of the other approach, which implies that the RPF policy has a performance similar to LP for the set of problems we chose. Although LP is the winner in terms of ADR for Tiger, it has a smaller ADR mean than RPF for the rest of the problems. One hypothesis is that the size of |V| for the LP policy in Table 1 for these problems is smaller than for RPF, and that the ADR mean computed under a policy is proportional to the size of the final value function (|V|). However, the LP policy on the Tiger problem is an exception: with a smaller |V| we observed nearly the same or better ADR mean values than with a higher |V|. We believe that this may come from characteristics of the POMDP model; since these values are close to each other, further experiments are needed to confirm our guesses.
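A minimal Monte-Carlo sketch of the ADR estimate of Eq. (1) is shown below; the simulator interface (reset/step) and the normal-approximation confidence interval are assumptions made for the illustration:

```python
import numpy as np

def average_discounted_reward(policy, simulator, gamma, n_trials=2000, n_steps=500):
    """Estimate ADR and a 95% confidence half-width from policy rollouts."""
    returns = []
    for _ in range(n_trials):
        b = simulator.reset()                 # initial belief of the trial
        total, discount = 0.0, 1.0
        for _ in range(n_steps):
            a = policy(b)                     # e.g. the argmax lookup of Algorithm 2
            b, reward = simulator.step(a)     # belief update plus sampled reward
            total += discount * reward
            discount *= gamma
        returns.append(total)
    returns = np.asarray(returns)
    half_width = 1.96 * returns.std(ddof=1) / np.sqrt(n_trials)
    return returns.mean(), half_width         # report as mean +/- half_width
```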
Table 2. Experiment II: Average Discounted Reward

Problem   |S| |A| |Z|  ADR (LP)            ADR (RPF)
Tiger       2   3   2  20.74 ± 0.65        18.12 ± 0.696
Hanks       4   4   2   3.147 ± 0.039       3.178 ± 0.039
Shuttle     8   3   5  32.7116 ± 0.1064    32.74 ± 0.103
6 Conclusion
We have considered a new filtering technique, called Recursive Point Filter (RPF), for the Incremental Pruning (IP) POMDP solver, to introduce an alternative to the Linear Programming (LP) filter. As suggested in the original work on the Incremental Pruning technique, filtering takes place in three stages of an updating
process; we have followed the same structure in our implementation to allow a fair comparison with previous approaches. RPF identifies the vectors with maximum values in each witness region, known as dominated vectors. The dominating vectors at each of these points then become part of the upper surface. We have shown that a high-quality POMDP policy can be found in less time in some cases. Furthermore, RPF had solutions for several POMDP problems on which LP was not able to converge within 24 hours. As mentioned in the paper, the quality of the POMDP solutions of the LP approach depends on the numerical stability of the LP solver. Also, an LP-based filter requires LP libraries, which can be expensive, especially the powerful ones. For these reasons, we proposed the idea of filtering vectors as a graphical operator in the POMDP solver. In each iteration of the algorithm, vectors that are not part of the upper surface are eliminated. We also included the Average Discounted Reward in our evaluation for the subset of POMDP problems for which the RPF and LP techniques have solutions. We tested these policies in the simulator with 500 steps for each POMDP problem over 2000 trials. The promising result is that RPF has ADR mean values close to the other approach, which implies that the RPF policy has a performance similar to LP for the set of problems we chose. Our initial objective in this research was to present an alternative solver to evaluate approximate solutions on POMDP problems with a small number of states. We are going to extend our implementation to use parallel processing over CPU nodes to test RPF on larger POMDP problems, such as Hallway, in order to evaluate the solutions of approximate techniques. We also intend to log the number of pruned vectors in each iteration of the algorithms to better assess how well each algorithm prunes on average after a large number of iterations and when the convergence threshold changes.
References

1. Monahan, G.E.: A survey of partially observable Markov decision processes. Management Science 28(1), 1–16 (1982)
2. Smallwood, R.D., Sondik, E.J.: The optimal control of partially observable Markov processes over a finite horizon. Operations Research 21, 1071–1088 (1973)
3. Caines, P.E.: Linear Stochastic Systems. John Wiley, New York (April 1988)
4. Spaan, M.T.J.: Cooperative active perception using POMDPs. In: AAAI 2008 Workshop on Advancements in POMDP Solvers (July 2008)
5. Goldsmith, J., Mundhenk, M.: Complexity issues in Markov decision processes. In: Proceedings of the IEEE Conference on Computational Complexity. IEEE, Los Alamitos (1998)
6. Littman, M.L., Cassandra, A.R., Kaelbling, L.P.: Efficient dynamic-programming updates in partially observable Markov decision processes. Technical report, Brown University, Providence, RI (1996)
7. Cassandra, A., Littman, M.L., Zhang, N.L.: Incremental pruning: A simple, fast, exact algorithm for partially observable Markov decision processes. In: Proceedings of the Thirteenth Annual Conference on Uncertainty in Artificial Intelligence (1997)
8. Pineau, J., Gordon, G., Thrun, S.: Point-based value iteration: An anytime algorithm for POMDPs. In: Proceedings of the International Joint Conference on Artificial Intelligence, Acapulco, Mexico (2003)
9. Smith, T., Simmons, R.G.: Heuristic search value iteration for POMDPs. In: Proc. Int. Conf. on Uncertainty in Artificial Intelligence (UAI) (2004)
10. Spaan, M.T.J., Vlassis, N.: Randomized point-based value iteration for POMDPs. Journal of Artificial Intelligence Research 24, 195–220 (2005)
11. Thrun, S., Burgard, W., Fox, D.: Probabilistic Robotics. The MIT Press, Cambridge (June 2006)
12. Pyeatt, L.D., Howe, A.E.: A parallel algorithm for POMDP solution. In: Proceedings of the Fifth European Conference on Planning, Durham, UK, pp. 73–83 (September 1999)
13. Cassandra, A.R.: Exact and Approximate Algorithms for Partially Observable Markov Decision Processes. PhD thesis, Brown University, Department of Computer Science (1998)
A New Pruning Method for Incremental Pruning Algorithm Using a Sweeping Scan-Line through the Belief Space

Mahdi Naser-Moghadasi

Texas Tech University, 302 Pine Street, Abilene, TX 79601, USA
Abstract. This paper introduces a new filtering technique to speed up the computation of exact policies for Partially Observable Markov Decision Problems (POMDPs). We consider a new technique, called Scan Line Filter (SCF), for the Incremental Pruning (IP) exact POMDP solver, to introduce an alternative to the Linear Programming (LP) filter. This technique takes its origin from the scan line method in computer graphics. By using a vertical scan line or plane, we show that a high-quality exact POMDP policy can be found easily and quickly. In this paper, we tested this new technique against the popular Incremental Pruning (IP) exact solution method in order to measure the relative speed and quality of our new method. We show that a high-quality POMDP policy can be found in less time in some cases. Furthermore, SCF has solutions for several POMDP problems on which LP could not converge within 12 hours.
1 Introduction

One of the most challenging tasks of an intelligent decision maker, or agent, is planning, or choosing how to act in such a way that maximizes total expected benefit over a series of interactions with the environment. Such agent/environment interactions can often be effectively modelled as a Partially Observable Markov Decision Problem (POMDP). Operations research [1,2] and stochastic control [3] are two domains where this model can be applied for balancing between competing objectives, action costs, uncertainty of action effects, and observations that provide incomplete knowledge about the world. Planning, in the context of a POMDP, corresponds to finding an optimal policy for the agent to follow. The process of finding a policy is often referred to as solving the POMDP. In the general case, finding exact solutions for this type of problem is known to be computationally intractable [4,5]. However, there have been some recent advances in both approximate and exact solution methods. One successful approach for finding solutions to these problems relies on approximate techniques. In [6], a finite set of points from the belief space is selected in advance and then heuristically updated. However, approximate methods have the drawback that we cannot precisely evaluate them without knowing the exact solutions of the problems that we are solving. Furthermore, there are crucial domains that need an exact solution to control an artifact accurately, for example when dealing with human life or when controlling an expensive land rover. Our objective in this paper is to present an alternative
solver to evaluate approximate solutions on POMDP problems with small numbers of states. Among current methods for finding exact solutions, Incremental Pruning (IP) [7] is the most computationally efficient. As with most exact and many approximate methods, a set of linear action-value functions is stored as vectors representing the policy. In each iteration of the algorithm, the current policy is transformed into a new set of vectors and then they are filtered. The cycle is repeated for some fixed number of iterations, or until the value function converges to a stable set of vectors. The IP filter algorithm relies on solving multiple Linear Programs (LPs) at each iteration. Based on experience, it is known that the quality and speed of a POMDP solver depend on the speed and numerical stability of the LP solver that it uses. In some cases, an agent does not need to know the exact solution to a POMDP problem in order to perform its tasks. Often, an approximate solution is adequate. Over the years, many techniques have been developed to compute approximate solutions to POMDP problems. The goal of an approximate method is to find a solution very quickly, with the condition that the solution is not "too far" from the exact solution, where the definition of what is too far is problem dependent. It is for this purpose that we introduce this new technique based on the scan line method from computer graphics. Our approach does not directly imitate the scan line algorithm, which performs a vertical sweep from left to right or right to left. Instead, we start from a uniform probability distribution and move on in a predefined direction. More details about this method are explained later.
2 Introduction to POMDP

A POMDP models the interaction between an agent and a stochastic, partially observable environment [4]. It consists of a septuple (S, A, R, P, γ, Z, O), where S represents the set of all possible states and A the set of all possible actions. R(s, a) is the reward an agent obtains after performing action a while in state s. P(s′ | s, a) is the probability of transitioning to state s′ immediately after taking action a in state s. γ is the discount factor that controls the importance of future rewards relative to the current time. Z is the set of all possible observations and O(z, s) is the probability of observing z given that the current state is s. In addition, we define an ordered set of time steps T = (t0, t1, t2, …), or decision epochs. At each epoch t ∈ T, the environment is in some state st ∈ S and the agent must choose an action at ∈ A which maximizes its long-term reward. However, it is assumed that the agent cannot determine precisely what the current state st is. Instead, the agent must maintain a probability distribution bt over S, the set of all possible states. The probability distribution bt is commonly referred to as the belief state, or simply the belief, at time t. The agent's goal is to choose actions that will maximize its long-term reward. It does this by following a policy π : b → a, which maps belief states to the appropriate action to take. Solving a POMDP means finding π such that the expected sum of discounted rewards is maximized. At each epoch t, after an action at has been taken and an observation zt has been made, the agent calculates a new belief state bt+1 using an application of Bayes' rule. The process for maintaining the belief state is equivalent to a Bayes filter.
The well-known Kalman filter is a Bayes filter where the variables are restricted to be linear and normally distributed. The initial belief is set to some arbitrary b0. A policy can be characterized by a value function. A value function at a belief bt is the expected future discounted reward the agent can gather by following the policy starting from the current belief bt [4]. The value function for a POMDP is represented by a set V of N-dimensional vectors v ∈ V, where N = |S| is the number of possible states. These vectors represent linear action-value functions, i.e., each vector has an associated action and represents the total reward obtainable for following a specific sequence of actions from each possible initial belief state. For a complete introduction to the POMDP problem, see [8]. The set of action-value functions, when plotted, forms a piecewise linear and convex (PLC) surface over the belief space. Finding the solution to a POMDP problem involves finding the vectors that form the PLC upper surface at every epoch. That process is commonly referred to as filtering the value function or policy.
3 POMDP Algorithm

Algorithm 1 and Algorithm 2 are the general algorithms used to find POMDP policies [8]. Algorithm 1 includes an optional pruning step on line 21, which can be used to filter the value functions in Υ to speed up calculation. This filter operation is performed by eliminating vectors that do not form the upper surface, and therefore do not take part in the solution at epoch t or any future epoch.

Algorithm 1. POMDP Solver
1:  Υ = (0; 0, …, 0)
2:  for τ = 1 to T do
3:    Υ′ = ∅
4:    for all (a′; v_1^k, …, v_N^k) ∈ Υ do
5:      for all control actions a do
6:        for all measurements z do
7:          for j = 1 to N do
8:            v_{a,z,j}^k = Σ_{i=1}^{N} v_i^k p(z | s_i) p(s_i | a, s_j)
9:          end for
10:       end for
11:     end for
12:   end for
13:   for all control actions a do
14:     for all k(1), …, k(M) = (1, …, 1) to (|Υ|, …, |Υ|) do
15:       for i = 1 to N do
16:         v_i′ = γ [ r(s_i, a) + Σ_z v_{a,z,i}^{k(z)} ]
17:       end for
18:       add (a; v_1′, …, v_N′) to Υ′
19:     end for
20:   end for
21:   optional: filtering method to prune Υ′
22:   Υ = Υ′
23: end for
24: return Υ
Algorithm 2. policyPOMDP(Υ, b = (p_1, …, p_N)):
1: û = argmax_{(a; v_1^k, …, v_N^k) ∈ Υ} Σ_{i=1}^{N} v_i^k p_i
2: return û
The most used filtering technique for finding an exact solution uses linear programming. The disadvantages of linear programming are that it is slow, computationally expensive, and subject to numerical instability. This is why we introduce this new exact filtering technique.
4 Related Work

Many approximate POMDP solvers have been developed over the years. Among these solvers are PERSEUS [9] and PBVI [10]. PERSEUS and PBVI are both point-based solvers. They differ in that PERSEUS uses randomly generated belief points instead of an increasing belief set, as in PBVI. However, the two methods use a similar technique, called backup, which can increase the number of belief points in PBVI and readjust the value of some belief points in PERSEUS at each iteration of the POMDP algorithm. Among current methods for finding exact solutions, Incremental Pruning (IP) [7] is the most computationally efficient. As with most exact and many approximate methods, a set of linear action-value functions is stored as vectors representing the policy. In each iteration of the algorithm, the current policy is transformed into a new set of vectors and then they are filtered. The cycle is repeated for some fixed number of iterations, or until the value function converges to a stable set of vectors. The IP filter algorithm relies on solving multiple linear programs (LPs) at each iteration. Our approach differs from these two methods (PBVI, PERSEUS) in two ways: first, it is a new filtering approach for an exact solver (Incremental Pruning); second, it generates belief points once and uses them until the solution is found. Also, the belief points that we generate are quite predictable and not entirely random. This technique takes its origin from the scan line method in computer graphics. By using a vertical scan line or plane, we show that a high-quality exact POMDP policy can be found easily and quickly.
5 The Scan Line Technique

Value functions are stored in the form of vectors v = (v1, …, v|S|), which represent hyperplanes over the belief space. Beliefs can also be represented as vectors b = (b1, …, b|S|) with the condition that b1 + b2 + … + b|S| = 1. Given a value function v and a belief b, we can calculate the reward associated with b by the following computation:

R = v1 b1 + v2 b2 + … + v|S| b|S|        (1)
Now, assume that we have a set of value functions V = (V1, …, VN), where N = |S|. What we need to do is generate a belief b and calculate the rewards R1, …, RN associated with b with respect to V. The value function that generates the maximum reward among R1, …, RN is recorded and is considered part of the solution to the policy for the current problem. Like all approximations, the quality of the solution depends on how close it is to the true solution. In this technique, the quality of the solution is affected by the way the belief b is generated and how b is moved to cover most of the belief space related to the problem.

5.1 Generating the Belief

Instead of generating beliefs that scan the belief space from left to right or right to left, a different approach was taken. We generate a set of beliefs B, with the initial belief b0 set to have equal probability of being in each existing state, b0 = (1/|S|, …, 1/|S|), i.e., b0 is a uniform distribution over S. Then, a number ε is generated with 0 < ε < 1/|S|. To assure that the sum of the probability distribution in b is equal to 1, we move b in the following way,

B = ( 1/|S| + ε(|S| − x), 1/|S| + ε(|S| − x + 1), 1/|S| + ε(|S| − x + 2), …, 1/|S| − ε(|S| − x + 2), 1/|S| − ε(|S| − x + 1), 1/|S| − ε(|S| − x) ).        (2)

The idea is to make sure that if we add ε to x of the probabilities, then we also subtract ε from another x probabilities. The main goal is that each addition of ε should be compensated by a subtraction of ε, so that we can keep the sum of the probabilities equal to 1. The number of belief points in the set B depends on the value of a density parameter α, where 0 < α < 1. Algorithm 3 shows the algorithm that we use to generate belief points. In this algorithm, the array Bel is initialized with the initial belief point b0, as defined above, in all its indices. Line 1 determines the maximum and minimum boundaries of ε, which are 1/|S| and 0, respectively. In line 1, α is added to ε on each iteration until ε reaches the maximum boundary. Lines 2 through 5 specify the range of indices whose values will be increased or reduced. On lines 9 to 12, we subtract ε from the values at indices k to l of the Bel array, which holds the initial belief in each iteration. After computing the new belief values in line 10, we add them to our belief set B. On lines 13 to 16, we add ε to the values at indices i to j and insert these values into B as well. If the number of subtractions of ε exceeds the number of additions of ε, then we fix the difference by adding ε times the number of excess subtractions to the final index, in lines 17 to 19. We maintain the sums of the additions and subtractions between indices with the variables increasedSum and reduceSum in each iteration. Belief points are only generated once; the belief set remains constant through all iterations of the POMDP algorithm.
Algorithm 3. Belief Generator
1:  for ε = 0 to 1/|S| step α do
2:    for k = 1 to |S| do
3:      for l = 1 to |S| do
4:        for i = 1 to |S| do
5:          for j = 1 to |S| do
6:            if (l − k) < (j − i) then
7:              continue
8:            end if
9:            for x = k to l do
10:             B ← Bel[x] − ε
11:             reduceSum = reduceSum + ε
12:           end for
13:           for y = i to j do
14:             B ← Bel[y] + ε
15:             increasedSum = increasedSum + ε
16:           end for
17:           if increasedSum < reduceSum then
18:             B ← ( Bel[j] + (reduceSum − increasedSum) )
19:           end if
20:           increasedSum = reduceSum = 0
21:         end for
22:       end for
23:     end for
24:   end for
25: end for
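A drastically simplified sketch of the same idea is given below: it shifts ε of probability mass from one coordinate to another while keeping the entries summing to one, and omits the range-of-indices bookkeeping of Algorithm 3 (so the generated set is smaller than the one the algorithm produces):

```python
def generate_beliefs(n_states, alpha):
    """Generate a fixed belief set by perturbing the uniform belief b0."""
    base = 1.0 / n_states
    beliefs = [[base] * n_states]                 # start from the uniform belief b0
    eps = alpha
    while eps < base:                             # 0 < eps < 1/|S|, stepped by alpha
        for take in range(n_states):
            for give in range(n_states):
                if take == give:
                    continue
                b = [base] * n_states
                b[take] -= eps                    # subtract epsilon here ...
                b[give] += eps                    # ... and compensate there
                beliefs.append(b)
        eps += alpha
    return beliefs
```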
5.2 The Scan Line

A parameter β is used to choose the belief points in B that will participate in the scanning procedure. The value of β indicates how many belief points from the current one will be skipped as the scan is performed. After the set of belief points B is generated, as shown in Figure 1 for a 2-state problem, we take a belief b ∈ B, calculate the expected rewards associated with b, find the maximum, and record the corresponding vector (i.e., value function) that is associated with the maximum reward. This procedure goes on until we have exhausted all the belief points in B, or the number of remaining belief points is less than the value of β, or the set of value functions associated with the maximum rewards is equal to the set of value functions.
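Under the same simplifications, the scan itself can be sketched as follows (β controls how many belief points are skipped between evaluations; this is an illustration, not the implementation evaluated below):

```python
import numpy as np

def scan_line_filter(vectors, beliefs, beta=10):
    """Keep every alpha-vector that is the maximizer at some visited belief point."""
    V = np.asarray(vectors)                    # shape (K, |S|)
    kept = set()
    for idx in range(0, len(beliefs), beta):   # skip beta points per step
        b = np.asarray(beliefs[idx])
        kept.add(int(np.argmax(V @ b)))        # record the dominant vector at b
        if len(kept) == len(vectors):          # every vector is already selected
            break
    return [vectors[k] for k in sorted(kept)]

# Usage sketch: scan_line_filter(alpha_vectors, generate_beliefs(n_states, 0.001), beta=10)
```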
6 Testing Results

This scan line filter (SL) approach was used in our Incremental Pruning (IP) implementation to solve some well-known POMDP problems such as Tiger, 4x4, Network, Chess, etc.
Fig. 1. 2D vectors in a two-state belief space

Table 1. Comparing SL and LP in terms of solution

PROBLEM   LP EPOCH  LP VECS  SL EPOCH  SL VECS   S  A  O
Tiger        272       7       272        9      2  3  2
4x3           X        X        X         X     11  4  6
4x4          239       6        X         X     16  4  2
Chess        238       6        X         X     11  4  7
EJS1          X        X        X         X      3  4  2
Example1      X        X       289        5      2  2  3
Network       X        X       330        3      7  4  2
Hanks        236       5       179        2      4  4  2
Saci..       417      47       233        1     12  6  5
Shutle2      283      21       290        5      8  3  5
Shutle       287      22       334        5      8  3  5
Complete definitions of these problems can be found in [11]. Although α and β were set to 0.001 and 10, respectively, for every problem, the SL approach generated a different number of belief points for each problem, depending on its number of states. The results were compared with our implementation of IP using a Linear Program (LP) filter.

6.1 Solution Quality

In our first evaluation of the SL solver, we ran our program on a set of problems with different numbers of states, observations, and actions. Table 1 summarizes the experiments and the results obtained on a 2 GHz Intel Core 2 processor. In this table, our scan line filtering approach is compared to the linear programming filtering
approach. SL and LP were allowed to run until they converged to a solution or exceeded a 12-hour maximum running time. Each row in Table 1 shows one selected problem. The numbers of states, observations, and actions for each problem are shown in columns S, O, and A, respectively. For each problem, the total number of iterations executed before termination of the LP solver is given in the LP EPOCH column, and the total number of iterations executed before termination of the SL solver is given in the SL EPOCH column. The total number of vectors in the value function of the final solution for each POMDP problem is indicated in the LP VECS and SL VECS columns. For instance, for the Tiger problem presented in the first row, the LP-based solver terminated after 272 epochs with 7 vectors in the final value function. On that same problem, the SL solver produced a value function with 9 vectors after executing the same number of epochs. The term EPOCH in the table refers to the number of iterations, or horizons, needed for the problem to converge to a stable policy. An X in the table indicates that the problem did not converge within the maximum time limit set for the experiment; therefore we are unable to indicate how many vectors (i.e., value functions) form the final solution. The VECS columns in the table indicate how many vectors form the final solution. The number of epochs and the number of final vectors did not show any variation as more testing (50 runs for each problem) was performed. From the table, we can see that our scan line approach was able to solve two problems, Network and Example1, that LP was not able to solve under the 12-hour limit. In terms of quality of the solution, there were cases where the solution provided by SL was close to the solution given by LP (Hanks and Tiger in the table) and there were cases where the solution provided by SL was very different from the solution given by LP (Saci.. and Shutle).

6.2 Speed of the Solver

We ran each of the two solvers 50 times on the set of problems and took the average run times. Table 2 shows the results of these experiments. Other than on the two problems mentioned in the previous section, LP was always faster than SL. In this table, time is expressed in CPU seconds unless otherwise specified.
Table 2. Comparing SL and LP in terms of speed

PROBLEM  LP Average TIME  SL Average TIME   S  A  O
Tiger         2.08             3.08          2  3  2
Hanks         1.08          over 1 hour      4  4  2
Shutle2      22.58          over 1 hour      8  3  5
Shutle       25.08          over 1 hour      8  3  5
6.3 Parameter Testing with the Tiger Problem

Our SL filter has two adjustable parameters, α and β. α is the density parameter and is also the value used to check whether or not two rewards are significantly different. The value of β indicates how many belief points will be skipped at each iteration of the scan. In the Tiger problem, each value of α corresponds to a specific number of belief points, as illustrated by Table 3. Here, our goal was to find the values of α and β under which our approach would solve the Tiger problem fastest. Table 4 presents the results of this experiment, which was performed on a 2 GHz Intel Core 2 processor. From Table 4, we can see that setting α = 0.1 and β = 1 resulted in a shorter run time than any other setting. For the Tiger problem, our approach seems to perform fastest when considering 67 belief points in total.

Table 3. Relation between α and the number of belief points in the Tiger problem

α        Number of belief points
0.1            67
0.01          551
0.001        5510
0.0001      55012
Table 4. Varying α and β in order to determine the best setting to solve the tiger problem

β     α       Time to find solution
1     0.1     0.394 sec
10    0.1     No solution
1     0.01    8.1065 sec
5     0.01    No solution
10    0.01    No solution
1     0.05    0.996 sec
10    0.05    No solution
1     0.001   8 min 6.3873 sec
10    0.001   9.7034 sec
50    0.001   1.6462 sec
100   0.001   1.2928 sec
1     0.0001  496 min 50.3 sec
10    0.0001  9 min 48.7 sec
50    0.0001  1 min 31 sec
100   0.0001  1 min 15 sec
7 Conclusion and Future Work

In terms of speed, for maximum performance we need to set proper values for α and β for each problem. Our initial objective was to find a solution for those POMDP
problems where the LP-based filter was not able to find one due to numerical instability. Also, an LP-based filter requires LP libraries, which can be expensive, especially the powerful ones. For these reasons, we proposed an easy-to-implement filter method for the Incremental Pruning algorithm which takes its origin from computer graphics. Our scan line approach did not perform better than regular linear programming with some values of α and β on some POMDP problems. However, as Table 2 shows, with some values of α and β we obtain a significantly faster solution than the LP-based filter on the Tiger problem. The strength of the SL approach resides in the fact that it was able to solve two problems that LP was not able to finish within the 12-hour limit. Further experiments and tuning need to be performed to exploit the strengths of this approach and to increase its speed. Our future research in this domain will consider approaches which dynamically change the values of α and β in each epoch. By better understanding the effects of these values, we may develop a formula that maps the complexity of a POMDP problem to good values of these parameters. For instance, as Table 2 indicates, with small values for these parameters we can still obtain a high-quality solution that requires less time to compute. Future work will also include finding a faster and more efficient way to generate belief points. One possibility that we will explore is to generate belief points for different numbers of problems at one time, and then save those values in a file. When running the solver again on similar POMDP problems, we can simply read the belief points from the file rather than generating them in the program. We will also test this scan line approach on more problems to obtain a better comparison with LP and some other approximate POMDP solvers.
References

1. Monahan, G.E.: A survey of partially observable Markov decision processes. Management Science 28(1), 1–16 (1982)
2. Smallwood, R.D., Sondik, E.J.: The optimal control of partially observable Markov processes over a finite horizon. Operations Research 21, 1071–1088 (1973)
3. Caines, P.E.: Linear Stochastic Systems. John Wiley, New York (April 1988)
4. Spaan, M.T.J.: Cooperative active perception using POMDPs. In: AAAI 2008 Workshop on Advancements in POMDP Solvers (July 2008)
5. Goldsmith, J., Mundhenk, M.: Complexity issues in Markov decision processes. In: Proceedings of the IEEE Conference on Computational Complexity. IEEE, Los Alamitos (1998)
6. Spaan, M.T.J., Vlassis, N.: Randomized point-based value iteration for POMDPs. Journal of Artificial Intelligence Research 24, 195–220 (2005)
7. Cassandra, A., Littman, M.L., Zhang, N.L.: Incremental pruning: A simple, fast, exact algorithm for partially observable Markov decision processes. In: Proceedings of the Thirteenth Annual Conference on Uncertainty in Artificial Intelligence (1997)
8. Thrun, S., Burgard, W., Fox, D.: Probabilistic Robotics. The MIT Press, Cambridge (June 2006)
9. Spaan, M.T.J., Vlassis, N.: Perseus: Randomized point-based value iteration for POMDPs. Journal of Artificial Intelligence Research 24, 195–220 (2005)
10. Pineau, J., Gordon, G., Thrun, S.: Point-based value iteration: An anytime algorithm for POMDPs. In: Proceedings of the International Joint Conference on Artificial Intelligence, Acapulco, Mexico (2003)
11. Cassandra, A.R.: Tony's POMDP file repository page (July 2009)
POMDP Filter: Pruning POMDP Value Functions with the Kaczmarz Iterative Method

Eddy C. Borera, Larry D. Pyeatt, Arisoa S. Randrianasolo, and Mahdi Naser-Moghadasi

Texas Tech University, 302 Pine Street, Abilene, TX 79601, USA
[email protected],
[email protected], {mahdi.moghadasi,arisoa.randrianasolo}@ttu.edu
Abstract. In recent years, there has been significant interest in developing techniques for finding policies for Partially Observable Markov Decision Problems (POMDPs). This paper introduces a new POMDP filtering technique that is based on Incremental Pruning [1] but relies on the geometry of hyperplane arrangements to compute an optimal policy. The new approach applies notions of linear algebra to transform hyperplanes and treat their intersections as witness points [5]. The main idea behind this technique is that a vector that has the highest value at any of the intersection points must be part of the policy. IPBS is an alternative to using linear programming (LP), which requires powerful and expensive libraries and which is subject to numerical instability. Keywords: Planning Under Uncertainty, POMDP Filter, A.I., Robotics.
1 Introduction
The increasing number of devices and agents that perform automated decision making has intensified researchers' interest in the Partially Observable Markov Decision Problem (POMDP) framework, in which an agent cannot directly observe its current state. Finding solutions for this kind of problem is known to be computationally intractable [11,4]. Based on our understanding, the best current method for finding exact policies is the Incremental Pruning (IP) technique [1]. There are two steps to the algorithm: first, the current policy is transformed into a new set of vectors; then, the resulting vectors are filtered and the cycle is repeated. Most pruning techniques rely on solving an exponentially increasing number of linear programs. In this paper, however, we use a different technique to eliminate unnecessary vectors during updates. This new filtering technique relies on geometrical properties of hyperplanes and casts the filtering operation for POMDP problems as finding intersection points between hyperplanes in Euclidean space. This avoids using LP libraries, which can be expensive and numerically unstable.
2 POMDP Basics
The Partially Observable Markov Decision Process is a framework that models interactions between an agent and a stochastic, partially observable environment [11]. It can be denoted as a tuple (S, A, Z, Pr, R, γ), where S represents the set of all possible states and A the set of all possible actions. At each time step, the agent resides in a state s ∈ S, which is not directly observed, and performs an action a to receive an immediate reward r(s, a) ∈ R. Then, the agent moves to state s′ according to the transition probability Pr(s′ | s, a), before making an observation z ∈ Z that is defined by the observation probability Pr(z | s′). The agent's goal is to select a set of actions that will maximize its long-term reward, in other words, to find the optimal actions. This requires solving the POMDP, which is the same as finding a policy that maps every belief state into an action that maximizes the expected discounted reward. A policy is formulated as π : β → A, where β is the set of all possible belief states, the belief space. A value function is the expected future discounted reward that the agent will accumulate by following the policy starting from belief b [11]. It is a mapping from belief states to real numbers that represents the overall expected reward for every possible belief state. At any horizon, the updated value function V′, for some belief state b, can be computed recursively in three steps [1]:

V′(b) = max_{a∈A} V^a(b)        (1)
V^a(b) = Σ_{z∈Z} V^a_z(b)        (2)
V^a_z(b) = Σ_s r(a, s) b(s) / |Z| + γ Pr(z | a, b) V(b^a_z)        (3)
At first, the action- and observation-specific value functions V^a_z (Eq. 3) are computed, and then combined to form the action-specific value functions V^a (Eq. 2), which in turn build the new value function V′ (Eq. 1). It has been shown that these functions, when plotted, form a piecewise linear convex (PWLC) surface [2]. Thus, they can be represented by sets of |S|-vectors that are referred to as the sets S′, S^a, and S^a_z, respectively.

S′ = purge( ∪_{a∈A} S^a )        (4)
S^a = purge( ⊕_{z∈Z} S^a_z )        (5)

S^a_z = purge( { r(b, a, z) | b ∈ β } ), where r(b, a, z) is the |S|-vector given by

r(b, a, z)(s) = r(a, s) / |Z| + γ Σ_{s′} b(s′) Pr(z | a, s′) Pr(s′ | a, s),        (6)
and ⊕ denotes the vector cross sum, such that given two sets of vectors A and B, A ⊕ B = {α + β | α ∈ A, β ∈ B}. Building S′ from S^a and S^a_z is straightforward, but the efficiency of every algorithm in this case relies on "pruning", "filtering", or "purging" these sets so that they are represented by the minimal number of vectors. This guarantees that fewer linear programs have to be solved, which in turn allows LP-based techniques to find optimal policies faster.
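The cross sum is simply the elementwise addition of every pair of vectors drawn from the two sets; a short illustration:

```python
import numpy as np

def cross_sum(A, B):
    """A ⊕ B = { a + b | a in A, b in B } for two sets of alpha-vectors."""
    return [np.asarray(a) + np.asarray(b) for a in A for b in B]

# Example: two sets of 2-dimensional vectors produce |A| * |B| = 4 vectors.
# cross_sum([[1, 0], [0, 1]], [[2, 2], [3, 1]])
```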
3 Related Work
Approximate techniques have been successful in practice; they include HSVI [8], PERSEUS [10], and VDCBPI [7]. Most of these methods avoid performing updates for all possible belief states, and instead choose a smaller set of these states, or contract the belief space β to obtain a smaller subspace, as in VDCBPI. The number of states being updated may change dynamically as the algorithm runs, while other methods keep the same number of states. HSVI has been very effective in this domain; it uses heuristic methods to update the lower and upper bounds of the value function during its update. Poupart et al. [7] claim that VDCBPI has been able to solve problems with up to 33 million states. In contrast to approximate techniques, exact solvers perform updates for the entire belief space β, and they mostly apply a dynamic programming approach to update their value functions. Most of them also solve a large number of linear programs in order to compute witness points or regions. Some examples of these techniques include Sondik's One-Pass algorithm [9], which starts with a random belief point and then finds the dominant vector at that point. It then constructs a region for that vector, and also additional regions where the same vector is guaranteed to be dominant. The value function for the corresponding region is defined by the intersection of the regions found. Another technique is Cheng's Linear Support algorithm [1], which also starts from an arbitrary point and adds the best vector at that point to the corresponding value function V. Then, the endpoints at the corners of the region defined by the newly added vector yield additional dominant vectors to be part of V, and the same holds for their intersection points. There are additional techniques, such as the exhaustive (Monahan 1982) and witness algorithms (Littman, Cassandra, and Kaelbling), that are based on Sondik's One-Pass and Cheng's Linear Support techniques. But the one that is still widely used today is Incremental Pruning (IP) [1].
4 Pruning Alpha-Vectors
The new technique that we propose avoids the use of linear programming. Previous techniques rely on solving sets of linear programs that grow exponentially in terms of the number of constraints, which mostly requires a very powerful and efficient LP library when updating value functions. Despite having the most powerful LP library available, these techniques are still limited to POMDP problems with a small number of states.
Purge(A, B)
  F ← A ⊕ B
  W ← ∅
  E ← ∅
  foreach s in S do
    ω_s ← argmax_{φ∈F} (e_s · φ)
    W ← W ∪ {ω_s}
    F ← F \ {ω_s}
  end
  foreach ω in W do
    ρ ← MakeEquation(ω, S)
    foreach χ in E do
      ν ← Intersection(χ, ρ)
      I ← I ∪ {ν}
    end
    E ← E ∪ {ρ}
  end
  while F ≠ ∅ and I ≠ ∅ do
    P ← ProjectToBeliefSpace(I, S)
    M ← ∅
    I ← ∅
    foreach τ in P do
      if (argmax_{φ∈F} (τ · φ) > argmax_{ω∈W} (τ · ω)) and φ ∉ M then
        M ← M ∪ {φ}
        ν ← MakeEquation(φ, S)
        foreach χ in E do
          ψ ← Intersection(χ, ν)
          I ← I ∪ {ψ}
        end
        E ← E ∪ {ν}
      end
    end
    W ← W ∪ M
    F ← F \ M
  end
  return F

Fig. 1. Intersection-point-based filter, where a set of vectors (F = A ⊕ B) is reduced to the minimum size possible. S represents the set of all possible states, where |S| is the number of states.
Pineau [6] mentions that the best exact techniques can efficiently solve problems with dozens of states. The major difference in our technique is that we use a different approach to filter the value-function alpha-vectors: by looking at their geometric surfaces, we think that we can tackle this problem as an imaging or graphics problem. The Purge function in Fig. 1 starts with a set F, where F = S_A ⊕ S_B, for some sets S_A, S_B ⊆ S^a. Then, in the first foreach loop, a new set W is filled with all vectors that are easily dominant at one of the belief-space endpoints. In
other words, the vectors v = argmax_i (e_i · ν_i), where e_i is the i-th standard basis vector, for 1 ≤ i ≤ |S|. They are at the same time removed from F. The next task is to compute the intersections between the vectors in W, which are mapped into a Euclidean space by the MakeEquation procedure (Fig. 1).
4.1 Mapping into Euclidean Space
First, each set of vectors is transformed into a Euclidean space, as defined by the MakeEquation procedure (Fig. 1) and depicted in Fig. 2.
Fig. 2. Mapping value function to Euclidean Space
v = (ν_{s_1}, · · · , ν_{s_k})        (7)
The transformation of a value function (Eq. 7) is straightforward: we create vectors in the Euclidean space of length ‖v_{ν_{s_i}}‖ = ν_{s_i}, at 45 degrees from every axis, and then shift them one unit along the corresponding s_i-axis in order
to generate the set of points that define the corresponding hyperplane H_{ν_{s_i}}. Note that the belief space in this case still maintains its property Σ_{i=1}^{|S|} s_i = 1.0. The points that define the transformed hyperplane are simply represented by the endpoints of the vectors formed from the elements of the alpha-vector v. Given their distance from the origin, which is

ν_{s_i} = ‖v_{ν_{s_i}}‖ = sqrt( Σ_{i=1}^{|S|} s_i^2 ),

the coordinates of these endpoints are defined as

s_i = ± ν_{s_i} √|S| / |S|,   1 ≤ i ≤ |S|.

The corresponding point P_{ν_{s_i}} has the same values for its coordinates except for s_i, which has an extra unit due to the shifting. These sets of points then define a hyperplane corresponding to the alpha-vector v (Eq. 7) in Euclidean space, which is straightforward to compute.
5 Computing Intersection Points with the Kaczmarz Method
The Intersection procedure in Fig. 1 uses the Kaczmarz method [3], which starts from an arbitrary point P and applies iterative projections of points onto a sequence of hyperplanes. To illustrate the Kaczmarz process, consider two hyperplanes (H1) and (H2) defined by their corresponding linear equations. Take an arbitrary point P on the plane and project it perpendicularly onto the first hyperplane (H1). Then project the resulting point onto the second hyperplane (H2), then project that point back onto (H1), and so on. This produces a zigzag path that eventually converges to an intersection point. In [3], Galántai proves that the Kaczmarz method always converges for any linear system Ax = b, even if the system is overdetermined and inconsistent. This last case never occurs in our algorithm (Fig. 1), as collinear planes are checked for. This is one of the main reasons we chose this technique: A does not have to be a square matrix. At higher dimensions, the intersection of hyperplanes may not be a point but rather a hyperplane. Since we are looking for points only, it is preferable to obtain multiple points from that hyperplane, preferably some near each of the endpoints, so as to get a set of points that form the hyperplane of the intersections. These points usually form a convex set that defines the intersecting hyperplane. The intersection points then need to be projected back to the belief space by the ProjectToBeliefSpace procedure (Fig. 1). In other words, we need to turn the intersection points into probability distributions by projecting them onto the belief space defined by the equation s_1 + s_2 + … + s_|S| = 1, where every resulting point serves as a witness point. Then, the algorithm goes through every projected intersection point and computes a scalar product with every value function in both W and F. If there exists a vector in F that dominates every vector in W at such a point, then it is added to W and removed from F (Fig. 1). This process is repeated until there are no vectors to be moved from F to W, meaning that the set W has all of the vectors that are part of the highest surface. The resulting set W then holds the minimal filtered alpha-vectors for the specific value function, which is used to compute the POMDP solution.
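A minimal sketch of the alternating-projection idea follows (generic Kaczmarz over the rows of Ax = b; the stopping rule is an assumption, and the actual Intersection procedure may differ in its details):

```python
import numpy as np

def kaczmarz_intersection(A, b, iters=1000, tol=1e-10):
    """Approximate a point on the intersection of the hyperplanes A[i] . x = b[i]
    by cyclically projecting an arbitrary starting point onto each hyperplane."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros(A.shape[1])                  # arbitrary starting point P
    for _ in range(iters):
        x_prev = x.copy()
        for a_i, b_i in zip(A, b):
            # Orthogonal projection of x onto the hyperplane a_i . x = b_i.
            x = x + (b_i - a_i @ x) / (a_i @ a_i) * a_i
        if np.linalg.norm(x - x_prev) < tol:  # the zigzag path has converged
            break
    return x

# Example: the planes x + y = 1 and x - y = 0 intersect at (0.5, 0.5).
# kaczmarz_intersection([[1, 1], [1, -1]], [1, 0])
```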
6 Experiments and Results

6.1 Experiment 1
To test the new technique, we selected seven well-known problems from the literature. The Tiger problem involves a three-way choice between going through one of two doorways or listening for the tiger, which is behind one of the doorways. The Shuttle problem is a simple model of a space shuttle that must move back and forth between two space stations. The Painting problem involves selecting actions for painting parts and rejecting defective parts; it is also known as the Hanks problem.
Fig. 3. Size of the $S_a^z$ set per epoch for the Shuttle problem, (a) solved with LP and (b) solved with the Kaczmarz method, for ε values of 2e-1, 1e-6, 1e-10, and 1e-15.
Additionally, we use three maze problems, the 4x4, 4x3, and Cheese problems, as well as the Network problem. To test them, we ran each of these problems several times with Incremental Pruning, which will be referred to as LP, and with the iterative Kaczmarz-based method, and computed their averaged run times and the numbers of iterations until convergence. The experiments were run on a 1.6 GHz AMD Athlon processor with 2 GB of RAM. Each of the featured problems was run with the Incremental Pruning technique that uses the Restricted Region filter and with the intersection-based filter that uses the Kaczmarz technique.

Table 1. Experiment descriptions and results with averaged run times given in seconds. The iterations column indicates the number of loops until convergence. The average run time in seconds is for ε = 1e−12.

Problem    |S|  |A|  |Z|   Iterations (LP / Kaczmarz)   Time in s (LP / Kaczmarz)
Tiger        2    3    2        272 / 272                   420.62 / 182.25
Hanks        4    4    2        236 / 236                   341.19 /  21.12
Cheese      11    4    7        238 / 238                     8.50 / 174.94
Shuttle      8    3    5          – / 283                        – / 854.75
4x4         16    4    2        239 / –                      31.12 / –
4x3         11    4    7        239 / –                      31.12 / –
Network *    7    4    2          – / 351                        – / 865
Table 1 provides some descriptions of these problems, including the number of states |S|, actions |A|, and observations |Z|. It also reports the number of iterations performed until convergence. Most exact filtering techniques, including Restricted Region, still set an ε-optimality parameter to control the quality of their results and their performance. The lower the value of ε, the better the quality of the policy should be. Setting a larger value for ε is similar to finding approximate solutions. This value basically allows the filter to ignore vectors whose rewards exceed those of the vectors that are already part of the solution by less than ε. In simpler words, the filter does not need to refine the solution if the next vector improves the value by less than ε.

Table 2. Kaczmarz solver tested on four different ε values. Each clocked time represents an average time (in seconds) after running each problem 20 times for the respective ε value.

            2e−1       1e−6      1e−10      1e−15
Tiger      120.55     178.70    164.65     161.70
Shuttle     73.75     754.90    787.05    1343.00 *
Cheese     126.40     170.60    140.15     153.55
Hanks        4.70 *    16.70 *   15.05 *    15.75 *
4x4        429.85    7315.25   7189.90    7794.00
4x3        676.35        –         –          –
Network     132 *      517 *     560 *     2164 *
Table 3. LP solver tested on four different ε values. Each clocked time represents an average time (in seconds) after running each problem 20 times for the respective ε value.

            2e−1     1e−6     1e−10    1e−15
Tiger        3.45    90.10    297.75   297.00
Shuttle     31.15      –         –        –
Cheese       2.75     6.75      9.10     6.10
Hanks        1.35    70.95    237.10   238.55
4x4          2.70    18.30     24.20    18.75
4x3         18.75 *    –         –        –
Network    1601 *      –         –        –
During the experiments, each problem was run 20 times for five different ε values in order to test how both techniques behave numerically. The results are shown in Tables 1, 2, and 3. Table 1 gives the run time for each problem when ε = 1e−12. The other tables provide additional run times for the problems when ε is set to 2e−1, 1e−6, 1e−10, and 1e−15. In Tables 2 and 3, the entries that are marked with an asterisk (*) are the times to reach the stopping criterion of 351 episodes. We chose this specific number after observing that, once a run passes 350 episodes, the solution usually oscillates for some period T. This behavior is illustrated in Fig. 4 for the Network problem and in Fig. 5 for the Hanks problem, and it can happen with both techniques.
Fig. 4. Size of the $S_a^z$ set per epoch for the Network problem, (a) solved with LP and (b) solved with the Kaczmarz method, for different ε values.
The missing entries in Tables 2 and 3 indicate that an algorithm neither converged to a final solution within 8 hours nor reached 351 episodes while oscillating or diverging. This usually happens when the number of vectors grows exponentially at every iteration. For example, in Table 1 (ε = 1e−12), when solving Tiger and Hanks, the Kaczmarz version clearly outperforms LP. However, LP seems to do better for the Cheese problem. To investigate these results further, we decided to analyze the number of vectors at each iteration. The number of vectors usually affects the performance of the POMDP solver because, for LP, more vectors to be updated means more linear programming constraints to be solved. As mentioned earlier, the number of linear constraints in the linear programs determines the efficiency of an LP-related solver. Thus, in the case of the Tiger and Hanks problems, the Kaczmarz technique was able to eliminate more vectors. This is illustrated in Figs. 3, 4, 5, and 6.

6.2 Experiment 2: Simulation
Solving POMDPs is very complex, and their solutions are sometimes ambiguous to analyze because different applications have different expectations: some applications may require fast algorithms, while others require accurate techniques. To test the quality of a policy from the Kaczmarz method, we decided to implement the 4x3 maze problem as a simulation. The maze is illustrated in Fig. 7. At each time step, the agent is in one of the 11 states shown in Fig. 7. It can also make observations: its sensors detect whether there is a door on the right and/or on the left. In addition, two observations are included to determine whether the robot has reached the goal state S3. Thus, the observations are: left, right, both, neither, good, or bad. There are four actions, North, South, West, and East, which move the agent to the intended adjacent state with probability 0.8. Policies from both the LP and the Kaczmarz method are applied to the robot in order for it to navigate to the final state. For each policy, the simulation is run 1000 times and the start state is chosen at random and is hidden from the agent.
Fig. 5. Size of the $S_a^z$ set per epoch for the Hanks problem, (a) solved with LP and (b) solved with the Kaczmarz method, for different ε values.

Fig. 6. Size of the $S_a^z$ set for the Tiger and Cheese problems. LP refers to the Restricted Region Incremental Pruning filter, which uses linear programming. IPBS refers to the filter that uses the Kaczmarz method (ε = 1e−12).
At the start, the agent is given a uniform probability distribution across all states except the final state S3. From Table 4, both techniques require around the same number of steps on average, with the Kaczmarz method having a slightly higher number. To analyze this difference, we performed a two-tailed Student's t-test on the LP vs. Kaczmarz results and found no significant difference between the two (t = 0.1448, p = 0.8849). Overall, the results are surprisingly close. Also, neither technique ever failed to achieve its goal. A failure happens if an agent successively takes an action that always drives it into a wall, i.e., it keeps taking an action that does not allow it to move anywhere else.
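For reference, executing a policy in such a simulation usually amounts to picking, at every step, the action attached to the alpha-vector with the largest dot product against the current belief, and then performing a Bayesian belief update with the sampled observation. The sketch below shows only this generic machinery; the transition model T, the observation model O, and the alpha-vector/action pairs are placeholders, not the authors' actual 4x3 model.

```python
import numpy as np

def select_action(belief, alpha_vectors, actions):
    """Greedy policy execution: pick the action of the alpha-vector whose
    dot product with the current belief is maximal."""
    values = [alpha @ belief for alpha in alpha_vectors]
    return actions[int(np.argmax(values))]

def update_belief(belief, action, observation, T, O):
    """Bayesian belief update: b'(s') is proportional to
    O[a][s'][o] * sum_s T[a][s][s'] * b(s)."""
    predicted = T[action].T @ belief              # sum_s T(s, a, s') b(s)
    new_belief = O[action][:, observation] * predicted
    total = new_belief.sum()
    return new_belief / total if total > 0 else belief

# Hypothetical simulation step (T, O, alpha_vectors, actions would come from
# the solved 4x3 POMDP; here they are assumed to be given):
# a = select_action(b, alpha_vectors, actions)
# s_next = np.random.choice(num_states, p=T[a][s])
# o = np.random.choice(num_obs, p=O[a][s_next])
# b = update_belief(b, a, o, T, O)
```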
Fig. 7. 4x3 maze structure. States S0–S10; the goal state S3 has reward +1.0, one state has reward −1.0, and every other state has reward −0.04.
Table 4. Results after running the 4x3 simulation 1000 times for the LP and Kaczmarz policies, with a 95% confidence interval for the average reward.

                     LP                  Kaczmarz
Average Iteration    5.93                5.95
Average Reward      −0.2372 ± 0.0103    −0.2382 ± 0.0091
Success             100%                100%
7 Conclusions
For most of the problems, it seems that the Kaczmarz method can eliminate more vectors than the LP version, which does not necessarily entail a better solution. In Figs. 4, 5, and 6, LP retains more vectors in the $S_a^z$ set before converging to a smaller number of vectors at a later stage. This slows down the process of finding an optimal policy, as the number of linear constraints to be solved by LP grows excessively. For the Shuttle problem, LP fails to find an optimal answer within 8 hours. This depends on the stopping criterion being set, which can also cause oscillations in the solutions being generated. Dealing with this problem, and finding an efficient stopping criterion for POMDP value iteration, is part of our future work.
References

1. Cassandra, A., Littman, M.L., Zhang, N.L.: Incremental pruning: A simple, fast, exact algorithm for partially observable Markov decision processes. In: Proceedings of the Thirteenth Annual Conference on Uncertainty in Artificial Intelligence (1997)
2. Cassandra, A.R., Kaelbling, L.P., Littman, M.L.: Acting optimally in partially observable stochastic domains. In: Proceedings of the Twelfth National Conference on Artificial Intelligence, Seattle, WA (1994)
3. Galántai, A.: Projectors and Projection Methods. Kluwer Academic Publishers, Dordrecht (2004)
4. Goldsmith, J., Mundhenk, M.: Complexity issues in Markov decision processes. In: Proceedings of the IEEE Conference on Computational Complexity. IEEE, Los Alamitos (1998)
5. Littman, M.L.: The witness algorithm: Solving partially observable Markov decision processes. Technical Report CS-94-40, Brown University, Department of Computer Science, Providence, RI (December 1994)
6. Pineau, J.: Tractable Planning Under Uncertainty: Exploiting Structure. Ph.D. thesis, Carnegie Mellon University (August 2004)
7. Poupart, P., Boutilier, C.: VDCBPI: An approximate scalable algorithm for large scale POMDPs. In: Proceedings of NIPS, Vancouver (2004)
8. Smith, T., Simmons, R.: Heuristic search value iteration for POMDPs. In: Uncertainty in Artificial Intelligence (2004)
9. Sondik, E.: The optimal control of partially observable Markov processes. Ph.D. thesis, Stanford University (1971)
10. Spaan, M.T.J., Vlassis, N.: Perseus: Randomized point-based value iteration for POMDPs. JAIR 24, 195–220 (2005)
11. Spaan, M.T.J.: Cooperative active perception using POMDPs. In: AAAI 2008 Workshop on Advancements in POMDP Solvers (July 2008)
Testing Image Segmentation for Topological SLAM with Omnidirectional Images

Anna Romero and Miguel Cazorla

Instituto Universitario Investigación en Informática, Universidad de Alicante, P.O. Box 99, 03080 Alicante
[email protected], [email protected]
Abstract. Image feature extraction and matching is useful in many areas of robotics, such as object and scene recognition, autonomous navigation, and SLAM. This paper describes a new approach to the problem of matching features and its application to scene recognition and topological SLAM. For that purpose we propose a prior image segmentation into regions in order to group the extracted features into graphs, so that each graph defines a single region of the image. We compare two basic methods for image segmentation in order to assess the effect of segmentation on the result. We have also extended the initial segmentation algorithm in order to take into account the circular characteristics of the omnidirectional image. The matching process takes into account the features and the structure (graph) using the GTM algorithm, modified to take into account the cylindrical structure of omnidirectional images. Then, using this method of comparing images, we propose an algorithm for constructing topological maps. During the experimentation phase we will test the robustness of the method and its ability to construct topological maps. We have also introduced a new hysteresis behavior in order to solve some problems found in the graph construction.

Keywords: Topological Mapping, Graph matching, Visual features.
1 Introduction

The extraction and matching of features and regions is an important area in robotics, since it allows, among other things, object and scene recognition and its application to object localization, autonomous navigation, obstacle avoidance, and topological SLAM. The SLAM (Simultaneous Localization And Mapping) problem consists of estimating the position of the robot while building the environment map. The problem is not trivial, since errors in the position estimation affect the map and vice versa. In the literature, depending on how the robot environment is represented, we can talk of two types of SLAM: metric SLAM and topological SLAM. In the first, the position is determined in a continuous space, i.e., we know exactly what position the robot has on the map (with an assumed error). It is easy to find solutions that include odometry, sonars and lasers ([21,23]). There are fewer solutions using vision, since calculating the exact position is more complicated. In the second type, the different points where the robot can be found are represented by a list of positions, i.e., the map is a discrete set of
locations, each of which defines a small region of the environment. In this case there are plenty of solutions that use images for the calculations. In [25] the images captured by an AIBO robot are used to learn the topological map. We also find solutions using omnidirectional images, such as [26] and [24], [27], where a topological map is constructed using an incremental algorithm. For both object and scene recognition we need methods for extracting features and/or regions from images. Several solutions in the literature use different methods for extracting the features. In [5] an over-segmentation algorithm is used to split the image into small regions. In [6] the Harris corner detector is combined with the SIFT descriptor. Many solutions in the literature are based on the combination of a segmentation algorithm with a feature extractor ([5], [12], [10]). Object recognition requires a manually selected database to describe the objects that the robot must recognize. In the case of scene recognition we could require a scene database, as in [11], where the concept of "Visual Place Categorization" (VPC) is introduced, which consists of identifying the semantic category of a place/room using visual information. However, there are situations requiring no pre-existing database, since it is constructed as the robot navigates through the environment ([12], [13]), such as in the SLAM problem. Affine invariant feature detectors have been shown to be very useful in several computer vision applications, like object recognition and categorization, wide baseline stereo and robot localization. These detection algorithms extract visual features from images that are invariant to image transformations such as illumination change, rotation, scale and slight viewpoint change. High-level vision tasks that rely on these visual features are more robust to these transformations and also to the presence of clutter and occlusions. A more detailed survey of the state of the art of visual feature detectors can be found in [7]. In that work, the authors assess the performance of different algorithms for the matching problem, with the Maximally Stable Extremal Regions algorithm (MSER) [8], the Harris affine and the Hessian affine [9] being the best suited for that task. Several methods are based on a combination of feature detectors (regions, contours and/or invariant points) to improve the matching, taking advantage of the extraction methods used as well as eliminating some of the problems of the individual methods. However, these methods do not propose the creation of structures from the extracted features to check the overall consistency of the matchings; instead, the features are matched one by one without taking into account any possible neighborhood relationships. Some of those methods apply a matching consistency check, eliminating cross-matches, i.e., matches that intersect with others. In the case of omnidirectional images that cannot be done, due to the circular nature of the images. In this paper we propose a method for matching features and an algorithm to construct topological maps using this comparison method. For the image comparison method we propose an image pre-processing in two steps: segmentation into regions and invariant feature extraction (using MSER with SIFT descriptors). For image segmentation, we compare the results of two algorithms: JSEG, which provides a good segmentation but takes considerable time to compute, and EGBIS, which is faster than the previous one.
We have also extended the initial segmentation algorithm in order to take into account the circular characteristics of the omnidirectional image.
Each of the regions obtained in the first step will contain a list of invariant points inside its domain. For each region, our method will construct a graph with the invariant points considering that omnidirectional images have special characteristics. The feature matching is carried out by comparing the graph of each of the regions of the current image with the representative graph (built with all points of the image) of each of the previously captured images. This approach takes into account both the feature descriptors and the structure of those features within the region. We apply the image comparison method in our topological map algorithm in order to group images that are considered to belong to the same area. The rest of the paper is organized as follows: Section 2 describes the pre-processing done to the image (JSEG and EGBIS segmentation) and feature extraction (MSER). Section 3 explains the graph matching using the GTM algorithm. Then, in Section 4 we describe the algorithm that constructs topological maps. In Section 5 we present the results obtained applying the combination of the image matching method and the topological mapping algorithm. Finally in Section 6 we draw certain conclusions.
2 Image Processing

MSER (Maximally Stable Extremal Regions) [8] is an affine invariant shape descriptor. The MSER algorithm detects regions that are darker or brighter than their surroundings and can be scale invariant. The algorithm uses the SIFT descriptor to describe the detected regions. Due to the nature of the descriptors, it is possible to associate (match) an MSER region (feature) of an image with one that appears in another image using the Euclidean distance in the matching process. Despite the robustness of the method, there are many cases where the feature matching is not successful (false positives or outliers). To eliminate these false positives and thus obtain more reliable and robust results when identifying scenes seen before, we propose using a structure (graph) with which to compare (and match) images. To detect the different image regions (which eventually form the sub-graphs for comparison) we use the JSEG and EGBIS segmentation algorithms.

2.1 Segmentation

Feature detection and extraction methods find characteristics throughout the whole image. Our goal is to group features according to the image region to which they belong, so we need a segmentation algorithm to divide an image into regions. In order to compare different algorithms we use the one proposed in [2], known as JSEG, and the algorithm described in [3], which we have named EGBIS (Efficient Graph-Based Image Segmentation). Considering the special characteristics of omnidirectional images (the far left is a neighbor of the far right of the image, i.e., the image is circular), a piece of the initial part of the image is appended to the end of the image. This avoids a region that was cut during image capture (one part on the left and the other part on the right) being taken as two separate regions. In Figure 1 we can see the original image with the initial part added at the end, and the results of applying the JSEG and EGBIS algorithms.
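A minimal way to implement this circular extension, assuming the panorama is stored as a NumPy array with the angular dimension along the columns; the pad width is an illustrative choice, not the value used by the authors.

```python
import numpy as np

def pad_omnidirectional(image, pad_fraction=0.1):
    """Append a copy of the initial columns of a panoramic image to its end,
    so that regions cut at the left/right border can be segmented as one."""
    pad_cols = int(image.shape[1] * pad_fraction)
    return np.concatenate([image, image[:, :pad_cols]], axis=1)

def unwrap_column(col, original_width):
    """Map a column index in the padded image back to the original image."""
    return col % original_width

# padded = pad_omnidirectional(panorama)            # segment `padded` with JSEG/EGBIS
# x_orig = unwrap_column(x_padded, panorama.shape[1])
```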
Fig. 1. Above: captured image to which a piece of the initial part of the image was appended at the end. Middle: result of applying JSEG; the region inside the ellipse is an example of a region that could have been cut. Below: result of applying EGBIS.
The first segmentation algorithm, JSEG, finds homogeneity of a particular color pattern and texture. It makes the following assumptions:

– The image contains a set of regions with approximately the same color and texture.
– Color data in each region of the image can be represented with a small set of quantized colors.
– Colors between two neighboring regions are distinguishable from each other.

In order to obtain the different regions, JSEG performs segmentation in two steps: color quantization and spatial segmentation. In the first step of the algorithm, image colors are coarsely quantized without significantly degrading image quality. This extracts a few representative colors that can be used as classes which separate regions of the image. For each image pixel, the algorithm finds its class and replaces its value, building an image of labels (class-map). In the second step, a spatial segmentation is performed directly on the class-map without taking into account the color similarity of the corresponding pixels. This transforms the output of the previous step into a J-image ([2]). Once this image is calculated, the algorithm uses a region-growing method for image segmentation. Initially, JSEG considers the image as one region, performs an initial segmentation at a given scale, and repeats the same process with the new regions at the next scale. Once the seed-growing step ends, the regions that have been over-segmented are merged using a grouping method. Finally we obtain two images: one where each pixel has the value of the region it belongs to, and one with the real image on which the edges of each region are overlapped. An advantage of separating the segmentation into two steps is an increase in the processing speed of each of the steps, which together do not exceed the time required for processing the whole problem.
Furthermore, the process is not supervised and therefore there is no need for experiments to calculate thresholds, since the algorithm determines them automatically.

The second algorithm, EGBIS, is a graph-based greedy algorithm. The main properties that the algorithm should have, according to its authors ([3]), are:

– The algorithm must capture perceptually important regions, which often reflect global aspects of the image.
– It has to be efficient, running in time nearly linear in the number of pixels of the image.

To segment the image, the EGBIS algorithm measures the evidence for a boundary between two regions by comparing two quantities: one based on the difference of the intensities across the boundary, and the other based on the difference of the intensities between neighbouring pixels within each region. Intuitively, the intensity differences across the boundary of two regions are perceptually important if they are large relative to the intensity differences inside at least one of the regions ([3]). The algorithm uses a graph to segment the image, where the regions are represented as nodes and the relationships between them are represented as weighted edges. The method lets the user select a size preference, i.e., whether smaller or larger regions are wanted, although small regions can still appear if certain requirements are fulfilled. Two important characteristics of the method are its ability to preserve detail in low-variability image regions while ignoring detail in high-variability regions, and its nearly linear runtime in the number of graph edges.

2.2 Feature Detection and Extraction

Once the different regions of the image have been determined, we proceed to extract image features. In our case we use the affine invariant shape descriptor MSER. The algorithm described in [8] searches for extremal regions, that is, regions in which all pixels are brighter or darker than all the pixels in their neighborhood. The image pixels are taken in intensity order, forming connected component regions that grow and merge, until all pixels have been selected. From all these connected components, or extremal regions, the algorithm selects those whose size remains constant during a given number of iterations. Finally, the selected Maximally Stable Extremal Regions, which can have any arbitrary shape, are transformed into ellipses. For each image, the features of the entire image are acquired and stored to build a representative graph of the image. Furthermore, each feature is assigned to the region it belongs to (by using the position of the feature in the image and taking into account that the initial pixels and the final pixels of the image are neighbors), obtaining a set of invariant points for every region calculated in the segmentation step. Points that belong to the same region are those used to construct the various sub-graphs that describe the image, i.e., each region has its own graph, built with all the features in its domain. Note that it is possible for some regions not to have any feature associated, or for some points not to belong to any particular region; in this case the region (or points) is discarded because it does not contain any interesting data.
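As an illustration of this step, OpenCV ships an MSER detector; the sketch below detects the MSER regions, represents each one by the centroid of its pixel set, and assigns it to the segmentation region containing that point (wrapping the horizontal coordinate as described above). The SIFT description of each region is omitted, and the parameter defaults are simplifying assumptions rather than the authors' exact pipeline.

```python
import cv2
import numpy as np

def mser_features_by_region(gray, label_map):
    """Detect MSER regions in a grayscale image and group them by the
    segmentation region (label_map value) containing their centroid."""
    mser = cv2.MSER_create()
    regions, _ = mser.detectRegions(gray)
    grouped = {}
    width = label_map.shape[1]
    for pts in regions:                       # pts: Nx2 array of (x, y) pixel coords
        cx, cy = pts.mean(axis=0).astype(int)
        cx = cx % width                       # wrap horizontal coordinate (circular image)
        label = int(label_map[cy, cx])
        grouped.setdefault(label, []).append((cx, cy))
    return grouped

# gray = cv2.cvtColor(padded, cv2.COLOR_BGR2GRAY)
# groups = mser_features_by_region(gray, jseg_labels)   # jseg_labels: per-pixel region ids
```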
3 Matching with Graphs

The feature matching process could result in unwanted false positives. To eliminate these outliers we suggest the use of graphs as the structure for matching. The use of graphs allows us to check not only the consistency of a single invariant point, but also that of a set of points that have some relationship with each other. The method selected for graph matching is GTM [4] (Graph Transformation Matching). This algorithm needs as input a list of the positions of the matched points, (x1, y1) (x2, y2). This list is calculated as follows:

– A KD-tree is built with all points of the base image (all the points that form the representative graph); this tree structure allows relatively quick insertions and searches in a k-dimensional space (128 dimensions in our case, the SIFT descriptor dimension).
– For each region of the current image (image sub-graphs):
  • For each point in the region, the closest one in the KD-tree is found. If its Euclidean distance is below a threshold, we have found a match.

Once this step is completed, we have a list of matched points that describe a common region between the two images. As this matching may result in many false positives, we use the GTM algorithm to compare the structure of the region in both images and eliminate those false positives in the matching. GTM is a point matching algorithm based on attributed graphs [4] that uses information from the local structure (graph) for the treatment of outliers. The graph constructed for comparison is called K-Nearest-Neighbours; it is built by adding an edge to the adjacency matrix for the pair (i, j) if node j is one of the k nearest neighbors of node i and if the Euclidean distance between the two points is also less than the average distance of all points on the graph. We also note that the omnidirectional images are circular, so we calculate two Euclidean distances: the normal one, and another computed by adding the distance between the right point and the end of the image to the distance between the left point and the beginning of the image. The final distance is the smaller of the two. Finally, if a node does not have k edges, it is disconnected until we finish the graph construction. Once the two graphs from the two images have been constructed, the algorithm iteratively eliminates the correspondences that distort neighborhood relations. To do this, the correspondence considered an outlier is selected, the two nodes (invariant points) that form the match (false positive) are removed from their respective graphs, along with the references to those nodes in the two adjacency matrices, and the two graphs are then recalculated. The process continues until the residual matrix (the difference between the adjacency matrices of the two graphs) is zero. At this point we consider that the algorithm has found a consensus graph. Once this is obtained, the disconnected nodes are eliminated from the initial matching, obtaining a match in which the false positives are removed.
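The nearest-neighbour matching over SIFT descriptors and the circular point distance can be sketched as follows, using SciPy's cKDTree; the matching threshold is an arbitrary illustrative value, and the GTM iteration itself is not reproduced here.

```python
import numpy as np
from scipy.spatial import cKDTree

def match_descriptors(base_desc, region_desc, threshold=0.35):
    """For every descriptor of a region, find its nearest neighbour among the
    base-image descriptors; keep the pair if the distance is below a threshold."""
    tree = cKDTree(base_desc)                 # 128-D SIFT descriptors of the base image
    dists, idx = tree.query(region_desc, k=1)
    return [(i, int(j)) for i, (d, j) in enumerate(zip(dists, idx)) if d < threshold]

def circular_distance(p, q, image_width):
    """Euclidean distance that treats the horizontal axis as circular,
    taking the shorter of the direct and the wrap-around x-difference."""
    dx = abs(p[0] - q[0])
    dx = min(dx, image_width - dx)
    return np.hypot(dx, p[1] - q[1])

# matches = match_descriptors(base_sift, region_sift)
# d = circular_distance((x1, y1), (x2, y2), image_width=2048)
```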
4 Topological Mapping

The results of the previous section let us know whether two images can be seen as part of the same environmental region (i.e., they have been taken at nearby positions in the real
world). Using this method for image comparison we have built an algorithm capable of creating topological maps from a sequence of images that form a path in the real world. Our algorithm does not require a database, because the map is created as each new image is captured. The algorithm builds topological maps in the form of undirected graphs that can be used in topological SLAM applications. The topological map consists of nodes representing a particular area of the environment and an adjacency matrix that shows the relationships between them. A node can consist of any number of images, but always has a representative image, which is the one that has the most regions in common with the rest of the images belonging to the node. In order to calculate the node representative and its minimum matching percentage, we use the formulas:

$$R = \arg\max_{i \in I} \left( \min_{j \in I,\, i \neq j} C(i, j) \right) \qquad (1)$$

$$N_R = \max_{i \in I} \left( \min_{j \in I,\, i \neq j} C(i, j) \right) \qquad (2)$$

These equations appeared in [24] and use the number of matched points in the function C(i, j). In order to use our previous method, we have modified this function as follows:

$$C(i, j) = \frac{\text{Number of matched points}}{\min(NP_i, NP_j)} \qquad (3)$$
where NP_k is the number of points in image k. We select the image with the smaller number of points since at most this number of points can be matched, which otherwise could prevent reaching 100% in the equation. Unlike the algorithm proposed in [24], we add a second threshold (Th_nodes). This threshold is lower than Th_min (used to compare the current image with the representative of the current node) and is used in the comparison with the other nodes. In this way we can improve the creation of maps, as seen in Figure 2. The algorithm builds the topological map as follows:

1. When the robot captures a new image, it checks whether the image belongs to the region that defines the current node. For this, the new image is compared to the node representative and, if the matching percentage passes a certain threshold (Th_min), it is added to the node.
2. If the image does not exceed the threshold, it is compared with all the node representatives to find the node whose percentage is highest. The threshold for comparison with other nodes (Th_nodes) is smaller than for the current node. In this way we are more restrictive when it comes to adding images to the current node and more permissive for loop closures.
3. If no match is found, we establish that we have seen a new region, so a new node is created and an edge is added between this node and the previous one.
4. In any case, if we add an image to an existing node and Th_nodes ≤ C(i, j) ≤ N_R, the node representative is re-calculated.
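The map-building loop can be summarised as in the following sketch, where compare(i, j) stands for the matching percentage C(i, j) computed with the graph-matching procedure above; the threshold values are placeholders, and the representative of a node is kept fixed for simplicity instead of being re-computed with Eqs. (1)–(2).

```python
def build_topological_map(num_images, compare, th_min=0.5, th_nodes=0.4):
    """Incremental topological mapping with two thresholds (a sketch).
    Each node is a list of image indices; nodes[i][0] plays the role of
    its representative image."""
    nodes, edges = [[0]], set()
    current = 0
    for img in range(1, num_images):
        if compare(img, nodes[current][0]) >= th_min:     # same region as current node
            nodes[current].append(img)
            continue
        scores = [compare(img, n[0]) for n in nodes]      # compare with all representatives
        best = max(range(len(nodes)), key=lambda k: scores[k])
        if scores[best] >= th_nodes:                      # loop closure: revisit an old node
            nodes[best].append(img)
            if best != current:
                edges.add((min(current, best), max(current, best)))
            current = best
        else:                                             # unseen region: create a new node
            nodes.append([img])
            edges.add((current, len(nodes) - 1))
            current = len(nodes) - 1
    return nodes, edges
```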
Fig. 2. The path taken is shown in the image above (from the authors' page [1]). The two graphs are the results of the two segmentation methods used: middle, JSEG; bottom, EGBIS.
5 Results

This section shows the results of applying the whole algorithm to the set of images described in [1], which are available for download from the authors' website. The images
are omnidirectional, with a resolution of 2048x618. The tests were conducted on the images of the first route, the first 3,000 of the data set. Omnidirectional images have special features not found in other images: since the images cover an angle of 360°, it is possible to find two images of the same scene containing objects in different positions or objects that have been cut. In Figure 2 we can see the graph representing the topological map created by our algorithm. Due to the large number of images, only one of every 15 was processed. The circles in the image represent the positions of the node representatives, and the arcs are the relations between the nodes (the edges of the graph). The thresholds for considering that a pair of images belongs to the same region have been estimated empirically. As we can see, even though the path is composed of several laps in the environment, the topological map does not allocate different nodes to each of the laps made, but recognizes the existing loop closures and combines images of the same area captured at different times into a single node. However, there are some situations, visible in the bottom image, where the algorithm has created multiple nodes for the same area or where there are many edges. In some cases this is because the pictures were not taken consecutively, but in other cases it is due to changes in light. In order to see how the new algorithm behaves in such situations, in the region marked as "sample 1" (Figure 3) one of every 2 images was taken. Looking at these two images, we can see that they do not have the same illumination, and in the second image the tree covers parts that could be useful for identification (occlusion of many points of interest). Something similar occurs in the tunnel area, but in this case images inside the tunnel have very dark regions, so fewer feature points are detected and can be matched. Nevertheless, the algorithm has created fewer nodes, because the regions that appear at the beginning and end of the image have been merged into a single region, and the result of the comparison between a pair of images taken from a different angle improved significantly.
Fig. 3. From sample 1 (tree). Above: representative image of node 2. Below: representative image of node 28.
Comparing the two image segmentation algorithms (see Figure 2), we can observe that the results obtained with the JSEG algorithm seem better than those obtained with EGBIS. They are better because there are fewer crossings between nodes; the ideal map would be one with no crossings. Nevertheless, we need to formulate a better way to determine that one map is better than another. JSEG gets a better map but is slower than EGBIS, so we can conclude that JSEG can be used when there is no time restriction, and EGBIS when such a restriction exists. Moreover, the use of two thresholds in our algorithm has improved the topological map, yielding fewer nodes (connecting more regions) and fewer edges (the relations between nodes become clearer).
6 Conclusions

In this paper we have presented a new method for creating topological maps. The method consists of the combination of image segmentation into regions with the application of feature detectors, and then the creation of a graph structure from the extracted features. Thus, for the matching of two images we take into account both the extracted features and the structure (graph) formed by these features. We then construct the topological map using this comparison method (with two thresholds), obtaining a non-directed graph that divides the environment into regions and indicates the relationships between different areas. We have focused this work on comparing two different image segmentation methods in order to assess the efficiency and effectiveness of the whole algorithm. We can conclude that the second image segmentation method, which is faster to compute, provides a good enough result; if we can spend some additional time, we obtain better results with JSEG. Both methods have problems with illumination changes. We have also extended the initial segmentation algorithm in order to take into account the circular characteristics of the omnidirectional image. During the experimentation phase we constructed a topological graph that described the environment captured during a long path with several loop closures (several laps). As we have seen, the environment is divided into several areas, most of them unique, that is, described by a single node. In the cases where more than one node appeared, we observed changes due to illumination and occlusions. As future work, we plan to improve the algorithm in order to reduce its sensitivity to changes in illumination and occlusions. We also intend to make a more advanced study of the behavior of the algorithm using different features (SIFT, SURF, Harris-Affine, Hessian-Affine). We also want to find a better way to represent topological maps which allows maps to be compared.
Acknowledgements

This work has been supported by grant DPI2009-07144 from Ministerio de Ciencia e Innovación of the Spanish Government.
References 1. Smith, M., Baldwin, I., Churchill, W., Paul, R., Newman, P.: The New College Vision and Laser Data Set. I. J. Robotic Res. 28(5), 595–599 (2009) 2. Deng, Y., Manjunath, B.S.: Unsupervised Segmentation of Color-Texture Regions in Images and Video. IEEE Trans. Pattern Anal. Mach. Intell. 23(8), 800–810 (2001) 3. Felzenszwalb, P.F., Huttenlocher, D.P.: Efficient Graph-Based Image Segmentation. International Journal of Computer Vision 59(2), 167–181 (2004) 4. Aguilar, W., Frauel, Y., Escolano, F., Elena Martinez-Perez, M., Espinosa-Romero, A., Lozano, M.A.: A robust Graph Transformation Matching for non-rigid registration. Image Vis. Comput. 27(7), 897–910 (2009) 5. Joo, H., Jeong, Y., Duchenne, O., Ko, S.-Y., Kweon, I.-S.: Graph-based Robust Shape Matching for Robotic Application. In: IEEE Int. Conf. on Robotics and Automation, Kobe, Japan (May 2009) 6. Azad, P., Asfour, T., Dillmann, R.: Combining Harris Interest Points and the SIFT Descriptor for Fast Scale-Invariant Object Recognition. In: IEEE Int. Conf. on Intelligent Robots and Systems, St. Lois, USA (October 2009) 7. Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Van Gool, L.: A comparison of affine region detectors. IJCV 65(1/2), 43–72 (2005) 8. Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal regions. BMVC, 384–393 (2002) 9. Mikolajczyk, K., Schmid, C.: Scale and Affine invariant interest point detectors. IJCV 60(1), 63–86 (2004) 10. Chen, X., Huang, Q., Hu, P., Li, M., Tian, Y., Li, C.: Rapid and Precise Object Detection based on Color Histograms and Adaptive Bandwidth Mean Shift. In: IEEE Int. Conf. on Intelligent Robots and Systems, St. Lois, USA (October 2009) 11. Wu, J., Christensen, H.I., Rehg, J.M.: Visual Place Categorization: Problem, Dataset, and Algoritm. In: IEEE Int. Conf. on Intelligent Robots and Systems, St. Lois, USA (October 2009) 12. Liu, M., Scaramuzza, D., Pradalier, C., Siegwart, R., Chen, Q.: Scene recognition with Omnidirectional Vision for Topological Map using Lightweight Adaptive Descriptors. In: IEEE Int. Conf. on Intelligent Robots and Systems, St. Lois, USA (October 2009) 13. Vaquez-Martin, R., Marfil, R., Bandera, A.: Affine image region detection and description. Journal of Physical Agents 4(1), 45–54 (2010) 14. Canny, J.F.: A computational approach to edge detection. IEEE Transaction on Pattern Analysis and Machine Intelligence 8(6), 679–698 (1986) 15. Smith, S.M., Brady, J.M.: SUSAN - A New Approach to Low Level Image Processing. International Journal of Computer Vision 23, 45–78 (1995) 16. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 60(2), 91–110 (2004) 17. Bay, H., Tuytelaars, T., Gool, L.V.: Surf: Speeded up robust features. Computer Vision and Image Understanding (CVIU) 110(3), 346–359 (2008) 18. Smith, R.C., Cheeseman, P.: On the representation and estimation of spatial uncertainty. Int. J. of Robotics Research 5(4), 56–68 (1986) 19. Smith, R., Self, M., Cheeseman, P.: Estimating uncertain spatial relationships in robotics. In: Cox, I.J., Wilfong, G.T. (eds.) Autonomous Robot Vehicles, pp. 167–193. Springer, Heidelberg (1990) 20. Julier, S., Uhlmann, J.K.: A counter example to the theory of simulataneous localization and map building. In: ICRA, pp. 4238–4243. IEEE, Los Alamitos (2001)
21. Montemerlo, M., Thrun, S.: Simultaneous localization and mapping with unknown data association using FastSLAM. In: Proc. of Intl. Conf. on Robotics and Automation, Taiwan, vol. 2, pp. 1985–1991 (2003) 22. Montemerlo, M., Thrun, S., Koller, D., Wegbreit, B.: FastSLAM: A factored solution to the simultaneous localization and mapping problem. In: AAAI/IAAI, pp. 593–598 (2002) 23. Diosi, A., Kleeman, L.: Advanced Sonar and Laser Range Finder Fusion for Simultaneous Localization and Mapping. In: Proc. of Intl. Conf. on Intelligent Robots and Systems, Japan, vol. 2, pp. 1854–1859 (2004) 24. Valgren, C., Lilienthal, A.J., Duckett, T.: Incremental Topological Mapping Using Omnidirectional Vision. In: IROS, pp. 3441–3447. IEEE, Los Alamitos (2006) 25. Motard, E., Raducanu, B., Cadenat, V., Vitrià, J.: Incremental On-Line Topological Map Learning for A Visual Homing Application. In: ICRA, pp. 2049–2054. IEEE, Los Alamitos (2007) 26. Goedeme, T., Nuttin, M., Tuytelaars, T., Van Gool, L.J.: Omnidirectional Vision Based Topological Navigation. International Journal of Computer Vision 74(3), 219–236 (2007) 27. Valgren, C., Duckett, T., Lilienthal, A.J.: Incremental Spectral Clustering and Its Application To Topological Mapping. In: ICRA, pp. 4283–4288. IEEE, Los Alamitos (2007)
Automatic Image Annotation Using Multiple Grid Segmentation

Gerardo Arellano, Luis Enrique Sucar, and Eduardo F. Morales

Computer Science Department, Instituto Nacional de Astrofísica, Óptica y Electrónica, Luis Enrique Erro 1, Tonantzintla, Puebla, México
{garellano,esucar,emorales}@inaoep.mx
Abstract. Automatic image annotation refers to the process of automatically labeling an image with a predefined set of keywords. Image annotation is an important step of content-based image retrieval (CBIR), which is relevant for many real-world applications. In this paper, a new algorithm based on multiple grid segmentation, entropy-based information and a Bayesian classifier, is proposed for an efficient, yet very effective, image annotation process. The proposed approach follows a two step process. In the first step, the algorithm generates grids of different sizes and different overlaps, and each grid is classified with a Naive Bayes classifier. In a second step, we used information based on the predicted class probability, its entropy, and the entropy of the neighbors of each grid element at the same and different resolutions, as input to a second binary classifier that qualifies the initial classification to select the correct segments. This significantly reduces false positives and improves the overall performance. We performed several experiments with images from the MSRC-9 database collection, which has manual ground truth segmentation and annotation information. The results show that the proposed approach has a very good performance compared to the initial labeling, and it also improves other scheme based on multiple segmentations. Keywords: Automatic image annotation, multiple grid segmentation, classification.
1 Introduction
Most recent work on image labeling and object recognition is based on a sliding window approach [1], avoiding in this way the difficult problem of image segmentation. However, the rectangular regions used in these approaches do not provide, in general, a good spatial support of the object of interest, resulting in object features that are not considered (outside the rectangle) or incorrectly included (inside the rectangle). Recently, it has been shown [2] that a good spatial support can significantly improve object recognition. In particular, they demonstrate that by combining different image segmentation techniques, we can obtain
better results, with respect to a sliding window approach or any of the segmentation techniques by themselves. They found that for all the algorithms considered, multiple segmentations drastically outperform the best single segmentation and also that the different segmentation algorithms are complementary, with each algorithm providing better spatial support for different object categories. So it seems that by combining several segmentation algorithms, labeling and object classification can be improved. In [2] the authors do not solve the problem of how to combine automatically the several segmentation methods. More recently, Pantofaru et al. [3] proposed an alternative approach to combine multiple image segmentation for object recognition. Their hypotheses is that the quality of the segmentation is highly variable depending on the image, the algorithm and the parameters used. Their approach relies on two principles: (i) groups of pixels which are contained in the same segmentation region in multiple segmentations should be consistently classified, and (ii) the set of regions generated by multiple image segmentations provides robust features for classifying these pixel groups. They combine three segmentation algorithms, Normalized Cuts [4], Mean-Shift [5] and Felzenszwalb and Huttenlocher [6], with different parameters each. The selection of correct regions is based on Intersection of Regions, pixels which belong to the same region in every segmentation. Since different segmentations differ in quality, they assume that the reliability of a region’s prediction corresponds to the number of objects it overlaps with respect to the class labels. They tested their method with the MSRC 21 [7] and Pascal VOC2007 data sets [8], showing an improved performance with respect to any single segmentation method. A disadvantage of the previous work is that the segmentation algorithms utilized are computationally very demanding, and these are executed several times per image, resulting in a very time–consuming process, not suitable for real time applications such as image retrieval. Additionally, simple segmentation techniques, such as grids, have obtained similar results for region–based image labeling than those based on more complex methods [9]. We propose an alternative method based on multiple grid segmentation for image labeling. The idea is to take advantage of multiple segmentations to improve object recognition, but at the same time to develop a very efficient image annotation technique applicable in real–time. The method consists of two main stages. In the first stage, an image is segmented in multiple grids at different resolutions, and each rectangle is labeled based on color, position and texture features using a Bayesian classifier. Based on the results of this first stage, in the second stage another classifier qualifies each segment, to determine if the initial classification is correct or not. This second classifier uses as attributes a set of statistical measures from the first classifiers for the region of interest and its neighbors, such as the predicted class probability, its entropy, and the entropy of the neighbors of each grid element at the same and different resolutions. The incorrect segments according to the second classifier are discarded, and the final segmentation and labeling consists of the union of only the correct segments.
We performed several experiments with images from the MSRC-9 database collection, which has manual ground truth segmentation and annotation information. The results show that the proposed approach is simple and efficient, and at the same time it has a very good performance, in particular reducing the false positives compared to the initial annotation. We also compared our approach for selecting segments based on a second classifier, against the method of [3] which uses region intersection, with favorable results. The rest of the paper is organized as follows. Section 2 presents an overall view of the proposed approach. Section 3 describes the segmentation process and the feature extraction mechanism. Section 3.3 describes the first classifier used to label the different segments in the images, while Section 4 explains the second classifier used to remove false positives and improve the overall performance of the system. In Section 5, the experiments and main results are given. Section 6 summarizes the main conclusions and provides future research directions.
2 Image Annotation Algorithm
The method for image annotation based on multiple grid segmentation consists of two main phases, each with several steps (see Figures 1 and 2):

Phase 1 – Annotation
1. Segment the image using multiple grids at different resolutions.
2. Extract global features for each segment: color, texture and position.
3. Classify each segment based on the above attributes, using a Bayesian classifier previously trained based on a set of labeled images.

Phase 2 – Qualification
1. For each segment obtain a set of statistical measures based on the results of the first classifier: class probability, entropy of the region, and entropy of its neighboring regions at the same and different resolutions.
2. Based on the statistical measures, use another binary Bayesian classifier to estimate if the original classification of each segment is correct/incorrect.
3. Discard incorrect segments (based on certain probability threshold).
4. Integrate the correct segments for the final image segmentation and labeling.

In the following sections each phase is described in more detail.
3 Phase 1: Initial Annotation
Before applying our classifiers for image annotation, we perform two operations on the images: (i) segmentation and (ii) features extraction.
Fig. 1. Phase 1: annotation. For each image in the database we compute multiple grid segmentations. For each segment we extract position, color and texture features; and use these features to classify each segment.
3.1 Image Segmentation
Carboneto [9] found that a simple and fast grid segmentation algorithm can have better performance than a computationally expensive algorithm like Normalized Cuts [4]. The effectiveness of grid segmentations depends on the size of the grid and on the nature of the images. On the other hand, combining several segmentation algorithms tends to produce better performance, as suggested in [2] and shown in [3]. In this paper we propose to use several grid segmentations, label each segment and combine the results, integrating two relevant ideas: (i) grid segmentation has a very low computational cost with reasonable performance, so it can be effectively used in image retrieval tasks, and (ii) using grids of different sizes can produce relevant information for different regions and images of different nature. So, rather than guessing which is the right grid size for each image, we use a novel approach to combine the information of the different labels assigned to each grid to improve the final segmentation and labeling results. In this paper we use three grid segmentations of different sizes, but the approach can be easily extended to include additional grids.
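A multiple grid segmentation of this kind is straightforward to generate; the sketch below produces the rectangular cells for several grid sizes (the 64/32/16-pixel sizes match the grids used later in the experiments, everything else is an illustrative choice).

```python
def grid_cells(height, width, cell_size):
    """Return the (top, left, bottom, right) boxes of a regular grid;
    border cells are clipped to the image limits."""
    return [(r, c, min(r + cell_size, height), min(c + cell_size, width))
            for r in range(0, height, cell_size)
            for c in range(0, width, cell_size)]

def multiple_grids(height, width, cell_sizes=(64, 32, 16)):
    """One list of cells per grid resolution."""
    return {size: grid_cells(height, width, size) for size in cell_sizes}

grids = multiple_grids(198, 320)
print({size: len(cells) for size, cells in grids.items()})   # e.g. {64: 20, 32: 70, 16: 260}
```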
3.2 Features
There is a large number of image features that have been proposed in the literature. In this paper we extract features based on position, color and texture for each image segment, but other features could be incorporated. As position features, we extract for each image segment the average and standard deviations of the coordinates, x and y, of the pixels in the rectangular region.
Fig. 2. Phase 2: qualification. For all the classified segments, we obtain the probability of the label, its entropy, the neighborhoods entropy and the intersection entropy; as input to another classifier to reinforce the confidence of each label and then we finally select the correct segments and merge segments with the same class.
Color features are the most common features used in image retrieval. In this paper we use the average and standard deviation of the three RGB channels for each image segment (6 features). In the CIE-LAB color space the numeric differences between colors agree more consistently with human visual perception. Thus, we also include the average, standard deviation and skewness of the three channels of the CIE-LAB space for each image segment. The perception of textures also plays an important role in content-based image retrieval. Texture is defined as the statistical distribution of spatial dependences of the gray level properties [10]. One of the most powerful tools for texture analysis is the Gabor filter [11], a linear filter used in image processing for edge detection. The frequency and orientation representations of Gabor filters are similar to those of the human visual system, and they have been found to be particularly appropriate for texture representation and discrimination. Gabor filters can be viewed as the product of a low-pass (Gaussian) filter at different orientations and scales. In this paper we apply Gabor filters with four orientations, θ = [0, 45, 90, 135], and two different scales, obtaining 8 filters in total.
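For illustration, the per-segment colour statistics and a small Gabor filter bank can be computed with NumPy, SciPy and scikit-image as below; the Gabor frequencies and the way the filter responses are summarised are assumptions rather than the authors' exact settings.

```python
import numpy as np
from scipy.stats import skew
from skimage.color import rgb2lab
from skimage.filters import gabor

def cell_features(rgb_cell, frequencies=(0.1, 0.3)):
    """Colour statistics (RGB and CIE-LAB) plus Gabor texture energies for one
    grid cell, given as an H x W x 3 float array with values in [0, 1]."""
    pixels = rgb_cell.reshape(-1, 3)
    feats = list(pixels.mean(axis=0)) + list(pixels.std(axis=0))      # RGB mean / std
    lab = rgb2lab(rgb_cell).reshape(-1, 3)
    feats += list(lab.mean(axis=0)) + list(lab.std(axis=0)) + list(skew(lab, axis=0))
    gray = rgb_cell.mean(axis=2)
    for freq in frequencies:                                          # 2 scales
        for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):        # 4 orientations -> 8 filters
            real, imag = gabor(gray, frequency=freq, theta=theta)
            feats.append(np.sqrt(real ** 2 + imag ** 2).mean())       # mean filter energy
    return np.array(feats)
```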
3.3 Base Classifier
The previously described features are used as input to a Naive Bayes classifier. This classifier assumes that the attributes are independent of each other given the class, so, using Bayes' theorem, the class probability given the attributes, P(C_i | A_1, ..., A_n), is given by:

$$P(C_i \mid A_1, \ldots, A_n) = \frac{P(C_i)\, P(A_1 \mid C_i) \cdots P(A_n \mid C_i)}{P(A_1, \ldots, A_n)} \qquad (1)$$

where C_i is the i-th value of the class variable and A_1, ..., A_n are the feature variables. For each segment we obtain the probabilities of all the classes and select the class value with maximum a posteriori probability; in this way we label all the segments created by the multi-grid segmentation process.
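Training and applying such a classifier is straightforward with standard libraries; the sketch below uses scikit-learn's Gaussian Naive Bayes (the paper does not state which conditional distribution is assumed for the continuous features, so the Gaussian form is our choice) and returns both the MAP label and the full posterior, which the second phase needs.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def train_base_classifier(features, labels):
    """features: (n_segments, n_features) array; labels: class id per training segment."""
    clf = GaussianNB()
    clf.fit(features, labels)
    return clf

def annotate_segments(clf, features):
    """Return, for every segment, the MAP class and the posterior P(C_i | A_1..A_n)."""
    posteriors = clf.predict_proba(features)      # rows sum to 1 over the classes
    map_class = posteriors.argmax(axis=1)
    return map_class, posteriors
```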
4 Phase 2: Annotation Qualification
In the second phase we use a second classifier to qualify the classes given by the first classifier and improve the performance of the system. This is a binary classifier that decides whether the predicted label of the first classifier is correct or not, given additional contextual information. We compute the likelihood of the predicted class using another Naive Bayes classifier. As attributes for this second classifier we use:

Class Probability: The probability of the label given by the first classifier.

Entropy: The entropy of each segment, evaluated from the probabilities of the labels predicted by the first classifier and defined as:

$$H(s) = -\sum_{i=1}^{n} P(C_i) \log_2 P(C_i) \qquad (2)$$

where P(C_i) is the likelihood of the prediction for class i and n is the total number of classes.

Neighborhood Entropy: We also extract the entropy of the segment's neighbors and add this information if they have the same class:

$$H(v) = \frac{1}{|v_c|} \sum_{x \in V_c} H(x)\, \delta(\mathrm{Class}(x), \mathrm{Class}(v)) \qquad (3)$$

where V_c is the set of all neighboring segments, H(x) is the entropy of a neighboring segment with the same class as v, the sum is normalized by the number of neighbors with the same class, |v_c|, and

$$\delta(x, v) = \begin{cases} 1 & \text{if } x = v \\ 0 & \text{otherwise} \end{cases}$$

This is illustrated in Figure 3, where the class label is indicated with different colors. In this case, three neighbors have the same class as the central grid cell, while one neighbor has a different class. The value of the attribute will be the normalized sum of the entropies of the three neighbors with the same class.

Intersection Entropy: We also consider information from the cells of the other grids that intersect the current segment. In this case we use Equation 3 applied to the segments of the different grids. For the three grids considered in the experiments, the cells of the largest grid size have 20 neighbors each, the middle-size segments have 5 neighbors each, and the smallest segments have only two neighbors. This is illustrated in Figure 4. Different grid segmentations and intersection schemes could be used as well.
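These attributes are simple functions of the first-phase posteriors; a sketch of the entropy and neighborhood-entropy computations is given below (the neighbor bookkeeping is left abstract: `neighbors` is assumed to map a segment to the ids of its 4-neighbors in the same grid or of its intersecting cells in the other grids).

```python
import numpy as np

def entropy(posterior):
    """H(s) = -sum_i P(C_i) log2 P(C_i) for one segment's class posterior."""
    p = np.asarray(posterior)
    p = p[p > 0]                                  # avoid log2(0)
    return float(-(p * np.log2(p)).sum())

def neighborhood_entropy(seg, labels, posteriors, neighbors):
    """Average entropy of the neighbors sharing the segment's predicted class
    (Eq. 3); returns 0 if no neighbor has the same class."""
    same = [n for n in neighbors[seg] if labels[n] == labels[seg]]
    if not same:
        return 0.0
    return float(np.mean([entropy(posteriors[n]) for n in same]))
```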
Fig. 3. Neighbors in the same grid. We consider the top, bottom, right and left neighbors and obtain their entropy, considering only the neighbors that have the same class as the current segment (best seen in color).
Fig. 4. Neighbors in the other grids. The image is segmented into grids of different sizes, so each segment has a different number of intersecting neighbors. For the three grids we used in the experiments, the largest segments have 20 neighbors, the middle-size segments have 5 neighbors, and the smallest segments have only 2 neighbors (best seen in color).
Combined Attributes: We also incorporated new attributes defined as combinations of some of the previous attributes, namely the entropy of the segment plus the neighborhood entropy, and the entropy of the segment plus the intersection entropy. Incorporating additional features that combine existing ones can sometimes produce better results; other features and other combinations could be used as well. The second classifier is shown in Figure 5. This classifier is used to qualify each segment and filter out the incorrect ones (those with a low probability of being correct are discarded).
Fig. 5. Second classifier. Graphical model for the qualification classifier, showing the features used to decide if segments were correctly or incorrectly classified.
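The qualification step can be sketched as follows; the trained binary classifier is passed in as a callable returning P(correct | attributes), the 0.9 acceptance threshold quoted in the experiments is used as the default, and the feature ordering is an assumption.

```python
import numpy as np

def qualify_segments(segments, p_correct, threshold=0.9):
    """Keep only segments whose predicted label the second classifier accepts.

    Each segment is a dict with the attributes described above; p_correct is
    the (hypothetical) trained qualifier returning P(label is correct | x).
    """
    accepted = []
    for s in segments:
        x = np.array([s['class_prob'],
                      s['entropy'],
                      s['neigh_entropy'],
                      s['inter_entropy'],
                      s['entropy'] + s['neigh_entropy'],   # combined attributes
                      s['entropy'] + s['inter_entropy']])
        if p_correct(x) >= threshold:
            accepted.append(s)
    return accepted
```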
5 Experiments and Results
We performed several experiments using the MSRC-9 database with 9 classes: grass, cow, sky, tree, face, airplane, car, bicycle and building. All images were resized to 320x198 or 198x320, depending on the original image size, and segmented using three different grids with cell sizes of 64, 32 and 16 pixels. The database has 232 images; 80% were randomly selected for training and 20% for testing, for both classifiers. Position, color and texture features were extracted for the first classifier, and the features described in Section 4 for the second classifier.
Fig. 6. The figures show how a sample of 100 segments from different classes are classified as correct (red triangles) or incorrect (blue circles), based on the segment's local entropy (left) and the intersection entropy (right). Although there is not a perfect separation, we can observe a certain tendency in both graphs, which confirms that these features can be used as indicators to distinguish correct vs. incorrect region classification.
Fig. 7. Comparison of the results with only the first classifier and after the second classifier is applied. The graphs show the TP and FP generated by both classifiers.
We first analyzed some of the features used in the second classifier. We found that the entropy and the intersection entropy measures work well as discriminants between segments that are correctly or incorrectly classified. This is shown in Figure 6, where the left graph shows the division between correctly and incorrectly classified segments using entropy, while the right graph shows the separation considering the intersection entropy. Then we compared the performance of the base classifier against the performance after the segments are selected based on the qualification classifier. The results, in terms of true positives (TP) and false positives (FP), are summarized in Figure 7.
Fig. 8. Comparison of segment selection based on our approach vs. region intersection in terms of true positives (TP) and false positives (FP)
Fig. 9. Comparison of segment selection based on our approach vs. region intersection in terms of precision, recall and accuracy
Fig. 10. Sample images. Top: original image. Middle: coarse grid. Bottom: intermediate grid. Correct labels are in black and incorrect ones in red (best seen in color).
Although there is a slight reduction in true positives, there is a great reduction in false positives, showing that our method can effectively eliminate incorrect segments. The reduction in TP is not a problem, as there is redundancy among segments due to the use of multiple grids. Finally, we compared our method for segment qualification based on a second classifier against the method proposed in [3], which is based on the intersection of regions. For this we used the same data and incorporated the intersection criterion into our multiple grid segmentations, instead of the second classifier. The results in terms of different performance criteria are summarized in Figures 8 and 9. In Figure 8 we compare the number of true positives (TP) and false positives (FP) for our approach vs. region intersection, and in Figure 9 we compare them in terms of precision, recall and accuracy. The threshold used for selecting the correct segments in the second classifier is 0.9. We observe that our method for region qualification outperforms the method based on intersections in all the performance measures, with significant differences in TP and precision. Examples of region labeling and qualification for 4 images with two different grid sizes are shown in Figure 10. Our current implementation in MatLab requires about 30 seconds per image for the complete process (both phases) on a PC with a dual core at 2.4 GHz and 4 GB of RAM. We expect that an optimized C/C++ implementation can reduce this time by at least an order of magnitude. These experiments show that our novel approach, using a second classifier with contextual information based on statistical measures, produces significant improvements over the initial labeling and is more effective than the approach based on intersections.
6 Conclusions and Future Work
In this paper we proposed an automatic image annotation algorithm based on a multiple grid segmentation approach. Images are segmented in a very efficient way using a simple grid segmentation with different grid sizes. Features based on position, color and texture are extracted and used for an initial labeling of the different grid segments. This paper introduces a novel approach for combining information from different segments using a second classifier and features based on entropy measures of the neighboring segments. It is shown that the second classifier significantly improves the initial labeling, decreasing the false positive rate, and that it performs better than a selection based on region intersection. As future work we plan to combine the correct regions of the different grids to produce a final single segmentation and labeling, and to test our approach on other image databases.
Acknowledgments. We would like to thank the members of the INAOE-TIA research group for their comments and suggestions. This work was partially supported by CONACyT under grant 271708.
References
1. Viola, P., Jones, M.: Robust real-time face detection. Int. J. of Comp. Vision (2001)
2. Malisiewicz, T., Efros, A.A.: Improving spatial support for objects via multiple segmentations. In: BMVC (2007)
3. Pantofaru, C., Schmid, C.: Object recognition by integrating multiple image segmentations. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part III. LNCS, vol. 5304, pp. 481–494. Springer, Heidelberg (2008)
4. Shi, J., Malik, J.: Normalized cuts and image segmentation. In: Proc. CVPR, pp. 731–743 (1997)
5. Comaniciu, D., Meer, P.: Mean shift: A robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24, 603–619 (2002)
6. Felzenszwalb, P., Huttenlocher, D.: Efficient graph-based image segmentation. Int. Journal of Computer Vision 59, 167–181 (2004)
7. Shotton, J., Winn, J., Rother, C., Criminisi, A.: The MSRC 21-class object recognition database (2006)
8. Everingham, M., Van Gool, L., Williams, C., Winn, J., Zisserman, A.: The PASCAL VOC 2007 (2007)
9. Carbonetto, P.: Unsupervised statistical models for general object recognition. Master's thesis, The University of British Columbia (2003)
10. Aksoy, S., Haralick, R.: Textural features for image database retrieval. In: CBAIVL 1998, p. 45. IEEE Computer Society, Los Alamitos (1998)
11. Chen, L., Lu, G., Zhang, D.: Content-based image retrieval using Gabor texture features. In: PCM 2000, Sydney, Australia, pp. 1139–1142 (2000)
Spatio-temporal Image Tracking Based on Optical Flow and Clustering: An Endoneurosonographic Application

Andrés F. Serna-Morales¹, Flavio Prieto², and Eduardo Bayro-Corrochano³

¹ Department of Electrical, Electronic and Computer Engineering, Universidad Nacional de Colombia, Sede Manizales, Carrera 27 No. 64-60, Manizales (Caldas), Colombia. Tel.: +57 (6) 8879300, Ext.: 55798. [email protected]
² Department of Mechanical and Mechatronics Engineering, Universidad Nacional de Colombia, Sede Bogotá, Carrera 30 No 45-03, Bogotá, Colombia. Tel.: +57 (1) 316 5000, Ext.: 14103. [email protected]
³ CINVESTAV, Unidad Guadalajara, Av. Científica 1145, El Bajío, Zapopan, Jalisco, México. Tel.: +52 (33) 37773600, Ext.: 1027. [email protected]
Abstract. In the process of rendering brain tumors from endoneurosonography, one of the most important steps is tracking the axis line of an ultrasound probe throughout successive endoscopic images. Recognizing this line is important because it allows computing its 3D coordinates using the projection matrices of the endoscopic cameras. In this paper we present a method to track an ultrasound probe in successive endoscopic images without relying on any external tracking system. The probe is tracked using a spatio-temporal technique based on optical flow and a clustering algorithm. First, we compute the optical flow using the Horn-Schunck algorithm. Second, a feature space using the optical flow magnitude and luminance is defined. Third, the feature space is partitioned into two regions using the k-means clustering algorithm. After this, we calculate the axis line of the ultrasound probe using Principal Component Analysis (PCA) over the segmented region. Finally, a motion restriction is defined over consecutive frames in order to avoid tracking errors. We have used endoscopic images from brain phantoms to evaluate the performance of the proposed method; we compare our methodology against ground truth and a color-based particle filter, and our results show that it is robust and accurate. Keywords: Endoneurosonography (ENS), endoscopic images, tracking, ultrasound probe, optical flow, clustering, Principal Component Analysis (PCA).
Corresponding author.
1 Introduction
Stereotactic neurosurgery involves the registration of both pre-operative and intra-operative medical images, typically from volumetric modalities such as MRI, CT and ultrasound, with a surgical coordinate system [2]. For most operations, the surgical coordinates are defined by a set of x, y, and z reticules on a surgical frame that is affixed to the patient's skull. In modern computer-aided surgery systems, the frame is replaced by a tracking system (usually optical, but possibly mechanical or magnetic), which the computer uses to track the surgical tools [3]. Virtual representations of the tools are rendered over an image volume to provide guidance for the surgeon. In order to obtain good results using these images, the brain must not shift prior to or during surgery. This is only possible by using minimally invasive surgery, i.e. performing the operation through a small hole in the skull [12]. Recent trends in minimally invasive brain surgery aim at the joint acquisition of endoscopic and ultrasound images, a technique that has been called endoneurosonography (ENS) [8]. Endoscopic images are very useful for minimally invasive techniques in neurosurgery. Ultrasound images are cheaper than other medical images such as CT and MRI; moreover, they are easier to obtain in an intra-operative scenario [14]. In the literature, some work has been done to extract three-dimensional information of brain tumors using endoneurosonography [3]. Due to these advantages, we are planning to use endoneurosonography for rendering internal structures of the brain, such as tumors. In this work, we solve one of the most important steps of the process: tracking an ultrasound probe in a sequence of endoscopic images and computing its pose in 3D space without using any external tool (neither optical nor magnetic). The equipment setup is as follows: the ultrasound probe is introduced through a channel in an endoscope and is seen by two endoscopic cameras. With a visual tracking system (Polaris) we calculate the 3D position of the endoscope tip, and we want to know the pose of the ultrasound probe in order to have the exact location of the ultrasound sensor. This is important because the ultrasound probe is flexible and rotates around its own axis; it can also move back and forth, since the channel is wide enough. Under these conditions, we need a robust method for tracking the ultrasound probe throughout the endoscopic images. In general, tracking methods can be divided into two main classes, top-down and bottom-up approaches. A top-down approach generates object hypotheses and tries to verify them using the image; for example, the particle filter follows the top-down approach, in the sense that the image content is only evaluated at the sample positions [10,11]. On the other hand, in a bottom-up approach the image is segmented into objects which are then used for the tracking. The spatio-temporal method proposed in this paper follows the bottom-up approach. In our approach, we take advantage of the probe movements to build a method based on optical flow, in which a clustering algorithm is applied using optical flow and luminance information with the aim of segmenting the ultrasound probe. Next, we apply Principal Component Analysis (PCA) over the segmented
region to determine the axis line of the probe. Finally, a motion restriction is defined in order to avoid tracking errors. This paper is organized as follows: Section 2 presents the methodologies used for spatio-temporal segmentation and axis line determination; Section 3 shows experiments on an endoneurosonographic database and explains the reasons to use motion restrictions in the probe tracking; finally, Section 4 summarizes and concludes the work.
2 Endoscopic Image Processing
At each time step, we obtain two color images, one from each camera of the endoscopic equipment, and one ultrasound image from the ultrasound probe (which we do not use in this work). These cameras are fully characterized, making it possible to perform a stereo reconstruction of the surgical scene. The endoscopic images have a dimension of 352 × 240 pixels. The object of interest is the ultrasound probe, which is visible to the endoscopic cameras. The probe is made of a metallic (specular) material, rotates on its own axis and moves randomly forward and backward through the endoscopic channel, so we use luminance and optical flow to perform its segmentation and tracking.
2.1 Segmenting the Ultrasound Probe
The goal is to determine the location of the ultrasound probe throughout the endoscopic images. This is done by applying a spatio-temporal method based on optical flow and a clustering algorithm [1].
Optical Flow. In this work the algorithm proposed by Horn and Schunck has been used [6]. The algorithm determines the optical flow as a solution of the following partial differential equation:

\frac{\partial L}{\partial x}\frac{dx}{dt} + \frac{\partial L}{\partial y}\frac{dy}{dt} + \frac{dL}{dt} = 0   (1)

The solution of Equation 1 is obtained by a numerical procedure for error function minimization. The error function E is defined in terms of the spatial and temporal gradients of the optical flow vector field and consists of the two terms shown in Equation 2:

E = \iint \left( \alpha^2 L_c^2 + L_b^2 \right) dx\, dy, \qquad
L_b = \frac{\partial L}{\partial x}\,u + \frac{\partial L}{\partial y}\,v + \frac{dL}{dt}, \qquad
L_c^2 = \left(\frac{\partial u}{\partial x}\right)^2 + \left(\frac{\partial u}{\partial y}\right)^2 + \left(\frac{\partial v}{\partial x}\right)^2 + \left(\frac{\partial v}{\partial y}\right)^2, \qquad
u = \frac{dx}{dt}, \quad v = \frac{dy}{dt}   (2)
L(x, y, t) represents the luminance of the image at point (x, y) at time t. To solve the minimization problem a steepest descent method is used, which is based on the computation of the gradient to determine the direction of search for the minimum. The optical flow algorithm has two main phases: in the first phase, the gradient coefficients ∂L/∂x, ∂L/∂y and dL/dt are computed from the input images; in the second phase, the optical flow vectors u and v, defined by Equation 2, are computed. Figure 1 shows the computation of the optical flow for an endoscopic image; note that the maximum values correspond to the image region where the ultrasound probe is moving.
Fig. 1. Optical flow computation using the Horn and Schunck algorithm: (a) endoscopic image from the left camera; (b) optical flow of the endoscopic image
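A compact sketch of this computation is given below. It uses the standard iterative Horn–Schunck update (local flow averages plus the brightness-constancy residual) rather than the authors' exact steepest-descent implementation, so the smoothness weight and the iteration count are illustrative assumptions.

```python
import numpy as np
from scipy.signal import convolve2d

def horn_schunck(im1, im2, alpha=15.0, n_iter=100):
    im1 = im1.astype(float); im2 = im2.astype(float)
    # Spatial and temporal luminance gradients (simple finite differences).
    Lx = convolve2d(im1, np.array([[-1, 1], [-1, 1]]) * 0.5, mode='same')
    Ly = convolve2d(im1, np.array([[-1, -1], [1, 1]]) * 0.5, mode='same')
    Lt = im2 - im1
    # Kernel that averages the flow over the neighborhood (smoothness term).
    avg = np.array([[1/12, 1/6, 1/12], [1/6, 0, 1/6], [1/12, 1/6, 1/12]])
    u = np.zeros_like(im1); v = np.zeros_like(im1)
    for _ in range(n_iter):
        u_bar = convolve2d(u, avg, mode='same')
        v_bar = convolve2d(v, avg, mode='same')
        num = Lx * u_bar + Ly * v_bar + Lt      # brightness-constancy residual
        den = alpha**2 + Lx**2 + Ly**2
        u = u_bar - Lx * num / den
        v = v_bar - Ly * num / den
    return u, v

# The feature used later is the flow magnitude: np.hypot(u, v)
```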
Clustering-Based Segmentation. Most classical image segmentation techniques rely only on a single frame to segment the image [15]. However, motion is a very useful clue for image segmentation [1], because we want to segment an ultrasound probe that rotates around its own axis and is in continuous movement. In this approach, segmentation is not done on a simple frame-by-frame basis but utilizes multiple image frames to segment the ultrasound probe. For this purpose we extract features both from the current image that has to be segmented and from neighboring image frames in the sequence. The extracted feature vectors are clustered using a clustering algorithm to determine the probe region in the image. Currently, we use two features: the first one is the image luminance, because the ultrasound probe is made of a metallic (specular) material and is therefore brighter than the other objects in the background; the second one is the Euclidean norm of the optical flow. By using the above features, we obtain both spatial and temporal information about the scene. The k-means clustering algorithm has been used in this work [13]. K-means is a numerical, unsupervised, non-deterministic and iterative method; it is simple and very fast, so in many practical applications the method has proved
to be a very effective way to produce good clustering results [9]. K-means clustering consists in partitioning the feature space into clusters using an iterative algorithm that minimizes the sum, over all clusters, of the within-cluster sums of point-to-cluster-centroid distances. In our application, the feature space is divided into two characteristic areas corresponding to two image regions: the ultrasound probe and the background. After the clustering algorithm is applied, the image is morphologically opened in order to reduce noise and eliminate small regions [7]. Figure 2 shows the result of the segmentation using the procedure described above.
Fig. 2. Spatio-temporal segmentation of the ultrasound probe: (a) endoscopic image; (b) segmentation
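A minimal sketch of this two-feature clustering, with a small hand-rolled k-means (k = 2) so the example stays self-contained; taking the cluster whose centroid has the higher luminance/flow as the probe is an assumption consistent with the description above.

```python
import numpy as np

def kmeans2(X, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), 2, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                            for k in (0, 1)])
    return labels, centers

def segment_probe(luminance, flow_magnitude):
    # Feature space: per-pixel luminance and optical-flow magnitude (both normalized).
    f1 = (luminance - luminance.min()) / (np.ptp(luminance) + 1e-12)
    f2 = (flow_magnitude - flow_magnitude.min()) / (np.ptp(flow_magnitude) + 1e-12)
    X = np.stack([f1.ravel(), f2.ravel()], axis=1)
    labels, centers = kmeans2(X)
    probe_cluster = centers.sum(axis=1).argmax()      # brighter and faster-moving cluster
    mask = (labels == probe_cluster).reshape(luminance.shape)
    # A morphological opening (e.g. scipy.ndimage.binary_opening) would follow
    # to remove noise and small regions, as described in the text.
    return mask
```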
2.2 Determining the Axis Line of the Probe
The goal is to determine the axis line of the ultrasound probe throughout the endoscopic images. After segmentation by clustering, we have an (x, y) point cloud corresponding to the pixels inside the ultrasound probe region in the image. In order to obtain the axis line of the probe, we extract the first principal component of the segmented region using PCA [4]. Principal Component Analysis (PCA) can be used to align objects (regions or boundaries) with the eigenvectors of the objects [15]. In our case, the major axis line of the ultrasound probe is determined by the first principal component. With this information, we can track the orientation of the probe in the different images of the sequence. The ultrasound probe has an elongated and thin shape, thus its longitudinal axis corresponds to the axis that we want to find. If we consider the region of the ultrasound probe in the image as a set of bivariate data (x1, x2), the longitudinal axis is the one with the greatest dispersion of the data. For this reason, computing the first principal component (PC) of the pixels in the segmented region corresponds to determining the axis line of the probe. Consider the variable x = [x1, x2], corresponding to the Cartesian coordinates of the pixels that form the segmented probe in the image, the covariance
matrix Σ and eigenvalues λ1 ≥ λ2. We can construct the linear combinations shown in Equation 3:

Y_1 = \mathbf{a}^T X = a_1 X_1 + a_2 X_2, \qquad Y_2 = \mathbf{b}^T X = b_1 X_1 + b_2 X_2   (3)

The variance Var(Y_i) and the covariance Cov(Y_i, Y_k) are shown in Equations 4 and 5, respectively. The PCs are the uncorrelated linear combinations Y_1, Y_2 that make the variances above as large as possible. Within the resulting PCA array, the first PC has the largest variance and the second PC the second largest variance [5]. In this work we only need to extract the first principal component, which corresponds to the major axis line of the ultrasound probe.

\mathrm{Var}(Y_i) = \mathbf{a}_i^T \Sigma\, \mathbf{a}_i, \quad i = 1, 2   (4)

\mathrm{Cov}(Y_i, Y_k) = \mathbf{a}_i^T \Sigma\, \mathbf{a}_k, \quad i, k = 1, 2   (5)
Figure 3 shows the axis line of the probe extracted using PCA in two endoscopic images.
Fig. 3. Axis line of the ultrasound probe in endoscopic images: (a) endoscopic left camera; (b) endoscopic right camera
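A short sketch of this step: the first eigenvector of the 2 × 2 covariance of the segmented pixel coordinates gives the axis direction, and the centroid gives the point the axis passes through.

```python
import numpy as np

def probe_axis(mask):
    # mask: boolean image with True on the segmented probe pixels.
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    centroid = pts.mean(axis=0)
    cov = np.cov((pts - centroid).T)            # 2x2 covariance matrix Sigma
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    axis_dir = eigvecs[:, -1]                   # first principal component
    angle = np.degrees(np.arctan2(axis_dir[1], axis_dir[0]))
    return centroid, angle
```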
3 Results
For our experiments, we used a database of 2900 images from brain phantoms acquired with the endoneurosonographic equipment. We applied our methodology and the traditional color-based particle filter [10] to all of the images, and we obtained numerical results by comparing against 100 images manually tracked as ground truth. Figure 4 shows the manual probe tracking performed by ground-truth segmentation through a video sequence. On the other hand, Figures 5 and 6 show the results of the probe tracking using the methodology explained in Section 2
and a classical color-based particle filter [10], respectively. In both cases, a priori knowledge of the background is not required. The axis line of the ultrasound probe is defined by two parameters: first, the centroid of the segmented region, which determines the point where the axis line must cross the probe; and second, the axis angle, which is computed in our approach using the first eigenvector obtained by Principal Component Analysis (PCA). As shown in Figures 4, 5 and 6, we compared the axis lines obtained by ground truth, our methodology and the particle filter. Errors are calculated using the Euclidean distance between the centroids and the angle difference between the axis orientations. Table 1 shows the error measurements between the axes calculated manually using ground truth, our procedure and the particle filter. The Euclidean distances between centroids (EBC) are given in pixels, and the differences between angles (DBA) in degrees. Recall that all the endoscopic images have a dimension of 352 × 240 pixels. We present the mean (μ_error), standard deviation (σ_error), minimum (min_error) and maximum (max_error) values of these errors for two endoscopic video sequences of 100 images taken at a sampling frequency of 24 Hz (24 frames per second).
Fig. 4. Probe tracking using ground-truth segmentation
3.1 Motion Restriction
The results in Figure 5 show that the tracking is correct when we have full visibility of the ultrasound probe. Unfortunately, as shown in Figure 7, in some cases the ultrasound probe can leave the field of view of the endoscopic cameras, which generates wrong tracking and causes the high error values reported in Table 1. For this reason, we included a motion restriction on the axis line of the probe. This restriction consists in defining the maximum variations in angle and displacement allowed for the probe axis line between consecutive frames of the video sequence. To achieve this, we defined the state vector shown in Equation 6, which encodes the location of the ultrasound probe axis as a variation in position and rotation of the probe at any instant of time t. In that equation dx(t), dy(t),
Fig. 5. Probe tracking using a Spatio-Temporal methodology
Fig. 6. Probe tracking using a Particle Filter
and dθ(t) are the first derivatives of the probe position with respect to x, y and θ, respectively. The first step of the motion restriction consists in estimating the ultrasound probe axis location using the methodology proposed in Section 2. This estimation is accepted if two conditions hold: (a) the Euclidean distance between the centroids of the current and previous frames does not exceed the threshold u_d; (b) the difference between the axis angles in the current and previous frames does not exceed the threshold u_θ. Heuristically, the thresholds u_d and u_θ were set to 35 pixels and 30 degrees, respectively. If these conditions are not met, it is very likely that a tracking error has occurred because the ultrasound probe is leaving the field of view of the endoscopic cameras. To solve this problem, the variation of the state vector is defined by the evolution rule of Equation 7, where N(μ_n, σ_n) is white Gaussian noise. Analyzing the ultrasound probe displacements throughout the endoscopic images, the Gaussian parameters were set to μ_n = 0 and σ_n = 5. This evolution rule ensures that the axis position variation from one image to the next is not sharp, eliminating tracking errors caused by occlusion or by the probe leaving the field of view of the endoscopic cameras, as shown in Figure 8.
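A sketch of this acceptance test and fallback rule, using the thresholds and noise parameters quoted above; the state here is simply the centroid and angle of the axis, a simplification of the derivative-based state vector of Equation 6.

```python
import numpy as np

U_D, U_THETA = 35.0, 30.0          # thresholds in pixels and degrees
MU_N, SIGMA_N = 0.0, 5.0           # white Gaussian noise parameters

def restrict_motion(prev_state, measured_state, rng=np.random.default_rng()):
    """prev_state, measured_state: (centroid as length-2 array, angle in degrees)."""
    c_prev, a_prev = prev_state
    c_meas, a_meas = measured_state
    dist = np.linalg.norm(np.asarray(c_meas) - np.asarray(c_prev))
    dang = abs(a_meas - a_prev)
    if dist <= U_D and dang <= U_THETA:
        return c_meas, a_meas                         # measurement accepted
    # Otherwise evolve the previous state with white Gaussian noise (Eq. 7).
    noise = rng.normal(MU_N, SIGMA_N, size=3)
    return np.asarray(c_prev) + noise[:2], a_prev + noise[2]
```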
Fig. 7. Tracking errors caused by lack of visibility of the probe in the image
Fig. 8. Tracking correction using motion restrictions
In Table 1 we can observe a considerable reduction in tracking errors due to the inclusion of the motion restriction in the algorithm. Besides, the tracking errors of the particle filter are low for the axis angle, but high for the position of the centroid. This is because the particle filter implemented [10] uses an ellipsoidal approximation of the region of interest instead of recovering the correct region shape, as we do with our segmentation methodology.

S(t) = [d_x(t),\, d_y(t),\, d_\theta(t)]^T   (6)

S(t) = S(t-1) + N(\mu_n = 0,\, \sigma_n = 5)   (7)
Table 1. Tracking errors with respect to the ground-truth method

                                                  μ_error   σ_error   min_error   max_error
Spatio-temporal tracking
  EBC (pixels)                                     28.1      27.4       0.6        108.6
  DBA (degrees)                                    11.7      23.8       0.1         88.8
Spatio-temporal tracking with motion restriction
  EBC (pixels)                                     18.8      13.5       0.6         75.8
  DBA (degrees)                                     3.9       3.5       0.1         13.4
Color-based particle filter
  EBC (pixels)                                     49.6      29.5       3.2        109.8
  DBA (degrees)                                    14.1      10.3       0.0         31.0
4 Summary and Conclusion
We have shown a straightforward and efficient solution to the problem of tracking an ultrasound probe throughout a sequence of endoscopic images. The method is simple and efficient, yet robust to reasonable occlusion, random probe displacements and the probe leaving the cameras' field of view. The segmentation of the probe was performed based on luminance and optical flow information. We decided to use these features because the ultrasound probe is made of a specular (metallic) material and its luminance is higher than that of other objects in the background; on the other hand, optical flow is an important clue to detect the continuous and erratic movements of the probe. After implementing the spatio-temporal tracking method, errors were noticed due to occlusion and lack of visibility of the probe. Therefore, we defined a state vector that encodes the position of the axis line of the probe at any instant of time, and introduced a motion restriction associated with the maximum allowable rotations and displacements between consecutive frames of the sequence. The comparison of the results in Table 1 showed that this restriction was effective in reducing the average errors and standard deviations. In order to evaluate the performance of our work, we compared it with one of the most popular tracking methods, the particle filter [11]. In Section 3, we show that there are no high errors in determining the angle of the axis line of the probe; however, there are significant errors in determining its centroid. This error is due to the definition of the particle filter algorithm used [10], which relies on an ellipse to define the region of interest. According to these results, our method provides a better tracking solution for this specific problem. We are currently working on a methodology for dynamic rendering of brain structures from endoneurosonography, for which the process proposed here is a critical stage to ensure an adequate 3D modeling.
Acknowledgment. We want to thank CONACYT (México), COLCIENCIAS (Colombia) and Universidad Nacional de Colombia (Manizales) for their financial support of this project. We also want to thank PhD student Rubén Machucho-Cadena from CINVESTAV Guadalajara for his help during the acquisition of the endoneurosonographic database used in this work.
References
1. Galic, S., Loncaric, S.: Spatio-temporal image segmentation using optical flow and clustering algorithm. In: Proceedings of the First International Workshop on Image and Signal Processing and Analysis, IWISPA 2000, pp. 63–68 (2000)
2. Gillams, A.: 3D imaging - a clinical perspective. In: IEE Colloquium on 3D Imaging Techniques for Medicine, pp. 111–112 (1991)
3. Gobbi, D.G., Comeau, R.M., Lee, B.K.H., Peters, T.M.: Integration of intra-operative 3D ultrasound with pre-operative MRI for neurosurgical guidance. In: Proceedings of the 22nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 3, pp. 1738–1740 (2000)
4. Gonzales, R.C., Woods, R.E., Eddins, S.L.: Digital Image Processing using MATLAB, 2nd edn. Gatesmark Publishing (2009)
5. Haibo, G., Wenxue, H., Jianxin, C., Yonghong, X.: Optimization of principal component analysis in feature extraction. In: International Conference on Mechatronics and Automation, ICMA 2007, pp. 3128–3132 (2007)
6. Horn, B.K.P., Schunck, B.G.: Determining optical flow. Artificial Intelligence 17, 185–203 (1981)
7. Jähne, B.: Digital Image Processing, 5th edn. Springer, Heidelberg (2002)
8. Machucho-Cadena, R., de la Cruz-Rodriguez, S., Bayro-Corrochano, E.: Rendering of brain tumors using endoneurosonography. In: 19th International Conference on Pattern Recognition, ICPR 2008, pp. 1–4 (2008)
9. Na, S., Xumin, L., Yong, G.: Research on k-means clustering algorithm: An improved k-means clustering algorithm. In: Third International Symposium on Intelligent Information Technology and Security Informatics, IITSI 2010, pp. 63–67 (2010)
10. Nummiaro, K., Koller-Meier, E., Van Gool, L.: An adaptive color-based particle filter. Image and Vision Computing 21(1), 99–110 (2003)
11. Ortegon-Aguilar, J., Bayro-Corrochano, E.: Omnidirectional vision tracking with particle filter. In: 18th International Conference on Pattern Recognition, ICPR 2006, vol. 3, pp. 1115–1118 (2006)
12. Roberts, D.W., Hartov, A., Kennedy, F.E., Hartov, E., Miga, M.I., Paulsen, K.D.: Intraoperative brain shift and deformation: A quantitative analysis of cortical displacement in 28 cases (1998)
13. Seber, G.A.F.: Multivariate Observations. John Wiley & Sons, Inc., Hoboken (1984)
14. Tatar, F., Mollinger, J.R., Den Dulk, R.C., van Duyl, W.A., Goosen, J.F.L., Bossche, A.: Ultrasonic sensor system for measuring position and orientation of laparoscopic instruments in minimal invasive surgery. In: 2nd Annual International IEEE-EMB Special Topic Conference on Microtechnologies in Medicine and Biology, pp. 301–304 (2002)
15. Varshney, S.S., Rajpal, N., Purwar, R.: Comparative study of image segmentation techniques and object matching using segmentation. In: International Conference on Methods and Models in Computer Science, ICM2CS 2009, pp. 1–6 (2009)
One Trilateral Filter Based on Surface Normal

Felix Calderon¹ and Mariano Rivera²

¹ Universidad Michoacana de San Nicolás de Hidalgo, División de Estudios de Posgrado, Facultad de Ingeniería Eléctrica, Santiago Tapia 403 Centro, Morelia, Michoacán, México, CP 58000. [email protected]
² Centro de Investigación en Matemáticas A.C., Apdo. Postal 402, Guanajuato, Gto., México, CP 36000. [email protected]
Abstract. In this paper we present an image filter based on proximity, range information and surface normal information, in order to distinguish discontinuities created by planes with different orientations. Our main contributions are the estimation of a piecewise-smooth surface normal field and of its discontinuities, and their use for image restoration. Surface Normals (SN) have applications in many research fields, because they are a local measure of the surface orientation. The Bilateral Filter measures differences in range in order to weight a window around a point; this is equivalent to seeing the image as a set of horizontal planes. However, the image does not have the same orientation everywhere, so surface orientation can help to improve on the Bilateral Filter results. We present a Trilateral Filter (TF) based on proximity, range and surface normal information. In this paper, we present a robust algorithm to compute the SN and a new kernel based on SN, which does not have a Gaussian formulation. With our Trilateral Filter we improve on the results obtained by the BF, and we show in some experiments that the images filtered by our TF look sharper than the images filtered by the BF. Keywords: Surface Normal, Trilateral Filters, Image Filtering.
1 Introduction
A Bilateral Filter (BF) is an edge-preserving smoothing filter. Whereas many filters are convolutions in the image domain f, a bilateral filter also operates in the image's range. Let us define a pixel f(r_i) of a color image f at position r_i = [x_i, y_i]^T, where x_i and y_i are integer numbers representing the row and column of the image, respectively. A BF is commonly implemented with two kernels: one for space, h_g(r_i − r_j), and another for intensity (or range), h_f(f(r_i) − f(r_j)). Rather than simply replacing a pixel's value with a weighted average of its neighbors, as a spatial Gaussian filter does, a BF replaces the pixel's value f(r_i) by a weighted average of its neighbors, in both space and intensity, over a square window (neighborhood) denoted W_i. This preserves sharp
edges by systematically excluding pixels across the discontinuity [1]. The basic idea underlying BFs is to do in the range of an image what the traditional filter does in its domain. The BF is given by equation (1):

g_b(r_i) = \sum_{j \in W_i} h_g(r_i - r_j)\, h_f(f(r_i) - f(r_j))\, f(r_j)   (1)

h_g(s) = \frac{1}{k_g}\, e^{-(s^T \Sigma_g^{-1} s)/2}, \qquad h_f(s) = \frac{1}{k_f}\, e^{-(s^T \Sigma_f^{-1} s)/2}, \qquad \forall i \in [1, \ldots, M]
where Σ_g and Σ_f are the covariance matrices of the spatial and intensity kernels, respectively, k_g and k_f are normalization constants, s is a vector and M is the image size. Next we illustrate the capabilities and the drawbacks of BFs. For this purpose we use two discontinuous functions: the first one is the step function S(x, y) and the second one the two-ramps function T(x, y). These piecewise functions are defined as:

S(x, y) = \begin{cases} 1 & \text{for } x \ge x_0 \\ 0 & \text{for } x < x_0 \end{cases}   (2)

T(x, y) = \begin{cases} 2x_0 - x & \text{for } x \ge x_0 \\ x & \text{for } x < x_0 \end{cases}   (3)

Figures 1(a) and 1(e) show the functions S(x, y) and T(x, y), respectively. For the S(x, y) function and a value δ near x_0, the squared difference (S(x_0 + δ, y) − S(x_0 − δ, y))^2 = 1; such a large squared difference indicates a discontinuity in the range, i.e. an edge in the image. On the other hand, for the function T(x, y) the same δ value is not detected by the range kernel h_f(s) in (1), because (T(x_0 + δ, y) − T(x_0 − δ, y))^2 = 0: it is a second-order discontinuity. Thus, the BF behaves as a non-robust low-pass filter. The BF preserves edges only when they are similar to those of the S(x, y) function, and it loses the edge-preserving property in the case of second-order discontinuities such as the one located in the T(x, y) function. When an image has edges like those of the T(x, y) function, the BF only gives a reasonable solution depending on the discretization rate, the kernel size and the covariance matrix Σ_f; in general, for a large kernel size, instead of a sharp edge between the planes the final edge will look like a smooth hump. Figures 1(b) and 1(f) show the images after applying a BF to the functions in Figures 1(a) and 1(e), respectively. In the case of the step function the BF produces the expected result (see Fig. 1(b)), but in the case of the two ramps, instead of a sharp edge reconstruction, the
BF over-smooths the peak and computes a smooth hump, as can be seen in Figure 1(f). Additional information is necessary in order to reconstruct not only edges like those of the S(x, y) function but also those of the T(x, y) function. In this work, we propose to use the information provided by the Surface Normals (SNs). Differently from the range difference, in the case of the T(x, y) function the norm of the normal difference for a value δ near x_0 is significant: |N(T(x_0 + δ, y)) − N(T(x_0 − δ, y))| = \sqrt{2}. In this case the normal difference captures the second-order discontinuity better than the range difference. However, the squared normal differences are insignificant in the case of the S(x, y) function. Therefore we need to use both surface normal information and range information in a Trilateral Filter (TF); the details of our TF are introduced in the following section. The SNs of the S(x, y) and T(x, y) functions are given by equations (4) and (5):

N(S(x, y)) = \begin{cases} 0\,\vec{i} + 0\,\vec{j} + \vec{k} & \text{for } x \ge x_0 \\ 0\,\vec{i} + 0\,\vec{j} + \vec{k} & \text{for } x < x_0 \end{cases}   (4)

N(T(x, y)) = \begin{cases} \tfrac{1}{\sqrt{2}}\,\vec{i} + 0\,\vec{j} + \tfrac{1}{\sqrt{2}}\,\vec{k} & \text{for } x \ge x_0 \\ -\tfrac{1}{\sqrt{2}}\,\vec{i} + 0\,\vec{j} + \tfrac{1}{\sqrt{2}}\,\vec{k} & \text{for } x < x_0 \end{cases}   (5)

where \vec{i}, \vec{j} and \vec{k} are the unit vectors along the axes of the 3D space.
Fig. 1. (a) Original step function S(x, y); (b, c, d) solutions after applying the BF, the BF based on SNs, and the TF; (e) original two-ramps function T(x, y); (f, g, h) the corresponding results for T(x, y)
2 Trilateral Filter
In order to reduce the pernicious smoothing effect introduced by the BF at second-order edges, one could replace the range kernel by an SN-based one. The resulting BF produces poor results at first-order discontinuities (steps), because the SN
is the same in images that look locally like the S(x, y) function, since |N(S(x − δ, y)) − N(S(x + δ, y))| = 0. Figures 1(c) and 1(g) show the solution using a BF with a proximity kernel and our SN kernel; in this case the BF based on SNs gives a solution equivalent to a low-pass filter. In order to reduce this problem, we need to use three kernels: one kernel based on spatial proximity, another kernel based on intensity, and one more kernel based on SNs; this gives a Trilateral Filter (TF). Using this TF it is possible to distinguish discontinuities by normal and range information in images with edges similar to the S(x, y) and T(x, y) functions. Figures 1(d) and 1(h) show the solution for both functions using our TF approach with SN information. We remark that the idea of Trilateral Filters using SNs is not new; Choudhury and Tumblin [2] give an example of a TF using normals for surface reconstruction. However, their kernel is very different from ours, and they do not report a robust procedure to compute the SN vectors. In our case the kernel of the convolution filter combines spatial, intensity and SN information. Our main contributions are the adaptation of the TF to image denoising, the robust procedure for computing the normals, and the fact that our kernel does not have a Gaussian formulation. Equation (6) is the expression for our TF:

g_T(r_i) = \frac{1}{cte} \sum_{j \in W_i} h_g(r_i - r_j)\, h_f(f(r_i) - f(r_j))\, h_m(m_i, m_j)\, f(r_j)   (6)
where h_m(m_i, m_j) is a kernel based on SNs and m_i is a robust estimator of the surface normal at pixel r_i, ∀i ∈ [1, ..., M]. We describe the robust normal estimation procedure in the next subsection.
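A direct (unoptimized) sketch of the filter in Equation (6) for a gray-level image, with Gaussian space and range kernels and the SN kernel h_m supplied as a callable; taking the per-pixel normalization constant cte as the sum of the weights is an assumption consistent with the normalized kernels of the BF.

```python
import numpy as np

def trilateral_filter(f, normals, h_m, half=5, sigma_s=3.0, sigma_r=10.0):
    """f: 2-D image; normals: HxWx3 robust surface normals m_i.
    h_m(mi, block) must return one weight per normal in the block (see the SN-kernel sketch below)."""
    H, W = f.shape
    g = np.zeros_like(f, dtype=float)
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    h_g = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))            # spatial kernel
    for i in range(H):
        for j in range(W):
            y0, y1 = max(0, i - half), min(H, i + half + 1)
            x0, x1 = max(0, j - half), min(W, j + half + 1)
            win = f[y0:y1, x0:x1]
            hg = h_g[y0 - i + half:y1 - i + half, x0 - j + half:x1 - j + half]
            hf = np.exp(-((win - f[i, j])**2) / (2 * sigma_r**2))  # range kernel
            hm = h_m(normals[i, j], normals[y0:y1, x0:x1])         # SN kernel
            w = hg * hf * hm
            g[i, j] = np.sum(w * win) / np.sum(w)                  # cte = sum of weights
    return g
```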
2.1 Robust Normal Estimation
Given a set of M points P = {p_1, ..., p_i, ..., p_M} in the image domain, with p_i = [x_i, y_i, f(x_i, y_i)]^T ≡ [r_i, f(r_i)]^T, and letting V_i be a square window containing the set of k nearest neighbors (neighborhood) with centroid c_i placed at position r_i, the tangent plane (TP) at the i-th image pixel is computed by fitting a plane A_i x_i + B_i y_i + C_i z_i + D_i = 0 by means of a least-squares procedure. Thus the normal vector at each point is given by n(x_i, y_i) ≡ n_i = [A_i, B_i, C_i]^T. Hoppe et al. [3] proposed to compute the SN n_i as the third eigenvector (associated with the smallest eigenvalue) of the local covariance matrix C_i:

C_i = \sum_{j \in V_i} (p_j - c_i) \otimes (p_j - c_i)   (7)
where ⊗ denotes the outer product operator. If λ_i^1 ≥ λ_i^2 ≥ λ_i^3 are the eigenvalues of C_i, their associated eigenvectors v_i^1, v_i^2, v_i^3, respectively, form an orthonormal basis. Then n_i is either v_i^3 or −v_i^3, and an additional procedure is necessary in order to have homogeneous normals at nearby places. In general the neighborhood V_i is a square window centered at the i-th point.
An alternative is to compute a covariance matrix weighted by w_ij according to (8):

\hat{C}_i = \sum_{j \in V_i} w_{ij}\, (p_j - c_i) \otimes (p_j - c_i)   (8)
There exist many ways to compute the weights; Pauly et al. [4] gave one procedure to compute them. Pauly's approach uses a weight based on proximity, which does not allow discarding points that belong to different planes. A desirable condition would be a weight w_ij = 1 if the i-th point is near the j-th point and both belong to the same plane, and w_ij = 0 otherwise. Our approach computes the weights in this way. In the following subsection the algorithm for computing homogeneous normal vectors is described.
2.2 Homogeneous Normal Vector
Due to the ambiguity in the sign of the normal vector, we apply a homogenization procedure to all the normals in the image domain. This procedure begins by taking a reference pixel position, for instance the pixel at the i-th position with normal n_i; the rest of the normal vectors n_j are then replaced with −n_j if n_i · n_j < 0, ∀j ∈ [1, 2, 3, ..., M], i ≠ j. This procedure produces normal vectors with coherent signs. The final homogeneous normal vectors form a vector field, and the procedure does not change the magnitude or orientation of the normal vectors, only their sign. In the next subsection, we describe our edge-preserving approach for robust normal estimation based on half-quadratic regularization.
2.3 Kernel Based on Robust Normal Estimation
Calderon et al. [5] proposed a procedure for computing a robust normal field m_i using Half-Quadratic Regularization (HQR). HQR is an edge-preserving regularization technique for restoring piecewise-smooth signals [6,7,8,9]. Using HQR it is possible to take care of discontinuities with a membership term l_ij and two regularization constants α and β. The membership term is very important for our normal-based kernel, because it allows us to discard members of a given neighborhood. The proposed HQR energy function is given by equations (9) and (10):

U(m, l) = \sum_{i=1}^{M} \left\{ \|m_i - n_i\|^2 + \alpha \sum_{j \in V_i} \left[ \|m_i - m_j\|^2 (1 - l_{ij})^2 + \beta l_{ij}^2 \right] \right\}   (9)

\text{s.t.} \quad \|m_i\|^2 = 1, \quad \forall i \in \{1, 2, \cdots, M\}   (10)
where n_i is the normal vector computed as the homogenized third eigenvector of the covariance matrix (7). In equation (9), α is a regularization parameter which controls the smoothness of m_i. The second term is weighted by (1 − l_ij), which minimizes the effect of large differences between nearest neighbors, a common
condition at borders or discontinuities. The membership term l_ij is penalized by means of β, giving the possibility of computing a piecewise-smooth normal m_i. The Karush-Kuhn-Tucker conditions for the quadratic programming problem in (9) and (10) are:

\nabla_{m_i} U(m, l) + \gamma_i m_i = 0   (11)

\nabla_{l_{ij}} U(m, l) = 0   (12)

m_i^T m_i = 1   (13)

\forall i \in \{1, 2, \cdots, M\}, \quad \forall j \in V_i,

where \nabla_x denotes the partial gradient operator. We use the Gauss-Seidel algorithm, so the t-th Gauss-Seidel iteration is given by

l_{ij}^{(t)} = \frac{\left\| m_i^{(t)} - m_j^{(t)} \right\|^2}{\beta + \left\| m_i^{(t)} - m_j^{(t)} \right\|^2}   (14)

m_{i,d}^{(t+1)} = \frac{n_{i,d} + \alpha \sum_{j \in V_i} m_{j,d}^{(t)} \left(1 - l_{ij}^{(t)}\right)^2}{\left\| n_i + \alpha \sum_{j \in V_i} m_j^{(t)} \left(1 - l_{ij}^{(t)}\right)^2 \right\|}, \quad d = 1, 2, 3, \quad \forall i \in \{1, 2, \cdots, M\}   (15)
By applying equations (14) and (15), one can compute an edge-preserving normal vector set m_i together with the indicator variables l_ij. The SN estimation is only one part of the procedure; we also use the indicator variable to compute a weight w_ij = (1 − l_ij), which can be used in Pauly's approach (8). This weight is near zero when the j-th neighbor does not belong to the same plane as the i-th point, and near 1 otherwise. These weights are controlled by the parameter β and, according to (14), their expression is

w_{ij} = 1 - l_{ij} = \frac{\beta}{\beta + \|m_i - m_j\|^2}, \quad \forall j \in V_i   (16)

This weight w_ij is used in the kernel; therefore, the final SN-based kernel is

h_n(m_i, m_j) = \frac{1}{k_i} \frac{\beta}{\beta + \|m_i - m_j\|^2}   (17)

where k_i is a normalization constant defined as k_i = \sum_{j \in V_i} h(m_i, m_j). The complete procedure to compute the SN-based kernel using our approach is described in Algorithm 1. Figure 2 shows the kernels computed by the BF (a-e) and the TF (f-j) at row 31 and columns 20, 29, 31, 33 and 42, respectively, for the two-planes image (Fig. 1(e)).
Fig. 2. Kernel shapes for the Bilateral and Trilateral filters at different positions of the T(x, y) function
The kernels of the BF have the same shape independently of their position in T(x, y), while the TF kernels change their shape. The kernels in Figures 2(f) and 2(g) are located to the left of the edge between the planes, while in Figure 2(h) the kernel is located exactly at the edge position. Figures 2(i) and 2(j) show the kernel to the right of the edge. Note that, according to the kernel shape, points across the edge have different weights and thus belong to different neighborhoods. In the case of Fig. 2(h) the kernel shape is narrow and aligned with the edge direction, an expected condition for the two-ramps function.
Algorithm 1. Kernel Normal Based
Input: a color image f
Output: the SN kernel h_n
  Compute n_i, ∀i ∈ {1, 2, ..., M}, as the third eigenvector of the covariance matrix, Eq. (7)
  Homogenize n_i, ∀i ∈ {1, 2, ..., M}, as described in Section 2.2
  Compute the robust normals m_i and membership terms l_ij:
    Set m_i^(0) = n_i, ∀i ∈ {1, 2, ..., M}
    For t = 1, 2, ..., MaxIter {
      Compute the memberships l_ij^(t) using (14)
      Update the normal vectors m_i^(t+1) applying (15)
    }
  Compute the normal kernel h_n(m_i, m_j), ∀i ∈ {1, 2, ..., M}, ∀j ∈ V_i, applying Eq. (17)
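A sketch of the core of Algorithm 1 follows, assuming the initial homogenized normals n (the first two steps) are already available as an H x W x 3 array; the 4-connected neighborhood with wrap-around borders and the parameter values are illustrative assumptions. The second helper has the interface assumed by the trilateral_filter sketch above.

```python
import numpy as np

def robust_normals(n, alpha=10.0, beta=0.05, max_iter=50):
    """Half-quadratic iteration of Eqs. (14)-(15) over a 4-connected neighborhood."""
    m = n.copy()
    shifts = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    for _ in range(max_iter):
        num = n.copy()
        for dy, dx in shifts:
            mj = np.roll(np.roll(m, dy, axis=0), dx, axis=1)          # neighbor normals
            d2 = np.sum((m - mj) ** 2, axis=2)
            l = d2 / (beta + d2)                                       # Eq. (14)
            num += alpha * mj * ((1.0 - l) ** 2)[..., None]
        m = num / (np.linalg.norm(num, axis=2, keepdims=True) + 1e-12)  # Eq. (15), unit norm
    return m

def sn_kernel(mi, mj_block, beta=0.05):
    """Eq. (17): SN kernel between a normal mi and a block of normals, normalized over the block."""
    w = beta / (beta + np.sum((mj_block - mi) ** 2, axis=-1))
    return w / w.sum()
```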
3 Results
The TF was tested using five images: two synthetic (the step function S(x, y) and the two-ramps function T(x, y)) and three real images (Luca, Cat and Lena). The results obtained with the step function image are shown in Figure 3; the original image (see Fig. 3(a)) has a range between 0 and 1 with additive Gaussian noise of zero mean and standard deviation 0.05. The solutions obtained by applying the BF and the TF are presented in Figures 3(b) and 3(c) for the 31st row of the images. As expected, in this case there is no difference when using the SN information, because the SN vector is the same over the whole image. For the two-ramps function we created a 64 × 64 pixel image, with range [0, 3.2] and slopes equal to ±0.1 for each plane (Fig. 1(e)). The corrupted image was created by adding Gaussian noise with mean 0 and standard deviation 0.90 to the original. The noisy image was filtered using different kernel sizes and the results are presented in Table 1. This table has five columns: the first corresponds to the kernel size, the second is the image after applying the BF, the next column presents its central image row, the fourth column presents the image after applying the TF, and the last column the central image row of the TF solution. Due to the loss of quality in the printed version, the third and fifth columns of this table present the middle row of the image (the 31st row). Note the sharp edges recovered by the TF, in contrast with the loss of sharpness of the BF when the kernel size is increased. Figures 4(a), 4(d) and 4(g) show the original Luca, Cat and Lena images, respectively. The resulting images after applying the BF are shown in Figures 4(b), 4(e) and 4(h), and the TF (SN-based) results are shown in Figures 4(c), 4(f) and 4(i). Figures 4(j), 4(k) and 4(l) show, at the top, the TF solution; in the middle, the BF solution; and at the bottom, the original row. In these rows one can see sharper edges and more details. The parameters for each image are given in Table 2. In general, the images obtained using the SN-based TF look clearer than those of the BF, and the presented rows give some details about this. We adjusted the parameters of both filters (Bilateral and Trilateral) by hand so as to obtain the best results.
Fig. 3. Step function image S(x, y): (a) original with Gaussian noise, (b) resulting image after applying the BF, and (c) resulting image after applying the TF, for the 31st row of the images
Table 1. Comparative results for the BF and the TF using different kernel sizes. For each kernel size (5, 10, 15, 20 and 25), the table shows the image filtered by the Bilateral Filter with its central (31st) row, and the image filtered by the Trilateral Filter with its central row.
Table 2. Parameters for the Luca, Cat and Lena images for the BF and TF filters

Image   Figures                 Kernel size   Normal neighborhood size   α      β      Σ_f
Luca    4(a), 4(b) and 4(c)     11 × 11       7 × 7                      10.0   0.05   diag[30, 30]
Cat     4(d), 4(e) and 4(f)     11 × 11       3 × 3                      10.0   0.10   diag[20, 20]
Lena    4(g), 4(h) and 4(i)     11 × 11       3 × 3                      10.0   0.05   diag[60, 60]
Fig. 4. Filtering of the Luca, Cat and Lena images using the BF and the TF: (a) Luca, (b) BF, (c) TF; (d) Cat, (e) BF, (f) TF; (g) Lena, (h) BF, (i) TF; (j) Luca 81st row, (k) Cat 128th row, (l) Lena 151st row, showing filter details with the TF at the top, the BF in the middle and the original at the bottom
4 Conclusions
In this paper a comparison between the TF and the BF has been presented. In all experiments the resulting images using the TF have sharper edges than the resulting images using the BF, so in general the TF yields clearer images than the BF.
Due to the loss of definition introduced by the printer, in all cases a row is presented in order to give the reader clarity about our results. Our experiments also show the robustness of the TF over the BF, the latter losing edge sharpness when the kernel size is increased. Since the TF needs an extra SN-based kernel, the reader must choose between edge quality and time; our approach only increases the execution time slightly, but the results are better using the TF than the BF.
Acknowledgements. This work was supported by the Consejo Nacional de Ciencia y Tecnología, México, grant 61367-Y to Mariano Rivera.
References
1. Tomasi, C., Manduchi, R.: Bilateral filtering for gray and color images. In: Proceedings of the 1998 IEEE International Conference on Computer Vision (1998)
2. Choudhury, P., Tumblin, J.: The trilateral filter for high contrast images and meshes. In: Christensen, P.H., Cohen, D. (eds.) Proc. of the Eurographics Symposium on Rendering, pp. 186–196 (2003)
3. Hoppe, H., DeRose, T., Duchamp, T., McDonald, J., Stuetzle, W.: Surface reconstruction from unorganized points. In: SIGGRAPH 1992: Proceedings of the 19th Annual Conference on Computer Graphics and Interactive Techniques, pp. 71–78. ACM Press, New York (1992)
4. Pauly, M., Keiser, R., Kobbelt, L.P., Gross, M.: Shape modeling with point-sampled geometry. In: SIGGRAPH 2003: ACM SIGGRAPH 2003 Papers, pp. 641–650. ACM Press, New York (2003)
5. Calderon, F., Ruiz, U., Rivera, M.: Surface-normal estimation with neighborhood reorganization for 3D reconstruction. In: Rueda, L., Mery, D., Kittler, J. (eds.) CIARP 2007. LNCS, vol. 4756, pp. 321–330. Springer, Heidelberg (2007)
6. Black, M., Rangarajan, A.: On the unification of line processes, outlier rejection, and robust statistics with applications in early vision. Int'l J. Computer Vision 19, 57–92 (1996)
7. Charbonnier, P., Blanc-Feraud, L., Aubert, G., Barluad, M.: Deterministic edge-preserving regularization in computed imaging. IEEE Trans. Image Processing 6, 298–311 (1997)
8. Rivera, M., Marroquin, J.L.: Adaptive rest condition potentials: first and second order edge-preserving regularization. Journal of Computer Vision and Image Understanding 88, 76–93 (2002)
9. Rivera, M., Marroquin, J.: Half-quadratic cost functions with granularity control. Image and Vision Computing 21, 345–357 (2003)
Beta-Measure for Probabilistic Segmentation

Oscar Dalmau and Mariano Rivera

Centro de Investigación en Matemáticas, A.C., Jalisco S/N, Colonia Valenciana, C.P. 36240, Guanajuato, Gto., México
{dalmau,mrivera}@cimat.mx
Abstract. We propose a new model for probabilistic image segmentation with spatial coherence through a Markov Random Field prior. Our model is based on a generalized information measure between discrete probability distributions (β-Measure). This model generalizes the quadratic Markov measure field (QMMF) models. In our proposal, the entropy control is achieved through the likelihood energy. This entropy control mechanism makes our method appropriate for tasks that require the simultaneous estimation of the segmentation and the model parameters. Keywords: Probabilistic segmentation, Markov random measure field, information and divergence measures, half-quadratic.
1 Introduction

Image segmentation consists in partitioning the image into non-overlapping, meaningful, homogeneous regions, e.g. flat regions, movement (stereo, optical flow), model-based, texture, color, etc. Image segmentation has been widely used in different applications, for example in medical imaging [1,2,3], robot vision [4,5], image colorization [6] and image editing [7]. Several techniques have been used to solve this problem; among them one can find variational approaches [8,9], clustering techniques [10], fuzzy c-means (FCM) methods [11,12], graph theory [13,14] and Bayesian approaches [15,16,2,17]. Recently, Dalmau and Rivera presented a general framework for probabilistic image segmentation [17]. Using this framework one can obtain new probabilistic segmentation models by selecting a metric-divergence between discrete probability distributions [18]. Based on this framework, in this work we present a new model, β-MMF (β-Markov measure field), for probabilistic image segmentation. This model has theoretical implications. First, it can be considered a generalization of the quadratic Markov measure field (QMMF) models [2]. Second, the model presented here is related to fuzzy c-means based methods [19,20] for image segmentation. Our model relies on a generalized measure between discrete probability distributions [21]. We study the particular case of the β-MMF that produces a family of half-quadratic Markov measure field (HQMMF) models. This paper is organized as follows. In Section 2 we make a brief review of the Bayesian formulation for probabilistic segmentation; we also introduce the β-MMF model and a half-quadratic version that allows us to solve this model efficiently. Section 3 shows some experimental results and, finally, in the last section we present our conclusions.
2 Mathematical Formulation

2.1 Review of the Bayesian Formulation for Markov Measure Fields (MMF)

In general, the probabilistic image segmentation problem can be described as the computation of the preference of each pixel for certain models or classes. Given K models M = {m_1, m_2, ..., m_K}, an observable vector measure random field Y, i.e. at each pixel one observes a vector Y(r) = [Y_1(r), Y_2(r), ..., Y_K(r)]^T such that Y_k(r) ≥ 0 and \sum_{k=1}^{K} Y_k(r) = 1, and a generative model

\Psi(Y(r), X(r), N(r)) = 0, \quad \forall r \in L,   (1)
where the function Ψ should satisfy the hypothesis of the implicit function theorem, i.e. there exists a function Γ such that N (r) = Γ (Y (r), X(r)), N (r) is a random variable with known distribution and X is a hidden measure Markov random field. If x, y are realizations of X, Y respectively, the problem consists in finding an estimator x∗ . Using a Bayesian formulation, the MAP estimator x∗ can be computed as def
x∗ = arg min U (x; y) = D(x; y) + λR(x), x s.t. xk (r) = 1, xk (r) ≥ 0, ∀r ∈ L,
(2) (3)
k
where D(x; y) is the likelihood energy, R(x) is the prior energy, λ is a regularization hyper-parameter and L is the regular lattice that corresponds to the pixel sites. According to [17], D(·; ·) and R(·) are based on metric-divergences between distributions [21,18]. The variational model (2)-(3) is a generalization [17] of the particular cases proposed in Refs. [15,16,2].
2.2 β-MMF
Based on the general formulation, Eqs. (2)-(3), we present a new model for probabilistic segmentation. Then, we need to define the likelihood and the prior energies. First, we start with the prior energy. Here we use a very popular energy which relies on the Euclidean distance, that is
R(x) = Σ_{r∈L} Σ_{s∈N_r} ω_rs ||x(r) − x(s)||²,   (4)
where ω_rs is a weight function, i.e. ω_rs = γ / (γ + ||G(r) − G(s)||²) with γ = 10^-3, and N_r is the set of first neighboring pixels of r. The energy (4) promotes piecewise-smooth spatial changes of the vector measure field x. This spatial smoothness is controlled by the parameter λ > 0 in the functional (2). Therefore, if λ is very large then the segmented regions tend to be large, and vice versa: the smaller the value of λ, the more granular the solution. We note that the prior energy (4) corresponds to the functional of the Random Walker algorithm [22]. However, this energy has largely been used in different image segmentation models, see Refs. [15,16,23,2,17].
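Before moving to the likelihood term, the following is a minimal numpy sketch of the prior energy (4) and its gradient-driven weights. It is only a reading aid, not the authors' implementation: the 4-neighborhood, the single-channel guide image G, the once-per-pair counting of neighbors and the function names are illustrative assumptions.

```python
import numpy as np

def prior_energy(x, G, lam=1.0, gamma=1e-3):
    """Sum over neighboring pairs of w_rs * ||x(r) - x(s)||^2,
    with w_rs = gamma / (gamma + (G(r) - G(s))^2).
    x: (H, W, K) measure field, G: (H, W) grayscale guide image.
    Each pair is counted once (a simplification of the double sum in (4))."""
    energy = 0.0
    for axis in (0, 1):                       # horizontal and vertical pairs
        Gd = np.diff(G, axis=axis)            # G(r) - G(s)
        w = gamma / (gamma + Gd ** 2)         # weight w_rs
        xd = np.diff(x, axis=axis)            # x(r) - x(s), per class
        energy += np.sum(w * np.sum(xd ** 2, axis=-1))
    return lam * energy
```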
Second, we obtain the β-Measure as a particular case of the I^{(α,β)}_{(γ,δ)} measure between two discrete probability distributions f, h, i.e. Σ_k f_k = 1, f_k ≥ 0 and Σ_k h_k = 1, h_k ≥ 0, see Ref. [21]. This measure is defined as
I^{(α,β)}_{(γ,δ)}(f, h) = (2^{−β} − 2^{−δ})^{−1} Σ_k (f_k^α h_k^β − f_k^γ h_k^δ),   (5)
where α, β, γ, δ > 0. The β-Measure is defined as the particular case
I^β(f, h) := lim_{α→0} I^{(β,α)}_{(β,0)}(f, h),   (6)
and therefore
I^β(f, h) = −(1/log 2) Σ_k f_k^β log h_k.   (7)
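As a small numeric illustration of Eq. (7), the sketch below evaluates the β-measure for two short distributions; the epsilon guard against log(0) is an added assumption, not part of the definition.

```python
import numpy as np

def beta_measure(f, h, beta=1.5, eps=1e-12):
    """I^beta(f, h) = -(1/log 2) * sum_k f_k^beta * log(h_k)."""
    f = np.asarray(f, dtype=float)
    h = np.asarray(h, dtype=float)
    return -np.sum(f ** beta * np.log(h + eps)) / np.log(2)

# beta = 1 gives Kerridge's inaccuracy (a cross-entropy up to the 1/log 2 factor);
# beta = 2 corresponds to the quadratic (QMMF) data term.
print(beta_measure([0.7, 0.2, 0.1], [0.6, 0.3, 0.1], beta=1.0))
```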
Using the previous measure in the likelihood term of the functional (2), we obtain the β-MMF model:
x* = arg min_x U^β(x; y),   (8)
s.t. Σ_k x_k(r) = 1,  ∀r ∈ L,   (9)
where the functional U^β(·; ·) is defined in the following way:
U^β(x; y) := Σ_{r∈L} [ −(1/β) Σ_k x_k^β(r) log y_k(r) + (λ/2) Σ_{s∈N_r} ω_rs ||x(r) − x(s)||² ].   (10)
The parameters β, γ, λ can be trained, see the Experiments section, or can be manually tuned. In interactive experiments we use β = 1.5, γ = 10^-3 and λ = 10^6. Note that for β ≥ 1 the previous functional is convex, so the constrained optimization problem (8)-(9) has a unique solution, whereas for 0 < β < 1 the optimization problem is not convex. Two interesting particular cases of the β-MMF model are obtained when β = 1 and β = 2. If β = 1 then the β-measure becomes Kerridge's measure [24], also known as the cross-entropy, and the likelihood energy in (10) is linear. If β = 2 then the β-MMF model becomes the QMMF model [2].
2.3 Half-Quadratic MMF
The β-MMF optimization problem could be solved using a gradient descent method [25]. However, here we present a more efficient alternative. When β ∈ (0, 2] we can minimize (8)-(9) by using the half-quadratic approach introduced by Geman and Reynolds [26]. In Appendix A we present an outline of the proof; see Ref. [27] for details about the half-quadratic minimization technique. Therefore, the solution is computed by iterating the following two steps until convergence (assuming an initial point for x):
1. Update d_k(r) := −(β/2) x_k^{β−2}(r) log y_k(r).
2. Solve, approximately, the quadratic programming problem:
min_x Σ_{r∈L} [ (1/2) Σ_k x_k²(r) d_k(r) + (λ/2) Σ_{s∈N_r} ω_rs ||x(r) − x(s)||² ],   (11)
subject to Σ_k x_k(r) = 1, x_k(r) ≥ 0, ∀r.
This half-quadratic technique produces an iterative scheme, and the solution to (11), at iteration t + 1, can be obtained using the Lagrange multipliers method, see [28]. Here we present two algorithms: one for the multi-class segmentation problem and one for the binary segmentation problem.
– For multi-class segmentation we obtain the following iterative scheme:
x_k(r) = [π(r) + λ Σ_{s∈N_r} ω_rs x_k(s)] / [d_k(r) + λ Σ_{s∈N_r} ω_rs],   (12)
where the Lagrange multipliers are approximated with
π(r) = (1/K) Σ_k d_k(r) x_k(r) = E_N(d_N),   (13)
that is, π(r) is the expected value of d_N ∈ {d_1(r), d_2(r), ..., d_K(r)} with respect to N ∈ {1, 2, ..., K}, with discrete probability distribution x(r). Note that if the previous solution at time t is positive then the vector obtained at time t + 1 is also positive, see expression (12).
– For binary segmentation: this kind of segmentation is very useful in many applications, for instance organ segmentation, object tracking, foreground/background or object/no-object segmentation. Although we could use two classes in the multi-class scheme, a more efficient approach can be developed. First, the functional is rewritten as follows:
U^β(x_1; y_1, y_2) = (1/2) Σ_{r∈L} [ x_1²(r) d_1(r) + (1 − x_1(r))² d_2(r) ] + λ Σ_{r∈L} Σ_{s∈N_r} ω_rs (x_1(r) − x_1(s))².
Again, the solution can be obtained using the Lagrange multipliers method. This produces the iterative formulas:
x_1(r) = [d_2(r) + λ Σ_{s∈N_r} ω_rs x_1(s)] / [d_1(r) + d_2(r) + λ Σ_{s∈N_r} ω_rs],   (14)
where
d_1(r) = −(β/2) x_1^{β−2}(r) log y_1(r),   (15)
d_2(r) = −(β/2) x_2^{β−2}(r) log y_2(r),   (16)
x_2(r) = 1 − x_1(r).   (17)
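The multi-class scheme (12)-(13) can be sketched in numpy as follows. This is not the authors' implementation: the wrap-around neighborhoods from np.roll, the clipping of x away from zero, and the explicit renormalization of x (in place of an exact handling of the Lagrange multiplier and positivity constraints) are simplifying assumptions.

```python
import numpy as np

def beta_mmf_update(x, y, w_n, lam=1.0, beta=1.5, iters=20):
    """Half-quadratic iterations for the beta-MMF sketch.
    x, y : (H, W, K) fields (y > 0 likelihoods, x a measure field);
    w_n  : dict mapping a neighbor shift (dy, dx) -> (H, W) weights w_rs."""
    for _ in range(iters):
        # step 1: d_k(r) = -(beta/2) * x_k(r)^(beta-2) * log y_k(r)
        d = -(beta / 2.0) * np.power(np.clip(x, 1e-6, None), beta - 2.0) * np.log(y)
        num_n = np.zeros_like(x)
        den_n = np.zeros(x.shape[:2])
        for (dy, dx), w in w_n.items():
            xs = np.roll(x, shift=(dy, dx), axis=(0, 1))   # x_k(s) at neighbor s
            num_n += w[..., None] * xs
            den_n += w
        pi = np.mean(d * x, axis=-1, keepdims=True)            # Eq. (13)
        x = (pi + lam * num_n) / (d + lam * den_n[..., None])   # Eq. (12)
        x = np.clip(x, 0.0, None)
        x /= x.sum(axis=-1, keepdims=True)   # renormalization: a practical simplification
    return x
```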
We remark that for β ∈ (0, 1) the functional is non-convex and in general the solution is a local minimum. To obtain a 'good' local minimum we can apply some kind of Graduated Non-Convexity method [29]. If β ∈ (1, 2] the problem is convex and the global minimum is guaranteed.
2.4 Observation Modeling
The observation modeling depends strongly on the problem we are dealing with. Here we present two examples: first, for the very popular interactive image segmentation task and, second, for the model parameter estimation problem.
Interactive Segmentation. In the interactive segmentation problem the observation models can be represented by intensity or color histograms of user-marked regions [14,30]. These seed regions are known, in the case of foreground/background segmentation, as a trimap [30], and in the case of multi-object segmentation as a multimap. Consider that some pixels in a certain region of interest, Ω, are interactively labeled by a user. If K is the class label set, we define the pixel set (region) that belongs to class k as R_k = {r : R(r) = k}, where
R(r) ∈ {0} ∪ K,  ∀r ∈ Ω,   (18)
is the label field (class map or multimap), R(r) = k > 0 indicates that the pixel r is assigned to the class k, and R(r) = 0 means that the pixel class is unknown and needs to be estimated. Let g be an image such that g(r) ∈ t, where t = {t_1, t_2, ..., t_T} with t_i ∈ R^n, n = 1, 3: n = 1 in the case of gray-level images (t_i represents an intensity value) and n = 3 in the case of color images (t_i represents an RGB color). Let h_k(t) : R^n → R be the empirical histogram of the marked pixels which belong to class k, i.e. the ratio between the number of pixels in R_k whose intensity (or RGB color) is t and the total number of pixels in the region R_k. We denote by ĥ_k(t) the smoothed normalized histograms (i.e. Σ_t ĥ_k(t) = 1); then the preference (observation) of the pixel r for a given class k is computed with:
y_k(r) = [ĥ_k(g(r)) + ε] / Σ_{j=1}^{K} [ĥ_j(g(r)) + ε],  ε = 10^-3,  ∀k > 0.   (19)
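A hedged sketch of the histogram-based preference of Eq. (19) for gray-level images follows; the bin count, the moving-average smoothing (a stand-in for whatever smoothing the authors applied) and the 0/1..K label convention are assumptions.

```python
import numpy as np

def histogram_likelihood(g, labels, K, bins=256, eps=1e-3, sigma_bins=2):
    """Per-pixel preference y_k(r) from smoothed histograms of user-marked regions.
    g: (H, W) gray image with values in [0, 255];
    labels: (H, W) with 0 = unlabeled and 1..K = user classes."""
    idx = np.clip(g.astype(int), 0, bins - 1)
    h = np.zeros((K, bins))
    for k in range(1, K + 1):
        counts = np.bincount(idx[labels == k], minlength=bins).astype(float)
        kernel = np.ones(2 * sigma_bins + 1)          # crude smoothing kernel
        counts = np.convolve(counts, kernel / kernel.sum(), mode='same')
        h[k - 1] = counts / max(counts.sum(), 1.0)    # normalized histogram h_hat_k
    prefs = h[:, idx] + eps                           # (K, H, W), Eq. (19) numerator
    y = prefs / prefs.sum(axis=0, keepdims=True)
    return np.moveaxis(y, 0, -1)                      # (H, W, K)
```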
One can use more complex statistical models for defining the preference functions, for instance parametric models such as Gaussian Mixture Models. However, in the experiments we work with low dimension feature space (1 and 3 dimensions for gray and color images respectively), so the smoothed normalized histograms are computationally more efficient, i.e. they are implemented as look up tables. For higher dimension of the feature space, parametric models are in general more appropriate. Model Estimation. In this case we consider that the preference measure of a pixel r to some model k, or the preference to belonging to the region Rk , could be described
through a parametric model, i.e. y_k(r) = f_k(r, g, θ_k), k ∈ {1, 2, ..., K}. The problem consists in computing simultaneously the vector measure field x and the set of parameters θ = {θ_1, θ_2, ..., θ_K}:
(x*, θ*) = arg min_{(x,θ)} U^β(x; y(θ)),   (20)
s.t. Σ_k x_k(r) = 1,  ∀r ∈ L.   (21)
In general, this is a very hard and commonly highly non-linear problem. One alternative to face these drawbacks is to use the two-step Segmentation/Model estimation (SM) algorithm [31,32]. First, we minimize with respect to x keeping the parameters θ fixed (Segmentation step). Second, we minimize with respect to θ keeping x fixed (Model estimation step). These two steps are repeated until a convergence criterion is met. We illustrate this through an example. Consider that the observations are given by Gaussian functions: y_k(r) = exp(−||g(r) − θ_k||²). In this case, for the segmentation step we use the half-quadratic technique explained in Section 2.3. For the model estimation step we obtain the following closed formula:
θ_k = Σ_r x_k^β(r) g(r) / Σ_r x_k^β(r).
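A sketch of the SM loop with the Gaussian observations above is given next. `segmentation_step` is a placeholder for the half-quadratic solver of Section 2.3; the normalized-likelihood fallback used when it is absent, the uniform initialization and the gray-level (scalar) image are illustrative assumptions only.

```python
import numpy as np

def segment_and_estimate(g, theta, sm_iters=10, beta=1.5, segmentation_step=None):
    """Two-step Segmentation / Model-estimation loop with Gaussian observations
    y_k(r) = exp(-(g(r) - theta_k)^2). g: (H, W), theta: (K,)."""
    theta = np.asarray(theta, dtype=float)
    K = theta.size
    x = np.full(g.shape + (K,), 1.0 / K)                       # uniform initial field
    for _ in range(sm_iters):
        y = np.exp(-(g[..., None] - theta[None, None, :]) ** 2)
        # S-step: update x with theta fixed (placeholder if no solver given)
        x = segmentation_step(x, y) if segmentation_step else y / y.sum(-1, keepdims=True)
        # M-step: closed form theta_k = sum_r x_k^beta g / sum_r x_k^beta
        w = x ** beta
        theta = (w * g[..., None]).sum(axis=(0, 1)) / w.sum(axis=(0, 1))
    return x, theta
```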
3 Experiments
3.1 Image Binary Segmentation
We evaluate the performance of the β-MMF model on the binary segmentation problem. We have 4 data-sets, three of them composed of the letters of the English alphabet in three fonts. The last one is composed of 65 maps of different countries. The images are normalized into the interval [0, 1], and we add, to each data-set, 5 levels of Gaussian noise with zero mean and standard deviation in {0.2, 0.4, 0.6, 0.8, 1.0}, see Fig. 1. Finally we renormalize each data-set into [0, 1]. In summary, we obtained 20 data-sets with a total of 715 noisy images. The original data-sets, without noise, are the ground truth. To measure the segmentation quality we use the mean square error (MSE) between the ground truth (t) and the image (s) segmented using the β-MMF model, that is:
MSE(s, t) = (1/|L|) Σ_{r∈L} (s(r) − t(r))².   (22)
The observations are modeled using Gaussian functions, see Section 2.4. We carried out two kinds of experiments: one with fixed models in [0, 1], where we use θ_1 = 1/16 and θ_2 = 15/16 as the means of the models, and the other using model parameter estimation. Then, we have two sets of parameters: the parameters of the models θ and the hyper-parameters of the algorithm Θ = [β, λ, γ]^T. In order to obtain the best results of the algorithm, i.e. the ones with the smallest MSE, we train the hyper-parameter set Θ
Fig. 1. Representative image of each noisy data-set. First row: Andale Mono font, second row: Constantia font, third row: Courier Bold font and fourth row: Maps. From left to right: images with different levels of Gaussian noise with zero mean and standard deviations in {0.2, 0.4, 0.6, 0.8, 1.0}.

Table 1. The best parameters obtained after training the β-MMF with the Andale Mono data-set

noise std dev | Fixed models: λ, γ, β, training error | Model estimation: λ, γ, β, training error
0.2 | 0.52, 0.08, 0.64, 0.01 | 0.54, 0.02, 0.90, 0.01
0.4 | 0.52, 0.12, 0.81, 0.20 | 0.22, 0.06, 1.08, 0.21
0.6 | 0.57, 0.08, 0.97, 1.41 | 0.47, 0.07, 1.48, 8.01
0.8 | 0.51, 0.14, 0.97, 11.54 | 0.47, 0.09, 1.52, 16.20
1.0 | 1.06, 0.11, 0.97, 18.36 | 0.53, 0.08, 1.39, 22.18
using the Nelder–Mead method [33,34]. In particular, we use the implementation in Numerical Recipes [34]. The results of the training, for all data-sets, are shown in Tables 1, 2, 3 and 4. We note that as the noise level increases the training error also increases; obviously, this is what one expects. For noise standard deviations less than 0.6 the results are in general good, see Fig. 2. However, for noise standard deviations greater than 0.6 the results are poor, see Fig. 3. This can be explained through Fig. 4.
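The hyper-parameter training described above can be sketched as follows, using SciPy's Nelder-Mead as a stand-in for the Numerical Recipes routine the authors used; `train_error` is a hypothetical callback that runs the segmenter on a training set and returns the mean MSE, and the absolute-value trick for keeping parameters positive is an added assumption.

```python
import numpy as np
from scipy.optimize import minimize

def tune_hyperparameters(train_error, theta0=(1.5, 1.0, 1e-3)):
    """Tune Theta = [beta, lambda, gamma] with the Nelder-Mead simplex."""
    def objective(t):
        beta, lam, gamma = np.abs(t)          # crude way to keep parameters positive
        return train_error(beta, lam, gamma)
    res = minimize(objective, x0=np.array(theta0), method='Nelder-Mead',
                   options={'xatol': 1e-3, 'fatol': 1e-3, 'maxiter': 200})
    return np.abs(res.x), res.fun
```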
Table 2. The best parameters obtained after training the β-MMF with the Constantia data-set

noise std dev | Fixed models: λ, γ, β, training error | Model estimation: λ, γ, β, training error
0.2 | 0.52, 0.10, 0.62, 0.01 | 0.53, 0.02, 0.92, 0.01
0.4 | 0.52, 0.09, 0.89, 0.32 | 0.52, 0.02, 0.94, 0.34
0.6 | 0.50, 0.12, 0.91, 3.53 | 0.47, 0.04, 1.56, 8.30
0.8 | 0.52, 0.13, 0.97, 11.25 | 0.55, 0.05, 1.48, 14.47
1.0 | 0.51, 0.12, 0.96, 19.50 | 0.31, 0.20, 1.33, 18.10
Table 3. The best parameters obtained after training the β-MMF with the Courier Bold data-set

noise std dev | Fixed models: λ, γ, β, training error | Model estimation: λ, γ, β, training error
0.2 | 0.53, 0.10, 0.57, 0.01 | 0.54, 0.01, 1.08, 0.01
0.4 | 0.52, 0.10, 0.86, 0.24 | 0.53, 0.01, 0.90, 0.24
0.6 | 0.51, 0.11, 0.93, 0.71 | 0.51, 0.01, 1.05, 0.67
0.8 | 0.48, 0.12, 0.97, 8.96 | 0.48, 0.08, 1.63, 8.16
1.0 | 0.46, 0.16, 0.95, 13.99 | 0.47, 0.02, 1.61, 10.11
Table 4. The best parameters obtained after training the β-MMF with the Maps data-set

noise std dev | Fixed models: λ, γ, β, training error | Model estimation: λ, γ, β, training error
0.2 | 0.60, 0.02, 1.08, 0.03 | 0.56, 0.01, 0.63, 0.03
0.4 | 0.54, 0.05, 0.90, 0.36 | 0.57, 0.01, 0.92, 0.37
0.6 | 0.52, 0.09, 0.97, 0.87 | 0.19, 0.05, 1.02, 0.71
0.8 | 0.55, 0.13, 0.97, 4.40 | 0.22, 0.35, 0.88, 1.81
1.0 | 0.55, 0.11, 0.97, 10.73 | 0.48, 0.08, 1.45, 6.48
This figure shows the histograms of the images in the first row of Fig. 1. When the standard deviation is 0.2 we can distinguish two models. However, when the standard deviation increases, the parameters of the models collapse.
3.2 Interactive Segmentation Application
Fig. 5 depicts an application of the probabilistic segmentation methods to an interactive image editing task. In this example we use the editing scheme proposed in Ref. [7] and the β-MMF model, Sections 2.2 and 2.3. The likelihood is computed using the scribbles provided by a user, first column of Fig. 5, following Section 2.4. The first column shows scribbles on the image to be edited, the second column shows the segmented image, and the third and fourth columns show edited images. In the horses image, first row, we blend three images, one source image per class, see [7] for details. The parameters used for this experiment are β = 1.5, γ = 10^-3 and λ = 10^6.
Fig. 2. Segmentation of selected images with noise standard deviation 0.6. First row: noisy images to be segmented, second row: soft segmentation, third row: hard segmentation.
Fig. 3. Segmentation of selected images with noise standard deviation 1.0. First row: noisy images to be segmented, second row: soft segmentation, third row: hard segmentation.
Fig. 4. From left to right: image histograms for the first row of Fig. 1, with Gaussian noise with zero mean and standard deviations in {0.2, 0.4, 0.6, 0.8, 1.0}.
Fig. 5. Interactive image editing using the β-MMF model and the probabilistic editing framework proposed in Ref. [7]. First column: scribbles made on the original image; second column: segmented images; third and fourth columns: edited images using several source images (one per class). See the full color image at www.micai.org/2010/DalmauRivera-page10.pdf.
4 Conclusions
We presented a new model for probabilistic segmentation (β-MMF). This model generalizes the Quadratic Markov Measure Field (QMMF) model and can be seen as a half-quadratic variant that keeps the energy convex while promoting low entropy. As demonstrated in our experiments, this is an important characteristic of the β-MMF models for the task of simultaneous model estimation and image segmentation. In particular, this model presents good results for binary image segmentation. The β-MMF models can be used for other applications, for instance image editing. In future work we will focus on improving our method for model parameter estimation in the case of images corrupted with high-variance noise.
Acknowledgements. This work was supported by the Consejo Nacional de Ciencia y Tecnología, Mexico: D.Sc. scholarship to O. Dalmau and Grant 61367-Y to M. Rivera.
References 1. Kaus, M.R., Warfield, S.K., Nabavi, A., Black, P.M., Jolesz, F.A., Kikinis, R.: Automated Segmentation of MR Images of Brain Tumors. Radiology 218, 586–591 (2001) 2. Rivera, M., Ocegueda, O., Marroqu´ın, J.L.: Entropy-controlled quadratic markov measure field models for efficient image segmentation. IEEE Transactions on Image Processing 16, 3047–3057 (2007)
3. Hower, D., Singh, V., Johnson, S.: Label set perturbation for mrf based neuroimaging segmentation. In: IEEE International Conference on Computer Vision ICCV 2009, pp. 849–856 (2009) 4. Chamorro-Martinez, J., Sanchez, D., Prados-Suarez, B.: A Fuzzy Colour Image Segmentation Applied to Robot Vision. In: Advances in Soft Computing Engineering, Design and Manufacturing. Springer, Heidelberg (2002) 5. Mishra, A., Aloimonos, Y.: Active segmentation for robots. In: International Conference on Intelligent Robots and Systems (2009) 6. Dalmau, O., Rivera, M., Mayorga, P.P.: Computing the alpha-channel with probabilistic segmentation for image colorization. In: IEEE Proc. Workshop in Interactive Computer Vision (ICV 2007), pp. 1–7 (2007) 7. Dalmau, O., Rivera, M., Alarcon, T.: Bayesian Scheme for Interactive Colourization, Recolourization and Image/Video Editing. To Appear in Computer Graphics Forum (2010) 8. Mumford, D., Shah, J.: Optimal approximation by piecewise smooth functions and associated variational problem. Commun. Pure Appl. Math., 577–685 (1989) 9. Hewer, G.A., Kenney, C., Manjunath, B.S.: Variational image segmentation using boundary functions. IEEE Transactions on Image Processing 7, 1269–1282 (1998) 10. Weiss, Y.: Segmentation using eigenvectors: A unifying view. In: ICCV, vol. (2), pp. 975–982 (1999) 11. Ahmed, M.N., Yamany, S.M., Mohamed, N., Farag, A.A., Moriarty, T.: A modified fuzzy c-means algorithm for bias field estimation and segmentation of mri data. IEEE Trans. Med. Imaging 21(3), 193–199 (2002) 12. Chuang, K.S., Tzeng, H.L., Chen, S., Wu, J., Chen, T.J.: Fuzzy c-means clustering with spatial information for image segmentation. Computerized Medical Imaging and Graphics 30, 9–15 (2006) 13. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8), 888–905 (2000) 14. Boykov, Y., Jolly, M.P.: Interactive organ segmentation using graph cuts. In: Delp, S.L., DiGoia, A.M., Jaramaz, B. (eds.) MICCAI 2000. LNCS, vol. 1935, pp. 276–286. Springer, Heidelberg (2000) 15. Marroquin, J.L., Velazco, F., Rivera, M., Nakamura, M.: Gauss-markov measure field models for low-level vision. IEEE Transactions on Pattern Analysis and Machine Intelligence 23, 337–348 (2001) 16. Marroquin, J.L., Arce, E., Botello, S.: Hidden Markov measure field models for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 25, 1380–1387 (2003) 17. Dalmau, O., Rivera, M.: A general bayesian markov random field model for probabilistic image segmentation. In: Wiederhold, P., Barneva, R.P. (eds.) IWCIA 2009. LNCS, vol. 5852, pp. 149–161. Springer, Heidelberg (2009) 18. Cha, S.H.: Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences 1, 300–307 (2007) 19. Bezdek, J.C., Coray, C., Gunderson, R., Watson, J.: Detection and characterization of cluster substructure i. linear structure: Fuzzy c-lines. SIAM Journal on Applied Mathematics 40(2), 339–357 (1981) 20. Bezdek, J.C., Coray, C., Gunderson, R., Watson, J.: Detection and characterization of cluster substructure ii. fuzzy c- varieties and convex combinations thereof. SIAM Journal on Applied Mathematics 40(2), 358–372 (1981) 21. Taneja, I.J., Gupta, H.: On generalized measures of relative information and inaccuracy. Applications of Mathematics 23, 317–333 (1978)
22. Grady, L., Schiwietz, T., Aharon, S., Westermann, R.: Random Walks for interactive organ segmentation in two and three dimensions: Implementation and validation. In: Duncan, J.S., Gerig, G. (eds.) MICCAI 2005. LNCS, vol. 3750, pp. 773–780. Springer, Heidelberg (2005) 23. Grady, L.: Multilabel random walker image segmentation using prior models. In: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CVPR, vol. 1, pp. 763–770. IEEE, Los Alamitos (June 2005) 24. Kerridge, D.: Inaccuracy and inference. Journal of the Royal Statistical, Series B, 184–194 (1961) 25. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer Series in Operation Research (2000) 26. Geman, D., Reynolds, G.: Constrained restoration and the recovery of discontinuities. IEEE Trans. Pattern Anal. Mach. Intell. 14(3), 367–383 (1992) 27. Black, M., Rangarajan, A.: On the unification of line processes, outlier rejection, and robust statistics with applications in early vision. Int’l J. Computer Vision 19(1), 57–92 (1996) 28. Rivera, M., Dalmau, O., Tago, J.: Image segmentation by convex quadratic programming. In: ICPR, pp. 1–5. IEEE, Los Alamitos (2008) 29. Blake, A., Zisserman, A.: Visual Reconstruction. MIT Press, Cambridge (1987) 30. Juan, O., Keriven, R.: Trimap segmentation for fast and user-friendly alpha matting. In: Paragios, N., Faugeras, O., Chan, T., Schn¨orr, C. (eds.) VLSM 2005. LNCS, vol. 3752, pp. 186– 197. Springer, Heidelberg (2005) 31. Neal, R.M., Hinton, G.E.: A view of the EM algorithm that justifies incremental, sparse, and other variants. In: Jordan, M.I. (ed.) Learning in Graphical Models, pp. 355–368. Kluwer Academic Publishers, Boston (1998) 32. Marroqu´ın, J.L., Santana, E.A., Botello, S.: Hidden Markov measure field models for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 25, 1380– 1387 (2003) 33. Nelder, J.A., Mead, R.: A simplex method for function minimization. Comput. J. 7, 308–313 (1965) 34. Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical recipes in C: the art of scientific computing, 2nd edn. Cambridge University Press, New York (1992) 35. Vogel, C.R., Oman, M.E.: Iterative methods for total variation denoising. SIAM J. Sci. Comput. 17, 227–238 (1996)
Appendix A
Here we show that the optimization problem (8)-(9) can be solved using the half-quadratic technique [26,27]. Here we follow the notation and methodology described in Ref. [27]. Then, one needs to find the conditions that the function ρ(x) = x^β should satisfy to be used in a half-quadratic regularization. According to [27],
ρ(x) = min_z { x² z + Ψ(z) },   (23)
where
Ψ(z) := φ((φ′)^{−1}(z)) − z (φ′)^{−1}(z),  φ(x²) = ρ(x).
Based on the previous definition, one obtains that
ψ(z) = ((2 − β)/2) (2z/β)^{β/(β−2)}.
As the optimization problem (23) should have a minimum at z = φ′(x²) = ρ′(x)/(2x) = (β/2) x^{β−2} (see Ref. [27] for details¹), the coefficient of ψ(z), i.e. (2 − β)/2, should be positive, and therefore β ≤ 2. Another way to obtain the previous result is by using the condition that the function φ(x) should be concave, that is, φ″(x) ≤ 0:
φ″(x) = (β/2) ((β − 2)/2) x^{β/2 − 2},
then (β/2)((β − 2)/2) ≤ 0 and again we conclude that β ≤ 2.
Observe also that for 1 ≤ β ≤ 2 the function φ(x) satisfies
lim_{x→0} φ′(x) = ∞,  lim_{x→∞} φ′(x) = 0,   (24)
that is, the function φ(x) does not satisfy the condition lim_{x→0} φ′(x) = 1, which is very important in order to define a bounded weight function z = (β/2) x^{β−2} for x > 0. Note, however, that the φ-function of Total Variation (TV), i.e. ρ(x) = |x|, satisfies the conditions (24). So, similarly to TV [35], we can redefine the ρ-function in the following way: ρ̃(x) = (2/β) ε^{1−β/2} (x² + ε)^{β/2}, where ε > 0 is a small real value, for instance ε = 10^-6. For this ρ-function, the corresponding weight function is z = φ′(x²) = ρ̃′(x)/(2x) = ε^{1−β/2} (x² + ε)^{β/2−1}. Now the function φ(x) satisfies
lim_{x→0} φ′(x) = 1,  lim_{x→∞} φ′(x) = 0.   (25)
¹ The same result is obtained if one computes the derivative with respect to z of the function in Eq. (23) and sets it equal to zero.
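A quick numerical check of the ε-regularized weight derived above (as reconstructed here) follows; the values of β and ε are the illustrative ones from the text, and the function name is an assumption.

```python
import numpy as np

def hq_weight(x, beta=1.5, eps=1e-6):
    """Weight z = phi'(x^2) = eps^(1 - beta/2) * (x^2 + eps)^(beta/2 - 1),
    which stays bounded at x = 0 (limit equals 1)."""
    return eps ** (1.0 - beta / 2.0) * (x ** 2 + eps) ** (beta / 2.0 - 1.0)

# limits of (25), checked numerically
print(hq_weight(0.0))    # -> 1.0
print(hq_weight(1e3))    # -> close to 0 for beta < 2
```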
Robust Spatial Regularization and Velocity Layer Separation for Optical Flow Computation on Transparent Sequences Alonso Ramirez-Manzanares1, Abel Palafox-Gonzalez1, and Mariano Rivera2 1
Universidad de Guanajuato, Departamento de Matematicas, Valenciana, Guanajuato, Gto. Mexico. C.P. 36000 2 CIMAT A.C., Callejon Jalisco S/N, Valenciana, Guanajuato, Gto. Mexico. C.P. 36000
[email protected],
[email protected],
[email protected]
Abstract. Motion estimation in sequences with transparencies is an important problem in robotics and medical imaging applications. In this work we propose two procedures to improve transparent optical flow computation. We build on a variational approach for estimating multi-valued velocity fields in transparent sequences. That method estimates multi-valued velocity fields which are not necessarily piecewise constant on a layer: each layer can evolve according to a non-parametric optical flow. First, we introduce a robust statistical spatial interaction weight which allows us to segment the multi-motion field. As a result, our method is capable of recovering the object's shape and the velocity field for each object with high accuracy. Second, we develop a procedure to separate the component layers of rigid objects from a transparent sequence. Such a separation is possible because of the high accuracy of the object's shape recovered from our transparent optical flow computation. Our proposal is robust to the presence of several objects in the same sequence as well as to different velocities for the same object along the sequence. We show how our approach outperforms existing methods and we illustrate its capabilities on challenging sequences. Keywords: Transparent Optical Flow, Layer Recovery, Regularization.
1
Introduction
There exists a very wide literature on apparent motion estimation, also called Optical Flow (OF), due to the number of applications that require motion estimation and the complexity of the task. Motion estimation methods rely on conservation of a function of the recorded signal, generally luminance, or some of its derivatives across time and on spatial or spatiotemporal regularity constraints. We first introduce the main notation: let Ω ⊂ R², then the function f : (x, t) ∈ Ω × {0, . . . , T} → R denotes the greyscale sequence or "image sequence", defined
as a volume over space x = (x₁, x₂) and time t, and u = (u¹(x, t), u²(x, t))^T denotes "its" motion field. The simplest conservation principle, often referred to as the Lambertian assumption, states that the intensity of a point remains constant along its trajectory. Thus, in the case of a time-discrete sequence, it provides the classical Displaced Frame Difference equation (DFD)
f(x, t) − f(x − u, t − 1) = 0,   (1)
or its differential approximation, the well-known Optical Flow Constraint equation (OFC)
(u¹ ∂_{x₁} + u² ∂_{x₂} + ∂_t) f(x, t) = ∇₃f(x, t) · (u¹, u², 1)^T = 0,   (2)
where ∇₃f is the spatiotemporal gradient (f_{x₁}, f_{x₂}, f_t)^T of f (we will use the notation ∇f for the spatial gradient (f_{x₁}, f_{x₂})^T of f). The gradient provides an affine constraint on the velocity space, and is sometimes referred to as the "motion constraint vector". Although widely used, this principle is not satisfied in several real situations, which include: changing luminance conditions, specularities, multiple motions and, in the case of interest for this paper, transparency. Transparency can be modeled as a linear "superposition" of moving layers, corresponding to pixelwise operations on layer intensities. Given n layers I_i(x, t), a transparent image can be modeled as a combination of them,
f(x, t) = Σ_{i=1}^{n} I_i(x, t),   (3)
and this is the framework in which we place ourselves in this paper. Although more complex formation models can be considered [1], this is in general a reasonable assumption: the light energy measured by a camera/eye is roughly proportional to the source energy times the reflectance and transmittance factors of the different layers that constitute the medium/scene under observation [2]. Spatial regularization (spatial averaging) reduces the corruption caused by ill-posed observation models, acquisition noise and incomplete data [3]. In this sense, spatial regularization methods have been extensively used in image processing by assuming that the computed velocities are similar for neighboring voxels (i.e. real sequences are composed of compact objects). However, a regularization framework which assumes global smoothness may inadvertently eliminate the boundaries that separate objects with different velocities. A more appropriate regularization approach is one based on a statistical outlier rejection scheme, where the outliers with respect to the global smoothness assumption are the pixels with a significantly different model over spatial regions [4,5]. These edge-preserving approaches allow the regularization to break the global smoothness assumption at those sites, thus preserving the features of each region. In this paper we present a new method for performing a robust spatial regularization for Transparent Optical Flow (TOF) computation. For this aim, we
regularize the set of velocity indicator variables proposed in [6] by means of an outlier rejection framework. That method is based on local detectors sensitive to one or more velocities within a finite velocity space, followed by variationally formulated integration, regularization and simplification stages for single and transparent OFs. Our strategy allows the method to select similar regions for data integration; as a result, we obtain a segmented velocity field. As a second step, and based on the previous velocity segmentation, we build a procedure to recover the moving image layers I_i by proposing a velocity-based tracking of each object over the sequence. This article is organized as follows. Section 2 reviews the related work on non-parametric transparent motion recovery. In Section 3 we introduce our robust regularization, which allows us to recover a well defined OF for each object, and also our method for layer separation given the robust TOF computation. The method's performance is illustrated in Section 4 on challenging sequences. Finally, we present our discussion and conclusions in Section 5.
2
Related Work
Bergen et al . in [7] derive an iterative three frames algorithm for estimating two motions, by deriving first a 2-fold displaced frame difference equation using the three frames: assume that f is the sum of two layers f = I1 + I2 , moving respectively with motion u1 and u2 , and that the Lambertian assumption holds for each layer: Ii (x, t) − Ii (x − ui , t − 1) = 0. Applying first the DFD (1) for ui gives f (x, t) − f (x − ui , t − 1) = Ij (x, t) − Ij (x − ui , t − 1) := dj (x, t)
(4)
with (i, j) = (1, 2) or (i, j) = (2, 1). The DFD is in general non-zero, but one of the layers, I_i, has been eliminated. In case the motion of each layer I_i is constant on at least the three frames t, t − 1 and t − 2, the "difference" layer d_j satisfies the DFD d_j(x, t) − d_j(x − u_j, t − 1) = 0. Assuming u_i is known, u_j can then be computed by a single-motion estimation technique on d_j, providing an estimate for the difference layer d_i which in turn allows one to estimate u_i. This process is iterated until convergence. The work in [6] introduces a two-step approach for transparent motion estimation: the gathering of local information, and then its integration in order to provide global displacement information via a discrete variational model. For the finite sampling of the velocity space, we consider N vectors {u_1, . . . , u_N}, with u_i = (u_i¹, u_i²)^T,
(5)
describing the set of possible velocities. They use an initial local estimate of transparent displacement likelihood. This likelihood is encoded via the distance function d(ui , r) ∈ R+ |i=1...N which describes at each spatiotemporal position
r = (x, t) whether the velocity u_i can locally explain the apparent motion (characterized by d(u_i, r) ≈ 0) or not (characterized by d(u_i, r) ≫ 0), see [6]. Given the local motion estimates, an integration process is applied in order to obtain a global velocity estimation which deals with the aperture problem and acquisition noise. The regularization is performed by computing the minimizer of the objective function E(α) defined by
E(α) = Σ_r { Σ_i [ d(u_i, r) α_i²(r) + λ_a (1 − α_i(r))² ]   (6)
  + (λ_s/2) Σ_{s∈N_r} Σ_i w_i(r, s) [α_i(r) − α_i(s)]²   (7)
  + λ_c Σ_i [ κ_N ᾱ²(r) − α_i²(r) ] },   (8)
where the unknown of the problem is the vector valued field α: α(r) = [α1 (r), . . . , αN (r)]T ,
(9)
with α_i(r) ∈ [0, 1] ∀r ∈ Ω × {0, . . . , T}. Note that, although each component α_i(r) can be interpreted as a probability, α(r) is not a probability measure (as in [8,9]) in the sense that the sum of its components is not constrained to be equal to one. If two motions u_i and u_j are present at a particular pixel position, then we expect that α_i(r) ≈ α_j(r) ≈ 1. Conversely, the velocity(ies) at a position r can be extracted from α(r) by selecting the velocity(ies) u_i with the highest α_i(r) value(s). The minimization of the cost function (6)-(8) is subject to the constraints α_i(r) ∈ [0, 1] for all i, with ᾱ(r) := (1/N) Σ_i α_i(r); N_r := {s : r, s ∈ Ω × [0, T], ||r − s|| < 2} is the spatiotemporal neighborhood of the position r, and κ, λ_a, λ_s, λ_c are user-defined positive constants. Terms (6), (7) and (8) are denoted as Attach (likelihood), Spatial Regularization and Inter-Model Competition, respectively. We note that the w_i(r, s) are diffusion weights along the i-th velocity model that promote the recovery of similar α values for positions along the i-th displacement, as explained in [6]. One drawback of that formulation is the global smoothness assumption over the spatio-temporal positions along the i-th velocity, as coded in term (7). Unfortunately, that term over-estimates the objects' size, as shown in Panel 1(b). In this work we improve the results by introducing the robust statistics outlier rejection theory. Our formulation allows us to break the global smoothness assumption and promotes the detection of boundaries between objects with different velocities. Moreover, in the original formulation, the layer recovery problem was not addressed. Given the TOF computation, we introduce a new methodology for recovering the original image layers of rigid objects based on the information coded in the α layers. Our methodologies are explained in the following sections.
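For concreteness, the sketch below evaluates the attach term (6) and the spatial term (7) of E(α) on a single frame. It is not the authors' code: the reduction of the spatiotemporal neighborhood to a spatial 4-neighborhood, the wrap-around boundaries from np.roll, and the omission of the inter-model competition term (8) are simplifications.

```python
import numpy as np

def attach_and_smoothness(alpha, d, w, lam_a=1.0, lam_s=1.0):
    """Terms (6) and (7) of E(alpha) for one frame.
    alpha, d : (H, W, N) indicator field and local distances d(u_i, r);
    w        : dict mapping a shift (dy, dx) -> (H, W, N) diffusion weights w_i(r, s)."""
    attach = np.sum(d * alpha ** 2 + lam_a * (1.0 - alpha) ** 2)   # term (6)
    smooth = 0.0
    for (dy, dx), w_i in w.items():
        a_s = np.roll(alpha, shift=(dy, dx), axis=(0, 1))          # alpha_i(s)
        smooth += np.sum(w_i * (alpha - a_s) ** 2)                 # term (7)
    return attach + 0.5 * lam_s * smooth
```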
3 Methods
In this section we present our regularization approach. We first give a brief description of the regularization framework on which our method is based, and then we introduce our robust regularization approach. Since each α_j(r) coefficient (Eq. (9)) in (6)-(8) is associated with a velocity model from (5), a spatial regularization over the α vector field provides a regularization of the estimated velocity vectors. Thus, we propose to perform a robust spatial regularization over the α vector field, as explained below.
3.1 Robust Spatial Regularization
The adverse effect of noise or the aperture problem can lead the local fitting to erroneous estimations. To mitigate this effect, a previous approach introduces prior knowledge about the global smoothness of the velocity via the integration of neighboring α coefficients [6]. However, we observe that global smoothness is often broken at the interfaces of voxels where the number of velocities or their direction change significantly. Therefore, we argue that a more appropriate strategy is to introduce a piecewise smoothness prior. This can be done by an outlier rejection scheme which iteratively recomputes robust weights of interaction between neighboring voxels [4,5]. For our robust spatial regularization we propose to minimize, with respect to α, the functional (6)-(8), where the spatial regularization term (7) is replaced by:
E_R(α, α̂, r) = Σ_{s∈N_r} Σ_i ρ(α̂_i(r) − α̂_i(s)) [α_i(r) − α_i(s)]²,   (10)
where α̂ is the non-regularized field and α_j(r) is the j-th α coefficient at r. The regularization term (10) is derived from the Markov Random Field prior given the neighborhoods N_r, see [10]. According to the outlier rejection (or robust statistics) theory [4], the weights ρ(α̂_i(r) − α̂_i(s)) indicate how similar the neighboring voxel r is to s, so that spatial data averaging can be performed. Those weights are defined as
ρ(t) = (∂ψ(t)/∂t) / (2t) = exp(−kt²)/2,   (11)
where we select ψ(t) = 1 − (1/(2k)) exp(−kt²) as the Welsch robust potential function, such that for similar neighbors ρ(·) is large, and conversely small for different ones. Finally, note that our method promotes the computation of the same velocity model for similar neighboring voxels (i.e. for ρ(α̂_i(r) − α̂_i(s)) ≈ 0.5) due to the use of the Euclidean norm in (10). With this in mind, our proposal in fact produces a multi-velocity segmentation, where the places with ρ(α̂_i(r) − α̂_i(s)) ≈ 0 denote the boundaries between objects with different velocities. Now we can apply the outlier rejection scheme, which successively computes ρ(α̂_i(r) − α̂_i(s)) for all i and all pairs <r, s>, and then regularizes the multi-velocity field by minimizing (10) until a stopping criterion is met, as explained in the implementation subsection and Algorithm 1.
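A minimal sketch of the robust interaction weights of Eq. (11), evaluated between right and down neighbors, is shown below; restricting the neighborhood to these two shifts and the wrap-around boundary from np.roll are assumptions made for brevity.

```python
import numpy as np

def robust_weights(alpha_hat, k=200.0):
    """rho(t) = exp(-k t^2) / 2 on differences between each voxel and its
    right/down neighbors, per velocity layer.
    alpha_hat: (H, W, N) non-regularized field. Returns a dict shift -> weights."""
    weights = {}
    for shift in ((0, 1), (1, 0)):
        diff = alpha_hat - np.roll(alpha_hat, shift=shift, axis=(0, 1))
        weights[shift] = 0.5 * np.exp(-k * diff ** 2)
    return weights
```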
3.2 Recovering Rigid Objects from the TOF
Once the TOF is computed, it is still a challenging task to recover the original image layers. In this section we propose a new procedure for recovering the rigid transparent objects involved in a transparent sequence composed of M frames. For this aim, we use the multi-velocity field segmentation computed in the previous section, as explained below. In the two-layer situation, the use of equation (4) theoretically allows one to recover partial information about the individual layers. Because only the temporal layer difference is provided, the problem is ill-posed, and errors in motion estimation and noise make it complex even in the two-layer situation, while the case of three or more layers becomes extremely arduous. Toro et al. have proposed an inverse-problem regularization approach for it in [11]. Other authors use specific motion behavior information, as Sarel and Irani did in [12], or assume constant ego-motion, as Oo et al. did in [13]. Our proposal below can handle several objects with different velocities in the same sequence and does not assume any type of displacement. On the other hand, it is restricted to the case of rigid objects and, like all the methods for TOF recovery, it assumes that the movement of each object is constant for at least 3 frames. In the general case, for each frame F_M, F_{M−1}, F_{M−2}, . . . , F_2 we can group neighboring voxels associated with the same velocity layer α_i (for instance, connected pixels such that α_i(r) > 0.5). Given that our regularization process provides homogeneous displacement regions inside the objects, see Figure 1(c), a simple threshold is sufficient to perform the connected component labeling (i.e. to separate the pixels that belong to the same moving object). Thus, in this case the intensity values for the connected component labeling are V = {< 0.5, ≥ 0.5}. If the object is rigid, then it is possible to displace it according to the associated velocity to the next frame, and we can repeat this process several times until we reach the last frame (as in an object tracking procedure). Once each group of pixels for the same object in each frame F_1, F_2, F_3, . . . , F_M has reached the last frame, we have M different images of the same object with a different transparent overlap. Thus, we can average the image intensities for this object. This process eliminates the changing pattern (due to the transparent overlap) and keeps the object's structure. The steps of this procedure are detailed in Algorithm 2. The procedure above describes the basic idea, but there are still some problems: a) the recovered images present a contrast reduction due to the averaging process, and b) the structure of other objects that lie along the velocity of the object we are recovering is kept, because the average cannot eliminate them since they are always present along the frames. In order to overcome the above problems it is possible to improve our proposal as explained in the following. Without loss of generality, suppose the sequence is composed of only two objects, A and B. Once we have the first estimation of object A and its position along the frames, we can subtract it from the original
sequence, generating a new sequence F/A, where the contribution of object A is attenuated (or eliminated) in all the frames. Thus, we can feed this new sequence to the basic procedure in Algorithm 2 in order to re-estimate object B. The symmetric procedure is performed to generate the sequence F/B and re-estimate object A. We can easily extend this idea to more than two objects (where, for the estimation of object A_i, we compute the sequence F/{A_j}, j ≠ i, i.e. the grey-scale sequence without all the other objects), and it can also be extended to an iterative method that computes a better estimation of each object at every iteration.
3.3 Implementation
For the sake of reproducibility, we use the set of user-defined parameters in (6)-(8) proposed in [6]. We initially set the α_i(r)'s to the value 0.5 for all the velocities with a minimal distance in (6); all the other values were set to zero. We perform a deterministic annealing on λ_c. The annealing scheme is the following: for each Gauss-Seidel (see [6]) iteration k = 1, 2, . . . , P, we set λ_c^(k) = λ_c a_k, where λ_c is the chosen contrast level and a_k = 1 − 0.95^(100k/P) is a factor that increases to 1 in approximately 90% of the total iterations. We used the same annealing scheduling in all our experiments, see [6]. The complete robust regularization algorithm is given in Algorithm 1.

Algorithm 1. Robust Regularization
1) Propose a local estimation α⁰ as the starting point for the minimization (we set α_i(r) = 0.5 for all the positions where d(u_i, r) ≈ 0 and α_i(r) = 0.0 otherwise).
2) Set ρ(α̂_i(r) − α̂_i(s)) = 0.5 ∀r, s, i (i.e. we enforce global spatial regularization for all voxels at the beginning).
for t = 1, . . . , n do
  3) Compute the regularized α^t by minimizing terms (6), (8) and (10).
  4) Update ρ(α̂_i(r) − α̂_i(s)) from (11) with α̂ = α^t.
end for

In our experiments, for the noise-free sequences, we use the user-defined parameter k = 200, which defines the shape of the robust regularization function ψ(t) in (11). In the case of noisy sequences we decrease this parameter to k = 120. The procedure for the layer recovery method is summarized in Algorithm 2.
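The annealing factor a_k described above can be sketched as follows; the printed indices are only a usage example, and the function name is an assumption.

```python
import numpy as np

def lambda_c_schedule(lambda_c, P):
    """lambda_c^(k) = lambda_c * a_k with a_k = 1 - 0.95**(100*k/P),
    so the contrast weight ramps up to lambda_c over the P iterations."""
    k = np.arange(1, P + 1)
    return lambda_c * (1.0 - 0.95 ** (100.0 * k / P))

print(lambda_c_schedule(1.0, P=100)[[0, 49, 89, 99]])   # early, middle and late values
```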
4
Experiments
In this section we present results for synthetic and real transparent sequences; all of them can be downloaded at the web site: www-sop.inria.fr/odyssee/data/sequences/.
Algorithm 2. N transparent layers (I_i, i = 1, . . . , N) recovery from the robust TOF
Compute the α^B layers as the binarization of the α layers.
Detect all the objects A_i with different velocities by means of a connected component analysis over the α^B layers at the first frame.
for EACH object index i do
  I_i ← I_F(A_i, 1) {where function I_F returns the image (grey-scale values) of object A_i at frame t = 1 from sequence F}
  for t = 1, . . . , M − 1 do
    Select the velocity that moves object A_i from F_t to F_{t+1}.
    Move object A_i and I_i to F_{t+1} according to the current velocity.
    Select the k-th velocity such that A_i ∩ α^B_k is maximum (i.e. the α_k layer that contains the shape of A_i).
    I_i ← I_i + I_F(A_i, t + 1) {accumulation process}
  end for
  I_i ← I_i / M {averaging process}
end for
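A hedged sketch of the accumulate-and-average idea of Algorithm 2 for a single rigid object follows; integer displacements, wrap-around boundaries, and the assumption that the object mask and per-frame velocities are already given (from the α layers) are simplifications, not part of the original method.

```python
import numpy as np

def recover_layer(frames, masks, velocities):
    """frames: list of M (H, W) images; masks[t]: boolean support of the object
    in frame t; velocities[t]: integer (dy, dx) displacement from frame t to t+1."""
    M = len(frames)
    acc = np.where(masks[0], frames[0], 0.0).astype(float)
    mask = masks[0].copy()
    for t in range(M - 1):
        dy, dx = velocities[t]
        acc = np.roll(acc, shift=(dy, dx), axis=(0, 1))    # track the object
        mask = np.roll(mask, shift=(dy, dx), axis=(0, 1))
        acc += np.where(mask, frames[t + 1], 0.0)          # accumulation
    return acc / M   # averaging attenuates the changing transparent overlap
```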
4.1
Regularization of Local Measurements for Realistic Textured Sequences
High textured sequences are relatively easy to solve using local motion measures. In order to evaluate the actual performance of the method, we use realistic
Fig. 1. (a) One frame of a sequence with a transparent object moving with changing translational speed over a translating background, and (b) the non-robust recovered multi-motion field. The robust regularization advantages are shown in Panels (c) and (d), which show the α velocity layer for velocity [1, −1] and the computed TOF, respectively. The improvement w.r.t. Panel (b) is clear.
textured scenes with homogeneous regions where many velocities locally explain the data. It is then necessary to be able to carry the information from less ambiguous regions. The next experiments are designed with that purpose in mind. Our first experiment presents a comparison of our proposal (Algorithm 1) vs. the non-regularized variational approach [6]. Panel 1(a) presents a transparent sequence with a time-varying transparent region and motions. The changing velocities are sketched in the same panel. The non-robust solution is presented in Panel 1(b) (this image was taken from article [6]), and our result is presented in Panels 1(c) and 1(d): we show the α layer for the airplane's velocity [1,−1] and the recovered and sampled TOF. Note that our proposal allows us to recover details of the shape of the airplane. We do not show the α layer associated with the background because it does not present velocity boundaries. The robust weights ρ(α̂_i(r) − α̂_i(s)) are shown in Figure 2, for a noise-free case and for a noisy case (SNR=50), in Panels 2(a) and 2(b), respectively. We note that the velocity (object) boundaries are well defined. The reader can also compare with results for more frames of this sequence without robust regularization at www.cimat.mx/~mrivera/vision/transparent_sequences/index.html. The next sequence is composed of two moving photographs: a face I_1 with motion u = [1, 0] (a scene with limited texture) and a rocky Mars landscape I_2, with
ˆ i (s)) (11) at frame # 6. Black pixFig. 2. Robust regularization weights ρ(α ˆ i (r) − α els denote sites where the global smoothness assumption is broken, thus the velocity borders are preserved.
Fig. 3. (a) Two noisy (SNR=10) realistic textured patterns in translation. (b) and (c) recovered image frames according to basic procedure in Algorithm 2, see text.
[Panel conditions for Fig. 4: (a), (b), (d), (e), (f) noise-free; (c), (g), (h), (i) SNR=50.]
Fig. 4. Results for our two proposals. We show the transparent sequence in (a), the robust boundaries in (b), computed TOF in (c), α layers associated to the actual velocities in (d),(e), and (f), and recovered image layers in (g),(h), and (i), respectively.
motion v = [−1, 0]. The sequence was generated with f = 0.6I1 + 0.4I2 , see Figure 3 (a). We illustrate the basic idea in Algorithm 2 for recovering the original layers from a transparent noisy sequence (SNR = 10) in Panel 3(a) with M =17 frames. In this case, the process is simple because the whole frame is associated to the two velocity layers, thus we recovered the whole images. We show the recovered image layers in Panels 3(b) and 3(c), the root mean squared errors between the actual and recovered frames are 26.72 and 38.65 respectively (the dynamic range of the images is [0,255]). Note that the recovered images
present a contrast reduction due to the averaging process; also, structures that lie along the velocity are kept in the other image because the average cannot eliminate them, given that they are constant along the frames (as for instance the eyebrow in that sequence). In order to improve the results above, we apply the refinement procedure explained in the last paragraph of Section 3.2. That procedure allows one to recover high-quality estimations of the image layers. This is shown in Figure 4: Panel 4(a) shows the original transparent sequence composed of 10 frames with an airplane, a car, and the background moving with different velocities: [−1,1], [1,0] and [−1,0], respectively. The velocity boundaries and the α layers for the associated velocities for the noise-free case are shown in Panels 4(b), 4(d), 4(e) and 4(f), respectively. The computed TOF for the case SNR=50 is shown in Panel 4(c). Because of the high quality of the α layers it is possible to obtain a good estimation of the original layers, even for noisy sequences. The estimated image layers are shown in Panels 4(g), 4(h), and 4(i).
5
Discussion and Conclusion
In this work we developed a robust regularization tool that improves a non-parametric velocity scheme which allows the recovery of an arbitrary number of displacements at the cost of using a fixed dictionary of velocities. Our approach is inspired by the robust statistical methods for outlier rejection. In this case the outliers refer to voxels where the global smoothness assumption is violated. Our main contribution is to propose a robust regularization framework that improves the previously proposed solutions. The computed velocities present homogeneous directions, which show how the regularization corrects the local estimations. We also introduce a new method for transparent layer recovery based on the velocity layers computed with our robust regularization scheme. Given that our robust method provides a well defined object shape along the frames, we can use that information in order to collect different samples of the object intensities and compute an estimation by means of an averaging process. The robustness to noise of our proposal represents an additional difference with respect to several methods such as [14,15]: solving noisy sequences (as for instance our experiment in Figure 3) is not the purpose of those methods. With this in mind, we note that our proposal belongs to the family of methods that regularize local transparent velocities, as in [16]. In future work, we will study the diffusion terms in more depth and also investigate how different velocity maps may interact in the regularization.
References 1. Oppenheim, A.V.: Superposition in a class of nonlinear systems. In: Proceedings of IEEE International Convention, New York, USA, pp. 171–177 (1964) 2. Guenther, R.D.: Modern Optics. John Wiley and Sons, Chichester (1990)
3. Li, S.Z.: Markov Random Field Modeling in Image Analysis. Springer, Heidelberg (2001) 4. Black, M.J., Rangarajan, P.: On the unification of line processes, outlier rejection, and robust statistics with applications in early vision. The International Journal of Computer Vision 19(1), 57–91 (1996) 5. Charbonnier, P., Blanc-F´eraud, L., Aubert, G., Barlaud, M.: Deterministic edgepreserving regularization in computed imaging. IEEE Transactions on Image Processing 6(2), 298–311 (1997) 6. Ramirez-Manzanares, A., Rivera, M., Kornprobst, P., Lauze, F.: A variational approach for multi-valued velocity field estimation in transparent sequences. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 227– 238. Springer, Heidelberg (2007) 7. Bergen, J.R., Burt, P.J., Hingorani, R., Peleg, S.: Computing two motions from three frames. In: Third International Conference on Computer Vision, Osaka, Japan, pp. 27–32 (December 1990) 8. Weiss, Y., Adelson, E.H.: A unified mixture framework for motion segmentation: incorporating spatial coherence and estimating the number of models. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition, San Francisco, CA, pp. 321–326. IEEE, Los Alamitos (June 1996) 9. Rivera, M., Ocegueda, O., Marroquin, J.L.: Entropy-controlled quadratic markov measure field models for efficient image segmentation. IEEE Transactions on Image Processing 16(12), 3047–3057 (2007) 10. Besag, J.: Spatial interaction and the statistical analysis of lattice systems (with discussion). Journal of Royal Statistical Society 2, 192–236 (1974) 11. Toro, J., Owens, F., Medina, R.: Using known motion fields for image separation in transparency. Pattern Recognition Letters 24, 597–605 (2003) 12. Sarel, B., Irani, M.: Separating transparent layers of repetitive dynamic behaviors. In: Proceedings of the Tenth International Conference on Computer Vision, Bejin, China, vol. 1, pp. 26–32. IEEE Computer Society, Los Alamitos (2005) 13. Oo, T., Kawasaki, H., Ohsawa, Y., Ikeuchi, K.: The separation of reflected and transparent layers from real-world image sequences. Mach. Vision Appl. 18(1), 17– 24 (2007) 14. Szeliski, R., Avidan, S., Anandan, P.: Layer extraction from multiple images containing reflections and transparency. In: Proceedings In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 246–253 (2000) 15. Sarel, B., Irani, M.: Separating transparent layers through layer information exchange. In: Pajdla, T., Matas, J. (eds.) Proceedings of the 8th European Conference on Computer Vision, Prague, Czech Republic, pp. 328–341. Springer, Heidelberg (2004) 16. Stuke, I., Aach, T., Barth, E., Mota, C.: Multiple-motion-estimation by block matching using MRF. International Journal of Computer and Information Science 26, 141–152 (2004)
SAR Image Denoising Using the Non-Subsampled Contourlet Transform and Morphological Operators
José Manuel Mejía Muñoz, Humberto de Jesús Ochoa Domínguez, Leticia Ortega Máynez, Osslan Osiris Vergara Villegas, Vianey Guadalupe Cruz Sánchez, Nelly Gordillo Castillo, and Efrén David Gutiérrez Casas
Departamento de Ingeniería Eléctrica y Computación, Universidad Autónoma de Ciudad Juárez, Avenida del Charro 450 Norte, C.P. 32310, Ciudad Juárez, Chihuahua, México
[email protected]
Abstract. This paper introduces a novel algorithm that combines the Non-Subsampled Contourlet Transform (NSCT) and morphological operators to reduce the multiplicative noise of synthetic aperture radar images. The image corrupted by multiplicative noise is preprocessed and decomposed into several scales and directions using the NSCT. Then, the contours and uniform regions of each subband are separated from the noise. Finally, the resulting denoised subbands are transformed back into the spatial domain and the exponential function is applied to obtain the denoised image. Experimental results show that the proposed method drastically reduces the multiplicative noise and outperforms other denoising methods, while achieving a better preservation of the visual details. Keywords: SAR images, Non-Subsampled Contourlet Transform, SAR image denoising, speckle noise.
1
Introduction
Images acquired by sensors are frequently corrupted by random variations of intensity called noise. The noise can be due to fluctuations in illumination or can be introduced by the sensor itself. In the case of Synthetic Aperture Radar (SAR) images, the noise is also related to the coherent nature of the radar system that generated the image. This multiplicative noise, also known as speckle, must be removed in order to further process the images. Therefore, the goal is to remove the speckle while preserving the main features, such as edges, contours and textures, of the original image.
This work was supported by FOMIX CHIH-2009-C01-117569. Corresponding author.
The Discrete Wavelet Transform (DWT) has an energy-compaction property over the wavelet coefficients which makes it suitable for image denoising. Most of the energy in a noisy image belongs to the original image and is captured to a great extent by the coefficients with magnitudes greater than a certain threshold, while most of the noise transforms into low-magnitude coefficients that lie below a given threshold [1]. There exist several methods to remove speckle. The most cited and widely used filter in the SAR community is the Frost filter [2]. This filter has an exponentially shaped kernel. The response varies locally with the variation of the coefficients. In the case of low variation of the coefficients, the filter is more average-like or low-pass, and in the case of high variation of the coefficients, the filter attempts to preserve sharp features by not averaging. There are some methods based on edge preservation. For example, [3] proposed a scheme to denoise and to preserve edges by performing scale multiplication, using the DWT. Two adjacent scales are multiplied in order to magnify the edge structures and to suppress noise, so that the edges are determined and retained after applying a dual threshold. The coefficients that do not belong to an edge are processed by a hard threshold to suppress the noise. Finally, the image is reconstructed to obtain a denoised image. In [4], the DWT and an edge detector are used to classify the coefficients of a subband into edge or non-edge coefficients. Therefore, only the amplitudes of non-edge coefficients are shrunk by using a soft-thresholding scheme. In [5], a method called Speckle Reducing Anisotropic Diffusion (SRAD) is proposed. The method defines a differential equation and uses it together with an adaptive technique based on the minimum mean square error in order to preserve edges and small features of the image. This method is tailored for noise reduction in ultrasonic and radar imaging applications. One shortcoming of the DWT is that the structures of an image can only be preserved in three directions (horizontal, vertical and diagonal). Thus, some important details might be lost. In this paper, a new method to reduce the speckle of SAR images is proposed. The method takes advantage of the sparseness of the Non-Subsampled Contourlet Transform (NSCT) representation [5], [6]. The approach to denoise each subband is to separate the contours from the homogeneous regions. To this end, the definitions of contour and edge in [7] are used in this work. The contours are detected through the use of several thresholds and morphological operations. The method reduces the speckle and outperforms the Frost and SRAD methods. This paper is organized as follows: in Section 2 the materials and methods used are explained. In Section 3 we describe the proposed algorithm. Section 4 reports experimental results for simulations and real SAR images. The paper concludes in Section 5.
2 Materials and Methods
In this section, we explain the methods used in the algorithm and its terminology. The speckle noise degrades images acquired with coherent imaging systems such as radar, sonar and ultrasound and it causes difficulties for image interpretation.
The model of a noisy image I(x, y) containing the samples of a true image R(x, y), corrupted by speckle noise n(x, y), can be written as

I(x, y) = R(x, y) · n(x, y).    (1)

Furthermore, it can be assumed that R(x, y) and n(x, y) are independent random variables and that n(x, y) has unit mean. One technique to reduce the speckle noise is to compute the expected value of (1) as

E[I(x, y)] = E[R(x, y)] E[n(x, y)] = R(x, y),    (2)

where E[·] is the expectation operator. Another technique to separate the noise from the true image is to compute the logarithm of (1) as

log[I(x, y)] = log[R(x, y) · n(x, y)],    (3)

I′(x, y) = log[I(x, y)] = log[R(x, y)] + log[n(x, y)].    (4)

For SAR images, log[R(x, y)] is the logarithm of the reflectivity and log[n(x, y)] is additive Gaussian noise, as was shown in [8]. In this work, the image I(x, y) is preprocessed using (4) to produce a new image I′(x, y).

The proposed system uses the NSCT, implemented by the Laplacian pyramid followed by directional filter banks, as shown in Fig. 1a. The NSCT decomposes the image into a lowpass subband and multiple scales with several directional subbands per scale, as shown in Fig. 1b. Therefore, this decomposition yields multiscale and time-frequency localization as well as a high degree of directionality and anisotropy. Also, the cascade structure allows the multiscale and directional decompositions to be independent of each other, and it is possible to decompose each scale into an arbitrary power-of-two number of directions. The pyramidal maxflat filter and directional diamond maxflat filters of order seven were used. Fig. 2 shows the NSCT of the Lena image. For visual clarity, only the lowpass subband and two scales of decomposition are shown. The first and second scales have four and eight directional subbands, respectively.

Consider the image I′(x, y), which has been adequately transformed into a scale-direction space. The transformed image X(s, L, θ) is said to exhibit a pyramidal structure defined by the number of decomposition levels or scales L and noisy subbands s at different directions θ for each scale, with the topmost level being the coarsest and the bottom level the finest. S(L, θ) is the directional subband, normalized in the interval [0, 255], at scale L and direction θ. The indexed transformed coefficient c(i, j, L, θ) is located at position (i, j) in a normalized subband at scale L with direction θ. The term coefficient will be used for samples in the transformed domain. The term pixel will be used for samples of the SAR image or samples inside a mask after thresholding. Since the denoising method is carried out in the normalized directional subbands, the terms subband and normalized directional subband will be used interchangeably. We say that a coefficient inside a subband is significant with respect to a scale threshold TL if
Fig. 1. The non-subsampled contourlet transform. a) Structure of the Laplacian pyramid together with the directional filter bank and b) frequency spectrum partitioned in several scales and directions by the contourlet transform [7].
|c(i, j, L, θ)| ≥ TL,  (i, j) ∈ S(L, θ);    (5)

otherwise it is insignificant. The threshold TL is adjusted according to the scale. The significance of a set of coefficients inside a subband can be written as

δTL{S(L, θ)} = 1 if |c(i, j, L, θ)| ≥ TL, and 0 otherwise.    (6)

After thresholding, a binary mask of significant coefficients inside a subband is obtained, and it can be represented by

SδTL(L, θ).    (7)

The algorithm makes use of a size filter which is applied to each binary mask. The filter searches for sets bi of 8-connected pixels set to one and discards (sets to zero) sets of fewer than three pixels. This operation can be written as

x(L, θ) = Filtersize[SδTL(L, θ)] = ⋃i { bi : |bi| ≥ size = 3 },    (8)

where x(L, θ) is the filtered binary mask and ⋃ is the union operator. During the course of the algorithm, we use the fact that the true contours inside a subband are regions of coefficients with very similar intensity values along the contour. Also, in homogeneous regions, the edges produced by the noise do not form contours of significant size.
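A compact sketch of the significance test (6) and the size filter (8) is given below. It assumes the subband has already been normalized to [0, 255], uses SciPy's connected-component labelling for the 8-connected sets bi, and reads the size condition as discarding components of fewer than three pixels; the function names are illustrative, not part of the original implementation.

import numpy as np
from scipy import ndimage

def significance_mask(subband, t_l):
    # Eq. (6): mark coefficients with |c(i, j, L, theta)| >= T_L.
    return (np.abs(subband) >= t_l).astype(np.uint8)

def size_filter(mask, min_size=3):
    # Eq. (8): keep 8-connected sets b_i with |b_i| >= min_size pixels.
    eight_connected = np.ones((3, 3), dtype=int)
    labels, n_sets = ndimage.label(mask, structure=eight_connected)
    if n_sets == 0:
        return np.zeros_like(mask)
    sizes = ndimage.sum(mask, labels, index=np.arange(1, n_sets + 1))
    kept_labels = np.flatnonzero(sizes >= min_size) + 1
    return np.isin(labels, kept_labels).astype(np.uint8)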
Fig. 2. The contourlet transform of the Lena image, showing the lowpass subband and two scales of decomposition. The first and second scales have four and eight directional subbands, respectively.
3 The Speckle Reduction Algorithm
Having defined the terminology used in the denoising method, we are in a position to understand the proposed algorithm.

1. Preprocessing
   Compute log[I(x, y)] to obtain I′(x, y)
2. Transformation
   Transform I′(x, y) using N scales and M directions per scale of the NSCT to obtain X(s, L, θ)
3. Subband processing
   Discard the finest scale.
   Normalize the remaining subbands s ∈ X(s, L, θ) in the interval [0, 255], except the lowpass subband.
   For each scale L and each direction θ
       Initialize the scale threshold TL = 255
       Obtain the initial binary mask SδTL(L, θ)
       Obtain the initial x(L, θ)
       While TL > (180 + (L − 1) ∗ 10)
           TL = TL − 2 ∗ (N − L + 1)
           Obtain SδTL(L, θ)
           Update x(L, θ) = (x(L, θ) ⊕ E) ∩ SδTL(L, θ)
       End
       Obtain the homogeneous regions: H(L, θ) = s(L, θ) − x(L, θ) · s(L, θ)
       Apply the average filter to H(L, θ)
       Obtain the denoised subband: s′(L, θ) = H(L, θ) + x(L, θ) · s(L, θ)
   End
4. Inverse transformation
   Apply the inverse NSCT to s′(L, θ) to recover Î′(x, y)
5. Compute exp[Î′(x, y)] to recover the denoised image

Furthermore, all the contours are expected to be composed of coefficients with a higher intensity level than the coefficients in homogeneous regions. Homogeneous regions will be represented by H(L, θ). Finally, the denoised subbands, to which the inverse NSCT is applied, are represented by s′(L, θ).

The pseudocode of the algorithm consists of four main steps: preprocessing, transformation, subband processing and inverse transformation. The algorithm starts by computing the logarithm of the input image to obtain a preprocessed image I′(x, y). Then, the NSCT is applied to obtain the lowpass subband and four scales with four directions per scale. The coefficients of the finest scale are discarded (set to zero) and the lowpass subband is not processed at all. The remaining subbands are normalized in the interval [0, 255] for further processing.
3.1 Contours Segmentation
The normalized subbands S(L, θ) are processed by testing each coefficient for significance against an initial scale threshold TL = 255. The idea behind the thresholding scheme is to search for the coefficients with the highest magnitude in a subband. This produces the binary mask SδTL(L, θ). Afterwards, a size filter is applied to preserve the areas more likely to be contours. These pixels are considered the initial points, or seed pixels, from which the contours will start to grow, as shown in Fig. 3.
Fig. 3. Growing of a contour in three iterations, a) seed pixels with high intensity, b) pixels of medium intensity added to the seed pixels, c) low intensity pixels added, d) contour grown. The pixels are inside a safe area.
In the first iteration, the seed pixels are dilated. In the next iteration, the threshold is updated according to the scale as follows:

TL = TL − 2 · (N − L + 1).    (9)
Then, a new binary mask per subband is obtained. Afterwards, x(L, θ) is updated by discarding the samples that fall outside the previous dilated areas. It should
Fig. 4. Black samples correspond to safe area of 3x3; light gray pixels belong to new pixels that appear with threshold TL . New pixels outside the safe area are discarded.
be noted that we are interested in coefficients that fall inside safe areas of the previous dilated mask, as shown in Fig. 4. Then, the new x(L, θ) is dilated, the threshold is updated according to equation (9), and a new iteration starts. The process is repeated until the threshold is equal to or less than (180 + (L − 1) ∗ 10); this stopping threshold was found after exhaustive tests. At the end of the iteration process, the binary mask x(L, θ) contains the contours.
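The iterative contour growing just described can be sketched as follows. The seed mask is assumed to come from the significance test and size filter of Section 2; the 3 × 3 structuring element E and the stopping rule follow the text, while everything else (names, SciPy-based dilation) is an illustrative choice rather than the authors' implementation.

import numpy as np
from scipy import ndimage

def grow_contours(subband, scale_l, n_scales, seed_mask):
    # Iterative contour growing of Section 3.1 (sketch).
    structure = np.ones((3, 3), dtype=bool)       # structuring element E
    t_l = 255.0
    mask = seed_mask.astype(bool)
    while t_l > 180 + (scale_l - 1) * 10:
        t_l -= 2 * (n_scales - scale_l + 1)                    # eq. (9)
        dilated = ndimage.binary_dilation(mask, structure=structure)
        mask = dilated & (np.abs(subband) >= t_l)              # update x(L, theta)
    return mask.astype(np.uint8)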
3.2 Homogeneous Region Extraction
The homogeneous region is obtained by subtracting the contours found in each subband; the remaining area is then averaged with a filter of 10 × 10 samples to reduce the noise in this area. Afterwards, the segmented contours and the homogeneous region are added to produce the denoised subbands. Finally, the inverse NSCT is applied, and then the exponential of the recovered image is computed, to recover a denoised version of the image.
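A sketch of this recombination step is shown below, under the assumption that the contour mask x(L, θ) and the normalized subband are available as arrays; the 10 × 10 box average is implemented here with SciPy's uniform filter, which averages over a square window as the text describes.

import numpy as np
from scipy import ndimage

def denoise_subband(subband, contour_mask, box=10):
    # Section 3.2 as a sketch: average the homogeneous region H(L, theta)
    # with a box filter and add the untouched contour coefficients back.
    contours = contour_mask.astype(subband.dtype)
    homogeneous = subband - contours * subband          # H(L, theta)
    smoothed = ndimage.uniform_filter(homogeneous, size=box)
    return smoothed + contours * subband                # denoised s'(L, theta)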
4 Experimental Results
In this section, we test the system using simulated noise in natural images, demonstrated the performance of the algorithm with real SAR images and compare results with the Frost [2] and SRAD [5] filters. For simulation we used the well known images of peppers, Barbara and boat; speckle with different variances was added to these images. Results after denoising are summarized in table 1. We compared quantitatively the method proposed with the test bench Frost filter and the SRAD filter. From figures 5 to 7, it can be observed that all the details are preserved and the visual quality obtained with the proposed method is higher than the obtained with the Frost filter and the SRAD. For these experiments the algorithm used a decomposition of the image with four levels and four directions per level. The numerical results are further supported with SAR images. The images were obtained from The Microwave Earth Remote Sensing [9] Lab and Scatterometer Climate Record Pathfinder [10].
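For reference, the two quantities used in this evaluation can be computed as follows. The PSNR values of Table 1 follow the usual definition; the speckle simulator is only a plausible stand-in, since the paper does not state which unit-mean multiplicative distribution was used (a gamma law is assumed here).

import numpy as np

def psnr(reference, estimate, peak=255.0):
    # Peak signal-to-noise ratio in dB, as reported in Table 1.
    ref = reference.astype(np.float64)
    est = estimate.astype(np.float64)
    mse = np.mean((ref - est) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def add_speckle(image, variance, rng=np.random.default_rng()):
    # Unit-mean multiplicative noise of the requested variance, cf. eq. (1).
    # A gamma distribution is assumed; the paper does not specify one.
    noise = rng.gamma(shape=1.0 / variance, scale=variance, size=image.shape)
    return image * noise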
Table 1. Comparison results using different noise power (PSNR in dB)

Image    Method             Noise variance 0.1   0.5       1
Peppers  Noisy image        16.8574              10.9757   9.6027
         Frost              22.109               16.3742   14.9089
         SRAD               16.271               12.779    11.6569
         Proposed Method    26.1808              21.1143   19.6451
Barbara  Noisy image        16.2959              10.4085   8.97
         Frost              20.5412              15.4394   14.0059
         SRAD               15.4876              12.1532   11.0927
         Proposed Method    24.4466              20.1097   18.7334
Boat     Noisy image        15.0527              9.5815    8.3896
         Frost              20.2662              14.8031   13.5585
         SRAD               15.1144              11.7559   10.7593
         Proposed Method    24.9092              19.9575   18.5822
Fig. 5. a) Original SAR image, b) SRAD filter, c) Frost filter, d) proposed method
Fig. 6. a) Original SAR image, b) SRAD filter, c) Frost filter, d) proposed method
Fig. 7. a) Original SAR image, b) SRAD filter, c) Frost filter, d) proposed method
5 Conclusions
In this paper, a new method for denoising SAR images was proposed. The image is preprocessed and the NSCT is applied to decompose it into several directions and multiple scales, preserving its geometric characteristics. The highest-frequency scale is discarded and the remaining subbands, except the lowpass one, are normalized. The normalized subbands are successively thresholded in order to separate the contours, which are grown inside safe areas defined by the seed pixels; these pixels are dilated in each iteration. Then, the homogeneous regions are separated from the contours and averaged to attenuate the noise in this region. The results were presented using both simulated noise and real SAR images. These results show that the method reduces the speckle noise significantly, without distorting useful information or destroying important image edges. It can be seen that this method outperforms the Frost filter in both aspects, numerically and visually. One of the drawbacks of this approach is that the algorithm is more computationally expensive than other methods. As future work, we plan to link edges between the finest scale and the next in order to process all the available subbands. We also plan to try other filters for processing the homogeneous region.

Acknowledgments. This work was supported by Fondo Mixto FOMIX CHIH-2009-C01-11756. The authors would also like to express their gratitude to Professor David G. Long, Director of the BYU Center for Remote Sensing, Brigham Young University, Provo, Utah, for his valuable opinion about this work.
References
1. Donoho, D.L.: De-noising by soft-thresholding. IEEE Trans. on IT 41(3), 613–627 (1995)
2. Frost, V.S., Stiles, J.A., Shanmugan, K.S., Holtzman, J.C.: A model for radar images and its application to adaptive digital filtering of multiplicative noise. IEEE Transactions on Pattern Analysis and Machine Intelligence 4, 157–165 (1980)
3. Liu, C., Wang, H.: Image Denoising Based on Wavelet Edge Detection by Scale Multiplication. In: Proceedings of the 2007 International Conference on Integration Technology, pp. 701–705 (2007)
4. Rosa Zurera, M., Cobreces Alvarez, A.M., Nieto Borge, J.C., Jarabo Amores, M.P., Mata Moya, D.: Wavelet Denoising with Edge Detection for Speckle Reduction in SAR Images. In: EURASIP, pp. 1098–1102 (2007)
5. Yu, Y., Acton, S.T.: Speckle Reducing Anisotropic Diffusion. IEEE Transactions on Image Processing 11(11), 1260–1270 (2002)
6. Do, M.N., Vetterli, M.: The contourlet transform: An efficient directional multiresolution image representation. IEEE Trans. Image Processing 14(12), 2091–2106 (2005)
7. Da Cunha, A.L., Zhou, J.P., Do, M.N.: The Nonsubsampled Contourlet Transform: Theory, Design and Applications. IEEE Transactions on Image Processing 15(10), 3089–3101 (2006)
8. Arsenault, H.H., April, G.: Properties of speckle integrated with a finite aperture and logarithmically transformed. JOSA 66(11), 1160–1163 (1976)
9. The Microwave Earth Remote Sensing (MERS) Lab, http://www.mers.byu.edu
10. Scatterometer Climate Record Pathfinder, http://www.scp.byu.edu
Scheme-Based Synthesis of Inductive Theories
Omar Montano-Rivas, Roy McCasland, Lucas Dixon, and Alan Bundy
School of Informatics, University of Edinburgh
{O.Montano-Rivas,rmccasla,ldixon,bundy}@inf.ed.ac.uk
Abstract. We describe an approach to automatically invent/explore new mathematical theories, with the goal of producing results comparable to those produced by humans, as represented, for example, in the libraries of the Isabelle proof assistant. Our approach is based on ‘schemes’, which are terms in higher-order logic. We show that it is possible to automate the instantiation process of schemes to generate conjectures and definitions. We also show how the new definitions and the lemmata discovered during the exploration of the theory can be used not only to help with the proof obligations during the exploration, but also to reduce redundancies inherent in most theory formation systems. We implemented our ideas in an automated tool, called IsaScheme, which employs Knuth-Bendix completion and recent automatic inductive proof tools. We have evaluated our system in a theory of natural numbers and a theory of lists. Keywords: Mathematical theory exploration, schemes, theorem proving, term rewriting, termination.
1 Introduction
Mathematical theory exploration consists of inventing mathematical theorems from a set of axioms. It also includes the definition of new concepts. For example, in the theory of natural numbers we can define addition using successor, multiplication using addition, exponentiation using multiplication, and so on. Once we have these new concepts of interest, we can start conjecturing their properties and proving them. A variety of theory exploration computer programs have been implemented [12,5,13] and different approaches have been identified [16]. A recent approach, scheme-based mathematical theory exploration [2], has been proposed and its implementation is being undertaken within the Theorema project [3]. In [6], a case study of mathematical theory exploration in the theory of natural numbers using the scheme-based approach is described. However, apart from this paper there is, to our knowledge, no other case study of scheme-based mathematical theory exploration. In the Theorema system, which was used to carry out the aforementioned case study, the user had to provide the appropriate substitutions (Theorema cannot perform the possible instantiations automatically). The authors also pointed out that the implementation of some provers was still in progress and that the proof obligations were in part 'pen-and-paper'. From
this observation, a natural question arises: can the instantiation process of schemes and the proof obligations induced by the conjectures and definitions be mechanized? The main contribution of this paper is to give a positive answer to this question.

The scheme-based approach gives a basic facility to instantiate schemes (or rather the higher-order variables inside schemes) with different 'pieces' of mathematics (terms built on top of constructor and function symbols) already known in the theory. In Section 2 we discuss some motivating examples for the generation of conjectures and definitions using schemes. In order to soundly instantiate the schemes it is necessary to pay attention to the type of the objects being instantiated. In Sections 3 and 4 we show how this can be performed rigorously and with total automation on top of the simply typed lambda calculus of Isabelle/HOL [14]. To facilitate the process of proof construction, Isabelle provides a number of automatic proof tools. Tools such as the Simplifier [15] or IsaPlanner [8] can help with the proof obligations for conjectures in the process of theory exploration. Isabelle also has strong definitional packages, such as the function package [11], that can prove termination automatically for many of the functions that occur in practice. Section 5 shows how the new definitions and the lemmata discovered during the exploration of the theory can be used not only to strengthen the aforementioned tools, but also to reduce redundancies inherent in most theory formation systems (Section 6). In Section 7 we describe our theory exploration algorithms, where the processes of theorem and definition discovery are linked together. The evaluation is described in Section 8. The related and future work are discussed in Sections 9 and 10, respectively, and the conclusions in Section 11.
2 Motivating Examples
The central idea of scheme-based mathematical theory exploration is that of a scheme, i.e. a higher-order formula intended to capture the accumulated experience of mathematicians for discovering new pieces of mathematics. The invention process is carried out through the instantiation of variables within the scheme (in Theorema, this instantiation is limited to function, predicate or constant symbols already known in the theory; IsaScheme, in contrast, can use any well-formed closed term of the theory, including λ-terms such as (λx. x)). As an example, let TN be the theory of natural numbers in which we already have the constant function zero (0), the unary function successor (suc) and the binary function addition (+), and let s be the following scheme, which captures the idea of a binary function defined recursively in terms of other functions:

def-scheme(g, h, i, j) ≡ ∃f. ∀x y. ( f(g, y) = h(y)  ∧  f(i(x), y) = j(y, f(x, y)) )    (1)

Here the existentially quantified variable f stands for the new function to be defined in terms of the variables g, h, i and j. We can generate the definition of
multiplication by allowing the theory TN to instantiate the scheme with σ1 = {g → 0, h → (λx. 0), i → suc, j → +} (here f → ∗):

0 ∗ y = 0
suc(x) ∗ y = y + (x ∗ y)

which in turn can be used for the invention of the concept of exponentiation with the substitution σ2 = {g → 0, h → (λx. suc(0)), i → suc, j → ∗} on scheme (1) (note that the exponent is the first argument in this case):

exp(0, y) = suc(0)
exp(suc(x), y) = y ∗ exp(x, y)

Schemes can be used not only for the invention of new mathematical concepts or definitions; they can also be used for the invention of new conjectures about those concepts. Scheme (2) creates conjectures about the left-distributivity property of two binary operators in a given theory (the variables p and q stand for the binary operators):

left-distributivity(p, q) ≡ ∀x y z. q(x, p(y, z)) = p(q(x, y), q(x, z))    (2)

Therefore, if we are working w.r.t. TN extended with multiplication and exponentiation, we can conjecture the left-distributivity property of multiplication over addition, and also of exponentiation over multiplication, by using the substitutions σ3 = {p → +, q → ∗} and σ4 = {p → ∗, q → exp}, respectively, on scheme (2). The aforementioned substitutions give the conjectures

x ∗ (y + z) = (x ∗ y) + (x ∗ z)
exp(x, y ∗ z) = exp(x, y) ∗ exp(x, z)

It is important to note that schemes can generate invalid definitions and false conjectures. For example, consider the substitution σ4 = {g → 0, h → (λx. 0), i → (λx. x), j → +} on scheme (1) (recall that all theories considered are built on the simply typed lambda calculus of Isabelle/HOL, so (λx. x) is a perfectly valid mathematical object; in fact, any finite set of well-formed closed terms can serve as the initial theory elements for the exploration of the theory, see Section 4 for details):

f(0, y) = 0
f(x, y) = y + f(x, y)

This instantiation immediately leads to logical inconsistencies: subtracting f(x, y) from the second equation produces 0 = y. This definition is invalid because, contrary to the natural interpretation of i as a constructor symbol, schemes do not express such conditions on instantiations. Similarly, we can also obtain false conjectures from a substitution; e.g. σ4 = {p → ∗, q → +} on scheme (2) instantiates to x + (y ∗ z) = (x + y) ∗ (x + z).
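The way an instantiation of scheme (1) turns into a concrete recursive function can be mimicked in a few lines. The sketch below is an informal model over ordinary integers standing in for Peano naturals: the constructor argument i is assumed to be suc, so the recursive call simply uses x − 1. It is meant only to make the two substitutions σ1 and σ2 concrete, not to reflect how IsaScheme builds definitions in Isabelle/HOL.

def def_scheme(g, h, i, j):
    # Informal reading of scheme (1): the returned f satisfies
    # f(g, y) = h(y) and f(i(x), y) = j(y, f(x, y)).
    def f(x, y):
        if x == g:                  # base case given by the constant g
            return h(y)
        return j(y, f(x - 1, y))    # i is assumed to be suc, so i(x) has argument x - 1
    return f

suc = lambda n: n + 1

# sigma_1 = {g -> 0, h -> (lambda x. 0), i -> suc, j -> +} yields multiplication
mul = def_scheme(0, lambda y: 0, suc, lambda y, r: y + r)
# sigma_2 = {g -> 0, h -> (lambda x. suc(0)), i -> suc, j -> *} yields exponentiation
exp = def_scheme(0, lambda y: 1, suc, lambda y, r: mul(y, r))

assert mul(3, 4) == 12
assert exp(3, 2) == 8   # the exponent is the first argument, as in the text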
3 Representation of Schemes
A scheme is a higher-order formula intended to generate new definitions in the underlying theory and conjectures about them. However, not every higher-order formula is a scheme. Here, we formally define schemes.

Definition 1. A scheme s is a (non-recursive) constant definition of a proposition in HOL, which we write in the form sn(x) ≡ t. For the scheme sn(x) ≡ t, x are free variables and t does not contain sn, does not refer to undefined symbols and does not introduce extra free variables.

The scheme (where dvd means "divides") prime(p) ≡ 1 < p ∧ (dvd(m, p) ⇒ m = 1 ∨ m = p) is flawed because it introduces the extra free variable m on the right-hand side. The correct version is prime(p) ≡ 1 < p ∧ (∀m. dvd(m, p) ⇒ m = 1 ∨ m = p), assuming that all symbols are properly defined.

Definition 2. Given a scheme s := sn(x) ≡ t, we say that s is a propositional scheme. In case t has the form ∃f. ∀y. ⋀i=1..m li = ri, then we say that the propositional scheme s is a definitional scheme, and l1 = r1, . . . , lm = rm are the defining equations of s.

Examples of valid propositional schemes are listed below:

true ≡ True
comm(p) ≡ (∀x y. p(x, y) = p(y, x))
assoc-comm(p) ≡ (∀x y z. p(p(x, y), z) = p(x, p(y, z))) ∧ comm(p)

The following are examples of definitional schemes:

def-scheme(g, h, i, j) ≡ ∃f. ∀x y z. ( f(g, y) = y  ∧  f(h(z, x), y) = i(j(z, y), f(x, y)) )    (3)

mutual-def-scheme(g, h, i, j, k, l) ≡ ∃f1 f2. ∀x z. ( f1(g) = h  ∧  f2(g) = i  ∧  f1(j(z, x)) = k(z, f2(x))  ∧  f2(j(z, x)) = l(z, f1(x)) )    (4)
The definitional scheme (4) captures the idea of two mutual functions defined recursively. Here the existentially quantified variables (f in scheme (3) and f1 and f2 in scheme (4)) stand for the new functions to be defined.
4 Generation of Instantiations
In this section we describe the technique used to instantiate schemes automatically. Here we define some preliminary concepts.
Definition 3. For a scheme s, the set of schematic substitutions with respect to a (finite) set of closed terms X ⊂ T(F, V) is defined by

Sub(s, X) := {σ | Closed(sσ) ∧ (((v → x) ∈ σ) ⇒ x ∈ X)},

where Closed(t) is true when the term t contains no free variables.

Ensuring that sσ is a closed term avoids overgeneralisations in conjectures or definitions; e.g. it is impossible to prove ∀x y z. x ∗ p(y, z) = p(x ∗ y, x ∗ z) where p is free. Definition 3 also bounds the possible substitutions so that free variables in the scheme are mapped to closed terms in X.

The problem of finding the substitutions Sub(s, X) of a scheme s given a set of terms X can be solved as follows. The free variables V(s) = {v1, . . . , vn} in the scheme are associated with their initial domains D0i = {x ∈ X | vi and x can be unified} for 1 ≤ i ≤ n. The typing information of the partially instantiated scheme is the only constraint during the instantiation of variables. Each time a variable vi is instantiated to x ∈ Dki, the domains D(k+1)j for i < j ≤ n of the remaining variables must be updated w.r.t. the most general unifier σmgu of vi and x. Variables are instantiated sequentially and, if a partial instantiation leaves no possible values for a variable, backtracking is performed to the most recently instantiated variable that still has alternatives available. This process is repeated, using backtracking to exhaust all possible schematic substitutions, obtaining a complete algorithm.

Example 1. Let F be a signature consisting of F := {+ : nat→nat→nat, ∗ : nat→nat→nat, @ : 'a list→'a list→'a list, map : ('a→'b)→'a list→'b list}. Also let X = {+, ∗, @, map} and let s be the propositional scheme (2) of Section 2 (here we assume the most general type inferred for the scheme). Figure 1 illustrates how Sub(s, X) is evaluated following a sequential instantiation of the free variables of s.

It is important to note that the asymptotic running time of the algorithm is Θ(|X|^|V(s)|), and the worst case is when we obtain |X|^|V(s)| valid substitutions. For a scheme s, the generated schematic substitutions are used to produce instantiations of s, i.e. conjectures or definitions.

Definition 4. Given σ ∈ Sub(s, X), the instantiation of the scheme s := u ≡ v with σ is denoted by inst(u ≡ v, σ) := vσ.

Definition 5. For a scheme s, the set of instantiations Insts(s, X) with respect to a (finite) set of closed terms X ⊂ T(F, V) is denoted by

Insts(s, X) := {inst(s, σ) | σ ∈ Sub(s, X)}.    (5)
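The sequential instantiation with backtracking described above can be sketched generically. Here the simply typed unification that constrains the domains is abstracted into a caller-supplied compatibility predicate; the function and parameter names are illustrative only.

def schematic_substitutions(scheme_vars, candidates, compatible):
    # scheme_vars : ordered list of the free variables v1, ..., vn of the scheme
    # candidates  : dict mapping each variable to the terms of X it may unify with
    # compatible  : predicate on partial substitutions, standing in for the
    #               simply typed unification that restricts the remaining domains
    def extend(partial, remaining):
        if not remaining:
            yield dict(partial)
            return
        v, rest = remaining[0], remaining[1:]
        for term in candidates[v]:
            partial[v] = term
            if compatible(partial):        # typing is the only constraint
                yield from extend(partial, rest)
            del partial[v]                 # backtrack and try the next term
    yield from extend({}, list(scheme_vars))

# For scheme (2) with X = {+, *, @, map}, a suitable `compatible` predicate
# leaves exactly the six substitutions of Example 2.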
Fig. 1. Sequential evaluation of Sub(s, X), where s is the propositional scheme (2) and X = {+ : nat→nat→nat, ∗ : nat→nat→nat, @ : 'a list→'a list→'a list, map : ('a→'b)→'a list→'b list}. Each box shows the unified and not-unified (in bold) variables and their domains during the evaluation. The output of the algorithm is the set of substitutions {σ1 = {p → +, q → +}, σ2 = {p → +, q → ∗}, σ3 = {p → ∗, q → +}, σ4 = {p → ∗, q → ∗}, σ5 = {p → @, q → @}, σ6 = {p → @, q → map}}. Note that a unified variable potentially changes the types of the rest of the variables, restricting their domain.
Example 2. The instantiations generated from scheme (2) and the set of terms X = {+, ∗, @, map} are depicted in the following table.
       Sub(s, X)               Insts(s, X)
σ1 = {p → +, q → +}       ∀x y z. x + (y + z) = (x + y) + (x + z)
σ2 = {p → +, q → ∗}       ∀x y z. x ∗ (y + z) = x ∗ y + x ∗ z
σ3 = {p → ∗, q → +}       ∀x y z. x + y ∗ z = (x + y) ∗ (x + z)
σ4 = {p → ∗, q → ∗}       ∀x y z. x ∗ (y ∗ z) = (x ∗ y) ∗ (x ∗ z)
σ5 = {p → @, q → @}       ∀x y z. x@(y@z) = (x@y)@(x@z)
σ6 = {p → @, q → map}     ∀x y z. map(x, y@z) = map(x, y)@map(x, z)

5 Identification of Equivalent Instantiations
Processing the instantiations (conjectures and definitions) of a scheme can be a demanding task. In the worst case, the number of substitutions σ : V → X is |X|^|V|. However, we can reduce the number of conjectures and definitions
by noticing that two different substitutions σ1 and σ2 could lead to equivalent instantiations. Table 1 shows the set of instantiations Insts(s, X) obtained from the following definitional scheme:

def-scheme(g, h, i) ≡ ∃f. ∀x y. ( f(g, y) = h(g, g)  ∧  f(suc(x), y) = i(y, f(x, y)) )    (6)

In Table 1, inst(s, σN1) and inst(s, σN2) are clearly equivalent (here '+' denotes standard addition of naturals). The key ingredient for automatically detecting equivalent instantiations is a term rewrite system (TRS) R which handles the normalization of the function symbols inside a term [10].

Table 1. Redundant definitions generated from the definitional scheme (6). Note that the instantiations inst(s, σN1) and inst(s, σN2) are equivalent, as 0 + 0 can be 'reduced' (within the theory) to 0. inst(s, σN3) and inst(s, σN4) are similarly equivalent.

        Sub(s, X)                                   Insts(s, X)
σN1 = {g → 0, h → +,         i → +}           ∃f. ∀x y. f(0, y) = 0 + 0  ∧  f(suc(x), y) = y + f(x, y)
σN2 = {g → 0, h → (λx y. x), i → +}           ∃f. ∀x y. f(0, y) = 0      ∧  f(suc(x), y) = y + f(x, y)
σN3 = {g → 0, h → +,         i → (λx y. x)}   ∃f. ∀x y. f(0, y) = 0 + 0  ∧  f(suc(x), y) = y
σN4 = {g → 0, h → (λx y. x), i → (λx y. x)}   ∃f. ∀x y. f(0, y) = 0      ∧  f(suc(x), y) = y
However, for this idea to work, the constructed TRS R must have the property of being terminating. All functions in Isabelle/HOL are terminating to prevent inconsistencies. Therefore, the defining equations for a newly introduced function symbol can be used as a normalizing TRS. Furthermore, if we are to include a new equation e to the rewrite system R during the exploration of the theory then we must prove termination of the extended rewrite system R ∪ {e}. To this end, we use the termination checker AProVE [9] along with Knuth-Bendix completion to obtain a convergent rewrite system, if possible (using a similar approach to [17]). The following definition will help with the description of the algorithm for theory exploration of section 7. Definition 6. Given a terminating rewrite system R and an instantiation i ∈ Insts(s, X) of the form ∀x. s = t, the normalizing extension ext(R, i) of R with i is denoted by
ext(R, i) :=
    R′        if Knuth-Bendix completion succeeds for R ∪ {s = t} with a convergent system R′
    R ∪ {r}   if termination succeeds for R ∪ {r} with r ∈ {s = t, t = s}
    R         otherwise

6 Filtering of Conjectures and Definitions
As suggested by Definition 6, IsaScheme updates the rewrite system R each time a new equational theorem is found. It is thus useful to consider the notion of equivalence of instantiations modulo R.

Definition 7. Let u and v be two instantiations and R a terminating rewrite system. Equivalence of instantiations modulo R is denoted as u ≈R v := (û =α v̂), where û and v̂ are normal forms (w.r.t. R) of u and v respectively, and =α is term equivalence up to variable renaming.

Since the exploration process could generate a substantial number of definitions and each of them could potentially produce a multitude of conjectures, it becomes necessary to restrict the search space in some way. We decided to filter out all functions whose values are independent of one of their arguments, as they can always be defined with another function using fewer arguments and a λ-abstraction. For example, instead of generating f1(x, y) = x² and f2(x, y) = y², it would be better to just generate f(x) = x² and construct f1 and f2 on top of f, e.g. (λx y. f(x)) and (λx y. f(y)).

Definition 8. An argument neglecting function f ∈ F with type τ1 → . . . → τn → τ0, where τ0 is a base type and n > 0, is a function such that f(x1, . . . , xk−1, y, xk+1, . . . , xn) = f(x1, . . . , xk−1, z, xk+1, . . . , xn) for some k where 1 ≤ k ≤ n.
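Definition 8 is checked in IsaScheme by searching for counter-examples rather than by proving the equation (see Section 7). A randomized stand-in for that check might look as follows; the sampling strategy and names are assumptions, not the actual Isabelle/Quickcheck machinery.

import random

def seems_argument_neglecting(f, arity, sample=lambda: random.randint(0, 20), trials=100):
    # Return the index of an argument that f appears to ignore, or None if a
    # counter-example to argument neglect was found for every argument.
    for k in range(arity):
        neglected = True
        for _ in range(trials):
            args = [sample() for _ in range(arity)]
            alt = list(args)
            alt[k] = sample()
            if f(*args) != f(*alt):
                neglected = False      # counter-example: argument k matters
                break
        if neglected:
            return k
    return None

assert seems_argument_neglecting(lambda x, y: x * x, 2) == 1   # ignores y
assert seems_argument_neglecting(lambda x, y: x + y, 2) is None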
7 Theory Exploration Algorithms
Scheme-based Conjecture Synthesis. The overall procedure for the generation of theorems is described by the pseudocode of InventTheorems. The algorithm receives as arguments a set of terms I (conjectures), a terminating rewrite system R, a set of terms T (theorems), a set of propositional schemes Sp and a set of closed terms Xp from which schemes are to be instantiated. Note that initially, I = T = ∅.
InventTheorems(I, R, T, Sp, Xp)
 1   for each i ∈ ⋃s∈Sp Insts(s, Xp)
 2       î := a normal form of i w.r.t. R
 3       if î is not subsumed by T ∪ {True} and
 4          there is not a j ∈ I such that j ≈R î and
 5          cannot find a counter-example of î then
 6           if can prove î then
 7               T := T ∪ {î}
 8               if î is of the form ∀x. s = t then
 9                   R := ext(R, î)
10       I := I ∪ {î}
11   return ⟨I, R, T⟩
The algorithm iterates through all instantiations obtained from any scheme s ∈ Sp and the terms Xp. Line 4 can be implemented efficiently using discrimination nets and avoids counterexample-checking instantiations that are equivalent modulo R. Falsifiable instantiations are detected in line 5 to avoid any proof attempt on such conjectures; Isabelle/HOL provides the counter-example checker Quickcheck [1], which is used to refute false conjectures in the implementation of IsaScheme. If the conjecture is not rejected by the inspections in lines 3, 4 or 5, then a proof attempt is performed in line 6. The prover used for the proof obligations in IsaScheme was the automatic inductive prover IsaPlanner [8], which implements the rippling heuristic [4].

Scheme-based Definition Synthesis. The generation of definitions is described by the pseudocode of InventDefinitions. The algorithm takes as input the same arguments received by the InventTheorems method. Additionally, it also takes a set of function symbols F in the current theory, a set of terms D (definitions), a set of definitional schemes Sd and a set of closed terms Xd from which definitional schemes are to be instantiated. Again, initially D = ∅.

The algorithm iterates through all instantiations obtained from any definitional scheme s ∈ Sd and the terms Xd. Each instantiation d is reduced to a normal form d̂ w.r.t. R in line 2. Since d̂ is generated from a definitional scheme, it has the form ∃f1 . . . fn. ∀y. e1 ∧ . . . ∧ em, where f1, . . . , fn are variables standing for the new functions to be defined and e1, . . . , em are the defining equations of the functions. In lines 4 and 5, new function symbols f1, . . . , fn (w.r.t. the signature F) are created and a substitution σ is constructed to give a specific name to each of the new functions to be defined. This 'renaming' is applied to the defining equations, and [e1, . . . , em] is obtained in line 6. Line 7 ensures that definitions that are equivalent modulo R to earlier generated ones are ignored. Well-definedness properties, such as termination or totality of the generated functions, are proved in line 8; we used Isabelle/HOL's function package [11] for these proof obligations. Line 9 checks that the new functions created are not argument neglecting (AN). In practice, it is hard and expensive to prove conjectures of the form f(x1, . . . , xk−1, y, xk+1, . . . , xn) = f(x1, . . . , xk−1, z, xk+1, . . . , xn) as demanded by Definition 8; instead, we produce counter-examples of that function
not being AN for each of its arguments. If the instantiation d̂ is not rejected by the inspections in lines 7, 8 or 9, then the context F and the theorems T are updated with the new function symbols f1, . . . , fn and the theorems {e1, . . . , em}, respectively (lines 10 and 11). Line 12 updates the rewrite system R with the newly introduced defining equations e1, . . . , em. A call to InventTheorems is performed in line 13, updating I, R and T. At the end of each iteration the instantiation d̂ is added to the set of processed definitions D in line 14. Finally, when all instantiations d ∈ ⋃s∈Sd Insts(s, Xd) have been processed, the values are returned.

InventDefinitions(I, R, T, Sp, Xp, F, D, Sd, Xd)
 1   for each d ∈ ⋃s∈Sd Insts(s, Xd)
 2       d̂ := a normal form of d w.r.t. R
 3       let ∃f1 . . . fn. ∀y. e1 ∧ . . . ∧ em = d̂
 4       create function symbols f1, . . . , fn such that fi ∉ F
 5       σ := {f1 → f1, . . . , fn → fn}
 6       [e1, . . . , em] := [σ(e1), . . . , σ(em)]
 7       if there is not a j ∈ D such that j ≈R d̂ and
 8          [e1, . . . , em] is well-defined and
 9          f1, . . . , fn are not argument neglecting then
10           F := F ∪ {f1, . . . , fn}
11           T := T ∪ {e1, . . . , em}
12           R := R ∪ {e1, . . . , em}
13           ⟨I, R, T⟩ := InventTheorems(I, R, T, Sp, Xp ∪ {f1, . . . , fn})
14       D := D ∪ {d̂}
15   return
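Abstracting away the Isabelle-specific machinery, the control flow of InventTheorems can be summarized by the following sketch. Every hook (normalisation, subsumption, counter-example search, the inductive prover and the ext(R, i) extension) is a stand-in supplied by the caller; only the ordering of the checks mirrors the pseudocode above.

class Hooks:
    # Caller-supplied stand-ins for the Isabelle/IsaPlanner machinery.
    def instantiate(self, scheme, terms): ...   # Insts(s, Xp)
    def normalise(self, conjecture, rewrites): ...
    def subsumed(self, conjecture, theorems): ...
    def counterexample(self, conjecture): ...   # Quickcheck-style refutation
    def prove(self, conjecture): ...            # inductive prover
    def is_equation(self, conjecture): ...
    def extend(self, rewrites, equation): ...   # ext(R, i)

def invent_theorems(schemes, terms, hooks, conjectures, rewrites, theorems):
    # Sketch of the InventTheorems control flow (line numbers as above).
    for s in schemes:
        for i in hooks.instantiate(s, terms):                      # line 1
            norm = hooks.normalise(i, rewrites)                    # line 2
            fresh = (not hooks.subsumed(norm, theorems)            # line 3
                     and all(hooks.normalise(j, rewrites) != norm  # line 4
                             for j in conjectures)
                     and hooks.counterexample(norm) is None)       # line 5
            if fresh and hooks.prove(norm):                        # line 6
                theorems.add(norm)                                 # line 7
                if hooks.is_equation(norm):                        # line 8
                    rewrites = hooks.extend(rewrites, norm)        # line 9
            conjectures.add(norm)                                  # line 10
    return conjectures, rewrites, theorems                         # line 11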
8 Evaluation
We conducted several case studies in a theory of natural numbers and a theory of lists to evaluate how similar the results obtained with our method and implementation are to those in the libraries of the Isabelle proof assistant. We performed a precision/recall analysis, with Isabelle's libraries as reference, to evaluate the quality of the theorems found by the InventTheorems algorithm and the following propositional scheme (we also used another propositional scheme to handle ternary operators):

prop-scheme(p, q, r, s, t, u) ≡ ∀x y z. p(q(x, y), r(x, z)) = s(t(x, z), u(y, z))

IsaScheme produced a total of 23 theorems for the theory of naturals, with 14 of them included in Isabelle's libraries. Isabelle contains 33 theorems about addition, multiplication and exponentiation, giving a precision of 60% and a recall of 42%. The theorems discovered in the theory of natural numbers included
commutativity, associativity, distributivity of multiplication over addition, distributivity of exponentiation over multiplication, and commuted versions of addition and multiplication, among others. It is important to note that 16 out of the 19 theorems not synthesised were subsumed (after normalization w.r.t. the resulting rewrite system R) by more general synthesised ones, and the rest fell outside the scope of the propositional scheme used, as they contained 4 variables.

IsaScheme produced a total of 13 theorems for the theory of lists, including all 9 theorems about append, list reverse, map, right-fold and left-fold in Isabelle's libraries. This gives a precision of 70% and a recall of 100% for this theory. In the theory of lists, there was a rather high number of unfalsified and unproved conjectures (279); the type information for these conjectures was more complex than Quickcheck could manage. A small random sample of these conjectures was taken and, in each case, a counterexample was quite easily found by hand. Table 2 summarises the statistics for the theories analysed.

Table 2. Precision and recall analysis with Isabelle's theory library as reference. The constructors are zero, Suc, nil and cons, with labels Z, S, N and C respectively. The functions are addition, multiplication, exponentiation, append, reverse, length, map, left-fold and right-fold, with labels +, *, ^, A, R, L, M, FL and FR respectively.

Precision-Recall          65%-50%   63%-100%    100%-100%  80%-100%
Constructors              Z, S      Z, S, N, C  N, C       N, C
Function Symbols          +, *, ^   A, R, L     A, R, M    A, FL, FR
Elapsed Time (s)          1756      9237        9885       18179
Conjectures Synthesised   78957     175847      204950     13576
Conjectures Filtered      78934     175839      204944     13292
Proved-Not Proved         23-0      8-0         6-0        7-279
The definitions synthesised by IsaScheme (the InventDefinitions algorithm) with the following definitional scheme include, among others, addition, multiplication, exponentiation, append, map, and (tail recursive) reverse:

def-scheme(g, h, i, j, k, l) ≡ ∃f. ∀x y z. ( f(g, y) = h(y)  ∧  f(i(z, x), y) = j(k(x, y, z), f(x, l(z, y))) )

For the theory of natural numbers, IsaScheme obtained a precision of 6% and a recall of 100%. For the theory of lists, IsaScheme obtained a precision of 14% and a recall of 18%. The low recall for the list theory arose because the definitional scheme used was only able to synthesise binary functions, and not unary or ternary ones. This could have been addressed easily by considering definitional schemes producing unary and ternary functions, at the expense of computational time (see Section 4); this, however, would have been detrimental to the precision of the InventDefinitions algorithm. Note that the scheme-based approach for the generation of definitions provides a free-form incremental construction of (potentially infinitely many) recursive functions. Overly general definitional
schemes provide a wide range of possible instantiations and thus definitions; in fact, we believe this was the reason for the low precision in the evaluation of the algorithm for both theories. Strategies to assess the relevance of definitions are required, given the large search space during exploration; nevertheless, this is left as future work. For space reasons we cannot give a full presentation of the theories or the theorems and definitions found. Formal theory documents in human-readable Isabelle/Isar notation and all results described in this paper are available online at http://dream.inf.ed.ac.uk/projects/isascheme/. For the evaluation we used a computer cluster where each theory exploration was run on a GNU/Linux node with 2 dual-core CPUs and 4 GB of RAM. We also used Isabelle/2009-2, IsaPlanner svn version 2614 and AProVE 1.2.
9 Related Work
Other than IsaScheme, Theorema is the only system performing the exploration of mathematical theories based on schemes [3]. However, the user needs to perform all schematic substitutions manually, as Theorema does not instantiate the schemes automatically from a set of terms. The user also needs to conduct the proof obligations interactively [6]. Another important difference is that ensuring the soundness of definitions is left to the user in Theorema; in IsaScheme, which uses Isabelle's LCF-methodology, definitions are sound by construction [11].

The MATHsAiD program was intended for use by research mathematicians and was designed to produce interesting theorems from the mathematician's point of view [13]. MATHsAiD starts with an axiomatic description of a theory; hypotheses and terms of interest are then generated, forward reasoning is applied to produce logical consequences of the hypotheses, and a filtering process is carried out according to a number of interestingness measures. MATHsAiD has been applied to the naturals, set theory and group theory.

Like IsaScheme, IsaCoSy is a theory exploration system for Isabelle/IsaPlanner [10]. It generates conjectures in a bottom-up fashion from the signature of an inductive theory. The synthesis process is accompanied by automatic counterexample checking and a proof attempt in case no counter-example is found. All theorems found are then used as constraints for the synthesis process, generating only irreducible terms w.r.t. the discovered theorems. The main difference between IsaScheme and IsaCoSy is that IsaCoSy considers all (irreducible) terms as candidate conjectures, whereas IsaScheme considers only a restricted set (modulo R) specified by the schemes. This restricted set of conjectures avoids the need for a sophisticated constraint language. The main advance made by IsaScheme is the use of Knuth-Bendix completion and termination checking to orient the resulting equational theorems to form a rewrite system. The empirical results show that, for the theory of lists, these rewrite systems result in fewer theorems that prove all of the theorems in the theory produced by IsaCoSy.

HR is a theory exploration system which uses an example-driven approach for theory formation [5]. It uses MACE to build models from examples and also to
identify counter-examples. The resolution prover Otter is used for the proof obligations. The process of concept invention is carried out from old concepts, starting with the concepts provided by MACE at the initial stage. These concepts, stored as data-tables of examples rather than definitions, are passed through a set of production rules whose purpose is to manipulate and generate new data-tables, thus generating new concepts. The conjecture synthesis process is built on top of concept formation: HR takes the concepts obtained by the production rules and forms conjectures about them. There are different types of conjectures HR can make, e.g. equivalence conjectures, which amount to finding two concepts and stating that their definitions are equivalent, and implication conjectures, which are statements relating two concepts by stating that the first is a specialization of the second (all examples of the first will be examples of the second). HR has been applied to the naturals, group theory and graph theory.
10 Limitations and Future Work
An important aspect of every theory exploration system is its applicability across different mathematical theories. The scheme-based approach used by IsaScheme provides a generic mechanism for the exploration of any mathematical theory where the symbols (or closed terms built from them) in the theory's signature and the variables within the schemes can be unified. However, this free-form theory exploration can lead to a substantial number of instantiations that need to be processed (see Section 4), particularly with large numbers of constructors and function symbols. This is partially mitigated by the lemmata discovered during the exploration of the theory. Nevertheless, it could be improved further by also exploiting the intermediate lemmata needed to finish the proofs, e.g. with the lemma calculation critic used in rippling. Another limitation is that termination (and thus confluence) of rewrite systems is in general undecidable and requires sophisticated technology to solve interesting cases. This problem is aggravated for rewrite systems with a large number of rewrite rules; in this situation, the termination checking demanded by Definition 6 would benefit from modular properties of rewrite systems such as hierarchical termination [7].
11 Conclusion
We have implemented the proposed scheme-based approach to mathematical theory exploration in Isabelle/HOL for the generation of conjectures and definitions, and have described how the instantiation process of schemes can be automated. We have also described how we can make productive use of normalization in two ways: first, to improve proof automation by maintaining a terminating and potentially convergent rewrite system, and second, to avoid numerous redundancies inherent in most theory exploration systems.
Acknowledgments. This work has been supported by Universidad Politécnica de San Luis Potosí, SEP-PROMEP, University of Edinburgh, the Edinburgh Compute and Data Facility, the RISC-Linz Transnational Access Programme (No. 026133), and EPSRC grants EP/F033559/1 and EP/E005713/1. The authors would like to thank the anonymous referees for their helpful comments.
References
1. Berghofer, S., Nipkow, T.: Random Testing in Isabelle/HOL. In: SEFM, pp. 230–239 (2004)
2. Buchberger, B.: Algorithm Supported Mathematical Theory Exploration: A Personal View and Strategy. In: Buchberger, B., Campbell, J. (eds.) AISC 2004. LNCS (LNAI), vol. 3249, pp. 236–250. Springer, Heidelberg (2004)
3. Buchberger, B., Craciun, A., Jebelean, T., Kovács, L., Kutsia, T., Nakagawa, K., Piroi, F., Popov, N., Robu, J., Rosenkranz, M.: Theorema: Towards computer-aided mathematical theory exploration. J. Applied Logic 4(4), 470–504 (2006)
4. Bundy, A., Basin, D., Hutter, D., Ireland, A.: Rippling: Meta-level Guidance for Mathematical Reasoning. Cambridge Tracts in Theoretical Computer Science, vol. 56. Cambridge University Press, Cambridge (2005)
5. Colton, S.: Automated Theory Formation in Pure Mathematics. PhD thesis, Division of Informatics, University of Edinburgh (2001)
6. Craciun, A., Hodorog, M.: Decompositions of Natural Numbers: From a Case Study in Mathematical Theory Exploration. In: SYNASC 2007 (2007)
7. Dershowitz, N.: Hierarchical Termination. In: Lindenstrauss, N., Dershowitz, N. (eds.) CTRS 1994. LNCS, vol. 968, pp. 89–105. Springer, Heidelberg (1995)
8. Dixon, L., Fleuriot, J.D.: IsaPlanner: A Prototype Proof Planner in Isabelle. In: Baader, F. (ed.) CADE 2003. LNCS (LNAI), vol. 2741, pp. 279–283. Springer, Heidelberg (2003)
9. Giesl, J., Schneider-Kamp, P., Thiemann, R.: AProVE 1.2: Automatic Termination Proofs in the Dependency Pair Framework. In: Furbach, U., Shankar, N. (eds.) IJCAR 2006. LNCS (LNAI), vol. 4130, pp. 281–286. Springer, Heidelberg (2006)
10. Johansson, M., Dixon, L., Bundy, A.: Conjecture Synthesis for Inductive Theories. Journal of Automated Reasoning (2010) (to appear)
11. Krauss, A.: Automating Recursive Definitions and Termination Proofs in Higher-Order Logic. PhD thesis, Dept. of Informatics, T. U. München (2009)
12. Lenat, D.B.: AM: An Artificial Intelligence Approach to Discovery in Mathematics as Heuristic Search. In: Knowledge-Based Systems in Artificial Intelligence (1982)
13. McCasland, R., Bundy, A., Smith, P.F.: Ascertaining Mathematical Theorems. Electr. Notes Theor. Comput. Sci. 151(1), 21–38 (2006)
14. Nipkow, T., Paulson, L.C., Wenzel, M.: Isabelle's Logics: HOL (2000)
15. Nipkow, T., Paulson, L.C., Wenzel, M.: Isabelle/HOL - A Proof Assistant for Higher-Order Logic. LNCS, vol. 2283. Springer, Heidelberg (2002)
16. Sutcliffe, G., Gao, Y., Colton, S.: A Grand Challenge of Theorem Discovery (June 2003)
17. Wehrman, I., Stump, A., Westbrook, E.: Slothrop: Knuth-Bendix Completion with a Modern Termination Checker. In: Webster University, St. Louis, Missouri M.Sc. Computer Science, Washington University, pp. 268–279 (2006)
A Possibilistic Intuitionistic Logic
Oscar Estrada¹, José Arrazola¹, and Mauricio Osorio²
¹ Benemérita Universidad Autónoma de Puebla
[email protected], [email protected]
² Universidad de las Américas - Puebla
[email protected]
Abstract. We define what we call "Possibilistic Intuitionistic Logic (PIL)" and present results analogous to those of the well-known intuitionistic logic, such as a Deduction Theorem, a generalized version of the Deduction Theorem, a Cut Rule, a weak version of a Refutation Theorem, a Substitution Theorem and Glivenko's Theorem.
Keywords: Possibilistic Logic, Intuitionistic Logic, Answer Set.
1 Introduction
Uncertainty is an attribute of information. The pioneering work of Claude Shannon [Sh1] on Information Theory led to the universal acceptance that information is statistical in nature; as a consequence, dealing with uncertainty was confined to the Theory of Probability. Didier Dubois et al. presented Possibilistic Logic in [Du1]; Possibilistic Logic makes it possible to handle information with uncertainty. Their logic is based on Zadeh's Theory of Possibility [Za1]; the paper by Dubois et al. [Du1] includes an axiomatization for Possibilistic Logic and an extended resolution-based method which can be implemented on a computer. According to Dubois et al. [Ni2], Possibilistic Logic provides a sound and complete machinery for handling qualitative uncertainty with respect to a semantics expressed by means of possibility distributions which rank-order the possible interpretations. Dubois mentions that possibilistic logic deals with uncertainty by means of classical two-valued (true or false) interpretations that can be more or less certain, more or less possible. Possibilistic Logic is not concerned with representing vagueness in a multi-valued framework; instead, it stays within the framework of classical logic, to which it adds a way to graduate the confidence placed in each piece of information [Ni2]. On the other hand, Intuitionistic Logic has become important to the area of computer science since the publication of a paper by David Pearce [Pe1], in which a link is established between Intuitionistic Logic (and, furthermore, the whole class of intermediate logics) and Answer Set Programming (ASP) [Ma1, Ni1]. Considering the importance of the ASP formalism, in combination with its usefulness within the context of intermediate logics, and the need for a better way of handling uncertain information, we propose the Possibilistic Intuitionistic
Logic (PIL) formalism. Our first attempt to develop PIL is to present it in an axiomatic way, so we only present a propositional calculus. First, in Section 2, we present some background on both possibilistic and intuitionistic logic. Then, in Section 3, we present results similar to those in Intuitionistic Logic, namely a Deduction Theorem, a generalized version of the Deduction Theorem, a Cut Rule, a weak version of the Refutation Theorem, a Substitution Theorem and, finally, a version of Glivenko's Theorem. Finally, in Section 4 we give the conclusions and sketch some ideas for future work.
2 Background
We present the necessary background for this paper. The propositional calculus is built as in the well-known book by Mendelson [Me1].
2.1 Intuitionistic Logic
Definition 1. Positive Logic is defined by the following set of axioms:
Pos 1: ϕ → (ψ → ϕ)
Pos 2: (ϕ → (ψ → σ)) → ((ϕ → ψ) → (ϕ → σ))
Pos 3: ϕ ∧ ψ → ϕ
Pos 4: ϕ ∧ ψ → ψ
Pos 5: ϕ → (ψ → (ϕ ∧ ψ))
Pos 6: ϕ → (ϕ ∨ ψ)
Pos 7: ψ → (ϕ ∨ ψ)
Pos 8: (ϕ → σ) → ((ψ → σ) → (ϕ ∨ ψ → σ))

Definition 2. Intuitionistic Logic I is defined as Positive Logic plus the following two axioms:
Int1: (ϕ → ψ) → [(ϕ → ¬ψ) → ¬ϕ]
Int2: ¬ϕ → (ϕ → ψ)

Recall that the following are valid meta-theorems of Intuitionism:
1. If ⊢C ϕ then ⊢I ¬¬ϕ
2. If ϕ is a tautology (in the classical sense) then ⊢I ¬¬ϕ
3. If ⊢C ¬ϕ then ⊢I ¬ϕ
4. If ¬ϕ is a tautology (in the classical sense) then ⊢I ¬ϕ
5. If ϕ ∈ ⟨¬, ∧⟩ (i.e. ϕ is built using only ¬ and ∧) then ⊢I ϕ if and only if ϕ is a tautology.
Theorem 1 (Deduction Theorem [Va1]). If Γ is a set of formulas, ψ and σ are well-formed formulas and Γ, ψ ⊢I σ, then Γ ⊢I ψ → σ.

Lemma 1 ([Va1])
a) ⊢I (ϕ → (ψ ↔ σ)) ∧ (ϕ → (ξ ↔ χ)) → (ϕ → ((ψ → ξ) ↔ (σ → χ)))
b) ⊢I (ϕ → ψ) → (¬ψ → ¬ϕ)
c) ⊢I (σ → (ϕ ↔ ψ)) → (σ → (¬ϕ ↔ ¬ψ))
2.2 Possibilistic Logic
The following definitions can be found in [Du1], to which the reader is referred for further details. A necessity-valued formula is a pair (ϕ α), where ϕ is a classical propositional formula and α ∈ (0, 1]. (ϕ α) expresses that ϕ is certain to the extent α, that is, N(ϕ) ≥ α, where N is a necessity measure which models our state of knowledge. The constant α is known as the valuation of the formula and is represented as val(ϕ). In [Du1], an axiom system for Possibilistic Logic is proposed:
(A1) (ϕ → (ψ → ϕ) 1)
(A2) ((ϕ → (ψ → ξ)) → ((ϕ → ψ) → (ϕ → ξ)) 1)
(A3) ((¬ϕ → ¬ψ) → ((¬ϕ → ψ) → ϕ) 1)
with the inference rules
(GMP) (ϕ α), (ϕ → ψ β) ⊢ (ψ min(α, β)),
(S) (ϕ α) ⊢ (ϕ β) if α ≥ β.
3 Contribution
We now present our main results. We define Possibilistic Intuitionistic Logic (PIL) and prove PIL versions of some well-known theorems, such as the Deduction Theorem, the Refutation Theorem, the Cut Rule, the Substitution Theorem and Glivenko's Theorem. We also present a Generalized Deduction Theorem which is valid not only in PIL but also in Standard Possibilistic Logic [Du1] (which we will denote by "Pos").
3.1 Possibilistic Intuitionistic Logic (PIL)
We now present the axioms for Possibilistic Intuitionistic Logic (PIL).

Definition 3. We define Possibilistic Intuitionistic Logic (PIL) by means of the following set of axioms:
PIL-1: (ϕ → (ψ → ϕ) 1)
PIL-2: ((ϕ → (ψ → σ)) → ((ϕ → ψ) → (ϕ → σ)) 1)
PIL-3: (ϕ ∧ ψ → ϕ 1)
PIL-4: (ϕ ∧ ψ → ψ 1)
PIL-5: (ϕ → (ψ → (ϕ ∧ ψ)) 1)
PIL-6: (ϕ → (ϕ ∨ ψ) 1)
PIL-7: (ψ → (ϕ ∨ ψ) 1)
PIL-8: ((ϕ → σ) → ((ψ → σ) → (ϕ ∨ ψ → σ)) 1)
PIL-9: ((ϕ → ψ) → [(ϕ → ¬ψ) → ¬ϕ] 1)
PIL-10: (¬ϕ → (ϕ → ψ) 1)
together with the following rules of inference:
(GMP) (ϕ α), (ϕ → ψ β) ⊢PIL (ψ min{α, β})
(S) (ϕ α) ⊢PIL (ϕ β) if α ≥ β
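The bookkeeping performed by the two inference rules can be illustrated with a small sketch. Formulas are represented here as plain strings and the implication is recognized purely syntactically; this is only a toy model of the (ϕ α) notation, not an implementation of PIL.

from dataclasses import dataclass

@dataclass(frozen=True)
class NVFormula:
    # A necessity-valued formula (phi alpha): phi is certain to degree alpha.
    phi: str
    alpha: float   # valuation in (0, 1]

def gmp(p: NVFormula, imp: NVFormula) -> NVFormula:
    # Graded modus ponens: from (phi a) and (phi -> psi b) infer (psi min(a, b)).
    assert imp.phi.startswith(p.phi + " -> "), "second premise must be an implication from the first"
    psi = imp.phi[len(p.phi) + 4:]
    return NVFormula(psi, min(p.alpha, imp.alpha))

def weaken(p: NVFormula, beta: float) -> NVFormula:
    # Rule (S): from (phi a) infer (phi b) for any 0 < b <= a.
    assert 0 < beta <= p.alpha
    return NVFormula(p.phi, beta)

print(gmp(NVFormula("p", 0.8), NVFormula("p -> q", 0.6)))   # NVFormula(phi='q', alpha=0.6)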
Lemma 2. ⊢PIL (ϕ → ϕ 1)

The following result is given in [Du1] for Standard Possibilistic Logic, which is based on Classical Logic. The result we present here is given in the context of PIL; although it is similar to the one presented in [Du1], our proof differs in that it is carried out in a purely axiomatic way, whereas the proof in Dubois et al. is semantic.

Theorem 2 (Deduction Theorem). Let Γ ∪ {ϕ, ψ} be a set of formulas in PIL, and α ∈ (0, 1]. Then Γ ∪ {(ϕ 1)} ⊢PIL (ψ α) if and only if Γ ⊢PIL (ϕ → ψ α).

Proof. Let (ψ1 α1), (ψ2 α2), (ψ3 α3), . . . , (ψn αn) be a proof of (ψ α) from Γ ∪ {(ϕ 1)}, where ψn = ψ and αn = α. We will prove by induction on j that Γ ⊢PIL (ϕ → ψj αj) for 1 ≤ j ≤ n.

First, (ψ1 α1) could be in Γ, could be an axiom of PIL, or could be equal to (ϕ 1) (we say that a PIL formula (ϕ α) is equal to another (ψ β) if ϕ = ψ and α = β). By Axiom PIL-1, we have that ⊢PIL (ψ1 → (ϕ → ψ1) 1), so in the first two cases we have, by GMP, that Γ ⊢PIL (ϕ → ψ1 α1); for the third case, when (ψ1 α1) is equal to (ϕ 1), we have by Lemma 2 that Γ ⊢PIL (ϕ → ψ1 α1).

Now, assume that Γ ⊢PIL (ϕ → ψk αk) for k < j. Then (ψj αj) could be an axiom of PIL, it could be in Γ, it could be equal to (ϕ 1), it could follow from Rule (S) from a formula (ψj β) with β ≥ αj, or it could follow by GMP from some formulas (ψl αl) and (ψm αm) with l, m < j and ψm = ψl → ψj, where αm is such that αj = min{αl, αm}. In the first three cases, Γ ⊢PIL (ϕ → ψj αj) follows similarly to the case j = 1. If (ψj αj) follows from Rule (S), then using Axiom PIL-1 we obtain Γ ⊢PIL (ϕ → ψj αj). Finally, for the last case, by the induction hypothesis we have that Γ ⊢PIL (ϕ → ψl αl) and Γ ⊢PIL (ϕ → (ψl → ψj) αm), and by Axiom PIL-2 we have that ⊢PIL ((ϕ → (ψl → ψj)) → ((ϕ → ψl) → (ϕ → ψj)) 1). Therefore, applying GMP to the last two formulas, we have Γ ⊢PIL ((ϕ → ψl) → (ϕ → ψj) αm), and applying GMP once more we have Γ ⊢PIL (ϕ → ψj αj), since αj = min{αl, αm}. Therefore the proof is complete; the case j = n is the desired result.
1 We say that a PIL formula (ϕ α) is equal to another (ψ β) if ϕ = ψ and α = β.
Theorem 3 (Generalized Deduction Theorem). Let Γ ∪ {ϕ, ψ} be a set of PIL formulas and α, β ∈ (0, 1]. Then we have that
1. Γ ∪ {(ϕ β)} ⊢PIL (ψ α) implies Γ ⊢PIL (ϕ → ψ α)
2. If, furthermore, β ≥ α, then we have the equivalence: Γ ∪ {(ϕ β)} ⊢PIL (ψ α) if and only if Γ ⊢PIL (ϕ → ψ α)
Proof. Proof of (1). Let (ψ1 α1), (ψ2 α2), (ψ3 α3), . . . , (ψn αn) be a proof of (ψ α) from Γ ∪ {(ϕ β)}, where ψn = ψ and αn = α. We will prove by induction on j that Γ ⊢PIL (ϕ → ψj αj) for 1 ≤ j ≤ n.
Base Case. First, (ψ1 α1) could belong to Γ, it could be an axiom of PIL, or it could be equal to (ϕ β). By Axiom PIL-1, we have that ⊢PIL (ψ1 → (ϕ → ψ1) 1), so in the first two cases we have, by GMP, that Γ ⊢PIL (ϕ → ψ1 α1); for the third case, when (ψ1 α1) is equal to (ϕ β), we have by Lemma 2 that Γ ⊢PIL (ϕ → ψ1 α1).
Inductive Hypothesis. Now, assume that Γ ⊢PIL (ϕ → ψk αk) for k < j. Then we have one of the following cases:
1. (ψj αj) could be an axiom of PIL,
2. it could be that (ψj αj) belongs to Γ,
3. it could be that (ψj αj) is equal to (ϕ β),
4. it could be that (ψj αj) follows by Rule (S) from a formula (ψj γ) with γ ≥ αj,
5. it could be that (ψj αj) follows by GMP from some formulas (ψl αl) and (ψm αm) with l, m < j and ψm = ψl → ψj, and αm is such that αj = min{αl, αm}.
In the first three cases, Γ P IL (ϕ → ψj αj ) follows in an analogous way as in the case j = 1. If (ψj αj ) follows from Rule (S), then using Axiom PIL-1 and Rule (S) we obtain that Γ P IL (ϕ → ψj αj ). Finally, for the last case, by the induction hypothesis, we have that Γ P IL (ϕ → ψl αl ) and Γ P IL (ϕ → (ψl → ψj ) αm ) but for Axiom PIL-2, we have that P IL ((ϕ → (ψl → ψj )) → ((ϕ → ψl ) → (ϕ → ψj )) 1) So, applying GMP to the last formula, we obtain: Γ P IL ((ϕ → ψl ) → (ϕ → ψj ) αm ) Again, applying GMP to the last formula, we obtain Γ P IL (ϕ → ψj min {αl , αm })
Therefore, we have that Γ P IL (ϕ → ψj αj ) since αj = min {αl , αm }. Therefore, the proof is complete. The case j = n is the desired result. Proof of (2) Assume that β ≥ α; First, also assume that Γ ∪ {(ϕ β)} P IL (ψ α); By the previous part, we have that Γ P IL (ϕ → ψ α), as desired. Conversely, assume that Γ P IL (ϕ → ψ α); By monotony, we have that Γ ∪ {(ϕ β)} P IL (ϕ → ψ α), and since Γ ∪ {(ϕ β)} P IL (ϕ β), we obtain by GMP that Γ ∪{(ϕ β)} P IL (ψ min {α, β}), that is, Γ ∪{(ϕ β)} P IL (ψ α). Note that the proof given for the previous theorem uses theorems also valid on classical logic, so this theorem also holds in Standard Possibilistic Logic. Also, observe that as a consequence of the previous result, if we put β = 1 on 3 (2), then we obtain Theorem 2 as a corollary. We now define the concept of “dependence” upon a formula. Let (ϕ α) be a formula in a set Γ of formulas, and assume that we are given a deduction D1 , D2 , . . . , Dn from Γ , together with justification for each step in the deduction. We shall say that Di depends upon (ϕ α) in this proof if and only if: 1. Di is (ϕ α) and the justification for Di is that it belongs to Γ , or 2. Di is justified as a direct consequence by GMP or by the Rule (S) of some previous formulas of the sequence, where at least one of these preceding formulas depends upon (ϕ α). Theorem 4 (Generalized Deduction Theorem (2nd version)). Let Γ ∪ {ϕ, ψ} be PIL formulas and α, β ∈ (0, 1]. Assume that Γ ∪ {(ϕ β)} P IL (ψ α), and let Γ ∗ = {(φ1 ρ1 ), (φ2 ρ2 ), . . . , (φp ρp )} ⊆ Γ be the set of formulas on which the proof of (ψ α) from Γ ∪ {(ϕ β)} depends; Then, Γ P IL (ϕ → ψ min {ρ1 , ρ2 , . . . , ρp }) Proof. Let (ψ1 α1 ), (ψ2 α2 ), . . . , (ψn αn ) be a proof of (ψ α) from Γ ∪ {(ϕ β)}. We will prove by induction on j that Γ P IL (ϕ → ψj min {ρ1 , ρ2 , . . . , ρp }) for 1 ≤ j ≤ n. Base Case. First, (ψ1 α1 ) could be in Γ ∗ , it could be an axiom of PIL or it could be equal to (ϕ β) by Axiom PIL-1, we have that P IL (ψ1 → (ϕ → ψ1 ) 1), so in the first two cases we have by GMP, that Γ P IL (ϕ → ψ1 ρ1 ), and then, Γ P IL (ϕ → ψ1 min {ρ1 , ρ2 , . . . , ρp }); For the third case, when (ψ1 α1 ) is equal to (ϕ β), we have by Lemma 2 that P IL (ϕ → ϕ 1) and therefore, by Rule (S), we obtain Γ P IL (ϕ → ψ1 min {ρ1 , ρ2 , . . . , ρp }). Inductive Hypothesis. Now, assume that Γ P IL (ϕ → ψk min {ρ1 , ρ2 , . . . , ρp }) for k < j, then we have one of the following cases:
1. (ψj αj) could be an axiom of PIL,
2. it could be that (ψj αj) belongs to Γ∗,
3. it could be that (ψj αj) is equal to (ϕ β),
4. it could be that (ψj αj) follows by Rule (S) from a formula (ψj γ) with γ ≥ αj,
5. it could be that (ψj αj) follows by GMP from some formulas (ψl αl) and (ψm αm) with l, m < j and ψm = ψl → ψj, and αm is such that αj = min{αl, αm}.
In the first three cases, Γ P IL (ϕ → ψj min {ρ1 , ρ2 , . . . , ρp }) follows in an analogous way as in the case j = 1. If (ψj αj ) follows from Rule (S), then (ψj γ) = (φr ρr ) for some r (since (ψj γ) = (ϕ β) and, therefore, (ψ γ) ∈ Γ ); So we have γ = ρr ≥ min {ρ1 , ρ2 , · · · , ρp } and then we obtain Γ P IL (ϕ → ψj min {ρ1 , ρ2 , . . . , ρp }). Finally, for the last case, by the induction hypothesis, we have that Γ P IL (ϕ → ψl min {ρ1 , ρ2 , . . . , ρp }) and Γ P IL (ϕ → (ψl → ψj ) min {ρ1 , ρ2 , . . . , ρp }) and by the Axiom PIL-2, we have P IL ((ϕ → (ψl → ψj )) → ((ϕ → ψl ) → (ϕ → ψj )) 1) So, applying GMP to the last formula, we obtain: Γ P IL ((ϕ → ψl ) → (ϕ → ψj ) min {ρ1 , ρ2 , . . . , ρp }) Again, applying GMP to the last formula, we obtain: Γ P IL (ϕ → ψj min {min {ρ1 , ρ2 , . . . , ρp } , min {ρ1 , ρ2 , . . . , ρp }}) Therefore, we have Γ P IL (ϕ → ψj min {ρ1 , ρ2 , . . . , ρp }) so, the proof is complete. The case j = n is the desired result.
Observe that if in the previous result we let Γ = ∅, then the theorem states that if (ϕ β) P IL (ψ α), then P IL (ϕ → ψ min {∅}), that is, P IL (ϕ → ψ 1), which is what one would expect; For instance, is clear that (ϕ 0.3) P IL (ϕ 0.3), so when applying this version of the Generalized Deduction Theorem we obtain P IL (ϕ → ϕ 1) which is what is expected , since Int ϕ → ϕ. As a consequence of Theorem 3 we obtain the following result. Theorem 5 (Weak Refutation Theorem ). Let Γ be a set of PIL formulas, and let ϕ be an intuitionistic propositional formula, and α, β ∈ (0, 1]. If β ≥ α, then we have Γ ∪ {(¬¬ϕ β)} P IL (⊥ α) if and only if Γ P IL (¬ϕ α)
Proof. Γ ∪ {(¬¬ϕ β)} ⊢PIL (⊥ α) if and only if Γ ⊢PIL (¬¬ϕ → ⊥ α), if and only if Γ ⊢PIL (¬¬¬ϕ α). Recall that ⊢Int ¬¬¬ϕ ↔ ¬ϕ, and thus we have ⊢PIL (¬¬¬ϕ ↔ ¬ϕ 1). Therefore, by Rule (S), we have that ⊢PIL (¬¬¬ϕ ↔ ¬ϕ α). So, Γ ⊢PIL (¬¬¬ϕ α) if and only if Γ ⊢PIL (¬ϕ α).
Theorem 6 (Cut Rule). Let Γ be a set of PIL formulas, let ϕ, ψ two intuitionistic propositional formulas and α, β ∈ (0, 1]. Then we have that If Γ P IL (ϕ β) and Γ ∪ {(ϕ β)} P IL (ψ α) then Γ P IL (ψ min {α, β}) Proof. Assume that Γ P IL (ϕ β) and that Γ ∪ {(ϕ β)} P IL (ψ α) by Theorem 3, we have that Γ P IL (ϕ → ψ α). Therefore, using GMP we obtain that Γ P IL (ψ min {α, β}), as desired.
Consider the set of PIL formulas Γ = {(ϕ → ψ 0.2), (ϕ → ¬ψ 0.7), (σ → ϕ 0.3)}. It is easy to see that Γ ⊢PIL (¬ϕ 0.2), and also that Γ ∪ {(¬ϕ 0.2)} ⊢PIL (¬ψ 0.2). So, by the previous theorem, we have that Γ ⊢PIL (¬ψ 0.2).
The following lemma is an ad-hoc result needed for the proof of Theorem 7.
Lemma 3.
1. ⊢PIL ((σ → (ϕ ↔ ψ)) → (σ → (¬ϕ ↔ ¬ψ)) 1)
2. ⊢PIL ((ϕ → (ψ ↔ σ)) ∧ (ϕ → (ε ↔ μ)) → (ϕ → ((ψ → ε) ↔ (σ → μ))) 1)
3. ⊢PIL ((ϕ → (ψ ↔ σ)) ∧ (ϕ → (ε ↔ μ)) → (ϕ → ((ψ ∨ ε) ↔ (σ ∨ μ))) 1)
4. ⊢PIL ((ϕ → (ψ ↔ σ)) ∧ (ϕ → (ε ↔ μ)) → (ϕ → ((ψ ∧ ε) ↔ (σ ∧ μ))) 1)
5. ⊢PIL (ϕ → (ψ ↔ ψ) 1)
We present a version of the Substitution Theorem for Possibilistic Intuitionistic Logic; the proof of this result is similar to the one presented for Proposition 2.9 in [Me1], differing from it in that our proof is arranged for a propositional calculus.
Theorem 7 (Substitution Theorem). Let ϕ be a classical propositional formula and let ψ be a subformula of ϕ; let ϕ′ be the formula which results by substituting some, or none, of the occurrences of ψ in ϕ by a formula σ.
1. ⊢PIL ((ψ ↔ σ) → (ϕ ↔ ϕ′) 1)
2. If Γ ⊢PIL (ψ ↔ σ α), then Γ ⊢PIL (ϕ ↔ ϕ′ α)
Proof. 1. We will use induction on the number of connectives in ϕ. Observe that if no occurrence of ψ is changed, then ϕ is equal to ϕ′, and then the formula to be proven is an instance of Lemma 3(5). So, we have that ⊢PIL ((ψ ↔ σ) → (ϕ ↔ ϕ′) 1).
Also note that, if ψ is identical to ϕ and this occurrence of ψ is substituted by σ, then the formula to be proven is ⊢PIL ((ψ ↔ σ) → (ϕ ↔ ϕ′) 1), which is an instance of Lemma 2. Therefore, we can assume that ψ is a proper subformula of ϕ, and that at least one occurrence of ψ has been substituted in ϕ. For the case in which ϕ is an atomic formula, we have that ψ cannot be a proper subformula of ϕ, and thus we have the case in which ψ is identical to ϕ. Our inductive hypothesis is that the result holds for all formulas with fewer connectives than ϕ.
Case 1: ϕ is a formula of the form ¬ρ. Let ϕ′ be equal to ¬ρ′; by the inductive hypothesis we have that ⊢PIL ((ψ ↔ σ) → (ρ ↔ ρ′) 1). Therefore, using an instance of Lemma 3(1) we have that ⊢PIL ([(ψ ↔ σ) → (ρ ↔ ρ′)] → [(ψ ↔ σ) → (¬ρ ↔ ¬ρ′)] 1), and when we apply GMP we obtain ⊢PIL ((ψ ↔ σ) → (¬ρ ↔ ¬ρ′) 1), where ¬ρ is ϕ and ¬ρ′ is ϕ′.
Case 2: ϕ is a formula of the form ρ → ν. Let ϕ′ be equal to ρ′ → ν′; by the inductive hypothesis we have that ⊢PIL ((ψ ↔ σ) → (ρ ↔ ρ′) 1) and that ⊢PIL ((ψ ↔ σ) → (ν ↔ ν′) 1); using an instance of Lemma 3(2), we have ⊢PIL ([(ψ ↔ σ) → (ρ ↔ ρ′)] ∧ [(ψ ↔ σ) → (ν ↔ ν′)] → [(ψ ↔ σ) → ((ρ → ν) ↔ (ρ′ → ν′))] 1). So, applying GMP we get ⊢PIL ((ψ ↔ σ) → ((ρ → ν) ↔ (ρ′ → ν′)) 1), where ρ → ν is ϕ and ρ′ → ν′ is ϕ′.
Case 4: ϕ is a formula of the form ρ ∨ ν. Let ϕ′ be equal to ρ′ ∨ ν′; by the induction hypothesis we have that ⊢PIL ((ψ ↔ σ) → (ρ ↔ ρ′) 1) and that ⊢PIL ((ψ ↔ σ) → (ν ↔ ν′) 1); using an instance of Lemma 3(3), we have
⊢PIL ([(ψ ↔ σ) → (ρ ↔ ρ′)] ∧ [(ψ ↔ σ) → (ν ↔ ν′)] → [(ψ ↔ σ) → ((ρ ∨ ν) ↔ (ρ′ ∨ ν′))] 1). So, applying GMP we get ⊢PIL ((ψ ↔ σ) → ((ρ ∨ ν) ↔ (ρ′ ∨ ν′)) 1), where ρ ∨ ν is ϕ and ρ′ ∨ ν′ is ϕ′.
Case 5: ϕ is a formula of the form ρ ∧ ν. Let ϕ′ be equal to ρ′ ∧ ν′; by the inductive hypothesis we have that ⊢PIL ((ψ ↔ σ) → (ρ ↔ ρ′) 1) and that ⊢PIL ((ψ ↔ σ) → (ν ↔ ν′) 1); using an instance of Lemma 3(4), we have ⊢PIL ([(ψ ↔ σ) → (ρ ↔ ρ′)] ∧ [(ψ ↔ σ) → (ν ↔ ν′)] → [(ψ ↔ σ) → ((ρ ∧ ν) ↔ (ρ′ ∧ ν′))] 1). So, when applying GMP we get ⊢PIL ((ψ ↔ σ) → ((ρ ∧ ν) ↔ (ρ′ ∧ ν′)) 1), where ρ ∧ ν is ϕ and ρ′ ∧ ν′ is ϕ′.
2. If Γ ⊢PIL (ψ ↔ σ α) then, by part (1), GMP and monotonicity, we obtain Γ ⊢PIL (ϕ ↔ ϕ′ α).
Consider the set of PIL formulas Γ = {(¬¬ϕ → ϕ 0.3)}. Since P IL (ϕ → ¬¬ϕ 1), it is easy to show that Γ P IL (ϕ ↔ ¬¬ϕ 0.3). Now, by the previous theorem we have that Γ P IL ((ϕ → ψ) ↔ (¬¬ϕ → ψ) 0.3) Theorem 8 (PIL Glivenko’s Theorem). Let Γ ∪ {(ϕ α)} be PIL formulas, and assume that Γ P os (ϕ α); Let Γ ∗ = {(γ1 α1 ), (γ2 α2 ), . . . , (γn αn )} ⊆ Γ be such that the proof of (ϕ α) from Γ depends on the formulas on Γ ∗ , then we have that Γ P IL (¬¬ϕ min {α1 , α2 , . . . , αn }) Proof. Assume that Γ P os (ϕ α): By hypothesis, we have that Γ ∗ P os (ϕ α) that is, (γ1 α1 ), (γ2 α2 ), · · · , (γn−1 αn−1 ), (γn αn ) P os (ϕ α) by the 2nd version of the Generalized Deduction Theorem (Theorem 4), we have that (γ1 α1 ), (γ2 α2 ), · · · , (γn−1 αn−1 ) P os (γn → ϕ
min {α1 , α2 , . . . , αn−1 })
again, applying Theorem 4, we have (γ1 α1 ), (γ2 α2 ), · · · , (γn−2 αn−2 ) P os (γn−1 → (γn → ϕ) min {α1 , α2 , . . . , αk })
where {α1 , α2 , · · · , αk } ⊆ {α1 , α2 , · · · , αn−1 } and Γ ∗∗ = {(γ1 α1 ), (γ2 α2 ), · · · , (γk αk )} is the set of formulas on which this last deduction depends upon. Note that min{α1 , α2 , · · · , αk } ≥ min{α1 , α2 , · · · , αn−1 }. Applying Theorem 4 many times, we have P os (γ1 ∧ γ2 ∧ · · · ∧ γn → ϕ 1) which implies, Cl γ1 ∧ γ2 ∧ · · · ∧ γn → ϕ now, by Glivenko’s Theorem [Va1], we have that Int ¬¬ (γ1 ∧ γ2 ∧ · · · ∧ γn → ϕ) Recall that Int ¬¬(χ → μ) ↔ (¬¬χ → ¬¬μ) and that also Int ¬¬(χ ∧ μ) ↔ (¬¬χ ∧ ¬¬μ). Therefore, we have that Int ¬¬ (γ1 ∧ γ2 ∧ · · · ∧ γn ) → ¬¬ϕ and furthermore, Int ¬¬γ1 ∧ ¬¬γ2 ∧ · · · ∧ ¬¬γn → ¬¬ϕ So, P IL (¬¬γ1 ∧ ¬¬γ2 ∧ · · · ∧ ¬¬γn → ¬¬ϕ
1) .
By Rule (S) we have that P IL (¬¬γ1 ∧ ¬¬γ2 ∧ · · · ∧ ¬¬γn → ¬¬ϕ
min {α1 , α2 , . . . , αn })
now, applying several times the Generalized Deduction Theorem (Theorem 3) (2), we have that (¬¬γ1 α1 ), (¬¬γ2 α2 ), . . . , (¬¬γn αn ) P IL (¬¬ϕ min {α1 , α2 , . . . , αn }) On the other hand, recall that Int χ → ¬¬χ and so, P IL (χ → ¬¬χ 1). By Rule (S), we have that P IL (χ → ¬¬χ α). Then, by Theorem 3 (2), (χ α) P IL (¬¬χ α). Therefore, for each i ∈ {1, 2, · · · , n} we have that (γi αi ) P IL (¬¬γi αi ). Thus, (γ1 α1 ), (γ2 α2 ), . . . , (γn αn ) P IL (¬¬γ1 α1 ), (¬¬γ2 α2 ), . . . , (¬¬γn αn ) and therefore, (γ1 α1 ), (γ2 α2 ), . . . , (γn αn ) P IL (¬¬ϕ min {α1 , α2 , . . . , αn }) So, by monotony, we have Γ P IL (¬¬ϕ min {α1 , α2 , . . . , αn }).
Consider the set of PIL formulas Γ = {(¬α → ϕ 0.8), (α → ϕ 0.7)}. Since ⊢Pos (((¬α → ϕ) → ((α → ϕ) → ((¬α ∨ α) → ϕ))) 1) and ⊢Pos ((¬α ∨ α) 1), it is easy to show that Γ ⊢Pos (ϕ 0.7). Now, by Glivenko's Theorem we have that Γ ⊢PIL (¬¬ϕ 0.7).
4
Conclusion
In this brief note we presented Possibilistic Intuitionistic Logic in a purely axiomatic way. This allowed us to state some basic, but relevant, results. These are partial results but, as a first attempt, they pave the way for results on reduction of theories. For future work, we leave the task of finding a suitable semantics.
References [Ge1]
van Gelder, A., Ross, K.A., et al.: The well-founded semantics for general logic programs. J. ACM 38(3), 620–650 (1991) [Ca1] Carballido, J.L., Osorio, M., et al.: Equivalence for the G3 −stable models sematics. J. Applied Logic 8(1), 82–96 (2010) [Du1] Dubois, D., Lang, J., Prade, H.: Possibilistic logic. In: Gabbay, D.M., Hogger, C.J., Robinson, J.A. (eds.) Handbook of Logic in Artificial Intelligence and Logic Programming, vol. 3, pp. 439–513. Oxford University Press, Inc., New york (1994) [Jo1] De Jongh, D.H.J., Hendrix, L.: Characterizations of strongly equivalent logic programs in intermediate logics. Theory and Practice of Logic Programming 3(3), 259–270 (2003) [Li1] Lifschitz, V., Pearce, D., et al.: Strongly equivalent logic programs. ACM Trans. Comput. Logic 2(4), 526–541 (2001) [Ma1] Marek, V., Truszczyn’ski, M.: Stable models and an alternative logic programming paradigm. In: The Logic Programming Paradigm: a 25-Year Perspective, pp. 169–181. Springer, Heidelberg (1999) [Me1] Mendelson, E.: Introduction to mathematical logic, 4th edn. Chapman & Hall / CRC (1997) [Ni1] Niemel¨ a, I.: Logic programs with stable model semantics as a constraint programming paradigm. Annals of Mathematics and Artificial Intelligence 25, 241– 273 (1999) [Ni2] Nicolas, P., Garcia, L., et al.: Possibilistic uncertainty handling for answer set programming. Annals Math. Artif. Intell. 47, 139–181 (2006) [Os1] Osorio, M., Nieves, J.C.: Possibilistic well-founded semantics. In: MICAI 2009: Advances in Artificial Intelligence, vol. 5845, pp. 15–26 (2009) [Os2] Osorio, M., Nieves, J.C.: Pstable semantics for possibilistic logic programs. In: ´ Gelbukh, A., Kuri Morales, A.F. (eds.) MICAI 2007. LNCS (LNAI), vol. 4827, pp. 294–304. Springer, Heidelberg (2007) [Os3] Osorio, M., Navarro, J.A., et al.: Ground non-monotonic modal logic S5: new results. J. Log. Computation 15(5), 787–813 (2005) [Os4] Osorio, M., Navarro, J.A., et al.: Equivalence in answer set programming. In: Pettorossi, A. (ed.) LOPSTR 2001. LNCS, vol. 2372, pp. 57–75. Springer, Heidelberg (2002) [Pe1] Pearce, D.: Stable inference as intuitionistic validity. The Journal of Logic Programming 38(1), 79–91 (1999) [Sh1] Shannon, C.: A mathematical theory of communication. Bell System Technical Journal 27, 379–426, 623–656 (1948) [Va1] Van Dalen, D.: Logic and structure. Springer, Heidelberg (March 2004) [Za1] Zadeh, L.A.: Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems 1, 3–28 (1978)
Jason Induction of Logical Decision Trees: A Learning Library and Its Application to Commitment
Alejandro Guerra-Hernández1, Carlos Alberto González-Alarcón1, and Amal El Fallah Seghrouchni2
1 Departamento de Inteligencia Artificial, Universidad Veracruzana, Facultad de Física e Inteligencia Artificial, Sebastián Camacho No. 5, Xalapa, Ver., México, 91000
[email protected], [email protected]
2 Laboratoire d'Informatique de Paris 6, Université Pierre et Marie Curie, 4, Place Jussieu, Paris, France, 75005
[email protected]
Abstract. This paper presents JILDT (Jason Induction of Logical Decision Trees), a library that defines two learning agent classes for Jason, the well known java-based implementation of AgentSpeak(L). Agents defined as instances of JILDT can learn about their reasons to adopt intentions performing first-order induction of decision trees. A set of plans and actions are defined in the library for collecting training examples of executed intentions, labeling them as succeeded or failed executions, computing the target language for the induction, and using the induced trees to modify accordingly the plans of the learning agents. The library is tested studying commitment: A simple problem in a world of blocks is used to compare the behavior of a default Jason agent that does not reconsider his intentions, unless they fail; a learning agent that reconsiders when to adopt intentions by experience; and a single-minded agent that also drops intentions when this is rational. Results are very promissory for both, justifying a formal theory of single-mind commitment based on learning, as well as enhancing the adopted inductive process. Keywords: Multi-Agent Systems, Intentional Learning, Commitment, AgentSpeak(L).
1
Introduction
It is well known that the Belief-Desire-Intention (BDI) model of agency [9,10] lacks learning competences. Intending to cope with this problem, this paper introduces JILDT (Jason Induction of Logical Decision Trees): a library that defines two learning agent classes for Jason [3], the well known java-based implementation of the AgentSpeak(L) BDI model [11]. Agents defined as instances
of the JILDT intentionalLearner class can learn about their reasons to adopt intentions, performing first-order induction of logical decision trees [1]. A set of plans and actions are defined in the library for collecting training examples of executed intentions, labeling them as succeeded or failed executions, computing the target language for the induction, and using the induced trees to modify accordingly the plans of the learning agents. In this way, the intentional learning approach [5] can be applied to any Jason agent by declaring the membership to this class. The second class of agents defined in JILDT deals with single-mind commitment [9], i.e., an agent is single-mind committed if once he intends something, he maintains his intention until he believes it has been accomplished or he believes it is not possible to eventually accomplish it anymore. It is known that Jason agents are not single-minded by default [3,6]. So, agents defined as instances of the JILDT singleM inded class achieve single-mind commitment, performing a policy-based reconsideration, where policies are rules for dropping intentions learned by the agents. This is foundational and theoretical relevant, since the approach reconciles policy-based reconsideration, as defined in the theory of practical reasoning [4], with computational notions of commitment as the single-mind case [9]. Attending in this way the normative and descriptive aspects of reconsideration, opens the door for a formal theory of reconsideration in AgentSpeak(L) based on intentional learning. Organization of the paper is as follows: Section 2 offers a brief introduction to the AgentSpeak(L) agent oriented programming language, as defined in Jason. An agent program, used in the rest of the paper, is introduced to exemplify the reasoning cycle of Jason agents. Section 3 introduces the Top-Down Induction of Logical Decision Trees (Tilde) method, emphasizing the way Jason agents can use it for learning. Section 4 describes the implementation of the JILDT library. Section 5 presents the experimental results for three agents in the blocks world: a default Jason agent, an intentional learner and a single-mind committed agent. Section 6 offers discussion, including related and future work.
2
Jason and AgentSpeak(L)
Jason [3] is a well known java-based implementation of the AgentSpeak(L) [11] abstract language for BDI agents. As usual, an agent ag is formed by a set of plans ps and beliefs bs. Each belief bi ∈ bs is a ground first-order term. Each plan p ∈ ps has the form trigger event : context ← body. A trigger event can be any update (addition or deletion) of beliefs (at) or goals (g). The context of a plan is an atom, a negation of an atom, or a conjunction of them. A non-empty plan body is a sequence of actions (a), goals, or belief updates; a distinguished symbol denotes empty elements, e.g., plan bodies, contexts, intentions. Atoms (at) can be labelled with sources. Two kinds of goals are defined, achieve goals (!) and test goals (?). The operational semantics [3] of the language is given by a set of rules that define a transition system (see figure 1) between configurations ⟨ag, C, M, T, s⟩, where:
– ag is an agent program formed by a set of beliefs bs and plans ps.
– An agent circumstance C is a tuple ⟨I, E, A⟩, where: I is a set of intentions; E is a set of events; and A is a set of actions to be performed in the environment.
– M is a set of input/output mailboxes for communication.
– T stores the current applicable plans, relevant plans, intention, etc.
– s labels the current step in the reasoning cycle of the agent.
Fig. 1. The transition system for AgentSpeak(L) operational semantics
An artificially simplified agent program for the blocks world environment, included in the distribution of Jason, is listed in Table 1. Examples in the rest of this paper are based on this agent program. Initially he believes that the table is clear (line 3) and that something with nothing on is clear too (line 2). He has a plan labeled put (line 10) expressing that to achieve putting a block X on Y, in any context (true), he must move X to Y. Our agent is bold about putting things somewhere else. Now suppose the agent starts running in his environment, where someone else asks him to put b on c. A reasoning cycle of the agent in the transition system of Jason is as follows: at the configuration ProcMsg the beliefs about on/2 are perceived (lines 5–8), reflecting the state of the environment, and an event +!put(b, c) is pushed on CE. Then this event is selected at configuration SelEv and the plan put is selected as relevant at configuration RelPl. Since the context of put is true, it is always applicable and it will be selected to form a new intention in CI at AddIM. Once selected for execution at SelInt, the action move(b, c) will be actually executed at ExecInt and, since there is nothing else to be done, the intention is dropped from CI at ClrInt. Coming back to ProcMsg results in the agent believing on(b, c) instead of on(b, a).
Table 1. A simplified agent in the blocks world
 1  // Beliefs
 2  clear(X) :- not(on(_,X)).
 3  clear(table).
 4  // Beliefs perceived
 5  on(b,a).
 6  on(a,table).
 7  on(c,table).
 8  on(z,table).
 9  // Plans
10  @[put]
11  +!put(X,Y) : true <- move(X,Y).
Now, what if something goes wrong? For instance, if another agent puts the block z on c before our agent achieves his goal? Well, his intention will fail. And it will fail every time this happens. The following section introduces the induction of logical decision trees, and the way they can be used to learn things like put is applicable only when Y is clear.
3
Tilde
Top-down Induction of Logical Decision Trees (Tilde) [1] has been used for learning in the context of intentional BDI agents [5], mainly because the inputs required for this method are easily obtained from the mental state of such agents, and the obtained hypotheses are useful for updating the plans and beliefs of the agents, i.e., these trees can be used to express hypotheses about the successful or failed executions of the intentions, as illustrated in figure 2. This section introduces Tilde emphasizing this compatibility with the agents in Jason. A Logical Decision Tree is a binary first-order decision tree where:
– Each node is a conjunction of first-order literals; and
– The nodes can share variables, but a variable introduced in a node can only occur in the left branch below that node (where it is true).
Three inputs are required to compute a logical decision tree. First, a set of training examples known as models, where each training example is composed of the set of beliefs the agent had when the intention was adopted; a literal coding what is intended; and a label indicating a successful or failed execution of the intention. Models are computed every time the agent believes an intention has been achieved (success) or dropped (failure). Table 2 shows two models corresponding to the examples in figure 2. The class of the examples is introduced at line 2, and the associated intention at line 3. The rest of the model corresponds to the beliefs of the agent when he adopted the intention.
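To make the format concrete, a model can be serialized from the intended literal, the label and the belief base at adoption time; the following Python sketch (make_model and its arguments are our own illustrative names, not part of JILDT) reproduces model (1) of Table 2:

def make_model(ident, label, intention, beliefs):
    # beliefs: ground literals (strings) held when the intention was adopted
    lines = ["begin(model(%d))" % ident, label + ".", "intend(%s)." % intention]
    lines += [b + "." for b in beliefs]      # in the order stored in the belief base
    lines.append("end(model(%d))" % ident)
    return "\n".join(lines)

print(make_model(1, "succ", "put,b,c",
                 ["on(b,a)", "on(a,table)", "on(c,table)", "on(z,table)"]))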
Fig. 2. A Tilde simplified setting: two training examples and the induced tree, when intending to put b on c
Table 2. The training examples from figure 2 as models for Tilde. Labels at line 2.
1  begin(model(1))      begin(model(2))
2  succ.                fail.
3  intend(put,b,c).     intend(put,b,c).
4  on(b,a).             on(b,a).
5  on(a,table).         on(a,table).
6  on(c,table).         on(z,c).
7  on(z,table).         on(c,table).
8  end(model(1))        end(model(2))
Second, the rules believed by the agent, like clear/1 in table 1 (lines 2–3), do not form part of the training examples, since they constitute the background knowledge of the agent, i.e., general knowledge about the domain of experience of the agent. And third, the language bias, i.e., the definition of which literals are to be considered as candidates to be included in the logical decision tree, is defined combinatorially after the literals used in the agent program, as shown in table 3. The rmode directives indicate that their argument should be considered as a candidate to form part of the tree. The lookahead directives indicate that the conjunction in their argument should be considered as a candidate too. The last construction is very important since it links logically the variables in the intended plan with the variables in the candidate literals, enabling generalization. For the considered example, the induced decision tree for two successful examples and one failed is showed in table 4. Roughly, it is interpreted as: when intending to put a block A on B, the intention succeeds if B is clear (line 2), and fails otherwise (line 3). With more examples, it is expected to build the tree equivalent to the one shown in figure 2. Induction is computed recursively as in ID3. A set of candidates is computed after the language bias, and the one that maximizes information gain is selected as the root of the tree. The process finishes when a stop criteria is reached. Details about upgrading ID3 to Tilde, can be found in [2].
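The recursion just described can be sketched in a few lines of Python. The sketch below is a deliberately propositional simplification of Tilde (ground candidate tests instead of first-order refinement with shared variables, and no stopping criterion beyond purity or exhaustion of candidates); all names are ours, not Tilde's or JILDT's:

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0

def info_gain(examples, labels, test):
    yes = [l for e, l in zip(examples, labels) if test(e)]
    no = [l for e, l in zip(examples, labels) if not test(e)]
    n = len(labels)
    return entropy(labels) - (len(yes) / n) * entropy(yes) - (len(no) / n) * entropy(no)

def induce(examples, labels, candidates):
    # return a leaf (majority class) on pure nodes or when no candidate splits the node
    if len(set(labels)) <= 1 or not candidates:
        return Counter(labels).most_common(1)[0][0]
    name, test = max(candidates, key=lambda c: info_gain(examples, labels, c[1]))
    yes = [(e, l) for e, l in zip(examples, labels) if test(e)]
    no = [(e, l) for e, l in zip(examples, labels) if not test(e)]
    if not yes or not no:
        return Counter(labels).most_common(1)[0][0]
    rest = [c for c in candidates if c[0] != name]
    return (name,
            induce([e for e, _ in yes], [l for _, l in yes], rest),
            induce([e for e, _ in no], [l for _, l in no], rest))

# The two examples of Table 2 as sets of ground facts, with two candidate tests
# loosely derived from the language bias of Table 3:
ex_succ = {"on(b,a)", "on(a,table)", "on(c,table)", "on(z,table)"}
ex_fail = {"on(b,a)", "on(a,table)", "on(z,c)", "on(c,table)"}
candidates = [("clear(c)", lambda e: not any(f.endswith(",c)") for f in e)),
              ("on(z,c)", lambda e: "on(z,c)" in e)]
print(induce([ex_succ, ex_fail], ["succ", "fail"], candidates))
# ('clear(c)', 'succ', 'fail'), mirroring the tree of Table 4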
Table 3. The language bias defining the vocabulary to build the decision tree
rmode(clear(V1)).
rmode(on(V1,V2)).
rmode(on(V2,V1)).
rmode(intend(put,V1,V2)).
lookahead(intend(put,V1,V2), clear(V1)).
lookahead(intend(put,V1,V2), clear(V2)).
lookahead(intend(put,V1,V2), on(V1,V2)).
lookahead(intend(put,V1,V2), on(V2,V1)).
Table 4. The induced Logical Decision Tree
intend(put,A,B), clear(B) ?
+--yes: [succ] 1.0 [[succ:1.0, fail:0.0]]
+--no:  [fail] 1.0 [[succ:0.0, fail:1.0]]
4
Implementation
JILDT implements two classes of agents. The first one is the intentionalLearner class, which implements agents capable of redefining the context of their plans according to the induced decision trees. In this way, the reasons to adopt a plan that has failed, as an intention in future deliberations, are reconsidered. The second one is the singleMindedLearner class, which implements agents that are also capable of learning rules that express when it is rational to drop an intention. The body of these rules is obtained from the branches in the induced decision trees that lead to failure. For this, the library defines a set of plans to allow the agents to autonomously perform inductive experiments, as described in section 3, and to exploit their discoveries. Table 5 lists the main actions implemented in Java to be used in the plans of the library. The rest of the section describes the use of these plans by a learning agent.
Table 5. Principal actions defined in the JILDT library
Action                  Description
getCurrentBels(Bs)      Bs unifies with the list of current beliefs of the agent.
getCurrentCtxt(C)       C unifies with the context of the current plan.
getCurrentInt(I)        I unifies with the current intention.
getLearnedCtxt(P,LC,F)  LC unifies with the learned context for plan P. F is true if a new different context has been learned.
changeCtxt(P,LC)        Changes the context of plan P for LC.
setTilde(P)             Builds the input files for learning about plan P.
execTilde               Executes Tilde saving inputs and results.
addDropRule(LC,P)       Adds the rule to drop plan P according to LC.
setLearningMode         Modifies plans to enable learning (intentionalLearner).
setSMLearningMode       Modifies plans to enable learning and dropping rules (singleMindedLearner class).
Both classes of agents define a plan @initialLearningGoal to set the correct learning mode (intentional or singleMinded) by extending the user-defined plans to deal with the learning process. For example, such extensions applied to the plan put, as defined for the agent listed in Table 1, are shown in Table 6. The original body of the plan is at line 6. If this plan is adopted as an intention and correctly executed, then the agent believes (line 8) a new successful training example about put, including his beliefs at the time the plan was adopted.
Table 6. JILDT extensions for plan put (original body at line 6)
1  @[put]
2  +!put(X,Y) : true <-
3     jildt.getCurrentInt(I);
4     jildt.getCurrentBels(Bs);
5     +intending(I,Bs);
6     move(X,Y);
7     -intending(I,Bs);
8     +example(I,Bs,succ).
Fun starts when facing problems. First, if the execution of an intention fails, for instance because move could not be executed correctly, an alternative added plan, as the one shown in Table 7, responds to the failure event -!put(X, Y). The result is a failure training example added to the beliefs of the agent (line 4) and an inductive process intended to be achieved (line 5).
Table 7. A plan added by JILDT to deal with put failures requiring induction
1  @[put_failCase]
2  -!put(X,Y) : intending(put(X,Y),Bs) <-
3     -intending(I,Bs);
4     +example(I,Bs,fail);
5     !learning(put);
6     +example_processed.
But, if the context of the plan put is different from true, because the agent has already learned a new context, or because he was defined like that, a failure event will be produced and the inductive process should not be intended. In this case we say that plan put was relevant but non-applicable. The plan in Table 8 deals with this situation. It is rational to avoid commitment if there are no applicable plans for a given event. Observe that there is a small ontology associated with the inductive processes. Table 9 lists the atomic formulae used for this purpose. These formulae should be treated as a set of reserved words.
Table 8. A plan added by JILDT to deal with put being non-applicable
@[put_failCase_NoRelevant]
-!put(X,Y) : not intending(put(X,Y),_) <-
   .print("Plan ", put, " non applicable.");
   +non_applicable(put).
Table 9. A small ontology used by JILDT
Atom                 Description
drop(I)              I is an intention to be dropped. Head of dropping rules.
root_path(R)         R is the current root to Tilde experiments.
current_path(P)      P is the current path to Tilde experiments.
dropped_int(I)       The intention I has been dropped.
example(P,Bs,Class)  A training example for plan P, beliefs Bs and Class.
intending(I,Bs)      I is being intended yet. Class is still unknown.
non_applicable(TE)   There were no applicable plans for the trigger event TE.
There is a plan @learning to build the inputs required by Tilde and execute it. If the agent succeeds in computing a Logical Decision Tree with the examples already collected, then he uses the tree to construct a new context for the associated plan (branches leading to success) and a set of rules for dropping the plan when it is appropriate (branches leading to failure). Two plans in the library are used to verify if something new has been learned.
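Turning a tree into a new plan context and into dropping rules can be sketched as follows (trees are encoded here as nested tuples (literal, yes-subtree, no-subtree); the function names are ours and only illustrate the idea, since the actual work in JILDT is done by the plans and actions of Table 5):

def branches(tree, path=()):
    # enumerate (path, leaf) pairs; each path item is a (literal, truth value) pair
    if not isinstance(tree, tuple):
        yield path, tree
        return
    literal, yes, no = tree
    yield from branches(yes, path + ((literal, True),))
    yield from branches(no, path + ((literal, False),))

def learned_context(tree):
    # disjunction of the conjunctions that lead to 'succ' leaves
    ctxs = [" & ".join(l if v else "not " + l for l, v in path) or "true"
            for path, leaf in branches(tree) if leaf == "succ"]
    return " | ".join(ctxs) if ctxs else "false"

def drop_rules(plan, tree):
    # one dropping rule per branch that leads to a 'fail' leaf
    return ["drop(%s) :- intending(%s,_) & %s." %
            (plan, plan, " & ".join(l if v else "not " + l for l, v in path))
            for path, leaf in branches(tree) if leaf == "fail"]

tree = ("clear(B)", "succ", "fail")     # the tree of Table 4, in this encoding
print(learned_context(tree))            # clear(B)
print(drop_rules("put(X,Y)", tree))     # ['drop(put(X,Y)) :- intending(put(X,Y),_) & not clear(B).']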
5
Experiments
We have designed a very simple experiment to compare the behavior of a default Jason agent, an intentional learner, and a single-minded agent that learns his policies for dropping intentions. For the sake of simplicity, these three agents are defined as shown in figure 1, i.e., they are all bold about putting blocks somewhere else; and that is their unique competence. The experiment runs as illustrated in figure 3: the experimenter asks the other agents to achieve putting the block b on c, but with certain probability p(N), he introduces noise in the experiment by putting the block z on c. There is also a latency probability p(L) for the last event: the experimenter could put block z before or after it asks the other agents to put b on c. This means that the other agents can perceive noise before or while intending to put b on c. Numerical results are shown in table 10 (average of 10 runs, each one of 100 experiments) for a probability of latency of 50%. The probability of noise varies (90%, 70%, 50%, 30%, and 10%). Lower values configure less dynamic environments, free of surprises and effectively observable. The performance of the agent is interpreted as more or less rational as follows: dropping an intention because of the occurrence of an error is considered irrational. Refusing to form an intention because the plan is not applicable; dropping the intention because
Fig. 3. The experiment process
of a reason to believe it will fail; and achieving the goal of putting b on c are considered rational behaviors. Figure 4 summarizes the result of all the executed experiments, where the probabilities of noise and latency range on {90%, 70%, 50%, 30%, 10%}. As expected the performance of the default agent is proportionally inverse to the probability of noise, independently of the probability of latency. The learner agent reduces the irrationality due to noise before the adoption of the plan as intention, because eventually he learns that in order to intend to put a block X on a block Y , Y must be clear: put (X , Y ) : clear ( Y ) < - move (X , Y ) .
Once this has been done, the learner can refuse to intend putting b on c if he perceives c is not clear. So, for low latency probabilities, he performs better than the default agent, but of course his performance decays as the probability of latency increases; and, more importantly: there is nothing to do if he perceives noise after the intention has been adopted. In addition, the singleMinded agent learns the following rule for dropping the intention when block Y is not clear: drop ( put (X , Y ) ) : - intending ( put (X , Y ) ,_ ) & not ( clear ( Y ) ) .
Table 10. Experimental results (average from 10 runs of 100 iterations each one) for a probability of latency of p(L)=0.5 and different probabilities of noise p(N)

Agent         p(N)   Irrational                 Rational
                     after  before  total       refuse  drop  achieve  total
default        90     43.8   48.2   92.0         00.0   00.0   08.0    08.0
learner        90     48.7   37.3   86.0         04.5   00.0   09.5    14.0
singleMinded   90     44.5   38.8   83.3         03.2   03.8   09.7    16.7
default        70     34.5   36.0   70.5         00.0   00.0   29.5    29.5
learner        70     33.2   13.3   46.5         20.6   00.0   32.9    53.5
singleMinded   70     18.4   16.4   34.8         16.3   17.5   31.4    65.2
default        50     22.5   26.3   48.8         00.0   00.0   51.2    51.2
learner        50     26.1   05.4   31.5         20.7   00.0   47.8    68.5
singleMinded   50     11.6   09.9   21.5         16.1   14.9   47.5    78.5
default        30     14.2   15.0   29.2         00.0   00.0   70.8    70.8
learner        30     15.1   02.4   17.5         11.8   00.0   70.7    82.5
singleMinded   30     03.3   03.7   07.0         10.9   12.0   70.1    93.0
default        10     04.2   05.5   09.7         00.0   00.0   90.3    90.3
learner        10     05.3   01.0   06.3         04.9   00.0   88.8    93.7
singleMinded   10     00.9   00.9   01.8         03.8   03.4   91.0    98.2
Every time a singleM indedLearner agent instance is going to execute an intention, first it is verified that no reasons to drop the intention exist; otherwise the intention is dropped. So, when the singleM inded agent already intends to put b on c and the experimenter puts the block z on c, he rationally drops his intention. In fact, the singleMinded agent only fails when it is ready to execute the primitive action move and noise appears. For high probabilities of both noise and latency, the chances of collecting contradictory training examples increases and the performance of the learner and the singleMinded agents decay. By contradictory examples we mean that for the same blocks configuration, examples can be labeled as success, but also as failure. This happens because the examples are based on the beliefs of the agent when the plan was adopted as an intention, so that the later occurrence of noise is not included. In normal situations, an agent is expected to have different relevant plans for a given event. Refusing should then result in the adoption of a different relevant plan as a new intention. That is the true case of policy-based reconsideration, abandon is just an special case. Abandon is interpreted as rational behavior: the agent uses his learned policy-based reconsideration to prevent a real failure.
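The check performed before execution can be sketched as follows (hypothetical names; the actual implementation hooks into the Jason transition system rather than into a Python function):

def should_drop(beliefs, goal, drop_rules):
    # drop_rules: predicates over the current beliefs and the intended goal
    return any(rule(beliefs, goal) for rule in drop_rules)

beliefs = {"on(z,c)", "on(b,a)"}    # c is no longer clear
learned = lambda bs, goal: goal == "put(b,c)" and "clear(c)" not in bs
print(should_drop(beliefs, "put(b,c)", [learned]))   # True: drop instead of executing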
6
Discussion and Future Work
Experimental results are very promising. When compared with other experiments about commitment [7], it is observed that the intentionalLearner and singleMinded agents are adaptive: they were bold about put, and then they adopt a cautious strategy after having problems with their plan. Using intentional learning provides convergence to the right level of boldness-cautiousness
Fig. 4. The experiment results. Left: Default performance. Right: Learner performance. Center: SingleMinded performance.
based on their experience. But also, it seems that a bold attitude is adopted toward successful plans, and a cautious one toward failed plans; but more experiments are required to confirm this hypothesis. The JILDT library provides the extensions to AgentSpeak(L) required for defining intentional learning agents. Using the library, it was also easy to implement a single-mind committed class of agents. Extensions with respect to implementation include: implementing the inductive algorithm in java as an action of the JILDT library. Currently, the library computes the inputs for Tilde, but executes it to compute the logical decision trees. In this sense, we obtained a better understanding of the inductive method that will enable us to redefine it in JILDT. For instance, experimental results suggest that induction could be enhanced if the training examples represent not only the beliefs of the agent when the intention was adopted, but also when it was accomplished or dropped, in order to minimize the effects of the latency in noise. The transition system for the singleM inded agents has been modified to enable dropping intentions. Basically, every time the system is at execInt and a drop learned rule fires, the intention is dropped instead of being executed. It is possible now to think of a formal operational semantics for AgentSpeak(L) commitment based on policy-based reconsideration and intentional learning.
In [12] an architecture for intentional learning is proposed. Their use of the term intentional learning is slightly different, meaning that learning was the goal of the BDI agents rather than an incidental outcome. Our use of the term is strictly circumscribed to the practical rationality theory [4] where plans are predefined and the target of the learning processes is the BDI reasons to adopt them as intentions. A similar goal is present in [8], where agents can be seen as learning the selection function for applicable plans. The main difference with our work is that they propose an ad hoc solution for a given non BDI agent. Our approach to single-mind commitment evidences the benefits of generalizing intentional learning as an extension for Jason. Acknowledgments. Authors are supported by Conacyt CB-2007 fundings for project 78910. The second author is also supported by scholarship 273098.
References 1. Blockeel, H., De Raedt, L.: Top-down induction of first-order logical decision trees. Artificial Intelligence 101(1-2), 285–297 (1998) 2. Blockeel, H., Raedt, L., Jacobs, N., Demoen, B.: Scaling up inductive logic programming by learning from interpretations. Data Mining and Knowledge Discovery 3(1), 59–93 (1999) 3. Bordini, R.H., H¨ ubner, J.F., Wooldridge, M.: Programming Multi-Agent Systems in AgentSpeak using Jason. Wiley, England (2007) 4. Bratman, M.: Intention, Plans, and Practical Reason. Harvard University Press, Cambridge (1987) 5. Guerra-Hern´ andez, A., Ort´ız-Hern´ andez, G.: Toward BDI sapient agents: Learning intentionally. In: Mayorga, R.V., Perlovsky, L.I. (eds.) Toward Artificial Sapience: Principles and Methods for Wise Systems, pp. 77–91. Springer, London (2008) 6. Guerra-Hern´ andez, A., Castro-Manzano, J.M., El Fallah Seghrouchni, A.: CTL AgentSpeak(L): a Specification Language for Agent Programs. Journal of Algorithms (64), 31–40 (2009) 7. Kinny, D., Georgeff, M.P.: Commitment and effectiveness of situated agents. In: Proceeding of the Twelfth International Conference on Artificial Intelligence IJCAI 1991, Sidney, Australia, pp. 82–88 (1991) 8. Nowaczyk, S., Malec, J.: Inductive Logic Programming Algorithm for Estimating ´ Quality of Partial Plans. In: Gelbukh, A., Kuri Morales, A.F. (eds.) MICAI 2007. LNCS (LNAI), vol. 4827, pp. 359–369. Springer, Heidelberg (2007) 9. Rao, A.S., Georgeff, M.P.: Modelling Rational Agents within a BDI-Architecture. In: Huhns, M.N., Singh, M.P. (eds.) Readings in Agents, pp. 317–328. Morgan Kaufmann, San Francisco (1991) 10. Rao, A.S., Georgeff, M.P.: Decision procedures for BDI logics. Journal of Logic and Computation 8(3), 293–342 (1998) 11. Rao, A.S.: AgentSpeak(L): BDI agents speak out in a logical computable language. In: de Velde, W.V., Perram, J.W. (eds.) MAAMAW 1996. LNCS, vol. 1038, pp. 42–55. Springer, Heidelberg (1996) 12. Subagdja, B., Sonennberg, L., Rahwan, I.: Intentional learning agent architecture. Autonomous Agents and Multi-Agent Systems 18, 417–470 (2008)
Extending Soft Arc Consistency Algorithms to Non-invertible Semirings Stefano Bistarelli1,2 , Fabio Gadducci3, Javier Larrosa4 , Emma Rollon4 , and Francesco Santini1,2 1
Dipartimento di Matematica e Informatica, Università di Perugia
[email protected], [email protected]
2 Istituto di Informatica e Telematica, CNR Pisa
[email protected], [email protected]
3 Dipartimento di Informatica, Università di Pisa
[email protected]
4 Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya
[email protected], [email protected]
Abstract. We extend algorithms for arc consistency proposed in the literature in order to deal with (absorptive) semirings that are not invertible. As a consequence, these consistency algorithms can be used as a pre-processing procedure in soft Constraint Satisfaction Problems (CSPs) defined over a larger class of semirings: among other instances, for those semirings obtained as the cartesian product of any family of semirings. The main application is that the new arc consistency algorithm can be used for multi-criteria soft CSPs. To reach this objective, we first show that any semiring can be transformed into a new one where the + operator is instantiated with the Least Common Divisor (LCD) between the elements of the original semiring. The LCD value corresponds to the amount we can “safely move” from the binary constraint to the unary one in the arc consistency algorithm (when the × operator of the semiring is not idempotent). We then propose an arc consistency algorithm which takes advantage of this LCD operator. Keywords: Soft constraints, non-invertible semirings, local consistency.
1 Introduction and Motivations The basic idea in constraint programming (as surveyed in [13]) is that the user states the constraints and a general purpose constraint solver solves them. Constraints are just relations, and a Constraint Satisfaction Problem (CSP) states which relations should hold among the decision variables. Constraint satisfaction embeds any reasoning which consists in explicitly forbidding values or combinations of values for some variables of a problem because a given subset of its constraints cannot be satisfied otherwise. An important mean to accomplish this task is represented by local consistency algorithms. Local consistency [13, Section 3] is an essential component of a constraint solver: a local property with an enforcing, polynomial-time algorithm transforming a constraint
Supported by the Spanish Ministry of Science and Innovation project TIN 2006-15387-C03-0 and the Italian Ministry of University and Research project PRIN 20089M932N.
network into an equivalent one. If the resulting network is empty, the original problem/network is inconsistent, allowing the efficient detection of some inconsistencies. The semiring-based framework, as proposed in [4] (see [13, Section 9]), extends classical constraints by adding to the usual notion of CSP the concept of a structure representing the levels of satisfiability of a constraint. Such a structure is a set with two operations (see Section 2 for further details): one (written +) is used to generate an ordering over the levels, while the other one (×) is used to define how two levels can be combined and which level is the result of such combination. Because of the properties required on such operations, this structure is similar to a semiring (see Section 2): hence the terminology of “semiring-based soft constraint” [4,1], that is, constraints with several levels of satisfiability, and whose levels are (totally or partially) ordered according to the semiring structure. In general, problems defined according to the semiring-based framework are called Soft Constraint Satisfaction Problems (soft CSPs or SCSPs). In the literature, many local consistency algorithms have been proposed for soft CSPs (see e.g. [13, Section 9.6.2]): while classical consistency algorithms [12] aim at reducing the size of constraint problems, soft consistency algorithms work by making explicit the inconsistency level that is originally implicit in the problem. The most recent ones exploit invertible semirings (see Section 2), providing, under suitable conditions, an operator ÷ that is the inverse of ×, i.e., such that (a ÷ b) × b = a (see two alternative proposals in [2] and [7,6], respectively). In this paper we aim at generalizing the previous consistency algorithms to semirings that are not necessarily invertible. In particular, we first show how to distill from a semiring a novel one, such that its + operator corresponds to the Least Common Divisor (LCD) operator (see Section 3) of the elements in the semiring preference set. We then show, and this represents the practical application of the theoretical outcome, how to apply the derived semiring inside soft arc consistency algorithms in order to find solutions to the original, not necessarily invertible semiring, thus leading to a further generalization of soft arc consistency techniques. Summing up, this paper extends the use of consistency algorithms beyond the limits imposed by the proposals in [2,7]: for instance, the new consistency algorithms can be applied to multi-criteria problems, when modelled as soft CSPs [3]. Therefore, the main application of the paper is the practical use of the proposed new algorithm to satisfy multi-criteria problems, where multiple distinct criteria need to be satisfied at the same time, e.g. minimizing a money cost and saving time in the scheduling of activities [13, Section 22]. The paper is organized as follows: Section 2 summarizes the background notions about semirings and soft constraints. Section 3 shows how to assemble the new LCD operator by transforming a semiring, while Section 4 proposes its use inside a local consistency algorithm for soft CSPs. The final remarks are provided in Section 5.
2 Preliminaries Semirings provide an algebraic framework for the specification of a general class of combinatorial optimization problems. Outcomes associated to variable instantiations are modeled as elements of a set A, equipped with a sum and a product operator. These operators are used for combining constraints: the intuition is that the sum operator induces
a partial order a ≤ b, meaning that b is a better outcome than a; whilst the product operator denotes the aggregation of outcomes coming from different soft constraints. 2.1 The Algebra of Semirings This section reviews the main concepts, adopting the terminology used in [1,2]. A (commutative) semiring is a tuple K = A, +, ×, 0, 1 such that A is a set, 1, 0 ∈ A, and +, × : A×A → A are binary operators making the triples A, +, 0 and A, ×, 1 commutative monoids, satisfying distributivity (∀a, b, c ∈ A.a × (b + c) = (a × b) + (a × c)) and with 0 as annihilator element for × (∀a ∈ A.a × 0 = 0). A semiring is absorptive if additionally 1 is an annihilator element for + (∀a ∈ A.a + 1 = 1)1 . Let K be an absorptive semiring. Then, the operator + of K is idempotent. As a consequence, the relation A, ≤ defined as a ≤ b if a + b = b is a partial order and, moreover, 1 is its top element. If additionally K is also idempotent (that is, the product operator × is idempotent), then the partial order is actually a lattice, since a × b corresponds to the greatest lower bound of a and b. For the rest of the paper, we consider the absorptive semiring K = A, +, ×, 0, 1. An absorptive semiring is invertible if whenever a ≤ b, there exists an element c ∈ A such that b × c = a or, in other words, if the set Div(a, b) = {c | b × c = a} is not empty. It is uniquely invertible if that set is actually a singleton whenever b = 0. All classical soft constraint instances (i.e., Classical CSPs, Fuzzy CSPs, Probabilistic CSPs and Weighted CSPs) are indeed invertible, and uniquely so. A division operator ÷ is a kind of weak inverse of ×, such that a ÷ b returns an element from Div(a, b). There are currently two alternatives in the literature: the choice in [2] favours the maximum of such elements (whenever it exists), the choice in [7] favours the minimum (whenever it exists). The latter behaves better computationally, the former encompasses more semiring instances2 . All of our results in the following sections are applicable for any choice of ÷. Example 1. Consider the weighted semiring Kw = N ∪ {∞}, min, +, ∞, 0. The + and × operators are min and + with their usual meaning over the naturals. The ∞ value is handled in the usual way (i.e., min{∞, a} = a and ∞+a = ∞ for all a). This semiring is used to model and solve a variety of combinatorial optimization problems [11]. The induced order is total and corresponds to the inverse of the usual order among naturals (e.g. 9 ≤ 6 because min{9, 6} = 6). Note that Kw is uniquely invertible: the division corresponds to subtraction over the naturals (e.g. a ÷ b = a − b). The only case that deserves special attention is ∞ ÷ ∞, because any value of the semiring satisfies the division condition. In [2] and [7] it is defined as 0 and ∞, respectively. Consider now the semiring Kb = N+ ∪ {∞}, min, ×, ∞, 1, where N+ = N \ {0}. The + and × operators are now min, × with the usual meaning over the naturals. This semiring is a slightly modified version of a semiring proposed in [5] in order to deal with so-called bipolar preferences. As before, the semiring is totally ordered. However, 1
1 The absorptivity property is proved equivalent to requiring ∀a, b ∈ A. a + (a × b) = a: any element a × b is "absorbed" by either a or b.
2 For example, complete semirings, i.e., those which are closed with respect to infinite sums. See [2] for a thorough comparison between the two alternatives.
it is not invertible: for example, even if 9 ≤ 6, clearly there is no a ∈ N+ ∪ {∞} such that 6 × a = 9. Intuitively, N+ ∪ {∞} is not closed under the arithmetic division, the obvious inverse of the arithmetic multiplication. 2.2 Soft Constraints Based on Semirings This section briefly introduces the semiring-based approach to the soft CSP framework. Definition 1 (constraints). Let K = A, +, ×, 0, 1 be an absorptive semiring; let V be an ordered set of variables; and let D be a finite domain of interpretation for V . Then, a constraint (V → D) → A is a function associating a value in A to each assignment η : V → D of the variables. We then define C as the set of all constraints that can be built starting from chosen K, V and D. The application of a constraint function c : (V → D) → A to a variable assignment η : V → D is noted cη. Note that even if a constraint involves all the variables in V , it can depend only on the assignment of a finite subset of them, called its support. For instance, a binary constraint c with supp(c) = {x, y} is a function c : (V → D) → A which depends only on the assignment of variables {x, y} ⊆ V , meaning that two assignments η1 , η2 : V → D differing only for the image of variables z ∈ {x, y} do coincide (i.e., cη1 = cη2 ). The support corresponds to the classical notion of scope of a constraint. We often refer to a constraint with support I as cI . Moreover, an assignment over a support I of size k is concisely represented by a tuple t in Dk and we often write cI (t) instead of cI η. We now present the extension of the basic operations (namely, combination, division and projection) to soft constraints. Definition 2 (combination and division). The combination operator ⊗ : C × C → C is defined as (c1 ⊗ c2 )η = c1 η × c2 η for constraints c1 , c2 . The division operator ÷ : C × C → C is defined as (c1 ÷c2 )η = c1 η ÷ c2 η for constraints c1 , c2 such that c1 η ≤ c2 η for all η. Informally, performing the ⊗ or the ÷ between two constraints means building a new constraint whose support involves all the variables of the original ones, and which associates with each tuple of domain values for such variables a semiring element which is obtained by multiplying or, respectively, dividing the elements associated by the original constraints to the appropriate sub-tuples. Definition 3 (projection). Let c ∈ C be a constraint and v ∈ V a variable. The projection of c over V − {v} is the constraint c with c η = d∈D cη[v := d]. Such projection is denoted c ⇓(V −{v}) . The projection operator is inductively extended to a set of problem variables I ⊆ V by c ⇓(V −I) = c ⇓(V −{v}) ⇓(V −{I−{v}}) . Informally, projecting means eliminating variables from the support. Definition 4 (soft CSPs). A soft constraint satisfaction problem is a pair C, Y , where C ⊆ C is a set of constraints and Y ⊆ V is a set of variables.
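The algebra of Section 2.1 translates almost literally into code. The following Python sketch (class and function names are ours, shown only for concreteness) encodes a semiring by its two operators, instantiates the weighted semiring Kw, and implements its division:

from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]    # + : induces the order a <= b iff a + b = b
    times: Callable[[Any, Any], Any]   # × : combines preference levels
    zero: Any
    one: Any

    def leq(self, a, b):
        return self.plus(a, b) == b

INF = float("inf")
Kw = Semiring(plus=min, times=lambda a, b: a + b, zero=INF, one=0)  # <N ∪ {∞}, min, +, ∞, 0>

def div_weighted(a, b):
    # a ÷ b in Kw, defined when a <= b in the semiring order (i.e. a >= b as numbers);
    # for ∞ ÷ ∞ we follow the convention of [2] and return 0 ([7] would return ∞)
    if a == INF and b == INF:
        return 0
    return a - b

print(Kw.leq(9, 6))         # True: min(9, 6) = 6, so 6 is the better level
print(div_weighted(9, 6))   # 3, since 6 × 3 = 9 in Kw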
[Figure 1 appears here: (a) the running-example soft CSP on variables x and y with its unary and binary constraint values; (b) the combination of its constraints under Kw, giving 20, 26, 16 and 11 for the assignments (a,a), (a,b), (b,a) and (b,b); (c) the combination under Kb, giving 270, 450, 90 and 30 for the same assignments.]
Fig. 1. A soft CSP and the combination of its constraints with respect to the semirings K w and K b . The arcs connecting the variables represent binary constraints, and the numbers represent the preference associated with that particular instantiation of variables (e.g. x = a, y = a costs 6 for the binary constraint and, respectively, 5 and 9 for the two unary constraints).
The set Y contains the variables of interest for the constraint set C. Definition 5 (solutions). The solution of a soft CSP P = C, Y is defined as the constraint Sol(P ) = ( C) ⇓Y . The solution of a soft CSP is obtained by combining all constraints, and then projecting over the variables in Y . In this way we get the constraint with support (not greater than) Y which is “induced” by the entire soft CSP. A tightly related notion, one which is quite important in combinatorial optimization problems, is the best level of consistency [4]. Definition 6. The best level of consistency of a soft CSP problem P = C, Y is de fined as the constraint blevel(P ) = ( C) ⇓∅ . Example 2. Figure 1.a) shows a soft CSP with variables V = {x, y} and values D = {a, b}. For the purpose of the example, all the variables in V are of interest. The problem has two unary soft constraints cx , cy and one binary constraint cxy . Each rectangle represents a variable and contains its domain values. Besides each domain value there is a semiring value given by the corresponding unary constraint (for instance, cy gives value 9 to any labeling in which variable y takes value a). Binary constraints are represented by labeled links between pairs of values. For instance, cxy gives value 15 to the labeling in which variable x and y take values a and b. If a link is missing, its value is the unit 1 of the semiring. Figure 1.b) shows the combination of all constraints (i.e., cx ⊗cy ⊗cxy ) assuming the weighted semiring Kw . Each table entry is the sum of the three corresponding semiring values. In this case, the best level of solution is 11, the minimum over all the entries. Figure 1.c) shows the combination of all constraints assuming the semiring Kb . It is different from the previous case, because semiring values are now multiplied. As before, the best level of solution is the minimum over all the entries which, in this case, is 30. In the rest of this paper, we sometimes assume that soft CSPs are binary (i.e., no constraint has arity greater than 2). This is done for the sake of the presentation of the consistency algorithm, and the assumption is done without loss of generality, since any
soft CSP can be transformed into an equivalent one (i.e, with the same best level of consistency) that is binary (even if the outcome of applying a local consistency algorithm may differ): see e.g. [13, Section 3]. Definition 7. The constraint graph of a binary soft CSP problem P = C, {x, y} is defined as the graph containing one vertex per problem variable. The graph contains an edge (x, y) if there is some constraint with both x and y in its support. In the following, we also assume the existence of a unary constraint cx for every variable x, and of a zero-arity constraint (i.e., a constant), noted c0 . This is also done without loss of generality because, if such constraints are not defined, we consider dummy ones: cx (a) = 1 for all a ∈ D and c0 = 1. 2.3 Variable Elimination A well-known technique to solve soft CSPs is variable elimination [13, Section 7]. The idea is to simplify a problem by eliminating variables while preserving its best level of consistency. This is achieved by adding new constraints that compensate for the eliminated variables. For simplicity, we only review the variable elimination process for variables with degree (the number of edges they are involved with) less than 3. The interested reader can find a comprehensive description of the general method in [8]. If x is a variable of degree 1 in the constraint graph, let cxy be the unique binary constraint having x in its support. The elimination of x consists in updating cy as cy := cy ⊗ ((cxy ⊗ cx ) ⇓y ) and discarding x, cx and cxy subsequently. If x is a variable of degree 2 in the constraint graph, let cxy and cxz be the only binary constraints having x in its support. The elimination of x consists in updating cyz as cyz := cyz ⊗ ((cxy ⊗ cxz ⊗ cx ) ⇓yz ) and discarding x, cx , cxy and cxz subsequently. Note that eliminating variables of larger degree becomes more complex and expensive. Even if we do not want to completely solve the problem with variable elimination, we can still take advantage of the previous procedure and eliminate candidate variables until all the remaining variables have degree greater than 2 in the constraint graph. This is a widely used simplification pre-process (see again [13, Section 7]). 2.4 Local Consistency Soft local consistencies are properties over soft CSPs that a given instance may or may not satisfy. For the sake of simplicity, in this paper we restrict ourselves to the simplest consistencies: node and arc consistency. However, all the notions and ideas presented in subsequent sections can be easily generalized to more sophisticated ones. One effect of local consistency is to detect and remove unfeasible values. For that purpose, we define the domain of a variable x in the support of a constraint c as Dxc = = 0}, or simply Dx when clear from the context. {a ∈ D | cx (a) Definition 8 (node/arc consistency [9]). Let P = C, {x, y} be a binary soft CSP. Node consistency (NC). A value a ∈ Dx is NC if c0 × cx (a) > 0. A variable x is NC if all a ∈ Dx are NC and a∈Dx cx (a) = 1. P is NC if every variable is NC.
Fig. 2. Three equivalent soft CSP instances (for semiring K w ). The first one is neither NC nor AC; the second one is not yet AC; the third one is both NC and AC.
Arc consistency (AC). A value a ∈ Dx is AC with respect to cxy if b∈Dy cxy (a, b) = 1. Variable x is AC if all a ∈ Dx are AC with respect to every binary constraint such that x is in its support. P is AC if every variable is AC and NC. Each property should come with an enforcing algorithm that transforms the problem into an equivalent one that satisfies the property. Node inconsistent values can just be filtered out as soon as they are identified. For the rest, enforcing algorithms are based on the concept of local consistency rule [13, Section 9.6.2]. Definition 9 (local consistency rule [2]). Let c1 and c2 be two constraints such that supp(c1 ) ⊂ supp(c2 ) and let Z = supp(c2 ) \ supp(c1 ). A local consistency rule involving c1 and c2 , denoted CR(c1 , c2 ), consists of two steps – Aggregate to c1 information induced by c2 c1 := c1 ⊗ (c2 ⇓Z ) – Compensate in c2 the information aggregated to c1 in the previous step ÷(c2 ⇓Z ). c2 := c2 Note that the local consistency rules above are defined only for soft CSPs with invertible semirings. The fundamental property of the above local consistency rule is that it does not change the solution of soft CSPs defined on invertible semirings. Proposition 1 (preserving solution [2]). Let P = C, Y be a soft CSP and let c1 and c2 be two constraints in C such that supp(c1 ) ⊂ supp(c2 ). Then, the solution of P coincides after and before the application of CR(c1 , c2 ). The result above boils down to show that the value of the combination c1 ⊗ c2 coincides after and before the application of CR(c1 , c2 ). Example 3. Figure 2.a) shows the soft CSP we chose as our running example. Assuming the Kw semiring, the problem is not NC. It can be made NC by applying the local consistency rule twice, to c0 , cx and c0 , cy , which increases c0 and decreases the unary constraints. The resulting soft CSP is depicted in Figure 2.b). However, it is not AC. It can be made AC by applying the local consistency rule to cx , cxy . The resulting soft CSP is depicted in Figure 2.c). This problem is more explicit than the original one. In particular, the zero-arity constraint c0 contains the best level of consistency.
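The following is a small Python sketch (not the authors' code) of one application of the local consistency rule CR(c1, c2) of Definition 9 on the weighted semiring Kw, where the aggregation is addition, the projection is a minimum, and the division is subtraction. The constraint values used here are invented for illustration.

INF = float('inf')
D = ['a', 'b']

c_x  = {'a': 0, 'b': 1}                               # unary constraint on x
c_xy = {('a', 'a'): 3, ('a', 'b'): 5,
        ('b', 'a'): 0, ('b', 'b'): 2}                 # binary constraint on (x, y)

def CR(c1, c2):
    """Aggregate to c1 the projection of c2, then compensate in c2."""
    proj = {a: min(c2[(a, b)] for b in D) for a in D}  # c2 projected on x
    c1_new = {a: c1[a] + proj[a] for a in D}           # c1 := c1 combined with the projection
    c2_new = {(a, b): c2[(a, b)] - proj[a]             # c2 := c2 divided by the projection
              for (a, b) in c2}
    return c1_new, c2_new

c_x, c_xy = CR(c_x, c_xy)
print(c_x)    # {'a': 3, 'b': 1}
print(c_xy)   # every row now contains a 0, i.e. x has become AC with respect to c_xy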
3 Defining an LCD-Based Semiring Transformation Many absorptive semirings are not invertible: see e.g. [2] for some references. Nevertheless, we would like to apply some kind of consistency rules also to these cases. The aim of the section is to show how to distill from an absorptive semiring K a novel semiring that is both absorptive and invertible. We introduce the construction of a structure LCD(K), we prove it to be a semiring, and we state a conservativity result, namely, that K and LCD(K) coincide, if K is invertible. Technically, we exploit a notion of least common divisor: for any two elements of a semiring, we consider the set of common divisors, and we assume the existence of the minimum among such divisors: intuitively, these are the “worst” elements according to the ordering on the semiring. Definition 10 (common divisor). Let K be an absorptive semiring and a, b ∈ A. The divisors of a is the set Div(a) = {c | ∃d.c × d = a}; the set of common divisors of a and b is the set CD(a, b) = Div(a) ∩ Div(b). Clearly, the set CD(a, b) is never empty, since it contains at least 1. Note also that we could extend our definition to any finite set E of elements, simply as CD(E) = a∈E Div(a). Moreover, absorptivity implies that for all elements y ∈ CD(a, b) we have a + b ≤ y, which motivates our terminology below3 . Definition 11 (least common divisor). Let K be an absorptive semiring and a, b ∈ A. A least common divisor of a and b is any x ∈ CD(a, b) such that CD(a, b) ⊆ Div(x). The “least” component of our terminology is motivated by the proposition below. Proposition 2 (uniqueness). Let K be an absorptive semiring and a, b ∈ A. If x is a least common divisor of a and b, then it is also the minimum of CD(a, b). Thus, whenever it exists, the least common divisor of a and b is unique, and it coincides with the minimum of CD(a, b): we denote it with LCD(a, b). Note that the reverse may not hold, since the minimum of CD(a, b) might not be a common divisor. In the following, an absorptive semiring has least common divisors if the LCD operator is defined for each pair of elements of the semiring. Proposition 3 (associativity). Let K be an absorptive semiring with least common divisors. Then, the LCD operator is associative, that is, LCD(a, LCD(b, c)) = LCD (LCD(a, b), c) for all elements a, b, c ∈ A. In other terms, associativity implies that for any finite set E of elements of the semiring the intuitive notion of LCD(E) can be defined by decomposing its calculation into a sequence of applications of the binary operator, performed in any order. Note now that a × LCD(c, d) is actually a divisor of LCD(a × b, a × c). In the following, a semiring has distributive least common divisors if the reverse also hold. 3
3 Notice that if y ∈ CD(a, b), then there exist k1, k2 such that y × k1 = a and y × k2 = b. This means that a ≤ y and b ≤ y, hence the claim a + b ≤ y holds.
Definition 12 (distributivity). Let K be an absorptive semiring with least common divisors. Then, its has distributive least common divisors if the LCD operator is distributive, that is, a × LCD(b, c) = LCD(a × b, a × c) for all elements a, b, c ∈ A. We now prove the main result of this section, i.e., the existence of a semiring whose sum operator is based on the binary operator LCD. This semiring is used by the consistency algorithm in Section 4. Theorem 1 (LCD-based semiring). Let K be an absorptive semiring with distributive least common divisors. Then, the tuple LCD(K) = A, LCD, ×, 0, 1 is an absorptive and invertible semiring. Proof. We first prove that LCD(K) is an absorptive semiring. So, let us check that A, LCD, 0 is a commutative monoid. This boils down to prove the associativity and commutativity of the LCD operator (which are obvious, the former due to Proposition 3 above), and that LCD(a, 0) = 0: since LCD(a, 0) coincides with Div(a), and clearly the minimum of the latter is a, the required law holds. The absorptivity of 1 is obvious, since Div(1) = 1. Finally, distributivity amounts to the requirement for least common divisors to be also distributive. Let us now consider invertibility: by definition, a ≤LCD b implies that LCD(a, b) = b, hence b ∈ Div(a). As a final result, we need to check what is the outcome of the application of the LCD construction to an already invertible semiring. Proposition 4 (conservativity). Let K be an absorptive and invertible semiring. Then, K has distributive least common divisors and LCD(K) and K are the same semiring. The proof is immediate: note that if K is invertible, then by definition a+b ∈ CD(a, b), and since a + b is the least upper bound of a and b, it necessarily is the minimum of CD(a, b). So, LCD(a, b) = a + b, and the proposition follows. Example 4. Recall the non-invertible semiring Kb = N+ ∪ {∞}, min, ×, ∞, 1. By definition, the least common divisor of a and b is LCD(a, b) = max{c | ∃d.c × d = a, ∃e.c × e = b}, corresponding to the arithmetic notion of greatest common divisor. The LCD transformation of Kb is LCD(Kb ) = N+ ∪{∞}, LCD(a, b), ×, ∞, 1. The partial order induced by LCD(Kb ) is a ≤ b if LCD(a, b) = b, i.e., if b is a divisor of a. Finally, the division operator of LCD(Kb ) is the usual arithmetic division.
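As a small illustration of Example 4 (a sketch, not part of the paper), the LCD operator of Kb can be realized with the ordinary greatest common divisor, which makes division possible exactly where Kb itself is not invertible. The helper name lcd and the handling of the infinite element are assumptions of this sketch.

from math import gcd

def lcd(a, b):
    """Least common divisor in Kb (Example 4): the arithmetic gcd."""
    if a == float('inf'):
        return b
    if b == float('inf'):
        return a
    return gcd(a, b)

# Kb itself is not invertible: 9 <= 6 in the induced order (min(9, 6) = 6),
# yet there is no c in N+ with 6 * c = 9.
a, b = 9, 6
print(min(a, b) == b)                              # True: 9 <= 6 in Kb
print(any(b * c == a for c in range(1, a + 1)))    # False: 6 does not divide 9

# In LCD(Kb) the sum is lcd, so a <= b now means "b divides a",
# and the division operator becomes ordinary integer division.
print(lcd(a, b))                                   # 3
print(a % lcd(a, b) == 0 and b % lcd(a, b) == 0)   # True: 3 divides both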
4 LCD-Based Local Consistency This section generalizes local consistencies to soft CSPs with non invertible semirings. The basic idea is to replace the + operator by the LCD operator in the original definition of the local consistency rule. The value represented by the LCD corresponds to the amount we can “safely move” from the binary constraint to the unary one in order to enforce consistency. This intuition can be also exploited for non invertible semirings. Definition 13 (LCD node/arc consistency). Let P = C, {x, y} be a binary soft CSP defined over a semiring K with distributive least common divisors.
LCD Node consistency (LCD-NC). A value a ∈ Dx is LCD-NC if c0 × cx (a) > 0. A variable x is LCD-NC if all a ∈ Dx are NC and LCD({cx (a) | a ∈ Dx }) = 1. P is LCD-NC if every variable is LCD-NC. LCD Arc consistency (LCD-AC). A value a ∈ Dx is LCD-AC with respect to cxy if LCD({cxy (a, b) | b ∈ Dy }) = 1. A variable x is LCD-AC if all a ∈ Dx are LCD-AC with respect to every binary constraint such that x is in its support. P is LCD-AC if every variable is LCD-AC and LCD-NC. In other terms, LCD consistency is verified by checking the desired properties with respect to the semiring LCD. Definition 14 (LCD local consistency rule). Let P = C, Y be a soft CSP defined over a semiring K with distributive least common divisors. A LCD local consistency rule involving c1 and c2 , noted LCD-CR(c1 , c2 ), is like a local consistency rule in Definition 9 where all operations (i.e, combination, projection and division) are instead performed using the LCD(K) semiring. In other terms, we are considering the problem P as if it were defined over the LCD(K) semiring: this is formally obtained by exploiting the obvious inclusion of K into LCD (K). Note that the solution of the derived problem over the semiring LCD(K) might be different from the one for the original problem over K, and Proposition 1 only ensures us the LCD consistency rule preserves the former solution. However, since the combination operator is actually preserved by the construction of LCD(K), we may infer that the its application of the LCD consistency rules preserves the solution also for the original problem, i.e., over the semiring K. Theorem 2 (preserving solution). Let P = C, Y be a soft CSP and let c1 and c2 be two constraints in C such that supp(c1 ) ⊂ supp(c2 ). Then, the solution of P coincides after and before the application of LCD-CR(c1 , c2 ). Proof. The theorem is shown by noting that the choice of the actual value for the divisor of c2 in the application of the local consistency rule is irrelevant, as long as it is a divisor: this guarantees that the value of the combination c1 ⊗ c2 is the same after and before the application of the rule. And the LCD construction ensures that for any absorptive semiring K the result of the projection c2 ⇓Z in LCD(K) is a divisor of c2 in K. Note that, as we already wrote, the solution of a soft CSP over K may differ from the one over LCD(K): indeed, the result of the projection c2 ⇓Z is different, depending on which semiring it is performed. Nevertheless, the theorem holds since the value of the combination c1 ⊗ c2 is the same in both semirings. Example 5. Figure 3.a) shows our running example. Assuming the Kb semiring, the problem is not LCD-NC. It can be made LCD-NC by applying the LCD consistency rule twice, to c0 , cx and c0 , cy . The resulting soft CSP is depicted in Figure 3.b). Note that it is LCD-NC but not NC. It is not LCD-AC and can be made so by applying the LCD consistency rule twice, first to cx , cxy (producing Figure 3.c)) and then to cy , cxy (producing Figure 3.d)). The resulting problem is not LCD-NC (due to variable y). It can be made LCD-NC and LCD-AC by applying the LCD consistency rule to c0 , cy , obtaining the problem in Figure 3.e). This problem is more explicit than the original one. In particular, the zero-arity constraint c0 contains the best level of consistency.
The above LCD node and arc consistency properties can be enforced by applying adhoc LCD consistency rules. Figure 4 shows a LCD-AC enforcing algorithm. It is based on the AC algorithm of [11]. For simplicity, we assume that no empty domain is produced. It uses two auxiliary functions: P runeV ar(x) prunes not LCD-NC values in Dx and returns true if the domain is changed; LCD-CR(c1 , c2 ) iteratively applies the LCD consistency rule to c1 and c2 until reaching a fixed point. Note that if the original semiring K is invertible, LCD-CR(c1 , c2 ) iterates only once. The main procedure uses a queue Q containing those variables that may not be LCD-AC. Q should be initialized with all the variables to check at least once their consistency.
Fig. 3. Five equivalent soft CSP instances (for semiring Kb). The first one is neither LCD-NC nor LCD-AC; the second one is not yet LCD-AC; the last one is both LCD-NC and LCD-AC.

function PruneVar(x)
  change := false;
  for each a ∈ Dx do
    if cx(a) × c0 = 0 then
      Dx := Dx − {a};
      change := true;
  return change;

procedure LCD-AC(P, V)
  Q := V;
  while (Q ≠ ∅) do
    y := pop(Q);
    for each cxy ∈ C do LCD-CR(cx, cxy);
    for each x ∈ V do LCD-CR(c0, cx);
    for each x ∈ V do
      if PruneVar(x) then Q := Q ∪ {x};

Fig. 4. Algorithm LCD-AC
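The following Python sketch mirrors the LCD-AC loop of Figure 4 for the semiring Kb, using the gcd as the LCD operator. It is a simplified illustration under stated assumptions: the problem is a fixed two-variable instance, pruning is omitted, and the data structures and names (c0, c_u, c_b, lcd_cr_unary, lcd_cr_zero) are not the authors'.

from math import gcd
from functools import reduce

D = ['a', 'b']
c0  = 1                                                 # zero-arity constraint
c_u = {'x': {'a': 4, 'b': 6}, 'y': {'a': 3, 'b': 9}}    # unary constraints
c_b = {('x', 'y'): {('a', 'a'): 10, ('a', 'b'): 4,
                    ('b', 'a'): 2,  ('b', 'b'): 6}}     # binary constraints

def lcd(values):
    return reduce(gcd, values)

def lcd_cr_unary(x, y):
    """LCD-CR(cx, cxy): move the per-value LCD of the binary rows into cx."""
    for a in D:
        g = lcd([c_b[(x, y)][(a, b)] for b in D])
        c_u[x][a] *= g                                  # aggregate (the product of Kb is ×)
        for b in D:
            c_b[(x, y)][(a, b)] //= g                   # compensate (division)

def lcd_cr_zero(x):
    """LCD-CR(c0, cx): move the LCD of the unary values into c0."""
    global c0
    g = lcd([c_u[x][a] for a in D])
    c0 *= g
    for a in D:
        c_u[x][a] //= g

queue = ['x', 'y']
while queue:                                            # main loop of LCD-AC(P, V)
    y = queue.pop()
    for (u, v) in list(c_b):
        if v == y:
            lcd_cr_unary(u, v)
    for x in ('x', 'y'):
        lcd_cr_zero(x)
    # PruneVar is omitted: no value becomes inconsistent in this small example.

print(c0, c_u, c_b)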
Proposition 5 (LCD-AC time complexity). The time complexity of LCD-AC(P, V ) is O(e × d2 × |LCD|), where e is the number of constraints, d the maximum domain size, and |LCD| the cost to compute the least common divisor. However, the complexity of AC(P,V) may actually be quite smaller. In fact, note that there are many semiring instances such that the iteration due to the application of local consistency immediately terminates after one round. In particular, this is true for cancellative semirings (∀a, b, c ∈ A.a × c = b × c ∧ c = 0 =⇒ a = b) such as Kw and Kb : indeed, if K is cancellative then so is LCD(K) (thus it is uniquely invertible [2]) and for any a, b ∈ K, LCD(a ÷ LCD(a, b), b ÷ LCD(a, b)) = 1 holds.
5 Conclusions and Further Work We presented a technique for transforming any semiring into a novel one, whose sum operator corresponds to the LCD of the set of preferences. This new semiring can be cast inside local consistency algorithms and allows us to extend their use to problems dealing with non-invertible semirings. A noticeable application case is represented by multi-criteria soft CSPs, where the (Hoare power-domain of the) cartesian product of semirings represents the set of partially ordered solutions [3]. As a future application we plan to extend existing libraries on crisp constraints in order to deal also with propagation for soft constraints, as proposed in this paper. Moreover, we would like to develop “imprecise” algorithms following the ideas in [10], and focusing on multi-criteria problems.
References 1. Bistarelli, S.: Semirings for Soft Constraint Solving and Programming. LNCS, vol. 2962. Springer, Heidelberg (2004) 2. Bistarelli, S., Gadducci, F.: Enhancing constraints manipulation in semiring-based formalisms. In: Brewka, G., Coradeschi, S., Perini, A., Traverso, P. (eds.) ECAI 2006. Frontiers in Artif. Intell. and Applications, vol. 141, pp. 63–67. IOS Press, Amsterdam (2006) 3. Bistarelli, S., Gadducci, F., Larrosa, J., Rollon, E.: A soft approach to multi-objective optimization. In: de la Banda, M.G., Pontelli, E. (eds.) ICLP 2008. LNCS, vol. 5366, pp. 764– 768. Springer, Heidelberg (2008) 4. Bistarelli, S., Montanari, U., Rossi, F.: Semiring-based constraint satisfaction and optimization. Journal of ACM 44(2), 201–236 (1997) 5. Bistarelli, S., Pini, M.S., Rossi, F., Venable, K.B.: Bipolar preference problems: Framework, properties and solving techniques. In: Azevedo, F., Barahona, P., Fages, F., Rossi, F. (eds.) CSCLP. LNCS (LNAI), vol. 4651, pp. 78–92. Springer, Heidelberg (2007) 6. Cooper, M.: High-order consistency in valued constraint satisfaction. Constraints 10(3), 283– 305 (2005) 7. Cooper, M., Schiex, T.: Arc consistency for soft constraints. Artificial Intelligence 154(1-2), 199–227 (2004) 8. Dechter, R.: Constraint Processing. Morgan Kaufmann, San Francisco (2003) 9. Larrosa, J.: Node and arc consistency in weighted CSP. In: AAAI/IAAI 2002, pp. 48–53. AAAI Press, Menlo Park (2002)
10. Larrosa, J., Meseguer, P.: Exploiting the use of DAC in MAX-CSP. In: Freuder, E.C. (ed.) CP 1996. LNCS, vol. 1118, pp. 308–322. Springer, Heidelberg (1996) 11. Larrosa, J., Schiex, T.: Solving weighted CSP by maintaining arc consistency. Artificial Intelligence 159(1-2), 1–26 (2004) 12. Mackworth, A.K.: Consistency in networks of relations. Artificial Intelligence 8(1), 99–118 (1977) 13. Rossi, F., van Beek, P., Walsh, T.: Handbook of Constraint Programming. Foundations of Artificial Intelligence. Elsevier (2006)
Frequency Transition Based Upon Dynamic Consensus for a Distributed System Oscar A. Esquivel Flores1 and Héctor Benítez Pérez2 1
Posgrado en Ciencia e Ingeniería de la Computación Universidad Nacional Autónoma de México, México D.F.
[email protected] 2 Departamento de Ingeniería de Sistemas Computacionales y Automatización IIMAS, Universidad Nacional Autónoma de México Apdo. Postal 20-726. Del. A. Obregón, México D.F. C.P. 01000, México Tel.: ++52.55.56.22.32.12; Fax: ++52.55.56.16.01.76
[email protected]
Abstract. This paper provides a strategy to schedule a real time distributed system. Modifications on frequency transmission (task periods) of system’s individual components impact on system quality performance. In this work the authors propose a dynamic linear time invariant model based upon frequency transmission of agents in a distributed system and using LQR control approach to bring the system into a nonlinear region. Numerical simulations show the effectiveness of the LQR feedback control law to modify agent’s frequency transmission. Keywords: Distributed systems, Frequency transmission, Consensus, Control, Agents.
1 Introduction At the present time distributed systems are widely used in industry and research. These systems fulfill critical-mission and long-running applications; some characteristics of distributed systems are the capacity to maintain consistency and to recover without suspending their execution. These systems should meet time restrictions, coherence, adaptability and stability requirements, among others. A current application of distributed systems under time restrictions are Networked Control Systems (NCS), whose implementation consists of several nodes doing some part of the control process; sensor-actuator activities work under a real-time operating system and a real-time communication network. In order to achieve the objectives of all tasks performed, it is necessary for all agents of the system to exchange their own information through the communication media properly. Therefore the communication mechanism plays an important role in the stability and performance of a control system implemented over a communication network [1].
1.1 Sampling and Transmission Rates Network scheduling is a priority in the design of a NCS when a group of agents is linked through the available network resources. If there is no coordination among agents, data transmissions may occur simultaneously and someone has to back off to avoid collisions or bandwidth violations. This results in some real-time transmissions being delayed or even missing their deadlines. Therefore a scheduling control algorithm that minimizes this loss of system performance is necessary [2]. Nevertheless, there is no global scheduler that guarantees an optimal performance [3], mainly because the communication network introduces several issues that are not easy to deal with properly. Lian et al. [4],[5] have designed some methodologies for networked nodes (agents of the system) to generate proper control actions and utilize communication bandwidth optimally. The transmission frequency can be obtained through the sampling rate: the task period p performed by an agent defines the transmission frequency, that is, f = 1/p. Figure 1 shows that the effectiveness of networked control systems depends on the sampling rate: the region in which the control performance is acceptable is delimited by two points b and c, associated to the sampling rates fb and fc respectively, which can be determined from the characteristics and statistics of network-induced delays and device processing time delays. fb implies that small sampling periods could be necessary to guarantee a certain level of control performance; as the sampling rate gets faster than fc, the network traffic load becomes heavier, the possibility of more contention time or data loss increases in a bandwidth-limited network, and longer time delays result.
Fig. 1. Networked control system performance
Hence it’s very important to considerate either sampling periods or frequency transmission to obtain better system performance.
1.2 Consensus Other hand, the basic idea of a consensus algorithm is to impose similar dynamics on the information states of each agent involved in a dynamical system [6]. In networks of agents consensus means to reach an agreement regarding a certain quantity of interest that depends on the state of all agents. A consensus algorithm is an interaction rule that specifies the information exchange between an agent and all of its neighbors on the network. Recently several problems related to multi-agent networked systems with close ties to consensus problems have got an important interest. Olfati-Saber et al. in [7] present an overview of the key results on theory and applications of consensus problems in networked systems which include control theoretic methods for convergence and performance analysis of consensus protocols. Hayashi et al. [8] propose a fair quality service control with agent-based controller where each agent manages an allocated resource and a quality service level of several tasks working on a real time system, considers an application of typical consensus problem to fair quality service control in soft real time systems. The states of the system are resources allocated to a task such as CPU utilization, network bandwidth, and memory size, while the performance value of each agent is characterized by the performance function. This paper shows a way to control the frequency of transmission among agents in a NCS based on their frequency transmission relations. We propose a lineal time invariant model in which the coefficients of the state matrix are the relations between the frequencies of each node and we use a LQR feedback controller that modifies transmission frequencies bounded between maximum and minimum values of transmission in which ensures the system’s schedulability. The rest of this paper is organized as follow, section 2 shows a frequency transmission model and a proposal to matrix coefficients of the model, section 3 presents a particular networked control system as a case of study, section 3 shows numerical simulations of the model presented and performance of LQR controller. Brief conclusions are presented at the end.
2 Frequency Model Let a distributed system with n nodes or agents that perform one task ti with period pi and consumption ci each, for i = 1, 2, ..., n. The distributed system dynamics can be modeled as a linear time-invariant system whose state variables x1, x2, ..., xn are the transmission frequencies fi = 1/pi of the n nodes involved in it. We assume there is a relationship between the frequencies f1, f2, ..., fn and external input frequencies u1, u2, ..., un which serve as coefficients of the linear system:
ẋ = Ax + Bu
y = Cx                (1)
A ∈ ℜnxn is the matrix of relationships between frequencies of the nodes, B ∈ℜnxn is the scale frequencies matrix, C ∈ℜnxn is the matrix with frequencies ordered, x ∈ℜn is a real frequencies vector, y ∈ℜn is the vector of output frequencies. The input
u = h(r − x) ∈ ℜn is a function of the reference frequencies and the real frequencies of the nodes in the distributed system. It is important to note that the relations between the frequencies of the n nodes lead the system (1) to be schedulable with respect to the use of processors, that is, U = ∑ ci / pi, i = 1, …, n. Therefore it is possible to control the system through the input vector u such that the outputs y are in a non-linear region L where the system is schedulable. This means that during the time evolution of the system (1) the output frequencies can be stabilized by a controller within the schedulability region L. This region could be unique or a set of subregions Li in which each yi converges, defined by:
β1 ≤ y1 ≤ α1, β2 ≤ y2 ≤ α2, …, βn ≤ yn ≤ αn. However, it is not necessary that αn ≤ αn−1 ≤ … ≤ α2 ≤ α1 or that βn ≤ βn−1 ≤ … ≤ β2 ≤ β1. Each αi and βi corresponds, respectively, to minimum and maximum frequencies for the node
ni which vary according to particular case study. Figure 2 shows the dynamics of the frequency system and the desired effect by controlling it through a LQR controller and defining a common region L for a set of frequencies. Each node of the system starts with a frequency fi and the LQR controller modifies the period pi = 1 of each fi task into a schedulable region L .The real frequency fi of the node ni is modified to fi ' , it means that pi in the time t0 changes to pi' at time t1 to converge in a region where the system performance is close to optimal.
Fig. 2. Frequencies controlled by a LQR controller into a schedulability region
Figure 3 shows a time diagram of the system dynamics and the desired effect of the LQR controller modifications that set the task periods into the region L.
Fig. 3. Task periods controlled by a LQR controller into a schedulability region
The objective of controlling the frequency is to achieve coordination through the convergence of values. 2.1 Matrix Coefficients Proposal Let aij ∈ A be given by a function of the minimal frequencies fm of node i, and bij ∈ B be given by a function of the maximal frequencies fx:
aij = ϕ(fm1, fm2, ..., fmn) = ϕ(fm),        bij = ψ(fx1, fx2, ..., fxn) = ψ(fx)
The control input is given by a function of the minimal frequencies and the real frequencies of node i :
u = h(r − x) = k(fm − fr)                (2)
where fm and fr are the vectors
fm = [fm1, fm2, ..., fmn]T,   fr = [fr1, fr2, ..., frn]T
Then, the system (1) can be written as:
ẋ = Ax + Bu = A fr + B(k(fm − fr))                (3)
ẋ = A fr + B k fm − B k fr
ẋ = (A − Bk) fr + B k fm
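As a worked illustration of the closed-loop form of equation (3), the following numpy sketch (not from the paper) builds the matrix A − Bk for placeholder values of A, for B chosen as a diagonal of inverse maximum frequencies, and for a diagonal gain k, and then inspects the open-loop and closed-loop eigenvalues. All numerical values here are arbitrary assumptions used only to show the construction.

import numpy as np

A = np.array([[0.1, 0.8, 0.3, 0.6],
              [1.1, 0.2, 0.3, 0.8],
              [0.9, 0.7, 0.5, 0.4],
              [0.6, 0.5, 0.4, 0.2]])              # placeholder relation matrix
B = np.diag(1.0 / np.array([65.0, 55.0, 55.0, 45.0]))   # placeholder 1/fx entries
k = np.diag([80.0, 70.0, 60.0, 50.0])             # placeholder feedback gains

A_cl = A - B @ k                                  # closed-loop matrix of equation (3)
print(np.linalg.eigvals(A))                       # open-loop eigenvalues
print(np.linalg.eigvals(A_cl))                    # the gains stabilize the loop only if all real parts are negative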
3 Case of Study The authors consider a distributed system that performs closed-loop control based upon sensor, controller and actuator agents and a centralized scheduler. Figure 4 shows the networked control system, which consists of 8 processors with a real-time kernel, connected through a CSMA/AMP (CAN) network with a data rate of 10,000,000 bits/s and not prone to data loss.
Fig. 4. Networked Control System
These blocks of real-time kernel and network are simulated using TrueTime [9],[10]. The first agent in the model, on the extreme left, is the controller agent, which uses the values from the sensors and calculates the control outputs. Sensor agents sample the analog signals. Two actuator agents, located to the far right below, receive the signals. Finally the scheduler, the main agent, above at the far right node, organizes the activity of the other 7 agents and is responsible for the periodic allocation of the bandwidth used by the other agents. Focusing on the sensor agents, which use a common communication medium and perform closed-loop control, each one has a real transmission frequency fr and sets the minimum frequency fm and maximum frequency fx between which each node could transmit. The regions L1, L2, L3, L4 of L as a whole should meet the following restriction
U = ∑ ci / pi,   i = 1, …, n
where ci is the packet communication time, whose value is 2 ms on average for Ethernet.
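A small illustrative check (not in the paper) of this utilization restriction is sketched below, assuming the classical bound U ≤ 1, a 2 ms communication time for every message, and the real frequencies later reported in Table 1; all of these are assumptions of the sketch.

c = 0.002                                   # packet communication time in seconds
f_r = [53.0, 52.0, 33.0, 38.0]              # real transmission frequencies in Hz
U = sum(c * f for f in f_r)                 # ci / pi equals ci * fi
print(U)                                    # 0.352, well below 1 under the assumed bound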
Elements of the matrices for system (1) are defined as follows:
aij = λ(fm1, fm2, fm3, fm4) / fmi   if i = j,        aij = fmj / fmi   if i ≠ j
bij = 1 / fxi   if i = j,            bij = 0   if i ≠ j
cij = 1   if i = j,                  cij = 0   if i ≠ j
λ ( f m1 , f m2 , f m3 , f m4 ) is the greatest common divisor of the minimum frequencies. Due to:
fm = [fm1, fm2, ..., fmn]T,   fr = [fr1, fr2, ..., frn]T,   fx = [fx1, fx2, ..., fxn]T
and also x = fr and u = k(fm − fr).
ª x1 º «x » « 2» « x3 » « » ¬ x4 ¼
⎡ ⎢ ⎢ ⎢ ⎢ + ⎢ ⎢ ⎢ ⎢ ⎢ ⎣
ª O f m1 , f m2 , f m3 , f m4 « f m1 « f m1 « « f m2 « f m1 « « f m3 « f m1 « «¬ f m4
f m2 f m1 O f m1 , f m2 , f m3 , f m4 f m2 f m2 f m3 f m2 f m4
1 f x1
0
0
0
0
1 f x2
0
0
0
0
1 f x3
0
0
1 f x4
0
0
⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦
⎡k1 ⎢ 0 ⎢ ⎢ 0 ⎢ ⎣ 0
f m3 f m1 f m3 f m2 1 2 3 4 O fm , fm , fm , fm f m3 f m3 f m4
0
0
k2 0
0 k3
0
0
f m4 f m1 f m4 f m2 f m4 f m3 O f m1 , f m2 , f m3 , f m4 f m4
º » » 1 »ª fr º »« f 2 » » « r3 » »« fr » »« f 4 » » «¬ r »¼ » »¼
0 ⎤⎛ ⎡ ⎜ ⎢ 0 ⎥⎥ ⎜ ⎢ ⎜ 0 ⎥⎜ ⎢ ⎥⎜ ⎢ k 4 ⎦ ⎝ ⎣⎢
⎡ f r1 f m1 ⎤ ⎥ ⎢ 2 f m2 ⎥ ⎢ fr − ⎢ f r3 f m3 ⎥ ⎢ 4 4 ⎥ f m ⎦⎥ ⎢⎣ f r
⎤ ⎥ ⎥ ⎥ ⎥ ⎦⎥
⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠
[y1]   [1 0 0 0] [fr1]
[y2] = [0 1 0 0] [fr2]
[y3]   [0 0 1 0] [fr3]
[y4]   [0 0 0 1] [fr4]
that is:
[ẋ1]   [ λ(fm)/fm1   fm2/fm1    fm3/fm1    fm4/fm1  ] [fr1]   [ k1 (fm1 − fr1) / fx1 ]
[ẋ2] = [ fm1/fm2     λ(fm)/fm2  fm3/fm2    fm4/fm2  ] [fr2] + [ k2 (fm2 − fr2) / fx2 ]
[ẋ3]   [ fm1/fm3     fm2/fm3    λ(fm)/fm3  fm4/fm3  ] [fr3]   [ k3 (fm3 − fr3) / fx3 ]
[ẋ4]   [ fm1/fm4     fm2/fm4    fm3/fm4    λ(fm)/fm4] [fr4]   [ k4 (fm4 − fr4) / fx4 ]

[y1]   [1 0 0 0] [fr1]
[y2] = [0 1 0 0] [fr2]
[y3]   [0 0 1 0] [fr3]
[y4]   [0 0 0 1] [fr4]
4 Numerical Simulation We performed numerical simulations of the system (1) without control and with the LQR controller for the values of maximum, minimum and real frequencies listed below.
Table 1. Maximum, minimum, and real frequencies
Node   Max freq.   Min. freq.   Real freq.
1      65          40           53
2      55          35           52
3      55          15           33
4      45          30           38
thus
A = [ 0.1250  0.7500  0.2500  0.6250
      1.3333  0.1667  0.3333  0.8333
      4.0000  3.0000  0.5000  2.5000
      1.6000  1.2000  0.4000  0.2000 ]
and eigenvalues
λ1 = 3.1937, λ2 = −0.7149, λ3 = −0.8676, λ4 = −0.8435. The system is unstable. 4.1 LQR Control We chose the weight matrices Q, R ∈ ℜ4×4 as follows:
Q = [ 10   0   0   0
       0  10   0   0
       0   0  10   0
       0   0   0  10 ]

R = [ 1  0  0  0
      0  1  0  0
      0  0  1  0
      0  0  0  1 ]
The gain K ∈ ℜ4×4 and Ac = (A − BK) ∈ ℜ4×4 are:
K = [ 118.90  104.41   47.01   90.00
      123.39  108.54   48.82   93.47
       55.56   48.82   22.11   42.08
      130.01  114.25   51.44   98.62 ]

Ac = [ −1.70  −0.73  −0.34  −0.63
       −1.10  −1.83  −0.45  −0.84
        1.65   1.44  −0.06   1.23
       −1.55  −1.37  −0.64  −2.02 ]

and the eigenvalues of Ac are
λ1 = −3.1937, λ2 = −0.7199, λ3 = −0.8690, λ4 = −0.8460
Figure 5 shows the dynamics of the controlled system.
Fig. 5. Frequency response controlled by a LQR controller
The LQR controller could modify frequencies into the limits defined by minimal and maximal frequencies.
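The Section 4 computation can be sketched in a few lines of Python. The code below is a reconstruction for illustration only, not the authors' code: it assumes scipy is available, builds A and B from the minimum and maximum frequencies of Table 1 according to the coefficient proposal of Section 3, and solves the LQR problem with Q = 10·I and R = I. Because the data source and rounding differ, the resulting gains need not match the reported K and Ac exactly.

import numpy as np
from scipy.linalg import solve_continuous_are

f_m = np.array([40.0, 35.0, 15.0, 30.0])     # minimum frequencies (Table 1)
f_x = np.array([65.0, 55.0, 55.0, 45.0])     # maximum frequencies (Table 1)

lam = np.gcd.reduce(f_m.astype(int))         # greatest common divisor of the minimum frequencies
A = np.array([[f_m[j] / f_m[i] for j in range(4)] for i in range(4)])
np.fill_diagonal(A, lam / f_m)               # a_ii = lambda(f_m) / f_m^i
B = np.diag(1.0 / f_x)                       # b_ii = 1 / f_x^i

Q = 10.0 * np.eye(4)
R = np.eye(4)
P = solve_continuous_are(A, B, Q, R)         # Riccati solution
K = np.linalg.solve(R, B.T @ P)              # LQR gain
Ac = A - B @ K

print(np.round(np.linalg.eigvals(A), 4))     # open-loop eigenvalues (at least one is positive)
print(np.round(np.linalg.eigvals(Ac), 4))    # closed-loop eigenvalues, all with negative real part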
5 Conclusions In this work we have presented a linear time-invariant model of the transmission frequencies of the nodes involved in a distributed system. The significance of controlling the frequencies stems from the system schedulability. The key feature of the LQR control approach is a simple design with good robustness and performance capabilities, with which the frequencies are easily modified. We have shown via numerical simulations the performance of the proposed control scheme.
References 1. Lian, F., Moyne, J., Tilbury, D.: Network architecture and communication modules for guaranteeing acceptable control and communication performance for networked multiagent systems. IEEE Transactions on Industrial Informatics 2(1) (2006) 2. Branicky, M.S., Liberatore, V., Phillips, S.M.: Networked control system co-simulation for co-design. In: Proc. American Control Conf., Denver, June 4-6, vol. 4, pp. 3341–3346 (2003) 3. Menéndez, A., Benitez, H.: An interaction amongst real time distributed systems performance & global scheduling. JART 8(2) (August 2010) 4. Lian, F., Moyne, J., Tilbury, D.: Time delay modeling and sample time selection for networked control systems. In: Proceedings of ASME-DSC, New York, USA, vol. XX (2001) 5. Lian, F., Moyne, J., Tilbury, D.: Network design considerations for distributed networked for distributed control systems. IEEE Transactions on Control Systems Technology 10(2) (2002)
6. Ren, W., Beard, R.W., Atkins, E.M.: Information consensus in Multivehicle cooperative control. IEEE Contol Systems Magazine 27(2) (2007) 7. Olfati-Saber, R., Fax, J.A., Murray, R.M.: Consensus and Cooperation in Networked Multi-Agent Systems. Proceedings of the IEEE 95(1) (2007) 8. Hayashi, N., Ushio, T.: Application of A consensus Problem to Fair Multi-resource Allocation in Real-time Systems. In: Proceedings of the 47th IEEE Conference on Decision and Control, México (2008) 9. Cervin, A., Henriksson, D., Lincoln, B., Eker, J., Arzen, K.: How does control timing affect performance? Control Systems Magazine 23(3) (2003) 10. Ohlin, M., Henriksson, D., Cervin, A.: TrueTime 1.5 Reference Manual, Department of Automatica Control, Lund University (2007)
Towards Ubiquitous Acquisition and Processing of Gait Parameters Irvin Hussein L´opez-Nava and Ang´elica Mu˜ noz-Mel´endez National Institute for Astrophysics, Optics and Electronics Luis Enrique Erro # 1, Tonantzintla, Puebla, Mexico {hussein,munoz}@inaoep.mx http://www.inaoep.mx
Abstract. Gait analysis is the process of measuring and evaluating gait and walking spatio-temporal patterns, namely of human locomotion. This process is usually performed on specialized equipment that is capable of acquiring extensive data and providing a gait analysis assessment based on reference values. Based on gait assessments, therapists and physicians can prescribe medications and provide physical therapy rehabilitation to patients with gait problems. This work is oriented to support the design of ambulatory and ubiquitous technologies for gait monitoring. A probabilistic method to automatically detect human strides from raw signals provided by wireless accelerometers is presented. Local thresholds are extracted from raw acceleration signals, and used to distinguish actual strides from characteristic peaks commonly produced by significant shifts of the acceleration signals. Then, a bayesian classifier is trained with these peaks to detect and count strides. The proposed method has a good precision for classifying strides of raw acceleration signals for both, young and elderly individuals. Strides detection is required to calculate gait parameters and provide a clinical assessment. Keywords: Gait parameters processing, wireless triaxial accelerometer, characteristic peaks detector, bayesian classifier.
1
Introduction
Gait analysis is the process of measuring and evaluating gait and walking spatiotemporal patterns, namely of human locomotion. Gait analysis is important because human gait may reflect early warning signs of senile dementia and Alzheimer’s disease [1], for instance. These signs are commonly associated with significant differences in gait parameters such as stride and step length, cadence, and base support. Gait analysis technology includes specialized systems for 2D or 3D motion capture and motion tracking, such as multi-camera systems, tactile ground surface indicators, and force platforms. These systems are capable of acquiring extensive data and providing a gait analysis assessment based on reference values. G. Sidorov et al. (Eds.): MICAI 2010, Part I, LNAI 6437, pp. 410–421, 2010. c Springer-Verlag Berlin Heidelberg 2010
In recent years, there has been an increasing interest in designing portable devices to measure human gait parameters [2,3]. The advantages of having these devices, in contrast to specialized sophisticated gait analysis systems, are their low-cost production, and ubiquitous portability. Mobile devices to measure human gait parameters may effectively complement traditional gait analysis systems, as the former may be capable of continuously monitoring gait parameters during daily activities, reducing the stress and anxiety in individuals subjected to controlled clinical gait studies. Extensive research remains to be done in this area. Technology and techniques for real-time acquisition, filtering, transmission and processing of high volumes of biomedical data are required. Data mining algorithms capable of automatically identifying patterns in biomedical data are also needed for the development of user-oriented and disease-oriented models. In this work, a probabilistic method to automatically detect human strides is presented. The method utilizes acceleration signals measured using wireless triaxial accelerometers, and it is divided into two stages. First, local thresholds are calculated from raw acceleration signals of individuals. These thresholds are used to extract characteristic peaks commonly produced by significant shifts of the acceleration signals, including strides. Then, in the second stage, a bayesian classifier is trained to distinguish actual strides from characteristic peaks. The method was tested using data recorded from young and elderly individuals walking a number of strides, and using the accelerometers attached to both ankles. The method had a good precision for classifying strides of raw acceleration signals, of 99.37% and 98.25% in average, using the models for young and elderly group, respectively. The rest of the paper is organized as follows. Section 2 addresses related work. Section 3 gives some details about the device used to acquire acceleration signals, and about the signals provided by this device. Section 4 presents the method that was implemented to automatically detect strides. Section 5 describes the experiments that were conducted to determine the precision of our method, and finally section 6 closes with some concluding remarks.
2
Related Work
Recent studies regarding clinical gait analysis comprise a broad range of subjects including the design and implementation of mobile devices for measuring spatiotemporal gait parameters [2–9], the design and programming of activity monitors [10–12] and automatic fall detectors [13, 14]. A brief review of work relevant to this research is given below. Morris et al. [2] developed “GaitShoe”, a wireless wearable system capable of detecting heel-strike and toe-off, as well as estimating foot orientation and position. The whole system includes accelerometers, gyroscopes, force sensors, bidirectional bend sensors, as well as electric field height sensors. Lee et al. [3] for their part developed a portable and wireless activity monitoring system based on triple-axis accelerometers and foot-switches sensors. This system identifies temporal gait parameters and classifies walking patterns at various walking speeds.
Jasiewicz et al. [4] reported three different methods of gait event detection (toeoff and heel strike) using miniature linear accelerometers and gyroscopes. Detection was performed using data recorded from normal and spinal-cord injured (SCI) individuals. The proposed detection methods were based on foot linear acceleration signals, and foot and shank sagittal angular velocity. The authors conclude that detection based on foot linear accelerations or foot angular velocity can correctly identify the gait events in both normal and SCI individuals. Hwang et al. [5] implemented a system that comprises two accelerometers placed on each posterior/superior iliac spine for calculating cadence, based on ”main peak” detection along three-axis acceleration curves. Main peaks are considered important factors to distinguish the gait phases. A more sophisticated system including a specific module based on acceleration signals was presented by Edmison et al. [6]. They developed E-textile, an electronic textile system capable of computing measures of human gait such as the stride length. The E-textile system consists of pants and shoes, and includes piezoelectric fibers, accelerometers, gyroscopes, and communication elements. The category of wireless devices includes also the work of Han et al. [10]. They implemented a wearable activity monitoring system to measure acceleration of both ankles, and developed an automatic gait detection algorithm to process the acceleration signals. These data were processed sequentially to measure two-axis acceleration. Bourke et al. [13] developed a threshold-based algorithm capable of automatically discriminating between activities of daily living and falls. Falls were simulated by young individuals under supervised conditions. Daily living activities of elder individuals were also monitored. In both cases, three-axial accelerometers mounted on the trunk and thigh of individuals were used. Culhane et al. [15] described some applications of accelerometers used in a clinical setting as gait and balance evaluation, falls risk assesment and mobility monitoring. It is worth mentioning that there is also an important category of human gait studies interested in the determination of dynamic stability [16,17] and gait instability [18, 19]. These works are based on the quantification of a person’s balance maintenance during locomotion, in order to provide an assessment of fall risk in elder adults. However, these topics are beyond the scope of this work. In contrast to previous work, in this work a probabilistic method to automatically detect human strides is presented. The method utilizes raw signals provided by wireless accelerometers, as it is oriented to support the design of ambulatory and ubiquitous technologies for gait monitoring.
3 Stride Patterns
3.1 Hardware
The sensors used in this work are two ZSTAR3 systems from Freescale semiconductor, illustrated in Figure 1(a). This is small board consisting of a digital triple axis accelerometer, the MMA7456L; a ZigBee compliant 2.4GHz transceiver, the MC1321x; and an 8-bit micro-controller with USB interface, the MC68HC908JW32. The accelerometer MMA7456L has a range of sensitivity that covers 2g, 4g,
(a) ZStar3 system from Freescale semiconductor; (b) accelerometer attached to the right ankle; (c) accelerometer axes configuration
Fig. 1. Configuration of the accelerometer
and 8g for all three axes. The sampling rate is selectable at 30, 60 or 120 Hz. It operates from 2.4V to 3.6V. The accelerometer is capable of detecting motion, shock, vibration and freefall. 3.2
Characteristic Patterns in the Acceleration Signals of the Ankles
In this study, the accelerometers were placed on the lateral side of each ankle, five centimeters over the malleolus, as indicated in Figures 1(b) and 1(c). Periodic spatio-temporal patterns occur in typical human gait captured by accelerometers. In the X-axis signal, for instance, there are positive peaks in swing phases when a foot moves forward. In the same way, in the Z-axis signal there are positive peaks in swing phases when a foot is lifted to start a stride. Thus, these characteristic peaks in the acceleration signals can indicate that a person made a stride [5]. However, the characteristic peaks are not clearly distinguishable in the acceleration signals of elder persons, because they drag their feet or they had short stride length. To illustrate this difference, in Figure 2 the acceleration signals of both, a young and an elder person measured using the ZStar3 system are shown.
(a) Young person
(b) Elder person
Fig. 2. The accelerometer signals show two strides in X-axis and Z-axis of a young person (a), and an elder person (b)
Fig. 3. Flow chart of characteristic peaks extraction algorithm Input: Let S a list of n tuples that represents acceleration measures a at time t. The components of the i-th element of S, si , are denoted as ai and ti . Output: Let P a list of tuples that represents the width, height, start and end of each characteristic peak. The components of the k-th element of P , pk , are denoted as wk , hk spk and epk .
4 Automatic Stride Detection
4.1 Characteristic Peaks Extraction Algorithm
We developed an algorithm to automatically extract characteristic peaks based on a local calculation of a threshold associated with significant shifts of the acceleration signals of a person. The algorithm can be summarized in the following steps: (1) Compute a local threshold by adding a constant (δ) to the origin value, δ = 0.4 for acceleration signals of young persons and δ = 0.2 for acceleration signals of elder persons (2) Search characteristic peaks over the previously calculated threshold. (3) Calculate width and height, beginning and end of positive characteristic peak previously identified. The flow chart of this algorithm is shown in Figure 3. Figure 4 illustrates characteristic peaks in both, the X-axis and Z-axis acceleration signal of the ankle of one person. These peaks were identified by the characteristic peaks extraction algorithm previously explained. In Figures 4(a) and 4(b), seven and four characteristic peaks were extracted, respectively. At this stage, all characteristic peaks can be automatically identified from raw acceleration signals. Strides cannot yet be separated from characteristic peaks, but the former are certainly included in the latter.
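A minimal Python sketch of the three extraction steps described above is given below. It is an illustration only, assuming the sketched extract_peaks interface and the invented signal values; the paper's actual implementation and data structures may differ.

def extract_peaks(signal, times, delta=0.4):
    origin = signal[0]                   # baseline taken from the first sample
    threshold = origin + delta           # step (1): local threshold
    peaks, start = [], None
    for i, a in enumerate(signal):
        if a > threshold and start is None:
            start = i                    # step (2): a peak rises above the threshold
        elif a <= threshold and start is not None:
            height = max(signal[start:i])        # step (3): width, height, start, end
            peaks.append({'width': times[i - 1] - times[start],
                          'height': height,
                          'start': times[start],
                          'end': times[i - 1]})
            start = None
    return peaks

acc = [0.0, 0.1, 0.6, 0.9, 0.7, 0.2, 0.0, 0.5, 0.8, 0.3, 0.0]
t = [0.01 * i for i in range(len(acc))]
print(extract_peaks(acc, t))             # two characteristic peaks above the 0.4 threshold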
(a) X-axis
(b) Z-axis Fig. 4. Characteristic peaks, indicated with numbers, extracted by the first algorithm from acceleration signals of the X-axis (a), and Z-axis (b), separately
4.2
Related Characteristic Peaks Extraction Algorithm
As it was previously mentioned, the first algorithm was designed to extract characteristic peaks from one exclusive acceleration signal, in order to train further a classifier to separate these peaks in strides and non-strides. As acceleration signals of both, the X-axis and Z-axis are unrelated, by using this algorithm we can expect to extract many irrelevant characteristic peaks, i.e., peaks that do not correspond to strides. Therefore, a second algorithm was developed to double check the extraction of characteristic peaks in order to reduce characteristic peaks corresponding to non-strides. The second algorithm is indeed based on the output of characteristic peaks extraction algorithm, separately applied to the acceleration signals of the Xaxis and Z-axis. The related characteristic peaks extraction algorithm analyses the characteristic peaks of both signals considering the time when these peaks were detected. The second algorithm can be summarized in the following steps: (1) Define a sampling window from the values of every characteristic peak extracted from the Z-axis signal. The width of the sampling window is the width of a characteristic peak. (2) Search characteristic peaks in the X-axis related to characteristic peaks in the Z-axis, i.e., peaks whose beginning falls within the sampling window previously defined. If there were more than one related characteristic peak in the X-axis, then choose the widest peak. (3) Copy the width
Fig. 5. Flow chart of related characteristic peaks extraction algorithm Input: Let P Z a list of m tuples that represents the width, height, start and end of each characteristic peak in the Z-axis. The components of the i-th element of P Z, pzi , are denoted as wzi , hzi spzi and epzi . Let P X a list of n tuples that represents the width, height, start and end of each characteristic peak in the X-axis. The components of the j-th element of P X, pxj , are denoted as wxj , hxj spxj and epxj . Output: Let P ZX a list of tuples that represents information of two related peaks: the width and height of a characteristic peak in the Z-axis, and the width and height of a related characteristic peak in the X-axis. The components of the k-th element of P ZX, pzxk , are denoted as pwzk , phzk pwxk and phxk .
and height of related characteristic peaks in the output set. The flow chart of this algorithm is shown in Figure 5. Figure 6 illustrates related characteristic peaks in both, the X-axis and Zaxis acceleration signal of the ankle of one person. These peaks were identified by the related characteristic peaks extraction algorithm. Four pairs of related characteristic peaks were extracted. The second algorithm identifies significantly less characteristic peaks than the first algorithm, because only the characteristic peaks in the X-axis that were successfully time-related with characteristic peaks in Z-axis are taken into account. 4.3
Data Labeling
In order to classify acceleration signals, the sets of characteristic peaks extracted from training data by the related characteristic peaks algorithm have been preprocessed. As training data were recorded under controlled conditions, the actual number of strides performed during each experiment is known; and it is used for labeling data and training the classifier. Afterwards, new data set will be classified using the trained classifier.
Fig. 6. Related characteristic peaks extracted by the second algorithm from the joint sets of characteristic peaks of the X-axis (top graph), and Z-axis (bottom graph). The sampling windows calculated by the algorithm to relate characteristic peaks are indicated in gray color along the signal, spx and spz are the start of characteristic peaks of the X-axis and Z-axis, respectively.
Data are labeled applying a semi-automatic process that considers only the known number of strides, ns, performed during each experiment. The process to label related characteristic peaks is as follows: First, the average width for all values of the Z-axis and X-axis is calculated separately. Second, the distance from each value of width to the respective average width for both, Z-axis and Xaxis, is calculated separately. Third, the sum of each pair of previous distances is calculated for each pair of related characteristic peaks. And fourth, the ns related characteristic peaks with the fewest values of sum are labeled with the value of class yes, and the rest of related characteristic peaks are labeled with the value of class no. 4.4
4.4 Bayesian Classification
Once characteristic peaks have been identified and labeled, we process them in order to separate the peaks corresponding to actual strides from those that do not correspond to real strides. For this purpose, various classifiers have been proposed in the field of machine learning [20]. We have implemented a probabilistic classifier, namely a Bayesian classifier, because this method requires a small amount of training data for classification purposes. A Simple Bayes classifier, or Naive Bayes classifier, is based on Bayes' theorem, which states that the probability of a hypothesis given certain evidence depends on its inverse, i.e., the probability of that evidence given the hypothesis.
Thus, a Naive Bayes classifier determines the probability of class membership, Class, for each example in a test or training data set, given known features or attributes, Att_i, correlated with the class, as indicated in formula (1):

P(Class | Att_1, ..., Att_n) = P(Class) * prod_{i=1..n} P(Att_i | Class)    (1)
4.5 Bayesian Classifier
The purpose of the Bayesian classifier is to estimate the probability that one pair of related characteristic peaks is a member of the class Stride given four attributes: the width and height of a related characteristic peak in the Z-axis, wz and hz, respectively; and the width and height of a related characteristic peak in the X-axis, wx and hx, respectively. We assume that the attributes are independent given the class. This classifier is based on the probability model expressed in formula (2):

P(Stride | wz, hz, wx, hx) = P(Stride) * P(wz|Stride)^weight_wz * P(hz|Stride)^weight_hz * P(wx|Stride)^weight_wx * P(hx|Stride)^weight_hx    (2)

where P(Stride) is the a priori probability; P(wz|Stride), P(hz|Stride), P(wx|Stride), and P(hx|Stride) are the conditional probabilities of the widths and heights given the stride; weight is the gain ratio of each attribute with respect to the class, obtained by the Gain Ratio Attribute Evaluation algorithm; and P(Stride | wz, hz, wx, hx) is the a posteriori probability. The probabilities required to apply the instantiated probability models are computed from the labeled training data; thus, the parameters of the probability models are adjusted differently depending on the data of the persons used for training. The a priori probability is computed from the relative frequency of each class value in the training set. The conditional probabilities are computed from the relative frequencies of the values of the attributes wz, hz, wx, and hx given the class values. Because the attribute values are continuous, they are divided into five intervals. Finally, once the combined Bayesian classifier has been trained, it is applied to classify the incoming related characteristic peaks of the test data sets. The classifier is represented in formula (3):

classify(wz, hz, wx, hx) = argmax_c P(Stride = c) * P(wz|Stride = c)^weight_wz * P(hz|Stride = c)^weight_hz * P(wx|Stride = c)^weight_wx * P(hx|Stride = c)^weight_hx    (3)

where c is the value of the class, yes or no.
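The following minimal sketch shows the decision rule of formulas (2)-(3) in Python. The probability tables, discretization, and weights are hand-made placeholders, not values from the paper; in the paper they are estimated from the labeled training peaks with each continuous attribute discretized into five intervals.

```python
# Weighted (gain-ratio) naive Bayes decision rule, computed in log space.
import math

ATTRS = ["wz", "hz", "wx", "hx"]

def classify(x, prior, cond, weight):
    """x: dict attr -> discretized interval (0..4); returns 'yes' or 'no'."""
    best_class, best_score = None, -math.inf
    for c in ("yes", "no"):
        score = math.log(prior[c])
        for a in ATTRS:
            # conditional probability raised to the attribute's gain-ratio weight
            score += weight[a] * math.log(cond[c][a][x[a]])
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical tables, only to exercise the function.
prior = {"yes": 0.9, "no": 0.1}
uniform = [0.2] * 5
cond = {c: {a: list(uniform) for a in ATTRS} for c in ("yes", "no")}
cond["yes"]["wz"] = [0.05, 0.10, 0.40, 0.35, 0.10]
cond["no"]["wz"] = [0.50, 0.30, 0.10, 0.05, 0.05]
weight = {a: 1.0 for a in ATTRS}
print(classify({"wz": 2, "hz": 1, "wx": 3, "hx": 2}, prior, cond, weight))  # yes
```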
5 Experimental Results
5.1 Setup
Two groups of persons, young people and elders, participated in this study. The first group comprises 5 healthy young individuals with an average age of seventeen: 2 women and 3 men. The second group comprises 5 healthy elder individuals with an average age of sixty-four, all women. Each individual was required to walk a number of strides after a brief period of null acceleration. Individuals walked at normal speed, in a straight line on a flat surface, three times each for sets of six, eight, and ten strides, with the accelerometers attached to their ankles. We recorded 18 data sets for each person, 9 for each accelerometer; in total, 180 data sets of different lengths for all persons. The algorithms succeeded in extracting 739 positive examples corresponding to strides and 63 negative examples corresponding to non-strides from the young group, and 721 positive examples corresponding to strides and 28 negative examples corresponding to non-strides from the elder group. The number of examples that were not properly detected was 70; of these, 58 (83%) correspond to the very first stride of the test data sets.
5.2 Classification Results
In order to evaluate the performance of our bayesian classifier, various experiments were conducted. In these experiments, data sets were divided into two groups: young individuals and elder individuals.
Fig. 7. Classification results of both the young group 7(a) and the elder group 7(b)

Table 1. Confusion matrices of the young and elder groups

(a) Young people
Classification | Yes | No
Yes            | 737 | 2
No             | 3   | 60

(b) Elder people
Classification | Yes | No
Yes            | 720 | 1
No             | 13  | 15
We applied 5-fold cross-validation to evaluate the classifier, using, separately for each group, the related characteristic peaks jointly extracted from the acceleration signals of both ankles. Thus, the data sets of the young and elder groups were each divided into two sets: a training set comprising the data of 4 individuals and a test set comprising the data of 1 individual. The results of these experiments are 99.37% and 98.13% average classification accuracy for the young and elder group, respectively. Figures 7(a) and 7(b) show the classification results for each individual. The confusion matrices for each group are shown in Table 1. The Bayesian classifier performed well in general when obtaining the number of strides, but a better classification is achieved for the group of young individuals. This result is explained by the fact that young and elder individuals walk differently (see section 3.2).
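A compact sketch of this subject-wise protocol is shown below: each of the five individuals is held out once as the test set while the classifier is trained on the remaining four. The `train` and `evaluate` callables stand in for the Bayesian classifier of Section 4.5 and are not part of the paper.

```python
# Subject-wise 5-fold evaluation (leave one individual out per fold).
def cross_validate(data_by_subject, train, evaluate):
    """data_by_subject: dict subject_id -> list of labeled peak pairs."""
    accuracies = []
    for held_out in data_by_subject:
        train_set = [x for s, xs in data_by_subject.items()
                     if s != held_out for x in xs]
        model = train(train_set)
        accuracies.append(evaluate(model, data_by_subject[held_out]))
    return sum(accuracies) / len(accuracies)
```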
6 Concluding Remarks
A probabilistic method to automatically detect human strides was presented. The method utilizes raw acceleration signals measured using wireless triaxial accelerometers. It is based on the extraction of characteristic peaks produced by significant shifts of the acceleration, and on the classification of these peaks into strides and non-strides. The work is oriented to support, in the future, the design of ambulatory, low-cost, and ubiquitous technologies for gait monitoring. The research described in this paper is in progress. At this stage, we are able to automatically detect and count strides in two groups of persons, young and elder individuals, taking into account the acceleration signals of the X-axis and Z-axis. These groups lie quite far apart from each other, and to consider different profiles of people it is necessary to retrain the classifier. The characteristic peaks extracted from the acceleration signals of both the X-axis and Z-axis provide important information to identify the segments of the static and dynamic phases of the gait cycle. With the signal segmented, it is possible to obtain the following temporal parameters of human gait: static and dynamic phase time, cadence (steps/minute), and percentage of single and double support time. These parameters are, among others, required to provide a gait assessment. In the near future we plan to extend the work by incorporating other sensors, such as foot switches or gyroscopes, in order to calculate the gait parameters that domain experts are interested in monitoring. These parameters are stride length, step length, and cadence. We will also validate our results by applying different methods to our data sets. Acknowledgments. The first author was supported by the Mexican National Council for Science and Technology, CONACYT, under grant number 271539.
References 1. Haworth, J.M.: Gait, aging and dementia. Reviews in Clinical Gerontology 18, 39–52 (2008) 2. Bamberg, S.J., Benbasat, A.Y., Scarborough, D.M., Krebs, D.E., Paradiso, J.A.: Gait analysis using a shoe-integrated wireless sensor system. IEEE Transactions on Information Technology in Biomedicine 212, 413–423 (2008)
3. Lee, J.A., Cho, S.H., Lee, Y.J., Yang, H.K., Lee, J.W.: Portable activity monitoring system for temporal parameters of gait cycles. Journal of Medical Systems (2009) 4. Jasiewicz, J.M., Allum, J.H., Middleton, J.W., Barriskill, A., Condie, P., Purcell, B., Li, R.C.: Gait event detection using linear accelerometers or angular velocity transducers in able-bodied and spinal-cord injured individuals. Gait & Posture 24(4), 502–509 (2006) 5. Hwang, S., sung Moon, G., ho Kim, Y.: Is tow-accelerometer set-up better for the determination of gait patterns? International Federation for Medical and Biological Engineering Proceedings, vol. 14, pp. 2999–3002. Springer, Heidelberg (2007) 6. Edmison, J., Jones, M., Lockhart, T., Martin, T.: An e-textile system for motion analysis. In: International Workshop on New Generation of Wearable Systems for eHealth, Lucca, Italy, pp. 215–223 (2003) 7. Ying, H., Silex, C., Schnitzer, A., Leonhardt, S., Schiek, M.: Automatic step detection in the accelerometer signal. In: 4th International Workshop on Wearable and Implantable Body Sensor Networks, pp. 80–85. Springer, Heidelberg (2007) 8. Zijlstra, W.: Assessment of spatio-temporal parameters during unconstrained walking. European Journal of Applied Physiology 92, 39–44 (2004) 9. Mijailovi, N., Gavrilovi, M., Rafajlovi, S.: Gait phases recognition from accelerations and ground reaction forces. App. of neural networks, Telfor 1, 34–37 (2009) 10. Han, J., Jeon, H.S., Jeon, B.S., Park, K.S.: Gait detection from three dimensional acceleration signals of ankles for the patients with parkinsons disease. In: IEEE The International Special Topic Conference on Information Technology in Biomedicine, Ioannina, Epirus, Greece (2006) 11. Mathie, M.J., Celler, B.G., Lovell, N.H., Coster, A.C.F.: Classification of basic daily movements using a triaxial accelerometer. Medical and Biological Engineering and Computing 42, 679–687 (2004) 12. Iso, T., Yamazaki, K.: Gait analyzer based on a cell phone with a single three-axis accelerometer. In: 8th Conference on Human-Computer Interaction with Mobile Devices and Services, pp. 141–144. ACM, New York (2006) 13. Bourke, A., O’Brien, J., Lyons, G.: Evaluation of a threshold-based tri-axial accelerometer fall detection algorithm. Gait and Posture 26, 194–199 (2007) 14. Lopes, I.L., Vaidya, B., Rodrigues, J.R.: Sensorfall an accelerometer based mobile application. In: 2nd International Conference on Computational Science and Its Applications, pp. 749–754 (2009) 15. Culhane, K.M., O’Connor, M., Lyons, D., Lyons, G.M.: Accelerometers in rehabilitation medicine for older adults. Age Ageing 34, 556–560 (2005) 16. Liu, J., Lockhart, T.E., Jones, M., Martin, T.: Local dynamic stability assessment of motion impaired elderly using electronic textile pants. IEEE T. Automation Science and Engineering 5, 696–702 (2008) 17. Granata, K.P., Lockhart, T.E.: Dynamic stability differences in fall-prone and healthy adults. Journal of Electromyography and Kinesiology 18, 172–178 (2008) 18. Lee, H.J., Chou, L.S.: Detection of gait instability using the center of mass and center of pressure inclination angles. Archives of Physical Medicine and Rehabilitation 87, 569–575 (2006) 19. Hausdorff, J.M.: Gait variability: methods, modeling and meaning. Journal of NeuroEngineering and Rehabilitation 2, 19 (2005) 20. Mitchell, T.M.: Machine Learning. McGraw-Hill, New York (1997)
Intelligent Wheelchair and Virtual Training by LabVIEW
Pedro Ponce, Arturo Molina, Rafael Mendoza, Marco Antonio Ruiz, David Gregory Monnard, and Luis David Fernández del Campo
Tecnológico de Monterrey Campus Ciudad de México, Calle del Puente #222, Colonia Ejidos de Huipulco, Tlalpan, 14380 México City; National Instruments, Texas
[email protected],
[email protected],
[email protected]
Abstract. This paper describes the implementation of three different controllers for a wheelchair. It was built to improve the lives of people who have disabilities associated with functions regarding muscle strength, muscle power of all limbs, muscle tone of all limbs, and endurance of all muscles of the body, using three programmed methods that allow the user to have full control over the wheelchair. By acquiring the voltage signals generated by eye movements and using neuro-fuzzy networks to differentiate them, the wheelchair can be controlled by eye movements. Considering different kinds of user needs and environments, the wheelchair also includes voice control in spaces where the noise level is below 40 dB. The voice commands are detected using the Speech Recognition software included by default in the Microsoft Windows operating system. The eye and voice controls were designed using LabVIEW and the Intelligent Control Toolkit (ICTL). Virtual simulators were developed in C# code that enable the patient to train beforehand for better wheelchair control and performance. The system was tested on a person with cerebral palsy. Keywords: EOG, Hebbian Network, Fuzzy logic, Intelligent Control, LabVIEW, tetraplegia.
1 Introduction
There are many kinds of diseases and injuries that produce mobility problems. People affected by such disabilities must deal with a new lifestyle, especially people with tetraplegia. According to the ICF [1], people with tetraplegia have impairments associated with the power of the muscles of all limbs, the tone of the muscles of all limbs, and the endurance of all muscles of the body. The main objective of this project was to help people who are unable to move any limb of their own body. Although this wheelchair can be used by persons with other mobility problems, doctors would not recommend it for all patients, because the wheelchair reduces muscle movement and muscular dystrophy could appear.
Nowadays there is no efficient system that covers the different needs that a person with quadriplegia may have. The mobility of these subjects is reduced by the physical injury and, depending on the damage, assistance from nurses and family is required. In order to address this problem, many platforms have been developed; however, there is no integrated system that allows the patient to move autonomously from one place to another, which limits the patient to remaining at rest most of the time. In previous projects developed in Canada and in the United States, wheelchairs were controlled with the tongue [2] and with head and shoulder movements [3]. Both systems provide mobility for persons with injuries in functions related to muscle strength. This project offers a different alternative for the patient: to build an autonomous wheelchair with enough motion capability to transport a person with quadriplegia. Different kinds of controls are provided, so the trajectories required by the patient can be commanded using ocular movements or voice commands, among others. An existing, already tested electric wheelchair was used (the commercial Quickie wheelchair model P222 [4] with a Qtronix controller). This paper is organized as follows. Section 1 gives a brief general introduction. Section 2 describes the Hebbian network [5] control, how it was adapted to differentiate the different kinds of eye movements introduced by the patient, and how LabVIEW [6] helped us integrate it into the whole system. Section 3 describes how the Quickie wheelchair was controlled without mechanical manipulation by using electromagnetic fields, together with the complete system configuration and the menus that let the patient choose among the different interfaces.
2 Ocular Control System
2.1 Main Description
When our eyes move, they generate a magnetic dipole; therefore, a voltage signal is produced that can be sensed using clinical electrodes. These signals are on the order of microvolts and contain noise. To get the desired signal into the computer, a biomedical differential amplifier was used as a first electronic stage, followed by simple amplification in a second stage. The signals are digitized and acquired into the computer in the range of volts via data acquisition hardware for further manipulation. Once the signal is filtered and normalized, the main program learns the signals for each eye movement using artificial neural networks. This allows us to classify the signal so it can be compared with subsequently acquired signals. In this way the system can detect which type of movement was made and assign a movement response to the wheelchair.
Physiological facts. A magnetic dipole between the retina and the cornea generates voltage differences around the eye. This voltage ranges from 15 to 200 microvolts
depending on the person. The voltage signals also contain noise with a fundamental frequency between 3 and 6 Hz. This voltage can be plotted over time to obtain an electro-oculogram (EOG) [7], which describes the eye movement. Fig. 1 shows the placement of the electrodes on a person.
Fig. 1. Electrodes placement
Prior to digitizing the signals, an analog amplification stage was used, divided into two basic parts. The first part is an AD620 differential amplifier for biomedical applications; a gain of 1000x was set using equation (1):

R_G = 49.4 kOhm / (G - 1)    (1)
where G is the gain of the component and R_G is the resistance that sets that gain; for G = 1000, equation (1) gives R_G of approximately 49.4 Ohm. The digitization of the amplified signal is done with a National Instruments DAQ [8], a data acquisition hardware device.
2.2 Artificial Neural Networks
In this particular case, Hebbian neural networks are used to let the system learn the five eye movements employed to control the chair. Once a movement signal is acquired and filtered, it is transmitted to the neural network as input, using one network for each signal. The Hebbian neural network learns, point by point, the shape of the received signal; once the network is trained, it can compare incoming signals against the learned ones. By subtracting the incoming value of a given point of the signal from the same point in the learned signal, it is possible to calculate an error. This value is compared with an approval threshold: if the error is below the threshold, the incoming signal is recognized as the learned signal. The system trains five networks to learn the eye movements; that is, the Blink, Up, Down, Left, and Right movements are each "saved" in one network. Once trained, the user will be able to move the eyes in any direction and the chair will be able to recognize the direction in which the user moved the eyes.
LabVIEW Code. In this section an overview of the program implemented for the EOG is presented. Signal acquisition and filtering: In Fig. 2, three icons can be seen. The first one (data) represents a local variable that receives the values from the LabVIEW utility used to connect the computer to a DAQ. The second icon represents the filter, which is configured as described in part B. The third icon is an output, which displays a chart in the front panel so that the user can see the filtered signal in real time.
Fig. 2. Signal filter
After filtering, the signal is split into frames of 400 samples each. This step is important because the signals do not all have the same length; after framing, they do. Once the length of the data arrays is normalized, the amplitude of the signal is normalized as well. After this process there are six normalized arrays, which are then connected to their neural networks, as shown in Fig. 3.
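A minimal sketch of this framing and normalization step is shown below (zero-padding short events is an assumption; the paper does not state how shorter signals are handled).

```python
# Cut (or pad) an event to a 400-sample frame and scale it to unit amplitude.
def normalize_frame(signal, frame_len=400):
    frame = list(signal[:frame_len]) + [0.0] * max(0, frame_len - len(signal))
    peak = max(abs(v) for v in frame) or 1.0
    return [v / peak for v in frame]
```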
Fig. 3. Training system LabVIEW code
Fig. 4. Training signal, frontal panel
Looking at Hebbian Compar, shown in Fig. 3, it can be seen that this section only receives the variables and returns the calculated error as an output variable. Inside Hebbian Learn A, also shown in Fig. 3, the diagram of the configuration of this kind of network is presented. This network takes each point from the incoming signal, computes an approximation to this value, and saves it into W; if train is false, the network only returns the values already saved in W. In this way, "Hebbian Compar" calculates the error between the incoming signal and W.
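The following rough Python analogue (assumed, not the LabVIEW implementation) captures the behavior of the "Hebbian Learn"/"Hebbian Compar" blocks: in training mode each point of the incoming frame updates the stored template W, and in comparison mode the accumulated error between the frame and W is checked against a threshold. The learning rate and threshold values are placeholders.

```python
# Template learning and comparison for one eye-movement signal.
class HebbianTemplate:
    def __init__(self, length=400, rate=0.5):
        self.W = [0.0] * length
        self.rate = rate

    def learn(self, frame):
        # point-by-point approximation of the signal shape into W
        self.W = [w + self.rate * (x - w) for w, x in zip(self.W, frame)]

    def compare(self, frame, threshold=10.0):
        error = sum(abs(x - w) for w, x in zip(self.W, frame))
        return error < threshold, error   # recognized if the error is under the threshold
```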
The LabVIEW front panel shown in Fig. 4 is the screen that the user sees when training the wheelchair. On the upper half of the screen are all the EOG charts (Fig. 5): the upper two charts show the filtered signals of the vertical and horizontal channels, the lower two charts show the length-normalized signals, and the right-side chart shows the detected signal. On the lower half of the screen (Fig. 6), the user sees all the controls needed to train the system. A main training button sets all the systems in training mode. The user then selects which signal will be trained; doing so opens the connection so that only that signal goes through for training. Once recognized, the signal appears in the largest chart and the user pushes the corresponding switch to train the neural network. When the user has trained all the movements, the program must be set in comparison mode; for this, the user deactivates the main training button and puts the selector on Blink (Parpadeo). The system is then ready to receive signals. To avoid problems with natural eye movements, the chair is commanded with codes. To get the program into motion mode the user must blink twice, which is why the selector is put on Parpadeo, so that the first signals that go through the system can only be blink signals. After the system recognizes two blink signals the chair is ready to receive any other signal: looking up makes the chair move forward, and looking left or right makes the chair turn in the corresponding direction. The system stays in motion mode until the user looks down; this command stops the chair and resets the program to wait for two blinks. Embedding this code into a higher level of the program allows the EOG system to communicate with the control program, which receives a Boolean variable for each eye-movement direction. Depending on which Boolean variable is received as true, the control program commands the chair to move in that direction.
Fig. 5. Filtered signal for vertical and horizontal channels
Fig. 6. Eye movement
Fig. 7. Analog received input from eye movements
Fig. 8. Signal for “look up”
3 Voice Control
Since the wheelchair in this work is intended to be used by quadriplegic patients, a voice message system was also programmed for the chair. This system allows the user to send pre-recorded voice messages by means of the EOG system, as another way of assisting the patient.
3.1 EOG and Voice Message System Coupling
In this part of the project the EOG system and its programming were kept the same as in the direction control system described above. However, instead of coupling the EOG system to the motor control program, it was coupled to a very simple program that allows the computer to play pre-recorded messages, such as "I am hungry", "I am tired", etc. The messages can be recorded to meet the patient's needs and aid communication with the environment. The EOG program returns a Boolean variable, which then selects the message corresponding to the eye movement chosen by the user. The selection searches for the path of the saved pre-recorded message and then plays it.
Fig. 9. a) The activated Boolean is received into a case structure that selects the path used to open each file. b) Sound playback.
The second stage of the structure, shown in Fig. 9b), was taken from the examples included in LabVIEW. This example works by opening a *.wav file, checking for errors, and preparing the file to be reproduced; a while structure is used to play the message until the end of the file is reached, then the file is closed and the sequence is over.
3.2 Voice Commands
For patients with less severe motion problems, it was decided to implement a voice command system. This allows the user to tell the chair in which direction to move.
3.3 Program Description
For this part, two separate programs were used: the first is Windows Speech Recognition and the second is Speech Test 8.5 by Leo Cordaro (NI DOC-4477). The Speech Test program allows us to modify the phrases that Windows Speech Recognition will recognize; by doing so and coupling Speech Test to our control system, it is possible to control the chair with voice commands. Inside the Speech Test 8.5 VI it is possible to modify the input phrases. By selecting speech (selection box) it is possible to activate the connection between both programs. Speech Test 8.5 then receives the variable from Speech Recognition and, by connecting it to our control system, it is possible to receive the same variable and control the chair. At the beginning the user must train Windows Speech Recognition. This is strongly recommended because, although the software can differentiate the voices of different people, the system fails frequently if many people use the same trained configuration. On the other hand, the tests performed in a closed space without any source of noise were 100% satisfactory. This system works in the same way as the EOG: by saying "derecho" (straight ahead) the chair starts moving; saying "derecha" (right) or "izquierda" (left) turns the chair right or left; and saying "atrás" (back) stops the chair. The Boolean variable is received by our control system in the same way as in the EOG.
4 Obstacle Avoidance System
In order to avoid collisions and give the patient more freedom of movement, a system was developed that can avoid dynamic and static objects. The system was integrated with three ultrasonic PING sensors developed by Parallax [9]. The different situations the wheelchair can face were handled by listing the possible combinations of ultrasonic sensor readings in a logic table; each combination indicates to the wheelchair what to do. For example, if the two frontal distance sensors indicate a distance closer than ten centimeters, the wheelchair moves backwards, since otherwise it could crash. Figure 10 shows the boundary values that define the far, close, and normal positions. These values are then mapped to the membership functions shown in Fig. 11, and they can be changed by the user according to their own criteria. This fuzzy controller determines how the wheelchair moves by controlling the pulses sent to each motor.
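The sketch below illustrates the idea of this stage with trapezoidal "close"/"normal"/"far" memberships over the frontal sensor readings and a tiny rule table. The breakpoints (10, 30, 80, 120 cm) and the rule set are illustrative assumptions, not the values used by the authors.

```python
# Hedged sketch of the obstacle-avoidance logic.
def memberships(distance_cm):
    def trap(x, a, b, c, d):
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        return (x - a) / (b - a) if x < b else (d - x) / (d - c)
    far = 0.0 if distance_cm <= 80 else min(1.0, (distance_cm - 80) / 40.0)
    return {"close": trap(distance_cm, -1, 0, 10, 30),
            "normal": trap(distance_cm, 10, 30, 80, 120),
            "far": far}

def avoid(front_left_cm, front_right_cm):
    fl, fr = memberships(front_left_cm), memberships(front_right_cm)
    if min(fl["close"], fr["close"]) > 0.5:
        return "reverse"          # both frontal sensors report a very close obstacle
    if fl["close"] > fr["close"]:
        return "turn right"
    if fr["close"] > fl["close"]:
        return "turn left"
    return "forward"

print(avoid(8, 9))    # -> reverse
print(avoid(15, 90))  # -> turn right
```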
Fig. 10. Different possibilities that the wheelchair could face in a real situation
Fig. 11. Membership values for the left sensor
5 Direction Control
In the original wheelchair control, the Quickie wheelchair offered the patient a joystick that could be manipulated to drive the wheelchair in any direction. The operating principle of the Quickie direction control consists of a lever with two electromagnetic sensors. When a certain amount of field is sensed, the wheelchair is driven in the direction determined by both electromagnetic sensors: one of them controls left or right and the other forward or backward. The controller was built with a magnet placed on one end of the lever; in order to go in one direction, the patient moves the lever so that the magnet gets closer to or further away from one of the electromagnetic sensors, generating an electromagnetic field proportional to this distance. Depending on the intensity of the field, the wheelchair can be moved in any direction. In order to control the wheelchair, it is therefore necessary to generate a magnetic field emulating the one induced by the magnet placed on the lever. Two coils were placed, one in front of each sensor, at a distance of 5 mm, to generate one field per sensor. Using an NI CompactRIO 9014 [10] with two Full H-Bridge Brushed DC Servo Drive Modules 9505 [11], PWM signals were sent in order to use the coils as electromagnets. The pulse amplitude was constant at 25 V (this value depends on the voltage of the power battery), but the coils generated two electromagnetic fields in opposite directions within a space of 4 cm. The combination of both fields produced unexpected results, because the sensors read random values and the desired directions could not be obtained by polarizing a single coil. This problem was solved by searching for configuration values that produce the four possible directions (forward, backward, right, and left).
The signal was controlled by two variables per coil: one indicates the frequency of the pulse and the other the direction of the current. Both variables can be observed in Fig. 12: PWM DC (%) RIGHT C and DD Right, which are connected to a case structure. Depending on the Boolean value selected on DD Right, the current flows in one direction with a duty cycle determined by PWM DC (%) RIGHT C. These variables are used for the right coil; the left coil has its own variables and regulates the same values as the right one.
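A minimal sketch of this per-coil command pair is given below, mirroring the "PWM DC (%) RIGHT C" / "D D Right" pair in the LabVIEW case structure. The signed-duty representation is an assumption used here only to illustrate how the Boolean selects the current sense.

```python
# One command per coil: a duty cycle in percent plus a Boolean direction flag.
def coil_command(duty_percent, direction_forward):
    duty = max(0.0, min(100.0, duty_percent))
    return duty if direction_forward else -duty   # the sign encodes the current direction

right_coil = coil_command(60.0, direction_forward=True)    # +60 %
left_coil = coil_command(60.0, direction_forward=False)    # -60 %
print(right_coil, left_coil)
```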
Fig. 12. The configuration for the frequency and direction of the signal is controlled by introducing numerical values from 0 to 100 in the “PWM DC (%) RIGHT C” variable. The current direction depends on “D D Right” Boolean value.
In the configuration of the case structure, depending on the selection of the variable "D D Right", the value introduced in "PWM DC (%) RIGHT C" is inverted or not; this means that the current flows in one sense or in the opposite one. The "Read/Write Control" passes these values to the outputs of the CompactRIO. As previously mentioned, these values determine the PWM duty cycle for each coil. By measuring the current, the following values were obtained:
Fig. 13. Data obtained from the different configurations of the coils
6 Structural Design
The full system was built on a Quickie wheelchair; the full system diagram is shown in Figure 14. Using LabVIEW, three different kinds of controls were programmed: 1. voice control; 2. eye-movement control; 3. keyboard control.
Fig. 14. Control structure. The user can select any of the controls depending on his needs and the surrounding environment.
The wheelchair is controlled using two coils that generate an electromagnetic field detected by two sensors. Depending on the density and intensity of the magnetic field, the motors are driven to move the wheelchair in any direction. However, this solution also requires both coils to be fixed in place so that they cannot move with respect to the sensors; in this way the sensed field is always the same for a given configuration. The use of fuzzy logic to design the obstacle-avoidance system and of the Hebbian network to recognize the different kinds of eye movements were the tools that helped us obtain efficient responses. Specific hardware is required to obtain the signals processed by these systems. Each control was programmed in LabVIEW in a different file, and all of them were included in one LabVIEW project. Each one must be executed separately, so when the voice controller is in use the eye controller cannot be used. In the future, the obstacle-avoidance system will have higher priority than the eye and voice controllers. The hardware used is as follows:
1. DAQ: used to sense the voltage signals produced by the eyes and to set the directions using the manual control. Each input takes values from different ports of an NI USB-6210: the voltages generated by the eyes are connected to the analog port, and the manual control to the digital input port. 2. CompactRIO: this device generates the PWM signals for the coils, allowing control of the different directions. The cRIO model 9014 had the following modules: a. two H-bridge modules, for controlling the PWM; b. a 5 V TTL bidirectional digital I/O module.
3. BASIC Stamp [12]: this device is used to acquire the signals detected by the three ultrasonic distance sensors. 4. Three ultrasonic distance sensors used by the obstacle-avoidance system; two of them are placed at the front of the wheelchair and one at the back. 5. A laptop to execute the programs and visualize the different commands introduced by the user.
Fig. 15. a) Placement of the different components. b) Connection’s diagram.
7 Virtual Training
In order to train the user, two different scenarios were designed. The code was programmed using the XNA libraries over C# [13]. The scenarios include dynamic and static objects that emulate reality as closely as possible. Different light intensities, which can vary with the daylight and affect the eye controller, were considered in the scenarios (Fig. 16). Floors with different kinds of friction, which affect the wheelchair control, were also simulated. The simulator can be used with the ocular, manual, and voice controls. The user's statistics can be viewed online by the user's doctor (Fig. 17).
Fig. 16. Dynamic objects on the scenarios
Fig. 17. Web page
8 Results
The following results show the controller performance. The voice controller increases in accuracy; if the wheelchair needs to work in a noisy environment, a noise cancellation system has to be included. The results are good when the wheelchair works in a normal noise environment (roughly 0 to 70 dB), with an average recognition rate of around 90%. The precision of the EOG recognition system changes over time, because the user needs a certain amount of time to learn to move the eyes in a way that produces recognizable signals. At the beginning the user is not familiar with the whole system, so the signals are not well defined and the system shows medium performance; after six weeks, the precision increases to around 94%. Fig. 18 shows the results.
Fig. 18. a) Tests of the voice control system response
Fig. 18. b) EOG system response
9 Conclusions
The complete system works well in a laboratory environment. The signals from eye movements and voice commands are translated into actual movement of the chair, allowing people who are disabled and cannot move their hands or even their head to move freely through spaces. There is still no full version that can run the avoidance system at the same time as the chair is being controlled with eye movements; this should be the next step for further work. As intended, the four main control systems were successfully completed, which gives the wheelchair more compatibility and adaptability for patients with different disorders: eye control for those who cannot speak, speech recognition for people who cannot move, and directional buttons (joystick) for other users. Many problems arose when trying to interface with the systems already built by the manufacturer; the use of magnetic inductors is one of the temporary solutions that should be eliminated, even though the emulation of the joystick is good and works well. The use of these inductors produces considerable power losses, reducing the in-use time of the batteries, and also introduces a small delay. The use of the Windows Vista speech recognition software also introduces some faults into our system, since this user interface is not well developed and sometimes does not recognize what is expected, which is a problem for a system such as ours that requires a quick response to commands. With this project we demonstrated how intelligent control systems can be applied to improve already existing products; the use of intelligent algorithms widens the possibilities of interpretation and manipulation. Acknowledgements. This work was done with the partial support of National Instruments, Austin, Texas; we want to thank Jeannie Falcon, Eloisa Acha, and Brian MacCleery.
References 1. International clasification of functioning, disability and health: ICF. World health Organization, Geneva, 228 págs (2001) 2. Krishnamurthy, G., Ghovanloo, M.: Tongue Drive: A Tongue Operated Magnetic sensor Based Wireless Assistive Technology for people with severe Disabilitites. In: Proceedings of 2006 IEEE International Symposium on IEEE Circuits and Systems, ISCAS 2006 (2006) 3. United States Department of Veterans Affairs, New Head Control for Quadriplegic Patients, Rehabilitation Research & Development Service (1970), http://www.rehab.research.va.gov/jour/75/12/1/lozach.pdf (accesed October 21, 2009) 4. Quickie-Wheelchair.com, Quickie P222 SE – Wheelchair – Quickie-Wheelchair.com, http://www.quickie-wheelchairs.com/products/ Quickie-P222-SE-2974.html (accesed October 16, 2009) 5. Ponce, P., Ramirez, F.D.: Intelligent Control Systems with LabVIEW. Springer, United Kingdom (2009) 6. National Instruments Corporation, NI LabVIEW – The Software That Powers Virtual Instrumentation – National Instruments, http://www.ni.com/labview/ (accesed October 16, 2009)
7. Barea, R., Boquete, L., Mazo, M., López, E., Bergasa, L.M.: Aplicación de electrooculografía para ayuda a minusválidos.; Alcalá de Henares. Universidad de Alcalá, Madrid, Spain 8. National Instruments Corporation, NI USB-6210 – National Instruments, http://sine.ni.com/nips/cds/print/p/lang/en/nid/203189 (accesed October 16, 2009) 9. National Instruments Corporation, NI Crio-9014 – National Instruments, http://sine.ni.com/nips/cds/print/p/lang/en/nid/203500 (accesed October 16, 2009) 10. National Instruments Corporation, NI 9505 – National Instruments, http://sine.ni.com/nips/cds/print/p/lang/en/nid/202711 (accesed October 16, 2009) 11. Parallax Inc. BASIC Stamp Discovery Kit – Serial (With USB Adapter and Cable), http://www.parallax.com/StoreSearchResults/tabid/768/ txtSearch/bs2/List/0/SortField/4/ProductID/320/Default.aspx (accesed October 21, 2009) 12. XNA libraries, http://msdn.microsoft.com/en-us/aa937791.aspx
Environmental Pattern Recognition for Assessment of Air Quality Data with the Gamma Classifier
José Juan Carbajal Hernández, Luis Pastor Sánchez Fernández, and Pablo Manrique Ramírez
Center of Computer Research - National Polytechnic Institute, Av. Juan de Dios Bátiz S/N, Col. Nva. Industrial Vallejo, 07738, México D.F., México. Phone: 57296000, ext. 56573
[email protected], {lsanchez,pmanrriq}@cic.ipn.mx
Abstract. Nowadays, efficient methods for air quality assessment are needed in order to detect conditions that negatively affect human health. A new computational model is developed to evaluate toxic compounds in the air of urban areas that can be harmful to sensitive people, affecting their normal activities. Using the Gamma classifier (Γ), environmental variables are assessed to determine their negative impact on air quality, based on their toxicity limits, the frequency of failed toxicity tests, and the deviations of those tests. A fuzzy inference system uses the resulting classifications to provide an air quality index, which describes the pollution level in five stages: excellent, good, regular, bad, and danger. Keywords: Artificial intelligence, air quality, fuzzy inference systems, pollution.
1 Introduction
The presence in the air of substances that involve risk, danger, or serious health problems for people is known as air pollution. The main sources of air pollution are industrial processes involving combustion (industry and automobiles) [1]. A sufficient supply of good-quality air is essential to any human activity. The criteria for good air quality vary with the kind of organism and are established as levels [2]. Methodologies for the assessment and monitoring of air pollutants have been implemented by organizations such as the United States Environmental Protection Agency (USEPA) [1], the Pan American Health Organization (PAHO) [3], and the Mexican Ministry of Environment (Secretaría del Medio Ambiente, SMA in Spanish) [2]. The need for more appropriate techniques to manage the importance of the air quality variables, the interpretation of an acceptable range for each parameter, and the method used to integrate the dissimilar parameters involved in the evaluation process is clearly recognized. In this sense, several alternative artificial intelligence methodologies have been applied to the analysis of environmental pollution, such as artificial neural networks [4], associative memories [5], and support vector machines [6], among others. Air quality requirements are based on the results of chemical toxicity tests, which measure the responses of people to defined quantities of specific compounds.
Physical-chemical variables have certain toxicity limits, and low or high concentrations can be harmful to organisms [7]. Considering the negative situations generated by combinations of variables, it is possible to implement a computational model that, according to those limits and fluctuations, determines when a concentration is good or bad for people. This strategy will reduce potential negative situations in the population, and illnesses will consequently also be reduced. This paper analyzes the ecosystem of the Mexico City valley, where the variables that make up the environment are assessed in order to establish an indicator of the good or bad air quality of the habitat.
2 Air Quality
The production system and methods for the assessment of air quality in the valley of Mexico City monitor the compounds in the atmosphere that are most harmful for human health, namely ozone, nitrogen dioxide, sulfur dioxide, carbon monoxide, and particles smaller than 10 micrometers, every hour of every day [8], [9], [10], [11]. The Mexico City Atmospheric Monitoring System (Sistema de Monitoreo Ambiental, SIMAT in Spanish) provides an index for air quality assessment in the valley of Mexico City, which defines criteria for how air pollution affects people's health [12]. SIMAT is committed to operating and maintaining a trustworthy system for the monitoring of air quality in Mexico City [5], made up of the Automatic Network for Atmospheric Monitoring (RAMA) and the Manual Network for Atmospheric Monitoring (REDMA), which publish their pollutant concentration information [13], [14].
2.1 IMECA
The metropolitan index for air quality (Índice Metropolitano de la Calidad del Aire, IMECA in Spanish) was developed for SIMAT, and it reports the pollution level of the valley of Mexico City on a [0, 200] range [7], [12]. The IMECA is computed for the following pollutants: ozone (O3), nitrogen dioxide (NO2), sulfur dioxide (SO2), carbon monoxide (CO), particles smaller than 10 micrometers (PM10), and particles smaller than 2.5 micrometers (PM2.5). The IMECA for the excellent air quality range can be calculated using the following transformation equations:

IMECA_O3   = C_O3   * (100 / 0.11)     (1)
IMECA_NO2  = C_NO2  * (50 / 0.105)     (2)
IMECA_SO2  = C_SO2  * (100 / 0.13)     (3)
IMECA_CO   = C_CO   * (50 / 5.50)      (4)
IMECA_PM10 = C_PM10 * (5 / 6)          (5)
IMECA_PM2.5 = C_PM2.5 * (50 / 15.4)    (6)

where C_x denotes the (averaged) concentration of pollutant x,
and the overall IMECA is determined as the highest of the sub-indices. The IMECA equations use moving averages of the concentrations, computed over 18 to 24 hours for SO2 and PM10, and over 6 to 8 hours for O3 and CO, respectively.
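As a small illustration, the sketch below computes the per-pollutant sub-indices with the scale factors of equations (1)-(6) as reconstructed above, and takes the maximum as the overall index; the example concentrations are hypothetical.

```python
# Sub-indices for the first IMECA band and the overall index (largest sub-index).
# Concentrations in ppm; particles in ug/m3; inputs are the moving averages.
def imeca(o3, no2, so2, co, pm10, pm25):
    sub = {
        "O3": o3 * 100 / 0.11,
        "NO2": no2 * 50 / 0.105,
        "SO2": so2 * 100 / 0.13,
        "CO": co * 50 / 5.50,
        "PM10": pm10 * 5 / 6,
        "PM2.5": pm25 * 50 / 15.4,
    }
    worst = max(sub, key=sub.get)
    return sub[worst], worst

index, pollutant = imeca(o3=0.08, no2=0.05, so2=0.02, co=3.0, pm10=45, pm25=10)
print(round(index), pollutant)   # ~73, dominated by ozone
```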
2.2 Environmental Analysis
Environmental variables present random perturbations that can be harmful over long exposure times. In order to classify the negative impact of the environmental variables, it is necessary to define the levels for optimal or harmful concentrations. According to the environmental standard for the Mexico valley, NADF-009-AIRE-2006, the toxicity levels of the variables are defined in five ranges: excellent, regular, high, very high, and extremely high. The values of the levels for each variable are given in Table 1.

Table 1. Classification levels of the variables defined by NADF-009-AIRE-2006 (IMECA levels)

Variable       | Excellent      | Regular        | High            | Very High     | Extremely High
O3 [ppm]       | 0.000 - 0.055  | 0.056 - 0.110  | 0.111 - 0.165   | 0.166 - 0.220 | > 0.220
NO2 [ppm]      | 0.000 - 0.105  | 0.106 - 0.210  | 0.211 - 0.315   | 0.316 - 0.420 | > 0.420
SO2 [ppm]      | 0.000 - 0.065  | 0.066 - 0.130  | 0.131 - 0.195   | 0.196 - 0.260 | > 0.260
CO [ppm]       | 0.00 - 5.50    | 5.51 - 11.00   | 11.01 - 16.50   | 16.51 - 22.00 | > 22.00
PM10 [µg/m3]   | 0 - 60         | 61 - 120       | 121 - 220       | 221 - 320     | > 320
PM2.5 [µg/m3]  | 0 - 15.4       | 15.5 - 40.4    | 40.5 - 65.4     | 65.5 - 150.4  | > 150.4
AQI            | Excellent 0-50 | Good 51-100    | Regular 101-150 | Bad 151-200   | Danger > 200
3 Gamma Index (Γ)
The Gamma index is implemented as an alternative tool to determine how the fluctuations of the variables affect the air quality condition. Since the IMECA equations use moving averages, the effects of deviations are not considered in the analysis of negative situations; these deviations represent exposures of the population to toxic concentrations over periods of time. In order to capture these effects, the Gamma index (Γ) combines the frequency of measurements out of range (failed tests) and the average of their deviations. The Γ index therefore provides a value in the [0, 1] range that describes the level of toxicity of a variable, as follows.
Index 1: α (Frequency). The frequency represents the proportion of individual measurements that are out of range (failed tests):

α = m_f / m_T    (7)

where α is the failure index, m_f is the number of failed tests, and m_T is the total number of tests of the variable.
Index 2: β (Amplitude). The average of the deviations of the failed tests is calculated in three steps. When the value must not exceed the upper level:

e = m - (l_a + t_a)    (8)

where e is the deviation of the failed test, m is the value of the test, l_a is the upper limit of the evaluated range, and t_a is the upper tolerance. When the value must not fall below the lower level:

e = (l_b - t_b) - m    (9)

where l_b is the lower limit of the evaluated range and t_b is the lower tolerance of the range. The average deviation is calculated as:

d = (sum_{i=1..n} e_i) / m_T    (10)

where i = 1, 2, ..., n, n is the number of calculated deviations, m_T is the total number of tests, and d is the average deviation. The β index can then be expressed as follows:

β = d if 0 <= d <= 1, and β = 1 if d > 1    (11)

Index 3: Γ (Physical-chemical assessment index). The Gamma index (Γ) classifies the behavior of the variable, establishing a level status. The Γ index can be expressed as follows:

Γ = (α + β) / 2    (12)
The Γ result can be interpreted as follows:
• If 0 ≤ Γ < 1, the variable behavior is classified as inside the evaluated range.
• If Γ = 1, the variable behavior is classified as totally outside the evaluated range.
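The following sketch implements the Gamma index as reconstructed in equations (7)-(12). The exact normalization of the deviations and the handling of tolerances may differ in the original formulation, so treat the deviation step as an assumption.

```python
# Gamma index: alpha = fraction of failed tests, beta = clipped average deviation.
def gamma_index(measurements, lower, upper, tol_low=0.0, tol_high=0.0):
    m_total = len(measurements)
    deviations = []
    for m in measurements:
        if m > upper:
            deviations.append(m - (upper + tol_high))    # eq. (8)
        elif m < lower:
            deviations.append((lower - tol_low) - m)     # eq. (9)
    alpha = len(deviations) / m_total                    # eq. (7)
    d = sum(deviations) / m_total if deviations else 0.0 # eq. (10)
    beta = min(max(d, 0.0), 1.0)                         # eq. (11)
    return (alpha + beta) / 2.0                          # eq. (12)

# Ozone concentrations (ppm) evaluated against the "Excellent" range of Table 1.
print(gamma_index([0.03, 0.05, 0.07, 0.12], lower=0.0, upper=0.055))  # ~0.26
```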
4 Fuzzy Inference System (FIS)
4.1 Introduction
Fuzzy logic theory, characterized by being conceptually easy to understand and based on natural language, has been successfully used to model non-linear functions, to build inference systems on top of the experience of experts, and to deal with imprecise data [15]. These advantages have been applied to complex air-related environmental problems. In the present study, the fuzzy logic formalism has been used to
assess air quality by developing an air quality index based on fuzzy reasoning. Advantages and disadvantages of fuzzy logic over traditional methodologies are discussed in [15], [16]. Fuzzy inference is the process of formulating the mapping from a given input to an output using fuzzy logic. The mapping then provides a basis from which decisions can be made or patterns discerned [16], [17]. The process of fuzzy inference can be expressed in three phases: membership functions, inference rules (If-Then rules), and aggregation (Fig. 1).
[Fig. 1 block diagram: the measurements are processed by the Γ membership functions (Γ_O3, Γ_NO2, ..., Γ_PM10); the membership outputs are evaluated by the rules (Rule 1 ... Rule n); the rule outputs are aggregated (Σ); and the Air Quality Index (AQI) is calculated as one of Excellent, Good, Regular, Bad, or Danger.]
Fig. 1. Architecture of the fuzzy inference system applied to the air quality problem
4.2 Air Quality Levels
In accordance with NADF-009-AIRE-2006, the negative health effects of air pollutants can be described as follows:
• Good: suitable for conducting outdoor activities.
• Regular: outdoor activities can be carried out; possible discomfort in children, the elderly, and people with illnesses.
• Bad: avoid outdoor activities; greater health effects in the population, particularly in children and older adults with cardiovascular and/or respiratory problems such as asthma.
• Very bad: greater adverse health effects in the general population, particularly children and older adults with cardiovascular and/or respiratory conditions such as asthma.
• Extremely bad: health effects on the general population; serious complications can appear in children and older adults with cardiovascular and/or respiratory conditions such as asthma.
4.3 Membership Functions
Membership functions (μ) transform real data into values in [0, 1] and can be implemented in different ways. The fuzzy inference system (FIS) is implemented using
the Γ index as the input membership function; this index analyzes the concentrations of each variable. The air quality index (AQI) is calculated using trapezoidal membership functions (Fig. 2). For the purposes of the present study, the shape of the membership functions is secondary; however, linear fuzzy sets facilitate the defuzzification.
[Fig. 2 plots the trapezoidal membership functions "Excellent", "Good", "Regular", and "Bad" (membership degree from 0.0 to 1.0 on the vertical axis) over an AQI range of roughly 1 to 241 on the horizontal axis.]
Fig. 2. Air quality membership functions
4.4 Inference Rules
In air quality assessment, expressions such as the following are frequently used by the experts: "if the levels of ozone are high and the levels of sulfur dioxide are good, then the expected air quality is regular." In fuzzy language, this can be enunciated as follows:
Rule 1: If O3 is high and SO2 is good then AQI is regular.
Other rules can be enunciated in the same way. The robustness of the system depends on the number and quality of the rules; in this work, 135 rules were used in the FIS. As examples, we enunciate two more rules:
Rule 2: If O3 is good and NO2 is good and SO2 is good and CO is good and PM10 is good and PM2.5 is good then AQI is excellent.
Rule 3: If O3 is good and NO2 is regular and SO2 is good and CO is good and PM10 is good and PM2.5 is good then AQI is regular.
The output rules are fuzzy expressions that can be calculated as follows:

R_{i,j,k,l,m,n} = μ_{O3,i} AND μ_{NO2,j} AND μ_{SO2,k} AND μ_{CO,l} AND μ_{PM10,m} AND μ_{PM2.5,n}    (13)
where i, j, k, l, m, and n are the evaluated levels of each pollutant, respectively, and AND denotes the fuzzy conjunction used to combine the antecedent memberships. Fig. 3 illustrates the operation of rules 2 and 3.
4.5 Defuzzification
Membership functions are implemented as a way to transform the measurements into inputs of the FIS; the defuzzification process, in turn, transforms the fuzzy outputs into real values and can be implemented using the membership functions of the AQI (Fig. 2).
The defuzzification process can be implemented in three phases. First, the output rules are matched with the AQI memberships (Fig. 3) as follows:

μ_{AQI,l} = max{ R_r : rule r has consequent l }    (14)
where l is the selected membership function (excellent, good, regular, bad, or danger). All of the matched membership functions are then aggregated, creating one final membership function; Fig. 3 shows this process.
[Fig. 3 illustrates rules 2 and 3 as If-Then chains over O3, NO2, SO2, CO, PM10, and PM2.5, with the corresponding membership values (e.g., Good, Γ = 1) feeding each antecedent.]
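The sketch below illustrates Sections 4.4-4.5 in Mamdani style. The choice of the minimum as the fuzzy AND, and of a weighted average of representative AQI values per label as a simplified centroid defuzzification, are assumptions; the paper does not specify these operators.

```python
# Rule firing, aggregation per AQI label, and simplified defuzzification.
AQI_CENTERS = {"excellent": 25, "good": 75, "regular": 125, "bad": 175, "danger": 225}

def fire_rule(memberships, antecedent):
    """memberships: pollutant -> {level: degree}; antecedent: pollutant -> level."""
    return min(memberships[p][lvl] for p, lvl in antecedent.items())

def infer(memberships, rules):
    """rules: list of (antecedent dict, consequent AQI label)."""
    aggregated = {label: 0.0 for label in AQI_CENTERS}
    for antecedent, label in rules:
        aggregated[label] = max(aggregated[label], fire_rule(memberships, antecedent))
    num = sum(AQI_CENTERS[l] * w for l, w in aggregated.items())
    den = sum(aggregated.values()) or 1.0
    return num / den

rules = [({"O3": "good", "SO2": "good"}, "excellent"),
         ({"O3": "high", "SO2": "good"}, "regular")]
memberships = {"O3": {"good": 0.2, "high": 0.8}, "SO2": {"good": 0.9, "high": 0.1}}
print(round(infer(memberships, rules)))   # 105, i.e., in the "regular" band
```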
2.2
Clustering and Outlier Rejection
In order to estimate axonal fibers we need to cluster the fibers paths and discard the outliers paths. Note the high dimensionality of the data: one path have
452
R. Aranda et al.
dimension from R3×M , assuming M steps. Each path has an initial point, a trajectory and an ending point. According with our experiments, the most important feature for clustering particles paths are their final position (end–points). This can easy be understood from the cat that the initial particles points are fixed and as they spread as the number of iterations increases. Thus, we clasify the set of recovered pathways by means of a non parametric clustering method named hierarchical clustering. Hierarchical clustering algorithms usually are either agglomerative (“bottom-up”) or divisive (“top-down”). For our clustering method, we use agglomerative algorithms, which starts with each element as a separate cluster and merge them into successively larger clusters[7]. We also use single linkage, also called nearest neighbor, as the method for compute the distance between clusters. Single linkage uses the smallest distance between objects in the two clusters, c1 and c2 as: d(c1 , c2 ) = min(dist(ˆ xc1 ,i , x ˆc2 ,j )), i (1, ..., nc1 ), j (1, ..., nc2 ) where x ˆq,l is an element l of the cluster q with nq elements and dist(a, b) is a distance measure (we use the Euclidean distance), in our case the x ˆ s are terminal points. Thus if d(c1 , c2 ) < c, c1 and c2 are joined, where c is a certain threshold of distance. Once we compute the clusters, we discard the false clusters if they are composed by a percentage of pathways which is lower than a given percentage ϕ, namely, if the cluster contains few fibers it is eliminated. Summary. The particle walks clustering is summarized in the Algorithm 2.
Algorithm 2. Clustering and Outlier Rejection Require: The particles end–points of the walks of the Algorithm 1, thresholds ϕ and c. 1: Cluster the fibres using hierarchical clustering with distance parameter c. 2: Let Q as the number of clusters. 3: for q = 1 to Q do 4: Set ncq the number of fibres of cq . 5: All walks are averaged to obtain an estimate of the path of a bundle of axons. 6: if ncq < ϕ then 7: The cluster cq is eliminated. 8: end if 9: end for
3
Experiments and Results
In order to show the performance of our method we use three different types of DW Data: Synthetic data. The DW-MRI signal was synthesized from the GMM (1). The DT principal eigenvalue was set to 1 × 10−3 mm2 / s and the second and
Massive Particles for Brain Tractography
(a)
453
(b)
Fig. 2. Multi-Tensorial fields. (a) Synthetic data, (b) Diffusion Phantom data.
(a)
(b)
(c)
Fig. 3. Seed points of the differents data types. (a) Synthetic data: mark blue. (b) Diffusion Phantom data: red marks (c) Brain Human Data: blue mark and the corpus callosum.
third tensor eigenvalues were 2.22 × 10−4 mm2 / s, FA= 0.74. The above values were taken from a sample of tensors observed in the brain data from a healthy volunteer. Rician noise was added to each measurement to produce SNR = 9. For these data, we have 1 repetitions. See Figure 2(a). Data from a diffusion phantom. We used data acquired from a diffusion phantom [8]. Layers of hydrophobic acrylic fibres were interleaved and stack in each other to build fibre crossing configurations. Diffusion-weighted data were acquired on the 3T Tim Trio MRI systems with 12-channel. The data is available at http://www.lnao.fr/spip.php?article112. For these data, we have only 2 repetitions. See Figure 2(b). In vivo Brain Human Data. A single healthy volunteer was scanned on a Siemens Trio 3T scanner with12 channel coil. Acquisition parameters: single-shot echo-planar imaging, five images for b=0 s/mm, 64 DW images with unique, isotropically distributed orientations (b=1000 s/ mm2 ), TR=6700 ms, TE=85 ms, 90o flip angle, voxel dimensions equal to 2 × 2 × 2 mm3 . The
454
R. Aranda et al.
(a)
(b)
(c)
(d)
(e)
(f)
(g)
(h)
(i)
Fig. 4. Results step by step (by rows) of Traking fiber on synthetic data by using different values of G. (a) G=0, (d) G=0.0001 and (g) G=0.002. (b), (e) and (h) are the results without outliers, respectively. (c), (f) and (i) show the computed bundles.
approximated Signal to Noise Ratio (SNR) is equal to 26. For these data, we acquire 5 repetitions. The first experiment was performed with synthetic data. The Figure 3 shows the results obtained, step by step, for the fibers crossing with different values of G and using as seed point the blue mark of image 3(a). We can see that when G=0 is used many particles take a wrong way. On the other hand, when weak gravity is used, the above problem is corrected for the majority of particles but the method still explores the potential bifurcations.. Finally, when strong gravity is used the particles no longer can explore the medium because they remain together. This is not proper when there are bifurcations bacause the exploration is not allowed.
Fig. 5. Results of fiber tracking on diffusion phantom data: (a) our approach, (b) the method of Ramírez-Manzanares et al. and (c) ground truth
Table 1. Root Mean Square Error (L2 norm) between the fibers obtained with the method of Ramírez-Manzanares et al. (RM) and ours (MP)

Fiber  1     2     3     4     5     6     7     8     9     10    11    12    13    14    15    16
MP     2.56  1.96  4.28  2.05  1.9   1.63  66.5  2.62  7.71  33.58 3.69  2.24  2.31  4.41  3.31  6.42
RM     4.98  39.65 53.84 6.02  7.36  2.65  71.1  14.77 10.2  9.41  21.51 14.83 20.01 2.94  16.37 14.35
The second experiment was performed using the diffusion phantom data. Figure 5 shows the visual comparison between the results obtained from the seed points in image 3(b) using our approach, the method of Ramírez-Manzanares et al. [9] and the ground truth. Note that the visual comparison is representative since the diffusion phantom is basically 2D: if a fiber looks similar to the ground truth, then the error between the obtained fiber and the real fiber is small. This can also be seen in Table 1, which shows the numerical comparison, fiber by fiber, using the L2 norm. Thus, one can see that we correctly recover 13 of the 16 tracks, whereas the method of Ramírez-Manzanares et al. correctly recovers only 5 fibers. It is important to mention that all tracks were obtained with the same set of parameters; also, the presented fibers are the average of the group (cluster) with the highest number of walks. The last experiment was performed using the in vivo human brain data. Figure 6 shows the results obtained using one seed point placed in the corpus callosum, indicated by the blue mark in image 3(c). Panel (a) shows the results without the gravity force and panel (b) shows the tracks using G=0.0001 and 500 particles. This shows very clearly that when weak gravity is used the obtained trajectories are more consistent than when the gravity force is not used. On the other hand, Figure 7 shows different views of the results of our approach for the in vivo human brain data using 47 seed points around the corpus callosum (see image 3(c)), 1000 particles per seed point and G=0.00005. These images show the estimated pathways without outliers. One can also see the averaged (main) connections in panels (d), (e) and (f).
Fig. 6. Results of fiber tracking on in vivo human brain data using 500 particles and (a) G=0, (b) G=0.0001. The seed point is located at the blue mark in Figure 3(c).
Fig. 7. Results of fiber tracking on in vivo human data using 47 seed points on the corpus callosum, with G=0.00005 and 1000 particles. (a), (b) and (c) Different views of the fibers without outliers; (d), (e) and (f) different views of the fiber bundles.
4 Conclusions
This report presents a novel method for stochastic tractography based on massive particles. The particle dynamics depend on the particle's previous direction (inertia), its current position, the orientation information of its neighboring voxels (the medium),
and on a newly proposed gravitational term. The inertia, the medium and the gravitational force among particles promote smooth particle trajectories. The gravitational force aids in the correction of particle trajectories while still allowing exploration of the medium. According to our results, there is a compromise between the gravitational force and the particle spread in the medium (i.e., the capability of exploration). In practice, small values of gravity significantly improve the solution with respect to the solutions computed without the gravity force. In addition, we present a method for clustering the particles that comprise the pathways of the axonal fiber tracks. This clustering method allows the removal of portions of the axonal bundles that have been wrongly estimated. The performance of the proposed method was demonstrated on synthetic and real human brain images. Acknowledgments. This work was supported by the Consejo Nacional de Ciencia y Tecnología, Mexico [MSc. scholarship to R.A. and grant 61367-Y to M.R.].
References
1. Basser, P.J., Pierpaoli, C.: Microstructural and physiological features of tissues elucidated by quantitative-diffusion-tensor MRI. J. Magn. Reson. B 111 (1996)
2. Ramírez-Manzanares, A., Rivera, M.: Basis tensor decomposition for restoring intravoxel structure and stochastic walks for inferring brain connectivity in DT-MRI. Int. Journ. of Comp. Vis. 69, 77–92 (2006)
3. Bergmann, O., Kindlmann, G., Peled, S., Westin, C.F.: Two-tensor fiber tractography. In: IEEE 2007 International Symposium on Biomedical Imaging (ISBI), Washington, D.C. (2007)
4. Malcolm, J.G., Michailovich, O., Bouix, S., Westin, C.F., Shenton, M.E., Rathi, Y.: A filtered approach to neural tractography using the Watson directional function. Medical Image Analysis 14, 58–69 (2010)
5. Tuch, D.S., Reese, T.G., Wiegell, M.R., Makris, N., Belliveau, J.W., Wedeen, V.J.: High angular resolution diffusion imaging reveals intravoxel white matter fiber heterogeneity. Magn. Reson. Med. 48, 577–582 (2002)
6. Ramírez-Manzanares, A., Rivera, M., Vemuri, B.C., Carney, P., Mareci, T.: Diffusion basis functions decomposition for estimating white matter intravoxel fiber geometry. IEEE Trans. Med. Imag. 26, 1091–1102 (2007)
7. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. Springer, Heidelberg (2009)
8. Poupon, C., Rieul, B., Kezele, I., Perrin, M., Poupon, F., Mangin, J.-F.: New diffusion phantoms dedicated to the study and validation of high-angular-resolution diffusion imaging (HARDI) models. Magn. Reson. Med. 60, 1276–1283 (2008)
9. Ramírez-Manzanares, A., Rivera, M., Gee, J.C.: Depicting axon fibers on a diffusion phantom by means of hybrid DBF-DT data. In: Workshop Diffusion Modelling and Fiber Cup at MICCAI 2009, London, U.K., pp. 1–4 (2009)
Emotional Conversational Agents in Clinical Psychology and Psychiatry
María Lucila Morales-Rodríguez, Juan Javier González B., Rogelio Florencia Juárez, Hector J. Fraire Huacuja, and José A. Martínez Flores
División de Estudios de Posgrado e Investigación, Instituto Tecnológico de Ciudad Madero, Ciudad Madero, Tamaulipas, México
[email protected],
[email protected],
[email protected],
[email protected],
[email protected]
Abstract. This paper is based on a project at the University of Barcelona to develop, in students of psychology and psychiatry, the skills to diagnose Generalized Anxiety Disorder (GAD) using a chatbot. The problem we address in this paper is to convert a chatbot into an emotional conversational agent capable of generating a believable and dynamic dialogue in natural language. To this end, the dialogues convey personality traits, emotions and their intensity. We propose an extension of the AIML language for the generation of believable dialogue; this extension allows the creation of a more realistic scenario in which the student diagnoses the condition simulated by the conversational agent. In order to measure the perception of the emotional state of the ECA expressed by the speech acts, a survey was applied. Keywords: Conversational Agent, Personality, Emotions, Natural Language, AIML.
1 Introduction
This work is based on research at the University of Barcelona [1] where a chatbot based on ALICE [2] was developed with the aim of reinforcing the skills of students in clinical psychology to diagnose Generalized Anxiety Disorder (GAD). The chatbot knowledge base contains information related to GAD symptoms. The chatbot simulates a patient in a medical consultation context with the aim of improving the students' ability to diagnose the disorder through interaction with the chatbot. Our research aims to make the student interaction more dynamic and believable, and therefore more natural, by adding personality traits and emotions. The speech acts currently used by the chatbot lack personality traits and emotions. We propose an architecture to evolve from a chatbot to an Embodied Conversational Agent (ECA) with the ability to express emotions and personality traits through written texts. These texts are implemented in the AIML language (Artificial Intelligence
Markup Language). An Embodied Conversational Agent is an agent that can interact in face-to-face conversations. The aim is to capture the richness and dynamism of human behavior [3]. We used images of a virtual human to reinforce the emotion and its intensity in the written texts. This paper is structured as follows: Section two presents related work. Section three defines the architecture of the proposed solution. Section four discusses the preliminary results obtained. Section five presents conclusions and future work.
2 Related Works
In the literature review we found research on ECAs incorporating AIML, personality and emotions. The AIML language is an XML specification, useful for programming robots or talking chatbots [2]; it was developed by the free software community Alicebot and Dr. Richard S. Wallace during the period 1995-2000. This section presents works in which emotions are recognized and expressed through written texts using AIML. Tee Connie et al. [4] present an agent to identify and classify the user's emotions through the written texts of a conversation. It was implemented in AIML and based on the ALICE chatbot [2]. Huang et al. [5] used the AIML language to incorporate parameters for controlling the non-verbal inputs and outputs of the response dialogues. The aim is to maintain empathy during the conversation. On the other hand, there are papers that incorporate an emotional model for implementing personality traits and emotions. We found the work of Stefan Kopp [6], where an emotional model was developed. It includes the emotions of happiness, sadness and anger, which control the behavior and the cognitive process of the agent. The work of Nasser [7] consists in developing an agent which, through a fuzzy system, simulates the emotion of anger, incorporating personality traits. The personality was based on the five-factor model (FFM) [8]. The agent expresses different behaviors and levels of anger depending on its personality and emotional state. In some research, both AIML extensions and emotional models are implemented for emotional dialogue generation. In the work of Eva Cerezo [9], a virtual agent to control a remote domotic system was implemented. The emotional state of the virtual agent is controlled by a variable. An AIML language extension was implemented by incorporating new emotional tags, and the virtual agent selects responses according to its emotional state. In the work of Sumedha Kshirsagar [10], an emotional agent with personality traits, moods and emotions is deployed. The personality model was based on the FFM (Five-Factor Model) and was implemented using Bayesian networks (Bayesian Belief Networks). The emotions are based on the Ortony, Clore and Collins (OCC) model. A text processing module, based on ALICE, was implemented, and an AIML language extension was made by incorporating emotional tags associated with probability values.
3 Architecture
Our architecture for the ECA is divided into two interconnected parts. The first part is the cognitive-emotional module. This module performs cognitive evaluations and updates the emotional state according to the ECA's personality. The second part is a dialogue module based on AIML. This module contains the knowledge base, processes the written text inputs of the user and manages the response selection process. The response selection process is influenced by the emotional state. Fig. 1 shows the general diagram of the process.
Fig. 1. Main Process Diagram
3.1 Emotional Module
To control the emotional state we use the circumflex model proposed by Morales [11]. This model is symbolized by a continuous circle on which orthogonal axes representing psychological concepts are placed; the emotions are located on this circle. Fig. 2 shows this model and the distribution of the four universally recognized emotions: fear, anger, sadness and joy. The new value of an axis is calculated from its current value and the variation resulting from the cognitive evaluation and the personality traits. To characterize the personality, the emotional model uses the Five-Factor Model (FFM). The FFM characterizes five personality traits: openness, conscientiousness, extraversion, agreeableness and neuroticism. The model uses the personality to define the type of emotional and cognitive attitude to be transmitted by the communication acts of the virtual character. Each phase of the context cognitive evaluation process in the emotional model is influenced by the personality traits of the ECA. The result of this influence determines an increase or decrease in the value of the emotions in the circular model. For example, a person with a high level of neuroticism would tend to increase their level of stress and
Fig. 2. Emotion Classification Based on the Circumflex Model of Morales [11]
arousal in a new event evaluation. Similarly, a person with a high level of agreeableness tends to raise their level of arousal and valence. Fig. 3 summarizes this idea for three of the five values that characterize the personality, for two of the events that may occur.
Fig. 3. Fragment of the table that determines the axis variations of the emotions in terms of some of the personality traits in the Emotional Model
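As a rough illustration of how such an update could be computed, the sketch below adjusts the circumflex axes from an event evaluation weighted by the FFM traits. The axis names follow Fig. 2, but the trait weights, event encoding and function names are illustrative assumptions and not the actual model of Morales [11].

def update_axes(axes, event_deltas, personality):
    # axes: current circumflex axis values, e.g. {"stress": 0.2, "arousal": 0.1, "valence": 0.0, "stance": 0.3}
    # event_deltas: per-axis variation produced by the cognitive evaluation of an event
    # personality: FFM traits in [0, 1], e.g. {"neuroticism": 0.8, "agreeableness": 0.4}
    # Illustrative modulation only: neuroticism amplifies stress/arousal changes and
    # agreeableness amplifies arousal/valence changes, mirroring the examples in the text.
    gains = {
        "stress":  1.0 + personality.get("neuroticism", 0.0),
        "arousal": 1.0 + 0.5 * (personality.get("neuroticism", 0.0)
                                + personality.get("agreeableness", 0.0)),
        "valence": 1.0 + personality.get("agreeableness", 0.0),
        "stance":  1.0,
    }
    return {axis: max(-1.0, min(1.0, value + gains[axis] * event_deltas.get(axis, 0.0)))
            for axis, value in axes.items()}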
The ECA's emotional state is updated during the conversation with the students, based on the positive or negative evaluation of events that may occur in the conversation. We define these events through a classification of the ECA's conversation topics, some of which affect its emotional state in a positive or negative way. The emotion selection process is based on the values of the axes that characterize the emotions. The model considers the basic emotions of fear, anger, sadness and joy, which are the emotions expressed by the ECA. These emotions, as can be seen in Fig. 2, are located in various areas within the circumflex. Thus, a high level on the stress and arousal axes will select the anger
emotion over any other emotion. In the same way, the joy emotion can be chosen through a high level on the valence and stance axes. The attributes of emotion and intensity of emotion were taken from the emotional model and integrated into the architecture proposed in this work. These elements are sent to the dialogue module, since they are determinant in the response selection process.
3.2 Dialogue Module
The knowledge base of our dialogue module is based on the AIML corpus designed at the University of Barcelona, which contains information about the daily life of the ECA, with the aim of expressing symptoms related to GAD. The process was conducted in two phases: first, we extended the AIML language by incorporating new emotional tags, generating a new knowledge base structure, and second, we defined the new corpus based on this new structure.
3.2.1 Structure of the Knowledge Base
AIML is used as a scripting language that defines a question-answer database, which is used as software for text-based chatbots. The tags most commonly used for this purpose are <category>, <pattern> and <template>. An interaction between the chatbot and the user is defined within the <category> element. The possible user expressions are defined in the <pattern> element and the responses of the chatbot are defined in the <template> element. AIML interpreters match the user input against the terms defined in the <pattern> elements so that the output expression is consistent with the input expression [5]. Fig. 4 shows the structure of these tags.
Fig. 4. Tags Structure in AIML Language
As mentioned, to implement personality traits, emotion and intensity of emotion in the dialogue selection process, we took the attributes associated with the emotion and its intensity from the emotional model proposed by Morales [11]. These attributes are used in the <template> element. The emotion attribute was incorporated using emotional tags such as <anger>, <joy>, <sadness> and <fear>. The attribute related to the intensity of emotion is defined as a number ranging between 0 and 100 and represents how strong the emotion is. This attribute is represented by the <intensity>
Fig. 5. General Diagram of the AIML Language Extension
Fig. 6. Structure Proposed for the Knowledge Base
tag. In general form, Fig. 5 shows the AIML language extension and Fig. 6 shows the complete structure of the knowledge base.
3.2.2 New Corpus Definition
Having defined the knowledge base structure, we proceeded to transfer the original knowledge base into the new structure, integrating the new emotional tags and updating the texts of the existing dialogues according to them. For example, Table 1 shows the feedback for the pattern "Do you use drugs?" under the following circumstances:
Table 1. Feedback for the Pattern: Do you use drugs?

Emotional State                      Answer
Emotion: Anger, Intensity: Low       No, I have never taken drugs, not even smoked.
Emotion: Anger, Intensity: Medium    I don't even smoke!
Emotion: Anger, Intensity: High      I NEITHER SMOKE NOR DO DRUGS!!!
The code that performs this feedback is presented below:
<category>
  <pattern>DROGA</pattern>
  <template>
    <anger>
      <intensity level="low">No, I have never taken drugs, not even smoked.</intensity>
      <intensity level="medium">I don't even smoke!</intensity>
      <intensity level="high">I NEITHER SMOKE NOR DO DRUGS!!!</intensity>
    </anger>
  </template>
</category>
...
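To make the selection process concrete, here is a small sketch of how a dialogue module could pick the answer matching the current emotion and intensity from an extended category like the one above. It assumes the illustrative level attribute and tag layout of that snippet, as well as hypothetical intensity bands; it is not the interpreter actually used in this work.

import xml.etree.ElementTree as ET

AIML_CATEGORY = """
<category>
  <pattern>DROGA</pattern>
  <template>
    <anger>
      <intensity level="low">No, I have never taken drugs, not even smoked.</intensity>
      <intensity level="medium">I don't even smoke!</intensity>
      <intensity level="high">I NEITHER SMOKE NOR DO DRUGS!!!</intensity>
    </anger>
  </template>
</category>
"""

def select_response(category_xml, emotion, intensity):
    # emotion: one of "anger", "joy", "sadness", "fear"; intensity: number in [0, 100]
    band = "low" if intensity < 34 else "medium" if intensity < 67 else "high"  # assumed bands
    template = ET.fromstring(category_xml).find("template")
    emotion_node = template.find(emotion)
    if emotion_node is None:                      # no emotional variant for this emotion
        return (template.text or "").strip()
    for option in emotion_node.findall("intensity"):
        if option.get("level") == band:
            return option.text
    return emotion_node.findall("intensity")[0].text

print(select_response(AIML_CATEGORY, "anger", 80))   # -> I NEITHER SMOKE NOR DO DRUGS!!!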
4 Results
Our main interest is to validate that students perceive the relationship between the emotional states and the written texts of the ECA, and that these are consistent with the conversation. In order to measure the perception of the emotional state of the ECA expressed by the speech acts, a survey was applied. The survey, applied to 20 students, evaluates the emotions Anger, Joy, Sadness, Fear, Resignation, Distress and Surprise. The survey was divided into two parts and shows the dialogues of a small conversation between a user and the ECA (see Figures 7 to 9). In the first part, only written text was used, and in the second part, the text is reinforced with images of the ECA using a model included in the 3D animation software Poser 7.
User: Why are you here?
ECA: Because I get nervous and I suffer all type of symptoms since a long time ago.
Fig. 7. ECA's answer driven by a medium intensity of Sadness
User: Have you had sexual problems?
ECA: Not at all!!!
Fig. 8. ECA's answer driven by a medium intensity of Anger
User: What is your favorite movie?
ECA: My favorite movie is "Bicentennial Man"
Fig. 9. ECA's answer driven by a medium intensity of Joy
The evaluation corresponding to the written text shows that 57.50% of the emotions were determined correctly. Within this group, only 43.92% correctly identified the intensity of the emotion that we wanted to express. The evaluation of the speech acts reinforced with images shows that 70.51% identified the emotion correctly, and the interpretation of the intensity of the emotion increased to 55.35%. Although the identification of the emotion improved using images, only three of the seven emotions obtained high percentages of recognition, ranging between 80% and 97.65%. The other four emotions evaluated obtained recognition percentages below 51.25%. Figure 10 shows the percentage obtained for each emotion using written text and images.
Fig. 10. Percentages of success for each emotion using written text and images
Observing the results, we found that it is necessary to identify which factors have an influence on the correct interpretation of the emotions. Although the evaluations show that adding images increases the interpretation rate of the emotional state and its intensity, there are four emotions with a lower percentage of recognition. Thus, there are new opportunity areas to develop in this field.
5 Conclusions and Future Work
In this paper, we presented an architecture that uses emotions and personality traits to endow an ECA with emotional dialogues based on AIML. In order to characterize the phrases used, a survey was applied. We evaluated the interpretation of the emotion and its intensity in a dialogue, comparing written text with a version reinforced with images. We noticed that using images increases the percentage of interpretation of the emotions and their intensities. However, we identified the need for future work to increase the percentage of interpretation of the emotions Resignation, Distress, Sadness and Fear. These emotions obtained a low rate, and we need to determine which factors influence their interpretation.
References
1. Gutiérrez, J., Alsina, I., Rus, M., Pericot, I.: Virtual Agents for Teaching Diagnostic Skills in Clinical Psychology and Psychiatry. In: CGVR, Las Vegas, Nevada, USA (2009)
2. Wallace, R.: Don't Read Me - A.L.I.C.E. and AIML Documentation (2000), http://www.alicebot.org/documentation/dont.html (accessed March 1, 2010)
3. Morales-Rodriguez, M.L., Pavard, B., González-Barbosa, J.J.: Virtual Humans and Social Interaction. In: CGVR 2009, pp. 158–163 (2009)
4. Connie, T., Sing, G.O., Michael, G.K.O., Huat, K.L.: A Computational Approach to Emotion Recognition in Intelligent Agent. In: The Asian Technology Conference in Mathematics (ATCM 2002), Multimedia University, Malacca, December 17-21 (2002)
5. Huang, H., Cerekovic, A., Tarasenko, K., Levacic, V., Zoric, G., Pandzic, I., Nakano, Y., Nishida, T.: Integrating Embodied Conversational Agent Components with a Generic Framework. International Journal of Multiagent and Grid Systems 4(4) (2008)
6. Kopp, S., Becker, C., Wachsmuth, I.: The Virtual Human Max - Modeling Embodied Conversation. Artificial Intelligence Group, University of Bielefeld, P.O. Box 100131, 33501 Bielefeld, Germany (2006)
7. Ghasem-Aghaee, N., Khalesi, B., Kazemifard, M., Ören, T.I.: Anger and Aggressive Behavior in Agent Simulation. In: Summer Computer Simulation Conference (2009)
8. McCrae, R., John, O.: An Introduction to the Five-Factor Model and Its Applications. Journal of Personality 60, 175–215 (1992)
9. Cerezo, E., Baldassarri, S., Cuartero, E., Serón, F.: Agentes virtuales 3D para el control de entornos inteligentes domóticos. Dept. de Ingeniería de Sistemas e Informática, Universidad de Zaragoza, Instituto de Investigación en Ingeniería de Aragón, I3A (2007)
10. Kshirsagar, S., Magnenat-Thalmann, N.: A multilayer personality model. In: Proceedings of the 2nd International Symposium on Smart Graphics, pp. 107–115. ACM, New York (2002)
11. Morales-Rodriguez, M.L.: Modèle d'interaction sociale pour des agents conversationnels animés. Application à la rééducation de patients cérébro-lésés. PhD Thesis, Université Paul Sabatier, Toulouse, 108 p. (2007)
Knowledge-Based System for Diagnosis of Metabolic Alterations in Undergraduate Students
Miguel Murguía-Romero¹, René Méndez-Cruz², Rafael Villalobos-Molina¹, Norma Yolanda Rodríguez-Soriano³, Estrella González-Dalhaus⁴, and Rafael Jiménez-Flores²
¹ Unidad de Investigación en Biomedicina, ² Carrera de Médico Cirujano, ³ Carrera de Psicología, Facultad de Estudios Superiores Iztacala, Universidad Nacional Autónoma de México, Ave. de los Barrios 1, Los Reyes Iztacala, Tlalnepantla 54090, México
⁴ Universidad Autónoma de la Ciudad de México, Prolongación San Isidro 151, San Lorenzo Tezonco, Iztapalapa, México D.F., 09790, México
[email protected]
Abstract. A knowledge-based system to identify 10 main metabolic alterations in university students based on clinical and anthropometric parameters is presented. Knowledge engineering was carried out through an unstructured expert interview methodology, resulting in a knowledge base of 17 IF-THEN rules. A backward-chaining inference engine was built in the Prolog language; the attribute-value database with the parameters of each student was also stored as Prolog facts. The system was applied to 592 cases: the clinical and anthropometric parameters of the students stored in the database. The medical diagnoses and recommendations for each student, obtained from the system, were organized into individualized reports that the physicians gave to the students in personal interviews over only two days. The effectiveness of these interviews is largely attributed to the fact that the physicians are the same experts who participated in the process of building the knowledge base. Keywords: Knowledge-based systems, medical diagnosis, metabolic syndrome, Prolog.
1 Introduction
Mexican public health agencies recognize that obesity is a health problem in the country [1, 2]. Mexico has the highest obesity prevalence in children [3, 4]. Several cardiovascular alterations such as high blood pressure, high concentration of
Corresponding author.
glucose and cholesterol in blood, and obesity, among others, have been recognized to have physiological relations, jointly named the metabolic syndrome [5]. Thus, obesity is only one face of other metabolic alterations: it is related to cardiovascular diseases that can evolve into more complicated alterations such as diabetes, high blood pressure, atherosclerosis and myocardial infarction, among others. The alterations related to the metabolic syndrome account for the main causes of death in Mexico. The health problem in Mexico requires immediate attention: actions directed to improve the health of all age groups, with special attention to children and the young population, and not only to the adult population. Universities are adequate centers to carry out actions to prevent young people from acquiring metabolic alterations, since their particular conditions include daily contact with teachers, sport facilities, and an infrastructure of electronic and paper communication means such as web pages, e-mail, gazettes, bulletins, etc. Computer systems for medical diagnosis would help physicians to improve the health of students, facilitating an individualized and opportune evaluation of the physical health of each student. Computer systems for diagnosis can help to generate reports both for the early diagnosis of diseases and for the detection and avoidance of risk factors for acquiring metabolic diseases, among others. Expert systems have emerged mainly from diagnosis in medicine [6, 7]. Even though there are many examples of such systems [8–11], to our knowledge there are few examples in Mexico, and practical cases are needed in order to make them available to physicians for use in their daily work.
1.1 Metabolic Syndrome in Undergraduate Students
Four years ago, we integrated a Multidisciplinary Group to Investigate Health and Academic Performance (GMISARA) at the National Autonomous University of México (U.N.A.M.). GMISARA conducts a complex, multistage, geographic-area survey design for collecting representative data from public universities of the México City metropolitan area (U.N.A.M. and the México City Autonomous University, U.A.C.M.). One of our aims is to improve the physical health of young people, particularly undergraduate students. Another aim of the GMISARA group is to study risk factors of the metabolic syndrome in young people, mainly to prevent acquiring it or to detect it at early stages, in order to avoid or delay its damages. The metabolic syndrome, as defined by the American Heart Association [5], involves clinical and anthropometric parameters that we have measured in more than 4000 first-year students since 2007. We organized these data in a relational database in order to determine the most frequent health problems among students and their causes. As teachers of undergraduate students, we believe that schools need to provide an environment that promotes good nutrition and facilities for students to do physical exercise. Teachers and authorities should be involved in the monitoring of the physical health of the students. That is the main reason to build a system that could help the physicians of our team to diagnose health problems in students, and to communicate adequate recommendations to them.
1.2 The Need of an Opportune and Expedite Diagnosis System
In order to improve the physical health of students, we built a knowledge-based system [11], GMISARA-KS, for the diagnosis of ten diseases and for the recommendations from the physician to the students diagnosed with one or more of them. The need for such a system arose from the aim of a team of physicians, involved in teaching and scientific research, to improve students' health. Thus, one of the main objectives of the GMISARA-KS system is to generate diagnoses and recommendation reports for students, consuming a minimal amount of time in both processes: the data analysis to generate the diagnosis, and the medical interviews with each student.
2 System Building Process and Architecture
2.1 The Raw Data: Anthropometric and Clinical Analysis
Anthropometric data (waist circumference, height, and weight), blood pressure, and blood samples were taken from 592 students over a week. Clinical data (blood chemistry, hematology, and urinalysis) were obtained from later analyses of the blood samples. General data, such as name, age, gender, address, and e-mail of each student, were also obtained and stored in a database. All students signed an informed consent agreement to participate in the study. Reference values for each clinical and anthropometric parameter were established according to national and international criteria, selected by agreement between the two physicians. The specific criteria for the reference values are crucial for the diagnosis process, because the final diagnoses depend on what an altered value means for each parameter; e.g., the reference value for waist circumference was established at 80 cm for women and 90 cm for men, so all women with a waist circumference of 80 cm or more were considered to have an altered value. Basically, the final diagnoses are made through a combination of normal and altered values for a specific set of parameters (Table 1). The anthropometric, clinical, and general data of each student and the reference value of each parameter were stored in a relational database (Figure 1). After being exported to Prolog facts (Figure 2), each value was categorized into a qualitative value through the reference values; e.g., a value of 91 cm of waist circumference in a woman is transformed into the categorical value high.
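As an illustration of this categorization step, the following sketch maps a raw measurement to a qualitative value using sex-specific cut points. The waist-circumference cut-offs (80 cm for women, 90 cm for men) come from the text; the function and dictionary names are illustrative, and the actual system stores these categorized values as Prolog facts rather than Python data.

# Reference cut points per parameter, keyed by sex ("F"/"M"); only waist circumference
# is taken from the text. Values at or above the cut point are categorized as "high".
REFERENCE_CUT_POINTS = {
    "waist_circumference_cm": {"F": 80.0, "M": 90.0},
}

def categorize(parameter, value, sex):
    # Returns "high" if the measured value reaches the reference cut point, else "normal".
    cut = REFERENCE_CUT_POINTS[parameter][sex]
    return "high" if value >= cut else "normal"

print(categorize("waist_circumference_cm", 91, "F"))   # -> high (example from the text)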
2.2 The Knowledge Base
In GMISARA there are two internal medicine specialists with more than 30 years of experience, and the knowledge base of the system was built based on their expertise. The physicians, with the help of the knowledge engineer, determined the 10 main diseases the system would diagnose (Figure 3). Unstructured interviews [12–14] with the physicians were carried out to build the IF-THEN rules (Figure 4).
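The sketch below gives a rough idea of how IF-THEN rules over categorized parameters can be evaluated by backward chaining, written in Python rather than the Prolog used by the actual system. The single substantive rule only mirrors the AHA criterion quoted in the caption of Table 1 (three or more parameters out of the cut point); the second rule, the predicate names and the example facts are hypothetical placeholders, not part of the 17-rule knowledge base.

def altered_count(facts):
    # facts: dict mapping parameter name -> categorical value ("normal", "high", "low", ...)
    return sum(1 for v in facts.values() if v != "normal")

RULES = {
    # goal: condition over the student's facts (IF condition THEN goal holds)
    "metabolic_syndrome": lambda facts: altered_count(facts) >= 3,                    # AHA criterion (Table 1)
    "needs_medical_interview": lambda facts: prove("metabolic_syndrome", facts),      # hypothetical chained rule
}

def prove(goal, facts):
    # Minimal backward chaining: to prove a goal, look up a rule for it and evaluate its condition,
    # which may in turn try to prove other goals.
    rule = RULES.get(goal)
    return rule is not None and rule(facts)

student = {"waist_circumference": "high", "triglycerides": "high",
           "hdl_cholesterol": "low", "blood_pressure": "normal", "glucose": "normal"}
print(prove("needs_medical_interview", student))   # -> True (three altered parameters)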
Table 1. Reference values of clinical and anthropometric parameters according to the American Heart Association (AHA) metabolic syndrome definition [5]. AHA criterion indicates that metabolic syndrome is present when three or more parameters are out of the cut point.
Parameter HDL cholesterol
Categorical Cut point